focal.rl_pipeline module

Custom Gym environment that simulates cleaving fibers using a surrogate CNN model.

The agent adjusts tension over multiple steps to achieve optimal cleave quality, which is evaluated by a CNN surrogate model. Observations include fiber context and tension; rewards are based on the CNN predictions, plus a Gaussian reward term when the predicted tension is close to the optimal value.
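The reward description above can be illustrated with a minimal sketch. The function names, the way the two terms are summed, and the reading of `scale` as the Gaussian width are all assumptions made for illustration; only the default values of `quality_weight`, `proximity_weight`, and `scale` come from the `CleaveEnv` signature.

```python
import math


def gaussian_proximity_reward(tension, optimal_tension,
                              proximity_weight=50.0, scale=25.0):
    """Hypothetical Gaussian bonus that peaks when tension equals optimal_tension.

    Assumption: `scale` acts as the standard deviation of the Gaussian.
    """
    deviation = tension - optimal_tension
    return proximity_weight * math.exp(-(deviation ** 2) / (2.0 * scale ** 2))


def total_reward(cnn_pred, tension, optimal_tension,
                 quality_weight=100.0, proximity_weight=50.0, scale=25.0):
    """Hypothetical combination: weighted CNN quality plus the proximity bonus.

    Assumption: the two terms are simply added; the real environment may
    combine or gate them differently.
    """
    quality_term = quality_weight * cnn_pred
    proximity_term = gaussian_proximity_reward(
        tension, optimal_tension, proximity_weight, scale
    )
    return quality_term + proximity_term
```

Under these assumptions the proximity bonus is largest at the optimal tension and decays smoothly as the tension moves away from it, so the agent still receives a gradient of reward when it is near, but not exactly at, the target.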

class focal.rl_pipeline.CleaveEnv(csv_path: str, cnn_path: str, img_folder: str, feature_shape: List[int], threshold: float, max_steps: int, low_range: float, high_range: float, max_delta: float, max_tension_change: float, quality_weight=100.0, proximity_weight=50.0, scale=25.0)

Bases: Env

Creates the simulated cleave environment.

load_process_images(filename: str) → Tensor

Load and preprocess an image from a file path.

Parameters:

filename – Image filename or path

Returns:

Preprocessed image tensor

Return type:

tf.Tensor

metadata: dict[str, Any] = {'render_modes': ['human']}

render(action: ndarray, cnn_pred: float, reward: float) → None

Render the environment’s current state in human-readable format.

Parameters:
  • action (np.ndarray) – The action taken (as a 1D array).

  • cnn_pred (float) – The CNN’s predicted cleave quality.

  • reward (float) – The reward received after the action.

reset(seed: int | None = None, options: dict | None = None) → Tuple[ndarray, Dict[str, Any]]

Reset the environment to an initial state.

Parameters:
  • seed (Optional[int]) – Random seed for reproducibility.

  • options (Optional[dict]) – Additional options for reset (unused).

Returns:

Initial observation and empty info dict.

Return type:

Tuple[np.ndarray, dict]

step(action: Any) → Tuple[ndarray, float, bool, bool, Dict[str, Any]]

Take a step in the environment using the given action.

Parameters:

action (gym.ActType) – A 1D array-like action representing tension adjustment.

Returns:

  • observation (np.ndarray): The next observation.

  • reward (float): The reward received after taking the action.

  • terminated (bool): True if the episode ends successfully.

  • truncated (bool): True if the episode is truncated (max steps reached).

  • info (dict): Additional info (empty by default).

Return type:

Tuple[np.ndarray, float, bool, bool, dict]
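The `reset` and `step` return shapes documented above can be exercised with a short episode loop. The sketch below uses `ToyTensionEnv`, a hypothetical stand-in written only to mimic those return shapes without any dependencies; it is not `CleaveEnv`, and its dynamics, reward, and starting tension are invented for illustration.

```python
class ToyTensionEnv:
    """Hypothetical stand-in that mimics the documented reset/step return shapes."""

    def __init__(self, target=100.0, tolerance=1.0, max_steps=5):
        self.target = target
        self.tolerance = tolerance
        self.max_steps = max_steps
        self.tension = 0.0
        self.steps = 0

    def reset(self, seed=None, options=None):
        # Returns (observation, info), matching the documented reset() contract.
        self.tension = 90.0
        self.steps = 0
        return [self.tension], {}

    def step(self, action):
        # The action is a 1D array-like holding a tension adjustment.
        self.tension += float(action[0])
        self.steps += 1
        error = abs(self.tension - self.target)
        terminated = error <= self.tolerance  # episode ended successfully
        truncated = (not terminated) and self.steps >= self.max_steps
        reward = -error  # toy reward, unrelated to the CNN-based reward
        return [self.tension], reward, terminated, truncated, {}


def run_episode(env, policy):
    """Roll out one episode and stop on either terminated or truncated."""
    obs, info = env.reset(seed=0)
    total_reward = 0.0
    steps = 0
    while True:
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        if terminated or truncated:
            return total_reward, steps, terminated, truncated
```

Keeping `terminated` and `truncated` separate lets calling code distinguish an episode that reached its goal from one that simply ran out of steps.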

class focal.rl_pipeline.TestAgent(csv_path: str, cnn_path: str, img_folder: str, agent_path: str, feature_shape: List[int], threshold: float, max_steps: int, low_range: float, high_range: float, max_delta: float, max_tension_change: float)

Bases: object

test_agent(episodes: int) → Dict

Test the trained RL agent on random episodes.

Parameters:

episodes (int) – Total number of episodes to run when testing the agent.

class focal.rl_pipeline.TrainAgent(csv_path: str, cnn_path: str, img_folder: str, threshold: float, feature_shape: List[int], max_steps: int, low_range: float, high_range: float, max_delta: float, max_tension_change: float)

Bases: object

Class for training the RL agent.

save_agent(save_path: str) → None

train(env: Env, device: str, buffer_size: int, learning_rate: float, batch_size: int, tau: float, timesteps: int) → None

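Train the agent using the Soft Actor-Critic (SAC) algorithm.

Parameters:
  • env (gym.Env) – Simulated training environment.

  • device (str) – Device to train on; use "cuda" to train on a GPU.

  • buffer_size (int) – Replay buffer size.

  • learning_rate (float) – Learning rate for the optimizer.

  • batch_size (int) – Minibatch size: number of replay-buffer samples used per gradient update.

  • tau (float) – Soft update coefficient for the target networks.

  • timesteps (int) – Total number of environment timesteps to train for.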
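To make the role of `tau` concrete, the following is a minimal sketch of the soft (Polyak) target update commonly used in Soft Actor-Critic. The function name and the use of plain lists of floats are assumptions for illustration; this is not the module's own code, and a real agent would apply the same rule to network weight tensors.

```python
def soft_update(target_params, online_params, tau):
    """Blend online parameters into the target parameters.

    Each target value moves a fraction `tau` of the way toward the
    corresponding online value: target <- tau * online + (1 - tau) * target.
    """
    return [
        tau * online + (1.0 - tau) * target
        for target, online in zip(target_params, online_params)
    ]
```

A small `tau` makes the target networks change slowly, which tends to stabilize training; `tau = 1.0` copies the online parameters outright.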