Step-by-Step Guide

Training and testing an end-to-end machine learning pipeline on a custom dataset can be challenging. Fortunately, I have developed a set of steps to make this process easier to follow. By following these steps in order, you can develop a complete model for predicting cleave quality and optimizing the tension parameter for input to the LDC handset.

Step 1: Train CNN

After installing the application on your local machine and doing a test run (as shown in Getting Started), the next step is to fully develop the CNN classification model. Use the skeleton file provided under config_files/config_skeletons/train_cnn_skeleton.json to input the required parameters for CNN training. You can rename this file to suit your needs. I will refer to it as train_cnn.json for the remainder of the tutorial.
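
If you prefer to script the copy-and-rename rather than do it by hand, a minimal sketch might look like the following. The helper name copy_skeleton is my own and is not part of the application; the two paths in the usage comment are the ones named above.

```python
import shutil
from pathlib import Path


def copy_skeleton(skeleton: Path, target: Path) -> Path:
    """Copy a skeleton config to a working file, refusing to overwrite edits."""
    if target.exists():
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(skeleton, target)
    return target


# Usage (hypothetical helper, paths taken from this tutorial):
# copy_skeleton(
#     Path("config_files/config_skeletons/train_cnn_skeleton.json"),
#     Path("config_files/train_cnn.json"),
# )
```

Refusing to overwrite an existing target is a deliberate choice here, so that a second call does not wipe out parameters you have already filled in.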

The first run will most likely fall short of your target metrics. Its purpose is to establish a baseline for the later steps.

To start the training run:

focal --file_path config_files/train_cnn.json
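
A malformed config file is an easy mistake to make when editing JSON by hand. The sketch below checks that the file parses before building the command shown above. The function names focal_command and run_focal are hypothetical wrappers of my own; the only thing taken from the application is the focal --file_path invocation.

```python
import json
import subprocess
from pathlib import Path


def focal_command(config_path: str) -> list[str]:
    """Return the argv for a focal run after checking the config parses as JSON."""
    path = Path(config_path)
    with path.open(encoding="utf-8") as handle:
        json.load(handle)  # raises json.JSONDecodeError on malformed JSON
    return ["focal", "--file_path", str(path)]


def run_focal(config_path: str) -> int:
    """Launch focal with the given config and return its exit code."""
    return subprocess.run(focal_command(config_path), check=False).returncode


# Usage (assumes focal is installed and on your PATH):
# run_focal("config_files/train_cnn.json")
```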

Step 3: Train CNN (With Optimal Parameters)

Now that you have the optimal hyperparameters from the hyperparameter search, update train_cnn.json with them and run the training again. The resulting metrics should improve over the baseline from Step 1.

You may still want to experiment with early_stopping, checkpoints, and other training options to achieve the best final model. This is a matter of trial and error; there is no established routine that always produces the ideal model state.
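
To make the effect of early stopping easier to reason about while you experiment, here is a minimal sketch of the generic patience-based technique. This is not the application's implementation, and its early_stopping option may behave differently; the class name and the patience and min_delta parameters are assumptions for illustration.

```python
class EarlyStopping:
    """Generic patience-based early stopping on a validation loss."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

A larger patience tolerates noisier validation curves at the cost of longer runs, which is one of the trade-offs you end up tuning by hand.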

Step 4: Train MLP or XGBoost Regression Model

For determining the ideal change in tension, you have two options:

  1. MLP Regression Model — Uses the same architecture as the CNN classification model but with a different final activation function to output a continuous value.

  2. XGBoost Model — A faster alternative that can save computation time (preferred choice).

In either case, you must provide the path to the trained CNN model from Step 3. Also, fill out the required parameters in either config_files/config_skeletons/train_mlp_skeleton.json or config_files/config_skeletons/train_xgb_skeleton.json, depending on your chosen model. Again, you can rename the file to suit your needs.

To run the MLP regression training (assuming you renamed the file to train_mlp.json):

focal --file_path config_files/train_mlp.json

To run the XGBoost regression training (assuming you renamed the file to train_xgb.json):

focal --file_path config_files/train_xgb.json

Step 6: Train MLP Model (With Optimal Hyperparameters)

Update your train_mlp.json file with the optimal hyperparameters from Step 5, then re-run the training.

The model should now produce improved regression performance when predicting the optimal tension value.
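
Copying hyperparameters into a config by hand invites typos. The following sketch applies a set of overrides to a JSON config and rejects any key that the file does not already contain. It assumes the config is a flat JSON object, which may not match the real skeleton layout, and the key names in the usage comment (learning_rate, batch_size) are hypothetical placeholders rather than the application's actual parameter names.

```python
import json
from pathlib import Path


def update_config(config_path: Path, overrides: dict) -> dict:
    """Apply overrides to a flat JSON config, rejecting keys the file lacks."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    unknown = sorted(set(overrides) - set(config))
    if unknown:
        raise KeyError(f"keys not present in {config_path.name}: {unknown}")
    config.update(overrides)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Usage (hypothetical key names; use the ones in your own config file):
# update_config(Path("config_files/train_mlp.json"),
#               {"learning_rate": 0.001, "batch_size": 32})
```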

Step 7: Prediction Testing for CNN Model

Now that you have a fully trained CNN classification model, you can evaluate its performance on an unseen dataset. This step is crucial to verify that the model generalizes well and does not overfit to the training data.

Use the provided config_files/config_skeletons/test_cnn_skeleton.json configuration file to specify the following:

  • Path to the trained CNN model from Step 3 (or the latest retraining with optimal hyperparameters).

  • Path to the test dataset containing images that were not used in training or validation.

  • Any required preprocessing or scaling parameters (if applicable).

To run the prediction test (assuming you renamed the file to test_cnn.json):

focal --file_path config_files/test_cnn.json

The script will output:

  • Classification metrics such as accuracy, precision, recall, and F1-score.

  • Confusion matrix visualization to see where the model is making errors.

  • ROC curve and AUC score for binary classification evaluation.

All metrics and plots will be saved to MLflow tracking for review and comparison.
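
If you want to sanity-check the reported numbers against the confusion matrix, the standard definitions for the binary case can be sketched as below. This is not the application's own code; the function name and the convention of returning 0.0 when a denominator is zero are my assumptions.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Here tp, fp, fn, and tn are the true positive, false positive, false negative, and true negative counts read off the confusion matrix.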

Step 8: Prediction Testing for MLP/XGBoost Models

Once you have trained your MLP or XGBoost regression model for predicting optimal tension, the next step is to evaluate its performance on an unseen test dataset.

Use the provided config_files/config_skeletons/test_mlp.json or config_files/config_skeletons/test_xgb.json configuration file, depending on which model you trained. In the config file, specify:

  • Path to the trained regression model from Step 6 (MLP) or Step 4 (XGBoost).

  • Path to the test dataset containing images and associated features not used in training.

  • Path to the trained CNN model from Step 3, which is required for feature extraction.

  • Any necessary preprocessing/scaler paths for feature normalization.

To run the prediction test for the MLP regression model:

focal --file_path config_files/test_mlp.json

To run the prediction test for the XGBoost regression model:

focal --file_path config_files/test_xgb.json

As before, these commands assume you renamed the files.

The script prints the predicted tension change alongside the actual tension change. All metrics, plots, and evaluation artifacts will be saved to MLflow tracking for detailed analysis.
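
To summarize a predicted-vs-actual printout in a single number, mean absolute error (MAE) and root mean squared error (RMSE) are common choices. Which metrics the application itself logs is not stated here, so treat the following as an independent sketch; the function name tension_errors is hypothetical.

```python
import math
from typing import Sequence


def tension_errors(
    predicted: Sequence[float], actual: Sequence[float]
) -> tuple[float, float]:
    """Return (MAE, RMSE) between predicted and actual tension changes."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("predicted and actual must be non-empty and equal length")
    errors = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse
```

RMSE penalizes large individual misses more heavily than MAE, so a gap between the two suggests a few badly predicted samples rather than a uniform offset.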

Step 9 (Optional): Train Reinforcement Learning (RL) Agent

For advanced optimization of the tension parameter, you can optionally train a Reinforcement Learning (RL) agent. This approach allows the model to learn tension adjustments through iterative feedback rather than relying solely on the supervised regression outputs from the MLP or XGBoost models.

The RL agent interacts with a simulated or real-world environment where:

  • State: Feature vector combining image-derived CNN features and physical parameters.

  • Action: Proposed change to the tension setting.

  • Reward: Improvement in cleave quality score compared to the baseline.
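
The state, action, and reward definitions above can be illustrated with a toy environment. This is a simplified sketch and not the application's RL environment: the state is reduced to the current tension alone (no CNN features), and quality_fn stands in for a hypothetical surrogate that maps a tension to a cleave quality score. All names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TensionEnv:
    """Toy environment: the action is a change in tension, the reward is the
    improvement in quality score over the baseline at the start tension."""

    quality_fn: Callable[[float], float]  # hypothetical surrogate: tension -> quality
    start_tension: float
    max_steps: int = 10

    def reset(self) -> float:
        self.tension = self.start_tension
        self.baseline = self.quality_fn(self.tension)
        self.steps = 0
        return self.tension

    def step(self, delta: float) -> tuple[float, float, bool]:
        """Apply a tension change; return (new state, reward, done)."""
        self.tension += delta
        self.steps += 1
        reward = self.quality_fn(self.tension) - self.baseline
        done = self.steps >= self.max_steps
        return self.tension, reward, done
```

For example, with a made-up surrogate such as lambda t: -abs(t - 100.0), actions that move the tension toward 100 earn a positive reward relative to the starting point.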

Note

Training the RL agent requires you to develop a CNN surrogate model that associates the numerical features from the CSV dataset with the labeled cleave quality. This is not implemented in the code, but it can be done using a simple XGBoost regression model.

The RL training process can be launched using the config_files/config_skeletons/train_rl.json configuration file, which should include:

  • Training hyperparameters.

  • Path to the trained CNN surrogate model.

To run RL training:

focal --file_path config_files/train_rl.json

The RL agent will be trained to maximize the long-term reward for cleave quality, and the resulting policy will be saved for future evaluation and deployment.

Note

RL training is computationally intensive and may require GPU acceleration or distributed training resources.

Step 10 (Optional): Test Reinforcement Learning (RL) Agent

To test the RL agent, fill out the corresponding input parameters in config_files/config_skeletons/test_rl.json and then run the following command:

focal --file_path config_files/test_rl.json

The script will output:

  • Start tension for the specific fiber type

  • Change in tension chosen for each step the agent takes

  • The received reward value