Step-by-Step Guide
Training and testing an end-to-end machine learning pipeline on a custom dataset can be challenging. Fortunately, I have developed a set of steps to make this process easier to follow. By following these steps in order, you can develop a complete model for predicting cleave quality and optimizing the tension parameter for input to the LDC handset.
Step 1: Train CNN
After installing the application on your local machine and doing a test run (as shown in Getting Started),
the next step is to fully develop the CNN classification model.
Use the skeleton file provided under config_files/config_skeletons/train_cnn_skeleton.json to input the required parameters for CNN training. You can copy or rename this file to suit your needs.
For the remainder of the tutorial I will refer to it as train_cnn.json, and the commands assume it is saved in config_files/.
You will most likely not reach your target metrics on this first run; its purpose is to establish a baseline for the later steps.
To start the training run:
focal --file_path config_files/train_cnn.json
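The copy-then-run workflow above can also be scripted. The following is a minimal sketch, not part of the application: the helper names prepare_config and focal_command are hypothetical, and the only CLI usage assumed is the focal --file_path form shown in this guide.

```python
import shutil
from pathlib import Path
from typing import List


def prepare_config(skeleton: Path, target: Path) -> Path:
    """Copy a skeleton config to a working path, keeping any existing edits."""
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(skeleton, target)
    return target


def focal_command(config_path: Path) -> List[str]:
    """Build the command line used throughout this guide."""
    return ["focal", "--file_path", config_path.as_posix()]


# Example (run from the repository root, then edit the copied file
# before launching the command):
#   cfg = prepare_config(
#       Path("config_files/config_skeletons/train_cnn_skeleton.json"),
#       Path("config_files/train_cnn.json"),
#   )
#   subprocess.run(focal_command(cfg), check=True)  # needs `import subprocess`
```

Copying rather than renaming keeps the original skeleton available as a reference for later runs.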
Step 2: CNN Hyperparameter Search
Choosing good hyperparameters is essential for reaching strong metrics when training the model. Finding these values by brute force is frustrating and time-consuming.
Fortunately, Keras provides a class for hyperparameter tuning that makes this search much easier to implement.
Using the config_files/config_skeletons/cnn_hyperparameter_skeleton.json config file, fill out the required inputs as described in Configuration.
To run the search (assuming you renamed the file to cnn_hyperparameter.json):
focal --file_path config_files/cnn_hyperparameter.json
The optimal hyperparameters will be printed to the console and saved to MLflow tracking. Record these values and enter them into train_cnn.json.
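Copying the tuned values into the training config by hand is error-prone, so a small helper can do the merge. This is a sketch under two assumptions that the source does not confirm: the training config is a flat JSON object, and the key names used in the example (learning_rate, dropout) are purely illustrative.

```python
import json
from pathlib import Path


def apply_best_params(config: dict, best: dict) -> dict:
    """Return a copy of `config` with tuned values overwriting matching keys.

    Keys in `best` that the config does not already define are rejected,
    so a typo cannot silently add a parameter that training never reads.
    """
    unknown = sorted(set(best) - set(config))
    if unknown:
        raise KeyError(f"not present in training config: {unknown}")
    return {**config, **best}


def update_config_file(path: Path, best: dict) -> None:
    """Rewrite the JSON config at `path` with the tuned values merged in."""
    config = json.loads(path.read_text())
    path.write_text(json.dumps(apply_best_params(config, best), indent=2))


# Example with hypothetical key names:
#   update_config_file(Path("config_files/train_cnn.json"),
#                      {"learning_rate": 0.001, "dropout": 0.3})
```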
Step 3: Train CNN (With Optimal Parameters)
Now that you have the optimal hyperparameters from the search, rerun training with the updated train_cnn.json file.
The resulting metrics should improve over the baseline run from Step 1.
You may still want to experiment with early_stopping, checkpoints, and other training options to achieve the best final model.
This is done experimentally. There is no established routine for always achieving the ideal model state.
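To build intuition for the early_stopping option, the sketch below shows the usual patience-based rule in plain Python. It assumes the option follows the common convention of monitoring validation loss with a patience counter; the function name and parameters are illustrative and are not taken from the application.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for patience-based early stopping.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs. Epochs are 0-indexed.
    If the rule never triggers, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch
```

A small patience stops training early but can miss a late improvement; a larger patience costs more epochs. Comparing the two settings on a logged loss curve shows the trade-off.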
Step 4: Train MLP or XGBoost Regression Model
For determining the ideal change in tension, you have two options:
MLP Regression Model — Uses the same architecture as the CNN classification model but with a different final activation function to output a continuous value.
XGBoost Model — A faster alternative that can save computation time (preferred choice).
In either case, you must provide the path to the trained CNN model from Step 3.
Also, fill out the required parameters in either config_files/config_skeletons/train_mlp_skeleton.json or config_files/config_skeletons/train_xgb_skeleton.json depending on your chosen model.
Again, you can rename this to suit your needs.
To run the MLP regression training (assuming you renamed the file to train_mlp.json):
focal --file_path config_files/train_mlp.json
To run the XGBoost regression training (assuming you renamed the file to train_xgb.json):
focal --file_path config_files/train_xgb.json
Step 5: MLP Hyperparameter Search
Similar to the CNN model, you will likely not achieve optimal metrics from randomly chosen hyperparameters in train_mlp.json.
Follow the same procedure from Step 2, but use the config_files/config_skeletons/mlp_hyperparameter_skeleton.json file.
To run the search (assuming you renamed the file to mlp_hyperparameter.json):
focal --file_path config_files/mlp_hyperparameter.json
The optimal hyperparameters will be printed to the console and saved to MLflow. Record these for use in Step 6.
Note
Currently, there is no implementation for automated hyperparameter tuning for XGBoost models. This feature is in development.
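Until that feature exists, a manual search can be run outside the application. The sketch below is a generic exhaustive grid search and is not part of the codebase. The evaluate callable is a placeholder you would write yourself, for example to train an XGBoost model with the given parameters and return its validation error. The parameter names in the example are only illustrative.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in `param_grid`; return (best_params, best_score).

    `param_grid` maps parameter names to lists of candidate values.
    `evaluate` takes a dict of parameters and returns a validation error,
    where lower is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Example with a hypothetical evaluation function:
#   best, err = grid_search(
#       {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]},
#       evaluate=my_validation_error,
#   )
```

The number of runs grows multiplicatively with each added parameter, so keep the grid small or sample it randomly when each evaluation is expensive.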
Step 6: Train MLP Model (With Optimal Hyperparameters)
Update your train_mlp.json file with the optimal hyperparameters from Step 5,
then re-run the training.
The model should now show improved regression performance when predicting the optimal tension value.
Step 7: Prediction Testing for CNN Model
Now that you have a fully trained CNN classification model, you can evaluate its performance on an unseen dataset. This step is crucial to verify that the model generalizes well and does not overfit to the training data.
Use the provided config_files/config_skeletons/test_cnn_skeleton.json configuration file to specify the following:
Path to the trained CNN model from Step 3 (or the latest retraining with optimal hyperparameters).
Path to the test dataset containing images that were not used in training or validation.
Any required preprocessing or scaling parameters (if applicable).
To run the prediction test (assuming you renamed the file to test_cnn.json):
focal --file_path config_files/test_cnn.json
The script will output:
Classification metrics such as accuracy, precision, recall, and F1-score.
Confusion matrix visualization to see where the model is making errors.
ROC curve and AUC score for binary classification evaluation.
All metrics and plots will be saved to MLflow tracking for review and comparison.
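For reference, the classification metrics listed above all derive from the four confusion-matrix counts. The sketch below computes them for binary labels in plain Python. It illustrates the standard definitions and is not the application's own evaluation code.

```python
def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and derived metrics for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: actual 0, actual 1
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

When the classes are imbalanced, precision and recall are more informative than accuracy, because a model that always predicts the majority class can still score a high accuracy.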
Step 8: Prediction Testing for MLP/XGBoost Models
Once you have trained your MLP or XGBoost regression model for predicting optimal tension, the next step is to evaluate its performance on an unseen test dataset.
Use the provided config_files/config_skeletons/test_mlp.json or config_files/config_skeletons/test_xgb.json configuration file, depending on which model you trained.
In the config file, specify:
Path to the trained regression model from Step 6 (MLP) or Step 4 (XGBoost).
Path to the test dataset containing images and associated features not used in training.
Path to the trained CNN model from Step 3, which is required for feature extraction.
Any necessary preprocessing/scaler paths for feature normalization.
To run the prediction test for the MLP regression model:
focal --file_path config_files/test_mlp.json
To run the prediction test for the XGBoost regression model:
focal --file_path config_files/test_xgb.json
As before, these commands assume you renamed the config files.
The script prints the predicted tension change alongside the actual tension change. All metrics, plots, and evaluation artifacts will be saved to MLflow tracking for detailed analysis.
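If you want to summarize the predicted and actual tension changes yourself, mean absolute error (MAE) and root mean squared error (RMSE) are two common choices. The source does not state which regression metrics the script logs, so treat the following as an independent sketch with illustrative names.

```python
import math


def regression_report(actual, predicted):
    """Return (mae, rmse) for paired actual and predicted tension changes."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse
```

RMSE penalizes large individual errors more heavily than MAE, so a large gap between the two suggests a few badly mispredicted samples rather than a uniform offset.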
Step 9 (Optional): Train Reinforcement Learning (RL) Agent
For advanced optimization of the tension parameter, you can optionally train a Reinforcement Learning (RL) agent. This approach allows the model to learn tension adjustments through iterative feedback rather than relying solely on the supervised regression outputs from the MLP or XGBoost models.
The RL agent interacts with a simulated or real-world environment where:
State: Feature vector combining image-derived CNN features and physical parameters.
Action: Proposed change to the tension setting.
Reward: Improvement in cleave quality score compared to the baseline.
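The state, action, and reward definitions above can be sketched as a toy environment. Everything in this example is hypothetical: the class name, the toy_surrogate function, and the numeric values are illustrative stand-ins, not the application's environment or a real cleave-quality model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


def toy_surrogate(features: Sequence[float], tension: float) -> float:
    """Made-up quality score that peaks at a tension of 100 (arbitrary units)."""
    return 1.0 - abs(tension - 100.0) / 100.0


@dataclass
class TensionEnv:
    """Toy environment: the state is the feature vector plus current tension."""
    features: Tuple[float, ...]
    tension: float
    surrogate: Callable[[Sequence[float], float], float]

    def __post_init__(self):
        # Quality at the starting tension is the baseline for all rewards.
        self.baseline = self.surrogate(self.features, self.tension)

    def state(self) -> Tuple[float, ...]:
        return (*self.features, self.tension)

    def step(self, delta_tension: float):
        """Apply the action (a tension change) and return (state, reward)."""
        self.tension += delta_tension
        quality = self.surrogate(self.features, self.tension)
        reward = quality - self.baseline  # improvement over the baseline
        return self.state(), reward
```

For example, starting at a tension of 90 and applying an action of +5 moves the toy score from 0.90 to 0.95, giving a reward of about 0.05.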
Note
Training the RL agent requires a CNN surrogate model that maps the numerical features in the CSV dataset to the labeled cleave quality. This is not implemented in the code, but it can be done with a simple XGBoost regression model.
The RL training process can be launched using the config_files/config_skeletons/train_rl.json configuration file, which should include:
Training hyperparameters.
Path to the trained CNN surrogate model.
To run RL training:
focal --file_path config_files/train_rl.json
The RL agent will be trained to maximize the long-term reward for cleave quality, and the resulting policy will be saved for future evaluation and deployment.
Note
RL training is computationally intensive and may require GPU acceleration or distributed training resources.
Step 10 (Optional): Test Reinforcement Learning (RL) Agent
To test the RL agent, fill out the corresponding input parameters in config_files/config_skeletons/test_rl.json and then run the following command:
focal --file_path config_files/test_rl.json
The script will output:
Starting tension for the specific fiber type.
Change in tension chosen at each step the agent takes.
The received reward value.