.. _full_guide: Step-by-Step Guide ================== Training and testing an end-to-end machine learning pipeline on a custom dataset can be challenging. Fortunately, I have developed a set of steps to make this process easier to follow. By following these steps in order, you can develop a complete model for predicting cleave quality and optimizing the tension parameter for input to the LDC handset. Step 1: Train CNN ----------------- After installing the application on your local machine and doing a test run (as shown in :doc:`Getting Started `), the next step is to fully develop the CNN classification model. Use the skeleton file provided under ``config_files/config_skeletons/train_cnn_skeleton.json`` to input the required parameters for CNN training. You can rename this file to suit your needs. I will refer to it as ``train_cnn.json`` for the remainder of the tutorial. Most likely, you will not achieve your desired result metrics in this first run, but this serves as a foundational training run to establish a baseline. To start the training run: .. code-block:: bash focal --file_path config_files/train_cnn.json Step 2: CNN Hyperparameter Search --------------------------------- Choosing optimal hyperparameters is essential for achieving valuable metrics when training the model. Finding these values through brute force can be particularly frustrating and time-consuming. Fortunately, Keras provides a class for hyperparameter tuning that makes this search much easier to implement. Using the ``config_files/config_skeletons/cnn_hyperparameter_skeleton.json`` config file, fill out the required inputs as described in :doc:`Configuration `. To run the search: .. code-block:: bash focal --file_path config_files/cnn_hyperparameter.json The optimal parameters will be printed to the console and saved to MLflow tracking. Record these parameters and re-enter them into the cnn training file. Step 3: Train CNN (With Optimal Parameters) ------------------------------------------- Now that you have the optimal hyperparameters from the search, run the ``train_cnn.json`` file again. The resulting metrics should improve significantly with the correct parameters. You may still want to experiment with ``early_stopping``, ``checkpoints``, and other training options to achieve the best final model. This is done experimentally. There is no established routine for always achieving the ideal model state. Step 4: Train MLP or XGBoost Regression Model --------------------------------------------- For determining the ideal change in tension, you have two options: 1. **MLP Regression Model** — Uses the same architecture as the CNN classification model but with a different final activation function to output a continuous value. 2. **XGBoost Model** — A faster alternative that can save computation time (preferred choice). In either case, you must provide the path to the trained CNN model from Step 3. Also, fill out the required parameters in either ``config_files/config_skeletons/train_mlp_skeleton.json`` or ``config_files/config_skeletons/train_xgb_skeleton.json`` depending on your chosen model. Again, you can rename this to suit your needs. To run the MLP regression training: ( Assuming you renamed the file to ``train_mlp.json``) .. code-block:: bash focal --file_path config_files/train_mlp.json To run the XGBoost regression training (Assuming you renamed the file to ``train_xgb.json``): .. code-block:: bash focal --file_path config_files/train_xgb.json Step 5: MLP Hyperparameter Search --------------------------------- Similar to the CNN model, you will likely not achieve optimal metrics from randomly chosen hyperparameters in ``train_mlp.json``. Follow the same procedure from Step 2, but use the ``config_files/config_skeletons/mlp_hyperparameter_skeleton.json`` file. To run the search (Assuming you renamed the file to ``mlp_hyperparameter.json``): .. code-block:: bash focal --file_path config_files/mlp_hyperparameter.json The optimal hyperparameters will be printed to the console and saved to MLflow. Record these for use in Step 6. .. note:: Currently, there is no implementation for automated hyperparameter tuning for XGBoost models. This feature is in development. Step 6: Train MLP Model (With Optimal Hyperparameters) ------------------------------------------------------ Update your ``train_mlp.json`` file with the optimal hyperparameters from Step 5, then re-run the training. The model should now produce improved regression performance for predicting optimal tension value. Step 7: Prediction Testing for CNN Model ---------------------------------------- Now that you have a fully trained CNN classification model, you can evaluate its performance on an unseen dataset. This step is crucial to verify that the model generalizes well and does not overfit to the training data. Use the provided ``config_files/config_skeletons/test_cnn_skeleton.json`` configuration file to specify the following: - **Path to the trained CNN model** from Step 3 (or the latest retraining with optimal hyperparameters). - **Path to the test dataset** containing images that were not used in training or validation. - Any required preprocessing or scaling parameters (if applicable). To run the prediction test: .. code-block:: bash focal --file_path config_files/test_cnn.json The script will output: - **Classification metrics** such as accuracy, precision, recall, and F1-score. - **Confusion matrix** visualization to see where the model is making errors. - **ROC curve and AUC score** for binary classification evaluation. All metrics and plots will be saved to MLflow tracking for review and comparison. Step 8: Prediction Testing for MLP/XGBoost Models ------------------------------------------------- Once you have trained your MLP or XGBoost regression model for predicting optimal tension, the next step is to evaluate its performance on an unseen test dataset. Use the provided ``config_files/config_skeletons/test_mlp.json`` or ``config_files/config_skeletons/test_xgb.json`` configuration file, depending on which model you trained. In the config file, specify: - **Path to the trained regression model** from Step 6 (MLP) or Step 4 (XGBoost). - **Path to the test dataset** containing images and associated features not used in training. - **Path to the trained CNN model** from Step 3, which is required for feature extraction. - Any necessary preprocessing/scaler paths for feature normalization. To run the prediction test for the MLP regression model: .. code-block:: bash focal --file_path config_files/test_mlp.json To run the prediction test for the XGBoost regression model: .. code-block:: bash focal --file_path config_files/test_xgb.json This, again, assumes you renamed the files. The script will output a printout of predicted tension change vs. actual tension change. All metrics, plots, and evaluation artifacts will be saved to MLflow tracking for detailed analysis. Step 9 (Optional): Train Reinforcement Learning (RL) Agent ---------------------------------------------------------- For advanced optimization of the tension parameter, you can optionally train a Reinforcement Learning (RL) agent. This approach allows the model to learn tension adjustments through iterative feedback rather than relying solely on the supervised regression outputs from the MLP or XGBoost models. The RL agent interacts with a simulated or real-world environment where: - **State**: Feature vector combining image-derived CNN features and physical parameters. - **Action**: Proposed change to the tension setting. - **Reward**: Improvement in cleave quality score compared to the baseline. .. note:: Training the RL agent will require you to develop a CNN surrogate model to associate the numerical features from the csv dataset with the labeled cleave quality. This is not implemented in the code, but can be done using a simple XGBoost regression model. The RL training process can be launched using the ``config_files/config_skeletons/train_rl.json`` configuration file, which should include: - Training hyperparameters - Path to the trained CNN surrogate model. To run RL training: .. code-block:: bash focal --file_path config_files/train_rl.json The RL agent will be trained to maximize the long-term reward for cleave quality, and the resulting policy will be saved for future evaluation and deployment. .. note:: RL training is computationally intensive and may require GPU acceleration or distributed training resources. Step 10 (Optional): Test Reinforcement Learning (RL) Agent ---------------------------------------------------------- For testing the RL agent, fill out to corresponding input parameters in ``config_files/config_skeletons/test_rl.json`` and then run the following command: .. code-block:: bash focal --file_path config_files/test_rl.json The script will output: - Start tension for specific fiber Type - Change in tension chosen for each step the agent takes - The recieved reward value