focal.data_processing module
Data processing module for the Fiber Cleave Processing application.
This module provides classes for loading, preprocessing, and organizing data for training CNN and MLP models for fiber cleave analysis.
- class focal.data_processing.BadCleaveTensionClassifier(csv_path: str, img_folder: str, tension_threshold: int, backbone: str | None = 'efficientnet', encoder_path: str | None = None, classification_type: str | None = 'binary')
Bases: DataCollector
- class focal.data_processing.DataCollector(csv_path: str, img_folder: str, angle_threshold: float, diameter_threshold: float, classification_type: str | None = 'binary', backbone: str | None = 'mobilenet', set_mask: str | None = 'n', encoder_path: str | None = None)
Bases: object
Class for collecting and preprocessing data from CSV files and image folders.
This class handles loading cleave metadata from CSV files, processing images, and creating TensorFlow datasets for training machine learning models.
- create_custom_dataset(image_shape: Tuple[int, int, int], test_size: float = 0.2, buffer_size: int = 32, batch_size: int = 16) → Tuple[DatasetV2, DatasetV2]
Create datasets using only grayscale images and labels with a custom image shape.
- Parameters:
image_shape – Desired image shape (height, width, channels)
test_size – Fraction of data to use for testing
buffer_size – Buffer size for shuffling
batch_size – Batch size for training
- Returns:
Tuple of (train_ds, test_ds)
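The buffer_size and batch_size parameters presumably map onto tf.data.Dataset.shuffle and Dataset.batch. That mapping is an assumption, since the page only describes them as a shuffle buffer size and a batch size. The sketch below illustrates those semantics without TensorFlow. It keeps a fixed-size buffer, emits a randomly chosen element from it as each new one arrives, and then groups the stream into batches. buffered_shuffle and batch are hypothetical helpers and are not part of focal.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Yield items in a partially shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Emit a uniformly chosen element; the next item refills the freed slot.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Input exhausted: drain the remaining buffer in random order.
    rng.shuffle(buffer)
    yield from buffer


def batch(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group items into lists of batch_size; the final batch may be smaller."""
    current: List[T] = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


batches = list(batch(buffered_shuffle(range(40), buffer_size=32), batch_size=16))
print([len(b) for b in batches])  # [16, 16, 8]
```

A buffer smaller than the dataset only shuffles locally, while a buffer at least as large as the dataset gives a full shuffle. With 40 items and batch_size=16, the last batch holds the remaining 8 items. This matches Dataset.batch, which keeps the partial batch by default.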
- create_datasets(images: ndarray, features: ndarray, labels: ndarray, test_size: float, buffer_size: int, batch_size: int, train_p: float, test_p: float, feature_scaler_path: str | None = None) → Tuple[DatasetV2, DatasetV2, dict[int, float] | None]
Create train and test datasets with feature scaling.
- Parameters:
images – Array of image paths
features – Array of numerical features
labels – Array of target labels
test_size – Fraction of data to use for testing
buffer_size – Buffer size for dataset shuffling
batch_size – Batch size for training
feature_scaler_path – Optional path to save feature scaler
train_p – Masking probability for training.
test_p – Masking probability for testing.
- Returns:
Tuple of (train_ds, test_ds), followed by a third element of type dict[int, float] | None, as declared in the return annotation
- create_kfold_datasets(images: ndarray, features: ndarray, labels: ndarray, buffer_size: int, batch_size: int, train_p: float, test_p: float, n_splits: int = 5) → List[Tuple[DatasetV2, DatasetV2]]
Create datasets based on stratified k-fold cross validation.
- Parameters:
images – Array of image paths
features – Array of numerical features
labels – Array of target labels
buffer_size – Buffer size for dataset shuffling
batch_size – Batch size for training
n_splits – Number of k-fold splits
train_p – Masking probability for training
test_p – Masking probability for testing
- Returns:
List of (train_ds, test_ds) tuples for each fold
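In stratified k-fold cross validation, each fold's test split keeps roughly the same class proportions as the full set of labels. The sketch below makes this concrete by dealing each class's samples round-robin across the folds. stratified_kfold_indices is a hypothetical helper and does not shuffle. The module itself may well use scikit-learn's StratifiedKFold instead.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_kfold_indices(
    labels: Sequence[Hashable], n_splits: int = 5
) -> List[Tuple[List[int], List[int]]]:
    """Return one (train_idx, test_idx) pair per fold, preserving class ratios."""
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    fold_of = [0] * len(labels)
    for indices in by_class.values():
        # Deal each class's samples round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_splits

    folds = []
    for k in range(n_splits):
        test_idx = [i for i, f in enumerate(fold_of) if f == k]
        train_idx = [i for i, f in enumerate(fold_of) if f != k]
        folds.append((train_idx, test_idx))
    return folds


labels = [0] * 10 + [1] * 5  # 2:1 class ratio
for train_idx, test_idx in stratified_kfold_indices(labels, n_splits=5):
    print([labels[i] for i in test_idx])  # every fold prints [0, 0, 1]
```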
- property df: DataFrame | None
The cleave metadata DataFrame read from the CSV file. It is loaded lazily, on first access, for memory efficiency.
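A common way to get the lazy behavior described for df is a property that reads the file on first access and then caches the result. The sketch below shows that pattern using the standard csv module instead of pandas. LazyTable, rows, and load_count are hypothetical names used only for illustration.

```python
import csv
import io
from typing import Callable, List, Optional, TextIO


class LazyTable:
    """Hypothetical class illustrating a lazily loaded, cached table property."""

    def __init__(self, open_source: Callable[[], TextIO]) -> None:
        self._open_source = open_source  # deferred: nothing is read yet
        self._rows: Optional[List[dict]] = None
        self.load_count = 0  # counts actual reads, for illustration

    @property
    def rows(self) -> List[dict]:
        # Parse the CSV only on first access, then reuse the cached result.
        if self._rows is None:
            with self._open_source() as handle:
                self._rows = list(csv.DictReader(handle))
            self.load_count += 1
        return self._rows


# Usage with an in-memory CSV; the column names are made up for the example.
table = LazyTable(lambda: io.StringIO("angle,tension\n0.5,190\n1.2,200\n"))
print(table.load_count)  # 0
print(len(table.rows), len(table.rows), table.load_count)  # 2 2 1
```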
- extract_data() → Tuple[ndarray, ndarray, ndarray]
Extract data from DataFrame into separate arrays for model training.
- Returns:
Tuple of (images, features, labels) arrays
- get_backbone_preprocessor(backbone: str)
Return the preprocessing function for the specified backbone model.
- Parameters:
backbone (str) – Name of the backbone to use. Must be one of “mobilenet”, “resnet”, or “efficientnet”.
- Returns:
The preprocess_input function tied to the chosen backbone.
- Return type:
Callable
- Raises:
ValueError – If backbone is not one of the supported options.
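The lookup-and-raise behavior documented above can be sketched as a dictionary dispatch. The preprocessing functions here are hypothetical stand-ins that operate on flat lists of pixel values. The real method presumably returns the preprocess_input functions from tf.keras.applications, and the page does not say whether it matches backbone names case-insensitively.

```python
from typing import Callable, Dict, List

Preprocessor = Callable[[List[float]], List[float]]

# Hypothetical stand-ins operating on flat lists of 0-255 pixel values.
_PREPROCESSORS: Dict[str, Preprocessor] = {
    # MobileNet-style: scale pixels to [-1, 1].
    "mobilenet": lambda px: [p / 127.5 - 1.0 for p in px],
    # Placeholder only: Keras' ResNet50 preprocess_input instead converts RGB
    # to BGR and subtracts per-channel ImageNet means.
    "resnet": lambda px: [p - 127.5 for p in px],
    # Keras EfficientNet rescales inside the model; its preprocess_input is a pass-through.
    "efficientnet": lambda px: list(px),
}


def get_backbone_preprocessor(backbone: str) -> Preprocessor:
    """Return the preprocessing callable for a backbone name (case-sensitive here)."""
    try:
        return _PREPROCESSORS[backbone]
    except KeyError:
        supported = ", ".join(sorted(_PREPROCESSORS))
        raise ValueError(
            f"Unsupported backbone {backbone!r}; expected one of: {supported}"
        ) from None


print(get_backbone_preprocessor("mobilenet")([0.0, 255.0]))  # [-1.0, 1.0]
```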
- image_only_dataset(original_dataset: DatasetV2) → DatasetV2
Convert dataset to image-only format (remove feature inputs).
- Parameters:
original_dataset – Original dataset with (image, features) inputs
- Returns:
Dataset with only image inputs
- Return type:
tf.data.Dataset
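Suppose each element of original_dataset has the shape ((image, features), label). Then the conversion amounts to dropping the features entry. The sketch below shows that restructuring on plain Python tuples. With tf.data, the equivalent would typically be original_dataset.map(lambda inputs, label: (inputs[0], label)). The image_only helper and the element layout are assumptions, not the module's code.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Image = TypeVar("Image")
Features = TypeVar("Features")
Label = TypeVar("Label")


def image_only(
    elements: Iterable[Tuple[Tuple[Image, Features], Label]],
) -> Iterator[Tuple[Image, Label]]:
    """Drop the numerical-feature input from each ((image, features), label) element."""
    for (image, _features), label in elements:
        yield image, label


samples = [(("img_0.png", [0.4, 1.2]), 1), (("img_1.png", [2.1, 0.9]), 0)]
print(list(image_only(samples)))  # [('img_0.png', 1), ('img_1.png', 0)]
```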
- class focal.data_processing.MLPDataCollector(csv_path: str, img_folder: str, angle_threshold: float, diameter_threshold: float, backbone: str | None = None)
Bases: DataCollector
Data collector specifically for MLP regression models.
This class handles data preparation for tension prediction models, including proper scaling of both features and labels.
- create_datasets(images: ndarray, features: ndarray, labels: ndarray, test_size: float, buffer_size: int, batch_size: int, feature_scaler_path: str | None = None, tension_scaler_path: str | None = None) → Tuple[DatasetV2, DatasetV2]
Create train and test datasets for MLP regression with proper scaling.
- Parameters:
images – Array of image paths
features – Array of numerical features
labels – Array of tension values
test_size – Fraction of data to use for testing
buffer_size – Buffer size for dataset shuffling
batch_size – Batch size for training
feature_scaler_path – Optional path to save feature scaler
tension_scaler_path – Optional path to save tension scaler
- Returns:
Tuple of (train_ds, test_ds)
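A min-max scaler maps values into [0, 1] using the minimum and maximum seen during fitting. Its inverse maps model outputs back to the original units. The sketch below is a tiny 1-D stand-in for scikit-learn's MinMaxScaler, which is the class named in the k-fold return type. MinMax1D and the tension values are hypothetical. Fitting on the training split only is also an assumption, based on the common practice of avoiding leakage of test statistics.

```python
from typing import List, Sequence


class MinMax1D:
    """Tiny 1-D stand-in for a min-max scaler with fit/transform/inverse_transform."""

    def fit(self, values: Sequence[float]) -> "MinMax1D":
        self.min_ = min(values)
        self.max_ = max(values)
        return self

    def _span(self) -> float:
        # Guard against division by zero when all training values are equal.
        return (self.max_ - self.min_) or 1.0

    def transform(self, values: Sequence[float]) -> List[float]:
        return [(v - self.min_) / self._span() for v in values]

    def inverse_transform(self, scaled: Sequence[float]) -> List[float]:
        return [s * self._span() + self.min_ for s in scaled]


# Hypothetical tension values; the scaler is fit on the training split only.
train_tension = [150.0, 200.0, 250.0]
test_tension = [175.0, 300.0]
scaler = MinMax1D().fit(train_tension)
scaled_test = scaler.transform(test_tension)
print(scaled_test)  # [0.25, 1.5]
print(scaler.inverse_transform(scaled_test))  # [175.0, 300.0]
```

The scaler only sees training values, so test values outside the training range land outside [0, 1], as 300.0 does here. inverse_transform is the step that would turn scaled tension predictions back into tension values.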
- create_kfold_datasets(images: ndarray, features: ndarray, labels: ndarray, buffer_size: int, batch_size: int, n_splits: int = 5) → Tuple[List[Tuple[DatasetV2, DatasetV2]], MinMaxScaler]
Create k-fold datasets for MLP regression with proper scaling.
- Parameters:
images – Array of image paths
features – Array of numerical features
labels – Array of tension values
buffer_size – Buffer size for dataset shuffling
batch_size – Batch size for training
n_splits – Number of k-fold splits
- Returns:
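Tuple of (datasets, label_scaler), where datasets is a list of (train_ds, test_ds) tuples, one per fold, and label_scaler is the MinMaxScaler used for the tension labels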