API Reference: SSL


class masterful.ssl.SemiSupervisedParams(algorithms=<factory>)

Parameters which control the semi-supervised learning aspects of Masterful training.

In this context, semi-supervised learning incorporates self-training, self-supervised learning, and traditional semi-supervised learning (any learning with a combination of labeled and unlabeled data).


Parameters

algorithms (Optional[Sequence[str]]) – An optional list of semi-supervised learning algorithms to use during training. Can be any combination of ["noisy_student", "barlow_twins"]. Defaults to ["noisy_student"].
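The `algorithms=<factory>` default in the signature suggests a dataclass field built with a default factory. The sketch below illustrates that pattern with a hypothetical stand-in class, `SemiSupervisedParamsSketch`. It is not the real `masterful.ssl.SemiSupervisedParams`, and the validation step is an assumption added for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence

# Algorithm names listed in the reference above.
SUPPORTED_ALGORITHMS = ("noisy_student", "barlow_twins")


@dataclass
class SemiSupervisedParamsSketch:
    """Hypothetical stand-in; NOT the real masterful.ssl.SemiSupervisedParams."""

    # A default_factory gives every instance its own fresh list, which is
    # what `algorithms=<factory>` in a rendered signature usually indicates.
    algorithms: Optional[Sequence[str]] = field(
        default_factory=lambda: ["noisy_student"])

    def __post_init__(self):
        # Assumption for illustration: reject names outside the documented set.
        unknown = [a for a in (self.algorithms or [])
                   if a not in SUPPORTED_ALGORITHMS]
        if unknown:
            raise ValueError(f"Unsupported algorithms: {unknown}")


default_params = SemiSupervisedParamsSketch()
both = SemiSupervisedParamsSketch(algorithms=["noisy_student", "barlow_twins"])
```

With the real class, the equivalent call would presumably be `masterful.ssl.SemiSupervisedParams(algorithms=[...])`, as the signature above shows.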




masterful.ssl.learn_ssl_params(training_dataset, training_dataset_params, unlabeled_datasets=None, synthetic_datasets=None)

Learns the optimal set of semi-supervised learning parameters to use during training.


Example

training_dataset: tf.data.Dataset = ...
training_dataset_params: masterful.data.params.DataParams = ...
ssl_params = masterful.ssl.learn_ssl_params(training_dataset,
                                            training_dataset_params)

Parameters

  • training_dataset (masterful.data.DatasetLike) – The labeled dataset to use during training.

  • training_dataset_params (masterful.data.DataParams) – The parameters of the labeled dataset.

  • unlabeled_datasets (Optional[Sequence[Tuple[masterful.data.DatasetLike, masterful.data.DataParams]]]) – Optional sequence of unlabeled datasets and their parameters to use during training. If an unlabeled dataset is specified, a set of algorithms must also be specified in ssl_params; otherwise the unlabeled data has no effect.

  • synthetic_datasets (Optional[Sequence[Tuple[masterful.data.DatasetLike, masterful.data.DataParams]]]) – Optional sequence of synthetic data and parameters to use during training. The amount of synthetic data used during training is controlled by masterful.regularization.RegularizationParams.synthetic_proportion.




masterful.ssl.learn_representation(model, model_params, optimization_params, ssl_params, training_dataset=None, training_dataset_params=None, validation_dataset=None, validation_dataset_params=None, unlabeled_datasets=None, synthetic_datasets=None, **kwargs)

Pretrain the weights of the given model using the provided datasets. The model is assumed to be the feature extractor (backbone) of a larger model, so there should be no classification heads (softmax output) in the model provided.


Example

model: tf.keras.Model = ...
model_params: masterful.architecture.params.ArchitectureParams = ...
optimization_params: masterful.optimization.params.OptimizationParams = ...
ssl_params: masterful.ssl.params.SemiSupervisedParams = ...
training_dataset: tf.data.Dataset = ...
training_dataset_params: masterful.data.params.DataParams = ...
validation_dataset: tf.data.Dataset = ...
validation_dataset_params: masterful.data.params.DataParams = ...
training_report = masterful.ssl.learn_representation(
    model,
    model_params,
    optimization_params,
    ssl_params,
    training_dataset,
    training_dataset_params,
    validation_dataset,
    validation_dataset_params)

Parameters

  • model (keras.engine.training.Model) – The model to pretrain. Models used here should have no classification head attached.

  • model_params (masterful.architecture.ArchitectureParams) – The parameters of the model to train.

  • training_dataset (Optional[tensorflow.python.data.ops.dataset_ops.DatasetV2]) – The labeled data to use for training the model. Labeled data must be unbatched, and each element must use the Keras formulation of (image, label).

  • training_dataset_params (Optional[masterful.data.DataParams]) – The parameters of the training dataset.

  • validation_dataset (Optional[tensorflow.python.data.ops.dataset_ops.DatasetV2]) – The labeled data to use for validating the model. Labeled data must be unbatched, and each element must use the Keras formulation of (image, label).

  • validation_dataset_params (Optional[masterful.data.DataParams]) – The parameters of the validation dataset.

  • unlabeled_datasets (Optional[Sequence[Tuple[tensorflow.python.data.ops.dataset_ops.DatasetV2, masterful.data.DataParams]]]) – A set of unlabeled datasets and their parameters, which can be used to improve the training of the model through semi-supervised and unsupervised techniques.

  • synthetic_datasets (Optional[Sequence[Tuple[tensorflow.python.data.ops.dataset_ops.DatasetV2, masterful.data.DataParams]]]) – A set of labeled, synthetic datasets and their parameters, which can be used to improve the performance of the model.

  • optimization_params (masterful.optimization.OptimizationParams) –

  • ssl_params (masterful.ssl.SemiSupervisedParams) –
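The labeled datasets above must be unbatched, meaning each element is a single (image, label) pair rather than a batch of them. The sketch below shows the idea on plain Python lists. The `unbatch` helper and the placeholder data are illustrative only and are not part of Masterful.

```python
from typing import Any, Iterable, Iterator, Sequence, Tuple


def unbatch(
    batches: Iterable[Tuple[Sequence[Any], Sequence[Any]]]
) -> Iterator[Tuple[Any, Any]]:
    """Flatten (images, labels) batches into single (image, label) pairs."""
    for images, labels in batches:
        # Pair each image with its label, one example at a time.
        for image, label in zip(images, labels):
            yield image, label


# Two batches: the first holds two examples, the second holds one.
batched = [(["img_a", "img_b"], [0, 1]), (["img_c"], [1])]
elements = list(unbatch(batched))
```

For a real `tf.data.Dataset` that has already been batched, calling `dataset.unbatch()` performs the same flattening.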


Returns

An instance of FitReport containing the results of pretraining the model. To measure how well pretraining worked, a small task-specific head is temporarily attached and trained at the end.

Return type

FitReport



masterful.ssl.analyze_data_then_save_to(model, labeled_training_data, unlabeled_training_data, path)

Analyze labeled and unlabeled data then save intermediate results to disk.

Please see the Simple Semi-Supervised Learning Recipe for more details.


Example

model: tf.keras.Model = ...
training_dataset: tf.data.Dataset = ...
unlabeled_dataset: tf.data.Dataset = ...
masterful.ssl.analyze_data_then_save_to(
    model, training_dataset, unlabeled_dataset, path='/tmp/ssl')

Parameters

  • model (keras.engine.training.Model) –

    A trained model. The output must be probabilities, in other words, your model’s final layer should be a softmax or sigmoid activation.

    If your model finishes with a tf.keras.layers.Dense layer, without an activation, then it’s said to be ‘outputting logits’. In that case, typically you’ll use a loss function initialized with from_logits=True. If this describes your model, you can simply attach an extra sigmoid or softmax activation to your model and pass the new model into this function. You do not need to change your original model, loss function, or training loop. For example, the model below outputs logits:

    m = tf.keras.Sequential([tf.keras.Input((32,32,3)),

    To use this model, attach a softmax activation:

    activated_model = tf.keras.Sequential([m, tf.keras.layers.Softmax()])
    masterful.ssl.analyze_data_then_save_to(activated_model, ...)

  • architecture_params – Parameters about the model architecture.

  • labeled_training_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – Labeled training data as a tf.data.Dataset. The data should be batched. Each example should have the following structure: (original_images, original_labels).

  • labeled_training_data_params – Params that describe the labeled training data.

  • unlabeled_training_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – Unlabeled training data as a tf.data.Dataset. The data should be batched. Each example should be a tensor of images.

  • unlabeled_training_data_params – Params that describe the unlabeled training data.

  • path (str) – The filepath to save to.


Raises

  • ValueError – If the path is empty or malformed.
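The model parameter above requires probabilities rather than logits. As a minimal illustration of what the extra softmax activation does, the pure-Python sketch below converts a vector of logits into probabilities. The function and sample values are illustrative and are not part of Masterful or Keras.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    # Subtracting the max improves numerical stability without
    # changing the result.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]         # unbounded scores, e.g. from a Dense layer
probabilities = softmax(logits)  # each value lies in (0, 1)
```

The ranking of the classes is unchanged by the activation, which is why it can be attached to an already trained model without retraining.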




masterful.ssl.load_from(path, unlabeled_weight=1.0)

Load data from disk into a tf.data.Dataset.

Please see the Simple Semi-Supervised Learning Recipe for more details.


Example

ssl_training_dataset = masterful.ssl.load_from(path='/tmp/ssl')

Parameters

  • path (str) – The location on disk to load from.

  • unlabeled_weight (Optional[float]) – A weighting for the unlabeled data.


Returns

A dataset ready to be trained against. The dataset is unbatched. If unlabeled_weight is specified and not set to 1.0, the dataset elements are (image, label, weight), where weight is 1.0 for labeled data and unlabeled_weight for unlabeled data. Otherwise, the dataset elements are (image, label).

Return type

tf.data.Dataset

Raises


  • ValueError – If the path is empty or malformed.

  • FileNotFound – If the path does not point to a valid file on disk.
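The element layout returned by load_from depends on unlabeled_weight. The sketch below mimics that rule on plain Python lists so the two layouts can be compared side by side. `merge_examples` is a hypothetical helper, not a Masterful function, and it assumes the unlabeled examples already carry a label assigned during the analysis step.

```python
from typing import Any, Iterable, Iterator, Optional, Tuple


def merge_examples(
    labeled: Iterable[Tuple[Any, Any]],
    unlabeled: Iterable[Tuple[Any, Any]],
    unlabeled_weight: Optional[float] = 1.0,
) -> Iterator[tuple]:
    """Hypothetical helper that mimics the element layout described above."""
    # Weights are emitted only when a weight other than 1.0 is given.
    use_weights = unlabeled_weight is not None and unlabeled_weight != 1.0
    sources = ((1.0, labeled), (unlabeled_weight, unlabeled))
    for source_weight, examples in sources:
        for image, label in examples:
            if use_weights:
                yield image, label, source_weight
            else:
                yield image, label


labeled = [("img_a", 0), ("img_b", 1)]
unlabeled = [("img_c", 1)]  # label assumed to come from the analysis step

plain = list(merge_examples(labeled, unlabeled))
weighted = list(merge_examples(labeled, unlabeled, unlabeled_weight=0.5))
```

The three-element layout matches the (inputs, targets, sample_weights) tuples that Keras `Model.fit` accepts from a dataset, so the weighted dataset can be batched and passed to `fit` directly.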