Unlabeled Data with Masterful (Part 1)

Author: sam
Date created: 2022/03/29
Last modified: 2022/03/29
Description: Part 1 of using unlabeled data with Masterful.


Introduction

In this guide, you will learn how to use unlabeled data with the Masterful API. Semi-supervised learning with unlabeled data is an excellent way to improve your model without the extra cost, difficulty, and hassle of labeling more data.

Masterful supports two different forms of semi-supervised learning: self-supervision to learn an improved representation of your data, and self training to boost the performance of your model by taking advantage of unlabeled data during model training. This guide will walk you through the second form of semi-supervised learning (self training) inside of Masterful, and demonstrate the performance improvements possible using unlabeled data in conjunction with your labeled data.

For Part 1 of this guide, you will simulate a small labeled dataset of only 50 labeled examples per class. To do this, you will use a small subset of the CIFAR-10 training set (1%, or 500 images) as the labeled examples, and draw the “unlabeled” examples from the rest of the dataset.

Prerequisites

Please follow the Masterful installation instructions here in order to run this Quickstart.

Imports

First, import the necessary libraries and register the Masterful package.

[1]:
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa

import masterful

masterful = masterful.register()
MASTERFUL: Your account has been successfully registered. Masterful v0.4.1.dev202204051649129729 is loaded.

Prepare the Data

For this guide, you will use only 1% of the CIFAR-10 data as your labeled dataset, in order to simulate a small amount of labeled training data. You will then use 10x that amount of unlabeled data (drawn from the remaining CIFAR-10 data) to boost the performance of your model at training time. Why 10x the amount of unlabeled data? In practice, we have found diminishing returns from larger amounts of unlabeled data, and an ideal range is generally between 2x and 10x the size of your labeled data.

[2]:
NUM_CLASSES = 10
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Normalize into the [0,1] range for numerical stability.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Masterful does not recommend sparse labels so convert to categorical.
y_train = tf.keras.utils.to_categorical(y_train, NUM_CLASSES)
y_test = tf.keras.utils.to_categorical(y_test, NUM_CLASSES)

# Shuffle the data, and take 1% for the labeled data set,
# and 10x that amount for the unlabeled dataset.
training_percentage = 0.01
unlabeled_multiplier = 10
dataset_size = len(x_train)
indices = np.array(range(dataset_size))
generator = np.random.default_rng(seed=42)
generator.shuffle(indices)
cut = int(training_percentage * dataset_size)
train_indices = indices[:cut]
unlabeled_indices = indices[
    cut : cut + int(dataset_size * training_percentage * unlabeled_multiplier)
]

# Create the datasets from the splits
training_dataset = tf.data.Dataset.from_tensor_slices(
    (x_train[train_indices], y_train[train_indices])
)
unlabeled_dataset = tf.data.Dataset.from_tensor_slices((x_train[unlabeled_indices],))

# Split the test dataset into a test and validation dataset.
# The validation dataset is used for measuring training performance.
indices = np.array(range(len(x_test)))
generator.shuffle(indices)
test_indices = indices[:5000]
validation_indices = indices[5000:]
test_dataset = tf.data.Dataset.from_tensor_slices(
    (x_test[test_indices], y_test[test_indices])
)
validation_dataset = tf.data.Dataset.from_tensor_slices(
    (x_test[validation_indices], y_test[validation_indices])
)
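
As a quick sanity check on the split above, the following sketch recomputes the split sizes using only the Python standard library. It assumes the standard 50,000-image CIFAR-10 training set and uses `random.Random` in place of NumPy's generator, so the specific indices differ from the cell above; only the sizes and the disjointness of the two splits carry over.

```python
import random

DATASET_SIZE = 50_000  # assumption: size of the CIFAR-10 training split
TRAINING_PERCENTAGE = 0.01
UNLABELED_MULTIPLIER = 10

# Shuffle the indices once, then carve out two non-overlapping slices.
indices = list(range(DATASET_SIZE))
random.Random(42).shuffle(indices)

cut = int(TRAINING_PERCENTAGE * DATASET_SIZE)
unlabeled_count = int(DATASET_SIZE * TRAINING_PERCENTAGE * UNLABELED_MULTIPLIER)
train_indices = indices[:cut]
unlabeled_indices = indices[cut : cut + unlabeled_count]

print(len(train_indices))  # 500 labeled examples (50 per class on average)
print(len(unlabeled_indices))  # 5000 unlabeled examples
print(set(train_indices).isdisjoint(unlabeled_indices))  # True
```

Because both slices come from a single shuffled index list, no image can appear in both the labeled and the unlabeled set.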

Create the Model

For this example, you will use a ResNet-18v2 model from Identity Mappings in Deep Residual Networks. ResNets are a standard architecture and, with a good training methodology, can match most state-of-the-art results. In general, a ResNet-18 is far too large for only 500 labeled examples, and for this guide you could use a much smaller model that would train a lot faster and still achieve the same results. However, in Part 2 of this guide, you will learn how to take advantage of even more unlabeled data using self-supervision inside of Masterful. In order to realize those gains, you need a model with the capacity to handle the size of your unlabeled dataset, not just your labeled data. You will use the model trained here in Part 2 to demonstrate and compare against those gains.

The main difference between the model defined below and the ResNet-18 definition in the paper is the first convolutional layer, which has been reduced from a 7x7 convolution to a 3x3 convolution (applied with a stride of 1 in the code below) to better handle the small input size of CIFAR-10.

[3]:
from tensorflow.keras.layers import (
    Input,
    Add,
    Conv2D,
    GlobalAveragePooling2D,
    MaxPooling2D,
    ReLU,
    ZeroPadding2D,
    BatchNormalization,
    Dense,
)


def identity_block(x, name, stage, unit, n_filters):
    shortcut = x

    x = BatchNormalization(name=name.format(stage, unit, "bn", 1))(x)
    x = ReLU(name=name.format(stage, unit, "relu", 1))(x)
    x = Conv2D(
        n_filters,
        (3, 3),
        strides=(1, 1),
        padding="same",
        kernel_initializer="he_uniform",
        name=name.format(stage, unit, "conv", 1),
    )(x)

    x = BatchNormalization(name=name.format(stage, unit, "bn", 2))(x)
    x = ReLU(name=name.format(stage, unit, "relu", 2))(x)
    x = Conv2D(
        n_filters,
        (3, 3),
        strides=(1, 1),
        padding="same",
        kernel_initializer="he_uniform",
        name=name.format(stage, unit, "conv", 2),
    )(x)

    x = Add(name=name.format(stage, unit, "add", 1))([shortcut, x])
    return x


def projection_block(x, name, stage, unit, strides, n_filters):
    x = BatchNormalization(name=name.format(stage, unit, "bn", 1))(x)
    x = ReLU(name=name.format(stage, unit, "relu", 1))(x)
    shortcut = Conv2D(
        n_filters,
        (1, 1),
        strides=strides,
        kernel_initializer="he_uniform",
        name=name.format(stage, unit, "sc", 1),
    )(x)

    x = Conv2D(
        n_filters,
        (3, 3),
        strides=strides,
        padding="same",
        kernel_initializer="he_uniform",
        name=name.format(stage, unit, "conv", 1),
    )(x)
    x = BatchNormalization(name=name.format(stage, unit, "bn", 2))(x)
    x = ReLU(name=name.format(stage, unit, "relu", 2))(x)
    x = Conv2D(
        n_filters,
        (3, 3),
        strides=(1, 1),
        padding="same",
        kernel_initializer="he_uniform",
        name=name.format(stage, unit, "conv", 2),
    )(x)

    x = Add(name=name.format(stage, unit, "add", 1))([x, shortcut])
    return x


def group(x, name, stage, strides, n_blocks, n_filters):
    x = projection_block(
        x, name=name, stage=stage, unit=1, strides=strides, n_filters=n_filters
    )
    for unit in range(n_blocks - 1):
        x = identity_block(
            x, name=name, stage=stage, unit=unit + 2, n_filters=n_filters
        )
    return x


def resnet18(input_shape, num_classes):
    inputs = Input(input_shape)
    x = ZeroPadding2D(padding=(3, 3))(inputs)

    x = Conv2D(
        64, (3, 3), strides=(1, 1), padding="valid", kernel_initializer="he_uniform"
    )(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    x = ZeroPadding2D(padding=(1, 1))(x)
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)

    x = group(
        x, strides=(1, 1), name="stage{}_unit{}_{}{}", stage=1, n_blocks=2, n_filters=64
    )
    x = group(
        x,
        strides=(2, 2),
        name="stage{}_unit{}_{}{}",
        stage=2,
        n_blocks=2,
        n_filters=128,
    )
    x = group(
        x,
        strides=(2, 2),
        name="stage{}_unit{}_{}{}",
        stage=3,
        n_blocks=2,
        n_filters=256,
    )
    x = group(
        x,
        strides=(2, 2),
        name="stage{}_unit{}_{}{}",
        stage=4,
        n_blocks=2,
        n_filters=512,
    )

    x = BatchNormalization()(x)
    x = ReLU()(x)

    x = GlobalAveragePooling2D()(x)
    x = Dense(num_classes, kernel_initializer="he_normal")(x)
    return tf.keras.Model(inputs=inputs, outputs=x)


INPUT_SHAPE = (32, 32, 3)
NUM_CLASSES = 10

model = resnet18(INPUT_SHAPE, NUM_CLASSES)
model.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 32, 32, 3)]  0
__________________________________________________________________________________________________
zero_padding2d (ZeroPadding2D)  (None, 38, 38, 3)    0           input_1[0][0]
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 36, 36, 64)   1792        zero_padding2d[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 36, 36, 64)   256         conv2d[0][0]
__________________________________________________________________________________________________
re_lu (ReLU)                    (None, 36, 36, 64)   0           batch_normalization[0][0]
__________________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D (None, 38, 38, 64)   0           re_lu[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D)    (None, 18, 18, 64)   0           zero_padding2d_1[0][0]
__________________________________________________________________________________________________
stage1_unit1_bn1 (BatchNormaliz (None, 18, 18, 64)   256         max_pooling2d[0][0]
__________________________________________________________________________________________________
stage1_unit1_relu1 (ReLU)       (None, 18, 18, 64)   0           stage1_unit1_bn1[0][0]
__________________________________________________________________________________________________
stage1_unit1_conv1 (Conv2D)     (None, 18, 18, 64)   36928       stage1_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage1_unit1_bn2 (BatchNormaliz (None, 18, 18, 64)   256         stage1_unit1_conv1[0][0]
__________________________________________________________________________________________________
stage1_unit1_relu2 (ReLU)       (None, 18, 18, 64)   0           stage1_unit1_bn2[0][0]
__________________________________________________________________________________________________
stage1_unit1_conv2 (Conv2D)     (None, 18, 18, 64)   36928       stage1_unit1_relu2[0][0]
__________________________________________________________________________________________________
stage1_unit1_sc1 (Conv2D)       (None, 18, 18, 64)   4160        stage1_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage1_unit1_add1 (Add)         (None, 18, 18, 64)   0           stage1_unit1_conv2[0][0]
                                                                 stage1_unit1_sc1[0][0]
__________________________________________________________________________________________________
stage1_unit2_bn1 (BatchNormaliz (None, 18, 18, 64)   256         stage1_unit1_add1[0][0]
__________________________________________________________________________________________________
stage1_unit2_relu1 (ReLU)       (None, 18, 18, 64)   0           stage1_unit2_bn1[0][0]
__________________________________________________________________________________________________
stage1_unit2_conv1 (Conv2D)     (None, 18, 18, 64)   36928       stage1_unit2_relu1[0][0]
__________________________________________________________________________________________________
stage1_unit2_bn2 (BatchNormaliz (None, 18, 18, 64)   256         stage1_unit2_conv1[0][0]
__________________________________________________________________________________________________
stage1_unit2_relu2 (ReLU)       (None, 18, 18, 64)   0           stage1_unit2_bn2[0][0]
__________________________________________________________________________________________________
stage1_unit2_conv2 (Conv2D)     (None, 18, 18, 64)   36928       stage1_unit2_relu2[0][0]
__________________________________________________________________________________________________
stage1_unit2_add1 (Add)         (None, 18, 18, 64)   0           stage1_unit1_add1[0][0]
                                                                 stage1_unit2_conv2[0][0]
__________________________________________________________________________________________________
stage2_unit1_bn1 (BatchNormaliz (None, 18, 18, 64)   256         stage1_unit2_add1[0][0]
__________________________________________________________________________________________________
stage2_unit1_relu1 (ReLU)       (None, 18, 18, 64)   0           stage2_unit1_bn1[0][0]
__________________________________________________________________________________________________
stage2_unit1_conv1 (Conv2D)     (None, 9, 9, 128)    73856       stage2_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage2_unit1_bn2 (BatchNormaliz (None, 9, 9, 128)    512         stage2_unit1_conv1[0][0]
__________________________________________________________________________________________________
stage2_unit1_relu2 (ReLU)       (None, 9, 9, 128)    0           stage2_unit1_bn2[0][0]
__________________________________________________________________________________________________
stage2_unit1_conv2 (Conv2D)     (None, 9, 9, 128)    147584      stage2_unit1_relu2[0][0]
__________________________________________________________________________________________________
stage2_unit1_sc1 (Conv2D)       (None, 9, 9, 128)    8320        stage2_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage2_unit1_add1 (Add)         (None, 9, 9, 128)    0           stage2_unit1_conv2[0][0]
                                                                 stage2_unit1_sc1[0][0]
__________________________________________________________________________________________________
stage2_unit2_bn1 (BatchNormaliz (None, 9, 9, 128)    512         stage2_unit1_add1[0][0]
__________________________________________________________________________________________________
stage2_unit2_relu1 (ReLU)       (None, 9, 9, 128)    0           stage2_unit2_bn1[0][0]
__________________________________________________________________________________________________
stage2_unit2_conv1 (Conv2D)     (None, 9, 9, 128)    147584      stage2_unit2_relu1[0][0]
__________________________________________________________________________________________________
stage2_unit2_bn2 (BatchNormaliz (None, 9, 9, 128)    512         stage2_unit2_conv1[0][0]
__________________________________________________________________________________________________
stage2_unit2_relu2 (ReLU)       (None, 9, 9, 128)    0           stage2_unit2_bn2[0][0]
__________________________________________________________________________________________________
stage2_unit2_conv2 (Conv2D)     (None, 9, 9, 128)    147584      stage2_unit2_relu2[0][0]
__________________________________________________________________________________________________
stage2_unit2_add1 (Add)         (None, 9, 9, 128)    0           stage2_unit1_add1[0][0]
                                                                 stage2_unit2_conv2[0][0]
__________________________________________________________________________________________________
stage3_unit1_bn1 (BatchNormaliz (None, 9, 9, 128)    512         stage2_unit2_add1[0][0]
__________________________________________________________________________________________________
stage3_unit1_relu1 (ReLU)       (None, 9, 9, 128)    0           stage3_unit1_bn1[0][0]
__________________________________________________________________________________________________
stage3_unit1_conv1 (Conv2D)     (None, 5, 5, 256)    295168      stage3_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage3_unit1_bn2 (BatchNormaliz (None, 5, 5, 256)    1024        stage3_unit1_conv1[0][0]
__________________________________________________________________________________________________
stage3_unit1_relu2 (ReLU)       (None, 5, 5, 256)    0           stage3_unit1_bn2[0][0]
__________________________________________________________________________________________________
stage3_unit1_conv2 (Conv2D)     (None, 5, 5, 256)    590080      stage3_unit1_relu2[0][0]
__________________________________________________________________________________________________
stage3_unit1_sc1 (Conv2D)       (None, 5, 5, 256)    33024       stage3_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage3_unit1_add1 (Add)         (None, 5, 5, 256)    0           stage3_unit1_conv2[0][0]
                                                                 stage3_unit1_sc1[0][0]
__________________________________________________________________________________________________
stage3_unit2_bn1 (BatchNormaliz (None, 5, 5, 256)    1024        stage3_unit1_add1[0][0]
__________________________________________________________________________________________________
stage3_unit2_relu1 (ReLU)       (None, 5, 5, 256)    0           stage3_unit2_bn1[0][0]
__________________________________________________________________________________________________
stage3_unit2_conv1 (Conv2D)     (None, 5, 5, 256)    590080      stage3_unit2_relu1[0][0]
__________________________________________________________________________________________________
stage3_unit2_bn2 (BatchNormaliz (None, 5, 5, 256)    1024        stage3_unit2_conv1[0][0]
__________________________________________________________________________________________________
stage3_unit2_relu2 (ReLU)       (None, 5, 5, 256)    0           stage3_unit2_bn2[0][0]
__________________________________________________________________________________________________
stage3_unit2_conv2 (Conv2D)     (None, 5, 5, 256)    590080      stage3_unit2_relu2[0][0]
__________________________________________________________________________________________________
stage3_unit2_add1 (Add)         (None, 5, 5, 256)    0           stage3_unit1_add1[0][0]
                                                                 stage3_unit2_conv2[0][0]
__________________________________________________________________________________________________
stage4_unit1_bn1 (BatchNormaliz (None, 5, 5, 256)    1024        stage3_unit2_add1[0][0]
__________________________________________________________________________________________________
stage4_unit1_relu1 (ReLU)       (None, 5, 5, 256)    0           stage4_unit1_bn1[0][0]
__________________________________________________________________________________________________
stage4_unit1_conv1 (Conv2D)     (None, 3, 3, 512)    1180160     stage4_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage4_unit1_bn2 (BatchNormaliz (None, 3, 3, 512)    2048        stage4_unit1_conv1[0][0]
__________________________________________________________________________________________________
stage4_unit1_relu2 (ReLU)       (None, 3, 3, 512)    0           stage4_unit1_bn2[0][0]
__________________________________________________________________________________________________
stage4_unit1_conv2 (Conv2D)     (None, 3, 3, 512)    2359808     stage4_unit1_relu2[0][0]
__________________________________________________________________________________________________
stage4_unit1_sc1 (Conv2D)       (None, 3, 3, 512)    131584      stage4_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage4_unit1_add1 (Add)         (None, 3, 3, 512)    0           stage4_unit1_conv2[0][0]
                                                                 stage4_unit1_sc1[0][0]
__________________________________________________________________________________________________
stage4_unit2_bn1 (BatchNormaliz (None, 3, 3, 512)    2048        stage4_unit1_add1[0][0]
__________________________________________________________________________________________________
stage4_unit2_relu1 (ReLU)       (None, 3, 3, 512)    0           stage4_unit2_bn1[0][0]
__________________________________________________________________________________________________
stage4_unit2_conv1 (Conv2D)     (None, 3, 3, 512)    2359808     stage4_unit2_relu1[0][0]
__________________________________________________________________________________________________
stage4_unit2_bn2 (BatchNormaliz (None, 3, 3, 512)    2048        stage4_unit2_conv1[0][0]
__________________________________________________________________________________________________
stage4_unit2_relu2 (ReLU)       (None, 3, 3, 512)    0           stage4_unit2_bn2[0][0]
__________________________________________________________________________________________________
stage4_unit2_conv2 (Conv2D)     (None, 3, 3, 512)    2359808     stage4_unit2_relu2[0][0]
__________________________________________________________________________________________________
stage4_unit2_add1 (Add)         (None, 3, 3, 512)    0           stage4_unit1_add1[0][0]
                                                                 stage4_unit2_conv2[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 3, 3, 512)    2048        stage4_unit2_add1[0][0]
__________________________________________________________________________________________________
re_lu_1 (ReLU)                  (None, 3, 3, 512)    0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
global_average_pooling2d (Globa (None, 512)          0           re_lu_1[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (None, 10)           5130        global_average_pooling2d[0][0]
==================================================================================================
Total params: 11,189,194
Trainable params: 11,181,258
Non-trainable params: 7,936
__________________________________________________________________________________________________
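
The output shapes and a few of the parameter counts in the summary above can be reproduced by hand. The sketch below is only an illustration of the standard convolution arithmetic; the helper names (`valid_out`, `same_out`, `conv_params`) are hypothetical and not part of Keras or Masterful.

```python
import math


def valid_out(size, kernel, stride):
    """Output size of an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1


def same_out(size, stride):
    """Output size of a 'same'-padded convolution."""
    return math.ceil(size / stride)


def conv_params(kernel, in_channels, out_channels):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels


size = 32
size += 2 * 3  # ZeroPadding2D(3, 3): 38
size = valid_out(size, 3, 1)  # 3x3 stem convolution, stride 1: 36
size += 2 * 1  # ZeroPadding2D(1, 1): 38
size = valid_out(size, 3, 2)  # 3x3 max pooling, stride 2: 18

# Each stage starts with a projection block using the given stride.
stage_sizes = []
for stride in (1, 2, 2, 2):
    size = same_out(size, stride)
    stage_sizes.append(size)

print(stage_sizes)  # [18, 9, 5, 3]
print(conv_params(3, 3, 64))  # 1792, matches conv2d
print(conv_params(3, 64, 64))  # 36928, matches stage1_unit1_conv1
print(conv_params(1, 64, 64))  # 4160, matches stage1_unit1_sc1
```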

Baseline Training

In order to measure the performance improvements from Masterful, you should first measure the performance of your model after training with a standard training loop and no unlabeled data. Below, you will set up a standard training loop with some basic data augmentation (color space augmentation, random resized crops, and horizontal mirroring).

The performance of this model should be very poor. There are only 50 labeled examples per class, so in general this model will perform barely above random guessing. The hyperparameter values below (learning rate, epochs, batch size, etc.) were all found using a manual search.

[4]:
def augment_image(image):
    """A simple augmentation pipeline."""
    image = tf.image.random_brightness(image, 0.1)
    image = tf.image.random_hue(image, 0.1)
    image = tf.image.random_crop(image, size=[28, 28, 3])
    image = tf.image.resize(image, size=[32, 32])
    image = tf.image.random_flip_left_right(image)
    return image


model.compile(
    optimizer=tfa.optimizers.LAMB(learning_rate=0.001),
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.CategoricalAccuracy()],
)

batch_size = 256
shuffle_buffer_size = 500
epochs = 30
model.fit(
    training_dataset.shuffle(shuffle_buffer_size)
    .map(lambda image, label: (augment_image(image), label))
    .batch(batch_size),
    validation_data=validation_dataset.batch(batch_size),
    epochs=epochs,
    verbose=0,
)
baseline_metrics = model.evaluate(test_dataset.batch(batch_size), return_dict=True)
print(f"Baseline model accuracy: {baseline_metrics['categorical_accuracy']}")
20/20 [==============================] - 0s 15ms/step - loss: 2.7604 - categorical_accuracy: 0.1452
Baseline model accuracy: 0.1451999992132187
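
To put the baseline numbers in context, the sketch below compares them with what a model that always predicts a uniform distribution over the 10 classes would score, assuming a class-balanced test set. The accuracy and loss values are copied from the output above; everything else is plain arithmetic.

```python
import math

NUM_CLASSES = 10

# A uniform predictor is right 1 time in NUM_CLASSES on balanced data,
# and its categorical cross-entropy is ln(NUM_CLASSES).
chance_accuracy = 1 / NUM_CLASSES
uniform_loss = math.log(NUM_CLASSES)

baseline_accuracy = 0.1452  # from the evaluation output above
baseline_loss = 2.7604  # from the evaluation output above

print(f"chance accuracy:   {chance_accuracy:.4f}")
print(f"baseline accuracy: {baseline_accuracy:.4f}")
print(f"uniform loss:      {uniform_loss:.4f}")
print(f"baseline loss:     {baseline_loss:.4f}")
```

The baseline accuracy is only a few points above chance, and its loss is higher than that of the uniform predictor, which is consistent with a large model overfitting 500 labeled examples.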

Setup Masterful

The Masterful AutoML platform learns how to train your model by focusing on five core organizational principles in deep learning: architecture, data, optimization, regularization, and semi-supervision.

Architecture is the structure of weights, biases, and activations that define a model. In this example, the architecture is defined by the model you created above.

Data is the input used to train the model. In this example, you are using a small labeled subset of CIFAR-10 together with unlabeled CIFAR-10 images. The Masterful AutoML platform can also take synthetic data into account, using a variety of different techniques.

Optimization means finding the best weights for a model and training data. Optimization is different from regularization because optimization does not consider generalization to unseen data. The central challenge of optimization is speed - find the best weights faster.

Regularization means helping a model generalize to data it has not yet seen. Another way of saying this is that regularization is about fighting overfitting.

Semi-Supervision is the process by which a model can be trained using both labeled and unlabeled data.

The first step when using Masterful is to learn the optimal set of parameters for each of the five buckets above. You start by learning the architecture and data parameters of the model and training dataset. In the code below, you are telling Masterful that your model is performing a classification task (masterful.enums.Task.CLASSIFICATION) with 10 labels (num_classes=NUM_CLASSES), and that the image features going into your model are in the range [0,1] (input_range=masterful.enums.ImageRange.ZERO_ONE), matching the normalization applied when the data was prepared. Also, the model outputs logits rather than a softmax classification (prediction_logits=True).

Furthermore, in the training dataset, you are providing dense labels (sparse_labels=False) rather than sparse labels.

For more details on architecture and data parameters, see the API specifications for ArchitectureParams and DataParams.
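
To make the two flags concrete, here is a minimal standard-library sketch of what "logits" and "dense labels" mean. It is not Masterful or Keras code; the helper names are hypothetical, and the three-class example values are made up for illustration.

```python
import math


def one_hot(label, num_classes):
    """Convert a sparse (integer) label into a dense (one-hot) label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def cross_entropy_from_logits(dense_label, logits):
    """Categorical cross-entropy for a dense label and raw logits."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(dense_label, probs))


sparse_label = 2  # sparse form: a class index
dense_label = one_hot(sparse_label, 3)  # dense form: [0.0, 0.0, 1.0]
logits = [0.5, -1.0, 2.0]  # raw model outputs, no softmax applied

probs = softmax(logits)
loss = cross_entropy_from_logits(dense_label, logits)
print(dense_label)
print([round(p, 4) for p in probs])
print(round(loss, 4))
```

A model that emits logits leaves the softmax to the loss function, which is why the baseline cell uses `CategoricalCrossentropy(from_logits=True)`.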

[5]:
# Start fresh with a new model
tf.keras.backend.clear_session()
model = resnet18(INPUT_SHAPE, NUM_CLASSES)
model_params = masterful.architecture.learn_architecture_params(
    model=model,
    task=masterful.enums.Task.CLASSIFICATION,
    input_range=masterful.enums.ImageRange.ZERO_ONE,
    num_classes=NUM_CLASSES,
    prediction_logits=True,
)
training_dataset_params = masterful.data.learn_data_params(
    dataset=training_dataset,
    task=masterful.enums.Task.CLASSIFICATION,
    image_range=masterful.enums.ImageRange.ZERO_ONE,
    num_classes=NUM_CLASSES,
    sparse_labels=False,
)
validation_dataset_params = masterful.data.learn_data_params(
    dataset=validation_dataset,
    task=masterful.enums.Task.CLASSIFICATION,
    image_range=masterful.enums.ImageRange.ZERO_ONE,
    num_classes=NUM_CLASSES,
    sparse_labels=False,
)
unlabeled_dataset_params = masterful.data.learn_data_params(
    dataset=unlabeled_dataset,
    task=masterful.enums.Task.CLASSIFICATION,
    image_range=masterful.enums.ImageRange.ZERO_ONE,
    num_classes=NUM_CLASSES,
    sparse_labels=None,
)

Next you learn the optimization parameters that will be used to train the model. Below, you use Masterful to learn the standard set of optimization parameters to train your model for a classification task.

For more details on the optimization parameters, please see the OptimizationParams API specification.

[6]:
optimization_params = masterful.optimization.learn_optimization_params(
    model,
    model_params,
    training_dataset,
    training_dataset_params,
)
MASTERFUL: Learning optimal batch size.
MASTERFUL: Learning optimal initial learning rate for batch size 32.

The regularization parameters used can have a dramatic impact on the final performance of your trained model. Learning these parameters can be a time-consuming and domain-specific challenge. Masterful can speed up this process by learning these parameters for you. In general, this can be an expensive operation: as a rough order of magnitude, learning these parameters takes about 2x the time it takes to train your model. However, this is still dramatically faster than finding these parameters manually. In the example below, you will use one of the many sets of pre-learned regularization parameters that are shipped in the Masterful API. In most instances, you should learn these parameters directly using the learn_regularization_params API.

For more details on the regularization parameters, please see the RegularizationParams API specification.

[7]:
# This is a set of parameters learned on CIFAR10
# for ResNet18 models.
regularization_params = masterful.regularization.parameters.CIFAR10_RESNET18

The final step before training is to learn the optimal set of semi-supervision parameters. In this example, Masterful will apply Noisy Student Training to improve your model during training with the provided unlabeled data.

For more details on the semi-supervision parameters, please see the SemiSupervisedParams API specification.

[8]:
ssl_params = masterful.ssl.learn_ssl_params(
    training_dataset,
    training_dataset_params,
    unlabeled_datasets=[(unlabeled_dataset, unlabeled_dataset_params)],
)
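
To give some intuition for self-training with a teacher and a student, here is a deliberately tiny sketch. It is not Masterful's implementation and it simplifies Noisy Student Training heavily: the "model" is a one-dimensional nearest-centroid classifier, the noise is Gaussian jitter on the student's inputs, and all names and data values are hypothetical.

```python
import random


def fit_centroids(points, labels):
    """Train a tiny 'model': the mean of the points in each class."""
    centroids = {}
    for cls in sorted(set(labels)):
        members = [p for p, l in zip(points, labels) if l == cls]
        centroids[cls] = sum(members) / len(members)
    return centroids


def predict(centroids, point):
    """Assign the class whose centroid is nearest to the point."""
    return min(centroids, key=lambda cls: abs(point - centroids[cls]))


rng = random.Random(0)

# A handful of labeled points and many more unlabeled ones.
labeled_x = [0.1, 0.4, 3.8, 4.3]
labeled_y = [0, 0, 1, 1]
unlabeled_x = [rng.gauss(0.0, 0.5) for _ in range(20)]
unlabeled_x += [rng.gauss(4.0, 0.5) for _ in range(20)]

# Step 1: train the teacher on labeled data only.
teacher = fit_centroids(labeled_x, labeled_y)

# Step 2: the teacher assigns pseudo-labels to the unlabeled data.
pseudo_y = [predict(teacher, x) for x in unlabeled_x]

# Step 3: train the student on labeled plus pseudo-labeled data,
# adding noise to the pseudo-labeled inputs the student sees.
noisy_unlabeled_x = [x + rng.gauss(0.0, 0.1) for x in unlabeled_x]
student = fit_centroids(labeled_x + noisy_unlabeled_x, labeled_y + pseudo_y)

print("teacher centroids:", teacher)
print("student centroids:", student)
```

The student sees ten times more examples than the teacher did, which is the same labeled-to-unlabeled ratio used in this guide.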

Training with Unlabeled Data

Now you are ready to train your model using the Masterful AutoML platform. In the next cell, you will see the call to masterful.training.train, which is the entry point to the meta-learning engine of the Masterful AutoML platform. Notice that there is no need to batch your data (Masterful will find the optimal batch size for you) and no need to shuffle your data (Masterful handles this for you). A validation dataset is optional as well (Masterful can find one for you), although this guide passes one explicitly. You hand Masterful a model and a dataset, and Masterful will figure the rest out for you.

[9]:
training_report = masterful.training.train(
    model,
    model_params,
    optimization_params,
    regularization_params,
    ssl_params,
    training_dataset,
    training_dataset_params,
    validation_dataset,
    validation_dataset_params,
    unlabeled_datasets=[(unlabeled_dataset, unlabeled_dataset_params)],
)
MASTERFUL: Training model with semi-supervised learning enabled.
MASTERFUL: Performing basic dataset analysis.
MASTERFUL: Training model with:
MASTERFUL:      500 labeled examples.
MASTERFUL:      5000 validation examples.
MASTERFUL:      0 synthetic examples.
MASTERFUL:      5000 unlabeled examples.
MASTERFUL: Training model with learned parameters wing-polarized-spectacles in two phases.
MASTERFUL: The first phase is supervised training with the learned parameters.
MASTERFUL: The second phase is semi-supervised training to boost performance.
MASTERFUL: Warming up model for supervised training.
MASTERFUL:      Warming up batch norm statistics (this could take a few minutes).
MASTERFUL:      Warming up training for 500 steps.
100%|██████████| 500/500 [02:02<00:00,  4.08steps/s]
MASTERFUL:      Validating batch norm statistics after warmup for stability (this could take a few minutes).
MASTERFUL: Starting Phase 1: Supervised training until the validation loss stabilizes...
Supervised Training: 100%|██████████| 4152/4152 [01:21<00:00, 50.79steps/s]
MASTERFUL: Starting Phase 2: Semi-supervised training until the validation loss stabilizes...
MASTERFUL: Warming up model for semi-supervised training.
MASTERFUL:      Warming up batch norm statistics (this could take a few minutes).
MASTERFUL:      Warming up training for 500 steps.
100%|██████████| 500/500 [01:17<00:00,  6.43steps/s]
MASTERFUL:      Validating batch norm statistics after warmup for stability (this could take a few minutes).
Semi-Supervised Training: 100%|██████████| 8554/8554 [11:02<00:00, 12.92steps/s]
MASTERFUL: Semi-Supervised training complete.
MASTERFUL: Training complete in 17.955665187040964 minutes.

The model you passed into masterful.training.train is now trained and updated in place, so you are able to evaluate it just like any other trained Keras model.

[10]:
masterful_metrics = model.evaluate(
    test_dataset.batch(optimization_params.batch_size), return_dict=True
)
print(f"Baseline model accuracy: {baseline_metrics['categorical_accuracy']}")
print(f"Masterful model accuracy: {masterful_metrics['categorical_accuracy']}")
157/157 [==============================] - 1s 5ms/step - loss: 1.8697 - categorical_accuracy: 0.3226
Baseline model accuracy: 0.1451999992132187
Masterful model accuracy: 0.32260000705718994

As you can see, you reduced the error rate of your model by 10-30% (results may vary depending on your run) simply by using unlabeled data with the Masterful AutoML platform. However, the final accuracy of this model (25-35%) is still not sufficient to deploy it to production. Read Part 2 of this guide to improve this model even more.
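
For the specific run shown above, the relative error-rate reduction can be worked out directly from the two reported accuracies. This is plain arithmetic on the printed values, and the numbers for your own run will differ.

```python
baseline_accuracy = 0.1452  # from the output above
masterful_accuracy = 0.3226  # from the output above

# Error rate is the fraction of test examples classified incorrectly.
baseline_error = 1.0 - baseline_accuracy
masterful_error = 1.0 - masterful_accuracy

relative_reduction = (baseline_error - masterful_error) / baseline_error
print(f"Baseline error rate:  {baseline_error:.4f}")
print(f"Masterful error rate: {masterful_error:.4f}")
print(f"Relative reduction:   {relative_reduction:.1%}")
```

For this run the reduction comes out at roughly 21%, which falls inside the 10-30% range quoted above.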

Next Steps

In Part 2 of this guide, you will look at improving these results even more with self-supervision. By the end of Part 2 you will have a production model from this very limited dataset.