Train with Unlabeled Data via SSL


Introduction

In this guide, you will learn how Masterful automatically uses unlabeled data to improve model accuracy through semi-supervised learning (SSL).

SSL is an excellent way to improve your model without the extra cost, difficulty, and hassle of labeling more data.

Masterful uses SSL through many of its APIs. This guide walks you through the use of SSL in the masterful.training.train function, which is the primary model training function in the API. The Masterful CLI Trainer also invokes this function, so it provides the same full support for using unlabeled data automatically during model training.

For this guide, you will simulate a small labeled dataset, on the order of only 500 labeled examples per class. To do this, you will use a small subset of the CIFAR-10 dataset (10%) as the labeled examples, and a portion of the remaining data as the “unlabeled” examples.

Prerequisites

Please follow the Masterful installation instructions here.

Imports

First, import the necessary libraries and activate the Masterful package.

[1]:
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa

import masterful

masterful = masterful.activate()
MASTERFUL: Your account has been successfully registered. Masterful v0.5.0 is loaded.

Prepare the Data

For this guide, you will use only 10% of the CIFAR-10 data as your labeled dataset, in order to simulate a small amount of labeled training data. You will then use 4x that amount of unlabeled data (from the remaining CIFAR-10 dataset) to boost the performance of your model at training time. Why 4x? In practice, we have found diminishing returns from larger amounts of unlabeled data, and the ideal range is generally between 2x and 10x the size of your labeled data.
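As a quick sanity check of the split arithmetic described above, the following NumPy-only sketch computes the labeled and unlabeled index sets without loading any images. It assumes the standard CIFAR-10 training set size of 50,000 images; the variable names mirror the ones used in this guide, and `rng` is just an illustrative name.

```python
import numpy as np

# Assumption: CIFAR-10 ships 50,000 training images across 10 classes.
dataset_size = 50_000
num_classes = 10
training_percentage = 0.1
unlabeled_multiplier = 4

# Shuffle the indices once, then carve out two disjoint slices.
indices = np.arange(dataset_size)
rng = np.random.default_rng(seed=42)
rng.shuffle(indices)

cut = int(training_percentage * dataset_size)
unlabeled_count = int(dataset_size * training_percentage * unlabeled_multiplier)
train_indices = indices[:cut]
unlabeled_indices = indices[cut : cut + unlabeled_count]

print(len(train_indices))                 # 5000 labeled examples
print(len(unlabeled_indices))             # 20000 unlabeled examples
print(len(train_indices) // num_classes)  # 500 per class on average

# The two slices never share an example.
overlap = np.intersect1d(train_indices, unlabeled_indices)
print(overlap.size)                       # 0
```

Because the shuffle is not stratified, each class contributes roughly, not exactly, 500 labeled examples.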

[2]:
NUM_CLASSES = 10
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Normalize into the [0,1] range for numerical stability.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Masterful does not recommend sparse labels so convert to categorical.
y_train = tf.keras.utils.to_categorical(y_train, NUM_CLASSES)
y_test = tf.keras.utils.to_categorical(y_test, NUM_CLASSES)

# Shuffle the data, and take 10% for the labeled dataset,
# and 4x that amount for the unlabeled dataset.
training_percentage = 0.1
unlabeled_multiplier = 4
dataset_size = len(x_train)
indices = np.array(range(dataset_size))
generator = np.random.default_rng(seed=42)
generator.shuffle(indices)
cut = int(training_percentage * dataset_size)
train_indices = indices[:cut]
unlabeled_indices = indices[
    cut : cut + int(dataset_size * training_percentage * unlabeled_multiplier)
]

# Create the datasets from the splits
training_dataset = tf.data.Dataset.from_tensor_slices(
    (x_train[train_indices], y_train[train_indices])
)
unlabeled_dataset = tf.data.Dataset.from_tensor_slices((x_train[unlabeled_indices],))

# Split the test dataset into a test and validation dataset.
# The validation dataset is used for measuring training performance.
indices = np.array(range(len(x_test)))
generator.shuffle(indices)
test_indices = indices[:5000]
validation_indices = indices[5000:]
test_dataset = tf.data.Dataset.from_tensor_slices(
    (x_test[test_indices], y_test[test_indices])
)
validation_dataset = tf.data.Dataset.from_tensor_slices(
    (x_test[validation_indices], y_test[validation_indices])
)

Create the Model

For this example, you will use a ResNet-18v2 model from Identity Mappings in Deep Residual Networks. ResNets are a standard architecture and, with a good training methodology, can match most state-of-the-art results. In general, a ResNet-18 would be far too large for only 500 labeled examples per class. For this guide, you could use a much smaller model that would train a lot faster and still achieve the same results.

The only difference between the model defined below and the ResNet-18 definition in the paper is that the first convolutional layer has been reduced from a 7x7 convolution to a 3x3 convolution, in order to better handle the small input size of CIFAR-10.

Note that in general, real applications will work with image sizes much larger than CIFAR-10’s 32x32, so you’ll want to use an existing model architecture with pretrained weights from the tf.keras.applications module.

[3]:
from tensorflow.keras.layers import (
    Input,
    Add,
    Conv2D,
    GlobalAveragePooling2D,
    MaxPooling2D,
    ReLU,
    ZeroPadding2D,
    BatchNormalization,
    Dense,
)


def identity_block(x, name, stage, unit, n_filters):
    shortcut = x

    x = BatchNormalization(name=name.format(stage, unit, "bn", 1))(x)
    x = ReLU(name=name.format(stage, unit, "relu", 1))(x)
    x = Conv2D(
        n_filters,
        (3, 3),
        strides=(1, 1),
        padding="same",
        kernel_initializer="he_uniform",
        name=name.format(stage, unit, "conv", 1),
    )(x)

    x = BatchNormalization(name=name.format(stage, unit, "bn", 2))(x)
    x = ReLU(name=name.format(stage, unit, "relu", 2))(x)
    x = Conv2D(
        n_filters,
        (3, 3),
        strides=(1, 1),
        padding="same",
        kernel_initializer="he_uniform",
        name=name.format(stage, unit, "conv", 2),
    )(x)

    x = Add(name=name.format(stage, unit, "add", 1))([shortcut, x])
    return x


def projection_block(x, name, stage, unit, strides, n_filters):
    x = BatchNormalization(name=name.format(stage, unit, "bn", 1))(x)
    x = ReLU(name=name.format(stage, unit, "relu", 1))(x)
    shortcut = Conv2D(
        n_filters,
        (1, 1),
        strides=strides,
        kernel_initializer="he_uniform",
        name=name.format(stage, unit, "sc", 1),
    )(x)

    x = Conv2D(
        n_filters,
        (3, 3),
        strides=strides,
        padding="same",
        kernel_initializer="he_uniform",
        name=name.format(stage, unit, "conv", 1),
    )(x)
    x = BatchNormalization(name=name.format(stage, unit, "bn", 2))(x)
    x = ReLU(name=name.format(stage, unit, "relu", 2))(x)
    x = Conv2D(
        n_filters,
        (3, 3),
        strides=(1, 1),
        padding="same",
        kernel_initializer="he_uniform",
        name=name.format(stage, unit, "conv", 2),
    )(x)

    x = Add(name=name.format(stage, unit, "add", 1))([x, shortcut])
    return x


def group(x, name, stage, strides, n_blocks, n_filters):
    x = projection_block(
        x, name=name, stage=stage, unit=1, strides=strides, n_filters=n_filters
    )
    for unit in range(n_blocks - 1):
        x = identity_block(
            x, name=name, stage=stage, unit=unit + 2, n_filters=n_filters
        )
    return x


def resnet18(input_shape, num_classes):
    inputs = Input(input_shape)
    x = ZeroPadding2D(padding=(3, 3))(inputs)

    x = Conv2D(
        64, (3, 3), strides=(1, 1), padding="valid", kernel_initializer="he_uniform"
    )(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    x = ZeroPadding2D(padding=(1, 1))(x)
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)

    x = group(
        x, strides=(1, 1), name="stage{}_unit{}_{}{}", stage=1, n_blocks=2, n_filters=64
    )
    x = group(
        x,
        strides=(2, 2),
        name="stage{}_unit{}_{}{}",
        stage=2,
        n_blocks=2,
        n_filters=128,
    )
    x = group(
        x,
        strides=(2, 2),
        name="stage{}_unit{}_{}{}",
        stage=3,
        n_blocks=2,
        n_filters=256,
    )
    x = group(
        x,
        strides=(2, 2),
        name="stage{}_unit{}_{}{}",
        stage=4,
        n_blocks=2,
        n_filters=512,
    )

    x = BatchNormalization()(x)
    x = ReLU()(x)

    x = GlobalAveragePooling2D()(x)
    x = Dense(num_classes, kernel_initializer="he_normal")(x)
    return tf.keras.Model(inputs=inputs, outputs=x)


INPUT_SHAPE = (32, 32, 3)
NUM_CLASSES = 10

model = resnet18(INPUT_SHAPE, NUM_CLASSES)
model.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 32, 32, 3)]  0
__________________________________________________________________________________________________
zero_padding2d (ZeroPadding2D)  (None, 38, 38, 3)    0           input_1[0][0]
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 36, 36, 64)   1792        zero_padding2d[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 36, 36, 64)   256         conv2d[0][0]
__________________________________________________________________________________________________
re_lu (ReLU)                    (None, 36, 36, 64)   0           batch_normalization[0][0]
__________________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D (None, 38, 38, 64)   0           re_lu[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D)    (None, 18, 18, 64)   0           zero_padding2d_1[0][0]
__________________________________________________________________________________________________
stage1_unit1_bn1 (BatchNormaliz (None, 18, 18, 64)   256         max_pooling2d[0][0]
__________________________________________________________________________________________________
stage1_unit1_relu1 (ReLU)       (None, 18, 18, 64)   0           stage1_unit1_bn1[0][0]
__________________________________________________________________________________________________
stage1_unit1_conv1 (Conv2D)     (None, 18, 18, 64)   36928       stage1_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage1_unit1_bn2 (BatchNormaliz (None, 18, 18, 64)   256         stage1_unit1_conv1[0][0]
__________________________________________________________________________________________________
stage1_unit1_relu2 (ReLU)       (None, 18, 18, 64)   0           stage1_unit1_bn2[0][0]
__________________________________________________________________________________________________
stage1_unit1_conv2 (Conv2D)     (None, 18, 18, 64)   36928       stage1_unit1_relu2[0][0]
__________________________________________________________________________________________________
stage1_unit1_sc1 (Conv2D)       (None, 18, 18, 64)   4160        stage1_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage1_unit1_add1 (Add)         (None, 18, 18, 64)   0           stage1_unit1_conv2[0][0]
                                                                 stage1_unit1_sc1[0][0]
__________________________________________________________________________________________________
stage1_unit2_bn1 (BatchNormaliz (None, 18, 18, 64)   256         stage1_unit1_add1[0][0]
__________________________________________________________________________________________________
stage1_unit2_relu1 (ReLU)       (None, 18, 18, 64)   0           stage1_unit2_bn1[0][0]
__________________________________________________________________________________________________
stage1_unit2_conv1 (Conv2D)     (None, 18, 18, 64)   36928       stage1_unit2_relu1[0][0]
__________________________________________________________________________________________________
stage1_unit2_bn2 (BatchNormaliz (None, 18, 18, 64)   256         stage1_unit2_conv1[0][0]
__________________________________________________________________________________________________
stage1_unit2_relu2 (ReLU)       (None, 18, 18, 64)   0           stage1_unit2_bn2[0][0]
__________________________________________________________________________________________________
stage1_unit2_conv2 (Conv2D)     (None, 18, 18, 64)   36928       stage1_unit2_relu2[0][0]
__________________________________________________________________________________________________
stage1_unit2_add1 (Add)         (None, 18, 18, 64)   0           stage1_unit1_add1[0][0]
                                                                 stage1_unit2_conv2[0][0]
__________________________________________________________________________________________________
stage2_unit1_bn1 (BatchNormaliz (None, 18, 18, 64)   256         stage1_unit2_add1[0][0]
__________________________________________________________________________________________________
stage2_unit1_relu1 (ReLU)       (None, 18, 18, 64)   0           stage2_unit1_bn1[0][0]
__________________________________________________________________________________________________
stage2_unit1_conv1 (Conv2D)     (None, 9, 9, 128)    73856       stage2_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage2_unit1_bn2 (BatchNormaliz (None, 9, 9, 128)    512         stage2_unit1_conv1[0][0]
__________________________________________________________________________________________________
stage2_unit1_relu2 (ReLU)       (None, 9, 9, 128)    0           stage2_unit1_bn2[0][0]
__________________________________________________________________________________________________
stage2_unit1_conv2 (Conv2D)     (None, 9, 9, 128)    147584      stage2_unit1_relu2[0][0]
__________________________________________________________________________________________________
stage2_unit1_sc1 (Conv2D)       (None, 9, 9, 128)    8320        stage2_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage2_unit1_add1 (Add)         (None, 9, 9, 128)    0           stage2_unit1_conv2[0][0]
                                                                 stage2_unit1_sc1[0][0]
__________________________________________________________________________________________________
stage2_unit2_bn1 (BatchNormaliz (None, 9, 9, 128)    512         stage2_unit1_add1[0][0]
__________________________________________________________________________________________________
stage2_unit2_relu1 (ReLU)       (None, 9, 9, 128)    0           stage2_unit2_bn1[0][0]
__________________________________________________________________________________________________
stage2_unit2_conv1 (Conv2D)     (None, 9, 9, 128)    147584      stage2_unit2_relu1[0][0]
__________________________________________________________________________________________________
stage2_unit2_bn2 (BatchNormaliz (None, 9, 9, 128)    512         stage2_unit2_conv1[0][0]
__________________________________________________________________________________________________
stage2_unit2_relu2 (ReLU)       (None, 9, 9, 128)    0           stage2_unit2_bn2[0][0]
__________________________________________________________________________________________________
stage2_unit2_conv2 (Conv2D)     (None, 9, 9, 128)    147584      stage2_unit2_relu2[0][0]
__________________________________________________________________________________________________
stage2_unit2_add1 (Add)         (None, 9, 9, 128)    0           stage2_unit1_add1[0][0]
                                                                 stage2_unit2_conv2[0][0]
__________________________________________________________________________________________________
stage3_unit1_bn1 (BatchNormaliz (None, 9, 9, 128)    512         stage2_unit2_add1[0][0]
__________________________________________________________________________________________________
stage3_unit1_relu1 (ReLU)       (None, 9, 9, 128)    0           stage3_unit1_bn1[0][0]
__________________________________________________________________________________________________
stage3_unit1_conv1 (Conv2D)     (None, 5, 5, 256)    295168      stage3_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage3_unit1_bn2 (BatchNormaliz (None, 5, 5, 256)    1024        stage3_unit1_conv1[0][0]
__________________________________________________________________________________________________
stage3_unit1_relu2 (ReLU)       (None, 5, 5, 256)    0           stage3_unit1_bn2[0][0]
__________________________________________________________________________________________________
stage3_unit1_conv2 (Conv2D)     (None, 5, 5, 256)    590080      stage3_unit1_relu2[0][0]
__________________________________________________________________________________________________
stage3_unit1_sc1 (Conv2D)       (None, 5, 5, 256)    33024       stage3_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage3_unit1_add1 (Add)         (None, 5, 5, 256)    0           stage3_unit1_conv2[0][0]
                                                                 stage3_unit1_sc1[0][0]
__________________________________________________________________________________________________
stage3_unit2_bn1 (BatchNormaliz (None, 5, 5, 256)    1024        stage3_unit1_add1[0][0]
__________________________________________________________________________________________________
stage3_unit2_relu1 (ReLU)       (None, 5, 5, 256)    0           stage3_unit2_bn1[0][0]
__________________________________________________________________________________________________
stage3_unit2_conv1 (Conv2D)     (None, 5, 5, 256)    590080      stage3_unit2_relu1[0][0]
__________________________________________________________________________________________________
stage3_unit2_bn2 (BatchNormaliz (None, 5, 5, 256)    1024        stage3_unit2_conv1[0][0]
__________________________________________________________________________________________________
stage3_unit2_relu2 (ReLU)       (None, 5, 5, 256)    0           stage3_unit2_bn2[0][0]
__________________________________________________________________________________________________
stage3_unit2_conv2 (Conv2D)     (None, 5, 5, 256)    590080      stage3_unit2_relu2[0][0]
__________________________________________________________________________________________________
stage3_unit2_add1 (Add)         (None, 5, 5, 256)    0           stage3_unit1_add1[0][0]
                                                                 stage3_unit2_conv2[0][0]
__________________________________________________________________________________________________
stage4_unit1_bn1 (BatchNormaliz (None, 5, 5, 256)    1024        stage3_unit2_add1[0][0]
__________________________________________________________________________________________________
stage4_unit1_relu1 (ReLU)       (None, 5, 5, 256)    0           stage4_unit1_bn1[0][0]
__________________________________________________________________________________________________
stage4_unit1_conv1 (Conv2D)     (None, 3, 3, 512)    1180160     stage4_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage4_unit1_bn2 (BatchNormaliz (None, 3, 3, 512)    2048        stage4_unit1_conv1[0][0]
__________________________________________________________________________________________________
stage4_unit1_relu2 (ReLU)       (None, 3, 3, 512)    0           stage4_unit1_bn2[0][0]
__________________________________________________________________________________________________
stage4_unit1_conv2 (Conv2D)     (None, 3, 3, 512)    2359808     stage4_unit1_relu2[0][0]
__________________________________________________________________________________________________
stage4_unit1_sc1 (Conv2D)       (None, 3, 3, 512)    131584      stage4_unit1_relu1[0][0]
__________________________________________________________________________________________________
stage4_unit1_add1 (Add)         (None, 3, 3, 512)    0           stage4_unit1_conv2[0][0]
                                                                 stage4_unit1_sc1[0][0]
__________________________________________________________________________________________________
stage4_unit2_bn1 (BatchNormaliz (None, 3, 3, 512)    2048        stage4_unit1_add1[0][0]
__________________________________________________________________________________________________
stage4_unit2_relu1 (ReLU)       (None, 3, 3, 512)    0           stage4_unit2_bn1[0][0]
__________________________________________________________________________________________________
stage4_unit2_conv1 (Conv2D)     (None, 3, 3, 512)    2359808     stage4_unit2_relu1[0][0]
__________________________________________________________________________________________________
stage4_unit2_bn2 (BatchNormaliz (None, 3, 3, 512)    2048        stage4_unit2_conv1[0][0]
__________________________________________________________________________________________________
stage4_unit2_relu2 (ReLU)       (None, 3, 3, 512)    0           stage4_unit2_bn2[0][0]
__________________________________________________________________________________________________
stage4_unit2_conv2 (Conv2D)     (None, 3, 3, 512)    2359808     stage4_unit2_relu2[0][0]
__________________________________________________________________________________________________
stage4_unit2_add1 (Add)         (None, 3, 3, 512)    0           stage4_unit1_add1[0][0]
                                                                 stage4_unit2_conv2[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 3, 3, 512)    2048        stage4_unit2_add1[0][0]
__________________________________________________________________________________________________
re_lu_1 (ReLU)                  (None, 3, 3, 512)    0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
global_average_pooling2d (Globa (None, 512)          0           re_lu_1[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (None, 10)           5130        global_average_pooling2d[0][0]
==================================================================================================
Total params: 11,189,194
Trainable params: 11,181,258
Non-trainable params: 7,936
__________________________________________________________________________________________________
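The spatial sizes and a few of the parameter counts in the summary above can be reproduced by hand. The sketch below uses only the standard output-size rules for "valid" and "same" padding; the helper names `conv_out` and `conv_params` are hypothetical and not part of Keras or Masterful.

```python
import math


def conv_out(size, kernel, stride, padding):
    """Output length along one spatial axis for 'valid' or 'same' padding."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1  # "valid"


def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a square 2D convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels


# Stem: pad by 3, 3x3 valid conv, pad by 1, 3x3 max pool with stride 2.
size = 32
size += 2 * 3                          # ZeroPadding2D((3, 3))  -> 38
size = conv_out(size, 3, 1, "valid")   # Conv2D 3x3             -> 36
size += 2 * 1                          # ZeroPadding2D((1, 1))  -> 38
size = conv_out(size, 3, 2, "valid")   # MaxPooling2D           -> 18
stem_size = size

# Each stage starts with a 'same'-padded conv using the stage stride.
stage_sizes = []
for stride in (1, 2, 2, 2):
    size = conv_out(size, 3, stride, "same")
    stage_sizes.append(size)

print(stem_size)                 # 18
print(stage_sizes)               # [18, 9, 5, 3]
print(conv_params(3, 3, 64))     # 1792  (first conv2d layer)
print(conv_params(3, 64, 64))    # 36928 (stage1_unit1_conv1)
print(conv_params(1, 64, 128))   # 8320  (stage2_unit1_sc1)
```

By the final stage the feature map is only 3x3, which is why the guide recommends larger inputs and pretrained architectures for real applications.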

Baseline Training

To measure the performance improvements from Masterful, you should first measure the performance of your model after training with a standard training loop, with no unlabeled data. Below, you will set up a standard training loop with some basic data augmentation (color space augmentation, random resized crops, and horizontal mirroring). The hyperparameter values below (learning rate, epochs, batch size, etc.) were all found using a manual search.
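To make the random resized crop and horizontal mirroring concrete, here is a NumPy-only sketch of those two steps. It is an illustration rather than the pipeline used in this guide: the function name `random_crop_resize_flip` is hypothetical, and it resizes with nearest-neighbor sampling for brevity, whereas tf.image.resize defaults to bilinear interpolation.

```python
import numpy as np


def random_crop_resize_flip(image, crop=28, rng=None):
    """Crop a random window, resize it back to the input size, maybe mirror it."""
    if rng is None:
        rng = np.random.default_rng()
    height, width, _ = image.shape

    # Pick the top-left corner of a crop x crop window inside the image.
    top = rng.integers(0, height - crop + 1)
    left = rng.integers(0, width - crop + 1)
    window = image[top : top + crop, left : left + crop]

    # Nearest-neighbor resize: map every output row/column to a source one.
    rows = np.arange(height) * crop // height
    cols = np.arange(width) * crop // width
    resized = window[rows][:, cols]

    # Mirror horizontally about half of the time.
    if rng.random() < 0.5:
        resized = resized[:, ::-1]
    return resized


image = np.random.default_rng(0).random((32, 32, 3)).astype("float32")
augmented = random_crop_resize_flip(image, rng=np.random.default_rng(1))
print(augmented.shape)  # (32, 32, 3)
```

The output keeps the original 32x32 shape, so augmented images can be batched together with unaugmented ones.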

[4]:

def augment_image(image):
    """A simple augmentation pipeline."""
    image = tf.image.random_brightness(image, 0.1)
    image = tf.image.random_hue(image, 0.1)
    image = tf.image.random_crop(image, size=[28, 28, 3])
    image = tf.image.resize(image, size=[32, 32])
    image = tf.image.random_flip_left_right(image)
    return image


model.compile(
    optimizer=tfa.optimizers.LAMB(learning_rate=0.001),
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.CategoricalAccuracy()],
)

batch_size = 256
shuffle_buffer_size = 500
epochs = 30
model.fit(
    training_dataset.shuffle(shuffle_buffer_size)
    .map(lambda image, label: (augment_image(image), label))
    .batch(batch_size),
    validation_data=validation_dataset.batch(batch_size),
    epochs=epochs,
    verbose=0,
)
baseline_metrics = model.evaluate(test_dataset.batch(batch_size), return_dict=True)
print(f"Baseline model accuracy: {baseline_metrics['categorical_accuracy']}")
20/20 [==============================] - 0s 10ms/step - loss: 1.6297 - categorical_accuracy: 0.5294
Baseline model accuracy: 0.5293999910354614

Setup Masterful

The Masterful AutoML platform learns how to train your model by focusing on five core organizational principles in deep learning: architecture, data, optimization, regularization, and semi-supervision.

Architecture is the structure of weights, biases, and activations that define a model. In this example, the architecture is defined by the model you created above.

Data is the input used to train the model. In this example, you are using a labeled training dataset, a subset of CIFAR-10. More advanced usages of the Masterful AutoML platform can take into account unlabeled and synthetic data as well, using a variety of different techniques.

Optimization means finding the best weights for a model and training data. Optimization is different from regularization because optimization does not consider generalization to unseen data. The central challenge of optimization is speed: finding the best weights faster.

Regularization means helping a model generalize to data it has not yet seen. Another way of saying this is that regularization is about fighting overfitting.

Semi-Supervision is the process by which a model can be trained using both labeled and unlabeled data.
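As a generic illustration of semi-supervision, the sketch below shows pseudo-labeling, one common SSL technique in which confident model predictions on unlabeled data are treated as labels. This is an assumption chosen for illustration: this guide does not state which SSL techniques Masterful uses internally, and the `pseudo_label` helper and its 0.9 threshold are hypothetical.

```python
import numpy as np


def pseudo_label(probabilities, threshold=0.9):
    """Return (kept row indices, one-hot labels) for confident predictions."""
    confidence = probabilities.max(axis=1)
    keep = np.flatnonzero(confidence >= threshold)
    classes = probabilities[keep].argmax(axis=1)
    one_hot = np.eye(probabilities.shape[1], dtype="float32")[classes]
    return keep, one_hot


# Made-up class probabilities for three unlabeled examples.
probs = np.array(
    [
        [0.95, 0.03, 0.02],  # confident: class 0
        [0.40, 0.35, 0.25],  # not confident: dropped
        [0.05, 0.92, 0.03],  # confident: class 1
    ]
)
keep, labels = pseudo_label(probs)
print(keep)    # [0 2]
print(labels)  # [[1. 0. 0.], [0. 1. 0.]]
```

The kept examples could then be mixed into the labeled training set, while low-confidence examples are left out.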

The first step when using Masterful is to learn the optimal set of parameters for each of the five buckets above. You start by learning the architecture and data parameters of the model and training dataset. In the code below, you tell Masterful that your model performs a classification task (masterful.enums.Task.CLASSIFICATION) with 10 labels (num_classes=NUM_CLASSES), and that the image features going into your model are in the range [0,1] (input_range=masterful.enums.ImageRange.ZERO_ONE), matching the normalization applied when preparing the data. Also, the model outputs logits rather than a softmax classification (prediction_logits=True).

Furthermore, in the training dataset, you are providing dense labels (sparse_labels=False) rather than sparse labels.

For more details on architecture and data parameters, see the API specifications for ArchitectureParams and DataParams.
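The two distinctions above, logits versus softmax outputs and dense versus sparse labels, can be shown in a few lines of NumPy. This is a minimal sketch with made-up values; the `softmax` helper is written out here for illustration and is not a Masterful API.

```python
import numpy as np


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# A model with prediction_logits=True emits unbounded scores like these.
logits = np.array([[2.0, 1.0, 0.1]])
probs = softmax(logits)
print(probs.sum())     # approximately 1.0
print(probs.argmax())  # 0, the same class the logits rank highest

# Sparse labels are class indices; dense labels are one-hot vectors.
sparse_labels = np.array([3, 0, 9])
dense_labels = np.eye(10, dtype="float32")[sparse_labels]
recovered = dense_labels.argmax(axis=1)
print(dense_labels.shape)  # (3, 10)
print(recovered)           # [3 0 9]
```

Softmax preserves the ranking of the logits, so the predicted class is the same either way; the flag only tells the training code which form the loss function should expect.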

[5]:
# Start fresh with a new model
tf.keras.backend.clear_session()
model = resnet18(INPUT_SHAPE, NUM_CLASSES)
model_params = masterful.architecture.learn_architecture_params(
    model=model,
    task=masterful.enums.Task.CLASSIFICATION,
    input_range=masterful.enums.ImageRange.ZERO_ONE,
    num_classes=NUM_CLASSES,
    prediction_logits=True,
)
training_dataset_params = masterful.data.learn_data_params(
    dataset=training_dataset,
    task=masterful.enums.Task.CLASSIFICATION,
    image_range=masterful.enums.ImageRange.ZERO_ONE,
    num_classes=NUM_CLASSES,
    sparse_labels=False,
)
validation_dataset_params = masterful.data.learn_data_params(
    dataset=validation_dataset,
    task=masterful.enums.Task.CLASSIFICATION,
    image_range=masterful.enums.ImageRange.ZERO_ONE,
    num_classes=NUM_CLASSES,
    sparse_labels=False,
)
unlabeled_dataset_params = masterful.data.learn_data_params(
    dataset=unlabeled_dataset,
    task=masterful.enums.Task.CLASSIFICATION,
    image_range=masterful.enums.ImageRange.ZERO_ONE,
    num_classes=NUM_CLASSES,
    sparse_labels=None,
)

Next you learn the optimization parameters that will be used to train the model. Below, you use Masterful to learn the standard set of optimization parameters to train your model for a classification task.

For more details on the optimization parameters, please see the OptimizationParams API specification.

[6]:
optimization_params = masterful.optimization.learn_optimization_params(
    model,
    model_params,
    training_dataset,
    training_dataset_params,
)
Callbacks: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:24<00:00,  3.05s/steps]

The regularization parameters can have a dramatic impact on the final performance of your trained model. Learning these parameters can be a time-consuming and domain-specific challenge, and Masterful can speed up this process by learning them for you. This can still be an expensive operation: as a rough order of magnitude, learning these parameters takes about 2x the time it takes to train your model. However, this is still dramatically faster than finding these parameters manually. In the example below, you will use one of the many sets of pre-learned regularization parameters that ship with the Masterful API. In most instances, you should learn these parameters directly using the learn_regularization_params API.

For more details on the regularization parameters, please see the RegularizationParams API specification.

[7]:
# This is a set of parameters learned on CIFAR10
# for ResNet18 models.
regularization_params = masterful.regularization.parameters.CIFAR10_RESNET18

The final step before training is to learn the optimal set of semi-supervision parameters. In this example, Masterful will apply Noisy Student Training to improve your model during training with the provided unlabeled data.

For more details on the semi-supervision parameters, please see the SemiSupervisedParams API specification.
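
To give a feel for the idea behind Noisy Student Training, the sketch below walks through its three core steps on toy data: train a teacher on the labeled set, let the teacher pseudo-label the unlabeled set, then train a student on both sets with noise applied to the student's inputs. This is not Masterful's implementation. A nearest-centroid classifier on 2-D points stands in for the neural network, and every name here (make_blobs, fit_centroids, predict) is a hypothetical helper written for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_blobs(n_per_class, centers):
    """Draw n_per_class 2-D points around each class center."""
    xs = [rng.normal(loc=c, scale=1.0, size=(n_per_class, 2)) for c in centers]
    ys = [np.full(n_per_class, i) for i in range(len(centers))]
    return np.concatenate(xs), np.concatenate(ys)


def fit_centroids(x, y, num_classes):
    """'Train' a nearest-centroid classifier: one mean vector per class."""
    return np.stack([x[y == k].mean(axis=0) for k in range(num_classes)])


def predict(centroids, x):
    """Assign each point to the class of its closest centroid."""
    distances = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
    return distances.argmin(axis=1)


centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 4.0]])
num_classes = len(centers)

labeled_x, labeled_y = make_blobs(5, centers)  # small labeled set
unlabeled_x, _ = make_blobs(200, centers)      # labels are discarded
test_x, test_y = make_blobs(200, centers)

# Step 1: train the teacher on the labeled data only.
teacher = fit_centroids(labeled_x, labeled_y, num_classes)

# Step 2: the teacher assigns pseudo-labels to the unlabeled data.
pseudo_y = predict(teacher, unlabeled_x)

# Step 3: train the student on labeled + pseudo-labeled data. Gaussian input
# noise is a simple stand-in for the augmentation used to "noise" the student.
noisy_unlabeled_x = unlabeled_x + rng.normal(scale=0.1, size=unlabeled_x.shape)
student_x = np.concatenate([labeled_x, noisy_unlabeled_x])
student_y = np.concatenate([labeled_y, pseudo_y])
student = fit_centroids(student_x, student_y, num_classes)

teacher_acc = (predict(teacher, test_x) == test_y).mean()
student_acc = (predict(student, test_x) == test_y).mean()
print(f"teacher accuracy: {teacher_acc:.3f}")
print(f"student accuracy: {student_acc:.3f}")
```

In the published method, the student can in turn become the teacher for another round. How much the student gains over the teacher depends on the data and the model, so treat any numbers from this toy example as illustrative only.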

[8]:
ssl_params = masterful.ssl.learn_ssl_params(
    training_dataset,
    training_dataset_params,
    unlabeled_datasets=[(unlabeled_dataset, unlabeled_dataset_params)],
)
2022-06-20 16:29:15.175503: W tensorflow/core/data/root_dataset.cc:167] Optimization loop failed: Cancelled: Operation was cancelled

Training with Unlabeled Data

Now, you are ready to train your model using Masterful. In the next cell, you will see the call to masterful.training.train, which is the entry point to the meta-learning engine of the Masterful AutoML platform.

[9]:
training_report = masterful.training.train(
    model,
    model_params,
    optimization_params,
    regularization_params,
    ssl_params,
    training_dataset,
    training_dataset_params,
    validation_dataset,
    validation_dataset_params,
    unlabeled_datasets=[(unlabeled_dataset, unlabeled_dataset_params)],
)
MASTERFUL [16:29:15]: Training model with semi-supervised learning enabled.
MASTERFUL [16:29:16]: Performing basic dataset analysis.
MASTERFUL [16:29:16]: Training model with:
MASTERFUL [16:29:16]:   5000 labeled examples.
MASTERFUL [16:29:16]:   5000 validation examples.
MASTERFUL [16:29:16]:   0 synthetic examples.
MASTERFUL [16:29:16]:   20000 unlabeled examples.
MASTERFUL [16:29:17]: Training model with learned parameters mollusk-coral-elephant in two phases.
MASTERFUL [16:29:17]: The first phase is supervised training with the learned parameters.
MASTERFUL [16:29:17]: The second phase is semi-supervised training to boost performance.
MASTERFUL [16:29:18]: Warming up model for supervised training.
MASTERFUL [16:29:21]:   Warming up batch norm statistics (this could take a few minutes).
MASTERFUL [16:29:52]:   Warming up training for 500 steps.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:47<00:00, 10.63steps/s]
MASTERFUL [16:30:39]:   Validating batch norm statistics after warmup for stability (this could take a few minutes).
MASTERFUL [16:30:53]: Starting Phase 1: Supervised training until the validation loss stabilizes...
Supervised Training: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1760/1760 [01:17<00:00, 22.67steps/s]
MASTERFUL [16:32:14]: Starting Phase 2: Semi-supervised training until the validation loss stabilizes...
MASTERFUL [16:32:14]: Warming up model for semi-supervised training.
MASTERFUL [16:32:15]:   Warming up batch norm statistics (this could take a few minutes).
MASTERFUL [16:32:45]:   Warming up training for 500 steps.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:25<00:00, 19.78steps/s]
MASTERFUL [16:33:11]:   Validating batch norm statistics after warmup for stability (this could take a few minutes).
Semi-Supervised Training: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5192/5192 [05:34<00:00, 15.50steps/s]
MASTERFUL [16:39:10]: Semi-Supervised training complete.
MASTERFUL [16:39:10]: Training complete in 9.887277317047118 minutes.

The model you passed into masterful.training.train is now trained and updated in place, so you are able to evaluate it just like any other trained Keras model.

[10]:
masterful_metrics = model.evaluate(
    test_dataset.batch(optimization_params.batch_size), return_dict=True
)
print(f"Baseline model accuracy: {baseline_metrics['categorical_accuracy']}")
print(f"Masterful model accuracy: {masterful_metrics['categorical_accuracy']}")
20/20 [==============================] - 0s 10ms/step - loss: 1.1407 - categorical_accuracy: 0.6382
Baseline model accuracy: 0.5293999910354614
Masterful model accuracy: 0.6381999850273132

As you can see, you boosted your accuracy from ~53% to ~64% (results may vary depending on your run) simply by using unlabeled data with Masterful.
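
To put that improvement in perspective, the short calculation below expresses it three ways: as an absolute gain in percentage points, as a relative gain over the baseline, and as a reduction in error rate. The two accuracy values are rounded from the output above. The variable names are introduced here for illustration, and your numbers will differ from run to run.

```python
baseline_accuracy = 0.5294   # rounded from the baseline output above
masterful_accuracy = 0.6382  # rounded from the Masterful output above

# Absolute gain: the difference between the two accuracies.
absolute_gain = masterful_accuracy - baseline_accuracy

# Relative gain: the improvement as a fraction of the baseline accuracy.
relative_gain = absolute_gain / baseline_accuracy

# Error-rate reduction: the fraction of baseline mistakes that were eliminated.
error_reduction = absolute_gain / (1.0 - baseline_accuracy)

print(f"absolute gain: {absolute_gain * 100:.1f} percentage points")
print(f"relative gain: {relative_gain * 100:.1f}%")
print(f"error-rate reduction: {error_reduction * 100:.1f}%")
```

For this run, that works out to roughly 10.9 percentage points of absolute gain, a relative gain of about 20.6%, and about 23.1% fewer errors on the test set.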