{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "# Unlabeled Data with Masterful (Part 1)\n", "\n", "**Author:** [sam](mailto:sam@masterfulai.com) \n", "**Date created:** 2022/03/29 \n", "**Last modified:** 2022/03/29 \n", "**Description:** Part 1 of using unlabeled data with Masterful." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)][1]        [![Download](images/download.png)][2][Download this Notebook][2]\n", "\n", "[1]:https://colab.research.google.com/github/masterfulai/masterful-docs/blob/main/notebooks/guide_ssl_using_unlabeled_data_part1.ipynb\n", "[2]:http://docs.masterfulai.com/0.4.1/notebooks/guide_ssl_using_unlabeled_data_part1.ipynb" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Introduction\n", "\n", "In this guide, you will learn how to use unlabeled data with the\n", "Masterful API. Semi-supervised learning with unlabeled data is an\n", "excellent way to improve your model without the extra cost, difficulty,\n", "and hassle of labeling more data.\n", "\n", "Masterful supports two different forms of semi-supervised learning:\n", "self-supervision to learn an improved representation of your data, and\n", "self training to boost the performance of your model by taking advantage\n", "of unlabeled data during model training. This guide will walk you through\n", "the second form of semi-supervised learning (self training) inside of Masterful,\n", "and demonstrate the performance improvements possible using unlabeled\n", "data in conjunction with your labeled data.\n", "\n", "For Part 1 of this guide, you will simulate a small labeled dataset, on\n", "the order of only 50 labeled examples per class. To do this, you will\n", "use a small subset of the CIFAR-10 dataset (1%) as the labeled examples,\n", "and the rest of the dataset as the \"unlabeled\" examples." 
] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Prerequisites\n", "\n", "Please follow the Masterful installation instructions [here](../tutorials/tutorial_installation.md)\n", "in order to run this guide." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Imports\n", "\n", "First, import the necessary libraries and register the Masterful package." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-05T15:16:37.088456Z", "iopub.status.busy": "2022-04-05T15:16:37.087661Z", "iopub.status.idle": "2022-04-05T15:16:47.404985Z", "shell.execute_reply": "2022-04-05T15:16:47.404090Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MASTERFUL: Your account has been successfully registered. Masterful v0.4.1.dev202204051649129729 is loaded.\n" ] } ], "source": [ "import numpy as np\n", "import tensorflow as tf\n", "import tensorflow_addons as tfa\n", "\n", "import masterful\n", "\n", "masterful = masterful.register()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Prepare the Data\n", "\n", "For this guide, you will use only 1% of the CIFAR-10 data as your labeled\n", "dataset, in order to simulate a small amount of labeled training\n", "data. You will then use 10x that amount of unlabeled data (from the remaining\n", "CIFAR-10 dataset) in order to boost the performance of your model\n", "at training time. Why should you use 10x the amount of unlabeled data?\n", "In practice, we have found diminishing returns from larger amounts of\n", "unlabeled data, and an ideal range is generally between 2x and 10x the size\n", "of your labeled data." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-05T15:16:47.413801Z", "iopub.status.busy": "2022-04-05T15:16:47.412955Z", "iopub.status.idle": "2022-04-05T15:16:52.460473Z", "shell.execute_reply": "2022-04-05T15:16:52.459712Z" } }, "outputs": [], "source": [ "NUM_CLASSES = 10\n", "(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()\n", "\n", "# Normalize into the [0,1] range for numerical stability.\n", "x_train = x_train.astype(\"float32\") / 255.0\n", "x_test = x_test.astype(\"float32\") / 255.0\n", "\n", "# Masterful does not recommend sparse labels so convert to categorical.\n", "y_train = tf.keras.utils.to_categorical(y_train, NUM_CLASSES)\n", "y_test = tf.keras.utils.to_categorical(y_test, NUM_CLASSES)\n", "\n", "# Shuffle the data, and take 1% for the labeled data set,\n", "# and 10x that amount for the unlabeled dataset.\n", "training_percentage = 0.01\n", "unlabeled_multiplier = 10\n", "dataset_size = len(x_train)\n", "indices = np.array(range(dataset_size))\n", "generator = np.random.default_rng(seed=42)\n", "generator.shuffle(indices)\n", "cut = int(training_percentage * dataset_size)\n", "train_indices = indices[:cut]\n", "unlabeled_indices = indices[\n", " cut : cut + int(dataset_size * training_percentage * unlabeled_multiplier)\n", "]\n", "\n", "# Create the datasets from the splits\n", "training_dataset = tf.data.Dataset.from_tensor_slices(\n", " (x_train[train_indices], y_train[train_indices])\n", ")\n", "unlabeled_dataset = tf.data.Dataset.from_tensor_slices((x_train[unlabeled_indices],))\n", "\n", "# Split the test dataset into a test and validation dataset.\n", "# The validation dataset is used for measuring training performance.\n", "indices = np.array(range(len(x_test)))\n", "generator.shuffle(indices)\n", "test_indices = indices[:5000]\n", "validation_indices = indices[5000:]\n", "test_dataset = tf.data.Dataset.from_tensor_slices(\n", 
" (x_test[test_indices], y_test[test_indices])\n", ")\n", "validation_dataset = tf.data.Dataset.from_tensor_slices(\n", " (x_test[validation_indices], y_test[validation_indices])\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Create the Model\n", "\n", "For this example, you will use a ResNet-18v2 model from\n", "[Identity Mappings in Deep Residual Networks](https://arxiv.org/abs/1603.05027).\n", "ResNets are a very standard architecture and, with a good training\n", "methodology, can match most state-of-the-art results. In general,\n", "a ResNet-18 would be far too large for only 500 labeled examples.\n", "For this guide alone, you could use a much smaller model that\n", "would train a lot faster and still achieve the same results. However,\n", "in Part 2 of this guide, you will learn how to take advantage of even\n", "more unlabeled data using self-supervision inside of Masterful. In order\n", "to realize those gains, you need a model with the capacity to handle\n", "the size of your unlabeled dataset, not just your labeled data. You\n", "will use the model trained here in Part 2 to demonstrate and compare\n", "against those gains.\n", "\n", "The only difference between the model defined below and the\n", "ResNet-18 definition in the paper is that the first convolutional layer\n", "has been reduced from a 7x7 convolution to a 3x3 convolution, in\n", "order to better handle the small input size of CIFAR-10." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-05T15:16:52.474215Z", "iopub.status.busy": "2022-04-05T15:16:52.468818Z", "iopub.status.idle": "2022-04-05T15:16:53.779752Z", "shell.execute_reply": "2022-04-05T15:16:53.778892Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2022-04-05 15:16:52.783 ip-172-31-37-63:2606 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None\n", "[2022-04-05 15:16:52.857 ip-172-31-37-63:2606 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.\n", "Model: \"model\"\n", "__________________________________________________________________________________________________\n", "Layer (type) Output Shape Param # Connected to \n", "==================================================================================================\n", "input_1 (InputLayer) [(None, 32, 32, 3)] 0 \n", "__________________________________________________________________________________________________\n", "zero_padding2d (ZeroPadding2D) (None, 38, 38, 3) 0 input_1[0][0] \n", "__________________________________________________________________________________________________\n", "conv2d (Conv2D) (None, 36, 36, 64) 1792 zero_padding2d[0][0] \n", "__________________________________________________________________________________________________\n", "batch_normalization (BatchNorma (None, 36, 36, 64) 256 conv2d[0][0] \n", "__________________________________________________________________________________________________\n", "re_lu (ReLU) (None, 36, 36, 64) 0 batch_normalization[0][0] \n", "__________________________________________________________________________________________________\n", "zero_padding2d_1 (ZeroPadding2D (None, 38, 38, 64) 0 re_lu[0][0] \n", "__________________________________________________________________________________________________\n", "max_pooling2d 
(MaxPooling2D) (None, 18, 18, 64) 0 zero_padding2d_1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_bn1 (BatchNormaliz (None, 18, 18, 64) 256 max_pooling2d[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_relu1 (ReLU) (None, 18, 18, 64) 0 stage1_unit1_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_conv1 (Conv2D) (None, 18, 18, 64) 36928 stage1_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_bn2 (BatchNormaliz (None, 18, 18, 64) 256 stage1_unit1_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_relu2 (ReLU) (None, 18, 18, 64) 0 stage1_unit1_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_conv2 (Conv2D) (None, 18, 18, 64) 36928 stage1_unit1_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_sc1 (Conv2D) (None, 18, 18, 64) 4160 stage1_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_add1 (Add) (None, 18, 18, 64) 0 stage1_unit1_conv2[0][0] \n", " stage1_unit1_sc1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_bn1 (BatchNormaliz (None, 18, 18, 64) 256 stage1_unit1_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_relu1 (ReLU) (None, 18, 18, 64) 0 stage1_unit2_bn1[0][0] \n", 
"__________________________________________________________________________________________________\n", "stage1_unit2_conv1 (Conv2D) (None, 18, 18, 64) 36928 stage1_unit2_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_bn2 (BatchNormaliz (None, 18, 18, 64) 256 stage1_unit2_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_relu2 (ReLU) (None, 18, 18, 64) 0 stage1_unit2_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_conv2 (Conv2D) (None, 18, 18, 64) 36928 stage1_unit2_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_add1 (Add) (None, 18, 18, 64) 0 stage1_unit1_add1[0][0] \n", " stage1_unit2_conv2[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_bn1 (BatchNormaliz (None, 18, 18, 64) 256 stage1_unit2_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_relu1 (ReLU) (None, 18, 18, 64) 0 stage2_unit1_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_conv1 (Conv2D) (None, 9, 9, 128) 73856 stage2_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_bn2 (BatchNormaliz (None, 9, 9, 128) 512 stage2_unit1_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_relu2 (ReLU) (None, 9, 9, 128) 0 stage2_unit1_bn2[0][0] \n", "__________________________________________________________________________________________________\n", 
"stage2_unit1_conv2 (Conv2D) (None, 9, 9, 128) 147584 stage2_unit1_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_sc1 (Conv2D) (None, 9, 9, 128) 8320 stage2_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_add1 (Add) (None, 9, 9, 128) 0 stage2_unit1_conv2[0][0] \n", " stage2_unit1_sc1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_bn1 (BatchNormaliz (None, 9, 9, 128) 512 stage2_unit1_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_relu1 (ReLU) (None, 9, 9, 128) 0 stage2_unit2_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_conv1 (Conv2D) (None, 9, 9, 128) 147584 stage2_unit2_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_bn2 (BatchNormaliz (None, 9, 9, 128) 512 stage2_unit2_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_relu2 (ReLU) (None, 9, 9, 128) 0 stage2_unit2_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_conv2 (Conv2D) (None, 9, 9, 128) 147584 stage2_unit2_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_add1 (Add) (None, 9, 9, 128) 0 stage2_unit1_add1[0][0] \n", " stage2_unit2_conv2[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_bn1 (BatchNormaliz (None, 9, 9, 128) 512 stage2_unit2_add1[0][0] \n", 
"__________________________________________________________________________________________________\n", "stage3_unit1_relu1 (ReLU) (None, 9, 9, 128) 0 stage3_unit1_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_conv1 (Conv2D) (None, 5, 5, 256) 295168 stage3_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_bn2 (BatchNormaliz (None, 5, 5, 256) 1024 stage3_unit1_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_relu2 (ReLU) (None, 5, 5, 256) 0 stage3_unit1_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_conv2 (Conv2D) (None, 5, 5, 256) 590080 stage3_unit1_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_sc1 (Conv2D) (None, 5, 5, 256) 33024 stage3_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_add1 (Add) (None, 5, 5, 256) 0 stage3_unit1_conv2[0][0] \n", " stage3_unit1_sc1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_bn1 (BatchNormaliz (None, 5, 5, 256) 1024 stage3_unit1_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_relu1 (ReLU) (None, 5, 5, 256) 0 stage3_unit2_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_conv1 (Conv2D) (None, 5, 5, 256) 590080 stage3_unit2_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_bn2 
(BatchNormaliz (None, 5, 5, 256) 1024 stage3_unit2_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_relu2 (ReLU) (None, 5, 5, 256) 0 stage3_unit2_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_conv2 (Conv2D) (None, 5, 5, 256) 590080 stage3_unit2_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_add1 (Add) (None, 5, 5, 256) 0 stage3_unit1_add1[0][0] \n", " stage3_unit2_conv2[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_bn1 (BatchNormaliz (None, 5, 5, 256) 1024 stage3_unit2_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_relu1 (ReLU) (None, 5, 5, 256) 0 stage4_unit1_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_conv1 (Conv2D) (None, 3, 3, 512) 1180160 stage4_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_bn2 (BatchNormaliz (None, 3, 3, 512) 2048 stage4_unit1_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_relu2 (ReLU) (None, 3, 3, 512) 0 stage4_unit1_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_conv2 (Conv2D) (None, 3, 3, 512) 2359808 stage4_unit1_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_sc1 (Conv2D) (None, 3, 3, 512) 131584 stage4_unit1_relu1[0][0] \n", 
"__________________________________________________________________________________________________\n", "stage4_unit1_add1 (Add) (None, 3, 3, 512) 0 stage4_unit1_conv2[0][0] \n", " stage4_unit1_sc1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_bn1 (BatchNormaliz (None, 3, 3, 512) 2048 stage4_unit1_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_relu1 (ReLU) (None, 3, 3, 512) 0 stage4_unit2_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_conv1 (Conv2D) (None, 3, 3, 512) 2359808 stage4_unit2_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_bn2 (BatchNormaliz (None, 3, 3, 512) 2048 stage4_unit2_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_relu2 (ReLU) (None, 3, 3, 512) 0 stage4_unit2_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_conv2 (Conv2D) (None, 3, 3, 512) 2359808 stage4_unit2_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_add1 (Add) (None, 3, 3, 512) 0 stage4_unit1_add1[0][0] \n", " stage4_unit2_conv2[0][0] \n", "__________________________________________________________________________________________________\n", "batch_normalization_1 (BatchNor (None, 3, 3, 512) 2048 stage4_unit2_add1[0][0] \n", "__________________________________________________________________________________________________\n", "re_lu_1 (ReLU) (None, 3, 3, 512) 0 batch_normalization_1[0][0] \n", "__________________________________________________________________________________________________\n", 
"global_average_pooling2d (Globa (None, 512) 0 re_lu_1[0][0] \n", "__________________________________________________________________________________________________\n", "dense (Dense) (None, 10) 5130 global_average_pooling2d[0][0] \n", "==================================================================================================\n", "Total params: 11,189,194\n", "Trainable params: 11,181,258\n", "Non-trainable params: 7,936\n", "__________________________________________________________________________________________________\n" ] } ], "source": [ "from tensorflow.keras.layers import (\n", " Input,\n", " Add,\n", " Conv2D,\n", " GlobalAveragePooling2D,\n", " MaxPooling2D,\n", " ReLU,\n", " ZeroPadding2D,\n", " BatchNormalization,\n", " Dense,\n", ")\n", "\n", "\n", "def identity_block(x, name, stage, unit, n_filters):\n", " shortcut = x\n", "\n", " x = BatchNormalization(name=name.format(stage, unit, \"bn\", 1))(x)\n", " x = ReLU(name=name.format(stage, unit, \"relu\", 1))(x)\n", " x = Conv2D(\n", " n_filters,\n", " (3, 3),\n", " strides=(1, 1),\n", " padding=\"same\",\n", " kernel_initializer=\"he_uniform\",\n", " name=name.format(stage, unit, \"conv\", 1),\n", " )(x)\n", "\n", " x = BatchNormalization(name=name.format(stage, unit, \"bn\", 2))(x)\n", " x = ReLU(name=name.format(stage, unit, \"relu\", 2))(x)\n", " x = Conv2D(\n", " n_filters,\n", " (3, 3),\n", " strides=(1, 1),\n", " padding=\"same\",\n", " kernel_initializer=\"he_uniform\",\n", " name=name.format(stage, unit, \"conv\", 2),\n", " )(x)\n", "\n", " x = Add(name=name.format(stage, unit, \"add\", 1))([shortcut, x])\n", " return x\n", "\n", "\n", "def projection_block(x, name, stage, unit, strides, n_filters):\n", " x = BatchNormalization(name=name.format(stage, unit, \"bn\", 1))(x)\n", " x = ReLU(name=name.format(stage, unit, \"relu\", 1))(x)\n", " shortcut = Conv2D(\n", " n_filters,\n", " (1, 1),\n", " strides=strides,\n", " kernel_initializer=\"he_uniform\",\n", " name=name.format(stage, unit, 
\"sc\", 1),\n", " )(x)\n", "\n", " x = Conv2D(\n", " n_filters,\n", " (3, 3),\n", " strides=strides,\n", " padding=\"same\",\n", " kernel_initializer=\"he_uniform\",\n", " name=name.format(stage, unit, \"conv\", 1),\n", " )(x)\n", " x = BatchNormalization(name=name.format(stage, unit, \"bn\", 2))(x)\n", " x = ReLU(name=name.format(stage, unit, \"relu\", 2))(x)\n", " x = Conv2D(\n", " n_filters,\n", " (3, 3),\n", " strides=(1, 1),\n", " padding=\"same\",\n", " kernel_initializer=\"he_uniform\",\n", " name=name.format(stage, unit, \"conv\", 2),\n", " )(x)\n", "\n", " x = Add(name=name.format(stage, unit, \"add\", 1))([x, shortcut])\n", " return x\n", "\n", "\n", "def group(x, name, stage, strides, n_blocks, n_filters):\n", " x = projection_block(\n", " x, name=name, stage=stage, unit=1, strides=strides, n_filters=n_filters\n", " )\n", " for unit in range(n_blocks - 1):\n", " x = identity_block(\n", " x, name=name, stage=stage, unit=unit + 2, n_filters=n_filters\n", " )\n", " return x\n", "\n", "\n", "def resnet18(input_shape, num_classes):\n", " inputs = Input(input_shape)\n", " x = ZeroPadding2D(padding=(3, 3))(inputs)\n", "\n", " x = Conv2D(\n", " 64, (3, 3), strides=(1, 1), padding=\"valid\", kernel_initializer=\"he_uniform\"\n", " )(x)\n", " x = BatchNormalization()(x)\n", " x = ReLU()(x)\n", " x = ZeroPadding2D(padding=(1, 1))(x)\n", " x = MaxPooling2D((3, 3), strides=(2, 2))(x)\n", "\n", " x = group(\n", " x, strides=(1, 1), name=\"stage{}_unit{}_{}{}\", stage=1, n_blocks=2, n_filters=64\n", " )\n", " x = group(\n", " x,\n", " strides=(2, 2),\n", " name=\"stage{}_unit{}_{}{}\",\n", " stage=2,\n", " n_blocks=2,\n", " n_filters=128,\n", " )\n", " x = group(\n", " x,\n", " strides=(2, 2),\n", " name=\"stage{}_unit{}_{}{}\",\n", " stage=3,\n", " n_blocks=2,\n", " n_filters=256,\n", " )\n", " x = group(\n", " x,\n", " strides=(2, 2),\n", " name=\"stage{}_unit{}_{}{}\",\n", " stage=4,\n", " n_blocks=2,\n", " n_filters=512,\n", " )\n", "\n", " x = 
BatchNormalization()(x)\n", " x = ReLU()(x)\n", "\n", " x = GlobalAveragePooling2D()(x)\n", " x = Dense(num_classes, kernel_initializer=\"he_normal\")(x)\n", " return tf.keras.Model(inputs=inputs, outputs=x)\n", "\n", "\n", "INPUT_SHAPE = (32, 32, 3)\n", "NUM_CLASSES = 10\n", "\n", "model = resnet18(INPUT_SHAPE, NUM_CLASSES)\n", "model.summary()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Baseline Training\n", "\n", "In order to measure the performance improvements from Masterful,\n", "you first need a baseline: the performance of your model after training\n", "with a standard training loop, with no unlabeled data. Below, you\n", "will set up a standard training loop with some basic data augmentation\n", "(color space augmentation, random resized crops, and horizontal\n", "mirroring).\n", "\n", "The performance of this model should be very poor. There are only\n", "50 labeled examples per class, so in general this model will perform\n", "barely above random guessing. The hyperparameter values below (learning\n", "rate, epochs, batch size, etc.) were all found using a manual search." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-05T15:16:53.793023Z", "iopub.status.busy": "2022-04-05T15:16:53.789151Z", "iopub.status.idle": "2022-04-05T15:17:35.200476Z", "shell.execute_reply": "2022-04-05T15:17:35.199808Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "20/20 [==============================] - 0s 15ms/step - loss: 2.7604 - categorical_accuracy: 0.1452\n", "Baseline model accuracy: 0.1451999992132187\n" ] } ], "source": [ "\n", "def augment_image(image):\n", " \"\"\"A simple augmentation pipeline.\"\"\"\n", " image = tf.image.random_brightness(image, 0.1)\n", " image = tf.image.random_hue(image, 0.1)\n", " image = tf.image.random_crop(image, size=[28, 28, 3])\n", " image = tf.image.resize(image, size=[32, 32])\n", " image = tf.image.random_flip_left_right(image)\n", " return image\n", "\n", "\n", "model.compile(\n", " optimizer=tfa.optimizers.LAMB(learning_rate=0.001),\n", " loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),\n", " metrics=[tf.keras.metrics.CategoricalAccuracy()],\n", ")\n", "\n", "batch_size = 256\n", "shuffle_buffer_size = 500\n", "epochs = 30\n", "model.fit(\n", " training_dataset.shuffle(shuffle_buffer_size)\n", " .map(lambda image, label: (augment_image(image), label))\n", " .batch(batch_size),\n", " validation_data=validation_dataset.batch(batch_size),\n", " epochs=epochs,\n", " verbose=0,\n", ")\n", "baseline_metrics = model.evaluate(test_dataset.batch(batch_size), return_dict=True)\n", "print(f\"Baseline model accuracy: {baseline_metrics['categorical_accuracy']}\")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Setup Masterful\n", "\n", "The Masterful AutoML platform learns how to train your model by\n", "focusing on five core organizational principles in deep\n", "learning: architecture, data, optimization, regularization,\n", "and semi-supervision.\n", "\n", 
"**Architecture** is the structure of weights, biases, and activations\n", "that define a model. In this example, the architecture is defined by the model you created above.\n", "\n", "**Data** is the input used to train the model. In this example, you\n", "are using a labeled training dataset - [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html).\n", "More advanced usages of the Masterful AutoML platform can take into account unlabeled and synthetic\n", "data as well, using a variety of different techniques.\n", "\n", "**Optimization** means finding the best weights for a model and\n", "training data. Optimization is different from regularization because\n", "optimization does not consider generalization to unseen data. The\n", "central challenge of optimization is speed - find the best weights\n", "faster.\n", "\n", "**Regularization** means helping a model generalize to data it has\n", "not yet seen. Another way of saying this is that regularization is\n", "about fighting overfitting.\n", "\n", "**Semi-Supervision** is the process by which a model can be trained\n", "using both labeled and unlabeled data.\n", "\n", "The first step when using Masterful is to learn the optimal set of\n", "parameters for each of the five buckets above. You start by learning\n", "the architecture and data parameters of the model and training dataset. In the code below, you are telling Masterful that your model is performing a classification task (`masterful.enums.Task.CLASSIFICATION`) with 10 labels (`num_classes=NUM_CLASSES`), and that the image features going into your model are in the range [0,1] (`input_range=masterful.enums.ImageRange.ZERO_ONE`). 
Also, the model outputs logits rather than a softmax classification (`prediction_logits=True`).\n", "\n", "Furthermore, in the training dataset, you are providing dense labels\n", "(`sparse_labels=False`) rather than sparse labels.\n", "\n", "For more details on architecture and data parameters, see the API\n", "specifications for [ArchitectureParams](../api/api_architecture.rst#masterful.architecture.ArchitectureParams) and\n", "[DataParams](../api/api_data.rst#masterful.data.DataParams)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-05T15:17:35.207609Z", "iopub.status.busy": "2022-04-05T15:17:35.206828Z", "iopub.status.idle": "2022-04-05T15:17:35.647059Z", "shell.execute_reply": "2022-04-05T15:17:35.647567Z" } }, "outputs": [], "source": [ "# Start fresh with a new model\n", "tf.keras.backend.clear_session()\n", "model = resnet18(INPUT_SHAPE, NUM_CLASSES)\n", "model_params = masterful.architecture.learn_architecture_params(\n", " model=model,\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " input_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " prediction_logits=True,\n", ")\n", "training_dataset_params = masterful.data.learn_data_params(\n", " dataset=training_dataset,\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " image_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " sparse_labels=False,\n", ")\n", "validation_dataset_params = masterful.data.learn_data_params(\n", " dataset=validation_dataset,\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " image_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " sparse_labels=False,\n", ")\n", "unlabeled_dataset_params = masterful.data.learn_data_params(\n", " dataset=unlabeled_dataset,\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " image_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " 
sparse_labels=None,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "Next you learn the optimization parameters that will be used to train\n", "the model. Below, you use Masterful to learn the standard set of\n", "optimization parameters to train your model for a classification task.\n", "\n", "For more details on the optimization parameters, please see the [OptimizationParams](../api/api_optimization.rst#masterful.optimization.OptimizationParams) API specification." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-05T15:17:35.651867Z", "iopub.status.busy": "2022-04-05T15:17:35.651116Z", "iopub.status.idle": "2022-04-05T15:18:02.403319Z", "shell.execute_reply": "2022-04-05T15:18:02.402583Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MASTERFUL: Learning optimal batch size.\n", "MASTERFUL: Learning optimal initial learning rate for batch size 32.\n" ] } ], "source": [ "optimization_params = masterful.optimization.learn_optimization_params(\n", " model,\n", " model_params,\n", " training_dataset,\n", " training_dataset_params,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "The regularization parameters used can have a dramatic impact on the\n", "final performance of your trained model. Learning these parameters can\n", "be a time-consuming and domain-specific challenge. Masterful can speed\n", "up this process by learning these parameters for you. In general, this\n", "can be an expensive operation. A rough order of magnitude for learning\n", "these parameters is 2x the time it takes to train your model. However,\n", "this is still dramatically faster than manually finding these\n", "parameters yourself. In the example below, you will use one of the\n", "many sets of pre-learned regularization parameters that are shipped\n", "in the Masterful API. 
In most instances, you should learn these\n", "parameters directly using the [learn_regularization_params](../api/api_regularization.rst#masterful.regularization.learn_regularization_params) API.\n", "\n", "For more details on the regularization parameters, please see the\n", "[RegularizationParams](../api/api_regularization.rst#masterful.regularization.RegularizationParams) API specification." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-05T15:18:02.407572Z", "iopub.status.busy": "2022-04-05T15:18:02.406825Z", "iopub.status.idle": "2022-04-05T15:18:02.409471Z", "shell.execute_reply": "2022-04-05T15:18:02.408845Z" } }, "outputs": [], "source": [ "# This is a set of parameters learned on CIFAR10\n", "# for ResNet18 models.\n", "regularization_params = masterful.regularization.parameters.CIFAR10_RESNET18" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "The final step before training is to learn the optimal set of\n", "semi-supervision parameters. In this example, Masterful will\n", "apply [Noisy Student Training](https://arxiv.org/abs/1911.04252)\n", "to improve your model during training with the provided unlabeled\n", "data.\n", "\n", "For more details on the semi-supervision parameters, please see the\n", "[SemiSupervisedParams](../api/api_ssl.rst#masterful.ssl.SemiSupervisedParams) API specification." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-05T15:18:02.413547Z", "iopub.status.busy": "2022-04-05T15:18:02.412848Z", "iopub.status.idle": "2022-04-05T15:18:02.415531Z", "shell.execute_reply": "2022-04-05T15:18:02.414813Z" } }, "outputs": [], "source": [ "ssl_params = masterful.ssl.learn_ssl_params(\n", " training_dataset,\n", " training_dataset_params,\n", " unlabeled_datasets=[(unlabeled_dataset, unlabeled_dataset_params)],\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Training with Unlabeled Data\n", "\n", "Now, you are ready to train your model using the Masterful AutoML\n", "platform. In the next cell, you will see the call to\n", "[masterful.training.train](../api/api_training.rst#masterful.training.train),\n", "which is the entry point to the meta-learning engine of the Masterful AutoML\n", "platform. Notice that there is no need to batch your data (Masterful\n", "finds the optimal batch size for you) or to shuffle it (Masterful\n", "handles this for you). A validation dataset is optional too (Masterful\n", "finds one if you omit it), although this guide passes one explicitly.\n", "You hand Masterful a model and a dataset, and Masterful figures out\n", "the rest." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-05T15:18:02.419896Z", "iopub.status.busy": "2022-04-05T15:18:02.419189Z", "iopub.status.idle": "2022-04-05T15:36:07.251670Z", "shell.execute_reply": "2022-04-05T15:36:07.250952Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MASTERFUL: Training model with semi-supervised learning enabled.\n", "MASTERFUL: Performing basic dataset analysis.\n", "MASTERFUL: Training model with:\n", "MASTERFUL: \t500 labeled examples.\n", "MASTERFUL: \t5000 validation examples.\n", "MASTERFUL: \t0 synthetic examples.\n", "MASTERFUL: \t5000 unlabeled examples.\n", "MASTERFUL: Training model with learned parameters wing-polarized-spectacles in two phases.\n", "MASTERFUL: The first phase is supervised training with the learned parameters.\n", "MASTERFUL: The second phase is semi-supervised training to boost performance.\n", "MASTERFUL: Warming up model for supervised training.\n", "MASTERFUL: \tWarming up batch norm statistics (this could take a few minutes).\n", "MASTERFUL: \tWarming up training for 500 steps.\n", "100%|██████████| 500/500 [02:02<00:00, 4.08steps/s]\n", "MASTERFUL: \tValidating batch norm statistics after warmup for stability (this could take a few minutes).\n", "MASTERFUL: Starting Phase 1: Supervised training until the validation loss stabilizes...\n", "Supervised Training: 100%|██████████| 4152/4152 [01:21<00:00, 50.79steps/s] \n", "MASTERFUL: Starting Phase 2: Semi-supervised training until the validation loss stabilizes...\n", "MASTERFUL: Warming up model for semi-supervised training.\n", "MASTERFUL: \tWarming up batch norm statistics (this could take a few minutes).\n", "MASTERFUL: \tWarming up training for 500 steps.\n", "100%|██████████| 500/500 [01:17<00:00, 6.43steps/s]\n", "MASTERFUL: \tValidating batch norm statistics after warmup for stability (this could take a few minutes).\n", "Semi-Supervised 
Training: 100%|██████████| 8554/8554 [11:02<00:00, 12.92steps/s] \n", "MASTERFUL: Semi-Supervised training complete.\n", "MASTERFUL: Training complete in 17.955665187040964 minutes.\n" ] } ], "source": [ "training_report = masterful.training.train(\n", " model,\n", " model_params,\n", " optimization_params,\n", " regularization_params,\n", " ssl_params,\n", " training_dataset,\n", " training_dataset_params,\n", " validation_dataset,\n", " validation_dataset_params,\n", " unlabeled_datasets=[(unlabeled_dataset, unlabeled_dataset_params)],\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "The model you passed into [masterful.training.train](../api/api_training.rst#masterful.training.train)\n", "is now trained and updated in place, so you are able to evaluate it\n", "just like any other trained Keras model." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-05T15:36:07.256967Z", "iopub.status.busy": "2022-04-05T15:36:07.256213Z", "iopub.status.idle": "2022-04-05T15:36:08.205001Z", "shell.execute_reply": "2022-04-05T15:36:08.204232Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "157/157 [==============================] - 1s 5ms/step - loss: 1.8697 - categorical_accuracy: 0.3226\n", "Baseline model accuracy: 0.1451999992132187\n", "Masterful model accuracy: 0.32260000705718994\n" ] } ], "source": [ "masterful_metrics = model.evaluate(\n", " test_dataset.batch(optimization_params.batch_size), return_dict=True\n", ")\n", "print(f\"Baseline model accuracy: {baseline_metrics['categorical_accuracy']}\")\n", "print(f\"Masterful model accuracy: {masterful_metrics['categorical_accuracy']}\")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "As you can see, you reduced the error rate of your model by 10-30%\n", "(results may vary depending on your run) simply by\n", "using unlabeled data with 
the Masterful AutoML platform. However, the\n", "final accuracy of this model (25-35%) is still not sufficient to deploy it to production.\n", "Read [Part 2](../notebooks/guide_ssl_using_unlabeled_data_part2.ipynb) of this\n", "guide to improve this model even more." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Next Steps\n", "\n", "In [Part 2](../notebooks/guide_ssl_using_unlabeled_data_part2.ipynb) of this guide, you will\n", "look at improving these results even more with self-supervision. By the end\n", "of [Part 2](../notebooks/guide_ssl_using_unlabeled_data_part2.ipynb) you will have\n", "a *production* model from this very limited dataset." ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "guide_ssl_using_unlabeled_data_part1", "private_outputs": false, "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }