{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "# Unlabeled Data with Masterful (Part 2)\n", "\n", "**Author:** [sam](mailto:sam@masterfulai.com) \n", "**Date created:** 2022/03/29 \n", "**Last modified:** 2022/03/29 \n", "**Description:** Part 2 of using unlabeled data with Masterful." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)][1]        [![Download](images/download.png)][2][Download this Notebook][2]\n", "\n", "[1]:https://colab.research.google.com/github/masterfulai/masterful-docs/blob/main/notebooks/guide_ssl_using_unlabeled_data_part2.ipynb\n", "[2]:http://docs.masterfulai.com/0.4.1/notebooks/guide_ssl_using_unlabeled_data_part2.ipynb" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Introduction\n", "\n", "In this guide, you will improve upon the results you achieved\n", "in [Part 1](../notebooks/guide_ssl_using_unlabeled_data_part1.ipynb)\n", "using semi-supervised learning inside of the Masterful AutoML platform.\n", "You will learn how to use a pretraining technique which can generate\n", "an optimal set of initial weights for your model from only unlabeled data.\n", "The pretraining algorithm you will use is a form of self-supervision which\n", "can learn meaningful representations of your dataset without any labels at\n", "all. In other words, at the end of this guide, you will have learned\n", "how to take a model that was only 15% accurate using 500 labeled training\n", "examples, and improve its accuracy to over 75% using only unlabeled training\n", "examples!" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Prerequisites\n", "\n", "Please follow the Masterful installation instructions [here](../tutorials/tutorial_installation.md)\n", "in order to run this Quickstart." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Imports\n", "\n", "First, import the necessary libraries and register the Masterful package." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-07T01:28:48.615045Z", "iopub.status.busy": "2022-04-07T01:28:48.614269Z", "iopub.status.idle": "2022-04-07T01:28:52.482538Z", "shell.execute_reply": "2022-04-07T01:28:52.481694Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MASTERFUL: Your account has been successfully registered. Masterful v0.4.1.dev202204071649294505 is loaded.\n" ] } ], "source": [ "import numpy as np\n", "import os\n", "import tempfile\n", "import tensorflow as tf\n", "import urllib.request\n", "\n", "import masterful\n", "\n", "masterful = masterful.register()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Prepare the Data\n", "\n", "For this guide, you will use only 1% of the CIFAR-10 data as your labeled\n", "dataset, in order to simulate a small of amount of labeled training\n", "data. 
You will also use the **full** CIFAR-10 training dataset\n", "**without** labels, as a pretraining step to learn an accurate\n", "representation of this dataset in an unsupervised way (as part\n", "of self-supervision using Masterful).\n", "\n", "Finally, you will then use the same 10x unlabeled data as in\n", "[Part 1](../notebooks/guide_ssl_using_unlabeled_data_part1.ipynb)\n", "to train a classification model from the representation you learned.\n", "\n", "Below, you will create training, validation, and test sets that\n", "are duplicates of the [Part 1](../notebooks/guide_ssl_using_unlabeled_data_part1.ipynb)\n", "datasets. In addition, you will create a much larger unlabeled dataset\n", "to use with [learn_representation](../api/api_ssl.rst#masterful.ssl.learn_representation)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-07T01:28:52.491699Z", "iopub.status.busy": "2022-04-07T01:28:52.490904Z", "iopub.status.idle": "2022-04-07T01:28:55.857654Z", "shell.execute_reply": "2022-04-07T01:28:55.858166Z" } }, "outputs": [], "source": [ "NUM_CLASSES = 10\n", "(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()\n", "\n", "# Normalize into the [0,1] range for numerical stability.\n", "x_train = x_train.astype(\"float32\") / 255.0\n", "x_test = x_test.astype(\"float32\") / 255.0\n", "\n", "# Masterful does not recommend sparse labels, so convert to categorical.\n", "y_train = tf.keras.utils.to_categorical(y_train, NUM_CLASSES)\n", "y_test = tf.keras.utils.to_categorical(y_test, NUM_CLASSES)\n", "\n", "# Shuffle the data, and take 1% for the labeled dataset,\n", "# and 10x that amount for the unlabeled dataset.\n", "training_percentage = 0.01\n", "unlabeled_multiplier = 10\n", "dataset_size = len(x_train)\n", "indices = np.array(range(dataset_size))\n", "generator = np.random.default_rng(seed=42)\n", "generator.shuffle(indices)\n", "cut = int(training_percentage * dataset_size)\n", "part1_train_indices = indices[:cut]\n", "part1_unlabeled_indices = indices[\n", " cut : cut + int(dataset_size * training_percentage * unlabeled_multiplier)\n", "]\n", "\n", "# Create the datasets from the splits.\n",
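 "# With training_percentage = 0.01 and unlabeled_multiplier = 10 on\n", "# CIFAR-10's 50,000 training images, this yields 500 labeled and\n", "# 5,000 unlabeled examples, matching the counts Masterful reports\n", "# during training below.\n",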
"# In order to compare against Part 1, you will explicitly create\n", "# the same 1% training dataset and 10x unlabeled dataset.\n", "# In addition, you will create a much larger unlabeled\n", "# dataset you will use for representation learning.\n", "part1_training_dataset = tf.data.Dataset.from_tensor_slices(\n", " (x_train[part1_train_indices], y_train[part1_train_indices])\n", ")\n", "part1_unlabeled_dataset = tf.data.Dataset.from_tensor_slices(\n", " (x_train[part1_unlabeled_indices],)\n", ")\n", "full_unlabeled_dataset = tf.data.Dataset.from_tensor_slices((x_train,))\n", "\n", "# Split the test dataset into a test and validation dataset.\n", "# The validation dataset is used for measuring training performance.\n", "indices = np.array(range(len(x_test)))\n", "generator.shuffle(indices)\n", "test_indices = indices[:5000]\n", "validation_indices = indices[5000:]\n", "test_dataset = tf.data.Dataset.from_tensor_slices(\n", " (x_test[test_indices], y_test[test_indices])\n", ")\n", "validation_dataset = tf.data.Dataset.from_tensor_slices(\n", " (x_test[validation_indices], y_test[validation_indices])\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Create the Model\n", "\n", "You are going to use the same model architecture from [Part 1](../notebooks/guide_ssl_using_unlabeled_data_part1.ipynb)\n", "of this guide. However, you are going to separate the model into\n", "two parts: an embedding module and a classification head. This just\n", "means you will not add a classification head to the model (typically a\n", "pooling layer followed by a final dense layer). Why do this? In the first\n", "step, you will be learning an embedded representation of the unlabeled\n", "dataset using self-supervision, and for this task, you do not need\n", "the classification layer. In fact, since there are no labels, there is\n", "no way to make use of a classification layer...yet. So below, you will\n", "create the same model as [Part 1](../notebooks/guide_ssl_using_unlabeled_data_part1.ipynb),\n", "and then slice off the last two layers (GlobalAveragePooling2D and Dense)\n", "to create an \"embedding model\"." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-07T01:28:55.871788Z", "iopub.status.busy": "2022-04-07T01:28:55.861165Z", "iopub.status.idle": "2022-04-07T01:28:56.981371Z", "shell.execute_reply": "2022-04-07T01:28:56.980635Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Extension horovod.torch has not been built: /home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/horovod/torch/mpi_lib/_mpi_lib.cpython-37m-x86_64-linux-gnu.so not found\n", "If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.\n", "Warning! MPI libs are missing, but python applications are still avaiable.\n", "[2022-04-07 01:28:56.004 ip-172-31-46-120:13987 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None\n", "[2022-04-07 01:28:56.032 ip-172-31-46-120:13987 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. 
Profiler is disabled.\n", "Model: \"model_1\"\n", "__________________________________________________________________________________________________\n", "Layer (type) Output Shape Param # Connected to \n", "==================================================================================================\n", "input_1 (InputLayer) [(None, 32, 32, 3)] 0 \n", "__________________________________________________________________________________________________\n", "zero_padding2d (ZeroPadding2D) (None, 38, 38, 3) 0 input_1[0][0] \n", "__________________________________________________________________________________________________\n", "conv2d (Conv2D) (None, 36, 36, 64) 1792 zero_padding2d[0][0] \n", "__________________________________________________________________________________________________\n", "batch_normalization (BatchNorma (None, 36, 36, 64) 256 conv2d[0][0] \n", "__________________________________________________________________________________________________\n", "re_lu (ReLU) (None, 36, 36, 64) 0 batch_normalization[0][0] \n", "__________________________________________________________________________________________________\n", "zero_padding2d_1 (ZeroPadding2D (None, 38, 38, 64) 0 re_lu[0][0] \n", "__________________________________________________________________________________________________\n", "max_pooling2d (MaxPooling2D) (None, 18, 18, 64) 0 zero_padding2d_1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_bn1 (BatchNormaliz (None, 18, 18, 64) 256 max_pooling2d[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_relu1 (ReLU) (None, 18, 18, 64) 0 stage1_unit1_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_conv1 (Conv2D) (None, 18, 18, 64) 36928 stage1_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_bn2 (BatchNormaliz (None, 18, 18, 64) 256 stage1_unit1_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_relu2 (ReLU) (None, 18, 18, 64) 0 stage1_unit1_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_conv2 (Conv2D) (None, 18, 18, 64) 36928 stage1_unit1_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_sc1 (Conv2D) (None, 18, 18, 64) 4160 stage1_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_add1 (Add) (None, 18, 18, 64) 0 stage1_unit1_conv2[0][0] \n", " stage1_unit1_sc1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_bn1 (BatchNormaliz (None, 18, 18, 64) 256 stage1_unit1_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_relu1 (ReLU) (None, 18, 18, 64) 0 stage1_unit2_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_conv1 (Conv2D) (None, 18, 18, 64) 36928 stage1_unit2_relu1[0][0] \n", 
"__________________________________________________________________________________________________\n", "stage1_unit2_bn2 (BatchNormaliz (None, 18, 18, 64) 256 stage1_unit2_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_relu2 (ReLU) (None, 18, 18, 64) 0 stage1_unit2_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_conv2 (Conv2D) (None, 18, 18, 64) 36928 stage1_unit2_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_add1 (Add) (None, 18, 18, 64) 0 stage1_unit1_add1[0][0] \n", " stage1_unit2_conv2[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_bn1 (BatchNormaliz (None, 18, 18, 64) 256 stage1_unit2_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_relu1 (ReLU) (None, 18, 18, 64) 0 stage2_unit1_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_conv1 (Conv2D) (None, 9, 9, 128) 73856 stage2_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_bn2 (BatchNormaliz (None, 9, 9, 128) 512 stage2_unit1_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_relu2 (ReLU) (None, 9, 9, 128) 0 stage2_unit1_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_conv2 (Conv2D) (None, 9, 9, 128) 147584 stage2_unit1_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_sc1 (Conv2D) (None, 9, 9, 128) 8320 stage2_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_add1 (Add) (None, 9, 9, 128) 0 stage2_unit1_conv2[0][0] \n", " stage2_unit1_sc1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_bn1 (BatchNormaliz (None, 9, 9, 128) 512 stage2_unit1_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_relu1 (ReLU) (None, 9, 9, 128) 0 stage2_unit2_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_conv1 (Conv2D) (None, 9, 9, 128) 147584 stage2_unit2_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_bn2 (BatchNormaliz (None, 9, 9, 128) 512 stage2_unit2_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_relu2 (ReLU) (None, 9, 9, 128) 0 stage2_unit2_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_conv2 (Conv2D) (None, 9, 9, 128) 147584 stage2_unit2_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_add1 (Add) (None, 9, 9, 128) 0 
stage2_unit1_add1[0][0] \n", " stage2_unit2_conv2[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_bn1 (BatchNormaliz (None, 9, 9, 128) 512 stage2_unit2_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_relu1 (ReLU) (None, 9, 9, 128) 0 stage3_unit1_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_conv1 (Conv2D) (None, 5, 5, 256) 295168 stage3_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_bn2 (BatchNormaliz (None, 5, 5, 256) 1024 stage3_unit1_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_relu2 (ReLU) (None, 5, 5, 256) 0 stage3_unit1_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_conv2 (Conv2D) (None, 5, 5, 256) 590080 stage3_unit1_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_sc1 (Conv2D) (None, 5, 5, 256) 33024 stage3_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_add1 (Add) (None, 5, 5, 256) 0 stage3_unit1_conv2[0][0] \n", " stage3_unit1_sc1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_bn1 (BatchNormaliz (None, 5, 5, 256) 1024 stage3_unit1_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_relu1 (ReLU) (None, 5, 5, 256) 0 stage3_unit2_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_conv1 (Conv2D) (None, 5, 5, 256) 590080 stage3_unit2_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_bn2 (BatchNormaliz (None, 5, 5, 256) 1024 stage3_unit2_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_relu2 (ReLU) (None, 5, 5, 256) 0 stage3_unit2_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_conv2 (Conv2D) (None, 5, 5, 256) 590080 stage3_unit2_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_add1 (Add) (None, 5, 5, 256) 0 stage3_unit1_add1[0][0] \n", " stage3_unit2_conv2[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_bn1 (BatchNormaliz (None, 5, 5, 256) 1024 stage3_unit2_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_relu1 (ReLU) (None, 5, 5, 256) 0 stage4_unit1_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_conv1 (Conv2D) (None, 3, 3, 512) 1180160 stage4_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", 
"stage4_unit1_bn2 (BatchNormaliz (None, 3, 3, 512) 2048 stage4_unit1_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_relu2 (ReLU) (None, 3, 3, 512) 0 stage4_unit1_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_conv2 (Conv2D) (None, 3, 3, 512) 2359808 stage4_unit1_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_sc1 (Conv2D) (None, 3, 3, 512) 131584 stage4_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_add1 (Add) (None, 3, 3, 512) 0 stage4_unit1_conv2[0][0] \n", " stage4_unit1_sc1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_bn1 (BatchNormaliz (None, 3, 3, 512) 2048 stage4_unit1_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_relu1 (ReLU) (None, 3, 3, 512) 0 stage4_unit2_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_conv1 (Conv2D) (None, 3, 3, 512) 2359808 stage4_unit2_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_bn2 (BatchNormaliz (None, 3, 3, 512) 2048 stage4_unit2_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_relu2 (ReLU) (None, 3, 3, 512) 0 stage4_unit2_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_conv2 (Conv2D) (None, 3, 3, 512) 2359808 stage4_unit2_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_add1 (Add) (None, 3, 3, 512) 0 stage4_unit1_add1[0][0] \n", " stage4_unit2_conv2[0][0] \n", "__________________________________________________________________________________________________\n", "batch_normalization_1 (BatchNor (None, 3, 3, 512) 2048 stage4_unit2_add1[0][0] \n", "__________________________________________________________________________________________________\n", "re_lu_1 (ReLU) (None, 3, 3, 512) 0 batch_normalization_1[0][0] \n", "==================================================================================================\n", "Total params: 11,184,064\n", "Trainable params: 11,176,128\n", "Non-trainable params: 7,936\n", "__________________________________________________________________________________________________\n" ] } ], "source": [ "from tensorflow.keras.layers import (\n", " Input,\n", " Add,\n", " Conv2D,\n", " GlobalAveragePooling2D,\n", " MaxPooling2D,\n", " ReLU,\n", " ZeroPadding2D,\n", " BatchNormalization,\n", " Dense,\n", ")\n", "\n", "\n", "def identity_block(x, name, stage, unit, n_filters):\n", " shortcut = x\n", "\n", " x = BatchNormalization(name=name.format(stage, unit, \"bn\", 1))(x)\n", " x = ReLU(name=name.format(stage, unit, \"relu\", 1))(x)\n", " x = Conv2D(\n", " n_filters,\n", " (3, 3),\n", " strides=(1, 1),\n", " padding=\"same\",\n", " kernel_initializer=\"he_uniform\",\n", " name=name.format(stage, unit, \"conv\", 1),\n", " )(x)\n", "\n", " x = BatchNormalization(name=name.format(stage, 
unit, \"bn\", 2))(x)\n", " x = ReLU(name=name.format(stage, unit, \"relu\", 2))(x)\n", " x = Conv2D(\n", " n_filters,\n", " (3, 3),\n", " strides=(1, 1),\n", " padding=\"same\",\n", " kernel_initializer=\"he_uniform\",\n", " name=name.format(stage, unit, \"conv\", 2),\n", " )(x)\n", "\n", " x = Add(name=name.format(stage, unit, \"add\", 1))([shortcut, x])\n", " return x\n", "\n", "\n", "def projection_block(x, name, stage, unit, strides, n_filters):\n", " x = BatchNormalization(name=name.format(stage, unit, \"bn\", 1))(x)\n", " x = ReLU(name=name.format(stage, unit, \"relu\", 1))(x)\n", " shortcut = Conv2D(\n", " n_filters,\n", " (1, 1),\n", " strides=strides,\n", " kernel_initializer=\"he_uniform\",\n", " name=name.format(stage, unit, \"sc\", 1),\n", " )(x)\n", "\n", " x = Conv2D(\n", " n_filters,\n", " (3, 3),\n", " strides=strides,\n", " padding=\"same\",\n", " kernel_initializer=\"he_uniform\",\n", " name=name.format(stage, unit, \"conv\", 1),\n", " )(x)\n", " x = BatchNormalization(name=name.format(stage, unit, \"bn\", 2))(x)\n", " x = ReLU(name=name.format(stage, unit, \"relu\", 2))(x)\n", " x = Conv2D(\n", " n_filters,\n", " (3, 3),\n", " strides=(1, 1),\n", " padding=\"same\",\n", " kernel_initializer=\"he_uniform\",\n", " name=name.format(stage, unit, \"conv\", 2),\n", " )(x)\n", "\n", " x = Add(name=name.format(stage, unit, \"add\", 1))([x, shortcut])\n", " return x\n", "\n", "\n", "def group(x, name, stage, strides, n_blocks, n_filters):\n", " x = projection_block(\n", " x, name=name, stage=stage, unit=1, strides=strides, n_filters=n_filters\n", " )\n", " for unit in range(n_blocks - 1):\n", " x = identity_block(\n", " x, name=name, stage=stage, unit=unit + 2, n_filters=n_filters\n", " )\n", " return x\n", "\n", "\n", "def resnet18(input_shape, num_classes):\n", " inputs = Input(input_shape)\n", " x = ZeroPadding2D(padding=(3, 3))(inputs)\n", "\n", " x = Conv2D(\n", " 64, (3, 3), strides=(1, 1), padding=\"valid\", kernel_initializer=\"he_uniform\"\n", " )(x)\n", " x = BatchNormalization()(x)\n", " x = ReLU()(x)\n", " x = ZeroPadding2D(padding=(1, 1))(x)\n", " x = MaxPooling2D((3, 3), strides=(2, 2))(x)\n", "\n", " x = group(\n", " x, strides=(1, 1), name=\"stage{}_unit{}_{}{}\", stage=1, n_blocks=2, n_filters=64\n", " )\n", " x = group(\n", " x,\n", " strides=(2, 2),\n", " name=\"stage{}_unit{}_{}{}\",\n", " stage=2,\n", " n_blocks=2,\n", " n_filters=128,\n", " )\n", " x = group(\n", " x,\n", " strides=(2, 2),\n", " name=\"stage{}_unit{}_{}{}\",\n", " stage=3,\n", " n_blocks=2,\n", " n_filters=256,\n", " )\n", " x = group(\n", " x,\n", " strides=(2, 2),\n", " name=\"stage{}_unit{}_{}{}\",\n", " stage=4,\n", " n_blocks=2,\n", " n_filters=512,\n", " )\n", "\n", " x = BatchNormalization()(x)\n", " x = ReLU()(x)\n", "\n", " x = GlobalAveragePooling2D()(x)\n", " x = Dense(num_classes, kernel_initializer=\"he_normal\")(x)\n", " return tf.keras.Model(inputs=inputs, outputs=x)\n", "\n", "\n", "INPUT_SHAPE = (32, 32, 3)\n", "NUM_CLASSES = 10\n", "\n", "classification_model = resnet18(INPUT_SHAPE, NUM_CLASSES)\n", "\n", "# Slice off the final two layers to create an embedding model.\n", "embedding_output = classification_model.layers[-3].output\n", "embedding_model = tf.keras.Model(\n", " inputs=classification_model.input, outputs=embedding_output\n", ")\n", "embedding_model.summary()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Learn Unlabeled Representation\n", "\n", "In this section, you will use a self-supervised technique\n", "called [Barlow 
Twins](https://arxiv.org/abs/2103.03230) to learn\n", "a representation of your unlabeled data. In essence, this will\n", "create a better set of initial weights for your model that\n", "will allow it to train better.\n", "\n", "You will use the Masterful API [learn_representation](../api/api_ssl.rst#masterful.ssl.learn_representation)\n", "to learn from the unlabeled dataset and create an optimal set of\n", "initial weights for your model.\n", "\n", "This is the same set of steps as provided in the [Pretraining Guide](../notebooks/guide_pretraining.ipynb)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-07T01:28:56.989372Z", "iopub.status.busy": "2022-04-07T01:28:56.988606Z", "iopub.status.idle": "2022-04-07T01:32:39.371894Z", "shell.execute_reply": "2022-04-07T01:32:39.376531Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/2\n", "98/98 [==============================] - 113s 992ms/step - loss: 2041.6415\n", "Epoch 2/2\n", "98/98 [==============================] - 103s 995ms/step - loss: 1973.8130\n" ] } ], "source": [ "full_unlabeled_dataset_params = masterful.data.learn_data_params(\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " dataset=full_unlabeled_dataset,\n", " image_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " sparse_labels=False,\n", ")\n", "model_params = masterful.architecture.learn_architecture_params(\n", " model=embedding_model,\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " input_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " prediction_logits=True,\n", " backbone_only=True,\n", ")\n", "\n", "# For demonstration purposes, you are only training for\n", "# two epochs here. In general though, you will want to train\n", "# for a lot longer. For example, Barlow Twins will typically\n", "# require 800 epochs of training, with 5 warmup epochs\n", "# for stability.\n", "optimization_params = masterful.optimization.OptimizationParams(\n", " batch_size=512,\n", " epochs=2,\n", " warmup_epochs=1,\n", ")\n", "\n", "# Explicitly use Barlow Twins as the representation learner.\n", "ssl_params = masterful.ssl.SemiSupervisedParams(algorithms=[\"barlow_twins\"])\n", "\n", "_ = masterful.ssl.learn_representation(\n", " model=embedding_model,\n", " model_params=model_params,\n", " optimization_params=optimization_params,\n", " ssl_params=ssl_params,\n", " unlabeled_datasets=[(full_unlabeled_dataset, full_unlabeled_dataset_params)],\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Build the Classification Model\n", "\n", "In the previous step, you created an embedding model that\n", "learned a representation from the full unlabeled dataset. But\n", "what should you do with this? The next step is to take this\n", "representation and train a classification model created from it\n", "on your labeled training dataset, in this instance 1% of\n", "the CIFAR-10 dataset.\n", "\n", "The first step is to create a classification model from your\n", "embedding model. This involves attaching a dense head\n", "to the embedding model. Below, we also load a set of\n", "pretrained weights that we created for this guide, by running\n", "[learn_representation](../api/api_ssl.rst#masterful.ssl.learn_representation)\n", "for a full 800 epochs.\n",
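 "\n", "For intuition, Barlow Twins optimizes the backbone so that the embeddings\n", "of two augmented views of the same image have a cross-correlation matrix\n", "close to the identity. The snippet below is a minimal, conceptual sketch\n", "of that objective only (it is not Masterful's implementation, and the\n", "`lambda_` weight is simply the value suggested in the paper):\n", "\n", "```python\n", "def barlow_twins_loss(z_a, z_b, lambda_=5e-3):\n", "    # z_a, z_b: (batch, dim) embeddings of two augmented views.\n", "    # Standardize each embedding dimension across the batch.\n", "    z_a = (z_a - tf.reduce_mean(z_a, axis=0)) / (tf.math.reduce_std(z_a, axis=0) + 1e-8)\n", "    z_b = (z_b - tf.reduce_mean(z_b, axis=0)) / (tf.math.reduce_std(z_b, axis=0) + 1e-8)\n", "    batch_size = tf.cast(tf.shape(z_a)[0], z_a.dtype)\n", "    # Cross-correlation matrix between the two views, shape (dim, dim).\n", "    c = tf.matmul(z_a, z_b, transpose_a=True) / batch_size\n", "    # Drive diagonal terms to 1 (invariance) and off-diagonal terms\n", "    # to 0 (redundancy reduction).\n", "    on_diag = tf.reduce_sum(tf.square(1.0 - tf.linalg.diag_part(c)))\n", "    off_diag = tf.reduce_sum(tf.square(c)) - tf.reduce_sum(tf.square(tf.linalg.diag_part(c)))\n", "    return on_diag + lambda_ * off_diag\n", "```\n", "\n",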
In particular, you can reproduce these results using the same setup as above\n", "with the following optimization parameters:\n", "\n", "```python\n", "optimization_params = OptimizationParams(batch_size=512,\n", " epochs=800,\n", " warmup_epochs=5)\n", "```" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-07T01:32:39.383759Z", "iopub.status.busy": "2022-04-07T01:32:39.383026Z", "iopub.status.idle": "2022-04-07T01:32:40.892075Z", "shell.execute_reply": "2022-04-07T01:32:40.891353Z" } }, "outputs": [], "source": [ "\n", "# Load the pretrained weights into the model backbone.\n", "# These weights were trained on the above model using\n", "# masterful.ssl.learn_representation with the above\n", "# documented optimization parameters.\n", "def load_pretrained_weights(model):\n", " with tempfile.TemporaryDirectory() as tempdir:\n", " # Download the weights from AWS.\n", " urllib.request.urlretrieve(\n", " \"https://masterful-public.s3.us-west-1.amazonaws.com/933013963/static-data/resnet18_cifar10/trained_model_weights.index\",\n", " os.path.join(tempdir, \"trained_model_weights.index\"),\n", " )\n", " urllib.request.urlretrieve(\n", " \"https://masterful-public.s3.us-west-1.amazonaws.com/933013963/static-data/resnet18_cifar10/trained_model_weights.data-00000-of-00001\",\n", " os.path.join(tempdir, \"trained_model_weights.data-00000-of-00001\"),\n", " )\n", " model.load_weights(os.path.join(tempdir, \"trained_model_weights\"))\n", "\n", "\n", "tf.keras.backend.clear_session()\n", "load_pretrained_weights(embedding_model)\n", "\n", "# Attach the classifier head to the embedding model.\n", "inputs = tf.keras.layers.Input(shape=INPUT_SHAPE)\n", "x = embedding_model(inputs)\n", "x = GlobalAveragePooling2D()(x)\n", "x = Dense(NUM_CLASSES, kernel_initializer=\"he_normal\")(x)\n", "classification_model = tf.keras.Model(inputs=inputs, outputs=x)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Setup Training Parameters\n", "\n", "Next, you need to set up all of the parameters for training\n", "with Masterful. These parameters are different from the ones\n", "you used to learn the representation, because you will be training\n", "on different datasets, namely the small labeled dataset and the\n", "larger (but not full) unlabeled dataset from [Part 1](../notebooks/guide_ssl_using_unlabeled_data_part1.ipynb)." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-07T01:32:40.903100Z", "iopub.status.busy": "2022-04-07T01:32:40.902346Z", "iopub.status.idle": "2022-04-07T01:32:40.904454Z", "shell.execute_reply": "2022-04-07T01:32:40.904935Z" } }, "outputs": [], "source": [ "model_params = masterful.architecture.learn_architecture_params(\n", " model=classification_model,\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " input_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " prediction_logits=True,\n", ")\n", "training_dataset_params = masterful.data.learn_data_params(\n", " dataset=part1_training_dataset,\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " image_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " sparse_labels=False,\n", ")\n", "validation_dataset_params = masterful.data.learn_data_params(\n", " dataset=validation_dataset,\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " image_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " sparse_labels=False,\n", ")\n", "unlabeled_dataset_params = masterful.data.learn_data_params(\n", " dataset=part1_unlabeled_dataset,\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " image_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " sparse_labels=None,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "Next, you learn the optimization parameters that will be used to train\n", "the classification model. These are different from the optimization\n", "parameters you used above to learn the representation.\n", "\n", "For more details on the optimization parameters, please see the [OptimizationParams](../api/api_optimization.rst#masterful.optimization.OptimizationParams) API specification." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-07T01:32:40.908840Z", "iopub.status.busy": "2022-04-07T01:32:40.908143Z", "iopub.status.idle": "2022-04-07T01:33:04.819827Z", "shell.execute_reply": "2022-04-07T01:33:04.819068Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MASTERFUL: Learning optimal batch size.\n", "MASTERFUL: Learning optimal initial learning rate for batch size 32.\n" ] } ], "source": [ "optimization_params = masterful.optimization.learn_optimization_params(\n", " classification_model,\n", " model_params,\n", " part1_training_dataset,\n", " training_dataset_params,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "The regularization parameters used can have a dramatic impact on the\n", "final performance of your trained model. Learning these parameters can\n", "be a time-consuming and domain-specific challenge. Masterful can speed\n", "up this process by learning these parameters for you. In general, this\n", "can be an expensive operation. A rough order of magnitude for learning\n", "these parameters is 2x the time it takes to train your model. However,\n", "this is still dramatically faster than manually finding these\n", "parameters yourself. In the example below, you will use one of the\n", "many sets of pre-learned regularization parameters that are shipped\n", "in the Masterful API.\n",
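 "If you did want to learn them for this model instead, the call would\n", "look roughly like the sketch below. Note that the argument list is an\n", "assumption that mirrors the learn_optimization_params call above;\n", "consult the API reference linked below for the exact signature.\n", "\n", "```python\n", "# Sketch only: assumes the same calling convention as\n", "# learn_optimization_params above, and expect it to take\n", "# roughly 2x your training time to run.\n", "regularization_params = masterful.regularization.learn_regularization_params(\n", "    classification_model,\n", "    model_params,\n", "    part1_training_dataset,\n", "    training_dataset_params,\n", ")\n", "```\n", "\n",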
"In most instances, you should learn these\n", "parameters directly using the [learn_regularization_params](../api/api_regularization.rst#masterful.regularization.learn_regularization_params) API.\n", "\n", "For more details on the regularization parameters, please see the\n", "[RegularizationParams](../api/api_regularization.rst#masterful.regularization.RegularizationParams) API specification." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-07T01:33:04.824237Z", "iopub.status.busy": "2022-04-07T01:33:04.823477Z", "iopub.status.idle": "2022-04-07T01:33:04.826223Z", "shell.execute_reply": "2022-04-07T01:33:04.825604Z" } }, "outputs": [], "source": [ "# This is a set of parameters learned on CIFAR10\n", "# for ResNet18 models.\n", "regularization_params = masterful.regularization.parameters.CIFAR10_RESNET18" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "The final step before training is to learn the optimal set of\n", "semi-supervision parameters. In this example, Masterful will\n", "apply [Noisy Student Training](https://arxiv.org/abs/1911.04252)\n", "to improve your model during training with the provided unlabeled\n", "data.\n", "\n", "For more details on the semi-supervision parameters, please see the\n", "[SemiSupervisedParams](../api/api_ssl.rst#masterful.ssl.SemiSupervisedParams) API specification." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-07T01:33:04.830421Z", "iopub.status.busy": "2022-04-07T01:33:04.829719Z", "iopub.status.idle": "2022-04-07T01:33:04.832515Z", "shell.execute_reply": "2022-04-07T01:33:04.831806Z" } }, "outputs": [], "source": [ "ssl_params = masterful.ssl.learn_ssl_params(\n", " part1_training_dataset,\n", " training_dataset_params,\n", " unlabeled_datasets=[(part1_unlabeled_dataset, unlabeled_dataset_params)],\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Training with Unlabeled Data\n", "\n", "Now, you are ready to train your model using the Masterful AutoML\n", "platform. In the next cell, you will see the call to\n", "[masterful.training.train](../api/api_training.rst#masterful.training.train),\n", "which is the entry point to the meta-learning engine of the Masterful AutoML\n", "platform. Notice there is no need to batch your data (Masterful will\n", "find the optimal batch size for you). No need to shuffle your data\n", "(Masterful handles this for you). You don't even need to pass in a\n", "validation dataset (Masterful can find one for you, although this guide\n", "passes one explicitly to match Part 1). You hand Masterful\n", "a model and a dataset, and Masterful will figure the rest out for you." 
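, "\n", "For contrast, here is a minimal sketch of what a purely supervised\n", "Keras baseline on the same 500 labeled examples might look like. Every\n", "hyperparameter below is an illustrative choice you would otherwise have\n", "to find by hand, and the semi-supervised phase has no analog at all:\n", "\n", "```python\n", "# Supervised-only baseline sketch; the optimizer, batch size, and\n", "# epoch count are arbitrary values, not Masterful's learned ones.\n", "baseline = resnet18(INPUT_SHAPE, NUM_CLASSES)\n", "baseline.compile(\n", "    optimizer=tf.keras.optimizers.Adam(1e-3),\n", "    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),\n", "    metrics=[tf.keras.metrics.CategoricalAccuracy()],\n", ")\n", "baseline.fit(\n", "    part1_training_dataset.shuffle(500).batch(32),\n", "    validation_data=validation_dataset.batch(32),\n", "    epochs=30,\n", ")\n", "```"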
] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-07T01:33:04.837345Z", "iopub.status.busy": "2022-04-07T01:33:04.836598Z", "iopub.status.idle": "2022-04-07T02:05:46.752932Z", "shell.execute_reply": "2022-04-07T02:05:46.752064Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MASTERFUL: Training model with semi-supervised learning enabled.\n", "MASTERFUL: Performing basic dataset analysis.\n", "MASTERFUL: Training model with:\n", "MASTERFUL: \t500 labeled examples.\n", "MASTERFUL: \t5000 validation examples.\n", "MASTERFUL: \t0 synthetic examples.\n", "MASTERFUL: \t5000 unlabeled examples.\n", "MASTERFUL: Training model with learned parameters quesadilla-cyclic-offer in two phases.\n", "MASTERFUL: The first phase is supervised training with the learned parameters.\n", "MASTERFUL: The second phase is semi-supervised training to boost performance.\n", "MASTERFUL: Warming up model for supervised training.\n", "MASTERFUL: \tWarming up batch norm statistics (this could take a few minutes).\n", "MASTERFUL: \tWarming up training for 500 steps.\n", "100%|██████████| 500/500 [02:01<00:00, 4.12steps/s]\n", "MASTERFUL: \tValidating batch norm statistics after warmup for stability (this could take a few minutes).\n", "MASTERFUL: Starting Phase 1: Supervised training until the validation loss stabilizes...\n", "Supervised Training: 100%|██████████| 7612/7612 [02:22<00:00, 53.33steps/s] \n", "MASTERFUL: Starting Phase 2: Semi-supervised training until the validation loss stabilizes...\n", "MASTERFUL: Warming up model for semi-supervised training.\n", "MASTERFUL: \tWarming up batch norm statistics (this could take a few minutes).\n", "MASTERFUL: \tWarming up training for 500 steps.\n", "100%|██████████| 500/500 [01:12<00:00, 6.92steps/s]\n", "MASTERFUL: \tValidating batch norm statistics after warmup for stability (this could take a few minutes).\n", "Semi-Supervised Training: 100%|██████████| 21385/21385 [26:15<00:00, 13.57steps/s] \n", "MASTERFUL: Semi-Supervised training complete.\n", "MASTERFUL: Training complete in 32.63983148733775 minutes.\n" ] } ], "source": [ "training_report = masterful.training.train(\n", " classification_model,\n", " model_params,\n", " optimization_params,\n", " regularization_params,\n", " ssl_params,\n", " part1_training_dataset,\n", " training_dataset_params,\n", " validation_dataset,\n", " validation_dataset_params,\n", " unlabeled_datasets=[(part1_unlabeled_dataset, unlabeled_dataset_params)],\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "The model you passed into [masterful.training.train](../api/api_training.rst#masterful.training.train)\n", "is now trained and updated in place, so you are able to evaluate it\n", "just like any other trained Keras model." 
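, "\n", "You can also save or export it like any other Keras model; the\n", "directory name below is just an example:\n", "\n", "```python\n", "classification_model.save(\"masterful_ssl_model\")\n", "```"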
] }, { "cell_type": "code", "execution_count": 11, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-04-07T02:05:46.757795Z", "iopub.status.busy": "2022-04-07T02:05:46.757084Z", "iopub.status.idle": "2022-04-07T02:05:47.687735Z", "shell.execute_reply": "2022-04-07T02:05:47.686955Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "157/157 [==============================] - 1s 5ms/step - loss: 0.7355 - categorical_accuracy: 0.7712\n", "Masterful model accuracy: 0.7712000012397766\n" ] } ], "source": [ "masterful_metrics = classification_model.evaluate(\n", " test_dataset.batch(optimization_params.batch_size), return_dict=True\n", ")\n", "print(f\"Masterful model accuracy: {masterful_metrics['categorical_accuracy']}\")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "If you recall, in [Part 1](../notebooks/guide_ssl_using_unlabeled_data_part1.ipynb)\n", "you ended up with a model that was 25-35% accurate. Now, using self-supervised\n", "pretraining on a much larger unlabeled dataset, followed by semi-supervised\n", "training with the same labeled and unlabeled datasets as Part 1,\n", "you increased that accuracy to over 75%! And all of this occurred with **no additional labels** in\n", "your dataset." ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "guide_ssl_using_unlabeled_data_part2", "private_outputs": false, "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }