{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "# Train with Unlabeled Data via SSL\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)][1]        \n", "[![Download](images/download.png)][2]][Download this Notebook][2]\n", "\n", "[1]: https://colab.research.google.com/github/masterfulai/masterful-docs/blob/main/notebooks/guide_ssl_training.ipynb\n", "[2]: https://docs.masterfulai.com/0.5.2/notebooks/guide_ssl_training.ipynb" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Introduction\n", "\n", "In this guide, you will learn how Masterful automatically uses unlabeled\n", "data to improve model accuracy through semi-supervised learning (SSL).\n", "\n", "SSL is an excellent way to improve your model without the extra cost,\n", "difficulty, and hassle of labeling more data.\n", "\n", "Masterful uses SSL through many of it's APIs. This guide will walk you through\n", "the use of SSL in the [masterful.training.train][1] function, which is the\n", "primary model training function in the API. This API function is also\n", "invoked by the Masterful CLI Trainer, so the Masterful CLI Trainer also provides\n", "full support for using unlabeled data automatically during model training.\n", "\n", "For this guide, you will simulate a small labeled dataset, on\n", "the order of only 500 labeled examples per class. To do this, you will\n", "use a small subset of the CIFAR-10 dataset (10%) as the labeled examples,\n", "and the rest of the dataset as the \"unlabeled\" examples.\n", "\n", "[1]: ../api/api_training.rst#masterful.training.train" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Prerequisites\n", "\n", "Please follow the Masterful installation instructions [here](../markdown/tutorial_installation.md)." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Imports\n", "\n", "First, import the necessary libraries and activate the Masterful package." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-06-20T23:28:06.522742Z", "iopub.status.busy": "2022-06-20T23:28:06.522551Z", "iopub.status.idle": "2022-06-20T23:28:07.512430Z", "shell.execute_reply": "2022-06-20T23:28:07.511926Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MASTERFUL: Your account has been successfully registered. Masterful v0.5.0 is loaded.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/yaoshiang/miniconda3/envs/tf/lib/python3.8/site-packages/tensorflow_addons/utils/ensure_tf_install.py:53: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.7.0 and strictly below 2.10.0 (nightly versions are not supported). \n", " The versions of TensorFlow you are currently using is 2.6.2 and is not supported. \n", "Some things might work, some things might not.\n", "If you were to encounter a bug, do not file an issue.\n", "If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version. 
\n", "You can find the compatibility matrix in TensorFlow Addon's readme:\n", "https://github.com/tensorflow/addons\n", " warnings.warn(\n" ] } ], "source": [ "import numpy as np\n", "import tensorflow as tf\n", "import tensorflow_addons as tfa\n", "\n", "import masterful\n", "\n", "masterful = masterful.activate()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Prepare the Data\n", "\n", "For this guide, you will use only 10% of the CIFAR-10 data as your labeled\n", "dataset, in order to simulate a small of amount of labeled training\n", "data. You will then use 4x that amount of unlabeled data (from the remaining\n", "CIFAR-10 dataset) in order to boost the performance of your model\n", "at training time. Why should you use 4x the amount of unlabeled data?\n", "In practice, we have found diminishing returns from larger amounts of\n", "unlabeled data, and an ideal range is generally between 2-10x the size\n", "of your labeled data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-06-20T23:28:07.528416Z", "iopub.status.busy": "2022-06-20T23:28:07.528128Z", "iopub.status.idle": "2022-06-20T23:28:09.679018Z", "shell.execute_reply": "2022-06-20T23:28:09.678504Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-06-20 16:28:08.730624: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", "2022-06-20 16:28:08.735124: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", "2022-06-20 16:28:08.735536: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", "2022-06-20 16:28:08.736352: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA\n", "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", "2022-06-20 16:28:08.736857: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", "2022-06-20 16:28:08.737375: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", "2022-06-20 16:28:08.737790: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", "2022-06-20 16:28:09.047265: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", "2022-06-20 16:28:09.047716: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", "2022-06-20 16:28:09.048124: I 
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", "2022-06-20 16:28:09.048524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22318 MB memory: -> device: 0, name: GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6\n" ] } ], "source": [ "NUM_CLASSES = 10\n", "(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()\n", "\n", "# Normalize into the [0,1] range for numerical stability.\n", "x_train = x_train.astype(\"float32\") / 255.0\n", "x_test = x_test.astype(\"float32\") / 255.0\n", "\n", "# Masterful does not recommend sparse labels, so convert to categorical.\n", "y_train = tf.keras.utils.to_categorical(y_train, NUM_CLASSES)\n", "y_test = tf.keras.utils.to_categorical(y_test, NUM_CLASSES)\n", "\n", "# Shuffle the data, and take 10% for the labeled dataset,\n", "# and 4x that amount for the unlabeled dataset.\n", "training_percentage = 0.1\n", "unlabeled_multiplier = 4\n", "dataset_size = len(x_train)\n", "indices = np.array(range(dataset_size))\n", "generator = np.random.default_rng(seed=42)\n", "generator.shuffle(indices)\n", "cut = int(training_percentage * dataset_size)\n", "train_indices = indices[:cut]\n", "unlabeled_indices = indices[\n", " cut : cut + int(dataset_size * training_percentage * unlabeled_multiplier)\n", "]\n", "\n", "# Create the datasets from the splits\n", "training_dataset = tf.data.Dataset.from_tensor_slices(\n", " (x_train[train_indices], y_train[train_indices])\n", ")\n", "unlabeled_dataset = tf.data.Dataset.from_tensor_slices((x_train[unlabeled_indices],))\n", "\n", "# Split the test dataset into a test and validation dataset.\n", "# The validation dataset is used for measuring training performance.\n", "indices = np.array(range(len(x_test)))\n", "generator.shuffle(indices)\n", "test_indices = indices[:5000]\n", "validation_indices = indices[5000:]\n", "test_dataset = tf.data.Dataset.from_tensor_slices(\n", " (x_test[test_indices], y_test[test_indices])\n", ")\n", "validation_dataset = tf.data.Dataset.from_tensor_slices(\n", " (x_test[validation_indices], y_test[validation_indices])\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Create the Model\n", "\n", "For this example, you will use a ResNet-18v2 model from\n", "[Identity Mappings in Deep Residual Networks](https://arxiv.org/abs/1603.05027).\n", "ResNets are a standard architecture and, with a good training\n", "methodology, can match state-of-the-art results. In general,\n", "a ResNet-18 is far too large for only 500 labeled examples\n", "per class, and for this guide you could use a much smaller model that\n", "would train much faster and still achieve the same results.\n", "\n", "The only difference between the model defined below and the\n", "ResNet-18 definition in the paper is that the first convolutional layer\n", "has been reduced from a 7x7 convolution to a 3x3 convolution, to\n", "better handle the small input size of CIFAR-10.\n", "\n", "Note that in general, real applications will work with image sizes much\n", "larger than CIFAR-10's 32x32, so you'll want to use an existing\n", "model architecture with pretrained weights from the `tf.keras.applications`\n", "module, as sketched below."
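, "\n", "As an illustration only (not executed in this guide), swapping in a pretrained backbone might look like the sketch below. The choice of ResNet50V2, the ImageNet weights, and the 224x224 input size are assumptions for a hypothetical larger-image task, not part of this guide:\n", "\n", "```python\n", "# Hedged sketch: a pretrained backbone with a fresh classification head.\n", "backbone = tf.keras.applications.ResNet50V2(\n", "    include_top=False, weights=\"imagenet\", input_shape=(224, 224, 3)\n", ")\n", "x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)\n", "x = tf.keras.layers.Dense(NUM_CLASSES)(x)  # Logits, as elsewhere in this guide.\n", "pretrained_model = tf.keras.Model(inputs=backbone.input, outputs=x)\n", "```"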
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-06-20T23:28:09.681647Z", "iopub.status.busy": "2022-06-20T23:28:09.681473Z", "iopub.status.idle": "2022-06-20T23:28:09.885041Z", "shell.execute_reply": "2022-06-20T23:28:09.884493Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model: \"model\"\n", "__________________________________________________________________________________________________\n", "Layer (type) Output Shape Param # Connected to \n", "==================================================================================================\n", "input_1 (InputLayer) [(None, 32, 32, 3)] 0 \n", "__________________________________________________________________________________________________\n", "zero_padding2d (ZeroPadding2D) (None, 38, 38, 3) 0 input_1[0][0] \n", "__________________________________________________________________________________________________\n", "conv2d (Conv2D) (None, 36, 36, 64) 1792 zero_padding2d[0][0] \n", "__________________________________________________________________________________________________\n", "batch_normalization (BatchNorma (None, 36, 36, 64) 256 conv2d[0][0] \n", "__________________________________________________________________________________________________\n", "re_lu (ReLU) (None, 36, 36, 64) 0 batch_normalization[0][0] \n", "__________________________________________________________________________________________________\n", "zero_padding2d_1 (ZeroPadding2D (None, 38, 38, 64) 0 re_lu[0][0] \n", "__________________________________________________________________________________________________\n", "max_pooling2d (MaxPooling2D) (None, 18, 18, 64) 0 zero_padding2d_1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_bn1 (BatchNormaliz (None, 18, 18, 64) 256 max_pooling2d[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_relu1 (ReLU) (None, 18, 18, 64) 0 stage1_unit1_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_conv1 (Conv2D) (None, 18, 18, 64) 36928 stage1_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_bn2 (BatchNormaliz (None, 18, 18, 64) 256 stage1_unit1_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_relu2 (ReLU) (None, 18, 18, 64) 0 stage1_unit1_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_conv2 (Conv2D) (None, 18, 18, 64) 36928 stage1_unit1_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_sc1 (Conv2D) (None, 18, 18, 64) 4160 stage1_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit1_add1 (Add) (None, 18, 18, 64) 0 stage1_unit1_conv2[0][0] \n", " stage1_unit1_sc1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_bn1 (BatchNormaliz (None, 18, 18, 64) 256 stage1_unit1_add1[0][0] \n", 
"__________________________________________________________________________________________________\n", "stage1_unit2_relu1 (ReLU) (None, 18, 18, 64) 0 stage1_unit2_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_conv1 (Conv2D) (None, 18, 18, 64) 36928 stage1_unit2_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_bn2 (BatchNormaliz (None, 18, 18, 64) 256 stage1_unit2_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_relu2 (ReLU) (None, 18, 18, 64) 0 stage1_unit2_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_conv2 (Conv2D) (None, 18, 18, 64) 36928 stage1_unit2_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage1_unit2_add1 (Add) (None, 18, 18, 64) 0 stage1_unit1_add1[0][0] \n", " stage1_unit2_conv2[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_bn1 (BatchNormaliz (None, 18, 18, 64) 256 stage1_unit2_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_relu1 (ReLU) (None, 18, 18, 64) 0 stage2_unit1_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_conv1 (Conv2D) (None, 9, 9, 128) 73856 stage2_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_bn2 (BatchNormaliz (None, 9, 9, 128) 512 stage2_unit1_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_relu2 (ReLU) (None, 9, 9, 128) 0 stage2_unit1_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_conv2 (Conv2D) (None, 9, 9, 128) 147584 stage2_unit1_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_sc1 (Conv2D) (None, 9, 9, 128) 8320 stage2_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit1_add1 (Add) (None, 9, 9, 128) 0 stage2_unit1_conv2[0][0] \n", " stage2_unit1_sc1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_bn1 (BatchNormaliz (None, 9, 9, 128) 512 stage2_unit1_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_relu1 (ReLU) (None, 9, 9, 128) 0 stage2_unit2_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_conv1 (Conv2D) (None, 9, 9, 128) 147584 stage2_unit2_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_bn2 (BatchNormaliz (None, 9, 9, 128) 512 stage2_unit2_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_relu2 (ReLU) (None, 9, 9, 128) 0 
stage2_unit2_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_conv2 (Conv2D) (None, 9, 9, 128) 147584 stage2_unit2_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage2_unit2_add1 (Add) (None, 9, 9, 128) 0 stage2_unit1_add1[0][0] \n", " stage2_unit2_conv2[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_bn1 (BatchNormaliz (None, 9, 9, 128) 512 stage2_unit2_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_relu1 (ReLU) (None, 9, 9, 128) 0 stage3_unit1_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_conv1 (Conv2D) (None, 5, 5, 256) 295168 stage3_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_bn2 (BatchNormaliz (None, 5, 5, 256) 1024 stage3_unit1_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_relu2 (ReLU) (None, 5, 5, 256) 0 stage3_unit1_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_conv2 (Conv2D) (None, 5, 5, 256) 590080 stage3_unit1_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_sc1 (Conv2D) (None, 5, 5, 256) 33024 stage3_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit1_add1 (Add) (None, 5, 5, 256) 0 stage3_unit1_conv2[0][0] \n", " stage3_unit1_sc1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_bn1 (BatchNormaliz (None, 5, 5, 256) 1024 stage3_unit1_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_relu1 (ReLU) (None, 5, 5, 256) 0 stage3_unit2_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_conv1 (Conv2D) (None, 5, 5, 256) 590080 stage3_unit2_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_bn2 (BatchNormaliz (None, 5, 5, 256) 1024 stage3_unit2_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_relu2 (ReLU) (None, 5, 5, 256) 0 stage3_unit2_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_conv2 (Conv2D) (None, 5, 5, 256) 590080 stage3_unit2_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage3_unit2_add1 (Add) (None, 5, 5, 256) 0 stage3_unit1_add1[0][0] \n", " stage3_unit2_conv2[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_bn1 (BatchNormaliz (None, 5, 5, 256) 1024 stage3_unit2_add1[0][0] \n", "__________________________________________________________________________________________________\n", 
"stage4_unit1_relu1 (ReLU) (None, 5, 5, 256) 0 stage4_unit1_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_conv1 (Conv2D) (None, 3, 3, 512) 1180160 stage4_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_bn2 (BatchNormaliz (None, 3, 3, 512) 2048 stage4_unit1_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_relu2 (ReLU) (None, 3, 3, 512) 0 stage4_unit1_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_conv2 (Conv2D) (None, 3, 3, 512) 2359808 stage4_unit1_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_sc1 (Conv2D) (None, 3, 3, 512) 131584 stage4_unit1_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit1_add1 (Add) (None, 3, 3, 512) 0 stage4_unit1_conv2[0][0] \n", " stage4_unit1_sc1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_bn1 (BatchNormaliz (None, 3, 3, 512) 2048 stage4_unit1_add1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_relu1 (ReLU) (None, 3, 3, 512) 0 stage4_unit2_bn1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_conv1 (Conv2D) (None, 3, 3, 512) 2359808 stage4_unit2_relu1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_bn2 (BatchNormaliz (None, 3, 3, 512) 2048 stage4_unit2_conv1[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_relu2 (ReLU) (None, 3, 3, 512) 0 stage4_unit2_bn2[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_conv2 (Conv2D) (None, 3, 3, 512) 2359808 stage4_unit2_relu2[0][0] \n", "__________________________________________________________________________________________________\n", "stage4_unit2_add1 (Add) (None, 3, 3, 512) 0 stage4_unit1_add1[0][0] \n", " stage4_unit2_conv2[0][0] \n", "__________________________________________________________________________________________________\n", "batch_normalization_1 (BatchNor (None, 3, 3, 512) 2048 stage4_unit2_add1[0][0] \n", "__________________________________________________________________________________________________\n", "re_lu_1 (ReLU) (None, 3, 3, 512) 0 batch_normalization_1[0][0] \n", "__________________________________________________________________________________________________\n", "global_average_pooling2d (Globa (None, 512) 0 re_lu_1[0][0] \n", "__________________________________________________________________________________________________\n", "dense (Dense) (None, 10) 5130 global_average_pooling2d[0][0] \n", "==================================================================================================\n", "Total params: 11,189,194\n", "Trainable params: 11,181,258\n", "Non-trainable params: 7,936\n", "__________________________________________________________________________________________________\n" ] } 
], "source": [ "from tensorflow.keras.layers import (\n", " Input,\n", " Add,\n", " Conv2D,\n", " GlobalAveragePooling2D,\n", " MaxPooling2D,\n", " ReLU,\n", " ZeroPadding2D,\n", " BatchNormalization,\n", " Dense,\n", ")\n", "\n", "\n", "def identity_block(x, name, stage, unit, n_filters):\n", " shortcut = x\n", "\n", " x = BatchNormalization(name=name.format(stage, unit, \"bn\", 1))(x)\n", " x = ReLU(name=name.format(stage, unit, \"relu\", 1))(x)\n", " x = Conv2D(\n", " n_filters,\n", " (3, 3),\n", " strides=(1, 1),\n", " padding=\"same\",\n", " kernel_initializer=\"he_uniform\",\n", " name=name.format(stage, unit, \"conv\", 1),\n", " )(x)\n", "\n", " x = BatchNormalization(name=name.format(stage, unit, \"bn\", 2))(x)\n", " x = ReLU(name=name.format(stage, unit, \"relu\", 2))(x)\n", " x = Conv2D(\n", " n_filters,\n", " (3, 3),\n", " strides=(1, 1),\n", " padding=\"same\",\n", " kernel_initializer=\"he_uniform\",\n", " name=name.format(stage, unit, \"conv\", 2),\n", " )(x)\n", "\n", " x = Add(name=name.format(stage, unit, \"add\", 1))([shortcut, x])\n", " return x\n", "\n", "\n", "def projection_block(x, name, stage, unit, strides, n_filters):\n", " x = BatchNormalization(name=name.format(stage, unit, \"bn\", 1))(x)\n", " x = ReLU(name=name.format(stage, unit, \"relu\", 1))(x)\n", " shortcut = Conv2D(\n", " n_filters,\n", " (1, 1),\n", " strides=strides,\n", " kernel_initializer=\"he_uniform\",\n", " name=name.format(stage, unit, \"sc\", 1),\n", " )(x)\n", "\n", " x = Conv2D(\n", " n_filters,\n", " (3, 3),\n", " strides=strides,\n", " padding=\"same\",\n", " kernel_initializer=\"he_uniform\",\n", " name=name.format(stage, unit, \"conv\", 1),\n", " )(x)\n", " x = BatchNormalization(name=name.format(stage, unit, \"bn\", 2))(x)\n", " x = ReLU(name=name.format(stage, unit, \"relu\", 2))(x)\n", " x = Conv2D(\n", " n_filters,\n", " (3, 3),\n", " strides=(1, 1),\n", " padding=\"same\",\n", " kernel_initializer=\"he_uniform\",\n", " name=name.format(stage, unit, \"conv\", 2),\n", " )(x)\n", "\n", " x = Add(name=name.format(stage, unit, \"add\", 1))([x, shortcut])\n", " return x\n", "\n", "\n", "def group(x, name, stage, strides, n_blocks, n_filters):\n", " x = projection_block(\n", " x, name=name, stage=stage, unit=1, strides=strides, n_filters=n_filters\n", " )\n", " for unit in range(n_blocks - 1):\n", " x = identity_block(\n", " x, name=name, stage=stage, unit=unit + 2, n_filters=n_filters\n", " )\n", " return x\n", "\n", "\n", "def resnet18(input_shape, num_classes):\n", " inputs = Input(input_shape)\n", " x = ZeroPadding2D(padding=(3, 3))(inputs)\n", "\n", " x = Conv2D(\n", " 64, (3, 3), strides=(1, 1), padding=\"valid\", kernel_initializer=\"he_uniform\"\n", " )(x)\n", " x = BatchNormalization()(x)\n", " x = ReLU()(x)\n", " x = ZeroPadding2D(padding=(1, 1))(x)\n", " x = MaxPooling2D((3, 3), strides=(2, 2))(x)\n", "\n", " x = group(\n", " x, strides=(1, 1), name=\"stage{}_unit{}_{}{}\", stage=1, n_blocks=2, n_filters=64\n", " )\n", " x = group(\n", " x,\n", " strides=(2, 2),\n", " name=\"stage{}_unit{}_{}{}\",\n", " stage=2,\n", " n_blocks=2,\n", " n_filters=128,\n", " )\n", " x = group(\n", " x,\n", " strides=(2, 2),\n", " name=\"stage{}_unit{}_{}{}\",\n", " stage=3,\n", " n_blocks=2,\n", " n_filters=256,\n", " )\n", " x = group(\n", " x,\n", " strides=(2, 2),\n", " name=\"stage{}_unit{}_{}{}\",\n", " stage=4,\n", " n_blocks=2,\n", " n_filters=512,\n", " )\n", "\n", " x = BatchNormalization()(x)\n", " x = ReLU()(x)\n", "\n", " x = GlobalAveragePooling2D()(x)\n", " x = Dense(num_classes, 
kernel_initializer=\"he_normal\")(x)\n", " return tf.keras.Model(inputs=inputs, outputs=x)\n", "\n", "\n", "INPUT_SHAPE = (32, 32, 3)\n", "NUM_CLASSES = 10\n", "\n", "model = resnet18(INPUT_SHAPE, NUM_CLASSES)\n", "model.summary()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Baseline Training\n", "\n", "To measure the performance improvement from Masterful,\n", "you should first establish a baseline by training your model\n", "with a standard training loop and no unlabeled data. Below, you\n", "will set up a standard training loop with some basic data augmentation\n", "(color space augmentation, random resized crops, and horizontal\n", "mirroring). The hyperparameter values below (learning\n", "rate, epochs, batch size, etc.) were all found using a manual search." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-06-20T23:28:09.887194Z", "iopub.status.busy": "2022-06-20T23:28:09.887017Z", "iopub.status.idle": "2022-06-20T23:28:46.335413Z", "shell.execute_reply": "2022-06-20T23:28:46.334938Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-06-20 16:28:10.174844: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)\n", "2022-06-20 16:28:15.155991: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8100\n", "2022-06-20 16:28:15.580144: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory\n", "2022-06-20 16:28:15.580398: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory\n", "2022-06-20 16:28:15.580432: W tensorflow/stream_executor/gpu/asm_compiler.cc:77] Couldn't get ptxas version string: Internal: Couldn't invoke ptxas --version\n", "2022-06-20 16:28:15.580680: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory\n", "2022-06-20 16:28:15.580722: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Failed to launch ptxas\n", "Relying on driver to perform ptx compilation. \n", "Modify $PATH to customize ptxas location.\n", "This message will be only logged once.\n", "2022-06-20 16:28:16.203576: I tensorflow/stream_executor/cuda/cuda_blas.cc:1760] TensorFloat-32 will be used for the matrix multiplication. 
This will only be logged once.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "20/20 [==============================] - 0s 10ms/step - loss: 1.6297 - categorical_accuracy: 0.5294\n", "Baseline model accuracy: 0.5293999910354614\n" ] } ], "source": [ "\n", "def augment_image(image):\n", " \"\"\"A simple augmentation pipeline.\"\"\"\n", " image = tf.image.random_brightness(image, 0.1)\n", " image = tf.image.random_hue(image, 0.1)\n", " image = tf.image.random_crop(image, size=[28, 28, 3])\n", " image = tf.image.resize(image, size=[32, 32])\n", " image = tf.image.random_flip_left_right(image)\n", " return image\n", "\n", "\n", "model.compile(\n", " optimizer=tfa.optimizers.LAMB(learning_rate=0.001),\n", " loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),\n", " metrics=[tf.keras.metrics.CategoricalAccuracy()],\n", ")\n", "\n", "batch_size = 256\n", "shuffle_buffer_size = 500\n", "epochs = 30\n", "model.fit(\n", " training_dataset.shuffle(shuffle_buffer_size)\n", " .map(lambda image, label: (augment_image(image), label))\n", " .batch(batch_size),\n", " validation_data=validation_dataset.batch(batch_size),\n", " epochs=epochs,\n", " verbose=0,\n", ")\n", "baseline_metrics = model.evaluate(test_dataset.batch(batch_size), return_dict=True)\n", "print(f\"Baseline model accuracy: {baseline_metrics['categorical_accuracy']}\")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Setup Masterful\n", "\n", "The Masterful AutoML platform learns how to train your model by\n", "focusing on five core organizational principles in deep\n", "learning: architecture, data, optimization, regularization,\n", "and semi-supervision.\n", "\n", "**Architecture** is the structure of weights, biases, and activations\n", "that define a model. In this example, the architecture is defined by the model you created above.\n", "\n", "**Data** is the input used to train the model. In this example, you\n", "are using a labeled training dataset, [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html),\n", "along with unlabeled data drawn from the same distribution. The Masterful AutoML platform can take\n", "synthetic data into account as well, using a variety of different techniques.\n", "\n", "**Optimization** means finding the best weights for a model and\n", "training data. Optimization is different from regularization because\n", "optimization does not consider generalization to unseen data. The\n", "central challenge of optimization is speed: find the best weights\n", "faster.\n", "\n", "**Regularization** means helping a model generalize to data it has\n", "not yet seen. Another way of saying this is that regularization is\n", "about fighting overfitting.\n", "\n", "**Semi-Supervision** is the process by which a model can be trained\n", "using both labeled and unlabeled data.\n", "\n", "The first step when using Masterful is to learn the optimal set of\n", "parameters for each of the five buckets above. You start by learning\n", "the architecture and data parameters of the model and training dataset. In the code below, you are telling Masterful that your model is performing a classification task (`masterful.enums.Task.CLASSIFICATION`) with 10 labels (`num_classes=NUM_CLASSES`), and that the input images going into your model are in the range [0,1] (`input_range=masterful.enums.ImageRange.ZERO_ONE`). 
Also, the model outputs logits rather than a softmax classification (`prediction_logits=True`).\n", "\n", "Furthermore, in the training dataset, you are providing dense labels\n", "(`sparse_labels=False`) rather than sparse labels.\n", "\n", "For more details on architecture and data parameters, see the API\n", "specifications for [ArchitectureParams](../api/api_architecture.rst#masterful.architecture.ArchitectureParams) and\n", "[DataParams](../api/api_data.rst#masterful.data.DataParams)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-06-20T23:28:46.337483Z", "iopub.status.busy": "2022-06-20T23:28:46.337202Z", "iopub.status.idle": "2022-06-20T23:28:49.066699Z", "shell.execute_reply": "2022-06-20T23:28:49.064140Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Log API_EVENT (400): {'app_exception': 'InvalidUUID', 'context': {'message': 'account id or password with bad format.'}}\n", "Log API_EVENT (400): {'app_exception': 'InvalidUUID', 'context': {'message': 'account id or password with bad format.'}}\n", "Log API_EVENT (400): {'app_exception': 'InvalidUUID', 'context': {'message': 'account id or password with bad format.'}}\n", "Log API_EVENT (400): {'app_exception': 'InvalidUUID', 'context': {'message': 'account id or password with bad format.'}}\n" ] } ], "source": [ "# Start fresh with a new model\n", "tf.keras.backend.clear_session()\n", "model = resnet18(INPUT_SHAPE, NUM_CLASSES)\n", "model_params = masterful.architecture.learn_architecture_params(\n", " model=model,\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " input_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " prediction_logits=True,\n", ")\n", "training_dataset_params = masterful.data.learn_data_params(\n", " dataset=training_dataset,\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " image_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " sparse_labels=False,\n", ")\n", "validation_dataset_params = masterful.data.learn_data_params(\n", " dataset=validation_dataset,\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " image_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " sparse_labels=False,\n", ")\n", "unlabeled_dataset_params = masterful.data.learn_data_params(\n", " dataset=unlabeled_dataset,\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " image_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " sparse_labels=None,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "Next, you learn the optimization parameters that will be used to train\n", "the model. Below, you use Masterful to learn the standard set of\n", "optimization parameters to train your model for a classification task.\n", "\n", "For more details on the optimization parameters, please see the [OptimizationParams](../api/api_optimization.rst#masterful.optimization.OptimizationParams) API specification."
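, "\n", "Once the next cell has run, the returned `OptimizationParams` object can be inspected like any other Python object. A minimal sketch; `batch_size` is the attribute this guide reuses later when batching the test dataset for evaluation:\n", "\n", "```python\n", "# Run after the following cell: inspect a learned optimization parameter.\n", "print(optimization_params.batch_size)\n", "```"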
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-06-20T23:28:49.094527Z", "iopub.status.busy": "2022-06-20T23:28:49.094361Z", "iopub.status.idle": "2022-06-20T23:29:14.396151Z", "shell.execute_reply": "2022-06-20T23:29:14.395784Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Log API_EVENT (400): {'app_exception': 'InvalidUUID', 'context': {'message': 'account id or password with bad format.'}}\n", "Callbacks: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:24<00:00, 3.05s/steps]\n" ] } ], "source": [ "optimization_params = masterful.optimization.learn_optimization_params(\n", " model,\n", " model_params,\n", " training_dataset,\n", " training_dataset_params,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "The regularization parameters used can have a dramatic impact on the\n", "final performance of your trained model. Learning these parameters can\n", "be a time-consuming and domain specific challenge. Masterful can speed\n", "up this process by learning these parameters for you. In general, this\n", "can be an expensive operation. A rough order of magnitude for learning\n", "these parameters is 2x the time it takes to train your model. However,\n", "this is still dramatically faster than manually finding these\n", "parameters yourself. In the example below, you will use one of the\n", "many sets of pre-learned regularization parameters that are shipped\n", "in the Masterful API. In most instances, you should learn these\n", "parameters directly using the [learn_regularization_params](../api/api_regularization.rst#masterful.regularization.learn_regularization_params) API.\n", "\n", "For more details on the regularization parameters, please see the\n", "[RegularizationParams](../api/api_regularization.rst#masterful.regularization.RegularizationParams) API specification." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-06-20T23:29:14.398200Z", "iopub.status.busy": "2022-06-20T23:29:14.398036Z", "iopub.status.idle": "2022-06-20T23:29:14.400241Z", "shell.execute_reply": "2022-06-20T23:29:14.399893Z" } }, "outputs": [], "source": [ "# This is a set of parameters learned on CIFAR10 for\n", "# for ResNet18 models.\n", "regularization_params = masterful.regularization.parameters.CIFAR10_RESNET18" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "The final step before training is to learn the optimal set of\n", "semi-supervision parameters. In this example, Masterful will\n", "apply [Noisy Student Training](https://arxiv.org/abs/1911.04252)\n", "to improve your model during training with the provided unlabeled\n", "data.\n", "\n", "For more details on the semi-supervision parameters, please see the\n", "[SemiSupervisedParams](../api/api_ssl.rst#masterful.ssl.SemiSupervisedParams) API specification." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-06-20T23:29:14.402199Z", "iopub.status.busy": "2022-06-20T23:29:14.402045Z", "iopub.status.idle": "2022-06-20T23:29:15.177901Z", "shell.execute_reply": "2022-06-20T23:29:15.177522Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Log API_EVENT (400): {'app_exception': 'InvalidUUID', 'context': {'message': 'account id or password with bad format.'}}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-06-20 16:29:15.175503: W tensorflow/core/data/root_dataset.cc:167] Optimization loop failed: Cancelled: Operation was cancelled\n" ] } ], "source": [ "ssl_params = masterful.ssl.learn_ssl_params(\n", " training_dataset,\n", " training_dataset_params,\n", " unlabeled_datasets=[(unlabeled_dataset, unlabeled_dataset_params)],\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "## Training with Unlabeled Data\n", "\n", "Now, you are ready to train your model using Masterful.\n", "In the next cell, you will see the call to\n", "[masterful.training.train](../api/api_training.rst#masterful.training.train),\n", "which is the entry point to the meta-learning engine of the Masterful AutoML\n", "platform." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-06-20T23:29:15.179776Z", "iopub.status.busy": "2022-06-20T23:29:15.179632Z", "iopub.status.idle": "2022-06-20T23:39:19.922142Z", "shell.execute_reply": "2022-06-20T23:39:19.919501Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Log API_EVENT (400): {'app_exception': 'InvalidUUID', 'context': {'message': 'account id or password with bad format.'}}\n", "MASTERFUL [16:29:15]: Training model with semi-supervised learning enabled.\n", "MASTERFUL [16:29:16]: Performing basic dataset analysis.\n", "MASTERFUL [16:29:16]: Training model with:\n", "MASTERFUL [16:29:16]: \t5000 labeled examples.\n", "MASTERFUL [16:29:16]: \t5000 validation examples.\n", "MASTERFUL [16:29:16]: \t0 synthetic examples.\n", "MASTERFUL [16:29:16]: \t20000 unlabeled examples.\n", "MASTERFUL [16:29:17]: Training model with learned parameters mollusk-coral-elephant in two phases.\n", "MASTERFUL [16:29:17]: The first phase is supervised training with the learned parameters.\n", "MASTERFUL [16:29:17]: The second phase is semi-supervised training to boost performance.\n", "MASTERFUL [16:29:18]: Warming up model for supervised training.\n", "MASTERFUL [16:29:21]: \tWarming up batch norm statistics (this could take a few minutes).\n", "MASTERFUL [16:29:52]: \tWarming up training for 500 steps.\n", "100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:47<00:00, 10.63steps/s]\n", "MASTERFUL [16:30:39]: \tValidating batch norm statistics after warmup for stability (this could take a few minutes).\n", "MASTERFUL [16:30:53]: Starting Phase 1: Supervised training until the validation loss stabilizes...\n", "Supervised Training: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1760/1760 [01:17<00:00, 22.67steps/s]\n", "MASTERFUL [16:32:14]: Starting Phase 2: Semi-supervised training until the validation loss stabilizes...\n", "MASTERFUL [16:32:14]: Warming up model for semi-supervised training.\n", "MASTERFUL 
[16:32:15]: \tWarming up batch norm statistics (this could take a few minutes).\n", "MASTERFUL [16:32:45]: \tWarming up training for 500 steps.\n", "100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:25<00:00, 19.78steps/s]\n", "MASTERFUL [16:33:11]: \tValidating batch norm statistics after warmup for stability (this could take a few minutes).\n", "Semi-Supervised Training: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5192/5192 [05:34<00:00, 15.50steps/s]\n", "MASTERFUL [16:39:10]: Semi-Supervised training complete.\n", "MASTERFUL [16:39:10]: Training complete in 9.887277317047118 minutes.\n" ] } ], "source": [ "training_report = masterful.training.train(\n", " model,\n", " model_params,\n", " optimization_params,\n", " regularization_params,\n", " ssl_params,\n", " training_dataset,\n", " training_dataset_params,\n", " validation_dataset,\n", " validation_dataset_params,\n", " unlabeled_datasets=[(unlabeled_dataset, unlabeled_dataset_params)],\n", ")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "The model you passed into [masterful.training.train](../api/api_training.rst#masterful.training.train)\n", "is now trained and updated in place, so you are able to evaluate it\n", "just like any other trained Keras model." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab_type": "code", "execution": { "iopub.execute_input": "2022-06-20T23:39:19.929734Z", "iopub.status.busy": "2022-06-20T23:39:19.929585Z", "iopub.status.idle": "2022-06-20T23:39:20.200530Z", "shell.execute_reply": "2022-06-20T23:39:20.200149Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "20/20 [==============================] - 0s 10ms/step - loss: 1.1407 - categorical_accuracy: 0.6382\n", "Baseline model accuracy: 0.5293999910354614\n", "Masterful model accuracy: 0.6381999850273132\n" ] } ], "source": [ "masterful_metrics = model.evaluate(\n", " test_dataset.batch(optimization_params.batch_size), return_dict=True\n", ")\n", "print(f\"Baseline model accuracy: {baseline_metrics['categorical_accuracy']}\")\n", "print(f\"Masterful model accuracy: {masterful_metrics['categorical_accuracy']}\")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "As you can see, you boosted your accuracy from ~53% to ~64%\n", "(results may vary depending on your run) simply by\n", "using unlabeled data with Masterful." ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "guide_ssl_training", "private_outputs": false, "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 0 }