{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Optimization with Masterful\n", "\n", "**Author:** [Nikhil Gajendrakumar](mailto:nikhil@masterfulai.com) \n", "**Date created:** 2022/04/27 \n", "**Last modified:** 2022/05/04 \n", "**Description:** Optimization with Masterful\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)][1]        [![Download](images/download.png)][2][Download this Notebook][2]\n", "\n", "[1]:https://colab.research.google.com/github/masterfulai/masterful-docs/blob/main/notebooks/guide_batch_lr_finder.ipynb\n", "[2]:http://docs.masterfulai.com/0.4.1/notebooks/guide_batch_lr_finder.ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This guide introduces Masterful's automatic tuning of algorithmic hyperparameters. Hyperparameters such as learning rate and batch size control the training process of a model. Given these hyperparameters, the training algorithm learns the model parameters from the data. Picking the right values for these hyperparameters can be painful. The usual approach is an arduous trial-and-error loop: you try different values, train for a few epochs, and pick the values that achieve the best validation accuracy. This process is tedious, and if you modify the model architecture, you have to repeat it from scratch.\n", "\n", "This guide demonstrates how Masterful algorithmically picks values for these hyperparameters, for the given model and dataset, that achieve the best performance in the shortest training time." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "Please follow the Masterful installation instructions [here](../tutorials/tutorial_installation.md)\n", "before running this guide." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports\n", "\n", "First, import the necessary libraries and register the Masterful package." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MASTERFUL: Your account has been successfully registered. Masterful v0.4.1 is loaded.\n" ] } ], "source": [ "import numpy as np\n", "import tensorflow as tf\n", "import tensorflow_addons as tfa\n", "import tensorflow_datasets as tfds\n", "import matplotlib.pyplot as plt\n", "from time import time\n", "import masterful\n", "masterful = masterful.register()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prepare the Data\n", "\n", "For this guide, you will use the SVHN (Street View House Numbers) dataset, a digit recognition dataset of images taken from real-world house numbers. Images are cropped to 32x32. The full dataset consists of 73,257 images for training and 26,032 for testing; the code below loads only the first 5% of each split.\n", "\n", "You can verify that the data was downloaded correctly by displaying a few examples from the dataset. The first step in any ML project is some exploratory data analysis (EDA). The bare minimum is to visualize the images and understand their contents." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "train_dataset = tfds.load(\"svhn_cropped\",\n", " split=\"train[:5%]\",\n", " as_supervised=True)\n", "test_dataset = tfds.load(\"svhn_cropped\",\n", " split=\"test[:5%]\",\n", " as_supervised=True)\n", "\n", "f, axarr = plt.subplots(4, 4, figsize=(10,10))\n", "\n", "for i,(x,y) in enumerate(train_dataset.take(4*4)):\n", " row = i // 4\n", " col = i % 4\n", " axarr[row, col].imshow(x.numpy())\n", " axarr[row, col].title.set_text(f'{y.numpy()}')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# set image range to [0-1]\n", "train_dataset = train_dataset.map(\n", " lambda image, label:\n", " (tf.cast(image, tf.float32) / 255.0, label),\n", " num_parallel_calls=tf.data.AUTOTUNE\n", ")\n", "\n", "test_dataset = test_dataset.map(\n", " lambda image, label:\n", " (tf.cast(image, tf.float32) / 255.0, label),\n", " num_parallel_calls=tf.data.AUTOTUNE\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create the Model\n", "For this example, we will use a simple model inspired by simple CNN used in\n", "[this tensorflow tutorial](https://www.tensorflow.org/tutorials/images/cnn). This is a toy model for demonstration purposes only, and should not be used in a production environment. It has a few convolutional layers and outputs logits directly, rather than using a softmax layer at the end." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "NUM_CLASSES = 10\n", "\n", "def get_model():\n", "\n", " model = tf.keras.models.Sequential([\n", " tf.keras.layers.Input(shape=(32, 32, 3)),\n", " tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),\n", " tf.keras.layers.MaxPooling2D(),\n", " tf.keras.layers.Conv2D(128, 3, padding='same', activation='relu'),\n", " tf.keras.layers.MaxPooling2D(),\n", " tf.keras.layers.Conv2D(256, 3, padding='same', activation='relu'),\n", " tf.keras.layers.MaxPooling2D(),\n", " tf.keras.layers.Conv2D(512, 3, padding='same', activation='relu'),\n", " tf.keras.layers.MaxPooling2D(),\n", " tf.keras.layers.Conv2D(1024, 3, padding='same', activation='relu'),\n", " tf.keras.layers.MaxPooling2D(),\n", " tf.keras.layers.Flatten(),\n", " tf.keras.layers.Dense(256, activation='relu'),\n", " tf.keras.layers.Dense(128, activation='relu'),\n", " tf.keras.layers.Dense(NUM_CLASSES)\n", " ])\n", "\n", " return model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baseline Training\n", "\n", "In order to measure the performance improvements from Masterful,\n", "you should measure the performance of your model after training\n", "with a standard training loop.\n", "\n", "The hyperparameter values learning rate = 0.001 and batch size = 32\n", "were picked since they are the most popular starting values that\n", "have produced reasonable results across different model architectures\n", "and datasets" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def get_lr_metric(optimizer):\n", " def lr(y_true, y_pred):\n", " return optimizer.lr\n", " return lr" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# setup callback to log the total training time\n", "class TrainingTime(tf.keras.callbacks.Callback):\n", " def __init__(self, batch_size, lr, logs=None):\n", " self.time_per_epoch = []\n", " 
self.batch_size = batch_size\n", " self.learning_rate = lr\n", "\n", " def on_epoch_begin(self, epoch, logs=None):\n", " self.starttime = time()\n", "\n", " def on_epoch_end(self, epoch, logs=None):\n", " self.time_per_epoch.append(time() - self.starttime)\n", " \n", " def on_train_end(self, logs=None):\n", " total_train_time = sum(self.time_per_epoch)\n", " msg = (f\"Training time with batch size = {self.batch_size} and \"\n", " f\"learning rate = {self.learning_rate} is {total_train_time:.2f} s.\")\n", " print(msg)\n", " " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# set few common parameters for both experiments\n", "\n", "epochs = 100\n", "shuffle_buffer_size = 1024\n", "optimizer = tfa.optimizers.LAMB(learning_rate=0.001)\n", "lr_metric = get_lr_metric(optimizer)\n", "reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss',\n", " factor=0.5,\n", " patience=8)\n", "early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=12)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training time with batch size = 32 and learning rate = 0.001 is 122.04 s.\n", "Baseline model Train accuracy: 0.991264, and Test accuracy: 0.791091\n" ] } ], "source": [ "batch_size = 32\n", "learning_rate = 0.001\n", "\n", "tf.keras.backend.set_value(optimizer.lr, learning_rate)\n", "\n", "base_model = get_model()\n", "base_model.compile(optimizer=optimizer,\n", " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", " metrics=['accuracy', lr_metric])\n", "\n", "\n", "base_history = base_model.fit(train_dataset.shuffle(shuffle_buffer_size)\n", " .batch(batch_size),\n", " epochs=epochs,\n", " validation_data=test_dataset.batch(64),\n", " callbacks=[\n", " reduce_lr,\n", " early_stop,\n", " TrainingTime(batch_size,learning_rate)\n", " ],\n", " verbose=0)\n", "\n", "msg = (f\"Baseline model 
Train accuracy: {max(base_history.history['accuracy']):4f}, \"\n", " f\"and Test accuracy: {max(base_history.history['val_accuracy']):4f}\")\n", "print(msg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Automatic hyperparameter value selection\n", "\n", "The Masterful AutoML platform learns how to train your model by\n", "focusing on five core organizational principles in deep\n", "learning: architecture, data, optimization, regularization,\n", "and semi-supervision.\n", "\n", "In this guide we are only demonstrating Masterful's optimizer and automatic\n", "hyperparameter value selection, which is part of our optimization bucket,\n", "and the performance benefit we achieve from the right hyperparameter values\n", "alone in terms of training time and accuracy. Therefore we only run the\n", "Masterful APIs that are necessary to get the hyperparameter values and\n", "train the model with a standard TensorFlow training loop.\n", "\n", "Masterful's meta-learner for optimization hyperparameters is tailored\n", "to your model architecture and data. So the first step is to create\n", "[ArchitectureParams](../api/api_architecture.rst#masterful.architecture.ArchitectureParams)\n", "and [DataParams](../api/api_data.rst#masterful.data.DataParams) via their\n", "respective learner functions. \n", "\n", "In the code below, you are telling Masterful that your model is \n", "performing a classification task (`masterful.enums.Task.CLASSIFICATION`) \n", "with 10 labels (`num_classes=NUM_CLASSES`), and that the \n", "image features going into your model are \n", "in the range [0,1] (`input_range=masterful.enums.ImageRange.ZERO_ONE`). \n", "Also, the model outputs logits rather than a softmax classification\n", " (`prediction_logits=True`)." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Start fresh with a new model\n", "tf.keras.backend.clear_session()\n", "model = get_model()\n", "\n", "model_params = masterful.architecture.learn_architecture_params(\n", " model=model,\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " input_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " prediction_logits=True,\n", ")\n", "training_dataset_params = masterful.data.learn_data_params(\n", " dataset=train_dataset,\n", " task=masterful.enums.Task.CLASSIFICATION,\n", " image_range=masterful.enums.ImageRange.ZERO_ONE,\n", " num_classes=NUM_CLASSES,\n", " sparse_labels=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Meta-Learning Optimization Hyperparameters\n", "\n", "Now, call the learner for optimization hyperparameters. \n", "\n", "For more details on the optimization parameters, please see the [OptimizationParams](../api/api_optimization.rst#masterful.optimization.OptimizationParams) API specification.\n", "\n", "To use our platform to train the model end to end, refer to our [Quickstart guide](https://masterful-public.s3.us-west-1.amazonaws.com/933013963/0.4.1/notebooks/tutorial_quickstart.html#).\n", "\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MASTERFUL: Learning optimal batch size.\n", "MASTERFUL: Learning optimal initial learning rate for batch size 256.\n" ] } ], "source": [ "optimization_params = masterful.optimization.learn_optimization_params(\n", " model,\n", " model_params,\n", " train_dataset,\n", " training_dataset_params,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training time with batch size = 256 and 
learning rate = 0.0035355337895452976 is 64.27 s.\n", "Masterful model Train accuracy: 0.998635, and Test accuracy: 0.803379\n" ] } ], "source": [ "batch_size = optimization_params.batch_size\n", "learning_rate = optimization_params.learning_rate\n", "\n", "# set starting optimizer learning rate\n", "tf.keras.backend.set_value(optimizer.lr, learning_rate)\n", "lr_metric = get_lr_metric(optimizer)\n", "\n", "model.compile(optimizer=optimizer,\n", " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", " metrics=['accuracy', lr_metric])\n", "\n", "history = model.fit(train_dataset.shuffle(shuffle_buffer_size)\n", " .batch(batch_size),\n", " epochs=epochs,\n", " validation_data=test_dataset.batch(64),\n", " callbacks=[\n", " reduce_lr,\n", " early_stop,\n", " TrainingTime(batch_size, learning_rate)\n", " ],\n", " verbose=0)\n", "\n", "msg = (f\"Masterful model Train accuracy: {max(history.history['accuracy']):4f}, \"\n", " f\"and Test accuracy: {max(history.history['val_accuracy']):4f}\")\n", "print(msg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the model trained with Masterful's hyperparameters achieved better training and test accuracy in roughly half the training time (64.27 s versus 122.04 s), just by selecting the right values for batch size and learning rate. To learn more about the theory behind how we automatically find these hyperparameter values, please refer to [this blog post](https://www.masterfulai.com/blog/stop-burning-money-on-the-wrong-batch-size).\n", "\n", "Masterful's full optimization suite can reduce training time and increase accuracy many times over by customizing your training loop. To learn more, please check out our [docs](https://masterful-public.s3.us-west-1.amazonaws.com/933013963/0.4.1/index.html).\n", "\n", "We would love to hear your thoughts on this guide or the entire Masterful platform. Join our [Slack community](https://www.masterfulai.com/community) and let us know what you think." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 2 }