Semantic Segmentation of Dogs and Cats with the Masterful Python API

This guide walks you through using the Masterful Python API to train a model for semantic segmentation.

This guide is inspired by the TensorFlow Image Segmentation guide. You will use the same dataset (Oxford IIIT Pets) and model (a modified U-Net) as that guide, so that you can compare training with and without Masterful side by side.

Prerequisites

Please follow the Masterful installation instructions before running this guide.

Import the Necessary Libraries

In this guide, you will use TensorFlow as the training infrastructure and TensorFlow Datasets to provide the training data. You will use Keras on top of TensorFlow to build the model. Because you are using the same model as the Image Segmentation guide, a few imports are needed to reproduce the model definition from there.

[1]:
import tensorflow as tf
import tensorflow_datasets as tfds

# Use Keras to build the individual model layers
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, MaxPool2D, Conv2DTranspose, Concatenate, Input
from tensorflow.keras.models import Model

Import Masterful

Import and register the Masterful package.

[2]:
import masterful
masterful = masterful.register()
Loaded Masterful version 0.4.0. This software is distributed free of
charge for personal projects and evaluation purposes.
See http://www.masterfulai.com/personal-and-evaluation-agreement for details.
Sign up in the next 45 days at https://www.masterfulai.com/get-it-now
to continue using Masterful.

Prepare the Dataset

The dataset is available from TensorFlow Datasets. The segmentation masks are included in version 3+.

[3]:
INPUT_SHAPE = (128, 128, 3)
dataset_splits, info = tfds.load('oxford_iiit_pet:3.*.*', with_info=True, split=['train', 'test'])

TensorFlow Datasets returns a dictionary of features for each example, but Masterful requires an explicit (image, label) tuple for input examples, similar to the input you would provide to Keras model.fit(). Therefore, the first task is to extract the image and label (in this case the segmentation mask) from the feature dictionary.

After the image and label have been extracted, you need to standardize the data for the model you are using. In this case, the U-Net model you have chosen expects each input image to be square. In addition, the image color space should be RGB floats in the range [0,1].

Finally, you need to create the labels (segmentation masks) you will use for supervised training. The Oxford Pets dataset consists of images of 37 pet breeds, with 200 images per breed (~100 each in the training and test splits). Each image includes the corresponding labels, and pixel-wise masks. The masks are class-labels for each pixel. Each pixel is given one of three categories:

  • Class 1: Pixel belonging to the pet.

  • Class 2: Pixel bordering the pet.

  • Class 3: None of the above/a surrounding pixel.

For convenience, you will convert the class labels from [1,2,3] to [0,1,2] similar to the Image Segmentation guide, and you will then convert them to one-hot class labels, because Masterful performs better with dense rather than sparse labels.
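
The label conversion described above can be illustrated on a tiny mask. The following is a minimal sketch that uses NumPy instead of TensorFlow so that it runs on its own; the helper name to_one_hot and the toy mask are assumptions made for illustration and are not part of the guide's pipeline, which uses tf.one_hot.

```python
import numpy as np

def to_one_hot(mask, num_classes=3):
    """Shift 1-based class labels to 0-based and one-hot encode them.

    `mask` has shape (height, width) with values in {1, 2, 3}.
    Returns a float32 array of shape (height, width, num_classes).
    """
    zero_based = mask.astype(np.int64) - 1
    # Indexing an identity matrix with the labels yields one-hot rows.
    return np.eye(num_classes, dtype=np.float32)[zero_based]

# A 2x2 toy mask: pet, border, background, pet.
toy_mask = np.array([[1, 2],
                     [3, 1]])
dense = to_one_hot(toy_mask)
print(dense.shape)   # (2, 2, 3)
print(dense[0, 1])   # [0. 1. 0.]  (border pixel -> class index 1)

# argmax along the channel axis recovers the sparse 0-based labels.
sparse = dense.argmax(axis=-1)
```

The dense form adds a trailing channel axis with one entry per class, which matches the 3 channel output of the model defined later in this guide.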

[4]:
def extract_image_and_label(features):
  """
  Extracts the image and segmentation mask from the feature dictionary,
  and applies minimal normalization (resizing and label standardization).
  """
  image = tf.image.resize(features['image'], (INPUT_SHAPE[0], INPUT_SHAPE[1]))
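  # Resize the mask with nearest-neighbor sampling so that class labels
  # are never blended into values that were not in the original mask
  # (the default bilinear method would interpolate between labels).
  mask = tf.image.resize(
      features['segmentation_mask'],
      (INPUT_SHAPE[0], INPUT_SHAPE[1]),
      method='nearest')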

  # Convert the image into the [0,1] RGB color space.
  image = tf.cast(image, tf.float32) / 255.0

  # For convenience, convert the segmentation mask into
  # [0,1,2] class labels.
  mask -= 1

  # Convert to one-hot labels in the mask.
  mask = tf.one_hot(tf.cast(tf.squeeze(mask, axis=-1), tf.int32), depth=3)
  return image, mask
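
One detail worth keeping in mind when resizing class-label masks: tf.image.resize defaults to bilinear interpolation, which blends neighboring values, while passing method='nearest' only copies existing pixels. The sketch below imitates both behaviors on a single row of labels with plain NumPy. The averaging step is a simplified stand-in for linear interpolation, not TensorFlow's exact resampling kernel.

```python
import numpy as np

# One row of a label mask: 1 = pet, 3 = background (no border pixels).
row = np.array([1, 3, 3, 3], dtype=np.float32)

# Downsample by 2 by averaging neighbors, which is roughly what a
# linear interpolation does: labels get blended together.
averaged = row.reshape(-1, 2).mean(axis=1)

# Downsample by 2 by keeping one existing pixel per pair (nearest neighbor).
nearest = row[::2]

print(averaged)  # [2. 3.]  -> a "border" label (2) appears from nowhere
print(nearest)   # [1. 3.]  -> only labels present in the original mask
```

Blending is harmless for the RGB image, but for a mask it can create class labels that no annotator assigned.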

Make sure to apply the extract_image_and_label function to both the training and test datasets, so that model training and evaluation both see the same input.

[6]:
training_dataset = dataset_splits[0]
test_dataset = dataset_splits[1]

training_dataset = training_dataset.map(extract_image_and_label, num_parallel_calls=tf.data.AUTOTUNE)
test_dataset = test_dataset.map(extract_image_and_label, num_parallel_calls=tf.data.AUTOTUNE)

# The Oxford Pets dataset on TF Datasets contains a few corrupted images,
# as well as several images that do not decode correctly. Apply
# the ignore_errors() operator here to eliminate the annoying warnings
# that will show up in the console due to these errors.
training_dataset = training_dataset.apply(tf.data.experimental.ignore_errors())
test_dataset = test_dataset.apply(tf.data.experimental.ignore_errors())

Define the Model (U-Net)

The model being used here is a modified U-Net. A U-Net consists of an encoder (downsampler) and decoder (upsampler). In this guide, you are using a very simple encoder/decoder architecture, to demonstrate the principles of using Masterful. This model should not be used in a production environment.

[8]:
def simple_unet(input_shape):
  """
  Creates a simple UNet encoder-decoder architecture, with a
  3 channel output (logits based) for semantic segmentation.
  Assumes an input/output size of (128,128).
  """

  def conv_block(input, num_filters):
    x = Conv2D(num_filters, 3, padding="same")(input)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    x = Conv2D(num_filters, 3, padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    return x

  def encoder_block(input, num_filters):
    x = conv_block(input, num_filters)
    p = MaxPool2D((2, 2))(x)
    return x, p

  def decoder_block(input, skip_features, num_filters):
    x = Conv2DTranspose(num_filters, (2, 2), strides=2, padding="same")(input)
    x = Concatenate()([x, skip_features])
    x = conv_block(x, num_filters)
    return x

  inputs = Input(input_shape)

  s1, p1 = encoder_block(inputs, 32)
  s2, p2 = encoder_block(p1, 64)
  s3, p3 = encoder_block(p2, 128)
  s4, p4 = encoder_block(p3, 128)

  b1 = conv_block(p4, 128)

  d1 = decoder_block(b1, s4, 128)
  d2 = decoder_block(d1, s3, 128)
  d3 = decoder_block(d2, s2, 64)
  d4 = decoder_block(d3, s1, 32)

  outputs = Conv2D(3, 1, padding="same")(d4)
  model = Model(inputs, outputs, name="U-Net")
  return model

# Build the model for the (128, 128, 3) input shape defined earlier.
model = simple_unet(INPUT_SHAPE)
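
To see how the encoder and decoder mirror each other, it can help to trace the feature map sizes through the network. The following is a small pure-Python sketch; the function trace_unet_shapes is hypothetical and only mirrors the filter counts used in simple_unet above. It assumes, as in that model, that each "same"-padded convolution keeps the spatial size, each 2x2 max pool halves it, and each stride-2 transposed convolution doubles it.

```python
def trace_unet_shapes(input_size=128,
                      encoder_filters=(32, 64, 128, 128),
                      bottleneck_filters=128,
                      decoder_filters=(128, 128, 64, 32)):
    """Return (stage name, spatial size, channels) for each U-Net stage."""
    stages = []
    size = input_size
    for i, filters in enumerate(encoder_filters, start=1):
        # The conv block keeps the spatial size ("same" padding) ...
        stages.append((f"s{i}", size, filters))
        # ... and the 2x2 max pool halves it for the next stage.
        size //= 2
    stages.append(("b1", size, bottleneck_filters))
    for i, filters in enumerate(decoder_filters, start=1):
        # The stride-2 transposed convolution doubles the spatial size.
        size *= 2
        stages.append((f"d{i}", size, filters))
    return stages

for name, size, channels in trace_unet_shapes():
    print(f"{name}: {size}x{size}x{channels}")
```

Each decoder stage ends at the same spatial size as the encoder stage it is concatenated with (d1 with s4, d2 with s3, and so on), which is what makes the skip connections possible.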

Train the Model

The Masterful AutoML platform learns how to train your model by focusing on five core organizational principles in deep learning: architecture, data, optimization, regularization, and semi-supervision.

Architecture is the structure of weights, biases, and activations that define a model. In this example, the architecture is defined by the model you created above.

Data is the input used to train the model. In this example, you are using a labeled training dataset of pet images (Oxford IIIT Pets). More advanced usages of the Masterful AutoML platform can take into account unlabeled and synthetic data as well, using a variety of different techniques.

Optimization means finding the best weights for a model and training data. Optimization is different from regularization because optimization does not consider generalization to unseen data. The central challenge of optimization is speed - find the best weights faster.

Regularization means helping a model generalize to data it has not yet seen. Another way of saying this is that regularization is about fighting overfitting.

Semi-Supervision is the process by which a model can be trained using both labeled and unlabeled data.

The first step when using Masterful is to learn the optimal set of parameters for each of the five buckets above. You start by learning the architecture and data parameters of the model and training dataset. In the code below, you are telling Masterful that your model is performing a semantic segmentation task (masterful.enums.Task.SEMANTIC_SEGMENTATION) with 3 classes (num_classes=3), and that the image features going into your model are in the range [0,1] (input_range=masterful.enums.ImageRange.ZERO_ONE). Also, the model outputs logits rather than softmax probabilities (prediction_logits=True).

Furthermore, in the training dataset, you are providing dense labels (sparse_labels=False) rather than sparse labels.
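
To make the distinction between logits and softmax probabilities concrete, here is a minimal NumPy sketch. The softmax helper and the example values are assumptions for illustration and are not part of the Masterful API. Logits are unnormalized per-class scores, and a softmax turns them into probabilities that sum to 1 without changing which class scores highest.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Hypothetical logits for a single pixel over the 3 classes.
pixel_logits = np.array([2.0, -1.0, 0.5])
probabilities = softmax(pixel_logits)

print(probabilities.sum())  # sums to 1 up to floating point error

# The predicted class is the same before and after the softmax.
assert pixel_logits.argmax() == probabilities.argmax() == 0
```

Because the final Conv2D layer of the model in this guide has no activation, its 3 output channels are logits in this sense.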

For more details on architecture and data parameters, see the API specifications for ArchitectureParams and DataParams.

[15]:
model_params = masterful.architecture.learn_architecture_params(
  model=model,
  task=masterful.enums.Task.SEMANTIC_SEGMENTATION,
  input_range=masterful.enums.ImageRange.ZERO_ONE,
  num_classes=3,
  prediction_logits=True,
)
training_dataset_params = masterful.data.learn_data_params(
  dataset=training_dataset,
  task=masterful.enums.Task.SEMANTIC_SEGMENTATION,
  image_range=masterful.enums.ImageRange.ZERO_ONE,
  num_classes=3,
  sparse_labels=False,
)

Next, you learn the optimization parameters that will be used to train the model. Below, you use Masterful to learn the standard set of optimization parameters to train your model for this semantic segmentation task.

For more details on the optimization parameters, please see the OptimizationParams API specification.

[ ]:
optimization_params = masterful.optimization.learn_optimization_params(
  model,
  model_params,
  training_dataset,
  training_dataset_params,
)

The regularization parameters can have a dramatic impact on the final performance of your trained model. Learning these parameters can be a time-consuming and domain-specific challenge, and Masterful can speed up this process by learning them for you. This is still an expensive operation: as a rough order of magnitude, learning these parameters takes about 2x the time it takes to train your model. However, this is still dramatically faster than finding these parameters manually. In the example below, you will use the learn_regularization_params API to learn these parameters directly from your dataset and model.

For more details on the regularization parameters, please see the RegularizationParams API specification.

[ ]:
regularization_params = masterful.regularization.learn_regularization_params(
  model,
  model_params,
  optimization_params,
  training_dataset,
  training_dataset_params,
)

The final step before training is to learn the optimal set of semi-supervision parameters. For this guide, you are not using any unlabeled or synthetic data as part of training, so most forms of semi-supervision will be disabled by default.

For more details on the semi-supervision parameters, please see the SemiSupervisedParams API specification.

[ ]:
ssl_params = masterful.ssl.learn_ssl_params(
  training_dataset,
  training_dataset_params,
)

Now you are ready to train your model using the Masterful AutoML platform. The next cell calls masterful.training.train, which is the entry point to the meta-learning engine of the Masterful AutoML platform. Notice that there is no need to batch your data (Masterful finds the optimal batch size for you) or to shuffle it (Masterful handles this for you). You do not even need to pass in a validation dataset (Masterful finds one for you). You hand Masterful a model and a dataset, and Masterful figures out the rest.

[16]:
training_report = masterful.training.train(
  model,
  model_params,
  optimization_params,
  regularization_params,
  ssl_params,
  training_dataset,
  training_dataset_params,
)
MASTERFUL: Auto-fitting model to datasets.
...
MASTERFUL: Training complete in 13.6272061546643575 minutes.

The model you passed into masterful.training.train is now trained and updated in place, so you are able to evaluate it just like any other trained Keras model.

[ ]:
model.evaluate(test_dataset.batch(optimization_params.batch_size))
58/58 [==============================] - 3s 49ms/step - loss: 0.2624 - categorical_accuracy: 0.9003
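
For outputs of shape (batch, height, width, classes), a categorical accuracy of this kind compares the highest-scoring class with the labeled class at each pixel, so it can be read as a per-pixel accuracy. The following NumPy sketch shows that computation on a toy batch; the function pixel_accuracy and the toy values are assumptions for illustration and not the exact Keras implementation.

```python
import numpy as np

def pixel_accuracy(one_hot_labels, logits):
    """Fraction of pixels whose highest-scoring class matches the label.

    Both arrays have shape (batch, height, width, num_classes).
    """
    true_classes = one_hot_labels.argmax(axis=-1)
    predicted_classes = logits.argmax(axis=-1)
    return float((true_classes == predicted_classes).mean())

# Toy batch: one 2x2 image, 3 classes, true classes [[0, 1], [2, 0]].
labels = np.eye(3)[np.array([[[0, 1], [2, 0]]])]
logits = np.array([[[[3.0, 0.1, 0.2],     # predicts 0 (correct)
                     [0.0, 2.0, 0.5]],    # predicts 1 (correct)
                    [[1.5, 0.2, 0.1],     # predicts 0 (wrong, label is 2)
                     [4.0, 1.0, 0.0]]]])  # predicts 0 (correct)

print(pixel_accuracy(labels, logits))  # 0.75
```

The same argmax over the last axis of the model's predictions turns the 3 channel logits into a predicted class mask for visualization.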