Convert Scale Data for Masterful CLI


Introduction

In this guide, you will learn how to convert Scale’s General Image Annotation (imageannotation) and 2D Semantic Segmentation Annotation (segmentannotation) data to the CSV file format used by Masterful CLI. Whether you export your data from Scale or connect directly to your account using the Scale API Python Client, our data conversion tool generates the records.csv and label_map.csv files required to train image classification, object detection, or semantic segmentation models with a single command. You will still need to define your own YAML file for training, and you will likely want to split records.csv into training, validation, and test splits, which Scale does not manage. For more details on training with Masterful CLI, refer to the quickstart guide.
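Since splitting records.csv is left to you, here is a minimal sketch of one way to do it with the Python standard library. It assumes that records.csv holds one record per row with no header row, which you should verify against your own file. The function name, the output file names, and the 80/10/10 fractions are illustrative choices and not part of Masterful CLI.

```python
import csv
import random
from pathlib import Path


def split_records(records_path, output_dir, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle the rows of a records file and write train/val/test CSV files.

    Assumption: one record per row and no header row.
    """
    with open(records_path, newline="") as f:
        rows = list(csv.reader(f))

    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(rows)

    n_train = int(len(rows) * fractions[0])
    n_val = int(len(rows) * fractions[1])
    splits = {
        "train.csv": rows[:n_train],
        "val.csv": rows[n_train:n_train + n_val],
        "test.csv": rows[n_train + n_val:],  # the remainder goes to the test split
    }

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for name, split_rows in splits.items():
        with open(output_dir / name, "w", newline="") as f:
            csv.writer(f).writerows(split_rows)

    # Return the number of rows written to each file.
    return {name: len(split_rows) for name, split_rows in splits.items()}
```

The resulting files can then be referenced from your training YAML file in whatever way the quickstart guide describes.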

Setting up the Scale API Python Client

Install with PyPI (pip)

$ pip install --upgrade scaleapi

or install with Anaconda (conda)

$ conda install -c conda-forge scaleapi

The Scale API client requires a key to access project data. The live API key can be found in the upper right-hand corner of the Scale dashboard after logging in.

Key Location Image

To use this key with our data conversion tool, either:

  1. Assign it to a SCALE_API_KEY environment variable (Example in a shell’s configuration file: export SCALE_API_KEY="paste_value_here")

or

  2. Save the value to a file and point to its location using an optional argument when running our data conversion tool (Example: --scale_api_key_path "paste_key_path_here")
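If you want the same two options in your own scripts, the lookup can be sketched as follows. This is an illustration only: the helper name is hypothetical, and the choice to let an explicit key file take precedence over the environment variable is an assumption rather than documented behavior of the data conversion tool.

```python
import os
from pathlib import Path


def resolve_scale_api_key(key_path=None, environ=os.environ):
    """Return the Scale API key from a key file or from SCALE_API_KEY.

    Assumption: an explicit key file takes precedence over the
    environment variable. Returns None if neither is available.
    """
    if key_path is not None:
        # Strip the trailing newline that most editors add to the file.
        return Path(key_path).read_text().strip()
    key = environ.get("SCALE_API_KEY")
    return key.strip() if key else None
```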

Note that Scale protects a user’s image data by using download URLs with a time to expire. Masterful CLI cannot train on this data if the links have already expired. Using the Scale API Python Client ensures that the links are fresh.

Exporting Scale Project Data

If you prefer not to connect to your Scale account directly, our data conversion tool accepts exported Scale data in JSON format. To export data within Scale, select a project and go to the tasks tab. Clicking on the blue download button will export the data to a JSON file.

Scale Data Download Button

If you want to train with Masterful CLI, make sure the download URLs in the JSON file have not expired.

Overview

Running the help command will provide a list of required and optional arguments used by the Scale data conversion tool:

$ python -m masterful.data.converters.scale --help

usage: scale [-h] (-p PROJECT_NAME | -j JSON_PATH) -t
             [{image_classification,object_detection,semantic_segmentation}] -o OUTPUT_PATH
             [-n BATCH_NAMES [BATCH_NAMES ...]] [-a CREATED_AFTER] [-b CREATED_BEFORE]
             [-k SCALE_API_KEY_PATH] [-d]

Required Arguments:
  -p PROJECT_NAME, --project_name PROJECT_NAME
                        Scale project name to download current data from. The Scale API key must be set
                        in order to download. Batches can be selected by setting the optional
                        'created_after' and 'created_before' arguments.
  -j JSON_PATH, --json_path JSON_PATH
                        Scale JSON file path. Supports local, AWS, and GCP paths.
  -t [{image_classification,object_detection,semantic_segmentation}], --task [{image_classification,object_detection,semantic_segmentation}]
                        Types of computer vision tasks supported by Masterful.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Local destination folder to store output files.

Optional Arguments:
  -a CREATED_AFTER, --created_after CREATED_AFTER
                        Only select annotations after specified date (YYYY-MM-DD).
  -b CREATED_BEFORE, --created_before CREATED_BEFORE
                        Only select annotations before specified date (YYYY-MM-DD).
  -k SCALE_API_KEY_PATH, --scale_api_key_path SCALE_API_KEY_PATH
                        Path to file containing the live API key used by Scale for refreshing download
                        URLs (or set the 'SCALE_API_KEY' environment variable). If not defined, expired
                        default URLs will be used.
  -d, --download_images
                        Choose whether or not to download image files to output folder.

By default, only Scale tasks with a completed status are parsed. There is currently no review status filter for this data conversion tool. Any annotation record that does not meet the requirements for the type of conversion being performed is logged as a warning. Annotations excluded by the --created_after and --created_before filters are not logged.
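The selection rules above can be sketched as a small filter over task dictionaries. This is not the converter's implementation. It assumes each task carries a "status" string and a "created_at" ISO 8601 timestamp that starts with YYYY-MM-DD, and it treats both date bounds as inclusive, which the help text does not specify.

```python
from datetime import date


def select_tasks(tasks, created_after=None, created_before=None):
    """Keep completed tasks whose creation date lies within the bounds.

    Assumptions: "status" and "created_at" keys exist on every task,
    and both bounds are inclusive.
    """
    after = date.fromisoformat(created_after) if created_after else None
    before = date.fromisoformat(created_before) if created_before else None

    selected = []
    for task in tasks:
        if task.get("status") != "completed":
            continue
        # Only the leading YYYY-MM-DD part of the timestamp is compared.
        created = date.fromisoformat(task["created_at"][:10])
        if after is not None and created < after:
            continue
        if before is not None and created > before:
            continue
        selected.append(task)
    return selected
```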

Note that the required arguments are not positional arguments, as they are in many CLI apps, and that the --project_name and --json_path arguments are mutually exclusive.
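For readers unfamiliar with this style of interface, the following sketch shows how required, non-positional, mutually exclusive options can be declared with Python's argparse module. It mirrors a subset of the usage message above for illustration and is not the converter's actual source code.

```python
import argparse

TASKS = ("image_classification", "object_detection", "semantic_segmentation")


def build_parser():
    """Build a parser resembling part of the usage message above."""
    parser = argparse.ArgumentParser(prog="scale")

    # Exactly one of --project_name / --json_path must be supplied.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-p", "--project_name")
    source.add_argument("-j", "--json_path")

    # Required options are passed by flag, so their order does not matter.
    parser.add_argument("-t", "--task", choices=TASKS, required=True)
    parser.add_argument("-o", "--output_path", required=True)
    parser.add_argument("-d", "--download_images", action="store_true")
    return parser
```

With such a parser, passing both --project_name and --json_path, or neither, makes argparse print an error and exit instead of running.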

Image Classification

Regardless of whether the Scale project format is imageannotation or segmentannotation, you will be able to use it to train image classification models with Masterful CLI.

For the imageannotation format, each image sample will contain a list of response annotations. Each annotation must be defined by a geometry of boxes, polygons, points, lines, ellipses, or cuboids. Any one of these geometries will indicate that the class is present within the image. Example:

python -m masterful.data.converters.scale --task image_classification --project_name my_imclass_proj --output_path /path/to/output_folder --scale_api_key_path /path/to/scale_api_key_file --download_images

For the segmentannotation format, each image sample will contain a label mapping that defines the number of mask pixels for each class. If this value is greater than 0 for any given class, the Scale data converter will interpret it as being present within the image. Example:

python -m masterful.data.converters.scale --task image_classification --project_name my_semseg_proj --output_path /path/to/output_folder --scale_api_key_path /path/to/scale_api_key_file --download_images

As long as the required --task argument is set to image_classification, label_map.csv and records.csv files will be written to the output folder to later be read when training with Masterful CLI. In the above examples the --download_images flag is set, and a path to a file containing the Scale API key is defined, so an images subfolder will be created within the output path and each eligible sample image will be written to it as well. Additional filtering can be performed by setting the --created_after and --created_before arguments to a date in YYYY-MM-DD format.
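The two class-presence rules described in this section can be sketched as follows. The data shapes are simplifying assumptions and not the exact Scale response schema: each imageannotation entry is taken to be a dictionary with "label" and "type" keys, and the segmentannotation label mapping is reduced to a dictionary from class name to mask pixel count. The function names are hypothetical.

```python
# Geometry types named in this guide (singular spellings are an assumption).
GEOMETRY_TYPES = {"box", "polygon", "point", "line", "ellipse", "cuboid"}


def labels_from_annotations(annotations):
    """imageannotation: a class is present if any supported geometry has its label."""
    return sorted({a["label"] for a in annotations if a.get("type") in GEOMETRY_TYPES})


def labels_from_pixel_counts(pixel_counts):
    """segmentannotation: a class is present if its mask pixel count is greater than 0."""
    return sorted(label for label, count in pixel_counts.items() if count > 0)
```

For example, a sample with a "dog" box and a "cat" polygon yields both classes, while a segmentation sample whose "sky" class has 0 pixels does not include "sky".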

Object Detection

For object detection conversion, the imageannotation format and box geometry must be used. Records that do not meet these requirements are skipped, and each skip is logged as a warning. With these limitations in mind, the data conversion tool operates in the same way as with image classification. Set the --task argument to object_detection. Example:

python -m masterful.data.converters.scale --task object_detection --project_name my_obj_detect_proj --output_path /path/to/output_folder --scale_api_key_path /path/to/scale_api_key_file --download_images
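To illustrate the box-only restriction, here is a hedged sketch that keeps box annotations and logs a warning for anything else. It assumes each annotation is a dictionary with "type" and "label" keys, and that boxes carry "left", "top", "width", and "height" values in pixels. The function name is hypothetical, and the tuple it returns is for illustration only; it is not the records.csv layout. The sketch skips individual non-box annotations, whereas the converter may skip at a different granularity.

```python
import logging

logger = logging.getLogger("scale_conversion_sketch")


def extract_boxes(task_id, annotations):
    """Return (label, xmin, ymin, xmax, ymax) tuples for box annotations."""
    boxes = []
    for annotation in annotations:
        if annotation.get("type") != "box":
            # Non-box geometries cannot be used for object detection here.
            logger.warning(
                "Task %s: skipping annotation of type %r",
                task_id,
                annotation.get("type"),
            )
            continue
        xmin, ymin = annotation["left"], annotation["top"]
        boxes.append(
            (
                annotation["label"],
                xmin,
                ymin,
                xmin + annotation["width"],
                ymin + annotation["height"],
            )
        )
    return boxes
```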

Semantic Segmentation

For semantic segmentation conversion, the segmentannotation format must be used. Otherwise, the tool is run in the same way as before, but with the --task argument set to semantic_segmentation. Example:

python -m masterful.data.converters.scale --task semantic_segmentation --project_name my_semseg_proj --output_path /path/to/output_folder --scale_api_key_path /path/to/scale_api_key_file --download_images

If the optional --download_images flag is set, both the images and the index-labeled masks are saved to the output path’s images subfolder. Each mask has a name similar to that of its corresponding image, but ending in “_label.jpg”.
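After a run with --download_images, you may want to list the images and masks separately. The sketch below relies only on the “_label.jpg” suffix described above; the helper name is hypothetical, and no assumption is made about how the rest of the mask name is derived from the image name.

```python
from pathlib import Path


def split_images_and_masks(images_dir):
    """Separate file names in the images subfolder into images and masks.

    Files whose names end in "_label.jpg" are treated as masks; every
    other file is treated as an image.
    """
    images, masks = [], []
    for path in sorted(Path(images_dir).iterdir()):
        if not path.is_file():
            continue
        if path.name.endswith("_label.jpg"):
            masks.append(path.name)
        else:
            images.append(path.name)
    return images, masks
```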