Convert Scale Data for Masterful CLI
Introduction
In this guide, you will learn how to transform Scale’s General Image Annotation (imageannotation) and 2D Semantic Segmentation Annotation (segmentannotation) data into the CSV file format that is compatible with Masterful CLI. Whether you export your data from Scale or connect directly to your account using the Scale API Python Client, our data conversion tool will generate the records.csv and label_map.csv files required to train image classification, object detection, or semantic segmentation models with a single command. You will still need to define your own YAML file for training, and you will likely want to split records.csv into training, validation, and test sets, which Scale does not manage. For more details on training with Masterful CLI, refer to the quickstart guide.
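Because the split is left to you, here is a minimal sketch of one way to do it in Python. It assumes that each line of records.csv is one independent record and that the file has no header row; check your generated file before relying on either assumption. The function name split_records and the 80/10/10 fractions are illustrative choices, not part of Masterful CLI.

```python
import random
from typing import List, Sequence, Tuple


def split_records(
    rows: Sequence[str],
    fractions: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle rows deterministically and cut them into train/val/test lists.

    Hypothetical helper: assumes every row is a self-contained record.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible

    n_train = int(len(shuffled) * fractions[0])
    n_val = int(len(shuffled) * fractions[1])

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test
```

To use it, read the lines of records.csv into a list, call split_records, and write each returned list to its own CSV file for your training YAML to reference.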
Setting up the Scale API Python Client
Install from PyPI (pip):
$ pip install --upgrade scaleapi
or install with Anaconda (conda):
$ conda install -c conda-forge scaleapi
The Scale API client requires a key to access project data. After logging in to Scale, you can find the live API key in the upper right-hand corner of the dashboard.
To use this key with our data conversion tool, either:
1. Assign it to a SCALE_API_KEY environment variable (for example, in a shell’s configuration file: export SCALE_API_KEY="paste_value_here"), or
2. Save the value to a file and point to its location using an optional argument when running our data conversion tool (for example: --scale_api_key_path "paste_key_path_here").
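The two options can be pictured as a small lookup. The sketch below is not the conversion tool's actual code: the helper name resolve_scale_api_key is hypothetical, and giving the key file precedence over the environment variable is an assumption made for illustration.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_scale_api_key(key_path: Optional[str] = None) -> Optional[str]:
    """Return the Scale API key from a file if given, else from the environment.

    Hypothetical helper; the precedence order here is an assumption.
    """
    if key_path:
        # Option 2: the key was saved to a file (--scale_api_key_path).
        return Path(key_path).read_text().strip()
    # Option 1: the key was exported as SCALE_API_KEY.
    return os.environ.get("SCALE_API_KEY") or None
```

If neither source is set, the function returns None, which mirrors the case where no key is available to the tool.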
Note that Scale protects a user’s image data by using download URLs that expire after a set time. Masterful CLI cannot train on this data if the links have already expired. Using the Scale API Python Client ensures that the links are fresh.
Exporting Scale Project Data
If you prefer not to connect to your Scale account directly, our data conversion tool accepts exported Scale data in JSON format. To export data within Scale, select a project and go to the tasks tab. Clicking the blue download button will export the data to a JSON file.
If you want to train with Masterful CLI, make sure the download URLs in the JSON file have not expired.
Overview
Running the help command will provide a list of required and optional arguments used by the Scale data conversion tool:
$ python -m masterful.data.converters.scale --help
usage: scale [-h] (-p PROJECT_NAME | -j JSON_PATH) -t
             [{image_classification,object_detection,semantic_segmentation}] -o OUTPUT_PATH
             [-n BATCH_NAMES [BATCH_NAMES ...]] [-a CREATED_AFTER] [-b CREATED_BEFORE]
             [-k SCALE_API_KEY_PATH] [-d]

Required Arguments:
  -p PROJECT_NAME, --project_name PROJECT_NAME
                        Scale project name to download current data from. The Scale API key must be set
                        in order to download. Batches can be selected by setting the optional
                        'created_after' and 'created_before' arguments.
  -j JSON_PATH, --json_path JSON_PATH
                        Scale JSON file path. Supports local, AWS, and GCP paths.
  -t [{image_classification,object_detection,semantic_segmentation}], --task [{image_classification,object_detection,semantic_segmentation}]
                        Types of computer vision tasks supported by Masterful.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Local destination folder to store output files.

Optional Arguments:
  -a CREATED_AFTER, --created_after CREATED_AFTER
                        Only select annotations after specified date (YYYY-MM-DD).
  -b CREATED_BEFORE, --created_before CREATED_BEFORE
                        Only select annotations before specified date (YYYY-MM-DD).
  -k SCALE_API_KEY_PATH, --scale_api_key_path SCALE_API_KEY_PATH
                        Path to file containing the live API key used by Scale for refreshing download
                        URLs (or set the 'SCALE_API_KEY' environment variable). If not defined, expired
                        default URLs will be used.
  -d, --download_images
                        Choose whether or not to download image files to output folder.
By default, only Scale tasks with a completed status will be parsed. There is currently no review status filter for this data conversion tool. Any annotation record that does not meet the requirements for the type of conversion being performed will be logged as a warning. Annotations excluded by the --created_after and --created_before filters will not be logged.
Note that the required arguments are not positional, unlike in many CLI apps, and that the --project_name and --json_path arguments are mutually exclusive.
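For readers unfamiliar with this argument style, the sketch below reproduces the documented interface with Python's standard argparse module. It is an illustration of how required named arguments and a mutually exclusive pair behave, not the converter's actual source, and it includes only the options described in the help text above.

```python
import argparse

TASKS = ["image_classification", "object_detection", "semantic_segmentation"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mimics the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="scale")

    # Exactly one data source must be given: a project name or a JSON file.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-p", "--project_name")
    source.add_argument("-j", "--json_path")

    # Required, but named rather than positional.
    parser.add_argument("-t", "--task", choices=TASKS, required=True)
    parser.add_argument("-o", "--output_path", required=True)

    # Optional filters and switches.
    parser.add_argument("-a", "--created_after")
    parser.add_argument("-b", "--created_before")
    parser.add_argument("-k", "--scale_api_key_path")
    parser.add_argument("-d", "--download_images", action="store_true")
    return parser
```

With a parser like this, passing both --project_name and --json_path, or neither, makes argparse print a usage error and exit instead of running.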
Image Classification
Regardless of whether the Scale project format is imageannotation or segmentannotation, you can use it to train image classification models with Masterful CLI.
For the imageannotation format, each image sample will contain a list of response annotations. Each annotation must be defined by a geometry of boxes, polygons, points, lines, ellipses, or cuboids. Any one of these geometries will indicate that the class is present within the image. Example:
python -m masterful.data.converters.scale --task image_classification --project_name my_imclass_proj --output_path /path/to/output_folder --scale_api_key_path /path/to/scale_api_key_file --download_images
For the segmentannotation format, each image sample will contain a label mapping that defines the number of mask pixels for each class. If this value is greater than 0 for any given class, the Scale data converter will interpret it as being present within the image. Example:
python -m masterful.data.converters.scale --task image_classification --project_name my_semseg_proj --output_path /path/to/output_folder --scale_api_key_path /path/to/scale_api_key_file --download_images
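The two presence rules described above can be sketched in a few lines of Python. The data shapes are assumptions made for illustration: annotations are taken to be dictionaries with "label" and "type" keys, the geometry type strings are guesses at singular names, and the pixel counts are taken to be a plain class-to-count mapping. None of this is the converter's actual code.

```python
from typing import Dict, Iterable, List, Mapping

# Assumed geometry type names; the real Scale payload may spell them differently.
GEOMETRIES = {"box", "polygon", "point", "line", "ellipse", "cuboid"}


def classes_from_annotations(annotations: Iterable[Dict[str, str]]) -> List[str]:
    """imageannotation rule: any supported geometry marks its class as present."""
    return sorted({a["label"] for a in annotations if a.get("type") in GEOMETRIES})


def classes_from_pixel_counts(pixel_counts: Mapping[str, int]) -> List[str]:
    """segmentannotation rule: a class is present if it owns at least one mask pixel."""
    return sorted(name for name, count in pixel_counts.items() if count > 0)
```

In both cases the result is a set of class names per image, which is the information an image classification record needs.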
As long as the --task required argument is set to image_classification, label_map.csv and records.csv files will be written to the output folder to be read later when training with Masterful CLI. In the above examples, the --download_images flag is set and a path to a file containing the Scale API key is defined, so an images subfolder will be created within the output path and each eligible sample image will be written to it as well. Additional filtering can be performed by setting the --created_after and --created_before arguments to a date in YYYY-MM-DD format.
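As a rough picture of how a YYYY-MM-DD window behaves, consider the sketch below. The helper names parse_day and in_window are hypothetical, and treating both bounds as exclusive is an assumption; the tool's own handling of the boundary dates may differ.

```python
from datetime import date, datetime
from typing import Optional


def parse_day(value: str) -> date:
    """Parse a YYYY-MM-DD string; raises ValueError for any other format."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def in_window(
    created: date,
    created_after: Optional[str] = None,
    created_before: Optional[str] = None,
) -> bool:
    """Return True if `created` falls inside the optional date window.

    Both bounds are treated as exclusive here, which is an assumption.
    """
    if created_after and not created > parse_day(created_after):
        return False
    if created_before and not created < parse_day(created_before):
        return False
    return True
```

Leaving either bound unset keeps that side of the window open, which matches both arguments being optional.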
Object Detection
For object detection conversion, the imageannotation format and box geometry must be used; otherwise, records will be skipped. Any skipped record is logged as a warning. With these limitations in mind, the data conversion tool operates in the same way as with image classification. Set the --task argument to object_detection. Example:
python -m masterful.data.converters.scale --task object_detection --project_name my_obj_detect_proj --output_path /path/to/output_folder --scale_api_key_path /path/to/scale_api_key_file --download_images
Semantic Segmentation
For semantic segmentation conversion, the segmentannotation format must be used. Otherwise, the tool runs the same way as before, with the --task argument set to semantic_segmentation. Example:
python -m masterful.data.converters.scale --task semantic_segmentation --project_name my_semseg_proj --output_path /path/to/output_folder --scale_api_key_path /path/to/scale_api_key_file --download_images
If the optional --download_images flag is set, both the images and the index-labeled masks will be saved to the output path’s images subfolder. Each mask will have a name similar to that of its corresponding image, but ending in “_label.jpg”.
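If you need to pair images with masks in your own scripts, something like the following may help. It assumes the mask name is the image's base name without its extension, followed by "_label.jpg"; the text only promises a similar name, so verify this against your output folder. The helper name mask_name_for is hypothetical.

```python
from pathlib import PurePosixPath


def mask_name_for(image_name: str) -> str:
    """Guess the mask file name for an image (assumed naming scheme)."""
    # Drop any directory and extension, then append the documented suffix.
    return PurePosixPath(image_name).stem + "_label.jpg"
```

For instance, under this assumption an image saved as images/dog_001.png would be paired with a mask named dog_001_label.jpg.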