{ "cells": [ { "cell_type": "markdown", "id": "668ff5a3", "metadata": {}, "source": [ "# Convert Scale Data for Masterful CLI" ] }, { "cell_type": "markdown", "id": "bce5c11b", "metadata": {}, "source": [ "[![Open In Colab](images/colab_badge.png)][1]        \n", "[![Download](images/download.png)][2] [Download this Notebook][2]\n", "\n", "[1]: https://colab.research.google.com/github/masterfulai/masterful-docs/blob/main/notebooks/guide_scale_data_conversion.ipynb\n", "[2]: https://docs.masterfulai.com/0.6.0/notebooks/guide_scale_data_conversion.ipynb" ] }, { "cell_type": "markdown", "id": "8d8c64b4", "metadata": {}, "source": [ "## Introduction\n", "\n", "In this guide, you will learn how to transform [Scale's](https://scale.com) [General Image Annotation](https://docs.scale.com/reference/general-image-annotation) (`imageannotation`) and [2D Semantic Segmentation Annotation](https://docs.scale.com/reference/semantic-segmentation-annotation) (`segmentannotation`) data to the [CSV file format](../markdown/guide_cli_data_directory_format.md) that is compatible with [Masterful CLI](../markdown/intro_cli_trainer.md). Whether you choose to export your data from Scale, or directly connect to your account using the [Scale API Python Client](https://github.com/scaleapi/scaleapi-python-client), our data conversion tool will generate the `records.csv` and `label_map.csv` files required to train image classification, object detection, or semantic segmentation models with a single command. You will still need to define your own [YAML file](../markdown/guide_cli_yaml_config.md) for training and will likely want to split `records.csv` into training, validation, and test splits which Scale does not manage. For more details on training with Masterful CLI, refer to the [quickstart guide](../notebooks/tutorial_quickstart_cli.ipynb)." ] }, { "cell_type": "markdown", "id": "26a452ae", "metadata": {}, "source": [ "### Setting up the Scale API Python Client\n", "\n", "Install with PyPI (`pip`)\n", "```\n", "$ pip install --upgrade scaleapi\n", "```\n", "\n", "or install with Anaconda (`conda`)\n", "```\n", "$ conda install -c conda-forge scaleapi\n", "```\n", "\n", "The Scale API client requires a key to access project data. The live API key can be found in the upper right hand corner of their [dashboard](https://dashboard.scale.com) after logging in.\n", "\n", "\"Key\n", "\n", "To use this key with our data conversion tool either:\n", "1. Assign it to a `SCALE_API_KEY` environment variable (Example in a shell's configuration file: `export SCALE_API_KEY=\"paste_value_here\"`\n", "\n", "or\n", "\n", "2. Save the value to a file and point to its location using an optional argument when running our data conversion tool (Example: `--scale_api_key_path \"paste_key_path_here\"`)\n", "\n", "*Note that Scale protects a user's image data by using download URLs with a time to expire. Masterful CLI cannot train on this data if the links have already expired. Using the Scale API Python Client ensures that the links are fresh.*" ] }, { "cell_type": "markdown", "id": "cc6730c8", "metadata": {}, "source": [ "### Exporting Scale Project Data\n", "\n", "If there is no desire to connect to your Scale account directly, our data conversion tool accepts exported Scale data in JSON format. To export data within Scale, select a project and go to the tasks tab. Clicking on the blue download button will export the data to a JSON file.\n", "\n", "\"Scale\n", "\n", "*Make sure the download URLs featured in the JSON file have not expired, if you want to train with Masterful CLI!*" ] }, { "cell_type": "markdown", "id": "dbea6090", "metadata": {}, "source": [ "## Overview\n", "\n", "Running the help command will provide a list of required and optional arguments used by the Scale data conversion tool: `$ python -m masterful.data.converters.scale --help`\n", "\n", "```\n", "usage: scale [-h] (-p PROJECT_NAME | -j JSON_PATH) -t\n", " [{image_classification,object_detection,semantic_segmentation}] -o OUTPUT_PATH\n", " [-n BATCH_NAMES [BATCH_NAMES ...]] [-a CREATED_AFTER] [-b CREATED_BEFORE]\n", " [-k SCALE_API_KEY_PATH] [-d]\n", "\n", "Required Arguments:\n", " -p PROJECT_NAME, --project_name PROJECT_NAME\n", " Scale project name to download current data from. The Scale API key must be set\n", " in order to download. Batches can be selected by setting the optional\n", " 'created_after' and 'created_before' arguments.\n", " -j JSON_PATH, --json_path JSON_PATH\n", " Scale JSON file path. Supports local, AWS, and GCP paths.\n", " -t [{image_classification,object_detection,semantic_segmentation}], --task [{image_classification,object_detection,semantic_segmentation}]\n", " Types of computer vision tasks supported by Masterful.\n", " -o OUTPUT_PATH, --output_path OUTPUT_PATH\n", " Local destination folder to store output files.\n", "\n", "Optional Arguments:\n", " -a CREATED_AFTER, --created_after CREATED_AFTER\n", " Only select annotations after specified date (YYYY-MM-DD).\n", " -b CREATED_BEFORE, --created_before CREATED_BEFORE\n", " Only select annotations before specified date (YYYY-MM-DD).\n", " -k SCALE_API_KEY_PATH, --scale_api_key_path SCALE_API_KEY_PATH\n", " Path to file containing the live API key used by Scale for refreshing download\n", " URLs (or set the 'SCALE_API_KEY' environment variable). If not defined, expired\n", " default URLs will be used.\n", " -d, --download_images\n", " Choose whether or not to download image files to output folder.\n", "```\n", "\n", "By default only Scale tasks with a completed status will be parsed. There is currently no review status filter for this data conversion tool. Any annotation record that does not meet the requirements for the type of conversion being performed will be logged as a warning. Annotations that are excluded from the `--created_after` and `--created_before` filters will not be logged.\n", "\n", "*Note that the required arguments are not positional arguments like with many CLI apps, and that the `--project_name` and `--json_path` arguments are mutually exclusive.*" ] }, { "cell_type": "markdown", "id": "7a982c88", "metadata": {}, "source": [ "## Image Classification\n", "\n", "Regardless if the Scale project format is `imageannotation` or `segmentannotation`, you will be able to use it to train image classification models with Masterful CLI.\n", "\n", "For the `imageannotation` format, each image sample will contain a list of response annotations. Each annotation must be defined by a geometry of boxes, polygons, points, lines, ellipses, or cuboids. Any one of these geometries will indicate that the class is present within the image. Example:\n", "```\n", "python -m masterful.data.converters.scale --task image_classification --project_name my_imclass_proj --output_path /path/to/output_folder --scale_api_key_path /path/to/scale_api_key_file --download_images\n", "```\n", "\n", "For the `segmentannotation` format, each image sample will contain a label mapping that defines the number of mask pixels for each class. If this value is greater than 0 for any given class, the Scale data converter will interpret it as being present within the image. Example:\n", "```\n", "python -m masterful.data.converters.scale --task image_classification --project_name my_semseg_proj --output_path /path/to/output_folder --scale_api_key_path /path/to/scale_api_key_file --download_images\n", "```\n", "\n", "As long as the `--task` required argument is set to `image_classification`, `label_map.csv` and `records.csv` files will be written to the output folder to later be read when training with Masterful CLI. In the above examples the `--download_images` flag is set, and a path to a file containing the Scale API key is defined, so an `images` subfolder will be created within the output path and each eligible sample image will be written to it as well. Additional filtering can be performed by assigning the `--created_after` and `--created_before` arguments with a date in YYYY-MM-DD format." ] }, { "cell_type": "markdown", "id": "ed50eecf", "metadata": {}, "source": [ "## Object Detection\n", "\n", "For object detection conversion, the `imageannotation` format and box geometry must be used to prevent records from being skipped over. If this happens it will be logged as a warning. With these limitations in mind, the data conversion tool operates in the same way as with image classification. Assign the `--task` argument to `object_detection`. Example:\n", "\n", "```\n", "python -m masterful.data.converters.scale --task object_detection --project_name my_obj_detect_proj --output_path /path/to/output_folder --scale_api_key_path /path/to/scale_api_key_file --download_images\n", "```" ] }, { "cell_type": "markdown", "id": "511b9d3b", "metadata": {}, "source": [ "## Semantic Segmentation\n", "\n", "For semantic segmentation conversion, the `segmentannotation` format must be used. Besides this, program execution is the same as before but with the `--task` argument set to `semantic_segmentation`.\n", "\n", "```\n", "python -m masterful.data.converters.scale --task semantic_segmentation --project_name my_semseg_proj --output_path /path/to/output_folder --scale_api_key_path /path/to/scale_api_key_file --download_images\n", "```\n", "\n", "If the optional `--download_images` flag is set, both the images and the index labeled masks will be saved to the output path's `images` subfolder. The mask will have a similar name as its counterpart, but the name will end in \"_label.jpg\"." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.5 ('masterful')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "vscode": { "interpreter": { "hash": "3246dc5397f03b283072dce4862e0488f464e2254f3c7f1768d36ad0557bc76c" } } }, "nbformat": 4, "nbformat_minor": 5 }