API Reference: Data
DataParams
- class masterful.data.DataParams(num_classes=None, task=None, image_shape=None, image_range=None, image_dtype=None, image_channels_last=True, label_dtype=None, label_shape=None, label_structure=None, label_sparse=None, label_bounding_box_format=None)
Parameters describing the datasets used during training.
These parameters describe both the structure of the dataset (for example, the image and label shapes) and the semantic structure of the labels (for example, the bounding box format, or whether the labels are sparse or dense).
- Parameters
num_classes (int) – The number of possible classes in the dataset.
task (masterful.enums.Task) – The task this dataset will be used for.
image_shape (Tuple) – The input shape of image data in the dataset, in the format (height, width, channels) if image_channels_last=True, or (channels, height, width) if image_channels_last=False.
image_range (masterful.enums.ImageRange) – The range of pixel values in the dataset's input images.
image_dtype (tensorflow.python.framework.dtypes.DType) – The image data type in the dataset.
image_channels_last (bool) – The ordering of the dimensions in the inputs. image_channels_last=True corresponds to inputs with shape (height, width, channels), while image_channels_last=False corresponds to inputs with shape (channels, height, width). Defaults to True.
label_dtype (type) – The data type of the labels.
label_shape (Tuple) – The shape of the labels.
label_structure (masterful.enums.TensorStructure) – The tensor format of the label examples.
label_sparse (bool) – True if the labels are in sparse format, False for dense (one-hot) labels.
label_bounding_box_format (Optional[masterful.enums.BoundingBoxFormat]) – The format of bounding boxes in the label, if they exist.
- Return type
None
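The sketch below illustrates two of these conventions in plain Python: how image_shape is ordered depending on image_channels_last, and how a sparse label (a class index) relates to its dense (one-hot) form. The helpers image_shape_for, sparse_to_dense, and dense_to_sparse are hypothetical and are not part of the Masterful API.

```python
from typing import List, Tuple


def image_shape_for(height: int, width: int, channels: int,
                    channels_last: bool = True) -> Tuple[int, int, int]:
    """Return the image_shape tuple for the given dimension ordering (hypothetical helper)."""
    if channels_last:
        return (height, width, channels)  # image_channels_last=True
    return (channels, height, width)  # image_channels_last=False


def sparse_to_dense(label: int, num_classes: int) -> List[float]:
    """Convert a sparse class index into a dense one-hot vector (hypothetical helper)."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} out of range for {num_classes} classes")
    one_hot = [0.0] * num_classes
    one_hot[label] = 1.0
    return one_hot


def dense_to_sparse(one_hot: List[float]) -> int:
    """Recover the class index from a dense one-hot vector (hypothetical helper)."""
    return max(range(len(one_hot)), key=lambda i: one_hot[i])


# A 32x32 RGB image in both orderings.
print(image_shape_for(32, 32, 3))                       # (32, 32, 3)
print(image_shape_for(32, 32, 3, channels_last=False))  # (3, 32, 32)

# Sparse label 2 with num_classes=4 becomes a dense one-hot vector.
print(sparse_to_dense(2, 4))  # [0.0, 0.0, 1.0, 0.0]
```

Under this common convention, a sparse classification label is a scalar class index while its dense counterpart has length num_classes; this is the kind of distinction that label_sparse and label_shape record. Whether Masterful expects exactly these shapes is an assumption here.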
learn_data_params
- masterful.data.learn_data_params(dataset, image_range, num_classes, sparse_labels, task, bounding_box_format=None)
Learns the DataParams for the given dataset. Most parameters can be introspected from the dataset itself. Anything that cannot be introspected is passed into this function as an argument, or set on the DataParams after creation.
Example:
    import masterful
    import tensorflow as tf

    # Learn parameters for a single dataset.
    training_dataset: tf.data.Dataset = ...
    dataset_params = masterful.data.learn_data_params(
        dataset=training_dataset,
        image_range=masterful.enums.ImageRange.ZERO_255,
        num_classes=10,
        sparse_labels=False,
        task=masterful.enums.Task.CLASSIFICATION)

    # Learn parameters for three datasets at the same time.
    training_dataset: tf.data.Dataset = ...
    validation_dataset: tf.data.Dataset = ...
    test_dataset: tf.data.Dataset = ...
    (training_dataset_params,
     validation_dataset_params,
     test_dataset_params) = masterful.data.learn_data_params(
        dataset=[training_dataset, validation_dataset, test_dataset],
        image_range=masterful.enums.ImageRange.ZERO_255,
        num_classes=10,
        sparse_labels=[False, False, False],
        task=masterful.enums.Task.CLASSIFICATION)
- Parameters
dataset (Union[tensorflow.python.data.ops.dataset_ops.DatasetV2, numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray], tensorflow.python.keras.utils.data_utils.Sequence, Tuple[Callable[[], Iterator], Tuple[tensorflow.python.framework.tensor_spec.TensorSpec, tensorflow.python.framework.tensor_spec.TensorSpec]]]) – The dataset to learn the parameters for: a tf.data.Dataset instance, or any of the other forms listed in the type annotation.
image_range (masterful.enums.ImageRange) – The range of pixel values in the dataset's input images.
num_classes (int) – The number of possible classes in the dataset.
sparse_labels (bool) – True if the labels are in sparse format, False for dense (one-hot) labels.
task (masterful.enums.Task) – The task this dataset will be used for.
bounding_box_format (masterful.enums.BoundingBoxFormat) – The format of bounding boxes in the label, if they exist.
- Returns
A new DataParams instance describing the passed-in dataset, or a sequence of DataParams instances when several datasets are passed.
- Return type
Union[masterful.data.params.DataParams, Sequence[masterful.data.params.DataParams]]
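The description above says most parameters can be introspected from the dataset. As a rough illustration only (not Masterful's implementation), the following sketch infers image_shape and label_shape from a few (image, label) examples represented as nested Python lists, and checks that every example agrees. The functions nested_shape and introspect_shapes are hypothetical.

```python
from typing import Any, Iterable, Optional, Tuple

Shape = Tuple[int, ...]


def nested_shape(value: Any) -> Shape:
    """Compute the shape of a nested list, e.g. [[1, 2], [3, 4]] -> (2, 2); scalars give ()."""
    shape = []
    while isinstance(value, (list, tuple)):
        shape.append(len(value))
        if not value:
            break
        value = value[0]  # assume a rectangular (non-ragged) structure
    return tuple(shape)


def introspect_shapes(examples: Iterable[Tuple[Any, Any]]) -> Tuple[Shape, Shape]:
    """Return (image_shape, label_shape), raising if any example disagrees."""
    image_shape: Optional[Shape] = None
    label_shape: Optional[Shape] = None
    for image, label in examples:
        shapes = (nested_shape(image), nested_shape(label))
        if image_shape is None:
            image_shape, label_shape = shapes
        elif shapes != (image_shape, label_shape):
            raise ValueError(f"inconsistent example shapes: {shapes} vs {(image_shape, label_shape)}")
    if image_shape is None or label_shape is None:
        raise ValueError("dataset is empty")
    return image_shape, label_shape


# Two 2x2 single-channel images with dense (one-hot) labels over 3 classes.
image = [[[0], [255]], [[128], [64]]]
print(introspect_shapes([(image, [0, 1, 0]), (image, [1, 0, 0])]))  # ((2, 2, 1), (3,))
```

Properties such as image_range and num_classes are passed to learn_data_params explicitly; one plausible reason is that they cannot be inferred reliably from samples (a batch of 0-255 images might never happen to exceed 200, for example).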
run_health_check
- masterful.data.run_health_check(training_dataset, training_dataset_params, validation_dataset=None, validation_dataset_params=None, unlabeled_datasets=None, verbose=True)
Runs a data health check, which examines all of the datasets used during training and reports on their statistics.
- Parameters
training_dataset (masterful.data.DatasetLike) – The labeled dataset to use during training.
training_dataset_params (masterful.data.DataParams) – The parameters of the labeled dataset.
validation_dataset (Optional[masterful.data.DatasetLike]) – An optional validation dataset to use during training. If no validation set is specified, Masterful will automatically create one from the labeled dataset.
validation_dataset_params (Optional[masterful.data.DataParams]) – Optional parameters of the validation dataset.
unlabeled_datasets (Optional[Sequence[Tuple[masterful.data.DatasetLike, masterful.data.DataParams]]]) – An optional sequence of unlabeled datasets and their parameters to use during training. If an unlabeled dataset is specified, a set of algorithms must also be specified in ssl_params; otherwise, it has no effect.
verbose (bool) – True to show console progress bars during analysis, False to suppress console output.
Returns:
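The exact statistics the health check reports are not listed here. Purely as an illustration of the kind of label statistics such a check can surface, the following sketch summarizes sparse classification labels: per-class counts, classes with no examples, out-of-range labels, and a simple imbalance ratio. The function class_balance_report and its report keys are hypothetical, not Masterful's output format.

```python
from collections import Counter
from typing import Dict, Iterable


def class_balance_report(labels: Iterable[int], num_classes: int) -> Dict[str, object]:
    """Summarize the class distribution of sparse labels (hypothetical helper)."""
    counts = Counter(labels)
    per_class = [counts.get(c, 0) for c in range(num_classes)]
    present = [n for n in per_class if n > 0]
    return {
        "num_examples": sum(counts.values()),
        "per_class_counts": per_class,
        # Classes in [0, num_classes) that never occur.
        "missing_classes": [c for c, n in enumerate(per_class) if n == 0],
        # Label values outside [0, num_classes), usually a sign of a data bug.
        "unexpected_labels": sorted(set(counts) - set(range(num_classes))),
        # Ratio of the most to the least frequent class among classes that occur.
        "imbalance_ratio": max(present) / min(present) if present else None,
    }


report = class_balance_report([0, 0, 0, 1, 1, 3], num_classes=4)
print(report["per_class_counts"])  # [3, 2, 0, 1]
print(report["missing_classes"])   # [2]
print(report["imbalance_ratio"])   # 3.0
```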
convert_and_pad_boxes
- masterful.data.preprocessing.convert_and_pad_boxes(boxes, classes, box_format, sparse_labels, num_classes, max_bounding_boxes)
Converts the bounding boxes into Masterful format and pads them so that they can be batched. The returned labels can be passed directly into Masterful.
- Parameters
boxes (tensorflow.python.framework.ops.Tensor) – The bounding boxes, in the format given by box_format.
classes (tensorflow.python.framework.ops.Tensor) – The class labels.
box_format (masterful.enums.BoundingBoxFormat) – The format of the source bounding boxes.
sparse_labels (bool) – True if the labels are sparse, False if dense.
num_classes (int) – The number of possible classes in the dataset.
max_bounding_boxes (int) – The maximum number of bounding boxes to convert.
- Returns
A Tensor of labels formatted for Masterful, appropriate for batching.
- Return type
tensorflow.python.framework.ops.Tensor
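The following pure-Python sketch shows the general idea of converting and padding boxes, assuming (1) source boxes in [xmin, ymin, width, height] format, (2) sparse integer class labels, and (3) the Masterful label row layout [valid, ymin, xmin, ymax, xmax, class] with shape [num_boxes, 1+4+num_classes] documented for resize_and_pad. The function convert_and_pad_boxes_sketch is hypothetical and uses Python lists instead of Tensors; it is not the library's implementation, and it leaves coordinate units unchanged.

```python
from typing import List, Sequence

Row = List[float]


def xywh_to_yxyx(box: Sequence[float]) -> List[float]:
    """Convert an assumed [xmin, ymin, width, height] box to [ymin, xmin, ymax, xmax]."""
    xmin, ymin, width, height = box
    return [ymin, xmin, ymin + height, xmin + width]


def convert_and_pad_boxes_sketch(boxes: Sequence[Sequence[float]],
                                 classes: Sequence[int],
                                 num_classes: int,
                                 max_bounding_boxes: int) -> List[Row]:
    """Build fixed-size labels of shape [max_bounding_boxes, 1 + 4 + num_classes].

    Each row is [valid, ymin, xmin, ymax, xmax, one-hot class...]. Padding rows are
    all zeros (valid = 0). Boxes beyond max_bounding_boxes are dropped.
    """
    if len(boxes) != len(classes):
        raise ValueError("boxes and classes must have the same length")
    rows: List[Row] = []
    for box, cls in list(zip(boxes, classes))[:max_bounding_boxes]:
        one_hot = [0.0] * num_classes
        one_hot[cls] = 1.0  # sparse class index -> dense one-hot
        rows.append([1.0] + [float(v) for v in xywh_to_yxyx(box)] + one_hot)
    row_width = 1 + 4 + num_classes
    rows.extend([0.0] * row_width for _ in range(max_bounding_boxes - len(rows)))
    return rows


labels = convert_and_pad_boxes_sketch(
    boxes=[[10, 20, 30, 40]], classes=[1], num_classes=3, max_bounding_boxes=2)
print(labels[0])  # [1.0, 20.0, 10.0, 60.0, 40.0, 0.0, 1.0, 0.0]
print(labels[1])  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Padding every example to the same max_bounding_boxes rows is what makes the labels batchable: images with different numbers of objects end up with labels of identical shape, and the valid flag marks which rows are real.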
resize_and_pad
- masterful.data.preprocessing.resize_and_pad(image, labels=None, size=256)
Aspect-ratio-preserving resize, which scales the longest side to 'size' and center-pads the shorter side to match.
- Parameters
image (tensorflow.python.framework.ops.Tensor) – The image to resize.
labels (Optional[tensorflow.python.framework.ops.Tensor]) – Optional labels in Masterful format [valid, ymin, xmin, ymax, xmax, class] with shape [num_boxes, 1+4+num_classes]
size (int) – The square size to resize to.
- Returns
Tuple of image and label, where the image is padded to the square input size, and the labels are adjusted for the padding.
- Return type
Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor]
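The geometry of an aspect-ratio-preserving resize with center padding can be sketched as follows. This assumes box coordinates in absolute pixels and integer rounding of the resized dimensions; if the library uses normalized coordinates, the mapping differs. The functions resize_and_pad_geometry and adjust_label_row are hypothetical helpers, not Masterful's implementation, and they only compute coordinates rather than resizing pixel data.

```python
from typing import List, Sequence, Tuple


def resize_and_pad_geometry(height: int, width: int,
                            size: int = 256) -> Tuple[float, int, int, int, int]:
    """Return (scale, new_height, new_width, pad_top, pad_left) for a centered letterbox."""
    scale = size / max(height, width)  # the longest side becomes exactly `size`
    new_height = min(size, round(height * scale))
    new_width = min(size, round(width * scale))
    pad_top = (size - new_height) // 2  # any odd pixel goes to the bottom
    pad_left = (size - new_width) // 2  # any odd pixel goes to the right
    return scale, new_height, new_width, pad_top, pad_left


def adjust_label_row(row: Sequence[float], scale: float,
                     pad_top: int, pad_left: int) -> List[float]:
    """Map one [valid, ymin, xmin, ymax, xmax, class...] row into the padded image."""
    valid, ymin, xmin, ymax, xmax, *classes = row
    if not valid:
        return list(row)  # padding rows are left untouched
    return [valid,
            ymin * scale + pad_top, xmin * scale + pad_left,
            ymax * scale + pad_top, xmax * scale + pad_left,
            *classes]


# A 200x400 (height x width) image letterboxed into a 100x100 square.
scale, new_h, new_w, top, left = resize_and_pad_geometry(200, 400, size=100)
print((scale, new_h, new_w, top, left))  # (0.25, 50, 100, 25, 0)
print(adjust_label_row([1.0, 20, 40, 100, 200, 0.0, 1.0], scale, top, left))
# [1.0, 30.0, 10.0, 50.0, 50.0, 0.0, 1.0]
```

When the padding is odd, this sketch places the extra pixel at the bottom or right; other implementations may split it differently.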