API Reference: Data


class masterful.data.DataParams(num_classes=None, task=None, image_shape=None, image_range=None, image_dtype=None, image_channels_last=True, label_dtype=None, label_shape=None, label_structure=None, label_sparse=None, label_bounding_box_format=None)

Parameters describing the datasets used during training.

These parameters describe both the structure of the dataset (for example, the image and label shapes) and the semantic structure of the labels (for example, the bounding box format, or whether the labels are sparse or dense).

  • num_classes (int) – The number of possible classes in the dataset.

  • task (masterful.enums.Task) – The task this dataset will be used for.

  • image_shape (Tuple) – The input shape of image data in the dataset, in the format (height, width, channels) if image_channels_last=True, otherwise (channels, height, width) if image_channels_last=False.

  • image_range (masterful.enums.ImageRange) – The range of pixel values in the dataset's input images.

  • image_dtype (tensorflow.python.framework.dtypes.DType) – The image data type in the dataset.

  • image_channels_last (bool) – The ordering of the dimensions in the inputs. image_channels_last=True corresponds to inputs with shape (height, width, channels), while image_channels_last=False corresponds to inputs with shape (channels, height, width). Defaults to True.

  • label_dtype (type) – The data type of the labels.

  • label_shape (Tuple) – The shape of the labels.

  • label_structure (masterful.enums.TensorStructure) – The tensor format of the label examples.

  • label_sparse (bool) – True if the labels are in sparse format, False for dense (one-hot) labels.

  • label_bounding_box_format (Optional[masterful.enums.BoundingBoxFormat]) – The format of bounding boxes in the label, if they exist.
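The relationship between image_shape and image_channels_last can be illustrated with a small standalone sketch. It does not call Masterful; the helper names to_channels_first and describe_image_shape are hypothetical and exist only for this illustration.

```python
from typing import Dict, Tuple

Shape3 = Tuple[int, int, int]


def to_channels_first(shape_hwc: Shape3) -> Shape3:
    """Reorders (height, width, channels) into (channels, height, width)."""
    height, width, channels = shape_hwc
    return (channels, height, width)


def describe_image_shape(image_shape: Shape3,
                         image_channels_last: bool = True) -> Dict[str, int]:
    """Names each dimension of image_shape according to the layout flag."""
    if image_channels_last:
        height, width, channels = image_shape
    else:
        channels, height, width = image_shape
    return {"height": height, "width": width, "channels": channels}


# The same 32x32 RGB image described in both layouts.
shape_hwc = (32, 32, 3)
shape_chw = to_channels_first(shape_hwc)

print(describe_image_shape(shape_hwc, image_channels_last=True))
print(describe_image_shape(shape_chw, image_channels_last=False))
```

Both calls describe the same image; only the position of the channels dimension in the tuple differs.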




masterful.data.learn_data_params(dataset, image_range, num_classes, sparse_labels, task, bounding_box_format=None)

Learns the DataParams for the given dataset.

Most parameters can be introspected from the dataset itself. Anything that cannot be introspected is passed into this function as an argument, or set on the DataParams after creation.

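
import masterful
import tensorflow as tf

# Learn parameters for a single dataset.
# Each `...` is a placeholder to be replaced with a real value.
training_dataset: tf.data.Dataset = ...
dataset_params = masterful.data.learn_data_params(
    dataset=training_dataset,
    image_range=...,  # a masterful.enums.ImageRange value
    num_classes=...,  # int
    sparse_labels=False,
    task=...,  # a masterful.enums.Task value
)

# Learn parameters for three datasets at the same time.
validation_dataset: tf.data.Dataset = ...
test_dataset: tf.data.Dataset = ...

(training_dataset_params, validation_dataset_params, test_dataset_params) = masterful.data.learn_data_params(
    dataset=[training_dataset, validation_dataset, test_dataset],
    image_range=...,
    num_classes=...,
    sparse_labels=[False, False, False],
    task=...,
)
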
  • dataset (Union[tensorflow.python.data.ops.dataset_ops.DatasetV2, numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray], tensorflow.python.keras.utils.data_utils.Sequence, Tuple[Callable[[], Iterator], Tuple[tensorflow.python.framework.tensor_spec.TensorSpec, tensorflow.python.framework.tensor_spec.TensorSpec]]]) – A tf.data.Dataset instance to learn the parameters for.

  • image_range (masterful.enums.ImageRange) – The range of pixel values in the dataset's input images.

  • num_classes (int) – The number of possible classes in the dataset.

  • sparse_labels (bool) – True if the labels are in sparse format, False for dense (one-hot) labels.

  • task (masterful.enums.Task) – The task this dataset will be used for.

  • bounding_box_format (masterful.enums.BoundingBoxFormat) – The format of bounding boxes in the label, if they exist.


Returns

A new instance of DataParams describing the passed-in dataset.

Return type

Union[masterful.data.params.DataParams, Sequence[masterful.data.params.DataParams]]
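The sparse_labels argument distinguishes labels stored as class indices from labels stored as one-hot rows. The following standalone sketch shows the two encodings side by side. It does not use Masterful, and the helpers sparse_to_dense and dense_to_sparse are hypothetical names chosen for this illustration.

```python
from typing import List


def sparse_to_dense(labels: List[int], num_classes: int) -> List[List[float]]:
    """Converts sparse class indices into dense one-hot rows."""
    dense = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside [0, {num_classes})")
        row = [0.0] * num_classes
        row[label] = 1.0
        dense.append(row)
    return dense


def dense_to_sparse(labels: List[List[float]]) -> List[int]:
    """Converts dense one-hot rows back into sparse class indices."""
    return [row.index(max(row)) for row in labels]


sparse = [0, 2]                                   # sparse_labels=True style
dense = sparse_to_dense(sparse, num_classes=3)    # sparse_labels=False style
print(dense)
print(dense_to_sparse(dense))
```

With num_classes=3, each dense label has three entries, exactly one of which is 1.0.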


masterful.data.run_health_check(training_dataset, training_dataset_params, validation_dataset=None, validation_dataset_params=None, unlabeled_datasets=None, verbose=True)

Runs a data health check, which examines all of the datasets used during training and reports on their statistics.

  • training_dataset (masterful.data.DatasetLike) – The labeled dataset to use during training.

  • training_dataset_params (masterful.data.DataParams) – The parameters of the labeled dataset.

  • validation_dataset (Optional[masterful.data.DatasetLike]) – An optional validation dataset to use during training. If no validation set is specified, Masterful will automatically create one from the labeled dataset.

  • validation_dataset_params (Optional[masterful.data.DataParams]) – Optional parameters of the validation dataset.

  • unlabeled_datasets (Optional[Sequence[Tuple[masterful.data.DatasetLike, masterful.data.DataParams]]]) – Optional sequence of unlabeled datasets and their parameters, to use during training. If an unlabeled dataset is specified, then a set of algorithms must be specified in ssl_params, otherwise this will have no effect.

  • verbose (bool) – True to show console progress bars during analysis, False for no console output.



masterful.data.preprocessing.convert_and_pad_boxes(boxes, classes, box_format, sparse_labels, num_classes, max_bounding_boxes)

Converts the bounding boxes into Masterful format and pads them so that they can be batched appropriately. At the end of this, the labels returned are appropriate for passing directly into Masterful.

  • boxes (tensorflow.python.framework.ops.Tensor) – Bounding boxes.

  • classes (tensorflow.python.framework.ops.Tensor) – The class labels.

  • box_format (masterful.enums.BoundingBoxFormat) – The format of the source bounding boxes.

  • sparse_labels (bool) – True if the labels are sparse, False if dense.

  • num_classes (int) – The number of classes per label.

  • max_bounding_boxes (int) – The maximum number of bounding boxes to convert.


Returns

A Tensor of labels formatted for Masterful, appropriate for batching.

Return type


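The padding step can be sketched in plain Python. This is an illustration of the idea only, not the library's implementation. It assumes dense (one-hot) classes, boxes already ordered as (ymin, xmin, ymax, xmax), a row layout of [valid, ymin, xmin, ymax, xmax, one-hot classes], and that boxes beyond max_bounding_boxes are dropped. The function name pad_boxes is hypothetical.

```python
from typing import List, Sequence


def pad_boxes(boxes: Sequence[Sequence[float]],
              class_ids: Sequence[int],
              num_classes: int,
              max_bounding_boxes: int) -> List[List[float]]:
    """Builds a fixed-size label table of shape
    [max_bounding_boxes, 1 + 4 + num_classes]."""
    row_width = 1 + 4 + num_classes
    rows: List[List[float]] = []
    # Keep at most max_bounding_boxes real boxes (assumption: extras are dropped).
    for box, class_id in list(zip(boxes, class_ids))[:max_bounding_boxes]:
        one_hot = [0.0] * num_classes
        one_hot[class_id] = 1.0
        rows.append([1.0, *box, *one_hot])  # valid flag = 1.0
    # Pad with all-zero rows, whose valid flag is 0.0.
    while len(rows) < max_bounding_boxes:
        rows.append([0.0] * row_width)
    return rows


labels = pad_boxes(
    boxes=[(0.1, 0.2, 0.5, 0.6)],
    class_ids=[1],
    num_classes=3,
    max_bounding_boxes=2,
)
for row in labels:
    print(row)
```

Because every example ends up with the same number of rows, examples with different box counts can be stacked into one batch, and the leading valid flag lets downstream code ignore the padding rows.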

masterful.data.preprocessing.resize_and_pad(image, labels=None, size=256)

Resizes the image while preserving its aspect ratio: the longest side is scaled to ‘size’ and the shortest side is center-padded to match.

  • image (tensorflow.python.framework.ops.Tensor) – The image to resize.

  • labels (Optional[tensorflow.python.framework.ops.Tensor]) – Optional labels in Masterful format [valid, ymin, xmin, ymax, xmax, class] with shape [num_boxes, 1+4+num_classes].

  • size (int) – The square size to resize to.


Returns

A tuple of the image and labels, where the image is padded to the square input size and the labels are adjusted for the padding.

Return type

Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor]
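The geometry behind this kind of resize can be sketched without TensorFlow. The sketch below is not the library's implementation. It assumes box coordinates are normalized to [0, 1] relative to the original image, which the text above does not state, and the helper names resize_and_pad_geometry and adjust_box are hypothetical.

```python
from typing import Tuple

Geometry = Tuple[int, int, int, int]
Box = Tuple[float, float, float, float]


def resize_and_pad_geometry(height: int, width: int, size: int = 256) -> Geometry:
    """Returns (new_height, new_width, pad_top, pad_left) for a resize that
    scales the longest side to `size` and center-pads the other side."""
    longest = max(height, width)
    new_height = round(height * size / longest)
    new_width = round(width * size / longest)
    pad_top = (size - new_height) // 2
    pad_left = (size - new_width) // 2
    return new_height, new_width, pad_top, pad_left


def adjust_box(box: Box, geometry: Geometry, size: int = 256) -> Box:
    """Maps a normalized (ymin, xmin, ymax, xmax) box from the original image
    into the normalized coordinates of the padded square image."""
    new_height, new_width, pad_top, pad_left = geometry
    ymin, xmin, ymax, xmax = box
    return (
        (ymin * new_height + pad_top) / size,
        (xmin * new_width + pad_left) / size,
        (ymax * new_height + pad_top) / size,
        (xmax * new_width + pad_left) / size,
    )


# A 100x200 image: the width is scaled to 256 and the height to 128,
# leaving 64 rows of padding above and below.
geometry = resize_and_pad_geometry(height=100, width=200, size=256)
full_image_box = adjust_box((0.0, 0.0, 1.0, 1.0), geometry, size=256)
print(geometry)
print(full_image_box)
```

A box covering the whole original image ends up covering only the unpadded band of the square output, which is why the labels have to be adjusted together with the image.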