API Reference: Data

DataParams

class masterful.data.DataParams(num_classes=None, task=None, image_shape=None, image_range=None, image_dtype=None, image_channels_last=True, label_dtype=None, label_shape=None, label_structure=None, label_sparse=None, label_bounding_box_format=None)

Parameters describing the datasets used during training.

These parameters describe both the structure of the dataset (for example, the image and label shapes) and the semantic structure of the labels (for example, the bounding box format, or whether the labels are sparse or dense).

Parameters
  • num_classes (int) – The number of possible classes in the dataset.

  • task (masterful.enums.Task) – The task this dataset will be used for.

  • image_shape (Tuple) – The input shape of image data in the dataset, in the format (height, width, channels) if image_channels_last=True, otherwise (channels, height, width) if image_channels_last=False.

  • image_range (masterful.enums.ImageRange) – The range of pixel values in the input image space of the dataset.

  • image_dtype (tensorflow.python.framework.dtypes.DType) – The image data type in the dataset.

  • image_channels_last (bool) – The ordering of the dimensions in the inputs. image_channels_last=True corresponds to inputs with shape (height, width, channels), while image_channels_last=False corresponds to inputs with shape (channels, height, width). Defaults to True.

  • label_dtype (type) – The data type of the labels.

  • label_shape (Tuple) – The shape of the labels.

  • label_structure (masterful.enums.TensorStructure) – The tensor format of the label examples.

  • label_sparse (bool) – True if the labels are in sparse format, False for dense (one-hot) labels.

  • label_bounding_box_format (Optional[masterful.enums.BoundingBoxFormat]) – The format of bounding boxes in the label, if they exist.

Return type

None
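To make the image_shape convention concrete, the following minimal sketch shows how a shape tuple reads under each setting of image_channels_last. The helper name split_image_shape is hypothetical and is not part of masterful; it only illustrates the dimension ordering described above.

```python
from typing import Tuple


def split_image_shape(image_shape: Tuple[int, int, int],
                      image_channels_last: bool = True) -> Tuple[int, int, int]:
    """Returns (height, width, channels) for a 3-element image shape.

    Hypothetical helper for illustration only; not a masterful API.
    """
    if len(image_shape) != 3:
        raise ValueError(f"Expected a 3-element shape, got {image_shape!r}")
    if image_channels_last:
        # Layout is (height, width, channels).
        height, width, channels = image_shape
    else:
        # Layout is (channels, height, width).
        channels, height, width = image_shape
    return height, width, channels


# The same 32x48 RGB image described in both layouts yields the same triple.
channels_last = split_image_shape((32, 48, 3), image_channels_last=True)
channels_first = split_image_shape((3, 32, 48), image_channels_last=False)
```

Both calls describe the same image, so a mismatch between image_shape and image_channels_last would silently swap the height, width, and channel counts.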

learn_data_params

masterful.data.learn_data_params(dataset, image_range, num_classes, sparse_labels, task, bounding_box_format=None)

Learns the DataParams for the given dataset.

Most parameters can be introspected from the dataset itself. Anything that cannot be introspected is passed into this function as an argument, or set on the DataParams after creation.

Example:

import tensorflow as tf
import masterful

# Learn parameters for a single dataset.
training_dataset: tf.data.Dataset = ...
dataset_params = masterful.data.learn_data_params(
    dataset=training_dataset,
    image_range=masterful.enums.ImageRange.ZERO_255,
    num_classes=10,
    sparse_labels=False,
    task=masterful.enums.Task.CLASSIFICATION)

# Learn parameters for three datasets at the same time
training_dataset: tf.data.Dataset = ...
validation_dataset: tf.data.Dataset = ...
test_dataset: tf.data.Dataset = ...

(training_dataset_params, validation_dataset_params, test_dataset_params) = masterful.data.learn_data_params(
    dataset=[training_dataset, validation_dataset, test_dataset],
    image_range=masterful.enums.ImageRange.ZERO_255,
    num_classes=10,
    sparse_labels=[False, False, False],
    task=masterful.enums.Task.CLASSIFICATION)

Parameters
  • dataset (Union[tensorflow.python.data.ops.dataset_ops.DatasetV2, numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray], keras.utils.data_utils.Sequence, Tuple[Callable[[], Iterator], Tuple[tensorflow.python.framework.tensor_spec.TensorSpec, tensorflow.python.framework.tensor_spec.TensorSpec]]]) – The dataset to learn the parameters for. This is typically a tf.data.Dataset instance; the type annotation also admits NumPy arrays, a Keras Sequence, or a generator function paired with its TensorSpec signature.

  • image_range (masterful.enums.ImageRange) – The range of pixel values in the input image space of the dataset.

  • num_classes (int) – The number of possible classes in the dataset.

  • sparse_labels (bool) – True if the labels are in sparse format, False for dense (one-hot) labels.

  • task (masterful.enums.Task) – The task this dataset will be used for.

  • bounding_box_format (masterful.enums.BoundingBoxFormat) – The format of bounding boxes in the label, if they exist.

Returns

A new instance of DataParams describing the passed-in dataset, or a sequence of DataParams instances, one per dataset, when several datasets are passed.

Return type

Union[masterful.data.params.DataParams, Sequence[masterful.data.params.DataParams]]
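The sparse_labels and num_classes arguments are passed explicitly rather than introspected, so it helps to see what they distinguish. The sketch below assumes the common convention in which a sparse label is a single integer class index and a dense label is its one-hot vector of length num_classes. The helper names to_dense_label and to_sparse_label are hypothetical and are not part of masterful.

```python
from typing import List, Sequence


def to_dense_label(class_index: int, num_classes: int) -> List[float]:
    """One-hot encodes a sparse class index (illustrative, not a masterful API)."""
    if not 0 <= class_index < num_classes:
        raise ValueError(
            f"class_index {class_index} is outside [0, {num_classes})")
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]


def to_sparse_label(dense_label: Sequence[float]) -> int:
    """Recovers the class index as the position of the largest entry."""
    return max(range(len(dense_label)), key=lambda i: dense_label[i])


# Class 3 out of 5 classes, in dense and then back in sparse form.
dense = to_dense_label(3, num_classes=5)
sparse = to_sparse_label(dense)
```

Under this convention, a dataset whose labels look like dense above would be described with sparse_labels=False, and one whose labels look like sparse with sparse_labels=True.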

convert_and_pad_boxes

masterful.data.preprocessing.convert_and_pad_boxes(boxes, classes, box_format, sparse_labels, num_classes, max_bounding_boxes)

Converts the bounding boxes into Masterful format and pads them so that they can be batched appropriately. At the end of this, the labels returned are appropriate for passing directly into Masterful.

Parameters
  • boxes (tensorflow.python.framework.ops.Tensor) – Bounding boxes.

  • classes (tensorflow.python.framework.ops.Tensor) – The class labels.

  • box_format (masterful.enums.BoundingBoxFormat) – The format of the source bounding boxes.

  • sparse_labels (bool) – True if the labels are sparse, False if dense.

  • num_classes (int) – The number of classes per label.

  • max_bounding_boxes (int) – The maximum number of bounding boxes to convert.

Returns

A Tensor of labels formatted for Masterful, appropriate for batching.

Return type

tensorflow.python.framework.ops.Tensor
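The padding idea can be sketched in plain Python. This is not the library's implementation: it assumes the boxes are already in (ymin, xmin, ymax, xmax) order so no format conversion is shown, that the row layout is the [valid, ymin, xmin, ymax, xmax, class] layout documented for resize_and_pad, that padding rows are all zeros, and that boxes beyond max_bounding_boxes are dropped. The function name pad_boxes is hypothetical.

```python
from typing import List, Sequence

# Assumed box order: (ymin, xmin, ymax, xmax).
Box = Sequence[float]


def pad_boxes(boxes: Sequence[Box], class_ids: Sequence[int],
              num_classes: int, max_bounding_boxes: int) -> List[List[float]]:
    """Builds fixed-size rows of [valid, ymin, xmin, ymax, xmax, one-hot class].

    Illustrative sketch only; not the masterful implementation.
    """
    if len(boxes) != len(class_ids):
        raise ValueError("boxes and class_ids must have the same length")
    rows: List[List[float]] = []
    # Assumption: boxes beyond max_bounding_boxes are dropped.
    for box, class_id in list(zip(boxes, class_ids))[:max_bounding_boxes]:
        one_hot = [1.0 if i == class_id else 0.0 for i in range(num_classes)]
        rows.append([1.0, *map(float, box), *one_hot])
    # Assumption: padding rows are all zeros, so their valid flag is 0.
    row_width = 1 + 4 + num_classes
    while len(rows) < max_bounding_boxes:
        rows.append([0.0] * row_width)
    return rows


# One real box of class 2 (out of 3 classes), padded to 4 rows.
labels = pad_boxes(
    boxes=[(0.1, 0.2, 0.5, 0.6)],
    class_ids=[2],
    num_classes=3,
    max_bounding_boxes=4)
```

Because every example ends up with the same [max_bounding_boxes, 1+4+num_classes] shape, examples with different numbers of boxes can be stacked into one batch, and the leading valid flag lets downstream code ignore the padding rows.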

resize_and_pad

masterful.data.preprocessing.resize_and_pad(image, labels=None, size=256)

Aspect-ratio-preserving resize, which scales the longest side of the image to ‘size’ and pads the shorter side to match.

Parameters
  • image (tensorflow.python.framework.ops.Tensor) – The image to resize.

  • labels (Optional[tensorflow.python.framework.ops.Tensor]) – Optional labels in Masterful format [valid, ymin, xmin, ymax, xmax, class], with shape [num_boxes, 1+4+num_classes].

  • size (int) – The square size to resize to.

Returns

Tuple of image and label, where the image is padded to the square input size, and the labels are adjusted for the padding.

Return type

Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor]
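The geometry behind this resize can be sketched without TensorFlow. The sketch below is an illustration under stated assumptions rather than the library's implementation: it assumes the scaled image is placed at the top-left of the square with padding added on the bottom and right, that scaled dimensions are rounded to the nearest pixel, and that box coordinates are normalized to [0, 1]. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def resize_and_pad_geometry(height: int, width: int,
                            size: int = 256) -> Tuple[int, int, int, int]:
    """Returns (new_height, new_width, pad_bottom, pad_right).

    Illustrative only; assumes padding is added on the bottom and right.
    """
    # Scale so that the longest side becomes exactly `size`.
    scale = size / max(height, width)
    new_height = min(size, max(1, round(height * scale)))
    new_width = min(size, max(1, round(width * scale)))
    return new_height, new_width, size - new_height, size - new_width


def adjust_normalized_box(box: Sequence[float], new_height: int,
                          new_width: int, size: int) -> List[float]:
    """Rescales a normalized (ymin, xmin, ymax, xmax) box to the padded square."""
    ymin, xmin, ymax, xmax = box
    # The image now covers only a fraction of the square along each axis.
    y_scale = new_height / size
    x_scale = new_width / size
    return [ymin * y_scale, xmin * x_scale, ymax * y_scale, xmax * x_scale]


# A 480x640 image resized into a 256x256 square.
geometry = resize_and_pad_geometry(height=480, width=640, size=256)
full_image_box = adjust_normalized_box(
    (0.0, 0.0, 1.0, 1.0), geometry[0], geometry[1], 256)
```

In this example the width is the longest side, so it is scaled to 256 and the height to 192, leaving 64 rows of padding; a box that covered the whole original image then covers only the top three quarters of the square.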