API Reference: Evaluation


class masterful.evaluation.detection.coco.CocoEvaluationMetrics(categories, include_metrics_per_category=False, all_metrics_per_category=False, skip_predictions_for_unlabeled_class=False, super_categories=None)

Class for computing the MSCOCO evaluation metrics against the outputs of a Keras-based object detection model.

This relies on the pycocotools package from the official MSCOCO release to perform the evaluation.

__init__(categories, include_metrics_per_category=False, all_metrics_per_category=False, skip_predictions_for_unlabeled_class=False, super_categories=None)

Initializes a new instance of CocoEvaluationMetrics.

  • categories – A list of dicts, each with the following keys: ‘id’ (required), an integer uniquely identifying this category; ‘name’ (required), a string naming the category, e.g. ‘cat’ or ‘dog’.

  • include_metrics_per_category – If True, include metrics for each category.

  • all_metrics_per_category – Whether to include all the summary metrics for each category in per_category_ap. Be careful setting this to True if you have more than a handful of categories, because it will pollute your mldash.

  • skip_predictions_for_unlabeled_class – If True, skip predictions whose classes do not match any of the labeled classes for the image.

  • super_categories – None, or a Python dict mapping super-category names (strings) to lists of category names (corresponding to names in the label_map). Metrics are aggregated along these super-categories, added to per_category_ap, and associated with the name PerformanceBySuperCategory/<super-category-name>.
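The constructor arguments above can be sketched as follows. The category dicts follow the documented format; the constructor call itself is shown commented out because it assumes the masterful package is installed in your environment.

```python
# Categories follow the MSCOCO convention: each dict needs an
# integer 'id' and a string 'name'.
categories = [
    {"id": 1, "name": "cat"},
    {"id": 2, "name": "dog"},
]

# Super-categories map a super-category name to a list of
# category names from the label map.
super_categories = {"animal": ["cat", "dog"]}

# Hypothetical usage, assuming masterful is installed:
# from masterful.evaluation.detection.coco import CocoEvaluationMetrics
# metrics = CocoEvaluationMetrics(
#     categories,
#     include_metrics_per_category=True,
#     super_categories=super_categories,
# )
```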


Clears the state to prepare for a fresh evaluation.

evaluate_model(model, predictions_to_labels, test_dataset, num_classes, max_examples=9223372036854775807)

Evaluates a Keras based detection model and returns the COCO statistics.

  • model (keras.engine.training.Model) – The model to use for predictions.

  • predictions_to_labels (Callable) – A callable that converts model predictions into labels that can be evaluated using the COCO metrics.

  • test_dataset (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The dataset to measure over.

  • num_classes (int) – The number of classes in the dataset and model predictions.

  • max_examples (int) – Maximum number of examples to evaluate. Defaults to all examples.


Returns

  A dictionary of COCO evaluation metrics.
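A minimal sketch of a predictions_to_labels callable is shown below. The prediction structure and dictionary field names are illustrative assumptions, not the masterful API; adapt them to your model's actual output format.

```python
# Hypothetical converter from raw model outputs to COCO-evaluable
# labels. Assumes each prediction is a (boxes, scores, classes)
# tuple -- an assumption for illustration only.
def predictions_to_labels(predictions):
    labels = []
    for boxes, scores, classes in predictions:
        labels.append({
            "detection_boxes": boxes,
            "detection_scores": scores,
            "detection_classes": classes,
        })
    return labels
```

A callable like this is then passed to evaluate_model along with the model, the test dataset, and the number of classes.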

Saves the detections into json_output_path in the format used by MS COCO.


  • json_output_path – String containing the output file’s path. It can also be None; in that case nothing is written to the output file.
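The documented behavior can be sketched with plain JSON serialization. The detection record fields below follow the standard MS COCO detection results format; the save_detections helper is a hypothetical stand-in for the method, not the masterful implementation.

```python
import json

# MS COCO detection results format: a list of dicts with image_id,
# category_id, bbox as [x, y, width, height], and score.
detections = [
    {"image_id": 42, "category_id": 1,
     "bbox": [10.0, 20.0, 30.0, 40.0], "score": 0.87},
]

def save_detections(detections, json_output_path):
    # Mirrors the documented contract: a None path means
    # nothing is written.
    if json_output_path is None:
        return
    with open(json_output_path, "w") as f:
        json.dump(detections, f)
```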