Tracing + Evaluation

This page explains the workflow for collecting inference logs and annotation data from datasets with ground truth, and for running evaluations with the MLdebugger SDK.

Overview

The Tracing + Evaluation flow consists of the following steps:

  1. Set up authentication credentials
  2. Collect inference log data and annotation information with Tracer
  3. Run evaluation with Evaluator
  4. Review results

The Tracer class used in STEP 2 varies by task:

| Task | Tracer Class |
| --- | --- |
| Classification | ClassificationTracer |
| Object Detection | ObjectDetectionTracer |
| 3D Object Detection | ObjectDetection3DTracer |

STEP 1: Set Up Authentication Credentials

Set authentication credentials in environment variables to connect to the MLdebugger API. For API Key generation, see Authentication.

Shell:

export MLD_API_ENDPOINT="https://api.adansons.ai"
export MLD_API_KEY="mldbg_*************"

Python:

import os

os.environ["MLD_API_ENDPOINT"] = "https://api.adansons.ai"
os.environ["MLD_API_KEY"] = "mldbg_*************"

STEP 2: Collect Data with Tracer

ClassificationTracer

Use ClassificationTracer to collect internal features and label information during model inference.

Initialize Tracer

from ml_debugger.training import ClassificationTracer

tracer = ClassificationTracer(
    model,                      # Model to evaluate
    model_name="resnet18",      # Model identifier
    version_name="v1",          # Version identifier
)

Collect Data

Collect data in batches using a DataLoader.

import torch

for image, label, indices in dataloader:
    image = image.to(device)
    label = label.to(device)

    with torch.no_grad():
        _ = tracer(
            image,                                              # Input data
            label,                                              # Ground truth label
            input_ids=indices.cpu().numpy(),                    # Data identifiers
            dataset_type="train",                               # Dataset type
            n_epoch=0,                                          # Epoch number (optional)
        )

Upload Data

Upload collected data to the API server.

tracer.wait_for_save()

ObjectDetectionTracer

Use ObjectDetectionTracer to collect inference results and ground truth from Object Detection models.

Supported Models:

| Framework | Supported Models |
| --- | --- |
| PyTorch | Faster R-CNN, SSD, YOLOv8 (Ultralytics), CenterNet, DETR |
| TensorFlow | KerasCV, TF Object Detection API, Custom Keras |

DETR Family Support

DETR (DEtection TRansformer) and its variants (Conditional DETR, Deformable DETR) are supported. Both Facebook Research DETR and HuggingFace Transformers DETR models are auto-detected. DETR does not use NMS — the iou_thresh parameter is ignored. Filtering is controlled by score_thresh and max_detections_per_image.

Initialize Tracer

from ml_debugger.training import ObjectDetectionTracer

tracer = ObjectDetectionTracer(
    model,                          # PyTorch or TensorFlow OD model
    model_name="faster_rcnn",       # Model identifier
    version_name="v1",              # Version identifier
    score_thresh=None,              # NMS score threshold (optional, auto-inferred from model)
    iou_thresh=None,                # NMS IoU threshold (optional, auto-inferred from model)
    max_detections_per_image=None,  # NMS max detections (optional, auto-inferred from model)
)

ObjectDetectionTracer is a factory function that auto-detects the model's framework (PyTorch / TensorFlow) and architecture, returning the appropriate tracer subclass.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | nn.Module / tf.keras.Model | (required) | Target OD model |
| model_name | str | (required) | Model identifier |
| version_name | str | (required) | Version identifier |
| score_thresh | Optional[float] | None | NMS score threshold. Auto-inferred from model if not specified; falls back to 0.05 |
| iou_thresh | Optional[float] | None | NMS IoU threshold. Auto-inferred from model if not specified; falls back to 0.5 |
| max_detections_per_image | Optional[int] | None | Max detections after NMS. Auto-inferred from model if not specified; falls back to 300 |

Collect Data

Collect data in batches using a DataLoader.

import torch

for images, targets, image_ids in dataloader:
    images = images.to(device)

    _ = tracer(
        images,                         # Input images (B, C, H, W)
        targets,                        # Ground truth
        input_ids=image_ids,            # Image identifiers
        dataset_type="train",           # Dataset type
        n_epoch=0,                      # Epoch number (optional)
    )

Ground Truth Format

Ground truth supports two formats:

List[Dict] format — A list of dicts with boxes and labels per image:

ground_truth = [
    {
        "boxes": torch.tensor([[x1, y1, x2, y2], ...]),  # (N, 4)
        "labels": torch.tensor([0, 1, ...]),              # (N,)
    },
    ...  # One per image in the batch
]

Dict[str, List] format — A dict with boxes and labels keys, each containing per-image lists:

ground_truth = {
    "boxes": [tensor_img1, tensor_img2, ...],
    "labels": [tensor_img1, tensor_img2, ...],
}
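The two formats carry the same information, so you can convert between them freely. A minimal sketch converting List[Dict] ground truth into the Dict[str, List] form (the helper name is illustrative, not part of the SDK):

```python
import torch

def to_dict_of_lists(gt_list):
    """Convert List[Dict] ground truth to the equivalent Dict[str, List] form."""
    return {
        "boxes": [gt["boxes"] for gt in gt_list],
        "labels": [gt["labels"] for gt in gt_list],
    }

gt_list = [
    {"boxes": torch.tensor([[0.0, 0.0, 10.0, 10.0]]), "labels": torch.tensor([1])},
    # An image with no objects still needs correctly shaped empty tensors
    {"boxes": torch.zeros((0, 4)), "labels": torch.zeros((0,), dtype=torch.long)},
]
gt_dict = to_dict_of_lists(gt_list)
print(len(gt_dict["boxes"]))  # 2
```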

Coordinate Format

Bounding boxes should be in [x1, y1, x2, y2] (top-left, bottom-right) format.
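If your labels use the COCO-style [x, y, w, h] (top-left plus size) convention, convert them before passing them to the tracer. A minimal sketch (the function name is illustrative, not part of the SDK):

```python
import torch

def xywh_to_xyxy(boxes: torch.Tensor) -> torch.Tensor:
    """Convert (N, 4) [x, y, w, h] boxes to [x1, y1, x2, y2]."""
    x1y1 = boxes[:, :2]            # top-left corner is unchanged
    x2y2 = boxes[:, :2] + boxes[:, 2:]  # bottom-right = top-left + size
    return torch.cat([x1y1, x2y2], dim=1)

boxes_xywh = torch.tensor([[10.0, 20.0, 30.0, 40.0]])  # one 30x40 box at (10, 20)
print(xywh_to_xyxy(boxes_xywh))  # tensor([[10., 20., 40., 60.]])
```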

Upload Data

Upload collected data to the API server.

tracer.wait_for_save()

ObjectDetection3DTracer

Use ObjectDetection3DTracer to collect inference results and ground truth from 3D Object Detection models.

Supported Models:

| Model | Category | Modality | Primary Frameworks |
| --- | --- | --- | --- |
| PointPillars | Pillar-based | LiDAR-only | MMDetection3D, OpenPCDet |
| CenterPoint | Anchor-free center-based | LiDAR-only | MMDetection3D, OpenPCDet |
| BEVFormer | BEV Transformer | Camera-only | MMDetection3D |
| BEVFusion | Multi-modal fusion | LiDAR+Camera | MMDetection3D, OpenPCDet |

Derivative Model Support

Each model also supports derivative models with the same output format:

  • PointPillars: PointPillars++, EFMF-Pillars
  • CenterPoint: CenterFormer, VoxelNeXt, TransFusion, Voxel R-CNN
  • BEVFormer: BEVDet, BEVDet4D, StreamPETR
  • BEVFusion: Att-BEVFusion, SAMFusion

BEVFormer Family Support

BEVFormer and its derivatives use DETR-style Transformer query output. They do not use NMS — the iou_thresh parameter is ignored. Filtering is controlled by score_thresh and max_detections_per_frame.

Initialize Tracer

from ml_debugger.training import ObjectDetection3DTracer

tracer = ObjectDetection3DTracer(
    model,                              # PyTorch 3D OD model
    model_name="centerpoint",           # Model identifier
    version_name="v1",                  # Version identifier
    score_thresh=None,                  # Score threshold (optional, auto-inferred from model)
    iou_thresh=None,                    # BEV IoU threshold (optional, auto-inferred from model)
    max_detections_per_frame=None,      # Max detections per frame (optional, auto-inferred from model)
    pc_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0],  # Point cloud range (required for BEVFormer; auto-inferred from model if omitted)
)

ObjectDetection3DTracer is a factory function that auto-detects the model's architecture and returns the appropriate tracer subclass.

input_id / input_hash guideline

Set input_id to the smallest unit where inference is expected to be reproducible under fixed inference settings. For 3D OD, this is typically one frame (one timestamp). input_hash is derived from all modalities in that frame.
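The SDK derives input_hash internally; the sketch below only illustrates the idea of one digest per frame covering every modality at that timestamp. The helper name and hashing scheme are illustrative, not the SDK's actual implementation:

```python
import hashlib
import numpy as np

def frame_hash(points, images):
    """Illustrative only: one digest per frame, covering all modalities."""
    h = hashlib.sha256()
    h.update(points.tobytes())   # LiDAR point cloud for this timestamp
    for img in images:           # all camera views of the same timestamp
        h.update(img.tobytes())
    return h.hexdigest()

pts = np.zeros((100, 4), dtype=np.float32)
cams = [np.zeros((8, 8, 3), dtype=np.uint8)]
print(frame_hash(pts, cams)[:8])
```

Changing any modality in the frame changes the digest, which is the property that makes a frame the right unit of reproducibility.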

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | nn.Module | (required) | Target 3D OD model |
| model_name | str | (required) | Model identifier |
| version_name | str | (required) | Version identifier |
| score_thresh | Optional[float] | None | Score threshold. Auto-inferred from model if not specified; falls back to 0.1 |
| iou_thresh | Optional[float] | None | BEV IoU threshold. Only effective for PointPillars (other models do not use NMS). Auto-inferred from model if not specified; falls back to 0.2 |
| max_detections_per_frame | Optional[int] | None | Max detections per frame. Auto-inferred from model if not specified; falls back to 300 |
| pc_range | Optional[List[float]] | None | Point cloud range [x_min, y_min, z_min, x_max, y_max, z_max]. Required for BEVFormer. Auto-inferred from model if not specified; raises an error if unavailable. |

Collect Data

Collect data in batches using a DataLoader.

LiDAR Input (Point Cloud Tensor):

import torch

for points, targets, frame_ids in dataloader:
    points = points.to(device)

    _ = tracer(
        points,                         # Point cloud (N, 4+)
        targets,                        # Ground truth
        input_ids=frame_ids,            # Frame identifiers
        dataset_type="train",           # Dataset type
        n_epoch=0,                      # Epoch number (optional)
    )

Dict Input (MMDetection3D format):

for batch_data in dataloader:
    model_input = {
        "points": [pts.to(device) for pts in batch_data["points"]],
        "img": batch_data["img"].to(device),
        "img_metas": batch_data["img_metas"],
    }

    _ = tracer(
        model_input,                    # Dict-format input
        batch_data["gt"],               # Ground truth
        input_ids=batch_data["frame_ids"],
        dataset_type="train",
        n_epoch=0,
    )

Ground Truth Format

List[Dict] format — A list of dicts with boxes_3d and labels_3d per frame:

ground_truth = [
    {
        "boxes_3d": torch.tensor([[cx, cy, cz, w, l, h, yaw], ...]),  # (N, 7)
        "labels_3d": torch.tensor([0, 1, ...]),                        # (N,)
        "velocities": torch.tensor([[vx, vy], ...]),                   # (N, 2) optional
    },
    ...  # One per frame in the batch
]

Dict[str, List] format — A dict with boxes_3d and labels_3d keys, each containing per-frame lists:

ground_truth = {
    "boxes_3d": [tensor_frame1, tensor_frame2, ...],
    "labels_3d": [tensor_frame1, tensor_frame2, ...],
    "velocities": [tensor_frame1, tensor_frame2, ...],  # optional
}

3D Bounding Box Format

3D bounding boxes should be in [cx, cy, cz, w, l, h, yaw] 7-DoF format. Coordinates are in meters, yaw is in radians.
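Annotation sources often store yaw in degrees, so convert to radians when building the 7-DoF tensor. A minimal sketch (the helper name is illustrative, not part of the SDK):

```python
import math
import torch

def make_box_3d(cx, cy, cz, w, l, h, yaw_deg):
    """Build one [cx, cy, cz, w, l, h, yaw] box; meters, with yaw converted to radians."""
    return torch.tensor([cx, cy, cz, w, l, h, math.radians(yaw_deg)])

box = make_box_3d(1.0, 2.0, 0.5, 1.8, 4.5, 1.6, 90.0)  # a car-sized box rotated 90 degrees
print(box[-1])  # yaw in radians (pi/2)
```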

Velocity Field

For models with velocity estimation (CenterPoint, BEVFusion), you can include a velocities field ([vx, vy] m/s) in the ground truth.

Upload Data

Upload collected data to the API server.

tracer.wait_for_save()

When using n_epoch

If you specified n_epoch during data collection, you must pass the same n_epoch to wait_for_save().

tracer.wait_for_save(n_epoch=0)

Using n_epoch

Specifying the n_epoch parameter lets you collect data separately for each epoch. During evaluation, pass n_epoch="latest" to evaluate only the most recent epoch's data.

STEP 3: Run Evaluation with Evaluator

Use Evaluator to run evaluation on the collected data. Use the same model_name and version_name specified in the Tracer.

from ml_debugger.evaluator import Evaluator

evaluator = Evaluator(
    model_name="<your_model_name>",
    version_name="v1",
)

# Run evaluation
result = evaluator.request_evaluation()

request_evaluation() accepts the following optional arguments:

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| result_name | Optional[str] | None | Custom name for the evaluation result. If not specified, a name is auto-generated. |
| n_epoch | Union[str, int, None] | "latest" | Target epoch for evaluation. Use "latest" for the latest epoch, an integer for a specific epoch, or None to evaluate all epochs. |

# Specify a result name and evaluate a specific epoch
result = evaluator.request_evaluation(
    result_name="my_model_experiment_001",
    n_epoch=5,
)

Get Past Evaluation Results

# Get list of evaluation results
evaluator.list_results()

# Get specific evaluation result
result = evaluator.get_result(result_name="<result_name>")

STEP 4: Review Results

Review Evaluation Results in GUI

  1. Home - Open the target model and move to the Training screen.

    Home: https://app.adansons.ai/home

    Training: https://app.adansons.ai/training

    Home overview

  2. Evaluation with GT - On this screen, you can review evaluation results. Use error codes and heatmaps to understand model performance on datasets with ground truth.

    Evaluation overview

    Evaluation Heatmap

    You can also check detailed error codes by category, review recommended debugging methods, and identify which error codes occur most frequently.

    Evaluation Summary

  3. Compare - You can also compare model performance with other models on this screen.

    Compare overview

    Add model comparison

    Comparison of models

Review Evaluation Results with SDK

Use the Result object to review evaluation results.

Metrics Summary

print(result.metrics_summary())

Example output:

dataset  counts  accuracy  auroc  auprc  net_entropy_loss  net_entropy  error_proba_auroc  error_proba_auprc
train    5000    0.098     0.457  0.094  12875.500         10480.190    0.857              0.453

Issue Category Summary

print(result.issue_category_summary())

Example output:

dataset  stable_coverage_ratio  operational_coverage_ratio  hotspot_ratio  recessive_hotspot_ratio  critical_hotspot_ratio  aleatoric_hotspot_ratio
train    0.000                  0.001                       0.753          0.226                    0.019                   0.006

Detailed Summary

result.get_summary()

This method displays metrics, Issue Category, and detailed error code distribution for each category.

Get Issue List

issues_df = result.get_issues()

You can get a list of all Issues (error codes) as a DataFrame.
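Because get_issues() returns a pandas DataFrame, standard pandas operations work on it. The sketch below uses a toy frame with "category" and "error_code" columns (column names taken from the get_view example below; the actual columns and values returned by the SDK may differ):

```python
import pandas as pd

# Toy stand-in for the DataFrame returned by result.get_issues();
# real column names and values may differ.
issues_df = pd.DataFrame({
    "category": ["hotspot", "hotspot", "stable"],
    "error_code": ["E101", "E203", "E001"],
})

# Count occurrences of each error code within one category
hotspots = issues_df[issues_df["category"] == "hotspot"]
print(hotspots["error_code"].value_counts())
```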

Get Custom View

result.get_view(
    groupby=["category", "error_code"],
    adjustby="category"
)

You can get custom views with grouping and filtering.

Complete Sample Code

Classification:

import os
import torch
from ml_debugger.training import ClassificationTracer
from ml_debugger.evaluator import Evaluator

# Set authentication credentials
os.environ["MLD_API_ENDPOINT"] = "https://api.adansons.ai"
os.environ["MLD_API_KEY"] = "mldbg_*************"

# Prepare model and dataloader (user implementation)
model = ...  # Trained model
dataloader = ...  # DataLoader
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model.to(device)
model.eval()

# Initialize Tracer
tracer = ClassificationTracer(
    model,
    model_name="my_model",
    version_name="v1",
)

# Collect data
for image, label, indices in dataloader:
    image = image.to(device)
    label = label.to(device)
    with torch.no_grad():
        _ = tracer(image, label, input_ids=indices.cpu().numpy(), dataset_type="train")

# Upload
tracer.wait_for_save()

# Run evaluation
evaluator = Evaluator(model_name="my_model", version_name="v1")
result = evaluator.request_evaluation()

# Review results
result.get_summary()

Object Detection:

import os
import torch
from ml_debugger.training import ObjectDetectionTracer
from ml_debugger.evaluator import Evaluator

# Set authentication credentials
os.environ["MLD_API_ENDPOINT"] = "https://api.adansons.ai"
os.environ["MLD_API_KEY"] = "mldbg_*************"

# Prepare model and dataloader (user implementation)
model = ...  # Object Detection model (e.g. Faster R-CNN, SSD, YOLOv8, DETR)
dataloader = ...  # DataLoader
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model.to(device)
model.eval()

# Initialize Tracer
tracer = ObjectDetectionTracer(
    model,
    model_name="my_od_model",
    version_name="v1",
)

# Collect data
for images, targets, image_ids in dataloader:
    images = images.to(device)
    _ = tracer(
        images,
        targets,
        input_ids=image_ids,
        dataset_type="train",
    )

# Upload
tracer.wait_for_save()

# Run evaluation
evaluator = Evaluator(model_name="my_od_model", version_name="v1")
result = evaluator.request_evaluation()

# Review results
result.get_summary()

3D Object Detection:

import os
import torch
from ml_debugger.training import ObjectDetection3DTracer
from ml_debugger.evaluator import Evaluator

# Set authentication credentials
os.environ["MLD_API_ENDPOINT"] = "https://api.adansons.ai"
os.environ["MLD_API_KEY"] = "mldbg_*************"

# Prepare model and dataloader (user implementation)
model = ...  # 3D Object Detection model (e.g. CenterPoint, PointPillars, BEVFormer, BEVFusion)
dataloader = ...  # DataLoader
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model.to(device)
model.eval()

# Initialize Tracer
tracer = ObjectDetection3DTracer(
    model,
    model_name="my_3d_od_model",
    version_name="v1",
)

# Collect data
for points, targets, frame_ids in dataloader:
    points = points.to(device)
    _ = tracer(
        points,
        targets,
        input_ids=frame_ids,
        dataset_type="train",
    )

# Upload
tracer.wait_for_save()

# Run evaluation
evaluator = Evaluator(model_name="my_3d_od_model", version_name="v1")
result = evaluator.request_evaluation()

# Review results
result.get_summary()

Next Steps