Tracing + Evaluation

This page explains the workflow for collecting inference logs and annotation data from datasets with ground truth, and for running evaluations with the MLdebugger SDK.

Overview

The Tracing + Evaluation flow consists of the following steps:

  1. Set up authentication credentials
  2. Collect inference log data and annotation information with Tracer
  3. Run evaluation with Evaluator
  4. Review results

The Tracer class used in STEP 2 varies by task:

| Task | Tracer Class |
| --- | --- |
| Classification | ClassificationTracer |
| Object Detection | ObjectDetectionTracer |
| 3D Object Detection | ObjectDetection3DTracer |

STEP 1: Set Up Authentication Credentials

Set authentication credentials in environment variables to connect to the MLdebugger API. For API Key generation, see Authentication.

Shell:

export MLD_API_ENDPOINT="https://api.adansons.ai"
export MLD_API_KEY="mldbg_*************"

Python:

import os

os.environ["MLD_API_ENDPOINT"] = "https://api.adansons.ai"
os.environ["MLD_API_KEY"] = "mldbg_*************"

STEP 2: Collect Data with Tracer

ClassificationTracer

Use ClassificationTracer to collect internal features and label information during model inference.

Initialize Tracer

from ml_debugger.training import ClassificationTracer

tracer = ClassificationTracer(
    model,                      # Model to evaluate
    model_name="resnet18",      # Model identifier
    version_name="v1",          # Version identifier
)

Collect Data

Collect data in batches using a DataLoader.

import torch

for image, label, indices in dataloader:
    image = image.to(device)
    label = label.to(device)

    with torch.no_grad():
        _ = tracer(
            image,                                              # Input data
            label,                                              # Ground truth label
            input_ids=indices.cpu().numpy(),                    # Data identifiers
            dataset_type="train",                               # Dataset type
            n_epoch=0,                                          # Epoch number (optional)
        )

Upload Data

Upload collected data to the API server.

tracer.wait_for_save()

ObjectDetectionTracer

Use ObjectDetectionTracer to collect inference results and ground truth from Object Detection models.

Supported Models:

| Framework | Supported Models |
| --- | --- |
| PyTorch | Faster R-CNN, SSD, YOLOv8 (Ultralytics), CenterNet, DETR |
| TensorFlow | KerasCV, TF Object Detection API, Custom Keras |

DETR Family Support

DETR (DEtection TRansformer) and its variants (Conditional DETR, Deformable DETR) are supported. Both Facebook Research DETR and HuggingFace Transformers DETR models are auto-detected. DETR does not use NMS — the iou_thresh parameter is ignored. Filtering is controlled by score_thresh and max_detections_per_image.

Initialize Tracer

from ml_debugger.training import ObjectDetectionTracer

tracer = ObjectDetectionTracer(
    model,                          # PyTorch or TensorFlow OD model
    model_name="faster_rcnn",       # Model identifier
    version_name="v1",              # Version identifier
    score_thresh=None,              # NMS score threshold (optional, auto-inferred from model)
    iou_thresh=None,                # NMS IoU threshold (optional, auto-inferred from model)
    max_detections_per_image=None,  # NMS max detections (optional, auto-inferred from model)
)

ObjectDetectionTracer is a factory function that auto-detects the model's framework (PyTorch / TensorFlow) and architecture, returning the appropriate tracer subclass.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | nn.Module / tf.keras.Model | (required) | Target OD model |
| model_name | str | (required) | Model identifier |
| version_name | str | (required) | Version identifier |
| score_thresh | Optional[float] | None | NMS score threshold. Auto-inferred from model if not specified; falls back to 0.05 |
| iou_thresh | Optional[float] | None | NMS IoU threshold. Auto-inferred from model if not specified; falls back to 0.5 |
| max_detections_per_image | Optional[int] | None | Max detections after NMS. Auto-inferred from model if not specified; falls back to 300 |

Collect Data

Collect data in batches using a DataLoader.

import torch

for images, targets, image_ids in dataloader:
    images = images.to(device)

    _ = tracer(
        images,                         # Input images (B, C, H, W)
        targets,                        # Ground truth
        input_ids=image_ids,            # Image identifiers
        dataset_type="train",           # Dataset type
        n_epoch=0,                      # Epoch number (optional)
    )

Ground Truth Format

Ground truth supports two formats:

List[Dict] format — A list of dicts with boxes and labels per image:

ground_truth = [
    {
        "boxes": torch.tensor([[x1, y1, x2, y2], ...]),  # (N, 4)
        "labels": torch.tensor([0, 1, ...]),              # (N,)
    },
    ...  # One per image in the batch
]

Dict[str, List] format — A dict with boxes and labels keys, each containing per-image lists:

ground_truth = {
    "boxes": [tensor_img1, tensor_img2, ...],
    "labels": [tensor_img1, tensor_img2, ...],
}
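The two formats carry the same information, so you can convert between them freely. A minimal sketch converting List[Dict] ground truth into the Dict[str, List] form (the helper name is illustrative, not part of the SDK):

```python
import torch

def to_dict_of_lists(gt_list):
    """Convert List[Dict] ground truth to the equivalent Dict[str, List] form."""
    return {
        "boxes": [gt["boxes"] for gt in gt_list],
        "labels": [gt["labels"] for gt in gt_list],
    }

gt_list = [
    {"boxes": torch.tensor([[0.0, 0.0, 10.0, 10.0]]), "labels": torch.tensor([1])},
    # An image with no objects still needs correctly shaped empty tensors
    {"boxes": torch.zeros((0, 4)), "labels": torch.zeros((0,), dtype=torch.long)},
]
gt_dict = to_dict_of_lists(gt_list)
print(len(gt_dict["boxes"]))  # 2
```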

Coordinate Format

Bounding boxes should be in [x1, y1, x2, y2] (top-left, bottom-right) format.
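If your labels use the COCO-style [x, y, w, h] (top-left plus size) convention, convert them before passing them to the tracer. A minimal sketch (the function name is illustrative, not part of the SDK):

```python
import torch

def xywh_to_xyxy(boxes: torch.Tensor) -> torch.Tensor:
    """Convert (N, 4) [x, y, w, h] boxes to [x1, y1, x2, y2]."""
    x1y1 = boxes[:, :2]            # top-left corner is unchanged
    x2y2 = boxes[:, :2] + boxes[:, 2:]  # bottom-right = top-left + size
    return torch.cat([x1y1, x2y2], dim=1)

boxes_xywh = torch.tensor([[10.0, 20.0, 30.0, 40.0]])  # one 30x40 box at (10, 20)
print(xywh_to_xyxy(boxes_xywh))  # tensor([[10., 20., 40., 60.]])
```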

Upload Data

Upload collected data to the API server.

tracer.wait_for_save()

ObjectDetection3DTracer

Use ObjectDetection3DTracer to collect inference results and ground truth from 3D Object Detection models.

Supported Models:

| Model | Category | Modality | Primary Frameworks |
| --- | --- | --- | --- |
| PointPillars | Pillar-based | LiDAR-only | MMDetection3D, OpenPCDet |
| CenterPoint | Anchor-free center-based | LiDAR-only | MMDetection3D, OpenPCDet |
| BEVFormer | BEV Transformer | Camera-only | MMDetection3D |
| BEVFusion | Multi-modal fusion | LiDAR+Camera | MMDetection3D, OpenPCDet |

Derivative Model Support

Each model also supports derivative models with the same output format:

  • PointPillars: PointPillars++, EFMF-Pillars
  • CenterPoint: CenterFormer, VoxelNeXt, TransFusion, Voxel R-CNN
  • BEVFormer: BEVDet, BEVDet4D, StreamPETR
  • BEVFusion: Att-BEVFusion, SAMFusion

BEVFormer Family Support

BEVFormer and its derivatives use DETR-style Transformer query output. They do not use NMS — the iou_thresh parameter is ignored. Filtering is controlled by score_thresh and max_detections_per_frame.

Initialize Tracer

from ml_debugger.training import ObjectDetection3DTracer

tracer = ObjectDetection3DTracer(
    model,                              # PyTorch 3D OD model
    model_name="centerpoint",           # Model identifier
    version_name="v1",                  # Version identifier
    score_thresh=None,                  # Score threshold (optional, auto-inferred from model)
    iou_thresh=None,                    # BEV IoU threshold (optional, auto-inferred from model)
    max_detections_per_frame=None,      # Max detections per frame (optional, auto-inferred from model)
    pc_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0],  # Point cloud range (required for BEVFormer; auto-inferred from model if omitted)
)

ObjectDetection3DTracer is a factory function that auto-detects the model's architecture and returns the appropriate tracer subclass.

input_id / input_hash guideline

Set input_id to the smallest unit where inference is expected to be reproducible under fixed inference settings. For 3D OD, this is typically one frame (one timestamp). input_hash is derived from all modalities in that frame.
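The SDK derives input_hash internally; the sketch below only illustrates the idea of one digest per frame covering every modality at that timestamp. The helper name and hashing scheme are illustrative, not the SDK's actual implementation:

```python
import hashlib
import numpy as np

def frame_hash(points, images):
    """Illustrative only: one digest per frame, covering all modalities."""
    h = hashlib.sha256()
    h.update(points.tobytes())   # LiDAR point cloud for this timestamp
    for img in images:           # all camera views of the same timestamp
        h.update(img.tobytes())
    return h.hexdigest()

pts = np.zeros((100, 4), dtype=np.float32)
cams = [np.zeros((8, 8, 3), dtype=np.uint8)]
print(frame_hash(pts, cams)[:8])
```

Changing any modality in the frame changes the digest, which is the property that makes a frame the right unit of reproducibility.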

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | nn.Module | (required) | Target 3D OD model |
| model_name | str | (required) | Model identifier |
| version_name | str | (required) | Version identifier |
| score_thresh | Optional[float] | None | Score threshold. Auto-inferred from model if not specified; falls back to 0.1 |
| iou_thresh | Optional[float] | None | BEV IoU threshold. Only effective for PointPillars (other models do not use NMS). Auto-inferred from model if not specified; falls back to 0.2 |
| max_detections_per_frame | Optional[int] | None | Max detections per frame. Auto-inferred from model if not specified; falls back to 300 |
| pc_range | Optional[List[float]] | None | Point cloud range [x_min, y_min, z_min, x_max, y_max, z_max]. Required for BEVFormer. Auto-inferred from model if not specified; raises an error if unavailable. |

Collect Data

Collect data in batches using a DataLoader.

LiDAR Input (Point Cloud Tensor):

import torch

for points, targets, frame_ids in dataloader:
    points = points.to(device)

    _ = tracer(
        points,                         # Point cloud (N, 4+)
        targets,                        # Ground truth
        input_ids=frame_ids,            # Frame identifiers
        dataset_type="train",           # Dataset type
        n_epoch=0,                      # Epoch number (optional)
    )

Dict Input (MMDetection3D format):

for batch_data in dataloader:
    model_input = {
        "points": [pts.to(device) for pts in batch_data["points"]],
        "img": batch_data["img"].to(device),
        "img_metas": batch_data["img_metas"],
    }

    _ = tracer(
        model_input,                    # Dict-format input
        batch_data["gt"],               # Ground truth
        input_ids=batch_data["frame_ids"],
        dataset_type="train",
        n_epoch=0,
    )

Ground Truth Format

List[Dict] format — A list of dicts with boxes_3d and labels_3d per frame:

ground_truth = [
    {
        "boxes_3d": torch.tensor([[cx, cy, cz, w, l, h, yaw], ...]),  # (N, 7)
        "labels_3d": torch.tensor([0, 1, ...]),                        # (N,)
        "velocities": torch.tensor([[vx, vy], ...]),                   # (N, 2) optional
    },
    ...  # One per frame in the batch
]

Dict[str, List] format — A dict with boxes_3d and labels_3d keys, each containing per-frame lists:

ground_truth = {
    "boxes_3d": [tensor_frame1, tensor_frame2, ...],
    "labels_3d": [tensor_frame1, tensor_frame2, ...],
    "velocities": [tensor_frame1, tensor_frame2, ...],  # optional
}

3D Bounding Box Format

3D bounding boxes should be in [cx, cy, cz, w, l, h, yaw] 7-DoF format. Coordinates are in meters, yaw is in radians.
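Annotation sources often store yaw in degrees, so convert to radians when building the 7-DoF tensor. A minimal sketch (the helper name is illustrative, not part of the SDK):

```python
import math
import torch

def make_box_3d(cx, cy, cz, w, l, h, yaw_deg):
    """Build one [cx, cy, cz, w, l, h, yaw] box; meters, with yaw converted to radians."""
    return torch.tensor([cx, cy, cz, w, l, h, math.radians(yaw_deg)])

box = make_box_3d(1.0, 2.0, 0.5, 1.8, 4.5, 1.6, 90.0)  # a car-sized box rotated 90 degrees
print(box[-1])  # yaw in radians (pi/2)
```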

Velocity Field

For models with velocity estimation (CenterPoint, BEVFusion), you can include a velocities field ([vx, vy] m/s) in the ground truth.

Upload Data

Upload collected data to the API server.

tracer.wait_for_save()

When using n_epoch

If you specified n_epoch during data collection, you must pass the same n_epoch to wait_for_save().

tracer.wait_for_save(n_epoch=0)

Using n_epoch

Specifying the n_epoch parameter lets you collect data separately for each epoch. During evaluation, pass n_epoch="latest" to evaluate only the most recent epoch's data.

STEP 3: Run Evaluation with Evaluator

Use Evaluator to run evaluation on the collected data. Use the same model_name and version_name specified in the Tracer.

from ml_debugger.evaluator import Evaluator

evaluator = Evaluator(
    model_name="<your_model_name>",
    version_name="v1",
)

# Run evaluation
result = evaluator.request_evaluation()

request_evaluation() accepts the following optional arguments:

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| result_name | Optional[str] | None | Custom name for the evaluation result. If not specified, a name is auto-generated. |
| n_epoch | Union[str, int, None] | "latest" | Target epoch for evaluation. Use "latest" for the latest epoch, an integer for a specific epoch, or None to evaluate all epochs. |

# Specify a result name and evaluate a specific epoch
result = evaluator.request_evaluation(
    result_name="my_model_experiment_001",
    n_epoch=5,
)

Get Past Evaluation Results

# Get list of evaluation results
evaluator.list_results()

# Get specific evaluation result
result = evaluator.get_result(result_name="<result_name>")

STEP 4: Review Results

Review Evaluation Results in GUI

  1. Home - Open the target model and move to the Training screen.

    Home: https://app.adansons.ai/home

    Training: https://app.adansons.ai/training

    Home overview

  2. Evaluation with GT - On this screen, you can review evaluation results. Use error codes and heatmaps to understand model performance on datasets with ground truth.

    Evaluation overview

    Evaluation Heatmap

    You can also check detailed error codes by category, review recommended debugging methods, and identify which error codes occur most frequently.

    Evaluation Summary

  3. Compare - You can also compare model performance with other models on this screen.

    Compare overview

    Add model comparison

    Comparison of models

Review Evaluation Results with SDK

Use the Result object to review evaluation results.

Metrics Summary

print(result.metrics_summary())

Example output:

dataset  counts  accuracy  auroc  auprc  net_entropy_loss  net_entropy  error_proba_auroc  error_proba_auprc
train    5000    0.098     0.457  0.094  12875.500         10480.190    0.857              0.453

Issue Category Summary

print(result.issue_category_summary())

Example output:

dataset  stable_coverage_ratio  operational_coverage_ratio  hotspot_ratio  recessive_hotspot_ratio  critical_hotspot_ratio  aleatoric_hotspot_ratio
train    0.000                  0.001                       0.753          0.226                    0.019                   0.006

Detailed Summary

result.get_summary()

This method displays metrics, Issue Category, and detailed error code distribution for each category.

Get Issue List

issues_df = result.get_issues()

You can get a list of all Issues (error codes) as a DataFrame.
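Because get_issues() returns a pandas DataFrame, standard pandas operations work on it. The sketch below uses a toy frame with "category" and "error_code" columns (column names taken from the get_view example below; the actual columns and values returned by the SDK may differ):

```python
import pandas as pd

# Toy stand-in for the DataFrame returned by result.get_issues();
# real column names and values may differ.
issues_df = pd.DataFrame({
    "category": ["hotspot", "hotspot", "stable"],
    "error_code": ["E101", "E203", "E001"],
})

# Count occurrences of each error code within one category
hotspots = issues_df[issues_df["category"] == "hotspot"]
print(hotspots["error_code"].value_counts())
```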

Get Custom View

result.get_view(
    groupby=["category", "error_code"],
    adjustby="category"
)

You can get custom views with grouping and filtering.

Complete Sample Code

Classification:

import os
import torch
from ml_debugger.training import ClassificationTracer
from ml_debugger.evaluator import Evaluator

# Set authentication credentials
os.environ["MLD_API_ENDPOINT"] = "https://api.adansons.ai"
os.environ["MLD_API_KEY"] = "mldbg_*************"

# Prepare model and dataloader (user implementation)
model = ...  # Trained model
dataloader = ...  # DataLoader
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model.to(device)
model.eval()

# Initialize Tracer
tracer = ClassificationTracer(
    model,
    model_name="my_model",
    version_name="v1",
)

# Collect data
for image, label, indices in dataloader:
    image = image.to(device)
    label = label.to(device)
    with torch.no_grad():
        _ = tracer(image, label, input_ids=indices.cpu().numpy(), dataset_type="train")

# Upload
tracer.wait_for_save()

# Run evaluation
evaluator = Evaluator(model_name="my_model", version_name="v1")
result = evaluator.request_evaluation()

# Review results
result.get_summary()

Object Detection:

import os
import torch
from ml_debugger.training import ObjectDetectionTracer
from ml_debugger.evaluator import Evaluator

# Set authentication credentials
os.environ["MLD_API_ENDPOINT"] = "https://api.adansons.ai"
os.environ["MLD_API_KEY"] = "mldbg_*************"

# Prepare model and dataloader (user implementation)
model = ...  # Object Detection model (e.g. Faster R-CNN, SSD, YOLOv8, DETR)
dataloader = ...  # DataLoader
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model.to(device)
model.eval()

# Initialize Tracer
tracer = ObjectDetectionTracer(
    model,
    model_name="my_od_model",
    version_name="v1",
)

# Collect data
for images, targets, image_ids in dataloader:
    images = images.to(device)
    _ = tracer(
        images,
        targets,
        input_ids=image_ids,
        dataset_type="train",
    )

# Upload
tracer.wait_for_save()

# Run evaluation
evaluator = Evaluator(model_name="my_od_model", version_name="v1")
result = evaluator.request_evaluation()

# Review results
result.get_summary()

3D Object Detection:

import os
import torch
from ml_debugger.training import ObjectDetection3DTracer
from ml_debugger.evaluator import Evaluator

# Set authentication credentials
os.environ["MLD_API_ENDPOINT"] = "https://api.adansons.ai"
os.environ["MLD_API_KEY"] = "mldbg_*************"

# Prepare model and dataloader (user implementation)
model = ...  # 3D Object Detection model (e.g. CenterPoint, PointPillars, BEVFormer, BEVFusion)
dataloader = ...  # DataLoader
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model.to(device)
model.eval()

# Initialize Tracer
tracer = ObjectDetection3DTracer(
    model,
    model_name="my_3d_od_model",
    version_name="v1",
)

# Collect data
for points, targets, frame_ids in dataloader:
    points = points.to(device)
    _ = tracer(
        points,
        targets,
        input_ids=frame_ids,
        dataset_type="train",
    )

# Upload
tracer.wait_for_save()

# Run evaluation
evaluator = Evaluator(model_name="my_3d_od_model", version_name="v1")
result = evaluator.request_evaluation()

# Review results
result.get_summary()

Next Steps