model_name / version_name
The MLdebugger SDK uses model_name and version_name to identify models and their training versions.
This page explains the concepts of these identifiers and how to use them correctly.
Overview
| Identifier | Scope | Purpose |
|---|---|---|
| model_name | Unique within Organization | Model architecture identification |
| version_name | Unique within model_name | Training conditions/dataset identification |
| n_epoch | Metadata within version_name | Distinguish data by epoch |
Organization
Definition
Organization is the unit of management for models and data in MLdebugger.
All model_names are scoped within an Organization.
Personal Organization
Currently, the ability for users to create new Organizations is not available. Each user is automatically assigned a Personal Organization, which is used as the default Organization.
Organization Management
Personal Organization is automatically created when a user account is created. Team-oriented Organization features are planned for future updates.
model_name
Definition
model_name is a string that uniquely identifies a model within an Organization.
```python
tracer = ClassificationTracer(
    model,
    model_name="resnet18_cifar10",  # Model identifier
    version_name="v1",
)
```
Characteristics
Unique within Organization
- The same model_name cannot be used for multiple projects within the same Organization
- Assign a different model_name to each project
Model Quota Consumption
- Creating a new model_name consumes 1 Model Quota
- Quota limits vary by contract plan
Association with Architecture Hash
- The same model_name is associated with the same model architecture
- The model's architecture hash is recorded on first use
Naming Recommendations
```python
# Good examples
model_name = "resnet18_cifar10_classification"
model_name = "yolov8_coco_detection"
model_name = "project_a_main_model"

# Examples to avoid
model_name = "model"  # Too generic
model_name = "test"   # Cannot distinguish production and test
model_name = "v1"     # Easily confused with version_name
```
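The recommendations above can be enforced with a small pre-flight check. The helper below is not part of the MLdebugger SDK; it is a hypothetical validator you might run before creating a tracer, flagging names that fall into the anti-patterns listed here.

```python
# Hypothetical helper -- not part of the MLdebugger SDK.
GENERIC_NAMES = {"model", "test", "main", "new"}

def check_model_name(model_name: str) -> list[str]:
    """Return a list of warnings for a proposed model_name."""
    warnings = []
    if model_name.lower() in GENERIC_NAMES:
        warnings.append(f"'{model_name}' is too generic to distinguish projects")
    if model_name.lower().startswith("v") and model_name[1:].isdigit():
        warnings.append(f"'{model_name}' looks like a version_name")
    if len(model_name.split("_")) < 2:
        warnings.append("consider encoding architecture and dataset, e.g. 'resnet18_cifar10'")
    return warnings

print(check_model_name("resnet18_cifar10_classification"))  # []
print(check_model_name("v1"))  # flagged: looks like a version_name
```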
Note When Changing Architecture
When changing the model architecture, you need to use a new model_name.
```python
# Using ResNet18
tracer = ClassificationTracer(model_resnet18, model_name="resnet18_project", ...)

# When changed to ResNet50 → new model_name required
tracer = ClassificationTracer(model_resnet50, model_name="resnet50_project", ...)
```
Same model_name with Different Architectures
Using the same model_name with different architecture models will cause an error.
version_name
Definition
version_name is a string that uniquely identifies a training version within a model_name.
```python
tracer = ClassificationTracer(
    model,
    model_name="resnet18_cifar10",
    version_name="v1_lr0001",  # Training version identifier
)
```
Characteristics
Unique within model_name
- A version_name must be unique within its model_name
- The same version_name can be used under different model_names
Distinguishing Dataset/Hyperparameters
- Use a new version_name when changing datasets
- Also use a new version_name when changing hyperparameters
Reusing Same version_name When Resuming Training
- When interrupting and resuming training, the same version_name can be reused
- Data will be appended
Naming Recommendations
```python
# Good examples: names that indicate conditions
version_name = "v1_lr0001_batch32"
version_name = "20251219_experiment_a"
version_name = "baseline"

# Examples by use case
version_name = "train_augmented"   # With data augmentation
version_name = "train_no_augment"  # Without data augmentation
version_name = "finetune_v1"       # Fine-tuning
```
Condition Conflicts with Same version_name
Collecting data with different training conditions under the same version_name may cause data conflicts.
```python
# Dangerous: collecting data with different conditions under the same version_name
tracer = ClassificationTracer(model, "model_a", "v1")
# ... collect data trained with lr=0.001 ...

# Later, collecting data with lr=0.01 under the same version_name → Conflict!
tracer = ClassificationTracer(model, "model_a", "v1")
# ... collect data trained with lr=0.01 ...
```
Use New version_name When Conditions Change
Always use a new version_name when changing training conditions.
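One way to make this rule hard to break is to derive the version_name from the training conditions themselves, so any changed condition automatically yields a new identifier. The helper below is a hypothetical convention, not an SDK feature:

```python
def version_from_config(config: dict) -> str:
    """Build a version_name that encodes the training conditions.

    Hypothetical helper: any change to the config produces a different name,
    so conflicting conditions can never share a version_name.
    """
    parts = [f"{key}{value}" for key, value in sorted(config.items())]
    return "_".join(parts).replace(".", "")

v1 = version_from_config({"lr": 0.001, "batch": 32})
v2 = version_from_config({"lr": 0.01, "batch": 32})
print(v1)  # batch32_lr0001
print(v2)  # batch32_lr001
assert v1 != v2  # changed lr → new version_name
```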
n_epoch
Definition
n_epoch is an optional parameter to distinguish data by epoch.
```python
tracer(
    image,
    label,
    input_ids=indices,
    dataset_type="train",
    n_epoch=5,  # Epoch number
)
```
Characteristics
Optional Metadata
- n_epoch is not required
- If not specified, all data is treated as the same epoch
Internal Value When Not Specified
When n_epoch is not specified, it is internally managed as n_epoch=None.
You can later collect data with a specific n_epoch value. In that case, data previously collected without specifying n_epoch can be referenced by explicitly specifying n_epoch=None.
```python
# Initially collect data without specifying n_epoch
tracer(image, label, input_ids=indices, dataset_type="train")
# → Internally saved as n_epoch=None

# Later collect data with n_epoch specified
tracer(image, label, input_ids=indices, dataset_type="train", n_epoch=1)

# To target data collected without n_epoch during evaluation
result = evaluator.request_evaluation(n_epoch=None)
```
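Conceptually, n_epoch behaves like an ordinary grouping key on the collected records, with None as one possible value. The record structure below is illustrative, not the SDK's actual storage format:

```python
from collections import defaultdict

# Illustrative: inference records grouped by their n_epoch key.
# None (collected without n_epoch) is a distinct group, not a wildcard.
groups = defaultdict(list)
for record, n_epoch in [("img_0", None), ("img_1", None), ("img_2", 1)]:
    groups[n_epoch].append(record)

print(groups[None])  # records collected without specifying n_epoch
print(groups[1])     # records collected with n_epoch=1
```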
"latest" Alias
Specify n_epoch="latest" during evaluation to evaluate only the latest epoch data.
```python
result = evaluator.request_evaluation(n_epoch="latest")
```
Usage When Reusing version_name
When continuing training with the same version_name, n_epoch can be used to distinguish data.
```python
# Initial training: epochs 0-4
for epoch in range(5):
    for batch in dataloader:
        tracer(..., n_epoch=epoch)

# Continued training: epochs 5-9
for epoch in range(5, 10):
    for batch in dataloader:
        tracer(..., n_epoch=epoch)

# Evaluate only the latest epoch (9)
result = evaluator.request_evaluation(n_epoch="latest")
```
Caveats for "latest"
latest is Determined by Timestamp
n_epoch="latest" determines the latest epoch based on the timestamp of collected inference data.
It is not based on the numerical value of n_epoch.
Problematic Case:
If you restart epochs from 0 with the same version_name, the previously recorded maximum n_epoch value (e.g., 9) will no longer be treated as latest.
The newly recorded low-numbered epochs have the most recent timestamps, so latest will point to that data.
```python
# Initial training: record epochs 0-9 (January 2025)
for epoch in range(10):
    tracer(..., n_epoch=epoch)

# Months later, restart from epoch 0 with the same version_name (March 2025)
for epoch in range(5):
    tracer(..., n_epoch=epoch)

# latest points to epoch=4 (because it has the most recent timestamp);
# the epoch=9 data is NOT treated as latest
result = evaluator.request_evaluation(n_epoch="latest")
```
This can result in unintended data being evaluated or data conflicts.
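The timestamp-based behavior can be reproduced with plain Python. Each record below carries its n_epoch and a collection time; "latest" selects by timestamp, not by the largest n_epoch value. The record tuples are illustrative, not the SDK's storage format:

```python
from datetime import datetime

# Illustrative records: (n_epoch, collection timestamp).
records = [
    (9, datetime(2025, 1, 15)),  # initial training, highest epoch number
    (4, datetime(2025, 3, 10)),  # restarted training, recorded later
]

def latest_epoch(records):
    """Mimic n_epoch='latest': pick the most recently collected record."""
    return max(records, key=lambda r: r[1])[0]

print(latest_epoch(records))  # 4 -- not 9, despite 9 being numerically larger
```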
Solutions:
- Use a new version_name when restarting training from scratch
- Explicitly specify n_epoch during evaluation
- Specify result_name and n_epoch when initializing DataFilter/Logger
See DataFiltering for details.
Usage Summary
| Scenario | model_name | version_name | n_epoch |
|---|---|---|---|
| New model architecture | New | Any | - |
| Training with new dataset | Existing | New | - |
| Hyperparameter change | Existing | New | - |
| Interrupt → Resume training | Existing | Existing | Recommended |
| Evaluation by epoch | Existing | Existing | Specify |
Next Steps
- Getting Started - Basic usage
- Evaluation and Result - Detailed evaluation settings