model_name / version_name
The MLdebugger SDK uses model_name and version_name to identify models and their training versions.
This page explains the concepts of these identifiers and how to use them correctly.
Overview
| Identifier | Scope | Purpose |
|---|---|---|
| model_name | Unique within Organization | Model architecture identification |
| version_name | Unique within model_name | Training conditions/dataset identification |
| n_epoch | Metadata within version_name | Distinguish data by epoch |
Organization
Definition
Organization is the unit of management for models and data in MLdebugger.
All model_names are scoped within an Organization.
Personal Organization
Currently, the ability for users to create new Organizations is not available. Each user is automatically assigned a Personal Organization, which is used as the default Organization.
Organization Management
Personal Organization is automatically created when a user account is created. Team-oriented Organization features are planned for future updates.
model_name
Definition
model_name is a string that uniquely identifies a model within an Organization.
```python
tracer = ClassificationTracer(
    model,
    model_name="resnet18_cifar10",  # Model identifier
    version_name="v1",
)
```
Characteristics
Unique within Organization
- The same model_name cannot be used for multiple projects within the same Organization
- Assign a different model_name to each project
Model Quota Consumption
- Creating a new model_name consumes 1 Model Quota
- Quota limits vary by contract plan
Association with Architecture Hash
- The same model_name is associated with the same model architecture
- The model's architecture hash is recorded on first use
Naming Recommendations
```python
# Good examples
model_name = "resnet18_cifar10_classification"
model_name = "yolov8_coco_detection"
model_name = "project_a_main_model"

# Examples to avoid
model_name = "model"  # Too generic
model_name = "test"   # Cannot distinguish production and test
model_name = "v1"     # Easily confused with version_name
```
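The recommendations above can be enforced with a small pre-flight check. The helper below is not part of the MLdebugger SDK; it is a hypothetical validator you might run before creating a tracer, flagging names that fall into the anti-patterns listed here.

```python
# Hypothetical helper -- not part of the MLdebugger SDK.
GENERIC_NAMES = {"model", "test", "main", "new"}

def check_model_name(model_name: str) -> list[str]:
    """Return a list of warnings for a proposed model_name."""
    warnings = []
    if model_name.lower() in GENERIC_NAMES:
        warnings.append(f"'{model_name}' is too generic to distinguish projects")
    if model_name.lower().startswith("v") and model_name[1:].isdigit():
        warnings.append(f"'{model_name}' looks like a version_name")
    if len(model_name.split("_")) < 2:
        warnings.append("consider encoding architecture and dataset, e.g. 'resnet18_cifar10'")
    return warnings

print(check_model_name("resnet18_cifar10_classification"))  # []
print(check_model_name("v1"))  # flagged: looks like a version_name
```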
Note When Changing Architecture
When changing the model architecture, you need to use a new model_name.
```python
# Using ResNet18
tracer = ClassificationTracer(model_resnet18, model_name="resnet18_project", ...)

# When changed to ResNet50 → new model_name required
tracer = ClassificationTracer(model_resnet50, model_name="resnet50_project", ...)
```
Same model_name with Different Architectures
Using the same model_name with different architecture models will cause an error.
version_name
Definition
version_name is a string that uniquely identifies a training version within a model_name.
```python
tracer = ClassificationTracer(
    model,
    model_name="resnet18_cifar10",
    version_name="v1_lr0001",  # Training version identifier
)
```
Characteristics
Unique within model_name
- A version_name must be unique within its model_name
- The same version_name can be used under different model_names
Distinguishing Dataset/Hyperparameters
- Use a new version_name when changing datasets
- Also use a new version_name when changing hyperparameters
Reusing Same version_name When Resuming Training
- When interrupting and resuming training, the same version_name can be reused
- Data will be appended
Naming Recommendations
```python
# Good examples: names that indicate conditions
version_name = "v1_lr0001_batch32"
version_name = "20251219_experiment_a"
version_name = "baseline"

# Examples by use case
version_name = "train_augmented"   # With data augmentation
version_name = "train_no_augment"  # Without data augmentation
version_name = "finetune_v1"       # Fine-tuning
```
Condition Conflicts with Same version_name
Collecting data with different training conditions under the same version_name may cause data conflicts.
```python
# Dangerous: collecting data with different conditions under the same version_name
tracer = ClassificationTracer(model, "model_a", "v1")
# ... collect data trained with lr=0.001 ...

# Later, collecting data with lr=0.01 under the same version_name → Conflict!
tracer = ClassificationTracer(model, "model_a", "v1")
# ... collect data trained with lr=0.01 ...
```
Use New version_name When Conditions Change
Always use a new version_name when changing training conditions.
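One way to make this rule hard to break is to derive the version_name from the training conditions themselves, so any changed condition automatically yields a new identifier. The helper below is a hypothetical convention, not an SDK feature:

```python
def version_from_config(config: dict) -> str:
    """Build a version_name that encodes the training conditions.

    Hypothetical helper: any change to the config produces a different name,
    so conflicting conditions can never share a version_name.
    """
    parts = [f"{key}{value}" for key, value in sorted(config.items())]
    return "_".join(parts).replace(".", "")

v1 = version_from_config({"lr": 0.001, "batch": 32})
v2 = version_from_config({"lr": 0.01, "batch": 32})
print(v1)  # batch32_lr0001
print(v2)  # batch32_lr001
assert v1 != v2  # changed lr → new version_name
```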
n_epoch
Definition
n_epoch is an optional parameter to distinguish data by epoch.
```python
tracer(
    image,
    label,
    input_ids=indices,
    dataset_type="train",
    n_epoch=5,  # Epoch number
)
```
Characteristics
Optional Metadata
- n_epoch is not required
- If not specified, all data is treated as the same epoch
Internal Value When Not Specified
When n_epoch is not specified, it is internally managed as n_epoch=None.
You can later collect data with a specific n_epoch value. In that case, data previously collected without specifying n_epoch can be referenced by explicitly specifying n_epoch=None.
```python
# Initially collect data without specifying n_epoch
tracer(image, label, input_ids=indices, dataset_type="train")
# → Internally saved as n_epoch=None

# Later collect data with n_epoch specified
tracer(image, label, input_ids=indices, dataset_type="train", n_epoch=1)

# To target data collected without n_epoch during evaluation
result = evaluator.request_evaluation(n_epoch=None)
```
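Conceptually, n_epoch behaves like an ordinary grouping key on the collected records, with None as one possible value. The record structure below is illustrative, not the SDK's actual storage format:

```python
from collections import defaultdict

# Illustrative: inference records grouped by their n_epoch key.
# None (collected without n_epoch) is a distinct group, not a wildcard.
groups = defaultdict(list)
for record, n_epoch in [("img_0", None), ("img_1", None), ("img_2", 1)]:
    groups[n_epoch].append(record)

print(groups[None])  # records collected without specifying n_epoch
print(groups[1])     # records collected with n_epoch=1
```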
"latest" Alias
Specify n_epoch="latest" during evaluation to evaluate only the latest epoch data.
```python
result = evaluator.request_evaluation(n_epoch="latest")
```
Usage When Reusing version_name
When continuing training with the same version_name, n_epoch can be used to distinguish data.
```python
# Initial training: epochs 0-4
for epoch in range(5):
    for batch in dataloader:
        tracer(..., n_epoch=epoch)

# Continued training: epochs 5-9
for epoch in range(5, 10):
    for batch in dataloader:
        tracer(..., n_epoch=epoch)

# Evaluate only the latest epoch (9)
result = evaluator.request_evaluation(n_epoch="latest")
```
Caveats for "latest"
latest is Determined by Timestamp
n_epoch="latest" determines the latest epoch based on the timestamp of collected inference data.
It is not based on the numerical value of n_epoch.
Problematic Case:
If you restart epochs from 0 with the same version_name, the previously recorded maximum n_epoch value (e.g., 9) will no longer be treated as latest.
The newly recorded low-numbered epochs have the most recent timestamps, so latest will point to that data.
```python
# Initial training: record epochs 0-9 (January 2025)
for epoch in range(10):
    tracer(..., n_epoch=epoch)

# Months later, restart from epoch 0 with the same version_name (March 2025)
for epoch in range(5):
    tracer(..., n_epoch=epoch)

# latest points to epoch=4 (because it has the most recent timestamp);
# the epoch=9 data is NOT treated as latest
result = evaluator.request_evaluation(n_epoch="latest")
```
This can result in unintended data being evaluated or data conflicts.
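The timestamp-based behavior can be reproduced with plain Python. Each record below carries its n_epoch and a collection time; "latest" selects by timestamp, not by the largest n_epoch value. The record tuples are illustrative, not the SDK's storage format:

```python
from datetime import datetime

# Illustrative records: (n_epoch, collection timestamp).
records = [
    (9, datetime(2025, 1, 15)),  # initial training, highest epoch number
    (4, datetime(2025, 3, 10)),  # restarted training, recorded later
]

def latest_epoch(records):
    """Mimic n_epoch='latest': pick the most recently collected record."""
    return max(records, key=lambda r: r[1])[0]

print(latest_epoch(records))  # 4 -- not 9, despite 9 being numerically larger
```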
Solutions:
- Use a new version_name when restarting training from scratch
- Explicitly specify n_epoch during evaluation
- Specify result_name and n_epoch when initializing DataFilter/Logger
See DataFiltering for details.
Usage Summary
| Scenario | model_name | version_name | n_epoch |
|---|---|---|---|
| New model architecture | New | Any | - |
| Training with new dataset | Existing | New | - |
| Hyperparameter change | Existing | New | - |
| Interrupt → Resume training | Existing | Existing | Recommended |
| Evaluation by epoch | Existing | Existing | Specify |
Next Steps
- Getting Started - Basic usage
- Evaluation and Result - Detailed evaluation settings