Error Codes - Classification

MLdebugger error codes are labels that classify how models make mistakes. This page explains the definition of error codes for Classification tasks.

About Issue Category

Each error code belongs to an Issue Category (Stable Coverage, Hotspot, etc.). For the definitions of these categories, see Evaluation and Result.

Error Code Format

Error codes for Classification tasks are expressed in the following formats.

Basic Format

CLS{t}->{y}
  • t: Annotated ground truth class ID
  • y: Predicted class ID

Examples:
  • CLS0->0: Class 0 is correct and predicted as class 0 (correct)
  • CLS3->5: Class 3 is correct but predicted as class 5 (incorrect)

With Fluctuation

CLS{t}->{y1}|{y2}|...

When internal features are unstable and the ground truth class t is easily confused with multiple classes, the candidate classes are listed separated by |.

Examples:
  • CLS3->7|1: Class 3 is easily confused with class 7 or 1
  • CLS5->7|1|9: Class 5 is easily confused with classes 7, 1, and 9

Critical/Aleatoric Marker

CLS{t}->{y}**

Within Critical Hotspot or Aleatoric Hotspot, when the most plausible class based on internal features matches the annotation (t) but the prediction (y) differs, ** is appended.

Error codes with this marker indicate predictions that are likely incorrect from the internal-feature perspective as well.

Examples:
  • CLS9->7**: Class 9 is correct and 9 is most plausible from internal features, but predicted as 7 (incorrect)
  • CLS1->3**: Class 1 is correct and 1 is most plausible from internal features, but predicted as 3 (incorrect)
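The three formats above share one grammar: a ground truth class, one or more predicted classes separated by |, and an optional ** marker. As a minimal sketch, a regex-based parser for this grammar might look like the following (parse_error_code is a hypothetical helper, not part of the MLdebugger API):

```python
import re

# Grammar assumed from this page: CLS{t}->{y1}|{y2}|...{optional **}
CODE_RE = re.compile(r"^CLS(\d+)->(\d+(?:\|\d+)*)(\*\*)?$")

def parse_error_code(code):
    """Return (ground_truth, predicted_classes, has_marker), or None if unparseable."""
    m = CODE_RE.match(code)
    if m is None:
        return None
    ground_truth = int(m.group(1))
    predictions = [int(y) for y in m.group(2).split("|")]
    has_marker = m.group(3) is not None
    return ground_truth, predictions, has_marker

# parse_error_code("CLS3->7|1")  -> (3, [7, 1], False)
# parse_error_code("CLS9->7**")  -> (9, [7], True)
```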

Regular Critical/Aleatoric

Critical Hotspot and Aleatoric Hotspot without the ** marker are displayed in the normal CLS{t}->{y} format.

Error Code Interpretation Examples

Example 1: CLS6->7|1 (Hotspot)

  • Ground truth class: 6
  • Predicted class: 7 or 1
  • Issue Category: hotspot (Unstable)
  • Interpretation: Data of class 6, but easily confused with classes 7 and 1 in an unstable region
  • Action: Add training data at the boundary of classes 6, 7, and 1

Example 2: CLS5->3** (Critical Hotspot)

  • Ground truth class: 5
  • Predicted class: 3 (incorrect)
  • Issue Category: critical_hotspot (Over-Confidence)
  • Interpretation: Class 5 is correct and 5 is most plausible from internal features, but predicted as 3. High risk of confident incorrect predictions
  • Action: Highest priority - add Hard Negative data for this pattern

Example 3: CLS0->7|1|5 (Hotspot)

  • Ground truth class: 0
  • Predicted class: 7, 1, or 5
  • Issue Category: hotspot (Unstable)
  • Interpretation: Data of class 0, but easily confused with several entirely different classes
  • Action: Need data to learn class 0 features more clearly
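The three examples above suggest a simple triage order: codes with the ** marker first (confident incorrect predictions), then fluctuation codes (unstable regions), then plain mispredictions. A minimal sketch of this heuristic (the ranking itself is an assumption, not something MLdebugger prescribes):

```python
def triage_priority(code):
    """Lower value = higher priority. Heuristic based on the code's surface form."""
    if code.endswith("**"):
        return 0  # confident wrong prediction: highest priority
    if "|" in code:
        return 1  # fluctuating / unstable region
    return 2      # plain misprediction

codes = ["CLS6->7|1", "CLS5->3**", "CLS0->7|1|5"]
print(sorted(codes, key=triage_priority))
# ['CLS5->3**', 'CLS6->7|1', 'CLS0->7|1|5']
```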

Debug Code

Each error code is assigned a debug_code indicating the debugging direction.

debug_code   Description                        Action
-----------  ---------------------------------  ---------------------------------------------
epistemic    Due to model's lack of knowledge   Add data, retrain
aleatoric    Due to data ambiguity itself       Review annotation, reconsider task definition
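Since only epistemic issues are improvable by adding data, a useful first number is the share of issue instances they account for. A minimal sketch using a toy DataFrame (the column names follow this page; the values and the DataFrame itself are made up, standing in for result.get_issues()):

```python
import pandas as pd

# Toy stand-in for result.get_issues(); values are illustrative only
issues_df = pd.DataFrame({
    "error_code": ["CLS6->7|1", "CLS5->3**", "CLS0->7|1|5"],
    "debug_code": ["epistemic", "epistemic", "aleatoric"],
    "counts": [12, 5, 3],
})

# Fraction of issue instances that adding data could plausibly fix
epistemic_share = (
    issues_df.loc[issues_df["debug_code"] == "epistemic", "counts"].sum()
    / issues_df["counts"].sum()
)
print(round(epistemic_share, 2))  # 0.85
```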

Using Result API

# Get issue list
issues_df = result.get_issues()

# Distribution by error code
error_distribution = issues_df.groupby("error_code")["counts"].sum()

# Distribution by debug_code
debug_distribution = issues_df.groupby("debug_code")["counts"].sum()

# Extract only epistemic errors (improvable by adding data)
epistemic_issues = issues_df[issues_df["debug_code"] == "epistemic"]

Visualization Examples

# Get distribution by category
category_view = result.get_view(
    groupby=["category"],
)

# Detailed error code distribution in Hotspot
hotspot_detail = result.get_view(
    query="category == 'hotspot'",
    groupby=["error_code"],
)

# Distribution by debug_code
debug_view = result.get_view(
    groupby=["debug_code", "category"],
)
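For a quick look at a distribution without a plotting library, an ASCII bar chart over the counts is often enough. A minimal sketch (the Series here is made-up data standing in for the error_distribution computed earlier; text_bar_chart is a hypothetical helper, not part of the MLdebugger API):

```python
import pandas as pd

# Made-up counts standing in for the error_distribution Series above
error_distribution = pd.Series({"CLS5->3**": 5, "CLS6->7|1": 12, "CLS0->7|1|5": 3})

def text_bar_chart(series, width=20):
    """Render a Series of counts as ASCII bars, largest first."""
    peak = series.max()
    lines = []
    for label, count in series.sort_values(ascending=False).items():
        bar = "#" * max(1, round(width * count / peak))
        lines.append(f"{label:<12} {bar} {count}")
    return "\n".join(lines)

print(text_bar_chart(error_distribution))
```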

Next Steps