Evaluator Class

ml_debugger.evaluator.evaluator.Evaluator

ML Debugger Evaluator for managing model evaluations.

The Evaluator class provides a high-level interface for requesting, managing, and retrieving ML model evaluation results. It handles authentication, evaluation requests, and result management through a service client.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `model_name` | `str` | Name of the ML model being evaluated. |
| `version_name` | `str` | Version identifier of the ML model. |
| `service_client` | `ServiceClient` | Client for interacting with the evaluation service. |

Example
>>> from ml_debugger.evaluator.evaluator import Evaluator
>>> evaluator = Evaluator(
...     model_name="my_model",
...     version_name="v1.0",
...     api_endpoint="https://api.example.com",
...     api_key="your_api_key"
... )
>>> result = evaluator.request_evaluation("ObjectDetection_v1")
>>> results = evaluator.list_results()
>>>
>>> # Retrieve a specific result
>>> retrieved_result = evaluator.get_result(results["result_name"][0])
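Since `list_results` returns a pandas DataFrame (see `list_results` below), the `result_name` access in the example above is ordinary pandas indexing. A minimal stand-in sketch, with illustrative DataFrame contents in place of a live service response:

```python
import pandas as pd

# Stand-in for the DataFrame that evaluator.list_results() would return;
# the column names mirror the docstring's results["result_name"] usage,
# and the row values are illustrative assumptions.
results = pd.DataFrame({
    "result_name": ["model_v1_detection_evaluation", "model_v1_baseline"],
    "method_name": ["ObjectDetection_v1", "default"],
})

# Same access pattern as evaluator.get_result(results["result_name"][0]).
first_name = results["result_name"][0]
print(first_name)  # model_v1_detection_evaluation
```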

__init__(model_name, version_name, api_endpoint=None, api_key=None)

Initialize the Evaluator instance.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | Name of the ML model to be evaluated. This identifier is used to organize and track evaluation results. | *required* |
| `version_name` | `str` | Version identifier of the ML model. Allows tracking evaluations across different model versions. | *required* |
| `api_endpoint` | `Optional[str]` | Base API endpoint URL for the evaluation service. If None, uses the default service configuration. | `None` |
| `api_key` | `Optional[str]` | API key for authentication with the evaluation service. If None, uses the default authentication method. | `None` |

Raises:

| Type | Description |
| --- | --- |
| `LicenseError` | If license validation fails. |
| `ConnectionError` | If unable to connect to the evaluation service. |
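Because `api_endpoint` and `api_key` both fall back to default configuration when `None`, one common client-side pattern is to source them from the environment. This is a sketch of that pattern only; the environment variable names below are assumptions, not documented defaults of `ml_debugger`:

```python
import os

# Hypothetical variable names -- the actual defaults used by the service
# configuration are not specified in this reference.
api_endpoint = os.environ.get("ML_DEBUGGER_API_ENDPOINT", "https://api.example.com")
api_key = os.environ.get("ML_DEBUGGER_API_KEY")  # None if unset

# Passing None lets the Evaluator fall back to its default auth method:
#   Evaluator(model_name="my_model", version_name="v1.0",
#             api_endpoint=api_endpoint, api_key=api_key)
```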

request_evaluation(method_name='default', result_name=None, n_epoch='latest', options=None)

Request a new model evaluation.

Initiates an evaluation request for the specified method and returns a Result object for tracking and retrieving the evaluation outcome.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `method_name` | `str` | Name of the evaluation method to execute. Corresponds to the predefined evaluation algorithms or metrics available in the evaluation service. | `'default'` |
| `result_name` | `Optional[str]` | Custom name for the evaluation result. If None, a default name is generated from the method name and timestamp. | `None` |
| `n_epoch` | `Union[str, Optional[int]]` | Epoch filter for the evaluation: an integer epoch number or the alias `'latest'`. | `'latest'` |
| `options` | `Optional[Dict[str, Any]]` | Additional parameters and configuration options for the evaluation method. The structure depends on the specific evaluation method. | `None` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `Result` | `Result` | A Result object that can be used to monitor evaluation progress and retrieve results when completed. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `method_name` is not supported. |
| `ConnectionError` | If unable to communicate with the evaluation service. |

Example
>>> result = evaluator.request_evaluation(
...     method_name="ObjectDetection_v1",
...     result_name="model_v1_detection_evaluation",
...     options=None
... )
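The structure of `options` depends on the chosen evaluation method. As a shape-only sketch, a detection method might accept thresholds like the following; the key names here are illustrative assumptions, not documented fields of `ObjectDetection_v1`:

```python
# Hypothetical options payload for a detection-style evaluation method.
# Consult the evaluation service for the keys each method actually accepts.
options = {
    "iou_threshold": 0.5,         # assumed key: IoU cutoff for matching boxes
    "confidence_threshold": 0.25, # assumed key: minimum detection confidence
}

# Would then be passed as:
#   evaluator.request_evaluation("ObjectDetection_v1", options=options)
```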

list_results(result_name=None, method_name=None, n_epoch='latest')

Get list of past evaluation results.

Retrieves a list of previously executed evaluation results. Results can be filtered by result name, method name, and epoch, and are returned as a pandas DataFrame.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `result_name` | `Optional[str]` | Specific result name to filter by. If None, returns all available results. | `None` |
| `method_name` | `Optional[str]` | Evaluation method name to filter by. The alias `'default'` is available. | `None` |
| `n_epoch` | `Union[str, Optional[int]]` | Epoch filter: an integer epoch number, or the alias `'latest'` or `'all'`. | `'latest'` |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | `pd.DataFrame`: Evaluation results. |

Example
>>> # Get results
>>> results_df = evaluator.list_results()
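Because the return value is a plain `pd.DataFrame`, further narrowing beyond the built-in `result_name` / `method_name` / `n_epoch` filters can be done with standard pandas operations. A sketch with a stand-in DataFrame (the rows are illustrative assumptions, not real service output):

```python
import pandas as pd

# Stand-in for evaluator.list_results(n_epoch="all"); columns follow the
# parameter names documented above, rows are made up for illustration.
results_df = pd.DataFrame({
    "result_name": ["det_eval", "det_eval", "baseline"],
    "method_name": ["ObjectDetection_v1", "ObjectDetection_v1", "default"],
    "n_epoch": [10, 20, 20],
})

# Client-side equivalent of filtering by method, then keeping the latest epoch.
detection = results_df[results_df["method_name"] == "ObjectDetection_v1"]
latest = detection[detection["n_epoch"] == detection["n_epoch"].max()]
print(latest["result_name"].tolist())  # ['det_eval']
```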

get_result(result_name, n_epoch='latest')

Get a specific evaluation result.

Retrieves a previously executed evaluation result by name and returns a Result object for accessing the evaluation data and metadata.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `result_name` | `str` | Name of the existing evaluation result to retrieve. Must correspond to a result previously created through `request_evaluation` or otherwise available in the evaluation service. | *required* |
| `n_epoch` | `Union[str, Optional[int]]` | Epoch filter: an integer epoch number, or the alias `'latest'` or `'all'`. | `'latest'` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `Result` | `Result` | A Result object containing the evaluation data and providing methods to access the evaluation outcomes, metrics, and metadata. |

Raises:

| Type | Description |
| --- | --- |
| `KeyError` | If `result_name` is not found in the cached results. |
| `ConnectionError` | If unable to communicate with the evaluation service. |

Example
>>> # First create an evaluation
>>> result = evaluator.request_evaluation(
...     method_name="ObjectDetection_v1",
...     result_name="model_v1_detection_evaluation",
...     options=None
... )
>>>
>>> # Later retrieve the same result
>>> retrieved_result = evaluator.get_result("model_v1_detection_evaluation")
>>> retrieved_result.get_summary()
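Since `get_result` raises `KeyError` for names missing from the cached results, callers may want a defensive lookup. The sketch below uses a plain dict as a stand-in for the evaluator's internal cache (which is not part of the public API) purely to illustrate the handling pattern:

```python
# Stand-in cache; in practice the lookup is evaluator.get_result(name).
cached_results = {"model_v1_detection_evaluation": {"status": "completed"}}

def get_result(result_name):
    # Mirrors the documented behavior: unknown names raise KeyError.
    return cached_results[result_name]

try:
    result = get_result("model_v1_detection_evaluation")
except KeyError:
    result = None  # e.g. fall back to request_evaluation(...)

print(result)  # {'status': 'completed'}
```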