Evaluator Class
ml_debugger.evaluator.evaluator.Evaluator
ML Debugger Evaluator for managing model evaluations.
The Evaluator class provides a high-level interface for requesting, managing, and retrieving ML model evaluation results. It handles authentication, evaluation requests, and result management through a service client.
Attributes:
| Name | Type | Description |
|---|---|---|
| model_name | str | Name of the ML model being evaluated. |
| version_name | str | Version identifier of the ML model. |
| service_client | ServiceClient | Client for interacting with the evaluation service. |
Example
>>> from ml_debugger.evaluator.evaluator import Evaluator
>>> evaluator = Evaluator(
... model_name="my_model",
... version_name="v1.0",
... api_endpoint="https://api.example.com",
... api_key="your_api_key"
... )
>>> result = evaluator.request_evaluation("ObjectDetection_v1")
>>> results = evaluator.list_results()
>>>
>>> # Retrieve a specific result
>>> retrieved_result = evaluator.get_result(results["result_name"][0])
__init__(model_name, version_name, api_endpoint=None, api_key=None)
Initialize the Evaluator instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | Name of the ML model to be evaluated. This identifier is used to organize and track evaluation results. | required |
| version_name | str | Version identifier of the ML model. Allows tracking evaluations across different model versions. | required |
| api_endpoint | Optional[str] | Base API endpoint URL for the evaluation service. If None, uses the default service configuration. | None |
| api_key | Optional[str] | API key for authentication with the evaluation service. If None, uses the default authentication method. | None |
Raises:
| Type | Description |
|---|---|
| LicenseError | If the license validation fails. |
| ConnectionError | If unable to connect to the evaluation service. |
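Since construction can raise LicenseError or ConnectionError, callers often wrap it. A minimal sketch of that pattern — note that LicenseError below is a local stand-in class (ml_debugger's actual exception import path isn't documented here), and make_evaluator is an illustrative helper, not part of the library:

```python
class LicenseError(Exception):
    """Stand-in for ml_debugger's license exception (real import path undocumented)."""

def make_evaluator(factory):
    """Call a zero-argument factory building an Evaluator.

    Returns (evaluator, None) on success, or (None, reason) when setup
    fails with one of the documented errors.
    """
    try:
        return factory(), None
    except LicenseError as exc:
        return None, f"license validation failed: {exc}"
    except ConnectionError as exc:
        return None, f"evaluation service unreachable: {exc}"
```

In real code the factory would be something like `lambda: Evaluator(model_name="my_model", version_name="v1.0")`, so construction errors are caught in one place.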
request_evaluation(method_name='default', result_name=None, n_epoch='latest', options=None)
Request a new model evaluation.
Initiates an evaluation request for the specified method and returns a Result object for tracking and retrieving the evaluation outcome.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method_name | str | The name of the evaluation method to execute. This corresponds to predefined evaluation algorithms or metrics available in the evaluation service. | 'default' |
| result_name | Optional[str] | Custom name for the evaluation result. If None, a default name will be generated based on the method and timestamp. | None |
| n_epoch | Union[str, Optional[int]] | Epoch filter for the evaluation: a specific epoch number, or 'latest'. | 'latest' |
| options | Optional[Dict[str, Any]] | Additional parameters and configuration options for the evaluation method. The structure depends on the specific evaluation method. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| Result | Result | A Result object that can be used to monitor evaluation progress and retrieve results when completed. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the method_name is not supported. |
| ConnectionError | If unable to communicate with the evaluation service. |
Example
>>> result = evaluator.request_evaluation(
... method_name="ObjectDetection_v1",
... result_name="model_v1_detection_evaluation",
... options=None
... )
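request_evaluation returns immediately with a Result for tracking progress, but the Result object's status API is not specified in this reference. A generic, self-contained polling helper, sketched independently of ml_debugger (wait_until and the check callable are illustrative, not library API):

```python
import time

def wait_until(check, timeout=30.0, interval=0.5):
    """Poll `check()` until it returns a truthy value or `timeout`
    seconds elapse; returns the last value seen (falsy on timeout)."""
    deadline = time.monotonic() + timeout
    while True:
        value = check()
        if value or time.monotonic() >= deadline:
            return value
        time.sleep(interval)
```

With a Result in hand one might pass something like `lambda: result.get_summary() is not None` — assuming whichever completion signal the Result class actually exposes.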
list_results(result_name=None, method_name=None, n_epoch='latest')
Get list of past evaluation results.
Retrieves a list of evaluation results that have been previously executed. Results can be filtered by name and returned in different formats.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result_name | Optional[str] | Specific result name to filter by. If None, returns all available results. | None |
| method_name | Optional[str] | Name of the evaluation method to filter by. If None, results from all methods are returned. | None |
| n_epoch | Union[str, Optional[int]] | Epoch filter: a specific epoch number, or 'latest'. | 'latest' |
Returns:
| Type | Description |
|---|---|
| DataFrame | Evaluation results as a pandas DataFrame. |
Example
>>> # Get results
>>> results_df = evaluator.list_results()
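Because list_results returns a pandas DataFrame, ordinary DataFrame filtering applies to its output. A sketch on a fabricated frame — only the result_name column is confirmed by the class-level example above; the method_name column is assumed from this method's filter parameters:

```python
import pandas as pd

# Illustrative stand-in for evaluator.list_results() output.
results = pd.DataFrame({
    "result_name": ["model_v1_detection_evaluation", "baseline_run"],
    "method_name": ["ObjectDetection_v1", "default"],
})

# Keep only the results produced by one evaluation method,
# then collect their names for use with get_result().
detection = results[results["method_name"] == "ObjectDetection_v1"]
names = detection["result_name"].tolist()
```

Passing method_name to list_results itself would push the same filter to the service side instead.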
get_result(result_name, n_epoch='latest')
Get a specific evaluation result.
Retrieves a previously executed evaluation result by name and returns a Result object for accessing the evaluation data and metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result_name | str | The name of the existing evaluation result to retrieve. This must correspond to a result that was previously created through request_evaluation or is available in the evaluation service. | required |
| n_epoch | Union[str, Optional[int]] | Epoch filter: a specific epoch number, or 'latest'. | 'latest' |
Returns:
| Name | Type | Description |
|---|---|---|
| Result | Result | A Result object containing the evaluation data and providing methods to access the evaluation outcomes, metrics, and metadata. |
Raises:
| Type | Description |
|---|---|
| KeyError | If the result_name is not found in the cached results. |
| ConnectionError | If unable to communicate with the evaluation service. |
Example
>>> # First create an evaluation
>>> result = evaluator.request_evaluation(
... method_name="ObjectDetection_v1",
... result_name="model_v1_detection_evaluation",
... options=None
... )
>>>
>>> # Later retrieve the same result
>>> retrieved_result = evaluator.get_result("model_v1_detection_evaluation")
>>> retrieved_result.get_summary()