ObjectDetectionDataFilter Class

`ml_debugger.data_filter.object_detection.object_detection_torch_data_filter.ObjectDetectionTorchDataFilter`

Bases: CommonObjectDetectionDataFilter, ObjectDetectionTorchLogger

DataFilter for object detection tasks using PyTorch models.

Combines CommonObjectDetectionDataFilter (query, filter, aggregation) with ObjectDetectionTorchLogger (NMS, _parse_and_save_io_data, bbox hashing) to provide real-time filtering and Active Learning query support.

`__init__(model, model_name, version_name, result_name=None, n_epoch='latest', filter_config=None, target_layers=None, additional_fields=None, auto_sync=False, force_table_recreate=False, api_endpoint=None, api_key=None, n_class=None, score_thresh=None, iou_thresh=None, max_detections_per_image=None)`

Initialize object detection data filter.

Parameters:

Name	Type	Description	Default
`model`	`Module`	PyTorch model to trace.	required
`model_name`	`str`	Name of the ML model.	required
`version_name`	`str`	Version identifier for the ML model.	required
`result_name`	`Optional[str]`	The name of the existing evaluation result to retrieve.	`None`
`n_epoch`	`Union[str, Optional[int]]`	Filter option for n_epoch value.	`'latest'`
`filter_config`	`Optional[Union[BBoxStrategy, Dict[str, Any]]]`	BBox-level aggregation strategy for real-time filtering. Controls bbox selection, aggregation, and image-level threshold. Can be a BBoxStrategy instance, a dict, or None (defaults).	`None`
`target_layers`	`Optional[Dict[str, str]]`	Mapping of layer aliases to module paths.	`None`
`additional_fields`	`Optional[List[dict]]`	Extra fields for database schema.	`None`
`auto_sync`	`bool`	Enable background syncing of logged data.	`False`
`force_table_recreate`	`bool`	Whether to drop and recreate existing tables.	`False`
`api_endpoint`	`Optional[str]`	URL of the service API for data upload.	`None`
`api_key`	`Optional[str]`	API key for authenticating with the service.	`None`
`n_class`	`Optional[int]`	Number of classes. Auto-detected from model if None.	`None`
`score_thresh`	`Optional[float]`	Minimum score threshold for user-visible NMS decisions.	`None`
`iou_thresh`	`Optional[float]`	IoU threshold for user-visible NMS decisions.	`None`
`max_detections_per_image`	`Optional[int]`	Maximum detections per image for user-visible output.	`None`

`__call__(model_input, input_ids, dataset_type='pool', **kwargs)`

Invoke the data filter on a single inference, recording I/O data.

Parameters:

Name	Type	Description	Default
`model_input`	`Any`	Input data for the model inference.	required
`input_ids`	`List[str]`	Identifier for each input. Must not contain duplicates.	required
`dataset_type`	`str`	Identifier of the input dataset (e.g. 'pool').	`'pool'`
`**kwargs`	`Any`	Additional keyword arguments for parsing and saving I/O data.	`{}`

Returns:

Type	Description
`Tuple[Any, List[Optional[bool]]]`	Tuple of (model_output, filter_flags). model_output is the raw model output. filter_flags indicates, for each input, whether it matches the filter condition: True = matches and should be extracted, False = does not match, None = no filter configured or no bboxes.

Raises:

Type	Description
`ValueError`	If input_ids contains duplicates.
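
The three-valued `filter_flags` list described above can be consumed as in the following minimal sketch. It is plain Python and does not call the library: `check_unique` and `partition_by_flag` are hypothetical helpers, the sample IDs and flags are made up, and the duplicate check only mirrors the documented `ValueError` condition rather than reproducing the library's implementation.

```python
from collections import Counter
from typing import Dict, List, Optional


def check_unique(input_ids: List[str]) -> None:
    """Raise ValueError if any identifier appears more than once."""
    duplicates = sorted(i for i, n in Counter(input_ids).items() if n > 1)
    if duplicates:
        raise ValueError(f"input_ids contains duplicates: {duplicates}")


def partition_by_flag(
    input_ids: List[str], filter_flags: List[Optional[bool]]
) -> Dict[str, List[str]]:
    """Group input_ids by flag value: True, False, or None."""
    if len(input_ids) != len(filter_flags):
        raise ValueError("input_ids and filter_flags must have the same length")
    groups: Dict[str, List[str]] = {"matched": [], "unmatched": [], "undecided": []}
    for input_id, flag in zip(input_ids, filter_flags):
        if flag is None:
            # No filter configured, or the image produced no bboxes.
            groups["undecided"].append(input_id)
        elif flag:
            groups["matched"].append(input_id)
        else:
            groups["unmatched"].append(input_id)
    return groups


# Illustrative values standing in for the second element of the returned tuple.
ids = ["img_001", "img_002", "img_003"]
flags = [True, None, False]
check_unique(ids)
groups = partition_by_flag(ids, flags)
```

Testing `flag is None` before truthiness keeps the "no decision" case separate from an explicit `False`.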

`get_hooked_features(layer_name)`

Retrieve the captured output for a given layer alias.

Parameters:

Name	Type	Description	Default
`layer_name`	`str`	Alias of the layer whose activation was captured.	required

Returns:

Name	Type	Description
`Any`	`Any`	Activation data stored for the specified layer.

Raises:

Type	Description
`KeyError`	If no activation has been captured for `layer_name`.

`export(output_path=None)`

Export extracted features into a ZIP archive.

Uses the internal n_epoch resolved during validator setup to filter records, consistent with upload() and wait_for_save().

Parameters:

Name	Type	Description	Default
`output_path`	`Optional[str]`	File path or directory for saving the ZIP file. If the path has no .zip extension, the default filename is appended to it. Defaults to the current working directory.	`None`

Returns:

Type	Description
`Optional[Path]`	Path to the created ZIP file, or None on non-primary distributed ranks.
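
The `output_path` rule above can be sketched with `pathlib`. This is an illustration under assumptions, not the library's code: `resolve_zip_path` is a hypothetical helper, and `DEFAULT_NAME` is a placeholder because the actual default filename is not stated in this reference.

```python
from pathlib import Path
from typing import Optional

# Hypothetical placeholder; the real default filename is not documented here.
DEFAULT_NAME = "export.zip"


def resolve_zip_path(output_path: Optional[str] = None) -> Path:
    """Return the ZIP file path implied by output_path."""
    # None falls back to the current working directory.
    base = Path(output_path) if output_path is not None else Path.cwd()
    if base.suffix == ".zip":
        # Already a ZIP file path: use it unchanged.
        return base
    # Otherwise treat the value as a directory and append the default filename.
    return base / DEFAULT_NAME


explicit = resolve_zip_path("out/run1.zip")
directory = resolve_zip_path("out")
fallback = resolve_zip_path()
```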

`wait_for_save(interval=3, *, force=False)`

`upload(*, force=False)`

`query(n_data, strategy, dataset_type='pool', type_cast=None, aggregation_level='input_id', sequence_mapping=None, sampling=None)`

Sort and query dataset based on strategy with input or sequence aggregation.

Parameters:

Name	Type	Description	Default
`n_data`	`int`	Maximum number of images to query.	required
`strategy`	`Union[str, BBoxStrategy, Dict[str, Any]]`	Query strategy. One of: 'high_error_proba' (per-image max of error_proba, sorted descending; returns images where at least one bbox has high error); 'low_error_proba' (per-image min of error_proba, sorted ascending; returns images where at least one bbox has low error); a BBoxStrategy instance (full control over bbox selection, aggregation, target column, and sort order); or a dict, which is validated as a BBoxStrategy (unknown keys are rejected, unspecified fields use defaults). Example dict: {'target_column': 'det_error_proba', 'aggregation': 'mean', 'top_n': 3}	required
`dataset_type`	`str`	Filter of input dataset (e.g. 'pool').	`'pool'`
`type_cast`	`Optional[type]`	Type for casting input_id (e.g. int). Ignored when aggregation_level="sequence".	`None`
`aggregation_level`	`str`	Result granularity. - "input_id": returns input_ids. - "sequence": returns sequence_ids. Requires sequence_mapping.	`'input_id'`
`sequence_mapping`	`Optional[SequenceMappingInput]`	Required when aggregation_level="sequence". Pattern A: {sequence_id: [input_id, ...]} Pattern B: {input_id: sequence_id}	`None`
`sampling`	`Optional[Union[str, SamplingConfig, Dict[str, Any]]]`	Sampling configuration for result selection. None (default): existing sort + top-N behavior. "class_balanced": string shorthand for the per-class bbox pipeline with quota. dict: full configuration with method and options. Random sampling requires the dict form with a min_value/max_value range filter, e.g. {"method": "random", "min_value": 0.5} or {"method": "random", "min_value": 0.3, "max_value": 0.8}. Class-balanced with options: e.g. {"method": "class_balanced", "min_per_class": 5, "seed": 42}	`None`

Returns:

Type	Description
`List[str]`	List of input_ids or sequence_ids.

Raises:

Type	Description
`ValueError`	If strategy is invalid or no data found.
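
The two string strategies reduce per-bbox `error_proba` values to one value per image and then sort. The sketch below illustrates that reduction in plain Python under stated assumptions: `rank_images` is a hypothetical stand-in, the bbox rows are made-up data, and the real method reads from the library's stored records rather than from an in-memory list.

```python
from typing import Dict, List, Tuple

# Illustrative per-bbox records: (input_id, error_proba).
BBOX_ROWS: List[Tuple[str, float]] = [
    ("img_a", 0.10), ("img_a", 0.90),
    ("img_b", 0.40), ("img_b", 0.50),
    ("img_c", 0.05), ("img_c", 0.20),
]


def rank_images(rows: List[Tuple[str, float]], strategy: str, n_data: int) -> List[str]:
    """Return up to n_data input_ids ranked by the named strategy."""
    if strategy == "high_error_proba":
        reduce_fn, descending = max, True    # per-image max, sorted descending
    elif strategy == "low_error_proba":
        reduce_fn, descending = min, False   # per-image min, sorted ascending
    else:
        raise ValueError(f"unsupported strategy: {strategy!r}")

    # Collect the bbox values belonging to each image.
    per_image: Dict[str, List[float]] = {}
    for input_id, error_proba in rows:
        per_image.setdefault(input_id, []).append(error_proba)
    if not per_image:
        raise ValueError("no data found")

    # Reduce to one score per image, then sort and truncate.
    scores = {input_id: reduce_fn(values) for input_id, values in per_image.items()}
    ranked = sorted(scores, key=lambda input_id: scores[input_id], reverse=descending)
    return ranked[:n_data]


high = rank_images(BBOX_ROWS, "high_error_proba", n_data=2)
low = rank_images(BBOX_ROWS, "low_error_proba", n_data=2)
```

With this data, `img_a` ranks first under 'high_error_proba' because a single bbox at 0.90 is enough, while 'low_error_proba' ranks `img_c` first because of its 0.05 bbox.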

`get_image_scores(strategy='high_error_proba', dataset_type='pool', aggregation_level='input_id', sequence_mapping=None)`

Get input-level or sequence-level ranking values sorted by strategy.

Aggregates per-bbox target values to input-level ranking values using the specified strategy, and returns them as a sorted DataFrame.

Parameters:

Name	Type	Description	Default
`strategy`	`Union[str, BBoxStrategy, Dict[str, Any]]`	Scoring strategy. - 'high_error_proba': Sort by error_proba descending (default). - 'low_error_proba': Sort by error_proba ascending. - BBoxStrategy: Full control over bbox selection, aggregation, target column, and sort order. - dict: Validated as BBoxStrategy, with optional sequence config. - file path: YAML/JSON strategy config.	`'high_error_proba'`
`dataset_type`	`str`	Filter of input dataset (e.g. 'pool').	`'pool'`
`aggregation_level`	`str`	Result granularity. - "input_id": input-level results. - "sequence": sequence-level results. Requires sequence_mapping.	`'input_id'`
`sequence_mapping`	`Optional[SequenceMappingInput]`	Required when aggregation_level="sequence". Pattern A: {sequence_id: [input_id, ...]} Pattern B: {input_id: sequence_id}	`None`

Returns:

Type	Description
`DataFrame`	When aggregation_level="input_id": columns [input_id, error_proba, pred_score] if target_column == "error_proba", otherwise [input_id, rank_value, pred_score].
`DataFrame`	When aggregation_level="sequence": columns [sequence_id, rank_value, n_inputs_available, n_inputs_used].

Raises:

Type	Description
`ValueError`	If strategy is invalid, aggregation_level is invalid, or no data found.
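
The two accepted `sequence_mapping` shapes carry the same information. As a minimal sketch, both can be normalized to a single `{input_id: sequence_id}` lookup. `to_input_lookup` is a hypothetical helper; telling the patterns apart by checking whether a value is a list is an assumption made for illustration, not the library's documented behavior.

```python
from typing import Dict, List, Union

SequenceMapping = Union[Dict[str, List[str]], Dict[str, str]]


def to_input_lookup(mapping: SequenceMapping) -> Dict[str, str]:
    """Normalize Pattern A or Pattern B into {input_id: sequence_id}."""
    lookup: Dict[str, str] = {}
    for key, value in mapping.items():
        if isinstance(value, (list, tuple)):
            # Pattern A: key is a sequence_id, value lists its input_ids.
            for input_id in value:
                lookup[input_id] = key
        else:
            # Pattern B: key is an input_id, value is its sequence_id.
            lookup[key] = value
    return lookup


pattern_a = {"seq_1": ["img_a", "img_b"], "seq_2": ["img_c"]}
pattern_b = {"img_a": "seq_1", "img_b": "seq_1", "img_c": "seq_2"}

lookup_a = to_input_lookup(pattern_a)
lookup_b = to_input_lookup(pattern_b)
```

Both calls produce the same lookup, so either form can be passed depending on which is more convenient to build from the dataset.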
