dacapo.experiments.tasks.evaluators.evaluator

Attributes

`OutputIdentifier`
`Iteration`
`Score`
`BestScore`

Classes

Evaluator

Base class of all evaluators: An abstract class representing an evaluator that compares and evaluates the output array against the evaluation array.

Module Contents

dacapo.experiments.tasks.evaluators.evaluator.OutputIdentifier

dacapo.experiments.tasks.evaluators.evaluator.Iteration

dacapo.experiments.tasks.evaluators.evaluator.Score

dacapo.experiments.tasks.evaluators.evaluator.BestScore

class dacapo.experiments.tasks.evaluators.evaluator.Evaluator

Base class of all evaluators: An abstract class representing an evaluator that compares and evaluates the output array against the evaluation array.

An evaluator takes a post-processor’s output and compares it against the ground truth. It then returns a set of scores that can be used to determine the quality of the post-processor’s output.

best_scores: Dict[OutputIdentifier, BestScore] the best scores for each dataset/post-processing parameter/criterion combination

evaluate(output_array_identifier, evaluation_array): Compare and evaluate the output array against the evaluation array.

is_best(dataset, parameter, criterion, score): Check if the provided score is the best for this dataset/parameter/criterion combo.

get_overall_best(dataset, criterion): Return the best score for the given dataset and criterion.

get_overall_best_parameters(dataset, criterion): Return the best parameters for the given dataset and criterion.

compare(score_1, score_2, criterion): Compare two scores for the given criterion.

set_best(validation_scores): Find the best iteration for each dataset/post_processing_parameter/criterion.

higher_is_better(criterion): Return whether higher is better for the given criterion.

bounds(criterion): Return the bounds for the given criterion.

store_best(criterion): Return whether to store the best score for the given criterion.

Note

The Evaluator class is used to compare and evaluate the output array against the evaluation array.
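The members documented below as abstract (`evaluate`, `criteria` and `score`) must be implemented by a concrete subclass. If `Evaluator` is a standard `abc.ABC`, instantiating it directly raises `TypeError`. The sketch below illustrates that contract with a simplified, self-contained stand-in; `ToyEvaluator` and `ExactMatchEvaluator` are hypothetical names, not part of dacapo, and plain lists stand in for the `LocalArrayIdentifier` and `Array` arguments the real method takes.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class ToyEvaluator(ABC):
    """Simplified stand-in for the Evaluator contract (not dacapo's class)."""

    @abstractmethod
    def evaluate(self, output_array_identifier, evaluation_array):
        """Compare the output against the evaluation (ground-truth) data."""

    @property
    @abstractmethod
    def criteria(self) -> List[str]:
        """Names of the criteria for which a model might be 'best'."""

    @property
    @abstractmethod
    def score(self):
        """A default scores object describing this evaluator's output."""


class ExactMatchEvaluator(ToyEvaluator):
    """Hypothetical concrete subclass: fraction of labels that match."""

    def evaluate(self, output_array_identifier: Sequence[int],
                 evaluation_array: Sequence[int]) -> Dict[str, float]:
        # In this toy, the "identifier" is simply the predicted labels.
        matches = sum(p == t for p, t in zip(output_array_identifier, evaluation_array))
        return {"accuracy": matches / len(evaluation_array)}

    @property
    def criteria(self) -> List[str]:
        return ["accuracy"]

    @property
    def score(self) -> Dict[str, float]:
        return {"accuracy": 0.0}  # default/empty scores


# The abstract base refuses to instantiate while abstract members remain.
try:
    ToyEvaluator()
    base_instantiated = True
except TypeError:
    base_instantiated = False

evaluator = ExactMatchEvaluator()
print(base_instantiated)                               # False
print(evaluator.evaluate([1, 2, 3, 4], [1, 2, 0, 4]))  # {'accuracy': 0.75}
```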

abstract evaluate(output_array_identifier: dacapo.store.local_array_store.LocalArrayIdentifier, evaluation_array: dacapo.experiments.datasplits.datasets.arrays.Array) → dacapo.experiments.tasks.evaluators.evaluation_scores.EvaluationScores

Compares and evaluates the output array against the evaluation array.

Parameters:

output_array_identifier – LocalArrayIdentifier The identifier of the output array.
evaluation_array – Array The evaluation array.

Returns:

EvaluationScores: The evaluation scores.

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete Evaluator subclass
>>> output_array_identifier = LocalArrayIdentifier("output_array")
>>> evaluation_array = Array()
>>> evaluator.evaluate(output_array_identifier, evaluation_array)
EvaluationScores()

Note

This function is used to compare and evaluate the output array against the evaluation array.
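As an illustration of what an `evaluate` implementation might compute, the following sketch compares a binary prediction with binary ground truth and returns several per-criterion scores. `ToyScores` and `evaluate_masks` are hypothetical, flat Python lists stand in for the output and evaluation arrays, and reporting 0.0 when a denominator is empty is an arbitrary convention chosen for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ToyScores:
    """Hypothetical stand-in for an EvaluationScores subclass."""
    precision: float
    recall: float
    jaccard: float


def evaluate_masks(prediction: Sequence[int], ground_truth: Sequence[int]) -> ToyScores:
    """Compare a binary prediction with binary ground truth, element by element."""
    if len(prediction) != len(ground_truth):
        raise ValueError("prediction and ground truth must have the same size")
    pairs = list(zip(prediction, ground_truth))
    tp = sum(1 for p, t in pairs if p and t)      # predicted and true
    fp = sum(1 for p, t in pairs if p and not t)  # predicted but not true
    fn = sum(1 for p, t in pairs if t and not p)  # true but missed
    # Report 0.0 when a denominator is empty (an arbitrary convention).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return ToyScores(precision, recall, jaccard)


scores = evaluate_masks([1, 1, 0, 1, 0], [1, 0, 0, 1, 1])
print(scores.jaccard)  # 0.5
```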

property best_scores: Dict[OutputIdentifier, BestScore]

The best scores for each dataset/post-processing parameter/criterion combination.

Returns:

Dict[OutputIdentifier, BestScore]: the best scores for each dataset/post-processing parameter/criterion combination

Raises:

AttributeError – if the best scores are not set

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete Evaluator subclass
>>> evaluator.best_scores
{}

Note

This function is used to return the best scores for each dataset/post-processing parameter/criterion combination.
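The sketch below shows one plausible shape for `best_scores`. It assumes, based on the descriptions on this page, that each `OutputIdentifier` is a (dataset, post-processing parameters, criterion) tuple and that each `BestScore` is either an (iteration, score) pair or `None` when no best is known yet. The alias definitions are assumptions, strings stand in for `Dataset` and `PostProcessorParameters` objects, and the numbers are made up.

```python
from typing import Dict, Optional, Tuple

# Assumed shapes of the module-level aliases (illustrative only).
OutputIdentifier = Tuple[str, str, str]        # (dataset, parameters, criterion)
Iteration = int
Score = float
BestScore = Optional[Tuple[Iteration, Score]]  # None until a best is known

# Made-up contents; strings stand in for Dataset/PostProcessorParameters objects.
best_scores: Dict[OutputIdentifier, BestScore] = {
    ("validate_A", "threshold=0.5", "jaccard"): (2000, 0.81),
    ("validate_A", "threshold=0.7", "jaccard"): (3000, 0.84),
    ("validate_A", "threshold=0.5", "voi"): None,
}

for (dataset, parameters, criterion), best in best_scores.items():
    if best is None:
        print(f"{dataset} | {parameters} | {criterion}: no best yet")
    else:
        iteration, score = best
        print(f"{dataset} | {parameters} | {criterion}: {score} at iteration {iteration}")
```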

is_best(dataset: dacapo.experiments.datasplits.datasets.Dataset, parameter: dacapo.experiments.tasks.post_processors.PostProcessorParameters, criterion: str, score: dacapo.experiments.tasks.evaluators.evaluation_scores.EvaluationScores) → bool

Check if the provided score is the best for this dataset/parameter/criterion combo.

Parameters:

dataset – Dataset the dataset
parameter – PostProcessorParameters the post-processor parameters
criterion – str the criterion
score – EvaluationScores the evaluation scores

Returns:

bool: whether the provided score is the best for this dataset/parameter/criterion combo

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete Evaluator subclass
>>> dataset = Dataset()
>>> parameter = PostProcessorParameters()
>>> criterion = "criterion"
>>> score = EvaluationScores()
>>> evaluator.is_best(dataset, parameter, criterion, score)
False

Note

This function is used to check if the provided score is the best for this dataset/parameter/criterion combo.
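One plausible way to implement this check, sketched as a free function over a `best_scores`-style dictionary: a score beats a missing (`None`) best, and otherwise it must be strictly better than the stored one. The function signature, the plain-float `score` and the explicit `higher_is_better` flag are simplifications for illustration; the documented method takes an `EvaluationScores` object and a `Dataset`/`PostProcessorParameters`/criterion combination.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]               # (dataset, parameters, criterion)
BestScore = Optional[Tuple[int, float]]  # (iteration, score), or None


def is_best(best_scores: Dict[Key, BestScore], key: Key, score: float,
            higher_is_better: bool) -> bool:
    """Return True if `score` strictly beats the stored best for `key`."""
    if key not in best_scores:
        return False  # this combination is not being tracked
    previous = best_scores[key]
    if previous is None:
        return True   # any score beats "no best yet"
    _, previous_score = previous
    if higher_is_better:
        return score > previous_score
    return score < previous_score


best_scores = {("A", "p", "jaccard"): (1000, 0.80), ("A", "p", "voi"): None}
print(is_best(best_scores, ("A", "p", "jaccard"), 0.85, higher_is_better=True))  # True
print(is_best(best_scores, ("A", "p", "voi"), 1.3, higher_is_better=False))      # True
```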

get_overall_best(dataset: dacapo.experiments.datasplits.datasets.Dataset, criterion: str)

Return the best score for the given dataset and criterion.

Parameters:

dataset – Dataset the dataset
criterion – str the criterion

Returns:

Optional[float]: the best score for the given dataset and criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete Evaluator subclass
>>> dataset = Dataset()
>>> criterion = "criterion"
>>> print(evaluator.get_overall_best(dataset, criterion))
None

Note

This function is used to return the best score for the given dataset and criterion.
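Because the best scores are stored per post-processing parameter, the overall best for a dataset and criterion can be found by scanning every parameter's entry. The following is a minimal sketch, assuming each stored value is an (iteration, score) pair or `None` (entries that are `None` are skipped) and passing the direction explicitly; the function and data are hypothetical.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]               # (dataset, parameters, criterion)
BestScore = Optional[Tuple[int, float]]  # (iteration, score), or None


def get_overall_best(best_scores: Dict[Key, BestScore], dataset: str,
                     criterion: str, higher_is_better: bool) -> Optional[float]:
    """Best score for `dataset`/`criterion` across all post-processing parameters."""
    candidates = [
        best[1]
        for (ds, _parameters, crit), best in best_scores.items()
        if ds == dataset and crit == criterion and best is not None
    ]
    if not candidates:
        return None
    return max(candidates) if higher_is_better else min(candidates)


best_scores = {
    ("A", "threshold=0.5", "jaccard"): (2000, 0.81),
    ("A", "threshold=0.7", "jaccard"): (3000, 0.84),
    ("A", "threshold=0.9", "jaccard"): None,
}
print(get_overall_best(best_scores, "A", "jaccard", higher_is_better=True))  # 0.84
```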

get_overall_best_parameters(dataset: dacapo.experiments.datasplits.datasets.Dataset, criterion: str)

Return the best parameters for the given dataset and criterion.

Parameters:

dataset – Dataset the dataset
criterion – str the criterion

Returns:

Optional[PostProcessorParameters]: the best parameters for the given dataset and criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete Evaluator subclass
>>> dataset = Dataset()
>>> criterion = "criterion"
>>> print(evaluator.get_overall_best_parameters(dataset, criterion))
None

Note

This function is used to return the best parameters for the given dataset and criterion.
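A companion sketch for the parameters: it collects (parameters, score) pairs for one dataset and criterion and keeps the parameters with the best score, using `min` for lower-is-better criteria. The function, the string parameters and the scores are hypothetical; note that on ties `max` and `min` keep the first candidate encountered.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]               # (dataset, parameters, criterion)
BestScore = Optional[Tuple[int, float]]  # (iteration, score), or None


def get_overall_best_parameters(best_scores: Dict[Key, BestScore], dataset: str,
                                criterion: str, higher_is_better: bool) -> Optional[str]:
    """Parameters whose best score for `dataset`/`criterion` is the overall best."""
    candidates = [
        (parameters, best[1])
        for (ds, parameters, crit), best in best_scores.items()
        if ds == dataset and crit == criterion and best is not None
    ]
    if not candidates:
        return None
    pick = max if higher_is_better else min
    # On ties, max/min keep the first candidate encountered.
    best_parameters, _ = pick(candidates, key=lambda item: item[1])
    return best_parameters


best_scores = {
    ("A", "threshold=0.5", "voi"): (2000, 1.10),
    ("A", "threshold=0.7", "voi"): (3000, 0.95),
}
print(get_overall_best_parameters(best_scores, "A", "voi", higher_is_better=False))  # threshold=0.7
```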

compare(score_1, score_2, criterion)

Compare two scores for the given criterion.

Parameters:

score_1 – float the first score
score_2 – float the second score
criterion – str the criterion

Returns:

bool: whether the first score is better than the second score for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete Evaluator subclass
>>> score_1 = 0.0
>>> score_2 = 0.0
>>> criterion = "criterion"
>>> evaluator.compare(score_1, score_2, criterion)
False

Note

This function is used to compare two scores for the given criterion.
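The documented return value (whether the first score is better than the second) suggests a direction-aware comparison. A minimal sketch, with the direction passed in explicitly rather than looked up via `higher_is_better(criterion)`: the comparison is strict, so equal scores are never "better", which agrees with the `False` returned for two scores of 0.0 in the example.

```python
def compare(score_1: float, score_2: float, higher_is_better: bool) -> bool:
    """Return True if score_1 is strictly better than score_2."""
    if higher_is_better:
        return score_1 > score_2
    return score_1 < score_2


print(compare(0.0, 0.0, higher_is_better=True))   # False: a tie is not "better"
print(compare(0.9, 0.8, higher_is_better=True))   # True
print(compare(0.9, 0.8, higher_is_better=False))  # False
```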

set_best(validation_scores: dacapo.experiments.validation_scores.ValidationScores) → None

Find the best iteration for each dataset/post_processing_parameter/criterion.

Parameters:

validation_scores – ValidationScores the validation scores

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete Evaluator subclass
>>> validation_scores = ValidationScores()
>>> evaluator.set_best(validation_scores)

Note

This function is used to find the best iteration for each dataset/post_processing_parameter/criterion. Typically, this function is called after the validation scores have been computed.
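To make "best iteration per combination" concrete, the sketch below takes a hypothetical history that maps each (dataset, parameters, criterion) key to per-iteration scores and records the winning (iteration, score) pair, or `None` when no valid score exists. The history layout, the rule of ignoring NaN scores and the `higher_is_better` lookup table are assumptions for illustration; the documented method reads a `ValidationScores` object instead.

```python
import math
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (dataset, parameters, criterion)


def set_best(history: Dict[Key, Dict[int, float]],
             higher_is_better: Dict[str, bool]) -> Dict[Key, Optional[Tuple[int, float]]]:
    """Pick the best (iteration, score) for every key; None if no valid score."""
    best_scores: Dict[Key, Optional[Tuple[int, float]]] = {}
    for ident, per_iteration in history.items():
        criterion = ident[2]
        # Ignore iterations whose score could not be computed (NaN).
        valid = [(it, s) for it, s in per_iteration.items() if not math.isnan(s)]
        if not valid:
            best_scores[ident] = None
            continue
        pick = max if higher_is_better[criterion] else min
        best_scores[ident] = pick(valid, key=lambda item: item[1])
    return best_scores


history = {
    ("A", "p", "jaccard"): {1000: 0.70, 2000: 0.82, 3000: 0.79},
    ("A", "p", "voi"): {1000: 1.20, 2000: 0.90, 3000: 1.10},
    ("A", "q", "jaccard"): {1000: float("nan")},
}
print(set_best(history, {"jaccard": True, "voi": False}))
```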

property criteria: List[str]

Abstractmethod:

A list of all criteria for which a model might be “best”. For example, your criteria might be “precision”, “recall”, and “jaccard”. The best iteration and post-processing parameters are unlikely to be the same for all three of these criteria.

Returns:

List[str]: the evaluation criteria

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete Evaluator subclass
>>> evaluator.criteria
[]

Note

This function is used to return the evaluation criteria.
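A small illustration of why each criterion gets its own "best": with the invented per-iteration scores below, precision, recall and Jaccard index each peak at a different iteration. All three are higher-is-better, so `max` suffices; the numbers are made up for the example and do not come from any real run.

```python
# Invented per-iteration scores for one dataset/parameter combination.
scores_by_iteration = {
    1000: {"precision": 0.91, "recall": 0.62, "jaccard": 0.58},
    2000: {"precision": 0.84, "recall": 0.75, "jaccard": 0.66},
    3000: {"precision": 0.78, "recall": 0.83, "jaccard": 0.64},
}
criteria = ["precision", "recall", "jaccard"]

# All three criteria are higher-is-better, so max() finds each best iteration.
best_iteration = {
    criterion: max(scores_by_iteration, key=lambda it: scores_by_iteration[it][criterion])
    for criterion in criteria
}
print(best_iteration)  # {'precision': 1000, 'recall': 3000, 'jaccard': 2000}
```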

higher_is_better(criterion: str) → bool

Whether or not higher is better for this criterion.

Parameters:

criterion – str the criterion

Returns:

bool: whether higher is better for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete Evaluator subclass
>>> criterion = "criterion"
>>> evaluator.higher_is_better(criterion)
False

Note

This function is used to determine whether higher is better for the given criterion.
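A sketch of a per-criterion direction lookup that a hypothetical subclass might use. Precision, recall and Jaccard index are higher-is-better, while variation of information (VOI) is a distance and therefore lower-is-better. The dictionary and the choice to raise `ValueError` for unknown criteria are assumptions; the usage lines sort scores best-first regardless of direction.

```python
# Hypothetical direction table for a subclass's criteria.
_HIGHER_IS_BETTER = {"precision": True, "recall": True, "jaccard": True, "voi": False}


def higher_is_better(criterion: str) -> bool:
    """Whether larger values of `criterion` indicate better results."""
    try:
        return _HIGHER_IS_BETTER[criterion]
    except KeyError:
        raise ValueError(f"unknown criterion: {criterion!r}") from None


# Sort made-up scores so the best comes first, whatever the direction.
voi_scores = [1.2, 0.9, 1.1]
print(sorted(voi_scores, reverse=higher_is_better("voi")))  # [0.9, 1.1, 1.2]
```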

bounds(criterion: str) → Tuple[int | float | None, int | float | None]

The (lower, upper) bounds for this criterion.

Parameters:

criterion – str the criterion

Returns:

Tuple[Union[int, float, None], Union[int, float, None]]: the bounds for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete Evaluator subclass
>>> criterion = "criterion"
>>> evaluator.bounds(criterion)
(0, 1)

Note

This function is used to return the bounds for the given criterion.
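A sketch of how a subclass might declare bounds, using `None` for a side with no fixed limit: the Jaccard index lies in [0, 1], while VOI is non-negative with no fixed upper bound. The table and the `in_bounds` helper are hypothetical; they only show one way a caller could use the returned tuple, here to sanity-check a score.

```python
from typing import Dict, Tuple, Union

Bound = Union[int, float, None]

# Hypothetical bounds table; None means "no fixed limit on this side".
_BOUNDS: Dict[str, Tuple[Bound, Bound]] = {
    "jaccard": (0, 1),
    "voi": (0, None),
}


def bounds(criterion: str) -> Tuple[Bound, Bound]:
    """(lower, upper) bounds for `criterion`."""
    return _BOUNDS[criterion]


def in_bounds(criterion: str, value: float) -> bool:
    """Check a score against its criterion's bounds, skipping open sides."""
    low, high = bounds(criterion)
    return (low is None or value >= low) and (high is None or value <= high)


print(in_bounds("jaccard", 1.2))  # False
print(in_bounds("voi", 12.0))     # True
```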

store_best(criterion: str) → bool

Whether or not to store the best score for this criterion.

Parameters:

criterion – str the criterion

Returns:

bool: whether to store the best score for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete Evaluator subclass
>>> criterion = "criterion"
>>> evaluator.store_best(criterion)
False

Note

This function is used to return whether to store the best score for the given criterion.
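Not every criterion needs a tracked best. The sketch below assumes a hypothetical subclass that also reports a diagnostic count, `num_predicted_objects`, for which "best" has no meaning; `store_best` returns `False` for it, so best-score bookkeeping can skip it. The criterion names and the default of `False` for unknown criteria are illustrative choices.

```python
# Hypothetical per-criterion settings: a diagnostic count has no meaningful "best".
_STORE_BEST = {"jaccard": True, "voi": True, "num_predicted_objects": False}


def store_best(criterion: str) -> bool:
    """Whether to keep track of the best score for `criterion`."""
    return _STORE_BEST.get(criterion, False)


criteria = ["jaccard", "voi", "num_predicted_objects"]
tracked = [criterion for criterion in criteria if store_best(criterion)]
print(tracked)  # ['jaccard', 'voi']
```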

property score: dacapo.experiments.tasks.evaluators.evaluation_scores.EvaluationScores

Abstractmethod:

The evaluation scores.

Returns:

EvaluationScores: the evaluation scores

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete Evaluator subclass
>>> evaluator.score
EvaluationScores()

Note

This function is used to return the evaluation scores.
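One possible design, sketched under assumptions: `score` returns a default scores object, and the per-criterion methods (`higher_is_better`, `bounds`) delegate to it so that the rules live alongside the scores class. `ToyScores` and `ToyEvaluator` are hypothetical, and this delegation is an illustration rather than a description of dacapo's internals.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyScores:
    """Hypothetical EvaluationScores-like container with per-criterion metadata."""
    jaccard: float = float("nan")
    voi: float = float("nan")

    @staticmethod
    def higher_is_better(criterion: str) -> bool:
        return {"jaccard": True, "voi": False}[criterion]

    @staticmethod
    def bounds(criterion: str) -> Tuple[Optional[float], Optional[float]]:
        return {"jaccard": (0.0, 1.0), "voi": (0.0, None)}[criterion]


class ToyEvaluator:
    """Hypothetical evaluator that delegates criterion metadata to its scores."""

    @property
    def score(self) -> ToyScores:
        # A default scores object; only its per-criterion rules are used here.
        return ToyScores()

    def higher_is_better(self, criterion: str) -> bool:
        return self.score.higher_is_better(criterion)

    def bounds(self, criterion: str) -> Tuple[Optional[float], Optional[float]]:
        return self.score.bounds(criterion)


evaluator = ToyEvaluator()
print(evaluator.higher_is_better("voi"))  # False
print(evaluator.bounds("jaccard"))        # (0.0, 1.0)
```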