dacapo.experiments.tasks.evaluators.evaluator

Attributes

OutputIdentifier

Iteration

Score

BestScore

Classes

Evaluator

Base class of all evaluators: An abstract class representing an evaluator that compares and evaluates the output array against the evaluation array.

Module Contents

dacapo.experiments.tasks.evaluators.evaluator.OutputIdentifier
dacapo.experiments.tasks.evaluators.evaluator.Iteration
dacapo.experiments.tasks.evaluators.evaluator.Score
dacapo.experiments.tasks.evaluators.evaluator.BestScore
class dacapo.experiments.tasks.evaluators.evaluator.Evaluator

Base class of all evaluators: An abstract class representing an evaluator that compares and evaluates the output array against the evaluation array.

An evaluator takes a post-processor’s output and compares it against ground-truth. It then returns a set of scores that can be used to determine the quality of the post-processor’s output.

best_scores

Dict[OutputIdentifier, BestScore] the best scores for each dataset/post-processing parameter/criterion combination

evaluate(output_array_identifier, evaluation_array)

Compare and evaluate the output array against the evaluation array.

is_best(dataset, parameter, criterion, score)

Check if the provided score is the best for this dataset/parameter/criterion combo.

get_overall_best(dataset, criterion)

Return the best score for the given dataset and criterion.

get_overall_best_parameters(dataset, criterion)

Return the best parameters for the given dataset and criterion.

compare(score_1, score_2, criterion)

Compare two scores for the given criterion.

set_best(validation_scores)

Find the best iteration for each dataset/post_processing_parameter/criterion.

higher_is_better(criterion)

Return whether higher is better for the given criterion.

bounds(criterion)

Return the bounds for the given criterion.

store_best(criterion)

Return whether to store the best score for the given criterion.

Note

The Evaluator class is used to compare and evaluate the output array against the evaluation array.
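The subclassing contract described above can be sketched with a small stand-in. This is a minimal illustration, not dacapo's implementation: `ToyEvaluator` and `ExactMatchEvaluator` are hypothetical names, and plain lists and a dict stand in for the array and score types.

```python
from abc import ABC, abstractmethod
from typing import List


class ToyEvaluator(ABC):
    """Hypothetical stand-in mirroring the abstract members of an evaluator."""

    @abstractmethod
    def evaluate(self, output, ground_truth) -> dict:
        """Return a mapping from criterion name to score."""

    @property
    @abstractmethod
    def criteria(self) -> List[str]:
        """Names of the criteria this evaluator reports."""


class ExactMatchEvaluator(ToyEvaluator):
    """Scores an output by the fraction of elements equal to ground truth."""

    @property
    def criteria(self) -> List[str]:
        return ["accuracy"]

    def evaluate(self, output, ground_truth) -> dict:
        matches = sum(o == g for o, g in zip(output, ground_truth))
        return {"accuracy": matches / len(ground_truth)}


evaluator = ExactMatchEvaluator()
scores = evaluator.evaluate([1, 0, 1, 1], [1, 1, 1, 1])

# An abstract base cannot be instantiated directly; only concrete subclasses can.
try:
    ToyEvaluator()
    base_instantiable = True
except TypeError:
    base_instantiable = False
```

The same restriction applies to any class with unimplemented abstract members, so examples on this page that call methods need a concrete subclass.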

abstract evaluate(output_array_identifier: dacapo.store.local_array_store.LocalArrayIdentifier, evaluation_array: dacapo.experiments.datasplits.datasets.arrays.Array) → dacapo.experiments.tasks.evaluators.evaluation_scores.EvaluationScores

Compares and evaluates the output array against the evaluation array.

Parameters:
  • output_array_identifier – LocalArrayIdentifier The identifier of the output array.

  • evaluation_array – Array The evaluation array.

Returns:

EvaluationScores

The evaluation scores.

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete subclass; Evaluator is abstract
>>> output_array_identifier = LocalArrayIdentifier("output_array")
>>> evaluation_array = Array()
>>> evaluator.evaluate(output_array_identifier, evaluation_array)
EvaluationScores()

Note

This function is used to compare and evaluate the output array against the evaluation array.

property best_scores: Dict[OutputIdentifier, BestScore]

The best scores for each dataset/post-processing parameter/criterion combination.

Returns:

Dict[OutputIdentifier, BestScore]

the best scores for each dataset/post-processing parameter/criterion combination

Raises:

AttributeError – if the best scores are not set

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete subclass; Evaluator is abstract
>>> evaluator.best_scores
{}

Note

This property returns the best scores for each dataset/post-processing parameter/criterion combination.

is_best(dataset: dacapo.experiments.datasplits.datasets.Dataset, parameter: dacapo.experiments.tasks.post_processors.PostProcessorParameters, criterion: str, score: dacapo.experiments.tasks.evaluators.evaluation_scores.EvaluationScores) → bool

Check if the provided score is the best for this dataset/parameter/criterion combo.

Parameters:
  • dataset – Dataset the dataset

  • parameter – PostProcessorParameters the post-processor parameters

  • criterion – str the criterion

  • score – EvaluationScores the evaluation scores

Returns:

bool

whether the provided score is the best for this dataset/parameter/criterion combo

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete subclass; Evaluator is abstract
>>> dataset = Dataset()
>>> parameter = PostProcessorParameters()
>>> criterion = "criterion"
>>> score = EvaluationScores()
>>> evaluator.is_best(dataset, parameter, criterion, score)
False

Note

This function is used to check if the provided score is the best for this dataset/parameter/criterion combo.
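The check described here can be sketched as a lookup followed by a direction-aware comparison. This is a minimal sketch under stated assumptions, not the library's code: the key and value shapes (`Key`, `Best`) and the standalone `is_best` helper are hypothetical, with strings standing in for the dataset and parameter objects.

```python
from typing import Dict, Optional, Tuple

# Assumed shapes, for illustration only: a key identifies one
# dataset/parameter/criterion combination, and the stored value is either
# None (nothing recorded yet) or an (iteration, score) pair.
Key = Tuple[str, str, str]
Best = Optional[Tuple[int, float]]


def is_best(best_scores: Dict[Key, Best], key: Key, score: float,
            higher_is_better: bool = True) -> bool:
    """Return True if `score` beats the recorded best for `key`."""
    previous = best_scores.get(key)
    if previous is None:
        # No earlier score to beat, so the new score is the best by default.
        return True
    _, previous_score = previous
    if higher_is_better:
        return score > previous_score
    return score < previous_score


key = ("val", "threshold=0.5", "f1")
best_scores: Dict[Key, Best] = {key: (1000, 0.80)}
```

With this data, a new score of 0.85 counts as best, while 0.75 does not unless the criterion is one where lower is better.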

get_overall_best(dataset: dacapo.experiments.datasplits.datasets.Dataset, criterion: str)

Return the best score for the given dataset and criterion.

Parameters:
  • dataset – Dataset the dataset

  • criterion – str the criterion

Returns:

Optional[float]

the best score for the given dataset and criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete subclass; Evaluator is abstract
>>> dataset = Dataset()
>>> criterion = "criterion"
>>> evaluator.get_overall_best(dataset, criterion) is None
True

Note

This function is used to return the best score for the given dataset and criterion.

get_overall_best_parameters(dataset: dacapo.experiments.datasplits.datasets.Dataset, criterion: str)

Return the best parameters for the given dataset and criterion.

Parameters:
  • dataset – Dataset the dataset

  • criterion – str the criterion

Returns:

Optional[PostProcessorParameters]

the best parameters for the given dataset and criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete subclass; Evaluator is abstract
>>> dataset = Dataset()
>>> criterion = "criterion"
>>> evaluator.get_overall_best_parameters(dataset, criterion) is None
True

Note

This function is used to return the best parameters for the given dataset and criterion.
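Finding the overall best score and its parameters for one dataset and criterion amounts to scanning the stored best scores across all post-processing parameters. The sketch below assumes a plain dict keyed by (dataset, parameter, criterion) tuples with (iteration, score) values; the `overall_best` helper and these shapes are hypothetical and only illustrate the idea.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]          # (dataset, parameter, criterion), assumed
Best = Optional[Tuple[int, float]]  # (iteration, score) or None, assumed


def overall_best(best_scores: Dict[Key, Best], dataset: str, criterion: str,
                 higher_is_better: bool = True
                 ) -> Tuple[Optional[str], Optional[float]]:
    """Return (parameter, score) of the best entry, or (None, None)."""
    best_parameter, best_score = None, None
    for (ds, parameter, crit), entry in best_scores.items():
        # Skip other datasets/criteria and combinations with no score yet.
        if ds != dataset or crit != criterion or entry is None:
            continue
        _, score = entry
        if best_score is None:
            improved = True
        elif higher_is_better:
            improved = score > best_score
        else:
            improved = score < best_score
        if improved:
            best_parameter, best_score = parameter, score
    return best_parameter, best_score


best_scores: Dict[Key, Best] = {
    ("val", "threshold=0.3", "f1"): (2000, 0.71),
    ("val", "threshold=0.5", "f1"): (3000, 0.84),
    ("val", "threshold=0.5", "voi"): (1000, 1.20),
    ("test", "threshold=0.5", "f1"): None,
}
```

Returning `None` when nothing has been recorded matches the `Optional` return types documented above.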

compare(score_1, score_2, criterion)

Compare two scores for the given criterion.

Parameters:
  • score_1 – float the first score

  • score_2 – float the second score

  • criterion – str the criterion

Returns:

bool

whether the first score is better than the second score for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete subclass; Evaluator is abstract
>>> score_1 = 0.0
>>> score_2 = 0.0
>>> criterion = "criterion"
>>> evaluator.compare(score_1, score_2, criterion)
False

Note

This function is used to compare two scores for the given criterion.
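One plausible reading of this comparison is a strict inequality whose direction depends on the criterion. The sketch below assumes exactly that; the `HIGHER_IS_BETTER` table and its criterion names are hypothetical and stand in for whatever `higher_is_better(criterion)` reports.

```python
# Hypothetical direction table: True means larger scores are better.
HIGHER_IS_BETTER = {"f1": True, "voi": False}


def compare(score_1: float, score_2: float, criterion: str) -> bool:
    """Return True if score_1 is strictly better than score_2."""
    if HIGHER_IS_BETTER[criterion]:
        return score_1 > score_2
    return score_1 < score_2
```

Because the inequality is strict, two equal scores compare as not better, which is consistent with the example above returning `False` for two scores of 0.0.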

set_best(validation_scores: dacapo.experiments.validation_scores.ValidationScores) → None

Find the best iteration for each dataset/post_processing_parameter/criterion.

Parameters:

validation_scores – ValidationScores the validation scores

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete subclass; Evaluator is abstract
>>> validation_scores = ValidationScores()
>>> evaluator.set_best(validation_scores)

Note

This function is used to find the best iteration for each dataset/post_processing_parameter/criterion. Typically, this function is called after the validation scores have been computed.
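The search for the best iteration can be sketched over a flat list of validation records. This is an illustration only: the record layout, the `find_best` helper, and the criterion names are assumptions, and the real `ValidationScores` object is not modelled here.

```python
from typing import Dict, List, Tuple

# Assumed record layout: (iteration, dataset, parameter, criterion, score).
Record = Tuple[int, str, str, str, float]
Key = Tuple[str, str, str]


def find_best(records: List[Record],
              higher_is_better: Dict[str, bool]) -> Dict[Key, Tuple[int, float]]:
    """Map each dataset/parameter/criterion key to its best (iteration, score)."""
    best: Dict[Key, Tuple[int, float]] = {}
    for iteration, dataset, parameter, criterion, score in records:
        key = (dataset, parameter, criterion)
        current = best.get(key)
        if current is None:
            best[key] = (iteration, score)
            continue
        if higher_is_better[criterion]:
            improved = score > current[1]
        else:
            improved = score < current[1]
        if improved:
            best[key] = (iteration, score)
    return best


records: List[Record] = [
    (1000, "val", "t=0.5", "f1", 0.60),
    (2000, "val", "t=0.5", "f1", 0.82),
    (3000, "val", "t=0.5", "f1", 0.79),
    (1000, "val", "t=0.5", "voi", 1.9),
    (2000, "val", "t=0.5", "voi", 1.4),
    (3000, "val", "t=0.5", "voi", 1.1),
]
best = find_best(records, {"f1": True, "voi": False})
```

In this toy data the best iteration differs by criterion (2000 for "f1", 3000 for "voi"), which is why the best scores are tracked per criterion rather than once per run.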

abstract property criteria: List[str]

A list of all criteria for which a model might be “best”. For example, your criteria might be “precision”, “recall”, and “jaccard”. It is unlikely that the best iteration or post-processing parameters will be the same for all three of these criteria.

Returns:

List[str]

the evaluation criteria

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete subclass; Evaluator is abstract
>>> evaluator.criteria
[]

Note

This property returns the evaluation criteria.

higher_is_better(criterion: str) → bool

Whether or not higher is better for this criterion.

Parameters:

criterion – str the criterion

Returns:

bool

whether higher is better for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete subclass; Evaluator is abstract
>>> criterion = "criterion"
>>> evaluator.higher_is_better(criterion)
False

Note

This function is used to determine whether higher is better for the given criterion.

bounds(criterion: str) → Tuple[int | float | None, int | float | None]

The bounds for this criterion.

Parameters:

criterion – str the criterion

Returns:

Tuple[Union[int, float, None], Union[int, float, None]]

the bounds for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete subclass; Evaluator is abstract
>>> criterion = "criterion"
>>> evaluator.bounds(criterion)
(0, 1)

Note

This function is used to return the bounds for the given criterion.
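The return type allows either end of the bounds to be `None`. A sketch of how a caller might use such a tuple follows, assuming that `None` means "unbounded on that side"; that interpretation and the `within_bounds` helper are assumptions, not documented behaviour.

```python
from typing import Optional, Tuple, Union

Number = Union[int, float]
Bounds = Tuple[Optional[Number], Optional[Number]]


def within_bounds(score: Number, bounds: Bounds) -> bool:
    """Check a score against (lower, upper); None is treated as unbounded."""
    lower, upper = bounds
    if lower is not None and score < lower:
        return False
    if upper is not None and score > upper:
        return False
    return True
```

A check like this can flag scores that fall outside the range a criterion is expected to produce, for example when validating or plotting results.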

store_best(criterion: str) → bool

Whether to store the best score for this criterion.

Parameters:

criterion – str the criterion

Returns:

bool

whether to store the best score for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete subclass; Evaluator is abstract
>>> criterion = "criterion"
>>> evaluator.store_best(criterion)
False

Note

This function is used to return whether to store the best score for the given criterion.

abstract property score: dacapo.experiments.tasks.evaluators.evaluation_scores.EvaluationScores

The evaluation scores.

Returns:

EvaluationScores

the evaluation scores

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = MyEvaluator()  # hypothetical concrete subclass; Evaluator is abstract
>>> evaluator.score
EvaluationScores()

Note

This property returns the evaluation scores.