dacapo.experiments.tasks.evaluators

Classes

DummyEvaluationScores

The evaluation scores for the dummy task. The scores include the frizz level and blipp score. A higher frizz level indicates more frizz, while a higher blipp score indicates better performance.

DummyEvaluator

A class representing a dummy evaluator. This evaluator is used for testing purposes.

EvaluationScores

Base class for evaluation scores. This class is used to store the evaluation scores for a task.

Evaluator

Base class of all evaluators: An abstract class representing an evaluator that compares and evaluates the output array against the evaluation array.

MultiChannelBinarySegmentationEvaluationScores

Class representing evaluation scores for multi-channel binary segmentation tasks.

BinarySegmentationEvaluationScores

Class representing evaluation scores for binary segmentation tasks.

BinarySegmentationEvaluator

Given a binary segmentation and its ground truth, compute various metrics to determine their similarity.

InstanceEvaluationScores

The evaluation scores for the instance segmentation task. The scores include the variation of information (VOI) split, VOI merge, and VOI.

InstanceEvaluator

A class representing an evaluator for instance segmentation tasks.

Package Contents

class dacapo.experiments.tasks.evaluators.DummyEvaluationScores

The evaluation scores for the dummy task. The scores include the frizz level and blipp score. A higher frizz level indicates more frizz, while a higher blipp score indicates better performance.

frizz_level

float the frizz level

blipp_score

float the blipp score

higher_is_better(criterion)

Return whether higher is better for the given criterion.

bounds(criterion)

Return the bounds for the given criterion.

store_best(criterion)

Return whether to store the best score for the given criterion.

Note

The DummyEvaluationScores class is used to store the evaluation scores for the dummy task. The class also provides methods to determine whether higher is better for a given criterion, the bounds for a given criterion, and whether to store the best score for a given criterion.
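The three helper methods can be combined to interpret a score object generically. The following is a minimal sketch, not the library's code: `ToyScores` and `summarize` are hypothetical stand-ins that mimic the documented interface, and the direction chosen for both criteria is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

Number = Union[int, float]


@dataclass
class ToyScores:
    """Hypothetical stand-in that mimics the documented interface."""

    frizz_level: float = 0.0
    blipp_score: float = 0.0

    # Unannotated, so this is a class attribute rather than a dataclass field.
    criteria = ["frizz_level", "blipp_score"]

    @staticmethod
    def higher_is_better(criterion: str) -> bool:
        return True  # assumed for both criteria in this toy class

    @staticmethod
    def bounds(criterion: str) -> Tuple[Optional[Number], Optional[Number]]:
        return (0.0, 1.0)

    @staticmethod
    def store_best(criterion: str) -> bool:
        return True


def summarize(scores) -> dict:
    """Collect value, bounds check, and direction for every criterion."""
    summary = {}
    for criterion in scores.criteria:
        low, high = scores.bounds(criterion)
        value = getattr(scores, criterion)
        # None marks an open end of the interval.
        in_bounds = (low is None or value >= low) and (high is None or value <= high)
        summary[criterion] = {
            "value": value,
            "in_bounds": in_bounds,
            "higher_is_better": scores.higher_is_better(criterion),
            "store_best": scores.store_best(criterion),
        }
    return summary


report = summarize(ToyScores(frizz_level=0.25, blipp_score=1.5))
```

Because `summarize` only relies on `criteria`, `bounds`, `higher_is_better`, and `store_best`, the same helper would work for any score class that exposes those four names.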

criteria = ['frizz_level', 'blipp_score']

The evaluation criteria.

Returns:

List[str]

the evaluation criteria

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> DummyEvaluationScores.criteria
['frizz_level', 'blipp_score']

Note

This function is used to return the evaluation criteria.

frizz_level: float
blipp_score: float
static higher_is_better(criterion: str) bool

Return whether higher is better for the given criterion.

Parameters:

criterion – str the evaluation criterion

Returns:

bool

whether higher is better for this criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> DummyEvaluationScores.higher_is_better("frizz_level")
True

Note

This function is used to determine whether higher is better for the given criterion.

static bounds(criterion: str) Tuple[int | float | None, int | float | None]

Return the bounds for the given criterion.

Parameters:

criterion – str the evaluation criterion

Returns:

Tuple[Union[int, float, None], Union[int, float, None]]

the bounds for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> DummyEvaluationScores.bounds("frizz_level")
(0.0, 1.0)

Note

This function is used to return the bounds for the given criterion.

static store_best(criterion: str) bool

Return whether to store the best score for the given criterion.

Parameters:

criterion – str the evaluation criterion

Returns:

bool

whether to store the best score for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> DummyEvaluationScores.store_best("frizz_level")
True

Note

This function is used to determine whether to store the best score for the given criterion.

class dacapo.experiments.tasks.evaluators.DummyEvaluator

A class representing a dummy evaluator. This evaluator is used for testing purposes.

criteria

List[str] the evaluation criteria

evaluate(output_array_identifier, evaluation_dataset)

Evaluate the output array against the evaluation dataset.

score()

Return the evaluation scores.

Note

The DummyEvaluator class is used to evaluate the performance of a dummy task.

criteria = ['frizz_level', 'blipp_score']

A list of all criteria for which a model might be “best”. For example, your criteria might be “precision”, “recall”, and “jaccard”. It is unlikely that the best iteration/post-processing parameters will be the same for all three of these criteria.

Returns:

List[str]

the evaluation criteria

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> DummyEvaluator.criteria
['frizz_level', 'blipp_score']

Note

This function is used to return the evaluation criteria.

evaluate(output_array_identifier, evaluation_dataset)

Evaluate the given output array against the evaluation dataset and return the scores for the predefined criteria.

Parameters:
  • output_array_identifier – The output array to be evaluated.

  • evaluation_dataset – The dataset to be used for evaluation.

Returns:

An object of DummyEvaluationScores class, with the evaluation scores.

Return type:

DummyEvaluationScores

Raises:

ValueError – if the output array identifier is not valid

Examples

>>> dummy_evaluator = DummyEvaluator()
>>> output_array_identifier = "output_array"
>>> evaluation_dataset = "evaluation_dataset"
>>> dummy_evaluator.evaluate(output_array_identifier, evaluation_dataset)
DummyEvaluationScores(frizz_level=0.0, blipp_score=0.0)

Note

This function is used to evaluate the output array against the evaluation dataset.

property score: dacapo.experiments.tasks.evaluators.dummy_evaluation_scores.DummyEvaluationScores

Return the evaluation scores.

Returns:

An object of DummyEvaluationScores class, with the evaluation scores.

Return type:

DummyEvaluationScores

Examples

>>> dummy_evaluator = DummyEvaluator()
>>> dummy_evaluator.score
DummyEvaluationScores(frizz_level=0.0, blipp_score=0.0)

Note

This function is used to return the evaluation scores.

class dacapo.experiments.tasks.evaluators.EvaluationScores

Base class for evaluation scores. This class is used to store the evaluation scores for a task. The scores include the evaluation criteria. The class also provides methods to determine whether higher is better for a given criterion, the bounds for a given criterion, and whether to store the best score for a given criterion.

criteria

List[str] the evaluation criteria

higher_is_better(criterion)

Return whether higher is better for the given criterion.

bounds(criterion)

Return the bounds for the given criterion.

store_best(criterion)

Return whether to store the best score for the given criterion.

Note

The EvaluationScores class is used to store the evaluation scores for a task. All evaluation scores should inherit from this class.
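The inheritance contract described here can be illustrated with Python's `abc` module. This is a sketch under stated assumptions: `ScoresBase` and `ToyClassificationScores` are hypothetical names, and the real base class may be declared differently.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple, Union

Bound = Optional[Union[int, float]]


class ScoresBase(ABC):
    """Hypothetical stand-in for an abstract scores base class."""

    @property
    @abstractmethod
    def criteria(self) -> List[str]:
        """Names of the criteria this score object reports."""

    @staticmethod
    @abstractmethod
    def higher_is_better(criterion: str) -> bool:
        """True when larger values of the criterion are better."""

    @staticmethod
    @abstractmethod
    def bounds(criterion: str) -> Tuple[Bound, Bound]:
        """(lower, upper) limits; None marks an open end."""

    @staticmethod
    @abstractmethod
    def store_best(criterion: str) -> bool:
        """True when the best result for the criterion should be kept."""


class ToyClassificationScores(ScoresBase):
    # A plain class attribute is enough to satisfy the abstract property.
    criteria = ["accuracy", "error_rate"]

    def __init__(self, accuracy: float, error_rate: float) -> None:
        self.accuracy = accuracy
        self.error_rate = error_rate

    @staticmethod
    def higher_is_better(criterion: str) -> bool:
        return {"accuracy": True, "error_rate": False}[criterion]

    @staticmethod
    def bounds(criterion: str) -> Tuple[Bound, Bound]:
        return (0.0, 1.0)

    @staticmethod
    def store_best(criterion: str) -> bool:
        return criterion == "accuracy"


# The abstract base cannot be instantiated; the concrete subclass can.
try:
    ScoresBase()
    base_is_abstract = False
except TypeError:
    base_is_abstract = True

scores = ToyClassificationScores(accuracy=0.9, error_rate=0.1)
```

Looking up the criterion in a dictionary means an unknown name raises `KeyError`, so a misspelled criterion fails loudly instead of silently returning a default.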

property criteria: List[str]
Abstractmethod:

The evaluation criteria.

Returns:

List[str]

the evaluation criteria

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluation_scores = EvaluationScores()
>>> evaluation_scores.criteria
["criterion1", "criterion2"]

Note

This function is used to return the evaluation criteria.

static higher_is_better(criterion: str) bool
Abstractmethod:

Whether or not higher is better for this criterion.

Parameters:

criterion – str the evaluation criterion

Returns:

bool

whether higher is better for this criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluation_scores = EvaluationScores()
>>> criterion = "criterion1"
>>> evaluation_scores.higher_is_better(criterion)
True

Note

This function is used to determine whether higher is better for a given criterion.

static bounds(criterion: str) Tuple[int | float | None, int | float | None]
Abstractmethod:

The bounds for this criterion.

Parameters:

criterion – str the evaluation criterion

Returns:

Tuple[Union[int, float, None], Union[int, float, None]]

the bounds for this criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluation_scores = EvaluationScores()
>>> criterion = "criterion1"
>>> evaluation_scores.bounds(criterion)
(0, 1)

Note

This function is used to return the bounds for the given criterion.

static store_best(criterion: str) bool
Abstractmethod:

Whether or not to save the best validation block and model weights for this criterion.

Parameters:

criterion – str the evaluation criterion

Returns:

bool

whether to store the best score for this criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluation_scores = EvaluationScores()
>>> criterion = "criterion1"
>>> evaluation_scores.store_best(criterion)
True

Note

This function is used to return whether to store the best score for the given criterion.

class dacapo.experiments.tasks.evaluators.Evaluator

Base class of all evaluators: An abstract class representing an evaluator that compares and evaluates the output array against the evaluation array.

An evaluator takes a post-processor’s output and compares it against ground-truth. It then returns a set of scores that can be used to determine the quality of the post-processor’s output.

best_scores

Dict[OutputIdentifier, BestScore] the best scores for each dataset/post-processing parameter/criterion combination

evaluate(output_array_identifier, evaluation_array)

Compare and evaluate the output array against the evaluation array.

is_best(dataset, parameter, criterion, score)

Check if the provided score is the best for this dataset/parameter/criterion combo.

get_overall_best(dataset, criterion)

Return the best score for the given dataset and criterion.

get_overall_best_parameters(dataset, criterion)

Return the best parameters for the given dataset and criterion.

compare(score_1, score_2, criterion)

Compare two scores for the given criterion.

set_best(validation_scores)

Find the best iteration for each dataset/post_processing_parameter/criterion.

higher_is_better(criterion)

Return whether higher is better for the given criterion.

bounds(criterion)

Return the bounds for the given criterion.

store_best(criterion)

Return whether to store the best score for the given criterion.

Note

The Evaluator class is used to compare and evaluate the output array against the evaluation array.

abstract evaluate(output_array_identifier: dacapo.store.local_array_store.LocalArrayIdentifier, evaluation_array: dacapo.experiments.datasplits.datasets.arrays.Array) dacapo.experiments.tasks.evaluators.evaluation_scores.EvaluationScores

Compares and evaluates the output array against the evaluation array.

Parameters:
  • output_array_identifier – LocalArrayIdentifier The identifier of the output array.

  • evaluation_array – Array The evaluation array.

Returns:

EvaluationScores

The evaluation scores.

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = Evaluator()
>>> output_array_identifier = LocalArrayIdentifier("output_array")
>>> evaluation_array = Array()
>>> evaluator.evaluate(output_array_identifier, evaluation_array)
EvaluationScores()

Note

This function is used to compare and evaluate the output array against the evaluation array.

property best_scores: Dict[OutputIdentifier, BestScore]

The best scores for each dataset/post-processing parameter/criterion combination.

Returns:

Dict[OutputIdentifier, BestScore]

the best scores for each dataset/post-processing parameter/criterion combination

Raises:

AttributeError – if the best scores are not set

Examples

>>> evaluator = Evaluator()
>>> evaluator.best_scores
{}

Note

This function is used to return the best scores for each dataset/post-processing parameter/criterion combination.

is_best(dataset: dacapo.experiments.datasplits.datasets.Dataset, parameter: dacapo.experiments.tasks.post_processors.PostProcessorParameters, criterion: str, score: dacapo.experiments.tasks.evaluators.evaluation_scores.EvaluationScores) bool

Check if the provided score is the best for this dataset/parameter/criterion combo.

Parameters:
  • dataset – Dataset the dataset

  • parameter – PostProcessorParameters the post-processor parameters

  • criterion – str the criterion

  • score – EvaluationScores the evaluation scores

Returns:

bool

whether the provided score is the best for this dataset/parameter/criterion combo

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = Evaluator()
>>> dataset = Dataset()
>>> parameter = PostProcessorParameters()
>>> criterion = "criterion"
>>> score = EvaluationScores()
>>> evaluator.is_best(dataset, parameter, criterion, score)
False

Note

This function is used to check if the provided score is the best for this dataset/parameter/criterion combo.

get_overall_best(dataset: dacapo.experiments.datasplits.datasets.Dataset, criterion: str)

Return the best score for the given dataset and criterion.

Parameters:
  • dataset – Dataset the dataset

  • criterion – str the criterion

Returns:

Optional[float]

the best score for the given dataset and criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = Evaluator()
>>> dataset = Dataset()
>>> criterion = "criterion"
>>> evaluator.get_overall_best(dataset, criterion)
None

Note

This function is used to return the best score for the given dataset and criterion.

get_overall_best_parameters(dataset: dacapo.experiments.datasplits.datasets.Dataset, criterion: str)

Return the best parameters for the given dataset and criterion.

Parameters:
  • dataset – Dataset the dataset

  • criterion – str the criterion

Returns:

Optional[PostProcessorParameters]

the best parameters for the given dataset and criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = Evaluator()
>>> dataset = Dataset()
>>> criterion = "criterion"
>>> evaluator.get_overall_best_parameters(dataset, criterion)
None

Note

This function is used to return the best parameters for the given dataset and criterion.

compare(score_1, score_2, criterion)

Compare two scores for the given criterion.

Parameters:
  • score_1 – float the first score

  • score_2 – float the second score

  • criterion – str the criterion

Returns:

bool

whether the first score is better than the second score for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = Evaluator()
>>> score_1 = 0.0
>>> score_2 = 0.0
>>> criterion = "criterion"
>>> evaluator.compare(score_1, score_2, criterion)
False

Note

This function is used to compare two scores for the given criterion.
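One plausible way to implement such a comparison is to branch on the direction of the criterion. The sketch below is an assumption about the logic, not the library's source; here the direction is passed in as a boolean rather than looked up from the criterion name.

```python
def compare(score_1: float, score_2: float, higher_is_better: bool) -> bool:
    """Return True when score_1 is strictly better than score_2."""
    if higher_is_better:
        return score_1 > score_2
    return score_1 < score_2


# For a criterion where higher is better, 0.8 beats 0.5 ...
first_wins_when_higher = compare(0.8, 0.5, higher_is_better=True)
# ... and for a criterion where lower is better, it does not.
first_wins_when_lower = compare(0.8, 0.5, higher_is_better=False)
# Equal scores are never "better" under a strict comparison.
tie = compare(0.0, 0.0, higher_is_better=True)
```

A strict comparison matches the documented example, where two equal scores yield `False`.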

set_best(validation_scores: dacapo.experiments.validation_scores.ValidationScores) None

Find the best iteration for each dataset/post_processing_parameter/criterion.

Parameters:

validation_scores – ValidationScores the validation scores

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = Evaluator()
>>> validation_scores = ValidationScores()
>>> evaluator.set_best(validation_scores)
None

Note

This function is used to find the best iteration for each dataset/post_processing_parameter/criterion. Typically, this function is called after the validation scores have been computed.
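The bookkeeping behind this step can be sketched with plain dictionaries. Everything in the block is hypothetical: the `history` layout, the `HIGHER_IS_BETTER` table, and the `find_best` helper are illustrative assumptions and do not reflect how `ValidationScores` actually stores its data.

```python
# Hypothetical direction table; in the documented API this information
# would come from higher_is_better(criterion).
HIGHER_IS_BETTER = {"jaccard": True, "voi": False}


def find_best(history: dict) -> dict:
    """Map (dataset, parameters, criterion) to its best (iteration, value).

    history maps iteration -> {(dataset, parameters): {criterion: value}}.
    """
    best = {}
    for iteration in sorted(history):
        for (dataset, parameters), scores in history[iteration].items():
            for criterion, value in scores.items():
                key = (dataset, parameters, criterion)
                current = best.get(key)
                if current is None:
                    improved = True
                elif HIGHER_IS_BETTER[criterion]:
                    improved = value > current[1]
                else:
                    improved = value < current[1]
                if improved:
                    best[key] = (iteration, value)
    return best


history = {
    1000: {("val", "threshold=0.5"): {"jaccard": 0.61, "voi": 1.40}},
    2000: {("val", "threshold=0.5"): {"jaccard": 0.72, "voi": 1.10}},
    3000: {("val", "threshold=0.5"): {"jaccard": 0.70, "voi": 0.95}},
}
best = find_best(history)
```

In this toy history the best iteration differs by criterion (2000 for `jaccard`, 3000 for `voi`), which is why the best result is tracked per criterion. On ties, the earlier iteration is kept because the comparison is strict.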

property criteria: List[str]
Abstractmethod:

A list of all criteria for which a model might be “best”. For example, your criteria might be “precision”, “recall”, and “jaccard”. It is unlikely that the best iteration/post-processing parameters will be the same for all three of these criteria.

Returns:

List[str]

the evaluation criteria

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = Evaluator()
>>> evaluator.criteria
[]

Note

This function is used to return the evaluation criteria.

higher_is_better(criterion: str) bool

Whether or not higher is better for this criterion.

Parameters:

criterion – str the criterion

Returns:

bool

whether higher is better for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = Evaluator()
>>> criterion = "criterion"
>>> evaluator.higher_is_better(criterion)
False

Note

This function is used to determine whether higher is better for the given criterion.

bounds(criterion: str) Tuple[int | float | None, int | float | None]

The bounds for this criterion.

Parameters:

criterion – str the criterion

Returns:

Tuple[Union[int, float, None], Union[int, float, None]]

the bounds for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = Evaluator()
>>> criterion = "criterion"
>>> evaluator.bounds(criterion)
(0, 1)

Note

This function is used to return the bounds for the given criterion.

store_best(criterion: str) bool

Whether or not to save the best validation block and model weights for this criterion.

Parameters:

criterion – str the criterion

Returns:

bool

whether to store the best score for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = Evaluator()
>>> criterion = "criterion"
>>> evaluator.store_best(criterion)
False

Note

This function is used to return whether to store the best score for the given criterion.

property score: dacapo.experiments.tasks.evaluators.evaluation_scores.EvaluationScores
Abstractmethod:

The evaluation scores.

Returns:

EvaluationScores

the evaluation scores

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> evaluator = Evaluator()
>>> evaluator.score
EvaluationScores()

Note

This function is used to return the evaluation scores.

class dacapo.experiments.tasks.evaluators.MultiChannelBinarySegmentationEvaluationScores

Class representing evaluation scores for multi-channel binary segmentation tasks.

channel_scores

The list of channel scores.

Type:

List[Tuple[str, BinarySegmentationEvaluationScores]]

higher_is_better(criterion)

Determines whether a higher value is better for a given criterion.

store_best(criterion)

Whether or not to store the best weights/validation blocks for this criterion.

bounds(criterion)

Determines the bounds for a given criterion.

Notes

The evaluation scores are stored as attributes of the class. The class also contains methods to determine whether a higher value is better for a given criterion, whether or not to store the best weights/validation blocks for a given criterion, and the bounds for a given criterion.

channel_scores: List[Tuple[str, BinarySegmentationEvaluationScores]]
property criteria

Returns a list of all criteria for all channels.

Returns:

The list of criteria.

Return type:

List[str]

Raises:

ValueError – If the criterion is not recognized.

Examples

>>> channel_scores = [("channel1", BinarySegmentationEvaluationScores()), ("channel2", BinarySegmentationEvaluationScores())]
>>> MultiChannelBinarySegmentationEvaluationScores(channel_scores).criteria

Notes

The method returns a list of all criteria for all channels. The criteria are stored as attributes of the class.
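The examples in this class use criterion names such as "channel1__dice", which suggests that each name joins a channel and a per-channel criterion with a double underscore. The following sketch builds and splits such names; the `SEPARATOR` constant and both helper functions are hypothetical and only inferred from those example strings.

```python
from typing import List, Tuple

SEPARATOR = "__"  # inferred from criterion names such as "channel1__dice"


def qualified_criteria(channels: List[str], per_channel: List[str]) -> List[str]:
    """Combine every channel with every per-channel criterion."""
    return [
        f"{channel}{SEPARATOR}{criterion}"
        for channel in channels
        for criterion in per_channel
    ]


def split_criterion(qualified: str) -> Tuple[str, str]:
    """Split 'channel__criterion' at the first separator."""
    channel, _, criterion = qualified.partition(SEPARATOR)
    if not channel or not criterion:
        raise ValueError(f"unrecognized criterion: {qualified!r}")
    return channel, criterion


names = qualified_criteria(["channel1", "channel2"], ["dice", "f1_score"])
channel, criterion = split_criterion("channel1__f1_score")
```

Splitting at the first separator keeps single underscores in criterion names such as `f1_score` intact, but it assumes channel names do not themselves contain a double underscore.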

static higher_is_better(criterion: str) bool

Determines whether a higher value is better for a given criterion.

Parameters:

criterion (str) – The evaluation criterion.

Returns:

True if a higher value is better, False otherwise.

Return type:

bool

Raises:

ValueError – If the criterion is not recognized.

Examples

>>> MultiChannelBinarySegmentationEvaluationScores.higher_is_better("channel1__dice")
True
>>> MultiChannelBinarySegmentationEvaluationScores.higher_is_better("channel1__f1_score")
True

Notes

The method returns True if a higher value is better for the criterion and False otherwise. Whether a higher value is better for a given criterion is determined by the mapping dictionary.

static store_best(criterion: str) bool

Determines whether or not to store the best weights/validation blocks for a given criterion.

Parameters:

criterion (str) – The evaluation criterion.

Returns:

True if the best weights/validation blocks should be stored, False otherwise.

Return type:

bool

Raises:

ValueError – If the criterion is not recognized.

Examples

>>> MultiChannelBinarySegmentationEvaluationScores.store_best("channel1__dice")
False
>>> MultiChannelBinarySegmentationEvaluationScores.store_best("channel1__f1_score")
True

Notes

The method returns True if the best weights/validation blocks should be stored for the criterion and False otherwise. Whether or not to store the best weights/validation blocks for a given criterion is determined by the mapping dictionary.

static bounds(criterion: str) Tuple[int | float | None, int | float | None]

Determines the bounds for a given criterion. The bounds are used to determine the best value for a given criterion.

Parameters:

criterion (str) – The evaluation criterion.

Returns:

The lower and upper bounds for the criterion.

Return type:

Tuple[Union[int, float, None], Union[int, float, None]]

Raises:

ValueError – If the criterion is not recognized.

Examples

>>> MultiChannelBinarySegmentationEvaluationScores.bounds("channel1__dice")
(0, 1)
>>> MultiChannelBinarySegmentationEvaluationScores.bounds("channel1__hausdorff")
(0, nan)

Notes

The method returns the lower and upper bounds for the criterion. The bounds are determined by the mapping dictionary.

class dacapo.experiments.tasks.evaluators.BinarySegmentationEvaluationScores

Class representing evaluation scores for binary segmentation tasks.

The metrics are listed below. In the set formulas, A and B are the binary segmentations; the rates are written with A as the ground truth and B as the prediction.

- Dice coefficient: 2 * |A ∩ B| / (|A| + |B|)
- Jaccard coefficient: |A ∩ B| / |A ∪ B|
- Hausdorff distance: max(h(A, B), h(B, A)) ; where h(A, B) is the directed Hausdorff distance from A to B
- False negative rate: |A - B| / |A|
- False positive rate: |B - A| / |¬A| ; where ¬A is the complement of A (the true background)
- False discovery rate: |B - A| / |B|
- VOI: Variation of Information; split and merge errors combined into a single measure of segmentation quality
- Mean false distance: 0.5 * (mean false positive distance + mean false negative distance)
- Mean false negative distance: mean distance of false negatives
- Mean false positive distance: mean distance of false positives
- Mean false distance clipped: 0.5 * (mean false positive distance clipped + mean false negative distance clipped) ; clipped to a maximum distance
- Mean false negative distance clipped: mean distance of false negatives ; clipped to a maximum distance
- Mean false positive distance clipped: mean distance of false positives ; clipped to a maximum distance
- Precision with tolerance: TP / (TP + FP) ; where TP and FP are the true and false positives within a tolerance distance
- Recall with tolerance: TP / (TP + FN) ; where TP and FN are the true positives and false negatives within a tolerance distance
- F1 score with tolerance: 2 * (Recall * Precision) / (Recall + Precision) ; where Recall and Precision are computed within a tolerance distance
- Precision: TP / (TP + FP) ; where TP and FP are the true and false positives
- Recall: TP / (TP + FN) ; where TP and FN are the true positives and false negatives
- F1 score: 2 * (Recall * Precision) / (Recall + Precision)
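The overlap-based metrics above can be checked by hand on a toy example. The sketch below represents each segmentation as a set of foreground pixel coordinates; `overlap_metrics` is a hypothetical helper using the standard definitions, not the library's implementation, and it does not guard against empty inputs.

```python
def overlap_metrics(truth: set, prediction: set) -> dict:
    """Compute overlap metrics from two sets of foreground coordinates."""
    tp = len(truth & prediction)   # foreground in both
    fp = len(prediction - truth)   # predicted but not in the truth
    fn = len(truth - prediction)   # in the truth but not predicted
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "dice": 2 * tp / (len(truth) + len(prediction)),
        "jaccard": tp / len(truth | prediction),
        "precision": precision,
        "recall": recall,
        "f1_score": 2 * precision * recall / (precision + recall),
        "false_negative_rate": fn / len(truth),
    }


truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
prediction = {(0, 1), (1, 1), (1, 2)}
metrics = overlap_metrics(truth, prediction)
```

With two shared pixels, one false positive, and two false negatives, Dice is 4/7 and Jaccard is 2/5. For binary masks the F1 score equals the Dice coefficient, which makes a convenient sanity check.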

dice

The Dice coefficient.

Type:

float

jaccard

The Jaccard index.

Type:

float

hausdorff

The Hausdorff distance.

Type:

float

false_negative_rate

The false negative rate.

Type:

float

false_negative_rate_with_tolerance

The false negative rate with tolerance.

Type:

float

false_positive_rate

The false positive rate.

Type:

float

false_discovery_rate

The false discovery rate.

Type:

float

false_positive_rate_with_tolerance

The false positive rate with tolerance.

Type:

float

voi

The variation of information.

Type:

float

mean_false_distance

The mean false distance.

Type:

float

mean_false_negative_distance

The mean false negative distance.

Type:

float

mean_false_positive_distance

The mean false positive distance.

Type:

float

mean_false_distance_clipped

The mean false distance clipped.

Type:

float

mean_false_negative_distance_clipped

The mean false negative distance clipped.

Type:

float

mean_false_positive_distance_clipped

The mean false positive distance clipped.

Type:

float

precision_with_tolerance

The precision with tolerance.

Type:

float

recall_with_tolerance

The recall with tolerance.

Type:

float

f1_score_with_tolerance

The F1 score with tolerance.

Type:

float

precision

The precision.

Type:

float

recall

The recall.

Type:

float

f1_score

The F1 score.

Type:

float

store_best(criterion)

Whether or not to store the best weights/validation blocks for this criterion.

higher_is_better(criterion)

Determines whether a higher value is better for a given criterion.

bounds(criterion)

Determines the bounds for a given criterion.

Notes

The evaluation scores are stored as attributes of the class. The class also contains methods to determine whether a higher value is better for a given criterion, whether or not to store the best weights/validation blocks for a given criterion, and the bounds for a given criterion.

dice: float
jaccard: float
hausdorff: float
false_negative_rate: float
false_negative_rate_with_tolerance: float
false_positive_rate: float
false_discovery_rate: float
false_positive_rate_with_tolerance: float
voi: float
mean_false_distance: float
mean_false_negative_distance: float
mean_false_positive_distance: float
mean_false_distance_clipped: float
mean_false_negative_distance_clipped: float
mean_false_positive_distance_clipped: float
precision_with_tolerance: float
recall_with_tolerance: float
f1_score_with_tolerance: float
precision: float
recall: float
f1_score: float
criteria = ['dice', 'jaccard', 'hausdorff', 'false_negative_rate', 'false_negative_rate_with_tolerance',...

The evaluation criteria.

Returns:

List[str]

the evaluation criteria

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> BinarySegmentationEvaluationScores.criteria[:3]
['dice', 'jaccard', 'hausdorff']

Note

This function is used to return the evaluation criteria.

static store_best(criterion: str) bool

Determines whether or not to store the best weights/validation blocks for a given criterion.

Parameters:

criterion (str) – The evaluation criterion.

Returns:

True if the best weights/validation blocks should be stored, False otherwise.

Return type:

bool

Raises:

ValueError – If the criterion is not recognized.

Examples

>>> BinarySegmentationEvaluationScores.store_best("dice")
False
>>> BinarySegmentationEvaluationScores.store_best("f1_score")
True

Notes

The method returns True if the best weights/validation blocks should be stored for the criterion and False otherwise. Whether or not to store the best weights/validation blocks for a given criterion is determined by the mapping dictionary.

static higher_is_better(criterion: str) bool

Determines whether a higher value is better for a given criterion.

Parameters:

criterion (str) – The evaluation criterion.

Returns:

True if a higher value is better, False otherwise.

Return type:

bool

Raises:

ValueError – If the criterion is not recognized.

Examples

>>> BinarySegmentationEvaluationScores.higher_is_better("dice")
True
>>> BinarySegmentationEvaluationScores.higher_is_better("f1_score")
True

Notes

The method returns True if a higher value is better for the criterion and False otherwise. Whether a higher value is better for a given criterion is determined by the mapping dictionary.

static bounds(criterion: str) Tuple[int | float | None, int | float | None]

Determines the bounds for a given criterion. The bounds are used to determine the best value for a given criterion.

Parameters:

criterion (str) – The evaluation criterion.

Returns:

The lower and upper bounds for the criterion.

Return type:

Tuple[Union[int, float, None], Union[int, float, None]]

Raises:

ValueError – If the criterion is not recognized.

Examples

>>> BinarySegmentationEvaluationScores.bounds("dice")
(0, 1)
>>> BinarySegmentationEvaluationScores.bounds("hausdorff")
(0, nan)

Notes

The method returns the lower and upper bounds for the criterion. The bounds are determined by the mapping dictionary.

class dacapo.experiments.tasks.evaluators.BinarySegmentationEvaluator(clip_distance: float, tol_distance: float, channels: List[str])

Given a binary segmentation and its ground truth, compute various metrics to determine their similarity. The metrics are listed below. In the set formulas, A and B are the binary segmentations; the rates are written with A as the ground truth and B as the prediction.

- Dice coefficient: 2 * |A ∩ B| / (|A| + |B|)
- Jaccard coefficient: |A ∩ B| / |A ∪ B|
- Hausdorff distance: max(h(A, B), h(B, A)) ; where h(A, B) is the directed Hausdorff distance from A to B
- False negative rate: |A - B| / |A|
- False positive rate: |B - A| / |¬A| ; where ¬A is the complement of A (the true background)
- False discovery rate: |B - A| / |B|
- VOI: Variation of Information; split and merge errors combined into a single measure of segmentation quality
- Mean false distance: 0.5 * (mean false positive distance + mean false negative distance)
- Mean false negative distance: mean distance of false negatives
- Mean false positive distance: mean distance of false positives
- Mean false distance clipped: 0.5 * (mean false positive distance clipped + mean false negative distance clipped) ; clipped to a maximum distance
- Mean false negative distance clipped: mean distance of false negatives ; clipped to a maximum distance
- Mean false positive distance clipped: mean distance of false positives ; clipped to a maximum distance
- Precision with tolerance: TP / (TP + FP) ; where TP and FP are the true and false positives within a tolerance distance
- Recall with tolerance: TP / (TP + FN) ; where TP and FN are the true positives and false negatives within a tolerance distance
- F1 score with tolerance: 2 * (Recall * Precision) / (Recall + Precision) ; where Recall and Precision are computed within a tolerance distance
- Precision: TP / (TP + FP) ; where TP and FP are the true and false positives
- Recall: TP / (TP + FN) ; where TP and FN are the true positives and false negatives
- F1 score: 2 * (Recall * Precision) / (Recall + Precision)
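The Hausdorff distance is the least intuitive entry in the list above, so a small worked example may help. The sketch below computes it by brute force on two tiny point sets; `directed_hausdorff` and `hausdorff` are hypothetical helpers written for illustration, and the evaluator may compute the distance differently (for example on voxel grids with an optimized routine).

```python
from math import dist


def directed_hausdorff(a, b) -> float:
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(dist(p, q) for q in b) for p in a)


def hausdorff(a, b) -> float:
    """Symmetric Hausdorff distance: the larger of the two directed ones."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


a = [(0, 0), (0, 1)]
b = [(0, 0), (3, 4)]

a_to_b = directed_hausdorff(a, b)  # every point of a is close to (0, 0)
b_to_a = directed_hausdorff(b, a)  # (3, 4) is far from every point of a
distance = hausdorff(a, b)
```

The two directed distances differ (1 versus the square root of 18), which is why the symmetric distance takes their maximum: a single outlying point in either segmentation dominates the result.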

clip_distance

float the clip distance

tol_distance

float the tolerance distance

channels

List[str] the channels

criteria

List[str] the evaluation criteria

evaluate(output_array_identifier, evaluation_array)

Evaluate the output array against the evaluation array.

score()

Return the evaluation scores.

Note

The BinarySegmentationEvaluator class is used to evaluate the performance of a binary segmentation task. The class provides methods to evaluate the output array against the evaluation array and return the evaluation scores.

Clip distance is the maximum distance between the ground truth and the predicted segmentation for a pixel to be considered a false positive. Tolerance distance is the maximum distance between the ground truth and the predicted segmentation for a pixel to be considered a true positive. Channels are the channels of the binary segmentation. Criteria are the evaluation criteria.
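To make the two distance parameters concrete, the sketch below applies them to one-dimensional positions. It reflects one reading of the description, namely that the tolerance decides whether a position counts as matched and that the clip distance caps each error distance. All function names are hypothetical, and the evaluator's exact computation may differ.

```python
def nearest_distance(point: float, others) -> float:
    """Distance from point to the closest element of others."""
    return min(abs(point - other) for other in others)


def precision_recall_with_tolerance(truth, prediction, tol_distance):
    """Count a position as matched when a counterpart lies within tol_distance."""
    matched_pred = sum(1 for p in prediction if nearest_distance(p, truth) <= tol_distance)
    matched_truth = sum(1 for t in truth if nearest_distance(t, prediction) <= tol_distance)
    return matched_pred / len(prediction), matched_truth / len(truth)


def mean_false_positive_distance_clipped(truth, prediction, clip_distance):
    """Mean distance of wrong predictions to the truth, capped at clip_distance."""
    distances = [
        min(nearest_distance(p, truth), clip_distance)
        for p in prediction
        if p not in truth
    ]
    return sum(distances) / len(distances) if distances else 0.0


truth = [10.0, 20.0, 30.0]
prediction = [11.0, 20.0, 90.0]

precision, recall = precision_recall_with_tolerance(truth, prediction, tol_distance=2.0)
clipped = mean_false_positive_distance_clipped(truth, prediction, clip_distance=25.0)
```

Here the prediction at 11.0 is accepted because it lies within the tolerance of 10.0, while the outlier at 90.0 contributes only the clip distance of 25.0 instead of its true error of 60.0. Clipping keeps a few distant outliers from dominating the mean.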

criteria = ['jaccard', 'voi']

A list of all criteria for which a model might be “best”. i.e. your criteria might be “precision”, “recall”, and “jaccard”. It is unlikely that the best iteration/post processing parameters will be the same for all 3 of these criteria

Returns:

List[str]

the evaluation criteria

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> BinarySegmentationEvaluator.criteria
['jaccard', 'voi']

Note

This function is used to return the evaluation criteria.
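Why the best iteration can differ per criterion is easiest to see with a small sketch. The helper `best_iteration` and the score values below are hypothetical and purely illustrative; they are not results from any real run.

```python
def best_iteration(scores_by_iteration, criterion, higher_is_better):
    """Return the iteration whose score is best for one criterion.

    Hypothetical helper: `scores_by_iteration` maps iteration -> {criterion: value}.
    """
    pick = max if higher_is_better else min
    return pick(scores_by_iteration, key=lambda it: scores_by_iteration[it][criterion])


# Made-up validation scores for three checkpoints.
scores = {
    1000: {"jaccard": 0.71, "voi": 0.42},
    2000: {"jaccard": 0.78, "voi": 0.35},
    3000: {"jaccard": 0.76, "voi": 0.31},
}

best_for_jaccard = best_iteration(scores, "jaccard", higher_is_better=True)
best_for_voi = best_iteration(scores, "voi", higher_is_better=False)
```

With these numbers the two criteria select different checkpoints, which is why a "best" result is tracked separately for each entry in `criteria`.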

clip_distance
tol_distance
channels
evaluate(output_array_identifier, evaluation_array)

Evaluate the output array against the evaluation array.

Parameters:
  • output_array_identifier – str the identifier of the output array

  • evaluation_array – ZarrArray the evaluation array

Returns:

BinarySegmentationEvaluationScores or MultiChannelBinarySegmentationEvaluationScores

the evaluation scores

Raises:

ValueError – if the output array identifier is not valid

Examples

>>> binary_segmentation_evaluator = BinarySegmentationEvaluator(clip_distance=200, tol_distance=40, channels=["channel1", "channel2"])
>>> output_array_identifier = "output_array"
>>> evaluation_array = ZarrArray.open_from_array_identifier("evaluation_array")
>>> binary_segmentation_evaluator.evaluate(output_array_identifier, evaluation_array)
BinarySegmentationEvaluationScores(dice=0.0, jaccard=0.0, hausdorff=0.0, false_negative_rate=0.0, false_positive_rate=0.0, false_discovery_rate=0.0, voi=0.0, mean_false_distance=0.0, mean_false_negative_distance=0.0, mean_false_positive_distance=0.0, mean_false_distance_clipped=0.0, mean_false_negative_distance_clipped=0.0, mean_false_positive_distance_clipped=0.0, precision_with_tolerance=0.0, recall_with_tolerance=0.0, f1_score_with_tolerance=0.0, precision=0.0, recall=0.0, f1_score=0.0)

Note

This function is used to evaluate the output array against the evaluation array.

property score

Return the evaluation scores.

Returns:

BinarySegmentationEvaluationScores or MultiChannelBinarySegmentationEvaluationScores

the evaluation scores

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> binary_segmentation_evaluator = BinarySegmentationEvaluator(clip_distance=200, tol_distance=40, channels=["channel1", "channel2"])
>>> binary_segmentation_evaluator.score
BinarySegmentationEvaluationScores(dice=0.0, jaccard=0.0, hausdorff=0.0, false_negative_rate=0.0, false_positive_rate=0.0, false_discovery_rate=0.0, voi=0.0, mean_false_distance=0.0, mean_false_negative_distance=0.0, mean_false_positive_distance=0.0, mean_false_distance_clipped=0.0, mean_false_negative_distance_clipped=0.0, mean_false_positive_distance_clipped=0.0, precision_with_tolerance=0.0, recall_with_tolerance=0.0, f1_score_with_tolerance=0.0, precision=0.0, recall=0.0, f1_score=0.0)

Note

This function is used to return the evaluation scores.

class dacapo.experiments.tasks.evaluators.InstanceEvaluationScores

The evaluation scores for the instance segmentation task. The scores include the variation of information (VOI) split, VOI merge, and VOI.

voi_split

float the variation of information (VOI) split

voi_merge

float the variation of information (VOI) merge

voi

float the variation of information (VOI)

higher_is_better(criterion)

Return whether higher is better for the given criterion.

bounds(criterion)

Return the bounds for the given criterion.

store_best(criterion)

Return whether to store the best score for the given criterion.

Note

The InstanceEvaluationScores class is used to store the evaluation scores for the instance segmentation task.

criteria = ['voi_split', 'voi_merge', 'voi']

The evaluation criteria.

Returns:

List[str]

the evaluation criteria

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> InstanceEvaluationScores.criteria
['voi_split', 'voi_merge', 'voi']

Note

This function is used to return the evaluation criteria.

voi_split: float
voi_merge: float
property voi

Return the average of the VOI split and VOI merge.

Returns:

float

the average of the VOI split and VOI merge

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> instance_evaluation_scores = InstanceEvaluationScores(voi_split=0.25, voi_merge=0.75)
>>> instance_evaluation_scores.voi
0.5

Note

This function is used to calculate the average of the VOI split and VOI merge.
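A derived property of this kind can be sketched with a small dataclass. `VoiScores` is a hypothetical stand-in, not the dacapo class, and it assumes the average is the plain arithmetic mean of the two fields.

```python
from dataclasses import dataclass


@dataclass
class VoiScores:
    """Hypothetical stand-in that stores the two VOI components."""

    voi_split: float
    voi_merge: float

    @property
    def voi(self) -> float:
        # Combined score: arithmetic mean of the split and merge components.
        return (self.voi_split + self.voi_merge) / 2


combined = VoiScores(voi_split=0.25, voi_merge=0.75).voi
```

Because `voi` is computed on access, only the two components need to be stored, and the combined value always reflects the current component values.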

static higher_is_better(criterion: str) → bool

Return whether higher is better for the given criterion.

Parameters:

criterion – str the evaluation criterion

Returns:

bool

whether higher is better for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> InstanceEvaluationScores.higher_is_better("voi_split")
False

Note

This function is used to determine whether higher is better for the given criterion.

static bounds(criterion: str) → Tuple[int | float | None, int | float | None]

Return the bounds for the given criterion.

Parameters:

criterion – str the evaluation criterion

Returns:

Tuple[Union[int, float, None], Union[int, float, None]]

the bounds for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> InstanceEvaluationScores.bounds("voi_split")
(0, 1)

Note

This function is used to return the bounds for the given criterion.

static store_best(criterion: str) → bool

Return whether to store the best score for the given criterion.

Parameters:

criterion – str the evaluation criterion

Returns:

bool

whether to store the best score for the given criterion

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> InstanceEvaluationScores.store_best("voi_split")
True

Note

This function is used to determine whether to store the best score for the given criterion.
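A caller might use these static helpers to decide whether a new score replaces the stored best. The sketch below is an assumption about such usage: `StubScores` and `is_improvement` are hypothetical, and the stub only mirrors the documented behaviour that higher is not better for VOI criteria.

```python
class StubScores:
    """Hypothetical stand-in exposing a higher_is_better static method."""

    @staticmethod
    def higher_is_better(criterion: str) -> bool:
        # VOI-style criteria measure error, so lower values are better.
        return {"voi_split": False, "voi_merge": False, "voi": False}[criterion]


def is_improvement(scores_cls, criterion, candidate, best_so_far):
    """Return True if `candidate` beats `best_so_far` for `criterion`."""
    if best_so_far is None:
        # Nothing stored yet, so any score is an improvement.
        return True
    if scores_cls.higher_is_better(criterion):
        return candidate > best_so_far
    return candidate < best_so_far


lower_voi_wins = is_improvement(StubScores, "voi", candidate=0.3, best_so_far=0.4)
```

Keeping the direction of comparison on the scores class means the comparison logic does not need to know anything about individual criteria.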

class dacapo.experiments.tasks.evaluators.InstanceEvaluator

A class representing an evaluator for instance segmentation tasks.

criteria

List[str] the evaluation criteria

evaluate(output_array_identifier, evaluation_array)

Evaluate the output array against the evaluation array.

score()

Return the evaluation scores.

Note

The InstanceEvaluator class is used to evaluate the performance of an instance segmentation task.

criteria: List[str] = ['voi_merge', 'voi_split', 'voi']

A list of all criteria for which a model might be “best”, e.g. your criteria might be “precision”, “recall”, and “jaccard”. It is unlikely that the best iteration or post-processing parameters will be the same for all three of these criteria.

Returns:

List[str]

the evaluation criteria

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> InstanceEvaluator.criteria
['voi_merge', 'voi_split', 'voi']

Note

This function is used to return the evaluation criteria.

evaluate(output_array_identifier, evaluation_array)

Evaluate the output array against the evaluation array.

Parameters:
  • output_array_identifier – str the identifier of the output array

  • evaluation_array – ZarrArray the evaluation array

Returns:

InstanceEvaluationScores

the evaluation scores

Raises:

ValueError – if the output array identifier is not valid

Examples

>>> instance_evaluator = InstanceEvaluator()
>>> output_array_identifier = "output_array"
>>> evaluation_array = ZarrArray.open_from_array_identifier("evaluation_array")
>>> instance_evaluator.evaluate(output_array_identifier, evaluation_array)
InstanceEvaluationScores(voi_merge=0.0, voi_split=0.0)

Note

This function is used to evaluate the output array against the evaluation array.
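How VOI split and VOI merge can be derived from two labelings is sketched below. This is a minimal illustration and not the evaluator's implementation: `voi_split_merge` is a hypothetical name, labels are given as flat sequences, entropies are in bits, and the convention assumed here is that split is the conditional entropy H(segmentation | truth) and merge is H(truth | segmentation).

```python
import math
from collections import Counter


def voi_split_merge(truth, seg):
    """Return (voi_split, voi_merge) in bits for two equal-length label sequences.

    Hypothetical helper. Assumed convention:
    split = H(seg | truth), merge = H(truth | seg).
    """
    if len(truth) != len(seg):
        raise ValueError("label sequences must have the same length")

    n = len(truth)
    joint = Counter(zip(truth, seg))  # co-occurrence counts of label pairs
    truth_counts = Counter(truth)
    seg_counts = Counter(seg)

    # H(seg | truth): uncertainty about the predicted label given the true label.
    split = -sum(c / n * math.log2(c / truth_counts[t]) for (t, _), c in joint.items())
    # H(truth | seg): uncertainty about the true label given the predicted label.
    merge = -sum(c / n * math.log2(c / seg_counts[s]) for (_, s), c in joint.items())
    return split, merge


# One true object that the prediction cuts into two equal halves.
split, merge = voi_split_merge(truth=[1, 1, 1, 1], seg=[1, 1, 2, 2])
```

In this toy case all of the error is attributed to splitting: the prediction over-segments a single object, so the split term is one bit and the merge term is zero.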

property score: dacapo.experiments.tasks.evaluators.instance_evaluation_scores.InstanceEvaluationScores

Return the evaluation scores.

Returns:

InstanceEvaluationScores

the evaluation scores

Raises:

NotImplementedError – if the function is not implemented

Examples

>>> instance_evaluator = InstanceEvaluator()
>>> instance_evaluator.score
InstanceEvaluationScores(voi_merge=0.0, voi_split=0.0)

Note

This function is used to return the evaluation scores.