dacapo.experiments.tasks.evaluators
Submodules
- dacapo.experiments.tasks.evaluators.binary_segmentation_evaluation_scores
- dacapo.experiments.tasks.evaluators.binary_segmentation_evaluator
- dacapo.experiments.tasks.evaluators.dummy_evaluation_scores
- dacapo.experiments.tasks.evaluators.dummy_evaluator
- dacapo.experiments.tasks.evaluators.evaluation_scores
- dacapo.experiments.tasks.evaluators.evaluator
- dacapo.experiments.tasks.evaluators.instance_evaluation_scores
- dacapo.experiments.tasks.evaluators.instance_evaluator
Classes
- DummyEvaluationScores
The evaluation scores for the dummy task. The scores include the frizz level and blipp score. A higher frizz level indicates more frizz, while a higher blipp score indicates better performance.
- DummyEvaluator
A class representing a dummy evaluator. This evaluator is used for testing purposes.
- EvaluationScores
Base class for evaluation scores. This class is used to store the evaluation scores for a task.
- Evaluator
Base class of all evaluators: An abstract class representing an evaluator that compares and evaluates the output array against the evaluation array.
- MultiChannelBinarySegmentationEvaluationScores
Class representing evaluation scores for multi-channel binary segmentation tasks.
- BinarySegmentationEvaluationScores
Class representing evaluation scores for binary segmentation tasks.
- BinarySegmentationEvaluator
Given a binary segmentation, compute various metrics to determine their similarity.
- InstanceEvaluationScores
The evaluation scores for the instance segmentation task. The scores include the variation of information (VOI) split, VOI merge, and VOI.
- InstanceEvaluator
A class representing an evaluator for instance segmentation tasks.
Package Contents
- class dacapo.experiments.tasks.evaluators.DummyEvaluationScores
The evaluation scores for the dummy task. The scores include the frizz level and blipp score. A higher frizz level indicates more frizz, while a higher blipp score indicates better performance.
- frizz_level
float the frizz level
- blipp_score
float the blipp score
- higher_is_better(criterion)
Return whether higher is better for the given criterion.
- bounds(criterion)
Return the bounds for the given criterion.
- store_best(criterion)
Return whether to store the best score for the given criterion.
Note
The DummyEvaluationScores class is used to store the evaluation scores for the dummy task. The class also provides methods to determine whether higher is better for a given criterion, the bounds for a given criterion, and whether to store the best score for a given criterion.
- criteria = ['frizz_level', 'blipp_score']
The evaluation criteria.
- Returns:
- List[str]
the evaluation criteria
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluation_scores = EvaluationScores()
>>> evaluation_scores.criteria
["criterion1", "criterion2"]
Note
This function is used to return the evaluation criteria.
- frizz_level: float
- blipp_score: float
- static higher_is_better(criterion: str) bool
Return whether higher is better for the given criterion.
- Parameters:
criterion – str the evaluation criterion
- Returns:
- bool
whether higher is better for this criterion
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> DummyEvaluationScores.higher_is_better("frizz_level")
True
Note
This function is used to determine whether higher is better for the given criterion.
- static bounds(criterion: str) Tuple[int | float | None, int | float | None]
Return the bounds for the given criterion.
- Parameters:
criterion – str the evaluation criterion
- Returns:
- Tuple[Union[int, float, None], Union[int, float, None]]
the bounds for the given criterion
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> DummyEvaluationScores.bounds("frizz_level")
(0.0, 1.0)
Note
This function is used to return the bounds for the given criterion.
- static store_best(criterion: str) bool
Return whether to store the best score for the given criterion.
- Parameters:
criterion – str the evaluation criterion
- Returns:
- bool
whether to store the best score for the given criterion
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> DummyEvaluationScores.store_best("frizz_level")
True
Note
This function is used to determine whether to store the best score for the given criterion.
- class dacapo.experiments.tasks.evaluators.DummyEvaluator
A class representing a dummy evaluator. This evaluator is used for testing purposes.
- criteria
List[str] the evaluation criteria
- evaluate(output_array_identifier, evaluation_dataset)
Evaluate the output array against the evaluation dataset.
- score()
Return the evaluation scores.
Note
The DummyEvaluator class is used to evaluate the performance of a dummy task.
- criteria = ['frizz_level', 'blipp_score']
A list of all criteria for which a model might be “best”. For example, your criteria might be “precision”, “recall”, and “jaccard”. It is unlikely that the best iteration/post-processing parameters will be the same for all three of these criteria.
- Returns:
- List[str]
the evaluation criteria
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluator = Evaluator()
>>> evaluator.criteria
[]
Note
This function is used to return the evaluation criteria.
- evaluate(output_array_identifier, evaluation_dataset)
Evaluates the given output array against the evaluation dataset and returns the scores based on predefined criteria.
- Parameters:
output_array_identifier – The output array to be evaluated.
evaluation_dataset – The dataset to be used for evaluation.
- Returns:
An object of DummyEvaluationScores class, with the evaluation scores.
- Return type:
DummyEvaluationScores
- Raises:
ValueError – if the output array identifier is not valid
Examples
>>> dummy_evaluator = DummyEvaluator()
>>> output_array_identifier = "output_array"
>>> evaluation_dataset = "evaluation_dataset"
>>> dummy_evaluator.evaluate(output_array_identifier, evaluation_dataset)
DummyEvaluationScores(frizz_level=0.0, blipp_score=0.0)
Note
This function is used to evaluate the output array against the evaluation dataset.
- property score: dacapo.experiments.tasks.evaluators.dummy_evaluation_scores.DummyEvaluationScores
Return the evaluation scores.
- Returns:
An object of DummyEvaluationScores class, with the evaluation scores.
- Return type:
DummyEvaluationScores
Examples
>>> dummy_evaluator = DummyEvaluator()
>>> dummy_evaluator.score
DummyEvaluationScores(frizz_level=0.0, blipp_score=0.0)
Note
This function is used to return the evaluation scores.
- class dacapo.experiments.tasks.evaluators.EvaluationScores
Base class for evaluation scores. This class is used to store the evaluation scores for a task. The scores include the evaluation criteria. The class also provides methods to determine whether higher is better for a given criterion, the bounds for a given criterion, and whether to store the best score for a given criterion.
- criteria
List[str] the evaluation criteria
- higher_is_better(criterion)
Return whether higher is better for the given criterion.
- bounds(criterion)
Return the bounds for the given criterion.
- store_best(criterion)
Return whether to store the best score for the given criterion.
Note
The EvaluationScores class is used to store the evaluation scores for a task. All evaluation scores should inherit from this class.
- property criteria: List[str]
- Abstractmethod:
The evaluation criteria.
- Returns:
- List[str]
the evaluation criteria
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluation_scores = EvaluationScores()
>>> evaluation_scores.criteria
["criterion1", "criterion2"]
Note
This function is used to return the evaluation criteria.
- static higher_is_better(criterion: str) bool
- Abstractmethod:
Whether or not higher is better for this criterion.
- Parameters:
criterion – str the evaluation criterion
- Returns:
- bool
whether higher is better for this criterion
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluation_scores = EvaluationScores()
>>> criterion = "criterion1"
>>> evaluation_scores.higher_is_better(criterion)
True
Note
This function is used to determine whether higher is better for a given criterion.
- static bounds(criterion: str) Tuple[int | float | None, int | float | None]
- Abstractmethod:
The bounds for this criterion.
- Parameters:
criterion – str the evaluation criterion
- Returns:
- Tuple[Union[int, float, None], Union[int, float, None]]
the bounds for this criterion
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluation_scores = EvaluationScores()
>>> criterion = "criterion1"
>>> evaluation_scores.bounds(criterion)
(0, 1)
Note
This function is used to return the bounds for the given criterion.
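One way to consume a bounds tuple is to check whether a score falls inside it. The sketch below is illustrative only: `within_bounds` is a hypothetical helper, not part of the package, and it assumes that `None` in either position means "no bound on that side", which the type signature suggests but this page does not state.

```python
from typing import Optional, Tuple, Union

Bound = Optional[Union[int, float]]


def within_bounds(value: float, bounds: Tuple[Bound, Bound]) -> bool:
    """Hypothetical helper: True if value lies inside (lower, upper).

    Assumption: a bound of None is treated as open on that side.
    """
    lower, upper = bounds
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True
```

For example, `within_bounds(0.5, (0, 1))` holds, while `within_bounds(1.5, (0, 1))` does not.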
- static store_best(criterion: str) bool
- Abstractmethod:
Whether or not to save the best validation block and model weights for this criterion.
- Parameters:
criterion – str the evaluation criterion
- Returns:
- bool
whether to store the best score for this criterion
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluation_scores = EvaluationScores()
>>> criterion = "criterion1"
>>> evaluation_scores.store_best(criterion)
True
Note
This function is used to return whether to store the best score for the given criterion.
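The contract described for this base class (a `criteria` property plus three static per-criterion lookups) can be sketched without importing the package. In the sketch below, `SketchScores` and `ToyScores` are hypothetical stand-ins, and the `accuracy` and `error` criteria are invented for illustration; none of these names come from the library.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Bound = Optional[Union[int, float]]


class SketchScores(ABC):
    """Hypothetical stand-in mirroring the four documented members."""

    @property
    @abstractmethod
    def criteria(self) -> List[str]:
        ...

    @staticmethod
    @abstractmethod
    def higher_is_better(criterion: str) -> bool:
        ...

    @staticmethod
    @abstractmethod
    def bounds(criterion: str) -> Tuple[Bound, Bound]:
        ...

    @staticmethod
    @abstractmethod
    def store_best(criterion: str) -> bool:
        ...


@dataclass
class ToyScores(SketchScores):
    """Invented subclass: scores are stored as plain attributes."""

    accuracy: float = 0.0
    error: float = 0.0

    @property
    def criteria(self) -> List[str]:
        return ["accuracy", "error"]

    @staticmethod
    def higher_is_better(criterion: str) -> bool:
        # Unknown criteria raise KeyError in this sketch.
        return {"accuracy": True, "error": False}[criterion]

    @staticmethod
    def bounds(criterion: str) -> Tuple[Bound, Bound]:
        return {"accuracy": (0, 1), "error": (0, None)}[criterion]

    @staticmethod
    def store_best(criterion: str) -> bool:
        return criterion == "accuracy"
```

Because every abstract member is overridden, `ToyScores(accuracy=0.9, error=0.1)` can be instantiated, whereas instantiating `SketchScores` directly raises `TypeError`.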
- class dacapo.experiments.tasks.evaluators.Evaluator
Base class of all evaluators: An abstract class representing an evaluator that compares and evaluates the output array against the evaluation array.
An evaluator takes a post-processor’s output and compares it against ground-truth. It then returns a set of scores that can be used to determine the quality of the post-processor’s output.
- best_scores
Dict[OutputIdentifier, BestScore] the best scores for each dataset/post-processing parameter/criterion combination
- evaluate(output_array_identifier, evaluation_array)
Compare and evaluate the output array against the evaluation array.
- is_best(dataset, parameter, criterion, score)
Check if the provided score is the best for this dataset/parameter/criterion combo.
- get_overall_best(dataset, criterion)
Return the best score for the given dataset and criterion.
- get_overall_best_parameters(dataset, criterion)
Return the best parameters for the given dataset and criterion.
- compare(score_1, score_2, criterion)
Compare two scores for the given criterion.
- set_best(validation_scores)
Find the best iteration for each dataset/post_processing_parameter/criterion.
- higher_is_better(criterion)
Return whether higher is better for the given criterion.
- bounds(criterion)
Return the bounds for the given criterion.
- store_best(criterion)
Return whether to store the best score for the given criterion.
Note
The Evaluator class is used to compare and evaluate the output array against the evaluation array.
- abstract evaluate(output_array_identifier: dacapo.store.local_array_store.LocalArrayIdentifier, evaluation_array: dacapo.experiments.datasplits.datasets.arrays.Array) dacapo.experiments.tasks.evaluators.evaluation_scores.EvaluationScores
Compares and evaluates the output array against the evaluation array.
- Parameters:
output_array_identifier – LocalArrayIdentifier The identifier of the output array.
evaluation_array – Array The evaluation array.
- Returns:
- EvaluationScores
The evaluation scores.
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluator = Evaluator()
>>> output_array_identifier = LocalArrayIdentifier("output_array")
>>> evaluation_array = Array()
>>> evaluator.evaluate(output_array_identifier, evaluation_array)
EvaluationScores()
Note
This function is used to compare and evaluate the output array against the evaluation array.
- property best_scores: Dict[OutputIdentifier, BestScore]
The best scores for each dataset/post-processing parameter/criterion combination.
- Returns:
- Dict[OutputIdentifier, BestScore]
the best scores for each dataset/post-processing parameter/criterion combination
- Raises:
AttributeError – if the best scores are not set
Examples
>>> evaluator = Evaluator()
>>> evaluator.best_scores
{}
Note
This function is used to return the best scores for each dataset/post-processing parameter/criterion combination.
- is_best(dataset: dacapo.experiments.datasplits.datasets.Dataset, parameter: dacapo.experiments.tasks.post_processors.PostProcessorParameters, criterion: str, score: dacapo.experiments.tasks.evaluators.evaluation_scores.EvaluationScores) bool
Check if the provided score is the best for this dataset/parameter/criterion combo.
- Parameters:
dataset – Dataset the dataset
parameter – PostProcessorParameters the post-processor parameters
criterion – str the criterion
score – EvaluationScores the evaluation scores
- Returns:
- bool
whether the provided score is the best for this dataset/parameter/criterion combo
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluator = Evaluator()
>>> dataset = Dataset()
>>> parameter = PostProcessorParameters()
>>> criterion = "criterion"
>>> score = EvaluationScores()
>>> evaluator.is_best(dataset, parameter, criterion, score)
False
Note
This function is used to check if the provided score is the best for this dataset/parameter/criterion combo.
- get_overall_best(dataset: dacapo.experiments.datasplits.datasets.Dataset, criterion: str)
Return the best score for the given dataset and criterion.
- Parameters:
dataset – Dataset the dataset
criterion – str the criterion
- Returns:
- Optional[float]
the best score for the given dataset and criterion
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluator = Evaluator()
>>> dataset = Dataset()
>>> criterion = "criterion"
>>> evaluator.get_overall_best(dataset, criterion)
None
Note
This function is used to return the best score for the given dataset and criterion.
- get_overall_best_parameters(dataset: dacapo.experiments.datasplits.datasets.Dataset, criterion: str)
Return the best parameters for the given dataset and criterion.
- Parameters:
dataset – Dataset the dataset
criterion – str the criterion
- Returns:
- Optional[PostProcessorParameters]
the best parameters for the given dataset and criterion
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluator = Evaluator()
>>> dataset = Dataset()
>>> criterion = "criterion"
>>> evaluator.get_overall_best_parameters(dataset, criterion)
None
Note
This function is used to return the best parameters for the given dataset and criterion.
- compare(score_1, score_2, criterion)
Compare two scores for the given criterion.
- Parameters:
score_1 – float the first score
score_2 – float the second score
criterion – str the criterion
- Returns:
- bool
whether the first score is better than the second score for the given criterion
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluator = Evaluator()
>>> score_1 = 0.0
>>> score_2 = 0.0
>>> criterion = "criterion"
>>> evaluator.compare(score_1, score_2, criterion)
False
Note
This function is used to compare two scores for the given criterion.
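The comparison can be pictured as a direction-aware inequality. The following is a minimal sketch, not the library's code: `is_better` is a hypothetical function that takes the direction as a boolean instead of looking it up from a criterion name, and the strict inequality is an assumption (it is at least consistent with the documented example, where two equal scores compare as `False`).

```python
def is_better(score_1: float, score_2: float, higher_is_better: bool) -> bool:
    """Hypothetical helper: True if score_1 is strictly better than score_2."""
    if higher_is_better:
        return score_1 > score_2
    return score_1 < score_2
```

With a higher-is-better criterion, `is_better(0.8, 0.7, True)` holds; with a lower-is-better criterion the same pair gives `is_better(0.8, 0.7, False) == False`.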
- set_best(validation_scores: dacapo.experiments.validation_scores.ValidationScores) None
Find the best iteration for each dataset/post_processing_parameter/criterion.
- Parameters:
validation_scores – ValidationScores the validation scores
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluator = Evaluator()
>>> validation_scores = ValidationScores()
>>> evaluator.set_best(validation_scores)
None
Note
This function is used to find the best iteration for each dataset/post_processing_parameter/criterion. Typically, this function is called after the validation scores have been computed.
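To make the idea of "best iteration per criterion" concrete, here is a small self-contained sketch. It does not use `ValidationScores`; instead it assumes a hypothetical history of `(iteration, {criterion: value})` pairs and a hypothetical `higher_is_better` mapping, and it ignores datasets and post-processing parameters for brevity.

```python
from typing import Dict, List, Tuple


def best_iterations(
    history: List[Tuple[int, Dict[str, float]]],
    higher_is_better: Dict[str, bool],
) -> Dict[str, Tuple[int, float]]:
    """Hypothetical helper: map each criterion to (best iteration, best value)."""
    best: Dict[str, Tuple[int, float]] = {}
    for iteration, scores in history:
        for criterion, value in scores.items():
            current = best.get(criterion)
            if current is None:
                # First value seen for this criterion.
                best[criterion] = (iteration, value)
                continue
            if higher_is_better[criterion]:
                improved = value > current[1]
            else:
                improved = value < current[1]
            if improved:
                best[criterion] = (iteration, value)
    return best


# Invented numbers, purely for illustration.
history = [
    (100, {"jaccard": 0.5, "voi": 1.2}),
    (200, {"jaccard": 0.7, "voi": 1.5}),
    (300, {"jaccard": 0.6, "voi": 0.9}),
]
result = best_iterations(history, {"jaccard": True, "voi": False})
```

In this invented history the best iteration differs by criterion: `jaccard` peaks at iteration 200, while `voi` is lowest at iteration 300.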
- property criteria: List[str]
- Abstractmethod:
A list of all criteria for which a model might be “best”. For example, your criteria might be “precision”, “recall”, and “jaccard”. It is unlikely that the best iteration/post-processing parameters will be the same for all three of these criteria.
- Returns:
- List[str]
the evaluation criteria
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluator = Evaluator()
>>> evaluator.criteria
[]
Note
This function is used to return the evaluation criteria.
- higher_is_better(criterion: str) bool
Whether or not higher is better for this criterion.
- Parameters:
criterion – str the criterion
- Returns:
- bool
whether higher is better for the given criterion
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluator = Evaluator()
>>> criterion = "criterion"
>>> evaluator.higher_is_better(criterion)
False
Note
This function is used to determine whether higher is better for the given criterion.
- bounds(criterion: str) Tuple[int | float | None, int | float | None]
The bounds for this criterion
- Parameters:
criterion – str the criterion
- Returns:
- Tuple[Union[int, float, None], Union[int, float, None]]
the bounds for the given criterion
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluator = Evaluator()
>>> criterion = "criterion"
>>> evaluator.bounds(criterion)
(0, 1)
Note
This function is used to return the bounds for the given criterion.
- store_best(criterion: str) bool
Whether or not to save the best validation block and model weights for this criterion.
- Parameters:
criterion – str the criterion
- Returns:
- bool
whether to store the best score for the given criterion
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluator = Evaluator()
>>> criterion = "criterion"
>>> evaluator.store_best(criterion)
False
Note
This function is used to return whether to store the best score for the given criterion.
- property score: dacapo.experiments.tasks.evaluators.evaluation_scores.EvaluationScores
- Abstractmethod:
The evaluation scores.
- Returns:
- EvaluationScores
the evaluation scores
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluator = Evaluator()
>>> evaluator.score
EvaluationScores()
Note
This function is used to return the evaluation scores.
- class dacapo.experiments.tasks.evaluators.MultiChannelBinarySegmentationEvaluationScores
Class representing evaluation scores for multi-channel binary segmentation tasks.
- channel_scores
The list of channel scores.
- Type:
List[Tuple[str, BinarySegmentationEvaluationScores]]
- higher_is_better(criterion: str) -> bool
Determines whether a higher value is better for a given criterion.
- store_best(criterion: str) -> bool
Whether or not to store the best weights/validation blocks for this criterion.
- bounds(criterion: str) -> Tuple[Union[int, float, None], Union[int, float, None]]
Determines the bounds for a given criterion.
Notes
The evaluation scores are stored as attributes of the class. The class also contains methods to determine whether a higher value is better for a given criterion, whether or not to store the best weights/validation blocks for a given criterion, and the bounds for a given criterion.
- channel_scores: List[Tuple[str, BinarySegmentationEvaluationScores]]
- property criteria
Returns a list of all criteria for all channels.
- Returns:
The list of criteria.
- Return type:
List[str]
- Raises:
ValueError – If the criterion is not recognized.
Examples
>>> channel_scores = [("channel1", BinarySegmentationEvaluationScores()), ("channel2", BinarySegmentationEvaluationScores())]
>>> MultiChannelBinarySegmentationEvaluationScores(channel_scores).criteria
Notes
The method returns a list of all criteria for all channels. The criteria are stored as attributes of the class.
- static higher_is_better(criterion: str) bool
Determines whether a higher value is better for a given criterion.
- Parameters:
criterion (str) – The evaluation criterion.
- Returns:
True if a higher value is better, False otherwise.
- Return type:
bool
- Raises:
ValueError – If the criterion is not recognized.
Examples
>>> MultiChannelBinarySegmentationEvaluationScores.higher_is_better("channel1__dice")
True
>>> MultiChannelBinarySegmentationEvaluationScores.higher_is_better("channel1__f1_score")
True
Notes
The method returns True if a higher value is better for the criterion and False otherwise; an unrecognized criterion raises ValueError. Whether a higher value is better for a given criterion is determined by the mapping dictionary.
- static store_best(criterion: str) bool
Determines whether or not to store the best weights/validation blocks for a given criterion.
- Parameters:
criterion (str) – The evaluation criterion.
- Returns:
True if the best weights/validation blocks should be stored, False otherwise.
- Return type:
bool
- Raises:
ValueError – If the criterion is not recognized.
Examples
>>> MultiChannelBinarySegmentationEvaluationScores.store_best("channel1__dice")
False
>>> MultiChannelBinarySegmentationEvaluationScores.store_best("channel1__f1_score")
True
Notes
The method returns True if the best weights/validation blocks should be stored for the criterion and False otherwise; an unrecognized criterion raises ValueError. Whether or not to store the best weights/validation blocks for a given criterion is determined by the mapping dictionary.
- static bounds(criterion: str) Tuple[int | float | None, int | float | None]
Determines the bounds for a given criterion. The bounds are used to determine the best value for a given criterion.
- Parameters:
criterion (str) – The evaluation criterion.
- Returns:
The lower and upper bounds for the criterion.
- Return type:
Tuple[Union[int, float, None], Union[int, float, None]]
- Raises:
ValueError – If the criterion is not recognized.
Examples
>>> MultiChannelBinarySegmentationEvaluationScores.bounds("channel1__dice")
(0, 1)
>>> MultiChannelBinarySegmentationEvaluationScores.bounds("channel1__hausdorff")
(0, nan)
Notes
The method returns the lower and upper bounds for the criterion. The bounds are determined by the mapping dictionary.
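The examples above use criterion names of the form `channel1__dice`, which suggests a channel name and a per-channel criterion joined by a double underscore. A minimal sketch of handling such names follows. The helper names and the two-entry direction table are hypothetical and chosen for illustration; how the library actually parses these names is not shown on this page.

```python
from typing import Tuple

# Illustrative table only: Dice is an overlap score (higher is better),
# the Hausdorff distance is a distance (lower is better).
HIGHER_IS_BETTER = {"dice": True, "hausdorff": False}


def split_channel_criterion(criterion: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'channel__criterion' into its two parts."""
    channel, separator, name = criterion.partition("__")
    if not separator or not channel or not name:
        raise ValueError(f"Unrecognized criterion: {criterion!r}")
    return channel, name


def channel_higher_is_better(criterion: str) -> bool:
    """Hypothetical helper: delegate to the per-channel criterion table."""
    _, name = split_channel_criterion(criterion)
    if name not in HIGHER_IS_BETTER:
        raise ValueError(f"Unrecognized criterion: {criterion!r}")
    return HIGHER_IS_BETTER[name]
```

Note that `partition` splits at the first double underscore, so this sketch would misread a channel name that itself contains `__`.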
- class dacapo.experiments.tasks.evaluators.BinarySegmentationEvaluationScores
Class representing evaluation scores for binary segmentation tasks.
The metrics include:
- Dice coefficient: 2 * |A ∩ B| / (|A| + |B|); where A and B are the binary segmentations
- Jaccard coefficient: |A ∩ B| / |A ∪ B|; where A and B are the binary segmentations
- Hausdorff distance: max(h(A, B), h(B, A)); where h(A, B) is the directed Hausdorff distance from A to B
- False negative rate: |A - B| / |A|; where A and B are the binary segmentations
- False positive rate: |B - A| / |B|; where A and B are the binary segmentations
- False discovery rate: |B - A| / |A|; where A and B are the binary segmentations
- VOI: Variation of Information; split and merge errors combined into a single measure of segmentation quality
- Mean false distance: 0.5 * (mean false positive distance + mean false negative distance)
- Mean false negative distance: mean distance of false negatives
- Mean false positive distance: mean distance of false positives
- Mean false distance clipped: 0.5 * (mean false positive distance clipped + mean false negative distance clipped); distances are clipped to a maximum distance
- Mean false negative distance clipped: mean distance of false negatives; distances are clipped to a maximum distance
- Mean false positive distance clipped: mean distance of false positives; distances are clipped to a maximum distance
- Precision with tolerance: TP / (TP + FP); where TP and FP are the true and false positives within a tolerance distance
- Recall with tolerance: TP / (TP + FN); where TP and FN are the true positives and false negatives within a tolerance distance
- F1 score with tolerance: 2 * (Recall * Precision) / (Recall + Precision); where Recall and Precision are computed within a tolerance distance
- Precision: TP / (TP + FP); where TP and FP are the true and false positives
- Recall: TP / (TP + FN); where TP and FN are the true positives and false negatives
- F1 score: 2 * (Recall * Precision) / (Recall + Precision); where Recall and Precision are as defined above
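As an illustration of the overlap and count-based formulas above, the following minimal sketch treats each segmentation as a set of foreground voxel coordinates and takes Dice as 2 * |A ∩ B| / (|A| + |B|). The function names, the choice of `a` as the reference and `b` as the prediction, and the value 0.0 returned for empty denominators are all assumptions made for this example; it is not the library's implementation.

```python
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int]


def overlap_metrics(a: Set[Voxel], b: Set[Voxel]) -> Dict[str, float]:
    """Hypothetical helper: Dice and Jaccard for two foreground voxel sets."""
    intersection = len(a & b)
    union = len(a | b)
    total = len(a) + len(b)
    return {
        "dice": 2 * intersection / total if total else 0.0,
        "jaccard": intersection / union if union else 0.0,
    }


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Hypothetical helper: precision, recall and F1 from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Invented toy data: a is the assumed reference, b the assumed prediction.
a = {(0, 0), (0, 1), (1, 0), (1, 1)}
b = {(0, 1), (1, 1), (2, 1)}
metrics = overlap_metrics(a, b)
precision, recall, f1 = precision_recall_f1(
    tp=len(a & b), fp=len(b - a), fn=len(a - b)
)
```

For counts derived from the same two sets, the F1 score coincides with the Dice coefficient, which gives a quick sanity check on both functions.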
- dice
The Dice coefficient.
- Type:
float
- jaccard
The Jaccard index.
- Type:
float
- hausdorff
The Hausdorff distance.
- Type:
float
- false_negative_rate
The false negative rate.
- Type:
float
- false_negative_rate_with_tolerance
The false negative rate with tolerance.
- Type:
float
- false_positive_rate
The false positive rate.
- Type:
float
- false_discovery_rate
The false discovery rate.
- Type:
float
- false_positive_rate_with_tolerance
The false positive rate with tolerance.
- Type:
float
- voi
The variation of information.
- Type:
float
- mean_false_distance
The mean false distance.
- Type:
float
- mean_false_negative_distance
The mean false negative distance.
- Type:
float
- mean_false_positive_distance
The mean false positive distance.
- Type:
float
- mean_false_distance_clipped
The mean false distance clipped.
- Type:
float
- mean_false_negative_distance_clipped
The mean false negative distance clipped.
- Type:
float
- mean_false_positive_distance_clipped
The mean false positive distance clipped.
- Type:
float
- precision_with_tolerance
The precision with tolerance.
- Type:
float
- recall_with_tolerance
The recall with tolerance.
- Type:
float
- f1_score_with_tolerance
The F1 score with tolerance.
- Type:
float
- precision
The precision.
- Type:
float
- recall
The recall.
- Type:
float
- f1_score
The F1 score.
- Type:
float
- store_best(criterion: str) -> bool
Whether or not to store the best weights/validation blocks for this criterion.
- higher_is_better(criterion: str) -> bool
Determines whether a higher value is better for a given criterion.
- bounds(criterion: str) -> Tuple[Union[int, float, None], Union[int, float, None]]
Determines the bounds for a given criterion.
Notes
The evaluation scores are stored as attributes of the class. The class also contains methods to determine whether a higher value is better for a given criterion, whether or not to store the best weights/validation blocks for a given criterion, and the bounds for a given criterion.
- dice: float
- jaccard: float
- hausdorff: float
- false_negative_rate: float
- false_negative_rate_with_tolerance: float
- false_positive_rate: float
- false_discovery_rate: float
- false_positive_rate_with_tolerance: float
- voi: float
- mean_false_distance: float
- mean_false_negative_distance: float
- mean_false_positive_distance: float
- mean_false_distance_clipped: float
- mean_false_negative_distance_clipped: float
- mean_false_positive_distance_clipped: float
- precision_with_tolerance: float
- recall_with_tolerance: float
- f1_score_with_tolerance: float
- precision: float
- recall: float
- f1_score: float
- criteria = ['dice', 'jaccard', 'hausdorff', 'false_negative_rate', 'false_negative_rate_with_tolerance',...
The evaluation criteria.
- Returns:
- List[str]
the evaluation criteria
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluation_scores = EvaluationScores()
>>> evaluation_scores.criteria
["criterion1", "criterion2"]
Note
This function is used to return the evaluation criteria.
- static store_best(criterion: str) bool
Determines whether or not to store the best weights/validation blocks for a given criterion.
- Parameters:
criterion (str) – The evaluation criterion.
- Returns:
True if the best weights/validation blocks should be stored, False otherwise.
- Return type:
bool
- Raises:
ValueError – If the criterion is not recognized.
Examples
>>> BinarySegmentationEvaluationScores.store_best("dice")
False
>>> BinarySegmentationEvaluationScores.store_best("f1_score")
True
Notes
The method returns True if the best weights/validation blocks should be stored for the criterion and False otherwise; an unrecognized criterion raises ValueError. Whether or not to store the best weights/validation blocks for a given criterion is determined by the mapping dictionary.
- static higher_is_better(criterion: str) bool
Determines whether a higher value is better for a given criterion.
- Parameters:
criterion (str) – The evaluation criterion.
- Returns:
True if a higher value is better, False otherwise.
- Return type:
bool
- Raises:
ValueError – If the criterion is not recognized.
Examples
>>> BinarySegmentationEvaluationScores.higher_is_better("dice")
True
>>> BinarySegmentationEvaluationScores.higher_is_better("f1_score")
True
Notes
The method returns True if a higher value is better for the criterion and False otherwise; an unrecognized criterion raises ValueError. Whether a higher value is better for a given criterion is determined by the mapping dictionary.
- static bounds(criterion: str) Tuple[int | float | None, int | float | None]
Determines the bounds for a given criterion. The bounds are used to determine the best value for a given criterion.
- Parameters:
criterion (str) – The evaluation criterion.
- Returns:
The lower and upper bounds for the criterion.
- Return type:
Tuple[Union[int, float, None], Union[int, float, None]]
- Raises:
ValueError – If the criterion is not recognized.
Examples
>>> BinarySegmentationEvaluationScores.bounds("dice")
(0, 1)
>>> BinarySegmentationEvaluationScores.bounds("hausdorff")
(0, nan)
Notes
The method returns the lower and upper bounds for the criterion. The bounds are determined by the mapping dictionary.
- class dacapo.experiments.tasks.evaluators.BinarySegmentationEvaluator(clip_distance: float, tol_distance: float, channels: List[str])
Given a binary segmentation, compute various metrics to determine their similarity. The metrics include:
- Dice coefficient: 2 * |A ∩ B| / (|A| + |B|); where A and B are the binary segmentations
- Jaccard coefficient: |A ∩ B| / |A ∪ B|; where A and B are the binary segmentations
- Hausdorff distance: max(h(A, B), h(B, A)); where h(A, B) is the directed Hausdorff distance from A to B
- False negative rate: |A - B| / |A|; where A and B are the binary segmentations
- False positive rate: |B - A| / |B|; where A and B are the binary segmentations
- False discovery rate: |B - A| / |A|; where A and B are the binary segmentations
- VOI: Variation of Information; split and merge errors combined into a single measure of segmentation quality
- Mean false distance: 0.5 * (mean false positive distance + mean false negative distance)
- Mean false negative distance: mean distance of false negatives
- Mean false positive distance: mean distance of false positives
- Mean false distance clipped: 0.5 * (mean false positive distance clipped + mean false negative distance clipped); distances are clipped to a maximum distance
- Mean false negative distance clipped: mean distance of false negatives; distances are clipped to a maximum distance
- Mean false positive distance clipped: mean distance of false positives; distances are clipped to a maximum distance
- Precision with tolerance: TP / (TP + FP); where TP and FP are the true and false positives within a tolerance distance
- Recall with tolerance: TP / (TP + FN); where TP and FN are the true positives and false negatives within a tolerance distance
- F1 score with tolerance: 2 * (Recall * Precision) / (Recall + Precision); where Recall and Precision are computed within a tolerance distance
- Precision: TP / (TP + FP); where TP and FP are the true and false positives
- Recall: TP / (TP + FN); where TP and FN are the true positives and false negatives
- F1 score: 2 * (Recall * Precision) / (Recall + Precision); where Recall and Precision are as defined above
- clip_distance
float the clip distance
- tol_distance
float the tolerance distance
- channels
List[str] the channels
- criteria
List[str] the evaluation criteria
- evaluate(output_array_identifier, evaluation_array)
Evaluate the output array against the evaluation array.
- score()
Return the evaluation scores.
Note
The BinarySegmentationEvaluator class is used to evaluate the performance of a binary segmentation task. The class provides methods to evaluate the output array against the evaluation array and return the evaluation scores.
Clip distance is the maximum distance to which false-positive and false-negative distances are clipped in the clipped mean-distance metrics. Tolerance distance is the maximum distance between the ground truth and the predicted segmentation within which a predicted pixel still counts as a true positive. Channels are the channels of the binary segmentation. Criteria are the evaluation criteria.
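The roles of the two distances can be illustrated with a toy example on point sets. This is a sketch under stated assumptions, not the evaluator's actual computation: the helper names are hypothetical, segmentations are reduced to short lists of 2D points, distances are plain Euclidean distances to the nearest reference point, and all inputs are assumed non-empty.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nearest_distance(point: Point, reference: Sequence[Point]) -> float:
    """Euclidean distance from point to its nearest reference point."""
    return min(math.dist(point, other) for other in reference)


def precision_with_tolerance(
    predicted: Sequence[Point], truth: Sequence[Point], tol_distance: float
) -> float:
    """Hypothetical helper: fraction of predicted points within tol_distance of truth."""
    hits = sum(1 for p in predicted if nearest_distance(p, truth) <= tol_distance)
    return hits / len(predicted)


def mean_clipped_distance(
    points: Sequence[Point], reference: Sequence[Point], clip_distance: float
) -> float:
    """Hypothetical helper: mean nearest distance, each capped at clip_distance."""
    clipped = [min(nearest_distance(p, reference), clip_distance) for p in points]
    return sum(clipped) / len(clipped)


# Invented toy data.
truth: List[Point] = [(0, 0), (10, 0)]
predicted: List[Point] = [(0, 3), (10, 50), (0, 0)]
false_positives = [p for p in predicted if p not in truth]

tolerant_precision = precision_with_tolerance(predicted, truth, tol_distance=5)
clipped_mean = mean_clipped_distance(false_positives, truth, clip_distance=20)
```

In this toy data the point `(0, 3)` is counted as a hit because it lies within the tolerance of 5, and the outlier at distance 50 contributes only 20 to the clipped mean, so a single far-away false positive cannot dominate the score.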
- criteria = ['jaccard', 'voi']
A list of all criteria for which a model might be “best”. For example, your criteria might be “precision”, “recall”, and “jaccard”. It is unlikely that the best iteration/post-processing parameters will be the same for all three of these criteria.
- Returns:
- List[str]
the evaluation criteria
- Raises:
NotImplementedError – if the function is not implemented
Examples
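>>> evaluator = BinarySegmentationEvaluator(clip_distance=200, tol_distance=40, channels=["channel1", "channel2"])
>>> evaluator.criteria
['jaccard', 'voi']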
Note
This function is used to return the evaluation criteria.
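To see why each criterion needs its own "best" record, consider the following sketch. The per-iteration scores are made up for illustration, and the `higher_is_better` mapping is an assumption (higher Jaccard is better, lower VOI is better); none of this is taken from the library.

```python
# Hypothetical validation scores, keyed by training iteration.
scores_by_iteration = {
    1000: {"jaccard": 0.61, "voi": 0.90},
    2000: {"jaccard": 0.72, "voi": 0.55},
    3000: {"jaccard": 0.70, "voi": 0.48},
}

# Assumed direction of improvement for each criterion.
higher_is_better = {"jaccard": True, "voi": False}


def best_iteration(criterion):
    """Return the iteration with the best score for one criterion."""
    pick = max if higher_is_better[criterion] else min
    return pick(
        scores_by_iteration,
        key=lambda iteration: scores_by_iteration[iteration][criterion],
    )


best = {criterion: best_iteration(criterion) for criterion in ["jaccard", "voi"]}
```

With these numbers the best iteration is 2000 for "jaccard" and 3000 for "voi", so a single "best checkpoint" would not serve both criteria.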
- clip_distance
- tol_distance
- channels
- evaluate(output_array_identifier, evaluation_array)
Evaluate the output array against the evaluation array.
- Parameters:
output_array_identifier – str the identifier of the output array
evaluation_array – ZarrArray the evaluation array
- Returns:
- BinarySegmentationEvaluationScores or MultiChannelBinarySegmentationEvaluationScores
the evaluation scores
- Raises:
ValueError – if the output array identifier is not valid
Examples
>>> binary_segmentation_evaluator = BinarySegmentationEvaluator(clip_distance=200, tol_distance=40, channels=["channel1", "channel2"])
>>> output_array_identifier = "output_array"
>>> evaluation_array = ZarrArray.open_from_array_identifier("evaluation_array")
>>> binary_segmentation_evaluator.evaluate(output_array_identifier, evaluation_array)
BinarySegmentationEvaluationScores(dice=0.0, jaccard=0.0, hausdorff=0.0, false_negative_rate=0.0, false_positive_rate=0.0, false_discovery_rate=0.0, voi=0.0, mean_false_distance=0.0, mean_false_negative_distance=0.0, mean_false_positive_distance=0.0, mean_false_distance_clipped=0.0, mean_false_negative_distance_clipped=0.0, mean_false_positive_distance_clipped=0.0, precision_with_tolerance=0.0, recall_with_tolerance=0.0, f1_score_with_tolerance=0.0, precision=0.0, recall=0.0, f1_score=0.0)
Note
This function is used to evaluate the output array against the evaluation array.
- property score
Return the evaluation scores.
- Returns:
- BinarySegmentationEvaluationScores or MultiChannelBinarySegmentationEvaluationScores
the evaluation scores
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> binary_segmentation_evaluator = BinarySegmentationEvaluator(clip_distance=200, tol_distance=40, channels=["channel1", "channel2"])
>>> binary_segmentation_evaluator.score
BinarySegmentationEvaluationScores(dice=0.0, jaccard=0.0, hausdorff=0.0, false_negative_rate=0.0, false_positive_rate=0.0, false_discovery_rate=0.0, voi=0.0, mean_false_distance=0.0, mean_false_negative_distance=0.0, mean_false_positive_distance=0.0, mean_false_distance_clipped=0.0, mean_false_negative_distance_clipped=0.0, mean_false_positive_distance_clipped=0.0, precision_with_tolerance=0.0, recall_with_tolerance=0.0, f1_score_with_tolerance=0.0, precision=0.0, recall=0.0, f1_score=0.0)
Note
This function is used to return the evaluation scores.
- class dacapo.experiments.tasks.evaluators.InstanceEvaluationScores
The evaluation scores for the instance segmentation task. The scores include the variation of information (VOI) split, VOI merge, and VOI.
- voi_split
float the variation of information (VOI) split
- voi_merge
float the variation of information (VOI) merge
- voi
float the variation of information (VOI)
- higher_is_better(criterion)
Return whether higher is better for the given criterion.
- bounds(criterion)
Return the bounds for the given criterion.
- store_best(criterion)
Return whether to store the best score for the given criterion.
Note
The InstanceEvaluationScores class is used to store the evaluation scores for the instance segmentation task.
- criteria = ['voi_split', 'voi_merge', 'voi']
The evaluation criteria.
- Returns:
- List[str]
the evaluation criteria
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluation_scores = InstanceEvaluationScores(voi_split=0.1, voi_merge=0.2)
>>> evaluation_scores.criteria
['voi_split', 'voi_merge', 'voi']
Note
This function is used to return the evaluation criteria.
- voi_split: float
- voi_merge: float
- property voi
Return the average of the VOI split and VOI merge.
- Returns:
- float
the average of the VOI split and VOI merge
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> instance_evaluation_scores = InstanceEvaluationScores(voi_split=0.25, voi_merge=0.75)
>>> instance_evaluation_scores.voi
0.5
Note
This function is used to calculate the average of the VOI split and VOI merge.
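For intuition about where the two VOI terms come from, the sketch below computes them as conditional entropies of two flat label lists and then averages them as described above. It makes several assumptions that the documentation does not confirm: the function name is hypothetical, the logarithm is base 2, the split term is H(prediction | ground truth), and the merge term is H(ground truth | prediction).

```python
import math
from collections import Counter


def voi_split_merge(truth_labels, pred_labels):
    """Return (voi_split, voi_merge) for two equally long label sequences."""
    n = len(truth_labels)
    joint = Counter(zip(truth_labels, pred_labels))
    truth_counts = Counter(truth_labels)
    pred_counts = Counter(pred_labels)

    # H(pred | truth): grows when a ground-truth segment is split
    # across several predicted segments.
    voi_split = sum(
        count / n * math.log2(truth_counts[t] / count)
        for (t, _), count in joint.items()
    )
    # H(truth | pred): grows when a predicted segment merges
    # several ground-truth segments.
    voi_merge = sum(
        count / n * math.log2(pred_counts[p] / count)
        for (_, p), count in joint.items()
    )
    return voi_split, voi_merge


# Ground-truth segment 1 is split into predicted segments 1 and 3.
truth_labels = [1, 1, 1, 1, 2, 2, 2, 2]
pred_labels = [1, 1, 3, 3, 2, 2, 2, 2]
voi_split, voi_merge = voi_split_merge(truth_labels, pred_labels)
voi = (voi_split + voi_merge) / 2  # average of the two terms
```

Here only the split term is non-zero, because no predicted segment covers more than one ground-truth segment.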
- static higher_is_better(criterion: str) bool
Return whether higher is better for the given criterion.
- Parameters:
criterion – str the evaluation criterion
- Returns:
- bool
whether higher is better for the given criterion
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> InstanceEvaluationScores.higher_is_better("voi_split")
False
Note
This function is used to determine whether higher is better for the given criterion.
- static bounds(criterion: str) Tuple[int | float | None, int | float | None]
Return the bounds for the given criterion.
- Parameters:
criterion – str the evaluation criterion
- Returns:
- Tuple[Union[int, float, None], Union[int, float, None]]
the bounds for the given criterion
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> InstanceEvaluationScores.bounds("voi_split")
(0, 1)
Note
This function is used to return the bounds for the given criterion.
- static store_best(criterion: str) bool
Return whether to store the best score for the given criterion.
- Parameters:
criterion – str the evaluation criterion
- Returns:
- bool
whether to store the best score for the given criterion
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> InstanceEvaluationScores.store_best("voi_split")
True
Note
This function is used to determine whether to store the best score for the given criterion.
- class dacapo.experiments.tasks.evaluators.InstanceEvaluator
A class representing an evaluator for instance segmentation tasks.
- criteria
List[str] the evaluation criteria
- evaluate(output_array_identifier, evaluation_array)
Evaluate the output array against the evaluation array.
- score
Return the evaluation scores.
Note
The InstanceEvaluator class is used to evaluate the performance of an instance segmentation task.
- criteria: List[str] = ['voi_merge', 'voi_split', 'voi']
A list of all criteria for which a model might be “best”, e.g. your criteria might be “precision”, “recall”, and “jaccard”. It is unlikely that the best iteration and post-processing parameters will be the same for all three of these criteria.
- Returns:
- List[str]
the evaluation criteria
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> evaluator = InstanceEvaluator()
>>> evaluator.criteria
['voi_merge', 'voi_split', 'voi']
Note
This function is used to return the evaluation criteria.
- evaluate(output_array_identifier, evaluation_array)
Evaluate the output array against the evaluation array.
- Parameters:
output_array_identifier – str the identifier of the output array
evaluation_array – ZarrArray the evaluation array
- Returns:
- InstanceEvaluationScores
the evaluation scores
- Raises:
ValueError – if the output array identifier is not valid
Examples
>>> instance_evaluator = InstanceEvaluator()
>>> output_array_identifier = "output_array"
>>> evaluation_array = ZarrArray.open_from_array_identifier("evaluation_array")
>>> instance_evaluator.evaluate(output_array_identifier, evaluation_array)
InstanceEvaluationScores(voi_merge=0.0, voi_split=0.0)
Note
This function is used to evaluate the output array against the evaluation array.
- property score: dacapo.experiments.tasks.evaluators.instance_evaluation_scores.InstanceEvaluationScores
Return the evaluation scores.
- Returns:
- InstanceEvaluationScores
the evaluation scores
- Raises:
NotImplementedError – if the function is not implemented
Examples
>>> instance_evaluator = InstanceEvaluator()
>>> instance_evaluator.score
InstanceEvaluationScores(voi_merge=0.0, voi_split=0.0)
Note
This function is used to return the evaluation scores.