dacapo.experiments.tasks.evaluators
===================================

.. py:module:: dacapo.experiments.tasks.evaluators


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/dacapo/experiments/tasks/evaluators/binary_segmentation_evaluation_scores/index
   /autoapi/dacapo/experiments/tasks/evaluators/binary_segmentation_evaluator/index
   /autoapi/dacapo/experiments/tasks/evaluators/dummy_evaluation_scores/index
   /autoapi/dacapo/experiments/tasks/evaluators/dummy_evaluator/index
   /autoapi/dacapo/experiments/tasks/evaluators/evaluation_scores/index
   /autoapi/dacapo/experiments/tasks/evaluators/evaluator/index
   /autoapi/dacapo/experiments/tasks/evaluators/instance_evaluation_scores/index
   /autoapi/dacapo/experiments/tasks/evaluators/instance_evaluator/index


Classes
-------

.. autoapisummary::

   dacapo.experiments.tasks.evaluators.DummyEvaluationScores
   dacapo.experiments.tasks.evaluators.DummyEvaluator
   dacapo.experiments.tasks.evaluators.EvaluationScores
   dacapo.experiments.tasks.evaluators.Evaluator
   dacapo.experiments.tasks.evaluators.MultiChannelBinarySegmentationEvaluationScores
   dacapo.experiments.tasks.evaluators.BinarySegmentationEvaluationScores
   dacapo.experiments.tasks.evaluators.BinarySegmentationEvaluator
   dacapo.experiments.tasks.evaluators.InstanceEvaluationScores
   dacapo.experiments.tasks.evaluators.InstanceEvaluator


Package Contents
----------------

.. py:class:: DummyEvaluationScores

   The evaluation scores for the dummy task. The scores include the frizz level and the blipp score. A higher frizz level indicates more frizz, while a higher blipp score indicates better performance.

   .. attribute:: frizz_level
      :type: float

      The frizz level.

   .. attribute:: blipp_score
      :type: float

      The blipp score.

   .. method:: higher_is_better(criterion)

      Return whether higher is better for the given criterion.

   .. method:: bounds(criterion)

      Return the bounds for the given criterion.

   .. method:: store_best(criterion)

      Return whether to store the best score for the given criterion.

   .. note::

      The DummyEvaluationScores class is used to store the evaluation scores for the dummy task. The class also provides methods to determine whether higher is better for a given criterion, the bounds for a given criterion, and whether to store the best score for a given criterion.


   .. py:attribute:: criteria
      :type: List[str]
      :value: ['frizz_level', 'blipp_score']

      The evaluation criteria.

      .. rubric:: Examples

      >>> DummyEvaluationScores.criteria
      ['frizz_level', 'blipp_score']

      .. note::

         This attribute lists the evaluation criteria.


   .. py:attribute:: frizz_level
      :type: float


   .. py:attribute:: blipp_score
      :type: float


   .. py:method:: higher_is_better(criterion: str) -> bool
      :staticmethod:

      Return whether higher is better for the given criterion.

      :param criterion: The evaluation criterion.
      :type criterion: str
      :returns: Whether higher is better for this criterion.
      :rtype: bool
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> DummyEvaluationScores.higher_is_better("frizz_level")
      True

      .. note::

         This function is used to determine whether higher is better for the given criterion.


   .. py:method:: bounds(criterion: str) -> Tuple[Union[int, float, None], Union[int, float, None]]
      :staticmethod:

      Return the bounds for the given criterion.

      :param criterion: The evaluation criterion.
      :type criterion: str
      :returns: The bounds for the given criterion.
      :rtype: Tuple[Union[int, float, None], Union[int, float, None]]
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> DummyEvaluationScores.bounds("frizz_level")
      (0.0, 1.0)

      .. note::

         This function is used to return the bounds for the given criterion.


   .. py:method:: store_best(criterion: str) -> bool
      :staticmethod:

      Return whether to store the best score for the given criterion.

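
      The three static methods are meant to be read together when deciding whether a new validation score should replace the stored best one. Below is a minimal sketch of that decision. It assumes a hypothetical helper ``should_replace_best`` and hard-coded lookup tables. The ``frizz_level`` entries mirror the examples on this page; the ``blipp_score`` entries are assumptions drawn from the class description, not DaCapo's actual values.

      .. code-block:: python

         from typing import Optional

         # Hypothetical lookup tables. The "frizz_level" entries mirror the
         # examples above; the "blipp_score" entries are assumptions.
         HIGHER_IS_BETTER = {"frizz_level": True, "blipp_score": True}
         STORE_BEST = {"frizz_level": True, "blipp_score": True}


         def should_replace_best(criterion: str, new: float, best: Optional[float]) -> bool:
             """Return True if ``new`` should replace the stored best score."""
             if not STORE_BEST[criterion]:
                 # In this sketch, criteria that are not stored never trigger a save.
                 return False
             if best is None:
                 # Nothing stored yet, so the first score becomes the best.
                 return True
             if HIGHER_IS_BETTER[criterion]:
                 return new > best
             return new < best


         print(should_replace_best("frizz_level", 0.7, 0.5))  # True
         print(should_replace_best("frizz_level", 0.3, 0.5))  # False

      Keeping the answers in per-criterion tables makes the comparison direction explicit. Adding a lower-is-better criterion only requires a ``False`` entry in ``HIGHER_IS_BETTER``.
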

      :param criterion: The evaluation criterion.
      :type criterion: str
      :returns: Whether to store the best score for the given criterion.
      :rtype: bool
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> DummyEvaluationScores.store_best("frizz_level")
      True

      .. note::

         This function is used to determine whether to store the best score for the given criterion.


.. py:class:: DummyEvaluator

   A class representing a dummy evaluator. This evaluator is used for testing purposes.

   .. attribute:: criteria
      :type: List[str]

      The evaluation criteria.

   .. method:: evaluate(output_array_identifier, evaluation_dataset)

      Evaluate the output array against the evaluation dataset.

   .. method:: score

      Return the evaluation scores.

   .. note::

      The DummyEvaluator class is used to evaluate the performance of a dummy task.


   .. py:attribute:: criteria
      :type: List[str]
      :value: ['frizz_level', 'blipp_score']

      A list of all criteria for which a model might be "best", e.g. your criteria might be "precision", "recall", and "jaccard". It is unlikely that the best iteration/post-processing parameters will be the same for all 3 of these criteria.

      .. rubric:: Examples

      >>> dummy_evaluator = DummyEvaluator()
      >>> dummy_evaluator.criteria
      ['frizz_level', 'blipp_score']

      .. note::

         This attribute lists the evaluation criteria.


   .. py:method:: evaluate(output_array_identifier, evaluation_dataset)

      Evaluate the given output array against the dataset and return the scores for the predefined criteria.

      :param output_array_identifier: The output array to be evaluated.
      :param evaluation_dataset: The dataset to be used for evaluation.
      :returns: A DummyEvaluationScores object with the evaluation scores.
      :rtype: DummyEvaluationScores
      :raises ValueError: if the output array identifier is not valid

      .. rubric:: Examples

      >>> dummy_evaluator = DummyEvaluator()
      >>> output_array_identifier = "output_array"
      >>> evaluation_dataset = "evaluation_dataset"
      >>> dummy_evaluator.evaluate(output_array_identifier, evaluation_dataset)
      DummyEvaluationScores(frizz_level=0.0, blipp_score=0.0)

      .. note::

         This function is used to evaluate the output array against the evaluation dataset.


   .. py:property:: score
      :type: dacapo.experiments.tasks.evaluators.dummy_evaluation_scores.DummyEvaluationScores

      Return the evaluation scores.

      :returns: A DummyEvaluationScores object with the evaluation scores.
      :rtype: DummyEvaluationScores

      .. rubric:: Examples

      >>> dummy_evaluator = DummyEvaluator()
      >>> dummy_evaluator.score
      DummyEvaluationScores(frizz_level=0.0, blipp_score=0.0)

      .. note::

         This function is used to return the evaluation scores.


.. py:class:: EvaluationScores

   Base class for evaluation scores. This class is used to store the evaluation scores for a task. The scores include the evaluation criteria. The class also provides methods to determine whether higher is better for a given criterion, the bounds for a given criterion, and whether to store the best score for a given criterion.

   .. attribute:: criteria
      :type: List[str]

      The evaluation criteria.

   .. method:: higher_is_better(criterion)

      Return whether higher is better for the given criterion.

   .. method:: bounds(criterion)

      Return the bounds for the given criterion.

   .. method:: store_best(criterion)

      Return whether to store the best score for the given criterion.

   .. note::

      The EvaluationScores class is used to store the evaluation scores for a task. All evaluation scores should inherit from this class.


   .. py:property:: criteria
      :type: List[str]
      :abstractmethod:

      The evaluation criteria.

      :returns: The evaluation criteria.
      :rtype: List[str]
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> evaluation_scores = EvaluationScores()
      >>> evaluation_scores.criteria
      ['criterion1', 'criterion2']

      .. note::

         This function is used to return the evaluation criteria.


   .. py:method:: higher_is_better(criterion: str) -> bool
      :staticmethod:
      :abstractmethod:

      Whether or not higher is better for this criterion.

      :param criterion: The evaluation criterion.
      :type criterion: str
      :returns: Whether higher is better for this criterion.
      :rtype: bool
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> evaluation_scores = EvaluationScores()
      >>> criterion = "criterion1"
      >>> evaluation_scores.higher_is_better(criterion)
      True

      .. note::

         This function is used to determine whether higher is better for a given criterion.


   .. py:method:: bounds(criterion: str) -> Tuple[Union[int, float, None], Union[int, float, None]]
      :staticmethod:
      :abstractmethod:

      The bounds for this criterion.

      :param criterion: The evaluation criterion.
      :type criterion: str
      :returns: The bounds for this criterion.
      :rtype: Tuple[Union[int, float, None], Union[int, float, None]]
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> evaluation_scores = EvaluationScores()
      >>> criterion = "criterion1"
      >>> evaluation_scores.bounds(criterion)
      (0, 1)

      .. note::

         This function is used to return the bounds for the given criterion.


   .. py:method:: store_best(criterion: str) -> bool
      :staticmethod:
      :abstractmethod:

      Whether or not to save the best validation block and model weights for this criterion.

      :param criterion: The evaluation criterion.
      :type criterion: str
      :returns: Whether to store the best score for this criterion.
      :rtype: bool
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> evaluation_scores = EvaluationScores()
      >>> criterion = "criterion1"
      >>> evaluation_scores.store_best(criterion)
      True

      .. note::

         This function is used to return whether to store the best score for the given criterion.


.. py:class:: Evaluator

   Base class of all evaluators: an abstract class representing an evaluator that compares and evaluates the output array against the evaluation array.

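
   The per-criterion bookkeeping behind ``compare`` and ``is_best`` comes down to picking the better of two scores according to the criterion's direction. Below is a minimal sketch. It assumes hypothetical classes ``ToyEvaluator`` and ``ToySegmentationEvaluator``, and it uses plain floats in place of DaCapo's ``EvaluationScores`` objects. It illustrates the pattern only and is not DaCapo's implementation.

   .. code-block:: python

      from abc import ABC, abstractmethod
      from typing import Dict, List, Optional


      class ToyEvaluator(ABC):
          """Hypothetical, simplified stand-in for an evaluator base class."""

          def __init__(self) -> None:
              # Best score seen so far, keyed by criterion.
              self.best: Dict[str, float] = {}

          @property
          @abstractmethod
          def criteria(self) -> List[str]:
              """Criteria for which a model might be "best"."""

          @abstractmethod
          def higher_is_better(self, criterion: str) -> bool:
              """Whether larger values of ``criterion`` are better."""

          def compare(self, score_1: float, score_2: float, criterion: str) -> bool:
              """Return True if ``score_1`` is strictly better than ``score_2``."""
              if self.higher_is_better(criterion):
                  return score_1 > score_2
              return score_1 < score_2

          def update_best(self, criterion: str, score: float) -> bool:
              """Store ``score`` if it beats the current best; report whether it did."""
              current: Optional[float] = self.best.get(criterion)
              if current is None or self.compare(score, current, criterion):
                  self.best[criterion] = score
                  return True
              return False


      class ToySegmentationEvaluator(ToyEvaluator):
          # A plain class attribute satisfies the abstract ``criteria`` property.
          criteria = ["jaccard", "voi"]

          def higher_is_better(self, criterion: str) -> bool:
              # Jaccard is an overlap score; VOI is an error measure.
              return {"jaccard": True, "voi": False}[criterion]


      evaluator = ToySegmentationEvaluator()
      print(evaluator.update_best("voi", 0.8))  # True: first score for "voi"
      print(evaluator.update_best("voi", 0.9))  # False: a higher VOI is worse

   In the subclass, declaring ``criteria`` as a class attribute is enough to satisfy the abstract property. This mirrors the concrete evaluators on this page, which expose ``criteria`` as a class-level value.
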

   An evaluator takes a post-processor's output and compares it against ground truth. It then returns a set of scores that can be used to determine the quality of the post-processor's output.

   .. attribute:: best_scores
      :type: Dict[OutputIdentifier, BestScore]

      The best scores for each dataset/post-processing parameter/criterion combination.

   .. method:: evaluate(output_array_identifier, evaluation_array)

      Compare and evaluate the output array against the evaluation array.

   .. method:: is_best(dataset, parameter, criterion, score)

      Check if the provided score is the best for this dataset/parameter/criterion combination.

   .. method:: get_overall_best(dataset, criterion)

      Return the best score for the given dataset and criterion.

   .. method:: get_overall_best_parameters(dataset, criterion)

      Return the best parameters for the given dataset and criterion.

   .. method:: compare(score_1, score_2, criterion)

      Compare two scores for the given criterion.

   .. method:: set_best(validation_scores)

      Find the best iteration for each dataset/post-processing parameter/criterion combination.

   .. method:: higher_is_better(criterion)

      Return whether higher is better for the given criterion.

   .. method:: bounds(criterion)

      Return the bounds for the given criterion.

   .. method:: store_best(criterion)

      Return whether to store the best score for the given criterion.

   .. note::

      The Evaluator class is used to compare and evaluate the output array against the evaluation array.


   .. py:method:: evaluate(output_array_identifier: dacapo.store.local_array_store.LocalArrayIdentifier, evaluation_array: dacapo.experiments.datasplits.datasets.arrays.Array) -> dacapo.experiments.tasks.evaluators.evaluation_scores.EvaluationScores
      :abstractmethod:

      Compare and evaluate the output array against the evaluation array.

      :param output_array_identifier: The identifier of the output array.
      :type output_array_identifier: LocalArrayIdentifier
      :param evaluation_array: The evaluation array.
      :type evaluation_array: Array
      :returns: The evaluation scores.
      :rtype: EvaluationScores
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> evaluator = Evaluator()
      >>> output_array_identifier = LocalArrayIdentifier("output_array")
      >>> evaluation_array = Array()
      >>> evaluator.evaluate(output_array_identifier, evaluation_array)
      EvaluationScores()

      .. note::

         This function is used to compare and evaluate the output array against the evaluation array.


   .. py:property:: best_scores
      :type: Dict[OutputIdentifier, BestScore]

      The best scores for each dataset/post-processing parameter/criterion combination.

      :returns: The best scores for each dataset/post-processing parameter/criterion combination.
      :rtype: Dict[OutputIdentifier, BestScore]
      :raises AttributeError: if the best scores are not set

      .. rubric:: Examples

      >>> evaluator = Evaluator()
      >>> evaluator.best_scores
      {}

      .. note::

         This function is used to return the best scores for each dataset/post-processing parameter/criterion combination.


   .. py:method:: is_best(dataset: dacapo.experiments.datasplits.datasets.Dataset, parameter: dacapo.experiments.tasks.post_processors.PostProcessorParameters, criterion: str, score: dacapo.experiments.tasks.evaluators.evaluation_scores.EvaluationScores) -> bool

      Check if the provided score is the best for this dataset/parameter/criterion combination.

      :param dataset: The dataset.
      :type dataset: Dataset
      :param parameter: The post-processor parameters.
      :type parameter: PostProcessorParameters
      :param criterion: The criterion.
      :type criterion: str
      :param score: The evaluation scores.
      :type score: EvaluationScores
      :returns: Whether the provided score is the best for this dataset/parameter/criterion combination.
      :rtype: bool
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> evaluator = Evaluator()
      >>> dataset = Dataset()
      >>> parameter = PostProcessorParameters()
      >>> criterion = "criterion"
      >>> score = EvaluationScores()
      >>> evaluator.is_best(dataset, parameter, criterion, score)
      False

      .. note::

         This function is used to check if the provided score is the best for this dataset/parameter/criterion combination.


   .. py:method:: get_overall_best(dataset: dacapo.experiments.datasplits.datasets.Dataset, criterion: str)

      Return the best score for the given dataset and criterion.

      :param dataset: The dataset.
      :type dataset: Dataset
      :param criterion: The criterion.
      :type criterion: str
      :returns: The best score for the given dataset and criterion.
      :rtype: Optional[float]
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> evaluator = Evaluator()
      >>> dataset = Dataset()
      >>> criterion = "criterion"
      >>> evaluator.get_overall_best(dataset, criterion) is None
      True

      .. note::

         This function is used to return the best score for the given dataset and criterion.


   .. py:method:: get_overall_best_parameters(dataset: dacapo.experiments.datasplits.datasets.Dataset, criterion: str)

      Return the best parameters for the given dataset and criterion.

      :param dataset: The dataset.
      :type dataset: Dataset
      :param criterion: The criterion.
      :type criterion: str
      :returns: The best parameters for the given dataset and criterion.
      :rtype: Optional[PostProcessorParameters]
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> evaluator = Evaluator()
      >>> dataset = Dataset()
      >>> criterion = "criterion"
      >>> evaluator.get_overall_best_parameters(dataset, criterion) is None
      True

      .. note::

         This function is used to return the best parameters for the given dataset and criterion.


   .. py:method:: compare(score_1, score_2, criterion)

      Compare two scores for the given criterion.

      :param score_1: The first score.
      :type score_1: float
      :param score_2: The second score.
      :type score_2: float
      :param criterion: The criterion.
      :type criterion: str
      :returns: Whether the first score is better than the second score for the given criterion.
      :rtype: bool
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> evaluator = Evaluator()
      >>> score_1 = 0.0
      >>> score_2 = 0.0
      >>> criterion = "criterion"
      >>> evaluator.compare(score_1, score_2, criterion)
      False

      .. note::

         This function is used to compare two scores for the given criterion.


   .. py:method:: set_best(validation_scores: dacapo.experiments.validation_scores.ValidationScores) -> None

      Find the best iteration for each dataset/post-processing parameter/criterion combination.

      :param validation_scores: The validation scores.
      :type validation_scores: ValidationScores
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> evaluator = Evaluator()
      >>> validation_scores = ValidationScores()
      >>> evaluator.set_best(validation_scores)

      .. note::

         This function is used to find the best iteration for each dataset/post-processing parameter/criterion combination. Typically, this function is called after the validation scores have been computed.


   .. py:property:: criteria
      :type: List[str]
      :abstractmethod:

      A list of all criteria for which a model might be "best", e.g. your criteria might be "precision", "recall", and "jaccard". It is unlikely that the best iteration/post-processing parameters will be the same for all 3 of these criteria.

      :returns: The evaluation criteria.
      :rtype: List[str]
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> evaluator = Evaluator()
      >>> evaluator.criteria
      []

      .. note::

         This function is used to return the evaluation criteria.


   .. py:method:: higher_is_better(criterion: str) -> bool

      Whether or not higher is better for this criterion.

      :param criterion: The criterion.
      :type criterion: str
      :returns: Whether higher is better for the given criterion.
      :rtype: bool
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> evaluator = Evaluator()
      >>> criterion = "criterion"
      >>> evaluator.higher_is_better(criterion)
      False

      .. note::

         This function is used to determine whether higher is better for the given criterion.


   .. py:method:: bounds(criterion: str) -> Tuple[Union[int, float, None], Union[int, float, None]]

      The bounds for this criterion.

      :param criterion: The criterion.
      :type criterion: str
      :returns: The bounds for the given criterion.
      :rtype: Tuple[Union[int, float, None], Union[int, float, None]]
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> evaluator = Evaluator()
      >>> criterion = "criterion"
      >>> evaluator.bounds(criterion)
      (0, 1)

      .. note::

         This function is used to return the bounds for the given criterion.


   .. py:method:: store_best(criterion: str) -> bool

      Whether or not to save the best validation block and model weights for this criterion.

      :param criterion: The criterion.
      :type criterion: str
      :returns: Whether to store the best score for the given criterion.
      :rtype: bool
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> evaluator = Evaluator()
      >>> criterion = "criterion"
      >>> evaluator.store_best(criterion)
      False

      .. note::

         This function is used to return whether to store the best score for the given criterion.


   .. py:property:: score
      :type: dacapo.experiments.tasks.evaluators.evaluation_scores.EvaluationScores
      :abstractmethod:

      The evaluation scores.

      :returns: The evaluation scores.
      :rtype: EvaluationScores
      :raises NotImplementedError: if the function is not implemented

      .. rubric:: Examples

      >>> evaluator = Evaluator()
      >>> evaluator.score
      EvaluationScores()

      .. note::

         This function is used to return the evaluation scores.


.. py:class:: MultiChannelBinarySegmentationEvaluationScores

   Class representing evaluation scores for multi-channel binary segmentation tasks.

   .. attribute:: channel_scores
      :type: List[Tuple[str, BinarySegmentationEvaluationScores]]

      The list of channel scores.

   .. method:: higher_is_better(criterion: str) -> bool

      Determines whether a higher value is better for a given criterion.

   .. method:: store_best(criterion: str) -> bool

      Whether or not to store the best weights/validation blocks for this criterion.

   .. method:: bounds(criterion: str) -> Tuple[Union[int, float, None], Union[int, float, None]]

      Determines the bounds for a given criterion.

   .. rubric:: Notes

   The evaluation scores are stored as attributes of the class. The class also contains methods to determine whether a higher value is better for a given criterion, whether or not to store the best weights/validation blocks for a given criterion, and the bounds for a given criterion.


   .. py:attribute:: channel_scores
      :type: List[Tuple[str, BinarySegmentationEvaluationScores]]


   .. py:property:: criteria

      Returns a list of all criteria for all channels.

      :returns: The list of criteria.
      :rtype: List[str]

      .. rubric:: Examples

      >>> channel_scores = [("channel1", BinarySegmentationEvaluationScores()), ("channel2", BinarySegmentationEvaluationScores())]
      >>> MultiChannelBinarySegmentationEvaluationScores(channel_scores).criteria

      .. rubric:: Notes

      The method returns a list of all criteria for all channels. The criteria are stored as attributes of the class.


   .. py:method:: higher_is_better(criterion: str) -> bool
      :staticmethod:

      Determines whether a higher value is better for a given criterion.

      :param criterion: The evaluation criterion.
      :type criterion: str
      :returns: True if a higher value is better, False otherwise.
      :rtype: bool
      :raises ValueError: If the criterion is not recognized.

      .. rubric:: Examples

      >>> MultiChannelBinarySegmentationEvaluationScores.higher_is_better("channel1__dice")
      True
      >>> MultiChannelBinarySegmentationEvaluationScores.higher_is_better("channel1__f1_score")
      True

      .. rubric:: Notes

      Whether a higher value is better for a given criterion is determined by a mapping dictionary; an unrecognized criterion raises a ``ValueError``.


   .. py:method:: store_best(criterion: str) -> bool
      :staticmethod:

      Determines whether or not to store the best weights/validation blocks for a given criterion.

      :param criterion: The evaluation criterion.
      :type criterion: str
      :returns: True if the best weights/validation blocks should be stored, False otherwise.
      :rtype: bool
      :raises ValueError: If the criterion is not recognized.

      .. rubric:: Examples

      >>> MultiChannelBinarySegmentationEvaluationScores.store_best("channel1__dice")
      False
      >>> MultiChannelBinarySegmentationEvaluationScores.store_best("channel1__f1_score")
      True

      .. rubric:: Notes

      Whether or not to store the best weights/validation blocks for a given criterion is determined by a mapping dictionary; an unrecognized criterion raises a ``ValueError``.


   .. py:method:: bounds(criterion: str) -> Tuple[Union[int, float, None], Union[int, float, None]]
      :staticmethod:

      Determines the bounds for a given criterion. The bounds are used to determine the best value for a given criterion.

      :param criterion: The evaluation criterion.
      :type criterion: str
      :returns: The lower and upper bounds for the criterion.
      :rtype: Tuple[Union[int, float, None], Union[int, float, None]]
      :raises ValueError: If the criterion is not recognized.

      .. rubric:: Examples

      >>> MultiChannelBinarySegmentationEvaluationScores.bounds("channel1__dice")
      (0, 1)
      >>> MultiChannelBinarySegmentationEvaluationScores.bounds("channel1__hausdorff")
      (0, nan)

      .. rubric:: Notes

      The method returns the lower and upper bounds for the criterion. The bounds are determined by the mapping dictionary.


.. py:class:: BinarySegmentationEvaluationScores

   Class representing evaluation scores for binary segmentation tasks.

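
   The overlap-based scores stored by this class can be computed from two boolean masks. Below is a minimal NumPy sketch. It assumes a hypothetical helper ``overlap_scores``, in which ``truth`` plays the role of A (ground truth) and ``pred`` the role of B (prediction). It does not guard against empty masks and omits the distance-based and tolerance-based scores.

   .. code-block:: python

      import numpy as np


      def overlap_scores(truth: np.ndarray, pred: np.ndarray) -> dict:
          """Overlap scores between a ground-truth mask (A) and a predicted mask (B)."""
          truth = truth.astype(bool)
          pred = pred.astype(bool)
          tp = np.logical_and(truth, pred).sum()   # |A ∩ B|
          fp = np.logical_and(~truth, pred).sum()  # |B - A|
          fn = np.logical_and(truth, ~pred).sum()  # |A - B|
          precision = tp / (tp + fp)
          recall = tp / (tp + fn)
          return {
              "dice": float(2 * tp / (truth.sum() + pred.sum())),
              "jaccard": float(tp / np.logical_or(truth, pred).sum()),
              "precision": float(precision),
              "recall": float(recall),
              "f1_score": float(2 * precision * recall / (precision + recall)),
          }


      truth = np.array([1, 1, 1, 0])
      pred = np.array([0, 1, 1, 1])
      print(overlap_scores(truth, pred))

   For binary masks, the Dice coefficient and the F1 score coincide, since both reduce to 2TP / (2TP + FP + FN).
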

   The metrics include the following, where A is the ground-truth and B the predicted binary segmentation:

   - Dice coefficient: :math:`2 |A \cap B| / (|A| + |B|)`
   - Jaccard coefficient: :math:`|A \cap B| / |A \cup B|`
   - Hausdorff distance: :math:`\max(h(A, B), h(B, A))`, where :math:`h(A, B)` is the directed Hausdorff distance from A to B
   - False negative rate: :math:`|A \setminus B| / |A|`
   - False positive rate: :math:`|B \setminus A| / |B|`
   - False discovery rate: :math:`|B \setminus A| / |A|`
   - VOI: Variation of Information; split and merge errors combined into a single measure of segmentation quality
   - Mean false distance: 0.5 * (mean false positive distance + mean false negative distance)
   - Mean false negative distance: mean distance of false negatives
   - Mean false positive distance: mean distance of false positives
   - Mean false distance clipped: 0.5 * (mean false positive distance clipped + mean false negative distance clipped); distances are clipped to a maximum distance
   - Mean false negative distance clipped: mean distance of false negatives; distances are clipped to a maximum distance
   - Mean false positive distance clipped: mean distance of false positives; distances are clipped to a maximum distance
   - Precision with tolerance: TP / (TP + FP), where TP and FP are the true and false positives within a tolerance distance
   - Recall with tolerance: TP / (TP + FN), where TP and FN are the true positives and false negatives within a tolerance distance
   - F1 score with tolerance: 2 * (Recall * Precision) / (Recall + Precision), where Recall and Precision are the recall and precision with tolerance
   - Precision: TP / (TP + FP), where TP and FP are the true and false positives
   - Recall: TP / (TP + FN), where TP and FN are the true positives and false negatives
   - F1 score: 2 * (Recall * Precision) / (Recall + Precision), where Recall and Precision are the recall and precision defined above

   .. attribute:: dice
      :type: float

      The Dice coefficient.

   .. attribute:: jaccard
      :type: float

      The Jaccard index.

   .. attribute:: hausdorff
      :type: float

      The Hausdorff distance.

   .. attribute:: false_negative_rate
      :type: float

      The false negative rate.

   .. attribute:: false_negative_rate_with_tolerance
      :type: float

      The false negative rate with tolerance.

   .. attribute:: false_positive_rate
      :type: float

      The false positive rate.

   .. attribute:: false_discovery_rate
      :type: float

      The false discovery rate.

   .. attribute:: false_positive_rate_with_tolerance
      :type: float

      The false positive rate with tolerance.

   .. attribute:: voi
      :type: float

      The variation of information.

   .. attribute:: mean_false_distance
      :type: float

      The mean false distance.

   .. attribute:: mean_false_negative_distance
      :type: float

      The mean false negative distance.

   .. attribute:: mean_false_positive_distance
      :type: float

      The mean false positive distance.

   .. attribute:: mean_false_distance_clipped
      :type: float

      The clipped mean false distance.

   .. attribute:: mean_false_negative_distance_clipped
      :type: float

      The clipped mean false negative distance.

   .. attribute:: mean_false_positive_distance_clipped
      :type: float

      The clipped mean false positive distance.

   .. attribute:: precision_with_tolerance
      :type: float

      The precision with tolerance.

   .. attribute:: recall_with_tolerance
      :type: float

      The recall with tolerance.

   .. attribute:: f1_score_with_tolerance
      :type: float

      The F1 score with tolerance.

   .. attribute:: precision
      :type: float

      The precision.

   .. attribute:: recall
      :type: float

      The recall.

   .. attribute:: f1_score
      :type: float

      The F1 score.

   .. method:: store_best(criterion: str) -> bool

      Whether or not to store the best weights/validation blocks for this criterion.

   .. method:: higher_is_better(criterion: str) -> bool

      Determines whether a higher value is better for a given criterion.

   .. method:: bounds(criterion: str) -> Tuple[Union[int, float, None], Union[int, float, None]]

      Determines the bounds for a given criterion.

   .. rubric:: Notes

   The evaluation scores are stored as attributes of the class. The class also contains methods to determine whether a higher value is better for a given criterion, whether or not to store the best weights/validation blocks for a given criterion, and the bounds for a given criterion.


   .. py:attribute:: dice
      :type: float


   .. py:attribute:: jaccard
      :type: float


   .. py:attribute:: hausdorff
      :type: float


   .. py:attribute:: false_negative_rate
      :type: float


   .. py:attribute:: false_negative_rate_with_tolerance
      :type: float


   .. py:attribute:: false_positive_rate
      :type: float


   .. py:attribute:: false_discovery_rate
      :type: float


   .. py:attribute:: false_positive_rate_with_tolerance
      :type: float


   .. py:attribute:: voi
      :type: float


   .. py:attribute:: mean_false_distance
      :type: float


   .. py:attribute:: mean_false_negative_distance
      :type: float


   .. py:attribute:: mean_false_positive_distance
      :type: float


   .. py:attribute:: mean_false_distance_clipped
      :type: float


   .. py:attribute:: mean_false_negative_distance_clipped
      :type: float


   .. py:attribute:: mean_false_positive_distance_clipped
      :type: float


   .. py:attribute:: precision_with_tolerance
      :type: float


   .. py:attribute:: recall_with_tolerance
      :type: float


   .. py:attribute:: f1_score_with_tolerance
      :type: float


   .. py:attribute:: precision
      :type: float


   .. py:attribute:: recall
      :type: float


   .. py:attribute:: f1_score
      :type: float


   .. py:attribute:: criteria
      :type: List[str]
      :value: ['dice', 'jaccard', 'hausdorff', 'false_negative_rate', 'false_negative_rate_with_tolerance',...

      The evaluation criteria.

      .. rubric:: Examples

      >>> BinarySegmentationEvaluationScores.criteria[:3]
      ['dice', 'jaccard', 'hausdorff']

      .. note::

         This attribute lists the evaluation criteria.


   .. py:method:: store_best(criterion: str) -> bool
      :staticmethod:

      Determines whether or not to store the best weights/validation blocks for a given criterion.

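
      The answer comes from a mapping dictionary keyed by criterion. Below is a minimal sketch of such a lookup, under two assumptions. First, it uses a hypothetical module-level table holding only the two entries shown in the examples for this method. Second, it assumes that multi-channel criteria such as ``channel1__f1_score`` (the naming used by ``MultiChannelBinarySegmentationEvaluationScores``) can be resolved by keeping the part after the last ``__``.

      .. code-block:: python

         # Hypothetical excerpt of a criterion -> store_best mapping; only the two
         # entries shown in the examples on this page are included.
         STORE_BEST = {"dice": False, "f1_score": True}


         def store_best(criterion: str) -> bool:
             """Look up ``criterion``, accepting an optional ``<channel>__`` prefix."""
             # Multi-channel criteria look like "channel1__f1_score"; keeping the
             # part after the last "__" is an assumption about the naming scheme.
             base = criterion.rsplit("__", 1)[-1]
             try:
                 return STORE_BEST[base]
             except KeyError:
                 raise ValueError(f"Criterion {criterion!r} is not recognized") from None


         print(store_best("dice"))  # False
         print(store_best("channel1__f1_score"))  # True

      Raising ``ValueError`` instead of letting the ``KeyError`` escape matches the ``:raises ValueError:`` contract documented for this method.
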

      :param criterion: The evaluation criterion.
      :type criterion: str
      :returns: True if the best weights/validation blocks should be stored, False otherwise.
      :rtype: bool
      :raises ValueError: If the criterion is not recognized.

      .. rubric:: Examples

      >>> BinarySegmentationEvaluationScores.store_best("dice")
      False
      >>> BinarySegmentationEvaluationScores.store_best("f1_score")
      True

      .. rubric:: Notes

      Whether or not to store the best weights/validation blocks for a given criterion is determined by a mapping dictionary; an unrecognized criterion raises a ``ValueError``.


   .. py:method:: higher_is_better(criterion: str) -> bool
      :staticmethod:

      Determines whether a higher value is better for a given criterion.

      :param criterion: The evaluation criterion.
      :type criterion: str
      :returns: True if a higher value is better, False otherwise.
      :rtype: bool
      :raises ValueError: If the criterion is not recognized.

      .. rubric:: Examples

      >>> BinarySegmentationEvaluationScores.higher_is_better("dice")
      True
      >>> BinarySegmentationEvaluationScores.higher_is_better("f1_score")
      True

      .. rubric:: Notes

      Whether a higher value is better for a given criterion is determined by a mapping dictionary; an unrecognized criterion raises a ``ValueError``.


   .. py:method:: bounds(criterion: str) -> Tuple[Union[int, float, None], Union[int, float, None]]
      :staticmethod:

      Determines the bounds for a given criterion. The bounds are used to determine the best value for a given criterion.

      :param criterion: The evaluation criterion.
      :type criterion: str
      :returns: The lower and upper bounds for the criterion.
      :rtype: Tuple[Union[int, float, None], Union[int, float, None]]
      :raises ValueError: If the criterion is not recognized.

      .. rubric:: Examples

      >>> BinarySegmentationEvaluationScores.bounds("dice")
      (0, 1)
      >>> BinarySegmentationEvaluationScores.bounds("hausdorff")
      (0, nan)

      .. rubric:: Notes

      The method returns the lower and upper bounds for the criterion.
      The bounds are determined by the mapping dictionary.


.. py:class:: BinarySegmentationEvaluator(clip_distance: float, tol_distance: float, channels: List[str])

   Given a predicted and a ground-truth binary segmentation, compute various metrics to determine their similarity. The metrics include the following, where A is the ground-truth and B the predicted binary segmentation:

   - Dice coefficient: :math:`2 |A \cap B| / (|A| + |B|)`
   - Jaccard coefficient: :math:`|A \cap B| / |A \cup B|`
   - Hausdorff distance: :math:`\max(h(A, B), h(B, A))`, where :math:`h(A, B)` is the directed Hausdorff distance from A to B
   - False negative rate: :math:`|A \setminus B| / |A|`
   - False positive rate: :math:`|B \setminus A| / |B|`
   - False discovery rate: :math:`|B \setminus A| / |A|`
   - VOI: Variation of Information; split and merge errors combined into a single measure of segmentation quality
   - Mean false distance: 0.5 * (mean false positive distance + mean false negative distance)
   - Mean false negative distance: mean distance of false negatives
   - Mean false positive distance: mean distance of false positives
   - Mean false distance clipped: 0.5 * (mean false positive distance clipped + mean false negative distance clipped); distances are clipped to a maximum distance
   - Mean false negative distance clipped: mean distance of false negatives; distances are clipped to a maximum distance
   - Mean false positive distance clipped: mean distance of false positives; distances are clipped to a maximum distance
   - Precision with tolerance: TP / (TP + FP), where TP and FP are the true and false positives within a tolerance distance
   - Recall with tolerance: TP / (TP + FN), where TP and FN are the true positives and false negatives within a tolerance distance
   - F1 score with tolerance: 2 * (Recall * Precision) / (Recall + Precision), where Recall and Precision are the recall and precision with tolerance
   - Precision: TP / (TP + FP), where TP and FP are the true and false positives
   - Recall: TP / (TP + FN), where TP and FN are the true positives and false negatives
   - F1 score: 2 * (Recall * Precision) / (Recall + Precision), where Recall and Precision are the recall and precision defined above

   .. attribute:: clip_distance
      :type: float

      The clip distance.

   .. attribute:: tol_distance
      :type: float

      The tolerance distance.

   .. attribute:: channels
      :type: List[str]

      The channels.

   .. attribute:: criteria
      :type: List[str]

      The evaluation criteria.

   .. method:: evaluate(output_array_identifier, evaluation_array)

      Evaluate the output array against the evaluation array.

   .. method:: score

      Return the evaluation scores.

   .. note::

      The BinarySegmentationEvaluator class is used to evaluate the performance of a binary segmentation task. The class provides methods to evaluate the output array against the evaluation array and return the evaluation scores. The clip distance is the maximum distance used for the clipped mean false distances: larger false positive and false negative distances are clipped to this value. The tolerance distance is the maximum distance between the ground truth and the predicted segmentation for a pixel to be considered a true positive. Channels are the channels of the binary segmentation. Criteria are the evaluation criteria.


   .. py:attribute:: criteria
      :type: List[str]
      :value: ['jaccard', 'voi']

      A list of all criteria for which a model might be "best", e.g. your criteria might be "precision", "recall", and "jaccard". It is unlikely that the best iteration/post-processing parameters will be the same for all 3 of these criteria.

      .. rubric:: Examples

      >>> BinarySegmentationEvaluator.criteria
      ['jaccard', 'voi']

      .. note::

         This attribute lists the evaluation criteria.


   .. py:attribute:: clip_distance


   .. py:attribute:: tol_distance


   .. py:attribute:: channels


   .. py:method:: evaluate(output_array_identifier, evaluation_array)

      Evaluate the output array against the evaluation array.

:param output_array_identifier: str the identifier of the output array :param evaluation_array: ZarrArray the evaluation array :returns: BinarySegmentationEvaluationScores or MultiChannelBinarySegmentationEvaluationScores the evaluation scores :raises ValueError: if the output array identifier is not valid .. rubric:: Examples >>> binary_segmentation_evaluator = BinarySegmentationEvaluator(clip_distance=200, tol_distance=40, channels=["channel1", "channel2"]) >>> output_array_identifier = "output_array" >>> evaluation_array = ZarrArray.open_from_array_identifier("evaluation_array") >>> binary_segmentation_evaluator.evaluate(output_array_identifier, evaluation_array) BinarySegmentationEvaluationScores(dice=0.0, jaccard=0.0, hausdorff=0.0, false_negative_rate=0.0, false_positive_rate=0.0, false_discovery_rate=0.0, voi=0.0, mean_false_distance=0.0, mean_false_negative_distance=0.0, mean_false_positive_distance=0.0, mean_false_distance_clipped=0.0, mean_false_negative_distance_clipped=0.0, mean_false_positive_distance_clipped=0.0, precision_with_tolerance=0.0, recall_with_tolerance=0.0, f1_score_with_tolerance=0.0, precision=0.0, recall=0.0, f1_score=0.0) .. note:: This function is used to evaluate the output array against the evaluation array. .. py:property:: score Return the evaluation scores. :returns: BinarySegmentationEvaluationScores or MultiChannelBinarySegmentationEvaluationScores the evaluation scores :raises NotImplementedError: if the function is not implemented .. 
rubric:: Examples >>> binary_segmentation_evaluator = BinarySegmentationEvaluator(clip_distance=200, tol_distance=40, channels=["channel1", "channel2"]) >>> binary_segmentation_evaluator.score BinarySegmentationEvaluationScores(dice=0.0, jaccard=0.0, hausdorff=0.0, false_negative_rate=0.0, false_positive_rate=0.0, false_discovery_rate=0.0, voi=0.0, mean_false_distance=0.0, mean_false_negative_distance=0.0, mean_false_positive_distance=0.0, mean_false_distance_clipped=0.0, mean_false_negative_distance_clipped=0.0, mean_false_positive_distance_clipped=0.0, precision_with_tolerance=0.0, recall_with_tolerance=0.0, f1_score_with_tolerance=0.0, precision=0.0, recall=0.0, f1_score=0.0) .. note:: This function is used to return the evaluation scores. .. py:class:: InstanceEvaluationScores The evaluation scores for the instance segmentation task. The scores include the variation of information (VOI) split, VOI merge, and VOI. .. attribute:: voi_split float the variation of information (VOI) split .. attribute:: voi_merge float the variation of information (VOI) merge .. attribute:: voi float the variation of information (VOI) .. method:: higher_is_better(criterion) Return whether higher is better for the given criterion. .. method:: bounds(criterion) Return the bounds for the given criterion. .. method:: store_best(criterion) Return whether to store the best score for the given criterion. .. note:: The InstanceEvaluationScores class is used to store the evaluation scores for the instance segmentation task. .. py:attribute:: criteria :value: ['voi_split', 'voi_merge', 'voi'] The evaluation criteria. :returns: List[str] the evaluation criteria :raises NotImplementedError: if the function is not implemented .. rubric:: Examples >>> evaluation_scores = EvaluationScores() >>> evaluation_scores.criteria ["criterion1", "criterion2"] .. note:: This function is used to return the evaluation criteria. .. py:attribute:: voi_split :type: float .. 
py:attribute:: voi_merge :type: float .. py:property:: voi Return the average of the VOI split and VOI merge. :returns: float the average of the VOI split and VOI merge :raises NotImplementedError: if the function is not implemented .. rubric:: Examples >>> instance_evaluation_scores = InstanceEvaluationScores(voi_split=0.1, voi_merge=0.2) >>> instance_evaluation_scores.voi 0.15 .. note:: This function is used to calculate the average of the VOI split and VOI merge. .. py:method:: higher_is_better(criterion: str) -> bool :staticmethod: Return whether higher is better for the given criterion. :param criterion: str the evaluation criterion :returns: bool whether higher is better for the given criterion :raises NotImplementedError: if the function is not implemented .. rubric:: Examples >>> InstanceEvaluationScores.higher_is_better("voi_split") False .. note:: This function is used to determine whether higher is better for the given criterion. .. py:method:: bounds(criterion: str) -> Tuple[Union[int, float, None], Union[int, float, None]] :staticmethod: Return the bounds for the given criterion. :param criterion: str the evaluation criterion :returns: Tuple[Union[int, float, None], Union[int, float, None]] the bounds for the given criterion :raises NotImplementedError: if the function is not implemented .. rubric:: Examples >>> InstanceEvaluationScores.bounds("voi_split") (0, 1) .. note:: This function is used to return the bounds for the given criterion. .. py:method:: store_best(criterion: str) -> bool :staticmethod: Return whether to store the best score for the given criterion. :param criterion: str the evaluation criterion :returns: bool whether to store the best score for the given criterion :raises NotImplementedError: if the function is not implemented .. rubric:: Examples >>> InstanceEvaluationScores.store_best("voi_split") True .. note:: This function is used to determine whether to store the best score for the given criterion. .. 
py:class:: InstanceEvaluator A class representing an evaluator for instance segmentation tasks. .. attribute:: criteria List[str] the evaluation criteria .. method:: evaluate(output_array_identifier, evaluation_array) Evaluate the output array against the evaluation array. .. method:: score Return the evaluation scores. .. note:: The InstanceEvaluator class is used to evaluate the performance of an instance segmentation task. .. py:attribute:: criteria :type: List[str] :value: ['voi_merge', 'voi_split', 'voi'] A list of all criteria for which a model might be "best". i.e. your criteria might be "precision", "recall", and "jaccard". It is unlikely that the best iteration/post processing parameters will be the same for all 3 of these criteria :returns: List[str] the evaluation criteria :raises NotImplementedError: if the function is not implemented .. rubric:: Examples >>> evaluator = Evaluator() >>> evaluator.criteria [] .. note:: This function is used to return the evaluation criteria. .. py:method:: evaluate(output_array_identifier, evaluation_array) Evaluate the output array against the evaluation array. :param output_array_identifier: str the identifier of the output array :param evaluation_array: ZarrArray the evaluation array :returns: InstanceEvaluationScores the evaluation scores :raises ValueError: if the output array identifier is not valid .. rubric:: Examples >>> instance_evaluator = InstanceEvaluator() >>> output_array_identifier = "output_array" >>> evaluation_array = ZarrArray.open_from_array_identifier("evaluation_array") >>> instance_evaluator.evaluate(output_array_identifier, evaluation_array) InstanceEvaluationScores(voi_merge=0.0, voi_split=0.0) .. note:: This function is used to evaluate the output array against the evaluation array. .. py:property:: score :type: dacapo.experiments.tasks.evaluators.instance_evaluation_scores.InstanceEvaluationScores Return the evaluation scores. 
:returns: InstanceEvaluationScores the evaluation scores :raises NotImplementedError: if the function is not implemented .. rubric:: Examples >>> instance_evaluator = InstanceEvaluator() >>> instance_evaluator.score InstanceEvaluationScores(voi_merge=0.0, voi_split=0.0) .. note:: This function is used to return the evaluation scores.
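
The overlap-based metrics listed for BinarySegmentationEvaluator (Dice, Jaccard, false negative/positive/discovery rates, precision, recall, F1) can be sketched with NumPy on two boolean masks. This is a minimal illustration, not DaCapo's implementation: the helper name ``overlap_metrics`` is hypothetical, ``truth`` plays the role of A (ground truth), ``pred`` plays the role of B (prediction), and both masks are assumed to contain foreground and background so that no denominator is zero.

```python
import numpy as np


def overlap_metrics(truth: np.ndarray, pred: np.ndarray) -> dict:
    """Hypothetical helper: overlap metrics for two binary masks.

    truth is A (ground truth), pred is B (prediction). Assumes both masks
    contain foreground and background, so no denominator is zero.
    """
    a = truth.astype(bool)
    b = pred.astype(bool)

    tp = np.sum(a & b)    # |A ∩ B|
    fn = np.sum(a & ~b)   # |A - B|
    fp = np.sum(~a & b)   # |B - A|
    tn = np.sum(~a & ~b)  # background in both masks

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "dice": 2 * tp / (a.sum() + b.sum()),
        "jaccard": tp / np.sum(a | b),
        "false_negative_rate": fn / a.sum(),
        "false_positive_rate": fp / (fp + tn),  # fp + tn is the truth background
        "false_discovery_rate": fp / b.sum(),
        "precision": precision,
        "recall": recall,
        "f1_score": 2 * precision * recall / (precision + recall),
    }


truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0]])
pred = np.array([[1, 0, 0, 0],
                 [1, 1, 1, 0]])
scores = overlap_metrics(truth, pred)
print({name: round(float(value), 3) for name, value in scores.items()})
```

In this toy example there are 3 true positives, 1 false negative, 1 false positive, and 3 true negatives, giving a Dice coefficient of 0.75 and a Jaccard coefficient of 0.6. For binary masks the Dice coefficient and the F1 score coincide, since both reduce to 2TP / (2TP + FP + FN).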
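
The distance-based metrics of BinarySegmentationEvaluator (Hausdorff distance, mean false distances with and without ``clip_distance``, and precision/recall with ``tol_distance``) can be illustrated by brute force on small masks. This sketch rests on assumptions that are not taken from DaCapo: distances are Euclidean in voxel units, each false negative or false positive voxel is measured to the nearest foreground voxel of the other mask, a voxel counts as matched under tolerance if the other mask lies within ``tol_distance``, and an empty set of false voxels contributes a mean of 0.0. The function names are hypothetical; a practical implementation would use a distance transform instead of all pairwise distances.

```python
import numpy as np


def _coords(mask: np.ndarray) -> np.ndarray:
    # (N, ndim) array of foreground voxel coordinates
    return np.argwhere(mask).astype(float)


def _nearest(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    # For each point in src, the Euclidean distance to its nearest point in dst
    diff = src[:, None, :] - dst[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def _mean(x: np.ndarray) -> float:
    # Assumption: no false voxels means a mean false distance of 0.0
    return float(x.mean()) if x.size else 0.0


def distance_metrics(truth, pred, clip_distance, tol_distance):
    """Hypothetical brute-force sketch; assumes both masks are non-empty."""
    a, b = truth.astype(bool), pred.astype(bool)
    d_a_to_b = _nearest(_coords(a), _coords(b))  # truth voxel -> nearest prediction
    d_b_to_a = _nearest(_coords(b), _coords(a))  # predicted voxel -> nearest truth

    # Voxels shared by both masks have distance 0; the others are the false ones.
    fn_dist = d_a_to_b[d_a_to_b > 0]  # false negatives
    fp_dist = d_b_to_a[d_b_to_a > 0]  # false positives
    fn_clip = np.minimum(fn_dist, clip_distance)
    fp_clip = np.minimum(fp_dist, clip_distance)

    precision_tol = float(np.mean(d_b_to_a <= tol_distance))
    recall_tol = float(np.mean(d_a_to_b <= tol_distance))
    denom = precision_tol + recall_tol
    return {
        "hausdorff": float(max(d_a_to_b.max(), d_b_to_a.max())),
        "mean_false_negative_distance": _mean(fn_dist),
        "mean_false_positive_distance": _mean(fp_dist),
        "mean_false_distance": 0.5 * (_mean(fn_dist) + _mean(fp_dist)),
        "mean_false_negative_distance_clipped": _mean(fn_clip),
        "mean_false_positive_distance_clipped": _mean(fp_clip),
        "mean_false_distance_clipped": 0.5 * (_mean(fn_clip) + _mean(fp_clip)),
        "precision_with_tolerance": precision_tol,
        "recall_with_tolerance": recall_tol,
        "f1_score_with_tolerance": 2 * precision_tol * recall_tol / denom if denom else 0.0,
    }


truth = np.zeros((1, 8), dtype=bool)
truth[0, 0:3] = True  # truth occupies x = 0, 1, 2
pred = np.zeros((1, 8), dtype=bool)
pred[0, 1:3] = True   # prediction misses x = 0 ...
pred[0, 7] = True     # ... and adds a spurious voxel at x = 7
metrics = distance_metrics(truth, pred, clip_distance=2.0, tol_distance=1.0)
print(metrics)
```

Here the spurious voxel sits 5 voxels from the truth, so it dominates the Hausdorff distance and the unclipped mean false distance, while clipping at 2 limits its contribution; this is the kind of outlier sensitivity that ``clip_distance`` is meant to damp.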
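
The VOI split and VOI merge reported by InstanceEvaluationScores can be computed from the contingency table of predicted and ground-truth labels. The sketch below uses the common convention split = H(prediction | truth) and merge = H(truth | prediction), in bits, and treats every label (including 0) as an ordinary object; both choices are assumptions, and the function name ``voi_split_merge`` is hypothetical. Following the ``voi`` property documented above, the combined score is taken as the average of the two terms.

```python
import numpy as np


def _entropy(probs: np.ndarray) -> float:
    # Shannon entropy in bits of a probability vector with no zero entries
    return float(-np.sum(probs * np.log2(probs)))


def voi_split_merge(pred: np.ndarray, truth: np.ndarray):
    """Hypothetical sketch: return (voi_split, voi_merge) in bits."""
    p, t = pred.ravel(), truth.ravel()
    n = p.size

    # Joint label distribution from the counts of unique (pred, truth) pairs
    _, joint_counts = np.unique(np.stack([p, t]), axis=1, return_counts=True)
    _, pred_counts = np.unique(p, return_counts=True)
    _, truth_counts = np.unique(t, return_counts=True)

    h_joint = _entropy(joint_counts / n)
    h_pred = _entropy(pred_counts / n)
    h_truth = _entropy(truth_counts / n)

    voi_split = h_joint - h_truth  # H(pred | truth): truth objects split apart
    voi_merge = h_joint - h_pred   # H(truth | pred): truth objects merged
    return voi_split, voi_merge


# One ground-truth object predicted as two pieces: a pure split error
split, merge = voi_split_merge(np.array([1, 1, 2, 2]), np.array([1, 1, 1, 1]))
voi = (split + merge) / 2  # average, as in InstanceEvaluationScores.voi
print(split, merge, voi)
```

Swapping the two arguments turns the example into a pure merge error, which is why the two terms are kept separate: both are lower-is-better, but they point to different post-processing fixes.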
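
Because each entry in ``criteria`` can select a different "best" model, a validation loop might track the best iteration separately for every criterion. The plain-Python sketch below illustrates that idea only: ``HIGHER_IS_BETTER`` is a hypothetical stand-in for the ``higher_is_better()`` static methods (VOI is lower-is-better, as documented for InstanceEvaluationScores; Jaccard is higher-is-better), and the per-iteration scores are made up.

```python
from typing import Dict

# Hypothetical stand-in for higher_is_better(): True means larger is better
HIGHER_IS_BETTER: Dict[str, bool] = {"jaccard": True, "voi": False}


def best_iteration_per_criterion(
    scores_by_iteration: Dict[int, Dict[str, float]],
) -> Dict[str, int]:
    """Return {criterion: best iteration} from {iteration: {criterion: score}}."""
    best = {}
    for criterion, higher in HIGHER_IS_BETTER.items():
        pick = max if higher else min
        best[criterion] = pick(
            scores_by_iteration,
            key=lambda iteration: scores_by_iteration[iteration][criterion],
        )
    return best


# Made-up validation scores, for illustration only
scores = {
    1000: {"jaccard": 0.61, "voi": 0.90},
    2000: {"jaccard": 0.70, "voi": 0.75},
    3000: {"jaccard": 0.68, "voi": 0.60},
}
best = best_iteration_per_criterion(scores)
print(best)
```

With these numbers Jaccard favours iteration 2000 while VOI favours iteration 3000, which is exactly the situation the ``criteria`` docstring anticipates.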