dacapo.experiments.datasplits.datasplit_generator
=================================================

.. py:module:: dacapo.experiments.datasplits.datasplit_generator


Attributes
----------

.. autoapisummary::

   dacapo.experiments.datasplits.datasplit_generator.logger


Classes
-------

.. autoapisummary::

   dacapo.experiments.datasplits.datasplit_generator.CustomEnumMeta
   dacapo.experiments.datasplits.datasplit_generator.CustomEnum
   dacapo.experiments.datasplits.datasplit_generator.DatasetType
   dacapo.experiments.datasplits.datasplit_generator.SegmentationType
   dacapo.experiments.datasplits.datasplit_generator.DatasetSpec
   dacapo.experiments.datasplits.datasplit_generator.DataSplitGenerator


Functions
---------

.. autoapisummary::

   dacapo.experiments.datasplits.datasplit_generator.is_zarr_group
   dacapo.experiments.datasplits.datasplit_generator.resize_if_needed
   dacapo.experiments.datasplits.datasplit_generator.limit_validation_crop_size
   dacapo.experiments.datasplits.datasplit_generator.get_right_resolution_array_config
   dacapo.experiments.datasplits.datasplit_generator.generate_dataspec_from_csv
   dacapo.experiments.datasplits.datasplit_generator.format_class_name


Module Contents
---------------

.. py:data:: logger

.. py:function:: is_zarr_group(file_name: upath.UPath, dataset: str)

   Check whether the dataset is a Zarr group.

   :param file_name: upath.UPath
                     The path of the file.
   :param dataset: str
                   The name of the dataset.

   :returns: True if the dataset is a Zarr group, otherwise False.
   :rtype: bool

   :raises FileNotFoundError: If the file does not exist.

   .. rubric:: Examples

   >>> is_zarr_group(file_name, dataset)

   .. rubric:: Notes

   This function is used to check if the dataset is a Zarr group.
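
As a rough illustration of what distinguishes a group from an array on disk, the sketch below inspects Zarr v2 metadata files (``.zgroup`` for groups, ``.zarray`` for arrays) using only the standard library. It is not how ``is_zarr_group`` is implemented; the helper name ``looks_like_zarr_v2_group`` and the temporary directory layout are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def looks_like_zarr_v2_group(container: Path, dataset: str) -> bool:
    """Hypothetical helper: True if container/dataset holds Zarr v2 group metadata."""
    node = container / dataset
    if not node.exists():
        raise FileNotFoundError(f"{node} does not exist")
    # In the Zarr v2 layout a group directory contains a ".zgroup" file,
    # while an array directory contains a ".zarray" file instead.
    return (node / ".zgroup").is_file()


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny fake container: one group node and one array node.
    container = Path(tmp) / "example.zarr"
    group_dir = container / "labels"
    array_dir = container / "raw"
    group_dir.mkdir(parents=True)
    array_dir.mkdir(parents=True)
    (group_dir / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
    (array_dir / ".zarray").write_text(json.dumps({"zarr_format": 2}))

    group_result = looks_like_zarr_v2_group(container, "labels")
    array_result = looks_like_zarr_v2_group(container, "raw")
```

The sketch only covers local Zarr v2 directories; remote stores and other metadata layouts would need a real Zarr reader.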


.. py:function:: resize_if_needed(array_config: dacapo.experiments.datasplits.datasets.arrays.ZarrArrayConfig, target_resolution: funlib.geometry.Coordinate, extra_str='')

   Resize the array if needed. Returns the resized array if resizing was needed, otherwise the original array.

   :param array_config: dacapo.experiments.datasplits.datasets.arrays.ZarrArrayConfig
                        The configuration of the array.
   :param target_resolution: funlib.geometry.Coordinate
                             The target resolution.
   :param extra_str: str
                     An extra string.

   :returns: The resized array if needed, otherwise the original array.
   :rtype: obj

   :raises FileNotFoundError: If the file does not exist.

   .. rubric:: Examples

   >>> resize_if_needed(array_config, target_resolution, extra_str)

   .. rubric:: Notes

   This function is used to resize the array if needed.
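
One plausible way to decide whether resizing is needed is to compare the array's voxel size with the target resolution on each axis. The sketch below illustrates that idea with plain tuples; it is an assumption about the general approach, not the library's code, and ``resample_factors`` and ``needs_resampling`` are hypothetical names. It also assumes the two resolutions are integer multiples of each other.

```python
from typing import Sequence, Tuple


def resample_factors(
    voxel_size: Sequence[int], target_resolution: Sequence[int]
) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Hypothetical helper: per-axis (upsample, downsample) integer factors."""
    if len(voxel_size) != len(target_resolution):
        raise ValueError("voxel_size and target_resolution must have the same length")
    # A coarser array (larger voxels) must be upsampled to reach the target.
    upsample = tuple(v // t for v, t in zip(voxel_size, target_resolution))
    # A finer array (smaller voxels) must be downsampled to reach the target.
    downsample = tuple(t // v for v, t in zip(voxel_size, target_resolution))
    return upsample, downsample


def needs_resampling(
    voxel_size: Sequence[int], target_resolution: Sequence[int]
) -> bool:
    """Hypothetical helper: True if any axis needs a factor greater than 1."""
    upsample, downsample = resample_factors(voxel_size, target_resolution)
    return any(u > 1 or d > 1 for u, d in zip(upsample, downsample))


same = needs_resampling((8, 8, 8), (8, 8, 8))
coarser = needs_resampling((16, 16, 16), (8, 8, 8))
finer = needs_resampling((4, 4, 4), (8, 8, 8))
```

With these inputs only the first case leaves the array untouched; the other two would require upsampling and downsampling respectively.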


.. py:function:: limit_validation_crop_size(gt_config, mask_config, max_size)

.. py:function:: get_right_resolution_array_config(container: upath.UPath, dataset, target_resolution, extra_str='')

   Get the array configuration that matches the target resolution.

   :param container: upath.UPath
                     The container.
   :param dataset: str
                   The dataset.
   :param target_resolution: obj
                             The target resolution.
   :param extra_str: str
                     An extra string.

   :returns: The right resolution array configuration.
   :rtype: obj

   :raises FileNotFoundError: If the file does not exist.

   .. rubric:: Examples

   >>> get_right_resolution_array_config(container, dataset, target_resolution, extra_str)

   .. rubric:: Notes

   This function is used to get the right resolution array configuration.
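
For multiscale data, choosing "the right resolution" can be pictured as walking from the finest scale level toward coarser ones until the voxel size is no longer finer than the target. The sketch below shows that selection on a plain list of voxel sizes. The level ordering (finest first), the stopping rule, and the name ``pick_scale_level`` are assumptions made for illustration, not a description of the library's implementation.

```python
from typing import Sequence


def pick_scale_level(
    level_voxel_sizes: Sequence[Sequence[int]], target_resolution: Sequence[int]
) -> int:
    """Hypothetical helper: index of the scale level to use for a target resolution.

    Levels are assumed to be ordered from finest to coarsest.
    """
    if not level_voxel_sizes:
        raise FileNotFoundError("no scale levels available")
    level = 0
    # Step to the next coarser level while the current one is finer than the
    # target on every axis and a coarser level still exists.
    while (
        all(v < t for v, t in zip(level_voxel_sizes[level], target_resolution))
        and level + 1 < len(level_voxel_sizes)
    ):
        level += 1
    return level


levels = [(4, 4, 4), (8, 8, 8), (16, 16, 16)]
exact = pick_scale_level(levels, (8, 8, 8))       # a level matches the target
beyond = pick_scale_level(levels, (64, 64, 64))   # target coarser than all levels
finest = pick_scale_level(levels, (2, 2, 2))      # target finer than all levels
```

When no level matches exactly, the selected level would still need to be resampled to the target resolution.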


.. py:class:: CustomEnumMeta


   Custom Enum Meta class to raise KeyError when an invalid option is passed.

   .. attribute:: _member_names_

      list
      The list of member names.

   .. method:: __getitem__(self, item)

      
      A method to get the item.

   .. rubric:: Notes

   This class is used to raise KeyError when an invalid option is passed.


.. py:class:: CustomEnum


   A custom Enum class to raise KeyError when an invalid option is passed.

   .. attribute:: __str__

      str
      The string representation of the class.

   .. method:: __str__(self)

      
      A method to get the string representation of the class.

   .. rubric:: Notes

   This class is used to raise KeyError when an invalid option is passed.
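
The pattern described for ``CustomEnumMeta`` and ``CustomEnum`` can be sketched with the standard ``enum`` module: a metaclass overrides ``__getitem__`` to raise a ``KeyError`` that lists the valid options. The class names ``StrictEnumMeta``, ``StrictEnum`` and ``Split`` below are hypothetical, and returning the member name from ``__str__`` is an assumption about the intended string representation.

```python
from enum import Enum, EnumMeta


class StrictEnumMeta(EnumMeta):
    """Hypothetical metaclass: name lookup fails with a descriptive KeyError."""

    def __getitem__(cls, item):
        if item not in cls._member_names_:
            raise KeyError(
                f"{item} is not a valid option of {cls.__name__}; "
                f"valid options are {cls._member_names_}"
            )
        return super().__getitem__(item)


class StrictEnum(Enum, metaclass=StrictEnumMeta):
    """Hypothetical base class: str() yields the bare member name."""

    def __str__(self) -> str:
        return self.name


class Split(StrictEnum):
    val = 1
    train = 2


picked = Split["train"]

try:
    Split["test"]
except KeyError as err:
    # The message names the rejected key and lists the valid member names.
    message = str(err)
```

Looking up a member by name, as in ``Split["train"]``, is what allows a plain string such as ``"train"`` to be converted into an enum member with a clear error for anything else.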


.. py:class:: DatasetType


   An Enum class to represent the dataset type. It is derived from `CustomEnum` class.

   .. attribute:: val

      int
      The validation dataset type.

   .. attribute:: train

      int
      The training dataset type.

   .. method:: __str__(self)

      
      A method to get the string representation of the class.

   .. rubric:: Notes

   This class is used to represent the dataset type.


   .. py:attribute:: val
      :value: 1


   .. py:attribute:: train
      :value: 2


.. py:class:: SegmentationType


   An Enum class to represent the segmentation type. It is derived from `CustomEnum` class.

   .. attribute:: semantic

      int
      The semantic segmentation type.

   .. attribute:: instance

      int
      The instance segmentation type.

   .. method:: __str__(self)

      
      A method to get the string representation of the class.

   .. rubric:: Notes

   This class is used to represent the segmentation type.


   .. py:attribute:: semantic
      :value: 1


   .. py:attribute:: instance
      :value: 2


.. py:class:: DatasetSpec(dataset_type: Union[str, DatasetType], raw_container: Union[str, upath.UPath], raw_dataset: str, gt_container: Union[str, upath.UPath], gt_dataset: str)

   Specification of a single dataset: its type, plus the container and dataset name for both the raw data and the ground truth.

   .. attribute:: dataset_type

      obj
      The dataset type.

   .. attribute:: raw_container

      obj
      The raw container.

   .. attribute:: raw_dataset

      str
      The raw dataset.

   .. attribute:: gt_container

      obj
      The ground truth container.

   .. attribute:: gt_dataset

      str
      The ground truth dataset.

   .. method:: __init__(dataset_type, raw_container, raw_dataset, gt_container, gt_dataset)

      
      Initializes the DatasetSpec class with the specified dataset type, raw container, raw dataset, ground truth container, and ground truth dataset.

   .. method:: __str__(self)

      
      A method to get the string representation of the class.

   .. rubric:: Notes

   This class is used to specify the dataset.


   .. py:attribute:: dataset_type


   .. py:attribute:: raw_container


   .. py:attribute:: raw_dataset


   .. py:attribute:: gt_container


   .. py:attribute:: gt_dataset


.. py:function:: generate_dataspec_from_csv(csv_path: upath.UPath)

   Generate the dataset specifications from a CSV file.

   :param csv_path: upath.UPath
                    The CSV file path.

   :returns: The list of dataset specifications.
   :rtype: list

   :raises FileNotFoundError: If the file does not exist.

   .. rubric:: Examples

   >>> generate_dataspec_from_csv(csv_path)

   .. rubric:: Notes

   This function is used to generate the dataset specification from the CSV file.
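
To make the CSV-to-specification step concrete, here is a minimal sketch using the standard ``csv`` module. It assumes one dataset per row with five columns in the same order as the ``DatasetSpec`` constructor arguments; that column layout, the ``SpecRow`` stand-in class, the ``rows_to_specs`` helper and the example paths are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class SpecRow:
    """Hypothetical stand-in for DatasetSpec, holding plain strings."""

    dataset_type: str
    raw_container: str
    raw_dataset: str
    gt_container: str
    gt_dataset: str


def rows_to_specs(handle: TextIO) -> List[SpecRow]:
    """Hypothetical helper: turn CSV rows into one SpecRow per dataset."""
    specs = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        if len(row) != 5:
            raise ValueError(f"expected 5 columns, got {len(row)}: {row}")
        specs.append(SpecRow(*(cell.strip() for cell in row)))
    return specs


# Made-up rows; an in-memory buffer stands in for a CSV file on disk.
example_csv = io.StringIO(
    "train,/data/a.zarr,raw,/data/a.zarr,[mito&er]\n"
    "val,/data/b.zarr,raw,/data/b.zarr,[mito&er]\n"
)
specs = rows_to_specs(example_csv)
```

Reading from a file handle rather than a path keeps the sketch testable without touching the filesystem.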


.. py:class:: DataSplitGenerator(name: str, datasets: List[DatasetSpec], input_resolution: Union[Sequence[int], funlib.geometry.Coordinate], output_resolution: Union[Sequence[int], funlib.geometry.Coordinate], targets: Optional[List[str]] = None, segmentation_type: Union[str, SegmentationType] = 'semantic', max_gt_downsample=32, max_gt_upsample=4, max_raw_training_downsample=16, max_raw_training_upsample=2, max_raw_validation_downsample=8, max_raw_validation_upsample=2, min_training_volume_size=8000, raw_min=0, raw_max=255, classes_separator_character='&', use_negative_class=False, max_validation_volume_size=None, binarize_gt=False)

   Generates DataSplitConfig for a given task config and datasets.

   Class names in gt_dataset should be within [] e.g. [mito&peroxisome&er] for
   multiple classes or [mito] for one class.

   Currently only supports semantic segmentation.

   Supports:

   - 2D and 3D datasets.
   - Zarr, N5 and OME-Zarr datasets.
   - Multi class targets.
   - Different resolutions for raw and ground truth datasets.
   - Different resolutions for training and validation datasets.

   .. attribute:: name

      str
      The name of the data split generator.

   .. attribute:: datasets

      list
      The list of dataset specifications.

   .. attribute:: input_resolution

      obj
      The input resolution.

   .. attribute:: output_resolution

      obj
      The output resolution.

   .. attribute:: targets

      list
      The list of targets.

   .. attribute:: segmentation_type

      obj
      The segmentation type.

   .. attribute:: max_gt_downsample

      int
      The maximum ground truth downsample.

   .. attribute:: max_gt_upsample

      int
      The maximum ground truth upsample.

   .. attribute:: max_raw_training_downsample

      int
      The maximum raw training downsample.

   .. attribute:: max_raw_training_upsample

      int
      The maximum raw training upsample.

   .. attribute:: max_raw_validation_downsample

      int
      The maximum raw validation downsample.

   .. attribute:: max_raw_validation_upsample

      int
      The maximum raw validation upsample.

   .. attribute:: min_training_volume_size

      int
      The minimum training volume size.

   .. attribute:: raw_min

      int
      The minimum raw value.

   .. attribute:: raw_max

      int
      The maximum raw value.

   .. attribute:: classes_separator_character

      str
      The classes separator character.

   .. attribute:: max_validation_volume_size

      int
      The maximum validation volume size, in voxels. Defaults to None, in which case
      the validation volume size is not limited; otherwise it is limited to the
      specified value, e.g. 600**3 for a 600^3 voxel volume (216_000_000 voxels).

   .. method:: __init__(name, datasets, input_resolution, output_resolution, targets, segmentation_type, max_gt_downsample, max_gt_upsample, max_raw_training_downsample, max_raw_training_upsample, max_raw_validation_downsample, max_raw_validation_upsample, min_training_volume_size, raw_min, raw_max, classes_separator_character, use_negative_class, max_validation_volume_size, binarize_gt)

      
      Initializes the DataSplitGenerator class with the specified name, datasets, input resolution, output resolution, targets, segmentation type, maximum ground truth downsample, maximum ground truth upsample, maximum raw training downsample, maximum raw training upsample, maximum raw validation downsample, maximum raw validation upsample, minimum training volume size, minimum raw value, maximum raw value, classes separator character, ``use_negative_class``, maximum validation volume size, and ``binarize_gt``.

   .. method:: __str__(self)

      
      A method to get the string representation of the class.

   .. method:: class_name(self)

      
      A method to get the class name.

   .. method:: check_class_name(self, class_name)

      
      A method to check the class name.

   .. method:: compute(self)

      
      A method to compute the data split.

   .. method:: __generate_semantic_seg_datasplit(self)

      
      A method to generate the semantic segmentation data split.

   .. method:: __generate_semantic_seg_dataset_crop(self, dataset)

      
      A method to generate the semantic segmentation dataset crop.

   .. method:: generate_csv(datasets, csv_path)

      
      A method to generate the CSV file.

   .. method:: generate_from_csv(csv_path, input_resolution, output_resolution, name, **kwargs)

      
      A method to generate the data split from the CSV file.

   .. rubric:: Notes

   - This class is used to generate the DataSplitConfig for a given task config and datasets.
   - Class names in gt_dataset should be within [] e.g. [mito&peroxisome&er] for multiple classes or [mito] for one class.


   .. py:attribute:: name


   .. py:attribute:: datasets


   .. py:attribute:: input_resolution


   .. py:attribute:: output_resolution


   .. py:attribute:: targets


   .. py:attribute:: segmentation_type


   .. py:attribute:: max_gt_downsample


   .. py:attribute:: max_gt_upsample


   .. py:attribute:: max_raw_training_downsample


   .. py:attribute:: max_raw_training_upsample


   .. py:attribute:: max_raw_validation_downsample


   .. py:attribute:: max_raw_validation_upsample


   .. py:attribute:: min_training_volume_size


   .. py:attribute:: raw_min


   .. py:attribute:: raw_max


   .. py:attribute:: classes_separator_character


   .. py:attribute:: use_negative_class


   .. py:attribute:: max_validation_volume_size


   .. py:attribute:: binarize_gt


   .. py:property:: class_name
      Get the class name.

      :param self: obj
                   The object.

      :returns: The class name.
      :rtype: obj

      :raises ValueError: If the class name is already set.

      .. rubric:: Examples

      >>> class_name

      .. rubric:: Notes

      This function is used to get the class name.


   .. py:method:: check_class_name(class_name)

      Check the class name.

      :param self: obj
                   The object.
      :param class_name: obj
                         The class name.

      :returns: The class name.
      :rtype: obj

      :raises ValueError: If the class name is already set.

      .. rubric:: Examples

      >>> check_class_name(class_name)

      .. rubric:: Notes

      This function is used to check the class name.


   .. py:method:: compute()

      Compute the data split.

      :param self: obj
                   The object.

      :returns: The data split.
      :rtype: obj

      :raises NotImplementedError: If the segmentation type is not implemented.

      .. rubric:: Examples

      >>> compute()

      .. rubric:: Notes

      This function is used to compute the data split.


   .. py:method:: generate_from_csv(csv_path: upath.UPath, input_resolution: Union[Sequence[int], funlib.geometry.Coordinate], output_resolution: Union[Sequence[int], funlib.geometry.Coordinate], name: Optional[str] = None, **kwargs)
      :staticmethod:


      Generate the data split from the CSV file.

      :param csv_path: obj
                       The CSV file path.
      :param input_resolution: obj
                               The input resolution.
      :param output_resolution: obj
                                The output resolution.
      :param name: str
                   The name.
      :param \*\*kwargs: dict
                         The keyword arguments.

      :returns: The data split.
      :rtype: obj

      :raises FileNotFoundError: If the file does not exist.

      .. rubric:: Examples

      >>> generate_from_csv(csv_path, input_resolution, output_resolution, name, **kwargs)

      .. rubric:: Notes

      This function is used to generate the data split from the CSV file.


.. py:function:: format_class_name(class_name, separator_character='&', targets=None)

   Format the class name.

   :param class_name: obj
                      The class name.
   :param separator_character: str
                               The separator character.

   :returns: The class name.
   :rtype: obj

   :raises ValueError: If the class name is invalid, a ValueError is raised.

   .. rubric:: Examples

   >>> format_class_name(class_name, separator_character)

   .. rubric:: Notes

   This function is used to format the class name.
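
The bracket convention mentioned for ``gt_dataset`` (``[mito&peroxisome&er]`` for several classes, ``[mito]`` for one) can be parsed as in the following sketch. It only shows how such a string might be split into class names and is not the implementation of ``format_class_name``; the helper name ``split_bracketed_classes`` is hypothetical, and the handling of the ``targets`` argument is not covered.

```python
from typing import List


def split_bracketed_classes(
    class_name: str, separator_character: str = "&"
) -> List[str]:
    """Hypothetical helper: extract class names from a '[a&b&c]' style string."""
    start = class_name.find("[")
    end = class_name.find("]", start + 1)
    if start == -1 or end == -1:
        raise ValueError(f"class names must be enclosed in []: {class_name!r}")
    # Take the text between the brackets and split it on the separator.
    inner = class_name[start + 1 : end]
    classes = [part for part in inner.split(separator_character) if part]
    if not classes:
        raise ValueError(f"no class names found in {class_name!r}")
    return classes


multi = split_bracketed_classes("[mito&peroxisome&er]")
single = split_bracketed_classes("[mito]")

try:
    split_bracketed_classes("mito")
except ValueError:
    rejected = True
else:
    rejected = False
```

Raising ``ValueError`` for a string without brackets mirrors the documented behavior of rejecting invalid class names.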