dacapo.experiments.datasplits.datasplit_generator

Attributes

logger

Classes

CustomEnumMeta

Custom Enum Meta class to raise KeyError when an invalid option is passed.

CustomEnum

A custom Enum class to raise KeyError when an invalid option is passed.

DatasetType

An Enum class to represent the dataset type. It is derived from CustomEnum class.

SegmentationType

An Enum class to represent the segmentation type. It is derived from CustomEnum class.

DatasetSpec

A class for dataset specification. It is used to specify the dataset.

DataSplitGenerator

Generates DataSplitConfig for a given task config and datasets.

Functions

is_zarr_group(file_name, dataset)

Check if the dataset is a Zarr group. If the dataset is a Zarr group, it will return True, otherwise False.

resize_if_needed(array_config, target_resolution[, ...])

Resize the array if needed. If the array needs to be resized, it will return the resized array, otherwise it will return the original array.

limit_validation_crop_size(gt_config, mask_config, ...)

get_right_resolution_array_config(container, dataset, ...)

Get the right resolution array configuration. It will return the right resolution array configuration.

generate_dataspec_from_csv(csv_path)

Generate the dataset specification from the CSV file. It will return the dataset specification.

format_class_name(class_name[, separator_character, ...])

Format the class name.

Module Contents

dacapo.experiments.datasplits.datasplit_generator.logger
dacapo.experiments.datasplits.datasplit_generator.is_zarr_group(file_name: upath.UPath, dataset: str)

Check if the dataset is a Zarr group. If the dataset is a Zarr group, it will return True, otherwise False.

Parameters:
  • file_name – upath.UPath The path to the container file.

  • dataset – str The name of the dataset.

Returns:

True if the dataset is a Zarr group, otherwise False.

Return type:

bool

Raises:

FileNotFoundError – If the file does not exist, a FileNotFoundError is raised.

Examples

>>> is_zarr_group(file_name, dataset)

Notes

This function is used to check if the dataset is a Zarr group.

dacapo.experiments.datasplits.datasplit_generator.resize_if_needed(array_config: dacapo.experiments.datasplits.datasets.arrays.ZarrArrayConfig, target_resolution: funlib.geometry.Coordinate, extra_str='')

Resize the array if needed. If the array needs to be resized, it will return the resized array, otherwise it will return the original array.

Parameters:
  • array_config – obj The configuration of the array.

  • target_resolution – obj The target resolution.

  • extra_str – str An extra string.

Returns:

The resized array if needed, otherwise the original array.

Return type:

obj

Raises:

FileNotFoundError – If the file does not exist, a FileNotFoundError is raised.

Examples

>>> resize_if_needed(array_config, target_resolution, extra_str)

Notes

This function is used to resize the array if needed.
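The resize decision described above can be illustrated without any array backend. The following is a minimal sketch under stated assumptions, not the library's implementation: `resample_ratios` and `needs_resize` are hypothetical helpers, and voxel sizes are plain integer tuples rather than funlib.geometry.Coordinate objects.

```python
from fractions import Fraction


def resample_ratios(voxel_size, target_resolution):
    """Per-axis ratio of the stored voxel size to the target resolution.

    Hypothetical helper: a ratio above 1 means the stored data is coarser
    than the target, a ratio below 1 means it is finer.
    """
    if len(voxel_size) != len(target_resolution):
        raise ValueError("voxel_size and target_resolution must have the same number of axes")
    return tuple(Fraction(v, t) for v, t in zip(voxel_size, target_resolution))


def needs_resize(voxel_size, target_resolution):
    """Return True when any axis differs from the target resolution."""
    return any(ratio != 1 for ratio in resample_ratios(voxel_size, target_resolution))


print(needs_resize((8, 8, 8), (8, 8, 8)))  # False: the array can be used as-is
print(needs_resize((8, 8, 8), (4, 4, 4)))  # True: the array would need resampling
```

When the check is False the original configuration can be returned unchanged, which matches the "otherwise it will return the original array" behaviour documented above.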

dacapo.experiments.datasplits.datasplit_generator.limit_validation_crop_size(gt_config, mask_config, max_size)
dacapo.experiments.datasplits.datasplit_generator.get_right_resolution_array_config(container: upath.UPath, dataset, target_resolution, extra_str='')

Get the right resolution array configuration. It will return the right resolution array configuration.

Parameters:
  • container – obj The container.

  • dataset – str The dataset.

  • target_resolution – obj The target resolution.

  • extra_str – str An extra string.

Returns:

The right resolution array configuration.

Return type:

obj

Raises:

FileNotFoundError – If the file does not exist, a FileNotFoundError is raised.

Examples

>>> get_right_resolution_array_config(container, dataset, target_resolution, extra_str)

Notes

This function is used to get the right resolution array configuration.

class dacapo.experiments.datasplits.datasplit_generator.CustomEnumMeta

Custom Enum Meta class to raise KeyError when an invalid option is passed.

_member_names_

list The list of member names.

__getitem__(self, item)

A method to get the item.

Notes

This class is used to raise KeyError when an invalid option is passed.

class dacapo.experiments.datasplits.datasplit_generator.CustomEnum

A custom Enum class to raise KeyError when an invalid option is passed.

__str__

str The string representation of the class.

__str__(self)

A method to get the string representation of the class.

Notes

This class is used to raise KeyError when an invalid option is passed.
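A metaclass with the behaviour described for CustomEnumMeta and CustomEnum could look roughly like the sketch below. It is built only on the standard library `enum` module; `StrictEnumMeta`, `StrictEnum` and `ExampleType` are stand-in names chosen for illustration, and the error message wording is an assumption.

```python
from enum import Enum, EnumMeta


class StrictEnumMeta(EnumMeta):
    """Stand-in metaclass: name lookup fails with a message listing valid options."""

    def __getitem__(cls, item):
        if item not in cls._member_names_:
            raise KeyError(
                f"{item!r} is not a valid option of {cls.__name__}; "
                f"valid options are {cls._member_names_}"
            )
        return super().__getitem__(item)


class StrictEnum(Enum, metaclass=StrictEnumMeta):
    """Stand-in base class whose string form is the member name."""

    def __str__(self):
        return self.name


class ExampleType(StrictEnum):
    # Same member names and values as the DatasetType entries documented on this page.
    val = 1
    train = 2


print(str(ExampleType["train"]))  # train
```

Looking up an unknown name such as `ExampleType["test"]` raises a KeyError whose message lists `val` and `train`, which is easier to act on than the bare key.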

class dacapo.experiments.datasplits.datasplit_generator.DatasetType

An Enum class to represent the dataset type. It is derived from CustomEnum class.

val

int The validation dataset type.

train

int The training dataset type.

__str__(self)

A method to get the string representation of the class.

Notes

This class is used to represent the dataset type.

val = 1
train = 2
class dacapo.experiments.datasplits.datasplit_generator.SegmentationType

An Enum class to represent the segmentation type. It is derived from CustomEnum class.

semantic

int The semantic segmentation type.

instance

int The instance segmentation type.

__str__(self)

A method to get the string representation of the class.

Notes

This class is used to represent the segmentation type.

semantic = 1
instance = 2
class dacapo.experiments.datasplits.datasplit_generator.DatasetSpec(dataset_type: str | DatasetType, raw_container: str | upath.UPath, raw_dataset: str, gt_container: str | upath.UPath, gt_dataset: str)

A class for dataset specification. It is used to specify the dataset.

dataset_type

obj The dataset type.

raw_container

obj The raw container.

raw_dataset

str The raw dataset.

gt_container

obj The ground truth container.

gt_dataset

str The ground truth dataset.

__init__(dataset_type, raw_container, raw_dataset, gt_container, gt_dataset)

Initializes the DatasetSpec class with the specified dataset type, raw container, raw dataset, ground truth container, and ground truth dataset.

__str__(self)

A method to get the string representation of the class.

Notes

This class is used to specify the dataset.

dataset_type
raw_container
raw_dataset
gt_container
gt_dataset
dacapo.experiments.datasplits.datasplit_generator.generate_dataspec_from_csv(csv_path: upath.UPath)

Generate the dataset specification from the CSV file. It will return the dataset specification.

Parameters:

csv_path – obj The CSV file path.

Returns:

The dataset specification.

Return type:

list

Raises:

FileNotFoundError – If the file does not exist, a FileNotFoundError is raised.

Examples

>>> generate_dataspec_from_csv(csv_path)

Notes

This function is used to generate the dataset specification from the CSV file.
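As an illustration of the CSV-to-specification step, here is a self-contained sketch. It assumes one dataset per row with five comma-separated columns in the same order as the DatasetSpec constructor arguments; that layout, the `ExampleSpec` dataclass, the `read_specs` helper and the file paths are all assumptions made for the example, not the library's own code.

```python
import csv
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ExampleSpec:
    """Stand-in for DatasetSpec, holding the five documented fields as strings."""

    dataset_type: str
    raw_container: str
    raw_dataset: str
    gt_container: str
    gt_dataset: str


def read_specs(csv_path):
    """Read one ExampleSpec per non-empty CSV row (assumed column order)."""
    csv_path = Path(csv_path)
    if not csv_path.exists():
        raise FileNotFoundError(f"CSV file {csv_path} does not exist.")
    specs = []
    with csv_path.open(newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            specs.append(ExampleSpec(*(field.strip() for field in row)))
    return specs


# Write a small hypothetical CSV and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "datasets.csv"
    path.write_text(
        "train,/data/a.zarr,raw,/data/a.zarr,labels/[mito]\n"
        "val,/data/b.zarr,raw,/data/b.zarr,labels/[mito]\n"
    )
    specs = read_specs(path)

print([spec.dataset_type for spec in specs])  # ['train', 'val']
```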

class dacapo.experiments.datasplits.datasplit_generator.DataSplitGenerator(name: str, datasets: List[DatasetSpec], input_resolution: Sequence[int] | funlib.geometry.Coordinate, output_resolution: Sequence[int] | funlib.geometry.Coordinate, targets: List[str] | None = None, segmentation_type: str | SegmentationType = 'semantic', max_gt_downsample=32, max_gt_upsample=4, max_raw_training_downsample=16, max_raw_training_upsample=2, max_raw_validation_downsample=8, max_raw_validation_upsample=2, min_training_volume_size=8000, raw_min=0, raw_max=255, classes_separator_character='&', use_negative_class=False, max_validation_volume_size=None, binarize_gt=False)

Generates DataSplitConfig for a given task config and datasets.

Class names in gt_dataset should be within [] e.g. [mito&peroxisome&er] for multiple classes or [mito] for one class.

Currently only supports:
  • semantic segmentation.

Supports:
  • 2D and 3D datasets.

  • Zarr, N5 and OME-Zarr datasets.

  • Multi class targets.

  • Different resolutions for raw and ground truth datasets.

  • Different resolutions for training and validation datasets.

name

str The name of the data split generator.

datasets

list The list of dataset specifications.

input_resolution

obj The input resolution.

output_resolution

obj The output resolution.

targets

list The list of targets.

segmentation_type

obj The segmentation type.

max_gt_downsample

int The maximum ground truth downsample.

max_gt_upsample

int The maximum ground truth upsample.

max_raw_training_downsample

int The maximum raw training downsample.

max_raw_training_upsample

int The maximum raw training upsample.

max_raw_validation_downsample

int The maximum raw validation downsample.

max_raw_validation_upsample

int The maximum raw validation upsample.

min_training_volume_size

int The minimum training volume size.

raw_min

int The minimum raw value.

raw_max

int The maximum raw value.

classes_separator_character

str The classes separator character.

max_validation_volume_size

int The maximum validation volume size, in voxels. Defaults to None, in which case the validation volume size is not limited; otherwise it is limited to the specified value, e.g. 600**3 for a volume of 600^3 voxels (216_000_000 voxels).
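The voxel arithmetic behind max_validation_volume_size can be checked with a few lines. This is only a sketch: `voxel_count` and `exceeds_limit` are hypothetical helpers, and whether the library treats a volume exactly at the limit as acceptable is an assumption here.

```python
import math


def voxel_count(shape):
    """Total number of voxels in a volume with the given shape."""
    return math.prod(shape)


def exceeds_limit(shape, max_validation_volume_size):
    """True if the volume is larger than the limit; None means no limit."""
    if max_validation_volume_size is None:
        return False
    return voxel_count(shape) > max_validation_volume_size


limit = 600**3
print(limit)                                   # 216000000
print(exceeds_limit((600, 600, 600), limit))   # False: exactly at the limit
print(exceeds_limit((700, 600, 600), limit))   # True: would need to be cropped
```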

__init__(name, datasets, input_resolution, output_resolution, targets, segmentation_type, max_gt_downsample, max_gt_upsample, max_raw_training_downsample, max_raw_training_upsample, max_raw_validation_downsample, max_raw_validation_upsample, min_training_volume_size, raw_min, raw_max, classes_separator_character)

Initializes the DataSplitGenerator class with the specified name, datasets, input resolution, output resolution, targets, segmentation type, maximum ground truth downsample, maximum ground truth upsample, maximum raw training downsample, maximum raw training upsample, maximum raw validation downsample, maximum raw validation upsample, minimum training volume size, minimum raw value, maximum raw value, and classes separator character.

__str__(self)

A method to get the string representation of the class.

class_name(self)

A method to get the class name.

check_class_name(self, class_name)

A method to check the class name.

compute(self)

A method to compute the data split.

__generate_semantic_seg_datasplit(self)

A method to generate the semantic segmentation data split.

__generate_semantic_seg_dataset_crop(self, dataset)

A method to generate the semantic segmentation dataset crop.

generate_csv(datasets, csv_path)

A method to generate the CSV file.

generate_from_csv(csv_path, input_resolution, output_resolution, name, **kwargs)

A method to generate the data split from the CSV file.

Notes

  • This class is used to generate the DataSplitConfig for a given task config and datasets.

  • Class names in gt_dataset should be within [], e.g. [mito&peroxisome&er] for multiple classes or [mito] for one class.

name
datasets
input_resolution
output_resolution
targets
segmentation_type
max_gt_downsample
max_gt_upsample
max_raw_training_downsample
max_raw_training_upsample
max_raw_validation_downsample
max_raw_validation_upsample
min_training_volume_size
raw_min
raw_max
classes_separator_character
use_negative_class
max_validation_volume_size
binarize_gt
property class_name

Get the class name.

Parameters:

self – obj The object.

Returns:

The class name.

Return type:

obj

Raises:

ValueError – If the class name is already set, a ValueError is raised.

Examples

>>> class_name

Notes

This function is used to get the class name.

check_class_name(class_name)

Check the class name.

Parameters:
  • self – obj The object.

  • class_name – obj The class name.

Returns:

The class name.

Return type:

obj

Raises:

ValueError – If the class name is already set, a ValueError is raised.

Examples

>>> check_class_name(class_name)

Notes

This function is used to check the class name.

compute()

Compute the data split.

Parameters:

self – obj The object.

Returns:

The data split.

Return type:

obj

Raises:

NotImplementedError – If the segmentation type is not implemented, a NotImplementedError is raised.

Examples

>>> compute()

Notes

This function is used to compute the data split.

static generate_from_csv(csv_path: upath.UPath, input_resolution: Sequence[int] | funlib.geometry.Coordinate, output_resolution: Sequence[int] | funlib.geometry.Coordinate, name: str | None = None, **kwargs)

Generate the data split from the CSV file.

Parameters:
  • csv_path – obj The CSV file path.

  • input_resolution – obj The input resolution.

  • output_resolution – obj The output resolution.

  • name – str The name.

  • **kwargs – dict The keyword arguments.

Returns:

The data split.

Return type:

obj

Raises:

FileNotFoundError – If the file does not exist, a FileNotFoundError is raised.

Examples

>>> generate_from_csv(csv_path, input_resolution, output_resolution, name, **kwargs)

Notes

This function is used to generate the data split from the CSV file.

dacapo.experiments.datasplits.datasplit_generator.format_class_name(class_name, separator_character='&', targets=None)

Format the class name.

Parameters:
  • class_name – obj The class name.

  • separator_character – str The separator character.

  • targets – list The list of targets. Defaults to None.

Returns:

The class name.

Return type:

obj

Raises:

ValueError – If the class name is invalid, a ValueError is raised.

Examples

>>> format_class_name(class_name, separator_character)

Notes

This function is used to format the class name.
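The bracket convention for class names (for example [mito&peroxisome&er] or [mito]) can be illustrated with a small parser. This sketch does not reproduce format_class_name or its return value; `parse_bracketed_classes` is a hypothetical helper that only shows how the bracketed, separator-delimited names might be extracted and validated.

```python
def parse_bracketed_classes(dataset_name, separator="&"):
    """Extract class names from the first [...] group in a dataset name.

    Hypothetical helper: raises ValueError when the brackets are missing,
    unbalanced, or empty.
    """
    start = dataset_name.find("[")
    end = dataset_name.find("]", start + 1)
    if start == -1 or end == -1:
        raise ValueError(f"Invalid class name {dataset_name!r}: expected names within []")
    inner = dataset_name[start + 1:end]
    classes = [name for name in inner.split(separator) if name]
    if not classes:
        raise ValueError(f"Invalid class name {dataset_name!r}: no classes within []")
    return classes


print(parse_bracketed_classes("[mito&peroxisome&er]"))  # ['mito', 'peroxisome', 'er']
print(parse_bracketed_classes("labels/[mito]"))         # ['mito']
```

Passing a different `separator` corresponds to changing classes_separator_character on the generator.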