dacapo.experiments

Classes

`Model`	A trainable DaCapo model. Consists of an `Architecture` and a prediction head.
`RunConfig`	A class to represent a configuration of a run that helps to structure all the tasks, architecture, training, and datasplit configurations.
`TrainingIterationStats`	A class to represent the training iteration statistics. It contains the loss and time taken for each iteration.
`TrainingStats`	A class used to represent Training Statistics. It contains a list of training iteration statistics.
`ValidationIterationScores`	A class used to represent the validation iteration scores in an organized structure.
`ValidationScores`	Class representing the validation scores for a set of parameters and datasets.

Package Contents

class dacapo.experiments.Model(architecture: dacapo.experiments.architectures.architecture.Architecture, prediction_head: torch.nn.Module, eval_activation: torch.nn.Module | None = None)

A trainable DaCapo model. Consists of an Architecture and a prediction head. Models are generated by Predictors.

May include an optional eval_activation that is only executed when the model is in eval mode. This is particularly useful if you want to train with a loss such as BCEWithLogitsLoss, which expects raw logits: the final activation (for example a sigmoid) is skipped during training but applied during evaluation.
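The eval-only activation described above can be illustrated with a minimal, dependency-free sketch. `TinyModel` and `sigmoid` are hypothetical stand-ins, not part of DaCapo; in the real class the pieces are `torch.nn.Module` objects and the train/eval switch is the module's own mode.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]


def sigmoid(values: Vector) -> Vector:
    """Element-wise logistic function, used here as the eval-only activation."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


class TinyModel:
    """Hypothetical stand-in for a model with an eval-only activation."""

    def __init__(
        self,
        chain: Callable[[Vector], Vector],
        eval_activation: Optional[Callable[[Vector], Vector]] = None,
    ) -> None:
        self.chain = chain
        self.eval_activation = eval_activation
        self.training = True  # mirrors the train/eval flag of a torch module

    def eval(self) -> "TinyModel":
        """Switch to evaluation mode and return self for chaining."""
        self.training = False
        return self

    def forward(self, x: Vector) -> Vector:
        result = self.chain(x)
        # Apply the activation only when evaluating, so that a loss that
        # expects raw logits can be used during training.
        if not self.training and self.eval_activation is not None:
            result = self.eval_activation(result)
        return result


model = TinyModel(chain=lambda x: [2.0 * v for v in x], eval_activation=sigmoid)
train_out = model.forward([0.0, 1.0])        # raw logits while training
eval_out = model.eval().forward([0.0, 1.0])  # activated values in eval mode
```

In training mode the caller sees the untouched output of the chain; after switching to eval mode the same call returns activated values.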

architecture

The architecture of the model.

Type:: Architecture

prediction_head

The prediction head of the model.

Type:: torch.nn.Module

chain

The architecture followed by the prediction head.

Type:: torch.nn.Sequential

num_in_channels

The number of input channels.

Type:: int

input_shape

The shape of the input tensor.

Type:: Coordinate

eval_input_shape

The shape of the input tensor during evaluation.

Type:: Coordinate

num_out_channels

The number of output channels.

Type:: int

output_shape

The shape of the output

Type:: Coordinate

eval_activation

The activation function to apply during evaluation.

Type:: torch.nn.Module | None

forward(x: torch.Tensor) -> torch.Tensor: Forward pass of the model.

compute_output_shape(input_shape: Coordinate) -> Tuple[int, Coordinate]: Compute the spatial shape of this model, when fed a tensor of the given spatial shape as input.

scale(voxel_size: Coordinate) -> Coordinate: Scale the model by the given voxel size.

Note

The output shape is the spatial shape of the model, i.e., not accounting for channels and batch dimensions.

num_out_channels: int

num_in_channels: int

architecture

prediction_head

chain

input_shape

eval_input_shape

eval_activation

forward(x)

Forward pass of the model.

Parameters:: x (torch.Tensor) – The input tensor.
Returns:: The output tensor.
Return type:: torch.Tensor

Examples

>>> model = Model(architecture, prediction_head)
>>> model.forward(x)
torch.Tensor

Note

The eval_activation is only applied when the model is in eval mode. During training, forward returns the raw output of the prediction head.

compute_output_shape(input_shape: funlib.geometry.Coordinate) → Tuple[int, funlib.geometry.Coordinate]

Compute the spatial shape (i.e., not accounting for channels and batch dimensions) of this model, when fed a tensor of the given spatial shape as input.

Parameters:: input_shape (Coordinate) – The shape of the input tensor.
Returns:: The number of output channels and the spatial shape of the output.
Return type:: Tuple[int, Coordinate]
Raises:: AssertionError – If the input_shape is not a Coordinate.

Examples

>>> model = Model(architecture, prediction_head)
>>> model.compute_output_shape(input_shape)
(1, Coordinate(1, 1, 1))

Note

The output shape is the spatial shape of the model, i.e., not accounting for channels and batch dimensions.
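As a rough illustration of what "spatial shape" means here, the sketch below splits a full tensor shape into the channel count and the spatial part. It assumes the usual PyTorch layout of (batch, channels, spatial axes) and uses plain tuples in place of `Coordinate`; `split_output_shape` is a hypothetical helper, not a DaCapo function.

```python
from typing import Tuple


def split_output_shape(full_shape: Tuple[int, ...]) -> Tuple[int, Tuple[int, ...]]:
    """Split a (batch, channels, *spatial) shape into (channels, spatial)."""
    if len(full_shape) < 3:
        raise ValueError("expected batch, channel and at least one spatial axis")
    num_channels = full_shape[1]
    spatial_shape = tuple(full_shape[2:])
    return num_channels, spatial_shape


# A batch of one, three output channels, and a 3D volume of 40 x 48 x 56 voxels.
num_out_channels, output_shape = split_output_shape((1, 3, 40, 48, 56))
```

The returned pair has the same form as the documented return value: the number of output channels first, then the spatial shape.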

scale(voxel_size: funlib.geometry.Coordinate) → funlib.geometry.Coordinate

Scale the given voxel size according to the model.

Parameters:: voxel_size (Coordinate) – The voxel size to scale.
Returns:: The scaled voxel size.
Return type:: Coordinate
Raises:: AssertionError – If the voxel_size is not a Coordinate.

Examples

>>> model = Model(architecture, prediction_head)
>>> model.scale(voxel_size)
Coordinate(1, 1, 1)


class dacapo.experiments.RunConfig

A class to represent a configuration of a run that helps to structure all the tasks, architecture, training, and datasplit configurations.

…

Attributes:

task_config: TaskConfig: A config defining the Task to run that includes deciding the output of the model and different methods to achieve the goal.
architecture_config: ArchitectureConfig: A config that defines the backbone architecture of the model. It impacts the model’s performance significantly.
trainer_config: TrainerConfig: Defines how batches are generated and passed to the model for training, along with settings such as batch size, learning rate, number of CPU workers, and snapshot logging.
datasplit_config: DataSplitConfig: Configures the data available for the model during training or validation phases.
name: str: A unique name for this run to distinguish it.
repetition: int: The repetition number of this run.
num_iterations: int: The total number of iterations to train for during this run.
validation_interval: int: Specifies how often to perform validation during the run. It defaults to 1000.
start_config: Optional[StartConfig]: A starting point for continued training. It is optional and can be left out.
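The attribute list above can be pictured as a plain data container. The sketch below is a hypothetical mirror of the documented fields, not the DaCapo class itself: the four sub-configs are replaced by placeholder strings, and the way `validation_interval` is applied in the last line is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RunConfigSketch:
    """Hypothetical mirror of the documented fields; not the DaCapo class."""

    task_config: Any
    architecture_config: Any
    trainer_config: Any
    datasplit_config: Any
    name: str
    repetition: int
    num_iterations: int
    validation_interval: int = 1000     # documented default
    start_config: Optional[Any] = None  # optional starting point


config = RunConfigSketch(
    task_config="my_task",            # placeholder for a TaskConfig
    architecture_config="my_unet",    # placeholder for an ArchitectureConfig
    trainer_config="my_trainer",      # placeholder for a TrainerConfig
    datasplit_config="my_datasplit",  # placeholder for a DataSplitConfig
    name="example_run",
    repetition=0,
    num_iterations=10_000,
)

# Number of validation passes, assuming validation runs once every
# `validation_interval` iterations.
num_validations = config.num_iterations // config.validation_interval
```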

task_config: dacapo.experiments.tasks.TaskConfig

architecture_config: dacapo.experiments.architectures.ArchitectureConfig

trainer_config: dacapo.experiments.trainers.TrainerConfig

datasplit_config: dacapo.experiments.datasplits.DataSplitConfig

name: str

repetition: int

num_iterations: int

validation_interval: int

start_config: dacapo.experiments.starts.StartConfig | None

class dacapo.experiments.TrainingIterationStats

A class to represent the training iteration statistics. It contains the loss and time taken for each iteration.

iteration

The iteration that produced these stats.

Type:: int

loss

The loss value of this iteration.

Type:: float

time

The time it took to process this iteration.

Type:: float

Note

Each instance holds the statistics of a single training iteration. The instances are collected in the iteration_stats list of TrainingStats.

iteration: int

loss: float

time: float

class dacapo.experiments.TrainingStats

A class used to represent Training Statistics. It contains a list of training iteration statistics. It also provides methods to add new iteration stats, delete stats after a specified iteration, get the number of iterations trained for, and convert the stats to an xarray DataArray.

iteration_stats: List[TrainingIterationStats]: An ordered list of training iteration stats.

add_iteration_stats(iteration_stats: TrainingIterationStats) → None: Add a new set of iteration stats to the existing list of iteration stats.

delete_after(iteration: int) → None: Deletes training stats after a specified iteration number.

trained_until() → int: Gets the number of iterations that the model has been trained for.

to_xarray() → xr.DataArray: Converts the iteration statistics to an xarray DataArray.

Note

The iteration_stats list is flat and ordered: it holds one TrainingIterationStats entry per training iteration.
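The bookkeeping described for this class can be sketched with a small stand-in. `IterationStats` and `StatsSketch` are hypothetical names, not the DaCapo implementation; the consecutive-iteration check in `add_iteration_stats` is an assumption about what "following the order" means, and `delete_after` follows the documented example in which `delete_after(1)` leaves only iteration 0.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IterationStats:
    """Stand-in for TrainingIterationStats (same three fields)."""

    iteration: int
    loss: float
    time: float


@dataclass
class StatsSketch:
    """Stand-in for TrainingStats; not the DaCapo implementation."""

    iteration_stats: List[IterationStats] = field(default_factory=list)

    def add_iteration_stats(self, stats: IterationStats) -> None:
        # Assumption: new stats must directly continue the existing order.
        if self.iteration_stats:
            assert stats.iteration == self.iteration_stats[-1].iteration + 1
        self.iteration_stats.append(stats)

    def delete_after(self, iteration: int) -> None:
        # Keep only stats recorded before `iteration`.
        self.iteration_stats = [
            s for s in self.iteration_stats if s.iteration < iteration
        ]

    def trained_until(self) -> int:
        # Maximum iteration plus one, or zero when nothing has been recorded.
        if not self.iteration_stats:
            return 0
        return self.iteration_stats[-1].iteration + 1


stats = StatsSketch()
empty_count = stats.trained_until()
for i, loss in enumerate([0.1, 0.2, 0.3]):
    stats.add_iteration_stats(IterationStats(iteration=i, loss=loss, time=0.5))
count_before = stats.trained_until()
stats.delete_after(1)
count_after = stats.trained_until()
```

After the three additions the sketch reports three trained iterations; after `delete_after(1)` a single entry (iteration 0) remains.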

iteration_stats: List[dacapo.experiments.training_iteration_stats.TrainingIterationStats]

add_iteration_stats(iteration_stats: dacapo.experiments.training_iteration_stats.TrainingIterationStats) → None

Append a new iteration stats object to the list of iteration stats.

Parameters:: iteration_stats (TrainingIterationStats) – a new iteration stats object.
Raises:: AssertionError – if the new iteration stats do not follow the order of existing iteration stats.

Examples

>>> training_stats = TrainingStats()
>>> training_stats.add_iteration_stats(TrainingIterationStats(0, 0.1, 0.5))
>>> training_stats.add_iteration_stats(TrainingIterationStats(1, 0.2, 0.5))
>>> training_stats.add_iteration_stats(TrainingIterationStats(2, 0.3, 0.5))
>>> training_stats.iteration_stats
[TrainingIterationStats(iteration=0, loss=0.1, time=0.5),
 TrainingIterationStats(iteration=1, loss=0.2, time=0.5),
 TrainingIterationStats(iteration=2, loss=0.3, time=0.5)]


delete_after(iteration: int) → None

Deletes all training stats recorded at or after the specified iteration. Only stats with a smaller iteration number are kept, as the example below shows: delete_after(1) leaves only iteration 0.

Parameters:: iteration (int) – the first iteration whose stats are deleted.

Examples

>>> training_stats = TrainingStats()
>>> training_stats.add_iteration_stats(TrainingIterationStats(0, 0.1, 0.5))
>>> training_stats.add_iteration_stats(TrainingIterationStats(1, 0.2, 0.5))
>>> training_stats.add_iteration_stats(TrainingIterationStats(2, 0.3, 0.5))
>>> training_stats.delete_after(1)
>>> training_stats.iteration_stats
[TrainingIterationStats(iteration=0, loss=0.1, time=0.5)]


trained_until() → int

The number of iterations trained for (the maximum iteration plus one). Returns zero if no iterations have been trained yet.

Returns:: number of iterations that the model has been trained for.
Return type:: int

Examples

>>> training_stats = TrainingStats()
>>> training_stats.add_iteration_stats(TrainingIterationStats(0, 0.1, 0.5))
>>> training_stats.add_iteration_stats(TrainingIterationStats(1, 0.2, 0.5))
>>> training_stats.add_iteration_stats(TrainingIterationStats(2, 0.3, 0.5))
>>> training_stats.trained_until()
3


to_xarray() → xarray.DataArray

Converts the iteration stats to an xarray DataArray, a format that is easy to manipulate.

Returns:: xarray DataArray of iteration losses.
Return type:: xr.DataArray
Raises:: AssertionError – if the iteration stats list is empty.

Examples

>>> training_stats = TrainingStats()
>>> training_stats.add_iteration_stats(TrainingIterationStats(0, 0.1, 0.5))
>>> training_stats.add_iteration_stats(TrainingIterationStats(1, 0.2, 0.5))
>>> training_stats.add_iteration_stats(TrainingIterationStats(2, 0.3, 0.5))
>>> training_stats.to_xarray()
<xarray.DataArray (iterations: 3)>
array([0.1, 0.2, 0.3])
Coordinates:
  * iterations  (iterations) int64 0 1 2


class dacapo.experiments.ValidationIterationScores

A class used to represent the validation iteration scores in an organized structure.

iteration

The iteration associated with these validation scores.

Type:: int

scores

A list of scores per dataset, post processor parameters, and evaluation criterion.

Type:: List[List[List[float]]]

Note

The scores list is structured as follows:

- The outer list contains the scores for each dataset.
- The middle list contains the scores for each post processor parameter.
- The inner list contains the scores for each evaluation criterion.
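The three nesting levels can be made concrete with a small lookup sketch. The dataset names, parameter labels, criteria, and numbers below are all hypothetical illustration values; only the `scores[dataset][parameter][criterion]` ordering comes from the description above.

```python
from typing import List

datasets = ["val_a", "val_b"]                    # hypothetical dataset names
parameters = ["threshold=0.3", "threshold=0.5"]  # hypothetical parameter sets
criteria = ["f1", "iou"]                         # hypothetical criteria

# scores[dataset][parameter][criterion]
scores: List[List[List[float]]] = [
    [[0.71, 0.55], [0.78, 0.64]],  # val_a
    [[0.69, 0.52], [0.74, 0.60]],  # val_b
]


def lookup(dataset: str, parameter: str, criterion: str) -> float:
    """Return a single score by walking the three nesting levels in order."""
    d = datasets.index(dataset)
    p = parameters.index(parameter)
    c = criteria.index(criterion)
    return scores[d][p][c]


iou_val_b = lookup("val_b", "threshold=0.5", "iou")
```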

iteration: int

scores: List[List[List[float]]]

class dacapo.experiments.ValidationScores

Class representing the validation scores for a set of parameters and datasets.

parameters

The list of parameters that are being evaluated.

Type:: List[PostProcessorParameters]

datasets

The datasets that will be evaluated at each iteration.

Type:: List[Dataset]

evaluation_scores

The scores that are collected on each iteration per PostProcessorParameters and Dataset.

Type:: EvaluationScores

scores

A list of evaluation scores and their associated post-processing parameters.

Type:: List[ValidationIterationScores]

subscores(iteration_scores): Create a new ValidationScores object with a subset of the iteration scores.

add_iteration_scores(iteration_scores): Add iteration scores to the list of scores.

delete_after(iteration): Delete scores after a specified iteration.

validated_until(): Get the number of iterations validated for (the maximum iteration plus one).

compare(existing_iteration_scores): Compare iteration stats provided from elsewhere to scores we have saved locally.

criteria(): Get the list of evaluation criteria.

parameter_names(): Get the list of parameter names.

to_xarray(): Convert the validation scores to an xarray DataArray.

get_best(data, dim): Compute the Best scores along dimension “dim” per criterion.

Notes

The scores attribute is a list of ValidationIterationScores objects, each of which contains the scores for a single iteration.

parameters: List[dacapo.experiments.tasks.post_processors.PostProcessorParameters]

datasets: List[dacapo.experiments.datasplits.datasets.Dataset]

evaluation_scores: dacapo.experiments.tasks.evaluators.EvaluationScores

scores: List[dacapo.experiments.validation_iteration_scores.ValidationIterationScores]

subscores(iteration_scores: List[dacapo.experiments.validation_iteration_scores.ValidationIterationScores]) → ValidationScores

Create a new ValidationScores object with a subset of the iteration scores.

Parameters:: iteration_scores – The iteration scores to include in the new ValidationScores object.
Returns:: A new ValidationScores object with the specified iteration scores.
Raises:: ValueError – If the iteration scores are not in the list of scores.

Examples

>>> validation_scores.subscores([validation_scores.scores[0]])

Note

This is useful when you want a ValidationScores object that only contains the scores up to a certain iteration.

add_iteration_scores(iteration_scores: dacapo.experiments.validation_iteration_scores.ValidationIterationScores) → None

Add iteration scores to the list of scores.

Parameters:: iteration_scores – The iteration scores to add.
Raises:: ValueError – If the iteration scores are already in the list of scores.

Examples

>>> validation_scores.add_iteration_scores(ValidationIterationScores(iteration=1000, scores=[[[0.9]]]))

Note

Use this to record the scores of a newly validated iteration in the ValidationScores object.

delete_after(iteration: int) → None

Delete scores after a specified iteration.

Parameters:: iteration – The iteration after which to delete the scores.
Raises:: ValueError – If the iteration scores are not in the list of scores.

Examples

>>> validation_scores.delete_after(0)


validated_until() → int

Get the number of iterations validated for (the maximum iteration plus one).

Returns:: The number of iterations validated for.
Raises:: ValueError – If there are no scores.

Examples

>>> validation_scores.validated_until()


compare(existing_iteration_scores: List[dacapo.experiments.validation_iteration_scores.ValidationIterationScores]) → Tuple[bool, int]

Compare iteration stats provided from elsewhere to scores we have saved locally. Local scores take priority. If local scores are at a lower iteration than the existing ones, delete the existing ones and replace with local. If local iteration > existing iteration, just update existing scores with the last overhanging local scores.

Parameters:: existing_iteration_scores – The existing iteration scores to compare with.
Returns:: A tuple indicating whether the local scores should replace the existing ones and the existing iteration number.
Raises:: ValueError – If the iteration scores are not in the list of scores.

Examples

>>> validation_scores.compare([validation_scores.scores[0]])

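One possible reading of the comparison rule for this method is sketched below on bare iteration numbers. `compare_iterations` and `validated_until` are hypothetical helpers, not the DaCapo implementation, and the meaning of the returned pair (replace flag first, then the iteration from which local scores would be appended) is an assumption based on the description above.

```python
from typing import List, Tuple


def validated_until(iterations: List[int]) -> int:
    """Maximum iteration plus one, or zero if nothing has been validated."""
    return max(iterations) + 1 if iterations else 0


def compare_iterations(
    local_iterations: List[int], existing_iterations: List[int]
) -> Tuple[bool, int]:
    """Return (replace_existing, existing_iteration) for two score histories."""
    if not existing_iterations:
        # Nothing stored elsewhere: append all local scores from iteration 0.
        return False, 0
    existing = validated_until(existing_iterations)
    local = validated_until(local_iterations)
    if existing > local:
        # Local scores are behind: discard the existing ones and replace them.
        return True, 0
    # Local scores are at or ahead of the existing ones: only the overhanging
    # local scores, from `existing` onwards, need to be appended.
    return False, existing


ahead = compare_iterations([0, 1000, 2000], [0, 1000])
behind = compare_iterations([0, 1000], [0, 1000, 2000])
fresh = compare_iterations([0], [])
```

In the first case only the local score at iteration 2000 would be appended; in the second the existing scores would be replaced wholesale.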

property criteria: List[str]

Get the list of evaluation criteria.

Returns:: The list of evaluation criteria.
Raises:: ValueError – If there are no scores.

Examples

>>> validation_scores.criteria

Note

This is useful when you want to know which criteria are used to evaluate the scores.

property parameter_names: List[str]

Get the list of parameter names.

Returns:: The list of parameter names.
Raises:: ValueError – If there are no scores.

Examples

>>> validation_scores.parameter_names

Note

This is useful when you want to know which post-processing parameters the scores were collected for.

to_xarray() → xarray.DataArray

Convert the validation scores to an xarray DataArray.

Returns:: An xarray DataArray representing the validation scores.
Raises:: ValueError – If there are no scores.

Examples

>>> validation_scores.to_xarray()


get_best(data: xarray.DataArray, dim: str) → Tuple[xarray.DataArray, xarray.DataArray]

Compute the best scores along dimension “dim” per criterion. Returns both the index associated with the best value, and the best value, in two separate arrays.

Parameters:

data – The data array to compute the best scores from.
dim – The dimension along which to compute the best scores.

Returns:

A tuple containing the index associated with the best value and the best value in two separate arrays.

Raises:

ValueError – If the criteria are not in the data array.

Examples

>>> validation_scores.get_best(data, "iterations")

Note

This is useful when you want to know the best scores for a given data array. Known limitation: the method currently does not handle the case where the criteria are not in the data array. A check that raises an error in that case still needs to be added.
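The idea of returning the best index and the best value per criterion can be sketched without xarray. `best_per_criterion` is a hypothetical helper working on plain dictionaries, not the DaCapo method; the `higher_is_better` mapping is an assumption about how the direction of each criterion is known, and the criterion names and numbers are made up.

```python
from typing import Dict, List, Tuple


def best_per_criterion(
    scores: Dict[str, List[float]],
    higher_is_better: Dict[str, bool],
) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Return the best index and the best value for every criterion.

    `scores[criterion][i]` is the score at position i along the reduced
    dimension (for example, the i-th validated iteration).
    """
    missing = [c for c in scores if c not in higher_is_better]
    if missing:
        raise ValueError(f"no direction known for criteria: {missing}")
    best_index: Dict[str, int] = {}
    best_value: Dict[str, float] = {}
    for criterion, values in scores.items():
        # Maximise or minimise depending on the direction of the criterion.
        pick = max if higher_is_better[criterion] else min
        index = pick(range(len(values)), key=values.__getitem__)
        best_index[criterion] = index
        best_value[criterion] = values[index]
    return best_index, best_value


best_index, best_value = best_per_criterion(
    {"f1": [0.61, 0.74, 0.70], "voi": [1.9, 1.2, 1.4]},
    {"f1": True, "voi": False},
)
```

The sketch also shows one way to handle the missing-criteria case mentioned in the note: it checks up front and raises a ValueError.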