dacapo.experiments

Subpackages

Submodules

Classes

Model

A trainable DaCapo model. Consists of an Architecture and a prediction head.

RunConfig

A class to represent a configuration of a run that helps to structure all the tasks, architecture, training, and datasplit configurations.

TrainingIterationStats

A class to represent the training iteration statistics. It contains the loss and time taken for each iteration.

TrainingStats

A class used to represent Training Statistics. It contains a list of training iteration statistics.

ValidationIterationScores

A class used to represent the validation iteration scores in an organized structure.

ValidationScores

Class representing the validation scores for a set of parameters and datasets.

Package Contents

class dacapo.experiments.Model(architecture: dacapo.experiments.architectures.architecture.Architecture, prediction_head: torch.nn.Module, eval_activation: torch.nn.Module | None = None)

A trainable DaCapo model. Consists of an Architecture and a prediction head. Models are generated by ``Predictor``s.

May include an optional eval_activation that is only executed when the model is in eval mode. This is particularly useful if you want to train with something like BCEWithLogitsLoss, since the loss expects raw logits: the activation (e.g., a sigmoid) is skipped while training but applied during evaluation.

architecture

The architecture of the model.

Type:

Architecture

prediction_head

The prediction head of the model.

Type:

torch.nn.Module

chain

The architecture followed by the prediction head.

Type:

torch.nn.Sequential

num_in_channels

The number of input channels.

Type:

int

input_shape

The shape of the input tensor.

Type:

Coordinate

eval_input_shape

The shape of the input tensor during evaluation.

Type:

Coordinate

num_out_channels

The number of output channels.

Type:

int

output_shape

The shape of the output tensor.

Type:

Coordinate

eval_activation

The activation function to apply during evaluation.

Type:

torch.nn.Module | None

forward(x: torch.Tensor) -> torch.Tensor

Forward pass of the model.

compute_output_shape(input_shape: Coordinate) -> Tuple[int, Coordinate]

Compute the spatial shape of this model, when fed a tensor of the given spatial shape as input.

scale(voxel_size: Coordinate) -> Coordinate

Scale the model by the given voxel size.

Note

The output shape is the spatial shape of the model, i.e., not accounting for channels and batch dimensions.

num_out_channels: int
num_in_channels: int
architecture
prediction_head
chain
input_shape
eval_input_shape
eval_activation
forward(x)

Forward pass of the model.

Parameters:

x (torch.Tensor) – The input tensor.

Returns:

The output tensor.

Return type:

torch.Tensor

Examples

>>> model = Model(architecture, prediction_head)
>>> model.forward(x)
torch.Tensor

Note

The eval_activation is only applied during evaluation. This is particularly useful if you want to train with something like BCEWithLogitsLoss, since the loss expects raw logits: the activation (e.g., a sigmoid) is skipped while training but applied during evaluation.
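The eval-only activation described above can be sketched without torch. This is a minimal illustration, not the DaCapo implementation: `TinyModel`, its `training` flag, and the `sigmoid` helper are hypothetical stand-ins that only mimic the train/eval switch of a torch module.

```python
import math


class TinyModel:
    """Hypothetical stand-in for a model with an optional eval-only activation."""

    def __init__(self, chain, eval_activation=None):
        self.chain = chain                      # architecture followed by prediction head
        self.eval_activation = eval_activation  # applied only in eval mode
        self.training = True                    # mimics the train/eval flag of torch modules

    def eval(self):
        self.training = False
        return self

    def train(self):
        self.training = True
        return self

    def forward(self, x):
        result = self.chain(x)
        # Skip the activation while training so a logits-based loss sees raw logits.
        if not self.training and self.eval_activation is not None:
            result = self.eval_activation(result)
        return result


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


model = TinyModel(chain=lambda x: 2.0 * x, eval_activation=sigmoid)
train_output = model.forward(0.0)        # raw logit, activation skipped
eval_output = model.eval().forward(0.0)  # activation applied in eval mode
```

In training mode the caller receives the raw logit; after switching to eval mode the same input passes through the activation.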

compute_output_shape(input_shape: funlib.geometry.Coordinate) -> Tuple[int, funlib.geometry.Coordinate]

Compute the spatial shape (i.e., not accounting for channels and batch dimensions) of this model, when fed a tensor of the given spatial shape as input.

Parameters:

input_shape (Coordinate) – The shape of the input tensor.

Returns:

The number of output channels and the spatial shape of the output.

Return type:

Tuple[int, Coordinate]

Raises:

AssertionError – If the input_shape is not a Coordinate.

Examples

>>> model = Model(architecture, prediction_head)
>>> model.compute_output_shape(input_shape)
(1, Coordinate(1, 1, 1))

Note

The output shape is the spatial shape of the model, i.e., not accounting for channels and batch dimensions.
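To make the distinction between a full tensor shape and a spatial shape concrete, here is a small sketch. It assumes a (batch, channels, *spatial) layout; `split_shape` is a hypothetical helper, not part of DaCapo, and plain tuples stand in for Coordinate.

```python
from typing import Tuple


def split_shape(full_shape: Tuple[int, ...], spatial_dims: int) -> Tuple[int, Tuple[int, ...]]:
    """Split a (batch, channels, *spatial) shape into (channels, spatial shape)."""
    assert len(full_shape) == spatial_dims + 2, "expected batch and channel dimensions"
    num_channels = full_shape[1]
    spatial_shape = tuple(full_shape[2:])
    return num_channels, spatial_shape


# A batch of one 3-channel volume with 64 voxels along each spatial axis.
num_channels, spatial_shape = split_shape((1, 3, 64, 64, 64), spatial_dims=3)
```

The returned pair has the same layout as the documented return value: the number of channels first, then the spatial shape.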

scale(voxel_size: funlib.geometry.Coordinate) -> funlib.geometry.Coordinate

Scale the model by the given voxel size.

Parameters:

voxel_size (Coordinate) – The voxel size to scale the model by.

Returns:

The scaled voxel size.

Return type:

Coordinate

Raises:

AssertionError – If the voxel_size is not a Coordinate.

Examples

>>> model = Model(architecture, prediction_head)
>>> model.scale(voxel_size)
Coordinate(1, 1, 1)


class dacapo.experiments.RunConfig

A class to represent a configuration of a run that helps to structure all the tasks, architecture, training, and datasplit configurations.

Attributes:

task_config: TaskConfig

A config defining the Task to run that includes deciding the output of the model and different methods to achieve the goal.

architecture_config: ArchitectureConfig

A config that defines the backbone architecture of the model. It impacts the model’s performance significantly.

trainer_config: TrainerConfig

Defines how batches are generated and passed for training the model along with defining configurations like batch size, learning rate, number of cpu workers and snapshot logging.

datasplit_config: DataSplitConfig

Configures the data available for the model during training or validation phases.

name: str

A unique name for this run to distinguish it.

repetition: int

The repetition number of this run.

num_iterations: int

The total number of iterations to train for during this run.

validation_interval: int

Specifies how often to perform validation during the run. It defaults to 1000.

start_config: Optional[StartConfig]

A starting point for continued training. It is optional and can be left out.
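The attribute list above can be pictured as a plain dataclass. This is only a sketch: `RunConfigSketch` is a hypothetical stand-in, the sub-configs are typed as `Any` and filled with placeholder strings, and the `None` default for `start_config` is an assumption based on it being optional.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RunConfigSketch:
    """Hypothetical stand-in that mirrors the attributes listed above."""

    task_config: Any
    architecture_config: Any
    trainer_config: Any
    datasplit_config: Any
    name: str
    repetition: int
    num_iterations: int
    validation_interval: int = 1000     # default stated in the text
    start_config: Optional[Any] = None  # assumed default; optional starting point


# Placeholder strings stand in for real config objects.
config = RunConfigSketch(
    task_config="my_task",
    architecture_config="my_architecture",
    trainer_config="my_trainer",
    datasplit_config="my_datasplit",
    name="example_run",
    repetition=0,
    num_iterations=10000,
)
```

Leaving out `validation_interval` and `start_config` falls back to the defaults, so a run validates every 1000 iterations and starts from scratch.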

task_config: dacapo.experiments.tasks.TaskConfig
architecture_config: dacapo.experiments.architectures.ArchitectureConfig
trainer_config: dacapo.experiments.trainers.TrainerConfig
datasplit_config: dacapo.experiments.datasplits.DataSplitConfig
name: str
repetition: int
num_iterations: int
validation_interval: int
start_config: dacapo.experiments.starts.StartConfig | None
class dacapo.experiments.TrainingIterationStats

A class to represent the training iteration statistics. It contains the loss and time taken for each iteration.

iteration

The iteration that produced these stats.

Type:

int

loss

The loss value of this iteration.

Type:

float

time

The time it took to process this iteration.

Type:

float

Note

Each instance holds the stats for a single training iteration. TrainingStats collects these objects in an ordered list, one per iteration.

iteration: int
loss: float
time: float
class dacapo.experiments.TrainingStats

A class used to represent Training Statistics. It contains a list of training iteration statistics. It also provides methods to add new iteration stats, delete stats after a specified iteration, get the number of iterations trained for, and convert the stats to an xarray DataArray.

iteration_stats

An ordered list of training stats.

Type:

List[TrainingIterationStats]

add_iteration_stats(iteration_stats: TrainingIterationStats) -> None

Add a new set of iteration stats to the existing list of iteration stats.

delete_after(iteration: int) -> None

Deletes training stats after a specified iteration number.

trained_until() -> int

Gets the number of iterations that the model has been trained for.

to_xarray() -> xr.DataArray

Converts the iteration statistics to an xarray DataArray.

Note

iteration_stats is a flat list ordered by iteration: each entry is the TrainingIterationStats for one training iteration.
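The bookkeeping summarized above can be sketched with standard-library dataclasses. The classes below are hypothetical stand-ins, not the DaCapo classes, and the cut-off used in `delete_after` (keep only earlier iterations) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IterationStatsSketch:
    iteration: int
    loss: float
    time: float


@dataclass
class TrainingStatsSketch:
    iteration_stats: List[IterationStatsSketch] = field(default_factory=list)

    def add_iteration_stats(self, stats: IterationStatsSketch) -> None:
        # New stats must continue the existing order without gaps.
        if self.iteration_stats:
            assert stats.iteration == self.iteration_stats[-1].iteration + 1
        self.iteration_stats.append(stats)

    def delete_after(self, iteration: int) -> None:
        # Assumption: keep only the stats recorded before `iteration`.
        self.iteration_stats = [s for s in self.iteration_stats if s.iteration < iteration]

    def trained_until(self) -> int:
        # Maximum iteration plus one, or zero when nothing has been trained.
        if not self.iteration_stats:
            return 0
        return self.iteration_stats[-1].iteration + 1


stats = TrainingStatsSketch()
for i, loss in enumerate([0.1, 0.2, 0.3]):
    stats.add_iteration_stats(IterationStatsSketch(iteration=i, loss=loss, time=1.0))
trained_before = stats.trained_until()
stats.delete_after(1)
trained_after = stats.trained_until()
```

Keeping the list ordered and gap-free is what lets `trained_until` read the answer from the last entry alone.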

iteration_stats: List[dacapo.experiments.training_iteration_stats.TrainingIterationStats]
add_iteration_stats(iteration_stats: dacapo.experiments.training_iteration_stats.TrainingIterationStats) -> None

Add a new iteration stats object to the list of iteration stats.

Parameters:

iteration_stats (TrainingIterationStats) – a new iteration stats object.

Raises:

AssertionError – if the new iteration stats do not follow the order of the existing iteration stats.

Examples

>>> training_stats = TrainingStats()
>>> training_stats.add_iteration_stats(TrainingIterationStats(0, 0.1, 1.0))
>>> training_stats.add_iteration_stats(TrainingIterationStats(1, 0.2, 1.0))
>>> training_stats.add_iteration_stats(TrainingIterationStats(2, 0.3, 1.0))
>>> training_stats.iteration_stats
[TrainingIterationStats(iteration=0, loss=0.1, time=1.0),
 TrainingIterationStats(iteration=1, loss=0.2, time=1.0),
 TrainingIterationStats(iteration=2, loss=0.3, time=1.0)]


delete_after(iteration: int) -> None

Deletes training stats from a specified iteration onward.

Parameters:

iteration (int) – the first iteration whose stats are deleted; only stats from earlier iterations are kept, as the example shows.

Raises:

assert – if the iteration number is less than the maximum iteration number.

Examples

>>> training_stats = TrainingStats()
>>> training_stats.add_iteration_stats(TrainingIterationStats(0, 0.1, 1.0))
>>> training_stats.add_iteration_stats(TrainingIterationStats(1, 0.2, 1.0))
>>> training_stats.add_iteration_stats(TrainingIterationStats(2, 0.3, 1.0))
>>> training_stats.delete_after(1)
>>> training_stats.iteration_stats
[TrainingIterationStats(iteration=0, loss=0.1, time=1.0)]


trained_until() -> int

The number of iterations trained for (the maximum iteration plus one). Returns zero if no iterations have been trained yet.

Returns:

The number of iterations that the model has been trained for.

Return type:

int

Examples

>>> training_stats = TrainingStats()
>>> training_stats.add_iteration_stats(TrainingIterationStats(0, 0.1, 1.0))
>>> training_stats.add_iteration_stats(TrainingIterationStats(1, 0.2, 1.0))
>>> training_stats.add_iteration_stats(TrainingIterationStats(2, 0.3, 1.0))
>>> training_stats.trained_until()
3


to_xarray() -> xarray.DataArray

Converts the iteration stats to an xarray DataArray, which is easy to manipulate.

Returns:

xarray DataArray of iteration losses.

Return type:

xr.DataArray

Raises:

assert – if the iteration stats list is empty.

Examples

>>> training_stats = TrainingStats()
>>> training_stats.add_iteration_stats(TrainingIterationStats(0, 0.1, 1.0))
>>> training_stats.add_iteration_stats(TrainingIterationStats(1, 0.2, 1.0))
>>> training_stats.add_iteration_stats(TrainingIterationStats(2, 0.3, 1.0))
>>> training_stats.to_xarray()
<xarray.DataArray (iterations: 3)>
array([0.1, 0.2, 0.3])
Coordinates:
  * iterations  (iterations) int64 0 1 2


class dacapo.experiments.ValidationIterationScores

A class used to represent the validation iteration scores in an organized structure.

iteration

The iteration associated with these validation scores.

Type:

int

scores

A list of scores per dataset, post processor

Type:

List[List[List[float]]]

parameters, and evaluation criterion.

Note

The scores list is structured as follows:

- The outer list contains the scores for each dataset.
- The middle list contains the scores for each post processor parameter.
- The inner list contains the scores for each evaluation criterion.
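The nesting described above translates into three levels of indexing. The sketch below uses a hypothetical stand-in class and made-up score values purely to show the index order (dataset, then post processor parameters, then criterion).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IterationScoresSketch:
    """Hypothetical stand-in with the same two fields as the class above."""

    iteration: int
    scores: List[List[List[float]]]


entry = IterationScoresSketch(
    iteration=1000,
    scores=[
        [[0.70, 0.10], [0.75, 0.08]],  # dataset 0: two parameter settings x two criteria
        [[0.60, 0.20], [0.65, 0.15]],  # dataset 1: same layout
    ],
)

# Index order: dataset, then post processor parameters, then criterion.
dataset_index, parameter_index, criterion_index = 1, 0, 1
value = entry.scores[dataset_index][parameter_index][criterion_index]
```

Every dataset is expected to carry the same number of parameter settings and criteria, so the structure behaves like a dense three-dimensional array.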

iteration: int
scores: List[List[List[float]]]
class dacapo.experiments.ValidationScores

Class representing the validation scores for a set of parameters and datasets.

parameters

The list of parameters that are being evaluated.

Type:

List[PostProcessorParameters]

datasets

The datasets that will be evaluated at each iteration.

Type:

List[Dataset]

evaluation_scores

The scores that are collected on each iteration per PostProcessorParameters and Dataset.

Type:

EvaluationScores

scores

A list of evaluation scores and their associated post-processing parameters.

Type:

List[ValidationIterationScores]

subscores(iteration_scores)

Create a new ValidationScores object with a subset of the iteration scores.

add_iteration_scores(iteration_scores)

Add iteration scores to the list of scores.

delete_after(iteration)

Delete scores after a specified iteration.

validated_until()

Get the number of iterations validated for (the maximum iteration plus one).

compare(existing_iteration_scores)

Compare iteration stats provided from elsewhere to scores we have saved locally.

criteria

Get the list of evaluation criteria.

parameter_names

Get the list of parameter names.

to_xarray()

Convert the validation scores to an xarray DataArray.

get_best(data, dim)

Compute the best scores along dimension “dim” per criterion.

Notes

The scores attribute is a list of ValidationIterationScores objects, each of which contains the scores for a single iteration.

parameters: List[dacapo.experiments.tasks.post_processors.PostProcessorParameters]
datasets: List[dacapo.experiments.datasplits.datasets.Dataset]
evaluation_scores: dacapo.experiments.tasks.evaluators.EvaluationScores
scores: List[dacapo.experiments.validation_iteration_scores.ValidationIterationScores]
subscores(iteration_scores: List[dacapo.experiments.validation_iteration_scores.ValidationIterationScores]) -> ValidationScores

Create a new ValidationScores object with a subset of the iteration scores.

Parameters:

iteration_scores – The iteration scores to include in the new ValidationScores object.

Returns:

A new ValidationScores object with the specified iteration scores.

Raises:

ValueError – If the iteration scores are not in the list of scores.

Examples

>>> validation_scores.subscores([validation_scores.scores[0]])

Note

This is useful when you want a ValidationScores object that only contains the scores up to a certain iteration.

add_iteration_scores(iteration_scores: dacapo.experiments.validation_iteration_scores.ValidationIterationScores) -> None

Add iteration scores to the list of scores.

Parameters:

iteration_scores – The iteration scores to add.

Raises:

ValueError – If the iteration scores are already in the list of scores.

Examples

>>> validation_scores.add_iteration_scores(validation_scores.scores[0])

Note

Use this to record the scores for a new iteration in the ValidationScores object.

delete_after(iteration: int) -> None

Delete scores after a specified iteration.

Parameters:

iteration – The iteration after which to delete the scores.

Raises:

ValueError – If the iteration scores are not in the list of scores.

Examples

>>> validation_scores.delete_after(0)

Note

Scores recorded after the given iteration are removed from the list of scores.

validated_until() -> int

Get the number of iterations validated for (the maximum iteration plus one).

Returns:

The number of iterations validated for.

Raises:

ValueError – If there are no scores.

Examples

>>> validation_scores.validated_until()

Note

The result is the maximum validated iteration plus one, which tells you how many iterations have been validated.

compare(existing_iteration_scores: List[dacapo.experiments.validation_iteration_scores.ValidationIterationScores]) -> Tuple[bool, int]

Compare iteration stats provided from elsewhere to scores we have saved locally. Local scores take priority. If local scores are at a lower iteration than the existing ones, delete the existing ones and replace with local. If local iteration > existing iteration, just update existing scores with the last overhanging local scores.

Parameters:

existing_iteration_scores – The existing iteration scores to compare with.

Returns:

A tuple indicating whether the local scores should replace the existing ones and the existing iteration number.

Raises:

ValueError – If the iteration scores are not in the list of scores.

Examples

>>> validation_scores.compare([validation_scores.scores[0]])

Note

Local scores take priority: existing scores that run ahead of the local ones are deleted and replaced with the local scores, while existing scores that lag behind are only extended with the overhanging local scores.
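One possible reading of this comparison is sketched below. It works on plain lists of iteration numbers rather than score objects; `compare_sketch` and the exact values it returns in each case are assumptions for illustration, not the DaCapo implementation.

```python
from typing import List, Tuple


def compare_sketch(local_iterations: List[int], existing_iterations: List[int]) -> Tuple[bool, int]:
    """Return (replace_existing, existing_iteration) for two lists of validated iterations."""
    if not existing_iterations:
        # Nothing stored elsewhere: nothing to replace, start writing from zero.
        return False, 0
    existing_until = max(existing_iterations) + 1
    local_until = max(local_iterations) + 1 if local_iterations else 0
    if existing_until > local_until:
        # Local scores are behind: the existing ones should be replaced.
        return True, 0
    # Local scores are level or ahead: only the overhang needs to be written.
    return False, existing_until


behind = compare_sketch([0, 1000], [0, 1000, 2000])
ahead = compare_sketch([0, 1000, 2000], [0, 1000])
empty = compare_sketch([0], [])
```

The boolean signals a full replacement, and the integer marks the iteration from which local scores still need to be written.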

property criteria: List[str]

Get the list of evaluation criteria.

Returns:

The list of evaluation criteria.

Raises:

ValueError – If there are no scores.

Examples

>>> validation_scores.criteria

Note

Use this property to see which criteria are used to evaluate the scores.

property parameter_names: List[str]

Get the list of parameter names.

Returns:

The list of parameter names.

Raises:

ValueError – If there are no scores.

Examples

>>> validation_scores.parameter_names

Note

Use this property to see which parameters are used to evaluate the scores.

to_xarray() -> xarray.DataArray

Convert the validation scores to an xarray DataArray.

Returns:

An xarray DataArray representing the validation scores.

Raises:

ValueError – If there are no scores.

Examples

>>> validation_scores.to_xarray()

Note

Use this when you want to work with the validation scores as an xarray DataArray.

get_best(data: xarray.DataArray, dim: str) -> Tuple[xarray.DataArray, xarray.DataArray]

Compute the best scores along dimension “dim” per criterion. Returns both the index associated with the best value, and the best value in two separate arrays.

Parameters:
  • data – The data array to compute the best scores from.

  • dim – The dimension along which to compute the best scores.

Returns:

A tuple containing the index associated with the best value and the best value in two separate arrays.

Raises:

ValueError – If the criteria are not in the data array.

Examples

>>> validation_scores.get_best(data, "iterations")

Note

The index of the best value and the best value itself are returned as two separate arrays, one entry per criterion. This is useful when you want to know the best scores for a given data array.
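The idea of picking a best index and a best value per criterion can be sketched with plain dictionaries instead of xarray. Everything here is hypothetical: the `best_per_criterion` helper, the criterion names, the score values, and the `higher_is_better` mapping that decides whether the maximum or the minimum counts as best.

```python
from typing import Dict, List, Tuple


def best_per_criterion(
    scores: Dict[str, List[float]],
    higher_is_better: Dict[str, bool],
) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Return the index of the best value and the best value for each criterion."""
    best_index: Dict[str, int] = {}
    best_value: Dict[str, float] = {}
    for criterion, values in scores.items():
        if criterion not in higher_is_better:
            raise ValueError(f"unknown criterion: {criterion}")
        # Pick the arg-max or arg-min depending on the direction of the criterion.
        pick = max if higher_is_better[criterion] else min
        index = pick(range(len(values)), key=values.__getitem__)
        best_index[criterion] = index
        best_value[criterion] = values[index]
    return best_index, best_value


# One list of scores per criterion, indexed along the dimension being searched.
indexes, values = best_per_criterion(
    {"f1_score": [0.5, 0.8, 0.7], "voi": [1.2, 0.9, 1.0]},
    {"f1_score": True, "voi": False},
)
```

Returning indexes and values separately mirrors the documented return value, where the index tells you where along the dimension the best score occurred.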