dacapo.utils.voi
================

.. py:module:: dacapo.utils.voi


Functions
---------

.. autoapisummary::

   dacapo.utils.voi.voi
   dacapo.utils.voi.split_vi
   dacapo.utils.voi.vi_tables
   dacapo.utils.voi.contingency_table
   dacapo.utils.voi.divide_columns
   dacapo.utils.voi.divide_rows
   dacapo.utils.voi.xlogx


Module Contents
---------------

.. py:function:: voi(reconstruction, groundtruth, ignore_reconstruction=[], ignore_groundtruth=[0])

   Return the conditional entropies of the variation of information metric. [1]

   Let X be a reconstruction, and Y a ground truth labelling. The variation of
   information between the two is the sum of two conditional entropies:

       VI(X, Y) = H(X|Y) + H(Y|X).

   The first term, H(X|Y), measures oversegmentation and is referred to as the
   variation of information split error. The second term, H(Y|X), measures
   undersegmentation and is referred to as the merge error.

   :param reconstruction: A candidate segmentation.
   :type reconstruction: np.ndarray, int type, arbitrary shape
   :param groundtruth: The ground truth segmentation.
   :type groundtruth: np.ndarray, int type, same shape as `reconstruction`
   :param ignore_reconstruction: Any points having a label in this list are ignored in the evaluation.
                                 By default, no labels in the reconstruction are ignored.
   :type ignore_reconstruction: list of int, optional
   :param ignore_groundtruth: Any points having a label in this list are ignored in the evaluation.
                              By default, only the label 0 in the ground truth will be ignored.
   :type ignore_groundtruth: list of int, optional

   :returns: **(split, merge)** -- The variation of information split and merge error, i.e., H(X|Y) and H(Y|X)
   :rtype: float

   :raises ValueError: If `reconstruction` and `groundtruth` have different shapes.
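
   As a worked illustration of the two conditional entropies, the following is a
   minimal standard-library sketch. It does not call ``dacapo``; the helper name
   ``voi_sketch`` and its restriction to flat label lists are assumptions made
   for this example only.

   .. code-block:: python

      import math
      from collections import Counter


      def voi_sketch(reconstruction, groundtruth,
                     ignore_reconstruction=(), ignore_groundtruth=(0,)):
          """Return (split, merge) = (H(X|Y), H(Y|X)) in bits for flat label lists."""
          if len(reconstruction) != len(groundtruth):
              raise ValueError("label sequences must have the same length")
          # Drop every point whose label appears in one of the ignore lists.
          pairs = [(x, y) for x, y in zip(reconstruction, groundtruth)
                   if x not in ignore_reconstruction and y not in ignore_groundtruth]
          n = len(pairs)
          joint = Counter(pairs)                  # counts of (x, y) co-occurrences
          count_x = Counter(x for x, _ in pairs)  # counts per reconstruction label
          count_y = Counter(y for _, y in pairs)  # counts per ground-truth label
          # H(X|Y) = -sum p(x, y) * log2(p(x, y) / p(y)); likewise for H(Y|X).
          split = -sum(c / n * math.log2(c / count_y[y]) for (x, y), c in joint.items())
          merge = -sum(c / n * math.log2(c / count_x[x]) for (x, y), c in joint.items())
          return split, merge


      # Ground-truth segment 1 is cut into two pieces. The last point is ignored
      # because its ground-truth label is 0.
      split, merge = voi_sketch([1, 1, 2, 2, 3, 3], [1, 1, 1, 1, 2, 0])

   In this toy case the reconstruction only splits ground-truth segments, so the
   split error is positive (0.8 bits) and the merge error is zero.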

   .. rubric:: References

   [1] Meila, M. (2007). Comparing clusterings - an information based
   distance. Journal of Multivariate Analysis 98, 873-895.


.. py:function:: split_vi(x, y=None, ignore_x=[0], ignore_y=[0])

   Return the symmetric conditional entropies associated with the VI.

   The variation of information is defined as VI(X,Y) = H(X|Y) + H(Y|X).
   If Y is the ground-truth segmentation, then H(Y|X) can be interpreted
   as the amount of under-segmentation of Y and H(X|Y) is then the amount
   of over-segmentation.  In other words, a perfect over-segmentation
   will have H(Y|X)=0 and a perfect under-segmentation will have H(X|Y)=0.

   If y is None, x is assumed to be a contingency table.

   :param x: Label field (int type) or contingency table (float). `x` is
             interpreted as a contingency table (summing to 1.0) if and only if `y`
             is not provided.
   :type x: np.ndarray
   :param y: A label field to compare to `x`.
   :type y: np.ndarray of int, same shape as x, optional
   :param ignore_x: Any points having a label in this list are ignored in the evaluation.
                    Ignore 0-labeled points by default.
   :type ignore_x: list of int, optional
   :param ignore_y: Any points having a label in this list are ignored in the evaluation.
                    Ignore 0-labeled points by default.
   :type ignore_y: list of int, optional

   :returns: **sv** -- The conditional entropies of Y|X and X|Y.
   :rtype: np.ndarray of float, shape (2,)

   .. seealso:: :obj:`voi`
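
   The interpretation above can be checked on tiny contingency tables. This is a
   sketch under stated assumptions, not the module's implementation: the helper
   names ``entropy`` and ``split_vi_from_table`` are hypothetical, the tables are
   dense NumPy arrays that already sum to 1, and the conditional entropies are
   obtained from the chain rule H(Y|X) = H(X,Y) - H(X).

   .. code-block:: python

      import numpy as np


      def entropy(p):
          """Shannon entropy in bits, treating 0 * log2(0) as 0."""
          p = np.asarray(p, dtype=float).ravel()
          nz = p[p > 0]
          return float(-(nz * np.log2(nz)).sum())


      def split_vi_from_table(pxy):
          """Return [H(Y|X), H(X|Y)] for a normalized contingency table."""
          pxy = np.asarray(pxy, dtype=float)
          px = pxy.sum(axis=1)  # marginal over rows (labels of x)
          py = pxy.sum(axis=0)  # marginal over columns (labels of y)
          hxy = entropy(pxy)    # joint entropy H(X, Y)
          return np.array([hxy - entropy(px), hxy - entropy(py)])


      # One x segment covering two equally sized y segments: pure under-segmentation.
      under = split_vi_from_table([[0.5, 0.5]])
      # Two equally sized x segments inside one y segment: pure over-segmentation.
      over = split_vi_from_table([[0.5], [0.5]])

   Here ``under`` has H(X|Y) = 0 and ``over`` has H(Y|X) = 0, matching the
   description of perfect under- and over-segmentation.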


.. py:function:: vi_tables(x, y=None, ignore_x=[0], ignore_y=[0])

   Return probability tables used for calculating VI.

   If y is None, x is assumed to be a contingency table.

   :param x: A label field (int type) with the same shape as `y`, or, if `y` is
             not provided, a contingency table (sparse.csc_matrix) that may or
             may not sum to 1.
   :type x: np.ndarray or sparse.csc_matrix
   :param y: A label field (int type) with the same shape as `x`. If `y` is not
             provided, `x` is interpreted as a contingency table.
   :type y: np.ndarray, optional
   :param ignore_x: Rows to ignore in the contingency table.
                    These are labels of `x` that are not counted when evaluating VI.
   :type ignore_x: list of int, optional
   :param ignore_y: Columns to ignore in the contingency table.
                    These are labels of `y` that are not counted when evaluating VI.
   :type ignore_y: list of int, optional

   :returns: * **pxy** (*sparse.csc_matrix of float*) -- The normalized contingency table.
             * **px, py, hxgy, hygx, lpygx, lpxgy** (*np.ndarray of float*) -- The proportions of each label in `x` and `y` (`px`, `py`), the
               per-segment conditional entropies of `x` given `y` and vice-versa, the
               per-segment conditional probability p log p.

   :raises ValueError: If `x` and `y` have different shapes.
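
   To make the per-segment quantities concrete, here is a dense NumPy sketch. It
   is not the module's code: the helper ``plogp`` and the example table are
   hypothetical, and it assumes every row and column of the table has nonzero
   mass so the conditional probabilities are well defined.

   .. code-block:: python

      import numpy as np


      def plogp(p):
          """Elementwise p * log2(p), with 0 * log2(0) taken as 0."""
          out = np.zeros_like(p, dtype=float)
          nz = p > 0
          out[nz] = p[nz] * np.log2(p[nz])
          return out


      # Hypothetical normalized contingency table: rows index x, columns index y.
      pxy = np.array([[0.25, 0.25],
                      [0.00, 0.50]])
      px = pxy.sum(axis=1)  # proportion of each label in x
      py = pxy.sum(axis=0)  # proportion of each label in y

      # Per-segment sums of conditional p log p.
      lpygx = plogp(pxy / px[:, np.newaxis]).sum(axis=1)  # sum_y p(y|x) log2 p(y|x)
      lpxgy = plogp(pxy / py[np.newaxis, :]).sum(axis=0)  # sum_x p(x|y) log2 p(x|y)

      # Per-segment contributions to the conditional entropies.
      hygx = -px * lpygx  # one entry per x label; sums to H(Y|X)
      hxgy = -py * lpxgy  # one entry per y label; sums to H(X|Y)

   Summing ``hygx`` and ``hxgy`` gives the two conditional entropies whose total
   is VI(X, Y).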


.. py:function:: contingency_table(seg, gt, ignore_seg=[0], ignore_gt=[0], norm=True)

   Return the contingency table for all regions in matched segmentations.

   :param seg: A candidate segmentation.
   :type seg: np.ndarray, int type, arbitrary shape
   :param gt: The ground truth segmentation.
   :type gt: np.ndarray, int type, same shape as `seg`
   :param ignore_seg: Values to ignore in `seg`. Voxels in `seg` having a value in this list
                      will not contribute to the contingency table. (default: [0])
   :type ignore_seg: list of int, optional
   :param ignore_gt: Values to ignore in `gt`. Voxels in `gt` having a value in this list
                     will not contribute to the contingency table. (default: [0])
   :type ignore_gt: list of int, optional
   :param norm: Whether to normalize the table so that it sums to 1.
   :type norm: bool, optional

   :returns: **cont** -- A contingency table. `cont[i, j]` will equal the number of voxels
             labeled `i` in `seg` and `j` in `gt`. (Or the proportion of such voxels
             if `norm=True`.)
   :rtype: scipy.sparse.csc_matrix

   :raises ValueError: If `seg` and `gt` have different shapes.
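
   The counting rule for `cont[i, j]` can be sketched with a dense NumPy array.
   This is an illustration only: the function name ``dense_contingency`` is
   hypothetical, it returns a dense array rather than a
   scipy.sparse.csc_matrix, and it assumes non-negative integer labels and at
   least one voxel that is not ignored.

   .. code-block:: python

      import numpy as np


      def dense_contingency(seg, gt, ignore_seg=(0,), ignore_gt=(0,), norm=True):
          """Count co-occurring labels; table[i, j] covers seg == i and gt == j."""
          seg = np.asarray(seg)
          gt = np.asarray(gt)
          if seg.shape != gt.shape:
              raise ValueError("seg and gt must have the same shape")
          seg, gt = seg.ravel(), gt.ravel()
          # Keep only voxels whose labels are in neither ignore list.
          keep = ~np.isin(seg, ignore_seg) & ~np.isin(gt, ignore_gt)
          table = np.zeros((seg.max() + 1, gt.max() + 1), dtype=float)
          # np.add.at accumulates correctly when the same (i, j) pair repeats.
          np.add.at(table, (seg[keep], gt[keep]), 1.0)
          if norm:
              table /= table.sum()
          return table


      seg = np.array([[1, 1, 2], [2, 2, 0]])
      gt = np.array([[1, 1, 1], [2, 2, 2]])
      counts = dense_contingency(seg, gt, norm=False)
      proportions = dense_contingency(seg, gt, norm=True)

   The voxel labeled 0 in ``seg`` is dropped, so ``counts`` sums to 5 and
   ``proportions`` sums to 1.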


.. py:function:: divide_columns(matrix, row, in_place=False)

   Divide each column of `matrix` by the corresponding element in `row`.

   The result is as follows: out[i, j] = matrix[i, j] / row[j]

   :param matrix: The input matrix.
   :type matrix: np.ndarray, scipy.sparse.csc_matrix or csr_matrix, shape (M, N)
   :param row: The row dividing `matrix`.
   :type row: a 1D np.ndarray, shape (N,)
   :param in_place: Do the computation in-place.
   :type in_place: bool (optional, default False)

   :returns: **out** -- The result of the column-wise division.
   :rtype: same type as `matrix`

   :raises ValueError: If `row` contains zeros.
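
   For a dense array, the formula out[i, j] = matrix[i, j] / row[j] corresponds
   to plain NumPy broadcasting. The snippet below is a sketch of that dense case
   only; it does not call this function and does not cover sparse input.

   .. code-block:: python

      import numpy as np

      matrix = np.array([[1.0, 4.0, 9.0],
                         [2.0, 8.0, 18.0]])  # shape (M, N) = (2, 3)
      row = np.array([1.0, 4.0, 9.0])        # shape (N,) = (3,)

      # Broadcasting row as shape (1, N) divides column j by row[j].
      out = matrix / row[np.newaxis, :]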


.. py:function:: divide_rows(matrix, column, in_place=False)

   Divide each row of `matrix` by the corresponding element in `column`.

   The result is as follows: out[i, j] = matrix[i, j] / column[i]

   :param matrix: The input matrix.
   :type matrix: np.ndarray, scipy.sparse.csc_matrix or csr_matrix, shape (M, N)
   :param column: The column dividing `matrix`.
   :type column: a 1D np.ndarray, shape (M,)
   :param in_place: Do the computation in-place.
   :type in_place: bool (optional, default False)

   :returns: **out** -- The result of the row-wise division.
   :rtype: same type as `matrix`

   :raises ValueError: If `column` contains zeros.
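
   For a dense array, the formula out[i, j] = matrix[i, j] / column[i]
   corresponds to broadcasting along the other axis. As before, this is a sketch
   of the dense case only and does not call this function.

   .. code-block:: python

      import numpy as np

      matrix = np.array([[1.0, 4.0, 9.0],
                         [2.0, 8.0, 18.0]])  # shape (M, N) = (2, 3)
      column = np.array([1.0, 2.0])          # shape (M,) = (2,)

      # Broadcasting column as shape (M, 1) divides row i by column[i].
      out = matrix / column[:, np.newaxis]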


.. py:function:: xlogx(x, out=None, in_place=False)

   Compute x * log_2(x).

   We define 0 * log_2(0) = 0

   :param x: The input array.
   :type x: np.ndarray or scipy.sparse.csc_matrix or csr_matrix
   :param out: If provided, use this array/matrix for the result.
   :type out: same type as x (optional)
   :param in_place: Operate directly on x.
   :type in_place: bool (optional, default False)

   :returns: **y** -- Result of x * log_2(x).
   :rtype: same type as x

   :raises ValueError: If x contains negative values.
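
   The convention 0 * log_2(0) = 0 can be implemented for dense input by masking
   the zero entries before taking the logarithm. The helper name ``xlogx_dense``
   is hypothetical, and the sketch ignores the `out` and `in_place` options as
   well as sparse input.

   .. code-block:: python

      import numpy as np


      def xlogx_dense(x):
          """Return x * log2(x) elementwise, with 0 * log2(0) defined as 0."""
          x = np.asarray(x, dtype=float)
          if (x < 0).any():
              raise ValueError("x must not contain negative values")
          y = np.zeros_like(x)
          nz = x > 0  # only these entries need a logarithm
          y[nz] = x[nz] * np.log2(x[nz])
          return y


      values = xlogx_dense([0.0, 0.25, 0.5, 1.0])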