Takes a dataset (scalar, vector, or matrix) and splits it into K folds. With the default K = 2, these are a training set and a test set; for the convolution-closed families, the folds sum to the original data matrix.

datathin(data, family, K = 2, epsilon = NULL, arg = NULL)

Arguments

data

A scalar, vector, or matrix of data.

family

The distribution of the data. Options are "poisson", "negative binomial", "normal" (or "gaussian"), "normal-variance" (or "gaussian-variance"), "mvnormal" (or "mvgaussian"), "binomial", "multinomial", "exponential", "gamma", "chi-squared", "gamma-weibull", "weibull", "pareto", "shifted-exponential", "scaled-uniform", "scaled-beta".

K

The number of folds. Note that for the "chi-squared" and "gamma-weibull" decompositions, the number of folds implies the degrees of freedom and shape parameters respectively.

epsilon

The tuning parameter for convolution-closed data thinning; must be a simplex vector of length K. Larger values correspond to more information in the respective fold. Available for "poisson", "negative binomial", "normal" (or "gaussian"), "mvnormal" (or "mvgaussian"), "binomial", "multinomial", "exponential", and "gamma" families. If epsilon is not supplied, rep(1/K, K) is used.
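To illustrate what epsilon controls, here is a minimal base-R sketch of Poisson thinning. It is not the package's implementation, and the helper name thin_poisson_sketch is hypothetical. Each count is split across K folds by a multinomial draw with probabilities epsilon, so the folds sum to the original data and fold k receives a fraction epsilon[k] of each count on average.

```r
# Hypothetical helper (not part of datathin): multinomial thinning of counts.
thin_poisson_sketch <- function(data, epsilon) {
  # epsilon must be a simplex vector: positive entries that sum to 1.
  stopifnot(all(epsilon > 0), isTRUE(all.equal(sum(epsilon), 1)))
  K <- length(epsilon)
  x <- as.vector(data)
  folds <- array(0, dim = c(length(x), K))
  for (i in seq_along(x)) {
    # Split the i-th count across the K folds.
    folds[i, ] <- rmultinom(1, size = x[i], prob = epsilon)
  }
  # Restore the shape of the input and index the folds along the last dimension.
  d <- if (is.null(dim(data))) length(data) else dim(data)
  array(folds, dim = c(d, K))
}

set.seed(1)
counts <- matrix(rpois(20, lambda = 5), nrow = 5)
folds <- thin_poisson_sketch(counts, epsilon = c(0.2, 0.3, 0.5))

# The three folds add back up to the original counts.
stopifnot(all(apply(folds, c(1, 2), sum) == counts))
```

In this sketch, leaving epsilon at rep(1/K, K) would spread each count evenly across the folds in expectation.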

arg

The extra parameter that must be known in order to thin. Either a scalar or a matrix with the same dimensions as data (excluding "mvnormal" (or "mvgaussian") and "multinomial" families; see below). Requirements vary by decomposition:

  • Not needed for "poisson", "exponential", "chi-squared", or "scaled-uniform" distributions.

  • "negative binomial" requires the size parameter.

  • "normal" (or "gaussian") requires the variance.

  • "normal-variance" (or "gaussian-variance") requires the mean.

  • "mvnormal" (or "mvgaussian") requires the covariance matrix. If the dimensions of data are n x p, arg must be n x p x p.

  • "binomial" and "multinomial" require the number of trials. For "multinomial", if the dimensions of data are n x p, arg must be a vector of length n.

  • "gamma", "gamma-weibull", and "weibull" require the shape parameter.

  • "pareto" requires the location parameter.

  • "shifted-exponential" requires the rate parameter.

  • "scaled-beta" requires the first shape parameter.

Please refer to https://arxiv.org/abs/2301.07276 and https://arxiv.org/abs/2303.12931 for further details.
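As an illustration of why the extra parameter must be known, the following base-R sketch thins normal data into two folds using the variance. It is a sketch under stated assumptions, not the package's code; the helper name thin_normal_sketch and its arguments sigma2 and eps are hypothetical. The noise added to the training fold is scaled by the variance, so a misspecified variance would leave the two folds correlated.

```r
# Hypothetical helper (not part of datathin): two-fold Gaussian thinning.
# sigma2 plays the role of arg (the known variance); eps is the share of
# information assigned to the training fold.
thin_normal_sketch <- function(data, sigma2, eps = 0.5) {
  stopifnot(eps > 0, eps < 1)
  # Draw train | data ~ N(eps * data, eps * (1 - eps) * sigma2).
  noise <- rnorm(length(data), mean = 0, sd = sqrt(eps * (1 - eps) * sigma2))
  train <- eps * data + noise
  # The test fold is whatever remains, so the folds sum to the data.
  test <- data - train
  list(train = train, test = test)
}

set.seed(1)
x <- matrix(rnorm(2000, mean = 3, sd = 2), ncol = 2)
parts <- thin_normal_sketch(x, sigma2 = 4, eps = 0.5)

# The folds reconstruct the original data up to floating-point error.
stopifnot(isTRUE(all.equal(parts$train + parts$test, x)))
```

Here sigma2 is a scalar, but a matrix with the same dimensions as the data would work in the same way, mirroring the two forms that arg may take.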

Details

See https://anna-neufeld.github.io/datathin/articles/introduction_tutorial.html for examples of each decomposition.
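A minimal usage sketch follows, assuming the datathin package is installed (the calls are skipped otherwise). It shows a Poisson call with an explicit epsilon and how an n x p x p array for the "mvnormal" arg might be built from a single covariance matrix. The object names and the assumption that folds are indexed along the last dimension of the result are illustrative; inspect the output with str() to confirm.

```r
set.seed(1)
n <- 100
p <- 2

# Poisson data: no arg is needed.
dat <- matrix(rpois(n * p, lambda = 7), nrow = n)

# Multivariate normal data with a shared covariance matrix Sigma.
Sigma <- matrix(c(1, 0.5, 0.5, 2), nrow = p)
mv_dat <- matrix(rnorm(n * p), nrow = n) %*% chol(Sigma)

# Repeat Sigma for every row so that Sigma_arr[i, , ] equals Sigma.
Sigma_arr <- array(rep(Sigma, each = n), dim = c(n, p, p))

if (requireNamespace("datathin", quietly = TRUE)) {
  res <- datathin::datathin(dat, family = "poisson", K = 2,
                            epsilon = c(0.3, 0.7))
  # Assumption: folds are indexed along the last dimension of the result.
  str(res)

  mv_res <- datathin::datathin(mv_dat, family = "mvnormal", K = 2,
                               arg = Sigma_arr)
  str(mv_res)
}
```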