What is countsplit?

The countsplit R package splits an integer-valued matrix into a training matrix and a test matrix using binomial thinning. Under a Poisson assumption, the training and test matrices are independent.
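
The idea behind binomial thinning can be sketched in a few lines of base R. This is an illustration only, not the package's implementation: the toy matrix `X`, the parameter name `epsilon`, and the output names `X_train` and `X_test` are assumptions made for the example. If each entry of `X` is Poisson with mean lambda, then the training entry is Poisson with mean epsilon * lambda, the test entry is Poisson with mean (1 - epsilon) * lambda, and the two are independent.

```r
# Minimal sketch of binomial thinning in base R (not the package's own code).
set.seed(1)

# Hypothetical toy count matrix: 100 cells by 5 genes, Poisson counts.
X <- matrix(rpois(500, lambda = 5), nrow = 100, ncol = 5)

# Assumed tuning parameter: expected fraction of each count sent to training.
epsilon <- 0.5

# Draw each training entry as Binomial(X_ij, epsilon).
# rbinom() is vectorised over `size`, so every entry is thinned separately.
X_train <- matrix(
  rbinom(length(X), size = as.vector(X), prob = epsilon),
  nrow = nrow(X), ncol = ncol(X)
)

# The test matrix holds whatever counts were not assigned to training.
X_test <- X - X_train
```

By construction, `X_train + X_test` recovers `X` exactly, so no counts are lost or duplicated by the split.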

The motivation for this method is described in Neufeld et al., 2022 (link to paper) in the context of inference after latent variable estimation for single cell RNA sequencing data. Briefly, count splitting allows users to perform differential expression analysis to see which genes vary across estimated cell types (such as those obtained via clustering) or along an estimated cellular trajectory (pseudotime).

We have improved the ability of the package to work with sparse matrices. We have added a negative binomial count splitting function, but the tutorials for this function are still a work in progress. This function implements the decomposition of the negative binomial described in Neufeld et al., 2022 (link to preprint).

The vignettes and data associated with this package are stored in the companion countsplit.tutorials package. To see the tutorials, please visit the updated tutorial website: https://anna-neufeld.github.io/countsplit.tutorials/. Keeping the tutorials in a separate package reduces the size and build time of countsplit itself.

How can I get countsplit?

Make sure that remotes is installed by running install.packages("remotes"), then run:

remotes::install_github("anna-neufeld/countsplit")

To download the data needed to reproduce the package vignettes, also install the countsplit.tutorials package:

remotes::install_github("anna-neufeld/countsplit.tutorials")