What is countsplit?

The countsplit R package splits an integer-valued matrix into a training matrix and a test matrix using binomial thinning. If the entries of the original matrix are Poisson distributed, the training and test matrices are independent of each other.
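To make the idea of binomial thinning concrete, here is a minimal base-R sketch on simulated data. It does not use the countsplit package or reproduce its interface; the names X, Xtrain, Xtest, and epsilon, the matrix dimensions, and the Poisson mean are all illustrative assumptions.

```r
set.seed(1)

# Toy count matrix: 100 "cells" by 5 "genes", simulated Poisson(5) entries.
# (Illustrative data only, not taken from the package or the paper.)
X <- matrix(rpois(100 * 5, lambda = 5), nrow = 100, ncol = 5)

# Assumed thinning fraction: each unit of count goes to the training
# matrix with probability epsilon.
epsilon <- 0.5

# Binomial thinning, entry by entry: Xtrain[i, j] ~ Binomial(X[i, j], epsilon).
# rbinom() is vectorized over `size`, and matrix() refills column by column,
# so entry positions are preserved.
Xtrain <- matrix(rbinom(length(X), size = X, prob = epsilon),
                 nrow = nrow(X), ncol = ncol(X))

# The test matrix holds whatever counts were not assigned to training.
Xtest <- X - Xtrain

# By construction the two pieces add back up to the original counts.
stopifnot(all(Xtrain + Xtest == X))
```

Because the simulated entries are Poisson, the entries of Xtrain and Xtest in this sketch are independent, which one can check informally by looking at their empirical correlation with cor(as.vector(Xtrain), as.vector(Xtest)).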

The motivation for this method is described in Neufeld et al. (2022) in the context of inference after latent variable estimation for single-cell RNA sequencing (scRNA-seq) data. Briefly, count splitting lets users estimate a latent variable on the training matrix and then perform differential expression analysis on the test matrix to identify genes that vary across estimated cell types (such as those obtained via clustering) or along an estimated cellular trajectory (pseudotime).

How can I get countsplit?

Make sure that remotes is installed by running install.packages("remotes"), then type
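A sketch of the installation steps in R follows. The repository path "anna-neufeld/countsplit" is an assumption, inferred from the location of the paper repository linked under "Where can I learn more?"; adjust it if the package is hosted elsewhere.

```r
# Install the remotes helper package from CRAN.
install.packages("remotes")

# ASSUMPTION: the GitHub repository path below is inferred, not stated
# on this page; replace it if the package lives somewhere else.
remotes::install_github("anna-neufeld/countsplit")

# Load the package to confirm that the installation worked.
library(countsplit)
```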


Where can I learn more?

See the introductory tutorial tab for an introduction to our framework on simple simulated data. See the Seurat, scran, and monocle3 tutorials for examples of how the countsplit package can be integrated with common scRNA-seq analysis pipelines.

Please see the double dipping demonstration for the code that goes with Appendix A of our paper.

Please visit https://github.com/anna-neufeld/countsplit_paper for code to reproduce the figures and tables from our paper.


Neufeld, A., Gao, L., Popp, J., Battle, A. & Witten, D. (2022), ‘Inference after latent variable estimation for single-cell RNA sequencing data’, arXiv:2207.00554