Primary Research Interests

Classical statistical tools are designed for testing pre-specified hypotheses about pre-specified models. In the real world, data analysis is an adaptive process that involves exploring the data, fitting several models, evaluating these models to select the best one, and then testing hypotheses about this selected model. I refer to the practice of using the same data for multiple tasks along this exploratory pipeline as "double dipping". When classical statistical tools are applied without care in contexts that involve double dipping, the resulting conclusions may be invalid: for example, p-values can be artificially small and confidence intervals can fail to cover. Motivated by the gap between classical statistical tools and practical data analysis, my research program focuses on enabling scientists to safely draw conclusions from data in realistic settings where models and hypotheses are not pre-specified. There are two primary ways to avoid the pitfalls associated with double dipping.
  1. Account for double dipping by developing specialized statistical procedures that explicitly adjust for the double use of data.
  2. Avoid double dipping by splitting the data into independent training and test sets, so that each task uses only one set. This is typically accomplished via sample splitting, but in some settings sample splitting is not an option and alternatives are needed.
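The pitfall and the sample-splitting remedy can be illustrated with a small simulation; this is a minimal sketch not drawn from any specific paper of mine, and the sample size, feature count, and test are purely illustrative. Under a global null where every feature has mean zero, we select the feature with the largest sample mean and then test whether its mean is zero. Testing on the same data used for selection (double dipping) rejects far more often than the nominal 5% level, while selecting on one half and testing on the other restores the advertised error rate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, reps = 100, 20, 500  # samples, features, simulation repetitions

def rejects(split):
    """Select the feature with the largest sample mean, then run a
    two-sided test of H0: mean == 0 at the ~5% level."""
    X = rng.normal(size=(n, m))  # global null: every feature has mean 0
    if split:
        select, test = X[: n // 2], X[n // 2 :]  # disjoint halves
    else:
        select = test = X                        # double dipping
    j = np.argmax(select.mean(axis=0))           # data-driven selection
    x = test[:, j]
    t = x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))
    return abs(t) > 1.96                         # normal approximation

double_dip_rate = np.mean([rejects(split=False) for _ in range(reps)])
split_rate = np.mean([rejects(split=True) for _ in range(reps)])
# double_dip_rate is far above the nominal 0.05; split_rate is close to it
```

Sample splitting is valid here because the test half is independent of the selection event; its cost is that selection and testing each see only half the data, which motivates both the specialized corrections in (1) and the alternatives to sample splitting in (2).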
My recent projects have focused either on developing specialized procedures that account for double dipping (1) or on developing alternatives to sample splitting that allow us to avoid double dipping (2). Publications and talks related to these projects are linked below. For a relatively recent overview, see the slides from my dissertation defense.

Featured publications or preprints

Additional publications or preprints