I am interested in applied probability and high-dimensional statistics in general. Recently, I have been working on High-dimensional Bayesian Variational Inference, Adaptive Experimental Design, and Random Graphs.

## Publications and Preprints

- **Qiu, J.** and Sen, S. (2023). **The TAP Free Energy for High-Dimensional Linear Regression**, *Annals of Applied Probability*. [PDF] [arXiv]
  - We derived a variational representation for the log-normalizing constant of the posterior distribution in Bayesian linear regression with a uniform spherical prior and an i.i.d. Gaussian design. We work under the “proportional” asymptotic regime, where the number of observations and the number of features grow at a proportional rate. This rigorously establishes the Thouless-Anderson-Palmer (TAP) approximation arising from spin glass theory, and proves a conjecture of [Krzakala et al., 2014] in the special case of the spherical prior.

- Ham, D.* and __Qiu, J.__* (2023). **Hypothesis Testing in Sequentially Sampled Data: ART to Maximize Power Beyond iid Sampling**, *TEST*. [PDF] [arXiv]
  - Testing whether a variable of interest affects the outcome is one of the most fundamental problems in statistics. To tackle this problem, the conditional randomization test (CRT) is a design-based method that is widely used to test the independence of a variable of interest (X) with an outcome (Y) holding some controls (Z) fixed. The CRT relies solely on the random iid sampling of (X,Z) to produce exact finite-sample p-values that are constructed using any test statistic. We propose a new method, the adaptive randomization test (AdapRT), that similarly tackles the independence problem but allows the data to be sequentially sampled. Like the CRT, the AdapRT relies solely on knowing the (adaptive) sampling distribution of (X,Z). In this paper, we additionally show the significant power increase by adaptively sampling in two illustrative settings.

- __Qiu, J.__ (2023). **Sub-optimality of the Naive Mean Field approximation for proportional high-dimensional Linear Regression**, *NeurIPS 2023*. [PDF] [arXiv]
  - Despite popularity of Naïve Mean Field (NMF) approximation in practice, theoretical guarantees for high-dimensional problems are only available under strong structural assumptions (e.g., sparsity). Moreover, existing theory often does not explain empirical observations noted in the existing literature.

    In this paper, we take a step towards addressing these problems by deriving sharp asymptotic characterizations for the NMF approximation in high-dimensional linear regression. Our results apply to a wide class of natural priors, and allow for model mismatch (i.e., the underlying statistical model can be different from the fitted model). We work under an iid Gaussian design and the proportional asymptotic regime, where the number of features and number of observations grow at a proportional rate. As a consequence of our asymptotic characterization, we establish two concrete corollaries: (a) we establish the inaccuracy of the NMF approximation for the log-normalizing constant in this regime, and (b) provide theoretical results backing the empirical observation that the NMF approximation can be overconfident in terms of uncertainty quantification.

- __Qiu, J.__, Mukherjee, S., and Sen, S. (2023+). **On Naive Mean-Field Approximation for high-dimensional canonical GLMs**, in preparation.

## Talks

- “TAP formula: High-Dimensional Linear Regression”. NESS Symposium, IS-22: Advances in probabilistic algorithms, May 2022.