Book Description
The field of post-selection inference focuses on developing solutions for problems in which a researcher uses a single dataset both to identify a promising set of hypotheses and to conduct statistical inference. One promising heuristic for adjusting for model or hypothesis selection in inference is conditioning on the selection event (conditional inference), where the data is constrained to a subset of the sample space that guarantees the selection of a specific model. Two major obstacles to conducting valid and tractable conditional inference are that the conditional distribution of the data does not converge to a normal distribution asymptotically, and that the likelihood itself is often intractable in multivariate problems. A key idea underlying most recent work on conditional inference in regression is the polyhedral lemma, which overcomes these difficulties by conditioning on information beyond the selection of a model to obtain a tractable inference procedure with finite-sample guarantees. However, this extra conditioning comes at a hefty price, as it results in oversized confidence intervals and tests with less power. Our goal in this dissertation is to propose alternative approaches to conditional inference that do not rely on any extra conditioning.
First, we tackle the problem of estimation following model selection. To overcome the intractable conditional likelihood, we generate noisy unbiased estimates of the post-selection score function and use them in a stochastic ascent algorithm that yields correct post-selection maximum likelihood estimates. We apply the proposed technique to the problem of estimating linear models selected by the lasso. In an asymptotic analysis, the resulting estimates are shown to be consistent for the selected parameters, and in a simulation study they are shown to offer better estimation accuracy than the lasso estimator in most of the simulation settings considered.
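The stochastic ascent idea described above can be illustrated on a toy problem. The sketch below is not the dissertation's lasso procedure; it assumes a single observation x from N(mu, 1) that is reported only when x > c. In that setting the conditional score is x - E[Y | Y > c], where Y ~ N(mu, 1), so x - y with y drawn from the truncated normal is a noisy unbiased estimate of the score. All function names, the step-size schedule k^(-0.7), and the averaging over the second half of the iterates are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for pdf and inverse CDF


def sample_truncated_normal(mu, c, rng):
    """Draw Y ~ N(mu, 1) conditional on Y > c (upper-tail inverse-CDF sampling)."""
    tail = 0.5 * math.erfc((c - mu) / math.sqrt(2.0))  # P(N(mu, 1) > c)
    v = tail * (1.0 - rng.random())                    # uniform on (0, tail]
    v = min(v, 1.0 - 1e-12)                            # keep inv_cdf inside (0, 1)
    return mu - STD_NORMAL.inv_cdf(v)


def conditional_mle_stochastic(x, c, n_iter=20000, seed=0):
    """Toy post-selection MLE of mu via stochastic gradient ascent.

    Assumed setting (hypothetical, for illustration): x is observed only
    because x > c. Each step uses the noisy unbiased score x - y.
    """
    rng = random.Random(seed)
    mu = x  # start from the naive, unadjusted estimate
    tail_sum, tail_count = 0.0, 0
    for k in range(1, n_iter + 1):
        y = sample_truncated_normal(mu, c, rng)
        mu += (x - y) / k ** 0.7          # decaying step size (assumed schedule)
        if k > n_iter // 2:               # average the second half of the iterates
            tail_sum += mu
            tail_count += 1
    return tail_sum / tail_count


def conditional_mle_exact(x, c, lo=-10.0, hi=10.0):
    """Reference solution for the toy problem: bisection on the exact score."""
    def score(mu):
        t = c - mu
        mills = STD_NORMAL.pdf(t) / (0.5 * math.erfc(t / math.sqrt(2.0)))
        return x - (mu + mills)           # x - E[Y | Y > c]
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:                # score is decreasing in mu
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    x_obs, cutoff = 2.5, 2.0
    print("stochastic:", conditional_mle_stochastic(x_obs, cutoff))
    print("bisection: ", conditional_mle_exact(x_obs, cutoff))
```

In this one-dimensional case the exact score is available, so the bisection routine serves only as a check on the stochastic iterates. Both estimates fall below the naive estimate x, reflecting the adjustment for selection. In multivariate problems, where the conditional likelihood is intractable, only the sampling-based update would be usable.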
In Chapter 3, we consider the problem of inference following aggregate tests in regression. There, we formulate the polyhedral lemma for inference following model selection with aggregate tests, and also propose two alternative approaches for conducting valid post-selection inference. The first is based on conducting inference under a conservative parametrization, and the second is a regime-switching method that yields point-wise consistent confidence intervals by estimating the post-selection distribution of the data. In a simulation study, we show that the proposed methods control the selective type-I error rate while offering improved power.
In Chapter 4, we extend the regime-switching approach to the more general setting of conducting inference after model selection in regression. We propose a modified bootstrap approach in which we seek to consistently estimate the post-selection distribution of the data by thresholding small coefficients to zero and taking parametric bootstrap samples from the estimated conditional distribution. In an asymptotic analysis, we show that the resulting confidence intervals are point-wise consistent. In a simulation study, we show that our modified bootstrap procedure obtains the desired coverage rate in all simulation settings considered while producing much shorter confidence intervals with improved power to detect true signals in the selected model.