The Effect of Model-Selection Uncertainty on Error Bands for Estimated Impulse Response Functions in Vector Autoregressive Models


Book Description

Model-selection uncertainty adds to the variability of the coefficient estimates when samples are small, because model-selection criteria perform poorly in small samples. Previous studies account for model-selection uncertainty, and thereby improve inference, by endogenizing the lag-order selection using bootstrap methods. This paper shows that all of these bootstrap methods fail in the cases that are most common in macroeconomic applications. As the maximum eigenvalue of the vector autoregressive model approaches one, the bias of the impulse response estimates increases. As a result, standard bootstrap resampling produces poor interval coverage accuracy, while bootstrap subsampling produces zero coverage. The proposed solution is to combine the bootstrap interval for the impulse response estimates with a first-order bias correction, which removes the first- and second-order bias of these estimators. This dramatically improves the interval coverage accuracy of the impulse response estimates.
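The mechanics of the proposed remedy can be illustrated in a univariate toy setting. The following is a minimal sketch, not the paper's actual algorithm: a first-order bias correction for an AR(1) coefficient estimated near the unit root, combined with a percentile bootstrap interval for the horizon-h impulse response. The sample size, replication counts, stationarity cap, and the reuse of a single bias estimate are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_ar1(phi, n, rng):
        """Simulate an AR(1) process y_t = phi * y_{t-1} + e_t."""
        y = np.zeros(n)
        e = rng.standard_normal(n)
        for t in range(1, n):
            y[t] = phi * y[t - 1] + e[t]
        return y

    def ols_ar1(y):
        """OLS estimate of the AR(1) coefficient (no intercept)."""
        return y[:-1] @ y[1:] / (y[:-1] @ y[:-1])

    # Small sample near the unit root: the setting where plain bootstrap
    # intervals for impulse responses are known to undercover.
    phi_true, n, B, horizon = 0.95, 80, 999, 8
    y = simulate_ar1(phi_true, n, rng)
    phi_hat = ols_ar1(y)

    # Step 1: estimate the first-order bias of phi_hat by a preliminary bootstrap.
    boot = np.array([ols_ar1(simulate_ar1(phi_hat, n, rng)) for _ in range(B)])
    bias = boot.mean() - phi_hat
    phi_bc = min(phi_hat - bias, 0.999)   # bias-correct, but keep the root stationary

    # Step 2: bootstrap from the bias-corrected model, bias-correcting each
    # replicate the same way (reusing the same bias estimate, for brevity),
    # and form a percentile interval for the horizon-h impulse response phi^h.
    irf_boot = []
    for _ in range(B):
        phi_star = ols_ar1(simulate_ar1(phi_bc, n, rng))
        irf_boot.append(min(phi_star - bias, 0.999) ** horizon)
    lo, hi = np.percentile(irf_boot, [2.5, 97.5])
    print(f"IRF at h={horizon}: point {phi_bc**horizon:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")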




On Model Uncertainty and its Statistical Implications


Book Description

In this book, problems related to the choice of models in fields as diverse as regression, covariance structure, time series analysis, and multinomial experiments are discussed. The emphasis is on the statistical implications for model assessment when the assessment is done with the same data that generated the model. This is a long-standing problem, notorious for its difficulty. Some contributors discuss this problem in an illuminating way. Others, and this is a truly novel feature, systematically investigate whether sample re-use methods such as the bootstrap can be used to assess the quality of estimators or predictors in a reliable way given the initial model uncertainty. The book should prove valuable to advanced practitioners and statistical methodologists alike.




The Effect of Model Selection Uncertainty on the Error Bands for Impulse Response Functions in Vector Error Correction Models


Book Description

Conventional asymptotic and bootstrap methods for finite-order autoregressive models condition on the estimated lag order of the model, which is then used to construct the error bands for impulse response functions. Even if the estimated lag order is believed to be correct, this procedure ignores the sampling uncertainty of the lag order. An earlier study by Kilian (1998) introduced an endogenous lag-order bootstrap algorithm that reflects the true extent of sampling uncertainty in the regression estimates. Applications of Kilian's method to vector autoregressive (VAR) and vector error correction (VEC) models have assumed that the true cointegration rank is known. This paper modifies the application of Kilian's method to VEC models by endogenizing the cointegration rank in addition to the lag order. Monte Carlo simulation results from two models of the U.S. economy show that ignoring cointegration rank uncertainty may seriously undermine the coverage accuracy of bootstrap confidence intervals for VEC impulse response estimates. Endogenizing the choice of cointegration rank is shown to improve coverage accuracy at low additional computational cost.
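The core idea of the endogenous lag-order bootstrap can be sketched in a univariate AR setting: rather than conditioning on the lag order selected once from the data, the order is re-selected by AIC inside every bootstrap replication, so the resulting interval reflects lag-order uncertainty. Endogenizing the cointegration rank would add an analogous re-selection step, omitted here for brevity. This is a rough sketch under our own assumptions (toy AR(2) data, an approximate AIC, illustrative tuning constants), not the paper's exact algorithm.

    import numpy as np

    rng = np.random.default_rng(1)

    def fit_ar(y, p):
        """OLS fit of an AR(p); returns coefficients and residuals."""
        Y = y[p:]
        X = np.column_stack([y[p - j:len(y) - j] for j in range(1, p + 1)])
        coef = np.linalg.lstsq(X, Y, rcond=None)[0]
        return coef, Y - X @ coef

    def select_p_aic(y, p_max=6):
        """Choose the lag order by an approximate AIC."""
        aics = []
        for p in range(1, p_max + 1):
            _, resid = fit_ar(y, p)
            aics.append(len(resid) * np.log(resid @ resid / len(resid)) + 2 * p)
        return int(np.argmin(aics)) + 1

    def resample(y, coef, resid, rng):
        """Recursive residual-based bootstrap replicate of the series."""
        p = len(coef)
        ystar = list(y[:p])
        for e in rng.choice(resid, size=len(y) - p):
            ystar.append(float(coef @ ystar[-p:][::-1]) + e)
        return np.asarray(ystar)

    # Toy data: an AR(2) process stands in for the observed series.
    y = np.zeros(200)
    eps = rng.standard_normal(200)
    for t in range(2, 200):
        y[t] = 0.6 * y[t - 1] + 0.2 * y[t - 2] + eps[t]

    p_hat = select_p_aic(y)
    coef, resid = fit_ar(y, p_hat)

    # Endogenous lag order: re-select p in EVERY bootstrap replication
    # instead of conditioning on the p_hat chosen from the original data.
    stats = []
    for _ in range(999):
        ystar = resample(y, coef, resid, rng)
        p_star = select_p_aic(ystar)          # the key extra step
        coef_star, _ = fit_ar(ystar, p_star)
        stats.append(coef_star[0])            # e.g. the first AR coefficient
    lo, hi = np.percentile(stats, [2.5, 97.5])
    print(f"95% endogenous-lag-order interval for coef_1: [{lo:.3f}, {hi:.3f}]")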




Model Selection and Error Estimation in a Nutshell


Book Description

How can we select the best-performing data-driven model? How can we rigorously estimate its generalization error? Statistical learning theory answers these questions by deriving non-asymptotic bounds on the generalization error of a model or, in other words, by upper-bounding the true error of the learned model based only on quantities computed from the available data. However, for a long time, statistical learning theory was considered merely an abstract theoretical framework, useful for inspiring new learning approaches but of limited applicability to practical problems. The purpose of this book is to give an intelligible overview of the problems of model selection and error estimation by focusing on the ideas behind the different statistical learning theory approaches and simplifying most of the technical aspects so as to make them more accessible and usable in practice. The book starts by presenting the seminal works of the 1980s and extends to the most recent results. It discusses open problems and outlines future directions for research.
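For a flavor of the kind of non-asymptotic bound surveyed here, a standard textbook result (not specific to this book) for a finite hypothesis class H says that, with probability at least 1 - delta, the true error of every h in H is at most its empirical error plus sqrt(log(|H|/delta) / (2n)), by Hoeffding's inequality combined with a union bound. A quick numerical illustration, with all the numbers purely illustrative:

    import math

    def hoeffding_bound(emp_error, n, num_hypotheses, delta=0.05):
        """Uniform deviation bound for a finite hypothesis class:
        with prob. >= 1 - delta,
        true_error <= emp_error + sqrt(log(|H| / delta) / (2 * n))."""
        return emp_error + math.sqrt(math.log(num_hypotheses / delta) / (2 * n))

    # A model picked from 1,000 candidates with 5% empirical error on 10,000 points
    # is guaranteed (w.p. 95%) a true error of at most about 7.2%:
    print(hoeffding_bound(0.05, 10_000, 1_000))   # ~0.072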




Estimation and Testing Following Model Selection


Book Description

The field of post-selection inference focuses on developing solutions for problems in which a researcher uses a single dataset both to identify a promising set of hypotheses and to conduct statistical inference. One promising heuristic for adjusting for model/hypothesis selection in inference is conditioning on the selection event (conditional inference), in which the data are constrained to a subset of the sample space that guarantees the selection of a specific model. Two major obstacles to conducting valid and tractable conditional inference are that the conditional distribution of the data does not converge to a normal distribution asymptotically, and that the likelihood itself is often intractable in multivariate problems. A key idea underlying most recent work on conditional inference in regression is the polyhedral lemma, which overcomes these difficulties by conditioning on information beyond the selection of a model to obtain a tractable inference procedure with finite-sample guarantees. However, this extra conditioning comes at a hefty price, as it results in oversized confidence intervals and tests with less power. Our goal in this dissertation is to propose alternative approaches to conditional inference that do not rely on any extra conditioning.

First, we tackle the problem of estimation following model selection. To overcome the intractable conditional likelihood, we generate noisy unbiased estimates of the post-selection score function and use them in a stochastic ascent algorithm that yields correct post-selection maximum likelihood estimates. We apply the proposed technique to the problem of estimating linear models selected by the lasso. In an asymptotic analysis the resulting estimates are shown to be consistent for the selected parameters, and in a simulation study they are shown to offer better estimation accuracy than the lasso estimator in most of the simulation settings considered.

In Chapter 3 we consider the problem of inference following aggregate tests in regression. There, we formulate the polyhedral lemma for inference following model selection with aggregate tests, but we also propose two alternative approaches for conducting valid post-selection inference. The first is based on conducting inference under a conservative parametrization; the second is a regime-switching method that yields pointwise consistent confidence intervals by estimating the post-selection distribution of the data. In a simulation study, we show that the proposed methods control the selective type-I error rate while offering improved power.

In Chapter 4 we generalize the regime-switching approach to the more general setting of conducting inference after model selection in regression. We propose a modified bootstrap approach in which we seek to consistently estimate the post-selection distribution of the data by thresholding small coefficients to zero and taking parametric bootstrap samples from the estimated conditional distribution. In an asymptotic analysis we show that the resulting confidence intervals are pointwise consistent. In a simulation study we show that our modified bootstrap procedure attains the desired coverage rate in all simulation settings considered while producing much shorter confidence intervals with improved power to detect true signals in the selected model.
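The flavor of the Chapter 4 proposal can be conveyed in a toy orthogonal-design setting, where the vector of OLS estimates behaves like z ~ N(beta, sigma^2 I) and "selection" is hard-thresholding. The sketch below thresholds small estimates to zero, draws parametric bootstrap samples from the estimated model, and re-applies the selection rule to each replicate before forming intervals. The threshold level, signal sizes, selection rule, and the basic (reflected) interval construction are our illustrative assumptions, not the dissertation's exact procedure.

    import numpy as np

    rng = np.random.default_rng(2)
    n_rep, sigma, thresh = 2000, 1.0, 1.5   # illustrative settings
    beta = np.array([3.0, 0.3, 0.0, 0.0])   # one strong signal, one weak, two nulls
    p = len(beta)

    # Orthogonal design: z ~ N(beta, sigma^2 I) plays the role of the OLS estimates.
    z = beta + sigma * rng.standard_normal(p)

    # Selection: keep coefficients whose estimates exceed the threshold.
    selected = np.abs(z) > thresh

    # Modified parametric bootstrap: threshold small estimates to ZERO before
    # resampling, so the bootstrap world mimics the post-selection distribution.
    beta_tilde = np.where(selected, z, 0.0)
    boot = beta_tilde[None, :] + sigma * rng.standard_normal((n_rep, p))

    # For each selected coefficient, a basic (reflected) bootstrap interval
    # that re-applies the selection rule to every replicate.
    for j in np.flatnonzero(selected):
        zj = boot[:, j]
        keep = np.abs(zj) > thresh            # replicate the selection event
        lo, hi = np.percentile(zj[keep], [2.5, 97.5])
        print(f"coef {j}: estimate {z[j]:.2f}, 95% post-selection CI "
              f"[{2 * beta_tilde[j] - hi:.2f}, {2 * beta_tilde[j] - lo:.2f}]")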




Variance Estimates and Model Selection


Book Description

The large majority of criteria for model selection are functions of the usual variance estimate for a regression model. The validity of the usual variance estimate depends on several assumptions, most critically the validity of the model being estimated. This assumption is often violated in model selection contexts, where the model search takes place over invalid models. A cross-validated variance estimate is more robust to specification errors (see, for example, Efron, 1983). We consider the effects of replacing the usual variance estimate with a cross-validated variance estimate, namely the Prediction Sum of Squares (PRESS), in several model selection criteria. Such replacements improve the probability of finding the true model, at least in large samples.
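The PRESS statistic itself is cheap to compute: for OLS, the leave-one-out residuals are available in closed form from the ordinary residuals and the leverages, e_(i) = e_i / (1 - h_ii), so no refitting is needed. A minimal numpy sketch (variable names and the final normalization convention are ours):

    import numpy as np

    def press(X, y):
        """Prediction Sum of Squares for OLS: the sum of squared leave-one-out
        residuals, via the hat-matrix shortcut e_(i) = e_i / (1 - h_ii)."""
        H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
        e = y - H @ y                           # ordinary residuals
        h = np.diag(H)                          # leverages
        return np.sum((e / (1.0 - h)) ** 2)

    # Example: compare two nested models by PRESS (smaller is better).
    rng = np.random.default_rng(3)
    n = 100
    x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
    y = 1.0 + 2.0 * x1 + rng.standard_normal(n)        # x2 is irrelevant
    X_small = np.column_stack([np.ones(n), x1])
    X_big = np.column_stack([np.ones(n), x1, x2])
    print(press(X_small, y), press(X_big, y))
    # A cross-validated variance estimate to plug into a selection criterion
    # would then be, e.g., press(X, y) / n (normalizations vary).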




Uncertainty Quantification in High Dimensional Model Selection and Inference for Regression


Book Description

Recent advances in $\ell_1$-regularization methods have proved to be very useful for high dimensional model selection and inference. In the high dimensional regression context, the lasso and its extensions have been successfully employed to identify parsimonious sets of predictors. It is well known that the lasso has the advantage of performing model selection and estimation simultaneously. It is less well understood how much uncertainty the lasso estimates may carry when sample sizes are small. To model this uncertainty, we present a method, called the "contour Bayesian lasso," for constructing joint credible regions for regression parameters. The contour Bayesian lasso is an extension of a recent approach called the "Bayesian lasso," which in turn is based on the Bayesian interpretation of the lasso. The Bayesian lasso uses a Gibbs sampler to draw from the Bayesian lasso posterior and is thus a convenient approach for quantifying the uncertainty of lasso estimates. We give theoretical results regarding the optimality of the contour approach, and study posterior consistency and the convergence of the Gibbs sampler. We also analyze the frequentist properties of the Bayesian lasso approach. A theoretical analysis of how the convergence of the Gibbs sampler depends on the dimensionality and sample size is undertaken. Our methodology is also illustrated on simulated and real data. We demonstrate that our posterior credible method has good coverage and thus yields more accurate sparse solutions when the sample size is small. Real-life examples are given using the South African prostate cancer data and the diabetes data set.
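The Gibbs sampler referred to here is, in its basic form, the Park and Casella (2008) sampler, which cycles through three conditional updates. The following compact sketch holds the penalty parameter lambda fixed (the book's treatment, and the contour construction of joint credible regions, go further); the function name, fixed-lambda choice, and iteration count are our illustrative assumptions.

    import numpy as np

    def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=5000, seed=0):
        """Gibbs sampler for the Bayesian lasso (Park & Casella, 2008 style),
        with the penalty parameter lambda held fixed."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        XtX, Xty = X.T @ X, X.T @ y
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        sigma2, tau2 = 1.0, np.ones(p)
        draws = np.empty((n_iter, p))
        for it in range(n_iter):
            # beta | rest ~ N(A^{-1} X'y, sigma2 * A^{-1}), A = X'X + diag(1/tau2)
            A_inv = np.linalg.inv(XtX + np.diag(1.0 / tau2))
            beta = rng.multivariate_normal(A_inv @ Xty, sigma2 * A_inv)
            # sigma2 | rest ~ Inverse-Gamma((n-1+p)/2, (RSS + beta' D^{-1} beta)/2)
            resid = y - X @ beta
            shape = (n - 1 + p) / 2.0
            scale = (resid @ resid + beta @ (beta / tau2)) / 2.0
            sigma2 = scale / rng.gamma(shape)   # draw 1/Gamma = Inverse-Gamma
            # 1/tau2_j | rest ~ Inverse-Gaussian(sqrt(lam^2 sigma2 / beta_j^2), lam^2)
            mu = np.sqrt(lam**2 * sigma2 / beta**2)
            tau2 = 1.0 / rng.wald(mu, lam**2)
            draws[it] = beta
        return draws

    # Usage: draws = bayesian_lasso_gibbs(X, y, lam=0.5); then, after burn-in,
    # np.percentile(draws[1000:], [2.5, 97.5], axis=0) gives marginal intervals.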