Estimation and Testing Following Model Selection


Book Description

The field of post-selection inference focuses on developing solutions for problems in which a researcher uses a single dataset both to identify a promising set of hypotheses and to conduct statistical inference. One promising heuristic for adjusting for model or hypothesis selection in inference is that of conditioning on the selection event (conditional inference), where the data are constrained to a subset of the sample space that guarantees the selection of a specific model. Two major obstacles to conducting valid and tractable conditional inference are that the conditional distribution of the data does not converge to a normal distribution asymptotically, and that the likelihood itself is often intractable in multivariate problems. A key idea underlying most recent work on conditional inference in regression is the polyhedral lemma, which overcomes these difficulties by conditioning on information beyond the selection of a model to obtain a tractable inference procedure with finite-sample guarantees. However, this extra conditioning comes at a hefty price, as it results in oversized confidence intervals and tests with less power. Our goal in this dissertation is to propose alternative approaches to conditional inference which do not rely on any extra conditioning. First, we tackle the problem of estimation following model selection. To overcome the intractable conditional likelihood, we generate noisy unbiased estimates of the post-selection score function and use them in a stochastic ascent algorithm that yields correct post-selection maximum likelihood estimates. We apply the proposed technique to the problem of estimating linear models selected by the lasso. In an asymptotic analysis, the resulting estimates are shown to be consistent for the selected parameters, and in a simulation study they are shown to offer better estimation accuracy than the lasso estimator in most of the simulation settings considered.
In Chapter 3, we consider the problem of inference following aggregate tests in regression. There, we formulate the polyhedral lemma for inference following model selection with aggregate tests, but also propose two alternative approaches for conducting valid post-selection inference. The first is based on conducting inference under a conservative parametrization, and the second is a regime-switching method which yields point-wise consistent confidence intervals by estimating the post-selection distribution of the data. In a simulation study, we show that the proposed methods control the selective type-I error rate while offering improved power. In Chapter 4, we extend the regime-switching approach to the more general setting of conducting inference after model selection in regression. We propose a modified bootstrap approach in which we seek to consistently estimate the post-selection distribution of the data by thresholding small coefficients to zero and taking parametric bootstrap samples from the estimated conditional distribution. In an asymptotic analysis, we show that the resulting confidence intervals are point-wise consistent. In a simulation study, we show that our modified bootstrap procedure obtains the desired coverage rate in all simulation settings considered while producing much shorter confidence intervals with improved power to detect true signals in the selected model.
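
The idea of conditioning on the selection event can be illustrated in a deliberately simple univariate setting. The sketch below is not the dissertation's method: it assumes a single observation Y ~ N(mu, 1) that is reported only when it exceeds a cutoff, and compares the naive p-value for mu = 0 with the p-value computed from the truncated (conditional) distribution. The function names, the cutoff, and the observed value are all hypothetical choices made for illustration.

```python
import math


def normal_sf(x):
    """Upper-tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def naive_p_value(y):
    """One-sided p-value for H0: mu = 0 that ignores selection."""
    return normal_sf(y)


def conditional_p_value(y, threshold):
    """One-sided p-value for H0: mu = 0 given that Y > threshold.

    Under H0 the selected observation follows a standard normal
    truncated to (threshold, infinity), so
    P(Y > y | Y > threshold) = P(Y > y) / P(Y > threshold).
    """
    if y <= threshold:
        raise ValueError("y must exceed the threshold to have been selected")
    return normal_sf(y) / normal_sf(threshold)


# Hypothetical numbers: the observation is reported only if it exceeds 2.0.
y_obs, cutoff = 2.5, 2.0
p_naive = naive_p_value(y_obs)
p_cond = conditional_p_value(y_obs, cutoff)
print(f"naive p-value:       {p_naive:.4f}")
print(f"conditional p-value: {p_cond:.4f}")
```

In this toy case the conditional p-value is much larger than the naive one, which reflects how strongly selection alone can make an effect look significant.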







Econometric Analysis of Model Selection and Model Testing


Book Description

In recent years, econometricians have examined the problems of diagnostic testing, specification testing, semiparametric estimation and model selection. In addition, researchers have considered whether to use model testing and model selection procedures to decide which models best fit a particular dataset. This book explores both issues with application to various regression models, including arbitrage pricing theory models. It is ideal as a reference for postgraduate students in the statistical sciences, academic researchers and policy makers seeking to understand the current status of model building and testing techniques.




From Data to Model


Book Description

The problem of obtaining dynamical models directly from an observed time series occurs in many fields of application. There are a number of possible approaches to this problem. In this volume a number of such points of view are presented: the statistical time series approach, a theory of guaranteed performance, and finally a deterministic approximation approach. This volume is an outgrowth of a number of get-togethers sponsored by the Systems and Decision Sciences group of the International Institute of Applied Systems Analysis (IIASA) in Laxenburg, Austria. The hospitality and support of this organization are gratefully acknowledged.

Jan Willems
Groningen, the Netherlands
May 1989

TABLE OF CONTENTS

Linear System Identification: A Survey, M. Deistler
A Tutorial on Hankel-Norm Approximation, K. Glover
A Deterministic Approach to Approximate Modelling, C. Heij and J. C. Willems
Identification: A Theory of Guaranteed Estimates, A. B. Kurzhanski
Statistical Aspects of Model Selection, R. Shibata
Index
Addresses of Authors

LINEAR SYSTEM IDENTIFICATION: A SURVEY
M. DEISTLER

Abstract: In this paper we give an introductory survey on the theory of identification of (in general MIMO) linear systems from (discrete) time series data. The main parts are: structure theory for linear systems, asymptotic properties of maximum likelihood type estimators, estimation of the dynamic specification by methods based on information criteria and, finally, extensions and alternative approaches such as identification of unstable systems and errors-in-variables.

Keywords: Linear systems, parametrization, maximum likelihood estimation, information criteria, errors-in-variables.




Model Selection and Multimodel Inference


Book Description

A unique and comprehensive text on the philosophy of model-based data analysis and strategy for the analysis of empirical data. The book introduces information theoretic approaches and focuses critical attention on a priori modeling and the selection of a good approximating model that best represents the inference supported by the data. It contains several new approaches to estimating model selection uncertainty and incorporating selection uncertainty into estimates of precision. An array of examples is given to illustrate various technical issues. The text has been written for biologists and statisticians using models for making inferences from empirical data.




Model Selection and Inference


Book Description

Statisticians and applied scientists must often select a model to fit empirical data. This book discusses the philosophy and strategy of selecting such a model using the information theory approach pioneered by Hirotugu Akaike. This approach focuses critical attention on a priori modeling and the selection of a good approximating model that best represents the inference supported by the data. The book includes practical applications in biology and environmental science.




Regression and Time Series Model Selection


Book Description

This important book describes procedures for selecting a model from a large set of competing statistical models. It includes model selection techniques for univariate and multivariate regression models, univariate and multivariate autoregressive models, nonparametric (including wavelets) and semiparametric regression models, and quasi-likelihood and robust regression models. Information-based model selection criteria are discussed, and small sample and asymptotic properties are presented. The book also provides examples and large scale simulation studies comparing the performances of information-based model selection criteria, bootstrapping, and cross-validation selection methods over a wide range of models.




Hypothesis Testing and Model Selection in the Social Sciences


Book Description

Examining the major approaches to hypothesis testing and model selection, this book blends statistical theory with recommendations for practice, illustrated with real-world social science examples. It systematically compares classical (frequentist) and Bayesian approaches, showing how they are applied, exploring ways to reconcile the differences between them, and evaluating key controversies and criticisms. The book also addresses the role of hypothesis testing in the evaluation of theories, the relationship between hypothesis tests and confidence intervals, and the role of prior knowledge in Bayesian estimation and Bayesian hypothesis testing. Two easily calculated alternatives to standard hypothesis tests are discussed in depth: the Akaike information criterion (AIC) and Bayesian information criterion (BIC). The companion website (www.guilford.com/weakliem-materials) supplies data and syntax files for the book's examples.
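
To make the two criteria named above concrete, here is a minimal sketch using their standard definitions, AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, where k is the number of estimated parameters, n the sample size, and L the maximized likelihood. The example is not taken from the book: the data values are made up, and the two candidate models (a Gaussian with mean fixed at zero versus a Gaussian with an estimated mean) are assumptions chosen only to keep the code short.

```python
import math


def gaussian_loglik(residuals):
    """Maximized Gaussian log-likelihood, with the variance set to its ML estimate."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion: k log(n) - 2 log L (lower is better)."""
    return k * math.log(n) - 2.0 * loglik


# Made-up data, for illustration only.
data = [0.8, 1.3, -0.2, 1.9, 0.7, 1.1, 0.4, 1.6]
n = len(data)
mean = sum(data) / n

# Model "zero_mean": mean fixed at 0, one estimated parameter (the variance).
ll_zero = gaussian_loglik(data)
# Model "free_mean": mean and variance both estimated, two parameters.
ll_free = gaussian_loglik([y - mean for y in data])

scores = {
    "zero_mean": (aic(ll_zero, 1), bic(ll_zero, 1, n)),
    "free_mean": (aic(ll_free, 2), bic(ll_free, 2, n)),
}
for name, (a, b) in scores.items():
    print(f"{name}: AIC = {a:.2f}, BIC = {b:.2f}")
```

Both criteria trade goodness of fit against the number of parameters. BIC's penalty grows with the sample size, so it tends to favor smaller models than AIC once n is moderately large.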




Forecasting: principles and practice


Book Description

Forecasting is required in many situations. Stocking an inventory may require forecasts of demand months in advance. Telecommunication routing requires traffic forecasts a few minutes ahead. Whatever the circumstances or time horizons involved, forecasting is an important aid in effective and efficient planning. This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly.



