Bayesian Model Selection Consistency for High-dimensional Regression


Book Description

Bayesian model selection has enjoyed considerable prominence in high-dimensional variable selection in recent years. Despite its popularity, its asymptotic theory in high dimensions has not been fully explored. In this study, we identify prior conditions for Bayesian model selection consistency in high-dimensional regression settings. In a Bayesian framework, posterior model probabilities quantify the importance of models given the observed data, so our focus is on the asymptotic behavior of posterior model probabilities when the number of potential predictors grows with the sample size. This dissertation contains the following three projects.

In the first project, we investigate the asymptotic behavior of posterior model probabilities under Zellner's g-prior, one of the most popular choices for model selection in Bayesian linear regression. We establish a simple and intuitive condition on the g-prior under which the posterior model distribution concentrates at the true model as the sample size increases, even when the number of predictors grows much faster than the sample size. Simulation results indicate that satisfying this condition is essential for the success of Bayesian high-dimensional variable selection under the g-prior.

In the second project, we extend our framework to a general class of priors. The most pressing challenge in this generalization is that the marginal likelihood cannot be expressed in closed form. To address this problem, we develop a general form of Laplace approximation for the high-dimensional setting and use it to establish general sufficient conditions for high-dimensional Bayesian model selection consistency. Our simulation study and real data analysis demonstrate that the proposed conditions allow us to identify the true data-generating model consistently.

In the third project, we extend our framework to Bayesian generalized linear models. The distinctive feature of the proposed framework is that we do not impose any specific form of data distribution. We develop a general condition under which the true model tends to maximize the marginal likelihood even when the number of predictors increases faster than the sample size. This condition provides useful guidelines for the specification of priors, including hyperparameter selection. Our simulation study demonstrates the validity of the proposed condition for Bayesian model selection consistency with non-Gaussian data.
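
A minimal numerical sketch of the first project's setting, assuming the standard closed-form marginal likelihood under the g-prior with a Jeffreys prior on the intercept and error variance (Liang et al., 2008) and a uniform prior over models; the helper name and the unit-information choice g = n are illustrative, and the dissertation's growth condition on g is not reproduced here:

```python
# Illustrative sketch (not the dissertation's method): posterior model
# probabilities under Zellner's g-prior, using the closed-form Bayes factor
#   BF(M_gamma : M_0) = (1+g)^{(n-p_gamma-1)/2} / (1 + g(1-R_gamma^2))^{(n-1)/2}.
import itertools
import numpy as np

def g_prior_log_bf(y, X, cols, g):
    """Log Bayes factor of the model indexed by `cols` against the null model."""
    n = len(y)
    yc = y - y.mean()                      # intercept handled by centering
    if not cols:
        return 0.0
    Xg = X[:, cols] - X[:, cols].mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xg, yc, rcond=None)
    rss = np.sum((yc - Xg @ beta) ** 2)
    r2 = 1.0 - rss / np.sum(yc ** 2)       # usual coefficient of determination
    p_g = len(cols)
    return 0.5 * (n - p_g - 1) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

rng = np.random.default_rng(0)
n, p = 100, 6
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.standard_normal(n)   # true model {0, 2}

g = float(n)                               # unit-information choice g = n
models = [c for r in range(p + 1) for c in itertools.combinations(range(p), r)]
log_bf = np.array([g_prior_log_bf(y, X, list(m), g) for m in models])
probs = np.exp(log_bf - log_bf.max())
probs /= probs.sum()                       # uniform prior over models
print(models[int(np.argmax(probs))], probs.max())
```

With this strong signal the posterior mode should recover the true model {0, 2}; under conditions on g of the kind studied in the dissertation, the posterior mass at the true model tends to one as the sample size grows.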




Bayesian Hypothesis Testing and Variable Selection in High Dimensional Regression


Book Description

Abstract: This dissertation consists of three distinct but related research projects. First, we study the Bayesian approach to model selection in the class of normal regression models. We propose an explicit closed-form expression for the Bayes factor based on Zellner's g-prior with a beta-prime prior for g. Because linear models with a growing number of unknown parameters have recently gained popularity in practice, for example in spline problems, we are particularly interested in the model selection consistency of the Bayes factor when the dimension of the parameter space increases with the sample size. Our results show that the proposed Bayes factor is always consistent under the null model and is consistent under the alternative model except for a small, explicitly characterized set of alternatives. These results also apply to the analysis of variance (ANOVA) model, which is widely used in many areas of science, such as ecology, psychology, and behavioral research. For the one-way unbalanced ANOVA model, we propose an explicit closed-form Bayes factor that is easy to compute, and we investigate its model selection consistency under different asymptotic regimes. For one-way random effects models, we also propose a closed-form Bayes factor, free of integral representations, with reasonable model selection consistency under different asymptotic scenarios. The performance of the proposed Bayes factors is examined in numerical studies.

The second project deals with intrinsic Bayesian inference for the correlation coefficient between the disturbances in a system of two seemingly unrelated regression equations. This work was inspired by the observation that considerable attention has been paid to improved estimation of the regression coefficients of each model, whereas little attention has been paid to inference for the correlation coefficient, even though most of the improved estimators of the regression coefficients depend on it. We propose an objective Bayesian solution to hypothesis testing and point estimation for the correlation coefficient based on the combined use of an invariant loss function and an objective prior distribution for the unknown model parameters. The new solution is invariant under monotonic reparameterization of the quantity of interest. Simulation studies and a real-data example are given for illustration.

In the third project, we propose a new Bayesian strength of evidence, built on divergence measures, for testing point null hypotheses. The proposed approach can be viewed as an objective and automatic solution to the problem of testing a point null hypothesis. We show that the new evidence reconciles the disagreement between frequentists and Bayesians in many classical examples in which Lindley's paradox occurs; in particular, under a noninformative prior it often recovers the frequentist P-value. From a Bayesian decision-theoretic viewpoint, the new evidence is a formal Bayes test for certain loss functions. The performance of the proposed approach is illustrated through several numerical examples, and possible applications to a variety of point null hypothesis testing problems are briefly discussed.
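
The closed form derived in the dissertation is not reproduced here, but the quantity it targets can be approximated generically by mixing the fixed-g Bayes factor (Liang et al., 2008) over a beta-prime prior on g via numerical integration; the hyperparameters a and b below are illustrative placeholders, not the dissertation's recommended values:

```python
# Sketch: Bayes factor with a beta-prime prior on g, by numerical integration.
# The dissertation derives a closed form; this generic mixture is only a
# numerical stand-in for checking values.
import numpy as np
from scipy import integrate, special

def fixed_g_bf(g, n, p, r2):
    """Bayes factor against the null for fixed g (Liang et al., 2008)."""
    return (1 + g) ** ((n - p - 1) / 2) / (1 + g * (1 - r2)) ** ((n - 1) / 2)

def beta_prime_pdf(g, a, b):
    return g ** (a - 1) * (1 + g) ** (-a - b) / special.beta(a, b)

def mixed_bf(n, p, r2, a=0.5, b=0.5):
    integrand = lambda g: fixed_g_bf(g, n, p, r2) * beta_prime_pdf(g, a, b)
    val, _ = integrate.quad(integrand, 0, np.inf)
    return val

print(mixed_bf(n=100, p=3, r2=0.4))   # BF > 1 favors the non-null model
```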




Handbook of Bayesian Variable Selection


Book Description

Bayesian variable selection has experienced substantial developments over the past 30 years with the proliferation of large data sets. Identifying relevant variables to include in a model allows simpler interpretation, avoids overfitting and multicollinearity, and can provide insights into the mechanisms underlying an observed phenomenon. Variable selection is especially important when the number of potential predictors is substantially larger than the sample size and sparsity can reasonably be assumed. The Handbook of Bayesian Variable Selection provides a comprehensive review of theoretical, methodological and computational aspects of Bayesian methods for variable selection. The topics covered include spike-and-slab priors, continuous shrinkage priors, Bayes factors, Bayesian model averaging, partitioning methods, as well as variable selection in decision trees and edge selection in graphical models. The handbook targets graduate students and established researchers who seek to understand the latest developments in the field. It also provides a valuable reference for all interested in applying existing methods and/or pursuing methodological extensions.

Features:
- Provides a comprehensive review of methods and applications of Bayesian variable selection.
- Divided into four parts: Spike-and-Slab Priors; Continuous Shrinkage Priors; Extensions to Various Modeling Frameworks; Other Approaches to Bayesian Variable Selection.
- Covers theoretical and methodological aspects, as well as worked-out examples with R code provided in the online supplement.
- Includes contributions by experts in the field.
- Supported by a website with code, data, and other supplementary material.




Uncertainty Quantification in High Dimensional Model Selection and Inference for Regression


Book Description

Recent advances in $\ell_1$-regularization methods have proved very useful for high-dimensional model selection and inference. In the high-dimensional regression context, the lasso and its extensions have been successfully employed to identify parsimonious sets of predictors. It is well known that the lasso performs model selection and estimation simultaneously; it is less well understood how much uncertainty the lasso estimates carry when sample sizes are small. To model this uncertainty, we present a method, called the "contour Bayesian lasso," for constructing joint credible regions for regression parameters. The contour Bayesian lasso extends a recent approach called the "Bayesian lasso," which in turn is based on the Bayesian interpretation of the lasso. The Bayesian lasso uses a Gibbs sampler to generate draws from the Bayesian lasso posterior and is thus a convenient approach for quantifying the uncertainty of lasso estimates. We give theoretical results on the optimality of the contour approach, study posterior consistency, and analyze the convergence of the Gibbs sampler, including how it depends on the dimensionality and sample size. We also analyze the frequentist properties of the Bayesian lasso approach. Our methodology is illustrated on simulated and real data: we demonstrate that our posterior credible regions have good coverage and yield more accurate sparse solutions when the sample size is small. Real-life examples are given for the South African prostate cancer data and the diabetes data set.
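
For concreteness, the Bayesian lasso machinery referred to above can be sketched as a Gibbs sampler with the Park and Casella (2008) full conditionals; this minimal version fixes the penalty parameter lambda instead of placing a hyperprior on it, and omits the dissertation's contour construction, so it is a sketch of the sampler only:

```python
# Minimal Bayesian lasso Gibbs sampler (Park & Casella, 2008), from whose
# draws joint or marginal credible regions can be read off.
import numpy as np

def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    sigma2, inv_tau2 = 1.0, np.ones(p)
    draws = np.empty((n_iter, p))
    XtX, Xty = X.T @ X, X.T @ y
    for t in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 A^{-1}),  A = X'X + diag(1/tau_j^2)
        A = XtX + np.diag(inv_tau2)
        L = np.linalg.cholesky(np.linalg.inv(A))
        mean = np.linalg.solve(A, Xty)
        beta = mean + np.sqrt(sigma2) * L @ rng.standard_normal(p)
        # sigma2 | rest ~ Inverse-Gamma((n-1+p)/2, rate)
        resid = y - X @ beta
        shape = (n - 1 + p) / 2
        rate = (resid @ resid + beta @ (inv_tau2 * beta)) / 2
        sigma2 = rate / rng.gamma(shape)
        # 1/tau_j^2 | rest ~ Inverse-Gaussian(sqrt(lam^2 sigma2 / beta_j^2), lam^2)
        mu = np.sqrt(lam**2 * sigma2 / np.maximum(beta**2, 1e-12))
        inv_tau2 = rng.wald(mu, lam**2)
        draws[t] = beta
    return draws

# Usage: 95% marginal credible intervals from the draws
# draws = bayesian_lasso_gibbs(X, y)
# np.percentile(draws, [2.5, 97.5], axis=0)
```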




Consistency of an Information Criterion for High-Dimensional Multivariate Regression


Book Description

This is the first book to evaluate the (weak) consistency of information criteria for variable selection in high-dimensional multivariate linear regression models using a high-dimensional asymptotic framework, in which the sample size n and the dimension p of the response vector approach ∞ simultaneously under the condition that p/n converges to a constant in [0, 1). Most statistical textbooks evaluate the consistency of an information criterion using the large-sample asymptotic framework, in which n goes to ∞ while p is held fixed. Evaluating consistency under the high-dimensional asymptotic framework yields new insights: for example, Akaike's information criterion (AIC) can become consistent under the high-dimensional framework even though it is never consistent under the large-sample framework, and the Bayesian information criterion (BIC) can become inconsistent under the high-dimensional framework even though it is always consistent under the large-sample framework. This knowledge can help in choosing an information criterion for high-dimensional data analysis, an area that has been attracting the attention of many researchers.
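
A hedged toy simulation of the book's asymptotic regime: AIC (penalty 2) and BIC (penalty log n) are compared on subset selection in a multivariate regression whose response dimension grows proportionally with the sample size (p/n = 0.3 here). All constants and the data-generating model are illustrative and do not reproduce the book's exact conditions:

```python
# Toy empirical check of AIC/BIC selection behaviour when the response
# dimension p grows with n. Candidate models are subsets of k predictors.
import itertools
import numpy as np

def info_crit(Y, X, cols, penalty):
    """n*log|Sigma_hat| + penalty * (# mean parameters) for the given subset."""
    n, p = Y.shape
    if cols:
        B_hat, *_ = np.linalg.lstsq(X[:, cols], Y, rcond=None)
        R = Y - X[:, cols] @ B_hat
    else:
        R = Y
    _, logdet = np.linalg.slogdet(R.T @ R / n)
    return n * logdet + penalty * p * len(cols)

def selection_freq(n, k=4, true=(0, 2), ratio=0.3, reps=50, seed=1):
    rng = np.random.default_rng(seed)
    p = int(ratio * n)                      # response dimension grows with n
    hits = {"AIC": 0, "BIC": 0}
    subsets = [c for r in range(k + 1) for c in itertools.combinations(range(k), r)]
    for _ in range(reps):
        X = rng.standard_normal((n, k))
        B = np.zeros((k, p))
        B[list(true)] = 1.0
        Y = X @ B + rng.standard_normal((n, p))
        for name, pen in (("AIC", 2.0), ("BIC", np.log(n))):
            scores = [info_crit(Y, X, list(c), pen) for c in subsets]
            if subsets[int(np.argmin(scores))] == true:
                hits[name] += 1
    return {name: count / reps for name, count in hits.items()}

print(selection_freq(n=100))   # selection frequency of the true subset
```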




Bayesian Model Selection for High-dimensional High-throughput Data


Book Description

Bayesian methods are often criticized on the grounds of subjectivity, and misspecified priors can have a deleterious effect on Bayesian inference. Noting that model selection is effectively a test of many hypotheses, Dr. Valen E. Johnson sought to eliminate the need for prior specification by computing Bayes factors from frequentist test statistics. In pioneering work published in 2005, Dr. Johnson proposed using so-called local priors for computing Bayes factors from test statistics. Dr. Johnson and Dr. Jianhua Hu then used such Bayes factors for model selection in a linear model setting. In independent work, Dr. Johnson and another colleague, David Rossell, investigated two families of non-local priors for testing regression parameters in a linear model setting. These non-local priors enforce greater separation between the null and alternative hypotheses.

In this dissertation, I extend model selection based on Bayes factors and use non-local priors to define Bayes factors based on test statistics. With these priors, the problem of prior specification reduces to setting just one scaling parameter, which can easily be chosen, for example, on the basis of the frequentist operating characteristics of the corresponding Bayes factors. Furthermore, the loss of information from basing a Bayes factor on a test statistic is minimal.

Along with Dr. Johnson and Dr. Hu, I used Bayes factors based on the likelihood ratio statistic to develop a method for clustering gene expression data; this method has performed well in both simulated examples and real data sets, and an outline of that work is included in this dissertation. I further extend the clustering model to a subclass of decomposable graphical models, which is more appropriate for genotype data sets such as single-nucleotide polymorphism (SNP) data. Efficient FORTRAN programming has enabled me to apply the methodology to hundreds of nodes. For problems with computationally harder probability landscapes, I propose a modification of the Markov chain Monte Carlo algorithm to extract information about the important network structures in the data; this modified algorithm performs well in inferring complex network structures. I use this method to develop a prediction model for disease based on SNP data, which performs well in cross-validation studies.
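
To make the local versus non-local contrast concrete, here is a minimal sketch, not the dissertation's test-statistic construction, comparing Bayes factors for a normal-mean point null under a first-order moment prior (Johnson and Rossell, 2010), which vanishes at the null value, and under a local normal prior, which does not; tau is an illustrative scale hyperparameter:

```python
# Sketch: Bayes factor for H0: theta = 0 with xbar ~ N(theta, sigma^2/n),
# under a non-local moment prior pi_M(theta) = theta^2 N(theta; 0, tau^2)/tau^2
# versus a local N(0, tau^2) prior. Illustrative only.
import numpy as np
from scipy import integrate, stats

def bf10(xbar, n, sigma=1.0, tau=1.0, nonlocal_prior=True):
    se = sigma / np.sqrt(n)
    def prior(theta):
        base = stats.norm.pdf(theta, 0.0, tau)
        return theta**2 * base / tau**2 if nonlocal_prior else base
    marg1, _ = integrate.quad(
        lambda th: stats.norm.pdf(xbar, th, se) * prior(th), -np.inf, np.inf)
    marg0 = stats.norm.pdf(xbar, 0.0, se)   # marginal likelihood under H0
    return marg1 / marg0

print(bf10(0.01, n=100, nonlocal_prior=True),
      bf10(0.01, n=100, nonlocal_prior=False))
```

Because the moment prior places no mass near theta = 0, data consistent with the null yield a smaller BF10 (stronger evidence for H0) than under the local prior, which is the separation property the description refers to.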




Asymptotic Statistics


Book Description

This book is an introduction to the field of asymptotic statistics. The treatment is both practical and mathematically rigorous. In addition to most of the standard topics of an asymptotics course, including likelihood inference, M-estimation, the theory of asymptotic efficiency, U-statistics, and rank procedures, the book also presents recent research topics such as semiparametric models, the bootstrap, and empirical processes and their applications. The topics are organized around the central idea of approximation by limit experiments, which gives the book one of its unifying themes. This entails mainly the local approximation of the classical i.i.d. setup with smooth parameters by location experiments involving a single, normally distributed observation. Thus, even the standard subjects of asymptotic statistics are presented in a novel way. Suitable as a graduate or master's-level statistics text, this book will also give researchers an overview of the latest research in asymptotic statistics.




Statistics for High-Dimensional Data


Book Description

Modern statistics deals with large and complex data sets, and consequently with models containing a large number of parameters. This book presents a detailed account of recently developed approaches, including the Lasso and versions of it for various models, boosting methods, undirected graphical modeling, and procedures controlling false positive selections. A special characteristic of the book is that it contains comprehensive mathematical theory on high-dimensional statistics combined with methodology, algorithms and illustrations with real data examples. This in-depth approach highlights the methods’ great potential and practical applicability in a variety of settings. As such, it is a valuable resource for researchers, graduate students and experts in statistics, applied mathematics and computer science.




Sparse Graphical Modeling for High Dimensional Data


Book Description

- A general framework for learning sparse graphical models with conditional independence tests
- Complete treatments for different types of data: Gaussian, Poisson, multinomial, and mixed
- Unified treatments for data integration, network comparison, and covariate adjustment
- Unified treatments for missing data and heterogeneous data
- Efficient methods for joint estimation of multiple graphical models
- Effective methods for high-dimensional variable selection
- Effective methods for high-dimensional inference