Handbook of Bayesian Variable Selection


Book Description

Bayesian variable selection has experienced substantial developments over the past 30 years with the proliferation of large data sets. Identifying relevant variables to include in a model allows simpler interpretation, avoids overfitting and multicollinearity, and can provide insights into the mechanisms underlying an observed phenomenon. Variable selection is especially important when the number of potential predictors is substantially larger than the sample size and sparsity can reasonably be assumed. The Handbook of Bayesian Variable Selection provides a comprehensive review of theoretical, methodological and computational aspects of Bayesian methods for variable selection. The topics covered include spike-and-slab priors, continuous shrinkage priors, Bayes factors, Bayesian model averaging, partitioning methods, as well as variable selection in decision trees and edge selection in graphical models. The handbook targets graduate students and established researchers who seek to understand the latest developments in the field. It also provides a valuable reference for all interested in applying existing methods and/or pursuing methodological extensions.
Features:
Provides a comprehensive review of methods and applications of Bayesian variable selection.
Divided into four parts: Spike-and-Slab Priors; Continuous Shrinkage Priors; Extensions to various Modeling; Other Approaches to Bayesian Variable Selection.
Covers theoretical and methodological aspects, as well as worked out examples with R code provided in the online supplement.
Includes contributions by experts in the field.
Supported by a website with code, data, and other supplementary material.




Bayesian Variable Selection for High Dimensional Data Analysis


Book Description

In statistical modeling, an accurate predictive model is often the goal. Modern data sets usually have a large number of predictors, so parsimony is an especially important issue. Best-subset selection is a conventional method of variable selection. However, when the number of variables is large relative to the sample size and the variables are severely collinear, standard statistical methods for selecting relevant variables often face difficulties. Bayesian stochastic search variable selection has had much empirical success in a variety of applications. This book therefore proposes a modified Bayesian stochastic variable selection approach for variable selection and two-class or multi-class classification based on a (multinomial) probit regression model. We demonstrate the performance of the approach on many real data sets. The results show that our approach selects smaller numbers of relevant variables while achieving competitive classification accuracy.




Jointness in Bayesian Variable Selection with Applications to Growth Regression


Book Description

The authors present a measure of jointness to explore dependence among regressors in the context of Bayesian model selection. The jointness measure they propose equals the posterior odds ratio between those models that include a set of variables and the models that only include proper subsets. They show its application in cross-country growth regressions using two data-sets from the model-averaging growth literature.
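
The jointness measure described above can be illustrated with a small sketch. It assumes the posterior model probabilities are already available (for example from a model-averaging run) and that "proper subsets" means models containing some, but not all, of the variables in the set; that reading, along with the function name and the data layout, is an assumption made for illustration and is not taken from the book.

```python
def jointness(post_probs, var_set):
    """Posterior odds of 'all variables in var_set included' versus
    'some but not all of them included'.

    post_probs: dict mapping a frozenset of included variable names to
                that model's posterior probability (hypothetical layout).
    var_set:    iterable of variable names whose jointness is measured.
    """
    var_set = frozenset(var_set)
    joint_mass = 0.0    # models that contain every variable in var_set
    partial_mass = 0.0  # models that contain a non-empty proper subset
    for model, prob in post_probs.items():
        overlap = model & var_set
        if overlap == var_set:
            joint_mass += prob
        elif overlap:
            partial_mass += prob
        # models containing none of the variables are ignored (assumption)
    if partial_mass == 0.0:
        return float("inf")
    return joint_mass / partial_mass
```

A value above 1 would indicate that the variables tend to appear together in well-supported models, and a value below 1 that they tend to substitute for one another.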




Federal Statistics, Multiple Data Sources, and Privacy Protection


Book Description

The environment for obtaining information and providing statistical data for policy makers and the public has changed significantly in the past decade, raising questions about the fundamental survey paradigm that underlies federal statistics. New data sources provide opportunities to develop a new paradigm that can improve timeliness, geographic or subpopulation detail, and statistical efficiency. Such a paradigm also has the potential to reduce the costs of producing federal statistics. The panel's first report described federal statistical agencies' current paradigm, which relies heavily on sample surveys for producing national statistics, and challenges agencies are facing; the legal frameworks and mechanisms for protecting the privacy and confidentiality of statistical data and for providing researchers access to data, and challenges to those frameworks and mechanisms; and statistical agencies' access to alternative sources of data. The panel recommended a new approach for federal statistical programs that would combine diverse data sources from government and private sector sources and the creation of a new entity that would provide the foundational elements needed for this new approach, including legal authority to access data and protect privacy. This second of the panel's two reports builds on the analysis, conclusions, and recommendations in the first one. This report assesses alternative methods for implementing a new approach that would combine diverse data sources from government and private sector sources, including describing statistical models for combining data from multiple sources; examining statistical and computer science approaches that foster privacy protections; evaluating frameworks for assessing the quality and utility of alternative data sources; and considering various models for implementing the recommended new entity.
Together, the two reports offer ideas and recommendations to help federal statistical agencies examine and evaluate data from alternative sources and then combine them as appropriate to provide the country with more timely, actionable, and useful information for policy makers, businesses, and individuals.




A Bayesian Variable Selection Method with Applications to Spatial Data


Book Description

This thesis first describes the general idea behind Bayesian inference and various sampling methods based on Bayes' theorem, illustrated with many examples. It then discusses a Bayesian approach to model selection called Stochastic Search Variable Selection (SSVS), originally proposed by George and McCulloch (1993). In a normal regression model where the number of covariates is large, typically only a small subset is significant. This Bayesian procedure specifies a mixture prior for each of the unknown regression coefficients; the mixture prior was originally proposed by Geweke (1996). The mixture prior is updated as data become available, generating a posterior distribution that assigns higher posterior probabilities to coefficients that are significant in explaining the response. A spatial modeling method is also described, and prior distributions for all unknown parameters and latent variables are specified. Simulation studies under different models were carried out to test the efficiency of SSVS. A real dataset from a small region of the Cape Floristic Region in South Africa is used to analyze the distribution of plants in that region. The original multi-category response is transformed into a presence/absence (binary) response for simpler analysis. First, SSVS is applied to this dataset to select the subset of significant covariates. Then a spatial model is fitted using the chosen covariates and, post-estimation, predictive maps of the posterior probabilities of presence and absence are obtained for the study region. Posterior estimates of the true regression coefficients are also provided, along with a map of the spatial random effects.
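
The SSVS procedure summarized above can be sketched as a short Gibbs sampler. This is a minimal illustration, not the thesis's implementation: it assumes a two-component normal mixture prior on each coefficient (a narrow "spike" and a wide "slab"), an inverse-gamma prior on the noise variance, and a non-spatial linear model. The function name, hyperparameter names, and default values are all hypothetical.

```python
import numpy as np

def ssvs_gibbs(X, y, n_iter=2000, burn_in=500, tau=0.1, c=10.0,
               prior_incl=0.5, a=2.0, b=2.0, seed=0):
    """Return estimated posterior inclusion probabilities per covariate.

    Assumed prior: beta_j ~ N(0, tau^2) if gamma_j = 0 (spike),
                   beta_j ~ N(0, (c*tau)^2) if gamma_j = 1 (slab),
                   gamma_j ~ Bernoulli(prior_incl),
                   sigma^2 ~ InverseGamma(a, b).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    spike_var, slab_var = tau ** 2, (c * tau) ** 2
    gamma = np.ones(p, dtype=int)
    sigma2 = 1.0
    incl_counts = np.zeros(p)

    for it in range(n_iter):
        # 1. beta | gamma, sigma2, y: multivariate normal
        prior_var = np.where(gamma == 1, slab_var, spike_var)
        cov = np.linalg.inv(XtX / sigma2 + np.diag(1.0 / prior_var))
        cov = (cov + cov.T) / 2.0  # enforce symmetry numerically
        beta = rng.multivariate_normal(cov @ Xty / sigma2, cov)

        # 2. sigma2 | beta, y: inverse gamma, drawn as 1 / Gamma
        resid = y - X @ beta
        rate = b + 0.5 * (resid @ resid)
        sigma2 = 1.0 / rng.gamma(a + n / 2.0, 1.0 / rate)

        # 3. gamma_j | beta_j: Bernoulli, comparing slab and spike densities
        log_slab = (np.log(prior_incl) - 0.5 * np.log(slab_var)
                    - 0.5 * beta ** 2 / slab_var)
        log_spike = (np.log1p(-prior_incl) - 0.5 * np.log(spike_var)
                     - 0.5 * beta ** 2 / spike_var)
        prob_incl = 1.0 / (1.0 + np.exp(log_spike - log_slab))
        gamma = (rng.random(p) < prob_incl).astype(int)

        if it >= burn_in:
            incl_counts += gamma

    return incl_counts / (n_iter - burn_in)
```

Covariates whose returned inclusion probability is high would be the ones carried forward to a subsequent model, such as the spatial model the thesis describes. The spike and slab scales strongly affect the result and would need tuning for real data.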




Bayesian Variable Selection for Non-Gaussian Data Using Global-Local Shrinkage Priors and the Multivariate Logit-Beta Distribution


Book Description

Variable selection has become an important and growing problem in Bayesian analysis. The literature on Bayesian variable selection methods tends to address a single response type, most typically a continuous response type, where the data are assumed to be Gaussian/symmetric. In this dissertation, we develop a novel global-local shrinkage prior for non-symmetric settings and multiple-response-type settings by combining the perspectives of global-local shrinkage and the conjugate multivariate distribution.
In Chapter 2, we focus on the problem of variable selection when the data are possibly non-symmetric and continuous-valued. We propose modeling continuous-valued data and the coefficient vector with the multivariate logit-beta (MLB) distribution. To perform variable selection in a Bayesian context, we use global-local shrinkage priors to enforce sparsity. Specifically, such a prior can be defined as a Gaussian scale mixture with a global shrinkage parameter and a local shrinkage parameter for each regression coefficient. We provide a technical discussion that illustrates that our use of the multivariate logit-beta distribution under a Pólya-Gamma augmentation scheme has an explicit connection to a well-known global-local shrinkage method (i.e., the horseshoe prior) and extends it to possibly non-symmetric data. Moreover, our method can be implemented using an efficient block Gibbs sampler. Evidence of improvements in terms of mean squared error and variable selection, as compared to the standard implementation of the horseshoe prior for skewed data settings, is provided in simulated and real data examples.
In Chapter 3, we direct our attention to the canonical variable selection problem in multiple-response-type settings, where the observed dataset consists of multiple response types (e.g., continuous, count-valued, Bernoulli trials, etc.). We propose the same global-local shrinkage prior as in Chapter 2, but for datasets with multiple response types. Applying our Bayesian variable selection method to such data is straightforward because the multivariate logit-beta prior is the conjugate prior for several members of the natural exponential family of distributions, which leads to the binomial/beta and negative binomial/beta hierarchical models. Our proposed model allows not only the estimation and selection of independent regression coefficients, but also those of regression coefficients shared across response types, which can be used to explicitly model dependence in spatial and time-series settings. An efficient block Gibbs sampler is developed, which is found to be effective in obtaining accurate estimates and variable selection results in simulation studies and an analysis of public health and financial costs from natural disasters in the U.S.
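
For reference, the Gaussian scale mixture form of a global-local prior mentioned in this description can be written out for the horseshoe case. The following is the standard horseshoe formulation, given here only as background; it is not the dissertation's MLB-based prior, and the symbols $\beta_j$, $\lambda_j$, and $\tau$ are notation assumed for this sketch.

```latex
% Horseshoe prior as a Gaussian scale mixture (standard form, assumed notation):
%   beta_j   : j-th regression coefficient
%   lambda_j : local shrinkage parameter for coefficient j
%   tau      : global shrinkage parameter shared by all coefficients
\begin{align*}
  \beta_j \mid \lambda_j, \tau &\sim \mathcal{N}\!\left(0,\ \lambda_j^{2}\,\tau^{2}\right),
      & j &= 1, \dots, p, \\
  \lambda_j &\sim \mathrm{C}^{+}(0, 1), \\
  \tau &\sim \mathrm{C}^{+}(0, 1),
\end{align*}
% where C^+(0, 1) denotes the standard half-Cauchy distribution.
```

The global parameter $\tau$ pulls all coefficients toward zero, while the heavy-tailed local parameters $\lambda_j$ let individual large coefficients escape that shrinkage.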