Bayesian Variable Selection Via a Benchmark


Book Description

With the increasing prevalence of high dimensional data over the past decades, variable selection through likelihood penalization remains a popular yet challenging research area in statistics. Ridge and Lasso, two of the most popular penalized regression methods, serve as the foundation of regularization techniques and have motivated numerous extensions to accommodate various circumstances, mostly within frequentist models. These two regularization problems can also be solved by their Bayesian counterparts, by placing appropriate priors on the regression parameters and then applying Gibbs sampling. Compared to the frequentist version, the Bayesian framework enables easier interpretation and more straightforward inference on the parameters, based on the posterior distributional results. In general, however, Bayesian approaches do not provide sparse estimates for the regression coefficients. In this thesis, an innovative Bayesian variable selection method via a benchmark variable, in conjunction with a modified BIC, is first proposed under the framework of linear regression models, to promote both model sparsity and accuracy. The motivation for introducing such a benchmark is discussed, and the statistical properties regarding its role in the model are demonstrated. In short, it serves as a criterion for measuring the importance of each variable based on the posterior inference of the corresponding coefficients, and only the most important variables, namely those yielding the minimal modified BIC value, are included. The Bayesian approach via a benchmark is then extended to accommodate linear models with covariates exhibiting group structures; an iterative algorithm is implemented to identify both important groups and important variables within the selected groups. Moreover, the method is further developed to select variables for generalized linear models, by taking advantage of a normal approximation to the likelihood function. Simulation studies are carried out to assess and compare the performance of the proposed approaches against other state-of-the-art methods for each of the above three scenarios. The numerical results consistently show that our Bayesian variable selection approaches tend to select exactly the true variables or groups, while producing prediction errors comparable to those of other methods. Beyond the numerical work, several real data sets are analyzed by these methods and the corresponding performances are compared further. The variable selection results of our approach are intuitively appealing and generally consistent with the existing literature.
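As the description notes, the Bayesian counterparts of ridge and Lasso are obtained by placing priors on the regression parameters and running a Gibbs sampler. The R sketch below implements that building block for the ridge case, assuming beta | sigma2 ~ N(0, (sigma2/lambda) I) and an inverse-gamma prior on sigma2; it illustrates the general mechanism only, not the thesis's benchmark method, and all function names and hyperparameter values are illustrative.

    # A minimal Gibbs sampler for the Bayesian counterpart of ridge regression:
    # beta | sigma2 ~ N(0, (sigma2 / lambda) I), sigma2 ~ inverse-gamma(a0, b0).
    # Illustrative sketch only; the benchmark-variable method is not shown here.
    bayes_ridge_gibbs <- function(y, X, lambda = 1, n_iter = 2000, a0 = 1, b0 = 1) {
      n <- nrow(X); p <- ncol(X)
      XtX <- crossprod(X); Xty <- crossprod(X, y)
      V  <- solve(XtX + lambda * diag(p))   # (X'X + lambda I)^{-1}
      mu <- V %*% Xty                       # conditional posterior mean of beta
      sigma2 <- var(y)                      # starting value for the error variance
      beta_draws <- matrix(0, n_iter, p)
      for (t in 1:n_iter) {
        # Draw beta | sigma2, y ~ N(mu, sigma2 * V)
        beta <- as.vector(mu + t(chol(sigma2 * V)) %*% rnorm(p))
        # Draw sigma2 | beta, y from its inverse-gamma full conditional
        rss <- sum((y - X %*% beta)^2)
        sigma2 <- 1 / rgamma(1, shape = a0 + (n + p) / 2,
                                rate  = b0 + (rss + lambda * sum(beta^2)) / 2)
        beta_draws[t, ] <- beta
      }
      beta_draws
    }

Posterior means such as colMeans(bayes_ridge_gibbs(y, X)) give ridge-type point estimates which, as noted above, are generally not sparse; the benchmark variable and modified BIC are introduced precisely to recover sparsity.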




Handbook of Bayesian Variable Selection


Book Description

Bayesian variable selection has experienced substantial developments over the past 30 years with the proliferation of large data sets. Identifying relevant variables to include in a model allows simpler interpretation, avoids overfitting and multicollinearity, and can provide insights into the mechanisms underlying an observed phenomenon. Variable selection is especially important when the number of potential predictors is substantially larger than the sample size and sparsity can reasonably be assumed. The Handbook of Bayesian Variable Selection provides a comprehensive review of theoretical, methodological, and computational aspects of Bayesian methods for variable selection. The topics covered include spike-and-slab priors, continuous shrinkage priors, Bayes factors, Bayesian model averaging, partitioning methods, as well as variable selection in decision trees and edge selection in graphical models. The handbook targets graduate students and established researchers who seek to understand the latest developments in the field, and it provides a valuable reference for anyone interested in applying existing methods or pursuing methodological extensions. Features: Provides a comprehensive review of methods and applications of Bayesian variable selection. Divided into four parts: Spike-and-Slab Priors; Continuous Shrinkage Priors; Extensions to Various Modeling; Other Approaches to Bayesian Variable Selection. Covers theoretical and methodological aspects, as well as worked-out examples with R code provided in the online supplement. Includes contributions by experts in the field. Supported by a website with code, data, and other supplementary material.
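To give a flavor of the first of the four parts, the short R sketch below draws coefficients from a basic spike-and-slab prior, under which each coefficient is exactly zero with probability 1 - theta (the spike) and Gaussian otherwise (the slab); the function name and hyperparameter values are illustrative and not taken from the handbook's supplement.

    # Toy draw from a spike-and-slab prior: each coefficient is zero with
    # probability 1 - theta and N(0, tau^2) otherwise. Illustrative values.
    r_spike_slab <- function(p, theta = 0.1, tau = 2) {
      gamma <- rbinom(p, 1, theta)   # inclusion indicators
      gamma * rnorm(p, mean = 0, sd = tau)
    }
    set.seed(1)
    beta <- r_spike_slab(p = 20)
    which(beta != 0)                 # the few "active" coefficients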










Assigning G in Zellner's G Prior for Bayesian Variable Selection


Book Description

There are numerous frequentist variable selection methods, such as stepwise regression and criteria like AIC and BIC; the latter two include a penalty term that discourages overfitting. Within the framework of Bayesian variable selection, a popular approach is to use the Bayes factor (Kass & Raftery 1995), which also has a natural built-in penalty term (Berger & Pericchi 2001). Zellner's g prior (Zellner 1986) is a common prior for the coefficients in the linear regression model because it yields analytic posterior solutions and hence fast computation. However, the choice of g is a problem that has attracted a lot of attention. Zellner (1986) pointed out that if g is unknown, a prior can be introduced and g can be integrated out; one such choice is the hyper-g prior proposed by Liang et al. (2008). Instead of proposing a prior for g, we assign a fixed value to g by controlling the Type I error of the test based on the Bayes factor. Since the Bayes factor serves as the test statistic for model selection, and every test comes with a Type I error, it is reasonable to restrict this error to lie below a benchmark value such as 0.1 or 0.05. This requirement automatically determines a value of g; a fixed g can thus be selected, avoiding the need to find a prior for g.
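To make the idea concrete, the R sketch below pairs the closed-form Bayes factor of a model against the intercept-only null under Zellner's g prior (in the form given by Liang et al. 2008) with a small null simulation that tunes g toward a target Type I error; the simulation settings and the "BF > 1" rejection rule are illustrative assumptions, not the procedure developed in the book.

    # Bayes factor of model M (p covariates, coefficient of determination R2)
    # against the intercept-only null under Zellner's g prior; see Liang et
    # al. (2008) for this closed form.
    bf_g_prior <- function(R2, n, p, g) {
      (1 + g)^((n - p - 1) / 2) / (1 + g * (1 - R2))^((n - 1) / 2)
    }
    # Simulate the null distribution of R2 once, then compute, for each g,
    # how often BF > 1 -- a stand-in for the Type I error to be controlled.
    set.seed(1)
    n <- 50; p <- 3
    R2_null <- replicate(5000, {
      X <- matrix(rnorm(n * p), n, p); y <- rnorm(n)   # data with no signal
      summary(lm(y ~ X))$r.squared
    })
    g_grid <- c(5, 10, 20, 50, 100)
    sapply(g_grid, function(g) mean(bf_g_prior(R2_null, n, p, g) > 1))
    # pick the g whose rejection rate is closest to the benchmark, e.g. 0.05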





Flexible Bayesian Regression Modelling


Book Description

Flexible Bayesian Regression Modelling is a step-by-step guide to the Bayesian revolution in regression modeling, for use in advanced econometric and statistical analysis where datasets are characterized by complexity, multiplicity, and large sample sizes, requiring considerable flexibility in modeling techniques. It reviews three forms of flexibility: methods that provide flexibility in the error distribution; methods that model non-central parts of the distribution (such as quantile regression); and models that allow the mean function to be flexible (such as spline models). Each chapter discusses the key aspects of fitting a regression model, and R programs accompany the methods. The book is particularly relevant to non-specialist practitioners with intermediate mathematical training seeking to apply Bayesian approaches in economics, biology, finance, engineering, and medicine. It introduces powerful new nonparametric Bayesian regression techniques to classically trained practitioners; focuses on approaches offering both superior power and methodological flexibility; supplements the text with instructive and relevant R programs; covers linear, nonlinear, and quantile regression techniques; and provides diverse disciplinary case studies for correlation and optimization problems drawn from Bayesian analysis 'in the wild'.
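As a small taste of the second form of flexibility, the R snippet below shows the "check" (pinball) loss that underlies quantile regression and verifies numerically that minimizing it recovers a sample quantile; this is a generic illustration, not code from the book's accompanying programs.

    # The check loss rho_tau(u) = u * (tau - 1{u < 0}) behind quantile
    # regression: its minimiser over a constant fit is the tau-th quantile.
    check_loss <- function(u, tau) u * (tau - (u < 0))
    set.seed(1)
    x <- rnorm(1000)
    fit <- optimize(function(q) sum(check_loss(x - q, tau = 0.9)),
                    interval = range(x))$minimum
    c(fit, quantile(x, 0.9))   # the two values should roughly agree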




Bayesian Variable Selection for High Dimensional Data Analysis


Book Description

In the practice of statistical modeling, it is often desirable to have an accurate predictive model. Modern data sets usually contain a large number of predictors, so parsimony is an especially important issue. Best-subset selection is a conventional method of variable selection, but with a large number of variables, a relatively small sample size, and severe collinearity among the variables, standard statistical methods for selecting relevant variables often face difficulties. Bayesian stochastic search variable selection has gained much empirical success in a variety of applications. This book therefore proposes a modified Bayesian stochastic search variable selection approach for variable selection and two-class or multi-class classification based on a (multinomial) probit regression model. We demonstrate the performance of the approach on many real data sets; the results show that our approach selects smaller numbers of relevant variables while achieving competitive classification accuracy.
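The probit model at the core of such approaches is usually fitted with the Albert and Chib (1993) data augmentation Gibbs sampler, which stochastic search variable selection then extends with variable inclusion indicators. The R sketch below implements that generic building block under a N(0, prior_var * I) prior on the coefficients; it is a hedged illustration of the standard sampler, not the modified algorithm proposed in the book.

    # Albert-Chib Gibbs sampler for Bayesian probit regression with a
    # N(0, prior_var * I) prior on beta; y is a 0/1 response vector.
    library(truncnorm)   # assumed available, for rtruncnorm()

    probit_gibbs <- function(y, X, n_iter = 2000, prior_var = 100) {
      n <- nrow(X); p <- ncol(X)
      V <- solve(crossprod(X) + diag(p) / prior_var)  # posterior covariance
      beta <- rep(0, p)
      draws <- matrix(0, n_iter, p)
      for (t in 1:n_iter) {
        eta <- as.vector(X %*% beta)
        # Latent z_i truncated to the half-line determined by y_i
        z <- ifelse(y == 1,
                    rtruncnorm(n, a = 0,    b = Inf, mean = eta, sd = 1),
                    rtruncnorm(n, a = -Inf, b = 0,   mean = eta, sd = 1))
        # beta | z ~ N(V X'z, V)
        mu <- V %*% crossprod(X, z)
        beta <- as.vector(mu + t(chol(V)) %*% rnorm(p))
        draws[t, ] <- beta
      }
      draws
    }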