Bayesian Variable Selection Using Lasso


Book Description

This thesis proposes to combine the Kuo and Mallick (1998) approach with the Bayesian Lasso approach of Park and Casella (2008) by placing a Laplace distribution on the conditional prior of the regression parameters given the indicator variables. Gibbs sampling is used to sample from the joint posterior distribution. We compare the new methods to existing Bayesian variable selection methods, such as those of Kuo and Mallick, George and McCulloch, and Park and Casella, and provide an overall qualitative assessment of mixing efficiency and separation. We also use an air pollution dataset to test the proposed methodology, with the goal of identifying the main factors controlling pollutant concentrations.
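
To make the combined prior structure concrete, here is a minimal sketch in JAGS via the rjags package. The hyperpriors, data, and variable names are illustrative assumptions, not the thesis's exact specification; the sketch only shows a Kuo-Mallick inclusion indicator paired with a Laplace conditional prior, sampled by MCMC.

```r
# Illustrative sketch (assumed setup, not the thesis's exact model):
# Kuo-Mallick inclusion indicators with a Laplace (double-exponential)
# conditional prior on the regression coefficients, fit with rjags.
library(rjags)

model_string <- "
model {
  for (j in 1:p) {
    gamma[j] ~ dbern(0.5)          # Kuo-Mallick inclusion indicator
    beta[j]  ~ ddexp(0, lambda)    # Laplace prior (Bayesian Lasso slab)
    gb[j]   <- gamma[j] * beta[j]  # effective coefficient
  }
  for (i in 1:n) {
    y[i] ~ dnorm(inprod(X[i, ], gb), tau)
  }
  lambda ~ dgamma(1, 1)            # assumed shrinkage hyperprior
  tau    ~ dgamma(0.01, 0.01)      # error precision
}
"

set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - 0.8 * X[, 2] + rnorm(n)   # only predictors 1 and 2 matter

jm   <- jags.model(textConnection(model_string),
                   data = list(y = y, X = X, n = n, p = p))
samp <- coda.samples(jm, c("gamma", "beta"), n.iter = 5000)
summary(samp)  # posterior means of gamma[j] are inclusion probabilities
```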




Handbook of Bayesian Variable Selection


Book Description

Bayesian variable selection has experienced substantial developments over the past 30 years with the proliferation of large data sets. Identifying relevant variables to include in a model allows simpler interpretation, avoids overfitting and multicollinearity, and can provide insights into the mechanisms underlying an observed phenomenon. Variable selection is especially important when the number of potential predictors is substantially larger than the sample size and sparsity can reasonably be assumed. The Handbook of Bayesian Variable Selection provides a comprehensive review of theoretical, methodological, and computational aspects of Bayesian methods for variable selection. The topics covered include spike-and-slab priors, continuous shrinkage priors, Bayes factors, Bayesian model averaging, partitioning methods, as well as variable selection in decision trees and edge selection in graphical models. The handbook targets graduate students and established researchers who seek to understand the latest developments in the field. It also provides a valuable reference for anyone interested in applying existing methods and/or pursuing methodological extensions.

Features:
- Provides a comprehensive review of methods and applications of Bayesian variable selection.
- Divided into four parts: Spike-and-Slab Priors; Continuous Shrinkage Priors; Extensions to Various Modeling; Other Approaches to Bayesian Variable Selection.
- Covers theoretical and methodological aspects, as well as worked-out examples with R code provided in the online supplement.
- Includes contributions by experts in the field.
- Supported by a website with code, data, and other supplementary material.
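
As a quick illustration of the two prior families the handbook is organized around, the sketch below plots a George-McCulloch-style spike-and-slab mixture of two normals against a continuous shrinkage (Laplace) prior; all parameter values are arbitrary choices for display.

```r
# Contrast of the two prior families (illustrative parameter values):
# a two-normal spike-and-slab mixture versus a Laplace shrinkage prior.
beta       <- seq(-4, 4, length.out = 400)
spike_slab <- 0.5 * dnorm(beta, 0, 0.05) + 0.5 * dnorm(beta, 0, 2)
laplace    <- 0.5 * exp(-abs(beta))  # Laplace(0, 1) density

plot(beta, spike_slab, type = "l", xlab = expression(beta),
     ylab = "prior density")
lines(beta, laplace, lty = 2)
legend("topright", c("spike-and-slab", "Laplace"), lty = 1:2)
```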




A Two-stage Bayesian Variable Selection Method with the Extension of Lasso for Geo-referenced Count Data


Book Description

Due to the complex nature of geo-referenced data, multicollinearity among the risk factors in public health spatial studies is a commonly encountered issue; it lowers parameter estimation accuracy because it inflates the variance in the regression analysis. To address this issue, we proposed a two-stage variable selection method that extends the least absolute shrinkage and selection operator (Lasso) to the Bayesian spatial setting, to investigate the impact of risk factors on health outcomes. Specifically, in stage I, we performed variable selection using the Bayesian Lasso and several other variable selection approaches. Then, in stage II, we performed model selection with only the variables selected in stage I and again compared the methods. To evaluate the performance of the two-stage variable selection methods, we conducted a simulation study with different distributions for the risk factors, using geo-referenced count data as the outcome and Michigan as the study region. We considered cases in which all candidate risk factors are independently normally distributed or follow a multivariate normal distribution with different correlation levels. Two other Bayesian variable selection methods, the binary indicator and the combination of the binary indicator and the Lasso, are considered and compared as alternatives. The simulation results indicate that the proposed two-stage Bayesian Lasso variable selection method performs best in both the independent and dependent cases considered. Compared with the one-stage approach and the two alternative methods, the two-stage Bayesian Lasso approach provides the highest estimation accuracy in all scenarios considered.
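
The two-stage pattern itself is easy to sketch. The fragment below is a generic Gaussian illustration, not the paper's spatial count model: stage I screens with the Bayesian Lasso (here via the monomvn package, an assumed tool choice) and stage II refits using only the retained variables.

```r
# Generic two-stage sketch (assumed tooling; not the paper's spatial
# count model): Bayesian Lasso screening, then refit on the survivors.
library(monomvn)

set.seed(1)
n <- 100; p <- 8
X <- matrix(rnorm(n * p), n, p)
y <- 1.5 * X[, 1] - X[, 3] + rnorm(n)

# Stage I: keep variables whose 95% credible interval excludes zero.
fit1 <- blasso(X, y, T = 2000)
ci   <- apply(fit1$beta, 2, quantile, probs = c(0.025, 0.975))
keep <- which(ci[1, ] > 0 | ci[2, ] < 0)

# Stage II: refit using only the variables selected in stage I.
fit2 <- lm(y ~ X[, keep, drop = FALSE])
summary(fit2)
```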




Bayesian Variable Selection and Estimation


Book Description

The paper considers the classical Bayesian variable selection problem and an important subproblem in which grouping information about the predictors is available. We propose the Half Thresholding (HT) estimator for simultaneous variable selection and estimation with shrinkage priors. Under an orthogonal design matrix, the variable selection consistency and asymptotic distribution of HT estimators are investigated, and the oracle property is established with Three Parameter Beta Mixture of Normals (TPBN) priors. We then revisit the Bayesian group lasso and use spike-and-slab priors for variable selection at the group level. In the process, the connection of our model with penalized regression is demonstrated, and the role of the posterior median in thresholding is pointed out. We show that the posterior median estimator has the oracle property for group variable selection and estimation under an orthogonal design, while the group lasso has a suboptimal asymptotic estimation rate when variable selection consistency is achieved. Next, we consider the Bayesian sparse group lasso, again with spike-and-slab priors, to select variables both at the group level and within groups, and develop the necessary algorithm for its implementation. We demonstrate via simulation that the posterior median estimator of our spike-and-slab models has excellent performance for both variable selection and estimation.
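
The thresholding role of the posterior median is easy to see in a toy computation: under a point-mass spike-and-slab posterior, a coefficient's draws are exactly zero whenever the spike is selected, so the posterior median is exactly zero once the posterior inclusion probability drops below one half. The numbers below are arbitrary illustrations.

```r
# Toy illustration of posterior-median thresholding under a point-mass
# spike-and-slab posterior (all values arbitrary).
set.seed(1)
draws <- function(p_incl, m = 10000) {
  gamma <- rbinom(m, 1, p_incl)  # inclusion indicator draws
  gamma * rnorm(m, 1, 0.3)       # exact zeros in the spike, slab otherwise
}
median(draws(0.3))  # exactly 0: posterior median deselects the variable
median(draws(0.8))  # nonzero: posterior median keeps the variable
```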




Monte Carlo Simulation and Resampling Methods for Social Science


Book Description

Taking the topics of a quantitative methodology course and illustrating them through Monte Carlo simulation, this book examines abstract principles such as bias, efficiency, and measures of uncertainty in an intuitive, visual way. Instead of thinking in the abstract about what would happen to a particular estimator "in repeated samples," the book uses simulation to actually create those repeated samples and summarize the results. The book includes basic examples appropriate for readers learning the material for the first time, as well as more advanced examples that researchers might use to evaluate an estimator in an actual research project. It also covers a wide range of topics related to Monte Carlo simulation, such as resampling methods, simulations of substantive theory, simulation of quantities of interest (QI) from model results, and cross-validation. Complete R code for all examples is provided so readers can replicate every analysis presented.
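
The book's central device, creating the "repeated samples" rather than imagining them, fits in a few lines of R. A minimal example of our own in that spirit: simulate many samples, fit the same estimator to each, and summarize its sampling distribution.

```r
# Monte Carlo check of the OLS slope estimator: simulate 1,000 repeated
# samples, estimate the slope in each, then summarize bias and spread.
set.seed(1)
reps <- 1000; n <- 100
slopes <- numeric(reps)
for (r in seq_len(reps)) {
  x <- rnorm(n)
  y <- 2 + 0.5 * x + rnorm(n)   # true slope is 0.5
  slopes[r] <- coef(lm(y ~ x))[2]
}
mean(slopes) - 0.5  # estimated bias (should be near zero)
sd(slopes)          # Monte Carlo estimate of the standard error
```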




Statistical Learning with Sparsity


Book Description

Discover new methods for dealing with high-dimensional data. A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data.
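
For readers who want to see sparsity in action, a minimal lasso fit with the glmnet package (a standard R implementation, used here as an illustrative choice) shows most coefficients shrunk exactly to zero.

```r
# Minimal lasso example: with a sparse truth, most fitted coefficients
# are exactly zero (glmnet chosen as an illustrative implementation).
library(glmnet)

set.seed(1)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 1] - X[, 2] + rnorm(n)

cvfit <- cv.glmnet(X, y)       # penalty chosen by cross-validation
coef(cvfit, s = "lambda.min")  # sparse coefficient vector
```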




Flexible Imputation of Missing Data, Second Edition


Book Description

Missing data pose challenges to real-life data analysis. Simple ad hoc fixes, like deletion or mean imputation, only work under highly restrictive conditions that are often not met in practice. Multiple imputation replaces each missing value by multiple plausible values; the variability between these replacements reflects our ignorance of the true (but missing) value. Each completed data set is then analyzed by standard methods, and the results are pooled to obtain unbiased estimates with correct confidence intervals. Multiple imputation is a general approach that also inspires novel solutions to old problems by reformulating the task at hand as a missing-data problem. This is the second edition of a popular book on multiple imputation, focused on explaining the application of methods through detailed worked examples using the MICE package as developed by the author. This new edition incorporates recent developments in this fast-moving field. The class-tested book avoids mathematical and technical detail as much as possible: formulas are accompanied by verbal statements that explain them in accessible terms. The book sharpens the reader's intuition on how to think about missing data and provides all the tools needed to execute a well-grounded quantitative analysis in the presence of missing data.
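
The workflow the book teaches, impute several times, analyze each completed data set, and pool with Rubin's rules, takes a few lines with the author's mice package; the nhanes data used below ships with the package.

```r
# Canonical multiple-imputation workflow with mice:
# impute m = 5 times, analyze each completed data set, pool the results.
library(mice)

imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)
fit <- with(imp, lm(chl ~ bmi + age))  # same analysis on each data set
summary(pool(fit))                     # Rubin's rules: pooled estimates
```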




Bayesian Variable Selection with Spike-and-slab Priors


Book Description

A major focus of intensive methodological research in recent times has been knowledge extraction from high-dimensional datasets made available by advances in research technologies. Coupled with the growing popularity of Bayesian methods in statistical analysis, a range of new techniques has evolved that allows innovative model-building and inference in high-dimensional settings, an important one among these being Bayesian variable selection (BVS). The broad goal of this thesis is to explore different BVS methods and demonstrate their application in high-dimensional psychological data analysis. In particular, the focus is on a class of sparsity-enforcing priors called 'spike-and-slab' priors, mixture priors on regression coefficients with density functions that are peaked at zero (the 'spike') and also place large probability mass over a wide range of non-zero values (the 'slab'). It is demonstrated that BVS with spike-and-slab priors achieves a reasonable degree of dimensionality reduction when applied to a psychiatric dataset in a logistic regression setup. BVS performance is also compared to that of the LASSO (least absolute shrinkage and selection operator), a popular machine-learning technique, as reported in Ahn et al. (2016). The findings indicate that BVS with a spike-and-slab prior provides a competitive alternative to machine-learning methods, with the additional advantages of ease of interpretation and the potential to handle more complex models. In conclusion, this thesis adds a new cutting-edge technique to the lab's tool-shed and helps introduce Bayesian variable selection to researchers in cognitive psychology, where it remains relatively unexplored as a dimensionality-reduction tool.
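
One readily available route to spike-and-slab logistic regression in R is the BoomSpikeSlab package; the sketch below is an assumed tool choice on simulated data, not the thesis's own implementation or dataset.

```r
# Spike-and-slab logistic regression sketch (assumed tooling:
# BoomSpikeSlab; simulated data, not the psychiatric dataset).
library(BoomSpikeSlab)

set.seed(1)
n <- 200; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(1.2 * X[, 1] - 0.9 * X[, 2]))
df <- data.frame(y = y, X)

fit <- logit.spike(y ~ ., data = df, niter = 5000)
summary(fit)  # includes posterior inclusion probabilities per predictor
```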




Bayesian Variable Selection Via a Benchmark


Book Description

With the increasing prevalence of high-dimensional data over the past decades, variable selection through likelihood penalization remains a popular yet challenging research area in statistics. Ridge and Lasso, two of the most popular penalized regression methods, serve as the foundation of regularization techniques and have motivated several extensions to accommodate various circumstances, mostly within frequentist models. These two regularization problems can also be solved by their Bayesian counterparts, by placing appropriate priors on the regression parameters and then applying Gibbs sampling. Compared to the frequentist version, the Bayesian framework enables easier interpretation and more straightforward inference on the parameters, based on the posterior distributional results. In general, however, Bayesian approaches do not provide sparse estimates for the regression coefficients. In this thesis, an innovative Bayesian variable selection method via a benchmark variable, in conjunction with a modified BIC, is proposed under the framework of linear regression models as a first attempt to promote both model sparsity and accuracy. The motivation for introducing such a benchmark is discussed, and the statistical properties regarding its role in the model are demonstrated. In short, it serves as a criterion for measuring the importance of each variable based on the posterior inference of the corresponding coefficients, and only the most important variables, those providing the minimal modified BIC value, are included. The Bayesian approach via a benchmark is then extended to accommodate linear models with covariates exhibiting group structure; an iterative algorithm identifies both important groups and important variables within the selected groups. Moreover, the method is further developed to select variables in generalized linear models by taking advantage of a normal approximation to the likelihood function. Simulation studies assess and compare the performance of the proposed approaches against other state-of-the-art methods in each of the above three scenarios. The numerical results consistently show that our Bayesian variable selection approaches tend to select exactly the true variables or groups while producing prediction errors comparable to other methods. Beyond the numerical work, several real data sets are analyzed with these methods and the corresponding performances compared; the variable selection results of our approach are intuitively appealing and generally consistent with the existing literature.
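
The benchmark variable and the modified BIC are the thesis's own constructions and are not reproduced here. As a generic stand-in, the sketch below shows the broader pattern of ranking variables by an importance score and keeping the nested subset that minimizes an information criterion (ordinary BIC and a t-statistic ranking are assumptions, not the thesis's measures).

```r
# Generic 'rank, then minimize an information criterion' sketch
# (ordinary BIC and t-statistic ranking as stand-ins for the thesis's
# benchmark-based importance measure and modified BIC).
set.seed(1)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 1] - X[, 2] + rnorm(n)

full  <- lm(y ~ X)
score <- abs(coef(summary(full))[-1, "t value"])  # importance ranking
ord   <- order(score, decreasing = TRUE)

bics <- sapply(seq_len(p), function(k)
  BIC(lm(y ~ X[, ord[1:k], drop = FALSE])))
keep <- ord[seq_len(which.min(bics))]  # nested subset with minimal BIC
keep
```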