Bayesian Variable Selection and Functional Data Analysis


Book Description

High-dimensional statistics is one of the most studied topics in the field of statistics. The most interesting problem to arise in the last 15 years is variable selection or subset selection. Variable selection is a strong statistical tool that can be explored in functional data analysis. In the first part of this thesis, we implement a Bayesian variable selection method for automatic knot selection. We propose a spike-and-slab prior on knots and formulate a conjugate stochastic search variable selection for significant knots. The computation is substantially faster than existing knot selection methods, as we use Metropolis-Hastings algorithms and a Gibbs sampler for estimation. This work focuses on a single nonlinear covariate, modeled as regression splines. In the next stage, we study Bayesian variable selection in additive models with high-dimensional predictors. The selection of nonlinear functions in models is highly important in recent research, and the Bayesian method of selection has more advantages than contemporary frequentist methods. Chapter 2 examines Bayesian sparse group lasso theory based on spike-and-slab priors to determine its applicability for variable selection and function estimation in nonparametric additive models.The primary objective of Chapter 3 is to build a classification method using longitudinal volumetric magnetic resonance imaging (MRI) data from five regions of interest (ROIs). A functional data analysis method is used to handle the longitudinal measurement of ROIs, and the functional coefficients are later used in the classification models. We propose a P\\'olya-gamma augmentation method to classify normal controls and diseased patients based on functional MRI measurements. We obtain fast-posterior sampling by avoiding the slow and complicated Metropolis-Hastings algorithm. Our main motivation is to determine the important ROIs that have the highest separating power to classify our dichotomous response. We compare the sensitivity, specificity, and accuracy of the classification based on single ROIs and with various combinations of them. We obtain a sensitivity of over 85% and a specificity of around 90% for most of the combinations.Next, we work with Bayesian classification and selection methodology. The main goal of Chapter 4 is to employ longitudinal trajectories in a significant number of sub-regional brain volumetric MRI data as statistical predictors for Alzheimer's disease (AD) classification. We use logistic regression in a Bayesian framework that includes many functional predictors. The direct sampling of regression coefficients from the Bayesian logistic model is difficult due to its complicated likelihood function. In high-dimensional scenarios, the selection of predictors is paramount with the introduction of either spike-and-slab priors, non-local priors, or Horseshoe priors. We seek to avoid the complicated Metropolis-Hastings approach and to develop an easily implementable Gibbs sampler. In addition, the Bayesian estimation provides proper estimates of the model parameters, which are also useful for building inference. Another advantage of working with logistic regression is that it calculates the log of odds of relative risk for AD compared to normal control based on the selected longitudinal predictors, rather than simply classifying patients based on cross-sectional estimates. Ultimately, however, we combine approaches and use a probability threshold to classify individual patients. We employ 49 functional predictors consisting of volumetric estimates of brain sub-regions, chosen for their established clinical significance. Moreover, the use of spike-and-slab priors ensures that many redundant predictors are dropped from the model.Finally, we present a new approach of Bayesian model-based clustering for spatiotemporal data in chapter 5 . A simple linear mixed model (LME) derived from a functional model is used to model spatiotemporal cerebral white matter data extracted from healthy aging individuals. LME provides us with prior information for spatial covariance structure and brain segmentation based on white matter intensity. This motivates us to build stochastic model-based clustering to group voxels considering their longitudinal and location information. The cluster-specific random effect causes correlation among repeated measures. The problem of finding partitions is dealt with by imposing prior structure on cluster partitions in order to derive a stochastic objective function.




Handbook of Bayesian Variable Selection


Book Description

Bayesian variable selection has experienced substantial developments over the past 30 years with the proliferation of large data sets. Identifying relevant variables to include in a model allows simpler interpretation, avoids overfitting and multicollinearity, and can provide insights into the mechanisms underlying an observed phenomenon. Variable selection is especially important when the number of potential predictors is substantially larger than the sample size and sparsity can reasonably be assumed. The Handbook of Bayesian Variable Selection provides a comprehensive review of theoretical, methodological and computational aspects of Bayesian methods for variable selection. The topics covered include spike-and-slab priors, continuous shrinkage priors, Bayes factors, Bayesian model averaging, partitioning methods, as well as variable selection in decision trees and edge selection in graphical models. The handbook targets graduate students and established researchers who seek to understand the latest developments in the field. It also provides a valuable reference for all interested in applying existing methods and/or pursuing methodological extensions. Features: Provides a comprehensive review of methods and applications of Bayesian variable selection. Divided into four parts: Spike-and-Slab Priors; Continuous Shrinkage Priors; Extensions to various Modeling; Other Approaches to Bayesian Variable Selection. Covers theoretical and methodological aspects, as well as worked out examples with R code provided in the online supplement. Includes contributions by experts in the field. Supported by a website with code, data, and other supplementary material




Gaussian Process Regression Analysis for Functional Data


Book Description

Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables. Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dimensional data and variable selection. The remainder of the text explores advanced topics of functional regression analysis, including novel nonparametric statistical methods for curve prediction, curve clustering, functional ANOVA, and functional regression analysis of batch data, repeated curves, and non-Gaussian data. Many flexible models based on Gaussian processes provide efficient ways of model learning, interpreting model structure, and carrying out inference, particularly when dealing with large dimensional functional data. This book shows how to use these Gaussian process regression models in the analysis of functional data. Some MATLAB® and C codes are available on the first author’s website.




Functional Data Analysis with R and MATLAB


Book Description

The book provides an application-oriented overview of functional analysis, with extended and accessible presentations of key concepts such as spline basis functions, data smoothing, curve registration, functional linear models and dynamic systems Functional data analysis is put to work in a wide a range of applications, so that new problems are likely to find close analogues in this book The code in R and Matlab in the book has been designed to permit easy modification to adapt to new data structures and research problems




Bayesian Variable Selection for High Dimensional Data Analysis


Book Description

In the practice of statistical modeling, it is often desirable to have an accurate predictive model. Modern data sets usually have a large number of predictors.Hence parsimony is especially an important issue. Best-subset selection is a conventional method of variable selection. Due to the large number of variables with relatively small sample size and severe collinearity among the variables, standard statistical methods for selecting relevant variables often face difficulties. Bayesian stochastic search variable selection has gained much empirical success in a variety of applications. This book, therefore, proposes a modified Bayesian stochastic variable selection approach for variable selection and two/multi-class classification based on a (multinomial) probit regression model.We demonstrate the performance of the approach via many real data. The results show that our approach selects smaller numbers of relevant variables and obtains competitive classification accuracy based on obtained results.




Bayesian Data Analysis, Third Edition


Book Description

Now in its third edition, this classic book is widely considered the leading text on Bayesian methods, lauded for its accessible, practical approach to analyzing data and solving research problems. Bayesian Data Analysis, Third Edition continues to take an applied approach to analysis using up-to-date Bayesian methods. The authors—all leaders in the statistics community—introduce basic concepts from a data-analytic perspective before presenting advanced methods. Throughout the text, numerous worked examples drawn from real applications and research emphasize the use of Bayesian inference in practice. New to the Third Edition Four new chapters on nonparametric modeling Coverage of weakly informative priors and boundary-avoiding priors Updated discussion of cross-validation and predictive information criteria Improved convergence monitoring and effective sample size calculations for iterative simulation Presentations of Hamiltonian Monte Carlo, variational Bayes, and expectation propagation New and revised software code The book can be used in three different ways. For undergraduate students, it introduces Bayesian inference starting from first principles. For graduate students, the text presents effective current approaches to Bayesian modeling and computation in statistics and related fields. For researchers, it provides an assortment of Bayesian methods in applied statistics. Additional materials, including data sets used in the examples, solutions to selected exercises, and software instructions, are available on the book’s web page.




Bayesian Variable Selection with Spike-and-slab Priors


Book Description

A major focus of intensive methodological research in recent times has been on knowledge extraction from high-dimensional datasets made available by advances in research technologies. Coupled with the growing popularity of Bayesian methods in statistical analysis, a range of new techniques have evolved that allow innovative model-building and inference in high-dimensional settings – an important one among these being Bayesian variable selection (BVS). The broad goal of this thesis is to explore different BVS methods and demonstrate their application in high-dimensional psychological data analysis. In particular, the focus will be on a class of sparsity-enforcing priors called 'spike-and-slab' priors which are mixture priors on regression coefficients with density functions that are peaked at zero (the 'spike') and also have large probability mass for a wide range of non-zero values (the 'slab'). It is demonstrated that BVS with spike-and-slab priors achieved a reasonable degree of dimensionality-reduction when applied to a psychiatric dataset in a logistic regression setup. BVS performance was also compared to that of LASSO (least absolute shrinkage and selection operator), a popular machine-learning technique, as reported in Ahn et al.(2016). The findings indicate that BVS with a spike-and-slab prior provides a competitive alternative to machine-learning methods, with the additional advantages of ease of interpretation and potential to handle more complex models. In conclusion, this thesis serves to add a new cutting-edge technique to the lab’s tool-shed and helps introduce Bayesian variable-selection to researchers in Cognitive Psychology where it still remains relatively unexplored as a dimensionality-reduction tool.




Bayesian Variable Selection in Linear and Non-linear Models


Book Description

Appropriate feature selection is a fundamental problem in the field of statistics. Models with large number of features or variables require special attention due to the computational complexity of the huge model space. This is generally known as the variable or model selection problem in the field of statistics whereas in machine learning and other literature, this is also known as feature selection, attribute selection or variable subset selection. The method of variable selection is the process of efficiently selecting an optimal subset of relevant variables for use in model construction. The central assumption in this methodology is that the data contain many redundant variable; those which do not provide any significant additional information than the optimally selected subset of variable. Variable selection is widely used in all application areas of data analytics, ranging from optimal selection of genes in large scale micro-array studies, to optimal selection of biomarkers for targeted therapy in cancer genomics to selection of optimal predictors in business analytics. Under the Bayesian approach, the formal way to perform this optimal selection is to select the model with highest posterior probability. Using this fact the problem may be thought as an optimization problem over the model space where the objective function is the posterior probability of model and the maximization is taken place with respect to the models. We propose an efficient method for implementing this optimization and we illustrate its feasibility in high dimensional problems. By means of various simulation studies, this new approach has been shown to be efficient and to outperform other statistical feature selection methods methods namely median probability model and sampling method with frequency based estimators. Theoretical justifications are provided. Applications to logistic regression and survival regression are discussed.




Multivariable Model - Building


Book Description

Multivariable regression models are of fundamental importance in all areas of science in which empirical data must be analyzed. This book proposes a systematic approach to building such models based on standard principles of statistical modeling. The main emphasis is on the fractional polynomial method for modeling the influence of continuous variables in a multivariable context, a topic for which there is no standard approach. Existing options range from very simple step functions to highly complex adaptive methods such as multivariate splines with many knots and penalisation. This new approach, developed in part by the authors over the last decade, is a compromise which promotes interpretable, comprehensible and transportable models.