Constrained Principal Component Analysis and Related Techniques


Book Description

In multivariate data analysis, regression techniques predict one set of variables from another, while principal component analysis (PCA) finds a subspace of minimal dimensionality that captures the largest variability in the data. How can regression analysis and PCA be combined in a beneficial way? Why and when is it a good idea to combine them?

Constrained Principal Component Analysis and Related Techniques shows how constrained principal component analysis (CPCA) offers a unified framework for regression analysis and PCA. The book begins with four concrete examples of CPCA that provide a basic understanding of the technique and its applications. It gives a detailed account of projection and the singular value decomposition. The author then describes the basic data requirements, models, and analytical tools for CPCA and their immediate extensions. He also introduces techniques that are special cases of, or closely related to, CPCA and discusses several topics relevant to practical uses of CPCA. The book concludes with a technique that imposes different constraints on different dimensions, along with its analytical extensions.

Features:
- Presents an in-depth, unified theoretical treatment of CPCA
- Contains implementation details and many real application examples
- Offers material for methodologically oriented readers interested in developing statistical techniques of their own
- Keeps the use of complicated iterative methods to a minimum
- Gives an overview of computer software for CPCA in the appendix
- Provides MATLAB® programs and data on the author's website
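
To give a flavor of the decomposition the book builds on, here is a minimal sketch of the two-step idea behind CPCA: external analysis by projection onto row constraints, followed by internal analysis by SVD. The data, the constraint matrix, and the dimensions are all made up for illustration, and the sketch omits the column-side constraints that the full method also allows:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 6))   # data: 100 cases, 6 variables (illustrative)
Z -= Z.mean(axis=0)                 # column-center the data
G = rng.standard_normal((100, 2))   # external row constraints, e.g., a design matrix

# External analysis: split Z into the part explained by G and the residual.
P_G = G @ np.linalg.pinv(G)         # orthogonal projector onto col(G)
Z_explained = P_G @ Z
Z_residual = Z - Z_explained

# Internal analysis: PCA, via the SVD, of the constrained part.
U, s, Vt = np.linalg.svd(Z_explained, full_matrices=False)
k = 2
scores = U[:, :k] * s[:k]           # component scores
loadings = Vt[:k].T                 # component loadings

# Share of the total sum of squares of Z captured by the first k constrained components.
print((s[:k] ** 2).sum() / (Z ** 2).sum())
```

The residual part Z_residual can be analyzed by SVD in the same way, which is how the approach separates variation that is and is not accounted for by the external information.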




Principal Component Analysis


Book Description

Principal component analysis is probably the oldest and best known of the techniques of multivariate analysis. It was first introduced by Pearson (1901), and developed independently by Hotelling (1933). Like many multivariate methods, it was not widely used until the advent of electronic computers, but it is now well entrenched in virtually every statistical computer package. The central idea of principal component analysis is to reduce the dimensionality of a data set in which there are a large number of interrelated variables, while retaining as much as possible of the variation present in the data set. This reduction is achieved by transforming to a new set of variables, the principal components, which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables. Computation of the principal components reduces to the solution of an eigenvalue-eigenvector problem for a positive-semidefinite symmetric matrix. Thus, the definition and computation of principal components are straightforward but, as will be seen, this apparently simple technique has a wide variety of different applications, as well as a number of different derivations. Any feelings that principal component analysis is a narrow subject should soon be dispelled by the present book; indeed some quite broad topics which are related to principal component analysis receive no more than a brief mention in the final two chapters.
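
The eigenvalue formulation described above is easy to make concrete. Below is a minimal NumPy sketch (not from the book): center the variables, form the sample covariance matrix, and take its eigendecomposition; the eigenvectors define the components and the eigenvalues order them by variance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))      # 200 observations, 5 variables
X[:, 1] += 0.8 * X[:, 0]               # induce some correlation

Xc = X - X.mean(axis=0)                # center the variables
S = (Xc.T @ Xc) / (len(X) - 1)         # sample covariance matrix (symmetric PSD)

eigvals, eigvecs = np.linalg.eigh(S)   # eigh is for symmetric matrices
order = np.argsort(eigvals)[::-1]      # sort eigenvalues in decreasing order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                  # principal component scores (uncorrelated)
explained = eigvals / eigvals.sum()    # proportion of variance per component
print(np.round(explained, 3))
```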




Generalized Principal Component Analysis


Book Description

This book provides a comprehensive introduction to the latest advances in the mathematical theory and computational tools for modeling high-dimensional data drawn from one or multiple low-dimensional subspaces (or manifolds) and potentially corrupted by noise, gross errors, or outliers. This challenging task requires the development of new algebraic, geometric, statistical, and computational methods for efficient and robust estimation and segmentation of one or multiple subspaces. The book also presents interesting real-world applications of these new methods in image processing, image and video segmentation, face recognition and clustering, and hybrid system identification. This book is intended to serve as a textbook for graduate students and beginning researchers in data science, machine learning, computer vision, image and signal processing, and systems theory. It contains ample illustrations, examples, and exercises and is made largely self-contained by three appendices that survey basic concepts and principles from statistics, optimization, and algebraic geometry used in the book. René Vidal is a Professor of Biomedical Engineering and Director of the Vision Dynamics and Learning Lab at The Johns Hopkins University. Yi Ma is Executive Dean and Professor at the School of Information Science and Technology at ShanghaiTech University. S. Shankar Sastry is Dean of the College of Engineering, Professor of Electrical Engineering and Computer Science, and Professor of Bioengineering at the University of California, Berkeley.
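
As a hint of what "modeling data drawn from multiple subspaces" involves, here is a minimal sketch of the K-subspaces iteration, a simple baseline for subspace segmentation rather than the algebraic GPCA algorithm itself; the two-line example data and all names are illustrative:

```python
import numpy as np

def k_subspaces(X, n_subspaces=2, dim=1, n_iter=20, seed=0):
    """Alternate between assigning points to subspaces and refitting them.

    A baseline heuristic for subspace segmentation; X has one point per row.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_subspaces, size=len(X))
    bases = [None] * n_subspaces
    for _ in range(n_iter):
        # Fit a dim-dimensional subspace to each cluster via SVD.
        for j in range(n_subspaces):
            pts = X[labels == j]
            if len(pts) >= dim:
                _, _, Vt = np.linalg.svd(pts, full_matrices=False)
                bases[j] = Vt[:dim].T          # orthonormal basis (columns)
        # Reassign each point to the subspace with the smallest residual.
        resid = np.stack([np.linalg.norm(X - X @ B @ B.T, axis=1) for B in bases], axis=1)
        labels = resid.argmin(axis=1)
    return labels, bases

# Two noisy lines through the origin in the plane.
rng = np.random.default_rng(1)
t = rng.standard_normal(100)
X = np.vstack([np.outer(t[:50], [1.0, 0.5]), np.outer(t[50:], [-0.3, 1.0])])
X += 0.01 * rng.standard_normal(X.shape)
labels, _ = k_subspaces(X, n_subspaces=2, dim=1)
print(np.bincount(labels))                     # roughly 50 points per subspace
```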




Matrix-Based Introduction to Multivariate Data Analysis


Book Description

This is the first textbook that allows readers who may be unfamiliar with matrices to understand a variety of multivariate analysis procedures in matrix form. By explaining which models underlie particular procedures and what objective function is optimized to fit the model to the data, it enables readers to rapidly comprehend multivariate data analysis. Arranged so that readers can intuitively grasp the purposes for which multivariate analysis procedures are used, the book also offers clear explanations of those purposes, with numerical examples preceding the mathematical descriptions. Supporting modern matrix formulations by highlighting the singular value decomposition among the theorems of matrix algebra, this book is useful for undergraduate students who have already taken introductory statistics, as well as for graduate students and researchers who are not familiar with matrix-intensive formulations of multivariate data analysis. The book begins by explaining fundamental matrix operations and the matrix expressions of elementary statistics. It then offers an introduction to popular multivariate procedures, with each chapter featuring increasingly advanced levels of matrix algebra. Further, the book includes six chapters on advanced procedures, covering advanced matrix operations and recently proposed multivariate procedures such as sparse estimation, together with a clear explication of the differences between principal component and factor analysis solutions. In a nutshell, this book allows readers to gain an understanding of the latest developments in multivariate data science.
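
Since the singular value decomposition is singled out as the workhorse theorem, a short illustration may help: by the Eckart-Young theorem, truncating the SVD gives the best low-rank least-squares approximation of a matrix. A generic NumPy sketch, not code from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]       # best rank-k approximation of A

# Eckart-Young: the squared error equals the sum of the discarded squared singular values.
err = np.linalg.norm(A - A_k, 'fro') ** 2
print(err, (s[k:] ** 2).sum())                 # the two numbers agree
```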




The Multiple Facets of Partial Least Squares and Related Methods


Book Description

This volume presents state-of-the-art theory, new developments, and important applications of partial least squares (PLS) methods. The text begins with invited communications from current leaders in the field, who cover the history of PLS, an overview of methodological issues, and recent advances in regression and multi-block approaches. The rest of the volume comprises selected, reviewed contributions from the 8th International Conference on Partial Least Squares and Related Methods, held in Paris, France, on 26-28 May 2014. They are organized in four coherent sections: 1) new developments in genomics and brain imaging, 2) new and alternative methods for multi-table and path analysis, 3) advances in partial least squares regression (PLSR), and 4) breakthroughs and applications in partial least squares path modeling (PLS-PM). PLS methods are highly versatile and are now used in areas as diverse as engineering, life sciences, sociology, psychology, brain imaging, genomics, and business, among both academics and practitioners. The selected chapters here highlight this diversity with applied examples as well as the most recent advances.
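
For readers meeting the acronym for the first time, the following is a minimal sketch of single-response PLS regression via the classical NIPALS iteration, in NumPy; it is illustrative only and far simpler than the methods developed in this volume:

```python
import numpy as np

def pls1(X, y, n_components):
    """Single-response PLS regression (NIPALS); X and y are assumed centered."""
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)                 # weight vector
        t = Xk @ w                             # score vector
        p = Xk.T @ t / (t @ t)                 # X loading
        c = (yk @ t) / (t @ t)                 # y loading
        Xk = Xk - np.outer(t, p)               # deflate X and y
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    # Regression coefficients expressed in terms of the original X.
    return W @ np.linalg.solve(P.T @ W, q)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
beta_true = np.zeros(10); beta_true[:3] = [1.0, -2.0, 0.5]
y = X @ beta_true + 0.1 * rng.standard_normal(50)
X -= X.mean(axis=0); y -= y.mean()
print(np.round(pls1(X, y, n_components=3), 2))  # close to beta_true
```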




Optimal Quantification and Symmetry


Book Description

This book offers a unique new look at the familiar quantification theory from the point of view of mathematical symmetry and spatial symmetry. Symmetry exists in many aspects of our life: in the arts and biology, for instance, as an ingredient of beauty and equilibrium, and, more importantly for data analysis, as an indispensable representation of functional optimality. This unique focus on symmetry clarifies the objectives of quantification theory and the demarcation of quantification space, something that has not previously caught the attention of researchers. Mathematical symmetry is well known, as can be inferred from Hirschfeld's simultaneous linear regressions, but spatial symmetry has not been discussed before, except for what one may infer from Nishisato's dual scaling. The focus on symmetry clarifies the demarcation of quantification analysis and makes it easier to understand such perennial problems as joint graphical display in quantification theory. The new framework will help advance the frontier of further developments in quantification theory. Many numerical examples are included to clarify the details of quantification theory, with a focus on symmetry as its operational principle. In this way, the book is useful not only for graduate students but also for researchers in diverse areas of data analysis.
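
To make the quantification idea concrete: for a contingency table, optimal quantification in the style of dual scaling (equivalently, correspondence analysis) reduces to an SVD of the table's standardized residuals, and the leading singular value is the maximal correlation between row and column quantifications, which is exactly the symmetry between the two sides. A generic sketch with a made-up table:

```python
import numpy as np

# A hypothetical 3x4 contingency table (rows: groups, columns: response categories).
N = np.array([[20, 10,  5,  5],
              [ 8, 15, 12,  5],
              [ 4,  6, 14, 16]], dtype=float)

P = N / N.sum()                                     # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)                 # row and column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals

U, s, Vt = np.linalg.svd(S, full_matrices=False)
row_scores = U[:, 0] / np.sqrt(r)                   # optimal row quantifications (1st solution)
col_scores = Vt[0] / np.sqrt(c)                     # optimal column quantifications
print("maximal row-column correlation:", round(s[0], 3))
print(np.round(row_scores, 2), np.round(col_scores, 2))
```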




Robust Cluster Analysis and Variable Selection


Book Description

Clustering remains a vibrant area of research in statistics. Although there are many books on this topic, relatively few are well founded in the theoretical aspects. In Robust Cluster Analysis and Variable Selection, Gunter Ritter presents an overview of the theory and applications of probabilistic clustering and variable selection, synthesizing the key research results of the last 50 years. The author focuses on the robust clustering methods he found to be the most useful in simulations and real-world applications. The book provides clear guidance for varying practical needs, describing scenarios in which accuracy or speed is the primary goal. Robust Cluster Analysis and Variable Selection includes all of the important theoretical details and covers the key probabilistic models, robustness issues, optimization algorithms, validation techniques, and variable selection methods. The book illustrates the different methods with simulated data and applies them to real-world data sets that can be easily downloaded from the web. This provides you with guidance on how to use clustering methods, as well as applicable procedures and algorithms, without having to understand their probabilistic fundamentals.
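
As one small, concrete instance of robust clustering, here is a sketch of trimmed k-means, a well-known robustification in which centers are refit on all but the most outlying points. It is meant only as an illustration of the robustness theme, not as one of the specific estimators Ritter develops:

```python
import numpy as np

def trimmed_kmeans(X, k=2, trim=0.1, n_iter=50, seed=0):
    """k-means that refits centers on all but the most outlying points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    n_keep = int((1 - trim) * len(X))
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)  # point-center distances
        labels = d.argmin(axis=1)
        keep = np.argsort(d.min(axis=1))[:n_keep]               # discard the worst fits
        for j in range(k):
            pts = X[keep][labels[keep] == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2)),
               rng.uniform(-20, 20, (10, 2))])                  # two clusters + gross outliers
centers, _ = trimmed_kmeans(X, k=2, trim=0.1)
print(np.round(centers, 1))    # near (0, 0) and (6, 6) despite the outliers
```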




Asymptotic Analysis of Mixed Effects Models


Book Description

Large sample techniques are fundamental to all fields of statistics. Mixed effects models, including linear mixed models, generalized linear mixed models, non-linear mixed effects models, and non-parametric mixed effects models, are complex models; nevertheless, they are extensively used in practice. This monograph provides a comprehensive account of the asymptotic analysis of mixed effects models. It is suitable for researchers and graduate students who wish to learn about asymptotic tools and research problems in mixed effects models. It may also be used as a reference for a graduate-level course on mixed effects models or on asymptotic analysis.
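
A tiny concrete anchor for the simplest model in this family, the one-way random-effects model y_ij = mu + alpha_i + e_ij: the classical ANOVA method of moments recovers the variance components from the between- and within-group mean squares, and the estimators are consistent as the number of groups grows, which is the kind of asymptotic regime such a monograph studies. A generic sketch, not from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 5                                   # m groups, n observations per group
sigma_a, sigma_e = 1.5, 1.0                     # true standard deviations
alpha = rng.normal(0, sigma_a, size=(m, 1))     # random group effects
y = 10.0 + alpha + rng.normal(0, sigma_e, size=(m, n))   # y_ij = mu + alpha_i + e_ij

group_means = y.mean(axis=1)
msb = n * ((group_means - y.mean()) ** 2).sum() / (m - 1)      # between-group mean square
msw = ((y - group_means[:, None]) ** 2).sum() / (m * (n - 1))  # within-group mean square

sigma_e2_hat = msw                              # E[MSW] = sigma_e^2
sigma_a2_hat = (msb - msw) / n                  # E[MSB] = sigma_e^2 + n * sigma_a^2
print(round(sigma_a2_hat, 2), round(sigma_e2_hat, 2))  # consistent as m grows
```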




Missing and Modified Data in Nonparametric Estimation


Book Description

This book presents a systematic and unified approach to the modern nonparametric treatment of missing and modified data, via examples of density and hazard rate estimation, nonparametric regression, filtering of signals, and time series analysis. All basic types of missingness (at random and not at random), biasing, truncation, censoring, and measurement error are discussed, and their treatment is explained. The ten chapters of the book cover the basic cases of direct data, biased data, nondestructive and destructive missing, survival data modified by truncation and censoring, missing survival data, stationary and nonstationary time series and processes, and ill-posed modifications. The coverage is suitable for self-study or a one-semester course for graduate students, with a prerequisite of a standard course in introductory probability. Exercises of various levels of difficulty will be helpful for instructors and for self-study. The book is primarily about practically important small samples. It explains when consistent estimation is possible, why in some cases missing data should be ignored, and why in others they must be taken into account. If missingness or another data modification makes consistent estimation impossible, the author explains what type of action is needed to restore the lost information. The book contains more than a hundred figures with simulated data that illustrate virtually every setting, claim, and development. The companion R software package allows the reader to verify, reproduce, and modify every simulation and every estimator used. This makes the material fully transparent and allows one to study it interactively. Sam Efromovich is the Endowed Professor of Mathematical Sciences and Head of the Actuarial Program at the University of Texas at Dallas. He is well known for his work on the theory and application of nonparametric curve estimation and is the author of Nonparametric Curve Estimation: Methods, Theory, and Applications. Professor Efromovich is a Fellow of the Institute of Mathematical Statistics and the American Statistical Association.
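
One classical idea behind the treatment of biased data can be shown in a few lines: if each observation survives with a known probability, inverse-probability (Horvitz-Thompson) weighting of a kernel density estimator undoes the bias. This is a generic sketch of that idea, not Efromovich's series estimators, and the survival-probability function is made up:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=5000)                 # hidden sample from the target density

# Biased sampling: an observation x survives with known probability w(x).
w = lambda x: 0.2 + 0.8 / (1 + np.exp(-2 * x))  # larger x is observed more often
kept = X[rng.uniform(size=X.size) < w(X)]       # the data we actually see

def kde_unbiased(x_grid, data, weight_fn, h=0.2):
    """Kernel density estimate that undoes a known observation bias via IPW."""
    wts = 1.0 / weight_fn(data)                 # inverse-probability weights
    K = np.exp(-0.5 * ((x_grid[:, None] - data[None]) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return (K * wts).sum(axis=1) / wts.sum()    # weighted mixture of kernels

grid = np.linspace(-3, 3, 7)
print(np.round(kde_unbiased(grid, kept, w), 3))                   # estimated density
print(np.round(np.exp(-0.5 * grid ** 2) / np.sqrt(2 * np.pi), 3)) # true N(0,1) density
```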