Robust Cluster Analysis and Variable Selection


Book Description

Clustering remains a vibrant area of research in statistics. Although there are many books on this topic, there are relatively few that are well founded in the theoretical aspects. In Robust Cluster Analysis and Variable Selection, Gunter Ritter presents an overview of the theory and applications of probabilistic clustering and variable selection, synthesizing the key research results of the last 50 years. The author focuses on the robust clustering methods he found to be the most useful on simulated data and real-time applications. The book provides clear guidance for the varying needs of both applications, describing scenarios in which accuracy and speed are the primary goals. Robust Cluster Analysis and Variable Selection includes all of the important theoretical details, and covers the key probabilistic models, robustness issues, optimization algorithms, validation techniques, and variable selection methods. The book illustrates the different methods with simulated data and applies them to real-world data sets that can be easily downloaded from the web. This provides you with guidance in how to use clustering methods as well as applicable procedures and algorithms without having to understand their probabilistic fundamentals.




Model-Based Clustering and Classification for Data Science


Book Description

Cluster analysis finds groups in data automatically. Most methods have been heuristic and leave open such central questions as: how many clusters are there? Which method should I use? How should I handle outliers? Classification assigns new observations to groups given previously classified observations, and also has open questions about parameter tuning, robustness and uncertainty assessment. This book frames cluster analysis and classification in terms of statistical models, thus yielding principled estimation, testing and prediction methods, and sound answers to the central questions. It builds the basic ideas in an accessible but rigorous way, with extensive data examples and R code; describes modern approaches to high-dimensional data and networks; and explains such recent advances as Bayesian regularization, non-Gaussian model-based clustering, cluster merging, variable selection, semi-supervised and robust classification, clustering of functional data, text and images, and co-clustering. Written for advanced undergraduates in data science, as well as researchers and practitioners, it assumes basic knowledge of multivariate calculus, linear algebra, probability and statistics.




Handbook of Cluster Analysis


Book Description

Handbook of Cluster Analysis provides a comprehensive and unified account of the main research developments in cluster analysis. Written by active, distinguished researchers in this area, the book helps readers make informed choices of the most suitable clustering approach for their problem and make better use of existing cluster analysis tools.




Classification and Data Science in the Digital Age


Book Description

The contributions gathered in this open access book focus on modern methods for data science and classification and present a series of real-world applications. Numerous research topics are covered, ranging from statistical inference and modeling to clustering and dimension reduction, from functional data analysis to time series analysis, and network analysis. The applications reflect new analyses in a variety of fields, including medicine, marketing, genetics, engineering, and education. The book comprises selected and peer-reviewed papers presented at the 17th Conference of the International Federation of Classification Societies (IFCS 2022), held in Porto, Portugal, July 19–23, 2022. The IFCS federates the classification societies and the IFCS biennial conference brings together researchers and stakeholders in the areas of Data Science, Classification, and Machine Learning. It provides a forum for presenting high-quality theoretical and applied works, and promoting and fostering interdisciplinary research and international cooperation. The intended audience is researchers and practitioners who seek the latest developments and applications in the field of data science and classification.




Soft Methods for Data Science


Book Description

This proceedings volume is a collection of peer-reviewed papers presented at the 8th International Conference on Soft Methods in Probability and Statistics (SMPS 2016), held in Rome, Italy. The book is dedicated to data science, which aims at developing automated methods to analyze massive amounts of data and to extract knowledge from them. It shows how data science employs various programming techniques and methods of data wrangling, data visualization, machine learning, probability and statistics. The soft methods proposed in this volume represent a collection of tools in these fields that can also be useful for data science.




The Mathematics of the Uncertain


Book Description

This book is a tribute to Professor Pedro Gil, who created the Department of Statistics, OR and TM at the University of Oviedo and was a former President of the Spanish Society of Statistics and OR (SEIO). In more than eighty original contributions, it illustrates the extent to which Mathematics can help manage uncertainty, a factor that is inherent to real life. Today it goes without saying that, in order to model experiments and systems and to analyze related outcomes and data, it is necessary to consider formal ideas and develop scientific approaches and techniques for dealing with uncertainty. Mathematics is crucial in this endeavor, as this book demonstrates. As Professor Pedro Gil highlighted twenty years ago, there are several well-known mathematical branches for this purpose, including Mathematics of chance (Probability and Statistics), Mathematics of communication (Information Theory), and Mathematics of imprecision (Fuzzy Sets Theory and others). These branches often intertwine, since different sources of uncertainty can coexist, and they are not exhaustive. While most of the papers presented here address the three aforementioned fields, some hail from other Mathematical disciplines such as Operations Research; others, in turn, put the spotlight on real-world studies and applications. The intended audience of this book is mainly statisticians, mathematicians and computer scientists, but practitioners in these areas will certainly also find the book a very interesting read.




Cladag 2017 Book of Short Papers


Book Description

This book collects the abstracts and short papers submitted to the International Conference of the CLAssification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS), held in Milan, Italy, on September 13-15, 2017.




KI 2020: Advances in Artificial Intelligence


Book Description

This book constitutes the refereed proceedings of the 43rd German Conference on Artificial Intelligence, KI 2020, held in Bamberg, Germany, in September 2020. The 16 full and 12 short papers presented together with 6 extended abstracts in this volume were carefully reviewed and selected from 62 submissions. As a well-established annual conference series, KI is dedicated to research on theory and applications across all methods and topic areas of AI research. KI 2020 had a special focus on human-centered AI, with highlights on AI and education and explainable machine learning. Due to the Corona pandemic, KI 2020 was held as a virtual event.




Mixture Model-Based Classification


Book Description

"This is a great overview of the field of model-based clustering and classification by one of its leading developers. McNicholas provides a resource that I am certain will be used by researchers in statistics and related disciplines for quite some time. The discussion of mixtures with heavy tails and asymmetric distributions will place this text as the authoritative, modern reference in the mixture modeling literature." (Douglas Steinley, University of Missouri) Mixture Model-Based Classification is the first monograph devoted to mixture model-based approaches to clustering and classification. This is a book for both established researchers and newcomers to the field. A history of mixture models as a tool for classification is provided and Gaussian mixtures are considered extensively, including mixtures of factor analyzers and other approaches for high-dimensional data. Non-Gaussian mixtures are considered, from mixtures with components that parameterize skewness and/or concentration, right up to mixtures of multiple scaled distributions. Several other important topics are considered, including mixture approaches for clustering and classification of longitudinal data as well as discussion about how to define a cluster. Paul D. McNicholas is the Canada Research Chair in Computational Statistics at McMaster University, where he is a Professor in the Department of Mathematics and Statistics. His research focuses on the use of mixture model-based approaches for classification, with particular attention to clustering applications, and he has published extensively within the field. He is an associate editor for several journals and has served as a guest editor for a number of special issues on mixture models.




Asymptotic Analysis of Mixed Effects Models


Book Description

Large sample techniques are fundamental to all fields of statistics. Mixed effects models, including linear mixed models, generalized linear mixed models, non-linear mixed effects models, and non-parametric mixed effects models, are complex, yet they are extensively used in practice. This monograph provides a comprehensive account of asymptotic analysis of mixed effects models. The monograph is suitable for researchers and graduate students who wish to learn about asymptotic tools and research problems in mixed effects models. It may also be used as a reference book for a graduate-level course on mixed effects models or asymptotic analysis.