Foundations and Methods in Combinatorial and Statistical Data Analysis and Clustering


Book Description

This book offers an original and broad exploration of the fundamental methods in Clustering and Combinatorial Data Analysis, presenting new formulations and ideas within this very active field. With extensive introductions, formal and mathematical developments and real case studies, this book provides readers with a deeper understanding of the mutual relationships between these methods, which are clearly expressed with respect to three facets: logical, combinatorial and statistical. Using relational mathematical representation, all types of data structures can be handled in precise and unified ways, which the author highlights in three stages: clustering a set of descriptive attributes; clustering a set of objects or a set of object categories; and establishing correspondence between these two dual clusterings. Tools for interpreting the reasons of a given cluster or clustering are also included. Foundations and Methods in Combinatorial and Statistical Data Analysis and Clustering will be a valuable resource for students and researchers who are interested in the areas of Data Analysis, Clustering, Data Mining and Knowledge Discovery.




Seriation in Combinatorial and Statistical Data Analysis


Book Description

This monograph offers an original, broad and very diverse exploration of the seriation domain in data analysis, together with building a specific relation to clustering. Relative to a data table crossing a set of objects and a set of descriptive attributes, the search for orders which correspond respectively to these two sets is formalized mathematically and statistically. State-of-the-art methods are created and compared with classical methods, and a thorough understanding of the mutual relationships between these methods is clearly expressed. The authors distinguish two families of methods: geometric representation methods, and algorithmic and combinatorial methods. Original and accurate methods are provided in the framework for both families. Their foundations and comparison are addressed on both theoretical and experimental levels. The experimental analysis is very varied and very comprehensive. Seriation in Combinatorial and Statistical Data Analysis has a unique character in the literature falling within the fields of Data Analysis, Data Mining and Knowledge Discovery. It will be a valuable resource for students and researchers in the latter fields.




Classification and Data Science in the Digital Age


Book Description

The contributions gathered in this open access book focus on modern methods for data science and classification and present a series of real-world applications. Numerous research topics are covered, ranging from statistical inference and modeling to clustering and dimension reduction, from functional data analysis to time series analysis, and network analysis. The applications reflect new analyses in a variety of fields, including medicine, marketing, genetics, engineering, and education. The book comprises selected and peer-reviewed papers presented at the 17th Conference of the International Federation of Classification Societies (IFCS 2022), held in Porto, Portugal, July 19–23, 2022. The IFCS federates the classification societies and the IFCS biennial conference brings together researchers and stakeholders in the areas of Data Science, Classification, and Machine Learning. It provides a forum for presenting high-quality theoretical and applied works, and promoting and fostering interdisciplinary research and international cooperation. The intended audience is researchers and practitioners who seek the latest developments and applications in the field of data science and classification.




Data Clustering


Book Description

Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains. The book focuses on three primary aspects of data clustering: Methods, describing key techniques commonly used for clustering, such as feature selection, agglomerative clustering, partitional clustering, density-based clustering, probabilistic clustering, grid-based clustering, spectral clustering, and nonnegative matrix factorization; Domains, covering methods used for different domains of data, such as categorical data, text data, multimedia data, graph data, biological data, stream data, uncertain data, time series clustering, high-dimensional clustering, and big data; and Variations and Insights, discussing important variations of the clustering process, such as semisupervised clustering, interactive clustering, multiview clustering, cluster ensembles, and cluster validation. In this book, top researchers from around the world explore the characteristics of clustering problems in a variety of application areas. They also explain how to glean detailed insight from the clustering process (including how to verify the quality of the underlying clusters) through supervision, human intervention, or the automated generation of alternative clusters.




Foundations of Data Science


Book Description

This book provides an introduction to the mathematical and algorithmic foundations of data science, including machine learning, high-dimensional geometry, and analysis of large networks. Topics include the counterintuitive nature of data in high dimensions, important linear algebraic techniques such as singular value decomposition, the theory of random walks and Markov chains, the fundamentals of and important algorithms for machine learning, algorithms and analysis for clustering, probabilistic models for large networks, representation learning including topic modelling and non-negative matrix factorization, wavelets and compressed sensing. Important probabilistic techniques are developed including the law of large numbers, tail inequalities, analysis of random projections, generalization guarantees in machine learning, and moment methods for analysis of phase transitions in large random graphs. Additionally, important structural and complexity measures are discussed such as matrix norms and VC-dimension. This book is suitable for both undergraduate and graduate courses in the design and analysis of algorithms for data.




Data Privacy: Foundations, New Developments and the Big Data Challenge


Book Description

This book offers a broad, cohesive overview of the field of data privacy. It discusses, from a technological perspective, the problems and solutions of the three main communities working on data privacy: statistical disclosure control (those with a statistical background), privacy-preserving data mining (those working with databases and data mining), and privacy-enhancing technologies (those involved in communications and security). Presenting different approaches, the book describes alternative privacy models and disclosure risk measures as well as data protection procedures for respondent, holder and user privacy. It also discusses specific data privacy problems and solutions for readers who need to deal with big data.




Combinatorial Methods in Discrete Distributions


Book Description

A unique approach illustrating discrete distribution theory through combinatorial methods. This book provides a unique approach by presenting combinatorial methods in tandem with discrete distribution theory. This method, particular to discreteness, allows readers to gain a deeper understanding of theory by using applications to solve problems. The author makes extensive use of the reduction approach to conditional distributions of independent random occupancy numbers, and provides excellent studies of occupancy and sequential occupancy distributions, convolutions of truncated discrete distributions, and compound and mixture distributions. Combinatorial Methods in Discrete Distributions begins with a brief presentation of set theory followed by basic counting principles. Fundamental principles of combinatorics, finite differences, and discrete probability are included to give readers the necessary foundation to the topics presented in the text. A thorough examination of the field is provided and features: Stirling numbers and generalized factorial coefficients; occupancy and sequential occupancy distributions; n-fold convolutions of truncated distributions; and compound and mixture distributions. Thoroughly worked examples aid readers in understanding complex theory and discovering how theory can be applied to solve practical problems. An appendix with hints and answers to the exercises helps readers work through the more complex sections. Reference notes are provided at the end of each chapter, and an extensive bibliography offers readers a resource for additional information on specialized topics.




Methods for Statistical Data Analysis of Multivariate Observations


Book Description

A practical guide for multivariate statistical techniques, now updated and revised. In recent years, innovations in computer technology and statistical methodologies have dramatically altered the landscape of multivariate data analysis. This new edition of Methods for Statistical Data Analysis of Multivariate Observations explores current multivariate concepts and techniques while retaining the same practical focus of its predecessor. It integrates methods and data-based interpretations relevant to multivariate analysis in a way that addresses real-world problems arising in many areas of interest. Greatly revised and updated, this Second Edition provides helpful examples, graphical orientation, numerous illustrations, and an appendix detailing statistical software, including the S (or Splus) and SAS systems. It also offers:
* An expanded chapter on cluster analysis that covers advances in pattern recognition
* New sections on inputs to clustering algorithms and aids for interpreting the results of cluster analysis
* An exploration of some new techniques of summarization and exposure
* New graphical methods for assessing the separations among the eigenvalues of a correlation matrix and for comparing sets of eigenvectors
* Knowledge gained from advances in robust estimation and distributional models that are slightly broader than the multivariate normal
This Second Edition is invaluable for graduate students, applied statisticians, engineers, and scientists wishing to use multivariate techniques in a variety of disciplines.




Mathematical Tools for Data Mining


Book Description

This volume was born from the experience of the authors as researchers and educators, which suggests that many students of data mining are handicapped in their research by the lack of a formal, systematic education in its mathematics. The data mining literature contains many excellent titles that address the needs of users with a variety of interests ranging from decision making to pattern investigation in biological data. However, these books do not deal with the mathematical tools that are currently needed by data mining researchers and doctoral students. We felt it timely to produce a book that integrates the mathematics of data mining with its applications. We emphasize that this book is about mathematical tools for data mining and not about data mining itself; despite this, a substantial amount of applications of mathematical concepts in data mining are presented. The book is intended as a reference for the working data miner. In our opinion, three areas of mathematics are vital for data mining: set theory, including partially ordered sets and combinatorics; linear algebra, with its many applications in principal component analysis and neural networks; and probability theory, which plays a foundational role in statistics, machine learning and data mining. This volume is dedicated to the study of set-theoretical foundations of data mining. Two further volumes are contemplated that will cover linear algebra and probability theory. The first part of this book, dedicated to set theory, begins with a study of functions and relations. Applications of these fundamental concepts to such issues as equivalences and partitions are discussed. Also, we prepare the ground for the following volumes by discussing indicator functions, fields and σ-fields, and other concepts.




Foundations and Advances in Data Mining


Book Description

With the growing use of information technology and the recent advances in web systems, the amount of data available to users has increased exponentially. Thus, there is a critical need to understand the content of the data. As a result, data mining has become a popular research topic in recent years for the treatment of the "data rich and information poor" syndrome. In this carefully edited volume, a theoretical foundation as well as important new directions for data mining research are presented. It brings together a set of well-respected data mining theoreticians and researchers with practical data mining experience. The presented theories will give data mining practitioners a scientific perspective on data mining and thus provide more insight into their problems, and the new data mining topics provided can be expected to stimulate further research in these important directions.