Recursive Partitioning and Applications


Book Description

Multiple complex pathways, characterized by interrelated events and conditions, represent routes to many illnesses, diseases, and ultimately death. Although there are substantial data and plausibility arguments supporting many conditions as contributory components of pathways to illness and disease end points, we have, historically, lacked an effective methodology for identifying the structure of the full pathways. Regression methods, with strong linearity assumptions and data-based constraints on the extent and order of interaction terms, have traditionally been the strategies of choice for relating outcomes to potentially complex explanatory pathways. However, nonlinear relationships among candidate explanatory variables are a generic feature that must be dealt with in any characterization of how health outcomes come about. It is noteworthy that similar challenges arise from data analyses in Economics, Finance, Engineering, etc. Thus, the purpose of this book is to demonstrate the effectiveness of a relatively recently developed methodology, recursive partitioning, as a response to this challenge. We also compare and contrast what is learned via recursive partitioning with results obtained on the same data sets using more traditional methods. This serves to highlight exactly where, and for what kinds of questions, recursive partitioning-based strategies have a decisive advantage over classical regression techniques.




Recursive Partitioning in the Health Sciences


Book Description

A demonstration of the recursive partitioning methodology and its effectiveness as a response to the challenge of analysing and interpreting multiple complex pathways to many illnesses, diseases, and ultimately death. For comparison purposes, standard regression methods are presented briefly and then applied in the examples. This book is suitable for three broad groups of readers: biomedical researchers, clinicians, public health practitioners including epidemiologists, health service researchers, and environmental policy advisers; consulting statisticians who can use the recursive partitioning technique as a guide in providing effective and insightful solutions to clients' problems; and statisticians interested in methodological and theoretical issues. The book provides an up-to-date summary of the methodological and theoretical underpinnings of recursive partitioning, as well as a host of unsolved problems the solutions of which would advance the rigorous underpinnings of statistics in general.




Classification and Regression Trees


Book Description

The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, the use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.




Continuous Time Modeling in the Behavioral and Related Sciences


Book Description

This unique book provides an overview of continuous time modeling in the behavioral and related sciences. It argues that the use of discrete time models for processes that are in fact evolving in continuous time produces problems that make their application in practice highly questionable. One main issue is the dependence of discrete time parameter estimates on the chosen time interval, which leads to incomparability of results across different observation intervals. Continuous time modeling by means of differential equations offers a powerful approach for studying dynamic phenomena, yet the use of this approach in the behavioral and related sciences such as psychology, sociology, economics and medicine, is still rare. This is unfortunate, because in these fields often only a few discrete time (sampled) observations are available for analysis (e.g., daily, weekly, yearly, etc.). However, as emphasized by Rex Bergstrom, the pioneer of continuous-time modeling in econometrics, neither human beings nor the economy cease to exist in between observations. In 16 chapters, the book addresses a vast range of topics in continuous time modeling, from approaches that closely mimic traditional linear discrete time models to highly nonlinear state space modeling techniques. Each chapter describes the type of research questions and data that the approach is most suitable for, provides detailed statistical explanations of the models, and includes one or more applied examples. To allow readers to implement the various techniques directly, accompanying computer code is made available online. The book is intended as a reference work for students and scientists working with longitudinal data who have a Master's- or early PhD-level knowledge of statistics.
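The interval dependence described above can be illustrated with a minimal sketch. It assumes a first-order linear continuous time process, dx/dt = a * x plus noise, which is an illustrative assumption and not a model taken from the book. For such a process, the exact discrete time autoregressive coefficient over a sampling interval d is exp(a * d). The drift value and the function names below are hypothetical.

```python
import math

def discrete_ar_coefficient(drift, interval):
    """Exact AR(1) coefficient implied by dx/dt = drift * x over one sampling interval."""
    return math.exp(drift * interval)

def recover_drift(ar_coefficient, interval):
    """Map a discrete time AR(1) coefficient back to the continuous time drift."""
    return math.log(ar_coefficient) / interval

drift = -0.5  # hypothetical continuous time drift, chosen only for illustration
for interval in (0.5, 1.0, 2.0):
    phi = discrete_ar_coefficient(drift, interval)
    print(f"interval={interval}: AR coefficient={phi:.3f}, "
          f"recovered drift={recover_drift(phi, interval):.3f}")
```

Under these assumptions the autoregressive coefficient changes with the sampling interval, while the drift recovered from it stays the same, which is why results expressed in continuous time parameters remain comparable across studies with different observation intervals.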




Data Clustering: Theory, Algorithms, and Applications, Second Edition


Book Description

Data clustering, also known as cluster analysis, is an unsupervised process that divides a set of objects into homogeneous groups. Since the publication of the first edition of this monograph in 2007, development in the area has exploded, especially in clustering algorithms for big data and open-source software for cluster analysis. This second edition reflects these new developments, covers the basics of data clustering, includes a list of popular clustering algorithms, and provides program code that helps users implement clustering algorithms. Data Clustering: Theory, Algorithms, and Applications, Second Edition will be of interest to researchers, practitioners, and data scientists as well as undergraduate and graduate students.




Machine Learning for Knowledge Discovery with R


Book Description

Machine Learning for Knowledge Discovery with R contains methodologies and examples for statistical modelling, inference, and prediction in data analysis. It includes many recent supervised and unsupervised machine learning methodologies such as recursive partitioning modelling, regularized regression, support vector machines, neural networks, clustering, and causal-effect inference. Additionally, it emphasizes statistical thinking in data analysis, the use of statistical graphs for data structure exploration, and the presentation of results. The book includes many real-world data examples from the life sciences, finance, and other fields to illustrate the applications of the methods described therein. Key Features: Contains statistical theory for the most recent supervised and unsupervised machine learning methodologies. Emphasizes broad statistical thinking, judgment, graphical methods, and collaboration with subject-matter experts in analysis, interpretation, and presentations. Written by a statistical data analysis practitioner for practitioners. The book is suitable for an upper-level undergraduate or graduate-level data analysis course. It also serves as a useful desk reference for data analysts in scientific research or industrial applications.




Data Mining With Decision Trees: Theory And Applications (2nd Edition)


Book Description

Decision trees have become one of the most powerful and popular approaches in knowledge discovery and data mining, the science of exploring large and complex bodies of data in order to discover useful patterns. Decision tree learning continues to evolve over time. Existing methods are constantly being improved and new methods introduced. This 2nd Edition is dedicated entirely to the field of decision trees in data mining and covers all aspects of this important technique, as well as improved or new methods and techniques developed after the publication of our first edition. In this new edition, all chapters have been revised and new topics brought in. New topics include Cost-Sensitive Active Learning, Learning with Uncertain and Imbalanced Data, Using Decision Trees beyond Classification Tasks, Privacy Preserving Decision Tree Learning, Lessons Learned from Comparative Studies, and Learning Decision Trees for Big Data. A walk-through guide to existing open-source data mining software is also included in this edition. This book invites readers to explore the many benefits in data mining that decision trees offer:




Springer Handbook of Engineering Statistics


Book Description

In today’s global and highly competitive environment, continuous improvement in the processes and products of any field of engineering is essential for survival. This book gathers together the full range of statistical techniques required by engineers from all fields. It will help them gain sensible statistical feedback on how their processes or products are functioning and make realistic predictions of how these could be improved. The handbook will be essential reading for all engineers and engineering-connected managers who are serious about keeping their methods and products at the cutting edge of quality and competitiveness.




Partitioned convolution algorithms for real-time auralization


Book Description

This work discusses methods for efficient audio processing with finite impulse response (FIR) filters. Such filters are widely used for high-quality acoustic signal processing, e.g. for headphone or loudspeaker equalization, in binaural synthesis, in spatial sound reproduction techniques and for the auralization of reverberant environments. This work focuses on real-time applications, where the audio processing is subject to minimal delays (latencies). Different fast convolution concepts (transform-based, interpolation-based and number-theoretic), which are used to implement FIR filters efficiently, are examined regarding their applicability in real-time. These fast, elementary techniques can be further improved by the concept of partitioned convolution. This work introduces a classification and a general framework for partitioned convolution algorithms and analyzes the algorithmic classes which are relevant for real-time filtering: Elementary concepts which do not partition the filter impulse response (e.g. regular Overlap-Add and Overlap-Save convolution) and advanced techniques, which partition filters uniformly and non-uniformly. The algorithms are thereby regarded in their analytic complexity, their performance on target hardware, the optimal choice of parameters, assemblies of multiple filters, multi-channel processing and the exchange of filter impulse responses without audible artifacts. Suitable convolution techniques are identified for different types of audio applications, ranging from resource-aware auralizations on mobile devices to extensive room acoustics audio rendering using dedicated multi-processor systems.
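As a rough illustration of the regular Overlap-Add convolution mentioned above, the following minimal sketch filters a signal block by block using FFT-based fast convolution. It is not taken from the work itself: the function name `overlap_add`, the `block_size` parameter, and the use of NumPy are assumptions made for this example, and the code processes a complete signal offline instead of running in real time.

```python
import numpy as np

def overlap_add(x, h, block_size):
    """Convolve signal x with FIR filter h block by block (Overlap-Add sketch)."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    # FFT size: next power of two that holds one block convolved with the filter.
    n_fft = 1
    while n_fft < block_size + len(h) - 1:
        n_fft *= 2
    H = np.fft.rfft(h, n_fft)  # filter spectrum, computed once
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        # Fast convolution of one input block with the whole filter.
        seg = np.fft.irfft(np.fft.rfft(block, n_fft) * H, n_fft)
        n_valid = len(block) + len(h) - 1
        # The tail of each block's result overlaps the next block and is added.
        y[start:start + n_valid] += seg[:n_valid]
    return y

rng = np.random.default_rng(0)
signal = rng.standard_normal(1000)
fir = rng.standard_normal(50)
print(np.allclose(overlap_add(signal, fir, 64), np.convolve(signal, fir)))
```

In this sketch the filter impulse response is left unpartitioned, so the FFT size grows with the filter length. The uniformly and non-uniformly partitioned techniques described above additionally split the impulse response into sub-filters to keep latency low for long filters.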




Feature Engineering and Selection


Book Description

The process of developing predictive models includes many stages. Most resources focus on the modeling algorithms but neglect other critical aspects of the modeling process. This book describes techniques for finding the best representations of predictors for modeling and for finding the best subset of predictors for improving model performance. A variety of example data sets are used to illustrate the techniques along with R programs for reproducing the results.