Outlier Analysis


Book Description

This book provides comprehensive coverage of the field of outlier analysis from a computer science point of view. It integrates methods from data mining, machine learning, and statistics within the computational framework and therefore appeals to multiple communities. The chapters of this book can be organized into three categories: Basic algorithms: Chapters 1 through 7 discuss the fundamental algorithms for outlier analysis, including probabilistic and statistical methods, linear methods, proximity-based methods, high-dimensional (subspace) methods, ensemble methods, and supervised methods. Domain-specific methods: Chapters 8 through 12 discuss outlier detection algorithms for various domains of data, such as text, categorical data, time-series data, discrete sequence data, spatial data, and network data. Applications: Chapter 13 is devoted to various applications of outlier analysis. Some guidance is also provided for the practitioner. The second edition of this book is more detailed and is written to appeal to both researchers and practitioners. Significant new material has been added on topics such as kernel methods, one-class support-vector machines, matrix factorization, neural networks, outlier ensembles, time-series methods, and subspace methods. It is written as a textbook and can be used for classroom teaching.




Outlier Analysis


Book Description

With the increasing advances in hardware technology for data collection, and advances in software technology (databases) for data organization, computer scientists have increasingly participated in the latest advancements of the outlier analysis field. Computer scientists, specifically, approach this field based on their practical experiences in managing large amounts of data, and with far fewer assumptions– the data can be of any type, structured or unstructured, and may be extremely large. Outlier Analysis is a comprehensive exposition, as understood by data mining experts, statisticians and computer scientists. The book has been organized carefully, and emphasis was placed on simplifying the content, so that students and practitioners can also benefit. Chapters will typically cover one of three areas: methods and techniques commonly used in outlier analysis, such as linear methods, proximity-based methods, subspace methods, and supervised methods; data domains, such as, text, categorical, mixed-attribute, time-series, streaming, discrete sequence, spatial and network data; and key applications of these methods as applied to diverse domains such as credit card fraud detection, intrusion detection, medical diagnosis, earth science, web log analytics, and social network analysis are covered.




Outlier Detection: Techniques and Applications


Book Description

This book, drawing on recent literature, highlights several methodologies for the detection of outliers and explains how to apply them to solve several interesting real-life problems. The detection of objects that deviate from the norm in a data set is an essential task in data mining due to its significance in many contemporary applications. More specifically, the detection of fraud in e-commerce transactions and discovering anomalies in network data have become prominent tasks, given recent developments in the field of information and communication technologies and security. Accordingly, the book sheds light on specific state-of-the-art algorithmic approaches such as the community-based analysis of networks and characterization of temporal outliers present in dynamic networks. It offers a valuable resource for young researchers working in data mining, helping them understand the technical depth of the outlier detection problem and devise innovative solutions to address related challenges.




Outliers


Book Description

From the bestselling author of Blink and The Tipping Point, Malcolm Gladwell's Outliers: The Story of Success overturns conventional wisdom about genius to show us what makes an ordinary person an extreme overachiever. Why do some people achieve so much more than others? Can they lie so far out of the ordinary? In this provocative and inspiring book, Malcolm Gladwell looks at everyone from rock stars to professional athletes, software billionaires to scientific geniuses, to show that the story of success is far more surprising, and far more fascinating, than we could ever have imagined. He reveals that it's as much about where we're from and what we do, as who we are - and that no one, not even a genius, ever makes it alone. Outliers will change the way you think about your own life story, and about what makes us all unique. 'Gladwell is not only a brilliant storyteller; he can see what those stories tell us, the lessons they contain' Guardian 'Malcolm Gladwell is a global phenomenon ... he has a genius for making everything he writes seem like an impossible adventure' Observer 'He is the best kind of writer - the kind who makes you feel like you're a genius, rather than he's a genius' The Times




Identification of Outliers


Book Description

The problem of outliers is one of the oldest in statistics, and during the last century and a half interest in it has waxed and waned several times. Currently it is once again an active research area after some years of relative neglect, and recent work has solved a number of old problems in outlier theory, and identified new ones. The major results are, however, scattered amongst many journal articles, and for some time there has been a clear need to bring them together in one place. That was the original intention of this monograph: but during execution it became clear that the existing theory of outliers was deficient in several areas, and so the monograph also contains a number of new results and conjectures. In view of the enormous volume ofliterature on the outlier problem and its cousins, no attempt has been made to make the coverage exhaustive. The material is concerned almost entirely with the use of outlier tests that are known (or may reasonably be expected) to be optimal in some way. Such topics as robust estimation are largely ignored, being covered more adequately in other sources. The numerous ad hoc statistics proposed in the early work on the grounds of intuitive appeal or computational simplicity also are not discussed in any detail.




Robust Regression and Outlier Detection


Book Description

WILEY-INTERSCIENCE PAPERBACK SERIES The Wiley-Interscience Paperback Series consists of selectedbooks that have been made more accessible to consumers in an effortto increase global appeal and general circulation. With these newunabridged softcover volumes, Wiley hopes to extend the lives ofthese works by making them available to future generations ofstatisticians, mathematicians, and scientists. "The writing style is clear and informal, and much of thediscussion is oriented to application. In short, the book is akeeper." –Mathematical Geology "I would highly recommend the addition of this book to thelibraries of both students and professionals. It is a usefultextbook for the graduate student, because it emphasizes both thephilosophy and practice of robustness in regression settings, andit provides excellent examples of precise, logical proofs oftheorems. . . .Even for those who are familiar with robustness, thebook will be a good reference because it consolidates the researchin high-breakdown affine equivariant estimators and includes anextensive bibliography in robust regression, outlier diagnostics,and related methods. The aim of this book, the authors tell us, is‘to make robust regression available for everyday statisticalpractice.’ Rousseeuw and Leroy have included all of thenecessary ingredients to make this happen." –Journal of the American Statistical Association




Secondary Analysis of Electronic Health Records


Book Description

This book trains the next generation of scientists representing different disciplines to leverage the data generated during routine patient care. It formulates a more complete lexicon of evidence-based recommendations and support shared, ethical decision making by doctors with their patients. Diagnostic and therapeutic technologies continue to evolve rapidly, and both individual practitioners and clinical teams face increasingly complex ethical decisions. Unfortunately, the current state of medical knowledge does not provide the guidance to make the majority of clinical decisions on the basis of evidence. The present research infrastructure is inefficient and frequently produces unreliable results that cannot be replicated. Even randomized controlled trials (RCTs), the traditional gold standards of the research reliability hierarchy, are not without limitations. They can be costly, labor intensive, and slow, and can return results that are seldom generalizable to every patient population. Furthermore, many pertinent but unresolved clinical and medical systems issues do not seem to have attracted the interest of the research enterprise, which has come to focus instead on cellular and molecular investigations and single-agent (e.g., a drug or device) effects. For clinicians, the end result is a bit of a “data desert” when it comes to making decisions. The new research infrastructure proposed in this book will help the medical profession to make ethically sound and well informed decisions for their patients.




Outlier Detection for Temporal Data


Book Description

Outlier (or anomaly) detection is a very broad field which has been studied in the context of a large number of research areas like statistics, data mining, sensor networks, environmental science, distributed systems, spatio-temporal mining, etc. Initial research in outlier detection focused on time series-based outliers (in statistics). Since then, outlier detection has been studied on a large variety of data types including high-dimensional data, uncertain data, stream data, network data, time series data, spatial data, and spatio-temporal data. While there have been many tutorials and surveys for general outlier detection, we focus on outlier detection for temporal data in this book. A large number of applications generate temporal datasets. For example, in our everyday life, various kinds of records like credit, personnel, financial, judicial, medical, etc., are all temporal. This stresses the need for an organized and detailed study of outliers with respect to such temporal data. In the past decade, there has been a lot of research on various forms of temporal data including consecutive data snapshots, series of data snapshots and data streams. Besides the initial work on time series, researchers have focused on rich forms of data including multiple data streams, spatio-temporal data, network data, community distribution data, etc. Compared to general outlier detection, techniques for temporal outlier detection are very different. In this book, we will present an organized picture of both recent and past research in temporal outlier detection. We start with the basics and then ramp up the reader to the main ideas in state-of-the-art outlier detection techniques. We motivate the importance of temporal outlier detection and brief the challenges beyond usual outlier detection. Then, we list down a taxonomy of proposed techniques for temporal outlier detection. Such techniques broadly include statistical techniques (like AR models, Markov models, histograms, neural networks), distance- and density-based approaches, grouping-based approaches (clustering, community detection), network-based approaches, and spatio-temporal outlier detection approaches. We summarize by presenting a wide collection of applications where temporal outlier detection techniques have been applied to discover interesting outliers. Table of Contents: Preface / Acknowledgments / Figure Credits / Introduction and Challenges / Outlier Detection for Time Series and Data Sequences / Outlier Detection for Data Streams / Outlier Detection for Distributed Data Streams / Outlier Detection for Spatio-Temporal Data / Outlier Detection for Temporal Network Data / Applications of Outlier Detection for Temporal Data / Conclusions and Research Directions / Bibliography / Authors' Biographies




Outliers


Book Description

"This brief monograph, in the broadest terms, reviews some of the techniques for outlier detection and analysis. In addition, the effect of the presence of outliers on the statistical parameters such as higher-order moments, quartiles, deciles, percentiles, skewness, and kurtosis, etc. of the distribution are studied. It also discusses the masking and swamping effect of outliers and some primitive methods of detecting these behaviors. Furthermore, some methods of detecting outliers in multivariate data using the clustering algorithm approach are also discussed"--




New Developments in Unsupervised Outlier Detection


Book Description

This book enriches unsupervised outlier detection research by proposing several new distance-based and density-based outlier scores in a k-nearest neighbors’ setting. The respective chapters highlight the latest developments in k-nearest neighbor-based outlier detection research and cover such topics as our present understanding of unsupervised outlier detection in general; distance-based and density-based outlier detection in particular; and the applications of the latest findings to boundary point detection and novel object detection. The book also offers a new perspective on bridging the gap between k-nearest neighbor-based outlier detection and clustering-based outlier detection, laying the groundwork for future advances in unsupervised outlier detection research. The authors hope the algorithms and applications proposed here will serve as valuable resources for outlier detection researchers for years to come.