Explaining the Success of Nearest Neighbor Methods in Prediction


Book Description

Many modern methods for prediction leverage nearest neighbor search to find past training examples most similar to a test example, an idea that dates back in text to at least the 11th century and has stood the test of time. This monograph aims to explain the success of these methods, both in theory, for which we cover foundational nonasymptotic statistical guarantees on nearest-neighbor-based regression and classification, and in practice, for which we gather prominent methods for approximate nearest neighbor search that have been essential to scaling prediction systems reliant on nearest neighbor analysis to handle massive datasets. Furthermore, we discuss connections to learning distances for use with nearest neighbor methods, including how random decision trees and ensemble methods learn nearest neighbor structure, as well as recent developments in crowdsourcing and graphons. In terms of theory, our focus is on nonasymptotic statistical guarantees, which we state in the form of how many training data and what algorithm parameters ensure that a nearest neighbor prediction method achieves a user-specified error tolerance. We begin with the most general of such results for nearest neighbor and related kernel regression and classification in general metric spaces. In such settings in which we assume very little structure, what enables successful prediction is smoothness in the function being estimated for regression, and a low probability of landing near the decision boundary for classification. In practice, these conditions could be difficult to verify empirically for a real dataset. We then cover recent theoretical guarantees on nearest neighbor prediction in the three case studies of time series forecasting, recommending products to people over time, and delineating human organs in medical images by looking at image patches. In these case studies, clustering structure, which is easier to verify in data and more readily interpretable by practitioners, enables successful prediction.




Beyond the Worst-Case Analysis of Algorithms


Book Description

Introduces exciting new methods for assessing algorithms for problems ranging from clustering to linear programming to neural networks.




Inference and Learning from Data


Book Description

Discover data-driven learning methods with the third volume of this extraordinary three-volume set.







Mathematical Analysis in Interdisciplinary Research


Book Description

This contributed volume provides an extensive account of research and expository papers in a broad domain of mathematical analysis and its various applications to a multitude of fields. Presenting the state-of-the-art knowledge in a wide range of topics, the book will be useful to graduate students and researchers in theoretical and applicable interdisciplinary research. The focus is on several subjects including: optimal control problems, optimal maintenance of communication networks, optimal emergency evacuation with uncertainty, cooperative and noncooperative partial differential systems, variational inequalities and general equilibrium models, anisotropic elasticity and harmonic functions, nonlinear stochastic differential equations, operator equations, max-product operators of Kantorovich type, perturbations of operators, integral operators, dynamical systems involving maximal monotone operators, the three-body problem, deceptive systems, hyperbolic equations, strongly generalized preinvex functions, Dirichlet characters, probability distribution functions, applied statistics, integral inequalities, generalized convexity, global hyperbolicity of spacetimes, Douglas-Rachford methods, fixed point problems, the general Rodrigues problem, Banach algebras, affine group, Gibbs semigroup, relator spaces, sparse data representation, Meier-Keeler sequential contractions, hybrid contractions, and polynomial equations. Some of the works published within this volume provide as well guidelines for further research and proposals for new directions and open problems.




Machine Learning for Data Science Handbook


Book Description

This book organizes key concepts, theories, standards, methodologies, trends, challenges and applications of data mining and knowledge discovery in databases. It first surveys, then provides comprehensive yet concise algorithmic descriptions of methods, including classic methods plus the extensions and novel methods developed recently. It also gives in-depth descriptions of data mining applications in various interdisciplinary industries.




Machine Learning and Knowledge Discovery in Databases


Book Description

The 5-volume proceedings, LNAI 12457 until 12461 constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2020, which was held during September 14-18, 2020. The conference was planned to take place in Ghent, Belgium, but had to change to an online format due to the COVID-19 pandemic. The 232 full papers and 10 demo papers presented in this volume were carefully reviewed and selected for inclusion in the proceedings. The volumes are organized in topical sections as follows: Part I: Pattern Mining; clustering; privacy and fairness; (social) network analysis and computational social science; dimensionality reduction and autoencoders; domain adaptation; sketching, sampling, and binary projections; graphical models and causality; (spatio-) temporal data and recurrent neural networks; collaborative filtering and matrix completion. Part II: deep learning optimization and theory; active learning; adversarial learning; federated learning; Kernel methods and online learning; partial label learning; reinforcement learning; transfer and multi-task learning; Bayesian optimization and few-shot learning. Part III: Combinatorial optimization; large-scale optimization and differential privacy; boosting and ensemble methods; Bayesian methods; architecture of neural networks; graph neural networks; Gaussian processes; computer vision and image processing; natural language processing; bioinformatics. Part IV: applied data science: recommendation; applied data science: anomaly detection; applied data science: Web mining; applied data science: transportation; applied data science: activity recognition; applied data science: hardware and manufacturing; applied data science: spatiotemporal data. Part V: applied data science: social good; applied data science: healthcare; applied data science: e-commerce and finance; applied data science: computational social science; applied data science: sports; demo track.




Introduction to Graph Signal Processing


Book Description

An intuitive, accessible text explaining the fundamentals and applications of signal processing on graphs. It covers basic and advanced topics, includes numerous exercises and Matlab examples, and is accompanied online by a solutions manual for instructors, making it essential reading for graduate students, researchers, and industry professionals.




Medical Image Computing and Computer Assisted Intervention – MICCAI 2023


Book Description

The ten-volume set LNCS 14220, 14221, 14222, 14223, 14224, 14225, 14226, 14227, 14228, and 14229 constitutes the refereed proceedings of the 26th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2023, which was held in Vancouver, Canada, in October 2023. The 730 revised full papers presented were carefully reviewed and selected from a total of 2250 submissions. The papers are organized in the following topical sections: Part I: Machine learning with limited supervision and machine learning – transfer learning; Part II: Machine learning – learning strategies; machine learning – explainability, bias, and uncertainty; Part III: Machine learning – explainability, bias and uncertainty; image segmentation; Part IV: Image segmentation; Part V: Computer-aided diagnosis; Part VI: Computer-aided diagnosis; computational pathology; Part VII: Clinical applications – abdomen; clinical applications – breast; clinical applications – cardiac; clinical applications – dermatology; clinical applications – fetal imaging; clinical applications – lung; clinical applications – musculoskeletal; clinical applications – oncology; clinical applications – ophthalmology; clinical applications – vascular; Part VIII: Clinical applications – neuroimaging; microscopy; Part IX: Image-guided intervention, surgical planning, and data science; Part X: Image reconstruction and image registration.




Machine Learning in VLSI Computer-Aided Design


Book Description

This book provides readers with an up-to-date account of the use of machine learning frameworks, methodologies, algorithms and techniques in the context of computer-aided design (CAD) for very-large-scale integrated circuits (VLSI). Coverage includes the various machine learning methods used in lithography, physical design, yield prediction, post-silicon performance analysis, reliability and failure analysis, power and thermal analysis, analog design, logic synthesis, verification, and neuromorphic design. Provides up-to-date information on machine learning in VLSI CAD for device modeling, layout verifications, yield prediction, post-silicon validation, and reliability; Discusses the use of machine learning techniques in the context of analog and digital synthesis; Demonstrates how to formulate VLSI CAD objectives as machine learning problems and provides a comprehensive treatment of their efficient solutions; Discusses the tradeoff between the cost of collecting data and prediction accuracy and provides a methodology for using prior data to reduce cost of data collection in the design, testing and validation of both analog and digital VLSI designs. From the Foreword As the semiconductor industry embraces the rising swell of cognitive systems and edge intelligence, this book could serve as a harbinger and example of the osmosis that will exist between our cognitive structures and methods, on the one hand, and the hardware architectures and technologies that will support them, on the other....As we transition from the computing era to the cognitive one, it behooves us to remember the success story of VLSI CAD and to earnestly seek the help of the invisible hand so that our future cognitive systems are used to design more powerful cognitive systems. This book is very much aligned with this on-going transition from computing to cognition, and it is with deep pleasure that I recommend it to all those who are actively engaged in this exciting transformation. Dr. Ruchir Puri, IBM Fellow, IBM Watson CTO & Chief Architect, IBM T. J. Watson Research Center