Nearest Neighbor Search:


Book Description

Modern applications are both data and computationally intensive and require the storage and manipulation of voluminous traditional (alphanumeric) and nontraditional data sets (images, text, geometric objects, time-series). Examples of such emerging application domains are: Geographical Information Systems (GIS), Multimedia Information Systems, CAD/CAM, Time-Series Analysis, Medical Information Sstems, On-Line Analytical Processing (OLAP), and Data Mining. These applications pose diverse requirements with respect to the information and the operations that need to be supported. From the database perspective, new techniques and tools therefore need to be developed towards increased processing efficiency. This monograph explores the way spatial database management systems aim at supporting queries that involve the space characteristics of the underlying data, and discusses query processing techniques for nearest neighbor queries. It provides both basic concepts and state-of-the-art results in spatial databases and parallel processing research, and studies numerous applications of nearest neighbor queries.




Dimensionality Reduction with Unsupervised Nearest Neighbors


Book Description

This book is devoted to a novel approach for dimensionality reduction based on the famous nearest neighbor method that is a powerful classification and regression approach. It starts with an introduction to machine learning concepts and a real-world application from the energy domain. Then, unsupervised nearest neighbors (UNN) is introduced as efficient iterative method for dimensionality reduction. Various UNN models are developed step by step, reaching from a simple iterative strategy for discrete latent spaces to a stochastic kernel-based algorithm for learning submanifolds with independent parameterizations. Extensions that allow the embedding of incomplete and noisy patterns are introduced. Various optimization approaches are compared, from evolutionary to swarm-based heuristics. Experimental comparisons to related methodologies taking into account artificial test data sets and also real-world data demonstrate the behavior of UNN in practical scenarios. The book contains numerous color figures to illustrate the introduced concepts and to highlight the experimental results.




Lectures on the Nearest Neighbor Method


Book Description

This text presents a wide-ranging and rigorous overview of nearest neighbor methods, one of the most important paradigms in machine learning. Now in one self-contained volume, this book systematically covers key statistical, probabilistic, combinatorial and geometric ideas for understanding, analyzing and developing nearest neighbor methods. Gérard Biau is a professor at Université Pierre et Marie Curie (Paris). Luc Devroye is a professor at the School of Computer Science at McGill University (Montreal).




Nearest-neighbor Methods in Learning and Vision


Book Description

This text presents theoretical and practical discussions of nearest neighbour (NN) methods in machine learning and examines computer vision as an application domain in which the benefit of these advanced methods is often dramatic.




Proceedings Of The International Congress Of Mathematicians 2018 (Icm 2018) (In 4 Volumes)


Book Description

The Proceedings of the ICM publishes the talks, by invited speakers, at the conference organized by the International Mathematical Union every 4 years. It covers several areas of Mathematics and it includes the Fields Medal and Nevanlinna, Gauss and Leelavati Prizes and the Chern Medal laudatios.




Data Algorithms


Book Description

If you are ready to dive into the MapReduce framework for processing large datasets, this practical book takes you step by step through the algorithms and tools you need to build distributed MapReduce applications with Apache Hadoop or Apache Spark. Each chapter provides a recipe for solving a massive computational problem, such as building a recommendation system. You’ll learn how to implement the appropriate MapReduce solution with code that you can use in your projects. Dr. Mahmoud Parsian covers basic design patterns, optimization techniques, and data mining and machine learning solutions for problems in bioinformatics, genomics, statistics, and social network analysis. This book also includes an overview of MapReduce, Hadoop, and Spark. Topics include: Market basket analysis for a large set of transactions Data mining algorithms (K-means, KNN, and Naive Bayes) Using huge genomic data to sequence DNA and RNA Naive Bayes theorem and Markov chains for data and market prediction Recommendation algorithms and pairwise document similarity Linear regression, Cox regression, and Pearson correlation Allelic frequency and mining DNA Social network analysis (recommendation systems, counting triangles, sentiment analysis)







Algorithms and Data Structures


Book Description

This book constitutes the refereed proceedings of the 10th International Workshop on Algorithms and Data Structures, WADS 2007, held in Halifax, Canada, in August 2007. The papers present original research on the theory and application of algorithms and data structures in all areas, including combinatorics, computational geometry, databases, graphics, parallel and distributed computing.




Advanced Data Mining and Applications


Book Description

With the ever-growing power of generating, transmitting, and collecting huge amounts of data, information overloadis nowan imminent problemto mankind. The overwhelming demand for information processing is not just about a better understanding of data, but also a better usage of data in a timely fashion. Data mining, or knowledge discovery from databases, is proposed to gain insight into aspects ofdata and to help peoplemakeinformed,sensible,and better decisions. At present, growing attention has been paid to the study, development, and application of data mining. As a result there is an urgent need for sophisticated techniques and toolsthat can handle new ?elds of data mining, e. g. , spatialdata mining, biomedical data mining, and mining on high-speed and time-variant data streams. The knowledge of data mining should also be expanded to new applications. The 6th International Conference on Advanced Data Mining and Appli- tions(ADMA2010)aimedtobringtogethertheexpertsondataminingthrou- out the world. It provided a leading international forum for the dissemination of original research results in advanced data mining techniques, applications, al- rithms, software and systems, and di?erent applied disciplines. The conference attracted 361 online submissions from 34 di?erent countries and areas. All full papers were peer reviewed by at least three members of the Program Comm- tee composed of international experts in data mining ?elds. A total number of 118 papers were accepted for the conference. Amongst them, 63 papers were selected as regular papers and 55 papers were selected as short papers.




Fundamentals of Database Indexing and Searching


Book Description

Fundamentals of Database Indexing and Searching presents well-known database searching and indexing techniques. It focuses on similarity search queries, showing how to use distance functions to measure the notion of dissimilarity. After defining database queries and similarity search queries, the book organizes the most common and representative index structures according to their characteristics. The author first describes low-dimensional index structures, memory-based index structures, and hierarchical disk-based index structures. He then outlines useful distance measures and index structures that use the distance information to efficiently solve similarity search queries. Focusing on the difficult dimensionality phenomenon, he also presents several indexing methods that specifically deal with high-dimensional spaces. In addition, the book covers data reduction techniques, including embedding, various data transforms, and histograms. Through numerous real-world examples, this book explores how to effectively index and search for information in large collections of data. Requiring only a basic computer science background, it is accessible to practitioners and advanced undergraduate students.