Understanding High-Dimensional Spaces


Book Description

High-dimensional spaces arise as a way of modelling datasets with many attributes. Such a dataset can be directly represented in a space spanned by its attributes, with each record represented as a point whose position depends on its attribute values. Such spaces are not easy to work with because of their high dimensionality: our intuition about space is not reliable, and measures such as distance are less informative than we might expect. There are three main areas where large, complex, high-dimensional datasets arise naturally: data collected by online retailers, preference sites, social media sites, and customer relationship databases, where there are large but sparse records available for each individual; data derived from text and speech, where the attributes are words and so the corresponding datasets are wide and sparse; and data collected for security, defense, law enforcement, and intelligence purposes, where the datasets are large and wide. Such datasets are usually understood either by finding the set of clusters they contain or by looking for the outliers, but these strategies conceal subtleties that are often ignored. In this book the author suggests new ways of thinking about high-dimensional spaces using two models: a skeleton that relates the clusters to one another, and boundaries in the empty space between clusters that provide new perspectives on outliers and on outlying regions. The book will be of value to practitioners, graduate students and researchers.




Understanding High-Dimensional Spaces


Book Description

This book proposes new ways of thinking about high-dimensional spaces using two models: the skeleton that relates the clusters to one another, and the boundaries in empty space that provide new perspectives on outliers and on outlying regions.




Database Theory - ICDT 2001


Book Description

This book constitutes the refereed proceedings of the 8th International Conference on Database Theory, ICDT 2001, held in London, UK, in January 2001. The 26 revised full papers presented together with two invited papers were carefully reviewed and selected from 75 submissions. All current issues in database theory and the foundations of database systems are addressed. Among the topics covered are database queries, SQL, information retrieval, database logic, database mining, constraint databases, transactions, algorithmic aspects, semi-structured data, data engineering, XML, term rewriting, and clustering.




High-Dimensional Probability


Book Description

An integrated package of powerful probabilistic tools and key applications in modern mathematical data science.




Introduction to High-Dimensional Statistics


Book Description

Praise for the first edition: "[This book] succeeds singularly at providing a structured introduction to this active field of research. ... it is arguably the most accessible overview yet published of the mathematical ideas and principles that one needs to master to enter the field of high-dimensional statistics. ... recommended to anyone interested in the main results of current research in high-dimensional statistics as well as anyone interested in acquiring the core mathematical skills to enter this area of research." (Journal of the American Statistical Association)

Introduction to High-Dimensional Statistics, Second Edition preserves the philosophy of the first edition: to be a concise guide for students and researchers discovering the area and interested in the mathematics involved. The main concepts and ideas are presented in simple settings, thereby avoiding inessential technicalities. High-dimensional statistics is a fast-evolving field, and much progress has been made on a large variety of topics, providing new insights and methods.

Offering a succinct presentation of the mathematical foundations of high-dimensional statistics, this new edition:
Offers revised chapters from the previous edition, with the inclusion of much additional material on some important topics, including compressed sensing, estimation with convex constraints, the slope estimator, simultaneously low-rank and row-sparse linear regression, and aggregation of a continuous set of estimators.
Introduces three new chapters on iterative algorithms, clustering, and minimax lower bounds.
Provides enhanced appendices, mainly with the addition of the Davis-Kahan perturbation bound and of two simple versions of the Hanson-Wright concentration inequality.
Covers cutting-edge statistical methods including model selection, sparsity and the Lasso, iterative hard thresholding, aggregation, support vector machines, and learning theory.
Provides detailed exercises at the end of every chapter with collaborative solutions on a wiki site.
Illustrates concepts with simple but clear practical examples.




High-Dimensional Statistics


Book Description

A coherent introductory text from a groundbreaking researcher, focusing on clarity and motivation to build intuition and understanding.




How Surfaces Intersect in Space


Book Description

This marvelous book of pictures illustrates the fundamental concepts of geometric topology in a way that is very friendly to the reader. It will be of value to anyone who wants to understand the subject by way of examples. Undergraduates, beginning graduate students, and non-professionals will profit from reading the book and from just looking at the pictures.




Hyperspace


Book Description

Are there other dimensions beyond our own? Is time travel possible? Can we change the past? Are there gateways to parallel universes? All of us have pondered such questions, but there was a time when scientists dismissed these notions as outlandish speculations. Not any more. Today, they are the focus of the most intense scientific activity in recent memory. In Hyperspace, Michio Kaku, author of the widely acclaimed Beyond Einstein and a leading theoretical physicist, offers the first book-length tour of the most exciting (and perhaps most bizarre) work in modern physics, work which includes research on the tenth dimension, time warps, black holes, and multiple universes. The theory of hyperspace (or higher dimensional space)--and its newest wrinkle, superstring theory--stand at the center of this revolution, with adherents in every major research laboratory in the world, including several Nobel laureates. Beginning where Hawking's Brief History of Time left off, Kaku paints a vivid portrayal of the breakthroughs now rocking the physics establishment. Why all the excitement? As the author points out, for over half a century, scientists have puzzled over why the basic forces of the cosmos--gravity, electromagnetism, and the strong and weak nuclear forces--require markedly different mathematical descriptions. But if we see these forces as vibrations in a higher dimensional space, their field equations suddenly fit together like pieces in a jigsaw puzzle, perfectly snug, in an elegant, astonishingly simple form. This may thus be our leading candidate for the Theory of Everything. If so, it would be the crowning achievement of 2,000 years of scientific investigation into matter and its forces. Already, the theory has inspired several thousand research papers, and has been the focus of over 200 international conferences. Michio Kaku is one of the leading pioneers in superstring theory and has been at the forefront of this revolution in modern physics. 
With Hyperspace, he has produced a book for general readers which conveys the vitality of the field and the excitement as scientists grapple with the meaning of space and time. It is an exhilarating look at physics today and an eye-opening glimpse into the ultimate nature of the universe.




High-Dimensional Indexing


Book Description

In this monograph, we study the problem of high-dimensional indexing and systematically introduce two efficient index structures: one for range queries and the other for similarity queries. Extensive experiments and comparison studies are conducted to demonstrate the superiority of the proposed indexing methods. Many new database applications, such as multimedia databases or stock price information systems, transform important features or properties of data objects into high-dimensional points. Searching for objects based on these features is thus a search of points in this feature space. To support efficient retrieval in such high-dimensional databases, indexes are required to prune the search space. Indexes for low-dimensional databases are well studied, whereas most existing application-specific indexes do not scale with the number of dimensions, and they are not designed to support similarity searches and high-dimensional joins.




Foundations of Data Science


Book Description

This book provides an introduction to the mathematical and algorithmic foundations of data science, including machine learning, high-dimensional geometry, and analysis of large networks. Topics include the counterintuitive nature of data in high dimensions, important linear algebraic techniques such as singular value decomposition, the theory of random walks and Markov chains, the fundamentals of and important algorithms for machine learning, algorithms and analysis for clustering, probabilistic models for large networks, representation learning including topic modelling and non-negative matrix factorization, wavelets and compressed sensing. Important probabilistic techniques are developed including the law of large numbers, tail inequalities, analysis of random projections, generalization guarantees in machine learning, and moment methods for analysis of phase transitions in large random graphs. Additionally, important structural and complexity measures are discussed such as matrix norms and VC-dimension. This book is suitable for both undergraduate and graduate courses in the design and analysis of algorithms for data.