Kernel Based Algorithms for Mining Huge Data Sets


Book Description

This is the first book treating the fields of supervised, semi-supervised and unsupervised machine learning collectively. The book presents both the theory and the algorithms for mining huge data sets using support vector machines (SVMs) in an iterative way. It demonstrates how kernel based SVMs can be used for dimensionality reduction and shows the similarities and differences between the two most popular unsupervised techniques.




Mining of Massive Datasets


Book Description

Now in its second edition, this book focuses on practical algorithms for mining data from even the largest datasets.




Kernel Methods for Pattern Analysis


Book Description

Publisher Description




Support Vector Machines and Perceptrons


Book Description

This work reviews the state of the art in SVM and perceptron classifiers. A Support Vector Machine (SVM) is easily the most popular tool for dealing with a variety of machine-learning tasks, including classification. SVMs are associated with maximizing the margin between two classes. The concerned optimization problem is a convex optimization guaranteeing a globally optimal solution. The weight vector associated with SVM is obtained by a linear combination of some of the boundary and noisy vectors. Further, when the data are not linearly separable, tuning the coefficient of the regularization term becomes crucial. Even though SVMs have popularized the kernel trick, in most of the practical applications that are high-dimensional, linear SVMs are popularly used. The text examines applications to social and information networks. The work also discusses another popular linear classifier, the perceptron, and compares its performance with that of the SVM in different application areas.>




Mining Massive Data Sets for Security


Book Description

The real power for security applications will come from the synergy of academic and commercial research focusing on the specific issue of security. This book is suitable for those interested in understanding the techniques for handling very large data sets and how to apply them in conjunction for solving security issues.




Large-scale Kernel Machines


Book Description

Solutions for learning from large scale datasets, including kernel learning algorithms that scale linearly with the volume of the data and experiments carried out on realistically large datasets. Pervasive and networked computers have dramatically reduced the cost of collecting and distributing large datasets. In this context, machine learning algorithms that scale poorly could simply become irrelevant. We need learning algorithms that scale linearly with the volume of the data while maintaining enough statistical efficiency to outperform algorithms that simply process a random subset of the data. This volume offers researchers and engineers practical solutions for learning from large scale datasets, with detailed descriptions of algorithms and experiments carried out on realistically large datasets. At the same time it offers researchers information that can address the relative lack of theoretical grounding for many useful algorithms. After a detailed description of state-of-the-art support vector machine technology, an introduction of the essential concepts discussed in the volume, and a comparison of primal and dual optimization techniques, the book progresses from well-understood techniques to more novel and controversial approaches. Many contributors have made their code and data available online for further experimentation. Topics covered include fast implementations of known algorithms, approximations that are amenable to theoretical guarantees, and algorithms that perform well in practice but are difficult to analyze theoretically. Contributors Léon Bottou, Yoshua Bengio, Stéphane Canu, Eric Cosatto, Olivier Chapelle, Ronan Collobert, Dennis DeCoste, Ramani Duraiswami, Igor Durdanovic, Hans-Peter Graf, Arthur Gretton, Patrick Haffner, Stefanie Jegelka, Stephan Kanthak, S. Sathiya Keerthi, Yann LeCun, Chih-Jen Lin, Gaëlle Loosli, Joaquin Quiñonero-Candela, Carl Edward Rasmussen, Gunnar Rätsch, Vikas Chandrakant Raykar, Konrad Rieck, Vikas Sindhwani, Fabian Sinz, Sören Sonnenburg, Jason Weston, Christopher K. I. Williams, Elad Yom-Tov




Learning Kernel Classifiers


Book Description

An overview of the theory and application of kernel classification methods. Linear classifiers in kernel spaces have emerged as a major topic within the field of machine learning. The kernel technique takes the linear classifier—a limited, but well-established and comprehensively studied model—and extends its applicability to a wide range of nonlinear pattern-recognition tasks such as natural language processing, machine vision, and biological sequence analysis. This book provides the first comprehensive overview of both the theory and algorithms of kernel classifiers, including the most recent developments. It begins by describing the major algorithmic advances: kernel perceptron learning, kernel Fisher discriminants, support vector machines, relevance vector machines, Gaussian processes, and Bayes point machines. Then follows a detailed introduction to learning theory, including VC and PAC-Bayesian theory, data-dependent structural risk minimization, and compression bounds. Throughout, the book emphasizes the interaction between theory and algorithms: how learning algorithms work and why. The book includes many examples, complete pseudo code of the algorithms presented, and an extensive source code library.




Principles of Data Mining


Book Description

The first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics. The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This is the first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics. The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing.




The Top Ten Algorithms in Data Mining


Book Description

Identifying some of the most influential algorithms that are widely used in the data mining community, The Top Ten Algorithms in Data Mining provides a description of each algorithm, discusses its impact, and reviews current and future research. Thoroughly evaluated by independent reviewers, each chapter focuses on a particular algorithm and is wri