Optimization with Sparsity-Inducing Penalties


Book Description

Sparse estimation methods aim to use or obtain parsimonious representations of data or models. They were first dedicated to linear variable selection, but numerous extensions have since emerged, such as structured sparsity and kernel selection. Many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate nonsmooth norms. Optimization with Sparsity-Inducing Penalties presents optimization tools and techniques dedicated to such sparsity-inducing penalties from a general perspective. It covers proximal methods, block-coordinate descent, reweighted ℓ2-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provides an extensive set of experiments comparing various algorithms from a computational point of view. The presentation of Optimization with Sparsity-Inducing Penalties is essentially based on the existing literature, but the process of constructing a general framework leads naturally to new results, connections, and points of view. It is an ideal reference on the topic for anyone working in machine learning and related areas.
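To make one of the listed techniques concrete, here is a minimal sketch of coordinate descent (block-coordinate descent with one-variable blocks) for the Lasso, the basic ℓ1-regularized problem: minimize 0.5*||y - Xw||^2 + lam*||w||_1 over w. It is plain Python and assumes a dense design matrix stored as a list of rows; the names `soft_threshold` and `lasso_cd` are hypothetical illustrations, not code from the book.

```python
from typing import List, Sequence


def soft_threshold(z: float, t: float) -> float:
    """Scalar soft-thresholding: the proximal operator of t*|.|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float],
             lam: float, n_iter: int = 100) -> List[float]:
    """Cyclic coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1.

    X is a list of n rows with p entries each; y has n entries.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    r = list(y)  # residual y - Xw, kept up to date after every coordinate move
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the fit
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (r[i] + X[i][j] * w[j]) for i in range(n))
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                w[j] = new_wj
    return w
```

Each coordinate update is a closed-form soft-thresholding step, so a coefficient whose correlation with the partial residual stays below lam remains exactly zero; this is where the sparsity of the solution comes from.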




Learning with Submodular Functions


Book Description

Submodular functions are relevant to machine learning for at least two reasons: (1) some problems may be expressed directly as the optimization of submodular functions, and (2) the Lovász extension of submodular functions provides a useful set of regularization functions for supervised and unsupervised learning. In this monograph, we present the theory of submodular functions from a convex analysis perspective, presenting tight links between certain polyhedra, combinatorial optimization and convex optimization problems. In particular, we show how submodular function minimization is equivalent to solving a wide variety of convex optimization problems. This allows the derivation of new efficient algorithms for approximate and exact submodular function minimization with theoretical guarantees and good practical performance. By listing many examples of submodular functions, we review various applications to machine learning, such as clustering, experimental design, sensor placement, graphical model structure learning or subset selection, as well as a family of structured sparsity-inducing norms that can be derived from submodular functions and used as regularizers.
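The Lovász extension mentioned above can be evaluated with the classical greedy formula: sort the coordinates of w in decreasing order and weight each marginal gain of F along the resulting chain of sets by the corresponding coordinate. Below is a minimal Python sketch, assuming F is supplied as a callable on frozensets of indices with F(∅) = 0; `lovasz_extension` is a hypothetical name. With F(A) = |A| it returns the sum of the coordinates, and with F(A) = min(|A|, 1) it returns their maximum, so applying it to |w| recovers the ℓ1 and ℓ∞ norms respectively, two simple instances of norms derived from submodular functions.

```python
from typing import Callable, FrozenSet, Sequence

SetFunction = Callable[[FrozenSet[int]], float]


def lovasz_extension(F: SetFunction, w: Sequence[float]) -> float:
    """Evaluate the Lovász extension f(w) of a set function F with F(empty) = 0.

    Greedy formula: order indices so that w[j1] >= w[j2] >= ... >= w[jp], then
    f(w) = sum_k w[jk] * (F({j1, ..., jk}) - F({j1, ..., j(k-1)})).
    """
    order = sorted(range(len(w)), key=lambda j: w[j], reverse=True)
    value = 0.0
    prefix = set()
    prev = F(frozenset())
    for j in order:
        prefix.add(j)
        cur = F(frozenset(prefix))
        value += w[j] * (cur - prev)  # marginal gain of adding j, weighted by w[j]
        prev = cur
    return value


if __name__ == "__main__":
    w = [1.0, -2.0, 3.0]
    print(lovasz_extension(lambda A: len(A), w))          # 2.0 (sum of w)
    print(lovasz_extension(lambda A: min(len(A), 1), w))  # 3.0 (max of w)
```

Ties in w can be broken in any order without changing the value, and on a 0/1 indicator vector of a set S the extension returns F(S), which is what makes it an extension of F.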




Proximal Algorithms


Book Description

Proximal Algorithms discusses proximal operators and proximal algorithms, and illustrates their applicability to standard and distributed convex optimization in general and many applications of recent interest in particular. Much like Newton's method is a standard tool for solving unconstrained smooth optimization problems of modest size, proximal algorithms can be viewed as an analogous tool for nonsmooth, constrained, large-scale, or distributed versions of these problems. They are very generally applicable, but are especially well-suited to problems of substantial recent interest involving large or high-dimensional datasets. Proximal methods sit at a higher level of abstraction than classical algorithms like Newton's method: the base operation is evaluating the proximal operator of a function, which itself involves solving a small convex optimization problem. These subproblems, which generalize the problem of projecting a point onto a convex set, often admit closed-form solutions or can be solved very quickly with standard or simple specialized methods. Proximal Algorithms discusses different interpretations of proximal operators and algorithms, looks at their connections to many other topics in optimization and applied mathematics, surveys some popular algorithms, and provides a large number of examples of proximal operators that commonly arise in practice.
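As an illustration of the proximal operators described above, recall that prox_{lam f}(v) = argmin over x of f(x) + ||x - v||^2 / (2*lam). Two standard cases have closed forms: for f = ||.||_1 the prox is elementwise soft-thresholding, and for f the indicator function of a convex set C (0 on C, +infinity outside) the prox is the Euclidean projection onto C, whatever the value of lam. The minimal Python sketch below assumes C is a box [lo, hi]^n so that projection reduces to clipping; the function names are hypothetical.

```python
from typing import List, Sequence


def prox_l1(v: Sequence[float], lam: float) -> List[float]:
    """prox of lam*||.||_1: shrink each entry toward zero by lam (soft-thresholding)."""
    out = []
    for x in v:
        if x > lam:
            out.append(x - lam)
        elif x < -lam:
            out.append(x + lam)
        else:
            out.append(0.0)
    return out


def prox_box(v: Sequence[float], lo: float, hi: float) -> List[float]:
    """prox of the indicator of the box [lo, hi]^n: Euclidean projection (clipping).

    No step size appears: scaling an indicator function leaves it unchanged.
    """
    return [min(max(x, lo), hi) for x in v]


if __name__ == "__main__":
    print(prox_l1([3.0, -0.5, -2.0], 1.0))       # [2.0, 0.0, -1.0]
    print(prox_box([-2.0, 0.3, 5.0], 0.0, 1.0))  # [0.0, 0.3, 1.0]
```

Both operators separate across coordinates, which is why they cost only O(n); the projection case shows concretely how proximal operators generalize projection onto a convex set.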




Sparse Modeling for Image and Vision Processing


Book Description

Sparse Modeling for Image and Vision Processing offers a self-contained view of sparse modeling for visual recognition and image processing. More specifically, it focuses on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts.




Statistical Learning with Sparsity


Book Description

Discover New Methods for Dealing with High-Dimensional Data

A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underl




Estimation and Testing Under Sparsity


Book Description

Taking the Lasso method as its starting point, this book describes the main ingredients needed to study general loss functions and sparsity-inducing regularizers. It also provides a semi-parametric approach to establishing confidence intervals and tests. Sparsity-inducing methods have proven to be very useful in the analysis of high-dimensional data. Examples include the Lasso and group Lasso methods, and the least squares method with other norm penalties, such as the nuclear norm. The illustrations provided include generalized linear models, density estimation, matrix completion and sparse principal components. Each chapter ends with a problem section. The book can be used as a textbook for a graduate or PhD course.
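For the group Lasso mentioned above, the penalty is a sum of Euclidean norms over groups of coefficients, lam * sum_g ||w_g||_2. Assuming the groups do not overlap, its proximal operator acts group by group: each block is either set entirely to zero (when its norm is at most lam) or shrunk toward zero by a common factor, which is why whole groups of variables drop out together. A minimal Python sketch follows; the name `prox_group_lasso` is hypothetical, and indices not listed in any group are left unpenalized.

```python
import math
from typing import List, Sequence


def prox_group_lasso(v: Sequence[float], groups: Sequence[Sequence[int]],
                     lam: float) -> List[float]:
    """prox of lam * sum_g ||v_g||_2 for non-overlapping groups (block soft-thresholding)."""
    out = list(v)  # coordinates outside every group pass through unchanged
    for g in groups:
        norm = math.sqrt(sum(v[i] ** 2 for i in g))
        # Zero the whole block if its norm is at most lam, otherwise shrink it radially.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        for i in g:
            out[i] = scale * v[i]
    return out


if __name__ == "__main__":
    # First group has norm 5 and is scaled by 0.8; second group has norm < 1 and is zeroed.
    print(prox_group_lasso([3.0, 4.0, 0.1, 0.2], [[0, 1], [2, 3]], 1.0))
```

With every group of size one this reduces to ordinary soft-thresholding, so the Lasso is the special case of singleton groups.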




Convex Optimization Algorithms


Book Description

This book provides a comprehensive and accessible presentation of algorithms for solving convex optimization problems. It relies on rigorous mathematical analysis, but also aims at an intuitive exposition that makes use of visualization where possible. This is facilitated by the extensive use of analytical and algorithmic concepts of duality, which by nature lend themselves to geometrical interpretation. The book places particular emphasis on modern developments and their widespread applications in fields such as large-scale resource allocation problems, signal processing, and machine learning. The book is aimed at students, researchers, and practitioners, roughly at the first-year graduate level. It is similar in style to the author's 2009 "Convex Optimization Theory" book, but can be read independently. The latter book focuses on convexity theory and optimization duality, while the present book focuses on algorithmic issues. The two books share notation, and together cover the entire finite-dimensional convex optimization methodology. To facilitate readability, the statements of definitions and results of the "theory book" are reproduced without proofs in Appendix B.




Convex Optimization


Book Description

This monograph presents the main complexity theorems in convex optimization and their corresponding algorithms. It begins with the fundamental theory of black-box optimization and proceeds to guide the reader through recent advances in structural optimization and stochastic optimization. The presentation of black-box optimization, strongly influenced by the seminal book by Nesterov, includes the analysis of cutting plane methods, as well as (accelerated) gradient descent schemes. Special attention is also given to non-Euclidean settings (relevant algorithms include Frank-Wolfe, mirror descent, and dual averaging), with a discussion of their relevance in machine learning. The text provides a gentle introduction to structural optimization with FISTA (to optimize a sum of a smooth and a simple non-smooth term), saddle-point mirror prox (Nemirovski's alternative to Nesterov's smoothing), and a concise description of interior point methods. In stochastic optimization it discusses stochastic gradient descent, mini-batches, random coordinate descent, and sublinear algorithms. It also briefly touches upon convex relaxation of combinatorial problems and the use of randomness to round solutions, as well as random-walk-based methods.
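FISTA, as summarized above, alternates a gradient step on the smooth term with a proximal step on the simple non-smooth term, and adds a momentum (extrapolation) step. The sketch below is a minimal pure-Python version with a constant step 1/L, assuming the gradient of the smooth term is L-Lipschitz and that the caller supplies the prox of the non-smooth term; the function names and the small Lasso instance at the end are hypothetical, for illustration only.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def fista(grad_f: Callable[[Vector], Vector],
          prox_g: Callable[[Vector, float], Vector],
          L: float, x0: Sequence[float], n_iter: int = 500) -> Vector:
    """FISTA for min_x f(x) + g(x), with f convex and grad_f L-Lipschitz.

    prox_g(v, step) must return argmin_x g(x) + ||x - v||^2 / (2 * step).
    """
    step = 1.0 / L
    x_prev = list(x0)
    y = list(x0)
    t = 1.0
    for _ in range(n_iter):
        # Proximal gradient step taken from the extrapolated point y.
        grad = grad_f(y)
        x = prox_g([yi - step * gi for yi, gi in zip(y, grad)], step)
        # Momentum update of Beck and Teboulle.
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev, t = x, t_next
    return x_prev


# Hypothetical Lasso instance: f(w) = 0.5*||Xw - y||^2, g(w) = lam*||w||_1.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_obs = [1.0, 2.0, 3.0]
lam = 0.5


def lasso_grad(w: Vector) -> Vector:
    """Gradient X^T (Xw - y) of the least-squares term."""
    r = [sum(a * b for a, b in zip(row, w)) - yi for row, yi in zip(X, y_obs)]
    return [sum(X[i][j] * r[i] for i in range(len(X))) for j in range(len(w))]


def l1_prox(v: Vector, step: float) -> Vector:
    """Soft-thresholding: prox of step*lam*||.||_1."""
    return [math.copysign(max(abs(vi) - lam * step, 0.0), vi) for vi in v]


# The squared Frobenius norm of X upper-bounds the largest eigenvalue of X^T X.
L_bound = sum(a * a for row in X for a in row)
w_hat = fista(lasso_grad, l1_prox, L_bound, [0.0, 0.0], n_iter=2000)
```

For this small instance both coordinates of the minimizer are positive, so the optimality conditions reduce to X^T X w = X^T y - lam*(1, 1), giving w = (5/6, 11/6); w_hat should approach this point, with the objective gap shrinking at the O(1/k^2) rate that FISTA guarantees.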