From safe screening rules to working sets for faster Lasso-type solvers

Convex sparsity promoting regularizations are now ubiquitous to regularize inverse problems in statistics, in signal processing and in machine learning. By construction, they yield solutions with few non-zero coefficients. This point is particularly appealing for Working Set (WS) strategies, an optimization technique that solves simpler problems by handling small subsets of variables, whose indices form the WS.Such methods involve two nested iterations: the outer loop corresponds to the definition of the WS and the inner loop calls a solver for the subproblems. This is well suited for Lasso-type estimators where a WS is a set of features, while for a Group Lasso with block sparsity it consists of a set of groups. It has recently been shown that WS reaches state-of-the-art performance for Lasso-type solvers, e.g., “Blitz” by Johnson and Guestrin 2015.
In practice, WS are generally small so the associated feature Gram matrix can fit in memory. Hence, greedy methods (Gauss-Southwell) for block coordinate descent techniques can be applied to for additional speed-ups.
We combine this strategy with an aggressive use of the so-called Gap Safe screening rules, and we illustrate practical benefits on M/EEG inverse problems.
(This is joint work with A. Gramfort and M. Massias.)