
An active-set algorithmic framework for non-convex optimization problems over the simplex

Authors: Andrea Cristofari, Marianna De Santis, Stefano Lucidi, Francesco Rinaldi

Abstract

In this paper, we describe a new active-set algorithmic framework for minimizing a non-convex function over the unit simplex. At each iteration, the method makes use of a rule for identifying active variables (i.e., variables that are zero at a stationary point) and specific directions (that we name active-set gradient related directions) satisfying a new “nonorthogonality” type of condition. We prove global convergence to stationary points when using an Armijo line search in the given framework. We further describe three different examples of active-set gradient related directions that guarantee linear convergence rate (under suitable assumptions). Finally, we report numerical experiments showing the effectiveness of the approach.
Computational Optimization and Applications (2020) 77:57–89
https://doi.org/10.1007/s10589-020-00195-x
AndreaCristofari1 · MariannaDeSantis2· StefanoLucidi2· FrancescoRinaldi1
Received: 16 February 2019 / Published online: 16 May 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Keywords: Active-set methods · Unit simplex · Non-convex optimization · Large-scale optimization
Mathematics Subject Classification: 65K05 · 90C06 · 90C30
Corresponding author: Andrea Cristofari, andrea.cristofari@unipd.it
Marianna De Santis, mdesantis@diag.uniroma1.it
Stefano Lucidi, lucidi@diag.uniroma1.it
Francesco Rinaldi, rinaldi@math.unipd.it
1 Dipartimento di Matematica “Tullio Levi-Civita”, Università di Padova, Padua, Italy
2 Dipartimento di Ingegneria Informatica, Automatica e Gestionale, Sapienza Università di Roma, Rome, Italy
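To make the framework described in the abstract more concrete, here is a minimal Python sketch of an iteration of this kind over the unit simplex: estimate the variables that look active (zero), set them to zero, and take an Armijo step along a gradient-related feasible direction. This is only an illustration under simplifying assumptions, not the authors' algorithm: the multiplier-based active-set test, the use of a plain projected-gradient direction, and all constants are choices made here for the sketch.

import numpy as np

def proj_simplex(v):
    # Euclidean projection onto the unit simplex {x >= 0, sum(x) = 1} (sort-based method).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    j = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1.0) / j > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def guess_active(x, g, eps=1e-6):
    # Illustrative multiplier-based guess of the variables that are zero at a stationary
    # point: g @ x estimates the multiplier of the equality constraint. The exact rule
    # and constants used in the paper may differ.
    lam = g @ x
    return x <= eps * (g - lam)

def armijo(f, x, d, slope, gamma=1e-4, delta=0.5):
    # Backtracking Armijo line search along a feasible descent direction d (slope = g @ d < 0).
    alpha, fx = 1.0, f(x)
    while f(x + alpha * d) > fx + gamma * alpha * slope:
        alpha *= delta
    return alpha

def active_set_sketch(f, grad, x0, max_iter=500, tol=1e-8):
    x = proj_simplex(np.asarray(x0, dtype=float))
    for _ in range(max_iter):
        g = grad(x)
        x[guess_active(x, g)] = 0.0      # zero the estimated-active variables ...
        x = proj_simplex(x)              # ... and restore feasibility
        g = grad(x)
        d = proj_simplex(x - g) - x      # a simple gradient-related feasible direction
        slope = g @ d
        if abs(slope) < tol:             # (near-)stationary point
            break
        x = x + armijo(f, x, d, slope) * d
    return x

if __name__ == "__main__":
    # Toy non-convex instance: minimize x' Q x over the simplex with Q symmetric indefinite.
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((20, 20)); Q = (Q + Q.T) / 2
    f = lambda x: x @ Q @ x
    grad = lambda x: 2 * Q @ x
    x = active_set_sketch(f, grad, np.full(20, 1 / 20))
    print("f(x) =", f(x), " support size =", int((x > 1e-8).sum()))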
... This would indeed guarantee a significant speed-up of the optimization process. A number of active-set strategies for structured feasible sets are available in the literature (see, e.g., [3,4,7,9,10,13,18,19,22–24,28] and references therein), but none of those directly handles the ℓ1-ball. ...
... In this paper, inspired by the work carried out in [10], we propose a tailored active-set strategy for problem (1) and embed it into a first-order projection-based algorithm. At each iteration, the method first sets to zero the variables that are guessed to be zero at the final solution. ...
... (i) For any feasible point x of problem (1), by (7) we can compute a feasible point y of problem (6) such that … (ii) According to (8), for every feasible point x of problem (1) we have that … Thus, it is natural to estimate a variable x_i as active at x* if both y_i and y_{n+i} are estimated to be zero at the point corresponding to x* in the y space. To estimate the zero variables among y_1, …, y_{2n+1} we use the active-set estimate described in [10], specifically devised for minimization problems over the unit simplex. (iii) Then, we are able to go back to the original x space and obtain an active-set estimate for problem (1) without explicitly considering the variables y_1, …, y_{2n+1} of the reformulated problem. ...
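The excerpts above refer to equations (6)–(8) of the citing paper, which are not shown in this preview. For orientation, the standard lifting behind this kind of reformulation (presumably what is being referenced; the exact statement, numbering and scaling are assumptions here) maps the ℓ1-ball of radius τ onto a scaled unit simplex in R^{2n+1} by splitting each variable into positive and negative parts and adding one slack variable:

x_i = y_i − y_{n+i} for i = 1, …, n,    y ≥ 0,    y_1 + … + y_{2n+1} = τ,

where the slack y_{2n+1} absorbs τ − ‖x‖_1. Under this lifting, x_i can only be nonzero if y_i or y_{n+i} is nonzero, which is exactly why point (ii) above estimates x_i as active when both y_i and y_{n+i} are estimated to be zero.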
Article
Full-text available
The ℓ1-ball is a nicely structured feasible set that is widely used in many fields (e.g., machine learning, statistics and signal analysis) to enforce some sparsity in the model solutions. In this paper, we devise an active-set strategy for efficiently dealing with minimization problems over the ℓ1-ball and embed it into a tailored algorithmic scheme that makes use of a non-monotone first-order approach to explore the given subspace at each iteration. We prove global convergence to stationary points. Finally, we report numerical experiments, on two different classes of instances, showing the effectiveness of the algorithm.
... In the literature, much effort has been devoted to proving identification properties of some algorithms for smooth optimization [3,5,6,7,8,9,10,11,19,21,24,46,48], non-smooth optimization [16,23,25,29,31,35,41,42,47,49], stochastic optimization [18,28,45] and derivative-free optimization [30]. Moreover, a wide class of methods, known as active-set methods, has been the object of extensive study for decades (see, e.g., [4,13,14,17,20,22] and the references therein), making use of specific techniques to identify the so-called active set, which is the set of constraints or variables that parametrizes a surface containing a solution. ...
... where τ ∈ (0, 1] is the parameter used to choose j(k), satisfying (14). Then, for all k ≥ k_j we have that j(k) ∉ Z(x*). ...
... contradicting (14). ...
Preprint
In this paper, finite active-set identification is established for an almost cyclic 2-coordinate descent method applied to problems with one linear coupling constraint and simple bounds. First, general active-set identification results are stated for non-convex objective functions. Then, under strong convexity, complexity results are given on the number of iterations required to identify the active set. In our analysis, a simple Armijo line search is used to compute the stepsize, thus not requiring exact minimizations or additional information.
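As a rough illustration of the kind of method analyzed in this preprint, the sketch below performs one 2-coordinate move for a problem with a single coupling constraint sum(x) = const and bounds l <= x <= u: the two selected coordinates are changed by opposite amounts so the constraint stays satisfied, and the stepsize comes from a simple Armijo backtracking. The index selection, the Armijo constants and the helper names are assumptions for illustration, not the paper's exact almost-cyclic rule.

def two_coordinate_step(f, g, x, i, j, l, u, gamma=1e-4, delta=0.5):
    # One 2-coordinate move preserving the coupling constraint sum(x) = const:
    # x_i is increased and x_j is decreased by the same amount t, so the sum is unchanged.
    # g is the gradient of f at x; x, l, u are NumPy-style arrays supporting .copy().
    slope = g[i] - g[j]                      # directional derivative along e_i - e_j
    t_max = min(u[i] - x[i], x[j] - l[j])    # largest step keeping the bound constraints
    if slope >= 0 or t_max <= 0:
        return x                             # no feasible descent along this pair
    t, fx = t_max, f(x)
    while t > 1e-16:                         # Armijo backtracking on the stepsize t
        x_new = x.copy()
        x_new[i] += t
        x_new[j] -= t
        if f(x_new) <= fx + gamma * t * slope:
            return x_new
        t *= delta
    return x

Choosing j among the coordinates with a large partial derivative and room to decrease, and i among those with a small partial derivative and room to increase, yields a descent pair whenever the current point is not stationary.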
... This guarantee is indeed looser than for the other variants, because there is no satisfactory bound on the number of such problematic steps (the best known bound is 3N! bad steps for each good step); • it eliminates the dependence of the convergence rates on the support of the starting point (see, e.g., [40] and [10]). This dependence can significantly affect the performance of FW variants on smooth non-convex optimization problems [41]. ...
... Finally, while beyond the scope of this paper, we mention that bad steps lead to a slow active set identification for the AFW, when compared to the "one shot" identification property characterizing proximal gradient methods and active set strategies (see [41,42] and references therein). More precisely, analyses in recent works ( [20,43] and [44]) show that a number of bad steps equal to the number of "wrong" atoms is performed by the method in a sufficiently small neighborhood of a solution to identify its support. ...
Article
Full-text available
The study of Frank-Wolfe (FW) variants is often complicated by the presence of different kinds of “good” and “bad” steps. In this article, we aim to simplify the convergence analysis of specific variants by getting rid of such a distinction between steps, and to improve existing rates by ensuring a non-trivial bound at each iteration. In order to do this, we define the Short Step Chain (SSC) procedure, which skips gradient computations in consecutive short steps until proper conditions are satisfied. This algorithmic tool allows us to give a unified analysis and convergence rates in the general smooth non-convex setting, as well as a linear convergence rate under a Kurdyka-Łojasiewicz (KL) property. While the KL setting has been widely studied for proximal gradient type methods, to our knowledge, it has never been analyzed before for the Frank-Wolfe variants considered in the paper. An angle condition, ensuring that the directions selected by the methods have the steepest slope possible up to a constant, is used to carry out our analysis. We prove that such a condition is satisfied, when considering minimization problems over a polytope, by the away step Frank-Wolfe (AFW), the pairwise Frank-Wolfe (PFW), and the Frank-Wolfe method with in-face directions (FDFW).
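To make the away-step machinery mentioned in the abstract concrete, the fragment below computes, over the unit simplex, the classical Frank-Wolfe direction and the away direction, and picks the one with the steeper slope, as in AFW. This is a textbook-style sketch under the stated assumptions, not the SSC procedure itself.

import numpy as np

def afw_direction(x, g, tol=1e-12):
    # Away-step Frank-Wolfe direction selection on the unit simplex, whose vertices are
    # the coordinate vectors e_i. Returns a direction and the largest feasible stepsize.
    s = int(np.argmin(g))                        # FW vertex: smallest gradient entry
    support = np.nonzero(x > tol)[0]
    v = int(support[np.argmax(g[support])])      # away vertex: worst vertex in the support
    d_fw = -x.copy(); d_fw[s] += 1.0             # d_FW = e_s - x
    d_aw = x.copy();  d_aw[v] -= 1.0             # d_A  = x - e_v
    if g @ d_fw <= g @ d_aw:                     # pick the steeper (more negative) slope
        return d_fw, 1.0
    alpha_max = x[v] / (1.0 - x[v]) if x[v] < 1.0 else np.inf
    return d_aw, alpha_max

The pairwise direction would instead be d = e_s − e_v with maximum stepsize x_v, i.e. mass is moved directly from the away vertex to the FW vertex.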
... This follows from (10), (11) and (12). In this section we propose a decomposition algorithm, named Active-Set Zero-Sum-Lasso (AS-ZSL), to efficiently solve problem (1). ...
... In the field of constrained optimization, several active-set techniques were proposed to identify the active (or binding) constraints, see, e.g., [4,6,10,11,13,16,17,22,23,34,36]. Active-set strategies have also been successfully used to identify the zero variables in ℓ1-regularized problems [14,26,38,43,44] and in ℓ1-constrained problems [12]. ...
Preprint
In this paper, we consider lasso problems with a zero-sum constraint, commonly required for the analysis of compositional data in high-dimensional spaces. A novel algorithm is proposed to solve these problems, combining a tailored active-set technique, to identify the zero variables in the optimal solution, with a 2-coordinate descent scheme. At every iteration, the algorithm chooses between two different strategies: the first one requires computing the whole gradient of the smooth term of the objective function and is more accurate in the active-set estimate, while the second one only uses partial derivatives and is computationally more efficient. Global convergence to optimal solutions is proved and numerical results are provided on synthetic and real datasets, showing the effectiveness of the proposed method. The software is publicly available.
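For reference, a common textbook formulation of this problem class (the exact objective and notation used in the paper are not shown in this preview, so take this as an assumed standard form) is

minimize over β:  (1/2) ‖y − Xβ‖² + λ ‖β‖_1    subject to    β_1 + … + β_p = 0,

where the zero-sum constraint makes the fitted log-contrast model invariant to a constant shift of the log-transformed compositional covariates. A 2-coordinate move β_i ← β_i + t, β_j ← β_j − t keeps the constraint satisfied, which is why a 2-coordinate descent scheme pairs naturally with it.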
... Convergence and finite time identification for the PFW and the AFW are proved in Bomze et al. (2019) for a specific class of non-convex minimization problems over the standard simplex, under the additional assumption that the sequence generated has a finite set of limit points. In another line of work, active set identification strategies combined with FW variants have been proposed in Cristofari et al. (2020) and Sun (2020). ...
Article
Full-text available
Invented some 65 years ago in a seminal paper by Marguerite Straus-Frank and Philip Wolfe, the Frank–Wolfe method has recently enjoyed a remarkable revival, fuelled by the need for fast and reliable first-order optimization methods in Data Science and other relevant application areas. This review tries to explain the success of this approach by illustrating its versatility and applicability in a wide range of contexts, combined with an account of recent progress in variants improving on both the speed and efficiency of this surprisingly simple principle of first-order optimization.
Article
Full-text available
We study splitting methods for solving the Eigenvalue Complementarity Problem (EiCP). We introduce four variants, which depend on whether the matrices included in the definition of the EiCP are symmetric or nonsymmetric and positive definite, negative definite, or indefinite. Convergence analyses for each of these versions of the splitting method are discussed. Special choices for the splitting matrices associated with these versions are recommended and tested on the solution of small and large symmetric and nonsymmetric EiCPs. These experiments show that the four versions of the splitting method work well at least for some choices of the splitting matrices. Furthermore, these versions of the splitting method seem to be competitive with the most efficient state-of-the-art algorithms for the solution of EiCP.
Article
Full-text available
We propose a gradient-based method for quadratic programming problems with a single linear constraint and bounds on the variables. Inspired by the GPCG algorithm for bound-constrained convex quadratic programming [J.J. Moré and G. Toraldo, SIAM J. Optim. 1, 1991], our approach alternates between two phases until convergence: an identification phase, which performs gradient projection iterations until either a candidate active set is identified or no reasonable progress is made, and an unconstrained minimization phase, which reduces the objective function in a suitable space defined by the identification phase, by applying either the conjugate gradient method or a recently proposed spectral gradient method. However, the algorithm differs from GPCG not only because it deals with a more general class of problems, but mainly in the way it stops the minimization phase. This is based on a comparison between a measure of optimality in the reduced space and a measure of bindingness of the variables that are on the bounds, defined by extending the concept of proportioning, which was proposed by some authors for box-constrained problems. If the objective function is bounded, the algorithm converges to a stationary point thanks to a suitable application of the gradient projection method in the identification phase. For strictly convex problems, the algorithm converges to the optimal solution in a finite number of steps even in the case of degeneracy. Extensive numerical experiments show the effectiveness of the proposed approach.
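The key primitive in gradient projection iterations for this problem class is the projection onto the feasible set {x : a^T x = b, l <= x <= u}. One standard way to compute it, shown below as an illustrative sketch rather than the scheme used in the paper, is bisection on the multiplier mu of the equality constraint, exploiting the fact that a @ clip(v + mu*a, l, u) is nondecreasing in mu.

import numpy as np

def project_single_constraint_box(v, a, b, l, u, tol=1e-10, max_iter=200):
    # Projection of v onto {x : a @ x = b, l <= x <= u} via bisection on the multiplier mu.
    # Assumes the feasible set is nonempty; the projection has the form clip(v + mu*a, l, u).
    phi = lambda mu: a @ np.clip(v + mu * a, l, u)
    mu_lo, mu_hi = -1.0, 1.0
    while phi(mu_lo) > b:            # widen the bracket until it contains the root
        mu_lo *= 2.0
    while phi(mu_hi) < b:
        mu_hi *= 2.0
    for _ in range(max_iter):        # plain bisection on the monotone function phi
        mu = 0.5 * (mu_lo + mu_hi)
        if phi(mu) < b:
            mu_lo = mu
        else:
            mu_hi = mu
        if mu_hi - mu_lo < tol:
            break
    return np.clip(v + 0.5 * (mu_lo + mu_hi) * a, l, u)

With this projection available, each gradient projection iteration of the identification phase is simply x ← project(x − α ∇f(x)).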
Article
Full-text available
In this paper, we describe a two-stage method for solving optimization problems with bound constraints. It combines the active-set estimate described in [Facchinei and Lucidi, 1995] with a modification of the nonmonotone line search framework recently proposed in [De Santis et al., 2012]. In the first stage, the algorithm exploits a property of the active-set estimate that ensures a significant reduction of the objective function when setting to the bounds all those variables estimated active. In the second stage, a truncated-Newton strategy is used in the subspace of the variables estimated non-active. In order to properly combine the two phases, a proximity check is included in the scheme. This new tool, together with the other theoretical features of the two stages, enables us to prove global convergence. Furthermore, under additional standard assumptions, we can show that the algorithm converges at a superlinear rate. We report results of numerical experiments on bound-constrained problems from the CUTEst collection, showing the efficiency of the proposed approach.
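The nonmonotone line search framework mentioned above accepts a step when the new objective value improves on the maximum of the last few values rather than on the current one. Below is a minimal Grippo-Lampariello-Lucidi-style sketch of that acceptance rule; the memory length, constants and function names are illustrative assumptions, not those of the cited framework.

def nonmonotone_armijo(f, x, d, slope, history, gamma=1e-4, delta=0.5, memory=10):
    # Backtracking line search with a nonmonotone (GLL-type) acceptance test: compare against
    # the max of the last `memory` objective values instead of f(x). Here d must be a descent
    # direction (slope = grad_f(x) @ d < 0) and x, d support the expression x + alpha * d.
    f_ref = max(history[-memory:])      # reference value for the acceptance test
    alpha = 1.0
    while f(x + alpha * d) > f_ref + gamma * alpha * slope:
        alpha *= delta
    x_new = x + alpha * d
    history.append(f(x_new))
    return x_new

The list history is seeded with f(x0); with memory = 1 the rule reduces to the classical monotone Armijo condition.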
Article
Full-text available
We propose a feasible active set method for convex quadratic programming problems with non-negativity constraints. This method is specifically designed to be embedded into a branch-and-bound algorithm for convex quadratic mixed integer programming problems. The branch-and-bound algorithm generalizes the approach for unconstrained convex quadratic integer programming proposed by Buchheim, Caprara and Lodi to the presence of linear constraints. The main feature of the latter approach consists in a sophisticated preprocessing phase, leading to a fast enumeration of the branch-and-bound nodes. Moreover, the feasible active set method takes advantage of this preprocessing phase and is well suited for reoptimization. Experimental results for randomly generated instances show that the new approach significantly outperforms the MIQP solver of CPLEX 12.6 for instances with a small number of constraints.
Article
Full-text available
The Frank-Wolfe (FW) optimization algorithm has lately regained popularity thanks in particular to its ability to nicely handle the structured constraints appearing in machine learning applications. However, its convergence rate is known to be slow (sublinear) when the solution lies at the boundary. A simple, less-known fix is to add the possibility of taking 'away steps' during optimization, an operation that importantly does not require a feasibility oracle. In this paper, we highlight and clarify several variants of the Frank-Wolfe optimization algorithm that have been successfully applied in practice: away-steps FW, pairwise FW, fully-corrective FW and Wolfe's minimum norm point algorithm, and prove for the first time that they all enjoy global linear convergence, under a weaker condition than strong convexity of the objective. The constant in the convergence rate has an elegant interpretation as the product of the (classical) condition number of the function with a novel geometric quantity that plays the role of a 'condition number' of the constraint set. We provide pointers to where these algorithms have made a difference in practice, in particular with the flow polytope, the marginal polytope and the base polytope for submodular optimization.
Article
Full-text available
A new algorithm for large-scale nonlinear programs with box constraints is introduced. The algorithm is based on an efficient identification technique of the active set at the solution and on a nonmonotone stabilization technique. It possesses global and superlinear convergence properties under standard assumptions. A new technique for generating test problems with known characteristics is also introduced. The implementation of the method is described along with computational results for large-scale problems.
Article
An algorithm is developed for projecting a point onto a polyhedron. The algorithm solves a dual version of the projection problem and then uses the relationship between the primal and dual to recover the projection. The techniques in the paper exploit sparsity. Sparse reconstruction by separable approximation (SpaRSA) is used to approximately identify active constraints in the polyhedron, and the dual active set algorithm (DASA) is used to compute a high precision solution. A linear convergence result is established for SpaRSA that does not require the strong concavity of the dual to the projection problem, and an earlier R-linear convergence rate is strengthened to a Q-linear convergence property. An algorithmic framework is developed for combining SpaRSA with an asymptotically preferred algorithm such as DASA. It is shown that only the preferred algorithm is executed asymptotically. Numerical results are given using the polyhedra associated with the Netlib LP test set. A comparison is made to the interior point method contained in the general purpose open source software package IPOPT for nonlinear optimization, and to the commercial package CPLEX, which contains an implementation of the barrier method that is targeted to problems with the structure of the polyhedral projection problem.
Article
In this paper, we address the solution of the symmetric eigenvalue complementarity problem (EiCP) by treating an equivalent reformulation of finding a stationary point of a fractional quadratic program on the unit simplex. The spectral projected-gradient (SPG) method has been recommended for this optimization problem when the dimension of the symmetric EiCP is large and the accuracy of the solution is not a very important issue. We suggest a new algorithm which combines elements from the SPG method and the block active set method, where the latter was originally designed for box-constrained quadratic programs. In the new algorithm the projection onto the unit simplex in the SPG method is replaced by the much cheaper projection onto a box. This can be of particular advantage for large and sparse symmetric EiCPs. Global convergence to a solution of the symmetric EiCP is established. Computational experience with medium and large symmetric EiCPs is reported to illustrate the efficacy and efficiency of the new algorithm.
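The computational point made in the abstract is that projecting onto a box is a componentwise clip (O(n)), whereas projecting onto the unit simplex requires a sort-based procedure. The sketch below shows how simple a projected-gradient step with the box projection is, together with the Barzilai-Borwein ("spectral") steplength typically used by SPG methods; it is purely illustrative, and the block active-set bookkeeping that lets the actual method respect the simplex constraint is not shown.

import numpy as np

def bb_steplength(x, x_prev, g, g_prev, alpha_min=1e-10, alpha_max=1e10):
    # Barzilai-Borwein ("spectral") steplength used by SPG-type methods.
    s, y = x - x_prev, g - g_prev
    sy = s @ y
    return np.clip((s @ s) / sy, alpha_min, alpha_max) if sy > 0 else alpha_max

def spg_box_step(x, g, alpha, lo=0.0, hi=1.0):
    # One projected-gradient step with a box projection: a componentwise clip,
    # much cheaper than the sort-based projection onto the unit simplex.
    return np.clip(x - alpha * g, lo, hi)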
Article
A polyhedral active set algorithm PASA is developed for solving a nonlinear optimization problem whose feasible set is a polyhedron. Phase one of the algorithm is the gradient projection method, while phase two is any algorithm for solving a linearly constrained optimization problem. Rules are provided for branching between the two phases. Global convergence to a stationary point is established, while asymptotically PASA performs only phase two when either a nondegeneracy assumption holds, or the active constraints are linearly independent and a strong second-order sufficient optimality condition holds.