REGULARIZED SEQUENTIAL QUADRATIC
PROGRAMMING METHODS
Philip E. Gill∗, Daniel P. Robinson†
UCSD Department of Mathematics
Technical Report NA-11-02
October 2011
Abstract
We present the formulation and analysis of a new sequential quadratic pro-
gramming (SQP) method for general nonlinearly constrained optimization. The
method pairs a primal-dual generalized augmented Lagrangian merit function
with a flexible line search to obtain a sequence of improving estimates of the
solution. This function is a primal-dual variant of the augmented Lagrangian
proposed by Hestenes and Powell in the early 1970s. A crucial feature of the
method is that the QP subproblems are convex, but formed from the exact
second derivatives of the original problem. This is in contrast to methods that
use a less accurate quasi-Newton approximation. Additional benefits of this
approach include the following: (i) each QP subproblem is regularized; (ii) the
QP subproblem always has a known feasible point; and (iii) a projected gradient
method may be used to identify the QP active set when far from the solution.
Key words. Nonlinear programming, nonlinear constraints, augmented
Lagrangian, sequential quadratic programming, SQP methods, regularized meth-
ods, primal-dual methods.
AMS subject classifications. 49J20, 49J15, 49M37, 49D37, 65F05, 65K05,
90C30
∗Department of Mathematics, University of California, San Diego, La Jolla, CA 92093-0112
(pgill@ucsd.edu). Research supported in part by National Science Foundation grants DMS-
0511766 and DMS-0915220, and by Department of Energy grant DE-SC0002349.
†Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD
21218-2682 (daniel.p.robinson@jhu.edu).
1. Introduction
We present a sequential quadratic programming (SQP) method for optimization
problems involving general linear and nonlinear constraints. The method is de-
scribed in terms of the problem format:
$$
\text{(NP)}\qquad \underset{x\in\mathbb{R}^n}{\text{minimize}}\ \ f(x)\quad\text{subject to}\quad c(x)=0,\ \ x\ge 0,
$$
where $c:\mathbb{R}^n\to\mathbb{R}^m$ and $f:\mathbb{R}^n\to\mathbb{R}$ are twice-continuously differentiable. This
problem format assumes that all general inequality constraints have been converted
to equalities by the use of slack variables. Methods for solving problem (NP) easily
carry over to the more general setting with $l\le x\le u$. The vector-pair $(x^*, y^*)$ is
called a first-order solution to problem (NP) if it satisfies
$$
c(x^*)=0 \quad\text{and}\quad \min\big(x^*, z^*\big)=0, \tag{1.1}
$$
where $y^*$ are the Lagrange multipliers associated with the constraints $c(x)=0$, and
$z^*$ are the reduced costs at $(x^*, y^*)$, i.e., $z^*=g(x^*)-J(x^*)^Ty^*$. The minimum in (1.1) is taken componentwise.
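As a small illustration of (1.1), the Python sketch below evaluates the two residuals $\|c(x)\|$ and $\|\min(x, z)\|$ for a given pair $(x, y)$. The function name `first_order_residuals` and the test problem (minimize $\frac12\|x-a\|^2$ with $a=(-1,2)$ subject to $x_1+x_2=1$, $x\ge 0$) are illustrative choices, not taken from the paper. At the solution $x^*=(0,1)$, $y^*=-1$, the bound on $x_1$ is active and its reduced cost is positive, so (1.1) holds through complementarity rather than through $z^*=0$.

```python
import math


def first_order_residuals(x, y, g, J, c):
    """Return (||c(x)||, ||min(x, z)||, z) with z = g - J^T y, cf. (1.1).

    x, g: length-n lists; y, c: length-m lists; J: m-by-n list of rows.
    """
    m, n = len(J), len(x)
    # Reduced costs z = g - J^T y.
    z = [g[j] - sum(J[i][j] * y[i] for i in range(m)) for j in range(n)]
    # Componentwise min(x, z) vanishes iff x >= 0, z >= 0 and x_j * z_j = 0.
    comp = [min(xj, zj) for xj, zj in zip(x, z)]
    feas = math.sqrt(sum(ci * ci for ci in c))
    stat = math.sqrt(sum(v * v for v in comp))
    return feas, stat, z


# Example: minimize 0.5*||x - a||^2, a = (-1, 2), s.t. x1 + x2 - 1 = 0, x >= 0.
a = [-1.0, 2.0]
x_star, y_star = [0.0, 1.0], [-1.0]
g_star = [xj - aj for xj, aj in zip(x_star, a)]   # gradient of f at x*
c_star = [x_star[0] + x_star[1] - 1.0]
feas, stat, z_star = first_order_residuals(x_star, y_star, g_star, [[1.0, 1.0]], c_star)
print(feas, stat, z_star)  # 0.0 0.0 [2.0, 0.0]
```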
Sequential quadratic programming methods and interior methods are two alter-
native approaches to handling the inequality constraints in problem (NP). Sequential
quadratic programming (SQP) methods find an approximate solution of a sequence
of quadratic programming (QP) subproblems in which a quadratic model of the ob-
jective function is minimized subject to the linearized constraints. Interior methods
approximate a continuous path that passes through a solution of (NP). In the simplest
case, the path is parameterized by a positive scalar parameter µ that may be
interpreted as a perturbation of the optimality conditions for problem (NP).
Both interior methods and SQP methods have an inner/outer iteration structure,
with the work for an inner iteration being dominated by the cost of solving a large
sparse system of symmetric indefinite linear equations. In the case of SQP meth-
ods, these equations involve a subset of the variables and constraints; for interior
methods, the equations involve all the constraints and variables.
SQP methods provide a relatively reliable “certificate of infeasibility” and can
capitalize on a good initial starting point. Sophisticated matrix factorization
updating techniques are used to exploit the fact that
the linear equations change by only a single row and column at each inner iteration.
These updating techniques are often customized for the particular QP method being
used and have the benefit of providing a uniform treatment of ill-conditioning and
singularity.
On the negative side, it is difficult to implement SQP methods so that exact sec-
ond derivatives can be used efficiently and reliably. Some of these difficulties stem
from the theoretical properties of the quadratic programming subproblem, which
can be nonconvex when second derivatives are used. Nonconvex quadratic pro-
gramming is NP-hard—even for the calculation of a local minimizer [11,25]. The
complexity of the QP subproblem has been a major impediment to the formula-
tion of second-derivative SQP methods (although methods based on indefinite QP
have been proposed [19,20]). Over the years, algorithm developers have avoided
this difficulty by eschewing second derivatives and by solving a convex QP subprob-
lem defined with a positive semidefinite quasi-Newton approximate Hessian (see,
e.g., [28]); some authors enhance these basic methods with an additional subspace
phase that incorporates exact second derivatives [33,34,40]. A difficulty with active-
set methods is that they may require a substantial number of QP iterations when the
outer iterates are far from the solution. The use of a QP subproblem is motivated by
the assumption that the QP objective and constraints provide good “models” of the
objective and constraints of problem (NP). This should make it unnecessary (and
inefficient) to solve the QP to high accuracy during the preliminary iterations. Un-
fortunately, the simple expedient of limiting the number of inner iterations may have
a detrimental effect upon reliability. An approximate QP solution may not predict
a sufficient improvement in a merit function. Moreover, some of the QP multipliers
will have the wrong sign if an active-set method is terminated before a solution is
found. This may cause difficulties if the QP multipliers are used to estimate the
multipliers for the nonlinear problem. These issues would largely disappear if a
primal-dual interior method were to be used to solve the QP subproblem. These
methods have the benefit of providing a sequence of feasible (i.e., correctly signed)
dual iterates. Nevertheless, QP solvers based on conventional interior methods have
had limited success within SQP methods because they are difficult to “warm start”
from a near-optimal point (see the discussion below). This makes it difficult to
capitalize on the property that, as the outer iterates converge, the solution of one
QP subproblem is a very good estimate of the solution of the next.
Broadly speaking, the advantages and disadvantages of SQP methods and in-
terior methods complement each other. Interior methods are most efficient when
implemented with exact second derivatives. Moreover, they can converge in few
inner iterations—even for very large problems. The inner iterates are the iterates of
Newton’s method for finding an approximate solution of the perturbed optimality
conditions for a given µ. As the dimension and zero/nonzero structure of the Newton
equations remain fixed, these Newton equations may be solved efficiently using
either iterative or direct methods available in the form of advanced “off-the-shelf”
linear algebra software. In particular, any new software for multicore and parallel
architectures is immediately applicable. Moreover, the perturbation parameter µ
plays an auxiliary role as an implicit regularization parameter of the linear equa-
tions. This implicit regularization plays a crucial role in the robustness of interior
methods on ill-conditioned and ill-posed problems.
On the negative side, although interior methods are very effective for solving
“one-off” problems, they are difficult to adapt to solving a sequence of related non-
linear problems. This difficulty may be explained in terms of the “path-following”
interpretation of interior methods. In the neighborhood of an optimal solution, a
step along the path x(µ) of perturbed solutions is well-defined, whereas a step onto
the path from a neighboring point will be extremely sensitive to perturbations in
the problem functions (and hence difficult to compute). Another difficulty with con-
ventional interior methods is that a substantial number of iterations may be needed
when the constraints are infeasible.
The idea of replacing a constrained optimization problem by a sequence of unconstrained
problems parameterized by a scalar µ has played a fundamental role
in the formulation of algorithms since the early 1960s (for a seminal reference, see
Fiacco and McCormick [16,17]). One of the best-known methods for solving the
equality-constrained problem (NEP) of minimizing f(x) subject to c(x) = 0 uses an
unconstrained function based on the
quadratic penalty function, which combines f with a term of order 1/µ that “penalizes”
the sum of the squares of the constraint violations. Under certain conditions
(see, e.g., [17,26,49,51]), the minimizers of the penalty function define a differen-
tiable trajectory or central path that approaches the solution as µ→0. Penalty
methods approximate this path by minimizing the penalty function for a finite se-
quence of decreasing values of µ. In this form, the methods have a two-level structure
of inner and outer iterations: the inner iterations are those of the method used to
minimize the penalty function, and the outer iterations test for convergence and
adjust the value of µ. As µ→0, the Newton equations for minimizing the penalty
function are increasingly ill-conditioned, and this ill-conditioning was perceived to
be the reason for the poor numerical performance on some problems. In separate
papers, Hestenes [36] and Powell [42] proposed the augmented Lagrangian function
for (NEP), which is an unconstrained function based on augmenting the Lagrangian
function with a quadratic penalty term that does not require µ to go to zero for convergence.
The price that must be paid for keeping 1/µ finite is the need to update
estimates of the Lagrange multipliers in each outer iteration.
Since the first appearance of the Hestenes-Powell function, many algorithms have
been proposed based on using the augmented Lagrangian as an objective function for
sequential unconstrained minimization. Augmented Lagrangian functions have also
been proposed that treat the multiplier vector as a continuous function of x; some
of these ensure global convergence and permit local superlinear convergence (see,
e.g., Fletcher [18]; DiPillo and Grippo [13]; Bertsekas [1,2]; Boggs and Tolle [4]).
As methods for treating linear inequality constraints and bounds became more
sophisticated, the emphasis of algorithms shifted from sequential unconstrained min-
imization to sequential linearly constrained minimization. In this context, the aug-
mented Lagrangian has been used successfully within a number of different algo-
rithmic frameworks for problem (NP). The method used in the software package
LANCELOT [9] finds the approximate solution of a sequence of bound constrained
problems with an augmented Lagrangian objective function. Similarly, the software
package MINOS of Murtagh and Saunders [41] employs a variant of Robinson’s lin-
early constrained Lagrangian (LCL) method [44] in which an augmented Lagrangian
is minimized subject to the linearized nonlinear constraints. Friedlander and Saun-
ders [27] define a globally convergent version of the LCL method that can treat
infeasible constraints and infeasible subproblems. Augmented Lagrangian functions
have also been used extensively as a merit function for sequential quadratic pro-
gramming (SQP) methods (see, e.g., [3,5,7,21,28,30,45–48]).
The development of path-following interior methods for linear programming in
the mid-1980s stimulated renewed interest in the treatment of constraints by sequen-
tial unconstrained optimization. This new attention not only resulted in a new un-
derstanding of the computational complexity of existing methods but also provided
the impetus for the development of new approaches. A notable development was the
derivation of efficient path-following methods for linear programming based on ap-
plying Newton’s method with respect to both the primal and dual variables. These
new approaches also refocused attention on two computational aspects of penalty-
and barrier-function methods for nonlinear optimization. First, the recognition of
the formal equivalence between some primal-dual methods and conventional penalty
methods indicated that the inherent ill-conditioning of penalty and barrier functions
is not necessarily the reason for poor numerical performance. Second, the crucial
role of penalty and barrier functions in problem regularization was recognized and
better understood.
In this paper we formulate and analyze a new sequential quadratic programming
(SQP) method for nonlinearly constrained optimization. The method pairs a primal-
dual generalized augmented Lagrangian merit function with a flexible line search
to obtain a sequence of improving estimates of the solution. This function is a
primal-dual variant of the augmented Lagrangian proposed by Hestenes and Powell
in the early 1970s. A crucial feature of the method is that the QP subproblems
are convex, but formed from the exact second derivatives of the original problem.
This is in contrast to methods that use a less accurate quasi-Newton approximation.
Additional benefits of this approach include the following: (i) each QP subproblem
is regularized; (ii) the QP subproblem always has a known feasible point; and (iii) a
projected gradient method may be used to identify the QP active set when far from
the solution. Preliminary numerical experiments on a subset of problems from the
CUTEr test collection indicate that the proposed SQP method is significantly more
efficient than our current SQP package SNOPT.
The paper is organized in five sections. Section 1 is a review of some of the
basic properties of SQP methods. In Section 2, the steps of the primal-dual SQP
method are defined. Similarities with the conventional Hestenes-Powell augmented
Lagrangian method are also discussed. In Section 3, we consider methods for the
solution of the QP subproblem and show that in the neighborhood of a solution,
the method is equivalent to the stabilized SQP method [15,35,38,50]. A rather
general global convergence result that requires no constraint qualification or
nondegeneracy assumption is established in Section 4.
Notation and Terminology
Unless explicitly indicated otherwise, $\|\cdot\|$ denotes the vector two-norm or its induced
matrix norm. The inertia of a real symmetric matrix $A$, denoted by $\mathrm{In}(A)$, is the
integer triple $(a_+, a_-, a_0)$ giving the number of positive, negative and zero eigenvalues
of $A$. Given vectors $a$ and $b$ with the same dimension, the vector with $i$th
component $a_ib_i$ is denoted by $a\cdot b$. The vectors $e$ and $e_j$ denote, respectively, the
column vector of ones and the $j$th column of the identity matrix $I$. The dimensions
of $e$, $e_j$ and $I$ are defined by the context. Given vectors $x$ and $y$, the long vector
consisting of the elements of $x$ augmented by elements of $y$ is denoted by $(x, y)$. The
$i$th component of a vector labeled with a subscript will be denoted by $[\,\cdot\,]_i$, e.g., $[v]_i$
is the $i$th component of the vector $v$. The subvector of components with indices in
the index set $\mathcal{S}$ is denoted by $[\,\cdot\,]_{\mathcal{S}}$, e.g., $[v]_{\mathcal{S}}$ is the vector with components $[v]_i$
for $i\in\mathcal{S}$. Similarly, if $M$ is a symmetric matrix, then $[M]_{\mathcal{S}}$ denotes the symmetric
matrix with elements $m_{ij}$ for $i, j\in\mathcal{S}$. A local solution of an optimization problem
is denoted by $x^*$. The vector $g(x)$ is used to denote $\nabla f(x)$, the gradient of $f(x)$,
and $H(x)$ denotes the (symmetric) Hessian matrix $\nabla^2 f(x)$. The matrix $J(x)$ denotes
the $m\times n$ constraint Jacobian, which has $i$th row $\nabla c_i(x)^T$, the gradient of
the $i$th constraint function $c_i(x)$. The matrix $H_i(x)$ denotes the Hessian of $c_i(x)$.
The Lagrangian function associated with (NP) is $L(x, y, z)=f(x)-c(x)^Ty-z^Tx$,
where $y$ and $z$ are $m$- and $n$-vectors of dual variables associated with the equality
constraints and bounds, respectively. The Hessian of the Lagrangian with respect to
$x$ is denoted by $H(x, y)=H(x)-\sum_{i=1}^m y_iH_i(x)$.
Background
Some of the most efficient algorithms for nonlinear optimization are sequential
quadratic programming (SQP) methods. Conventional SQP methods find an ap-
proximate solution of a sequence of quadratic programming (QP) subproblems in
which a quadratic model of the objective function is minimized subject to the linearized
constraints. Given a current estimate $(x_k, y_k)$ of a primal-dual solution of
(NP), a line-search SQP method computes a search direction $p_k$ such that $x_k+p_k$
is the solution (when it exists) of the convex quadratic program
$$
\begin{aligned}
&\underset{x}{\text{minimize}} && g_k^T(x-x_k)+\tfrac12(x-x_k)^T\bar H_k(x-x_k)\\
&\text{subject to} && c_k+J_k(x-x_k)=0,\quad x\ge 0,
\end{aligned}\tag{1.2}
$$
where $c_k$, $g_k$ and $J_k$ denote the quantities $c(x)$, $g(x)$ and $J(x)$ evaluated at $x_k$, and
$\bar H_k$ is some positive-definite approximation to $H(x_k, y_k)$. If the Lagrange multiplier
vector associated with the constraint $c_k+J_k(x-x_k)=0$ is written in the form
$y_k+q_k$, then a solution $(x_k+p_k,\, y_k+q_k)$ of the QP subproblem (1.2) satisfies
$$
c_k+J_kp_k=0 \quad\text{and}\quad \min\big(x_k+p_k,\; g_k+\bar H_kp_k-J_k^T(y_k+q_k)\big)=0.
$$
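Given any $x\ge 0$, let $\mathcal{A}_0$ and $\mathcal{F}_0$ denote the index sets
$$
\mathcal{A}_0(x)=\{\,i : x_i=0\,\} \quad\text{and}\quad \mathcal{F}_0(x)=\{1,2,\dots,n\}\setminus\mathcal{A}_0(x). \tag{1.3}
$$
If $x$ is feasible for the constraints $c_k+J_k(x-x_k)=0$, then $\mathcal{A}_0(x)$ is the active set
at $x$. If the set $\mathcal{A}_0$ associated with a solution of the subproblem (1.2) is known,
then $x_k+p_k$ may be found by solving linear equations that represent the optimality
conditions for an equality-constrained QP with the inequalities $x\ge 0$ replaced by
$x_i=0$ for $i\in\mathcal{A}_0$. In general, the optimal $\mathcal{A}_0$ is not known in advance, and
active-set methods generate a sequence of estimates $(\hat p_j, \hat q_j)\approx(p_k, q_k)$ such that
$(\hat p_{j+1}, \hat q_{j+1})=(\hat p_j, \hat q_j)+\alpha_j(\Delta p_j, \Delta q_j)$, with $(\Delta p_j, \Delta q_j)$ a solution of
$$
\begin{pmatrix} \bar H_F & -J_F^T \\ J_F & 0 \end{pmatrix}
\begin{pmatrix} \Delta p_F \\ \Delta q_j \end{pmatrix}
=-\begin{pmatrix} \big[\,g_k+\bar H_k\hat p_j-J_k^T(y_k+\hat q_j)\,\big]_F \\ c_k+J_k\hat p_j \end{pmatrix}, \tag{1.4}
$$
where $\bar H_F$ is the matrix of free rows and columns of $\bar H_k$, $J_F$ is the matrix of free
columns of $J_k$, and the step length $\alpha_j$ is chosen to ensure feasibility of all variables,
not just those in the set $\mathcal{A}_0$.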
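To make (1.3) and (1.4) concrete, the following Python sketch forms the index sets for a given $x$ and solves (1.4) at the start of an active-set iteration, i.e., with $\hat p_j=0$, $\hat q_j=0$ and the free set taken as $\mathcal{F}_0(x_k)$. The helper names, the dense Gaussian elimination and the data are assumptions made for illustration only; a practical QP solver would factorize and update a sparse KKT matrix instead. Indices are 0-based, and the fixed variables do not move.

```python
def solve(A, b):
    """Solve A u = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(t) for t in row] + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        if abs(M[piv][k]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            fac = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= fac * M[k][col]
    u = [0.0] * n
    for k in reversed(range(n)):
        u[k] = (M[k][n] - sum(M[k][j] * u[j] for j in range(k + 1, n))) / M[k][k]
    return u


def index_sets(x):
    """Index sets of (1.3), 0-based: A0 = {i : x_i = 0}, F0 = the rest."""
    A0 = [i for i, xi in enumerate(x) if xi == 0.0]
    F0 = [i for i, xi in enumerate(x) if xi != 0.0]
    return A0, F0


def first_active_set_step(Hbar, J, g, y, c, x):
    """Solve (1.4) with p_hat = q_hat = 0 and the free set taken as F0(x)."""
    _, F = index_sets(x)
    m, nf = len(J), len(F)
    K = [[0.0] * (nf + m) for _ in range(nf + m)]
    rhs = [0.0] * (nf + m)
    for a, i in enumerate(F):
        for b, j in enumerate(F):
            K[a][b] = Hbar[i][j]                  # Hbar_F block
        for r in range(m):
            K[a][nf + r] = -J[r][i]               # -J_F^T block
            K[nf + r][a] = J[r][i]                # J_F block; the (2,2) block stays 0
        rhs[a] = -(g[i] - sum(J[r][i] * y[r] for r in range(m)))
    for r in range(m):
        rhs[nf + r] = -c[r]
    u = solve(K, rhs)
    dp = [0.0] * len(x)                           # fixed variables do not move
    for a, i in enumerate(F):
        dp[i] = u[a]
    return dp, u[nf:]


Hbar = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
J = [[1.0, 1.0, 1.0]]
g, y, c = [1.0, 0.0, -1.0], [0.0], [0.5]
x = [0.0, 1.0, 1.0]
A0, F0 = index_sets(x)
dp, dq = first_active_set_step(Hbar, J, g, y, c, x)
print(A0, F0, dp, dq)  # [0] [1, 2] [0.0, -0.75, 0.25] [-0.75]
```

The resulting step satisfies the linearized constraint $c_k+J_kp=0$, as the second block row of (1.4) requires.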
If the equations (1.4) are to be used to define $\Delta p_F$ and $\Delta q_j$, then it is necessary
that $J_F$ has full rank, which is probably the greatest outstanding issue associated
with systems of the form (1.4). Two remedies are available.
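• Rank-enforcing active-set methods maintain a set of indices $\mathcal{B}$ associated with a
matrix of columns $J_B$ with rank $m$, i.e., the rows of $J_B$ are linearly independent.
The set $\mathcal{B}$ is the complement in $\{1,2,\dots,n\}$ of a “working set” of indices that
estimates the set $\mathcal{A}_0$ at a solution of (1.2). If $\mathcal{N}$ is a subset of $\mathcal{A}_0$, then the
system analogous to (1.4) is given by
$$
\begin{pmatrix} \bar H_B & -J_B^T \\ J_B & 0 \end{pmatrix}
\begin{pmatrix} \Delta p_B \\ \Delta q_j \end{pmatrix}
=-\begin{pmatrix} \big[\,g_k+\bar H_k\hat p_j-J_k^T(y_k+\hat q_j)\,\big]_B \\ c_k+J_k\hat p_j \end{pmatrix}, \tag{1.5}
$$
which is nonsingular because of the linear independence of the rows of $J_B$.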
• Regularized active-set methods add a positive-definite regularization term in
the (2,2) block of (1.4). The magnitude of the regularization is generally based
on heuristic arguments, which give mixed results in practice.
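The effect of a regularization term in the (2,2) block is easy to see on a small example. In the Python sketch below (the helper names `det` and `kkt` and all data are illustrative), $J_F$ has two identical rows, so the matrix of (1.4) is singular. Adding $\delta I$ to the (2,2) block, with $\delta=10^{-2}$ chosen arbitrarily, restores nonsingularity: by a Schur-complement argument the determinant equals $\det(H_F)\det(\delta I+J_FH_F^{-1}J_F^T)$, which is $\delta^2+4\delta$ for these data.

```python
def det(A):
    """Determinant via Gaussian elimination with partial pivoting."""
    M = [[float(t) for t in row] for row in A]
    n, d = len(M), 1.0
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        if M[piv][k] == 0.0:
            return 0.0                      # exactly singular in floating point
        if piv != k:
            M[k], M[piv] = M[piv], M[k]
            d = -d                          # a row swap flips the sign
        d *= M[k][k]
        for r in range(k + 1, n):
            fac = M[r][k] / M[k][k]
            for col in range(k, n):
                M[r][col] -= fac * M[k][col]
    return d


def kkt(H, J, delta):
    """Assemble [[H, -J^T], [J, delta*I]]; delta = 0 gives the matrix of (1.4)."""
    m = len(J)
    K = [H[i][:] + [-J[r][i] for r in range(m)] for i in range(len(H))]
    K += [J[r][:] + [delta if s == r else 0.0 for s in range(m)] for r in range(m)]
    return K


H_F = [[1.0, 0.0], [0.0, 1.0]]
J_F = [[1.0, 1.0], [1.0, 1.0]]      # two identical rows: rank 1 < m = 2
delta = 1e-2                         # illustrative regularization size
print(det(kkt(H_F, J_F, 0.0)))       # singular
print(det(kkt(H_F, J_F, delta)))     # nonsingular
```

The sketch says nothing about how $\delta$ should be chosen; that choice is exactly the heuristic element mentioned above.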
2. A Regularized Primal-Dual Line-Search SQP Algorithm
In this section, we define a regularized SQP line-search method based on the primal-
dual augmented Lagrangian merit function
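$$
\mathcal{M}^\nu(x, y\,; y^E, \mu)=f(x)-c(x)^Ty^E+\frac{1}{2\mu}\|c(x)\|^2+\frac{\nu}{2\mu}\big\|c(x)+\mu(y-y^E)\big\|^2, \tag{2.1}
$$
where $\nu$ is a scalar, $\mu$ is the so-called penalty parameter, and $y^E$ is an estimate of an
optimal Lagrange multiplier vector $y^*$. This function, proposed by Robinson [43]
and Gill and Robinson [31], may be derived by applying the primal-dual penalty
function of Forsgren and Gill [23] to a problem in which the constraints are shifted
by a constant vector (see Powell [42]). With the notation $c=c(x)$, $g=g(x)$, and
$J=J(x)$, the gradient of $\mathcal{M}^\nu(x, y\,; y^E, \mu)$ may be written as
$$
\begin{aligned}
\nabla\mathcal{M}^\nu(x, y\,; y^E, \mu)
&=\begin{pmatrix} g-J^T\big((1+\nu)(y^E-\frac{1}{\mu}c)-\nu y\big) \\ \nu\big(c+\mu(y-y^E)\big) \end{pmatrix} &&\text{(2.2a)}\\
&=\begin{pmatrix} g-J^T\big(\pi+\nu(\pi-y)\big) \\ \nu\mu(y-\pi) \end{pmatrix}, &&\text{(2.2b)}
\end{aligned}
$$
where $\pi=\pi(x\,; y^E, \mu)$ denotes the vector-valued function
$$
\pi(x\,; y^E, \mu)=y^E-\frac{1}{\mu}c(x). \tag{2.3}
$$
Similarly, the Hessian of $\mathcal{M}^\nu(x, y\,; y^E, \mu)$ may be written as
$$
\nabla^2\mathcal{M}^\nu(x, y\,; y^E, \mu)=
\begin{pmatrix} H\big(x, \pi+\nu(\pi-y)\big)+\frac{1}{\mu}(1+\nu)J^TJ & \nu J^T \\ \nu J & \nu\mu I \end{pmatrix}. \tag{2.4}
$$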
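The gradient formula (2.2b) can be checked against finite differences of (2.1). The Python sketch below does this for a toy problem with $n=2$ and $m=1$; the functions `f`, `cons`, their derivatives and all numerical values are illustrative assumptions, not data from the paper. The point is only that (2.2b), written in terms of $\pi$ from (2.3), agrees with a numerical derivative of (2.1) with respect to both $x$ and $y$.

```python
# Toy data (illustrative assumption, not from the paper): n = 2, m = 1.
def f(x):
    return x[0] ** 2 + x[0] * x[1] + 2.0 * x[1] ** 2


def grad_f(x):
    return [2.0 * x[0] + x[1], x[0] + 4.0 * x[1]]


def cons(x):
    return [x[0] ** 2 + x[1] - 1.0]


def jac(x):
    return [[2.0 * x[0], 1.0]]


def merit(x, y, yE, mu, nu):
    """Primal-dual augmented Lagrangian M^nu(x, y; yE, mu) of (2.1)."""
    c = cons(x)
    shifted = [ci + mu * (yi - ei) for ci, yi, ei in zip(c, y, yE)]
    return (f(x) - sum(ci * ei for ci, ei in zip(c, yE))
            + sum(ci * ci for ci in c) / (2.0 * mu)
            + nu * sum(s * s for s in shifted) / (2.0 * mu))


def merit_grad(x, y, yE, mu, nu):
    """Gradient (2.2b) with respect to (x, y), using pi = yE - c/mu from (2.3)."""
    c, g, J = cons(x), grad_f(x), jac(x)
    m, n = len(c), len(x)
    pi = [yE[i] - c[i] / mu for i in range(m)]
    w = [pi[i] + nu * (pi[i] - y[i]) for i in range(m)]        # pi + nu(pi - y)
    gx = [g[j] - sum(J[i][j] * w[i] for i in range(m)) for j in range(n)]
    gy = [nu * mu * (y[i] - pi[i]) for i in range(m)]
    return gx + gy


def fd_grad(x, y, yE, mu, nu, h=1e-6):
    """Central-difference approximation to the gradient of (2.1) in v = (x, y)."""
    v, n, out = list(x) + list(y), len(x), []
    for k in range(len(v)):
        vp, vm = v[:], v[:]
        vp[k] += h
        vm[k] -= h
        out.append((merit(vp[:n], vp[n:], yE, mu, nu)
                    - merit(vm[:n], vm[n:], yE, mu, nu)) / (2.0 * h))
    return out


x0, y0, yE0, mu0, nu0 = [0.3, -0.2], [0.5], [0.1], 0.1, 1.0
err = max(abs(a - b) for a, b in zip(merit_grad(x0, y0, yE0, mu0, nu0),
                                     fd_grad(x0, y0, yE0, mu0, nu0)))
print(err)
```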
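We use $\mathcal{M}^\nu(x, y)$, $\nabla\mathcal{M}^\nu(x, y)$, and $\nabla^2\mathcal{M}^\nu(x, y)$ to denote $\mathcal{M}^\nu$, $\nabla\mathcal{M}^\nu$, and $\nabla^2\mathcal{M}^\nu$
evaluated with parameters $y^E$ and $\mu$. (We note that a trust-region based method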
could also be given, but we leave the statement and analysis to a future paper.)
Our approach is motivated by the following theorem, which shows that minimiz-
ers of problem (NP) are also minimizers—under certain assumptions—of the bound
constrained problem
$$
\underset{x,y}{\text{minimize}}\ \ \mathcal{M}^\nu(x, y\,; y^*, \mu)\quad\text{subject to}\quad x\ge 0, \tag{2.5}
$$
where $y^*$ is a Lagrange multiplier vector for the equality constraints $c(x)=0$.
Theorem 2.1. If $(x^*, y^*)$ satisfies the second-order sufficient conditions for a solution
of problem (NP), then there exists a positive $\bar\mu$ such that for all $0<\mu<\bar\mu$, the
point $(x^*, y^*)$ is a minimizer of the bound-constrained problem (2.5) for all $\nu>0$.
2.1. Definition of the search direction
To motivate the computation of the step, we consider a quadratic approximation to
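$\mathcal{M}^\nu$. Given $(x, y)$ and fixed $\nu\ge 0$, we define
$$
H^\nu_{\mathcal{M}}(x, y\,; \mu)=
\begin{pmatrix} \bar H(x, y)+\frac{1}{\mu}(1+\nu)J(x)^TJ(x) & \nu J(x)^T \\ \nu J(x) & \nu\mu I \end{pmatrix}, \tag{2.6}
$$
where $\bar H(x, y)$ is a symmetric approximation to $H\big(x, \pi+\nu(\pi-y)\big)\approx H(x, y)$ such
that $\bar H(x, y)+\frac{1}{\mu}J(x)^TJ(x)$ is positive definite. The approximation $\pi+\nu(\pi-y)\approx y$
is valid provided $\pi\approx y$. The restriction on the inertia of $\bar H$ implies that $H^\nu_{\mathcal{M}}(x, y\,; \mu)$
is positive definite for $\nu>0$ and positive semidefinite for $\nu=0$ (see Theorem 3.1 of
Section 3.2.3). For $\nu>0$, this follows because the Schur complement of the block $\nu\mu I$
in (2.6) is exactly $\bar H(x, y)+\frac{1}{\mu}J(x)^TJ(x)$.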
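The inertia argument can be illustrated numerically. In the Python sketch below (all names and data are illustrative), $\bar H=\mathrm{diag}(1,-1)$ is indefinite on its own, but $\bar H+\frac{1}{\mu}J^TJ=\mathrm{diag}(1,9)$ is positive definite for $J=(0\ \ 1)$ and $\mu=0.1$. A Cholesky attempt on (2.6) then succeeds for $\nu=1$ and fails for $\nu=0$, where the matrix is only positive semidefinite.

```python
import math


def hessian_model(Hbar, J, mu, nu):
    """Assemble H^nu_M(x, y; mu) of (2.6) from Hbar, J, mu and nu."""
    n, m = len(Hbar), len(J)
    K = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            jtj = sum(J[r][i] * J[r][j] for r in range(m))
            K[i][j] = Hbar[i][j] + (1.0 + nu) / mu * jtj
    for r in range(m):
        for j in range(n):
            K[n + r][j] = K[j][n + r] = nu * J[r][j]   # symmetric off-diagonal blocks
        K[n + r][n + r] = nu * mu
    return K


def is_positive_definite(A):
    """Attempt a Cholesky factorization; report whether every pivot is positive."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            return False
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return True


Hbar = [[1.0, 0.0], [0.0, -1.0]]   # indefinite on its own
J = [[0.0, 1.0]]
mu = 0.1
print(is_positive_definite(Hbar))                             # False
print(is_positive_definite(hessian_model(Hbar, J, mu, 1.0)))  # True
print(is_positive_definite(hessian_model(Hbar, J, mu, 0.0)))  # False (only semidefinite)
```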
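Using this definition of $H^\nu_{\mathcal{M}}$ at the $k$th primal-dual iterate $v_k=(x_k, y_k)$, consider
the convex QP subproblem
$$
\underset{\Delta v=(p,q)}{\text{minimize}}\ \ \nabla\mathcal{M}^\nu(v_k)^T\Delta v+\tfrac12\Delta v^TH^\nu_{\mathcal{M}}(v_k)\Delta v\quad\text{subject to}\quad x_k+p\ge 0, \tag{2.7}
$$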
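where $\mathcal{M}^\nu(v)$ denotes the merit function evaluated at $v$. For any primal-dual QP
solution $\Delta v_k=(p_k, q_k)$, it is shown in Theorem 3.3 of Section 3.2.3 that the first-order
conditions associated with the variables in $\mathcal{F}_0(x_k+p_k)$ may be written in
matrix form as
$$
\begin{pmatrix} \bar H_F & -J_F^T \\ J_F & \mu I \end{pmatrix}
\begin{pmatrix} p_F \\ q_k \end{pmatrix}
=-\begin{pmatrix} \big[\,g_k-J_k^Ty_k-\bar H_ks\,\big]_F \\ c_k+\mu(y_k-y^E)-J_ks \end{pmatrix}, \tag{2.8}
$$
where $c_k$, $g_k$ and $J_k$ denote the quantities $c(x)$, $g(x)$ and $J(x)$ evaluated at $x_k$, and
$s$ is a nonnegative vector such that
$$
s_i=\begin{cases} [x_k]_i & \text{if } i\in\mathcal{A}_0(x_k+p_k);\\ 0 & \text{if } i\in\mathcal{F}_0(x_k+p_k). \end{cases}
$$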
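(The assumption of positive-definiteness of $\bar H_k+\frac{1}{\mu}J_k^TJ_k$ implies that the matrix
associated with the equations (2.8) is nonsingular.) It follows that if $\mathcal{A}_0(x_k+p_k)=\mathcal{A}_0(x_k)$,
then $s=0$ (because $[x_k]_i=0$ for every $i\in\mathcal{A}_0(x_k)$), and $(p_k, q_k)$ satisfies the perturbed Newton equations
$$
\begin{pmatrix} \bar H_F & -J_F^T \\ J_F & \mu I \end{pmatrix}
\begin{pmatrix} p_F \\ q_k \end{pmatrix}
=-\begin{pmatrix} \big[\,g_k-J_k^Ty_k\,\big]_F \\ c_k+\mu(y_k-y^E) \end{pmatrix}.
$$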
A key property is that if µ= 0 and JFhas full rank, then this equation is identical
to the equation for the conventional SQP step given by (1.4). This provides the
motivation to use different penalty parameters for the step computation and the
merit function.
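The relationship between the perturbed Newton equations and (1.4) can be seen numerically. The Python sketch below (illustrative names and data; dense elimination is used only for brevity) solves the perturbed system for decreasing values of $\mu$ and compares each solution with the $\mu=0$ solution, which is the step from (1.4) with $\hat p_j=\hat q_j=0$. For simplicity it assumes $y_k=y^E$, so the second right-hand-side block is just $c_k$. In this example the difference is $0.75\mu/(2+\mu)$, i.e., proportional to $\mu$ for small $\mu$.

```python
def solve(A, b):
    """Solve A u = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(t) for t in row] + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        if abs(M[piv][k]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            fac = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= fac * M[k][col]
    u = [0.0] * n
    for k in reversed(range(n)):
        u[k] = (M[k][n] - sum(M[k][j] * u[j] for j in range(k + 1, n))) / M[k][k]
    return u


def perturbed_newton_step(HF, JF, rF, rc, mu):
    """Solve [[HF, -JF^T], [JF, mu*I]] (pF, q) = -(rF, rc)."""
    nf, m = len(HF), len(JF)
    K = [HF[i][:] + [-JF[r][i] for r in range(m)] for i in range(nf)]
    K += [JF[r][:] + [mu if s == r else 0.0 for s in range(m)] for r in range(m)]
    return solve(K, [-t for t in rF] + [-t for t in rc])


HF = [[1.0, 0.0], [0.0, 1.0]]
JF = [[1.0, 1.0]]
rF = [0.0, -1.0]   # [g_k - J_k^T y_k]_F (illustrative values)
rc = [0.5]         # c_k + mu*(y_k - y^E) with y_k = y^E (assumption)

base = perturbed_newton_step(HF, JF, rF, rc, 0.0)   # mu = 0: the step from (1.4)
errors = []
for mu in (1e-1, 1e-2, 1e-3):
    step = perturbed_newton_step(HF, JF, rF, rc, mu)
    errors.append(max(abs(a - b) for a, b in zip(step, base)))
print(base, errors)
```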
Given an iterate $v_k=(x_k, y_k)$ and Lagrange multiplier estimate $y^E_k$, the primal-dual
search direction $\Delta v_k=(p_k, q_k)$ is defined such that $v_k+\Delta v_k=(x_k+p_k,\, y_k+q_k)$
is a solution of the convex QP problem
$$
\begin{aligned}
&\underset{v=(x,y)}{\text{minimize}} && (v-v_k)^T\nabla\mathcal{M}^\nu(v_k\,; y^E_k, \mu^R_k)+\tfrac12(v-v_k)^TH^\nu_{\mathcal{M}}(v_k\,; \mu^R_k)(v-v_k)\\
&\text{subject to} && x\ge 0,
\end{aligned}\tag{2.9}
$$
where $\mu^R_k$ is a small parameter, and $H^\nu_{\mathcal{M}}(v_k\,; \mu^R_k)$ is the matrix (2.6) written in terms
of the composite variables $v_k=(x_k, y_k)$. In this context, $\mu^R_k$ plays the role of a
regularization parameter rather than a penalty parameter, thereby providing an
$O(\mu^R_k)$ estimate of the conventional SQP direction. This approach is nonstandard
because a small “penalty parameter” $\mu^R_k$ is used by design, whereas other augmented
Lagrangian-based methods attempt to keep $\mu$ as large as possible [8,28].
Finally, we note that if $v=v_k$ is a solution of the QP (2.9), then $v_k$ is a first-order
solution of
$$
\underset{v=(x,y)}{\text{minimize}}\ \ \mathcal{M}^\nu(v\,; y^E_k, \mu^R_k)\quad\text{subject to}\quad x\ge 0. \tag{2.10}
$$
In Section 3 it is shown that, under certain conditions, the primal-dual vector
$v_k+\Delta v_k=(x_k+p_k,\, y_k+q_k)$ is a solution of problem (2.9) if and only if it solves
$$
\begin{aligned}
&\underset{x,y}{\text{minimize}} && g_k^T(x-x_k)+\tfrac12(x-x_k)^T\bar H(x_k, y_k)(x-x_k)+\tfrac12\mu^R_k\|y\|^2\\
&\text{subject to} && c_k+J_k(x-x_k)+\mu^R_k(y-y^E_k)=0,\quad x\ge 0,
\end{aligned}\tag{2.11}
$$
which is often referred to as the “stabilized” SQP subproblem because of its calming
effect on multiplier estimates for degenerate problems (see, e.g., [35,50]). Therefore,
the proposed method provides a natural link between the stabilized SQP methods
(which employ a subproblem appropriate for degenerate problems), conventional
SQP methods (which are highly efficient in practice), and augmented Lagrangian
methods (which have desirable global convergence properties).
2.2. Definition of the new iterate
Once the search direction $\Delta v_k$ has been determined, a “flexible” backtracking line
search is performed on the primal-dual augmented Lagrangian. A conventional
backtracking line search defines $v_{k+1}=v_k+\alpha_k\Delta v_k$, where $\alpha_k=2^{-j}$ and $j$ is the
smallest nonnegative integer such that
$$
\mathcal{M}^\nu(v_k+\alpha_k\Delta v_k\,; y^E_k, \mu_k)\le\mathcal{M}^\nu(v_k\,; y^E_k, \mu_k)+\alpha_k\eta_S\,\Delta v_k^T\nabla\mathcal{M}^\nu(v_k\,; y^E_k, \mu_k)
$$
for a given scalar $\eta_S\in(0,1)$. However, this approach would suffer from the Maratos
effect [39] because the penalty parameter $\mu_k$ and the regularization parameter
$\mu^R_k$ generally have different values. Thus, we use a “flexible penalty function” based
on the work of Curtis and Nocedal [12] and define $\alpha_k=2^{-j}$, where $j$ is the smallest
nonnegative integer such that
$$
\mathcal{M}^\nu(v_k+\alpha_k\Delta v_k\,; y^E_k, \mu^F_k)\le\mathcal{M}^\nu(v_k\,; y^E_k, \mu^F_k)+\alpha_k\eta_SN_k \tag{2.12}
$$
for some value $\mu^F_k\in[\mu^R_k, \mu_k]$, and where
$$
N_k\triangleq\max\Big(\Delta v_k^T\nabla\mathcal{M}^\nu(v_k\,; y^E_k, \mu^R_k),\; -10^{-3}\|\Delta v_k\|^2\Big)\le 0 \tag{2.13}
$$
is a sufficiently negative real number that will allow us to prove global convergence
of our proposed method. Once an appropriate value for $\alpha_k$ is found, the new primal-dual
solution estimate is given by
$$
x_{k+1}=x_k+\alpha_kp_k \quad\text{and}\quad y_{k+1}=y_k+\alpha_kq_k.
$$
We note that the step acceptance is well-defined since the weakened Armijo condition
(2.12) will be satisfied for $\mu^F_k=\mu^R_k$ and all $\alpha$ sufficiently small.
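A minimal Python sketch of the flexible backtracking rule (2.12)–(2.13) follows. It assumes the caller supplies `merit(v, mu_F)`, the merit function for fixed $\nu$ and $y^E_k$, together with the gradient at $v_k$ computed with $\mu^R_k$. The paper only requires some $\mu^F_k\in[\mu^R_k, \mu_k]$; trying just the two endpoints, larger value first, is an assumption made here for simplicity. The toy merit function in the example is not an augmented Lagrangian; it merely mimics an objective term plus a penalty term weighted by $1/\mu$, so that the unit step is rejected with $\mu^F_k=\mu_k$ but accepted with $\mu^F_k=\mu^R_k$.

```python
def flexible_backtrack(merit, v, dv, grad_R, mu_R, mu, eta_S=0.1, max_halvings=50):
    """Flexible backtracking line search based on (2.12) and (2.13).

    merit(v, mu_F) evaluates M^nu(v; y^E_k, mu_F) for fixed nu and y^E_k.
    grad_R is the gradient of M^nu at v computed with mu = mu_R (i.e. mu^R_k).
    Assumption: only mu_F = mu and mu_F = mu_R are tried, in that order.
    """
    slope = sum(d * gr for d, gr in zip(dv, grad_R))
    N = max(slope, -1e-3 * sum(d * d for d in dv))            # N_k of (2.13)
    alpha = 1.0                                                 # alpha = 2^{-j}, j = 0
    for _ in range(max_halvings):
        trial = [vi + alpha * di for vi, di in zip(v, dv)]
        for mu_F in (mu, mu_R):
            if merit(trial, mu_F) <= merit(v, mu_F) + alpha * eta_S * N:   # (2.12)
                return alpha, mu_F
        alpha *= 0.5
    raise RuntimeError("no acceptable step length found")


def toy_merit(v, mu_F):
    """Toy stand-in for M^nu: an objective-like term plus a 1/mu_F-weighted term."""
    return 4.0 * v[0] ** 2 + v[1] ** 2 / mu_F


mu_R, mu = 1e-2, 1.0
v0, dv0 = [0.0, 1.0], [1.0, -1.0]
grad_R = [8.0 * v0[0], 2.0 * v0[1] / mu_R]     # gradient of toy_merit at v0 with mu_R
alpha, mu_F = flexible_backtrack(toy_merit, v0, dv0, grad_R, mu_R, mu)
print(alpha, mu_F)  # 1.0 0.01
```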
2.3. Updating the multiplier estimate
The preliminary numerical results presented in [31] indicate that the method outlined
thus far is robust with respect to updating $y^E_k$. In particular, the numerical
results generated in that paper updated $y^E_k$ at every iteration. Consequently, we
seek a strategy that allows for frequent updates to $y^E_k$. To this end, we use the
(merit) functions
$$
\phi_S(v)=\eta(x)+10^{-5}\,\omega(v) \quad\text{and}\quad \phi_L(v)=10^{-5}\,\eta(x)+\omega(v), \tag{2.14}
$$
where
$$
\eta(x)=\|c(x)\| \quad\text{and}\quad \omega(x, y)=\big\|\min\big(x,\; g(x)-J(x)^Ty\big)\big\| \tag{2.15}
$$
are feasibility and stationarity measures at the point $(x, y)$, respectively. These
optimality measures are based on the optimality conditions for problem (NP) rather
than for minimizing the merit function $\mathcal{M}^\nu$. Both measures are bounded below by
zero, and are equal to zero if $v$ is a first-order solution to problem (NP). Such
conditions are appropriate because trial steps are regularized SQP steps that should
converge rapidly to a solution of problem (NP).
The estimate $y^E_k$ is updated when any iterate $v_k$ satisfies either $\phi_S(v_k)\le\frac12\phi^{\max}_S$
or $\phi_L(v_k)\le\frac12\phi^{\max}_L$, where $\phi^{\max}_S$ and $\phi^{\max}_L$ are bounds that are updated throughout
the solution process. To ensure global convergence, the update to $y^E_k$ is accompanied
by a decrease in either $\phi^{\max}_S$ or $\phi^{\max}_L$.
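The S/L acceptance rule can be sketched in a few lines of Python. The sketch assumes the measures $\eta$ and $\omega$ of (2.15) have already been computed at $v_{k+1}$; the function names and the dictionary used to hold $\phi^{\max}_S$ and $\phi^{\max}_L$ are hypothetical. The $\phi_S$ test is applied first, which is the order used in Algorithm 2.1, and the bound belonging to the test that succeeds is halved; in either case $y^E_{k+1}$ would then be set to $y_{k+1}$.

```python
def phi_S(eta, omega):
    """Feasibility-weighted measure phi_S of (2.14)."""
    return eta + 1e-5 * omega


def phi_L(eta, omega):
    """Stationarity-weighted measure phi_L of (2.14)."""
    return 1e-5 * eta + omega


def classify_SL(eta, omega, phi_max):
    """Apply the S/L tests of Section 2.3, phi_S first.

    eta, omega: the measures (2.15) at v_{k+1}, assumed computed elsewhere.
    phi_max: dict with keys "S" and "L" (hypothetical representation).
    Returns the label ("S", "L", or None) and the updated bounds.
    """
    if phi_S(eta, omega) <= 0.5 * phi_max["S"]:
        return "S", dict(phi_max, S=0.5 * phi_max["S"])
    if phi_L(eta, omega) <= 0.5 * phi_max["L"]:
        return "L", dict(phi_max, L=0.5 * phi_max["L"])
    return None, dict(phi_max)


bounds = {"S": 1.0, "L": 1.0}
label1, bounds = classify_SL(1e-3, 0.9, bounds)   # nearly feasible point
label2, bounds = classify_SL(0.8, 0.1, bounds)    # nearly stationary point
label3, bounds = classify_SL(0.8, 0.9, bounds)    # neither test succeeds
print(label1, label2, label3, bounds)  # S L None {'S': 0.5, 'L': 0.5}
```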
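Finally, $y^E_k$ is also updated if an approximate first-order solution of the problem
$$
\underset{x,y}{\text{minimize}}\ \ \mathcal{M}^\nu(x, y\,; y^E_k, \mu^R_k)\quad\text{subject to}\quad x\ge 0 \tag{2.16}
$$
has been found. The test for optimality is
$$
\big\|\nabla_y\mathcal{M}^\nu(v_{k+1}\,; y^E_k, \mu^R_k)\big\|\le\tau_k \quad\text{and}\quad
\big\|\min\big(x_{k+1},\; \nabla_x\mathcal{M}^\nu(v_{k+1}\,; y^E_k, \mu^R_k)\big)\big\|\le\tau_k \tag{2.17}
$$
for some small tolerance $\tau_k>0$. This condition is rarely satisfied in practice, but
the test is required for the proof of convergence. Nonetheless, if the condition is
satisfied, $y^E_k$ is updated with the safeguarded estimate
$$
y^E_{k+1}=\operatorname{mid}\big(-10^6,\; y_{k+1},\; 10^6\big),
$$
where $\operatorname{mid}$ is applied componentwise.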
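The M-iterate test (2.17) and the safeguarded update can be sketched as follows. The gradient components are assumed to be computed by the caller (for example from (2.2b) with $\mu=\mu^R_k$); the function names are hypothetical. For $lo\le hi$, the median of three numbers simply clips $t$ to $[lo, hi]$.

```python
import math


def _norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))


def mid(lo, t, hi):
    """Median of lo, t and hi; for lo <= hi this clips t to [lo, hi]."""
    return max(lo, min(t, hi))


def m_iterate_estimate(x_next, y_next, grad_x, grad_y, tau):
    """Return the safeguarded y^E_{k+1} if test (2.17) holds, otherwise None.

    grad_x, grad_y: x- and y-parts of the gradient of M^nu at v_{k+1} with
    parameters y^E_k and mu^R_k (assumed to be computed by the caller).
    """
    ok_y = _norm(grad_y) <= tau
    ok_x = _norm([min(xi, gi) for xi, gi in zip(x_next, grad_x)]) <= tau
    if ok_x and ok_y:
        return [mid(-1e6, yi, 1e6) for yi in y_next]     # componentwise safeguard
    return None


yE_new = m_iterate_estimate([0.0, 2.0], [5.0, -2.0e7], [3.0, 1e-9], [1e-9, -1e-9], 1e-6)
print(yE_new)  # [5.0, -1000000.0]
```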
2.4. Updating the penalty parameters
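As we only want to decrease $\mu^R_k$ when “close” to optimality (ignoring locally infeasible
problems), we use the definition
$$
\mu^R_{k+1}=\begin{cases}\min\big(\tfrac12\mu^R_k,\; \|r_k\|^{3/2}\big), & \text{if (2.17) is satisfied;}\\[2pt] \min\big(\mu^R_k,\; \|r_k\|^{3/2}\big), & \text{otherwise,}\end{cases} \tag{2.18}
$$
where
$$
r_{k+1}\equiv r_{\rm opt}(v_{k+1})\triangleq\begin{pmatrix} c(x_{k+1}) \\ \min\big(x_{k+1},\; g(x_{k+1})-J(x_{k+1})^Ty_{k+1}\big)\end{pmatrix}. \tag{2.19}
$$
The update to $\mu_k$ is motivated by a different goal. Namely, we wish to decrease $\mu_k$
only when the trial step indicates that the merit function with penalty parameter
$\mu_k$ increases. Thus, we use the definition
$$
\mu_{k+1}=\begin{cases}\mu_k, & \text{if } \mathcal{M}^\nu(v_{k+1}\,; y^E_k, \mu_k)\le\mathcal{M}^\nu(v_k\,; y^E_k, \mu_k)+\min(\alpha_{\min}, \alpha_k)\,\eta_SN_k;\\[2pt] \max\big(\tfrac12\mu_k,\; \mu^R_{k+1}\big), & \text{otherwise,}\end{cases} \tag{2.20}
$$
for some positive $\alpha_{\min}$. The use of the scalar $\alpha_{\min}$ increases the likelihood that $\mu_k$
will not be decreased.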
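The two parameter updates (2.18) and (2.20) translate directly into code. In the Python sketch below (function names hypothetical), the caller is assumed to supply $\|r_k\|$, the two merit values with penalty parameter $\mu_k$, and $N_k$ from (2.13). Because $N_k\le 0$, using $\min(\alpha_{\min}, \alpha_k)$ in place of $\alpha_k$ weakens the decrease requirement, which is why $\mu_k$ is then less likely to be reduced.

```python
def update_mu_R(mu_R, r_norm, m_iterate):
    """Regularization-parameter update (2.18); r_norm plays the role of ||r_k||."""
    cap = r_norm ** 1.5
    return min(0.5 * mu_R, cap) if m_iterate else min(mu_R, cap)


def update_mu(mu, mu_R_next, merit_new, merit_old, alpha, alpha_min, eta_S, N):
    """Penalty-parameter update (2.20).

    merit_new and merit_old are M^nu(v_{k+1}; y^E_k, mu_k) and M^nu(v_k; y^E_k, mu_k);
    N is the quantity N_k of (2.13).
    """
    if merit_new <= merit_old + min(alpha_min, alpha) * eta_S * N:
        return mu                       # sufficient decrease: keep mu_k
    return max(0.5 * mu, mu_R_next)     # otherwise halve, but not below mu^R_{k+1}


print(update_mu_R(1e-2, 0.1, False), update_mu_R(1e-2, 0.1, True))   # 0.01 0.005
print(update_mu(1.0, 1e-3, 0.9, 1.0, 0.5, 0.1, 0.5, -0.1),
      update_mu(1.0, 1e-3, 1.2, 1.0, 0.5, 0.1, 0.5, -0.1))           # 1.0 0.5
```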
2.5. Formal statement of the algorithm
In this section we formally state the proposed method as Algorithm 2.1 and include
some additional details. During each iteration, the trial step is computed as
described in Section 2.1, the solution estimate is updated as in Section 2.2, $y^E_k$ is
updated as in Section 2.3, and the penalty parameters are updated as in Section 2.4.
The value of $y^E_k$ is crucial for both global and local convergence. To this end, there
are three possibilities. First, $y^E_{k+1}$ is set to $y_{k+1}$ if $(x_{k+1}, y_{k+1})$ is acceptable to either
of the merit functions $\phi_S$ or $\phi_L$ given by (2.14). These iterates are labeled as S- and
L-iterates, respectively. It is to be expected that $y^E_k$ will be updated in this way
most of the time. Second, if $(x_{k+1}, y_{k+1})$ is not acceptable to either of the merit
functions $\phi_S$ or $\phi_L$, we check whether we have computed an approximate first-order
solution to problem (2.16) by verifying conditions (2.17) for the current value of $\tau_k$.
If these conditions are satisfied, the iterate is called an M-iterate. In this case, the
regularization parameter $\mu^R_k$ and subproblem tolerance $\tau_k$ are decreased and $y^E_k$ is
updated with the safeguarded estimate of Section 2.3. Finally, an iterate at which neither
of the first two cases occurs is called an F-iterate. The multiplier estimate $y^E_k$ is not
changed in an F-iterate.
Algorithm 2.1. Regularized primal-dual SQP algorithm (pdSQP)
Input $(x_0, y_0)$;
Set algorithm parameters $\alpha_{\min}>0$, $\eta_S\in(0,1)$, $\tau_{\rm stop}>0$, and $\nu>0$;
Initialize $y^E_0=y_0$, $\tau_0>0$, $\mu^R_0>0$, $\mu_0\in[\mu^R_0, \infty)$, and $k=0$;
Compute $f(x_0)$, $c(x_0)$, $g(x_0)$, $J(x_0)$, and $H(x_0, y_0)$;
for $k=0,1,2,\dots$ do
  Define $\bar H_k\approx H(x_k, y_k)$ such that $\bar H_k+(1/\mu^R_k)J_k^TJ_k$ is positive definite;
  Solve the QP (2.9) for the search direction $\Delta v_k=(p_k, q_k)$;
  Find an $\alpha_k$ satisfying (2.12) and (2.13);
  Update the primal-dual estimate $x_{k+1}=x_k+\alpha_kp_k$, $y_{k+1}=y_k+\alpha_kq_k$;
  Compute $f(x_{k+1})$, $c(x_{k+1})$, $g(x_{k+1})$, $J(x_{k+1})$, and $H(x_{k+1}, y_{k+1})$;
  if $\phi_S(x_{k+1}, y_{k+1})\le\frac12\phi^{\max}_S$ then  [S-iterate]
    $\phi^{\max}_S=\frac12\phi^{\max}_S$;
    $y^E_{k+1}=y_{k+1}$;
    $\tau_{k+1}=\tau_k$;
  else if $\phi_L(x_{k+1}, y_{k+1})\le\frac12\phi^{\max}_L$ then  [L-iterate]
    $\phi^{\max}_L=\frac12\phi^{\max}_L$;
    $y^E_{k+1}=y_{k+1}$;
    $\tau_{k+1}=\tau_k$;
  else if $v_{k+1}=(x_{k+1}, y_{k+1})$ satisfies (2.17) then  [M-iterate]
    $y^E_{k+1}=\operatorname{mid}(-10^6, y_{k+1}, 10^6)$;
    $\tau_{k+1}=\frac12\tau_k$;
  else  [F-iterate]
    $y^E_{k+1}=y^E_k$;
    $\tau_{k+1}=\tau_k$;
  end if
  Update $\mu^R_{k+1}$ and $\mu_{k+1}$ according to (2.18) and (2.20), respectively;
  if $\|r_k\|\le\tau_{\rm stop}$ then exit;
end (for)
3. Solution of the QP Subproblem
In this section we consider various theoretical and computational issues associated
with the QP subproblem (2.9). In particular, it is shown that the search direction
computed using subproblem (2.9) is the unique solution of the “stabilized” SQP
subproblem (2.11) and is independent of the value of $\nu$. Moreover, an active-set
method applied to problems (2.9) and (2.11) generates identical iterates, provided
a common (feasible) starting point is used.
3.1. Equivalence with Stabilized SQP
In this section it is shown that, under certain conditions, the regularized QP sub-
problem (2.9) is equivalent to the stabilized SQP subproblem (2.11). Equivalent
problems are considered in which the unknowns are written in terms of the steps
(p, q) for given variables (x, y).
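Theorem 3.1. Consider the bound-constrained QP

$$ \operatorname*{minimize}_{\Delta v = (p,q)}\; g_M^T \Delta v + \tfrac12 \Delta v^T H_M \Delta v \quad\text{subject to}\quad x + p \ge 0, \tag{3.1} $$

where $x$ and $y$ are constant,

$$ g_M = \begin{pmatrix} g - J^T\bigl(\pi + \nu(\pi - y)\bigr) \\ \nu\bigl(c + \mu(y - y^E)\bigr) \end{pmatrix}, \qquad H_M = \begin{pmatrix} H + \frac{1}{\mu}(1+\nu) J^T J & \nu J^T \\ \nu J & \nu\mu I \end{pmatrix}, $$

with $H + \frac{1}{\mu} J^T J$ positive definite and $\nu \ge 0$. For the same quantities $c$, $g$, $J$ and $H$, consider the stabilized QP problem

$$ \operatorname*{minimize}_{p,q}\; g^T p + \tfrac12 p^T H p + \tfrac12 \mu \|y + q\|^2 \quad\text{subject to}\quad c + Jp + \mu(y + q - y^E) = 0,\;\; x + p \ge 0. \tag{3.2} $$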
The following results hold.
(a) The stabilized QP (3.2) has a unique bounded primal-dual solution (p, q).
(b) The unique solution Δv = (p, q) of the stabilized QP (3.2) is a solution of the
bound-constrained QP (3.1) for all ν ≥ 0. If ν > 0, then the stabilized solution
Δv = (p, q) is the unique solution of (3.1).
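Proof. For part (a), let $\Delta v = (p, q)$ denote an arbitrary feasible point for the equality constraints of the stabilized QP (3.2). Given the particular feasible point $\Delta v_0 = (0, \pi - y)$, consider an $n$-vector of variables $w$ defined by the linear transformation

$$ \Delta v = \Delta v_0 + M w, \quad\text{where}\quad M = \begin{pmatrix} \mu I \\ -J \end{pmatrix}. $$

The matrix $M$ is $(n+m)\times n$ with rank $n$, and its columns form a basis for the null space of the constraint matrix $\begin{pmatrix} J & \mu I \end{pmatrix}$. Using this transformation (and scaling the objective by the positive constant $1/\mu$ and dropping a constant term) gives rise to the equivalent problem

$$ \operatorname*{minimize}_{w\in\mathbb{R}^n}\; \frac{\mu}{2} w^T\Bigl(H + \frac{1}{\mu} J^T J\Bigr) w + w^T(g - J^T\pi) \quad\text{subject to}\quad x + \mu w \ge 0. $$

The matrix $H + \frac{1}{\mu}J^T J$ is positive definite by assumption, and it follows that the stabilized QP (3.2) is equivalent to a convex program with a strictly convex objective. Moreover, the feasible region $\{w : x + \mu w \ge 0\}$ is nonempty because it is defined by lower bounds on $w$ alone. The existence of a unique bounded solution follows directly.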
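A quick numerical check of the change of variables used in part (a) may help. The sketch below uses randomly generated, purely illustrative data (an assumption of the sketch) and confirms that the columns of $M$ lie in the null space of $\begin{pmatrix} J & \mu I\end{pmatrix}$, that $\Delta v_0 = (0, \pi - y)$ satisfies the equality constraint of (3.2), and that the objective of (3.2) along $\Delta v_0 + Mw$ equals $\mu$ times the reduced objective plus a constant. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, mu = 5, 3, 0.1
B = rng.standard_normal((n, n))
H = B + B.T                                  # any symmetric H (illustrative data)
J = rng.standard_normal((m, n))
g = rng.standard_normal(n)
c, y, yE = (rng.standard_normal(m) for _ in range(3))
pi = yE - c / mu                             # pi = y^E - c/mu, as in (2.3)

M = np.vstack([mu * np.eye(n), -J])          # (n+m) x n null-space basis
dv0 = np.concatenate([np.zeros(n), pi - y])  # particular point (0, pi - y)

def eq_residual(dv):
    """Residual c + J p + mu (y + q - y^E) of the equality constraint in (3.2)."""
    p, q = dv[:n], dv[n:]
    return c + J @ p + mu * (y + q - yE)

def obj_32(dv):
    """Objective of the stabilized QP (3.2)."""
    p, q = dv[:n], dv[n:]
    return g @ p + 0.5 * p @ H @ p + 0.5 * mu * np.sum((y + q) ** 2)

def obj_reduced(w):
    """Objective of the reduced problem in w."""
    return 0.5 * mu * w @ (H + J.T @ J / mu) @ w + w @ (g - J.T @ pi)

w = rng.standard_normal(n)
dv = dv0 + M @ w
print(np.linalg.norm(eq_residual(dv)),
      obj_32(dv) - obj_32(dv0) - mu * obj_reduced(w))
```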
For part (b), we begin by stating the first-order conditions for $(p, q)$ to be a solution of the stabilized QP (3.2):

$$ c + Jp + \mu(y + q - y^E) = 0, \qquad \mu(y + q) = \mu w, $$
$$ g + Hp - J^T w - z = 0, \qquad z \ge 0, $$
$$ z\cdot(x + p) = 0, \qquad x + p \ge 0, $$

where $w$ and $z$ denote the dual variables for the equality and inequality constraints of problem (3.2), respectively. Eliminating $w$ using the equation $w = y + q$ gives

$$ c + Jp + \mu(y + q - y^E) = 0, \tag{3.3a} $$
$$ g + Hp - J^T(y + q) - z = 0, \qquad z \ge 0, \tag{3.3b} $$
$$ z\cdot(x + p) = 0, \qquad x + p \ge 0. \tag{3.3c} $$
First, we prove part (b) for the case $\nu > 0$. The optimality conditions for (3.1) are

$$ g_M + H_M\Delta v = \begin{pmatrix} z \\ 0 \end{pmatrix}, \quad z \ge 0, \qquad z\cdot(x + p) = 0, \quad x + p \ge 0. \tag{3.4} $$

Pre-multiplying the equality of (3.4) by the nonsingular matrix

$$ T = \begin{pmatrix} I & -\frac{1+\nu}{\nu\mu} J^T \\ 0 & \frac{1}{\nu} I \end{pmatrix}, $$

and using the definition (2.2a) yields the equivalent conditions

$$ g + Hp - J^T(y + q) - z = 0 \quad\text{and}\quad c + Jp + \mu(y + q - y^E) = 0, $$

which are identical to the relevant equalities in (3.3). Thus, the solutions of (3.2) and (3.1) are identical in the case $\nu > 0$.
It remains to consider the case $\nu = 0$. In this situation, the objective function of the QP (3.1) includes only the primal variables $p$, which implies that the problem may be written as

$$ \operatorname*{minimize}_{p}\; (g - J^T\pi)^T p + \tfrac12 p^T\Bigl(H + \frac{1}{\mu}J^T J\Bigr)p \quad\text{subject to}\quad x + p \ge 0, \tag{3.5} $$

with $q$ an arbitrary vector. Although there are infinitely many solutions of (3.1), the vector $p$ associated with a particular solution $(p, q)$ is unique because it is the solution of problem (3.5) for a positive-definite matrix $H + \frac{1}{\mu}J^T J$. The optimality conditions for (3.5) are

$$ g - J^T\pi + \Bigl(H + \frac{1}{\mu}J^T J\Bigr)p = z, \qquad z \ge 0, \tag{3.6} $$
$$ z\cdot(x + p) = 0, \qquad x + p \ge 0. $$
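For the given $y$ and optimal $p$, define the $m$-vector $q$ such that

$$ q = -\frac{1}{\mu}\bigl(Jp + c + \mu(y - y^E)\bigr) = -\frac{1}{\mu}\bigl(Jp + \mu(y - \pi)\bigr). \tag{3.7} $$

Equation (3.7) and the equality of (3.6) may be combined to give the matrix equation

$$ \begin{pmatrix} g - J^T y + 2J^T(y - \pi) \\ \mu(y - \pi) \end{pmatrix} + \begin{pmatrix} H + \frac{2}{\mu}J^T J & J^T \\ J & \mu I \end{pmatrix}\begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} z \\ 0 \end{pmatrix}. $$

Applying the nonsingular matrix $\begin{pmatrix} I & -\frac{2}{\mu}J^T \\ 0 & I \end{pmatrix}$ to both sides of this equation yields

$$ \begin{pmatrix} g - J^T y \\ c + \mu(y - y^E) \end{pmatrix} + \begin{pmatrix} H & -J^T \\ J & \mu I \end{pmatrix}\begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} z \\ 0 \end{pmatrix}. $$

These are the equalities of (3.3), so $(p, q)$ satisfies the optimality conditions of the convex problem (3.2) and is therefore its unique solution. Since $q$ is arbitrary in (3.1), it follows that if $\nu = 0$, then the unique solution of (3.2) is a solution of (3.1), which is what we wanted to show.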
When ν > 0, the uniqueness of the solution ∆v = (p, q ) follows from the obser-
vation that QP (3.1) is then convex with a strictly convex objective.
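Theorem 3.1 can be checked numerically in the special case where the bounds $x + p \ge 0$ are inactive, so that both QPs reduce to linear systems. The sketch below assumes that case (it places $x$ far from the bounds), uses random illustrative data, and, for $\nu = 0$, takes the particular $q$ of (3.7). The function names are hypothetical.

```python
import numpy as np

def stabilized_step(H, J, g, c, y, yE, mu):
    """(p, q) from the equalities (3.3) with z = 0 (bounds assumed inactive)."""
    n, m = H.shape[0], J.shape[0]
    K = np.block([[H, -J.T], [J, mu * np.eye(m)]])
    rhs = -np.concatenate([g - J.T @ y, c + mu * (y - yE)])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]

def bound_qp_step(H, J, g, c, y, yE, mu, nu):
    """Unconstrained minimizer of (3.1) (bounds assumed inactive)."""
    n, m = H.shape[0], J.shape[0]
    pi = yE - c / mu
    if nu > 0:
        gM = np.concatenate([g - J.T @ (pi + nu * (pi - y)),
                             nu * (c + mu * (y - yE))])
        HM = np.block([[H + (1 + nu) / mu * J.T @ J, nu * J.T],
                       [nu * J, nu * mu * np.eye(m)]])
        dv = np.linalg.solve(HM, -gM)        # H_M is positive definite for nu > 0
        return dv[:n], dv[n:]
    # nu = 0: p solves (3.5) without active bounds; q is the choice (3.7)
    p = np.linalg.solve(H + J.T @ J / mu, -(g - J.T @ pi))
    q = -(J @ p + mu * (y - pi)) / mu
    return p, q

rng = np.random.default_rng(1)
n, m, mu = 6, 3, 0.1
B = rng.standard_normal((n, n))
H = B @ B.T + n * np.eye(n)                  # positive definite, so H + J^T J/mu is too
J = rng.standard_normal((m, n))
g = rng.standard_normal(n)
c, y, yE = (rng.standard_normal(m) for _ in range(3))
x = np.full(n, 1e6)                          # far from the bounds, so x + p >= 0 holds

p_s, q_s = stabilized_step(H, J, g, c, y, yE, mu)
for nu in (0.0, 0.5, 1.0, 10.0):
    p_b, q_b = bound_qp_step(H, J, g, c, y, yE, mu, nu)
    print(nu, np.max(np.abs(p_b - p_s)), np.max(np.abs(q_b - q_s)))
```

With active bounds the same comparison would require a bound-constrained QP solver; the sketch avoids this on purpose.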
Theorem 3.1 shows that the direction defined by the bound-constrained QP is independent
of the parameter ν. Moreover, this direction may be defined as the solution of an
equivalent stabilized SQP subproblem (2.11) that does not include ν at all. However,
the parameter ν does appear explicitly in the definition of the merit function $M^\nu$
(2.1), and therefore plays an important role in influencing the length of the step
during the flexible line search. The value of ν determines the proximity of the
primal-dual iterates to the so-called "primal-dual trajectory", which is the
one-parameter family of points $(x(\mu), y(\mu))$ such that $x(\mu)$ is a minimizer of the
conventional augmented Lagrangian for fixed $y^E$. The definition of $M^\nu$ implies that
larger values of ν tend to force the iterates to be close to the primal-dual
trajectory. If ν = 0, then the method reverts to a regularized SQP method based on the
(primal) conventional augmented Lagrangian, for which no emphasis is placed on
staying close to the primal-dual trajectory. The algorithm may be modified to
allow the choice ν = 0 by always setting $y^E_{k+1} = \pi(x_{k+1})$; this does emphasize
the primal-dual trajectory, but only after the major iteration has been completed.
In contrast, the primal-dual augmented Lagrangian allows the dual variables to be
emphasized during the line search itself.
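The role of ν can be made explicit with a short calculation. Assume (an assumption here, since (2.1) is not reproduced in this section) that the ν-dependent part of $M^\nu$ is the term $\frac{\nu}{2\mu}\|c(x) + \mu(y - y^E)\|^2$; this is consistent with the $y$-components of $g_M$ and $H_M$ in Theorem 3.1. The definition $\pi(x) = y^E - c(x)/\mu$ then gives

$$ c(x) + \mu\,(y - y^E) = \mu\,\bigl(y - \pi(x)\bigr) \quad\Longrightarrow\quad \frac{\nu}{2\mu}\,\bigl\|c(x) + \mu\,(y - y^E)\bigr\|^2 = \frac{\nu\mu}{2}\,\bigl\|y - \pi(x)\bigr\|^2, $$

so ν weights the squared distance of the dual estimate $y$ from the first-order multiplier estimate $\pi(x)$. Its gradient with respect to $y$ is $\nu\mu(y - \pi) = \nu\bigl(c + \mu(y - y^E)\bigr)$ and its Hessian is $\nu\mu I$, matching the second block of $g_M$ and the $(2,2)$ block of $H_M$.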
3.2. Equivalent iterates of an active-set method
In Section 3.1 it was shown that if ν > 0, then the unique solutions of subproblems (2.9)
and (2.11) are identical, and if ν = 0, then the solution of (2.9) is no longer unique,
but there is a particular solution that is identical to the unique solution of (2.11). In
this section we continue our study of these subproblems by considering the iterates
that result when they are solved with an active-set method.
3.2.1. An active-set method
For the remainder of this section, the indices associated with the SQP iteration are
omitted and it is assumed that the constraints of the QP are the constraints of the
original problem linearized at the point $\bar x$. In all cases, the subscript $j$ is
reserved for the iteration index of the QP algorithm.
We start by defining a "conventional" active-set method for a generic convex QP
with constraints written in standard form. The problem format is

$$ \operatorname*{minimize}_{x}\; Q(x) = g^T(x - \bar x) + \tfrac12 (x - \bar x)^T H (x - \bar x) \quad\text{subject to}\quad c + A(x - \bar x) = 0,\;\; x \ge 0, \tag{3.8} $$

where $\bar x$, $c$, $A$, $g$ and $H$ are constant, with $H$ positive definite. Throughout, we
assume that the constraints are feasible, i.e., there exists at least one nonnegative $x$
such that $c + A(x - \bar x) = 0$.
Given a feasible $x_0$, active-set methods generate a feasible sequence $\{x_j\}$ such that $Q(x_{j+1}) \le Q(x_j)$ with $x_{j+1} = x_j + \alpha_j p_j$. Let the index sets $\mathcal{A}_0$ and $\mathcal{F}_0$ be defined as in (1.3). At the start of the $j$th QP iteration, given primal-dual iterates $(x_j, w_j)$, new estimates $(x_j + p_j, w_j + q_j)$ are defined by solving a QP formed by fixing the variables with indices in $\mathcal{A}_0(x_j)$ and defining $p_j$ such that $x_j + p_j$ minimizes $Q(x)$ with respect to the free variables, subject to the equality constraints. With this definition, the quantities $w_j + q_j$ are the Lagrange multipliers at the minimizer $x_j + p_j$. The components of $p_j$ with indices in $\mathcal{A}_0(x_j)$ are zero, and the free components $p_F = [\,p_j\,]_F$ are determined from the equations

$$ \begin{pmatrix} H_F & -A_F^T \\ A_F & 0 \end{pmatrix}\begin{pmatrix} p_F \\ q_j \end{pmatrix} = -\begin{pmatrix} [\,g + H(x_j - \bar x) - A^T w_j\,]_F \\ c + A(x_j - \bar x) \end{pmatrix}, \tag{3.9} $$

where $[\,\cdot\,]_F$ denotes the subvector of components with indices in $\mathcal{F}_0(x_j)$. The choice of step length $\alpha_j$ is based on remaining feasible with respect to the satisfied bounds. If $x_j + p_j$ is feasible, i.e., $x_j + p_j \ge 0$, then $\alpha_j$ is taken as unity. Otherwise, $\alpha_j$ is set to $\alpha_M$, the largest feasible step along $p_j$. Finally, the iteration index $j$ is incremented by one and the iteration is repeated.
It must be emphasized that this active-set method is not well defined unless the
equations (3.9) have a solution at every (xj, wj).
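One iteration of this method can be sketched as follows. This is a simplified illustration rather than a complete QP solver: it assumes that $\mathcal{A}_0(x) = \{i : x_i = 0\}$ and $\mathcal{F}_0(x) = \{i : x_i > 0\}$ (the definitions in (1.3) are not reproduced here), it omits the rule for releasing a fixed variable, and the function name is hypothetical.

```python
import numpy as np

def active_set_step(H, A, g, c, xbar, x, w):
    """One iteration of the active-set method of Section 3.2.1 on QP (3.8).

    Assumption: A0(x) = {i : x_i == 0} (fixed) and F0(x) = {i : x_i > 0} (free).
    Only the step (3.9) and the ratio test are shown; the rule for releasing
    a fixed variable is omitted.
    """
    n, m = x.size, A.shape[0]
    free = x > 0
    HF, AF = H[np.ix_(free, free)], A[:, free]
    nF = int(free.sum())
    K = np.block([[HF, -AF.T], [AF, np.zeros((m, m))]])
    rhs = -np.concatenate([(g + H @ (x - xbar) - A.T @ w)[free],
                           c + A @ (x - xbar)])
    sol = np.linalg.solve(K, rhs)              # raises if (3.9) is singular
    p = np.zeros(n)
    p[free] = sol[:nF]                         # fixed components of p stay zero
    q = sol[nF:]
    # Ratio test: largest alpha with x + alpha*p >= 0, capped at 1.
    ratios = np.full(n, np.inf)
    dec = p < 0
    ratios[dec] = -x[dec] / p[dec]
    r = int(np.argmin(ratios))
    alpha = min(1.0, ratios[r])
    x_new = x + alpha * p
    if ratios[r] < 1.0:
        x_new[r] = 0.0                         # the blocking bound becomes active exactly
    return x_new, w + alpha * q, alpha

# Usage: minimize g^T x + 1/2 ||x||^2 subject to sum(x) = 1, x >= 0.
H, A = np.eye(3), np.ones((1, 3))
g, c, xbar = np.array([1.0, 2.0, 3.0]), np.array([-1.0]), np.zeros(3)
x, w = np.full(3, 1.0 / 3.0), np.zeros(1)
for j in range(2):
    x, w, alpha = active_set_step(H, A, g, c, xbar, x, w)
    print(j, alpha, x, w)
```

On this small example (chosen only for illustration) the first step is blocked by the bound on $x_3$ with $\alpha_0 = 1/3$, and the second, unit step reaches $x = (1, 0, 0)$ with multiplier $w = 2$, which satisfies the optimality conditions of the toy problem.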
3.2.2. Solution of the bound-constrained subproblem
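In this section we apply the active-set method to a QP of the form

$$ \operatorname*{minimize}_{v = (x,y)}\; g_M^T(v - \bar v) + \tfrac12 (v - \bar v)^T H_M (v - \bar v) \quad\text{subject to}\quad x \ge 0, \tag{3.10} $$

where $\bar v = (\bar x, \bar y)$, and

$$ g_M = \begin{pmatrix} g - J^T\bigl(\pi + \nu(\pi - \bar y)\bigr) \\ \nu\bigl(c + \mu(\bar y - y^E)\bigr) \end{pmatrix}, \qquad H_M = \begin{pmatrix} H + \frac{1}{\mu}(1+\nu)J^T J & \nu J^T \\ \nu J & \nu\mu I \end{pmatrix}, $$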
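with $H + \frac{1}{\mu}J^T J$ positive definite, $\nu \ge 0$, and $\pi = y^E - c/\mu$ (see (2.3)). The matrix $H_M$ is positive semidefinite under the given assumptions. This follows from the identity

$$ L^T H_M L = \begin{pmatrix} H + \frac{1}{\mu}J^T J & 0 \\ 0 & \nu\mu I_m \end{pmatrix}, \quad\text{where}\quad L = \begin{pmatrix} I_n & 0 \\ -\frac{1}{\mu}J & I_m \end{pmatrix}. $$

The matrix $L$ is nonsingular, and Sylvester's Law of Inertia gives, with $\mathrm{In}(\cdot)$ denoting the numbers of positive, negative and zero eigenvalues,

$$ \mathrm{In}(H_M) = \mathrm{In}(L^T H_M L) = \mathrm{In}\Bigl(H + \frac{1}{\mu}J^T J\Bigr) + (m, 0, 0) = (n+m, 0, 0) \quad\text{for } \nu > 0, $$

and

$$ \mathrm{In}(H_M) = \mathrm{In}\Bigl(H + \frac{1}{\mu}J^T J\Bigr) + (0, 0, m) = (n, 0, m) \quad\text{for } \nu = 0. $$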
It follows that problem (3.10) is a convex QP, and we may apply the active-set
method of Section 3.2.1.
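The inertia argument can be checked numerically. The sketch below uses random illustrative data with $H$ positive definite, builds $H_M$ for $\nu = 1$ and $\nu = 0$, counts eigenvalues by sign using a relative zero tolerance (an assumption of the sketch), and confirms that $L^T H_M L$ is block diagonal. The helper names are hypothetical.

```python
import numpy as np

def inertia(S, rtol=1e-10):
    """(#positive, #negative, #zero) eigenvalues of symmetric S; zero is relative to max |eig|."""
    ev = np.linalg.eigvalsh(S)
    tol = rtol * max(1.0, np.max(np.abs(ev)))
    return int(np.sum(ev > tol)), int(np.sum(ev < -tol)), int(np.sum(np.abs(ev) <= tol))

def H_M(H, J, mu, nu):
    """The matrix H_M of (3.10)."""
    m = J.shape[0]
    return np.block([[H + (1 + nu) / mu * J.T @ J, nu * J.T],
                     [nu * J, nu * mu * np.eye(m)]])

rng = np.random.default_rng(2)
n, m, mu = 4, 2, 0.1
B = rng.standard_normal((n, n))
H = B @ B.T + np.eye(n)                      # positive definite (illustrative)
J = rng.standard_normal((m, n))
L = np.block([[np.eye(n), np.zeros((n, m))], [-J / mu, np.eye(m)]])

for nu in (1.0, 0.0):
    HM = H_M(H, J, mu, nu)
    D = L.T @ HM @ L                         # expected: blkdiag(H + J^T J/mu, nu*mu*I)
    print(nu, inertia(HM), np.allclose(D[:n, n:], 0.0, atol=1e-9))
```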
Given the $j$th QP iterate $v_j = (x_j, y_j)$, the generic active-set method applied to (3.10) defines the next iterate as $v_{j+1} = v_j + \alpha_j\Delta v_j$, where the free components of the vector $\Delta v_j = (p_j, q_j)$ satisfy the equations

$$ [H_M]_F\,\Delta v_F = -[\,g_M + H_M(v_j - \bar v)\,]_F, \tag{3.11} $$

where $\Delta v_F = (p_F, q_j)$ and the index set $\mathcal{F}_0(x_j)$ is defined as in (1.3). (Since (3.10) has no bounds on $y$, all components of $y$ are free.) The equations (3.11) appear to be ill-conditioned for small $\mu$ because of the $O(1/\mu)$ term in the (1,1) block of the matrix $H_M$. However, this ill-conditioning is superficial. The next result shows that $\Delta v_F$ may be determined by solving an equivalent nonsingular primal-dual system whose conditioning depends on that of the original problem.
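Theorem 3.2. Consider the application of the active-set method to the QP (3.10). Then, for every $\nu \ge 0$, there exists a positive $\bar\mu$ such that, for all $0 < \mu < \bar\mu$, the free components of the QP search direction $(p_j, q_j)$ satisfy the nonsingular primal-dual system

$$ \begin{pmatrix} H_F & -J_F^T \\ J_F & \mu I \end{pmatrix}\begin{pmatrix} p_F \\ q_j \end{pmatrix} = -\begin{pmatrix} [\,g + H(x_j - \bar x) - J^T y_j\,]_F \\ c + \mu(y_j - y^E) + J(x_j - \bar x) \end{pmatrix}. \tag{3.12} $$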
Proof. First, we consider the definition of the search direction when $\nu > 0$. In this case it suffices to show that the linear systems (3.11) and (3.12) are equivalent. For any positive $\nu$, we may define the matrix

$$ T = \begin{pmatrix} I & -\frac{1+\nu}{\nu\mu}J_F^T \\ 0 & \frac{1}{\nu}I_m \end{pmatrix}, $$

where the identity matrix $I$ has dimension $n_F$, the column dimension of $J_F$. The matrix $T$ is nonsingular with $n_F + m$ rows and columns. It follows that the equations

$$ T[H_M]_F\,\Delta v_F = -T[\,g_M + H_M(v_j - \bar v)\,]_F $$
have the same solution as those of (3.11). The primal-dual equations (3.12) follow
by direct multiplication. The nonsingularity of the equations (3.12) follows from the
nonsingularity of $T$ and the fact that, for $\nu > 0$, $H_M$ is positive definite, so that
every principal submatrix formed from its rows and columns (in particular $[H_M]_F$)
is nonsingular.
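The resulting equations (3.12) are independent of $\nu$, but the simple proof above is not applicable when $\nu = 0$ because $T$ is undefined in this case. For $\nu = 0$, the QP objective includes only the primal variables $x$, which implies that problem (3.10) may be written as

$$ \operatorname*{minimize}_{x \ge 0}\; (g - J^T\pi)^T(x - \bar x) + \tfrac12 (x - \bar x)^T\Bigl(H + \frac{1}{\mu}J^T J\Bigr)(x - \bar x), $$

with $y$ arbitrary. The active-set equations analogous to (3.11) are then

$$ \Bigl(H_F + \frac{1}{\mu}J_F^T J_F\Bigr)p_F = -\Bigl[\,g + \Bigl(H + \frac{1}{\mu}J^T J\Bigr)(x_j - \bar x) - J^T\pi\,\Bigr]_F. \tag{3.13} $$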
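For any choice of $y_j$, define the $m$-vector $q_j$ such that

$$ q_j = -\frac{1}{\mu}\bigl(J_F p_F + \mu(y_j - \pi) + J(x_j - \bar x)\bigr), \tag{3.14} $$

where $\pi = y^E - c/\mu$ (see (2.3)). Equations (3.13) and (3.14) may be combined to give equations $K\Delta v_F = -r$, where $\Delta v_F = (p_F, q_j)$,

$$ K = \begin{pmatrix} H_F + \frac{2}{\mu}J_F^T J_F & J_F^T \\ J_F & \mu I \end{pmatrix} $$

and right-hand side

$$ r = \begin{pmatrix} [\,g + H(x_j - \bar x)\,]_F + \frac{2}{\mu}J_F^T J(x_j - \bar x) - J_F^T y_j + 2J_F^T(y_j - \pi) \\ \mu(y_j - \pi) + J(x_j - \bar x) \end{pmatrix}. $$

Forming the equations $TK\Delta v_F = -Tr$, where $T$ is the nonsingular matrix

$$ T = \begin{pmatrix} I & -\frac{2}{\mu}J_F^T \\ 0 & I_m \end{pmatrix}, $$

gives the equivalent system

$$ \begin{pmatrix} H_F & -J_F^T \\ J_F & \mu I \end{pmatrix}\begin{pmatrix} p_F \\ q_j \end{pmatrix} = -\begin{pmatrix} [\,g + H(x_j - \bar x) - J^T y_j\,]_F \\ c + \mu(y_j - y^E) + J(x_j - \bar x) \end{pmatrix}, $$

which is identical to the system (3.12).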
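The claim that the ill-conditioning of (3.11) is superficial can be illustrated on a small example. The sketch below assumes that every $x$-component is free (so $[\,\cdot\,]_F$ selects all components), uses made-up data with $\mu = 10^{-3}$ and $\nu = 1$, solves (3.11) and (3.12), and compares the solutions and condition numbers.

```python
import numpy as np

# Illustrative data; all x-components are assumed free.
H = np.diag([2.0, 3.0, 4.0])
J = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
n, m = 3, 2
g = np.array([1.0, -1.0, 0.5])
c = np.array([0.3, -0.2])
yE = np.array([1.0, 2.0])
ybar = np.array([0.5, -0.5])
d = np.array([0.1, 0.2, -0.1])          # x_j - xbar
yj = np.array([0.7, 1.1])               # current y_j
mu, nu = 1e-3, 1.0
pi = yE - c / mu

# System (3.11): H_M has an O(1/mu) (1,1) block.
gM = np.concatenate([g - J.T @ (pi + nu * (pi - ybar)), nu * (c + mu * (ybar - yE))])
HM = np.block([[H + (1 + nu) / mu * J.T @ J, nu * J.T], [nu * J, nu * mu * np.eye(m)]])
dv_311 = np.linalg.solve(HM, -(gM + HM @ np.concatenate([d, yj - ybar])))

# System (3.12): the equivalent primal-dual form, independent of nu and ybar.
K = np.block([[H, -J.T], [J, mu * np.eye(m)]])
rhs = -np.concatenate([g + H @ d - J.T @ yj, c + mu * (yj - yE) + J @ d])
dv_312 = np.linalg.solve(K, rhs)

print(np.linalg.cond(HM), np.linalg.cond(K), np.max(np.abs(dv_311 - dv_312)))
```

For this data $\lambda_{\max}(H_M) \ge 6/\mu$ and $\lambda_{\min}(H_M) \le \nu\mu$, so the condition number of $H_M$ grows at least like $1/\mu^2$, while the condition number of the matrix in (3.12) stays bounded as $\mu \to 0$ because the limiting matrix with $\mu = 0$ is nonsingular when $H$ is positive definite and $J$ has full row rank.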
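Theorem 3.3. Let $(p_k, q_k)$ be the solution of the QP subproblem (2.7). If $p_F$ denotes the components of $p_k$ with indices in $\mathcal{F}_0(x_k + p_k)$, then $(p_F, q_k)$ satisfies the equations

$$ \begin{pmatrix} \bar H_F & -J_F^T \\ J_F & \mu I \end{pmatrix}\begin{pmatrix} p_F \\ q_k \end{pmatrix} = -\begin{pmatrix} [\,g_k - J_k^T y_k - \bar H_k s\,]_F \\ c_k + \mu(y_k - y^E) - J_k s \end{pmatrix}, $$

where $F$ is defined in terms of the set $\mathcal{F}_0(x_k + p_k)$ and $s$ is a nonnegative vector such that

$$ s_i = \begin{cases} [x_k]_i & \text{if } i \in \mathcal{A}_0(x_k + p_k), \\ 0 & \text{if } i \in \mathcal{F}_0(x_k + p_k). \end{cases} $$

Proof. The proof is analogous to that of Theorem 3.2.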
3.2.3. Solution of the stabilized SQP subproblem
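In this section we show that, under certain conditions, the conventional active-set method applied to the stabilized SQP subproblem (3.2) and the bound-constrained QP (3.1) will generate identical iterates.

Consider the application of the "generic" active-set method of Section 3.2.1 to the stabilized QP:

$$ \operatorname*{minimize}_{x,y}\; g^T(x - \bar x) + \tfrac12 (x - \bar x)^T H (x - \bar x) + \tfrac12\mu\|y\|^2 \quad\text{subject to}\quad c + J(x - \bar x) + \mu(y - y^E) = 0,\;\; x \ge 0. \tag{3.15} $$

In terms of the data "$(x, \bar x, H, g, A, c)$" for the generic QP (3.8), we have variables "$x$" $= (x, y)$, with "$\bar x$" $= (\bar x, \bar y)$,

$$ \text{“}H\text{”} = \begin{pmatrix} H & 0 \\ 0 & \mu I \end{pmatrix}, \quad \text{“}g\text{”} = \begin{pmatrix} g \\ \mu\bar y \end{pmatrix}, \quad \text{“}A\text{”} = \begin{pmatrix} J & \mu I \end{pmatrix}, \quad\text{and}\quad \text{“}c\text{”} = c + \mu(\bar y - y^E). $$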
(The discussion of the properties of the stabilized QP relative to the generic form (3.8) is not affected by the nonnegativity constraints being applied to only a subset of the variables in (3.15).) After some simplification, the equations analogous to (3.9) may be written as

$$ \begin{pmatrix} H_F & 0 & -J_F^T \\ 0 & \mu I & -\mu I \\ J_F & \mu I & 0 \end{pmatrix}\begin{pmatrix} p_F \\ \bar p_F \\ q_j \end{pmatrix} = -\begin{pmatrix} [\,g + H(x_j - \bar x) - J^T w_j\,]_F \\ \mu y_j - \mu w_j \\ c + \mu(y_j - y^E) + J(x_j - \bar x) \end{pmatrix}, \tag{3.16} $$

where $p_F$ and $\bar p_F$ denote the free components of the search directions for the $x$ and $y$ variables, respectively. (Observe that the right-hand side of (3.16) is independent of $\bar y$.) The second block of equations gives $\bar p_F = q_j - y_j + w_j$, which implies that

$$ y_{j+1} = y_j + \bar p_F = y_j + q_j - y_j + w_j = w_j + q_j = w_{j+1}, $$

so that the primal $y$-variables and dual variables of the stabilized QP are identical.
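Similarly, substituting for $\bar p_F$ in the third block of equations in (3.16), and using the primal-dual equivalence $w_j = y_j$, gives

$$ \begin{pmatrix} H_F & -J_F^T \\ J_F & \mu I \end{pmatrix}\begin{pmatrix} p_F \\ q_j \end{pmatrix} = -\begin{pmatrix} [\,g + H(x_j - \bar x) - J^T y_j\,]_F \\ c + \mu(y_j - y^E) + J(x_j - \bar x) \end{pmatrix}, \tag{3.17} $$

which are identical to the equations (3.12) for the QP subproblem (3.10).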
The preceding discussion constitutes a proof of the following result.
Theorem 3.4. Consider the application of the active-set method to the bound-constrained QP (3.10) and the stabilized QP (3.15) defined with the same quantities $c$, $g$, $J$ and $H$. Consider any $x_0$ and $y_0$ such that $(x_0, y_0)$ is feasible for the stabilized QP (3.15). Then, for every $\nu \ge 0$, there exists a positive $\bar\mu$ such that, for all $0 < \mu < \bar\mu$, the active-set method generates identical primal-dual iterates $\{(x_j, y_j)\}_{j\ge0}$.
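The equivalence argument for (3.16) and (3.17) can be checked numerically. The sketch below assumes every $x$-component is free, uses random illustrative data, and verifies that $y_j + \bar p_F = w_j + q_j$ for an arbitrary $w_j$, and that the solution of (3.16) with $w_j = y_j$ coincides with that of (3.17). Function names are hypothetical.

```python
import numpy as np

def solve_316(H, J, g, c, yE, mu, d, yj, wj):
    """Solve the three-block system (3.16) with every x-component free; d = x_j - xbar."""
    n, m = H.shape[0], J.shape[0]
    K3 = np.block([[H, np.zeros((n, m)), -J.T],
                   [np.zeros((m, n)), mu * np.eye(m), -mu * np.eye(m)],
                   [J, mu * np.eye(m), np.zeros((m, m))]])
    rhs = -np.concatenate([g + H @ d - J.T @ wj,
                           mu * yj - mu * wj,
                           c + mu * (yj - yE) + J @ d])
    sol = np.linalg.solve(K3, rhs)
    return sol[:n], sol[n:n + m], sol[n + m:]     # p_F, pbar_F, q_j

def solve_317(H, J, g, c, yE, mu, d, yj):
    """Solve the two-block system (3.17)."""
    n, m = H.shape[0], J.shape[0]
    K = np.block([[H, -J.T], [J, mu * np.eye(m)]])
    rhs = -np.concatenate([g + H @ d - J.T @ yj, c + mu * (yj - yE) + J @ d])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]

rng = np.random.default_rng(3)
n, m, mu = 5, 2, 0.05
B = rng.standard_normal((n, n))
H = B @ B.T + np.eye(n)                           # positive definite (illustrative)
J = rng.standard_normal((m, n))
g, d = rng.standard_normal(n), rng.standard_normal(n)
c, yE, yj, wj = (rng.standard_normal(m) for _ in range(4))

p3, pbar, q3 = solve_316(H, J, g, c, yE, mu, d, yj, wj)
print(np.max(np.abs((yj + pbar) - (wj + q3))))
```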
4. Convergence
The convergence of Algorithm 2.1 is discussed under the following assumptions.
Assumption 4.1. Each $\bar H(x_k, y_k)$ is chosen so that the sequence $\{\bar H(x_k, y_k)\}_{k\ge0}$ is bounded, with $\{\bar H(x_k, y_k) + (1/\mu^R_k)J(x_k)^T J(x_k)\}_{k\ge0}$ uniformly positive definite.

Assumption 4.2. The functions $f$ and $c$ are twice continuously differentiable.

Assumption 4.3. The sequence $\{x_k\}_{k\ge0}$ is contained in a compact set.
In the "worst" case, i.e., when all iterates are eventually M-iterates or F-iterates,
Algorithm 2.1 emulates a primal-dual augmented Lagrangian method [9,10,43].
Consequently, it is possible that $y^E_k$ and $\mu^R_k$ will remain fixed over a sequence of
iterations, although this is rare in practice. Nonetheless, our convergence result
must consider this situation, which we now investigate.
Theorem 4.1. Let Assumptions 4.1–4.3 hold. If there exists an integer $\hat k$ such that $\mu^R_k \equiv \mu^R > 0$ and $k$ is an F-iterate for all $k \ge \hat k$, then the following hold:
(i) the solutions $\{\Delta v_k\}_{k\ge\hat k}$ to subproblem (2.9) are uniformly bounded;
(ii) the solutions $\{\Delta v_k\}_{k\ge\hat k}$ to subproblem (2.9) are bounded away from zero; and
(iii) there exists a constant $\epsilon > 0$ such that

$$ \nabla M^\nu(v_k; y^E_k, \mu^R_k)^T\Delta v_k \le -\epsilon \quad\text{for all } k \ge \hat k. $$
Proof. The assumptions of this theorem guarantee that

$$ \tau_k \equiv \tau > 0, \quad \mu^R_k = \mu^R, \quad\text{and}\quad y^E_k = y^E \quad\text{for all } k \ge \hat k. \tag{4.1} $$
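We first prove part (i). As in the proof of Theorem 3.1, we know that the solution to (2.9) satisfies

$$ \Delta v_k = \begin{pmatrix} 0 \\ \pi_k - y_k \end{pmatrix} + M_k w^*, \quad\text{where}\quad M_k = \begin{pmatrix} \mu^R I \\ -J_k \end{pmatrix}, $$

and $w^*$ is the unique solution of

$$ \operatorname*{minimize}_{w\in\mathbb{R}^n}\; \frac{\mu^R}{2}w^T\Bigl(\bar H_k + \frac{1}{\mu^R}J_k^T J_k\Bigr)w + w^T(g_k - J_k^T\pi_k) \quad\text{subject to}\quad x_k + \mu^R w \ge 0, $$

for all $k \ge \hat k$. It follows from Assumption 4.1 that $\{\Delta v_k\}_{k\ge\hat k}$ is uniformly bounded provided that the quantities $g_k - J_k^T\pi_k$, $M_k$, $\pi_k$, and $y_k$ are all uniformly bounded for $k \ge \hat k$. The boundedness of $g_k - J_k^T\pi_k$, $\pi_k$ and $M_k$ follows from Assumption 4.2, Assumption 4.3, (4.1), and (2.3). Thus, it remains to prove that $\{y_k\}_{k\ge\hat k}$ is bounded.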
To this end, we first note that since $\mu^R_k = \mu^R$ for all $k \ge \hat k$, the update to $\mu_k$ given by (2.20) implies that $\mu_k \equiv \mu \ge \mu^R$ for some $\mu$ and all $k$ sufficiently large. From this point onward the primal-dual merit function is monotonically nonincreasing, i.e., $M^\nu(x_{k+1}, y_{k+1}; y^E, \mu) \le M^\nu(x_k, y_k; y^E, \mu)$. Thus $\{y_k\}_{k\ge\hat k}$ must be bounded: if there existed a subsequence along which $\|y_k\|$ diverged to infinity, then $M^\nu$ would also diverge to infinity along that same subsequence, because both $\{f_k - c_k^T y^E + \frac{1}{2\mu}\|c_k\|^2\}_{k\ge\hat k}$ and $\{c_k\}_{k\ge\hat k}$ are bounded by Assumptions 4.2 and 4.3. This completes the proof of part (i).
Part (ii) is established by showing that $\{\Delta v_k\}_{k\ge\hat k}$ is bounded away from zero. If this were not the case, there would exist a subsequence $S_1 \subseteq \{k : k \ge \hat k\}$ such that $\lim_{k\in S_1}\Delta v_k = 0$. It follows that the solution $\Delta v_k$ to problem (2.9) satisfies

$$ \begin{pmatrix} z_k \\ 0 \end{pmatrix} = H^\nu_M(v_k; \mu^R)\Delta v_k + \nabla M^\nu(v_k; y^E, \mu^R) \quad\text{and}\quad 0 = \min(x_k + p_k, z_k) $$

for all $k \in S_1$. We may then conclude from the definition of $H^\nu_M$, Assumptions 4.1–4.3, and (4.1) that for $k \in S_1$ sufficiently large, iterate $v_k$ will satisfy condition (2.17) and be an M-iterate, so that $\mu^R_k$ will be decreased. This contradicts the assumption that $\mu^R_k \equiv \mu^R$ for all $k \ge \hat k$. It follows that $\{\|\Delta v_k\|\}_{k\ge\hat k}$ is bounded away from zero and part (ii) holds.
The proof of part (iii) is also by contradiction. Assume that there exists a subsequence $S_2$ of $\{k : k \ge \hat k\}$ such that

$$ \lim_{k\in S_2}\nabla M^\nu(v_k; y^E, \mu^R)^T\Delta v_k = 0, \tag{4.2} $$

where we have used (4.1). Using the matrix

$$ L_k = \begin{pmatrix} I & 0 \\ -\frac{1}{\mu^R}J_k & I \end{pmatrix}, $$

the fact that $\Delta v = 0$ is feasible for the convex problem (2.9), that $\Delta v_k$ is the solution to problem (2.9) for the $\nu > 0$ chosen in Algorithm 2.1, (4.1), and Assumption 4.1, it
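follows that

$$ \begin{aligned} -\nabla M^\nu(v_k; y^E, \mu^R)^T\Delta v_k &\ge \tfrac12\Delta v_k^T H^\nu_M(v_k; \mu^R)\Delta v_k \\ &= \tfrac12\Delta v_k^T L_k^{-T}L_k^T H^\nu_M(v_k; \mu^R)L_k L_k^{-1}\Delta v_k \\ &= \tfrac12\begin{pmatrix} p_k \\ q_k + \frac{1}{\mu^R}J_k p_k \end{pmatrix}^T\begin{pmatrix} \bar H_k + \frac{1}{\mu^R}J_k^T J_k & 0 \\ 0 & \nu\mu^R I \end{pmatrix}\begin{pmatrix} p_k \\ q_k + \frac{1}{\mu^R}J_k p_k \end{pmatrix} \\ &\ge \tfrac12\Bigl(\lambda_{\min}\|p_k\|^2 + \nu\mu^R\bigl\|q_k + (1/\mu^R)J_k p_k\bigr\|^2\Bigr), \end{aligned} $$

for some $\lambda_{\min} > 0$. Combining this with (4.2), we deduce that

$$ \lim_{k\in S_2}p_k = \lim_{k\in S_2}\bigl(q_k + (1/\mu^R)J_k p_k\bigr) = 0, $$

from which $\lim_{k\in S_2}q_k = 0$ follows from Assumptions 4.2 and 4.3. Hence $\lim_{k\in S_2}\Delta v_k = 0$, which contradicts part (ii). It follows that part (iii) must hold.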
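The identity used in the proof of part (iii) can be verified numerically. In the sketch below, which uses random illustrative data, $H^\nu_M(v_k; \mu^R)$ is assumed to have the form of $H_M$ in Section 3.2.2 with $\bar H_k$ in place of $H$ and $\mu^R$ in place of $\mu$; the code checks that $L_k^{-1}\Delta v = (p, q + (1/\mu^R)Jp)$, that the two quadratic forms agree, and that the final lower bound holds.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, muR, nu = 5, 3, 0.2, 1.0
B = rng.standard_normal((n, n))
Hbar = B @ B.T + np.eye(n)                    # illustrative H-bar_k
J = rng.standard_normal((m, n))               # illustrative J_k
HM = np.block([[Hbar + (1 + nu) / muR * J.T @ J, nu * J.T],
               [nu * J, nu * muR * np.eye(m)]])
Lk = np.block([[np.eye(n), np.zeros((n, m))],
               [-J / muR, np.eye(m)]])

p, q = rng.standard_normal(n), rng.standard_normal(m)
dv = np.concatenate([p, q])
z = np.linalg.solve(Lk, dv)                   # L_k^{-1} dv
u = q + J @ p / muR                           # expected second block of z

quad = 0.5 * dv @ HM @ dv
split = 0.5 * (p @ (Hbar + J.T @ J / muR) @ p + nu * muR * u @ u)
lam_min = np.linalg.eigvalsh(Hbar + J.T @ J / muR).min()
lower = 0.5 * (lam_min * p @ p + nu * muR * u @ u)
print(quad, split, lower)
```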
We may now state our convergence result for Algorithm 2.1.
Theorem 4.2. Let Assumptions 4.1–4.3 hold. If $v_k$ denotes the $k$th iterate generated by Algorithm 2.1, then either
(i) Algorithm 2.1 terminates with an approximate primal-dual first-order solution $v_K$ satisfying

$$ \|r_{\mathrm{opt}}(v_K)\| \le \tau_{\mathrm{stop}}, $$

where $r_{\mathrm{opt}}$ is defined by (2.19); or
(ii) there exists a subsequence $S$ such that $\lim_{k\in S}\mu^R_k = 0$, $\{y^E_k\}_{k\in S}$ is bounded, $\lim_{k\in S}\tau_k = 0$, and for each $k \in S$ the vector $v_{k+1}$ is an approximate minimizer of the primal-dual augmented Lagrangian function (2.1) satisfying (2.17).
Proof. There are two cases to consider.
Case 1. A subsequence of $\{\|r_{\mathrm{opt}}(v_k)\|\}_{k\ge0}$ converges to zero.
In this case it is clear from the definition of S-iterates, M-iterates, $\phi_S$, and $\phi_L$, and
the fact that $\tau_{\mathrm{stop}} > 0$, that part (i) will be satisfied for some $K$ sufficiently large.
Case 2. The sequence $\{\|r_{\mathrm{opt}}(v_k)\|\}_{k\ge0}$ is bounded away from zero.
Using the definitions of S-iterates and M-iterates and of the functions $\phi_S$ and $\phi_L$, we may
conclude that the number of S-iterates and L-iterates must be finite. We now claim
that there must be an infinite number of M-iterates. To prove this, we assume to the
contrary that the number of M-iterates is finite, so that all iterates are F-iterates
for $k$ sufficiently large. It then follows from the update to $\mu^R_k$ given by (2.18) and the
assumption of this case that $\mu^R_k$ is eventually never decreased any further, and
the update to $\mu_k$ given by (2.20) implies that $\mu_k$ is also eventually fixed. Gathering these facts gives the existence of an integer $\hat k$ such that

$$ \mu^R_k \equiv \mu^R \le \mu \equiv \mu_k, \quad y^E_k \equiv y^E, \quad \tau_k \equiv \tau > 0, \quad\text{and } k \text{ is an F-iterate for all } k \ge \hat k. $$

It then follows from (2.20) that

$$ M^\nu(v_{k+1}; y^E, \mu) \le M^\nu(v_k; y^E, \mu) + \min(\alpha_{\min}, \alpha_k)\,\eta_S N_k \quad\text{for all } k \ge \hat k, \tag{4.3} $$

where $N_k$ is defined by (2.13). Moreover, parts (ii) and (iii) of Theorem 4.1 ensure that $\{N_k\}_{k\ge\hat k}$ is a negative sequence bounded away from zero. We also claim that $\{\alpha_k\}_{k\ge\hat k}$ is bounded away from zero. To see this, we first note that parts (i) and (iii) of Theorem 4.1 and Assumption 4.2 would ensure that $\{\alpha_k\}_{k\ge\hat k}$ is bounded away from zero if a standard Armijo line search were used, i.e., if $\mu^F_k = \mu^R$ and $N_k = \Delta v_k^T\nabla M^\nu(v_k; y^E, \mu^R)$ in (2.12). However, the $\alpha_k$ that we actually compute can be no smaller, since the actual definition of $N_k$ is less restrictive and we use a flexible line search that makes step acceptance more likely. Combining these facts with (4.3), we conclude that

$$ M^\nu(v_{k+1}; y^E, \mu) \le M^\nu(v_k; y^E, \mu) - \kappa \quad\text{for all } k \ge \hat k \text{ and some } \kappa > 0, $$

so that

$$ \lim_{k\to\infty} M^\nu(v_k; y^E, \mu) = -\infty. $$

However, Assumptions 4.2 and 4.3 ensure that this is not possible. This contradiction shows that there are infinitely many M-iterates, and that all iterates are M-iterates or F-iterates for $k$ sufficiently large. Part (ii) now follows from (2.18) and the properties of the updates to $\tau_k$ and $y^E_k$ used for M-iterates and F-iterates in Algorithm 2.1.
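The comparison with a standard Armijo line search in the proof of Theorem 4.2 refers to backtracking of the following kind. The sketch below is the textbook Armijo rule, not the flexible line search (2.12)–(2.13) of Algorithm 2.1; the merit function in the usage example is a toy, and the parameter values `eta` and `shrink` are illustrative assumptions.

```python
import numpy as np

def armijo_backtracking(merit, v, dv, slope, eta=1e-4, shrink=0.5, max_trials=60):
    """Standard Armijo backtracking (not the flexible line search of (2.12)).

    Returns the first alpha in {1, shrink, shrink**2, ...} such that
    merit(v + alpha*dv) <= merit(v) + eta*alpha*slope, where slope < 0 is the
    directional derivative of the merit function along dv.
    """
    if slope >= 0:
        raise ValueError("dv is not a descent direction")
    m0 = merit(v)
    alpha = 1.0
    for _ in range(max_trials):
        if merit(v + alpha * dv) <= m0 + eta * alpha * slope:
            return alpha
        alpha *= shrink
    raise RuntimeError("no acceptable step length found")

def toy_merit(v):
    """Toy merit function 0.5*||v||^2 (illustration only)."""
    return 0.5 * float(v @ v)

# Usage: an overly long descent step is cut back by successive halving.
v = np.array([1.0, 1.0])
dv = -4.0 * v
alpha = armijo_backtracking(toy_merit, v, dv, slope=float(v @ dv))
print(alpha)
```

When the slope is bounded above by $-\epsilon$, $\|\Delta v_k\|$ is bounded, and the gradient of the merit function is Lipschitz continuous, the step accepted by this rule is bounded away from zero, which is the property invoked in the proof.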
The "ideal" scenario is that Algorithm 2.1 generates many S-iterates and L-iterates
that rapidly converge to an approximate solution of NP; this corresponds to part (i)
of Theorem 4.2. Part (ii) of Theorem 4.2, i.e., generating infinitely many M-iterates,
is the fall-back position of Algorithm 2.1. We believe this result is the best that
can be expected, since we have not assumed any constraint qualification. In fact,
the assumptions we have made do not preclude the possibility that problem NP is
infeasible. Also, it has recently been proved [14,15,37] that iterates generated from
the stabilized SQP subproblem exhibit superlinear convergence under rather mild
conditions; in particular, strict complementarity is not assumed and no constraint
qualification is required.
5. Conclusions
In this paper we developed and analyzed an SQP method for solving general non-
linear optimization problems. The algorithm is based on the natural pairing of a
generalized primal-dual augmented Lagrangian function with a flexible line search.