
Sequential Quadratic Programming Methods

November 2012

154:147-224

DOI:10.1007/978-1-4614-1927-3_6

In book: Mixed Integer Nonlinear Programming (pp.147-224)

Authors:

Philip E. Gill

University of California, San Diego

In his 1963 PhD thesis, Wilson proposed the first sequential quadratic programming (SQP) method for the solution of constrained nonlinear optimization problems. In the intervening 48 years, SQP methods have evolved into a powerful and effective class of methods for a wide range of optimization problems. We review some of the most prominent developments in SQP methods since 1963 and discuss the relationship of SQP methods to other popular methods, including augmented Lagrangian methods and interior methods. Given the scope and utility of nonlinear optimization, it is not surprising that SQP methods are still a subject of active research. Recent developments in methods for mixed integer nonlinear programming (MINLP) and the minimization of functions subject to differential equation constraints have led to a heightened interest in methods that may be “warm started” from a good approximate solution. We discuss the role of SQP methods in these contexts.

Key words: Large-scale nonlinear programming, SQP methods, nonconvex programming, quadratic programming, KKT systems


REGULARIZED SEQUENTIAL QUADRATIC

PROGRAMMING METHODS

Philip E. Gill∗ and Daniel P. Robinson†

UCSD Department of Mathematics

Technical Report NA-11-02

October 2011

Abstract

We present the formulation and analysis of a new sequential quadratic programming (SQP) method for general nonlinearly constrained optimization. The method pairs a primal-dual generalized augmented Lagrangian merit function with a flexible line search to obtain a sequence of improving estimates of the solution. This function is a primal-dual variant of the augmented Lagrangian proposed by Hestenes and Powell in the early 1970s. A crucial feature of the method is that the QP subproblems are convex, but formed from the exact second derivatives of the original problem. This is in contrast to methods that use a less accurate quasi-Newton approximation. Additional benefits of this approach include the following: (i) each QP subproblem is regularized; (ii) the QP subproblem always has a known feasible point; and (iii) a projected gradient method may be used to identify the QP active set when far from the solution.

Key words. Nonlinear programming, nonlinear constraints, augmented Lagrangian, sequential quadratic programming, SQP methods, regularized methods, primal-dual methods.

AMS subject classifications. 49J20, 49J15, 49M37, 49D37, 65F05, 65K05, 90C30

∗Department of Mathematics, University of California, San Diego, La Jolla, CA 92093-0112

(pgill@ucsd.edu). Research supported in part by National Science Foundation grants DMS-

0511766 and DMS-0915220, and by Department of Energy grant DE-SC0002349.

†Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD

21218-2682 (daniel.p.robinson@jhu.edu).


1. Introduction

We present a sequential quadratic programming (SQP) method for optimization problems involving general linear and nonlinear constraints. The method is described in terms of the problem format:

$$\min_{x\in\mathbb{R}^n} \; f(x) \quad\text{subject to}\quad c(x) = 0,\;\; x \ge 0, \tag{NP}$$

where $c:\mathbb{R}^n\to\mathbb{R}^m$ and $f:\mathbb{R}^n\to\mathbb{R}$ are twice-continuously differentiable. This problem format assumes that all general inequality constraints have been converted to equalities by the use of slack variables. Methods for solving problem (NP) easily carry over to the more general setting with $l \le x \le u$. The vector-pair $(x^*, y^*)$ is called a first-order solution to problem (NP) if it satisfies

$$c(x^*) = 0 \quad\text{and}\quad \min\bigl(x^*, z^*\bigr) = 0, \tag{1.1}$$

where $y^*$ are the Lagrange multipliers associated with the constraints $c(x) = 0$, and $z^*$ are the reduced costs at $(x^*, y^*)$, i.e., $z^* = g(x^*) - J(x^*)^T y^*$.
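As a rough illustration of condition (1.1), the following sketch evaluates the reduced costs and the componentwise minimum for a small dense problem. The function names and the toy problem (minimize $(x_1-2)^2 + (x_2+1)^2$ subject to $x_1 + x_2 - 1 = 0$, $x \ge 0$) are assumptions made for this example and do not come from the paper.

```python
def reduced_costs(g, J, y):
    """Return z = g - J^T y, with J stored as a list of m rows of length n."""
    return [g[j] - sum(J[i][j] * y[i] for i in range(len(y)))
            for j in range(len(g))]


def first_order_residual(x, c, g, J, y):
    """Largest absolute entry of c(x) and of min(x, z), cf. (1.1)."""
    z = reduced_costs(g, J, y)
    complementarity = [min(xj, zj) for xj, zj in zip(x, z)]
    return max(abs(v) for v in list(c) + complementarity)


# Hypothetical toy problem: f(x) = (x1 - 2)^2 + (x2 + 1)^2, c(x) = x1 + x2 - 1.
x_star = [1.0, 0.0]                      # the bound x2 >= 0 is active
y_star = [-2.0]                          # multiplier for c(x) = 0
c_val = [x_star[0] + x_star[1] - 1.0]
g_val = [2.0 * (x_star[0] - 2.0), 2.0 * (x_star[1] + 1.0)]
J_val = [[1.0, 1.0]]
residual = first_order_residual(x_star, c_val, g_val, J_val, y_star)
```

A zero residual means that every variable is either at its bound with a nonnegative reduced cost, or strictly positive with a zero reduced cost.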

Sequential quadratic programming methods and interior methods are two alternative approaches to handling the inequality constraints in problem (NP). Sequential quadratic programming (SQP) methods find an approximate solution of a sequence of quadratic programming (QP) subproblems in which a quadratic model of the objective function is minimized subject to the linearized constraints. Interior methods approximate a continuous path that passes through a solution of (NP). In the simplest case, the path is parameterized by a positive scalar parameter $\mu$ that may be interpreted as a perturbation for the optimality conditions for the problem (NP). Both interior methods and SQP methods have an inner/outer iteration structure, with the work for an inner iteration being dominated by the cost of solving a large sparse system of symmetric indefinite linear equations. In the case of SQP methods, these equations involve a subset of the variables and constraints; for interior methods, the equations involve all the constraints and variables.

SQP methods provide a relatively reliable “certificate of infeasibility” and they have the potential of being able to capitalize on a good initial starting point. Sophisticated matrix factorization updating techniques are used to exploit the fact that the linear equations change by only a single row and column at each inner iteration. These updating techniques are often customized for the particular QP method being used and have the benefit of providing a uniform treatment of ill-conditioning and singularity.

On the negative side, it is difficult to implement SQP methods so that exact second derivatives can be used efficiently and reliably. Some of these difficulties stem from the theoretical properties of the quadratic programming subproblem, which can be nonconvex when second derivatives are used. Nonconvex quadratic programming is NP-hard, even for the calculation of a local minimizer [11,25]. The complexity of the QP subproblem has been a major impediment to the formulation of second-derivative SQP methods (although methods based on indefinite QP have been proposed [19,20]). Over the years, algorithm developers have avoided this difficulty by eschewing second derivatives and by solving a convex QP subproblem defined with a positive semidefinite quasi-Newton approximate Hessian (see, e.g., [28]); some authors enhance these basic methods with an additional subspace phase that incorporates exact second derivatives [33,34,40].

A difficulty with active-set methods is that they may require a substantial number of QP iterations when the outer iterates are far from the solution. The use of a QP subproblem is motivated by the assumption that the QP objective and constraints provide good “models” of the objective and constraints of problem (NP). This should make it unnecessary (and inefficient) to solve the QP to high accuracy during the preliminary iterations. Unfortunately, the simple expedient of limiting the number of inner iterations may have a detrimental effect upon reliability. An approximate QP solution may not predict a sufficient improvement in a merit function. Moreover, some of the QP multipliers will have the wrong sign if an active-set method is terminated before a solution is found. This may cause difficulties if the QP multipliers are used to estimate the multipliers for the nonlinear problem. These issues would largely disappear if a primal-dual interior method were to be used to solve the QP subproblem. These methods have the benefit of providing a sequence of feasible (i.e., correctly signed) dual iterates. Nevertheless, QP solvers based on conventional interior methods have had limited success within SQP methods because they are difficult to “warm start” from a near-optimal point (see the discussion below). This makes it difficult to capitalize on the property that, as the outer iterates converge, the solution of one QP subproblem is a very good estimate of the solution of the next.

Broadly speaking, the advantages and disadvantages of SQP methods and interior methods complement each other. Interior methods are most efficient when implemented with exact second derivatives. Moreover, they can converge in few inner iterations, even for very large problems. The inner iterates are the iterates of Newton’s method for finding an approximate solution of the perturbed optimality conditions for a given $\mu$. As the dimension and zero/nonzero structure of the Newton equations remain fixed, these Newton equations may be solved efficiently using either iterative or direct methods available in the form of advanced “off-the-shelf” linear algebra software. In particular, any new software for multicore and parallel architectures is immediately applicable. Moreover, the perturbation parameter $\mu$ plays an auxiliary role as an implicit regularization parameter of the linear equations. This implicit regularization plays a crucial role in the robustness of interior methods on ill-conditioned and ill-posed problems.

On the negative side, although interior methods are very effective for solving “one-off” problems, they are difficult to adapt to solving a sequence of related nonlinear problems. This difficulty may be explained in terms of the “path-following” interpretation of interior methods. In the neighborhood of an optimal solution, a step along the path $x(\mu)$ of perturbed solutions is well-defined, whereas a step onto the path from a neighboring point will be extremely sensitive to perturbations in the problem functions (and hence difficult to compute). Another difficulty with conventional interior methods is that a substantial number of iterations may be needed when the constraints are infeasible.

The idea of replacing a constrained optimization problem by a sequence of unconstrained problems parameterized by a scalar $\mu$ has played a fundamental role in the formulation of algorithms since the early 1960s (for a seminal reference, see Fiacco and McCormick [16,17]). One of the best-known methods for solving the equality-constrained problem (NEP) uses an unconstrained function based on the quadratic penalty function, which combines $f$ with a term of order $1/\mu$ that “penalizes” the sum of the squares of the constraint violations. Under certain conditions (see, e.g., [17,26,49,51]), the minimizers of the penalty function define a differentiable trajectory or central path that approaches the solution as $\mu\to 0$. Penalty methods approximate this path by minimizing the penalty function for a finite sequence of decreasing values of $\mu$. In this form, the methods have a two-level structure of inner and outer iterations: the inner iterations are those of the method used to minimize the penalty function, and the outer iterations test for convergence and adjust the value of $\mu$. As $\mu\to 0$, the Newton equations for minimizing the penalty function are increasingly ill-conditioned, and this ill-conditioning was perceived to be the reason for the poor numerical performance on some problems. In separate papers, Hestenes [36] and Powell [42] proposed the augmented Lagrangian function for (NEP), which is an unconstrained function based on augmenting the Lagrangian function with a quadratic penalty term that does not require $\mu$ to go to zero for convergence. The price that must be paid for keeping $1/\mu$ finite is the need to update estimates of the Lagrange multipliers in each outer iteration.
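The trade-off described above (a fixed $\mu$ in exchange for multiplier updates) can be seen on a very small equality-constrained problem. The sketch below is only an illustration under stated assumptions: the problem $f(x) = \tfrac12\|x\|^2$, $c(x) = a^Tx - b$ and all function names are chosen for this example, the inner minimization is done in closed form, and the update $y^E \leftarrow y^E - c(x)/\mu$ is the standard first-order multiplier update under the sign convention $L = f - c^Ty$.

```python
def augmented_lagrangian_minimizer(y_est, mu, a, b):
    """Minimize 0.5*||x||^2 - c(x)*y_est + c(x)^2/(2*mu), c(x) = a^T x - b.

    Stationarity gives x = a*t with t*(1 + a^T a/mu) = y_est + b/mu.
    """
    aa = sum(ai * ai for ai in a)
    t = (y_est + b / mu) / (1.0 + aa / mu)
    return [ai * t for ai in a]


def method_of_multipliers(mu=1.0, iterations=50, a=(1.0, 1.0), b=1.0):
    """Outer iterations with mu held fixed; only the multiplier estimate moves."""
    y_est = 0.0
    x = [0.0 for _ in a]
    for _ in range(iterations):
        x = augmented_lagrangian_minimizer(y_est, mu, a, b)
        c = sum(ai * xi for ai, xi in zip(a, x)) - b
        y_est = y_est - c / mu        # first-order multiplier update
    return x, y_est


x_final, y_final = method_of_multipliers()
```

For this toy problem the solution is $x^* = (0.5, 0.5)$ with $y^* = 0.5$, and with $\mu = 1$ the multiplier error shrinks by a factor of three per outer iteration, so no decrease of $\mu$ is needed.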

Since the first appearance of the Hestenes-Powell function, many algorithms have been proposed based on using the augmented Lagrangian as an objective function for sequential unconstrained minimization. Augmented Lagrangian functions have also been proposed that treat the multiplier vector as a continuous function of $x$; some of these ensure global convergence and permit local superlinear convergence (see, e.g., Fletcher [18]; DiPillo and Grippo [13]; Bertsekas [1,2]; Boggs and Tolle [4]).

As methods for treating linear inequality constraints and bounds became more sophisticated, the emphasis of algorithms shifted from sequential unconstrained minimization to sequential linearly constrained minimization. In this context, the augmented Lagrangian has been used successfully within a number of different algorithmic frameworks for problem (NP). The method used in the software package LANCELOT [9] finds the approximate solution of a sequence of bound constrained problems with an augmented Lagrangian objective function. Similarly, the software package MINOS of Murtagh and Saunders [41] employs a variant of Robinson’s linearly constrained Lagrangian (LCL) method [44] in which an augmented Lagrangian is minimized subject to the linearized nonlinear constraints. Friedlander and Saunders [27] define a globally convergent version of the LCL method that can treat infeasible constraints and infeasible subproblems. Augmented Lagrangian functions have also been used extensively as a merit function for sequential quadratic programming (SQP) methods (see, e.g., [3,5,7,21,28,30,45–48]).

The development of path-following interior methods for linear programming in the mid-1980s stimulated renewed interest in the treatment of constraints by sequential unconstrained optimization. This new attention not only resulted in a new understanding of the computational complexity of existing methods but also provided the impetus for the development of new approaches. A notable development was the derivation of efficient path-following methods for linear programming based on applying Newton’s method with respect to both the primal and dual variables. These new approaches also refocused attention on two computational aspects of penalty- and barrier-function methods for nonlinear optimization. First, the recognition of the formal equivalence between some primal-dual methods and conventional penalty methods indicated that the inherent ill-conditioning of penalty and barrier functions is not necessarily the reason for poor numerical performance. Second, the crucial role of penalty and barrier functions in problem regularization was recognized and better understood.

In this paper we formulate and analyze a new sequential quadratic programming (SQP) method for nonlinearly constrained optimization. The method pairs a primal-dual generalized augmented Lagrangian merit function with a flexible line search to obtain a sequence of improving estimates of the solution. This function is a primal-dual variant of the augmented Lagrangian proposed by Hestenes and Powell in the early 1970s. A crucial feature of the method is that the QP subproblems are convex, but formed from the exact second derivatives of the original problem. This is in contrast to methods that use a less accurate quasi-Newton approximation. Additional benefits of this approach include the following: (i) each QP subproblem is regularized; (ii) the QP subproblem always has a known feasible point; and (iii) a projected gradient method may be used to identify the QP active set when far from the solution. Preliminary numerical experiments on a subset of problems from the CUTEr test collection indicate that the proposed SQP method is significantly more efficient than our current SQP package SNOPT.

The paper is organized in five sections. Section 1 is a review of some of the basic properties of SQP methods. In Section 2, the steps of the primal-dual SQP method are defined. Similarities with the conventional Hestenes-Powell augmented Lagrangian method are also discussed. In Section 3, we consider methods for the solution of the QP subproblem and show that in the neighborhood of a solution, the method is equivalent to the stabilized SQP method [15,35,38,50]. A rather general global convergence result is established in Section 4 that does not make any constraint qualification or non-degeneracy assumption.

Notation and Terminology

Unless explicitly indicated otherwise, $\|\cdot\|$ denotes the vector two-norm or its induced matrix norm. The inertia of a real symmetric matrix $A$, denoted by $\mathrm{In}(A)$, is the integer triple $(a_+, a_-, a_0)$ giving the number of positive, negative and zero eigenvalues of $A$. Given vectors $a$ and $b$ with the same dimension, the vector with $i$th component $a_i b_i$ is denoted by $a\cdot b$. The vectors $e$ and $e_j$ denote, respectively, the column vector of ones and the $j$th column of the identity matrix $I$. The dimensions of $e$, $e_i$ and $I$ are defined by the context. Given vectors $x$ and $y$, the long vector consisting of the elements of $x$ augmented by elements of $y$ is denoted by $(x, y)$. The $i$th component of a vector labeled with a subscript will be denoted by $[\,\cdot\,]_i$, e.g., $[v]_i$ is the $i$th component of the vector $v$. The subvector of components with indices in the index set $\mathcal{S}$ is denoted by $[\,\cdot\,]_{\mathcal{S}}$, e.g., $[v]_{\mathcal{S}}$ is the vector with components $[v]_i$ for $i\in\mathcal{S}$. Similarly, if $M$ is a symmetric matrix, then $[M]_{\mathcal{S}}$ denotes the symmetric matrix with elements $m_{ij}$ for $i, j\in\mathcal{S}$. A local solution of an optimization problem is denoted by $x^*$. The vector $g(x)$ is used to denote $\nabla f(x)$, the gradient of $f(x)$, and $H(x)$ denotes the (symmetric) Hessian matrix $\nabla^2 f(x)$. The matrix $J(x)$ denotes the $m\times n$ constraint Jacobian, which has $i$th row $\nabla c_i(x)^T$, the gradient of the $i$th constraint function $c_i(x)$. The matrix $H_i(x)$ denotes the Hessian of $c_i(x)$. The Lagrangian function associated with (NP) is $L(x, y, z) = f(x) - c(x)^T y - z^T x$, where $y$ and $z$ are $m$- and $n$-vectors of dual variables associated with the equality constraints and bounds, respectively. The Hessian of the Lagrangian with respect to $x$ is denoted by $H(x, y) = H(x) - \sum_{i=1}^{m} y_i H_i(x)$.

Background

Some of the most efficient algorithms for nonlinear optimization are sequential quadratic programming (SQP) methods. Conventional SQP methods find an approximate solution of a sequence of quadratic programming (QP) subproblems in which a quadratic model of the objective function is minimized subject to the linearized constraints. Given a current estimate $(x_k, y_k)$ of a primal-dual solution of (NP), a line search SQP method computes a search direction $p_k$ such that $x_k + p_k$ is the solution (when it exists) of the convex quadratic program

$$\begin{array}{ll} \displaystyle\min_{x} & g_k^T(x - x_k) + \tfrac12 (x - x_k)^T \bar H_k (x - x_k) \\[2pt] \text{subject to} & c_k + J_k(x - x_k) = 0, \quad x \ge 0, \end{array} \tag{1.2}$$

where $c_k$, $g_k$ and $J_k$ denote the quantities $c(x)$, $g(x)$ and $J(x)$ evaluated at $x_k$, and $\bar H_k$ is some positive-definite approximation to $H(x_k, y_k)$. If the Lagrange multiplier vector associated with the constraint $c_k + J_k(x - x_k) = 0$ is written in the form $y_k + q_k$, then a solution $(x_k + p_k, y_k + q_k)$ of the QP subproblem (1.2) satisfies

$$c_k + J_k p_k = 0 \quad\text{and}\quad \min\bigl(x_k + p_k,\; g_k + \bar H_k p_k - J_k^T(y_k + q_k)\bigr) = 0.$$

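Given any $x \ge 0$, let $\mathcal{A}_0$ and $\mathcal{F}_0$ denote the index sets

$$\mathcal{A}_0(x) = \{\, i : x_i = 0 \,\} \quad\text{and}\quad \mathcal{F}_0(x) = \{1, 2, \dots, n\} \setminus \mathcal{A}_0(x). \tag{1.3}$$

If $x$ is feasible for the constraints $c_k + J_k(x - x_k) = 0$, then $\mathcal{A}_0(x)$ is the active set at $x$. If the set $\mathcal{A}_0$ associated with a solution of the subproblem (1.2) is known, then $x_k + p_k$ may be found by solving linear equations that represent the optimality conditions for an equality-constrained QP with the inequalities $x \ge 0$ replaced by $x_i = 0$ for $i \in \mathcal{A}_0$. In general, the optimal $\mathcal{A}_0$ is not known in advance, and active-set methods generate a sequence of estimates $(\hat p_j, \hat q_j) \approx (p_k, q_k)$ such that $(\hat p_{j+1}, \hat q_{j+1}) = (\hat p_j, \hat q_j) + \alpha_j(\Delta p_j, \Delta q_j)$, with $(\Delta p_j, \Delta q_j)$ a solution of

$$\begin{pmatrix} \bar H_F & -J_F^T \\ J_F & 0 \end{pmatrix} \begin{pmatrix} \Delta p_F \\ \Delta q_j \end{pmatrix} = -\begin{pmatrix} [\, g_k + \bar H_k \hat p_j - J_k^T(y_k + \hat q_j) \,]_F \\ c_k + J_k \hat p_j \end{pmatrix}, \tag{1.4}$$

where $\bar H_F$ is the matrix of free rows and columns of $\bar H_k$, $J_F$ is the matrix of free columns of $J_k$, and the step length $\alpha_j$ is chosen to ensure feasibility of all variables, not just those in the set $\mathcal{A}_0$.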
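The index sets (1.3) and the feasibility-preserving step length can be sketched in a few lines. This is a minimal illustration, assuming 0-based indices and a simple ratio test; the function names and the tolerance argument are hypothetical and are not taken from any particular QP solver.

```python
def active_and_free_sets(x, tol=0.0):
    """Index sets A0(x) = {i : x_i = 0} and F0(x), its complement (0-based)."""
    active = [i for i, xi in enumerate(x) if xi <= tol]
    free = [i for i, xi in enumerate(x) if xi > tol]
    return active, free


def max_feasible_step(x, dx):
    """Largest alpha in [0, 1] such that x + alpha*dx >= 0 (ratio test)."""
    alpha = 1.0
    for xi, di in zip(x, dx):
        if di < 0.0:                  # only decreasing variables can hit a bound
            alpha = min(alpha, xi / -di)
    return alpha


active, free = active_and_free_sets([0.0, 2.0, 0.0, 1.5])
alpha = max_feasible_step([1.0, 2.0], [-4.0, 1.0])
```

In the example, the first variable reaches its bound after a quarter of the full step, so the step is shortened to keep every variable nonnegative.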

If the equations (1.4) are to be used to define $\Delta p_F$ and $\Delta q_j$, then it is necessary that $J_F$ has full rank, which is probably the greatest outstanding issue associated with systems of the form (1.4). Two remedies are available.

• Rank-enforcing active-set methods maintain a set of indices $\mathcal{B}$ associated with a matrix of columns $J_B$ with rank $m$, i.e., the rows of $J_B$ are linearly independent. The set $\mathcal{B}$ is the complement in $\{1, 2, \dots, n\}$ of a “working set” $\mathcal{N}$ of indices that estimates the set $\mathcal{A}_0$ at a solution of (1.2). If $\mathcal{N}$ is a subset of $\mathcal{A}_0$, then the system analogous to (1.4) is given by

$$\begin{pmatrix} \bar H_B & -J_B^T \\ J_B & 0 \end{pmatrix} \begin{pmatrix} \Delta p_B \\ \Delta q_j \end{pmatrix} = -\begin{pmatrix} [\, g_k + \bar H_k \hat p_j - J_k^T(y_k + \hat q_j) \,]_B \\ c_k + J_k \hat p_j \end{pmatrix}, \tag{1.5}$$

which is nonsingular because of the linear independence of the rows of $J_B$.

• Regularized active-set methods add a positive-definite regularization term in the (2,2) block of (1.4). The magnitude of the regularization is generally based on heuristic arguments, which gives mixed results in practice.

2. A Regularized Primal-Dual Line-Search SQP Algorithm

In this section, we define a regularized SQP line-search method based on the primal-dual augmented Lagrangian merit function

$$M^\nu(x, y; y^E, \mu) = f(x) - c(x)^T y^E + \frac{1}{2\mu}\|c(x)\|^2 + \frac{\nu}{2\mu}\|c(x) + \mu(y - y^E)\|^2, \tag{2.1}$$

where $\nu$ is a scalar, $\mu$ is the so-called penalty parameter, and $y^E$ is an estimate of an optimal Lagrange multiplier vector $y^*$. This function, proposed by Robinson [43], and Gill and Robinson [31], may be derived by applying the primal-dual penalty function of Forsgren and Gill [23] to a problem in which the constraints are shifted by a constant vector (see Powell [42]). With the notation $c = c(x)$, $g = g(x)$, and $J = J(x)$, the gradient of $M^\nu(x, y; y^E, \mu)$ may be written as

$$\nabla M^\nu(x, y; y^E, \mu) = \begin{pmatrix} g - J^T\bigl((1 + \nu)(y^E - \tfrac{1}{\mu}c) - \nu y\bigr) \\ \nu\bigl(c + \mu(y - y^E)\bigr) \end{pmatrix} \tag{2.2a}$$

$$\phantom{\nabla M^\nu(x, y; y^E, \mu)} = \begin{pmatrix} g - J^T\bigl(\pi + \nu(\pi - y)\bigr) \\ \nu\mu(y - \pi) \end{pmatrix}, \tag{2.2b}$$

where $\pi = \pi(x; y^E, \mu)$ denotes the vector-valued function

$$\pi(x; y^E, \mu) = y^E - \frac{1}{\mu}c(x). \tag{2.3}$$

Similarly, the Hessian of $M^\nu(x, y; y^E, \mu)$ may be written as

$$\nabla^2 M^\nu(x, y; y^E, \mu) = \begin{pmatrix} H\bigl(x, \pi + \nu(\pi - y)\bigr) + \frac{1}{\mu}(1 + \nu)J^T J & \nu J^T \\ \nu J & \nu\mu I \end{pmatrix}. \tag{2.4}$$
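To make (2.1) to (2.3) concrete, here is a small sketch that evaluates the merit function and its gradient for a toy problem with two variables and one constraint. The problem functions `f`, `c`, `g`, `J` and the routine names are assumptions chosen for this illustration; the formulas follow (2.1), (2.2b) and (2.3).

```python
# Hypothetical toy problem: f(x) = x1^2 + 2*x2^2, c(x) = x1*x2 - 1.
def f(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2

def c(x):
    return [x[0] * x[1] - 1.0]

def g(x):
    return [2.0 * x[0], 4.0 * x[1]]

def J(x):
    return [[x[1], x[0]]]


def merit(x, y, yE, mu, nu):
    """Primal-dual augmented Lagrangian (2.1)."""
    cx = c(x)
    shifted = [ci + mu * (yi - ei) for ci, yi, ei in zip(cx, y, yE)]
    return (f(x)
            - sum(ci * ei for ci, ei in zip(cx, yE))
            + sum(ci * ci for ci in cx) / (2.0 * mu)
            + nu * sum(s * s for s in shifted) / (2.0 * mu))


def merit_gradient(x, y, yE, mu, nu):
    """Gradient (2.2b), returned as the pair (grad_x, grad_y)."""
    cx, gx, Jx = c(x), g(x), J(x)
    pi = [ei - ci / mu for ei, ci in zip(yE, cx)]        # (2.3)
    w = [p + nu * (p - yi) for p, yi in zip(pi, y)]      # pi + nu*(pi - y)
    grad_x = [gx[j] - sum(Jx[i][j] * w[i] for i in range(len(w)))
              for j in range(len(x))]
    grad_y = [nu * mu * (yi - p) for yi, p in zip(y, pi)]
    return grad_x, grad_y
```

A quick way to gain confidence in such formulas is to compare `merit_gradient` against central finite differences of `merit` at an arbitrary point.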

We use $M^\nu(x, y)$, $\nabla M^\nu(x, y)$, and $\nabla^2 M^\nu(x, y)$ to denote $M^\nu$, $\nabla M^\nu$, and $\nabla^2 M^\nu$ evaluated with parameters $y^E$ and $\mu$. (We note that a trust-region based method could also be given, but we leave the statement and analysis to a future paper.)

Our approach is motivated by the following theorem, which shows that minimizers of problem (NP) are also minimizers, under certain assumptions, of the bound constrained problem

$$\min_{x,\,y} \; M^\nu(x, y; y^*, \mu) \quad\text{subject to}\quad x \ge 0, \tag{2.5}$$

where $y^*$ is a Lagrange multiplier vector for the equality constraints $c(x) = 0$.

Theorem 2.1. If $(x^*, y^*)$ satisfies the second-order sufficient conditions for a solution of problem (NP), then there exists a positive $\bar\mu$ such that for all $0 < \mu < \bar\mu$, the point $(x^*, y^*)$ is a minimizer of the bound constrained problem (2.5) for all $\nu > 0$.

2.1. Definition of the search direction

To motivate the computation of the step, we consider a quadratic approximation to $M^\nu$. Given $(x, y)$ and fixed $\nu \ge 0$, we define

$$H^\nu_M(x, y; \mu) = \begin{pmatrix} \bar H(x, y) + \frac{1}{\mu}(1 + \nu)J(x)^T J(x) & \nu J(x)^T \\ \nu J(x) & \nu\mu I \end{pmatrix}, \tag{2.6}$$

where $\bar H(x, y)$ is a symmetric approximation to $H\bigl(x, \pi + \nu(\pi - y)\bigr) \approx H(x, y)$ such that $\bar H(x, y) + \frac{1}{\mu}J(x)^T J(x)$ is positive definite. The approximation $\pi + \nu(\pi - y) \approx y$ is valid provided $\pi \approx y$. The restriction on the inertia of $\bar H$ implies that $H^\nu_M(x, y; \mu)$ is positive definite for $\nu > 0$ and positive semidefinite for $\nu = 0$ (see Theorem 3.1 of Section 3.2.3).

Using this definition of $H^\nu_M$ at the $k$th primal-dual iterate $v_k = (x_k, y_k)$, consider the convex QP subproblem

$$\min_{\Delta v = (p,\,q)} \; \nabla M^\nu(v_k)^T \Delta v + \tfrac12 \Delta v^T H^\nu_M(v_k)\, \Delta v \quad\text{subject to}\quad x_k + p \ge 0, \tag{2.7}$$

where $M^\nu(v)$ denotes the merit function evaluated at $v$. For any primal-dual QP solution $\Delta v_k = (p_k, q_k)$, it is shown in Theorem 3.3 of Section 3.2.3 that the first-order conditions associated with the variables in $\mathcal{F}_0(x_k + p_k)$ may be written in matrix form as:

$$\begin{pmatrix} \bar H_F & -J_F^T \\ J_F & \mu I \end{pmatrix} \begin{pmatrix} p_F \\ q_k \end{pmatrix} = -\begin{pmatrix} [\, g_k - J_k^T y_k - \bar H_k s \,]_F \\ c_k + \mu(y_k - y^E) - J_k s \end{pmatrix}, \tag{2.8}$$

where $c_k$, $g_k$ and $J_k$ denote the quantities $c(x)$, $g(x)$ and $J(x)$ evaluated at $x_k$, and $s$ is a nonnegative vector such that

$$s_i = \begin{cases} [x_k]_i & \text{if } i \in \mathcal{A}_0(x_k + p_k); \\ 0 & \text{if } i \in \mathcal{F}_0(x_k + p_k). \end{cases}$$

(The assumption of positive-definiteness of $\bar H_k + \frac{1}{\mu}J_k^T J_k$ implies that the matrix associated with the equations (2.8) is nonsingular.) It follows that if $\mathcal{A}_0(x_k + p_k) = \mathcal{A}_0(x_k)$, then $(p_k, q_k)$ satisfies the perturbed Newton equations

$$\begin{pmatrix} \bar H_F & -J_F^T \\ J_F & \mu I \end{pmatrix} \begin{pmatrix} p_F \\ q_k \end{pmatrix} = -\begin{pmatrix} [\, g_k - J_k^T y_k \,]_F \\ c_k + \mu(y_k - y^E) \end{pmatrix}.$$

A key property is that if $\mu = 0$ and $J_F$ has full rank, then this equation is identical to the equation for the conventional SQP step given by (1.4). This provides the motivation to use different penalty parameters for the step computation and the merit function.
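The effect of the $\mu I$ block can be checked by hand on a deliberately degenerate example. The sketch below assumes one free variable with $\bar H_F = 1$ and a duplicated constraint row, $J_F = (1, 1)^T$, so that $J_F$ is rank deficient; the matrix layout follows (2.8), while the example data and function names are hypothetical.

```python
def det3(K):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = K
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def regularized_kkt(mu):
    """Matrix [[H_F, -J_F^T], [J_F, mu*I]] for H_F = [1], J_F = [[1], [1]]."""
    return [[1.0, -1.0, -1.0],
            [1.0, mu, 0.0],
            [1.0, 0.0, mu]]


det_unregularized = det3(regularized_kkt(0.0))   # mu = 0: the form used in (1.4)
det_regularized = det3(regularized_kkt(0.1))     # mu > 0: the form used in (2.8)
```

Here the determinant is $\mu^2 + 2\mu$, which vanishes at $\mu = 0$ (the duplicated row makes the system singular) and is positive for every $\mu > 0$.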

Given an iterate $v_k = (x_k, y_k)$ and Lagrange multiplier estimate $y^E_k$, the primal-dual search direction $\Delta v_k = (p_k, q_k)$ is defined such that $v_k + \Delta v_k = (x_k + p_k, y_k + q_k)$ is a solution of the convex QP problem

$$\begin{array}{ll} \displaystyle\min_{v = (x,\,y)} & (v - v_k)^T \nabla M^\nu(v_k; y^E_k, \mu^R_k) + \tfrac12 (v - v_k)^T H^\nu_M(v_k; \mu^R_k)(v - v_k) \\[2pt] \text{subject to} & x \ge 0, \end{array} \tag{2.9}$$

where $\mu^R_k$ is a small parameter, and $H^\nu_M(v_k; \mu^R_k)$ is the matrix (2.6) written in terms of the composite variables $v_k = (x_k, y_k)$. In this context, $\mu^R_k$ plays the role of a regularization parameter rather than a penalty parameter, thereby providing an $O(\mu^R_k)$ estimate of the conventional SQP direction. This approach is nonstandard because a small “penalty parameter” $\mu^R_k$ is used by design, whereas other augmented Lagrangian-based methods attempt to keep $\mu$ as large as possible [8,28].

Finally, we note that if $v = v_k$ is a solution of the QP (2.9), then $v_k$ is a first-order solution of

$$\min_{v = (x,\,y)} \; M^\nu(v; y^E_k, \mu^R_k) \quad\text{subject to}\quad x \ge 0. \tag{2.10}$$

In Section 3 it is shown that, under certain conditions, the primal-dual vector $v_k + \Delta v_k = (x_k + p_k, y_k + q_k)$ is a solution of problem (2.9) if and only if it solves

$$\begin{array}{ll} \displaystyle\min_{x,\,y} & g_k^T(x - x_k) + \tfrac12 (x - x_k)^T \bar H(x_k, y_k)(x - x_k) + \tfrac12 \mu^R_k \|y\|^2 \\[2pt] \text{subject to} & c_k + J_k(x - x_k) + \mu^R_k(y - y^E_k) = 0, \quad x \ge 0, \end{array} \tag{2.11}$$

which is often referred to as the “stabilized” SQP subproblem because of its calming effect on multiplier estimates for degenerate problems (see, e.g., [35,50]). Therefore, the proposed method provides a natural link between the stabilized SQP methods (which employ a subproblem appropriate for degenerate problems), conventional SQP methods (which are highly efficient in practice), and augmented Lagrangian methods (which have desirable global convergence properties).

2.2. Definition of the new iterate

Once the search direction $\Delta v_k$ has been determined, a “flexible” backtracking line search is performed on the primal-dual augmented Lagrangian. A conventional backtracking line search defines $v_{k+1} = v_k + \alpha_k \Delta v_k$, where $\alpha_k = 2^{-j}$ and $j$ is the smallest nonnegative integer such that

$$M^\nu(v_k + \alpha_k \Delta v_k; y^E_k, \mu_k) \le M^\nu(v_k; y^E_k, \mu_k) + \alpha_k \eta_S\, \Delta v_k^T \nabla M^\nu(v_k; y^E_k, \mu_k)$$

for a given scalar $\eta_S \in (0, 1)$. However, this approach would suffer from the Maratos effect [39] simply because the penalty parameter $\mu_k$ and the regularization parameter $\mu^R_k$ generally have different values. Thus, we use a “flexible penalty function” based on the work of Curtis and Nocedal [12] and define $\alpha_k = 2^{-j}$, where $j$ is the smallest nonnegative integer such that

$$M^\nu(v_k + \alpha_k \Delta v_k; y^E_k, \mu^F_k) \le M^\nu(v_k; y^E_k, \mu^F_k) + \alpha_k \eta_S N_k \tag{2.12}$$

for some value $\mu^F_k \in [\mu^R_k, \mu_k]$, and where

$$N_k \triangleq \max\bigl(\Delta v_k^T \nabla M^\nu(v_k; y^E_k, \mu^R_k),\; -10^{-3}\|\Delta v_k\|^2\bigr) \le 0 \tag{2.13}$$

is a sufficiently negative real number that will allow us to prove global convergence of our proposed method. Once an appropriate value for $\alpha_k$ is found, the new primal-dual solution estimate is given by

$$x_{k+1} = x_k + \alpha_k p_k \quad\text{and}\quad y_{k+1} = y_k + \alpha_k q_k.$$

We note that the step acceptance is well-defined since the weakened Armijo condition (2.12) will be satisfied for $\mu^F_k = \mu^R_k$ and all $\alpha$ sufficiently small.

2.3. Updating the multiplier estimate

The preliminary numerical results presented in [31] indicate that the method outlined thus far is robust with respect to updating $y^E_k$. In particular, the numerical results generated in that paper updated $y^E_k$ at every iteration. Consequently, we seek a strategy that allows for frequent updates to $y^E_k$. To this end, we use the (merit) functions

$$\phi_S(v) = \eta(x) + 10^{-5}\omega(v) \quad\text{and}\quad \phi_L(v) = 10^{-5}\eta(x) + \omega(v), \tag{2.14}$$

where

$$\eta(x) = \|c(x)\| \quad\text{and}\quad \omega(x, y) = \bigl\|\min\bigl(x,\; g(x) - J(x)^T y\bigr)\bigr\| \tag{2.15}$$

are feasibility and stationarity measures at the point $(x, y)$, respectively. These optimality measures are based on the optimality conditions for problem (NP) rather than for minimizing the merit function $M^\nu$. Both measures are bounded below by zero, and are equal to zero if $v$ is a first-order solution to problem (NP). Such conditions are appropriate because trial steps are regularized SQP steps that should converge rapidly to a solution of problem (NP).

The estimate $y^E_k$ is updated when any iterate $v_k$ satisfies either $\phi_S(v_k) \le \tfrac12 \phi_S^{\max}$ or $\phi_L(v_k) \le \tfrac12 \phi_L^{\max}$, where $\phi_S^{\max}$ and $\phi_L^{\max}$ are bounds that are updated throughout the solution process. To ensure global convergence, the update to $y^E_k$ is accompanied by a decrease in either $\phi_S^{\max}$ or $\phi_L^{\max}$.

Finally, $y^E_k$ is also updated if an approximate first-order solution of the problem

$$\min_{x,\,y} \; M^\nu(x, y; y^E_k, \mu^R_k) \quad\text{subject to}\quad x \ge 0 \tag{2.16}$$

has been found. The test for optimality is

$$\|\nabla_y M^\nu(v_{k+1}; y^E_k, \mu^R_k)\| \le \tau_k \quad\text{and}\quad \bigl\|\min\bigl(x_{k+1},\; \nabla_x M^\nu(v_{k+1}; y^E_k, \mu^R_k)\bigr)\bigr\| \le \tau_k \tag{2.17}$$

for some small tolerance $\tau_k > 0$. This condition is rarely satisfied in practice, but the test is required for the proof of convergence. Nonetheless, if the condition is satisfied, $y^E_k$ is updated with the safeguarded estimate

$$y^E_{k+1} = \mathrm{mid}\bigl(-10^6,\; y_{k+1},\; 10^6\bigr).$$

2.4. Updating the penalty parameters

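As we only want to decrease $\mu^R_k$ when “close” to optimality (ignoring locally infeasible problems), we use the definition

$$\mu^R_{k+1} = \begin{cases} \min\bigl(\tfrac12 \mu^R_k,\; \|r_k\|^{3/2}\bigr), & \text{if (2.17) is satisfied;} \\ \min\bigl(\mu^R_k,\; \|r_k\|^{3/2}\bigr), & \text{otherwise,} \end{cases} \tag{2.18}$$

where

$$r_{k+1} \equiv r_{\mathrm{opt}}(v_{k+1}) \triangleq \begin{pmatrix} c(x_{k+1}) \\ \min\bigl(x_{k+1},\; g(x_{k+1}) - J(x_{k+1})^T y_{k+1}\bigr) \end{pmatrix}. \tag{2.19}$$

The update to $\mu_k$ is motivated by a different goal. Namely, we wish to decrease $\mu_k$ only when the trial step indicates that the merit function with penalty parameter $\mu_k$ increases. Thus, we use the definition

$$\mu_{k+1} = \begin{cases} \mu_k, & \text{if } M^\nu(v_{k+1}; y^E_k, \mu_k) \le M^\nu(v_k; y^E_k, \mu_k) + \min(\alpha_{\min}, \alpha_k)\,\eta_S N_k; \\ \max\bigl(\tfrac12 \mu_k,\; \mu^R_{k+1}\bigr), & \text{otherwise,} \end{cases} \tag{2.20}$$

for some positive $\alpha_{\min}$. The use of the scalar $\alpha_{\min}$ increases the likelihood that $\mu_k$ will not be decreased.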
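The two update rules (2.18) and (2.20) translate almost directly into code. The sketch below is only a transcription under stated assumptions: the function and argument names are hypothetical, the residual norm is passed in as a number, and the default values of `alpha_min` and `eta_S` are placeholders, since the text only requires them to be positive and in $(0, 1)$ respectively.

```python
def update_regularization(mu_R, r_norm, m_test_passed):
    """Rule (2.18): halve mu_R only when the test (2.17) holds."""
    candidate = 0.5 * mu_R if m_test_passed else mu_R
    return min(candidate, r_norm ** 1.5)


def update_penalty(mu, mu_R_next, merit_new, merit_old, alpha, N_k,
                   alpha_min=1.0e-3, eta_S=0.5):
    """Rule (2.20): keep mu unless the merit function failed to decrease enough."""
    if merit_new <= merit_old + min(alpha_min, alpha) * eta_S * N_k:
        return mu
    return max(0.5 * mu, mu_R_next)


mu_R_next = update_regularization(1.0e-2, 1.0e-1, True)
mu_kept = update_penalty(1.0, mu_R_next, merit_new=0.5, merit_old=1.0,
                         alpha=1.0, N_k=-1.0)
mu_reduced = update_penalty(1.0, mu_R_next, merit_new=2.0, merit_old=1.0,
                            alpha=1.0, N_k=-1.0)
```

Because the required decrease is scaled by $\min(\alpha_{\min}, \alpha_k)$, even a modest reduction in the merit function is enough to leave $\mu_k$ unchanged.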

2.5. Formal statement of the algorithm

In this section we formally state the proposed method as Algorithm 2.1 and include some additional details. During each iteration, the trial step is computed as described in Section 2.1, the solution estimate is updated as in Section 2.2, $y^E_k$ is updated as in Section 2.3, and the penalty parameters are updated as in Section 2.4.

The value of $y^E_k$ is crucial for both global and local convergence. To this end, there are three possibilities. First, $y^E_k$ is set to $y_{k+1}$ if $(x_{k+1}, y_{k+1})$ is acceptable to either of the merit functions $\phi_S$ or $\phi_L$ given by (2.14). These iterates are labeled as S- and L-iterates, respectively. It is to be expected that $y^E_k$ will be updated in this way most of the time. Second, if $(x_{k+1}, y_{k+1})$ is not acceptable to either of the merit functions $\phi_S$ or $\phi_L$, we check whether we have computed an approximate first-order solution to problem (2.16) by verifying conditions (2.17) for the current value of $\tau_k$. If these conditions are satisfied, the iterate is called an M-iterate. In this case, the regularization parameter $\mu^R_k$ and subproblem tolerance $\tau_k$ are decreased and $y^E_k$ is updated as in (2.3). Finally, an iterate at which neither of the first two cases occur is called an F-iterate. The multiplier estimate $y^E_k$ is not changed in an F-iterate.

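Algorithm 2.1. Regularized primal-dual SQP algorithm (pdSQP)

Input $(x_0, y_0)$;
Set algorithm parameters $\alpha_{\min} > 0$, $\eta_S \in (0,1)$, $\tau_{\rm stop} > 0$, and $\nu > 0$;
Initialize $y^E_0 = y_0$, $\tau_0 > 0$, $\mu^R_0 > 0$, $\mu_0 \in [\mu^R_0, \infty)$, and $k = 0$;
Compute $f(x_0)$, $c(x_0)$, $g(x_0)$, $J(x_0)$, and $H(x_0, y_0)$;
for $k = 0, 1, 2, \dots$ do
    Define $\bar H_k \approx H(x_k, y_k)$ such that $\bar H_k + (1/\mu^R_k) J_k^T J_k$ is positive definite;
    Solve the QP (2.9) for the search direction $\Delta v_k = (p_k, q_k)$;
    Find an $\alpha_k$ satisfying (2.12) and (2.13);
    Update the primal-dual estimate $x_{k+1} = x_k + \alpha_k p_k$, $y_{k+1} = y_k + \alpha_k q_k$;
    Compute $f(x_{k+1})$, $c(x_{k+1})$, $g(x_{k+1})$, $J(x_{k+1})$, and $H(x_{k+1}, y_{k+1})$;
    if $\phi_S(x_{k+1}, y_{k+1}) \le \tfrac12 \phi_S^{\max}$ then [S-iterate]
        $\phi_S^{\max} = \tfrac12 \phi_S^{\max}$;  $y^E_{k+1} = y_{k+1}$;  $\tau_{k+1} = \tau_k$;
    else if $\phi_L(x_{k+1}, y_{k+1}) \le \tfrac12 \phi_L^{\max}$ then [L-iterate]
        $\phi_L^{\max} = \tfrac12 \phi_L^{\max}$;  $y^E_{k+1} = y_{k+1}$;  $\tau_{k+1} = \tau_k$;
    else if $v_{k+1} = (x_{k+1}, y_{k+1})$ satisfies (2.17) [M-iterate]
        $y^E_{k+1} = \mathrm{mid}(-10^6, y_{k+1}, 10^6)$;  $\tau_{k+1} = \tfrac12 \tau_k$;
    else [F-iterate]
        $y^E_{k+1} = y^E_k$;  $\tau_{k+1} = \tau_k$;
    end if
    Update $\mu^R_{k+1}$ and $\mu_{k+1}$ according to (2.18) and (2.20), respectively;
    if $\|r_k\| \le \tau_{\rm stop}$ then exit;
end (for)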
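The four-way branch of Algorithm 2.1 can be illustrated with a short Python sketch. The function name `classify_iterate`, the dictionary `state`, and the flag `satisfies_2_17` are hypothetical names introduced here for illustration. The values of $\phi_S$ and $\phi_L$ and the outcome of test (2.17) are assumed to be computed elsewhere, and the decrease of $\mu^R_k$ at an M-iterate is governed by (2.18) and is not shown.

```python
def classify_iterate(phi_S, phi_L, satisfies_2_17, state, y_next, bound=1.0e6):
    """Apply the S/L/M/F branch of Algorithm 2.1 and return the iterate label.

    state is a dict with keys "phi_S_max", "phi_L_max", "y_E" and "tau"
    (a hypothetical container; the paper does not prescribe one).
    """
    if phi_S <= 0.5 * state["phi_S_max"]:        # S-iterate
        state["phi_S_max"] *= 0.5
        state["y_E"] = list(y_next)
        return "S"
    if phi_L <= 0.5 * state["phi_L_max"]:        # L-iterate
        state["phi_L_max"] *= 0.5
        state["y_E"] = list(y_next)
        return "L"
    if satisfies_2_17:                           # M-iterate
        # mid(-bound, y, bound): componentwise projection onto [-bound, bound]
        state["y_E"] = [min(max(yi, -bound), bound) for yi in y_next]
        state["tau"] *= 0.5
        return "M"
    return "F"                                   # F-iterate: y_E and tau unchanged
```

The order of the tests matters: the M-iterate test is reached only when both merit functions reject the new point, and an F-iterate leaves both `y_E` and `tau` untouched.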

3. Solution of the QP Subproblem

In this section we consider various theoretical and computational issues associated with the QP subproblem (2.9). In particular, it is shown that the search direction computed using subproblem (2.9) is the unique solution of the “stabilized” SQP subproblem (2.11), and is independent of the value of $\nu$. Moreover, an active-set method applied to problems (2.9) and (2.11) generates identical iterates, provided a common (feasible) starting point is used.

3.1. Equivalence with Stabilized SQP

In this section it is shown that, under certain conditions, the regularized QP subproblem (2.9) is equivalent to the stabilized SQP subproblem (2.11). Equivalent problems are considered in which the unknowns are written in terms of the steps $(p, q)$ for given variables $(x, y)$.

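Theorem 3.1. Consider the bound constrained QP

$$
\min_{\Delta v = (p, q)} \ g_M^T \Delta v + \tfrac12 \Delta v^T H_M \Delta v \quad \text{subject to} \quad x + p \ge 0, \tag{3.1}
$$

where $x$ and $y$ are constant,

$$
g_M = \begin{pmatrix} g - J^T\bigl(\pi + \nu(\pi - y)\bigr) \\ \nu\bigl(c + \mu(y - y^E)\bigr) \end{pmatrix},
\quad \text{and} \quad
H_M = \begin{pmatrix} H + \frac{1}{\mu}(1 + \nu) J^T J & \nu J^T \\ \nu J & \nu \mu I \end{pmatrix},
$$

with $H + \frac{1}{\mu} J^T J$ positive definite and $\nu \ge 0$. For the same quantities $c$, $g$, $J$ and $H$, consider the stabilized QP problem

$$
\min_{p, q} \ g^T p + \tfrac12 p^T H p + \tfrac12 \mu \|y + q\|^2 \quad \text{subject to} \quad c + Jp + \mu(y + q - y^E) = 0, \ \ x + p \ge 0. \tag{3.2}
$$

The following results hold.

(a) The stabilized QP (3.2) has a bounded unique primal-dual solution $(p, q)$.

(b) The unique solution $\Delta v = (p, q)$ of the stabilized QP (3.2) is a solution of the bound constrained QP (3.1) for all $\nu \ge 0$. If $\nu > 0$, then the stabilized solution $\Delta v = (p, q)$ is the unique solution of (3.1).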

Proof. For part (a), let $\Delta v = (p, q)$ denote an arbitrary feasible point for the constraints of the stabilized QP (3.2). Given the particular feasible point $\Delta v_0 = (0, \pi - y)$, consider an $n$-vector of variables $w$ defined by the linear transformation

$$
\Delta v = \Delta v_0 + M w, \quad \text{where} \quad M = \begin{pmatrix} \mu I \\ -J \end{pmatrix}.
$$

The matrix $M$ is $(n + m) \times n$ with rank $n$, and its columns form a basis for the null-space of the constraint matrix $\begin{pmatrix} J & \mu I \end{pmatrix}$. Using this transformation gives rise to the equivalent problem

$$
\min_{w \in \mathbb{R}^n} \ \frac{\mu}{2} w^T \Bigl(H + \frac{1}{\mu} J^T J\Bigr) w + w^T (g - J^T \pi) \quad \text{subject to} \quad x + \mu w \ge 0.
$$

The matrix $H + \frac{1}{\mu} J^T J$ is positive definite by assumption, and it follows that the stabilized QP (3.2) is equivalent to a convex program with a strictly convex objective. The existence of a bounded unique solution follows directly.

For part (b), we begin by stating the first-order conditions for $(p, q)$ to be a solution of the stabilized QP (3.2):

$$
\begin{aligned}
c + Jp + \mu(y + q - y^E) &= 0, & \mu(y + q) &= \mu w, \\
g + Hp - J^T w - z &= 0, & z &\ge 0, \\
z \cdot (x + p) &= 0, & x + p &\ge 0,
\end{aligned}
$$

where $w$ and $z$ denote the dual variables for the equality and inequality constraints of problem (3.2), respectively. Eliminating $w$ using the equation $w = y + q$ gives

$$ c + Jp + \mu(y + q - y^E) = 0, \tag{3.3a} $$
$$ g + Hp - J^T(y + q) - z = 0, \quad z \ge 0, \tag{3.3b} $$
$$ z \cdot (x + p) = 0, \quad x + p \ge 0. \tag{3.3c} $$

First, we prove part (b) for the case $\nu > 0$. The optimality conditions for (3.1) are

$$ g_M + H_M \Delta v = \begin{pmatrix} z \\ 0 \end{pmatrix}, \quad z \ge 0, \tag{3.4} $$
$$ z \cdot (x + p) = 0, \quad x + p \ge 0. $$

Pre-multiplying the equality of (3.4) by the nonsingular matrix $T$ such that

$$
T = \begin{pmatrix} I & -\frac{1 + \nu}{\nu \mu} J^T \\ 0 & \frac{1}{\nu} I \end{pmatrix},
$$

and using the definition (2.2a) yields the equivalent conditions

$$
g + Hp - J^T(y + q) - z = 0 \quad \text{and} \quad c + Jp + \mu(y + q - y^E) = 0,
$$

which are identical to the relevant equalities in (3.3). Thus, the solutions of (3.2) and (3.1) are identical in the case $\nu > 0$.
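The key algebraic step for the case $\nu > 0$ is the identity $T H_M = \begin{pmatrix} H & -J^T \\ J & \mu I \end{pmatrix}$. The following pure-Python sketch checks it numerically on a small example. The helper names (`transpose`, `matmul`, `build_blocks`, `max_abs_diff`) and the data for `H`, `J`, `mu` and `nu` are arbitrary assumptions chosen for illustration, not values from the paper.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def build_blocks(H, J, mu, nu):
    """Return dense H_M, T and K = [[H, -J^T], [J, mu*I]] as lists of rows."""
    n, m = len(H), len(J)
    Jt = transpose(J)
    JtJ = matmul(Jt, J)
    s = (1.0 + nu) / mu
    H_M = [[H[i][j] + s * JtJ[i][j] for j in range(n)]
           + [nu * Jt[i][k] for k in range(m)] for i in range(n)]
    H_M += [[nu * J[k][j] for j in range(n)]
            + [nu * mu * (k == l) for l in range(m)] for k in range(m)]
    T = [[float(i == j) for j in range(n)]
         + [-(1.0 + nu) / (nu * mu) * Jt[i][k] for k in range(m)] for i in range(n)]
    T += [[0.0] * n + [(k == l) / nu for l in range(m)] for k in range(m)]
    K = [H[i][:] + [-Jt[i][k] for k in range(m)] for i in range(n)]
    K += [J[k][:] + [mu * (k == l) for l in range(m)] for k in range(m)]
    return H_M, T, K

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Arbitrary illustrative data with n = 2 variables and m = 1 constraint.
H = [[2.0, 0.5], [0.5, 1.0]]
J = [[1.0, -1.0]]
H_M, T, K = build_blocks(H, J, mu=0.1, nu=2.0)
residual = max_abs_diff(matmul(T, H_M), K)   # zero up to rounding error
```

The $O(1/\mu)$ term in the (1,1) block of $H_M$ cancels in the product, which is the same mechanism used later to argue that the apparent ill-conditioning of (3.11) is superficial.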

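It remains to consider the case $\nu = 0$. In this situation, the objective function of the QP (3.1) includes only the primal variables $p$, which implies that the problem may be written as

$$
\min_{p} \ (g - J^T \pi)^T p + \tfrac12 p^T \Bigl(H + \frac{1}{\mu} J^T J\Bigr) p \quad \text{subject to} \quad x + p \ge 0, \tag{3.5}
$$

with $q$ an arbitrary vector. Although there are infinitely many solutions of (3.1), the vector $p$ associated with a particular solution $(p, q)$ is unique because it is the solution of problem (3.5) for a positive-definite matrix $H + \frac{1}{\mu} J^T J$. The optimality conditions for (3.5) are

$$ g - J^T \pi + \Bigl(H + \frac{1}{\mu} J^T J\Bigr) p = z, \quad z \ge 0, \tag{3.6} $$
$$ z \cdot (x + p) = 0, \quad x + p \ge 0. $$

For the given $y$ and optimal $p$, define the $m$-vector $q$ such that

$$
q = -\frac{1}{\mu}\bigl(Jp + c + \mu(y - y^E)\bigr) = -\frac{1}{\mu}\bigl(Jp + \mu(y - \pi)\bigr). \tag{3.7}
$$

Equation (3.7) and the equality of (3.6) may be combined to give the matrix equation

$$
\begin{pmatrix} g - J^T y + 2 J^T (y - \pi) \\ \mu (y - \pi) \end{pmatrix}
+ \begin{pmatrix} H + \frac{2}{\mu} J^T J & J^T \\ J & \mu I \end{pmatrix}
\begin{pmatrix} p \\ q \end{pmatrix}
= \begin{pmatrix} z \\ 0 \end{pmatrix}.
$$

Applying the nonsingular matrix $\begin{pmatrix} I & -\frac{2}{\mu} J^T \\ 0 & I \end{pmatrix}$ to both sides of this equation yields

$$
\begin{pmatrix} g - J^T y \\ c + \mu (y - y^E) \end{pmatrix}
+ \begin{pmatrix} H & -J^T \\ J & \mu I \end{pmatrix}
\begin{pmatrix} p \\ q \end{pmatrix}
= \begin{pmatrix} z \\ 0 \end{pmatrix}.
$$

It follows that if $\nu = 0$, then the unique solution of (3.2) is a solution of (3.1), which is what we wanted to show.

When $\nu > 0$, the uniqueness of the solution $\Delta v = (p, q)$ follows from the observation that QP (3.1) is then convex with a strictly convex objective.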

Theorem 3.1 shows that the direction defined by the bound-constrained QP is independent of the parameter $\nu$. Moreover, this direction may be defined as the solution of an equivalent stabilized SQP subproblem (2.11) that does not include $\nu$ at all. However, the parameter $\nu$ does appear explicitly in the definition of the merit function $\mathcal{M}^{\nu}$ (2.1), and therefore plays an important role in influencing the length of the step during the flexible line search. The value of $\nu$ determines the proximity of the primal-dual iterates to the so-called “primal-dual trajectory”, which is the one-parameter family of points $(x(\mu), y(\mu))$ such that $x(\mu)$ is a minimizer of the conventional augmented Lagrangian for fixed $y^E$. The definition of $\mathcal{M}^{\nu}$ implies that larger values of $\nu$ tend to force the iterates to be close to the primal-dual trajectory. If $\nu = 0$ then the method reverts to a regularized SQP method based on the (primal) conventional augmented Lagrangian (for which no emphasis is placed on staying close to the primal-dual trajectory). The algorithm may be modified to allow for the choice $\nu = 0$ by always setting $y^E_{k+1}$ to be $\pi(x_{k+1})$; this does emphasize the primal-dual trajectory, but only after the major iteration has been completed. The use of the primal-dual augmented Lagrangian function allows the dual variables to be emphasized during the line search itself.

3.2. Equivalent iterates of an active-set method

In Section 3.1 it is shown that, if $\nu > 0$, then the unique solutions of subproblems (2.9) and (2.11) are identical, and if $\nu = 0$, then the solution of (2.9) is no longer unique, but there is a particular solution that is identical to the unique solution of (2.11). In this section we continue our study of these subproblems by considering the iterates that result when they are solved with an active-set method.

3.2.1. An active-set method

For the remainder of this section, the indices associated with the SQP iteration are omitted and it will be assumed that the constraints of the QP involve the constraints linearized at the point $\bar x$. In all cases, the suffix $j$ will be reserved for the iteration index of the QP algorithm.

We start by defining a “conventional” active-set method on a generic convex QP with constraints written in standard form. The problem format is

$$
\min_{x} \ Q(x) = g^T (x - \bar x) + \tfrac12 (x - \bar x)^T H (x - \bar x) \quad \text{subject to} \quad c + A(x - \bar x) = 0, \ \ x \ge 0, \tag{3.8}
$$

where $\bar x$, $c$, $A$, $g$ and $H$ are constant, with $H$ positive-definite. Throughout, we assume that the constraints are feasible, i.e., there exists at least one nonnegative $x$ such that $c + A(x - \bar x) = 0$.

Given a feasible $x_0$, active-set methods generate a feasible sequence $\{x_j\}$ such that $Q(x_{j+1}) \le Q(x_j)$ with $x_{j+1} = x_j + \alpha_j p_j$. Let the index sets $\mathcal{A}_0$ and $\mathcal{F}_0$ be defined as in (1.3). At the start of the $j$th QP iteration, given primal-dual iterates $(x_j, w_j)$, new estimates $(x_j + p_j, w_j + q_j)$ are defined by solving a QP formed by fixing the variables with indices in $\mathcal{A}_0(x_j)$ and defining $p_j$ such that $x_j + p_j$ minimizes $Q(x)$ with respect to the free variables, subject to the equality constraints. With this definition, the quantities $w_j + q_j$ are the Lagrange multipliers at the minimizer $x_j + p_j$. The components of $p_j$ with indices in $\mathcal{A}_0(x_j)$ are zero, and the free components $p_F = [\,p_j\,]_F$ are determined from the equations

$$
\begin{pmatrix} H_F & -A_F^T \\ A_F & 0 \end{pmatrix}
\begin{pmatrix} p_F \\ q_j \end{pmatrix}
= - \begin{pmatrix} [\, g + H(x_j - \bar x) - A^T w_j \,]_F \\ c + A(x_j - \bar x) \end{pmatrix}, \tag{3.9}
$$

where $[\,\cdot\,]_F$ denotes the subvector of components with indices in $\mathcal{F}_0(x_j)$. The choice of step length $\alpha_j$ is based on remaining feasible with respect to the satisfied bounds. If $x_j + p_j$ is feasible, i.e., $x_j + p_j \ge 0$, then $\alpha_j$ will be taken as unity. Otherwise, $\alpha_j$ is set to $\alpha_M$, the largest feasible step along $p_j$. Finally, the iteration index $j$ is incremented by one and the iteration is repeated.

It must be emphasized that this active-set method is not well defined unless the equations (3.9) have a solution at every $(x_j, w_j)$.
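The step-length rule described above (a unit step if $x_j + p_j \ge 0$, otherwise the largest feasible step $\alpha_M$) can be sketched as a simple ratio test. The function name `step_length` is a hypothetical choice, and the sketch assumes that the current point satisfies $x_j \ge 0$.

```python
def step_length(x, p):
    """Return 1 if x + p >= 0, else the largest alpha with x + alpha*p >= 0.

    Assumes x >= 0 componentwise (the current QP iterate is feasible).
    """
    alpha = 1.0
    for xi, pi in zip(x, p):
        # Only components that would become negative at the full step restrict alpha.
        if pi < 0.0 and xi + pi < 0.0:
            alpha = min(alpha, -xi / pi)
    return alpha

alpha_blocked = step_length([1.0, 2.0], [-2.0, 1.0])   # first bound blocks the step
alpha_full = step_length([1.0, 1.0], [0.5, -0.5])      # full step stays feasible
```

In the first call the bound on the first variable is reached at $\alpha = 1/2$; in the second call the full step is feasible and $\alpha = 1$ is returned.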

3.2.2. Solution of the bound-constrained subproblem

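In this section we apply the active-set method to a QP of the form

$$
\min_{v = (x, y)} \ g_M^T (v - \bar v) + \tfrac12 (v - \bar v)^T H_M (v - \bar v) \quad \text{subject to} \quad x \ge 0, \tag{3.10}
$$

where $\bar v = (\bar x, \bar y)$, and

$$
g_M = \begin{pmatrix} g - J^T\bigl(\pi + \nu(\pi - \bar y)\bigr) \\ \nu\bigl(c + \mu(\bar y - y^E)\bigr) \end{pmatrix},
\quad
H_M = \begin{pmatrix} H + \frac{1}{\mu}(1 + \nu) J^T J & \nu J^T \\ \nu J & \nu \mu I \end{pmatrix},
$$

with $H + \frac{1}{\mu} J^T J$ positive definite, $\nu \ge 0$, and $\pi = y^E - c/\mu$ (see (2.3)). The matrix $H_M$ is positive semidefinite under the given assumptions. This follows from the identity

$$
L^T H_M L = \begin{pmatrix} H + \frac{1}{\mu} J^T J & 0 \\ 0 & \nu \mu I_m \end{pmatrix},
\quad \text{where} \quad
L = \begin{pmatrix} I_n & 0 \\ -\frac{1}{\mu} J & I_m \end{pmatrix}.
$$

The matrix $L$ is nonsingular, and Sylvester's Law of Inertia gives

$$
\mathrm{In}(H_M) = \mathrm{In}(L^T H_M L) = \mathrm{In}\Bigl(H + \frac{1}{\mu} J^T J\Bigr) + (m, 0, 0) = (n + m, 0, 0) \quad \text{for } \nu > 0,
$$

and

$$
\mathrm{In}(H_M) = \mathrm{In}\Bigl(H + \frac{1}{\mu} J^T J\Bigr) + (0, 0, m) = (n, 0, m) \quad \text{for } \nu = 0.
$$

It follows that problem (3.10) is a convex QP, and we may apply the active-set method of Section 3.2.1.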
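For readers who wish to verify the congruence identity used above, the product can be formed in two steps. The following is a worked computation using only the definitions of $H_M$ and $L$ given in this section; no additional assumptions are needed.

```latex
H_M L =
\begin{pmatrix} H + \frac{1+\nu}{\mu} J^T J & \nu J^T \\ \nu J & \nu\mu I_m \end{pmatrix}
\begin{pmatrix} I_n & 0 \\ -\frac{1}{\mu} J & I_m \end{pmatrix}
=
\begin{pmatrix} H + \frac{1}{\mu} J^T J & \nu J^T \\ 0 & \nu\mu I_m \end{pmatrix},
\qquad
L^T (H_M L) =
\begin{pmatrix} I_n & -\frac{1}{\mu} J^T \\ 0 & I_m \end{pmatrix}
\begin{pmatrix} H + \frac{1}{\mu} J^T J & \nu J^T \\ 0 & \nu\mu I_m \end{pmatrix}
=
\begin{pmatrix} H + \frac{1}{\mu} J^T J & 0 \\ 0 & \nu\mu I_m \end{pmatrix}.
```

In the first product the $(1,1)$ block loses the term $\frac{\nu}{\mu} J^T J$ and the $(2,1)$ block becomes $\nu J - \nu J = 0$; in the second product the $(1,2)$ block becomes $\nu J^T - \frac{1}{\mu} J^T (\nu\mu I_m) = 0$.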

Given the $j$th QP iterate $v_j = (x_j, y_j)$, the generic active-set method applied to (3.10) defines the next iterate as $v_{j+1} = v_j + \alpha_j \Delta v_j$, where the free components of the vector $\Delta v_j = (p_j, q_j)$ satisfy the equations

$$
[H_M]_F \, \Delta v_F = -[\, g_M + H_M (v_j - \bar v) \,]_F, \tag{3.11}
$$

where $\Delta v_F = (p_F, q_j)$ and the index set $\mathcal{F}_0(x_j)$ is defined as in (1.3). The equations (3.11) appear to be ill-conditioned for small $\mu$ because of the $O(1/\mu)$ term in the (1,1) block of the matrix $H_M$. However, this ill-conditioning is superficial. The next result shows that $\Delta v_F$ may be determined by solving an equivalent nonsingular primal-dual system with conditioning dependent on that of the original problem.

Theorem 3.2. Consider the application of the active-set method to the QP (3.10). Then, for every $\nu \ge 0$, there exists a positive $\bar\mu$ such that, for all $0 < \mu < \bar\mu$, the free components of the QP search direction $(p_j, q_j)$ satisfy the nonsingular primal-dual system

$$
\begin{pmatrix} H_F & -J_F^T \\ J_F & \mu I \end{pmatrix}
\begin{pmatrix} p_F \\ q_j \end{pmatrix}
= - \begin{pmatrix} [\, g + H(x_j - \bar x) - J^T y_j \,]_F \\ c + \mu (y_j - y^E) + J(x_j - \bar x) \end{pmatrix}. \tag{3.12}
$$

Proof. First, we consider the definition of the search direction when $\nu > 0$. In this case it suffices to show that the linear systems (3.11) and (3.12) are equivalent. For any positive $\nu$, we may define the matrix

$$
T = \begin{pmatrix} I & -\frac{1 + \nu}{\nu \mu} J_F^T \\ 0 & \frac{1}{\nu} I_m \end{pmatrix},
$$

where the identity matrix $I$ has dimension $n_F$, the column dimension of $J_F$. The matrix $T$ is nonsingular with $n_F + m$ rows and columns. It follows that the equations

$$
T [H_M]_F \, \Delta v_F = -T [\, g_M + H_M (v_j - \bar v) \,]_F
$$

have the same solution as those of (3.11). The primal-dual equations (3.12) follow by direct multiplication. The nonsingularity of the equations (3.12) follows from the nonsingularity of $T$, and the fact that $H_M$ (and all symmetric submatrices formed from its rows and columns) is nonsingular.

The resulting equations (3.12) are independent of $\nu$, but the simple proof above is not applicable when $\nu = 0$ because $T$ is undefined in this case. For $\nu = 0$, the QP objective includes only the primal variables $x$, which implies that problem (3.10) may be written as

$$
\min_{x \ge 0} \ (g - J^T \pi)^T (x - \bar x) + \tfrac12 (x - \bar x)^T \Bigl(H + \frac{1}{\mu} J^T J\Bigr) (x - \bar x),
$$

with $y$ arbitrary. The active-set equations analogous to (3.11) are then

$$
\Bigl(H_F + \frac{1}{\mu} J_F^T J_F\Bigr) p_F = -\Bigl[\, g + \Bigl(H + \frac{1}{\mu} J^T J\Bigr)(x_j - \bar x) - J^T \pi \,\Bigr]_F. \tag{3.13}
$$

For any choice of $y_j$, define the $m$-vector $q_j$ such that

$$
q_j = -\frac{1}{\mu}\bigl(J_F p_F + \mu (y_j - \pi) + J(x_j - \bar x)\bigr), \tag{3.14}
$$

where $\pi = y^E - c/\mu$ (see (2.3)). Equations (3.13) and (3.14) may be combined to give equations $K \Delta v_F = -r$, where $\Delta v_F = (p_F, q_j)$,

$$
K = \begin{pmatrix} H_F + \frac{2}{\mu} J_F^T J_F & J_F^T \\ J_F & \mu I \end{pmatrix}
$$

and right-hand side

$$
r = \begin{pmatrix} [\, g + H(x_j - \bar x) \,]_F + \frac{2}{\mu} J_F^T J (x_j - \bar x) - J_F^T y_j + 2 J_F^T (y_j - \pi) \\ \mu (y_j - \pi) + J(x_j - \bar x) \end{pmatrix}.
$$

Forming the equations $T K \Delta v_F = -T r$, where $T$ is the nonsingular matrix

$$
T = \begin{pmatrix} I & -\frac{2}{\mu} J_F^T \\ 0 & I_m \end{pmatrix},
$$

gives the equivalent system

$$
\begin{pmatrix} H_F & -J_F^T \\ J_F & \mu I \end{pmatrix}
\begin{pmatrix} p_F \\ q_j \end{pmatrix}
= - \begin{pmatrix} [\, g + H(x_j - \bar x) - J^T y_j \,]_F \\ c + \mu (y_j - y^E) + J(x_j - \bar x) \end{pmatrix},
$$

which is identical to the system (3.12).

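Theorem 3.3. Let $(p_k, q_k)$ be the solution of the QP subproblem (2.7). If $p_F$ denotes the components of $p_k$ with indices in $\mathcal{F}_0(x_k + p_k)$, then $(p_F, q_k)$ satisfies the equations

$$
\begin{pmatrix} H_F & -J_F^T \\ J_F & \mu I \end{pmatrix}
\begin{pmatrix} p_F \\ q_k \end{pmatrix}
= - \begin{pmatrix} [\, g_k - J_k^T y_k - \bar H_k s \,]_F \\ c_k + \mu (y_k - y^E) - J_k s \end{pmatrix},
$$

where $F$ is defined in terms of the set $\mathcal{F}_0(x_k + p_k)$ and $s$ is a nonnegative vector such that

$$
s_i = \begin{cases} [x_k]_i & \text{if } i \in \mathcal{A}_0(x_k + p_k); \\ 0 & \text{if } i \in \mathcal{F}_0(x_k + p_k). \end{cases}
$$

Proof. The proof is analogous to that of Theorem 3.2.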

3.2.3. Solution of the stabilized SQP subproblem

In this section we show that under certain conditions, the conventional active-set method applied to the stabilized SQP subproblem (3.2) and the bound-constrained QP (3.1) will generate identical iterates.

Consider the application of the “generic” active-set method of Section 3.2.1 to the stabilized QP:

$$
\min_{x, y} \ g^T (x - \bar x) + \tfrac12 (x - \bar x)^T H (x - \bar x) + \tfrac12 \mu \|y\|^2 \quad \text{subject to} \quad c + J(x - \bar x) + \mu (y - y^E) = 0, \ \ x \ge 0. \tag{3.15}
$$

In terms of the data “$(x, \bar x, H, g, A, c)$” for the generic QP (3.8), we have variables “$x$” $= (x, y)$, with “$\bar x$” $= (\bar x, \bar y)$,

$$
\text{“}H\text{”} = \begin{pmatrix} H & 0 \\ 0 & \mu I \end{pmatrix}, \quad
\text{“}g\text{”} = \begin{pmatrix} g \\ \mu \bar y \end{pmatrix}, \quad
\text{“}A\text{”} = \begin{pmatrix} J & \mu I \end{pmatrix}, \quad \text{and} \quad
\text{“}c\text{”} = c + \mu (\bar y - y^E).
$$

(The discussion of the properties of the stabilized QP relative to the generic form (3.8) is not affected by the nonnegativity constraints being applied to only a subset of the variables in (3.15).) After some simplification, the equations analogous to (3.9) may be written as

$$
\begin{pmatrix} H_F & 0 & -J_F^T \\ 0 & \mu I & -\mu I \\ J_F & \mu I & 0 \end{pmatrix}
\begin{pmatrix} p_F \\ \bar p_F \\ q_j \end{pmatrix}
= - \begin{pmatrix} [\, g + H(x_j - \bar x) - J^T w_j \,]_F \\ \mu y_j - \mu w_j \\ c + \mu (y_j - y^E) + J(x_j - \bar x) \end{pmatrix}, \tag{3.16}
$$

where $p_F$ and $\bar p_F$ denote the free components of the search directions for the $x$ and $y$ variables respectively. (Observe that the right-hand side of (3.16) is independent of $\bar y$.) The second block of equations gives $\bar p_F = q_j - y_j + w_j$, which implies that

$$
y_{j+1} = y_j + \bar p_F = y_j + q_j - y_j + w_j = w_j + q_j = w_{j+1},
$$

so that the primal $y$-variables and dual variables of the stabilized QP are identical. Similarly, substituting for $\bar p_F$ in the third block of equations in (3.16), and using the primal-dual equivalence $w_j = y_j$ gives

$$
\begin{pmatrix} H_F & -J_F^T \\ J_F & \mu I \end{pmatrix}
\begin{pmatrix} p_F \\ q_j \end{pmatrix}
= - \begin{pmatrix} [\, g + H(x_j - \bar x) - J^T y_j \,]_F \\ c + \mu (y_j - y^E) + J(x_j - \bar x) \end{pmatrix}, \tag{3.17}
$$

which are identical to the equations associated with the QP subproblem (3.10).
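The elimination of $\bar p_F$ can be checked numerically on a tiny example. The sketch below solves the three-block system (3.16) and the reduced system (3.17) with a small Gaussian-elimination routine and compares the results. All data values, the choice $x_j = \bar x$ with every variable free, and the names `solve`, `A16` and `A17` are assumptions made purely for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# Illustrative data: n = 2 free variables, m = 1 constraint, x_j = x_bar.
H = [[2.0, 0.5], [0.5, 1.0]]
J = [1.0, -1.0]                  # the single row of the Jacobian
mu, c, y_E = 0.1, 0.3, 0.5
g = [1.0, -1.0]
y_j = 0.2
w_j = y_j                        # primal-dual equivalence w_j = y_j

r_x = [g[i] - J[i] * w_j for i in range(2)]   # [g - J^T w_j]_F
r_c = c + mu * (y_j - y_E)                    # c + mu (y_j - y^E)

# System (3.16) in the unknowns (p_1, p_2, p_bar, q).
A16 = [[H[0][0], H[0][1], 0.0, -J[0]],
       [H[1][0], H[1][1], 0.0, -J[1]],
       [0.0, 0.0, mu, -mu],
       [J[0], J[1], mu, 0.0]]
b16 = [-r_x[0], -r_x[1], -(mu * y_j - mu * w_j), -r_c]
p1, p2, p_bar, q16 = solve(A16, b16)

# Reduced system (3.17) in the unknowns (p_1, p_2, q).
A17 = [[H[0][0], H[0][1], -J[0]],
       [H[1][0], H[1][1], -J[1]],
       [J[0], J[1], mu]]
b17 = [-r_x[0], -r_x[1], -r_c]
s1, s2, q17 = solve(A17, b17)
```

With $w_j = y_j$ the second block row forces $\bar p_F = q_j$, so both systems return the same $(p_F, q_j)$ up to rounding error.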

The preceding discussion constitutes a proof of the following result.

Theorem 3.4. Consider the application of the active-set method to the bound constrained QP (3.10) and stabilized QP (3.15) defined with the same quantities $c$, $g$, $J$ and $H$. Consider any $x_0$ and $y_0$ such that $(x_0, y_0)$ is feasible for the stabilized QP (3.15). Then, for every $\nu \ge 0$, there exists a positive $\bar\mu$ such that, for all $0 < \mu < \bar\mu$, the active-set method generates identical primal-dual iterates $\{(x_j, y_j)\}_{j \ge 0}$.

4. Convergence

The convergence of Algorithm 2.1 is discussed under the following assumptions.

Assumption 4.1. Each $\bar H(x_k, y_k)$ is chosen so that the sequence $\{\bar H(x_k, y_k)\}_{k \ge 0}$ is bounded, with $\{\bar H(x_k, y_k) + (1/\mu^R_k) J(x_k)^T J(x_k)\}_{k \ge 0}$ uniformly positive definite.

Assumption 4.2. The functions $f$ and $c$ are twice continuously differentiable.

Assumption 4.3. The sequence $\{x_k\}_{k \ge 0}$ is contained in a compact set.

In the “worst” case, i.e., when all iterates are eventually M-iterates or F-iterates, Algorithm 2.1 emulates a primal-dual augmented Lagrangian method [9,10,43]. Consequently, it is possible that $y^E_k$ and $\mu^R_k$ will remain fixed over a sequence of iterations, although this is rare in practice. Nonetheless, our convergence result must consider this situation, which we now investigate.

Theorem 4.1. Let Assumptions 4.1–4.3 hold. If there exists an integer $\hat k$ such that $\mu^R_k \equiv \mu^R > 0$ and $k$ is an F-iterate for all $k \ge \hat k$, then the following hold:

(i) solutions $\{\Delta v_k\}_{k \ge \hat k}$ to subproblem (2.9) are bounded above;

(ii) solutions $\{\Delta v_k\}_{k \ge \hat k}$ to subproblem (2.9) are bounded away from zero; and

(iii) there exists a constant $\epsilon > 0$ such that

$$
\nabla \mathcal{M}^{\nu}(v_k; y^E_k, \mu^R_k)^T \Delta v_k \le -\epsilon \quad \text{for all } k \ge \hat k.
$$

Proof. The assumptions of this theorem guarantee that

$$
\tau_k \equiv \tau > 0, \quad \mu^R_k = \mu^R, \quad \text{and} \quad y^E_k = y^E \quad \text{for all } k \ge \hat k. \tag{4.1}
$$

We first prove part (i). As in the proof of Theorem 3.1, we know that the solution to (2.9) satisfies

$$
\Delta v_k = \begin{pmatrix} 0 \\ \pi_k - y_k \end{pmatrix} + M_k w_*, \quad \text{where} \quad M_k = \begin{pmatrix} \mu^R I \\ -J_k \end{pmatrix}
$$

and $w_*$ is the unique solution of

$$
\min_{w \in \mathbb{R}^n} \ \frac{\mu^R}{2} w^T \Bigl(\bar H_k + \frac{1}{\mu^R} J_k^T J_k\Bigr) w + w^T (g_k - J_k^T \pi_k) \quad \text{subject to} \quad x_k + \mu^R w \ge 0,
$$

for all $k \ge \hat k$. It follows from Assumption 4.1 that $\{\Delta v_k\}_{k \ge \hat k}$ is uniformly bounded provided that the quantities $g_k - J_k^T \pi_k$, $M_k$, $\pi_k$, and $y_k$ are all uniformly bounded for $k \ge \hat k$. The boundedness of $g_k - J_k^T \pi_k$, $\pi_k$ and $M_k$ follows from Assumption 4.2, Assumption 4.3, (4.1), and (2.3). Thus, it remains to prove that $\{y_k\}_{k \ge \hat k}$ is bounded. To this end, we first note that since $\mu^R_k = \mu^R$ for all $k \ge \hat k$, the update to $\mu_k$ given by (2.20) implies that $\mu_k \equiv \mu \ge \mu^R$ for some $\mu$ and all $k$ sufficiently large. From this point onwards the primal-dual merit function is monotonically decreasing, i.e., $\mathcal{M}^{\nu}(x_{k+1}, y_{k+1}; y^E, \mu) \le \mathcal{M}^{\nu}(x_k, y_k; y^E, \mu)$. Thus $\{y_k\}_{k \ge \hat k}$ must be bounded since if there existed a subsequence such that $\|y_k\|$ converged to infinity, then along that same subsequence $\mathcal{M}^{\nu}$ would also converge to infinity since both $\{f_k - c_k^T y^E + \frac{1}{2\mu} \|c_k\|^2\}_{k \ge \hat k}$ and $\{c_k\}_{k \ge \hat k}$ are bounded because of Assumptions 4.2 and 4.3. This completes the proof of part (i).

Part (ii) is established by showing that $\{\Delta v_k\}_{k \ge \hat k}$ is bounded away from zero. If this were not the case, there would exist a subsequence $\mathcal{S}_1 \subseteq \{k : k \ge \hat k\}$ such that $\lim_{k \in \mathcal{S}_1} \Delta v_k = 0$. It follows that the solution $\Delta v_k$ to problem (2.9) satisfies

$$
\begin{pmatrix} z_k \\ 0 \end{pmatrix} = H_M^{\nu}(v_k; \mu^R) \Delta v_k + \nabla \mathcal{M}^{\nu}(v_k; y^E, \mu^R) \quad \text{and} \quad 0 = \min(x_k + p_k, z_k)
$$

for all $k \in \mathcal{S}_1$. We may then conclude from the definition of $H_M^{\nu}$, Assumptions 4.1–4.3, and (4.1) that for $k \in \mathcal{S}_1$ sufficiently large, iterate $v_k$ will satisfy condition (2.17), be an M-iterate, and $\mu^R_k$ would be decreased. This contradicts the assumption that $\mu^R_k \equiv \mu^R$ for all $k \ge \hat k$. It follows that $\{\|\Delta v_k\|\}_{k \ge \hat k}$ is bounded away from zero and part (ii) holds.

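The proof of part (iii) is also by contradiction. Assume that there exists a subsequence $\mathcal{S}_2$ of $\{k : k \ge \hat k\}$ such that

$$
\lim_{k \in \mathcal{S}_2} \nabla \mathcal{M}^{\nu}(v_k; y^E, \mu^R)^T \Delta v_k = 0, \tag{4.2}
$$

where we have used (4.1). Using the matrix

$$
L_k = \begin{pmatrix} I & 0 \\ -\frac{1}{\mu^R} J_k & I \end{pmatrix},
$$

the fact that $\Delta v = 0$ is feasible for the convex problem (2.9), that $\Delta v_k$ is the solution to problem (2.9) for $\nu > 0$ chosen in Algorithm 2.1, (4.1), and Assumption 4.1, it follows that

$$
\begin{aligned}
-\nabla \mathcal{M}^{\nu}(v_k; y^E, \mu^R)^T \Delta v_k
&\ge \tfrac12 \Delta v_k^T H_M^{\nu}(v_k; \mu^R) \Delta v_k \\
&= \tfrac12 \Delta v_k^T L_k^{-T} L_k^T H_M^{\nu}(v_k; \mu^R) L_k L_k^{-1} \Delta v_k \\
&= \tfrac12 \begin{pmatrix} p_k \\ q_k + \frac{1}{\mu^R} J_k p_k \end{pmatrix}^T
\begin{pmatrix} \bar H_k + \frac{1}{\mu^R} J_k^T J_k & 0 \\ 0 & \nu \mu^R I \end{pmatrix}
\begin{pmatrix} p_k \\ q_k + \frac{1}{\mu^R} J_k p_k \end{pmatrix} \\
&\ge \tfrac12 \lambda_{\min} \|p_k\|^2 + \tfrac12 \nu \mu^R \|q_k + (1/\mu^R) J_k p_k\|^2,
\end{aligned}
$$

for some $\lambda_{\min} > 0$. Combining this with (4.2) we deduce that

$$
\lim_{k \in \mathcal{S}_2} p_k = \lim_{k \in \mathcal{S}_2} \bigl(q_k + (1/\mu^R) J_k p_k\bigr) = 0,
$$

from which $\lim_{k \in \mathcal{S}_2} q_k = 0$ follows from Assumptions 4.2 and 4.3. Hence $\lim_{k \in \mathcal{S}_2} \Delta v_k = 0$, which contradicts part (ii). It follows that part (iii) must hold.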

We may now state our convergence result for Algorithm 2.1.

Theorem 4.2. Let Assumptions 4.1–4.3 hold. If $v_k$ denotes the $k$th iterate generated by Algorithm 2.1, then either

(i) Algorithm 2.1 terminates with an approximate primal-dual first-order solution $v_K$ satisfying

$$
\|r_{\rm opt}(v_K)\| \le \tau_{\rm stop},
$$

where $r_{\rm opt}$ is defined by (2.19); or

(ii) there exists a subsequence $\mathcal{S}$ such that $\lim_{k \in \mathcal{S}} \mu^R_k = 0$, $\{y^E_k\}_{k \in \mathcal{S}}$ is bounded, $\lim_{k \in \mathcal{S}} \tau_k = 0$, and for each $k \in \mathcal{S}$ the vector $v_{k+1}$ is an approximate minimizer of the primal-dual augmented Lagrangian function (2.1) satisfying (2.17).

Proof. There are two cases to consider.

Case 1. A subsequence of $\{\|r_{\rm opt}(v_k)\|\}_{k \ge 0}$ converges to zero.

In this case it is clear from the definition of S-iterates, M-iterates, $\phi_S$, and $\phi_L$, and the fact that $\tau_{\rm stop} > 0$ that part (i) will be satisfied for some $K$ sufficiently large.

Case 2. The sequence $\{\|r_{\rm opt}(v_k)\|\}_{k \ge 0}$ is bounded away from zero.

Using the definition of S-iterates, M-iterates, and the functions $\phi_S$ and $\phi_L$, we may conclude that the number of S-iterates and L-iterates must be finite. We now claim that there must be an infinite number of M-iterates. To prove this, we assume to the contrary that the number of M-iterates is finite, so that all iterates are F-iterates for $k$ sufficiently large. It follows from the update to $\mu^R_k$ given by (2.18) and the assumption of this case that $\mu^R_k$ is eventually never decreased any further, and the update to $\mu_k$ given by (2.20) then implies that $\mu_k$ is also eventually fixed. Gathering these facts gives the existence of an integer $\hat k$ such that

$$
\mu^R_k \equiv \mu^R \le \mu \equiv \mu_k, \quad y^E_k \equiv y^E, \quad \tau_k \equiv \tau > 0, \quad \text{and $k$ is an F-iterate for all } k \ge \hat k.
$$

It then follows from (2.20) that

$$
\mathcal{M}^{\nu}(v_{k+1}; y^E, \mu) \le \mathcal{M}^{\nu}(v_k; y^E, \mu) + \min(\alpha_{\min}, \alpha_k)\, \eta_S N_k \quad \text{for all } k \ge \hat k, \tag{4.3}
$$

where $N_k$ is defined by (2.13). Moreover, parts (ii) and (iii) of Theorem 4.1 ensure that $\{N_k\}_{k \ge \hat k}$ is a negative sequence bounded away from zero. We also claim that $\{\alpha_k\}_{k \ge \hat k}$ is bounded away from zero. To see this, we first note that parts (i) and (iii) of Theorem 4.1 and Assumption 4.2 would ensure that $\{\alpha_k\}_{k \ge \hat k}$ is bounded away from zero if a standard Armijo line search were used, i.e., if $\mu^F_k = \mu^R$ and $N_k = \Delta v_k^T \nabla \mathcal{M}^{\nu}(v_k; y^E, \mu^R)$ in (2.12). However, the $\alpha_k$ that we actually compute can be no smaller since the actual definition of $N_k$ is less restrictive and we use a flexible line search that makes step acceptance more likely. Combining these facts with (4.3), we conclude that

$$
\mathcal{M}^{\nu}(v_{k+1}; y^E, \mu) \le \mathcal{M}^{\nu}(v_k; y^E, \mu) - \kappa \quad \text{for all } k \ge \hat k \text{ and some } \kappa > 0,
$$

so that

$$
\lim_{k \to \infty} \mathcal{M}^{\nu}(v_k; y^E, \mu) = -\infty.
$$

However, Assumptions 4.2 and 4.3 ensure that this is not possible. A contradiction has been reached, so there exist infinitely many M-iterates, and all iterates are M-iterates or F-iterates for all $k$ sufficiently large. Part (ii) now follows from (2.18) and the properties of the updates to $\tau_k$ and $y^E_k$ used for M-iterates and F-iterates in Algorithm 2.1.

The “ideal” scenario is that Algorithm 2.1 generates many S-iterates/L-iterates that rapidly converge to an approximate solution of NP; this corresponds to part (i) of Theorem 4.2. Part (ii) of Theorem 4.2, i.e., generating infinitely many M-iterates, is the fall-back position of Algorithm 2.1. We believe this result is the best that can be expected since we have not assumed any constraint qualification. In fact, the assumptions we have made do not preclude the possibility that problem NP is infeasible. Also, it has recently been proved [14,15,37] that iterates generated from the stabilized SQP subproblem exhibit superlinear convergence under rather mild conditions; in particular, strict complementarity is not assumed and no constraint qualification is required.

5. Conclusions

In this paper we developed and analyzed an SQP method for solving general nonlinear optimization problems. The algorithm is based on the natural pairing of a generalized primal-dual augmented Lagrangian function with a flexible line search.

A sequential quadratic programming based strategy for particle swarm optimization on single-objective numerical optimization

Article

Full-text available

Apr 2024

(a) Please download code from:【https://github.com/microhard1999/CODES】 Over the last decade, particle swarm optimization has become increasingly sophisticated because well-balanced exploration and exploitation mechanisms have been proposed. The sequential quadratic programming method, which is widely used for real-parameter optimization problems, demonstrates its outstanding local search capability. In this study, two mechanisms are proposed and integrated into particle swarm optimization for single-objective numerical optimization. A novel ratio adaptation scheme is utilized for calculating the proportion of subpopulations and intermittently invoking the sequential quadratic programming for local search start from the best particle to seek a better solution. The novel particle swarm optimization variant was validated on CEC2013, CEC2014, and CEC2017 benchmark functions. The experimental results demonstrate impressive performance compared with the state-of-the-art particle swarm optimization-based algorithms. Furthermore, the results also illustrate the effectiveness of the two mechanisms when cooperating to achieve significant improvement.

A Learnable Viewpoint Evolution Method for Accurate Pose Estimation of Complex Assembled Product

Article

Full-text available

May 2024

Balancing adaptability, reliability, and accuracy in vision technology has always been a major bottleneck limiting its application in appearance assurance for complex objects in high-end equipment production. Data-driven deep learning shows robustness to feature diversity but is limited by interpretability and accuracy. The traditional vision scheme is reliable and can achieve high accuracy, but its adaptability is insufficient. The deeper reason is the lack of appropriate architecture and integration strategies between the learning paradigm and empirical design. To this end, a learnable viewpoint evolution algorithm for high-accuracy pose estimation of complex assembled products under free view is proposed. To alleviate the balance problem of exploration and optimization in estimation, shape-constrained virtual–real matching, evolvable feasible region, and specialized population migration and reproduction strategies are designed. Furthermore, a learnable evolution control mechanism is proposed, which integrates a guided model based on experience and is cyclic-trained with automatically generated effective trajectories to improve the evolution process. Compared to the m of the state-of-the-art data-driven method and the m of the classic strategy combination, the pose estimation error of complex assembled product in this study is m, which proves the effectiveness of the proposed method. Meanwhile, through in-depth exploration, the robustness, parameter sensitivity, and adaptability to the virtual–real appearance variations are sequentially verified.

RadRCom: A Relay-Assisted Radar Communication System Design Framework

Article

Full-text available

Jan 2024

This study introduces the novel communication topology, namely RadRCom, integrating radar and relay-assisted communication systems for single antenna configuration as a proof of concept. While simultaneous radar and communication operations within the same spectrum gain momentum, our work advances this concept by incorporating relay assistance, particularly crucial in applications like vehicle-to-anything (V2X) communication. The inclusion of relays significantly enhances communication system performance, addressing challenges such as interference management between radar, relay, and communication nodes. This topology attracts three design challenges such as optimal radar waveform, relay parameters and communication system parameters. However, the key bottlenecks are the interference from radar to the relay and communication receiver and similarly the one from communication transmitter and relay node to the radar. Therefore, the work addresses these challenges simultaneously meeting the quality of service. Our proposed RadRCom system optimizes radar waveform and relay parameters to improve signal-to-interference noise ratio (SINR) at the radar and mean square error (MSE) of data transmission. We introduce two frameworks for parameter design, i.e, radar-centric and relay-centric ones. We take a sub-optimal iterative approach to address the computational complexity. Numerical simulations are performed to evaluate the performances of the proposed RadRCom system with the proposed algorithms.

Hybrid Renewable Production Scheduling for a PV–Wind-EV-Battery Architecture Using Sequential Quadratic Programming and Long Short-Term Memory–K-Nearest Neighbors Learning for Smart Buildings

Article

Full-text available

Mar 2024

Electricity demand in residential areas is generally met by the local low-voltage grid or, alternatively, the national grid, which produces electricity using thermal power stations based on conventional sources. These generators are holding back the revolution and the transition to a green planet, being unable to cope with climatic constraints. In the residential context, to ensure a smooth transition to an ecological green city, the idea of using alternative sources will offer the solution. These alternatives must be renewable and naturally available on the planet. This requires a generation that is very responsive to the constraints of the 21st century. However, these sources are intermittent and require a hybrid solution known as Hybrid Renewable Energy Systems (HRESs). To this end, we have designed a hybrid system based on PV-, wind-turbine- and grid-supported battery storage and an electric vehicle connected to a residential building. We proposed an energy management system based on nonlinear programming. This optimization was solved using sequential quadrature programming. The data were then processed using a long short-term memory (LSTM) model to predict, with the contribution and cooperation of each source, how to meet the energy needs of each home. The prediction was ensured with an accuracy of around 95%. These prediction results have been injected into K-nearest neighbors (KNN), random forest (RF) and gradient boost (GRU) repressors to predict the storage collaboration rates handled by the local battery and the electric vehicle. Results have shown an R2_score of 0.6953, 0.8381, and 0.739, respectively. This combination permitted an efficient prediction of the potential consumption from the grid with a value of an R²-score of around 0.9834 using LSTM. This methodology is effective in allowing us to know in advance the amount of energy of each source, storage, and excess grid injection and to propose the switching control of the hybrid architecture.

Adaptive decoupled robust design optimization

Article

Nov 2023
STRUCT SAF

Robust design optimization (RDO) is a valuable technique in the design of engineering structures as it can provide an optimum design solution that is relatively insensitive to input uncertainties. However, the nested double-loop estimation process required in RDO often results in significant computational costs. To address this issue, we propose an adaptive decoupled RDO method based on the Kriging surrogate model. This method transforms the nested double-loop estimation process into a traditional deterministic optimization procedure, thus reducing computational costs. Furthermore, a novel estimation expression for the performance standard deviation that can simultaneously reflect the uncertainties in both the prediction and the performance mean is established. The closed-form expressions of the performance mean and performance standard deviation under different design parameters are deduced, which are further implemented to the uncertainty propagation during the design optimization. Moreover, an adaptive framework is introduced to improve the computational accuracy of uncertainty propagation as well as optimization procedure to guarantee the estimation accuracy of RDO problems. Several numerical examples along with engineering cases are introduced to illustrate the effectiveness of the established adaptive decoupled adaptive RDO method, and the results demonstrate that the proposed method can effectively optimize the design of structures while reducing computational costs.

Developing a stage-independent multiple sampling plan with loss-based capability index for lot disposition

Article

Jun 2024
J OPER RES SOC

Biogas and Biofuel Production from Biowaste: Modelling and Simulation Study

Chapter

May 2024

Global energy demand has reached record highs, and even in 2023 nearly 90% of the world's energy was still produced from fossil fuels. Given the consequences of using such conventional energy sources, such as emissions of CO2 and other greenhouse gases, and the open questions about the sustainability of the energy transition, there is a dire need to move to cleaner fuels. As alternatives to conventional energy sources, many renewable energy sources, such as biogas, biofuels, hydrogen, and solar energy, have been researched. The success of biogas production is due primarily to the affordability of the available feedstock, the ease and availability of biofuels, low production costs, and the applications of biogas, which include heating, electricity, fuel, refrigeration, and power generation. The challenges discussed in the study include a large gap between biotechnology research and development, commercialization, and analysis of the future of biogas in the circular economy. Many lignocellulosic sources, such as manure, fruit, and vegetable wastes, can be used to generate biowaste, and anaerobic digestion can be used on a local or large scale. In this study, the MATLAB/Simulink environment is used to carry out a range of models and simulations that take into account speculative objectives and potential energy futures (for example, calculating the number of functioning plants in 2030, 2050, etc.), including input-output models, the Anaerobic Digestion Model 1 (ADM1), and other models.
Several predictions, viewpoints, and conclusions are discussed which claim that the outcomes of simulating such models can cause significant changes in economic systems, as the use of biogas and biofuel will lead to the recovery of a large amount of fossil phosphorus, to the tune of 100–150 billion euros; these also demonstrate the effectiveness and applications of biogas and biofuels. Graphical Abstract: an overview of the various models and simulations for biogas production from biowaste using MATLAB/Simulink.

Payload Optimization of Four Stage Launch Vehicle using Genetic Algorithm and Quadratic Programming based on Trajectory Optimization (in Persian)

Conference Paper

May 2024

Fast Robust Optimization of ORC Based on an Artificial Neural Network for Waste Heat Recovery

Article

May 2024
ENERGY

An Edge Architecture for Enabling Autonomous Aerial Navigation with Embedded Collision Avoidance Through Remote Nonlinear Model Predictive Control

Article

Jun 2024
J PARALLEL DISTR COM

A Combined Filter Line Search and Trust Region Method for Nonlinear Programming

Conference Paper

May 2006

A framework for solving a class of nonlinear programming problems via the filter method is presented. The proposed technique first solves a sequence of quadratic programming subproblems using a line search strategy. To induce global convergence, trial points are accepted provided there is a sufficient decrease in the objective function or the constraint violation function. When the step size reaches a minimum threshold such that the trial iterate is rejected by the filter, the algorithm temporarily switches to a trust region based algorithm to generate iterates that approach the feasible region and are also acceptable to the filter. Computational results on selected large-scale CUTE problems with the prototype code filLS are very encouraging, and numerical comparisons with LOQO and SNOPT show that the algorithm is efficient and reliable.

A Combined Filter Line Search and Trust Region Method for Nonlinear Programming

Article

Jun 2006

Testing a Class of Methods for Solving Minimization Problems with Simple Bounds on the Variables

Article

May 1988

We describe the results of a series of tests for a class of new methods of trust region type for solving the simple bound constrained minimization problem. The results are encouraging and lead us to believe that the methods will prove useful in solving large-scale problems.

A Schur-complement method for sparse quadratic programming

Chapter

Sep 1990

This volume comprises a set of research papers that together will provide an up-to-date survey of the current state of the art in numerical analysis. The contributions are based on talks given at a conference in honour of Jim Wilkinson, one of the foremost pioneers in numerical analysis. The contributors were all his colleagues and collaborators and are leading figures in their respective fields. The breadth of Jim Wilkinson's research is reflected in the main themes covered: linear algebra, error analysis and computer arithmetic, algorithms, and mathematical software. Particular topics covered include analysis of the Lanczos algorithm, determining the nearest defective matrix to a given one, QR-factorizations, error propagation models, parameter estimation problems, sparse systems, and shape-preserving splines. As a whole the volume reflects the current vitality of numerical analysis and will prove an invaluable reference for all numerical analysts.

A NEW DERIVATION OF SYMMETRIC POSITIVE DEFINITE SECANT UPDATES

Chapter

Dec 1981

In this paper, we introduce a simple new set of techniques for deriving symmetric and positive definite secant updates. We use these techniques to present a simple new derivation of the BFGS update using neither matrix inverses nor weighting matrices. A related derivation is shown to generate a large class of symmetric rank-two update formulas, together with the condition for each to preserve positive definiteness. We apply our techniques to generate a new projected BFGS update, and indicate applications to the efficient implementation of secant algorithms via the Cholesky factorization.

Duality in Quadratic Programming

Article

Jul 1960

W.S. Dorn

A proof, based on the duality theorem of linear programming, is given for a duality theorem for a class of quadratic programs. An illustrative application is made in the theory of elastic structures.

Second order corrections for non-differentiable optimization

Article

Jan 1982

R. Fletcher

Adaptive Precision Control in Quadratic Programming with Simple Bounds and/or Equalities

Article

Jan 1998

Many algorithms for the solution of quadratic programming problems generate a sequence of simpler auxiliary problems whose solutions approximate the solution of a given problem. When these auxiliary problems are solved iteratively, which may be advantageous for large problems, it is necessary to define the precision of their solution so that the whole procedure is effective. In this paper, we review our recent results on the implementation of algorithms with precision control that exploits the norm of the violation of the Karush-Kuhn-Tucker conditions.

On the Accurate Determination of Search Directions for Simple Differentiable Penalty Functions

Article

Jul 1986
IMA J NUMER ANAL

Nicholas Ian Mark Gould

We present numerically reliable methods for the calculation of a search direction for use in sequential methods for solving nonlinear programming problems. The methods presented are easy to adapt to such problems as locating directions of negative curvature and linear infinite descent. Encouraging numerical results are included.

An ℓ1 penalty method for nonlinear constraints

Article

R. Fletcher
