Subspace Methods: Dimension Balance between Approximation to Optimization Problem and Solving Subproblem

Pengcheng Xie
xpc@lsec.cc.ac.cn
Supervised by Prof. Ya-xiang Yuan
Institute of Computational Mathematics and Scientific/Engineering Computing
Academy of Mathematics and Systems Science
Chinese Academy of Sciences, China

Group Seminar
December 8, 2020
Outline
Introduction
Why study subspace methods?
Subspace methods with different structure
How to design subspace methods?
Conclusion and future work
What kinds of subspace methods are wanted?
Introduction
Why study subspace methods?
Solve image reconstruction in CT by PDFO²: failed

An inverse problem in [Chen et al. 2017]: find the best $x \in \mathbb{R}^n$ satisfying $f(x) = y$, i.e.
$$\min_{x \in \mathbb{R}^n} \|f(x) - y\|_2^2,$$
where $f: \mathbb{R}^n \to \mathbb{R}^n$ and $y \in \mathbb{R}^n$; here $x$ and $y$ are long vectors reshaped from a $512 \times 512$ matrix, so $n = 512 \times 512 = 262144$.

We chose PDFO to solve it, but PDFO reports an error: “uobyqa: problem too large for uobyqa. Try other solvers.”
Figure 1: Monochromatic image of the DE-472 lung phantoms
Sad: this problem cannot, and does not have to, be solved by DFO.
Happy: Tom's words¹

¹ Tom M. Ragonneau: Ph.D. student at PolyU, supervised by Prof. Zaikun Zhang and co-supervised by Prof. Xiaojun Chen.
² Powell's Derivative-Free Optimization solvers: https://www.pdfo.net
Tom's words and Zaikun's subspace method

Tom:
“In DFO, n = 100 is considered a large problem, and n = 200 is considered a very large problem. I read once that NEWUOA has been tested with n = 1000, but this is incredibly huge.”
“Do you have any way to reduce the size of your problem, to find some kind of space (of lower dimension) in which your variables may belong (even approximately)?”
[Zhang 2012]: solve the subproblem
$$\min_{d \in S_k} Q_k(x_k + d)$$
on the subspace
$$S_k = \mathrm{span}\{\nabla Q_k(x_k),\ d_{k-1},\ \bar{d}_k\},$$
where
$$\bar{d}_k = \sum_{y \in I_k} \frac{f(y) - f(x_k)}{\|y - x_k\|_2} \cdot \frac{y - x_k}{\|y - x_k\|_2} \approx \nabla f(x_k),$$
and $I_k$ is the interpolation point set.
Zaikun’s subspace method
Algorithm 1 NEWUOAs
1. Choose positive sequences $\{h_k\}$, $\{p_k\}$ and a constant $\varepsilon > 0$. Set the initial point $x_1$; $s_0 := 0$; $k := 1$.
2. Choose $m_k \in [\,n+1,\ \tfrac{(n+1)(n+2)}{2}\,]$, call MODEL($x_k, h_k, m_k$) to get $\tilde{g}_k$, the approximation of the gradient at $x_k$. If $h_k < \varepsilon$ and $\|\tilde{g}_k\| < \varepsilon$, stop. Let
$$S_k = \mathrm{span}\{\tilde{g}_k,\ s_{k-1}\}.$$
3. Set RHOEND $= p_k$, call NEWUOA to solve the subproblem
$$\min_{d \in S_k} f(x_k + d)$$
and get $d_k$.
4. If $f(x_k + d_k) < f(x_k)$, then $x_{k+1} := x_k + d_k$, $s_k := d_k$; otherwise $x_{k+1} := x_k$, $s_k := s_{k-1}$. Set $k := k+1$ and go to Step 2.
NEWUOA: dimension <1000.
NEWUOAs: dimension = 2000.
Global convergence and R-linear convergence are established.
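Below is a minimal, illustrative Python sketch of one iteration of Algorithm 1, under stated assumptions: SciPy's derivative-free Powell solver stands in for NEWUOA, a forward-difference estimate stands in for the MODEL step, and the helper names are hypothetical, not part of NEWUOAs or PDFO.

```python
# A rough sketch of one NEWUOAs-style subspace step (not the actual NEWUOAs code):
# build S_k = span{g_tilde_k, s_{k-1}} and minimize f restricted to that 2-D subspace.
import numpy as np
from scipy.optimize import minimize

def approx_gradient(f, x, h=1e-6):
    """Forward-difference gradient estimate (stand-in for MODEL)."""
    g, fx = np.zeros_like(x), f(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g

def subspace_step(f, xk, sk_prev):
    """Minimize f over x_k + S_k with S_k = span{g_tilde, s_{k-1}}."""
    g = approx_gradient(f, xk)
    Q, _ = np.linalg.qr(np.column_stack([g, sk_prev]))      # orthonormal basis of S_k
    res = minimize(lambda c: f(xk + Q @ c), np.zeros(Q.shape[1]),
                   method="Powell")                          # derivative-free subproblem solver
    d = Q @ res.x
    return (xk + d, d) if f(xk + d) < f(xk) else (xk, sk_prev)

# toy usage on a 50-dimensional quadratic
f = lambda x: np.sum((np.arange(1, 51) * x) ** 2)
x, s = np.full(50, 1.0), np.full(50, -1.0)
for _ in range(20):
    x, s = subspace_step(f, x, s)
print(f(x))
```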
Optimization problem and subproblem

Optimization problem: find $x^*$ satisfying
$$\min_{x} f(x) \quad \text{s.t.}\ x \in \mathcal{X}.$$
Subproblem: find $x_{k+1} = x_k + d$ satisfying
$$\min_{d} m_k(x_k + d) \quad \text{s.t.}\ d \in \mathcal{D}.$$
Choose $x_{k+1}$ from $x_k$ in the subproblem

Line search method
1. Generate a descent search direction $d_k$.
2. Search along this direction for a step size $\alpha_k$:
$$\min_{\alpha \in \mathbb{R}} \varphi_k(\alpha) = f(x_k + \alpha d_k).$$
3. $x_{k+1} = x_k + \alpha_k d_k$.
Direction: an $n$-dimensional problem. Stepsize: a 1-dimensional problem.

Trust region method
1. Given a trust region radius (playing a role like a step size).
2. Compute a search direction in the trust region:
$$\min_{s \in \mathbb{R}^n} Q_k(s) = g_k^\top s + \tfrac{1}{2} s^\top B_k s \quad \text{s.t.}\ \|s\|_2 \le \Delta_k.$$
3. $x_{k+1} = x_k + s_k$.
Radius: a 1-dimensional problem. Direction: an $n$-dimensional problem.

Where is the intermediate-dimension problem ($1 < \text{dimension} < n$)? Subspace methods.
Why do we want to solve the intermediate-dimension problem?

Question: there is no need to deliberately produce an intermediate-dimension problem.
Answer: there is an imbalance between computing the direction and computing the stepsize³.
Reduce the dimension.
Gather more information.
Special problems or needs.
[Conn et al. 1994]: require that $S_k$ contains at least two components:
a gradient-related direction, to encourage global convergence;
a Newton-related direction, to encourage fast asymptotic convergence.

Extension of the dogleg method:
$$\min_{d \in \mathbb{R}^n} m(d) \stackrel{\mathrm{def}}{=} f + g^\top d + \tfrac{1}{2} d^\top B d \quad \text{s.t.}\ \|d\| \le \Delta$$
becomes
$$\min_{d} m(d) = f + g^\top d + \tfrac{1}{2} d^\top B d \quad \text{s.t.}\ \|d\| \le \Delta,\ d \in \mathrm{span}\{g,\ B^{-1} g\}.$$
³ Prof. Ya-xiang Yuan's presentation at ICM 2014.
Subspace methods with different structure
How to design subspace methods?
Typical scenarios to design subspace methods
[Liu, Wen and Yuan 2020]⁴:

Problem: $\min_{x} f(x)$ s.t. $x \in \mathcal{X}$.
Subproblem ($x_k \to x_{k+1}$): $\min_{d} m_k(x_k + d)$ s.t. $d \in \mathcal{D}$.
Find a linear combination of several known directions.
Linear and nonlinear conjugate gradient methods [Sun and Yuan
2006; Nocedal and Wright 2006]
Nesterov’s accelerated gradient method [Nesterov 2003; Nesterov
1983]
Heavy-ball method [Polyak 1964]
Momentum method [Goodfellow, Bengio, and Courville 2016]
Keep the objective function and constraints, but add an extra
restriction in a certain subspace.
OMP [Tropp and Gilbert 2008]
CoSaMP [Needell and Tropp 2010]
LOBPCG [Andrew 2001]
LMSVD [Liu, Wen, and Zhang 2013]
⁴ Subspace Methods for Nonlinear Optimization: http://bicmr.pku.edu.cn/~wenzw/paper/SubOptv.pdf
Approximate the objective function but keep the constraints.
BCD [Tseng and Yun 2009]
RBR [Wen, Goldfarb, and Scheinberg 2012]
Parallel subspace correction [Fornasier 2007; Fornasier and Schönlieb 2008]
Use subspace techniques to approximate the objective
functions.
Sampling/Sketching [Goodfellow, Bengio, and Courville 2016; Mahoney 2011]
Nyström approximation [Tropp et al. 2017]
Approximate the objective function and design new
constraints.
Trust region methods with subspaces [Shultz, Schnabel, and Byrd
1985]
FPC_AS [Wen et al. 2010]
Add a postprocess procedure after the subspace problem is
solved.
Truncated subspace method for tensor train [Zhang, Wen, and
Zhang 2016]
Integrate the optimization method and subspace update in one
framework.
Polynomial-filtered subspace method for low-rank matrix
optimization [Liu, Wen and Yuan 2020]
Subspace relationship

$\dim(S_k) = \dim(S_{k+1})$: $S_k \neq S_{k+1}$
$\dim(S_k) \le \dim(S_{k+1})$: $S_k \subseteq S_{k+1}$
$\sum_{k=1}^{p} \dim(S_k) = n$: $S_1 + \cdots + S_p = \mathbb{R}^n$
$\dim(S_k) = |I_k|$: $S_k = \mathrm{span}\{e_i : i \in I_k\}$
$\dim(S_k) \ge \dim(S_{k+1})$: $S_k \supseteq S_{k+1}$

Corresponding method families:
Direction-Gradient subspaces
One-add-one-drop subspaces
Krylov subspaces
Nested subspaces
Complement subspaces
Subsampling/Sketching
Stochastic optimization
Active set methods (towards)
Direction-Gradient subspace method for $x \in \mathbb{R}^n$

Linear combination of several known directions.

Conjugate gradient method:
$$d_k = -g_k + \beta_{k-1} d_{k-1}, \qquad S_k = \mathrm{span}\{x_{k-1},\ g_k,\ d_{k-1}\}.$$
Global convergence; $n$-step local quadratic convergence.

Nesterov's accelerated gradient method (FISTA) [Beck and Teboulle 2009; Nesterov 2003]:
$$y_k = x_{k-1} + \frac{k-2}{k+1}(x_{k-1} - x_{k-2}), \qquad x_k = y_k - \alpha_k \nabla f(y_k), \qquad S_k = \mathrm{span}\{x_{k-1},\ x_{k-2},\ \nabla f(y_k)\}.$$

Gradient method: stepsize $\tfrac{1}{L}$, convergence rate $O(\tfrac{1}{k})$.
FISTA: stepsize $\tfrac{1}{L}$, convergence rate $O(\tfrac{1}{k^2})$.
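The FISTA recursion above fits in a few lines of Python; the sketch below assumes a smooth convex quadratic and the step size 1/L from the slide, and only illustrates the update (the proximal term of the full FISTA method is omitted).

```python
# Accelerated gradient (FISTA-type) recursion: extrapolate to y_k, then a gradient step at y_k.
import numpy as np

def accelerated_gradient(grad, L, x0, iters=200):
    x_prev = x_curr = x0.copy()
    for k in range(1, iters + 1):
        y = x_curr + (k - 2) / (k + 1) * (x_curr - x_prev)   # extrapolation step
        x_next = y - grad(y) / L                             # gradient step with stepsize 1/L
        x_prev, x_curr = x_curr, x_next                      # x_k lies in span{x_{k-1}, x_{k-2}, grad f(y_k)}
    return x_curr

# toy usage: f(x) = 1/2 x^T A x - b^T x
rng = np.random.default_rng(0)
M = rng.standard_normal((40, 40)); A = M @ M.T + np.eye(40); b = rng.standard_normal(40)
L_const = np.linalg.eigvalsh(A)[-1]                          # Lipschitz constant of grad f
x = accelerated_gradient(lambda z: A @ z - b, L_const, np.zeros(40))
print(np.linalg.norm(A @ x - b))
```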
Limited memory methods for eigenvalue computation
Find a $p$-dimensional eigenspace associated with the $p$ largest eigenvalues of $A$:
$$\max_{X \in \mathbb{R}^{n \times p}} \mathrm{tr}(X^\top A X) \quad \text{s.t.}\ X^\top X = I. \qquad (1)$$
The first-order optimality conditions of (1) are
$$AX = X\Lambda, \qquad X^\top X = I,$$
where $\Lambda = X^\top A X \in \mathbb{R}^{p \times p}$ is the matrix of Lagrange multipliers.
At each iteration, the methods solve a subspace trace maximization problem
$$Y = \arg\max_{X \in \mathbb{R}^{n \times p}} \big\{ \mathrm{tr}(X^\top A X) : X^\top X = I,\ X \subset \mathcal{S} \big\}.$$
LOBPCG [Andrew 2001]: $\mathcal{S} = \mathrm{span}\{X_{i-1},\ X_i,\ AX_i\}$.
There is no theory that accurately predicts the convergence speed, but LOBPCG does not converge more slowly than block steepest ascent at any step.
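For reference, SciPy ships an LOBPCG implementation; the short usage sketch below (matrix sizes and tolerances are arbitrary choices) computes the p largest eigenpairs of a symmetric matrix, with the block iterate playing the role of the subspace basis updated at each step.

```python
# Usage sketch of scipy.sparse.linalg.lobpcg for the p largest eigenpairs.
import numpy as np
from scipy.sparse.linalg import lobpcg

rng = np.random.default_rng(0)
n, p = 500, 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2 + n * np.eye(n)          # symmetric positive definite test matrix
X0 = rng.standard_normal((n, p))           # random initial block of p vectors
vals, vecs = lobpcg(A, X0, largest=True, tol=1e-8, maxiter=200)
print(np.sort(vals)[::-1])                 # approximations to the p largest eigenvalues
```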
Truncated subspace method for tensor train
[Zhang, Wen, and Zhang 2016]:
$x \in \mathbb{R}^n \ \to\ x \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$, e.g. $n \sim O(10^{42})$.
Tensor cores $X_\mu \in \mathbb{R}^{r_{\mu-1} \times r_\mu \times n_\mu}$, with fixed dimensions $r_\mu$ bounded by a constant $r$: the TT rank.

Figure 2: $x_{i_1 i_2 \ldots i_d} = X_1(i_1) X_2(i_2) \cdots X_d(i_d)$ (TT format)
Figure 3: $X_{i_1,\ldots,i_\mu,\ldots,i_d;\, j} = X_1(i_1) \cdots X_{\mu, j}(i_\mu) \cdots X_d(i_d)$ ($\mu$-BTT format)

Operator TT format: $A_{i_1 i_2 \cdots i_d,\, j_1 j_2 \cdots j_d} = A_1(i_1, j_1) A_2(i_2, j_2) \cdots A_d(i_d, j_d)$,
where $A_\mu(i_\mu, j_\mu) \in \mathbb{R}^{r_{\mu-1} \times r_\mu}$ for $i_\mu, j_\mu \in \{1, \ldots, n_\mu\}$.
Truncated subspace method for tensor train
The eigenvalue problem in the BTT format is
$$\min_{X \in \mathbb{R}^{n \times p}} \mathrm{tr}(X^\top A X) \quad \text{s.t.}\ X^\top X = I_p \ \text{and}\ X \in \mathbb{T}_{n,r,p}.$$
Subspaces: $S^T_k = \mathrm{span}\{P_T(AX_k),\ X_k,\ X_{k-1}\}$ or $S^T_k = \mathrm{span}\{X_k,\ P_T(R_k),\ P_T(P_k)\}$,
where $P_T(AX_k)$ is the truncation of $AX_k$ to $\mathbb{T}_{n,r,p}$. The subspace problem in the BTT format is
$$Y_{k+1} := \arg\min_{X \in \mathbb{R}^{n \times p}} \mathrm{tr}(X^\top A X) \quad \text{s.t.}\ X^\top X = I_p,\ X \in S^T_k, \qquad (2)$$
which is equivalent to a generalized eigenvalue decomposition problem:
$$\min_{V \in \mathbb{R}^{q \times p}} \mathrm{tr}(V^\top S^\top A S V) \quad \text{s.t.}\ V^\top S^\top S V = I_p.$$
We then project $Y_{k+1}$ back to the required set $\mathbb{T}_{n,r,p}$:
$$X_{k+1} = \arg\min_{X \in \mathbb{R}^{n \times p}} \|X - Y_{k+1}\|_F^2 \quad \text{s.t.}\ X^\top X = I_p,\ X \in \mathbb{T}_{n,r,p}.$$
This problem can be solved using an alternating minimization scheme.
Quasi-Newton methods
L-BFGS: matrix $B_k$ and inverse matrix $H_k$ [Sun and Yuan 2006; Nocedal and Wright 2006].
The search direction is $d_k = -B_k^{-1} g_k = -H_k g_k$; both $B_k$ and $H_k$ can be written in a compact representation [Byrd, Nocedal, and Schnabel 1997].
Assume that there are $p$ pairs of vectors
$$U_k = [s_{k-p}, \ldots, s_{k-1}] \in \mathbb{R}^{n \times p}, \qquad Y_k = [y_{k-p}, \ldots, y_{k-1}] \in \mathbb{R}^{n \times p},$$
where $s_i = x_{i+1} - x_i$ and $y_i = g_{i+1} - g_i$.
For a given initial matrix $H_k^{(0)}$, $H_k = H_k^{(0)} + C_k P_k C_k^\top$, where
$$C_k := [\,U_k,\ H_k^{(0)} Y_k\,] \in \mathbb{R}^{n \times 2p}, \qquad D_k = \mathrm{diag}\big[s_{k-p}^\top y_{k-p}, \ldots, s_{k-1}^\top y_{k-1}\big],$$
$$P_k = \begin{bmatrix} R_k^{-\top}\big(D_k + Y_k^\top H_k^{(0)} Y_k\big) R_k^{-1} & -R_k^{-\top} \\ -R_k^{-1} & 0 \end{bmatrix}, \qquad (R_k)_{i,j} = \begin{cases} s_{k-p+i-1}^\top y_{k-p+j-1}, & i \le j, \\ 0, & \text{otherwise.} \end{cases}$$
The initial matrix $H_k^{(0)}$ is $\gamma_k I$. Then
$$d_k \in \mathrm{span}\{g_k,\ s_{k-1}, \ldots, s_{k-p},\ y_{k-1}, \ldots, y_{k-p}\}.$$
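The span property above can be seen directly in the standard two-loop recursion, which never forms H_k explicitly; the Python sketch below is a generic L-BFGS direction computation (with an exact line search for a quadratic test problem), not the compact-representation code of the cited papers.

```python
# L-BFGS two-loop recursion with H_k^{(0)} = gamma_k * I; the returned direction
# lies in span{g_k, s_{k-p..k-1}, y_{k-p..k-1}}.
import numpy as np

def lbfgs_direction(g, S, Y):
    """S, Y: lists of the last p pairs s_i, y_i, oldest first."""
    q = g.copy()
    alphas, rhos = [], []
    for s, y in reversed(list(zip(S, Y))):        # first loop: newest pair to oldest
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        q -= alpha * y
        rhos.append(rho); alphas.append(alpha)
    gamma = (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1])     # scaling gamma_k for the initial matrix
    r = gamma * q
    for (s, y), rho, alpha in zip(zip(S, Y), reversed(rhos), reversed(alphas)):
        beta = rho * (y @ r)                      # second loop: oldest pair to newest
        r += (alpha - beta) * s
    return -r                                     # d_k = -H_k g_k

# toy usage on f(x) = 1/2 x^T A x - b^T x with exact line search
rng = np.random.default_rng(1)
M = rng.standard_normal((30, 30)); A = M @ M.T + np.eye(30); b = rng.standard_normal(30)
x, S, Y, p = np.zeros(30), [], [], 5
for _ in range(50):
    g = A @ x - b
    d = -g if not S else lbfgs_direction(g, S, Y)
    alpha = -(g @ d) / (d @ (A @ d))              # exact step along d for the quadratic
    x_new = x + alpha * d
    S.append(x_new - x); Y.append((A @ x_new - b) - g)
    S, Y = S[-p:], Y[-p:]
    x = x_new
print(np.linalg.norm(A @ x - b))
```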
Limited memory methods for eigenvalue computation
$$\max_{X \in \mathbb{R}^{n \times p}} \mathrm{tr}(X^\top A X) \quad \text{s.t.}\ X^\top X = I. \qquad (3)$$
The first-order optimality conditions of (3) are $AX = X\Lambda$, $X^\top X = I$, where $\Lambda = X^\top A X \in \mathbb{R}^{p \times p}$ is the matrix of Lagrange multipliers.
At each iteration, the methods solve a subspace trace maximization problem
$$Y = \arg\max_{X \in \mathbb{R}^{n \times p}} \big\{ \mathrm{tr}(X^\top A X) : X^\top X = I,\ X \subset \mathcal{S} \big\}.$$
LMSVD [Liu, Wen, and Zhang 2013]: $\mathcal{S} = \mathrm{span}\{X_i,\ X_{i-1},\ \cdots,\ X_{i-t}\}$.
Global convergence under reasonable assumptions.
Table 1: SSI vs. LMSVD ($p \ll k \ll n$)
method                     SSI           LMSVD
total cost per iteration   $O(n+k)$      $O(k(1+p)^2)$
Augmented Rayleigh-Ritz method for eigenvalue computation
The RR map $(Y, \Sigma) = \mathrm{RR}(A, Z)$ solves the trace-maximization subproblem with $\mathcal{S} = \mathcal{R}(Z)$.
The augmentation of the subspaces in LOBPCG and LMSVD is the main reason why they generally achieve faster convergence than the classic SSI.
ARR: for some integer $t \ge 0$, design a block Krylov subspace structure
$$\mathcal{S} = \mathrm{span}\{X,\ AX,\ A^2 X,\ \ldots,\ A^t X\}. \qquad (4)$$
Apply the RR procedure $(\hat{Y}, \hat{\Sigma}) = \mathrm{RR}(A, K_t)$, where $K_t = [X, AX, A^2 X, \ldots, A^t X]$.
The $p$ leading Ritz pairs $(Y, \Sigma)$ are extracted from $(\hat{Y}, \hat{\Sigma})$.
The analysis of ARR in [Wen and Zhang 2017; Wen and Zhang 2015] gives the convergence rate of SSI:
$$\frac{\lambda_{p+1}}{\lambda_p} \ \text{for RR}\ (t = 0), \qquad \frac{\lambda_{(t+1)p+1}}{\lambda_p} \ \text{for ARR}\ (t > 0).$$
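In its simplest form, the ARR idea reduces to a Rayleigh-Ritz projection onto the block Krylov basis (4); the following Python sketch is an illustrative dense implementation without deflation, polynomial filtering, or the other refinements of the cited papers.

```python
# One augmented Rayleigh-Ritz step: orthonormalize [X, AX, ..., A^t X],
# solve the small projected eigenproblem, keep the p leading Ritz pairs.
import numpy as np

def arr_step(A, X, t):
    n, p = X.shape
    blocks, Z = [X], X
    for _ in range(t):                       # augment with AX, A^2 X, ..., A^t X
        Z = A @ Z
        blocks.append(Z)
    Q, _ = np.linalg.qr(np.hstack(blocks))   # orthonormal basis of the augmented subspace
    w, V = np.linalg.eigh(Q.T @ A @ Q)       # Ritz values/vectors of the projected matrix
    idx = np.argsort(w)[::-1][:p]            # p leading Ritz pairs
    return Q @ V[:, idx], w[idx]

# toy usage: a few ARR-accelerated block iterations
rng = np.random.default_rng(0)
M = rng.standard_normal((300, 300)); A = (M + M.T) / 2
X = rng.standard_normal((300, 4))
for _ in range(10):
    X, ritz = arr_step(A, X, t=2)
print(ritz)                                  # approximations to the 4 largest eigenvalues
```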
Trust region methods with subspace method
The trust region subproblem (TRS) is normally
$$\min_{s \in \mathbb{R}^n} Q_k(s) = g_k^\top s + \tfrac{1}{2} s^\top B_k s \quad \text{s.t.}\ \|s\|_2 \le \Delta_k, \qquad (5)$$
where $B_k = \nabla^2 Q_k(x_k)$.
A subspace version of the trust region subproblem is suggested in [Shultz, Schnabel, and Byrd 1985]:
$$\min_{s \in \mathbb{R}^n} Q_k(s) \quad \text{s.t.}\ \|s\|_2 \le \Delta_k,\ s \in S_k. \qquad (6)$$
The Steihaug truncated CG method [Steihaug 1983]
The dogleg method [Powell 1970]
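A compact Python sketch of the Steihaug truncated CG method is given below; it follows the textbook scheme (CG implicitly explores the growing Krylov subspace and stops at the trust-region boundary or at negative curvature) and is illustrative rather than production code.

```python
# Steihaug truncated CG for min Q(s) = g^T s + 1/2 s^T B s, s.t. ||s|| <= Delta.
import numpy as np

def to_boundary(s, d, Delta):
    """Positive tau such that ||s + tau * d|| = Delta."""
    a, b, c = d @ d, 2 * (s @ d), s @ s - Delta ** 2
    return (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)

def steihaug_cg(g, B, Delta, tol=1e-8):
    s = np.zeros_like(g)
    r, d = g.copy(), -g.copy()
    for _ in range(g.size):
        Bd = B @ d
        dBd = d @ Bd
        if dBd <= 0:                                    # negative curvature: go to the boundary
            return s + to_boundary(s, d, Delta) * d
        alpha = (r @ r) / dBd
        if np.linalg.norm(s + alpha * d) >= Delta:      # step leaves the region: stop on boundary
            return s + to_boundary(s, d, Delta) * d
        s = s + alpha * d
        r_new = r + alpha * Bd
        if np.linalg.norm(r_new) < tol:
            return s
        beta = (r_new @ r_new) / (r @ r)
        d, r = -r_new + beta * d, r_new
    return s

# toy usage with a possibly indefinite B
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50)); B = (M + M.T) / 2
g = rng.standard_normal(50)
print(np.linalg.norm(steihaug_cg(g, B, Delta=1.0)))
```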
Parallel computing refinement of trust region methods based on truncated CG method⁵
Table 2: Speedup ratio of the parallel refinement of the trust region method

dimension   np=2      np=4      np=6      np=8      np=10     np=12
10^2        1.68180   1.94154   2.36451   2.91613   3.43903   3.67575
10^3        0.920956  1.47545   1.55419   1.84841   2.43805   2.64320
10^4        1.79342   2.94063   3.86112   4.49823   4.94911   5.18691
5x10^4      1.87369   3.04962   3.94852   5.10126   6.29814   6.71970
10^5        1.89060   3.55094   5.17231   5.88022   6.52538   7.02531
Figure 4: Time versus number of processes in the parallel refinement
The TRS $\min_{s \in \mathbb{R}^n} Q_k(s)$ s.t. $\|s\|_2 \le \Delta_k$ is solved by truncated CG.
⁵ Homework for the Parallel Computing course taught by Prof. Tao Cui.
Trust region methods with subspace methods
Theorem (Wang and Yuan 2006)
Suppose $B_1 = \sigma I$ with $\sigma > 0$, the matrix updating formula is any one chosen from the PSB and Broyden families (the updates may be singular), and $B_k$ is the $k$-th updated matrix. Let $s_k$ be an optimal solution of TRS (5) and set $x_{k+1} = x_k + s_k$. Let $S_k = \mathrm{span}\{g_1, g_2, \cdots, g_k\}$. Then $s_k \in S_k$, and for any $z \in S_k$ and $u \in S_k^{\perp}$ it holds that
$$B_k z \in S_k, \qquad B_k u = \sigma u.$$
Subspace trust region quasi-Newton method for unconstrained optimization
[Wang and Yuan 2006].
Line search quasi-Newton methods [Gill and Leonard 1999; Gill and Leonard
2000].
Subspace Powell–Yuan trust region method for equality constrained
optimization [Grapiglia, Yuan, and Yuan 2013].
Coordinate descent methods
Algorithm 2 Coordinate Descent Algorithm (a code sketch follows below)
1: Input an initial value $x^{(0)}$.
2: For $t = 1, 2, \ldots$
3: Pick a coordinate $i$ from $\{1, 2, \ldots, n\}$ and set
$$x_i^{(t+1)} = \arg\min_{x_i \in \mathbb{R}} f\big(x_i, \omega_i^{(t)}\big),$$
where $\omega_i^{(t)}$ represents all the other coordinates; here $S_i = \mathrm{span}\{e_i\}$.
4: End.
Converges slowly.
Does not require calculation of the gradient $\nabla f_k$.
Several algorithms, such as that of Hooke and Jeeves [Hooke and Jeeves 1961], are based on these ideas [Mackworth 1987; Ricketts 1982].
Block coordinate descent method (BCD) [Tseng 2001]
The alternating direction method of multipliers (ADMM) [Boyd et al. 2011]
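The promised sketch: a cyclic coordinate descent loop for least squares, where each inner update is the exact minimizer over the one-dimensional subspace span{e_i}; it is a toy illustration, not the BCD or ADMM variants cited above.

```python
# Cyclic coordinate descent for min_x ||A x - b||^2.
import numpy as np

def coordinate_descent_ls(A, b, iters=100):
    m, n = A.shape
    x = np.zeros(n)
    r = A @ x - b                        # maintain the residual A x - b
    col_sq = np.sum(A * A, axis=0)       # ||A_i||^2 for every column
    for _ in range(iters):
        for i in range(n):               # cycle through the coordinates
            delta = -(A[:, i] @ r) / col_sq[i]   # exact 1-D minimizer along e_i
            x[i] += delta
            r += delta * A[:, i]         # incremental residual update
    return x

# toy usage
rng = np.random.default_rng(0)
A = rng.standard_normal((80, 30)); b = rng.standard_normal(80)
x = coordinate_descent_ls(A, b)
print(np.linalg.norm(A.T @ (A @ x - b)))   # near-zero normal-equations residual
```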
Parallel line search subspace correction method
The optimization problem is
$$\min_{x \in \mathbb{R}^n} \varphi(x) := f(x) + h(x), \qquad (7)$$
where $f(x)$ is differentiable and convex, and $h(x)$ is convex (possibly nonsmooth).
The $\ell_1$-regularized minimization (LASSO) [Tibshirani 1996] and sparse logistic regression [Shevade and Keerthi 2003] are examples of (7).
Decompose
$$\mathbb{R}^n = \mathcal{X}_1 + \mathcal{X}_2 + \cdots + \mathcal{X}_p,$$
where $\mathcal{X}_i = \{x \in \mathbb{R}^n \mid \mathrm{supp}(x) \subseteq J_i\}$, $1 \le i \le p$, with $J := \{1, \ldots, n\}$ and $J = \bigcup_{i=1}^{p} J_i$.
Let $\varphi_k^{(i)}$ be a surrogate function of $\varphi$ restricted to the $i$-th subspace at the $k$-th iteration. The PSC framework for solving (7) is
$$d_k^{(i)} = \arg\min_{d^{(i)} \in \mathcal{X}_i} \varphi_k^{(i)}\big(d^{(i)}\big),\ \ i = 1, \ldots, p, \qquad x_{k+1} = x_k + \sum_{i=1}^{p} \alpha_k^{(i)} d_k^{(i)}. \qquad (8)$$
Convergence holds if $\sum_{i=1}^{p} \alpha_k^{(i)} \le 1$ and $\alpha_k^{(i)} > 0$ ($1 \le i \le p$).
Usually $\alpha_k^{(i)}$ is quite small and convergence becomes slow.
A parallel line search subspace correction method (PSCL) is proposed in [Dong et al. 2015], with Armijo backtracking line search for a larger step size.
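To make the framework (8) concrete, the Python sketch below applies one block proximal-gradient step per subspace for the LASSO and combines the directions with the constant weights alpha_i = 1/p (so the convergence condition above holds); it is a simplified illustration, not the PSCL method of [Dong et al. 2015].

```python
# Parallel subspace correction step (8) for min_x 1/2 ||A x - b||^2 + mu ||x||_1
# with non-overlapping coordinate blocks J_1, ..., J_p and alpha_i = 1/p.
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def psc_lasso(A, b, mu, p_blocks=4, iters=1000):
    m, n = A.shape
    L = np.linalg.norm(A, 2) ** 2                    # Lipschitz constant of grad f
    blocks = np.array_split(np.arange(n), p_blocks)
    x = np.zeros(n)
    for _ in range(iters):
        g = A.T @ (A @ x - b)                        # gradient of the smooth part at x_k
        d = np.zeros(n)
        for J in blocks:                             # each block can be solved in parallel
            d[J] = soft_threshold(x[J] - g[J] / L, mu / L) - x[J]
        x = x + d / p_blocks                         # combine with alpha_i = 1/p, sum of alphas = 1
    return x

# toy usage
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 100))
x_true = np.zeros(100); x_true[:5] = 3.0
b = A @ x_true + 0.01 * rng.standard_normal(60)
x = psc_lasso(A, b, mu=0.5)
print(np.count_nonzero(np.abs(x) > 1e-3))
```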
Subspace by subsampling/sketching
For a linear least squares problem on massive data sets:
$$\min_{x} \|Ax - b\|_2^2 \ \longrightarrow\ \min_{x} \|W(Ax - b)\|_2^2, \qquad (9)$$
where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$.
The sketching technique chooses a matrix $W \in \mathbb{R}^{r \times m}$ with $r \ll m$ and formulates a reduced problem.
Each element of $W$ is sampled i.i.d. from a normal distribution with mean zero and variance $\frac{1}{r}$ [Mahoney 2011; Woodruff 2014].
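A minimal sketch of Gaussian sketching for (9), using the stated sampling of W; it compares the reduced solution with the full least-squares solution on synthetic data.

```python
# Sketched least squares: solve min ||W(Ax - b)|| with W having i.i.d. N(0, 1/r) entries.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 10000, 50, 500
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 0.1 * rng.standard_normal(m)

x_full, *_ = np.linalg.lstsq(A, b, rcond=None)             # full least-squares solution

W = rng.normal(0.0, 1.0 / np.sqrt(r), size=(r, m))         # Gaussian sketching matrix
x_sketch, *_ = np.linalg.lstsq(W @ A, W @ b, rcond=None)   # reduced r x n problem

print(np.linalg.norm(x_sketch - x_full) / np.linalg.norm(x_full))
```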
Consider the system of nonlinear equations
$$F(x) = 0, \quad x \in \mathbb{R}^n, \qquad (10)$$
and the nonlinear least squares problem
$$\min_{x \in \mathbb{R}^n} \|F(x)\|_2^2,$$
where $F(x) = (F_1(x), F_2(x), \cdots, F_m(x))^\top \in \mathbb{R}^m$. Consider only the equations $F_i(x) = 0$, $i \in I_k$.
More work has been done in [Yuan 2009].
Subspace by coordinate directions
[Yuan 2014]:
For sparsity structures, let $g_k^{(i)}$ be the $i$-th component of the gradient $g_k$, ordered so that
$$\big|g_k^{(i_1)}\big| \ge \big|g_k^{(i_2)}\big| \ge \big|g_k^{(i_3)}\big| \ge \cdots \ge \big|g_k^{(i_n)}\big|.$$
The $\tau$-steepest coordinates subspace is
$$S_k = \mathrm{span}\big\{e^{(i_1)},\ e^{(i_2)},\ \ldots,\ e^{(i_\tau)}\big\}.$$
The steepest descent direction in this subspace is sufficiently descent:
$$\min_{d \in S_k} \frac{d^\top g_k}{\|d\|_2 \|g_k\|_2} \le -\sqrt{\frac{\tau}{n}}.$$
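The subspace above is cheap to form: pick the tau coordinates with the largest gradient magnitudes and restrict the steepest descent step to them, as in the Python sketch below (a toy illustration with a fixed step size, not the method of [Yuan 2014] itself).

```python
# Steepest descent restricted to the tau-steepest coordinates subspace.
import numpy as np

def tau_steepest_step(grad, x, tau, alpha):
    g = grad(x)
    idx = np.argsort(np.abs(g))[::-1][:tau]   # tau largest |g_k^{(i)}|
    d = np.zeros_like(x)
    d[idx] = -g[idx]                          # steepest descent direction within S_k
    return x + alpha * d

# toy usage on a separable quadratic f(x) = 1/2 sum_i c_i x_i^2
rng = np.random.default_rng(0)
c = rng.uniform(1.0, 10.0, size=1000)
grad = lambda z: c * z
x = rng.standard_normal(1000)
for _ in range(500):
    x = tau_steepest_step(grad, x, tau=50, alpha=0.1)
print(np.linalg.norm(grad(x)))
```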
Stochastic methods
An empirical risk minimization problem is
$$\min_{x} f(x) := \frac{1}{N} \sum_{i=1}^{N} f_i(x).$$
Stochastic gradient method [Goodfellow, Bengio, and Courville 2016]: select a uniformly random sample $s_k$ from $\{1, \ldots, N\}$ and update
$$x_{k+1} = x_k - \alpha_k \nabla f_{s_k}(x_k).$$
The mini-batch SGD method:
$$x_{k+1} = x_k - \frac{\alpha_k}{|I_k|} \sum_{s_k \in I_k} \nabla f_{s_k}(x_k).$$
The momentum method: $v_{k+1} = \mu_k v_k - \alpha_k \nabla f_{s_k}(x_k)$, $x_{k+1} = x_k + v_{k+1}$.
Stochastic second-order method, the subsampled Newton method:
$$\Big( \frac{1}{|I_k^H|} \sum_{i \in I_k^H} \nabla^2 f_i(x_k) \Big) d_k = -\frac{1}{|I_k|} \sum_{s_k \in I_k} \nabla f_{s_k}(x_k).$$
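The mini-batch and momentum updates above amount to a few lines of Python; the sketch below uses a least-squares empirical risk and arbitrary hyperparameters purely for illustration.

```python
# Mini-batch SGD with momentum for 1/N sum_i f_i(x), with f_i(x) = 1/2 (a_i^T x - b_i)^2.
import numpy as np

def minibatch_sgd(A, b, batch=32, lr=0.2, mu=0.9, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    N, n = A.shape
    x, v = np.zeros(n), np.zeros(n)
    for _ in range(epochs):
        for _ in range(N // batch):
            I = rng.choice(N, size=batch, replace=False)     # random index set I_k
            g = A[I].T @ (A[I] @ x - b[I]) / batch           # mini-batch gradient
            v = mu * v - lr * g                              # momentum update
            x = x + v
    return x

# toy usage
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 20)) / np.sqrt(20)
x_true = rng.standard_normal(20)
b = A @ x_true + 0.01 * rng.standard_normal(2000)
x = minibatch_sgd(A, b)
print(np.linalg.norm(x - x_true))
```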
Active set methods for sparse optimization
The $\ell_1$-regularized minimization problem is
$$\min_{x \in \mathbb{R}^n} \phi_\mu(x) := \mu \|x\|_1 + f(x), \qquad (11)$$
where $\mu > 0$ and $f(x): \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable.
FPC_AS [Wen et al. 2010] is a two-stage active set algorithm.
Subspace optimization in the second stage: for a given vector $x \in \mathbb{R}^n$, define
$$A(x) := \big\{ i \in \{1, \cdots, n\} : |x^{(i)}| = 0 \big\} \quad \text{and} \quad I(x) := \big\{ i \in \{1, \cdots, n\} : |x^{(i)}| > 0 \big\}.$$
Then a smooth subproblem, essentially an unconstrained problem, is
$$\min_{x}\ \mu\, \mathrm{sign}\big(x_k^{(I_k)}\big)^\top x^{(I_k)} + f(x) \quad \text{s.t.}\ x^{(i)} = 0,\ i \in A(x_k). \qquad (12)$$
If $|I(x_{k+1})| > m$, then do a hard truncation. Solve the subspace optimization problem to obtain $x_{k+1}$.
Problem (12) can be solved by L-BFGS-B [Byrd et al. 1995]; a simplified code sketch is given below.
The active set strategies have also been studied in [Solntsev, Nocedal, and
Byrd 2014; Keskar et al. 2015].
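The simplified sketch promised above: a shrinkage (proximal-gradient) stage estimates the support, and the smooth subproblem (12) over the free variables is then passed to SciPy's L-BFGS-B. This is only an illustration in the spirit of a two-stage active-set method, not the FPC_AS algorithm; f is taken to be a least-squares term.

```python
# Two-stage sketch for min_x mu ||x||_1 + 1/2 ||A x - b||^2.
import numpy as np
from scipy.optimize import minimize

def two_stage_l1(A, b, mu, shrink_iters=200):
    m, n = A.shape
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(n)
    for _ in range(shrink_iters):                        # stage 1: shrinkage / ISTA steps
        g = A.T @ (A @ x - b)
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - mu / L, 0.0)
    free = np.flatnonzero(np.abs(x) > 1e-8)              # estimated support I(x_k)
    signs = np.sign(x[free])

    def sub_obj(z):                                      # stage 2: smooth subproblem (12)
        xf = np.zeros(n); xf[free] = z
        r = A @ xf - b
        val = mu * (signs @ z) + 0.5 * (r @ r)
        grad = mu * signs + A[:, free].T @ r
        return val, grad

    res = minimize(sub_obj, x[free], jac=True, method="L-BFGS-B")
    x_out = np.zeros(n); x_out[free] = res.x
    return x_out

# toy usage
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 120))
x_true = np.zeros(120); x_true[:6] = 2.0
b = A @ x_true
x = two_stage_l1(A, b, mu=0.1)
print(np.flatnonzero(np.abs(x) > 1e-3))
```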
Conclusion and future work
What kinds of subspace methods are wanted?
Conclusion

Optimization problem: find $x^*$ satisfying $\min_{x} f(x)$ s.t. $x \in \mathcal{X}$.
Subproblem: find $x_{k+1} = x_k + d$ satisfying $\min_{d} m_k(x_k + d)$ s.t. $d \in \mathcal{D}$.

Subspace relationships across iterations:
$\dim(S_k) = \dim(S_{k+1})$: $S_k \neq S_{k+1}$
$\dim(S_k) \le \dim(S_{k+1})$: $S_k \subseteq S_{k+1}$
$\sum_{k=1}^{p} \dim(S_k) = n$: $S_1 + \cdots + S_p = \mathbb{R}^n$
$\dim(S_k) = |I_k|$: $S_k = \mathrm{span}\{e_i : i \in I_k\}$
$\dim(S_k) \ge \dim(S_{k+1})$: $S_k \supseteq S_{k+1}$
Future work
Relationship between subspaces in the iteration
Subspace methods in manifold optimization
Subspace methods in derivative free optimization
Subspace acceleration for given algorithms
Future work: relationship between subspaces in the iteration
Subspace is an evolution of the direction: conjugate direction method → conjugate subspace method.

Definition
$p_0, p_1, \cdots, p_l$ are conjugate with respect to the symmetric positive definite matrix $A$ if
$$p_i^\top A p_j = 0 \quad \text{for all}\ i \neq j.$$
Only search in a subspace ONCE.

Figure 5: The coordinate search method can make slow progress.
Future work: subspace methods in derivative free optimization
Main difference between Powell's derivative-free optimization and optimization with derivatives: how to get the subproblem objective function $m_k(x)$.
The interpolation conditions are
$$\alpha_0 + \alpha^\top y_1 + \tfrac{1}{2} y_1^\top H y_1 = F(y_1),$$
$$\alpha_0 + \alpha^\top y_2 + \tfrac{1}{2} y_2^\top H y_2 = F(y_2),$$
$$\vdots$$
$$\alpha_0 + \alpha^\top y_k + \tfrac{1}{2} y_k^\top H y_k = F(y_k).$$
Figure 6: Model function by interpolation.
NEWUOA: number of interpolation points reduced from $\frac{(n+1)(n+2)}{2}$ to $2n+1$, with the least-change update
$$\min_{Q_k} \big\| \nabla^2 Q_k - \nabla^2 Q_{k-1} \big\|_F^2 \quad \text{s.t.}\ Q_k(y) = F(y),\ y \in Y_k.$$
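To show where the interpolation conditions lead, the Python sketch below builds a fully determined quadratic model from (n+1)(n+2)/2 sample points by solving the linear system above; NEWUOA's minimum Frobenius-norm update with only 2n+1 points is not implemented here, and the helper names are hypothetical.

```python
# Build m(y) = alpha0 + alpha^T y + 1/2 y^T H y from function values by interpolation.
import numpy as np

def quad_features(y):
    """Monomial basis [1, y_i, y_i * y_j (i <= j)] of a quadratic on R^n."""
    n = y.size
    cross = [y[i] * y[j] for i in range(n) for j in range(i, n)]
    return np.concatenate(([1.0], y, cross))

def build_quadratic_model(F, Y):
    """Solve the interpolation conditions m(y_l) = F(y_l) for all sample points y_l."""
    Phi = np.array([quad_features(y) for y in Y])
    coef = np.linalg.solve(Phi, np.array([F(y) for y in Y]))
    return lambda y: quad_features(y) @ coef

# toy usage with n = 3, hence (n+1)(n+2)/2 = 10 interpolation points
n = 3
rng = np.random.default_rng(0)
F = lambda y: np.exp(0.1 * y.sum()) + y @ y
Y = [rng.standard_normal(n) for _ in range((n + 1) * (n + 2) // 2)]
m = build_quadratic_model(F, Y)
z = 0.1 * rng.standard_normal(n)
print(m(z), F(z))                          # model value vs. true value near the origin
```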
Future work: subspace methods in manifold optimization
Riemannian steepest descent method [Udriste 1994]: direction $-\mathrm{grad}\, f(x)$.
Robust global convergence; slow (linear) local convergence.
Riemannian Newton method [Luenberger 1972; Gabay 1982]: direction $-\mathrm{Hess}\, f(x)^{-1} \mathrm{grad}\, f(x)$.
Fast local convergence (quadratic or even cubic); requires additional work for global convergence.
Riemannian trust-region method [Absil, Baker, and Gallivan 2007]: find
$$\eta^* = \arg\min_{\eta \in T_x \mathcal{M},\ \|\eta\| \le \Delta} m_x(\eta), \qquad x_{\text{next}} = R_x(\eta^*).$$
References I
P.-A Absil, Christopher Baker, and Kyle Gallivan. “Trust-Region Methods on Riemannian Manifolds”. In:
Foundations of Computational Mathematics 7 (July 2007), pp. 303–330. DOI:10.1007/s10208- 005-0179-9.
Knyazev Andrew. “Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned
Conjugate Gradient Method”. In: (Nov. 2001).
Richard Byrd, Jorge Nocedal, and Robert Schnabel. “Representations Of Quasi-Newton Matrices And Their Use In
Limited Memory Methods”. In: Mathematical Programming 63 (Aug. 1997). DOI:10.1007/BF01582063.
Stephen Boyd et al. “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers”. In: Foundations and Trends in Machine Learning 3 (Jan. 2011), pp. 1–122. DOI: 10.1561/2200000016.
Amir Beck and Marc Teboulle. “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems”. In: SIAM J. Imaging Sciences 2 (Jan. 2009), pp. 183–202. DOI: 10.1137/080716542.
Richard H. Byrd et al. “A limited memory algorithm for bound constrained optimization”. English. In: SIAM Journal of Scientific Computing 16 (Sept. 1995), pp. 1190–1208. ISSN: 1064-8275. DOI: 10.1137/0916069.
Buxin Chen et al. “Image reconstruction and scan configurations enabled by optimization-based algorithms in
multispectral CT”. In: Physics in Medicine and Biology 62 (Nov. 2017), pp. 8763–8793. DOI:
10.1088/1361-6560/aa8a4b.
A. R. Conn et al. On Iterated-Subspace Minimization Methods for Nonlinear Optimization.1994.
References II
Qian Dong et al. “A Parallel Line Search Subspace Correction Method for Composite Convex Optimization”. In:
Journal of the Operations Research Society of China 3 (May 2015). DOI:10.1007/s40305- 015-0079-x.
Massimo Fornasier. “Domain decomposition methods for linear inverse problems with sparsity constraints”. In:
Inverse Problems - INVERSE PROBL 23 (Dec. 2007). DOI:10.1088/0266-5611/23/6/014.
Massimo Fornasier and Carola-Bibiane Schönlieb. “Subspace Correction Methods for Total Variation and ℓ1-Minimization”. In: SIAM Journal on Numerical Analysis 47 (Jan. 2008). DOI: 10.1137/070710779.
Daniel Gabay. “Minimizing a differentiable function over a differential manifold”. In: Journal of Optimization
Theory and Applications 37 (June 1982). DOI :10.1007/BF00934767.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning.http://www.deeplearningbook.org.
MIT Press, 2016.
Philip Gill and Michael Leonard. “Reduced-Hessian Quasi-Newton Methods For Unconstrained Optimization”. In:
SIAM Journal on Optimization 12 (Mar. 2000). DOI:10.1137/S1052623400307950.
Philip Gill and Michael Leonard. “Limited-Memory Reduced-Hessian Methods For Large-Scale Unconstrained
Optimization”. In: SIAM J. Optim. 14 (Aug. 1999). DOI:10.1137/S1052623497319973.
Geovani Grapiglia, Jin-Yun Yuan, and Ya-xiang Yuan. “A Subspace Version of the Powell–Yuan Trust-Region
Algorithm for Equality Constrained Optimization”. In: Journal of the Operations Research Society of China 4 (Dec.
2013). DOI:10.1007/s40305- 013-0029- 4.
References III
Robert Hooke and T. A. Jeeves. ““Direct Search” Solution of Numerical and Statistical Problems”. In: J. ACM 8.2 (Apr. 1961), pp. 212–229. ISSN: 0004-5411. DOI: 10.1145/321062.321069. URL: https://doi.org/10.1145/321062.321069.
Nitish Keskar et al. “A Second-Order Method for Convex `1-Regularized Optimization with Active Set Prediction”.
In: Optimization Methods and Software 31 (May 2015). DOI:10.1080/10556788.2016.1138222.
David Luenberger. “The Gradient Projection Method Along Geodesics”. In: Management Science 18 (July 1972),
pp. 620–631. DOI :10.1287/mnsc.18.11.620.
Xin Liu, Zaiwen Wen, and Yin Zhang. “Limited Memory Block Krylov Subspace Optimization for Computing
Dominant Singular Value Decompositions”. In: SIAM Journal on Scientific Computing 35 (May 2013). DOI:
10.1137/120871328.
A.K. Mackworth. “John Wiley , Sons”. In: Encyclopedia of Artificial Intelligence (Jan. 1987), pp. 205–211.
Michael Mahoney. “Randomized Algorithms for Matrices and Data”. In: Computing Research Repository - CORR 3
(Apr. 2011). DOI:10.1561/2200000035.
Y. Nesterov. “Introductory Lectures on Convex Optimization: A Basic Course”. In: Comput. Program. (Jan. 2003).
Yu Nesterov. “A method of solving a convex programming problem with convergence rate O(1/k2)”. In: vol. 27.
Jan. 1983, pp. 372–376.
References IV
Deanna Needell and Joel Tropp. “CoSaMP: Iterative Signal Recovery from Incomplete and Inaccurate Samples”.
In: Communications of the ACM 53 (Dec. 2010). DOI:10.1145/1859204.1859229.
Jorge Nocedal and Stephen Wright. Numerical Optimization. Jan. 2006. ISBN: 978-0-387-30303-1. DOI: 10.1007/978-0-387-40065-5.
Boris Polyak. “Some methods of speeding up the convergence of iteration methods”. In: Ussr Computational
Mathematics and Mathematical Physics 4 (Dec. 1964), pp. 1–17. DOI:10.1016/0041-5553(64)90137- 5.
M. J. D. Powell. “A Hybrid Method for Nonlinear Equations”. In: Numerical Methods for Nonlinear Algebraic
Equations. Ed. by P. Rabinowitz. Gordon and Breach, 1970.
R. E. Ricketts. “Practical optimization, Philip E. Gill, Walter Murray and Margret H. Wright, Academic Press Inc.
(London) Limited, 1981. No. of pages: 401. Price 19.20, 46.50. ISBN: 0.12.283950.1”. In: International Journal for
Numerical Methods in Engineering 18.6 (1982), pp. 954–954. DOI :
https://doi.org/10.1002/nme.1620180612. eprint:
https://onlinelibrary.wiley.com/doi/pdf/10.1002/nme.1620180612.URL:
https://onlinelibrary.wiley.com/doi/abs/10.1002/nme.1620180612.
S Shevade and S Keerthi. “A Simple and Efficient Algorithm for Gene Selection Using Sparse Logistic Regression”.
In: Bioinformatics (Oxford, England) 19 (Dec. 2003), pp. 2246–2253. DOI:10.1093/bioinformatics/btg308.
Stefan Solntsev, Jorge Nocedal, and Richard Byrd. “An Algorithm for Quadratic `1-Regularized Optimization with
a Flexible Active-Set Strategy”. In: Optimization Methods and Software 30 (Dec. 2014). DOI:
10.1080/10556788.2015.1028062.
References V
Gerald Shultz, Robert Schnabel, and Richard Byrd. “A Family of Trust-Region-Based Algorithms for
Unconstrained Minimization with Strong Global Convergence Properties”. In: Siam Journal on Numerical Analysis
- SIAM J NUMER ANAL 22 (Feb. 1985), pp. 47–67. DOI:10.1137/0722003.
Trond Steihaug. “The Conjugate Gradient Method and Trust Regions in Large Scale Optimization”. In: Siam
Journal on Numerical Analysis - SIAM J NUMER ANAL 20 (June 1983), pp. 626–637. DOI:10.1137/0720042.
Wenyu Sun and Ya-xiang Yuan. “Optimization theory and methods. Nonlinear programming”. In: 1 (Jan. 2006).
DOI:10.1007/b106451.
Joel Tropp and Anna Gilbert. “Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit”. In: Information Theory, IEEE Transactions on 53 (Jan. 2008), pp. 4655–4666. DOI: 10.1109/TIT.2007.909108.
Robert Tibshirani. “Regression Shrinkage and Selection Via the Lasso”. In: Journal of the Royal Statistical Society:
Series B (Methodological) 58 (Jan. 1996), pp. 267–288. DOI :10.1111/j.2517-6161.1996.tb02080.x.
Joel Tropp et al. “Fixed-Rank Approximation of a Positive-Semidefinite Matrix from Streaming Data”. In: (June
2017).
P. Tseng. “Convergence of a Block Coordinate Descent Method for Nondifferentiable Minimization”. In: Journal of
Optimization Theory and Applications 109 (Jan. 2001), pp. 475–494. DOI :10.1023/A:1017501703105.
Paul Tseng and Sangwoon Yun. “A Coordinate Gradient Descent Method for Nonsmooth Separable Minimization”. In: Math. Program. 117 (Mar. 2009), pp. 387–423. DOI: 10.1007/s10107-007-0170-0.
References VI
Constantin Udriste. Convex Functions and Optimization Methods on Riemannian Manifolds. Jan. 1994. DOI: 10.1007/978-94-015-8390-9.
Zaiwen Wen et al. “A Fast Algorithm for Sparse Reconstruction Based on Shrinkage, Subspace Optimization, and
Continuation”. In: SIAM J. Scientific Computing 32 (Jan. 2010), pp. 1832–1857. DOI:10.1137/090747695.
Zaiwen Wen, Donald Goldfarb, and Katya Scheinberg. “Block Coordinate Descent Methods for Semidefinite
Programming”. In: vol. 166. Jan. 2012. DOI:10.1007/978-1- 4614-0769-0_19.
David Woodruff. “Sketching as a Tool for Numerical Linear Algebra”. In: Foundations and Trends in Theoretical Computer Science 10 (Nov. 2014). DOI: 10.1561/0400000060.
Zhouhong Wang and Ya-xiang Yuan. “A subspace implementation of quasi-Newton trust region methods for unconstrained optimization”. In: Numerische Mathematik 104 (Aug. 2006), pp. 241–269. DOI: 10.1007/s00211-006-0021-6.
Zaiwen Wen and Yin Zhang. “Block algorithms with augmented Rayleigh-Ritz projections for large-scale eigenpair
computation”. In: (July 2015).
Zaiwen Wen and Yin Zhang. “Accelerating Convergence by Augmented Rayleigh–Ritz Projections For Large-Scale
Eigenpair Computation”. In: SIAM Journal on Matrix Analysis and Applications 38 (Jan. 2017), pp. 273–296. DOI:
10.1137/16M1058534.
Ya-xiang Yuan. “Subspace methods for large scale nonlinear equations and nonlinear least squares”. In:
Optimization and Engineering 10 (June 2009), pp. 207–218. DOI :10.1007/s11081-008-9064- 0.
References VII
Junyu Zhang, Zaiwen Wen, and Yin Zhang. “Subspace Methods with Local Refinements for Eigenvalue Computation Using Low-Rank Tensor-Train Format”. In: Journal of Scientific Computing 70 (July 2016). DOI: 10.1007/s10915-016-0255-0.