Subspace Methods: Dimension Balance between
Approximation to Optimization Problem and Solving
Subproblem
Pengcheng Xie
xpc@lsec.cc.ac.cn
Supervised by Prof. Ya-xiang Yuan
Institute of Computational Mathematics and Scientific/Engineering Computing
Academy of Mathematics and Systems Science
Chinese Academy of Sciences, China
Group Seminar
December 8, 2020
Pengcheng Xie (ICMSEC AMSS) Subspace methods December 8, 2020 1 /51
Solving image reconstruction in CT with PDFO²: failed
An inverse problem from [Chen et al. 2017]: find a best $x \in \mathbb{R}^n$ that satisfies
$$f(x) = y \;\Rightarrow\; \min_{x \in \mathbb{R}^n} \|f(x) - y\|_2^2,$$
where $f: \mathbb{R}^n \to \mathbb{R}^n$ and $y \in \mathbb{R}^n$; $x$ and $y$ are long vectors reshaped from a $512 \times 512$ matrix, so $n = 262144$.
We chose PDFO to solve it, but PDFO reported an error:
“uobyqa: problem too large for uobyqa. Try other solvers.”
Figure 1: Monochromatic image of the DE-472 lung phantoms
Sad: this problem cannot, and does not have to, be solved by DFO.
Happy: Tom’s words¹
¹ Tom M. Ragonneau: Ph.D. student at PolyU, supervised by Prof. Zaikun Zhang and co-supervised by Prof. Xiaojun Chen.
² Powell’s Derivative-Free Optimization solvers: https://www.pdfo.net
Tom’s words and Zaikun’s subspace method
Tom:
“In DFO, n = 100 is considered as a large problem, n = 200 is considered as a very large problem. I read once that NEWUOA has been tested with n = 1000, but this is incredibly huge.”
“Do you have any way to reduce the size of your problem, to find some kind of space (or lower dimension) in which your variables may belong (even approximately).”
[Zhang 2012]:
Solve the subproblem
$$\min_{d \in S_k} Q_k(x_k + d)$$
on the subspace
$$S_k = \mathrm{span}\{\nabla Q_k(x_k),\ d_{k-1},\ \bar{d}_k\},$$
where
$$\bar{d}_k = \sum_{y \in I_k} \frac{f(y) - f(x_k)}{\|y - x_k\|_2} \cdot \frac{y - x_k}{\|y - x_k\|_2} \approx \nabla f(x_k),$$
and $I_k$ is the interpolation point set.
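The averaged difference quotient $\bar{d}_k$ can be tried out numerically. The sketch below is only an illustration, not the implementation of [Zhang 2012]: the function name `dbar`, the linear test function, and the interpolation set (coordinate steps of length `h`) are assumptions made for this example. With orthonormal displacement directions the sum reduces to a forward-difference gradient; for a general point set it returns, to first order, the gradient multiplied by $\sum_y u_y u_y^\top$, where $u_y$ is the unit displacement from $x_k$ to $y$.

```python
import numpy as np

def dbar(f, xk, points):
    """Sum of difference quotients along the directions y - xk (hypothetical helper)."""
    fx = f(xk)
    d = np.zeros_like(xk, dtype=float)
    for y in points:
        step = y - xk
        dist = np.linalg.norm(step)
        d += (f(y) - fx) / dist * (step / dist)
    return d

# Assumed test data: a linear function whose gradient is (3, -2, 0.5).
f = lambda x: 3.0 * x[0] - 2.0 * x[1] + 0.5 * x[2]
xk = np.array([1.0, 2.0, 3.0])
h = 1e-3
points = [xk + h * e for e in np.eye(3)]   # one point per coordinate direction
approx_grad = dbar(f, xk, points)
print(approx_grad)
```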
Zaikun’s subspace method
Algorithm 1 NEWUOAs
1. Choose positive sequences $\{h_k\}$, $\{p_k\}$ and a constant $\varepsilon \ge 0$. Set the initial point $x_1$; $s_0 := 0$; $k := 1$.
2. Choose $m_k \in [n+1, \frac{(n+1)(n+2)}{2}]$ and call MODEL($x_k, h_k, m_k$) to get $\tilde{g}_k$, the approximation of the gradient at $x_k$. If $h_k < \varepsilon$ and $\|\tilde{g}_k\| < \varepsilon$, stop. Let
$$S_k = \mathrm{span}\{\tilde{g}_k, s_{k-1}\}.$$
3. Set RHOEND $= p_k$ and call NEWUOA to solve the subproblem
$$\min_{d \in S_k} f(x_k + d)$$
and get $d_k$.
4. If $f(x_k + d_k) < f(x_k)$, then $x_{k+1} := x_k + d_k$, $s_k := d_k$; otherwise, $x_{k+1} := x_k$, $s_k := s_{k-1}$. Set $k := k+1$ and go to step 2.
NEWUOA: dimension < 1000.
NEWUOAs: dimension = 2000.
Global convergence and R-linear convergence
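The four steps of Algorithm 1 can be sketched in a few lines. This is a simplified illustration under loud assumptions, not NEWUOAs itself: MODEL is replaced by a forward-difference gradient (`fd_gradient`), NEWUOA is replaced by a plain compass search on the two-dimensional reduced problem (`compass_search`), and the sequences $\{h_k\}$, $\{p_k\}$ are replaced by constants. All function names and the test function are hypothetical.

```python
import numpy as np

def fd_gradient(f, x, h):
    """Stand-in for MODEL: forward-difference gradient estimate (assumption)."""
    fx = f(x)
    return np.array([(f(x + h * e) - fx) / h for e in np.eye(x.size)])

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt basis of span(vectors); (near-)dependent vectors are dropped."""
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for b in basis:
            w -= (b @ v) * b
        if np.linalg.norm(w) > tol * max(1.0, np.linalg.norm(v)):
            basis.append(w / np.linalg.norm(w))
    return np.column_stack(basis)

def compass_search(phi, dim, rho_end, rho=1.0):
    """Stand-in for NEWUOA on the reduced problem: derivative-free compass search."""
    z = np.zeros(dim)
    fz = phi(z)
    while rho > rho_end:
        improved = False
        for step in rho * np.vstack([np.eye(dim), -np.eye(dim)]):
            val = phi(z + step)
            if val < fz:
                z, fz, improved = z + step, val, True
        if not improved:
            rho *= 0.5          # shrink the step only after a full failed poll
    return z

def subspace_dfo(f, x, iters=30, h=1e-6, eps=1e-8, rho_end=1e-8):
    s_prev = np.zeros_like(x)
    for _ in range(iters):
        g = fd_gradient(f, x, h)                      # step 2: approximate gradient
        if np.linalg.norm(g) < eps:
            break
        Q = orthonormal_basis([g, s_prev])            # S_k = span{g_k, s_{k-1}}
        z = compass_search(lambda c: f(x + Q @ c), Q.shape[1], rho_end)  # step 3
        d = Q @ z
        if f(x + d) < f(x):                           # step 4: accept only on decrease
            x, s_prev = x + d, d
    return x

# Assumed test problem: a separable convex quadratic with minimizer (1, ..., 1).
weights = np.arange(1.0, 11.0)
f = lambda x: float(weights @ (x - 1.0) ** 2)
x0 = np.zeros(10)
x_final = subspace_dfo(f, x0)
print(f(x0), f(x_final))
```

Working in an orthonormal basis of $S_k$ keeps the reduced problem well scaled; the original method works with NEWUOA's own trust-region machinery instead.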
Choosing $x_{k+1}$ from $x_k$ in the subproblem
Line search method
1. Generate a descent search direction $d_k$.
2. Search along this direction for a step size $\alpha_k$:
$$\min_{\alpha \in \mathbb{R}} \varphi_k(\alpha) = f(x_k + \alpha d_k).$$
3. $x_{k+1} = x_k + \alpha_k d_k$.
Direction: n-dimensional problem
Step size: 1-dimensional problem
Trust region method
1. Given a trust region radius, which acts like a step size.
2. Compute a search direction in the trust region:
$$\min_{s \in \mathbb{R}^n} Q_k(s) = g_k^\top s + \frac{1}{2} s^\top B_k s \quad \text{s.t. } \|s\|_2 \le \Delta_k.$$
3. $x_{k+1} = x_k + s_k$.
Radius: 1-dimensional problem
Direction: n-dimensional problem
Where is the intermediate-dimension problem? ($1 <$ intermediate $< n$)
Subspace methods
Why do we want to solve the intermediate-dimension problem?
Question:
There seems to be no need to deliberately produce an intermediate-dimension problem.
Answer:
  There is an imbalance between computing the direction and computing the step size³.
  Reduce the dimension.
  Gather more information.
  Special problems or needs.
[Conn et al. 1994]:
Require that $S_k$ contains at least two components:
  a gradient-related direction, to encourage global convergence;
  a Newton-related direction, to encourage fast asymptotic convergence.
Extension of the dogleg method:
$$\min_{d \in \mathbb{R}^n} m(d) \overset{\mathrm{def}}{=} f + g^T d + \frac{1}{2} d^T B d \quad \text{s.t. } \|d\| \le \Delta$$
⇓
$$\min_{d} m(d) = f + g^T d + \frac{1}{2} d^T B d \quad \text{s.t. } \|d\| \le \Delta,\ d \in \mathrm{span}\{g, B^{-1} g\}$$
³ Prof. Ya-xiang Yuan’s presentation at ICM 2014
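The restriction $d \in \mathrm{span}\{g, B^{-1}g\}$ turns the trust-region problem into a two-dimensional one. The sketch below assumes $B$ is symmetric positive definite (so there is no hard case) and solves the reduced problem by bisection on the multiplier $\lambda$; the function names `subspace_trs` and `model` and the test data are hypothetical, chosen only for illustration.

```python
import numpy as np

def subspace_trs(g, B, delta):
    """Minimize g^T d + 0.5 d^T B d over ||d|| <= delta, d in span{g, B^{-1} g}.
    Assumes B is symmetric positive definite."""
    newton = np.linalg.solve(B, g)
    Q, _ = np.linalg.qr(np.column_stack([g, newton]))   # orthonormal basis of the subspace
    gr, Br = Q.T @ g, Q.T @ B @ Q                       # reduced 2-D gradient and Hessian
    z = -np.linalg.solve(Br, gr)                        # unconstrained reduced minimizer
    if np.linalg.norm(z) > delta:
        # On the boundary: find lam >= 0 with ||(Br + lam I)^{-1} gr|| = delta.
        lo, hi = 0.0, np.linalg.norm(gr) / delta
        for _ in range(100):
            lam = 0.5 * (lo + hi)
            z = -np.linalg.solve(Br + lam * np.eye(2), gr)
            if np.linalg.norm(z) > delta:
                lo = lam
            else:
                hi = lam
        z = -np.linalg.solve(Br + hi * np.eye(2), gr)   # feasible end of the bracket
    return Q @ z

# Assumed test data.
B = np.diag([1.0, 2.0, 5.0, 10.0])
g = np.ones(4)
model = lambda d: g @ d + 0.5 * d @ B @ d

d_large = subspace_trs(g, B, 10.0)   # radius inactive: the Newton step is recovered
d_small = subspace_trs(g, B, 0.1)    # radius active: the step lies on the boundary
print(np.linalg.norm(d_large), np.linalg.norm(d_small))
```

Because the steepest descent direction lies in the subspace, the step is never worse than the Cauchy point, and because the Newton direction lies in it, the full Newton step is taken whenever it fits inside the region.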
Typical scenarios for designing subspace methods
[Liu, Wen and Yuan 2020]⁴:
Problem: $\min_x f(x)$ s.t. $x \in X$
Subproblem ($x_k \to x_{k+1}$): $\min_d m_k(x_k + d)$ s.t. $d \in D$
Find a linear combination of several known directions.
  Linear and nonlinear conjugate gradient methods [Sun and Yuan 2006; Nocedal and Wright 2006]
  Nesterov’s accelerated gradient method [Nesterov 2003; Nesterov 1983]
  Heavy-ball method [Polyak 1964]
  Momentum method [Goodfellow, Bengio, and Courville 2016]
Keep the objective function and constraints, but add an extra restriction in a certain subspace.
  OMP [Tropp and Gilbert 2008]
  CoSaMP [Needell and Tropp 2010]
  LOBPCG [Andrew 2001]
  LMSVD [Liu, Wen, and Zhang 2013]
⁴ Subspace Methods for Nonlinear Optimization: http://bicmr.pku.edu.cn/~wenzw/paper/SubOptv.pdf
Approximate the objective function but keep the constraints.
  BCD [Tseng and Yun 2009]
  RBR [Wen, Goldfarb, and Scheinberg 2012]
  Parallel subspace correction [Fornasier 2007; Fornasier and Schönlieb 2008]
Use subspace techniques to approximate the objective functions.
  Sampling/Sketching [Goodfellow, Bengio, and Courville 2016; Mahoney 2011]
  Nyström approximation [Tropp et al. 2017]
Approximate the objective function and design new constraints.
  Trust region methods with subspaces [Shultz, Schnabel, and Byrd 1985]
  FPC_AS [Wen et al. 2010]
Add a postprocess procedure after the subspace problem is solved.
  Truncated subspace method for tensor train [Zhang, Wen, and Zhang 2016]
Integrate the optimization method and subspace update in one framework.
  Polynomial-filtered subspace method for low-rank matrix optimization [Liu, Wen and Yuan 2020]
Subspace relationship
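$\dim(S_k) = \dim(S_{k+1})$: $S_k \approx S_{k+1}$
$\dim(S_k) \le \dim(S_{k+1})$: $S_k \subseteq S_{k+1}$
$\sum_{k=1}^{p} \dim(S_k) = n$: $S_1 + \cdots + S_p = \mathbb{R}^n$
$\dim(S_k) = i_k$: $I_k \to S_k$
$\dim(S_k) \ge \dim(S_{k+1})$: $S_k \supseteq S_{k+1}$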
Direction-Gradient subspaces
One-add-one-drop subspaces
Krylov subspaces
Nested subspaces
Complement subspaces
Subsampling/Sketching
Stochastic optimization
Active set methods (towards)
Direction-Gradient subspace method for $x \in \mathbb{R}^n$
Linear combination of several known directions
Conjugate gradient method:
$$d_k = -g_k + \beta_{k-1} d_{k-1}, \qquad S_k = \mathrm{span}\{x_{k-1}, g_k, d_{k-1}\}.$$
  Global convergence
  n-step local quadratic convergence
Nesterov’s accelerated gradient method (FISTA method) [Beck and Teboulle 2009; Nesterov 2003]:
$$y_k = x_{k-1} + \frac{k-2}{k+1}(x_{k-1} - x_{k-2}), \qquad x_k = y_k - \alpha_k \nabla f(y_k), \qquad S_k = \mathrm{span}\{x_{k-1}, x_{k-2}, \nabla f(y_k)\}.$$
  Gradient method: step size $\frac{1}{L}$, convergence rate $O(\frac{1}{k})$
  ⇓
  FISTA method: step size $\frac{1}{L}$, convergence rate $O(\frac{1}{k^2})$
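The gap between the $O(1/k)$ and $O(1/k^2)$ rates is easy to observe on a small example. The sketch below applies both updates with step size $1/L$ to an ill-conditioned two-dimensional quadratic; the matrix `A`, the starting point, the iteration count, and the function names are assumptions made for this illustration, and the convention $x_{-1} = x_0$ is used so that the first extrapolation step vanishes.

```python
import numpy as np

# Assumed test problem: f(x) = 0.5 x^T A x with Lipschitz constant L = 1.
A = np.diag([1.0, 1e-3])
L = 1.0
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

def gradient_method(x0, iters):
    x = x0.copy()
    for _ in range(iters):
        x = x - grad(x) / L
    return x

def accelerated_gradient(x0, iters):
    x_prev, x = x0.copy(), x0.copy()            # convention: x_{-1} = x_0
    for k in range(1, iters + 1):
        y = x + (k - 2) / (k + 1) * (x - x_prev)  # y_k from x_{k-1}, x_{k-2}
        x_prev, x = x, y - grad(y) / L            # x_k = y_k - (1/L) grad f(y_k)
    return x

x0 = np.array([1.0, 1.0])
f_gd = f(gradient_method(x0, 100))
f_ag = f(accelerated_gradient(x0, 100))
print(f_gd, f_ag)
```

Each accelerated iterate is a combination of $x_{k-1}$, $x_{k-2}$ and $\nabla f(y_k)$, which is exactly the three-dimensional subspace $S_k$ listed on the slide.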
Limited memory methods for eigenvalue computation
Find a $p$-dimensional eigenspace associated with the $p$ largest eigenvalues of $A$
$$\Leftrightarrow \max_{X \in \mathbb{R}^{n \times p}} \mathrm{tr}(X^\top A X), \quad \text{s.t. } X^\top X = I. \tag{1}$$
The first-order optimality conditions of (1):
$$AX = X\Lambda, \quad X^\top X = I,$$
where $\Lambda = X^\top A X \in \mathbb{R}^{p \times p}$ is the matrix of Lagrangian multipliers.
At each iteration, the methods solve a subspace trace maximization problem
$$Y = \arg\max_{X \in \mathbb{R}^{n \times p}} \{\mathrm{tr}(X^\top A X) : X^\top X = I,\ X \in S\}.$$
LOBPCG [Andrew 2001]: $S = \mathrm{span}\{X_{i-1}, X_i, A X_i\}$.
  No theory predicts the convergence speed accurately.
  Converges no slower than block steepest ascent at every step.
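The subspace trace maximization step is a Rayleigh-Ritz projection: orthonormalize a basis of $S$, solve the small projected eigenproblem, and keep the $p$ leading Ritz vectors. The sketch below iterates this with $S = \mathrm{span}\{X_{i-1}, X_i, AX_i\}$. It is an unpreconditioned illustration rather than a faithful LOBPCG implementation (which works with residuals and a preconditioner); the function names, the test matrix, and the iteration count are assumptions.

```python
import numpy as np

def rayleigh_ritz(A, S, p):
    """Solve max tr(X^T A X), X^T X = I, range(X) in range(S)."""
    Q, _ = np.linalg.qr(S)                    # orthonormal basis covering range(S)
    w, V = np.linalg.eigh(Q.T @ A @ Q)        # small projected eigenproblem
    return Q @ V[:, -p:], w[-p:]              # p leading Ritz vectors and values

def subspace_eig(A, X, iters=50):
    X_prev = X
    for _ in range(iters):
        S = np.column_stack([X_prev, X, A @ X])   # span{X_{i-1}, X_i, A X_i}
        X_new, vals = rayleigh_ritz(A, S, X.shape[1])
        X_prev, X = X, X_new
    return X, vals

# Assumed test matrix: three well separated leading eigenvalues 8, 9, 10.
rng = np.random.default_rng(0)
eigs = np.concatenate([np.linspace(0.0, 1.0, 47), [8.0, 9.0, 10.0]])
A = np.diag(eigs)
X0, _ = np.linalg.qr(rng.standard_normal((50, 3)))
X, vals = subspace_eig(A, X0)
print(vals)
```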
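Truncated subspace method for tensor train
[Zhang, Wen, and Zhang 2016]:
$x \in \mathbb{R}^n \to x \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$, e.g. $n \to O(10^{42})$.
Tensor cores $X_\mu \in \mathbb{R}^{r_{\mu-1} \times r_\mu \times n_\mu}$, fixed dimension $r_\mu$, constant $r$: TT rank.
Figure 2: TT format, $x_{i_1 i_2 \dots i_d} = X_1(i_1) X_2(i_2) \cdots X_d(i_d)$
Figure 3: $\mu$-BTT format, $X_{i_1, \dots, i_\mu, \dots, i_d;\, j} = X_1(i_1) \cdots X_{\mu, j}(i_\mu) \cdots X_d(i_d)$
Operator TT format: $A_{i_1 i_2 \cdots i_d,\, j_1 j_2 \cdots j_d} = A_1(i_1, j_1) A_2(i_2, j_2) \cdots A_d(i_d, j_d)$,
where $A_\mu(i_\mu, j_\mu) \in \mathbb{R}^{r_{\mu-1} \times r_\mu}$ for $i_\mu, j_\mu \in \{1, \dots, n_\mu\}$.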
Truncated subspace method for tensor train
The eigenvalue problem in the BTT format is
$$\min_{X \in \mathbb{R}^{n \times p}} \mathrm{tr}(X^\top A X), \quad \text{s.t. } X^\top X = I_p \text{ and } X \in T_{n,r,p}.$$
Subspaces: $S_k^T = \mathrm{span}\{P_T(AX_k), X_k, X_{k-1}\}$ or $S_k^T = \mathrm{span}\{X_k, P_T(R_k), P_T(P_k)\}$,
where $P_T(AX_k)$ is the truncation of $AX_k$ to $T_{n,r,p}$. The subspace problem in the BTT format is
$$Y_{k+1} := \arg\min_{X \in \mathbb{R}^{n \times p}} \mathrm{tr}(X^\top A X), \quad \text{s.t. } X^\top X = I_p,\ X \in S_k^T, \tag{2}$$
which is equivalent to a generalized eigenvalue decomposition problem:
$$\min_{V \in \mathbb{R}^{q \times p}} \mathrm{tr}(V^\top S^\top A S V), \quad \text{s.t. } V^\top S^\top S V = I_p.$$
We next project $Y_{k+1}$ to the required space $T_{n,r,p}$ as
$$X_{k+1} = \arg\min_{X \in \mathbb{R}^{n \times p}} \|X - Y_{k+1}\|_F^2, \quad \text{s.t. } X^\top X = I_p,\ X \in T_{n,r,p}.$$
This problem can be solved by using the alternating minimization scheme.
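Quasi-Newton methods
L-BFGS: matrix $B_k$ and inverse matrix $H_k$ [Sun and Yuan 2006; Nocedal and Wright 2006].
The search direction is $d_k = -B_k^{-1} g_k = -H_k g_k$. Both $B_k$ and $H_k$ can be written in a compact representation [Byrd, Nocedal, and Schnabel 1997].
Assume that there are $p$ pairs of vectors:
$$U_k = [s_{k-p}, \dots, s_{k-1}] \in \mathbb{R}^{n \times p}, \quad Y_k = [y_{k-p}, \dots, y_{k-1}] \in \mathbb{R}^{n \times p},$$
where $s_i = x_{i+1} - x_i$, $y_i = g_{i+1} - g_i$.
For a given initial matrix $H_k^{(0)}$, $H_k = H_k^{(0)} + C_k P_k C_k^\top$, where
$$C_k := [U_k,\ H_k^{(0)} Y_k] \in \mathbb{R}^{n \times 2p}, \quad D_k = \mathrm{diag}[s_{k-p}^\top y_{k-p}, \dots, s_{k-1}^\top y_{k-1}],$$
$$P_k := \begin{bmatrix} R_k^{-\top}(D_k + Y_k^\top H_k^{(0)} Y_k) R_k^{-1} & -R_k^{-\top} \\ -R_k^{-1} & 0 \end{bmatrix}, \quad (R_k)_{i,j} = \begin{cases} s_{k-p+i-1}^\top y_{k-p+j-1}, & \text{if } i \le j, \\ 0, & \text{otherwise.} \end{cases}$$
The initial matrix $H_k^{(0)}$ is $\gamma_k I$. Then
$$d_k \in \mathrm{span}\{g_k, s_{k-1}, \dots, s_{k-p}, y_{k-1}, \dots, y_{k-p}\}.$$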
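The subspace property of the L-BFGS direction can be checked numerically. The sketch below uses the standard two-loop recursion, which applies the same $H_k$ as the compact representation without forming any matrix, and then verifies by least squares that $d_k$ lies in the span of $g_k$, the $s_i$ and the $y_i$. The function name, the test data, and the scaling `gamma` are assumptions made for this example.

```python
import numpy as np

def lbfgs_direction(g, S, Y, gamma):
    """Two-loop recursion: returns -H g with H built from the pairs (s_i, y_i),
    stored oldest first, and initial matrix H0 = gamma * I."""
    q = np.array(g, dtype=float)
    alphas = []
    for s, y in zip(reversed(S), reversed(Y)):      # newest pair first
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q -= a * y
    r = gamma * q                                   # apply H0
    for s, y, a in zip(S, Y, reversed(alphas)):     # oldest pair first
        b = (y @ r) / (y @ s)
        r += (a - b) * s
    return -r

# Assumed test data: pairs generated from a convex quadratic, so s^T y > 0.
rng = np.random.default_rng(1)
A = np.diag(np.arange(1.0, 9.0))
S = [rng.standard_normal(8) for _ in range(3)]
Y = [A @ s for s in S]
g = rng.standard_normal(8)
gamma = (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1])

d = lbfgs_direction(g, S, Y, gamma)

# Residual of projecting d onto span{g, s_i, y_i}; zero up to rounding.
M = np.column_stack([g] + S + Y)
coef = np.linalg.lstsq(M, d, rcond=None)[0]
span_residual = np.linalg.norm(M @ coef - d)
print(span_residual)
```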
Limited memory methods for eigenvalue computation
$$\max_{X \in \mathbb{R}^{n \times p}} \mathrm{tr}(X^\top A X), \quad \text{s.t. } X^\top X = I. \tag{3}$$
The first-order optimality conditions of (3):
$$AX = X\Lambda, \quad X^\top X = I,$$
where $\Lambda = X^\top A X \in \mathbb{R}^{p \times p}$ is the matrix of Lagrangian multipliers.
At each iteration, the methods solve a subspace trace maximization problem
$$Y = \arg\max_{X \in \mathbb{R}^{n \times p}} \{\mathrm{tr}(X^\top A X) : X^\top X = I,\ X \in S\}.$$
LMSVD [Liu, Wen, and Zhang 2013]: $S = \mathrm{span}\{X_i, X_{i-1}, \dots, X_{i-t}\}$.
  Global convergence under reasonable assumptions.
Table 1: SSI vs LMSVD (pkn)
method                      SSI         LMSVD
total cost per iteration    $O(n+k)$    $O(k(1+p)^2)$
Augmented Rayleigh-Ritz method for eigenvalue computation
RR map $(Y, \Sigma) = \mathrm{RR}(A, Z)$ ⇔ the trace-maximization subproblem with $S = \mathcal{R}(Z)$.
The augmentation of the subspaces in LOBPCG and LMSVD is the main reason why they generally achieve faster convergence than the classic SSI.
ARR: for some integer $t \ge 0$, design a block Krylov subspace structure:
$$S = \mathrm{span}\{X, AX, A^2X, \dots, A^tX\}. \tag{4}$$
RR procedure using $(\hat{Y}, \hat{\Sigma}) = \mathrm{RR}(A, K_t)$, where $K_t = [X, AX, A^2X, \dots, A^tX]$.
The $p$ leading Ritz pairs $(Y, \Sigma)$ are extracted from $(\hat{Y}, \hat{\Sigma})$.
The analysis of ARR in [Wen and Zhang 2017; Wen and Zhang 2015]:
the convergence rate of SSI changes as
$$\frac{\lambda_{p+1}}{\lambda_p} \text{ for RR } (t = 0) \;\Rightarrow\; \frac{\lambda_{(t+1)p+1}}{\lambda_p} \text{ for ARR } (t > 0).$$
Trust region methods with subspace method
The trust region subproblem (TRS) is normally
$$\min_{s \in \mathbb{R}^n} Q_k(s) = g_k^\top s + \frac{1}{2} s^\top B_k s \quad \text{s.t. } \|s\|_2 \le \Delta_k, \tag{5}$$
where $B_k \approx \nabla^2 Q_k(x_k)$.
A subspace version of the trust region subproblem is suggested in [Shultz, Schnabel, and Byrd 1985]:
$$\min_{s \in \mathbb{R}^n} Q_k(s) \quad \text{s.t. } \|s\|_2 \le \Delta_k,\ s \in S_k. \tag{6}$$
  The Steihaug truncated CG method [Steihaug 1983]
  Dogleg method [Powell 1970]
Parallel computing refinement of trust region methods based on truncated CG method⁵
Table 2: Speedup ratio of the parallel refinement of the trust region method
dimension   np=2       np=4      np=6      np=8      np=10     np=12
10^2        1.68180    1.94154   2.36451   2.91613   3.43903   3.67575
10^3        0.920956   1.47545   1.55419   1.84841   2.43805   2.64320
10^4        1.79342    2.94063   3.86112   4.49823   4.94911   5.18691
5×10^4      1.87369    3.04962   3.94852   5.10126   6.29814   6.71970
10^5        1.89060    3.55094   5.17231   5.88022   6.52538   7.02531
Figure 4: Time versus number of processes in the parallel refinement
The TRS $\min_{s \in \mathbb{R}^n} Q_k(s)$, s.t. $\|s\|_2 \le \Delta_k$, is solved by truncated CG.
⁵ Homework for the Parallel Computing course taught by Prof. Tao Cui
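For reference, the truncated CG iteration of [Steihaug 1983] for the TRS can be sketched serially as follows. This is a minimal single-process version, not the parallel code behind Table 2; the function names, tolerance, and test matrices are assumptions made for illustration. The iteration runs CG on $B_k s = -g_k$ and stops at the trust region boundary as soon as a step would leave the region or a direction of nonpositive curvature appears.

```python
import numpy as np

def boundary_step(s, d, delta):
    """Positive root tau of ||s + tau d|| = delta (requires ||s|| < delta)."""
    a, b, c = d @ d, 2.0 * (s @ d), s @ s - delta ** 2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def steihaug_cg(g, B, delta, tol=1e-10):
    s = np.zeros(g.size)
    r = np.array(g, dtype=float)       # residual of B s + g
    d = -r
    if np.linalg.norm(r) < tol:
        return s
    for _ in range(g.size):
        Bd = B @ d
        curv = d @ Bd
        if curv <= 0.0:                                    # nonpositive curvature
            return s + boundary_step(s, d, delta) * d
        alpha = (r @ r) / curv
        if np.linalg.norm(s + alpha * d) >= delta:         # step leaves the region
            return s + boundary_step(s, d, delta) * d
        s = s + alpha * d
        r_new = r + alpha * Bd
        if np.linalg.norm(r_new) < tol:
            return s
        d = -r_new + (r_new @ r_new) / (r @ r) * d
        r = r_new
    return s

# Assumed test data.
B = np.diag(np.arange(1.0, 7.0))
g = np.ones(6)
s_interior = steihaug_cg(g, B, 100.0)          # large radius: plain CG solution
s_boundary = steihaug_cg(g, B, 0.5)            # small radius: stops on the boundary
B_indef = np.diag([1.0, -2.0])
s_negcurv = steihaug_cg(np.ones(2), B_indef, 1.0)   # negative curvature case
print(np.linalg.norm(s_interior), np.linalg.norm(s_boundary), np.linalg.norm(s_negcurv))
```

The iterates stay in the Krylov subspaces generated by $g_k$ and $B_k$, which is why the method fits the subspace framework of (6).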
Trust region methods with subspace methods
Theorem (Wang and Yuan 2006)
Suppose $B_1 = \sigma I$ with $\sigma > 0$, the matrix updating formula is any one chosen from PSB and the Broyden family (where the updates may be singular), and $B_k$ is the $k$-th updated matrix. Let $s_k$ be an optimal solution of TRS (5) and set $x_{k+1} = x_k + s_k$. Let $S_k = \mathrm{span}\{g_1, g_2, \dots, g_k\}$. Then $s_k \in S_k$, and for any $z \in S_k$, $u \in S_k^\perp$, it holds that
$$B_k z \in S_k, \quad B_k u = \sigma u.$$
  Subspace trust region quasi-Newton method for unconstrained optimization [Wang and Yuan 2006].
  Line search quasi-Newton methods [Gill and Leonard 1999; Gill and Leonard 2000].
  Subspace Powell–Yuan trust region method for equality constrained optimization [Grapiglia, Yuan, and Yuan 2013].
Coordinate descent methods
Algorithm 2 Coordinate Descent Algorithm
1: Input initial value $x^{(0)}$.
2: For $t = 0, 1, 2, \dots$
3:   Pick coordinate $i$ from $1, 2, \dots, n$,
$$x_i^{(t+1)} = \arg\min_{x_i \in \mathbb{R}} f(x_i, \omega_{-i}^{(t)}),$$
     where $\omega_{-i}^{(t)}$ represents all other coordinates. $S_i = \mathrm{span}\{x_i\}$
4: End.
  Converges slowly.
  Does not require calculation of the gradient $\nabla f_k$.
  Several algorithms, such as that of Hooke and Jeeves [Hooke and Jeeves 1961], are based on these ideas [Mackworth 1987; Ricketts 1982].
Block coordinate descent method (BCD) [Tseng 2001]
⇓
The alternating direction method of multipliers (ADMM) [Boyd et al. 2011]
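Algorithm 2 can be illustrated on a strictly convex quadratic $f(x) = \frac{1}{2} x^\top A x - b^\top x$, where the one-dimensional minimization in step 3 has a closed form. The sketch below uses a cyclic choice of coordinates; the test matrix, the number of sweeps, and the function name are assumptions. For a general objective the closed-form update would be replaced by a one-dimensional search.

```python
import numpy as np

def coordinate_descent(A, b, sweeps=100):
    """Cyclic coordinate descent for f(x) = 0.5 x^T A x - b^T x, A symmetric positive definite."""
    n = b.size
    x = np.zeros(n)
    for _ in range(sweeps):
        for i in range(n):
            # Exact minimizer over x_i with all other coordinates held fixed.
            x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return x

# Assumed test problem: a diagonally dominant tridiagonal matrix.
n = 6
A = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.arange(1.0, n + 1.0)
x_cd = coordinate_descent(A, b)
print(x_cd)
```

On this problem the iteration coincides with the Gauss-Seidel method for $Ax = b$, which makes the one-coordinate-at-a-time subspace structure explicit.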
Parallel line search subspace correction method
The optimization problem
$$\min_{x \in \mathbb{R}^n} \phi(x) := f(x) + h(x), \tag{7}$$
where $f(x)$ is differentiable and convex, and $h(x)$ is convex (possibly nonsmooth).
The $\ell_1$-regularized minimization (LASSO) [Tibshirani 1996] and the sparse logistic regression [Shevade and Keerthi 2003] are examples of (7).
$$\mathbb{R}^n = X_1 + X_2 + \cdots + X_p,$$
where $X_i = \{x \in \mathbb{R}^n \mid \mathrm{supp}(x) \subset J_i\}$, $1 \le i \le p$, such that $J := \{1, \dots, n\}$ and $J = \bigcup_{i=1}^{p} J_i$.
Let $\phi_k^{(i)}$ be a surrogate function of $\phi$ restricted to the $i$-th subspace at the $k$-th iteration.
The PSC framework for solving (7) is:
$$d_k^{(i)} = \arg\min_{d^{(i)} \in X_i} \phi_k^{(i)}(d^{(i)}), \quad i = 1, \dots, p, \qquad x_{k+1} = x_k + \sum_{i=1}^{p} \alpha_k^{(i)} d_k^{(i)}. \tag{8}$$
Convergence holds if $\sum_{i=1}^{p} \alpha_k^{(i)} \le 1$ and $\alpha_k^{(i)} > 0$ ($1 \le i \le p$).
Usually, $\alpha_k^{(i)}$ is quite small and convergence becomes slow.
A parallel subspace correction method with line search (PSCL) is proposed in [Dong et al. 2015]; it uses the Armijo backtracking line search to allow a large step size.
Subspace by subsampling/sketching
For a linear least squares problem on massive data sets:
$$\min_x \|Ax - b\|_2^2 \;\Rightarrow\; \min_x \|W(Ax - b)\|_2^2, \tag{9}$$
where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$.
The sketching technique chooses a matrix $W \in \mathbb{R}^{r \times m}$ with $r \ll m$ and formulates a reduced problem.
Each element of $W$ is sampled i.i.d. from a normal distribution with mean zero and variance $\frac{1}{r}$ [Mahoney 2011; Woodruff 2014].
Consider the system of nonlinear equations
$$F(x) = 0, \quad x \in \mathbb{R}^n \tag{10}$$
and the nonlinear least squares problem
$$\min_{x \in \mathbb{R}^n} \|F(x)\|_2^2,$$
where $F(x) = (F_1(x), F_2(x), \dots, F_m(x))^\top \in \mathbb{R}^m$. Consider $F_i(x) = 0$, $i \in I_k$.
More work has been done in [Yuan 2009].
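The reduction in (9) can be tried with a Gaussian sketch. In the sketch below the sizes `m`, `n`, `r`, the synthetic data, and the variable names are assumptions made for illustration; the entries of `W` are drawn i.i.d. from $N(0, 1/r)$ as described above. The script compares the residual of the sketched solution with the residual of the full least squares solution.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 5000, 10, 400                      # assumed sizes with r << m

# Synthetic data (assumption): a planted solution plus noise.
A = rng.standard_normal((m, n))
b = A @ np.ones(n) + 0.1 * rng.standard_normal(m)

# Gaussian sketching matrix with mean 0 and variance 1/r.
W = rng.standard_normal((r, m)) / np.sqrt(r)

x_full = np.linalg.lstsq(A, b, rcond=None)[0]            # m x n problem
x_sketch = np.linalg.lstsq(W @ A, W @ b, rcond=None)[0]  # reduced r x n problem

res_full = np.linalg.norm(A @ x_full - b)
res_sketch = np.linalg.norm(A @ x_sketch - b)
ratio = res_sketch / res_full
print(ratio)
```

The ratio can never drop below one, because `x_full` minimizes the original residual; how close it stays to one depends on how large `r` is relative to `n`.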
Subspace by coordinate directions
[Yuan 2014]:
For sparsity structures, let $g_k^{(i)}$ be the $i$-th component of the gradient $g_k$, with the components ordered by magnitude:
$$|g_k^{(i_1)}| \ge |g_k^{(i_2)}| \ge |g_k^{(i_3)}| \ge \cdots \ge |g_k^{(i_n)}|.$$
$\tau$-steepest coordinates subspace:
$$S_k = \mathrm{span}\{e^{(i_1)}, e^{(i_2)}, \dots, e^{(i_\tau)}\}.$$
The steepest descent direction in the subspace is a sufficient descent direction:
$$\min_{d \in S_k} \frac{d^\top g_k}{\|d\|_2 \|g_k\|_2} \le -\frac{\tau}{n}.$$
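The descent bound is easy to verify on a concrete vector. In the sketch below the gradient `g`, the value of `tau`, and the function name are assumptions made for illustration. The best direction in $S_k$ is the negative gradient restricted to the $\tau$ selected coordinates, and its cosine with $g_k$ equals $-\|g_S\|_2 / \|g_k\|_2$. Since the selected components carry at least a fraction $\tau/n$ of $\|g_k\|_2^2$, this cosine is at most $-\sqrt{\tau/n}$, which in turn is at most $-\tau/n$.

```python
import numpy as np

def steepest_coordinates(g, tau):
    """Indices of the tau largest |g_i| and the steepest descent direction in their span."""
    idx = np.argsort(-np.abs(g))[:tau]
    d = np.zeros_like(g)
    d[idx] = -g[idx]
    return idx, d

# Assumed gradient vector with two dominant components.
g = np.array([0.1, -3.0, 0.5, 2.0, -0.2, 0.05])
tau, n = 2, g.size
idx, d = steepest_coordinates(g, tau)
cosine = (d @ g) / (np.linalg.norm(d) * np.linalg.norm(g))
print(sorted(idx.tolist()), cosine)
```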
Stochastic methods
An empirical risk minimization problem is
$$\min_x f(x) := \frac{1}{N} \sum_{i=1}^{N} f_i(x).$$
Stochastic gradient method [Goodfellow, Bengio, and Courville 2016]: selects a uniformly random sample $s_k$ from $\{1, \dots, N\}$ and updates
$$x_{k+1} = x_k - \alpha_k \nabla f_{s_k}(x_k).$$
The mini-batch SGD method:
$$x_{k+1} = x_k - \frac{\alpha_k}{|I_k|} \sum_{s_k \in I_k} \nabla f_{s_k}(x_k).$$
The momentum method: $v_{k+1} = \mu_k v_k - \alpha_k \nabla f_{s_k}(x_k)$, $x_{k+1} = x_k + v_{k+1}$.
Stochastic second-order method: the subsampled Newton method:
$$\Big(\frac{1}{|I_k^H|} \sum_{i \in I_k^H} \nabla^2 f_i(x_k)\Big) d_k = -\frac{1}{|I_k|} \sum_{s_k \in I_k} \nabla f_{s_k}(x_k).$$
Active set methods for sparse optimization
The $\ell_1$-regularized minimization problem
$$\min_{x \in \mathbb{R}^n} \varphi_\mu(x) := \mu \|x\|_1 + f(x), \tag{11}$$
where $\mu > 0$ and $f: \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable.
FPC_AS [Wen et al. 2010], a two-stage active set algorithm.
Subspace optimization in the second stage:
For a given vector $x \in \mathbb{R}^n$,
$$A(x) := \{i \in \{1, \dots, n\} : |x^{(i)}| = 0\} \quad \text{and} \quad I(x) := \{i \in \{1, \dots, n\} : |x^{(i)}| > 0\}.$$
Then the smooth subproblem is essentially an unconstrained problem:
$$\min_x \mu\, \mathrm{sign}(x_k^{(I_k)})^\top x^{(I_k)} + f(x), \quad \text{s.t. } x^{(i)} = 0,\ i \in A(x_k). \tag{12}$$
If $|I(x_{k+1})| > m$, then do hard truncation. Solve the subspace optimization problem to obtain $x_{k+1}$.
Problem (12) can be solved by L-BFGS-B [Byrd et al. 1995].
The active set strategies have also been studied in [Solntsev, Nocedal, and Byrd 2014; Keskar et al. 2015].
Future work: relationship between subspaces in the iteration
Subspace is an evolution of the direction
  Conjugate direction method
  Conjugate subspace method
Definition
$p_0, p_1, \dots, p_l$ are conjugate with respect to the symmetric positive definite matrix $A$ if
$$p_i^T A p_j = 0 \quad \text{for all } i \ne j.$$
Only search in a subspace ONCE
Figure 5: Coordinate search method can make slow progress
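The "search only once" property of conjugate directions can be seen on a small quadratic $f(x) = \frac{1}{2} x^T A x - b^T x$. The sketch below builds $A$-conjugate directions from the coordinate vectors by Gram-Schmidt in the $A$-inner product and then performs one exact line search along each of them. The matrix, the right-hand side, and the function names are assumptions made for illustration.

```python
import numpy as np

def conjugate_directions(A):
    """A-conjugate directions obtained from e_1, ..., e_n by Gram-Schmidt in the A-inner product."""
    P = []
    for e in np.eye(A.shape[0]):
        p = e.copy()
        for q in P:
            p -= (q @ A @ e) / (q @ A @ q) * q
        P.append(p)
    return P

def search_each_direction_once(A, b, P):
    """One exact line search along each direction, starting from the origin."""
    x = np.zeros(b.size)
    for p in P:
        r = b - A @ x                       # negative gradient of the quadratic
        x = x + (p @ r) / (p @ A @ p) * p   # exact minimizer along p
    return x

# Assumed symmetric positive definite test matrix.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
P = conjugate_directions(A)
x_star = search_each_direction_once(A, b, P)
print(x_star)
```

With plain coordinate directions the same number of line searches would generally not reach the minimizer, which is the slow progress referred to in Figure 5.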
Future work: subspace methods in derivative free optimization
Main difference between Powell’s derivative free optimization and optimization with derivatives:
How to get the subproblem objective function $m_k(x)$.
$$\begin{aligned}
\alpha_0 + \alpha^\top y_1 + \tfrac{1}{2} y_1^\top H y_1 &= F(y_1) \\
\alpha_0 + \alpha^\top y_2 + \tfrac{1}{2} y_2^\top H y_2 &= F(y_2) \\
&\ \ \vdots \\
\alpha_0 + \alpha^\top y_k + \tfrac{1}{2} y_k^\top H y_k &= F(y_k).
\end{aligned}$$
Figure 6: Model function by interpolation
NEWUOA:
number of interpolation points: $\frac{(n+1)(n+2)}{2} \to 2n+1$
$$\min_{Q_k} \|\nabla^2 Q_k - \nabla^2 Q_{k-1}\|_F^2, \quad \text{s.t. } Q_k(y) = F(y),\ y \in Y_k.$$
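The interpolation conditions form a linear system in the coefficients $\alpha_0$, $\alpha$ and $H$. The sketch below solves it for the fully determined case with $\frac{(n+1)(n+2)}{2}$ points; it does not implement NEWUOA's least-Frobenius-norm update with $2n+1$ points. The point set (origin, $\pm e_i$, and $e_i + e_j$), the test function, and all function names are assumptions made for this example.

```python
import numpy as np
from itertools import combinations

def quad_basis(y):
    """Row of the interpolation system: [1, y, 0.5*y_i^2, y_i*y_j (i<j)]."""
    n = y.size
    squares = [0.5 * y[i] ** 2 for i in range(n)]
    cross = [y[i] * y[j] for i, j in combinations(range(n), 2)]
    return np.concatenate([[1.0], y, squares, cross])

def interpolation_points(n):
    """A poised set of (n+1)(n+2)/2 points: 0, +e_i, -e_i, e_i + e_j (assumed choice)."""
    I = np.eye(n)
    pairs = [I[i] + I[j] for i, j in combinations(range(n), 2)]
    return np.array([np.zeros(n)] + list(I) + list(-I) + pairs)

def fit_quadratic(F, points):
    """Solve the interpolation conditions m(y) = F(y) for the model coefficients."""
    M = np.array([quad_basis(y) for y in points])
    return np.linalg.solve(M, np.array([F(y) for y in points]))

def model(coef, y):
    return quad_basis(y) @ coef

# Assumed test function: a quadratic, so the model should reproduce it everywhere.
F = lambda y: 1.0 + 2.0 * y[0] - y[2] + y[0] * y[1] + 3.0 * y[2] ** 2
n = 3
points = interpolation_points(n)
coef = fit_quadratic(F, points)
y_new = np.array([0.3, -1.2, 0.7])
print(len(points), model(coef, y_new), F(y_new))
```

The number of points grows quadratically with $n$ (5151 for $n = 100$, against $2n + 1 = 201$), which is the cost that motivates both the minimum-norm update and the move to subspaces.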
Future work: subspace methods in manifold optimization
Riemannian steepest descent method [Udriste 1994]: $-\mathrm{grad}\, f(x)$.
  Robust global convergence
  Slow local convergence: linear
Riemannian Newton method [Luenberger 1972; Gabay 1982]: $-\mathrm{Hess}\, f(x)^{-1}\, \mathrm{grad}\, f(x)$.
  Fast local convergence: quadratic or even cubic
  Requires additional work for global convergence
Riemannian trust-region method [Absil, Baker, and Gallivan 2007]:
  find a solution $\eta = \arg\min_{\eta \in T_x M,\ \|\eta\| \le \Delta} m_x(\eta)$ and set $x_{\mathrm{next}} = R_x(\eta)$.
References
P.-A. Absil, Christopher Baker, and Kyle Gallivan. “Trust-Region Methods on Riemannian Manifolds”. In: Foundations of Computational Mathematics 7 (July 2007), pp. 303–330. DOI: 10.1007/s10208-005-0179-9.
Knyazev Andrew. “Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method”. In: (Nov. 2001).
Richard Byrd, Jorge Nocedal, and Robert Schnabel. “Representations Of Quasi-Newton Matrices And Their Use In Limited Memory Methods”. In: Mathematical Programming 63 (Aug. 1997). DOI: 10.1007/BF01582063.
Stephen Boyd et al. “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers”. In: Foundations and Trends in Machine Learning 3 (Jan. 2011), pp. 1–122. DOI: 10.1561/2200000016.
Amir Beck and Marc Teboulle. “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems”. In: SIAM J. Imaging Sciences 2 (Jan. 2009), pp. 183–202. DOI: 10.1137/080716542.
Richard H. Byrd et al. “A limited memory algorithm for bound constrained optimization”. English. In: SIAM Journal of Scientific Computing 16 (Sept. 1995), pp. 1190–1208. ISSN: 1064-8275. DOI: 10.1137/0916069.
Buxin Chen et al. “Image reconstruction and scan configurations enabled by optimization-based algorithms in multispectral CT”. In: Physics in Medicine and Biology 62 (Nov. 2017), pp. 8763–8793. DOI: 10.1088/1361-6560/aa8a4b.
A. R. Conn et al. On Iterated-Subspace Minimization Methods for Nonlinear Optimization. 1994.
Qian Dong et al. “A Parallel Line Search Subspace Correction Method for Composite Convex Optimization”. In: Journal of the Operations Research Society of China 3 (May 2015). DOI: 10.1007/s40305-015-0079-x.
Massimo Fornasier. “Domain decomposition methods for linear inverse problems with sparsity constraints”. In: Inverse Problems - INVERSE PROBL 23 (Dec. 2007). DOI: 10.1088/0266-5611/23/6/014.
Massimo Fornasier and Carola-Bibiane Schönlieb. “Subspace Correction Methods for Total Variation and $\ell_1$-Minimization”. In: SIAM Journal on Numerical Analysis 47 (Jan. 2008). DOI: 10.1137/070710779.
Daniel Gabay. “Minimizing a differentiable function over a differential manifold”. In: Journal of Optimization Theory and Applications 37 (June 1982). DOI: 10.1007/BF00934767.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. http://www.deeplearningbook.org. MIT Press, 2016.
Philip Gill and Michael Leonard. “Reduced-Hessian Quasi-Newton Methods For Unconstrained Optimization”. In: SIAM Journal on Optimization 12 (Mar. 2000). DOI: 10.1137/S1052623400307950.
Philip Gill and Michael Leonard. “Limited-Memory Reduced-Hessian Methods For Large-Scale Unconstrained Optimization”. In: SIAM J. Optim. 14 (Aug. 1999). DOI: 10.1137/S1052623497319973.
Geovani Grapiglia, Jin-Yun Yuan, and Ya-xiang Yuan. “A Subspace Version of the Powell–Yuan Trust-Region Algorithm for Equality Constrained Optimization”. In: Journal of the Operations Research Society of China 4 (Dec. 2013). DOI: 10.1007/s40305-013-0029-4.
Robert Hooke and T. A. Jeeves. “‘Direct Search’ Solution of Numerical and Statistical Problems”. In: J. ACM 8.2 (Apr. 1961), pp. 212–229. ISSN: 0004-5411. DOI: 10.1145/321062.321069. URL: https://doi.org/10.1145/321062.321069.
Nitish Keskar et al. “A Second-Order Method for Convex $\ell_1$-Regularized Optimization with Active Set Prediction”. In: Optimization Methods and Software 31 (May 2015). DOI: 10.1080/10556788.2016.1138222.
David Luenberger. “The Gradient Projection Method Along Geodesics”. In: Management Science 18 (July 1972), pp. 620–631. DOI: 10.1287/mnsc.18.11.620.
Xin Liu, Zaiwen Wen, and Yin Zhang. “Limited Memory Block Krylov Subspace Optimization for Computing Dominant Singular Value Decompositions”. In: SIAM Journal on Scientific Computing 35 (May 2013). DOI: 10.1137/120871328.
A.K. Mackworth. “John Wiley , Sons”. In: Encyclopedia of Artificial Intelligence (Jan. 1987), pp. 205–211.
Michael Mahoney. “Randomized Algorithms for Matrices and Data”. In: Computing Research Repository - CORR 3 (Apr. 2011). DOI: 10.1561/2200000035.
Y. Nesterov. “Introductory Lectures on Convex Optimization: A Basic Course”. In: Comput. Program. (Jan. 2003).
Yu Nesterov. “A method of solving a convex programming problem with convergence rate $O(1/k^2)$”. In: vol. 27. Jan. 1983, pp. 372–376.
Deanna Needell and Joel Tropp. “CoSaMP: Iterative Signal Recovery from Incomplete and Inaccurate Samples”. In: Communications of the ACM 53 (Dec. 2010). DOI: 10.1145/1859204.1859229.
Jorge Nocedal and Stephen Wright. Numerical Optimization. Jan. 2006. ISBN: 978-0-387-30303-1. DOI: 10.1007/978-0-387-40065-5.
Boris Polyak. “Some methods of speeding up the convergence of iteration methods”. In: Ussr Computational Mathematics and Mathematical Physics 4 (Dec. 1964), pp. 1–17. DOI: 10.1016/0041-5553(64)90137-5.
M. J. D. Powell. “A Hybrid Method for Nonlinear Equations”. In: Numerical Methods for Nonlinear Algebraic Equations. Ed. by P. Rabinowitz. Gordon and Breach, 1970.
R. E. Ricketts. “Practical optimization, Philip E. Gill, Walter Murray and Margret H. Wright, Academic Press Inc. (London) Limited, 1981. No. of pages: 401. Price 19.20, 46.50. ISBN: 0.12.283950.1”. In: International Journal for Numerical Methods in Engineering 18.6 (1982), pp. 954–954. DOI: https://doi.org/10.1002/nme.1620180612. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/nme.1620180612. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/nme.1620180612.
S Shevade and S Keerthi. “A Simple and Efficient Algorithm for Gene Selection Using Sparse Logistic Regression”. In: Bioinformatics (Oxford, England) 19 (Dec. 2003), pp. 2246–2253. DOI: 10.1093/bioinformatics/btg308.
Stefan Solntsev, Jorge Nocedal, and Richard Byrd. “An Algorithm for Quadratic $\ell_1$-Regularized Optimization with a Flexible Active-Set Strategy”. In: Optimization Methods and Software 30 (Dec. 2014). DOI: 10.1080/10556788.2015.1028062.
Gerald Shultz, Robert Schnabel, and Richard Byrd. “A Family of Trust-Region-Based Algorithms for Unconstrained Minimization with Strong Global Convergence Properties”. In: Siam Journal on Numerical Analysis - SIAM J NUMER ANAL 22 (Feb. 1985), pp. 47–67. DOI: 10.1137/0722003.
Trond Steihaug. “The Conjugate Gradient Method and Trust Regions in Large Scale Optimization”. In: Siam Journal on Numerical Analysis - SIAM J NUMER ANAL 20 (June 1983), pp. 626–637. DOI: 10.1137/0720042.
Wenyu Sun and Ya-xiang Yuan. “Optimization theory and methods. Nonlinear programming”. In: 1 (Jan. 2006). DOI: 10.1007/b106451.
Joel Tropp and Anna Gilbert. “Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit”. In: Information Theory, IEEE Transactions on 53 (Jan. 2008), pp. 4655–4666. DOI: 10.1109/TIT.2007.909108.
Robert Tibshirani. “Regression Shrinkage and Selection Via the Lasso”. In: Journal of the Royal Statistical Society: Series B (Methodological) 58 (Jan. 1996), pp. 267–288. DOI: 10.1111/j.2517-6161.1996.tb02080.x.
Joel Tropp et al. “Fixed-Rank Approximation of a Positive-Semidefinite Matrix from Streaming Data”. In: (June 2017).
P. Tseng. “Convergence of a Block Coordinate Descent Method for Nondifferentiable Minimization”. In: Journal of Optimization Theory and Applications 109 (Jan. 2001), pp. 475–494. DOI: 10.1023/A:1017501703105.
Paul Tseng and Sangwoon Yun. “A Coordinate Gradient Descent Method for Nonsmooth Separable Minimization”. In: Math. Program. 117 (Mar. 2009), pp. 387–423. DOI: 10.1007/s10107-007-0170-0.
Constantin Udriste. Convex Functions and Optimization Methods on Riemannian Manifolds. Jan. 1994. DOI: 10.1007/978-94-015-8390-9.
Zaiwen Wen et al. “A Fast Algorithm for Sparse Reconstruction Based on Shrinkage, Subspace Optimization, and Continuation”. In: SIAM J. Scientific Computing 32 (Jan. 2010), pp. 1832–1857. DOI: 10.1137/090747695.
Zaiwen Wen, Donald Goldfarb, and Katya Scheinberg. “Block Coordinate Descent Methods for Semidefinite Programming”. In: vol. 166. Jan. 2012. DOI: 10.1007/978-1-4614-0769-0_19.
David Woodruff. “Sketching as a Tool for Numerical Linear Algebra”. In: Foundations and Trends in Theoretical Computer Science 10 (Nov. 2014). DOI: 10.1561/0400000060.
Zhouhong Wang and Ya-xiang Yuan. “A subspace implementation of quasi-Newton trust region methods for unconstrained optimization”. In: Numerische Mathematik 104 (Aug. 2006), pp. 241–269. DOI: 10.1007/s00211-006-0021-6.
Zaiwen Wen and Yin Zhang. “Block algorithms with augmented Rayleigh-Ritz projections for large-scale eigenpair computation”. In: (July 2015).
Zaiwen Wen and Yin Zhang. “Accelerating Convergence by Augmented Rayleigh–Ritz Projections For Large-Scale Eigenpair Computation”. In: SIAM Journal on Matrix Analysis and Applications 38 (Jan. 2017), pp. 273–296. DOI: 10.1137/16M1058534.
Ya-xiang Yuan. “Subspace methods for large scale nonlinear equations and nonlinear least squares”. In: Optimization and Engineering 10 (June 2009), pp. 207–218. DOI: 10.1007/s11081-008-9064-0.
Junyu Zhang, Zaiwen Wen, and Yin Zhang. “Subspace Methods with Local Refinements for Eigenvalue Computation Using Low-Rank Tensor-Train Format”. In: Journal of Scientific Computing 70 (July 2016). DOI: 10.1007/s10915-016-0255-0.