ISSN 0249-6399 ISRN INRIA/RR--9477--FR+ENG
RESEARCH REPORT N° 9477
July 2022
Project-Teams IDEFIX
Convergence analysis of multi-step one-shot methods for linear inverse problems
Marcella Bonazzoli, Houssem Haddar, Tuan Anh Vu
RESEARCH CENTRE
SACLAY – ÎLE-DE-FRANCE
1 rue Honoré d’Estienne d’Orves
Bâtiment Alan Turing
Campus de l’École Polytechnique
91120 Palaiseau
Convergence analysis of multi-step one-shot methods for linear inverse problems
Marcella Bonazzoli∗, Houssem Haddar∗, Tuan Anh Vu∗
Project-Teams IDEFIX
Research Report n°9477 — July 2022 — 55 pages
Abstract: In this work we are interested in general linear inverse problems where the corresponding forward problem is solved iteratively using fixed point methods. Then one-shot methods, which iterate at the same time on the forward problem solution and on the inverse problem unknown, can be applied. We analyze two variants of the so-called multi-step one-shot methods and establish sufficient conditions on the descent step for their convergence, by studying the eigenvalues of the block matrix of the coupled iterations. Several numerical experiments are provided to illustrate the convergence of these methods in comparison with the usual and shifted gradient descent methods. In particular, we observe that very few inner iterations on the forward problem are enough to guarantee good convergence of the inversion algorithm.
Key-words: inverse problems, one-shot methods, convergence analysis, parameter identification
∗Inria, UMA, ENSTA Paris, Institut Polytechnique de Paris
Convergence analysis for multi-step one-shot inversion methods

Résumé: In this work we are interested in general linear inverse problems where the corresponding forward problem is solved iteratively using fixed point methods. Thus, one-shot methods, which iterate at the same time on the forward problem solution and on the inverse problem unknown, can be applied. We consider two variants of multi-step one-shot methods and establish sufficient conditions on the descent step for their convergence, by studying the eigenvalues of the block matrix of the coupled iterations. Several numerical tests are presented to illustrate the convergence of these methods in comparison with the usual and shifted gradient descent methods. In particular, we observe that very few inner iterations on the forward problem are enough to guarantee good convergence of the inversion algorithm.

Mots-clés: inverse problems, one-shot methods, convergence analysis, parameter identification
Contents
1 Introduction 4
2 Multi-step one-shot inversion methods 5
3 Convergence of one-step one-shot methods (k = 1) 7
  3.1 Block iteration matrices and eigenvalue equations 7
  3.2 Real eigenvalues 10
  3.3 Complex eigenvalues 11
  3.4 Final result (k = 1) 15
4 Convergence of multi-step one-shot methods (k ≥ 2) 16
  4.1 Block iteration matrices and eigenvalue equations 16
  4.2 Real eigenvalues 18
  4.3 Complex eigenvalues 20
  4.4 Final result (k ≥ 2) 26
5 Inverse problem with complex forward problem and real parameter 26
6 Numerical experiments 28
7 Conclusion 33
A Some useful lemmas 36
B Descent step for usual and shifted gradient descent 42
C Convergence study for the scalar case 44
  C.1 Notations and preliminary calculation 44
  C.2 Necessary and sufficient conditions for convergence 45
    C.2.1 Descent step for the usual gradient descent 45
    C.2.2 Descent step for the shifted gradient descent 45
    C.2.3 Descent step for k-step one-shot 46
    C.2.4 Descent step for shifted k-step one-shot 49
  C.3 Comparison of the bounds for the descent step 52
D A proof of Lemma C.1 based on Marden's works 52
1 Introduction
For large-scale inverse problems, which often arise in real-life applications, the solution of the
corresponding forward and adjoint problems is generally computed using an iterative solver, such
as (preconditioned) fixed point or Krylov subspace methods. Indeed, the corresponding linear
systems could be too large to be handled with direct solvers (e.g. LU-type solvers), and iterative
solvers are easier to parallelize on many cores. Naturally this leads to the idea of one-step
one-shot methods, which iterate at the same time on the forward problem solution (the state
variable), the adjoint problem solution (the adjoint state) and on the inverse problem unknown
(the parameter or design variable). If two or more inner iterations are performed on the state
and adjoint state before updating the parameter (by starting from the previous iterates as initial
guess for the state and adjoint state), we speak of multi-step one-shot methods. Our goal is to
rigorously analyze the convergence of such inversion methods. In particular, we are interested
in those schemes where the inner iterations on the direct and adjoint problems are incomplete,
i.e. stopped before achieving convergence. Indeed, solving the forward and adjoint problems
exactly by direct solvers or very accurately by iterative solvers could be very time-consuming
with little improvement in the accuracy of the inverse problem solution.
The concept of one-shot methods was first introduced by Ta’asan [22] for optimal control
problems. Based on this idea, a variety of related methods, such as the all-at-once methods, where
the state equation is included in the misfit functional, were developed for aerodynamic shape
optimization, see for instance [23,21,11,19,18] and the literature review in the introduction
of [19]. All-at-once approaches to inverse problems for parameter identification were studied
in, e.g., [8,2,15]. An alternative method, called Wavefield Reconstruction Inversion (WRI),
was introduced for seismic imaging in [25], as an improvement of the classical Full Waveform
Inversion (FWI) [24]. WRI is a penalty method which combines the advantages of the all-at-once
approach with those of the reduced approach (where the state equation represents a constraint
and is enforced at each iteration, as in FWI), and was extended to more general inverse problems
in [26].
Few convergence proofs, especially for the multi-step one-shot methods, are available in the literature. In particular, for non-linear design optimization problems, Griewank [6] proposed a version
of one-step one-shot methods where a Hessian-based preconditioner is used in the design variable
iteration. The author proved conditions to ensure that the real eigenvalues of the Jacobian of the
coupled iterations are smaller than 1, but these are just necessary and not sufficient conditions
to exclude real eigenvalues smaller than −1. In addition, no condition to also bound complex eigenvalues below 1 in modulus was found, and multi-step methods were not investigated. In [9,10,4] an exact penalty function of doubly augmented Lagrangian type was introduced to coordinate the coupled iterations, and global convergence of the proposed optimization approach was
proved under some assumptions. In [7] this particular one-step one-shot approach was extended
to time-dependent problems.
In this work, we consider two variants of multi-step one-shot methods where the forward
and adjoint problems are solved using fixed point methods and the inverse problem is solved
using gradient descent methods. This is a preparatory work where we focus on (discretized)
linear inverse problems. Note that the present analysis in the linear case implies also local
convergence in the non-linear case. The only basic assumptions we require are the inverse problem
uniqueness and the convergence of the fixed point iteration for the forward problem. To analyze
the convergence of the coupled iterations we study the real and complex eigenvalues of the
block iteration matrices. We prove that if the descent step is small enough then the considered
multi-step one-shot methods converge. Moreover, the upper bounds for the descent step in
these sufficient conditions are explicit in the number of inner iterations and in the norms of
the operators involved in the problem. In the particular scalar case (Appendix C), we establish
sufficient and also necessary convergence conditions on the descent step.
This paper is structured as follows. In Section 2, we introduce the principle of multi-step
one-shot methods and define two variants of these algorithms. Then, in Section 3, respectively
Section 4, we analyze the convergence of one-step one-shot methods, respectively multi-step
one-shot methods: first, we establish eigenvalue equations for the block matrices of the coupled
iterations, then we derive sufficient convergence conditions on the descent step by studying both
real and complex eigenvalues. In Section 5 we show that the previous analysis can be extended to the case where the state variable is complex. Finally, in Section 6 we test numerically the
performance of the different algorithms on a toy 2D Helmholtz inverse problem.
Throughout this work, $\langle\cdot,\cdot\rangle$ denotes the usual Hermitian scalar product in $\mathbb{C}^n$, that is, $\langle x,y\rangle := \bar{y}^{\top}x$ for all $x,y\in\mathbb{C}^n$, and $\|\cdot\|$ the vector/matrix norms induced by $\langle\cdot,\cdot\rangle$. We denote by $A^* = \bar{A}^{\top}$ the adjoint operator of a matrix $A\in\mathbb{C}^{m\times n}$, and likewise by $z^* = \bar{z}$ the conjugate of a complex number $z$. The identity matrix is always denoted by $I$; its size is understood from context. Finally, for a matrix $T\in\mathbb{C}^{n\times n}$ with $\rho(T)<1$, we define
$$ s(T) := \sup_{z\in\mathbb{C},\,|z|\ge 1}\left\|\left(I-\frac{T}{z}\right)^{-1}\right\|, $$
which is further studied in Appendix A.
2 Multi-step one-shot inversion methods
We focus on (discretized) linear inverse problems, which correspond to a direct (or forward) problem of the form: find $u\equiv u(\sigma)$ such that
$$ u = Bu + M\sigma + F \tag{1} $$
where $u\in\mathbb{R}^{n_u}$, $\sigma\in\mathbb{R}^{n_\sigma}$, $B\in\mathbb{R}^{n_u\times n_u}$, $M\in\mathbb{R}^{n_u\times n_\sigma}$ and $F\in\mathbb{R}^{n_u}$. Here $I-B$ is the invertible matrix of the direct problem, obtained after discretization, with parameter $\sigma$. Note that in the non-linear case $B$ would be a function of $\sigma$. Equation (1) is also called state equation and $u$ is called state. Given $\sigma$, we can solve for $u$ by a fixed point iteration
$$ u^{\ell+1} = Bu^{\ell} + M\sigma + F, \quad \ell = 0,1,\dots, \tag{2} $$
which converges for any initial guess $u^0$ if and only if the spectral radius $\rho(B)$ is strictly less than $1$ (see e.g. [5, Theorem 2.1.1]). Hence we assume $\rho(B)<1$. Now, we measure $f = Hu(\sigma)$, where $H\in\mathbb{R}^{n_f\times n_u}$, and we are interested in the linear inverse problem of finding $\sigma$ from $f$. In order to guarantee the uniqueness of the inverse problem, we assume that $H(I-B)^{-1}M$ is injective. In summary, we set
$$ \text{direct problem: } u = Bu + M\sigma + F, \qquad \text{inverse problem: measure } f = Hu(\sigma),\ \text{find } \sigma, \tag{3} $$
with the assumptions:
$$ \rho(B)<1, \qquad H(I-B)^{-1}M \text{ is injective.} \tag{4} $$
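To fix ideas, the state equation and its fixed point iteration (2) can be sketched in a few lines; the matrices below are randomly generated for illustration only (they are not the operators of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
nu, nsigma = 6, 3

# Random data with ||B||_2 = 0.5, so that rho(B) <= ||B||_2 < 1.
B = rng.standard_normal((nu, nu))
B *= 0.5 / np.linalg.norm(B, 2)
M = rng.standard_normal((nu, nsigma))
F = rng.standard_normal(nu)
sigma = rng.standard_normal(nsigma)

# Fixed point iteration (2): u^{l+1} = B u^l + M sigma + F.
u = np.zeros(nu)
for _ in range(200):
    u = B @ u + M @ sigma + F

# Direct solve of the state equation (1): (I - B) u = M sigma + F.
u_direct = np.linalg.solve(np.eye(nu) - B, M @ sigma + F)
assert np.allclose(u, u_direct)
```

Since $\|B\|_2 = 0.5$, the iteration error contracts at least by a factor $0.5$ per step, so 200 iterations reach machine precision.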
To solve the inverse problem we write its least squares formulation: given $\sigma_{ex}$ the exact solution of the inverse problem and $f := Hu(\sigma_{ex})$,
$$ \sigma_{ex} = \operatorname*{argmin}_{\sigma\in\mathbb{R}^{n_\sigma}} J(\sigma), \quad \text{where } J(\sigma) := \frac{1}{2}\|Hu(\sigma)-f\|^2. $$
Using the classical Lagrangian technique with real scalar products, we introduce the adjoint state $p\equiv p(\sigma)$, which is the solution of
$$ p = B^*p + H^*(Hu - f) $$
and allows us to compute the gradient of the cost functional
$$ \nabla J(\sigma) = M^*p(\sigma). $$
The classical gradient descent algorithm then reads

usual gradient descent:
$$ \begin{cases} \sigma^{n+1} = \sigma^n - \tau M^*p^n, \\ u^n = Bu^n + M\sigma^n + F, \\ p^n = B^*p^n + H^*(Hu^n - f), \end{cases} \tag{5} $$
where $\tau>0$ is the descent step size, and the state and adjoint state equations are solved exactly by a direct solver. Here $\sigma^{n+1} = \sigma^n - \tau\nabla J(\sigma^n)$; if instead we update $\sigma^{n+1} = \sigma^n - \tau\nabla J(\sigma^{n-1})$, we obtain the

shifted gradient descent:
$$ \begin{cases} \sigma^{n+1} = \sigma^n - \tau M^*p^n, \\ u^{n+1} = Bu^{n+1} + M\sigma^n + F, \\ p^{n+1} = B^*p^{n+1} + H^*(Hu^{n+1} - f). \end{cases} \tag{6} $$
Both algorithms converge for sufficiently small $\tau$ (see e.g. Appendix B): for any initial guess, (5) converges if
$$ \tau < \frac{2}{\|H(I-B)^{-1}M\|^2}, \tag{7} $$
and (6) converges if
$$ \tau < \frac{1}{\|H(I-B)^{-1}M\|^2}. \tag{8} $$
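As an illustration, the usual gradient descent (5) with exact state and adjoint solves can be sketched as follows; the small matrices are our own choice, picked only so that assumptions (4) hold, and the step is taken within bound (7):

```python
import numpy as np

# Small illustrative problem (these matrices are ours, not the paper's example).
B = np.array([[0.5, 0.1], [0.0, 0.4]])   # ||B||_2 < 1
M = np.eye(2)
H = np.eye(2)
F = np.array([1.0, -1.0])
I = np.eye(2)

sigma_ex = np.array([0.3, -0.7])
f = H @ np.linalg.solve(I - B, M @ sigma_ex + F)   # synthetic data f = H u(sigma_ex)

S = H @ np.linalg.solve(I - B, M)                  # S = H (I - B)^{-1} M
tau = 1.9 / np.linalg.norm(S, 2) ** 2              # satisfies bound (7): tau < 2 / ||S||^2

sigma = np.zeros(2)
for _ in range(500):
    u = np.linalg.solve(I - B, M @ sigma + F)           # exact state solve
    p = np.linalg.solve(I - B.T, H.T @ (H @ u - f))     # exact adjoint solve
    sigma = sigma - tau * M.T @ p                       # grad J(sigma) = M^* p(sigma)

assert np.allclose(sigma, sigma_ex, atol=1e-8)
```

Since $J$ is quadratic with Hessian $S^*S$, any $\tau < 2/\|S\|^2$ makes the iteration contract toward $\sigma_{ex}$.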
Here, we are interested in methods where the direct and adjoint problems are rather solved iteratively as in (2), and where we iterate at the same time on the forward problem solution and the inverse problem unknown: such methods are called one-shot methods. More precisely, we are interested in two variants of multi-step one-shot methods, defined as follows. Let $n$ be the index of the (outer) iteration on $\sigma$, the solution to the inverse problem. We update $\sigma^{n+1} = \sigma^n - \tau M^*p^n$ as in gradient descent methods, but the state and adjoint state equations are now solved by a fixed point iteration method, using just $k$ inner iterations, and coupled:
$$ \begin{cases} u^{n+1}_{\ell+1} = Bu^{n+1}_{\ell} + M\sigma + F, \\ p^{n+1}_{\ell+1} = B^*p^{n+1}_{\ell} + H^*(Hu^{n+1}_{\ell} - f), \end{cases} \quad \ell = 0,1,\dots,k-1, \qquad \begin{cases} u^{n+1} = u^{n+1}_{k}, \\ p^{n+1} = p^{n+1}_{k}, \end{cases} $$
where $\sigma$ depends on the considered variant ($\sigma = \sigma^{n+1}$ or, for the shifted methods, $\sigma = \sigma^n$). As initial guess we naturally choose $u^{n+1}_{0} = u^n$ and $p^{n+1}_{0} = p^n$, the information from the previous (outer) step. In summary, we have two multi-step one-shot algorithms
k-step one-shot:
$$ \begin{cases} \sigma^{n+1} = \sigma^n - \tau M^*p^n, \\ u^{n+1}_{0} = u^n, \quad p^{n+1}_{0} = p^n, \\ u^{n+1}_{\ell+1} = Bu^{n+1}_{\ell} + M\sigma^{n+1} + F, \\ p^{n+1}_{\ell+1} = B^*p^{n+1}_{\ell} + H^*(Hu^{n+1}_{\ell} - f), \\ u^{n+1} = u^{n+1}_{k}, \quad p^{n+1} = p^{n+1}_{k}, \end{cases} \tag{9} $$
and

shifted k-step one-shot:
$$ \begin{cases} \sigma^{n+1} = \sigma^n - \tau M^*p^n, \\ u^{n+1}_{0} = u^n, \quad p^{n+1}_{0} = p^n, \\ u^{n+1}_{\ell+1} = Bu^{n+1}_{\ell} + M\sigma^{n} + F, \\ p^{n+1}_{\ell+1} = B^*p^{n+1}_{\ell} + H^*(Hu^{n+1}_{\ell} - f), \\ u^{n+1} = u^{n+1}_{k}, \quad p^{n+1} = p^{n+1}_{k}, \end{cases} \tag{10} $$
and in particular, when $k=1$, we obtain the following two algorithms

one-step one-shot:
$$ \begin{cases} \sigma^{n+1} = \sigma^n - \tau M^*p^n, \\ u^{n+1} = Bu^n + M\sigma^{n+1} + F, \\ p^{n+1} = B^*p^n + H^*(Hu^n - f), \end{cases} \tag{11} $$
and

shifted one-step one-shot:
$$ \begin{cases} \sigma^{n+1} = \sigma^n - \tau M^*p^n, \\ u^{n+1} = Bu^n + M\sigma^{n} + F, \\ p^{n+1} = B^*p^n + H^*(Hu^n - f). \end{cases} \tag{12} $$
The only difference for the shifted versions lies in the fact that $\sigma^n$ is used in (10) and (12), instead of $\sigma^{n+1}$ in (9) and (11), so that in (9) and (11) we need to wait for $\sigma$ before updating $u$ and $p$, while in (10) and (12) we can update $\sigma, u, p$ at the same time. Also note that when $k\to\infty$, the k-step one-shot method (9) formally converges to the usual gradient descent (5), while the shifted k-step one-shot method (10) formally converges to the shifted gradient descent (6).
We first analyze the one-step one-shot methods ($k=1$) in Section 3 and then the multi-step one-shot methods ($k\ge2$) in Section 4.
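For concreteness, here is a minimal sketch of the shifted k-step one-shot method (10) on a small illustrative problem (the matrices and the step value are assumptions of this sketch, not data from the paper); note that σ, u, p are indeed updated simultaneously:

```python
import numpy as np

# Small illustrative problem (our choice, not the paper's Helmholtz example).
B = np.array([[0.5, 0.1], [0.0, 0.4]])   # ||B||_2 < 1
M = np.eye(2)
H = np.eye(2)
F = np.array([1.0, -1.0])
I2 = np.eye(2)

sigma_ex = np.array([0.3, -0.7])
f = H @ np.linalg.solve(I2 - B, M @ sigma_ex + F)   # synthetic data f = H u(sigma_ex)

k = 3         # inner fixed point iterations per outer step
tau = 0.001   # small descent step (sufficient-condition regime)

sigma = np.zeros(2)
u = np.zeros(2)
p = np.zeros(2)
for _ in range(10_000):
    sigma_new = sigma - tau * M.T @ p   # outer update uses p^n
    for _ in range(k):                  # k coupled inner iterations, with sigma^n (shifted variant)
        u, p = B @ u + M @ sigma + F, B.T @ p + H.T @ (H @ u - f)
    sigma = sigma_new

assert np.allclose(sigma, sigma_ex, atol=1e-6)
```

The tuple assignment inside the inner loop uses the old $u^{n+1}_{\ell}$ on the right-hand side of the $p$ update, as required by the coupled iterations.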
3 Convergence of one-step one-shot methods (k = 1)
3.1 Block iteration matrices and eigenvalue equations
To analyze the convergence of these methods, first we express $(\sigma^{n+1}, u^{n+1}, p^{n+1})$ in terms of $(\sigma^n, u^n, p^n)$, by inserting the expression for $\sigma^{n+1}$ into the iteration for $u^{n+1}$ in (11), so that system (11) is rewritten as
$$ \begin{cases} \sigma^{n+1} = \sigma^n - \tau M^*p^n \\ u^{n+1} = Bu^n + M\sigma^n - \tau MM^*p^n + F \\ p^{n+1} = B^*p^n + H^*Hu^n - H^*f. \end{cases} \tag{13} $$
System (12) is already in the form we need. In what follows we first study the shifted 1-step one-shot method, then the 1-step one-shot method.
Now, we consider the errors $(\sigma^n - \sigma_{ex},\, u^n - u(\sigma_{ex}),\, p^n - p(\sigma_{ex}))$ with respect to the exact solution at the $n$-th iteration, and, by abuse of notation, we designate them by $(\sigma^n, u^n, p^n)$. We obtain that the errors satisfy: for the shifted algorithm (12)
$$ \begin{cases} \sigma^{n+1} = \sigma^n - \tau M^*p^n \\ u^{n+1} = Bu^n + M\sigma^n \\ p^{n+1} = B^*p^n + H^*Hu^n \end{cases} \tag{14} $$
and for algorithm (13)
$$ \begin{cases} \sigma^{n+1} = \sigma^n - \tau M^*p^n \\ u^{n+1} = Bu^n + M\sigma^n - \tau MM^*p^n \\ p^{n+1} = B^*p^n + H^*Hu^n, \end{cases} \tag{15} $$
or equivalently, by putting in evidence the block iteration matrices
$$ \begin{pmatrix} p^{n+1} \\ u^{n+1} \\ \sigma^{n+1} \end{pmatrix} = \begin{pmatrix} B^* & H^*H & 0 \\ 0 & B & M \\ -\tau M^* & 0 & I \end{pmatrix} \begin{pmatrix} p^n \\ u^n \\ \sigma^n \end{pmatrix} \tag{16} $$
and
$$ \begin{pmatrix} p^{n+1} \\ u^{n+1} \\ \sigma^{n+1} \end{pmatrix} = \begin{pmatrix} B^* & H^*H & 0 \\ -\tau MM^* & B & M \\ -\tau M^* & 0 & I \end{pmatrix} \begin{pmatrix} p^n \\ u^n \\ \sigma^n \end{pmatrix}. \tag{17} $$
Now recall that a fixed point iteration converges if and only if the spectral radius of its iteration
matrix is strictly less than 1. Therefore in the following propositions we establish eigenvalue
equations for the iteration matrix of the two methods.
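This criterion is easy to check numerically: the sketch below (with small illustrative matrices of our own choosing) assembles the block iteration matrices of (16) and (17) and verifies that their spectral radius is below 1 for a small step τ:

```python
import numpy as np

# Illustrative matrices (ours): ||B||_2 < 1 and H (I - B)^{-1} M injective.
B = np.array([[0.5, 0.1], [0.0, 0.4]])
M = np.eye(2)
H = np.eye(2)
I = np.eye(2)
Z = np.zeros((2, 2))
tau = 0.001        # small step, within the sufficient bounds of Section 3.4

def rho(A):
    """Spectral radius of A."""
    return np.abs(np.linalg.eigvals(A)).max()

# Block iteration matrix of (16) (shifted 1-step one-shot), acting on (p, u, sigma).
T_shifted = np.block([[B.T,        H.T @ H, Z],
                      [Z,          B,       M],
                      [-tau * M.T, Z,       I]])

# Block iteration matrix of (17) (1-step one-shot).
T_oneshot = np.block([[B.T,            H.T @ H, Z],
                      [-tau * M @ M.T, B,       M],
                      [-tau * M.T,     Z,       I]])

assert rho(T_shifted) < 1.0 and rho(T_oneshot) < 1.0
```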
Proposition 3.1 (Eigenvalue equation for the shifted 1-step one-shot method). Assume that $\lambda\in\mathbb{C}$ is an eigenvalue of the iteration matrix in (16).
(i) If $\lambda\in\mathbb{C}$, $\lambda\notin\operatorname{Spec}(B)$, then there exists $y\in\mathbb{C}^{n_\sigma}$, $y\ne0$, such that
$$ (\lambda-1)\|y\|^2 + \tau\langle M^*(\lambda I-B^*)^{-1}H^*H(\lambda I-B)^{-1}My,\, y\rangle = 0. \tag{18} $$
(ii) $\lambda=1$ is not an eigenvalue of the iteration matrix.

Remark 3.2. Since $\rho(B)$ is strictly less than 1, so is $\rho(B^*)$.

Proof. Since $\lambda\in\mathbb{C}$ is an eigenvalue of the iteration matrix in (16), there exists a non-zero vector $(\tilde p, \tilde u, y)\in\mathbb{C}^{n_u+n_u+n_\sigma}$ such that
$$ \begin{cases} \lambda y = y - \tau M^*\tilde p \\ \lambda\tilde u = B\tilde u + My \\ \lambda\tilde p = B^*\tilde p + H^*H\tilde u. \end{cases} \tag{19} $$
By the second equation in (19), $\tilde u = (\lambda I-B)^{-1}My$, so together with the third equation
$$ \tilde p = (\lambda I-B^*)^{-1}H^*H\tilde u = (\lambda I-B^*)^{-1}H^*H(\lambda I-B)^{-1}My, $$
and by inserting this result into the first equation we obtain
$$ (\lambda-1)y = -\tau M^*(\lambda I-B^*)^{-1}H^*H(\lambda I-B)^{-1}My, \tag{20} $$
which gives (18) by taking the scalar product with $y$. We also see that if $y=0$ then the above formulas for $\tilde u, \tilde p$ immediately give $\tilde u=\tilde p=0$, which is a contradiction.
(ii) Assume that $\lambda=1$ is an eigenvalue of the iteration matrix; then (20) gives us
$$ M^*(I-B^*)^{-1}H^*H(I-B)^{-1}My = 0, $$
but this cannot happen for $y\ne0$ due to the injectivity of $H(I-B)^{-1}M$.
Proposition 3.3 (Eigenvalue equation for the 1-step one-shot method). Assume that $\lambda\in\mathbb{C}$ is an eigenvalue of the iteration matrix in (17).
(i) If $\lambda\in\mathbb{C}$, $\lambda\notin\operatorname{Spec}(B)$, then there exists $y\in\mathbb{C}^{n_\sigma}$, $y\ne0$, such that
$$ (\lambda-1)\|y\|^2 + \tau\lambda\langle M^*(\lambda I-B^*)^{-1}H^*H(\lambda I-B)^{-1}My,\, y\rangle = 0. \tag{21} $$
(ii) $\lambda=1$ is not an eigenvalue of the iteration matrix.

Proof. Since $\lambda\in\mathbb{C}$ is an eigenvalue of the iteration matrix in (17), there exists a non-zero vector $(\tilde p, \tilde u, y)\in\mathbb{C}^{n_u+n_u+n_\sigma}$ such that
$$ \begin{cases} \lambda y = y - \tau M^*\tilde p \\ \lambda\tilde u = B\tilde u + My - \tau MM^*\tilde p \\ \lambda\tilde p = B^*\tilde p + H^*H\tilde u. \end{cases} \tag{22} $$
By the third equation in (22), $\tilde p = (\lambda I-B^*)^{-1}H^*H\tilde u$, and inserting this result into the second equation we obtain
$$ \lambda\tilde u = B\tilde u + My - \tau MM^*(\lambda I-B^*)^{-1}H^*H\tilde u, $$
or equivalently,
$$ [I+\tau MM^*A](\lambda I-B)\tilde u = My $$
where $A = (\lambda I-B^*)^{-1}H^*H(\lambda I-B)^{-1}$. Since $\tau>0$, $I+\tau MM^*A$ is a positive definite matrix. Therefore
$$ \tilde u = (\lambda I-B)^{-1}[I+\tau MM^*A]^{-1}My $$
and
$$ \tilde p = (\lambda I-B^*)^{-1}H^*H\tilde u = A[I+\tau MM^*A]^{-1}My. $$
By inserting this result into the first equation in (22) we obtain
$$ (\lambda-1)y = -\tau M^*A[I+\tau MM^*A]^{-1}My. $$
Thanks to the fact that $[I+\tau MM^*A]^{-1}$ and $MM^*A$ commute, we have
$$ (\lambda-1)My = -\tau MM^*A[I+\tau MM^*A]^{-1}My = -\tau[I+\tau MM^*A]^{-1}MM^*AMy, $$
then
$$ (\lambda-1)[I+\tau MM^*A]My = -\tau MM^*AMy, $$
which leads to
$$ (\lambda-1)My + \tau\lambda MM^*AMy = 0. $$
Since $H(I-B)^{-1}M$ is injective, so is $M$. Therefore
$$ (\lambda-1)y + \tau\lambda M^*AMy = 0, \tag{23} $$
which gives (21) by taking the scalar product with $y$. We also see that if $y=0$ then the above formulas for $\tilde u, \tilde p$ immediately give $\tilde u=\tilde p=0$, which is a contradiction.
(ii) Assume that $\lambda=1$ is an eigenvalue of the iteration matrix; then (23) gives us
$$ M^*(I-B^*)^{-1}H^*H(I-B)^{-1}My = 0, $$
but this cannot happen for $y\ne0$ due to the injectivity of $H(I-B)^{-1}M$.
In the following sections we will show that, for sufficiently small $\tau$, equations (18) and (21) admit no solution $|\lambda|\ge1$, thus algorithms (12) and (11) converge. When $\lambda\ne0$, it is convenient to rewrite (18) and (21) respectively as
$$ \lambda^2(\lambda-1)\|y\|^2 + \tau\langle M^*(I-B^*/\lambda)^{-1}H^*H(I-B/\lambda)^{-1}My,\, y\rangle = 0 \tag{24} $$
and
$$ \lambda(\lambda-1)\|y\|^2 + \tau\langle M^*(I-B^*/\lambda)^{-1}H^*H(I-B/\lambda)^{-1}My,\, y\rangle = 0. \tag{25} $$
For the analysis we use auxiliary results proved in Appendix A.
First, we study separately the very particular case where $B=0$.

Proposition 3.4 (shifted 1-step one-shot method). When $B=0$, the eigenvalue equation (24) admits no solution $\lambda\in\mathbb{C}$, $|\lambda|\ge1$, if $\tau < \frac{-1+\sqrt{5}}{2\|H\|^2\|M\|^2}$.

Proof. When $B=0$, equation (24) becomes $\lambda^2(\lambda-1)\|y\|^2 + \tau\|HMy\|^2 = 0$, which is equivalent to $\lambda^3-\lambda^2+\frac{\|HMy\|^2}{\|y\|^2}\tau = 0$. Then, the conclusion can be obtained by Lemma C.1.

Proposition 3.5 (1-step one-shot method). When $B=0$, the eigenvalue equation (25) admits no solution $\lambda\in\mathbb{C}$, $|\lambda|\ge1$, if $\tau < \frac{1}{\|H\|^2\|M\|^2}$.

Proof. When $B=0$, equation (25) becomes $\lambda(\lambda-1)\|y\|^2 + \tau\|HMy\|^2 = 0$, which yields $\lambda^3-\lambda^2+\frac{\|HMy\|^2}{\|y\|^2}\tau\lambda = 0$. Then, the conclusion can be obtained by Lemma C.1.
3.2 Real eigenvalues

We now find conditions on the descent step $\tau$ such that the real eigenvalues stay inside the unit disk. Recall that we have already proved that $\lambda=1$ is not an eigenvalue for both methods.

Proposition 3.6 (shifted 1-step one-shot method). Equation (24)
(i) admits no solution $\lambda\in\mathbb{R}$, $\lambda>1$, for all $\tau>0$;
(ii) admits no solution $\lambda\in\mathbb{R}$, $\lambda\le-1$, if we take
$$ \tau < \frac{2}{\|H\|^2\|M\|^2 s(B)^2}, $$
where $s(B)$ is defined in Lemma A.2; moreover if $0<\|B\|<1$, we can take
$$ \tau < \frac{\chi_0(1,\|B\|)}{\|H\|^2\|M\|^2}, \quad \text{where } \chi_0(1,b) = 2(1-b)^2 \tag{26} $$
(here in the notation $\chi_0(1,b)$, $1$ refers to $k=1$).

Proof. When $\lambda\in\mathbb{R}\setminus\{0\}$ equation (24) becomes
$$ \lambda^2(\lambda-1)\|y\|^2 + \tau\left\|H(I-B/\lambda)^{-1}My\right\|^2 = 0. $$
The left-hand side of the above equation is strictly positive for any $\tau>0$ if $\lambda>1$; it is strictly negative for $\tau$ satisfying the inequality in (ii) if $\lambda\le-1$, noting that $\lambda\mapsto\lambda^2(\lambda-1)$ is increasing for $\lambda\le-1$.
Proposition 3.7 (1-step one-shot method). Equation (25) admits no solution $\lambda\in\mathbb{R}$, $\lambda\ne1$, $|\lambda|\ge1$, for all $\tau>0$.

Proof. When $\lambda\in\mathbb{R}\setminus\{0\}$ equation (25) becomes
$$ \lambda(\lambda-1)\|y\|^2 + \tau\left\|H(I-B/\lambda)^{-1}My\right\|^2 = 0. $$
If $\lambda\in\mathbb{R}$, $\lambda\ne1$, $|\lambda|\ge1$, then $\lambda(\lambda-1)>0$, thus the left-hand side of the above equation is strictly positive for any $\tau>0$.
3.3 Complex eigenvalues

We now look for conditions on the descent step $\tau$ such that also the complex eigenvalues stay inside the unit disk. We first deal with the shifted 1-step one-shot method.

Proposition 3.8 (shifted 1-step one-shot method). If $B\ne0$, there exists $\tau>0$ sufficiently small such that equation (24) admits no solution $\lambda\in\mathbb{C}\setminus\mathbb{R}$, $|\lambda|\ge1$. In particular, if $0<\|B\|<1$, given any $\delta_0>0$ and $0<\theta_0\le\frac{\pi}{6}$, take
$$ \tau < \frac{\min\{\chi_1(1,\|B\|),\, \chi_2(1,\|B\|),\, \chi_3(1,\|B\|),\, \chi_4(1,\|B\|)\}}{\|H\|^2\|M\|^2}, $$
where
$$ \chi_1(1,b) = \frac{(1-b)^4}{4b^2}, \qquad \chi_2(1,b) = 2\sin\frac{\theta_0}{2}\,\frac{(1-b)^2}{(1+b)^2}, $$
$$ \chi_3(1,b) = \frac{\delta_0\cos^2\frac{5\theta_0}{2}}{2\left(1+2\delta_0\sin\frac{5\theta_0}{2}+\delta_0^2\right)}\cdot\frac{(1-b)^4}{b^2}, \qquad \chi_4(1,b) = \left[\sin\left(\frac{\pi}{2}-3\theta_0\right)+\cos2\theta_0\right](1-b)^2 $$
(here in the notation $\chi_i(1,b)$, $i=1,\dots,4$, $1$ refers to $k=1$).
Proof. Step 1. Rewrite equation (24) so that we can study its real and imaginary parts.
Let $\lambda = R(\cos\theta + \mathrm{i}\sin\theta)$ in polar form, where $R=|\lambda|\ge1$ and $\theta\in(-\pi,\pi)$. Write $1/\lambda = r(\cos\varphi + \mathrm{i}\sin\varphi)$ in polar form, where $r=1/|\lambda|=1/R\le1$ and $\varphi=-\theta\in(-\pi,\pi)$. By Lemma A.3, we have
$$ \left(I-\frac{B}{\lambda}\right)^{-1} = P(\lambda) + \mathrm{i}\,Q(\lambda), \qquad \left(I-\frac{B^*}{\lambda}\right)^{-1} = P(\lambda)^* + \mathrm{i}\,Q(\lambda)^*, $$
where $P(\lambda)$ and $Q(\lambda)$ are $\mathbb{C}^{n_u\times n_u}$-valued functions, and, omitting the dependence on $\lambda$,
$$ \|P\| \le p := \begin{cases} (1+\|B\|)\,s(B)^2 & \text{for general } B\ne0, \\ \dfrac{1}{1-\|B\|} & \text{when } \|B\|<1; \end{cases} \tag{27} $$
$$ \|Q\| \le q_1 := \begin{cases} \|B\|\,s(B)^2 & \text{for general } B\ne0, \\ \dfrac{\|B\|}{1-\|B\|} & \text{when } 0<\|B\|<1; \end{cases} \tag{28} $$
$$ \|Q\| \le |\sin\theta|\,q_2, \quad q_2 := \begin{cases} \|B\|\,s(B)^2 & \text{for general } B\ne0, \\ \dfrac{\|B\|}{(1-\|B\|)^2} & \text{when } 0<\|B\|<1. \end{cases} \tag{29} $$
Now we rewrite (24) as
$$ \lambda^2(\lambda-1)\|y\|^2 + \tau\,G(P^*+\mathrm{i}Q^*,\, P+\mathrm{i}Q) = 0 \tag{30} $$
where
$$ G(X,Y) = \langle M^*XH^*HYMy,\, y\rangle\in\mathbb{C}, \qquad X,Y\in\mathbb{C}^{n_u\times n_u}. $$
$G$ satisfies the following properties:
• $\forall X,Y_1,Y_2\in\mathbb{C}^{n_u\times n_u}$, $\forall z_1,z_2\in\mathbb{C}$: $G(X,\, z_1Y_1+z_2Y_2) = z_1G(X,Y_1)+z_2G(X,Y_2)$.
• $\forall X_1,X_2,Y\in\mathbb{C}^{n_u\times n_u}$, $\forall z_1,z_2\in\mathbb{C}$: $G(z_1X_1+z_2X_2,\, Y) = z_1G(X_1,Y)+z_2G(X_2,Y)$.
• $\forall X\in\mathbb{C}^{n_u\times n_u}$: $0\le G(X^*,X) = \|HXMy\|^2 \le (\|H\|\|M\|\|X\|)^2\|y\|^2$.
• $\forall X,Y\in\mathbb{C}^{n_u\times n_u}$: $G(X,Y)+G(Y^*,X^*)\in\mathbb{R}$; indeed
$$ G(X,Y) = \langle M^*XH^*HYMy,\, y\rangle = \langle y,\, M^*Y^*H^*HX^*My\rangle = \langle M^*Y^*H^*HX^*My,\, y\rangle^* = G(Y^*,X^*)^*. $$
With these properties of $G$, we expand (30) and take its real and imaginary parts, so we respectively obtain
$$ \Re(\lambda^3-\lambda^2)\,\|y\|^2 + \tau\,[G(P^*,P)-G(Q^*,Q)] = 0 \tag{31} $$
and
$$ \Im(\lambda^3-\lambda^2)\,\|y\|^2 + \tau\,[G(P^*,Q)+G(Q^*,P)] = 0. \tag{32} $$

Step 2. Find a suitable combination of equations (31) and (32), and choose $\tau$ so that we obtain a new equation whose left-hand side is strictly positive/negative.
Let $\gamma=\gamma(\lambda)\in\mathbb{R}$ be defined by cases as in Lemma A.4. Multiplying equation (32) by $\gamma$ and summing it with equation (31), we obtain
$$ [\Re(\lambda^3-\lambda^2)+\gamma\Im(\lambda^3-\lambda^2)]\,\|y\|^2 + \tau\,[G(P^*,P)-G(Q^*,Q)+\gamma G(P^*,Q)+\gamma G(Q^*,P)] = 0, $$
or equivalently,
$$ [\Re(\lambda^3-\lambda^2)+\gamma\Im(\lambda^3-\lambda^2)]\,\|y\|^2 + \tau\,G(P^*+\gamma Q^*,\, P+\gamma Q) - (1+\gamma^2)\,\tau\,G(Q^*,Q) = 0. \tag{33} $$
Now we consider four cases for $\lambda$ as in Lemma A.4:
• Case 1. $\Re(\lambda^3-\lambda^2)\ge0$;
• Case 2. $\Re(\lambda^3-\lambda^2)<0$ and $\theta\in[\theta_0,\pi-\theta_0]\cup[-\pi+\theta_0,-\theta_0]$ for fixed $0<\theta_0\le\frac{\pi}{6}$;
• Case 3. $\Re(\lambda^3-\lambda^2)<0$ and $\theta\in(-\theta_0,\theta_0)$ for fixed $0<\theta_0\le\frac{\pi}{6}$;
• Case 4. $\Re(\lambda^3-\lambda^2)<0$ and $\theta\in(\pi-\theta_0,\pi)\cup(-\pi,-\pi+\theta_0)$ for fixed $0<\theta_0\le\frac{\pi}{6}$.
The four cases will be treated in the following four lemmas (Lemmas 3.9–3.12), which together give the statement of this proposition.
Lemma 3.9 (Case 1). Equation (24) admits no solutions $\lambda$ in Case 1 if we take
$$ \tau < \frac{1}{4\|H\|^2\|M\|^2\|B\|^2 s(B)^4}. $$
Moreover, if $0<\|B\|<1$, we can take
$$ \tau < \frac{(1-\|B\|)^4}{4\|H\|^2\|M\|^2\|B\|^2}. $$
Proof. Writing (33) for $\gamma=\gamma_1$ as in Lemma A.4 (i) (in particular $\gamma_1^2=1$), we have
$$ [\Re(\lambda^3-\lambda^2)+\gamma_1\Im(\lambda^3-\lambda^2)]\,\|y\|^2 + \tau\,G(P^*+\gamma_1Q^*,\, P+\gamma_1Q) - 2\tau\,G(Q^*,Q) = 0. \tag{34} $$
By the properties of $G$ we have
$$ G(P^*+\gamma_1Q^*,\, P+\gamma_1Q)\ge0 $$
and
$$ G(Q^*,Q) \le (\|H\|\|M\|\|Q\|)^2\|y\|^2 \le (\|H\|\|M\|\,|\sin\theta|\,q_2)^2\|y\|^2, $$
therefore the left-hand side of (34) will be strictly positive if $\tau$ satisfies
$$ \tau < \frac{\Re(\lambda^3-\lambda^2)+\gamma_1\Im(\lambda^3-\lambda^2)}{2\,(\|H\|\|M\|\,|\sin\theta|\,q_2)^2}. $$
Since $\Re(\lambda^3-\lambda^2)+\gamma_1\Im(\lambda^3-\lambda^2) \ge 2|\sin(\theta/2)|$ by Lemma A.4 (i), it is enough to choose
$$ \tau < \frac{1}{4\left|\sin\frac{\theta}{2}\right|\cos^2\frac{\theta}{2}\,\|H\|^2\|M\|^2 q_2^2}. $$
Since $\left|\sin\frac{\theta}{2}\right|\cos^2\frac{\theta}{2}\le1$, it is sufficient to choose $\tau < \frac{1}{4\|H\|^2\|M\|^2 q_2^2}$, and we use definition (29) of $q_2$.

Lemma 3.10 (Case 2). Equation (24) admits no solutions $\lambda$ in Case 2 if we take
$$ \tau < \frac{2\sin\frac{\theta_0}{2}}{\|H\|^2\|M\|^2(1+2\|B\|)^2 s(B)^4}. $$
Moreover, if $0<\|B\|<1$, we can take
$$ \tau < \frac{2\sin\frac{\theta_0}{2}\,(1-\|B\|)^2}{\|H\|^2\|M\|^2(1+\|B\|)^2}. $$

Proof. Writing (33) for $\gamma=\gamma_2$ as in Lemma A.4 (ii) (in particular $\gamma_2^2=1$), we have
$$ [\Re(\lambda^3-\lambda^2)+\gamma_2\Im(\lambda^3-\lambda^2)]\,\|y\|^2 + \tau\,G(P^*+\gamma_2Q^*,\, P+\gamma_2Q) - 2\tau\,G(Q^*,Q) = 0. \tag{35} $$
By the properties of $G$,
$$ G(Q^*,Q)\ge0, \qquad G(P^*+\gamma_2Q^*,\, P+\gamma_2Q) \le (\|H\|\|M\|\|P+\gamma_2Q\|)^2\|y\|^2, $$
and the estimate $\|P+\gamma_2Q\| \le \|P\|+|\gamma_2|\,\|Q\| = \|P\|+\|Q\| \le p+q_1$, the left-hand side of (35) will be strictly negative if $\tau$ satisfies
$$ \tau < \frac{-\Re(\lambda^3-\lambda^2)-\gamma_2\Im(\lambda^3-\lambda^2)}{[\|H\|\|M\|(p+q_1)]^2}. $$
Thanks to Lemma A.4 (ii), it is sufficient to choose
$$ \tau < \frac{2\sin\frac{\theta_0}{2}}{\|H\|^2\|M\|^2(p+q_1)^2} $$
and we use definitions (27) and (28) of $p$ and $q_1$.
Lemma 3.11 (Case 3). Let $\delta_0>0$ be fixed. Equation (24) admits no solutions $\lambda$ in Case 3 if we take
$$ \tau < \frac{\delta_0\cos^2\frac{5\theta_0}{2}}{2\left(1+2\delta_0\sin\frac{5\theta_0}{2}+\delta_0^2\right)}\cdot\frac{1}{\|H\|^2\|M\|^2\|B\|^2 s(B)^4}. $$
Moreover, if $0<\|B\|<1$, we can take
$$ \tau < \frac{\delta_0\cos^2\frac{5\theta_0}{2}}{2\left(1+2\delta_0\sin\frac{5\theta_0}{2}+\delta_0^2\right)}\cdot\frac{(1-\|B\|)^4}{\|H\|^2\|M\|^2\|B\|^2}. $$

Proof. Writing (33) for $\gamma=\gamma_3$ as in Lemma A.4 (iii), we have
$$ [\Re(\lambda^3-\lambda^2)+\gamma_3\Im(\lambda^3-\lambda^2)]\,\|y\|^2 + \tau\,G(P^*+\gamma_3Q^*,\, P+\gamma_3Q) - (1+\gamma_3^2)\,\tau\,G(Q^*,Q) = 0. \tag{36} $$
By the properties of $G$,
$$ G(P^*+\gamma_3Q^*,\, P+\gamma_3Q)\ge0, \qquad G(Q^*,Q) \le (\|H\|\|M\|\|Q\|)^2\|y\|^2, $$
and by the estimate $\|Q\| \le |\sin\theta|\,q_2$, the left-hand side of (36) will be strictly positive if $\tau$ satisfies
$$ \tau < \frac{\Re(\lambda^3-\lambda^2)+\gamma_3\Im(\lambda^3-\lambda^2)}{(1+\gamma_3^2)\,(\|H\|\|M\|\,|\sin\theta|\,q_2)^2}. $$
Since by Lemma A.4 (iii) $\Re(\lambda^3-\lambda^2)+\gamma_3\Im(\lambda^3-\lambda^2) > 2\delta_0\sin\frac{\theta}{2}$, it is sufficient to choose
$$ \tau < \frac{\delta_0}{2(1+\gamma_3^2)\|H\|^2\|M\|^2 q_2^2} = \frac{1}{2\|H\|^2\|M\|^2 q_2^2}\cdot\frac{\delta_0\cos^2\frac{5\theta_0}{2}}{1+2\delta_0\sin\frac{5\theta_0}{2}+\delta_0^2}, $$
where we have used the definition of $\gamma_3$. To conclude we use definition (29) of $q_2$.
Lemma 3.12 (Case 4). Equation (24) admits no solutions $\lambda$ in Case 4 if we take
$$ \tau < \frac{\sin\left(\frac{\pi}{2}-3\theta_0\right)+\cos2\theta_0}{\|H\|^2\|M\|^2(1+\|B\|)^2 s(B)^4}. $$
Moreover, if $0<\|B\|<1$, we can take
$$ \tau < \frac{\left[\sin\left(\frac{\pi}{2}-3\theta_0\right)+\cos2\theta_0\right](1-\|B\|)^2}{\|H\|^2\|M\|^2}. $$

Proof. Here it is enough to consider (31). By the properties of $G$,
$$ G(Q^*,Q)\ge0, \qquad G(P^*,P) \le (\|H\|\|M\|p)^2\|y\|^2, $$
we see that the left-hand side of (31) will be strictly negative if $\tau$ satisfies
$$ \tau < \frac{-\Re(\lambda^3-\lambda^2)}{(\|H\|\|M\|p)^2}. $$
Thanks to Lemma A.4 (iv), it is sufficient to choose
$$ \tau < \frac{\sin\left(\frac{\pi}{2}-3\theta_0\right)+\cos2\theta_0}{\|H\|^2\|M\|^2 p^2}, $$
and definition (27) of $p$ leads to the conclusion.
Similarly, with the help of Lemma A.5, we prove for the 1-step one-shot method the analogue of Proposition 3.8. In particular, note that here just three cases of $\lambda$ need to be considered, because the analogue of the fourth one is excluded by Lemma A.5 (iv).

Proposition 3.13 (1-step one-shot method). If $B\ne0$, there exists $\tau>0$ sufficiently small such that equation (25) admits no solution $\lambda\in\mathbb{C}\setminus\mathbb{R}$, $|\lambda|\ge1$. In particular, if $0<\|B\|<1$, given any $\delta_0>0$ and $0<\theta_0\le\frac{\pi}{4}$, take
$$ \tau < \frac{\min\{\psi_1(1,\|B\|),\, \psi_2(1,\|B\|),\, \psi_3(1,\|B\|)\}}{\|H\|^2\|M\|^2}, $$
where
$$ \psi_1(1,b) = \frac{(1-b)^4}{4b^2}, \qquad \psi_2(1,b) = 2\sin\frac{\theta_0}{2}\,\frac{(1-b)^2}{(1+b)^2}, \qquad \psi_3(1,b) = \frac{\delta_0\cos^2\frac{3\theta_0}{2}\,(1-b)^4}{2\left(1+2\delta_0\sin\frac{3\theta_0}{2}+\delta_0^2\right)b^2} $$
(here in the notation $\psi_i(1,b)$, $i=1,2,3$, $1$ refers to $k=1$).
3.4 Final result (k = 1)

Considering Proposition 3.4, and taking the minimum between the bound (26) in Proposition 3.6 for real eigenvalues and the bound in Proposition 3.8 for complex eigenvalues, we obtain a sufficient condition on the descent step $\tau$ to ensure convergence of the shifted 1-step one-shot method.

Theorem 3.14 (Convergence of shifted 1-step one-shot). Under assumption (4), the shifted 1-step one-shot method (12) converges for sufficiently small $\tau$. In particular, for $\|B\|<1$, it is enough to take
$$ \tau < \frac{\chi(1,\|B\|)}{\|H\|^2\|M\|^2}, $$
where $\chi(1,\|B\|)$ is an explicit function of $\|B\|$ (in this notation $1$ refers to $k=1$).

Remark 3.15. Set $b=\|B\|$. For $0<b<1$, a practical (but not optimal) bound for $\tau$ is
$$ \tau < \frac{1}{\|H\|^2\|M\|^2}\cdot\min\left\{\frac{1}{2}\cdot\frac{(1-b)^2}{(1+b)^2},\ \frac{1-\sin\frac{5\pi}{12}}{4}\cdot\frac{(1-b)^4}{b^2}\right\}. $$
Indeed, using the notation in Propositions 3.6 and 3.8, it is easy to show that $\chi_2(1,b)\le\chi_0(1,b)$ and $\chi_3(1,b)\le\chi_1(1,b)$. By studying $\chi_3(1,b)$ and noting that $\delta_0^2+1\ge2\delta_0$, we see that we should take $\delta_0=1$. Finally, we can take for instance $\theta_0=\frac{\pi}{6}$, then compare $\chi_2(1,b)$, $\chi_3(1,b)$ and $\chi_4(1,b)$.
Putting together Propositions 3.5, 3.7 and 3.13, we obtain a sufficient condition on the descent step $\tau$ to ensure convergence of the 1-step one-shot method.

Theorem 3.16 (Convergence of 1-step one-shot). Under assumption (4), the 1-step one-shot method (11) converges for sufficiently small $\tau$. In particular, for $\|B\|<1$, it is enough to take
$$ \tau < \frac{\psi(1,\|B\|)}{\|H\|^2\|M\|^2}, $$
where $\psi(1,\|B\|)$ is an explicit function of $\|B\|$ (in this notation $1$ refers to $k=1$).

Remark 3.17. Similarly as above, for $0<b<1$, a practical (but not optimal) bound for $\tau$ is
$$ \tau < \frac{1}{\|H\|^2\|M\|^2}\cdot\min\left\{2\sin\frac{\pi}{8}\cdot\frac{(1-b)^2}{(1+b)^2},\ \frac{1-\sin\frac{3\pi}{8}}{4}\cdot\frac{(1-b)^4}{b^2}\right\}. $$
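The practical bounds of Remarks 3.15 and 3.17 are straightforward to evaluate; in the sketch below the function names are ours, and the formulas are exactly those of the two remarks:

```python
import numpy as np

def tau_bound_shifted(b, norm_H, norm_M):
    """Practical step bound of Remark 3.15 (shifted 1-step one-shot), for 0 < b = ||B|| < 1."""
    c = min(0.5 * (1 - b)**2 / (1 + b)**2,
            (1 - np.sin(5 * np.pi / 12)) / 4 * (1 - b)**4 / b**2)
    return c / (norm_H**2 * norm_M**2)

def tau_bound(b, norm_H, norm_M):
    """Practical step bound of Remark 3.17 (1-step one-shot), for 0 < b = ||B|| < 1."""
    c = min(2 * np.sin(np.pi / 8) * (1 - b)**2 / (1 + b)**2,
            (1 - np.sin(3 * np.pi / 8)) / 4 * (1 - b)**4 / b**2)
    return c / (norm_H**2 * norm_M**2)

# The admissible step shrinks rapidly as b -> 1, i.e. as the inner solver gets slower.
for b in (0.1, 0.5, 0.9):
    print(f"b = {b}: shifted {tau_bound_shifted(b, 1.0, 1.0):.2e}, "
          f"non-shifted {tau_bound(b, 1.0, 1.0):.2e}")
```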
4 Convergence of multi-step one-shot methods (k ≥ 2)

We now tackle the multi-step case, that is, the k-step one-shot methods with $k\ge2$.
4.1 Block iteration matrices and eigenvalue equations

Once again, to analyze the convergence of these methods, first we express $(\sigma^{n+1}, u^{n+1}, p^{n+1})$ in terms of $(\sigma^n, u^n, p^n)$, by rewriting the recursions for $u$ and $p$: systems (9) and (10) are respectively rewritten as
$$ \begin{cases} \sigma^{n+1} = \sigma^n - \tau M^*p^n \\ u^{n+1} = B^ku^n + T_kM\sigma^n - \tau T_kMM^*p^n + T_kF \\ p^{n+1} = [(B^*)^k - \tau X_kMM^*]p^n + U_ku^n + X_kM\sigma^n + X_kF - T_k^*H^*f \end{cases} \tag{37} $$
and
$$ \begin{cases} \sigma^{n+1} = \sigma^n - \tau M^*p^n \\ u^{n+1} = B^ku^n + T_kM\sigma^n + T_kF \\ p^{n+1} = (B^*)^kp^n + U_ku^n + X_kM\sigma^n + X_kF - T_k^*H^*f \end{cases} \tag{38} $$
where
$$ T_k = I + B + \dots + B^{k-1} = (I-B)^{-1}(I-B^k), \quad k\ge1, \tag{39} $$
$$ U_k = (B^*)^{k-1}H^*H + (B^*)^{k-2}H^*HB + \dots + H^*HB^{k-1}, \quad k\ge1, $$
$$ X_k = \begin{cases} (B^*)^{k-2}H^*HT_1 + (B^*)^{k-3}H^*HT_2 + \dots + H^*HT_{k-1} & \text{if } k\ge2, \\ 0 & \text{if } k=1. \end{cases} \tag{40} $$
Note that (37) (k-step one-shot) can be obtained from (38) (shifted k-step one-shot) by replacing $\sigma^n$ with $\sigma^{n+1} = \sigma^n - \tau M^*p^n$ in the equations for $u$ and $p$, which yields two extra terms in (37). In what follows we first study the shifted k-step one-shot method, then the k-step one-shot method.
The following lemma gathers some useful properties of $T_k$, $U_k$ and $X_k$.

Lemma 4.1. (i) The matrices $U_k$ and $X_k$ can be rewritten as
$$ U_k = \sum_{i+j=k-1}(B^*)^iH^*HB^j \quad \text{for } k\ge1, \qquad X_k = \sum_{l=0}^{k-2}\,\sum_{i+j=l}(B^*)^iH^*HB^j = \sum_{l=1}^{k-1}U_l \quad \text{for } k\ge2. $$
(ii) The matrices $U_k$ and $X_k$ are self-adjoint: $U_k^*=U_k$, $X_k^*=X_k$.
(iii) We have the relation
$$ U_kT_k - X_kB^k + X_k = T_k^*H^*HT_k, \quad \forall k\ge1. \tag{41} $$

Proof. (i) is easy to check from the definitions. (ii) follows from (i).
(iii) For $k=1$, we have $U_1=H^*H$, $T_1=I$ and $X_1=0$, hence the identity is verified. For $k\ge2$, note that $X_{k+1}=B^*X_k+H^*HT_k$; then by (ii), $X_{k+1}=X_{k+1}^*=X_kB+T_k^*H^*H$. On the other hand, from (i) we get that $X_{k+1}=X_k+U_k$. Thus,
$$ X_k+U_k = X_kB+T_k^*H^*H, \quad \text{or equivalently,} \quad U_k = X_k(B-I)+T_k^*H^*H. $$
Finally,
$$ U_kT_k = X_k(B-I)T_k + T_k^*H^*HT_k = X_k(B^k-I) + T_k^*H^*HT_k. $$
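Lemma 4.1 lends itself to a direct numerical sanity check; the sketch below (random matrices, illustration only) verifies the closed forms of $T_k$, $U_k$, $X_k$, their self-adjointness, and identity (41):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
B *= 0.6 / np.linalg.norm(B, 2)          # any B works for these algebraic identities
H = rng.standard_normal((3, n))
I = np.eye(n)

def U(l):
    # U_l = sum_{i+j=l-1} (B^*)^i H^* H B^j   (real case: B^* = B^T)
    return sum(np.linalg.matrix_power(B.T, i) @ H.T @ H @ np.linalg.matrix_power(B, l - 1 - i)
               for i in range(l))

for k in range(1, 6):
    Tk = sum(np.linalg.matrix_power(B, j) for j in range(k))   # T_k = I + B + ... + B^{k-1}
    Uk = U(k)
    Xk = sum(U(l) for l in range(1, k)) if k >= 2 else np.zeros((n, n))  # X_k = sum_{l=1}^{k-1} U_l
    Bk = np.linalg.matrix_power(B, k)

    assert np.allclose(Tk, np.linalg.solve(I - B, I - Bk))             # (39)
    assert np.allclose(Uk, Uk.T) and np.allclose(Xk, Xk.T)             # Lemma 4.1 (ii)
    assert np.allclose(Uk @ Tk - Xk @ Bk + Xk, Tk.T @ H.T @ H @ Tk)    # identity (41)
```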
Now, we consider the errors $(\sigma^n - \sigma_{ex},\, u^n - u(\sigma_{ex}),\, p^n - p(\sigma_{ex}))$ with respect to the exact solution at the $n$-th iteration, and, by abuse of notation, we designate them by $(\sigma^n, u^n, p^n)$. We obtain that the errors satisfy: for the shifted algorithm (38)
$$ \begin{cases} \sigma^{n+1} = \sigma^n - \tau M^*p^n \\ u^{n+1} = B^ku^n + T_kM\sigma^n \\ p^{n+1} = (B^*)^kp^n + U_ku^n + X_kM\sigma^n \end{cases} \tag{42} $$
and for algorithm (37)
$$ \begin{cases} \sigma^{n+1} = \sigma^n - \tau M^*p^n \\ u^{n+1} = B^ku^n + T_kM\sigma^n - \tau T_kMM^*p^n \\ p^{n+1} = [(B^*)^k - \tau X_kMM^*]p^n + U_ku^n + X_kM\sigma^n, \end{cases} \tag{43} $$
or equivalently, by putting in evidence the block iteration matrices
$$ \begin{pmatrix} p^{n+1} \\ u^{n+1} \\ \sigma^{n+1} \end{pmatrix} = \begin{pmatrix} (B^*)^k & U_k & X_kM \\ 0 & B^k & T_kM \\ -\tau M^* & 0 & I \end{pmatrix} \begin{pmatrix} p^n \\ u^n \\ \sigma^n \end{pmatrix} \tag{44} $$
and
$$ \begin{pmatrix} p^{n+1} \\ u^{n+1} \\ \sigma^{n+1} \end{pmatrix} = \begin{pmatrix} (B^*)^k - \tau X_kMM^* & U_k & X_kM \\ -\tau T_kMM^* & B^k & T_kM \\ -\tau M^* & 0 & I \end{pmatrix} \begin{pmatrix} p^n \\ u^n \\ \sigma^n \end{pmatrix}. \tag{45} $$
Now recall that a fixed point iteration converges if and only if the spectral radius of its iteration
matrix is strictly less than 1. Therefore in the following propositions we establish eigenvalue
equations for the iteration matrix of the two methods.
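These block recursions can be cross-checked directly: $k$ elementary inner steps on the errors ($u \leftarrow Bu + M\sigma$, and $p \leftarrow B^*p + H^*Hu$ with $u$ taken before its update), followed by the gradient step with the old $p$, must coincide with one application of the block matrix in (44). A sketch with hypothetical random real data:

```python
import numpy as np

rng = np.random.default_rng(1)
nu, nf, ns, k, tau = 4, 3, 2, 3, 0.1
B = rng.standard_normal((nu, nu)); B *= 0.5 / np.linalg.norm(B, 2)
H = rng.standard_normal((nf, nu)); M = rng.standard_normal((nu, ns))
Bp = lambda j: np.linalg.matrix_power(B, j)
Tk = sum(Bp(j) for j in range(k))
Uk = sum(Bp(i).T @ H.T @ H @ Bp(k - 1 - i) for i in range(k))
Xk = sum(Bp(k - 1 - m).T @ H.T @ H @ sum(Bp(j) for j in range(m)) for m in range(1, k))

p = rng.standard_normal(nu); u = rng.standard_normal(nu); s = rng.standard_normal(ns)

# one outer step of the shifted method (42): gradient step with the old p,
# then k inner steps where the p-step uses the u iterate *before* its update
s1 = s - tau * M.T @ p
p1, u1 = p.copy(), u.copy()
for _ in range(k):
    p1, u1 = B.T @ p1 + H.T @ (H @ u1), B @ u1 + M @ s

A = np.block([[np.linalg.matrix_power(B.T, k), Uk, Xk @ M],
              [np.zeros((nu, nu)), Bp(k), Tk @ M],
              [-tau * M.T, np.zeros((ns, nu)), np.eye(ns)]])
assert np.allclose(A @ np.concatenate([p, u, s]), np.concatenate([p1, u1, s1]))
```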
Proposition 4.2 (Eigenvalue equation for the shifted k-step one-shot method). Assume that $\lambda \in \mathbb{C}$ is an eigenvalue of the iteration matrix in (44).
(i) If $\lambda \notin \mathrm{Spec}(B^k)$, then $\exists y \in \mathbb{C}^{n_\sigma}$, $y \ne 0$, such that
$$(\lambda-1)\|y\|^2 + \tau\langle M^*[\lambda I-(B^*)^k]^{-1}[(\lambda-1)X_k + T_k^*H^*HT_k](\lambda I-B^k)^{-1}My, y\rangle = 0. \tag{46}$$
(ii) $\lambda = 1$ is not an eigenvalue of the iteration matrix.
Proposition 4.3 (Eigenvalue equation for the k-step one-shot method). Assume that $\lambda \in \mathbb{C}$ is an eigenvalue of the iteration matrix in (45).
(i) If $\lambda \notin \mathrm{Spec}(B^k)$, then $\exists y \in \mathbb{C}^{n_\sigma}$, $y \ne 0$, such that
$$(\lambda-1)\|y\|^2 + \tau\lambda\langle M^*[\lambda I-(B^*)^k]^{-1}[(\lambda-1)X_k + T_k^*H^*HT_k](\lambda I-B^k)^{-1}My, y\rangle = 0. \tag{47}$$
(ii) $\lambda = 1$ is not an eigenvalue of the iteration matrix.
Remark 4.4. Since $\rho(B)$ is strictly less than 1, so are $\rho(B^*)$, $\rho(B^k)$ and $\rho((B^*)^k)$.
RR n°9477
18 M. Bonazzoli & H. Haddar & T. A. Vu
The proofs of Propositions 4.2 and 4.3 are respectively similar to those of Propositions 3.1 and 3.3; the slight difference is that in the calculation we use (41) to simplify some terms.
In the following sections we will show that, for sufficiently small $\tau$, equations (46) and (47) admit no solution $|\lambda| \ge 1$, thus algorithms (10) and (9) converge. When $\lambda \ne 0$, it is convenient to rewrite (46) and (47) respectively as
$$\lambda^2(\lambda-1)\|y\|^2 + \tau\langle M^*[I-(B^*)^k/\lambda]^{-1}[(\lambda-1)X_k + T_k^*H^*HT_k][I-B^k/\lambda]^{-1}My, y\rangle = 0 \tag{48}$$
and
$$\lambda(\lambda-1)\|y\|^2 + \tau\langle M^*[I-(B^*)^k/\lambda]^{-1}[(\lambda-1)X_k + T_k^*H^*HT_k][I-B^k/\lambda]^{-1}My, y\rangle = 0. \tag{49}$$
The scalar case where $n_u = n_\sigma = n_f = 1$ is analyzed in Appendix C.
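Propositions 4.2–4.3 can be probed numerically: for each eigenpair of the matrix in (44), the $\sigma$-component of the eigenvector plays the role of $y$, and the residual of (46) should vanish whenever $\lambda \notin \mathrm{Spec}(B^k)$. A sketch with hypothetical random real data:

```python
import numpy as np

rng = np.random.default_rng(2)
nu, nf, ns, k, tau = 4, 3, 2, 3, 0.05
B = rng.standard_normal((nu, nu)); B *= 0.5 / np.linalg.norm(B, 2)
H = rng.standard_normal((nf, nu)); M = rng.standard_normal((nu, ns))
Bp = lambda j: np.linalg.matrix_power(B, j)
Tk = sum(Bp(j) for j in range(k)); Bk = Bp(k)
Uk = sum(Bp(i).T @ H.T @ H @ Bp(k - 1 - i) for i in range(k))
Xk = sum(Bp(k - 1 - m).T @ H.T @ H @ sum(Bp(j) for j in range(m)) for m in range(1, k))
A = np.block([[Bk.T, Uk, Xk @ M],
              [np.zeros((nu, nu)), Bk, Tk @ M],
              [-tau * M.T, np.zeros((ns, nu)), np.eye(ns)]])

inner = lambda a, b: np.sum(a * b.conj())        # Hermitian inner product <a, b>
K = Tk.T @ H.T @ H @ Tk                          # T_k^* H^* H T_k (B, H real here)
evals, evecs = np.linalg.eig(A)
spec_Bk = np.linalg.eigvals(Bk)
checked = 0
for lam, v in zip(evals, evecs.T):
    y = v[2 * nu:]                               # sigma-component of the eigenvector
    if np.linalg.norm(y) < 1e-6 or np.min(np.abs(lam - spec_Bk)) < 1e-6:
        continue                                 # (46) is stated for lambda outside Spec(B^k)
    R1 = np.linalg.inv(lam * np.eye(nu) - Bk.T)  # [lambda I - (B^*)^k]^{-1}
    R2 = np.linalg.inv(lam * np.eye(nu) - Bk)    # [lambda I - B^k]^{-1}
    res = (lam - 1) * inner(y, y) + tau * inner(M.T @ R1 @ ((lam - 1) * Xk + K) @ R2 @ M @ y, y)
    assert abs(res) < 1e-7                       # equation (46) holds
    checked += 1
assert checked > 0
```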
Remark 4.5. Note that when $B = 0$ and $k \ge 2$, the shifted k-step one-shot and k-step one-shot methods are respectively equivalent to the shifted and usual gradient descent methods, therefore we retrieve the same bounds (8)–(7) for the descent step $\tau$ as for those methods.
For the analysis we use auxiliary results proved in Appendix A, and the following bounds for $s(B^k)$, $T_k$, $X_k$.
Lemma 4.6. If $\|B\| < 1$,
$$s(B^k) \le \frac{1}{1-\|B\|^k}, \qquad \|T_k\| \le \frac{1-\|B\|^k}{1-\|B\|}, \qquad \|X_k\| \le \frac{\|H\|^2(1-k\|B\|^{k-1}+(k-1)\|B\|^k)}{(1-\|B\|)^2}.$$
Proof. The bound for $s(B^k)$ is proved using Lemma A.2 and $\|B^k\| \le \|B\|^k$. Next, from (39) we have
$$\|T_k\| \le 1 + \|B\| + \dots + \|B\|^{k-1} = \frac{1-\|B\|^k}{1-\|B\|}.$$
From (40), if $k \ge 2$ we have
$$\|X_k\| \le \|H\|^2\left[\|B\|^{k-2} + \|B\|^{k-3}(1+\|B\|) + \dots + (1+\|B\|+\dots+\|B\|^{k-2})\right] = \|H\|^2\left(1 + 2\|B\| + \dots + (k-1)\|B\|^{k-2}\right) = \frac{\|H\|^2(1-k\|B\|^{k-1}+(k-1)\|B\|^k)}{(1-\|B\|)^2}.$$
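The bounds on $\|T_k\|$ and $\|X_k\|$ are easy to verify numerically (the bound on $s(B^k)$ involves a supremum over $|z| \ge 1$ and is skipped here). A sketch with hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(3)
nu, nf = 6, 4
B = rng.standard_normal((nu, nu)); B *= 0.7 / np.linalg.norm(B, 2)
H = rng.standard_normal((nf, nu))
b = np.linalg.norm(B, 2); h2 = np.linalg.norm(H, 2) ** 2
Bp = lambda j: np.linalg.matrix_power(B, j)
for k in range(2, 7):
    Tk = sum(Bp(j) for j in range(k))
    Xk = sum(Bp(k - 1 - m).T @ H.T @ H @ sum(Bp(j) for j in range(m)) for m in range(1, k))
    # bounds of Lemma 4.6 (small slack for floating point)
    assert np.linalg.norm(Tk, 2) <= (1 - b**k) / (1 - b) + 1e-12
    assert np.linalg.norm(Xk, 2) <= h2 * (1 - k * b**(k - 1) + (k - 1) * b**k) / (1 - b)**2 + 1e-12
```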
4.2 Real eigenvalues
We first find conditions on the descent step $\tau$ such that the real eigenvalues stay inside the unit disk. Recall that we have already proved that $\lambda = 1$ is not an eigenvalue for any $k$.
Proposition 4.7 (shifted k-step one-shot method). When $k \ge 2$, $\exists \tau > 0$ sufficiently small such that equation (48) admits no solution $\lambda \in \mathbb{R}$, $\lambda \ne 1$, $|\lambda| \ge 1$. More precisely, take
• $\tau < \dfrac{2}{\|M\|^2(\|H\|^2\|T_k\|^2 + 2\|X_k\|)s(B^k)^2}$ if the denominator of the right-hand side is not 0;
• any $\tau > 0$ otherwise.
Moreover, if $\|B\| < 1$, we can take
$$\tau < \frac{(1-\|B\|)^2}{\|H\|^2\|M\|^2} \cdot \frac{2(1-\|B\|^k)^2}{(1-\|B\|^k)^2 + 2(1-k\|B\|^{k-1}+(k-1)\|B\|^k)}.$$
Proof. When $\lambda \in \mathbb{R}$, equation (48) is rewritten as
$$\lambda^2(\lambda-1)\|y\|^2 + \tau\left\|HT_k\Big(I-\frac{B^k}{\lambda}\Big)^{-1}My\right\|^2 + \tau(\lambda-1)\Big\langle M^*\Big[I-\frac{(B^*)^k}{\lambda}\Big]^{-1}X_k\Big(I-\frac{B^k}{\lambda}\Big)^{-1}My, y\Big\rangle = 0.$$
We show that if $\lambda > 1$ (or respectively $\lambda \le -1$) we can choose $\tau$ so that the left-hand side of the above equation is strictly positive (or respectively negative). Indeed, if $\lambda > 1$, we choose $\tau$ such that
$$\lambda^2\|y\|^2 - \tau\left|\Big\langle M^*\Big[I-\frac{(B^*)^k}{\lambda}\Big]^{-1}X_k\Big(I-\frac{B^k}{\lambda}\Big)^{-1}My, y\Big\rangle\right| > 0,$$
and this can be done by taking $\tau$ such that
$$\left[\|X_k\|\|M\|^2 s(B^k)^2\right]\tau < 1.$$
If $\lambda \le -1$, we choose $\tau$ such that
$$\lambda^2(\lambda-1)\|y\|^2 + \tau\left\|HT_k\Big(I-\frac{B^k}{\lambda}\Big)^{-1}My\right\|^2 + \tau(1-\lambda)\left|\Big\langle M^*\Big[I-\frac{(B^*)^k}{\lambda}\Big]^{-1}X_k\Big(I-\frac{B^k}{\lambda}\Big)^{-1}My, y\Big\rangle\right| < 0,$$
and this can be done by taking $\tau$ such that
$$\left[\frac{\|H\|^2\|T_k\|^2\|M\|^2 s(B^k)^2}{2} + \|X_k\|\|M\|^2 s(B^k)^2\right]\tau < 1,$$
so we obtain the first conclusion. Finally, the second conclusion in the case $\|B\| < 1$ can be obtained by Lemma 4.6.
Proposition 4.8 (k-step one-shot method). When $k \ge 2$, $\exists \tau > 0$ sufficiently small such that equation (49) admits no solution $\lambda \in \mathbb{R}$, $\lambda \ne 1$, $|\lambda| \ge 1$. More precisely, take
• $\tau < \dfrac{1}{\|X_k\|\|M\|^2 s(B^k)^2}$ if the denominator of the right-hand side is not 0;
• any $\tau > 0$ otherwise.
Moreover, if $\|B\| < 1$, we can take
$$\tau < \frac{(1-\|B\|)^2}{\|H\|^2\|M\|^2} \cdot \frac{(1-\|B\|^k)^2}{1-k\|B\|^{k-1}+(k-1)\|B\|^k}.$$
Proof. When $\lambda \in \mathbb{R}$, equation (49) is rewritten as
$$\lambda(\lambda-1)\|y\|^2 + \tau\left\|HT_k\Big(I-\frac{B^k}{\lambda}\Big)^{-1}My\right\|^2 + \tau(\lambda-1)\Big\langle M^*\Big[I-\frac{(B^*)^k}{\lambda}\Big]^{-1}X_k\Big(I-\frac{B^k}{\lambda}\Big)^{-1}My, y\Big\rangle = 0.$$
We show that we can choose $\tau$ so that the left-hand side of the above equation is strictly positive (note that $\lambda(\lambda-1) > 0$ both for $\lambda > 1$ and for $\lambda \le -1$). Indeed, if $\lambda > 1$, we choose $\tau$ such that
$$\lambda\|y\|^2 - \tau\left|\Big\langle M^*\Big[I-\frac{(B^*)^k}{\lambda}\Big]^{-1}X_k\Big(I-\frac{B^k}{\lambda}\Big)^{-1}My, y\Big\rangle\right| > 0,$$
and this can be done by taking $\tau$ such that $\|X_k\|\|M\|^2 s(B^k)^2\,\tau < 1$. If $\lambda \le -1$, we choose $\tau$ such that
$$\lambda\|y\|^2 + \tau\left|\Big\langle M^*\Big[I-\frac{(B^*)^k}{\lambda}\Big]^{-1}X_k\Big(I-\frac{B^k}{\lambda}\Big)^{-1}My, y\Big\rangle\right| < 0,$$
and this is also done by taking $\tau$ such that $\|X_k\|\|M\|^2 s(B^k)^2\,\tau < 1$, so we obtain the first conclusion. Finally, the conclusion in the case $\|B\| < 1$ can be obtained by Lemma 4.6.
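The explicit bound of Proposition 4.8 for $\|B\| < 1$ is straightforward to test: with $\tau$ just below it, the iteration matrix in (45) should exhibit no real eigenvalue of modulus $\ge 1$ (complex eigenvalues are the object of the next section). A sketch with hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(4)
nu, nf, ns, k = 5, 4, 2, 3
B = rng.standard_normal((nu, nu)); B *= 0.6 / np.linalg.norm(B, 2)
H = rng.standard_normal((nf, nu)); M = rng.standard_normal((nu, ns))
b = np.linalg.norm(B, 2)
bound = ((1 - b)**2 / (np.linalg.norm(H, 2)**2 * np.linalg.norm(M, 2)**2)
         * (1 - b**k)**2 / (1 - k * b**(k - 1) + (k - 1) * b**k))
tau = 0.9 * bound
Bp = lambda j: np.linalg.matrix_power(B, j)
Tk = sum(Bp(j) for j in range(k)); Bk = Bp(k)
Uk = sum(Bp(i).T @ H.T @ H @ Bp(k - 1 - i) for i in range(k))
Xk = sum(Bp(k - 1 - m).T @ H.T @ H @ sum(Bp(j) for j in range(m)) for m in range(1, k))
A = np.block([[Bk.T - tau * Xk @ M @ M.T, Uk, Xk @ M],
              [-tau * Tk @ M @ M.T, Bk, Tk @ M],
              [-tau * M.T, np.zeros((ns, nu)), np.eye(ns)]])
lam = np.linalg.eigvals(A)
real_lam = lam[np.abs(lam.imag) < 1e-10]
assert np.all(np.abs(real_lam) < 1)   # no real eigenvalue leaves the unit disk
```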
4.3 Complex eigenvalues
We now look for conditions on the descent step $\tau$ such that also the complex eigenvalues stay inside the unit disk. We first deal with the shifted k-step one-shot method.
Proposition 4.9 (shifted k-step one-shot method). When $k \ge 2$, $\exists \tau > 0$ sufficiently small such that equation (48) admits no solution $\lambda \in \mathbb{C}\setminus\mathbb{R}$, $|\lambda| \ge 1$. In particular, if $\|B\| < 1$, given any $\delta_0 > 0$ and $0 < \theta_0 < \frac{\pi}{6}$, take
$$\tau < \frac{\min\{\chi_1(k,\|B\|), \chi_2(k,\|B\|), \chi_3(k,\|B\|), \chi_4(k,\|B\|)\}}{\|H\|^2\|M\|^2}$$
where
$$\chi_1(k,b) = \frac{(1-b)^2(1-b^k)^2}{4b^{2k} + \sqrt{2}(1-kb^{k-1}+(k-1)b^k)(1+b^k)^2},$$
$$\chi_2(k,b) = \frac{(1-b)^2(1-b^k)^2}{\left[\frac{1}{2\sin(\theta_0/2)}(1-b^k)^2 + \sqrt{2}(1-kb^{k-1}+(k-1)b^k)\right](1+b^k)^2},$$
$$\chi_3(k,b) = \frac{(1-b)^2(1-b^k)^2}{\frac{2c\sin(\theta_0/2)}{\delta_0}b^{2k} + (1-kb^{k-1}+(k-1)b^k)\left[\frac{\sqrt{c}}{\delta_0}(1+b^{2k}) + 2\max\left\{\frac{\sqrt{c}}{\delta_0}, \frac{\sqrt{c}}{\cos 3\theta_0}\right\}b^k\right]},$$
$$\chi_4(k,b) = \frac{\left[\sin\left(\frac{\pi}{2}-3\theta_0\right) + \cos 2\theta_0\right](1-b)^2(1-b^k)^2}{(1-b^k)^2 + 2(1-kb^{k-1}+(k-1)b^k)(1+b^k)^2},$$
and $c = \dfrac{1 + 2\delta_0\sin\frac{5\theta_0}{2} + \delta_0^2}{\cos^2\frac{5\theta_0}{2}}$.
Proof. Step 1. Rewrite equation (48) so that we can study its real and imaginary parts.
Let $\lambda = R(\cos\theta + i\sin\theta)$ in polar form where $R = |\lambda| \ge 1$ and $\theta \in (-\pi, \pi)$. Write $1/\lambda = r(\cos\phi + i\sin\phi)$ in polar form where $r = 1/|\lambda| = 1/R \le 1$ and $\phi = -\theta \in (-\pi, \pi)$. By Lemma A.3 applied to $T = B^k$, we have
$$\left(I - \frac{B^k}{\lambda}\right)^{-1} = P(\lambda) + iQ(\lambda), \qquad \left(I - \frac{(B^*)^k}{\lambda}\right)^{-1} = P(\lambda)^* + iQ(\lambda)^*$$
where $P(\lambda)$ and $Q(\lambda)$ are $\mathbb{C}^{n_u \times n_u}$-valued functions, and, omitting the dependence on $\lambda$,
$$\|P\| \le p := \begin{cases} (1+\|B^k\|)s(B^k)^2 & \text{for general } B, \\ \frac{1}{1-\|B\|^k} & \text{when } \|B\| < 1; \end{cases} \tag{50}$$
$$\|Q\| \le q_1 := \begin{cases} \|B^k\|s(B^k)^2 & \text{for general } B, \\ \frac{\|B\|^k}{1-\|B\|^k} & \text{when } \|B\| < 1; \end{cases} \tag{51}$$
$$\|Q\| \le q_2|\sin\theta|, \quad q_2 := \begin{cases} \|B^k\|s(B^k)^2 & \text{for general } B, \\ \frac{\|B\|^k}{(1-\|B\|^k)^2} & \text{when } \|B\| < 1. \end{cases} \tag{52}$$
Now we rewrite (48) as
$$\lambda^2(\lambda-1)\|y\|^2 + \tau G(P^* + iQ^*, P + iQ) + \tau(\lambda-1)L(P^* + iQ^*, P + iQ) = 0, \tag{53}$$
where
$$G(X,Y) = \langle M^*XT_k^*H^*HT_kYMy, y\rangle, \qquad L(X,Y) = \langle M^*XX_kYMy, y\rangle$$
for $X, Y \in \mathbb{C}^{n_u \times n_u}$. $G$ satisfies the following properties:
• $\forall X, Y_1, Y_2 \in \mathbb{C}^{n_u \times n_u}$, $\forall z_1, z_2 \in \mathbb{C}$: $G(X, z_1Y_1 + z_2Y_2) = z_1G(X, Y_1) + z_2G(X, Y_2)$.
• $\forall X_1, X_2, Y \in \mathbb{C}^{n_u \times n_u}$, $\forall z_1, z_2 \in \mathbb{C}$: $G(z_1X_1 + z_2X_2, Y) = z_1G(X_1, Y) + z_2G(X_2, Y)$.
• $\forall X \in \mathbb{C}^{n_u \times n_u}$: $G(X^*, X) \in \mathbb{R}$.
• $\forall X, Y \in \mathbb{C}^{n_u \times n_u}$: $G(X, Y) + G(Y^*, X^*) \in \mathbb{R}$; indeed
$$G(X,Y) = \langle M^*XT_k^*H^*HT_kYMy, y\rangle = \langle y, M^*Y^*T_k^*H^*HT_kX^*My\rangle = \langle M^*Y^*T_k^*H^*HT_kX^*My, y\rangle^* = G(Y^*, X^*)^*.$$
Similarly, $L$ has the same properties as $G$ (note that $X_k^* = X_k$ by Lemma 4.1). With these properties of $G$ and $L$, we expand (53) and take its real and imaginary parts, so we respectively obtain:
$$\Re(\lambda^3-\lambda^2)\|y\|^2 + \tau G_1 + \tau[\Re(\lambda-1)L_1 - \Im(\lambda-1)L_2] = 0 \tag{54}$$
and
$$\Im(\lambda^3-\lambda^2)\|y\|^2 + \tau G_2 + \tau[\Im(\lambda-1)L_1 + \Re(\lambda-1)L_2] = 0 \tag{55}$$
where
$$G_1 = G(P^*, P) - G(Q^*, Q), \quad G_2 = G(P^*, Q) + G(Q^*, P), \quad L_1 = L(P^*, P) - L(Q^*, Q), \quad L_2 = L(P^*, Q) + L(Q^*, P).$$
Step 2. Find a suitable combination of equations (54) and (55), and choose $\tau$ so that we obtain a new equation with a left-hand side which is strictly positive/negative.
Let $\gamma = \gamma(\lambda) \in \mathbb{R}$ be defined by cases as in Lemma A.4. Multiplying equation (55) by $\gamma$ and summing it with equation (54), we obtain:
$$[\Re(\lambda^3-\lambda^2) + \gamma\Im(\lambda^3-\lambda^2)]\|y\|^2 + \tau G(P^* + \gamma Q^*, P + \gamma Q) - (1+\gamma^2)\tau G(Q^*, Q) + \tau([\Re(\lambda-1) + \gamma\Im(\lambda-1)]L_1 + [\gamma\Re(\lambda-1) - \Im(\lambda-1)]L_2) = 0. \tag{56}$$
Now we prepare some useful estimates.
• $\forall X \in \mathbb{C}^{n_u \times n_u}$: $0 \le G(X^*, X) = \|HT_kXMy\|^2 \le (\|H\|\|T_k\|\|M\|\|X\|)^2\|y\|^2$. Since $\|Q\| \le q_1$ and $\|Q\| \le q_2|\sin\theta|$, we have
$$G(Q^*, Q) \le (\|H\|\|T_k\|\|M\|q_1)^2\|y\|^2 \quad \text{and} \quad G(Q^*, Q) \le (\|H\|\|T_k\|\|M\|q_2|\sin\theta|)^2\|y\|^2.$$
• By the Cauchy–Schwarz inequality we have
$$|\Re(\lambda-1) + \gamma\Im(\lambda-1)| \le \sqrt{1+\gamma^2}\,|\lambda-1|; \qquad |\gamma\Re(\lambda-1) - \Im(\lambda-1)| \le \sqrt{1+\gamma^2}\,|\lambda-1|.$$
• $\forall X, Y \in \mathbb{C}^{n_u \times n_u}$: $|L(X,Y)| = |\langle M^*XX_kYMy, y\rangle| \le \|X_k\|\|M\|^2\|X\|\|Y\|\|y\|^2$. Hence
$$|L_1| \le |L(P^*, P)| + |L(Q^*, Q)| \le \|X_k\|\|M\|^2(p^2 + q_1^2)\|y\|^2, \qquad |L_2| \le |L(P^*, Q)| + |L(Q^*, P)| \le 2\|X_k\|\|M\|^2 pq_1\|y\|^2,$$
and then
$$|[\Re(\lambda-1) + \gamma\Im(\lambda-1)]L_1 + [\gamma\Re(\lambda-1) - \Im(\lambda-1)]L_2| \le \sqrt{1+\gamma^2}\,|\lambda-1|\,\|X_k\|\|M\|^2(p^2 + q_1^2 + 2pq_1)\|y\|^2 = \sqrt{1+\gamma^2}\,|\lambda-1|\,\|X_k\|\|M\|^2(p + q_1)^2\|y\|^2.$$
Now we consider four cases of $\lambda$ as in Lemma A.4:
• Case 1. $\Re(\lambda^3-\lambda^2) \ge 0$;
• Case 2. $\Re(\lambda^3-\lambda^2) < 0$ and $\theta \in [\theta_0, \pi-\theta_0] \cup [-\pi+\theta_0, -\theta_0]$ for fixed $0 < \theta_0 < \frac{\pi}{6}$;
• Case 3. $\Re(\lambda^3-\lambda^2) < 0$ and $\theta \in (-\theta_0, \theta_0)$ for fixed $0 < \theta_0 < \frac{\pi}{6}$;
• Case 4. $\Re(\lambda^3-\lambda^2) < 0$ and $\theta \in (\pi-\theta_0, \pi) \cup (-\pi, -\pi+\theta_0)$ for fixed $0 < \theta_0 < \frac{\pi}{6}$.
The four cases will be treated in the following four lemmas (Lemmas 4.10–4.13), which together give the statement of this proposition.
Lemma 4.10 (Case 1). For $k \ge 2$, equation (48) admits no solutions $\lambda$ in Case 1 if we take
• $\tau < \dfrac{s(B^k)^{-4}}{4\|H\|^2\|M\|^2\|T_k\|^2\|B^k\|^2 + \sqrt{2}\|M\|^2\|X_k\|(1+2\|B^k\|)^2}$ if the denominator of the right-hand side is not 0;
• any $\tau > 0$ otherwise.
Moreover, if $\|B\| < 1$, we can take
$$\tau < \frac{(1-\|B\|)^2}{\|H\|^2\|M\|^2} \cdot \frac{(1-\|B\|^k)^2}{4\|B\|^{2k} + \sqrt{2}(1-k\|B\|^{k-1}+(k-1)\|B\|^k)(1+\|B\|^k)^2}.$$
Proof. Writing (56) for $\gamma = \gamma_1$ as in Lemma A.4 (i) (in particular $\gamma_1^2 = 1$), we have
$$[\Re(\lambda^3-\lambda^2) + \gamma_1\Im(\lambda^3-\lambda^2)]\|y\|^2 + \tau G(P^* + \gamma_1Q^*, P + \gamma_1Q) - 2\tau G(Q^*, Q) + \tau([\Re(\lambda-1) + \gamma_1\Im(\lambda-1)]L_1 + [\gamma_1\Re(\lambda-1) - \Im(\lambda-1)]L_2) = 0. \tag{57}$$
Since $G(P^* + \gamma_1Q^*, P + \gamma_1Q) \ge 0$, and by estimating
$$G(Q^*, Q) \le (\|H\|\|T_k\|\|M\|q_2|\sin\theta|)^2\|y\|^2,$$
$$[\Re(\lambda-1) + \gamma_1\Im(\lambda-1)]L_1 + [\gamma_1\Re(\lambda-1) - \Im(\lambda-1)]L_2 \ge -\sqrt{2}\,|\lambda-1|\,\|X_k\|\|M\|^2(p+q_1)^2\|y\|^2,$$
by Lemma A.4 (i) the left-hand side of (57) will be strictly positive if $\tau$ satisfies:
$$\left[2(\|H\|\|T_k\|\|M\|q_2)^2\frac{|\sin\theta|^2}{|\lambda-1|} + \sqrt{2}\,\|X_k\|\|M\|^2(p+q_1)^2\right]\tau < 1.$$
Since $\frac{|\sin\theta|^2}{|\lambda-1|} \le \frac{|\sin\theta|^2}{2|\sin(\theta/2)|} = 2\left|\sin\frac{\theta}{2}\right|\cos^2\frac{\theta}{2} \le 2$, we have the first part of the conclusion using definitions (50), (51), (52) of $p, q_1, q_2$. Finally, the conclusion in the case $\|B\| < 1$ can be obtained by Lemma 4.6.
Lemma 4.11 (Case 2). For $k \ge 2$, equation (48) admits no solutions $\lambda$ in Case 2 if we take
• $\tau < \dfrac{s(B^k)^{-4}}{\left[\frac{1}{2\sin(\theta_0/2)}\|H\|^2\|M\|^2\|T_k\|^2 + \sqrt{2}\|M\|^2\|X_k\|\right](1+2\|B^k\|)^2}$ if the denominator of the right-hand side is not 0;
• any $\tau > 0$ otherwise.
Moreover, if $\|B\| < 1$, we can take
$$\tau < \frac{(1-\|B\|)^2}{\|H\|^2\|M\|^2} \cdot \frac{(1-\|B\|^k)^2}{\left[\frac{1}{2\sin(\theta_0/2)}(1-\|B\|^k)^2 + \sqrt{2}(1-k\|B\|^{k-1}+(k-1)\|B\|^k)\right](1+\|B\|^k)^2}.$$
Proof. Writing (56) for $\gamma = \gamma_2$ as in Lemma A.4 (ii) (in particular $\gamma_2^2 = 1$), we have
$$[\Re(\lambda^3-\lambda^2) + \gamma_2\Im(\lambda^3-\lambda^2)]\|y\|^2 + \tau G(P^* + \gamma_2Q^*, P + \gamma_2Q) - 2\tau G(Q^*, Q) + \tau([\Re(\lambda-1) + \gamma_2\Im(\lambda-1)]L_1 + [\gamma_2\Re(\lambda-1) - \Im(\lambda-1)]L_2) = 0. \tag{58}$$
Since $G(Q^*, Q) \ge 0$, and by estimating $\|P + \gamma_2Q\| \le \|P\| + |\gamma_2|\|Q\| = \|P\| + \|Q\| \le p + q_1$, so that
$$G(P^* + \gamma_2Q^*, P + \gamma_2Q) \le [\|H\|\|T_k\|\|M\|(p+q_1)]^2\|y\|^2,$$
and
$$[\Re(\lambda-1) + \gamma_2\Im(\lambda-1)]L_1 + [\gamma_2\Re(\lambda-1) - \Im(\lambda-1)]L_2 \le \sqrt{2}\,|\lambda-1|\,\|X_k\|\|M\|^2(p+q_1)^2\|y\|^2,$$
by Lemma A.4 (ii), the left-hand side of (58) will be strictly negative if $\tau$ satisfies:
$$\left[[\|H\|\|T_k\|\|M\|(p+q_1)]^2\frac{1}{|\lambda-1|} + \sqrt{2}\,\|X_k\|\|M\|^2(p+q_1)^2\right]\tau < 1.$$
Since $\frac{1}{|\lambda-1|} \le \frac{1}{2\sin(\theta_0/2)}$, we have the first part of the conclusion using definitions (50), (51) of $p, q_1$. Finally, the conclusion in the case $\|B\| < 1$ can be obtained by Lemma 4.6.
Lemma 4.12 (Case 3). Let $\delta_0 > 0$ be fixed and $c := \dfrac{1 + 2\delta_0\sin\frac{5\theta_0}{2} + \delta_0^2}{\cos^2\frac{5\theta_0}{2}}$. For $k \ge 2$, equation (48) admits no solutions $\lambda$ in Case 3 if we take
• $\tau < s(B^k)^{-4}\left[\dfrac{2c\sin\frac{\theta_0}{2}}{\delta_0}\|H\|^2\|M\|^2\|T_k\|^2\|B^k\|^2 + \dfrac{\sqrt{c}}{\delta_0}\|M\|^2\|X_k\|(1 + 2\|B^k\| + 2\|B^k\|^2) + 2\max\left\{\dfrac{\sqrt{c}}{\delta_0}, \dfrac{\sqrt{c}}{\cos 3\theta_0}\right\}\|M\|^2\|X_k\|(\|B^k\| + \|B^k\|^2)\right]^{-1}$ if the denominator of the right-hand side is not 0;
• any $\tau > 0$ otherwise.
Moreover, if $\|B\| < 1$, we can take
$$\tau < \frac{(1-\|B\|)^2}{\|H\|^2\|M\|^2}(1-\|B\|^k)^2\left[\frac{2c\sin\frac{\theta_0}{2}}{\delta_0}\|B\|^{2k} + \frac{\sqrt{c}}{\delta_0}(1-k\|B\|^{k-1}+(k-1)\|B\|^k)(1+\|B\|^{2k}) + 2\max\left\{\frac{\sqrt{c}}{\delta_0}, \frac{\sqrt{c}}{\cos 3\theta_0}\right\}(1-k\|B\|^{k-1}+(k-1)\|B\|^k)\|B\|^k\right]^{-1}.$$
Proof. Writing (56) for $\gamma = \gamma_3$ as in Lemma A.4 (iii), we have
$$[\Re(\lambda^3-\lambda^2) + \gamma_3\Im(\lambda^3-\lambda^2)]\|y\|^2 + \tau G(P^* + \gamma_3Q^*, P + \gamma_3Q) - (1+\gamma_3^2)\tau G(Q^*, Q) + \tau([\Re(\lambda-1) + \gamma_3\Im(\lambda-1)]L_1 + [\gamma_3\Re(\lambda-1) - \Im(\lambda-1)]L_2) = 0. \tag{59}$$
Since $G(P^* + \gamma_3Q^*, P + \gamma_3Q) \ge 0$, the left-hand side of (59) will be strictly positive if $\tau$ satisfies:
$$\tau < \left(\frac{1}{\|y\|^2}\left[\frac{(1+\gamma_3^2)G(Q^*, Q)}{\Re(\lambda^3-\lambda^2) + \gamma_3\Im(\lambda^3-\lambda^2)} + \frac{|L_1|\,|\Re(\lambda-1) + \gamma_3\Im(\lambda-1)|}{\Re(\lambda^3-\lambda^2) + \gamma_3\Im(\lambda^3-\lambda^2)} + \frac{|L_2|\,|\gamma_3\Re(\lambda-1) - \Im(\lambda-1)|}{\Re(\lambda^3-\lambda^2) + \gamma_3\Im(\lambda^3-\lambda^2)}\right]\right)^{-1}.$$
By estimating
• $G(Q^*, Q) \le (\|H\|\|T_k\|\|M\|q_2|\sin\theta|)^2\|y\|^2$;
• $|L_1| \le \|X_k\|\|M\|^2(p^2+q_1^2)\|y\|^2$;
• $|L_2| \le 2\|X_k\|\|M\|^2 pq_1\|y\|^2$;
and using Lemma A.4 (iii), it suffices to choose $\tau$ such that
$$\left[(1+\gamma_3^2)(\|H\|\|T_k\|\|M\|q_2)^2\frac{2|\sin\frac{\theta}{2}|\cos^2\frac{\theta}{2}}{\delta_0} + \|X_k\|\|M\|^2(p^2+q_1^2)\frac{\sqrt{1+\gamma_3^2}}{\delta_0} + 2\|X_k\|\|M\|^2 pq_1\max\left\{\frac{\sqrt{1+\gamma_3^2}}{\delta_0}, \frac{\sqrt{1+\gamma_3^2}}{\cos 3\theta_0}\right\}\right]\tau < 1.$$
Noting that $c = 1+\gamma_3^2$, the final result is obtained by definitions (50), (51), (52) of $p, q_1, q_2$. Finally, the conclusion in the case $\|B\| < 1$ can be obtained by Lemma 4.6.
Lemma 4.13 (Case 4). For $k \ge 2$, equation (48) admits no solutions $\lambda$ in Case 4 if we take
• $\tau < \dfrac{\left[\sin\left(\frac{\pi}{2}-3\theta_0\right) + \cos 2\theta_0\right]s(B^k)^{-4}}{\|H\|^2\|M\|^2\|T_k\|^2(1+\|B^k\|)^2 + 2\|M\|^2\|X_k\|(1+2\|B^k\|)^2}$ if the denominator of the right-hand side is not 0;
• any $\tau > 0$ otherwise.
Moreover, if $\|B\| < 1$, we can take
$$\tau < \frac{(1-\|B\|)^2}{\|H\|^2\|M\|^2} \cdot \frac{\left[\sin\left(\frac{\pi}{2}-3\theta_0\right) + \cos 2\theta_0\right](1-\|B\|^k)^2}{(1-\|B\|^k)^2 + 2(1-k\|B\|^{k-1}+(k-1)\|B\|^k)(1+\|B\|^k)^2}.$$
Proof. Here it is enough to consider (54). By the properties of $G$,
$$G(Q^*, Q) \ge 0, \qquad G(P^*, P) \le (\|H\|\|T_k\|\|M\|p)^2\|y\|^2,$$
and Lemma A.4 (iv), we see that the left-hand side of (54) will be strictly negative if $\tau$ satisfies:
$$\left[(\|H\|\|T_k\|\|M\|p)^2\frac{1}{\sin\left(\frac{\pi}{2}-3\theta_0\right) + \cos 2\theta_0} + \|X_k\|\|M\|^2(p+q_1)^2\frac{2}{\sin\left(\frac{\pi}{2}-3\theta_0\right) + \cos 2\theta_0}\right]\tau < 1.$$
Definitions (50), (51) of $p, q_1$ lead to the final result. Finally, the conclusion in the case $\|B\| < 1$ can be obtained by Lemma 4.6.
Similarly, with the help of Lemma A.5, we prove for the k-step one-shot method the analogue of Proposition 4.9. In particular, note that here just three cases of $\lambda$ need to be considered, because the analogue of the fourth one is excluded by Lemma A.5 (iv).
Proposition 4.14 (k-step one-shot method). $\exists \tau > 0$ sufficiently small such that equation (49) admits no solution $\lambda \in \mathbb{C}\setminus\mathbb{R}$, $|\lambda| \ge 1$. In particular, if $\|B\| < 1$, given any $\delta_0 > 0$ and $0 < \theta_0 < \frac{\pi}{4}$, take
$$\tau < \frac{\min\{\psi_1(k,\|B\|), \psi_2(k,\|B\|), \psi_3(k,\|B\|)\}}{\|H\|^2\|M\|^2}$$
where
$$\psi_1(k,b) = \frac{(1-b)^2(1-b^k)^2}{4b^{2k} + \sqrt{2}(1-kb^{k-1}+(k-1)b^k)(1+b^k)^2},$$
$$\psi_2(k,b) = \frac{(1-b)^2(1-b^k)^2}{\left[\frac{1}{2\sin(\theta_0/2)}(1-b^k)^2 + \sqrt{2}(1-kb^{k-1}+(k-1)b^k)\right](1+b^k)^2},$$
$$\psi_3(k,b) = \frac{(1-b)^2(1-b^k)^2}{\frac{2c\sin(\theta_0/2)}{\delta_0}b^{2k} + (1-kb^{k-1}+(k-1)b^k)\left[\frac{\sqrt{c}}{\delta_0}(1+b^{2k}) + 2\max\left\{\frac{\sqrt{c}}{\delta_0}, \frac{\sqrt{c}}{\cos 2\theta_0}\right\}b^k\right]},$$
and $c = \dfrac{1 + 2\delta_0\sin\frac{3\theta_0}{2} + \delta_0^2}{\cos^2\frac{3\theta_0}{2}}$.
4.4 Final result (k≥2)
Considering Remark 4.5, and taking the minimum between the bound in Proposition 4.7 for real eigenvalues and the bound in Proposition 4.9 for complex eigenvalues, we finally obtain a sufficient condition on the descent step $\tau$ to ensure convergence of the shifted multi-step one-shot method.
Theorem 4.15 (Convergence of shifted k-step one-shot, $k \ge 2$). Under assumption (4), the shifted k-step one-shot method, $k \ge 2$, converges for sufficiently small $\tau$. In particular, for $\|B\| < 1$, it is enough to take
$$\tau < \frac{\chi(k, \|B\|)}{\|H\|^2\|M\|^2},$$
where $\chi(k, \|B\|)$ is an explicit function of $k$ and $\|B\|$.
Similarly, by combining Remark 4.5 and Propositions 4.8 and 4.14, we obtain a sufficient condition on the descent step $\tau$ to ensure convergence of the multi-step one-shot method.
Theorem 4.16 (Convergence of k-step one-shot, $k \ge 2$). Under assumption (4), the k-step one-shot method, $k \ge 2$, converges for sufficiently small $\tau$. In particular, for $\|B\| < 1$, it is enough to take
$$\tau < \frac{\psi(k, \|B\|)}{\|H\|^2\|M\|^2},$$
where $\psi(k, \|B\|)$ is an explicit function of $k$ and $\|B\|$.
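Theorems 4.15–4.16 can be illustrated by monitoring the spectral radius of the block iteration matrix in (45) as a function of $\tau$: for $\tau$ small enough it falls strictly below 1, so the coupled error recursion contracts. A sketch with hypothetical random data (the helper `one_shot_matrix` is ours, not from the report):

```python
import numpy as np

def one_shot_matrix(B, H, M, k, tau):
    """Block iteration matrix (45) of the k-step one-shot error recursion."""
    nu, ns = B.shape[0], M.shape[1]
    Bp = lambda j: np.linalg.matrix_power(B, j)
    Tk = sum(Bp(j) for j in range(k)); Bk = Bp(k)
    Uk = sum(Bp(i).T @ H.T @ H @ Bp(k - 1 - i) for i in range(k))
    Xk = sum(Bp(k - 1 - m).T @ H.T @ H @ sum(Bp(j) for j in range(m)) for m in range(1, k))
    return np.block([[Bk.T - tau * Xk @ M @ M.T, Uk, Xk @ M],
                     [-tau * Tk @ M @ M.T, Bk, Tk @ M],
                     [-tau * M.T, np.zeros((ns, nu)), np.eye(ns)]])

rng = np.random.default_rng(5)
nu, nf, ns, k = 5, 4, 2, 3
B = rng.standard_normal((nu, nu)); B *= 0.5 / np.linalg.norm(B, 2)
H = rng.standard_normal((nf, nu)); M = rng.standard_normal((nu, ns))
rho = lambda tau: np.max(np.abs(np.linalg.eigvals(one_shot_matrix(B, H, M, k, tau))))
assert rho(1e-4) < 1          # convergence regime for small tau
```

Scanning $\tau$ upward on such an instance typically locates an empirical stability threshold far larger than the (pessimistic) theoretical bound.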
5 Inverse problem with complex forward problem and real parameter
In this section we show that a linear inverse problem with associated complex forward problem and real parameter can be transformed into a linear inverse problem which matches the real model at the beginning of Section 2, so that the previous theory applies. More precisely, here we study the state equation
$$u = Bu + M\sigma + F$$
where $u \in \mathbb{C}^{n_u}$, $\sigma \in \mathbb{R}^{n_\sigma}$, $B \in \mathbb{C}^{n_u \times n_u}$, $M \in \mathbb{C}^{n_u \times n_\sigma}$. We measure $Hu(\sigma) = f$ where $H \in \mathbb{C}^{n_f \times n_u}$, and we want to recover $\sigma$ from $f$. Using the method of least squares, we consider the cost functional
$$J(\sigma) := \frac{1}{2}\|Hu(\sigma) - f\|^2,$$
then by the Lagrangian technique with
$$\mathcal{L}(u, v, \sigma) = \frac{1}{2}\|Hu - f\|^2 + \Re\langle Bu + M\sigma + F - u, v\rangle,$$
we can define the adjoint state $p = p(\sigma)$ such that
$$p = B^*p + H^*(Hu(\sigma) - f),$$
which allows us to compute
$$\nabla J(\sigma) = \Re(M^*p).$$
By separating the real and imaginary parts of all vectors and matrices, $u = u_1 + iu_2$, $p = p_1 + ip_2$, $B = B_1 + iB_2$, $M = M_1 + iM_2$, $F = F_1 + iF_2$, $H = H_1 + iH_2$, $f = f_1 + if_2$, we can transform this inverse problem with complex forward problem into the inverse problem with real forward problem introduced at the beginning of Section 2. Indeed, note that $B^* = B_1^* - iB_2^*$, $M^* = M_1^* - iM_2^*$, $H^* = H_1^* - iH_2^*$, so we have
$$u_1 + iu_2 = (B_1 + iB_2)(u_1 + iu_2) + (M_1 + iM_2)\sigma + (F_1 + iF_2),$$
$$p_1 + ip_2 = (B_1^* - iB_2^*)(p_1 + ip_2) + (H_1^* - iH_2^*)[(H_1 + iH_2)(u_1 + iu_2) - (f_1 + if_2)],$$
$$\nabla J(\sigma) = \Re[(M_1^* - iM_2^*)(p_1 + ip_2)],$$
which implies
$$u_1 = B_1u_1 - B_2u_2 + M_1\sigma + F_1,$$
$$u_2 = B_2u_1 + B_1u_2 + M_2\sigma + F_2,$$
$$p_1 = B_1^*p_1 + B_2^*p_2 + (H_1^*H_1 + H_2^*H_2)u_1 - (H_2^*H_1 - H_1^*H_2)u_2 - (H_1^*f_1 + H_2^*f_2),$$
$$p_2 = -B_2^*p_1 + B_1^*p_2 + (H_2^*H_1 - H_1^*H_2)u_1 + (H_1^*H_1 + H_2^*H_2)u_2 - (-H_2^*f_1 + H_1^*f_2),$$
$$\nabla J(\sigma) = M_1^*p_1 + M_2^*p_2.$$
By setting
$$\tilde{u} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}, \quad \tilde{p} = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix}, \quad \tilde{B} = \begin{pmatrix} B_1 & -B_2 \\ B_2 & B_1 \end{pmatrix}, \quad \tilde{M} = \begin{pmatrix} M_1 \\ M_2 \end{pmatrix}, \quad \tilde{F} = \begin{pmatrix} F_1 \\ F_2 \end{pmatrix}, \quad \tilde{H} = \begin{pmatrix} H_1 & -H_2 \\ H_2 & H_1 \end{pmatrix}, \quad \tilde{f} = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix},$$
we have
$$\tilde{u} = \tilde{B}\tilde{u} + \tilde{M}\sigma + \tilde{F}, \qquad \tilde{p} = \tilde{B}^*\tilde{p} + \tilde{H}^*(\tilde{H}\tilde{u} - \tilde{f}), \qquad \nabla J(\sigma) = \tilde{M}^*\tilde{p},$$
which has the same structure as the inverse problem at the beginning of Section 2.
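The equivalence between the complex system and its doubled real form can be verified directly (hypothetical random data; the helper `blk` realizes the real block embedding of a complex matrix):

```python
import numpy as np

rng = np.random.default_rng(6)
nu, ns, nf = 4, 2, 3
cplx = lambda m, n: rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
B = cplx(nu, nu); B *= 0.5 / np.linalg.norm(B, 2)
M, H = cplx(nu, ns), cplx(nf, nu)
F = cplx(nu, 1)[:, 0]; f = cplx(nf, 1)[:, 0]
sigma = rng.standard_normal(ns)                    # the parameter stays real

blk = lambda A: np.block([[A.real, -A.imag], [A.imag, A.real]])
Bt, Ht = blk(B), blk(H)                            # B~, H~
Mt = np.vstack([M.real, M.imag])                   # M~
Ft = np.concatenate([F.real, F.imag]); ft = np.concatenate([f.real, f.imag])
split = lambda z: np.concatenate([z.real, z.imag])

u = np.linalg.solve(np.eye(nu) - B, M @ sigma + F)                 # complex state
p = np.linalg.solve(np.eye(nu) - B.conj().T, H.conj().T @ (H @ u - f))
ut = np.linalg.solve(np.eye(2 * nu) - Bt, Mt @ sigma + Ft)         # doubled real state
pt = np.linalg.solve(np.eye(2 * nu) - Bt.T, Ht.T @ (Ht @ ut - ft))

assert np.allclose(ut, split(u))
assert np.allclose(pt, split(p))
assert np.allclose((M.conj().T @ p).real, Mt.T @ pt)               # same gradient
```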
Finally, we close this section with two lemmas that match the assumptions of the inverse problem with complex state variable with the assumptions of the transformed inverse problem with real state variable.
Lemma 5.1. $\mathrm{Spec}(\tilde{B}) = \mathrm{Spec}(B) \cup \mathrm{Spec}(\overline{B})$.
Proof. By writing
$$\tilde{B} = \begin{pmatrix} B_1 & -B_2 \\ B_2 & B_1 \end{pmatrix} = \underbrace{\begin{pmatrix} I & I \\ iI & -iI \end{pmatrix}}_{C^{-1}} \begin{pmatrix} \overline{B} & 0 \\ 0 & B \end{pmatrix} \underbrace{\begin{pmatrix} \frac{1}{2}I & -\frac{i}{2}I \\ \frac{1}{2}I & \frac{i}{2}I \end{pmatrix}}_{C}, \tag{60}$$
we find that $\det(\tilde{B} - \lambda I) = \det(B - \lambda I)\det(\overline{B} - \lambda I)$. The conclusion is then deduced thanks to the fact that $\mathrm{Spec}(\overline{B}) = \overline{\mathrm{Spec}(B)}$.
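Lemma 5.1 can be checked on a random complex matrix (a minimal sketch, comparing the two spectra as sets):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Bt = np.block([[B.real, -B.imag], [B.imag, B.real]])      # the real matrix B~
ev = np.linalg.eigvals(B)
expected = np.concatenate([ev, ev.conj()])                # Spec(B) union Spec(conj(B))
got = np.linalg.eigvals(Bt)
assert len(got) == len(expected)
for z in expected:
    assert np.min(np.abs(got - z)) < 1e-8                 # every expected eigenvalue appears
```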
Lemma 5.2. Assume that $\rho(B) < 1$ and $H(I-B)^{-1}M$ is injective. Then $\rho(\tilde{B}) < 1$ and $\tilde{H}(\tilde{I}-\tilde{B})^{-1}\tilde{M}$ is injective, where $\tilde{I} \in \mathbb{R}^{2n_u \times 2n_u}$ is the identity matrix.
Proof. The previous lemma says that $\rho(\tilde{B}) = \rho(B) < 1$. Therefore $(\tilde{I}-\tilde{B})^{-1}$ is well-defined and, thanks to (60),
$$(\tilde{I}-\tilde{B})^{-1} = \underbrace{\begin{pmatrix} I & I \\ iI & -iI \end{pmatrix}}_{C^{-1}} \begin{pmatrix} (I-\overline{B})^{-1} & 0 \\ 0 & (I-B)^{-1} \end{pmatrix} \underbrace{\begin{pmatrix} \frac{1}{2}I & -\frac{i}{2}I \\ \frac{1}{2}I & \frac{i}{2}I \end{pmatrix}}_{C} = \frac{1}{2}\begin{pmatrix} (I-\overline{B})^{-1} + (I-B)^{-1} & -i(I-\overline{B})^{-1} + i(I-B)^{-1} \\ i(I-\overline{B})^{-1} - i(I-B)^{-1} & (I-\overline{B})^{-1} + (I-B)^{-1} \end{pmatrix}.$$
Now we have
$$\tilde{H}(\tilde{I}-\tilde{B})^{-1}\tilde{M} = \frac{1}{2}\begin{pmatrix} H_1 & -H_2 \\ H_2 & H_1 \end{pmatrix}\begin{pmatrix} (I-\overline{B})^{-1} + (I-B)^{-1} & -i(I-\overline{B})^{-1} + i(I-B)^{-1} \\ i(I-\overline{B})^{-1} - i(I-B)^{-1} & (I-\overline{B})^{-1} + (I-B)^{-1} \end{pmatrix}\begin{pmatrix} M_1 \\ M_2 \end{pmatrix}$$
$$= \frac{1}{2}\begin{pmatrix} \overline{H}(I-\overline{B})^{-1} + H(I-B)^{-1} & -i\overline{H}(I-\overline{B})^{-1} + iH(I-B)^{-1} \\ i\overline{H}(I-\overline{B})^{-1} - iH(I-B)^{-1} & \overline{H}(I-\overline{B})^{-1} + H(I-B)^{-1} \end{pmatrix}\begin{pmatrix} M_1 \\ M_2 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} \overline{H}(I-\overline{B})^{-1}\overline{M} + H(I-B)^{-1}M \\ i\overline{H}(I-\overline{B})^{-1}\overline{M} - iH(I-B)^{-1}M \end{pmatrix}.$$
Now assume that there exists $x \in \mathbb{C}^{n_\sigma}$ such that $\tilde{H}(\tilde{I}-\tilde{B})^{-1}\tilde{M}x = 0$; then
$$\begin{cases} [\overline{H}(I-\overline{B})^{-1}\overline{M} + H(I-B)^{-1}M]x = 0 \\ [i\overline{H}(I-\overline{B})^{-1}\overline{M} - iH(I-B)^{-1}M]x = 0 \end{cases}$$
or, equivalently,
$$\begin{cases} [\overline{H}(I-\overline{B})^{-1}\overline{M} + H(I-B)^{-1}M]x = 0 \\ [-\overline{H}(I-\overline{B})^{-1}\overline{M} + H(I-B)^{-1}M]x = 0. \end{cases}$$
By summing up these two equations we deduce that $H(I-B)^{-1}Mx = 0$, then $x = 0$ thanks to the injectivity of $H(I-B)^{-1}M$.
6 Numerical experiments
Let us introduce a toy model to illustrate numerically the performance of the different methods. Given $\Omega \subset \mathbb{R}^n$ an open bounded Lipschitz domain, we consider the direct problem for the linearized scattered field $u \in H^2(\Omega)$ given by the Helmholtz equation
$$\begin{cases} \operatorname{div}(\tilde{\sigma}_0\nabla u) + \tilde{k}^2u = \operatorname{div}(\sigma\nabla u_0) & \text{in } \Omega, \\ u = 0 & \text{on } \partial\Omega, \end{cases} \tag{61}$$
where the incident field $u_0 : \Omega \to \mathbb{R}$ satisfies
$$\begin{cases} \operatorname{div}(\tilde{\sigma}_0\nabla u_0) + \tilde{k}^2u_0 = 0 & \text{in } \Omega, \\ u_0 = f & \text{on } \partial\Omega, \end{cases} \tag{62}$$
with the datum $f : \partial\Omega \to \mathbb{R}$. Here $\sigma : \Omega \to \mathbb{R}$ is such that $\sigma|_{\partial\Omega} = 0$, and $\tilde{\sigma}_0 = \sigma_0 + \delta\sigma_r$ is a given function with $\delta \ge 0$ and random $\sigma_r$. More precisely, given $\tilde{\sigma}_0$ and $f$, we solve for $u_0 = u_0(f)$ in (62), then insert $u_0$ into (61) to solve for $u = u(\sigma)$. The variational formulations for $u$ and $u_0$ are respectively
$$\int_\Omega \tilde{\sigma}_0\nabla u\cdot\nabla v - \int_\Omega \tilde{k}^2uv = \int_\Omega \sigma\nabla u_0\cdot\nabla v, \quad \forall v \in H_0^1(\Omega), \ \text{and } u = 0 \text{ on } \partial\Omega, \tag{63}$$
$$\int_\Omega \tilde{\sigma}_0\nabla u_0\cdot\nabla v - \int_\Omega \tilde{k}^2u_0v = 0, \quad \forall v \in H_0^1(\Omega), \ \text{and } u_0 = f \text{ on } \partial\Omega. \tag{64}$$
We are interested in the inverse problem of finding $\sigma$ from the measurement $Hu(\sigma)$, where $Hu := \tilde{\sigma}_0\frac{\partial u}{\partial\nu}\big|_{\partial\Omega}$. To solve this inverse problem we use the method of least squares. Denoting by $\sigma_{ex}$ the exact $\sigma$ and by $g = \tilde{\sigma}_0\frac{\partial u(\sigma_{ex})}{\partial\nu}\big|_{\partial\Omega}$ the corresponding measurement, we consider the cost functional
$$J(\sigma) = \frac{1}{2}\|Hu(\sigma) - g\|^2_{L^2(\partial\Omega)} = \frac{1}{2}\int_{\partial\Omega}\Big(\tilde{\sigma}_0\frac{\partial u(\sigma)}{\partial\nu} - g\Big)^2.$$
The Lagrangian technique allows us to compute the gradient $\nabla_\sigma J(\sigma) = -\nabla u_0\cdot\nabla p(\sigma)$, where the adjoint state $p = p(\sigma)$ satisfies
$$\int_\Omega \tilde{\sigma}_0\nabla p\cdot\nabla v - \int_\Omega \tilde{k}^2pv = 0, \quad \forall v \in H_0^1(\Omega), \ \text{and } p = \tilde{\sigma}_0\frac{\partial u(\sigma)}{\partial\nu}\Big|_{\partial\Omega} - g \text{ on } \partial\Omega. \tag{65}$$
By discretizing $u$ with $\mathbb{P}_1$ finite elements on a mesh $\mathcal{T}_h^u$ of $\Omega$, and $\sigma$ with $\mathbb{P}_0$ finite elements on a coarser mesh $\mathcal{T}_h^\sigma$ of $\Omega$, the discretization of (63) can be written as the linear system $A_1\vec{u} = A_2\vec{\sigma}$, where $\vec{u} \in \mathbb{R}^{n_u}$, $\vec{\sigma} \in \mathbb{R}^{n_\sigma}$. More precisely, $A_1$ and $A_2$ are respectively issued from the discretization of $\int_\Omega \tilde{\sigma}_0\nabla u\cdot\nabla v - \int_\Omega \tilde{k}^2uv$ and $\int_\Omega \sigma\nabla u_0\cdot\nabla v$, where the Dirichlet boundary conditions are imposed by the penalty method. To rewrite the system in the form (1), we consider the naive splitting $A_1 = A_{11} + \delta A_{12}$, where $A_{11}$ and $A_{12}$ are respectively issued from the discretization of $\int_\Omega \sigma_0\nabla u\cdot\nabla v - \int_\Omega \tilde{k}^2uv$ and $\int_\Omega \sigma_r\nabla u\cdot\nabla v$. Then we get
$$\vec{u} = A_{11}^{-1}(-\delta A_{12}\vec{u} + A_2\vec{\sigma}) \quad \text{and} \quad \vec{u} = 0 \text{ on } \partial\Omega,$$
and
$$\vec{p} = A_{11}^{-1}(-\delta A_{12}\vec{p}) \quad \text{and} \quad \vec{p} = H\vec{u} - \vec{g} \text{ on } \partial\Omega,$$
where $H \in \mathbb{R}^{n_f \times n_u}$ is the discretization of the above operator $H$, by abuse of notation. Choosing $\delta$ such that $\delta\|A_{11}^{-1}A_{12}\|_2 < 1$, we consider (3) with $B = -\delta A_{11}^{-1}A_{12}$, $M = A_{11}^{-1}A_2$, $F = 0$. The application of $A_{11}^{-1}$, which has the same size as the matrix $A_1$, is done by a direct solver; more practical fixed point iterations will be investigated in the future.
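For concreteness, one outer iteration of the k-step one-shot method on a generic dense system $u = Bu + M\sigma + F$ reads: update $\sigma$ with the current adjoint, then take $k$ warm-started fixed-point steps on the forward and adjoint states (the adjoint step using the forward iterate before its update). A sketch on hypothetical random data, in place of the FreeFEM matrices (the step $\tau$ below is a heuristic choice, not the bound from the theorems):

```python
import numpy as np

rng = np.random.default_rng(8)
nu, nf, ns, k = 6, 4, 2, 3
B = rng.standard_normal((nu, nu)); B *= 0.2 / np.linalg.norm(B, 2)
M = rng.standard_normal((nu, ns)); H = rng.standard_normal((nf, nu))
F = rng.standard_normal(nu)
sigma_ex = np.array([10.0, 0.0])
g = H @ np.linalg.solve(np.eye(nu) - B, M @ sigma_ex + F)   # synthetic measurement

S = H @ np.linalg.solve(np.eye(nu) - B, M)                  # reduced forward operator
tau = 0.3 / np.linalg.norm(S, 2) ** 2                       # hypothetical descent step

sigma = np.array([12.0, 1.0])                               # initial guess
u = np.zeros(nu); p = np.zeros(nu)
for n in range(200000):
    sigma = sigma - tau * M.T @ p                           # gradient step
    for _ in range(k):                                      # k inner fixed-point steps
        p, u = B.T @ p + H.T @ (H @ u - g), B @ u + M @ sigma + F
    if np.linalg.norm(M.T @ p) < 1e-12:                     # approximate gradient norm
        break

assert np.linalg.norm(sigma - sigma_ex) < 1e-4
```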
Figure 1: Domain with six source points for the numerical experiments. The unknown σis
supported on the three squares.
We then perform some numerical experiments in FreeFEM [12] with the following setting:
• Wavenumber $\tilde{k} = 2\pi$, $\sigma_0 = 1$, $\delta = 0.01$; $\sigma_r$ is a random real function with range in the interval $[1, 2]$.
• Wavelength $\lambda = \frac{2\pi}{\tilde{k}}\sqrt{\sigma_0} = 1$, mesh size $h = \frac{\lambda}{20} = 0.05$. The domain $\Omega$ is the disk shown in Figure 1, where the squares are the support of the function $\sigma$. Here $n_u = 5853$, $n_\sigma = 6$.
• We test with 6 data $f$ given by the zero-order Bessel function of the second kind centered at the points shown in Figure 1, and the cost functional is the normalized sum of the contributions corresponding to the different data.
• We take $\sigma_{ex} = 10$ in every square and 0 otherwise. The initial guess for the inverse problem is 12 in every square and 0 otherwise.
• For the first iteration, we perform a line search to adapt the descent step $\tau$, using a direct solver for the forward and adjoint problems.
• The stopping rule for the outer iteration is based on the relative value of the cost functional and on the relative norm of the gradient, with a tolerance of $10^{-5}$.
Recall that $k$ is the number of inner iterations on the direct and adjoint problems. We are interested in two experiments.
In the first experiment, we study the dependence on the descent step $\tau$. In Figures 2a and 2b we respectively fix $k = 1$ and $k = 2$ and compare the $k$-step one-shot methods with the usual gradient descent method. On the horizontal axis we indicate the (outer) iteration number $n$ in (5) and (9). We can verify that for sufficiently small $\tau$, both one-shot methods converge. In particular, for $\tau = 2$, while gradient descent and 2-step one-shot converge, 1-step one-shot diverges. Oscillations may appear on the convergence curve for certain values of $\tau$, but they gradually vanish when $\tau$ gets smaller. For sufficiently small $\tau$, the convergence curves of both one-shot methods are comparable to that of gradient descent.
In the second experiment, we study the dependence on the number of inner iterations $k$, for fixed $\tau$. First (Figures 2c–2d), we investigate for which $k$ the convergence curve of $k$-step one-shot is comparable with that of usual gradient descent. As in the previous pictures, on the horizontal axis we indicate the (outer) iteration number $n$ in (5) and (9). For $\tau = 2$ (see Figure 2c), we observe that for $k = 3, 4$ the convergence curves of $k$-step one-shot are close to the one of usual gradient descent. Note that with 3 inner iterations the $L^2$ error between $u_n$ and the exact solution to the forward problem ranges between $4.3\cdot10^{-6}$ and $0.0136$ for different $n$ in (9); in fact this error is rather significant at the beginning, then it tends to reduce as we get closer to convergence for the parameter $\sigma$. Therefore incomplete inner iterations on the forward problem are enough to obtain good precision on the solution of the inverse problem. In the very particular case $\tau = 2.5$ (see Figure 2d), we observe an interesting phenomenon: when $k = 3, 5, 10$, with $k$-step one-shot the cost functional decreases even faster than with usual gradient descent. For bigger $k$, for example $k = 14$, the convergence curve of one-shot is close to the one of usual gradient descent, as expected. Next (Figures 2e–2f), since the overall cost of the $k$-step one-shot method increases with $k$, we indicate on the horizontal axis the accumulated inner iteration number, which sums up $k$ from an outer iteration to the next. More precisely, because at the first outer iteration we perform a step search by a direct solver, we set the first accumulated inner iteration number to 1; for the following outer iterations $n \ge 2$, the accumulated inner iteration number is set to $1 + (n-1)k$. In Figures 2e–2f we replot the results for the converging $k$-step one-shot methods of Figures 2c–2d with respect to the accumulated inner iteration number. For $\tau = 2$ (see Figure 2e), while $k = 2$ presents some oscillations, quite interestingly it appears that $k = 3$ gives a faster decrease of the cost functional than $k = 4$, at least after the first iterations. For $\tau = 2.5$ (see Figure 2f) we observe that $k = 3$ is enough for the decrease of the cost functional, but with some oscillations, and the considered higher $k$ again appears to give a slower decrease.
Finally we fix two particular values of τand compare all considered methods in Figure 4. We
note that shifted methods present more oscillations with respect to non-shifted ones, especially
for larger τ.
Figure 2: Convergence curves of usual gradient descent and k-step one-shot. (a) Usual gradient descent and 1-step one-shot for different descent steps $\tau$. (b) Usual gradient descent and 2-step one-shot for different descent steps $\tau$. (c) Usual gradient descent and $k$-step one-shot for different $k$ with $\tau = 2$. (d) Usual gradient descent and $k$-step one-shot for different $k$ with $\tau = 2.5$. (e) $k$-step one-shot for different $k$ with $\tau = 2$. (f) $k$-step one-shot for different $k$ with $\tau = 2.5$.
Figure 3: Convergence curves of shifted gradient descent and shifted k-step one-shot. (a) Shifted gradient descent and shifted 1-step one-shot for different descent steps $\tau$. (b) Shifted gradient descent and shifted 2-step one-shot for different descent steps $\tau$. (c) Shifted gradient descent and shifted $k$-step one-shot for different $k$ with $\tau = 0.25$. (d) Shifted gradient descent and shifted $k$-step one-shot for different $k$ with $\tau = 0.5$. (e) Shifted $k$-step one-shot for different $k$ with $\tau = 0.25$. (f) Shifted $k$-step one-shot for different $k$ with $\tau = 0.5$.
Figure 4: Comparison of usual gradient descent and k-step one-shot with shifted gradient descent and shifted k-step one-shot. (a) Convergence curves with $\tau = 0.5$. (b) Convergence curves with $\tau = 1.3$.
7 Conclusion
We have proved sufficient conditions on the descent step for the convergence of two variants of multi-step one-shot methods. Although these bounds on the descent step are not optimal, to our knowledge no other bounds that are explicit in the number of inner iterations are available in the literature for multi-step one-shot methods. Furthermore, we have shown in the numerical experiments that very few inner iterations on the forward and adjoint problems are enough to guarantee good convergence of the inversion algorithm.
These encouraging numerical results are preliminary in the sense that the considered fixed
point iteration is not a practical one, since it involves a direct solve of a problem of the same
size as the original forward problem. We will investigate in the future iterative solvers based
on domain decomposition methods (see e.g. [3]), which are well adapted to large-scale problems.
In addition, fixed point iterations could be replaced by more efficient Krylov subspace methods,
such as conjugate gradient or GMRES.
Another interesting issue is how to adapt the number of inner iterations in the course of
the outer iterations. Moreover, based on this linear inverse problem study, we plan to tackle
non-linear and time-dependent inverse problems.
References
[1] S. Barnett. Polynomials and linear control systems, volume 77 of Pure Appl. Math. Marcel
Dekker, Inc., New York, NY, 1983.
[2] M. Burger and W. Mühlhuber. Iterative regularization of parameter identification problems
by sequential quadratic programming methods. Inverse Problems, 18:943–969, 2002.
[3] V. Dolean, P. Jolivet, and F. Nataf. An Introduction to Domain Decomposition Methods:
Algorithms, Theory, and Parallel Implementation. Society for Industrial and Applied Math-
ematics, Philadelphia, PA, 2015.
[4] N. Gauger, A. Griewank, A. Hamdi, C. Kratzenstein, E. Özkaya, and T. Slawig. Automated
extension of fixed point PDE solvers for optimal design with bounded retardation. In Constrained Optimization and Optimal Control for Partial Differential Equations, International
Series of Numerical Mathematics, pages 99–122. Springer Basel, 2012.
[5] A. Greenbaum. Iterative Methods for Solving Linear Systems. Number 17 in Frontiers in
Applied Mathematics. Soc. for Industrial and Applied Math, Philadelphia, 1997.
[6] A. Griewank. Projected Hessians for Preconditioning in One-Step One-Shot Design Opti-
mization. In Large-Scale Nonlinear Optimization, volume 83, pages 151–171. Springer US,
Boston, MA, 2006. Series Title: Nonconvex Optimization and Its Applications.
[7] S. Günther, N. R. Gauger, and Q. Wang. Simultaneous single-step one-shot optimization
with unsteady PDEs. Journal of Computational and Applied Mathematics, 294:12–22, 2016.
[8] E. Haber and U. M. Ascher. Preconditioned all-at-once methods for large, sparse parameter
estimation problems. Inverse Problems, 17(6):1847–1864, 2001.
[9] A. Hamdi and A. Griewank. Reduced quasi-Newton method for simultaneous design and
optimization. Computational Optimization and Applications, 49(3):521–548, 2009.
[10] A. Hamdi and A. Griewank. Properties of an augmented Lagrangian for design optimization.
Optimization Methods and Software, 25(4):645–664, 2010.
[11] S.B. Hazra, V. Schulz, J. Brezillon, and N.R. Gauger. Aerodynamic shape optimization
using simultaneous pseudo-timestepping. Journal of Computational Physics, 204(1):46–64,
2005.
[12] F. Hecht. New development in FreeFem++. J. Numer. Math., 20(3-4):251–265, 2012.
[13] E.I. Jury. On the roots of a real polynomial inside the unit circle and a stability criterion for
linear discrete systems. IFAC Proceedings Volumes, 1(2):142–153, 1963. 2nd International
IFAC Congress on Automatic and Remote Control: Theory, Basle, Switzerland, 1963.
[14] E.I. Jury. Theory and Applications of the Z-Transform Method. New York, 1964.
[15] B. Kaltenbacher, A. Kirchner, and B. Vexler. Goal oriented adaptivity in the IRGNM for
parameter identification in PDEs II: all-at-once formulations. Inverse Problems, 30:045002,
2014.
[16] M. Marden. The geometry of the zeros of a polynomial in a complex variable, volume 3 of
Math. Surv. American Mathematical Society (AMS), Providence, RI, 1949.
[17] M. Marden. Geometry of Polynomials. Number 3 in Mathematical Surveys and Monographs.
American Math. Soc, Providence, RI, 2nd edition, 1966.
[18] E. Özkaya and N. R. Gauger. Single-step One-shot Aerodynamic Shape Optimization. In
Optimal Control of Coupled Systems of Partial Differential Equations, volume 158, pages
191–204. Birkhäuser Basel, Basel, 2009. Series Title: International Series of Numerical
Mathematics.
[19] V. Schulz and I. Gherman. One-Shot Methods for Aerodynamic Shape Optimization. In
MEGADESIGN and MegaOpt - German Initiatives for Aerodynamic Simulation and Opti-
mization in Aircraft Design, volume 107, pages 207–220. Springer Berlin Heidelberg, Berlin,
Heidelberg, 2009. Series Title: Notes on Numerical Fluid Mechanics and Multidisciplinary
Design.
[20] I. Schur. Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind. Journal für
die reine und angewandte Mathematik (Crelles Journal), 1917(147):205–232, 1917.
[21] A. Shenoy, M. Heinkenschloss, and E. M. Cliff. Airfoil design by an all-at-once method.
International Journal of Computational Fluid Dynamics, 11(1-2):3–25, 1998.
[22] S. Ta’asan. "One Shot" Methods for Optimal Control of Distributed Parameter Systems I:
Finite Dimensional Control. Technical Report 91-2, ICASE, Hampton, 1991.
[23] S. Ta’asan, G. Kuruvila, and M. Salas. Aerodynamic design and optimization in one shot. In
30th Aerospace Sciences Meeting and Exhibit, Reno, NV, U.S.A., 1992. American Institute
of Aeronautics and Astronautics.
[24] A. Tarantola and B. Valette. Generalized nonlinear inverse problems solved using the least
squares criterion. Reviews of Geophysics, 20(2):219–232, 1982.
[25] T. van Leeuwen and F. J. Herrmann. Mitigating local minima in full-waveform inversion by
expanding the search space. Geophysical Journal International, 195(1):661–667, 2013.
[26] T. van Leeuwen and F. J. Herrmann. A penalty method for PDE-constrained optimization
in inverse problems. Inverse Problems, 32(1):015007, 2015.
A Some useful lemmas
We state auxiliary results about matrices like those appearing in the eigenvalue equations (24),
(25), (48), (49).
Lemma A.1. Let (ℂ^{n×n}, ‖·‖) be a normed space and T ∈ ℂ^{n×n}. If ρ(T) < 1, then Σ_{k=0}^∞ T^k converges and
Σ_{k=0}^∞ T^k = (I − T)^{−1}.
Moreover, if ‖T‖ < 1,
‖(I − T)^{−1}‖ ≤ 1/(1 − ‖T‖).
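As a quick numerical illustration of Lemma A.1, the following Python sketch (with arbitrary matrix size, scaling and truncation order) compares the partial sums of the Neumann series with (I − T)^{−1}.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
T = rng.standard_normal((n, n))
T *= 0.9 / np.linalg.norm(T, 2)          # scale so that ||T|| < 1 (hence rho(T) < 1)

# Partial sums of the Neumann series sum_k T^k
S = np.zeros((n, n))
P = np.eye(n)
for _ in range(500):
    S += P
    P = P @ T

inv = np.linalg.inv(np.eye(n) - T)
assert np.allclose(S, inv)               # sum_k T^k = (I - T)^{-1}
assert np.linalg.norm(inv, 2) <= 1.0 / (1.0 - np.linalg.norm(T, 2)) + 1e-12
```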
Lemma A.2. Let T ∈ ℂ^{n×n} be such that ρ(T) < 1. Set
s(T) := sup_{z∈ℂ, |z|≥1} ‖(I − T/z)^{−1}‖;  (66)
then 0 < s(T) < +∞. Moreover, if ‖T‖ < 1, then 0 < s(T) ≤ 1/(1 − ‖T‖).
Proof. The function z ↦ ‖(I − T/z)^{−1}‖, with z ∈ ℂ, |z| ≥ 1, is well-defined and continuous, and we use Lemma A.1.
The following lemma says that, for T ∈ ℂ^{n×n} and λ ∈ ℂ, |λ| ≥ 1, we can decompose
(I − T/λ)^{−1} = P(λ) + iQ(λ) and (I − T*/λ)^{−1} = P(λ)* + iQ(λ)*,
and gives bounds for P(λ) and Q(λ).
Lemma A.3. Let T ∈ ℂ^{n×n} be such that ρ(T) < 1, and let λ ∈ ℂ, |λ| ≥ 1. Write 1/λ = r(cos φ + i sin φ) in polar form, where 0 < r ≤ 1 and φ ∈ [−π, π]. Then
(I − T/λ)^{−1} = P(λ) + iQ(λ) and (I − T*/λ)^{−1} = P(λ)* + iQ(λ)*,
where
P(λ) = (I − r cos φ T)(I − 2r cos φ T + r²T²)^{−1},  Q(λ) = r sin φ T (I − 2r cos φ T + r²T²)^{−1}
are ℂ^{n×n}-valued functions. We also have the following properties:
(i) ‖P(λ)‖ ≤ (1 + ‖T‖)s(T)² and ‖Q(λ)‖ ≤ |sin φ| ‖T‖ s(T)² ≤ ‖T‖ s(T)².
(ii) Moreover, if ‖T‖ < 1, then
‖P(λ)‖ ≤ 1/(1 − ‖T‖) and ‖Q(λ)‖ ≤ ‖T‖/(1 − ‖T‖).
Proof. The first part of the lemma is verified by direct computation, using
(I − T/λ)^{−1} = (I − T/λ*)[(I − T/λ)(I − T/λ*)]^{−1},
(I − T*/λ)^{−1} = [(I − T*/λ*)(I − T*/λ)]^{−1}(I − T*/λ*)
and
(I − T/λ)(I − T/λ*) = I − 2r cos φ T + r²T²,
where λ* denotes the complex conjugate of λ. After that, with the help of Lemma A.2, it is not difficult to show the inequalities in (i). To prove (ii), first observe that the two series
Σ_{k=0}^∞ r^k cos(kφ) T^k and Σ_{k=1}^∞ r^k sin(kφ) T^k
converge. Then, by expanding and simplifying the left-hand sides, we can show that
[Σ_{k=0}^∞ r^k cos(kφ) T^k](I − 2r cos φ T + r²T²) = I − r cos φ T
and
[Σ_{k=1}^∞ r^k sin(kφ) T^k](I − 2r cos φ T + r²T²) = r sin φ T,
so P(λ) and Q(λ) can be expressed as the series above, and the inequalities in (ii) follow.
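The decomposition in the proof above is easy to check numerically; in the following sketch the matrix size, the scaling of T and the sample value of λ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
T = rng.standard_normal((n, n))
T *= 0.8 / max(abs(np.linalg.eigvals(T)))      # enforce rho(T) < 1
lam = 1.3 * np.exp(1j * 0.7)                   # some lambda with |lambda| >= 1

r, phi = 1 / abs(lam), -np.angle(lam)          # 1/lam = r (cos phi + i sin phi)
I = np.eye(n)
D = np.linalg.inv(I - 2 * r * np.cos(phi) * T + r**2 * T @ T)
P = (I - r * np.cos(phi) * T) @ D              # P(lambda)
Q = r * np.sin(phi) * T @ D                    # Q(lambda)

assert np.allclose(np.linalg.inv(I - T / lam), P + 1j * Q)
```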
In Sections 3.3 and 4.3 we distinguish different cases of λ ∈ ℂ, for which we need the corresponding estimates given in the two following lemmas: Lemma A.4 is used for the shifted k-step one-shot method and Lemma A.5 for the k-step one-shot method.
Lemma A.4. For λ ∈ ℂ∖ℝ, |λ| ≥ 1, we write λ = R(cos θ + i sin θ) in polar form, where R ≥ 1, θ ∈ (−π, π), θ ≠ 0.
(i) For λ satisfying Re(λ³−λ²) ≥ 0, let γ₁ = γ₁(λ) = 1 if Im(λ³−λ²) ≥ 0, and γ₁ = −1 if Im(λ³−λ²) < 0. Then
Re(λ³−λ²) + γ₁ Im(λ³−λ²) ≥ |λ−1| ≥ 2|sin(θ/2)|.
(ii) Let 0 < θ₀ ≤ π/6. For λ satisfying Re(λ³−λ²) < 0 and θ ∈ [θ₀, π−θ₀] ∪ [−π+θ₀, −θ₀], let γ₂ = −1 if Im(λ³−λ²) ≥ 0, and γ₂ = 1 if Im(λ³−λ²) < 0. Then
−Re(λ³−λ²) − γ₂ Im(λ³−λ²) ≥ |λ−1| ≥ 2 sin(θ₀/2).
(iii) Let 0 < θ₀ ≤ π/6 and δ₀ > 0. For λ satisfying Re(λ³−λ²) < 0 and θ ∈ (−θ₀, θ₀)∖{0}, let
γ₃ = γ₃(sign(θ)) = [δ₀ + sin(5θ₀/2)]/cos(5θ₀/2) if θ > 0, and γ₃ = −[δ₀ + sin(5θ₀/2)]/cos(5θ₀/2) if θ < 0. Then
Re(λ³−λ²) + γ₃ Im(λ³−λ²) ≥ 2δ₀|sin(θ/2)|.
Moreover, if 0 < θ₀ < π/6, we have
|Re(λ−1) + γ₃ Im(λ−1)| / [Re(λ³−λ²) + γ₃ Im(λ³−λ²)] ≤ √(1+γ₃²)/δ₀
and
|γ₃ Re(λ−1) − Im(λ−1)| / [Re(λ³−λ²) + γ₃ Im(λ³−λ²)] ≤ max{√(1+γ₃²)/δ₀, √(1+γ₃²)/cos 3θ₀}.
(iv) Let 0 < θ₀ ≤ π/6. For λ satisfying Re(λ³−λ²) < 0 and θ ∈ (π−θ₀, π) ∪ (−π, −π+θ₀), we have
−Re(λ³−λ²) ≥ sin(π/2 − 3θ₀) + cos 2θ₀,
|Re(λ−1)| / [−Re(λ³−λ²)] ≤ 2/[sin(π/2 − 3θ₀) + cos 2θ₀] and |Im(λ−1)| / [−Re(λ³−λ²)] ≤ 2/[sin(π/2 − 3θ₀) + cos 2θ₀].
Proof. (i) From the definition of γ₁ we see that γ₁² = 1, γ₁ Im(λ³−λ²) ≥ 0 and
[Re(λ³−λ²) + γ₁ Im(λ³−λ²)]² = Re(λ³−λ²)² + Im(λ³−λ²)² + 2γ₁ Re(λ³−λ²) Im(λ³−λ²)
≥ Re(λ³−λ²)² + Im(λ³−λ²)² = |λ³−λ²|²,
which yields Re(λ³−λ²) + γ₁ Im(λ³−λ²) ≥ R²|λ−1| ≥ |λ−1|. Finally,
|λ−1| = |R cos θ − 1 + iR sin θ| = √(R² + 1 − 2R cos θ) ≥ √(2 − 2 cos θ) = 2|sin(θ/2)|,
since the function R ↦ R² + 1 − 2R cos θ, for R ≥ 1, is increasing.
(ii) In this case θ/2 ∈ [θ₀/2, π/2 − θ₀/2] ∪ [−π/2 + θ₀/2, −θ₀/2], so |sin(θ/2)| ≥ sin(θ₀/2). From the definition of γ₂ we see that γ₂² = 1 and γ₂ Im(λ³−λ²) ≤ 0. Similarly to (i), we obtain −Re(λ³−λ²) − γ₂ Im(λ³−λ²) ≥ |λ−1| ≥ 2|sin(θ/2)|, which implies the conclusion.
(iii) Note that cos 3θ > 0 since −π/2 < 3θ < π/2, and that sin 3θ has the same sign as θ and γ₃, so we have
Re(λ³−λ²) + γ₃ Im(λ³−λ²) = R²(R cos 3θ − cos 2θ + γ₃R sin 3θ − γ₃ sin 2θ)
≥ cos 3θ − cos 2θ + γ₃ sin 3θ − γ₃ sin 2θ
= −2 sin(5θ/2) sin(θ/2) + 2γ₃ cos(5θ/2) sin(θ/2)
= 2 sin(θ/2)[γ₃ cos(5θ/2) − sin(5θ/2)].
Then we consider two cases: if 0 < θ < θ₀, then γ₃ > 0, |sin(θ/2)| = sin(θ/2) > 0, 0 < 5θ/2 < 5θ₀/2 < π/2 and γ₃ cos(5θ/2) − sin(5θ/2) > γ₃ cos(5θ₀/2) − sin(5θ₀/2) = δ₀; if −θ₀ < θ < 0, then −γ₃ > 0, |sin(θ/2)| = −sin(θ/2) > 0, −π/2 < −5θ₀/2 < 5θ/2 < 0 and −γ₃ cos(5θ/2) + sin(5θ/2) > −γ₃ cos(5θ₀/2) − sin(5θ₀/2) = δ₀.
Next, if 0 < θ₀ < π/6, we will show that |Re(λ−1) + γ₃ Im(λ−1)| / [Re(λ³−λ²) + γ₃ Im(λ³−λ²)] and |γ₃ Re(λ−1) − Im(λ−1)| / [Re(λ³−λ²) + γ₃ Im(λ³−λ²)] are both bounded. First,
|Re(λ−1) + γ₃ Im(λ−1)| / [Re(λ³−λ²) + γ₃ Im(λ³−λ²)] = |(cos θ + γ₃ sin θ)R − 1| / {R²[(cos 3θ + γ₃ sin 3θ)R − (cos 2θ + γ₃ sin 2θ)]}
≤ |(cos θ + γ₃ sin θ)R − 1| / [(cos 3θ + γ₃ sin 3θ)R − (cos 2θ + γ₃ sin 2θ)].
Since γ₃ does not depend on R, let us study f₁(R) = [(aR − 1)/(bR − c)]², where a = cos θ + γ₃ sin θ, b = cos 3θ + γ₃ sin 3θ and c = cos 2θ + γ₃ sin 2θ. We observe that:
• a, b, c > 0. Indeed, cos θ, cos 2θ, cos 3θ > 0, and θ and γ₃ have the same sign.
• bR − c > 0 since Re(λ³−λ²) + γ₃ Im(λ³−λ²) > 0, thus R > c/b.
• ac > b (equivalently c/b > 1/a), since
ac = cos θ cos 2θ + γ₃² sin θ sin 2θ + γ₃ sin 3θ > cos θ cos 2θ − sin θ sin 2θ + γ₃ sin 3θ = b.
Now, f₁′(R) = 2 · (aR−1)/(bR−c) · (b−ac)/(bR−c)² < 0 for R > c/b > 1/a, and we would like to have c/b < 1, so that f₁(R) ≤ f₁(1) for all R ≥ 1. Indeed, c/b < 1 is equivalent to
cos 2θ + γ₃ sin 2θ < cos 3θ + γ₃ sin 3θ ⇔ |γ₃| > sin(5θ/2)/cos(5θ/2),
which is true since
|γ₃| = [δ₀ + sin(5θ₀/2)]/cos(5θ₀/2) > sin(5θ/2)/cos(5θ/2) + ε₀, where ε₀ = δ₀/cos(5θ₀/2).
Then we study
f₁(1) = [(cos θ − 1 + γ₃ sin θ)/(cos 3θ − cos 2θ + γ₃(sin 3θ − sin 2θ))]² = [(−sin(θ/2) + γ₃ cos(θ/2))/(−γ₃ sin(5θ/2) + γ₃² cos(5θ/2))]² γ₃².
We have:
• (−sin(θ/2) + γ₃ cos(θ/2))² ≤ 1 + γ₃² by the Cauchy-Schwarz inequality;
• γ₃² = |γ₃|² > [sin(5θ/2)/cos(5θ/2) + ε₀]|γ₃|, which leads to −γ₃ sin(5θ/2) + γ₃² cos(5θ/2) > ε₀ cos(5θ/2)|γ₃| > ε₀ cos(5θ₀/2)|γ₃| = δ₀|γ₃|;
hence f₁(1) ≤ (1 + γ₃²)/δ₀², and finally |Re(λ−1) + γ₃ Im(λ−1)| / [Re(λ³−λ²) + γ₃ Im(λ³−λ²)] ≤ √(1+γ₃²)/δ₀. Next, we have
|γ₃ Re(λ−1) − Im(λ−1)| / [Re(λ³−λ²) + γ₃ Im(λ³−λ²)] = |(γ₃ cos θ − sin θ)R − γ₃| / {R²[(cos 3θ + γ₃ sin 3θ)R − (cos 2θ + γ₃ sin 2θ)]}
≤ |(γ₃ cos θ − sin θ)R − γ₃| / [(cos 3θ + γ₃ sin 3θ)R − (cos 2θ + γ₃ sin 2θ)].
Since γ₃ does not depend on R, let us study f₂(R) = [(dR − γ₃)/(bR − c)]², where d = γ₃ cos θ − sin θ and b, c are as above. We observe that:
• γ₃b − cd and θ have the same sign. Indeed, γ₃b − cd = (γ₃² + 1) sin θ cos 2θ. Consequently, we always have (γ₃b − cd)γ₃ > 0.
• We always have γ₃/d > 1. Indeed, if θ > 0 then d > 0, since γ₃ = [δ₀ + sin(5θ₀/2)]/cos(5θ₀/2) > sin θ/cos θ, and γ₃/d = γ₃/(γ₃ cos θ − sin θ) > 1; if θ < 0 then d < 0, since −γ₃ = [δ₀ + sin(5θ₀/2)]/cos(5θ₀/2) > −sin θ/cos θ, and γ₃/d = −γ₃/(−γ₃ cos θ + sin θ) > 1.
Now, f₂′(R) = 2 · [(d/γ₃)R − 1]/(bR − c) · (γ₃b − cd)γ₃/(bR − c)², so, thanks to the above results, f₂(R) decreases for 1 ≤ R < γ₃/d and increases for R > γ₃/d. Moreover, as for f₁(1), we can estimate
f₂(1) = [(−cos(θ/2) − γ₃ sin(θ/2))/(−γ₃ sin(5θ/2) + γ₃² cos(5θ/2))]² γ₃² ≤ (1 + γ₃²)/δ₀²,
and lim_{R→+∞} f₂(R) = [(γ₃ cos θ − sin θ)/(cos 3θ + γ₃ sin 3θ)]² ≤ (1 + γ₃²)/cos² 3θ₀. Therefore
|γ₃ Re(λ−1) − Im(λ−1)| / [Re(λ³−λ²) + γ₃ Im(λ³−λ²)] ≤ max{√(1+γ₃²)/δ₀, √(1+γ₃²)/cos 3θ₀}.
(iv) Since θ ∈ (π−θ₀, π) ∪ (−π, −π+θ₀), we have:
• 2θ ∈ (2π−2θ₀, 2π) ∪ (−2π, −2π+2θ₀) ⊆ (2π−π/3, 2π) ∪ (−2π, −2π+π/3), thus cos 2θ > cos 2θ₀ > 0;
• 3θ ∈ (3π−3θ₀, 3π) ∪ (−3π, −3π+3θ₀) ⊆ (3π−π/2, 3π) ∪ (−3π, −3π+π/2), thus −cos 3θ > −cos(3π−3θ₀) = sin(π/2 − 3θ₀) ≥ 0.
So we have
−Re(λ³−λ²) = R²(−R cos 3θ + cos 2θ) > [sin(π/2 − 3θ₀) + cos 2θ₀]R² > 0.
Finally, |Re(λ−1)| / [−Re(λ³−λ²)] ≤ (R+1) / {[sin(π/2 − 3θ₀) + cos 2θ₀]R²} ≤ 2/[sin(π/2 − 3θ₀) + cos 2θ₀], and similarly for |Im(λ−1)| / [−Re(λ³−λ²)].
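The inequality in case (i) of Lemma A.4 can be probed by random sampling; the sampling ranges and tolerances in the following sketch are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(1000):
    R = 1 + rng.random() * 3
    theta = rng.uniform(-np.pi, np.pi)
    if theta == 0.0:
        continue
    lam = R * np.exp(1j * theta)
    w = lam**3 - lam**2
    if w.real < 0:
        continue                      # keep only case (i): Re(lambda^3 - lambda^2) >= 0
    g1 = 1.0 if w.imag >= 0 else -1.0
    lhs = w.real + g1 * w.imag
    assert lhs >= abs(lam - 1) - 1e-9               # first inequality of (i)
    assert abs(lam - 1) >= 2 * abs(np.sin(theta / 2)) - 1e-9   # second inequality of (i)
```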
Lemma A.5. For λ ∈ ℂ∖ℝ, |λ| ≥ 1, we write λ = R(cos θ + i sin θ) in polar form, where R ≥ 1, θ ∈ (−π, π), θ ≠ 0.
(i) For λ satisfying Re(λ²−λ) ≥ 0, let γ₁ = γ₁(λ) = 1 if Im(λ²−λ) ≥ 0, and γ₁ = −1 if Im(λ²−λ) < 0. Then
Re(λ²−λ) + γ₁ Im(λ²−λ) ≥ |λ(λ−1)| ≥ 2|sin(θ/2)|.
(ii) Let 0 < θ₀ ≤ π/4. For λ satisfying Re(λ²−λ) < 0 and θ ∈ [θ₀, π−θ₀] ∪ [−π+θ₀, −θ₀], let γ₂ = γ₂(λ) = −1 if Im(λ²−λ) ≥ 0, and γ₂ = 1 if Im(λ²−λ) < 0. Then
−Re(λ²−λ) − γ₂ Im(λ²−λ) ≥ |λ(λ−1)| ≥ 2 sin(θ₀/2).
(iii) Let 0 < θ₀ ≤ π/4 and δ₀ > 0. For λ satisfying Re(λ²−λ) < 0 and θ ∈ (−θ₀, θ₀)∖{0}, let
γ₃ = γ₃(sign(θ)) = [δ₀ + sin(3θ₀/2)]/cos(3θ₀/2) if θ > 0, and γ₃ = −[δ₀ + sin(3θ₀/2)]/cos(3θ₀/2) if θ < 0. Then
Re(λ²−λ) + γ₃ Im(λ²−λ) ≥ 2δ₀|sin(θ/2)|.
Moreover, if 0 < θ₀ < π/4, then
|Re(λ−1) + γ₃ Im(λ−1)| / [Re(λ²−λ) + γ₃ Im(λ²−λ)] ≤ √(1+γ₃²)/δ₀ and |γ₃ Re(λ−1) − Im(λ−1)| / [Re(λ²−λ) + γ₃ Im(λ²−λ)] ≤ max{√(1+γ₃²)/δ₀, √(1+γ₃²)/cos 2θ₀}.
(iv) Let 0 < θ₀ ≤ π/4. There exists no λ satisfying Re(λ²−λ) < 0 and θ ∈ (π−θ₀, π) ∪ (−π, −π+θ₀).
Proof. The proofs of (i) and (ii) are similar to those in Lemma A.4.
(iii) Note that cos 2θ > 0 since −π/2 < 2θ < π/2, and that sin 2θ has the same sign as θ and γ₃, so we have
Re(λ²−λ) + γ₃ Im(λ²−λ) = R(R cos 2θ − cos θ + γ₃R sin 2θ − γ₃ sin θ)
≥ cos 2θ − cos θ + γ₃ sin 2θ − γ₃ sin θ
= −2 sin(3θ/2) sin(θ/2) + 2γ₃ cos(3θ/2) sin(θ/2)
= 2 sin(θ/2)[γ₃ cos(3θ/2) − sin(3θ/2)].
Then we consider two cases: if 0 < θ < θ₀, then γ₃ > 0, |sin(θ/2)| = sin(θ/2) > 0, 0 < 3θ/2 < 3θ₀/2 < π/2 and γ₃ cos(3θ/2) − sin(3θ/2) > γ₃ cos(3θ₀/2) − sin(3θ₀/2) = δ₀; if −θ₀ < θ < 0, then −γ₃ > 0, |sin(θ/2)| = −sin(θ/2) > 0, −π/2 < −3θ₀/2 < 3θ/2 < 0 and −γ₃ cos(3θ/2) + sin(3θ/2) > −γ₃ cos(3θ₀/2) − sin(3θ₀/2) = δ₀.
Next, if 0 < θ₀ < π/4, we will show that |Re(λ−1) + γ₃ Im(λ−1)| / [Re(λ²−λ) + γ₃ Im(λ²−λ)] and |γ₃ Re(λ−1) − Im(λ−1)| / [Re(λ²−λ) + γ₃ Im(λ²−λ)] are both bounded. First,
|Re(λ−1) + γ₃ Im(λ−1)| / [Re(λ²−λ) + γ₃ Im(λ²−λ)] = |(cos θ + γ₃ sin θ)R − 1| / {R[(cos 2θ + γ₃ sin 2θ)R − (cos θ + γ₃ sin θ)]}
≤ |(cos θ + γ₃ sin θ)R − 1| / [(cos 2θ + γ₃ sin 2θ)R − (cos θ + γ₃ sin θ)].
Since γ₃ does not depend on R, let us study f₁(R) = [(aR − 1)/(bR − a)]², where a = cos θ + γ₃ sin θ and b = cos 2θ + γ₃ sin 2θ. We observe that:
• a > 0 and b > 0. Indeed, cos θ > 0, cos 2θ > 0, and θ and γ₃ have the same sign.
• bR − a > 0 since Re(λ²−λ) + γ₃ Im(λ²−λ) > 0, thus R > a/b.
• a² > b (equivalently a/b > 1/a), since a² = cos²θ + γ₃² sin²θ + γ₃ sin 2θ > cos²θ − sin²θ + γ₃ sin 2θ = b.
Now, f₁′(R) = 2 · (aR−1)/(bR−a) · (b−a²)/(bR−a)² < 0 for R > a/b > 1/a, and we would like to have a/b < 1, so that f₁(R) ≤ f₁(1) for all R ≥ 1. Indeed, a/b < 1 is equivalent to
cos θ + γ₃ sin θ < cos 2θ + γ₃ sin 2θ ⇔ |γ₃| > sin(3θ/2)/cos(3θ/2),
which is true since
|γ₃| = [δ₀ + sin(3θ₀/2)]/cos(3θ₀/2) > sin(3θ/2)/cos(3θ/2) + ε₀, where ε₀ = δ₀/cos(3θ₀/2).
Then we study
f₁(1) = [(cos θ − 1 + γ₃ sin θ)/(cos 2θ − cos θ + γ₃(sin 2θ − sin θ))]² = [(−sin(θ/2) + γ₃ cos(θ/2))/(−γ₃ sin(3θ/2) + γ₃² cos(3θ/2))]² γ₃².
We have:
• (−sin(θ/2) + γ₃ cos(θ/2))² ≤ 1 + γ₃² by the Cauchy-Schwarz inequality;
• γ₃² = |γ₃|² > [sin(3θ/2)/cos(3θ/2) + ε₀]|γ₃|, which leads to −γ₃ sin(3θ/2) + γ₃² cos(3θ/2) > ε₀ cos(3θ/2)|γ₃| > ε₀ cos(3θ₀/2)|γ₃| = δ₀|γ₃|;
hence f₁(1) ≤ (1 + γ₃²)/δ₀², and finally |Re(λ−1) + γ₃ Im(λ−1)| / [Re(λ²−λ) + γ₃ Im(λ²−λ)] ≤ √(1+γ₃²)/δ₀. Next, we have
|γ₃ Re(λ−1) − Im(λ−1)| / [Re(λ²−λ) + γ₃ Im(λ²−λ)] = |(γ₃ cos θ − sin θ)R − γ₃| / {R[(cos 2θ + γ₃ sin 2θ)R − (cos θ + γ₃ sin θ)]}
≤ |(γ₃ cos θ − sin θ)R − γ₃| / [(cos 2θ + γ₃ sin 2θ)R − (cos θ + γ₃ sin θ)].
Since γ₃ does not depend on R, let us study f₂(R) = [(cR − γ₃)/(bR − a)]², where c = γ₃ cos θ − sin θ and a, b are as above. We observe that:
• γ₃b − ca and θ have the same sign. Indeed, γ₃b − ca = (γ₃² + 1) sin θ cos θ. Consequently, we always have (γ₃b − ca)γ₃ > 0.
• We always have γ₃/c > 1. Indeed, if θ > 0 then c > 0, since γ₃ = [δ₀ + sin(3θ₀/2)]/cos(3θ₀/2) > sin θ/cos θ, and γ₃/c = γ₃/(γ₃ cos θ − sin θ) > 1; if θ < 0 then c < 0, since −γ₃ = [δ₀ + sin(3θ₀/2)]/cos(3θ₀/2) > −sin θ/cos θ, and γ₃/c = −γ₃/(−γ₃ cos θ + sin θ) > 1.
Now, f₂′(R) = 2 · [(c/γ₃)R − 1]/(bR − a) · (γ₃b − ca)γ₃/(bR − a)², so, thanks to the above results, f₂(R) decreases for 1 ≤ R < γ₃/c and increases for R > γ₃/c. Moreover, as for f₁(1), we can estimate
f₂(1) = [(−cos(θ/2) − γ₃ sin(θ/2))/(−γ₃ sin(3θ/2) + γ₃² cos(3θ/2))]² γ₃² ≤ (1 + γ₃²)/δ₀²,
and lim_{R→+∞} f₂(R) = [(γ₃ cos θ − sin θ)/(cos 2θ + γ₃ sin 2θ)]² ≤ (1 + γ₃²)/cos² 2θ₀. Therefore
|γ₃ Re(λ−1) − Im(λ−1)| / [Re(λ²−λ) + γ₃ Im(λ²−λ)] ≤ max{√(1+γ₃²)/δ₀, √(1+γ₃²)/cos 2θ₀}.
(iv) For θ ∈ (π−θ₀, π) ∪ (−π, −π+θ₀), we have cos 2θ > 0 since 2θ ∈ (3π/2, 2π) ∪ (−2π, −3π/2), while cos θ < 0. Hence Re(λ²−λ) = R(R cos 2θ − cos θ) > 0.
B Descent step for usual and shifted gradient descent
Proposition B.1 (Descent step for the usual gradient descent). The usual gradient descent algorithm (5) converges if
0 < τ < 2/‖H(I−B)^{−1}M‖².
Proof. The error system for (5) can be rewritten as
[p^{n+1}; u^{n+1}; σ^{n+1}] =
[ −τ(I−B*)^{−1}H*H(I−B)^{−1}MM* , 0 , (I−B*)^{−1}H*H(I−B)^{−1}M ;
  −τ(I−B)^{−1}MM* , 0 , (I−B)^{−1}M ;
  −τM* , 0 , I ] [p^n; u^n; σ^n].  (67)
Recall that a fixed point iteration converges if and only if the spectral radius of its iteration matrix is strictly less than 1. We can show that:
(i) If λ ∈ ℂ∖{0, 1} is an eigenvalue of the iteration matrix, then, proceeding as in Proposition 4.3, there exists y ∈ ℂ^{n_σ}, y ≠ 0, such that
λ²(λ−1) + τ (‖H(I−B)^{−1}My‖²/‖y‖²) λ² = 0,  (68)
hence λ = 1 − τ‖H(I−B)^{−1}My‖²/‖y‖². If we take τ < 2/‖H(I−B)^{−1}M‖², then equation (68) admits no solution λ with |λ| ≥ 1.
(ii) λ = 1 is not an eigenvalue of the iteration matrix. To show this, we rewrite iteration (67) as
[σ^{n+1}; p^{n+1}; u^{n+1}] =
[ I , −τM* , 0 ;
  (I−B*)^{−1}H*H(I−B)^{−1}M , −τ(I−B*)^{−1}H*H(I−B)^{−1}MM* , 0 ;
  (I−B)^{−1}M , −τ(I−B)^{−1}MM* , 0 ] [σ^n; p^n; u^n]
and proceed as in Proposition 4.3.
Proposition B.2 (Convergence of the shifted gradient descent). The shifted gradient descent algorithm (6) converges if
0 < τ < 1/‖H(I−B)^{−1}M‖².
Proof. The error system for (6) can be rewritten as
[p^{n+1}; u^{n+1}; σ^{n+1}] =
[ 0 , 0 , (I−B*)^{−1}H*H(I−B)^{−1}M ;
  0 , 0 , (I−B)^{−1}M ;
  −τM* , 0 , I ] [p^n; u^n; σ^n].  (69)
Recall that a fixed point iteration converges if and only if the spectral radius of its iteration matrix is strictly less than 1. We can show that:
(i) If λ ∈ ℂ∖{0, 1} is an eigenvalue of the iteration matrix, then, proceeding as in Proposition 4.2, there exists y ∈ ℂ^{n_σ}, y ≠ 0, such that
λ²(λ−1) + τ (‖H(I−B)^{−1}My‖²/‖y‖²) λ = 0.  (70)
By applying Lemma C.1 with
a₀ = 0, a₁ = τ‖H(I−B)^{−1}My‖²/‖y‖², a₂ = −1,
we see that equation (70) admits no solution λ with |λ| ≥ 1 if we take τ < ‖y‖²/‖H(I−B)^{−1}My‖². It is then enough to take τ < 1/‖H(I−B)^{−1}M‖².
(ii) λ = 1 is not an eigenvalue of the iteration matrix. To show this, we rewrite iteration (69) as
[σ^{n+1}; p^{n+1}; u^{n+1}] =
[ I , −τM* , 0 ;
  (I−B*)^{−1}H*H(I−B)^{−1}M , 0 , 0 ;
  (I−B)^{−1}M , 0 , 0 ] [σ^n; p^n; u^n]
and proceed as in Proposition 4.2.
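Both propositions can be probed numerically by assembling the block iteration matrices of (67) and (69) for random data and checking their spectral radius; in the sketch below the dimensions and the safety factor 0.9 on the descent step are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
nu, ns = 5, 3
B = rng.standard_normal((nu, nu)); B *= 0.5 / np.linalg.norm(B, 2)   # ||B|| < 1
M = rng.standard_normal((nu, ns))
H = rng.standard_normal((4, nu))

K = np.linalg.inv(np.eye(nu) - B) @ M                    # (I-B)^{-1} M
G = np.linalg.inv(np.eye(nu) - B.T) @ H.T @ H @ K        # (I-B*)^{-1} H* H (I-B)^{-1} M

def rho(A):
    return max(abs(np.linalg.eigvals(A)))

c = np.linalg.norm(H @ K, 2) ** 2                        # ||H (I-B)^{-1} M||^2
Z = np.zeros

tau = 0.9 * 2 / c                                        # usual GD, Proposition B.1
It = np.block([[-tau * G @ M.T, Z((nu, nu)), G],
               [-tau * K @ M.T, Z((nu, nu)), K],
               [-tau * M.T,     Z((ns, nu)), np.eye(ns)]])
assert rho(It) < 1

tau = 0.9 / c                                            # shifted GD, Proposition B.2
It = np.block([[Z((nu, nu)), Z((nu, nu)), G],
               [Z((nu, nu)), Z((nu, nu)), K],
               [-tau * M.T,  Z((ns, nu)), np.eye(ns)]])
assert rho(It) < 1
```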
C Convergence study for the scalar case
C.1 Notations and preliminary calculation
In the scalar case, that is, when n_u = n_σ = n_f = 1, we change the notation from capital to lower-case letters:
B ← b ∈ ℝ, b < 1;  M ← m ∈ ℝ, m ≠ 0;  H ← h ∈ ℝ, h ≠ 0;
T_k ← t_k = 1 + b + ... + b^{k−1} = (1 − b^k)/(1 − b);  U_k ← u_k = k h² b^{k−1};  (71)
X_k ← x_k = 0 if k = 1, and x_k = h²[1 + 2b + 3b² + ... + (k−1)b^{k−2}] if k ≥ 2.
The identity 1 + 2x + 3x² + ... + nx^{n−1} = [(1 − x^{n+1})/(1 − x)]′ = [1 − (n+1)x^n + nx^{n+1}]/(1 − x)² shows that
x_k = h²[1 − k b^{k−1} + (k−1)b^k]/(1 − b)², k ≥ 1,  (72)
where we set b^{k−1} = 1 when k = 1 and b = 0. Now, for each of the algorithms (5), (6), (10), (9), we write the iterations for the errors in the scalar case and the corresponding iteration matrix M such that [p^{n+1}, u^{n+1}, σ^{n+1}]ᵀ = M[p^n, u^n, σ^n]ᵀ.
• Usual gradient descent (usual GD):
σ^{n+1} = σ^n − τ m p^n,  u^n = b u^n + m σ^n,  p^n = b p^n + h² u^n;
M = [ −h²m²(1−b)^{−2}τ , 0 , h²m(1−b)^{−2} ; −m²(1−b)^{−1}τ , 0 , m(1−b)^{−1} ; −mτ , 0 , 1 ].  (73)
• Shifted gradient descent (shifted GD):
σ^{n+1} = σ^n − τ m p^n,  u^{n+1} = b u^{n+1} + m σ^n,  p^{n+1} = b p^{n+1} + h² u^{n+1};
M = [ 0 , 0 , h²m(1−b)^{−2} ; 0 , 0 , m(1−b)^{−1} ; −mτ , 0 , 1 ].  (74)
• k-step one-shot:
σ^{n+1} = σ^n − τ m p^n,  p^{n+1} = (b^k − τ m² x_k) p^n + u_k u^n + m x_k σ^n,  u^{n+1} = b^k u^n + m t_k σ^n − τ m² t_k p^n;
M = [ b^k − m²x_kτ , u_k , m x_k ; −m²t_kτ , b^k , m t_k ; −mτ , 0 , 1 ].  (75)
• Shifted k-step one-shot:
σ^{n+1} = σ^n − τ m p^n,  p^{n+1} = b^k p^n + u_k u^n + m x_k σ^n,  u^{n+1} = b^k u^n + m t_k σ^n;
M = [ b^k , u_k , m x_k ; 0 , b^k , m t_k ; −mτ , 0 , 1 ].  (76)
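The closed form (72) for x_k and the identity u_k t_k − b^k x_k + x_k = h²t_k² used in Section C.2 can be checked numerically on a grid of b and k; the grid and the sample values of h, m are arbitrary.

```python
import numpy as np

h, m = 1.3, 0.7
for b in np.linspace(-0.95, 0.95, 39):
    for k in range(1, 12):
        t = (1 - b**k) / (1 - b)                               # t_k
        u = k * h**2 * b**(k - 1)                              # u_k
        x = h**2 * sum((j + 1) * b**j for j in range(k - 1))   # x_k from (71); empty sum = 0 for k = 1
        x_closed = h**2 * (1 - k * b**(k - 1) + (k - 1) * b**k) / (1 - b)**2   # (72)
        assert np.isclose(x, x_closed)
        assert np.isclose(u * t - b**k * x + x, h**2 * t**2)   # scalar version of (41)
```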
C.2 Necessary and sufficient conditions for convergence
In this simpler scalar case, we are able to prove not only sufficient but also necessary conditions on the descent step τ for convergence. Our strategy to study the spectral radius ρ(M) is as follows:
1. Compute det(M − λI) to write the eigenvalue equation P(λ) = 0. For the considered methods, P turns out to be a polynomial of degree 3, P(λ) = a₀ + a₁λ + a₂λ² + λ³, where a₀, a₁, a₂ ∈ ℝ depend on h, m, b, τ. For the computations, the identity u_k t_k − b^k x_k + x_k = h² t_k², which is the scalar version of (41), can be helpful.
2. Apply to P Lemma C.1, which states a necessary and sufficient condition for a real-coefficient polynomial of degree 3 to have all its roots inside the unit circle of the complex plane. Then deduce conditions on τ.
Lemma C.1. Let a₀, a₁, a₂ ∈ ℝ. Then all roots of P(z) = a₀ + a₁z + a₂z² + z³ lie (strictly) inside the unit circle of the complex plane if and only if
(a₀ − 1)(a₀ + 1) < 0,  (77)
(a₀² − a₂a₀ + a₁ − 1)(a₀² + a₂a₀ − a₁ − 1) > 0,  (78)
(a₀ + a₂ − a₁ − 1)(a₀ + a₂ + a₁ + 1) < 0.  (79)
The proof of Lemma C.1 is given in Appendix D and is mainly based on Marden's work [17].
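Lemma C.1 can be cross-checked against a direct root computation; the random coefficient range in the following sketch is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(4)
for _ in range(2000):
    a0, a1, a2 = rng.uniform(-2, 2, 3)
    # P(z) = z^3 + a2 z^2 + a1 z + a0 (np.roots takes descending coefficients)
    inside = max(abs(np.roots([1, a2, a1, a0]))) < 1
    cond = ((a0 - 1) * (a0 + 1) < 0
            and (a0**2 - a2 * a0 + a1 - 1) * (a0**2 + a2 * a0 - a1 - 1) > 0
            and (a0 + a2 - a1 - 1) * (a0 + a2 + a1 + 1) < 0)
    assert inside == cond
```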
C.2.1 Descent step for the usual gradient descent
Here, the coefficients of P are
a₀ = 0, a₁ = 0, a₂ = h²m²(1−b)^{−2}τ − 1.
Conditions (77) and (78) of Lemma C.1 are automatically satisfied. Condition (79) gives
0 < τ < 2(1−b)²/(h²m²),
which is (7) in the scalar case.
C.2.2 Descent step for the shifted gradient descent
Here, the coefficients of P are
a₀ = 0, a₁ = h²m²(1−b)^{−2}τ, a₂ = −1.
Condition (77) of Lemma C.1 is automatically satisfied, condition (79) is automatically satisfied for τ > 0, and condition (78) gives
τ < (1−b)²/(h²m²),
which is (8) in the scalar case.
C.2.3 Descent step for k-step one-shot
Here, the coefficients of P are
a₀ = −s², a₁ = m²(h²t_k² − x_k)τ + (s² + 2s), a₂ = m²x_kτ − (2s + 1),
where s = b^k. Condition (77) of Lemma C.1 is obviously satisfied since |b| < 1. Next we deal with condition (78). The computation shows that
a₀² − a₂a₀ + a₁ − 1 = m²(h²t_k² − x_k + x_k s²)τ + (s − 1)³(s + 1), with (s − 1)³(s + 1) < 0,  (80)
a₀² + a₂a₀ − a₁ − 1 = −m²(h²t_k² − x_k + x_k s²)τ + (s − 1)(s + 1)³, with (s − 1)(s + 1)³ < 0,  (81)
and
h²t_k² − x_k + x_k s² = h² b^{k−1}(1 − b^k)[k − (k+1)b + kb^k − (k−1)b^{k+1}]/(1 − b)².  (82)
Lemma C.2. We have k − (k+1)b + kb^k − (k−1)b^{k+1} > 0 for all |b| < 1 and all k ≥ 1.
Proof. We write k − (k+1)b + kb^k − (k−1)b^{k+1} = (1 − b)A, where A = k + 1 − (1 − b^k)/(1 − b) + (k−1)b^k. It suffices to show A > 0. If k = 1, then A = 1 > 0. If either k is even, or k ≥ 3 is odd and 0 ≤ b < 1, then (k−1)b^k ≥ 0 and (1 − b^k)/(1 − b) = |b^{k−1} + b^{k−2} + ... + b + 1| ≤ |b^{k−1}| + |b^{k−2}| + ... + |b| + 1 < k give us the conclusion. If k ≥ 3 is odd and −1 < b < 0, then (k−1)(1 + b^k) > 0 and (1 − b^k)/(1 − b) < 1, therefore A = 1 + [1 − (1 − b^k)/(1 − b)] + (k−1)(1 + b^k) > 0.
Then, condition (78) imposes:
• τ < (1−b)²(1+b^k)(1−b^k)² / {h²m²b^{k−1}[k − (k+1)b + kb^k − (k−1)b^{k+1}]} if b^{k−1} > 0;
• τ < (1−b)²(1+b^k)³ / {h²m²b^{k−1}[−k + (k+1)b − kb^k + (k−1)b^{k+1}]} if b^{k−1} < 0;
• no condition on τ if k ≥ 2 and b = 0.
Finally we check condition (79). We have a₀ + a₂ + a₁ + 1 = h²m²t_k²τ > 0 and
a₀ + a₂ − a₁ − 1 = [h²m²(1 − 2kb^{k−1} + 2kb^k − b^{2k})/(1 − b)²]τ − 2(1 + s)²,
therefore, condition (79) gives:
• τ < 2(1−b)²(1+b^k)² / [h²m²(1 − 2kb^{k−1} + 2kb^k − b^{2k})] if 1 − 2kb^{k−1} + 2kb^k − b^{2k} > 0;
• no condition on τ if 1 − 2kb^{k−1} + 2kb^k − b^{2k} ≤ 0.
In the following lemma we study the quantity 1 − 2kb^{k−1} + 2kb^k − b^{2k} that appears above.
Lemma C.3. Let f_k(b) = 1 − 2kb^{k−1} + 2kb^k − b^{2k} for k ∈ ℕ* and −1 ≤ b ≤ 1.
(i) f₁(b) = −(1 − b)² < 0 for all −1 < b < 1.
(ii) f₂(b) = 1 − 4b + 4b² − b⁴ has a unique root b = −1 + √2 in (−1, 1); moreover, f₂(b) > 0 if −1 < b < −1 + √2 and f₂(b) < 0 if −1 + √2 < b < 1.
(iii) If k ≥ 3 is odd, then f_k has exactly two roots b₁(k) < b₂(k) in (−1, 1); if k ≥ 2 is even, then f_k has a unique root b₃(k) in (−1, 1). Moreover, for every odd k ≥ 3:
• −1 < b₁(k) < 0 < b₂(k) < 1;
• f_k(b) > 0 ⇔ b₁(k) < b < b₂(k);
• f_k(b) < 0 ⇔ −1 < b < b₁(k) or b₂(k) < b < 1;
and for every even k ≥ 2:
• 0 < b₃(k) < 1;
• f_k(b) > 0 ⇔ −1 < b < b₃(k);
• f_k(b) < 0 ⇔ b₃(k) < b < 1.
(iv) lim_{odd k→∞} b₁(k) = −1 and lim_{odd k→∞} b₂(k) = lim_{even k→∞} b₃(k) = 1.
Proof. (i) and (ii) are easy to verify. (iii) It remains to consider k ≥ 3. We have
f_k′(b) = b^{k−2}[−2k(k−1) + 2k²b − 2kb^{k+1}], −1 < b < 1.
Set
g_k(b) = −2k(k−1) + 2k²b − 2kb^{k+1}, −1 ≤ b ≤ 1, k ≥ 3.
Case 1 (k ≥ 3 odd). By studying the sign of g_k′(b), we find that:
• g_k has a unique root v₁(k) in (−1, 1), and 0 < v₁(k) < (k/(k+1))^{1/k} < 1;
• g_k(b) > 0 ⇔ v₁(k) < b < 1;
• g_k(b) < 0 ⇔ −1 < b < v₁(k).
Next, by studying the sign of f_k′(b), we find that:
• f_k has exactly two roots b₁(k) < b₂(k) in (−1, 1), and −1 < b₁(k) < 0 < b₂(k) < 1;
• f_k(b) > 0 ⇔ b₁(k) < b < b₂(k);
• f_k(b) < 0 ⇔ −1 < b < b₁(k) or b₂(k) < b < 1.
Case 2 (k ≥ 4 even). By studying the sign of g_k′(b), we find that:
• g_k has a unique root v₂(k) in (−1, 1), and 0 < v₂(k) < (k/(k+1))^{1/k} < 1;
• g_k(b) > 0 ⇔ v₂(k) < b < 1;
• g_k(b) < 0 ⇔ −1 < b < v₂(k).
Next, by studying the sign of f_k′(b), we find that:
• f_k has a unique root b₃(k) in (−1, 1), and 0 < b₃(k) < 1;
• f_k(b) > 0 ⇔ −1 < b < b₃(k);
• f_k(b) < 0 ⇔ b₃(k) < b < 1.
(iv) We have
f_k(1/2) = 1 − k/2^{k−1} − 1/2^{2k} for all k ≥ 3, and f_k(−1/2) = 1 − 3k/2^{k−1} − 1/2^{2k} for all odd k ≥ 3,
hence for sufficiently large k we have f_k(1/2) > 0, and for sufficiently large odd k we have f_k(−1/2) > 0. By the table of signs of f_k, we conclude that b₁(k) < −1/2 for large odd k, b₂(k) > 1/2 for large odd k, and b₃(k) > 1/2 for large even k.
Case 1 (k ≥ 3 odd and sufficiently large). First we work with b₁(k). We have
1 − 2k b₁(k)^{k−1} + 2k b₁(k)^k − b₁(k)^{2k} = 0
and b₁(k) < −1/2, so
−b₁(k)^{2k} + 2k b₁(k)^k + 1 = 2k b₁(k)^{k−1} = [−2k b₁(k)^k] · 1/(−b₁(k)) < [−2k b₁(k)^k] · 2 = −4k b₁(k)^k,
which leads to
b₁(k)^{2k} − 6k b₁(k)^k − 1 > 0 ⇔ [b₁(k)^k − 3k]² > 1 + 9k².
Since −1 < b₁(k) < 0 and k is odd, this tells us that
−1 < b₁(k) < −(−3k + √(1 + 9k²))^{1/k} = −1/(3k + √(1 + 9k²))^{1/k} < −1/(7k)^{1/k},
which yields lim_{odd k→∞} b₁(k) = −1. Next, we have
1 − 2k b₂(k)^{k−1} + 2k b₂(k)^k − b₂(k)^{2k} = 0
and b₂(k) > 1/2, so
−b₂(k)^{2k} + 2k b₂(k)^k + 1 = 2k b₂(k)^{k−1} = 2k b₂(k)^k · 1/b₂(k) < 4k b₂(k)^k,
which leads to
b₂(k)^{2k} + 2k b₂(k)^k − 1 > 0 ⇔ [b₂(k)^k + k]² > 1 + k².
Since 0 < b₂(k) < 1, this tells us that
1 > b₂(k) > (−k + √(1 + k²))^{1/k} = 1/(k + √(1 + k²))^{1/k} > 1/(3k)^{1/k},
which yields lim_{odd k→∞} b₂(k) = 1.
Case 2 (k ≥ 4 even and sufficiently large). We repeat the same arguments as for b₂(k), now for b₃(k).
In summary, we have the following proposition.
Proposition C.4 (Convergence of k-step one-shot). Let η₁(k, b) := +∞ and
η₂₁(k, b) := (1−b)²(1+b^k)(1−b^k)² / {b^{k−1}[k − (k+1)b + kb^k − (k−1)b^{k+1}]};
η₂₂(k, b) := −(1−b)²(1+b^k)³ / {b^{k−1}[k − (k+1)b + kb^k − (k−1)b^{k+1}]};
η₃(k, b) := 2(1−b)²(1+b^k)² / [1 − 2kb^{k−1} + 2kb^k − b^{2k}].
Then the necessary and sufficient condition for the convergence of k-step one-shot in the scalar case is of the form τ < η(k, b)/(h²m²), where η(k, b) is defined as follows:
(i) η(1, b) = η₂₁(1, b) = (1−b)³(1+b), −1 < b < 1;
(ii) for odd k ≥ 3,
η(k, b) = η₂₁(k, b) if −1 < b ≤ b₁(k) or b₂(k) ≤ b < 1;
η(k, b) = min{η₂₁(k, b), η₃(k, b)} if b₁(k) < b < b₂(k) and b ≠ 0;
η(k, b) = 2 if b = 0;
where −1 < b₁(k) < 0 < b₂(k) < 1 are the two roots of 1 − 2kb^{k−1} + 2kb^k − b^{2k} = 0 in (−1, 1);
(iii) for even k ≥ 2,
η(k, b) = η₂₁(k, b) if b₃(k) ≤ b < 1;
η(k, b) = min{η₂₁(k, b), η₃(k, b)} if 0 < b < b₃(k);
η(k, b) = 2 if b = 0;
η(k, b) = min{η₂₂(k, b), η₃(k, b)} if −1 < b < 0;
where 0 < b₃(k) < 1 is the unique root of 1 − 2kb^{k−1} + 2kb^k − b^{2k} = 0 in (−1, 1).
Note that lim_{odd k→∞} b₁(k) = −1 and lim_{odd k→∞} b₂(k) = lim_{even k→∞} b₃(k) = 1, so the behavior of τ when k → ∞ is consistent with the condition τ < 2(1−b)²/(h²m²), −1 < b < 1, for the usual gradient descent. For illustrations of the function η(k, b) for different k, see Section C.3.
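For k = 1 the threshold η(1, b) = (1−b)³(1+b) can be checked directly against the spectral radius of the iteration matrix (75); the sample values b = 0.5, h = m = 1 and the 10% margins around the threshold are arbitrary choices.

```python
import numpy as np

h, m, b = 1.0, 1.0, 0.5
eta = (1 - b)**3 * (1 + b)                 # eta(1, b) from Proposition C.4 (i)
t1, u1, x1 = 1.0, h**2, 0.0                # t_k, u_k, x_k for k = 1

def rho(tau):
    M1 = np.array([[b - m**2 * x1 * tau, u1, m * x1],
                   [-m**2 * t1 * tau,    b,  m * t1],
                   [-m * tau,            0., 1.]])
    return max(abs(np.linalg.eigvals(M1)))

assert rho(0.9 * eta / (h * m)**2) < 1     # below the threshold: convergence
assert rho(1.1 * eta / (h * m)**2) >= 1    # above the threshold: divergence
```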
C.2.4 Descent step for shifted k-step one-shot
Here, the coefficients of the polynomial P of the eigenvalue equation are
a₀ = h²m²v_kτ − s², a₁ = h²m²y_kτ + s² + 2s, a₂ = −2s − 1,  (83)
where s = b^k, y_k = x_k/h² = [1 − kb^{k−1} + (k−1)b^k]/(1−b)² and v_k = t_k² − y_k = b^{k−1}[k − (k+1)b + b^{k+1}]/(1−b)². Note that v_k and b^{k−1} have the same sign, and that v_k = 0 if and only if k ≥ 2 and b = 0, since it is easy to show that k − (k+1)b + b^{k+1} > 0 for all |b| < 1 and all k ≥ 1. Then, condition (77) of Lemma C.1 imposes:
• τ < (1+s²)/(h²m²v_k) = (1−b)²(1+b^{2k}) / {h²m²b^{k−1}[k − (k+1)b + b^{k+1}]} if b^{k−1} > 0;
• τ < (−1+s²)/(h²m²v_k) = (1−b)²(−1+b^{2k}) / {h²m²b^{k−1}[k − (k+1)b + b^{k+1}]} if b^{k−1} < 0;
• no condition on τ if k ≥ 2 and b = 0.
Next we study condition (78). We have
a₀² − a₂a₀ + a₁ − 1 = v_k²(h²m²τ)² + [(−2s² + 2s + 1)v_k + y_k]h²m²τ + (s − 1)³(s + 1), with (s − 1)³(s + 1) < 0,
and
a₀² + a₂a₀ − a₁ − 1 = v_k²(h²m²τ)² − [(2s² + 2s + 1)v_k + y_k]h²m²τ + (s − 1)(s + 1)³, with (s − 1)(s + 1)³ < 0,
each of which, considered as a second-order polynomial in h²m²τ if v_k ≠ 0, has exactly two roots of opposite signs. Therefore, if v_k ≠ 0, condition (78) is equivalent to (h²m²τ − r₁)(h²m²τ − r₂) > 0, where
r₁ := {(2s² − 2s − 1)v_k − y_k + √[(−4s + 5)v_k² + y_k² + 2(−2s² + 2s + 1)v_ky_k]} / (2v_k²) > 0
and
r₂ := {(2s² + 2s + 1)v_k + y_k + √[(8s² + 12s + 5)v_k² + y_k² + 2(2s² + 2s + 1)v_ky_k]} / (2v_k²) > 0.
Lemma C.5. r₁ and r₂ cannot both be strictly less than (1 + s²)/v_k, and they cannot both be strictly less than (−1 + s²)/v_k.
Proof. Either r₁ < (1 + s²)/v_k or r₁ < (−1 + s²)/v_k implies (s² + 4s + 1)v_k² + (s² + 1)v_ky_k > 0. Either r₂ < (1 + s²)/v_k or r₂ < (−1 + s²)/v_k implies (s² + 4s + 1)v_k² + (s² + 1)v_ky_k < 0.
Thanks to this lemma, we see that condition (78), in combination with condition (77), gives:
• τ < min{r₁, r₂}/(h²m²) if b^{k−1} ≠ 0;
• τ < 1/(h²m²) if k ≥ 2 and b = 0.
Finally, we have a₀ + a₂ + a₁ + 1 = h²m²t_k²τ > 0 and
a₀ + a₂ − a₁ − 1 = [h²m²/(1 − b)²][−1 + 2kb^{k−1} − 2kb^k + b^{2k}]τ − 2(1 − b^k)²,
thus condition (79) is equivalent to:
• τ < 2(1−b)²(1−b^k)² / [h²m²(−1 + 2kb^{k−1} − 2kb^k + b^{2k})] if 1 − 2kb^{k−1} + 2kb^k − b^{2k} < 0;
• no condition on τ if 1 − 2kb^{k−1} + 2kb^k − b^{2k} ≥ 0.
One can look again at Lemma C.3 for the analysis of 1 − 2kb^{k−1} + 2kb^k − b^{2k}. In summary, we have the following proposition.
Proposition C.6 (Convergence of shifted k-step one-shot). Let
κ₁₁(k, b) := (1−b)²(1+b^{2k}) / {b^{k−1}[k − (k+1)b + b^{k+1}]};
κ₁₂(k, b) := (1−b)²(−1+b^{2k}) / {b^{k−1}[k − (k+1)b + b^{k+1}]};
t_k := (1−b^k)/(1−b), y_k := [1 − kb^{k−1} + (k−1)b^k]/(1−b)², s := b^k, v_k := t_k² − y_k;
κ₂₁(k, b) := {(2s² − 2s − 1)v_k − y_k + √[(−4s + 5)v_k² + y_k² + 2(−2s² + 2s + 1)v_ky_k]} / (2v_k²);
κ₂₂(k, b) := {(2s² + 2s + 1)v_k + y_k + √[(8s² + 12s + 5)v_k² + y_k² + 2(2s² + 2s + 1)v_ky_k]} / (2v_k²);
κ₂(k, b) := min{κ₂₁(k, b), κ₂₂(k, b)};
κ₃(k, b) := 2(1−b)²(1−b^k)² / [−1 + 2kb^{k−1} − 2kb^k + b^{2k}].
Then the necessary and sufficient condition for the convergence of shifted k-step one-shot in the scalar case is of the form τ < κ(k, b)/(h²m²), where κ(k, b) is defined as follows:
(i) κ(1, b) = min{κ₁₁(1, b), κ₂(1, b), κ₃(1, b)}; note that
κ₁₁(1, b) = 1 + b², κ₂₁(1, b) = [2b² − 2b − 1 + √(−4b + 5)]/2, κ₂₂(1, b) = [2b² + 2b + 1 + √(8b² + 12b + 5)]/2, κ₃(1, b) = 2(1 − b)²;
(ii) for odd k ≥ 3,
κ(k, b) = min{κ₁₁(k, b), κ₂(k, b), κ₃(k, b)} if −1 < b < b₁(k) or b₂(k) < b < 1;
κ(k, b) = min{κ₁₁(k, b), κ₂(k, b)} if b₁(k) ≤ b ≤ b₂(k) and b ≠ 0;
κ(k, b) = 1 if b = 0;
where −1 < b₁(k) < 0 < b₂(k) < 1 are the two roots of 1 − 2kb^{k−1} + 2kb^k − b^{2k} = 0 in (−1, 1);
(iii) for even k ≥ 2,
κ(k, b) = min{κ₁₁(k, b), κ₂(k, b), κ₃(k, b)} if b₃(k) < b < 1;
κ(k, b) = min{κ₁₁(k, b), κ₂(k, b)} if 0 < b ≤ b₃(k);
κ(k, b) = 1 if b = 0;
κ(k, b) = min{κ₁₂(k, b), κ₂(k, b)} if −1 < b < 0;
where 0 < b₃(k) < 1 is the unique root of 1 − 2kb^{k−1} + 2kb^k − b^{2k} = 0 in (−1, 1).
Remark C.7. In the implementation, to avoid numerical errors we rewrite κ₂₁(k, b) as
κ₂₁(k, b) = b(1−b)²(b^k − 1) / [k − (k+1)b + b^{k+1}] + 2[1 − b^k + b(1−b)²(1−b^k)y_k / (k − (k+1)b + b^{k+1})] / {y_k + v_k + √[(−4s + 5)v_k² + y_k² + 2(−2s² + 2s + 1)v_ky_k]}.
From this formula we also see that κ₂₁(k, b) → (1−b)² as k → ∞ (note that y_k = [1 − kb^{k−1} + (k−1)b^k]/(1−b)² → 1/(1−b)² and v_k = t_k² − y_k → 0 as k → ∞).
For illustrations of the function κ(k, b) for different k, see Section C.3.
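The rewriting of Remark C.7 can be checked against the defining expression of κ₂₁ in Proposition C.6; the sample values of k and b below are arbitrary.

```python
import numpy as np

def kappa21(k, b):
    # original expression from Proposition C.6
    t = (1 - b**k) / (1 - b)
    y = (1 - k * b**(k - 1) + (k - 1) * b**k) / (1 - b)**2
    s = b**k
    v = t**2 - y
    disc = (-4 * s + 5) * v**2 + y**2 + 2 * (-2 * s**2 + 2 * s + 1) * v * y
    return ((2 * s**2 - 2 * s - 1) * v - y + np.sqrt(disc)) / (2 * v**2)

def kappa21_stable(k, b):
    # rewriting of Remark C.7
    t = (1 - b**k) / (1 - b)
    y = (1 - k * b**(k - 1) + (k - 1) * b**k) / (1 - b)**2
    s = b**k
    v = t**2 - y
    disc = (-4 * s + 5) * v**2 + y**2 + 2 * (-2 * s**2 + 2 * s + 1) * v * y
    d = k - (k + 1) * b + b**(k + 1)
    return (b * (1 - b)**2 * (b**k - 1) / d
            + 2 * (1 - b**k + b * (1 - b)**2 * (1 - b**k) * y / d) / (y + v + np.sqrt(disc)))

for k in (1, 2, 3, 5, 8):
    for b in (-0.7, -0.3, 0.2, 0.5, 0.9):
        assert np.isclose(kappa21(k, b), kappa21_stable(k, b))
# stable form reproduces the limit (1 - b)^2 for large k
assert np.isclose(kappa21_stable(60, 0.5), (1 - 0.5)**2, atol=1e-6)
```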
C.3 Comparison of the bounds for the descent step
In summary, in the scalar case, the necessary and sufficient convergence conditions on the descent step τ > 0 are
τ < 2(1−b)²/(h²m²), τ < (1−b)²/(h²m²), τ < η(k, b)/(h²m²), τ < κ(k, b)/(h²m²),
respectively for usual GD, shifted GD, k-step one-shot (with η(k, b) given in Proposition C.4) and shifted k-step one-shot (with κ(k, b) given in Proposition C.6). By taking m = h = 1, in Figure 5 we plot, for different k, the functions b ↦ 2(1−b)² (usual GD), b ↦ (1−b)² (shifted GD), b ↦ η(k, b) (k-step one-shot) and b ↦ κ(k, b) (shifted k-step one-shot).
From these plots we can draw two important conclusions. First, when k increases, the curves for k-step one-shot and shifted k-step one-shot tend to the corresponding curves for usual and shifted gradient descent, as expected. Second, even in this scalar case, it appears difficult to derive from Propositions C.4 and C.6 a simplified expression for η(k, b) and κ(k, b) that would provide a practical upper bound for the descent step τ.
Remark C.8. For k ≥ 2, we observe that for some b the admissible range of τ for k-step one-shot is larger than that of usual GD, which is not intuitive. This is indeed confirmed numerically using FreeFEM: when b = 0.2 and τ = 2.08, 2-step one-shot converges while usual GD does not.
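The observation of Remark C.8 can also be reproduced directly with the scalar iteration matrices (73) and (75), taking h = m = 1:

```python
import numpy as np

h, m, b, tau, k = 1.0, 1.0, 0.2, 2.08, 2

def rho(A):
    return max(abs(np.linalg.eigvals(A)))

# usual gradient descent, iteration matrix (73)
GD = np.array([[-h**2 * m**2 * (1 - b)**-2 * tau, 0., h**2 * m * (1 - b)**-2],
               [-m**2 * (1 - b)**-1 * tau,        0., m * (1 - b)**-1],
               [-m * tau,                         0., 1.]])

# 2-step one-shot, iteration matrix (75) with t_2 = 1 + b, u_2 = 2 h^2 b, x_2 = h^2
t2, u2, x2 = 1 + b, 2 * h**2 * b, h**2
OS = np.array([[b**k - m**2 * x2 * tau, u2,   m * x2],
               [-m**2 * t2 * tau,       b**k, m * t2],
               [-m * tau,               0.,   1.]])

assert rho(GD) >= 1    # usual GD diverges for tau = 2.08
assert rho(OS) < 1     # 2-step one-shot converges
```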
D A proof of Lemma C.1 based on Marden’s works
Definition D.1. We say that a complex-coefficient polynomial has property P if all its zeros lie (strictly) inside the unit circle |z| = 1.
We recall some definitions from Marden’s works [17].
Definition D.2. Let P(z) = a₀ + a₁z + ... + a_nz^n, where a_k ∈ ℝ, k = 0, ..., n (we do not require a_n ≠ 0 here). We define
P̃(z) := a_n + a_{n−1}z + ... + a₀z^n
and call it the reverse polynomial of P. One can also see that P̃(z) = z^nP(1/z).
Definition D.3. Let P(z) = a₀ + a₁z + ... + a_nz^n, where a_k ∈ ℝ, k = 0, ..., n. We define a sequence of polynomials {P_k}_{0≤k≤n}, where
P_k(z) = a_0^(k) + a_1^(k)z + ... + a_{n−k}^(k)z^{n−k},
as follows:
• P₀ = P;
• P_{k+1} = a_0^(k)P_k − a_{n−k}^(k)P̃_k for 0 ≤ k ≤ n − 1.
Then we define
m_k(P) = a_0^(1)a_0^(2)···a_0^(k), 1 ≤ k ≤ n.
The coefficients of these polynomials can be gathered in the following table, which we call Marden's table:
[Figure 5: Admissible τ in the scalar case as a function of b. Panels: (a) shifted 1-step one-shot; (b) 1-step one-shot; (c) shifted k-step one-shot, odd k ≥ 3; (d) k-step one-shot, odd k ≥ 3; (e) shifted k-step one-shot, even k ≥ 2; (f) k-step one-shot, even k ≥ 2.]
         1            x            x²           ...   x^{n−1}      x^n
P₀       a₀           a₁           a₂           ...   a_{n−1}      a_n
P̃₀       a_n          a_{n−1}      a_{n−2}      ...   a₁           a₀
P₁       a_0^(1)      a_1^(1)      a_2^(1)      ...   a_{n−1}^(1)
P̃₁       a_{n−1}^(1)  a_{n−2}^(1)  a_{n−3}^(1)  ...   a_0^(1)
...
P_{n−1}  a_0^(n−1)    a_1^(n−1)
P̃_{n−1}  a_1^(n−1)    a_0^(n−1)
P_n      a_0^(n)
We have a simple criterion, mainly based on the works of Marden [16, 17] and Jury [13, 14], known as the Jury-Marden criterion:
Theorem D.4 (Jury-Marden criterion). The polynomial P has property P if and only if
a_0^(1) < 0 and a_0^(k) > 0 for all 2 ≤ k ≤ n.
This necessary and sufficient condition is mentioned several times in the literature (see e.g. [1, Theorem 3.10]), but it is not easy to find an explicit proof, so we provide one for the reader's convenience. Before proving this result, we apply the Jury-Marden criterion to a polynomial of degree 3 and obtain precisely Lemma C.1, that is, the following proposition.
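Marden's algorithm of Definition D.3 and the Jury-Marden criterion are straightforward to implement; the following sketch cross-checks the criterion against a direct root computation on random polynomials (coefficient ranges and the guard on the leading coefficient are arbitrary choices).

```python
import numpy as np

def marden_first_coeffs(a):
    """a = [a_0, ..., a_n]; return [a_0^(1), ..., a_0^(n)] of Definition D.3."""
    p = list(a)
    out = []
    while len(p) > 1:
        rev = p[::-1]                                    # coefficients of the reverse polynomial
        # P_{k+1} = a_0^(k) P_k - a_{n-k}^(k) reverse(P_k); the leading coefficient cancels
        p = [p[0] * p[j] - p[-1] * rev[j] for j in range(len(p) - 1)]
        out.append(p[0])
    return out

def jury_marden(a):
    c = marden_first_coeffs(a)
    return c[0] < 0 and all(x > 0 for x in c[1:])

rng = np.random.default_rng(5)
for _ in range(1000):
    n = int(rng.integers(2, 5))
    a = rng.uniform(-2, 2, n + 1)
    if abs(a[-1]) <= 0.1:
        a[-1] = 1.0                                      # keep the leading coefficient away from 0
    inside = max(abs(np.roots(a[::-1]))) < 1             # np.roots expects descending coefficients
    assert jury_marden(list(a)) == inside
```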
Proposition D.5. Let P(z) = a₀ + a₁z + a₂z² + z³, z ∈ ℂ, where a₀, a₁, a₂ ∈ ℝ. Then P has property P if and only if
(a₀ − 1)(a₀ + 1) < 0,
(a₀² − a₂a₀ + a₁ − 1)(a₀² + a₂a₀ − a₁ − 1) > 0,
(a₀ + a₂ − a₁ − 1)(a₀ + a₂ + a₁ + 1) < 0.
Proof. By directly applying Marden's algorithm to P, we obtain Marden's table as follows:

       1                           x                          x²         x³
P₀=P   a₀                          a₁                         a₂         1
P̃₀     1                           a₂                         a₁         a₀
P₁     a₀² − 1                     a₁a₀ − a₂                  a₂a₀ − a₁
P̃₁     a₂a₀ − a₁                   a₁a₀ − a₂                  a₀² − 1
P₂     (a₀²−1)² − (a₂a₀−a₁)²       (a₁a₀−a₂)(a₀²−a₂a₀+a₁−1)
P̃₂     (a₁a₀−a₂)(a₀²−a₂a₀+a₁−1)    (a₀²−1)² − (a₂a₀−a₁)²

and
P₃(x) = [(a₀²−1)² − (a₂a₀−a₁)²]² − (a₁a₀−a₂)²(a₀²−a₂a₀+a₁−1)².
Hence
a_0^(1) = a₀² − 1 = (a₀ − 1)(a₀ + 1),
a_0^(2) = (a₀²−1)² − (a₂a₀−a₁)² = (a₀² − a₂a₀ + a₁ − 1)(a₀² + a₂a₀ − a₁ − 1),
a_0^(3) = [(a₀²−1)² − (a₂a₀−a₁)²]² − (a₁a₀−a₂)²(a₀²−a₂a₀+a₁−1)²
= [(a₀² + a₂a₀ − a₁ − 1)² − (a₁a₀−a₂)²](a₀² − a₂a₀ + a₁ − 1)²
= [a₀² + (a₂−a₁)a₀ + a₂ − a₁ − 1][a₀² + (a₂+a₁)a₀ − a₂ − a₁ − 1](a₀² − 1 − a₂a₀ + a₁)²
= (a₀ + 1)(a₀ + a₂ − a₁ − 1)(a₀ − 1)(a₀ + a₂ + a₁ + 1)(a₀² − 1 − a₂a₀ + a₁)².
Then the condition a_0^(1) < 0, a_0^(2) > 0, a_0^(3) > 0, after simplification, is equivalent to the three inequalities of the statement.
Now, to prove Jury-Marden Criterion, we need the following two results.
Theorem D.6 (Marden, [17], Theorem 42.1). Let $P$ be a real-coefficient polynomial of degree $n$. If the sequence
$$m_1(P),\, m_2(P),\, \dots,\, m_n(P)$$
has exactly $p$ negative elements and $n - p$ positive elements (hence no null elements), then $P$ has $p$ complex roots (counted with multiplicity) inside the unit circle $|z| = 1$, no roots on this circle, and $n - p$ complex roots (counted with multiplicity) outside this circle.
Lemma D.7 (Schur, [20]). Let $P(z) = a_0 + a_1 z + \dots + a_n z^n$ where $a_k \in \mathbb{R}$ for all $0 \le k \le n$. Assume that $|a_0| < |a_n|$. Then $\deg \tilde{P}_1 = n - 1$, and $P$ has property P if and only if $\tilde{P}_1$ has property P.
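Schur's lemma can also be illustrated numerically (our sketch, not the report's; `schur_reduce` is a hypothetical helper building $\tilde{P}_1$ as the reversal of $P_1$, with the Marden step used in this appendix). Random quartics are drawn with $|a_0| < |a_4|$ so that the lemma applies, and samples too close to the unit circle are skipped:

```python
import numpy as np

def schur_reduce(coeffs):
    # One Schur reduction: compute P_1 from P (coefficients lowest degree first),
    # then reverse it to obtain tilde-P_1.
    b = coeffs
    m = len(b) - 1
    p1 = [b[0] * b[j] - b[m] * b[m - j] for j in range(m)]
    return p1[::-1]

rng = np.random.default_rng(1)
for _ in range(300):
    # Random quartic with |a0| < |a4|, so deg tilde-P_1 = 3 and the lemma applies.
    a = list(rng.uniform(-1, 1, size=4)) + [2.0]
    red = schur_reduce(a)
    r_P = np.max(np.abs(np.roots(a[::-1])))      # highest degree first
    r_red = np.max(np.abs(np.roots(red[::-1])))
    if min(abs(r_P - 1), abs(r_red - 1)) > 1e-3:
        assert (r_P < 1) == (r_red < 1)
```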
Proof of the Jury–Marden criterion (Theorem D.4). The sufficiency of the condition for $P$ to have property P is a direct consequence of Marden's Theorem D.6. It remains to prove its necessity.

For that, we prove the following statement $M(n)$ by induction: "For every real-coefficient polynomial $P$ of degree $n$ having property P, the sequence $a^{(1)}_0, \dots, a^{(n)}_0$ obtained by Marden's algorithm must satisfy
$$a^{(1)}_0 < 0; \qquad a^{(k)}_0 > 0, \quad \forall\, 2 \le k \le n.\text{''}$$
To check $M(1)$, let $P(z) = a_0 + a_1 z$ where $a_0, a_1 \in \mathbb{R}$, $a_1 \neq 0$. Then $P(z) = 0 \Leftrightarrow z = -a_0/a_1$, and $|-a_0/a_1| < 1 \Leftrightarrow |a_0| < |a_1| \Leftrightarrow a^{(1)}_0 = a_0^2 - a_1^2 < 0$.
Now, supposing that $M(n-1)$ is true for some $n \in \mathbb{N}$, $n \ge 2$, we show that $M(n)$ is true. Let $P(z) = a_0 + a_1 z + \dots + a_n z^n$ where $a_k \in \mathbb{R}$, $k = 0, \dots, n$, and $a_n \neq 0$. Assume that $P$ has property P. First, $a^{(1)}_0 = a_0^2 - a_n^2 < 0$. Indeed, let $z_1, z_2, \dots, z_n$ be the $n$ zeros (counted with multiplicity) of $P$; then by Viète's formulas $z_1 z_2 \cdots z_n = (-1)^n (a_0/a_n)$. Taking the modulus of both sides of this identity and noting that $P$ has property P, we have $|a_0/a_n| < 1$, thus $a^{(1)}_0 = a_0^2 - a_n^2 < 0$. Next, by Lemma D.7, $\tilde{P}_1$ is of degree $n - 1$ and it also has property P. Marden's table for $\tilde{P}_1$ can be easily found:
$$
\begin{array}{l|ccccccc}
 & 1 & x & x^2 & \cdots & x^{n-3} & x^{n-2} & x^{n-1}\\\hline
\tilde{P}_1 & a^{(1)}_{n-1} & a^{(1)}_{n-2} & a^{(1)}_{n-3} & \cdots & a^{(1)}_2 & a^{(1)}_1 & a^{(1)}_0\\
P_1 & a^{(1)}_0 & a^{(1)}_1 & a^{(1)}_2 & \cdots & a^{(1)}_{n-3} & a^{(1)}_{n-2} & a^{(1)}_{n-1}\\
-P_2 & -a^{(2)}_0 & -a^{(2)}_1 & -a^{(2)}_2 & \cdots & -a^{(2)}_{n-3} & -a^{(2)}_{n-2} & \\
-\tilde{P}_2 & -a^{(2)}_{n-2} & -a^{(2)}_{n-3} & -a^{(2)}_{n-4} & \cdots & -a^{(2)}_1 & -a^{(2)}_0 & \\
P_3 & a^{(3)}_0 & a^{(3)}_1 & a^{(3)}_2 & \cdots & a^{(3)}_{n-3} & & \\
\tilde{P}_3 & a^{(3)}_{n-3} & a^{(3)}_{n-4} & a^{(3)}_{n-5} & \cdots & a^{(3)}_0 & & \\
\vdots & & & & \vdots & & & \\
P_{n-1} & a^{(n-1)}_0 & a^{(n-1)}_1 & & & & & \\
\tilde{P}_{n-1} & a^{(n-1)}_1 & a^{(n-1)}_0 & & & & & \\
P_n & a^{(n)}_0 & & & & & &
\end{array}
$$
By $M(n-1)$, we must then have $-a^{(2)}_0 < 0$ and $a^{(k)}_0 > 0$ for all $3 \le k \le n$, that is, $a^{(k)}_0 > 0$ for all $2 \le k \le n$. Together with $a^{(1)}_0 < 0$, this proves $M(n)$.
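The structure of this table can be observed on a concrete (hypothetical) integer example: the Marden sequence of $\tilde{P}_1$ is exactly $-a^{(2)}_0, a^{(3)}_0, \dots, a^{(n)}_0$. A small self-contained Python sketch (our helper names, using the Marden step of this appendix):

```python
def marden_step(b):
    # Marden step on coefficients b = [b0, ..., bm], lowest degree first.
    m = len(b) - 1
    return [b[0] * b[j] - b[m] * b[m - j] for j in range(m)]

def marden_sequence(b):
    # Constant terms of the successive polynomials P_1, P_2, ...
    seq = []
    while len(b) > 1:
        b = marden_step(b)
        seq.append(b[0])
    return seq

# Hypothetical cubic with coefficients a0..a3 = 1, 2, 3, 5:
P = [1, 2, 3, 5]
P1 = marden_step(P)          # [-24, -13, -7]
tP1 = P1[::-1]               # tilde-P_1 is the reversal of P_1
print(marden_sequence(P))    # [-24, 527, 228888]
print(marden_sequence(tP1))  # [-527, 228888] = [-a0^(2), a0^(3)]
```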
Publisher
Inria
Domaine de Voluceau - Rocquencourt
BP 105 - 78153 Le Chesnay Cedex
inria.fr