2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics October 20-23, 2013, New Paltz, NY
A FAST GRIFFIN-LIM ALGORITHM
Nathanaël Perraudin (1), Peter Balazs (2), Peter L. Søndergaard (2)
(1) EPFL, Switzerland, nathanael.perraudin@epfl.ch
(2) Acoustics Research Institute, Vienna, Austria
peter.soendergaard@oeaw.ac.at, peter.balazs@oeaw.ac.at
ABSTRACT
In this paper, we present a new algorithm to estimate a signal from
its short-time Fourier transform modulus (STFTM). This algorithm
is computationally simple and is obtained by an acceleration of
the well-known Griffin-Lim algorithm (GLA). Before deriving
the algorithm, we will give a new interpretation of the GLA
and formulate the phase recovery problem in an optimization
form. We then present experimental results in which the new
algorithm is tested on various signals. It not only converges
significantly faster but also recovers the signals with a smaller
error than the traditional GLA.
Index Terms—Magnitude-only reconstruction, Short-time
Fourier transform, Phase reconstruction, time-scale modification
(TSM), signal estimation, spectrogram inversion
I. INTRODUCTION
Time-frequency representations, in particular Gabor transforms
[1], i.e. the sampled Short-Time Fourier transforms (STFT), are
ubiquitous in signal processing. Gabor transforms describe a signal
in time and frequency simultaneously. This transformation is fast
(thanks to the Fast Fourier transform (FFT)) and provides a good
tool for signal modification. If the magnitude squared of the STFT
is understood to be the "localized time-frequency power spectrum",
the phase remains a complicated object which is difficult to modify
appropriately. As a consequence, most of the transformations on
the STFT work with the magnitude or the magnitude squared
(the spectrogram), leaving the phase unchanged or sometimes
dropping it completely. Since the STFT is a redundant representation,
the obtained coefficients usually do not form a valid spectrogram
(i.e. there exists no signal having exactly this spectrogram).
For instance, in the case of adaptive filtering like denoising, the
magnitude of the STFT is often modified without any modification
of the phase [2].
Furthermore, accurate reconstruction of a signal from its
spectrogram is also important. This is known as the phase recovery
problem: recovering a signal from the amplitude of some
measurements only. In the influential paper [3], it was proven that
for frames with sufficiently high redundancy, a signal can be
reconstructed from the magnitude of its frame coefficients only
(up to a global phase factor). Recent results [4] put the necessary
redundancy at $(4L-4)/L$ for a frame for $\mathbb{C}^L$.
The notion of valid spectrogram plays a very important role
in the problem: the STFT has to verify a so-called "consistency
criterion" [5], [6]. In fact, the set of complex STFT coefficients
is a proper subset of the coefficient space i.e. taking an array of
complex coefficients usually does not correspond to the STFT of
a signal. As a result, modifying the magnitude of the STFT does
not in general lead to a valid spectrogram.
Because of this difference, we consider two distinct problems,
respectively called:
• Phase recovery: constructing a signal from a valid spectrogram
and no phase information.
• STFT magnitude approximation: constructing a signal from
a non-valid STFT magnitude and possibly some starting
phase.
Both of these problems can be solved using our algorithm, which
is strongly inspired by the Griffin-Lim algorithm (GLA) [7]. The GLA
(see section IV) iteratively performs two projections. We
propose to additionally exploit the difference between two successive
iterates. Doing so, we lose all theoretical guarantees of convergence.
However, this new structure allows a more accurate and faster
convergence. We also expect it to be compatible with the GLA
modifications presented in [5], [8], [9].
After briefly presenting the Gabor transform, we will give a
new interpretation of the problem and of the GLA. This will lead to
a new algorithm, which we call the fast Griffin-Lim
algorithm (FGLA). We will then present simulation results.
It should be noted that both the GLA and the algorithm
presented in this paper can be applied to any frame, and not just
to Gabor frames. However, for clarity and simplicity, we shall
consider only the Gabor case in this paper.
II. GABOR THEORY
In this contribution, we consider Gabor systems $G(g, a, M)$
in $\mathbb{C}^L$. All signals and windows in $\mathbb{C}^L$ are considered to have
periodic boundary conditions. For $g \in \mathbb{C}^L$ and integers $a, M > 0$,
we define the Gabor system
$$G(g, a, M) := \left\{ g_{m,n} = g[\cdot - na]\, e^{2\pi i m \cdot / M} \right\}_{n,m}, \qquad (1)$$
where $m = 0, \ldots, M-1$ is the index of the frequency channel
and $n = 0, \ldots, N-1$, with $N = L/a$, is the index of the time position. If $G$ is
also a frame [10], we refer to the system as a Gabor frame. For
$x \in \mathbb{C}^L$, the corresponding Gabor transform is given by
$$(Gx)[m + nM] = \langle x, g_{m,n} \rangle = \sum_{l=0}^{L-1} x[l]\, \overline{g_{m,n}[l]}, \qquad (2)$$
with the analysis operator $G$ given by the matrix
$$G[m + nM, l] := G_{g,a,M}[m + nM, l] := \overline{g_{m,n}[l]}.$$
Gabor synthesis is performed by applying the conjugate transpose
of $G$ to a coefficient sequence $c \in \mathbb{C}^{MN}$. The action of the
synthesis operator can be equivalently described as
$$x_{\mathrm{syn}}[l] = (G^* c)[l] = \sum_{m,n} c[m + nM]\, g[l - na]\, e^{2\pi i m l / M}. \qquad (3)$$
The concatenation $S = G^* G$ of the analysis and synthesis operators
is called the frame operator.
Reconstruction can be realized using the so-called canonical
dual system, obtained by inverting $S$ and defined as
$$\gamma_{m,n} = S^{-1} g_{m,n}. \qquad (4)$$
In the particular case of Gabor frames, the canonical dual system
is again a Gabor frame, i.e. it equals $G(\gamma_{0,0}, a, M)$. Thus we refer
to $\gamma_{0,0} = S^{-1} g$ as the canonical dual window.
In this case the synthesis operator of $\gamma_{0,0}$ coincides
with the pseudo-inverse of the original analysis operator, i.e.
$G^*_{\gamma,a,M} = G^\dagger$. So the inversion formula reads
$$x[l] = \sum_{m,n} \langle x, g_{m,n} \rangle\, \gamma_{m,n}[l] = (G^\dagger G x)[l]. \qquad (5)$$
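The analysis, synthesis and inversion formulas (2), (3) and (5) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the toy sizes and the test signal are hypothetical, and the pseudo-inverse is computed with the shortcut $G^\dagger = S^{-1} G^*$ for a diagonal frame operator, which holds only when the window support is at most $M$ samples (the so-called painless case). The direct triple loop is meant for small $L$; it does not reflect the fast algorithms used in practice.

```python
import cmath
import math


def atom(g, a, M, m, n, l):
    """Gabor atom g_{m,n}[l] = g[l - n*a] * exp(2*pi*i*m*l/M), periodic in l."""
    return g[(l - n * a) % len(g)] * cmath.exp(2j * cmath.pi * m * l / M)


def analysis(x, g, a, M):
    """Coefficients (Gx)[m + n*M] = <x, g_{m,n}>, as in (2)."""
    L = len(g)
    return [sum(x[l] * atom(g, a, M, m, n, l).conjugate() for l in range(L))
            for n in range(L // a) for m in range(M)]


def synthesis(c, g, a, M):
    """Synthesis (G^* c)[l], as in (3)."""
    L = len(g)
    return [sum(c[m + n * M] * atom(g, a, M, m, n, l)
                for n in range(L // a) for m in range(M))
            for l in range(L)]


def frame_diagonal(g, a, M):
    """Diagonal of S = G^* G. Assumption: supp(g) <= M, so S is diagonal."""
    L = len(g)
    return [M * sum(abs(g[(l - n * a) % L]) ** 2 for n in range(L // a))
            for l in range(L)]


def pinv_synthesis(c, g, a, M):
    """G^dagger c = S^{-1} G^* c, valid when S is diagonal."""
    d = frame_diagonal(g, a, M)
    return [v / dl for v, dl in zip(synthesis(c, g, a, M), d)]


# Toy setup (illustrative values only): L = 16, hop a = 2, M = 8 channels,
# periodic Hann window supported on the first M samples.
L, a, M = 16, 2, 8
g = [0.5 - 0.5 * math.cos(2 * math.pi * k / M) if k < M else 0.0
     for k in range(L)]
x = [math.sin(0.7 * l) + 0.3 * l for l in range(L)]

c = analysis(x, g, a, M)
x_rec = pinv_synthesis(c, g, a, M)          # x = G^dagger G x, as in (5)
max_err = max(abs(u - v) for u, v in zip(x, x_rec))
```

Here `max_err` measures how well (5) holds numerically; for this toy frame it should be at the level of rounding errors.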
A particular way to modify the coefficients is by multiplication
by a fixed symbol $s$:
$$Mx = \sum_{m,n} s_{m,n}\, \langle x, g_{m,n} \rangle\, \gamma_{m,n}.$$
Such operators, called multipliers, can be defined for all kinds of
frames [11], and find applications in acoustics, see e.g. [12].
III. THE PROBLEM
The problem can be expressed as finding a signal $x^* \in \mathbb{R}^L$ (or
more generally in $\mathbb{C}^L$) from a given set of non-negative
coefficients $s$, such that the magnitude of the STFT of $x^*$, $|Gx^*|$,
is as close as possible to $s$. The $\ell_2$-norm will be used as the measure
of closeness. Mathematically, we formulate the problem in the
following form:
Given a frame $G$ and real positive coefficients $s = |s|$, $x^*$ is
the solution of
$$\operatorname*{minimize}_{x \in \mathbb{R}^L} \; \big\| |Gx| - s \big\|_2 \qquad (6)$$
We will call $s$ a valid STFT magnitude if there exists an $x$ such
that $|Gx| = s$.
For convenience, we define an equivalent problem with the
optimization variable on the coefficient side:
$$\operatorname*{minimize}_{c \in \mathbb{C}^{MN}} \; \big\| |c| - s \big\|_2 \quad \text{s.t.} \quad \exists\, x \in \mathbb{R}^L \text{ with } c = Gx \qquad (7)$$
These definitions lead naturally to a measure of error:
$$E(x) = \frac{\big\| |Gx| - s \big\|_2}{\| s \|_2} \qquad (8)$$
For convenience, instead of (8), we use the signal-to-noise ratio
of the STFT magnitude (SSNR), which can be expressed as
$$\mathrm{SSNR}(x) = -10 \log_{10} (E(x)) \qquad (9)$$
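As a small illustration, the error measure (8) and the SSNR (9) can be evaluated as follows. The sketch takes the coefficients $c = Gx$ directly instead of the signal $x$, so that it stays independent of any particular frame implementation; the function names and the toy values are hypothetical.

```python
import math


def relative_error(c, s):
    """E = || |c| - s ||_2 / || s ||_2 for c = Gx and target magnitudes s, as in (8)."""
    num = math.sqrt(sum((abs(ck) - sk) ** 2 for ck, sk in zip(c, s)))
    den = math.sqrt(sum(sk ** 2 for sk in s))
    return num / den


def ssnr(c, s):
    """SSNR = -10 log10(E), as in (9); reported as infinity for an exact match."""
    e = relative_error(c, s)
    return math.inf if e == 0 else -10 * math.log10(e)


# Hypothetical toy values: two coefficients and their target magnitudes.
c = [3 + 4j, 0j]
s = [5.0, 1.0]
print(relative_error(c, s), ssnr(c, s))
```

A larger SSNR means a better match; the guard for `e == 0` is a design choice of this sketch, since (9) is undefined for a perfect reconstruction.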
IV. THE GRIFFIN-LIM ALGORITHM (GLA)
The GLA (named after its authors) was presented in 1984
in [7]. It aims at estimating a signal from its modified short-time
Fourier transform. The GLA is a version of the double-projection
algorithm originally suggested by Gerchberg and Saxton [13] for
solving the phase recovery problem in terms of the Fourier transform.
The Gerchberg-Saxton algorithm works for a non-redundant system
(the Fourier transform) by considering additional side constraints
to make the solution unique. The GLA, on the other hand,
works for redundant systems without any side constraints, where
the uniqueness of the solution comes from the redundancy.
The GLA proceeds by projecting a signal iteratively onto two
different sets in $\mathbb{C}^{\frac{L}{a} \times M}$, denoted by $C_1$ and $C_2$.
$C_1$ is the set of admissible points for problem (7). It is also the
set of coefficients $c$ that can be reached from $x \in \mathbb{R}^L$ through the
frame $G$, i.e. the range of $G$:
$$C_1 = \{ c \mid \exists\, x \in \mathbb{R}^L \text{ s.t. } c = Gx \} \qquad (10)$$
This meets the hard constraint of problem (7). Note that $C_1$ is the
set that satisfies the consistency criterion [14]. By [10] we can
express the projection in the following way:
$$P_{C_1}(c) = G G^\dagger c \qquad (11)$$
Let $C_2$ be the set of coefficients minimizing (7) without
necessarily satisfying the hard constraint. It is simply given by
$$C_2 = \left\{ c \in \mathbb{C}^{MN} \;\middle|\; |c| = s \right\}.$$
The projection onto $C_2$ simply amounts to forcing the magnitude
of $c$ to be $s$ elementwise while keeping its phase:
$$P_{C_2}(c) = s \cdot e^{i \angle c}. \qquad (12)$$
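The two projections can be sketched as follows. To keep the example short, the consistency projection (11) is written for arbitrary analysis and pseudo-inverse callables and is demonstrated on a deliberately trivial redundant frame that stacks the signal twice; this toy frame is an assumption made for illustration and is not a Gabor frame. All function names are hypothetical.

```python
import cmath


def project_c2(c, s):
    """P_C2: keep the phase of each coefficient and force its magnitude to s, as in (12)."""
    return [sk * cmath.exp(1j * cmath.phase(ck)) for ck, sk in zip(c, s)]


def project_c1(c, analysis, pinv):
    """P_C1 = G G^dagger: apply the pseudo-inverse, then re-analyze, as in (11)."""
    return analysis(pinv(c))


# Toy redundant frame (illustrative assumption): Gx stacks x twice, so the
# pseudo-inverse averages the two halves of the coefficient vector.
def toy_analysis(x):
    return list(x) + list(x)


def toy_pinv(c):
    half = len(c) // 2
    return [(u + v) / 2 for u, v in zip(c[:half], c[half:])]


c = [1 + 1j, -2j, 3 + 0j, 1j]
s = [2.0, 1.0, 1.0, 4.0]

on_c2 = project_c2(c, s)                        # magnitudes equal s, phases kept
on_c1 = project_c1(c, toy_analysis, toy_pinv)   # both halves now agree
```

In this toy frame a coefficient vector is consistent exactly when its two halves are equal, which makes the effect of $P_{C_1}$ easy to inspect by hand.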
The GLA can now be formulated (cf. Algorithm 1).
Algorithm 1 Griffin-Lim algorithm (GLA)
Fix the initial phase $\angle c_0$
Initialize $c_0 = s \cdot e^{i \angle c_0}$
Iterate for $n = 1, 2, \ldots$
    $c_n = P_{C_1}(P_{C_2}(c_{n-1}))$
Until convergence
$x^* = G^\dagger c_n$
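A minimal Python sketch of Algorithm 1 is given below. It rests on several assumptions that are not taken from the source: the signal is allowed to be complex (the more general case $x \in \mathbb{C}^L$), the initial phase is zero, the stopping rule is a fixed iteration count, and the pseudo-inverse uses the diagonal frame operator of the painless case (window support at most $M$). Function names and toy parameters are hypothetical.

```python
import cmath
import math


def atom(g, a, M, m, n, l):
    """Gabor atom g_{m,n}[l] with periodic boundary conditions."""
    return g[(l - n * a) % len(g)] * cmath.exp(2j * cmath.pi * m * l / M)


def analysis(x, g, a, M):
    """(Gx)[m + n*M] = <x, g_{m,n}>."""
    L = len(g)
    return [sum(x[l] * atom(g, a, M, m, n, l).conjugate() for l in range(L))
            for n in range(L // a) for m in range(M)]


def pinv(c, g, a, M):
    """G^dagger c = S^{-1} G^* c; S is diagonal because supp(g) <= M (assumption)."""
    L = len(g)
    N = L // a
    out = []
    for l in range(L):
        d = M * sum(abs(g[(l - n * a) % L]) ** 2 for n in range(N))
        v = sum(c[m + n * M] * atom(g, a, M, m, n, l)
                for n in range(N) for m in range(M))
        out.append(v / d)
    return out


def project_c2(c, s):
    """Force the magnitudes to s while keeping the phases."""
    return [sk * cmath.exp(1j * cmath.phase(ck)) for ck, sk in zip(c, s)]


def error(c, s):
    """Relative error || |c| - s ||_2 / || s ||_2."""
    num = math.sqrt(sum((abs(ck) - sk) ** 2 for ck, sk in zip(c, s)))
    return num / math.sqrt(sum(sk ** 2 for sk in s))


def griffin_lim(s, g, a, M, n_iter):
    """Algorithm 1 with zero initial phase; returns the signal and the error history."""
    c = [complex(sk) for sk in s]                 # c_0 = s * exp(i * 0)
    history = []
    for _ in range(n_iter):
        x = pinv(project_c2(c, s), g, a, M)
        c = analysis(x, g, a, M)                  # c_n = P_C1(P_C2(c_{n-1}))
        history.append(error(c, s))
    return pinv(c, g, a, M), history              # x* = G^dagger c_n


# Toy phase recovery problem (illustrative values only).
L, a, M = 16, 2, 8
g = [0.5 - 0.5 * math.cos(2 * math.pi * k / M) if k < M else 0.0
     for k in range(L)]
x_true = [math.sin(0.7 * l) + 0.5 * math.cos(1.9 * l) for l in range(L)]
s = [abs(ck) for ck in analysis(x_true, g, a, M)]
x_est, history = griffin_lim(s, g, a, M, n_iter=30)
```

With an exact projection $P_{C_1}$, the recorded error cannot increase from one iteration to the next, which is a convenient sanity check for an implementation.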
Improvements of the GLA can be found in the literature. In [5],
an approximate way to perform the projection $P_{C_1}$ is proposed. As
the projection operator is highly structured, it is normally applied
using a fast algorithm, and this structure cannot be exploited
in the approximation. We have therefore chosen not to use this
approximation in this paper.
In [15], [8] the Real-Time Iterative Spectrogram Inversion (RTISI)
algorithm, which is an extension of the GLA, was proposed.
Reconstruction is performed piece by piece, again using the GLA and a
clever starting point. A further improvement is presented in [9].
In the next section, we propose a different modification of the
GLA. It should be possible to combine both modifications into
one algorithm; however, the detailed analysis of this is beyond the
scope of this contribution.
V. THE FAST GRIFFIN LIM ALGORITHM (FGLA)
Equations (6) and (7) define the problem in an optimization
form. However, classic optimization algorithms cannot easily reach
a solution since neither (6) nor (7) is convex. Phase recovery
was recently expressed as a convex optimization problem in [16],
[17]. However, the heavy computational cost of this
method currently makes it unsuitable for long signals (i.e. $L > 128$). In this
contribution, we instead propose to search for the solution of the
non-convex problem (7). In fact, we need to find the intersection
of the two sets $C_1$ and $C_2$. Iterative projections would converge
to an optimal solution if both sets were convex. Our idea is
to make larger steps at each iteration. To do so, we will use the
information available from the previous iterations.
More precisely, we will replace the update rule of the GLA
$$c_n = P_{C_1}(P_{C_2}(c_{n-1})) \qquad (13)$$
by
$$c_n = P_{C_1}(P_{C_2}(c_{n-1} + \alpha_n \Delta c_{n-1})) \qquad (14)$$
where $\Delta c_n = c_n - c_{n-1}$. At convergence, (14) and (13) are
equivalent, since $\Delta c_{n-1}$ vanishes. However, (14) converges to the
solution faster in practice. Indeed, the term $\alpha_n \Delta c_{n-1}$ enlarges
the step depending on the values of the current iterates.
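For a constant $\alpha$, the update (14) can be read as an extrapolation followed by one ordinary GLA step. The following short derivation is a sketch of that bookkeeping; the auxiliary symbol $y_{n-1}$ is introduced here for illustration only.

```latex
\begin{align*}
y_{n-1} &= c_{n-1} + \alpha\,(c_{n-1} - c_{n-2})
        && \text{(extrapolation along the last step } \Delta c_{n-1}\text{)} \\
c_n     &= P_{C_1}\bigl(P_{C_2}(y_{n-1})\bigr)
        && \text{(one GLA step applied to } y_{n-1}\text{)}
\end{align*}
```

Setting $\alpha = 0$ gives $y_{n-1} = c_{n-1}$ and recovers (13). At a fixed point, $c_{n-1} = c_{n-2}$, so the extrapolation vanishes and both update rules coincide.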
A similar trick is used in the algorithm called "FISTA" (fast
iterative shrinkage-thresholding algorithm) [18], which speeds up
the algorithm "ISTA". For that method, the authors provide the optimal
sequence of $\alpha_n$ that optimizes the convergence. In our case, the
computation of such a sequence remains an open question, due
to the non-convexity of our problem. Thus, in the following, we
have considered the simple case of $\alpha$ being a constant.
Using (14), we define Algorithm 2, called the Fast Griffin-Lim
algorithm (FGLA). We observe that the heavy part of the
computation takes place in the projection $P_{C_1}$, which happens
only once per iteration in both algorithms. Hence we assume the
computational cost per iteration to be equivalent in both algorithms.
Since the projection only involves pure Gabor analysis and synthesis,
efficient algorithms [19] for these operators can be used.
Note that changing the update rule removes all theoretical
guarantees of convergence. This is another open issue of this
contribution.
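Algorithm 2 Fast Griffin-Lim algorithm (FGLA)
Fix the initial phase $\angle c_0$
Initialize $c_0 = s \cdot e^{i \angle c_0}$, $t_0 = P_{C_2}(P_{C_1}(c_0))$
Iterate for $n = 1, 2, \ldots$
    $t_n = P_{C_1}(P_{C_2}(c_{n-1}))$
    $c_n = t_n + \alpha_n (t_n - t_{n-1})$
    Update $\alpha_n$
Until convergence
$x^* = G^\dagger c_n$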
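Algorithm 2 can be sketched in Python along the same lines. As before, this is an illustration under assumptions that do not come from the source: a constant $\alpha$ (so the "Update $\alpha_n$" step is empty), zero initial phase, complex-valued signals, a fixed number of iterations, and a pseudo-inverse based on the diagonal frame operator of the painless case. Function names and toy parameters are hypothetical.

```python
import cmath
import math


def atom(g, a, M, m, n, l):
    """Gabor atom g_{m,n}[l] with periodic boundary conditions."""
    return g[(l - n * a) % len(g)] * cmath.exp(2j * cmath.pi * m * l / M)


def analysis(x, g, a, M):
    """(Gx)[m + n*M] = <x, g_{m,n}>."""
    L = len(g)
    return [sum(x[l] * atom(g, a, M, m, n, l).conjugate() for l in range(L))
            for n in range(L // a) for m in range(M)]


def pinv(c, g, a, M):
    """G^dagger c = S^{-1} G^* c; S is diagonal because supp(g) <= M (assumption)."""
    L = len(g)
    N = L // a
    out = []
    for l in range(L):
        d = M * sum(abs(g[(l - n * a) % L]) ** 2 for n in range(N))
        v = sum(c[m + n * M] * atom(g, a, M, m, n, l)
                for n in range(N) for m in range(M))
        out.append(v / d)
    return out


def project_c1(c, g, a, M):
    """P_C1(c) = G G^dagger c."""
    return analysis(pinv(c, g, a, M), g, a, M)


def project_c2(c, s):
    """P_C2(c) = s * exp(i * angle(c))."""
    return [sk * cmath.exp(1j * cmath.phase(ck)) for ck, sk in zip(c, s)]


def fast_griffin_lim(s, g, a, M, alpha, n_iter):
    """Algorithm 2 with zero initial phase and a constant alpha."""
    c = [complex(sk) for sk in s]                          # c_0
    t_prev = project_c2(project_c1(c, g, a, M), s)         # t_0
    for _ in range(n_iter):
        t = project_c1(project_c2(c, s), g, a, M)          # t_n
        c = [tk + alpha * (tk - pk) for tk, pk in zip(t, t_prev)]  # c_n
        t_prev = t
    return pinv(c, g, a, M)                                # x* = G^dagger c_n


def relative_error(x, s, g, a, M):
    """E(x) = || |Gx| - s ||_2 / || s ||_2."""
    c = analysis(x, g, a, M)
    num = math.sqrt(sum((abs(ck) - sk) ** 2 for ck, sk in zip(c, s)))
    return num / math.sqrt(sum(sk ** 2 for sk in s))


# Toy phase recovery problem (illustrative values only).
L, a, M = 16, 2, 8
g = [0.5 - 0.5 * math.cos(2 * math.pi * k / M) if k < M else 0.0
     for k in range(L)]
x_true = [math.sin(0.7 * l) + 0.5 * math.cos(1.9 * l) for l in range(L)]
s = [abs(ck) for ck in analysis(x_true, g, a, M)]
x_fgla = fast_griffin_lim(s, g, a, M, alpha=0.99, n_iter=30)
err_fgla = relative_error(x_fgla, s, g, a, M)
```

Calling `fast_griffin_lim` with `alpha=0.0` reduces the loop to the plain GLA update, which gives a simple way to compare both algorithms with the same code path.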
VI. NUMERICAL RESULTS
In this section we present three different phase reconstruction
experiments. We remind the reader that, as the problem is
not convex, the algorithms will most likely converge to a local
minimum that depends on the starting point.
Experiments were done using classical windows. Those presented
in this paper use a Nuttall (figure 1), a Gaussian (figure
2) and a Hann window (figure 3). We choose as frame parameters
$a = 32$, $M = 256$. This gives a redundancy of $M/a = 8$, which ensures
that all the information lies in the spectrogram [3].
Using different parameters or windows leads to similar results.
A reproducible research addendum can be downloaded at
http://unlocbox.sourceforge.net/rr/fgla/. From this archive, parameters
can easily be changed and other configurations tested.
In the first example, we aim at finding a signal from its
spectrogram (phase reconstruction). In this specific case we
know that such a signal exists. The initial phase is simply set
to zero and the number of iterations for both algorithms is fixed
to 100000. In figure 1, we observe that the FGLA not only
converges faster (better average slopes), but also reaches points with
smaller error. Note that for the signal 'bat', the new algorithm
was able to perform perfect reconstruction. This signal is very
short, only 400 samples. We also observe that, using the FGLA,
the SSNR is not strictly increasing from one iteration to the next.
However, on average, the SSNR increases.
In the second example, we start from a signal, compute the
Gabor coefficients, apply a spectrogram multiplier and reconstruct
a new signal as well as we can. In that case, signals fitting exactly
the modified STFTM usually do not exist. As a consequence, we
are looking for the signal with the best spectrogram approximation.
The applied multiplier is random. This multiplier is chosen because
it modifies the spectrogram in a significant way and, in that
case, the algorithms usually need more iterations to converge. Other
multipliers give similar results. This time, the initial phase is not
set to zero as in the previous experiment; instead, we keep the original
phase of the STFT. We fixed the maximum number of iterations
to 10000 as well. Generally, the new algorithm converges faster.
The SSNR is sometimes improved, but not very significantly.
In the third and last experiment, we analyze the effect of $\alpha$
on the FGLA. Figure 3 displays tests for various constants $\alpha$.
$\alpha = 1$ seems to be the limit of stability of the algorithm, while $\alpha = 0$
corresponds to the Griffin-Lim algorithm. Increasing $\alpha$ leads to
better results, with an optimal value close to, but not larger than, 1. As a
consequence, $\alpha = 0.99$ has been chosen for the other experiments.
Figure 1. Phase recovery problem: SSNR through iterations for the GLA
and the FGLA.
Figure 2. STFT magnitude optimization problem: SSNR through iterations
for the GLA and the FGLA.
The algorithm presented in this paper has been incorporated as
an option for the frsynabs function in the LTFAT toolbox
[20].
Figure 3. Influence of the parameter $\alpha$ on the FGLA.
VII. CONCLUSION
In this paper, we have presented the phase recovery problem
in the form of an optimization problem. This approach allows us
to give a new interpretation of the GLA and, from there, to
speed it up. We proposed an algorithm (FGLA) that is indeed
faster and also seems to converge to better points. However, any
theoretical guarantee of convergence has been lost in the process.
In practice, our algorithm can replace the GLA at a very low cost
of implementation and computation. In further research, we
will look for a convergence proof and an optimal sequence of $\alpha_n$,
and possibly merge our algorithm with the RTISI real-time GLA
algorithm.
Acknowledgment
This work was supported by the Austrian Science Fund (FWF)
START-project FLAME (“Frames and Linear Operators for Acous-
tical Modeling and Parameter Estimation”; Y 551-N13).
VIII. REFERENCES
[1] H. G. Feichtinger and T. Strohmer, Eds., Gabor Analysis and
Algorithms, Boston, 1998.
[2] P. Majdak, P. Balazs, W. Kreuzer, and M. Dörfler, “A
time-frequency method for increasing the signal-to-noise ratio
in system identification with exponential sweeps,” in
Proceedings of the 36th International Conference on Acoustics,
Speech and Signal Processing, ICASSP 2011, Prague, 2011.
[3] R. Balan, P. Casazza, and D. Edidin, “On signal reconstruc-
tion without phase,” Applied and Computational Harmonic
Analysis, vol. 20, no. 3, pp. 345–356, 2006.
[4] B. G. Bodmann and N. Hammen, “Stable phase retrieval with
low-redundancy frames,” arXiv preprint arXiv:1302.5487,
2013.
[5] J. Le Roux, H. Kameoka, N. Ono, and S. Sagayama, “Fast
signal reconstruction from magnitude STFT spectrogram based
on spectrogram consistency,” in Proc. 13th International
Conference on Digital Audio Effects (DAFx-10), 2010, pp.
397–403.
[6] J. Le Roux and E. Vincent, “Consistent Wiener filtering for
audio source separation,” Signal Processing Letters, IEEE,
vol. 20, no. 3, pp. 217–220, 2013.
[7] D. Griffin and J. Lim, “Signal estimation from modified
short-time Fourier transform,” Acoustics, Speech and Signal
Processing, IEEE Transactions on, vol. 32, no. 2,
pp. 236–243, 1984.
[8] X. Zhu, G. Beauregard, and L. Wyse, “Real-time signal
estimation from modified short-time Fourier transform magnitude
spectra,” Audio, Speech, and Language Processing, IEEE
Transactions on, vol. 15, no. 5, pp. 1645–1653, 2007.
[9] V. Gnann and M. Spiertz, “Improving RTISI Phase Estima-
tion With Energy Order and Phase Unwrapping,” in Proc.
of International Conference on Digital Audio Effects DAFx,
vol. 10, 2010.
[10] O. Christensen, An Introduction to Frames and Riesz Bases.
Birkhäuser, 2003.
[11] P. Balazs, “Basic definition and properties of Bessel multi-
pliers,” Journal of Mathematical Analysis and Applications,
vol. 325, no. 1, pp. 571–585, January 2007. [Online].
Available: http://dx.doi.org/10.1016/j.jmaa.2006.02.012
[12] P. Balazs, B. Laback, G. Eckel, and W. A. Deutsch, “Time-
frequency sparsity by removing perceptually irrelevant
components using a simple model of simultaneous masking,”
IEEE Transactions on Audio, Speech and Language
Processing, vol. 18, no. 1, pp. 34–49, 2010. [Online].
Available: http://www.kfs.oeaw.ac.at/xxl/mask/mask.pdf
[13] R. W. Gerchberg and W. O. Saxton, “A practical algorithm
for the determination of the phase from image and diffraction
plane pictures,” Optik, vol. 35, no. 2, pp. 237–250, 1972.
[14] J. Le Roux, N. Ono, and S. Sagayama, “Explicit consistency
constraints for STFT spectrograms and their application to
phase reconstruction,” Proc. SAPA, pp. 23–28, 2008.
[15] G. T. Beauregard, X. Zhu, and L. Wyse, “An efficient algo-
rithm for real-time spectrogram inversion,” in Proceedings
of the 8th International Conference on Digital Audio Effects,
2005, pp. 116–118.
[16] E. J. Candes, T. Strohmer, and V. Voroninski, “PhaseLift:
Exact and stable signal recovery from magnitude measurements
via convex programming,” Communications on Pure
and Applied Mathematics, 2012.
[17] D. L. Sun and J. O. Smith III, “Estimating a signal from
a magnitude spectrogram via convex optimization,” arXiv
preprint arXiv:1209.2076, 2012.
[18] A. Beck and M. Teboulle, “A fast iterative shrinkage-
thresholding algorithm for linear inverse problems,” SIAM
Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202,
2009.
[19] P. L. Søndergaard, “Efficient Algorithms for the Discrete
Gabor Transform with a long FIR window,” J. Fourier Anal.
Appl., vol. 18, no. 3, pp. 456–470, 2012.
[20] P. L. Søndergaard, B. Torrésani, and P. Balazs, “The Linear
Time Frequency Analysis Toolbox,” International Journal of
Wavelets, Multiresolution Analysis and Information Process-
ing, vol. 10, no. 4, 2012.