CENTER FOR COMPUTATIONAL MATHEMATICS REPORTS
University of Colorado at Denver and Health Sciences Center

EFFICIENT IMPLEMENTATION OF THE ENSEMBLE KALMAN FILTER

JAN MANDEL∗

UCDHSC/CCM Report No. 231, May 2006

∗Center for Computational Mathematics, University of Colorado at Denver and Health Sciences Center, Denver, CO 80217-3364
Abstract. We present several methods for the efficient implementation of the Ensemble Kalman
Filter (EnKF) of Evensen. It is shown that the EnKF can be implemented without access to the
observation matrix, and only an observation function is needed; this greatly simplifies software design.
New implementations of the EnKF formulas are proposed, with linear computational complexity in
the number of data points. These implementations are possible when the data covariance matrix is
easy to decompose, such as a diagonal or a banded matrix, or given in a factored form as sample
covariance. Unlike previous methods, our method for the former case uses Choleski decomposition on
a small matrix from the Sherman-Morrison-Woodbury formula instead of SVD on a large matrix, and
our method in the latter case does not impose any constraints on data randomization. One version
of the EnKF formulas was implemented in a distributed parallel environment, using SCALAPACK
and MPI.
1. Introduction. The Ensemble Kalman Filter (EnKF) is a Monte-Carlo
implementation of the Bayesian update problem: given a probability distribution
of the modeled system (the prior, often called the forecast in geosciences) and the data
likelihood, the Bayes theorem is used to obtain the probability distribution with
the data likelihood taken into account (the posterior, or the analysis). The Bayesian
update is combined with advancing the model in time, with the data incorporated
from time to time. The original Kalman Filter [14] relies on the assumption that
the probability distributions are Gaussian, and provides algebraic formulas for the
change of the mean and covariance by the Bayesian update, as well as a formula for
advancing the covariance matrix in time, provided the system is linear. However,
maintaining the covariance matrix is not computationally feasible for high dimensional
systems. For this reason, EnKFs were developed [7, 12]. EnKFs represent the distribution of the system state
using a random sample, called an ensemble, and replace the covariance matrix by the
sample covariance of the ensemble. One advantage of EnKFs is that advancing the
probability distribution in time is achieved by simply advancing each member of the
ensemble. EnKFs, however, still rely on the Gaussian assumption, though they are
of course used in practice for nonlinear problems, where the Gaussian assumption is
not satisfied. Related filters attempting to relax the Gaussian assumption in EnKF
include [3, 4, 15, 17].
This paper focuses on the Bayesian update step for the EnKF version from
[6, 8]. This filter involves randomization of data. For filters without randomization
of data, see [2, 9, 16].
The paper is organized as follows. In Sec. 2, we state the Bayesian update formulas
of the EnKF. In Sec. 3, we show how to evaluate the formulas without an observation
matrix, using only an observation function. Sec. 4 discusses several
implementations of the EnKF and their computational complexity. Finally, Sec. 5
briefly reports on the experience from a distributed parallel implementation.
2. Notation and the EnKF Formulas. We review the EnKF formulas
following [6, 8], with only one minor difference. The forecast ensemble consists of
N members, which are state vectors of dimension n. The ensemble can be written as
the n by N matrix

    X^f = [x_1, \ldots, x_N] = [x_i],

whose columns are the forecast ensemble states. The ensemble mean is

    E(X) = \frac{1}{N} \sum_{k=1}^{N} x_k,

and the ensemble covariance matrix is

    C = \frac{A A^T}{N-1},

where

    A = X - E(X) = X - \frac{1}{N} (X e_{N \times 1}) e_{1 \times N},

and e denotes the matrix of all ones of the indicated size.
The data is given as a measurement vector d of size m and an error covariance matrix
R of size m by m. The measurement matrix D of size m by N is then defined by

    D = [d_1, d_2, \ldots, d_N], \quad d_j = d + v_j, \quad v_j \sim N(0, R),

where the v_j are independent random perturbations.
The analysis ensemble is then given by

    X^a = X + C H^T (H C H^T + R)^{-1} (D - HX),    (2.1)

cf. [8, eq. (20)].
The difference between (2.1) and [8, eq. (20)] is that we use the covariance matrix
R of the measurement error rather than the sample covariance D D^T / (N-1) of the
randomized data. Because R is always positive definite, there is no difficulty with the
inverse in (2.1), unlike in [8, eq. (20)].
The analysis formula (2.1) can be rewritten, similarly as in [8, eq. (54)], as

    X^a = X + \frac{1}{N-1} A (HA)^T P^{-1} (D - HX),    (2.2)

where

    HA = HX - \frac{1}{N} ((HX) e_{N \times 1}) e_{1 \times N},    (2.3)

    P = \frac{1}{N-1} HA (HA)^T + R.    (2.4)
3. Observation Matrix-Free Method. Clearly, the matrix H in (2.2)–(2.4)
is needed only in the product HX, which needs to be computed only
once. However, it is very inconvenient to operate with the matrix H explicitly, for
several reasons:
1. An observation function h(x) that creates synthetic data from the state can
be quite complicated. Even when the observation function is affine, thus of
the form

    h(x) = Hx + f,    (3.1)

creating the matrix H and the vector f is an additional effort, which is
typically much harder than programming the observation function itself.
2. Computing the product HX takes computational resources.
3. The matrix H is typically sparse, and it can be very large. E.g., in the
quite common case when every measurement coincides with the value of
a state variable, the matrix has exactly one entry equal to one in every row, and the
rest of its entries are zeros. If the measurement is an image, the number of
measurements can be very large, but each pixel in the image is interpolated
from just a few entries of the state vector. Assimilation of image data
into a geophysical model may easily require m ≈ 10^6 and n ≈ 10^6, which
makes manipulation of the matrix H, stored as a full matrix, impossible on current
computers. So, the matrix H must be stored as sparse, which is an additional
complication.
Fortunately, creating the matrix H explicitly is not needed. Assume that we only
have access to the evaluation of h(x) in (3.1), but not to the values of H or f. To
compute the data residual, note that

    d - Hx = d + f - (Hx + f) = \tilde{d} - h(x),

where

    \tilde{d} = d + f

is the data actually given. To compute HA, note that
    [HA]_i = H x_i - H \frac{1}{N} \sum_{j=1}^{N} x_j
           = (H x_i + f) - \frac{1}{N} \sum_{j=1}^{N} (H x_j + f)
           = h(x_i) - \frac{1}{N} \sum_{j=1}^{N} h(x_j).
Consequently, the ensemble update can be computed by evaluating the observation
function h on each ensemble member once.
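As an illustration of this matrix-free evaluation, the following sketch (again illustrative, not from the report; the observation function h is a hypothetical placeholder) forms HA and the residual D - HX from calls to h alone.

```python
import numpy as np

def observation_ensemble(h, X):
    # Apply the observation function to each ensemble member: [h(x_i)].
    return np.column_stack([h(X[:, i]) for i in range(X.shape[1])])

def matrix_free_parts(h, X, D):
    # HA and Y = D - HX computed from h only, per the identities above.
    HX = observation_ensemble(h, X)
    HA = HX - HX.mean(axis=1, keepdims=True)   # h(x_i) - mean_j h(x_j)
    Y = D - HX                                 # data residuals
    return HA, Y

# Example: observations pick out the first m state variables (hypothetical h).
m = 4
h = lambda x: x[:m]
```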
4. Computational Complexity. All operations in (2.2)–(2.4) are evaluations
of the observation function h(x), cf. Sec. 3, and matrix-matrix and matrix-vector
operations, which can be implemented efficiently by calls to the LAPACK, BLAS,
and SCALAPACK libraries [1, 5]. These libraries contain routines for operations
including the Choleski decomposition of a symmetric positive definite matrix (CHOL),
L L^T = A, and the matrix multiply (GEMM), A = \alpha A + \beta B^{(T)} C^{(T)}, where ^{(T)} denotes either
a transpose or nothing, used here for rank updates. Recall that m is the number of data points, n
is the number of degrees of freedom, and N is the ensemble size, so X is n by N, HX and
HA are m by N, and R is m by m. Assume that an evaluation of h(x) costs O(m) and that
multiplication of matrices of sizes n_1 by n_2 and n_2 by n_3 costs O(n_1 n_2 n_3).
4.1. Reference Implementation. A straightforward implementation of the
formulas (2.2)–(2.4) leads to the following algorithm:
    computation                      operation         size         cost
    HX                               N times h(x)      m x N        O(mN)
    z = (HX) e_{N x 1}               matrix multiply   m x N x 1    O(mN)
    HA = HX - (1/N) z e_{1 x N}      matrix multiply   m x 1 x N    O(mN)
    Y = D - HX                       matrix add        m x N        O(mN)
    P = R + (1/(N-1)) HA (HA)^T      matrix multiply   m x N x m    O(m^2 N)
    L L^T = P                        Choleski          m            O(m^3)
    M = P^{-1} Y                     solution          m x m x N    O(m^2 N)
    Z = (HA)^T M                     matrix multiply   N x m x N    O(m N^2)
    X^a = X + (1/(N-1)) A Z          matrix multiply   n x N x N    O(n N^2)
                                                                        (4.1)

The total computational complexity of the algorithm (4.1) is

    O(m^3 + m^2 N + m N^2 + n N^2).
So, this method is suitable for a large number of degrees of freedom n, but not for a
large number of observations m.
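A direct NumPy transcription of (4.1) might look as follows; this is a sketch under the assumptions above, with scipy.linalg's cho_factor/cho_solve standing in for the LAPACK Choleski routines.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def enkf_reference(X, HX, D, R):
    # Analysis ensemble by (2.2)-(2.4); cost O(m^3 + m^2 N + m N^2 + n N^2).
    N = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)
    HA = HX - HX.mean(axis=1, keepdims=True)
    Y = D - HX
    P = R + HA @ HA.T / (N - 1)       # m x m; dominant cost for large m
    M = cho_solve(cho_factor(P), Y)   # M = P^{-1} Y via Choleski
    Z = HA.T @ M                      # N x N
    return X + A @ Z / (N - 1)        # X^a
```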
4.2. Large Number of Data Points. In practice, the number of data points
m is often large, while the data error covariance matrix R is often diagonal (when the
data errors are uncorrelated), nearly diagonal, or otherwise easy to decompose. This happens,
e.g., in the assimilation of images, or in regularized EnKF, where the gradient of fields
in the state is assimilated as an additional artificial observation [13]. In this case,
the following algorithm, which has only linear complexity in m, provides a significant
advantage. Assume that multiplication by R^{-1} is dominated by the other costs. Using
the Sherman-Morrison-Woodbury formula [11]

    (R + U V^T)^{-1} = R^{-1} - R^{-1} U (I + V^T R^{-1} U)^{-1} V^T R^{-1},

with

    U = \frac{1}{N-1} HA, \quad V = HA,

we have

    P^{-1} = \left( R + \frac{1}{N-1} HA (HA)^T \right)^{-1}
           = R^{-1} \left[ I - \frac{1}{N-1} (HA) \left( I + (HA)^T R^{-1} \frac{1}{N-1} (HA) \right)^{-1} (HA)^T R^{-1} \right].
The computation (4.1) then becomes

    computation                            operation         size         cost
    HX                                     N times h(x)      m x N        O(mN)
    z = (HX) e_{N x 1}                     matrix multiply   m x N x 1    O(mN)
    HA = HX - (1/N) z e_{1 x N}            matrix multiply   m x 1 x N    O(mN)
    Y = D - HX                             matrix add        m x N        O(mN)
    Q = I + (HA)^T R^{-1} (1/(N-1)) (HA)   matrix multiply   N x m x N    O(m N^2)
    L L^T = Q                              Choleski          N            O(N^3)
    Z = (HA)^T R^{-1} Y                    matrix multiply   N x m x N    O(m N^2)
    W = Q^{-1} Z                           solution          N x N        O(N^3)
    M = R^{-1} [Y - (1/(N-1)) (HA) W]      matrix multiply   m x N x N    O(m N^2)
    Z = (HA)^T M                           matrix multiply   N x m x N    O(m N^2)
    X^a = X + (1/(N-1)) A Z                matrix multiply   n x N x N    O(n N^2)
                                                                              (4.2)
This gives an overall complexity of O(N^3 + m N^2 + n N^2), which is suitable for large n and
large m.
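For diagonal R, the algorithm (4.2) can be sketched as below (illustrative code, not the report's implementation; R is assumed to be passed as the vector of its diagonal entries).

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def enkf_smw(X, HX, D, r_diag):
    # Analysis ensemble by (4.2) with R = diag(r_diag); linear cost in m.
    N = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)
    HA = HX - HX.mean(axis=1, keepdims=True)
    Y = D - HX
    RiHA = HA / r_diag[:, None]                   # R^{-1} (HA), O(mN)
    Q = np.eye(N) + HA.T @ RiHA / (N - 1)         # N x N, O(m N^2)
    Z = RiHA.T @ Y                                # (HA)^T R^{-1} Y
    W = cho_solve(cho_factor(Q), Z)               # Q^{-1} Z, O(N^3)
    M = (Y - HA @ W / (N - 1)) / r_diag[:, None]  # M = P^{-1} Y
    return X + A @ (HA.T @ M) / (N - 1)           # X^a
```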
4.3. Square Root Alternative. First decompose R = S S^T, e.g., by the Choleski
decomposition, so that multiplication by S^{-1} is cheap to compute. Let \tilde{U} = S^{-1} U and
\tilde{V} = S^{-1} V. Then the Sherman-Morrison-Woodbury formula becomes

    (S S^T + U V^T)^{-1} = S^{-T} \left( I - \tilde{U} ( I + \tilde{V}^T \tilde{U} )^{-1} \tilde{V}^T \right) S^{-1},

which gives, again with R = S S^T and B = S^{-1} HA,

    P^{-1} = S^{-T} \left[ I - \frac{1}{N-1} B \left( I + \frac{1}{N-1} B^T B \right)^{-1} B^T \right] S^{-1},

and one proceeds just as in Sec. 4.2. The asymptotic complexity is the same as in Sec.
4.2, but the formulas involve symmetric products of matrices, which is numerically
more stable and makes it possible to save memory by storing just one triangle.
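A sketch of this square root variant (again illustrative, assuming R is given in full and triangular solves with S are cheap):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular, cho_factor, cho_solve

def enkf_sqrt(X, HX, D, R):
    # Square root alternative: work with B = S^{-1} HA so that only the
    # symmetric product B^T B appears, as in Sec. 4.3.
    N = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)
    HA = HX - HX.mean(axis=1, keepdims=True)
    S = cholesky(R, lower=True)                    # R = S S^T
    B = solve_triangular(S, HA, lower=True)        # B = S^{-1} HA
    G = solve_triangular(S, D - HX, lower=True)    # S^{-1} (D - HX)
    Q = np.eye(N) + B.T @ B / (N - 1)              # symmetric N x N product
    W = cho_solve(cho_factor(Q), B.T @ G)          # Q^{-1} B^T G
    M = solve_triangular(S.T, G - B @ W / (N - 1), lower=False)  # M = P^{-1} Y
    return X + A @ (HA.T @ M) / (N - 1)
```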
4.4. SVD Method for Full Data Error Covariance. Assume again that
R = S S^T and that multiplication by S^{-1} is cheap to compute. Write

    P = \frac{1}{N-1} HA (HA)^T + R = S \left( \frac{1}{N-1} S^{-1} HA (HA)^T S^{-T} + I \right) S^T

and use the singular value decomposition (SVD) [10]

    S^{-1} HA = U \Sigma V^T,

where U and V are orthogonal square matrices and \Sigma = \mathrm{diag}_{m \times N}(\sigma_1, \ldots, \sigma_k) is the
diagonal matrix of size m by N with the singular values \sigma_1, \ldots, \sigma_k, k = \min\{m, N\}, on
the diagonal; then, using the orthogonality relations V^T V = I and U U^T = I, we have

    \frac{1}{N-1} S^{-1} HA (HA)^T S^{-T} + I = \frac{1}{N-1} U \Sigma \Sigma^T U^T + I = U \left( \frac{\Sigma \Sigma^T}{N-1} + I \right) U^T,

thus

    P^{-1} = S^{-T} U \, \mathrm{diag} \left( \frac{1}{\sigma_i^2/(N-1) + 1} \right) U^T S^{-1}.

The complexity is again O(m N^2), assuming that m > N and that the computation of the
SVD of a matrix of size m by N costs O(m N \min(m, N)).
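The factored application of P^{-1} can be sketched as follows (illustrative; since np.linalg.svd returns the thin factors, the identity action on directions outside the range of U is handled separately).

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def enkf_svd(X, HX, D, R):
    # SVD method of Sec. 4.4: apply P^{-1} = S^{-T} U diag(...) U^T S^{-1}.
    N = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)
    HA = HX - HX.mean(axis=1, keepdims=True)
    S = cholesky(R, lower=True)                        # R = S S^T
    B = solve_triangular(S, HA, lower=True)            # S^{-1} HA
    U, sig, _ = np.linalg.svd(B, full_matrices=False)  # thin SVD, U is m x N
    G = solve_triangular(S, D - HX, lower=True)        # S^{-1} Y
    dinv = 1.0 / (sig**2 / (N - 1) + 1.0)              # diagonal inverse entries
    # Thin U: directions outside range(U) have eigenvalue 1 and stay unchanged.
    G = G + U @ ((dinv - 1.0)[:, None] * (U.T @ G))
    M = solve_triangular(S.T, G, lower=False)          # M = P^{-1} Y
    return X + A @ (HA.T @ M) / (N - 1)
```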
4.5. SVD and Eigenvalue Methods for Sample Data Error Covariance.
It should be noted that the use of SVD in Sec. 4.4 is different from that in [8], where
spectral methods and SVD were advocated as a device to overcome the singularity of
the data sample covariance matrix R, given as R = S S^T/(N-1) with S of size m by N.
In that case, [8, eq. (56)] suggests the eigenvalue decomposition of the matrix

    P = \frac{1}{N-1} \left( HA (HA)^T + S S^T \right) = Z \Lambda Z^T,    (4.3)

which is of size m by m, but there are only N nonzero eigenvalues, i.e., diagonal entries of
\Lambda. If the matrix P is created explicitly, the cost of the decomposition (4.3) is between
O(m^2 N) and O(m^3).
However, we note that in this case P can be written as

    P = \frac{1}{N-1} F F^T, \quad F = [HA, S],    (4.4)

where F has size m by 2N. Therefore, applying the SVD to the matrix
F, we obtain F = U \Sigma V^T, and

    P^{-1} = (N-1) \, U \Sigma^{-2} U^T.    (4.5)

The matrix U is of size m by m, but only 2N of its columns enter (4.5), and SVD routines
actually return the m by 2N submatrix, at a cost of O(m N^2). The resulting
algorithm, using multiplication by the factored P^{-1} from (4.5) in (2.1), again has
cost O(N^3 + m N^2 + n N^2). [8, eq. (57)] discusses a method using SVD with the
same asymptotic cost, but it requires the data perturbations to be selected in a special way.
No such constraint is needed here.
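A sketch of this factored approach (illustrative; \Sigma^{-2} is applied only over the nonzero singular values, so the factored P^{-1} acts as a pseudoinverse when P is singular):

```python
import numpy as np

def enkf_sample_cov(X, HX, D, S):
    # Sec. 4.5: R = S S^T/(N-1), F = [HA, S], P^{-1} = (N-1) U Sigma^{-2} U^T.
    N = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)
    HA = HX - HX.mean(axis=1, keepdims=True)
    Y = D - HX
    F = np.hstack([HA, S])                               # m x 2N
    U, sig, _ = np.linalg.svd(F, full_matrices=False)    # m x 2N submatrix of U
    keep = sig > 1e-12 * sig[0]                          # drop zero singular values
    U, sig = U[:, keep], sig[keep]
    M = (N - 1) * (U @ ((U.T @ Y) / (sig**2)[:, None]))  # P^{-1} Y via (4.5)
    return X + A @ (HA.T @ M) / (N - 1)
```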
4.6. Iterative Methods. The linear system P M = Y can be solved by
conjugate gradients for the N right-hand sides. However, each iteration costs O(m N^2),
so iterative methods do not seem to be competitive.
5. Distributed Parallel Implementation. The method described in Sec. 4.2
was implemented in a distributed parallel environment using MPI and SCALAPACK.
EnKF is naturally parallel: each ensemble member can be advanced in time
independently. The linear algebra in the Bayesian update step links the ensemble
members together. The ensemble matrix X is then naturally distributed so that
each process owns a block of columns. However, such a distribution is a bottleneck to
parallelism: SCALAPACK requires that all matrices involved in an operation be
distributed on the same processor grid (though possibly with different block sizes),
and, for best performance, the processor grid should be close to a square. 1D processor
grids tend to be particularly inefficient. Therefore, the ensemble matrix must be
redistributed before the matrix linear algebra operations.
6. Acknowledgements. Section 4.4 is based on a discussion with Andrew
Knyazev and Craig Johns. The author would like to thank Craig Johns and Mingshi
Chen for useful discussions about the algebra of EnKF, and Jonathan Beezley for
reading this paper. Thanks are also due to Jonathan Beezley, Craig Douglas, Deng Li,
and Adam Zornes for contributing to an object oriented interface to SCALAPACK,
to Craig Douglas and Wei Li for assistance with MPI wrappers, and to Jonathan
Beezley for assistance with a parallel distributed implementation of EnKF on top
of SCALAPACK. This research was supported by the National Science Foundation
under the grant CNS-0325314. Computer time on IBM BlueGene/L supercomputer
was provided by NSF MRI Grants CNS-0421498, CNS-0420873, and CNS-0420985,
NSF sponsorship of the National Center for Atmospheric Research, the University of
Colorado, and a grant from the IBM Shared University Research (SUR) program.
REFERENCES

[1] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users' Guide, Society for Industrial and Applied Mathematics, Philadelphia, PA, third ed., 1999.
[2] J. L. Anderson, An ensemble adjustment Kalman filter for data assimilation, Monthly Weather Review, 129 (2001), pp. 2884–2903.
[3] J. L. Anderson and S. L. Anderson, A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts, Monthly Weather Review, 127 (1999), pp. 2741–2758.
[4] T. Bengtsson, C. Snyder, and D. Nychka, Toward a nonlinear ensemble filter for high dimensional systems, Journal of Geophysical Research - Atmospheres, 108(D24) (2003), pp. STS 2-1–10.
[5] L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley, ScaLAPACK Users' Guide, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1997.
[6] G. Burgers, P. J. van Leeuwen, and G. Evensen, Analysis scheme in the ensemble Kalman filter, Monthly Weather Review, 126 (1998), pp. 1719–1724.
[7] G. Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, Journal of Geophysical Research, 99 (C5) (1994), pp. 10143–10162.
[8] G. Evensen, The ensemble Kalman filter: Theoretical formulation and practical implementation, Ocean Dynamics, 53 (2003), pp. 343–367.
[9] ———, Sampling strategies and square root analysis schemes for the EnKF, Ocean Dynamics, 54 (2004), pp. 539–560.
[10] G. H. Golub and C. F. Van Loan, Matrix Computations, second ed., Johns Hopkins Univ. Press, 1989.
[11] W. W. Hager, Updating the inverse of a matrix, SIAM Rev., 31 (1989), pp. 221–239.
[12] P. Houtekamer and H. L. Mitchell, Data assimilation using an ensemble Kalman filter technique, Monthly Weather Review, 126 (1998), pp. 796–811.
[13] C. J. Johns and J. Mandel, A two-stage ensemble Kalman filter for smooth data assimilation, Environmental and Ecological Statistics; Conference on New Developments of Statistical Analysis in Wildlife, Fisheries, and Ecological Research, Oct 13–16, 2004, Columbia, MO; in print.
[14] R. E. Kalman, A new approach to linear filtering and prediction problems, Transactions of the ASME – Journal of Basic Engineering, Series D, 82 (1960), pp. 35–45.
[15] J. Mandel and J. D. Beezley, Predictor-corrector ensemble filters for the assimilation of sparse data into high dimensional nonlinear systems, CCM Report 232, University of Colorado at Denver and Health Sciences Center, May 2006; http://www.math.cudenver.edu/ccm/reports.
[16] M. K. Tippett, J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, Ensemble square root filters, Monthly Weather Review, 131 (2003), pp. 1485–1490.
[17] P. van Leeuwen, A variance-minimizing filter for large-scale applications, Monthly Weather Review, 131 (2003), pp. 2071–2084.