Article

Parameter estimation in an intermediate complexity Earth System Model using an ensemble Kalman filter


Abstract

We describe the development of an efficient method for parameter estimation and ensemble forecasting in climate modelling. The technique is based on the ensemble Kalman filter and is several orders of magnitude more efficient than many others which have been previously used to address this problem. As well as being theoretically (near-)optimal, the method does not suffer from the 'curse of dimensionality' and can comfortably handle multivariate parameter estimation. We demonstrate the potential of this method in identical twin testing with an intermediate complexity coupled AOGCM. The model's climatology is successfully tuned via the simultaneous estimation of 12 parameters. Several minor modifications are described by which the method was adapted to a steady state (temporally averaged) case. The method is relatively simple to implement, and with only O(50) model runs required, we believe that optimal parameter estimation is now accessible even to computationally demanding models.
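The state-augmentation idea behind this approach can be illustrated with a minimal sketch: each ensemble member carries a parameter vector, each member is mapped to a time-averaged model climatology, and the parameters are updated with an ensemble Kalman analysis. The Python/NumPy sketch below is not the authors' code; model_climatology is a hypothetical stand-in for an expensive AOGCM run, and the dimensions, observation error, and prior spread are made up for illustration. In the paper's steady-state setting the observations are time-averaged climatologies and the update is repeated with an inflated observation-error covariance; the single update below omits that refinement.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_climatology(params):
    """Hypothetical stand-in for an expensive AOGCM run that returns a
    time-averaged (steady-state) climatology for a given parameter vector."""
    A = np.array([[1.0, 0.5, -0.3],
                  [0.2, 1.5,  0.1],
                  [-0.4, 0.3, 0.8]])
    return A @ params + 0.05 * params**2      # mildly nonlinear response

n_params, n_obs, n_ens = 3, 3, 50             # O(50) members, as in the abstract
true_params = np.array([0.8, -0.5, 1.2])
obs_err = 0.05
y_obs = model_climatology(true_params) + obs_err * rng.standard_normal(n_obs)

# Prior ensemble of parameter vectors (the augmented state here is just theta)
theta = rng.normal(0.0, 1.0, size=(n_ens, n_params))

# Forecast step: one model run per ensemble member
hx = np.array([model_climatology(t) for t in theta])         # (n_ens, n_obs)

# Ensemble anomalies, parameter-observation cross-covariance, innovation covariance
Tp, Hp = theta - theta.mean(0), hx - hx.mean(0)
Pty = Tp.T @ Hp / (n_ens - 1)                                 # cov(theta, H x)
Pyy = Hp.T @ Hp / (n_ens - 1) + obs_err**2 * np.eye(n_obs)    # cov(H x) + R

# Perturbed-observation EnKF analysis applied directly to the parameters
K = Pty @ np.linalg.inv(Pyy)
y_pert = y_obs + obs_err * rng.standard_normal((n_ens, n_obs))
theta_post = theta + (y_pert - hx) @ K.T

print("posterior parameter mean:", theta_post.mean(0))
print("true parameters:        ", true_params)
```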


... Like all climate models, EcoGENIE has several parameters that have been tuned to fit climatological observations. The parameters of the oceanic component (GOLDSTEIN) were calibrated against annual mean climatological observations of temperature, salinity, surface air temperature and humidity using the ensemble Kalman filter (EnKF) methodology [Hargreaves et al., 2004; Annan et al., 2005]. Furthermore, the parameters of the ocean biogeochemistry component (BIOGEM) were optimized with respect to 3D data fields of phosphate [Conkright et al., 2002] and alkalinity [Key et al., 2004] using the EnKF methodology [Annan et al., 2005]. ...
... The parameters of the oceanic component (GOLDSTEIN) were calibrated against annual mean climatological observations of temperature, salinity, surface air temperature and humidity using the ensemble Kalman filter (EnKF) methodology [Hargreaves et al., 2004; Annan et al., 2005]. Furthermore, the parameters of the ocean biogeochemistry component (BIOGEM) were optimized with respect to 3D data fields of phosphate [Conkright et al., 2002] and alkalinity [Key et al., 2004] using the EnKF methodology [Annan et al., 2005]. After calibration, the global particulate organic carbon export, inorganic carbon export and dissolved O2 are consistent with recent data and computationally expensive 3D ocean circulation model estimates. ...
... This oceanic component includes a surface mixed-layer scheme based on that of Kraus and Turner [1967]. The parameters for GOLDSTEIN are calibrated against annual mean climatological observations of temperature, salinity, surface air temperature and humidity using the ensemble Kalman filter (EnKF) methodology [Hargreaves et al., 2004; Annan et al., 2005]. ...
Thesis
Full-text available
Marine phytoplankton are unicellular algae that form the first link of the marine food chain. They can influence the climate system via biogeochemical and biogeophysical mechanisms, especially during large blooms. Phytoplankton can absorb light at the surface of the ocean, modifying the distribution of radiative heat along the water column. These changes in the heat budget alter the oceanic properties, the atmospheric properties and finally the overall climate system. In this thesis, I investigate the role of the marine biota in the climate system by using an Earth system model of intermediate complexity called EcoGENIE. I modified the oceanic and ecosystem model components to consider phytoplankton light absorption. Over the past years, the number of plankton functional types in models has increased, but the relative importance of biological processes such as phytoplankton light absorption is still unclear. As a logical extension, I compared the relative importance of phytoplankton light absorption with an increase in marine ecosystem complexity. I show that phytoplankton light absorption increases the atmospheric CO2 concentration and the overall heat budget of the planet. In contrast, increasing ecosystem complexity only slightly affects the carbon cycle and thus the heat budget. In conclusion, phytoplankton light absorption has a higher impact on the climate system than an increase in marine ecosystem complexity. After demonstrating that phytoplankton light absorption has an impact on the climate system, I focus on the climate pathways behind the atmospheric warming due to this biogeophysical mechanism. Phytoplankton light absorption increases the oceanic temperature, with consequences for the air-sea heat and CO2 fluxes. I show that changes in air-sea CO2 exchange due to phytoplankton light absorption make a larger contribution to the atmospheric heating than phytoplankton-induced changes in air-sea heat flux. After demonstrating that phytoplankton light absorption increases the atmospheric temperature via an increase in atmospheric CO2 concentration, I explore the effects this biogeophysical mechanism would have in a warmer climate. To shed light on this question, I conduct simulations under RCPs and pre-industrial conditions. First, I show that the overall warming due to phytoplankton light absorption is smaller than the overall warming due to climate change. Secondly, chlorophyll biomass is expected to decrease under global warming, and my results indicate that phytoplankton light absorption enhances the reduction of the chlorophyll biomass. As a consequence, less heat is trapped by chlorophyll and the effect of phytoplankton light absorption on the climate system is reduced. Thirdly, I demonstrate that prescribing the atmospheric CO2 concentration in model simulations blurs the real effect of phytoplankton light absorption on the climate system. This thesis supports the idea that phytoplankton light absorption should be considered in climate studies as an internal constituent of the climate system for long-term climate adjustment.
... In these ensemble-based methods, the model state can be augmented with a set of poorly known parameters (Banks, 1992; Anderson, 2001). Updating the augmented state with observations can therefore estimate the model state and parameters simultaneously (Kivman, 2003; Annan and Hargreaves, 2004; Annan et al., 2005). In practice, PE is more difficult than state estimation (SE) because the connection between parameters and observations is indirect and often nonlinear. ...
... One of the most widely used assumptions in PE is that parameters remain unchanged during the model integration. This assumption was introduced in the early stages of PE using data assimilation methods (Evensen et al., 1998; Kivman, 2003; Annan and Hargreaves, 2004; Annan et al., 2005) and works well when the model is simple. As the model becomes more and more complicated, however, this assumption raises several issues. ...
Article
Full-text available
Parameter estimation is defined as the process to adjust or optimize the model parameter using observations. A long-term problem in ensemble-based parameter estimation methods is that the parameters are assumed to be constant during model integration. This assumption will cause underestimation of parameter ensemble spread, such that the parameter ensemble tends to collapse before an optimal solution is found. In this work, a two-stage inflation method is developed for parameter estimation, which can address the collapse of parameter ensemble due to the constant evolution of parameters. In the first stage, adaptive inflation is applied to the augmented states, in which the global scalar parameter is transformed to fields with spatial dependence. In the second stage, extra multiplicative inflation is used to inflate the scalar parameter ensemble to compensate for constant parameter evolution, where the inflation factor is determined according to the spread growth ratio of model states. The observation system simulation experiment with Community Earth System Model (CESM) shows that the second stage of the inflation scheme plays a crucial role in successful parameter estimation. With proper multiplicative inflation factors, the parameter estimation can effectively reduce the parameter biases, providing more accurate analyses.
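The second-stage inflation described above can be sketched in a few lines: because parameters are held constant during the model integration, their ensemble spread can only shrink at each analysis, so it is re-inflated about the ensemble mean by a factor tied to how much the state spread grows between analyses. The function below is a generic illustration in Python, not the CESM implementation, and the exact definition of the spread growth ratio is an assumption.

```python
import numpy as np

def inflate_parameter_ensemble(param_ens, state_spread_forecast, state_spread_analysis,
                               min_factor=1.0, max_factor=2.0):
    """Sketch of second-stage multiplicative inflation for a parameter ensemble.

    The inflation factor is taken as the ratio of the model-state spread at the
    forecast to the spread at the previous analysis (a proxy for the parameter
    evolution that the constant-parameter assumption suppresses), clipped to a
    reasonable range, and applied about the ensemble mean."""
    growth = np.clip(state_spread_forecast / state_spread_analysis,
                     min_factor, max_factor)
    mean = param_ens.mean(axis=0)
    return mean + growth * (param_ens - mean)

# toy usage: restore spread to a nearly collapsed scalar-parameter ensemble
rng = np.random.default_rng(1)
params = rng.normal(0.5, 0.02, size=(40, 1))
inflated = inflate_parameter_ensemble(params, state_spread_forecast=1.2,
                                      state_spread_analysis=0.8)
print(params.std(), inflated.std())      # spread increased by the clipped ratio 1.5
```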
... The frequent large deviations of Arctic sea ice cover from its climatology and the impact of sea ice cover on the overlying atmosphere and on ocean-atmosphere fluxes motivate including an active sea ice component in subseasonal-to-seasonal (S2S) weather forecasts (Vitart et al., 2015). The persistence and reemergence of sea ice thickness (SIT) and sea surface temperature anomalies are major sources of predictability for Arctic sea ice extent (SIE; Blanchard-Wrigglesworth et al., 2011). Previous studies have demonstrated the importance of accurate initial conditions, especially SIT, in predicting Arctic sea ice extent (Day et al., 2014). ...
... Anderson (2002) demonstrated the feasibility of updating parameters using an ensemble filter in a low-order model. Annan et al. (2005) were among the first to apply an ensemble filter to estimate parameters in a complex Earth system model. Massonnet et al. (2014) employed the ensemble Kalman filter (EnKF) in a sea ice model to estimate three parameters that control sea ice dynamics. ...
Article
Full-text available
Uncertain or inaccurate parameters in sea ice models influence seasonal predictions and climate change projections in terms of both mean and trend. We explore the feasibility and benefits of applying an ensemble Kalman filter (EnKF) to estimate parameters in the Los Alamos sea ice model (CICE). Parameter estimation (PE) is applied to the highly influential dry snow grain radius and combined with state estimation in a series of perfect model observing system simulation experiments (OSSEs). Allowing the parameter to vary in space improves performance along the sea ice edge but degrades it in the central Arctic compared to requiring the parameter to be uniform everywhere, suggesting that spatially varying parameters will likely improve PE performance at local scales and should be considered with caution. We compare experiments with both PE and state estimation to experiments with only the latter and find that the benefits of PE mostly occur after the data assimilation period, when no observations are available to assimilate (i.e., the forecast period), which suggests PE's relevance for improving seasonal predictions of Arctic sea ice.
... Sequential Bayesian inference methods, such as the ensemble Kalman filter (EnKF) [4,7,27], have also been intensively utilized for parameter estimation of ocean models (e.g., [2,3,5,11,55,66,70]). The EnKF has been found efficient, with advantages over the MCMC approach in accommodating large state-parameter vectors at reasonable computational cost [46,57,58]. ...
... The covariance has two independent hyper-parameters, the correlation length l and the regularity parameter γ: q = (l, γ). The priors of the hyper-parameters are both uniform, with l ∼ U[0.25, 1] and γ ∼ U[1.5, 2]. The integral equations defining the q-dependent modes, ...
Article
Full-text available
Bayesian inference with coordinate transformations and polynomial chaos for a Gaussian process with a parametrized prior covariance model was introduced in [61] to enable and infer uncertainties in a parameterized prior field. The feasibility of the method was successfully demonstrated on a simple transient diffusion equation. In this work, we adopt a similar approach to infer a spatially varying Manning's n field in a coastal ocean model. The idea is to view the prior on the Manning's n field as a stochastic Gaussian field, expressed through a covariance function with uncertain hyper-parameters. A generalized Karhunen-Loeve (KL) expansion, which incorporates the construction of a reference basis of spatial modes and a coordinate transformation, is then applied to the prior field. To improve the computational efficiency of the method proposed in [61], we propose to use two polynomial chaos expansions to: (i) approximate the coordinate transformation, and (ii) build a cheap surrogate of the large-scale advanced circulation (ADCIRC) numerical model. These two surrogates are used to accelerate the Bayesian inference process using a Markov chain Monte Carlo algorithm. Water elevation data are inverted within an observing system simulation experiment framework, based on a realistic ADCIRC model, to infer the KL coordinates and hyper-parameters of a reference 2D Manning's n field. Our results demonstrate the efficiency of the proposed approach and suggest that including the hyper-parameter uncertainties greatly enhances the inferred Manning's n field, compared to using a covariance with fixed hyper-parameters.
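As an illustration of a KL construction with uncertain hyper-parameters, the sketch below draws (l, γ) from the uniform priors quoted in the excerpt above, builds a powered-exponential covariance on a 1-D stand-in domain, and synthesizes a prior field realization from a truncated KL basis. The covariance family, domain, and mode count are assumptions for illustration and are not the ADCIRC/Manning's n setup used in the paper.

```python
import numpy as np

def powered_exp_cov(x, corr_len, gamma):
    """Illustrative stationary covariance with two hyper-parameters: a
    correlation length and a regularity-like exponent (valid for 0 < gamma <= 2)."""
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-(d / corr_len) ** gamma)

def kl_modes(cov, n_modes):
    """Truncated Karhunen-Loeve basis from an eigendecomposition of the covariance."""
    vals, vecs = np.linalg.eigh(cov)
    idx = np.argsort(vals)[::-1][:n_modes]
    return vals[idx], vecs[:, idx]

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 200)                   # 1-D stand-in for the spatial domain

# Draw hyper-parameters from the uniform priors quoted in the excerpt
corr_len = rng.uniform(0.25, 1.0)
gamma = rng.uniform(1.5, 2.0)

vals, vecs = kl_modes(powered_exp_cov(x, corr_len, gamma), n_modes=10)

# One prior realisation of the field from 10 KL coordinates xi
xi = rng.standard_normal(10)
field = vecs @ (np.sqrt(np.maximum(vals, 0.0)) * xi)
print(field.shape, field.std())
```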
... Importantly, the impact of simulation crashes on the validity of global sensitivity analysis (GSA) results has often been overlooked in the literature, wherein simulation crashes have been commonly classified as ignorable (see Sect. 1.2). As such, a surprisingly limited number of studies have reported simulation crashes (examples related to uncertainty analysis include Annan et al., 2005;Edwards and Marsh, 2005;Lucas et al., 2013). This is despite the fact that these crashes can be very computationally costly for GSA algorithms because they can waste the rest of the model runs, prevent the completion of GSA, or inevitably introduce ambiguity into the inferences drawn from GSA. ...
... They further applied this approach to investigate the impact of various model parameters on simulation failures. A similar approach is based on model preemption strategies, in which the simulation performance is monitored while the model is running and the model run is terminated early if it is predicted that the simulation will not be informative (Razavi et al., 2010;Asadzadeh et al., 2014). ...
Article
Full-text available
Complex, software-intensive, technically advanced, and computationally demanding models, presumably with ever-growing realism and fidelity, have been widely used to simulate and predict the dynamics of the Earth and environmental systems. The parameter-induced simulation crash (failure) problem is typical across most of these models despite considerable efforts that modellers have directed at model development and implementation over the last few decades. A simulation failure mainly occurs due to the violation of numerical stability conditions, non-robust numerical implementations, or errors in programming. However, the existing sampling-based analysis techniques such as global sensitivity analysis (GSA) methods, which require running these models under many configurations of parameter values, are ill equipped to effectively deal with model failures. To tackle this problem, we propose a new approach that allows users to cope with failed designs (samples) when performing GSA without rerunning the entire experiment. This approach deems model crashes as missing data and uses strategies such as median substitution, single nearest-neighbor, or response surface modeling to fill in for model crashes. We test the proposed approach on a 10-parameter HBV-SASK (Hydrologiska Byråns Vattenbalansavdelning modified by the second author for educational purposes) rainfall–runoff model and a 111-parameter Modélisation Environmentale–Surface et Hydrologie (MESH) land surface–hydrology model. Our results show that response surface modeling is a superior strategy, out of the data-filling strategies tested, and can comply with the dimensionality of the model, sample size, and the ratio of the number of failures to the sample size. Further, we conduct a “failure analysis” and discuss some possible causes of the MESH model failure that can be used for future model improvement.
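The data-filling idea is straightforward to prototype: treat crashed runs as missing values in the GSA design and impute them before computing sensitivity indices. The sketch below implements median substitution, single nearest-neighbour imputation, and a plain linear response surface; the published study evaluates more elaborate surrogates, so this is only an assumed, minimal version of the three named strategies.

```python
import numpy as np

def fill_crashed_runs(X, y, strategy="response_surface"):
    """Impute outputs of crashed runs before GSA (sketch of the named strategies).

    X : (n, d) parameter samples;  y : (n,) outputs, np.nan where the run crashed.
    """
    y = y.copy()
    crashed = np.isnan(y)
    ok = ~crashed
    if not crashed.any():
        return y
    if strategy == "median":
        # replace every failure by the median of the successful runs
        y[crashed] = np.median(y[ok])
    elif strategy == "nearest":
        # copy the output of the closest successful sample in parameter space
        for i in np.where(crashed)[0]:
            d = np.linalg.norm(X[ok] - X[i], axis=1)
            y[i] = y[ok][np.argmin(d)]
    elif strategy == "response_surface":
        # simple linear response surface fitted to the successful runs
        A = np.column_stack([np.ones(ok.sum()), X[ok]])
        coef, *_ = np.linalg.lstsq(A, y[ok], rcond=None)
        y[crashed] = np.column_stack([np.ones(crashed.sum()), X[crashed]]) @ coef
    return y

# toy usage: 5% of a 10-parameter design "crashes"
rng = np.random.default_rng(3)
X = rng.uniform(size=(500, 10))
y = X @ rng.uniform(0.5, 2.0, 10) + 0.1 * rng.standard_normal(500)
y[rng.choice(500, 25, replace=False)] = np.nan
print(np.isnan(fill_crashed_runs(X, y)).sum())      # 0 remaining gaps
```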
... In the parameter estimation problem, it is assumed that uncertainties in the model parameters are the source of the model errors [66]. According to Annan et al. [150], it is important to tune the parameters to gain better confidence in the predictions of the state values. ...
... In state-parameter augmentation, the parameters are considered part of the model state and are updated in the analysis step of the data assimilation algorithm together with the model variables [150]. An evolution model for the parameters is required for the state-parameter augmented system [151]. ...
Thesis
Cardiovascular blood flow simulations can fill several critical gaps in current clinical capabilities. They offer non-invasive ways to quantify hemodynamics in the heart and major blood vessels for patients with cardiovascular diseases that cannot be obtained directly from medical imaging. Patient-specific simulations (incorporating data unique to the individual) enable individualised risk prediction and provide key insights into disease progression and/or the detection of abnormal physiology. They also provide means to systematically design and test new medical devices, and are used as predictive tools for surgical and personalized treatment planning, thus aiding clinical decision-making. Patient-specific predictive simulations require effective assimilation of medical data for reliable simulated predictions. This is usually achieved by the solution of an inverse hemodynamic problem, where uncertain model parameters are estimated using techniques for merging data and numerical models known as data assimilation methods. In this thesis, the inverse problem is solved through a data assimilation method using an ensemble Kalman filter (EnKF) for parameter estimation. By using an ensemble Kalman filter, the solution also comes with a quantification of the uncertainties for the estimated parameters. An ensemble Kalman filter-based parameter estimation algorithm is proposed for patient-specific hemodynamic computations in a schematic arterial network from uncertain clinical measurements. Several in silico scenarios (using synthetic data) are considered to investigate the efficiency of the parameter estimation algorithm using the EnKF. The usefulness of the parameter estimation algorithm is also assessed using experimental data from an in vitro test rig and real clinical data from a volunteer (patient-specific case). The proposed algorithm is evaluated on arterial networks which include single arteries, cases of bifurcation, a simple human arterial network and a complex arterial network including the circle of Willis. The ultimate aim is to perform patient-specific hemodynamic analysis in the network of the circle of Willis. Common hemodynamic properties (parameters), like arterial wall properties (Young's modulus, wall thickness, and viscoelastic coefficient) and terminal boundary parameters (reflection coefficient and Windkessel model parameters), are estimated as the solution to an inverse problem using time series of pressure values and blood flow rate as measurements. It is also demonstrated that a proper reduced-order zero-dimensional compartment model can lead to a simple and reliable estimation of blood flow features in the circle of Willis. The simulations with the estimated parameters capture target pressure or flow rate waveforms at given specific locations.
... Later, the technique of state augmentation with model parameters (e.g. Friedland, 1969;Smith et al., 2011) and an ensemble Kalman filter (EnKF) (Evensen, 1994) was used by Annan et al. (2005a) and Annan et al. (2005b) in synthetic experiments with an Earth system model of intermediate complexity (EMIC) and with an AGCM coupled to a slab ocean, respectively. The additional issue of sparsity in paleoclimate proxies was addressed by Paul and Schäfer-Neth (2005) for climate field reconstructions with an EMIC and manual tuning. ...
... Here, a multistep approach is conducted by inflating the observation-error covariance matrix R and recursively applying a standard Kalman smoother over the assimilation window with the inflated R and the same observations. The multistep idea of inflating R for repeated assimilation of the observations was proposed by Annan et al. (2005a) and further clarified and applied by Annan et al. (2005b) for an atmospheric GCM using the EnKF with parameter augmentation. Their approach is designed for steady-state cases, for which time-averaged climate observations corresponding to a long DAW can be assumed as constant along a sequence of smaller assimilation sub-windows into which the DAW is divided. ...
Article
Full-text available
Paleoclimate reconstruction based on assimilation of proxy observations requires specification of the control variables and their background statistics. As opposed to numerical weather prediction (NWP), which is mostly an initial condition problem, the main source of error growth in deterministic Earth system models (ESMs) regarding the model low-frequency response comes from errors in other inputs: parameters for the small-scale physics, as well as forcing and boundary conditions. Also, comprehensive ESMs are non-linear and only a few ensemble members can be run in current high-performance computers. Under these conditions we evaluate two assimilation schemes, which (a) count on iterations to deal with non-linearity and (b) are based on low-dimensional control vectors to reduce the computational need. The practical implementation would assume that the ESM has been previously globally tuned with current observations and that for a given situation there is previous knowledge of the most sensitive inputs (given corresponding uncertainties), which should be selected as control variables. The low dimension of the control vector allows for using full-rank covariances and resorting to finite-difference sensitivities (FDSs). The schemes are then an FDS implementation of the iterative Kalman smoother (FDS-IKS, a Gauss–Newton scheme) and a so-called FDS-multistep Kalman smoother (FDS-MKS, based on repeated assimilation of the observations). We describe the schemes and evaluate the analysis step for a data assimilation window in two numerical experiments: (a) a simple 1-D energy balance model (Ebm1D; which has an adjoint code) with present-day surface air temperature from the NCEP/NCAR reanalysis data as a target and (b) a multi-decadal synthetic case with the Community Earth System Model (CESM v1.2, with no adjoint). In the Ebm1D experiment, the FDS-IKS converges to the same parameters and cost function values as a 4D-Var scheme. For similar iterations to the FDS-IKS, the FDS-MKS results in slightly higher cost function values, which are still substantially lower than those of an ensemble transform Kalman filter (ETKF). In the CESM experiment, we include an ETKF with Gaussian anamorphosis (ETKF-GA) implementation as a potential non-linear assimilation alternative. For three iterations, both FDS schemes obtain cost function values that are close to each other and (with about half the computational cost) lower than those of the ETKF and ETKF-GA (with similar cost function values). Overall, the FDS-IKS seems more adequate for the problem, with the FDS-MKS potentially more useful to damp increments in early iterations of the FDS-IKS.
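The repeated-assimilation idea attributed above to Annan et al. (2005a, b) can be checked in a few lines: assimilating the same observations n times with the observation-error covariance inflated by a factor n reproduces, in the linear Gaussian case, the single-step analysis while damping each individual increment. The toy check below uses made-up matrices and is only a sketch of the principle, not the FDS-MKS code.

```python
import numpy as np

def kalman_update(mean, cov, y, H, R):
    """Standard Kalman analysis step for a Gaussian state estimate."""
    S = H @ cov @ H.T + R
    K = cov @ H.T @ np.linalg.inv(S)
    mean_a = mean + K @ (y - H @ mean)
    cov_a = (np.eye(len(mean)) - K @ H) @ cov
    return mean_a, cov_a

def multistep_update(mean, cov, y, H, R, n_steps):
    """Assimilate the same observations n_steps times with R inflated by n_steps.
    Each increment is damped, but the final linear Gaussian analysis matches the
    single-step analysis with the original R."""
    for _ in range(n_steps):
        mean, cov = kalman_update(mean, cov, y, H, n_steps * R)
    return mean, cov

# toy check that the two routes agree in the linear Gaussian case
rng = np.random.default_rng(4)
mean0, cov0 = np.zeros(3), np.eye(3)
H = rng.standard_normal((2, 3))
R = 0.2 * np.eye(2)
y = np.array([1.0, -0.5])

m1, _ = kalman_update(mean0, cov0, y, H, R)
m2, _ = multistep_update(mean0, cov0, y, H, R, n_steps=4)
print(np.allclose(m1, m2))      # True
```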
... Currently, the main advanced data assimilation methods are four-dimensional variational (4D-Var) (Lewis and Derber, 1985;Le Dimet and Talagrand, 1986;Courtier et al., 1994), ensemble Kalman filter (EnKF) (Evensen, 1994) and particle filter (PF) (Doucet et al., 2001). Previous studies about CPO are mainly based on 4D-Var (Lu and Hsieh, 1998;Du et al., 2009;Ito et al., 2010) and EnKF (Annan et al., 2005;Zhang et al., 2012;Han et al., 2013), while PF CPO is still under development. Here, we are concerned with 4D-Var. ...
... This paper makes a contribution that is necessary if ABMs are to become more widely used in policy, particularly for real-time applications. Specifically, its main aim is to demonstrate that an ensemble Kalman filter (EnKF)-a method that has shown great value in updating models of physical systems such as the climate [15,16]-can improve the accuracy with which an ABM simulates a system of pedestrians. Although other approaches have attempted to leverage data assimilation (DA) techniques for real-time agent-based modelling [13,17–23], and one has even used an EnKF [24], this paper is the first to apply the EnKF to an ABM that contains unique agents and their interactions (both key elements for an ABM [25]) and tests the algorithm using a real-world example of crowd simulation rather than a toy system. ...
Article
Full-text available
Agent-based modelling has emerged as a powerful tool for modelling systems that are driven by discrete, heterogeneous individuals and has proven particularly popular in the realm of pedestrian simulation. However, real-time agent-based simulations face the challenge that they will diverge from the real system over time. This paper addresses this challenge by integrating the ensemble Kalman filter (EnKF) with an agent-based crowd model to enhance its accuracy in real time. Using the example of Grand Central Station in New York, we demonstrate how our approach can update the state of an agent-based model in real time, aligning it with the evolution of the actual system. The findings reveal that the EnKF can substantially improve the accuracy of agent-based pedestrian simulations by assimilating data as they evolve. This approach not only offers efficiency advantages over existing methods but also presents a more realistic representation of a complex environment than most previous attempts. The potential applications of this method span the management of public spaces under ‘normality’ to exceptional circumstances such as disaster response, marking a significant advancement for real-time agent-based modelling applications.
... The reduced physics coupled model used in this work has been used in parameter estimation experiments in previous studies (e.g., Hargreaves et al., 2004;Edwards et al., 2005;Annan et al., 2005). ...
Preprint
Full-text available
The Atlantic Meridional Overturning Circulation (AMOC) plays a central role in long-term climate variations through its heat and freshwater transports, which can collapse under a rapid increase in greenhouse gas forcing in climate models. Previous studies have suggested that the deviation of model parameters is one of the major factors inducing inaccurate AMOC simulations. In this work, with a low-resolution Earth system model, we try to explore whether reasonably adjusting the key model parameter can help to re-establish the AMOC after its collapse. Through a new optimization strategy, the freshwater flux (FWF) parameter is determined to be the dominant one affecting the AMOC's variability. Traditional ensemble optimal interpolation (EnOI) data assimilation and new machine learning methods are adopted to optimize the FWF parameter in an abrupt 4×CO2 forcing experiment to improve the adaptability of model parameters and accelerate the recovery of the AMOC. The results show that under an abrupt 4×CO2 forcing in millennial simulations, the AMOC will first collapse and then be slowly re-established with the default FWF parameter. However, during the parameter adjustment process, the saltier and colder sea water over the North Atlantic region are the dominant factors in usefully improving the adaptability of the FWF parameter and accelerating the recovery of the AMOC, according to their physical relationship with the FWF on the interdecadal timescale.
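An ensemble-optimal-interpolation-style parameter update can be sketched as follows: a static ensemble of previous runs supplies the parameter-observation covariance, and a single deterministic parameter estimate is nudged toward the observations with a damping factor. The function below is a generic illustration under these assumptions, written in Python with made-up numbers; it is not the implementation used in the preprint.

```python
import numpy as np

def enoi_parameter_update(param_b, hx_b, static_params, static_hx, y, R, alpha=0.5):
    """EnOI-style parameter update sketch: the covariance between parameters and
    observed quantities is taken from a *static* ensemble of earlier runs, and a
    single background parameter estimate param_b is corrected toward observations
    y, with a scalar damping factor alpha on the static covariance."""
    pa = static_params - static_params.mean(0)          # (n_static, n_param)
    ha = static_hx - static_hx.mean(0)                  # (n_static, n_obs)
    n = static_params.shape[0] - 1
    Pph = alpha * pa.T @ ha / n                          # parameter-obs covariance
    Phh = alpha * ha.T @ ha / n + R                      # obs covariance + obs error
    K = Pph @ np.linalg.inv(Phh)
    return param_b + K @ (y - hx_b)

# toy usage with one parameter and two observed diagnostics
rng = np.random.default_rng(10)
static_params = rng.normal(1.0, 0.3, size=(30, 1))
static_hx = np.column_stack([2.0 * static_params[:, 0], -1.0 * static_params[:, 0]])
new_param = enoi_parameter_update(np.array([1.0]), np.array([2.0, -1.0]),
                                  static_params, static_hx,
                                  y=np.array([2.6, -1.3]), R=0.05 * np.eye(2))
print(new_param)      # nudged from 1.0 toward the value implied by y
```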
... The limitations of the DA method include not only the application area and conditions, but also the inevitable errors in the application of the method. The errors in DA mainly come from model error, observation error, analysis error and algorithm error associated with the different DA methods (Annan et al., 2005). The model error mainly includes model structure error, parameter error, driving data error and model calculation error. ...
Article
Full-text available
As an important part of space weather forecasting, the prediction of solar wind parameters in the near-Earth space is particularly significant. The introduction of data assimilation (DA) methods can improve the reliability of numerical predictions. In this study, we use a three-dimensional (3D) magnetohydrodynamics (MHD) numerical model with a Kalman filter to infer the impact of DA on solar wind modeling. We use the 3D MHD numerical model with near-Earth in situ observations from the OMNI database to reconstruct solar wind parameters between 21.5 solar radii and 1 AU. The period from 2018 to 2021 is simulated, covering the decay of the 24th solar cycle and the rise of the 25th. The numerical model generates two separate results, one without DA and one with DA performed directly on the model-only results. Statistical analysis of observed, modeled and assimilated solar wind parameters at 1 AU reveals that the assimilating simulations provide a more accurate forecast than the model-only results, with a sharp reduction in the root mean square error and an increase of the correlation coefficient.
... Due to both these properties, a DA method such as the EnKF or variational methods that relies on Gaussian assumptions and uses only the first two statistical moments in the analysis step needs to be modified in order to be able to deal with probability density functions poorly approximated by the normal distribution. The work of [163], [164], and [165] presented techniques in parameter estimation that have been successfully applied in low-resolution non-chaotic systems. In addition, [166,167] adapted a new algorithm of [168] to the estimation of cloud microphysical parameters that uses higher than second-order moments. ...
Article
Full-text available
Data assimilation (DA) and uncertainty quantification (UQ) are extensively used in analysing and reducing error propagation in high-dimensional spatial-temporal dynamics. Typical applications span from computational fluid dynamics (CFD) to geoscience and climate systems. Recently, much effort has been given to combining DA, UQ and machine learning (ML) techniques. These research efforts seek to address some critical challenges in high-dimensional dynamical systems, including but not limited to dynamical system identification, reduced-order surrogate modelling, error covariance specification and model error correction. A large number of developed techniques and methodologies exhibit broad applicability across numerous domains, resulting in the necessity for a comprehensive guide. This paper provides the first overview of state-of-the-art research in this interdisciplinary field, covering a wide range of applications. This review is aimed at ML scientists who attempt to apply DA and UQ techniques to improve the accuracy and the interpretability of their models, but also at DA and UQ experts who intend to integrate cutting-edge ML approaches into their systems. Therefore, this article has a special focus on how ML methods can overcome the existing limits of DA and UQ, and vice versa. Some exciting perspectives of this rapidly developing research field are also discussed.
... Due to both these properties, a DA method such as the EnKF or variational methods that relies on Gaussian assumptions and uses only the first two statistical moments in the analysis step needs to be modified in order to be able to deal with probability density functions poorly approximated by the normal distribution. The work of [163], [164], and [165] presented techniques in parameter estimation that have been successfully applied in low-resolution non-chaotic systems. In addition, [166,167] adapted a new algorithm of [168] to the estimation of cloud microphysical parameters that uses higher than second-order moments. ...
Preprint
Full-text available
Data Assimilation (DA) and Uncertainty Quantification (UQ) are extensively used in analysing and reducing error propagation in high-dimensional spatial-temporal dynamics. Typical applications span from computational fluid dynamics (CFD) to geoscience and climate systems. Recently, much effort has been given to combining DA, UQ and machine learning (ML) techniques. These research efforts seek to address some critical challenges in high-dimensional dynamical systems, including but not limited to dynamical system identification, reduced-order surrogate modelling, error covariance specification and model error correction. A large number of developed techniques and methodologies exhibit broad applicability across numerous domains, resulting in the necessity for a comprehensive guide. This paper provides the first overview of the state-of-the-art research in this interdisciplinary field, covering a wide range of applications. This review aims at ML scientists who attempt to apply DA and UQ techniques to improve the accuracy and the interpretability of their models, but also at DA and UQ experts who intend to integrate cutting-edge ML approaches into their systems. Therefore, this article has a special focus on how ML methods can overcome the existing limits of DA and UQ, and vice versa. Some exciting perspectives of this rapidly developing research field are also discussed.
... Data assimilation (DA) schemes provide an objective and efficient methodology for parameter estimation by combining observations with a numerical model simulation (Eknes and Evensen, 2002). Particularly, ensemble based sequential DA schemes like the Ensemble Kalman Filter (EnKF) offer a simple but efficient framework for automatic optimisation of model parameters alongside the state variables by simply augmenting them together using "Joint-EnKF" formulation (Anderson, 2001;Annan et al., 2005;Jazwinski, 2007). The EnKF (Evensen, 2003) is based on a Monte Carlo sampling of the state space thereby avoiding model linearization. ...
Article
Full-text available
Ocean biogeochemical (BGC) models utilise a large number of poorly constrained global parameters to mimic unresolved processes and reproduce the observed complex spatio-temporal patterns. Large model errors stem primarily from inaccuracies in these parameters, whose optimal values can vary in both space and time. This study aims to demonstrate the ability of ensemble data assimilation (DA) methods to provide high-quality and improved BGC parameters within an Earth system model in an idealized perfect twin experiment framework. We use the Norwegian Climate Prediction Model (NorCPM), which combines the Norwegian Earth System Model with the Dual-One-Step ahead smoothing-based Ensemble Kalman Filter (DOSA-EnKF). We aim to estimate five spatially varying BGC parameters by assimilating salinity and temperature profiles and surface BGC (Phytoplankton, Nitrate, Phosphate, Silicate, and Oxygen) observations in a strongly coupled DA framework, i.e., jointly updating ocean and BGC state-parameters during the assimilation. We show how BGC observations can effectively constrain error in the ocean physics and vice versa. The method converges quickly (in less than a year) and largely reduces the errors in the BGC parameters. Some parameter error remains, but a free ensemble run and a reanalysis using the estimated parameters perform nearly as well as with the true parameter values. Optimal parameter values can also be recovered by assimilating climatological BGC observations or sparse observational networks. The findings of this study demonstrate the applicability of the DA approach for tuning the system in a real framework.
... Data assimilation (DA) is a statistical technique that combines sparse observations and spatially complete model output to generate a better estimate of climate fields when compared to using models and observations individually. Using DA to reconstruct climate variability has undergone rapid development in recent years (Dirren and Hakim 2005;Annan et al. 2005;Goosse et al. 2006;Ridgwell et al. 2007;Widmann et al. 2010;Bhend et al. 2012;Steiger et al. 2014;Hakim et al. 2016;Tardif et al. 2019), though few studies have used DA techniques to reconstruct sea ice. Klein et al. (2014) reconstruct Arctic sea ice over a 400-year period in the mid-Holocene with a particle-filter that uses a proxy based reconstruction (based on dinocyst assemblages) to constrain an ensemble of model runs at each time step. ...
Article
Arctic sea ice decline in recent decades has been dramatic; however, few long-term records of Arctic sea ice exist to put such a decline in context. Here we employ an ensemble Kalman filter data assimilation approach to reconstruct Arctic sea ice concentration over the last two millennia by assimilating temperature-sensitive proxy records with ensembles drawn from last millennium climate model simulations. We first test the efficacy of this method using pseudoproxy experiments. Results show good agreement between the target and reconstructed total Arctic sea ice extent (R² and coefficient of efficiency values of 0.51 and 0.47 for perfect model experiments and 0.43 and 0.43 for imperfect model experiments). Imperfect model experiments indicate that the reconstructions inherit some bias from the model prior. We assimilate 487 temperature-sensitive proxy records with two climate model simulations to produce two gridded reconstructions of Arctic sea ice over the last two millennia. These reconstructions show good agreement with satellite observations between 1979 and 1999 CE for total Arctic sea ice extent with an R² value and coefficient of efficiency of about 0.60 and 0.50, respectively, for both models. Regional quantities derived from these reconstructions show encouraging similarities with independent reconstructions and sea ice sensitive proxy records from the Barents Sea, Baffin Bay, and East Greenland Sea. The reconstructions show a positive trend in Arctic sea ice extent between around 750 and 1820 CE, and increases during years with large volcanic eruptions that persist for about 5 years. Trend analysis of total Arctic sea ice extent reveals that for time periods longer than 30 years, the satellite era decline in total Arctic sea ice extent is unprecedented over the last millennium. Significance Statement: Areal coverage of Arctic sea ice is a critical aspect of the climate system that has been changing rapidly in recent decades. Prior to the advent of satellite observations, sparse observations of Arctic sea ice make it difficult to put the current changes in context. Here we reconstruct annual averages of Arctic sea ice coverage for the last two millennia by combining temperature-sensitive proxy records (i.e., ice cores, tree rings, and corals) with climate model simulations using a statistical technique called data assimilation. We find large interannual changes in Arctic sea ice coverage prior to 1850 that are associated with volcanic eruptions, with a steady rise in Arctic sea ice coverage between 750 and 1820 CE. The satellite-period loss of sea ice has no analog during the last millennium.
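The offline (no-cycling) ensemble update used in this style of paleoclimate reconstruction can be written compactly: the prior ensemble comes from existing model simulations, a forward operator maps model states to proxy space, and a single Kalman update produces the reconstruction. The sketch below uses a made-up linear proxy operator, toy dimensions, and a perturbed-observation formulation; it is a generic illustration rather than the reconstruction code described in the abstract.

```python
import numpy as np

def offline_enkf_update(prior_ens, proxy_vals, proxy_err_var, forward_op, seed=5):
    """Single offline EnKF-style update for proxy assimilation (sketch).

    prior_ens : (n_ens, n_state) ensemble drawn from existing simulations
    proxy_vals, proxy_err_var : proxy observations and their error variances
    forward_op : maps one model state to proxy space
    """
    n_ens = prior_ens.shape[0]
    hx = np.array([forward_op(m) for m in prior_ens])         # (n_ens, n_proxy)
    xp = prior_ens - prior_ens.mean(0)
    hp = hx - hx.mean(0)
    pxy = xp.T @ hp / (n_ens - 1)                              # state-proxy covariance
    pyy = hp.T @ hp / (n_ens - 1) + np.diag(proxy_err_var)
    K = pxy @ np.linalg.inv(pyy)
    rng = np.random.default_rng(seed)
    y_pert = proxy_vals + rng.standard_normal((n_ens, len(proxy_vals))) * np.sqrt(proxy_err_var)
    return prior_ens + (y_pert - hx) @ K.T

# toy usage: constrain a 3-gridcell field with 2 temperature-sensitive proxies
rng = np.random.default_rng(6)
prior = rng.standard_normal((100, 3))
H = np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]])               # hypothetical proxy operator
posterior = offline_enkf_update(prior, np.array([0.8, -0.2]),
                                np.array([0.1, 0.1]), lambda m: H @ m)
print(posterior.mean(0))
```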
... The method of parameter estimation is based on the theory of data assimilation, i.e., information estimation theory or filtering theory (e.g., Jazwinski, 1970). Research on the use of observations to estimate model parameters has attracted extensive attention and has produced encouraging results in the literature (Annan et al., 2005;Aksoy et al., 2006a, b;Hansen and Penland, 2007;Kondrashov et al., 2008;Hu et al., 2010). Based on EAKF, a data assimilation scheme for enhanced parameter correction is designed to improve parameter estimation using observations . ...
Article
Full-text available
Multiple equilibria are an outstanding characteristic of the Atlantic meridional overturning circulation (AMOC) that has important impacts on the Earth's climate system, appearing as regime transitions. The AMOC can be simulated in different models, but the behavior deviates from the real world due to the existence of model errors. Here, we first combine a general AMOC model with an ensemble Kalman filter to form an ensemble coupled model data assimilation and parameter estimation (CDAPE) system and derive the general methodology to capture the observed AMOC regime transitions through utilization of observational information. Then we apply this methodology, designed within a “twin” experiment framework, with a simple conceptual model that simulates the transition phenomenon of AMOC multiple equilibria, as well as a more physics-based MOC box model, to reconstruct the “observed” AMOC multiple equilibria. The results show that the coupled model parameter estimation with observations can significantly mitigate the model deviations, thus capturing regime transitions of the AMOC. This simple model study serves as a guideline when a coupled general circulation model is used to incorporate observations to reconstruct the AMOC historical states and make multi-decadal climate predictions.
... The EnKF provided better parameter estimates and consequently better production forecasts. In the field of atmospheric modelling, Annan et al. (2005) implemented the EnKF for the estimation of parameters from the radiation, convection and surface parametrisations. The authors concluded that the EnKF can handle multivariate parameter estimation comfortably and demonstrated the method to be useful in determining structural deficiencies in the model which cannot be improved by tuning, and that it can be a useful tool to guide model development. ...
Article
A recursive ensemble Kalman filter (EnKF) is used as the data assimilation scheme to estimate strength and stiffness parameters simultaneously for a fully coupled hydro-mechanical slope stability analysis. Two different constitutive models are used in the hydro-mechanical model: the Mohr-Coulomb (MC) model and the Hardening Soil (HS) model. The data assimilation framework allows the investigation of the effect of constitutive behaviour on its ability to estimate the factor of safety using measurements of horizontal nodal displacement at the sloping face. In a synthetic study, close-to-failure and far-from-failure cases of prior property estimations illustrate the effect of initial material property distribution with different material models. The results show that both models provide a reliable factor of safety when the distribution of prior parameters is selected close-to-failure. However, the HS model results in the improved estimation of factor of safety for the far-from-failure case while this is not the case for the MC model. In addition, for the same level of accuracy the computational effort required for the HS model is comparatively less than for the MC model.
... It provides a real-time optimization for the prediction by "adjusting" the model parameters based on the observational information of the variables (e.g., penetration resistance in this study). The technique has been widely incorporated in the forecast models of earth systems [25–32]. The steps for the implementation of POT are summarized in Appendix B. ...
Article
Full-text available
Clay–sand–clay deposits are commonly encountered in the offshore field. For spudcan installation in this soil stratigraphy, the potential for punch-through exists, with the peak penetration resistance formed within the interbedded sand layer. Therefore, a careful assessment of the penetration resistance profile has to be performed. Based on the recently proposed failure-stress-dependent model, this paper presents a modified predictive model for estimating the peak resistance. The modified model incorporates the bearing capacity depth factor and the protruded soil plug in the bottom clay layer into the formulation. It is proven that the modified predictive model provides improved deterministic estimations for the peak resistances measured in centrifuge tests. Based on the modified predictive model, a parameter optimization technique is utilized to optimize the prediction of peak resistance using penetration resistances observed beforehand. A detailed application procedure is proposed and applied to the centrifuge tests accumulated from existing publications, with further improvement on the predictions demonstrated. The proposed parameter optimization procedure combined with the modified predictive model provides an approach to perform real-time optimization for assessing spudcan peak resistance in clay–sand–clay deposits.
... Annan et al. (2005b) used an EnKF to simultaneously estimate the state and five parameters within a spectral atmospheric model, resulting in an improved fit to an atmospheric reanalysis. With application to a coupled atmosphere ocean intermediate complexity model, Annan et al. (2005a) used an EnKF to estimate 12 parameters and in doing so successfully tuned the model climatology to better agree with the observed one. In a numerical weather prediction context, Aksoy et al. (2006) used an EnKF to estimate the state and vertical mixing parameter in twin experiments with a regional area atmospheric model. ...
Article
Full-text available
Coupled general circulation models (GCMs) and their atmospheric, oceanic, land, and sea-ice components have many parameters. Some parameters determine the numerics of the dynamical core, while others are based on our current understanding of the physical processes being simulated. Many of these parameters are poorly known, often globally defined, and are subject to pragmatic choices arising from a complex interplay between grid resolution and inherent model biases. To address this problem, we use an ensemble transform Kalman filter to estimate spatiotemporally varying maps of ocean albedo and shortwave radiation e-folding length scale in a coupled climate GCM. These parameters are designed to minimize the error between short term (3–28 days) forecasts of the climate model and a network of real world atmospheric, oceanic, and sea-ice observations. The data assimilation system has an improved fit to observations when estimating ocean albedo and shortwave e-folding length scale either individually or simultaneously. However, only individually estimated maps of shortwave e-folding length scale are also shown to systematically reduce bias in longer multiyear climate forecasts during an out-of-sample period. The bias of the multiyear forecasts is reduced for parameter maps determined from longer DA cycle lengths.
... The method of parameter estimation is based on the theory of data assimilation, i.e. information estimation theory, or filtering theory (e.g., Jazwinski, 1970). Research on the use of observations to estimate model parameters has attracted extensive attention and has produced encouraging results in the literature (Annan et al., 2005; Aksoy et al., 2006a, b; Hansen and Penland, 2007; Kondrashov et al., 2008; Hu et al., 2010). ...
Preprint
Full-text available
Multiple equilibria are an outstanding characteristic of the Atlantic meridional overturning circulation (AMOC) that has important impacts on the Earth's climate system, appearing as regime transitions. The AMOC can be simulated in different models, but the behavior deviates from the real world due to the existence of model errors. Here, we first combine a general AMOC model with an ensemble Kalman filter to form an ensemble coupled model data assimilation and parameter estimation (CDAPE) system, and derive the general methodology to capture the observed AMOC regime transitions through utilization of observational information. Then we apply this methodology, designed within a twin experiment framework, with a simple conceptual model that simulates the transition phenomenon of AMOC multiple equilibria, as well as a more physics-based MOC box model, to reconstruct the observed AMOC multiple equilibria. The results show that the coupled model parameter estimation with observations can significantly mitigate the model deviations, thus capturing regime transitions of the AMOC. This simple model study serves as a guideline when a coupled general circulation model is used to incorporate observations to reconstruct the AMOC historical states and make multi-decadal climate predictions.
... Other alternatives for parameter estimation with the KF include calibrating parameters outside the KF calculation with an outer optimisation routine [11][12][13], and parameter estimation in steady-state KF calculations where observations are climatological averages over the entire time period of interest [14], but in both of these two approaches the parameter estimation part of the calculation considers all observations at once rather than sequentially. ...
Chapter
Full-text available
In the era of technology and digitalization, the process industries are undergoing a digital transformation. The available process models, advanced sensor technologies, enhanced computational power and a broad set of data analytical techniques provide a solid basis for digital transformation in the biopharmaceutical industry. Among various data analytical techniques, the Kalman filter and its non-linear extensions are powerful tools for the prediction of reliable process information. The combination of the Kalman filter with a virtual representation of the bioprocess, called a digital twin, can provide real-time available process information. Incorporation of such variables in process operation can provide improved control performance with enhanced productivity. In this chapter the linear discrete Kalman filter, the extended Kalman filter and the unscented Kalman filter are described, and a brief overview of applications of the Kalman filter and its non-linear extensions to bioreactors is presented. Furthermore, in a case study an example of the digital twin of the baker's yeast batch cultivation process is presented.
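For reference, the linear discrete Kalman filter described in the chapter alternates a prediction step with a measurement update. The sketch below shows both steps and applies them to a toy scalar signal; the matrices, noise levels, and the slowly growing "hidden" quantity are all made up, and a bioreactor application would substitute a real process model and measurements.

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Prediction step of the linear discrete Kalman filter."""
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, y, H, R):
    """Measurement-update step of the linear discrete Kalman filter."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x_new = x + K @ (y - H @ x)
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

# toy usage: track a slowly growing scalar from noisy measurements
rng = np.random.default_rng(7)
F, H = np.array([[1.0]]), np.array([[1.0]])
Q, R = np.array([[1e-2]]), np.array([[1e-2]])
x, P = np.array([0.0]), np.array([[1.0]])
truth = 0.0
for _ in range(50):
    truth += 0.05                                   # hidden growth of the true value
    y = np.array([truth]) + 0.1 * rng.standard_normal(1)
    x, P = kf_predict(x, P, F, Q)
    x, P = kf_update(x, P, y, H, R)
print(float(x[0]), truth)                           # estimate tracks the truth closely
```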
... Note that, in the statistics literature, the term "ARMA process" generally refers to an ARMA filter driven by a Gaussian white-noise input. For more information about ARMA models and the backshift operator, see Brockwell and Davis (2002). ...
Article
Full-text available
Reliable estimates of historical effective radiative forcing (ERF) are important for understanding the causes of past climate change and for constraining predictions of future warming. This study proposes a new linear-filtering method for estimating historical radiative forcing from time series of global mean surface temperature (GMST), using energy-balance models (EBMs) fitted to GMST from CO2-quadrupling general circulation model (GCM) experiments. We show that the response of any k-box EBM can be represented as an ARMA(k, k−1) (autoregressive moving-average) filter. We show how, by inverting an EBM's ARMA filter representation, time series of surface temperature may be converted into radiative forcing. The method is illustrated using three-box EBM fits to two recent Earth system models from CMIP5 and CMIP6 (Coupled Model Intercomparison Project). A comparison with published results obtained using the established ERF_trans method, a purely GCM-based approach, shows that our new method gives an ERF time series that closely matches the GCM-based series (correlation of 0.83). Time series of estimated historical ERF are obtained by applying the method to a dataset of historical temperature observations. The results show that there is clear evidence of a significant increase over the historical period with an estimated forcing in 2018 of 1.45±0.504 W m−2 when derived using the two Earth system models. This method could be used in the future to attribute past climate changes to anthropogenic and natural factors and to help constrain estimates of climate sensitivity.
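The forcing-reconstruction idea can be illustrated with a digital-filter round trip: if temperature is the ARMA-filtered forcing, swapping the numerator and denominator coefficients inverts the filter and recovers the forcing. The ARMA(2, 1) coefficients below are made up for illustration (they are not fitted EBM values), the toy forcing history is synthetic, and the inversion assumes the moving-average polynomial is invertible.

```python
import numpy as np
from scipy.signal import lfilter

# Hypothetical ARMA(2,1) coefficients standing in for a fitted 2-box EBM response
# (the paper derives such coefficients from EBM parameters; these are made up).
a = np.array([1.0, -1.5, 0.56])      # autoregressive part (acts on temperature)
b = np.array([0.05, 0.03])           # moving-average part (acts on forcing)

rng = np.random.default_rng(8)
years = 170
forcing_true = np.cumsum(0.02 + 0.01 * rng.standard_normal(years))   # toy ERF history

# Forward direction: temperature is the ARMA-filtered forcing, T = (B/A) F
temperature = lfilter(b, a, forcing_true)

# Inversion: swap numerator and denominator to map temperature back to forcing,
# F = (A/B) T.  Valid when b[0] != 0 and the MA polynomial is minimum phase.
forcing_recovered = lfilter(a, b, temperature)

print(np.max(np.abs(forcing_recovered - forcing_true)))   # ~ machine precision
```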
... The Ensemble Kalman Filter (EnKF) is a Bayesian filtering algorithm used to estimate unknown states and parameters of nonlinear systems by combining model predictions with available system observations [1–3]. While these algorithms are commonly used for data assimilation in applications to weather prediction [4–6] and guidance, navigation, and control [7–9], ensemble Kalman-type filters have recently been utilized for parameter estimation and forecast prediction in a variety of epidemiological studies [10–16]. ...
Preprint
The Ensemble Kalman Filter (EnKF) is a Bayesian filtering algorithm utilized in estimating unknown model states and parameters for nonlinear systems. An important component of the EnKF is the observation function, which connects the unknown system variables with the observed data. These functions take different forms based on modeling assumptions with respect to the available data and relevant system parameters. The goal of this research is to analyze the effects of observation function selection in the EnKF in the setting of epidemic modeling, where a variety of observation functions are used in the literature. In particular, four observation functions of different forms and various levels of complexity are examined in connection with the classic Susceptible-Infectious-Recovered (SIR) model. Results demonstrate the importance of choosing an observation function that well interprets the available data on the corresponding EnKF estimates in several filtering scenarios, including state estimation with known parameters, and combined state and parameter estimation with both constant and time-varying parameters. Numerical experiments further illustrate how modifying the observation noise covariance matrix in the filter can help to account for uncertainty in the observation function in certain cases.
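The role of the observation function can be made concrete with a few candidate forms for an SIR state (S, I, R): direct prevalence, under-reported prevalence, incidence per step, and cumulative cases. The examples below are generic choices with assumed values for the population size, reporting rate, and transmission rate; they are illustrations of the idea, not necessarily the four functions examined in the preprint.

```python
import numpy as np

# Candidate observation functions h(x) for an SIR state x = (S, I, R), showing
# how different assumptions about the available data enter the EnKF analysis.
N = 10_000            # assumed population size
rho = 0.3             # assumed reporting rate
beta = 0.4            # assumed transmission rate (for incidence-type data)

def obs_prevalence(x):            # data are current infectious counts
    S, I, R = x
    return I

def obs_reported_prevalence(x):   # only a fraction rho of infections is observed
    S, I, R = x
    return rho * I

def obs_incidence(x):             # data are new cases per time step
    S, I, R = x
    return beta * S * I / N

def obs_cumulative(x):            # data are cumulative infectious plus recovered
    S, I, R = x
    return I + R

# apply each observation function to a toy ensemble of SIR states
ensemble = np.column_stack([np.full(50, 9000.0),
                            np.full(50, 800.0) + np.random.default_rng(9).normal(0, 50, 50),
                            np.full(50, 200.0)])
for h in (obs_prevalence, obs_reported_prevalence, obs_incidence, obs_cumulative):
    hx = np.apply_along_axis(h, 1, ensemble)
    print(h.__name__, hx.mean().round(1))
```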
... (1) Implementation of EnKF or extended EnKF-CPE in intermediate coupled models (Annan et al. 2005;Annan and Hargreaves 2007;Kondrashov et al. 2008). ...
Article
Full-text available
Recent studies have started to explore coupled data assimilation (CDA) in coupled ocean–atmosphere models because of the great potential of CDA to improve climate analysis and seamless weather–climate prediction on weekly-to-decadal time scales in advanced high-resolution coupled models. In this review article, we briefly introduce the concept of CDA before outlining its potential for producing balanced and coherent weather–climate reanalysis and minimizing initial coupling shocks. We then describe approaches to the implementation of CDA and review progress in the development of various CDA methods, notably weakly and strongly coupled data assimilation. We introduce the method of coupled model parameter estimation (PE) within the CDA framework and summarize recent progress. After summarizing the current status of the research and applications of CDA-PE, we discuss the challenges and opportunities in high-resolution CDA-PE and nonlinear CDA-PE methods. Finally, potential solutions are laid out.
... We focus on the estimation of the parameters θ and assume that ν and the parameters of f are known. Estimating ν in energy balance models with data assimilation methods is studied in Annan et al. (2005), whereas estimation of the parameters of f in the context of linear SPDEs is covered for example in Lindgren et al. (2011). In a paleoclimate context, temperature observations are sparse (in space and time) and derived from climatic proxies, such as pollen assemblages, isotopic compositions, and tree rings, which are indirect measures of the climate state. ...
Article
Full-text available
While nonlinear stochastic partial differential equations arise naturally in spatiotemporal modeling, inference for such systems often faces two major challenges: sparse noisy data and ill-posedness of the inverse problem of parameter estimation. To overcome the challenges, we introduce a strongly regularized posterior by normalizing the likelihood and by imposing physical constraints through priors of the parameters and states. We investigate joint parameter-state estimation by the regularized posterior in a physically motivated nonlinear stochastic energy balance model (SEBM) for paleoclimate reconstruction. The high-dimensional posterior is sampled by a particle Gibbs sampler that combines a Markov chain Monte Carlo (MCMC) method with an optimal particle filter exploiting the structure of the SEBM. In tests using either Gaussian or uniform priors based on the physical range of parameters, the regularized posteriors overcome the ill-posedness and lead to samples within physical ranges, quantifying the uncertainty in estimation. Due to the ill-posedness and the regularization, the posterior of parameters presents a relatively large uncertainty, and consequently, the maximum of the posterior, which is the minimizer in a variational approach, can have a large variation. In contrast, the posterior of states generally concentrates near the truth, substantially filtering out observation noise and reducing uncertainty in the unconstrained SEBM.
... Importantly, the impact of simulation crashes on the validity of global sensitivity analysis (GSA) results has often been overlooked in the literature, where simulation crashes are commonly classified as ignorable (see section 1.2). As such, a surprisingly limited number of studies have reported simulation crashes (examples related to uncertainty analysis include Annan et al., 2005;Edwards and Marsh, 2005;Lucas et al., 2013). This is despite the fact that these crashes can be very computationally costly for GSA algorithms because they can waste the rest of the model runs, prevent completion of GSA, or inevitably introduce ambiguity into the inferences drawn from GSA. ...
Thesis
Full-text available
Complex Environmental Systems Models (CESMs) have been developed and applied as vital tools to tackle the ecological, water, food, and energy crises that humanity faces, and have been used widely to support decision-making about management of the quality and quantity of Earth’s resources. CESMs are often controlled by many interacting and uncertain parameters, and typically integrate data from multiple sources at different spatio-temporal scales, which make them highly complex. Global Sensitivity Analysis (GSA) techniques have proven to be promising for deepening our understanding of the model complexity and interactions between various parameters and providing helpful recommendations for further model development and data acquisition. Aside from the complexity issue, the computationally expensive nature of the CESMs precludes effective application of the existing GSA techniques in quantifying the global influence of each parameter on variability of the CESMs’ outputs. This is because a comprehensive sensitivity analysis often requires performing a very large number of model runs. Therefore, there is a need to break down this barrier by the development of more efficient strategies for sensitivity analysis. The research undertaken in this dissertation is mainly focused on alleviating the computational burden associated with GSA of the computationally expensive CESMs through developing efficiency-increasing strategies for robust sensitivity analysis. This is accomplished by: (1) proposing an efficient sequential sampling strategy for robust sampling-based analysis of CESMs; (2) developing an automated parameter grouping strategy of high-dimensional CESMs, (3) introducing a new robustness measure for convergence assessment of the GSA methods; and (4) investigating time-saving strategies for handling simulation failures/crashes during the sensitivity analysis of computationally expensive CESMs. This dissertation provides a set of innovative numerical techniques that can be used in conjunction with any GSA algorithm and be integrated in model building and systems analysis procedures in any field where models are used. A range of analytical test functions and environmental models with varying complexity and dimensionality are utilized across this research to test the performance of the proposed methods. These methods, which are embedded in the VARS–TOOL software package, can also provide information useful for diagnostic testing, parameter identifiability analysis, model simplification, model calibration, and experimental design. They can be further applied to address a range of decision making-related problems such as characterizing the main causes of risk in the context of probabilistic risk assessment and exploring the CESMs’ sensitivity to a wide range of plausible future changes (e.g., hydrometeorological conditions) in the context of scenario analysis.
... We focus on the estimation of the parameters θ and assume that ν and the parameters of f are known. Estimating ν in energy balance models with data assimilation methods is studied in [3], whereas estimation of parameters of f in the context of linear SPDEs is covered for example in [26]. ...
Preprint
Full-text available
While nonlinear stochastic partial differential equations arise naturally in spatiotemporal modeling, inference for such systems often faces two major challenges: sparse noisy data and ill-posedness of the inverse problem of parameter estimation. To overcome the challenges, we introduce a strongly regularized posterior by normalizing the likelihood and by imposing physical constraints through priors of the parameters and states. We investigate joint parameter-state estimation by the regularized posterior in a physically motivated nonlinear stochastic energy balance model (SEBM) for paleoclimate reconstruction. The high-dimensional posterior is sampled by a particle Gibbs sampler that combines MCMC with an optimal particle filter exploiting the structure of the SEBM. In tests using either Gaussian or uniform priors based on the physical range of parameters, the regularized posteriors overcome the ill-posedness and lead to samples within physical ranges, quantifying the uncertainty in estimation. Due to the ill-posedness and the regularization, the posterior of parameters presents a relatively large uncertainty, and consequently, the maximum of the posterior, which is the minimizer in a variational approach, can have a large variation. In contrast, the posterior of states generally concentrates near the truth, substantially filtering out observation noise and reducing uncertainty in the unconstrained SEBM.
... By contrast, physical or model closure parameters are referred to as deterministic parameters. Whereas estimation of deterministic parameters of a dynamical model is straightforward using state augmentation within the ensemble Kalman filter (Annan et al., 2005;Yang and Delsole, 2009;Ruiz et al., 2013a), stochastic parameters cannot be estimated in this way. Previous studies on the use of the augmented state approach for the estimation of stochastic parameters showed that the lack of covariance between ensemble mean of state variables and the stochastic parameters may lead to unreliable estimations (DelSole and Yang, 2010;Santitissadeekorn and Jones, 2015). ...
Article
Full-text available
Stochastic parametrizations are increasingly used to represent the uncertainty associated with model errors in ensemble forecasting and data assimilation. One of the challenges associated with the use of these parametrizations is the characterization of the statistical properties of the stochastic processes within their formulation. In this work, a hierarchical Bayesian approach based on two nested ensemble Kalman filters is proposed for inferring parameters associated with stochastic parametrizations. The proposed technique is based on the Rao-Blackwellization of the parameter estimation problem. It consists of an ensemble of ensemble Kalman filters, each of them using a different set of stochastic parameter values. We show the ability of the technique to infer parameters related to the covariance of stochastic representations of model error in the Lorenz-96 dynamical system. The evaluation is conducted with stochastic twin experiments and with imperfect model experiments with unresolved physics in the forecast model. The technique performs successfully under different model error covariance structures. The technique is conceived to be applied offline as part of an a priori optimization of the data assimilation system and could, in principle, be extended to the estimation of other hyperparameters of the data assimilation system.
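The state-augmentation approach mentioned in the excerpt above (appending deterministic parameters to the state vector so that the EnKF updates them through their sampled covariance with observed variables) can be sketched roughly as follows. This is a generic perturbed-observation EnKF with invented toy numbers, not code from any of the cited works.

```python
import numpy as np
rng = np.random.default_rng(0)

def enkf_update(ensemble, H, y, obs_err_std):
    """Perturbed-observation EnKF analysis on an (n_vars, n_ens) ensemble.
    Parameters appended to the state are updated exactly like state variables."""
    n_vars, n_ens = ensemble.shape
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    P = anomalies @ anomalies.T / (n_ens - 1)           # sample covariance
    R = np.diag(np.full(len(y), obs_err_std**2))
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)        # Kalman gain
    y_pert = y[:, None] + obs_err_std * rng.standard_normal((len(y), n_ens))
    return ensemble + K @ (y_pert - H @ ensemble)

# Toy example: one observed state variable x and one unobserved parameter a.
true_a = 2.0
n_ens = 50
ens = np.vstack([rng.normal(0.0, 1.0, n_ens),            # state x
                 rng.normal(1.0, 1.0, n_ens)])           # parameter a (augmented)
ens[0] = ens[1] + 0.1 * rng.standard_normal(n_ens)       # toy "model": x driven by a
H = np.array([[1.0, 0.0]])                                # we observe x only
y = np.array([true_a])
ens = enkf_update(ens, H, y, obs_err_std=0.1)
print("estimated parameter:", ens[1].mean())
```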
... Typically, the EnKF uses a Monte Carlo approach to sample the variables randomly across their distribution ranges. In this study, instead of Monte Carlo, Latin-Hypercube sampling was selected to increase the efficiency (Annan et al., 2005;Xie and Zhang, 2010). The Latin-Hypercube sampling method (McKay et al., 1979;Iman and Conover, 1980;McKay, 1988) is based on Monte Carlo simulation but, instead of a purely random sampling approach, it uses a stratified sampling approach. ...
Article
There is a lack of knowledge about the assimilation of multi-sensor, multi-modal water temperature observations into hydrodynamic models of shallow rivers, specifically, (a) how the accuracy of a shallow river model improves after assimilation of in-situ and remotely sensed temperature observations, and (b) how data from disparate sensors and sources can be assimilated into the prediction model without a significant increase in computational burden. Multi-sensor, multi-modal observations used in this study are obtained from in-situ monitoring devices in the river with limited spatial coverage but dense temporal resolution, and from the Landsat-7 satellite, which has better spatial coverage but limited temporal coverage. Use of remotely-sensed observations from satellites poses the challenge that the physical region represented by satellite data does not directly correspond with the physical domain of the river simulated in hydrodynamic models. Satellites detect the skin water temperature, whereas numerical models estimate the bulk temperature in the surface layer that may be multiple meters in thickness. Furthermore, for rivers narrower than the resolution of satellite data, the temperature of each cell represents the weighted average temperature of land and water. These factors introduce biases into the updated numerical model, thereby impeding appropriate management of temperature, water quality, and aquatic ecology. We implemented an efficient ensemble Kalman filter method using Latin-hypercube sampling to assimilate multi-sensor water temperature observations into the hydrodynamic model of the Lower Klamath River located in northern California. Assimilation of remote sensing data from Landsat-7 improved the model prediction for the entire river. The average spatial error was reduced from 2.59 °C to 0.66 °C (i.e., 75% improvement). In-situ data assimilation reduced the error at the observation location; however, the error in the water temperature predicted by the updated model reverted in less than two days to the same level as that of an un-updated model. On the other hand, it is not computationally efficient to assimilate all of the available data into the model as they become available. In order to overcome these challenges, in-situ data were adaptively assimilated into the model whenever the error exceeded a maximum allowable error. Adaptive assimilation of in-situ data for the Lower Klamath River application occurred one to three times per day, and reduced the average daily error by up to 58% compared to assimilation of in-situ data only once each day.
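A minimal sketch of the Latin-hypercube initialisation idea referred to in the excerpt above: one stratified draw per ensemble member in each parameter dimension, instead of unconstrained random sampling. Parameter names and ranges are placeholders.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, rng=None):
    """Stratified Latin hypercube sample: one uniform draw inside each of
    n_samples equal-probability strata per dimension, strata shuffled independently."""
    rng = rng or np.random.default_rng()
    sample = np.empty((n_samples, len(bounds)))
    for j, (lo, hi) in enumerate(bounds):
        strata = (rng.permutation(n_samples) + rng.random(n_samples)) / n_samples
        sample[:, j] = lo + strata * (hi - lo)
    return sample

# Hypothetical ranges for two model parameters (illustrative values only).
bounds = [(0.5, 2.0),    # e.g. a diffusivity-like parameter
          (0.1, 0.9)]    # e.g. a dimensionless coefficient
ensemble_params = latin_hypercube(50, bounds, np.random.default_rng(1))
print(ensemble_params.shape)  # (50, 2): one row per ensemble member
```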
... The adjoint of sea ice-ocean models has so far been applied successfully only for assimilation windows of a few years, and it is very unlikely that the adjoint can be applied to multidecadal windows because it becomes unstable. In comparison with the parameter optimizations using the EnKF approach (e.g., Annan et al. 2005;Massonnet et al. 2014), the mGA approach again requires larger computational resources (approximately one order of magnitude larger than EnKF). Nevertheless, the mGA approach has an advantage when the model shows a strongly nonlinear relation between model state and parameters, for which the EnKF approach requires a larger number of ensemble members or even runs into difficulties. ...
Article
Full-text available
Improvement and optimization of numerical sea ice models are of great relevance for understanding the role of sea ice in the climate system. They are also a prerequisite for meaningful prediction. To improve the simulated sea ice properties, we develop an objective parameter optimization system for a coupled sea ice–ocean model based on a genetic algorithm. To take the interrelation of dynamic and thermodynamic model parameters into account, the system is set up to optimize 15 model parameters simultaneously. The optimization minimizes a cost function composed of the model–observation misfit of three sea ice quantities (concentration, drift, and thickness). The system is applied to a domain covering the entire Arctic and northern North Atlantic Ocean with an optimization window of about two decades (1990–2012). It successfully improves the simulated sea ice properties not only during the period of optimization but also in a validation period (2013–16). The similarity of the final values of the cost function and the resulting sea ice fields from a set of 11 independent optimizations suggests that the obtained sea ice fields are close to the best achievable by the current model setup, which allows us to identify limitations of the model formulation. The optimized parameters are applied in a simulation with a higher-resolution model to examine the portability of the parameters. The result shows good portability, while at the same time showing the importance of the oceanic conditions for that portability.
... Importantly, the impact of simulation crashes on the validity of global sensitivity analysis (GSA) results has often been overlooked in the literature, where simulation crashes are commonly classified as ignorable (see section 1.2). As such, a surprisingly limited number of studies have reported simulation crashes (examples related to uncertainty analysis include Annan et al., 2005;Edwards and Marsh, 2005;Lucas et al., 2013). This is despite the fact that these crashes can be very computationally costly for GSA algorithms because they can waste the rest of the model runs, prevent completion of GSA, or inevitably introduce ambiguity into the inferences drawn from GSA. ...
Article
Full-text available
Complex, software-intensive, technically advanced, and computationally demanding models, presumably with ever-growing realism and fidelity, have been widely used to simulate and predict the dynamics of the Earth and environmental systems. The parameter-induced simulation crash (failure) problem is typical across most of these models, despite considerable efforts that modellers have directed at model development and implementation over the last few decades. A simulation failure mainly occurs due to the violation of the numerical stability conditions, non-robust numerical implementations, or errors in programming. However, the existing sampling-based analysis techniques such as global sensitivity analysis (GSA) methods, which require running these models under many configurations of parameter values, are ill-equipped to effectively deal with model failures. To tackle this problem, we propose a novel approach that allows users to cope with failed designs (samples) during the GSA, without knowing where they took place and without re-running the entire experiment. This approach treats model crashes as missing data and uses strategies such as median substitution, single nearest neighbour, or response surface modelling to fill in for model crashes. We test the proposed approach on a 10-parameter HBV-SASK rainfall-runoff model and a 111-parameter MESH land surface-hydrology model. Our results show that response surface modelling is a superior strategy, out of the data filling strategies tested, and can scale well to the dimensionality of the model, sample size, and the ratio of number of failures to the sample size. Further, we conduct a "failure analysis" and discuss some possible causes of the MESH model failure.
... The parameters that characterize the covariance matrix of the stochastic process are referred to as stochastic parameters from now on. Whereas estimation of deterministic parameters of the dynamical model is straightforward within the ensemble Kalman filter using state augmentation (Annan et al., 2005;Ruiz et al., 2013a), stochastic parameters cannot be estimated in this way. Previous studies on the use of the augmented state approach in the ensemble Kalman filter showed that the lack of correlation between the mean of the ensemble of state variables and the stochastic parameters may lead to unreliable estimations (DelSole and Yang, 2010;Santitissadeekorn and Jones, 2015). ...
Preprint
Full-text available
Stochastic parameterizations are increasingly being used to represent the uncertainty associated with model errors in ensemble forecasting and data assimilation. One of the challenges associated with the use of these parameterizations is the optimization of the properties of the stochastic forcings within their formulation. In this work a hierarchical data assimilation approach based on two nested ensemble Kalman filters is proposed for inferring parameters associated with a stochastic parameterization. The proposed technique is based on the Rao-Blackwellization of the parameter estimation problem. The technique consists in using an ensemble of ensemble Kalman filters, each of them using a different set of stochastic parameter values. We show the ability of the technique to infer parameters related to the covariance structure of stochastic representations of model error in the Lorenz-96 dynamical system. The evaluation is conducted with stochastic twin experiments and imperfect model experiments with unresolved physics in the forecast model. The proposed technique performs successfully under different model error covariance structures. The technique is proposed to be applied offline as part of an a priori optimization of the data assimilation system and could in principle be extended to the estimation of other hyperparameters of a data assimilation system.
... Traditional methods for estimating these parameters include Markov Chain Monte Carlo (MCMC) [198][199][200][201], maximum likelihood estimation [202,203] and the ensemble Kalman filter [204,205]. Note that, if both the state variables γ, ω, b and the associated 9 parameters in these three equations are treated as unobserved variables u_II, then the augmented system does not belong to the conditional Gaussian model family. ...
Article
Full-text available
A conditional Gaussian framework for understanding and predicting complex multiscale nonlinear stochastic systems is developed. Despite the conditional Gaussianity, such systems are nevertheless highly nonlinear and are able to capture the non-Gaussian features of nature. The special structure of the system allows closed analytical formulae for solving the conditional statistics and is thus computationally efficient. A rich gallery of examples of conditional Gaussian systems are illustrated here, which includes data-driven physics-constrained nonlinear stochastic models, stochastically coupled reaction–diffusion models in neuroscience and ecology, and large-scale dynamical models in turbulence, fluids and geophysical flows. Making use of the conditional Gaussian structure, efficient statistically accurate algorithms involving a novel hybrid strategy for different subspaces, a judicious block decomposition and statistical symmetry are developed for solving the Fokker–Planck equation in large dimensions. The conditional Gaussian framework is also applied to develop extremely cheap multiscale data assimilation schemes, such as the stochastic superparameterization, which use particle filters to capture the non-Gaussian statistics on the large-scale part whose dimension is small whereas the statistics of the small-scale part are conditional Gaussian given the large-scale part. Other topics of the conditional Gaussian systems studied here include designing new parameter estimation schemes and understanding model errors.
Preprint
We present the Energy Balance Model – Kalman Filter (EBM-KF), a hybrid model of the global mean surface temperature (GMST) and ocean heat content anomaly (OHCA). It combines an energy balance model with parameters drawn from the literature and a statistical Extended Kalman Filter assimilating observed and/or earth system model-simulated GMST and OHCA. Our motivation is to create an efficient and natural estimator of the climate state and its uncertainty. Our climate emulator has the physical rationale of an annual energy budget, and is compatible with an Extended Kalman Filter both because it forms a set of difference equations (involving 17 constants) and because climate models and historical records of GMST and OHCA follow nearly Gaussian distributions about their relevant means. We illustrate four applications: 1) EBM-KF generates a similar estimate to the 30-year time-averaged climate state 15 years sooner. 2) EBM-KF conveniently assesses annually the likelihood of crossing a policy threshold, e.g., 2°C over preindustrial. 3) The EBM-KF also approximates the behavior of an entire climate model large ensemble using only one or a few ensemble members. 4) The EBM-KF is sufficiently fast to allow thorough sampling from non-Gaussian probabilistic futures, e.g., the impact of rare but significant volcanic eruptions. Indeed, volcanic eruptions dominate the future uncertainty over the slowly growing GMST climate state uncertainty. This sampling with the EBM-KF better determines how future volcanism may affect when policy thresholds will be crossed and what a larger-than-large ensemble including future intermittent volcanism would reveal.
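The predict/update cycle of an extended Kalman filter wrapped around a small energy balance model, as described above, looks roughly like the sketch below. The two-box balance and all coefficients here are illustrative stand-ins, not the 17 calibrated constants of the EBM-KF; since the toy balance is linear, the Jacobian is constant and the filter reduces to a standard Kalman filter.

```python
import numpy as np

# Two-box energy balance: surface temperature anomaly T and a deep-ocean
# temperature anomaly D standing in for ocean heat content (toy constants).
Cs, Cd, lam, gam, dt = 8.0, 100.0, 1.2, 0.7, 1.0

def step(x, forcing):
    T, D = x
    Tn = T + dt / Cs * (forcing - lam * T - gam * (T - D))
    Dn = D + dt / Cd * (gam * (T - D))
    return np.array([Tn, Dn])

def jacobian():
    # Model is linear in (T, D), so the EKF Jacobian is constant here.
    return np.array([[1 + dt / Cs * (-lam - gam), dt / Cs * gam],
                     [dt / Cd * gam,              1 - dt / Cd * gam]])

def ekf_cycle(x, P, forcing, y, R, Q):
    """One predict/update cycle of an extended Kalman filter."""
    F = jacobian()
    x_f = step(x, forcing)                 # forecast state
    P_f = F @ P @ F.T + Q                  # forecast covariance
    H = np.eye(2)                          # both components observed here
    K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)
    x_a = x_f + K @ (y - H @ x_f)          # analysis state
    P_a = (np.eye(2) - K @ H) @ P_f        # analysis covariance
    return x_a, P_a

x, P = np.zeros(2), np.eye(2) * 0.25
R, Q = np.diag([0.05**2, 0.02**2]), np.eye(2) * 1e-4
for year in range(10):
    y_obs = np.array([0.02 * year, 0.01 * year])   # stand-in "observations"
    x, P = ekf_cycle(x, P, forcing=0.04 * year, y=y_obs, R=R, Q=Q)
print(x, np.sqrt(np.diag(P)))
```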
Article
In variational methods, coupled parameter optimization (CPO) often needs a long minimization time window (MTW) to fully incorporate observational information, but the optimal MTW depends to some extent on the model nonlinearity. The analytical four-dimensional ensemble-variational method (A-4DEnVar) accounts for model nonlinearity well and avoids the adjoint model. It can theoretically be applied to CPO. To verify the feasibility and the ability of A-4DEnVar in CPO, “twin” experiments based on A-4DEnVar CPO are conducted for the first time and compared with four-dimensional variational assimilation (4D-Var). The two algorithms use the same background error covariance matrix and optimization algorithm to control variates. The experiments are based on a simple coupled ocean-atmosphere model, in which the atmospheric part is the highly nonlinear Lorenz-63 model, and the oceanic part is a slab ocean model. The results show that both A-4DEnVar and 4D-Var can effectively reduce the error of state variables through CPO. Besides, the two methods produce almost the same results in most cases when the MTW is less than 560 time steps. The results are similar when the MTW is larger than 560 time steps and less than 880 time steps. The largest MTW for both 4D-Var and A-4DEnVar is 1200 time steps. Moreover, A-4DEnVar is not sensitive to ensemble size when the MTW is less than 720 time steps. A-4DEnVar obtains satisfactory results in the case of a highly nonlinear model and a long MTW, suggesting that it has the potential to be widely applied to realistic CPO.
Article
The ensemble Kalman filter (EnKF) was used to estimate the spatial distribution of the Young's modulus of a model of an earth-fill dam by assimilating the first-arrival travel times of surface waves. Through ensemble data assimilation, measured data from a geophysical exploration were used to simultaneously estimate the geotechnical properties and evaluate their uncertainties. Swedish weight sounding (SWS) test results were employed as the prior information to generate the initial ensemble through sequential Gaussian simulation (sGs). The assimilation experiments showed that the reproducibility of the parameter field is enhanced by this initial ensemble generation method, and that the uncertainties of the identified parameters can be reduced by the assimilation.
Article
Full-text available
Parameter estimation plays an important role in reducing model error and thus is of great significance for improving the simulation and prediction capabilities of the model. However, due to filtering divergence, parameter estimation by ensemble-based filters still faces great challenges. Previous studies have shown that a covariance inflation scheme could alleviate the filtering divergence problem by increasing the signal-to-noise ratio of the state-parameter covariance. In this study, we proposed a new inflation scheme based on a local ensemble transform Kalman filter (LETKF). With the new scheme, the Zebiak–Cane (Z-C) model parameters were estimated by assimilating the sea surface temperature anomaly (SSTA) data. The effectiveness of the parameter estimation and its influence on El Niño–Southern Oscillation (ENSO) prediction were evaluated in an observing system simulation experiment (OSSE) framework and in a real-world scenario, respectively. Within the OSSE framework, the results showed that the model parameters were successfully estimated. Parameter estimation reduced the model error when compared with only state estimation (onlySE); however, multiple parameter estimation (MPE) further improved the ENSO prediction skill by providing better initial conditions and parameter values than the single parameter estimation (SPE). Parameter estimation could thus alleviate the spring prediction barrier (SPB) phenomenon of ENSO to a certain extent. In real-world experiments, the optimized parameters significantly improved the ENSO forecasting skill, primarily in the prediction of warm events. This study provides an effective parameter estimation strategy to improve climate models and further climate predictions in the real world.
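Covariance inflation of the kind discussed above is commonly implemented by rescaling ensemble perturbations about the mean; the cited study's LETKF-based scheme is more elaborate, but the generic multiplicative form, sketched below with arbitrary numbers, is simply a spreading of the members away from the mean.

```python
import numpy as np

def inflate(ensemble, factor):
    """Multiplicative covariance inflation: spread each member away from the
    ensemble mean so the sample covariance grows by factor**2. Applied to an
    augmented state, this also boosts the signal-to-noise ratio of the
    state-parameter covariance and helps delay filter divergence."""
    mean = ensemble.mean(axis=1, keepdims=True)
    return mean + factor * (ensemble - mean)

rng = np.random.default_rng(2)
ens = rng.normal(size=(3, 40))          # 3 variables (or parameters), 40 members
print(np.cov(ens).trace(), np.cov(inflate(ens, 1.1)).trace())
```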
Article
The Ensemble Kalman Filter (EnKF) is a popular sequential data assimilation method that has been increasingly used for parameter estimation and forecast prediction in epidemiological studies. The observation function plays a critical role in the EnKF framework, connecting the unknown system variables with the observed data. Key differences in observed data and modeling assumptions have led to the use of different observation functions in the epidemic modeling literature. In this work, we present a novel computational analysis demonstrating the effects of observation function selection when using the EnKF for state and parameter estimation in this setting. In examining the use of four epidemiologically-inspired observation functions of different forms in connection with the classic Susceptible-Infectious-Recovered (SIR) model, we show how incorrect observation modeling assumptions (i.e., fitting incidence data with a prevalence model, or neglecting under-reporting) can lead to inaccurate filtering estimates and forecast predictions. Results demonstrate how strongly the corresponding EnKF estimates depend on choosing an observation function that properly interprets the available data, in several filtering scenarios including state estimation with known parameters, and combined state and parameter estimation with both constant and time-varying parameters. Numerical experiments further illustrate how modifying the observation noise covariance matrix in the filter can help to account for uncertainty in the observation function in certain cases.
Article
Full-text available
We investigate the relative importance of ecosystem complexity and phytoplankton light absorption for climate studies. While the complexity of Earth System models (ESMs) with respect to marine biota has increased over the past years, the relative importance of biological processes in driving climate-relevant mechanisms such as the biological carbon pump and phytoplankton light absorption is still unknown. The climate effects of these mechanisms have been studied separately, but not together. To shed light on the role of biologically mediated feedbacks, we performed different model experiments with the EcoGENIE ESM. The model experiments have been conducted with and without phytoplankton light absorption and with two or 12 plankton functional types. For a robust comparison, all simulations are tuned to have the same primary production. Our model experiments show that phytoplankton light absorption changes ocean physics and biogeochemistry. Higher sea surface temperature decreases the solubility of CO2, which in turn increases the atmospheric CO2 concentration, and the atmospheric temperature finally rises by 0.45°C. An increase in ecosystem complexity increases the export production of particulate organic carbon but decreases the amount of dissolved organic matter. These changes in the marine carbon cycling, however, hardly reduce the atmospheric CO2 concentration and only slightly decrease the atmospheric temperature, by 0.034°C. Overall, we show that phytoplankton light absorption has a higher impact on the carbon cycle and on the climate system than a more detailed representation of the marine biota.
Article
We propose a control method that changes the geometry of a fishing net into an arbitrary geometry. To control the net geometry, an automatic control system was constructed by integrating the data assimilation method into a fishing net dynamics simulation. This study focused on the function of the data assimilation method to estimate the unknown parameter needed to control the net geometry. By applying the parameter estimation, the length of the material and loading were set as unknown parameters and estimated to be an intended geometry of the fishing net. Further, geometry control experiments consisting of numerical simulations were conducted for validation. This was achieved by using a simplified plane net model and a trawl net model. An automatic control system using the extended Kalman filter was applied. In addition, we confirmed that the net geometry can be controlled in real space by the automatic control system. For validation, the results of experiments conducted in an experimental flume tank were compared with the numerical simulation results of the plane net geometry by using the automatic control system that integrated an ensemble Kalman filter. The numerical simulation results were found to be congruent with those of the flume tank experiments, confirming the validity of the proposed control system.
Preprint
Full-text available
Uncertain or inaccurate parameters in sea ice models influence seasonal predictions and climate change projections in terms of both mean and trend. We explore the feasibility and benefits of applying an Ensemble Kalman filter (EnKF) to estimate parameters in the Los Alamos sea ice model (CICE). Parameter estimation (PE) is applied to the highly influential dry snow grain radius and combined with state estimation in a series of perfect model observing system simulation experiments (OSSEs). Allowing the parameter to vary in space improves performance along the sea ice edge compared to requiring the parameter to be uniform everywhere. We compare experiments with both PE and state estimation to experiments with only the latter and find that the benefits of PE mostly occur after the DA period, when no observations are available to assimilate (i.e., the forecast period), which suggests PE's relevance for improving seasonal predictions of Arctic sea ice.
Article
Full-text available
This paper describes and evaluates the assimilation component of a seamless sea ice prediction system, which is developed based on the fully coupled Alfred Wegener Institute, Helmholtz Center for Polar and Marine Research Climate Model (AWI-CM, v1.1). Its ocean/ice component with unstructured-mesh discretization and smoothly varying spatial resolution enables seamless sea ice prediction across a wide range of space and time scales. The model is complemented with the Parallel Data Assimilation Framework to assimilate observations in the ocean/ice component with an Ensemble Kalman Filter. The focus here is on the data assimilation of the prediction system. First, the performance of the system is tested in a perfect-model setting with synthetic observations. The system exhibits no drift for multivariate assimilation, which is a prerequisite for the robustness of the system. Second, real observational data for sea ice concentration, thickness, drift, and sea surface temperature are assimilated. The analysis results are evaluated against independent in situ observations and reanalysis data. Further experiments that assimilate different combinations of variables are conducted to understand their individual impacts on the model state. In particular, assimilating sea ice drift improves the sea ice thickness estimate, and assimilating sea surface temperature is able to avert a circulation bias of the free-running model in the Arctic Ocean at middepth. Finally, we present preliminary results obtained with an extended system where the atmosphere is constrained by nudging toward reanalysis data, revealing challenges that still need to be overcome to adapt the ocean/ice assimilation. We consider this system a prototype on the way toward strongly coupled data assimilation across all model components.
Article
Full-text available
We investigate the feasibility of addressing model error by perturbing and estimating uncertain static model parameters using the localized ensemble transform Kalman filter. In particular, we use the augmented state approach, where parameters are updated by observations via their correlation with observed state variables. This online approach offers a flexible, yet consistent way to better fit model variables affected by the chosen parameters to observations, while ensuring feasible model states. We show in a nearly-operational convection-permitting configuration that the prediction of clouds and precipitation with the COSMO-DE model is improved if the two-dimensional roughness length parameter is estimated with the augmented state approach. Here, the targeted model error is the roughness length itself and the surface fluxes, which influence the initiation of convection. At analysis time, Gaussian noise with a specified correlation matrix is added to the roughness length to regulate the parameter spread. In the northern part of the COSMO-DE domain, where the terrain is mostly flat and assimilated surface wind measurements are dense, estimating the roughness length led to improved forecasts of clouds and precipitation up to six hours ahead. In the southern part of the domain, the parameter estimation was detrimental unless the correlation length scale of the Gaussian noise that is added to the roughness length is increased. The impact of the parameter estimation was found to be larger when synoptic forcing is weak and the model output is more sensitive to the roughness length.
Article
By using the Grünwald-Letnikov (G-L) difference method and the Tustin generating function method, this study presents extended Kalman filters to achieve satisfactory state estimation for fractional-order nonlinear continuous-time systems that contain some unknown parameters with correlated fractional-order colored noise. Based on the G-L difference method and the Tustin generating function method, the difference equations corresponding to fractional-order nonlinear continuous-time systems are constructed respectively. The first-order Taylor expansion is used to linearize the nonlinear functions in the estimated system, which provides the system model for the extended Kalman filters. Using the augmented vector method, the unknown parameters are regarded as new state vectors, and the augmented difference equation is constructed. Based on the augmented difference equation, extended Kalman filters are designed to estimate the state of fractional-order nonlinear systems with process noise as fractional-order colored noise or measurement noise as fractional-order colored noise. Meanwhile, the extended Kalman filters proposed in this paper can also estimate the unknown parameters effectively. Finally, the effectiveness of the proposed extended Kalman filters is validated in simulation with two examples.
Article
Full-text available
While various data assimilation algorithms based on Bayes' theorem have been developed for state estimation, some of these algorithms have also been applied to model parameter estimation. Coupled model parameter estimation (CPE) adjusts model parameters using available observations; the observation-adjusted parameters can then greatly mitigate the model bias, which has great potential to reduce climate drift and enhance forecast skill in coupled climate models. However, given numerous model parameters that are associated with multiple time scales, how to conduct CPE with the simultaneous estimation of multiple parameters (SEMP) is still an active research topic. With the aid of three coupled models, ranging from a conceptual coupled model to an intermediate coupled circulation model, this study has developed a systematic method to implement the SEMP–CPE. Linking coupled model sensitivities with the signal-to-noise ratio of the CPE, the SEMP–CPE method uses a timescale structure with coupled model sensitivities to determine which and how many parameters are estimated simultaneously in each CPE cycle so as to minimize the error of the coupled model simulation. Given that, in a coupled model, the timescales by which different model components sensitively respond to a parameter perturbation can be quite different due to the different variabilities associated with their characteristic timescales, the first part of our study series focuses on the SEMP–CPE associated with single model component sensitivities. The results show that the quality of the model state analysis (in terms of assimilation) improves as parameters are added to the estimation in order of sensitivity, until the signal-to-noise ratio reaches a low threshold. Only when the most impactful physical parameters are estimated does the error of the state estimation consistently decrease and the signal-to-noise ratio of the state-parameter covariance in the SEMP scheme increase. Only when the signal extracted by the SEMP–CPE reaches saturation is the signal-to-noise ratio in the SEMP–CPE maximized and the state estimation error minimized. Otherwise, if parameters with low sensitivities are included in the CPE, the error of the state estimation increases instead. These results provide some insight into simultaneously estimating multiple parameters in a biased coupled general circulation model that assimilates real observations, which further improves climate analysis and prediction initialization.
Article
Full-text available
While nonlinear stochastic partial differential equations arise naturally in spatiotemporal modeling, inference for such systems often faces two major challenges: sparse noisy data and ill-posedness of the inverse problem of parameter estimation. To overcome the challenges, we introduce a strongly regularized posterior by normalizing the likelihood and by imposing physical constraints through priors of the parameters and states. We investigate joint parameter-state estimation by the regularized posterior in a physically motivated nonlinear stochastic energy balance model (SEBM) for paleoclimate reconstruction. The high-dimensional posterior is sampled by a particle Gibbs sampler that combines MCMC with an optimal particle filter exploiting the structure of the SEBM. In tests using either Gaussian or uniform priors based on the physical range of parameters, the regularized posteriors overcome the ill-posedness and lead to samples within physical ranges, quantifying the uncertainty in estimation. Due to the ill-posedness and the regularization, the posterior of parameters presents a relatively large uncertainty, and consequently, the maximum of the posterior, which is the minimizer in a variational approach, can have a large variation. In contrast, the posterior of states generally concentrates near the truth, substantially filtering out observation noise and reducing uncertainty in the unconstrained SEBM.
Conference Paper
Full-text available
The authors have set out explicitly, for the usual parabolic heat equation, a general method of estimating distributed parameters. This method, based on control theory (or the calculus of variations) in function spaces, may be used in any situation where the state and observation equations are differentiable. It makes it numerically possible to consider the distributed parameter as a discretized function, and to adjust the discretization values of the parameter, the parameter being a function of either an independent variable (space, time) or a dependent variable (temperature, pressure). The uniqueness of the solution of the inverse problem has been discussed, and some partial uniqueness results have been given, on both theoretical and numerical grounds.
Article
Full-text available
Data assimilation experiments are performed using an ensemble Kalman filter (EnKF) implemented for a two-layer spectral shallow water model at triangular truncation T100 representing an abstract planet covered by a strongly stratified fluid. Advantage is taken of the inherent parallelism in the EnKF by running each ensemble member on a different processor of a parallel computer. The Kalman filter update step is parallelized by letting each processor handle the observations from a limited region. The algorithm is applied to the assimilation of synthetic altimetry data in the context of an imperfect model and known representation-error statistics. The effect of finite ensemble size on the residual errors is investigated and the error estimates obtained with the EnKF are compared to the actual errors.
Article
Full-text available
A multivariate ensemble Kalman filter (MvEnKF) implemented on a massively parallel computer architecture has been developed for the Poseidon ocean circulation model and tested with a Pacific basin model configuration. There are about 2 million prognostic state-vector variables. Parallelism for the data assimilation step is achieved by regionalization of the background-error covariances that are calculated from the phase-space distribution of the ensemble. Each processing element (PE) collects elements of a matrix measurement functional from nearby PEs. To avoid the introduction of spurious long-range covariances associated with finite ensemble sizes, the background-error covariances are given compact support by means of a Hadamard (element by element) product with a three-dimensional canonical correlation function. The methodology and the MvEnKF implementation are discussed. To verify the proper functioning of the algorithms, results from an initial experiment with in situ temperature data are presented. Furthermore, it is shown that the regionalization of the background covariances has a negligible impact on the quality of the analyses. Even though the parallel algorithm is very efficient for large numbers of observations, individual PE memory, rather than speed, dictates how large an ensemble can be used in practice on a platform with distributed memory.
Article
Full-text available
A probability distribution for values of the effective climate sensitivity, with a lower bound of 1.6 K (5th percentile), is obtained on the basis of the increase in ocean heat content in recent decades from analyses of observed interior-ocean temperature changes, surface temperature changes measured since 1860, and estimates of anthropogenic and natural radiative forcing of the climate system. Radiative forcing is the greatest source of uncertainty in the calculation; the result also depends, somewhat, on the rate of ocean heat uptake in the late nineteenth century, for which an assumption is needed as there is no observational estimate. Because the method does not use the climate sensitivity simulated by a general circulation model, it provides an independent observationally based constraint on this important parameter of the climate system.
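The arithmetic behind this type of estimate is a global energy budget: the net forcing F minus the rate of ocean heat uptake N is balanced by the radiative response λΔT, so the effective sensitivity to doubled CO2 is F_2x·ΔT/(F − N). The numbers below are placeholders chosen only to make the sketch runnable, not the paper's values.

```python
# Effective climate sensitivity from the global energy budget (illustrative numbers).
F_2x = 3.7          # W m-2, canonical forcing for doubled CO2
dT = 0.6            # K, observed surface warming over the period (placeholder)
F = 1.8             # W m-2, net anthropogenic + natural forcing (placeholder)
N = 0.6             # W m-2, rate of ocean heat uptake (placeholder)

lam = (F - N) / dT                  # W m-2 K-1, climate feedback parameter
S_eff = F_2x / lam                  # K, effective sensitivity to doubled CO2
print(f"lambda = {lam:.2f} W m-2 K-1, effective sensitivity = {S_eff:.1f} K")
```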
Article
Full-text available
Two types of sampling plans are examined as alternatives to simple random sampling in Monte Carlo studies. These plans are shown to be improvements over simple random sampling with respect to variance for a class of estimators that includes the sample mean and the empirical distribution function.
Article
Full-text available
An ensemble Kalman filter may be considered for the 4D assimilation of atmospheric data. In this paper, an efficient implementation of the analysis step of the filter is proposed. It employs a Schur (elementwise) product of the covariances of the background error calculated from the ensemble and a correlation function having local support to filter the small (and noisy) background-error covariances associated with remote observations. To solve the Kalman filter equations, the observations are organized into batches that are assimilated sequentially. For each batch, a Cholesky decomposition method is used to solve the system of linear equations. The ensemble of background fields is updated at each step of the sequential algorithm and, as more and more batches of observations are assimilated, evolves to eventually become the ensemble of analysis fields. A prototype sequential filter has been developed. Experiments are performed with a simulated observational network consisting of 542 radiosonde and 615 satellite-thickness profiles. Experimental results indicate that the quality of the analysis is almost independent of the number of batches (except when the ensemble is very small). This supports the use of a sequential algorithm. A parallel version of the algorithm is described and used to assimilate over 100 000 observations into a pair of 50-member ensembles. Its operation count is proportional to the number of observations, the number of analysis grid points, and the number of ensemble members. In view of the flexibility of the sequential filter and its encouraging performance on a NEC SX-4 computer, an application with a primitive equations model can now be envisioned.
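A compact sketch of the two ingredients described above for one batch of observations: a Schur (elementwise) product of the ensemble background-error covariance with a compactly supported correlation (a simple triangular taper stands in for the correlation function used operationally), followed by a Cholesky solve of the innovation system. All dimensions and values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_ens = 60, 25                                    # grid points, ensemble members
grid = np.arange(n, dtype=float)

# Background ensemble with spatially correlated synthetic errors.
ens = np.cumsum(rng.standard_normal((n, n_ens)), axis=0) * 0.1

def localized_analysis(ens, obs_idx, y, obs_err_std, L=10.0):
    """One batch of a sequential EnKF analysis: Schur-product localization of
    the ensemble covariance, then a Cholesky solve of the innovation system."""
    anomalies = ens - ens.mean(axis=1, keepdims=True)
    P = anomalies @ anomalies.T / (ens.shape[1] - 1)
    dist = np.abs(grid[:, None] - grid[None, :])
    C = np.clip(1.0 - dist / L, 0.0, None)           # compactly supported taper
    P_loc = P * C                                    # Schur (elementwise) product
    H = np.zeros((len(obs_idx), ens.shape[0]))
    H[np.arange(len(obs_idx)), obs_idx] = 1.0
    R = np.eye(len(obs_idx)) * obs_err_std**2
    S = H @ P_loc @ H.T + R
    cF = np.linalg.cholesky(S)                       # S = cF @ cF.T
    y_pert = y[:, None] + obs_err_std * rng.standard_normal((len(y), ens.shape[1]))
    innov = y_pert - H @ ens
    z = np.linalg.solve(cF.T, np.linalg.solve(cF, innov))   # z = S^{-1} innov
    return ens + P_loc @ H.T @ z

obs_idx = np.array([5, 20, 40, 55])
y = np.zeros(len(obs_idx))                           # pretend the truth is zero there
ens = localized_analysis(ens, obs_idx, y, obs_err_std=0.2)
print(ens.mean(axis=1)[obs_idx])
```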
Article
Full-text available
The possibility of performing data assimilation using the flow-dependent statistics calculated from an ensemble of short-range forecasts (a technique referred to as ensemble Kalman filtering) is examined in an idealized environment. Using a three-level, quasigeostrophic, T21 model and simulated observations, experiments are performed in a perfect-model context. By using forward interpolation operators from the model state to the observations, the ensemble Kalman filter is able to utilize nonconventional observations. In order to maintain a representative spread between the ensemble members and avoid a problem of inbreeding, a pair of ensemble Kalman filters is configured so that the assimilation of data using one ensemble of short-range forecasts as background fields employs the weights calculated from the other ensemble of short-range forecasts. This configuration is found to work well: the spread between the ensemble members resembles the difference between the ensemble mean and the true state, except in the case of the smallest ensembles. A series of 30-day data assimilation cycles is performed using ensembles of different sizes. The results indicate that (i) as the size of the ensembles increases, correlations are estimated more accurately and the root-mean-square analysis error decreases, as expected, and (ii) ensembles having on the order of 100 members are sufficient to accurately describe local anisotropic, baroclinic correlation structures. Due to the difficulty of accurately estimating the small correlations associated with remote observations, a cutoff radius beyond which observations are not used, is implemented. It is found that (a) for a given ensemble size there is an optimal value of this cutoff radius, and (b) the optimal cutoff radius increases as the ensemble size increases.
Article
Full-text available
We describe the use of a Markov chain Monte Carlo (MCMC) method based on Bayes' Theorem and the Metropolis-Hastings algorithm for estimation of model parameters in a climate model. We use the model of Saltzman and Maasch (1990). This is a computationally simple model, but with seven free parameters and substantial non-linearity it would be difficult to tune with commonly used data assimilation methods. When forced with solar radiation, the model can reproduce mean ocean temperature, atmospheric CO2 concentration and global ice volume reasonably well over the last 500 ka. The MCMC method samples the multivariate probability density function of the model parameters, which makes it a powerful tool not only for estimating parameter values but also for calculating the model's sensitivity to each parameter. A major attraction of the method is the simplicity and ease of implementation of the algorithm. We have used cross-validation to show that the model forecast for the next 50-100 ka is of similar accuracy to the hindcast over the last 500 ka. The model forecasts an immediate cooling of the Earth, with the next glacial maximum in around 60 ka. An anthropogenic pulse of CO2 has a short-term effect but does not influence the model prediction beyond 30 ka. Beyond 100 ka into the future, the model ensemble diverges widely, indicating that there is insufficient information in the data which we have used to determine the longer-term evolution of the Earth's climate.
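The core of such an MCMC estimation is a random-walk Metropolis-Hastings loop over the model parameters. The sketch below uses a trivial placeholder forward model (a damped oscillation) where the Saltzman and Maasch equations would be integrated in practice; all names and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def model(params, t):
    # Placeholder forward model standing in for the climate model integration.
    a, omega = params
    return np.exp(-a * t) * np.cos(omega * t)

t = np.linspace(0.0, 10.0, 50)
data = model(np.array([0.3, 2.0]), t) + 0.05 * rng.standard_normal(t.size)

def log_posterior(params, sigma=0.05):
    if np.any(params <= 0):                   # simple positivity prior
        return -np.inf
    resid = data - model(params, t)
    return -0.5 * np.sum(resid**2) / sigma**2

def metropolis_hastings(n_iter=5000, step=0.05):
    chain = np.empty((n_iter, 2))
    current = np.array([0.5, 1.5])
    lp = log_posterior(current)
    for i in range(n_iter):
        proposal = current + step * rng.standard_normal(2)
        lp_new = log_posterior(proposal)
        if np.log(rng.random()) < lp_new - lp:    # accept with prob min(1, ratio)
            current, lp = proposal, lp_new
        chain[i] = current
    return chain

chain = metropolis_hastings()
print(chain[1000:].mean(axis=0))              # posterior means after burn-in
```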
Article
Full-text available
In this study we examine the anthropogenically forced climate response over the historical period, 1860 to present, and projected response to 2100, using updated emissions scenarios and an improved coupled model (HadCM3) that does not use flux adjustments. We concentrate on four new Special Report on Emission Scenarios (SRES) namely (A1FI, A2, B2, B1) prepared for the Intergovernmental Panel on Climate Change Third Assessment Report, considered more self-consistent in their socio-economic and emissions structure, and therefore more policy relevant, than older scenarios like IS92a. We include an interactive model representation of the anthropogenic sulfur cycle and both direct and indirect forcings from sulfate aerosols, but omit the second indirect forcing effect through cloud lifetimes. The modelled first indirect forcing effect through cloud droplet size is near the centre of the IPCC uncertainty range. We also model variations in tropospheric and stratospheric ozone. Greenhouse gas-forced climate change response in B2 resembles patterns in IS92a but is smaller. Sulfate aerosol and ozone forcing substantially modulates the response, cooling the land, particularly northern mid-latitudes, and altering the monsoon structure. By 2100, global mean warming in SRES scenarios ranges from 2.6 to 5.3 K above 1900 and precipitation rises by 1%/K through the twenty first century (1.4%/K omitting aerosol changes). Large-scale patterns of response broadly resemble those in an earlier model (HadCM2), but with important regional differences, particularly in the tropics. Some divergence in future response occurs across scenarios for the regions considered, but marked drying in the mid-USA and southern Europe and significantly wetter conditions for South Asia, in June-July-August, are robust and significant.
Article
Full-text available
A new earth system climate model of intermediate complexity has been developed and its climatology compared to observations. The UVic Earth System Climate Model consists of a three-dimensional ocean general circulation model coupled to a thermodynamic/dynamic sea-ice model, an energy-moisture balance atmospheric model with dynamical feedbacks, and a thermomechanical land-ice model. In order to keep the model computationally efficient a reduced complexity atmosphere model is used. Atmospheric heat and freshwater transports are parametrized through Fickian diffusion, and precipitation is assumed to occur when the relative humidity is greater than 85%. Moisture transport can also be accomplished through advection if desired. Precipitation over land is assumed to return instantaneously to the ocean via one of 33 observed river drainage basins. Ice and snow albedo feedbacks are included in the coupled model by locally increasing the prescribed latitudinal profile of the planetary albedo. The atmospheric model includes a parametrization of water vapour/planetary longwave feedbacks, although the radiative forcing associated with changes in atmospheric CO2 is prescribed as a modification of the planetary longwave radiative flux. A specified lapse rate is used to reduce the surface temperature over land where there is topography. The model uses prescribed present-day winds in its climatology, although a dynamical wind feedback is included which exploits a latitudinally-varying empirical relationship between atmospheric surface temperature and density. The ocean component of the coupled model is based on the Geophysical Fluid Dynamics Laboratory (GFDL) Modular Ocean Model 2.2, with a global resolution of 3.6° (zonal) by 1.8° (meridional) and 19 vertical levels, and includes an option for brine-rejection parametrization. The sea-ice component incorporates an elastic-viscous-plastic rheology to represent sea-ice dynamics and various options for the representation of sea-ice thermodynamics and thickness distribution. The systematic comparison of the coupled model with observations reveals good agreement, especially when moisture transport is accomplished through advection. Global warming simulations conducted using the model to explore the role of moisture advection reveal a climate sensitivity of 3.0°C for a doubling of CO2, in line with other more comprehensive coupled models. Moisture advection, together with the wind feedback, leads to a transient simulation in which the meridional overturning in the North Atlantic initially weakens, but is eventually re-established to its initial strength once the radiative forcing is held fixed, as found in many coupled atmosphere General Circulation Models (GCMs). This is in contrast to experiments in which moisture transport is accomplished through diffusion whereby the overturning is re-established to a strength that is greater than its initial condition. When applied to the climate of the Last Glacial Maximum (LGM), the model obtains tropical cooling (30°N-30°S), relative to the present, of about 2.1°C over the ocean and 3.6°C over the land. These are generally cooler than CLIMAP estimates, but not as cool as some other reconstructions. This moderate cooling is consistent with alkenone reconstructions and a low to medium climate sensitivity to perturbations in radiative forcing. An amplification of the cooling occurs in the North Atlantic due to the weakening of North Atlantic Deep Water formation.
Concurrent with this weakening is a shallowing of, and a more northward penetration of, Antarctic Bottom Water. Climate models are usually evaluated by spinning them up under perpetual present-day forcing and comparing the model results with present-day observations. Implicit in this approach is the assumption that the present-day observations are in equilibrium with the present-day radiative forcing. The comparison of a long transient integration (starting at 6 KBP), forced by changing radiative forcing (solar, CO2, orbital), with an equilibrium integration reveals substantial differences. Relative to the climatology from the present-day equilibrium integration, the global mean surface air and sea surface temperatures (SSTs) are 0.74°C and 0.55°C colder, respectively. Deep ocean temperatures are substantially cooler and southern hemisphere sea-ice cover is 22% greater, although the North Atlantic conveyor remains remarkably stable in all cases. The differences are due to the long timescale memory of the deep ocean to climatic conditions which prevailed throughout the late Holocene. It is also demonstrated that a global warming simulation that starts from an equilibrium present-day climate (cold start) underestimates the global temperature increase at 2100 by 13% when compared to a transient simulation, under historical solar, CO2 and orbital forcing, that is also extended out to 2100. This is larger (13% compared to 9.8%) than the difference from an analogous transient experiment which does not include historical changes in solar forcing. These results suggest that those groups that do not account for solar forcing changes over the twentieth century may slightly underestimate (~3% in our model) the projected warming by the year 2100.
Article
Full-text available
This paper discusses an important issue related to the implementation and interpretation of the analysis scheme in the ensemble Kalman filter. It is shown that the observations must be treated as random variables at the analysis steps. That is, one should add random perturbations with the correct statistics to the observations and generate an ensemble of observations that then is used in updating the ensemble of model states. Traditionally, this has not been done in previous applications of the ensemble Kalman filter and, as will be shown, this has resulted in an updated ensemble with a variance that is too low. This simple modification of the analysis scheme results in a completely consistent approach if the covariance of the ensemble of model states is interpreted as the prediction error covariance, and there are no further requirements on the ensemble Kalman filter method, except for the use of an ensemble of sufficient size. Thus, there is a unique correspondence between the error statistics from the ensemble Kalman filter and the standard Kalman filter approach.
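The effect described above, an under-dispersed analysis ensemble when observations are not perturbed, is easy to see in a scalar toy problem; the numbers below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
n_ens = 100_000                          # large ensemble so sampling noise is small
prior_var, obs_var, y = 1.0, 1.0, 0.7

prior = rng.normal(0.0, np.sqrt(prior_var), n_ens)
K = prior_var / (prior_var + obs_var)    # scalar Kalman gain

# (a) perturbed observations: each member sees its own noisy copy of y
y_pert = y + rng.normal(0.0, np.sqrt(obs_var), n_ens)
analysis_pert = prior + K * (y_pert - prior)

# (b) unperturbed observations: every member is updated toward the same y
analysis_unpert = prior + K * (y - prior)

print("theoretical posterior variance:", prior_var * obs_var / (prior_var + obs_var))
print("with perturbed obs:   ", analysis_pert.var())
print("without perturbed obs:", analysis_unpert.var())   # too low, as the paper notes
```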
Article
Full-text available
Different atmosphere-ocean general circulation models produce significantly different projections of climate change in response to increases in greenhouse gases and aerosol concentrations in the atmosphere. The main reasons for this disagreement are differences in the sensitivities of the models to external radiative forcing and differences in their rates of heat uptake by the deep ocean. In this study, these properties are constrained by comparing radiosonde-based observations of temperature trends in the free troposphere and lower stratosphere with corresponding simulations of a fast, flexible climate model, using techniques based on optimal fingerprinting. Parameter choices corresponding either to low sensitivity, or to high sensitivity combined with slow oceanic heat uptake are rejected. Nevertheless, a broad range of acceptable model characteristics remains, such that climate change projections from any single model should be treated as only one of a range of possibilities.
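The fingerprinting step can be summarized, very schematically, as a generalized least-squares regression of observed trends onto model-simulated response patterns; the sketch below uses synthetic placeholder arrays rather than the study's radiosonde data or model output.

```python
import numpy as np

rng = np.random.default_rng(1)

p = 30                                   # number of trend "locations" (illustrative)
X = rng.normal(size=(p, 2))              # two model fingerprints (stand-ins for forced responses)
C = np.eye(p)                            # internal-variability covariance (assumed known)
y = X @ np.array([1.2, 0.8]) + rng.multivariate_normal(np.zeros(p), 0.1 * C)

# Generalized least-squares estimate of the scaling factors beta in y ~ X beta + noise(C)
Ci = np.linalg.inv(C)
beta = np.linalg.solve(X.T @ Ci @ X, X.T @ Ci @ y)
beta_cov = np.linalg.inv(X.T @ Ci @ X)   # uncertainty of the scaling factors
```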
Article
Full-text available
We define the radiative forcings used in climate simulations with the SI2000 version of the Goddard Institute for Space Studies (GISS) global climate model. These include temporal variations of well-mixed greenhouse gases, stratospheric aerosols, solar irradiance, ozone, stratospheric water vapor, and tropospheric aerosols. Our illustrations focus on the period 1951-2050, but we make the full data sets available for those forcings for which we have earlier data. We illustrate the global response to these forcings for the SI2000 model with specified sea surface temperature and with a simple Q-flux ocean, thus helping to characterize the efficacy of each forcing. The model yields good agreement with observed global temperature change and heat storage in the ocean. This agreement does not yield an improved assessment of climate sensitivity or a confirmation of the net climate forcing because of possible compensations with opposite changes of these quantities. Nevertheless, the results imply that observed global temperature change during the past 50 years is primarily a response to radiative forcings. It is also inferred that the planet is now out of radiation balance by 0.5 to 1 W/m² and that additional global warming of about 0.5°C is already "in the pipeline."
Article
Full-text available
The assessment of uncertainties in global warming projections is often based on expert judgement, because a number of key variables in climate change are poorly quantified. In particular, the sensitivity of climate to changing greenhouse-gas concentrations in the atmosphere and the radiative forcing effects by aerosols are not well constrained, leading to large uncertainties in global warming simulations. Here we present a Monte Carlo approach to produce probabilistic climate projections, using a climate model of reduced complexity. The uncertainties in the input parameters and in the model itself are taken into account, and past observations of oceanic and atmospheric warming are used to constrain the range of realistic model responses. We obtain a probability density function for the present-day total radiative forcing, giving 1.4 to 2.4 W m⁻² for the 5-95 per cent confidence range, narrowing the global-mean indirect aerosol effect to the range of 0 to -1.2 W m⁻². Ensemble simulations for two illustrative emission scenarios suggest a 40 per cent probability that global-mean surface temperature increase will exceed the range predicted by the Intergovernmental Panel on Climate Change (IPCC), but only a 5 per cent probability that warming will fall below that range.
Article
Full-text available
The ring-shedding process in the Agulhas Current is studied using the ensemble Kalman filter to assimilate Geosat altimeter data into a two layer quasi-geostrophic ocean model. The properties of the ensemble Kalman filter are further explored with focus on the analysis scheme and the use of gridded data. The Geosat data consist of 10 fields of gridded sea-surface height anomalies separated 10 days apart which are added to a climatic mean field. This corresponds to a huge number of data values and a data reduction scheme must be applied to increase the efficiency of the analysis procedure. Further, it is illustrated how one can resolve the rank problem occurring when a too large data set or a small ensemble is used.
Article
Two types of sampling plans are examined as alternatives to simple random sampling in Monte Carlo studies. These plans are shown to be improvements over simple random sampling with respect to variance for a class of estimators which includes the sample mean and the empirical distribution function.
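A minimal Latin hypercube sampler, illustrating the stratification that gives these plans their variance advantage over simple random sampling (the test integrand is an arbitrary example):

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng=None):
    """Draw a Latin hypercube sample on [0, 1]^n_dims: each of the n_samples
    equal-probability strata of every dimension is sampled exactly once."""
    rng = rng or np.random.default_rng()
    u = rng.uniform(size=(n_samples, n_dims))                       # position within each stratum
    strata = np.array([rng.permutation(n_samples) for _ in range(n_dims)]).T
    return (strata + u) / n_samples

# Example: estimate E[f] for f(x) = sum(x_i^2) with 100 stratified samples (true value is 1.0)
x = latin_hypercube(100, 3, np.random.default_rng(0))
estimate = np.mean(np.sum(x**2, axis=1))
```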
Article
A theory for estimating the probability distribution of the state of a model given a set of observations exists. This nonlinear filtering theory unifies the data assimilation and ensemble generation problem that have been key foci of prediction and predictability research for numerical weather and ocean prediction applications. A new algorithm, referred to as an ensemble adjustment Kalman filter, and the more traditional implementation of the ensemble Kalman filter in which "perturbed observations" are used, are derived as Monte Carlo approximations to the nonlinear filter. Both ensemble Kalman filter methods produce assimilations with small ensemble mean errors while providing reasonable measures of uncertainty in the assimilated variables. The ensemble methods can assimilate observations with a nonlinear relation to model state variables and can also use observations to estimate the value of imprecisely known model parameters. These ensemble filter methods are shown to have significant advantages over four-dimensional variational assimilation in low-order models and scale easily to much larger applications. Heuristic modifications to the filtering algorithms allow them to be applied efficiently to very large models by sequentially processing observations and computing the impact of each observation on each state variable in an independent calculation. The ensemble adjustment Kalman filter is applied to a nondivergent barotropic model on the sphere to demonstrate the capabilities of the filters in models with state spaces that are much larger than the ensemble size. When observations are assimilated in the traditional ensemble Kalman filter, the resulting updated ensemble has a mean that is consistent with the value given by filtering theory, but only the expected value of the covariance of the updated ensemble is consistent with the theory. The ensemble adjustment Kalman filter computes a linear operator that is applied to the prior ensemble estimate of the state, resulting in an updated ensemble whose mean and also covariance are consistent with the theory. In the cases compared here, the ensemble adjustment Kalman filter performs significantly better than the traditional ensemble Kalman filter, apparently because noise introduced into the assimilated ensemble through perturbed observations in the traditional filter limits its relative performance. This superior performance may not occur for all problems and is expected to be most notable for small ensembles. Still, the results suggest that careful study of the capabilities of different varieties of ensemble Kalman filters is appropriate when exploring new applications.
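For a single observed scalar, the "adjustment" idea can be sketched as follows: compute the exact Gaussian posterior mean and variance, then linearly shift and rescale the prior ensemble so it matches them, avoiding the sampling noise introduced by perturbed observations (all values are illustrative, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(2)

x = rng.normal(1.0, 2.0, size=20)     # prior ensemble of a single observed variable
y, r = 3.0, 1.0                       # observation and its error variance (illustrative)

m_f, v_f = x.mean(), x.var(ddof=1)

# Exact Gaussian (Kalman) posterior mean and variance
v_a = 1.0 / (1.0 / v_f + 1.0 / r)
m_a = v_a * (m_f / v_f + y / r)

# Ensemble adjustment: linearly map the prior members so the updated ensemble
# has exactly the posterior mean and variance (no random perturbations needed).
x_a = m_a + np.sqrt(v_a / v_f) * (x - m_f)
```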
Article
The size and impacts of anthropogenically induced climate change (AICC) strongly depend on the climate sensitivity, ΔT2x. If ΔT2x is less than the lower bound given by the Intergovernmental Panel on Climate Change (IPCC), 1.5°C, then AICC may not be a serious problem for humanity. If ΔT2x is greater than the upper bound given by the IPCC, 4.5°C, then AICC may be one of the most severe problems of the 21st century. Here we use a simple climate/ocean model, the observed near-surface temperature record, and a bootstrap technique to objectively estimate the probability density function for ΔT2x. We find that as a result of natural variability and uncertainty in the climatic radiative forcing, the 90% confidence interval for ΔT2x is 1.0°C to 9.3°C. Consequently, there is a 54% likelihood that ΔT2x lies outside the IPCC range.
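A generic sketch of the bootstrap step: resample the observed record with replacement many times, re-apply the estimator, and read off percentiles of the resulting distribution; the record and estimator below are toy placeholders, not the paper's climate/ocean model.

```python
import numpy as np

rng = np.random.default_rng(3)
record = rng.normal(0.6, 0.2, size=50)         # stand-in "observed" record

def estimator(sample):
    # Placeholder for the quantity inferred from the record
    # (in the paper, climate sensitivity via a simple climate/ocean model).
    return sample.mean()

boot = np.array([estimator(rng.choice(record, size=record.size, replace=True))
                 for _ in range(5000)])
lo, hi = np.percentile(boot, [5, 95])          # 90% bootstrap confidence interval
```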
Article
The identification of parameters in partial differential equations from experimental output data is investigated. It is assumed that a physical process can be represented by a system of nonlinear hyperbolic or parabolic partial differential equations of known form but containing unknown parameters. The parameters may enter in the equations themselves or the boundary conditions. A steep descent algorithm is derived based on minimizing the difference between the experimentally observed output and that predicted by the model. The question of observability of distributed systems is considered. The determination of the reaction velocity constant for a first-order decomposition in an isothermal, laminar-flow tubular reactor is treated in detail.
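A toy steepest-descent example in the same spirit: fitting the rate constant k of a first-order decay c(t) = c0 exp(-kt) by descending the gradient of the squared misfit to noisy observations (the step size, data, and starting guess are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

c0, k_true = 1.0, 0.7
t = np.linspace(0.0, 5.0, 20)
obs = c0 * np.exp(-k_true * t) + 0.01 * rng.normal(size=t.size)

def cost_and_grad(k):
    model = c0 * np.exp(-k * t)
    resid = model - obs
    cost = 0.5 * np.sum(resid**2)
    grad = np.sum(resid * (-t) * model)     # d(cost)/dk, since d(model)/dk = -t * model
    return cost, grad

k = 0.3                                     # initial guess (assumed)
lr = 0.02                                   # step size (assumed)
for _ in range(500):
    _, g = cost_and_grad(k)
    k -= lr * g                             # steepest descent toward the observed decay rate
```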
Article
Attempts to estimate the state of the ocean usually involve one of two approaches: Either an assimilation of data (typically altimetric surface height) is performed or an inversion is carried out according to some minimization scheme. The former case normally retains some version of the time-dependent equations of motion; the latter is usually steady. Data sources are frequently not ideal for either approach, usually being spatially and temporally confined (e.g., from an oceanographic cruise). This raises particular difficulties for inversions, whose physics seldom includes much beyond the geostrophic balance. In this paper the authors examine an approach midway between the two, examining several questions, (i) What is the impact of data assimilated continuously to a steady state on regions outside the data sources? (ii) Can remote data improve the long-term mean of a model whose natural response is not close to climatology? (iii) Can an eddy-free model assimilate data containing eddies? The authors employ an inversion using a simple North Atlantic model, which permits no eddies, but contains better dynamics than geostrophy (the frictional planetary geostrophic equations), and an assimilative scheme rather simpler than those normally employed, almost equivalent to direct data insertion, run to a steady state. The data used are real subsurface data, which do contain eddies, from World Ocean Circulation Experiment cruises in the northern North Atlantic. The presence of noise in these data is found to cause no numerical difficulties, and the authors show that the impact of even one vertical profile can strongly modify the water mass properties of the solution far from the data region through a combination of wave propagation, advection, and diffusion. Because the model can be run for very long times, the region of impact is thus somewhat wider than would occur for assimilations over short intervals, such as a year.
Article
Ed Lorenz, pioneer of chaos theory, presented this work at an earlier ECMWF workshop on predictability. The paper, which has never been published externally, presents what is widely known as the Lorenz 1996 model. Ed was unable to come to the 2002 meeting, but we decided it would be proper to acknowledge Ed’s unrivalled contribution to the field of weather and climate predictability by publishing his 1996 paper in this volume. The difference between the state that a system is assumed or predicted to possess, and the state that it actually possesses or will possess, constitutes the error in specifying or forecasting the state. We identify the rate at which an error will typically grow or decay, as the range of prediction increases, as the key factor in determining the extent to which a system is predictable. The long-term average factor by which an infinitesimal error will amplify or diminish, per unit time, is the leading Lyapunov number; its logarithm, denoted by λ1, is the leading Lyapunov exponent. Instantaneous growth rates can differ appreciably from the average. With the aid of some simple models, we describe situations where errors behave as would be expected from a knowledge of λ1, and other situations, particularly in the earliest and latest stages of growth, where their behaviour is systematically different. Slow growth in the latest stages may be especially relevant to the long-range predictability of the atmosphere.
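The model itself is compact enough to state in a few lines; the sketch below integrates the standard 40-variable configuration (F = 8, both values assumed here) and estimates the mean exponential error-growth rate from a pair of trajectories, a crude finite-time proxy for the leading Lyapunov exponent discussed in the paper.

```python
import numpy as np

def l96_tendency(x, F=8.0):
    """Lorenz '96 tendencies: dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4(x, dt, F=8.0):
    k1 = l96_tendency(x, F)
    k2 = l96_tendency(x + 0.5 * dt * k1, F)
    k3 = l96_tendency(x + 0.5 * dt * k2, F)
    k4 = l96_tendency(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

rng = np.random.default_rng(5)
n, dt = 40, 0.05
x = rng.normal(size=n)
for _ in range(1000):                    # spin up onto the attractor
    x = rk4(x, dt)

# Grow a small perturbation and estimate the mean exponential error-growth rate,
# a finite-time stand-in for the leading Lyapunov exponent.
eps0 = 1e-8
xp = x + eps0 * rng.normal(size=n) / np.sqrt(n)
steps = 100
x1, x2 = x.copy(), xp.copy()
for _ in range(steps):
    x1, x2 = rk4(x1, dt), rk4(x2, dt)
growth = np.log(np.linalg.norm(x2 - x1) / np.linalg.norm(xp - x)) / (steps * dt)
```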
Article
The ring-shedding process in the Agulhas Current is studied using the ensemble Kalman filter to assimilate Geosat altimeter data into a two-layer quasigeostrophic ocean model. The properties of the ensemble Kalman filter are further explored with focus on the analysis scheme and the use of gridded data. The Geosat data consist of 10 fields of gridded sea surface height anomalies separated 10 days apart that are added to a climatic mean field. This corresponds to a huge number of data values, and a data reduction scheme must be applied to increase the efficiency of the analysis procedure. Further, it is illustrated how one can resolve the rank problem occurring when a too large dataset or a small ensemble is used.
Article
We provide a detailed, introductory exposition of the Metropolis-Hastings algorithm, a powerful Markov chain method to simulate multivariate distributions. A simple, intuitive derivation of this method is given along with guidance on implementation. Also discussed are two applications of the algorithm, one for implementing acceptance-rejection sampling when a blanketing function is not available and the other for implementing the algorithm with block-at-a-time scans. In the latter situation, many different algorithms, including the Gibbs sampler, are shown to be special cases of the Metropolis-Hastings algorithm. The methods are illustrated with examples.
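A minimal random-walk Metropolis-Hastings sampler (symmetric Gaussian proposal, so the Hastings correction cancels); the correlated bivariate Gaussian target is just an illustrative stand-in for a posterior distribution:

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_steps, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings: propose x' ~ N(x, step^2 I) and accept
    with probability min(1, target(x') / target(x)). Returns the chain."""
    rng = rng or np.random.default_rng()
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    chain = np.empty((n_steps, x.size))
    logp = log_target(x)
    for i in range(n_steps):
        prop = x + step * rng.normal(size=x.size)
        logp_prop = log_target(prop)
        if np.log(rng.uniform()) < logp_prop - logp:   # symmetric proposal: ratio of targets only
            x, logp = prop, logp_prop
        chain[i] = x
    return chain

# Example: sample a correlated bivariate Gaussian "posterior"
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
prec = np.linalg.inv(cov)
log_target = lambda x: -0.5 * x @ prec @ x
samples = metropolis_hastings(log_target, [0.0, 0.0], 20000)
```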
Article
A variational assimilation technique is presented which continuously adjusts a model solution by introducing a correction term to the model equations. The technique is essentially a modification of the adjoint technique. The Variational Continuous Assimilation (VCA) technique optimizes the correction to the model equations rather than the initial conditions as is done in the adjoint technique. Application of this correction during a forecast produced substantially improved simulations.
Article
We explore sensitivity analyses of ocean circulation models by comparing the adjoint and direct-perturbation methods. We study the sensitivity of time-averaged inter-gyre vorticity transport to the imposed wind-stress curl in an eddy-permitting reduced-gravity ocean model of a double gyre. Two regimes exist: a non-chaotic regime for low wind-stress curl, and a chaotic regime for stronger wind forcing. Direct-perturbation methods are found to converge, with increasing integration time, to a stable ‘climate’ sensitivity in both the chaotic and non-chaotic regimes. The adjoint method converges in the non-chaotic regime but diverges in the chaotic regime. The divergence of adjoint sensitivity in the chaotic regime is directly related to the chaotic divergence of solution trajectories through phase-space. Thus, standard adjoint sensitivity methods cannot be used to estimate climate sensitivity in chaotic ocean circulation models. An alternative method using an ensemble of adjoint calculations is explored. This is found to give estimates of the climate sensitivity of the time-mean vorticity transport with O(25%) error or less for integration times ranging from one month to one year. The ensemble-adjoint method is particularly useful when one wishes to produce a map of sensitivities (for example, the sensitivity of the advective vorticity transport to wind stress at every point in the domain) as direct sensitivity calculations for each point in the map are avoided. However, an ensemble-adjoint of the variance of the vorticity transport to wind-stress curl fails to estimate the climate sensitivity. We conclude that the most reliable method of determining the climate sensitivity is the direct-perturbation method, but ensemble-adjoint techniques may be of use in some problems. Copyright © 2002 Royal Meteorological Society.
Article
The ensemble Kalman filter (EnKF) has been proposed for operational atmospheric data assimilation. Some outstanding issues relate to the required ensemble size, the impact of localization methods on balance, and the representation of model error. To investigate these issues, a sequential EnKF has been used to assimilate simulated radiosonde, satellite thickness, and aircraft reports into a dry, global, primitive-equation model. The model uses the simple forcing and dissipation proposed by Held and Suarez. It has 21 levels in the vertical, includes topography, and uses a 144 × 72 horizontal grid. In total, about 80 000 observations are assimilated per day. It is found that the use of severe localization in the EnKF causes substantial imbalance in the analyses. As the distance of imposed zero correlation increases to about 3000 km, the amount of imbalance becomes acceptably small. A series of 14-day data assimilation cycles are performed with different configurations of the EnKF. Included is an experiment in which the model is assumed to be perfect and experiments in which model error is simulated by the addition of an ensemble of approximately balanced model perturbations with a specified statistical structure. The results indicate that the EnKF, with 64 ensemble members, performs well in the present context. The growth rate of small perturbations in the model is examined and found to be slow compared with the corresponding growth rate in an operational forecast model. This is partly due to a lack of horizontal resolution and partly due to a lack of realistic parameterizations. The growth rates in both models are found to be smaller than the growth rate of differences between forecasts with the operational model and verifying analyses. It is concluded that model-error simulation would be important, if either of these models were to be used with the EnKF for the assimilation of real observations.
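The localization discussed above can be sketched as an elementwise (Schur) product of the ensemble covariance with a distance-dependent taper that forces correlations to zero beyond a chosen separation; here a Gaussian taper truncated at three length scales stands in for the compactly supported functions used in practice, and the grid, ensemble size, and length scale are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)

n, N = 100, 64                       # grid points, ensemble members (illustrative)
X = rng.normal(size=(n, N))
A = X - X.mean(axis=1, keepdims=True)
P = A @ A.T / (N - 1)                # raw, noisy ensemble covariance

# Distance-based taper: correlations are damped with separation and set to zero
# far away, suppressing spurious long-range covariances from the small ensemble.
i = np.arange(n)
dist = np.abs(i[:, None] - i[None, :])
L = 10.0                             # localization length scale in grid points (assumed)
taper = np.exp(-0.5 * (dist / L) ** 2)
taper[dist > 3 * L] = 0.0
P_loc = P * taper                    # Schur (elementwise) product
```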
Article
The phrase ‘model error’ means different things to different people, frequently arousing surprisingly passionate emotions. Everyone accepts that all models are wrong, but to some this is simply an annoying caveat on otherwise robust (albeit model-dependent) conclusions, while to others it means that no inference based on ‘electronic storytelling’ can be taken seriously at all. This chapter will focus on how to quantify and minimise the cumulative effect of model ‘imperfections’ (errors by any other name, but we are trying to avoid inflammatory language) that either have not been eliminated because of incomplete observations/understanding or cannot be eliminated because they are intrinsic to the model’s structure. We will not provide a recipe for eliminating these imperfections, but rather some ideas on how to live with them. Live with them we must, because no matter how clever model developers, or how fast supercomputers, become, these imperfections will always be with us and represent the hardest source of uncertainty to quantify in a weather or climate forecast (Smith, this volume). This is not meant to underestimate the importance of identifying and improving representations of dynamics (see Hoskins, this volume) or parametrisations (see Palmer, this volume) or existing (and planned) ensemble-based forecast systems (Anderson, Buizza, this volume), merely to draw attention to the fact that our models will always be subject to error or inadequacy (Smith, this volume), and that this fact is especially chronic in those cases where we lack the ability to use conventional verification/falsification procedures (i.e. the climate forecasting problem).
Article
The purpose of this paper is to provide a comprehensive presentation and interpretation of the Ensemble Kalman Filter (EnKF) and its numerical implementation. The EnKF has a large user group, and numerous publications have discussed applications and theoretical aspects of it. This paper reviews the important results from these studies and also presents new ideas and alternative interpretations which further explain the success of the EnKF. In addition to providing the theoretical framework needed for using the EnKF, there is also a focus on the algorithmic formulation and optimal numerical implementation. A program listing is given for some of the key subroutines. The paper also touches upon specific issues such as the use of nonlinear measurements, in situ profiles of temperature and salinity, and data which are available with high frequency in time. An ensemble based optimal interpolation (EnOI) scheme is presented as a cost-effective approach which may serve as an alternative to the EnKF in some applications. A fairly extensive discussion is devoted to the use of time correlated model errors and the estimation of model bias.
Article
The classical filtering and prediction problem is re-examined using the Bode-Shannon representation of random processes and the “state-transition” method of analysis of dynamic systems. New results are: (1) The formulation and methods of solution of the problem apply without modification to stationary and nonstationary statistics and to growing-memory and infinite-memory filters. (2) A nonlinear difference (or differential) equation is derived for the covariance matrix of the optimal estimation error. From the solution of this equation the coefficients of the difference (or differential) equation of the optimal linear filter are obtained without further calculations. (3) The filtering problem is shown to be the dual of the noise-free regulator problem. The new method developed here is applied to two well-known problems, confirming and extending earlier results. The discussion is largely self-contained and proceeds from first principles; basic concepts of the theory of random processes are reviewed in the Appendix.
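In the scalar linear-Gaussian case the filter reduces to a short recursion in which the error variance obeys its own difference (Riccati-type) equation, independent of the data, and yields the optimal gain at each step; a minimal sketch with illustrative system parameters:

```python
import numpy as np

rng = np.random.default_rng(7)

# Scalar linear-Gaussian system: x_k = a x_{k-1} + w,  y_k = x_k + v
a, q, r = 0.95, 0.1, 0.5         # dynamics coefficient, process and observation error variances
x_true, ys = 0.0, []
for _ in range(100):
    x_true = a * x_true + rng.normal(scale=np.sqrt(q))
    ys.append(x_true + rng.normal(scale=np.sqrt(r)))

# Kalman filter: forecast, then update with the gain computed from the error variance p.
x_hat, p = 0.0, 1.0
for y in ys:
    # forecast step
    x_hat, p = a * x_hat, a * a * p + q
    # update step
    k = p / (p + r)                      # optimal gain
    x_hat = x_hat + k * (y - x_hat)
    p = (1.0 - k) * p                    # posterior error variance
```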
Article
A new sequential data assimilation method is discussed. It is based on forecasting the error statistics using Monte Carlo methods, a better alternative than solving the traditional and computationally extremely demanding approximate error covariance equation used in the extended Kalman filter. The unbounded error growth found in the extended Kalman filter, which is caused by an overly simplified closure in the error covariance equation, is completely eliminated. Open boundaries can be handled as long as the ocean model is well posed. Well-known numerical instabilities associated with the error covariance equation are avoided because storage and evolution of the error covariance matrix itself are not needed. The results are also better than what is provided by the extended Kalman filter since there is no closure problem and the quality of the forecast error statistics therefore improves. The method should be feasible also for more sophisticated primitive equation models. The computational load for reasonable accuracy is only a fraction of what is required for the extended Kalman filter and is given by the storage of, say, 100 model states for an ensemble size of 100 and thus CPU requirements of the order of the cost of 100 model integrations. The proposed method can therefore be used with realistic nonlinear ocean models on large domains on existing computers, and it is also well suited for parallel computers and clusters of workstations where each processor integrates a few members of the ensemble.
Article
The arrival of satellite-borne ocean colour sensors means that there will soon be a wealth of observations of the surface concentration of chlorophyll in the world's oceans. These observations can be used to improve our understanding of the oceanic ecosystem if the appropriate data assimilation techniques are available to combine them with an ecosystem model. In this paper we explore a novel method, based on Bayes Theorem and a Monte Carlo Markov Chain algorithm, of estimating a subset of the parameters in a seven-compartment ecosystem model. The model describes the flows of nitrogen amongst phytoplankton, zooplankton, nitrate, bacteria, ammonium, dissolved organic nitrogen and detritus. We first generate synthetic observations from the model and then, in three separate experiments, try to recover subsets of the model parameters from clean and noisy versions of these. Bayes Theorem allows us to combine both prior information on the parameter values and the observations to generate a posterior probability density function of the parameters. The Metropolis-Hastings algorithm then allows us to produce Markov chains that sample this posterior probability density function and recover the parameter means, variances and standard errors. We find that the technique is very successful in recovering information on a small number of parameters but that the time required to solve the model makes it impractical to find second order properties of more than about ten of the model parameters.
Article
A new technique, Bayesian Monte Carlo (BMC), is used to quantify errors in water quality models caused by uncertain parameters. BMC also provides estimates of parameter uncertainty as a function of observed data on model state variables. The use of Bayesian inference generates uncertainty estimates that combine prior information on parameter uncertainty with observed variation in water quality data to provide an improved estimate of model parameter and output uncertainty. It also combines Monte Carlo analysis with Bayesian inference to determine the ability of random selected parameter sets to simulate observed data. BMC expands upon previous studies by providing a quantitative estimate of parameter acceptability using the statistical likelihood function. The likelihood of each parameter set is employed to generate an n-dimensional hypercube describing a probability distribution of each parameter and the covariance among parameters. These distributions are utilized to estimate uncertainty in model predictions. Application of BMC to a dissolved oxygen model reduced the estimated uncertainty in model output by 72% compared with standard Monte Carlo techniques. Sixty percent of this reduction was directly attributed to consideration of covariance between model parameters. A significant benefit of the technique is the ability to compare the reduction in total model output uncertainty corresponding to: (1) collection of more data on model state variables, and (2) laboratory or field studies to better define model processes. Limitations of the technique include computational requirements and accurate estimation of the joint probability distribution of model errors. This analysis was conducted assuming that model error is normally and independently distributed.
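A generic sketch of the BMC procedure: draw parameter sets from the prior, weight each by the Gaussian likelihood of the observations it reproduces, and form posterior summaries as weighted averages over the prior sample; the decay model and data below are toy placeholders, not a water quality model.

```python
import numpy as np

rng = np.random.default_rng(8)

t = np.linspace(0.0, 10.0, 25)
def model(k):                          # toy forward model: first-order exponential decay
    return np.exp(-k * t)

k_true, sigma = 0.4, 0.05
obs = model(k_true) + sigma * rng.normal(size=t.size)

# 1. Monte Carlo sample from the prior parameter distribution
k_prior = rng.uniform(0.05, 1.0, size=5000)

# 2. Likelihood of each parameter set (model error assumed Gaussian with known variance)
resid = np.array([model(k) - obs for k in k_prior])
loglik = -0.5 * np.sum(resid**2, axis=1) / sigma**2
w = np.exp(loglik - loglik.max())
w /= w.sum()

# 3. Posterior summaries as likelihood-weighted averages over the prior sample
k_mean = np.sum(w * k_prior)
k_var = np.sum(w * (k_prior - k_mean) ** 2)
```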
Article
Bayesian methods are experiencing increased use for probabilistic ecological modelling. Most Bayesian inference requires the numerical approximation of analytically intractable integrals. Two methods based on Monte Carlo simulation have appeared in the ecological/environmental modelling literature. Though they sound similar, the Bayesian Monte Carlo (BMC) and Markov Chain Monte Carlo (MCMC) methods are very different in their efficiency and effectiveness in providing useful approximations for accurate inference in Bayesian applications. We compare these two methods using a low-dimensional biochemical oxygen demand decay model as an example. We demonstrate that the BMC is extremely inefficient because the prior parameter distribution, from which the Monte Carlo sample is drawn, is often a poor surrogate for the posterior parameter distribution, particularly if the parameters are highly correlated. In contrast, MCMC generates a chain that converges, in distribution, on the posterior parameter distribution, that can be regarded as a sample from the posterior distribution. The inefficiency of the BMC can lead to marginal posterior parameter distributions that appear irregular and may be highly misleading because the important region of the posterior distribution may never be sampled. We also point out that a priori specification of the model error variance can strongly influence the estimation of the principal model parameters. Although the BMC does not require that the model error variance be specified, most published applications have treated this variance as a known constant. Finally, we note that most published BMC applications have chosen a uniform prior distribution, making the BMC more similar to a likelihood-based inference rather than a Bayesian method because the posterior is unaffected by the prior. Though other prior distributions could be applied, the treatment of Monte Carlo samples with any other choice of prior distribution has not been discussed in the BMC literature.
Book
The product of a unique collaboration among four leading scientists in academic research and industry, Numerical Recipes is a complete text and reference book on scientific computing. In a self-contained manner it proceeds from mathematical and theoretical considerations to actual practical computer routines. With over 100 new routines bringing the total to well over 300, plus upgraded versions of the original routines, the new edition remains the most practical, comprehensive handbook of scientific computing available today.
Article
A simplified climate model is presented which includes a fully 3-D, frictional geostrophic (FG) ocean component but retains an integration efficiency considerably greater than extant climate models with 3-D, primitive-equation ocean representations (20 kyears of integration can be completed in about a day on a PC). The model also includes an Energy and Moisture Balance atmosphere and a dynamic and thermodynamic sea-ice model. Using a semi-random ensemble of 1,000 simulations, we address both the inverse problem of parameter estimation, and the direct problem of quantifying the uncertainty due to mixing and transport parameters. Our results represent a first attempt at tuning a 3-D climate model by a strictly defined procedure, which nevertheless considers the whole of the appropriate parameter space. Model estimates of meridional overturning and Atlantic heat transport are well reproduced, while errors are reduced only moderately by a doubling of resolution. Model parameters are only weakly constrained by data, while strong correlations between mean error and parameter values are mostly found to be an artefact of single-parameter studies, not indicative of global model behaviour. Single-parameter sensitivity studies can therefore be misleading. Given a single, illustrative scenario of CO2 increase and fixing the polynomial coefficients governing the extremely simple radiation parameterisation, the spread of model predictions for global mean warming due solely to the transport parameters is around one degree after 100 years forcing, although in a typical 4,000-year ensemble-member simulation, the peak rate of warming in the deep Pacific occurs 400 years after the onset of the forcing. The corresponding uncertainty in Atlantic overturning after 100 years is around 5 Sv, with a small, but non-negligible, probability of a collapse in the long term.
Article
This paper addresses some fundamental methodological issues concerning the sensitivity analysis of chaotic geophysical systems. We show, using the Lorenz (1963) system as an example, that a naive approach to variational ("adjoint") sensitivity analysis is of limited utility. Applied to trajectories which are long relative to the predictability time scales of the system, cumulative error growth means that adjoint results diverge exponentially from the "macroscopic climate sensitivity" (that is, the sensitivity of time averaged properties of the system to finite-amplitude perturbations). This problem occurs even for time-averaged quantities and given infinite computing resources. Alternatively, applied to very short trajectories, the adjoint provides an incorrect estimate of the sensitivity, even if averaged over large numbers of initial conditions, because a finite time scale is required for the model climate to respond fully to certain perturbations. In the Lorenz (1963) system, an int...
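A direct-perturbation ("macroscopic") sensitivity estimate for the Lorenz (1963) system, of the kind the adjoint results diverge from: the long-time mean of z is finite-differenced with respect to a perturbation in rho (the integration length, step, and perturbation size are assumptions):

```python
import numpy as np

def lorenz63_tendency(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def mean_z(rho, T=500.0, dt=0.01, spinup=50.0):
    """Long-time average of z for a given rho (4th-order Runge-Kutta)."""
    s = np.array([1.0, 1.0, 20.0])
    n_spin, n_avg = int(spinup / dt), int(T / dt)
    acc = 0.0
    for i in range(n_spin + n_avg):
        k1 = lorenz63_tendency(s, rho=rho)
        k2 = lorenz63_tendency(s + 0.5 * dt * k1, rho=rho)
        k3 = lorenz63_tendency(s + 0.5 * dt * k2, rho=rho)
        k4 = lorenz63_tendency(s + dt * k3, rho=rho)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        if i >= n_spin:
            acc += s[2]
    return acc / n_avg

# Finite-amplitude, finite-difference estimate of d<z>/d(rho): the time-averaged
# response to a finite perturbation, rather than the gradient along one trajectory.
drho = 1.0
sens = (mean_z(28.0 + drho) - mean_z(28.0 - drho)) / (2.0 * drho)
```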