Recent progress in performance evaluations and near real-time assessment of operational ocean
products
Fabrice Hernandez(a,e)*, Edward Blockley(b), Gary B. Brassington(c), Fraser Davidson(d), Prasanth Divakaran(c), Marie Drévillon(e), Shiro Ishizaki(f), Marcos Garcia-Sotillo(g), Patrick J. Hogan(h), Priidik Lagemaa(i), Bruno Levier(e), Matthew Martin(b), Avichal Mehra(j), Christopher Mooers(j), Nicolas Ferry(e), Andrew Ryan(b), Charly Regnier(e), Alistair Sellar(b), Gregory C. Smith(l), Sarantis Sofianos(m), Todd Spindler(j), Gianluca Volpe(n), John Wilkin(o), Edward D. Zaron(k) and Aijun Zhang(p)

(a) Institut de Recherche pour le Développement (IRD), LEGOS, Toulouse, France; (b) Met Office, Ocean Forecasting Research & Development, Exeter, UK; (c) Centre for Australian Weather and Climate Research, Australian Bureau of Meteorology, Melbourne, Australia; (d) Fisheries and Oceans, St John's, Canada; (e) Mercator Océan, Ramonville St Agne, France; (f) Japan Meteorological Agency (JMA), Ohtemachi, Tokyo, Japan; (g) Puertos del Estado, Madrid, Spain; (h) Navy Research Laboratory/Stennis Space Center, Mississippi, USA; (i) Marine Systems Institute at Tallinn University of Technology, Tallinn, Estonia; (j) Environmental Modeling Center, NOAA/NWS/NCEP, College Park, Maryland, USA; (k) Portland State University, Department of Civil and Environmental Engineering, Portland, Oregon, USA; (l) Environment Canada, Montréal, Canada; (m) Ocean Physics and Modeling Group, University of Athens, Athens, Greece; (n) Istituto di Scienze dell'Atmosfera e del Clima, Rome, Italy; (o) Institute of Marine and Coastal Sciences, Rutgers, The State University of New Jersey, USA; (p) Center for Operational Oceanographic Products and Services, NOAA, Silver Spring, Maryland, USA

*Corresponding author. Email: fhernandez@mercator-ocean.fr

Journal of Operational Oceanography, 2015, Vol. 8, No. S2, s221–s238, http://dx.doi.org/10.1080/1755876X.2015.1050282
Operational ocean forecast systems provide routine marine products to an ever-widening community of users and
stakeholders. The majority of users need information about the quality and reliability of the products to exploit them fully.
Hence, forecast centres have been developing improved methods for evaluating and communicating the quality of their
products. Global Ocean Data Assimilation Experiment (GODAE) OceanView, along with the Copernicus European
Marine Core Service and other national and international programmes, has facilitated the development of coordinated
validation activities among these centres. New metrics, assessing a wider range of ocean parameters, have been defined
and implemented in real-time. An overview of recent progress and emerging international standards is presented here.
Introduction
Operational ocean forecast systems (OOFSs) now provide a wide range of analyses and forecasts of the marine environment that can be exploited by many users. The value of the products to any particular user depends not only on the quality and skill of the products but also on the user's knowledge (and understanding) of the quality, skill and reliability of the products for his or her particular application. Since the initial implementation of OOFSs during the late 1990s, continuous efforts have been made to evaluate hindcast and forecast accuracy and skill (Hernandez 2011; Martin 2011). Accuracy and skill here are defined respectively as the OOFS products' degree of closeness to the ocean "truth" (Hernandez 2011) and the OOFS's usefulness for a given application (Jolliff et al. 2009). An overview of skill assessment using observations and other reference datasets representing this truth is given by Stow et al. (2009).
The calibration, validation, verification and quality control of OOFS products are core activities in ocean operational centres (OOCs) (Lellouche et al. 2013; Oke et al. 2013; Blockley et al. 2014). Usually calibration refers to a task in which model parameters are optimized. Here, the calibration phase refers to the last comprehensive scientific assessment of a new OOFS version before it enters operation. The calibration phase is also often used to demonstrate that the new system performs better than the existing system. Validation refers to the OOFS performance assessment while in operation. Verification is defined here as the quantification of OOFS skill based on independent data, i.e. data not used to generate the products.
Methods for assessing OOFS reliability (Crosnier & Le Provost 2007) were defined in the early days of the Global Ocean Data Assimilation Experiment (GODAE) (Bell et al. 2009), based on (1) consistency, (2) quality (or accuracy) and (3) added value, as proposed and defined by weather forecast skill verification approaches (Murphy 1993; Murphy & Winkler 1987). The first two types of assessment are undertaken routinely by OOCs as internal metrics. The third is considered user-oriented, and requires the use of external metrics measuring the fitness for purpose (provision of dependable, reliable and repeatable information), or the value of ocean
forecast services. This is also addressed by some OOCs in parallel with verification tasks performed by users.
Experts in OOCs across the world who are assessing the skill of OOFSs face similar issues with the observational data sets available for validating their products. In general, in situ and satellite measurements are collected by dedicated data assembly centres (DACs) that pre-process the data and make them available for OOFSs. Hence, for similar components of operational systems, methods and tools for assessing the representation of ocean processes can be shared. Owing to the nature and quality of the observations, validation experts also face comparable issues, such as the validation of mesoscale chlorophyll or primary products using ocean colour satellite data, or the use of Lagrangian approaches and drifters to verify the realism of eddies in regional models for oil-spill forecast skill. As a result, there is great potential for collaboration within the scientific community in this area.
Naturally, working groups were set up to tackle these validation issues as a community. This started earlier within the ocean observation community, which formed expert groups to develop guidelines and standards for providing state-of-the-art ocean observation products. Some examples of these are the Ocean Surface Topography Science Team for sea surface height (SSH) and satellite altimetry (www.aviso.altimetry.fr), the Group for High Resolution Sea Surface Temperature (GHRSST) for sea surface temperature (SST) from various satellite and in situ sensors (www.ghrsst.org), the Argo team for in situ vertical profiles of primarily temperature (T) and salinity (S) (www.argo.ucsd.edu), the Global Ocean Surface Underway Data group for sea surface salinity (SSS) (www.gosud.org) and the International Ocean Color Coordinating Group (www.ioccg.org).
This paper aims to highlight recent progress in near-real-time monitoring of OOFS performance, and to describe different validation strategies and their limitations. For the sake of completeness, we also present DACs' validation procedures for advanced observation-based products. Some recent examples are presented and discussed in the first section. The next section illustrates progress by OOCs in integrating validation in their systems. More specifically, since GODAE (Bell et al. 2009), the operational community has maintained a partnership to share and standardize validation methodologies. This community has gained mutual benefit from inter-comparing their ocean products and inferring the relative strengths and weaknesses of the operational systems (Oke et al. 2012). These issues are addressed in the framework of the ongoing GODAE OceanView (GOV) programme (Schiller et al. 2015) (www.godae-oceanview.org) by the Inter-comparison and Validation Task Team (IV-TT). Three initiatives have started and are ongoing: an Ocean Reanalysis Intercomparison (Balmaseda et al. 2015); the organization of a multi-model ensemble forecast approach for ocean surface parameters; and the organization of a near-real-time inter-comparison of operational products against observations using Class 4 metrics, described later in this paper and detailed in two companion papers (Ryan et al. 2015; Divakaran et al. 2015).
Recent improvements in near-real-time and operational assessment
New metrics
Presently, real-time OOFS skill assessment focuses on various aspects of the dynamics of physical and biogeochemical processes in the ocean, at different time-scales, over different areas, and with different purposes and uses. Evaluation metrics have evolved in order to synthesize different aspects of system performance together. For example, Taylor (2001) and target (Jolliff et al. 2009) diagrams consider root-mean-square error (RMSE) or root-mean-square differences (RMSD) together with anomaly correlations versus observations. Similarly, cost functions and model efficiency values (Hyder et al. 2012) can provide a synthesis of model performance indicators.
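To make the construction of such summary diagrams concrete, the short sketch below computes the quantities typically plotted on Taylor and target diagrams (anomaly correlation, normalized standard deviation, centred RMSD and bias) from co-located model and observation values. It is a minimal illustration in generic NumPy with synthetic data, not the diagnostic code of any particular centre.

import numpy as np

def taylor_stats(model, obs):
    """Summary statistics behind Taylor/target diagrams: correlation,
    standard deviation ratio, centred RMSD and bias of model vs observations."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    m_anom = model - model.mean()
    o_anom = obs - obs.mean()
    corr = np.corrcoef(model, obs)[0, 1]                # anomaly correlation
    std_ratio = model.std() / obs.std()                 # normalized standard deviation
    crmsd = np.sqrt(np.mean((m_anom - o_anom) ** 2))    # centred (bias-free) RMSD
    bias = model.mean() - obs.mean()                    # shown on target diagrams
    return {"corr": corr, "std_ratio": std_ratio, "crmsd": crmsd, "bias": bias}

# Example with synthetic SST values (degrees Celsius)
rng = np.random.default_rng(0)
obs = 15 + rng.normal(0, 1.0, 500)
model = obs + rng.normal(0.2, 0.5, 500)   # biased, noisy "forecast"
print(taylor_stats(model, obs))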
Furthermore, new metrics have been designed to characterize other properties. For example, in the case of search and rescue, ensemble predictions and clouds of dispersion (Melsom et al. 2012) have been used to evaluate the contribution of uncertainty in ocean currents to drift projections. Dispersion is also assessed using multi-model approaches (such as Fukushima Cesium-137 concentration estimates; Masumoto et al. 2012). New metrics have also been defined for sea ice, such as contingency tables and distribution statistics used over ensemble coupled-model seasonal forecast experiments (Benestad et al. 2011). Skill assessment of ocean biogeochemical models has also been addressed recently. In particular, Lynch et al. (2009) point out not only the failure of a model to accurately represent the ocean truth, but also the failure to correctly assess the effective skill of the model using appropriate metrics.
Assimilation performance assessment
In parallel, the monitoring of the performance of analysis systems has been continuously improved. Statistics derived from innovations (observation minus background) and residuals (observation minus analysis) are used to assess the consistency of the assimilation framework (including the model background and observation error covariances). In the case of ensemble analysis systems, these statistics can be used to verify the adequacy of the forecast spread (Balmaseda et al. 2013; Desroziers et al. 2005; Desroziers & Ivanov 2001). Verifying and reducing ocean model biases is also an important issue: many assimilation methods assume that models are unbiased, and unhandled biases can reduce the efficiency of analysis methods and even lead to unphysical increments.
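As an illustration of how innovation and residual statistics can be turned into such consistency checks, the sketch below applies the observation-space relations of Desroziers et al. (2005) to a toy scalar assimilation cycle. The function name and the error values are illustrative assumptions, not taken from any operational system.

import numpy as np

def assimilation_consistency(obs, background, analysis, sigma_o, sigma_b):
    """Observation-space consistency check for one assimilation cycle.

    Innovations d_ob = y - H(x_b), residuals d_oa = y - H(x_a).
    For a consistent system, E[d_ob^2] ~ sigma_o^2 + sigma_b^2 and
    E[d_oa * d_ob] ~ sigma_o^2 (Desroziers et al. 2005)."""
    d_ob = obs - background          # innovations
    d_oa = obs - analysis            # residuals
    return {
        "mean_innovation": d_ob.mean(),                    # bias indicator
        "innovation_var": np.mean(d_ob ** 2),              # compare with sigma_o^2 + sigma_b^2
        "expected_innovation_var": sigma_o**2 + sigma_b**2,
        "diagnosed_obs_error_var": np.mean(d_oa * d_ob),   # compare with sigma_o^2
        "prescribed_obs_error_var": sigma_o**2,
    }

# Toy example: scalar observations with known error statistics
rng = np.random.default_rng(1)
truth = rng.normal(0, 1, 2000)
background = truth + rng.normal(0, 0.5, 2000)   # sigma_b = 0.5
obs = truth + rng.normal(0, 0.3, 2000)          # sigma_o = 0.3
k = 0.5**2 / (0.5**2 + 0.3**2)                  # scalar Kalman gain
analysis = background + k * (obs - background)
print(assimilation_consistency(obs, background, analysis, 0.3, 0.5))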
Rigorous skill assessment in the assimilation framework is a difficult task: most available observations are used to adjust models and reduce analysis errors. Thus, independent assessment is only possible by (1) withholding part of the dataset for statistical quantification of errors (a trade-off between a sufficient population size to estimate a statistic and not significantly degrading the quality of the system performance being measured), or (2) using other sources of data that have not been assimilated (Gregg et al. 2009). The latter is generally employed with data not available in near-real time, which is useful for reanalysis (or hindcast) evaluation, but not for routine operational verification.
Longer-term forecast assessment
Most OOCs provide short-term forecasts (from a few days to 1–2 weeks), but some have begun providing longer monthly forecasts, like the Japan Meteorological Agency MOVE/MRI.COM-WNP OOFS. It covers a large part of the Northwest Pacific (15°N–65°N, 117°E–160°W), with a specific zoom (1/10° resolution) over 15°N–50°N, 117°E–160°E (Usui et al. 2006), and uses a multivariate three-dimensional variational (3DVAR) data-assimilation scheme (Fujii & Kamachi 2003). Persistence and 1- to 30-day forecasts are compared against analyses to provide RMSE statistics. A forecast skill metric is used, whereby the ratio of the forecast RMSE to the persistence RMSE is calculated for a given forecast lead time. Using this skill score, the forecast provides useful skill compared with persistence if the ratio is below 1. Results from the MOVE OOFS are shown in Figure 1 for the velocity field at 50 m depth over the period February 2006 to January 2008. Even for 30-day forecasts, the system performs better than persistence in most areas around Japan, and fails only in the vicinity of the Kuroshio Extension area [Figure 1(a–c)]. Moreover, Kuroshio dynamics and predictability are assessed using a specific metric based on the Kuroshio main axis. The Kuroshio axis error is defined as the distance between a forecast and the "true" axis position over the 133–139°E, 30–35°N region, where the axis position is determined using the position of the 15°C isotherm at 200 m depth from the analysis. Figure 1(d) indicates that the dynamical forecasting system is consistently better than persistence at all lead times. This type of metric is also useful for conveying forecast skill to users, as the error is expressed in tangible terms (here a distance in kilometres) rather than as an abstract unit-less skill score or RMSE value.
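A minimal sketch of this forecast-versus-persistence skill ratio is given below, assuming that forecast, persistence and verifying-analysis fields have already been co-located as NumPy arrays; values below 1 indicate that the forecast beats persistence.

import numpy as np

def skill_ratio(forecast, persistence, analysis):
    """Ratio of forecast RMSE to persistence RMSE against a verifying analysis.
    Values < 1 mean the forecast beats persistence at that grid point."""
    rmse_fc = np.sqrt(np.nanmean((forecast - analysis) ** 2, axis=0))
    rmse_pe = np.sqrt(np.nanmean((persistence - analysis) ** 2, axis=0))
    return rmse_fc / rmse_pe

# Toy example: 24 verification times on a 10 x 10 grid
rng = np.random.default_rng(2)
analysis = rng.normal(0, 1, (24, 10, 10))
persistence = analysis + rng.normal(0, 0.8, (24, 10, 10))
forecast = analysis + rng.normal(0, 0.5, (24, 10, 10))
ratio = skill_ratio(forecast, persistence, analysis)
print((ratio < 1).mean())   # fraction of grid points where the forecast has skill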
Figure 1. MOVE/MRI maps of forecast skill, comparing forecast and persistence RMSE statistics against analyses for the velocity field at 50-m depth, for 10-, 20- and 30-day forecast lead times [(a), (b) and (c), respectively]. The forecast beats persistence for values below 1. (d) Performance assessment of the Kuroshio axis position error (in kilometres; see text for definition) for the forecast (solid line) and persistence (dashed line), for 0- to 30-day lead times.

Specific approaches for regional operational systems

Recent improvements have also been made in evaluating specific regional and mesoscale dynamics. For example, the Gulf of Mexico Pilot Prediction Project (GOMEX-PPP, http://abcmgr.tamu.edu/gomexppp/) is investigating OOFS performance for predicting the evolution of the Loop Current in the Gulf of Mexico. The Long Range Ensemble Forecasting System (GOM-LERFS),
developed at the Naval Research Laboratory (Stennis, USA), has been providing 2-month forecasts since January 2013, with the intention of supporting end users impacted by the strong currents associated with the Loop Current and its eddies, and of providing boundary conditions for coastal ocean models. This 3-km resolution OOFS performs weekly 32-member 60-day forecasts. It is initialized using an analysis provided by the Navy Coupled Ocean Data Assimilation (NCODA) scheme, which uses a 5-day assimilation window to ingest satellite altimetry, SST and in situ data obtained from the global telecommunication system. A verification of SSH is performed in which forecasts are compared in real-time with along-track SSH data following the Class 4 metrics approach (i.e. in observation space; discussed below). Then, in model space, the ensemble is used to assess the probabilistic forecast skill. In Figure 2, statistics of weekly comparisons against analyses for the period January–September 2013 show that forecasts remain skilful for approximately twice as long as persistence. The SSH anomaly variance agrees closely between the forecast and the verifying analysis, but the ensemble standard deviation does not appear to predict the forecast error, suggesting that the ensemble spread does not fully capture the forecast error patterns. Adequately sampling the uncertainty in initial conditions, model physics and forcing is an important aspect of ocean ensemble prediction that requires further study.
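The comparison of ensemble spread with the error of the ensemble mean mentioned above can be summarized in a few lines of code. The sketch below is a generic illustration with synthetic data (a deliberately under-dispersive toy ensemble), not the GOM-LERFS verification code.

import numpy as np

def spread_vs_error(ensemble, analysis):
    """Compare ensemble spread with the error of the ensemble mean.

    ensemble: array (n_members, n_points) of forecast values
    analysis: array (n_points,) of verifying analysis values
    For a well-calibrated ensemble the two numbers should be comparable."""
    ens_mean = ensemble.mean(axis=0)
    rmse_mean = np.sqrt(np.mean((ens_mean - analysis) ** 2))
    spread = np.sqrt(np.mean(ensemble.var(axis=0, ddof=1)))
    return rmse_mean, spread

# Toy 32-member ensemble that is under-dispersive on purpose
rng = np.random.default_rng(3)
truth = rng.normal(0, 1, 5000)
ensemble = truth + rng.normal(0, 0.3, (32, 5000)) + rng.normal(0, 0.5, 5000)
rmse, spread = spread_vs_error(ensemble, truth)
print(f"RMSE of ensemble mean: {rmse:.2f}, mean spread: {spread:.2f}")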
Another example of forecast skill assessment is the dynamical feature-based validation approach used for the Experimental System for Predicting Shelf/Slope Optics (ESPreSSO), operated in real-time by Rutgers University over the New Jersey coast Mid-Atlantic Bight (Wilkin & Hunter 2013). This OOFS is based on the 7-km horizontal resolution Regional Ocean Modeling System (ROMS), using boundary conditions from the HYCOM-NCODA global OOFS. The system is initialized using daily analyses from a four-dimensional variational (4DVAR) analysis system with a 3-day analysis window (Moore et al. 2011), which assimilates a large set of data [including glider T/S profiles and CODAR HF-radar measurements (http://www.myroms.org/espresso/)]. The 4DVAR approach allows a better quantification of model errors by assessing the impact of the assimilated data, thereby permitting the correction of large-scale biases. Slope currents and water masses used for real-time applications are evaluated using all available data. A specific off-line verification is performed using independent surface drifters, moored data and SSH. Moreover, a dedicated multi-model real-time assessment has been performed, comparing estimates from ESPreSSO together with three other regional OOFSs and three global OOFSs (including HYCOM-NCODA), in order to evaluate the OOFSs' prediction skill for subtidal currents and shelf water mass changes. This assessment is comprehensively discussed in Wilkin and Hunter (2013), including a presentation of performance improvements through downscaling strategies. In their figures 4 and 5, improvements to Taylor and target diagrams are proposed, to represent individual versus average performance, and seasonal model biases, respectively.

Figure 2. Real-time SSH assessment of the GOM-LERFS Gulf of Mexico OOFS against satellite altimetry data. Comparison with the analysis is restricted to water deeper than 200 m in the subdomain 82°W to 89°W and 22°N to 28°N. (a) Correlation of SSH anomalies for persistence (red) and forecast (black). (b) RMSE of the ensemble mean forecast (red) and ensemble spread standard deviation (red dashed) for SSH. Also shown is the spatial variability, in terms of standard deviation (STD), of the analysis (black) and the 4-week ensemble mean forecast (blue).
Validation of biogeochemical products
In the field of ecosystem modelling and marine-resources management, in situ data for adequate validation of operational products are sparse. Hence, satellite ocean colour (OC) products remain the main source of information for estimates of phytoplankton pigment concentration distribution [i.e. chlorophyll, CHL; Figure 3(a)]. The OC Thematic Assembly Centre (TAC), within the European MyOcean project (www.myocean.eu), has developed specific processing chains to operationally distribute state-of-the-art, quality-checked daily OC observations over both global and regional domains. The need for regional processing comes from the demonstrated inadequacy, at regional scales, of global algorithms to generate reliable products of sufficient accuracy (Volpe et al. 2012). For example, the oligotrophic waters of the Mediterranean Sea were shown to be significantly less blue and more green than the global ocean (Volpe et al. 2007). The OC TAC provides value-added products (generally not distributed by space agencies) such as: (1) daily merged fields from different sensors; (2) Level 4 products without data voids owing to clouds, generated using both Optimal Interpolation and Empirical Orthogonal Function approaches; and (3) products that account for the two broad classes of bio-optical regimes (open ocean and coastal waters). Level 4 and Level 3 (L4 and L3) mentioned here refer to product levels as defined by the Committee on Earth Observation Satellites (www.ceos.org). Typically, L4 products are regular maps of a given parameter, obtained by merging and processing similar measurements from different sources, and using specific estimation methods (optimal interpolation, kriging, etc.). For these observation-based products, both offline and online quality assessments are performed by the OC TAC. The former refers to the comparison of space–time co-located in situ and satellite-derived products for quantities such as spectral remote sensing reflectance, total suspended matter, coloured dissolved organic matter and chlorophyll concentration. In real-time, such data are not sufficiently robust, and validation is limited to a consistency assessment (Hernandez 2011), where the daily climatology for each region is used as a reference to make pixel-based comparisons (online validation). A Quality Index, based on the normalized departures from climatology, is computed from the SeaWiFS sensor [no longer operational; Figure 3(b)]. The overall statistics and distribution are analysed, as illustrated for the Mediterranean Sea CHL [Figure 3(c) and (d)]. In parallel, the monitoring of input data (number, quality, etc.) has increased the reliability of the products.

Figure 3. MODIS Aqua daily Level-3 CHL product, processed via the MedOC3 algorithm, for 30 May 2014 (a), with the corresponding Quality Index: normalized departure from the daily climatology (b). The full online validation statistics time series is given by the shaded plot in panel (c), with time on the x axis and the histogram bins of the Quality Index on the y axis; colours show the percent occurrence with respect to the total number of valid pixels (classified as reliable by the operational processing and not flagged as cloud or other contamination), as described by the time series in panel (d).
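The exact formulation of the OC TAC Quality Index is not reproduced here; a common convention, assumed in the sketch below, is the departure of each valid pixel from the daily climatological mean normalized by the climatological standard deviation, with cloudy or invalid pixels left as gaps.

import numpy as np

def quality_index(chl, clim_mean, clim_std):
    """Pixel-wise quality index for a daily CHL field: departure from the
    daily climatology normalized by its standard deviation (assumed convention)."""
    chl = np.asarray(chl, dtype=float)
    qi = (chl - clim_mean) / clim_std
    return np.where(np.isfinite(chl), qi, np.nan)   # keep cloud/invalid pixels as NaN

# Toy example: log-normal CHL field against a fixed climatology
rng = np.random.default_rng(4)
clim_mean, clim_std = 0.12, 0.04                    # mg m-3, illustrative values
chl = rng.lognormal(mean=np.log(0.12), sigma=0.3, size=(50, 50))
chl[0:5, 0:5] = np.nan                              # simulated cloud mask
qi = quality_index(chl, clim_mean, clim_std)
print(np.nanpercentile(qi, [5, 50, 95]))            # distribution summary, cf. Figure 3(c)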
Many initiatives have led to progress on skill assessment of biogeochemical modelling (Stow et al. 2009). For example, as part of the Ocean Carbon Model Intercomparison Project, univariate metrics were proposed to quantify both physical and biogeochemical parameters of the coupled simulations (Doney et al. 2009). Multivariate metrics (i.e. quantifying the reliability of both the parameters and their relation to observed processes) (Allen & Somerfield 2009) and map-based validation (Rose et al. 2009) are also emerging in this field. However, for real-time assessment of ecosystem-biogeochemical forecasts, most OOFSs can only rely on references given by OC satellite products. Moreover, the dynamics of biogeochemical systems is strongly characterized by the patchiness of its properties generated by the oceanic mesoscale, which causes heterogeneity in concentration fields (Levy & Martin 2013). Consequently, most forecast verifications mimic OC product assessment, by analysing in a similar way, at the pixel level, the model equivalent of CHL and optical satellite data (Lazzari et al. 2012), as described in the previous paragraph.
Validation of sea-ice products
Growing interest in polar regions has driven the need for improved sea-ice verification metrics to demonstrate the capacity and quality of sea-ice forecast skill to potential users. This effort has been hindered by the limited reliability and availability of observational datasets, together with a lack of knowledge of how to adequately account for nonlinearities in the verification metrics. Contingency table-based metrics, introduced in the early twentieth century (Pearson 1904), have been re-popularized, as has the root-mean-square distance of the ice edge. However, these metrics may not be relevant for regional or process-dependent verification. In particular, errors in ice edge location are sensitive to the definition of the ice edge, as multiple ice edges may be present and the total error will be sensitive to the length of the ice edge. Hence, metrics defined for the Arctic might not be suitable in the Baltic Sea, and definitions should consider sub-regional scaling (e.g. the size of a gulf) (Lagemaa 2013). However, even if an ice metric is not perfectly defined, it still provides valuable information for users in dense marine traffic regions such as the Baltic Sea.
An example of a contingency table-based metric from the Canadian Meteorological Centre Global Ice-Ocean Prediction System (GIOPSv1.0) is shown in Figure 4. GIOPSv1.0 uses a 3DVAR ice concentration analysis to correct the Los Alamos sea-ice model, assimilating satellite data together with daily ice charts from the Canadian Ice Service. The reference dataset is given by the Interactive Multisensor Snow and Ice Mapping System (IMS) analyses from the National Oceanic and Atmospheric Administration (NOAA) National Ice Center, which provide binary fields of ice/open water on a 4 km grid. Sea-ice analyses suffer from incomplete coverage of observations, data-reliability issues and the mis-representation of leads. A particular issue is the high sensitivity of passive microwave retrievals to surface melt, often resulting in erroneous values of open water in summer. Contingency table statistics produced using the IMS analyses (applying a threshold of 0.4 to determine binary ice/water values from the GIOPSv1.0 ice concentration forecasts) are used in order to evaluate the proportion of correct ice, or correct water. These contingency scores are computed separately for forecast and persistence fields, and the differences between the forecast and persistence scores are then computed. These metrics are mapped for 2011 in Figure 4, showing skilful 7-day forecasts along most of the ice edge. In other words, 7-day forecasts beat persistence in predicting the correct proportion of sea ice and open water.
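A minimal sketch of these contingency-table scores is given below: the forecast ice concentration is converted to binary ice/water with the 0.4 threshold, the proportions of correct ice and correct water are computed against a binary reference such as IMS, and the same scores for persistence are subtracted so that positive values indicate forecast skill. Function names are illustrative.

import numpy as np

def proportion_correct(forecast_conc, reference_ice, threshold=0.4):
    """Contingency-table 'proportion correct' scores for sea ice.

    forecast_conc : forecast ice concentration field (0-1)
    reference_ice : binary reference field (1 = ice, 0 = open water), e.g. IMS"""
    fc_ice = (np.asarray(forecast_conc) >= threshold).astype(int)
    ref = np.asarray(reference_ice).astype(int)
    hits = np.sum((fc_ice == 1) & (ref == 1))                 # correct ice
    correct_negatives = np.sum((fc_ice == 0) & (ref == 0))    # correct water
    return {
        "proportion_correct_ice": hits / max(np.sum(ref == 1), 1),
        "proportion_correct_water": correct_negatives / max(np.sum(ref == 0), 1),
        "overall_proportion_correct": (hits + correct_negatives) / ref.size,
    }

def forecast_minus_persistence(forecast_conc, persistence_conc, reference_ice):
    """Positive values mean the forecast beats persistence for that score."""
    fc = proportion_correct(forecast_conc, reference_ice)
    pe = proportion_correct(persistence_conc, reference_ice)
    return {k: fc[k] - pe[k] for k in fc}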
Reliability assessment of input information
Another recent aspect of OOFS validation strategy is the systematic feedback of errors and anomalies to the providers of input data. For instance, validation of atmospheric forcing fields is now carried out for some wave-prediction systems (Feng et al. 2006). Moreover, inputs to ocean assimilation systems, such as in situ data collected by TACs or DACs, can suffer in real-time from incomplete levels of quality control. While automatic procedures are applied for the rapid distribution of the observations in real-time, more detailed visual analysis is often left for delayed-time datasets. Other analyses usually depend on the level of expertise of the provider. In near-real time, erroneous in situ profiles can drastically impact the quality of ocean analyses. As a result, systematic quality control has been implemented in many OOFSs to prevent this. At Mercator Océan, two techniques are applied to in situ T/S profiles. First, innovations (guess/forecast minus observation) are tested against a threshold envelope. This envelope is defined using statistics of innovations from an ocean reanalysis and is used to detect anomalous observations (e.g. the blue dots between 500 and 1000 m depth in Figure 5). Second, dynamic heights are computed from the T/S increments, and probability density functions are then constructed for consistent dynamical areas, in order to detect points outside the normal distribution. In some cases, feedback to producers is organized through blacklisting (Cabanes et al. 2013).
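The first of the two checks, the innovation threshold envelope, can be sketched as follows; the envelope half-width (here four standard deviations) and the toy profile are illustrative assumptions, not the Mercator Océan settings.

import numpy as np

def envelope_qc(innovation, clim_mean_innov, clim_std_innov, n_sigma=4.0):
    """Flag suspicious profile levels where the innovation (guess minus observation)
    falls outside an envelope built from reanalysis innovation statistics.
    All inputs are 1-D arrays on the same depth levels."""
    lower = clim_mean_innov - n_sigma * clim_std_innov
    upper = clim_mean_innov + n_sigma * clim_std_innov
    return (innovation < lower) | (innovation > upper)   # boolean mask per level

# Toy temperature profile with a spurious signal between 500 and 1000 m
depth = np.arange(0, 2000, 50.0)
clim_mean = np.zeros_like(depth)
clim_std = 0.5 + 1.5 * np.exp(-depth / 300.0)            # larger spread near the surface
innovation = np.random.default_rng(5).normal(0, 0.3, depth.size)
innovation[(depth >= 500) & (depth <= 1000)] += 6.0      # erroneous observations
print(depth[envelope_qc(innovation, clim_mean, clim_std)])   # depths failing the test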
Development of integrated operational verification systems
Several OOCs have recently taken steps to structure calibration, validation and verification activities, in real-time or delayed mode, as an integrated component of their OOFSs. In the USA, the national backbone of real-time data, tidal predictions, data management and operational modelling supporting NOAA's missions (http://tidesandcurrents.noaa.gov) under the National Operational Coastal Modeling Program now performs quality control and forecast skill verification in a centralized way for all OOFSs, through the Continuous Operational Real-Time Monitoring System.
Similarly, the MyOcean IBI (Irish Biscay Iberian shelves) OOFS team (Puertos del Estado, Spain, and Mercator Océan, France) has developed a comprehensive tool called NARVAL (Numeric Assessment for Regional VALidation) to check its operational performance, in terms of consistency, accuracy and reliability. NARVAL uses available observations, such as satellite-derived Sea Level Anomalies (SLA), SST and SSS (from both L3 and L4 products), in situ T/S profiles, HF-radar surface currents and tide gauge sea level. This tool builds on the MyOcean project structure such that the input data are quality checked by the TACs. NARVAL is modular and extendible to any new data source used as a reference (measurements, climatologies or model estimates). All validation information produced is archived for further evaluation. Additionally, the "On-line Mode Validation" provides an automated quality and consistency assessment, and is routinely performed for each forecast bulletin (from the previous day's hindcast up to 5-day forecasts). It generates Class 1–4 metrics (Hernandez et al. 2009) that provide daily statistics and an evolution of the skill score for each parameter over the past two weeks [Figure 6(a)]. Furthermore, a "Delayed Mode Validation" provides an overall review of the IBI product quality over longer time periods (i.e. monthly, seasonal and annual). Real-time statistics are accumulated to provide a synthesis assessment over longer periods, while dedicated metrics using off-line datasets can focus on particular ocean phenomena or parameters. Metrics are computed over the whole domain (26°N–56°N, 19°W–5°E), but also over specific sub-regions of interest both for users and for verification teams, for example: the Strait of Gibraltar, the English Channel, the Western Mediterranean Sea, the Gulf of Biscay, the Western and Northern Iberian shelves, the Canary Islands area and the Irish Sea.

Figure 4. Contingency analyses of GIOPSv1.0 sea-ice forecasts over 2011 for 7-day forecasts, using IMS data as the reference. Forecast minus persistence scores (skilful/erroneous shown as red positive/blue negative, respectively) for the proportion of correct ice (left) and correct water (right).
Using NARVAL, performance can be monitored for specific areas, dynamics and OOFSs, as illustrated for SST using a Taylor diagram (Figure 7). This type of figure is obviously complex, but it is used by validation teams to monitor, at a glance, several systems' SST scores over various areas. NARVAL has been designed to allow automatic inter-comparison between IBI and the adjacent regional OOFSs within the MyOcean framework for the Mediterranean Sea and the North-West Shelf [Figure 6(b)]. Comparison with adjacent OOFSs aims primarily to maintain consistency in products and user delivery. Comparison with the global OOFS (within which the IBI OOFS is nested) quantifies the added value of the regional shelf system (Figure 7), namely its representation of tides and high-frequency upper-ocean dynamics. There are also comparisons against coastal systems over key areas, such as the SAMPA (Sistema Autónomo de Medición, Predicción y Alerta) system around the Gibraltar Strait (Lorente et al. 2014).
In the MyOcean framework, a similar methodology is implemented for the Baltic Sea by Danish, Estonian, Finnish, German and Swedish OOCs, with a comprehensive validation toolbox designed to cover all available data with various metrics. It provides detailed outputs for expert users and model developers. However, for less experienced users and decision makers, the system provides a more general reliability output. Routines are adapted for mapped, on-track and time-series reference data covering sea level, ice thickness and concentration, T, S, transports, CHL, oxygen, nitrate and phosphate metrics (Lagemaa et al. 2013). Moreover, the five contributing Baltic OOCs have organized a multi-model verification and comparison process, together with a multi-model ensemble estimate assessment. For some parameters, results are regularly posted to the Baltic Operational Oceanographic System server (www.boos.org). Beyond the extended information on forecast scores, the inter-comparison of the different OOFSs adds value to the near-real-time validation routines, in addition to the usual evaluation against observations and climatology. The multi-model standard deviations from the different forecast products (Figure 8) provide valuable information about their uncertainties, which are difficult to assess using regular model-reference approaches owing to the sparse coverage of observations. These figures are available daily at www.boos.org. Interestingly, nine OOFSs are assembled to show the reliability of surface current forecasts, using vector diagrams [Figure 8(c)]. This strategy has also been adopted in MyOcean by the North West European Shelf Operational Oceanographic System, covering the North Sea and English Channel regions, presenting a multi-model assessment from Belgium, Denmark, Germany, Norway, Sweden and UK OOCs (www.noos.cc).

Figure 5. Detection of in situ profile anomalies before assimilation in the GLORYS2V1 Mercator Océan reanalysis. Left: innovation (blue dots) and threshold envelope (red). Right: temperature-profile observations (blue dots), model forecast for the corresponding observations (red) and climatology (green). In this case, the cluster of blue dots in the depth range 500–1000 m has not passed the test. Top: location of the profile in the equatorial Pacific.

Figure 6. (a) Summary of comparisons computed by NARVAL (on a daily, monthly, quarterly and yearly basis) to make an on-line and delayed-mode validation of the Irish Bay of Biscay Iberian shelves (IBI) products. Abbreviations: IBI BE, IBI best estimates; IBI FCs, IBI forecast products at different forecast horizons; CLIM, climatologies; ATM, atmospheric fields used as IBI forcing. (b) NARVAL Delayed-Mode web pages: SST comparison of IBI fields with OOFSs from adjacent areas and the global forecasting system.

Figure 7. SST Class 4 metrics against the MyOcean super-collated SST_EUR_SST_L3S_NRT_OBSERVATIONS_010_009_a product, for the IBI free run (cyan), the IBI operational system (blue), and the old (yellow) and new (red) versions of the global system, for daily hindcast fields, in the different domains defined in the text. The y-axis scale also corresponds to the normalized standard deviation. The 0.2 isocontours for RMSD (inner dashed circles) are associated with the normalized standard deviation.

Figure 8. (a) Example of the multi-model standard deviation for SSS. (b) Multi-model mean total water transports in Sverdrups from 4 or 5 OOFSs (blue/black arrows; numbers correspond to sections) for 13 June 2014; green transects indicate that the multi-model standard deviation of the transport is lower than the mean value, and yellow transects that it lies between 1 and 3 times the mean value. (c) Progressive vector diagram of model sea surface currents from nine Baltic Sea forecast products, with all trajectories starting from the same location in the Gulf of Finland. Similar figures are available and discussed at www.boos.org. Inga Golbeck, Xin Li and Frank Janssen (German Maritime and Hydrographic Agency), pers. comm.
Reducing uncertainties by ensemble approaches: multi-model estimation of ocean surface parameters
Major incidents, such as the AF447 Air France Rio-Paris airplane crash in June 2009, the Deepwater Horizon oil platform accident in April 2010, the Fukushima nuclear power plant catastrophe in March 2011 and the search for the missing MH370 Malaysia Airlines flight in March 2014 in the Indian Ocean, have highlighted operational oceanography's capacity to provide relevant information for decision makers (Masumoto et al. 2012; Kawamura et al. 2011). For all these events, national authorities made requests to their respective OOCs to provide assistance in near-real time or offline, and to carry out dedicated studies to complement risk assessment.
Ocean studies performed in support of the effort to find the Air France plane wreckage relied on several new aspects: (1) an international effort to collect forecasts from different OOCs and to provide different ocean datasets to assist rescue activities in real-time; (2) the use of multi-model datasets and ensemble approaches to reduce errors in ocean surface dynamics in hindcasts and forecasts, with the implementation of dedicated high-resolution model simulations in the area, nested into global OOFSs; (3) a retrospective statistical analysis of the accuracy of ocean currents and, in particular, the reliability of mixing and transport properties; and (4) the formation of an international task team, with contributions from many ocean experts from both the in situ and modelling communities (Scott et al. 2012; Drévillon et al. 2013). Performance gains were also made during the search for MH370 through the use of ensemble mean products that improved the representation of buoy trajectories.
At the Australian Bureau of Meteorology, deterministic forecast errors of the OceanMAPS OOFS are assessed and reduced by implementing a time-lagged ensemble forecast, also called a multicycle ensemble (Brassington 2013). Over four successive days, forecasts are performed each day, starting from background fields that are independent from each other. Weighted ensemble averages are then computed, and forecast errors are assessed using spectral methods that quantify the impact of ensemble averaging as a function of wavenumber. For instance, for SST, Figure 9 demonstrates the increase in power for random information (Brassington 2013). By comparing the power spectra at different forecast periods, the growth in random error relative to wavelength is also captured.
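A hedged sketch of such a weighted time-lagged ensemble average is given below; the per-cycle weights and the toy SST fields are illustrative, not the OceanMAPS configuration.

import numpy as np

def lagged_ensemble_mean(forecasts, weights=None):
    """Weighted time-lagged ('multicycle') ensemble mean.

    forecasts: array (n_cycles, ny, nx) of forecasts valid at the same time but
    issued on successive days (lag 0, -1, -2, -3 days).
    weights: per-cycle weights, e.g. favouring the most recent cycle;
    the default values here are purely illustrative."""
    forecasts = np.asarray(forecasts, dtype=float)
    if weights is None:
        weights = np.array([0.4, 0.3, 0.2, 0.1])[: forecasts.shape[0]]
    weights = weights / weights.sum()
    return np.tensordot(weights, forecasts, axes=1)

# Four daily cycles of a toy SST field valid at the same time
rng = np.random.default_rng(6)
truth = rng.normal(20, 1, (50, 50))
cycles = truth + rng.normal(0, [[[0.3]], [[0.4]], [[0.5]], [[0.6]]], (4, 50, 50))
ens_mean = lagged_ensemble_mean(cycles)
print(np.sqrt(np.mean((cycles[0] - truth) ** 2)),     # error of the latest cycle alone
      np.sqrt(np.mean((ens_mean - truth) ** 2)))      # error of the weighted lagged mean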
For marine pollution in the Northern Aegean Sea, studies based on a 48-h oil-spill dispersion forecast have been performed recently. The system is based on atmospheric, wave and ocean circulation models coupled with the operational systems, using the Aegean-Levantine Eddy Resolving Model (nested in the MyOcean OOFS), the SKIRON system of the University of Athens and oil-spill dispersion models (http://diavlos.oc.phys.uoa.gr). A Lagrangian-based verification east of Limnos Island (Northeastern Aegean Sea) was conducted during October 2012, where 25 drifting buoys and special oil-spill drifting instruments were compared with drift predictions. The area was characterized by a very strong front, and in many cases a small error in the prediction of the frontal line resulted in very large errors in the oil-spill prediction (Figure 10, left). This experiment shows that forecasts beat persistence over the first 20 h (Figure 10, right). Moreover, in these areas of varying dynamical features (fronts, eddies), forecast errors grow significantly, emphasizing the need for more advanced prediction systems such as ensemble forecasts. Ensemble approaches are now considered at regional scale, as in the Ligurian Sea, where a multi-model strategy is tested against an ensemble prediction system, showing the respective merits of each approach (Mourre & Chiggiato 2014). Additionally, ensemble approaches proposed by the operational SST community at global scales have been shown to provide promising results, where the ensemble is usually more reliable than individual estimates (Martin et al. 2012; Dash et al. 2012; Xie et al. 2008).

Figure 9. (a) Power spectrum of the SST anomaly in the Tasman Sea for zonal sections (38–32°S), temporally averaged from 1 March to 31 August 2012, from the Australian OceanMAPS OOFS. The black (red) lines represent the 0-lag latest forecast and the ensemble mean, respectively. The periodograms are shown for forecast hours -096 (4 days before, solid), -048 (2 days before, dashed), 000 (dash-dot) and 048 (2-day forecast, dotted). (b) Difference in power between the 0-lag latest forecast and the weighted ensemble mean for forecast hours -096 (solid), -048 (dashed), 000 (dash-dot) and 048 (dotted).

Figure 10. Left: experimental area, east of Limnos Island; surface velocities given by the model (blue arrows). Simulated oil spills (cyan and black), the centres of mass of the oil spills (red x) and drifter tracks (green x) are plotted. Right: time evolution of the oil-spill forecasting error (in kilometres, blue bars), derived as the distance between the drifter location and the centre of mass of the predicted oil spill, compared with the persistence of the centre of mass (also as the distance from the drifters in km, red bars).
Several OOFSs involved in GOV activities contributed to the rescue actions carried out for the dramatic events mentioned above. The IV-TT proposed to strengthen this multi-model approach by organizing the real-time provision of operational hindcasts and forecasts among several GOV OOFSs. Recent experience has shown that (1) surface ocean parameters were the most needed products, and (2) higher resolution improved the estimation (e.g. for drift, dispersion, mixing, sinking, etc.). As a result, since 2013, four global OOCs (the US Navy with HYCOM-NCODA, the UK Met Office with the Forecast Ocean Assimilation Model (FOAM), NOAA/NCEP with the Real-Time Ocean Forecast System (RTOFS) and the Climate Forecast System (CFS), and Mercator Océan with PSY3) have been providing a daily rolling archive of model native-grid fields of best estimates and forecasts of T, S and currents at the surface. From these nowcasts/forecasts, a first multi-model/ensemble assessment has been made, focusing on SST, together with two observation-only datasets chosen as reference: the NCEP Real-Time Global (RTG) analysis (Thiébaux et al. 2003) and the GHRSST NAVO K10 Level-4 dataset (Martin et al. 2012). Three ensemble computations are defined using daily OOFS outputs: (1) a simple arithmetic mean; (2) a weighted average, based on the root-mean-square (RMS) daily differences of each member with respect to the reference SST field; and (3) a clustered average, based on a k-means algorithm (Hartigan & Wong 1979). A first hindcast comparison has now been performed for July 2013 against NCEP RTG (Figure 11). In this evaluation, Mercator_PSY3, FOAM and CFS perform better than the two other OOFSs. Note also that the FOAM biases are slightly different when using the NAVO K10 product (not shown). This highlights the sensitivity to uncertainty in the observational dataset used as truth. Above all, Figure 11 shows that the use of an ensemble results in an improvement over each of the members, with the k-means clustered average performing the best of the three ensemble methods. Similar ensemble scores are obtained against the NAVO K10 SST (not shown). RMS scores seem dependent on the number of clusters: preliminary tests (not shown) using one to 10 clusters indicate significant improvements. This assessment is ongoing, with further analysis of the ensemble mean computation for forecasts and for other ocean parameters. One of the key aspects of this community effort is the real-time provision of these OOFS outputs.

Figure 11. Global SST hindcast comparison statistics against RTG SST, computed daily, for July 2013. RMS (top) and bias (bottom) time series for HYCOM-NCODA (blue), RTOFS (green), FOAM (red), Mercator_PSY3 (cyan), CFS (purple) and the ensemble averages: simple (dashed yellow), weighted (dashed black) and clustered (dashed blue). Units in kelvin.
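The three ensemble estimates described above (simple mean, RMS-weighted mean and k-means clustered mean) can be sketched as follows. The inverse-RMS weighting and the per-pixel k-means retaining the largest cluster are illustrative interpretations of the brief description given here, not the IV-TT implementation.

import numpy as np

def ensemble_sst_estimates(members, reference, n_clusters=3, n_iter=20, seed=0):
    """Three multi-model SST estimates from co-located member fields.

    members: array (n_members, n_points); reference: array (n_points,).
    Returns the simple mean, an RMS-weighted mean and a k-means clustered mean."""
    members = np.asarray(members, dtype=float)
    n_clusters = min(n_clusters, members.shape[0])

    simple = members.mean(axis=0)

    # Weight each member by the inverse of its RMS difference to the reference field
    rms = np.sqrt(np.mean((members - reference) ** 2, axis=1))
    w = (1.0 / rms) / np.sum(1.0 / rms)
    weighted = np.tensordot(w, members, axes=1)

    # Per-pixel 1-D k-means on the member values, then average the largest cluster
    rng = np.random.default_rng(seed)
    clustered = np.empty(members.shape[1])
    for j in range(members.shape[1]):
        vals = members[:, j]
        centres = rng.choice(vals, n_clusters, replace=False)
        for _ in range(n_iter):
            labels = np.argmin(np.abs(vals[:, None] - centres[None, :]), axis=1)
            centres = np.array([vals[labels == k].mean() if np.any(labels == k)
                                else centres[k] for k in range(n_clusters)])
        biggest = np.bincount(labels, minlength=n_clusters).argmax()
        clustered[j] = vals[labels == biggest].mean()

    return simple, weighted, clustered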
Forecast skill: inter-comparison of ocean parameters against observations using Class 4 metrics
The Class 4 metrics approach, developed during the EU MERSEA Strand 1 project (Crosnier & Le Provost 2007) and improved during the EU MERSEA Integrated Project, was adopted at the international level by the GODAE community (Hernandez et al. 2009). This approach is based on comparison with reference measurements, from space or in situ, to assess the OOFS forecasting skill. Reference data, providing the ocean truth, are used in a similar way to infer the accuracy of both the best estimates/analyses and the forecasts at different lead times. Additionally, to evaluate the added value provided by the model and the OOFS's short-term prediction efficiency, tests of the skill with respect to climatological fields and persistence are made.
Class 4 metrics performed in near-real-time are limited by the availability and quality of observations, with several important consequences. First, owing to the scarcity of ocean measurements, the real-time assessment relies on observations that are also used by the assimilation system. These observations can be considered approximately independent for forecast assessment (neglecting the autocorrelation of observation error in time), although this assumption weakens when considering short time-scale ocean transient features. A second consequence is the larger error budget for real-time observations, which are not fully cross-calibrated, verified and corrected as they are in delayed mode (Cabanes et al. 2013; Le Borgne et al. 2012). A third consequence relates to the overall quality of the reference product in near-real time: if biases exist between these products and information of the same kind that is assimilated, then validation scores can be impacted or wrong.
Assessment using Class 4 metrics can be distinguished from assimilation diagnostics in several ways. By assimilation diagnostics, we refer to statistics in observation space derived from the "background minus observation" (i.e. the so-called misfits); from the "increments" (i.e. the correction applied to the background to obtain the analysis); or from the "analysis residual" (i.e. analysis minus observations). For most assimilation systems, there is a pre-processing of Global DAC (GDAC) data through editing, filtering or thinning, in order to limit the number of assimilated observations. This is done (1) because of a limit in the maximum number of observations that can be used by the assimilation scheme (computational requirements), (2) because some observations are considered a priori redundant (i.e. thinning provides a means to avoid having to consider correlated observation errors) or (3) because the observations include features, dynamical processes and scales not represented by the forecasting system. "Super-obing" can also be used in this assimilation pre-processing. In any case, the associated assimilation statistics and metrics often result in a net reduction in the number of observations considered. This is not the purpose of the Class 4 metrics: ideally, all "good" data from the GDAC can be compared with the OOFS fields, thereby measuring the accuracy, the forecasting skill and the scales not represented by the OOFS. Thus, this approach is not OOFS dependent (i.e. it does not depend on the way observations are assimilated, the ocean model gridding, etc.). The same observation can be used in the evaluation of several OOFSs. That is, provided the reference data are independent, exact inter-comparison is possible for a specific ocean process or parameter.
The "Class 4" strategy is now in place for several OOFSs, for global (Lellouche et al. 2013; Blockley et al. 2012) or regional assessment (Maraldi et al. 2013). In the framework of the GOV IV-TT, the Class 4 metrics project aims to stimulate the inter-comparison of OOFSs by verifying different aspects of the ocean processes captured by the available observations in real time. A near-real-time inter-comparison activity has been ongoing since January 2013. Five OOCs are involved, and six OOFSs are compared, looking at SST, T/S at depth and SSH, with sea-ice concentration in preparation. A companion paper (Ryan et al. 2015) presents the global inter-comparison performed over basin-scale areas. This exercise also allows each of the partners to assess more carefully the forecast capability of every OOFS in the region of interest and to measure the efficiency of each system. The Australian group has performed this multi-system assessment regionally, as presented in a second companion paper (Divakaran et al. 2015).
Based on the statistics of the comparison with the same observations, this Class 4 assessment allows the following questions to be addressed:

- What is the relative reliability of each system for a given parameter in near-real-time?
- What is the performance of each system in forecast mode (5 days ahead)?
- What is the added value of the system compared with climatology or persistence?
- What benefits could be obtained through an ensemble approach, compared with each individual system?
As part of the Class 4 intercomparison, interesting new metrics and ways of representing the information graphically have been proposed to better synthesize the information. For instance, radar charts provide the score of each system at different forecast lead times, for all parameters evaluated (Figure 12). Note that the terminology "hindcast" is used here for analysis, nowcast, hindcast or best estimate. Owing to the details of their real-time operational assimilation schemes, every centre provides what it considers to be its "best field" in near-real-time with minimum delay. Scores are defined by the RMSE, based on differences between observation and model values for each parameter, normalized by the largest RMSE. The reference observations are fully described in the companion paper (Ryan et al. 2015). Using this approach, one can characterize the relative score of each OOFS for each parameter. Missing parameters are not problematic: SLA is not evaluated for RTOFS (Ryan et al. 2015), yet the radar chart can still be used for plotting scores from the other four systems. Moreover, this approach allows us to assess specific features, such as resolution, by comparing the global eddy-permitting (PSY3, 1/4°) and eddy-resolving (PSY4, 1/12°) Mercator Océan OOFSs. The radar charts indicate that the PSY3 skill scores are always better than the PSY4 scores, even for SLA. It is worth noting that PSY3 and PSY4 are run in parallel every day at Mercator Océan (Lellouche et al. 2013), using the same forcing fields and assimilating the same set of observations. In this case, one may ask whether the observations used to derive the Class 4 metrics are capable of assessing the eddy-resolving capability of the PSY4 system, and whether this Class 4 metric is able to infer the mesoscale predictive capabilities of these global high-resolution systems. Here, the SLA assessment could be performed using along-track satellite observations filtered differently, in order to capture more mesoscale features. Similarly, model SST could be compared with the highest resolution and most reliable SST products provided in near-real-time by DACs.
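A minimal sketch of the radar-chart normalization described above: each system's RMSE for a parameter is divided by the largest RMSE among the systems that provide that parameter, so scores lie between 0 and 1 (lower is better) and missing parameters are simply skipped. The example values are invented for illustration.

import numpy as np

def normalized_radar_scores(rmse_by_system):
    """Normalize each system's RMSE by the largest RMSE among systems, per parameter.

    rmse_by_system: dict mapping system name -> dict of {parameter: RMSE}."""
    systems = list(rmse_by_system)
    params = sorted({p for s in systems for p in rmse_by_system[s]})
    scores = {s: {} for s in systems}
    for p in params:
        vals = [rmse_by_system[s][p] for s in systems if p in rmse_by_system[s]]
        worst = max(vals)
        for s in systems:
            if p in rmse_by_system[s]:          # missing parameters are skipped
                scores[s][p] = rmse_by_system[s][p] / worst
    return scores

# Illustrative RMSE values (not taken from the paper)
rmse = {
    "SYS_A": {"SST": 0.45, "TEMP": 0.80, "SAL": 0.15, "SLA": 0.09},
    "SYS_B": {"SST": 0.50, "TEMP": 0.70, "SAL": 0.18},   # SLA not evaluated
}
print(normalized_radar_scores(rmse))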
Figure 12. Class 4 global assessments over 2013. Radar charts for the hindcast (HDCST) and 5-day forecasts (FRCST_5D). Four parameter evaluations are displayed: 5–100 m depth temperature (TEMP) and salinity (SAL), then SST and SLA. Each score (between 0 and 1) is normalized by the largest RMSE value among the five evaluated OOFSs.
Class 4 metrics over a given period and space domain
are also presented through lead-time summary plots (see
all gures of Ryan et al. 2015, and gure 7 from Divakaran
et al. 2015), or Taylor diagrams. Note that for the Austra-
lian regional seas assessment (Divakaran et al. 2015), this
diagram (gure 8 from Divakaran et al. 2015) also contains
shaded values of the skill score, as dened by Taylor
(2001). This score merges the accuracy (RMSE) and the
pattern (correlation) evaluation of the parameter variability.
Note also that this assessment of vertical parameters (here
T, S) is presented separately for biases (consistency assess-
ment), RMSE (quality, or accuracy assessment) and
anomaly correlation (pattern of the variability). This
allows OOCs to measure at which depth, and for which
water masses, the OOFS is reliable. At this stage, Class 4
metrics are univariate, but alternatively, these metrics can be used in more 'ocean-oriented' figures such as TS diagrams (Figure 13). For this TS diagram, only hindcasts
(i.e. not forecasts, persistence or climatology) are plotted
together with the observed values, for the sake of clarity.
Figure 13 shows that both the PSY3 and PSY4 systems misrepresent the densest water masses at depth, while the rest of the water column is qualitatively well represented over this 3-month period.
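To make the combination of accuracy and pattern agreement concrete, the sketch below implements one of the skill-score formulations proposed by Taylor (2001), under the assumption that model and observation values have already been collocated; the sample arrays are purely illustrative.

```python
# Minimal sketch (assumes collocated model/observation arrays): one variant of
# the Taylor (2001) skill score, which merges accuracy (via the normalized
# standard deviation) and pattern agreement (via the correlation).
import numpy as np

def taylor_skill(model, obs, r0=1.0):
    """Skill in [0, 1]: reaches 1 only when the correlation equals r0 (the
    maximum attainable correlation) and the model reproduces the observed variance."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(model, obs)[0, 1]            # pattern (anomaly) correlation
    sigma_hat = model.std() / obs.std()          # normalized standard deviation
    return 4.0 * (1.0 + r) / ((sigma_hat + 1.0 / sigma_hat) ** 2 * (1.0 + r0))

# Hypothetical 5-day temperature forecast collocated with Argo observations
obs = np.array([10.1, 11.3, 9.8, 12.0, 10.7])
forecast = np.array([10.3, 11.0, 9.5, 12.4, 10.9])
print(round(taylor_skill(forecast, obs), 3))
```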
Inter-comparison of several OOFSs using Class 4
metrics also allows OOCs to address the added value that
using an ensemble approach might bring. Ryan et al.
(2015) show that the ensemble mean outperforms individ-
ual OOFS scores in most cases. Interestingly, in their
Figure 6, they propose a synthetic global view of the
most reliable OOFS for the four parameters tested. Other new approaches mentioned above (for the IBI OOFS) involve inter-comparing the same diagnostic for several OOFSs in order to show the added value of regional versus global systems, free-running simulations versus assimilative runs, or old versus new system estimates (Figure 7).
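The sketch below illustrates, with synthetic collocations and generic system names, why an ensemble mean with weakly correlated individual errors tends to outperform each contributing system in the Class 4 RMSE sense; it is not the operational inter-comparison code.

```python
# Minimal sketch (synthetic data, fictitious system names): compare the RMSE
# of a simple multi-model ensemble mean, computed in observation space, with
# the RMSE of each contributing system.
import numpy as np

rng = np.random.default_rng(seed=0)
obs = rng.normal(20.0, 2.0, size=500)                        # pseudo SST observations
systems = {name: obs + rng.normal(0.0, 0.6, size=obs.size)   # independent errors per system
           for name in ("SYS_A", "SYS_B", "SYS_C")}

def rmse(model, reference):
    return float(np.sqrt(np.mean((model - reference) ** 2)))

for name, values in systems.items():
    print(name, round(rmse(values, obs), 3))                 # individual system errors

ensemble_mean = np.mean(list(systems.values()), axis=0)
print("ENSEMBLE MEAN", round(rmse(ensemble_mean, obs), 3))   # typically the smallest RMSE
```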
Summary
Significant progress has been made in ocean-model skill assessment during the last 5–10 years. Under the constraints
of real-time operation, many forecasting centres have
implemented more mature validation and performance-
assessment procedures. The most advanced examples are
operationally integrated, modular and able to use any avail-
able reference dataset. Based on a large number of metrics,
they permit a diverse validation strategy: (1) comparing
old and new systems to measure potential improvements
and degradations; (2) comparing coarse-resolution 'father' and nested high-resolution 'son' systems to quantify the
added value of downscaling; (3) comparing adjacent or over-
lapping systems to verify the consistency of adjacent fore-
casts; (4) multi-model comparison to better characterize
model error growth using different systems running in paral-
lel; and (5) ensemble approaches to assess the benefit of
ensemble versus individual system estimates.
Real-time assessments suffer from limitations imposed by observation availability and quality, as many high-quality reference datasets can only be used off-line, meaning that the routine monitoring of skill is less efficient. To avoid spurious effects from erroneous real-time data (for assimilation or validation), quality checking and control of the input information (observations, forcing fields) is performed by most OOFSs. Moreover, the systematic feedback of quality-control information and observation 'blacklists' to providers is starting to be integrated into OOFSs.
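A schematic example of this kind of pre-screening is sketched below; the thresholds, platform identifiers and data layout are hypothetical and much simpler than the quality-control chains actually run by the centres.

```python
# Schematic illustration only (thresholds and platform identifiers are
# hypothetical): a gross-range check on incoming temperature profiles plus
# filtering of platforms on an observation blacklist, with a feedback report
# of rejections that could be returned to the data provider.
BLACKLIST = {"WMO_4901234"}          # hypothetical platform identifiers to reject
VALID_RANGE = (-2.5, 40.0)           # plausible ocean temperature bounds (deg C)

def quality_control(profiles):
    """profiles: list of dicts with 'platform' and 'temperature' (list of values).
    Returns the accepted profiles and a feedback report of rejections."""
    accepted, report = [], []
    for profile in profiles:
        if profile["platform"] in BLACKLIST:
            report.append((profile["platform"], "blacklisted"))
            continue
        if all(VALID_RANGE[0] <= t <= VALID_RANGE[1] for t in profile["temperature"]):
            accepted.append(profile)
        else:
            report.append((profile["platform"], "gross-range check failed"))
    return accepted, report

good, feedback = quality_control([
    {"platform": "WMO_6901988", "temperature": [18.2, 15.1, 9.7]},
    {"platform": "WMO_4901234", "temperature": [17.9, 14.8, 9.5]},
    {"platform": "WMO_5903377", "temperature": [56.0, 14.2, 9.1]},
])
print(len(good), feedback)
```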
Figure 13. TS diagram using Class 4 metrics applied to the PSY3 (blue) and PSY4 (green) Mercator Océan OOFSs. Hindcast fields for the January–March 2014 period over the Kuroshio Extension area. The observed values of temperature and salinity are plotted in red.
More complex metrics that are better suited to assessing
physical, ecosystem and biogeochemical forecast processes
are being progressively adopted in operational centres.
Multivariate metrics now complement univariate techniques, extending parameter-oriented assessments towards full ocean-process evaluations. In parallel, summary diagrams such as Taylor diagrams, target diagrams and radar charts are used to provide a more comprehensive quantification of model skill. Additional user-oriented metrics are also
being developed, complementing the basic assessment of
OOFSs with more detailed information about skill for
specific applications.
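For reference, the sketch below computes the coordinates commonly plotted in a target diagram (following the convention of Jolliff et al. 2009), assuming collocated model and observation arrays; the numbers are illustrative only.

```python
# Minimal sketch of target-diagram coordinates: bias on the y-axis and the
# unbiased RMSD on the x-axis, signed by whether the model variability exceeds
# the observed variability; both are normalized by the observed standard deviation.
import numpy as np

def target_coordinates(model, obs):
    """Return (signed normalized unbiased RMSD, normalized bias) for one dataset."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    sigma_obs = obs.std()
    bias = (model.mean() - obs.mean()) / sigma_obs
    urmsd = np.sqrt(np.mean(((model - model.mean()) - (obs - obs.mean())) ** 2))
    signed_urmsd = np.sign(model.std() - obs.std()) * urmsd / sigma_obs
    # Distance from the origin is the total normalized RMSD
    # (square root of bias**2 + unbiased RMSD**2).
    return signed_urmsd, bias

x, y = target_coordinates([14.9, 16.2, 17.8, 15.5], [15.0, 16.0, 18.1, 15.3])
print(round(float(x), 3), round(float(y), 3))
```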
Operational ocean forecasting systems are evolving
toward higher horizontal resolution and eddy-resolving
capability, and offer finer mesoscale representation. For instance, AVISO SSH or Reynolds SST L4 mapped products offer 50–100 km resolution. Hence, these products are no longer suitable for evaluating 5-km-resolution global eddy-permitting OOFSs. For regional and coastal OOFSs provid-
ing sub-mesoscale description, this issue is even more
crucial. Their evaluation using the existing observing
system presents new issues: are the metrics currently used
reliable, and do they provide pertinent information?
The L4 observation-based products provided by oper-
ational DACs and their evaluation also have to be considered
carefully. First, these products can be used directly by the
scientific community or other users instead of model-based
products. Second, many OOFS validation procedures rely
on these products and can be deficient if those products are erroneous.
Finally, multi-model inter-comparison and ensemble
approaches offer several potential benefits. For example, forecast spread can be used for forecast-error evaluation and is particularly effective if individual model errors are not correlated (e.g. for models using different forcing). In many studies, ensemble estimates are seen to benefit from the qualities of each individual OOFS and to reduce errors.
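The spread-error argument can be illustrated with a toy experiment: under the idealized assumption that the truth is statistically indistinguishable from an ensemble member, the average ensemble spread and the RMSE of the ensemble mean should be of comparable magnitude. The sketch below uses synthetic data only.

```python
# Minimal sketch (synthetic data): spread-error consistency check for an
# idealized multi-model ensemble with weakly correlated individual errors.
import numpy as np

rng = np.random.default_rng(seed=1)
n_cases, n_members, sigma = 1000, 5, 0.5
signal = rng.normal(0.0, 1.0, size=n_cases)                       # shared predictable part
truth = signal + rng.normal(0.0, sigma, size=n_cases)             # truth drawn like a member
members = signal + rng.normal(0.0, sigma, size=(n_members, n_cases))

ensemble_mean = members.mean(axis=0)
mean_spread = members.std(axis=0, ddof=1).mean()                  # average ensemble spread
rmse = np.sqrt(np.mean((ensemble_mean - truth) ** 2))
# For a statistically consistent ensemble, rmse ~ mean_spread * sqrt(1 + 1/n_members).
print(round(float(mean_spread), 3), round(float(rmse), 3))
```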
With the initiatives carried out by the GOV IV-TT, operational oceanography is following a strategic path similar to that taken by the weather-forecast community 30 years ago, the goal being to routinely exchange information among OOFSs in a multi-model framework and to enhance both system predictability and skill assessment, for the eventual benefit of OOFS users.
Acknowledgements
During the final preparation of this article, our co-author Nicolas
Ferry passed away. He was an active member of GOV and
MyOcean, and was a significant innovator in the validation
activity at Mercator Océan, mentioned in the present work. We
will greatly miss the scientific expert, the gentle colleague and
the good friend.
We thank the reviewers for their helpful comments and assist-
ance in improving the clarity of this synthesis article. This work
was supported by the European Commission funded projects
MyOcean (FP7-SPACE-2007-1) and MyOcean2 (FP7-SPACE-
2011-1).
Disclosure statement
No potential conflict of interest was reported by the authors.
References
Allen JI, Somerfield PJ. 2009. A multivariate approach to model skill assessment. J Marine Syst. 76(1–2):83–94. doi:10.1016/j.jmarsys.2008.05.009
Balmaseda MA, Hernandez F, Storto A, Palmer MD, Alves O, Shi L, Smith GC, Toyoda T, Valdivieso M, Barnier B, Behringer D, Boyer T, Chang Y-S, Chepurin GA, Ferry N, Forget G, Fujii Y, Good S, Guinehut S, Haines K, Ishikawa Y, Keeley S, Köhl A, Lee T, Martin MJ, Masina S, Masuda S, Meyssignac B, Mogensen K, Parent L, Peterson KA, Tang YM, Yin Y, Vernieres G, Wang X, Waters J, Wedd R, Wang O, Xue Y, Chevallier M, Lemieux J-F, Dupont F, Kuragano T, Kamachi M, Awaji T, Caltabiano A, Wilmer-Becker K, Gaillard F. 2014. The Ocean Reanalyses Intercomparison Project (ORA-IP). J Oper Oceanogr. 8(S1):s80–s97.
Balmaseda MA, Mogensen K, Weaver AT. 2013. Evaluation of the ECMWF ocean reanalysis system ORAS4. Q J Roy Meteor Soc. 139(674):1132–1161. doi:10.1002/qj.2063
Bell MJ, Lefebvre M, Le Traon P-Y, Smith N, Wilmer-Becker K. 2009. GODAE: the Global Ocean Data Assimilation Experiment. Oceanogr Magazine. 22(3):14–21. doi:10.5670/oceanog.2009.62
Benestad RE, Senan R, Balmaseda MA, Ferranti L, Orsolini Y, Melsom A. 2011. Sensitivity of summer 2-m temperature to sea ice conditions. Tellus A. 63(2):324–337. doi:10.1111/j.1600-0870.2010.00488.x
Blockley EW, Martin MJ, Hyder P. 2012. Validation of FOAM near-surface ocean current forecasts using Lagrangian drifting buoys. Ocean Sci. 8(4):551–565. doi:10.5194/os-8-551-2012
Blockley EW, Martin MJ, McLaren AJ, Ryan AG, Waters J, Lea DJ, Mirouze I, Peterson KA, Sellar A, Storkey D. 2014. Recent development of the Met Office operational ocean forecasting system: an overview and assessment of the new global FOAM forecasts. Geosci Model Dev. 7(6):2613–2638. doi:10.5194/gmd-7-2613-2014
Brassington GB. 2013. Multicycle ensemble forecasting of sea surface temperature. Geophys Res Lett. 40(23). doi:10.1002/2013GL057752
Cabanes C, Grouazel A, von Schuckmann K, Hamon M, Turpin V, Coatanoan C, Paris F, Guinehut S, Boone C, Ferry N, de Boyer Montégut C, Carval T, Reverdin G, Pouliquen S, Le Traon P-Y. 2013. The CORA dataset: validation and diagnostics of in-situ ocean temperature and salinity measurements. Ocean Sci. 9(1):1–18. doi:10.5194/os-9-1-2013
Crosnier L, Le Provost C. 2007. Inter-comparing five forecast operational systems in the North Atlantic and Mediterranean basins: the MERSEA-strand1 methodology. J Marine Syst. 65(1–4):354–375. doi:10.1016/j.jmarsys.2005.01.003
Dash P, Ignatov A, Martin M, Donlon CJ, Brasnett B, Reynolds RW, Banzon V, Beggs H, Cayula J-F, Chao Y, Grumbine R, Maturi E, Harris A, Mittaz J, Sapper J, Chin TM, Vazquez-Cuervo J, Armstrong EM, Gentemann CL, Cummings JA, Piollé J-F, Autret E, Roberts-Jones J, Ishizaki S, Høyer JL, Poulter D. 2012. Group for High Resolution Sea Surface Temperature (GHRSST) analysis fields inter-comparisons. Part 2: near real time web-based level 4 SST quality monitor (L4-SQUAM). Deep Sea Res Part II. 77–80:31–43. doi:10.1016/j.dsr2.2012.04.002
Desroziers G, Berre L, Chapnik B, Poli P. 2005. Diagnosis of observation, background and analysis-error statistics in observation space. Q J Roy Meteor Soc. 131(613):3385–3396. doi:10.1256/qj.05.108
Desroziers G, Ivanov S. 2001. Diagnosis and adaptive tuning of observation-error parameters in a variational assimilation. Q J Roy Meteor Soc. 127(574):1433–1452. doi:10.1002/qj.49712757417
Divakaran P, Brassington GB, Ryan AG, Regnier C, Spindler T, Mehra A, Hernandez F, Smith GC, Liu YY, Davidson F. 2015. GODAE OceanView Class 4 inter-comparison for the Australian region. J Oper Oceanogr. 8(S1):s112–s126.
Doney SC, Lima I, Moore JK, Lindsay K, Behrenfeld MJ, Westberry TK, Mahowald N, Glover DM, Takahashi T. 2009. Skill metrics for confronting global upper ocean ecosystem-biogeochemistry models against field and remote sensing data. J Marine Syst. 76(1–2):95–112. doi:10.1016/j.jmarsys.2008.05.015
Drévillon M, Greiner E, Paradis D, Payan C, Lellouche J-M, Reffray G, Durand E, Law-Chune S, Cailleau S. 2013. A strategy for producing refined currents in the equatorial Atlantic in the context of the search of the AF447 wreckage. Ocean Dynam. 63(1):63–82. doi:10.1007/s10236-012-0580-2
Feng H, Vandemark D, Quilfen Y, Chapron B, Beckley B. 2006. Assessment of wind-forcing impact on a global wind-wave model using the TOPEX altimeter. Ocean Eng. 33(11–12):1431–1461. doi:10.1016/j.oceaneng.2005.10.015
Fujii Y, Kamachi M. 2003. Three-dimensional analysis of temperature and salinity in the equatorial Pacific using a variational method with vertical coupled temperature-salinity empirical orthogonal function modes. J Geophys Res. 108(C9):3297. doi:10.1029/2002JC001745
Gregg WW, Friedrichs MAM, Robinson AR, Rose KA, Schlitzer R, Thompson KR, Doney SC. 2009. Skill assessment in ocean biological data assimilation. J Marine Syst. 76(1–2):16–33. doi:10.1016/j.jmarsys.2008.05.006
Hartigan JA, Wong MA. 1979. Algorithm AS 136: a K-means clustering algorithm. J Roy Stat Soc C. 28(1):100–108. doi:10.2307/2346830
Hernandez F, Bertino L, Brassington GB, Chassignet EP, Cummings JA, Davidson F, Drévillon M, Garric G, Kamachi M, Lellouche J-M, Mahdon R, Martin MJ, Ratsimandresy A, Regnier C. 2009. Validation and intercomparison studies within GODAE. Oceanogr Magazine. 22(3):128–143. doi:10.5670/oceanog.2009.71
Hernandez F. 2011. Performance of ocean forecasting systems – intercomparison projects. In: Schiller A, Brassington GB, editors. Operational oceanography in the 21st century. Springer Science+Business Media B.V.; p. 633–655. doi:10.1007/978-94-007-0332-2_23
Hyder P, Storkey D, Blockley EW, Guiavarch C, Siddorn J, Martin M, Lea D. 2012. Assessing equatorial surface currents in the FOAM Global and Indian Ocean models against observations from the global tropical moored buoy array. J Oper Oceanogr. 5(2):25–39.
Jolliff JK, Kindle JC, Shulman I, Penta B, Friedrichs MAM, Helber R, Arnone RA. 2009. Summary diagrams for coupled hydrodynamic-ecosystem model skill assessment. J Marine Syst. 76(1–2):64–82. doi:10.1016/j.jmarsys.2008.05.014
Kawamura H, Kobayashi T, Furuno A, In T, Ishikawa Y, Nakayama T, Shima S, Awaji T. 2011. Preliminary numerical experiments on oceanic dispersion of 131I and 137Cs discharged into the ocean because of the Fukushima Daiichi nuclear power plant disaster. J Nucl Sci Technol. 48(11):1349–1356. doi:10.1080/18811248.2011.9711826
Lagemaa P, Janssen F, Jandt S, Kalev K. 2013. General validation framework for the Baltic Sea. Poster presented at the GODAE OceanView Symposium, Baltimore, USA. Pers. comm.
Lagemaa P. 2013. Ice model metrics and scores for the Baltic Sea. Oral presentation at the HIROMB-BOOS Scientific Workshop, Tallinn, Estonia. Pers. comm.
Lazzari P, Solidoro C, Ibello V, Salon S, Teruzzi A, Béranger K, Colella S, Crise A. 2012. Seasonal and inter-annual variability of plankton chlorophyll and primary production in the Mediterranean Sea: a modelling approach. Biogeosciences. 9(1):217–233. doi:10.5194/bg-9-217-2012
Le Borgne P, Marsouin A, Orain F, Roquet H. 2012. Operational sea surface temperature bias adjustment using AATSR data. Remote Sens Environ. 116:93–106. doi:10.1016/j.rse.2010.02.023
Lellouche J-M, Le Galloudec O, Drévillon M, Régnier C, Greiner E, Garric G, Ferry N, Desportes C, Testut C-E, Bricaud C, Bourdallé-Badie R, Tranchant B, Benkiran M, Drillet Y, Daudin A, De Nicola C. 2013. Evaluation of global monitoring and forecasting systems at Mercator Océan. Ocean Sci. 9(1):57–81. doi:10.5194/os-9-57-2013
Levy M, Martin AP. 2013. The influence of mesoscale and submesoscale heterogeneity on ocean biogeochemical reactions. Global Biogeochem Cycles. 27(4):1139–1150. doi:10.1002/2012gb004518
Lorente P, Soto-Navarro J, Alvarez Fanjul E, Piedracoba S. 2014. Accuracy assessment of high frequency radar current measurements in the Strait of Gibraltar. J Oper Oceanogr. 7(2):59–73.
Lynch DR, McGillicuddy Jr DJ, Werner FE. 2009. Skill assessment for coupled biological/physical models of marine systems. J Marine Syst. 76(1–2):1–3. doi:10.1016/j.jmarsys.2008.05.002
Maraldi C, Chanut J, Levier B, Ayoub N, De Mey P, Reffray G, Lyard FH, Cailleau S, Drévillon M, Alvarez Fanjul E, Garcia Sotillo M, Marsaleix P, the Mercator Research and Development Team. 2013. NEMO on the shelf: assessment of the Iberia-Biscay-Ireland configuration. Ocean Sci. 9(4):745–771. doi:10.5194/os-9-745-2013
Martin M, Dash P, Ignatov A, Banzon V, Beggs H, Brasnett B, Cayula J-F, Cummings J, Donlon C, Gentemann C, Grumbine R, Ishizaki S, Maturi E, Reynolds RW, Roberts-Jones J. 2012. Group for High Resolution Sea Surface Temperature (GHRSST) analysis fields inter-comparisons. Part 1: a GHRSST multi-product ensemble (GMPE). Deep Sea Res Part II. 77–80:21–30. doi:10.1016/j.dsr2.2012.04.013
Martin M. 2011. Ocean forecasting systems: product evaluation and skill. In: Schiller A, Brassington GB, editors. Operational oceanography in the 21st century. Springer Science+Business Media B.V.; p. 611–632. doi:10.1007/978-94-007-0332-2_22
Masumoto Y, Miyazawa Y, Tsumune D, Tsubono T, Kobayashi T, Kawamura H, Estournel C, Marsaleix P, Lanerolle L, Mehra A, Garraffo ZD. 2012. Oceanic dispersion simulations of 137Cs released from the Fukushima Daiichi nuclear power plant. Elements. 8(3):207–212. doi:10.2113/gselements.8.3.207
Melsom A, Counillon F, LaCasce J, Bertino L. 2012. Forecasting search areas using ensemble ocean circulation modeling. Ocean Dynam. 62(8):1245–1257. doi:10.1007/s10236-012-0561-5
Moore AM, Arango HG, Broquet G, Powell BS, Weaver AT, Zavala-Garay J. 2011. The Regional Ocean Modeling System (ROMS) 4-dimensional variational data assimilation systems: Part I – system overview and formulation. Prog Oceanogr. 91(1):34–49. doi:10.1016/j.pocean.2011.05.004
Mourre B, Chiggiato J. 2014. A comparison of the performance of the 3-D super-ensemble and an ensemble Kalman filter for short-range regional ocean prediction. Tellus A. 66:1–14. doi:10.3402/tellusa.v66.21640
Murphy AH, Winkler RL. 1987. A general framework for forecast verification. Mon Weather Rev. 115(7):1330–1338. doi:10.1175/1520-0493(1987)115<1330:agfffv>2.0.co;2
Murphy AH. 1993. What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather Forecast. 8(2):281–293. doi:10.1175/1520-0434(1993)008<0281:wiagfa>2.0.co;2
Oke PR, Brassington GB, Cummings JA, Martin MJ, Hernandez F. 2012. GODAE inter-comparisons in the Tasman and Coral Seas. J Oper Oceanogr. 5(2):11–24.
Oke PR, Griffin DA, Schiller A, Matear RJ, Fiedler R, Mansbridge J, Lenton A, Cahill M, Chamberlain MA, Ridgway KR. 2013. Evaluation of a near-global eddy-resolving ocean model. Geosci Model Dev. 6(3):591–615. doi:10.5194/gmd-6-591-2013
Pearson K. 1904. Mathematical contributions to the theory of evolution. On the theory of contingency and its relation to association and normal correlation. Research Memoirs, Biometric Series I. Department of Applied Mathematics, University College, University of London. London, UK. http://ia600408.us.archive.org/18/items/cu31924003064833/cu31924003064833.pdf
Rose KA, Roth BM, Smith EP. 2009. Skill assessment of spatial maps for oceanographic modeling. J Marine Syst. 76(1–2):34–48. doi:10.1016/j.jmarsys.2008.05.013
Ryan AG, Regnier C, Divakaran P, Spindler T, Mehra A, Smith GC, Liu YY, Davidson F, Hernandez F, Maksymczuk J, Lui Y. 2015. GODAE OceanView Class 4 forecast verification framework: global ocean inter-comparison. J Oper Oceanogr. 8(S1):s98–s111.
Schiller A, Bell M, Brassington G, Brasseur P, Barciela R, De Mey P, Dombrowsky E, Gehlen M, Hernandez F, Kourafalou V, Larnicol G, Le Traon P-Y, Martin M, Oke P, Smith GC, Smith N, Tolman H, Wilmer-Becker K. 2015. Synthesis of new scientific challenges for GODAE OceanView. J Oper Oceanogr. doi:10.1080/1755876X.2015.1049901
Scott RB, Ferry N, Drévillon M, Barron CN, Jourdain NC, Lellouche J-M, Metzger EJ, Rio M-H, Smedstad OM. 2012. Estimates of surface drifter trajectories in the equatorial Atlantic: a multi-model ensemble approach. Ocean Dynam. 62(7):1091–1109. doi:10.1007/s10236-012-0548-2
Stow CA, Jolliff J, McGillicuddy Jr DJ, Doney SC, Allen JI, Friedrichs MAM, Rose KA, Wallhead P. 2009. Skill assessment for coupled biological/physical models of marine systems. J Marine Syst. 76(1–2):4–15. doi:10.1016/j.jmarsys.2008.03.011
Taylor KE. 2001. Summarizing multiple aspects of model performance in a single diagram. J Geophys Res. 106(D7):7183–7192. doi:10.1029/2000JD900719
Thiébaux J, Rogers E, Wang W, Katz B. 2003. A new high-resolution blended real-time global sea surface temperature analysis. B Am Meteorol Soc. 84(5):645–656. doi:10.1175/BAMS-84-5-645
Usui N, Ishizaki S, Fujii Y, Tsujino H, Yasuda T, Kamachi M. 2006. Meteorological Research Institute multivariate ocean variational estimation (MOVE) system: some early results. Adv Space Res. 37(4):806–822. doi:10.1016/j.asr.2005.09.022
Volpe G, Colella S, Forneris V, Tronconi C, Santoleri R. 2012. The Mediterranean ocean colour observing system – system development and product validation. Ocean Sci. 8(5):869–883. doi:10.5194/os-8-869-2012
Volpe G, Santoleri R, Vellucci V, Ribera d'Alcalà M, Marullo S, D'Ortenzio F. 2007. The colour of the Mediterranean Sea: global versus regional bio-optical algorithms evaluation and implication for satellite chlorophyll estimates. Remote Sens Environ. 107(4):625–638. doi:10.1016/j.rse.2006.10.017
Wilkin JL, Hunter EJ. 2013. An assessment of the skill of real-time models of Mid-Atlantic Bight continental shelf circulation. J Geophys Res-Oceans. 118(6):2919–2933. doi:10.1002/jgrc.20223
Xie J, Zhu J, Li Y. 2008. Assessment and inter-comparison of five high-resolution sea surface temperature products in the shelf and coastal seas around China. Cont Shelf Res. 28(10–11):1286–1293. doi:10.1016/j.csr.2008.02.020