J Grid Computing (2013) 11:481–503
DOI 10.1007/s10723-013-9256-5
Workflows for Heliophysics
Anja Le Blanc · John Brooke · Donal Fellows · Marco Soldati · David Pérez-Suárez · Alessandro Marassi · Andrej Santin
Received: 14 May 2012 / Accepted: 16 April 2013 / Published online: 8 May 2013
© Springer Science+Business Media Dordrecht 2013
Abstract In this paper we describe how we have introduced workflows into the working practices of a community for whom the concept of workflows is very new, namely the heliophysics community. Heliophysics is a branch of astrophysics which studies the Sun and the interactions between the Sun and the planets, by tracking solar events as they travel throughout the Solar system. Heliophysics produces two major challenges for workflow technology. Firstly it is a systems science where research is currently developed by many different communities who need reliable data models and metadata to be able to work together. Thus it has major challenges in the semantics of workflows. Secondly, the problem of time is critical in heliophysics; the workflows must take account of the propagation of events outwards from the Sun. They have to address the four dimensional nature of space and time in terms of the indexing of data. We discuss how we have built an environment for heliophysics workflows building on and extending the Taverna workflow system and utilising the myExperiment site for sharing workflows. We also describe how we have integrated the workflows into the existing practices of the communities involved in heliophysics by developing a web portal which can hide the technical details from the users, who can concentrate on the data from their scientific point of view rather than on the methods used to integrate and process the data. This work has been developed in the EU Framework 7 project HELIO, and is being disseminated to the worldwide heliophysics community, since heliophysics requires integration of effort on a global scale.

A. Le Blanc (B) · J. Brooke · D. Fellows
University of Manchester, Oxford Road, Manchester M13 9PL, UK
e-mail: anja.leblanc@manchester.ac.uk

J. Brooke
e-mail: john.brooke@manchester.ac.uk

D. Fellows
e-mail: donal.k.fellows@manchester.ac.uk

M. Soldati
Fachhochschule Nordwestschweiz, Institute of 4D Technologies, Steinackerstrasse 5, 5210 Windisch, Switzerland
e-mail: marco.soldati@fhnw.ch

D. Pérez-Suárez
Trinity College Dublin, College Green, Dublin 2, Ireland
e-mail: d.perezsuarez@tcd.ie

A. Marassi · A. Santin
INAF-Astronomical Observatory of Trieste, Loc. Basovizza n. 302, 34012 Trieste, Italy

A. Marassi
e-mail: marassi@oats.inaf.it

A. Santin
e-mail: asantin@oats.inaf.it

Present Address:
D. Pérez-Suárez
Finnish Meteorological Institute, Erik Palménin aukio 1, 00560 Helsinki, Finland
Keywords Workflow · Taverna · myExperiment · Taverna Server · Portal integration · Heliophysics
1 Introduction
Heliophysics is a discipline within astrophysics
which studies the Sun, Heliosphere, and Planetary
Environments as a single connected system. Over the years the astronomy and heliophysics domains have created Virtual Observatories (VxO)1 [1] to provide access to data and software tools. VxOs provide single points of access to datasets scattered over the whole planet. Their main purpose is to enable support for new ways of doing research [2–4]. Many VxOs are based on a collection of individual services which are accessible,
for example, through a graphical portal applica-
tion. VxOs benefit from workflows as they allow
users to combine several services to do more com-
plex tasks and to automate repeated executions
of common procedures. Specific workflows could
then be deployed as standalone services, therefore
becoming an integrated part of a VxO.
In this paper we describe our work in the
Heliophysics Integrated Observatory (HELIO)
project2 [5], which created a VxO for heliophysics, in which workflows play a central role. The use of workflows in the heliophysics domain is relatively new. We describe our experiences and how workflows are integrated into the different areas of the infrastructure. This includes allowing access to Grid resources, since some of the tasks in a heliophysics workflow require large amounts of processing power (e.g., for image analysis). For the same reason, we also require that the workflows can be run on the Grid without any need for users to remain connected via a user interface. However, the users still need to be able to monitor and retrieve the results from an interface with which they are familiar.

1The astronomy and solar physics communities refer to virtual observatories as “VOs”, but this acronym is not used in this paper to avoid confusion with virtual organisations.

2HELIO project website: http://helio-vo.eu.
A particular feature of the work presented in
this paper will be the description of our efforts
to embed the use of workflows into the working
practices of the heliophysics community. We have
had to cater for two broad classes of users. Firstly
there are heliophysicists who are confident with
novel methods of computation, who want to have
access to “power tools” for building workflows,
to be used directly by themselves and also by
others in their community. Secondly, there is
a much larger category of heliophysicists who
want a data-centric interface for working, where
the workflows are hidden from direct view, but
are presented as services with the functionality
to perform data analysis or transformation. The
workflows are presented from the HELIO portal3
(the VxO portal) as “virtual services”. This allows
the primary user interface to remain as simple as
possible, but allows the tools that it accesses to be
customised for particular methods of working with
data as these are developed by the scientists.
The following section will familiarise the reader
with some background information specific to
the domain of heliophysics; it discusses data and
web services which are the fundamentals of any
workflow. Section 3 describes the experience of introducing workflows to the community and classifies workflows. Section 4 deals with the sharing
of created workflows in the community. Section 5
describes the development of the Taverna Server
which is required for the remote execution of
workflows and is the basis of the integration of
workflow execution in the HELIO portal which is
the topic of Section 6. We summarise and discuss
our achievements in Section 7.
3The HELIO portal is accessible at http://hfe.helio-vo.eu.
2 Background
2.1 Data in the Heliophysics Domain
Remote observations are the main data source
in astronomy. These are made by different tele-
scopes (either ground- or space-based) in different
wavelengths. The fact that the movements of the stars and extra-galactic objects in our sky are relatively small makes it easy to classify the observations in databases and catalogues, i.e., each
object has a unique coordinate in a 2D space—our
celestial dome.
In heliophysics this situation is considerably
more complicated. Planets are orbiting around
the Sun, making their location time dependent.
Also solar features are time dependent, as their position depends on the rotation of the Sun (approximately once every 25 days at the solar equator). Moreover, events on the Sun pro-
duce disturbances through their propagation in
the solar system. This makes heliophysics a 4-
dimensional domain; time is critical in HELIO
workflows and searches. Also, since observations
are made inside the heliosphere, we have to in-
tegrate data indexed by a variety of different co-
ordinate systems (e.g., many instruments use an
instrument-centric coordinate system).
Observations in heliophysics are not only
sensed remotely, but there are also in-situ mea-
surements. While remotely sensed observations
are recorded within minutes after an event occurs, in-situ measurements of the same event are detected from several hours to days after it originates, as different particles propagate at different velocities. This makes the association of
related observations much more difficult.
Historically heliophysics developed out of
a number of diverse/independent disciplines
(e.g., solar physics, planetary physics, space
weather,...). This led to big differences in
how these disciplines handled observations (i.e.,
different file formats, analysis software tools, and
data archives). Most of the disciplines have cre-
ated their own VxOs to keep their data easily
accessible, yet they are disconnected. Thus, it is
complicated for a scientist to perceive the big
picture of an event, as the relevant data needs to
be collected and combined from multiple places.
Besides observations, there are also data pro-
ducts (e.g., features being detected manually or
automatically on the Sun). Some are included
as catalogues on websites, others are offered as
“event lists” in scientific papers. As with data, each of these data products follows different principles (e.g., naming conventions, date formats, units).
HELIO collects some of these different data pro-
ducts under the umbrella of services according to
their principal subject. There is a certain amount
of integration work necessary to make these in-
dividual data products fit the overall model. The
typical ingestion process is in principle structured
as follows:
1. data source (catalogue) identification and se-
lection;
2. original data parsing and conversion into a
database compatible intermediate ASCII file;
3. table design and creation following the meta-
data constraints;
4. data ingestion from the ASCII file;
5. UCD/Utype compilation for the catalogue
VOTable.
Step number 3 makes sure that contents which are described differently in different data sources but are semantically equivalent end up under the same table field name (e.g., begin time, start time and start of the event are all defined as time_start).
Step number 5 assigns meta-data tags to each
database field. UCD (Unified Content Descriptors) [6] is a controlled vocabulary used to classify content semantically and to let heterogeneous datasets interoperate; UType is a string which refer-
ences entries in an external data model (in our
case the common overarching HELIO data model
integrating all the different service data models).
This helps users to link data delivered from one
service to data belonging to any other service.
VOTable is the standard XML output format used
to exchange astronomy data [7].
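To make steps 3 and 5 of the ingestion process concrete, the sketch below maps source-specific column names onto a canonical name and attaches a UCD to it. The mapping entries and UCD strings are illustrative only; the actual HELIO data model defines its own vocabulary.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of steps 3 and 5 of the ingestion process: map source-specific
// column names onto canonical HELIO names and attach a UCD to each of them.
// The concrete names and UCD strings below are illustrative, not the project's
// actual vocabulary.
public class ColumnHarmoniser {

    // Source column name -> canonical HELIO column name (step 3).
    private static final Map<String, String> CANONICAL = new HashMap<>();
    // Canonical column name -> UCD tag (step 5).
    private static final Map<String, String> UCD = new HashMap<>();

    static {
        CANONICAL.put("begin time", "time_start");
        CANONICAL.put("start time", "time_start");
        CANONICAL.put("start of the event", "time_start");
        CANONICAL.put("end time", "time_end");

        UCD.put("time_start", "time.start");
        UCD.put("time_end", "time.end");
    }

    /** Return the canonical column name, or the original name if it is unknown. */
    public static String canonicalName(String sourceName) {
        String key = sourceName.trim().toLowerCase();
        return CANONICAL.getOrDefault(key, sourceName);
    }

    /** Return the UCD for a canonical column, or null if none is defined. */
    public static String ucdFor(String canonicalName) {
        return UCD.get(canonicalName);
    }

    public static void main(String[] args) {
        String source = "Begin Time";              // as found in the original catalogue
        String column = canonicalName(source);     // -> time_start
        System.out.println(source + " -> " + column + " (UCD: " + ucdFor(column) + ")");
    }
}
```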
2.2 Web Services in the Heliophysics Domain
Historically heliophysicists have provided web ap-
plications as interfaces to their data or stand-alone
tools for their community. Most of the scientific
484 A. Le Blanc et al.
analysis and visualisation was done using the IDL
[8] scripting language. With the rising popularity
of web services in other domains, new heliophysics
projects adopted this technology. The CASSIS
European Project [9] conducted a study on the
interoperability readiness of available European
services and found that a majority of services out-
side the ones created by the HELIO project are
not yet accessible via standard web service proto-
cols, such as SOAP [10] or REST [11]. Projects such as CASSIS are currently undertaken with the aim of fostering interoperability and accessibility across project and domain boundaries. Outside of the European area, NASA provides some web services based on the REST protocol [12].

Table 1 List of web services provided by HELIO

CTS, Coordinate transformations service: allows the transformation from a set of input coordinates to a set of output coordinates
CXS, Context service: provides images of context information such as GOES lightcurve, flare locations, or the model of Parker’s spiral
DES, Data evaluation service: creates on demand event lists by user defined criteria
DPAS, Data provider access service: provides access to data files of various providers
HEC, HELIO event catalogue: provides access to a diverse range of heliophysics events
HFC, HELIO feature catalogue: contains catalogues of filaments, coronal holes, sunspots, active regions, and type III radio bursts
HPS, HELIO processing service: executes user provided codes on the Grid service in Ireland
HRS, HELIO registry service: provides a registry to all available services
HSS, HELIO storage service: provides file store for user data
ICS, Instrument capabilities service: contains a list of instruments and observatories and their characteristics
ILS, Instrument locations service: provides the locations of planets and observatories in multiple coordinate systems over a 30 year period
SMS, Semantic mappings service: provides mappings between terms in the heliophysics domain
TavServ, Taverna workflow execution: provides remote execution of Taverna workflows
UOC, Unified observing catalogue: provides information about observing position and observing time periods for pointed instruments
The HELIO project is based on a service ori-
ented architecture. Every task is implemented
as a standalone web service. Table 1 lists the available HELIO services. The web services are hosted on hardware according to the services' requirements (e.g., the UOC on a server with a large and fast file store, the HPS on a server with many CPUs, and Taverna Server (TavServ) on a server with
large memory). Most HELIO web services can
also be accessed through a dedicated web inter-
face. Additionally the HELIO portal (described in
Section 6) provides a unified web interface to all
services.
From the point of view of functionality the
HELIO services can be grouped into four categories.
1. Meta-data access services are based on the HELIO Query Interface (HQI) and return data that describe observatories, solar events, solar features, observation data, etc. The HQI is specified in WSDL and implemented as a configurable Java servlet which can be deployed in any servlet container. It provides a standardised, consistent and safe bridge to a data table in a relational database. As query languages the interface supports PQL (Parameterized Query Language) [13] and a subset of SQL (Structured Query Language) [14]; a sketch of such a query is shown after this list.
2. Processing services provide access to a vari-
ety of data processing infrastructures offered
by HELIO. One of these infrastructures and
of particular interest for this article is the
Taverna server (Section 5). Others are the
context service (CXS), the processing service
(HPS) and the data evaluation service (DES).
3. Infrastructure services are used internally by
HELIO for administration and maintenance
of its infrastructure. Relevant to this paper
are the HELIO registry service which offers
a registry of all available services and the
myExperiment repository which allows us to
upload Taverna workflows (Section 4).
4. The Data access services give access to the
actual observations made by observatories.
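As an illustration of how a meta-data access service is queried, the following sketch sends an SQL-style query over HTTP and prints the VOTable that comes back. The endpoint URL, table name and column names are invented for illustration; the real tables and request parameters should be taken from the service documentation in the HELIO registry.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hedged sketch of a call to an HQI-style meta-data service. The host name,
// servlet path, table name and parameter names are illustrative only.
public class HqiQueryExample {
    public static void main(String[] args) throws Exception {
        String sql = "SELECT time_start, xray_class FROM flare_list "
                   + "WHERE time_start BETWEEN '2003-10-28T00:00:00' AND '2003-10-30T00:00:00'";
        String query = "sql=" + URLEncoder.encode(sql, "UTF-8");
        URL url = new URL("http://example.org/helio-hec/HECService?" + query);

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        // The HQI returns its results as a VOTable (XML); here we simply print it.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```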
3 Workflow Use in Heliophysics
The workflows developed within HELIO com-
bine the web services provided by the project. Any
workflow system which supports standard web
service technology could be used for this purpose.
Our requirements for a workflow system were
assessed as:
Use of standard web service protocols
Support of looping
Execution of workflows in both a client and a
server environment
Available support and documentation
Platform independence
Support of multiple instances of a service and
automatic fail-over.
There are many workflow systems which can
support these requirements, notably including sys-
tems based on BPEL [15], Pegasus [16], Unicore
[17], KNIME [18], Galaxy [19], Kepler [20], and
Taverna [21]. Each of these has its own set of
advantages and disadvantages:
BPEL-based workflow systems (and the Petri-
net [22] formalism on which they are based)
tend to focus more on the handling of indivi-
dual pieces of data, rather than larger streams
of data. This is a result of the primary topic
area being business workflows, and makes them less suitable for scientific use, where the model of failure processing is different.
Pegasus is not so much a workflow server as a system for executing workflows. As a server it is relatively mature, but it suffers from a relative lack of user-focused tools for the preparation of workflows; the creation of a Pegasus workflow primarily requires the use of programming tools, which restricts its use among scientists in disciplines where programming skill has historically been a lesser requirement.
Unicore is not primarily a workflow system but rather a mature system for accessing computational resources, though it includes
a workflow processing component for al-
lowing the resources to be coordinated for
higher-level tasks. Due to its historic focus on
use in high-performance computing, the fun-
damental coordination units are at the level of
files rather than records; processing units need
to be able to handle entire data collections
themselves rather than having the workflow
system level manage that for them.
KNIME is a commercial workflow system
used largely for processing of high-throughput
genomic data. While it is an exceptionally
strong candidate for use within its domain, it
is relatively poor at supporting other scientific
disciplines due to the fact that its workflow
components are focused on supporting ge-
nomics and it is not an open ecosystem tool.
Galaxy is a server-based scientific workflow
system primarily used in the biosciences that
uses a pure local processing model, though
those local processing elements may make
web service calls to other systems. There is
a collaboration effort in place to allow joint
Galaxy-Taverna workflows to be created.
Kepler is a general scientific workflow sys-
tem focused on server-based execution and a
graphical design tool, popular particularly in
North America. It has a very rich collection
of generic workflow system components, and
a mechanism for the creation of new workflow
components.
Taverna is a workflow system consisting of
a graphical workbench for design and local
execution, a separate command-line tool for
pure workflow execution, and a server for
execution of workflows remotely. It also in-
cludes a social ecosystem service for allowing
scientists to share access to workflows as pub-
lished artefacts. It is focused mainly on being
a system for routing substantial numbers of
relatively-small data items through arbitrary
web services, allowing it to be relatively eas-
ily adapted to novel data flows and service
types. At the time that the HELIO project
started, there was an obsolete Taverna 1
Server, though it had fallen into disuse due
to its dependence on aspects of the internal
architecture of Taverna that had ceased to be
true with the evolution of Taverna 2.
Though historically focused on the cellular
biosciences, Taverna now supports use across
a wide array of disciplines, including chem-
istry, astronomy, physiology and biodiversity
modelling.
The HELIO project decided to use the Taverna
workflow system [21] since it provided most of
the required functionality, is still in active devel-
opment with a responsive support mailing list,
and the developers are proactive in seeking input
and feedback for further development of their
product. An acceptable alternative would have
been the US-based Kepler [20] workflow engine.
Among others, Kepler is used in several NASA-
funded projects. To facilitate the use of HELIO-
developed services by as wide a community as
possible, a key goal of the project was to en-
sure that all basic services would be workflow
system agnostic: they had to be equally usable
from Taverna, Kepler, a web-site front end, or
end-user programming language toolkits (with a
particular focus on IDL [8] given its historical
prominence within the seed communities). How-
ever, the higher-level services were to be imple-
mented using and on top of Taverna, as it was
felt that it was the workflow system that had the
best support for use by scientists with limited
programming experience while still providing the flexibility necessary to actually perform the desired workflow tasks.
The Taverna Workbench, the workflow editor, is platform independent and provides an easy-to-use graphical user interface for constructing, editing and executing workflows. Users are able to build up libraries of services for use in their domain. The integration of a workflow repository (Section 4) via Taverna's plug-in system enhances the usability. The heliophysics community will also benefit from Taverna plug-ins developed for the astrophysics domain [23], which use the same service repository software and share the interchange format for tables commonly returned by HELIO web services (i.e., VOTable).
On a larger scale, some astronomy VxOs started looking at workflows as early as 2005 [24].
In 2011 the International Virtual Observatory
Alliance (IVOA) [25] even started a discussion
mailing list focused on workflows. A draft IVOA
note [26] on that subject is being discussed. So workflows are gaining interest not only in the heliophysics community but in other domains as well.
3.1 Experiences in Introducing Workflows
to Heliophysics Scientists
So far, most of the workflows have been developed by a computer scientist in close co-operation with heliophysicists. The complexity of the workflows has grown with the maturity of the services they use. One of the goals of HELIO is to help scientists to write workflows themselves as part of their everyday work. The strategy we followed was first to create some interest by developing workflows and demonstrating their strengths and usefulness, and then to provide training targeted at the discipline. Following on from this was a period during which the technologist was available to help solve problems.
At the beginning of the project we used every
possibility to give short presentations and demon-
strations at workshops and presented posters to
the scientific community at conferences. After the
HELIO services were completed, we organised
and ran a Taverna tutorial session using example
workflows involving these HELIO services. This
tutorial was conducted just a week before a Co-ordinated Data Analysis Workshop where participants could apply that knowledge with a computer scientist present to solve problems.
Here are a few observations from our
experience:
The Learning Curve depends on previous exposure to a graphical user interface operated mainly by mouse actions. Queries are done in SQL/PQL and the user needs to know in advance the names of the tables and columns. So the heliophysicist needs, besides knowing some basic SQL/PQL, information about the database itself. The HQI provides some support functions to retrieve this information. Scripts in Taverna are written in Java; most heliophysicists do not know that language but are used to writing code in IDL [8]. Heliophysics data formats such as VOTable [7] are not easily used in a workflow environment because they require conversion scripts (a sketch of such a script follows these observations). Multidimensionality of data vectors is difficult to understand, especially in connection with looping over functions or providing input of the correct depth.
Documentation is very important, not only for
the workflow system, but especially for web
services. Lack of documentation renders a web
service unusable by anyone but the developers.
Documentation has to go beyond the function-
ality of the service and has to include documen-
tation of the content; in the case of databases that includes the tables and table structures. It is also
important to have one central point from which
it is possible to find the web services and their
documentation. In HELIO we have a service
registry as this central point.
Problem Solving is a technique which needs to
be taught alongside workflow building. The
workflow system contains new areas for possible errors, be it iteration strategies or list handling. Each of these workflow-specific error
domains shows specific characteristics. Users
who recognise these characteristics can resolve
problems quickly.
Integration of current working practices: Scientists have built up a set of tools or scripts [27] which already perform certain tasks which they do not
wish to re-implement within the workflow sys-
tem. Taverna now allows the seamless integra-
tion of tools external to Taverna. A disadvan-
tage of this is that workflows constructed in that
way are less usable by other scientists unless
the exact environment is replicated. The same
issues occur when trying to run these workflows
on a workflow server.
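As an example of the conversion scripts mentioned under the Learning Curve, the following plain-Java sketch (beanshell scripts use the same syntax) extracts one column of a VOTable so that its values can be iterated over in a workflow. It assumes the simple TABLEDATA serialisation and illustrative element handling; production code would use a proper VOTable library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch of a VOTable conversion step: pull the values of one named column out of
// a VOTable string so they can be looped over in a workflow.
public class VOTableColumn {

    public static List<String> columnValues(String votable, String columnName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(votable)));

        // Locate the position of the requested column among the FIELD definitions.
        NodeList fields = doc.getElementsByTagName("FIELD");
        int index = -1;
        for (int i = 0; i < fields.getLength(); i++) {
            if (columnName.equals(((Element) fields.item(i)).getAttribute("name"))) {
                index = i;
                break;
            }
        }
        List<String> values = new ArrayList<String>();
        if (index < 0) {
            return values;  // column not present
        }

        // Each TR holds one row; pick the TD at the column's position.
        NodeList rows = doc.getElementsByTagName("TR");
        for (int r = 0; r < rows.getLength(); r++) {
            NodeList cells = ((Element) rows.item(r)).getElementsByTagName("TD");
            if (index < cells.getLength()) {
                values.add(cells.item(index).getTextContent());
            }
        }
        return values;
    }
}
```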
3.2 Application Areas for Workflows
in Heliophysics
Workflows can be constructed for different pur-
poses. In the following sections we describe three
classes of workflows. The first class is used as
integration tests at development time in order to
assert consistency between different services. The
second class introduces virtual services which pro-
vide new functionality in support of science. The
third class implements the actual science analysis
by combining web services, user-defined opera-
tions and virtual services into larger workflows.
Each of the three classes is illustrated by an example. In the associated workflow diagrams the
colours of the squares represent different kinds of operators:
green: SOAP operator
purple: XML splitter—decomposes complex
SOAP types into their components
brown: local beanshell [28] scripts—user written
scripts to provide custom functionality
violet: local operator—predefined functions
within Taverna
pink: nested workflows—workflows re-used in-
side another workflow
blue: string constant—a string or text which does
not change
Input and output ports are of a different blue
colour and separately labelled.
3.2.1 Test of Data and Services
During the development of the service infrastruc-
ture, HELIO uses workflows to assert the robust-
ness of individual services and to test consistency
between multiple services. In particular the latter
is of great value in a distributed development
environment. When the output of one service is
used as input for another service, the exchanged
data needs to be kept aligned. An example is the
identifier for an instrument. In HELIO this ID
is defined in the Instrument Capabilities Service
(ICS). Many other services, such as DPAS, UOC,
DES, ILS, and HFC use it as an input parameter.
As the HELIO services are developed by inde-
pendent teams in different locations it is impor-
tant to find inconsistencies as early as possible in
an automated way.
An example of a data-testing workflow can be seen in Fig. 1. It tests the integrity of IDs
between ICS and data provider access service
(DPAS). This workflow [29] does not require any
inputs since it is checking the complete content
of the ICS against the instrument registered in
the DPAS. In a first step the workflow requests
all instrument IDs from the ICS in a SOAP call
(Label 1 in Fig. 1). In a next step the data is
extracted from the VOTable output format and
provided as list for further evaluation (Label 2).
Information about available instruments is acces-
sible in the DPAS via a servlet (Label 3). Again,
suitable data needs to be extracted in a local bean-
shell and cleaned up by removing duplicate entries
(Label 4). Any entry in the instrument ID list from
the DPAS which is not part of the instrument
IDs from the ICS represents data which could be
available to scientists but would not be accessible
since the ID to that data is not registered within
the system; another beanshell searches for those
IDs (Label 5). On the other hand IDs which are
known in the ICS but not in the DPAS only
represent data sources to which HELIO does not
have access. The workflow returns the result list in
two formats, a VOTable with the IDs of missing
ICS entries and a string list of the same IDs.
The advantages of implementing those tests as workflows are a large time saving and a higher reliability compared to a manual check. At the time of writing this article the ICS knows 362 instruments
and the DPAS has access to 263. The workflow
identified 21 IDs which are not part of the ICS.
The developers of the ICS and DPAS can take the
results of this workflow to resolve the problems
and to improve the services.
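The comparison itself (Labels 4 and 5) boils down to deduplicating one ID list and taking its difference with the other. A beanshell performing this could look like the following plain-Java sketch; the method and variable names are illustrative rather than taken from the actual workflow.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of the list-difference step of the integration test: report every
// instrument ID registered in the DPAS that is not defined in the ICS.
public class InstrumentIdDifference {

    public static List<String> missingFromIcs(List<String> dpasIds, List<String> icsIds) {
        // Deduplicate the DPAS list while keeping its order (Label 4 in the text).
        Set<String> unique = new LinkedHashSet<String>(dpasIds);
        List<String> missing = new ArrayList<String>();
        for (String id : unique) {
            if (!icsIds.contains(id)) {
                missing.add(id);   // data reachable via the DPAS but unknown to the ICS
            }
        }
        return missing;
    }
}
```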
3.2.2 Provide Virtual Service
A virtual service is a workflow which provides
a building block for the implementation of com-
plex scientific use cases. As such it is used in
different circumstances to provide specific func-
tionality which supports the scientific work. It can
be integrated into workflows or into web portals
alongside other web services. There are two types
of virtual services. The first type accesses an exter-
nal service with some default input values to pro-
vide a more specialized functionality. This makes
it possible to simplify the interface to the external
service for the user. The second type combines
a number of services or service calls into some-
thing new. In a workflow environment virtual ser-
vices are commonly used as nested workflows—
a workflow within another workflow. Integrated
in the general user interface it becomes a service
indistinguishable from other web services. This is
expanded in Section 6.
Fig. 1 Integration test workflow; the workflow checks the consistency between instrument IDs which are used in the DPAS and are defined in the ICS. Test workflows support the development of physically separate services. The labels of each section of the workflow are explained in the text.

The UOC is a service which provides access to a large database containing information of when, where and what pointed instruments have observed. Pointed instruments can observe a specific
region of the Sun in different locations or of vary-
ing size. A typical scientific question addressed by
the UOC is to find out which instrument has ob-
served a particular region at a given time. Query-
ing the UOC in the standard way would result
in a large table with an entry for every recorded
observation rather than just a list of instrument
IDs. The workflow [30] is implemented as a virtual service and uses a more complex call to the UOC
to return only the instrument IDs. This informa-
tion is sufficient for a scientist to obtain potentially
interesting data files.
The workflow shown in Fig. 2is an example of a
virtual service that combines several service calls.
In this particular case the calls are used to handle
asynchronous service calls properly. An asynchro-
nous service call requires a number of individual
service calls to perform a task. Asynchronous web
services usually perform long running tasks where
the output cannot be reliably produced before a request would time out. Depending on the service there are at least three stages: submit the task, check on the status of processing, and request the results. HELIO offers a number of services where only asynchronous functions are provided.
One of those services is the processing service
which performs user defined code executions on
the HPS. SHEBA [31], the propagation model
used in HELIO, is an example application which runs in such a way.

Fig. 2 Virtual service; the workflow performs a backwards propagation of a co-rotating interaction region on the HPS. The HPS provides an asynchronous web service only. The virtual service hides the complexity from the user.

Having the propagation model
as a virtual service enables scientists to run it as a
single component where they do not have to worry
about the individual steps or error cases. Different
solar phenomena propagate differently through
the solar system, therefore requiring different
models to describe this propagation. Each of these
is encapsulated in a separate virtual service. The
workflow we use as an example is the backwards
propagation of a Co-rotating Interaction Region
(CIR) [32]. CIRs are regions of high speed solar
wind that co-rotate with the Sun following a spiral
shape (i.e., the Parker spiral [33]). These regions
are associated with coronal holes, a feature seen
on the solar corona.

Fig. 3 Scientific workflow; the workflow associating SEP events with solar flare, CME and Type III Radio burst events. Workflows support large scale and statistical analysis.

SHEBA can be run forward
and backwards; this means that from the location
of a coronal hole at a certain time it provides infor-
mation about when and where the CIR associated
with it should be detectable and vice-versa, thus
helping the heliophysicist to find a relationship
between the properties of the coronal hole and the
CIR.
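The three asynchronous stages hidden by this virtual service can be sketched as follows. The operation names mirror the operators visible in Fig. 2 (executeApplication, getStatusOfExecution, getOutputOfExecution), but the HPSClient interface and the status strings are assumptions made for illustration; the real service is invoked through its generated SOAP stubs.

```java
// Sketch of the submit / poll / retrieve pattern hidden inside the virtual service.
// HPSClient is a hypothetical stand-in for the generated SOAP client of the HPS.
public class AsyncPropagationRun {

    /** Hypothetical client interface modelled on the operators shown in Fig. 2. */
    public interface HPSClient {
        String executeApplication(String application, String votableInput);  // returns an execution ID
        String getStatusOfExecution(String executionId);                     // e.g. "PENDING", "COMPLETED", "FAILED"
        String getOutputOfExecution(String executionId);                     // URL of the result file
    }

    public static String runPropagation(HPSClient hps, String votableInput)
            throws InterruptedException {
        // Stage 1: submit the task.
        String id = hps.executeApplication("pm_cir_back", votableInput);

        // Stage 2: poll until the long-running job has finished.
        String status = hps.getStatusOfExecution(id);
        while (!"COMPLETED".equals(status)) {
            if ("FAILED".equals(status)) {
                throw new IllegalStateException("Propagation run " + id + " failed");
            }
            Thread.sleep(10000);   // wait before asking again
            status = hps.getStatusOfExecution(id);
        }

        // Stage 3: request the result (here: the URL of the produced VOTable).
        return hps.getOutputOfExecution(id);
    }
}
```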
3.2.3 Advance Scientific Research
It is challenging and expensive to collect actual
data in Heliophysics. The data are either re-
motely sensed or in situ measurements of events
propagating through the heliosphere. The nature
of events in this science makes it impossible to
gather any data at the event source. Most obser-
vatories are located only at key positions in the
heliosphere, as in orbits around planets or the
Lagrangian points (positions in space where
the combined gravitational pull of two large
masses provide a stationary position relative to
them), while others are travelling through the he-
liosphere (e.g., Voyager spacecraft). That all leads
to the relative sparsity of data sources. Heliophys-
ical events are characterised by their variability, and their effects may have been affected by a multitude of surrounding influences. Therefore
it is rarely straightforward to connect something
experienced on Earth with an event on the Sun,
or to predict the dangers of something remotely
observed at the Sun to satellites around Earth or
to power grids on it.
Heliophysicists spend time researching the effects of single events, trying to propagate them through the heliosphere and looking for their signatures in other data sets. Once a connection is identified, a workflow can reproduce the single steps the scientist took and find other events which show the same behaviour, or identify events with the same global parameters where the behaviour could not be reproduced, which could point to some missing influences. All workflows which were created to help answer or reproduce some scientific question were developed in close co-operation with heliophysicists.
Let us describe a workflow which was created
during a Co-ordinated Data Analysis Workshop
where a group of scientists tried to associate so-
lar energetic particle (SEP) events measured at
Earth with flares, coronal mass ejections (CMEs)
and radio events observed on the Sun [34]. This
workflow, shown in Fig. 3, requires a time range
and propagation parameters as inputs and pro-
ceeds to work as follows:
1. Find which SEP events have been observed at
Earth during the time of interest. This is done
through a query to one of the lists at the HEC.
2. Propagate the events found backwards to re-
trieve the time and position in which the par-
ticles were accelerated towards Earth. This is
calculated by the propagation model available
at the HPS.
3. Flare catalogues at the HEC are queried using the times previously calculated (plus/minus a defined range; a sketch of this window construction follows the list). These queries provide the start and peak times of the energy released which are consequently used in the next step.
4. Coronal mass ejections and radio shocks are
events observed above the solar atmosphere;
thus two new queries to the HEC (one for
CMEs, another for radio shocks detections)
are made using the new time ranges obtained
in the previous step.
5. Finally, a summary table is created which links
each SEP event to the associated flare, CME
and radio shock.
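As mentioned in step 3, each back-propagated time has to be turned into a start/end window before the flare catalogues can be queried. A minimal sketch of that window construction, with an arbitrary half-hour window and an illustrative column name, is:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Sketch of step 3: turn a back-propagated time at the Sun into a start/end window
// for querying the flare catalogues at the HEC. The 30-minute half-window is an
// arbitrary illustration of the "plus/minus a defined range" mentioned in the text.
public class FlareQueryWindow {

    public static String[] window(Date timeAtSun, long halfWindowMillis) {
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
        iso.setTimeZone(TimeZone.getTimeZone("UTC"));
        String start = iso.format(new Date(timeAtSun.getTime() - halfWindowMillis));
        String end = iso.format(new Date(timeAtSun.getTime() + halfWindowMillis));
        return new String[] { start, end };
    }

    public static void main(String[] args) {
        Date t = new Date();                          // a time computed by the propagation model
        String[] w = window(t, 30L * 60L * 1000L);    // +/- 30 minutes
        System.out.println("time_start >= '" + w[0] + "' AND time_start <= '" + w[1] + "'");
    }
}
```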
4 Workflow Sharing
The success of establishing workflows as a new
way for scientists to work depends largely on their
ability to find suitable example workflows and be-
ing able to share and discuss their workflows with
interested peers. Consideration has to be given to the protection of their intellectual property so that they do not lose the ability to publish any results in high-quality journals. Any sharing of workflows needs to be manageable and understandable to the individuals.
The requirements for our projects were
analysed as:
Easy to manage and set sharing settings
API to content for integration elsewhere
Secure storage of data including backups
Classifiable and sufficient meta-data set
Choice of license
DOI [35] or URI [36] for persistent references
Comments and feedback for workflows
Example values for valid inputs to workflows
Embedded references to a description of un-
derlying science
In HELIO we decided to use the myExperiment [37] repository for sharing workflows. myExperiment has built-in support for Taverna workflows, such as displaying embedded meta-data; it keeps the different versions of a workflow accessible; and it is itself accessible from within Taverna, which makes the reuse of workflows easy. The HELIO project created a group called 'helio' especially for sharing heliophysics workflows, which now has 26 members and at the time of writing has shared 87 items; most of those are Taverna workflows.
The social elements in the myExperiment environment allow users to restrict the visibility and usability of their intellectual property to people with whom they wish to co-operate, people whom they would like to be able to assess their work, or people whom they allow to use it.
myExperiment provides a REST interface to the content of the repository, which we make use of in the HELIO portal (Section 6).
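As an illustration of that REST interface, the sketch below fetches the XML metadata record of a single workflow with a plain HTTP GET, which is the pattern the portal relies on. The resource path and the workflow identifier are assumptions to be checked against the myExperiment API documentation.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Hedged sketch: fetch the XML description of one workflow from the myExperiment
// REST interface. The resource path and the workflow id are illustrative.
public class MyExperimentFetch {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.myexperiment.org/workflow.xml?id=1234");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);   // XML metadata, including a link to the workflow content
            }
        }
    }
}
```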
5 Remote Execution Service
In order to make good use of workflows as part
of the scientific process, we realized that users
would need to have some way of running those
workflows without having a local installation of
a heavyweight piece of software like the Ta-
verna Workbench. Because of the complexity of
a workflow execution engine, it became rapidly
clear to us that we would not want to have that
installed on end-user systems at all. Furthermore,
by moving it to a special dedicated deployment, it
would also be possible to use it to support other
key use cases, such as placing workflows behind
a (relatively-lightweight) portal (see Section 6).
Another key benefit of this is that it enables a
workflow to run for a long time without having the
user of that workflow connected to the internet for
all that time; when a workflow might potentially
take longer than a working day, this becomes a
significant issue.
Thus we have developed Taverna Server in
HELIO. This is a fully service-oriented interface
to the Taverna [21] workflow execution engine.
5.1 Key Features
Taverna Server is based on Taverna 2.4. It
supports the upload and execution of arbitrary
Taverna 2 workflows, provided they contain no
interactive components (there is no GUI through
which a user might interact with the running
workflow). It also has workflow run introspection
capabilities, so that clients can ask the server what
inputs they should supply and what outputs were
provided without having to understand the inter-
nals of a Taverna workflow. Each workflow run
lasts only a limited amount of time (according to
the principles of resources on a Grid) with this
life-span being user settable; upon the completion
of a workflow run, a notification is published by
the server’s Atom [38] feed (and optionally also
via email or Jabber/XMPP [39], depending on
setup).
The server provides access to a workflow run’s
input and output files. The only practical limit
on file size is the amount of disk space on the
deployment system. Each workflow run is isolated
from all the others on the server, so that inputs to
and results of one run do not affect another.
We support accessing the server via both
RESTful [11] and SOAP [10] APIs, both of which
are implemented as views over an underlying abs-
tract interface (the “workflow run”). This is a
particular benefit, because it means that different
languages can provide interfaces to the server in
the way that is most natural to them: with Java,
it is typically the case that a SOAP interface is
simplest, whereas a scripting language like Ruby
can use the REST interface more easily.
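As a concrete illustration of the REST view, the sketch below creates a new workflow run by POSTing a t2flow document to the server, which answers with the URI of the new run resource in its Location header. The base URL is a placeholder, and the resource path and media type are assumptions to be checked against the Taverna Server documentation.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hedged sketch of creating a workflow run through the REST interface of Taverna
// Server. The base URL is a placeholder; the resource path and media type are
// assumptions to be checked against the server's documentation.
public class CreateRun {
    public static void main(String[] args) throws Exception {
        byte[] workflow = Files.readAllBytes(Paths.get("propagation.t2flow"));

        URL url = new URL("https://example.org/taverna-server/rest/runs");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/vnd.taverna.t2flow+xml");

        try (OutputStream out = conn.getOutputStream()) {
            out.write(workflow);   // upload the workflow definition
        }

        // On success the server replies 201 Created; the new run's URI is in Location.
        System.out.println(conn.getResponseCode() + " " + conn.getHeaderField("Location"));
    }
}
```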
A number of security features are present,
keeping log-in details confidential, allowing con-
trol over who can connect to the workflow server,
the separation of runs by user, and the supply of
credentials to other services from workflow runs
on the server.5

Fig. 4 Summary diagram of the architecture of Taverna Server. The orange boxes are the major components of the architecture, the yellow boxes are selected minor components of the architecture, and the arrows indicate the major interactions between the pieces; green parts are pre-existing infrastructure and frameworks. Note that the main server (top green box) cannot directly access the user file-store; there is no assumption that they are shared.

There is a mechanism whereby a
user can grant another user access to one of their
runs (e.g., to allow them to see some results, or to
fix some problem in the run’s configuration). This
does not include the security credentials though;
those are always carefully hidden.
We also provide a management API via both
JMX [40] and REST, which allows setting many
options and viewing things such as resource ac-
counting.
Currently at an advanced stage of development
(primarily within the BioVeL project [41]) is a sys-
tem for unified handling of interactive workflows
via a web browser, allowing a particular workflow
to ask questions of its users in the same way,
whether that workflow is running in the Taverna
Workbench or in Taverna Server. We anticipate
that this will be exceptionally useful across many
domains of science, where it is frequently only
possible to semi-automate processes; the aim is to
ensure that the expert scientist is kept in the loop
at critical stages while mechanising the processing in-between. We are also working on producing a deployment version of Taverna Server as an Amazon Web Services AMI [42]. Other areas under development, though currently somewhat less advanced, are systems for unified provenance models and discovery of workflow execution state so that users of a workflow can discover how far it has progressed on the server and what exactly it did, just as they can with the Taverna Workbench.

5Multiple formats of security credential are supported, so there is no need for clients to be written in a particular language to gain access to the security. This is distinctly different from the underlying Taverna platform which only supports the Bouncy Castle format key-stores.
5.2 Architecture
The workflow server consists of a web application
(see Fig. 4) that provides SOAP and REST views of an underlying abstract model. That model
consists of an abstract factory, which knows how
to create new workflow runs and list the existing
ones, an abstract run description which provides a
number of properties relating to the workflow run
(e.g., its execution state and the mapping of files to
workflow inputs), and an abstract file system de-
scription that models a particular workflow run’s
working directory (and its subdirectories and their
contents).
The implementation of the abstract model is
done through mapping the abstract workflow runs
to Java proxy objects running in a sub-process.
Those objects are each associated with a particular
directory that is specially created for the workflow
run, and the objects that handle file system access
are careful to ensure that each file accessed can
only be in the working directory or below. Sym-
bolic links are prohibited from being accessed;
the abstract file system model claims they simply don’t exist, thus stopping the potential for information leaks through that mechanism.
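A minimal sketch of the kind of confinement check the file-system delegates perform is shown below: every requested path is resolved against the run's working directory and rejected if it escapes it. The class and method names are illustrative, not the server's actual code.

```java
import java.io.File;
import java.io.IOException;

// Sketch of the confinement rule described above: a file may only be accessed if its
// canonical path lies inside the workflow run's working directory. Canonicalisation
// resolves ".." components and symbolic links, so links pointing outside are refused.
public class RunFileSystemGuard {

    private final File workingDirectory;

    public RunFileSystemGuard(File workingDirectory) throws IOException {
        this.workingDirectory = workingDirectory.getCanonicalFile();
    }

    /** Resolve a client-supplied relative name, refusing anything outside the run directory. */
    public File resolve(String relativeName) throws IOException {
        File candidate = new File(workingDirectory, relativeName).getCanonicalFile();
        String prefix = workingDirectory.getPath() + File.separator;
        if (!candidate.getPath().startsWith(prefix) && !candidate.equals(workingDirectory)) {
            throw new IOException("Access outside the run's working directory is not permitted");
        }
        return candidate;
    }
}
```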
An executing workflow run (the central state of
the workflow run, though not the initial or final
one) corresponds to the presence of a workflow
execution process specifically for that run. An
additional security constraint is that each distinct
user of the system runs his workflows with a
separate user account; this prevents information
leakage from one user to another (the need to
prevent different runs of the same user from see-
ing each other is significantly less). All access to
the workflow run’s file system is done through the
proxy objects; there is no need for a shared file
system between the core server and the worker
processes at all (though the current implementa-
tion of the sub-process creation engine is more
constrained).
5.3 Development
The server is implemented as a Java web ap-
plication that sits on top of the Apache CXF
2.5 web service framework [43] hosted inside
the Spring 3.0 dependency injection framework
[44]. The abstract model described above is im-
plemented as a collection of Spring beans with
JAX-WS [45] and JAX-RS [46] annotations to
describe the mapping of the abstract model into
the service views presented by CXF. Security and
transaction constraints are enforced through the
use of aspect-oriented programming; a particular
workflow run’s model bean is only available if the
currently accessing user has permission to see that
run.
The sub-processes that implement the abstract
model are spawned through the use of the sudo
program [47], which can be configured to allow a
specific process permission to run particular pro-
grams without interaction for a limited set of users
(typically, a Unix user group). By strictly con-
straining what may be run this way and for whom,
it ensures that the potential for damage from
abuse is as limited as practical. The user specific
sub-processes started this way use a strictly reg-
ulated form of JRMP [48] to communicate back
with the main server process.
The presentation of security tokens to a
workflow run implementation is handled spe-
cially. They are written as an encrypted Bouncy
Castle [49] key-store to a user specific directory
that is not part of the working directory hierarchy;
the credentials are not (normally) visible to the
user after they are written to the disk. Further-
more, they are encrypted with a high entropy one-
time password that is never reused for any other
purpose and which is itself never written to disk
at all. This ensures that only authorised processes
running as the correct user can ever extract the
credentials; nothing else can find this information
out. When coupled with the fact that users are
strongly encouraged to give their credentials only
to servers that they trust to act honestly in the
first place, this gives as high a level of assurance
of confidentiality as is reasonably practical, given
that some remote services accessed by a workflow
might require a password to be used in the first
place. We also support the HELIO identity token
system6 [50, 51], which allows a cryptographic
token to be obtained by the portal and then passed
through to appropriately enabled services without
explicit use of the credential management part of
the workflow server API.
6 Integration into the HELIO Portal
The HELIO portal is a web application that pro-
vides integrated access to the HELIO web ser-
vices. Access to various HELIO services is imple-
mented in a generic and unified way. This has the
advantage that new services can be easily added
to the portal. This is particularly convenient for
adding new Taverna workflows to the system.
The HELIO portal is centred around data and
the tasks performed on it. Data may originate
either from catalogues within the system, from a Taverna workflow or from an uploaded VOTable.

6This is a deployment option that is not normally enabled, as the current implementation of the workflow engine itself is sufficiently constrained that it is necessary to use a special workflow to make use of this feature.
Data objects are presented as primary entities of
the user interface in a preferred location such
that users can get a good overview of previously
obtained data. Tasks may then be applied to the
data objects in order to transform them to new
data objects.
This data-centric view distinguishes the HELIO
portal from normal web applications as well as
traditional scientific systems. Normal web appli-
cations are workflow oriented in the sense that
a predefined flow of web pages guides the user through the process of a data analysis task.
Traditional scientific systems such as a command
line based data analysis tool are function oriented.
Their most important entities are groups of ba-
sic functions that are joined together to fulfil a
specific task. Compared to a data-centric system a
user requires more technical knowledge about the
input and output parameters of these functions.
6.1 User Interaction Pattern
The way of using data and tasks in the HELIO
portal follows a generic user interaction pattern.
1. Select a task to be executed. Tasks are
presented in natural language like: “Get ob-
servations for a given time range”, “See what
instruments covered this period”, etc. This
task-oriented approach helps novice users to perform common tasks without having deep knowledge of the detailed science, while
not preventing more advanced users from
working with the data.
2. Gather the input parameters required for the
selected task. The input parameters may be
entered manually or they may be reused from
a previously executed task such as a Taverna
workflow. Generally, tasks have sensible de-
faults for most input parameters.
The HELIO portal provides customised dia-
logues for different kinds of input para-
meters. Commonly used input parameters
such as date ranges or instruments are en-
tered through a dedicated dialogue. For other
types of input parameters HELIO provides a
configurable, generic dialogue. The latter is
used for workflow specific parameters.
3. Execute the task on the HELIO infrastructure.
Most tasks are connected directly to one or
several HELIO web services. Currently, only one task can be run at a time. This is fine for
HELIO as most of the tasks terminate within
seconds, but it might have to be changed in
future versions of the portal.
4. Visualise the result of the task. Depending on
the data type of the result, different tools are
used for visualisation. It is even possible to
have different visualisations for a data type;
e.g., a table with time series can be repre-
sented as a plain HTML table or as a time line
plot.
5. Extract new input parameters from the result.
In many cases a user can extract new input
parameters from a result. These parameters
can be used as input for a succeeding task or
to refine the current task. In a timeline plot
a user can select a date range of interest. In
a table of instruments the user can look for
instruments with similar capabilities.
6. Continue at step 1. The process can be
repeated until the original question is
sufficiently answered.
In order to support sharing parameters be-
tween multiple tasks the portal introduces the
concept of a data cart. The data cart is a dedicated
area in the web interface to persist and manage
collections of parameter values. Parameters ex-
tracted from a task result will be stored in the data
cart. Using the mouse they can be dragged from there and dropped onto the input area of another task.
The data cart is inspired by shopping carts
known from web shops. It accentuates the impor-
tance of data in scientific applications and thus
reflects the astronomer’s way of thinking in terms
of data rather than functions. Analysis of data is
the main interest of a scientist. How to get to the
data, e.g. which function to run, is less interesting.
The portal interaction pattern and the data cart
are the core concepts that drove the design of the
HELIO user interface. Figure 5 shows the portal
with the task menu at the top, the data cart right
below and the parameter input area for a selected
task in the bottom part.

Fig. 5 Screenshot of the HELIO portal interface, showing an example data cart and the parameter input area.

Figure 6 shows part of the
result of a propagation model task. The buttons
above the result table allow the user to extract
input parameters from the table.
6.2 Workflow Integration
Taverna workflows are integrated into the HE-
LIO portal by presenting them as tasks. They
are executed on a Taverna Server (see Section 5)
which is accessed through its SOAP interface. The
actual workflow must be registered in the myEx-
periment repository (see Section 4). In this way
the portal can always fetch the latest version of
a workflow, assuming that this is the most stable
one.
By presenting Taverna workflows as tasks they
are treated by the portal like other HELIO ser-
vices such as the HELIO event catalogue (HEC),
the context service (CXS), the HELIO process-
ing service (HPS) or the data evaluation service
(DES) (see Section 2.2). Thanks to the data cart, parameters of a workflow can be shared with other tasks, which opens up a range of new research possibilities.
The data- and task-centric approach of the portal stands in contrast to the Taverna workbench, which follows a workflow-centric route. The Taverna workbench offers a generic input UI for any workflow; all input parameters are rendered as simple text boxes with a descriptive text. The HELIO portal, in contrast, must know the exact data type of each parameter in order to reuse it with other tasks.
Fig. 6 HELIO portal showing the result of a CIR backwards propagation model
The situation gets even more complex for output parameters, where the HELIO portal needs to know how to present them to the user, while the Taverna workbench provides the user with a choice of generic viewers (e.g., text viewer, image viewer, XML viewer).
Taverna workbench is ideally suited for the
development and testing of workflows, while the
HELIO portal offers easy access to a selection of
common Taverna workflows.
Integration of new workflows in the portal is a
manual process and requires some development
and configuration work. Presently, there is no
way to add new workflows automatically to the
HELIO portal. The main reason is that the de-
scription format used by the HELIO workflows,
the T2FLOW format [52], is slightly too lim-
ited to support the type of metadata that would
be required for automated UI construction. The
succeeding format, SCUFL2, supports sufficiently rich annotations, but was not available in time for the development of HELIO.
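To make the missing metadata concrete, the hypothetical descriptor below lists the minimum a portal would need to know about each workflow input port in order to build an input dialogue automatically; the field names are illustrative and are not part of T2FLOW or SCUFL2.

```java
/**
 * Hypothetical example of per-port metadata needed for automated UI
 * construction; none of these field names come from T2FLOW or SCUFL2.
 */
class PortDescriptor {
    String name;          // workflow port name, e.g. "startTime"
    String dataType;      // portal data type used for data-cart matching, e.g. "timeRange"
    String label;         // human-readable label shown in the input dialogue
    String helpText;      // browser-friendly help text, possibly with images
    String validator;     // reference to an input validator, e.g. an ISO 8601 date check
    String defaultValue;  // sensible default, if any
}
```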
6.3 Implementation
The HELIO portal is implemented as a Rich
Internet Application (RIA) [53]. A RIA mimics many of the capabilities that would be expected of a desktop application. A large part of a RIA is implemented in HTML, CSS and JavaScript [54] and runs in a web browser; the back end is a conventional web server. The connection between the client and the server is made through asynchronous server calls based on AJAX technologies [55].
The static component diagram of the HELIO portal (Fig. 7) shows three core layers. A fourth layer at the bottom serves as a placeholder for the HELIO web services; it contains all four categories, but only a selection of the web services described in Section 2.2.
The access layer is written in Java and abstracts access to the HELIO web services by offering a small set of generic interfaces. Every member within a category can be called in the same way. This is particularly convenient for infrastructure and processing services, as not all of the underlying services offer the same web service interface.
The access classes are implemented as Java-
Beans. The properties of the JavaBeans are
mapped to the input parameters of the underlying
web services. At runtime, clients can use bean
introspection to find out about these properties.
After setting appropriate values the client calls
an execute() method which delegates to the
web service. The actual call is executed in a back-
ground thread. The execute() method returns
an object to poll the execution status and to retrieve the result and corresponding log messages once they are available.

Fig. 7 Component diagram of the HELIO portal architecture. The highlighted components are from outside of the HELIO project
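As an illustration of this access-layer pattern, the following sketch shows a bean with properties mapped to service inputs and an execute() method returning a handle for polling; apart from execute(), which the text names, all class and method names are assumptions.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Illustrative sketch of an access-layer bean; names other than execute()
 * are hypothetical and do not reflect the actual HELIO classes.
 */
public class EventCatalogueAccessBean {

    // JavaBean properties mapped to input parameters of the underlying web service.
    private String startTime;
    private String endTime;

    public String getStartTime() { return startTime; }
    public void setStartTime(String startTime) { this.startTime = startTime; }
    public String getEndTime() { return endTime; }
    public void setEndTime(String endTime) { this.endTime = endTime; }

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Delegates to the web service in a background thread and returns a handle to poll. */
    public ExecutionHandle execute() {
        Future<String> future = executor.submit(() -> callWebService(startTime, endTime));
        return new ExecutionHandle(future);
    }

    // Placeholder for the actual call to the HELIO web service.
    private String callWebService(String from, String to) {
        return "result for " + from + " .. " + to;
    }

    /** Handle used by clients to poll the execution status and retrieve the result. */
    public static class ExecutionHandle {
        private final Future<String> future;

        ExecutionHandle(Future<String> future) { this.future = future; }

        public boolean isDone() { return future.isDone(); }

        public String getResult() throws ExecutionException, InterruptedException {
            return future.get();
        }
    }
}
```

A client would set startTime and endTime, call execute(), and poll the returned handle until isDone() reports completion before retrieving the result.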
The web server part of the HELIO portal is im-
plemented in Grails. The Grails web framework is
written in the Groovy programming language and
sits on top of the Spring web MVC framework.
The web server layer consists of several
components:
1. The domain model is a Groovy object model
holding the data required within a browser
session. It is connected to a database in order
to ensure that the data persists between mul-
tiple requests.
2. The processing, data and metadata access controllers inspect the corresponding JavaBeans of the access layer and populate the domain model (see the introspection sketch after this list).
3. The configuration mashup component queries the HELIO registry service and the myExperiment registry and integrates their content into a portal-specific configuration format. Additionally, UI-specific metadata, which are currently hard-coded, are woven into the configuration; these include input validators, help texts in a browser-friendly format (e.g., with images) and layout directives.
4. The dialog, task and result views render the corresponding parts of the user interface.
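A minimal sketch of the introspection step performed by the access controllers, using the standard java.beans API, might look as follows; the class name is hypothetical.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of how an access controller can discover bean properties at runtime. */
public class BeanInspector {

    /** Returns the writable properties of an access bean, keyed by name. */
    public static Map<String, Class<?>> inputParameters(Object accessBean)
            throws IntrospectionException {
        Map<String, Class<?>> parameters = new LinkedHashMap<>();
        BeanInfo info = Introspector.getBeanInfo(accessBean.getClass(), Object.class);
        for (PropertyDescriptor property : info.getPropertyDescriptors()) {
            if (property.getWriteMethod() != null) {   // only settable input parameters
                parameters.put(property.getName(), property.getPropertyType());
            }
        }
        return parameters;
    }
}
```

Applied to the access bean sketched earlier, this would yield the startTime and endTime properties together with their types, which is enough to populate the domain model and drive a generic input dialogue.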
The browser layer of the HELIO portal implements the actual user interface components. It is based on JavaScript and uses the jQuery [56] library for the core AJAX functionality, as well as a couple of jQuery plugins for advanced UI widgets.
The JavaScript code is organised in six core modules: main, data model, input dialog, task, result and data cart. The main module handles the overall integration; the other modules implement the behaviour of the corresponding UI widgets and connect them to the server.
In order to add a new Taverna workflow, the system has to be modified in three areas. First, a processing access bean for the new workflow has to be created. While this is currently done manually, it is planned to use dynamic JavaBeans in the future; the properties of a DynaBean (see http://commons.apache.org/beanutils/) [57] can be created at runtime based on the configured input parameters of a workflow (see the sketch after this list). Second, the configuration mashup component has to be configured with the UI-specific details of the workflow. Third, the workflow has to be registered in the task menu.
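A possible use of DynaBeans for this step, assuming the workflow's input parameters have already been read from its configuration, is sketched below; the workflow name, parameter names and values are illustrative, and this is not the portal's actual implementation.

```java
import org.apache.commons.beanutils.BasicDynaClass;
import org.apache.commons.beanutils.DynaBean;
import org.apache.commons.beanutils.DynaProperty;

/**
 * Sketch of building a dynamic access bean from a workflow's configured input
 * parameters using Apache Commons BeanUtils; names and values are illustrative.
 */
public class DynamicWorkflowBean {

    public static DynaBean forWorkflow() throws Exception {
        // Properties derived from the workflow's configured input parameters.
        DynaProperty[] properties = {
            new DynaProperty("startTime", String.class),
            new DynaProperty("endTime", String.class)
        };
        BasicDynaClass workflowClass =
            new BasicDynaClass("cirPropagationWorkflow", null, properties);

        DynaBean bean = workflowClass.newInstance();
        bean.set("startTime", "2003-10-28T00:00:00");
        bean.set("endTime", "2003-10-29T00:00:00");
        return bean;
    }
}
```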
The HELIO portal automatically creates the required parameter input dialog. If the result format is supported, the HELIO portal will also present the result to the user; otherwise the portal offers a download link so that the data can be viewed in a separate application.
7 Conclusion
The service-oriented architecture of HELIO enables us to pursue workflows as the means to enrich the heliophysics community with new and more complex functionality, built from the basic building blocks of general heliophysics services.
We introduced workflow building to scientists and
enabled them to share and discuss their scientific
work as workflows within the social space of the
myExperiment repository. Scientists who do not
write their own workflows can benefit from the advanced functionality they provide by executing them remotely on the Taverna Server, without needing any additional software on their computers. The availability of web service APIs to both
myExperiment and Taverna Server enables us to
integrate the most useful workflows for every user
into the HELIO VxO portal.
We have started the process of changing the
working practices of scientists, but this change
takes time, support from workflow experts, the
ability to integrate existing scripts and applica-
tions easily, and the availability of training. The
interest of scientists in the workflow-writing tutorial was considerable and extended beyond the partners in the HELIO project, indicating that further training needs to be offered at suitable times and, where possible, in association with other
scientific events. The integration of the HELIO registry into the Taverna Workbench will be a
key development going forward, as it will enhance
the accessibility of Taverna for heliophysics users
by making the registered services automatically
available to workflow developers.
The development of Taverna Server is pro-
ving to be an important outcome of the HELIO
project, as it has attracted much attention from scientists and developers across many disciplines. We provide a solid design that enables the safe execution of workflows. The integration of Taverna Server into the myGrid team's development effort will ensure that its further development stays aligned with new releases of the Taverna Workbench.
We found a way of making the integration of
additional workflows in the HELIO portal an easy
(though not automated) process. That makes it
possible, as scientists develop new workflows, to provide new functionality to portal users simply by editing some configuration files. There is scope
for some automation of this process, but open is-
sues remain in the area of authorization. Because
workflows will be used by unauthenticated users
and run on the Taverna Server, it is necessary to
vet the workflows before making them available
through the portal. This is likely to increase the
administrative overhead of managing the portal.
A key part of this will be ensuring that the meta-
data quality of any workflow to be published
through the portal is of a sufficient standard to support the automatic generation of UI-specific input and output components.
Acknowledgements We want to thank the service
providers of HELIO: Observatoire de Paris, University
College London (MSSL), Trinity College Dublin, IRAP
Toulouse, Istituto Nazionale di Astrofisica (Obs. Trieste),
The University of Manchester, Science and Technology
Facilities Council (RAL) and all the HELIO Consortium.
This work was funded by the European Commission as
part of the Seventh Framework Programme, Project No.
238969.
References
1. Szalay, A., Gray, J.: The world-wide telescope. Science
293(5537), 2037–2038 (2001)
2. Hatziminaoglou, E.: Virtual observatory: science ca-
pabilities and scientific results. In: Tsinganos, K.,
Hatzidimitriou, D., Matsakos, T. (eds.) 9th Interna-
tional Conference of the Hellenic Astronomical Soci-
ety. Astronomical Society of the Pacific Conference
Series, vol. 424, pp. 411 (2010)
3. Tedds, J.A.: Science with the virtual observatory:
the AstroGrid VO desktop. ArXiv:0906.1535 e-prints
(2009)
4. Dalla, S., Walton, N.A.: AstroGrid: the UK's virtual observatory and its solar physics capabilities. In: Walsh,
R.W., Ireland, J., Danesy, D., Fleck, B. (eds.) SOHO
15 Coronal Heating. ESA Special Publication, vol. 575,
p. 577 (2004)
5. Bentley, R., Csillaghy, A., Aboudarham, J., Jacquey,
C., Hapgood, M.A., Bocchialini, K., Messerotti, M.,
Brooke, J., Gallagher, P., Fox, P., et al.: HELIO: the
heliophysics integrated observatory. Adv. Space Res.
47(12), 2235–2239 (2011)
6. Martínez, A.P., Derriere, S., Gray, N., Mann, R., McDowell, J., McGlynn, T., Ochsenbein, F., Osuna, P., Rixon, G., Williams, R.: The UCD1+ controlled vocabulary. IVOA Semantics WG Recommendation
(2005)
7. Ochsenbein, F., Williams, R., Davenhall, C., Durand,
D., Fernique, P., Hanisch, R., Giaretta, D., McGlynn,
T., Szalay, A., Wicenec, A.: VOTable: tabular data for
the virtual observatory. In: Quinn, P., Górski, K. (eds.)
Toward an International Virtual Observatory. ESO
Astrophysics Symposia, vol. 30, pp. 118–123. Springer,
Berlin / Heidelberg (2004). doi:10.1007/10857598_18
8. Stern, B.A.: Interactive data language. In: Proceed-
ings of SPACE 2000: The Seventh International Con-
ference and Exposition on Engineering, Construction,
Operations and Business in Space, p. 1011. American
Society of Civil Engineers, 1801 Alexander Bell Drive,
Reston, VA, 20191-4400, USA (2000)
9. Bentley, R., CASSIS team: Coordination action for the
integration of solar system infrastructures and science.
Project web page. http://cassis-vo.eu/ (2010). Cited 13
April 2012
10. Box, D., Ehnebuske, D., Kakivaya, G., Layman, A.,
Mendelsohn, N., Nielsen, H.F., Thatte, S., Winer,
D.: Simple object access protocol (SOAP) 1.1. http://
www.w3.org/TR/2000/NOTE-SOAP-20000508/ (2000)
11. Fielding, R.: Representational state transfer: an ar-
chitectural style for distributed hypermedia interac-
tion. PhD Thesis, University of California, Irvine (2000)
12. Bose, P., Hurlburt, N., Somani, A., Fox, P.: Collab-
orative virtual sensorweb infrastructure: architecture
and implementation. Online at http://www.ivoa.net/
internal/IVOA/InterOpOct2011GWS/IVOA-Scientific-
Workflows.pdf. In NASA Science Technology
Conference (2007)
13. Unknown: IVOA table access protocol parameterized
query language. Online at http://www.ivoa.net/internal/
IVOA/TableAccess/PQL-0.2-20090520.pdf (2009)
14. Melton, J., Simon, A.R.: Understanding the New SQL:
A Complete Guide. The Morgan Kaufmann Series in
Data Management Systems. Morgan Kaufmann Pub-
lishers (1993)
15. Andrews, T., Curbera, F., Dholakia, H., Goland,
Y., Klein, J., Leymann, F., Liu, K., Roller, D.,
Smith, D., Thatte, S., Trickovic, I., Weerawarana, S.:
BPEL4WS, Business process execution language
for web services version 1.1. IBM. http://download.
boulder.ibm.com/ibmdl/pub/software/dw/specs/ws-bpel
/ws-bpel.pdf (2003)
16. Deelman, E., Blythe, J., Gil, Y., Kesselman, C., Mehta,
G., Patil, S., Su, M.H., Vahi, K., Livny, M.: Pegasus:
mapping scientific workflows onto the Grid. In: Grid
Computing, pp. 131–140. Springer (2004)
17. Breuer, D., Erwin, D., Mallmann, D., Menday, R.,
Romberg, M., Sander, V., Schuller, B., Wieder, P.: Sci-
entific computing with UNICORE. In: NIC Sympo-
sium, vol. 20, pp. 429–440 (2004)
18. Berthold, M.R., Cebron, N., Dill, F., Gabriel, T.R.,
Kötter, T., Meinl, T., Ohl, P., Sieb, C., Thiel, K.,
Wiswedel, B.: KNIME: the Konstanz information
miner. In: Data Analysis, Machine Learning and Ap-
plications, pp. 319–326 (2008)
19. Goecks, J., Nekrutenko, A., Taylor, J., Team, T.G.:
Galaxy: a comprehensive approach for supporting ac-
cessible, reproducible, and transparent computational
research in the life sciences. Genome Biol. 11(8), R86
(2010)
20. Altintas, I., Berkley, C., Jaeger, E., Jones, M.,
Ludascher, B., Mock, S.: Kepler: an extensible sys-
tem for design and execution of scientific workflows.
In: Proceedings. 16th International Conference on Sci-
entific and Statistical Database Management, 2004,
pp. 423–424 (2004)
21. Hull, D., Wolstencroft, K., Stevens, R., Goble, C.,
Pocock, M.R., Li, P., Oinn, T.: Taverna: a tool for build-
ing and running workflows of services. Nucleic Acids
Res. 34(suppl 2), W729 (2006)
22. Peterson, J.L.: Petri Net Theory and the Modeling of Systems. Prentice-Hall, Inc., Englewood Cliffs, NJ (1981)
23. Belhajjame, K., Corcho, O., Garijo, D., Zhao, J.,
Missier, P., Newman, D.R., Palma, R., Bechhofer,
S., Garcia Cuesta, E., Gomez-Perez, J.M., Klyne, G.,
Page, K., Roos, M., Ruiz, J.E., Soiland-Reyes, S.,
Verdes-Montenegro, L., De Roure, D., Goble, C.:
Workflow-centric research objects: a first class citi-
zen in the scholarly discourse. In: Proc. Workshop
on the Semantic Publishing (SePublica), pp. 1–12
(2012)
24. Schaaff, A., Le Petit, F., Prugniel, P., Slezak, E.,
Surace, C.: Workflow working group in the frame
of ASOV. Online at http://www.france-ov.org/twiki/pub/
GROUPEStravail/Workflow/schaaff.pdf (2006)
25. Ohishi, M.: International virtual observatory alliance.
Highlights Astron. 14, 528–529 (2006)
26. Schaaff, A., Ruiz, J.E., et al.: Scientific workflows in
the VO. Online at http://www.ivoa.net/internal/IVOA/
InterOpOct2011GWS/IVOA-Scientific-Workflows.pdf
(2011)
27. Freeland, S.L., Handy, B.N.: Data analysis with the
solarsoft system. Sol. Phys. 182, 497–500 (1998).
doi:10.1023/A:1005038224881
28. Hightower, R.: BeanShell & DynamicJava: Java script-
ing with Java. JAVA developer’s journal. Online at
http://java.sys-con.com/node/36439 (2000). Retrieved
April 2012
29. Le Blanc, A.: Available instruments through DPAS
which are not part of ICS instruments table. In: myEx-
periment Repository. http://www.myexperiment.org/
workflows/2829.html (2012)
30. Le Blanc, A.: Check in UOC which instruments were
observing at a given time period and place. In: myEx-
periment Repository. http://www.myexperiment.org/
workflows/2822.html (2012)
31. Pérez-Suárez, D., Maloney, S.A., Higgins, P.A.,
Bloomfield, D.S., Gallagher, P.T., Pierantoni, G.,
Bonnin, X., Cecconi, B., Alberti, V., Bocchialini, K.,
Dierckxsens, M., Opitz, A., Blanc, A., Aboudarham,
J., Bentley, R.B., Brooke, J., Coghlan, B., Csillaghy,
A., Jacquey, C., Lavraud, B., Messerotti, M.: Study-
ing SunPlanet connections using the heliophysics in-
tegrated observatory (HELIO). Sol. Phys. 280(2),
603–621 (2012)
32. Le Blanc, A.: Co-rotating interaction regions backwards propagation. In: myExperiment Repository. http://
www.myexperiment.org/workflows/2817.html (2012)
33. Parker, E.N.: Dynamics of the interplanetary gas and
magnetic fields. Ap. J. 128, 664 (1958)
34. Le Blanc, A., Miteva, R.: Associate SEP events at Earth with flare, CME and radio events on the Sun. In:
myExperiment Repository. http://www.myexperiment.
org/workflows/2815.html (2012)
35. Paskin, N.: Digital Object Identifier (DOI®), chapter
114, pp. 1–12. Taylor & Francis (2011)
36. Berners-Lee, T., Fielding, R., Masinter, L.: Uniform
resource identifiers (URI): generic syntax. In: Obso-
leted by RFC 3986, updated by RFC 2732. Internet
Engineering Task Force, IETF, vol. 2396. http://www.
ietf.org/rfc/rfc2396.txt (1998)
37. Goble, C.A., De Roure, D.C.: myExperiment: social
networking for workflow-using e-scientists. In: Pro-
ceedings of the 2nd Workshop on Workflows in Sup-
port of Large-Scale Science, pp. 1–2. ACM (2007)
38. Gregorio, J., de hOra, B.: RFC: 5023 the atom publish-
ing protocol. In: IETF Requests For Comments (2007)
39. Saint-Andre, P.: RFC 6120: extensible messaging and
presence protocol (XMPP): core. In: IETF Requests
For Comments (2004)
40. Perry, J.S., Denn, R.: Java Management Extensions.
O’Reilly & Associates, Inc. (2002)
41. Vicario, S., Hardisty, A., Haitas, N.: BioVeL: biodiversity virtual e-laboratory. EMBnet.journal 17(2), 5
(2011)
42. Murty, J.: Programming Amazon Web Services: S3,
EC2, SQS, FPS, and SimpleDB. O’Reilly Media, Incor-
porated (2008)
43. Balani, N., Hathi, R.: Apache CXF Web Service De-
velopment. Packt Publishing (2009)
44. Johnson, R., Hoeller, J., Arendsen, A., Risberg, T.,
Kopylenko, D.: Professional Java Development with
the Spring Framework. Wrox Press Ltd. (2005)
45. Chinnici, R., Hadley, M.: JSR 224: Java API for XML-
Based Web Services (JAX-WS) 2.0. Java Community
Process (2006)
46. Hadley, M., Sandoz, P.: JSR 311: JAX-RS: Java API for
RESTful Web Services (version 1.1). Java Community
Process (2009)
47. Napier, R.A.: Secure automation: achieving least priv-
ilege with SSH, Sudo and Setuid. In: 18th Large
Installation System Administration Conference, pp. 203–
212 (2004)
48. Maassen, J., van Nieuwpoort, R., Veldema, R., Bal,
H.E., Plaat, A.: An efficient implementation of Java’s
remote method invocation. SIGPLAN Not. 34(8), 173–
182 (1999)
49. The Legion of the Bouncy Castle: Bouncy Cas-
tle Crypto APIs for Java. Online at http://www.
bouncycastle.org/java.html (2007–2012)
50. Pierantoni, G., Kenny, E., Coghlan, B.: The architecture of HELIO. In: Bubak, M., Turala, M., Wiatr, K. (eds.) CGW'10 Proceedings of the Krakow Grid Workshop, pp. 84–91. ACC CYFRONET AGH (2011)
51. Pierantoni, G., Kenny, E., Coghlan, B.: The use of standards in HELIO. Comp. Sci. 13(2), 93–102 (2012)
52. Sroka, J., Hidders, J., Missier, P., Goble, C.: A
formal semantics for the Taverna 2 workflow
model. J. Comput. Syst. Sci. 76(6), 490–508
(2010)
53. Driver, M., Valdes, R., Phifer, G.: Rich Internet Ap-
plications are the Next Evolution of the Web. Gartner
Research (2005)
54. Flanagan, D.: JavaScript: The Definitive Guide.
O’Reilly (1998)
55. Garrett, J.J.: Ajax: a new approach to web applications.
Blog posting, available online at http://www.adaptive
path.com/ideas/ajax-new-approach-web-applications
(2005). Downloaded June 2011
56. Bibeault, B., Katz, Y.: jQuery in Action. Manning Pub-
lications Co. (2008)
57. Spielman, S.: The Struts Framework: Practical Guide
for Java Programmers. Morgan Kaufmann Pub
(2002)