J Grid Computing (2013) 11:481–503
DOI 10.1007/s10723-013-9256-5
Workflows for Heliophysics
Anja Le Blanc · John Brooke · Donal Fellows · Marco Soldati · David Pérez-Suárez · Alessandro Marassi · Andrej Santin
Received: 14 May 2012 / Accepted: 16 April 2013 / Published online: 8 May 2013
© Springer Science+Business Media Dordrecht 2013
Abstract In this paper we describe how we have introduced workflows into the working practices of a community for whom the concept of workflows is very new, namely the heliophysics community. Heliophysics is a branch of astrophysics which studies the Sun and the interactions between the Sun and the planets, by tracking solar events as they travel throughout the Solar system. Heliophysics produces two major challenges for workflow technology. Firstly it is a systems science where research is currently developed by many different communities who need reliable data models and metadata to be able to work together. Thus it has major challenges in the semantics of workflows. Secondly, the problem of time is critical in heliophysics; the workflows must take account of the propagation of events outwards from the Sun. They have to address the four dimensional nature of space and time in terms of the indexing of data. We discuss how we have built an environment for heliophysics workflows building on and extending the Taverna workflow system and utilising the myExperiment site for sharing workflows. We also describe how we have integrated the workflows into the existing practices of the communities involved in heliophysics by developing a web portal which can hide the technical details from the users, who can concentrate on the data from their scientific point of view rather than on the methods used to integrate and process the data. This work has been developed in the EU Framework 7 project HELIO, and is being disseminated to the worldwide heliophysics community, since heliophysics requires integration of effort on a global scale.

A. Le Blanc (B) · J. Brooke · D. Fellows
University of Manchester, Oxford Road, Manchester M13 9PL, UK
e-mail: anja.leblanc@manchester.ac.uk

J. Brooke
e-mail: john.brooke@manchester.ac.uk

D. Fellows
e-mail: donal.k.fellows@manchester.ac.uk

M. Soldati
Fachhochschule Nordwestschweiz, Institute of 4D Technologies, Steinackerstrasse 5, 5210 Windisch, Switzerland
e-mail: marco.soldati@fhnw.ch

D. Pérez-Suárez
Trinity College Dublin, College Green, Dublin 2, Ireland
e-mail: d.perezsuarez@tcd.ie

A. Marassi · A. Santin
INAF-Astronomical Observatory of Trieste, Loc. Basovizza n. 302, 34012 Trieste, Italy

A. Marassi
e-mail: marassi@oats.inaf.it

A. Santin
e-mail: asantin@oats.inaf.it

Present Address:
D. Pérez-Suárez
Finnish Meteorological Institute, Erik Palménin aukio 1, 00560 Helsinki, Finland
Keywords Workflow · Taverna · myExperiment · Taverna Server · Portal integration · Heliophysics
1 Introduction
Heliophysics is a discipline within astrophysics
which studies the Sun, Heliosphere, and Planetary
Environments as a single connected system. Over the years the astronomy and heliophysics domains have created Virtual Observatories (VxO)1 [1] to provide access to data and software tools. VxOs provide single points of access to datasets scattered over the whole planet. Their main purpose is to enable support for new ways of doing research [2–4]. Many VxOs are based on a collection of individual services which are accessible,
for example, through a graphical portal applica-
tion. VxOs benefit from workflows as they allow
users to combine several services to do more com-
plex tasks and to automate repeated executions
of common procedures. Specific workflows could
then be deployed as standalone services, therefore
becoming an integrated part of a VxO.
In this paper we describe our work in the
Heliophysics Integrated Observatory (HELIO)
project2 [5], which created a VxO for heliophysics, in which workflows play a central role. The use of workflows in the heliophysics domain is relatively new. We describe our experiences and how workflows are integrated into the different areas of the infrastructure. This includes allowing access to Grid resources, since some of the tasks in a heliophysics workflow require large amounts of processing power (e.g., for image analysis). For the same reason, we also require that the workflows can be run on the Grid without any need for users to remain connected via a user interface. However, the users still need to be able to monitor and retrieve the results from an interface with which they are familiar.

1The astronomy and solar physics communities refer to virtual observatories as “VOs”, but this acronym is not used in this paper to avoid confusion with virtual organisations.

2HELIO project website: http://helio-vo.eu.
A particular feature of the work presented in
this paper will be the description of our efforts
to embed the use of workflows into the working
practices of the heliophysics community. We have
had to cater for two broad classes of users. Firstly
there are heliophysicists who are confident with
novel methods of computation, who want to have
access to “power tools” for building workflows,
to be used directly by themselves and also by
others in their community. Secondly, there is
a much larger category of heliophysicists who
want a data-centric interface for working, where
the workflows are hidden from direct view, but
are presented as services with the functionality
to perform data analysis or transformation. The
workflows are presented from the HELIO portal3
(the VxO portal) as “virtual services”. This allows
the primary user interface to remain as simple as
possible, but allows the tools that it accesses to be
customised for particular methods of working with
data as these are developed by the scientists.
The following section will familiarise the reader
with some background information specific to
the domain of heliophysics; it discusses data and
web services which are the fundamentals of any
workflow. Section 3 describes the experience of introducing workflows to the community and classifies workflows. Section 4 deals with the sharing
of created workflows in the community. Section 5
describes the development of the Taverna Server
which is required for the remote execution of
workflows and is the basis of the integration of
workflow execution in the HELIO portal which is
the topic of Section 6. We summarise and discuss
our achievements in Section 7.
3The HELIO portal is accessible at http://hfe.helio-vo.eu.
2 Background
2.1 Data in the Heliophysics Domain
Remote observations are the main data source
in astronomy. These are made by different tele-
scopes (either ground- or space-based) in different
wavelengths. The fact that the movements of the stars and extra-galactic objects in our sky are relatively small makes it easy to classify the observations in databases and catalogues, i.e., each
object has a unique coordinate in a 2D space—our
celestial dome.
In heliophysics this situation is considerably
more complicated. Planets are orbiting around
the Sun, making their location time dependent.
Also solar features are time dependent, as their position depends on the rotation of the Sun (approximately once every 25 days at the solar equator). Moreover, events on the Sun pro-
duce disturbances through their propagation in
the solar system. This makes heliophysics a 4-
dimensional domain; time is critical in HELIO
workflows and searches. Also, since observations
are made inside the heliosphere, we have to in-
tegrate data indexed by a variety of different co-
ordinate systems (e.g., many instruments use an
instrument-centric coordinate system).
Observations in heliophysics are not only
sensed remotely, but there are also in-situ mea-
surements. While remotely sensed observations
are recorded within minutes after an event occurs, in-situ measurements of the same event are detected from several hours to days after it originates, as different particles propagate at different velocities. This makes the association of
related observations much more difficult.
Historically heliophysics developed out of
a number of diverse/independent disciplines
(e.g., solar physics, planetary physics, space
weather,...). This led to big differences in
how these disciplines handled observations (i.e.,
different file formats, analysis software tools, and
data archives). Most of the disciplines have cre-
ated their own VxOs to keep their data easily
accessible, yet they are disconnected. Thus, it is
complicated for a scientist to perceive the big
picture of an event, as the relevant data needs to
be collected and combined from multiple places.
Besides observations, there are also data pro-
ducts (e.g., features being detected manually or
automatically on the Sun). Some are included
as catalogues on websites, others are offered as
“event lists” in scientific papers. As with data, each of these data products follows different principles (e.g., naming conventions, date formats, units).
HELIO collects some of these different data pro-
ducts under the umbrella of services according to
their principal subject. There is a certain amount
of integration work necessary to make these in-
dividual data products fit the overall model. The
typical ingestion process is in principle structured
as follows:
1. data source (catalogue) identification and se-
lection;
2. original data parsing and conversion into a
database compatible intermediate ASCII file;
3. table design and creation following the meta-
data constraints;
4. data ingestion from the ASCII file;
5. UCD/Utype compilation for the catalogue
VOTable.
Step number 3 makes sure that contents which are described differently in different data sources but are semantically equivalent end up under the same table field name (e.g., begin time, start time and start of the event are all defined as time_start).
Step number 5 assigns meta-data tags to each
database field. UCD (Unified Content Descriptors) [6] is a controlled vocabulary used to classify content semantically and to let heterogeneous datasets interoperate; UType is a string which refer-
ences entries in an external data model (in our
case the common overarching HELIO data model
integrating all the different service data models).
This helps users to link data delivered from one
service to data belonging to any other service.
VOTable is the standard XML output format used
to exchange astronomy data [7].
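To make steps 3 and 5 of the ingestion process concrete, the sketch below maps source-specific column names onto a canonical name and attaches a UCD to it. The mapping entries and UCD strings are illustrative only; the actual HELIO data model defines its own vocabulary.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of steps 3 and 5 of the ingestion process: map source-specific
// column names onto canonical HELIO names and attach a UCD to each of them.
// The concrete names and UCD strings below are illustrative, not the project's
// actual vocabulary.
public class ColumnHarmoniser {

    // Source column name -> canonical HELIO column name (step 3).
    private static final Map<String, String> CANONICAL = new HashMap<>();
    // Canonical column name -> UCD tag (step 5).
    private static final Map<String, String> UCD = new HashMap<>();

    static {
        CANONICAL.put("begin time", "time_start");
        CANONICAL.put("start time", "time_start");
        CANONICAL.put("start of the event", "time_start");
        CANONICAL.put("end time", "time_end");

        UCD.put("time_start", "time.start");
        UCD.put("time_end", "time.end");
    }

    /** Return the canonical column name, or the original name if it is unknown. */
    public static String canonicalName(String sourceName) {
        String key = sourceName.trim().toLowerCase();
        return CANONICAL.getOrDefault(key, sourceName);
    }

    /** Return the UCD for a canonical column, or null if none is defined. */
    public static String ucdFor(String canonicalName) {
        return UCD.get(canonicalName);
    }

    public static void main(String[] args) {
        String source = "Begin Time";              // as found in the original catalogue
        String column = canonicalName(source);     // -> time_start
        System.out.println(source + " -> " + column + " (UCD: " + ucdFor(column) + ")");
    }
}
```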
2.2 Web Services in the Heliophysics Domain
Historically heliophysicists have provided web ap-
plications as interfaces to their data or stand-alone
tools for their community. Most of the scientific
484 A. Le Blanc et al.
analysis and visualisation was done using the IDL
[8] scripting language. With the rising popularity
of web services in other domains, new heliophysics
projects adopted this technology. The CASSIS
European Project [9] conducted a study on the
interoperability readiness of available European
services and found that a majority of services out-
side the ones created by the HELIO project are
not yet accessible via standard web service proto-
cols, such as SOAP [10] or REST [11]. Projects such as CASSIS are currently undertaken with the aim of fostering interoperability and accessibility across project and domain boundaries. Outside of the European area, NASA provides some web services based on the REST protocol [12].

Table 1 List of web services provided by HELIO

CTS, Coordinate transformations service: allows the transformation from a set of input coordinates to a set of output coordinates
CXS, Context service: provides images of context information such as GOES lightcurve, flare locations, or the model of Parker’s spiral
DES, Data evaluation service: creates on demand event lists by user defined criteria
DPAS, Data provider access service: provides access to data files of various providers
HEC, HELIO event catalogue: provides access to a diverse range of heliophysics events
HFC, HELIO feature catalogue: contains catalogues of filaments, coronal holes, sunspots, active regions, and type III radio bursts
HPS, HELIO processing service: executes user provided codes on the Grid service in Ireland
HRS, HELIO registry service: provides a registry to all available services
HSS, HELIO storage service: provides file store for user data
ICS, Instrument capabilities service: contains a list of instruments and observatories and their characteristics
ILS, Instrument locations service: provides the locations of planets and observatories in multiple coordinate systems over a 30 year period
SMS, Semantic mappings service: provides mappings between terms in the heliophysics domain
TavServ, Taverna workflow execution: provides remote execution of Taverna workflows
UOC, Unified observing catalogue: provides information about observing position and observing time periods for pointed instruments
The HELIO project is based on a service ori-
ented architecture. Every task is implemented
as a standalone web service. Table 1 lists the available HELIO services. The web services are hosted on hardware according to the services' requirements (e.g., the UOC on a server with a large and fast file store, the HPS on a server with many CPUs, and Taverna Server (TavServ) on a server with
large memory). Most HELIO web services can
also be accessed through a dedicated web inter-
face. Additionally the HELIO portal (described in
Section 6) provides a unified web interface to all
services.
From the point of view of functionality the
HELIO services can be grouped into four categories.
1. Meta-data access services are based on the HELIO Query Interface (HQI) and return data that describe observatories, solar events, solar features, observation data, etc. The HQI is specified in WSDL and implemented as a configurable Java servlet which can be deployed in any servlet container. It provides a standardised, consistent and safe bridge to a data table in a relational database. As query languages the interface supports PQL (Parameterized Query Language) [13] and a subset of SQL (Structured Query Language) [14]; a sketch of such a query is shown after this list.
2. Processing services provide access to a vari-
ety of data processing infrastructures offered
by HELIO. One of these infrastructures and
of particular interest for this article is the
Taverna server (Section 5). Others are the
context service (CXS), the processing service
(HPS) and the data evaluation service (DES).
3. Infrastructure services are used internally by
HELIO for administration and maintenance
of its infrastructure. Relevant to this paper
are the HELIO registry service which offers
a registry of all available services and the
myExperiment repository which allows us to
upload Taverna workflows (Section 4).
4. The Data access services give access to the
actual observations made by observatories.
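As an illustration of how a meta-data access service is queried, the following sketch sends an SQL-style query over HTTP and prints the VOTable that comes back. The endpoint URL, table name and column names are invented for illustration; the real tables and request parameters should be taken from the service documentation in the HELIO registry.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hedged sketch of a call to an HQI-style meta-data service. The host name,
// servlet path, table name and parameter names are illustrative only.
public class HqiQueryExample {
    public static void main(String[] args) throws Exception {
        String sql = "SELECT time_start, xray_class FROM flare_list "
                   + "WHERE time_start BETWEEN '2003-10-28T00:00:00' AND '2003-10-30T00:00:00'";
        String query = "sql=" + URLEncoder.encode(sql, "UTF-8");
        URL url = new URL("http://example.org/helio-hec/HECService?" + query);

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        // The HQI returns its results as a VOTable (XML); here we simply print it.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```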
3 Workflow Use in Heliophysics
The workflows developed within HELIO com-
bine the web services provided by the project. Any
workflow system which supports standard web
service technology could be used for this purpose.
Our requirements for a workflow system were
assessed as:
Use of standard web service protocols
Support of looping
Execution of workflows in both a client and a
server environment
Available support and documentation
Platform independence
Support of multiple instances of a service and
automatic fail-over.
There are many workflow systems which can
support these requirements, notably including sys-
tems based on BPEL [15], Pegasus [16], Unicore
[17], KNIME [18], Galaxy [19], Kepler [20], and
Taverna [21]. Each of these has its own set of
advantages and disadvantages:
BPEL-based workflow systems (and the Petri-
net [22] formalism on which they are based)
tend to focus more on the handling of indivi-
dual pieces of data, rather than larger streams
of data. This is a result of the primary topic
area being business workflows, and makes them less suitable for scientific use, where the model of failure processing is different.
Pegasus is not so much a workflow server as a system for executing workflows. As a server it is relatively mature, but it suffers from a relative lack of user-focused tools for the preparation of workflows; the creation of a Pegasus workflow primarily requires the use of programming tools, which restricts its use among scientists in disciplines where programming skill has historically been a lesser requirement.
Unicore is not primarily a workflow system but rather a mature system for accessing computational resources, though it includes
a workflow processing component for al-
lowing the resources to be coordinated for
higher-level tasks. Due to its historic focus on
use in high-performance computing, the fun-
damental coordination units are at the level of
files rather than records; processing units need
to be able to handle entire data collections
themselves rather than having the workflow
system level manage that for them.
KNIME is a commercial workflow system
used largely for processing of high-throughput
genomic data. While it is an exceptionally
strong candidate for use within its domain, it
is relatively poor at supporting other scientific
disciplines due to the fact that its workflow
components are focused on supporting ge-
nomics and it is not an open ecosystem tool.
Galaxy is a server-based scientific workflow
system primarily used in the biosciences that
uses a pure local processing model, though
those local processing elements may make
web service calls to other systems. There is
a collaboration effort in place to allow joint
Galaxy-Taverna workflows to be created.
Kepler is a general scientific workflow sys-
tem focused on server-based execution and a
graphical design tool, popular particularly in
North America. It has a very rich collection
of generic workflow system components, and
a mechanism for the creation of new workflow
components.
Taverna is a workflow system consisting of
a graphical workbench for design and local
execution, a separate command-line tool for
pure workflow execution, and a server for
execution of workflows remotely. It also in-
cludes a social ecosystem service for allowing
scientists to share access to workflows as pub-
lished artefacts. It is focused mainly on being
a system for routing substantial numbers of
relatively-small data items through arbitrary
web services, allowing it to be relatively eas-
ily adapted to novel data flows and service
types. At the time that the HELIO project
started, there was an obsolete Taverna 1
Server, though it had fallen into disuse due
to its dependence on aspects of the internal
architecture of Taverna that had ceased to be
true with the evolution of Taverna 2.
Though historically focused on the cellular
biosciences, Taverna now supports use across
a wide array of disciplines, including chem-
istry, astronomy, physiology and biodiversity
modelling.
The HELIO project decided to use the Taverna
workflow system [21] since it provided most of
the required functionality, is still in active devel-
opment with a responsive support mailing list,
and the developers are proactive in seeking input
and feedback for further development of their
product. An acceptable alternative would have
been the US-based Kepler [20] workflow engine.
Among others, Kepler is used in several NASA-
funded projects. To facilitate the use of HELIO-
developed services by as wide a community as
possible, a key goal of the project was to en-
sure that all basic services would be workflow
system agnostic: they had to be equally usable
from Taverna, Kepler, a web-site front end, or
end-user programming language toolkits (with a
particular focus on IDL [8] given its historical
prominence within the seed communities). How-
ever, the higher-level services were to be imple-
mented using and on top of Taverna, as it was
felt that it was the workflow system that had the
best support for use by scientists with limited
programming experience while still providing the flexibility necessary to actually perform the desired workflow tasks.
The Taverna Workbench, the workflow editor, is platform independent and provides an easy-to-use graphical user interface for constructing, editing and executing workflows. Users are able to build up libraries of services for use in their domain. The integration of a workflow repository (Section 4) via Taverna's plug-in system enhances the usability. The heliophysics community will also benefit from Taverna plug-ins developed for the astrophysics domain [23], which use the same service repository software and share the interchange format for tables commonly returned by HELIO web services (i.e., VOTable).
On a larger scale, some astronomy VxOs started looking at workflows as early as 2005 [24].
In 2011 the International Virtual Observatory
Alliance (IVOA) [25] even started a discussion
mailing list focused on workflows. A draft IVOA
note [26] on that subject is being discussed. So workflows are gaining interest not only in the heliophysics community but in other domains as well.
3.1 Experiences in Introducing Workflows
to Heliophysics Scientists
So far, most of the workflows have been developed by a computer scientist in close co-operation with heliophysicists. The complexity of the workflows has grown with the maturity of the services they use. One of the goals of HELIO is to help scientists to write workflows themselves as part of their everyday work. The strategy we followed was first to create some interest by developing workflows and demonstrating their strengths and usefulness, and then to provide training targeted at the discipline. Following on from this was a period during which the technologist was available to help solve problems.
At the beginning of the project we used every
possibility to give short presentations and demon-
strations at workshops and presented posters to
the scientific community at conferences. After the
HELIO services were completed, we organised
and ran a Taverna tutorial session using example
workflows involving these HELIO services. This
tutorial was conducted just a week before a Co-ordinated Data Analysis Workshop where participants could apply that knowledge with a computer scientist present to solve problems.
Here are a few observations from our
experience:
The Learning Curve depends on previous exposure to a graphical user interface operated mainly by mouse actions. Queries are done in SQL/PQL and the user needs to know in advance the names of the tables and columns. So the heliophysicist needs, besides knowing some basic SQL/PQL, information about the database itself. The HQI provides some support functions to retrieve this information. Scripts in Taverna are written in Java; most heliophysicists do not know that language but are used to writing code in IDL [8]. Heliophysics data formats such as VOTable [7] are not easily used in a workflow environment because they require conversion scripts (a sketch of such a script follows these observations). Multidimensionality of data vectors is difficult to understand, especially in connection with looping over functions or providing input of the correct depth.
Documentation is very important, not only for
the workflow system, but especially for web
services. Lack of documentation renders a web
service unusable by anyone but the developers.
Documentation has to go beyond the function-
ality of the service and has to include documen-
tation of the content; in the case of databases that includes the tables and table structures. It is also
important to have one central point from which
it is possible to find the web services and their
documentation. In HELIO we have a service
registry as this central point.
Problem Solving is a technique which needs to
be taught alongside workflow building. The
workflow system contains new areas for possible errors, be it iteration strategies or list handling. Each of these workflow-specific error
domains shows specific characteristics. Users
who recognise these characteristics can resolve
problems quickly.
Integration of current working practices: Scientists have built up a set of tools or scripts [27] which already perform certain tasks which they do not
wish to re-implement within the workflow sys-
tem. Taverna now allows the seamless integra-
tion of tools external to Taverna. A disadvan-
tage of this is that workflows constructed in that
way are less usable by other scientists unless
the exact environment is replicated. The same
issues occur when trying to run these workflows
on a workflow server.
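As an example of the conversion scripts mentioned under the Learning Curve, the following plain-Java sketch (beanshell scripts use the same syntax) extracts one column of a VOTable so that its values can be iterated over in a workflow. It assumes the simple TABLEDATA serialisation and illustrative element handling; production code would use a proper VOTable library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch of a VOTable conversion step: pull the values of one named column out of
// a VOTable string so they can be looped over in a workflow.
public class VOTableColumn {

    public static List<String> columnValues(String votable, String columnName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(votable)));

        // Locate the position of the requested column among the FIELD definitions.
        NodeList fields = doc.getElementsByTagName("FIELD");
        int index = -1;
        for (int i = 0; i < fields.getLength(); i++) {
            if (columnName.equals(((Element) fields.item(i)).getAttribute("name"))) {
                index = i;
                break;
            }
        }
        List<String> values = new ArrayList<String>();
        if (index < 0) {
            return values;  // column not present
        }

        // Each TR holds one row; pick the TD at the column's position.
        NodeList rows = doc.getElementsByTagName("TR");
        for (int r = 0; r < rows.getLength(); r++) {
            NodeList cells = ((Element) rows.item(r)).getElementsByTagName("TD");
            if (index < cells.getLength()) {
                values.add(cells.item(index).getTextContent());
            }
        }
        return values;
    }
}
```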
3.2 Application Areas for Workflows
in Heliophysics
Workflows can be constructed for different pur-
poses. In the following sections we describe three
classes of workflows. The first class is used as
integration tests at development time in order to
assert consistency between different services. The
second class introduces virtual services which pro-
vide new functionality in support of science. The
third class implements the actual science analysis
by combining web services, user-defined opera-
tions and virtual services into larger workflows.
Each of the three classes is illustrated by an example. In the associated workflow diagrams the
colours of the squares represent different kinds of operators:
green: SOAP operator
purple: XML splitter—decomposes complex
SOAP types into their components
brown: local beanshell [28] scripts—user written
scripts to provide custom functionality
violet: local operator—predefined functions
within Taverna
pink: nested workflows—workflows re-used in-
side another workflow
blue: string constant—a string or text which does
not change
Input and output ports are of a different blue
colour and separately labelled.
3.2.1 Test of Data and Services
During the development of the service infrastruc-
ture, HELIO uses workflows to assert the robust-
ness of individual services and to test consistency
between multiple services. In particular the latter
is of great value in a distributed development
environment. When the output of one service is
used as input for another service, the exchanged
data needs to be kept aligned. An example is the
identifier for an instrument. In HELIO this ID
is defined in the Instrument Capabilities Service
(ICS). Many other services, such as DPAS, UOC,
DES, ILS, and HFC use it as an input parameter.
As the HELIO services are developed by inde-
pendent teams in different locations it is impor-
tant to find inconsistencies as early as possible in
an automated way.
An example of a data-testing workflow can be seen in Fig. 1. It tests the integrity of IDs
between ICS and data provider access service
(DPAS). This workflow [29] does not require any
inputs since it is checking the complete content
of the ICS against the instrument registered in
the DPAS. In a first step the workflow requests
all instrument IDs from the ICS in a SOAP call
(Label 1 in Fig. 1). In a next step the data is
extracted from the VOTable output format and
provided as list for further evaluation (Label 2).
Information about available instruments is acces-
sible in the DPAS via a servlet (Label 3). Again,
suitable data needs to be extracted in a local bean-
shell and cleaned up by removing duplicate entries
(Label 4). Any entry in the instrument ID list from
the DPAS which is not part of the instrument
IDs from the ICS represents data which could be
available to scientists but would not be accessible
since the ID to that data is not registered within
the system; another beanshell searches for those
IDs (Label 5). On the other hand IDs which are
known in the ICS but not in the DPAS only
represent data sources to which HELIO does not
have access. The workflow returns the result list in
two formats, a VOTable with the IDs of missing
ICS entries and a string list of the same IDs.
The advantages of implementing those tests as workflows are a large time saving and a higher reliability compared to a manual check. At the time of writing this article the ICS knows 362 instruments
and the DPAS has access to 263. The workflow
identified 21 IDs which are not part of the ICS.
The developers of the ICS and DPAS can take the
results of this workflow to resolve the problems
and to improve the services.
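The comparison itself (Labels 4 and 5) boils down to deduplicating one ID list and taking its difference with the other. A beanshell performing this could look like the following plain-Java sketch; the method and variable names are illustrative rather than taken from the actual workflow.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of the list-difference step of the integration test: report every
// instrument ID registered in the DPAS that is not defined in the ICS.
public class InstrumentIdDifference {

    public static List<String> missingFromIcs(List<String> dpasIds, List<String> icsIds) {
        // Deduplicate the DPAS list while keeping its order (Label 4 in the text).
        Set<String> unique = new LinkedHashSet<String>(dpasIds);
        List<String> missing = new ArrayList<String>();
        for (String id : unique) {
            if (!icsIds.contains(id)) {
                missing.add(id);   // data reachable via the DPAS but unknown to the ICS
            }
        }
        return missing;
    }
}
```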
3.2.2 Provide Virtual Service
A virtual service is a workflow which provides
a building block for the implementation of com-
plex scientific use cases. As such it is used in
different circumstances to provide specific func-
tionality which supports the scientific work. It can
be integrated into workflows or into web portals
alongside other web services. There are two types
of virtual services. The first type accesses an exter-
nal service with some default input values to pro-
vide a more specialized functionality. This makes
it possible to simplify the interface to the external
service for the user. The second type combines
a number of services or service calls into some-
thing new. In a workflow environment virtual ser-
vices are commonly used as nested workflows—
a workflow within another workflow. Integrated
in the general user interface it becomes a service
indistinguishable from other web services. This is
expanded in Section 6.
Fig. 1 Integration test workflow; the workflow checks the consistency between instrument IDs which are used in the DPAS and are defined in the ICS. Test workflows support the development of physically separate services. The labels of each section of the workflow are explained in the text.

The UOC is a service which provides access to a large database containing information of when, where and what pointed instruments have observed. Pointed instruments can observe a specific
region of the Sun in different locations or of vary-
ing size. A typical scientific question addressed by
the UOC is to find out which instrument has ob-
served a particular region at a given time. Query-
ing the UOC in the standard way would result
in a large table with an entry for every recorded
observation rather than just a list of instrument
IDs. The workflow [30] is implemented as a virtual service and uses a more complex call to the UOC
to return only the instrument IDs. This informa-
tion is sufficient for a scientist to obtain potentially
interesting data files.
The workflow shown in Fig. 2is an example of a
virtual service that combines several service calls.
In this particular case the calls are used to handle
asynchronous service calls properly. An asynchro-
nous service call requires a number of individual
service calls to perform a task. Asynchronous web
services usually perform long running tasks where
the output cannot be reliably produced before a request would time out. Depending on the service there are at least three stages: submit the task, check on the status of processing, and request the results. HELIO offers a number of services where only asynchronous functions are provided.
One of those services is the processing service
which performs user defined code executions on
the HPS. SHEBA [31], the propagation model
used in HELIO, is an example application which runs in such a way.

Fig. 2 Virtual service; the workflow performs a backwards propagation of a co-rotating interaction region on the HPS. The HPS provides an asynchronous web service only. The virtual service hides the complexity from the user.

Having the propagation model
as a virtual service enables scientists to run it as a
single component where they do not have to worry
about the individual steps or error cases. Different
solar phenomena propagate differently through
the solar system, therefore requiring different
models to describe this propagation. Each of these
is encapsulated in a separate virtual service. The
workflow we use as an example is the backwards
propagation of a Co-rotating Interaction Region
(CIR) [32]. CIRs are regions of high speed solar
wind that co-rotate with the Sun following a spiral
shape (i.e., the Parker spiral [33]). These regions
are associated with coronal holes, a feature seen
on the solar corona.

Fig. 3 Scientific workflow; the workflow associating SEP events with solar flare, CME and Type III Radio burst events. Workflows support large scale and statistical analysis.

SHEBA can be run forward
and backwards; this means that from the location
of a coronal hole at a certain time it provides infor-
mation about when and where the CIR associated
with it should be detectable and vice-versa, thus
helping the heliophysicist to find a relationship
between the properties of the coronal hole and the
CIR.
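The three asynchronous stages hidden by this virtual service can be sketched as follows. The operation names mirror the operators visible in Fig. 2 (executeApplication, getStatusOfExecution, getOutputOfExecution), but the HPSClient interface and the status strings are assumptions made for illustration; the real service is invoked through its generated SOAP stubs.

```java
// Sketch of the submit / poll / retrieve pattern hidden inside the virtual service.
// HPSClient is a hypothetical stand-in for the generated SOAP client of the HPS.
public class AsyncPropagationRun {

    /** Hypothetical client interface modelled on the operators shown in Fig. 2. */
    public interface HPSClient {
        String executeApplication(String application, String votableInput);  // returns an execution ID
        String getStatusOfExecution(String executionId);                     // e.g. "PENDING", "COMPLETED", "FAILED"
        String getOutputOfExecution(String executionId);                     // URL of the result file
    }

    public static String runPropagation(HPSClient hps, String votableInput)
            throws InterruptedException {
        // Stage 1: submit the task.
        String id = hps.executeApplication("pm_cir_back", votableInput);

        // Stage 2: poll until the long-running job has finished.
        String status = hps.getStatusOfExecution(id);
        while (!"COMPLETED".equals(status)) {
            if ("FAILED".equals(status)) {
                throw new IllegalStateException("Propagation run " + id + " failed");
            }
            Thread.sleep(10000);   // wait before asking again
            status = hps.getStatusOfExecution(id);
        }

        // Stage 3: request the result (here: the URL of the produced VOTable).
        return hps.getOutputOfExecution(id);
    }
}
```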
3.2.3 Advance Scientific Research
It is challenging and expensive to collect actual
data in Heliophysics. The data are either re-
motely sensed or in situ measurements of events
propagating through the heliosphere. The nature
of events in this science makes it impossible to
gather any data at the event source. Most obser-
vatories are located only at key positions in the
heliosphere, as in orbits around planets or the
Lagrangian points (positions in space where
the combined gravitational pull of two large
masses provide a stationary position relative to
them), while others are travelling through the he-
liosphere (e.g., Voyager spacecraft). That all leads
to the relative sparsity of data sources. Heliophys-
ical events are characterised by their variability, and their effects may have been affected by a multitude of surrounding influences. Therefore
it is rarely straightforward to connect something
experienced on Earth with an event on the Sun,
or to predict the dangers of something remotely
observed at the Sun to satellites around Earth or
to power grids on it.
Heliophysicists spend time researching the effects of single events, trying to propagate them through the heliosphere and looking for their signatures in other data sets. Once a connection is identified, a workflow can reproduce the single steps the scientist took and find other events which show the same behaviour, or identify events with the same global parameters where the behaviour could not be reproduced, which could point to some missing influences. All workflows which were created to help answer or reproduce some scientific question were developed in close co-operation with heliophysicists.
Let us describe a workflow which was created
during a Co-ordinated Data Analysis Workshop
where a group of scientists tried to associate so-
lar energetic particle (SEP) events measured at
Earth with flares, coronal mass ejections (CMEs)
and radio events observed on the Sun [34]. This
workflow, shown in Fig. 3, requires a time range
and propagation parameters as inputs and pro-
ceeds to work as follows:
1. Find which SEP events have been observed at
Earth during the time of interest. This is done
through a query to one of the lists at the HEC.
2. Propagate the events found backwards to re-
trieve the time and position in which the par-
ticles were accelerated towards Earth. This is
calculated by the propagation model available
at the HPS.
3. Flare catalogues at the HEC are queried using the times previously calculated (plus/minus a defined range; a sketch of this window construction follows the list). These queries provide the start and peak times of the energy released which are consequently used in the next step.
4. Coronal mass ejections and radio shocks are
events observed above the solar atmosphere;
thus two new queries to the HEC (one for
CMEs, another for radio shocks detections)
are made using the new time ranges obtained
in the previous step.
5. Finally, a summary table is created which links
each SEP event to the associated flare, CME
and radio shock.
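As mentioned in step 3, each back-propagated time has to be turned into a start/end window before the flare catalogues can be queried. A minimal sketch of that window construction, with an arbitrary half-hour window and an illustrative column name, is:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Sketch of step 3: turn a back-propagated time at the Sun into a start/end window
// for querying the flare catalogues at the HEC. The 30-minute half-window is an
// arbitrary illustration of the "plus/minus a defined range" mentioned in the text.
public class FlareQueryWindow {

    public static String[] window(Date timeAtSun, long halfWindowMillis) {
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
        iso.setTimeZone(TimeZone.getTimeZone("UTC"));
        String start = iso.format(new Date(timeAtSun.getTime() - halfWindowMillis));
        String end = iso.format(new Date(timeAtSun.getTime() + halfWindowMillis));
        return new String[] { start, end };
    }

    public static void main(String[] args) {
        Date t = new Date();                          // a time computed by the propagation model
        String[] w = window(t, 30L * 60L * 1000L);    // +/- 30 minutes
        System.out.println("time_start >= '" + w[0] + "' AND time_start <= '" + w[1] + "'");
    }
}
```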
4 Workflow Sharing
The success of establishing workflows as a new
way for scientists to work depends largely on their
ability to find suitable example workflows and be-
ing able to share and discuss their workflows with
interested peers. Consideration has to be given to the protection of their intellectual property so that they do not lose the ability to publish any results in high-quality journals. Any sharing of workflows needs to be manageable and understandable to the individuals.
The requirements for our projects were
analysed as:
Easy to manage and set sharing settings
API to content for integration elsewhere
Secure storage of data including backups
Classifiable and sufficient meta-data set
Choice of license
DOI [35] or URI [36] for persistent references
Comments and feedback for workflows
Example values for valid inputs to workflows
Embedded references to a description of un-
derlying science
In HELIO we decided to use the myExperiment [37] repository for sharing workflows. myExperiment has built-in support for Taverna workflows, such as displaying embedded meta-data; it keeps the different versions of a workflow accessible; and it is itself accessible from within Taverna, which makes the reuse of workflows easy. The HELIO project created a group called 'helio' especially for sharing heliophysics workflows, which now has 26 members and at the time of writing has shared 87 items; most of those are Taverna workflows.
The social elements in the myExperiment environment allow users to restrict the visibility and usability of their intellectual property to people with whom they wish to co-operate, people whom they would like to be able to assess their work, or people whom they allow to use it.
myExperiment provides a REST interface to the content of the repository, which we make use of in the HELIO portal (Section 6).
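As an illustration of that REST interface, the sketch below fetches the XML metadata record of a single workflow with a plain HTTP GET, which is the pattern the portal relies on. The resource path and the workflow identifier are assumptions to be checked against the myExperiment API documentation.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Hedged sketch: fetch the XML description of one workflow from the myExperiment
// REST interface. The resource path and the workflow id are illustrative.
public class MyExperimentFetch {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.myexperiment.org/workflow.xml?id=1234");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);   // XML metadata, including a link to the workflow content
            }
        }
    }
}
```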
5 Remote Execution Service
In order to make good use of workflows as part
of the scientific process, we realized that users
would need to have some way of running those
workflows without having a local installation of
a heavyweight piece of software like the Ta-
verna Workbench. Because of the complexity of
a workflow execution engine, it became rapidly
clear to us that we would not want to have that
installed on end-user systems at all. Furthermore,
by moving it to a special dedicated deployment, it
would also be possible to use it to support other
key use cases, such as placing workflows behind
a (relatively-lightweight) portal (see Section 6).
Another key benefit of this is that it enables a
workflow to run for a long time without having the
user of that workflow connected to the internet for
all that time; when a workflow might potentially
take longer than a working day, this becomes a
significant issue.
Thus we have developed Taverna Server in
HELIO. This is a fully service-oriented interface
to the Taverna [21] workflow execution engine.
5.1 Key Features
Taverna Server is based on Taverna 2.4. It
supports the upload and execution of arbitrary
Taverna 2 workflows, provided they contain no
interactive components (there is no GUI through
which a user might interact with the running
workflow). It also has workflow run introspection
capabilities, so that clients can ask the server what
inputs they should supply and what outputs were
provided without having to understand the inter-
nals of a Taverna workflow. Each workflow run
lasts only a limited amount of time (according to
the principles of resources on a Grid) with this
life-span being user settable; upon the completion
of a workflow run, a notification is published by
the server’s Atom [38] feed (and optionally also
via email or Jabber/XMPP [39], depending on
setup).
The server provides access to a workflow run’s
input and output files. The only practical limit
on file size is the amount of disk space on the
deployment system. Each workflow run is isolated
from all the others on the server, so that inputs to
and results of one run do not affect another.
We support accessing the server via both
RESTful [11] and SOAP [10] APIs, both of which
are implemented as views over an underlying abs-
tract interface (the “workflow run”). This is a
particular benefit, because it means that different
languages can provide interfaces to the server in
the way that is most natural to them: with Java,
it is typically the case that a SOAP interface is
simplest, whereas a scripting language like Ruby
can use the REST interface more easily.
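As a concrete illustration of the REST view, the sketch below creates a new workflow run by POSTing a t2flow document to the server, which answers with the URI of the new run resource in its Location header. The base URL is a placeholder, and the resource path and media type are assumptions to be checked against the Taverna Server documentation.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hedged sketch of creating a workflow run through the REST interface of Taverna
// Server. The base URL is a placeholder; the resource path and media type are
// assumptions to be checked against the server's documentation.
public class CreateRun {
    public static void main(String[] args) throws Exception {
        byte[] workflow = Files.readAllBytes(Paths.get("propagation.t2flow"));

        URL url = new URL("https://example.org/taverna-server/rest/runs");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/vnd.taverna.t2flow+xml");

        try (OutputStream out = conn.getOutputStream()) {
            out.write(workflow);   // upload the workflow definition
        }

        // On success the server replies 201 Created; the new run's URI is in Location.
        System.out.println(conn.getResponseCode() + " " + conn.getHeaderField("Location"));
    }
}
```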
A number of security features are present,
keeping log-in details confidential, allowing con-
trol over who can connect to the workflow server,
the separation of runs by user, and the supply of
credentials to other services from workflow runs
on the server.5

Fig. 4 Summary diagram of the architecture of Taverna Server. The orange boxes are the major components of the architecture, the yellow boxes are selected minor components of the architecture, and the arrows indicate the major interactions between the pieces; green parts are pre-existing infrastructure and frameworks. Note that the main server (top green box) cannot directly access the user file-store; there is no assumption that they are shared.

There is a mechanism whereby a
user can grant another user access to one of their
runs (e.g., to allow them to see some results, or to
fix some problem in the run’s configuration). This
does not include the security credentials though;
those are always carefully hidden.
We also provide a management API via both
JMX [40] and REST, which allows setting many
options and viewing things such as resource ac-
counting.
Currently at an advanced stage of development
(primarily within the BioVeL project [41]) is a sys-
tem for unified handling of interactive workflows
via a web browser, allowing a particular workflow
to ask questions of its users in the same way,
whether that workflow is running in the Taverna
Workbench or in Taverna Server. We anticipate
that this will be exceptionally useful across many
domains of science, where it is frequently only
possible to semi-automate processes; the aim is to
ensure that the expert scientist is kept in the loop
at critical stages while mechanising the processing in-between. We are also working on producing a deployment version of Taverna Server as an Amazon Web Services AMI [42]. Other areas under development, though currently somewhat less advanced, are systems for unified provenance models and discovery of workflow execution state so that users of a workflow can discover how far it has progressed on the server and what exactly it did, just as they can with the Taverna Workbench.

5Multiple formats of security credential are supported, so there is no need for clients to be written in a particular language to gain access to the security. This is distinctly different from the underlying Taverna platform which only supports the Bouncy Castle format key-stores.
5.2 Architecture
The workflow server consists of a web application
(see Fig. 4) that provides SOAP and REST views of an underlying abstract model. That model
consists of an abstract factory, which knows how
to create new workflow runs and list the existing
ones, an abstract run description which provides a
number of properties relating to the workflow run
(e.g., its execution state and the mapping of files to
workflow inputs), and an abstract file system de-
scription that models a particular workflow run’s
working directory (and its subdirectories and their
contents).
The implementation of the abstract model is
done through mapping the abstract workflow runs
to Java proxy objects running in a sub-process.
Those objects are each associated with a particular
directory that is specially created for the workflow
run, and the objects that handle file system access
are careful to ensure that each file accessed can
only be in the working directory or below. Sym-
bolic links are prohibited from being accessed;
the abstract file system model claims they simply don’t exist, thus stopping the potential for information leaks through that mechanism.
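A minimal sketch of the kind of confinement check the file-system delegates perform is shown below: every requested path is resolved against the run's working directory and rejected if it escapes it. The class and method names are illustrative, not the server's actual code.

```java
import java.io.File;
import java.io.IOException;

// Sketch of the confinement rule described above: a file may only be accessed if its
// canonical path lies inside the workflow run's working directory. Canonicalisation
// resolves ".." components and symbolic links, so links pointing outside are refused.
public class RunFileSystemGuard {

    private final File workingDirectory;

    public RunFileSystemGuard(File workingDirectory) throws IOException {
        this.workingDirectory = workingDirectory.getCanonicalFile();
    }

    /** Resolve a client-supplied relative name, refusing anything outside the run directory. */
    public File resolve(String relativeName) throws IOException {
        File candidate = new File(workingDirectory, relativeName).getCanonicalFile();
        String prefix = workingDirectory.getPath() + File.separator;
        if (!candidate.getPath().startsWith(prefix) && !candidate.equals(workingDirectory)) {
            throw new IOException("Access outside the run's working directory is not permitted");
        }
        return candidate;
    }
}
```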
An executing workflow run (the central state of
the workflow run, though not the initial or final
one) corresponds to the presence of a workflow
execution process specifically for that run. An
additional security constraint is that each distinct
user of the system runs his workflows with a
separate user account; this prevents information
leakage from one user to another (the need to
prevent different runs of the same user from see-
ing each other is significantly less). All access to
the workflow run’s file system is done through the
proxy objects; there is no need for a shared file
system between the core server and the worker
processes at all (though the current implementa-
tion of the sub-process creation engine is more
constrained).
5.3 Development
The server is implemented as a Java web ap-
plication that sits on top of the Apache CXF
2.5 web service framework [43] hosted inside
the Spring 3.0 dependency injection framework
[44]. The abstract model described above is im-
plemented as a collection of Spring beans with
JAX-WS [45] and JAX-RS [46] annotations to
describe the mapping of the abstract model into
the service views presented by CXF. Security and
transaction constraints are enforced through the
use of aspect-oriented programming; a particular
workflow run’s model bean is only available if the
currently accessing user has permission to see that
run.
The sub-processes that implement the abstract
model are spawned through the use of the sudo
program [47], which can be configured to allow a
specific process permission to run particular pro-
grams without interaction for a limited set of users
(typically, a Unix user group). By strictly con-
straining what may be run this way and for whom,
it ensures that the potential for damage from
abuse is as limited as practical. The user specific
sub-processes started this way use a strictly reg-
ulated form of JRMP [48] to communicate back
with the main server process.
The presentation of security tokens to a
workflow run implementation is handled spe-
cially. They are written as an encrypted Bouncy
Castle [49] key-store to a user specific directory
that is not part of the working directory hierarchy;
the credentials are not (normally) visible to the
user after they are written to the disk. Further-
more, they are encrypted with a high entropy one-
time password that is never reused for any other
purpose and which is itself never written to disk
at all. This ensures that only authorised processes
running as the correct user can ever extract the
credentials; nothing else can find this information
out. When coupled with the fact that users are
strongly encouraged to give their credentials only
to servers that they trust to act honestly in the
first place, this gives as high a level of assurance
of confidentiality as is reasonably practical, given
that some remote services accessed by a workflow
might require a password to be used in the first
place. We also support the HELIO identity token
system6 [50, 51], which allows a cryptographic
token to be obtained by the portal and then passed
through to appropriately enabled services without
explicit use of the credential management part of
the workflow server API.
6 Integration into the HELIO Portal
The HELIO portal is a web application that pro-
vides integrated access to the HELIO web ser-
vices. Access to various HELIO services is imple-
mented in a generic and unified way. This has the
advantage that new services can be easily added
to the portal. This is particularly convenient for
adding new Taverna workflows to the system.
The HELIO portal is centred around data and
the tasks performed on it. Data may originate
either from catalogues within the system, from a Taverna workflow or from an uploaded VOTable.

6This is a deployment option that is not normally enabled, as the current implementation of the workflow engine itself is sufficiently constrained that it is necessary to use a special workflow to make use of this feature.
Data objects are presented as primary entities of
the user interface in a preferred location such
that users can get a good overview of previously
obtained data. Tasks may then be applied to the
data objects in order to transform them to new
data objects.
This data-centric view distinguishes the HELIO
portal from normal web applications as well as
traditional scientific systems. Normal web appli-
cations are workflow oriented in the sense that
a predefined flow of web pages guides the user through the process of a data analysis task.
Traditional scientific systems such as a command
line based data analysis tool are function oriented.
Their most important entities are groups of ba-
sic functions that are joined together to fulfil a
specific task. Compared to a data-centric system a
user requires more technical knowledge about the
input and output parameters of these functions.
6.1 User Interaction Pattern
The way of using data and tasks in the HELIO
portal follows a generic user interaction pattern.
1. Select a task to be executed. Tasks are
presented in natural language like: “Get ob-
servations for a given time range”, “See what
instruments covered this period”, etc. This
task-oriented approach helps novice users to perform common tasks without having deep knowledge of the detailed science, while
not preventing more advanced users from
working with the data.
2. Gather the input parameters required for the
selected task. The input parameters may be
entered manually or they may be reused from
a previously executed task such as a Taverna
workflow. Generally, tasks have sensible de-
faults for most input parameters.
The HELIO portal provides customised dia-
logues for different kinds of input para-
meters. Commonly used input parameters
such as date ranges or instruments are en-
tered through a dedicated dialogue. For other
types of input parameters HELIO provides a
configurable, generic dialogue. The latter is
used for workflow specific parameters.
3. Execute the task on the HELIO infrastructure.
Most tasks are connected directly to one or
several HELIO web services. Currently, only one task can be run at a time. This is fine for
HELIO as most of the tasks terminate within
seconds, but it might have to be changed in
future versions of the portal.
4. Visualise the result of the task. Depending on
the data type of the result, different tools are
used for visualisation. It is even possible to
have different visualisations for a data type;
e.g., a table with time series can be repre-
sented as a plain HTML table or as a time line
plot.
5. Extract new input parameters from the result.
In many cases a user can extract new input
parameters from a result. These parameters
can be used as input for a succeeding task or
to refine the current task. In a timeline plot
a user can select a date range of interest. In
a table of instruments the user can look for
instruments with similar capabilities.
6. Continue at step 1. The process can be
repeated until the original question is
sufficiently answered.
In order to support sharing parameters be-
tween multiple tasks the portal introduces the
concept of a data cart. The data cart is a dedicated
area in the web interface to persist and manage
collections of parameter values. Parameters ex-
tracted from a task result will be stored in the data
cart. Using the mouse they can be dragged from there and dropped onto the input area of another task.
The data cart is inspired by shopping carts
known from web shops. It accentuates the impor-
tance of data in scientific applications and thus
reflects the astronomer’s way of thinking in terms
of data rather than functions. Analysis of data is
the main interest of a scientist. How to get to the
data, e.g. which function to run, is less interesting.
The portal interaction pattern and the data cart
are the core concepts that drove the design of the
HELIO user interface. Figure 5 shows the portal
with the task menu at the top, the data cart right
below and the parameter input area for a selected
task in the bottom part.

Fig. 5 Screenshot of the HELIO portal interface, showing an example data cart and the parameter input area.

Figure 6 shows part of the
result of a propagation model task. The buttons
above the result table allow the user to extract
input parameters from the table.
6.2 Workflow Integration
Taverna workflows are integrated into the HE-
LIO portal by presenting them as tasks. They
are executed on a Taverna Server (see Section 5)
which is accessed through its SOAP interface. The
actual workflow must be registered in the myEx-
periment repository (see Section 4). In this way
the portal can always fetch the latest version of
a workflow, assuming that this is the most stable
one.
By presenting Taverna workflows as tasks they
are treated by the portal like other HELIO ser-
vices such as the HELIO event catalogue (HEC),
the context service (CXS), the HELIO process-
ing service (HPS) or the data evaluation service
(DES) (see Section 2.2). Thanks to the data cart, parameters of a workflow can be shared with other tasks, which opens up a range of new research possibilities.
The data- and task-centric approach of the portal stands in contrast to the Taverna workbench, which follows a workflow-centric route. The Taverna workbench offers a generic input UI for any workflow; all input parameters are rendered as simple text boxes with a descriptive text. The HELIO portal, in contrast, must know the exact data type of each parameter in order to reuse it with other tasks.
Fig. 6 HELIO portal showing the result of a CIR backwards propagation model
The situation gets even more complex for output parameters, where the HELIO portal needs to know how to present them to the user, while the Taverna workbench provides the user with a choice of generic viewers (e.g., text viewer, image viewer, XML viewer).
Taverna workbench is ideally suited for the
development and testing of workflows, while the
HELIO portal offers easy access to a selection of
common Taverna workflows.
Integration of new workflows in the portal is a
manual process and requires some development
and configuration work. Presently, there is no
way to add new workflows automatically to the
HELIO portal. The main reason is that the de-
scription format used by the HELIO workflows,
the T2FLOW format [52], is slightly too lim-
ited to support the type of metadata that would
be required for automated UI construction. The
succeeding format, SCUFL2, supports sufficiently rich annotations, but was not available in time for the development of HELIO.
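To make the missing metadata concrete, the hypothetical descriptor below lists the minimum a portal would need to know about each workflow input port in order to build an input dialogue automatically; the field names are illustrative and are not part of T2FLOW or SCUFL2.

```java
/**
 * Hypothetical example of per-port metadata needed for automated UI
 * construction; none of these field names come from T2FLOW or SCUFL2.
 */
class PortDescriptor {
    String name;          // workflow port name, e.g. "startTime"
    String dataType;      // portal data type used for data-cart matching, e.g. "timeRange"
    String label;         // human-readable label shown in the input dialogue
    String helpText;      // browser-friendly help text, possibly with images
    String validator;     // reference to an input validator, e.g. an ISO 8601 date check
    String defaultValue;  // sensible default, if any
}
```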
6.3 Implementation
The HELIO portal is implemented as a Rich
Internet Application (RIA) [53]. A RIA mimics many of the capabilities that would be expected of a desktop application. A large part of a RIA is implemented in HTML, CSS and JavaScript [54] and runs in a web browser; the back end is a conventional web server. The connection between the client and the server is made through asynchronous server calls based on AJAX technologies [55].
The static component diagram of the HELIO portal (Fig. 7) shows three core layers. A fourth layer at the bottom serves as a placeholder for the HELIO web services; it contains all four categories, but only a selection of the web services described in Section 2.2.
The access layer is written in Java and abstracts access to the HELIO web services by offering a small set of generic interfaces. Every member within a category can be called in the same way. This is particularly convenient for infrastructure and processing services, as not all of the underlying services offer the same web service interface.
The access classes are implemented as Java-
Beans. The properties of the JavaBeans are
mapped to the input parameters of the underlying
web services. At runtime, clients can use bean
introspection to find out about these properties.
After setting appropriate values the client calls
an execute() method which delegates to the
web service. The actual call is executed in a back-
ground thread. The execute() method returns
an object to poll the execution status and to retrieve the result and corresponding log messages once they are available.

Fig. 7 Component diagram of the HELIO portal architecture. The highlighted components are from outside of the HELIO project
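As an illustration of this access-layer pattern, the following sketch shows a bean with properties mapped to service inputs and an execute() method returning a handle for polling; apart from execute(), which the text names, all class and method names are assumptions.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Illustrative sketch of an access-layer bean; names other than execute()
 * are hypothetical and do not reflect the actual HELIO classes.
 */
public class EventCatalogueAccessBean {

    // JavaBean properties mapped to input parameters of the underlying web service.
    private String startTime;
    private String endTime;

    public String getStartTime() { return startTime; }
    public void setStartTime(String startTime) { this.startTime = startTime; }
    public String getEndTime() { return endTime; }
    public void setEndTime(String endTime) { this.endTime = endTime; }

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Delegates to the web service in a background thread and returns a handle to poll. */
    public ExecutionHandle execute() {
        Future<String> future = executor.submit(() -> callWebService(startTime, endTime));
        return new ExecutionHandle(future);
    }

    // Placeholder for the actual call to the HELIO web service.
    private String callWebService(String from, String to) {
        return "result for " + from + " .. " + to;
    }

    /** Handle used by clients to poll the execution status and retrieve the result. */
    public static class ExecutionHandle {
        private final Future<String> future;

        ExecutionHandle(Future<String> future) { this.future = future; }

        public boolean isDone() { return future.isDone(); }

        public String getResult() throws ExecutionException, InterruptedException {
            return future.get();
        }
    }
}
```

A client would set startTime and endTime, call execute(), and poll the returned handle until isDone() reports completion before retrieving the result.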
The web server part of the HELIO portal is im-
plemented in Grails. The Grails web framework is
written in the Groovy programming language and
sits on top of the Spring web MVC framework.
The web server layer consists of several
components:
1. The domain model is a Groovy object model
holding the data required within a browser
session. It is connected to a database in order
to ensure that the data persists between mul-
tiple requests.
2. The processing, data and metadata access controllers inspect the corresponding JavaBeans of the access layer and populate the domain model (see the introspection sketch after this list).
3. The configuration mashup component queries the HELIO registry service and the myExperiment registry and integrates their content into a portal-specific configuration format. Additionally, UI-specific metadata, which are currently hard-coded, are woven into the configuration; these include input validators, help texts in a browser-friendly format (e.g., with images) and layout directives.
4. The dialog, task and result views render the corresponding parts of the user interface.
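A minimal sketch of the introspection step performed by the access controllers, using the standard java.beans API, might look as follows; the class name is hypothetical.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of how an access controller can discover bean properties at runtime. */
public class BeanInspector {

    /** Returns the writable properties of an access bean, keyed by name. */
    public static Map<String, Class<?>> inputParameters(Object accessBean)
            throws IntrospectionException {
        Map<String, Class<?>> parameters = new LinkedHashMap<>();
        BeanInfo info = Introspector.getBeanInfo(accessBean.getClass(), Object.class);
        for (PropertyDescriptor property : info.getPropertyDescriptors()) {
            if (property.getWriteMethod() != null) {   // only settable input parameters
                parameters.put(property.getName(), property.getPropertyType());
            }
        }
        return parameters;
    }
}
```

Applied to the access bean sketched earlier, this would yield the startTime and endTime properties together with their types, which is enough to populate the domain model and drive a generic input dialogue.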
The browser layer of the HELIO portal implements the actual user interface components. It is based on JavaScript and uses the jQuery [56] library for the core AJAX functionality, as well as a couple of jQuery plugins for advanced UI widgets.
The JavaScript code is organised in six core modules: main, data model, input dialog, task, result and data cart. The main module handles the overall integration; the other modules implement the behaviour of the corresponding UI widgets and connect them to the server.
In order to add a new Taverna workflow, the system has to be modified in three areas. First, a processing access bean for the new workflow has to be created. While this is currently done manually, it is planned to use dynamic JavaBeans in the future; the properties of a DynaBean (see http://commons.apache.org/beanutils/) [57] can be created at runtime based on the configured input parameters of a workflow (see the sketch after this list). Second, the configuration mashup component has to be configured with the UI-specific details of the workflow. Third, the workflow has to be registered in the task menu.
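A possible use of DynaBeans for this step, assuming the workflow's input parameters have already been read from its configuration, is sketched below; the workflow name, parameter names and values are illustrative, and this is not the portal's actual implementation.

```java
import org.apache.commons.beanutils.BasicDynaClass;
import org.apache.commons.beanutils.DynaBean;
import org.apache.commons.beanutils.DynaProperty;

/**
 * Sketch of building a dynamic access bean from a workflow's configured input
 * parameters using Apache Commons BeanUtils; names and values are illustrative.
 */
public class DynamicWorkflowBean {

    public static DynaBean forWorkflow() throws Exception {
        // Properties derived from the workflow's configured input parameters.
        DynaProperty[] properties = {
            new DynaProperty("startTime", String.class),
            new DynaProperty("endTime", String.class)
        };
        BasicDynaClass workflowClass =
            new BasicDynaClass("cirPropagationWorkflow", null, properties);

        DynaBean bean = workflowClass.newInstance();
        bean.set("startTime", "2003-10-28T00:00:00");
        bean.set("endTime", "2003-10-29T00:00:00");
        return bean;
    }
}
```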
The HELIO portal automatically creates the required parameter input dialog. If the result format is supported, the HELIO portal will also present the result to the user; otherwise the portal offers a download link so that the data can be viewed in a separate application.
7 Conclusion
The service-oriented architecture of HELIO enables us to pursue workflows as the means to enrich the heliophysics community with new and more complex functionality, built from the basic building blocks of general heliophysics services.
We introduced workflow building to scientists and
enabled them to share and discuss their scientific
work as workflows within the social space of the
myExperiment repository. Scientists who do not
write their own workflows can benefit from the advanced functionality they provide by executing them remotely on the Taverna Server, without needing any additional software on their computers. The availability of web service APIs to both
myExperiment and Taverna Server enables us to
integrate the most useful workflows for every user
into the HELIO VxO portal.
We have started the process of changing the
working practices of scientists, but this change
takes time, support from workflow experts, the
ability to integrate existing scripts and applica-
tions easily, and the availability of training. The
interest of scientists in the workflow-writing tutorial was considerable and extended beyond the partners in the HELIO project, indicating that further training needs to be offered at suitable times and, where possible, in association with other
scientific events. The integration of the HELIO registry into the Taverna Workbench will be a
key development going forward, as it will enhance
the accessibility of Taverna for heliophysics users
by making the registered services automatically
available to workflow developers.
The development of Taverna Server is pro-
ving to be an important outcome of the HELIO
project, as it has attracted much attention from scientists and developers across many disciplines. We provide a solid design that enables the safe execution of workflows. The integration of Taverna Server into the myGrid team's development effort will ensure that its further development stays aligned with new releases of the Taverna Workbench.
We found a way of making the integration of
additional workflows in the HELIO portal an easy
(though not automated) process. That makes it
possible, as scientists develop new workflows, to provide new functionality to portal users simply by editing some configuration files. There is scope
for some automation of this process, but open is-
sues remain in the area of authorization. Because
workflows will be used by unauthenticated users
and run on the Taverna Server, it is necessary to
vet the workflows before making them available
through the portal. This is likely to increase the
administrative overhead of managing the portal.
A key part of this will be ensuring that the meta-
data quality of any workflow to be published
through the portal is of a sufficient standard to support the automatic generation of UI-specific input and output components.
Acknowledgements We want to thank the service
providers of HELIO: Observatoire de Paris, University
College London (MSSL), Trinity College Dublin, IRAP
Toulouse, Istituto Nazionale di Astrofisica (Obs. Trieste),
The University of Manchester, Science and Technology
Facilities Council (RAL) and all the HELIO Consortium.
This work was funded by the European Commission as
part of the Seventh Framework Programme, Project No.
238969.
References
1. Szalay, A., Gray, J.: The world-wide telescope. Science
293(5537), 2037–2038 (2001)
2. Hatziminaoglou, E.: Virtual observatory: science ca-
pabilities and scientific results. In: Tsinganos, K.,
Hatzidimitriou, D., Matsakos, T. (eds.) 9th Interna-
tional Conference of the Hellenic Astronomical Soci-
ety. Astronomical Society of the Pacific Conference
Series, vol. 424, pp. 411 (2010)
3. Tedds, J.A.: Science with the virtual observatory:
the AstroGrid VO desktop. ArXiv:0906.1535 e-prints
(2009)
4. Dalla, S., Walton, N.A.: AstroGrid: the UK's virtual observatory and its solar physics capabilities. In: Walsh,
R.W., Ireland, J., Danesy, D., Fleck, B. (eds.) SOHO
15 Coronal Heating. ESA Special Publication, vol. 575,
p. 577 (2004)
5. Bentley, R., Csillaghy, A., Aboudarham, J., Jacquey,
C., Hapgood, M.A., Bocchialini, K., Messerotti, M.,
Brooke, J., Gallagher, P., Fox, P., et al.: HELIO: the
heliophysics integrated observatory. Adv. Space Res.
47(12), 2235–2239 (2011)
6. Martínez, A.P., Derriere, S., Gray, N., Mann, R., McDowell, J., McGlynn, T., Ochsenbein, F., Osuna, P., Rixon, G., Williams, R.: The UCD1+ controlled vocabulary. IVOA Semantics WG Recommendation
(2005)
7. Ochsenbein, F., Williams, R., Davenhall, C., Durand,
D., Fernique, P., Hanisch, R., Giaretta, D., McGlynn,
T., Szalay, A., Wicenec, A.: VOTable: tabular data for
the virtual observatory. In: Quinn, P., Górski, K. (eds.)
Toward an International Virtual Observatory. ESO
Astrophysics Symposia, vol. 30, pp. 118–123. Springer,
Berlin / Heidelberg (2004). doi:10.1007/10857598_18
8. Stern, B.A.: Interactive data language. In: Proceed-
ings of SPACE 2000: The Seventh International Con-
ference and Exposition on Engineering, Construction,
Operations and Business in Space, p. 1011. American
Society of Civil Engineers, 1801 Alexander Bell Drive,
Reston, VA, 20191-4400, USA (2000)
9. Bentley, R., CASSIS team: Coordination action for the
integration of solar system infrastructures and science.
Project web page. http://cassis-vo.eu/ (2010). Cited 13
April 2012
10. Box, D., Ehnebuske, D., Kakivaya, G., Layman, A.,
Mendelsohn, N., Nielsen, H.F., Thatte, S., Winer,
D.: Simple object access protocol (SOAP) 1.1. http://
www.w3.org/TR/2000/NOTE-SOAP-20000508/ (2000)
11. Fielding, R.: Representational state transfer: an ar-
chitectural style for distributed hypermedia interac-
tion. PhD Thesis, University of California, Irvine (2000)
12. Bose, P., Hurlburt, N., Somani, A., Fox, P.: Collab-
orative virtual sensorweb infrastructure: architecture
and implementation. Online at http://www.ivoa.net/
internal/IVOA/InterOpOct2011GWS/IVOA-Scientific-
Workflows.pdf. In NASA Science Technology
Conference (2007)
13. Unknown: IVOA table access protocol parameterized
query language. Online at http://www.ivoa.net/internal/
IVOA/TableAccess/PQL-0.2-20090520.pdf (2009)
14. Melton, J., Simon, A.R.: Understanding the New SQL:
A Complete Guide. The Morgan Kaufmann Series in
Data Management Systems. Morgan Kaufmann Pub-
lishers (1993)
15. Andrews, T., Curbera, F., Dholakia, H., Goland,
Y., Klein, J., Leymann, F., Liu, K., Roller, D.,
Smith, D., Thatte, S., Trickovic, I., Weerawarana, S.:
BPEL4WS, Business process execution language
for web services version 1.1. IBM. http://download.
boulder.ibm.com/ibmdl/pub/software/dw/specs/ws-bpel
/ws-bpel.pdf (2003)
16. Deelman, E., Blythe, J., Gil, Y., Kesselman, C., Mehta,
G., Patil, S., Su, M.H., Vahi, K., Livny, M.: Pegasus:
mapping scientific workflows onto the Grid. In: Grid
Computing, pp. 131–140. Springer (2004)
17. Breuer, D., Erwin, D., Mallmann, D., Menday, R.,
Romberg, M., Sander, V., Schuller, B., Wieder, P.: Sci-
entific computing with UNICORE. In: NIC Sympo-
sium, vol. 20, pp. 429–440 (2004)
18. Berthold, M.R., Cebron, N., Dill, F., Gabriel, T.R.,
Kötter, T., Meinl, T., Ohl, P., Sieb, C., Thiel, K.,
Wiswedel, B.: KNIME: the Konstanz information
miner. In: Data Analysis, Machine Learning and Ap-
plications, pp. 319–326 (2008)
19. Goecks, J., Nekrutenko, A., Taylor, J., Team, T.G.:
Galaxy: a comprehensive approach for supporting ac-
cessible, reproducible, and transparent computational
research in the life sciences. Genome Biol. 11(8), R86
(2010)
20. Altintas, I., Berkley, C., Jaeger, E., Jones, M.,
Ludascher, B., Mock, S.: Kepler: an extensible sys-
tem for design and execution of scientific workflows.
In: Proceedings. 16th International Conference on Sci-
entific and Statistical Database Management, 2004,
pp. 423–424 (2004)
21. Hull, D., Wolstencroft, K., Stevens, R., Goble, C.,
Pocock, M.R., Li, P., Oinn, T.: Taverna: a tool for build-
ing and running workflows of services. Nucleic Acids
Res. 34(suppl 2), W729 (2006)
22. Peterson, J.L.: Petri Net Theory and the Modeling of Systems. Prentice-Hall, Inc., Englewood Cliffs, NJ (1981)
23. Belhajjame, K., Corcho, O., Garijo, D., Zhao, J.,
Missier, P., Newman, D.R., Palma, R., Bechhofer,
S., Garcia Cuesta, E., Gomez-Perez, J.M., Klyne, G.,
Page, K., Roos, M., Ruiz, J.E., Soiland-Reyes, S.,
Verdes-Montenegro, L., De Roure, D., Goble, C.:
Workflow-centric research objects: a first class citi-
zen in the scholarly discourse. In: Proc. Workshop
on the Semantic Publishing (SePublica), pp. 1–12
(2012)
24. Schaaff, A., Le Petit, F., Prugniel, P., Slezak, E.,
Surace, C.: Workflow working group in the frame
of ASOV. Online at http://www.france-ov.org/twiki/pub/
GROUPEStravail/Workflow/schaaff.pdf (2006)
25. Ohishi, M.: International virtual observatory alliance.
Highlights Astron. 14, 528–529 (2006)
26. Schaaff, A., Ruiz, J.E., et al.: Scientific workflows in
the VO. Online at http://www.ivoa.net/internal/IVOA/
InterOpOct2011GWS/IVOA-Scientific-Workflows.pdf
(2011)
27. Freeland, S.L., Handy, B.N.: Data analysis with the
solarsoft system. Sol. Phys. 182, 497–500 (1998).
doi:10.1023/A:1005038224881
28. Hightower, R.: BeanShell & DynamicJava: Java script-
ing with Java. JAVA developer’s journal. Online at
http://java.sys-con.com/node/36439 (2000). Retrieved
April 2012
29. Le Blanc, A.: Available instruments through DPAS
which are not part of ICS instruments table. In: myEx-
periment Repository. http://www.myexperiment.org/
workflows/2829.html (2012)
30. Le Blanc, A.: Check in UOC which instruments were
observing at a given time period and place. In: myEx-
periment Repository. http://www.myexperiment.org/
workflows/2822.html (2012)
31. Pérez-Suárez, D., Maloney, S.A., Higgins, P.A.,
Bloomfield, D.S., Gallagher, P.T., Pierantoni, G.,
Bonnin, X., Cecconi, B., Alberti, V., Bocchialini, K.,
Dierckxsens, M., Opitz, A., Blanc, A., Aboudarham,
J., Bentley, R.B., Brooke, J., Coghlan, B., Csillaghy,
A., Jacquey, C., Lavraud, B., Messerotti, M.: Study-
ing SunPlanet connections using the heliophysics in-
tegrated observatory (HELIO). Sol. Phys. 280(2),
603–621 (2012)
32. Le Blanc, A.: Co-rotating interaction regions backwards propagation. In: myExperiment Repository. http://
www.myexperiment.org/workflows/2817.html (2012)
33. Parker, E.N.: Dynamics of the interplanetary gas and
magnetic fields. Ap. J. 128, 664 (1958)
34. Le Blanc, A., Miteva, R.: Associate SEP events at Earth with flare, CME and radio events on the Sun. In:
myExperiment Repository. http://www.myexperiment.
org/workflows/2815.html (2012)
35. Paskin, N.: Digital Object Identifier (DOI®), chapter
114, pp. 1–12. Taylor & Francis (2011)
36. Berners-Lee, T., Fielding, R., Masinter, L.: Uniform
resource identifiers (URI): generic syntax. In: Obso-
leted by RFC 3986, updated by RFC 2732. Internet
Engineering Task Force, IETF, vol. 2396. http://www.
ietf.org/rfc/rfc2396.txt (1998)
37. Goble, C.A., De Roure, D.C.: myExperiment: social
networking for workflow-using e-scientists. In: Pro-
ceedings of the 2nd Workshop on Workflows in Sup-
port of Large-Scale Science, pp. 1–2. ACM (2007)
38. Gregorio, J., de hOra, B.: RFC: 5023 the atom publish-
ing protocol. In: IETF Requests For Comments (2007)
39. Saint-Andre, P.: RFC 6120: extensible messaging and
presence protocol (XMPP): core. In: IETF Requests
For Comments (2004)
40. Perry, J.S., Denn, R.: Java Management Extensions.
O’Reilly & Associates, Inc. (2002)
41. Vicario, S., Hardisty, A., Haitas, N.: BioVeL: biodiversity virtual e-laboratory. EMBnet.journal 17(2), 5
(2011)
42. Murty, J.: Programming Amazon Web Services: S3,
EC2, SQS, FPS, and SimpleDB. O’Reilly Media, Incor-
porated (2008)
43. Balani, N., Hathi, R.: Apache CXF Web Service De-
velopment. Packt Publishing (2009)
44. Johnson, R., Hoeller, J., Arendsen, A., Risberg, T.,
Kopylenko, D.: Professional Java Development with
the Spring Framework. Wrox Press Ltd. (2005)
45. Chinnici, R., Hadley, M.: JSR 224: Java API for XML-
Based Web Services (JAX-WS) 2.0. Java Community
Process (2006)
46. Hadley, M., Sandoz, P.: JSR 311: JAX-RS: Java API for
RESTful Web Services (version 1.1). Java Community
Process (2009)
47. Napier, R.A.: Secure automation: achieving least priv-
ilege with SSH, Sudo and Setuid. In: 18th Large
Installation System Administration Conference, pp. 203–
212 (2004)
48. Maassen, J., van Nieuwpoort, R., Veldema, R., Bal,
H.E., Plaat, A.: An efficient implementation of Java’s
remote method invocation. SIGPLAN Not. 34(8), 173–
182 (1999)
49. The Legion of the Bouncy Castle: Bouncy Cas-
tle Crypto APIs for Java. Online at http://www.
bouncycastle.org/java.html (2007–2012)
50. Pierantoni, G., Kenny, E., Coghlan, B.: The architecture of HELIO. In: Bubak, M., Turala, M., Wiatr, K. (eds.) CGW'10 Proceedings of the Krakow Grid Workshop, pp. 84–91. ACC CYFRONET AGH (2011)
51. Pierantoni, G., Kenny, E., Coghlan, B.: The use of standards in HELIO. Comp. Sci. 13(2), 93–102 (2012)
52. Sroka, J., Hidders, J., Missier, P., Goble, C.: A
formal semantics for the Taverna 2 workflow
model. J. Comput. Syst. Sci. 76(6), 490–508
(2010)
53. Driver, M., Valdes, R., Phifer, G.: Rich Internet Ap-
plications are the Next Evolution of the Web. Gartner
Research (2005)
54. Flanagan, D.: JavaScript: The Definitive Guide.
O’Reilly (1998)
55. Garrett, J.J.: Ajax: a new approach to web applications.
Blog posting, available online at http://www.adaptive
path.com/ideas/ajax-new-approach-web-applications
(2005). Downloaded June 2011
56. Bibeault, B., Katz, Y.: jQuery in Action. Manning Pub-
lications Co. (2008)
57. Spielman, S.: The Struts Framework: Practical Guide
for Java Programmers. Morgan Kaufmann Pub
(2002)