
Enabling Data Transport between Web Services through alternative protocols and Streaming

December 2008

DOI:10.1109/eScience.2008.127

Conference: Fourth International Conference on e-Science, e-Science 2008, 7-12 December 2008, Indianapolis, IN, USA

Authors:

Spiros Koulouzis

University of Amsterdam

A. S. Z. Belloum

University of Amsterdam

A significant challenge for Web services is to find reliable and efficient methods to transfer large data sets between them. This paper describes the problem of scalable data transport between Web services, and proposes a solution: the development of a modular server/client library that uses SOAP as a control channel while the actual data transport is accomplished by various protocol implementations. Apart from file transport, the proposed approach offers the facility of direct data streaming between Web services, an approach that could benefit workflow execution time.


Spiros Koulouzis

skoulouz@science.uva.nl

Edgar Meij

E.J.Meij@uva.nl

M. Scott Marshall

S.Marshall@uva.nl

Adam Belloum

A.S.Z.Belloum@uva.nl

Informatics Institute

ISLA

University of Amsterdam

Abstract

As web services gain acceptance in the e-Science community, some of their shortcomings have begun to appear. A significant challenge is to find reliable and efficient methods to transfer large data sets between web services. This paper describes the problem of scalable data transport between web services, and proposes a solution: the development of a modular Server/Client library that uses SOAP as a control channel while the actual data transport is accomplished by various protocol implementations, as well as a simple API that developers can use for data-intensive applications. Apart from file transport, the proposed approach offers the facility of direct data streaming between web services, an approach that could benefit workflow execution time by creating a data pipeline between web services. Finally, the performance and usability of this library are evaluated using the indexing application that the Adaptive Information Disclosure Application (AIDA) Toolkit offers as a Web Service.

1 Introduction

Web services offer an appealing paradigm for developing scientific applications by providing interoperability and flexibility in a large-scale distributed environment. Through the use of XML-based protocols (SOAP) and interfaces (WSDL), web services can expose all or part of any application in a language-independent fashion across heterogeneous platforms. Those features enable them to be combined in a loosely coupled way so that more complex operations may be achieved [23].

Scientific applications can be created from workflows by combining and coordinating a set of web services so that a more complex goal is met. In other words, a workflow brings together web services (and/or applications) in a consistent manner to provide a description of the execution of a higher-level application.

Currently, two approaches exist in workflow implementation: Service Orchestration and Service Choreography. Service Orchestration (shown in Figure 1) describes how web services can interact at the message level, including the application logic and execution order of the functionality exposed by the WSDL that a web service provides [6,17]. In orchestration, the process is always controlled by a workflow engine, so all invocations (and replies) are made by (and to) that engine. Choreography (also shown in Figure 1), on the other hand, is more collaborative, because it describes the message exchange among interacting, yet independent, web services [1]. Regardless of which architecture is chosen, any workflow execution can effectively be reduced to three stages: (i) generating or obtaining data, (ii) processing that data, and (iii) transferring or storing the results. Usually, each of those stages is performed by an individual web service.

This scenario, however, presents a data transport problem. During orchestration, any call to a producing web service results in a reply back to the workflow engine, which then needs to transfer the data to a consuming web service. This results in unnecessary “data hops” between the web service and the workflow engine. In the case of choreography, although data is delivered directly to the consuming web services, this is usually done through SOAP. SOAP, although suitable for service invocation, can be inefficient for data transport. Significant performance issues for web services can occur when binary data must be encoded in XML, which measurably slows down applications and absorbs bandwidth [18].

The problems mentioned above are typically addressed by the introduction of a third-party file transfer. For example, in grid environments data is moved around with the help of GridFTP and the Reliable File Transfer (RFT) service [5]. This approach, however, might present another problem. Consider the three stages of the workflow mentioned above and the simple workflow shown in Figure 2. WS1 would generate data, save it locally and pass a file reference to WS2. WS2 would process the data received, save the result locally, pass the file reference to WS3, and so on until the final result is obtained. In this process, temporary files are saved at each web service location, which results in an unnecessary demand on storage resources and a slower execution time of the workflow.

Figure 1. (left) Service Orchestration: web service calls are always controlled by a workflow engine. (right) Service Choreography: describes the message exchange among interacting web services.

Figure 2. A simple workflow where web services exchange data through file exchange. This approach results in unnecessary temporary files.

An example of an application that follows this scenario is the Montage toolkit [14]. This application was developed to assemble science-grade mosaics by composing multiple astronomical images located at distributed file repositories. In other words, it integrates multiple images taken from different parts of a galaxy in order to produce one representation of that galaxy. Apart from the complex algorithm that ensures that the separate images fit together while preserving some vital data, the workflow of this application, seen in Figure 3, produces some intermediate images that go on to further processing until they are composed into the final image.

Another application that fits into the simplified workflow execution mentioned above is the one proposed in [10]. In this application, web services are used for scientific visualization, where the visualization pipeline¹ is broken down into a number of web services.

The data transport problem that web services face can be summarised in the following way:

1. In service orchestration, all data is passed to the workflow engine before delivery to a consuming web service.

2. Data transfers are made through SOAP, which is unfit for large data transfers.

3. Third-party file transfer is suitable for transferring large data sets, but is restricted to files. This results in unnecessary intermediate transfers that slow down workflow execution and place excessive demand on storage resources.

In order to address the above problems, this paper introduces the use of streaming between web services. First, it employs the approach of delivering data directly to a consuming web service with alternative protocols to SOAP, thus addressing the first two problems. Second, it describes streaming as a way to deliver data to a web service without the need for intermediate file transfers.

Our Streaming library enables web services to stream data directly to each other, creating a data pipeline (such as the one seen in Figure 4) that speeds up workflow execution and eliminates the need for allocating space on local disks. This library is realised by a simple client/server design that is contained in a web service and can take care of transfers using multiple protocols. Since it is essential that its use is as simple as possible, it provides a very simple API to web services, which can use it like any other I/O stream, without having to worry about the transfer itself.

¹ A visualization pipeline entails the process followed for generating images from data.

Figure 3. The Montage abstract workflow. In this workflow, there are three intermediate file transfers.

The remainder of this paper is organized as follows. Section 2 gives an overview of the various protocols web services have at their disposal for data transport. Section 3 describes our Streaming library and how web services may use it for file transfer or direct streaming. Section 4 describes the experimental setup, along with an overview of AIDA, an application that makes use of our library. Section 5 provides performance measurements of our library under two use cases. Finally, Sections 6 and 7 present future work and conclusions. The implementation and its specifics are discussed in the following sections.

2 Data Transport Methods

This section presents an overview of currently available transport methods and protocols that web services can use, for either file transfer or direct data streaming.

2.1 SOAP

SOAP is an XML-based protocol for exchanging messages in decentralized, distributed environments, and can potentially be used over a variety of protocols, although the most common implementation is over HTTP(S). SOAP forms the basic messaging framework upon which web services may communicate, through several different types of messaging patterns. One of these patterns is the Remote Procedure Call (RPC), in which a client sends a request message to a web service. After processing that request, the web service sends back a response to the client. A SOAP message consists of three parts: (i) a mandatory envelope element, which is the top element of the XML document, (ii) an optional SOAP header that defines how a recipient should process the message, and (iii) a mandatory SOAP body element that contains the actual message, as it represents remote procedure calls and responses [8,3].

Figure 4. A simple workflow where web services communicate through a data pipeline.

Using SOAP to transfer data between web services is probably the most straightforward approach to sending and receiving data, because no extra library development is required. In particular, current implementations (e.g. Axis) provide a solid framework for message delivery and error handling. Additionally, SOAP is platform independent: as long as the data types sent are supported by XML or have the appropriate schema definition, they can be handled by any application. Another advantage of using SOAP is that it makes it possible to add metadata to the message. Nevertheless, SOAP can introduce a significant overhead, which slows down its overall performance as a data transport protocol. In addition to the overhead of the additional message size, the serialization and deserialization of a SOAP message may cause additional performance delays [19].
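
To make the message-size overhead concrete, the following minimal Java sketch measures how a binary payload grows when it is Base64-encoded for inclusion in an XML body. It assumes Base64 as the binary-to-text encoding (the encoding behind XML Schema's base64Binary type); the paper does not state which encoding was used in its SOAP tests. The class name `SoapPayloadOverhead` and the `<data>` element are hypothetical.

```java
import java.util.Base64;

// Illustrative only: shows the size cost of carrying binary data as text in XML.
class SoapPayloadOverhead {

    // Number of bytes of Base64 text produced for a payload of the given size.
    static int encodedSize(int payloadBytes) {
        byte[] payload = new byte[payloadBytes];
        return Base64.getEncoder().encode(payload).length;
    }

    // Wraps the encoded payload in a minimal SOAP 1.1 envelope.
    // The <data> element is a hypothetical placeholder for a service-specific type.
    static String wrap(byte[] payload) {
        String encoded = Base64.getEncoder().encodeToString(payload);
        return "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                + "<soapenv:Body><data>" + encoded + "</data></soapenv:Body>"
                + "</soapenv:Envelope>";
    }

    public static void main(String[] args) {
        int raw = 3_000_000;
        int encoded = encodedSize(raw);
        System.out.printf("raw=%d bytes, base64=%d bytes, ratio=%.2f%n",
                raw, encoded, (double) encoded / raw);
    }
}
```

Base64 maps every 3 input bytes to 4 output characters, so the payload alone grows by about a third before the envelope and any parsing cost are added.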

2.2 SOAP with Attachments

SOAP with Attachments (SwA) is an abstract model for SOAP that enables the transport of binary formatted data along with a SOAP envelope. The SOAP message is still described by the three parts mentioned in Section 2.1, but with the addition of one or more Multipurpose Internet Mail Extensions (MIME) parts that are not defined in the SOAP envelope but are related to the message. Each MIME part contains some header information that may be used for identifying the type of the embedded data, the encoding used for this MIME part, and the content location [21]. An alternative specification to MIME is the Direct Internet Message Encapsulation (DIME). The attached data in a DIME message is defined as a payload, and a single message may contain multiple payloads. The contents of DIME messages are defined by records. Each record specifies the payload size in bytes, the payload content type, and other information [22]. SwA has a clear advantage over SOAP because attachments do not require deserialization, while the message retains SOAP's advantages. Additionally, SwA can handle large data sets [22], but requires significant storage space for handling the attachment (at least in the Axis implementation).

Figure 5. The Streaming library architecture. Dotted lines represent possible SOAP calls or byte transfer paths.

2.3 HTTP

Both SOAP and SwA are, in most cases, transmitted over HTTP, which is a well-established and stable protocol that defines a set of headers for exchanging data. An abundance of high-quality software is available that supports high-speed transfers of large data sets over HTTP, without having to worry about firewalls. Furthermore, in the context of web services, Tomcat containers already handle HTTP communications, thus providing a solid framework for exploiting this facility. On the downside, HTTP does not provide any facilities for handling metadata, nor does it specify how the data should be handled on the recipient side. Thus HTTP may require some additional effort in implementing extensions that could deliver data to web services.

2.4 GridFTP

GridFTP is a protocol developed specifically for the grid environment. It is an extension of FTP that supports security using the public-key-based Grid Security Infrastructure (GSI) and Kerberos, parallel data transfer using multiple TCP streams, striped data transfer, automatic adjustment of the TCP window size, and data transfer monitoring [15,4]. Although GridFTP has set the standard for data transfers in grid environments, it is not usable in a web services scenario, where the participants form a loose collection of services rather than a structured Virtual Organization [12]. In other words, most web service owners do not offer this kind of infrastructure. Another problem with using GridFTP is that developers would be restricted to this particular type of file transfer, and would also have to implement a client that requests files from a GridFTP server.

2.5 TCP

TCP is a low-level protocol on top of which all of the previously mentioned protocols are implemented. It is the oldest and most established of these protocols, providing reliable and in-order delivery of data, which makes it suitable for a wide range of applications such as the File Transfer Protocol, Secure Shell, and some streaming media applications. In order to offer reliability and in-order delivery, TCP assigns a sequence number to each packet. This number is used by the receiving end for ordering packets. The receiving side also sends back an acknowledgement for packets that have been successfully received. Apart from retransmitting unacknowledged packets, TCP also checks that no bytes are corrupted by using a checksum. TCP comes with a variety of tuning options that enable it to achieve full bandwidth utilization and minimum delay [2,13]. Being a low-level protocol, TCP should be ideal for transporting large volumes of data without any higher-level protocol overhead, but it has no facility for metadata. Furthermore, it requires custom development in order to be incorporated in a web service. Extra attention to its tuning properties (e.g. appropriately configuring TCP buffer sizes) is necessary in order to achieve its full potential.
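
As an illustration of the buffer tuning mentioned above, the sketch below sizes socket buffers from the bandwidth-delay product, a common rule of thumb that is not taken from the paper. The link speed and round-trip time are assumed values, and `TcpBufferTuning` is a hypothetical helper, not part of the Streaming library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

class TcpBufferTuning {

    public static void main(String[] args) throws IOException {
        // Assumed link: 1 Gbit/s with a 10 ms round-trip time.
        long bdp = bandwidthDelayProductBytes(1_000_000_000L, 0.010);
        try (Socket socket = newTunedSocket((int) bdp)) {
            System.out.println("requested " + bdp + " bytes, receive buffer granted: "
                    + socket.getReceiveBufferSize());
        }
    }

    // Bandwidth-delay product in bytes: the amount of data that must be
    // in flight to keep a link of the given speed and latency full.
    static long bandwidthDelayProductBytes(long bitsPerSecond, double roundTripSeconds) {
        return (long) (bitsPerSecond / 8.0 * roundTripSeconds);
    }

    // Requests send and receive buffers of the given size on an unconnected socket.
    // The operating system treats these values as hints and may clamp them.
    static Socket newTunedSocket(int bufferBytes) {
        try {
            Socket socket = new Socket();
            socket.setReceiveBufferSize(bufferBytes);
            socket.setSendBufferSize(bufferBytes);
            return socket;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The buffers are requested before the socket is connected because the receive buffer size influences the window negotiated when the connection is set up.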

Figure 6. A use case for indexing documents, while using the Streaming Library. The documents are first transferred to the IndexerWS and then indexed locally.

3 The Streaming Library

In an effort to address the data transport problem, we have developed a library in Java that employs the protocols mentioned in Section 2. Our library provides a simple API that developers and legacy web services can use to transport data. The Streaming library (seen in Figure 5) is a modular client/server design that uses SOAP as a control channel while the actual data transport is accomplished by the various protocol implementations, which are developed as plug-ins. The basic components that make up the library are:

3.1 Connector

This module provides the functionality offered by the Streaming client/server to a consuming/producing web service, in the form of a standard I/O stream. Alternatively, this module may use these streams to send data from, or store data to, a local disk.
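
The idea of exposing a transfer as a standard I/O stream can be sketched with Java's piped streams. This is an in-process stand-in, not the library's Connector: the class and method names are hypothetical, and a real transfer would cross the network instead of a pipe.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// The producer only sees an OutputStream and the consumer only sees an
// InputStream; no temporary file is written between them.
class StreamPipelineSketch {

    public static void main(String[] args) {
        System.out.println("consumed " + produceAndConsume(1000) + " records");
    }

    // Produces `records` lines on one thread while the caller consumes them.
    static int produceAndConsume(int records) {
        try {
            PipedOutputStream toConsumer = new PipedOutputStream();
            PipedInputStream fromProducer = new PipedInputStream(toConsumer);

            Thread producer = new Thread(() -> {
                // Closing the writer signals the end of the data to the reader.
                try (PrintWriter out = new PrintWriter(
                        new OutputStreamWriter(toConsumer, StandardCharsets.UTF_8))) {
                    for (int i = 0; i < records; i++) {
                        out.println("record " + i);
                    }
                }
            });
            producer.start();

            int consumed = 0;
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(fromProducer, StandardCharsets.UTF_8))) {
                while (in.readLine() != null) {
                    consumed++; // a record can be processed as soon as it arrives
                }
            }
            producer.join();
            return consumed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Because both sides program against plain streams, the same producer and consumer code could be pointed at a file or at a network transport without change, which is the property the Connector is described as offering.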

3.2 Streaming Client/Server

The role of these components is fairly simple. The server only needs to send data received from the Connector module to a connected client, while the client passes the received data to the Connector. The details of authentication and of the actual transfer are left to the connection component.

3.3 Connection

We abstract the connection component, such that different protocol implementations may be added to the design. For this reason, it is assumed that a connection offers a read and a write method, as well as one that enables the authentication and/or the encryption of the channel used for the transfer.

Figure 7. A use case for indexing documents, while using streaming. The contents extracted from documents are directly streamed to the IndexerWS.
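
A minimal sketch of such a connection abstraction is given below. The interface name, method signatures and the `TcpConnection` plug-in are assumptions made for illustration; the paper only states that a connection offers read, write and authentication/encryption operations. The demo sends a small payload over the loopback interface.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class ConnectionSketch {

    public static void main(String[] args) {
        System.out.println(new String(loopback("hello".getBytes())));
    }

    // Sends a small payload from a server-side connection to a client-side one.
    // Single-threaded, so only suitable for payloads that fit in the socket buffers.
    static byte[] loopback(byte[] data) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Connection client = new TcpConnection(
                     new Socket(server.getInetAddress(), server.getLocalPort()));
             Connection serverSide = new TcpConnection(server.accept())) {
            serverSide.write(data, 0, data.length);
            serverSide.close(); // closing tells the client that the stream has ended
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            for (int n = client.read(buffer); n != -1; n = client.read(buffer)) {
                received.write(buffer, 0, n);
            }
            return received.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical plug-in contract: one implementation per transport protocol.
interface Connection extends AutoCloseable {
    int read(byte[] buffer) throws IOException;   // returns -1 at end of stream
    void write(byte[] data, int offset, int length) throws IOException;
    boolean authenticate() throws IOException;    // may also set up encryption
    @Override
    void close() throws IOException;
}

// Plain TCP plug-in without authentication or encryption.
class TcpConnection implements Connection {
    private final Socket socket;
    private final InputStream in;
    private final OutputStream out;

    TcpConnection(Socket socket) throws IOException {
        this.socket = socket;
        this.in = socket.getInputStream();
        this.out = socket.getOutputStream();
    }

    @Override
    public int read(byte[] buffer) throws IOException {
        return in.read(buffer);
    }

    @Override
    public void write(byte[] data, int offset, int length) throws IOException {
        out.write(data, offset, length);
        out.flush();
    }

    // Nothing to negotiate for plain TCP, so authentication trivially succeeds.
    @Override
    public boolean authenticate() {
        return true;
    }

    @Override
    public void close() throws IOException {
        socket.close();
    }
}
```

With this shape, a secured transport would only need another class implementing the same interface, leaving the client and server code untouched.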

3.4 Establishing a Data Stream

Having described the basic components that make up the design, it is time to see how these components interact to establish a data stream between two web services:

1. The workflow engine invokes the producing web service. That service starts producing data and passes it to the Connector, which starts a Streaming Server.

2. At this point, there are two alternatives: (i) the producing web service might respond back to the workflow engine with a reference to the data stream², or (ii) pass the reference directly to the consuming web service as a SOAP call. In the first case, it is the responsibility of the workflow engine to invoke the consuming method of the second web service, which will use the Connector module to connect to the server and get the data. In the second case, the producing web service will contact the consuming web service, making the same call. For this second case, the workflow engine should provide the consuming endpoint to the producing web service.

3. If the protocol implementation provides authentication and/or encryption mechanisms and the stream reference instructs the client/server to use them, the connection between them first has to authenticate each side. In the case where no authentication and/or encryption is required, the connection is established and the client starts receiving the data stream.

4. As soon as the producing web service is done producing data, it simply closes the stream provided by the Connector. As a consequence, the Connector must close the connection between server and client. The specific approach to closing the connection depends on the protocol implementation.

5. Finally, the producing web service should return information about the transfer to the workflow engine, such as any errors that might have occurred during that transfer.

² This reference is an XML document, containing the server's address, port, and protocol-specific configurations.

Figure 8. Speed measured in MB/sec, for file transport for GridFTP, HTTP, TCP, SwA and SOAP.
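
Footnote 2 describes the stream reference as an XML document holding the server's address, port and protocol-specific configuration. The paper does not give its schema, so the element and attribute names below are hypothetical; the sketch only shows how such a reference could be serialized by the producer and parsed by the consumer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical value object for the reference exchanged over the SOAP control channel.
class StreamReference {
    final String protocol;
    final String host;
    final int port;

    StreamReference(String protocol, String host, int port) {
        this.protocol = protocol;
        this.host = host;
        this.port = port;
    }

    public static void main(String[] args) {
        String xml = new StreamReference("tcp", "producer.example.org", 9000).toXml();
        StreamReference parsed = fromXml(xml);
        System.out.println(parsed.protocol + "://" + parsed.host + ":" + parsed.port);
    }

    // Serializes the reference. Values are not XML-escaped, which is acceptable
    // here only because host names and protocol labels contain no markup characters.
    String toXml() {
        return "<streamReference protocol=\"" + protocol + "\">"
                + "<address>" + host + "</address>"
                + "<port>" + port + "</port>"
                + "</streamReference>";
    }

    // Parses a reference produced by toXml(); rejects anything malformed.
    static StreamReference fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            String host = root.getElementsByTagName("address").item(0).getTextContent().trim();
            int port = Integer.parseInt(
                    root.getElementsByTagName("port").item(0).getTextContent().trim());
            return new StreamReference(root.getAttribute("protocol"), host, port);
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid stream reference", e);
        }
    }
}
```

In the first alternative of step 2 the workflow engine would forward this document to the consumer; in the second, the producer would send it in its own SOAP call.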

3.5 Streaming web services vs. File-oriented web services

As mentioned in Section 1, one of the problems web services face while transferring data using third-party file transfers is the fact that temporary intermediate files must be stored as a side effect of data transport. Streaming data directly to web services could solve this problem, but there are a number of issues to be considered before adopting that approach. The main hurdle to adopting streaming comes from the nature of the application a web service implements. If, for example, the application is designed to operate on large files that are loaded, processed and passed to the next web service, streaming might not be applicable. This is because the loading and processing overhead might be significantly larger than the actual transfer itself. If this is the case, streaming only saves storage capacity on the computational nodes and does not improve workflow execution time dramatically. Furthermore, when reliability is an issue, streaming could also prove inefficient. Consider the example where a web service provides data from a scientific instrument (e.g. a telescope or a scanner) and uses streaming to deliver that data to a consuming web service, which consumes it directly. If the data stream is broken for any reason, all the data produced so far would be lost, so resuming the consuming web service from the last known good checkpoint would be impossible. Another case where streaming is not applicable is when a consuming web service must obtain the entire data set before operating on it. Nevertheless, when a web service is designed to produce live data (e.g. cluster loads or stock prices), streaming would significantly benefit the workflow execution time.

Figure 9. Speed measured in MB/sec, for file transport in a tuned TCP Java implementation, an HTTP implementation, and an untuned TCP implementation (the current implementation of the library).

4 Experimental Setup

The Adaptive Information Disclosure Application (AIDA) Toolkit is a generic set of components that can perform a variety of tasks, such as learning new pattern recognition models, performing specialized search on resource collections, and storing knowledge in a repository. AIDA provides a set of components which enable the indexing of text documents in various formats, as well as their subsequent retrieval given a query. The Indexer and Search components are both built upon Apache Lucene, version 2.1.0 [16]. AIDA's Indexer component, called IndexerWS, is a web service which is able to index³ a variety of document formats while taking care of the preprocessing (the conversion, tokenization, and possible normalization) of the text of each document as well as the subsequent index generation. The so-called “DocumentHandlers”, which handle the actual conversion of each source file, are loaded at runtime, so a handler for any other proprietary document encoding can be created and directly used [20]. The task of the IndexerWS can potentially be data intensive, in which case SOAP is not able to meet the IndexerWS's demands. To enable the IndexerWS to index a set of documents, the Streaming library was utilized for two use cases:

1. A set of documents is obtained by a producer web service (e.g. from a database), and transferred to the IndexerWS for indexing.

2. A PDF DocumentHandler is implemented as a web service, for extracting text from a set of PDF files. This text is streamed directly to the IndexerWS for indexing.

³ Indexing is the process of analysing and extracting content from a document. These contents are stored in an index in order to optimize the speed and performance of finding documents relevant to a search query.

Figure 10. Workflow execution time for various file sizes. The first two bars for every file size represent the execution time for file transport with HTTP and TCP, respectively.

For the first case, illustrated in Figure 6, a Data Transport web service was developed that is able to transfer a set of files to the IndexerWS location, in a third-party file transfer manner, using the API provided by the Streaming library and Axis. For the second case, seen in Figure 7, the PDF DocumentHandler and the IndexerWS used the APIs to stream data between them.

In order to assess the performance of each protocol and transfer method in the two use cases described above, the IndexerWS and the DataTransportWS have been deployed⁴ on the DAS-3 Distributed Supercomputer [11]. For the DataTransportWS, four protocols were tested in terms of speed. This metric was acquired for disk-to-disk transfers. For the IndexerWS, two simple workflows were developed. The first workflow transfers a set of PDF files from a producer location to the IndexerWS, which is then invoked to start the indexing process. The second workflow invokes a PDF DocumentHandler that extracts the content from a set of PDFs and streams it directly to the IndexerWS. These two workflows were measured in terms of execution time⁵.

⁴ All web services were deployed in Axis 1.4 running in Apache Tomcat 6.0.16.

Figure 11. Workflow execution time for generating various file sizes. The first two bars for every file size represent the execution time for file transport with HTTP and TCP, respectively.

5 Results and Discussion

This section describes the performance results we obtained using our proposed approach on the two tasks: file transport and streaming transport.

5.1 File Transport

Figure 8 shows the transport speed for the protocols described in Section 2. As expected, GridFTP outperforms all of the other protocols when transferring larger files, as it is by now an optimized, mature application. However, SwA and HTTP are faster than GridFTP for file sizes of 109.47 and 259.48 MB. This could be explained by the fact that HTTP, and by extension SwA, has zero start-up time, since many web servers (in this case Tomcat) do not close the connection after the first file request, resulting in better performance for the subsequent transfer. For the remainder of the files, HTTP exhibits lower speeds. This is probably because disk I/O starts to introduce a significant overhead (at least in the library's current implementation). Disk I/O latency affects SwA more than any other protocol, since SwA first has to save the attachment to the local disk, and then save it again with the specified name. As a result, it could be said that SwA misuses storage resources.

Although one would expect TCP to have at least the same performance as HTTP, this is not the case. Start-up time is probably the cause of the speed reduction: our TCP implementation uses a separate server to transfer data, and its start-up time is approximately 60 msec. Additionally, the lack of TCP tuning, together with the blocking reads and writes, is responsible for this suboptimal performance. This may be seen in Figure 9, where a simple Java TCP file transfer is compared to our implementation, as well as to HTTP. The results for this simple Java TCP file transfer were obtained by trying various sizes for the TCP buffer. Furthermore, the performance of this simple file transfer was even better when the delay introduced by reading the file was eliminated⁶. Thus, if the read/write of a file were done in a non-blocking way, the TCP performance would reach the expected levels.

SOAP's performance is dramatically lower than that of any other protocol. This is because SOAP introduces a significant amount of overhead in each message. Additionally, in order to prevent SOAP from crashing, “chunking” had to be introduced. In this approach, the file is sent in small chunks of data (approximately 15 MB), which prevents crashes but also prevents full bandwidth utilization. Finally, all protocols except SOAP exhibit some speed reduction for the file size of 1891.13 MB. In this case the transfer concerns five separate files, thus introducing the latency of five discrete requests, a problem that is identified and addressed in [9].

⁵ In all tests, the producer was located at the cluster site at the University of Amsterdam, while the consumer was at the Vrije Universiteit, also located in Amsterdam.
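
The chunking workaround described above can be sketched as follows. `ChunkedSender` and its `send` callback are hypothetical: the callback stands in for one SOAP invocation, and the chunk size is a parameter rather than the roughly 15 MB used in the experiments.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.function.Consumer;

class ChunkedSender {

    public static void main(String[] args) {
        byte[] file = new byte[10_000];
        int chunks = sendInChunks(new ByteArrayInputStream(file), 4_096,
                chunk -> System.out.println("sending " + chunk.length + " bytes"));
        System.out.println(chunks + " chunks sent");
    }

    // Reads `source` in pieces of at most `chunkSize` bytes and hands each piece
    // to `send`. Only one chunk is held in memory at a time.
    // Returns the number of chunks sent.
    static int sendInChunks(InputStream source, int chunkSize, Consumer<byte[]> send) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        int chunks = 0;
        try {
            int filled;
            // readNBytes (Java 9+) fills the buffer unless the stream ends first,
            // and returns 0 once no data is left.
            while ((filled = source.readNBytes(buffer, 0, chunkSize)) > 0) {
                send.accept(Arrays.copyOf(buffer, filled));
                chunks++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }
}
```

Each chunk costs one complete request and response, so the per-message overhead discussed above is paid once per chunk, which is consistent with chunking limiting the achievable transfer speed.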

5.2 Streaming Transport

As mentioned earlier, two simple workflows were developed for measuring workflow execution time in file and streaming transport. Figure 10 shows the execution time of the complete indexing operation (obtain the contents, analyze them, and save them to an index). For each file size, the first two measurements concern the case where the PDFs are transferred to the IndexerWS and then indexed (file transfer). The rest of the measurements were obtained by streaming the PDF contents directly to the IndexerWS. Although even SOAP streaming performs better than any of the file transfers, the difference in execution time is no more than 20 sec in a total execution time of approximately 140 sec. This small improvement may be attributed to the time needed to load a PDF file. When the time necessary to load the PDFs is eliminated, the performance of streaming is much better than that of file transport. In Figure 11, execution times were obtained by having a producing web service generate a simple text file and send it, for the file transport case, while for the streaming case the text was streamed directly to the IndexerWS. All protocols except SwA outperform the HTTP and TCP file transfers. SwA's low performance may again be blamed on the way it handles messages. SwA gets slower as more attachments are added to the SOAP message, and because the attachments are first saved to the local disk and then read by the consumer, SwA proves less efficient than just transferring the file and then indexing it. GridFTP was not compared in this scenario, as the nodes able to run a Tomcat container did not offer a GridFTP server, while the ones offering a GridFTP server could not run a Tomcat container or did not have adequate storage space.

⁶ This was done by just reading data from /dev/zero.

6 Future Work

The work presented in [7] proposes an interesting ap-

proach in workﬂow development. More speciﬁcally the in-

troduction of a proxy web service in the vicinity of produc-

ing web service would make sure data delivery directly to

a consuming web service. The combination of the Stream-

ing library and this proxy web service, could enable legacy

web services to exchange large data sets, while using the

most appropriate protocol for optimizing performance. This

approach, however, calls for some investigation of whether

SOAP is fast enough at delivering data to web services lo-

cated in the same container. Another approach that would

enable legacy web services to transfer large data sets, could

be the extension of Axis through some plug-in, that would

transform a SOAP message containing data to a stream ref-

erence, thus using an alternative route for data transport.

Because HTTP offers an attractive solution for file transport, there is a clear need for a mechanism that would also enable direct data streaming. Additionally, integrating an XML header into streams would enable developers to include more information (e.g., metadata) in those streams. Another benefit of including XML headers in the stream would be the potential to optimize streaming performance for multiple file transfers, because the current implementation of the Streaming library starts a new connection for every file request. On a higher level, introducing a registry service that producing web services could use to register stream and file references would allow more flexible workflow designs in terms of data transport.
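
One way an XML header could allow several files to share a single connection is sketched below in Python. The framing is an assumption made for illustration and is not part of the Streaming library: each file is preceded by a 4-byte length prefix and a small XML header whose `name` and `size` attributes are hypothetical. An `io.BytesIO` buffer stands in for the network connection.

```python
import io
import struct
import xml.etree.ElementTree as ET


def write_frame(stream, name, payload):
    """Write one file as: header length, XML header, raw payload."""
    header = ET.Element("file", {"name": name, "size": str(len(payload))})
    header_bytes = ET.tostring(header, encoding="utf-8")
    stream.write(struct.pack(">I", len(header_bytes)))  # 4-byte big-endian length
    stream.write(header_bytes)
    stream.write(payload)


def read_frames(stream):
    """Yield (name, payload) pairs until the stream is exhausted.

    Assumes read(n) returns n bytes unless the stream has ended, which
    holds for BytesIO; a socket would need a loop around recv().
    """
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:
            return
        (header_len,) = struct.unpack(">I", prefix)
        header = ET.fromstring(stream.read(header_len))
        size = int(header.get("size"))
        yield header.get("name"), stream.read(size)
```

Because every frame announces its own size, the reader knows where one file ends and the next begins, so the connection does not have to be closed to signal the end of a file. Further metadata could be carried as additional attributes or child elements of the header.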

7 Conclusions

We have identified and addressed the problem of transferring large data sets between web services. We have described a modular and extensible Streaming library with a simple API that is able to transfer large files, as well as connect web services in a continuous data pipeline as an alternative approach to data transport. In our proposed approach, SOAP is used as a control channel, while data is transferred using the most suitable protocol for either file or streaming transfers. In addition, the use of streaming could potentially reduce workflow execution time by eliminating disk I/O latency and enabling web services to work on data as it is generated, rather than waiting for an entire file to be delivered.
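
The difference between working on data as it is generated and waiting for a complete file can be illustrated with Python generators. This is only a conceptual sketch under stated assumptions: `producer` and `consumer` are hypothetical stand-ins for two web services, and the shared `log` list exists solely to make the order of events visible.

```python
def producer(n, log):
    """Generate n items one at a time instead of writing a whole file."""
    for i in range(n):
        log.append("produce %d" % i)
        yield i * i


def consumer(items, log):
    """Process each item as soon as it arrives and return the sum."""
    total = 0
    for item in items:
        log.append("consume %d" % item)
        total += item
    return total


log = []
total = consumer(producer(3, log), log)
# The log interleaves "produce" and "consume" entries: the consumer
# handles each item before the producer has generated the next one.
```

No intermediate file is created, and the consumer never holds more than one item at a time.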

8 Acknowledgment

This work was carried out in the context of the Virtual Laboratory for e-Science project (www.vl-e.nl). Part of this project is supported by a BSIK grant from the Dutch Ministry of Education, Culture and Science and is part of the ICT innovation program of the Ministry of Economic Affairs.

References

[1] W3C. Web Service Choreography Interface (WSCI) 1.0. http://www.w3.org/TR/wsci/.

[2] Advanced Networking Pittsburgh Supercomputing Center. Enabling high performance data transfers, 2002.

[3] Asif Akram, Rob Allan, and David Meredith. Best practices in web service style, data binding and validation for use in data-centric scientific applications. In UK e-Science All Hands Meeting 2006, September 2006.

[4] Bill Allcock, Joe Bester, John Bresnahan, Ann L. Chervenak, Ian Foster, Carl Kesselman, Sam Meder, Veronika Nefedova, Darcy Quesnel, and Steven Tuecke. Data management and transfer in high-performance computational grid environments. Parallel Computing, 2002.

[5] William E. Allcock, Ian Foster, and Ravi Madduri. Reliable data transport: A critical service for the grid. In Building Service Based Grids Workshop, Global Grid Forum 11, 2004.

[6] Adam Barker and Jano van Hemert. Scientific workflow: A survey and research directions. In Proceedings of the Third Grid Applications and Middleware Workshop (GAMW'2007), LNCS, in press, 2007.

[7] Adam Barker, Jon B. Weissman, and Jano van Hemert. Orchestrating data-centric workflows. In CCGRID, pages 210–217, 2008.

[8] Don Box, David Ehnebuske, Gopal Kakivaya, Andrew Layman, Noah Mendelsohn, Henrik Frystyk Nielsen, Satish Thatte, and Dave Winer. Simple Object Access Protocol (SOAP) 1.1, 2000.

[9] John Bresnahan, Michael Link, Rajkumar Kettimuthu, Dan Fraser, and Ian Foster. GridFTP pipelining. In Proceedings of the 2007 TeraGrid Conference, 2007.

[10] S. Charters, N. Holliman, and M. Munro. Visualization on the grid: A web service approach. In Proceedings UK eScience third All-Hands Meeting, 2004.

[11] DAS-3. http://www.cs.vu.nl/das3/.

[12] Ian Foster, Carl Kesselman, and Steven Tuecke. The anatomy of the grid: Enabling scalable virtual organizations. International Journal of Supercomputer Applications, 15(3), 2001.

[13] M. K. Gardner, S. Thulasidasan, and W. Chun Feng. User-space auto-tuning for TCP flow control in computational grids. Computer Communications, 2004.

[14] J. C. Jacob, D. S. Katz, G. B. Berriman, J. Good, A. C. Laity, E. Deelman, C. Kesselman, G. Singh, M.-H. Su, T. A. Prince, and R. Williams. Montage: A grid portal and software toolkit for science-grade astronomical image mosaicking. International Journal of Computational Science and Engineering, 2006.

[15] Rajkumar Kettimuthu, William E. Allcock, Lee Liming, John-Paul Navarro, and Ian T. Foster. GridCopy: Moving data fast on the grid. In IPDPS, pages 1–6. IEEE, 2007.

[16] Lucene. http://lucene.apache.org.

[17] Chris Pelz. Web Services Orchestration and Choreography. Computer, 36(10):46–52, October 2003.

[18] W3C. Three Web Services Recommendations. http://www.w3.org/2005/01/xmlp-pressrelease.html.

[19] Robert van Engelen. Pushing the SOAP envelope with web services for scientific computing. In ICWS, 2003.

[20] Adaptive Information Disclosure web site. http://www.adaptivedisclosure.org.

[21] W3C. SOAP Messages with Attachments. http://www.w3.org/TR/SOAP-attachments.

[22] Ying Ying, Yan Huang, and David W. Walker. A Performance Evaluation of Using SOAP with Attachments for e-Science. In Proceedings of the UK e-Science All Hands Conference. Engineering and Physical Sciences Research Council, 2005.

[23] Jianting Zhang, Ilkay Altintas, Jing Tao, Xianhua Liu, Deana D. Pennington, and William K. Michener. Integrating data grid and web services for e-science applications: A case study of exploring species distributions. e-science, 0:31, 2006.


