MathCloud: Publication and Reuse of Scientific
Applications as RESTful Web Services
Alexander Afanasiev, Oleg Sukhoroslov, Vladimir Voloshinov
Institute for Information Transmission Problems of the Russian Academy of Sciences,
Bolshoy Karetny per. 19, Moscow, 127994, Russia
oleg.sukhoroslov@gmail.com
Abstract. The paper presents the MathCloud platform, which enables wide-scale sharing, publication and reuse of scientific applications as RESTful web services. A unified interface for computational web services based on the REST architectural style is proposed. The main components of the MathCloud platform, including the service container, service catalogue, workflow management system, and security mechanism, are described. In contrast to other similar efforts based on WS-* specifications, the described platform provides a more lightweight solution with native support for modern Web applications. The platform has been successfully used in several applications from various fields of computational science, which confirms the viability of the proposed approach and software platform.
Keywords: computational web service, service-oriented scientific environment,
software as a service, REST, service container, service catalogue, workflow
1 Introduction
Modern scientific research is closely related to complex computations and the analysis of massive datasets. Computational Science is a rapidly growing field that uses advanced computing and data analysis to solve complex scientific and engineering problems. In their research, scientists actively use software applications that implement computational algorithms, numerical methods and models of complex systems. Typically, these applications require massive amounts of computation and are often executed on supercomputers or distributed computing systems.
The increasing complexity of the problems being solved requires the simultaneous use of several computational codes and computing resources. This leads to an increased complexity of applications and computing infrastructures. The multi- and interdisciplinary nature of modern science requires collaboration within distributed research projects, including the coordinated use of the scientific expertise, software and resources of each partner. This brings a number of problems faced by scientists in their day-to-day research.
The reuse of existing computational software is one of the key factors influencing research productivity. However, the increased complexity of such software means that installing, configuring and running it often requires specific expertise beyond that of an ordinary researcher. This expertise also extends to the configuration and use of the high performance computing resources required to run the software. In some cases such expertise can be provided by IT support staff, but this brings additional operating expenses that can be prohibitive for small research teams. The problem is amplified in the case of actively evolving software, which has to be upgraded or reinstalled on a regular basis. In addition to problem-specific parameters, many applications require the specification of additional runtime parameters, such as the number of parallel processes. Mastering these parameters also requires additional expertise that sometimes can only be provided by the software authors.
Modern supercomputing centers and grid infrastructures provide researchers with access to high performance computing resources. Such facilities also provide access to preinstalled popular computational packages, which partially solves the aforementioned problem. Researchers can also use such facilities to run arbitrary computational code. But here lies another problem: in this case, in addition to mastering the software, the researcher also has to master the subtleties of working with the command line and the batch system of a supercomputer or grid middleware. About 30 years ago such an interface was taken for granted, but in the eyes of a modern researcher it looks as awkward and archaic as a text-mode web browser. Without radically changing their interfaces, scientific computing facilities have grown, become more complex inside and harder to use.
The third issue faced by a modern computational scientist is the need to combine multiple applications, such as models or solvers, in order to solve a complex problem. Typically, this is a complex problem in its own right, which naturally includes all the issues discussed previously. It also raises the important problem of interoperability between computational applications written by different authors. Some applications are designed without interoperability in mind, which means that this issue has to be resolved by the researcher.
The described problems severely reduce research productivity by not allowing scientists to focus on the real problems to be solved. Therefore there is a strong demand for high-level interfaces and problem solving environments that hide the complexity of applications and infrastructure from the user.
The most promising approach for taming complexity and enabling the reuse of applications is the use of service-oriented architecture (SOA). SOA consists of a set of principles and methodologies for the provision of applications in the form of remotely accessible, interoperable services. The use of SOA can enable wide-scale sharing, publication and reuse of scientific applications, as well as the automation of scientific tasks and the composition of applications into new services [1].
The provision of applications as services is closely related to the “Software as a Service” (SaaS) delivery model implemented nowadays by many web and cloud computing services. This model has several advantages over traditional software delivery, such as the ability to run software without installation using a web browser, centralized maintenance and accelerated feature delivery. The ubiquity of SaaS applications and the ability to access them via programmable APIs have spawned the development of mashups that combine data, presentation and functionality from multiple services, creating a composite service.
A key observation here is that, in essence, the aforementioned issues are not unique to scientific computing. However, it is still an open question how existing approaches, such as SOA, SaaS and Web 2.0, can be efficiently applied in the context of scientific computing environments.
The paper presents the MathCloud platform, which enables wide-scale sharing, publication and reuse of scientific applications as RESTful web services. Section 2 introduces a unified remote interface for computational web services based on the REST architectural style. Section 3 describes the main components of the MathCloud platform, including the service container, service catalogue, workflow management system, and security mechanism. Section 4 presents applications and an experimental evaluation of the platform. Section 5 discusses related work.
2 Unified Interface of Computational Web Service
Currently, the dominant technology for building service-oriented systems is Web services based on the SOAP protocol, WSDL and numerous WS-* specifications (hereinafter referred to as “big Web services”). A common criticism of big Web services is their excessive complexity and incorrect use of core principles of the Web architecture [2]. The advantages of big Web services mostly apply to the complex application integration scenarios and business processes that occur in enterprise systems, and are rarely present in Web 2.0 applications, which favor ease of use and ad hoc integration [3].
The most promising alternative approach to implementing web services is based on the REST (Representational State Transfer) architectural style [4]. Thanks to its uniform interface for accessing resources, its use of core Web standards and the presence of numerous proven implementations, REST provides a lightweight and robust framework for the development of web services and related client applications. This is confirmed by the proliferation of so-called RESTful web services [2], especially within Web 2.0 applications.
Assuming that service-oriented scientific environments should also emphasize ease of use and ad hoc integration of services, we propose to implement computational services as RESTful web services with a unified interface [5]. This interface, or REST API, is based on the following abstract model of a computational service. A service processes incoming client requests to solve specific problems. A client's request includes a parameterized description of the problem, represented as a set of input parameters. Having successfully processed the request, the service returns the result, represented as a set of output parameters, to the client.
The proposed unified interface of a computational web service is formed by a set of resources identified by URIs and accessible via standard HTTP methods (Table 1). The interface takes the specifics of computational services into account by supporting asynchronous request processing and the passing of large data parameters. Also, in accordance with the service-oriented approach, the interface supports introspection, i.e., obtaining information about the service and its parameters.
Table 1. REST API of computational web service.

Resource   GET                          POST                              DELETE
Service    Get service description      Submit new request (create job)   —
Job        Get job status and results   —                                 Cancel job, delete job data
File       Get file data                —                                 —
The service resource supports two HTTP methods. The GET method returns the service description. The POST method allows a client to submit a request to the service. The request body contains the values of the input parameters; some of these values may contain identifiers of file resources. In response to the request, the service creates a new subordinate job resource and returns the identifier and current representation of the job resource to the client.
The job resource supports the GET and DELETE methods. The GET method returns the job representation with information about the current job status. If the job has completed successfully, the job representation also contains the job results in the form of values of output parameters; some of these values may contain identifiers of file resources.
The DELETE method of the job resource allows a client to cancel job execution or, if the job is already completed, to delete the job results. This method destroys the job resource and its subordinate file resources.
The file resource represents a part of a client request or job result provided as a remote file. The file contents can be retrieved fully or partially via the HTTP GET method or another data transfer protocol.
Note that the described interface doesn’t prescribe specific templates for resource URIs, which may vary between implementations. It is desirable to respect the described hierarchical relationships between resources when constructing these URIs.
The described interface supports job processing in both synchronous and asynchronous modes. If the job result can be returned to the client immediately, it is transmitted inside the returned job resource representation along with an indication of the DONE state. If, however, the processing of the request takes time, this is stated in the returned job resource representation by specifying the appropriate job state (WAITING or RUNNING). In this case, the client uses the obtained job resource identifier to subsequently check the job state and obtain its results.
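The synchronous/asynchronous dispatch described above can be sketched from the client's point of view as follows. This is a minimal illustration, not the actual MathCloud client code: the field names ("state", "outputs", "uri") and the job URIs are assumptions, since the paper does not fix a concrete representation format.

```python
import json

def handle_job(job):
    """Return results if the job is DONE, otherwise the URI to poll.

    The job states DONE, WAITING and RUNNING are those named in the text;
    the dictionary keys are illustrative assumptions.
    """
    if job["state"] == "DONE":
        return ("done", job["outputs"])
    elif job["state"] in ("WAITING", "RUNNING"):
        return ("pending", job["uri"])
    else:
        raise RuntimeError("job failed or was cancelled: %s" % job["state"])

# Synchronous case: the result comes back in the first response.
sync_job = json.loads('{"state": "DONE", "outputs": {"x": 42}, "uri": "/jobs/1"}')
# Asynchronous case: the client must poll the job URI until completion.
async_job = json.loads('{"state": "RUNNING", "uri": "/jobs/2"}')
```

A real client would loop on the "pending" case, re-fetching the job resource via GET until the DONE state appears.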
The proposed REST API is incomplete without considering resource representation formats and means of describing service parameters. The most widely used data representation formats for Web services are XML and JSON. Of these, JSON has been chosen for the following reasons. First, JSON provides a more compact and readable representation of data structures, while XML is focused on the representation of arbitrary documents. Second, JSON integrates natively with the JavaScript language, simplifying the creation of modern Ajax-based Web applications.
A known disadvantage of JSON is the lack of standard tools for the description and validation of JSON data structures comparable to XML Schema. However, there is active ongoing work on such a format, called JSON Schema [6]. This format is used for the description of the input and output parameters of computational web services within the proposed REST API.
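As an illustration of the idea, a service description with JSON Schema-style parameter declarations might look like the sketch below. The service name, parameter names and top-level layout are invented for this example; only the JSON Schema keywords ("type", "properties", "required") follow the draft specification. The check shown is a deliberately minimal required-field test, not a full JSON Schema validator.

```python
import json

# Hypothetical service description; layout and names are illustrative.
description = json.loads("""
{
  "name": "matrix-inversion",
  "inputs": {
    "type": "object",
    "properties": {
      "matrix": {"type": "string", "description": "matrix in CAS syntax"},
      "precision": {"type": "integer"}
    },
    "required": ["matrix"]
  },
  "outputs": {
    "type": "object",
    "properties": {"inverse": {"type": "string"}}
  }
}
""")

def missing_inputs(schema, request):
    """Report required input parameters absent from a client request."""
    return [p for p in schema.get("required", []) if p not in request]
```

Introspection of such a description is what allows generic clients (and, as Section 3 describes, the workflow editor) to discover a service's parameters without any service-specific code.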
3 MathCloud Platform
The MathCloud platform [7] is a software toolkit for the building, deployment, discovery and composition of computational web services using the proposed REST API. This section presents the main components of the platform.
3.1 Service Container
The service container, codenamed Everest, is the core component of the platform. Its main purpose is to provide a high-level framework for the development and deployment of computational web services. Everest simplifies service development by means of ready-to-use adapters for common types of applications. The container also implements a universal runtime environment for such services based on the proposed REST API. The architecture of Everest is presented in Fig. 1.
Fig. 1. Architecture of service container
The service container is based on the Jersey library, the reference implementation of the JAX-RS (Java API for RESTful Web Services) specification. The container uses an embedded Jetty web server for interaction with service clients. Incoming HTTP requests are dispatched to Jersey and then to the container. Everest processes client requests in accordance with its configuration.
The Service Manager component maintains a list of the services deployed in the container and their configuration. This information is read at startup from configuration files. The configuration of each service consists of two parts:
- the public service description, which is provided to service clients;
- the internal service configuration, which is used during request processing.
The Job Manager component manages the processing of incoming requests. The requests are converted into asynchronous jobs and placed in a queue served by a configurable pool of handler threads. During job processing, a handler thread invokes the adapter specified in the service configuration.
The components that implement the processing of service requests (jobs) are provided in the form of pluggable adapters. Each adapter implements a standard interface through which the container passes request parameters, monitors the job state and receives results.
Currently, the following universal adapters are implemented.
The Command adapter converts a service request into the execution of a specified command in a separate process. The internal service configuration contains the command to execute and information about the mappings between service parameters and command-line arguments or external files.
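The parameter-to-argument mapping performed by the Command adapter can be sketched as follows. The configuration layout shown here (a command plus a template argument list) is a hypothetical stand-in; the paper does not specify MathCloud's actual configuration file format.

```python
# Hypothetical internal configuration of a Command adapter: the "args"
# templates name the service parameters to substitute at request time.
config = {
    "command": "solve",
    "args": ["--tol", "{tolerance}", "{input_file}"],
}

def build_command(config, params):
    """Substitute service request parameters into the command-line template."""
    return [config["command"]] + [a.format(**params) for a in config["args"]]
```

The container would then run the resulting argument list in a separate process and collect the outputs named in the configuration.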
The Java adapter invokes a specified Java class inside the current Java virtual machine, passing the request parameters inside the call. The specified class must implement a standard Java interface. The internal service configuration includes the name of the corresponding class.
The Cluster adapter translates a service request into a batch job submitted to a computing cluster via the TORQUE resource manager. The internal service configuration contains the path to the batch job file and information about the mappings between service parameters and job arguments or files.
The Grid adapter translates a service request into a grid job submitted to the European Grid Infrastructure, which is based on the gLite middleware. This adapter can be used both to convert an existing grid application into a service and to port an existing service implementation to the grid. The internal service configuration contains the name of the grid virtual organization, the path to the grid job description file and information about the mappings between service parameters and job arguments or files.
Note that all adapters except the Java adapter support the conversion of existing applications into services by writing only a service configuration file, i.e., without writing any code. This makes it possible for users without programming skills to publish a wide range of existing applications as services. Besides that, the support for pluggable adapters allows one to attach arbitrary service implementations and computing resources.
Each service deployed in Everest is published via the proposed REST API. In addition, the container automatically generates a complementary web interface allowing users to access the service via a web browser.
3.2 Service Catalogue
The main purpose of the service catalogue is to support the discovery, monitoring and annotation of computational web services. It is implemented as a web application with an interface and functionality similar to modern search engines.
After a service is deployed in the service container, it can be published in the catalogue by providing the URI of the service and a few tags describing it. The catalogue retrieves the service description via the unified REST API, performs indexing and stores the description along with the specified tags in a database.
The catalogue provides a search query interface with optional filters. It supports full-text search in service descriptions and tags. Search results consist of a short snippet for each found service with highlighted query terms and a link to the full service description.
In order to provide current information on service availability, the catalogue periodically pings the published services. If a service is not available, it is marked accordingly in the search results. The catalogue also implements some experimental features similar to collaborative Web 2.0 sites, e.g., the ability for users to tag services.
3.3 Workflow Management System
In order to simplify the composition of services, a workflow management system has been implemented [8]. The system supports the description, storage, publication and execution of workflows composed of multiple services. Workflows are represented as directed acyclic graphs and described by means of a visual editor. A described workflow can be published as a new composite service and then executed by sending a request to this service. The system has a client-server architecture: the client part is represented by the workflow editor, while the server part is represented by the workflow management service.
Fig. 2 shows the interface of the workflow editor. It is inspired by Yahoo! Pipes and implemented as a Web application in JavaScript. This makes it possible to use the editor on any computer running a modern web browser.
Fig. 2. Graphical workflow editor
A workflow is represented in the form of a directed acyclic graph whose vertices correspond to workflow blocks and whose edges define the data flow between the blocks. Each block has a set of inputs and outputs displayed in the form of ports at the top and bottom of the block, respectively. Each block implements certain logic for processing input data and generating output data. Data transfer between blocks is realized by connecting the output of one block to the input of another block. Each input or output has an associated data type. The compatibility of data types is checked when the ports are connected.
A service is introduced into a workflow by creating a new Service block and specifying the service URI. It is assumed that the service implements the unified REST API. This allows the editor to dynamically retrieve the service description and extract information about the number, types and names of the input and output parameters of the service. This information is used to automatically generate the corresponding input and output ports of the block.
The unified REST API provides a basis for service interoperability at the interface level. The user can connect any output of one service with any input of another service if both ports have compatible data types. However, it is important to note that the system doesn’t check the compatibility of the data formats and semantics of the corresponding parameters; it is the user's task to ensure this.
An important feature of the editor is the ability to run a workflow and display its state during execution. Before the workflow can be run, it is necessary to set the values of all input parameters of the workflow via the appropriate Input blocks. After the user clicks the Run button, the editor makes a call with the specified input parameters to the composite service representing the workflow. The editor then periodically checks the status of the running job, which includes information about the states of the individual blocks of the workflow. This information is displayed to the user by painting each workflow block in the color corresponding to its current state. After successful completion of the workflow, the values of the workflow output parameters are displayed in the Output blocks. Each workflow instance has a unique URI which can be used to open the current state of the instance in the editor at any time. This feature is especially useful for long-running workflows.
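Since a workflow is a directed acyclic graph, a runtime can derive an execution schedule from the data-flow edges alone. The sketch below illustrates this with Kahn's topological-sort algorithm over a made-up workflow; the block names and graph encoding are illustrative, not MathCloud's internal representation.

```python
from collections import deque

# Hypothetical workflow graph: each key maps a block to the blocks that
# consume its outputs (block names are illustrative).
edges = {
    "Input": ["SolverA", "SolverB"],
    "SolverA": ["Merge"],
    "SolverB": ["Merge"],
    "Merge": ["Output"],
    "Output": [],
}

def execution_order(edges):
    """Topological order of a DAG via Kahn's algorithm.

    Blocks whose inputs are all ready (in-degree zero) could be started
    in parallel, which is how independent branches overlap in execution.
    """
    indeg = {v: 0 for v in edges}
    for outs in edges.values():
        for v in outs:
            indeg[v] += 1
    queue = deque(v for v, d in indeg.items() if d == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in edges[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    if len(order) != len(edges):
        raise ValueError("workflow graph contains a cycle")
    return order
```

In this example the two solver blocks have no edge between them, so a runtime is free to invoke the corresponding services concurrently before the Merge block runs.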
The workflow management service (WMS) performs the storage, deployment and execution of workflows created with the described editor. In accordance with the service-oriented approach, the WMS deploys each saved workflow as a new service. Subsequent workflow execution is performed by sending a request to the new composite service through the unified REST API. Such requests are processed by the workflow runtime embedded in the WMS. The WMS is itself implemented as a RESTful web service, which provides a convenient way to interact with it from the workflow editor.
3.4 Security
All platform components use a common security mechanism (Fig. 3) to protect access to services. It supports authentication, authorization and a limited form of delegation based on common security technologies.
Authentication of services is implemented by means of SSL server certificates. Authentication of clients is implemented via two mechanisms. The first is a standard X.509 client certificate. The second is the Loginza service, which supports authentication via popular identity providers (Google, Facebook, etc.) or any OpenID provider. The latter mechanism, which is available only to browser clients, is convenient for users who don’t have a certificate.
Fig. 3. Security mechanism
Authorization is supported by means of allow and deny lists, which enable the service administrator to specify the users who should or should not have access to a service. A user can be specified by the distinguished name of their certificate or by their OpenID identifier.
A well-known security challenge in service-oriented environments is providing a mechanism for a service to act on behalf of a user, i.e., to invoke other services. A common use case is a workflow service which needs to access the services involved in the workflow on behalf of the user who invoked it. For such cases a proxying mechanism is implemented by means of a proxy list, which enables the service administrator to specify the certificates of services that are trusted to invoke the service on behalf of users. In comparison to the proxy certificate mechanism used in grids, this approach is more limited but provides a lightweight solution compatible with the proposed REST API.
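The list-based authorization described above amounts to a simple membership check, sketched below. The exact precedence semantics (deny list overriding the allow list, an absent allow list admitting everyone not denied) are an assumption for this illustration; the paper does not spell them out.

```python
# A sketch of list-based authorization; identity strings stand in for
# certificate distinguished names or OpenID identifiers.
def is_authorized(identity, allow=None, deny=None):
    """Assumed semantics: deny wins; an absent allow list admits
    everyone who is not explicitly denied."""
    if deny and identity in deny:
        return False
    if allow is not None:
        return identity in allow
    return True
```

Under the proxying mechanism, the same check would be applied to the certificate of the calling service against the proxy list before the user's own identity is considered.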
3.5 Clients
The platform provides Java, Python and command-line clients for accessing services from external applications and for implementing complex workflows. Since access to services is implemented via the REST API, one can use any standard HTTP library or client (e.g., curl) to interact with services. Also, thanks to the use of the JSON format, services can easily be accessed from JavaScript applications via Ajax calls. This simplifies the development of modern Web-based interfaces to services, in contrast to approaches based on big Web services and the XML data format.
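To illustrate how little machinery such a client needs, the sketch below composes a job-submission request using only the Python standard library. The service URL and parameter name are invented for the example, and the request is only constructed here, not actually sent.

```python
import json
import urllib.request

def make_submit_request(service_url, params):
    """Build a POST request on the service resource (which creates a job).

    The URL and parameter names are hypothetical; any HTTP library could
    be used in the same way, as the text notes for curl.
    """
    return urllib.request.Request(
        service_url,
        data=json.dumps(params).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = make_submit_request("https://example.org/services/inverse",
                          {"matrix": "[[1,2],[3,4]]"})
```

Sending the request with `urllib.request.urlopen(req)` would return the job representation described in Section 2, which the client then polls until the DONE state.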
4 Applications
The MathCloud platform has been used in several applications from various fields of computational science. This section describes some of these applications and summarizes general conclusions drawn from their development.
One of the first applications of the MathCloud platform concerns the “error-free” inversion of ill-conditioned matrices [9], a well-known challenging task in computational science. The application uses the symbolic computation techniques available in computer algebra systems (CAS), which require substantial computing time and memory. To address this issue, a distributed algorithm for matrix inversion has been implemented on top of the Maxima CAS exposed as a computational web service. The algorithm was implemented as a workflow based on a block decomposition of the input matrix and the Schur complement. The approach has been validated by the inversion of Hilbert matrices up to 500×500.
The matrix inversion application provided an opportunity to evaluate the performance of all platform components, in particular with respect to passing large amounts of data between services. In the case of extremely ill-conditioned matrices, the symbolic representation of the final and intermediate results reached hundreds of megabytes. Table 2 presents the obtained performance results, including the serial execution time in Maxima, the parallel execution time in MathCloud (using a 4-block decomposition) and the observed speedup. Additional analysis revealed that the overhead introduced by the platform, including data transfer, is about 2-5% of the total computing time.
Table 2. Performance of Hilbert (N×N) matrix inversion application in MathCloud.

N     Serial execution time      Parallel execution time      Speedup
      in Maxima, minutes         in MathCloud, minutes
250   8                          5                            1.60
300   15                         8                            1.88
350   27                         13                           2.08
400   45                         20                           2.25
450   72                         30                           2.40
500   109                        40                           2.73
Another MathCloud application has been developed for interpreting X-ray diffractometry data of carbonaceous films by solving optimization problems over a broad class of carbon nanostructures [10]. The application is implemented as a workflow which combines parallel calculations of scattering curves for individual nanostructures (performed by a grid application) with the subsequent solution of optimization problems (performed by three different solvers running on a cluster) to determine the most probable topological and size distribution of the nanostructures. All parts of this computing scheme (and a number of additional steps, e.g., data preparation, post-optimal processing and plotting) have been implemented as computational web services. The application helped to reveal the prevalence of low-aspect-ratio toroids in the tested films [11].
A recent work [12-13] concerns a uniform approach to the creation of computational web services related to optimization modeling. The MathCloud platform is used within this work to integrate various optimization solvers intended for the basic classes of mathematical programming problems, as well as translators of the AMPL optimization modeling language. The computational web services and workflows created cover all basic phases of optimization modeling: input of optimization problem data, interaction with solvers, and processing of the solutions found.
A special service has been developed that dispatches optimization tasks to a pool of solver services directly from the execution of the AMPL translator. These features enable running any optimization algorithm written as an AMPL script in a distributed mode, where all problems (and/or intermediate subproblems) are solved by remote optimization services. Independent problems are solved in parallel, thus increasing overall performance in accordance with the number of available services. The proposed approach has been validated by the example of the Dantzig–Wolfe decomposition algorithm for the multi-commodity transportation problem.
The experience gained from application development shows that the MathCloud platform can be efficiently applied to a wide class of problems. This class can be described as problems that allow decomposition into several coarse-grained subproblems (dependent or independent) that can be solved by existing applications represented as services. It is important to note that while MathCloud can be used as a parallel computing platform in homogeneous environments such as a cluster, it is generally not as efficient in this setting as dedicated technologies such as MPI. The main benefits of MathCloud are revealed in heterogeneous distributed environments involving multiple resources and applications belonging to different users and organizations.
Exposing computational applications as web services is rather straightforward with MathCloud. In our experience, it usually takes from tens of minutes to a couple of hours to produce a new service, including service deployment and debugging. This is mainly because the service interface is fixed and the service container provides a framework that implements all problem-independent parts of a service. This means that a user doesn’t need to develop a service from scratch, as happens with general purpose service-oriented platforms. In many cases service development reduces to writing a service configuration file. In other cases the development of an additional application wrapper is needed, which is usually accomplished by writing a simple shell or Python script.
Workflow development is somewhat harder, especially in the case of complex workflows. However, the workflow editor provides some means for dealing with this. First of all, it enables dividing a complex workflow into several simpler sub-workflows by supporting the publishing and composition of workflows as services. Second, it is possible to add custom workflow actions written in JavaScript or Python, for example to create complex string inputs for services from user data or to collect additional timing information. Finally, besides using the graphical editor, it is possible to download a workflow in JSON format, edit it manually and upload it back to the WMS. These and other features provide rather good usability for practical use of the MathCloud platform.
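A downloadable workflow description might look like the sketch below. This is purely illustrative: the paper does not specify the actual JSON format, so the field names, block types and link encoding here are assumptions made for the example.

```python
import json

# A hypothetical downloaded workflow: three blocks wired Input -> Service
# -> Output, with the Service block referring to a made-up service URI.
workflow_json = """
{
  "name": "inversion-pipeline",
  "blocks": [
    {"id": "in1", "type": "Input"},
    {"id": "svc1", "type": "Service",
     "uri": "https://example.org/services/inverse"},
    {"id": "out1", "type": "Output"}
  ],
  "links": [
    {"from": "in1", "to": "svc1"},
    {"from": "svc1", "to": "out1"}
  ]
}
"""
workflow = json.loads(workflow_json)
```

Whatever the concrete schema, a plain-JSON representation is what makes the manual edit-and-reupload cycle described above practical with ordinary text tools.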
5 Related Work
The use of the service-oriented approach in the context of scientific computing was proposed in [1]. Service-Oriented Science, as introduced by Foster, refers to scientific research enabled by distributed networks of interoperating services.
The first attempts to provide a software platform for Service-Oriented Science were made in Globus Toolkit 3 and 4, based on the Open Grid Services Architecture (OGSA) [14]. OGSA describes a service-oriented grid computing environment based on big Web services. Globus Toolkit 3/4 provided service containers for the deployment of stateful grid services that extended big Web services. These extensions were documented in the Web Services Resource Framework (WSRF) specification [15], which largely failed due to the inherent complexity and inefficiencies of both the specification and its implementations. Globus Toolkit 4 had a steep learning curve and provided no tools for the rapid deployment of existing applications as services or for connecting services to grid resources.
There have been several efforts aimed at simplifying the transformation of scientific applications into remotely accessible services. The Java CoG Kit [16] provided a way to expose legacy applications as Web services. It uses a serviceMap document to generate the source code and WSDL descriptor for the Web service implementation. Generic Factory Service (GFac) [17] provides automatic service generation using an XML-based application description language. Instead of source code generation, it uses an XSUL Message Processor to intercept SOAP calls and route them to a generic class that invokes the scientific application. SoapLab [18] is another toolkit that uses an application description language, called ACD, to provide automatic Web service wrappers.
Grid Execution Management for Legacy Code Architecture (GEMLCA) [19] implements a general architecture for deploying legacy applications as grid services. It provides an application repository and a set of WSRF-based grid services for the deployment, execution, and administration of applications. Instead of generating a different WSDL for every deployed application, as GFac and SoapLab do, GEMLCA uses a generic interface and client for application execution. Applications are executed by submitting grid jobs through back-end plugins supporting the Globus Toolkit and gLite middleware. GEMLCA was integrated with the P-GRADE grid portal [20] in order to provide user-friendly Web interfaces for application deployment and execution. The workflow editor of the P-GRADE portal supports the connection of GEMLCA services into workflows using a Web-based graphical environment.
The Opal [21] and Opal2 [22] toolkits provide a mechanism to deploy scientific applications as Web services with a standard WSDL interface. This interface provides operations for job launch (which accepts command-line arguments and input files as its parameters), status querying, and output retrieval. In contrast to GEMLCA, the Opal toolkit deploys a new Web service for each wrapped application. Opal also provides an optional XML-based specification of command-line arguments, which is used to automatically generate Web forms for service invocation. In addition to Base64-encoded inputs, Opal2 supports the transfer of input files from remote URLs and via MIME attachments, which greatly improves the performance of input staging. It supports several computational back-ends, including GRAM, DRMAA, Torque, Condor, and the CSF meta-scheduler.
The described toolkits have many similarities with the presented software platform, e.g., declarative application description, a uniform service interface, and asynchronous job processing. A key difference is the way services are implemented. While all of the mentioned toolkits use big Web services and the XML data format, the MathCloud platform exposes applications as RESTful web services using the JSON format. The major advantages of this approach are decreased complexity, the use of core Web standards, wide adoption, and native support for modern Web applications, as discussed in Section 2.
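This difference is visible on the client side: invoking a RESTful computational service amounts to a plain HTTP POST of a JSON document, with no SOAP toolkit or generated stubs involved. The sketch below illustrates this; the endpoint URL, the `submit_job` helper, and the returned resource layout are assumptions for illustration, not the literal MathCloud API:

```python
import json
from urllib import request

# Hypothetical endpoint of a computational web service.
SERVICE_URL = "http://example.org/services/integrate"

def submit_job(service_url, parameters, opener=request.urlopen):
    """Sketch of invoking a computational RESTful service: POST a
    JSON document with the input parameters and receive a JSON
    description of the created job resource. The `opener` hook
    exists only so the sketch can be exercised without a network."""
    body = json.dumps(parameters).encode("utf-8")
    req = request.Request(service_url, data=body,
                          headers={"Content-Type": "application/json"})
    with opener(req) as resp:
        return json.load(resp)
```

A client would then poll the returned job resource with GET requests until the job reaches a terminal state; any language with an HTTP client and a JSON parser, including in-browser JavaScript, can do this directly.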
The idea of using RESTful web services and Web 2.0 technologies as a lightweight alternative to big Web services for building service-oriented scientific environments was introduced in [23]. Given the level of adoption of Web 2.0 technologies relative to grid technologies, Fox et al. suggested replacing many grid components with their Web 2.0 equivalents. Nevertheless, to the authors' knowledge, there are no other efforts to create a general-purpose service-oriented toolkit for scientific applications based on RESTful web services.
There are many examples of applying Web 2.0 technologies in scientific research in the form of Web-based science gateways and collaborative environments [20, 24-25]. While such systems support convenient access to scientific applications via Web interfaces, they do not expose applications as services, thus limiting application reuse and composition.
There are also many scientific workflow systems, e.g. [26-28]. The system described in this paper stands out among them by providing a Web-based interface, automatic publication of workflows as composite services, and native support for RESTful web services.
6 Conclusion
The paper presented the MathCloud platform, which enables wide-scale sharing, publication, and reuse of scientific applications as RESTful web services based on the proposed unified REST API. In contrast to other similar efforts based on Web Services specifications, it provides a more lightweight solution with native support for modern Web applications. MathCloud includes all core tools for building a service-oriented environment, such as a service container, a service catalogue, and a workflow management system. The platform has been successfully used in several applications from various fields of computational science, which confirms the viability of the proposed approach and software platform.
Future work will focus on building a hosted Platform-as-a-Service (PaaS) for the development, sharing, and integration of computational web services based on the described software platform.
Acknowledgements. This work is supported by the Presidium of the Russian Academy of Sciences (basic research program No. 14) and the Russian Foundation for Basic Research (grant No. 11-07-00543-а).
References
1. Foster, I.: Service-Oriented Science. Science, vol. 308, no. 5723, pp. 814–817 (2005)
2. Richardson, L., Ruby, S.: RESTful Web Services. O’Reilly Media (2007)
3. Pautasso, C., Zimmermann, O., Leymann, F.: RESTful Web Services vs. "Big" Web Services: Making the Right Architectural Decision. In: 17th International Conference on World Wide Web (WWW '08), pp. 805-814. ACM, New York, NY, USA (2008)
4. Fielding, R. T.: Architectural Styles and the Design of Network-based Software Architectures. Ph.D. dissertation, University of California, Irvine, Irvine, California (2000)
5. Sukhoroslov, O.V.: Unified Interface for Accessing Algorithmic Services in Web.
Proceedings of ISA RAS, vol. 46, pp. 60-82 (2009) (in Russian)
6. JSON Schema, http://www.json-schema.org/
7. MathCloud Project, http://mathcloud.org/
8. Lazarev, I.V., Sukhoroslov, O.V.: Implementation of Distributed Computing Workflows in
MathCloud Environment. Proceedings of ISA RAS, vol. 46, pp. 6-23 (2009) (in Russian)
9. Voloshinov, V.V., Smirnov, S.A.: Error-Free Inversion of Ill-Conditioned Matrices in
Distributed Computing System of RESTful Services of Computer Algebra. In: 4th Intern.
Conf. Distributed Computing and Grid-Technologies in Science and Education, pp. 257-
263. JINR, Dubna (2010)
10. Neverov, V.S., Kukushkin, A.B., Marusov, N.L. et al.: Numerical Modeling of Interference Effects of X-Ray Scattering by Carbon Nanostructures in the Deposited Films from Tokamak T-10. Problems of Atomic Science and Technology, Ser. Thermonuclear Fusion, Vol. 1, pp. 13-24 (2011) (in Russian)
11. Kukushkin, A.B., Neverov, V.S., Marusov, N.L. et al.: Few-nanometer-wide carbon
toroids in the hydrocarbon films deposited in tokamak T-10. Chemical Physics Letters 506,
pp. 265–268 (2011)
12. Voloshinov, V.V., Smirnov, S.A.: On development of distributed optimization modelling
systems in the REST architectural style. In: 5th Intern. Conf. Distributed Computing and
Grid-Technologies in Science and Education. JINR, Dubna (2012)
13. Voloshinov, V.V., Smirnov, S.A.: Software Integration in Scientific Computing. Information Technologies and Computing Systems, No. 3, pp. 66-71 (2012) (in Russian)
14. Foster, I., Kesselman, C., Nick, J., Tuecke, S.: Grid Services for Distributed System
Integration. Computer 35(6), pp. 37-46 (2002)
15. WS-Resource Framework, http://www-106.ibm.com/developerworks/library/ws-resource/ws-wsrf.pdf
16. von Laszewski, G., Gawor, J., Krishnan, S., Jackson, K.: Commodity Grid Kits -
Middleware for Building Grid Computing Environments. In: Grid Computing: Making the
Global Infrastructure a Reality, chapter 25. Wiley (2003)
17. Kandaswamy, G., Fang, L., Huang, Y., Shirasuna, S., Marru, S., Gannon, D.: Building Web Services for Scientific Grid Applications. IBM Journal of Research and Development, vol. 50, no. 2.3, pp. 249-260 (2006)
18. SoapLab Web Services, http://www.ebi.ac.uk/soaplab/
19. Delaitre, T., Kiss, T., Goyeneche, A., Terstyanszky, G., Winter, S., Kacsuk, P.:
GEMLCA: Running Legacy Code Applications as Grid Services. Journal of Grid
Computing Vol. 3. No. 1-2, pp. 75-90 (2005)
20. Kacsuk, P., Sipos, G.: Multi-Grid, Multi-User Workflows in the P-GRADE Portal. Journal
of Grid Computing, Vol. 3, No. 3-4, Springer, pp. 221-238 (2005)
21. Krishnan, S., Stearn, B., Bhatia, K., Baldridge, K. K., Li, W., Arzberger, P.: Opal: Simple
Web Services Wrappers for Scientific Applications. In: IEEE Intl. Conf. on Web Services
(ICWS) (2006)
22. Krishnan, S., Clementi, L., Ren, J., Papadopoulos, P., Li, W.: Design and Evaluation of Opal2: A Toolkit for Scientific Software as a Service. In: 2009 IEEE Congress on Services (SERVICES-1 2009), pp. 709-716 (2009)
23. Fox, G., Pierce, M.: Grids Challenged by a Web 2.0 and Multicore Sandwich.
Concurrency and Computation: Practice and Experience, vol. 21, no. 3, pp. 265–280
(2009)
24. McLennan, M., Kennell, R.: HUBzero: A Platform for Dissemination and Collaboration in
Computational Science and Engineering. Computing in Science and Engineering, 12(2),
pp. 48-52, March/April (2010)
25. Afgan, E., Goecks, J., Baker, D., Coraor, N., Nekrutenko, A., Taylor, J.: Galaxy - a Gateway to Tools in e-Science. In: Yang, K. (ed.) Guide to e-Science: Next Generation Scientific Research and Discovery, pp. 145-177. Springer (2011)
26. Altintas, I., Berkley, C., Jaeger, E., Jones, M., Ludäscher, B., Mock, S.: Kepler: an
extensible system for design and execution of scientific workflows. In: Proceedings of the
16th International Conference on Scientific and Statistical Database Management
(SSDBM '04), pp. 423-424 (2004)
27. Deelman, E., Singh, G., Su, M. H. et al.: Pegasus: a framework for mapping complex
scientific workflows onto distributed systems. Scientific Programming, vol. 13, no. 3, pp.
219-237 (2005)
28. Oinn, T., Greenwood, M., Addis, M. et al.: Taverna: lessons in creating a workflow
environment for the life sciences. Concurrency Computation Practice and Experience, vol.
18, no. 10, pp. 1067–1100 (2006)