MathCloud: Publication and Reuse of Scientific
Applications as RESTful Web Services
Alexander Afanasiev, Oleg Sukhoroslov, Vladimir Voloshinov
Institute for Information Transmission Problems of the Russian Academy of Sciences,
Bolshoy Karetny per. 19, Moscow, 127994, Russia
oleg.sukhoroslov@gmail.com
Abstract. The paper presents the MathCloud platform, which enables wide-scale sharing, publication and reuse of scientific applications as RESTful web services. A unified interface for computational web services based on the REST architectural style is proposed. The main components of the MathCloud platform, including the service container, service catalogue, workflow management system, and security mechanism, are described. In contrast to other similar efforts based on WS-* specifications, the described platform provides a more lightweight solution with native support for modern Web applications. The platform has been successfully used in several applications from various fields of computational science, which confirms the viability of the proposed approach and software platform.
Keywords: computational web service, service-oriented scientific environment,
software as a service, REST, service container, service catalogue, workflow
1 Introduction
Modern scientific research is closely related to complex computations and the analysis of massive datasets. Computational Science is a rapidly growing field that uses advanced computing and data analysis to solve complex scientific and engineering problems. In their research, scientists actively use software applications that implement computational algorithms, numerical methods and models of complex systems. Typically, these applications require massive amounts of computation and are often executed on supercomputers or distributed computing systems.
The increasing complexity of the problems being solved requires the simultaneous use of several computational codes and computing resources. This leads to an increased complexity of applications and computing infrastructures. The multi- and interdisciplinary nature of modern science requires collaboration within distributed research projects, including the coordinated use of the scientific expertise, software and resources of each partner. This brings a number of problems faced by scientists in their day-to-day research.
The reuse of existing computational software is one of the key factors influencing research productivity. However, the increased complexity of such software means that installing, configuring and running it often requires specific expertise beyond that of an ordinary researcher. This expertise also extends to the configuration and use of the high performance computing resources required to run the software. In some cases such expertise can be provided by IT support staff, but this brings additional operating expenses that can be prohibitive for small research teams. The problem is amplified in the case of actively evolving software, which has to be upgraded or reinstalled on a regular basis. In addition to problem-specific parameters, many applications require the specification of additional runtime parameters, such as the number of parallel processes. Mastering these parameters also requires additional expertise that sometimes can only be provided by the software authors.
Modern supercomputing centers and grid infrastructures provide researchers with access to high performance computing resources. Such facilities also provide access to preinstalled popular computational packages, which partially solves the aforementioned problem. Researchers can also use such facilities to run arbitrary computational code. But here lies another problem: in this case, in addition to mastering the software, the researcher also has to master the subtleties of working with the command line and the batch system of a supercomputer or grid middleware. About 30 years ago such an interface was taken for granted, but in the eyes of a modern researcher it looks as awkward and archaic as a text-mode web browser. Without radically changing their interfaces, scientific computing facilities have grown, become more complex inside and harder to use.
The third issue faced by a modern computational scientist is the need to combine multiple applications, such as models or solvers, in order to solve a complex problem. Typically, this is a complex problem in its own right, which naturally includes all the issues discussed previously. It also raises the important problem of interoperability between computational applications written by different authors. Some applications are designed without interoperability in mind, which means that this issue has to be resolved by the researcher.
The described problems severely reduce research productivity by not allowing scientists to focus on the real problems to be solved. Therefore there is a strong demand for high-level interfaces and problem solving environments that hide the complexity of applications and infrastructure from the user.
The most promising approach for taming complexity and enabling the reuse of applications is the use of service-oriented architecture (SOA). SOA consists of a set of principles and methodologies for the provision of applications in the form of remotely accessible, interoperable services. The use of SOA can enable wide-scale sharing, publication and reuse of scientific applications, as well as the automation of scientific tasks and the composition of applications into new services [1].
The provision of applications as services is closely related to the “Software as a Service” (SaaS) delivery model implemented nowadays by many web and cloud computing services. This model has several advantages over traditional software delivery, such as the ability to run software without installation using a web browser, centralized maintenance and accelerated feature delivery. The ubiquity of SaaS applications and the ability to access them via programmable APIs have spawned the development of mashups that combine data, presentation and functionality from multiple services, creating a composite service.
A key observation here is that, in essence, the aforementioned issues are not unique to scientific computing. However, it is still an open question how existing approaches, such as SOA, SaaS and Web 2.0, can be efficiently applied in the context of scientific computing environments.
The paper presents the MathCloud platform, which enables wide-scale sharing, publication and reuse of scientific applications as RESTful web services. Section 2 introduces a unified remote interface for computational web services based on the REST architectural style. Section 3 describes the main components of the MathCloud platform, including the service container, service catalogue, workflow management system, and security mechanism. Section 4 presents applications and an experimental evaluation of the platform. Section 5 discusses related work.
2 Unified Interface of Computational Web Service
Currently, the dominant technology for building service-oriented systems is Web services based on the SOAP protocol, WSDL and numerous WS-* specifications (hereinafter referred to as “big Web services”). A common criticism of big Web services is their excessive complexity and incorrect use of core principles of the Web architecture [2]. The advantages of big Web services mostly apply to the complex application integration scenarios and business processes that occur in enterprise systems, and are rarely present in Web 2.0 applications, which favor ease of use and ad hoc integration [3].
The most promising alternative approach to implementing web services is based on the REST (Representational State Transfer) architectural style [4]. Thanks to its uniform interface for accessing resources, its use of core Web standards and the presence of numerous proven implementations, REST provides a lightweight and robust framework for the development of web services and related client applications. This is confirmed by the proliferation of so-called RESTful web services [2], especially within Web 2.0 applications.
Assuming that service-oriented scientific environments should also emphasize ease of use and ad hoc integration of services, we propose to implement computational services as RESTful web services with a unified interface [5]. This interface, or REST API, is based on the following abstract model of a computational service. A service processes incoming client requests to solve specific problems. A client's request includes a parameterized description of the problem, represented as a set of input parameters. Having successfully processed the request, the service returns the result, represented as a set of output parameters, to the client.
The proposed unified interface of a computational web service is formed by a set of resources identified by URIs and accessible via standard HTTP methods (Table 1). The interface takes the specifics of computational services into account by supporting asynchronous request processing and the passing of large data parameters. Also, in accordance with the service-oriented approach, the interface supports introspection, i.e., obtaining information about the service and its parameters.
Table 1. REST API of computational web service.

Resource   GET                          POST                              DELETE
Service    Get service description      Submit new request (create job)   —
Job        Get job status and results   —                                 Cancel job, delete job data
File       Get file data                —                                 —
The service resource supports two HTTP methods. The GET method returns the service description. The POST method allows a client to submit a request to the service. The request body contains the values of the input parameters; some of these values may contain identifiers of file resources. In response to the request, the service creates a new subordinate job resource and returns the identifier and current representation of the job resource to the client.
The job resource supports the GET and DELETE methods. The GET method returns the job representation with information about the current job status. If the job has completed successfully, the job representation also contains the job results in the form of values of output parameters; some of these values may contain identifiers of file resources.
The DELETE method of the job resource allows a client to cancel job execution or, if the job is already completed, to delete the job results. This method destroys the job resource and its subordinate file resources.
The file resource represents a part of a client request or job result provided as a remote file. The file contents can be retrieved fully or partially via the HTTP GET method or another data transfer protocol.
Note that the described interface doesn’t prescribe specific templates for resource URIs, which may vary between implementations. It is desirable to respect the described hierarchical relationships between resources when constructing these URIs.
The described interface supports job processing in both synchronous and asynchronous modes. If the job result can be returned to the client immediately, it is transmitted inside the returned job resource representation along with an indication of the DONE state. If, however, the processing of the request takes time, this is stated in the returned job resource representation by specifying the appropriate job state (WAITING or RUNNING). In this case, the client uses the obtained job resource identifier to subsequently check the job state and obtain its results.
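The synchronous/asynchronous dispatch described above can be sketched from the client's point of view as follows. This is a minimal illustration, not the actual MathCloud client code: the field names ("state", "outputs", "uri") and the job URIs are assumptions, since the paper does not fix a concrete representation format.

```python
import json

def handle_job(job):
    """Return results if the job is DONE, otherwise the URI to poll.

    The job states DONE, WAITING and RUNNING are those named in the text;
    the dictionary keys are illustrative assumptions.
    """
    if job["state"] == "DONE":
        return ("done", job["outputs"])
    elif job["state"] in ("WAITING", "RUNNING"):
        return ("pending", job["uri"])
    else:
        raise RuntimeError("job failed or was cancelled: %s" % job["state"])

# Synchronous case: the result comes back in the first response.
sync_job = json.loads('{"state": "DONE", "outputs": {"x": 42}, "uri": "/jobs/1"}')
# Asynchronous case: the client must poll the job URI until completion.
async_job = json.loads('{"state": "RUNNING", "uri": "/jobs/2"}')
```

A real client would loop on the "pending" case, re-fetching the job resource via GET until the DONE state appears.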
The proposed REST API is incomplete without considering resource representation formats and means of describing service parameters. The most widely used data representation formats for Web services are XML and JSON. Of these, JSON has been chosen for the following reasons. First, JSON provides a more compact and readable representation of data structures, while XML is focused on the representation of arbitrary documents. Second, JSON integrates natively with the JavaScript language, simplifying the creation of modern Ajax-based Web applications.
A known disadvantage of JSON is the lack of standard tools for the description and validation of JSON data structures comparable to XML Schema. However, there is active ongoing work on such a format, called JSON Schema [6]. This format is used for the description of the input and output parameters of computational web services within the proposed REST API.
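As an illustration of the idea, a service description with JSON Schema-style parameter declarations might look like the sketch below. The service name, parameter names and top-level layout are invented for this example; only the JSON Schema keywords ("type", "properties", "required") follow the draft specification. The check shown is a deliberately minimal required-field test, not a full JSON Schema validator.

```python
import json

# Hypothetical service description; layout and names are illustrative.
description = json.loads("""
{
  "name": "matrix-inversion",
  "inputs": {
    "type": "object",
    "properties": {
      "matrix": {"type": "string", "description": "matrix in CAS syntax"},
      "precision": {"type": "integer"}
    },
    "required": ["matrix"]
  },
  "outputs": {
    "type": "object",
    "properties": {"inverse": {"type": "string"}}
  }
}
""")

def missing_inputs(schema, request):
    """Report required input parameters absent from a client request."""
    return [p for p in schema.get("required", []) if p not in request]
```

Introspection of such a description is what allows generic clients (and, as Section 3 describes, the workflow editor) to discover a service's parameters without any service-specific code.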
3 MathCloud Platform
The MathCloud platform [7] is a software toolkit for the building, deployment, discovery and composition of computational web services using the proposed REST API. This section presents the main components of the platform.
3.1 Service Container
The service container, codenamed Everest, is the core component of the platform. Its main purpose is to provide a high-level framework for the development and deployment of computational web services. Everest simplifies service development by means of ready-to-use adapters for common types of applications. The container also implements a universal runtime environment for such services based on the proposed REST API. The architecture of Everest is presented in Fig. 1.
Fig. 1. Architecture of service container
The service container is based on the Jersey library, the reference implementation of the JAX-RS (Java API for RESTful Web Services) specification. The container uses an embedded Jetty web server for interaction with service clients. Incoming HTTP requests are dispatched to Jersey and then to the container. Everest processes client requests in accordance with its configuration.
The Service Manager component maintains a list of the services deployed in the container and their configuration. This information is read at startup from configuration files. The configuration of each service consists of two parts:
- the public service description, which is provided to service clients;
- the internal service configuration, which is used during request processing.
The Job Manager component manages the processing of incoming requests. The requests are converted into asynchronous jobs and placed in a queue served by a configurable pool of handler threads. During job processing, a handler thread invokes the adapter specified in the service configuration.
The components that implement the processing of service requests (jobs) are provided in the form of pluggable adapters. Each adapter implements a standard interface through which the container passes request parameters, monitors the job state and receives results.
Currently, the following universal adapters are implemented.
The Command adapter converts a service request into the execution of a specified command in a separate process. The internal service configuration contains the command to execute and information about the mappings between service parameters and command-line arguments or external files.
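The parameter-to-argument mapping performed by the Command adapter can be sketched as follows. The configuration layout shown here (a command plus a template argument list) is a hypothetical stand-in; the paper does not specify MathCloud's actual configuration file format.

```python
# Hypothetical internal configuration of a Command adapter: the "args"
# templates name the service parameters to substitute at request time.
config = {
    "command": "solve",
    "args": ["--tol", "{tolerance}", "{input_file}"],
}

def build_command(config, params):
    """Substitute service request parameters into the command-line template."""
    return [config["command"]] + [a.format(**params) for a in config["args"]]
```

The container would then run the resulting argument list in a separate process and collect the outputs named in the configuration.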
The Java adapter invokes a specified Java class inside the current Java virtual machine, passing the request parameters inside the call. The specified class must implement a standard Java interface. The internal service configuration includes the name of the corresponding class.
The Cluster adapter translates a service request into a batch job submitted to a computing cluster via the TORQUE resource manager. The internal service configuration contains the path to the batch job file and information about the mappings between service parameters and job arguments or files.
The Grid adapter translates a service request into a grid job submitted to the European Grid Infrastructure, which is based on the gLite middleware. This adapter can be used both to convert an existing grid application into a service and to port an existing service implementation to the grid. The internal service configuration contains the name of the grid virtual organization, the path to the grid job description file and information about the mappings between service parameters and job arguments or files.
Note that all adapters except the Java adapter support the conversion of existing applications into services by writing only a service configuration file, i.e., without writing any code. This makes it possible for users without programming skills to publish a wide range of existing applications as services. Besides that, the support for pluggable adapters allows one to attach arbitrary service implementations and computing resources.
Each service deployed in Everest is published via the proposed REST API. In addition, the container automatically generates a complementary web interface allowing users to access the service via a web browser.
3.2 Service Catalogue
The main purpose of the service catalogue is to support the discovery, monitoring and annotation of computational web services. It is implemented as a web application with an interface and functionality similar to modern search engines.
After a service is deployed in the service container, it can be published in the catalogue by providing the URI of the service and a few tags describing it. The catalogue retrieves the service description via the unified REST API, performs indexing and stores the description along with the specified tags in a database.
The catalogue provides a search query interface with optional filters. It supports full-text search in service descriptions and tags. Search results consist of a short snippet for each found service with highlighted query terms and a link to the full service description.
In order to provide current information on service availability, the catalogue periodically pings the published services. If a service is not available, it is marked accordingly in the search results. The catalogue also implements some experimental features similar to collaborative Web 2.0 sites, e.g., the ability for users to tag services.
3.3 Workflow Management System
In order to simplify the composition of services, a workflow management system has been implemented [8]. The system supports the description, storage, publication and execution of workflows composed of multiple services. Workflows are represented as directed acyclic graphs and described by means of a visual editor. A described workflow can be published as a new composite service and then executed by sending a request to this service. The system has a client-server architecture: the client part is represented by the workflow editor, while the server part is represented by the workflow management service.
Fig. 2 shows the interface of the workflow editor. It is inspired by Yahoo! Pipes and implemented as a Web application in JavaScript. This makes it possible to use the editor on any computer running a modern web browser.
Fig. 2. Graphical workflow editor
A workflow is represented in the form of a directed acyclic graph whose vertices correspond to workflow blocks and whose edges define the data flow between the blocks. Each block has a set of inputs and outputs displayed in the form of ports at the top and bottom of the block, respectively. Each block implements certain logic for processing input data and generating output data. Data transfer between blocks is realized by connecting the output of one block to the input of another block. Each input or output has an associated data type. The compatibility of data types is checked when the ports are connected.
A service is introduced into a workflow by creating a new Service block and specifying the service URI. It is assumed that the service implements the unified REST API. This allows the editor to dynamically retrieve the service description and extract information about the number, types and names of the input and output parameters of the service. This information is used to automatically generate the corresponding input and output ports of the block.
The unified REST API provides a basis for service interoperability at the interface level. The user can connect any output of one service with any input of another service if both ports have compatible data types. However, it is important to note that the system doesn’t check the compatibility of the data formats and semantics of the corresponding parameters; it is the user's task to ensure this.
An important feature of the editor is the ability to run a workflow and display its state during execution. Before the workflow can be run, it is necessary to set the values of all input parameters of the workflow via the appropriate Input blocks. After the user clicks the Run button, the editor makes a call with the specified input parameters to the composite service representing the workflow. The editor then periodically checks the status of the running job, which includes information about the states of the individual blocks of the workflow. This information is displayed to the user by painting each workflow block in the color corresponding to its current state. After successful completion of the workflow, the values of the workflow output parameters are displayed in the Output blocks. Each workflow instance has a unique URI which can be used to open the current state of the instance in the editor at any time. This feature is especially useful for long-running workflows.
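Since a workflow is a directed acyclic graph, a runtime can derive an execution schedule from the data-flow edges alone. The sketch below illustrates this with Kahn's topological-sort algorithm over a made-up workflow; the block names and graph encoding are illustrative, not MathCloud's internal representation.

```python
from collections import deque

# Hypothetical workflow graph: each key maps a block to the blocks that
# consume its outputs (block names are illustrative).
edges = {
    "Input": ["SolverA", "SolverB"],
    "SolverA": ["Merge"],
    "SolverB": ["Merge"],
    "Merge": ["Output"],
    "Output": [],
}

def execution_order(edges):
    """Topological order of a DAG via Kahn's algorithm.

    Blocks whose inputs are all ready (in-degree zero) could be started
    in parallel, which is how independent branches overlap in execution.
    """
    indeg = {v: 0 for v in edges}
    for outs in edges.values():
        for v in outs:
            indeg[v] += 1
    queue = deque(v for v, d in indeg.items() if d == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in edges[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    if len(order) != len(edges):
        raise ValueError("workflow graph contains a cycle")
    return order
```

In this example the two solver blocks have no edge between them, so a runtime is free to invoke the corresponding services concurrently before the Merge block runs.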
The workflow management service (WMS) performs the storage, deployment and execution of workflows created with the described editor. In accordance with the service-oriented approach, the WMS deploys each saved workflow as a new service. Subsequent workflow execution is performed by sending a request to the new composite service through the unified REST API. Such requests are processed by the workflow runtime embedded in the WMS. The WMS is itself implemented as a RESTful web service, which provides a convenient way to interact with it from the workflow editor.
3.4 Security
All platform components use a common security mechanism (Fig. 3) to protect access to services. It supports authentication, authorization and a limited form of delegation based on common security technologies.
Authentication of services is implemented by means of SSL server certificates. Authentication of clients is implemented via two mechanisms. The first is a standard X.509 client certificate. The second is the Loginza service, which supports authentication via popular identity providers (Google, Facebook, etc.) or any OpenID provider. The latter mechanism, which is available only to browser clients, is convenient for users who don’t have a certificate.
Fig. 3. Security mechanism
Authorization is supported by means of allow and deny lists, which enable the service administrator to specify the users who should or should not have access to a service. A user can be specified by the distinguished name of their certificate or by their OpenID identifier.
A well-known security challenge in service-oriented environments is providing a mechanism for a service to act on behalf of a user, i.e., to invoke other services. A common use case is a workflow service which needs to access the services involved in the workflow on behalf of the user who invoked it. For such cases a proxying mechanism is implemented by means of a proxy list, which enables the service administrator to specify the certificates of services that are trusted to invoke the service on behalf of users. In comparison to the proxy certificate mechanism used in grids, this approach is more limited but provides a lightweight solution compatible with the proposed REST API.
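The list-based authorization described above amounts to a simple membership check, sketched below. The exact precedence semantics (deny list overriding the allow list, an absent allow list admitting everyone not denied) are an assumption for this illustration; the paper does not spell them out.

```python
# A sketch of list-based authorization; identity strings stand in for
# certificate distinguished names or OpenID identifiers.
def is_authorized(identity, allow=None, deny=None):
    """Assumed semantics: deny wins; an absent allow list admits
    everyone who is not explicitly denied."""
    if deny and identity in deny:
        return False
    if allow is not None:
        return identity in allow
    return True
```

Under the proxying mechanism, the same check would be applied to the certificate of the calling service against the proxy list before the user's own identity is considered.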
3.5 Clients
The platform provides Java, Python and command-line clients for accessing services from external applications and for implementing complex workflows. Since access to services is implemented via the REST API, one can use any standard HTTP library or client (e.g., curl) to interact with services. Also, thanks to the use of the JSON format, services can easily be accessed from JavaScript applications via Ajax calls. This simplifies the development of modern Web-based interfaces to services, in contrast to approaches based on big Web services and the XML data format.
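To illustrate how little machinery such a client needs, the sketch below composes a job-submission request using only the Python standard library. The service URL and parameter name are invented for the example, and the request is only constructed here, not actually sent.

```python
import json
import urllib.request

def make_submit_request(service_url, params):
    """Build a POST request on the service resource (which creates a job).

    The URL and parameter names are hypothetical; any HTTP library could
    be used in the same way, as the text notes for curl.
    """
    return urllib.request.Request(
        service_url,
        data=json.dumps(params).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = make_submit_request("https://example.org/services/inverse",
                          {"matrix": "[[1,2],[3,4]]"})
```

Sending the request with `urllib.request.urlopen(req)` would return the job representation described in Section 2, which the client then polls until the DONE state.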
4 Applications
The MathCloud platform has been used in several applications from various fields of computational science. This section describes some of these applications and summarizes general conclusions drawn from their development.
One of the first applications of the MathCloud platform concerns the “error-free” inversion of ill-conditioned matrices [9], a well-known challenging task in computational science. The application uses the symbolic computation techniques available in computer algebra systems (CAS), which require substantial computing time and memory. To address this issue, a distributed algorithm for matrix inversion has been implemented on top of the Maxima CAS exposed as a computational web service. The algorithm was implemented as a workflow based on a block decomposition of the input matrix and the Schur complement. The approach has been validated by the inversion of Hilbert matrices up to 500×500.
The matrix inversion application provided an opportunity to evaluate the performance of all platform components, in particular with respect to passing large amounts of data between services. In the case of extremely ill-conditioned matrices, the symbolic representation of the final and intermediate results reached hundreds of megabytes. Table 2 presents the obtained performance results, including the serial execution time in Maxima, the parallel execution time in MathCloud (using a 4-block decomposition) and the observed speedup. Additional analysis revealed that the overhead introduced by the platform, including data transfer, is about 2-5% of the total computing time.
Table 2. Performance of Hilbert (N×N) matrix inversion application in MathCloud.

N     Serial execution time      Parallel execution time      Speedup
      in Maxima, minutes         in MathCloud, minutes
250   8                          5                            1.60
300   15                         8                            1.88
350   27                         13                           2.08
400   45                         20                           2.25
450   72                         30                           2.40
500   109                        40                           2.73
Another MathCloud application has been developed for interpreting X-ray diffractometry data of carbonaceous films by solving optimization problems over a broad class of carbon nanostructures [10]. The application is implemented as a workflow which combines parallel calculations of scattering curves for individual nanostructures (performed by a grid application) with the subsequent solution of optimization problems (performed by three different solvers running on a cluster) to determine the most probable topological and size distribution of the nanostructures. All parts of this computing scheme (and a number of additional steps, e.g., data preparation, post-optimal processing and plotting) have been implemented as computational web services. The application helped to reveal the prevalence of low-aspect-ratio toroids in the tested films [11].
A recent work [12-13] concerns a uniform approach to the creation of computational web services related to optimization modeling. The MathCloud platform is used within this work to integrate various optimization solvers intended for the basic classes of mathematical programming problems, as well as translators of the AMPL optimization modeling language. The computational web services and workflows created cover all basic phases of optimization modeling: input of optimization problem data, interaction with solvers, and processing of the solutions found.
A special service has been developed that dispatches optimization tasks to a pool of solver services directly from the execution of the AMPL translator. These features enable running any optimization algorithm written as an AMPL script in a distributed mode, where all problems (and/or intermediate subproblems) are solved by remote optimization services. Independent problems are solved in parallel, thus increasing overall performance in accordance with the number of available services. The proposed approach has been validated by the example of the Dantzig–Wolfe decomposition algorithm for the multi-commodity transportation problem.
The experience gained from application development shows that the MathCloud platform can be efficiently applied to a wide class of problems. This class can be described as problems that allow decomposition into several coarse-grained subproblems (dependent or independent) that can be solved by existing applications represented as services. It is important to note that while MathCloud can be used as a parallel computing platform in homogeneous environments such as a cluster, it is generally not as efficient in this setting as dedicated technologies such as MPI. The main benefits of MathCloud are revealed in heterogeneous distributed environments involving multiple resources and applications belonging to different users and organizations.
Exposing computational applications as web services is rather straightforward with MathCloud. In our experience, it usually takes from tens of minutes to a couple of hours to produce a new service, including service deployment and debugging. This is mainly because the service interface is fixed and the service container provides a framework that implements all problem-independent parts of a service. This means that a user doesn’t need to develop a service from scratch, as happens with general purpose service-oriented platforms. In many cases service development reduces to writing a service configuration file. In other cases the development of an additional application wrapper is needed, which is usually accomplished by writing a simple shell or Python script.
Workflow development is somewhat harder, especially in the case of complex workflows. However, the workflow editor provides some means for dealing with this. First of all, it enables dividing a complex workflow into several simpler sub-workflows by supporting the publishing and composition of workflows as services. Second, it is possible to add custom workflow actions written in JavaScript or Python, for example to create complex string inputs for services from user data or to collect additional timing information. Finally, besides using the graphical editor, it is possible to download a workflow in JSON format, edit it manually and upload it back to the WMS. These and other features provide rather good usability for practical use of the MathCloud platform.
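A downloadable workflow description might look like the sketch below. This is purely illustrative: the paper does not specify the actual JSON format, so the field names, block types and link encoding here are assumptions made for the example.

```python
import json

# A hypothetical downloaded workflow: three blocks wired Input -> Service
# -> Output, with the Service block referring to a made-up service URI.
workflow_json = """
{
  "name": "inversion-pipeline",
  "blocks": [
    {"id": "in1", "type": "Input"},
    {"id": "svc1", "type": "Service",
     "uri": "https://example.org/services/inverse"},
    {"id": "out1", "type": "Output"}
  ],
  "links": [
    {"from": "in1", "to": "svc1"},
    {"from": "svc1", "to": "out1"}
  ]
}
"""
workflow = json.loads(workflow_json)
```

Whatever the concrete schema, a plain-JSON representation is what makes the manual edit-and-reupload cycle described above practical with ordinary text tools.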
5 Related Work
The use of the service-oriented approach in the context of scientific computing was proposed in [1]. Service-Oriented Science, as introduced by Foster, refers to scientific research enabled by distributed networks of interoperating services.
The first attempts to provide a software platform for Service-Oriented Science were made in Globus Toolkit 3 and 4, based on the Open Grid Services Architecture (OGSA) [14]. OGSA describes a service-oriented grid computing environment based on big Web services. Globus Toolkit 3/4 provided service containers for the deployment of stateful grid services that extended big Web services. These extensions were documented in the Web Services Resource Framework (WSRF) specification [15], which largely failed due to the inherent complexity and inefficiencies of both the specification and its implementations. Globus Toolkit 4 had a steep learning curve and provided no tools for the rapid deployment of existing applications as services or for connecting services to grid resources.
There have been several efforts aimed at simplifying the transformation of scientific applications into remotely accessible services. The Java CoG Kit [16] provided a way to expose legacy applications as Web services. It uses a serviceMap document to generate the source code and WSDL descriptor for the Web service implementation. Generic Factory Service (GFac) [17] provides automatic service generation using an XML-based application description language. Instead of source code generation, it uses an XSUL Message Processor to intercept SOAP calls and route them to a generic class that invokes the scientific application. SoapLab [18] is another toolkit that uses an application description language, called ACD, to provide automatic Web service wrappers.
Grid Execution Management for Legacy Code Architecture (GEMLCA) [19] implements a general architecture for deploying legacy applications as grid services. It provides an application repository and a set of WSRF-based grid services for the deployment, execution, and administration of applications. Instead of generating a different WSDL for every deployed application, as GFac and SoapLab do, GEMLCA uses a generic interface and client for application execution. Applications are executed by submitting grid jobs through back-end plugins supporting the Globus Toolkit and gLite middleware. GEMLCA was integrated with the P-GRADE grid portal [20] in order to provide user-friendly Web interfaces for application deployment and execution. The workflow editor of the P-GRADE portal supports the connection of GEMLCA services into workflows using a Web-based graphical environment.
The Opal [21] and Opal2 [22] toolkits provide a mechanism to deploy scientific applications as Web services with a standard WSDL interface. This interface provides operations for job launch (which accepts command-line arguments and input files as its parameters), status querying, and output retrieval. In contrast to GEMLCA, the Opal toolkit deploys a new Web service for each wrapped application. Opal also provides an optional XML-based specification of command-line arguments, which is used to automatically generate Web forms for service invocation. In addition to Base64-encoded inputs, Opal2 supports the transfer of input files from remote URLs and via MIME attachments, which greatly improves the performance of input staging. It supports several computational back-ends, including GRAM, DRMAA, Torque, Condor, and the CSF meta-scheduler.
The described toolkits have many similarities with the presented software platform, e.g., declarative application description, a uniform service interface, and asynchronous job processing. A key difference is the way services are implemented. While all of the mentioned toolkits use big Web services and the XML data format, the MathCloud platform exposes applications as RESTful web services using the JSON format. The major advantages of this approach are decreased complexity, the use of core Web standards, wide adoption, and native support for modern Web applications, as discussed in Section 2.
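This difference is visible on the client side: invoking a RESTful computational service amounts to a plain HTTP POST of a JSON document, with no SOAP toolkit or generated stubs involved. The sketch below illustrates this; the endpoint URL, the `submit_job` helper, and the returned resource layout are assumptions for illustration, not the literal MathCloud API:

```python
import json
from urllib import request

# Hypothetical endpoint of a computational web service.
SERVICE_URL = "http://example.org/services/integrate"

def submit_job(service_url, parameters, opener=request.urlopen):
    """Sketch of invoking a computational RESTful service: POST a
    JSON document with the input parameters and receive a JSON
    description of the created job resource. The `opener` hook
    exists only so the sketch can be exercised without a network."""
    body = json.dumps(parameters).encode("utf-8")
    req = request.Request(service_url, data=body,
                          headers={"Content-Type": "application/json"})
    with opener(req) as resp:
        return json.load(resp)
```

A client would then poll the returned job resource with GET requests until the job reaches a terminal state; any language with an HTTP client and a JSON parser, including in-browser JavaScript, can do this directly.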
The idea of using RESTful web services and Web 2.0 technologies as a lightweight alternative to big Web services for building service-oriented scientific environments was introduced in [23]. Given the level of adoption of Web 2.0 technologies relative to grid technologies, Fox et al. suggested replacing many grid components with their Web 2.0 equivalents. Nevertheless, to the authors' knowledge, there are no other efforts to create a general-purpose service-oriented toolkit for scientific applications based on RESTful web services.
There are many examples of applying Web 2.0 technologies in scientific research in the form of Web-based science gateways and collaborative environments [20, 24-25]. While such systems support convenient access to scientific applications via Web interfaces, they do not expose applications as services, thus limiting application reuse and composition.
There are also many scientific workflow systems, e.g. [26-28]. The system described in this paper stands out among them by providing a Web-based interface, automatic publication of workflows as composite services, and native support for RESTful web services.
6 Conclusion
The paper presented the MathCloud platform, which enables wide-scale sharing, publication, and reuse of scientific applications as RESTful web services based on the proposed unified REST API. In contrast to other similar efforts based on Web Services specifications, it provides a more lightweight solution with native support for modern Web applications. MathCloud includes all core tools for building a service-oriented environment, such as a service container, a service catalogue, and a workflow management system. The platform has been successfully used in several applications from various fields of computational science, which confirms the viability of the proposed approach and software platform.
Future work will focus on building a hosted Platform-as-a-Service (PaaS) for the development, sharing, and integration of computational web services based on the described software platform.
Acknowledgements. This work is supported by the Presidium of the Russian Academy of Sciences (basic research program No. 14) and the Russian Foundation for Basic Research (grant No. 11-07-00543-а).
References
1. Foster, I.: Service-Oriented Science. Science, vol. 308, no. 5723, pp. 814–817 (2005)
2. Richardson, L., Ruby, S.: RESTful Web Services. O’Reilly Media (2007)
3. Pautasso, C., Zimmermann, O., Leymann, F.: RESTful Web Services vs. "Big" Web Services: Making the Right Architectural Decision. In: 17th International Conference on World Wide Web (WWW '08), pp. 805-814. ACM, New York, NY, USA (2008)
4. Fielding, R. T.: Architectural Styles and the Design of Network-based Software Architectures. Ph.D. dissertation, University of California, Irvine, Irvine, California (2000)
5. Sukhoroslov, O.V.: Unified Interface for Accessing Algorithmic Services in Web.
Proceedings of ISA RAS, vol. 46, pp. 60-82 (2009) (in Russian)
6. JSON Schema, http://www.json-schema.org/
7. MathCloud Project, http://mathcloud.org/
8. Lazarev, I.V., Sukhoroslov, O.V.: Implementation of Distributed Computing Workflows in
MathCloud Environment. Proceedings of ISA RAS, vol. 46, pp. 6-23 (2009) (in Russian)
9. Voloshinov, V.V., Smirnov, S.A.: Error-Free Inversion of Ill-Conditioned Matrices in
Distributed Computing System of RESTful Services of Computer Algebra. In: 4th Intern.
Conf. Distributed Computing and Grid-Technologies in Science and Education, pp. 257-
263. JINR, Dubna (2010)
10. Neverov, V.S., Kukushkin, A.B., Marusov, N.L. et al.: Numerical Modeling of Interference Effects of X-Ray Scattering by Carbon Nanostructures in the Deposited Films from Tokamak T-10. Problems of Atomic Science and Technology, Ser. Thermonuclear Fusion, Vol. 1, pp. 13-24 (2011) (in Russian)
11. Kukushkin, A.B., Neverov, V.S., Marusov, N.L. et al.: Few-nanometer-wide carbon
toroids in the hydrocarbon films deposited in tokamak T-10. Chemical Physics Letters 506,
pp. 265–268 (2011)
12. Voloshinov, V.V., Smirnov, S.A.: On development of distributed optimization modelling
systems in the REST architectural style. In: 5th Intern. Conf. Distributed Computing and
Grid-Technologies in Science and Education. JINR, Dubna (2012)
13. Voloshinov, V.V., Smirnov, S.A.: Software Integration in Scientific Computing. Information Technologies and Computing Systems, No. 3, pp. 66-71 (2012) (in Russian)
14. Foster, I., Kesselman, C., Nick, J., Tuecke, S.: Grid Services for Distributed System
Integration. Computer 35(6), pp. 37-46 (2002)
15. WS-Resource Framework, http://www-106.ibm.com/developerworks/library/ws-resource/ws-wsrf.pdf
16. von Laszewski, G., Gawor, J., Krishnan, S., Jackson, K.: Commodity Grid Kits -
Middleware for Building Grid Computing Environments. In: Grid Computing: Making the
Global Infrastructure a Reality, chapter 25. Wiley (2003)
17. Kandaswamy, G., Fang, L., Huang, Y., Shirasuna, S., Marru, S., Gannon, D.: Building Web Services for Scientific Grid Applications. IBM Journal of Research and Development, vol. 50, no. 2.3, pp. 249-260 (2006)
18. SoapLab Web Services, http://www.ebi.ac.uk/soaplab/
19. Delaitre, T., Kiss, T., Goyeneche, A., Terstyanszky, G., Winter, S., Kacsuk, P.:
GEMLCA: Running Legacy Code Applications as Grid Services. Journal of Grid
Computing Vol. 3. No. 1-2, pp. 75-90 (2005)
20. Kacsuk, P., Sipos, G.: Multi-Grid, Multi-User Workflows in the P-GRADE Portal. Journal
of Grid Computing, Vol. 3, No. 3-4, Springer, pp. 221-238 (2005)
21. Krishnan, S., Stearn, B., Bhatia, K., Baldridge, K. K., Li, W., Arzberger, P.: Opal: Simple
Web Services Wrappers for Scientific Applications. In: IEEE Intl. Conf. on Web Services
(ICWS) (2006)
22. Krishnan, S., Clementi, L., Ren, J., Papadopoulos, P., Li, W.: Design and Evaluation of Opal2: A Toolkit for Scientific Software as a Service. In: 2009 IEEE Congress on Services (SERVICES-1 2009), pp. 709-716 (2009)
23. Fox, G., Pierce, M.: Grids Challenged by a Web 2.0 and Multicore Sandwich.
Concurrency and Computation: Practice and Experience, vol. 21, no. 3, pp. 265–280
(2009)
24. McLennan, M., Kennell, R.: HUBzero: A Platform for Dissemination and Collaboration in
Computational Science and Engineering. Computing in Science and Engineering, 12(2),
pp. 48-52, March/April (2010)
25. Afgan, E., Goecks, J., Baker, D., Coraor, N., Nekrutenko, A., Taylor, J.: Galaxy - a Gateway to Tools in e-Science. In: Yang, K. (ed.) Guide to e-Science: Next Generation Scientific Research and Discovery, pp. 145-177. Springer (2011)
26. Altintas, I., Berkley, C., Jaeger, E., Jones, M., Ludäscher, B., Mock, S.: Kepler: an
extensible system for design and execution of scientific workflows. In: Proceedings of the
16th International Conference on Scientific and Statistical Database Management
(SSDBM '04), pp. 423-424 (2004)
27. Deelman, E., Singh, G., Su, M. H. et al.: Pegasus: a framework for mapping complex
scientific workflows onto distributed systems. Scientific Programming, vol. 13, no. 3, pp.
219-237 (2005)
28. Oinn, T., Greenwood, M., Addis, M. et al.: Taverna: lessons in creating a workflow
environment for the life sciences. Concurrency Computation Practice and Experience, vol.
18, no. 10, pp. 1067–1100 (2006)