MAGIO: Using Mobile Agents to Enhance Parallel I/O
David E. Singh, Florin Isaila, Félix García and Jesús Carretero
Department of Computer Science, University Carlos III of Madrid, Avda. de La Universidad
30. 28911 Leganés (Madrid), Spain.
Email: desingh@arcos.inf.uc3m.es, florin.isaila@gmail.com, fgcarbal@inf.uc3m.es,
jesus.carretero@uc3m.es
Abstract
In the last years, the increasing gap between processor speed and storage has exposed
I/O as one of the most significant bottlenecks in parallel applications. Distributed architectures
based on commodity components belonging to the same organization consist of heterogeneous
computers with dynamically evolving utilization loads. These resources can be efficiently used for
executing parallel computing applications provided that all the computers are under the same
administrative domain. In this work we propose an execution environment which provides
these parallel applications efficient access to storage when a distributed or parallel
file system is not available. Unlike traditional storage solutions, we propose a pull-based model
built on mobile agents. In this approach, the mobile agents can freely traverse the compute nodes
and access the memory region of the parallel application. Therefore, the application data
can be accessed by the mobile agents and efficiently transferred to the storage nodes. The final
goal of this platform is to provide distributed asynchronous I/O services based on the usage of
mobile agents. Experimentally we show that, by exploiting the unique characteristics of mobile
agents, the efficiency of the I/O stage can be dramatically increased.
KEYWORDS: Distributed architectures, mobile agents, parallel I/O.
1 Introduction
Nowadays, distributed computing technologies are evolving toward integration of all the
services (compute, storage, visualization, etc.) into highly distributed large-scale platforms. This
work targets infrastructures of loosely connected components under the same administrative
domain. An example is a collection of computers, belonging to the same organization, that
is employed in volunteer computing for executing MPI applications. There are significant
differences with the cluster computing approach. First, components can be highly heterogeneous
both in computing power and in network performance. For instance, the network
characteristics can differ between compute nodes. Second, the access to the
compute nodes is not exclusive: other applications can be executed at the same time,
consuming compute and network resources. The load changes dynamically and is impossible to
predict. In addition, there can be hotspots, both at compute node level and at network link level.
Third, given that the target infrastructure is used in an ad-hoc manner, it is highly improbable that
shared resources, like parallel file systems, are available. For this new environment, issues such as
communication overhead, contention, system reliability and fault tolerance become increasingly
important for executing parallel applications in an efficient and dependable manner.
When parallel application reliability is considered, one practical solution for improving it
consists in introducing robust checkpointing and rollback mechanisms. During the I/O
phase the current process state is transferred to local or remote disks. This operation usually
involves transferring large amounts of data, introducing significant delays that reduce the overall
application performance. In general, parallel applications alternate compute and I/O phases. The
use of local storage caching or distributed parallel file systems can diminish the impact of the I/O
phase, but they require special support for, respectively, transferring the data to a central
storage server or deploying and administering the file system. This work presents a solution to
this problem based on Mobile Agents (MAs) in the context of parallel applications executed on
decentralized distributed platforms.
Mobile Agent Platforms (MAPs) provide a flexible and scalable solution for the
management of diverse elements of the distributed infrastructure [37]. In the context of this work
different tasks may be assigned to a MA, which executes them in an autonomous, distributed, and
adaptive fashion. The MA includes functionalities for monitoring and dynamically adapting to the
changing conditions of the platform and efficiently exploiting its resources. For instance, when, due to
external reasons, a given compute node increases its computational load or a network link reduces
its bandwidth, the MAP allows the system to dynamically adapt to the new execution conditions.
This paper presents Mobile AGents I/O (MAGIO), a novel I/O infrastructure for parallel
applications based on MAs. MAGIO provides facilities for collecting/storing data from the
application memory space and transferring these data to the storage nodes. In MAGIO the I/O is
performed asynchronously, without the need of special I/O calls from the parallel application. This
feature allows overlapping the compute and I/O phases, improving the overall performance of the
parallel application. Additionally, the platform includes monitoring functionalities that can be used
to collect information about the compute node status. This information is employed for
performing an adaptive, efficient I/O, which is aware of the real platform status. The unique
characteristics of MAs allow them to adapt to the changing platform conditions, such as processor
and network performance. In addition, MAs are intrinsically parallel, thus they can adapt to the
application characteristics and achieve efficient parallel I/O operations.
The rest of the paper is organized as follows. Section 2 introduces the related work.
Section 3 describes MAGIO's internal structure and the components of the distributed architecture.
Section 4 shows how MAGIO handles the parallel application data distribution structures and the
way that MA scheduling is performed. Section 5 depicts different techniques for creating user
defined or optimal MA scheduling. Section 6 presents the integration of MAGIO into a parallel
scientific application, and performance results of its evaluation on a distributed architecture.
Finally, Section 7 summarizes the main conclusions of this work.
2 Related work
In this work, we present a MAP that is integrated into MPI applications for performing the
I/O on a remote filesystem. Typically, parallel applications follow the classical model for which
data are initially distributed, then processed in parallel, and finally the results are collected and
stored on the disk. There are several classifications of the file I/O access routines based on their
design and implementation. First, file I/O methods can be independent or collective. In the
former case all the network and file system operations are performed independently from each
other. However, these implementations may cause an inefficient use of the network and disk
resources, especially for an important subset of scientific applications, known to generate a large
number of requests for small disjoint regions of the shared files [45, 53]. In turn, collective I/O
techniques merge small individual requests from compute nodes into larger global requests in
order to optimize the network and disk performance. Depending on the place where the request
merging occurs, one can identify two collective I/O methods. If the requests are merged at the I/O
nodes the method is called disk-directed I/O [36] or server-directed I/O [49]. If the merging
occurs at intermediary nodes or at compute nodes the method is called two-phase I/O [51, 61].
Second, file I/O can be implemented either as synchronous or asynchronous operations.
Asynchronous implementations [40, 34, 47, 52] offer the advantage of overlapping the
computation and file I/O. Third, file I/O can be implemented either as push or pull operations. In
the push operations implemented by the vast majority of parallel file systems [48, 38, 33], the
compute nodes send the data to I/O nodes responsible for storing them to disks. Push operations
can cause disk contention if a large number of compute nodes write data at the same time. In the
pull operations [59], I/O nodes are notified by the compute nodes and retrieve the data based on
a schedule optimizing the network and disk bandwidth. The agent approach taken in this paper
can be classified as collective, asynchronous and pull-based.
The use of autonomous and multi-agent systems as an active element of distributed
computing infrastructures has been an important trend during the last years [44, 1]. Their use is
characterized by an autonomous and asynchronous execution that hides the communication
overheads. Other properties, such as reactiveness, proactiveness, learnability, and collaboration
capabilities can be used for adapting to the dynamic conditions of the evolving platform and
increasing its robustness. Mobile agents have been used in the context of distributed computing.
There are several initiatives such as GrADS [56], a migration framework for Grid systems that takes
into account both the compute load and the application characteristics; Ibis [46], a programming
environment based on Java for distributed systems that supports communication and serialization
of Java objects; and ASSIST [23] which is an example of the use of autonomic management for
performance tuning in massively parallel applications. [41] presents a platform that combines
agents with parallel applications. The integration with MAs is limited to the level of control and
management. In [35] a model-driven component-based framework is presented. MASIPE [54] uses
MAs for collecting and displaying data from parallel applications although it is not designed for
I/O.
Examples of the use of MAP for Grid are AgentScape [57], GridFlow [10], Grid of agents for
network management [4], and GAIN [9]. GAIN provides an infrastructure for defining, distributing,
and executing workflow tasks on Grid nodes. GAIN is implemented in JADE [25] with different
types (roles) of MAs for performing these functionalities. In [15] a JADE extension for Grid is
presented. A different approach is presented in [31], where an execution environment for
multi-agent computing systems based on cloud and volunteer computing is introduced.
There is a limited body of research on exploiting agents for file I/O. MAPFS [63] is a
multiagent architecture targeting high-performance for different access patterns. This is achieved
by hiding the latency of file access through loosening the coupling between the applications and
the storage architecture of a cluster. MAPFS-DSI [64] leverages agents for improving the
performance of file transfers through the GridFTP protocol. Similarly to our approach, Fukuda and
Miyauchi [66] propose to use agents for distributing files from and to remote computing nodes
that are not connected in the same network file system. Unlike in our case, their system targets
Java applications, requires modification of existing applications and is based on a hierarchy of
agents.
In [27] a MA platform for searching, collecting, and storing software components is presented. It uses
ontologies [62] for disambiguating queries, navigating the MAs, and efficiently retrieving the
information. Although it is not used for I/O, it is an example of the flexibility and transparency
achieved by the use of MAs. In [16] MAs are employed to allocate resources for multimedia
applications based on the available bandwidth. In [30] MAs are used for transferring data in sensor
networks, reducing the data redundancy and communication overheads.
A critical issue of a MAP is the efficient scheduling and routing of MAs. In [43, 28] the
Travelling Agent Problem is presented. It is defined as finding the optimal MA itinerary which
minimizes the transfer time. In general, this is an NP-complete problem, but its complexity can be
reduced under simplifying assumptions (regarding the number of MAs, the network latencies, the
probabilities of success, and the compute time of each MA task). In [17] a decentralized scheme
with different routing strategies for MAs is presented; these routing techniques aim to avoid
congestion and achieve good load balance. In [11, 12], routing algorithms are presented that
compute the minimal number of MAs and their planning for a given turn-around time. In this
paper we follow a similar approach, but in our case the problem can be simplified given that, once
a sequence of nodes is assigned to a given MA, it is possible to find the routing for it.
In [5] an agent approach to solve routing problems is presented. This approach is based on
the swarm intelligence principle: Using a collection of agents with self-organizing capabilities to
solve a given problem. Another approach can be seen in [3], where a hybrid routing protocol
is presented which combines on-demand and ant routing techniques, exploiting and combining the
advantages of each one. In [22] MAs have local planning abilities and interact with each other and
cooperate in a peer-to-peer mode. The use of Simulated Annealing techniques and Genetic
Algorithms for optimizing the task scheduling has been extensively studied [20]. In [13, 2] route
selection algorithms based on Genetic Algorithms are presented for reducing the MA propagation
time. In [55] a traffic-based routing algorithm is presented.
In our case, the routing scheme is limited by the range of data that are transferred to disk,
that is, by the compute nodes that are involved in the I/O operation. For a given set of compute
nodes, we establish the number of MAs and route them based on the minimization of an objective
function. Note that in our experiments each compute node is visited by only one MA, which
reduces the interaction degree between the MAs. (Our platform supports collecting different parts
of the compute node data by more than one MA; however, we impose this restriction because it
obtains the best performance.)
One practical application of MAGIO is its use for checkpointing [29, 14, 21]. During this
technique the processes of a parallel application save their state on disk. In the event of a system
failure, the last saved state can be used to restart the process on the same or a different compute
node. There are several checkpointing strategies: in coordinated checkpointing [19, 58] the processes
synchronize for storing the same application state. This simplifies the recovery process but
increases the overhead, given that the processes have to be synchronized and halted. In
uncoordinated checkpointing [8], this operation is performed independently, storing different
application states. This strategy obtains better performance than the previous one, but the
recovery process for finding a consistent state is more complicated. Other checkpointing strategies
are message logging [26] for recovering a consistent state based on message traces and
distributed storage schemes of the application data on different compute nodes [7]. All these
strategies rely on storing the state of each process on disk. MAGIO can be employed in
combination with all of them given that it provides support for transferring the data from the
compute nodes to the storage nodes. In addition, by means of MAGIO explicit I/O operations can be
removed, given that the current system state can be stored (by the application) in a memory area
and subsequently collected and transferred by the MAs.
Currently, there are many stable and scalable distributed monitoring solutions [6, 42]. The
collected information is used for the system management and failure detection and can be
employed in combination with MAGIO for modeling the platform architecture. These solutions and
the one offered by MAGIO are complementary. In [60] mobile agents are employed for reducing
the energy consumption in cluster environments. In this work agents monitor distributed applications
executed in clusters and are responsible for scaling the frequency of each compute node according
to the application characteristics. In [39], a mobile agent-based platform for e-commerce systems is
presented. In this case mobile agents are used to monitor and choose the best sellers and to
perform the electronic transactions. Although the scope of that work is different from that of this
paper, in both works agents are employed to procure elements according to a dynamic
procurement scheme.
3 MAGIO architecture
MAGIO offers two main services: system monitoring and distributed I/O. The first one
consists of collecting information from the compute nodes, which includes the amount of
memory, disk, and CPU usage. In practice, the user specifies the compute nodes to be monitored.
According to this information, the MAP creates and delivers the MAs to the distributed system.
The agents traverse all their assigned compute nodes, collecting their status, and subsequently
leveraging it for monitoring the system.
The second service consists of storing in the Storage Units specific portions of the
application data. This service focuses on providing asynchronous data access facilities to
large-scale distributed applications. Data access can be performed in two forms: collecting a
specific portion of data that is distributed through the system, or collecting all the data of a given
compute node. This paper provides a unique solution to both approaches, based on the use of MPI
datatypes. Additionally, this platform implements different scheduling techniques for performing
these operations.
Figure 1: MAGIO Architecture.
Figure 1 shows a scheme of the proposed MAGIO architecture. It has a modular structure,
which allows to simplify the design and the future inclusion of new functionality. MAGIO consists
of the following major building blocks:
• The MPI application is executed on the assigned compute nodes. This application
is external to our design, but we include it because MAGIO is executed attached to it, that is,
sharing its memory space. Currently, our implementation supports Fortran and C coded
applications.
• The Mobile Agent Server (MAS) is executed attached to each MPI process. It
provides routines for receiving, executing, and sending MAs. In addition, it offers adaptors for
accessing both the compute node resources and the MPI application memory space.
• The Control Unit (CU) is responsible for creating, launching, and receiving the MAs.
It also contains a GUI that displays information about the execution status of the compute nodes.
• The Storage Unit (SU) is executed on the I/O nodes. It provides an interface for
receiving MAs and transferring their associated data to local disk. The file access pattern, i.e., the
file positions where the agent data are stored, is provided by an MPI datatype stored in the MA.
• The Mobile Agent (MA) is able to migrate between these blocks: first, MAs are
created at the CU; second, they collect data from the parallel application; and finally,
they deliver these data to the SUs. Our innovative platform provides flexibility in use, transparency
in data accessing, efficiency and scalability in performing the I/O.
This approach was implemented for MPI, but can also be extended to
OpenMP or hybrid MPI-OpenMP programs. All building blocks, with the exception of the MPI
applications, are coded in C# and are executed in the Mono execution environment [65]. The
following sections describe the structure of each one of them.
3.1 Building block structure
Figure 2: Mobile Agent Server Structure.
Figure 2 shows the Mobile Agent Server structure. When the MPI application is executed
in a distributed environment, each process has attached one MAS, which is executed as an
independent thread. The MAS has two different roles. First, when the MAS starts its execution, it
offers services for the management of assemblies, which are portable executable files
corresponding to agents and libraries. The management includes operations such as receiving the
assemblies from the CU, storing them in a local cache and accessing them when the agent is
executed. The second role consists of the management of MAs. This task comprises three operations:
MA reception from a CU or another MAS, MA local execution, and MA delivery to the next MAS,
CU, or SU.
MAS consists of four components: Migration Component, Assembly Resolver, Local
Assembly Cache, and MPI Server Adaptor. The Migration Component performs the transfer of the
MAs and other assemblies using .NET Remoting facilities over a TCP channel. The Assembly
Resolver manages the assembly execution. When a new assembly is received, the Assembly
Resolver authenticates it using the assembly metadata, which includes the agent name, version,
and public key. The public key is identified by a 64-bit hash and is associated with the private key used to sign the
assembly. When the assembly is created, a hash is computed and subsequently encrypted with
the private key. This hash is stored in the assembly metadata. Then, when the assembly is
authenticated at the Assembly Resolver, the hash is decrypted using the public key and compared
with a new hash generated from the assembly. If the public key is truly related to the private key,
then both hashes match, and the assembly is properly authenticated. Once the authentication
process is completed, the assembly is stored in the Local Assembly Cache. When a specific
assembly is required, it is loaded from it. The fourth module is the MPI Server Adaptor, which
attaches the MAS to the MPI application. This unit provides access to the process memory space
as well as to the local node resources. This way, the MAS may access the application data by
means of memory pointers to data structures that are provided by the MPI Server Adaptor.
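The authentication steps described above amount to a standard sign-then-verify protocol. The toy C sketch below illustrates only the flow; a trivial hash and XOR stand in for the real cryptography (the platform relies on .NET strong naming, as described), and all names are illustrative.

    #include <stdio.h>
    #include <string.h>

    typedef unsigned long long u64;

    /* Toy hash (FNV-1a style); a real deployment uses a cryptographic hash. */
    static u64 toy_hash(const unsigned char *data, size_t len) {
        u64 h = 14695981039346656037ULL;
        for (size_t i = 0; i < len; i++) h = (h ^ data[i]) * 1099511628211ULL;
        return h;
    }

    /* "Signing": encrypt the hash with the private key (toy XOR cipher). */
    static u64 sign(const unsigned char *a, size_t len, u64 priv) {
        return toy_hash(a, len) ^ priv;
    }

    /* Verification at the Assembly Resolver: decrypt the stored hash with
     * the public key and compare it against a freshly computed hash. */
    static int authenticate(const unsigned char *a, size_t len, u64 sig, u64 pub) {
        return (sig ^ pub) == toy_hash(a, len);
    }

    int main(void) {
        unsigned char assembly[] = "agent code and metadata";
        u64 key = 0x1234abcdULL;            /* toy: symmetric, pub == priv */
        u64 sig = sign(assembly, sizeof assembly, key);
        printf("authenticated: %d\n", authenticate(assembly, sizeof assembly, sig, key));
        return 0;
    }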
Figure 3: Control Unit Structure.
The Control Unit, depicted in Figure 3, consists of four major components: The GUI
Manager allows the user to select, configure, and execute MAs, displaying the information
resulting from their execution. Once a given MA is selected, its code is loaded into memory from
the Repository Module, which contains all the MAs and configuration assemblies. The CU also
contains a Data Assignment unit. This unit initializes and assigns the data that each MA has to
collect during the I/O operation. This assignment is computed trying to minimize the overall I/O
time, and taking into account the distributed architecture characteristics (network speed, compute
node computational power, etc.). Finally, the Migration Component transfers the MAs and other
assemblies using .NET Remoting facilities.
The Storage Unit consists of the Migration Component, the Assembly Resolver and the
Local Assembly Cache. These modules have the same functionality as the ones from the Mobile Agent
Server. Instead of using an MPI Server Adaptor, the SU implements services for storing the MA data
to local disk. These services include MPI datatype handling for mapping the agent data to file
positions.
3.2 Mobile Agent logic
Mobile agents contain both code and data sections. The code section consists of all the
agent logic necessary to compute its itinerary as well as to perform its duties in the MASs and SUs.
The data section encloses two different elements: metadata and payload. The first one contains
the information required by the MA to perform its task. It includes the agent itinerary and data
map, represented as MPI datatypes (in the case of a data collector). The payload contains the
data collected by the agent in each MAS.
Our current implementation supports two different agents: the data collector which
gathers data from the MPI application memory space and the system monitor, which collects
information from the compute node configuration files. The first one is used for performing
distributed I/O operations and the second one is employed for monitoring the distributed
architecture status. The information collected from the monitoring process includes the
network/node characteristics, which will be subsequently used as input parameters in the
scheduling component. Figure 1 shows the agent itinerary of these two types of agents. Mobile
Agent 1 is a system monitor, which travels to the Compute Node 1, collecting the node status
information and transferring this information back to the CU. Mobile Agent 2 is a data collector. It
leaves the CU and collects data from the MPI application processes running on the compute nodes
2 and 4 and, finally, stores these data on the second storage node. Note that both agent types can
traverse any number of compute nodes and that both operations are fully parallel.
The integration of MAGIO with the target application is simple, requiring only three
elements: a pointer to the data structure, an MPI datatype describing the data contents, and a lock
variable for allowing/denying data access by the MA. Using this variable, it is possible to control
the time interval in which data are collected (and subsequently transferred) by the MA.
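A hedged sketch of this three-element contract from a C application follows; magio_register and the lock protocol are hypothetical names for illustration, not MAGIO's actual API (a stub stands in for the platform side so the sketch is self-contained).

    #include <mpi.h>

    /* Hypothetical registration entry point (stubbed for self-containment). */
    void magio_register(void *data, MPI_Datatype layout, volatile int *lock) {
        (void)data; (void)layout; (void)lock;  /* MAGIO would record these */
    }

    volatile int magio_lock = 0;  /* 0: MA may collect, 1: app is writing */
    double state_buf[1024];       /* per-process data to be stored        */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        MPI_Datatype layout;      /* describes the contents of state_buf  */
        MPI_Type_contiguous(1024, MPI_DOUBLE, &layout);
        MPI_Type_commit(&layout);

        magio_register(state_buf, layout, &magio_lock);  /* hypothetical */

        magio_lock = 1;           /* deny MA access while updating        */
        /* ... update state_buf ... */
        magio_lock = 0;           /* allow the MA to read a consistent copy */

        MPI_Type_free(&layout);
        MPI_Finalize();
        return 0;
    }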
Regarding the platform reliability, the MA is responsible for collecting (from memory or
disk) and transferring the compute node data. Agent failure detection can be done using timeout
or heartbeat techniques. Timeout is estimated based on the traveling time of each MA. This is
done during the MA scheduling process. A MA is considered lost if it is not received at the SU after
this time interval (plus a confidence value). With heartbeat techniques, MAs regularly send messages
to the other members of the group to notify their status. If one MA is lost, the message emission
is interrupted. In both cases, when one MA is lost, the MAP schedules a new agent with the
same tasks.
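A minimal sketch of the timeout test, assuming the estimated travel time comes from the cost model used during MA scheduling (names are illustrative):

    #include <time.h>

    /* An MA is declared lost if it has not reached the SU within its
     * estimated travel time plus a confidence margin (both in seconds). */
    int ma_is_lost(time_t launch_time, double est_travel_s, double confidence_s) {
        return difftime(time(NULL), launch_time) > est_travel_s + confidence_s;
    }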
MAs have the following properties: they are autonomous and adaptive, i.e., they can
take run-time decisions about their actions based on platform conditions; they travel across both
the compute and I/O nodes, interacting with them; and they are persistent. The following sections
describe in more detail the implementation of these features.
Figure 4: Datatype handling.
4 Data management
A large class of parallel compute-intensive scientific applications operates on
multidimensional data sets, which are distributed among the compute nodes. MAGIO includes
services that allow the user to specify the disk data distribution, how many MAs are employed,
and which portion of data each agent collects. Optionally, an automatic data assignment unit can
be employed for automatically computing the number of MAs, in order to minimize the overall I/O
cost.
Each data collector agent handles a specific datatype, which describes the data entries
that it has to collect. Datatypes are data structures that contain references to memory or file
positions. Datatypes are commonly employed by MPI applications for describing the memory
regions that are used for sending/receiving data. They are also employed in I/O view operations
for describing the file positions where the data are allocated. MPI datatypes provide a flexible
solution for handling generic types of data distributions. Figure 4 shows an example of
MPI_Indexed datatype for a 16-entry data array. MPI_Indexed datatype consists of a sequence of
blocks, where each block can contain a different number of copies and have a different
displacement. Distribution datatype 1 contains the following offset-length list of blocks: {0,4},
{5,4}, {12,1} and {14,2}.
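For reference, distribution datatype 1 above can be built with MPI_Type_indexed; the element type (MPI_DOUBLE) is an assumption, since the paper does not state it.

    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        /* Offset-length blocks {0,4}, {5,4}, {12,1}, {14,2} from Figure 4 */
        int blocklens[4] = {4, 4, 1, 2};
        int displs[4]    = {0, 5, 12, 14};
        MPI_Datatype dist1;

        MPI_Type_indexed(4, blocklens, displs, MPI_DOUBLE, &dist1);
        MPI_Type_commit(&dist1);

        /* dist1 now selects entries 0-3, 5-8, 12 and 14-15 of the 16-entry
         * array and could serve as a distribution or MA datatype. */

        MPI_Type_free(&dist1);
        MPI_Finalize();
        return 0;
    }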
When a data collector MA is created, a given datatype is added to its data section with the
description of the memory entries that it has to collect. We call this datatype MA datatype. We
distinguish two different MA datatypes: global data assignment, when a given agent has to collect
the whole data from one or several MPI processes; and partial data assignments, when the agent
collects only part of the data. Once the MA is created and initialized with a data assignment, it
first visits the MASs that contain the data distribution datatypes. Once there, it compares its data
assignment with the global data distribution. Using this information the agent determines which
MASs to visit, which data entries to collect from each one, and in which order to perform its
itinerary.
The itinerary is determined in the following way: When a MA is created it includes a basic
primitive for computing the intersection between two datatypes. The result is a newly generated
datatype. Figure 4 shows an example of this operation. We label intersection datatypes 1 and 2 the
resulting datatypes produced by the intersection of the MA datatype with distribution datatypes
1 and 2, respectively. The collected entries of the data array are the union of the entries pointed to by
both intersection datatypes.
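The intersection primitive can be pictured as a merge of two sorted offset-length lists. The sketch below assumes both datatypes have already been flattened to such lists, which is a simplification of the real MPI datatype handling; the MA-datatype blocks are invented for the example.

    #include <stdio.h>

    typedef struct { int off, len; } Block;

    /* Writes the intersection of sorted lists a and b into out; returns its size. */
    int intersect(const Block *a, int na, const Block *b, int nb, Block *out) {
        int i = 0, j = 0, n = 0;
        while (i < na && j < nb) {
            int lo  = a[i].off > b[j].off ? a[i].off : b[j].off;
            int ahi = a[i].off + a[i].len, bhi = b[j].off + b[j].len;
            int hi  = ahi < bhi ? ahi : bhi;
            if (lo < hi) { out[n].off = lo; out[n].len = hi - lo; n++; }
            if (ahi < bhi) i++; else j++;   /* advance the list that ends first */
        }
        return n;
    }

    int main(void) {
        /* Illustrative MA datatype and distribution datatype 1 from Figure 4 */
        Block ma[]   = {{2, 6}, {10, 5}};
        Block dist[] = {{0, 4}, {5, 4}, {12, 1}, {14, 2}};
        Block out[8];
        int n = intersect(ma, 2, dist, 4, out);
        for (int k = 0; k < n; k++)
            printf("{%d,%d} ", out[k].off, out[k].len);
        printf("\n");
        return 0;
    }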
Using this primitive, each MA can easily compute its itinerary by means of a scheduling
algorithm. Figure 5 depicts the algorithm. Initially (1), the MA checks if it has computed its
itinerary. If not, it visits the compute nodes that contain the data distribution datatypes (2), and
accesses their memory space. Then, it obtains the number of MPI processes used by the parallel
application. For each one, a reference of its datatypes (related to its data distribution) is
generated. Using this reference, the MA performs the intersection operation (3) between its MA
datatype and the data distribution datatypes of each MPI process. As a result, the range of
entries that the MA has to collect in this particular node is obtained. Repeating this operation, the MA
collects the list of nodes of its itinerary. If one intersection operation produces no entries, then
this node is discarded from the list.
Figure 5: Mobile agent logic.
The last step consists in computing the itinerary for visiting these nodes (4). In this work,
we have followed the minimum network traffic scheduling. A given agent with a given data
assignment creates a minimum network traffic itinerary visiting first the MAS with smallest
amount of entries to be collected, then the one with the next fewest entries, and so on. This
ordering follows from the fact that the agent accumulates the entries collected at each MAS and
carries all the collected data across the rest of its itinerary. The longer the MA itinerary,
the more data accumulated from each compute node and the greater the network traffic [32]. Visiting
first the compute nodes with less data therefore ensures smaller network traffic.
A minimum load itinerary that ensures minimum network traffic can be easily computed
from the agent data distribution. Based on the previous property, the elements of the MA
itinerary are sorted and visited in ascending order, according to the amount of data that the MA
has to collect. Finally, the SU is added as a final step of the itinerary.
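A minimal sketch of this ordering step, assuming each itinerary stop is annotated with the number of bytes the MA must collect there (names are illustrative):

    #include <stdlib.h>

    typedef struct { int node_id; long bytes; } Stop;

    static int by_bytes(const void *x, const void *y) {
        long a = ((const Stop *)x)->bytes, b = ((const Stop *)y)->bytes;
        return (a > b) - (a < b);
    }

    /* Sorts the MA's stops in ascending order of collected data;
     * the SU would then be appended as the final stop. */
    void build_itinerary(Stop *stops, int n) {
        qsort(stops, n, sizeof(Stop), by_bytes);
    }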
Once the itinerary is computed (see Figure 5), the agent transfers the assemblies (both
code and data) to the next MAS (5). This operation is performed in the CU and MAS modules. Once
the MA is ready to be transferred to the next element of its itinerary, it first obtains a remote
reference to the Migration Component responsible for receiving the MA. There are two different
migration levels for MAs: strong and weak. In the former, the agent is transferred to the new
host with its current execution state, allowing the execution to resume in the new host with the
same state (instruction pointer, register values, etc.). Weak mobility consists of transferring the agent and
executing it from an entry point, in a new execution environment. Our platform implements
the latter.
The next step of the MA transfer consists in ensuring that the target contains all the
assemblies required by the agent. A specific request is sent to the Migration Component with the
MA identification, which includes the MA version number, its language, and its public key. This
information is received by the target Migration Component, which redirects it to the Local
Assembly Cache. This module checks whether the required assemblies are in the local cache or
not, sending back (via Migration Component) to the source the result of the check. Using this
information, the source node transfers all the required assemblies that are not present. When this
operation is completed, the MA state is transferred. Finally, using this information, the target node
recognizes the MA and executes it with its current state, collecting the compute node data (6). In
this way, the MA is transferred to each node of its itinerary. For each one, the required data is
collected. When the MA reaches the SU (7), it transfers the data to disk and the MA is destroyed.
For reading operations an analogous procedure can be applied: the MA creates the
itinerary using the data distribution datatypes and the MA datatype (now related to the file
portion that has to be read by the MA). Using this information, the MA transfers the associated
data to each node of its itinerary.
Note that MAGIO can handle generic data distributions and file layouts. In a general case,
it uses two different inputs (MPI datatypes): one describing how the data structure is distributed
over the processes and another describing the data elements that have to be read/written. The
checkpointing operations are a particular case where each process has a single portion of data that
has to be transferred as a single file (one file associated to each process). In this case, the datatype
structure is the simplest one (a complete data block).
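In that degenerate case, the distribution datatype reduces to a single contiguous block per process, e.g. (assuming double elements, which the paper does not specify):

    #include <mpi.h>

    /* Builds the single-block datatype for one process's checkpoint data;
     * local_n is the number of elements held by the process. */
    MPI_Datatype make_checkpoint_type(int local_n) {
        MPI_Datatype ckpt;
        MPI_Type_contiguous(local_n, MPI_DOUBLE, &ckpt);
        MPI_Type_commit(&ckpt);
        return ckpt;
    }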
Regarding the data consistency, two different scenarios can be considered: In the first one,
the application needs to store the current data structure value to a file and to start a new
computation that reuses the data structure. In the second one the processes of the application are
concurrently changing parts of the array while the MA is reading it. For both scenarios it is
necessary to provide an access mechanism for preventing reading inconsistent versions of the
data. This can be done by two different strategies: the first scenario is addressed by copying the
current data structure to a new memory area, where data are temporarily stored until the arrival of
the MA. This allows the application to reuse and modify the original data structure without
interfering with the MA. The second scenario requires synchronization mechanisms between the
MA and the application. These mechanisms can be incorporated using locking facilities over the
data (by means of a shared variable).
This platform was designed for performing system monitoring and I/O, including
capabilities to adapt to the changes in the execution environment. First, the proposed architecture
is scalable, employing a larger number of MAs when more compute nodes are used. Moreover,
more SUs can also be used, increasing the available parallelism, not only at the transfer level, but also
at storage level. Second, due to the modular design of the platform, new MAs with new logic can
be easily incorporated with a minimum design cost. Due to the close interaction between the MAs,
the MPI application and the compute nodes, it is possible to retrieve detailed information about
both the software application and underlying hardware. Therefore, it is possible to include new
logic to allow the MAs to adapt themselves to the changes in their environments.
5 Data assignment strategies
One critical element in the MAGIO architecture is the data assignment mechanisms of the
MAs. Performing this process efficiently achieves better resource utilization (computational
power and network usage), increasing the load balance and the overall platform performance.
Consequently, we need a system-aware data assignment mechanism for the MAs. In this section
we study the scalability and robustness of different data assignment strategies.
In our context, each data collector MA has a data assignment which describes the set of
data entries that it has to collect. Thus, the MA scheduling problem can be reformulated as the
generation of the data assignment for each MA. Our platform has currently integrated four
different data assignment techniques: user-defined, Simulated Annealing, Genetic Algorithm, and
Greedy Algorithm. The latter three data assignment strategies use a cost function for estimating
the I/O operation time based on the MA data assignment and platform characteristics. This
function is described next.
5.1 Cost function
The first step for constructing the cost function consists of obtaining the set of
parameters that characterize the distributed architecture. In our experiments we have considered
the following ones, associated with each node: network link latency (ms) and bandwidth (Mb/s),
communication startup latency (ms), processor power (MFLOPS), and I/O storage node
bandwidth.
We have developed a mathematical model that computes the agent execution time for a
given itinerary. This simulation assumes that the agent has a size associated with its code and a
payload associated with the collected array entries. The model computes the network transfer time
between any pair of nodes of the MA itinerary. Additionally, it assumes that, for each compute
node, each agent requires a given number of floating point operations as a function of its size (code
plus payload). The cost of each MA transfer is shown in Equation 1; the last term of this expression
is only considered at the SU.
Cost_node = Latency_startup + T_communication + T_computation [+ T_I/O]    (1)
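One plausible reading of Equation 1 in C for a single hop of an MA itinerary is sketched below; the parameter packaging and the flops-per-byte factor are assumptions, since the paper only states that the computation cost depends on the agent's code-plus-payload size.

    /* Sketch of Equation 1 for one hop; names and the 0.1 flop/byte
     * factor are assumptions, not values from the paper. */
    typedef struct {
        double startup_ms;   /* communication startup latency (ms)     */
        double latency_ms;   /* network link latency (ms)              */
        double bw_mbps;      /* link bandwidth (Mb/s)                  */
        double mflops;       /* processor power (MFLOPS)               */
        double io_bw_mbps;   /* storage bandwidth, used only at the SU */
    } NodeParams;

    double node_cost_ms(const NodeParams *p, double size_mb, int is_su) {
        double t_comm = p->latency_ms + size_mb * 8.0 / p->bw_mbps * 1000.0;
        double flops  = size_mb * 1e6 * 0.1;   /* assumed 0.1 flop per byte */
        double t_comp = flops / (p->mflops * 1e6) * 1000.0;
        double t_io   = is_su ? size_mb * 8.0 / p->io_bw_mbps * 1000.0 : 0.0;
        return p->startup_ms + t_comm + t_comp + t_io;
    }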
In this model the MA transfers are exclusive; that is, when an agent is transferred
between two nodes, neither node allows other communications. If another agent tries to visit either
of these nodes, it has to wait until the former agent finishes its transfer. The model thereby captures the
existence of bottlenecks in the network access. In addition, we allow concurrent MA execution,
that is, a given compute node can host a generic number of MAs. The final result produced by this
model is the total execution time of agent I/O operations, that is, the maximum execution time of
all existing MAs.
5.2 User-defined data assignment
In this policy, the user specifies the data associated with each MA by means of user-defined
datatypes. These datatypes are loaded at the CU and transferred to the MA. In this case, the
problem is not generating an optimal MA data assignment, but providing a procedure to collect a
generic portion of the MPI application data transparently, without knowing how many MPI
processes are being used, which data entries are stored in each one, and which compute nodes
they are executing on. All these variables are automatically computed by the MA. Note that this
strategy does not use the cost function, given that the data assignment is provided by the user.
A drawback of this approach is the potential hotspots due to MA traffic: If several agents
access the same compute node, contention hot spots can be produced. These hot spots can
introduce delays, increasing the agent itinerary time (that is, the global I/O operation time) and,
consequently producing a non-optimal schedule. The solution taken in the remainder of the
techniques consists of assigning always different compute nodes to the MAs. For the rest of the
approaches there are no conflicts with the MAs given that each compute node is visited only by
one agent.
5.3 Simulated annealing data assignment
In this approach, the objective is to find an optimal data assignment that achieves the
smallest I/O execution time. We assume that each compute node is only visited by one MA which
collects the whole data to be accessed. Using the Simulated Annealing (SA) approach, this
problem can be seen as a global optimization problem in a discrete space. Each space state
corresponds to a given data assignment for each agent. More specifically, we define state as a
vector of N_node entries, where N_node is the number of compute nodes. The i-th entry of state
contains the agent responsible for collecting all data from the i-th compute node. If N_MA agents
are employed, then the number of different states is N_MA^N_node.
The associated execution time of each state will be the one returned by the cost function.
Under this approach, the problem of finding the optimal data assignment can be seen as the
problem of searching the state with a minimum cost. Note that each state corresponds to a
specific data assignment for each MA participating in the I/O operation.
In our experiments we have used the SA implementation of Matlab, included in the Global
Optimization Toolbox. The SA implementation uses two user-customized functions: the cost
function (shown in Section 5.1) that provides the cost of each state, and a permutation function,
that obtains a random new state based on the current one. We have considered two permutation
functions:
• Perm1. This function permutes two random entries of state. That is, it generates
two random values 0 ≤ i, j < N_node and exchanges the values state[i] and state[j].
• Perm2. This function takes a random entry of state and assigns it a random agent.
That is, it generates two random values 0 ≤ i < N_node and 0 ≤ j < N_MA and sets state[i] = j.
The initial state is a round-robin distribution of the MAs over the compute nodes. Note
that Perm1 ensures load balance, given that the number of compute nodes assigned to each agent
is kept constant. On the other hand, Perm2 introduces a random assignment of agents that can
cause load unbalance. Both permutation functions are sensitive to the temperature variable (internally
used by the SA): the greater the temperature, the more entries of state are permuted.
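The two moves can be sketched as follows (the temperature-dependent number of permuted entries is omitted for brevity):

    #include <stdlib.h>

    /* state[i] holds the agent assigned to compute node i. */

    /* Perm1: swap the assignments of two random nodes (preserves balance). */
    void perm1(int *state, int n_node) {
        int i = rand() % n_node, j = rand() % n_node;
        int t = state[i]; state[i] = state[j]; state[j] = t;
    }

    /* Perm2: give a random node a random agent (may unbalance the load). */
    void perm2(int *state, int n_node, int n_ma) {
        state[rand() % n_node] = rand() % n_ma;
    }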
5.4 Genetic algorithm data assignment
The Genetic Algorithm (GA) is a method for solving optimization problems based on
natural selection. Starting from an initial state (called population), it repeatedly modifies it,
producing new states (called children). These states are used as parents, in the next step, to
produce a new generation of children. Over successive generations, the population evolves to new
states. Introducing some criteria of selection, it is possible to shift towards an optimal solution.
In our case, we use the Matlab implementation of Genetic Algorithms of the Global
Optimization Toolbox and the cost function depicted in Section 5.1 for guiding the optimization
process. Both the SA and GA approaches compute a data assignment with a close-to-minimum
execution time.
Based on the previous strategies, the problem of finding the optimal solution has been
simplified to another one: given a generic number of agents, and using the SA or GA approaches, a
close to optimal MA data assignment can be found. Using this data assignment the MAs compute
the itinerary based on the minimum network traffic scheduling.
FOR i = 1, N_node
    state[0:N_node] = -1
    state[i] = 1
    cost[i] = evaluate_cost(state)             (T1)
END
scale = 1.1; Tmax = -1
N_MA = 0; state[0:N_node] = -1
WHILE (there are unassigned nodes)
    j = Take_Max_Cost(cost)                    (T2)
    N_MA = N_MA + 1
    state[j] = N_MA
    T = evaluate_cost(state)                   (T3)
    Tmax = max(T, Tmax)
    FOR each k in unassigned nodes
        state[k] = N_MA                        (T4)
        T = evaluate_cost(state)               (T5)
        threshold = Tmax * scale
        IF (T > threshold) state[k] = -1       (T6)
        ELSE Tmax = max(T, Tmax)
    END
END
Figure 6: Pseudocode of Greedy Algorithm data assignment.
5.5 Greedy algorithm data assignment
With the Greedy Algorithm (GRA) we have followed a different approach: instead of
minimizing the I/O cost using a fixed number of MAs, we consider an increasing number of
agents. Figure 6 shows the pseudocode of the Greedy Algorithm. It consists of two phases. In the
first phase, we evaluate the cost of sending one MA to each compute node, collecting its data, and
storing it in the SU. The idea is to evaluate the associated cost of independently accessing each
MAS. The function evaluate_cost (tag T1) computes these values using the cost model defined before.
Note that a state value of -1 excludes the associated MAS from the I/O operation.
In the second phase, the MA data assignment is computed. Initially, one MA is assigned to
the highest cost compute node (T2 tag), obtaining the overall execution time (T3 tag). Then, we
tentatively add each one of the remaining unassigned compute nodes (tag T4) to the MA
itinerary. For each one of them, we evaluate whether the overall execution time (tag T5) is larger than
a given threshold; if this is the case, the node is discarded (tag T6). Otherwise, it is added
to the MA itinerary. This process repeats until all the nodes (state entries) are assigned to an MA.
In order to better exploit parallelism, a given compute node with a high execution time will have a
dedicated MA, but, at the same time, other nodes with smaller cost will share the same MA.
Note that, for the SA, GA, and GRA data assignment strategies, the final result is the state
vector. Using this data structure, the agent travels to the MPI root process, where it generates the
necessary data assignment and computes in which order it will visit the assigned nodes. In the
case of the User-defined data acquisition, the MA has initially its whole data assignment, thus, it
only has to compute its itinerary.
6 Use case: MAGIO with BIPS3D parallel application
We have fully integrated MAGIO with a parallel scientific application, and evaluated it on a
distributed architecture. This section describes both the application and architecture
characteristics as well as the results of the evaluation.
6.1 BIPS3D Simulator
BIPS3D [24] is a 3-dimensional simulation of BJT and HBT bipolar devices. The goal of the
3D simulation is to relate electrical characteristics of the device with its physical and geometrical
parameters. The basic equations to be solved are Poisson’s equation and electron and hole
continuity, in a stationary state. Finite element methods are applied in order to discretize the
Poisson equation, hole and electron continuity equations by using tetrahedral elements. The result
is an unstructured mesh in which we place more nodes in the areas of union between different
areas of the transistor.
Using the METIS library [18], the mesh is divided into sub-domains, such that one
sub-domain corresponds to one processor. The next step is decoupling the Poisson equation, hole
and electron continuity equations, and linearize them using the Newton method. Then the part
corresponding to the associated linear system is constructed, for each sub-domain, in a parallel
manner.
BIPS3D consists of the following compute phases. The first phase is the data initialization
and distribution: the compute node 0 (called root node) loads and initializes the data structures
related to the electronic device to be simulated. Then, the mesh is partitioned using METIS and
data distribution datatypes are automatically generated based on the partition information. These
datatypes are employed for distributing the mesh elements among the compute nodes by MPI. In
a second phase, the device parallel simulation is carried out. This phase consists of a given number
of time steps. Note that these operations are performed in parallel and some communication
operations are required in order to update the mesh boundaries. The third phase, the I/O phase, is
executed interleaved with the second. Periodically, for a given number of time steps, the compute
nodes send to the root node the complete state information related to the assigned mesh portion.
The root node gathers all the data, reconstructs the complete mesh and transfers it to disk. Note
that this phase consists of two operations: MPI communication for gathering the data and I/O disk
transfers. In our approach, the I/O transfers are carried out using MAs.
Note that in BIPS3D data arrays are initialized at the root node, which subsequently
distributes them to the rest of the nodes. Thus, the root process contains the datatype structures
employed in the data distribution, that is, the description of the data distribution over all compute
nodes. In order to obtain its itinerary, each MA has to first visit the root node, intersect its
data assignment with the data distribution datatypes, and apply the Minimum Network Traffic
scheduling.
The next sections describe the MAGIO evaluation when used with BIPS3D. Our
experiments target the following issues: performance evaluation of the Mobile Agent Platform
employing different numbers of Mobile Agents and Storage Units and efficiency of the data
assignment techniques. In addition, the performance has been compared with that of the original
MPI application. All these experiments have been performed on a real distributed architecture
consisting of 40 commodity computers with one dual-Core AMD processor running at 2.2GHz, 512
MB of RAM, and a Fast Ethernet interconnection network. The average latency between the
computers is 0.098 msecs, with a standard deviation of 0.012 msecs.
We have used the gcc 4.1.2 for compiling the BIPS3D application and the Mono JIT
compiler 1.2.2.1 for the mobile agent platform. We have used two datasets, called Device1 and
Device2, which correspond to two tetrahedral meshes of semiconductor devices employed by the
BIPS3D application. These meshes have 47,200 and 32,888 nodes, respectively. Each node contains
several data structures, with a total data size of 56.6MB and 39.5MB per mesh. This is the volume
of data that have to be transferred to disk during the I/O operation.
6.2 Evaluation of the impact of the number of mobile agents
Figure 7(a) shows the execution time of the I/O data transfer for Device1 using 32
compute nodes, one Storage Unit, and a variable number of MAs. We evaluated the I/O time for a
range from 1 to 32 agents. The data assignment of the MAs follows block and cyclic distributions
of the MAs over the compute nodes. The last column is the I/O transfer cost of the original
MPI-based application with 32 processors. In our experiments we focus on the evaluation of the
data transfer cost over the network. Therefore, the measured results do not include the disk data
transfer. More specifically, Figure 7 shows the time interval since the first MA leaves the CU until
the last MA transfers the collected data to the SU memory space. For the MPI-based application,
the Figure shows the execution time for sending all the data to the root process.
If we consider a distributed platform of commodity computers sharing the disk resources
using NFS, the MPI root process writes the collected data to a remote disk. The cost of this write
operation for the Device1 dataset is 4,950 msecs with NFS (original MPI version). In contrast, each SU is
supposed to have local disk resources allowing to reduce the access time. In our experiments, the
local disk access time is 188 msecs.
In Figure 7(a) we can observe that the proposed architecture takes advantage of the
parallelism when several agents are employed, reducing the I/O time. For one MA the I/O transfer
time is very large, given that the collected data have to be transferred among all the compute
nodes. When more MAs are employed, the parallelism is efficiently exploited, drastically reducing
the I/O transfer time. Note that for 32 MAs, each one collects the data from one compute node.
However, the smallest execution time is obtained for 16 MAs. This is due to the fact that for 32
agents the contention is higher at the single SU.
If the disk access time is not considered, the MPI application gathers the data more
efficiently than the MAP. However, it is important to remark that the MAP fully overlaps the I/O
with the computation, given that it is done asynchronously without the intervention of the original
application. In the case of the MPI-based application, the root process has to wait until receiving
all the data from the rest of the processes, limiting the degree of overlapping between the I/O and
compute phases. This makes our approach more advantageous in practice.
(a) Impact of the number of MAs
(b) Impact of the number of SUs
Figure 7: Scalability of MAGIO.
6.3 Evaluation of the impact of the number of storage units
The proposed platform supports a generic number of SUs, each one with a generic
number of MAs. In this case, the file contents are distributed over several SUs. That is, we provide
the basic functionality of a parallel file system. There are several differences with other parallel file
system approaches, such as GPFS and PVFS. First, MAGIO does not require administrative privileges,
given that it can be completely installed and executed in a local account. Second, it is
transparently integrated in the original application, given that no explicit I/O calls are required.
Third, any compute node with local storage can be used as a SU.
Figure 7(b) shows the execution time of the I/O data transfer for Device1 for 32 compute
nodes, 16 MAs (cyclically distributed) and a variable number of SUs. We considered a range
between 1 and 8 SUs. Note that the number of MAs is the same for all the cases, thus, the larger
the number of SUs, the smaller the number of MAs assigned to each SU.
In this scenario two different levels of parallelism are exploited: First, the use of several
MAs increases the parallelism for collecting data and second, the use of different SUs reduces
contention at disk. We can see that there is a strong reduction of the overall execution time, when
more storage units are employed, resulting in a better performance than the MPI-based approach.
6.4 Evaluation of data assignment techniques
In this section the efficiency of different data assignment methods is evaluated on MAGIO.
In addition, new synthetically generated scenarios are also studied. The idea consists of
introducing artificial compute node loads that represent contention spots. This situation appears
in distributed environments, such as PlanetLab [50], that do not provide exclusive access to the
compute nodes. In these cases the parallel application competes with other user applications that
consume CPU and network resources.
In order to consider these effects, we have modified the application adding pre-defined
latencies to some specific compute nodes. These artificial latencies introduce communication lags
that delay the MA transfer time. Using these new scenarios, we evaluate the capacity of MAGIO to
adapt to the changing platform conditions and to perform efficient I/O in each case. MAGIO
performance was evaluated considering the following scenarios:
(a) Scenario 1
(b) Scenario 2
(c) Scenario 3
(d) Scenario 4
Figure 8: Evaluation of MAGIO for Device1.
(a) Scenario 1
(b) Scenario 2
(c) Scenario 3
(d) Scenario 4
Figure 9: Evaluation of MAGIO for Device2.
• Scenario 1: Application has exclusive access to the platform. No artificial latency is
added.
• Scenario 2: Each compute node executes a sequential program with a random load.
We model this scenario by introducing a random distribution of artificial latencies of up to 6 secs.
• Scenario 3: A group of compute nodes is executing a demanding parallel
application. The rest of nodes have no load. We introduce an artificial latency of 7 secs for three
compute nodes. The rest of them have no artificial latency.
• Scenario 4: One compute node has a very heavy contention and three more have
medium contention. One compute node has 15 secs of artificial latency and three compute nodes
have 5 secs. The rest of them have no latency.
All these scenarios are evaluated for 32 compute nodes and 1 SU. For each one of them
we have generated the agent scheduling using the Genetic Algorithm (GA), the Simulated
Annealing (SA), and the Greedy technique for 4, 8, and 16 agents. We call SA-Perm and SA-Repl
the SA implementations using Perm1 and Perm2, respectively (see Section 5.3). Regarding the
Greedy technique, we use scale values (see Section 5.5) of 1.05, 1.10, and 1.20 for
Greedy1, Greedy2, and Greedy3, respectively. Note that this algorithm dynamically calculates the
number of agents, thus this number is not an input parameter like in the other techniques. In
addition, we evaluated the block and cyclic agent distribution over the compute nodes (called
Block and Cyclic in the figures).
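For reference, a minimal Python sketch of the Block and Cyclic node-to-agent distributions (assuming compute nodes numbered from 0; helper names are hypothetical, not part of MAGIO) could look as follows:

```python
# Illustrative sketch of the Block and Cyclic assignments of compute
# nodes to mobile agents evaluated below (not MAGIO's actual code).

def block_distribution(num_nodes: int, num_agents: int) -> list[list[int]]:
    """Each agent visits a contiguous chunk of compute nodes."""
    chunk = -(-num_nodes // num_agents)  # ceiling division
    return [list(range(a * chunk, min((a + 1) * chunk, num_nodes)))
            for a in range(num_agents)]

def cyclic_distribution(num_nodes: int, num_agents: int) -> list[list[int]]:
    """Compute nodes are dealt out to agents in round-robin order."""
    return [list(range(a, num_nodes, num_agents)) for a in range(num_agents)]

if __name__ == "__main__":
    print(block_distribution(32, 4))   # agent 0 -> nodes 0..7, etc.
    print(cyclic_distribution(32, 4))  # agent 0 -> nodes 0, 4, 8, ...
```

Both strategies balance the number of nodes per agent; they differ only in whether each agent's itinerary is contiguous (Block) or interleaved (Cyclic), which matters once some nodes carry artificial latencies.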
Figures 8(a) and 9(a) show the overall execution time for Scenario 1 on Device1 and Device2, respectively. We can see that the Block and Cyclic distributions obtain, in general, the best performance. This is due to the fact that all the compute nodes have the same characteristics; thus, load balance (assigning the same number of compute nodes to each agent) is the most important factor in scheduling. For each Greedy technique, the number of MAs automatically computed by the algorithm for each scenario is shown on the horizontal axis.
Figures 8(b) and 9(b) show the overall execution time for Scenario 2. Given that the artificial latencies are randomly distributed, the Block and Cyclic distributions are still efficient. In general the GA, SA, and Cyclic strategies deliver the best schedules. The Block distribution introduces important delays when a small number of agents is employed, because the same agent visits the compute nodes with the highest latencies. The Greedy approach is efficient only for small scale values.
Figures 8(c) and 9(c) show the overall execution time for Scenario 3. Now the artificial latency affects only a few compute nodes. Note that the Block distribution is heavily affected by the artificial latency. For Device1, the GA and Greedy techniques obtain the best schedules. For Device2 the most efficient ones are the GA and Cyclic data assignments.
Figures 8(d) and 9(d) show the overall execution time for Scenario 4. The introduction of a hotspot with large latencies produces poor results for the Block and Cyclic schedules. In general the GA and SA generate efficient schedules, given that they adapt better to the changing system conditions by assigning only one MA to each hotspot node. The Greedy technique is also efficient for scales 1.05 and 1.10. Note that the larger the overhead due to hotspots, the smaller the number of MAs employed by the Greedy approach. This is because the algorithm assigns one MA to each hotspot; meanwhile, fewer MAs are necessary for balancing the accesses of the remaining compute nodes. The advantage of a small number of agents is the reduced contention at the final SUs.
In general, the efficiency of the data assignment is strongly dependent on the data distribution, the platform characteristics, and the number of MAs employed. Overall, SA and GA are efficient for all the considered scenarios, while the Block and Cyclic distributions are only efficient when hotspots are not present. The Greedy technique produces competitive schedules, especially for low scale values. Note that for all considered scenarios, MAGIO obtains, in an autonomous and flexible way, an adaptive I/O schedule that provides scalable solutions for distributed systems.
7 Conclusions
In this paper we have presented MAGIO, a novel mobile agent platform that performs I/O on behalf of parallel MPI applications. This approach presents several advantages compared to the classical I/O model. First, for each compute node, it is not necessary to explicitly perform remote I/O operations: data can be stored locally in the compute node (either on local disk or in memory) and subsequently collected by the MAs. This is made possible by the fact that our platform uses the same memory space and resources as the parallel application. Second, the agents collect the data asynchronously, without intervention of the MPI application; that is, the compute and I/O phases of the parallel application can be completely decoupled and overlapped, hiding the I/O overhead. Third, MAGIO supports not only I/O agents, but also general purpose agents, which can perform tasks such as collecting compute node and network information and monitoring parallel applications. Additionally, security is provided by a private/public key mechanism. MAGIO supports a generic number of SUs, allowing parallelism to be extracted both at the agent level and at the SU level. Moreover, it handles generic file distribution policies, based on MPI datatypes. Fourth, different agent scheduling strategies can be employed, based on the network and compute node performance. We have evaluated these schedules for both real and synthetic scenarios, showing that in the presence of hotspots, scheduling based on global optimization techniques reaches the best performance. Finally, we have fully integrated our implementation with MPI; more specifically, data description operators, such as datatypes and file views, are fully implemented in our MAP.
MAGIO is a scalable platform for several reasons: it allows a generic number of agents (communication-level parallelism) as well as an increasing number of I/O storage nodes (I/O-level parallelism). The MA scheduling is aware of the system's characteristics: the performance of each single network link (latency and bandwidth between two compute nodes) is considered, making it possible to exploit topology-related characteristics. Additionally, the computing power of each compute node is also considered, making MAGIO suitable for heterogeneous architectures. Finally, its use in conjunction with a monitoring process makes it possible to detect hotspots and adapt to the changing conditions of the platform.
Future work on the mobile agents includes run-time modification of the agent schedule based on real-time system monitoring, support for suspending and resuming agent execution, new features for tracking and handling MA execution errors, active cooperation with other MAs, and new functionality for developing on-demand read prefetching techniques.
8 Acknowledgements
We would like to acknowledge the assistance provided by Darío Ortega Correas. This work has been partially supported by the Spanish Ministry of Science under grant TIN2010-16497.
References
[1] I. Foster, N. R. Jennings, and C. Kesselman. Brain meets brawn: Why grid and agents need each other. Autonomous Agents and Multiagent Systems, International Joint Conference on, 1:8–15, 2004.
[2] S. Papavassiliou, A. Puliafito, O. Tomarchio, and J. Ye. Mobile agent-based approach for efficient network management and resource allocation: framework and applications. Selected Areas in Communications, IEEE Journal on, 20(4):858–872, May 2002.
[3] S. Marwaha, C. K. Tham, and D. Srinivasan. A novel routing protocol using mobile agents and reactive route discovery for ad hoc wireless networks. In Networks, 2002. ICON 2002. 10th IEEE International Conference on, pages 311–316, 2002.
[4] M. D. Assunção, F. L. Koch, and C. B. Westphall. Grids of agents for computer and telecommunication network management: Research articles. Concurr. Comput.: Pract. Exper., 16(5):413–424, 2004.
[5] C. X. Mavromoustakis and H. D. Karatza. Split agent-based routing in interconnected networks: Research articles. Int. J. Commun. Syst., 17(4):303–320, 2004.
[6] S. Zanikolas and R. Sakellariou. A taxonomy of grid monitoring systems. Future Gener. Comput. Syst., 21(1):163–188, 2005.
[7] R. Y. de Camargo, R. Cerqueira, and F. Kon. Strategies for storage of checkpointing data using non-dedicated repositories on grid systems. In MGC ’05: Proceedings of the 3rd international workshop on Middleware for grid computing, pages 1–6, New York, NY, USA, 2005. ACM.
[8] P. Lemarinier, A. Bouteiller, T. Herault, G. Krawezik, and F. Cappello. Improved message logging versus improved coordinated checkpointing for fault tolerant MPI. In CLUSTER ’04: Proceedings of the 2004 IEEE International Conference on Cluster Computing, pages 115–124, Washington, DC, USA, 2004. IEEE Computer Society.
[9] A. Negri, A. Poggi, M. Tomaiuolo, and P. Turci. Dynamic grid tasks composition and distribution through agents: Research articles. Concurr. Comput.: Pract. Exper., 18(8):875–885, 2006.
[10] J. Cao, S. A. Jarvis, S. Saini, and G. R. Nudd. GridFlow: workflow management for grid computing. In Cluster Computing and the Grid, 2003. Proceedings. CCGrid 2003. 3rd IEEE/ACM International Symposium on, pages 198–205, May 2003.
[11] J. W. Baek and H. Y. Yeom. D-agent: an approach to mobile agent planning for distributed information retrieval. Consumer Electronics, IEEE Transactions on, 49(1):115–122, Feb. 2003.
[12] D. Gavalas and C. T. Politi. Low-cost itineraries for multi-hop agents designed for scalable monitoring of multiple subnets. Comput. Netw., 50(16):2937–2952, 2006.
[13] A. Selamat and S. Omatu. Analysis on route selection by mobile agents using genetic algorithm. In SICE 2003 Annual Conference, volume 2, pages 2088–2093, Aug. 2003.
[14] J. Yang, J. Cao, and W. Wu. Efficient global checkpointing algorithms for mobile agents. Concurr. Comput.: Pract. Exper., 20(7):825–838, 2008.
[15] A. Poggi, M. Tomaiuolo, and P. Turci. Extending JADE for agent grid applications. In Enabling Technologies: Infrastructure for Collaborative Enterprises, 2004. WET ICE 2004. 13th IEEE International Workshops on, pages 352–357, June 2004.
[16] G. Varaprasad, R. S. D. Wahidabanu, and P. Venkataram. An efficient resource allocation scheme for multimedia applications in MANET. J. Netw. Comput. Appl., 31(4):577–584, 2008.
[17] V. Baousis, S. Hadjiefthymiades, G. Alyfantis, and L. Merakos. Autonomous mobile agent routing for efficient server resource allocation. J. Syst. Softw., 82(5):891–906, 2009.
[18] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20(1):359–392, 1998.
[19] H. Jitsumoto, T. Endo, and S. Matsuoka. Abaris: An adaptable fault
detection/recovery component framework for mpis. Parallel and Distributed Processing
Symposium, International, 0:413, 2007.
[20] T. D. Braun, H. J. Siegel, N. Beck, et al. A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems. J. Parallel Distrib. Comput., 61(6):810–837, 2001.
[21] R. Singh and M. Dave. Antecedence graph approach to checkpointing for fault tolerance in mobile agent systems. Computers, IEEE Transactions on, 62(2):247–258, 2013.
[22] K. Sycara, M. Paolucci, M. Van Velsen, and J. Giampapa. The RETSINA MAS infrastructure. Autonomous Agents and Multi-Agent Systems, 7(1-2):29–48, 2003.
[23] M. Aldinucci, C. Bertolli, S. Campa, et al. Self-configuring and self-optimizing grid components in the GCM model and their ASSIST implementation. In Proc. of HPC-GECO/CompFrame (held in conjunction with HPDC-15), IEEE, pages 45–52, Paris, France, June 2006.
[24] A. J. García Loureiro, J. M. López González, and T. F. Pena. A parallel 3D semiconductor device simulator for gradual heterojunction bipolar transistors. Int. Journal of Numerical Modelling: Electronic Networks, Devices and Fields, 16:53–66, 2003.
[25] F. Bellifemine, G. Caire, A. Poggi, and G. Rimassa. JADE: A white paper. EXP in Search of Innovation, 3(3):6–19, 2003.
[26] A. Bouteiller, T. Hérault, G. Krawezik, et al. MPICH-V project: A multiprotocol automatic fault-tolerant MPI. International Journal of High Performance Computing Applications, 20(3):319–333, 2006.
[27] R. M. M. Braga, C. M. L. Werner, and M. Mattoso. Odyssey-Search: A multi-agent system for component information search and retrieval. Journal of Systems and Software, 79(2):204–215, 2006.
[28] B. Brewington, R. Gray, K. Moizumi, et al. Mobile agents in distributed information retrieval. In Intelligent Information Agents, pages 355–395. Springer-Verlag, 1999.
[29] D. Buntinas, C. Coti, T. Herault, et al. Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI protocols. Future Generation Computer Systems, 24(1):73–84, 2008.
[30] M. Chen, T. Kwon, Y. Yuan, and V. C. M. Leung. Mobile agent based wireless sensor networks. JCP, 1(1):14–21, 2006.
[31] R. D. A. Byrski and M. Kisiel-Dorohinicki. Agent-based computing in an augmented cloud environment. Computer Systems Science & Engineering, 27(1):7–18, 2012.
[32] A. Fuggetta, G. P. Picco, and G. Vigna. Understanding code mobility. IEEE Transactions on Software Engineering, 24:342–361, 1998.
[33] F. Schmuck and R. Haskin. GPFS: A Shared-Disk File System for Large Computing Clusters. In Proceedings of FAST, 2002.
[34] P. M. Dickens and R. Thakur. Improving collective I/O performance using threads. In Proceedings of the 13th IPPS, pages 38–45, 1999.
[35] G. Buddhinath Jayatilleke, L. Padgham, and M. Winikoff. A model driven component-based development framework for agents. Computer Systems Science & Engineering, 20, 2005.
[36] D. Kotz. Disk-directed I/O for MIMD Multiprocessors. In Proc. of the First USENIX Symp. on Operating Systems Design and Implementation, 1994.
[37] D. B. Lange and M. Oshima. Seven good reasons for mobile agents. Commun. ACM, 42(3):88–89, March 1999.
[38] Cluster File Systems Inc. Lustre: A scalable, high-performance file system. Cluster File Systems Inc. white paper, version 1.0, November 2002. http://www.lustre.org/docs/whitepaper.pdf.
[39] J. Ye and H. C. B. Chan. B2B MAGICS: a mobile agent-based internet commerce system for B2B e-commerce. Computer Systems Science & Engineering, 28(2):71–80, 2013.
[40] X. Ma, M. Winslett, J. Lee, and S. Yu. Improving MPI-IO Output Performance with Active Buffering Plus Threads. In IPDPS, pages 22–26, 2003.
[41] D. C. Marinescu, L. Bölöni, J. R. Rice, P. Tsompanopoulou, and E. A. Vavalis. Agent-based scientific simulation and modeling. Concurrency Practice and Experience, 12(9):845–861, 2000.
[42] M. Massie, B. N. Chun, and D. E. Culler. The Ganglia distributed monitoring system: Design, implementation, and experience. Parallel Computing, 30(7):817–840, 2004.
[43] K. Moizumi and G. Cybenko. The travelling agent problem. In Mathematics of
Control, Signals and Systems, 1998.
[44] J. Aguilar, J. Chacal, and C. Bravo. A multiagents system for planning and management of the production factors. Computer Systems Science & Engineering, 24(2):85–102, 2009.
[45] N. Nieuwejaar, D. Kotz, A. Purakayastha, C. S. Ellis, and M. L. Best. File Access Characteristics of Parallel Scientific Workloads. IEEE Transactions on Parallel and Distributed Systems, 7(10):1075–1089, October 1996.
[46] R. V. van Nieuwpoort, J. Maassen, G. Wrzesinska, et al. Ibis: a flexible and efficient Java-based grid programming environment. Concurrency & Computation: Practice & Experience, 17(7-8):1079–1107, 2005.
[47] M. Kallahalla and P. J. Varman. PC-OPT: optimal offline prefetching and caching for parallel I/O systems. Computers, IEEE Transactions on, 51(11):1333–1344, Nov 2002.
[48] W.B. Ligon and R.B. Ross. An Overview of the Parallel Virtual File System. In
Proceedings of the Extreme Linux Workshop, June 1999.
[49] K.E. Seamons, Y. Chen, P. Jones, J. Jozwiak, and M. Winslett. Server-directed
collective I/O in Panda. In Proceedings of Supercomputing ’95, 1995.
[50] L. Peterson, S. Muir, T. Roscoe, and A. Klingaman. PlanetLab Architecture: An Overview. Technical Report PDN-06-031, PlanetLab Consortium, May 2006.
[51] J. del Rosario, R. Bordawekar, and A. Choudhary. Improved parallel I/O via a
two-phase run-time access strategy. In Proc. of IPPS Workshop on Input/Output in Parallel
Computer Systems, 1993.
[52] Y. Chen, S. Byna, X.-H. Sun, R. Thakur, and W. Gropp. Hiding I/O latency with pre-execution prefetching for parallel applications. In SC ’08, pages 1–10, 2008.
[53] H. Simitci and D. A. Reed. A Comparison of Logical and Physical Parallel I/O Patterns. International Journal of High Performance Computing Applications, special issue (I/O in Parallel Applications), 12(3):364–380, 1998.
[54] D. E. Singh, A. Miguel, F. García, and J. Carretero. Mobile agent systems integration into parallel environments. Scalable Computing: Practice and Experience (SCPE), 10(1), 2008.
[55] W. Qu, M. Kitsuregawa, H. Zhuge, H. Shen, and Y. H. Jin. A traffic-based routing algorithm by using mobile agents. Computer Systems Science & Engineering, 22(6):323–332, 2007.
[56] S. S. Vadhiyar and J. J. Dongarra. Self adaptivity in grid computing. Concurrency & Computation: Practice & Experience, 2005.
[57] B. J. Overeinder, N. J. E. Wijngaards, M. van Steen, et al. Multi-agent support for internet-scale grid management. In AISB’02 Symposium on AI and Grid Computing, pages 18–22, 2002.
[58] N. Woo, H. Y. Yeom, and T. Park. MPICH-GF: Transparent checkpointing and rollback-recovery for grid-enabled MPI processes. IEICE Transactions on Information and Systems, 87:1820–1828, 2004.
[59] P. Nowoczynski, N. Stone, J. Yanovich, and J. Sommerfield. Zest: Checkpoint storage system for large supercomputers. In 3rd Petascale Data Storage Workshop, Supercomputing, 2008.
[60] B. Unni, N. Parveen, A. Kumar, and B. S. Bindhumadhava. An intelligent energy optimization approach for MPI based applications in HPC systems. CSI Transactions on ICT, 1(2):175–181, 2013.
[61] R. Bordawekar. Implementation of Collective I/O in the Intel Paragon Parallel File System: Initial Experiences. In Proc. 11th International Conference on Supercomputing, July 1997.
[62] M. Hadzic and E. Chang. Onto-agent methodology for design of ontology-based multi-agent systems. Computer Systems Science & Engineering, 23(1):19–30, 2008.
[63] M. S. Pérez, J. Carretero, F. García-Carballeira, et al. MAPFS: A flexible multiagent parallel file system for clusters. Future Generation Comp. Syst., 22(5):620–632, 2006.
[64] A. Sánchez, M. S. Pérez, G. Pierre, et al. Improving GridFTP transfers by means of a multiagent parallel file system. Multiagent Grid Syst., 3(4):441–451, 2007.
[65] Mono: the open source development platform based on the .NET framework. http://www.mono-project.com.
[66] M. Fukuda and J. Miyauchi. An implementation of parallel file distribution in an agent hierarchy. J. Supercomput., 47(3):255–285, 2009.