MAGIO: Using Mobile Agents to Enhance Parallel I/O
David E. Singh, Florin Isaila, Félix García and Jesús Carretero
Department of Computer Science, University Carlos III of Madrid, Avda. de La Universidad
30. 28911 Leganés (Madrid), Spain.
Email: desingh@arcos.inf.uc3m.es, florin.isaila@gmail.com, fgcarbal@inf.uc3m.es,
jesus.carretero@uc3m.es
Abstract
In the last years, the increasing gap between processor speed and storage has exposed
I/O as one of the most significant bottlenecks in parallel applications. Distributed architectures
based on commodity components belonging to the same organization consist of heterogeneous
computers with dynamically evolving utilization loads. These resources can be efficiently used for
executing parallel computing applications provided that all the computers are under the same
administrative domain. In this work we propose an execution environment which provides
these parallel applications efficient access to storage when a distributed or parallel
file system is not available. Unlike traditional storage solutions, we propose a pull-based model
built on mobile agents. In this approach, the mobile agents can freely traverse the compute nodes
and access the memory region of the parallel application. Therefore, the application data
can be accessed by the mobile agents and efficiently transferred to the storage nodes. The final
goal of this platform is to provide distributed asynchronous I/O services based on the usage of
mobile agents. Experimentally we show that, by exploiting the unique characteristics of mobile
agents, the efficiency of the I/O stage can be dramatically increased.
KEYWORDS: Distributed architectures, mobile agents, parallel I/O.
1 Introduction
Nowadays, distributed computing technologies are evolving toward integration of all the
services (compute, storage, visualization, etc.) into highly distributed large-scale platforms. This
work targets infrastructures of loosely connected components under the same administrative
domain. An example is a collection of computers, belonging to the same organization, that
is employed in volunteer computing for executing MPI applications. There are significant
differences with the cluster computing approach. First, components can be highly heterogeneous
both in computing power and in network performance. For instance, the network
characteristics can differ between compute nodes. Second, the access to the
compute nodes is not exclusive: other applications can be executed at the same time,
consuming compute and network resources. The load changes dynamically and is impossible to
predict. In addition, there can be hotspots, both at compute node level and at network link level.
Third, given that the target infrastructure is used in an ad-hoc manner, it is highly improbable that
shared resources, like parallel file systems, are available. For this new environment, issues such as
communication overhead, contention, system reliability and fault tolerance become increasingly
important for executing parallel applications in an efficient and dependable manner.
When parallel application reliability is considered, one practical solution for improving it
consists in introducing robust checkpointing and rollback mechanisms. During the I/O
phase the current process state is transferred to local or remote disks. This operation usually
involves transferring large amounts of data, introducing significant delays that reduce the overall
application performance. In general, parallel applications alternate compute and I/O phases. The
use of local storage caching or distributed parallel file systems can diminish the impact of the I/O
phase, but they require special support for, respectively, transferring the data to a central
storage server or deploying and administering the file system. This work presents a solution to
this problem based on Mobile Agents (MAs) in the context of parallel applications executed on
decentralized distributed platforms.
Mobile Agent Platforms (MAPs) provide a flexible and scalable solution for the
management of diverse elements of the distributed infrastructure [37]. In the context of this work
different tasks may be assigned to a MA, which executes them in an autonomous, distributed, and
adaptive fashion. The MA includes functionalities for monitoring and dynamically adapting to the
changing conditions of the platform and efficiently exploiting its resources. For instance, when, due to
external reasons, a given compute node increases its computational load or a network link reduces
its bandwidth, the MAP allows the system to dynamically adapt to the new execution conditions.
This paper presents Mobile AGents I/O (MAGIO), a novel I/O infrastructure for parallel
applications based on MAs. MAGIO provides facilities for collecting/storing data from the
application memory space and transferring these data to the storage nodes. In MAGIO the I/O is
performed asynchronously, without the need of special I/O calls from the parallel application. This
feature allows overlapping the compute and I/O phases, improving the overall performance of the
parallel application. Additionally, the platform includes monitoring functionalities that can be used
to collect information about the compute node status. This information is employed for
performing an adaptive, efficient I/O, which is aware of the real platform status. The unique
characteristics of MAs allow them to adapt to the changing platform conditions, such as processor
and network performance. In addition, MAs are intrinsically parallel, thus they can adapt to the
application characteristics and achieve efficient parallel I/O operations.
The rest of the paper is organized as follows. Section 2 introduces the related work.
Section 3 describes MAGIO's internal structure and the components of the distributed architecture.
Section 4 shows how MAGIO handles the parallel application data distribution structures and the
way that MA scheduling is performed. Section 5 depicts different techniques for creating user
defined or optimal MA scheduling. Section 6 presents the integration of MAGIO into a parallel
scientific application, and performance results of its evaluation on a distributed architecture.
Finally, Section 7 summarizes the main conclusions of this work.
2 Related work
In this work, we present a MAP that is integrated into MPI applications for performing the
I/O on a remote filesystem. Typically, parallel applications follow the classical model for which
data are initially distributed, then processed in parallel, and finally the results are collected and
stored on the disk. There are several classifications of the file I/O access routines based on their
design and implementation. First, file I/O methods can be independent or collective. In the
former case all the network and file system operations are performed independently from each
other. However, these implementations may cause an inefficient use of the network and disk
resources, especially for an important subset of scientific applications, known to generate a large
number of requests for small disjoint regions of the shared files [45, 53]. In turn, collective I/O
techniques merge small individual requests from compute nodes into larger global requests in
order to optimize the network and disk performance. Depending on the place where the request
merging occurs, one can identify two collective I/O methods. If the requests are merged at the I/O
nodes the method is called disk-directed I/O [36] or server-directed I/O [49]. If the merging
occurs at intermediary nodes or at compute nodes the method is called two-phase I/O [51, 61].
Second, file I/O can be implemented either as synchronous or asynchronous operations.
Asynchronous implementations [40, 34, 47, 52] offer the advantage of overlapping the
computation and file I/O. Third, file I/O can be implemented either as push or pull operations. In
the push operations implemented by the vast majority of parallel file systems [48, 38, 33], the
compute nodes send the data to I/O nodes responsible for storing them to disks. Push operations
can cause disk contention if a large number of compute nodes write data at the same time. In the
pull operations [59], I/O nodes are notified by the compute nodes and retrieve the data based on
a schedule optimizing the network and disk bandwidth. The agent approach taken in this paper
can be classified as collective, asynchronous and pull-based.
The use of autonomous and multi-agent systems as an active element of distributed
computing infrastructures has been an important trend during the last years [44, 1]. Their use is
characterized by an autonomous and asynchronous execution that hides the communication
overheads. Other properties, such as reactiveness, proactiveness, learnability, and collaboration
capabilities can be used for adapting to the dynamic conditions of the evolving platform and
increasing its robustness. Mobile agents have been used in the context of distributed computing.
There are several initiatives such as GrADS [56], a migration framework for Grid systems that takes
into account both the compute load and the application characteristics; Ibis [46], a programming
environment based on Java for distributed systems that supports communication and serialization
of Java objects; and ASSIST [23] which is an example of the use of autonomic management for
performance tuning in massively parallel applications. [41] presents a platform that combines
agents with parallel applications. The integration with MAs is limited to the level of control and
management. In [35] a model-driven component-based framework is presented. MASIPE [54] uses
MAs for collecting and displaying data from parallel applications although it is not designed for
I/O.
Examples of the use of MAP for Grid are AgentScape [57], GridFlow [10], Grid of agents for
network management [4], and GAIN [9]. GAIN provides an infrastructure for defining, distributing,
and executing workflow tasks on Grid nodes. GAIN is implemented in JADE [25] with different
types (roles) of MAs for performing these functionalities. In [15] a JADE extension for Grid is
presented. A different approach is presented in [31], where an execution environment for
multi-agent computing systems based on cloud and volunteer computing is introduced.
There is a limited body of research on exploiting agents for file I/O. MAPFS [63] is a
multiagent architecture targeting high-performance for different access patterns. This is achieved
by hiding the latency of file access through loosening the coupling between the applications and
the storage architecture of a cluster. MAPFS-DSI [64] leverages agents for improving the
performance of file transfers through the GridFTP protocol. Similarly to our approach, Fukuda and
Miyauchi [66] propose to use agents for distributing files from and to remote computing nodes
that are not connected in the same network file system. Unlike in our case, their system targets
Java applications, requires modification of existing applications and is based on a hierarchy of
agents.
In [27] a MA platform for searching, collecting, and storing software components is presented. It uses
ontologies [62] for disambiguating queries, navigating the MAs, and efficiently retrieving the
information. Although it is not used for I/O, it is an example of the flexibility and transparency
achieved by the use of MAs. In [16] MAs are employed to allocate resources for multimedia
applications based on the available bandwidth. In [30] MAs are used for transferring data in sensor
networks, reducing the data redundancy and communication overheads.
A critical issue of a MAP is the efficient scheduling and routing of MAs. In [43, 28] the
Travelling Agent Problem is presented. It is defined as finding the optimal MA itinerary which
minimizes the transfer time. In general, this is an NP-complete problem, but its complexity can be
reduced under simplifying assumptions (regarding the number of MAs, the network latencies, the
probabilities of success, and the compute time of each MA task). In [17] a decentralized scheme
with different routing strategies for MAs is presented; these routing techniques aim to avoid
congestion and achieve good load balance. In [11, 12], routing algorithms are presented that
compute the minimal number of MAs and their planning for a given turn-around time. In this
paper we follow a similar approach, but in our case the problem can be simplified given that, once
a sequence of nodes is assigned to a given MA, it is possible to find the routing for it.
In [5] an agent approach to solve routing problems is presented. This approach is based on
the swarm intelligence principle: Using a collection of agents with self-organizing capabilities to
solve a given problem. Another approach can be seen in [3], where a hybrid routing protocol
is presented which combines on-demand and ant routing techniques, exploiting and combining the
advantages of each one. In [22] MAs have local planning abilities and interact with each other and
cooperate in a peer-to-peer mode. The use of Simulated Annealing techniques and Genetic
Algorithms for optimizing the task scheduling has been extensively studied [20]. In [13, 2] route
selection algorithms based on Genetic Algorithms are presented for reducing the MA propagation
time. In [55] a traffic-based routing algorithm is presented.
In our case, the routing scheme is limited by the range of data that are transferred to disk,
that is, by the compute nodes that are involved in the I/O operation. For a given set of compute
nodes, we establish the number of MAs and route them based on the minimization of an objective
function. Note that in our experiments each compute node is visited by only one MA, which
reduces the interaction degree between the MAs. (Our platform supports collecting different parts
of the compute node data by more than one MA; however, we impose this restriction because it
obtains the best performance.)
One practical application of MAGIO is its use for checkpointing [29, 14, 21]. During this
technique the processes of a parallel application save their state on disk. In the event of a system
failure, the last saved state can be used to restart the process on the same or a different compute
node. There are several checkpointing strategies: in coordinated checkpointing [19, 58] the processes
synchronize for storing the same application state. This simplifies the recovery process but
increases the overhead, given that the processes have to be synchronized and halted. In
uncoordinated checkpointing [8], this operation is performed independently, storing different
application states. This strategy obtains better performance than the previous one, but the
recovery process for finding a consistent state is more complicated. Other checkpointing strategies
are message logging [26] for recovering a consistent state based on message traces and
distributed storage schemes of the application data on different compute nodes [7]. All these
strategies rely on storing the state of each process on disk. MAGIO can be employed in
combination with all of them given that it provides support for transferring the data from the
compute nodes to the storage nodes. In addition, by means of MAGIO explicit I/O operations can be
removed, given that the current system state can be stored (by the application) in a memory area
and subsequently collected and transferred by the MAs.
Currently, there are many stable and scalable distributed monitoring solutions [6, 42]. The
collected information is used for the system management and failure detection and can be
employed in combination with MAGIO for modeling the platform architecture. These solutions and
the one offered by MAGIO are complementary. In [60] mobile agents are employed for reducing
the energy consumption in cluster environments. In this work agents monitor distributed applications
executed in clusters and are responsible for scaling the frequency of each compute node according
to the application characteristics. In [39], a mobile agent-based platform for e-commerce systems is
presented. In this case mobile agents are used to monitor and choose the best sellers and to
perform the electronic transactions. Although the scope of that work is different from that of this
paper, in both works agents are employed to procure elements according to a dynamic
procurement scheme.
3 MAGIO architecture
MAGIO offers two main services: system monitoring and distributed I/O. The first one
consists of collecting information from the compute nodes, which includes the amount of
memory, disk, and CPU usage. In practice, the user specifies the compute nodes to be monitored.
According to this information, the MAP creates and delivers the MAs to the distributed system.
The agents traverse all their assigned compute nodes, collecting their status, and subsequently
leveraging it for monitoring the system.
The second service consists of storing in the Storage Units specific portions of the
application data. This service focuses on providing asynchronous data access facilities to
large-scale distributed applications. Data access can be performed in two forms: collecting a
specific portion of data that is distributed through the system, or collecting all the data of a given
compute node. This paper provides a unique solution to both approaches, based on the use of MPI
datatypes. Additionally, this platform implements different scheduling techniques for performing
these operations.
Figure 1: MAGIO Architecture.
Figure 1 shows a scheme of the proposed MAGIO architecture. It has a modular structure,
which allows to simplify the design and the future inclusion of new functionality. MAGIO consists
of the following major building blocks:
• The MPI application is executed on the assigned compute nodes. This application
is external to our design, but we include it because MAGIO is executed attached to it, that is,
sharing its memory space. Currently, our implementation supports Fortran and C coded
applications.
• The Mobile Agent Server (MAS) is executed attached to each MPI process. It
provides routines for receiving, executing, and sending MAs. In addition, it offers adaptors for
accessing both the compute node resources and the MPI application memory space.
• The Control Unit (CU) is responsible for creating, launching, and receiving the MAs.
It also contains a GUI that displays information about the execution status of the compute nodes.
• The Storage Unit (SU) is executed on the I/O nodes. It provides an interface for
receiving MAs and transferring their associated data to local disk. The file access pattern, i.e., the
file positions where the agent data are stored, is provided by an MPI datatype stored in the MA.
• The Mobile Agent (MA) is able to migrate between these blocks: first, MAs are
created at the CU; second, they collect data from the parallel application; and finally,
they deliver these data to the SUs. Our innovative platform provides flexibility in use, transparency
in data accessing, efficiency and scalability in performing the I/O.
This approach was implemented for MPI, but can also be extended to
OpenMP or hybrid MPI-OpenMP programs. All building blocks, with the exception of the MPI
applications, are coded in C# and are executed in the Mono execution environment [65]. The
following sections describe the structure of each one of them.
3.1 Building block structure
Figure 2: Mobile Agent Server Structure.
Figure 2 shows the Mobile Agent Server structure. When the MPI application is executed
in a distributed environment, each process has attached one MAS, which is executed as an
independent thread. The MAS has two different roles. First, when the MAS starts its execution, it
offers services for the management of assemblies, which are portable executable files
corresponding to agents and libraries. The management includes operations such as receiving the
assemblies from the CU, storing them in a local cache and accessing them when the agent is
executed. The second role consists of the management of MAs. This task comprises three operations:
MA reception from a CU or another MAS, MA local execution, and MA delivery to the next MAS,
CU, or SU.
MAS consists of four components: Migration Component, Assembly Resolver, Local
Assembly Cache, and MPI Server Adaptor. The Migration Component performs the transfer of the
MAs and other assemblies using .NET Remoting facilities over a TCP channel. The Assembly
Resolver manages the assembly execution. When a new assembly is received, the Assembly
Resolver authenticates it using the assembly metadata, which includes the agent name, version,
and public key. The public key is identified by a 64-bit hash and is associated with the private key used to sign the
assembly. When the assembly is created, a hash is computed and subsequently encrypted with
the private key. This hash is stored in the assembly metadata. Then, when the assembly is
authenticated at the Assembly Resolver, the hash is decrypted using the public key and compared
with a new hash generated from the assembly. If the public key is truly related to the private key,
then both hashes match, and the assembly is properly authenticated. Once the authentication
process is completed, the assembly is stored in the Local Assembly Cache. When a specific
assembly is required, it is loaded from it. The fourth module is the MPI Server Adaptor, which
attaches the MAS to the MPI application. This unit provides access to the process memory space
as well as to the local node resources. This way, the MAS may access the application data by
means of memory pointers to data structures that are provided by the MPI Server Adaptor.
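The authentication steps described above amount to a standard sign-then-verify protocol. The toy C sketch below illustrates only the flow; a trivial hash and XOR stand in for the real cryptography (the platform relies on .NET strong naming, as described), and all names are illustrative.

    #include <stdio.h>
    #include <string.h>

    typedef unsigned long long u64;

    /* Toy hash (FNV-1a style); a real deployment uses a cryptographic hash. */
    static u64 toy_hash(const unsigned char *data, size_t len) {
        u64 h = 14695981039346656037ULL;
        for (size_t i = 0; i < len; i++) h = (h ^ data[i]) * 1099511628211ULL;
        return h;
    }

    /* "Signing": encrypt the hash with the private key (toy XOR cipher). */
    static u64 sign(const unsigned char *a, size_t len, u64 priv) {
        return toy_hash(a, len) ^ priv;
    }

    /* Verification at the Assembly Resolver: decrypt the stored hash with
     * the public key and compare it against a freshly computed hash. */
    static int authenticate(const unsigned char *a, size_t len, u64 sig, u64 pub) {
        return (sig ^ pub) == toy_hash(a, len);
    }

    int main(void) {
        unsigned char assembly[] = "agent code and metadata";
        u64 key = 0x1234abcdULL;            /* toy: symmetric, pub == priv */
        u64 sig = sign(assembly, sizeof assembly, key);
        printf("authenticated: %d\n", authenticate(assembly, sizeof assembly, sig, key));
        return 0;
    }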
Figure 3: Control Unit Structure.
The Control Unit, depicted in Figure 3, consists of four major components: The GUI
Manager allows the user to select, configure, and execute MAs, displaying the information
resulting from their execution. Once a given MA is selected, its code is loaded into memory from
the Repository Module, which contains all the MAs and configuration assemblies. The CU also
contains a Data Assignment unit. This unit initializes and assigns the data that each MA has to
collect during the I/O operation. This assignment is computed trying to minimize the overall I/O
time, and taking into account the distributed architecture characteristics (network speed, compute
node computational power, etc.). Finally, the Migration Component transfers the MAs and other
assemblies using .NET Remoting facilities.
The Storage Unit consists of the Migration Component, the Assembly Resolver and the
Local Assembly Cache. These modules have the same functionality as the ones from the Mobile Agent
Server. Instead of using an MPI Server Adaptor, the SU implements services for storing the MA data
to local disk. These services include MPI datatype handling for mapping the agent data to file
positions.
3.2 Mobile Agent logic
Mobile agents contain both code and data sections. The code section consists of all the
agent logic necessary to compute its itinerary as well as to perform its duties in the MASs and SUs.
The data section encloses two different elements: metadata and payload. The first one contains
the information required by the MA to perform its task. It includes the agent itinerary and data
map, represented as MPI datatypes (in the case of a data collector). The payload contains the
data collected by the agent in each MAS.
Our current implementation supports two different agents: the data collector which
gathers data from the MPI application memory space and the system monitor, which collects
information from the compute node configuration files. The first one is used for performing
distributed I/O operations and the second one is employed for monitoring the distributed
architecture status. The information collected from the monitoring process includes the
network/node characteristics, which will be subsequently used as input parameters in the
scheduling component. Figure 1 shows the agent itinerary of these two types of agents. Mobile
Agent 1 is a system monitor, which travels to the Compute Node 1, collecting the node status
information and transferring this information back to the CU. Mobile Agent 2 is a data collector. It
leaves the CU and collects data from the MPI application processes running on the compute nodes
2 and 4 and, finally, stores these data on the second storage node. Note that both agent types can
traverse any number of compute nodes and that both operations are fully parallel.
The integration of MAGIO with the target application is simple, requiring only three
elements: a pointer to the data structure, an MPI datatype describing the data contents, and a lock
variable for allowing/denying data access by the MA. Using this variable, it is possible to control
the time interval in which data are collected (and subsequently transferred) by the MA.
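A hedged sketch of this three-element contract from a C application follows; magio_register and the lock protocol are hypothetical names for illustration, not MAGIO's actual API (a stub stands in for the platform side so the sketch is self-contained).

    #include <mpi.h>

    /* Hypothetical registration entry point (stubbed for self-containment). */
    void magio_register(void *data, MPI_Datatype layout, volatile int *lock) {
        (void)data; (void)layout; (void)lock;  /* MAGIO would record these */
    }

    volatile int magio_lock = 0;  /* 0: MA may collect, 1: app is writing */
    double state_buf[1024];       /* per-process data to be stored        */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        MPI_Datatype layout;      /* describes the contents of state_buf  */
        MPI_Type_contiguous(1024, MPI_DOUBLE, &layout);
        MPI_Type_commit(&layout);

        magio_register(state_buf, layout, &magio_lock);  /* hypothetical */

        magio_lock = 1;           /* deny MA access while updating        */
        /* ... update state_buf ... */
        magio_lock = 0;           /* allow the MA to read a consistent copy */

        MPI_Type_free(&layout);
        MPI_Finalize();
        return 0;
    }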
Regarding the platform reliability, the MA is responsible for collecting (from memory or
disk) and transferring the compute node data. Agent failure detection can be done using timeout
or heartbeat techniques. Timeout is estimated based on the traveling time of each MA. This is
done during the MA scheduling process. A MA is considered lost if it is not received at the SU after
this time interval (plus a confidence value). With heartbeat techniques, MAs regularly send messages
to the other members of the group to notify their status. If one MA is lost, the message emission
is interrupted. In both cases, when one MA is lost, the MAP schedules a new agent with the
same tasks.
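A minimal sketch of the timeout test, assuming the estimated travel time comes from the cost model used during MA scheduling (names are illustrative):

    #include <time.h>

    /* An MA is declared lost if it has not reached the SU within its
     * estimated travel time plus a confidence margin (both in seconds). */
    int ma_is_lost(time_t launch_time, double est_travel_s, double confidence_s) {
        return difftime(time(NULL), launch_time) > est_travel_s + confidence_s;
    }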
MAs have the following properties: they are autonomous and adaptive, i.e., they can
take run-time decisions about their actions based on platform conditions; they travel across both
the compute and I/O nodes, interacting with them; and they are persistent. The following sections
describe in more detail the implementation of these features.
Figure 4: Datatype handling.
4 Data management
A large class of parallel compute-intensive scientific applications operates on
multidimensional data sets, which are distributed among the compute nodes. MAGIO includes
services that allow the user to specify the disk data distribution, how many MAs are employed,
and which portion of data each agent collects. Optionally, an automatic data assignment unit can
be employed for automatically computing the number of MAs, in order to minimize the overall I/O
cost.
Each data collector agent handles a specific datatype, which describes the data entries
that it has to collect. Datatypes are data structures that contain references to memory or file
positions. Datatypes are commonly employed by MPI applications for describing the memory
regions that are used for sending/receiving data. They are also employed in I/O view operations
for describing the file positions where the data are allocated. MPI datatypes provide a flexible
solution for handling generic types of data distributions. Figure 4 shows an example of
MPI_Indexed datatype for a 16-entry data array. MPI_Indexed datatype consists of a sequence of
blocks, where each block can contain a different number of copies and have a different
displacement. Distribution datatype 1 contains the following offset-length list of blocks: {0,4},
{5,4}, {12,1} and {14,2}.
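For reference, distribution datatype 1 above can be built with MPI_Type_indexed; the element type (MPI_DOUBLE) is an assumption, since the paper does not state it.

    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        /* Offset-length blocks {0,4}, {5,4}, {12,1}, {14,2} from Figure 4 */
        int blocklens[4] = {4, 4, 1, 2};
        int displs[4]    = {0, 5, 12, 14};
        MPI_Datatype dist1;

        MPI_Type_indexed(4, blocklens, displs, MPI_DOUBLE, &dist1);
        MPI_Type_commit(&dist1);

        /* dist1 now selects entries 0-3, 5-8, 12 and 14-15 of the 16-entry
         * array and could serve as a distribution or MA datatype. */

        MPI_Type_free(&dist1);
        MPI_Finalize();
        return 0;
    }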
When a data collector MA is created, a given datatype is added to its data section with the
description of the memory entries that it has to collect. We call this datatype MA datatype. We
distinguish two different MA datatypes: global data assignment, when a given agent has to collect
the whole data from one or several MPI processes; and partial data assignments, when the agent
collects only part of the data. Once the MA is created and initialized with a data assignment, it
first visits the MASs that contain the data distribution datatypes. Once there, it compares its data
assignment with the global data distribution. Using this information the agent determines which
MASs to visit, which data entries to collect from each one, and in which order to perform its
itinerary.
The itinerary is determined in the following way: When a MA is created it includes a basic
primitive for computing the intersection between two datatypes. The result is a newly generated
datatype. Figure 4 shows an example of this operation. We label intersection datatypes 1 and 2 the
resulting datatypes produced by the intersection of the MA datatype with distribution datatypes
1 and 2, respectively. The collected entries of the data array are the union of the entries pointed to by
both intersection datatypes.
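The intersection primitive can be pictured as a merge of two sorted offset-length lists. The sketch below assumes both datatypes have already been flattened to such lists, which is a simplification of the real MPI datatype handling; the MA-datatype blocks are invented for the example.

    #include <stdio.h>

    typedef struct { int off, len; } Block;

    /* Writes the intersection of sorted lists a and b into out; returns its size. */
    int intersect(const Block *a, int na, const Block *b, int nb, Block *out) {
        int i = 0, j = 0, n = 0;
        while (i < na && j < nb) {
            int lo  = a[i].off > b[j].off ? a[i].off : b[j].off;
            int ahi = a[i].off + a[i].len, bhi = b[j].off + b[j].len;
            int hi  = ahi < bhi ? ahi : bhi;
            if (lo < hi) { out[n].off = lo; out[n].len = hi - lo; n++; }
            if (ahi < bhi) i++; else j++;   /* advance the list that ends first */
        }
        return n;
    }

    int main(void) {
        /* Illustrative MA datatype and distribution datatype 1 from Figure 4 */
        Block ma[]   = {{2, 6}, {10, 5}};
        Block dist[] = {{0, 4}, {5, 4}, {12, 1}, {14, 2}};
        Block out[8];
        int n = intersect(ma, 2, dist, 4, out);
        for (int k = 0; k < n; k++)
            printf("{%d,%d} ", out[k].off, out[k].len);
        printf("\n");
        return 0;
    }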
Using this primitive, each MA can easily compute its itinerary by means of a scheduling
algorithm. Figure 5 depicts the algorithm. Initially (1), the MA checks if it has computed its
itinerary. If not, it visits the compute nodes that contain the data distribution datatypes (2), and
accesses their memory space. Then, it obtains the number of MPI processes used by the parallel
application. For each one, a reference of its datatypes (related to its data distribution) is
generated. Using this reference, the MA performs the intersection operation (3) between its MA
datatype and the data distribution datatypes of each MPI process. As a result, the range of
entries that the MA has to collect in this particular node is obtained. Repeating this operation, the MA
collects the list of nodes of its itinerary. If one intersection operation produces no entries, then
this node is discarded from the list.
Figure 5: Mobile agent logic.
The last step consists in computing the itinerary for visiting these nodes (4). In this work,
we have followed the minimum network traffic scheduling. A given agent with a given data
assignment creates a minimum network traffic itinerary visiting first the MAS with smallest
amount of entries to be collected, then the one with the next fewest entries, and so on. This
ordering follows from the fact that the agent accumulates the entries collected at each MAS and
carries all the collected data across the rest of its itinerary. The longer the MA itinerary,
the more data accumulated from each compute node and the greater the network traffic [32]. Visiting
first the compute nodes with less data therefore ensures smaller network traffic.
A minimum load itinerary that ensures minimum network traffic can be easily computed
from the agent data distribution. Based on the previous property, the elements of the MA
itinerary are sorted and visited in ascending order, according to the amount of data that the MA
has to collect. Finally, the SU is added as a final step of the itinerary.
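A minimal sketch of this ordering step, assuming each itinerary stop is annotated with the number of bytes the MA must collect there (names are illustrative):

    #include <stdlib.h>

    typedef struct { int node_id; long bytes; } Stop;

    static int by_bytes(const void *x, const void *y) {
        long a = ((const Stop *)x)->bytes, b = ((const Stop *)y)->bytes;
        return (a > b) - (a < b);
    }

    /* Sorts the MA's stops in ascending order of collected data;
     * the SU would then be appended as the final stop. */
    void build_itinerary(Stop *stops, int n) {
        qsort(stops, n, sizeof(Stop), by_bytes);
    }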
Once the itinerary is computed (see Figure 5), the agent transfers the assemblies (both
code and data) to the next MAS (5). This operation is performed in the CU and MAS modules. Once
the MA is ready to be transferred to the next element of its itinerary, it first obtains a remote
reference to the Migration Component responsible for receiving the MA. There are two different
migration levels for MAs: strong and weak. In the former, the agent is transferred to the new
host with its current execution state, allowing the execution to resume in the new host with the
same state (instruction pointer, register values, etc.). Weak mobility consists of transferring the agent and
executing it from an entry point, in a new execution environment. Our platform implements
the latter.
The next step of the MA transfer consists in ensuring that the target contains all the
assemblies required by the agent. A specific request is sent to the Migration Component with the
MA identification, which includes the MA version number, its language, and its public key. This
information is received by the target Migration Component, which redirects it to the Local
Assembly Cache. This module checks whether the required assemblies are in the local cache or
not, sending back (via Migration Component) to the source the result of the check. Using this
information, the source node transfers all the required assemblies that are not present. When this
operation is completed, the MA state is transferred. Finally, using this information, the target node
recognizes the MA and executes it with its current state, collecting the compute node data (6). In
this way, the MA is transferred to each node of its itinerary. For each one, the required data is
collected. When the MA reaches the SU (7), it transfers the data to disk and the MA is destroyed.
For reading operations an analogous procedure can be applied: the MA creates the
itinerary using the data distribution datatypes and the MA datatype (now related to the file
portion that has to be read by the MA). Using this information, the MA transfers the associated
data to each node of its itinerary.
Note that MAGIO can handle generic data distributions and file layouts. In a general case,
it uses two different inputs (MPI datatypes): one describing how the data structure is distributed
over the processes and another describing the data elements that have to be read/written. The
checkpointing operations are a particular case where each process has a single portion of data that
has to be transferred as a single file (one file associated to each process). In this case, the datatype
structure is the simplest one (a complete data block).
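In that degenerate case, the distribution datatype reduces to a single contiguous block per process, e.g. (assuming double elements, which the paper does not specify):

    #include <mpi.h>

    /* Builds the single-block datatype for one process's checkpoint data;
     * local_n is the number of elements held by the process. */
    MPI_Datatype make_checkpoint_type(int local_n) {
        MPI_Datatype ckpt;
        MPI_Type_contiguous(local_n, MPI_DOUBLE, &ckpt);
        MPI_Type_commit(&ckpt);
        return ckpt;
    }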
Regarding the data consistency, two different scenarios can be considered: In the first one,
the application needs to store the current data structure value to a file and to start a new
computation that reuses the data structure. In the second one the processes of the application are
concurrently changing parts of the array while the MA is reading it. For both scenarios it is
necessary to provide an access mechanism for preventing reading inconsistent versions of the
data. This can be done by two different strategies: the first scenario is addressed by copying the
current data structure to a new memory area, where data are temporarily stored until the arrival of
the MA. This allows the application to reuse and modify the original data structure without
interfering with the MA. The second scenario requires synchronization mechanisms between the
MA and the application. These mechanisms can be incorporated using locking facilities over the
data (by means of a shared variable).
This platform was designed for performing system monitoring and I/O, including
capabilities to adapt to the changes in the execution environment. First, the proposed architecture
is scalable, employing a larger number of MAs when more compute nodes are used. Moreover,
more SUs can also be used, increasing the available parallelism, not only at the transfer level, but also
at storage level. Second, due to the modular design of the platform, new MAs with new logic can
be easily incorporated with a minimum design cost. Due to the close interaction between the MAs,
the MPI application and the compute nodes, it is possible to retrieve detailed information about
both the software application and underlying hardware. Therefore, it is possible to include new
logic to allow the MAs to adapt themselves to the changes in their environments.
5 Data assignment strategies
One critical element in the MAGIO architecture is the data assignment mechanisms of the
MAs. Performing this process efficiently achieves better resource utilization (computational
power and network usage), increasing the load balance and the overall platform performance.
Consequently, we need a system-aware data assignment mechanism for the MAs. In this section
we study the scalability and robustness of different data assignment strategies.
In our context, each data collector MA has a data assignment which describes the set of
data entries that it has to collect. Thus, the MA scheduling problem can be reformulated as the
generation of the data assignment for each MA. Our platform has currently integrated four
different data assignment techniques: user-defined, Simulated Annealing, Genetic Algorithm, and
Greedy Algorithm. The latter three data assignment strategies use a cost function for estimating
the I/O operation time based on the MA data assignment and platform characteristics. This
function is described next.
5.1 Cost function
The first step for constructing the cost function consists of obtaining the set of
parameters that characterize the distributed architecture. In our experiments we have considered
the following ones, associated with each node: network link latency (ms) and bandwidth (Mb/s),
communication startup latency (ms), processor power (MFLOPS), and I/O storage node
bandwidth.
We have developed a mathematical model that computes the agent execution time for a
given itinerary. This simulation assumes that the agent has a size associated with its code and a
payload associated with the collected array entries. The model computes the network transfer time
between any pair of nodes of the MA itinerary. Additionally, it assumes that, for each compute
node, each agent requires a given number of floating point operations as a function of its size (code
plus payload). The cost of each MA transfer is shown in Equation 1; the last term of this expression
is only considered at the SU.
Cost_node = Latency_startup + T_communication + T_computation [+ T_I/O]    (1)
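One plausible reading of Equation 1 in C for a single hop of an MA itinerary is sketched below; the parameter packaging and the flops-per-byte factor are assumptions, since the paper only states that the computation cost depends on the agent's code-plus-payload size.

    /* Sketch of Equation 1 for one hop; names and the 0.1 flop/byte
     * factor are assumptions, not values from the paper. */
    typedef struct {
        double startup_ms;   /* communication startup latency (ms)     */
        double latency_ms;   /* network link latency (ms)              */
        double bw_mbps;      /* link bandwidth (Mb/s)                  */
        double mflops;       /* processor power (MFLOPS)               */
        double io_bw_mbps;   /* storage bandwidth, used only at the SU */
    } NodeParams;

    double node_cost_ms(const NodeParams *p, double size_mb, int is_su) {
        double t_comm = p->latency_ms + size_mb * 8.0 / p->bw_mbps * 1000.0;
        double flops  = size_mb * 1e6 * 0.1;   /* assumed 0.1 flop per byte */
        double t_comp = flops / (p->mflops * 1e6) * 1000.0;
        double t_io   = is_su ? size_mb * 8.0 / p->io_bw_mbps * 1000.0 : 0.0;
        return p->startup_ms + t_comm + t_comp + t_io;
    }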
In this model the MA transfers are exclusive; that is, when an agent is transferred
between two nodes, neither node allows other communications. If another agent tries to visit either
of these nodes, it has to wait until the former agent finishes its transfer. The model thereby captures the
existence of bottlenecks in the network access. In addition, we allow concurrent MA execution,
that is, a given compute node can host a generic number of MAs. The final result produced by this
model is the total execution time of agent I/O operations, that is, the maximum execution time of
all existing MAs.
5.2 User-defined data assignment
In this policy, the user specifies the data associated with each MA by means of user-defined
datatypes. These datatypes are loaded at the CU and transferred to the MA. In this case, the
problem is not generating an optimal MA data assignment, but providing a procedure to collect a
generic portion of the MPI application data transparently, without knowing how many MPI
processes are being used, which data entries are stored in each one, and which compute nodes
they are executing on. All these variables are automatically computed by the MA. Note that this
strategy does not use the cost function, given that the data assignment is provided by the user.
A drawback of this approach is the potential hotspots due to MA traffic: If several agents
access the same compute node, contention hot spots can be produced. These hot spots can
introduce delays, increasing the agent itinerary time (that is, the global I/O operation time) and,
consequently producing a non-optimal schedule. The solution taken in the remainder of the
techniques consists of assigning always different compute nodes to the MAs. For the rest of the
approaches there are no conflicts with the MAs given that each compute node is visited only by
one agent.
5.3 Simulated annealing data assignment
In this approach, the objective is to find an optimal data assignment that achieves the
smallest I/O execution time. We assume that each compute node is only visited by one MA which
collects the whole data to be accessed. Using the Simulated Annealing (SA) approach, this
problem can be seen as a global optimization problem in a discrete space. Each space state
corresponds to a given data assignment for each agent. More specifically, we define state as a
vector of N_node entries, where N_node is the number of compute nodes. The i-th entry of state
contains the agent responsible for collecting all data from the i-th compute node. If N_MA agents
are employed, then the number of different states is N_MA^N_node.
The associated execution time of each state will be the one returned by the cost function.
Under this approach, the problem of finding the optimal data assignment can be seen as the
problem of searching the state with a minimum cost. Note that each state corresponds to a
specific data assignment for each MA participating in the I/O operation.
In our experiments we have used the SA implementation of Matlab, included in the Global
Optimization Toolbox. The SA implementation uses two user-customized functions: the cost
function (shown in Section 5.1) that provides the cost of each state, and a permutation function,
that obtains a random new state based on the current one. We have considered two permutation
functions:
• Perm1. This function permutes two random entries of state. That is, it generates
two random values 0 ≤ i, j < N_node and exchanges the values state[i] and state[j].
• Perm2. This function takes a random entry of state and assigns it a random agent.
That is, it generates two random values 0 ≤ i < N_node and 0 ≤ j < N_MA and sets state[i] = j.
The initial state is a round-robin distribution of the MAs over the compute nodes. Note
that Perm1 ensures load balance, given that the number of compute nodes assigned to each agent
is kept constant. On the other hand, Perm2 introduces a random assignment of agents that can
cause load unbalance. Both permutation functions are sensitive to the temperature variable (internally
used by the SA): the greater the temperature, the more entries of state are permuted.
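The two moves can be sketched as follows (the temperature-dependent number of permuted entries is omitted for brevity):

    #include <stdlib.h>

    /* state[i] holds the agent assigned to compute node i. */

    /* Perm1: swap the assignments of two random nodes (preserves balance). */
    void perm1(int *state, int n_node) {
        int i = rand() % n_node, j = rand() % n_node;
        int t = state[i]; state[i] = state[j]; state[j] = t;
    }

    /* Perm2: give a random node a random agent (may unbalance the load). */
    void perm2(int *state, int n_node, int n_ma) {
        state[rand() % n_node] = rand() % n_ma;
    }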
5.4 Genetic algorithm data assignment
The Genetic Algorithm (GA) is a method for solving optimization problems based on
natural selection. Starting from an initial state (called population), it repeatedly modifies it,
producing new states (called children). These states are used as parents, in the next step, to
produce a new generation of children. Over successive generations, the population evolves to new
states. Introducing some criteria of selection, it is possible to shift towards an optimal solution.
In our case, we use the Matlab implementation of Genetic Algorithms of the Global
Optimization Toolbox and the cost function depicted in Section 5.1 for guiding the optimization
process. Both the SA and GA approaches compute a data assignment with a close-to-minimum
execution time.
Based on the previous strategies, the problem of finding the optimal solution has been
simplified to another one: given a generic number of agents, and using the SA or GA approaches, a
close to optimal MA data assignment can be found. Using this data assignment the MAs compute
the itinerary based on the minimum network traffic scheduling.
FOR i = 1, N_node
    state[0:N_node] = -1
    state[i] = 1
    cost[i] = evaluate_cost(state)             (T1)
END
scale = 1.1; Tmax = -1
N_MA = 0; state[0:N_node] = -1
WHILE (there are unassigned nodes)
    j = Take_Max_Cost(cost)                    (T2)
    N_MA = N_MA + 1
    state[j] = N_MA
    T = evaluate_cost(state)                   (T3)
    Tmax = max(T, Tmax)
    FOR each k in unassigned nodes
        state[k] = N_MA                        (T4)
        T = evaluate_cost(state)               (T5)
        threshold = Tmax * scale
        IF (T > threshold) state[k] = -1       (T6)
        ELSE Tmax = max(T, Tmax)
    END
END
Figure 6: Pseudocode of Greedy Algorithm data assignment.
5.5 Greedy algorithm data assignment
With the Greedy Algorithm (GRA) we have followed a different approach: instead of
minimizing the I/O cost using a fixed number of MAs, we consider an increasing number of
agents. Figure 6 shows the pseudocode of the Greedy Algorithm. It consists of two phases. In the
first phase, we evaluate the cost of sending one MA to each compute node, collecting its data, and
storing it in the SU. The idea is to evaluate the associated cost of independently accessing each
MAS. The function evaluate_cost (tag T1) computes these values using the cost model defined before.
Note that a state value of -1 excludes the associated MAS from the I/O operation.
In the second phase, the MA data assignment is computed. Initially, one MA is assigned to
the highest cost compute node (T2 tag), obtaining the overall execution time (T3 tag). Then, we
tentatively add each one of the remaining unassigned compute nodes (tag T4) to the MA
itinerary. For each one of them, we evaluate whether the overall execution time (tag T5) is larger than
a given threshold; if this is the case, the node is discarded (tag T6). Otherwise, it is added
to the MA itinerary. This process repeats until all the nodes (state entries) are assigned to an MA.
In order to better exploit parallelism, a given compute node with a high execution time will have a
dedicated MA, but, at the same time, other nodes with smaller cost will share the same MA.
Note that, for the SA, GA, and GRA data assignment strategies, the final result is the state
vector. Using this data structure, the agent travels to the MPI root process, where it generates the
necessary data assignment and computes in which order it will visit the assigned nodes. In the
case of the User-defined data acquisition, the MA has initially its whole data assignment, thus, it
only has to compute its itinerary.
6 Use case: MAGIO with BIPS3D parallel application
We have fully integrated MAGIO with a parallel scientific application, and evaluated it on a
distributed architecture. This section describes both the application and architecture
characteristics as well as the results of the evaluation.
6.1 BIPS3D Simulator
BIPS3D [24] is a 3-dimensional simulation of BJT and HBT bipolar devices. The goal of the
3D simulation is to relate electrical characteristics of the device with its physical and geometrical
parameters. The basic equations to be solved are Poisson’s equation and electron and hole
continuity, in a stationary state. Finite element methods are applied in order to discretize the
Poisson equation, hole and electron continuity equations by using tetrahedral elements. The result
is an unstructured mesh in which we place more nodes in the areas of union between different
areas of the transistor.
Using the METIS library [18], the mesh is divided into sub-domains, such that one
sub-domain corresponds to one processor. The next step is decoupling the Poisson equation, hole
and electron continuity equations, and linearize them using the Newton method. Then the part
corresponding to the associated linear system is constructed, for each sub-domain, in a parallel
manner.
BIPS3D consists of the following compute phases. The first phase is the data initialization
and distribution: the compute node 0 (called root node) loads and initializes the data structures
related to the electronic device to be simulated. Then, the mesh is partitioned using METIS and
data distribution datatypes are automatically generated based on the partition information. These
datatypes are employed for distributing the mesh elements among the compute nodes by MPI. In
a second phase, the device parallel simulation is carried out. This phase consists of a given number
of time steps. Note that these operations are performed in parallel and some communication
operations are required in order to update the mesh boundaries. The third phase, the I/O phase, is
executed interleaved with the second. Periodically, for a given number of time steps, the compute
nodes send to the root node the complete state information related to the assigned mesh portion.
The root node gathers all the data, reconstructs the complete mesh and transfers it to disk. Note
that this phase consists of two operations: MPI communication for gathering the data and I/O disk
transfers. In our approach, the I/O transfers are carried out using MAs.
Note that in BIPS3D data arrays are initialized at the root node, which subsequently
distributes them to the rest of the nodes. Thus, the root process contains the datatype structures
employed in the data distribution, that is, the description of the data distribution over all compute
nodes. In order to obtain its itinerary, each MA has to first visit the root node, intersect its
data assignment with the data distribution datatypes, and apply the Minimum Network Traffic
scheduling.
The next sections describe the MAGIO evaluation when used with BIPS3D. Our
experiments target the following issues: performance evaluation of the Mobile Agent Platform
employing different numbers of Mobile Agents and Storage Units and efficiency of the data
assignment techniques. In addition, the performance has been compared with that of the original
MPI application. All these experiments have been performed on a real distributed architecture
consisting of 40 commodity computers with one dual-Core AMD processor running at 2.2GHz, 512
MB of RAM, and a Fast Ethernet interconnection network. The average latency between the
computers is 0.098 msecs, with a standard deviation of 0.012 msecs.
We have used the gcc 4.1.2 for compiling the BIPS3D application and the Mono JIT
compiler 1.2.2.1 for the mobile agent platform. We have used two datasets, called Device1 and
Device2, which correspond to two tetrahedral meshes of semiconductor devices employed by the
BIPS3D application. These meshes have 47,200 and 32,888 nodes, respectively. Each node contains
several data structures, with a total data size of 56.6MB and 39.5MB per mesh. This is the volume
of data that have to be transferred to disk during the I/O operation.
6.2 Evaluation of the impact of the number of mobile agents
Figure 7(a) shows the execution time of the I/O data transfer for Device1 using 32
compute nodes, one Storage Unit, and a variable number of MAs. We evaluated the I/O time for a
range from 1 to 32 agents. The data assignment of the MAs follows block and cyclic distributions
of the MAs over the compute nodes. The last column is the I/O transfer cost of the original
MPI-based application with 32 processors. In our experiments we focus on the evaluation of the
data transfer cost over the network. Therefore, the measured results do not include the disk data
transfer. More specifically, Figure 7 shows the time interval since the first MA leaves the CU until
the last MA transfers the collected data to the SU memory space. For the MPI-based application,
the Figure shows the execution time for sending all the data to the root process.
If we consider a distributed platform of commodity computers sharing the disk resources
using NFS, the MPI root process writes the collected data to a remote disk. The cost of this write
operation for the Device1 dataset is 4,950 msecs with NFS (original MPI version). In contrast, each SU is
supposed to have local disk resources allowing to reduce the access time. In our experiments, the
local disk access time is 188 msecs.
In Figure 7(a) we can observe that the proposed architecture takes advantage of the
parallelism when several agents are employed, reducing the I/O time. For one MA the I/O transfer
time is very large, given that the collected data have to be transferred among all the compute
nodes. When more MAs are employed, the parallelism is efficiently exploited, drastically reducing
the I/O transfer time. Note that for 32 MAs, each one collects the data from one compute node.
However, the smallest execution time is obtained for 16 MAs. This is due to the fact that for 32
agents the contention is higher at the single SU.
If the disk access time is not considered, the MPI application gathers the data more
efficiently than the MAP. However, it is important to remark that the MAP fully overlaps the I/O
with the computation, given that it is done asynchronously without the intervention of the original
application. In the case of the MPI-based application, the root process has to wait until receiving
all the data from the rest of the processes, limiting the degree of overlapping between the I/O and
compute phases. This makes our approach more advantageous in practice.
(a) Impact of the number of MAs
(b) Impact of the number of SUs
Figure 7: Scalability of MAGIO.
6.3 Evaluation of the impact of the number of storage units
The proposed platform supports a generic number of SUs, each one with a generic
number of MAs. In this case, the file contents are distributed over several SUs. That is, we provide
the basic functionality of a parallel file system. There are several differences with other parallel file
system approaches, such as GPFS and PVFS. First, MAGIO does not require administrative privileges,
given that it can be completely installed and executed in a local account. Second, it is
transparently integrated in the original application, given that no explicit I/O calls are required.
Third, any compute node with local storage can be used as a SU.
Figure 7(b) shows the execution time of the I/O data transfer for Device1 for 32 compute
nodes, 16 MAs (cyclically distributed) and a variable number of SUs. We considered a range
between 1 and 8 SUs. Note that the number of MAs is the same for all the cases, thus, the larger
the number of SUs, the smaller the number of MAs assigned to each SU.
In this scenario two different levels of parallelism are exploited: First, the use of several
MAs increases the parallelism for collecting data and second, the use of different SUs reduces
contention at disk. We can see that there is a strong reduction of the overall execution time, when
more storage units are employed, resulting in a better performance than the MPI-based approach.
6.4 Evaluation of data assignment techniques
In this section the efficiency of different data assignment methods is evaluated on MAGIO.
In addition, new synthetically generated scenarios are also studied. The idea consists of
introducing artificial compute node loads that represent contention spots. This situation appears
in distributed environments, such as PlanetLab [50], that do not provide exclusive access to the
compute nodes. In these cases the parallel application competes with other user applications that
consume CPU and network resources.
In order to consider these effects, we have modified the application adding pre-defined
latencies to some specific compute nodes. These artificial latencies introduce communication lags
that delay the MA transfer time. Using these new scenarios, we evaluate the capacity of MAGIO to
adapt to the changing platform conditions and to perform efficient I/O in each case. MAGIO
performance was evaluated considering the following scenarios:
(a) Scenario 1
(b) Scenario 2
(c) Scenario 3
(d) Scenario 4
Figure 8: Evaluation of MAGIO for Device1.
(a) Scenario 1
(b) Scenario 2
(c) Scenario 3
(d) Scenario 4
Figure 9: Evaluation of MAGIO for Device2.
• Scenario 1: Application has exclusive access to the platform. No artificial latency is
added.
• Scenario 2: Each compute node executes a sequential program with a random load.
We model this scenario by introducing a random distribution of artificial latencies of up to 6 secs.
• Scenario 3: A group of compute nodes is executing a demanding parallel
application. The rest of nodes have no load. We introduce an artificial latency of 7 secs for three
compute nodes. The rest of them have no artificial latency.
• Scenario 4: One compute node has a very heavy contention and three more have
medium contention. One compute node has 15 secs of artificial latency and three compute nodes
have 5 secs. The rest of them have no latency.
All these scenarios are evaluated for 32 compute nodes and 1 SU. For each one of them
we have generated the agent scheduling using the Genetic Algorithm (GA), the Simulated
Annealing (SA), and the Greedy technique for 4, 8, and 16 agents. We call SA-Perm and SA-Repl
the SA implementations using Perm1 and Perm2, respectively (see Section 5.3). Regarding the
Greedy technique, we use scale values (see Section 5.5) of 1.05, 1.10, and 1.20 for
Greedy1, Greedy2, and Greedy3, respectively. Note that this algorithm dynamically calculates the
number of agents, thus this number is not an input parameter like in the other techniques. In
addition, we evaluated the block and cyclic agent distribution over the compute nodes (called
Block and Cyclic in the figures).
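For reference, a minimal Python sketch of the Block and Cyclic node-to-agent distributions (assuming compute nodes numbered from 0; helper names are hypothetical, not part of MAGIO) could look as follows:

```python
# Illustrative sketch of the Block and Cyclic assignments of compute
# nodes to mobile agents evaluated below (not MAGIO's actual code).

def block_distribution(num_nodes: int, num_agents: int) -> list[list[int]]:
    """Each agent visits a contiguous chunk of compute nodes."""
    chunk = -(-num_nodes // num_agents)  # ceiling division
    return [list(range(a * chunk, min((a + 1) * chunk, num_nodes)))
            for a in range(num_agents)]

def cyclic_distribution(num_nodes: int, num_agents: int) -> list[list[int]]:
    """Compute nodes are dealt out to agents in round-robin order."""
    return [list(range(a, num_nodes, num_agents)) for a in range(num_agents)]

if __name__ == "__main__":
    print(block_distribution(32, 4))   # agent 0 -> nodes 0..7, etc.
    print(cyclic_distribution(32, 4))  # agent 0 -> nodes 0, 4, 8, ...
```

Both strategies balance the number of nodes per agent; they differ only in whether each agent's itinerary is contiguous (Block) or interleaved (Cyclic), which matters once some nodes carry artificial latencies.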
Figures 8(a) and 9(a) show the overall execution time for Scenario 1 on Device1 and Device2, respectively. We can see that the Block and Cyclic distributions obtain, in general, the best performance. This is due to the fact that all the compute nodes have the same characteristics; thus, load balance (assigning the same number of compute nodes to each agent) is the most important factor in scheduling. For each Greedy technique, the number of MAs automatically computed by the algorithm for each scenario is shown on the horizontal axis.
Figures 8(b) and 9(b) show the overall execution time for Scenario 2. Given that the artificial latencies are randomly distributed, the Block and Cyclic distributions are still efficient. In general the GA, SA, and Cyclic strategies deliver the best schedules. The Block distribution introduces important delays when a small number of agents is employed, because the same agent visits the compute nodes with the highest latencies. The Greedy approach is efficient only for small scale values.
Figures 8(c) and 9(c) show the overall execution time for Scenario 3. Now the artificial latency affects only a few compute nodes. Note that the Block distribution is heavily affected by the artificial latency. For Device1, the GA and Greedy techniques obtain the best schedules. For Device2 the most efficient ones are the GA and Cyclic data assignments.
Figures 8(d) and 9(d) show the overall execution time for Scenario 4. The introduction of a hotspot with large latencies produces poor results for the Block and Cyclic schedules. In general the GA and SA generate efficient schedules, given that they adapt better to the changing system conditions by assigning only one MA to each hotspot node. The Greedy technique is also efficient for scales 1.05 and 1.10. Note that the larger the overhead due to hotspots, the smaller the number of MAs employed by the Greedy approach. This is because the algorithm assigns one MA to each hotspot; meanwhile, fewer MAs are necessary for balancing the accesses of the remaining compute nodes. The advantage of a small number of agents is the reduced contention at the final SUs.
In general, the efficiency of the data assignment is strongly dependent on the data distribution, the platform characteristics, and the number of MAs employed. Overall, SA and GA are efficient for all the considered scenarios, while the Block and Cyclic distributions are only efficient when hotspots are not present. The Greedy technique produces competitive schedules, especially for low scale values. Note that for all considered scenarios, MAGIO obtains, in an autonomous and flexible way, an adaptive I/O schedule that provides scalable solutions for distributed systems.
7 Conclusions
In this paper we have presented MAGIO, a novel mobile agent platform that performs I/O on behalf of parallel MPI applications. This approach presents several advantages compared to the classical I/O model. First, for each compute node, it is not necessary to explicitly perform remote I/O operations: data can be stored locally in the compute node (either on local disk or in memory) and subsequently collected by the MAs. This is made possible by the fact that our platform uses the same memory space and resources as the parallel application. Second, the agents collect the data asynchronously, without intervention of the MPI application; that is, the compute and I/O phases of the parallel application can be completely decoupled and overlapped, hiding the I/O overhead. Third, MAGIO supports not only I/O agents, but also general purpose agents, which can perform tasks such as collecting compute node and network information and monitoring parallel applications. Additionally, security is provided by a private/public key mechanism. MAGIO supports a generic number of SUs, allowing parallelism to be extracted both at the agent level and at the SU level. Moreover, it handles generic file distribution policies, based on MPI datatypes. Fourth, different agent scheduling strategies can be employed, based on the network and compute node performance. We have evaluated these schedules for both real and synthetic scenarios, showing that in the presence of hotspots, scheduling based on global optimization techniques reaches the best performance. Finally, we have fully integrated our implementation with MPI; more specifically, data description operators, such as datatypes and file views, are fully implemented in our MAP.
MAGIO is a scalable platform for several reasons: it allows a generic number of agents (communication-level parallelism) as well as an increasing number of I/O storage nodes (I/O-level parallelism). The MA scheduling is aware of the system's characteristics: the performance of each single network link (latency and bandwidth between two compute nodes) is considered, making it possible to exploit topology-related characteristics. Additionally, the computing power of each compute node is also considered, making MAGIO suitable for heterogeneous architectures. Finally, its use in conjunction with a monitoring process makes it possible to detect hotspots and adapt to the changing conditions of the platform.
Future work on the mobile agents includes run-time modification of the agent schedule based on real-time system monitoring, support for suspending and resuming agent execution, new features for tracking and handling MA execution errors, active cooperation with other MAs, and new functionality for developing on-demand read prefetching techniques.
8 Acknowledgements
We would like to acknowledge the assistance provided by Darío Ortega Correas. This work has been partially supported by the Spanish Ministry of Science under grant TIN2010-16497.
References
[1] I. Foster, N. R. Jennings, and C. Kesselman. Brain meets brawn: Why grid and agents need each other. Autonomous Agents and Multiagent Systems, International Joint Conference on, 1:8–15, 2004.
[2] S. Papavassiliou, A. Puliafito, O. Tomarchio, and J. Ye. Mobile agent-based approach for efficient network management and resource allocation: framework and applications. Selected Areas in Communications, IEEE Journal on, 20(4):858–872, May 2002.
[3] S. Marwaha, C. K. Tham, and D. Srinivasan. A novel routing protocol using mobile agents and reactive route discovery for ad hoc wireless networks. In Networks, 2002. ICON 2002. 10th IEEE International Conference on, pages 311–316, 2002.
[4] M. D. Assunção, F. L. Koch, and C. B. Westphall. Grids of agents for computer and telecommunication network management: Research articles. Concurr. Comput.: Pract. Exper., 16(5):413–424, 2004.
[5] C. X. Mavromoustakis and H. D. Karatza. Split agent-based routing in interconnected networks: Research articles. Int. J. Commun. Syst., 17(4):303–320, 2004.
[6] S. Zanikolas and R. Sakellariou. A taxonomy of grid monitoring systems. Future Gener. Comput. Syst., 21(1):163–188, 2005.
[7] R. Y. de Camargo, R. Cerqueira, and F. Kon. Strategies for storage of checkpointing data using non-dedicated repositories on grid systems. In MGC ’05: Proceedings of the 3rd international workshop on Middleware for grid computing, pages 1–6, New York, NY, USA, 2005. ACM.
[8] P. Lemarinier, A. Bouteiller, T. Herault, G. Krawezik, and F. Cappello. Improved message logging versus improved coordinated checkpointing for fault tolerant MPI. In CLUSTER ’04: Proceedings of the 2004 IEEE International Conference on Cluster Computing, pages 115–124, Washington, DC, USA, 2004. IEEE Computer Society.
[9] A. Negri, A. Poggi, M. Tomaiuolo, and P. Turci. Dynamic grid tasks composition and distribution through agents: Research articles. Concurr. Comput.: Pract. Exper., 18(8):875–885, 2006.
[10] J. Cao, S. A. Jarvis, S. Saini, and G. R. Nudd. GridFlow: workflow management for grid computing. In Cluster Computing and the Grid, 2003. Proceedings. CCGrid 2003. 3rd IEEE/ACM International Symposium on, pages 198–205, May 2003.
[11] J. W. Baek and H. Y. Yeom. D-agent: an approach to mobile agent planning for distributed information retrieval. Consumer Electronics, IEEE Transactions on, 49(1):115–122, Feb. 2003.
[12] D. Gavalas and C. T. Politi. Low-cost itineraries for multi-hop agents designed for scalable monitoring of multiple subnets. Comput. Netw., 50(16):2937–2952, 2006.
[13] A. Selamat and S. Omatu. Analysis on route selection by mobile agents using genetic algorithm. In SICE 2003 Annual Conference, volume 2, pages 2088–2093, Aug. 2003.
[14] J. Yang, J. Cao, and W. Wu. Efficient global checkpointing algorithms for mobile agents. Concurr. Comput.: Pract. Exper., 20(7):825–838, 2008.
[15] A. Poggi, M. Tomaiuolo, and P. Turci. Extending JADE for agent grid applications. In Enabling Technologies: Infrastructure for Collaborative Enterprises, 2004. WET ICE 2004. 13th IEEE International Workshops on, pages 352–357, June 2004.
[16] G. Varaprasad, R. S. D. Wahidabanu, and P. Venkataram. An efficient resource allocation scheme for multimedia applications in MANET. J. Netw. Comput. Appl., 31(4):577–584, 2008.
[17] V. Baousis, S. Hadjiefthymiades, G. Alyfantis, and L. Merakos. Autonomous mobile agent routing for efficient server resource allocation. J. Syst. Softw., 82(5):891–906, 2009.
[18] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20(1):359–392, 1998.
[19] H. Jitsumoto, T. Endo, and S. Matsuoka. Abaris: An adaptable fault
detection/recovery component framework for mpis. Parallel and Distributed Processing
Symposium, International, 0:413, 2007.
[20] T. D. Braun, H. J. Siegel, N. Beck, et al. A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems. J. Parallel Distrib. Comput., 61(6):810–837, 2001.
[21] R. Singh and M. Dave. Antecedence graph approach to checkpointing for fault tolerance in mobile agent systems. Computers, IEEE Transactions on, 62(2):247–258, 2013.
[22] K. Sycara, M. Paolucci, M. Van Velsen, and J. Giampapa. The RETSINA MAS infrastructure. Autonomous Agents and Multi-Agent Systems, 7(1-2):29–48, 2003.
[23] M. Aldinucci, C. Bertolli, S. Campa, et al. Self-configuring and self-optimizing grid components in the GCM model and their ASSIST implementation. In Proc. of HPC-GECO/CompFrame (held in conjunction with HPDC-15), IEEE, pages 45–52, Paris, France, June 2006.
[24] A. J. García Loureiro, J. M. López González, and T. F. Pena. A parallel 3D semiconductor device simulator for gradual heterojunction bipolar transistors. Int. Journal of Numerical Modelling: Electronic Networks, Devices and Fields, 16:53–66, 2003.
[25] F. Bellifemine, G. Caire, A. Poggi, and G. Rimassa. JADE: A white paper. EXP in Search of Innovation, 3(3):6–19, 2003.
[26] A. Bouteiller, T. Hérault, G. Krawezik, et al. MPICH-V project: A multiprotocol automatic fault-tolerant MPI. International Journal of High Performance Computing Applications, 20(3):319–333, 2006.
[27] R. M. M. Braga, C. M. L. Werner, and M. Mattoso. Odyssey-Search: A multi-agent system for component information search and retrieval. Journal of Systems and Software, 79(2):204–215, 2006.
[28] B. Brewington, R. Gray, K. Moizumi, et al. Mobile agents in distributed information retrieval. In Intelligent Information Agents, pages 355–395. Springer-Verlag, 1999.
[29] D. Buntinas, C. Coti, T. Herault, et al. Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI protocols. Future Generation Computer Systems, 24(1):73–84, 2008.
[30] M. Chen, T. Kwon, Y. Yuan, and V. C. M. Leung. Mobile agent based wireless sensor networks. JCP, 1(1):14–21, 2006.
[31] R. D. A. Byrski and M. Kisiel-Dorohinicki. Agent-based computing in an augmented cloud environment. Computer Systems Science & Engineering, 27(1):7–18, 2012.
[32] A. Fuggetta, G. P. Picco, and G. Vigna. Understanding code mobility. IEEE Transactions on Software Engineering, 24:342–361, 1998.
[33] F. Schmuck and R. Haskin. GPFS: A Shared-Disk File System for Large Computing Clusters. In Proceedings of FAST, 2002.
[34] P. M. Dickens and R. Thakur. Improving collective I/O performance using threads. In Proceedings of the 13th IPPS, pages 38–45, 1999.
[35] G. Buddhinath Jayatilleke, L. Padgham, and M. Winikoff. A model driven component-based development framework for agents. Computer Systems Science & Engineering, 20, 2005.
[36] D. Kotz. Disk-directed I/O for MIMD Multiprocessors. In Proc. of the First USENIX Symp. on Operating Systems Design and Implementation, 1994.
[37] D. B. Lange and M. Oshima. Seven good reasons for mobile agents. Commun. ACM, 42(3):88–89, March 1999.
[38] Cluster File Systems Inc. Lustre: A scalable, high-performance file system. Cluster File Systems Inc. white paper, version 1.0, November 2002. http://www.lustre.org/docs/whitepaper.pdf.
[39] J. Ye and H. C. B. Chan. B2B MAGICS: a mobile agent-based internet commerce system for B2B e-commerce. Computer Systems Science & Engineering, 28(2):71–80, 2013.
[40] X. Ma, M. Winslett, J. Lee, and S. Yu. Improving MPI-IO Output Performance with Active Buffering Plus Threads. In IPDPS, pages 22–26, 2003.
[41] D. C. Marinescu, L. Bölöni, J. R. Rice, P. Tsompanopoulou, and E. A. Vavalis. Agent-based scientific simulation and modeling. Concurrency Practice and Experience, 12(9):845–861, 2000.
[42] M. Massie, B. N. Chun, and D. E. Culler. The Ganglia distributed monitoring system: Design, implementation, and experience. Parallel Computing, 30(7):817–840, 2004.
[43] K. Moizumi and G. Cybenko. The travelling agent problem. In Mathematics of
Control, Signals and Systems, 1998.
[44] J. Aguilar, J. Chacal, and C. Bravo. A multiagents system for planning and management of the production factors. Computer Systems Science & Engineering, 24(2):85–102, 2009.
[45] N. Nieuwejaar, D. Kotz, A. Purakayastha, C. S. Ellis, and M. L. Best. File Access Characteristics of Parallel Scientific Workloads. IEEE Transactions on Parallel and Distributed Systems, 7(10):1075–1089, October 1996.
[46] R. V. van Nieuwpoort, J. Maassen, G. Wrzesinska, et al. Ibis: a flexible and efficient Java-based grid programming environment. Concurrency & Computation: Practice & Experience, 17(7-8):1079–1107, 2005.
[47] M. Kallahalla and P. J. Varman. PC-OPT: optimal offline prefetching and caching for parallel I/O systems. Computers, IEEE Transactions on, 51(11):1333–1344, Nov 2002.
[48] W.B. Ligon and R.B. Ross. An Overview of the Parallel Virtual File System. In
Proceedings of the Extreme Linux Workshop, June 1999.
[49] K.E. Seamons, Y. Chen, P. Jones, J. Jozwiak, and M. Winslett. Server-directed
collective I/O in Panda. In Proceedings of Supercomputing ’95, 1995.
[50] L. Peterson, S. Muir, T. Roscoe, and A. Klingaman. PlanetLab Architecture: An Overview. Technical Report PDN-06-031, PlanetLab Consortium, May 2006.
[51] J. del Rosario, R. Bordawekar, and A. Choudhary. Improved parallel I/O via a
two-phase run-time access strategy. In Proc. of IPPS Workshop on Input/Output in Parallel
Computer Systems, 1993.
[52] Y. Chen, S. Byna, X.-H. Sun, R. Thakur, and W. Gropp. Hiding I/O latency with pre-execution prefetching for parallel applications. In SC ’08, pages 1–10, 2008.
[53] H. Simitci and D. A. Reed. A Comparison of Logical and Physical Parallel I/O Patterns. International Journal of High Performance Computing Applications, special issue (I/O in Parallel Applications), 12(3):364–380, 1998.
[54] D. E. Singh, A. Miguel, F. García, and J. Carretero. Mobile agent systems integration into parallel environments. Scalable Computing: Practice and Experience (SCPE), 10(1), 2008.
[55] W. Qu, M. Kitsuregawa, H. Zhuge, H. Shen, and Y. H. Jin. A traffic-based routing algorithm by using mobile agents. Computer Systems Science & Engineering, 22(6):323–332, 2007.
[56] S. S. Vadhiyar and J. J. Dongarra. Self adaptivity in grid computing. Concurrency & Computation: Practice & Experience, 2005.
[57] B. J. Overeinder, N. J. E. Wijngaards, M. van Steen, et al. Multi-agent support for internet-scale grid management. In AISB’02 Symposium on AI and Grid Computing, pages 18–22, 2002.
[58] N. Woo, H. Y. Yeom, and T. Park. MPICH-GF: Transparent checkpointing and rollback-recovery for grid-enabled MPI processes. IEICE Transactions on Information and Systems, 87:1820–1828, 2004.
[59] P. Nowoczynski, N. Stone, J. Yanovich, and J. Sommerfield. Zest: Checkpoint storage system for large supercomputers. In 3rd Petascale Data Storage Workshop, Supercomputing, 2008.
[60] B. Unni, N. Parveen, A. Kumar, and B. S. Bindhumadhava. An intelligent energy optimization approach for MPI based applications in HPC systems. CSI Transactions on ICT, 1(2):175–181, 2013.
[61] R. Bordawekar. Implementation of Collective I/O in the Intel Paragon Parallel File System: Initial Experiences. In Proc. 11th International Conference on Supercomputing, July 1997.
[62] M. Hadzic and E. Chang. Onto-agent methodology for design of ontology-based multi-agent systems. Computer Systems Science & Engineering, 23(1):19–30, 2008.
[63] M. S. Pérez, J. Carretero, F. García-Carballeira, et al. MAPFS: A flexible multiagent parallel file system for clusters. Future Generation Comp. Syst., 22(5):620–632, 2006.
[64] A. Sánchez, M. S. Pérez, G. Pierre, et al. Improving GridFTP transfers by means of a multiagent parallel file system. Multiagent Grid Syst., 3(4):441–451, 2007.
[65] Mono: the open source development platform based on the .NET framework. http://www.mono-project.com.
[66] M. Fukuda and J. Miyauchi. An implementation of parallel file distribution in an agent hierarchy. J. Supercomput., 47(3):255–285, 2009.