Conference PaperPDF Available

SmartSockets: Solving the connectivity problems in grid computing

Authors:

Abstract

Tightly coupled parallel applications are increasingly run in Grid environments. Unfortunately, on many Grid sites the ability of machines to create or accept network connections is severely limited by firewalls, network address translation (NAT) or non-routed networks. Multi homing further com- plicates connection setup and machine identification. Al- though ad-hoc solutions exist for some of these problems, it is usually up to the application's user to discover the cause of the connectivity problems and find a solution. In this paper we describe SmartSockets,1 a communication library that lifts this burden by automatically discovering the con- nectivity problems and solving them with as little support from the user as possible.
SmartSockets: Solving the Connectivity Problems
in Grid Computing
Jason Maassen and Henri E. Bal
Dept. of Computer Science, Vrije Universiteit
Amsterdam, The Netherlands
jason@cs.vu.nl, bal@cs.vu.nl
ABSTRACT
Tightly coupled parallel applications are increasingly run in
Grid environments. Unfortunately, on many Grid sites the
ability of machines to create or accept network connections
is severely limited by firewalls, network address translation
(NAT) or non-routed networks. Multi homing further com-
plicates connection setup and machine identification. Al-
though ad-hoc solutions exist for some of these problems, it
is usually up to the application’s user to discover the cause
of the connectivity problems and find a solution. In this
paper we describe SmartSockets,1a communication library
that lifts this burden by automatically discovering the con-
nectivity problems and solving them with as little support
from the user as possible.
Categories and Subject Descriptors: C.2.4 [Distributed
Systems]: Distributed applications
General Terms: Algorithms, Design, Reliability
Keywords: Connectivity Problems, Grids, Networking, Par-
allel Applications
1. INTRODUCTION
Parallel applications are increasingly run in Grid environ-
ments. Unfortunately, on many Grid sites the ability of ma-
chines to create or accept network connections is severely
limited by network address translation (NAT) [14, 26] or
firewalls [15]. There are even sites that completely disallow
any direct communication between the compute nodes and
the rest of the world (e.g., the French Grid5000 system [3]).
In addition, multi homing (machines with multiple network
addresses) can further complicate connection setup.
For parallel applications that require direct communica-
tion between their components, these limitations have ham-
pered the transition from traditional multi processor or clus-
ter systems to Grids. When a combination of Grid sites is
used, serious connectivity problems are often encountered.
1SmartSockets is part of the Ibis project, and can be found
at http://www.cs.vu.nl/ibis
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
HPDC’07, June 25–29, 2007, Monterey, California, USA.
Copyright 2007 ACM 978-1-59593-673-8/07/0006 ...$5.00.
Unfortunately, it is often up to the user to discover the
cause of these connectivity problems, a non-trivial task at
best. Once the problems are identified, it may be possible
to circumvent some of them by using ad-hoc solutions, such
as opening a port range in a firewall, explicitly specifying
which address to use on a multi-homed machine, or using
SSH tunneling. Many problems, however, can only be solved
by adapting the application or the communication library it
uses. To make matters worse, as soon as the set of Grid sys-
tems being used changes, a large part of this process needs
to be repeated. As a result, running a parallel application
on multiple Grid sites can be a strenuous task [34].
In this paper we will describe a solution to this problem:
the SmartSockets communication library. The primary fo-
cus of SmartSockets is on ease of use. It automatically dis-
covers a wide range of connectivity problems and attempts
to solve them with little or no support from the user. Smart-
Sockets combines many known solutions, such as port for-
warding, TCP splicing and SSH tunneling, and introduces
several new ones that resolve problems with multi homing
and machine identification. In 30 connection setup experi-
ments, using 6 different sites worldwide, SmartSockets was
always able to establish a connection, while conventional
sockets only worked in 6 experiments. Using heuristics and
caching, SmartSockets is able to significantly improve the
connection setup performance.
SmartSockets offers a single integrated solution that hides
the complexity of connection setup in Grids behind a simple
interface that closely resembles sockets. We will show that
it is relatively straightforward to port an existing applica-
tion to SmartSockets, provided that certain programming
guidelines are followed.
SmartSockets is not specifically intended for use in par-
allel applications or Grids. It can also be applied to other
distributed applications, such as visualization, cooperative
environments, or even consumer applications such as instant
messaging, file sharing, or online gaming. However, many
of these applications only require a very limited degree of
connectivity. Often, clients simply connect to a server in a
well-known location, making it relatively easy to apply an
ad-hoc solution when a connectivity problem occurs.
Parallel applications, however, can be much more chal-
lenging. They often require a large number of connections
between the participating machines, and each machine must
be capable of both initiating outgoing and accepting incom-
ing connections. Running such applications in a Grid en-
vironment with limited connectivity is difficult. Therefore,
this paper will focus on this domain.
1
In Section 2 we describe the connectivity related problems
encountered while running applications on multiple Grid
sites. Section 3 describes how these problems are solved in
SmartSockets and briefly looks at the programming inter-
face. Section 4 evaluates the performance of SmartSockets,
Section 5 describes related work, and Section 6 concludes.
2. CONNECTIVITY PROBLEMS
In this section we will give a description of the network re-
lated problems that can occur when running a single parallel
or distributed application on multiple Grid sites.
2.1 Firewalls
As described in [15], ”A firewall is an agent which screens
network traffic in some way, blocking traffic it believes to be
inappropriate, dangerous, or both.”. Many sites use fire-
walls to protect their network from unauthorized access.
Firewalls usually allow outbound connections, but block in-
coming connections, often with the exception of a few well-
known ports (e.g., port 22 for SSH).
It is obvious that this connectivity restriction can cause
severe problems when running a parallel application on mul-
tiple sites. When only a single participating site uses fire-
wall, the connectivity problems can sometimes be solved by
ensuring that the connections setups are ’in the right direc-
tion’, i.e., that all required connections between open and
firewalled machines are initiated at the firewalled site. This
solution may require changes to the applications or commu-
nication libraries, however. Also, if both sites use a firewall,
this approach can no longer be used. In this case, a fire-
wall will always be encountered regardless of the connection
setup direction.
One way to solve the problems is to request an open port
range in the firewall. Connectivity can then be restored by
adapting the application to only use ports in this range.
Besides requiring reconfiguration of the firewall, open ports
are also seen as a threat to site security.
When both machines are behind a firewall it may still
be possible to establish a direct connection using a mecha-
nism called TCP splicing [6, 10, 13, 20]. Simply put, this
mechanism works by simultaneously performing a connec-
tion setup from both sides. Since this approach requires ex-
plicit cooperation between the machines, some alternative
communication channel must be available.
2.2 Network Address Translation
As described in [21], ”Network Address Translation is a
method by which IP addresses are mapped from one ad-
dress realm to another, providing transparent routing to end
hosts.”. NAT was introduced in [12] as a temporary solu-
tion to the problem of IPv4 address depletion. Although the
intended solution for this problem, IPv6, has been available
for some time, NAT is still widely used today.
Many different flavors of NAT exist, but the Network Ad-
dress Port Translation is most frequently used [21, 29]. This
type of NAT allows outbound connections from sites using
private addresses, but does not allow incoming connections.
Both the IP address (and related fields) and the transport
identifier (e.g., TCP and UDP port numbers) of packets are
translated, thereby preventing port number collisions when
a set of hosts share a single external address.
As mentioned above, NAT only allows outbound network
connections. Incoming connections are rejected, since the
connection request does not contain enough information to
find the destination machine (i.e., only the external IP ad-
dress is provided, but that may be shared by many ma-
chines). This restriction leads to connectivity problems that
are very similar to those caused by firewalls. Therefore, the
solution described in Section 2.1 (connecting ’in the right di-
rection’) also applies to a NAT setup, and fails in a similar
way when multiple NAT sites try to interconnect.
Although the TCP splicing mechanism can also be used
to connect two NAT sites, a more complex algorithm is re-
quired to compensate for the port translation performed by
NAT [6, 20].
Some NAT implementations have support for port for-
warding, where all incoming connections on a certain port
can be automatically forwarded to a certain host inside the
NAT site. Using mechanisms such as UPnP [5], DPF [28],
or MIDCOM [30], applications can contact the NAT im-
plementation and change the port forwarding rules on de-
mand. Port forwarding lifts many of the restrictions on in-
coming connections. Unfortunately, UPnP is mostly found
in consumer devices, MIDCOM is still under development,
and DPF only supports NAT (and firewall) implementations
based on NetFilter [1]. As a result, these mechanisms are
not (yet) generally usable in Grid applications. Currently,
SmartSockets only supports UPnP.
In addition to causing connection setup problems, NAT
also complicates machine identification. Machines in a NAT
site generally use IP addresses in the private range [26].
These addresses are only usable within a local network and
are not globally unique. Unfortunately, parallel applications
often use a machine’s IP address to create a unique identifier
for that machine. When multiple NAT sites participate in a
single parallel run, however, this approach can not be used,
since the machine addresses are no longer guaranteed to be
unique.
2.3 Non-routed networks
On some sites no direct communication between the com-
pute nodes and the outside world is possible due to a strict
separation between the internal and external networks. No
routing is performed between the two. Only a front-end
machine is accessible, and the connectivity of this machine
may be limited by a firewall or NAT. Two of the sites used
in Section 4 use such a setup.
It is clear that this is a major limitation when the site is
used in a parallel application. The only possibility for the
compute nodes to communicate with other sites is to use the
front-end machine as a bridge to the outside world, using,
for example, an SSH tunnel or a SOCKS [24] proxy. These
are non-trivial to set up, however.
2.4 Multi Homing
When multi-homed machines (i.e., machines with multi-
ple network addresses) participate in a parallel application,
another interesting problem occurs. When creating a con-
nection to such a machine, a choice must be made on which
of the possible target addresses to use. The outcome of this
choice may depend on the location of the machine that ini-
tiates the connection.
For example, the front-end machine of a site has two ad-
dresses, a public one, reachable over the internet, and a pri-
vate one used to communicate with the site’s compute nodes.
As a result, a different address must be used to reach this
2
machine depending on whether the connection originates in-
side or outside of the site.
In [34] we called this the Reverse Routing Problem.Nor-
mally, when a multi-homed machine is trying to connect to
a single IP address, a routing table on the machine decides
which network is used for the outgoing connection. In the
example described above the reverse problem is encountered.
Instead of having to decide how to ‘exit’ a multi-homed ma-
chine, we must decide on how to ‘enter’ it. This problem is
non-trivial, since the source machine generally does not have
enough information available to select the correct target ad-
dress. As a result, several connection attempts to different
addresses of the target may be necessary before a connection
can be established. In Section 3.2 we will describe heuristics
that can be used to speed up this process.
Multi homing can have a major effect on the implementa-
tion of parallel programming libraries. The example above
shows that it is not sufficient to use a single address to rep-
resent a multi-homed machine. Instead, all addresses must
be made available to the other participants of the parallel
application. In addition, some of the addresses may be in a
private range and refer to a different machine when used in
a different site. Therefore, it is also essential to check if a
connection was established to the correct machine.
3. SMARTSOCKETS
In this section we will give an overview of the design, im-
plementation and programming interface of the SmartSock-
ets library, and describe how it solves the problems described
in the previous section.
3.1 Overview
Currently, SmartSockets offers four different connection
setup mechanisms, Direct, Reverse, Splicing, and Routed.
They will be described in more detail below. Table 1 shows
an overview of how these mechanisms solve the connectivity
problems described in Section 2. As the table shows, each
problem is solved by at least one mechanism.
Table 1: Overview of connectivity problems and
their solutions. Connection Setup Mechanism
Problems Direct Reverse Splicing Routed
Identification X
Multi Homing X
Single FW/NAT (X) X X X
Dual FW/NAT (X) X X
No Routing X
The machine identification and multi-homing problems
are solved by the direct connection setup. As will be ex-
plained below, this approach also has limited firewall traver-
sal capabilities (using SSH tunneling), so in certain situa-
tions it may succeed in establishing a connection in a single
or even a dual firewall setting. In the table these entries are
shown between brackets.
A reverse connection setup is only capable of creating a
connection when a single firewall or NAT limits the con-
nectivity. Splicing is capable of handling both single and
dual firewall/NAT configurations. However, this approach is
significantly more complex than a reverse connection setup
(especially with dual NAT) and may not always succeed.
Therefore, reverse connection setup is preferred for single
firewall/NAT configurations.
Aroutedconnectionsetupcanbeusedinanysituation
where the connectivity is limited. Unlike the previous two
approaches it does not result in a direct connection. Instead
all network traffic is routed via external processes called
hubs (explained in Section 3.3), which may degrade both
latency and throughput of the connection. Therefore, the
previous mechanisms are preferred. When connecting to or
from a machine on a non-routed network, however, a routed
connection is the only choice.
The SmartSockets implementation is divided into two lay-
ers, a low-level Direct Connection Layer, responsible for all
actions that can be initiated on a single machine, and a
high-level Virtual Connection Layer that uses side-channel
communication to implement actions that require cooper-
ation of multiple machines. The direct connection layer is
implemented using the standard socket library. The virtual
connection layer is implemented using the direct connection
layer. Both layers will be explained in more detail below.
Currently, SmartSockets is implemented using Java [2].
3.2 Direct Connection Layer
The direct connection layer implements all actions that
do not require explicit cooperation between machines, such
as determining the local addresses or creating a direct con-
nection. It also supports a limited form of SSH tunneling.
3.2.1 Machine Identification
During initialization, the direct connection layer starts by
scanning all available network interfaces to determine which
IP addresses are available to the machine. It then generates
a unique machine identifier that contains these addresses,
and that can be used to contact the machine.
This identifier will automatically be unique if it contains
at least one public address. If all addresses are private,
however, additional work must be done. A machine that
only has private addresses is either in a NAT site or uses a
non-routed network. In the first case, a unique identifier can
still be generated for the machine by acquiring the external
address of the NAT. Provided that this address is public,
the combination of external and machine addresses should
also be unique, since other machines in the same NAT site
should have a different set of private addresses, and all other
NAT sites should have a different external address.
The SmartSockets library will use UPnP to discover the
external address of the NAT site. If this discovery fails, or
if the returned address is not public, a Universally Unique
Identifier (UUID) [23], will be generated and included in the
machine identifier, thereby making it unique.
3.2.2 Connection Setup
Once initialized, the direct connection layer can be used
to set up connections to other machines. The identifier of
the target machine may contain multiple network addresses,
some of which may not be reachable from the current loca-
tion. The private addresses in the identifier may even refer
to a completely different machine, so it is important that the
identity of the machine is checked during connection setup.
As a result, several connection attempts may be necessary
before the correct connection can be established.
When multiple target addresses are available, a choice
must be made in which order the connection attempts will
be performed. Although simply using the addresses in an
arbitrary order should always result in a connection (pro-
3
vided that a direct connection is possible), this may not be
the most efficient approach. Many Grid sites offer high-
performance networks such as Myrinet [7] or Infiniband [4]
in addition to a regular Ethernet network. Using such a
network for inter-site communication may significantly im-
prove the application’s performance. In general, these fast
networks are not routed and use addresses in the private
range, while the regular Ethernet networks (often) use pub-
lic addresses. Therefore, by sorting the target addresses and
trying all private ones first, the fast local networks will au-
tomatically be selected in sites with such a setup.
The drawback of this approach is that the private ad-
dresses of a target will always be tried first, even if the con-
nection originates on a different site. This may cause a sig-
nificant overhead. Therefore, SmartSockets uses a heuristic
that sorts the target addresses in relation to the addresses
that are available locally. For example, if only a public ad-
dress is available on the local machine, it is unlikely that it
will be able to create a direct connection to a private address
of a target. As a result, the connection order public before
private is used. This order is also used if both machines have
public and private addresses, but the private addresses re-
fer to a different network (e.g., 10.0.0.10 vs. 192.168.1.20 ).
The order private before public is only used if both machines
have private addresses in the same range. Section 4 will il-
lustrate the performance benefits of this heuristic.
Unfortunately, it is impossible to make a distinction be-
tween addresses of the same class. For example, if a target
has multiple private addresses, we can not automatically
determine which address is best. Therefore, if a certain
network is preferred, the user must specify this explicitly.
Without this explicit configuration, SmartSockets will still
create a direct connection (if possible), and the parallel ap-
plication will run, but its performance may be suboptimal.
When a connection has been established, an identity check
is performed to ensure that the correct machine has been
reached. This would be a simple comparison if the complete
identifier of the target is available, but unfortunately this
is not always the case. User provided addresses are often
used to bootstrap a parallel application. These addresses
are often limited to a single hostname or IP address, which
may only be part of the addresses available to the target ma-
chine. Therefore, the identity check used by SmartSockets
also allows the use of partial identifiers.
Whenever a connection is created, the target machine pro-
vides its complete identity to the machine initiating the con-
nection. This machine then checks if both the public and
private addresses in the partial identity are a subset of the
ones in the complete identity. If so, the partial identity is
accepted as a subset of the complete identity, and the con-
nection is established. Note that although the connection is
created to a machine that matches the address specified by
the user, it is not necessarily the correct machine from the
viewpoint of the parallel application. Unfortunately, in such
cases it is up to the user to provide an address that contains
enough information to reach the correct machine.
3.2.3 Open Port Ranges and Port Forwarding
When a firewall has an open port range available, Smart-
Sockets can ensure that all sockets used for incoming con-
nections are bound to a port in this range. There is no way
of discovering this range automatically, however, so it must
be specified explicitly by the user.
In addition, SmartSockets can use the UPnP protocol to
configure a NAT to do port forwarding, i.e., automatically
forward all incoming connections on a certain external port
to a specified internal address. However, as explained before,
this protocol is mainly used in consumer devices.
3.2.4 SSH Tunneling
In addition to regular network connections, the direct con-
nection layer also has limited support for SSH tunneling.
This feature is useful for connecting to machines behind a
firewall that allows SSH connections to pass through. It
does, however, require a suitable SSH setup (i.e., public key
authentication must be enabled).
Creating an SSH tunnel is similar to a regular connection
setup. The target addresses are sorted and tried consecu-
tively. Instead of using the port specified in the connection
setup, however, the default SSH port (i.e., 22) is used. When
a connection is established and the authentication is success-
ful, the receiving SSH daemon is instructed to forward all
traffic to the original destination port on the same machine.
If this succeeds, the regular identity check will be performed
to ensure that the right machine has been reached.
Although this approach is useful, it can only be used to
set up a tunnel to a different process on the target machine.
Using this approach to forward traffic to different machines
requires extra information. For example, setting up an SSH
tunnel to a compute node of a site through the site’s fron-
tend, can only be done if it is clear that the frontend must
be contacted in order to reach the target machine. Although
this approach is used in some pro jects [8], the necessary in-
formation cannot be obtained automatically and must be
provided by the user. Therefore, SmartSockets uses a differ-
ent approach which will be described in detail in Section 3.3.
3.2.5 Limitations
The direct connection layer offers several types of connec-
tion setup which have in common that they can be initiated
by a single machine. No explicit cooperation between ma-
chines is necessary to establish the connection. There are
many cases, however, where connectivity is too limited and
the direct connection layer cannot be used.
In general, direct connections to sites that use NAT or a
firewall are not possible. Although SSH tunneling and open
port ranges alleviate the firewall problems, they require a
suitable SSH setup or extra information from the user. Port
forwarding reduces the problems with NAT, but is rarely
supported in Grid systems. Therefore, these features are of
limited use. In the next section we will give a detailed de-
scription of the virtual connection layer, which solves these
problems.
3.3 Virtual Connection Layer
Like the direct connection layer, the virtual connection
layer implements several types of connection setup. It offers
a simple, socket-like API and has a modular design, making
it easy to extend. Besides a direct module that uses the
direct connection layer described above, it contains several
modules that offer more advanced types of connection setup.
These modules have in common that they require explicit
cooperation (and thus communication) between the source
and target machines in order to establish a connection. As a
result, side-channel communication is required to implement
these modules.
4
3.3.1 Side-Channel Communication
In SmartSockets, side-channel communication is imple-
mented by creating a network of interconnected processes
called hubs. These hubs are typically started on the fron-
tend machines of each participating site, so their number is
usually small.
When a hub is started, the location of one or more other
hubs must be provided. Each hub will attempt to setup a
connection to the others using the direct connection layer.
Although many of these connections may fail to be estab-
lished, this is not a problem as long as a spanning tree is
created that connects all hubs.
The hubs use a gossiping protocol to exchange information
about themselves and the hubs they know, with the hubs
that they are connected to. This way information about
each hub quickly spreads to all hubs in the network. When-
ever a hub receives information about a hub it has not seen
before, it will attempt to set up a connection to this hub.
This way, new connections will be discovered automatically.
All gossiped information contains a state number indicat-
ing the state of the originating machine when the informa-
tion was sent. Since information from a hub may reach an-
other hub through multiple paths, the state number allows
the receiver to decide which information is most recent.
By recording the length of the path traversed thus far in
the gossiped information, hubs can determine the distance
to the sites that they can not reach directly. Whenever a
hub receives a piece of information about another hub con-
taining a shorter distance than it has seen so far, it will
remember both the distance and the hub from which the in-
formation was obtained. This way, we automatically create
adistributed routing table with the shortest paths between
each pair of hubs. This table is later used to forward appli-
cation information (as will be described below).
When an application is started, the virtual layer on each
machine creates a single connection to the hub local to its
site. The location of this hub can either be explicitly speci-
fied or discovered automatically using UDP multicast.
3.3.2 Virtual Addresses
The connection to the hub can now be used as a side chan-
nel to forward requests to otherwise unreachable machines.
To ensure that the target machines can be found, virtual
addresses are used, consisting of the machine identifier (see
Section 3.2), a port number, and the identifier of the hub
the machine is connected to.
All requests for the target machine can then be sent to the
local hub, which forwards it in the direction of the target
hub using the information contained in its routing table.
The request will continue to be forwarded until the target
hub is reached and the request is delivered to the machine.
3.3.3 Modules
The current implementation of SmartSockets contains four
different connection modules, one for each of the connection
setup mechanisms described in Section 3.1.
Direct.
The direct connection module simply forwards all connec-
tion requests to the direct connection layer. It does not make
use of side-channel communication and has the features and
limitations described in Section 3.2.
Reverse.
A direct connection setup will generally fail if the tar-
get is behind a firewall or NAT. However, as explained in
Section 2, outgoing connections are usually allowed on such
sites. The reverse connection module exploits this property
by reversing the direction of the connection setup.
Instead of creating a connection, the reverse connection
module creates a new socket locally. It then sends a request
to the target machine using the side channel. This request
contains the target’s address and destination port, and the
address of the new socket. When the request is received
on the target machine, it will check if the destination port
exists. If it does, the target machine attempts to create a
direct connection back to the new socket. If successful, this
connection is returned as the result of original connection
setup call on the source. On the target, the new connection
will be queued, awaiting an accept from the application.
Splicing.
The reverse connection module requires the source ma-
chine to be publicly accessible. When both machines are
behind a firewall or NAT the reverse connection setup will
fail. However, it may still be possible to create a connection
using TCP splicing [10, 20]
When the machines have public addresses (i.e., they are
not behind a NAT), the actions performed by the splicing
module are relatively straightforward. First, the source ma-
chine sends a request to the target using the side channel.
This request contains the target address and destination
port, the complete identifier of the source, and a port the
source will use to create the outgoing connection. When the
request is received on the target machine, it will check if the
destination port exists. If it does, a reply is returned and
both machines repeatedly attempt to create a direct connec-
tion to the other (using only public addresses). Since this
mechanism is sensitive to timing and both machines may
have multiple public addresses, the number of attempts re-
quired may be large.
When one or both machines are behind a NAT (i.e., they
only have private addresses), a different approach is used.
As explained in Section 2.2, most NAT implementations
translate the address and port number of outgoing connec-
tions. TCP splicing requires both machines to know each
other’s exact external address and port number. Although
the external address of a NAT site is often constant, the port
mapping is hard to determine, since a different port may be
used for every connection attempt.
Fortunately, most NAT implementations use a predictable
port mapping scheme [19]. Therefore, once a single mapping
has been determined, a prediction can be made on the range
of port numbers that is likely to be used in the immediate
future. By using the external address and port range in the
connection setup attempts, TCP splicing can still be used.
To obtain an initial mapping, the assistance of an external
machine (outside of the NAT) is required. SmartSockets
uses the hubs for this purpose. If the source machine is
behind a NAT, it will request a list of available hubs from
its local hub, and attempt to find an external hub to which it
can connect directly. When such a connection is successful,
the external hub echoes the source address and port number
to the source machine. If this address is public, it is likely
to be the external address of the NAT site, and the address
and port are included in the request sent to the target. The
5
target will use the same approach if it is behind a NAT, and
return its external address and port number to the source.
Both machines will now repeatedly perform connection
attempts using the external address of the other site and
trying all port numbers in the predicted range. Currently,
SmartSockets uses a range of [port...port+5].
It is obvious that there are many cases where the splicing
module will not be able to set up a connection. For example,
a machine may be unable to find its external address, the
external address may be wrong (e.g., in case of multiple
consecutive NATs), the port range prediction may be wrong,
or the connection attempts may not succeed in time. As
shown in [19], the maximum success rate is approximately
86%. Fortunately, there is a backup solution, the routed
connections module explained below.
Routed.
The last module available is the routed connection mod-
ule. Provided that there is at least a spanning tree connect-
ing the hubs, this module should always be able to create
a connection between two machines, even if these machines
are only connected to non-routed networks.
Whenever a connection is created using the routed con-
nection module, the source sends a connection request to its
local hub using the side channel, containing the target ad-
dress and port. The hub will add this virtual connection in
its administration and forward the request to the next hub
using the routing table described in Section 3.3.1. When the
request reaches the target machine, it will make sure that
the destination port exists and queue the request.
Once the connection is accepted by the application, a re-
ply is sent back via the hubs, and the virtual connection
is established. Both machines now return a virtual socket,
which, instead of sending its data directly to the target ma-
chine, forwards all data through a series of hubs.
3.3.4 Module Order and Caching
To create a new connection, each module is tried until a
connection is established, or until it is clear that a connec-
tion can not be established at all (e.g., because the destina-
tion port does not exist on the target machine). By default,
the order Direct, Reverse, Splice, Routed is used. This order
prefers modules that produce a direct connection.
When a connection is established, the time required for
subsequent connection setups can be reduced by caching
which module was successful. This information is cached
based on the hub address of the target, and not its machine
address (see Section 3.3.2). Caching in this way allows an
entire site to be represented using a single cache entry (since
machines in a site typically share the same hub). Not only
does this save memory, but it also improves the effectiveness
of the cache. After a connection is created to a single ma-
chine of a site, all other connection setups to the same site
benefit from the cached information. In Section 4 we will
show the benefits of this approach.
3.4 Programming Interface
We will now give a short description of the programming
interface of the virtual connection layer of SmartSockets,
which is currently implemented using Java [2]. Converting
an application to SmartSockets is relatively straightforward,
provided that the application uses a Factory Pattern [16]
to create sockets. The javax.net package of the Java class
libraries contains interfaces for two such factories, one to
create Sockets (outgoing connections), the other to create
ServerSockets (incoming connections). Java also offers im-
plementations which create regular or secure sockets (SSL).
SmartSockets extends the Socket and ServerSocket imple-
mentations of Java, and offers two factories which adhere to
the interfaces described above. This allows SmartSockets to
be plugged in to existing applications by simply changing
the factory implementations that are used.
Unfortunately, the addressing scheme used in connection
setup cannot be replaced this easily. Currently, Java used
three forms of addressing. The first is based on an InetAd-
dress, which is hard coded to be either an IPv4 or IPv6
address and cannot be extended. When this scheme is used,
SmartSockets has no other choice then to attempt a direct
connection to the given address. None of the other connec-
tion setup schemes can be used due to a lack of information.
The second addressing scheme is more flexible. It is based
on a SocketAddress interface, which is implemented by a
VirtualSocketAddress in SmartSockets. Unfortunately, be-
cause Java does not not offer a factory to create these Sock-
etAddresses, most applications explicitly use InetSocketAd-
dresses, which consists of a InetAddress and a port num-
ber. To make full use of SmartSockets, it is necessary to
replace these with a VirtualSocketAddress. For this pur-
pose, SmartSockets offers a SocketAddressFactory,thatcan
be used to create both VirtualSocketAddress and InetSocke-
tAddress objects. As with the first scheme, SmartSockets is
also backward compatible with InetSocketAddress, although
this may restrict the connectivity.
The third scheme simply uses a String as a machine ad-
dress. Although this string is originally intended to contain
a host name, it can just as easily be used to carry a string
representation of a virtual address. Therefore this mecha-
nism can be used by SmartSockets without any code modifi-
cation. As with the previous schemes, the connectivity may
be restricted when the information in the string is limited.
class Example {
SocketFactory createFactory(String type) {
if (type.equals("plain"))
return SocketFactory.getDefault();
if (type.equals("SSL"))
return SSLSocketFactory.getDefault();
if (type.equals("SmartSockets"))
return SmartSocketFactory.getDefault();
// else print error
}
void run(String type, String address) {
SocketFactory f = createFactory(type);
SocketAddressFactory a = new SocketAddressFactory();
Socket s = f.createSocket().
s.connect(a.createSocketAddress(address));
// we can now use the socket.
}
}
Figure 1: Example Application
In Figure 1, an example is shown that is can make use of
regular sockets, SSL, or SmartSockets. By varying the type
parameter of run, a different socket factory can be selected.
Using the SocketAddresFactory provided by SmartSockets
(explained above), the target machine address can be trans-
lated to a SocketAddress in a portable manner, and used for
connection setup.
6
Table 2: The testbed
Machine Restrictions (frontend) Restrictions (nodes)
DAS3-V MH MH
DAS3-D MH MH
Grid5000 MH, FW NR
Rockstar MH, FW MH, FW
Hiroshi MH, FW, NAT NR
Desktop NAT n/a
MH = multi-homing, FW = firewall, NR = non-routed
4. EVALUATION
In this section we will evaluate the performance of Smart-
Sockets. We use a testbed consisting of 5 different clusters
and a desktop machine. The machines have varying connec-
tivity restrictions, shown in Table 2.
The DAS3 system consist of 5 different clusters in the
Netherlands. We will use two, DAS3-V, the cluster at the
Vrije Universiteit Amsterdam, and DAS3-D, the cluster lo-
cated at the Delft University of Technology. The Grid5000
system consist of several clusters distributed over 9 sites in
France. We use the cluster located at the University of Nice-
Sophia Antipolis. The Rockstar cluster2is located at the
San Diego Supercomputing Center, University of California,
USA, and the Hiroshi machine is located at the School of
Information Technologies, University of Sydney, Australia.
Finally, Desktop is a single machine located in Haarlem,
The Netherlands. All sites use AMD or Intel processors of
2.0 Ghz or faster.
Figure 2: Hub connections on testbed.
Before running the experiments, a hub is started on the
frontend machine of each of the clusters and on the desktop
machine. Each hub is provided with locations of all other
hubs. As explained in Section 3.3.1, each hub attempts to
set up a direct connection to all others. The resulting setup
is shown in Figure 2. Double circles indicate a site with a
firewall. Sites using NAT are explicitly marked. The ar-
rows between the hubs indicate a connection and show the
direction in which the connection was established. Each ar-
row is annotated with the round-trip time between the sites
(measured with ping). The Hiroshi machine could only be
reached using an SSH-tunnel. This tunnel was automati-
cally setup by SmartSockets, but only after a suitable SSH
configuration was created on the DAS3-V site.
4.1 Performance
We will start by evaluating the connection setup perfor-
mance, comparing it to the basic socket performance when
possible. The results are shown in Table 3. This table shows
the time required to set up a connection between compute
nodes of each combination of sites. Connections to and from
the desktop machine are also included.
2We would like to acknowledge Frank Seinstra for his assis-
tance in running experiments on the Rockstar and Hiroshi
machines.
The entries are annotated to indicate which connection
style is selected by SmartSockets. This selection is per-
formed automatically and the result is cached. As a result,
the correct connection style is immediately selected for all
but the first connection.
As the table shows, a conventional connection could only
be created in 6 out of the 30 combinations. As expected,
SmartSockets selects a direct connection setup in these cases.
The time required by SmartSockets to establish a direct con-
nection is roughly twice that of conventional sockets. This is
caused by the identity check (see Section 3.2) which requires
an extra round trip.
In four cases, SmartSockets selected a reverse connection
setup. These correspond to the combinations of machines
where the source machine is open, but the target is behind
a firewall or NAT. Reverse connection setup adds an addi-
tional round trip latency to the time required by the direct
connection setup (in the opposite direction). This additional
time is needed to forward a reverse connection request mes-
sage from the source to the target (via the hubs) and to send
an accept message (via the new connection) once the appli-
cation on the target has accepted the incoming connection.
In the other 20 cases, SmartSockets decided to set up a
virtual connection by routing all data via the hubs. In these
cases the connection setup time is dominated by the round
trip time to the machine, as can be seen by comparing the
numbers in Table 3 to those in Table 4. Although it is
possible to use splicing between the Desktop and Rockstar
machines, this method fails occasionally due to the timing
sensitivity of this approach. When this occurs, a virtual
connection is selected instead. Since SmartSockets uses a
cache to remember previous selections, splicing will not be
used afterward.
0
1
2
3
4
Time (seconds)
routed
splice
Reverse
UC C
Splice
UC C
Routed
UC C
reverse
direct
succesfull setup
wrong machine
non-existant machine
0
5
10
15
20
Time (milliseconds)
Direct
No heuristic
NR R
Heuristic Manual
Figure 3: Breakdown of connection setup time.
The first graph of Figure 3 shows the impact of the se-
lected module cache on connection setup time. Only three
connection setup mechanism are shown. The connection
setup mechanisms are initially tried in the order Direct, Re-
verse, Splice, Routed.SincetheDirect approach is always
tried first, no caching is needed when it is successful. For
each mechanism, Figure 3 shows two bars. The first shows
the connection setup time when no information is available,
thesecondshowsthetimewhenthecorrectmechanismcan
be retrieved from the cache. A connection timeout of 5 sec-
onds was specified for both.
The reverse connection setup is performed from DAS3-V
to Rockstar. First, about 2.5 seconds is spent in a direct
7
Table 3: Connection setup time of SmartSockets (time in milliseconds).
Source
Target DAS3-V DAS3-D Rockstar Grid5000 Hiroshi Desktop
DAS3-V 4.9d(2.4) 332d(166) 68v595v33d(17)
DAS3-D 4.9d(2.4) 335d(167) 70v595v33d(18)
Rockstar 500r503r206v718v182v
Grid5000 35v38v206v593v54v
Hiroshi 630v603v750v670v640v
Desktop 49r52r183v84v606v
Annotations indicate connection style: dfor direct, rfor reverse, sfor splicing, and vfor routed.
When applicable, the connection setup time of regular sockets is shown between brackets.
connection attempt, which only fails after the timeout has
fully expired. This is common behavior when connecting
to a machine behind a firewall which blocks the incoming
connection but sends no reply. As a result, the source ma-
chine has to wait for the timeout. Next, about 0.5 seconds
are needed to set up a reverse connection. All subsequent
connections are created in 0.5 seconds.
For the spliced connection setup we perform a separate ex-
periment connecting the Desktop machine to the Grid5000
frontend. As Figure 3 shows, the direct and reverse con-
nection setup fail after using most of their 1.25 second time
slots. A spliced connection is then created in 0.45 seconds.
The routed connection setup is performed from a DAS3-
V to a node of Grid5000. The first three connection setup
mechanisms fail, each using 1.25 seconds. A virtual connec-
tion is then established between the machines in 37 millisec-
onds. All subsequent connections are created in 37 millisec-
onds, reducing the connection time by a factor of 102.
The second graph of Figure 3 shows the impact of the tar-
get address heuristic on the direct connection setup time be-
tween the DAS3-D and DAS3-V site. DAS3-V has one pub-
lic and two private addresses, DAS3-D has one of both. The
range of one of the private addresses overlaps on both sites.
The figure shows four experiments, one with the heuristic
turned off, two with the heuristic turned on, and one where
the correct address was manually selected.
The first experiment shows that if no heuristic is used, the
connection setup requires 17 milliseconds, of which 12 are
spent attempting a connection setup to a private address
which does not exist on DAS3-D. Next, about 0.2 millisec-
onds is required to discover that no connection is possible
to the second private address (since no one is listening). Fi-
nally, the correct connection is set up in 4.5 milliseconds.
The second experiment does use the heuristic. In this
experiment there is no process listening on the DAS3-D ma-
chine that shares the private address with the target DAS3-
V machine. Therefore, the first connection attempt fails im-
mediately. The correct connection is then established in 5
milliseconds. The third experiment uses a similar setup, but
now there is a process listening on the DAS3-D machine that
shares the private address. As a result, a handshake is re-
quired to discover that a connection has been established to
the wrong machine. This handshake requires approximately
twice the time needed for the failed connection attempt of
the previous experiment. In the last experiment the correct
address is manually selected. As expected, the connection
is set up in 5 milliseconds.
Table 4 shows the round-trip time of the connections. In
the six cases where regular sockets can also establish a con-
nection, the round-trip time of SmartSockets and the regu-
lar socket connection is the same. For the other cases, the
round-trip time is approximately the same as the network
round-trip time as measured by ping. The only exception is
the Hiroshi cluster. The frontend of this machine can only
be reached using SSH-tunneling. Unfortunately, the over-
head of this approach is high, roughly doubling the required
round trip time.
Table 5 shows the achievable throughput. As with latency,
the throughput of SmartSockets is similar to that of regular
sockets (where applicable). The performance of the Hiroshi
cluster is limited both by the distance to the other machines
and by the encryption performed by the SSH-tunnel used
to reach it. The performance of the Desktop machine is
limited by its ADSL connection (approximately 3.5 Mbit/s
downstream and 800 KBit/s upstream).
5. RELATED WORK
The system described in [10] can be seen as a predeces-
sor to SmartSockets. It was developed in cooperation with
our group. The focus of this work was mainly on using
splicing to traverse firewalls. Only a limited form of mes-
sage routing was available (no further than two hops) and
the system did not support multi homing, SSH tunneling, or
reverse connection setup. To allow the side-channel commu-
nication through firewalls, the system needed gateway nodes
with access to the networks inside and outside of the firewall
(e.g. by using an open port range). Machines could then
connect to such a gateway whenever side-channel commu-
nication was necessary with a machine behind the firewall.
Instead, in SmartSockets the hubs use outgoing connections
from behind the firewall. This is easier to set up and it does
not require any open ports.
Generic Connection Brokering (GCB) [28] can serve as a
replacement for traditional sockets, provided that the ap-
plication follows certain programming guidelines. When a
GCB client creates a socket for listening, this socket is reg-
istered at a GCB server. This server must be located in a
publicly accessible network. Similar to NAT, the server then
creates a socket with a public address that acts as a proxy
for the private or firewalled client socket. This public ad-
dress is then returned to the client to be used as the ‘official’
address of the client socket. When any non-GCB client tries
to connect to this address, it will reach the proxy instead.
The server will then forward this incoming connection to the
client and relay any subsequent data. Connections created
by GCB-aware clients will be forwarded to the server instead
of the proxy (by replacing the port number in the client ad-
dress with a well-known server port). This allows the server
to mediate in the connection setup between the two clients
and, depending on their connectivity restrictions, instruct
them to use a direct connection, reverse the connection or-
der, or use the server itself as a relay.
8
Table 4: Roundtrip latency of SmartSockets (time in milliseconds).
Source
Target DAS3-V DAS3-D Rockstar Grid5000 Hiroshi Desktop
DAS3-V 2.3 (2.3) 166 (166) 56 528 14 (14)
DAS3-D 2.3 (2.3) 167 (167) 57 533 15 (15)
Rockstar 166 167 205 590 195
Grid5000 56 57 205 524 50
Hiroshi 528 529 590 522 539
Desktop 14 15 190 43 522
When applicable, the roundtrip latency of regular sockets is shown between brackets.
Table 5: Throughput of SmartSockets (in Mbit/second).
Source
Target DAS3-V DAS3-D Rockstar Grid5000 Hiroshi Desktop
DAS3-V 182 (183) 2.6 (2.5) 2.5 0.25 0.65 (0.65)
DAS3-D 185 (186) 2.6 (2.5) 2.6 0.26 0.65 (0.65)
Rockstar 2.8 2.7 6.9 0.23 0.65
Grid5000 7.6 8.2 2.4 0.20 0.65
Hiroshi 0.73 0.73 0.70 0.73 0.61
Desktop 3.3 3.3 2.2 2.2 0.25
When applicable, the throughput of regular sockets is shown between brackets.
Although GCB is similar to SmartSockets , there are some
significant differences. Because GCB clients directly connect
to the (remote) GCB server representing the target client,
this server must be on a publicly accessible network. Also,
outgoing connectivity is required on all client nodes. GCB
only supports two hop message routing, and does not have
support for multi homing on the client nodes. In Smart-
Sockets the hubs are not required to be publicly accessi-
ble, but instead, the hubs must be capable of setting up a
spanning tree. It is generally not a problem when a subset
of the hubs can only use outgoing connections or can only
be reached through SSH tunneling. Outgoing connectivity
is not required for the clients, since they can route their
connections over the hubs, using multiple hops if necessary.
Unlike SmartSockets , GCB does have support for incoming
legacy TCP connections.
Many projects attempt to solve the connectivity problems
by using peer-to-peer overlay networks. In WOW [18] vir-
tual machines are combined with peer-to-peer techniques to
create a virtual cluster. By running VMware [31] on all
machines a uniform system image can be provided to the
applications. All traffic to the (virtual) network device is
intercepted and routed to the target using the IPOP [17]
overlay network. To the application the system appears as
a single cluster using a local area network with private ad-
dresses. The system also supports transparent migration of
virtual machines. The advantage of this approach is that no
changes are required to the application. It is a heavy weight
solution, however. Instead of just deploying the application
to the Grid sites, VMware must also be deployed, including
a copy of the required operating system. In addition, all net-
work traffic is routed using the overlay network, even when
two machines are located in the same site. The experiments
in [18] shown that this limits the network bandwidth in a
single site to 12.5 MBit/s. Even when IPOP [17] is used di-
rectly by the application, the network bandwidth is reduced
to 61% on a local-area network, and 51% on a wide-area
network. VNET [32] is similar to WOW, but uses tunneling
instead of peer-to-peer techniques to forward network traffic.
Like WOW, VNET shows significant performance degrada-
tion in both local and wide-area experiments. VIOLIN [22]
and ViNe [33] propose similar solutions. In [25], this per-
formance degradation is solved by only using peer-to-peer
techniques for resource discovery and allocation. The ap-
plications use regular sockets instead, thereby significantly
improving the performance, but also reintroducing the con-
nectivity problems described in this paper.
SmartSockets uses a combination of the approaches de-
scribed above. It prefers to create direct connections and
only uses routing or tunneling as a last resort. This results
in a performance on par with regular sockets when possible,
but also offers improved connectivity when it is needed.
Several mechanisms exist that allow two machines using
NAT to set up a communication channel. STUN [27] only
allows the exchange of UDP messages. STUNT [19, 20]
and NATBlaster [6] support TCP, but require access to raw
sockets, for which special user privileges are needed. All
three use external servers to provide information on address
and port translation. Both STUN and NATBlaster use port
range prediction. The system described in [13] does not re-
quire raw sockets, but does not use port range prediction
during connection setup. As shown in [19], the connection
setup success rate of this approach increased from 45% to
84% when port range prediction was added. Since Smart-
Sockets uses the same mechanism as [13], but includes port
range prediction, we also expect the connection setup suc-
cess rate to be around 84%.
Although SmartSockets was initially designed to increase
the connectivity, it is also used to do the exact opposite. By
extending the handshake performed in the direct connection
layer with a check that selectively refuses connections based
on the address of the source machine, a simple firewall can
be simulated. This allows a complex network with limited
connectivity to be simulated on single cluster. In [11], this
mechanism is used to evaluate the effectiveness of peer-to-
peer gossiping techniques when machines have limited con-
nectivity. Based on these experiments, the authors propose
a new gossiping algorithm, Actualized Robust Random Gos-
siping (ARRG), that outperforms existing algorithms in sit-
uations where the network connectivity is restricted.
In MOB [9], the multi homing support of SmartSockets is
used to run cluster to cluster multicast experiments. Smart-
Sockets automatically selects the fast local network for intra-
cluster traffic, while only using the regular ethernet between
9
clusters. This project also uses the firewall simulation de-
scribed above to divide a single cluster into several smaller
ones. By forcing all communication to be routed via a small
number of machines, shared links between the clusters can
be simulated. By artificially reducing the bandwidth or in-
creasing the latency on those shared links, the robustness of
the multicast algorithms can be tested.
6. CONCLUSIONS
In this paper we have introduced an integrated framework,
called SmartSockets, which is capable of solving the connec-
tivity restrictions found on many Grid sites, with very little
help from the user. We have shown that by using caching,
the connection setup time can be reduced significantly. In 30
connection setup experiments, using 6 different sites world-
wide, our framework was always capable of creating a con-
nection, requiring a maximum time of 750 milliseconds once
the necessary information was cached. A conventional con-
nection could only be created in 6 out of the 30 combina-
tions. By preferring direct connections, the bandwidth and
latency offered by SmartSockets is similar to that of a con-
ventional connection (in the situations where a conventional
connection can be created).
By using a heuristic that prefers suitable private addresses
of a target machine during connection setup, a fast local net-
work is often selected for intra-site communication, thereby
potentially improving the application performance. As we
have shown, it is essential that an identity check is performed
to prevent a connection to a wrong machine.
So far we have only shown the results of low-level per-
formance benchmarks. The following step in our work will
be to further evaluate the performance and scalability of
SmartSockets using a wide range of parallel and distributed
applications. We are also planning to improve the through-
put on long distance connections.
7. REFERENCES
[1] Netfilter. http://www.netfilter.org.
[2] SUN Java 5.0. http://java.sun.com.
[3] The Grid5000 system. http://www.grid5000.fr.
[4] The InfiniBand Trade Alliance architecture.
http://www.infinibandta.org.
[5] Universal Plug and Play (UPnP). http://www.upnp.org.
[6] A. Biggadike, D. Ferullo, G. Wilson, and A. Perrig.
NATBlaster: Establishing TCP connections between hosts
behind NATs. In In Proc. of ACM SIGCOMM Asia
Workshop, April 2005.
[7] N.Boden,D.Cohen,R.Felderman,A.K.andC.L.Seitz,
J. Seizovic, and W. Su. Myrinet: A Gigabit-per-second Local
Area Network. IEEE Micro, 15(1):29–36, Jan. 1995.
[8] D.Caromel,C.Delbe,A.diCostanzo,andM.Leyton.
ProActive: an Integrated Platform for Programming and
Running Applications on Grids and P2P systems.
Computational Methods in Science and Technology, 12, 2006.
[9] M. den Burger and T. Kielmann. ”MOB: Zero-configuration
High-throughput Multicasting for Grid Applications”. In Proc.
of the 16th International Symposium on High-Performance
Distributed Computing (HPDC-16), Monterey, California,
USA, June 2007. Accepted for publication.
[10] A. Denis, O. Aumage, R. F. H. Hofman, K. Verstoep,
T. Kielmann, and H. E. Bal. Wide-area Communication for
Grids: An Integrated Solution to Connectivity, Performance
and Security Problems. In Proc. of the 13th International
Symposium on High-Performance Distributed Computing
(HPDC-13), pages 97–106, Honolulu, Hawaii, USA, June 2004.
[11] N. Drost, E. Ogston, R. V. van Nieuwpoort, and H. E. Bal.
”ARRG: Real-world Gossiping”. In Proc. of the 16th
International Symposium on High-Performance Distributed
Computing (HPDC-16), Monterey, California, USA, June
2007. Accepted for publication.
[12] K. Egevang and P. Francis. The IP Network Address
Translator (NAT). RFC 1631, May 1994. Obsoleted by RFC
3022.
[13] B. Ford, D. Kegel, and P. Srisuresh. Peer-to-peer
Communication Across Network Address Translators. In
Proceedings o f th e 2005 USENIX Technical Conference, 2005.
[14] P. Francis. Is The Internet Going Nutss? IEEE Internet
Computing, 7(6):94–96, 2003.
[15] N. Freed. Behavior of and Requirements for Internet Firewalls.
RFC 2979, Oct. 2000.
[16] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design
Patterns: Elements of Reusable Object-Oriented Software.
Addison Wesley, Reading (MA), USA, 1995.
[17] A. Ganguly, A. Agrawal, P. O. Boykin, and R. Figueiredo. IP
over P2P: Enabling Self-configuring Virtual IP Networks for
Grid Computing. In Proc. of 20th International Parallel and
Distributed Processing Symposium (IPDPS-2006), April 2006.
[18] A. Ganguly, A. Agrawal, P. O. Boykin, and R. Figueiredo.
WOW: Self-organizing Wide Area Overlay Networks of Virtual
Wor k sta t ion s . I n Proc. of the 15th International Symposium
on High-Performance Distributed Computing (HPDC-15),
Paris, France, June 19-23 2006.
[19] S. Guha and P. Francis. Characterization and Measurement of
TCP traversal Through NATs and Firewalls. In In Proc. of
Internet Measurement Conference (IMC), 2005.
[20] S. Guha, Y. Takeda, and P. Francis. Nutss: a SIP-based
Approach to UDP and TCP Network Connectivity. In FDNA
’04 : Proceedings of th e AC M SI GC OMM worksho p on
Futu re directions in networ k architecture, pages 43–48, New
York, NY, USA, 2004. ACM Press.
[21] T. Hain. Architectural Implications of NAT. RFC 2993, Nov.
2000.
[22] X. JIANG and D. XU. VIOLIN: Virtual Internetworking on
Overlay Infrastructure. In Proc. of the 2th International
Symposium on Parallel and Distributed Processing and
Applications., December 2004.
[23] P. Leach, M. Mealling, and R. Salz. A Universally Unique
IDentifier (UUID) URN Namespace. RFC 4122, July 2005.
[24] M. Leech, M. Ganis, Y. Lee, R. Kuris, D. Koblas, and
L. Jones. SOCKS Protocol Version 5. RFC 1928, Mar. 1996.
[25] Z. Pan, X. Ren, R. Eigenmann, and D. Xu. Executing MPI
Programs on Virtual Machines in an Internet Sharing System.
In Proc. of 20th International Parallel and Distributed
Processing Symposium (IPDPS-2006), April 2006.
[26] Y. Rekhter, B. Moskowitz, D. Karrenberg, G. J. de Groot, and
E. Lear. Address Allocation for Private Internets. RFC 1918,
Feb. 1996.
[27] J. Rosenberg, J. Weinberger, C. Huitema, and R. Mahy. STUN
- Simple Traversal of User Datagram Protocol (UDP) Through
Network Address Translators (NATs). RFC 3489, Mar. 2003.
[28] S. Son and M. Livny. Recovering Internet Symmetry in
Distributed Computing. In In Proceeding s of t he 3rd
International Symposium on Cluster Computing and the
Grid, May 2003.
[29] P. Srisuresh and K. Egevang. Traditional IP Network Address
Translator (Traditional NAT). RFC 3022, Jan. 2001.
[30] P. Srisuresh, J. Kuthan, J. Rosenberg, A. Molitor, and
A. Rayhan. Middlebox Communication Architecture and
Framework. RFC 3303, Aug. 2002.
[31] J. Sugerman, G. Venkitachalam, and B. Lim. Virtualizing I/O
Devices on VMware Workstation’s Hosted Virtual Machine
Monitor. In Proc. of the USENIX Annual Technical
Conference, June 2001.
[32] A. SUNDARARAJ and P. DINDA. Towards Virtual Networks
for Virtual Machine Grid Computing. In Proc.ofthe3rd
USENIX Virtual Machine Research And Technology
Symposium (VM 2004)., 2004.
[33] M. Tsugawa and J. A. Fortes. A Virtual Network (ViNe)
Architecture for Grid Computing. In Proc. of 20th
International Parallel and Distributed Processing Symposium
(IPDPS-2006), April 2006.
[34] R. V. van Nieuwpoort, J. Maassen, A. Agapi, A.M. Oprescu,
and T. Kielmann. Experiences Deploying Parallel Applications
on a Large-scale Grid. In Proc. of EXPGRID - Experimental
Grid Testbeds for the Assessment of Large-scale Distributed
Applications and Tools. Workshop in conjunction with
(HPDC-15), Paris, France, June 2006.
10
... While overlay networks have been used through interfaces that include virtual network devices at layer-2 or layer-3 [35,36], IP forwarding proxies [28], ''convergence'' layer below the transport stack [37], or ''smart sockets'' [38] at the application layer, this paper presents a novel approach where overlay links are initiated and terminated at switch ports of a distributed fabric of SDN switches organized as a P2P overlay and where endpoints may have private, non-routable Internet addresses. ...
Article
The advent of virtualization and cloud computing has fundamentally changed how distributed applications and services are deployed and managed. With the proliferation of IoT and mobile devices, virtualized systems akin to those offered by cloud providers are increasingly needed geographically near the network’s edge to perform processing tasks in proximity to the data sources and sinks. Latency-sensitive, bandwidth-intensive applications can be decomposed into workflows that leverage resources at the edge – a model referred to as fog computing. Not only is performance important, but a trustworthy network is fundamental to guaranteeing privacy and integrity at the network layer. This paper describes Bounded Flood, a novel technique that enables virtual private Ethernet networks that span edge and cloud resources – including those constrained by NAT and firewall middleboxes. Bounded Flood builds upon a scalable structured peer-to-peer overlay, and is novel in how it integrates overlay tunnels with SDN software switches to create a virtual network with dynamic membership – supporting unmodified Ethernet/IP stacks to facilitate the deployment of edge applications. Bounded Flood has been implemented as the core of the EdgeVPN open-source virtual private network software system for edge computing. Experiments with the software demonstrate its functionality and scalability – one of which includes Kubernetes with Flannel across Raspberry Pi 4 edge devices behind different NATs.
... This is far from trivial, as clusters are often protected by a firewall that disallows any incoming communication into the cluster. Also, it is common for the compute nodes to be configured such that they can only communicate with the cluster frontend, but not directly with the outside world, as explained in Maassen and Bal (2007). To solve this problem, we created wrapper code that is capable of intercepting the MPI calls in POP. ...
Article
Full-text available
The Parallel Ocean Program (POP) is used in many strongly eddying ocean circulation simulations. Ideally it would be desirable to be able to do thousand-year-long simulations, but the current performance of POP prohibits these types of simulations. In this work, using a new distributed computing approach, two methods to improve the performance of POP are presented. The first is a block-partitioning scheme for the optimization of the load balancing of POP such that it can be run efficiently in a multi-platform setting. The second is the implementation of part of the POP model code on graphics processing units (GPUs). We show that the combination of both innovations also leads to a substantial performance increase when running POP simultaneously over multiple computational platforms.
... All services running on a server are represented by certain ports in handling the request. These open ports on servers welcome attackers to scan the port to check services for vulnerable information, especially for malfeasant motives [2][3][4][5][6] on zero days exploits. ...
Article
Diverse security measures are used to improve security entropy including the introduction of secure port services, better tunneling protocols and complex encryptions cryptography. Most of these do not address the fundamental of the security risk which is to avoid newly discovered exploits and protect credential from man-in-the-middle attack. In this study, experiments involving three types of existing environment, which include insecure connection as a basis, working against pre-shared key and public-key infrastructure (PKI), are being modeled. A new framework named SeDIC has been introduced to overcome the limitations and address the current security weaknesses. In this new implementation, forward secrecy is maintained since the key for authentication is only valid once and this will deny replay attack. This study proves that secure internet application is possible and the user can have the freedom to use the lowest cryptographic entropy to perform their on-line transactions. Having complex mathematical algorithms such as Elliptic Curve Cryptography (ECC) for tunneling, or even multilayer authentication system alone will not address the potential risk, besides prolonging the time for intruder in gaining unauthorized access.
... The problem of limited connectivity has also been addressed in works [34,75,76,77] which rely on an explicit structure to route messages on top of a gossip protocol. These solutions use proactive mechanisms to ensure that communication between natted nodes is possible under the implicit assumption that the network is fairly static. ...
Article
Peer-to-peer (P2P) systems are very popular today. Their usage goes from instant messaging to file sharing, from distributed backup and storage to even live-video streaming. Among P2P protocols, gossip-based protocols are a family of protocols which have been the object of several research works in the last decade. The reasons behind the interest in gossip-based protocols are that they are considered robust, easy to implement, and that they have interesting scalability properties. They are then appealing candidates for implementing dynamic and large-scale distributed systems. This thesis tackles two problems faced by gossip-based protocols when deployed on a practical scenario as the Internet. The first problem is coping with Network Address Translators (NATs) in the context of gossip-based peer sampling protocols. These protocols make the assumption that, at any moment, each node is able to communicate with any other node of the network. This assumption is false when some nodes use NATs. We present Nylon, a peer sampling protocol which works despite the presence of NATs. Nylon introduces a low overhead to cope with NATs and fairly balances this overhead among nodes using a NAT and those which do not. The second problem that we study is the possibility to limit the dissemination of “spam” messages in gossip-based dissemination protocols. These protocols are in fact ideal vectors to spread spam messages due to the fact that there is no central authority in charge of filtering messages based on their content. We propose FireSpam, a gossip-based dissemination protocol which allows limiting the dissemination of “spam” messages. FireSpam implements a decentralized filtering mechanism (each node participates to the filtering). Moreover, it works despite the presence of a fraction of malicious nodes (also called “Byzantine” nodes) and despite the presence of so called “rational” nodes (also called “selfish” nodes). These latters are willing to deviate from the protocol if they have an interest in doing so.
... In order to distribute multiscale applications that among many clusters, MUSCLE had to be adapted for firewalls and local IP-range environments. Firstly, a solution based on the port-range technique [96] was implemented, a mechanism which limits the ports numbers MUSCLE uses to some predefined range. ...
Thesis
Full-text available
Multiscale models combine knowledge, data, and hypotheses from different scales. Simulating a multiscale model often requires extensive computation. This thesis evaluates distributing these computations, an approach termed distributed multiscale computing (DMC). First, the process of multiscale modelling is examined, in order to describe it in a general and effective way. Then, multiscale models are described with a scale-aware component-based approach, treating them as a set of coupled single scale models. The computational architecture of multiscale applications is then specified with the multiscale modelling language. Such a specification can be analysed for its structural and computational characteristics, using a task graph, and it can be used as the basis for an implementation with Multiscale Coupling Library and Environment 2 (MUSCLE 2). MUSCLE 2 executes multiscale applications on local and distributed machines with a low overhead. As a use case, a model of in-stent restenosis (ISR3D) is described as a set of coupled single scale models, specified with the multiscale modelling language, and implemented and executed with MUSCLE 2. When doing distributed computing, this lead to a decrease in resource consumption under specific circumstances. Five other applications from several domains evaluated DMC, and derived different benefits from it: an increase of simulation speed or a decrease in resource consumption by using heterogeneous machines; or an increase in simulation speed by using more resources altogether. Given these results, DMC is deemed viable for heterogeneous multiscale models and for users with limited local computing resources.
Chapter
This article proposes a new approach for distributed computing. The main novelty consists in the exploitation of Web browsers as clients, thanks to the availability of Javascript, AJAX and Flex. The described solution has two main advantages: it is client-free, so no additional programs have to be installed to perform the computation, and it requires low CPU usage, so client-side computation is no invasive for users. The solution is developed using both AJAX and Adobe®Flex®technologies embedding a pseudo-client into a Web page that hosts the computation. While users browse the hosting Web page, computation takes place resolving single sub-problems and sending the solution to the server-side part of the system. Our client-free solution is an example of high resilient and auto-administrated system that is able to organize the scheduling of the processes and the error management in an autonomic manner. A mathematical model has been developed over this solution. The main goals of the model are to describe and classify different categories of problems on the basis of the feasibility and to find the limits in the dimensioning of the scheduling systems to have convenience in the use of this approach. The new architecture has been tested through different performance metrics by implementing two examples of distributed computing, the cracking of an RSA cryptosystem through the factorization of the public key and the correlation index between samples in genetic data sets. Results have shown good feasibility of this approach both in a closed environment and also in an Internet environment, in a typical real situation.
Article
Full-text available
The landscape of distributed computing systems has changed many times over the previous decades. Modern real-world distributed systems consist of clusters, grids, clouds, desktop grids, and mobile devices. Writing applications for such systems has become increasingly difficult. The aim of the Ibis project is to drastically simplify the programming of such applications. The Ibis philosophy is that real-world distributed applications should be developed and compiled on a local workstation, and simply be launched from there. The Ibis project studies several fundamental problems of distributed computing hand-in-hand with major applications, and integrates the various solutions in one programming system.
Article
Full-text available
The distributed computing attempts to improve performance in large-scale computing problems by resource sharing. Moreover, rising low-cost computing power coupled with advances in communications/networking and the advent of big data, now enables new distributed computing paradigms such as Cloud, Jungle and Fog computing. Cloud computing brings a number of advantages to consumers in terms of accessibility and elasticity. It is based on centralization of resources that possess huge processing power and storage capacities. Fog computing, in contrast, is pushing the frontier of computing away from centralized nodes to the edge of a network, to enable computing at the source of the data. On the other hand, Jungle computing includes a simultaneous combination of clusters, grids, clouds, and so on, in order to gain maximum potential computing power. To understand these new buzzwords, reviewing these paradigms together can be useful. Therefore, this paper describes the advent of new forms of distributed computing. It provides a definition for Cloud, Jungle and Fog computing, and the key characteristics of them are determined. In addition, their architectures are illustrated and, finally, several main use cases are introduced.
Article
To solve the problem of inefficient transmission using multiple interfaces of multihome terminal in the tradition network, an end-to-end multipath transport layer architecture-E2EMP oriented the next generation network was presented. Through distributing data adaptively following characters of the end-to-end paths, adopting dual sequence space, implementing smart path management policies, the performance of the multihome terminal using E2EMP has significant improvement. The simulation results show that E2EMP aggregates bandwidth of the multihome terminal interfaces efficiently, and meanwhile promotes the security and reliability of end-to-end multipath transport.
Article
Full-text available
We describe our experiences with integrating sev- eral Grid software components into a single coherent system that is used to write and run parallel applications on the Grid. The integrated components are the Grid Application Toolkit (GAT), ProActive, Satin and Ibis. We experimented with this (Java- based) system by participating in the N-Queens contest of the Grids@work event in October 2005. In addition to integrating available components, we wrote a ProActive plugin for the GAT, a parallel N-Queens solver, and an application to manage Grid deployment of N-Queens. We identified several connectivity issues and scalability problems in the components we use. We show how we modified some of the components to solve of these problems. We successfully ran experiments on 960 processors across Grid'5000, with an efficiency of around 85%, winning the prize for the largest number of nodes deployed during the contest. The Grids@work event held in October 2005 in Sophia Antipolis, France (7) was composed of a series of conferences and tutorials including the 2nd Grid Plugtests. The objective was to bring together Grid users and to present and discuss current and future features of the ProActive Grid platform, and to test the deployment and interoperability of Grid applications on various Grids. A part of the 2nd Grid Plugtests consisted of an N-Queens contest, where the aim was to find the number of solutions to the N-Queens problem, N being as big as possible, in a limited amount of time. We participated in this contest with a parallel N-Queens application. We used this application and the Grid testbed that was provided to integrate many software components, and to evaluate the integration, functionality and performance. The global structure of the system we used is shown in Figure 1. For portability reasons, all software components are written in Java. The N-Queens application itself is written in Satin, our Java-based divide-and-conquer programming model (9). Satin is implemented on top of the Ibis (11) communication library, while deployment of the application was done with a manager application that was written specifically for this contest. The manager uses the Java Grid Application Toolkit (2) (GAT) to access the Grid. The Java GAT in turn uses the ProActive (5) middleware for Grid deployment. This paper describes our experiences with integrating all different software components we used, and identifies some problems in our software packages, Ibis and Satin in particular, that we discovered during the contest. Some are related to scaling a parallel programming system up to 1000 machines that are distributed over a large geographical area, others were related to typical Grid issues such as firewall problems and network misconfigurations. Although we did encounter some difficulties, we were still able to run the parallel N-Queens application on 961 CPUs scattered across different Grid'5000 sites in France. Finally, we suggest possible solutions for the problems encountered. After identifying and solving the prob- lems we describe in this paper, we won the prize for largest number of nodes deployed in parallel during the contest. The remainder of this paper is structured as follows. First, we discuss the deployment tools GAT and ProActive (Sec- tion II), followed by Ibis and Satin (Section III), and then the N-Queens application itself (Section IV) and the testbed (Section V). Section VI will discuss the issues we encountered using a large-scale Grid, along with the solutions we applied. Sections VII and VIII summarize results and our conclusions, respectively.
Article
Full-text available
Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs) (STUN) is a lightweight protocol that allows applications to discover the presence and types of NATs and firewalls between them and the public Internet. It also provides the ability for applications to determine the public Internet Protocol (IP) addresses allocated to them by the NAT. STUN works with many existing NATs, and does not require any special behavior from them. As a result, it allows a wide variety of applications to work through existing NAT infrastructure.
Article
Full-text available
This paper describes WOW, a distributed system that combines virtual machine, overlay networking and peer-to-peer techniques to create scalable wide-area networks of virtual workstations for high-throughput computing. The system is architected to: facilitate the addition of nodes to a pool of resources through the use of system virtual machines (VMs) and self-organizing virtual network links; to maintain IP connectivity even if VMs migrate across network domains; and to present to end-users and applications an environment that is functionally identical to a local-area network or cluster of workstations. We describe IPOP, a network virtualization technique that builds upon a novel, extensible user-level decentralized technique to discover, establish and maintain overlay links to tunnel IP packets over different transports (including UDP and TCP) and across firewalls. We evaluate latency and bandwidth overheads of IPOP and also time taken for a new node to become fully-routable over the virtual network. We also report on several experiments conducted on a testbed WOW deployment with 118 P2P router nodes over PlanetLab and 33 VMware-based VM nodes distributed across six firewalled domains. Experiments show that the testbed delivers good performance for two unmodified, representative benchmarks drawn from the life-sciences domain. We also demonstrate that the system is capable of seamlessly maintaining connectivity at the virtual IP layer for typical client/server applications (NFS, SSH, PBS) when VMs migrate across a WAN.
Article
We propose a grid programming approach using the ProAc-tive middleware. The proposed strategy addresses several grid concerns, which we have classified into three categories. I. Grid Infrastructure which handles the resource acquisition and creation using deployment descriptors and Peer-to-Peer. II. Grid Technical Services which can pro-vide non-functional transparent services like: fault tolerance, load balanc-ing, and file transfer. III. Grid Higher Level programming with: group communication and hierarchical components. We have validated our ap-proach with several grid programming experiences running applications on heterogeneous Grid resource using more than 1000 CPUs.
Article
Basic Network Address Translation or Basic NAT is a method by which IP addresses are mapped from one group to another, transparent to end users. Network Address Port Translation, or NAPT is a method by which many network addresses and their TCP/UDP (Transmission Control Protocol/User Datagram Protocol) ports are translated into a single network address and its TCP/UDP ports. Together, these two operations, referred to as traditional NAT, provide a mechanism to connect a realm with private addresses to an external realm with globally unique registered addresses.
Article
The communications establishment capability of the Session Initi-ation Protocol is being expanded by the IETF to include establish-ing network layer connectivity for UDP for a range of scenarios, including where hosts are behind NAT boxes, and host are run-ning IPv6. So far, this work has been limited to UDP because of the assumed impossibility of establishing TCP connections through NAT, and because of the difficulty of predicting port assignments on certain common types of NATs. This paper reports on prelimi-nary success in establishing TCP connections through NAT, and on port prediction. In so doing, we suggest that it may be appropriate for SIP to take a broader architectural role in P2P network layer connectivity for both IPv4 and IPv6.