
SmartSockets: Solving the connectivity problems in grid computing

June 2007


DOI:10.1145/1272366.1272368


Conference: Proceedings of the 16th International Symposium on High-Performance Distributed Computing (HPDC-16 2007), 25-29 June 2007, Monterey, California, USA

Authors:

Jason Maassen

Netherlands eScience Center



SmartSockets: Solving the Connectivity Problems

in Grid Computing

Jason Maassen and Henri E. Bal

Dept. of Computer Science, Vrije Universiteit

Amsterdam, The Netherlands

jason@cs.vu.nl, bal@cs.vu.nl

ABSTRACT

Tightly coupled parallel applications are increasingly run in

Grid environments. Unfortunately, on many Grid sites the

ability of machines to create or accept network connections

is severely limited by ﬁrewalls, network address translation

(NAT) or non-routed networks. Multi homing further com-

plicates connection setup and machine identiﬁcation. Al-

though ad-hoc solutions exist for some of these problems, it

is usually up to the application’s user to discover the cause

of the connectivity problems and ﬁnd a solution. In this

paper we describe SmartSockets,1 a communication library

that lifts this burden by automatically discovering the con-

nectivity problems and solving them with as little support

from the user as possible.

Categories and Subject Descriptors: C.2.4 [Distributed

Systems]: Distributed applications

General Terms: Algorithms, Design, Reliability

Keywords: Connectivity Problems, Grids, Networking, Par-

allel Applications

1. INTRODUCTION

Parallel applications are increasingly run in Grid environ-

ments. Unfortunately, on many Grid sites the ability of ma-

chines to create or accept network connections is severely

limited by network address translation (NAT) [14, 26] or

ﬁrewalls [15]. There are even sites that completely disallow

any direct communication between the compute nodes and

the rest of the world (e.g., the French Grid5000 system [3]).

In addition, multi homing (machines with multiple network

addresses) can further complicate connection setup.

For parallel applications that require direct communication

between their components, these limitations have hampered

the transition from traditional multiprocessor or cluster

systems to Grids. When a combination of Grid sites is

used, serious connectivity problems are often encountered.

1 SmartSockets is part of the Ibis project and can be found at http://www.cs.vu.nl/ibis



Unfortunately, it is often up to the user to discover the

cause of these connectivity problems, a non-trivial task at

best. Once the problems are identiﬁed, it may be possible

to circumvent some of them by using ad-hoc solutions, such

as opening a port range in a ﬁrewall, explicitly specifying

which address to use on a multi-homed machine, or using

SSH tunneling. Many problems, however, can only be solved

by adapting the application or the communication library it

uses. To make matters worse, as soon as the set of Grid sys-

tems being used changes, a large part of this process needs

to be repeated. As a result, running a parallel application

on multiple Grid sites can be a strenuous task [34].

In this paper we will describe a solution to this problem:

the SmartSockets communication library. The primary fo-

cus of SmartSockets is on ease of use. It automatically dis-

covers a wide range of connectivity problems and attempts

to solve them with little or no support from the user. Smart-

Sockets combines many known solutions, such as port for-

warding, TCP splicing and SSH tunneling, and introduces

several new ones that resolve problems with multi homing

and machine identiﬁcation. In 30 connection setup experi-

ments, using 6 diﬀerent sites worldwide, SmartSockets was

always able to establish a connection, while conventional

sockets only worked in 6 experiments. Using heuristics and

caching, SmartSockets is able to signiﬁcantly improve the

connection setup performance.

SmartSockets oﬀers a single integrated solution that hides

the complexity of connection setup in Grids behind a simple

interface that closely resembles sockets. We will show that

it is relatively straightforward to port an existing applica-

tion to SmartSockets, provided that certain programming

guidelines are followed.

SmartSockets is not speciﬁcally intended for use in par-

allel applications or Grids. It can also be applied to other

distributed applications, such as visualization, cooperative

environments, or even consumer applications such as instant

messaging, ﬁle sharing, or online gaming. However, many

of these applications only require a very limited degree of

connectivity. Often, clients simply connect to a server in a

well-known location, making it relatively easy to apply an

ad-hoc solution when a connectivity problem occurs.

Parallel applications, however, can be much more chal-

lenging. They often require a large number of connections

between the participating machines, and each machine must

be capable of both initiating outgoing and accepting incom-

ing connections. Running such applications in a Grid en-

vironment with limited connectivity is diﬃcult. Therefore,

this paper will focus on this domain.

In Section 2 we describe the connectivity related problems

encountered while running applications on multiple Grid

sites. Section 3 describes how these problems are solved in

SmartSockets and brieﬂy looks at the programming inter-

face. Section 4 evaluates the performance of SmartSockets,

Section 5 describes related work, and Section 6 concludes.

2. CONNECTIVITY PROBLEMS

In this section we will give a description of the network re-

lated problems that can occur when running a single parallel

or distributed application on multiple Grid sites.

2.1 Firewalls

As described in [15], "A firewall is an agent which screens

network traffic in some way, blocking traffic it believes to be

inappropriate, dangerous, or both." Many sites use firewalls

to protect their network from unauthorized access.

Firewalls usually allow outbound connections, but block in-

coming connections, often with the exception of a few well-

known ports (e.g., port 22 for SSH).

It is obvious that this connectivity restriction can cause

severe problems when running a parallel application on multiple

sites. When only a single participating site uses a firewall,

the connectivity problems can sometimes be solved by

ensuring that the connection setups are 'in the right direction',

i.e., that all required connections between open and

firewalled machines are initiated at the firewalled site. This

solution may require changes to the applications or commu-

nication libraries, however. Also, if both sites use a ﬁrewall,

this approach can no longer be used. In this case, a ﬁre-

wall will always be encountered regardless of the connection

setup direction.

One way to solve the problems is to request an open port

range in the ﬁrewall. Connectivity can then be restored by

adapting the application to only use ports in this range.

However, this approach requires reconfiguration of the firewall,

and open ports are also seen as a threat to site security.

When both machines are behind a ﬁrewall it may still

be possible to establish a direct connection using a mecha-

nism called TCP splicing [6, 10, 13, 20]. Simply put, this

mechanism works by simultaneously performing a connec-

tion setup from both sides. Since this approach requires ex-

plicit cooperation between the machines, some alternative

communication channel must be available.

2.2 Network Address Translation

As described in [21], "Network Address Translation is a

method by which IP addresses are mapped from one address

realm to another, providing transparent routing to end

hosts." NAT was introduced in [12] as a temporary solution

to the problem of IPv4 address depletion. Although the

intended solution for this problem, IPv6, has been available

for some time, NAT is still widely used today.

Many different flavors of NAT exist, but Network Address

Port Translation (NAPT) is the most frequently used [21, 29]. This

type of NAT allows outbound connections from sites using

private addresses, but does not allow incoming connections.

Both the IP address (and related ﬁelds) and the transport

identiﬁer (e.g., TCP and UDP port numbers) of packets are

translated, thereby preventing port number collisions when

a set of hosts share a single external address.

As mentioned above, NAT only allows outbound network

connections. Incoming connections are rejected, since the

connection request does not contain enough information to

ﬁnd the destination machine (i.e., only the external IP ad-

dress is provided, but that may be shared by many ma-

chines). This restriction leads to connectivity problems that

are very similar to those caused by ﬁrewalls. Therefore, the

solution described in Section 2.1 (connecting ’in the right di-

rection’) also applies to a NAT setup, and fails in a similar

way when multiple NAT sites try to interconnect.

Although the TCP splicing mechanism can also be used

to connect two NAT sites, a more complex algorithm is re-

quired to compensate for the port translation performed by

NAT [6, 20].

Some NAT implementations have support for port for-

warding, where all incoming connections on a certain port

can be automatically forwarded to a certain host inside the

NAT site. Using mechanisms such as UPnP [5], DPF [28],

or MIDCOM [30], applications can contact the NAT im-

plementation and change the port forwarding rules on de-

mand. Port forwarding lifts many of the restrictions on in-

coming connections. Unfortunately, UPnP is mostly found

in consumer devices, MIDCOM is still under development,

and DPF only supports NAT (and ﬁrewall) implementations

based on NetFilter [1]. As a result, these mechanisms are

not (yet) generally usable in Grid applications. Currently,

SmartSockets only supports UPnP.

In addition to causing connection setup problems, NAT

also complicates machine identiﬁcation. Machines in a NAT

site generally use IP addresses in the private range [26].

These addresses are only usable within a local network and

are not globally unique. Unfortunately, parallel applications

often use a machine’s IP address to create a unique identiﬁer

for that machine. When multiple NAT sites participate in a

single parallel run, however, this approach cannot be used,

since the machine addresses are no longer guaranteed to be

unique.

2.3 Non-routed networks

On some sites no direct communication between the com-

pute nodes and the outside world is possible due to a strict

separation between the internal and external networks. No

routing is performed between the two. Only a front-end

machine is accessible, and the connectivity of this machine

may be limited by a ﬁrewall or NAT. Two of the sites used

in Section 4 use such a setup.

It is clear that this is a major limitation when the site is

used in a parallel application. The only possibility for the

compute nodes to communicate with other sites is to use the

front-end machine as a bridge to the outside world, using,

for example, an SSH tunnel or a SOCKS [24] proxy. These

are non-trivial to set up, however.

2.4 Multi Homing

When multi-homed machines (i.e., machines with multi-

ple network addresses) participate in a parallel application,

another interesting problem occurs. When creating a con-

nection to such a machine, a choice must be made on which

of the possible target addresses to use. The outcome of this

choice may depend on the location of the machine that ini-

tiates the connection.

For example, the front-end machine of a site has two ad-

dresses, a public one, reachable over the internet, and a pri-

vate one used to communicate with the site’s compute nodes.

As a result, a diﬀerent address must be used to reach this

machine depending on whether the connection originates in-

side or outside of the site.

In [34] we called this the Reverse Routing Problem. Normally,

when a multi-homed machine is trying to connect to

a single IP address, a routing table on the machine decides

which network is used for the outgoing connection. In the

example described above the reverse problem is encountered.

Instead of having to decide how to ‘exit’ a multi-homed ma-

chine, we must decide on how to ‘enter’ it. This problem is

non-trivial, since the source machine generally does not have

enough information available to select the correct target ad-

dress. As a result, several connection attempts to diﬀerent

addresses of the target may be necessary before a connection

can be established. In Section 3.2 we will describe heuristics

that can be used to speed up this process.

Multi homing can have a major eﬀect on the implementa-

tion of parallel programming libraries. The example above

shows that it is not suﬃcient to use a single address to rep-

resent a multi-homed machine. Instead, all addresses must

be made available to the other participants of the parallel

application. In addition, some of the addresses may be in a

private range and refer to a diﬀerent machine when used in

a diﬀerent site. Therefore, it is also essential to check if a

connection was established to the correct machine.

3. SMARTSOCKETS

In this section we will give an overview of the design, im-

plementation and programming interface of the SmartSock-

ets library, and describe how it solves the problems described

in the previous section.

3.1 Overview

Currently, SmartSockets oﬀers four diﬀerent connection

setup mechanisms, Direct, Reverse, Splicing, and Routed.

They will be described in more detail below. Table 1 shows

an overview of how these mechanisms solve the connectivity

problems described in Section 2. As the table shows, each

problem is solved by at least one mechanism.

Table 1: Overview of connectivity problems and their solutions. Columns list the connection setup mechanisms; empty cells are shown as "-".

Problems        Direct   Reverse   Splicing   Routed
Identification  X        -         -          -
Multi Homing    X        -         -          -
Single FW/NAT   (X)      X         X          X
Dual FW/NAT     (X)      -         X          X
No Routing      -        -         -          X

The machine identiﬁcation and multi-homing problems

are solved by the direct connection setup. As will be ex-

plained below, this approach also has limited ﬁrewall traver-

sal capabilities (using SSH tunneling), so in certain situa-

tions it may succeed in establishing a connection in a single

or even a dual ﬁrewall setting. In the table these entries are

shown between brackets.

A reverse connection setup is only capable of creating a

connection when a single ﬁrewall or NAT limits the con-

nectivity. Splicing is capable of handling both single and

dual ﬁrewall/NAT conﬁgurations. However, this approach is

signiﬁcantly more complex than a reverse connection setup

(especially with dual NAT) and may not always succeed.

Therefore, reverse connection setup is preferred for single

ﬁrewall/NAT conﬁgurations.

A routed connection setup can be used in any situation

where the connectivity is limited. Unlike the previous two

approaches it does not result in a direct connection. Instead

all network traﬃc is routed via external processes called

hubs (explained in Section 3.3), which may degrade both

latency and throughput of the connection. Therefore, the

previous mechanisms are preferred. When connecting to or

from a machine on a non-routed network, however, a routed

connection is the only choice.
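The preferences stated above suggest a simple fallback chain: try the cheapest mechanism first and fall back to routing via hubs only when nothing else works. The following Java sketch is only an illustration of that idea. The class `SetupOrderSketch`, the method `firstSuccessful`, the mechanism names, and the use of a `Predicate` as a stand-in for a real connection attempt are all assumptions made for this example and are not part of the SmartSockets API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical fallback chain over connection setup mechanisms. */
final class SetupOrderSketch {

    /**
     * Tries each mechanism in the map's iteration order and returns the name
     * of the first one whose attempt succeeds, or null if all of them fail.
     */
    static String firstSuccessful(Map<String, Predicate<String>> mechanisms, String target) {
        for (Map.Entry<String, Predicate<String>> m : mechanisms.entrySet()) {
            if (m.getValue().test(target)) {
                return m.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Placeholder attempts: a real attempt would try to open a connection.
        Map<String, Predicate<String>> mechanisms = new LinkedHashMap<>();
        mechanisms.put("direct", t -> false);   // e.g. blocked by a firewall
        mechanisms.put("reverse", t -> false);  // e.g. both sides are firewalled
        mechanisms.put("splicing", t -> true);
        mechanisms.put("routed", t -> true);    // last resort: traffic goes via hubs
        System.out.println(firstSuccessful(mechanisms, "node07"));
    }
}
```

A `LinkedHashMap` is used because it iterates in insertion order, which keeps the preference order explicit in the code.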

The SmartSockets implementation is divided into two lay-

ers, a low-level Direct Connection Layer, responsible for all

actions that can be initiated on a single machine, and a

high-level Virtual Connection Layer that uses side-channel

communication to implement actions that require cooper-

ation of multiple machines. The direct connection layer is

implemented using the standard socket library. The virtual

connection layer is implemented using the direct connection

layer. Both layers will be explained in more detail below.

Currently, SmartSockets is implemented using Java [2].

3.2 Direct Connection Layer

The direct connection layer implements all actions that

do not require explicit cooperation between machines, such

as determining the local addresses or creating a direct con-

nection. It also supports a limited form of SSH tunneling.

3.2.1 Machine Identiﬁcation

During initialization, the direct connection layer starts by

scanning all available network interfaces to determine which

IP addresses are available to the machine. It then generates

a unique machine identiﬁer that contains these addresses,

and that can be used to contact the machine.

This identiﬁer will automatically be unique if it contains

at least one public address. If all addresses are private,

however, additional work must be done. A machine that

only has private addresses is either in a NAT site or uses a

non-routed network. In the ﬁrst case, a unique identiﬁer can

still be generated for the machine by acquiring the external

address of the NAT. Provided that this address is public,

the combination of external and machine addresses should

also be unique, since other machines in the same NAT site

should have a diﬀerent set of private addresses, and all other

NAT sites should have a diﬀerent external address.

The SmartSockets library will use UPnP to discover the

external address of the NAT site. If this discovery fails, or

if the returned address is not public, a Universally Unique

Identiﬁer (UUID) [23], will be generated and included in the

machine identiﬁer, thereby making it unique.
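The identifier construction described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the library's code: the class `MachineIdSketch`, its methods, and the textual format `addr/addr#uuid` are hypothetical, and the UPnP lookup of the NAT's external address is left out, so the sketch falls back to a UUID whenever no public address is found. It also treats IPv6 unique local addresses as public, which a real implementation would need to handle.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.UUID;

/** Hypothetical machine identifier: all local addresses, plus a UUID if none is public. */
final class MachineIdSketch {

    /** Parses an IP literal without exposing the checked exception (helper for examples). */
    static InetAddress ip(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(literal, e);
        }
    }

    /** Simplified test: anything that is not private, loopback, link-local or wildcard. */
    static boolean isPublic(InetAddress a) {
        return !(a.isSiteLocalAddress() || a.isLoopbackAddress()
                || a.isLinkLocalAddress() || a.isAnyLocalAddress());
    }

    /** Scans all network interfaces that are up and collects their addresses. */
    static List<InetAddress> localAddresses() {
        List<InetAddress> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            while (nics != null && nics.hasMoreElements()) {
                NetworkInterface nic = nics.nextElement();
                if (nic.isUp()) {
                    result.addAll(Collections.list(nic.getInetAddresses()));
                }
            }
        } catch (SocketException e) {
            // Keep whatever was collected; the UUID fallback still yields a unique id.
        }
        return result;
    }

    /** Joins the addresses with '/' and appends "#<uuid>" when no address is public. */
    static String buildIdentifier(List<InetAddress> addresses) {
        StringBuilder id = new StringBuilder();
        boolean hasPublic = false;
        for (InetAddress a : addresses) {
            if (id.length() > 0) {
                id.append('/');
            }
            id.append(a.getHostAddress());
            hasPublic |= isPublic(a);
        }
        if (!hasPublic) {
            id.append('#').append(UUID.randomUUID());
        }
        return id.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildIdentifier(localAddresses()));
    }
}
```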

3.2.2 Connection Setup

Once initialized, the direct connection layer can be used

to set up connections to other machines. The identiﬁer of

the target machine may contain multiple network addresses,

some of which may not be reachable from the current loca-

tion. The private addresses in the identiﬁer may even refer

to a completely diﬀerent machine, so it is important that the

identity of the machine is checked during connection setup.

As a result, several connection attempts may be necessary

before the correct connection can be established.

When multiple target addresses are available, a choice

must be made in which order the connection attempts will

be performed. Although simply using the addresses in an

arbitrary order should always result in a connection (pro-

vided that a direct connection is possible), this may not be

the most eﬃcient approach. Many Grid sites oﬀer high-

performance networks such as Myrinet [7] or Inﬁniband [4]

in addition to a regular Ethernet network. Using such a

network for inter-site communication may signiﬁcantly im-

prove the application’s performance. In general, these fast

networks are not routed and use addresses in the private

range, while the regular Ethernet networks (often) use pub-

lic addresses. Therefore, by sorting the target addresses and

trying all private ones ﬁrst, the fast local networks will au-

tomatically be selected in sites with such a setup.

The drawback of this approach is that the private ad-

dresses of a target will always be tried ﬁrst, even if the con-

nection originates on a diﬀerent site. This may cause a sig-

niﬁcant overhead. Therefore, SmartSockets uses a heuristic

that sorts the target addresses in relation to the addresses

that are available locally. For example, if only a public ad-

dress is available on the local machine, it is unlikely that it

will be able to create a direct connection to a private address

of a target. As a result, the connection order public before

private is used. This order is also used if both machines have

public and private addresses, but the private addresses refer

to a different network (e.g., 10.0.0.10 vs. 192.168.1.20).

The order private before public is only used if both machines

have private addresses in the same range. Section 4 will il-

lustrate the performance beneﬁts of this heuristic.
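The ordering heuristic can be illustrated with a short Java sketch. It is written under explicit assumptions: the class `AddressOrderSketch` and its methods are hypothetical, only IPv4 is considered, and "private addresses in the same range" is approximated by "in the same RFC 1918 block" (10/8, 172.16/12 or 192.168/16). The actual comparison used by SmartSockets may be finer grained.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical version of the "public before private" ordering heuristic. */
final class AddressOrderSketch {

    /** Parses an IP literal without exposing the checked exception (helper for examples). */
    static InetAddress ip(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(literal, e);
        }
    }

    /** Returns the first octet (10, 172 or 192) of a private IPv4 address, else -1. */
    static int privateBlock(InetAddress a) {
        byte[] raw = a.getAddress();
        if (raw.length != 4 || !a.isSiteLocalAddress()) {
            return -1;
        }
        return raw[0] & 0xFF;
    }

    /** True if some local and some target address lie in the same private block. */
    static boolean sharesPrivateBlock(List<InetAddress> local, List<InetAddress> target) {
        for (InetAddress l : local) {
            for (InetAddress t : target) {
                if (privateBlock(l) != -1 && privateBlock(l) == privateBlock(t)) {
                    return true;
                }
            }
        }
        return false;
    }

    /**
     * Orders the target addresses: private first only when both machines have
     * private addresses in the same block, otherwise public first.
     */
    static List<InetAddress> order(List<InetAddress> local, List<InetAddress> target) {
        boolean privateFirst = sharesPrivateBlock(local, target);
        List<InetAddress> sorted = new ArrayList<>(target);
        // The sort is stable, so addresses of the same class keep their given order.
        sorted.sort(Comparator.comparing(
                (InetAddress a) -> a.isSiteLocalAddress() != privateFirst));
        return sorted;
    }

    public static void main(String[] args) {
        List<InetAddress> target = List.of(ip("10.0.0.10"), ip("130.37.30.1"));
        System.out.println(order(List.of(ip("130.37.20.20")), target));
        System.out.println(order(List.of(ip("10.0.0.5"), ip("130.37.20.20")), target));
    }
}
```

Because the sort is stable, addresses within one class stay in the order they were supplied, which matches the observation that addresses of the same class cannot be ranked automatically.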

Unfortunately, it is impossible to make a distinction be-

tween addresses of the same class. For example, if a target

has multiple private addresses, we cannot automatically

determine which address is best. Therefore, if a certain

network is preferred, the user must specify this explicitly.

Without this explicit conﬁguration, SmartSockets will still

create a direct connection (if possible), and the parallel ap-

plication will run, but its performance may be suboptimal.

When a connection has been established, an identity check

is performed to ensure that the correct machine has been

reached. This would be a simple comparison if the complete

identiﬁer of the target is available, but unfortunately this

is not always the case. User provided addresses are often

used to bootstrap a parallel application. These addresses

are often limited to a single hostname or IP address, which

may only be part of the addresses available to the target ma-

chine. Therefore, the identity check used by SmartSockets

also allows the use of partial identiﬁers.

Whenever a connection is created, the target machine pro-

vides its complete identity to the machine initiating the con-

nection. This machine then checks if both the public and

private addresses in the partial identity are a subset of the

ones in the complete identity. If so, the partial identity is

accepted as a subset of the complete identity, and the con-

nection is established. Note that although the connection is

created to a machine that matches the address speciﬁed by

the user, it is not necessarily the correct machine from the

viewpoint of the parallel application. Unfortunately, in such

cases it is up to the user to provide an address that contains

enough information to reach the correct machine.
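The subset test behind the partial identity check is small enough to show directly. The Java sketch below is an illustration only: the class `IdentityCheckSketch`, the method `accepts`, and the representation of an identity as a set of address strings are assumptions made for this example.

```java
import java.util.Set;

/** Hypothetical partial identity check based on a subset test. */
final class IdentityCheckSketch {

    /**
     * Accepts the connection when every address in the (possibly partial)
     * expected identity also appears in the complete identity reported by
     * the machine that was actually reached.
     */
    static boolean accepts(Set<String> expectedPartial, Set<String> reportedComplete) {
        return !expectedPartial.isEmpty() && reportedComplete.containsAll(expectedPartial);
    }

    public static void main(String[] args) {
        Set<String> reported = Set.of("130.37.20.20", "10.0.0.10");
        System.out.println(accepts(Set.of("130.37.20.20"), reported));
        System.out.println(accepts(Set.of("10.0.0.10", "192.168.1.20"), reported));
    }
}
```

A partial identity that contains only a private address, such as `10.0.0.10`, passes this test on any machine that happens to use that address, which is the ambiguity noted above.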

3.2.3 Open Port Ranges and Port Forwarding

When a ﬁrewall has an open port range available, Smart-

Sockets can ensure that all sockets used for incoming con-

nections are bound to a port in this range. There is no way

of discovering this range automatically, however, so it must

be speciﬁed explicitly by the user.
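Binding a listening socket inside a user-specified port range can be sketched with the standard Java socket classes. The class `PortRangeSketch`, the method `bindInRange`, and the example range are hypothetical; how SmartSockets itself selects a port within the range is not described here.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical helper that binds a server socket inside an open port range. */
final class PortRangeSketch {

    /**
     * Returns a server socket bound to the first free port in [low, high],
     * or null when no port in the range can be bound.
     */
    static ServerSocket bindInRange(int low, int high) {
        for (int port = low; port <= high; port++) {
            try {
                return new ServerSocket(port);
            } catch (IOException e) {
                // Port is in use or not permitted; try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Example range only; the real range must come from the site administrator.
        try (ServerSocket server = bindInRange(45000, 45020)) {
            System.out.println(server == null ? "no free port" : "bound to " + server.getLocalPort());
        }
    }
}
```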

In addition, SmartSockets can use the UPnP protocol to

conﬁgure a NAT to do port forwarding, i.e., automatically

forward all incoming connections on a certain external port

to a speciﬁed internal address. However, as explained before,

this protocol is mainly used in consumer devices.

3.2.4 SSH Tunneling

In addition to regular network connections, the direct con-

nection layer also has limited support for SSH tunneling.

This feature is useful for connecting to machines behind a

ﬁrewall that allows SSH connections to pass through. It

does, however, require a suitable SSH setup (i.e., public key

authentication must be enabled).

Creating an SSH tunnel is similar to a regular connection

setup. The target addresses are sorted and tried consecu-

tively. Instead of using the port speciﬁed in the connection

setup, however, the default SSH port (i.e., 22) is used. When

a connection is established and the authentication is success-

ful, the receiving SSH daemon is instructed to forward all

traﬃc to the original destination port on the same machine.

If this succeeds, the regular identity check will be performed

to ensure that the right machine has been reached.

Although this approach is useful, it can only be used to

set up a tunnel to a diﬀerent process on the target machine.

Using this approach to forward traﬃc to diﬀerent machines

requires extra information. For example, setting up an SSH

tunnel to a compute node of a site through the site's frontend

can only be done if it is clear that the frontend must

be contacted in order to reach the target machine. Although

this approach is used in some projects [8], the necessary

information cannot be obtained automatically and must be

provided by the user. Therefore, SmartSockets uses a diﬀer-

ent approach which will be described in detail in Section 3.3.

3.2.5 Limitations

The direct connection layer oﬀers several types of connec-

tion setup which have in common that they can be initiated

by a single machine. No explicit cooperation between ma-

chines is necessary to establish the connection. There are

many cases, however, where connectivity is too limited and

the direct connection layer cannot be used.

In general, direct connections to sites that use NAT or a

ﬁrewall are not possible. Although SSH tunneling and open

port ranges alleviate the ﬁrewall problems, they require a

suitable SSH setup or extra information from the user. Port

forwarding reduces the problems with NAT, but is rarely

supported in Grid systems. Therefore, these features are of

limited use. In the next section we will give a detailed de-

scription of the virtual connection layer, which solves these

problems.

3.3 Virtual Connection Layer

Like the direct connection layer, the virtual connection

layer implements several types of connection setup. It oﬀers

a simple, socket-like API and has a modular design, making

it easy to extend. Besides a direct module that uses the

direct connection layer described above, it contains several

modules that oﬀer more advanced types of connection setup.

These modules have in common that they require explicit

cooperation (and thus communication) between the source

and target machines in order to establish a connection. As a

result, side-channel communication is required to implement

these modules.

3.3.1 Side-Channel Communication

In SmartSockets, side-channel communication is imple-

mented by creating a network of interconnected processes

called hubs. These hubs are typically started on the fron-

tend machines of each participating site, so their number is

usually small.

When a hub is started, the location of one or more other

hubs must be provided. Each hub will attempt to set up a

connection to the others using the direct connection layer.

Although many of these connections may fail to be estab-

lished, this is not a problem as long as a spanning tree is

created that connects all hubs.

The hubs use a gossiping protocol to exchange information

about themselves and the hubs they know, with the hubs

that they are connected to. This way information about

each hub quickly spreads to all hubs in the network. When-

ever a hub receives information about a hub it has not seen

before, it will attempt to set up a connection to this hub.

This way, new connections will be discovered automatically.

All gossiped information contains a state number indicat-

ing the state of the originating machine when the informa-

tion was sent. Since information from a hub may reach an-

other hub through multiple paths, the state number allows

the receiver to decide which information is most recent.

By recording the length of the path traversed thus far in

the gossiped information, hubs can determine the distance

to the sites that they cannot reach directly. Whenever a

hub receives a piece of information about another hub con-

taining a shorter distance than it has seen so far, it will

remember both the distance and the hub from which the in-

formation was obtained. This way, we automatically create

a distributed routing table with the shortest paths between

each pair of hubs. This table is later used to forward appli-

cation information (as will be described below).
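The bookkeeping a hub performs on incoming gossip can be sketched as follows. This Java example is a simplified illustration: the class `HubTableSketch`, its fields, and the rule that the state number and the distance are updated independently of each other are assumptions made for this example, and the network transport is omitted entirely.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-hub table built from gossiped information. */
final class HubTableSketch {

    /** What this hub remembers about one other hub. */
    static final class Entry {
        long state;      // highest state number seen from the originating hub
        int distance;    // shortest known distance in hops
        String nextHop;  // neighbour that reported the shortest distance
    }

    private final Map<String, Entry> table = new HashMap<>();

    /**
     * Handles one gossip item about `hub`, received from the neighbour `via`.
     * `hopsSoFar` is the path length recorded in the item before this hop.
     * Returns true when the item changed the table.
     */
    boolean onGossip(String hub, long state, int hopsSoFar, String via) {
        int distance = hopsSoFar + 1;
        Entry e = table.get(hub);
        if (e == null) {
            e = new Entry();
            e.state = state;
            e.distance = distance;
            e.nextHop = via;
            table.put(hub, e);
            return true;
        }
        boolean changed = false;
        if (state > e.state) {        // newer information about the hub
            e.state = state;
            changed = true;
        }
        if (distance < e.distance) {  // shorter path: remember distance and neighbour
            e.distance = distance;
            e.nextHop = via;
            changed = true;
        }
        return changed;
    }

    /** Neighbour to forward to when sending towards `hub`, or null if unknown. */
    String nextHop(String hub) {
        Entry e = table.get(hub);
        return e == null ? null : e.nextHop;
    }

    /** Shortest known distance to `hub` in hops, or -1 if unknown. */
    int distance(String hub) {
        Entry e = table.get(hub);
        return e == null ? -1 : e.distance;
    }
}
```

Each hub keeps only the next hop towards every destination, so the complete routing table exists only in distributed form across all hubs.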

When an application is started, the virtual layer on each

machine creates a single connection to the hub local to its

site. The location of this hub can either be explicitly speci-

ﬁed or discovered automatically using UDP multicast.

3.3.2 Virtual Addresses

The connection to the hub can now be used as a side chan-

nel to forward requests to otherwise unreachable machines.

To ensure that the target machines can be found, virtual

addresses are used, consisting of the machine identiﬁer (see

Section 3.2), a port number, and the identiﬁer of the hub

the machine is connected to.

All requests for the target machine can then be sent to the

local hub, which forwards them in the direction of the target

hub using the information contained in its routing table.

The request will continue to be forwarded until the target

hub is reached and the request is delivered to the machine.
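The structure of a virtual address and the hop-by-hop forwarding of a request can be sketched in Java. The class and field names below are hypothetical, and the routing tables are modelled as plain in-memory maps rather than as state held by separate hub processes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical virtual address and hub-to-hub forwarding. */
final class VirtualAddressSketch {

    /** Machine identifier, port number, and the hub the machine is connected to. */
    static final class VirtualAddress {
        final String machineId;
        final int port;
        final String hubId;

        VirtualAddress(String machineId, int port, String hubId) {
            this.machineId = machineId;
            this.port = port;
            this.hubId = hubId;
        }
    }

    /**
     * Follows the next-hop entries from the local hub to the target's hub.
     * `routes` maps each hub to its own table (destination hub -> next hop).
     * Returns the hubs visited, starting at the local hub.
     */
    static List<String> route(Map<String, Map<String, String>> routes,
                              String localHub, VirtualAddress target) {
        List<String> path = new ArrayList<>();
        String current = localHub;
        path.add(current);
        while (!current.equals(target.hubId)) {
            Map<String, String> table = routes.get(current);
            String next = (table == null) ? null : table.get(target.hubId);
            if (next == null || path.contains(next)) {
                throw new IllegalStateException("no route to hub " + target.hubId);
            }
            current = next;
            path.add(current);
        }
        return path;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> routes = Map.of(
                "hubA", Map.of("hubC", "hubB"),
                "hubB", Map.of("hubC", "hubC"));
        VirtualAddress target = new VirtualAddress("10.0.0.10#example", 5000, "hubC");
        System.out.println(route(routes, "hubA", target));
    }
}
```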

3.3.3 Modules

The current implementation of SmartSockets contains four

diﬀerent connection modules, one for each of the connection

setup mechanisms described in Section 3.1.

Direct.

The direct connection module simply forwards all connec-

tion requests to the direct connection layer. It does not make

use of side-channel communication and has the features and

limitations described in Section 3.2.

Reverse.

A direct connection setup will generally fail if the tar-

get is behind a ﬁrewall or NAT. However, as explained in

Section 2, outgoing connections are usually allowed on such

sites. The reverse connection module exploits this property

by reversing the direction of the connection setup.

Instead of creating a connection, the reverse connection

module creates a new socket locally. It then sends a request

to the target machine using the side channel. This request

contains the target’s address and destination port, and the

address of the new socket. When the request is received

on the target machine, it will check if the destination port

exists. If it does, the target machine attempts to create a

direct connection back to the new socket. If successful, this

connection is returned as the result of the original connection

setup call on the source. On the target, the new connection

will be queued, awaiting an accept from the application.
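The reverse setup can be demonstrated on a single machine using the loopback interface. The Java sketch below is a heavily simplified illustration: the class `ReverseSetupSketch` is hypothetical, a `BlockingQueue` stands in for the hub-based side channel, the request carries only the port of the new socket, and the check that the destination port exists on the target is omitted.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical single-machine demonstration of a reverse connection setup. */
final class ReverseSetupSketch {

    /**
     * The "source" opens a new listening socket and announces its port over the
     * side channel; the "target" connects back to it. Returns the byte sent by
     * the target over the resulting connection, or -1 on failure.
     */
    static int demo() {
        BlockingQueue<Integer> sideChannel = new LinkedBlockingQueue<>();

        Thread target = new Thread(() -> {
            try {
                int port = sideChannel.take();  // request arrives via the side channel
                try (Socket back = new Socket(InetAddress.getLoopbackAddress(), port)) {
                    back.getOutputStream().write(42);
                }
            } catch (IOException | InterruptedException e) {
                // The source will time out waiting for the connection.
            }
        });
        target.setDaemon(true);
        target.start();

        try (ServerSocket fresh = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            fresh.setSoTimeout(5000);                // do not wait forever
            sideChannel.add(fresh.getLocalPort());   // send the request
            try (Socket connection = fresh.accept()) {
                // This socket plays the role of the result of the original connect call.
                return connection.getInputStream().read();
            }
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Although the source accepts the connection at the socket level, the application on the source side still sees it as an outgoing connection, which is the essence of the mechanism.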

Splicing.

The reverse connection module requires the source ma-

chine to be publicly accessible. When both machines are

behind a ﬁrewall or NAT the reverse connection setup will

fail. However, it may still be possible to create a connection

using TCP splicing [10, 20].

When the machines have public addresses (i.e., they are

not behind a NAT), the actions performed by the splicing

module are relatively straightforward. First, the source ma-

chine sends a request to the target using the side channel.

This request contains the target address and destination

port, the complete identiﬁer of the source, and a port the

source will use to create the outgoing connection. When the

request is received on the target machine, it will check if the

destination port exists. If it does, a reply is returned and

both machines repeatedly attempt to create a direct connec-

tion to the other (using only public addresses). Since this

mechanism is sensitive to timing and both machines may

have multiple public addresses, the number of attempts re-

quired may be large.

When one or both machines are behind a NAT (i.e., they

only have private addresses), a diﬀerent approach is used.

As explained in Section 2.2, most NAT implementations

translate the address and port number of outgoing connec-

tions. TCP splicing requires both machines to know each

other’s exact external address and port number. Although

the external address of a NAT site is often constant, the port

mapping is hard to determine, since a diﬀerent port may be

used for every connection attempt.

Fortunately, most NAT implementations use a predictable

port mapping scheme [19]. Therefore, once a single mapping

has been determined, a prediction can be made on the range

of port numbers that is likely to be used in the immediate

future. By using the external address and port range in the

connection setup attempts, TCP splicing can still be used.

To obtain an initial mapping, the assistance of an external

machine (outside of the NAT) is required. SmartSockets

uses the hubs for this purpose. If the source machine is

behind a NAT, it will request a list of available hubs from

its local hub, and attempt to ﬁnd an external hub to which it

can connect directly. When such a connection is successful,

the external hub echoes the source address and port number

to the source machine. If this address is public, it is likely

to be the external address of the NAT site, and the address

and port are included in the request sent to the target. The

target will use the same approach if it is behind a NAT, and

return its external address and port number to the source.

Both machines will now repeatedly perform connection

attempts using the external address of the other site and

trying all port numbers in the predicted range. Currently,

SmartSockets uses a range of [port...port+5].
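As a rough illustration of the last two steps, the sketch below checks whether an echoed address looks public and, if so, lists the endpoints that a machine would try for the predicted port range. The class and method names are hypothetical, and the test for a public address (not site-local, loopback, link-local, or wildcard) is an assumption; the text does not specify how SmartSockets performs this check.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of the SmartSockets API.
class SpliceTargetsSketch {

    // Size of the predicted range [port...port+5] mentioned in the text.
    static final int PORT_RANGE = 5;

    // Assumed test: an echoed address is usable if it is not private or local.
    static boolean looksPublic(InetAddress a) {
        return !(a.isSiteLocalAddress() || a.isLoopbackAddress()
                || a.isLinkLocalAddress() || a.isAnyLocalAddress());
    }

    // Endpoints to try, given the external address and the observed port mapping.
    static List<InetSocketAddress> candidates(InetAddress external, int mappedPort) {
        List<InetSocketAddress> result = new ArrayList<>();
        if (!looksPublic(external)) {
            return result; // no usable external address: splicing cannot be attempted
        }
        for (int p = mappedPort; p <= mappedPort + PORT_RANGE && p <= 65535; p++) {
            result.add(new InetSocketAddress(external, p));
        }
        return result;
    }

    // Parses a literal IP address without a checked exception (no DNS lookup).
    static InetAddress literal(String ip) {
        try {
            return InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(ip, e);
        }
    }
}
```

For an echoed mapping of port 40000 on a public address, this yields six endpoints (ports 40000 to 40005). For a private address such as 10.0.0.7 the list is empty.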

It is obvious that there are many cases where the splicing

module will not be able to set up a connection. For example,

a machine may be unable to ﬁnd its external address, the

external address may be wrong (e.g., in case of multiple

consecutive NATs), the port range prediction may be wrong,

or the connection attempts may not succeed in time. As

shown in [19], the maximum success rate is approximately

86%. Fortunately, there is a backup solution, the routed

connections module explained below.

Routed.

The last module available is the routed connection mod-

ule. Provided that there is at least a spanning tree connect-

ing the hubs, this module should always be able to create

a connection between two machines, even if these machines

are only connected to non-routed networks.

Whenever a connection is created using the routed con-

nection module, the source sends a connection request to its

local hub using the side channel, containing the target ad-

dress and port. The hub will add this virtual connection in

its administration and forward the request to the next hub

using the routing table described in Section 3.3.1. When the

request reaches the target machine, it will make sure that

the destination port exists and queue the request.

Once the connection is accepted by the application, a re-

ply is sent back via the hubs, and the virtual connection

is established. Both machines now return a virtual socket,

which, instead of sending its data directly to the target ma-

chine, forwards all data through a series of hubs.
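The hop-by-hop forwarding of a routed connection request can be sketched as follows. The Hub type, its next-hop table, and its list of registered virtual connections are assumptions made for illustration; the actual routing table is the one described in Section 3.3.1. For simplicity, the sketch registers the virtual connection on all hubs once the route is complete, rather than while the request is being forwarded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the hub data structures are assumptions, not SmartSockets code.
class RoutedSetupSketch {

    static final class Hub {
        final String name;
        final Map<String, Hub> nextHop = new HashMap<>();       // target hub -> neighbouring hub
        final List<String> administration = new ArrayList<>();  // registered virtual connections
        Hub(String name) { this.name = name; }
    }

    // Forwards a connection request from hub to hub and records the route taken.
    // Returns an empty list when no route to the target hub is known.
    static List<String> forward(Hub local, String targetHub, String connectionId) {
        List<Hub> route = new ArrayList<>();
        Hub current = local;
        while (current != null && route.size() < 64) { // bound guards against routing loops
            route.add(current);
            if (current.name.equals(targetHub)) {
                List<String> names = new ArrayList<>();
                for (Hub h : route) {
                    h.administration.add(connectionId);
                    names.add(h.name);
                }
                return names;
            }
            current = current.nextHop.get(targetHub);
        }
        return new ArrayList<>();
    }

    // Three hubs in a chain: A -> B -> C.
    static List<String> demo() {
        Hub a = new Hub("A");
        Hub b = new Hub("B");
        Hub c = new Hub("C");
        a.nextHop.put("C", b);
        b.nextHop.put("C", c);
        return forward(a, "C", "conn-1");
    }
}
```

Once such a route exists, all data of the virtual connection follows the same series of hubs, which explains why the round-trip time of a routed connection is bounded below by the sum of the hub-to-hub latencies.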

3.3.4 Module Order and Caching

To create a new connection, each module is tried until a

connection is established, or until it is clear that a connec-

tion can not be established at all (e.g., because the destina-

tion port does not exist on the target machine). By default,

the order Direct, Reverse, Splice, Routed is used. This order

prefers modules that produce a direct connection.

When a connection is established, the time required for

subsequent connection setups can be reduced by caching

which module was successful. This information is cached

based on the hub address of the target, and not its machine

address (see Section 3.3.2). Caching in this way allows an

entire site to be represented using a single cache entry (since

machines in a site typically share the same hub). Not only

does this save memory, but it also improves the eﬀectiveness

of the cache. After a connection is created to a single ma-

chine of a site, all other connection setups to the same site

beneﬁt from the cached information. In Section 4 we will

show the beneﬁts of this approach.
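A minimal sketch of the module order and the per-hub cache is given below. The class ModuleCacheSketch and the predicate that stands in for a real connection attempt are hypothetical. The sketch also caches a successful Direct setup and falls back to the default order when a cached module fails; both are simplifying assumptions rather than documented SmartSockets behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch; not part of the SmartSockets API.
class ModuleCacheSketch {

    // Default order: modules that produce a direct connection come first.
    static final List<String> ORDER = Arrays.asList("Direct", "Reverse", "Splice", "Routed");

    // Keyed on the hub address of the target, so one entry covers a whole site.
    private final Map<String, String> cache = new HashMap<>();

    // Log of attempted modules, kept only to make the effect of the cache visible.
    final List<String> attempts = new ArrayList<>();

    // 'works' stands in for an actual connection attempt by the named module.
    String connect(String targetHub, Predicate<String> works) {
        String cached = cache.get(targetHub);
        if (cached != null) {
            attempts.add(cached);
            if (works.test(cached)) {
                return cached;
            }
            cache.remove(targetHub); // assumed fallback: retry in the default order
        }
        for (String module : ORDER) {
            if (module.equals(cached)) {
                continue; // already tried above
            }
            attempts.add(module);
            if (works.test(module)) {
                cache.put(targetHub, module);
                return module;
            }
        }
        return null; // no module could establish a connection
    }
}
```

When only the routed module works, the first setup to a site tries four modules, while every later setup to any machine behind the same hub tries only one.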

3.4 Programming Interface

We will now give a short description of the programming

interface of the virtual connection layer of SmartSockets,

which is currently implemented using Java [2]. Converting

an application to SmartSockets is relatively straightforward,

provided that the application uses a Factory Pattern [16]

to create sockets. The javax.net package of the Java class

libraries contains interfaces for two such factories, one to

create Sockets (outgoing connections), the other to create

ServerSockets (incoming connections). Java also oﬀers im-

plementations which create regular or secure sockets (SSL).

SmartSockets extends the Socket and ServerSocket imple-

mentations of Java, and oﬀers two factories which adhere to

the interfaces described above. This allows SmartSockets to

be plugged in to existing applications by simply changing

the factory implementations that are used.

Unfortunately, the addressing scheme used in connection
setup cannot be replaced this easily. Currently, Java uses
three forms of addressing. The first is based on an InetAddress,
which is hard coded to be either an IPv4 or IPv6
address and cannot be extended. When this scheme is used,
SmartSockets has no other choice than to attempt a direct
connection to the given address. None of the other connection
setup schemes can be used due to a lack of information.

The second addressing scheme is more flexible. It is based
on the abstract SocketAddress class, which is extended by a
VirtualSocketAddress in SmartSockets. Unfortunately, because
Java does not offer a factory to create these SocketAddresses,
most applications explicitly use InetSocketAddresses,
which consist of an InetAddress and a port number.
To make full use of SmartSockets, it is necessary to
replace these with a VirtualSocketAddress. For this purpose,
SmartSockets offers a SocketAddressFactory, which can
be used to create both VirtualSocketAddress and InetSocketAddress
objects. As with the first scheme, SmartSockets is
also backward compatible with InetSocketAddress, although
this may restrict the connectivity.

The third scheme simply uses a String as a machine ad-

dress. Although this string is originally intended to contain

a host name, it can just as easily be used to carry a string

representation of a virtual address. Therefore this mecha-

nism can be used by SmartSockets without any code modiﬁ-

cation. As with the previous schemes, the connectivity may

be restricted when the information in the string is limited.

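import java.io.IOException;
import java.net.Socket;
import javax.net.SocketFactory;
import javax.net.ssl.SSLSocketFactory;
// SmartSocketFactory and SocketAddressFactory are provided by SmartSockets.

class Example {

    // Selects the socket factory that matches the requested type.
    SocketFactory createFactory(String type) {
        if (type.equals("plain"))
            return SocketFactory.getDefault();
        if (type.equals("SSL"))
            return SSLSocketFactory.getDefault();
        if (type.equals("SmartSockets"))
            return SmartSocketFactory.getDefault();
        throw new IllegalArgumentException("Unknown socket type: " + type);
    }

    void run(String type, String address) throws IOException {
        SocketFactory f = createFactory(type);
        SocketAddressFactory a = new SocketAddressFactory();
        Socket s = f.createSocket();
        s.connect(a.createSocketAddress(address));
        // we can now use the socket.
    }
}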

Figure 1: Example Application

In Figure 1, an example is shown that can make use of
regular sockets, SSL, or SmartSockets. By varying the type
parameter of run, a different socket factory can be selected.
Using the SocketAddressFactory provided by SmartSockets
(explained above), the target machine address can be translated
to a SocketAddress in a portable manner, and used for
connection setup.

Table 2: The testbed

Machine     Restrictions (frontend)   Restrictions (nodes)
DAS3-V      MH                        MH
DAS3-D      MH                        MH
Grid5000    MH, FW                    NR
Rockstar    MH, FW                    MH, FW
Hiroshi     MH, FW, NAT               NR
Desktop     NAT                       n/a

MH = multi-homing, FW = firewall, NR = non-routed

4. EVALUATION

In this section we will evaluate the performance of Smart-

Sockets. We use a testbed consisting of 5 diﬀerent clusters

and a desktop machine. The machines have varying connec-

tivity restrictions, shown in Table 2.

The DAS3 system consists of 5 different clusters in the
Netherlands. We will use two: DAS3-V, the cluster at the
Vrije Universiteit Amsterdam, and DAS3-D, the cluster located
at the Delft University of Technology. The Grid5000
system consists of several clusters distributed over 9 sites in
France. We use the cluster located at the University of
Nice-Sophia Antipolis. The Rockstar cluster² is located at the
San Diego Supercomputing Center, University of California,
USA, and the Hiroshi machine is located at the School of
Information Technologies, University of Sydney, Australia.
Finally, Desktop is a single machine located in Haarlem,
The Netherlands. All sites use AMD or Intel processors of
2.0 GHz or faster.

Figure 2: Hub connections on testbed.

Before running the experiments, a hub is started on the

frontend machine of each of the clusters and on the desktop

machine. Each hub is provided with locations of all other

hubs. As explained in Section 3.3.1, each hub attempts to

set up a direct connection to all others. The resulting setup

is shown in Figure 2. Double circles indicate a site with a

ﬁrewall. Sites using NAT are explicitly marked. The ar-

rows between the hubs indicate a connection and show the

direction in which the connection was established. Each ar-

row is annotated with the round-trip time between the sites

(measured with ping). The Hiroshi machine could only be

reached using an SSH-tunnel. This tunnel was automatically
set up by SmartSockets, but only after a suitable SSH

conﬁguration was created on the DAS3-V site.

4.1 Performance

We will start by evaluating the connection setup perfor-

mance, comparing it to the basic socket performance when

possible. The results are shown in Table 3. This table shows

the time required to set up a connection between compute

nodes of each combination of sites. Connections to and from

the desktop machine are also included.

² We would like to acknowledge Frank Seinstra for his assistance
in running experiments on the Rockstar and Hiroshi machines.

The entries are annotated to indicate which connection

style is selected by SmartSockets. This selection is per-

formed automatically and the result is cached. As a result,

the correct connection style is immediately selected for all

but the ﬁrst connection.

As the table shows, a conventional connection could only

be created in 6 out of the 30 combinations. As expected,

SmartSockets selects a direct connection setup in these cases.

The time required by SmartSockets to establish a direct con-

nection is roughly twice that of conventional sockets. This is

caused by the identity check (see Section 3.2) which requires

an extra round trip.

In four cases, SmartSockets selected a reverse connection

setup. These correspond to the combinations of machines

where the source machine is open, but the target is behind

a ﬁrewall or NAT. Reverse connection setup adds an addi-

tional round trip latency to the time required by the direct

connection setup (in the opposite direction). This additional

time is needed to forward a reverse connection request mes-

sage from the source to the target (via the hubs) and to send

an accept message (via the new connection) once the appli-

cation on the target has accepted the incoming connection.

In the other 20 cases, SmartSockets decided to set up a

virtual connection by routing all data via the hubs. In these

cases the connection setup time is dominated by the round

trip time to the machine, as can be seen by comparing the

numbers in Table 3 to those in Table 4. Although it is

possible to use splicing between the Desktop and Rockstar

machines, this method fails occasionally due to the timing

sensitivity of this approach. When this occurs, a virtual

connection is selected instead. Since SmartSockets uses a

cache to remember previous selections, splicing will not be

used afterward.

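Figure 3: Breakdown of connection setup time.

[Figure placeholder. First graph: connection setup time in seconds for the
Reverse, Splice, and Routed mechanisms, each with two bars labeled UC and C,
broken down into direct, reverse, splice, and routed attempts. Second graph:
direct connection setup time in milliseconds for the experiments labeled
No heuristic, Heuristic (NR and R), and Manual, broken down into successful
setup, wrong machine, and non-existent machine.]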

The first graph of Figure 3 shows the impact of the selected
module cache on connection setup time. Only three
connection setup mechanisms are shown. The connection
setup mechanisms are initially tried in the order Direct, Reverse,
Splice, Routed. Since the Direct approach is always
tried first, no caching is needed when it is successful. For
each mechanism, Figure 3 shows two bars. The first shows
the connection setup time when no information is available,
the second shows the time when the correct mechanism can
be retrieved from the cache. A connection timeout of 5 seconds
was specified for both.

Table 3: Connection setup time of SmartSockets (time in milliseconds).

                                       Source
Target     DAS3-V        DAS3-D        Rockstar      Grid5000   Hiroshi   Desktop
DAS3-V     -             4.9 d (2.4)   332 d (166)   68 v       595 v     33 d (17)
DAS3-D     4.9 d (2.4)   -             335 d (167)   70 v       595 v     33 d (18)
Rockstar   500 r         503 r         -             206 v      718 v     182 v
Grid5000   35 v          38 v          206 v         -          593 v     54 v
Hiroshi    630 v         603 v         750 v         670 v      -         640 v
Desktop    49 r          52 r          183 v         84 v       606 v     -

Annotations indicate connection style: d for direct, r for reverse, s for splicing, and v for routed.
When applicable, the connection setup time of regular sockets is shown between brackets.

The reverse connection setup is performed from DAS3-V
to Rockstar. First, about 2.5 seconds are spent in a direct
connection attempt, which only fails after the timeout has
fully expired. This is common behavior when connecting
to a machine behind a firewall which blocks the incoming
connection but sends no reply. As a result, the source machine
has to wait for the timeout. Next, about 0.5 seconds
are needed to set up a reverse connection. All subsequent
connections are created in 0.5 seconds.

For the spliced connection setup we perform a separate ex-

periment connecting the Desktop machine to the Grid5000

frontend. As Figure 3 shows, the direct and reverse con-

nection setup fail after using most of their 1.25 second time

slots. A spliced connection is then created in 0.45 seconds.

The routed connection setup is performed from a DAS3-V
node to a node of Grid5000. The first three connection setup

mechanisms fail, each using 1.25 seconds. A virtual connec-

tion is then established between the machines in 37 millisec-

onds. All subsequent connections are created in 37 millisec-

onds, reducing the connection time by a factor of 102.
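The factor of 102 can be reproduced from the numbers above, assuming that the uncached setup time is simply the sum of the three failed attempts and the final routed setup:

```latex
\[
\frac{t_{\mathrm{uncached}}}{t_{\mathrm{cached}}}
  = \frac{3 \times 1.25\,\mathrm{s} + 0.037\,\mathrm{s}}{0.037\,\mathrm{s}}
  = \frac{3.787}{0.037}
  \approx 102
\]
```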

The second graph of Figure 3 shows the impact of the tar-

get address heuristic on the direct connection setup time be-

tween the DAS3-D and DAS3-V site. DAS3-V has one pub-

lic and two private addresses, DAS3-D has one of both. The

range of one of the private addresses overlaps on both sites.

The ﬁgure shows four experiments, one with the heuristic

turned oﬀ, two with the heuristic turned on, and one where

the correct address was manually selected.

The ﬁrst experiment shows that if no heuristic is used, the

connection setup requires 17 milliseconds, of which 12 are

spent attempting a connection setup to a private address

which does not exist on DAS3-D. Next, about 0.2 millisec-

onds is required to discover that no connection is possible

to the second private address (since no one is listening). Fi-

nally, the correct connection is set up in 4.5 milliseconds.

The second experiment does use the heuristic. In this

experiment there is no process listening on the DAS3-D ma-

chine that shares the private address with the target DAS3-

V machine. Therefore, the ﬁrst connection attempt fails im-

mediately. The correct connection is then established in 5

milliseconds. The third experiment uses a similar setup, but

now there is a process listening on the DAS3-D machine that

shares the private address. As a result, a handshake is re-

quired to discover that a connection has been established to

the wrong machine. This handshake requires approximately

twice the time needed for the failed connection attempt of

the previous experiment. In the last experiment the correct

address is manually selected. As expected, the connection

is set up in 5 milliseconds.
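The role of the identity check in these experiments can be sketched as follows. The network is modelled as a map from addresses to the identity of whichever machine listens there, so that no real connections are needed; the class name, the identity strings, and the example addresses are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of the SmartSockets API.
class IdentityCheckSketch {

    // Tries the candidate addresses in order and returns the first one on which
    // the machine that answers has the expected identity, or null if none does.
    // 'listening' models the network: address -> identity of the listener there.
    static String connect(List<String> candidates, String expectedId,
                          Map<String, String> listening) {
        for (String address : candidates) {
            String peerId = listening.get(address); // stands in for a connection attempt
            if (peerId == null) {
                continue; // nobody is listening: the attempt fails, try the next address
            }
            if (peerId.equals(expectedId)) {
                return address; // handshake confirms the correct machine
            }
            // Handshake reveals the wrong machine: close and try the next address.
        }
        return null;
    }

    // The private address is tried first, but a different machine answers there.
    static String demo() {
        Map<String, String> listening = new HashMap<>();
        listening.put("10.0.0.5", "das3-d-node");   // wrong machine sharing the private address
        listening.put("192.0.2.20", "das3-v-node"); // intended target on its public address
        return connect(Arrays.asList("10.0.0.5", "192.0.2.20"), "das3-v-node", listening);
    }
}
```

Without the comparison of identities, the first candidate would be accepted and the application would silently talk to the wrong machine, which corresponds to the third experiment above.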

Table 4 shows the round-trip time of the connections. In

the six cases where regular sockets can also establish a con-

nection, the round-trip time of SmartSockets and the regu-

lar socket connection is the same. For the other cases, the

round-trip time is approximately the same as the network

round-trip time as measured by ping. The only exception is

the Hiroshi cluster. The frontend of this machine can only

be reached using SSH-tunneling. Unfortunately, the over-

head of this approach is high, roughly doubling the required

round trip time.

Table 5 shows the achievable throughput. As with latency,

the throughput of SmartSockets is similar to that of regular

sockets (where applicable). The performance of the Hiroshi

cluster is limited both by the distance to the other machines

and by the encryption performed by the SSH-tunnel used

to reach it. The performance of the Desktop machine is

limited by its ADSL connection (approximately 3.5 Mbit/s

downstream and 800 KBit/s upstream).

5. RELATED WORK

The system described in [10] can be seen as a predeces-

sor to SmartSockets. It was developed in cooperation with

our group. The focus of this work was mainly on using

splicing to traverse ﬁrewalls. Only a limited form of mes-

sage routing was available (no further than two hops) and

the system did not support multi homing, SSH tunneling, or

reverse connection setup. To allow the side-channel commu-

nication through ﬁrewalls, the system needed gateway nodes

with access to the networks inside and outside of the ﬁrewall

(e.g. by using an open port range). Machines could then

connect to such a gateway whenever side-channel commu-

nication was necessary with a machine behind the ﬁrewall.

Instead, in SmartSockets the hubs use outgoing connections

from behind the ﬁrewall. This is easier to set up and it does

not require any open ports.

Generic Connection Brokering (GCB) [28] can serve as a

replacement for traditional sockets, provided that the ap-

plication follows certain programming guidelines. When a

GCB client creates a socket for listening, this socket is reg-

istered at a GCB server. This server must be located in a

publicly accessible network. Similar to NAT, the server then

creates a socket with a public address that acts as a proxy

for the private or ﬁrewalled client socket. This public ad-

dress is then returned to the client to be used as the ‘oﬃcial’

address of the client socket. When any non-GCB client tries

to connect to this address, it will reach the proxy instead.

The server will then forward this incoming connection to the

client and relay any subsequent data. Connections created

by GCB-aware clients will be forwarded to the server instead

of the proxy (by replacing the port number in the client ad-

dress with a well-known server port). This allows the server

to mediate in the connection setup between the two clients

and, depending on their connectivity restrictions, instruct

them to use a direct connection, reverse the connection or-

der, or use the server itself as a relay.

Table 4: Roundtrip latency of SmartSockets (time in milliseconds).

                                    Source
Target     DAS3-V      DAS3-D      Rockstar    Grid5000   Hiroshi   Desktop
DAS3-V     -           2.3 (2.3)   166 (166)   56         528       14 (14)
DAS3-D     2.3 (2.3)   -           167 (167)   57         533       15 (15)
Rockstar   166         167         -           205        590       195
Grid5000   56          57          205         -          524       50
Hiroshi    528         529         590         522        -         539
Desktop    14          15          190         43         522       -

When applicable, the roundtrip latency of regular sockets is shown between brackets.

Table 5: Throughput of SmartSockets (in Mbit/second).

                                    Source
Target     DAS3-V      DAS3-D      Rockstar    Grid5000   Hiroshi   Desktop
DAS3-V     -           182 (183)   2.6 (2.5)   2.5        0.25      0.65 (0.65)
DAS3-D     185 (186)   -           2.6 (2.5)   2.6        0.26      0.65 (0.65)
Rockstar   2.8         2.7         -           6.9        0.23      0.65
Grid5000   7.6         8.2         2.4         -          0.20      0.65
Hiroshi    0.73        0.73        0.70        0.73       -         0.61
Desktop    3.3         3.3         2.2         2.2        0.25      -

When applicable, the throughput of regular sockets is shown between brackets.

Although GCB is similar to SmartSockets, there are some

signiﬁcant diﬀerences. Because GCB clients directly connect

to the (remote) GCB server representing the target client,

this server must be on a publicly accessible network. Also,

outgoing connectivity is required on all client nodes. GCB

only supports two hop message routing, and does not have

support for multi homing on the client nodes. In Smart-

Sockets the hubs are not required to be publicly accessi-

ble, but instead, the hubs must be capable of setting up a

spanning tree. It is generally not a problem when a subset

of the hubs can only use outgoing connections or can only

be reached through SSH tunneling. Outgoing connectivity

is not required for the clients, since they can route their

connections over the hubs, using multiple hops if necessary.

Unlike SmartSockets, GCB does have support for incoming

legacy TCP connections.

Many projects attempt to solve the connectivity problems

by using peer-to-peer overlay networks. In WOW [18] vir-

tual machines are combined with peer-to-peer techniques to

create a virtual cluster. By running VMware [31] on all

machines a uniform system image can be provided to the

applications. All traﬃc to the (virtual) network device is

intercepted and routed to the target using the IPOP [17]

overlay network. To the application the system appears as

a single cluster using a local area network with private ad-

dresses. The system also supports transparent migration of

virtual machines. The advantage of this approach is that no

changes are required to the application. It is a heavyweight

solution, however. Instead of just deploying the application

to the Grid sites, VMware must also be deployed, including

a copy of the required operating system. In addition, all net-

work traﬃc is routed using the overlay network, even when

two machines are located in the same site. The experiments

in [18] show that this limits the network bandwidth in a

single site to 12.5 MBit/s. Even when IPOP [17] is used di-

rectly by the application, the network bandwidth is reduced

to 61% on a local-area network, and 51% on a wide-area

network. VNET [32] is similar to WOW, but uses tunneling

instead of peer-to-peer techniques to forward network traﬃc.

Like WOW, VNET shows signiﬁcant performance degrada-

tion in both local and wide-area experiments. VIOLIN [22]

and ViNe [33] propose similar solutions. In [25], this per-

formance degradation is solved by only using peer-to-peer

techniques for resource discovery and allocation. The ap-

plications use regular sockets instead, thereby signiﬁcantly

improving the performance, but also reintroducing the con-

nectivity problems described in this paper.

SmartSockets uses a combination of the approaches de-

scribed above. It prefers to create direct connections and

only uses routing or tunneling as a last resort. This results

in a performance on par with regular sockets when possible,

but also oﬀers improved connectivity when it is needed.

Several mechanisms exist that allow two machines using

NAT to set up a communication channel. STUN [27] only

allows the exchange of UDP messages. STUNT [19, 20]

and NATBlaster [6] support TCP, but require access to raw

sockets, for which special user privileges are needed. All

three use external servers to provide information on address

and port translation. Both STUN and NATBlaster use port

range prediction. The system described in [13] does not re-

quire raw sockets, but does not use port range prediction

during connection setup. As shown in [19], the connection

setup success rate of this approach increased from 45% to

84% when port range prediction was added. Since Smart-

Sockets uses the same mechanism as [13], but includes port

range prediction, we also expect the connection setup suc-

cess rate to be around 84%.

Although SmartSockets was initially designed to increase

the connectivity, it is also used to do the exact opposite. By

extending the handshake performed in the direct connection

layer with a check that selectively refuses connections based

on the address of the source machine, a simple ﬁrewall can

be simulated. This allows a complex network with limited

connectivity to be simulated on a single cluster. In [11], this

mechanism is used to evaluate the eﬀectiveness of peer-to-

peer gossiping techniques when machines have limited con-

nectivity. Based on these experiments, the authors propose

a new gossiping algorithm, Actualized Robust Random Gos-

siping (ARRG), that outperforms existing algorithms in sit-

uations where the network connectivity is restricted.
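The simulated firewall mentioned above amounts to one extra check during the handshake. A minimal sketch is shown below, assuming that rules are expressed as textual prefixes of dotted IPv4 addresses; the class name and the rule format are hypothetical and do not describe the actual SmartSockets configuration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not part of the SmartSockets API.
class SimulatedFirewallSketch {

    // Textual prefixes of source addresses that may connect, e.g. "10.0.1.".
    private final List<String> allowedPrefixes;

    SimulatedFirewallSketch(String... allowedPrefixes) {
        this.allowedPrefixes = Arrays.asList(allowedPrefixes);
    }

    // Would be evaluated during the handshake, before the connection is
    // handed to the application; a refused connection is simply closed.
    boolean accepts(String sourceAddress) {
        for (String prefix : allowedPrefixes) {
            if (sourceAddress.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With a single rule "10.0.1.", a connection from 10.0.1.17 is accepted and one from 10.0.2.4 is refused, which is sufficient to split one physical cluster into several simulated sites.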

In MOB [9], the multi homing support of SmartSockets is

used to run cluster to cluster multicast experiments. Smart-

Sockets automatically selects the fast local network for intra-

cluster traﬃc, while only using the regular ethernet between

clusters. This project also uses the ﬁrewall simulation de-

scribed above to divide a single cluster into several smaller

ones. By forcing all communication to be routed via a small

number of machines, shared links between the clusters can

be simulated. By artiﬁcially reducing the bandwidth or in-

creasing the latency on those shared links, the robustness of

the multicast algorithms can be tested.

6. CONCLUSIONS

In this paper we have introduced an integrated framework,

called SmartSockets, which is capable of solving the connec-

tivity restrictions found on many Grid sites, with very little

help from the user. We have shown that by using caching,

the connection setup time can be reduced signiﬁcantly. In 30

connection setup experiments, using 6 diﬀerent sites world-

wide, our framework was always capable of creating a con-

nection, requiring a maximum time of 750 milliseconds once

the necessary information was cached. A conventional con-

nection could only be created in 6 out of the 30 combina-

tions. By preferring direct connections, the bandwidth and

latency oﬀered by SmartSockets is similar to that of a con-

ventional connection (in the situations where a conventional

connection can be created).

By using a heuristic that prefers suitable private addresses

of a target machine during connection setup, a fast local net-

work is often selected for intra-site communication, thereby

potentially improving the application performance. As we

have shown, it is essential that an identity check is performed

to prevent a connection to a wrong machine.

So far we have only shown the results of low-level per-

formance benchmarks. The following step in our work will

be to further evaluate the performance and scalability of

SmartSockets using a wide range of parallel and distributed

applications. We are also planning to improve the through-

put on long distance connections.

7. REFERENCES

[1] Netfilter. http://www.netfilter.org.

[2] SUN Java 5.0. http://java.sun.com.

[3] The Grid5000 system. http://www.grid5000.fr.

[4] The InfiniBand Trade Alliance architecture.
http://www.infinibandta.org.

[5] Universal Plug and Play (UPnP). http://www.upnp.org.

[6] A. Biggadike, D. Ferullo, G. Wilson, and A. Perrig.

NATBlaster: Establishing TCP connections between hosts

behind NATs. In Proc. of ACM SIGCOMM Asia

Workshop, April 2005.

[7] N. Boden, D. Cohen, R. Felderman, A. K. and C. L. Seitz,

J. Seizovic, and W. Su. Myrinet: A Gigabit-per-second Local

Area Network. IEEE Micro, 15(1):29–36, Jan. 1995.

[8] D. Caromel, C. Delbe, A. di Costanzo, and M. Leyton.

ProActive: an Integrated Platform for Programming and

Running Applications on Grids and P2P systems.

Computational Methods in Science and Technology, 12, 2006.

[9] M. den Burger and T. Kielmann. ”MOB: Zero-conﬁguration

High-throughput Multicasting for Grid Applications”. In Proc.

of the 16th International Symposium on High-Performance

Distributed Computing (HPDC-16), Monterey, California,

USA, June 2007. Accepted for publication.

[10] A. Denis, O. Aumage, R. F. H. Hofman, K. Verstoep,

T. Kielmann, and H. E. Bal. Wide-area Communication for

Grids: An Integrated Solution to Connectivity, Performance

and Security Problems. In Proc. of the 13th International

Symposium on High-Performance Distributed Computing

(HPDC-13), pages 97–106, Honolulu, Hawaii, USA, June 2004.

[11] N. Drost, E. Ogston, R. V. van Nieuwpoort, and H. E. Bal.

”ARRG: Real-world Gossiping”. In Proc. of the 16th

International Symposium on High-Performance Distributed

Computing (HPDC-16), Monterey, California, USA, June

2007. Accepted for publication.

[12] K. Egevang and P. Francis. The IP Network Address

Translator (NAT). RFC 1631, May 1994. Obsoleted by RFC

3022.

[13] B. Ford, D. Kegel, and P. Srisuresh. Peer-to-peer

Communication Across Network Address Translators. In

Proceedings of the 2005 USENIX Technical Conference, 2005.

[14] P. Francis. Is The Internet Going Nutss? IEEE Internet

Computing, 7(6):94–96, 2003.

[15] N. Freed. Behavior of and Requirements for Internet Firewalls.

RFC 2979, Oct. 2000.

[16] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design

Patterns: Elements of Reusable Object-Oriented Software.

Addison Wesley, Reading (MA), USA, 1995.

[17] A. Ganguly, A. Agrawal, P. O. Boykin, and R. Figueiredo. IP

over P2P: Enabling Self-conﬁguring Virtual IP Networks for

Grid Computing. In Proc. of 20th International Parallel and

Distributed Processing Symposium (IPDPS-2006), April 2006.

[18] A. Ganguly, A. Agrawal, P. O. Boykin, and R. Figueiredo.

WOW: Self-organizing Wide Area Overlay Networks of Virtual

Workstations. In Proc. of the 15th International Symposium

on High-Performance Distributed Computing (HPDC-15),

Paris, France, June 19-23 2006.

[19] S. Guha and P. Francis. Characterization and Measurement of

TCP traversal Through NATs and Firewalls. In Proc. of

Internet Measurement Conference (IMC), 2005.

[20] S. Guha, Y. Takeda, and P. Francis. Nutss: a SIP-based

Approach to UDP and TCP Network Connectivity. In FDNA

’04: Proceedings of the ACM SIGCOMM workshop on
Future directions in network architecture, pages 43–48, New

York, NY, USA, 2004. ACM Press.

[21] T. Hain. Architectural Implications of NAT. RFC 2993, Nov.

2000.

[22] X. Jiang and D. Xu. VIOLIN: Virtual Internetworking on
Overlay Infrastructure. In Proc. of the 2nd International

Symposium on Parallel and Distributed Processing and

Applications., December 2004.

[23] P. Leach, M. Mealling, and R. Salz. A Universally Unique

IDentiﬁer (UUID) URN Namespace. RFC 4122, July 2005.

[24] M. Leech, M. Ganis, Y. Lee, R. Kuris, D. Koblas, and

L. Jones. SOCKS Protocol Version 5. RFC 1928, Mar. 1996.

[25] Z. Pan, X. Ren, R. Eigenmann, and D. Xu. Executing MPI

Programs on Virtual Machines in an Internet Sharing System.

In Proc. of 20th International Parallel and Distributed

Processing Symposium (IPDPS-2006), April 2006.

[26] Y. Rekhter, B. Moskowitz, D. Karrenberg, G. J. de Groot, and

E. Lear. Address Allocation for Private Internets. RFC 1918,

Feb. 1996.

[27] J. Rosenberg, J. Weinberger, C. Huitema, and R. Mahy. STUN

- Simple Traversal of User Datagram Protocol (UDP) Through

Network Address Translators (NATs). RFC 3489, Mar. 2003.

[28] S. Son and M. Livny. Recovering Internet Symmetry in

Distributed Computing. In Proceedings of the 3rd

International Symposium on Cluster Computing and the

Grid, May 2003.

[29] P. Srisuresh and K. Egevang. Traditional IP Network Address

Translator (Traditional NAT). RFC 3022, Jan. 2001.

[30] P. Srisuresh, J. Kuthan, J. Rosenberg, A. Molitor, and

A. Rayhan. Middlebox Communication Architecture and

Framework. RFC 3303, Aug. 2002.

[31] J. Sugerman, G. Venkitachalam, and B. Lim. Virtualizing I/O

Devices on VMware Workstation’s Hosted Virtual Machine

Monitor. In Proc. of the USENIX Annual Technical

Conference, June 2001.

[32] A. Sundararaj and P. Dinda. Towards Virtual Networks

for Virtual Machine Grid Computing. In Proc. of the 3rd

USENIX Virtual Machine Research and Technology

Symposium (VM 2004), 2004.

[33] M. Tsugawa and J. A. Fortes. A Virtual Network (ViNe)

Architecture for Grid Computing. In Proc. of 20th

International Parallel and Distributed Processing Symposium

(IPDPS-2006), April 2006.

[34] R. V. van Nieuwpoort, J. Maassen, A. Agapi, A. M. Oprescu,

and T. Kielmann. Experiences Deploying Parallel Applications

on a Large-scale Grid. In Proc. of EXPGRID - Experimental

Grid Testbeds for the Assessment of Large-scale Distributed

Applications and Tools. Workshop in conjunction with

HPDC-15, Paris, France, June 2006.

