Table 2. HOMME speedup over the built-in motherboard NIC, by interconnect, with and without HT enabled

Source publication
Article
Remote Direct Memory Access (RDMA) is an effective technology for reducing system load and improving performance. Recently, Ethernet offerings that exploit RDMA technology have become available that can potentially provide a high-performance fabric for MPI communications at lower cost than other competing technologies. The goal of this paper is to...

Context in source publication

Context 1
... gathered timing data for 4- and 8-processor runs, with 5 trials for each. The average time per time-step was used to compute the relative speedup over the motherboard NIC, as listed in Table 2. ...
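As a concrete illustration of that calculation, the sketch below averages the per-time-step wall times over the five trials for each interconnect and divides the baseline mean by the candidate's mean. This is a minimal sketch in C; the timing values and the two-NIC setup are placeholders for illustration, not measured data from the paper.

```c
#include <stdio.h>

#define TRIALS 5

/* Average a set of per-time-step wall times (seconds). */
static double mean(const double *t, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += t[i];
    return sum / n;
}

int main(void) {
    /* Hypothetical per-time-step times for 5 trials each. */
    double baseline[TRIALS] = {1.20, 1.22, 1.19, 1.21, 1.20}; /* motherboard NIC */
    double rdma[TRIALS]     = {0.95, 0.96, 0.94, 0.95, 0.96}; /* RDMA-capable NIC */

    /* Speedup relative to the motherboard NIC baseline. */
    double speedup = mean(baseline, TRIALS) / mean(rdma, TRIALS);
    printf("relative speedup over motherboard NIC: %.2fx\n", speedup);
    return 0;
}
```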

Citations

... Even small amounts of loss or reordering can have a detrimental impact on RoCE performance. Note that using RoCE also requires RoCE-capable network interface cards (NICs), such as the Mellanox adapters used in our evaluation [1], [28]. ...
... A number of other RDMA and zero-copy protocols not involving InfiniBand (IB) have been proposed to run over Ethernet. These include technologies such as Intel's Direct Ethernet Transport (DET) [6] and approaches that use iWARP-enabled NICs [14], [28]. Compared to RoCE and InfiniBand, DET does not provide full OS-bypass functionality and has limited hardware support, while iWARP remains bound to the limitations of TCP/IP. ...
Conference Paper
Data set sizes are growing exponentially, so it is important to use the most efficient data movement protocols available. Most data movement tools today rely on TCP over sockets, which limits flows to around 20 Gbps on today's hardware. RDMA over Converged Ethernet (RoCE) is a promising new technology for high-performance network data movement with minimal CPU impact over circuit-based infrastructures. We compare the performance of TCP, UDP, UDT, and RoCE over high-latency 10 Gbps and 40 Gbps network paths, and show that RoCE-based data transfers can fill a 40 Gbps path using much less CPU than other protocols. We also show that the Linux zero-copy system calls can improve TCP performance considerably, especially on current Intel “Sandy Bridge”-based PCI Express 3.0 (Gen3) hosts.
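One of the Linux zero-copy system calls this abstract alludes to is sendfile(2), which hands file data directly to the socket inside the kernel, avoiding a copy through user space. The sketch below shows the call in isolation; the connected TCP socket descriptor is assumed to have been set up elsewhere, and error handling is kept minimal.

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/sendfile.h>

/* Transmit a file over an already-connected TCP socket with
 * sendfile(2). Data moves kernel-to-kernel, never copied into
 * a user-space buffer. */
int send_file_zero_copy(int sock_fd, const char *path) {
    int file_fd = open(path, O_RDONLY);
    if (file_fd < 0) { perror("open"); return -1; }

    struct stat st;
    if (fstat(file_fd, &st) < 0) { perror("fstat"); close(file_fd); return -1; }

    off_t offset = 0;
    while (offset < st.st_size) {
        ssize_t sent = sendfile(sock_fd, file_fd, &offset,
                                st.st_size - offset);
        if (sent <= 0) { perror("sendfile"); close(file_fd); return -1; }
    }
    close(file_fd);
    return 0;
}
```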
... Other RDMA and zero-copy protocols not involving IB have been proposed to run over Ethernet. These include technologies such as Intel's Direct Ethernet Transport (DET) [6] and approaches that use iWARP-enabled NICs [12], [30]. Compared to RoCE and IB, DET does not provide full OS-bypass functionality and has limited hardware support, while iWARP remains bound to the limitations of TCP/IP. ...
Conference Paper
The use of zero-copy RDMA is a promising area of development in support of high-performance data movement over wide-area networks. In particular, the emerging RDMA over Converged Ethernet (RoCE) standard enables the InfiniBand transport for use over existing and widely deployed network infrastructure. In this paper, we evaluate the use of RDMA over Ethernet in two deployment scenarios: 1) a gateway approach that adapts standard application connections to an RDMA-based protocol for transmission over wide-area network paths, and 2) the integration of our RDMA implementation into GridFTP, a popular data transfer tool for distributed computing. We evaluate both approaches over a number of wide-area network conditions emulated using a commercial network emulation device, and we analyze the overhead of our RDMA implementations from a systems perspective. Our results show a significant increase in network utilization and performance when using RDMA over high-latency paths with a reduced CPU and memory I/O footprint on our gateways and end host applications.
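To make the "zero-copy RDMA" requirement concrete, the sketch below shows the memory registration step that any verbs-based transfer, including the RoCE paths evaluated above, must perform: the buffer is pinned and the adapter is granted DMA access to it. Device selection and queue-pair setup are omitted; this is a minimal illustration, not the paper's implementation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void) {
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no RDMA devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    if (!ctx || !pd) { fprintf(stderr, "device setup failed\n"); return 1; }

    size_t len = 1 << 20;                    /* 1 MiB transfer buffer */
    void *buf = malloc(len);

    /* Pin the buffer and grant the NIC DMA access to it. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) { perror("ibv_reg_mr"); return 1; }

    /* mr->lkey / mr->rkey are what local work requests and remote
     * peers use to reference this buffer in RDMA operations. */
    printf("registered %zu bytes, rkey=0x%x\n", len, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```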
... Many researchers have worked to improve this situation. Michael Oberg et al. [19] evaluated RDMA over Gigabit Ethernet as a potential Linux cluster interconnect. In that paper, they describe the Ammasso Gigabit Ethernet RDMA technology, which uses a custom protocol to wrap RDMA operations in TCP/IP packets and send them over Ethernet frames. ...
Conference Paper
Though convergence has been a buzzword in the networking industry for some time now, no vendor has successfully brought out a solution that combines the ubiquitous nature of Ethernet with the low-latency, high-performance capabilities that InfiniBand offers. Most of the overlay protocols introduced in the past have had to bear some form of performance trade-off or overhead. Recent advances in InfiniBand interconnect technology have allowed vendors to come out with a new model for network convergence: RDMA over Ethernet (RDMAoE). In this model, IB packets are encapsulated into Ethernet frames, allowing them to be transmitted seamlessly over an Ethernet network. The job of translating InfiniBand addresses to Ethernet addresses and back is handled by the InfiniBand HCA. This model allows end users access to large computational clusters through ubiquitous Ethernet interconnect technology while retaining the high-performance, low-latency guarantees that InfiniBand provides. In this paper, we present a detailed evaluation and analysis of the new RDMAoE protocol as compared to earlier overlay protocols as well as native-IB and socket-based implementations. Through these evaluations, we also examine whether RDMAoE brings us closer to the eventual goal of network convergence. The experimental results obtained with verbs-, MPI-, application-, and data-center-level evaluations show that RDMAoE is capable of providing performance comparable to native-IB applications on a standard 10 GigE network.
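A point worth making concrete from this abstract is that RDMAoE/RoCE is transparent at the verbs API level: applications issue the same calls as on native IB, and the Ethernet-side addressing surfaces as a port GID that the HCA maps to MAC addresses when encapsulating IB packets in Ethernet frames. The sketch below queries that GID; the port number and GID index are assumptions for illustration, and real code would enumerate ports.

```c
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void) {
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no RDMA devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);

    /* Query GID index 0 on port 1; on a RoCE/RDMAoE port this GID
     * typically embeds the adapter's Ethernet MAC address. */
    union ibv_gid gid;
    if (ibv_query_gid(ctx, 1 /* port */, 0 /* gid index */, &gid)) {
        perror("ibv_query_gid");
        return 1;
    }

    printf("port 1 GID[0]: ");
    for (int i = 0; i < 16; i++)
        printf("%02x%s", gid.raw[i], i == 15 ? "\n" : (i % 2 ? ":" : ""));

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```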