Fig 2
Performance Comparison of TCP vs. Unix Domain Sockets as a Function of Message Size.


Source publication
Conference Paper
Full-text available
This paper presents the design and implementation of XenSocket, a UNIX-domain-socket-like construct for high-throughput interdomain (VM-to-VM) communication on the same system. The design of XenSocket replaces the Xen page-flipping mechanism with a static circular memory buffer shared between two domains, wherein information is written by one domain...
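To make the core idea concrete, the following is a minimal user-space sketch of such a circular buffer, assuming POSIX shared memory in place of Xen grant-table pages; the structure layout, the 1 MiB ring size, and the function names are illustrative assumptions, not the XenSocket kernel implementation.

```c
/* Illustrative sketch only: a single-producer/single-consumer circular
 * buffer in POSIX shared memory. XenSocket itself is a kernel module that
 * shares pages via Xen grant tables; names and sizes here are hypothetical. */
#include <fcntl.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define RING_SIZE (1u << 20)            /* 1 MiB data area (power of two) */

struct ring {
    _Atomic uint32_t head;              /* next byte the reader consumes */
    _Atomic uint32_t tail;              /* next byte the writer fills    */
    uint8_t data[RING_SIZE];
};

/* Map (and optionally create) the shared ring under a POSIX shm name. */
static struct ring *ring_attach(const char *name, int create)
{
    int fd = shm_open(name, create ? (O_CREAT | O_RDWR) : O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    if (create && ftruncate(fd, sizeof(struct ring)) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct ring), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : (struct ring *)p;
}

/* Writer side: copy as many bytes as currently fit, return bytes written. */
static size_t ring_write(struct ring *r, const void *buf, size_t len)
{
    uint32_t head = atomic_load(&r->head);
    uint32_t tail = atomic_load(&r->tail);
    size_t free_space = RING_SIZE - (size_t)(tail - head);
    size_t n = len < free_space ? len : free_space;

    for (size_t i = 0; i < n; i++)
        r->data[(tail + i) & (RING_SIZE - 1)] = ((const uint8_t *)buf)[i];
    atomic_store(&r->tail, tail + (uint32_t)n);   /* publish after copying */
    return n;
}

/* Reader side: drain up to len bytes, return bytes read. */
static size_t ring_read(struct ring *r, void *buf, size_t len)
{
    uint32_t head = atomic_load(&r->head);
    uint32_t tail = atomic_load(&r->tail);
    size_t avail = (size_t)(tail - head);
    size_t n = len < avail ? len : avail;

    for (size_t i = 0; i < n; i++)
        ((uint8_t *)buf)[i] = r->data[(head + i) & (RING_SIZE - 1)];
    atomic_store(&r->head, head + (uint32_t)n);
    return n;
}
```

Keeping separate head and tail indices lets the writer and reader make progress without locking each other, which is the property that allows a static shared ring to avoid per-message page flipping.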

Contexts in source publication

Context 1
... are assuming that, because the Virtual Machine Monitor (VMM) is much smaller than a modern monolithic kernel, it is therefore much harder to break. For example, Figure 2 shows the transport throughput of two guest domains on the same machine communicating through a TCP connection. For comparison, the figure also shows the throughput of two Unix processes communicating through a UNIX domain socket stream on a native Linux system. ...
Context 2
... these optimizations the authors achieve a maximum receive throughput of 970 Mb/s and transmit throughput of 3310 Mb/s. While these improvements are noteworthy, the performance of the resulting system still falls short of that of Unix Domain Sockets (over 10,000 Mb/s, see Figure 2). ...
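The comparison in Figure 2 boils down to a simple streaming micro-benchmark. The sketch below is an assumed, minimal version of such a measurement over a UNIX domain socket pair; the 1 GiB transfer volume, the default 4 KiB message size, and the overall setup are illustrative choices rather than the paper's methodology, and a TCP loopback variant would differ only in how the sockets are created.

```c
/* Minimal throughput sketch: stream a fixed volume of data over a UNIX
 * domain socket pair and report Mb/s for a given message size.  Partial
 * writes are ignored for brevity; constants are arbitrary. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    size_t msg_size = argc > 1 ? (size_t)atol(argv[1]) : 4096;
    size_t total    = 1UL << 30;                  /* move 1 GiB in total */
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        perror("socketpair");
        return 1;
    }

    pid_t pid = fork();
    if (pid == 0) {                               /* child: drain the socket */
        char *rbuf = malloc(msg_size);
        close(sv[0]);
        while (read(sv[1], rbuf, msg_size) > 0)
            ;
        _exit(0);
    }

    close(sv[1]);
    char *buf = calloc(1, msg_size);
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (size_t sent = 0; sent < total; sent += msg_size)
        if (write(sv[0], buf, msg_size) < 0) {
            perror("write");
            return 1;
        }
    close(sv[0]);                                 /* EOF lets the child exit */
    wait(NULL);
    gettimeofday(&t1, NULL);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("msg=%zu B  throughput=%.0f Mb/s\n",
           msg_size, total * 8.0 / secs / 1e6);
    return 0;
}
```

Running the same loop once over a socketpair and once over a TCP connection to 127.0.0.1, while sweeping the message size, produces the kind of curves shown in Figure 2.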

Citations

... We note that these overheads are incurred regardless of the type of network interface being used, loopback or remote. Zhang et al. [27] and Redis [26] report a significant throughput improvement when switching from TCP/IP loopback to UDS. ...
Preprint
Despite the stringent requirements of a real-time system, the reliance of the Robot Operating System (ROS) on the loopback network interface imposes a considerable overhead on the transport of high bandwidth data, while the nodelet package, which is an efficient mechanism for intra-process communication, does not address the problem of efficient local inter-process communication (IPC). To remedy this, we propose a novel integration into ROS of smart pointers and synchronisation primitives stored in shared memory. These obey the same semantics and, more importantly, exhibit the same performance as their C++ standard library counterparts, making them preferable to other local IPC mechanisms. We present a series of benchmarks for our mechanism - which we call LOT (Low Overhead Transport) - and use them to assess its performance on realistic data loads based on Five's Autonomous Vehicle (AV) system, and extend our analysis to the case where multiple ROS nodes are running in Docker containers. We find that our mechanism performs up to two orders of magnitude better than the standard IPC via local loopback. Finally, we apply industry-standard profiling techniques to explore the hotspots of code running in both user and kernel space, comparing our implementation against alternatives.
... Homogeneous partitions are created on top of an inhomogeneous FPGA fabric in order to abstract from the location, size, and access to the physical hardware. The RC2F components communicate with a host hypervisor (Xen) via vChan [46]. FPGAVirt, a virtIO-based framework for communication between VMs and FPGAs presented in [47], uses the in-kernel network stack for transferring packets to FPGAs and so reduces the overhead of context switches between VMs and the host address space. ...
Conference Paper
The increase of size, capabilities, and speed of FPGAs enables the shared usage of reconfigurable resources by multiple applications and even operating systems. While research on FPGA virtualization in HPC datacenters and the cloud is already well advanced, it is a rather new concept for embedded systems. The necessity for FPGA virtualization of embedded systems results from the trend to integrate multiple environments into the same hardware platform. As multiple guest operating systems with different requirements, e.g., regarding real-time, security, safety, or reliability, share the same resources, the focus of research lies on isolation under the constraint of having minimal impact on the overall system. Drivers for this development are, e.g., computation-intensive AI-based applications in the automotive or medical field, embedded 5G edge computing systems, or the consolidation of electronic control units (ECUs) on a centralized MPSoC with the goal to increase reliability by reducing complexity. This survey outlines key concepts of hypervisor-based virtualization of embedded reconfigurable systems. Hypervisor approaches are compared and classified into FPGA-based hypervisors, MPSoC-based hypervisors, and hypervisors for distributed embedded reconfigurable systems. Strong points and limitations are pointed out, and future trends for virtualization of embedded reconfigurable systems are identified.
... Several approaches to improving performance of communication between co-located virtual machines have been described [57][58][59], all focusing on Xen. These solve similar communication inefficiencies as Slipstream, but either require application modification [58], guest kernel modification [57][58][59], are not fully automatic [57,58], or operate at the IP layer so TCP overheads are not eliminated [59]. ...
Thesis
Managing the overwhelming complexity of software is a fundamental challenge because complexity is the root cause of problems regarding software performance, size, and security. Complexity is what makes software hard to understand, and our ability to understand software in whole or in part is essential to being able to address these problems effectively. Attacking this overwhelming complexity is the fundamental challenge I seek to address by simplifying how we write, organize, and think about programs. Within this dissertation I present a system of tools and a set of solutions for improving the nature of software by focusing on the programmer's desired outcome, i.e., their intent. At the program level, the conventional focus, it is impossible to identify complexity that, at the system level, is unnecessary. This "accidental complexity" includes everything from unused features to independent implementations of common algorithmic tasks. Software techniques driving innovation simultaneously increase the distance between what is intended by humans – developers, designers, and especially the users – and what the executing code does in practice. By preserving the declarative intent of the programmer, which is lost in the traditional process of compiling, linking, and building software, it is easier to abstract away unnecessary details. The Slipstream, ALLVM, and software multiplexing methods presented here automatically reduce the complexity of programs while retaining their intended function. This results in improved performance at both startup and run time, as well as reduced disk and memory usage.
... Research on IO resource scheduling has a long history [10], [23]-[27]. Network IO optimization between VMs mainly includes XenLoop [23], XenSocket [24], and XWay [25], which are different network communication optimization schemes based on shared memory. Mandal et al. [26] proposed a data performance evaluation system for network flows to estimate the performance impact of bandwidth on IO. ...
... The RNN was trained for 15 min (steps 5-13). The time complexity of the remaining part (steps 15-29) is O(N + M). ...
Article
The use of virtualization technology in industrial control is increasing. However, virtual instances or virtual machines (VMs) are confronted with unreasonable resource allocation in the industrial control cyber range, thereby highlighting the increasing importance of resource scheduling. In the Xen open source system, the response time of IO-intensive tasks is prolonged because the system does not distinguish between CPU- and IO-intensive tasks. Therefore, this study improves task performance through the hybrid scheduling of CPU and network IO resources. This method uses Cap- and Timeslice-scheduling algorithms for CPU resource scheduling. First, the Cap-scheduling algorithm uses historical data to train a recurrent neural network model for classification and then utilizes a heuristic method to set the upper limit of the cap value for VMs. Second, the Timeslice-scheduling algorithm adjusts the timeslice based on Q-learning to shorten the execution time of the overall tasks. This study also proposes an IOb-scheduling algorithm for network bandwidth scheduling: by monitoring the bandwidth usage of each VM, the portion of a VM's allocation that does not exceed the average bandwidth is reclaimed and distributed to other VMs, thereby improving the utilization of bandwidth. Experimental results showed that the proposed CPU/IO scheduling algorithms improved the overall benchmark performance substantially.
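As a point of reference for the Q-learning component mentioned above, the sketch below shows a generic tabular Q-learning update in C; the state and action encodings, learning rate, discount factor, and reward are hypothetical placeholders, not the article's actual scheduler design.

```c
/* Generic tabular Q-learning, sketched to illustrate the kind of learning
 * loop a Q-learning-based timeslice scheduler relies on.  The state/action
 * meanings below are made up for illustration. */
#define N_STATES  8      /* e.g. coarse buckets of observed IO latency   */
#define N_ACTIONS 3      /* e.g. shrink, keep, or grow the timeslice     */

static double Q[N_STATES][N_ACTIONS];
static const double alpha  = 0.1;   /* learning rate   */
static const double gamma_ = 0.9;   /* discount factor */

/* One update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) */
static void q_update(int s, int a, double reward, int s_next)
{
    double best_next = Q[s_next][0];
    for (int a2 = 1; a2 < N_ACTIONS; a2++)
        if (Q[s_next][a2] > best_next)
            best_next = Q[s_next][a2];
    Q[s][a] += alpha * (reward + gamma_ * best_next - Q[s][a]);
}

/* Greedy action choice for a state (exploration omitted for brevity). */
static int q_choose(int s)
{
    int best = 0;
    for (int a = 1; a < N_ACTIONS; a++)
        if (Q[s][a] > Q[s][best])
            best = a;
    return best;
}
```

In a scheduler setting, the reward would typically be derived from the measured task completion time after applying the chosen timeslice adjustment.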
... The frontend FIFOs and the FPGA memories are mapped to device files inside the host hypervisor. There, the system forwards the user devices to the assigned VM using inter-domain communication based on vChan from Zhang et al. [48] in our Xen virtualized environment, similar to the FPGA device virtualization pvFPGA [25]. Figure 14. ...
Article
Full-text available
Field Programmable Gate Arrays (FPGAs) provide a promising opportunity to improve performance, security and energy efficiency of computing architectures, which are essential in modern data centers. Especially the background acceleration of complex and computationally intensive tasks is an important field of application. The flexible use of reconfigurable devices within a cloud context requires abstraction from the actual hardware through virtualization to offer these resources to service providers. In this paper, we present our Reconfigurable Common Computing Frame (RC2F) approach – inspired by system virtual machines – for the profound virtualization of reconfigurable hardware in cloud services. Using partial reconfiguration, our framework abstracts a single physical FPGA into multiple independent virtual FPGAs (vFPGAs). A user can request vFPGAs of different size for optimal resource utilization and energy efficiency of the whole cloud system. To enable such flexibility, we create homogeneous partitions on top of an inhomogeneous FPGA fabric abstracting from physical locations and static areas. The RC2FSEC extension combines this virtualization with a security system to allow for processing of sensitive data. On the host side our Reconfigurable Common Cloud Computing Environment (RC3E) offers different service models and manages the allocation of the dynamic vFPGAs. We demonstrate the possibilities and the resource trade-off of our approach in a basic scenario. Moreover, we present future perspectives for the use of FPGAs in cloud-based environments.
... Circuits are established directly between VMs or, in the case of communicating with an external host, between a VM and a proxy stack in the host OS. The less privileged VM must always provide the buffer memory to prevent denial-of-service attacks [26]. Requests to establish a circuit are forwarded by the switch operator in between VMs and between VMs and the proxy stack. ...
... Direct inter-VM communication has been proposed in different variants. XenSockets [26] deviate from the TCP stream semantics and use a different socket interface, thereby requiring modified applications. XenLoop [25] proposes direct virtual Ethernet between VMs which we included into our performance evaluation. ...
Conference Paper
Although applications are nowadays often executed in virtual machines (VMs) to isolate applications or consolidate physical machines, VM network performance is still challenging. Packetization, encapsulation, congestion control, preparations for loss, and copying of data introduce unnecessary performance degradation within a system where VMs communicate over abundant and reliable shared memory. Although protocols like TCP are therefore not well suited for the kernel network stack in VMs, preexisting applications require the kernel socket interface to keep functioning. By eliminating the unnecessary overhead for inter-VM communication and shifting it to the host operating system for communication over a physical NIC, our approach increases performance both when communicating with another VM on the same host and when communicating with external hosts. Instead of multiplexing multiple connections over a single virtual Ethernet link, we use a separate shared-memory connection for each VM application socket. Our approach improves the stream and datagram performance of existing applications over an unmodified socket interface and brings the benefits of memory-mapped zero-copy IO to modified applications without sacrificing isolation between sockets.
... The frontend FIFOs and the FPGA memories are mapped to device files inside the host hypervisor. There, our system forwards the user devices to the assigned VM using inter-domain communication based on vChan from Zhang et al. [25] in our Xen virtualized environment, similar to pvFPGA [16]. ...
Article
Full-text available
Field Programmable Gate Arrays (FPGAs) provide a promising opportunity to improve performance, security and energy efficiency of computing architectures, which are essential in modern data centers. Especially the background acceleration of complex and computationally intensive tasks is an important field of application. The flexible use of reconfigurable devices within a cloud context requires abstraction from the actual hardware through virtualization to offer these resources to service providers. In this paper, we enhance our related Reconfigurable Common Computing Frame (RC2F) approach, which is inspired by system virtual machines, for the profound virtualization of reconfigurable hardware in cloud services. Using partial reconfiguration, our hardware and software framework virtualizes physical FPGAs to provide multiple independent user designs on a single device. Essential components are the management of the virtual user-defined accelerators (vFPGAs), as well as their migration between physical FPGAs to achieve higher system-wide utilization levels. We create homogeneous partitions on top of an inhomogeneous FPGA fabric to offer an abstraction from the physical location, size, and access to the real hardware. We demonstrate the possibilities and the resource trade-off of our approach in a basic scenario. Moreover, we present future perspectives for the use of FPGAs in cloud-based environments.
... Table 1 compares the related works and provides a brief description. The studies [10,13,16,21,30,35] provide co-resident VM detection at the VM level. Study [33] supports locality detection at the container level, and the work in [23] is publicly available. ...
... To eliminate performance overhead, we propose a high-performance two-layer locality-aware and NUMA-aware MPI library, which is able to dynamically detect co-resident containers inside one VM as well as co-resident VMs inside one host at MPI runtime, as described in Table 1. The rest of the paper is organized as follows. Section 2 mainly introduces two types of virtualization solutions, the nested virtualization solution, InfiniBand technology with SR-IOV support, and intra-node communication mechanisms. ...
Conference Paper
Hypervisor-based virtualization solutions offer good security and isolation, while container-based solutions make applications and workloads more portable and distributed in an effective, standardized and repeatable way. Therefore, nested virtualization based computing environments (e.g., container over virtual machine), which inherit the capabilities from both solutions, are becoming more and more attractive in clouds (e.g., running Docker over Amazon EC2 VMs). Recent studies have shown that running applications in either VMs or containers still has significant overhead, especially for I/O intensive workloads. This motivates us to investigate whether the nested virtualization based solution can be adopted to build high-performance computing (HPC) clouds for running MPI applications efficiently and where the bottlenecks lie. To eliminate performance bottlenecks, we propose a high-performance two-layer locality- and NUMA-aware MPI library, which is able to dynamically detect co-resident containers inside one VM as well as co-resident VMs inside one host at MPI runtime. Thus the MPI processes across different containers and VMs can communicate with each other through shared memory or Cross Memory Attach (CMA) channels instead of the network channel if they are co-resident. We further propose an enhanced NUMA-aware hybrid design that utilizes an InfiniBand loopback-based channel to optimize large message transfers across containers when they are running on different sockets. Performance evaluations show that, compared with the state-of-the-art (1Layer) design, our proposed enhanced-hybrid design brings up to 184%, 81%, and 12% benefit for point-to-point operations, collective operations, and end applications, respectively. Compared with the default performance, our enhanced-hybrid design delivers up to 184%, 85%, and 16% performance improvement.
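The Cross Memory Attach channel mentioned in this abstract is exposed on Linux through the process_vm_readv/process_vm_writev system calls. Below is a minimal, assumed sketch of using process_vm_readv to pull data directly out of a co-resident peer process; in a real MPI library the peer's pid and buffer address would be exchanged at startup, and the helper name cma_read is made up for illustration.

```c
/* Illustrative only: copy bytes straight out of a co-resident peer
 * process's address space using Cross Memory Attach (Linux >= 3.2).
 * Requires the same UID as the target, or CAP_SYS_PTRACE. */
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: read `len` bytes from `remote_addr` in `pid` into `dst`. */
static ssize_t cma_read(pid_t pid, void *dst, void *remote_addr, size_t len)
{
    struct iovec local  = { .iov_base = dst,         .iov_len = len };
    struct iovec remote = { .iov_base = remote_addr, .iov_len = len };
    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}

int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s <pid> <hex-address>\n", argv[0]);
        return 1;
    }
    pid_t pid  = (pid_t)atoi(argv[1]);
    void *addr = (void *)(uintptr_t)strtoull(argv[2], NULL, 16);
    char buf[64];

    ssize_t n = cma_read(pid, buf, addr, sizeof buf);
    if (n < 0) {
        perror("process_vm_readv");
        return 1;
    }
    printf("copied %zd bytes from pid %d\n", n, (int)pid);
    return 0;
}
```

Because the kernel copies directly between the two address spaces, CMA needs only a single copy per transfer, which is why such libraries prefer it over loopback networking for co-resident peers.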
... Zhang et al. [30] present an approach called XenSocket. XenSocket uses a shared-memory buffer between communicating VMs to bypass the network protocol stack completely, resulting in high throughput. ...
Article
Full-text available
Virtual clusters (VCs) have exhibited various advantages over traditional cluster computing platforms by virtue of their extensibility, reconfigurability, and maintainability. As such, they have become a major execution environment for cloud-based cluster applications. However, compared with traditional clusters, their distributed-memory programming paradigm still remains largely unchanged, which implies that cluster applications cannot be efficiently deployed in VCs, especially when virtual machines (VMs) are running in different physical hosts. Recently, some efforts have been made to improve inter-VM communication, resulting in many studies on how cluster applications could take advantage of VCs. However, most of them mainly focus on the situation in which the VMs are all co-resident on the same physical machine, where the message passing mechanism is usually optimized away by exploiting the host's shared memory. In this paper, we present a design and implementation of Naplus, a kernel-based virtual machine approach to inter-VM communication across different physical hosts. Naplus is based on Nahanni, a mechanism for shared-memory communication in virtual environments. As such, it not only inherits the major merits of Nahanni with respect to flexible data structures and efficient synchronization but also achieves a shared-memory paradigm among VMs. With Naplus, we enable the size of the shared space to be maximized, as large as the sum of each machine's local memory, to accommodate cluster applications with large memory footprints. We prototype Naplus in a dual-host system where an empirical study is conducted to show the effectiveness of the Naplus approach in achieving distributed shared memory for VCs in data centers.
... Instead of going through the traditional network stack, communicating through shared memory shortens the communication path, avoids the barrier cost of the hypervisor, and improves data transmission efficiency. Although existing proposals [1], [4], [5], [6], [7], [8], [9] differ from one another in terms of concrete design and implementation decisions, most of these efforts suffer from some of the following problems: poor scalability in terms of shared memory management for different types of workloads and dynamic VM deployment [10], [11], and multiple copies of network packets between the VM kernel buffer and the shared memory. ...
... The shared memory mechanism was first introduced to pass data between programs running on the same operating system in order to avoid redundant copies. Inspired by this idea, shared memory approaches have been developed to accelerate co-located inter-VM communication [1], [4], [5], [6], [7], [8], [9]. The general idea is to transmit data from a sender VM to a co-located receiver VM through a shared memory channel, bypassing the hypervisor. ...
... Implementing shared memory mechanisms below the IP layer offers full transparency. Examples include XenSocket [8], XenLoop [1] and MMNet [7]. ...