Figure 3: Forwarding architecture for SR-IOV-based virtual routers


Source publication
Article
In this paper we focus on how to use open-source Linux software in combination with PC hardware to build high-speed virtual routers. Router virtualization means that multiple virtual router instances will run in parallel on the same hardware. To enable this, virtual components are combined in the router's data plane. This can result in performance...

Contexts in source publication

Context 1
... the throughput tests we ran only one virtual router per CPU core. However, in practice there are many scenarios where it is desirable to run more virtual routers than there are CPU cores available. In that case, one CPU core will be shared by several virtual routers. In the scalability tests, we study the impact of this sharing in terms of aggregate throughput for the virtual routers. We start with a single CPU core and then gradually increase the number of virtual routers. The 100 different UDP flows are now distributed uniformly over all virtual routers. We observe that an increasing number of virtual routers results in a certain degree of throughput drop for both setups. However, the impact is marginal for SR-IOV compared to the macvlan setup, as shown in Figure 7. With sixteen virtual routers running in parallel, the SR-IOV-based setup shows an 11.3% throughput drop compared to 1 VR; for the macvlan setup, the drop is 30%. We repeat the test with all four CPU cores. It can be seen in Figure 7 that we can run eight virtual routers in both setups without any throughput drop. For more than eight virtual routers, some throughput drop can be observed. For sixteen virtual routers, the throughput drop is 2.3% for SR-IOV and 6.4% for macvlan compared to 1 VR per CPU core (i.e., 4 VRs). We draw two conclusions from these results. Firstly, we observe that scalability in terms of aggregated throughput for an increasing number of virtual routers improves with the number of CPU cores. This is observed for both virtual setups. The result can be explained by the architectural properties presented in Figure 2 and Figure 3. When a CPU core is added, one more forwarding path is available. More paths reduce resource contention among virtual routers and improve performance. Secondly, SR-IOV exhibits better scalability than macvlan. This is due to the fact that the SR-IOV-based architecture offloads some packet handling to hardware, which results in fewer CPU cycles spent on packet forwarding in SR-IOV compared to the macvlan setup. The additional processing resources can be used to serve more virtual routers. Furthermore, we observed in the throughput tests that the SR-IOV VF driver (ixgbevf) lends itself better to parallel processing than the physical Ethernet driver (ixgbe). The macvlan devices are created on top of the Ethernet device, so ixgbe is also part of the macvlan setup. Accordingly, the limitations of the physical setup are automatically inherited by the virtual setup. We are sharing physical resources among virtual routers. In such an environment, it is important to understand the implications of resource contention among virtual routers. For instance, we would like to know how an overloaded virtual router might affect the performance of other virtual routers running in parallel. We refer to this as isolation ...
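As a reading aid for these percentages, here is a minimal Python sketch of how the relative aggregate-throughput drop and the uniform flow distribution can be computed; the kpps values are hypothetical placeholders, not measurements from the paper.

```python
# Sketch: relative aggregate-throughput drop used in the scalability tests.
# The kpps values below are hypothetical placeholders, not the paper's data.

def throughput_drop_pct(baseline_kpps: float, aggregate_kpps: float) -> float:
    """Drop in percent relative to the baseline (1 VR per CPU core) setup."""
    return (baseline_kpps - aggregate_kpps) / baseline_kpps * 100.0

def flows_per_router(num_flows: int, num_vrs: int) -> list[int]:
    """Uniform distribution of the generated UDP flows over the virtual routers."""
    base, rem = divmod(num_flows, num_vrs)
    return [base + (1 if i < rem else 0) for i in range(num_vrs)]

print(flows_per_router(100, 16))                      # 100 flows over 16 VRs
print(f"{throughput_drop_pct(1200.0, 1065.0):.1f}%")  # hypothetical kpps numbers
```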
Context 2
... features in hardware (details in the next section), which can be helpful to improve isolation among virtual routers. III. VIRTUAL ROUTERS FORWARDING ARCHITECTURE In this section we propose a packet forwarding architecture for virtual routers. We present a packet forwarding architecture both for software-based and hardware-assisted I/O virtualization and make a comparison. For the software-based approach we consider macvlan devices (a modified version [3]) and use SR-IOV for the hardware-based approach. Our forwarding architecture is based on multi-core hardware equipped with multi-queue NICs (receive queues as well as transmit queues). Multi-core hardware is attractive for performing packet forwarding tasks in parallel for multiple virtual routers. However, in order to exploit the packet processing performance benefits of a multi-core system, a multi-queue NIC is essential. Such a NIC is specifically designed to make efficient use of the processing power of multi-core systems. It is able to distribute incoming network traffic over several RX queues, and each queue can be served by a different CPU core. In this way multiple CPU cores can process incoming packets in parallel. Similarly, multiple TX queues can be used for multiple packet transmissions in parallel. Such parallelization is more suitable for virtual routers than for a regular non-virtualized IP router. This is due to the fact that resources are partitioned in a virtual environment and each virtual entity has dedicated resources, which gives a multi-core system more independence to work in parallel. For instance, in a regular router all packets would traverse the same data path and share data structures such as the forwarding table. In contrast, each virtual router has its own routing table and different cores could access multiple routing tables at the same time. Multi-queue NICs provide different methods to select the RX queue for an incoming packet [6]. A first method, which is the default, is hash computation (known as Receive Side Scaling or RSS), where a hash is computed based on source and destination IP addresses. The hash is used to select the RX queue. Thus, all packets belonging to the same IP flow will use the same RX queue. The second option is to use SR-IOV, which selects the RX queue based on the destination MAC address. Finally, a third option is to use Flow Director, a hardware feature that allows queue selection based on different packet header fields, including the VLAN header, source/destination IP address and port number. In the subsequent sections, we first analyze the packet forwarding path for a macvlan setup. We investigate how multi-core systems can support multiple virtual routers. We highlight its potential drawbacks and explore how SR-IOV can be useful to address these issues. Inside a virtual router the forwarding decision is taken and the next hop is determined. The packet is placed on an outgoing virtual interface (egress macvlan). After this, the outgoing physical interface is identified and the packet is placed in the corresponding transmission queue (known as a Qdisc). The NIC's DMA engine then fetches the packet from host memory using the TX ring of the egress interface. In the following, we will analyze the scaling of such forwarding paths on a multi-core platform. For this purpose we consider a CPU with four cores and two physical network interfaces. Figure 2 describes the forwarding architecture for such a setup. Incoming network traffic is distributed among four RX queues. Each RX queue is served by a different CPU core.
Similar configurations are made for the TX queues for outgoing traffic. We create four macvlan-based virtual routers on top of this physical setup as shown in Figure 2. In this way, the setup provides four parallel forwarding paths. Our idea is that the forwarding paths should be able to scale in terms of performance by adding more CPU cores and interface queues. It can also be observed in Figure 2 that incoming traffic for a virtual router may be received on any of the available RX queues. In other words, one RX queue can receive packets belonging to different virtual routers. This is due to the fact that incoming packets are distributed among RX queues based on the IP header hash value (i.e., IP flows). As a result, one RX queue can receive traffic for multiple virtual routers. Since RX queues are bound to CPU cores, each CPU core can be involved in serving many virtual routers. We are thus unable to dedicate a particular CPU core to a particular virtual router. This might be a concern when considering CPU core isolation among virtual routers. To address the isolation issue, Flow Director might be one possible solution. It can classify network traffic based on the VLAN header; different VLAN IDs may then correspond to different virtual routers. However, from a performance perspective, this approach does not provide any hardware assistance. When using SR-IOV, a packet received on a physical interface is passed to a layer 2 switch as shown in Figure 3. Based on the destination MAC address the packet is placed in a specific RX queue. The RX queue is reserved for a particular VF, which is associated with a virtual router. The NIC initiates a DMA action and moves the packet directly to that VF memory area. The VF generates an interrupt to signal a packet receive event. The CPU schedules a SoftIRQ to dispatch packet processing. When the SoftIRQ runs, the forwarding decision is taken and the next-hop VF is determined. The packet is placed on the Qdisc of that VF. The NIC is now able to directly fetch the packet from the VF memory area and place it in a TX queue which is reserved for that VF. Finally, the packet is transmitted onto the wire. The forwarding path differs from the earlier setup in several ways. The first difference is the packet classification scheme, which here is based on the destination MAC address. Secondly, each VF has reserved queues (RX and TX). These two features are attractive when it comes to providing traffic isolation between virtual routers, since each VF has a unique MAC address and belongs to a specific virtual router. Thirdly, the CPU only needs to process a packet when it is available on a VF. This means there is no need to perform any physical/virtual device mapping. Our hypothesis is that this should make the forwarding process faster and result in improved performance. For a multi-core platform, we consider the same hardware setup as described earlier. We have a CPU with four cores and two network interfaces, and configure a CPU core for each VF so that all VFs belonging to one virtual router are served by the same CPU core. Similarly, VFs of other virtual routers can be served by other cores, as shown in Figure 3. In this way, the setup makes it possible to completely slice the forwarding path of a virtual router and dedicate a CPU core to that particular virtual router. The advantage is that we expect to isolate virtual routers from each other, thereby reducing resource contention and improving caching performance.
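In practice, this per-core slicing amounts to creating VFs and steering their interrupts to dedicated cores. The following is a minimal sketch of that kind of configuration in Python (run as root); the interface names, VF count and sysfs/procfs paths are illustrative assumptions, not the paper's actual configuration, which relied on the ixgbe/ixgbevf drivers of that era.

```python
# Sketch (illustrative, not the paper's configuration): create SR-IOV VFs and
# pin one VF's interrupts to a single CPU core so that the forwarding path of
# the virtual router owning that VF stays on that core. Requires root.
import pathlib
import re

PF = "eth0"          # physical function; interface name is an assumption
NUM_VFS = 4          # e.g. one VF per virtual router and physical port
VF_IFACE = "eth0v0"  # name the VF driver assigns to the first VF (assumption)
CPU_CORE = 0         # core dedicated to the virtual router using this VF

# 1. Instantiate the VFs. Modern kernels expose this sysfs knob; older ixgbe
#    releases used the max_vfs module parameter instead.
pathlib.Path(f"/sys/class/net/{PF}/device/sriov_numvfs").write_text(f"{NUM_VFS}\n")

# 2. Collect the IRQ numbers belonging to the VF's RX/TX queue vectors.
irqs = []
for line in pathlib.Path("/proc/interrupts").read_text().splitlines():
    if VF_IFACE in line:
        m = re.match(r"\s*(\d+):", line)
        if m:
            irqs.append(int(m.group(1)))

# 3. Pin those IRQs to the dedicated core (older kernels only offer the
#    hex-mask file /proc/irq/<n>/smp_affinity).
for irq in irqs:
    pathlib.Path(f"/proc/irq/{irq}/smp_affinity_list").write_text(f"{CPU_CORE}\n")

print(f"pinned IRQs {irqs} of {VF_IFACE} to CPU {CPU_CORE}")
```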
IV. EXPERIMENTAL EVALUATION This section presents an evaluation of macvlan- and SR-IOV-based virtual router platforms from different perspectives, including throughput, scalability and isolation. We relate the performance of a virtual router to regular IP forwarding in a non-virtualized Linux-based router. Throughout this section, the latter is denoted “IP Forwarder”, and we use it as a reference to study the effects of applying virtualization. In our test setup, the IP forwarder simply forwards packets from one physical interface to the other after IP protocol handling. The setup is shown in Figure 4. We adopt a standard method to examine the performance of a router in conformance with RFC 2544 [20]. A source machine generates network traffic that passes through a device under test (DUT) and is forwarded towards a destination machine (known as the sink), as shown in Figure 5. The hardware used for the traffic generator and sink is based on an Intel Xeon quad-core 2.26 GHz processor with 3 GB of RAM. The virtual router hardware (DUT) is an AMD Phenom II quad-core 3.2 GHz processor with 4 GB of RAM. Each machine is also equipped with one 10 Gbps dual-port NIC based on the Intel 82599 10 GbE controller. The DUT runs the Linux net-next 2.6.37-rc8 kernel with namespace options enabled. As traffic generator, we use pktgen [21], open-source software operating as a kernel module to achieve high packet rates. On the receiver side, we use pktgen with a patch for receiver-side traffic analysis [22] as the traffic sink. In all tests we generate 64-byte packets to expose the DUT to maximum load. The throughput is measured in kilopackets per second (kpps). The maximum offered load is 5500 kpps, which is the highest load that our sink can receive without dropping packets. We generate traffic with 100 different UDP flows and distribute these uniformly over all virtual routers running on the DUT. We start with a simple scenario: a DUT with two physical interfaces forwards packets from one interface to another. A virtual router is configured with two virtual interfaces; one virtual interface is connected to the physical ingress interface while the other is connected to the physical egress interface. For the macvlan setup, the experimental configuration follows Figure 2, whereas for SR-IOV it is based on Figure 3. As a first step, a single CPU core is used. The maximum achieved throughput can be seen in Figure 6. The non-virtualized IP forwarder reaches 1400 kpps, the macvlan setup achieves 1100 kpps, whereas SR-IOV obtains 1217 kpps. This shows that there is a certain degree of virtualization overhead for both virtual router setups compared to the non-virtualized IP forwarder. However, the overhead is lower for SR-IOV. We continue by adding another CPU core and a virtual router to the DUT. Figure 6 now shows the aggregate throughput of both virtual routers. We observe that SR-IOV ...
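For reference, the sketch below shows roughly how such test traffic can be generated through pktgen's /proc interface from Python (run as root). The device name, addresses, MAC and the flowlen value are placeholders; the paper's actual pktgen scripts and patched receiver are not reproduced here.

```python
# Sketch: drive the in-kernel pktgen module via /proc/net/pktgen to send
# minimum-size UDP packets spread over many flows, in the spirit of the test
# traffic described above. All names and addresses are placeholders.
import pathlib

def pgset(path: str, cmd: str) -> None:
    """Write one pktgen command to a control file under /proc/net/pktgen."""
    pathlib.Path(path).write_text(cmd + "\n")

DEV = "eth1"                                  # transmit interface (assumption)
THREAD = "/proc/net/pktgen/kpktgend_0"        # pktgen thread bound to CPU 0
DEVFILE = f"/proc/net/pktgen/{DEV}"

pgset(THREAD, "rem_device_all")               # detach previously bound devices
pgset(THREAD, f"add_device {DEV}")

pgset(DEVFILE, "count 0")                     # 0 = keep transmitting until stopped
pgset(DEVFILE, "delay 0")                     # no inter-packet delay
pgset(DEVFILE, "pkt_size 64")                 # minimum-size frames (some scripts use 60
                                              # and let the NIC append the 4-byte CRC)
pgset(DEVFILE, "flows 100")                   # emulate 100 concurrent flows
pgset(DEVFILE, "flowlen 16")                  # packets per flow before switching (assumed)
pgset(DEVFILE, "dst 10.0.0.2")                # placeholder destination address
pgset(DEVFILE, "dst_mac 00:11:22:33:44:55")   # placeholder next-hop MAC

pgset("/proc/net/pktgen/pgctrl", "start")     # start transmission on all threads
```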
Context 3
... We observe that SR-IOV scales in a much better way compared to the other configurations. We notice that throughput increases for all setups, but the increase is more significant for SR-IOV. We also note that the difference between the IP forwarder and SR-IOV becomes very marginal. When we introduce a third CPU core and a virtual router, SR-IOV exceeds both the IP forwarder and the macvlan setup. Finally, with four CPU cores SR-IOV finishes with a significant throughput increase (32.33%) compared to macvlan and a smaller increase (7%) compared to the IP forwarder. We expect better throughput for SR-IOV than for macvlan, since SR-IOV offloads some (virtualization-related) packet processing tasks to hardware as described in Section II. However, it is interesting to see that SR-IOV also obtains better throughput than the non-virtualized IP forwarder. Generally, some virtualization overhead is expected for a virtual router, resulting in lower performance than for a non-virtualized IP forwarder [3], [4]. Figure 6 also confirms this when we use one or two CPU cores. However, for more CPU cores SR-IOV achieves better throughput. This indicates that SR-IOV-based virtualization lends itself better to parallelization than the IP forwarder. For a non-virtualized setup (e.g. the IP forwarder) it is more likely that many cores share the same resources. In that case, resource contention among cores may result in slow performance. On the other hand, virtualization allows dedicating virtual resources to different entities, e.g. each namespace with a dedicated routing table. This reduces resource contention and provides more parallelism on multi-core systems. In order to investigate why SR-IOV gives better performance, we have done CPU profiling using ...
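The remark about per-namespace routing tables can be made concrete with a small sketch. The Python snippet below (run as root, driving iproute2) creates one virtual router as a network namespace with two macvlan interfaces and its own forwarding state; interface names, addresses and the macvlan mode are illustrative assumptions, not the paper's exact configuration, which used a modified macvlan driver.

```python
# Sketch: a "virtual router" as a network namespace holding two macvlan
# interfaces and a private routing table. Names/addresses are illustrative;
# requires root and iproute2.
import subprocess

def sh(cmd: str) -> None:
    """Run one iproute2/sysctl command, raising on failure."""
    subprocess.run(cmd.split(), check=True)

VR = "vr1"
sh(f"ip netns add {VR}")

# One macvlan device per physical port (eth0 = ingress, eth1 = egress here).
for phys, virt, addr in [("eth0", "mv0", "10.0.1.1/24"),
                         ("eth1", "mv1", "10.0.2.1/24")]:
    sh(f"ip link add link {phys} name {virt} type macvlan mode bridge")
    sh(f"ip link set {virt} netns {VR}")
    sh(f"ip -n {VR} addr add {addr} dev {virt}")
    sh(f"ip -n {VR} link set {virt} up")

# Forwarding state lives only inside the namespace: this route is invisible
# to the host and to any other virtual router.
sh(f"ip netns exec {VR} sysctl -w net.ipv4.ip_forward=1")
sh(f"ip -n {VR} route add 192.168.0.0/16 via 10.0.2.254")
```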
Context 4
... In order to investigate why SR-IOV gives better performance, we have done CPU profiling using oprofile [23]. We make a comparison between the IP forwarder and SR-IOV. The software components with high CPU usage are reported in Table 1. We find major differences in the network device drivers. The IP forwarder uses the ixgbe driver for ordinary Ethernet devices. In contrast, SR-IOV uses the ixgbevf driver (the virtualization extension of ixgbe) for VFs. Table 1 shows that ixgbevf uses 5.4% fewer CPU cycles than ixgbe when four cores are used. However, we also observed that ixgbevf consumes 1.5% more cycles than ixgbe with one core (results are not shown here). This indicates that the virtual driver allows better parallel processing on a multi-core platform than the non-virtualized driver. Similarly, we also notice (Table 1) that SR-IOV consumes fewer CPU cycles for symbols/functions inside the kernel. This is an expected outcome of better ...
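As a complementary, much coarser way to check how packet processing work spreads over cores, one can sample the per-CPU NET_RX softirq counters. The sketch below is an illustration of that idea and is not part of the paper's oprofile-based methodology.

```python
# Sketch: sample /proc/softirqs twice and report how NET_RX softirq work is
# distributed over CPU cores during the interval. A rough check of packet
# processing parallelism, not a substitute for oprofile's per-symbol data.
import time

def net_rx_counts() -> list[int]:
    """Per-CPU NET_RX softirq counters from /proc/softirqs."""
    with open("/proc/softirqs") as f:
        for line in f:
            if line.lstrip().startswith("NET_RX:"):
                return [int(x) for x in line.split()[1:]]
    raise RuntimeError("NET_RX row not found in /proc/softirqs")

before = net_rx_counts()
time.sleep(5)                       # measurement interval in seconds
after = net_rx_counts()

deltas = [b - a for a, b in zip(before, after)]
total = sum(deltas) or 1
for cpu, d in enumerate(deltas):
    print(f"CPU{cpu}: {d:10d} NET_RX softirqs ({100.0 * d / total:.1f}%)")
```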

Citations

... This work focuses on namespaced routing within the Linux kernel, and has limited applicability to user-space applications typically required by more complex VNFs in containers. In a previous study [24], the same team performed an in-depth analysis of SR-IOV and macvlan for network device virtualization, but did not examine user-space networking. ...
Conference Paper
The network performance of virtual machines plays a critical role in Network Functions Virtualization (NFV), and several technologies have been developed to address hardware-level virtualization shortcomings. Recent advances in operating system level virtualization and deployment platforms such as Docker have made containers an ideal candidate for high performance application encapsulation and deployment. However, Docker and other solutions typically use lower-performing networking mechanisms. In this paper, we explore the feasibility of using technologies designed to accelerate virtual machine networking with containers, in addition to quantifying the network performance of container-based VNFs compared to the state-of-the-art virtual machine solutions. Our results show that containerized applications can provide lower latency and delay variation, and can take advantage of high performance networking technologies previously only used for hardware virtualization.
... Ubuntu virtual desktops [3]. In article [11], the authors evaluated the performance of virtual router platforms based on Linux namespaces and showed that hardware-assisted virtual routers can achieve better aggregate throughput than a non-virtualized router on a multi-core platform [11]. It is noteworthy that in 2014, Cisco in a white paper [7] described the Virtual Extensible LAN (VXLAN) technology and how to use the CSR 1000V to route between VXLAN segments (VXLAN Layer 3 routing), in addition to the Cisco Nexus 1000V switch's support for VXLAN. ...
Article
Virtualization of physical network devices is a relatively new technology that allows improving network organization and gives new possibilities for Software Defined Networking (SDN). Network virtualization is also commonly used for testing and debugging environments before implementing new designs in production networks. An important aspect of network virtualization is selecting a virtual platform and technology that offer maximal performance with minimal physical resource utilization. This article presents a comparative analysis of the performance of a virtual network created with the virtual CSR1000v and virtual machines running Windows 8.1 on two different virtual private cloud platforms: VMware vSphere 5.5 and Microsoft Hyper-V Server 2012 R2. In such a testbed we study the response time (delay) and throughput of the virtual network devices.
... However, such a hardware-based offloading approach bypasses the processing of highly functional virtual switches, and therefore applying this method to the Edge-Overlay model is difficult. Sira et al. [11][12] have proposed an SR-IOV [13] based approach to improve the performance of virtual routers. With this technology, received packets are passed directly from the physical NIC to the virtual NIC; therefore, the switching overhead between kernel and user space can be reduced. ...
Conference Paper
An Edge-Overlay model, which constructs virtual networks using both virtual switches and IP tunnels, is promising for cloud datacenter networks. But software-implemented virtual switches can cause performance problems because the packet processing load is concentrated on a particular CPU core. Although multi-queue functions like Receive Side Scaling (RSS) can distribute the load onto multiple CPU cores, there are still problems to be solved, such as IRQ core collisions for heavy traffic flows as well as competition between physical and virtual devices for packet processing resources. In this paper, we propose a software packet processing unit named VSE (Virtual Switch Extension) to address these problems by adaptively determining softirq cores based on both CPU load and VM-running information. Furthermore, the behavior of VSE can be managed by OpenFlow controllers. Our performance evaluation results showed that the throughput of our approach was higher than that of an existing RSS-based model as the packet processing load increased. In addition, we show that our method prevented the performance of heavily loaded flows from being degraded, through priority-based CPU core selection.
... It provides hardware support to virtualize network interface cards (NICs). SR-IOV divides a single physical PCIe device into multiple PCIe instances, called Virtual Functions (VFs) [5][15]. A VF interface is an Ethernet-like interface that can be used in a virtual router. ...
... packet forwarding rate, latency, etc.). In our previous work, we studied PC-based virtual routers [2][15]. The first study [2] compares two container-based approaches (i.e. ...
... However, the work does not consider hardware assistance for virtualization. The other work [15] focuses on hardware assistance (i.e. SR-IOV) to enable virtual routers in an LXC environment. ...
Article
Concerns have been raised about the performance of PC-based virtual routers as they do packet processing in software. Furthermore, it becomes challenging to maintain isolation among virtual routers due to resource contention in a shared environment. Hardware vendors recognize this issue, and PC hardware with virtualization support (SR-IOV and Intel VT-d) has been introduced in recent years. In this paper, we investigate how such hardware features can be integrated with two different virtualization technologies (LXC and KVM) to enhance the performance and isolation of virtual routers in shared environments. We compare LXC and KVM and our results indicate that KVM in combination with hardware support can provide better trade-offs between performance and isolation. We notice that KVM has slightly lower throughput, but has superior isolation properties by providing more explicit control of CPU resources. We demonstrate that KVM allows defining a CPU share for a virtual router, something that is difficult to achieve in LXC, where packet forwarding is done in a kernel shared by all virtual routers.