Figure 3
VLANs covering the 7-switch topology of Fig. 2. Topology notation used in the paper: 2-D HyperX(k) [9], where k is the number of switches in each dimension of a 2-D mesh; and CiscoDC, Cisco's recommended data center network [12], a three-layer tree with two core switches, with parameters (m, a), where m is the number of aggregation modules and a the number of access-switch pairs associated with each aggregation module.

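As a rough illustration of the parameterization above, the following is a sketch only: the caption does not give these formulas, and the per-module switch counts for CiscoDC are assumptions.

```python
# Hedged sketch of switch counts implied by the topology parameters above.
# Assumptions not stated in the caption: each CiscoDC aggregation module
# contains 2 aggregation switches, and each access "pair" is 2 switches.

def hyperx_2d_switches(k: int) -> int:
    """2-D HyperX(k): k switches in each of the two dimensions of a mesh."""
    return k * k

def ciscodc_switches(m: int, a: int) -> int:
    """CiscoDC(m, a): 2 core switches, m aggregation modules,
    and a access-switch pairs per module (see assumptions above)."""
    core = 2
    aggregation = 2 * m
    access = 2 * m * a
    return core + aggregation + access

if __name__ == "__main__":
    print(hyperx_2d_switches(4))   # 16 switches in a 4x4 HyperX
    print(ciscodc_switches(2, 4))  # 2 + 4 + 16 = 22 switches under these assumptions
```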

Source publication
Conference Paper
Full-text available
Operators of data centers want a scalable network fabric that supports high bisection bandwidth and host mobility, but which costs very little to purchase and administer. Ethernet almost solves the problem - it is cheap and supports high link bandwidths - but traditional Ethernet does not scale, because its spanning-tree topology forces traffic onto...

Context in source publication

Context 1
... a parallel algorithm, described in [25], based on graph-coloring heuristics, which yields speedup linear in the number of edge switches. Fig. 2 shows a relatively simple wiring topology with seven switches. One can think of this as a 1-level tree (with switch #1 as the root), augmented by adding three cross-connect links to each non-root switch. Fig. 3 shows how the heuristic greedy algorithm (Alg. 2) chooses seven VLANs to cover this topology. VLAN #1 is the original tree (and is used as the default spanning ...
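The greedy VLAN construction mentioned here (Alg. 2 in the paper) packs per-pair paths into loop-free VLANs. The sketch below is not the paper's algorithm, only a simplified illustration of the core check it relies on: a path joins an existing VLAN only if the union of edges stays acyclic, and otherwise opens a new VLAN. The function and variable names are illustrative.

```python
# Minimal illustration (not the paper's Alg. 2): greedily pack simple paths
# into VLANs so that every VLAN's edge set stays loop-free.

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False          # adding this edge would close a cycle
        self.parent[ra] = rb
        return True

def pack_paths_into_vlans(paths):
    """paths: list of node sequences, e.g. [1, 4, 2] (assumed simple paths).
    Returns a list of VLANs, each a set of undirected edges forming a loop-free subgraph."""
    vlans = []                                   # list of (edge_set, UnionFind)
    for path in paths:
        edges = [tuple(sorted(e)) for e in zip(path, path[1:])]
        for edge_set, uf in vlans:
            # Tentatively add the path's new edges; commit only if no cycle appears.
            trial = UnionFind()
            trial.parent = dict(uf.parent)
            new_edges = [e for e in edges if e not in edge_set]
            if all(trial.union(u, v) for u, v in new_edges):
                edge_set.update(new_edges)
                uf.parent = trial.parent
                break
        else:
            # No existing VLAN can absorb the path without a loop: open a new one.
            uf = UnionFind()
            edge_set = set()
            for u, v in edges:
                if (u, v) not in edge_set:
                    uf.union(u, v)
                    edge_set.add((u, v))
            vlans.append((edge_set, uf))
    return [edge_set for edge_set, _ in vlans]
```

Using a union-find per VLAN keeps the acyclicity check cheap (near-constant amortized time per path edge), which matters when the number of per-pair paths is large.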

Similar publications

Article
Full-text available
The architectures of several data centers have been proposed as alternatives to the conventional three-layer one. Most of them employ commodity equipment for cost reduction. Thus, robustness to failures becomes even more important, because commodity equipment is more failure-prone. Each architecture has a different network topology design with a spec...
Article
Full-text available
Operators of data centers want a scalable network fabric that supports high bisection bandwidth and host mobility, but which costs very little to purchase and administer. Ethernet almost solves the problem – it is cheap and supports high link bandwidths – but traditional Ethernet does not scale, because its spanning-tree topology forces traffic ont...

Citations

... Similar to PortLand, MOOSE [6] also employs MAC addresses and hierarchical address rewriting to overcome the scalability limitations of Ethernet. Unlike PortLand, MOOSE is not limited to a specific topology, but it does not address traffic engineering [27,28]. ...
Article
Full-text available
Data centers are now the basis for many Internet and cloud computing services. Trends toward multi-core processors, end-host virtualization, and commodities of scale are pointing to future single-site data centers with millions of virtual end-hosts. The Ethernet/IP style layer 2 and layer 3 network protocols face a mixture of inherent limitations in supporting such large topologies: lack of scalability, difficulty of management, inflexibility in communication, and limited support for virtual machine migration. Although several large layer 2 network technologies have been proposed in recent years, they still have several weaknesses that impede their practical application, such as inflexibility, broadcast storms, poor scalability, and lack of interoperability with existing devices. Software defined networking (SDN) is an emerging and promising solution to the above problems due to its outstanding characteristics of control plane and data plane separation and centralized, flexible network management. However, the limited efficiency of the centralized SDN Controller and the large number of routing rules needed in switches are the two main scalability challenges faced when adopting existing SDN solutions in large data centers. Therefore, this paper proposes a novel SDN-based large layer 2 network fabric for data centers, SFabric, which addresses the two challenges by greatly reducing the interactions between the Controller and switches (computing and constructing the paths among switches in advance) and by decreasing the number of routing rules needed to tag and route packets at the switch level. A prototype is developed, and experimental results demonstrate the efficiency and scalability of the proposed method.
... For the latter, where the network capacity is under pressure (e.g., due to dynamic network load spikes), Dart falls back on DCQCN's throttling, which may be unavoidable. Load balancing [2,13,18,24,30,39,42,52] can alleviate localized in-network congestion but not receiver congestion, and usually reorders packets, which RDMA does not support. ...
... In contrast to the above schemes, all of which reorder packets (which RDMA does not support), IOFD is designed to deflect without packet reordering. SPAIN [39] and CONGA [2] also avoid packet reordering. However, SPAIN pre-computes multiple paths that are mapped to different VLANs, and such precomputation may be slow to react to short flows in a datacenter. ...
Preprint
Though Remote Direct Memory Access (RDMA) promises to reduce datacenter network latencies significantly compared to TCP (e.g., 10x), end-to-end congestion control in the presence of incasts is a challenge. Targeting the full generality of the congestion problem, previous schemes rely on slow, iterative convergence to the appropriate sending rates (e.g., TIMELY takes 50 RTTs). We leverage the result in several papers that most congestion in datacenter networks occurs at the receiver. Accordingly, we propose a divide-and-specialize approach, called Dart, which isolates the common case of receiver congestion and further subdivides the remaining in-network congestion into the simpler spatially-localized and the harder spatially-dispersed cases. For receiver congestion, Dart proposes direct apportioning of sending rates (DASR), in which a receiver for n senders directs each sender to cut its rate by a factor of n, converging in only one RTT. For the spatially-localized case, Dart employs deflection by adding novel switch hardware for in-order flow deflection (IOFD), because RDMA disallows packet reordering, providing a fast (under one RTT), lightweight response. For the uncommon spatially-dispersed case, Dart falls back to DCQCN. Small-scale testbed measurements and at-scale simulations show that Dart achieves, respectively, 60% (2.5x) and 79% (4.8x) lower 99th-percentile latency, and similar and 58% higher throughput, than RoCE and than TIMELY and DCQCN.
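The direct apportioning rule described in this abstract (a receiver with n active senders tells each to cut its rate by a factor of n) amounts to a one-step calculation. The sketch below is only an illustration of that rule, not Dart's implementation; the function name and rate units are assumptions.

```python
# Illustrative sketch of direct apportioning of sending rates (DASR):
# a receiver with n concurrent senders asks each one to divide its
# current rate by n, so the aggregate fits the receiver within one RTT.

def dasr_apportion(sender_rates_gbps):
    """Return the per-sender target rates after one DASR step."""
    n = len(sender_rates_gbps)
    if n <= 1:
        return dict(sender_rates_gbps)        # no incast, nothing to cut
    return {sender: rate / n for sender, rate in sender_rates_gbps.items()}

# Example: a 3-sender incast, each currently sending at line rate 100 Gbps.
print(dasr_apportion({"s1": 100.0, "s2": 100.0, "s3": 100.0}))
# {'s1': 33.33..., 's2': 33.33..., 's3': 33.33...}
```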
... The centralization of many services in datacenters, and the steep increase in traffic loads, have pushed research and industrial communities towards the development of innovative solutions for datacenter networking [1], [2], [3], [4]. ...
... For instance, a spanning tree is used at Layer 2; therefore, only one path is available at a time for a given pair of sender and receiver nodes. There are some proposals to support multipath at Layer 2, such as the one in [16], which proposes exploiting the redundant paths in the network using an algorithm that computes a set of available paths and combines them into a set of trees. ...
Article
Full-text available
This paper proposes a new load balancing algorithm for data center networks that exploits the characteristics of Software Defined Networks. Mininet was used to emulate and evaluate the proposed design, and Miniedit was used as a GUI tool for the same purpose. To obtain a realistic data center network environment, a Fat-Tree topology was used with the following parameters: 4 pods, 16 edge switches, 16 aggregation switches, 4 core switches, and 16 hosts. Different scenarios and traffic distributions were applied in order to cover as many cases of real traffic as possible. The POX controller was chosen as the SDN controller. The suggested design outperformed the traditional scheme in terms of throughput and loss rate for all evaluated scenarios. The first scenario assumes the joining of new hosts, while the second scenario increases the demand of already established connections. The proposed algorithm showed loss-free performance in the first scenario, whereas the traditional scheme showed a 15% to 31% loss rate for the same scenario. In the second scenario, the proposed algorithm recorded up to an 81% improvement in loss rate compared to the traditional scheme. Moreover, the proposed algorithm showed superiority over the traditional scheme in terms of throughput: it maintained the throughput without any reduction in the first scenario, in contrast to the traditional scheme, which suffered a considerable degradation (an average throughput reduction of 5 Mbps when new hosts joined). In the second scenario, both schemes suffered a throughput reduction; however, the proposed scheme always showed superiority over the traditional scheme, recording up to a 16.6% improvement in average throughput.
... Moreover, the need for traffic optimization is now common to several other types of networks besides traditional ISPs networks. This requirement is now common to intra data center [7,15,21] and inter data center networks [10,9]. ...
... SPAIN is a multi-path data center routing proposal [15] which also uses, for each pair of edge nodes, a set of paths computed independently of a traffic matrix. The algorithm takes as input a network graph, a pair of edge nodes (origin and destination), and the number k of paths to compute from the origin to the destination. ...
... That behaviour is due to the terminating condition of the SPAIN algorithm. In each step, a shortest path is computed and the cost of the path's edges is increased by a large constant [15], in order to prevent their use in the following iterations, until k different paths have been found or the new path has already been computed in a previous iteration. With a deterministic shortest-path algorithm (such as Dijkstra's), a path can be computed twice before all distinct shortest paths have been found, in which case the SPAIN algorithm stops prematurely and returns fewer than k paths. ...
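The path-computation loop described above (repeated shortest-path runs that penalize already-used edges, stopping after k distinct paths or a repeated path) can be sketched as follows. This is a reconstruction from the description in this citation context, not the authors' code; the penalty constant and function name are illustrative, and networkx's Dijkstra routine stands in for the deterministic shortest-path algorithm.

```python
# Sketch of the k-path loop described above: repeatedly run a deterministic
# shortest-path algorithm, then add a large constant to the cost of the edges
# of each returned path so later iterations avoid them. Stops after k distinct
# paths, or as soon as a path repeats (the early termination discussed above).
import networkx as nx

def spain_like_paths(graph: nx.Graph, src, dst, k: int, penalty: float = 1e6):
    # Assumes src and dst are connected; otherwise Dijkstra raises NetworkXNoPath.
    g = graph.copy()
    for u, v in g.edges():
        g[u][v].setdefault("weight", 1.0)
    paths = []
    while len(paths) < k:
        path = nx.dijkstra_path(g, src, dst, weight="weight")
        if path in paths:
            return paths            # repeated path: stop early with fewer than k paths
        paths.append(path)
        for u, v in zip(path, path[1:]):
            g[u][v]["weight"] += penalty
    return paths
```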
Article
To meet traffic engineering goals, current backbone networks use expensive and sophisticated equipment that runs distributed algorithms to implement dynamic multi-path routing (e.g., MPLS tunnels and dynamic trunk rerouting). We think that the same goals can be fulfilled using a simpler approach, where the core of the backbone only implements many a priori computed paths, and most adaptation to traffic engineering goals takes place at the edge of the network. In the vein of Software Defined Networking, edge adaptation should be driven by a logically centralized controller that leverages the available paths to adapt traffic load balancing to the current demands and network status. In this article we present two algorithms to help build this vision. The first one selects sets of paths able to support future load balancing needs and adaptation to network faults. As the total number of required paths is very large, and their continuous availability requires many FIB entries in core routers, we also present a second algorithm that aggregates these paths into a reduced number of trees. This second algorithm achieves better results than previously proposed algorithms for path aggregation. To conclude, we show that off-the-shelf equipment supporting simple protocols may be used to implement routing with these trees, which shows that simplicity in the core can be achieved using only trivially available protocols and their most common, unsophisticated implementations.
... Mudigonda et al. [28] introduce NetLord, a multi-tenant network architecture that encapsulates tenants' Layer-2 packets to provide full address-space virtualization. The forwarding mechanism uses Smart Path Assignment In Networks (SPAIN) [29] to distribute traffic among multiple paths. Servers run an online algorithm to test connectivity to the destination and randomly select a path on which to send each flow. ...
Article
Full-text available
Multipath forwarding has recently been proposed to improve utilization in data centers, leveraging their redundant network design. However, most multipath proposals require significant modifications to the tenants' network stack and are therefore only feasible in private clouds. In this paper, we propose the Two-Phase Multipath (TPM) forwarding scheme for public clouds. The proposal improves tenants' network throughput while keeping the tenants' network stack unmodified. Our scheme is composed of a smart offline configuration phase that discovers optimal disjoint paths, and a fast online path-selection phase that improves flow throughput at run time. A logically centralized manager uses a genetic algorithm to generate and install sets of paths, summarized into trees, during multipath configuration, and a local controller performs the multipath selection based on network usage. We analyze TPM for different workloads and topologies under several scenarios of usage-database locations and update policies. The results show that our proposal yields up to 77% throughput gains over previously proposed approaches.
... The tenant data between the egress and ingress switches are conveyed over the Layer 2 network using VLANs. To choose which VLAN to use, NetLord applies the SPAIN [62] selection algorithm. However, to support the SPAIN multipath technique and store per-tenant configuration information, NetLord uses Configuration Repositories, which are databases. ...
... In order to benefit from a high-bandwidth, resilient multipath fabric using Ethernet switches, NetLord relies on SPAIN [62]. As with the other solutions using a centralized controller, it might be necessary to have redundant configuration repositories, not only for availability but also to improve performance. ...
Article
Full-text available
The Infrastructure-as-a-Service (IaaS) model is one of the fastest growing opportunities for cloud-based service providers. It provides an environment that reduces operating and capital expenses while increasing agility and reliability of critical information systems. In this multitenancy environment, cloud-based service providers are challenged with providing a secure isolation service combining different vertical segments, such as financial or public services, while nevertheless meeting industry standards and legal compliance requirements within their data centers. In order to achieve this, new solutions are being designed and proposed to provide traffic isolation for a large number of tenants and their resulting traffic volumes. This paper highlights key challenges that cloud-based service providers might encounter while providing multi-tenant environments. It also succinctly describes some key solutions for providing simultaneous tenant and network isolation, as well as highlights their respective advantages and disadvantages. We begin with Generic Routing Encapsulation (GRE), introduced in 1994 in "RFC 1701", and conclude with today's latest solutions. We detail fifteen of the newest architectures and then compare their complexities, the overhead they induce, their VM migration abilities, their resilience, their scalability, and their multi-data-center capabilities. This paper is intended for, but not limited to, cloud-based service providers who want to deploy the most appropriate isolation solution for their needs, taking into consideration their existing network infrastructure. This survey provides details and comparisons of various proposals while also highlighting possible guidelines for future research on issues pertaining to the design of new network isolation architectures.
... Al-Fares et al. [4] and Niranjan Mysore et al. [5] enable routing without a link-state routing protocol by restricting the network topology to a fat-tree. Mudigonda et al. [6] use VLANs to provide multiple paths. However, these approaches cannot utilize autonomous and dynamic routing that quickly reflects link addition/deletion or a link failure. ...
Article
We have developed an automatic network configuration technology for flexible and robust network construction. In this paper, we propose a two-or-more-level hierarchical link-state routing protocol, the Hierarchical QoS Link Information Protocol (HQLIP). The hierarchical routing easily scales up the network by combining and stacking configured networks. HQLIP is designed not to recompute shortest-path trees from topology information, in order to achieve high-speed convergence of the forwarding information base (FIB), especially when renumbering occurs in the network. In addition, we propose a fixed-midfix renumbering (FMR) method. FMR enables even faster convergence when HQLIP is synchronized with Hierarchical/Automatic Number Allocation (HANA). Experiments demonstrate that HQLIP incorporating FMR achieves convergence within one second in a network with 22 switches and 800 server terminals, and is superior to Open Shortest Path First (OSPF) in terms of convergence time. This shows that a combination of HQLIP and HANA performs stable renumbering in link-state routing protocol networks.
... We measure the Path Quality by evaluating the shortest paths of each topology. The shortest path length is suitable for evaluating the quality of paths in the network, since it is the basis of novel routing mechanisms that can be used in DCs, such as TRILL [26], IEEE 802.1aq [27], and SPAIN [28]. Hence, we define the following metric: Average Shortest Path Length. ...
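The metric named in this context, Average Shortest Path Length, can be computed directly; the sketch below is only an illustration of the metric (hop-count distances averaged over reachable ordered pairs), not the cited paper's code.

```python
# Minimal sketch of the Average Shortest Path Length metric: the mean of the
# hop-count shortest-path distances over all reachable ordered node pairs.
import networkx as nx

def average_shortest_path_length(g: nx.Graph) -> float:
    total, pairs = 0, 0
    for src, dists in nx.all_pairs_shortest_path_length(g):
        for dst, d in dists.items():
            if src != dst:
                total += d
                pairs += 1
    return total / pairs

# Example on a tiny topology: a 4-switch ring.
print(average_shortest_path_length(nx.cycle_graph(4)))  # 1.333...
```

For connected graphs, networkx's built-in nx.average_shortest_path_length gives the same result.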
Article
Full-text available
The architectures of several data centers have been proposed as alternatives to the conventional three-layer one. Most of them employ commodity equipment for cost reduction. Thus, robustness to failures becomes even more important, because commodity equipment is more failure-prone. Each architecture has a different network topology design with a specific level of redundancy. In this work, we aim at analyzing the benefits of different data center topologies taking the reliability and survivability requirements into account. We consider the topologies of three alternative data center architectures: Fat-tree, BCube, and DCell. Also, we compare these topologies with a conventional three-layer data center topology. Our analysis is independent of specific equipment, traffic patterns, or network protocols, for the sake of generality. We derive closed-form formulas for the Mean Time To Failure of each topology. The results allow us to indicate the best topology for each failure scenario. In particular, we conclude that BCube is more robust to link failures than the other topologies, whereas DCell has the most robust topology when considering switch failures. Additionally, we show that all considered alternative topologies outperform a three-layer topology for both types of failures. We also determine to which extent the robustness of BCube and DCell is influenced by the number of network interfaces per server.
... Once the control plane has the topology information, it can implement a wide variety of traffic engineering policies to choose a routing path for each flow or even each packet. The policies range from distributed ones to centralized ones, as discussed extensively in the literature [6, 7, 8, 9]. We design new algorithms for Sourcey to perform topology discovery and monitoring. ...
... The major technical problem is topology discovery and monitoring. Once the control plane has an updated view of the topology, it is possible to choose routes for each flow or packet with different traffic engineering policies, as discussed in [6, 7, 8, 9]. In the remainder of this paper, we focus on how to implement topology discovery and topology monitoring using server-based mechanisms. ...
Conference Paper
We present Sourcey, a new data center network architecture with extremely simple switches. Sourcey switches have no CPUs, no software, no forwarding tables, no state, and require no switch configuration. Sourcey pushes all control plane functions to servers. A Sourcey switch supports only source-based routing. Each packet contains a path through the network. At each hop, a Sourcey switch pops the top label on the path stack and uses the label value as the switch output port number. The major technical challenge for Sourcey is to discover and monitor the network with server-only mechanisms. We design novel algorithms that use only end-to-end measurements to efficiently discover network topology and detect failures. Sourcey explores an extreme point in the design space. It advances the concept of software-defined networking by pushing almost all network functionality to servers and making switches much simpler than before, even simpler than OpenFlow switches. It is a thought experiment to show that it is possible to build a simple data center network and seeks to raise discussion in the community on whether or not current approaches to building data center networks warrant the complexity.
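The per-hop behaviour described in this abstract (pop the top label of the packet's path stack and use it as the output port) is small enough to sketch. The data structures below are assumptions for illustration, not Sourcey's actual packet format.

```python
# Illustrative sketch of source-based forwarding as described above: the sender
# pushes the whole path (a stack of output-port labels) onto the packet; each
# switch pops the top label and forwards the packet on that port.

from dataclasses import dataclass, field

@dataclass
class Packet:
    payload: bytes
    path_stack: list = field(default_factory=list)   # top of stack = last element

def switch_forward(packet: Packet) -> int:
    """One hop of forwarding: pop the top label and return it as the output port."""
    return packet.path_stack.pop()

# Example: a 3-hop path where the sender pre-computed ports [2, 5, 1]
# (the first hop's port sits on top of the stack).
pkt = Packet(payload=b"hello", path_stack=[1, 5, 2])
print(switch_forward(pkt))   # 2 -> first switch forwards on port 2
print(switch_forward(pkt))   # 5
print(switch_forward(pkt))   # 1
```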