Figure 1
Simple network topology with two aggregation switches and four racks illustrating the tradeoff between bandwidth usage (pack servers together, (a)) and fault-tolerance (spread servers across racks, (b)). Grayed boxes indicate parts of the cluster network each allocation is using. Assuming only racks as fault domains, (a) has worst-case survival of 0.5 (four of the eight servers survive a failure of a rack), while (b) has worst-case survival of 0.75 (six of the eight servers survive).

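For concreteness, the worst-case survival numbers in the caption can be reproduced with a short sketch; the rack names and allocations below are invented to mirror layouts (a) and (b):

```python
from collections import Counter

def worst_case_survival(allocation):
    """Fraction of a service's servers that survive the failure of the
    single worst fault domain (e.g., a rack)."""
    per_domain = Counter(allocation)          # servers per fault domain
    total = sum(per_domain.values())
    worst_loss = max(per_domain.values())     # losing the most-loaded domain
    return (total - worst_loss) / total

# (a) eight servers packed into two racks, four per rack
packed = ["rack1"] * 4 + ["rack2"] * 4
# (b) eight servers spread across four racks, two per rack
spread = ["rack1", "rack1", "rack2", "rack2",
          "rack3", "rack3", "rack4", "rack4"]

print(worst_case_survival(packed))   # 0.5
print(worst_case_survival(spread))   # 0.75
```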

Source publication
Article
Full-text available
Datacenter networks have been designed to tolerate failures of network equipment and provide sufficient bandwidth. In practice, however, failures and maintenance of networking and power equipment often make tens to thousands of servers unavailable, and network congestion can increase service latency. Unfortunately, there exists an inherent tradeoff...

Contexts in source publication

Context 1
... performs the best. When ignoring the number of server moves, CUT+FT+BW achieves the best performance (see Figure 10). This algorithm achieves 30%−60% reduction in bandwidth usage in the core of the network, while at the same time improving FT by 40%−120%. ...
Context 2
... of comparing the performance of the algorithm on single configurations of problem parameters, we compare the entire achievable tradeoff boundaries for these algorithms. In other words, we run the algorithm with different values of the parameters and plot the BW and FT achieved (see Figure 10). The solid line in the figure clearly represents the best algorithm, since its performance curve "dominates" the respective curves of the other two algorithms. ...
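A minimal sketch of extracting such an achievable tradeoff boundary from a set of (BW, FT) results is shown below; the data points are invented, and lower BW / higher FT are assumed to be the preferred directions:

```python
def tradeoff_boundary(points):
    """Keep only the non-dominated (BW, FT) points: lower core bandwidth (BW)
    and higher fault tolerance (FT) are both better."""
    boundary = []
    for bw, ft in sorted(points, key=lambda p: (p[0], -p[1])):
        # every earlier point has BW <= bw, so this point survives only if
        # its FT beats everything kept so far
        if not boundary or ft > boundary[-1][1]:
            boundary.append((bw, ft))
    return boundary

# hypothetical (bandwidth, fault tolerance) results from runs with different parameters
runs = [(0.40, 0.55), (0.45, 0.80), (0.60, 0.70), (0.70, 0.90), (0.55, 0.85)]
print(tradeoff_boundary(runs))   # [(0.4, 0.55), (0.45, 0.8), (0.55, 0.85), (0.7, 0.9)]
```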
Context 3
... these algorithms first use the minimum k-way cut to reduce the bandwidth at the core, followed by performing gradient descent that improves fault tolerance, gradient descent that improves fault tolerance and bandwidth, and randomizing the low-talking services, respectively. We show results for one of the datacenters in Figures 10 and 11(left), and results for the remaining three datacenters in Figure 13a. We note that we have examined ways to incorporate FT optimization directly within the cut procedure. ...
Context 4
... these algorithms first use the minimum k-way cut to reduce the bandwidth at the core, followed by performing gradient descent that improves fault tolerance, gradient descent that improves fault tolerance and bandwidth, and randomizing the low-talking services, respectively. We show results for one of the datacenters in Figures 10 and 11(left), and results for the remaining three datacenters in Figure 13a. We note that we have examined ways to incorporate FT optimization directly within the cut procedure. ...
Context 5
... performs the minimum k-way cut (reaching the lower-left point in Figure 11(left)), followed by the steepest descent algorithm that only considers improvement in fault tolerance. We executed the algorithm many times, each time allowing it to swap an increasing number of servers. ...
Context 6
... executed the algorithm many times, each time allowing it to swap an increasing number of servers. The resulting BW and FT metrics of the obtained server allocations are shown in Figure 10 (in particular, the CUT+FT curve). The diagonal line in this figure represents the achievable tradeoff boundary for this algorithm; by changing the total number of performed swaps, we can control the tradeoff between BW and FT. ...
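A rough sketch of the CUT+FT procedure described in the contexts above follows; this is not the authors' implementation, and `initial_cut`, `fault_tolerance`, and the allocation dictionary are placeholder names assumed for illustration:

```python
import itertools

def cut_then_ft(servers, swap_budget, initial_cut, fault_tolerance):
    """Rough sketch of CUT+FT: start from a bandwidth-minimizing k-way cut,
    then greedily apply the best fault-tolerance-improving server swap,
    up to a swap budget. `initial_cut` and `fault_tolerance` are placeholders."""
    allocation = initial_cut(servers)          # e.g., result of a min k-way cut
    for _ in range(swap_budget):
        best_gain, best_swap = 0.0, None
        for a, b in itertools.combinations(servers, 2):
            if allocation[a] == allocation[b]:
                continue                       # swapping within a rack is a no-op
            candidate = dict(allocation)
            candidate[a], candidate[b] = candidate[b], candidate[a]
            gain = fault_tolerance(candidate) - fault_tolerance(allocation)
            if gain > best_gain:
                best_gain, best_swap = gain, candidate
        if best_swap is None:                  # no strictly improving swap left
            break
        allocation = best_swap
    return allocation
```

Re-running this with increasing values of `swap_budget` and recording the resulting BW and FT would yield the kind of tradeoff boundary discussed above.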
Context 7
... results of this algorithm depend on the value of α; higher values of α put more weight on improvement of bandwidth at the cost of not improving fault tolerance as much. Figure 11 shows the progress of this algorithm for three different values of α. By running the algorithm until convergence with several different values of α, we obtain the "benchmark boundary" to which other algorithms can be compared (see the solid line in Figure 10). ...
Context 8
... 11 shows the progress of this algorithm for three different values of α. By running the algorithm until convergence with several different values of α, we obtain the "benchmark boundary" to which other algorithms can be compared (see the solid line in Figure 10). Because this algorithm is not optimizing over a convex function, it is not guaranteed to reach the global optimum. ...
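The exact form of the α-weighted objective is not given in this excerpt; a common weighted combination, assumed here purely for illustration, would be:

```python
def combined_cost(allocation, alpha, core_bandwidth, fault_tolerance):
    """Assumed form of the weighted objective driving the descent steps:
    alpha close to 1 emphasizes core bandwidth, alpha close to 0 emphasizes
    fault tolerance. Both metric callables are placeholders and should
    return values normalized to a comparable scale."""
    bw = core_bandwidth(allocation)    # lower is better
    ft = fault_tolerance(allocation)   # higher is better
    return alpha * bw - (1.0 - alpha) * ft
```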
Context 9
... first performs the minimum k-way cut, followed by randomizing the allocation of the least-communicating services responsible for a total of y% of the total traffic in the cluster. The achievable tradeoff boundary for this algorithm (see footnote 8) is in Figure 10. This algorithm achieves performance close to the CUT+FT+BW algorithm, but it does not optimize the bandwidth of the low-talking services nor the fault tolerance of the high-talking ones, which explains the gap between these two algorithms. ...
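A sketch of the randomization step, with an invented per-service traffic map and placement dictionary, might look like this:

```python
import random

def randomize_low_talkers(placement, service_traffic, racks, y_percent):
    """Re-place the least-communicating services that together account for
    y_percent of total traffic; `placement` maps service -> rack."""
    total = sum(service_traffic.values())
    budget = total * y_percent / 100.0
    accumulated = 0.0
    # walk services from the lowest-traffic upwards
    for service in sorted(service_traffic, key=service_traffic.get):
        if accumulated + service_traffic[service] > budget:
            break
        accumulated += service_traffic[service]
        placement[service] = random.choice(racks)   # random re-placement
    return placement
```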
Context 10
... algorithm starts from the current server allocation and performs steepest descent moves on the cost function that considers the fault tolerance and bandwidth. The progress of this algorithm for different values of α is shown in Figure 11(right); as in CUT+FT+BW, using larger α skews the optimization towards optimizing bandwidth. In this figure, each marker corresponds to moving approximately an additional 2% of servers. ...
Context 11
... that improvement is significant at the beginning and then slows down. Figure 12 shows the achievable tradeoff boundaries of FT+BW for different fractions of the cluster that are required to move. For example, notice that we obtain significant improvements by moving... [Footnote 8: Using y = 0, 25, 50, 60, 70, 80, 85, 90, 95, 98, 99, 99.9, 100.] ...
Context 12
... just 5% of the cluster. Moving 29% of the cluster achieves results similar to moving most of the machines using the CUT+FT+BW algorithm (see the outer double line in Figure 12). Results for three additional datacenters are presented in Figure 13(b). ...
Context 13
... 29% of the cluster achieves results similar to moving most of the machines using the CUT+FT+BW algorithm (see the outer double line in Figure 12). Results for three additional datacenters are presented in Figure 13(b). ...
Context 14
... notice that when running FT+BW until convergence (see Figure 13a), it achieves results close to CUT+FT+BW even without the global optimization of graph cut. This is significant, because it means we can use FT+BW incrementally (e.g., move 2% of the servers every day) and still reach similar performance as CUT+FT+BW that reshuffles the whole datacenter at once. ...
Context 15
... fault tolerance was reduced, stayed the same, and was improved for 7%, 35%, and 58% of services, respectively. Finally, Figure 14(left) shows the changes of bandwidth and fault tolerance for all services with reduced fault tolerance. Again, a few services contributed significantly to the 47% drop in bandwidth, but paid for it by being spread across fewer fault domains. ...
Context 16
... α = 0.1, FT+BW achieved a 26% reduction in bandwidth usage, but improved the fault tolerance by 140%. In this case, fault tolerance was reduced only for 2.7% of the services (see the right plot in Figure 14) and the magnitude of the reduction was much smaller than for α = 1.0. This demonstrates how the value of α controls the tradeoff between fault tolerance and bandwidth usage. ...
Context 17
... say that a service is affected by a potential hardware failure if its worst-case survival is less than a certain threshold H. We use H = 30%, which is used in the alert sys... [Figure 14 caption: The relative change in core bandwidth (x-axis) and fault tolerance (y-axis) for all services (circles) that actually reduced their fault tolerance, for α=1 (left) and α=0.1 (right).] ...
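The affected-service check described here reduces to a few lines; the sketch below assumes a mapping from each service to the fault domains of its servers (an invented data structure):

```python
from collections import Counter

def affected_services(service_allocations, threshold=0.30):
    """Services whose worst-case survival falls below the threshold H
    (H = 30% in the excerpt). `service_allocations` maps each service name
    to the list of fault domains its servers occupy."""
    affected = []
    for name, domains in service_allocations.items():
        per_domain = Counter(domains)
        survival = 1.0 - max(per_domain.values()) / len(domains)
        if survival < threshold:
            affected.append(name)
    return affected
```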

Similar publications

Article
Full-text available
A fundamental goal of data center networking is to efficiently interconnect a large number of servers at low equipment cost. Several server-centric network structures for data centers have been proposed. They, however, are not truly expandable and suffer from a low degree of regularity and symmetry. Inspired by the commodity servers in today's dat...
Article
Full-text available
The data center network connecting the servers in a data center plays a crucial role in orchestrating the infrastructure to deliver peak performance to users. In order to meet high performance and reliability requirements, the data center network is usually constructed of a massive number of network devices and links to achieve 1:1 oversubscription...

Citations

... For the smooth operation of various online services and applications, the DCN should provide high bandwidth, low latency, and high throughput for hosted applications. It has some special characteristics compared with other IP networks, such as low transmission delay, auto-scaling, a many-to-one communication traffic pattern with high bandwidth, multi-rooted tree topology, and shallow buffered switches [2,3,8,9]. Due to these special characteristics of DCN, traditional TCP congestion control algorithms perform poorly [2,3]. Due to the distributed nature of applications hosted in data centers, their performance is significantly affected by the communication network used in the data centers. ...
... The increase in latency may be a loss for an online sales application, as users switch to some other application. The traffic inside the DCN should have the following objectives: the efficiency of the data center depends on the performance of the DCN [8]. So, knowing the traffic characteristics inside the DCN is very important. ...
Article
Full-text available
In recent years, Data Center Networks (DCN) have become a very popular platform for hosting various online services and applications, such as e-commerce, social networking, large-scale computing, and web searching, due to their cost-effective and efficient service provisioning. DCN, online services, and applications typically require minimal latency in any information exchange. Moreover, compared with Internet traffic, the nature of traffic in DCN is bursty, delay-sensitive, and throughput-sensitive. For this reason, state-of-the-art TCP congestion control algorithms perform poorly and suffer from problems such as TCP Incast, TCP Outcast, Pseudo-Congestion Effect, Buffer pressure, and Queue build-up. To improve the performance of DCN, various congestion control algorithms have been proposed in recent years. This paper summarizes the reasons why state-of-the-art TCP congestion control algorithms perform poorly and presents an overview of the recently proposed congestion control algorithms for DCN, followed by a comparative summary of their performance.
... Unfortunately, this tuning process is non-trivial as performance characteristics are not only highly dependent on the workloads, but also on the underlying data center architecture. To make matters worse, more and more big data systems opt for disaggregated in-memory and virtual disk storage that crosses a network where interactions are complex [4,14,33], the topology is constantly changing due to failures [6,9,17,22], and next-generation designs are increasingly sophisticated [3,37,44,45]. Prior work has resulted in complex solutions that either fail to provide portable performance [20,23,29-31] or are difficult to reason about [5]. ...
Preprint
Full-text available
Cloud data centers are rapidly evolving. At the same time, large-scale data analytics applications require non-trivial performance tuning that is often specific to the applications, workloads, and data center infrastructure. We propose TeShu, which makes network shuffling an extensible unified service layer common to all data analytics. Since an optimal shuffle depends on a myriad of factors, TeShu introduces parameterized shuffle templates, instantiated by accurate and efficient sampling that enables TeShu to dynamically adapt to different application workloads and data center layouts. Our experimental results with real-world graph workloads show that TeShu efficiently enables shuffling optimizations that improve performance and adapt to a variety of scenarios.
... Another group of work focuses on explicit network scheduling and job placement, where the primary objective is to localize most of the traffic flow between tasks and balance the network utilization across the cluster [11,19,2,5,46,36,8]. Such frameworks have more fine-grained information about the network I/O, but lack tight integration between network flows and application-level requirements. ...
... Equation (2) implies that the length of a pipelineable-only path is dominated by the pipelineable task with the longest execution time, as shown in Fig. 5. Moreover, we can observe that the maximum throughput of the flow can also be restricted by the CPU processing speed when pipelining is used. ...
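Equation (2) itself is not reproduced in this excerpt; the toy calculation below, with invented task times, merely illustrates the stated property that a pipelineable-only path is dominated by its longest task:

```python
# Hypothetical per-task execution times (seconds) along one path.
task_times = [0.8, 2.5, 1.1]

sequential_length = sum(task_times)   # no pipelining: tasks run back to back
pipelined_length = max(task_times)    # steady-state pipelined path is bounded
                                      # by its slowest (bottleneck) task

print(sequential_length, pipelined_length)   # 4.4 2.5
```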
Preprint
Distributed applications, such as database queries and distributed training, consist of both compute and network tasks. DAG-based abstraction primarily targets compute tasks and has no explicit network-level scheduling. In contrast, Coflow abstraction collectively schedules network flows among compute tasks but lacks the end-to-end view of the application DAG. Because of the dependencies and interactions between these two types of tasks, it is sub-optimal to only consider one of them. We argue that co-scheduling of both compute and network tasks can help applications towards the globally optimal end-to-end performance. However, none of the existing abstractions can provide fine-grained information for co-scheduling. We propose MXDAG, an abstraction to treat both compute and network tasks explicitly. It can capture the dependencies and interactions of both compute and network tasks leading to improved application performance.
... The best way to place VMs on PMs is not merely packing the maximum number of VMs into the minimum number of PMs, because in this case, in addition to the energy consumption resulting in high costs and CO2 emissions to the environment, important criteria must be considered in VMC approaches, such as migration overhead [30-32], performance [7,33-35], SLAv [23,36], cooling [5,37,38], thermal and temperature [21,30,39], ON-OFF cycles [22,40], VMs affinity [7,28,29,41], reliability [19,20], the hardware cost and its longevity [22,23], load balancing [24,25], NBW [27,42-44], and resources utilization [8,45,46]. In other words, to implement the VMC algorithms, reducing power consumption along with the mentioned criteria must be considered for holistic efficiency in the cloud. ...
... We choose the most power-efficient PM for VM placement, with the condition that it does not become overloaded after the migration. In many previous articles, exact [20-23], greedy [28,42,43], and evolutionary [44,51,66] methods have been used for this phase of VMC. ...
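A minimal sketch of that greedy selection, assuming invented PM records with a power-efficiency score and CPU capacity (an illustration, not the cited algorithm):

```python
def place_vm(vm_cpu, pms):
    """Pick the most power-efficient PM that stays below capacity after
    hosting the VM; each PM is a dict with invented fields."""
    candidates = [pm for pm in pms if pm["used_cpu"] + vm_cpu <= pm["cpu_capacity"]]
    if not candidates:
        return None                           # no feasible host for this VM
    best = max(candidates, key=lambda pm: pm["power_efficiency"])
    best["used_cpu"] += vm_cpu
    return best["name"]

pms = [
    {"name": "pm1", "cpu_capacity": 32, "used_cpu": 28, "power_efficiency": 0.9},
    {"name": "pm2", "cpu_capacity": 32, "used_cpu": 10, "power_efficiency": 0.7},
]
print(place_vm(8, pms))   # pm1 would overload, so pm2 is chosen
```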
Article
Full-text available
Cloud Computing Systems (CCSs) provide computing capability through the Internet. They enable organizations or individuals to have computing power without deploying and maintaining their own Information Technology infrastructure. As a cloud is realized on a vast scale, it consumes an enormous amount of energy. The migration pattern in which several Virtual Machines (VMs) are placed on a minimum number of active Physical Machines is called VM Consolidation (VMC). Thus, this technique can be a practical approach for balancing electricity consumption and other QoS requirements in CCSs. In particular, VMC must meet service quality requirements while minimizing both energy consumption and Service Level Agreement violations in CCSs. This paper presents a systematic survey of VMC in CCSs with particular attention to the VMC phases, metrics, objectives, migration patterns, optimization methods, and evaluation approaches. Our review is based on the past literature, with a focus on the types of hardware metrics, software metrics, objectives, algorithms, and architectures of VMC in CCSs.
... (2) different types of jobs, CPU- or IO-intensive, whose tasks have different input data sizes, which can significantly affect the performance of the Hadoop scheduler and limit the overall throughput of the system. Therefore, in a heterogeneous system with multiple tasks belonging to various jobs, designing an efficient scheduling algorithm is a vital challenge [9-12]. ...
Article
Full-text available
In the context of MapReduce task scheduling, many algorithms mainly focus on the scheduling of Reduce tasks with the assumption that scheduling of Map tasks is already done. However, in cloud deployments of MapReduce, the input data is located on remote storage, which indicates the importance of the scheduling of Map tasks as well. In this paper, we propose a two-stage Map and Reduce task scheduler for heterogeneous environments, called TMaR. TMaR schedules Map and Reduce tasks on the servers that minimize the task finish time in each stage, respectively. We employ a dynamic partition binder for Reduce tasks in the Reduce stage to lighten the shuffling traffic. Indeed, TMaR minimizes the makespan of a batch of tasks in heterogeneous environments while considering the network traffic. The simulation results demonstrate that TMaR outperforms Hadoop-stock and Hadoop-A in terms of makespan and network traffic, and improves performance by an average of 29%, 36%, and 14% on the Wordcount, Sort, and Grep benchmarks, respectively. Besides, the power reduction of TMaR is up to 12%.
... Bodík et al. [16] presented an allocation plan that improves service survivability while reducing the bandwidth bottleneck in the core of the data center network. Their proposal improves fault tolerance by spreading out VMs across multiple fault domains while minimizing overall bandwidth usage. ...
Article
Full-text available
The virtualization of the data center network is one of the technologies that enable performance guarantees, greater flexibility, and improved utilization of infrastructure resources in cloud computing. One of the key issues in the management of a virtual data center (VDC) is VDC embedding, which deals with the efficient mapping of required virtual network resources onto the shared resources of the infrastructure provider (InP). In this paper, we propose a new VDC embedding algorithm that differs from previous works in several aspects. First, the provision of robustness for the data center infrastructure is one of the critical requirements of cloud technology; however, this challenge has not been considered in the related literature. In order to analyze and evaluate the robustness of the infrastructure network, classical and spectral graph robustness metrics are employed. Second, in order to avoid imbalanced mapping and increase the efficiency of infrastructure resources, four node attributes, besides the dynamic resource capacity, are exploited to compute the node mapping potential. The TOPSIS technique is used for node ranking to increase compatibility with the ideal solution. Third, unlike previous works in which the node and link mapping phases are performed separately, in the proposed algorithm the virtual network is mapped to the physical network in a single step. Fourth, we also consider resources for network nodes (switches or routers). For these purposes, a multi-objective mathematical optimization problem is formulated with the two goals of maximizing infrastructure network robustness and minimizing the long-term average cost-to-revenue ratio of mapping for InPs. Finally, a new single-stage online VDCE algorithm based on NSGAII (a non-dominated sorting-based genetic algorithm) is presented, where node mapping is TOP-MANR based and edge mapping is based on the shortest path. The fat-tree topology is considered for the substrate and virtual networks, and these two networks are modeled as weighted undirected graphs.
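TOPSIS is a standard multi-criteria ranking technique; the generic sketch below, with invented node attributes and weights rather than the paper's actual criteria, shows how candidate nodes could be ranked by closeness to the ideal solution:

```python
import numpy as np

def topsis_rank(scores, weights, benefit):
    """Generic TOPSIS ranking: `scores` is an (alternatives x criteria) matrix,
    `weights` sums to 1, and `benefit[j]` is True when a higher value of
    criterion j is better."""
    norm = scores / np.linalg.norm(scores, axis=0)        # vector-normalize columns
    weighted = norm * weights
    ideal = np.where(benefit, weighted.max(axis=0), weighted.min(axis=0))
    anti = np.where(benefit, weighted.min(axis=0), weighted.max(axis=0))
    d_ideal = np.linalg.norm(weighted - ideal, axis=1)
    d_anti = np.linalg.norm(weighted - anti, axis=1)
    closeness = d_anti / (d_ideal + d_anti)                # 1.0 = ideal solution
    return np.argsort(-closeness)                          # best candidate first

# Invented example: 3 candidate nodes, criteria = [free CPU, degree, load]
scores = np.array([[16.0, 4.0, 0.3],
                   [ 8.0, 6.0, 0.1],
                   [32.0, 2.0, 0.7]])
weights = np.array([0.5, 0.3, 0.2])
benefit = np.array([True, True, False])   # lower load is better
print(topsis_rank(scores, weights, benefit))
```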
... On one hand, to meet the growing demand for in-memory data processing and increase the competitive edge in service provisioning, cloud providers have started to plan and upgrade their Infrastructure as a Service (IaaS) to include large-memory virtual machines with terabytes (TBs) of memory in their high-end offerings [1,2]. On the other hand, memory utilization imbalance and temporal memory usage variations are frequently observed and reported in virtualized clouds [3-11] and production datacenters [12-16]. The memory upgrade trend further exacerbates the memory utilization imbalance problems. ...
... First, accurate memory allocation is hard, as peak memory variations happen under different application types, workload inputs, data characteristics, and traffic patterns. Second, applications often over-estimate their requirements or attempt to allocate for peak memory usage [3,7,8,12], resulting in severely imbalanced memory usage across virtual servers (e.g., VMs, containers, or executors), and underutilization on the local host node and remote nodes across the cluster [11,18]. It has been suggested in [20] that main memory will be viewed as secondary storage and secondary caches to processors. ...
... Recent work has looked at making better use of the bandwidth by the design of advanced transport protocols [13,38], flow scheduling algorithms [12,20], and job placement strategies [17,27]. Advanced transport protocols aim to mitigate congestion, scheduling algorithms can lead to higher throughput, and careful job placement and execution strategies reduce inter-rack traffic. ...
Conference Paper
Data center networks are designed to interconnect large clusters of servers. However, their static, rack-based architecture poses many constraints. For instance, due to over-subscription, bandwidth tends to be highly unbalanced---while servers in the same rack enjoy full bisection bandwidth through a top-of-rack (ToR) switch, servers across racks have much more constrained bandwidth. This translates to a series of performance issues for modern cloud applications. In this paper, we propose a rackless data center (RDC) architecture that removes this fixed "rack boundary". We achieve this by inserting circuit switches at the edge layer, and dynamically reconfiguring the circuits to allow servers from different racks to form "locality groups". RDC optimizes the topology between servers and edge switches based on the changing workloads, and achieves lower flow completion times and improved load balance for realistic workloads.
... All tree-like network topologies such as VL2 [30] and FatTree [36] can be modeled as hierarchy trees just like the one shown in Figure 5 [37]. On such a network, only data transmitted across aggregation domains consumes core bandwidth. ...
... For example, both GFS [1] and HDFS [2] put two of the three replicas into the same rack to reduce the core bandwidth consumed by data writes. The authors of [37] also proposed to place services that communicate a lot with each other in the same domain. As far as we know, LAR is the first data reconstruction mechanism that has focused on reducing core bandwidth usage. ...
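The rule that only cross-domain traffic consumes core bandwidth can be expressed in a few lines; the domain map and traffic matrix below are invented:

```python
def core_bandwidth(traffic, domain_of):
    """Sum only the traffic whose endpoints sit in different aggregation
    domains; intra-domain traffic never crosses the network core."""
    return sum(volume for (src, dst), volume in traffic.items()
               if domain_of[src] != domain_of[dst])

# Invented example: two racks (aggregation domains) and three flows.
domain_of = {"s1": "rack1", "s2": "rack1", "s3": "rack2"}
traffic = {("s1", "s2"): 10.0,   # stays inside rack1
           ("s1", "s3"): 4.0,    # crosses the core
           ("s2", "s3"): 1.0}    # crosses the core
print(core_bandwidth(traffic, domain_of))   # 5.0
```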
Article
Full-text available
Many modern distributed storage systems adopt erasure coding to protect data from frequent server failures for cost reasons. Reconstructing data in failed servers efficiently is vital to these erasure-coded storage systems. To this end, tree-structured reconstruction mechanisms, where blocks are transmitted and combined through a reconstruction tree, have been proposed. However, existing tree-structured reconstruction mechanisms build reconstruction trees from the perspective of available network bandwidths between servers, which are fluctuating and difficult to measure. Besides, these reconstruction mechanisms cannot reduce data transmission. In this study, we overcome these limitations by proposing LAR, a locality-aware tree-structured reconstruction mechanism. LAR builds reconstruction trees from the perspective of data locality, which is stable and easy to obtain. More importantly, by building reconstruction trees that combine blocks closer to each other first, LAR can reduce the data transmitted through the network core and hence speed up reconstruction. We prove that a minimum spanning tree is an optimal reconstruction tree that minimizes core bandwidth usage. We also design and implement a general reconstruction framework that supports all tree-structured reconstruction mechanisms and nearly all erasure codes. Large-scale simulations on commonly deployed network topologies show that LAR consumes 20%-61% less core bandwidth than previous reconstruction mechanisms. Thorough experiments on a testbed consisting of 40 physical servers show that LAR improves proactive recovery throughput by at least 23% and improves degraded read rate by up to 68%.
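The claim that a minimum spanning tree minimizes core bandwidth usage suggests a simple construction; below is a generic Prim-style sketch over a placeholder locality distance between the servers holding the needed blocks (an illustration of the idea, not LAR's actual implementation):

```python
import heapq

def reconstruction_tree(servers, distance):
    """Prim's algorithm: grow a minimum spanning tree over the servers that
    hold the blocks needed for reconstruction; `distance(a, b)` is a
    placeholder locality metric (e.g., 0 same rack, 1 same pod, 2 across pods)."""
    root, *rest = servers
    in_tree = {root}
    edges = [(distance(root, s), root, s) for s in rest]
    heapq.heapify(edges)
    tree = []
    while edges and len(in_tree) < len(servers):
        d, u, v = heapq.heappop(edges)
        if v in in_tree:
            continue
        in_tree.add(v)
        tree.append((u, v, d))
        for w in servers:
            if w not in in_tree:
                heapq.heappush(edges, (distance(v, w), v, w))
    return tree
```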
... For instance, a measurement study on DCN traffic characteristics shows that more than 98% of the links are utilized less than 1% [4]. Therefore, recent work has considered ways of making better use of the DCN bandwidth, including using advanced transport protocols [3,13], optimized flow scheduling [2,7], and intelligent job placement [5,10]. They attack the problem from different angles-advanced transport protocols can mitigate congestion more effectively, better scheduling algorithms can lead to higher throughput, and careful job placement and execution strategies could reduce inter-rack traffic. ...