Extending the Birkhoff-von Neumann Switching Strategy for Multicast - On the use of Optical Splitting in Switches

The Birkhoff-von Neumann (BVN) strategy for single-stage input-queued crossbar switches does not support multicast, as it considers only permutation-based switch configurations. This paper extends the BVN strategy to multicast switching, where an input can simultaneously transmit to multiple outputs. Knowledge of the average rates of flows is used to compute an offline schedule. We begin by considering a system in which the fanout of each flow is split in a predecided manner. We call this static splitting (as opposed to dynamic splitting where no such constraint is imposed), and we study the rate region of the switch under this restriction. We provide a graph-theoretic formulation of the rate region.
... Moreover, our objective is to minimize the flow completion time, while frame-based multicast scheduling aims at finding permutation matrices that satisfy a given rate matrix. We refer to the work of Sundararajan et al. [51] for an overview of frame-based multicast scheduling. ...
... We note that Sundararajan et al. [51] showed the problem of deciding the feasibility of a given rate vector in this context to be NP-hard. There are subtle differences between the models; see our discussion in §2. ...
Article
Modern cloud applications have led to a huge increase in multicast flows, which are becoming one of the primary communication patterns in today's datacenter networks. Emerging datacenter technologies enable interesting new opportunities to support such multicast traffic more effectively and flexibly in the physical layer: novel circuit switches offer high-bandwidth and reconfigurable inter-rack multicasting capabilities. However, not much is known today about the algorithmic challenges introduced by this new technology, especially in optimizing the completion times for multicast flows. This paper presents SplitCast, a preemptive multicast scheduling approach that fully exploits emerging high-bandwidth physical-layer multicasting capabilities to reduce flow times. SplitCast dynamically reconfigures the circuit switches to adapt to the multicast traffic, accounting for reconfiguration delays. In particular, SplitCast relies on simple single-hop routing and leverages transfer flexibility by supporting splittable multicast, so that a transfer can be delivered to just a subset of receivers when the circuit capacity is insufficient. Moreover, SplitCast supports two common forwarding models during circuit reconfiguration: all-stop and not-all-stop. We conduct extensive simulations to evaluate the performance of SplitCast, and the results show that SplitCast can cut down flow times significantly compared to state-of-the-art solutions.
... Sundararajan et al. [32] extended this Birkhoff-von Neumann approach to multicast switching. Using a graph-theoretic formulation, they showed that the rate region of multicast switching without fanout splitting (defined in Section II-C1) is precisely the stable set polytope of the traffic pattern's "conflict graph", which we shall discuss in Section III-A. ...
... As a result, they showed that the problem of deciding achievability in a multicast switch is equivalent to the membership problem for the stable set polytope of a graph, which is known to be NP-hard. In addition, [32] showed that computing the offline schedule for multicast traffic, unlike that for unicast traffic, is hard. Indeed, it is equivalent to fractional weighted graph coloring, which is NP-hard in general. ...
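The conflict-graph view in the snippets above can be illustrated with a small brute-force sketch. It assumes that two flows conflict when they share an input or an output port (no fanout splitting); the flow names, the `(input, outputs)` encoding, and both helper functions are hypothetical and are not taken from the cited papers. The enumeration is exponential in the number of flows, consistent with the hardness results quoted above, so it is only meant for toy instances.

```python
from itertools import combinations


def conflict_graph(flows):
    """Build conflict edges for multicast flows without fanout splitting.

    flows maps a flow name to (input_port, frozenset_of_output_ports).
    Assumption: two flows conflict if they share the input or any output.
    Edges are returned as pairs of names in sorted order.
    """
    edges = set()
    for a, b in combinations(sorted(flows), 2):
        (in_a, outs_a), (in_b, outs_b) = flows[a], flows[b]
        if in_a == in_b or outs_a & outs_b:
            edges.add((a, b))
    return edges


def stable_sets(flows, edges):
    """Enumerate every stable (independent) set of the conflict graph."""
    names = sorted(flows)
    found = []
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            # subset is sorted, so each pair matches the edge orientation.
            if all(pair not in edges for pair in combinations(subset, 2)):
                found.append(frozenset(subset))
    return found


# Hypothetical traffic pattern: A and B both need output 1, C is independent.
flows = {
    "A": (0, frozenset({0, 1})),
    "B": (1, frozenset({1, 2})),
    "C": (2, frozenset({3})),
}
edges = conflict_graph(flows)
configs = stable_sets(flows, edges)
```

Each stable set is a group of flows that can be served in the same time slot, so the incidence vectors of these sets are the extreme points from which the stable set polytope is built.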
Conference Paper
We consider the problem of serving multicast flows in a crossbar switch. We show that linear network coding across packets of a flow can sustain traffic patterns that cannot be served if network coding were not allowed. Thus, network coding leads to a larger rate region in a multicast crossbar switch. We demonstrate a traffic pattern which requires a switch speedup if coding is not allowed, whereas, with coding the speedup requirement is eliminated completely. In addition to throughput benefits, coding simplifies the characterization of the rate region. We give a graph-theoretic characterization of the rate region with fanout splitting and intra-flow coding, in terms of the stable set polytope of the "enhanced conflict graph" of the traffic pattern. Such a formulation is not known in the case of fanout splitting without coding. We show that computing the offline schedule (i.e. using prior knowledge of the flow arrival rates) can be reduced to certain graph coloring problems. Finally, we propose online algorithms (i.e. using only the current queue occupancy information) for multicast scheduling based on our graph-theoretic formulation. In particular, we show that a maximum weighted stable set algorithm stabilizes the queues for all rates within the rate region.
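As a rough illustration of the online idea mentioned above, the following sketch picks, for one time slot, the stable set of largest total queue length. It is a brute-force stand-in under stated assumptions: the graph is passed in as an explicit edge list (it could be a conflict graph or an enhanced conflict graph, whose construction is not shown here), and the function name and data layout are hypothetical rather than the paper's algorithm.

```python
from itertools import combinations


def max_weight_stable_set(weights, edges):
    """Return (stable_set, total_weight) maximizing total weight.

    weights maps each vertex (flow) to its current queue length.
    edges is an iterable of conflicting vertex pairs.
    Brute force over all subsets, so only suitable for tiny graphs.
    """
    conflicts = {frozenset(edge) for edge in edges}
    vertices = sorted(weights)
    best, best_weight = frozenset(), 0
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            # Skip subsets that contain any conflicting pair.
            if any(frozenset(p) in conflicts for p in combinations(subset, 2)):
                continue
            total = sum(weights[v] for v in subset)
            if total > best_weight:
                best, best_weight = frozenset(subset), total
    return best, best_weight


# Hypothetical queue lengths on a path graph A - B - C.
queues = {"A": 5, "B": 7, "C": 4}
chosen, weight = max_weight_stable_set(queues, [("A", "B"), ("B", "C")])
```

In this toy instance, serving A and C together (total backlog 9) beats serving the single longest queue B (backlog 7).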
Article
The problem of serving multicast flows in a crossbar switch is considered. Intraflow linear network coding is shown to achieve a larger rate region than the case without coding. A traffic pattern is presented which is achievable with coding but requires a switch speedup when coding is not allowed. The rate region with coding can be characterized in a simple graph-theoretic manner, in terms of the stable set polytope of the "enhanced conflict graph". No such graph-theoretic characterization is known for the case of fanout-splitting without coding. The minimum speedup needed to achieve 100% throughput with coding is shown to be upper bounded by the imperfection ratio of the enhanced conflict graph, where the imperfection ratio measures a certain graph-theoretic property of the given graph. When applied to K × N switches with unicasts and broadcasts only, this gives a bound of min((2K-1)/K, 2N/(N+1)) on the speedup. This shows that speedup, which is usually implemented in hardware, can often be substituted by network coding, which can be done in software. Computing an offline schedule (using prior knowledge of the flow rates) is reduced to fractional weighted graph coloring. A graph-theoretic online scheduling algorithm (using only queue occupancy information) is also proposed that stabilizes the queues for all rates within the rate region.
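A small worked example may help with the speedup bound for K × N switches. The sketch below assumes the bound is read as min((2K-1)/K, 2N/(N+1)), which is the grouping under which both terms stay below 2; the function name is hypothetical.

```python
from fractions import Fraction


def coding_speedup_bound(k, n):
    """Speedup bound for a K x N switch with unicasts and broadcasts only.

    Assumes the bound is min((2K - 1) / K, 2N / (N + 1)).
    Exact rational arithmetic avoids rounding in the comparison.
    """
    return min(Fraction(2 * k - 1, k), Fraction(2 * n, n + 1))


square = coding_speedup_bound(4, 4)   # min(7/4, 8/5)
wide = coding_speedup_bound(2, 8)     # min(3/2, 16/9)
```

For a 4 × 4 switch the second term is the smaller one (8/5), while for a 2 × 8 switch the first term (3/2) is.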
... This is exactly the SSP of the traffic pattern's conflict graph. In addition, it has been shown in [23] that in infinite buffer networks with multicast traffic patterns but no fanout-splitting, the RR is again the SSP of the traffic pattern's conflict graph. ...
Article
Coding techniques may be useful for data center data survivability as well as for reducing traffic congestion. We present a queued cross-bar network (QCN) method that can be used for traffic analysis of both replication/uncoded and coded storage systems. We develop a framework for generating QCN rate regions (RRs) by analyzing their conflict graph stable set polytopes (SSPs). In doing so, we apply recent results from graph theory on the characterization of particular graph SSPs. We characterize the SSP of QCN conflict graphs under a variety of traffic patterns, allowing for their efficient RR computation. For uncoded systems, we show how to compute RRs and find rate-optimal scheduling algorithms. For coded storage, we develop an RR upper bound, for which we provide an intuitive interpretation. We show that the coded storage RR upper bound is achievable in certain coded systems in which drives store sufficient coded information, as well as in certain dynamic coding systems. Numerical illustrations show that coded storage can result in gains in RR volume of approximately 50%, averaged across traffic patterns.
... In both systems, we propose service schemes for UCS and NCS. Schemes presented in this section can be formulated as integer linear programs over content demand graphs, similar to those for cross-bar switches [16], [17] and are omitted here. ...
Article
We consider scheduling strategies for point-to-multipoint (PMP) storage area networks (SANs) that use network coded storage (NCS). In particular, we present a simple SAN system model, two server scheduling algorithms for PMP networks, and analytical expressions for internal and external blocking probability. We point to select scheduling advantages in NCS systems under normal operating conditions, where content requests can be temporarily denied owing to finite system capacity from drive I/O access or storage redundancy limitations. NCS can improve throughput and blocking probability by increasing the number of immediate scheduling options. This complements other well-documented NCS advantages, such as regeneration, and can serve as a guide for future storage system design.
Article
Datacenter networks are critical to cloud computing. The coflow abstraction is a major leap forward for application-aware network scheduling. In the context of multi-stage jobs, there are dependencies among coflows. As a result, there is a large divergence between coflow completion time (CCT) and job completion time (JCT). To the best of our knowledge, this is the first work that systematically studies how to schedule dependent coflows of multi-stage jobs so that the total weighted job completion time is minimized. We present a formal mathematical formulation. Inspired by the optimal solution of the relaxed linear program, we design a polynomial-time algorithm that solves this problem with an approximation ratio of (2M+1) in the general case and 3 in a special case, where M is the number of hosts. Evaluation results demonstrate that the largest gap between our algorithm and the lower bound is only 9.14%. In testbeds, we reduce the JCT by up to 81.65% compared with pure DCTCP. In simulations, we reduce the average JCT by up to 33.48% compared with Aalo, a heuristic multi-stage coflow scheduler, and we reduce the total weighted JCT by up to 83.58% compared with LP-OV-LS, the state-of-the-art approximation algorithm for coflow scheduling.
Article
The problem of providing quality-of-service (QoS) guarantees for multicast traffic over crossbar switches has received limited attention despite the popularity of its counterpart for unicast traffic. Providing 100% throughput to all admissible multicast traffic has been shown to be a very difficult task, and it requires a very high speedup in the switching fabric. In this paper, we introduce the concept of rate quantization and use rate quantization to show an analogy between packet scheduling in crossbar switches and circuit switching in three-stage Clos networks. We exploit the analogy to adopt circuit-switching algorithms in wide-sense and strict-sense nonblocking Clos networks in order to construct nonblocking packet schedulers for unicast and multicast traffic. We illustrate a simple multicast nonblocking packet scheduler, for which a speedup of 6 log n / log log n is sufficient to support 100% throughput for any admissible multicast traffic in an n×n crossbar switch. Moreover, we revisit some problems in unicast switch scheduling. We illustrate that the analogy provides useful perspectives, and we give a simple proof for a well-known result.
Conference Paper
Based on a decomposition result by Birkhoff and von Neumann for a doubly substochastic matrix, in this letter we propose a scheduling algorithm that is capable of providing guaranteed-rate services for input-buffered crossbar switches. Our guarantees are uniformly good for all nonuniform traffic. The computational complexity to identify the scheduling algorithm is O(N^4.5) for an N x N switch. Once the algorithm is identified, its on-line computational complexity is O(log N) and its on-line memory complexity is O(N^3 log N).
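The decomposition that this abstract builds on can be sketched for the simpler doubly stochastic case. The code below is a minimal illustration, not the letter's algorithm: it assumes the rate matrix is already doubly stochastic (the step that turns a doubly substochastic matrix into one is not shown), uses a plain augmenting-path matching rather than anything tuned for the stated complexity, and all function names are hypothetical.

```python
from fractions import Fraction


def perfect_matching(support):
    """Augmenting-path bipartite matching.

    support[i] lists the columns with a positive entry in row i.
    Returns match_col where match_col[j] is the row matched to column j,
    or None if no perfect matching exists.
    """
    n = len(support)
    match_col = [-1] * n

    def try_row(i, seen):
        for j in support[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_col[j] == -1 or try_row(match_col[j], seen):
                match_col[j] = i
                return True
        return False

    for i in range(n):
        if not try_row(i, set()):
            return None
    return match_col


def bvn_decompose(rates):
    """Split a doubly stochastic matrix into weighted permutations.

    Returns a list of (weight, perm) where perm[i] is the output
    connected to input i while that configuration is active.
    """
    n = len(rates)
    m = [[Fraction(x) for x in row] for row in rates]
    terms = []
    while any(x > 0 for row in m for x in row):
        support = [[j for j in range(n) if m[i][j] > 0] for i in range(n)]
        match_col = perfect_matching(support)
        if match_col is None:
            raise ValueError("matrix is not doubly stochastic")
        perm = [0] * n
        for j, i in enumerate(match_col):
            perm[i] = j
        # The smallest matched entry is the weight of this permutation.
        weight = min(m[i][perm[i]] for i in range(n))
        for i in range(n):
            m[i][perm[i]] -= weight
        terms.append((weight, tuple(perm)))
    return terms


half = Fraction(1, 2)
rate_matrix = [[half, half, 0], [half, 0, half], [0, half, half]]
schedule = bvn_decompose(rate_matrix)
```

Each weight can be read as the fraction of time slots in which the switch uses the corresponding permutation, which is how the decomposition yields guaranteed rates for every input-output pair.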
Chapter
Chapter 4 details the operation principles in different design approaches of shared-memory switches, including linked list, content-addressable memory (CAM), space-time-space, multistage. It also covers multicasting methods in shared-memory switches, including multicast logic queue, cell copy circuit, and address copy circuit.
Article
The problem of allocating network resources to the users of an integrated services network is investigated in the context of rate-based flow control. The network is assumed to be a virtual circuit, connection-based packet network. It is shown that the use of generalized processor sharing (GPS), when combined with leaky bucket admission control, allows the network to make a wide range of worst-case performance guarantees on throughput and delay. The scheme is flexible in that different users may be given widely different performance guarantees and is efficient in that each of the servers is work conserving. The authors present a practical packet-by-packet service discipline, PGPS, that closely approximates GPS. This allows them to relate results for GPS to the packet-by-packet scheme in a precise manner. The performance of a single-server GPS system is analyzed exactly from the standpoint of worst-case packet delay and burstiness when the sources are constrained by leaky buckets. The worst-case session backlogs are also determined.
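The sharing rule behind GPS can be sketched in a few lines, assuming the usual fluid definition in which each backlogged session receives a share of the server capacity proportional to its weight. The session names, the weights, and the function name are hypothetical; the packet-by-packet approximation (PGPS) and the leaky bucket analysis are not modeled.

```python
from fractions import Fraction


def gps_rates(phi, backlogged, capacity):
    """Instantaneous GPS service rates at a single server.

    phi maps each session to its positive weight.
    backlogged is the set of sessions that currently have queued traffic.
    Idle sessions get rate 0; their share is split among the others,
    which is what makes the server work conserving.
    """
    total = sum(phi[s] for s in backlogged)
    return {
        s: (Fraction(capacity) * phi[s] / total if s in backlogged else Fraction(0))
        for s in phi
    }


weights = {"a": 1, "b": 2, "c": 1}
rates = gps_rates(weights, {"a", "b"}, 3)
```

With all three sessions backlogged, session "a" would receive only a quarter of the capacity; here it receives a third because the idle session "c" frees its share.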