Article

A Comprehensive Survey on Coded Distributed Computing: Fundamentals, Challenges, and Networking Applications


Abstract

Distributed computing has become a common approach for large-scale computation tasks due to benefits such as high reliability, scalability, computation speed, and cost-effectiveness. However, distributed computing faces critical issues related to communication load and straggler effects. In particular, computing nodes need to exchange intermediate results with one another to calculate the final result, which significantly increases communication overhead. Furthermore, a distributed computing network may include straggling nodes that intermittently run slower than the rest. This lengthens the overall time needed to execute the computation tasks, thereby limiting the performance of distributed computing. To address these issues, coded distributed computing (CDC), i.e., a combination of coding-theoretic techniques and distributed computing, has recently been proposed as a promising solution. Coding-theoretic techniques have proved effective in WiFi and cellular systems for dealing with channel noise. Likewise, CDC can significantly reduce the communication load, alleviate the effects of stragglers, and provide fault tolerance, privacy, and security. In this survey, we first introduce the fundamentals of CDC, followed by basic CDC schemes. Then, we review and analyze a number of CDC approaches proposed to reduce communication costs, mitigate straggler effects, and guarantee privacy and security. Furthermore, we present and discuss applications of CDC in modern computer networks. Finally, we highlight important challenges and promising research directions related to CDC.
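The computation-communication tradeoff at the heart of CDC is easy to make concrete. The sketch below evaluates the standard coded MapReduce loads from Li et al.'s analysis, where replicating each Map task on r of the K nodes cuts the shuffle load by a further factor of r; the formulas are standard, while the parameter choices are illustrative.

```python
# Communication load vs. computation load r in coded MapReduce (Li et al.):
# uncoded shuffle load is 1 - r/K; coded multicasting divides it by r.

def uncoded_load(r: int, K: int) -> float:
    """Fraction of intermediate values shuffled without coding."""
    return 1 - r / K

def coded_load(r: int, K: int) -> float:
    """Optimal shuffle load achieved by coded distributed computing."""
    return (1 - r / K) / r

K = 10
for r in range(1, K + 1):
    print(f"r={r:2d}  uncoded={uncoded_load(r, K):.3f}  coded={coded_load(r, K):.3f}")
```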


... Heterogeneous CDC with different storage sizes and computation loads among workers was considered in [11], [12], [13], [14], [15], and [16]. For more works on CDC, please see the survey in [17]. ...
... where $R_x = \{B \mid x \in B, B \in \mathcal{B}\}$ for $x \in \mathcal{X}$, as mentioned in Definition 5. Equation (17) implies that any worker whose index is in $R_x$ will store file $w_x$ and compute output function $\phi_x$, for $x \in \mathcal{X}$. From Lemma 5, we have ...
... □ • Shuffle phase: According to (17), we set the output function arrangement set as $R$, i.e., each worker $B \in \mathcal{B}$ is arranged to compute the output functions $Q_B = \{u_y = \phi_y(w_{x_1}, \ldots, w_{x_N}) \mid B \in R_y, y \in \mathcal{X}\}$. (22) Using the stored files $Z_B$ and arranged functions $Q_B$, worker $B$ requires the following intermediate values, which it cannot compute locally. ...
Preprint
Coded distributed computing, proposed by Li et al., offers significant potential for reducing the communication load in MapReduce computing systems. In the setting of \emph{cascaded} coded distributed computing consisting of $K$ nodes, $N$ input files, and $Q$ output functions, the objective is to compute each output function through $s\geq 1$ nodes with a computation load $r\geq 1$, enabling the application of coding techniques during the Shuffle phase to achieve the minimum communication load. However, a major limitation of most existing coded distributed computing schemes is that they demand splitting the original data into an exponentially growing number of input files, with $N/\binom{K}{r} \in\mathbb{N}$, and require an exponentially large number of output functions, with $Q/\binom{K}{s} \in\mathbb{N}$, which imposes stringent requirements for implementation and results in significant coding complexity when $K$ is large. In this paper, we focus on the cascaded case of $K/s\in\mathbb{N}$, deliberately designing the input file storage and output function assignment strategies based on a grouping method, such that a low-complexity two-round Shuffle phase is available. The main advantages of our proposed scheme are: 1) the communication load is quite close to, or surprisingly better than, that of the optimal state-of-the-art scheme proposed by Li et al.; 2) our scheme requires significantly fewer input files and output functions; 3) all operations are implemented over the minimum binary field $\mathbb{F}_2$.
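The file-splitting bottleneck described above is simple to quantify: most existing schemes need the number of input files to be a multiple of $\binom{K}{r}$. A quick illustrative check (the parameter choices are ours):

```python
# Minimum number of input files N = C(K, r) demanded by most existing CDC
# schemes, shown here for a fixed computation-load ratio r/K = 1/5.
from math import comb

for K in (10, 20, 40, 80):
    r = K // 5
    print(f"K={K:3d}, r={r:2d}:  C(K, r) = {comb(K, r):,}")
```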
... There are communication links between data owners and workers, and also between adversarial data owners, since they can collude with one another. However, like many coded computing systems [1], [2], there is no communication means between workers in our system. The system works as follows. ...
... This is because the adversaries give up their chance of using workers for computation in order to cause inconsistency in the system. In fact, there exist adversarial behaviors under which there is not enough information to find $f(X_k^{(2)})$, $k \in \mathcal{A}$, from any set of equations. Therefore, in order to avoid complex scenarios, we define the fundamental limit based on the requirement to find only $f(X_k)$, $k \in \mathcal{H}$. ...
... • According to (2), inputs to the tag function have a common part, $X_{\mathcal{H}}$, but there is no such constraint on $x_0$ and $x_1$ in (4). Even if collision resistance were defined as $\Pr[(x_0, x_1, x_2) \leftarrow \mathcal{A} : \ldots$ ...
Preprint
Full-text available
Coded computing has proved to be useful in distributed computing. We have observed that almost all coded computing systems studied so far consider a setup of one master and some workers. However, recently emerging technologies such as blockchain, the internet of things, and federated learning introduce new requirements for coded computing systems. In these systems, data is generated in a distributed manner, so central encoding/decoding by a master is neither feasible nor scalable. This paper presents a fully distributed coded computing system that consists of $k\in\mathbb{N}$ data owners and $N\in\mathbb{N}$ workers, where data owners employ workers to do some computations on their data, as specified by a target function $f$ of degree $d\in\mathbb{N}$. As there is no central encoder, workers perform encoding themselves prior to the computation phase. The challenge in this system is the presence of adversarial data owners that do not know the data of honest data owners but cause discrepancies by sending different data to different workers, which is detrimental to local encoding at the workers. There are at most $\beta\in\mathbb{N}$ adversarial data owners, and each sends at most $v\in\mathbb{N}$ different versions of data. Since the adversaries and their possibly colluding behavior are not known to workers and honest data owners, workers compute tags of their received data, in addition to their main computational task, and send them to data owners to help them in decoding. We introduce a tag function that allows data owners to partition workers into sets that previously received the same data from all data owners. Then, we characterize the fundamental limit of the system, $t^*$, which is the minimum number of workers whose work can be used to correctly calculate the desired function of the data of honest data owners. We show that $t^*=v^{\beta}d(K-1)+1$, and present converse and achievability proofs.
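As a sanity check, the closed-form limit $t^* = v^{\beta} d(K-1) + 1$ quoted above can be evaluated directly; the parameter values below are illustrative choices of ours.

```python
# Minimum number of workers t* needed to recover f on the honest data,
# per the fundamental limit t* = v^beta * d * (K - 1) + 1 quoted above.
def t_star(v: int, beta: int, d: int, K: int) -> int:
    return v**beta * d * (K - 1) + 1

print(t_star(v=2, beta=1, d=2, K=4))  # -> 13 workers
```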
... The literature on this topic is very rich. For brevity, we explain the works closely related to ours and refer the interested reader to [2]- [4] and references within for comprehensive surveys on the topic. ...
... We restrict ourselves to constructing schemes with z = 1, known as the non-colluding regime, in which the workers neither communicate nor collaborate to learn the main node's data, and with t = 2. The main node then encodes the input matrices A and B into n computational tasks called shares of the form ...
Preprint
Full-text available
This paper considers the problem of outsourcing the multiplication of two private and sparse matrices to untrusted workers. Secret sharing schemes can be used to tolerate stragglers and guarantee information-theoretic privacy of the matrices. However, traditional secret sharing schemes destroy all sparsity in the offloaded computational tasks. Since exploiting the sparse nature of matrices has been shown to speed up the multiplication process, preserving the sparsity of the input matrices in the computational tasks sent to the workers is desirable. It was recently shown that sparsity can be guaranteed at the expense of a weaker privacy guarantee, and sparse secret sharing schemes with only two output shares were constructed. In this work, we construct sparse secret sharing schemes that generalize Shamir's secret sharing schemes for a fixed threshold $t=2$ and an arbitrarily large number of shares. We design our schemes to provide the strongest privacy guarantee given a desired sparsity of the shares, under some mild assumptions. We show that increasing the number of shares, i.e., increasing the straggler tolerance, incurs a degradation of the privacy guarantee. However, this degradation is negligible when the number of shares is small compared to the cardinality of the input alphabet.
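For readers unfamiliar with the baseline being generalized, here is a minimal sketch of classic Shamir sharing with threshold $t = 2$. It is the standard, non-sparse scheme, so the shares below are dense, unlike the paper's constructions; the field size is an illustrative choice.

```python
# Classic Shamir secret sharing with threshold t = 2: shares are points on a
# random line over a prime field; any two shares reconstruct the secret.
import random

P = 2**31 - 1  # illustrative prime field size

def share(secret: int, n: int):
    """Split `secret` into n shares; any 2 of them determine it."""
    a1 = random.randrange(1, P)  # random slope hides the intercept (the secret)
    return [(i, (secret + a1 * i) % P) for i in range(1, n + 1)]

def reconstruct(s1, s2):
    """Lagrange interpolation at x = 0 from two shares (Python 3.8+)."""
    (x1, y1), (x2, y2) = s1, s2
    a1 = (y2 - y1) * pow(x2 - x1, -1, P) % P
    return (y1 - a1 * x1) % P

shares = share(12345, n=5)
print(reconstruct(shares[0], shares[3]))  # -> 12345
```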
... To address this urgent need, a plethora of works has proposed novel methods that address various elements of distributed computing, such as scalability [3]–[11], privacy and security [12]–[21], as well as latency and straggler mitigation [22]–[28], to mention just a few. For a detailed survey of such related works, the reader is referred to [29]–[31]. In addition to the above elements, the celebrated computation-vs-communication relationship stands at the very core of distributed computing as a fundamental principle with profound ramifications. ...
Preprint
Full-text available
The work considers the $N$-server distributed computing scenario with $K$ users requesting functions that are linearly-decomposable over an arbitrary basis of $L$ real (potentially non-linear) subfunctions. In our problem, the aim is for each user to receive their function outputs, allowing for reduced reconstruction error (distortion) $\epsilon$, reduced computing cost ($\gamma$; the fraction of subfunctions each server must compute), and reduced communication cost ($\delta$; the fraction of users each server must connect to). For any given set of $K$ requested functions --- which is here represented by a coefficient matrix $\mathbf {F} \in \mathbb{R}^{K \times L}$ --- our problem is made equivalent to the open problem of sparse matrix factorization that seeks --- for a given parameter $T$, representing the number of shots for each server --- to minimize the reconstruction distortion $\frac{1}{KL}\|\mathbf {F} - \mathbf{D}\mathbf{E}\|^2_{F}$ over all $\delta$-sparse and $\gamma$-sparse matrices $\mathbf{D}\in \mathbb{R}^{K \times NT}$ and $\mathbf{E} \in \mathbb{R}^{NT \times L}$. With these matrices respectively defining which servers compute each subfunction, and which users connect to each server, we here design our $\mathbf{D},\mathbf{E}$ by designing tessellated-based and SVD-based fixed support matrix factorization methods that first split $\mathbf{F}$ into properly sized and carefully positioned submatrices, which we approximate and then decompose into properly designed submatrices of $\mathbf{D}$ and $\mathbf{E}$. For the zero-error case and under basic dimensionality assumptions, the work reveals achievable computation-vs-communication corner points $(\gamma,\delta)$ which, for various cases, are proven optimal over a large class of $\mathbf{D},\mathbf{E}$ by means of a novel tessellations-based converse. Subsequently, for large $N$, and under basic statistical assumptions on $\mathbf{F}$, the average achievable error $\epsilon$ is concisely expressed using the incomplete first moment of the standard Marchenko-Pastur distribution, where this performance is shown to be optimal over a large class of $\mathbf{D}$ and $\mathbf{E}$. In the end, the work also reveals that the overall achieved gains over baseline methods are unbounded.
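The objective being minimized is concrete enough to state in a few lines of numpy. The sketch below (dimensions and the random sparsity masks are our illustrative stand-ins, not the paper's tessellated or SVD-based designs) simply evaluates the normalized distortion for given sparse factors.

```python
# Normalized reconstruction distortion (1/KL) * ||F - D E||_F^2 for
# randomly masked (sparse) factors D and E; illustrative dimensions only.
import numpy as np

K, L, N, T = 8, 16, 4, 2
rng = np.random.default_rng(0)
F = rng.standard_normal((K, L))
D = rng.standard_normal((K, N * T)) * (rng.random((K, N * T)) < 0.5)
E = rng.standard_normal((N * T, L)) * (rng.random((N * T, L)) < 0.5)
distortion = np.linalg.norm(F - D @ E, "fro") ** 2 / (K * L)
print(distortion)
```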
... The computational load is determined by the sum of the computation loads of the nodes. In [75], coded distributed computing has been devised; it combines distributed computing with coding-theoretic techniques. ...
Article
Full-text available
The Sixth Generation (6G) Wireless Communication Network (WCN) is the successor generation, expected to improve performance with ultra-low latency and extremely high energy efficiency. The 6G WCN incorporates artificial intelligence to optimize services and capabilities. The vision of the 6G era is a seamless fusion of communication between the human, physical, and digital worlds. The emerging 6G WCN standard is a fundamental foundation and requires immense research attention. This paper presents the framework of 6G WCN with an illustration of its key technologies. The different technologies involved in 6G are explained, with a demonstration of the communication scenarios in which the key performance indicators are substantially improved. The primary contribution of this paper is the explanation of 6G together with the technologies that have a drastic impact on the characteristic aspects of a wireless communication network, such as data rate, spectrum efficiency, energy efficiency, connection density, and reliability. All these technologies have the capability to revolutionize the subsequent WCN.
... The caveat is that jobs might be delayed due to straggler tasks-a well-studied effect of distributed computing that is exacerbated in the edge due to wireless connectivity and node heterogeneity. Various designs of coded computing [32] were proposed for reducing straggler-related latency: they create redundancy between tasks and discard unnecessary stragglers. Related challenges include opti- ...
Article
Full-text available
Edge systems are envisioned to provide storage and compute infrastructure at interoperating edge nodes located near the user. An edge node is orders-of-magnitude smaller than its datacenter counterpart in terms of power, network connectivity, and available compute resources. These differences have dramatic effects on the applicability of fundamental storage-related algorithms in edge settings. This article introduces the emerging edge architecture to the information theory community. We focus primarily on the applications and components of edge storage systems that are of interest to this community, on the challenges that stem from their limitations and constraints, and on the opportunities that arise from their special use cases.
... Dividing into multiple regions to build sub-models sacrifices the correlation and interaction between data and degrades accuracy. Although approaches such as distributed computing techniques [143], transfer learning [133], and others have been proposed, this remains a representative traffic prediction challenge. ...
Preprint
Full-text available
With broad deployment of 5G networks and the proliferation of mobile devices, mobile network operators are not only facing massive growth in mobile traffic but also observing very complex and dynamic usage patterns, which bring challenges to network operation. In this context, network traffic prediction is becoming a key capability for network operation to assure quality of service and drive down cost. Timely and accurate traffic prediction plays a crucial role in resource allocation, base-station energy saving, and network planning and optimization. Recently, deep learning-based models have been widely used in mobile traffic prediction and have shown significant performance gains. This survey provides a thorough account of deep learning solutions for mobile traffic prediction, involving representative data, model architectures, and applications. We start by analyzing the available data and categorizing them into three major categories, and divide the traffic prediction problem into six subcategories. Then, we describe in detail how deep learning techniques are utilized to capture four crucial aspects of mobile traffic, namely temporal dependencies, spatial dependencies, external factors, and heterogeneity. We further briefly outline the applications based on mobile traffic prediction and summarize the open data and source codes. Finally, the remaining challenges and potential future directions are discussed to provide guidance for follow-up research. This article surveys the literature over the period 2017-2022 on deep learning-based mobile traffic prediction. To the best of our knowledge, this paper is the first comprehensive survey of deep learning on mobile traffic prediction.
... By introducing redundancy in distributed tasks, coded distributed computing can be more resilient to link and edge failures. For example, if computation tasks are designed using maximum distance separable (MDS) codes, one can recover the straggling tasks from other completed tasks [22]. In Fig. 5, we show the task completion times for offloading linear computation tasks over two different channel scenarios. ...
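The MDS recovery idea in the excerpt can be sketched in a few lines: encode the row blocks of a matrix so that any k of the n workers' results determine the product. In this minimal real-valued sketch, a random Gaussian encoding matrix stands in for an MDS code (its square submatrices are invertible with probability 1); all dimensions are illustrative.

```python
# (n, k) MDS-style coded matrix-vector multiplication: any k of the n
# workers' results suffice to recover A @ x, tolerating n - k stragglers.
import numpy as np

rng = np.random.default_rng(1)
k, n = 3, 5
A = rng.standard_normal((6, 4))   # 6 rows, split into k = 3 blocks
x = rng.standard_normal(4)

blocks = np.split(A, k)                          # row blocks A_1, ..., A_k
G = rng.standard_normal((n, k))                  # encoding matrix (MDS w.p. 1)
coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(n)]

done = [0, 2, 4]                                 # only k workers respond in time
Y = np.stack([coded[i] @ x for i in done])       # their computed results
dec = np.linalg.solve(G[done, :], Y)             # invert the encoding
print(np.allclose(dec.reshape(-1), A @ x))       # True: A @ x recovered
```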
Article
Full-text available
Technology forecasts anticipate a new era in which massive numbers of humans, machines, and things are connected to wireless networks to sense, process, act, and communicate with the surrounding environment in a real-time manner. To make the visions come true, the sixth generation (6G) wireless networks should be hyper-connected, implying that there are no constraints on the data rate, coverage, and computing. In this article, we first identify the main challenges for 6G hyper-connectivity, including terabits-per-second (Tbps) data rates for immersive user experiences, zero coverage-hole networks, and pervasive computing for connected intelligence. To overcome these challenges, we highlight key enabling technologies for 6G such as distributed and intelligence-aware cell-free massive multi-input multi-output (MIMO) networks, boundless and fully integrated terrestrial and non-terrestrial networks, and communication-aware distributed computing for computation-intensive applications. We further illustrate and discuss the hyper-connected 6G network architecture along with open issues and future research directions.
... Distributed technologies are able to improve performance and usability more than centralized counterparts [7], [8], and the blockchain has been widely used in the design of various distributed platforms due to its immutability and traceability features [9], [10]. Apart from cloud storage, the blockchain has found important applications in many other research areas. ...
Article
Distributed cloud storage (DCS) has been undergoing fast development to provide customers with multiple storage resources in a mix of locations and environments that best meet the service and performance requirements. Trust is widely regarded as one of the top obstacles to the adoption and growth of DCS. Nonetheless, existing works did not take the storage resource capacity of cloud service providers (CSPs) into account in reputation evaluation, and most rely on a centralized reputation management framework. We bridge this gap by proposing a blockchain-assisted reputation mechanism for DCS. Firstly, a blockchain-assisted DCS model is proposed to ensure the traceability and tamper-proofness of reputation-related data. Secondly, we propose a service credibility quantification method based on rating screening to mitigate the impact of outliers. Thirdly, we design a stochastic process to characterize storage resource changes and quantify a CSP's survival probability. Finally, we derive a reputation calculation algorithm based on the above two metrics that protects the authenticity of CSPs' reputations in the presence of attacks. The security analysis and simulation results show that the proposed mechanism is reliable and effective in promoting the service success rate and improving the efficiency and security of DCS services.
... Some additional works explored the scenario where the computing servers communicate with each other through switch networks [85] or in the presence of a randomized connectivity [86], whereas some other works further investigated distributed computing over wireless channels [87], as well as explored the interesting scenario where each computing node might have limited storage and computational resources [88], [89]. A comprehensive survey on CDC is nicely presented in [90]. ...
Thesis
Caching has shown to be an excellent expedient for the purposes of reducing the traffic load in data networks. An information-theoretic study of caching, known as coded caching, represented a key breakthrough in understanding how memory can be effectively transformed into data rates. Coded caching also revealed the deep connection between caching and computing networks, which similarly show the same need for novel algorithmic solutions to reduce the traffic load. Despite the vast literature, there remain some fundamental limitations, whose resolution is critical. For instance, it is well-known that the coding gain ensured by coded caching not only is merely linear in the overall caching resources, but also turns out to be the Achilles heel of the technique in most practical settings. This thesis aims at improving and deepening the understanding of the key role that structure plays either in data or in topology for caching and computing networks. First, we explore the fundamental limits of caching under some information-theoretic models that impose structure in data, where by this we mean that we assume to know in advance what data are of interest to whom. Secondly, we investigate the impressive ramifications of having structure in network topology. Throughout the manuscript, we also show how the results in caching can be employed in the context of distributed computing.
... Heterogeneous CDC with different storage sizes and computation loads among nodes was considered in [12]–[17]. For more works on CDC, please see the survey in [18]. ...
Preprint
Full-text available
Coded distributed computing (CDC) introduced by Li \emph{et al.} can greatly reduce the communication load for MapReduce computing systems. In the general cascaded CDC with $K$ workers, $N$ input files and $Q$ Reduce functions, each input file will be mapped by $r$ workers and each Reduce function will be computed by $s$ workers such that coding techniques can be applied to achieve the maximum multicast gain. The main drawback of most existing CDC schemes is that they require the original data to be split into a large number of input files that grows exponentially with $K$, which can significantly increase the coding complexity and degrade system performance. In this paper, we first use a classic combinatorial structure $t$-design, for any integer $t\geq 2$, to develop a low-complexity and asymptotically optimal CDC with $r=s$. The main advantages of our scheme via $t$-design are two-fold: 1) having much smaller $N$ and $Q$ than the existing schemes under the same parameters $K$, $r$ and $s$; and 2) achieving smaller communication loads compared with the state-of-the-art schemes. Remarkably, unlike the previous schemes that rely on large operation fields, our scheme operates on the minimum binary field $\mathbb{F}_2$. Furthermore, we show that our construction method can incorporate other combinatorial structures that have a similar property to $t$-designs. For instance, we use $t$-GDD to obtain another asymptotically optimal CDC scheme over $\mathbb{F}_2$ that has different parameters from the $t$-design scheme. Finally, we show that our construction method can also be used to construct CDC schemes with $r\neq s$ that have small file number and Reduce function number.
... By introducing redundancy in distributed tasks, coded distributed computing can be more resilient to the link and edge failures. For example, if computation tasks are designed using maximum distance separable (MDS) codes, one can recover the straggling tasks from other completed tasks [13]. ...
Preprint
Full-text available
Technology forecasts anticipate a new era in which massive numbers of humans, machines, and things are connected to wireless networks to sense, process, act, and communicate with the surrounding environment in a real-time manner. To make the visions come true, the sixth generation (6G) wireless networks should be hyper-connected, implying that there are no constraints on the data rate, coverage, and computing. In this article, we first identify the main challenges for 6G hyper-connectivity, including terabits-per-second (Tbps) data rates for immersive user experiences, zero coverage-hole networks, and pervasive computing for connected intelligence. To overcome these challenges, we highlight key enabling technologies for 6G such as distributed and intelligence-aware cell-free massive multi-input multioutput (MIMO) networks, boundless and fully integrated terrestrial and non-terrestrial networks, and communication-aware distributed computing for computation-intensive applications. We further illustrate and discuss the hyper-connected 6G network architecture along with open issues and future research directions.
... , ψ on the given dataset X. For a detailed survey of distributed computing, readers can refer to [14], [15]. For the rest of the paper, we refer to the party that wants certain computational task(s) performed as the Master Node and to the servers as Worker Nodes. ...
Preprint
Full-text available
We consider the problem of evaluating arbitrary multivariate polynomials over several massive datasets in a distributed computing system with a single master node and multiple worker nodes. We focus on the general case in which each multivariate polynomial is evaluated over its own dataset, and propose a generalization of the Lagrange Coded Computing framework (Yu et al. 2019) that provides robustness against stragglers who do not respond in time, adversarial workers who respond with wrong computations, and information-theoretic security of the dataset against colluding workers. Our scheme introduces a small computation overhead, which results in a reduction in download cost and also offers comparable resistance to stragglers over existing solutions.
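To ground the framework being generalized, here is a real-valued toy of the basic Lagrange Coded Computing encoding (without the security and adversary-tolerance extensions above): datasets are embedded in a polynomial u(z), workers evaluate f on coded points, and enough responses determine f on every dataset. All numeric choices are illustrative.

```python
# Toy Lagrange Coded Computing: encode datasets as u(alpha_j) = X_j, have
# worker i compute f(u(beta_i)); since deg(f ∘ u) = d(k-1), any
# d(k-1) + 1 responses recover f(X_j) = (f ∘ u)(alpha_j).
import numpy as np
from numpy.polynomial import Polynomial

f = lambda t: t * t                    # target function of degree d = 2
d, k = 2, 3
X = [1.0, 4.0, 9.0]                    # scalar datasets, for simplicity
alphas = np.array([0.0, 1.0, 2.0])     # encoding points
betas = np.array([-2.0, -1.0, 0.5, 1.5, 2.5, 3.0, 4.0])  # n = 7 workers

u = Polynomial.fit(alphas, X, deg=k - 1)          # encoding polynomial
outputs = [f(u(b)) for b in betas]                # per-worker computations

resp = [0, 2, 3, 5, 6]                            # any d(k-1) + 1 = 5 responders
fu = Polynomial.fit(betas[resp], [outputs[i] for i in resp], deg=d * (k - 1))
print(fu(alphas))                                 # ≈ [1., 16., 81.] = f(X_j)
```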
... Distributed Computing (DC) systems are computer networks that, through parallelization, can reduce the execution times of complex computing tasks such as federated learning or computer vision. MapReduce is a popular such framework and runs in three phases [1], [2]. In the first (Map) phase, nodes calculate intermediate values (IVAs) from their associated input files. ...
Preprint
Full-text available
We consider a full-duplex wireless Distributed Computing (DC) system under the MapReduce framework. New upper and lower bounds on the tradeoff between Normalized Delivery Time (NDT) and computation load are obtained. The lower bound is proved through an information-theoretic converse. The upper bound is based on a novel interference alignment (IA) scheme tailored to the interference cancellation capabilities of the nodes and improves over existing bounds.
Article
The K User Linear Computation Broadcast (LCBC) problem is comprised of d dimensional data (from $\mathbb{F}_q$) that is fully available to a central server, and K users, who require various linear computations of the data, and have prior knowledge of various linear functions of the data as side-information. The optimal broadcast cost is the minimum number of q-ary symbols to be broadcast by the server per computation instance, for every user to retrieve its desired computation. The reciprocal of the optimal broadcast cost is called the capacity. The main contribution of this paper is the exact capacity characterization for the K = 3 user LCBC for all cases, i.e., for arbitrary finite fields $\mathbb{F}_q$, arbitrary data dimension d, and arbitrary linear side-informations and demands at each user. A remarkable aspect of the converse (impossibility result) is that unlike the 2 user LCBC whose capacity was determined previously, the entropic formulation (where the entropies of demands and side-informations are specified, but not their functional forms) is insufficient to obtain a tight converse for the 3 user LCBC. Instead, the converse exploits functional submodularity. Notable aspects of achievability include sufficiency of vector linear coding schemes, subspace decompositions that parallel those found previously by Yao Wang in degrees of freedom (DoF) studies of wireless broadcast networks, and efficiency tradeoffs that lead to a constrained waterfilling solution. Random coding arguments are invoked to resolve compatibility issues that arise as each user has a different view of the subspace decomposition, conditioned on its own side-information.
Article
Linear computation broadcast (LCBC) refers to a setting with d-dimensional data stored at a central server, where K users, each with some prior linear side-information, wish to compute various linear combinations of the data. For each computation instance, the data is represented as a d-dimensional vector with elements in a finite field $\mathbb{F}_{p^n}$ where $p^n$ is a power of a prime. The computation is to be performed many times, and the goal is to determine the minimum amount of information per computation instance that must be broadcast to satisfy all the users. The reciprocal of the optimal broadcast cost per computation instance is the capacity of LCBC. The capacity is known for up to $K=3$ users. Since LCBC includes index coding as a special case, large-K settings of LCBC are at least as hard as the index coding problem. As such, the general LCBC problem is beyond our reach and we do not pursue it. Instead of the general setting (all cases), by focusing on the generic setting (almost all cases) this work shows that the generic capacity of the symmetric LCBC (where every user has $m'$ dimensions of side-information and $m$ dimensions of demand) for a large number of users ($K \geq d$ suffices) is $C_g = 1/\Delta_g$, where $\Delta_g = \min\left\{\max\{0, d-m'\}, \frac{dm}{m+m'}\right\}$ is the broadcast cost that is both achievable and asymptotically almost surely unbeatable for large n, among all LCBC instances with the given parameters $p, K, d, m, m'$. Relative to baseline schemes of random coding or separate transmissions, $C_g$ shows an extremal gain by a factor of K as a function of the number of users, and by a factor of $\approx d/4$ as a function of data dimensions, when optimized over the remaining parameters. For an arbitrary number of users, the generic capacity of the symmetric LCBC is characterized within a factor of 2.
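The generic broadcast cost admits a one-line evaluation; the sketch below simply plugs parameters into the expression for $\Delta_g$ quoted above (the example numbers are ours).

```python
# Generic broadcast cost and capacity of the symmetric LCBC, per the
# expression quoted above: C_g = 1 / Delta_g.
def delta_g(d: int, m: int, m_prime: int) -> float:
    return min(max(0, d - m_prime), d * m / (m + m_prime))

d, m, m_prime = 20, 4, 6
cost = delta_g(d, m, m_prime)
print(cost, 1 / cost)   # broadcast cost 8.0, generic capacity 0.125
```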
Chapter
The number of Internet of Things (IoT) devices worldwide is forecast to almost triple, from around 10 billion in 2020 to around 30 billion in 2030. The edge devices in these ecosystems continuously acquire data for further analysis, leading to subsequent inferences and actions. A critical situation arises for time-sensitive IoT applications, where the response time is ideally zero and non-negotiable. An edge computing process framework must address these and many more issues in such ecosystems. We present one in this paper, in which edge computing can take place near the device, does not depend on enhancing the configuration parameters of the device, and aims to be easy on the network bandwidth that would otherwise be consumed by server-side computations.
Article
Coded distributed computing (CDC) is a new technique proposed with the purpose of decreasing the intense data exchange required for parallelizing distributed computing systems. Under the famous MapReduce paradigm, this coded approach has been shown to decrease this communication overhead by a factor that is linearly proportional to the overall computation load during the mapping phase. In this paper, we propose multi-access distributed computing (MADC) as a generalization of the original CDC model, where now mappers (nodes in charge of the map functions) and reducers (nodes in charge of the reduce functions) are distinct computing nodes that are connected through a multi-access network topology. Focusing on the MADC setting with combinatorial topology, which implies Λ mappers and K reducers such that there is a unique reducer connected to any α mappers, we propose a coded scheme and an information-theoretic converse, which jointly identify the optimal inter-reducer communication load, as a function of the computation load, to within a constant gap of 1.5. Additionally, a modified coded scheme and converse identify the optimal max-link communication load across all existing links to within a gap of 4.
Chapter
MapReduce is a programming framework designed for processing and analyzing large volumes of data in a distributed computing environment. Despite its capabilities, it faces challenges due to silent data corruption during task execution, which can yield inaccurate results. Ensuring fault tolerance in the MapReduce framework while minimizing communication overhead presents considerable challenges. This study presents CDCFT (Coded Distributed Computing Fault Tolerance), a novel approach to fault tolerance within the MapReduce paradigm, combining the strengths of TMR (Triple Modular Redundancy) and CDC (Coded Distributed Computing). By leveraging task-level TMR and voting mechanisms, CDCFT robustly defends against silent data corruption. To further optimize, CDCFT employs intra-group broadcasts for relaying intermediate messages and has a finely-tuned node grouping combined with a strategic data and task allocation procedure. Through rigorous theoretical analysis, we establish that CDCFT’s communication overhead during the Shuffle Stage is notably less than traditional CDC methods that rely on triple modular redundancy. Experimental results showcase the efficacy of CDCFT, signifying a substantial reduction in the overall communication overhead and execution time compared to the conventional fault-tolerant methods.
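The TMR half of CDCFT reduces to majority voting over three task replicas. A minimal sketch of that voting step follows (the node grouping and coded shuffle are not shown, and `tmr_vote` is a hypothetical helper name).

```python
# Task-level triple modular redundancy: accept the majority output of three
# replicated executions, masking a single silently corrupted result.
from collections import Counter

def tmr_vote(outputs):
    """Majority vote over the three replicas' (hashable) outputs."""
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: all replicas disagree")
    return value

print(tmr_vote([42, 42, 42]))  # fault-free case
print(tmr_vote([42, 7, 42]))   # one corrupted replica is outvoted
```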
Chapter
In large-scale distributed computing systems, coded computing has attracted considerable attention since it can effectively mitigate the impact of stragglers. Nonetheless, several emerging issues seriously restrict the performance of coded distributed systems. First, the presence of colluding workers results in serious privacy leakage issues. Second, few existing works consider security issues in data transmission. Third, the number of results that must be waited for increases with the degree of the polynomial functions. In this paper, we propose a secure and private approximated coded distributed computing (SPACDC) scheme that addresses the aforementioned issues simultaneously. The SPACDC scheme ensures data security during the transmission process by leveraging a proposed matrix encryption algorithm based on elliptic curve cryptography. Unlike existing coding schemes, our SPACDC scheme does not impose strict constraints on the minimum number of results that must be waited for. Furthermore, the SPACDC scheme provides information-theoretic privacy protection for raw data. Finally, extensive performance analysis is provided to demonstrate the effectiveness of the proposed SPACDC scheme.
Article
In this paper, for a general full-duplex wireless MapReduce distributed computing network, we investigate the minimization of the communication overhead for a given computation overhead. The wireless MapReduce framework consists of three phases: the Map phase, the Shuffle phase, and the Reduce phase. Specifically, we model the Shuffle phase as a cooperative X network based on a more general file assignment strategy. Furthermore, for this cooperative X network, we derive an information-theoretic upper bound on the sum degrees of freedom (SDoF). Moreover, we propose a joint interference alignment and neutralization (IAN) scheme to characterize the achievable SDoF. In some cases, the achievable SDoF coincides with the upper bound, and hence the IAN scheme provides the optimal SDoF. Finally, based on the SDoF, we present an information-theoretic lower bound on the normalized delivery time (NDT) and the achievable NDT of the wireless distributed computing network, which are less than or equal to those of the existing networks. The lower bound on the NDT shows that 1) there is a tradeoff between the computation load and the NDT; and 2) the achievable NDT is optimal in some cases, and hence the proposed IAN scheme can reduce the communication overhead effectively.
Article
With the advancement of the Internet of Vehicles (IoV), delay-sensitive vehicular applications have flourished, among which autonomous driving is a focal point. For autonomous driving vehicles, efficient and timely processing of the ever-increasing data is critical. In real traffic scenarios, task-processing efficiency is closely related to the traffic flows. However, traffic flow modeling is ignored or treated only roughly in most existing studies. To address this issue, a traffic model based on a stochastic geometry framework is proposed to simulate a real traffic environment for autonomous driving vehicles. To reduce the cost of processing tasks, a distributed computation offloading scheme based on mobile edge computing (MEC) is proposed, soliciting nearby vehicles and roadside units (RSUs) with rich computing resources. For the average-cost minimization problem, we divide the NP-hard problem into several sub-problems and use the Lagrange multiplier method with KKT conditions to solve them by optimizing the task splitting ratios. We compare the proposed traffic model with some common ones and also consider the pros and cons of different computation offloading strategies. Simulation results show that the proposed strategy outperforms other benchmarks and that the proposed modeling method is rational.
Article
In this paper, we develop a new algorithm, named federated consensus-based algorithm (FCB), for sparse recovery, and show its performance in terms of both support recovery and signal recovery. Specifically, FCB is designed on the basis of the federated computational architecture, to increase the computational parallelism and accelerate the convergence. The algorithm design is realized by integrating accelerated projection-based consensus (APC) with greedy techniques. Then, the conditions of exact support recovery and an upper bound of signal recovery error are derived for FCB in the noisy case. From the explicit expression of the signal recovery error bound, it is confirmed that FCB can stably recover sparse signals under appropriate conditions using the coherence statistic of the measurement matrix and the minimum magnitude of nonzero elements of the signal. Experimental results illustrate the performance of FCB, validating our derived conditions of exact support recovery and upper bound of signal recovery error. In summary, FCB utilizes the federated computational architecture, enabling high parallelism and fast convergence, and uses greedy techniques to guarantee stable recovery performance.
Article
Distributed multi-task learning (MTL) can jointly learn multiple models and achieve better generalization performance by exploiting relevant information between the tasks. However, distributed MTL suffers from communication bottlenecks, in particular for large-scale learning with a massive number of tasks. This paper considers distributed MTL systems where distributed workers wish to learn different models orchestrated by a central server. To mitigate communication bottlenecks both in the uplink and downlink, we propose coded computing schemes for flexible and fixed data placements, respectively. Our schemes can significantly reduce communication loads by exploiting workers' local information and creating multicast opportunities for both the server and workers. Moreover, we establish information-theoretic lower bounds on the optimal downlink and uplink communication loads, and prove the approximate optimality of the proposed schemes. For flexible data placement, our scheme achieves the optimal downlink communication load and an order-optimal uplink communication load that is less than twice the information-theoretic optimum. For fixed data placement, the gaps between our communication loads and the optimum are within the minimum computation load among all workers, regardless of the number of workers. Experiments demonstrate that our schemes can significantly speed up the training process compared to the traditional approach.
Article
Based on the Reed-Muller (RM) transform, this paper proposes a Reed-Solomon (RS) encoding/erasure decoding algorithm for any number of parities. Specifically, we first generalize the previous RM-based syndrome calculation, which allows only up to seven parities, to support any number of parities. Then we propose a general encoding/erasure decoding algorithm. The proposed encoding algorithm eliminates the operations in solving linear equations, which improves the computational efficiency of existing RM-based RS algorithms. For erasure decoding, this paper employs the generalized RM-based syndrome calculation and lower-upper (LU) decomposition to accelerate the computation. Analysis shows that the proposed encoding/erasure decoding algorithm approaches a complexity of $\lfloor \lg T \rfloor + 1$ XORs per data bit as $N$ increases, where $T$ and $N$ denote the number of parities and the codeword length, respectively. To highlight the advantage of the proposed RM-based algorithms, implementations with Single Instruction Multiple Data (SIMD) technology are provided. Simulation results show that the proposed algorithms are competitive compared with other cutting-edge implementations.
Article
Edge computing has recently garnered significant interest in many Internet of Things (IoT) applications. However, the excessive overhead during data exchange still remains an open challenge, especially for large-scale data processing tasks. This paper considers a master-aided distributed computing system with multiple edge computing nodes and a master node, where the master node helps edge nodes compute output functions. We propose a coded scheme to reduce the communication latency by exploiting computation and communication capabilities of all nodes and creating coded multicast opportunities. More importantly, we prove that the proposed scheme is always optimal, i.e., achieving the minimum communication latency, for arbitrary computing and storage abilities at the master. This extends the previous optimality results in the extreme cases (either the master could compute all input files or compute nothing) to the general case. Finally, numerical results and TeraSort experiments demonstrate that our schemes can greatly reduce the communication latency compared with the existing schemes.
Article
Coded distributed computing (CDC) has recently emerged as a promising solution to address the straggling effects in conventional distributed computing systems. By assigning redundant workloads to the computing nodes, CDC can significantly enhance the performance of the whole system. However, since the core idea of CDC is to introduce redundancy to compensate for uncertainties, it may lead to a large amount of wasted energy at the edge nodes. It can be observed that the more redundant workload is added, the less impact the straggling effects have on the system; at the same time, however, more energy is needed to perform the redundant tasks. In this work, we develop a novel framework, namely CERA, to elastically allocate computing resources for CDC processes. In particular, CERA consists of two stages. In the first stage, we model a joint coding and node selection optimization problem to minimize the expected processing time for a CDC task. Since the problem is NP-hard, we propose a linearization approach and a hybrid algorithm to quickly obtain optimal solutions. In the second stage, we develop a smart online approach based on Lyapunov optimization to dynamically turn off straggling nodes based on their actual performance. As a result, wasteful energy consumption can be significantly reduced with minimal impact on the total processing time. Simulations using real-world datasets have shown that our proposed approach can improve the system's total processing time by more than 200% compared with the state-of-the-art approach, even when the nodes' actual performance is not known in advance. Moreover, the results have shown that CERA's online optimization stage can reduce energy consumption by up to 37.14% without affecting the total processing time.
Article
Full-text available
The metaverse is regarded as a new wave of technological transformation that provides a virtual space for people to interact through digital avatars. To achieve immersive user experiences in the metaverse, real-time rendering is the key technology. However, computing-intensive real-time rendering tasks from metaverse service providers cannot be processed efficiently on a single resource-limited mobile device. Alternatively, such mobile devices can offload the metaverse rendering tasks to other mobile devices by adopting the collaborative computing paradigm based on Coded Distributed Computing (CDC). Therefore, this paper introduces a hierarchical game-theoretic CDC framework for metaverse services, especially the vehicular metaverse. In the framework, idle resources from vehicles, acting as CDC workers, are aggregated to handle intensive computation tasks in the vehicular metaverse. Specifically, in the upper layer, a miner coalition formation game is formulated based on a reputation metric to select reliable workers. To guarantee reliable management of reputation values, the reputation values, calculated based on the subjective logic model, are maintained in a blockchain database. In the lower layer, a Stackelberg-game-based incentive mechanism is considered to attract the reliable workers selected in the upper layer to participate in rendering tasks. The simulation results illustrate that the proposed framework is resistant to malicious workers. Compared with the baseline schemes, the proposed scheme can improve the utility of the metaverse service provider and the average profit of the CDC workers.
Article
Full-text available
Li et al. introduced the coded distributed computing (CDC) scheme to reduce the communication load in general distributed computing frameworks such as MapReduce. They also proposed cascaded CDC schemes where each output function is computed multiple times, and proved that such schemes achieve the fundamental trade-off between computation load and communication load. However, these schemes require exponentially large numbers of input files and output functions when the number of computing nodes gets large. In this paper, by using the structure of placement delivery arrays (PDAs), we construct several infinite classes of cascaded CDC schemes. We also show that the numbers of output functions in all the new schemes are only a small multiple of the number of computing nodes, and the number of input files in our new schemes is much smaller than that in the CDC schemes derived by Li et al.
Article
Full-text available
Due to the advanced capabilities of the Internet of Vehicles (IoV) components such as vehicles, Roadside Units (RSUs) and smart devices as well as the increasing amount of data generated, Federated Learning (FL) becomes a promising tool given that it enables privacy-preserving machine learning that can be implemented in the IoV. However, the performance of the FL suffers from the failure of communication links and missing nodes, especially when continuous exchanges of model parameters are required. Therefore, we propose the use of Unmanned Aerial Vehicles (UAVs) as wireless relays to facilitate the communications between the IoV components and the FL server and thus improving the accuracy of the FL. However, a single UAV may not have sufficient resources to provide services for all iterations of the FL process. In this paper, we present a joint auction-coalition formation framework to solve the allocation of UAV coalitions to groups of IoV components. Specifically, the coalition formation game is formulated to maximize the sum of individual profits of the UAVs. The joint auction-coalition formation algorithm is proposed to achieve a stable partition of UAV coalitions in which an auction scheme is applied to solve the allocation of UAV coalitions. The auction scheme is designed to take into account the preferences of IoV components over heterogeneous UAVs. The simulation results show that the grand coalition, where all UAVs join a single coalition, is not always stable due to the profit-maximizing behavior of the UAVs. In addition, we show that as the cooperation cost of the UAVs increases, the UAVs prefer to support the IoV components independently and not to form any coalition.
Article
Full-text available
Federated learning (FL) is a distributed machine learning approach that can achieve the purpose of collaborative learning from a large amount of data that belong to different parties without sharing the raw data among the data owners. FL can sufficiently utilize the computing capabilities of multiple learning agents to improve the learning efficiency while providing a better privacy solution for the data owners. FL attracts tremendous interests from a large number of industries due to growing privacy concerns. Future vehicular Internet of Things (IoT) systems, such as cooperative autonomous driving and intelligent transport systems (ITS), feature a large number of devices and privacy-sensitive data where the communication, computing, and storage resources must be efficiently utilized. FL could be a promising approach to solve these existing challenges. In this paper, we first conduct a brief survey of existing studies on FL and its use in wireless IoT. Then we discuss the significance and technical challenges of applying FL in vehicular IoT, and point out future research directions.
Article
Full-text available
We consider distributed gradient descent in the presence of stragglers. Recent work on gradient coding and approximate gradient coding have shown how to add redundancy in distributed gradient descent to guarantee convergence even if some workers are stragglers—that is, slow or non-responsive. In this work we propose an approximate gradient coding scheme called Stochastic Gradient Coding (SGC), which works when the stragglers are random. SGC distributes data points redundantly to workers according to a pair-wise balanced design, and then simply ignores the stragglers. We prove that the convergence rate of SGC mirrors that of batched Stochastic Gradient Descent (SGD) for the ℓ2 loss function, and show how the convergence rate can improve with the redundancy. We also provide bounds for more general convex loss functions. We show empirically that SGC requires a small amount of redundancy to handle a large number of stragglers and that it can outperform existing approximate gradient codes when the number of stragglers is large.
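The SGC recipe above lends itself to a short sketch: replicate each sample to r workers, ignore stragglers, and scale each received copy by 1/r, so the aggregate equals the full gradient scaled by the workers' response rate in expectation. The placement below uses r distinct random workers per sample as a stand-in for the paper's pair-wise balanced designs; all parameters are illustrative.

```python
# Sketch of Stochastic Gradient Coding for the l2 loss: redundant placement,
# stragglers simply ignored, received copies scaled by 1/r.
import numpy as np

rng = np.random.default_rng(0)
n, workers, r = 12, 6, 2
X, y = rng.standard_normal((n, 3)), rng.standard_normal(n)
w = np.zeros(3)

# each sample placed on r distinct workers
placement = [rng.choice(workers, size=r, replace=False) for _ in range(n)]

alive = {0, 2, 3, 5}                             # workers responding this round
grad = np.zeros(3)
for i in range(n):
    copies = sum(1 for wk in placement[i] if wk in alive)
    grad += copies * (X[i] @ w - y[i]) * X[i] / r
print(grad / n)                                  # straggler-tolerant estimate
```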
Article
Full-text available
The advent of 5G and the ever more stringent requirements in terms of bandwidth, latency, and quality of service push the boundaries of what is feasible with legacy Mobile Network Operators' technologies. Network Functions Virtualization (NFV) is a promising attempt at solving some of these challenges that is widely adopted by the industry and specified by the standardization bodies. In essence, NFV is about running network functions (NFs) as virtualized workloads on commodity hardware. This may optimize deployment costs and simplify the lifecycle management of NFs, but it introduces new fault management challenges and issues. In this paper, we propose a comprehensive state of the art of fault management techniques. We address the impact of virtualization on fault management. We propose a new classification of the recent fault management research achievements in network virtualization environments and compare their major contributions and shortcomings.
Conference Paper
Full-text available
In this paper, we consider the straggler problem of high-dimensional matrix multiplication over distributed workers. To tackle this problem, we propose an irregular product-coded computation, which is a generalized scheme of the standard product-coded computation proposed in [1]. Introducing irregularity to the product-coded matrix multiplication, one can further speed up the matrix multiplication while enjoying the low decoding complexity of the product code. The idea behind the irregular product code introduced in [2] is to allow different code rates for the row and column constituent codes of the product code. We provide a latency analysis of the proposed irregular product-coded computation. In terms of the total execution time, which is defined as a function of the computation time and decoding time, it is shown that the irregular product-coded scheme outperforms other competing schemes, including the replication, MDS-coded, and standard product-coded schemes, in a specific regime.
Article
Coded computation techniques provide robustness against straggling workers in distributed computing. However, most of the existing schemes require exact provisioning of the straggling behavior and ignore the computations carried out by straggling workers. Moreover, these schemes are typically designed to recover the desired computation results accurately, while in many machine learning and iterative optimization algorithms, faster approximate solutions are known to result in an improvement in the overall convergence time. In this paper, we first introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR) , which benefits from the advantages of both coded and uncoded computation schemes, and reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation. We then extend this approach to distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery, where the results of subtasks computed by the workers are coded before being communicated. Numerical simulations on a large linear regression task confirm the benefits of the proposed scheme in terms of the trade-off between the computation accuracy and latency.
Article
The problem of secure distributed batch matrix multiplication (SDBMM) studies the communication efficiency of retrieving a sequence of desired matrix products ${\mathbf{AB}} = ({\mathbf{A}}_{1}{\mathbf{B}}_{1},\,\,{\mathbf{A}}_{2}{\mathbf{B}}_{2},\,\,\cdots,\,\,{\mathbf{A}}_{S}{\mathbf{B}}_{S})$ from $N$ distributed servers where the constituent matrices ${\mathbf{A}}=({\mathbf{A}}_{1}, {\mathbf{A}}_{2}, \cdots, {\mathbf{A}}_{S})$ and ${\mathbf{B}}=({\mathbf{B}}_{1}, {\mathbf{B}}_{2},\cdots,{\mathbf{B}}_{S})$ are stored in $X$ -secure coded form, i.e., any group of up to $X$ colluding servers learn nothing about $\mathbf{ A, B}$ . It is assumed that ${\mathbf{A}}_{s}\in \mathbb {F}_{q}^{L\times K}, {\mathbf{B}}_{s}\in \mathbb {F}_{q}^{K\times M}, s\in \{1,2,\cdots, S\}$ are uniformly and independently distributed and $\mathbb {F}_{q}$ is a large finite field. The rate of an SDBMM scheme is defined as the ratio of the number of bits of desired information that is retrieved, to the total number of bits downloaded on average. The supremum of achievable rates is called the capacity of SDBMM. In this work we explore the capacity of SDBMM, as well as several of its variants, e.g., where the user may already have either ${\mathbf{A}}$ or ${\mathbf{B}}$ available as side-information, and/or where the security constraint for either ${\mathbf{A}}$ or ${\mathbf{B}}$ may be relaxed. We obtain converse bounds, as well as achievable schemes for various cases of SDBMM, depending on the $L, K, M, N, X$ parameters, and identify parameter regimes where these bounds match. In particular, the capacity for securely computing a batch of outer products of two vectors is $(1-X/N)^{+}$ , for a batch of inner products of two (long) vectors the capacity approaches $(1-2X/N)^{+}$ as the length of the vectors approaches infinity, and in general for sufficiently large $K$ (e.g., $K > 2\min (L,M)$ ), the capacity $C$ is bounded as $(1-2X/N)^{+}\leq C < (1-X/N)^{+}$ . A remarkable aspect of our upper bounds is a connection between SDBMM and a form of private information retrieval (PIR) problem, known as multi-message $X$ -secure $T$ -private information retrieval (MM-XSTPIR). Notable features of our achievable schemes include the use of cross-subspace alignment and a transformation argument that converts a scalar multiplication problem into a scalar addition problem, allowing a surprisingly efficient solution.
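The closed-form capacities quoted above are simple to evaluate; the snippet below plugs in example numbers of ours.

```python
# Capacity expressions for X-secure batch matrix multiplication with N
# servers, as quoted above: (1 - X/N)^+ for outer products and, in the
# limit of long vectors, (1 - 2X/N)^+ for inner products.
def pos(x: float) -> float:
    return max(0.0, x)

N, X = 10, 2
print(pos(1 - X / N))       # outer products: 0.8
print(pos(1 - 2 * X / N))   # inner products (long vectors): 0.6
```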
Article
We propose a flexible low complexity design (FLCD) of coded distributed computing (CDC) with empirical evaluation on Amazon Elastic Compute Cloud (Amazon EC2). CDC can expedite MapReduce-like computation by trading increased map computation for reduced communication load and shuffle time. A main novelty of FLCD is to utilize the design freedom in defining map and reduce functions to develop asymptotically homogeneous systems that support varying intermediate value (IV) sizes under a general MapReduce framework. Compared to existing designs with constant IV sizes, FLCD offers greater flexibility in adapting to network parameters and significantly reduces the implementation complexity by requiring fewer input files and shuffle groups. The FLCD scheme is the first proposed low-complexity CDC design that can operate on a network with an arbitrary number of nodes and computation load. We perform empirical evaluations of FLCD by executing the TeraSort algorithm on an Amazon EC2 cluster. This is the first time that theoretical predictions of the CDC shuffle time have been validated by empirical evaluations. The evaluations demonstrate a 2.0× to 4.24× speedup over conventional uncoded MapReduce, a 12% to 52% reduction in total execution time, and a wider range of operating network parameters compared to existing CDC schemes.
Article
In recent years, coded distributed computing (CDC) has attracted significant attention because it can efficiently protect delay-sensitive computation tasks against unexpected latencies in distributed computing systems. Despite this salient feature, many design challenges and opportunities remain. In this paper, we focus on practical computing systems with heterogeneous computing resources and design a novel CDC approach, called batch-processing based coded computing (BPCC), which exploits the fact that every computing node can obtain some coded results before it completes the whole task. To this end, we first describe the main idea of the BPCC framework and then formulate an optimization problem for BPCC to minimize the task completion time by configuring the computation load. Through formal theoretical analyses, extensive simulation studies, and comprehensive real experiments on Amazon EC2 computing clusters, we demonstrate the promising performance of the proposed BPCC scheme in terms of high computational efficiency and robustness to uncertain disturbances.
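The batch-processing idea can be sketched in a few lines of numpy (an illustrative toy; the actual BPCC scheme additionally optimizes the per-node load allocation): because the code is MDS-like, the master can decode as soon as any sufficient set of coded rows has arrived, so workers stream results back batch by batch and a straggler's partial work is never wasted.

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.standard_normal((20, 10))       # random code matrix (MDS-like w.p. 1)
A = rng.standard_normal((10, 6))
x = rng.standard_normal(6)
coded = G @ A                           # 20 coded rows spread across workers

batch, results = 5, {}
for start in range(0, 20, batch):       # each pass stands in for one batch
    for i in range(start, start + batch):
        results[i] = coded[i] @ x       # a worker returns one coded product
    if len(results) >= 10:              # any 10 rows of G suffice to decode
        break

idx = sorted(results)[:10]
y = np.linalg.solve(G[idx], np.array([results[i] for i in idx]))
assert np.allclose(y, A @ x)            # A @ x recovered from early batches
```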
Article
Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving the computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the lottery ticket hypothesis: dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that, when trained in isolation, reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
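The pruning procedure behind the hypothesis is compact enough to sketch (a schematic PyTorch rendering of iterative magnitude pruning with weight rewinding; the `train` callback, which should keep masked weights at zero, is a hypothetical user-supplied training loop):

```python
import copy
import torch

def find_winning_ticket(model, train, prune_frac=0.2, rounds=5):
    """Iterative magnitude pruning: train, prune the smallest surviving
    weights layer by layer, rewind the rest to their initial values."""
    init_state = copy.deepcopy(model.state_dict())           # theta_0
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for _ in range(rounds):
        train(model, masks)                                  # user-supplied
        with torch.no_grad():
            for n, p in model.named_parameters():
                alive = p[masks[n].bool()].abs()
                cutoff = alive.quantile(prune_frac)          # per-layer cut
                masks[n] *= (p.abs() > cutoff).float()
        model.load_state_dict(init_state)                    # rewind weights
    return masks                                             # winning ticket
```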
Article
Coupled with the rise of Deep Learning, the wealth of data and enhanced computation capabilities of Internet of Vehicles (IoV) components enable effective Artificial Intelligence (AI) based models to be built. Beyond ground data sources, Unmanned Aerial Vehicle (UAV) based service providers for data collection and AI model training, i.e., Drones-as-a-Service (DaaS), have become increasingly popular in recent years. However, the stringent regulations governing data privacy potentially impede data sharing across independently owned UAVs. To this end, we propose the adoption of a Federated Learning (FL) based approach to enable privacy-preserving collaborative Machine Learning across a federation of independent DaaS providers for the development of IoV applications, e.g., for traffic prediction and car park occupancy management. Given the information asymmetry and incentive mismatches between the UAVs and model owners, we leverage the self-revealing properties of a multi-dimensional contract to ensure truthful reporting of the UAV types, while accounting for the multiple sources of heterogeneity, e.g., in sensing, computation, and transmission costs. Then, we adopt the Gale-Shapley algorithm to match the lowest-cost UAV to each subregion. The simulation results validate the incentive compatibility of our contract design and show the efficiency of our matching, thus guaranteeing profit maximization for the model owner amid information asymmetry.
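For the matching step, a deferred-acceptance sketch suffices to convey the mechanism (illustrative; the preference lists below are hypothetical and would in practice be derived from the revealed UAV costs):

```python
def gale_shapley(uav_prefs, region_prefs):
    """uav_prefs[u]: regions in u's preference order;
    region_prefs[r][u]: rank of UAV u for region r (lower = cheaper)."""
    free = list(uav_prefs)
    nxt = {u: 0 for u in uav_prefs}        # next region each UAV proposes to
    match = {}                             # region -> currently held UAV
    while free:
        u = free.pop()
        r = uav_prefs[u][nxt[u]]
        nxt[u] += 1
        if r not in match:
            match[r] = u
        elif region_prefs[r][u] < region_prefs[r][match[r]]:
            free.append(match[r])          # region swaps to the cheaper UAV
            match[r] = u
        else:
            free.append(u)                 # rejected; u proposes again later
    return match

uav_prefs = {"u1": ["r1", "r2"], "u2": ["r1", "r2"]}
region_prefs = {"r1": {"u1": 0, "u2": 1}, "r2": {"u1": 0, "u2": 1}}
print(gale_shapley(uav_prefs, region_prefs))   # {'r1': 'u1', 'r2': 'u2'}
```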
Article
How to train a machine learning model while keeping the data private and secure? We present CodedPrivateML, a fast and scalable approach to this critical problem. CodedPrivateML keeps both the data and the model information-theoretically private, while allowing efficient parallelization of training across distributed workers. We characterize CodedPrivateML's privacy threshold and prove its convergence for logistic (and linear) regression. Furthermore, via extensive experiments on Amazon EC2, we demonstrate that CodedPrivateML provides significant speedup over cryptographic approaches based on multi-party computing (MPC).
Article
This paper studies distributed computing mechanisms on heterogeneous mobile devices (MDs) for latency reduction in Internet of Things (IoT) services by mitigating the effect of straggling MDs. We propose novel coded distributed computing mechanisms with two different incentive distributions that consider the time-discounting value of processed results and the amount of workload computed by the MDs. Specifically, we consider distributed gradient descent computing with coding when a task publisher (TP) with a limited budget offers incentives to encourage the MDs' participation in the computation. To analyze the hierarchical decision-making structure of the TP and MDs, we formulate the strategic competition between them as a Stackelberg game. With the MDs as the leaders, we design a CPU-cycle frequency control scheme that balances each MD's computing speed and energy consumption to obtain its maximum utility under the incentive mechanisms. As the follower, the TP aims at minimizing the latency of the distributed computation and follows the MDs' decisions to determine the load allocation for each MD. We then design an algorithm achieving the Stackelberg equilibrium, which is shown to be a unique Nash equilibrium of the game. The performance evaluation results show that the proposed mechanisms achieve a 39% latency reduction on average compared to the benchmark mechanism. Furthermore, the results corroborate the efficiency of the proposed mechanisms in terms of the MDs' social welfare.
Article
Gradient coding is a technique for straggler mitigation in distributed learning. In this paper we first design novel gradient codes using tools from classical coding theory, namely cyclic MDS codes, which compare favorably with existing solutions both in the applicable range of parameters and in the complexity of the involved algorithms. Second, we introduce an approximate variant of the gradient coding problem, in which we settle for an approximate gradient computation instead of an exact one. This approach enables graceful degradation, i.e., the $\ell _{2}$ error of the approximate gradient is a decreasing function of the number of stragglers. Our main result is that normalized adjacency matrices of expander graphs yield excellent approximate gradient codes, which enable significantly less computation compared to exact gradient coding and guarantee faster convergence than trivial solutions under standard assumptions. We experimentally test our approach on Amazon EC2 and show that the generalization error of approximate gradient coding is very close to that of the full gradient while requiring significantly less computation from the workers.
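The graph-based construction admits a very small demonstration (a toy ring graph stands in for the expander, an assumption made purely for brevity; true expanders give the stated convergence guarantees): each worker returns the normalized sum of the partial gradients of its graph neighbours, and the master simply adds up whatever arrives.

```python
import numpy as np

n, deg = 12, 4
# 4-regular ring graph; a genuine expander would be used in practice.
Adj = sum(np.roll(np.eye(n), k, axis=1) for k in (-2, -1, 1, 2))
B = Adj / deg                            # normalized adjacency = code matrix
g = np.random.default_rng(2).standard_normal((n, 3))   # n partial gradients

alive = np.arange(9)                     # three stragglers never respond
approx = B[alive].sum(axis=0) @ g        # master sums received coded rows
exact = g.sum(axis=0)                    # full gradient (all columns of B sum to 1)
print(np.linalg.norm(approx - exact))    # l2 error shrinks as more workers respond
```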
Article
Distributed computing frameworks such as MapReduce are often used to process large computational jobs. They operate by partitioning each job into smaller tasks executed on different servers. The servers also need to exchange intermediate values to complete the computation. Experimental evidence suggests that this so-called Shuffle phase can be a significant part of the overall execution time for several classes of jobs. Prior work has demonstrated a natural tradeoff between computation and communication whereby running redundant copies of jobs can reduce the Shuffle traffic load, thereby leading to reduced overall execution times. For a single job, the main drawback of this approach is that it requires the original job to be split into a number of files that grows exponentially in the system parameters. When extended to multiple jobs (with specific function types), these techniques suffer from a limitation of a similar flavor, i.e., they require an exponentially large number of jobs to be executed. In practical scenarios, these requirements can significantly reduce the promised gains of the method. In this work, we show that a class of combinatorial structures called resolvable designs can be used to develop efficient coded distributed computing schemes for both the single and multiple job scenarios considered in prior work. We present both theoretical analysis and extensive experimental results (on Amazon EC2 clusters) that demonstrate the performance advantages of our method. For the single and multiple job cases, we obtain speed-ups of 4.69x (and 2.6x over prior work) and 4.31x over the baseline approach, respectively.
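A resolvable design is easy to exhibit concretely (a standard small example from the affine plane AG(2,3), independent of this paper's particular parameters): its blocks split into parallel classes, each of which partitions the point set, and it is this partitioning structure that yields the file placement and shuffle groups of such schemes.

```python
# The 12 lines of AG(2,3) on 9 points form 4 parallel classes.
q = 3
points = [(x, y) for x in range(q) for y in range(q)]
classes = [[[(x, (m * x + c) % q) for x in range(q)] for c in range(q)]
           for m in range(q)]                                   # slope-m classes
classes.append([[(c, y) for y in range(q)] for c in range(q)])  # vertical lines
for cls in classes:            # each parallel class partitions all 9 points
    assert sorted(p for line in cls for p in line) == sorted(points)
```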
Article
Computing systems are evolving rapidly. At the device level, emerging devices are beginning to compete with traditional CMOS systems. At the architecture level, novel architectures are successfully avoiding the communication bottleneck that is a central feature, and a central limitation, of the von Neumann architecture. Furthermore, such systems are increasingly plagued by unreliability. This unreliability arises at the device or gate level in emerging devices, and can percolate up to the processor or system level if left unchecked. The goal of this article is to survey recent advances in reliable computing using unreliable elements, with an eye on nonsilicon and non-von Neumann architectures. We first observe that instead of aiming for generic computing problems, the community could use the "dwarfs of modern computing," first noted in the high-performance computing (HPC) community, as a starting point. These computing problems are the basic building blocks of almost all scientific computing, machine learning, and data analytics today. Next, we survey the state of the art in "coded computing," an emerging area that advances on classical algorithm-based fault-tolerance (ABFT) and brings a fundamental information-theoretic perspective. By weaving error-correcting codes into a computing algorithm, coded computing provides dramatic improvements in solutions, as well as novel fundamental limits, for problems that have been open for more than 30 years. We introduce existing and novel coded computing techniques in the context of "coded dwarfs," where a specific dwarf's computation is made resilient by applying coding. We discuss how, for the same redundancy, coded dwarfs are significantly more resilient than classical techniques such as replication. Furthermore, by examining a widely popular computation task, training large neural networks, we demonstrate how coded dwarfs can be applied to address this fundamentally nonlinear problem. Finally, we discuss practical challenges and future directions in implementing coded computing techniques on emerging and existing nonsilicon and/or non-von Neumann architectures.
Article
In recent years, mobile devices have been equipped with increasingly advanced sensing and computing capabilities. Coupled with advancements in Deep Learning (DL), this opens up countless possibilities for meaningful applications, e.g., for medical purposes and in vehicular networks. Traditional cloud-based Machine Learning (ML) approaches require the data to be centralized in a cloud server or data center. However, this results in critical issues related to unacceptable latency and communication inefficiency. To this end, Mobile Edge Computing (MEC) has been proposed to bring intelligence closer to the edge, where data is produced. However, conventional enabling technologies for ML at mobile edge networks still require personal data to be shared with external parties, e.g., edge servers. Recently, in light of increasingly stringent data privacy legislation and growing privacy concerns, the concept of Federated Learning (FL) has been introduced. In FL, end devices use their local data to train an ML model required by the server. The end devices then send the model updates rather than raw data to the server for aggregation. FL can serve as an enabling technology in mobile edge networks since it enables the collaborative training of an ML model and also enables DL for mobile edge network optimization. However, in a large-scale and complex mobile edge network, heterogeneous devices with varying constraints are involved. This raises challenges of communication costs, resource allocation, and privacy and security in the implementation of FL at scale. In this survey, we begin with an introduction to the background and fundamentals of FL. Then, we highlight the aforementioned challenges of FL implementation and review existing solutions. Furthermore, we present the applications of FL for mobile edge network optimization. Finally, we discuss the important challenges and future research directions in FL.
Article
In recent years, the enhanced sensing and computation capabilities of Internet of Things (IoT) devices have opened the doors to several mobile crowdsensing applications. In mobile crowdsensing, a model owner announces a sensing task following which interested workers collect the required data. However, in some cases, a model owner may have insufficient data samples to build an effective machine learning model. To this end, we propose a Federated Learning based privacy-preserving approach to facilitate collaborative machine learning among multiple model owners in mobile crowdsensing. Our system model allows collaborative machine learning without compromising data privacy given that only the model parameters instead of the raw data are exchanged within the federation. However, there are two main challenges of incentive mismatches between workers and model owners, as well as among model owners. For the former, we leverage the self-revealing mechanism of contract theory under information asymmetry. For the latter, to ensure the stability of a federation by preventing free-riding attacks, we use the coalitional game theory approach that rewards model owners based on their marginal contributions. Considering the inherent hierarchical structure of the involved entities, we propose a hierarchical incentive mechanism framework. Using backward induction, we first solve the contract formulation and then solve the coalitional game with the merge-and-split algorithm. The numerical results validate the performance efficiency of our proposed hierarchical incentive mechanism design, in terms of the incentive compatibility of the contract design and the fair payoffs of model owners in stable federation formation.
Conference Paper
We consider a scenario involving computations over a massive dataset stored distributedly across multiple workers, which is at the core of distributed learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify the computation for their benefit; and (3) (information-theoretic) privacy of the dataset amidst possible collusion of workers. LCC, which leverages the well-known Lagrange polynomial to create computation redundancy in a novel coded form across workers, can be applied to any computation scenario in which the function of interest is an arbitrary multivariate polynomial of the input dataset, hence covering many computations of interest in machine learning. LCC significantly generalizes prior works to go beyond linear computations. It also enables secure and private computing in distributed settings, improving the computation and communication efficiency of the state-of-the-art. Furthermore, we prove the optimality of LCC by showing that it achieves the optimal tradeoff between resiliency, security, and privacy, i.e., in terms of tolerating the maximum number of stragglers and adversaries, and providing data privacy against the maximum number of colluding workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the conventional uncoded implementation of distributed least-squares linear regression by up to 13.43×, and also achieves a 2.36×-12.65× speedup over the state-of-the-art straggler mitigation strategies.
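The encoding/decoding pipeline of LCC reduces to polynomial interpolation, which the following numpy sketch shows for the toy polynomial $f(X)=X^2$ (run over the reals for readability; the actual scheme works over a finite field and adds random padding for privacy):

```python
import numpy as np

def lagrange_interp(values, nodes, at_points):
    """Evaluate at `at_points` the unique polynomial through
    (nodes[j], values[j])."""
    out = []
    for z in at_points:
        total = 0.0
        for j, v in enumerate(values):
            basis = np.prod([(z - b) / (nodes[j] - b)
                             for k, b in enumerate(nodes) if k != j])
            total = total + v * basis
        out.append(total)
    return out

K, N = 3, 7
rng = np.random.default_rng(3)
X = [rng.standard_normal(4) for _ in range(K)]     # K data blocks
betas = np.arange(1.0, K + 1)                      # data embedding points
alphas = np.arange(4.0, 4.0 + N)                   # worker evaluation points
coded = lagrange_interp(X, betas, alphas)          # worker i stores u(alpha_i)

results = [c ** 2 for c in coded]                  # workers apply f(X) = X**2
# deg f(u(z)) = 2(K-1) = 4, so any 5 of the N = 7 results determine it:
decoded = lagrange_interp(results[:5], alphas[:5], betas)
assert np.allclose(decoded, [x ** 2 for x in X])   # f(X_j) recovered
```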
Article
We consider the problem of secure distributed matrix multiplication (SDMM) in which a user wishes to compute the product of two matrices with the assistance of honest but curious servers. We construct polynomial codes for SDMM by studying a combinatorial problem on a special type of addition table, which we call the degree table. The codes are based on arithmetic progressions, and are thus named GASP (Gap Additive Secure Polynomial) Codes. GASP Codes are shown to outperform all previously known polynomial codes for secure distributed matrix multiplication in terms of download rate.
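For intuition, the simplest one-colluding-server ($X=1$) polynomial scheme without matrix partitioning (a toy instance of the idea, not the full GASP construction) masks the inputs as

$$\tilde{\mathbf{A}}(z)=\mathbf{A}+z\,\mathbf{R},\qquad \tilde{\mathbf{B}}(z)=\mathbf{B}+z\,\mathbf{S},$$

so server $n$ returns $\tilde{\mathbf{A}}(\alpha_n)\tilde{\mathbf{B}}(\alpha_n)=\mathbf{A}\mathbf{B}+\alpha_n(\mathbf{A}\mathbf{S}+\mathbf{R}\mathbf{B})+\alpha_n^{2}\,\mathbf{R}\mathbf{S}$. Any three responses interpolate this degree-2 polynomial, whose constant term is the desired product, while each individual server sees only uniformly masked matrices. The degree table studied in the paper optimizes the exponents in such encodings when $\mathbf{A}$ and $\mathbf{B}$ are also partitioned into blocks.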
Article
Coded distributed computing (CDC) can overcome the problem that matrix multiplications of extremely large dimension cannot be executed on a single Internet-of-Things (IoT) node. The encoding of existing CDC schemes is based on linear combinations (LC) to generate independent computation tasks, which introduces a heavy computational load to the encoding and decoding phases, including a significant volume of expensive multiplications (compared with inexpensive additions) and even more expensive divisions. Note that the number of elementwise multiplications of the LC operation during the encoding phase is $N$ times that of the original computation task, where $N$ denotes the number of worker nodes. In this article, to avoid the expensive multiplications introduced by LC, a new CDC framework based on shift-and-addition (SA) over the real field is proposed. In addition, to avoid the expensive matrix inverse operation (divisions) in the decoding phase, zigzag decoding (ZD) is incorporated. The proposed scheme, which combines SA and ZD and is hence named SAZD-based CDC, avoids expensive multiplications and divisions in both the encoding and decoding phases. It targets the following simultaneous objectives: any $K$ out of the $N$ generated computation tasks are independent and can recover the original computation task with the ZD algorithm, and the shift distances are small so as to impose only a light additional computational load in the computation phase. Both analysis and practical study show that, compared to LC-based CDC, the SAZD-based CDC significantly reduces the computational load.
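A two-packet toy example makes the shift-and-add/zigzag mechanics explicit (illustrative only, not the full SAZD construction): encoding uses nothing but additions of zero-padded shifts, and decoding peels one symbol at a time with subtractions, so no multiplications or divisions ever occur.

```python
import numpy as np

L = 5
rng = np.random.default_rng(4)
b1, b2 = rng.standard_normal(L), rng.standard_normal(L)
pad = np.zeros(1)
p1 = np.concatenate([b1, pad]) + np.concatenate([b2, pad])  # b1 + b2
p2 = np.concatenate([b1, pad]) + np.concatenate([pad, b2])  # b1 + shift(b2)

x1, x2 = np.zeros(L), np.zeros(L)
for i in range(L):                       # zigzag peeling, subtractions only
    x1[i] = p2[i] - (x2[i - 1] if i else 0.0)   # p2[i] = b1[i] + b2[i-1]
    x2[i] = p1[i] - x1[i]                       # p1[i] = b1[i] + b2[i]
assert np.allclose(x1, b1) and np.allclose(x2, b2)
```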
Article
Large-scale machine learning and data mining applications require computer systems to perform massive matrix-vector and matrix-matrix multiplication operations that need to be parallelized across multiple nodes. The presence of straggling nodes, i.e., computing nodes that unpredictably slow down or fail, is a major bottleneck in such distributed computations. Ideal load balancing strategies that dynamically allocate more tasks to faster nodes require knowledge or monitoring of node speeds as well as the ability to quickly move data. Recently proposed fixed-rate erasure coding strategies can handle unpredictable node slowdown, but they ignore partial work done by straggling nodes, thus resulting in a lot of redundant computation. We propose a rateless fountain coding strategy that achieves the best of both worlds: we prove that its latency is asymptotically equal to that of ideal load balancing, and it performs asymptotically zero redundant computation. Our idea is to create linear combinations of the $m$ rows of the matrix and assign these encoded rows to different worker nodes. The original matrix-vector product can be decoded as soon as slightly more than $m$ row-vector products are collectively finished by the nodes. We conduct experiments in three computing environments: local parallel computing, Amazon EC2, and Amazon Lambda, which show that rateless coding gives as much as a 3× speed-up over uncoded schemes.
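The rateless principle fits in a short sketch (an LT-style toy: a uniform small degree stands in for the Soliton-type distribution, and generic linear-algebra decoding replaces the peeling decoder):

```python
import numpy as np

rng = np.random.default_rng(5)
m, d = 8, 5
A = rng.standard_normal((m, d))
x = rng.standard_normal(d)

def encoded_row():
    degree = rng.integers(1, 4)                   # toy degree distribution
    row = np.zeros(m)
    row[rng.choice(m, size=degree, replace=False)] = 1.0
    return row                                    # sum of `degree` source rows

# Workers stream encoded products until the received rows span all m
# source rows -- in expectation only slightly more than m of them.
G, b = [], []
while len(G) < m or np.linalg.matrix_rank(np.array(G)) < m:
    g = encoded_row()
    G.append(g)
    b.append((g @ A) @ x)                         # one worker-side product
y, *_ = np.linalg.lstsq(np.array(G), np.array(b), rcond=None)
assert np.allclose(y, A @ x)                      # decoded with ~m products
```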
Article
A promising research area that has recently emerged is how to use index coding to improve the communication efficiency in distributed computing systems, especially for data shuffling in iterative computations. In this paper, we posit that pliable index coding can offer a more efficient framework for data shuffling, as it can better leverage the many possible shuffling choices to reduce the number of transmissions. We theoretically analyze pliable index coding under data shuffling constraints, and design a hierarchical data-shuffling scheme that uses pliable coding as a component. We find benefits of up to $O(ns/m)$ over index coding, where $ns/m$ is the average number of workers caching a message, and $m$, $n$, and $s$ denote the number of messages, the number of workers, and the cache size, respectively.
Conference Paper
While performing distributed computations in today's cloud-based platforms, execution speed variations among compute nodes can significantly degrade performance and create bottlenecks such as stragglers. Coded computation techniques leverage coding theory to inject computational redundancy and mitigate stragglers in distributed computations. In this paper, we propose a dynamic workload distribution strategy for coded computation called Slack Squeeze Coded Computation (S²C²). S²C² squeezes the compute slack (i.e., overhead) that is built into coded computing frameworks by efficiently assigning work to all fast and slow nodes according to their speeds, without needing to re-distribute data. We implement an LSTM-based speed prediction algorithm to predict the speeds of compute nodes. We evaluate S²C² on linear algebraic algorithms, gradient descent, graph ranking, and graph filtering algorithms. We demonstrate a 19% to 39% reduction in total computation latency using S²C² compared to job replication and coded computation. We further show how S²C² can be applied beyond matrix-vector multiplication.
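The slack-squeezing allocation itself is simple once per-node speeds are predicted (a sketch; the LSTM predictor is replaced here by given speed estimates, which is an assumption made for brevity):

```python
def allocate(total_rows, speeds):
    """Assign each worker a share of the coded rows proportional to its
    predicted speed, so all workers finish at roughly the same time."""
    s = sum(speeds)
    shares = [round(total_rows * v / s) for v in speeds]
    shares[-1] += total_rows - sum(shares)   # absorb rounding drift
    return shares

# Slow nodes receive proportionally less work instead of idling the fast ones.
print(allocate(1200, [1.0, 1.0, 0.4, 0.6]))  # -> [400, 400, 160, 240]
```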
Chapter
Every function of $n$ inputs can be efficiently computed by a complete network of $n$ processors in such a way that: (1) if no faults occur, no set of $t < n/2$ players gains any additional information (other than the function value); (2) even if Byzantine faults are allowed, no set of $t < n/3$ players can either disrupt the computation or gain additional information. Furthermore, the above bounds on $t$ are tight!
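The secrecy half of the statement rests on polynomial (Shamir) secret sharing, sketched below over a prime field: a random degree-$t$ polynomial hides the secret from any $t$ players, while any $t+1$ shares reconstruct it (the multiplication sub-protocols that complete the result are omitted).

```python
import random

P = 2**31 - 1                                     # a prime modulus

def share(secret, n, t):
    """Give player i the evaluation at x = i of a random degree-t
    polynomial whose constant term is the secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t)]
    return [(i, sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P)
            for i in range(1, n + 1)]

def reconstruct(shares):                          # Lagrange interpolation at 0
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * -xj % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

s = share(1234, n=7, t=3)
assert reconstruct(s[:4]) == 1234                 # any t + 1 = 4 shares suffice
```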
Article
We provide novel coded computation strategies for distributed matrix–matrix products that outperform the recent “Polynomial code” constructions in recovery threshold, i.e., the required number of successful workers. When a fixed $1/m$ fraction of each matrix can be stored at each worker node, Polynomial codes require $m^{2}$ successful workers, while our MatDot codes only require $2m-1$ successful workers. However, MatDot codes have higher computation cost per worker and higher communication cost from each worker to the fusion node. We also provide a systematic construction of MatDot codes. Furthermore, we propose “PolyDot” coding that interpolates between Polynomial codes and MatDot codes to trade off computation/communication costs and recovery thresholds. Finally, we demonstrate a novel coding technique for multiplying $n$ matrices ( $n \geq 3$ ) using ideas from MatDot and PolyDot codes.
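The $m=2$ case of MatDot is small enough to verify directly (a numpy sketch over the reals; the construction itself is field-agnostic): split $\mathbf{A}$ by columns and $\mathbf{B}$ by rows, encode them as polynomials with reversed coefficient orders, and the desired product appears as the middle coefficient of the product polynomial, recoverable from any $2m-1=3$ worker results.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 4))
A1, A2 = A[:, :3], A[:, 3:]            # column blocks of A
B1, B2 = B[:3, :], B[3:, :]            # row blocks of B; AB = A1 B1 + A2 B2

pA = lambda z: A1 + z * A2             # degree m - 1 = 1
pB = lambda z: z * B1 + B2             # reversed, so AB is the z^1 coefficient

alphas = [1.0, 2.0, 3.0]               # any 2m - 1 = 3 workers suffice
prods = [pA(a) @ pB(a) for a in alphas]   # one product per surviving worker

# Interpolate the degree-2 matrix polynomial; read off the z^1 coefficient.
V = np.vander(alphas, 3, increasing=True)          # rows [1, a, a^2]
coeffs = np.linalg.solve(V, np.stack([p.reshape(-1) for p in prods]))
assert np.allclose(coeffs[1].reshape(4, 4), A @ B)
```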