Fig 10 - uploaded by Marc Frincu
Load balancing example.

Source publication
Conference Paper
Full-text available
The MapReduce programming model, due to its simplicity and scalability, has become an essential tool for processing large data volumes in distributed environments. Recent Stream Processing Systems (SPS) extend this model to provide low-latency analysis of high-velocity continuous data streams. However, integrating MapReduce with streaming poses cha...

Similar publications

Article
Full-text available
Although Video-On-Demand (VOD) has been in existence for years, its cross-platform applicability in cloud service environments is still in increasing need. In this paper, an Adaptive Video-On-Demand (AVOD) framework that is suitable for private cloud environments is proposed. Private cloud has the key advantage of satisfying the real need of both c...
Article
Full-text available
Cloud manufacturing (CMfg) is a new service-oriented manufacturing paradigm in which shared resources are integrated and encapsulated as manufacturing services. When a single service is not able to meet some manufacturing requirement, a composition of multiple services is then required via CMfg. Service composition and optimal selection (SCOS) is a...
Conference Paper
Full-text available
IaaS Cloud systems enable the Cloud provider to overbook his data centre by selling more virtual resources than physical resources available. This approach works if on average the resource utilisation of a virtual machine is lower than the virtual machine boundaries. If this assumption is violated only locally, Cloud users will experience performan...

Citations

... While stream processing systems were originally envisaged to use only stateless operators, the use of stateful operators has grown to accommodate a greater range of complex stream processing such as large graph processing [35,6], machine learning [36,23] and general parallel processing [16]; where undertaking these computations "in stream" is known to be a more effective approach than offloading to a third party system [15]. Checkpointing is argued as a more efficient alternative to state replication [8,21,19,33], which becomes more evident as the state size grows. ...
Preprint
State-of-the-art distributed stream processing systems such as Apache Flink and Storm have recently included checkpointing to provide fault-tolerance for stateful applications. This is a necessary eventuality as these systems head into the Exascale regime, and is evidently more efficient than replication as state size grows. However current systems use a nominal value for the checkpoint interval, indicative of assuming roughly 1 failure every 19 days, which does not take into account the salient aspects of the checkpoint process, nor the system scale, and can readily lead to inefficient system operation. To address this shortcoming, we provide a rigorous derivation of utilization -- the fraction of total time available for the system to do useful work -- that incorporates checkpoint interval, failure rate, checkpoint cost, failure detection and restart cost, depth of the system topology and message delay. Our model yields an elegant expression for utilization and provides an optimal checkpoint interval given these parameters, interestingly showing it to be dependent only on checkpoint cost and failure rate. We confirm the accuracy and efficacy of our model through experiments with Apache Flink, where we obtain improvements in system utilization for every case, especially as the system size increases. Our model provides a solid theoretical basis for the analysis and optimization of more elaborate checkpointing approaches.
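The trade-off this abstract describes can be illustrated with a first-order sketch. The model below is the classic Young-style approximation, not the paper's exact derivation (which additionally accounts for failure detection, topology depth, and message delay); `checkpoint_cost`, `failure_rate`, and `restart_cost` are illustrative parameters.

```python
import math

def optimal_checkpoint_interval(checkpoint_cost: float, failure_rate: float) -> float:
    """First-order optimal interval (Young's approximation):
    tau* = sqrt(2 * C / lambda). Note it depends only on checkpoint
    cost and failure rate, matching the paper's observation."""
    return math.sqrt(2.0 * checkpoint_cost / failure_rate)

def utilization(interval: float, checkpoint_cost: float,
                failure_rate: float, restart_cost: float) -> float:
    """Approximate fraction of time spent on useful work: per interval
    we pay the checkpoint cost, and each failure (rate lambda) costs
    on average half an interval of lost work plus the restart cost."""
    lost_fraction = (checkpoint_cost / interval
                     + failure_rate * (interval / 2.0 + restart_cost))
    return max(0.0, 1.0 - lost_fraction)
```

For example, with a 5 s checkpoint cost and one failure every ~2.8 hours (1e-4 failures/s), the sketch suggests checkpointing roughly every 316 s, and utilization drops if the interval is halved or doubled.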
... Stream MapReduce by Brito et al. [19] introduces "windowed reducers" to output a stream of results according to a window policy. Kumbhare et al. [81] extend Stream MapReduce by methods for adaptive load-balancing, runtime elasticity and fault tolerance. Beyond approaches to adapt a streaming model in MapReduce, Apache Flink [20], Apache Spark [154] and AJIRA [137] support batch and stream processing. ...
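The "windowed reducer" idea quoted above can be sketched in a few lines: accumulate (key, value) pairs and emit one reduced record per key each time a window closes. This is a minimal tumbling-window illustration, not the actual Stream MapReduce API; all names are illustrative.

```python
from collections import defaultdict

class TumblingWindowReducer:
    """Minimal sketch of a windowed reducer: buffer values per key and
    emit (window_end, key, reduced_value) records whenever an event's
    timestamp crosses the current window boundary."""

    def __init__(self, size, reduce_fn):
        self.size = size            # window length in time units
        self.reduce_fn = reduce_fn  # e.g. sum, for a windowed count
        self.window_end = size
        self.state = defaultdict(list)

    def on_event(self, timestamp, key, value):
        results = []
        # Close every window that ended before this event arrived.
        while timestamp >= self.window_end:
            results.extend(self._flush())
            self.window_end += self.size
        self.state[key].append(value)
        return results

    def _flush(self):
        out = [(self.window_end, k, self.reduce_fn(vs))
               for k, vs in sorted(self.state.items())]
        self.state.clear()
        return out
```

Feeding timestamped counts through `on_event` yields one reduced stream element per key per closed window, which is exactly the "stream of results according to a window policy" behavior described.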
Article
Stream Processing (SP) has evolved as the leading paradigm to process and gain value from the high volume of streaming data produced, e.g., in the domain of the Internet of Things. An SP system is a middleware that deploys a network of operators between data sources, such as sensors, and the consuming applications. SP systems typically face intense and highly dynamic data streams. Parallelization and elasticity enable SP systems to process these streams with continuous high quality of service. The current research landscape provides a broad spectrum of methods for parallelization and elasticity in SP. Each method makes specific assumptions and focuses on particular aspects. However, the literature lacks a comprehensive overview and categorization of the state of the art in SP parallelization and elasticity, which is necessary to consolidate the state of the research and to plan future research directions on this basis. Therefore, in this survey, we study the literature and develop a classification of current methods for both parallelization and elasticity in SP systems.
... In contrast, in our approach (see next section) the splitter transmits at most one migration message per replica, which represents a negligible delay in the distribution activity. Ref. [25] targets streaming applications in the MapReduce framework. They have developed an asynchronous checkpointing technique to efficiently migrate the state partitions. ...
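The asynchronous-checkpointing idea mentioned in the snippet above can be sketched as follows: the operator thread takes a cheap snapshot of its state partition and hands it to a background thread, which serializes it for migration while processing continues. This is an illustrative sketch, not the cited paper's mechanism; the shallow copy only protects against top-level mutations.

```python
import pickle
import queue
import threading

class AsyncCheckpointer:
    """Serialize state snapshots off the processing thread's critical
    path, so checkpointing a partition for migration does not block
    stream processing."""

    def __init__(self):
        self._q = queue.Queue()
        self.serialized = []  # checkpoints ready to be migrated
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def checkpoint(self, state: dict) -> None:
        # A shallow copy is the only work on the processing path;
        # expensive serialization happens in the background thread.
        self._q.put(dict(state))

    def _run(self):
        while True:
            snap = self._q.get()
            if snap is None:  # shutdown sentinel
                break
            self.serialized.append(pickle.dumps(snap))

    def close(self):
        self._q.put(None)
        self._worker.join()
```

Because the snapshot is taken synchronously, later mutations to the live state do not leak into an already-queued checkpoint.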
Article
Full-text available
Data stream processing applications have a long-running nature (24/7) with workload conditions that may exhibit wide variations at run-time. Elasticity is the term coined to describe the capability of applications to dynamically change their resource usage in response to workload fluctuations. This paper focuses on strategies for elastic data stream processing targeting multicore systems. The key idea is to exploit Model Predictive Control, a control-theoretic method that takes into account the system behavior over a future time horizon in order to decide the best reconfiguration to execute. We design a set of energy-aware proactive strategies, optimized for throughput and latency QoS requirements, which regulate the number of used cores and the CPU frequency through the Dynamic Voltage and Frequency Scaling (DVFS) support offered by modern multicore CPUs. We evaluate our strategies in a high-frequency trading application fed by synthetic and real-world workload traces. We introduce specific properties to effectively compare different elastic approaches, and the results show that our strategies achieve the best outcome.
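The core MPC decision in this abstract, choosing a (cores, frequency) pair that meets predicted load at minimum energy over a horizon, can be sketched with a toy controller. The candidate enumeration, the `per_core_rate` service model, and the cubic power model are illustrative assumptions, not the paper's actual controller.

```python
from itertools import product

def mpc_reconfigure(predicted_load, core_options, freq_options,
                    per_core_rate, horizon=3):
    """Toy MPC step: among all (cores, frequency) candidates, pick the
    lowest-power one whose aggregate service rate covers the predicted
    load over the next `horizon` steps. Returns None if no candidate
    can sustain the load."""
    demand = max(predicted_load[:horizon])  # worst case over horizon
    best, best_power = None, float("inf")
    for n, f in product(core_options, freq_options):
        if n * per_core_rate(f) < demand:
            continue  # would violate the throughput QoS requirement
        power = n * f ** 3  # classic cubic DVFS power approximation
        if power < best_power:
            best, best_power = (n, f), power
    return best
```

Note how the controller prefers more cores at a lower frequency when that is cheaper under the cubic power model, which is the typical DVFS trade-off the paper's strategies exploit.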
Article
Fault-tolerance is an essential part of a stream processing system that guarantees data analysis can continue even after failures. State-of-the-art distributed stream processing systems use checkpointing to support fault-tolerance for stateful computations, where the state of the computations is periodically persisted. However, the frequency of performing checkpoints impacts the performance (utilization, latency, and throughput) of the system, as the checkpointing process consumes resources and time that could be used for actual computations. In practice, systems are often configured to perform checkpoints based on crude values, ignoring factors such as checkpoint and restart costs, leading to suboptimal performance. In our previous work, we proposed a theoretical optimal checkpoint interval that maximizes system utilization for stream processing systems to minimize the impact of checkpointing on system performance. In this article, we investigate the practical benefits of our proposed theoretical optimal by conducting experiments in a real-world cloud setting using different streaming applications; we use Apache Flink, a well-known stream processing system, for our experiments. The experiment results demonstrate that an optimal interval can achieve better utilization, confirming the practicality of the theoretical model when applied to real-world applications. We observed utilization improvements from 10% to 200% for a range of failure rates from 0.3 failures per hour to 0.075 failures per minute. Moreover, we explore how the performance measures latency and throughput are affected by the optimal interval. Our observations demonstrate that significant improvements can be achieved using the optimal interval for both latency and throughput.
Conference Paper
Full-text available
Speeding, slowing down, and sudden acceleration are the leading causes of fatal accidents on highways. Anomalous driving behavior detection can improve road safety by informing drivers who are in the vicinity of dangerous vehicles. However, detecting abnormal driving behavior at the city-scale in a centralized fashion results in considerable network and computation load, which would significantly restrict the scalability of the system. In this paper, we propose CAD3, a distributed collaborative system for road-aware and driver-aware anomalous driving detection. CAD3 considers a decentralized deployment of edge computation nodes on the roadside and combines collaborative and context-aware computation with low-latency communication to detect and inform nearby drivers of unsafe behaviors of other vehicles in real-time. Adjacent edge nodes collaborate to improve the detection of abnormal driving behavior at the city-scale. We evaluate CAD3 with a physical testbed implementation. We emulate realistic driving scenarios from a real driving data set of 3,000 vehicles, 214,000 trips, and 18 million trajectories of private cars in Shenzhen, China. At the microscopic (road) level, CAD3 significantly improves the accuracy of detection and lowers the number of potential accidents caused by false negatives by up to four times and 24 times compared to distributed standalone and centralized models, respectively. CAD3 can scale up to 256 vehicles connected to a single node while keeping the end-to-end latency under 50 ms and the required bandwidth below 5 Mbps. At the mesoscopic (driver-trip) level, CAD3 performs stable and accurate detection over time, owing to local RSU interaction. With a dense deployment of edge nodes, CAD3 can scale up to the size of Shenzhen, a megalopolis of 12 million inhabitants with over 2 million concurrent vehicles at peak hours.
Article
In peer-to-peer (P2P) networks, free-riders and redundant streams including overlapped and folded streams dramatically degrade playback quality and network performance, respectively. Although a locality-aware P2P live video can reduce the topological complexity, it cannot effectively avoid redundant streams while denying free-riders. In this paper, we first model free-rider, redundant streams and a distance-driven P2P system. Based on that model, a distance-driven alliance algorithm is proposed to construct not only an alliance that directly prevents any utility gains of free-riders through inter-user constraints but also a small-world network or a multicast tree that effectively reduces redundant streams. Finally, simulations confirm its advantages in functionality and performance over several existing strategies and distance-driven P2P live video systems.
Article
Time-evolving stream datasets exist ubiquitously in many real-world applications, where their inherent hot keys often evolve over time. Nevertheless, few existing solutions can provide efficient load balancing on these time-evolving datasets while preserving low memory overhead. In this paper, we present a novel grouping approach (named FISH), which provides efficient time-evolving stream processing at scale. The key insight of this work is that the keys of time-evolving stream data can have a skewed distribution within any bounded time interval. This enables accurately identifying the recent hot keys for real-time load balance within a bounded scope. We therefore propose an epoch-based recent hot key identification with specialized intra-epoch frequency counting (for maintaining low memory overhead) and inter-epoch hotness decaying (for suppressing superfluous computation). We also propose to heuristically infer the accurate information of remote workers through computation rather than communication, for cost-efficient worker assignment. We have integrated our approach into Apache Storm. Our results on a cluster of 128 nodes, for both synthetic and real-world stream datasets, show that FISH significantly outperforms the state of the art, reducing average and 99th-percentile latency by 87.12% and 76.34% (vs. W-Choices) and memory overhead by 99.96% (vs. Shuffle Grouping).
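The two ingredients named in this abstract, intra-epoch frequency counting and inter-epoch hotness decaying, can be sketched with a small tracker. The class name, decay factor, and pruning threshold are illustrative assumptions, not FISH's actual parameters.

```python
from collections import Counter

class EpochHotKeyTracker:
    """Count key frequencies within the current epoch; at each epoch
    boundary, fold them into an exponentially decayed hotness score so
    keys that were hot long ago gradually fade out."""

    def __init__(self, decay=0.5, top_k=2):
        self.decay = decay
        self.top_k = top_k
        self.epoch_counts = Counter()  # cheap intra-epoch counting
        self.hotness = Counter()       # decayed cross-epoch hotness

    def observe(self, key):
        self.epoch_counts[key] += 1

    def end_epoch(self):
        # Inter-epoch hotness decaying: older evidence loses weight,
        # and near-zero entries are pruned to keep memory bounded.
        for k in list(self.hotness):
            self.hotness[k] *= self.decay
            if self.hotness[k] < 1e-6:
                del self.hotness[k]
        self.hotness.update(self.epoch_counts)
        self.epoch_counts.clear()

    def hot_keys(self):
        return [k for k, _ in self.hotness.most_common(self.top_k)]
```

A key that dominated an old epoch is quickly overtaken once a new key becomes hot, which is the behavior needed to balance load on streams whose hot keys shift over time.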