A high level view of Adaptive Scheduling Algorithm for Dynamic Heterogeneous Hadoop Systems

Source publication
Article
Hadoop is an open-source cloud computing system used in large-scale data processing. It has become the basic computing platform for many internet companies. With the Hadoop platform, users can develop cloud computing applications and then submit tasks to the platform. Hadoop has strong fault tolerance and can easily increase the number of cluster...

Context in source publication

Context 1
... the mean job execution times are estimated when a new job is submitted to the system, which makes the scheduler adaptable to changes in job execution times. A high level view of Dynamic Heterogeneous Hadoop Systems is shown in Figure 5. ...
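The context above says the scheduler re-estimates mean job execution times whenever a new job is submitted. A minimal sketch of such a running-mean estimator, with illustrative names not taken from the paper:

```python
from collections import defaultdict

class AdaptiveEstimator:
    """Running-mean estimate of job execution time per job type.

    Hypothetical sketch: the paper re-estimates mean execution times
    on each job submission; the class and method names here are
    illustrative, not the authors' implementation.
    """
    def __init__(self):
        self._totals = defaultdict(float)   # sum of observed runtimes
        self._counts = defaultdict(int)     # number of completed jobs

    def record(self, job_type, runtime_s):
        # Called when a job finishes: fold its runtime into the mean.
        self._totals[job_type] += runtime_s
        self._counts[job_type] += 1

    def estimate(self, job_type, default_s=60.0):
        # Called on submission: return the current mean, or a default
        # when no history exists yet for this job type.
        n = self._counts[job_type]
        return self._totals[job_type] / n if n else default_s

est = AdaptiveEstimator()
est.record("sort", 120.0)
est.record("sort", 80.0)
print(est.estimate("sort"))   # 100.0
print(est.estimate("grep"))   # 60.0 (no history yet)
```

Because the estimate is refreshed from observed completions, the scheduler adapts as job execution times drift, which is the adaptivity the figure illustrates.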

Similar publications

Article
MapReduce is a framework for addressing large applications that handle tremendous volumes of data in parallel. These large tasks are carried out by a master-and-slave node architecture, where the master node tracks all the available resources and manages the distributed applications, and the slave node is responsible for maintaining the resources usa...
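The abstract above describes the MapReduce model of parallel processing. A minimal, single-process sketch of the map / shuffle / reduce phases (a toy illustration, not Hadoop's distributed implementation):

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    # Map phase: each slave-node task emits (word, 1) pairs.
    return [(w, 1) for w in line.split()]

def shuffle(pairs):
    # Shuffle phase: group emitted values by key across map outputs.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Reduce phase: aggregate all values for one key.
    return key, sum(values)

lines = ["big data big", "data systems"]
mapped = chain.from_iterable(mapper(line) for line in lines)
result = dict(reducer(k, vs) for k, vs in shuffle(mapped).items())
print(result)  # {'big': 2, 'data': 2, 'systems': 1}
```

In a real cluster the master assigns map and reduce tasks to slave nodes and the shuffle moves data over the network; here all three phases run in one process for clarity.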

Citations

... Paper [10] presented that in 2016 Facebook handled 4 petabytes of data on a daily basis, and that each day in 2017, 2.5 quintillion bytes of information were generated, according to IBM. Amazon uses ad hoc groups to manage enormous volumes of data on a regular basis, according to papers [11,12,13]. A substantial dataset is broken down into little segments and spread among multiple nodes, each with its own distinct computing and caching capabilities. ...
... When compared to the conventional methodology, problem solving using metaheuristic approaches performed better as the dimensions of the searched space expanded. The MapReduce framework was the main emphasis of the authors' studies in [13,14,42,63-65], as well as its limitations, problems with job scheduling between nodes, and other algorithms presented by different academics. In some of these studies, the algorithms were then categorized based on a variety of performance-related quality indicators. ...
Article
Rapid advancements in Big data systems have occurred over the last several decades. The key element for attaining high performance in Big data systems is job scheduling, which demands the utmost attention in order to resolve several scheduling challenges. To obtain higher performance when processing big data, proper scheduling is required. Apache Hadoop is most commonly used to manage immense data volumes efficiently, and it is also proficient in handling the issues associated with job scheduling. To improve the performance of big data systems, we analyzed various Hadoop job scheduling algorithms in depth. To give an overall idea of scheduling algorithms, this paper presents a rigorous background, gives an overview of the fundamental architecture of the Hadoop Big data framework and of job scheduling and its issues, and then reviews and compares the most important and fundamental Hadoop job scheduling algorithms. In addition, this paper includes a review of other improved algorithms. The primary objective is to present an overview of various scheduling algorithms for improving performance when analyzing big data. This study will also guide researchers toward an appropriate job scheduling algorithm according to which characteristics are most significant.
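Among the fundamental Hadoop schedulers such a survey compares, the simplest is the default FIFO scheduler, which dispatches jobs strictly in submission order. A minimal sketch of that behavior (illustrative only, not Hadoop's actual code):

```python
from collections import deque

class FifoScheduler:
    """Toy FIFO job scheduler: jobs run strictly in submission order.

    Illustrative sketch of the default Hadoop FIFO policy discussed
    in the survey; names and structure are the author's own.
    """
    def __init__(self):
        self._queue = deque()

    def submit(self, job):
        # New jobs always join the back of the queue.
        self._queue.append(job)

    def next_job(self):
        # The oldest submitted job runs first; None when idle.
        return self._queue.popleft() if self._queue else None

sched = FifoScheduler()
for job in ("job-A", "job-B", "job-C"):
    sched.submit(job)
print(sched.next_job())  # job-A
```

FIFO's weakness, which motivates the FAIR and capacity schedulers the survey reviews, is that a long-running job at the head of the queue delays every job behind it.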
... Table 6 shows a comparison of various Hadoop schedulers and the proposed scheduler. This table was prepared using the results obtained in [12,25,26] and [27] together with the results obtained in this article. According to the table, the proposed scheduler allocates resources dynamically and considers job priority. ...
Article
Job scheduling in Hadoop has thus far been investigated in several studies. However, some challenges facing Hadoop clusters, including minimum share (min-share), cluster heterogeneity, execution time estimation, and scheduling program size, have received less attention. One of the most important algorithms with regard to min-share is the FAIR scheduler, presented by Facebook Inc. based on its own needs, in which an equal min-share is considered for all users. In this article, an attempt has been made to make the proposed method superior to existing methods through automation and configuration, performance optimization, fairness, and data locality. A high-level architectural model is designed, and a scheduler is then defined on this architectural model. The provided scheduler contains four components: three components schedule jobs, and one component distributes the data for each job among the nodes. The scheduler can be executed on heterogeneous Hadoop clusters and run jobs in parallel, and disparate min-shares can be assigned to each job or user. Moreover, an approach is presented for each problem associated with min-share, cluster heterogeneity, execution time estimation, and scheduler program size; these approaches can also be utilized on their own to improve the performance of other scheduling algorithms. The scheduler presented in this paper showed acceptable performance compared with the First-In, First-Out (FIFO) and FAIR schedulers.