Thesis (PDF available)

Energy-efficient Workflow Scheduling with Budget and Deadline constraints in a Cloud Datacenter

Authors:
  • The University of Ngaoundere

Abstract and Figures

Scientific workflows decompose complex scientific applications into smaller, interdependent tasks that can be executed serially or in parallel. Their use has boosted scientific advances in various fields such as biology, physics, medicine, and astronomy. However, scientific workflows are generally complex and have varied structures and characteristics that can significantly affect the result of a scheduling algorithm. Scheduling a workflow consists in assigning its tasks to the resources of a computing infrastructure. Nowadays, the trend in information technology is to use Cloud computing environments to execute scientific workflow applications. However, cloud environments face a real problem of energy consumption, and inefficient resource management in cloud data centers has been identified as one of its main causes. It leads to resource underutilisation, huge electricity bills, a reduced return on investment (ROI) for cloud providers, and high carbon dioxide emissions. As for the users, meeting their defined deadline and budget is very important. In this thesis, we propose five workflow scheduling algorithms based on the structural properties of workflows. We first investigated how to design scheduling strategies that minimize both execution cost and execution time, which led to the proposal of two algorithms. We then investigated how to make our strategies more energy efficient, which led to the proposal of three scheduling algorithms aiming to minimize energy consumption, execution cost, and execution time. These three algorithms take advantage of the structural properties of the workflow as well as newly introduced scheduling concepts. At each step of our work, comparative simulations were conducted between each of our proposals and state-of-the-art algorithms.
Supported by adequate statistical tests, the analysis of the results reveals the degree to which our proposals outperform the baselines, both for the two bi-objective algorithms and for the three multi-objective ones that additionally aim at reducing energy consumption. The outperformance of the latter in terms of energy saving is established for 80% of workflow types and workloads. Overall, one of the three, namely the Structure-based Multi-objective Workflow Scheduling with an Optimal instance type (SMWSO), saves at least 50% more energy, followed by our two other algorithms. As for the success rate, even though SMWSO scored the highest overall, statistical tests revealed no significant difference between our three algorithms and the baseline algorithm in terms of user satisfaction.
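The thesis repeatedly leans on the structural properties of workflows. As a minimal illustration of one such property, the sketch below (hypothetical code, not taken from the thesis) partitions a workflow DAG into levels of mutually independent tasks, the kind of structure a scheduler can exploit to map tasks to VM instances in parallel:

```python
# Hypothetical sketch: level-partitioning a workflow DAG. Tasks in the
# same level have no dependencies on each other and may run in parallel.
from collections import defaultdict

def dag_levels(tasks, deps):
    """tasks: iterable of task ids; deps: dict mapping child -> set of parents.
    Returns a list of levels (each a sorted list of independent tasks)."""
    indeg = {t: len(deps.get(t, ())) for t in tasks}
    children = defaultdict(list)
    for child, parents in deps.items():
        for p in parents:
            children[p].append(child)
    level = [t for t in tasks if indeg[t] == 0]   # entry tasks
    levels = []
    while level:
        levels.append(sorted(level))
        nxt = []
        for t in level:
            for c in children[t]:
                indeg[c] -= 1                     # dependency satisfied
                if indeg[c] == 0:
                    nxt.append(c)
        level = nxt
    return levels

# A toy diamond-shaped workflow: A -> {B, C} -> D
print(dag_levels(["A", "B", "C", "D"],
                 {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}))
# → [['A'], ['B', 'C'], ['D']]
```

Tasks B and C land in the same level, so a level-aware scheduler knows they can safely run on separate instances at the same time.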
Article
The Internet of Things (IoT), or Web 3.0, is a growing technology that allows devices to communicate with each other as well as with the cloud. The Industrial Internet of Things (IIoT), or Industry 4.0, is the industrial application of IoT. Industry 4.0 transforms production equipment into IoT devices, which in turn transmit production-related data or their operational status to other connected devices or servers for rapid storage or processing. The results of the processing are used in real time for the control and supervision of the production system. Because of the time constraints on the availability of processing results, companies deploy more and more computing servers. These IoT application computing servers constitute the fog. While processing IoT application data, fog servers require cloud intervention to offload tasks, freeing them up and providing data storage. Edge computing enters the picture when connected objects are themselves computing resources. One of the main challenges of these technologies is the scheduling of dependent/independent tasks. Scheduling IoT workflows in the fog to optimize quality of service (QoS) metrics often requires offloading some tasks to the cloud in order to meet customer satisfaction and service level agreement (SLA) constraints. Therefore, many algorithms have been developed to schedule workflows in the cloud-fog environment. To organize this research, we present a literature review on scheduling workflows in cloud, fog, and edge environments. We outline a taxonomy that systematically classifies existing approaches. Finally, we identify challenges for future work.
Article
Full-text available
High energy consumption (EC) is one of the leading issues in the cloud environment. The optimization of EC is generally related to the scheduling problem: an optimal scheduling strategy selects resources or tasks in such a way that system performance is not violated while minimizing EC and maximizing resource utilization (RU). This paper presents a task scheduling model for scheduling tasks on virtual machines (VMs). The objective of the proposed model is to minimize EC, maximize RU, and minimize workflow makespan while preserving the tasks' deadline and dependency constraints. An energy and resource efficient workflow scheduling algorithm (ERES) is proposed to schedule workflow tasks on VMs and dynamically deploy/un-deploy VMs based on the workflow tasks' requirements. An energy model is presented to compute the EC of the servers. A double threshold policy is used to determine each server's status, i.e. overloaded, underloaded, or normal. To balance the workload on the overloaded/underloaded servers, a live VM migration strategy is used. To check the effectiveness of the proposed algorithm, exhaustive simulation experiments are conducted. The proposed algorithm is compared with the power efficient scheduling and VM consolidation (PESVMC) algorithm in terms of RU, energy efficiency, and task makespan. Further, the results are also verified in a real cloud environment. The results demonstrate the effectiveness of the proposed ERES algorithm.
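A minimal sketch of the double-threshold idea described above: a server's utilisation is compared against a lower and an upper threshold to label it underloaded, normal, or overloaded. The threshold values here are illustrative assumptions, not ERES's actual parameters:

```python
# Hypothetical double-threshold status check; 0.2 and 0.8 are made-up values.
LOWER, UPPER = 0.2, 0.8

def server_status(utilisation, lower=LOWER, upper=UPPER):
    """Classify a server by CPU utilisation in [0, 1]."""
    if utilisation < lower:
        return "underloaded"   # candidate for consolidation / shutdown
    if utilisation > upper:
        return "overloaded"    # candidate source for live VM migration
    return "normal"

print(server_status(0.1), server_status(0.5), server_status(0.95))
# → underloaded normal overloaded
```

Underloaded servers can then be drained and switched off, while overloaded ones shed VMs via live migration, which is the balancing step the abstract refers to.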
Article
Full-text available
Cloud computing has recently become increasingly popular for supporting scientific applications and complex business processes. Clouds are well suited for executing workflow-based tasks because they provide elastic resource provisioning, through which computation-intensive workflows can obtain resources according to their elastic demand and establish execution environments over virtual machines (VMs). However, it remains a challenge to guarantee the cost-effectiveness and quality of service of workflows deployed on clouds, because real-world cloud infrastructures usually exhibit fluctuating and time-varying performance. Existing research mainly assumes cloud infrastructures with fixed, random, or bounded quality of service (QoS). In this work, however, we consider scientific computing processes supported by decentralized cloud infrastructures with fluctuating QoS and aim at managing the monetary cost of workflows while satisfying a completion-time constraint. We address the performance-variation-aware workflow scheduling problem by leveraging a time-series-based prediction model and a Critical-Path-Duration-Estimation-based (CPDE for short) VM selection strategy. The proposed method is capable of exploiting real-time trends in the performance changes of cloud infrastructures and generating dynamic workflow scheduling plans. To prove the effectiveness of our proposed method, we perform extensive experimental case analysis over real-world third-party commercial clouds and show that our method clearly beats existing approaches.
Article
Full-text available
In Infrastructure as a Service (IaaS) Clouds, users are charged to utilize cloud services according to a pay-per-use model. If users intend to run their workflow applications on cloud resources within a specific budget, they have to adjust their demands for cloud resources with respect to this budget. Although several scheduling approaches have introduced solutions to optimize the makespan of workflows on a set of heterogeneous IaaS cloud resources within a certain budget, the hourly-based cost model of some well-known cloud providers (e.g., Amazon EC2 Cloud) can easily lead to higher makespan and some schedulers may not find any feasible solution. In this paper, we propose a novel resource provisioning mechanism and a workflow scheduling algorithm, named Greedy Resource Provisioning and modified HEFT (GRP-HEFT), for minimizing the makespan of a given workflow subject to a budget constraint for the hourly-based cost model of modern IaaS clouds. As a resource provisioning mechanism, we propose a greedy algorithm which lists the instance types according to their efficiency rate. For our scheduler, we modified the HEFT algorithm to consider a budget limit. GRP-HEFT is compared against state-of-the-art workflow scheduling techniques, including MOACS (Multi-Objective Ant Colony System), PSO (Particle Swarm Optimization), and GA (Genetic Algorithm). The experimental results demonstrate that GRP-HEFT outperforms GA, PSO, and MOACS for several well-known scientific workflow applications for different problem sizes on average by 13.64%, 19.77%, and 11.69%, respectively. Also in terms of time complexity, GRP-HEFT outperforms GA, PSO and MOACS.
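The greedy provisioning step can be illustrated as follows. The instance names, speeds, and prices are made-up values, and "efficiency rate" is rendered simply as compute speed per unit of hourly cost, which is one plausible reading of the abstract, not GRP-HEFT's exact formula:

```python
# Hypothetical sketch: rank instance types by speed per dollar-hour and
# provision the most efficient types first while the budget allows.
def rank_by_efficiency(instance_types):
    """instance_types: list of (name, speed, hourly_price) tuples."""
    return sorted(instance_types,
                  key=lambda t: t[1] / t[2],  # speed per unit cost
                  reverse=True)

# Made-up catalog: speeds in arbitrary compute units, prices in $/hour.
catalog = [("small", 10, 0.1), ("large", 35, 0.5), ("xlarge", 99, 0.9)]
print(rank_by_efficiency(catalog))
# → [('xlarge', 99, 0.9), ('small', 10, 0.1), ('large', 35, 0.5)]
```

Here "xlarge" wins (110 units/$) despite its higher absolute price, which is exactly why ranking by efficiency rather than by raw cost can shorten the makespan within the same budget.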
Article
Full-text available
Energy efficient workflow scheduling is a pressing demand of present-day computing platforms such as infrastructure-as-a-service (IaaS) clouds. An appreciable amount of energy can be saved if a dynamic voltage scaling (DVS) enabled environment is considered. But it is also important to decrease the makespan of a schedule, so that it does not extend beyond the deadline specified by the cloud user. In this paper, we propose a workflow scheduling algorithm inspired by the hybrid chemical reaction optimization (HCRO) algorithm. The proposed scheme is shown to be energy efficient and to minimize makespan. We refer to the proposed approach as the energy efficient workflow scheduling (EEWS) algorithm. EEWS introduces a novel measure to determine the amount of energy that can be conserved in a DVS-enabled environment. Through simulations on a variety of scientific workflow applications, we demonstrate that the proposed scheme performs better than existing algorithms such as HCRO and the multiple priority queues genetic algorithm (MPQGA) in terms of various performance metrics, including makespan and the amount of energy conserved. The significance of the proposed algorithm is also judged through an analysis of variance (ANOVA) test and its subsequent LSD analysis.
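Why DVS saves energy can be sketched from first principles: dynamic CPU power scales roughly with V²f, and since supply voltage can be lowered along with frequency, power falls roughly cubically while execution time grows only linearly. The toy model below uses illustrative constants, not EEWS's actual energy measure:

```python
# Toy DVS energy model: power ~ k * f^3, time = work / f,
# so energy = power * time = k * work * f^2.
def dynamic_energy(work, freq, k=1.0):
    """Energy to finish `work` compute units at normalized frequency `freq`."""
    power = k * freq ** 3
    time = work / freq
    return power * time

full = dynamic_energy(work=100, freq=1.0)
half = dynamic_energy(work=100, freq=0.5)
print(full, half)
# → 100.0 25.0  (a quarter of the energy at half frequency)
```

The trade-off is the doubled execution time, which is why DVS-aware schedulers like the one above must slow tasks down only as far as the user's deadline permits.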
Article
Workflows are an application model that enables the automated execution of multiple interdependent and interconnected tasks. They are widely used by the scientific community to manage the distributed execution and dataflow of complex simulations and experiments. As the popularity of scientific workflows continues to rise, and their computational requirements continue to increase, the emergence and adoption of multi-tenant computing platforms that offer the execution of these workflows as a service become widespread. This article discusses the scheduling and resource provisioning problems particular to this type of platform. It presents a detailed taxonomy and a comprehensive survey of the current literature and identifies future directions to foster research in the field of multiple workflow scheduling in multi-tenant distributed computing systems.
Article
Workflow scheduling is one of the challenging issues in emerging distributed environments, focusing on satisfying various quality of service (QoS) constraints. The cloud receives applications in the form of workflows, each consisting of a set of interdependent tasks, to solve large-scale scientific or enterprise problems. Workflow scheduling in the cloud environment has been studied extensively over the years, and this article provides a comprehensive review of the approaches. This article analyses the characteristics of various workflow scheduling techniques and classifies them based on their objectives and execution model. In addition, recent technological developments and paradigms such as serverless computing and Fog computing are creating new requirements and opportunities for workflow scheduling in a distributed environment. Serverless infrastructures are mainly designed for processing background tasks such as Internet-of-Things (IoT), web, or event-driven applications. To address the ever-increasing demand for resources and to overcome the drawbacks of the cloud-centric IoT, the Fog computing paradigm has been developed. This article also discusses workflow scheduling in the context of these emerging trends of cloud computing.
Article
Workflow scheduling is a widely studied research topic in cloud computing, which aims to utilize cloud resources for workflow tasks by considering the objectives specified in the QoS. In this paper, we model the dynamic workflow scheduling problem as a dynamic multi-objective optimization problem (DMOP) where the source of dynamism is both resource failures and the number of objectives, which may change over time. Software and/or hardware faults may cause the first type of dynamism, while real-life scenarios in cloud computing may change the number of objectives at runtime during the execution of a workflow. In this study, we propose a prediction-based dynamic multi-objective evolutionary algorithm, called NN-DNSGA-II, by incorporating an artificial neural network into the NSGA-II algorithm. Additionally, five leading non-prediction-based dynamic algorithms from the literature are adapted for the dynamic workflow scheduling problem. Scheduling solutions are found by considering six objectives: minimization of makespan, cost, energy, and degree of imbalance; and maximization of reliability and utilization. The empirical study, based on real-world applications from the Pegasus workflow management system, reveals that our NN-DNSGA-II algorithm significantly outperforms the other alternatives in most cases with respect to metrics used for DMOPs with an unknown true Pareto-optimal front, including the number of non-dominated solutions, Schott's spacing, and the Hypervolume indicator.
Article
Workflow is a common model to represent large computations composed of dependent tasks. Most existing workflow scheduling algorithms use computing resources in a non-multiprogrammed way, in which only one task can run on a service (machine) at a time. In this paper, we study a new workflow scheduling model on heterogeneous Infrastructure-as-a-Service (IaaS) platforms, which allows multiple tasks to run concurrently on a virtual machine (VM) according to their multi-resource demands. First, we propose a list-scheduling framework for the new multiprogrammed cloud resource model. In the order of a priority list, this framework gradually assigns each task the best placement found on both existing and new VMs on the platform. Different task prioritization and placement comparison methods can be employed for different scheduling objectives. To fully exploit the heterogeneity of IaaS platforms, VMs can be scaled up during the scheduling process. Then, based on this framework, we propose a deadline-constrained workflow scheduling algorithm (called DyDL) to optimize the cost of workflow execution. This algorithm prioritizes tasks by their latest start times and assigns each task a placement that meets its latest start time while incurring the minimal cost increase. Experimental results show that DyDL achieves significantly better schedules in most test cases compared with several existing deadline-constrained workflow scheduling algorithms.
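The latest-start-time prioritisation can be sketched as follows, assuming (as is standard, though the abstract does not spell it out) that a task's latest finish time is the deadline for exit tasks and otherwise the minimum latest start time over its children. Runtimes and the deadline are made-up values:

```python
# Hypothetical latest-start-time (LST) computation over a workflow DAG:
# LST(t) = LFT(t) - runtime(t), where LFT(t) is the deadline for exit
# tasks and min(LST(child)) otherwise.
def latest_start_times(runtime, children, deadline):
    """runtime: dict task -> duration; children: dict task -> child list."""
    lst = {}
    def compute(t):
        if t in lst:
            return lst[t]
        kids = children.get(t, [])
        lft = deadline if not kids else min(compute(c) for c in kids)
        lst[t] = lft - runtime[t]
        return lst[t]
    for t in runtime:
        compute(t)
    return lst

# Diamond workflow A -> {B, C} -> D with a deadline of 10.
rt = {"A": 2, "B": 3, "C": 1, "D": 2}
kids = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
print(latest_start_times(rt, kids, deadline=10))
# → {'A': 3, 'B': 5, 'C': 7, 'D': 8}
```

A task with a smaller LST is more urgent, so scheduling in ascending LST order (A, then B, then C, then D here) is what lets DyDL pick the cheapest placement that still starts each task in time.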
Article
Cloud datacenters have become a backbone of today's business and economy and are among the fastest-growing electricity consumers globally. Numerous studies suggest that ~30% of US datacenters are comatose and the others are grossly under-utilized, which makes it possible to save energy through resource consolidation techniques. However, consolidation involves migrations that are expensive in terms of energy consumption and performance degradation, which many existing models do not account for; possibly, it could be more energy and performance efficient not to consolidate. In this paper, we investigate how migration decisions should be taken so that the migration cost is recovered: only when the migration cost has been recovered and performance is guaranteed will energy start to be saved. We demonstrate through several experiments, using Google workload data for 12,583 hosts and approximately one million tasks belonging to three different kinds of workload, how different allocation policies, combined with various migration approaches, impact a datacenter's energy and performance efficiencies. Using several plausible assumptions for a containerised datacenter set-up, we suggest that a combination of the proposed energy-performance-aware allocation (Epc-Fu) and migration (Cper) techniques, migrating only relatively long-running containers, offers ideal energy and performance efficiencies.
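The cost-recovery argument above reduces to a simple break-even check: a migration pays off only if the saving it yields over the container's remaining lifetime exceeds the one-off migration cost. All numbers below are illustrative, not from the paper's models:

```python
# Illustrative break-even check for a migration decision. Units are
# arbitrary (e.g. joules per hour saved vs. joules spent migrating).
def migration_pays_off(saving_per_hour, migration_cost, remaining_hours):
    """True if the cumulative saving recovers the one-off migration cost."""
    return saving_per_hour * remaining_hours > migration_cost

# A long-running container recovers the cost; a short-lived one does not.
print(migration_pays_off(2.0, 10.0, remaining_hours=24))  # → True
print(migration_pays_off(2.0, 10.0, remaining_hours=3))   # → False
```

This is also why the paper's recommendation to migrate only relatively long-running containers makes sense: the remaining lifetime is the factor that amortises the migration cost.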
Article
Green cloud computing attracts significant attention from both academia and industry. One of the major challenges involved, is to provide a high level of Quality of Service (QoS) in a cost-effective way for the end users and in an energy-efficient manner for the cloud providers. Towards this direction, this paper presents an energy-efficient, QoS-aware and cost-effective scheduling strategy for real-time workflow applications in cloud computing systems. The proposed approach utilizes per-core Dynamic Voltage and Frequency Scaling (DVFS) on the underlying heterogeneous multi-core processors, as well as approximate computations, in order to fill in schedule gaps. At the same time, it takes into account the effects of input error on the processing time of the component tasks. Our goal is to provide timeliness and energy efficiency by trading off result precision, while keeping the result quality of the completed jobs at an acceptable standard and the monetary cost required for the execution of the jobs at a reasonable level. The proposed scheduling heuristic is compared to two other baseline policies, under the impact of various QoS requirements. The simulation experiments reveal that our approach outperforms the other examined policies, providing promising results.