
Scientific Workflow Scheduling in Clouds: A Review

August 2018
International Journal of Engineering & Technology Sciences 7(3.28):271-274


DOI:10.14419/ijet.v7i3.28.23435

Authors:

Tawfiq Alrawashdeh

Al-Hussein Bin Talal University

Aznida Hayati Zakaria

Universiti Sultan Zainal Abidin | UniSZA



use, distribution, and reproduction in any medium, provided the original work is properly cited.

International Journal of Engineering & Technology, 7 (3.28) (2018) 271-274


Website: www.sciencepubco.com/index.php/IJET

Research paper

Scientific Workflow Scheduling in Clouds: A Review

Tawfiq Alrawashdeh1, Aznida Hayati Zakaria2*, Zarina Mohamad2

1Al Husein Bin Talal University, P.O. Box 20 Ma'an, Jordan

2Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin, Besut Campus, Terengganu, Malaysia

*Corresponding author E-mail: aznida@unisza.edu.my

Abstract

Due to their abundant resources, which can be elastically provisioned with pay-as-you-go pricing, clouds have emerged as a promising, cost-efficient platform for executing large-scale scientific applications. Such applications consist of a number of processes (tasks) that form a workflow. These tasks are connected by directed edges that represent the data dependencies between them. Each task performs its computation either on the original data submitted by the user or on data passed by its predecessor tasks. This work classifies and discusses proposals that investigate the problem of scheduling scientific workflows in clouds.

Keywords: scheduling algorithms; scientific workflow; cloud computing.

1. Introduction

With the rapid increase in workflow complexity and the resulting demand for scalable environments, executing workflows in traditional environments has become a very challenging task. To address this issue, cloud computing (especially Infrastructure as a Service) has emerged as an efficient environment for executing scientific workflows.

Infrastructure as a Service (IaaS) clouds offer a readily available, adaptable, and versatile foundation for the deployment of scientific workflows. Using IaaS, users can rent Virtual Machines (VMs) to execute their computational tasks. This gives users access to a practically unbounded pool of VMs that can be flexibly acquired and released during the execution of the computational task(s). Users are charged for the resources they rent under a pay-per-use cost model.

To increase resource utilization, the user must determine the right number of resources to rent. Over-renting is expected to increase the execution cost, whereas under-renting is expected to reduce performance (that is, to increase the execution time). This difficulty stems from the bi-objective nature of scheduling workflows in the cloud: in general, the goal is to establish an execution schedule for the workflow's tasks such that both the execution cost and the execution time are minimized. Because these two objectives conflict, determining the number of resources to rent is a major challenge.
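
The renting trade-off can be illustrated with a small sketch. It assumes, purely for illustration, a bag of independent tasks, identical VMs billed per started hour, and a greedy longest-task-first placement; the function name `rent_tradeoff`, the parameter `hourly_price`, and the task runtimes are hypothetical and are not taken from any of the surveyed proposals.

```python
import math

def rent_tradeoff(runtimes_h, n_vms, hourly_price=1.0):
    """Return (makespan_hours, cost) when renting n_vms identical VMs.

    Assumptions (illustrative only): tasks are independent, each VM is
    billed per started hour, and tasks are placed longest-first on the
    currently least-loaded VM.
    """
    loads = [0.0] * n_vms
    for r in sorted(runtimes_h, reverse=True):
        i = loads.index(min(loads))   # least-loaded VM so far
        loads[i] += r
    makespan = max(loads)
    # A VM that received no task is never rented, so it costs nothing.
    cost = sum(math.ceil(load) * hourly_price for load in loads if load > 0)
    return makespan, cost

tasks = [0.5, 0.5, 0.5, 0.5]          # four half-hour tasks
for n in (1, 2, 4):
    print(n, rent_tradeoff(tasks, n))
```

Under these assumptions, moving from one VM to two halves the makespan at the same cost, whereas renting four VMs halves it again but doubles the cost, because each VM is billed for a full hour while being used for only half of it.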

The problem of scheduling workflows in clouds is NP-complete in nature, and many variations of it have been studied in the literature. For instance, many proposals have investigated the problem of finding the cheapest schedule that satisfies a pre-determined deadline on the execution of the workflow [1-5, 7]. On the other hand, in [6, 9-11], the authors addressed the problem of determining a schedule for the tasks such that the execution time is minimized and a pre-determined execution cost constraint is satisfied. Introducing budget and/or deadline constraints reduces the optimization to a single objective (cost or time), with the other quantity treated as a constraint.
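
The two constrained variants can be written compactly. The notation is assumed here for illustration and does not come from the surveyed proposals: $S$ denotes a schedule, $M(S)$ its makespan, $C(S)$ its monetary cost, $D$ the deadline, and $B$ the budget.

```latex
\begin{align*}
\text{(deadline-constrained)} \quad & \min_{S} \; C(S) \quad \text{subject to} \quad M(S) \le D, \\
\text{(budget-constrained)}   \quad & \min_{S} \; M(S) \quad \text{subject to} \quad C(S) \le B.
\end{align*}
```

In both cases one of the two conflicting quantities is moved from the objective into a constraint, which is what turns the bi-objective problem into a single-objective one.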

This paper discusses and classifies proposals that investigate the problem of scheduling scientific workflows in clouds. Section 2 presents the approaches most closely related to this problem, and Section 3 concludes the paper.

2. Scheduling Approaches

Many proposals [1-24] have investigated the problem of scheduling scientific workflows in the cloud. This section discusses the related proposals in the literature and categorizes them, in terms of their objective function, into (1) single-objective and (2) multi-objective approaches.

2.1. Single-objective

The problem of scheduling scientific workflows with the objective of minimizing either the makespan or the cost has been studied extensively in the literature. For instance, given a pre-determined budget constraint, the authors of [1] propose a heuristic-based solution that aims to reduce the overall execution delay. The main idea of this approach is to improve the current schedule in each iteration by considering the remaining budget. The authors of [3] addressed the problem of minimizing the makespan in the presence of budget constraints, which they termed the Minimum End-to-end Delay under Cost Constraint (MED-CC) problem. They established the hardness of this problem by proving that it is NP-complete and non-approximable. To address it, they proposed a heuristic-based solution that follows an efficient search strategy: in each iteration, it tries to find a new schedule with a smaller makespan.

The authors of [4] proposed a priority-based genetic algorithm, termed BCHGA, which addresses the problem of scheduling the tasks of a workflow in the presence of a budget constraint. In this algorithm, based on the locality of the tasks, each task is assigned either a bottom-level priority (b-level) or a top-level priority (t-level). Then, in each round, the algorithm tries to find a better schedule in terms of makespan while minimizing the execution cost. In [2, 5], the authors adopt a similar strategy to minimize the execution cost in the presence of budget and execution time constraints.
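
The b-level and t-level priorities can be sketched as follows, using the standard DAG-scheduling definitions (b-level: longest path from a task to an exit task, including the task's own runtime; t-level: longest path from an entry task to the task, excluding its own runtime). The exact definitions used in [4] may differ in detail, and the four-task workflow, its runtimes, and its transfer times are hypothetical.

```python
from functools import lru_cache

# Hypothetical four-task workflow: task runtimes and edge transfer times.
runtime = {"A": 2, "B": 3, "C": 1, "D": 2}
edges = {("A", "B"): 1, ("A", "C"): 2, ("B", "D"): 1, ("C", "D"): 1}

succ = {t: [] for t in runtime}
pred = {t: [] for t in runtime}
for (u, v) in edges:
    succ[u].append(v)
    pred[v].append(u)

@lru_cache(maxsize=None)
def b_level(t):
    """Longest path from t to an exit task, including t's own runtime."""
    return runtime[t] + max(
        (edges[(t, s)] + b_level(s) for s in succ[t]), default=0)

@lru_cache(maxsize=None)
def t_level(t):
    """Longest path from an entry task to t, excluding t's own runtime."""
    return max(
        (t_level(p) + runtime[p] + edges[(p, t)] for p in pred[t]), default=0)

# Tasks ordered by decreasing b-level, a common list-scheduling priority.
order = sorted(runtime, key=b_level, reverse=True)
print(order, {t: (t_level(t), b_level(t)) for t in runtime})
```

For the entry task the b-level equals the length of the critical path, and for every task on the critical path the sum of its t-level and b-level equals that same length.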

The authors of [6] proposed a Deadline Guarantee Enhanced Scheduling Algorithm (DGESA). This algorithm targets scenarios in which scientific workflows can be scheduled on a hybrid infrastructure. It starts by calculating a sub-deadline for each task and then determines the probability of violating this sub-deadline. If this probability is higher than a pre-determined threshold, the task is scheduled for execution on the public cloud; otherwise, the task is executed on the private cloud.
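
The threshold rule described above can be sketched in a few lines. This is only an illustration of the decision step, not the DGESA implementation from [6]: how the violation probability is estimated is not modeled, and the function name `place_tasks`, the threshold value, and the probabilities are hypothetical.

```python
def place_tasks(violation_prob, threshold=0.3):
    """Map each task to the 'public' or 'private' cloud.

    violation_prob: task -> estimated probability of missing its
    sub-deadline (assumed to be given; the estimation itself is not
    modeled in this sketch).
    """
    return {
        task: "public" if p > threshold else "private"
        for task, p in violation_prob.items()
    }

placement = place_tasks({"t1": 0.05, "t2": 0.45, "t3": 0.30})
print(placement)
```

Only tasks whose estimated risk strictly exceeds the threshold are sent to the public cloud, so a task sitting exactly at the threshold stays on the private cloud in this sketch.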

The authors of [7] also addressed the problem of minimizing the execution time in the presence of a budget constraint. They proposed a budget-driven approach, which starts by partitioning the workflow into bags of tasks, where each bag contains a group of parallel tasks. Each bag is then scheduled based on the characteristics and locality of its tasks. This is achieved by modelling the resource provisioning plan for each bag of tasks as a mixed integer linear programming (MILP) model.

The authors of [8] proposed a new pulling-based workflow execution system with a profiling-based resource provisioning strategy (DEWE v2). This system consists of master and worker daemons, which can run on the same node or on different nodes. The master daemon is responsible for managing the progress of workflow execution, and the worker daemon informs the master daemon whenever a task has been successfully executed. In addition, the system has a worker submission application, which scientists can use to submit workflows.

In [9], the authors proposed the IC-PCP algorithm, which schedules the workflow's tasks such that the execution cost is minimized and a pre-determined execution deadline is satisfied. The algorithm starts by distributing the workflow deadline across all of the workflow's tasks. Starting from the critical path, the sub-deadlines of the tasks on this path are determined first. The algorithm then proceeds to determine the deadlines for all sub-paths. Once the deadline for each task is determined, each task is scheduled on the resource that satisfies the task's deadline at the lowest execution cost.
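
The idea of distributing a workflow deadline along the critical path can be sketched as follows. This is a deliberately simplified stand-in, not the IC-PCP procedure from [9]: it ignores data transfer times, handles only the critical path, and splits the deadline in proportion to task runtimes, which is an assumption made here for illustration. The function names and the example workflow are hypothetical.

```python
def critical_path(runtime, succ, entry):
    """Return (length, path) of the longest runtime path starting at entry."""
    best_len, best_path = 0, []
    for s in succ.get(entry, []):
        length, path = critical_path(runtime, succ, s)
        if length > best_len:
            best_len, best_path = length, path
    return runtime[entry] + best_len, [entry] + best_path

def sub_deadlines(runtime, path, deadline):
    """Split the workflow deadline over path tasks in proportion to runtime."""
    total = sum(runtime[t] for t in path)
    elapsed, result = 0, {}
    for t in path:
        elapsed += runtime[t]
        result[t] = deadline * elapsed / total
    return result

runtime = {"A": 2, "B": 3, "C": 1, "D": 5}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
length, path = critical_path(runtime, succ, "A")
print(path, sub_deadlines(runtime, path, deadline=20))
```

With a critical path of length 10 and a deadline of 20, every task on the path receives twice its runtime as slack-adjusted budget, and the last task's sub-deadline coincides with the workflow deadline.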

In [10], the authors propose a scheduling strategy designed mainly for Workflow as a Service (WaaS) platforms, which aims to minimize the execution cost while satisfying a pre-determined deadline. The main idea of this algorithm is to determine a sub-deadline for each task. Then, once a task becomes ready for execution, a new VM is leased only if no available VM can run the task without violating its sub-deadline.
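
The lease-on-demand rule can be sketched as below. This is a minimal illustration under stated assumptions, not the strategy from [10] itself: VMs are identical, provisioning delay and billing are ignored, and the `VM` class and `assign` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    free_at: float   # time at which the VM becomes idle

def assign(task_runtime, sub_deadline, now, vms):
    """Reuse an existing VM if it can meet the sub-deadline, else lease one."""
    for vm in sorted(vms, key=lambda v: v.free_at):
        start = max(now, vm.free_at)
        if start + task_runtime <= sub_deadline:
            vm.free_at = start + task_runtime
            return vm
    # No existing VM can finish in time: lease a new one (boot delay ignored).
    new_vm = VM(name=f"vm{len(vms)}", free_at=now + task_runtime)
    vms.append(new_vm)
    return new_vm

pool = [VM("vm0", free_at=10.0)]
print(assign(3.0, sub_deadline=15.0, now=0.0, vms=pool).name)  # reuses vm0
print(assign(3.0, sub_deadline=5.0, now=0.0, vms=pool).name)   # leases vm1
```

Reusing a VM whenever the sub-deadline allows keeps the number of leased VMs, and therefore the cost, as low as the deadlines permit.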

In [11], the authors proposed an approximation algorithm for scheduling scientific workflows such that the execution cost is minimized while a pre-determined deadline is satisfied. The algorithm works iteratively: in each iteration, the scheduler tries to find the maximum number of tasks that can be executed on a specific node. This is done by considering the latest acceptable finish time of each task.

2.2. Multi-objective

This section discusses in detail proposals that deal with bi-objective problems, focusing on the problem of minimizing both the execution cost and the execution time. These objectives conflict, since reducing the makespan generally increases the cost, and reducing the cost generally increases the makespan.

The authors of [12] proposed a solution that addresses the problem of minimizing both the execution cost and the execution time. Their approach can be considered an extended version of the well-known Heterogeneous Earliest Finish Time (HEFT) heuristic, and it attempts to establish a trade-off between execution time and execution cost. The main idea of this heuristic is to assign a priority value to each task. This priority value is determined based on the locality of each task, and it is used to prioritize the execution of tasks on VMs such that the execution cost and time are minimized.

In [13], the Deadline-Budget Workflow Scheduling (DBWS) algorithm is presented to address the problem of scheduling scientific workflows in clouds while minimizing the execution cost and time. This algorithm assumes that the user specifies a pre-determined deadline and budget. Makespan is handled as the primary objective, and thus the resulting schedule must always satisfy the deadline constraint. Cost is handled as a secondary objective, so the budget constraint can be violated. At its core, the algorithm tries to find the cheapest schedule that satisfies the deadline constraint.
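
The priority between the two constraints can be sketched as a selection rule over candidate schedules. This is not the DBWS procedure from [13]; it only mirrors the rule stated above (the deadline is hard, the budget is soft), and the function name `pick_schedule` and the candidate tuples are hypothetical.

```python
def pick_schedule(candidates, deadline, budget):
    """Pick the cheapest candidate that meets the deadline.

    candidates: list of (name, makespan, cost) tuples (hypothetical data).
    Returns (chosen, budget_respected); chosen is None when no candidate
    meets the deadline.
    """
    feasible = [c for c in candidates if c[1] <= deadline]
    if not feasible:
        return None, False            # the deadline cannot be met at all
    best = min(feasible, key=lambda c: c[2])
    return best, best[2] <= budget    # the budget may be exceeded

options = [("slow", 12, 4), ("medium", 9, 7), ("fast", 6, 11)]
print(pick_schedule(options, deadline=10, budget=6))
```

In this example the only schedule within budget misses the deadline, so the rule selects the cheapest deadline-feasible schedule and reports that the budget was exceeded.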

In [14], the authors proposed a dynamic auction-based workflow scheduling algorithm that dynamically allocates the workflow's tasks across multiple cloud domains. The objective of this algorithm is to reduce the execution cost while satisfying the execution time constraint. Each task is ranked based on its level load and the rank of its successors. Tasks submit their input and output requirements, and resources bid their computational capacity. Tasks are then assigned to the resources with higher computational capacity.

The authors of [15] consider a multi-cloud domain in which every provider contributes a fixed number of heterogeneous VMs, together with a global storage service for intermediate data files. They formulate the scheduling problem as a Mixed Integer Program (MIP) and propose two algorithmic solutions for two different scenarios. In the first scenario, they assume that the running time of each task is on the order of hours, whereas in the second scenario they assume that each task runs for less than an hour. They start by partitioning the tasks based on their level, and they assume that a VM cannot be allocated to more than one partition. This simplifies solving the proposed MIP; however, restricting the solution space in this way may exclude better schedules. Along the same lines, the authors of [16] presented a clustering approach to scheduling workflows in clouds, in which the task levels are the key factor behind the schedule.

The authors of [17] presented a dynamic resource provisioning and scheduling algorithm, DPDS, which aims to schedule multiple workflows simultaneously on the cloud. The schedule is established with the objective of satisfying the users' cost and time constraints. The algorithm attempts to maximize the number of completed workflows and does not attempt to minimize the execution cost. Furthermore, it assumes that the available resources have the same computational and communication capabilities.

The authors of [18] presented the Workflow scheduling on Hybrid Cloud to maintain Data Privacy (WHPD) algorithm. The objective of this algorithm is to maintain the privacy of the scheduled tasks while satisfying the makespan constraint; minimizing the execution cost is handled as a secondary objective. The algorithm starts by determining the latest start time of each task that still satisfies the pre-determined deadline. Each task is then allocated to the VM that minimizes the gap between the task's earliest execution time and its actual execution time. The last step attempts to improve the obtained schedule by moving tasks between VMs in order to reduce the execution time. The performance of this algorithm depends on the initial schedule, since this schedule constrains the movements allowed in the improvement step.

In [19], the authors proposed the Cost with Finish Time-Based (CwFT) algorithm. This algorithm is an extended version of the HEFT algorithm, and it aims to reduce the execution time and cost. It consists of two phases: (1) task prioritizing and (2) node selection. In the task prioritizing phase, the priority of each task is determined based on its locality. In the node selection phase, each task is assigned to a VM with the objective of minimizing the ratio of execution cost and time. In [20], the authors investigated the same problem and proposed an auto-scaling algorithm that adopts the strategy of the well-known HEFT algorithm.

Several authors [21-24] have investigated the problem of scheduling scientific workflows in the cloud using nature-inspired algorithms. Although techniques such as Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), and Genetic Algorithms (GA) can find near-optimal solutions, their main issue is practicality: such techniques may require relatively long processing times, which reduces their scalability.

Table 1: Summary of related work on workflow scheduling

| Reference | Algorithm | Environment / Tools | Optimization Strategy | Objective |
|---|---|---|---|---|
| [1] | Critical-Greedy (CG) | CloudSim | Heuristic | single-objective |
| [2] | BPSO | Synthetic workflows | Heuristic | single-objective |
| [3] | MED-CC | CloudSim | Heuristic | single-objective |
| [4] | BCHGA | Java | Metaheuristic | single-objective |
| [5] | PDC and DCCP | Compared to existing algorithms | Heuristic | single-objective |
| [6] | DGESA | Simulation environment of hybrid computing | Heuristic | single-objective |
| [7] | BAGS | Synthetic workflows | Heuristic | single-objective |
| [8] | DEWE v2 | Pegasus | Metaheuristic | single-objective |
| [9] | IC-PCP and IC-PCPD2 | Synthetic workflows | Heuristic | single-objective |
| [10] | EPSM | Synthetic workflows | Heuristic | single-objective |
| [11] | EES | Synthetic workflows | Heuristic | single-objective |
| [12] | BDHEFT | Synthetic workflows | Heuristic | multi-objective |
| [13] | DBWS | Synthetic workflows | Heuristic | multi-objective |
| [14] | Novel replication aware dynamic workflow scheduling | CloudSim toolkit | Heuristic | multi-objective |
| [15] | Mathematical models | Amazon EC2 | Heuristic | multi-objective |
| [16] | Efficient workflow scheduling algorithm | WorkflowSim and CloudSim | Heuristic | multi-objective |
| [17] | DPDS | CloudSim | Heuristic | multi-objective |
| [18] | WHPD | Synthetic workflows | Heuristic | multi-objective |
| [19] | CwFT | Gaussian Elimination program | Heuristic | multi-objective |
| [20] | HEFT | Compared with other approaches | Heuristic | multi-objective |
| [21] | Algorithm based on the meta-heuristic optimization technique | Synthetic workflows | Metaheuristic | multi-objective |
| [22] | CEGA | Synthetic workflows | Metaheuristic | multi-objective |
| [23] | A combined resource provisioning and scheduling strategy | CloudSim framework | Metaheuristic | multi-objective |
| [24] | A hybrid genetic algorithm | WorkflowSim | Metaheuristic | multi-objective |

3. Conclusion

This paper discussed the problem of scheduling scientific workflows in clouds in the presence of budget and/or deadline constraints. Based on the objective function, the paper considered three variations of this problem. The first concerns proposals that minimize the execution cost; here, users are mainly concerned with the cost of execution and place no restriction on the execution time. The second concerns proposals that minimize the execution time; here, users aim to minimize the makespan, which may increase the execution cost. The third concerns proposals that minimize both the execution cost and the execution time; here, users look for a solution that balances the importance of time and cost.

References

[1] Wu, C., Lin, X., Yu, D., Xu, W., & Li, L. (2015). End-to-end delay minimization for scientific workflows in clouds under budget constraint. IEEE Transactions on Cloud Computing, 3(2), 169-181.

[2] Verma, A., & Kaushal, S. (2015). Cost minimized PSO based workflow scheduling plan for cloud computing. I.J. Information Technology and Computer Science, 8, 37-43.

[3] Lin, X., & Wu, C. Q. (2013, October). On scientific workflow scheduling in clouds under budget constraint. Proceedings of the IEEE 42nd International Conference on Parallel Processing, pp. 90-99.

[4] Verma, A., & Kaushal, S. (2013). Budget constrained priority based genetic algorithm for workflow scheduling in cloud. Proceedings of the IET Fifth International Conference on Recent Trends in Information, Telecommunication and Computing, pp. 8-14.

[5] Arabnejad, V., Bubendorfer, K., & Ng, B. (2017). Scheduling deadline constrained scientific workflows on dynamically provisioned cloud resources. Future Generation Computer Systems, 75, 348-364.

[6] Luo, H., Yan, C., & Hu, Z. (2015). An Enhanced Workflow Scheduling Strategy for Deadline Guarantee on Hybrid Grid/Cloud Infrastructure. Journal of Applied Science and Engineering, 18(1), 67-78.

[7] Rodriguez, M. A., & Buyya, R. (2017). Budget-driven scheduling of scientific workflows in IaaS clouds with fine-grained billing periods. ACM Transactions on Autonomous and Adaptive Systems, 12(2), 1-22.

[8] Jiang, Q., Lee, Y. C., & Zomaya, A. Y. (2015). Executing large scale scientific workflow ensembles in public clouds. Proceedings of the IEEE 44th International Conference on Parallel Processing, pp. 520-529.

[9] Abrishami, S., Naghibzadeh, M., & Epema, D. H. (2013). Deadline-constrained workflow scheduling algorithms for infrastructure as a service clouds. Future Generation Computer Systems, 29(1), 158-169.

[10] Rodriguez, M. A., & Buyya, R. (2018). Scheduling dynamic workloads in multi-tenant scientific workflow as a service platforms. Future Generation Computer Systems, 79, 739-750.

[11] Ma, Y., Gong, B., Sugihara, R., & Gupta, R. (2012). Energy-efficient deadline scheduling for heterogeneous systems. Journal of Parallel and Distributed Computing, 72(12), 1725-1740.

[12] Verma, A., & Kaushal, S. (2015). Cost-time efficient scheduling plan for executing workflows in the cloud. Journal of Grid Computing, 13(4), 495-506.

[13] Ghasemzadeh, M., Arabnejad, H., & Barbosa, J. G. (2017). Deadline-budget constrained scheduling algorithm for scientific workflows in a cloud environment. Proceedings of the LIPIcs-Leibniz International Proceedings in Informatics, pp. 1-16.

[14] Gayathri, T., & Subashini, B. V. (2015). Task ranking based allocation of scientific workflows in multiple clouds with deadline constraint. International Journal of Engineering and Computer Science, 4(2), 10543-10546.

[15] Malawski, M., Figiela, K., Bubak, M., Deelman, E., & Nabrzyski, J. (2015). Scheduling multilevel deadline-constrained scientific workflows on clouds based on cost optimization. Scientific Programming, 2015, 1-13.

[16] Prathibha, D. A., Latha, B., Sumathi, G., Vani, R., Sangeetha, M., Davis, P., Nithyanandam C, Mohankumar G, Suratanee A, Lertsari N, & Kamphasee, S. (2014). Efficient scheduling of workflow in cloud environment using billing model aware task clustering. Journal of Theoretical and Applied Information Technology, 65(3), 595-605.

[17] Malawski, M., Juve, G., Deelman, E., & Nabrzyski, J. (2015). Algorithms for cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds. Future Generation Computer Systems, 48, 1-18.


[18] Abrishami, H., Rezaeian, A., & Naghibzadeh, M. (2015). Workflow scheduling on the hybrid cloud to maintain data privacy under deadline constraint. Journal of Intelligent Computing, 6(3), 92-103.

[19] Man, N. D., & Huh, E. N. (2013). Cost and efficiency-based scheduling on a general framework combining between cloud computing and local thick clients. Proceedings of the IEEE International Conference on Computing, Management and Telecommunications, pp. 258-263.

[20] Jiping, Z., Chunhua, G., & Feng, W. (2014). HEFT based cloud auto-scaling algorithm with budget constraints. International Journal of Advances in Computer Science and Technology, 3, 13-18.

[21] Goyal, M., & Aggarwal, M. (2017). Optimize workflow scheduling using hybrid ant colony optimization (ACO) and particle swarm optimization (PSO) algorithm in cloud environment. International Journal of Advance Research, Ideas and Innovations in Technology, 3(2), 1-9.

[22] Meena, J., Kumar, M., & Vardhan, M. (2016). Cost effective genetic algorithm for workflow scheduling in cloud under deadline constraint. IEEE Access, 4, 5065-5082.

[23] Rodriguez, M. A., & Buyya, R. (2014). Deadline based resource provisioning and scheduling algorithm for scientific workflows on clouds. IEEE Transactions on Cloud Computing, 2(2), 222-235.

[24] Kaur, G., & Kalra, M. (2017). Deadline constrained scheduling of scientific workflows on cloud using hybrid genetic algorithm. Proceedings of the IEEE 7th International Conference on Cloud Computing, Data Science and Engineering-Confluence, pp. 276-280.

A Critical Review of Workflow Scheduling Algorithms in Cloud Computing Environment

Conference Paper

Jul 2021

Executing Large Scale Scientific Workflow Ensembles in Public Clouds

Conference Paper

Full-text available

Sep 2015

Cost Effective Genetic Algorithm for Workflow Scheduling in Cloud Under Deadline Constraint

Article

Full-text available

Jan 2016

Cloud computing is becoming an increasingly admired paradigm that delivers high-performance computing resources over the Internet to solve the large-scale scientific problems, but still it has various challenges that need to be addressed to execute scientific workflows. The existing research mainly focused on minimizing finishing time (makespan) or minimization of cost while meeting the quality of service requirements. However, most of them do not consider essential characteristic of cloud and major issues, such as virtual machines (VMs) performance variation and acquisition delay. In this paper, we propose a meta-heuristic cost effective genetic algorithm that minimizes the execution cost of the workflow while meeting the deadline in cloud computing environment. We develop novel schemes for encoding, population initialization, crossover, and mutations operators of genetic algorithm. Our proposal considers all the essential characteristics of the cloud as well as VM performance variation and acquisition delay. Performance evaluation on some well-known scientific workflows, such as Montage, LIGO, CyberShake, and Epigenomics of different size exhibits that our proposed algorithm performs better than the current state-of-the-art algorithms.

Workflow Scheduling on the Hybrid Cloud to Maintain Data Privacy under Deadline Constraint

Article

Full-text available

Sep 2015

The development of cloud computing technology has been continuously growing since its invention and has attracted the attention of many researchers in the academia and the industry, particularly during the recent years. The majority of organizations, whether large corporate businesses or typical small companies, are moving towards adapting this cutting edge technology. The private cloud provides low cost and privacy for workflow applications execution. However, an organization’s requirements to high performance resources and high capacity storage devices encourage them to utilize public clouds. Public cloud leases information technology services in the form of small units and in larger scale compared to private cloud, but this model is potentially exposed to the risk of data and computation breach and is less secure in comparison to a pure private cloud environment. The combination of public and private clouds is known as hybrid cloud, where workflow tasks can be executed on resources residing on either public or private clouds. The objective of this paper is to present a scheduling algorithm for maintaining data privacy in workflow applications, such that the budget is minimized, while the makespan limitation imposed by the user is satisfied. A scheduling algorithm called Workflow scheduling on Hybrid Cloud to maintain Data Privacy (WHPD) is proposed and it is shown that it can perform as well as HCOC while preserving data and computation privacy.

Cost-Time Efficient Scheduling Plan for Executing Workflows in the Cloud

Article

Full-text available

Aug 2015

The emergence of Cloud Computing as a model of service provisioning in distributed systems instigated researchers to explore its pros and cons on executing different large scale scientific applications, i.e., Workflows. One of the most challenging problems in clouds is to execute workflows while minimizing the execution time as well as cost incurred by using a set of heterogeneous resources over the cloud simultaneously. In this paper, we present, Budget and Deadline Constrained Heuristic based upon Heterogeneous Earliest Finish Time (HEFT) to schedule workflow tasks over the available cloud resources. The proposed heuristic presents a beneficial trade-off between execution time and execution cost under given constraints. The proposed heuristic is evaluated for different synthetic workflow applications by a simulation process and comparison is done with state-of-art algorithm i.e. BHEFT. The simulation results show that our proposed scheduling heuristic can significantly decrease the execution cost while producing makespan as good as the best known scheduling heuristic under the same deadline and budget constraints.

Cost Minimized PSO based Workflow Scheduling Plan for Cloud Computing

Article

Full-text available

Jul 2015

Cloud computing is a collection of heterogeneous virtualized resources that can be accessed on-demand to service applications. Scheduling large and complex workflows becomes a challenging issue in cloud computing with a requirement that the execution time as well as cost incurred by using a set of heterogeneous cloud resources should be minimizes simultaneously. In this paper, we have extended our previously proposed Bi-Criteria Priority based Particle Swarm Optimization (BPSO) algorithm to schedule workflow tasks over the available cloud resources under given the deadline and budget constraints while considering the confirmed reservation of the resources. The extended heuristic is simulated and comparison is done with state-of-art algorithms. The simulation results show that extended BPSO algorithm also decreases the execution cost of schedule as compared to state-of-art algorithms under the same deadline and budget constraint while considering the exiting load of the resources too.

Deadline constrained scheduling of scientific workflows on cloud using hybrid genetic algorithm

Conference Paper

Jan 2017

Budget-Driven Scheduling of Scientific Workflows in IaaS Clouds with Fine-Grained Billing Periods

Article

May 2017

With the advent of cloud computing and the availability of data collected from increasingly powerful scientific instruments, workflows have become a prevailing mean to achieve significant scientific advances at an increased pace. Scheduling algorithms are crucial in enabling the efficient automation of these large-scale workflows, and considerable effort has been made to develop novel heuristics tailored for the cloud resource model. The majority of these algorithms focus on coarse-grained billing periods that are much larger than the average execution time of individual tasks. Instead, our work focuses on emerging finer-grained pricing schemes (e.g., per-minute billing) that provide users with more flexibility and the ability to reduce the inherent wastage that results from coarser-grained ones. We propose a scheduling algorithm whose objective is to optimize a workflow’s execution time under a budget constraint; quality of service requirement that has been overlooked in favor of optimizing cost under a deadline constraint. Our proposal addresses fundamental challenges of clouds such as resource elasticity, abundance, and heterogeneity, as well as resource performance variation and virtual machine provisioning delays. The simulation results demonstrate our algorithm’s responsiveness to environmental uncertainties and its ability to generate high-quality schedules that comply with the budget constraint while achieving faster execution times when compared to state-of-the-art algorithms.

Scheduling dynamic workloads in multi-tenant scientific workflow as a service platforms

Article

May 2017

With the advent of cloud computing and the availability of data collected from increasingly powerful scientific instruments, workflows have become a prevailing mean to achieve significant scientific advances at an increased pace. Emerging Workflow as a Service (WaaS) platforms offer scientists a simple, easily accessible, and cost-effective way of deploying their applications in the cloud at anytime and from anywhere. They are multi-tenant frameworks and are designed to manage the execution of a continuous workload of heterogeneous workflows. To achieve this, they leverage the compute, storage, and network resources offered by Infrastructure as a Service (IaaS) providers. Hence, at any given point in time, a WaaS platform should be capable of efficiently scheduling an arbitrarily large number of workflows with different characteristics and quality of service requirements. As a result, we propose a resource provisioning and scheduling strategy designed specifically for WaaS environments. The algorithm is scalable and dynamic to adapt to changes in the environment and workload. It leverages containers to address resource utilization inefficiencies and aims to minimize the overall cost of leasing the infrastructure resources while meeting the deadline constraint of each individual workflow. To the best of our knowledge, this is the first approach that explicitly addresses VM sharing in the context of WaaS by modeling the use of containers in the resource provisioning and scheduling heuristics. Our simulation results demonstrate its responsiveness to environmental uncertainties, its ability to meet deadlines, and its cost-efficiency when compared to a state-of-the-art algorithm.

Scheduling deadline constrained scientific workflows on dynamically provisioned cloud resources

Article

Jan 2017
Future Generation Computer Systems

Commercial cloud computing resources are rapidly becoming the target platform on which to perform scientific computation, due to the massive leverage possible and the elastic pay-as-you-go pricing model. The cloud allows researchers and institutions to provision compute only when required, and to scale seamlessly as needed. The cloud computing paradigm therefore presents a low capital, low barrier to operating dedicated HPC eScience infrastructure. However, there are still significant technical hurdles associated with obtaining sufficient execution performance while limiting the financial cost; in particular, a naive scheduling algorithm may increase the cost of computation to the point that using cloud resources is no longer a viable option. The work in this article concentrates on the problem of scheduling deadline constrained scientific workloads on dynamically provisioned cloud resources, while reducing the cost of computation. Specifically, we present two algorithms, Proportional Deadline Constrained (PDC) and Deadline Constrained Critical Path (DCCP), that address the workflow scheduling problem on such dynamically provisioned cloud resources. These algorithms are additionally extended to refine their operation in task prioritization and backfilling, respectively. The results in this article indicate that both PDC and DCCP algorithms achieve higher cost efficiencies and success rates when compared to existing algorithms.

Efficient scheduling of workflow in cloud enviornment using billing model aware task clustering

Article

Jul 2014

Cloud computing is a cost-effective alternative for the scientific community to deploy large scale workflow applications. For executing large scale scientific workflow applications in a distributed heterogeneous environment, scheduling workflow tasks on dynamic resources is a challenging issue. Moreover, in utility-based computing such as the cloud, which supports a pay-per-use model for resources, a scheduling algorithm must efficiently utilize the available time of each resource. Most existing scheduling heuristics do not consider the dynamic nature of the cloud and hence produce a static schedule. Public cloud environments like Amazon EC2 offer a catalog of resources, and the price is generally metered per hour; any fractional usage is rounded up to the next hour. To meet customers' budget and deadline, the proposed work incorporates a billing-model-aware task clustering mechanism into the workflow scheduling process. This work also presents a resource selection algorithm that can be used to choose a suitable resource at each stage of the workflow. Preliminary results obtained by running two scientific applications, Montage and Cybershake, with different resources and task clustering mechanisms are discussed.
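
The effect of rounding usage to whole billing periods, which motivates billing-aware task clustering, can be illustrated with a minimal sketch. It is not the algorithm from the cited work. The function names, task runtimes, and price are hypothetical, and the sketch assumes that each unclustered task runs on its own VM lease, that a clustered group runs sequentially on a single lease, and that provisioning delays and data transfer costs are ignored.

```python
import math


def billed_cost(runtime_seconds, price_per_hour, billing_period_seconds=3600):
    """Cost of one VM lease when any partial billing period is charged in full.

    All parameters are illustrative assumptions, not values from the cited work.
    """
    periods = math.ceil(runtime_seconds / billing_period_seconds)
    return periods * billing_period_seconds / 3600 * price_per_hour


def separate_vs_clustered(task_runtimes, price_per_hour, billing_period_seconds=3600):
    """Compare one lease per task against one lease for the whole cluster."""
    separate = sum(
        billed_cost(t, price_per_hour, billing_period_seconds) for t in task_runtimes
    )
    clustered = billed_cost(sum(task_runtimes), price_per_hour, billing_period_seconds)
    return separate, clustered


if __name__ == "__main__":
    tasks = [1200, 1500, 900]  # hypothetical runtimes: 20, 25, and 15 minutes
    print("per-hour billing:  ", separate_vs_clustered(tasks, 1.0))
    print("per-minute billing:", separate_vs_clustered(tasks, 1.0, 60))
```

Under these assumptions, per-hour billing charges three full hours for the separate leases but only one hour for the clustered lease, whereas per-minute billing charges the same amount either way. This is why clustering matters most under coarse-grained billing.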
