A Novel Data Placement Strategy for Data-Sharing Scientific Workflows in
Heterogeneous Edge-Cloud Computing Environments
Xin Du, Songtao Tang, Zhihui Lu∗†‡, Jie Wu†, Keke Gai∗§, and Patrick C.K. Hung
School of Computer Science, Fudan University, Shanghai, China
Shanghai Blockchain Engineering Research Center, Shanghai, China
§School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Faculty of Business and IT, University of Ontario Institute of Technology, Canada
Engineering Research Center of Cyber Security Auditing and Monitoring, Ministry of Education, Shanghai, China
Email: xdu20@fudan.edu.cn, sttang19@fudan.edu.cn, lzh@fudan.edu.cn, jwu@fudan.edu.cn, gaikeke@bit.edu.cn
and patrick.hung@uoit.ca
Corresponding authors: Zhihui Lu and Keke Gai
Abstract—The deployment of datasets in the heterogeneous edge-cloud computing paradigm has received increasing attention in state-of-the-art research. However, due to the large sizes of scientific datasets and the existence of private datasets, finding an optimal data placement strategy that minimizes data transmission and improves performance remains a persistent problem. In this study, the advantages of both edge and cloud computing are combined to construct a data placement model that works for multiple scientific workflows. The most difficult research challenge is to provide a data placement strategy that considers datasets shared both within individual workflows and among multiple workflows, across various geographically distributed environments. According to the constructed model, not only the storage capacity of edge micro-datacenters but also the data transfer between multiple clouds across regions must be considered. To address this issue, we analyzed the characteristics of this model and identified the factors that cause transmission delay. We propose a discrete particle swarm optimization algorithm with differential evolution (DE-DPSO) to distribute datasets during workflow execution. Based on this algorithm, a new data placement strategy named DE-DPSO-DPS is proposed. DE-DPSO-DPS is evaluated through several experiments designed in simulated heterogeneous edge-cloud computing environments. The results demonstrate that our data placement strategy can effectively reduce the data transmission time and achieve superior performance compared to traditional strategies for data-sharing scientific workflows.
Keywords-Heterogeneous edge-cloud computing environments; data placement; data-sharing; scientific workflows
I. INTRODUCTION
With the exponential increase of global cooperation in scientific research and the rapid development of distributed
computing technology, scientific applications have changed
significantly these days. They now involve thousands of
interwoven tasks and are generally data and computing
intensive [1]. To represent these complicated scientific ap-
plications, scientific workflows are widely used in several
scientific fields [2], such as astronomy, physics, and bioinfor-
matics. Expectedly, due to the complex structure and large
data tasks, the deployment of scientific workflows has rigid
requirements for computational and storage resources. In
some scientific domains, when creating the data placement
strategy for these workflows, multiple practical scenarios
must be considered. For example, the datasets are often
shared among multiple tasks within workflows, including
workflows in different geo-distributed organizations. Fur-
thermore, there are several private datasets that may only
be allowed to be stored in specific research institutes.
Thus, proposing a good data placement strategy, which can
generally optimize data transmission time during workflow
execution, has always been a major challenge.
In order to address the above-mentioned challenges,
constructing data placement models in the heterogeneous
edge-cloud computing environment has become an area of
significant interest in the field. In a heterogeneous edge-cloud computing environment, there are several distributed datacenters. Some are cloud datacenters distributed geographically; others are edge micro-datacenters. Evidently,
every datacenter has computation and storage resources, and
there are significant differences between characteristics of
cloud and edge micro-datacenters [3]–[5]. Compared with
cloud datacenters [6], the storage and computing power
of edge micro-datacenters are limited. But edge micro-
datacenters that are geographically closer have a critical
positive effect on data transmission time [7], and some
immovable and private datasets can only be stored in edge
micro-datacenters [8].
In real-world scenarios, a good data placement model
should have the following characteristics. Firstly, due to
the large number of datasets and complex structures of
scientific workflows, combining edge and cloud computing ensures high cohesion within a datacenter and low coupling between different datacenters. Secondly, scientific workflows are distributed, data-intensive applications that must support scientific collaboration between different research institutes. Therefore, datasets and tasks
in these workflows need to be allocated and dispatched to
geographically distributed datacenters. In this process, data
sharing among multiple workflows and tasks should improve
the efficiency of the scientific workflows. Thirdly, there are
significant differences in bandwidth between different edge
data centers in the same geographic location and cloud data
centers in different geographic locations. These differences
have a significant impact on the overall data placement
strategy. Figure 1 shows the difference between the environment of the proposed model and that of others. Unlike the existing heterogeneous edge-cloud computing environment used during the execution of scientific workflows, the environment shown in Figure 1(b) not only considers shared datasets both within tasks and among multiple tasks, but also considers the coordination of data centers between different regions. Based on our research, there is no other existing data placement model that is suitable for the actual environment. The proposed model addresses this problem, and based on it, a novel data placement strategy is proposed.

Figure 1: Different environments between data placement models. (a) Existing heterogeneous edge-cloud computing environment; (b) proposed heterogeneous edge-cloud computing environment.
The formulation of a data placement strategy that minimizes transmission time is an NP-hard problem. Several researchers
have mapped it to the knapsack packing problem [9]–[11],
and hence, to obtain the optimal solution to this problem,
they proposed several methods using heuristic algorithms,
such as the genetic algorithm (GA) and particle swarm
optimization (PSO). These optimization algorithms, which simulate the behaviors of birds, ants, or fish in continuous search spaces, are well-suited for solving NP-hard problems.
In [12], in order to apply the PSO to cloud computing sce-
narios, a novel discrete PSO (DPSO) algorithm is presented
to solve discrete problems. Furthermore, a self-adaptive discrete PSO algorithm with genetic algorithm operators (GA-DPSO) is proposed in [13] to reduce the data transmission time during workflow execution in a single-region heterogeneous edge-cloud computing environment. However, these data placement strategies consider only individual workflows or pure cloud environments, treating each workflow in isolation.
In this study, a data placement model for data-sharing
scientific workflows is constructed in heterogeneous edge-
cloud computing environments, and a novel data placement
strategy is proposed. The characteristics of different datacen-
ters and data-sharing scientific workflows have further been
investigated, and a discrete PSO algorithm with differential
evolution has been proposed to reduce the data transmission time and improve system performance when executing multiple workflows in multi-region heterogeneous edge-cloud computing environments. This method considers not only the factors impacting the data transfer delay, such as the number of datacenters and the storage capacity of edge micro-datacenters, but also the data sharing for multiple workflows in different datacenters. The proposed data placement model and strategy were evaluated through several designed experiments
in a simulated environment. Specifically, the data transfer
time and the speed of finding optimal data placement, which
are two important indexes to evaluate a data placement
strategy, were compared with those achieved using existing
strategies. The experimental results show that the proposed
model can effectively reduce data transfer time and achieve
the best performance.
The main contributions of this paper are as follows:
1) A model for obtaining data placement solutions during
the execution of multiple workflows in multi-region het-
erogeneous edge-cloud computing environments is pro-
posed. Compared with single-region heterogeneous edge-cloud computing environments, which consider only one scientific workflow, the proposed model not only considers shared datasets both within tasks and among multiple workflows, but also considers the coordination of data centers between different regions, which is clearly more congruent with actual cases of scientific research workflows.
2) A novel data placement algorithm, which is based on
effective improvement in DPSO and DE, is proposed to
distribute datasets according to the above-mentioned data
placement model. The crossover and mutation operator
of the DE were recoded and defined to be better suited
to our problems. This algorithm is highly efficient and can quickly and flexibly find appropriate datacenters to place shared and unshared datasets.
3) A data-sharing data placement strategy is proposed based
on DE-DPSO for scientific workflows to minimize data
transfer time and optimize the speed of finding the
final placement locations of data. This strategy considers
almost all impact factors that may affect the final results,
such as the number of edge micro-datacenters, the storage
capacity of edge micro-datacenters, the data sharing for
multiple workflows, and the bandwidth between different
datacenters.
The rest of this study is organized as follows. Section II
reviews related work. Section III gives details of our data
placement model, which comprises the model construction
and problem analysis. Section IV presents our data place-
ment strategy. Section V describes the experimental results.
The conclusions and future work are summarized in Section
VI.
II. RELATED WORK
Developing a strategy to find an optimal data placement in a distributed environment has always been a considerable challenge. Whether in traditional storage systems or in the newer cloud and heterogeneous edge-cloud computing systems, the data placement problem has been studied in depth. In this era of highly developed network technology, we require a model that considers real-world scenarios as much as possible, along with factors such as large datasets, shared datasets, private datasets stored in fixed edge micro-datacenters, and bandwidth limitations, all of which have a critical effect on the data transfer time and the speed of finding the final location for data placement.
Like cluster or grid systems, several traditional deploy-
ment methods for scientific workflows involve distributed
computing environments, which are very expensive and
do not satisfy the demands of a practical scenario. More-
over, these deployment environments with low-level resource
sharing will lead to significant data transmission delays.
In the existing distributed computing systems, researchers
have focused on optimizing the simulation models and data
transfer time in cloud environments. Nukarapu et al. [14]
proposed a classic data-intensive scientific workflow system
deployed on a distributed platform, which explores the interactions between data placement services and relatively reduces the system execution time. Yuan et al. [15] provided a
data placement strategy based on k-means and BEA clus-
tering for a scientific workflow that effectively reduced the
number of data movements. However, this strategy ignored
the difference in the storage capacity of each datacenter. In
addition, the number of data movements did not accurately
represent the amount of data movement or the actual data
transmission status. Wang et al. [16] designed a data place-
ment strategy based on k-means clustering for a scientific
workflow in cloud environments that considered the data size
and dependency. While this approach reduced the number of
data movements using a data replication mechanism, it did
not formalize the data replication cost.
As per the results of recent studies, scientific cloud
workflow systems can satisfy the demands of scientists
from different laboratories or regions, and they can also
collaborate and carry out their research process more flex-
ibly [17]. However, owing to the existence of large-scale
dataset interaction and the remote deployment of cloud com-
puting sources, data transmission still requires a significant
amount of time during workflow execution. To solve this
problem, a time-driven data placement strategy has been
developed by combining the advantages of both edge and
cloud computing, which also solves the problem of limited
resources in edge computing.
In summary, existing studies have only proposed data
placement models and strategies to reduce the data transfer
time for individual workflows. In particular, as per the latest
discussions on heterogeneous edge-cloud computing envi-
ronments, these models generally consider only one cloud
datacenter and one scientific workflow. However, in the real
world, cooperation between scientific organizations across
different geographical distributions is common. Therefore,
the existence of multiple scientific workflows and multiple
cloud data centers should be considered when building a data
placement model. In such a model, data sharing can have a significant impact on the placement outcome. Additionally, when
formulating a data placement strategy, the data transfer time
as well as the speed of data placement must be minimized.
In this study, in contrast to other existing works, a novel data placement model is constructed according to the
environment mentioned above, and a strategy is proposed
to minimize data transfer time and optimize the speed
of obtaining the final placement of data for data-sharing
scientific workflows.
III. MODEL CONSTRUCTION AND PROBLEM ANALYSIS
In this section, fundamental definitions of data-sharing
scientific workflows in the heterogeneous edge-cloud com-
puting environment are given, and a data placement model
for the proposed data placement strategy is constructed.
Subsequently, based on the constructed model, a specific
example is provided to analyze the data placement problem.
It is worth noting that the core purpose of this strategy
for all scientific workflows in this model is to optimize the
speed of finding the final location for datasets and also to
minimize the data transfer time during workflows execution
while satisfying the intrinsic properties of each datacenter
(such as storage or computing resource).
A. Model Construction
Figure 2: Sample of data placement by two strategies.

The heterogeneous edge-cloud computing environment DC = {DC^c, DC^e} contains geographically distributed cloud datacenters and multiple edge micro-datacenters: cloud computing at the long-distance end, which generally has unlimited storage resources and only one datacenter per region, and edge computing at the near end, which is used to store the private datasets with limited storage capacity. According to the heterogeneous computing environment, our model consists of m cloud datacenters DC^c = {dc_1, dc_2, ..., dc_m} and n edge micro-datacenters DC^e = {dc_1, dc_2, ..., dc_n}. In addition, dc_i = <cap_i, type_i> represents the ith datacenter, where cap_i denotes its storage capacity and type_i is used to flag whether the datacenter is a cloud or an edge micro-datacenter. In particular, when type_i = 0, the datacenter is a cloud datacenter, which can only store public datasets and is usually distributed geographically. When type_i = 1, the datacenter is an edge micro-datacenter, which can store both private and public datasets. The bandwidth across different datacenters is represented as b_{ij} = <band_{ij}, type_i, type_j>, where band_{ij} is the value of the bandwidth and datacenter i is not datacenter j. To keep the focus on our data placement issues, it was assumed that the bandwidth is a known constant. The fundamental definitions of data-sharing scientific workflows are given below, and then the model is constructed.
Definition 1. Scientific workflow. In the proposed model, scientific workflows W are described as {W_1, W_2, ..., W_l}, where l is the number of scientific workflows. Each scientific workflow can be described as a directed acyclic graph W_k = (T, R, DS), where T = {t_1, t_2, ..., t_r} represents the task set in W_k, which contains r tasks. An adjacency matrix R represents the relationship between tasks in the task set T, where R_{i,j} = 0 denotes that task t_i has no relationship with task t_j and R_{i,j} = 1 denotes that task t_i precedes task t_j, and DS = {d_1, d_2, ..., d_n} denotes all datasets in these scientific workflows.
Definition 2. Task. In a workflow W_k, a task t_i can be described as <iDS_i, oDS_i, dc_i>, where iDS_i denotes the input datasets of t_i, oDS_i denotes the collection of output datasets, and t_i.dc_i represents the placement location, i.e., the datacenter at which the task is scheduled. There is a many-to-many map between the task set and the dataset: an item of data can be used by multiple tasks, and conversely, a task can also request or generate multiple datasets. It is worth noting that, because private datasets can only exist in a particular edge micro-datacenter, the tasks also fall into two categories: those that require or generate private datasets and those that do not. In the execution of a scientific workflow, a task can only be executed if it possesses all the datasets it requires.
Figure 3: Sample of data placement for scientific workflows in heterogeneous edge-cloud computing environments.

Definition 3. Datasets. While the traditional model only considers a single scientific workflow, in the proposed model, not only does the data need to be divided into public and private datasets, but distinctions also need to be made for datasets that are shared across multiple workflows. In this model, a dataset d_i is described as <dsize_i, cn_i, dc_i, pf_i, sf_i>, where dsize_i represents the size of d_i, and cn_i represents the set of tasks that either generate the dataset d_i or need it as input; d_i.dc_i is the datacenter that stores the dataset d_i, and pf_i is a flag that indicates whether or not d_i is a private dataset, where pf_i = 0 denotes a public dataset and pf_i = 1 denotes a private dataset. The last attribute, sf_i, indicates whether or not the dataset d_i is shared between different workflows, where sf_i = 0 denotes an unshared dataset and sf_i = 1 denotes a shared dataset.
Definition 4. Data placement. In this model, the data placement is defined as S = (W, D, DC, Map, T_trans, N_iter), where Map denotes the dataset-datacenter mapping. The Map can be divided into two cases, depending on the datasets. For the private data placement map, which can be formalized as Map.pri = \bigcup_{d_i \in D.pri} {d_i -> d_i.dc}, these datasets can be mapped directly to their fixed locations (i.e., edge micro-datacenters). For the public data placement map, Map.pub = \bigcup_{d_i \in D.pub} {d_i -> d_i.dc}, these datasets are mapped by the data placement algorithm. During the execution of all the workflows in the model, T_trans denotes the total data transfer time and N_iter represents the map time. T_trans can be calculated by (1), and N_iter can be calculated as the number of iterations of the data placement algorithm (DPA).
T_{trans} = \sum_{i=1}^{|DC|} \sum_{j=i}^{|DC|} \sum_{k=1}^{|D|} \frac{dsize_k}{band_{ij}} \cdot g_{ijk}    (1)

where g_{ijk} is used to discern whether a dataset d_k is transferred between different datacenters throughout the process of data scheduling; g_{ijk} = 0 indicates that d_k always stays in the same datacenter.
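As an illustration only, the following Python sketch evaluates Equation (1) for a candidate set of dataset movements; the function and variable names are ours, not from the original implementation:

from typing import Dict, List, Tuple

def transfer_time(dsize: Dict[str, float],             # dsize_k, e.g., in GB
                  band: Dict[Tuple[int, int], float],  # band_ij in GB/s (any consistent unit)
                  moves: List[Tuple[str, int, int]]    # (dataset, source dc, target dc)
                  ) -> float:
    # Compute T_trans of Equation (1): sum dsize_k / band_ij over every
    # cross-datacenter movement (g_ijk = 1); movements within one datacenter
    # (g_ijk = 0) contribute nothing.
    total = 0.0
    for k, i, j in moves:
        if i != j:  # g_ijk = 1 only when the dataset actually crosses datacenters
            b = band[(i, j)] if (i, j) in band else band[(j, i)]
            total += dsize[k] / b
    return total

# Hypothetical usage: one 2.1 GB dataset moved from dc1 to dc2 at 0.01 GB/s (roughly 10 M/s)
print(transfer_time({"d3": 2.1}, {(1, 2): 0.01}, [("d3", 1, 2)]))  # 210.0 seconds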
Obtaining the lowest data transfer time and the shortest
map time is the optimal workflow strategy for data place-
ment mapping. Hence, the characteristics of data and the
data placement strategy are the two most important factors
influencing our model. We have considered the privacy and sharing of the data, and we will describe our data placement strategy in the next section. Therefore, our data placement model comprehensively calculates the data transfer time for scientific workflows in heterogeneous edge-cloud computing environments.

Table I: Dataset Size of Data-Sharing Scientific Workflows

Dataset    d1   d2   d3   d4   d5   d6   d7   d8   d9   d10  d11
Size (GB)  3.1  5.4  2.1  1.3  1.1  2.3  1.7  2.1  1.5  0.5  4.0

Table II: The Final Placement Location of Each Dataset

Dataset     d1   d2   d3   d4   d5   d6   d7   d8   d9   d10  d11
Fig. 2(a)   dc1  dc1  dc1  dc1  dc2  dc2  dc2  dc2  dc3  dc3  dc3
Fig. 2(b)   dc1  dc1  dc2  dc1  dc2  dc3  dc2  dc2  dc3  dc3  dc3
B. Problem Analysis
In order to analyze the data placement problem in this
model and illustrate the proposed data placement strat-
egy for data-sharing scientific workflows, Figure 2(a) and
Figure 2(b) show two different data placement strategies
in a simple data-sharing scientific workflow scenario. The
scenario has two individual workflows in the same region,
called Workflow 1 and Workflow 2, and there are 11 tasks
{t_1, t_2, ..., t_11}, 11 datasets {d_1, d_2, ..., d_11}, and three datacenters {dc_1, dc_2, dc_3}. Specifically, there are four private datasets, which are deployed separately in two edge micro-datacenters, dc_2 and dc_3, and several datasets are shared between different workflows. Table I lists the respective sizes of all the datasets in the scenario. In addition, dc_1 is a cloud datacenter with unlimited storage capacity, and dc_2 and dc_3 are edge datacenters, each with a storage capacity of 20 GB. It is well known that the bandwidth between the cloud and edge micro-datacenters is much lower than the bandwidth between the edge micro-datacenters [18]. In this example, the bandwidths {band_12, band_13, band_23} across the datacenters were set as {10 M/s, 20 M/s, 150 M/s}.
Compared with the data placement results of these two
strategies, the result of the strategy shown in Figure 2(a)
required the datasets to be moved eight times; the amount
of data movement was 11.6 GB, and the data transfer
time was calculated to be 600 s. The strategy described
in Figure 2(b) was found to have six data movements;
the amount of data movement was 8.4 GB, and the data
transfer time was approximately 280 s. Table 2 shows the
final placement location of each dataset under these two
strategies. Expectedly, different data placement strategies
will significantly affect the data transfer time and efficiency
of the scientific workflow process. In these terms, the data
placement strategy shown in Figure 2(b) is superior to that
shown in Figure 2(a).
In order to simulate the possible real-life scenario more
accurately, another geo-distributed organization was added
to the sample, as shown in Figure 3. Based on this real scenario, which involves the execution of multiple workflows in multi-region heterogeneous edge-cloud computing environments, the model in this study considers multiple cloud datacenters and multiple scientific workflows. The proposed data placement strategy based on DE-DPSO places datasets while considering not only the bandwidth between datacenters, the number of edge micro-datacenters, and the storage capacity of edge micro-datacenters, but also the sharing of data during workflow execution and the number of iterations the data placement algorithm needs to find the optimal placement locations.
IV. DATA PLACEMENT STRATEGY
A model has been constructed based on real-life param-
eters, which involve multiple geo-distributed data-sharing
scientific workflows. In this section, a novel data placement
strategy based on the model is described, which provides the
algorithm for finding a better data placement map to mini-
mize data transfer time and maximize the speed of finding the optimal location. First, a data placement algorithm (DE-DPSO-DPA) is designed to determine the final locations of
public datasets. Afterwards, the proposed data placement
strategy specifically based on DE-DPSO-DPA is described.
A. DE-DPSO-DPA
The PSO algorithm and the DE algorithm are both population-based heuristic search algorithms, proposed in [19] and [20], respectively. A detailed elaboration of these algorithms is beyond the scope of this paper; our DE-DPSO algorithm not only enables traditional PSO to solve discrete problems such as data placement through problem encoding, but also skillfully incorporates the DE algorithm, an efficient global optimizer.
1) Problem Encoding: Inspired by the coding strategy
mentioned in [13], a discrete coding strategy is provided
for the data placement problem to satisfy the well-known
principles of completeness, non-redundancy, and viability.
Most meta-heuristic algorithms filter the optimal solution
by generating n-dimensional candidate particles. In our data
placement problem, a particle maps a solution for data-
sharing scientific workflows in the proposed distributed
model. The ith particle in the algorithm's tth iteration is formulated as

X_{i,t} = {x^1_{i,t}, x^2_{i,t}, ..., x^d_{i,t}}    (2)

where d is the dimension of this particle and represents the number of datasets in the model. Meanwhile, x^k_{i,t} denotes the placement location of the kth dataset after the tth iteration of the algorithm. It is worth noting that, for a particle in this model, there are Q dimensions representing the private datasets, whose locations are fixed, and H dimensions representing the datasets shared among multiple workflows.
Algorithm 1 DE-DPSO-DPA
Input:
    T (iter_max), t (current iteration), n (particles),
    D (the dimension), F (scaling factor), Cr_g, Cr_p
Output:
    Res (the best optimal solution)
1:  Set parameters and initialize all datasets' placement;
2:  Set particle dimension H = |D.pub|;
3:  for i = 1 to swarm size n do
4:      for k = 1 to H do
5:          Initialize x^k_i randomly;
6:      end for
7:      Initialize pbest
8:  end for
9:  Initialize gbest
10: while t <= T do
11:     for i = 1 to n do
12:         Select a, b randomly from the particles with a != b;
13:         mutation(x_{i,t-1}, F, x_{a,t-1}, x_{b,t-1}) by Equation (3)
14:         crossover(x_{pbest,t-1}, u_{i,t}, Cr_p) by Equation (4)
15:         crossover(x_{gbest,t-1}, u_{i,t}, Cr_g) by Equation (4)
16:         Select x_{i,t} by Equation (5)
17:     end for
18:     t = t + 1
19: end while
20: Update the best optimal solution Res
21: Output Res
After determining the correspondence between each particle and a candidate solution in our model, a data placement algorithm is proposed such that each public dataset can be placed in an appropriate datacenter, in order to achieve the goal of minimizing data transmission time and finding the optimal locations in the shortest time.
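A minimal sketch of this encoding (our own illustration; dataset and datacenter identifiers are made up): a particle is a vector whose kth entry is the index of the datacenter chosen for the kth public dataset, while private datasets keep their fixed locations.

import random
from typing import Dict, List

def init_particle(public_datasets: List[str],
                  candidate_dcs: List[int]) -> List[int]:
    # One particle X_i: position k is the datacenter index chosen for the
    # kth public dataset (Equation (2)); its dimension is H = |D.pub|.
    return [random.choice(candidate_dcs) for _ in public_datasets]

def decode(particle: List[int],
           public_datasets: List[str],
           private_map: Dict[str, int]) -> Dict[str, int]:
    # Turn a particle into a full dataset -> datacenter map by adding the
    # fixed placements of the private datasets (Map.pri).
    placement = dict(private_map)              # private datasets stay where they are
    placement.update(zip(public_datasets, particle))
    return placement

# Hypothetical usage (dataset and datacenter ids are invented for illustration)
swarm = [init_particle(["d1", "d4", "d7"], candidate_dcs=[1, 2, 3]) for _ in range(5)]
print(decode(swarm[0], ["d1", "d4", "d7"], private_map={"d5": 2, "d9": 3}))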
2) Algorithm description: The traditional PSO algorithm is a classical heuristic algorithm, and its update strategy is based on the velocity and position of the particles. When it is used as a data placement algorithm, there are two problems:
- It easily falls into local optima.
- It cannot handle discrete problems, such as the one in this study.
To address these issues, the DE algorithm is used to expand the search capability, and the algorithm is also discretized based on the application scenarios. The pseudocode of our DE-DPSO-DPA algorithm for scientific workflows is presented in Algorithm 1. Specifically, in the first part (lines 1-9), all the datasets and the algorithm parameters are initialized. These steps are the preprocessing for placing shared datasets, in which the algorithm evaluates the current fitness of the particles and updates the best values with the best result. Thereafter, according to the steps described in lines 10-17, the update strategy for each particle at every iteration adopts the mutation and crossover operators of DE to update the global best solution, which effectively prevents the algorithm from falling into a local optimum. The mutation strategy for the ith particle at the tth iteration is formulated as follows:

u_{i,t} = x_{i,t-1} + F \cdot (x_{a,t-1} - x_{b,t-1})    (3)
where u_{i,t} is a new feasible particle and F is the scale factor. After the mutation, the generated particle is changed by a crossover strategy. For the individual-cognition and social-cognition components, the crossover operator of our algorithm is shown in Equation (4):

y = crossover(x_1, x_2, prob), where
y[i] = x_1[i]  if Random.r < prob
y[i] = x_2[i]  if Random.r >= prob    (4)

where Random.r is a random factor between 0 and 1, and prob is a parameter used to control the extent of the crossover operation. In order to obtain the optimal solution, the operation is executed twice in each iteration of the algorithm. Specifically, Cr_p denotes the crossover parameter that controls the distance between the current particle position and the local optimal position. Meanwhile, Cr_g is a parameter that proportionally selects indexes in an old particle and replaces the segment between them with the gbest particle segment. Here, y, x_1, and x_2 represent particles in different cases. The final particle, y, obtained by the above operations at the tth iteration is denoted as w_{i,t}.
The optimality of a particle is usually measured by the
fitness function: the smaller it is, the better the performance.
In the present data placement problem, clearly, the better
particle must have a smaller data transfer time and a higher
speed of finding the final location of all the datasets. In
the discrete encoding method, the fitness value fit() is
defined as the data transmission time. In summary, the
update strategy can be described as follows:
x_{i,t} = w_{i,t}      if fit(w_{i,t}) < fit(gbest)
x_{i,t} = x_{i,t-1}    if fit(w_{i,t}) >= fit(gbest)    (5)
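To show how Equations (3)-(5) fit together in one iteration, here is a non-authoritative Python sketch of the per-particle update; the rounding and index wrapping used to keep the result discrete are our own simplification of the recoded operators, not the paper's exact formulation.

import random
from typing import Callable, List

def mutate(x_prev: List[int], x_a: List[int], x_b: List[int], F: float,
           n_dc: int) -> List[int]:
    # Equation (3): u = x_{i,t-1} + F * (x_a - x_b), rounded and wrapped so
    # every entry stays a valid datacenter index (our discretization choice).
    return [int(round(xi + F * (ai - bi))) % n_dc
            for xi, ai, bi in zip(x_prev, x_a, x_b)]

def crossover(x1: List[int], x2: List[int], prob: float) -> List[int]:
    # Equation (4): take x1[i] with probability prob, otherwise x2[i].
    return [x1[i] if random.random() < prob else x2[i] for i in range(len(x1))]

def update(x_prev: List[int], pbest: List[int], gbest: List[int],
           x_a: List[int], x_b: List[int],
           F: float, cr_p: float, cr_g: float, n_dc: int,
           fit: Callable[[List[int]], float]) -> List[int]:
    # One DE-DPSO step: mutation, two crossovers, then the greedy
    # selection of Equation (5).
    u = mutate(x_prev, x_a, x_b, F, n_dc)
    u = crossover(pbest, u, cr_p)   # individual-cognition crossover
    w = crossover(gbest, u, cr_g)   # social-cognition crossover
    return w if fit(w) < fit(gbest) else x_prev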
B. DE-DPSO-DPS
Algorithm 2 is designed to describe our data placement
strategy for data-sharing scientific workflows. It consists
of three parts: data placement of private datasets (lines 1-3), data placement of public datasets (lines 4-6), and the combination of the former two parts (lines 7-8).
In part one, after initializing the current storage of all
the datacenters, the total data transfer time, and the number
of DE-DPSO-DPA iterations to 0, all datasets are classified
into public datasets and private datasets. Subsequently, these
503
Algorithm 2 DE-DPSO-DPS
Input:
    Initial datasets DS, tasks T, datacenters DC
Output:
    PM (data placement map), data transfer time Ttrans,
    number of iterations Niter
1: Initialize data transfer time Ttrans = 0 and the number of algorithm iterations Niter = 0;
2: Divide datasets DS into DS.pub and DS.pri;
3: Allocate datasets DS.pri to DC;
4: Divide datasets DS.pub into DS.ush and DS.sh;
5: Allocate datasets DS.pub to DC;
6: DE-DPSO-DPA;
7: Calculate Ttrans and Niter;
8: Output Result
private datasets are distributed to fixed datacenters. In part
two, the remaining public datasets are divided into unshared
and shared datasets. These unshared and shared datasets
are then respectively distributed to proper datacenters ac-
cording to the DE-DPSO-DPA. In part three, this algorithm
assembles the maps, the total data transmission time, and
the number of iterations over all the datasets. Two improvements were made: (1) all data placement solutions are transformed into a closed annular search space in order to ensure that they are effective and are mapped to actual datacenters; (2) the particle velocity is constrained based on the dataset characteristics (e.g., public/private and shared/unshared) in order to guarantee that DE-DPSO-DPS can locate appropriate data placement solutions.
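A small sketch of improvement (1), in our own words and with hypothetical names: any real-valued candidate position is wrapped back onto the ring of valid datacenter indices, and dimensions corresponding to private (fixed-location) datasets are left untouched, which is one possible reading of the velocity constraint in (2).

from typing import Dict, List

def to_ring(position: List[float], n_dc: int) -> List[int]:
    # Map an unconstrained candidate position onto the closed annular search
    # space: round each coordinate and wrap it into the valid index range.
    return [int(round(p)) % n_dc for p in position]

def freeze_private(position: List[int], fixed: Dict[int, int]) -> List[int]:
    # Keep the dimensions of private (fixed-location) datasets unchanged,
    # a simple way to constrain the update for those coordinates.
    return [fixed.get(k, v) for k, v in enumerate(position)]

# Hypothetical usage: 5 datasets over 3 datacenters, dataset index 2 is private at dc 1
print(freeze_private(to_ring([2.6, -0.4, 7.2, 1.1, 3.9], n_dc=3), fixed={2: 1}))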
The data placement problem for data-sharing scientific workflows in heterogeneous edge-cloud computing environments can be represented by Equation (6). Its core purpose is to pursue the minimum total data transmission time while satisfying the storage capacity constraint of each datacenter:

Minimize T_{trans}
subject to \forall i, \sum_{j=1}^{|D|} dsize_j \cdot l_{ij} \leq capacity_i    (6)

where l_{ij} is a flag that represents whether the datacenter dc_i stores the dataset d_j; l_{ij} = 1 indicates that it does, while l_{ij} = 0 indicates that it does not. capacity_i denotes the storage capacity of the ith datacenter.
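The following sketch (illustrative, not the authors' code) shows one way Equation (6) can be checked for a candidate placement and folded into the fitness used by DE-DPSO-DPA; penalizing infeasible placements is a common constraint-handling choice that the paper does not spell out.

from typing import Callable, Dict

def within_capacity(placement: Dict[str, int],
                    dsize: Dict[str, float],
                    capacity: Dict[int, float]) -> bool:
    # Constraint of Equation (6): for every datacenter i, the sum of dsize_j
    # over the datasets placed there must not exceed capacity_i.
    used: Dict[int, float] = {}
    for d, dc in placement.items():
        used[dc] = used.get(dc, 0.0) + dsize[d]
    return all(load <= capacity[dc] for dc, load in used.items())

def fitness(placement: Dict[str, int],
            dsize: Dict[str, float],
            capacity: Dict[int, float],
            transfer_time: Callable[[Dict[str, int]], float],
            penalty: float = 1e9) -> float:
    # Fitness = T_trans for feasible placements, plus a large penalty otherwise.
    t = transfer_time(placement)
    return t if within_capacity(placement, dsize, capacity) else t + penalty

# Hypothetical check: a 20 GB edge datacenter holding 5.4 GB + 2.3 GB is feasible
print(within_capacity({"d2": 2, "d6": 2}, {"d2": 5.4, "d6": 2.3}, {2: 20.0}))  # True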
V. EXPERIMENTAL RESULTS AND ANALYSIS
In this section, several simulation experiments, designed
to evaluate the effectiveness of the proposed data placement
strategy, are described and the impact factors in our data
placement model are discussed. Comparing the results with
those from other strategies and also considering the different
scenarios of these factors, the advantages of our proposed
strategy in this context, as well as the impact factors, are
evaluated. All the experiments were conducted on a machine
with the following specifications: an Intel(R) Core(TM) i7-4790 CPU @ 3.40 GHz, 16 GB of RAM, Windows 10 (64-bit), and IntelliJ IDEA 2019.2.4.
A. Experimental Setup
In the developed DE-DPSO-DPA, the initial population
size was set to 100, the maximum number of iterations
was set to 2000, the scaling factor was set to 0.15, and
Cr_g and Cr_p were both set to 0.1. The synthetic workflows from Montage in astronomy released by Bharathi et al. [21] were used, and both the number of datasets and the structures differed among them. In order to
discuss the impact factors in our model, the number of edge
micro-datacenters was changed from three to five, and the
number of cloud datacenters was set at two. The storage
capacity of edge micro-datacenters was the same in the same
simulation, varying from 150 GB to 300 GB in steps of 50
GB. In addition, the bandwidth between different datacenters in the basic experiment was set as follows (in M/s):
Bandwidth = \begin{pmatrix}
  -  &  5  &  5   &  5   &  5  \\
  5  &  -  &  20  &  20  &  20 \\
  5  &  20 &  -   &  100 &  150 \\
  5  &  20 &  100 &  -   &  200 \\
  5  &  20 &  150 &  200 &  -
\end{pmatrix}    (7)
Here, the bandwidth between the two cloud datacenters was set to 5 M/s, the bandwidth between an edge micro-datacenter and a cloud datacenter was 20 M/s, and the bandwidths between different edge micro-datacenters were set to {100 M/s, 150 M/s, 200 M/s}. In the basic experiment, there are three edge micro-datacenters, and their storage capacity was set to 150 GB. To compare the performance of these data placement strategies under different bandwidths across the edge micro-datacenters, we scaled the bandwidth across different edge micro-datacenters to {0.5, 0.8, 1.5, 3, 5} times that of the basic experiment.
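For concreteness, the sketch below (ours, not part of the paper's tooling) builds a bandwidth matrix following Equation (7) above, assuming indices 0-1 are the two cloud datacenters and 2-4 the three edge micro-datacenters, and scales only the edge-to-edge entries by the factors used in the experiments:

# Bandwidth matrix in M/s mirroring Equation (7); the diagonal is set to 0 as a placeholder.
BASE = [
    [0,   5,   5,   5,   5],
    [5,   0,  20,  20,  20],
    [5,  20,   0, 100, 150],
    [5,  20, 100,   0, 200],
    [5,  20, 150, 200,   0],
]

def scale_edge_links(base, factor, n_cloud=2):
    # Return a copy of the matrix with only the edge-to-edge bandwidths scaled,
    # as in the {0.5, 0.8, 1.5, 3, 5}x experiments.
    n = len(base)
    out = [row[:] for row in base]
    for i in range(n_cloud, n):
        for j in range(n_cloud, n):
            if i != j:
                out[i][j] = base[i][j] * factor
    return out

for f in (0.5, 0.8, 1.5, 3, 5):
    m = scale_edge_links(BASE, f)
    print(f, m[2][3], m[2][4], m[3][4])  # the scaled edge-to-edge links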
B. Performance Comparison
Four other data placement strategies were compared with
our DE-DPSO-DPS. The deployment of data in the other
strategies was based on the following algorithms: random,
DE, DPSO, and GA-DPSO. In view of the fact that there are several similarities between the distributed cloud computing environment and our heterogeneous edge-cloud computing environment, the methods used in the former can be refined, applied to our model, and compared with our strategy.
When the strategy is random, the number of iterations is
1, so the Niter was ignored. DE, DPSO, and GA-DPSO
are meta-heuristic algorithms, which need to converge and
may generate different results in each experiment. According
to [13], if the fitness values are consistent for over 80
consecutive iterations, the algorithm is considered to have
converged. The data transfer time and the speed of finding the final location for all datasets were measured as the average of 100 repeated experiments.
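A small helper (ours, not the authors' code) that implements the convergence rule borrowed from [13], i.e., stopping once the best fitness has stayed unchanged for 80 consecutive iterations:

from typing import List

def has_converged(fitness_history: List[float], window: int = 80,
                  tol: float = 0.0) -> bool:
    # True if the best fitness has not improved by more than tol over the
    # last `window` iterations (the criterion adopted from [13]).
    if len(fitness_history) < window:
        return False
    recent = fitness_history[-window:]
    return max(recent) - min(recent) <= tol

# Hypothetical usage
history = [100.0] * 30 + [90.0] * 80
print(has_converged(history))  # True: the last 80 values are identical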
Table III: Comparison of Different Strategies with Different Bandwidths
(Bandwidth scale of edge micro-datacenters; Ttrans is in seconds; Niter is the number of DPA iterations.)

                     0.5                0.8                1.5                3                  5
Algorithms        Ttrans    Niter    Ttrans    Niter    Ttrans    Niter    Ttrans    Niter    Ttrans    Niter
Random          36564.17       -   26032.87       -   14787.45       -    9116.71       -    6788.63       -
DE-DPS           7883.97    1707    6182.27    1442    4845.19    1186    4088.46    1042    3787.79    1489
DPSO-DPS         8092.25     165    6223.92     182    4885.87     135    4114.91     117    3812.82     194
GA-DPSO-DPS      7827.42    1315    6112.22    1384    4830.79    1664    4081.86    1523    3786.71    1410
DE-DPSO-DPS      7818.04     211    6069.18     351    4830.34     327    4082.89     265    3785.50     198

Table IV: Comparison of Different Strategies with Different Storage Capacities of Edge micro-Datacenters

                    150 GB             200 GB             250 GB             300 GB             350 GB
Algorithms        Ttrans    Niter    Ttrans    Niter    Ttrans    Niter    Ttrans    Niter    Ttrans    Niter
Random          19267.53       -   13947.78       -   14300.76       -   13413.87       -   12729.17       -
DE-DPS           5610.33     968    5483.35    1038    4460.15    1272    4383.37    1467    4358.24     769
DPSO-DPS         5685.15     151    5428.95     105    5877.73     109    4315.18     137    4268.19     105
GA-DPSO-DPS      5601.00    1485    5385.59    1263    4479.62    1549    4258.96    1623    4164.98    1637
DE-DPSO-DPS      5585.15     218    5382.84     323    4466.68     312    4251.91     251    4161.89     225
Table III and Table IV show the results of comparing the different bandwidths and different storage capacities of the edge computing datacenters for the five data placement strategies in achieving the optimal result for our scientific workflows. As the bandwidth across different datacenters increased, the number of DPA iterations did not fluctuate very much, and the data transmission time
decreased significantly. As shown in Table IV, with the increase of the storage capacity, both the data transmission time and the number of iterations are reduced. For data
transmission time, no matter how the impact factors change,
DE-DPSO-DPS is the most optimal. For the DPA iteration times, only DPSO-DPS requires slightly fewer iterations than DE-DPSO-DPS, which is explained by the fact that DPSO is extremely prone to falling into local optima.

Figure 4: Comparison of different strategies with and without data sharing. (a) Data transmission time; (b) iteration times of DPAs.
Figure 4 shows the data transfer time and the number
of iterations of different data placement strategies with
and without data sharing. Our data-sharing-based model is
superior to those in which each workflow is considered
separately, in terms of the total data transfer time under
the same data placement strategy. Additionally, the proposed
data placement strategy results in minimum data transfer
time, regardless of whether sharing is considered or not.
Furthermore, because all workflows are treated as one, the model that considers shared datasets shows a tremendous advantage in DPA iteration times for each strategy.

Figure 5: Comparison of different strategies with different numbers of edge micro-datacenters. (a) Data transmission time; (b) iteration times of DPAs.
Figure 5 depicts the data transmission time and the num-
ber of iterations of the different data placement strategies
with the different number of edge micro-datacenters. As
the number of edge micro-datacenters increased, the speed
of data movement increased, and the data transmission
time also increased significantly. This result also implies
that the change in the number of edge micro-datacenters
will not affect the superiority of DE-DPSO-DPS over other
strategies.
VI. CONCLUSION
This study proposes a novel data placement strategy for
data-sharing scientific workflows in a heterogeneous edge-
cloud computing environment, based on the needs to reduce
data transmission time and increase the speed of optimal
data placement. A data placement model which considers
multiple workflows distributed across different geographic
regions is presented, and a data placement algorithm is
designed to allocate public datasets. Experimental results show that the data placement strategy based on this algorithm can effectively reduce the data transmission time during scientific workflow execution, as well as increase the speed of finding the optimal data placement, compared to other state-of-the-art algorithms. In addition, through the
discussion of the number of edge micro-datacenters; storage
capacity of edge micro-datacenters; bandwidth between dat-
acenters; and other factors that may affect the performance
of the data placement strategy, the advantages of our strategy,
in each case, were analyzed in more detail.
In the future, the impact of variations in the dataset sizes
and number of workflows as well as the impact of the
proportion of private datasets on scientific workflows on the
data placement strategies will be considered. Furthermore, our model and DPS can be broadly applied to solve complex data placement problems because they are not limited to scientific workflows in heterogeneous edge-cloud computing environments.
ACKNOWLEDGMENT
The work of this paper is supported by National
Key Research and Development Program of China
(2019YFB1405000), National Natural Science Foundation
of China under Grant (No. 61873309, No. 61972034, No. 61572137, and No. 61728202), and Shanghai Science and Technology Innovation Action Plan Project under Grant (No. 19510710500, No. 18510732000, and No. 18510760200).
REFERENCES
[1] A. Kashlev, S. Lu, and A. Chebotko, “Typetheoretic approach to the shimming problem in scientific workflows,” IEEE Transactions on Services Computing, vol. 8, no. 5, pp. 795–809, 2015.
[2] H. Li, K. C. Chan, M. Liang, and X. Luo, “Composition
of resource-service chain for cloud manufacturing,” IEEE
Transactions on industrial informatics, vol. 12, no. 1, pp.
211–219, 2016.
[3] T. L. Duc, R. G. Leiva, P. Casari, and P.-O. Östberg, “Machine learning methods for reliable resource provisioning in edge-cloud computing: A survey,” ACM Computing Surveys (CSUR), vol. 52, no. 5, p. 94, 2019.
[4] Z. Wu, Z. Lu, P. C. Hung, S.-C. Huang, Y. Tong, and Z. Wang,
“Qamec: A qos-driven iovs application optimizing deploy-
ment scheme in multimedia edge clouds,” Future Generation
Computer Systems, vol. 92, pp. 17–28, 2019.
[5] Z. Lu, N. Wang, J. Wu, and M. Qiu, “Iotdem: An iot big data-
oriented mapreduce performance prediction extended model
in multiple edge clouds,” Journal of Parallel and Distributed
Computing, vol. 118, pp. 316–327, 2018.
[6] Q. Zhang, L. T. Yang, Z. Yan, Z. Chen, and P. Li, “An efficient
deep learning model to predict cloud workload for industry
informatics,” IEEE transactions on industrial informatics,
vol. 14, no. 7, pp. 3170–3178, 2018.
[7] H. Wu, S. Deng, W. Li, J. Yin, X. Li, Z. Feng, and A. Y.
Zomaya, “Mobility-aware service selection in mobile edge
computing systems,” in 2019 IEEE International Conference
on Web Services (ICWS). IEEE, 2019, pp. 201–208.
[8] W. Shi, H. Sun, J. Cao, Q. Zhang, and W. Liu, “Edge computing-an emerging computing model for the internet of everything era,” Journal of Computer Research and Development, vol. 54, no. 5, pp. 907–924, 2017.
[9] W. Du, T. Lei, Q. Het, W. Liu, Q. Lei, H. Zhao, and W. Wang,
“Service capacity enhanced task offloading and resource allo-
cation in multi-server edge computing environment,” in 2019
IEEE International Conference on Web Services (ICWS).
IEEE, 2019, pp. 83–90.
[10] Y. Shao, C. Li, and H. Tang, “A data replica placement
strategy for iot workflows in collaborative edge and cloud
environments,” Computer Networks, vol. 148, pp. 46–59,
2019.
[11] X. Chen, S. Tang, Z. Lu, J. Wu, Y. Duan, S.-C. Huang, and
Q. Tang, “idisc: A new approach to iot-data-intensive service
components deployment in edge-cloud-hybrid system,” IEEE
Access, vol. 7, pp. 59 172–59 184, 2019.
[12] X. Li, L. Zhang, Y. Wu, X. Liu, E. Zhu, H. Yi, F. Wang,
C. Zhang, and Y. Yang, “A novel workflow-level data place-
ment strategy for data-sharing scientific cloud workflows,”
IEEE Transactions on Services Computing, 2019.
[13] B. Lin, F. Zhu, J. Zhang, J. Chen, X. Chen, N. Xiong,
and J. Lloret, “A time-driven data placement strategy for
a scientific workflow combining edge computing and cloud
computing,” IEEE Transactions on Industrial Informatics,
2019.
[14] D. Nukarapu, B. Tang, L. Wang, and S. Lu, “Data replication
in data intensive scientific applications with performance
guarantee,” IEEE Transactions on Parallel and Distributed
Systems, vol. 22, no. 8, pp. 1299–1306, 2011.
[15] D. Yuan, Y. Yang, X. Liu, and J. Chen, “A data placement
strategy in scientific cloud workflows,” Future Generation
Computer Systems, vol. 26, no. 8, pp. 1200–1214, 2010.
[16] M. Wang, J. Zhang, F. Dong, and J. Luo, “Data placement
and task scheduling optimization for data intensive scientific
workflow in multiple data centers environment,” in 2014
Second International Conference on Advanced Cloud and Big
Data. IEEE, 2014, pp. 77–84.
[17] H. Li, D. Yang, W. Su, J. Lü, and X. Yu, “An overall distribution particle swarm optimization MPPT algorithm for photovoltaic system under partial shading,” IEEE Transactions on Industrial Electronics, vol. 66, no. 1, pp. 265–275, 2019.
[18] J. Meng, H. Tan, C. Xu, W. Cao, L. Liu, and B. Li, “Dedas:
Online task dispatching and scheduling with bandwidth con-
straint in edge computing,” in IEEE INFOCOM 2019-IEEE
Conference on Computer Communications. IEEE, 2019, pp.
2287–2295.
[19] J. Kennedy and R. Eberhart, “Particle swarm optimization
(pso),” in Proc. IEEE International Conference on Neural
Networks, Perth, Australia, 1995, pp. 1942–1948.
[20] A. K. Qin, V. L. Huang, and P. N. Suganthan, “Differen-
tial evolution algorithm with strategy adaptation for global
numerical optimization,” IEEE transactions on Evolutionary
Computation, vol. 13, no. 2, pp. 398–417, 2008.
[21] S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M.-H.
Su, and K. Vahi, “Characterization of scientific workflows,”
in 2008 third workshop on workflows in support of large-scale
science. IEEE, 2008, pp. 1–10.
... As far as we know, there are only a few algorithms based on meta-heuristic approaches that have been proposed for scientific workflow data placement problems in cloud computing. For instance, [11][12][13][14] use NSGA-II, ACO, PSO, and PSO algorithms, respectively. However, our approach has some novelties compared to these approaches. ...
... Ant colony optimization-based data placement of dataintensive geospatial workflow, ACO-DPDGW, is proposed in [12] aiming to minimize the data transfer time. Taking advantage of both edge and cloud computing, in [13] authors propose DE-DPSO-DPS a data placement model, which uses a discrete particle swarm optimization algorithm with differential evolution (DE-DPSO-DPA). The goal is to balance data transfer time and data placement cost. ...
... 2. According to (11) and (12), the communication cost (13), is defined as the product of the number of switches on the routing path from cs i to cs j and the energy used by each switch. ...
Article
Full-text available
The processing of scientific workflow (SW) in geo-distributed cloud computing holds significant importance in the placement of massive data between various tasks. However, data movement across storage services is a main concern in the geo-distributed data centers, which entails issues related to the cost and energy consumption of both storage services and network infrastructure. Aiming to optimize data placement for SW, this paper proposes EQGSA-DPW a novel algorithm leveraging quantum computing and swarm intelligence optimization to intelligently reduce costs and energy consumption when a SW is processed in multi-cloud. EQGSA-DPW considers multiple objectives (e.g., transmission bandwidth, cost and energy consumption of both service and communication) and improves the GSA algorithm by using the log-sigmoid transfer function as a gravitational constant G and updating agent position by quantum rotation angle amplitude for more diversification. Moreover, to assist EQGSA-DPW in finding the optima, an initial guess is proposed. The performance of our EQGSA-DPW algorithm is evaluated via extensive experiments, which show that our data placement method achieves significantly better performance in terms of cost, energy, and data transfer than competing algorithms. For instance, in terms of energy consumption, EQGSA-DPW can on average achieve up to \(25\%\), \(14\%\), and \(40\%\) reduction over that of GSA, PSO, and ACO-DPDGW algorithms, respectively. As for the storage services cost, EQGSA-DPW values are the lowest.
Article
Full-text available
The exponential growth of Internet of Things (IoT) devices has ushered in an era of vast data generation, necessitating abundant resources for data storage and processing. Cloud environment forms a notorious paradigm for such data accommodation. Meanwhile, the privacy issues assimilated in IoT data provoke huge complications in data placement. In addition, it is significant to consider factors such as energy efficiency, energy utility of cloud and data access time of IoT applications while allotting resources for IoT data. In light of this circumstance, this research proposes a Fuzzy- Particle Swarm Optimization (PSO) framework to optimize IoT-oriented data placement in cloud data centers. The fuzzy Logic is adept at handling the uncertainty inherent in parameters such as resource availability and privacy sensitivity. Through membership functions and a Fuzzy Inference System, imprecise attributes are quantified, enabling smarter decision-making. Using its intelligence, it prioritizes the task with high sensitivity and resource availability to perform ideal allocation preferring best suitable resource feature unit. The integration of improved PSO leverages its capability to explore complex solution spaces and converge on optimal solutions. The greedy strategy in improved PSO assists in exploring most-optimal virtual machine instance in cloud to improve its resource efficacy. These facets culminate in a framework that holistically manages IoT-generated data, optimizing energy consumption, resource utilization, and data access time, while simultaneously upholding privacy constraints. The results underscore the potency of this approach in offering optimal data management in cloud environments, achieving better resource utilization of 89%, privacy sensitivity of 98.5%, and less energy consumption of 0.7 kWh.
Article
With the rapid development of the Internet of things (IoT) and mobile communication technology, the amount of data related to industrial Internet of things (IIoT) applications has shown a trend of explosive growth, and hence edge‐cloud collaborative environment becomes one of the most popular paradigms to place the IIoT applications data. However, edge servers are often heterogeneous and capacity limited while having lower access delay, so there is a contradiction between capacity and latency while using edge storage. Additionally, when IIoT applications deployed crossing edge regions, the impact of data replication and data privacy should not be ignored. These factors often pose challenges to proposing an effective data placement strategy to take full advantage of edge storage. To address these challenges, an effective data placement strategy for IIoT applications is designed in this article. We first analyze the data access time and data placement cost in an edge‐cloud collaborative environment, with the consideration of data replication and data privacy. Then, we design a data placement strategy based on ‐constraint and Lagrangian relaxation, to reduce the data access time and meanwhile limit the data placement cost to an ideal level. As a result, our proposed data placement strategy can effectively reduce data access time and control data placement costs. Simulation and comparative analysis results have demonstrated the validity of our proposed strategy.
Chapter
Business process management (BPM) is a crucial method for standardized and systematic management in the context of Industrial Internet, especially Industrial Internet of Things (IIoT). As the mobile internet develops rapidly and the volume of data generated by IoT devices increases, traditional storage schemes are no longer sufficient for managing mobile IIoT business data. To address this challenge, the cloud-edge collaborative data storage scheme has emerged as an effective solution. This approach leverages cloud data centers to provide mass storage capability, while also considering access delays by storing data on edge servers located near users. Given the constrained storage capacity of edge servers, the development of an efficient data placement strategy has emerged as a critical issue in enhancing the speed of data retrieval and task execution within IIoT business processes. In this chapter, we will explore the use of Bayesian optimization algorithm (BOA) in the context of IIoT resource placement strategy. Resource deployment plays a crucial role in optimizing business processes and achieving efficient utilization of resources. To enhance user experience, shorten overall running time, and reduce data transmission costs, a data placement strategy based on Bayesian optimization algorithm is discussed in this chapter. This strategy can allocate storage capacity on the edge side in a reasonable manner and determine the most appropriate edge server on which to place the data. The simulation results show that the BOA-based strategy outperforms those based on other optimization algorithms used in similar studies across varied scenarios.
Article
Full-text available
Large-scale software systems are currently designed as distributed entities and deployed in cloud data centers. To overcome the limitations inherent to this type of deployment, applications are increasingly being supplemented with components instantiated closer to the edges of networks—a paradigm known as edge computing. The problem of how to efficiently orchestrate combined edge-cloud applications is, however, incompletely understood, and a wide range of techniques for resource and application management are currently in use. This article investigates the problem of reliable resource provisioning in joint edge-cloud environments, and surveys technologies, mechanisms, and methods that can be used to improve the reliability of distributed applications in diverse and heterogeneous network environments. Due to the complexity of the problem, special emphasis is placed on solutions to the characterization, management, and control of complex distributed applications using machine learning approaches. The survey is structured around a decomposition of the reliable resource provisioning problem into three categories of techniques: workload characterization and prediction, component placement and system consolidation, and application elasticity and remediation. Survey results are presented along with a problem-oriented discussion of the state-of-the-art. A summary of identified challenges and an outline of future research directions are presented to conclude the article.
Article
Full-text available
With rapid development of the big data technology and the Internet, the requirements of human activities for data are getting higher and higher, and the increasing data volume has a high demand for data processing. The paradigm of the Internet of Things (IoT) has become a key component for edge-cloud-hybrid systems. In the edge environment, multiple IoT-data-intensive services will form a service combination. Due to the data transmission between different service components, there is a huge transmission delay in the process of IoT data transmission, which will affect the performance of the entire system. Therefore, by regarding the reduction of transmission delay as our optimization goal, we put forward iDiSC: a new heuristic approach for IoT-data-intensive service component deployment in the Edge-Cloud-Hybrid System. We also design the iDiSC model, then we optimize the model to select the optimal deployment scenario with the minimum guaranteed latency. Through a series of experiments, compared to genetic algorithm and simulated annealing algorithm, the experimental results show that the iDiSC algorithm has higher efficiency and performance for the problem of data-intensive service component deployment problem in the Edge-Cloud-Hybrid Environment.
Article
A solar photovoltaic (PV) system under partial shading conditions (PSC) has a non-monotonic P-V characteristic with multiple local maximum power points, which makes existing maximum power point tracking (MPPT) algorithms unable, or only poorly able, to track the global maximum power point (MPP). This paper proposes a novel overall distribution (OD) MPPT algorithm that rapidly narrows the search to the area near the global MPP. To locate the global MPP accurately, the OD stage can be combined with other intelligent algorithms; here the particle swarm optimization (PSO) algorithm is chosen because of its simplicity. The resulting OD-PSO MPPT algorithm is proposed and applied to global MPPT of a PV system, and simulation and experimental results demonstrate its effectiveness and accuracy in comparison with the existing PSO MPPT algorithm. The paper therefore provides a useful reference for MPPT of PV systems under partial shading conditions.
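The sketch below shows only the plain PSO stage of such an approach, searching a made-up multi-peaked P-V curve for its global maximum; the OD pre-scan from the cited paper and any real PV model are not reproduced, and the curve and PSO coefficients are assumptions.

```python
# Plain particle swarm optimization on a synthetic, multi-peaked P-V curve.
import numpy as np

rng = np.random.default_rng(2)

def pv_power(v):
    """Synthetic P-V curve with several local maxima over 0-40 V (illustrative only)."""
    return (60 * np.exp(-((v - 31) / 6) ** 2)
            + 35 * np.exp(-((v - 15) / 4) ** 2)
            + 20 * np.exp(-((v - 5) / 3) ** 2))

N, iters, w, c1, c2 = 12, 60, 0.6, 1.6, 1.6   # swarm size and PSO coefficients (assumed)
v = rng.uniform(0, 40, N)                      # particle positions = operating voltages
vel = np.zeros(N)
pbest, pbest_val = v.copy(), pv_power(v)
g = pbest[np.argmax(pbest_val)]                # global best voltage so far

for _ in range(iters):
    r1, r2 = rng.random(N), rng.random(N)
    vel = w * vel + c1 * r1 * (pbest - v) + c2 * r2 * (g - v)
    v = np.clip(v + vel, 0, 40)
    val = pv_power(v)
    improved = val > pbest_val
    pbest[improved], pbest_val[improved] = v[improved], val[improved]
    g = pbest[np.argmax(pbest_val)]

print(f"estimated global MPP: V = {g:.2f} V, P = {pv_power(g):.2f} W")
```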
Article
Compared to traditional distributed computing environments such as grids, cloud computing provides a more cost-effective way to deploy scientific workflows. Each task of a scientific workflow requires several large datasets that are located in different datacenters, resulting in serious data transmission delays. Edge computing reduces these delays and supports fixed storage of private scientific workflow datasets, but its storage capacity is a bottleneck. Combining the advantages of edge computing and cloud computing to rationalize the data placement of scientific workflows, and to optimize data transmission time across different datacenters, is a challenge. In this study, a self-adaptive discrete particle swarm optimization algorithm with genetic algorithm operators (GA-DPSO) was proposed to optimize data transmission time when placing data for a scientific workflow. The approach considered the characteristics of data placement that combines edge computing and cloud computing, as well as the factors impacting transmission delay, such as the bandwidth between datacenters, the number of edge datacenters, and the storage capacity of edge datacenters. The crossover and mutation operators of the genetic algorithm were adopted to avoid the premature convergence of the traditional particle swarm optimization algorithm, which enhanced the diversity of the population and effectively reduced the data transmission time. The experimental results show that the data placement strategy based on GA-DPSO can effectively reduce data transmission time during workflow execution when edge computing and cloud computing are combined.
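The following is a simplified sketch in the spirit of GA-DPSO: each particle encodes a dataset-to-datacenter placement, and GA-style crossover with personal and global bests plus mutation replaces the usual velocity update. The cost model, problem sizes, and operator details are assumptions for illustration, not the cited algorithm itself.

```python
# GA-operator discrete PSO sketch for a toy scientific-workflow data placement problem.
import random

random.seed(3)
N_DATA, N_DC = 10, 4
size = [random.uniform(1, 8) for _ in range(N_DATA)]                  # dataset sizes (GB)
bw = [[1.0 if a == b else random.uniform(0.1, 0.6) for b in range(N_DC)]
      for a in range(N_DC)]                                            # inter-DC bandwidth (GB/s)
# tasks: each task runs in a fixed datacenter and reads three datasets (synthetic)
tasks = [(random.randrange(N_DC), random.sample(range(N_DATA), 3)) for _ in range(8)]

def transfer_time(placement):
    """Total time to move each task's input datasets to the datacenter running it."""
    return sum(size[d] / bw[placement[d]][dc] for dc, reads in tasks for d in reads)

def crossover(a, b):
    cut = random.randrange(1, N_DATA)
    return a[:cut] + b[cut:]                                           # one-point crossover

def mutate(p, rate=0.1):
    return [random.randrange(N_DC) if random.random() < rate else g for g in p]

swarm = [[random.randrange(N_DC) for _ in range(N_DATA)] for _ in range(20)]
pbest = list(swarm)
gbest = min(swarm, key=transfer_time)

for _ in range(100):
    for i, p in enumerate(swarm):
        # "Velocity" step: recombine with personal and global bests, then mutate.
        child = mutate(crossover(crossover(p, pbest[i]), gbest))
        swarm[i] = child
        if transfer_time(child) < transfer_time(pbest[i]):
            pbest[i] = child
    gbest = min(pbest, key=transfer_time)

print("best placement:", gbest, "transfer time:", round(transfer_time(gbest), 2))
```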
Article
The convergence of edge and cloud computing combines their strengths: virtually unlimited shared storage and computing resources from the cloud, and low-latency data preprocessing at the edge. The collaboration of the two computing paradigms provides a real-time and cost-effective way to deploy Internet of Things (IoT) workflows among cooperating user groups. Since huge amounts of data are continuously generated by user devices, how to place them so as to reduce data access costs while meeting deadline constraints is a critical issue. This paper proposed a novel data replica placement strategy for coordinated processing of data-intensive IoT workflows in a collaborative edge and cloud computing environment. First, data replica placement is modelled as a 0–1 integer programming problem that accounts for overall data dependency, data reliability, and user cooperation. Then, the ITÖ algorithm, a variant of intelligent swarm optimization, is presented to solve this model. The experimental results show that the proposed method outperforms the compared algorithms: it not only finds higher-quality data replica placement solutions but also requires a lower computing budget than the traditional algorithms.
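To illustrate the shape of such a 0–1 model, the toy sketch below enumerates binary replica variables x[d][s] (dataset d replicated on site s) for a tiny instance and picks the feasible assignment with minimum access-plus-storage cost. The cost matrices, capacities, and objective are assumptions, and the ITÖ heuristic itself is not reproduced; exhaustive enumeration only works at this toy scale.

```python
# Brute-force solution of a toy 0-1 replica-placement model (illustrative only).
from itertools import product

DATASETS, SITES = 3, 3
access_cost = [[1, 4, 6],          # access_cost[d][s]: cost for users of dataset d
               [5, 1, 3],          # to read it from site s (synthetic units)
               [6, 4, 1]]
store_cost = [2, 2, 2]             # per-replica storage cost at each site
capacity = [2, 1, 2]               # maximum number of replicas each site can hold

best, best_cost = None, float("inf")
for flat in product([0, 1], repeat=DATASETS * SITES):
    x = [list(flat[d * SITES:(d + 1) * SITES]) for d in range(DATASETS)]
    # Constraint: every dataset has at least one replica somewhere.
    if any(sum(row) == 0 for row in x):
        continue
    # Constraint: site capacities are respected.
    if any(sum(x[d][s] for d in range(DATASETS)) > capacity[s] for s in range(SITES)):
        continue
    # Objective: cheapest reachable replica per dataset plus total storage cost.
    cost = sum(min(access_cost[d][s] for s in range(SITES) if x[d][s])
               for d in range(DATASETS))
    cost += sum(store_cost[s] * x[d][s] for d in range(DATASETS) for s in range(SITES))
    if cost < best_cost:
        best, best_cost = x, cost

print("replica matrix:", best, "objective:", best_cost)
```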
Article
Deploying applications to a single centralized cloud for service delivery is often infeasible because of the excessive latency and bandwidth limitations of the Internet, for example when transporting all IoV data to a big data processing service in a centralized cloud. Therefore, multi-cloud deployment, especially across multiple edge clouds, is a rising trend for cloud service provision. However, the heterogeneity of cloud services, complex deployment requirements, and the large problem space of multi-cloud deployment make deploying applications in a multi-cloud environment a difficult and error-prone decision-making process. Because of these difficulties, current SLA-based solutions lack a unified model to represent users' functional and non-functional requirements. Against this background, we propose QaMeC, a QoS-driven scheme for optimizing the deployment of IoV applications in multimedia edge clouds. The scheme builds a unified QoS model to shield the inconsistency of QoS calculation and uses the NSGA-II algorithm to solve the multi-cloud application deployment problem. Implementation and experiments show that QaMeC can provide optimal and efficient service deployment solutions for a variety of applications with different QoS requirements in CDN multimedia edge cloud environments.
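The sketch below shows only the non-dominated sorting step at the core of NSGA-II, applied to synthetic (latency, cost) scores of candidate deployments; the QaMeC QoS model, crowding distance, and genetic operators are not reproduced, and both objectives are assumed to be minimized.

```python
# Non-dominated sorting (the ranking step NSGA-II relies on) over synthetic objectives.
import random

random.seed(4)
# Each candidate deployment scored by (latency, cost); both are to be minimized.
solutions = [(random.uniform(10, 100), random.uniform(1, 20)) for _ in range(12)]

def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(points):
    fronts, remaining = [], list(range(len(points)))
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

for rank, front in enumerate(non_dominated_sort(solutions), start=1):
    print(f"front {rank}:", [tuple(round(v, 1) for v in solutions[i]) for i in front])
```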
Article
Deep learning, as the most important architecture of current computational intelligence, achieves superior performance in predicting cloud workloads for industrial informatics. However, training a deep learning model efficiently is a non-trivial task, since such models often include a great number of parameters. In this paper, an efficient deep learning model based on the canonical polyadic decomposition is proposed to predict cloud workloads for industrial informatics. In the proposed model, the parameters are compressed significantly by converting the weight matrices to the canonical polyadic format, and an efficient learning algorithm is designed to train them. Finally, the model is applied to workload prediction for virtual machines in the cloud. Experiments conducted on datasets collected from PlanetLab validate its performance against other machine-learning-based approaches to virtual machine workload prediction. The results indicate that the proposed model achieves higher training efficiency and workload prediction accuracy than state-of-the-art machine-learning-based approaches, proving its potential to provide predictive services for industrial informatics.
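The parameter-compression effect can be illustrated with the simplest possible case: a rank-R factorization of a single dense weight matrix, which is what a canonical polyadic representation reduces to for a two-way tensor. The cited model applies CP to higher-order reshapes of the weights and trains the factors directly, which this sketch does not attempt; layer shape, rank, and the use of truncated SVD are assumptions.

```python
# Rank-R compression of one dense layer: W (OUT_DIM x IN_DIM) is replaced by U_r @ V_r.
import numpy as np

rng = np.random.default_rng(5)
IN_DIM, OUT_DIM, RANK = 512, 256, 16         # layer shape and rank (assumed)
W = rng.standard_normal((OUT_DIM, IN_DIM))   # stand-in for a trained weight matrix

# Truncated SVD gives the best rank-R approximation in the Frobenius norm.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_r = U[:, :RANK] * s[:RANK]                 # (OUT_DIM, RANK)
V_r = Vt[:RANK, :]                           # (RANK, IN_DIM)

dense_params = W.size
compressed_params = U_r.size + V_r.size
err = np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W)

x = rng.standard_normal(IN_DIM)
y_dense = W @ x                              # original forward pass
y_compressed = U_r @ (V_r @ x)               # factorized forward pass (fewer multiplications)

print(f"params: {dense_params} -> {compressed_params} "
      f"({compressed_params / dense_params:.1%} of original), relative error {err:.3f}")
```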