A Novel Data Placement Strategy for Data-Sharing Scientific Workflows in
Heterogeneous Edge-Cloud Computing Environments
Xin Du, Songtao Tang, Zhihui Lu∗†‡, Jie Wu†, Keke Gai∗§, and Patrick C.K. Hung
School of Computer Science, Fudan University, Shanghai, China
Shanghai Blockchain Engineering Research Center, Shanghai, China
§School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Faculty of Business and IT, University of Ontario Institute of Technology, Canada
Engineering Research Center of Cyber Security Auditing and Monitoring, Ministry of Education, Shanghai, China
Email: xdu20@fudan.edu.cn, sttang19@fudan.edu.cn, lzh@fudan.edu.cn, jwu@fudan.edu.cn, gaikeke@bit.edu.cn
and patrick.hung@uoit.ca
Corresponding authors: Zhihui Lu and Keke Gai
Abstract—The deployment of datasets in the heterogeneous edge-cloud computing paradigm has received increasing attention in state-of-the-art research. However, due to the large sizes of scientific datasets and the existence of private datasets, finding an optimal data placement strategy that minimizes data transmission and improves performance remains a persistent problem. In this study, the advantages of both edge and cloud computing are combined to construct a data placement model that works for multiple scientific workflows. The most difficult research challenge is to provide a data placement strategy that considers datasets shared both within individual workflows and among multiple workflows, across various geographically distributed environments. According to the constructed model, not only the storage capacity of edge micro-datacenters but also the data transfer between multiple clouds across regions must be considered. To address this issue, we analyzed the characteristics of this model and identified the factors that cause transmission delay. We propose a discrete particle swarm optimization algorithm with differential evolution (DE-DPSO) to distribute datasets during workflow execution. Based on this algorithm, a new data placement strategy named DE-DPSO-DPS is proposed. DE-DPSO-DPS is evaluated through several experiments designed in simulated heterogeneous edge-cloud computing environments. The results demonstrate that our data placement strategy can effectively reduce the data transmission time and achieve superior performance compared to traditional strategies for data-sharing scientific workflows.
Keywords-Heterogeneous edge-cloud computing environments; data placement; data-sharing; scientific workflows
I. INTRODUCTION
With the exponential increase of global cooperation in scientific research and the rapid development of distributed
computing technology, scientific applications have changed
significantly these days. They now involve thousands of
interwoven tasks and are generally data and computing
intensive [1]. To represent these complicated scientific ap-
plications, scientific workflows are widely used in several
scientific fields [2], such as astronomy, physics, and bioinfor-
matics. Expectedly, due to the complex structure and large
data tasks, the deployment of scientific workflows has rigid
requirements for computational and storage resources. In
some scientific domains, when creating the data placement
strategy for these workflows, multiple practical scenarios
must be considered. For example, the datasets are often
shared among multiple tasks within workflows, including
workflows in different geo-distributed organizations. Fur-
thermore, there are several private datasets that may only
be allowed to be stored in specific research institutes.
Thus, proposing a good data placement strategy, which can
generally optimize data transmission time during workflow
execution, has always been a major challenge.
In order to address the above-mentioned challenges,
constructing data placement models in the heterogeneous
edge-cloud computing environment has become an area of
significant interest in the field. In a heterogeneous edge-cloud computing environment, there are several distributed datacenters. Some are cloud datacenters distributed geographically; others are edge micro-datacenters. Evidently,
every datacenter has computation and storage resources, and
there are significant differences between characteristics of
cloud and edge micro-datacenters [3]–[5]. Compared with
cloud datacenters [6], the storage and computing power
of edge micro-datacenters are limited. But edge micro-
datacenters that are geographically closer have a critical
positive effect on data transmission time [7], and some
immovable and private datasets can only be stored in edge
micro-datacenters [8].
In real-world scenarios, a good data placement model
should have the following characteristics. Firstly, due to
the large number of datasets and complex structures of
scientific workflows, combining edge and cloud computing ensures high cohesion within a datacenter and low coupling between different datacenters. Secondly, scientific workflows are distributed, data-intensive applications that must support scientific collaboration between different research institutes. Therefore, datasets and tasks
in these workflows need to be allocated and dispatched to
geographically distributed datacenters. In this process, data
sharing among multiple workflows and tasks should improve
the efficiency of the scientific workflows. Thirdly, there are
significant differences in bandwidth between different edge
data centers in the same geographic location and cloud data
centers in different geographic locations. These differences
have a significant impact on the overall data placement
strategy. Figure 1 shows the difference between the environment of the proposed model and that of others. Unlike the existing heterogeneous edge-cloud computing environment used during the execution of scientific workflows, the environment shown in Figure 1(b) not only considers shared datasets both within tasks and among multiple tasks, but also considers the coordination of data centers between different regions. Based on our research, there is no other existing data placement model that is suitable for the actual environment. The proposed model addresses this problem, and based on it, a novel data placement strategy is proposed.

Figure 1: Different environments between data placement models. (a) Existing heterogeneous edge-cloud computing environment; (b) proposed heterogeneous edge-cloud computing environment.
The formulation of a data placement strategy that minimizes transmission time is an NP-hard problem. Several researchers
have mapped it to the knapsack packing problem [9]–[11],
and hence, to obtain the optimal solution to this problem,
they proposed several methods using heuristic algorithms,
such as the genetic algorithm (GA) and particle swarm
optimization (PSO). These optimization algorithms, which simulate the behaviors of birds, ants, or fish in continuous search spaces, are well-suited for solving NP-hard problems.
In [12], in order to apply the PSO to cloud computing sce-
narios, a novel discrete PSO (DPSO) algorithm is presented
to solve discrete problems. Furthermore, a self-adaptive discrete PSO algorithm with genetic algorithm operators (GA-DPSO) is proposed in [13] to reduce the data transmission time during workflow execution in a single-region heterogeneous edge-cloud computing environment. However, these data placement strategies consider only individual workflows or pure cloud environments, treating each workflow in isolation.
In this study, a data placement model for data-sharing
scientific workflows is constructed in heterogeneous edge-
cloud computing environments, and a novel data placement
strategy is proposed. The characteristics of different datacen-
ters and data-sharing scientific workflows have further been
investigated, and a discrete PSO algorithm with differential
evolution has been proposed to reduce the data transmission time and improve system performance when executing multiple workflows in multi-region heterogeneous edge-cloud computing environments. This method considers not only the factors impacting the data transfer delay, such as the number of datacenters and the storage capacity of edge micro-datacenters, but also the data sharing for multiple workflows in different datacenters. The proposed data placement model and strategy were evaluated through several designed experiments
in a simulated environment. Specifically, the data transfer
time and the speed of finding optimal data placement, which
are two important indexes to evaluate a data placement
strategy, were compared with those achieved using existing
strategies. The experimental results show that the proposed
model can effectively reduce data transfer time and achieve
the best performance.
The main contributions of this paper are as follows:
1) A model for obtaining data placement solutions during
the execution of multiple workflows in multi-region het-
erogeneous edge-cloud computing environments is pro-
posed. Compared with single-region heterogeneous edge-cloud computing environments, which consider only one scientific workflow, the proposed model not only considers shared datasets both within tasks and among multiple workflows, but also considers the coordination of data centers between different regions, which is clearly more congruent with actual cases of scientific research workflows.
2) A novel data placement algorithm, which is based on
effective improvement in DPSO and DE, is proposed to
distribute datasets according to the above-mentioned data
placement model. The crossover and mutation operator
of the DE were recoded and defined to be better suited
to our problems. This algorithm is highly efficient and can quickly and flexibly find appropriate datacenters to place shared and unshared datasets.
3) A data-sharing data placement strategy is proposed based
on DE-DPSO for scientific workflows to minimize data
transfer time and optimize the speed of finding the
final placement locations of data. This strategy considers
almost all impact factors that may affect the final results,
such as the number of edge micro-datacenters, the storage
capacity of edge micro-datacenters, the data sharing for
multiple workflows, and the bandwidth between different
datacenters.
The rest of this study is organized as follows. Section II
reviews related work. Section III gives details of our data
placement model, which comprises the model construction
and problem analysis. Section IV presents our data place-
ment strategy. Section V describes the experimental results.
The conclusions and future work are summarized in Section
VI.
II. RELATED WORK
Developing a strategy to find an optimal data placement in a distributed environment has always been a considerable challenge. Whether in traditional storage systems or in the newer cloud and heterogeneous edge-cloud computing systems, the data placement problem has been studied in depth. In this era of highly developed network technology, we require a model that considers real-world scenarios as much as possible, along with factors such as large datasets, shared datasets, private datasets stored in fixed edge micro-datacenters, and bandwidth limitations, all of which have a critical effect on the data transfer time and the speed of finding the final location for data placement.
Like cluster or grid systems, several traditional deploy-
ment methods for scientific workflows involve distributed
computing environments, which are very expensive and
do not satisfy the demands of a practical scenario. More-
over, these deployment environments with low-level resource
sharing will lead to significant data transmission delays.
In the existing distributed computing systems, researchers
have focused on optimizing the simulation models and data
transfer time in cloud environments. Nukarapu et al. [14]
proposed a classic data-intensive scientific workflow system
deployed on a distributed platform, which explores the interactions between data placement services and relatively reduces the system execution time. Yuan et al. [15] provided a
data placement strategy based on k-means and BEA clus-
tering for a scientific workflow that effectively reduced the
number of data movements. However, this strategy ignored
the difference in the storage capacity of each datacenter. In
addition, the number of data movements did not accurately
represent the amount of data movement or the actual data
transmission status. Wang et al. [16] designed a data place-
ment strategy based on k-means clustering for a scientific
workflow in cloud environments that considered the data size
and dependency. While this approach reduced the number of
data movements using a data replication mechanism, it did
not formalize the data replication cost.
As per the results of recent studies, scientific cloud
workflow systems can satisfy the demands of scientists
from different laboratories or regions, and they can also
collaborate and carry out their research process more flex-
ibly [17]. However, owing to the existence of large-scale
dataset interaction and the remote deployment of cloud com-
puting sources, data transmission still requires a significant
amount of time during workflow execution. To solve this
problem, a time-driven data placement strategy has been
developed by combining the advantages of both edge and
cloud computing, which also solves the problem of limited
resources in edge computing.
In summary, existing studies have only proposed data
placement models and strategies to reduce the data transfer
time for individual workflows. In particular, as per the latest
discussions on heterogeneous edge-cloud computing envi-
ronments, these models generally consider only one cloud
datacenter and one scientific workflow. However, in the real
world, cooperation between scientific organizations across
different geographical distributions is common. Therefore,
the existence of multiple scientific workflows and multiple
cloud data centers should be considered when building a data
placement model. In such a model, data sharing can have a significant impact on the placement outcome. Additionally, when
formulating a data placement strategy, the data transfer time
as well as the speed of data placement must be minimized.
In this study, in contrast to other existing works, a novel data placement model is constructed according to the
environment mentioned above, and a strategy is proposed
to minimize data transfer time and optimize the speed
of obtaining the final placement of data for data-sharing
scientific workflows.
III. MODEL CONSTRUCTION AND PROBLEM ANALYSIS
In this section, fundamental definitions of data-sharing
scientific workflows in the heterogeneous edge-cloud com-
puting environment are given, and a data placement model
for the proposed data placement strategy is constructed.
Subsequently, based on the constructed model, a specific
example is provided to analyze the data placement problem.
It is worth noting that the core purpose of this strategy
for all scientific workflows in this model is to optimize the
speed of finding the final location for datasets and also to
minimize the data transfer time during workflows execution
while satisfying the intrinsic properties of each datacenter
(such as storage or computing resource).
A. Model Construction
Figure 2: Sample of data placement by two strategies.

The heterogeneous edge-cloud computing environment DC = {DC^c, DC^e} contains geographically distributed cloud datacenters and multiple edge micro-datacenters: cloud computing at the long-distance end, which generally has unlimited storage resources and only one datacenter per region, and edge computing at the near end, which is used to store the private datasets with limited storage capacity. According to the heterogeneous computing environment, our model consists of m cloud datacenters DC^c = {dc_1, dc_2, ..., dc_m} and n edge micro-datacenters DC^e = {dc_1, dc_2, ..., dc_n}. In addition, dc_i = <cap_i, type_i> represents the ith datacenter, where cap_i denotes its storage capacity and type_i is used to flag whether the datacenter is a cloud or an edge micro-datacenter. In particular, when type_i = 0, the datacenter is a cloud datacenter, which can only store public datasets and is usually distributed geographically. When type_i = 1, the datacenter is an edge micro-datacenter, which can store both private and public datasets. The bandwidth across different datacenters is represented as b_{ij} = <band_{ij}, type_i, type_j>, where band_{ij} is the value of the bandwidth and datacenter i is not datacenter j. To keep the focus on our data placement issues, it was assumed that the bandwidth is a known constant. The fundamental definitions of data-sharing scientific workflows are given below, and then the model is constructed.
Definition 1. Scientific workflow. In the proposed model, scientific workflows W are described as {W_1, W_2, ..., W_l}, where l is the number of scientific workflows. Each scientific workflow can be described as a directed acyclic graph W_k = (T, R, DS), where T = {t_1, t_2, ..., t_r} represents the task set in W_k, which contains r tasks. An adjacency matrix R represents the relationship between tasks in the task set T, where R_{i,j} = 0 denotes that task t_i has no relationship with task t_j and R_{i,j} = 1 denotes that task t_i precedes task t_j, and DS = {d_1, d_2, ..., d_n} denotes all datasets in these scientific workflows.
Definition 2. Task. In a workflow W_k, a task t_i can be described as <iDS_i, oDS_i, dc_i>, where iDS_i denotes the input datasets of t_i, oDS_i denotes the collection of output datasets, and t_i.dc_i represents the placement location, i.e., the datacenter at which the task is scheduled. There is a many-to-many map between the task set and the dataset: an item of data can be used by multiple tasks, and conversely, a task can also request or generate multiple datasets. It is worth noting that, because private datasets can only exist in a particular edge micro-datacenter, the tasks also fall into two categories: those that require or generate private datasets and those that do not. In the execution of a scientific workflow, a task can only be executed if it possesses all the datasets it requires.
Figure 3: Sample of data placement for scientific workflows in heterogeneous edge-cloud computing environments.

Definition 3. Datasets. While the traditional model only considers a single scientific workflow, in the proposed model, not only does the data need to be divided into public and private datasets, but distinctions also need to be made for datasets that are shared across multiple workflows. In this model, a dataset d_i is described as <dsize_i, cn_i, dc_i, pf_i, sf_i>, where dsize_i represents the size of d_i, and cn_i represents the set of tasks that either generate the dataset d_i or need it as input; d_i.dc_i is the datacenter that stores the dataset d_i, and pf_i is a flag that indicates whether or not d_i is a private dataset, where pf_i = 0 denotes a public dataset and pf_i = 1 denotes a private dataset. The last attribute, sf_i, indicates whether or not the dataset d_i is shared between different workflows, where sf_i = 0 denotes an unshared dataset and sf_i = 1 denotes a shared dataset.
Definition 4. Data placement. In this model, the data placement is defined as S = (W, D, DC, Map, T_trans, N_iter), where Map denotes the dataset-datacenter mapping. The Map can be divided into two cases, depending on the datasets. For the private data placement map, which can be formalized as Map.pri = \bigcup_{d_i \in D.pri} {d_i -> d_i.dc}, these datasets can be mapped directly to their fixed locations (i.e., edge micro-datacenters). For the public data placement map, Map.pub = \bigcup_{d_i \in D.pub} {d_i -> d_i.dc}, these datasets are mapped by the data placement algorithm. During the execution of all the workflows in the model, T_trans denotes the total data transfer time and N_iter represents the map time. T_trans can be calculated by (1), and N_iter can be calculated as the number of iterations of the data placement algorithm (DPA).
T_{trans} = \sum_{i=1}^{|DC|} \sum_{j=i}^{|DC|} \sum_{k=1}^{|D|} \frac{dsize_k}{band_{ij}} \cdot g_{ijk}    (1)

where g_{ijk} is used to discern whether a dataset d_k is transferred between different datacenters throughout the process of data scheduling; g_{ijk} = 0 indicates that d_k always stays in the same datacenter.
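As an illustration only, the following Python sketch evaluates Equation (1) for a candidate set of dataset movements; the function and variable names are ours, not from the original implementation:

from typing import Dict, List, Tuple

def transfer_time(dsize: Dict[str, float],             # dsize_k, e.g., in GB
                  band: Dict[Tuple[int, int], float],  # band_ij in GB/s (any consistent unit)
                  moves: List[Tuple[str, int, int]]    # (dataset, source dc, target dc)
                  ) -> float:
    # Compute T_trans of Equation (1): sum dsize_k / band_ij over every
    # cross-datacenter movement (g_ijk = 1); movements within one datacenter
    # (g_ijk = 0) contribute nothing.
    total = 0.0
    for k, i, j in moves:
        if i != j:  # g_ijk = 1 only when the dataset actually crosses datacenters
            b = band[(i, j)] if (i, j) in band else band[(j, i)]
            total += dsize[k] / b
    return total

# Hypothetical usage: one 2.1 GB dataset moved from dc1 to dc2 at 0.01 GB/s (roughly 10 M/s)
print(transfer_time({"d3": 2.1}, {(1, 2): 0.01}, [("d3", 1, 2)]))  # 210.0 seconds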
Obtaining the lowest data transfer time and the shortest
map time is the optimal workflow strategy for data place-
ment mapping. Hence, the characteristics of data and the
data placement strategy are the two most important factors
influencing our model. We have considered the privacy and sharing of the data, and we will describe our data placement strategy in the next section. Therefore, our data placement model comprehensively calculates the data transfer time for scientific workflows in heterogeneous edge-cloud computing environments.

Table I: Dataset Size of Data-Sharing Scientific Workflows

Dataset    d1   d2   d3   d4   d5   d6   d7   d8   d9   d10  d11
Size (GB)  3.1  5.4  2.1  1.3  1.1  2.3  1.7  2.1  1.5  0.5  4.0

Table II: The Final Placement Location of Each Dataset

Dataset     d1   d2   d3   d4   d5   d6   d7   d8   d9   d10  d11
Fig. 2(a)   dc1  dc1  dc1  dc1  dc2  dc2  dc2  dc2  dc3  dc3  dc3
Fig. 2(b)   dc1  dc1  dc2  dc1  dc2  dc3  dc2  dc2  dc3  dc3  dc3
B. Problem Analysis
In order to analyze the data placement problem in this
model and illustrate the proposed data placement strat-
egy for data-sharing scientific workflows, Figure 2(a) and
Figure 2(b) show two different data placement strategies
in a simple data-sharing scientific workflow scenario. The
scenario has two individual workflows in the same region,
called Workflow 1 and Workflow 2, and there are 11 tasks
{t_1, t_2, ..., t_11}, 11 datasets {d_1, d_2, ..., d_11}, and three datacenters {dc_1, dc_2, dc_3}. Specifically, there are four private datasets, which are deployed separately in two edge micro-datacenters, dc_2 and dc_3, and several datasets are shared between different workflows. Table I lists the respective sizes of all the datasets in the scenario. In addition, dc_1 is a cloud datacenter with unlimited storage capacity, and dc_2 and dc_3 are edge datacenters, each with a storage capacity of 20 GB. It is well known that the bandwidth between the cloud and edge micro-datacenters is much lower than the bandwidth between the edge micro-datacenters [18]. In this example, the bandwidths {band_12, band_13, band_23} across the datacenters were set as {10 M/s, 20 M/s, 150 M/s}.
Compared with the data placement results of these two
strategies, the result of the strategy shown in Figure 2(a)
required the datasets to be moved eight times; the amount
of data movement was 11.6 GB, and the data transfer
time was calculated to be 600 s. The strategy described
in Figure 2(b) was found to have six data movements;
the amount of data movement was 8.4 GB, and the data
transfer time was approximately 280 s. Table 2 shows the
final placement location of each dataset under these two
strategies. Expectedly, different data placement strategies
will significantly affect the data transfer time and efficiency
of the scientific workflow process. In these terms, the data
placement strategy shown in Figure 2(b) is superior to that
shown in Figure 2(a).
In order to simulate the possible real-life scenario more
accurately, another geo-distributed organization was added
to the sample, as shown in Figure 3. Based on this real scenario, which involves the execution of multiple workflows in multi-region heterogeneous edge-cloud computing environments, the model in this study considers multiple cloud datacenters and multiple scientific workflows. The proposed data placement strategy based on DE-DPSO places datasets while considering not only the bandwidth between datacenters, the number of edge micro-datacenters, and the storage capacity of edge micro-datacenters, but also the sharing of data during workflow execution and the number of iterations the data placement algorithm needs to find the optimal placement locations.
IV. DATA PLACEMENT STRATEGY
A model has been constructed based on real-life param-
eters, which involve multiple geo-distributed data-sharing
scientific workflows. In this section, a novel data placement
strategy based on the model is described, which provides the
algorithm for finding a better data placement map to mini-
mize data transfer time and maximize the speed of finding the optimal location. First, a data placement algorithm (DE-DPSO-DPA) is designed to determine the final locations of
public datasets. Afterwards, the proposed data placement
strategy specifically based on DE-DPSO-DPA is described.
A. DE-DPSO-DPA
The PSO algorithm and the DE algorithm are both population-based heuristic search algorithms, proposed in [19] and [20], respectively. A detailed elaboration of these algorithms is beyond the scope of this paper; our DE-DPSO algorithm not only enables traditional PSO to solve discrete problems such as data placement through problem encoding, but also skillfully incorporates the DE algorithm, an efficient global optimizer.
1) Problem Encoding: Inspired by the coding strategy
mentioned in [13], a discrete coding strategy is provided
for the data placement problem to satisfy the well-known
principles of completeness, non-redundancy, and viability.
Most meta-heuristic algorithms filter the optimal solution
by generating n-dimensional candidate particles. In our data
placement problem, a particle maps a solution for data-
sharing scientific workflows in the proposed distributed
model. The ith particle in the algorithm's tth iteration is formulated as

X_{i,t} = {x^1_{i,t}, x^2_{i,t}, ..., x^d_{i,t}}    (2)

where d is the dimension of this particle and represents the number of datasets in the model. Meanwhile, x^k_{i,t} denotes the placement location of the kth dataset after the tth iteration of the algorithm. It is worth noting that, for a particle in this model, there are Q dimensions representing the private datasets, whose locations are fixed, and H dimensions representing the datasets shared among multiple workflows.
Algorithm 1 DE-DPSO-DPA
Input:
    T (iter_max), t (current iteration), n (particles),
    D (the dimension), F (scaling factor), Cr_g, Cr_p
Output:
    Res (the best optimal solution)
1:  Set parameters and initialize all datasets' placement;
2:  Set particle dimension H = |D.pub|;
3:  for i = 1 to swarm size n do
4:      for k = 1 to H do
5:          Initialize x^k_i randomly;
6:      end for
7:      Initialize pbest
8:  end for
9:  Initialize gbest
10: while t <= T do
11:     for i = 1 to n do
12:         Select a, b randomly from the particles with a != b;
13:         mutation(x_{i,t-1}, F, x_{a,t-1}, x_{b,t-1}) by Equation (3)
14:         crossover(x_{pbest,t-1}, u_{i,t}, Cr_p) by Equation (4)
15:         crossover(x_{gbest,t-1}, u_{i,t}, Cr_g) by Equation (4)
16:         Select x_{i,t} by Equation (5)
17:     end for
18:     t = t + 1
19: end while
20: Update the best optimal solution Res
21: Output Res
After determining the correspondence between each particle and a candidate solution in our model, a data placement algorithm is proposed such that each public dataset can be placed in an appropriate datacenter, in order to achieve the goal of minimizing data transmission time and finding the optimal locations in the shortest time.
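A minimal sketch of this encoding (our own illustration; dataset and datacenter identifiers are made up): a particle is a vector whose kth entry is the index of the datacenter chosen for the kth public dataset, while private datasets keep their fixed locations.

import random
from typing import Dict, List

def init_particle(public_datasets: List[str],
                  candidate_dcs: List[int]) -> List[int]:
    # One particle X_i: position k is the datacenter index chosen for the
    # kth public dataset (Equation (2)); its dimension is H = |D.pub|.
    return [random.choice(candidate_dcs) for _ in public_datasets]

def decode(particle: List[int],
           public_datasets: List[str],
           private_map: Dict[str, int]) -> Dict[str, int]:
    # Turn a particle into a full dataset -> datacenter map by adding the
    # fixed placements of the private datasets (Map.pri).
    placement = dict(private_map)              # private datasets stay where they are
    placement.update(zip(public_datasets, particle))
    return placement

# Hypothetical usage (dataset and datacenter ids are invented for illustration)
swarm = [init_particle(["d1", "d4", "d7"], candidate_dcs=[1, 2, 3]) for _ in range(5)]
print(decode(swarm[0], ["d1", "d4", "d7"], private_map={"d5": 2, "d9": 3}))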
2) Algorithm description: The traditional PSO algorithm is a classical heuristic algorithm, and its update strategy is based on the velocity and position of the particles. When it is used as a data placement algorithm, there are two problems:
- It easily falls into local optima.
- It cannot handle discrete problems, such as the one in this study.
To address these issues, the DE algorithm is used to expand the search capability, and the algorithm is also discretized based on the application scenarios. The pseudocode of our DE-DPSO-DPA algorithm for scientific workflows is presented in Algorithm 1. Specifically, in the first part (lines 1-9), all the datasets and the algorithm parameters are initialized. These steps are the preprocessing for placing shared datasets, in which the algorithm evaluates the current fitness of the particles and updates the best values with the best result. Thereafter, according to the steps described in lines 10-17, the update strategy for each particle at every iteration adopts the mutation and crossover operators of DE to update the global best solution, which effectively prevents the algorithm from falling into a local optimum. The mutation strategy for the ith particle at the tth iteration is formulated as follows:

u_{i,t} = x_{i,t-1} + F \cdot (x_{a,t-1} - x_{b,t-1})    (3)
where u_{i,t} is a new feasible particle and F is the scale factor. After the mutation, the generated particle is changed by a crossover strategy. For the individual-cognition and social-cognition components, the crossover operator of our algorithm is shown in Equation (4):

y = crossover(x_1, x_2, prob), where
y[i] = x_1[i]  if Random.r < prob
y[i] = x_2[i]  if Random.r >= prob    (4)

where Random.r is a random factor between 0 and 1, and prob is a parameter used to control the extent of the crossover operation. In order to obtain the optimal solution, the operation is executed twice in each iteration of the algorithm. Specifically, Cr_p denotes the crossover parameter that controls the distance between the current particle position and the local optimal position. Meanwhile, Cr_g is a parameter that proportionally selects indexes in an old particle and replaces the segment between them with the gbest particle segment. Here, y, x_1, and x_2 represent particles in different cases. The final particle, y, obtained by the above operations at the tth iteration is denoted as w_{i,t}.
The optimality of a particle is usually measured by the
fitness function: the smaller it is, the better the performance.
In the present data placement problem, clearly, the better
particle must have a smaller data transfer time and a higher
speed of finding the final location of all the datasets. In
the discrete encoding method, the fitness value fit() is
defined as the data transmission time. In summary, the
update strategy can be described as follows:
x_{i,t} = w_{i,t}      if fit(w_{i,t}) < fit(gbest)
x_{i,t} = x_{i,t-1}    if fit(w_{i,t}) >= fit(gbest)    (5)
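To show how Equations (3)-(5) fit together in one iteration, here is a non-authoritative Python sketch of the per-particle update; the rounding and index wrapping used to keep the result discrete are our own simplification of the recoded operators, not the paper's exact formulation.

import random
from typing import Callable, List

def mutate(x_prev: List[int], x_a: List[int], x_b: List[int], F: float,
           n_dc: int) -> List[int]:
    # Equation (3): u = x_{i,t-1} + F * (x_a - x_b), rounded and wrapped so
    # every entry stays a valid datacenter index (our discretization choice).
    return [int(round(xi + F * (ai - bi))) % n_dc
            for xi, ai, bi in zip(x_prev, x_a, x_b)]

def crossover(x1: List[int], x2: List[int], prob: float) -> List[int]:
    # Equation (4): take x1[i] with probability prob, otherwise x2[i].
    return [x1[i] if random.random() < prob else x2[i] for i in range(len(x1))]

def update(x_prev: List[int], pbest: List[int], gbest: List[int],
           x_a: List[int], x_b: List[int],
           F: float, cr_p: float, cr_g: float, n_dc: int,
           fit: Callable[[List[int]], float]) -> List[int]:
    # One DE-DPSO step: mutation, two crossovers, then the greedy
    # selection of Equation (5).
    u = mutate(x_prev, x_a, x_b, F, n_dc)
    u = crossover(pbest, u, cr_p)   # individual-cognition crossover
    w = crossover(gbest, u, cr_g)   # social-cognition crossover
    return w if fit(w) < fit(gbest) else x_prev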
B. DE-DPSO-DPS
Algorithm 2 is designed to describe our data placement
strategy for data-sharing scientific workflows. It consists
of three parts: data placement of private datasets (lines 1-3), data placement of public datasets (lines 4-6), and the combination of the former two parts (lines 7-8).
In part one, after initializing the current storage of all
the datacenters, the total data transfer time, and the number
of DE-DPSO-DPA iterations to 0, all datasets are classified
into public datasets and private datasets. Subsequently, these
503
Algorithm 2 DE-DPSO-DPS
Input:
    Initial datasets DS, tasks T, datacenters DC
Output:
    PM (data placement map), data transfer time Ttrans,
    number of iterations Niter
1: Initialize data transfer time Ttrans = 0 and the number of algorithm iterations Niter = 0;
2: Divide datasets DS into DS.pub and DS.pri;
3: Allocate datasets DS.pri to DC;
4: Divide datasets DS.pub into DS.ush and DS.sh;
5: Allocate datasets DS.pub to DC;
6: DE-DPSO-DPA;
7: Calculate Ttrans and Niter;
8: Output Result
private datasets are distributed to fixed datacenters. In part
two, the remaining public datasets are divided into unshared
and shared datasets. These unshared and shared datasets
are then respectively distributed to proper datacenters ac-
cording to the DE-DPSO-DPA. In part three, this algorithm
assembles the maps, the total data transmission time, and
the number of iterations over all the datasets. Two improvements were made: (1) all data placement solutions are transformed into a closed annular search space in order to ensure that they are effective and are mapped to actual datacenters; (2) the particle velocity is constrained based on the dataset characteristics (e.g., public/private and shared/unshared) in order to guarantee that DE-DPSO-DPS can locate appropriate data placement solutions.
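A small sketch of improvement (1), in our own words and with hypothetical names: any real-valued candidate position is wrapped back onto the ring of valid datacenter indices, and dimensions corresponding to private (fixed-location) datasets are left untouched, which is one possible reading of the velocity constraint in (2).

from typing import Dict, List

def to_ring(position: List[float], n_dc: int) -> List[int]:
    # Map an unconstrained candidate position onto the closed annular search
    # space: round each coordinate and wrap it into the valid index range.
    return [int(round(p)) % n_dc for p in position]

def freeze_private(position: List[int], fixed: Dict[int, int]) -> List[int]:
    # Keep the dimensions of private (fixed-location) datasets unchanged,
    # a simple way to constrain the update for those coordinates.
    return [fixed.get(k, v) for k, v in enumerate(position)]

# Hypothetical usage: 5 datasets over 3 datacenters, dataset index 2 is private at dc 1
print(freeze_private(to_ring([2.6, -0.4, 7.2, 1.1, 3.9], n_dc=3), fixed={2: 1}))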
The data placement problem for data-sharing scientific workflows in heterogeneous edge-cloud computing environments can be represented by Equation (6). Its core purpose is to pursue the minimum total data transmission time while satisfying the storage capacity constraint of each datacenter:

Minimize T_{trans}
subject to \forall i, \sum_{j=1}^{|D|} dsize_j \cdot l_{ij} \leq capacity_i    (6)

where l_{ij} is a flag that represents whether the datacenter dc_i stores the dataset d_j; l_{ij} = 1 indicates that it does, while l_{ij} = 0 indicates that it does not. capacity_i denotes the storage capacity of the ith datacenter.
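The following sketch (illustrative, not the authors' code) shows one way Equation (6) can be checked for a candidate placement and folded into the fitness used by DE-DPSO-DPA; penalizing infeasible placements is a common constraint-handling choice that the paper does not spell out.

from typing import Callable, Dict

def within_capacity(placement: Dict[str, int],
                    dsize: Dict[str, float],
                    capacity: Dict[int, float]) -> bool:
    # Constraint of Equation (6): for every datacenter i, the sum of dsize_j
    # over the datasets placed there must not exceed capacity_i.
    used: Dict[int, float] = {}
    for d, dc in placement.items():
        used[dc] = used.get(dc, 0.0) + dsize[d]
    return all(load <= capacity[dc] for dc, load in used.items())

def fitness(placement: Dict[str, int],
            dsize: Dict[str, float],
            capacity: Dict[int, float],
            transfer_time: Callable[[Dict[str, int]], float],
            penalty: float = 1e9) -> float:
    # Fitness = T_trans for feasible placements, plus a large penalty otherwise.
    t = transfer_time(placement)
    return t if within_capacity(placement, dsize, capacity) else t + penalty

# Hypothetical check: a 20 GB edge datacenter holding 5.4 GB + 2.3 GB is feasible
print(within_capacity({"d2": 2, "d6": 2}, {"d2": 5.4, "d6": 2.3}, {2: 20.0}))  # True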
V. EXPERIMENTAL RESULTS AND ANALYSIS
In this section, several simulation experiments, designed
to evaluate the effectiveness of the proposed data placement
strategy, are described and the impact factors in our data
placement model are discussed. Comparing the results with
those from other strategies and also considering the different
scenarios of these factors, the advantages of our proposed
strategy in this context, as well as the impact factors, are
evaluated. All the experiments were conducted on a machine
with the following specifications: an Intel(R) Core(TM) i7-4790 CPU @ 3.40 GHz, 16 GB of RAM, Windows 10 (64-bit), and IntelliJ IDEA 2019.2.4.
A. Experimental Setup
In the developed DE-DPSO-DPA, the initial population
size was set to 100, the maximum number of iterations
was set to 2000, the scaling factor was set to 0.15, and
Cr_g and Cr_p were both set to 0.1. The synthetic workflows from Montage in astronomy released by Bharathi et al. [21] were used, and both the number of datasets and the structures differed among them. In order to
discuss the impact factors in our model, the number of edge
micro-datacenters was changed from three to five, and the
number of cloud datacenters was set at two. The storage
capacity of edge micro-datacenters was the same in the same
simulation, varying from 150 GB to 300 GB in steps of 50
GB. In addition, the bandwidth between different datacenters in the basic experiment was set as follows (in M/s):
Bandwidth = \begin{pmatrix}
  -  &  5  &  5   &  5   &  5  \\
  5  &  -  &  20  &  20  &  20 \\
  5  &  20 &  -   &  100 &  150 \\
  5  &  20 &  100 &  -   &  200 \\
  5  &  20 &  150 &  200 &  -
\end{pmatrix}    (7)
Here, the bandwidth between the two cloud datacenters was set to 5 M/s, the bandwidth between an edge micro-datacenter and a cloud datacenter was 20 M/s, and the bandwidths between different edge micro-datacenters were set to {100 M/s, 150 M/s, 200 M/s}. In the basic experiment, there are three edge micro-datacenters, and their storage capacity was set to 150 GB. To compare the performance of these data placement strategies under different bandwidths across the edge micro-datacenters, we scaled the bandwidth across different edge micro-datacenters to {0.5, 0.8, 1.5, 3, 5} times that of the basic experiment.
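For concreteness, the sketch below (ours, not part of the paper's tooling) builds a bandwidth matrix following Equation (7) above, assuming indices 0-1 are the two cloud datacenters and 2-4 the three edge micro-datacenters, and scales only the edge-to-edge entries by the factors used in the experiments:

# Bandwidth matrix in M/s mirroring Equation (7); the diagonal is set to 0 as a placeholder.
BASE = [
    [0,   5,   5,   5,   5],
    [5,   0,  20,  20,  20],
    [5,  20,   0, 100, 150],
    [5,  20, 100,   0, 200],
    [5,  20, 150, 200,   0],
]

def scale_edge_links(base, factor, n_cloud=2):
    # Return a copy of the matrix with only the edge-to-edge bandwidths scaled,
    # as in the {0.5, 0.8, 1.5, 3, 5}x experiments.
    n = len(base)
    out = [row[:] for row in base]
    for i in range(n_cloud, n):
        for j in range(n_cloud, n):
            if i != j:
                out[i][j] = base[i][j] * factor
    return out

for f in (0.5, 0.8, 1.5, 3, 5):
    m = scale_edge_links(BASE, f)
    print(f, m[2][3], m[2][4], m[3][4])  # the scaled edge-to-edge links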
B. Performance Comparison
Four other data placement strategies were compared with
our DE-DPSO-DPS. The deployment of data in the other
strategies was based on the following algorithms: random,
DE, DPSO, and GA-DPSO. In view of the fact that there are several similarities between the distributed cloud computing environment and our heterogeneous edge-cloud computing environment, the methods used in the former can be refined, applied to our model, and compared with our strategy.
When the strategy is random, the number of iterations is
1, so the Niter was ignored. DE, DPSO, and GA-DPSO
are meta-heuristic algorithms, which need to converge and
may generate different results in each experiment. According
to [13], if the fitness values are consistent for over 80
consecutive iterations, the algorithm is considered to have
converged. The data transfer time and the speed of finding the final location for all datasets were measured as the average of 100 repeated experiments.
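A small helper (ours, not the authors' code) that implements the convergence rule borrowed from [13], i.e., stopping once the best fitness has stayed unchanged for 80 consecutive iterations:

from typing import List

def has_converged(fitness_history: List[float], window: int = 80,
                  tol: float = 0.0) -> bool:
    # True if the best fitness has not improved by more than tol over the
    # last `window` iterations (the criterion adopted from [13]).
    if len(fitness_history) < window:
        return False
    recent = fitness_history[-window:]
    return max(recent) - min(recent) <= tol

# Hypothetical usage
history = [100.0] * 30 + [90.0] * 80
print(has_converged(history))  # True: the last 80 values are identical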
Table III: Comparison of Different Strategies with Different Bandwidths
(Bandwidth scale of edge micro-datacenters; Ttrans is in seconds; Niter is the number of DPA iterations.)

                     0.5                0.8                1.5                3                  5
Algorithms        Ttrans    Niter    Ttrans    Niter    Ttrans    Niter    Ttrans    Niter    Ttrans    Niter
Random          36564.17       -   26032.87       -   14787.45       -    9116.71       -    6788.63       -
DE-DPS           7883.97    1707    6182.27    1442    4845.19    1186    4088.46    1042    3787.79    1489
DPSO-DPS         8092.25     165    6223.92     182    4885.87     135    4114.91     117    3812.82     194
GA-DPSO-DPS      7827.42    1315    6112.22    1384    4830.79    1664    4081.86    1523    3786.71    1410
DE-DPSO-DPS      7818.04     211    6069.18     351    4830.34     327    4082.89     265    3785.50     198

Table IV: Comparison of Different Strategies with Different Storage Capacities of Edge micro-Datacenters

                    150 GB             200 GB             250 GB             300 GB             350 GB
Algorithms        Ttrans    Niter    Ttrans    Niter    Ttrans    Niter    Ttrans    Niter    Ttrans    Niter
Random          19267.53       -   13947.78       -   14300.76       -   13413.87       -   12729.17       -
DE-DPS           5610.33     968    5483.35    1038    4460.15    1272    4383.37    1467    4358.24     769
DPSO-DPS         5685.15     151    5428.95     105    5877.73     109    4315.18     137    4268.19     105
GA-DPSO-DPS      5601.00    1485    5385.59    1263    4479.62    1549    4258.96    1623    4164.98    1637
DE-DPSO-DPS      5585.15     218    5382.84     323    4466.68     312    4251.91     251    4161.89     225
Table III and Table IV show the results of comparing the different bandwidths and different storage capacities of the edge computing datacenters for the five data placement strategies in achieving the optimal result for our scientific workflows. As the bandwidth across different datacenters increased, the number of DPA iterations did not fluctuate very much, and the data transmission time
decreased significantly. As shown in Table IV, with the increase of the storage capacity, both the data transmission time and the number of iterations are reduced. For data
transmission time, no matter how the impact factors change,
DE-DPSO-DPS is the most optimal. For the DPA iteration times, only DPSO-DPS requires slightly fewer iterations than DE-DPSO-DPS, which is explained by the fact that DPSO is extremely prone to falling into local optima.

Figure 4: Comparison of different strategies with and without data sharing. (a) Data transmission time; (b) iteration times of DPAs.
Figure 4 shows the data transfer time and the number
of iterations of different data placement strategies with
and without data sharing. Our data-sharing-based model is
superior to those in which each workflow is considered
separately, in terms of the total data transfer time under
the same data placement strategy. Additionally, the proposed
data placement strategy results in minimum data transfer
time, regardless of whether sharing is considered or not.
Furthermore, because all workflows are treated as one, the model that considers shared datasets shows a tremendous advantage in DPA iteration times for each strategy.

Figure 5: Comparison of different strategies with different numbers of edge micro-datacenters. (a) Data transmission time; (b) iteration times of DPAs.
Figure 5 depicts the data transmission time and the num-
ber of iterations of the different data placement strategies
with the different number of edge micro-datacenters. As
the number of edge micro-datacenters increased, the speed
of data movement increased, and the data transmission
time also increased significantly. This result also implies
that the change in the number of edge micro-datacenters
will not affect the superiority of DE-DPSO-DPS over other
strategies.
VI. CONCLUSION
This study proposes a novel data placement strategy for
data-sharing scientific workflows in a heterogeneous edge-
cloud computing environment, based on the needs to reduce
data transmission time and increase the speed of optimal
data placement. A data placement model which considers
multiple workflows distributed across different geographic
regions is presented, and a data placement algorithm is
designed to allocate public datasets. Experimental results show that the data placement strategy based on this algorithm can effectively reduce the data transmission time during scientific workflow execution, as well as increase the speed of finding the optimal data placement, compared to other state-of-the-art algorithms. In addition, through the
discussion of the number of edge micro-datacenters; storage
capacity of edge micro-datacenters; bandwidth between dat-
acenters; and other factors that may affect the performance
of the data placement strategy, the advantages of our strategy,
in each case, were analyzed in more detail.
In the future, the impact of variations in the dataset sizes
and number of workflows as well as the impact of the
proportion of private datasets on scientific workflows on the
data placement strategies will be considered. Furthermore, our model and DPS can be broadly applied to solve complex data placement problems because they are not limited to scientific workflows in heterogeneous edge-cloud computing environments.
ACKNOWLEDGMENT
The work of this paper is supported by National
Key Research and Development Program of China
(2019YFB1405000), National Natural Science Foundation
of China under Grant (No. 61873309, No. 61972034, No. 61572137, and No. 61728202), and Shanghai Science and Technology Innovation Action Plan Project under Grant (No. 19510710500, No. 18510732000, and No. 18510760200).
REFERENCES
[1] A. Kashlev, S. Lu, and A. Chebotko, “Typetheoretic approach to the shimming problem in scientific workflows,” IEEE Transactions on Services Computing, vol. 8, no. 5, pp. 795–809, 2015.
[2] H. Li, K. C. Chan, M. Liang, and X. Luo, “Composition
of resource-service chain for cloud manufacturing,” IEEE
Transactions on industrial informatics, vol. 12, no. 1, pp.
211–219, 2016.
[3] T. L. Duc, R. G. Leiva, P. Casari, and P.-O. Östberg, “Machine learning methods for reliable resource provisioning in edge-cloud computing: A survey,” ACM Computing Surveys (CSUR), vol. 52, no. 5, p. 94, 2019.
[4] Z. Wu, Z. Lu, P. C. Hung, S.-C. Huang, Y. Tong, and Z. Wang,
“Qamec: A qos-driven iovs application optimizing deploy-
ment scheme in multimedia edge clouds,” Future Generation
Computer Systems, vol. 92, pp. 17–28, 2019.
[5] Z. Lu, N. Wang, J. Wu, and M. Qiu, “Iotdem: An iot big data-
oriented mapreduce performance prediction extended model
in multiple edge clouds,” Journal of Parallel and Distributed
Computing, vol. 118, pp. 316–327, 2018.
[6] Q. Zhang, L. T. Yang, Z. Yan, Z. Chen, and P. Li, “An efficient
deep learning model to predict cloud workload for industry
informatics,” IEEE transactions on industrial informatics,
vol. 14, no. 7, pp. 3170–3178, 2018.
[7] H. Wu, S. Deng, W. Li, J. Yin, X. Li, Z. Feng, and A. Y.
Zomaya, “Mobility-aware service selection in mobile edge
computing systems,” in 2019 IEEE International Conference
on Web Services (ICWS). IEEE, 2019, pp. 201–208.
[8] W. Shi, H. Sun, J. Cao, Q. Zhang, and W. Liu, “Edge computing-an emerging computing model for the internet of everything era,” Journal of Computer Research and Development, vol. 54, no. 5, pp. 907–924, 2017.
[9] W. Du, T. Lei, Q. Het, W. Liu, Q. Lei, H. Zhao, and W. Wang,
“Service capacity enhanced task offloading and resource allo-
cation in multi-server edge computing environment,” in 2019
IEEE International Conference on Web Services (ICWS).
IEEE, 2019, pp. 83–90.
[10] Y. Shao, C. Li, and H. Tang, “A data replica placement
strategy for iot workflows in collaborative edge and cloud
environments,” Computer Networks, vol. 148, pp. 46–59,
2019.
[11] X. Chen, S. Tang, Z. Lu, J. Wu, Y. Duan, S.-C. Huang, and
Q. Tang, “idisc: A new approach to iot-data-intensive service
components deployment in edge-cloud-hybrid system,” IEEE
Access, vol. 7, pp. 59 172–59 184, 2019.
[12] X. Li, L. Zhang, Y. Wu, X. Liu, E. Zhu, H. Yi, F. Wang,
C. Zhang, and Y. Yang, “A novel workflow-level data place-
ment strategy for data-sharing scientific cloud workflows,”
IEEE Transactions on Services Computing, 2019.
[13] B. Lin, F. Zhu, J. Zhang, J. Chen, X. Chen, N. Xiong,
and J. Lloret, “A time-driven data placement strategy for
a scientific workflow combining edge computing and cloud
computing,” IEEE Transactions on Industrial Informatics,
2019.
[14] D. Nukarapu, B. Tang, L. Wang, and S. Lu, “Data replication
in data intensive scientific applications with performance
guarantee,” IEEE Transactions on Parallel and Distributed
Systems, vol. 22, no. 8, pp. 1299–1306, 2011.
[15] D. Yuan, Y. Yang, X. Liu, and J. Chen, “A data placement
strategy in scientific cloud workflows,” Future Generation
Computer Systems, vol. 26, no. 8, pp. 1200–1214, 2010.
[16] M. Wang, J. Zhang, F. Dong, and J. Luo, “Data placement
and task scheduling optimization for data intensive scientific
workflow in multiple data centers environment,” in 2014
Second International Conference on Advanced Cloud and Big
Data. IEEE, 2014, pp. 77–84.
[17] H. Li, D. Yang, W. Su, J. Lü, and X. Yu, “An overall distribution particle swarm optimization MPPT algorithm for photovoltaic system under partial shading,” IEEE Transactions on Industrial Electronics, vol. 66, no. 1, pp. 265–275, 2019.
[18] J. Meng, H. Tan, C. Xu, W. Cao, L. Liu, and B. Li, “Dedas:
Online task dispatching and scheduling with bandwidth con-
straint in edge computing,” in IEEE INFOCOM 2019-IEEE
Conference on Computer Communications. IEEE, 2019, pp.
2287–2295.
[19] J. Kennedy and R. Eberhart, “Particle swarm optimization
(pso),” in Proc. IEEE International Conference on Neural
Networks, Perth, Australia, 1995, pp. 1942–1948.
[20] A. K. Qin, V. L. Huang, and P. N. Suganthan, “Differen-
tial evolution algorithm with strategy adaptation for global
numerical optimization,” IEEE transactions on Evolutionary
Computation, vol. 13, no. 2, pp. 398–417, 2008.
[21] S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M.-H.
Su, and K. Vahi, “Characterization of scientific workflows,”
in 2008 third workshop on workflows in support of large-scale
science. IEEE, 2008, pp. 1–10.
... As far as we know, there are only a few algorithms based on meta-heuristic approaches that have been proposed for scientific workflow data placement problems in cloud computing. For instance, [11][12][13][14] use NSGA-II, ACO, PSO, and PSO algorithms, respectively. However, our approach has some novelties compared to these approaches. ...
... Ant colony optimization-based data placement of dataintensive geospatial workflow, ACO-DPDGW, is proposed in [12] aiming to minimize the data transfer time. Taking advantage of both edge and cloud computing, in [13] authors propose DE-DPSO-DPS a data placement model, which uses a discrete particle swarm optimization algorithm with differential evolution (DE-DPSO-DPA). The goal is to balance data transfer time and data placement cost. ...
... 2. According to (11) and (12), the communication cost (13), is defined as the product of the number of switches on the routing path from cs i to cs j and the energy used by each switch. ...
Article
Full-text available
The processing of scientific workflow (SW) in geo-distributed cloud computing holds significant importance in the placement of massive data between various tasks. However, data movement across storage services is a main concern in the geo-distributed data centers, which entails issues related to the cost and energy consumption of both storage services and network infrastructure. Aiming to optimize data placement for SW, this paper proposes EQGSA-DPW a novel algorithm leveraging quantum computing and swarm intelligence optimization to intelligently reduce costs and energy consumption when a SW is processed in multi-cloud. EQGSA-DPW considers multiple objectives (e.g., transmission bandwidth, cost and energy consumption of both service and communication) and improves the GSA algorithm by using the log-sigmoid transfer function as a gravitational constant G and updating agent position by quantum rotation angle amplitude for more diversification. Moreover, to assist EQGSA-DPW in finding the optima, an initial guess is proposed. The performance of our EQGSA-DPW algorithm is evaluated via extensive experiments, which show that our data placement method achieves significantly better performance in terms of cost, energy, and data transfer than competing algorithms. For instance, in terms of energy consumption, EQGSA-DPW can on average achieve up to \(25\%\), \(14\%\), and \(40\%\) reduction over that of GSA, PSO, and ACO-DPDGW algorithms, respectively. As for the storage services cost, EQGSA-DPW values are the lowest.
Article
Full-text available
The exponential growth of Internet of Things (IoT) devices has ushered in an era of vast data generation, necessitating abundant resources for data storage and processing. Cloud environment forms a notorious paradigm for such data accommodation. Meanwhile, the privacy issues assimilated in IoT data provoke huge complications in data placement. In addition, it is significant to consider factors such as energy efficiency, energy utility of cloud and data access time of IoT applications while allotting resources for IoT data. In light of this circumstance, this research proposes a Fuzzy- Particle Swarm Optimization (PSO) framework to optimize IoT-oriented data placement in cloud data centers. The fuzzy Logic is adept at handling the uncertainty inherent in parameters such as resource availability and privacy sensitivity. Through membership functions and a Fuzzy Inference System, imprecise attributes are quantified, enabling smarter decision-making. Using its intelligence, it prioritizes the task with high sensitivity and resource availability to perform ideal allocation preferring best suitable resource feature unit. The integration of improved PSO leverages its capability to explore complex solution spaces and converge on optimal solutions. The greedy strategy in improved PSO assists in exploring most-optimal virtual machine instance in cloud to improve its resource efficacy. These facets culminate in a framework that holistically manages IoT-generated data, optimizing energy consumption, resource utilization, and data access time, while simultaneously upholding privacy constraints. The results underscore the potency of this approach in offering optimal data management in cloud environments, achieving better resource utilization of 89%, privacy sensitivity of 98.5%, and less energy consumption of 0.7 kWh.
Article
With the rapid development of the Internet of things (IoT) and mobile communication technology, the amount of data related to industrial Internet of things (IIoT) applications has shown a trend of explosive growth, and hence edge‐cloud collaborative environment becomes one of the most popular paradigms to place the IIoT applications data. However, edge servers are often heterogeneous and capacity limited while having lower access delay, so there is a contradiction between capacity and latency while using edge storage. Additionally, when IIoT applications deployed crossing edge regions, the impact of data replication and data privacy should not be ignored. These factors often pose challenges to proposing an effective data placement strategy to take full advantage of edge storage. To address these challenges, an effective data placement strategy for IIoT applications is designed in this article. We first analyze the data access time and data placement cost in an edge‐cloud collaborative environment, with the consideration of data replication and data privacy. Then, we design a data placement strategy based on ‐constraint and Lagrangian relaxation, to reduce the data access time and meanwhile limit the data placement cost to an ideal level. As a result, our proposed data placement strategy can effectively reduce data access time and control data placement costs. Simulation and comparative analysis results have demonstrated the validity of our proposed strategy.
Chapter
Business process management (BPM) is a crucial method for standardized and systematic management in the context of Industrial Internet, especially Industrial Internet of Things (IIoT). As the mobile internet develops rapidly and the volume of data generated by IoT devices increases, traditional storage schemes are no longer sufficient for managing mobile IIoT business data. To address this challenge, the cloud-edge collaborative data storage scheme has emerged as an effective solution. This approach leverages cloud data centers to provide mass storage capability, while also considering access delays by storing data on edge servers located near users. Given the constrained storage capacity of edge servers, the development of an efficient data placement strategy has emerged as a critical issue in enhancing the speed of data retrieval and task execution within IIoT business processes. In this chapter, we will explore the use of Bayesian optimization algorithm (BOA) in the context of IIoT resource placement strategy. Resource deployment plays a crucial role in optimizing business processes and achieving efficient utilization of resources. To enhance user experience, shorten overall running time, and reduce data transmission costs, a data placement strategy based on Bayesian optimization algorithm is discussed in this chapter. This strategy can allocate storage capacity on the edge side in a reasonable manner and determine the most appropriate edge server on which to place the data. The simulation results show that the BOA-based strategy outperforms those based on other optimization algorithms used in similar studies across varied scenarios.
Article
Full-text available
Large-scale software systems are currently designed as distributed entities and deployed in cloud data centers. To overcome the limitations inherent to this type of deployment, applications are increasingly being supplemented with components instantiated closer to the edges of networks—a paradigm known as edge computing. The problem of how to efficiently orchestrate combined edge-cloud applications is, however, incompletely understood, and a wide range of techniques for resource and application management are currently in use. This article investigates the problem of reliable resource provisioning in joint edge-cloud environments, and surveys technologies, mechanisms, and methods that can be used to improve the reliability of distributed applications in diverse and heterogeneous network environments. Due to the complexity of the problem, special emphasis is placed on solutions to the characterization, management, and control of complex distributed applications using machine learning approaches. The survey is structured around a decomposition of the reliable resource provisioning problem into three categories of techniques: workload characterization and prediction, component placement and system consolidation, and application elasticity and remediation. Survey results are presented along with a problem-oriented discussion of the state-of-the-art. A summary of identified challenges and an outline of future research directions are presented to conclude the article.
Article
Full-text available
With rapid development of the big data technology and the Internet, the requirements of human activities for data are getting higher and higher, and the increasing data volume has a high demand for data processing. The paradigm of the Internet of Things (IoT) has become a key component for edge-cloud-hybrid systems. In the edge environment, multiple IoT-data-intensive services will form a service combination. Due to the data transmission between different service components, there is a huge transmission delay in the process of IoT data transmission, which will affect the performance of the entire system. Therefore, by regarding the reduction of transmission delay as our optimization goal, we put forward iDiSC: a new heuristic approach for IoT-data-intensive service component deployment in the Edge-Cloud-Hybrid System. We also design the iDiSC model, then we optimize the model to select the optimal deployment scenario with the minimum guaranteed latency. Through a series of experiments, compared to genetic algorithm and simulated annealing algorithm, the experimental results show that the iDiSC algorithm has higher efficiency and performance for the problem of data-intensive service component deployment problem in the Edge-Cloud-Hybrid Environment.
Article
A solar photovoltaic (PV) system under partial shading conditions (PSC) has a non-monotonic P-V characteristic with multiple local maximum power points, which makes existing maximum power point tracking (MPPT) algorithms unable, or only poorly able, to track the global maximum power point (MPP). This paper proposes a novel overall distribution (OD) MPPT algorithm that rapidly narrows the search to the area near the global MPP. To locate the global MPP accurately, the OD stage can be combined with other intelligent algorithms; here the particle swarm optimization (PSO) algorithm is chosen because of its simplicity. The resulting OD-PSO MPPT algorithm is proposed and applied to global MPPT of a PV system, and simulation and experimental results demonstrate its effectiveness and accuracy in comparison with the existing PSO MPPT algorithm. The paper therefore provides a useful reference for MPPT of PV systems under partial shading conditions.
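The sketch below shows only the plain PSO stage of such an approach, searching a made-up multi-peaked P-V curve for its global maximum; the OD pre-scan from the cited paper and any real PV model are not reproduced, and the curve and PSO coefficients are assumptions.

```python
# Plain particle swarm optimization on a synthetic, multi-peaked P-V curve.
import numpy as np

rng = np.random.default_rng(2)

def pv_power(v):
    """Synthetic P-V curve with several local maxima over 0-40 V (illustrative only)."""
    return (60 * np.exp(-((v - 31) / 6) ** 2)
            + 35 * np.exp(-((v - 15) / 4) ** 2)
            + 20 * np.exp(-((v - 5) / 3) ** 2))

N, iters, w, c1, c2 = 12, 60, 0.6, 1.6, 1.6   # swarm size and PSO coefficients (assumed)
v = rng.uniform(0, 40, N)                      # particle positions = operating voltages
vel = np.zeros(N)
pbest, pbest_val = v.copy(), pv_power(v)
g = pbest[np.argmax(pbest_val)]                # global best voltage so far

for _ in range(iters):
    r1, r2 = rng.random(N), rng.random(N)
    vel = w * vel + c1 * r1 * (pbest - v) + c2 * r2 * (g - v)
    v = np.clip(v + vel, 0, 40)
    val = pv_power(v)
    improved = val > pbest_val
    pbest[improved], pbest_val[improved] = v[improved], val[improved]
    g = pbest[np.argmax(pbest_val)]

print(f"estimated global MPP: V = {g:.2f} V, P = {pv_power(g):.2f} W")
```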
Article
Compared to traditional distributed computing environments such as grids, cloud computing provides a more cost-effective way to deploy scientific workflows. Each task of a scientific workflow requires several large datasets that are located in different datacenters, resulting in serious data transmission delays. Edge computing reduces these delays and supports fixed storage of private scientific workflow datasets, but its storage capacity is a bottleneck. Combining the advantages of edge computing and cloud computing to rationalize the data placement of scientific workflows, and to optimize data transmission time across different datacenters, is a challenge. In this study, a self-adaptive discrete particle swarm optimization algorithm with genetic algorithm operators (GA-DPSO) was proposed to optimize data transmission time when placing data for a scientific workflow. The approach considered the characteristics of data placement that combines edge computing and cloud computing, as well as the factors impacting transmission delay, such as the bandwidth between datacenters, the number of edge datacenters, and the storage capacity of edge datacenters. The crossover and mutation operators of the genetic algorithm were adopted to avoid the premature convergence of the traditional particle swarm optimization algorithm, which enhanced the diversity of the population and effectively reduced the data transmission time. The experimental results show that the data placement strategy based on GA-DPSO can effectively reduce data transmission time during workflow execution when edge computing and cloud computing are combined.
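The following is a simplified sketch in the spirit of GA-DPSO: each particle encodes a dataset-to-datacenter placement, and GA-style crossover with personal and global bests plus mutation replaces the usual velocity update. The cost model, problem sizes, and operator details are assumptions for illustration, not the cited algorithm itself.

```python
# GA-operator discrete PSO sketch for a toy scientific-workflow data placement problem.
import random

random.seed(3)
N_DATA, N_DC = 10, 4
size = [random.uniform(1, 8) for _ in range(N_DATA)]                  # dataset sizes (GB)
bw = [[1.0 if a == b else random.uniform(0.1, 0.6) for b in range(N_DC)]
      for a in range(N_DC)]                                            # inter-DC bandwidth (GB/s)
# tasks: each task runs in a fixed datacenter and reads three datasets (synthetic)
tasks = [(random.randrange(N_DC), random.sample(range(N_DATA), 3)) for _ in range(8)]

def transfer_time(placement):
    """Total time to move each task's input datasets to the datacenter running it."""
    return sum(size[d] / bw[placement[d]][dc] for dc, reads in tasks for d in reads)

def crossover(a, b):
    cut = random.randrange(1, N_DATA)
    return a[:cut] + b[cut:]                                           # one-point crossover

def mutate(p, rate=0.1):
    return [random.randrange(N_DC) if random.random() < rate else g for g in p]

swarm = [[random.randrange(N_DC) for _ in range(N_DATA)] for _ in range(20)]
pbest = list(swarm)
gbest = min(swarm, key=transfer_time)

for _ in range(100):
    for i, p in enumerate(swarm):
        # "Velocity" step: recombine with personal and global bests, then mutate.
        child = mutate(crossover(crossover(p, pbest[i]), gbest))
        swarm[i] = child
        if transfer_time(child) < transfer_time(pbest[i]):
            pbest[i] = child
    gbest = min(pbest, key=transfer_time)

print("best placement:", gbest, "transfer time:", round(transfer_time(gbest), 2))
```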
Article
The convergence of edge and cloud computing combines their strengths: virtually unlimited shared storage and computing resources from the cloud, and low-latency data preprocessing at the edge. The collaboration of the two computing paradigms provides a real-time and cost-effective way to deploy Internet of Things (IoT) workflows among cooperating user groups. Since huge amounts of data are continuously generated by user devices, how to place them so as to reduce data access costs while meeting deadline constraints is a critical issue. This paper proposed a novel data replica placement strategy for coordinated processing of data-intensive IoT workflows in a collaborative edge and cloud computing environment. First, data replica placement is modelled as a 0–1 integer programming problem that accounts for overall data dependency, data reliability, and user cooperation. Then, the ITÖ algorithm, a variant of intelligent swarm optimization, is presented to solve this model. The experimental results show that the proposed method outperforms the compared algorithms: it not only finds higher-quality data replica placement solutions but also requires a lower computing budget than the traditional algorithms.
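To illustrate the shape of such a 0–1 model, the toy sketch below enumerates binary replica variables x[d][s] (dataset d replicated on site s) for a tiny instance and picks the feasible assignment with minimum access-plus-storage cost. The cost matrices, capacities, and objective are assumptions, and the ITÖ heuristic itself is not reproduced; exhaustive enumeration only works at this toy scale.

```python
# Brute-force solution of a toy 0-1 replica-placement model (illustrative only).
from itertools import product

DATASETS, SITES = 3, 3
access_cost = [[1, 4, 6],          # access_cost[d][s]: cost for users of dataset d
               [5, 1, 3],          # to read it from site s (synthetic units)
               [6, 4, 1]]
store_cost = [2, 2, 2]             # per-replica storage cost at each site
capacity = [2, 1, 2]               # maximum number of replicas each site can hold

best, best_cost = None, float("inf")
for flat in product([0, 1], repeat=DATASETS * SITES):
    x = [list(flat[d * SITES:(d + 1) * SITES]) for d in range(DATASETS)]
    # Constraint: every dataset has at least one replica somewhere.
    if any(sum(row) == 0 for row in x):
        continue
    # Constraint: site capacities are respected.
    if any(sum(x[d][s] for d in range(DATASETS)) > capacity[s] for s in range(SITES)):
        continue
    # Objective: cheapest reachable replica per dataset plus total storage cost.
    cost = sum(min(access_cost[d][s] for s in range(SITES) if x[d][s])
               for d in range(DATASETS))
    cost += sum(store_cost[s] * x[d][s] for d in range(DATASETS) for s in range(SITES))
    if cost < best_cost:
        best, best_cost = x, cost

print("replica matrix:", best, "objective:", best_cost)
```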
Article
Deploying applications to a single centralized cloud for service delivery is often infeasible because of the excessive latency and bandwidth limitations of the Internet, for example when transporting all IoV data to a big data processing service in a centralized cloud. Therefore, multi-cloud deployment, especially across multiple edge clouds, is a rising trend for cloud service provision. However, the heterogeneity of cloud services, complex deployment requirements, and the large problem space of multi-cloud deployment make deploying applications in a multi-cloud environment a difficult and error-prone decision-making process. Because of these difficulties, current SLA-based solutions lack a unified model to represent users' functional and non-functional requirements. Against this background, we propose QaMeC, a QoS-driven scheme for optimizing the deployment of IoV applications in multimedia edge clouds. The scheme builds a unified QoS model to shield the inconsistency of QoS calculation and uses the NSGA-II algorithm to solve the multi-cloud application deployment problem. Implementation and experiments show that QaMeC can provide optimal and efficient service deployment solutions for a variety of applications with different QoS requirements in CDN multimedia edge cloud environments.
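The sketch below shows only the non-dominated sorting step at the core of NSGA-II, applied to synthetic (latency, cost) scores of candidate deployments; the QaMeC QoS model, crowding distance, and genetic operators are not reproduced, and both objectives are assumed to be minimized.

```python
# Non-dominated sorting (the ranking step NSGA-II relies on) over synthetic objectives.
import random

random.seed(4)
# Each candidate deployment scored by (latency, cost); both are to be minimized.
solutions = [(random.uniform(10, 100), random.uniform(1, 20)) for _ in range(12)]

def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(points):
    fronts, remaining = [], list(range(len(points)))
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

for rank, front in enumerate(non_dominated_sort(solutions), start=1):
    print(f"front {rank}:", [tuple(round(v, 1) for v in solutions[i]) for i in front])
```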
Article
Deep learning, as the most important architecture of current computational intelligence, achieves superior performance in predicting cloud workloads for industrial informatics. However, training a deep learning model efficiently is a non-trivial task, since such models often include a great number of parameters. In this paper, an efficient deep learning model based on the canonical polyadic decomposition is proposed to predict cloud workloads for industrial informatics. In the proposed model, the parameters are compressed significantly by converting the weight matrices to the canonical polyadic format, and an efficient learning algorithm is designed to train them. Finally, the model is applied to workload prediction for virtual machines in the cloud. Experiments conducted on datasets collected from PlanetLab validate its performance against other machine-learning-based approaches to virtual machine workload prediction. The results indicate that the proposed model achieves higher training efficiency and workload prediction accuracy than state-of-the-art machine-learning-based approaches, proving its potential to provide predictive services for industrial informatics.
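The parameter-compression effect can be illustrated with the simplest possible case: a rank-R factorization of a single dense weight matrix, which is what a canonical polyadic representation reduces to for a two-way tensor. The cited model applies CP to higher-order reshapes of the weights and trains the factors directly, which this sketch does not attempt; layer shape, rank, and the use of truncated SVD are assumptions.

```python
# Rank-R compression of one dense layer: W (OUT_DIM x IN_DIM) is replaced by U_r @ V_r.
import numpy as np

rng = np.random.default_rng(5)
IN_DIM, OUT_DIM, RANK = 512, 256, 16         # layer shape and rank (assumed)
W = rng.standard_normal((OUT_DIM, IN_DIM))   # stand-in for a trained weight matrix

# Truncated SVD gives the best rank-R approximation in the Frobenius norm.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_r = U[:, :RANK] * s[:RANK]                 # (OUT_DIM, RANK)
V_r = Vt[:RANK, :]                           # (RANK, IN_DIM)

dense_params = W.size
compressed_params = U_r.size + V_r.size
err = np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W)

x = rng.standard_normal(IN_DIM)
y_dense = W @ x                              # original forward pass
y_compressed = U_r @ (V_r @ x)               # factorized forward pass (fewer multiplications)

print(f"params: {dense_params} -> {compressed_params} "
      f"({compressed_params / dense_params:.1%} of original), relative error {err:.3f}")
```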