Received: 29 December 2017 Revised: 29 June 2018 Accepted: 25 July 2018
DOI: 10.1002/spe.2635
SPECIAL ISSUE PAPER
A multitime-steps-ahead prediction approach for
scheduling live migration in cloud data centers
M. Duggan, R. Shaw, J. Duggan, E. Howley, E. Barrett
Information Technology, National
University of Ireland Galway, Galway,
Ireland
Correspondence
M. Duggan, Information Technology,
National University of Ireland Galway,
Galway H91 TK33, Ireland.
Email: m.duggan1@nuigalway.ie
Funding information
Irish Research Council for Science,
Engineering and Technology
Summary
One of the major challenges facing cloud computing is to accurately predict
future resource usage to provision data centers for future demands. Cloud
resources are constantly in a state of flux, making it difficult for forecasting algo-
rithms to produce accurate predictions for short time scales (ie, 5 minutes to
1 hour). This motivates the research presented in this paper, which compares
nonlinear and linear forecasting methods with a sequence prediction algorithm
known as a recurrent neural network to predict CPU utilization and network
bandwidth usage for live migration. Experimental results demonstrate that a
multitime-ahead prediction algorithm reduces bandwidth consumption during
critical times and improves overall efficiency of a data center.
KEYWORDS
cloud computing, CPU, network bandwidth, neural network, prediction algorithms
1 INTRODUCTION
It is estimated that, by 2020, there will be 51 974 GB of internet traffic generated per second,1 and with more technology companies offering computing as a service via cloud computing, this will generate massive volumes of data from hosts and virtual machines (VMs). The more information available from these host machines and VMs, the better the understanding of how they function and the more in-depth the monitoring of cloud resources can be. CPU utilization is one of the most important metrics for measuring and testing the performance of a host machine. Recent studies2-4 investigated one-step-ahead CPU utilization forecasting methods such as local regression and feed-forward neural networks. However, these one-step-ahead prediction models (which usually forecast on a time scale no longer than 5 minutes ahead) give insufficient time for cloud resources to be adjusted when sudden high demands occur; if an algorithm can predict further into the future, there is a greater chance of preventing a host from becoming overutilized. Benson et al5 have shown that predicting a workload on a short time scale, such as 5- to 10-minute intervals, is more difficult than predicting for long-term time scales (ie, time steps of days or weeks), because cloud resources over these short time scales can be extremely unpredictable. This is highlighted by Armbrust et al,6 who list performance unpredictability as one of the main obstacles preventing the growth of cloud computing. How far into the future an algorithm can accurately predict resource usage is critical to how well a data center is able to optimize its resources. This is one of the key ideas that has motivated this research. Another key metric to monitor when investigating live migration is the network bandwidth availability, as it affects how quickly a VM will be transferred from a source to a destination host. Akoush et al presented models that predict migration performance for specific workloads on a Xen virtual machine monitor, and research by Hu et al demonstrated that estimating future network traffic can enable better planning, greater resource provisioning, and faster transferring of data.7,8 At present, resource prediction approaches only examine single-time-step-ahead prediction.
In this study, we apply several forecasting algorithms to optimize the scheduling times of migration and improve a data center's efficiency by reducing service level agreement violations (SLAVs). We implement a multiple-time-step-look-ahead
algorithm known as a recurrent neural network (RNN) to predict CPU and network bandwidth and compare the results
against both traditional one step ahead nonlinear and linear forecasting algorithms. A major reason for implementing
the RNN is that it has the ability to retain information and accurately make predictions for time series problems, making
it a promising candidate to predict cloud resources with greater accuracy when compared with traditional approaches.
The main contributions of this paper are as follows.
1. Compare traditional nonlinear and linear forecasting model accuracy with an RNN when predicting CPU and band-
width “one time step” ahead. The results are validated by using evaluation metrics such as mean square error (MSE),
root MSE (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).
2. Examine the performance of the RNN when predicting host CPU and network bandwidth for multitime steps into the
future.
3. Examine the performance of a data center when both the one-time-step model and the multitime-steps-ahead model are implemented in a simulated cloud environment.
The rest of this paper is organized as follows. In Section 2, we discuss the background of live migration, related work
on network aware live migrations, discuss how artificial intelligence (AI) is impacting cloud computing, and detail the
algorithms used in this paper. Section 3 details the experiments conducted and data models used. In Section 4, we present
our experimental analysis and results and discussion. Section 5 concludes this paper.
2 RELATED WORK AND BACKGROUND
Live VM migration allows VMs to be moved from source to destination hosts while ensuring no performance degradation to a user's experience occurs. Live migration transfers the VM's entire system state, including CPU, memory, and disk. Live migration consists of two phases, precopy (VM migration time) and stop-and-copy (downtime). During the precopy phase, the VM is still running on the source host while its memory pages are copied over to the destination host. The stop-and-copy phase stops the VM on the source host after a certain number of iterations of memory page copying, and the remaining unsynchronized memory pages of the VM are then copied over to the destination host. Once this is completed, the VM is activated on the destination host. The live migration process consumes significant resources, such as CPU and network bandwidth, from both the source and the destination hosts. Several studies, such as the works of Wood et al and Mandal et al, have been conducted on provisioning adequate bandwidth.9,10 They both highlight that migration times are reduced when a high amount of bandwidth is available. Clarke et al highlighted that VM migration consumes large amounts of bandwidth for several seconds.11 The total migration time and downtime of a set of VMs depend on how many VMs are simultaneously transferred off an overutilized host machine. This highlights the need for a forecasting model that will accurately predict when a host will become overutilized and decide when live migration should be scheduled so as not to saturate the network bandwidth during peak times.
2.1 Network aware live migration
Even though live migration has attracted attention in recent years, it has primarily focused on power management and
the efficiency of physical hosts, such as the works of Beloglazov and Buyya and Verma et al.12,13 Recently, researchers have
been looking closely into how network traffic contributes to long delays when VM migrations occur. Chen et al proposed
a novel migration strategy that quantifies the benefits of VM migration and cost of VM placement to a network link load in
data centers.14 They take the network link load and bandwidth cost factors into account when conducting migration. Piao
and Yan presented a network aware VM placement and migration approach for data intensive applications.15 They devel-
oped a model that places a VM on a host machine, taking into account the network conditions between the source host and
destination. Stage and Setzer proposed a migration and scheduling model, while considering bandwidth requirements
and network topology.16 Chen et al researched how to coordinate multiple VM migrations while sharing network links and
global bandwidth resources.17 Ghorbani and Caesar presented a method for guaranteeing network bandwidth; however, their approach comes at a cost of high network resource utilization.18 Duggan et al19 examined how an autonomous
learning agent decides appropriate times to schedule live migration from underutilized hosts by observing cloud traf-
fic demand patterns. They show that, by analyzing current bandwidth values, reinforcement learning can enhance live migration, reducing energy consumption and improving overall system performance based on a service level agree-
ment (SLA). Duggan et al investigated when to migrate VMs from an overutilized host by using reinforcement learning
and a moving average (MA) prediction approach for bandwidth availability to determine the optimal time to schedule
live migration.20 The work in this paper differs from the aforementioned research as we examine several nonlinear, linear,
and sequence prediction algorithms to decide when a host will become overutilized and to determine when to migrate
VMs based on network bandwidth availability.
2.2 Forecasting
Accurate forecasting techniques are important for effective planning to predict future resource requirements and have a significant bearing on practical decision making. As a result, forecasting and prediction methodologies have been widely studied and have been applied to several different areas such as predicting household energy consumption,21
stock market returns,22 and global environmental change.23 Recently, there has been a move toward integrating AI tech-
niques in general to improve the overall efficiency of a cloud data center. Several works show how a broad range of AI
algorithms can provide cloud systems with the abilities to better adapt to the changes in cloud resource consumption to
improve resource scaling, VM live migration,19,20,24-27 and resource allocation28,29 in cloud computing. Tang et al exploited
collaborative filtering and location-based data smoothing for cloud service quality of service (QoS) prediction.30 Their
algorithm employed location-based data smoothing to fill missing QoS values in the users' QoS matrix and outperformed other collaborative filtering techniques. The smoothing algorithm was based on the observation that users from the same local neighborhood have similar QoS performance.
The inherently dynamic nature of cloud workloads makes it difficult to make informed resource management decisions
without a relatively accurate estimate of future resource requirements.31,32 In recent times, there has been a growing
interest in the application of statistical methods and machine learning (ML) techniques to improve the efficiency and
reliability of cloud services. This growing trend is largely due to the many fruitful and compelling benefits of adopting
predictive ML solutions including the potential to deliver greater energy efficiency and improved performance across a
broad range of areas. One area in particular that has seen a significant growth in the number of predictive-based solutions
is VM placement optimization.33-37 These predictive-based solutions showed that by predicting future resource demand
they could improve energy related costs and environmental sustainability of modern data centers. Predictive algorithms
have also been used to improve cloud capacity planning and execution times by predicting server workloads.32,38 Research
has also explored the application of predictive techniques to estimate future network load for workflow scheduling on
the cloud.7,26,39 The results showed that adopting more proactive solutions enables better planning, improving the efficiency of scheduling decisions while also reducing costs significantly. In particular, results showed a reduction in
data transfer times while also increasing the number of valid schedules that are better able to meet QoS requirements.
Live migration is a promising technique to improve load balancing, resource utilization, and cost management while also supporting energy efficiency in clouds.18 Despite such benefits, live VM migration remains a challenge
for cloud providers, since the reallocation of VMs in the data center results in significant overheads on both the physical
hosts and also the shared network. Poor migration decisions result in hosts becoming overloaded, data transfer delays,
and SLAVs.16,18 Currently, the vast majority of works in the literature focus on heuristic approaches, which often base
their migration decisions on current resource demands.13-15,18 However, such methods often lend themselves to less effi-
cient resource utilization and redundant decisions in the data center. Application workloads often exhibit time-varying
demand patterns; failing to consider future demand can quickly result in redundant migration decisions, having a negative
impact on energy and performance.40 By estimating future resource demand, resources utilization can be more efficiently
planned, allowing providers to determine when a host will become overutilized while also enabling data to be transferred
at more optimal times to improve the overall QoS provided. Akoush et al8 demonstrated that, by using a predictive-based
approach, they could accurately estimate migration times to enable more dynamic and intelligent placements of VMs
without degrading performance. Cui et al41 introduced a novel context-based prediction algorithm to optimize the pre-
copy phase of the live migration process. The results showed that by using prediction they could significantly shorten
total migration time, downtime, and the total pages transferred. Using regression, Wu and Zhao42 proposed a performance
model to predict migration latency. The model can be employed to predict migration time given its application's behavior
and the resources available to the migration. Forecasting CPU resource usage to enable improved migration scheduling
is an area that has always been a popular avenue of research in cloud computing. Some of the most influential work
stems from authors such as Liu et al, who introduced a 10-second look ahead CPU prediction approach to guide schedul-
ing decisions. Beloglazov and Buyya also proposed the implementation of a local regression algorithm to improve the
management of cloud resources by predicting when hosts will become overutilized in a data center based on CPU utilization.43 The results demonstrated that, by estimating CPU demand, they could migrate VMs before hosts became overloaded, improving energy efficiency and reducing SLAVs and VM migrations, overall resulting in more optimized resource utilization.
The ability to accurately predict future resource usage is a key challenge facing cloud resource management strategies
due to the growing complexity of modern data centers.44 The accuracy of a prediction model has a large impact on the
overall performance of live migration. In determining which hosts in the data center will become overutilized, inaccu-
rate predictions can result in suboptimal decisions. In cases where the prediction model is overpredicting CPU resource
usage, the system may migrate VMs off hosts whose actual resource usage in future time steps remains at an optimal
level. As a result the system would incur unnecessary migrations, increased energy consumption, and SLAVs. Conversely,
underpredicting CPU resource usage could also result in failing to migrate VMs off hosts that will be overutilized in the
future leading to SLAVs and greater energy consumption. Similar to the previous scenarios, inaccurate resource predic-
tions would also have a negative impact on scheduling live migrations. In particular, using such estimates would result in
VMs being migrated at suboptimal times, causing increased transfer times and SLAVs, while overall resulting in inefficient
usage of network resources. An accurate predictive model would allow for more optimal and proactive migration deci-
sions. By obtaining a relatively accurate estimate of CPU demand, hosts likely to become overloaded can be detected in
advance and action can be taken prior to incurring significant service degradation.43 Moreover, an accurate estimation of
bandwidth resources can also improve the live migration process. An accurate estimate would allow a solution to better
adapt to dynamically changing network conditions to generate more efficient scheduling decisions.26 Overall, employ-
ing an accurate prediction model for live migration would enable better planning, more efficient resource provisioning,
and faster transferring of data as highlighted in the following studies.7,8,41-43 Given the importance of accurate predictive
models, our work conducts a comparative study using novel and commonly used forecasting techniques including neu-
ral networks, which are known for their powerful predictive capabilities.45 We compare the accuracy of each model on
both the training and test data sets across a range of widely used performance metrics, namely, MSE, RMSE, MAE, and
MAPE. In other works, Zhang et al examined the problem of predicting available CPU performance in a time-shared grid system.46 They evaluate a new and innovative method to predict the one-step-ahead CPU load in a grid. Their prediction strategy forecasts the future CPU load based on the variation tendency over several past steps and on previous similar patterns. They evaluated their strategy with four different ML algorithms on a large load trace, reducing the prediction errors to between 22% and 86% less than those incurred by four previous methods. Cao et al applied a novel ensemble
model for online CPU load predictions in the cloud environment.47 The main focus of their model was the multiple pre-
dictor set, whose predictor members can be dynamically adjusted. Their results showed that an ensemble predictor model
can achieve high prediction accuracy.
Neural networks are one of the most effective and versatile ML algorithms and have been successfully applied to
areas of cloud computing in scheduling,48 intrusion detection,49 distributed denial of service attack defense50 and load
forecasting.51 Neural networks have previously been used to forecast resource demands in cloud computing. Duy et al
employed a neural network predictor for optimizing server power consumption in a data center.48 They use a feed-forward
neural network to predict future load demands based on historical demands to turn on/off servers to minimize the
energy usage. Prevost et al implemented neural network and linear predictor algorithms to forecast future workloads.51
Bey et al used several different models for time series prediction. They use an adaptive network to estimate the future value of CPU load for distributed computing.4 However, their hybrid predictors were designed for one-step-ahead prediction, and the work presented in this paper builds on this by predicting both one step and multiple steps ahead. All of the aforementioned research outlines how neural networks are effective at addressing many of the problems in cloud computing, in particular, CPU forecasting. The research presented in this paper makes the novel contribution of applying RNNs to predict CPU utilization to determine when overutilization will occur on a host and to predict network bandwidth to determine the optimal times to migrate VMs. Zhang et al showed that resource request prediction can be improved by
the latest deep learning techniques.52 Their deep belief network was used to predict both CPU and random access memory
(RAM). Their approach improves short-term prediction by reducing the MSE by 76% and 61% and long-term predictions
by 83% and 67% for both CPU and RAM. Mason et al40 implemented a number of state-of-the-art swarm and evolutionary optimization algorithms to train a neural network to predict CPU usage, such as particle swarm optimization, differential evolution, and covariance matrix adaptation evolutionary strategy. Their results show that the covariance matrix adaptation evolutionary strategy-trained neural network outperforms both differential evolution and particle swarm optimization, but most importantly, all swarm- and evolutionary-trained neural networks outperform traditional approaches such as linear regression (LR) and MA.
As mentioned earlier, recently, network bandwidth prediction has been gaining attention in cloud research. By esti-
mating future demand, resources can be more efficiently planned for, allocated in advance, and released once they are
no longer required to improve the overall QoS provided.53 In similar work, Hu et al applied autoregressive integrated MA
(ARIMA) modeling to estimate network traffic performance based on simple network management protocol data.7In
addition, Genez et al39 proposed the implementation of an alternative multiple LR model to combat the impact of impre-
cise estimates of bandwidth availability on workflow scheduling decisions in hybrid clouds. Their results showed that
schedulers, which adopt such an approach, can increase the number of qualified schedules that meet QoS requirements.
Shaw et al used an ARIMA model to forecast a day's worth of bandwidth utilization within a data center.26
2.3 Comparative forecasting methods
In this research, we compare several nonlinear and linear “one time step” ahead prediction algorithms with a multi-step-ahead algorithm known as an RNN to predict host CPU and network bandwidth utilization. The following section describes all algorithms used in this paper. The research presented in this paper makes a novel contribution by comparing nonlinear and linear prediction algorithms that predict host CPU and network bandwidth one step ahead with an RNN that predicts multiple time steps ahead to optimize live migration.
2.3.1 Nonlinear prediction methods
Neural networks are function approximators that are inspired by the biological neural networks that constitute the human
brain.54 Some of the applications of neural networks include power generation,55 control,56 and watershed management.57
Figure 1 illustrates the architecture for a neural network, which is arranged in a number of layers. The input layer is
responsible for taking in the inputs to the model, the hidden layer is where the computation is carried out, and the output
layer produces the output of the model.
The standard feed-forward network consists of an input layer of neurons, one or multiple hidden layers of neurons, and
an output layer. The neural network receives information in the form of a signal (normalized between 0 and 1) through the input layer neurons and then outputs a signal using the sigmoid function. The signal or input that the network receives
in this paper is in the form of two CPU utilization values from a host machine or two bandwidth utilization values. Both
sets of data are normalized between 0 and 1.
The two input values (CPU or bandwidth) are propagated forward through the hidden layers of neurons via synapses
(weighted connections). Then, the network calculates an output at the output layer neuron or neurons. In this paper, only
one output is needed, and the output signal corresponds to a future CPU or bandwidth value of a host machine. An error
signal is calculated by finding the difference between the actual and the predicted value. This error is then propagated
back through the network, and the weights (synapses) are adjusted to correct the error of the prediction.
Aside from the input layer, a neuron in any other layer receives as its input the sum of the weighted signals output by the other connected neurons. A neuron's input signal is described by

$$v_j = \sum_{i=1}^{N} w_{i,j}\, a_i, \qquad (1)$$
FIGURE 1 This figure illustrates a recurrent neural network.55 Neurons are connected by weighted synapses that pass signals between neurons. The recurrent synapses can be seen in the hidden layer of neurons. This gives the recurrent network the ability to retain information
where $v_j$ is the input to a neuron in the $j$th layer, layer $i$ is the preceding layer to $j$ that contains $N$ neurons, each neuron in layer $i$ has output $a_i$, and each of these output signals is weighted by the value $w_{i,j}$ as it is passed to each neuron in layer $j$.
Each neuron $a_i$ outputs a value between 0 and 1. This output value is determined by the activation function of the neuron. The most commonly used activation function is the sigmoid function. This is described by

$$a_j = \frac{1}{1 + \exp(-v_j)}. \qquad (2)$$
This research implements an RNN, illustrated in Figure 1. Recurrent networks are different from the standard
feed-forward networks as the hidden layer neurons have recurrent connections. These connections allow the hidden layer
neurons to connect to itself, thus giving the neural network memory of previous predictions, which makes it well suited
to the problem of predicting CPU utilization or bandwidth demand. The recurrent network in this paper is trained using
the popular backpropagation-through-time (BPTT) algorithm.58 The idea of BPTT is the unfolding of the RNN at a dis-
crete time into a multilayer feed-forward neural network each time a sequence is processed. BPTT differs from training a feed-forward neural network in that it enables the RNN to store past information; thus, it is suitable for sequential models.
The standard feed-forward algorithm is updated as follows:

$$v_j = \sum_{i=1}^{N} w_{i,j}\, a_i + \sum_{h=1}^{m} s_h(t-1)\, u_{ih}, \qquad (3)$$

where $U$ is the recurrent weight matrix and $s_h(t-1)$ is the previous hidden layer state.
In the normal backpropagation (BP) algorithm, weights are updated by calculating the cost function (the error between the actual and the predicted answers), shown as

$$C = \frac{1}{2} \sum_{p=1}^{k} \sum_{e=1}^{o} (d_{pk} - y_{pk})^2, \qquad (4)$$

where $d$ is the desired output, $k$ is the total number of training samples, and $o$ is the number of output units. Then, the change in weights for the output nodes can be calculated as shown

$$\delta_{pk} = (d_{pk} - y_{pk})\, g'(net_{pk}), \qquad (5)$$

where $g$ is the activation function ($g'$ its derivative) and $net$ represents the neuron's input. The changes in weights for the hidden layer can be represented as shown

$$\delta_{pj} = \sum_{k=1}^{o} \delta_{pk}\, w_{kj}\, g'(net_{pj}). \qquad (6)$$
Therefore, the recurrent weight updates can then be backpropagated through the network as shown

$$\Delta u_{ih} = \sum_{p=1}^{N} \delta_{pj}\, s_{ph}(t-1). \qquad (7)$$
The reason for using the RNN to predict cloud resources such as CPU and bandwidth utilization over a feed-forward
neural network is due to their ability to retain information and accurately make predictions for time series problems.
This makes it a promising candidate to predict CPU and bandwidth utilization with greater accuracy when compared
with traditional approaches. Each neural network used in this research has three hidden neurons in the hidden layer and
has two inputs from the input layer. The inputs into the networks are the current and previous values of CPU utilization
or bandwidth demand. Parameter sweeps showed that a network with three hidden neurons produced the best results. The sweeps also highlighted that using more than two inputs for the RNN, for both the bandwidth and CPU utilization data, did not increase performance. The network had one output that corresponded to
the network's prediction of future CPU utilization or bandwidth data. Each neural network will be trained over 10 000
evaluations and will be evaluated on unseen test data. The experiments are repeated over 10 runs to ensure statistically
significant results.
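As a rough illustration of the architecture just described, and not the authors' implementation, the following Python sketch performs the forward pass of Equations (2) and (3) for the 2-input, 3-hidden-neuron, 1-output Elman-style network used in this paper; the random weight values are placeholders for weights that would normally be learned with BPTT.

```python
import numpy as np

def sigmoid(v):
    # Equation (2): logistic activation, output in (0, 1)
    return 1.0 / (1.0 + np.exp(-v))

class TinyRNN:
    """Minimal Elman-style RNN: 2 inputs, 3 hidden neurons, 1 output (illustrative only)."""

    def __init__(self, n_in=2, n_hidden=3, n_out=1, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(scale=0.1, size=(n_hidden, n_in))    # input weights w_{i,j}
        self.U = rng.normal(scale=0.1, size=(n_hidden, n_hidden))   # recurrent weights u_{ih}
        self.W_out = rng.normal(scale=0.1, size=(n_out, n_hidden))  # output weights
        self.s = np.zeros(n_hidden)                                 # hidden state s_h(t-1)

    def step(self, x):
        # Equation (3): weighted inputs plus the recurrent contribution of the previous state
        v = self.W_in @ x + self.U @ self.s
        self.s = sigmoid(v)                  # hidden activations, retained for the next step
        return sigmoid(self.W_out @ self.s)  # predicted (normalized) CPU or bandwidth value

# Feed a normalized series two values at a time (previous and current observation)
rnn = TinyRNN()
series = [0.21, 0.25, 0.31, 0.28, 0.35]
for t in range(1, len(series)):
    y_hat = rnn.step(np.array([series[t - 1], series[t]]))
    print(f"t={t}: predicted next value {y_hat[0]:.3f}")
```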
Backpropagation is the most popular method of neural network training and is suitable for supervised learning prob-
lems. The algorithm works by calculating the error between the target output and the observed output. This error is
then propagated back through the network and is used to update the weights. A standard feed-forward BP algorithm
is implemented in this research as a benchmark algorithm for comparative purposes against the RNN algorithm. This
network also has three hidden neurons and one output. The BP algorithm is similar to the recurrent network described in Section 2.3.1. The feed-forward pass is calculated exactly as in Equation 2, and the BP calculation is the same as in Equations 3, 5, and 6, however, without the recurrent layer calculation. In this paper, we also examine how many inputs
a standard backprop-trained network needs to produce similar accuracy to a recurrent network. As the RNN retains information from previous predictions, we have implemented the sliding window technique59 to enable the BP algorithm to
predict both CPU and bandwidth. An important advantage of feed-forward neural networks is that they are capable of
modeling more complex nonlinear functions; as a result, they are widely used as time series forecasters. However, neural
networks rely on the assumption that there is a learnable mapping from a set of inputs sequences to output values. Unlike
RNNs, which have the ability to learn the context (temporal dependency) of observations over time, a key requirement of
feed-forward neural networks is the specification of the temporal dependency of the time series data in the design of the
model. As a result, experiments were conducted with the sliding window method, which maps an input sequence of size $k$ to an output value $y$, where the input sequence $x$ is a vector of bandwidth or CPU values over $k$ time intervals and the predicted value $y$ is a single time step ahead of the input sequence. The window sizes considered in this paper are 2, 10, 20, and 30; these correspond to the number of CPU or bandwidth inputs to the network.
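To make the sliding-window setup concrete, the sketch below, which is an illustration rather than the authors' code, converts a normalized utilization series into input windows of size k paired with one-step-ahead targets, as used to feed the BP-trained feed-forward networks with window sizes 2, 10, 20, and 30; the synthetic sine series stands in for real CPU or bandwidth data.

```python
import numpy as np

def sliding_window(series, k):
    """Map a 1-D series to (X, y) pairs: k consecutive values -> the next value."""
    X, y = [], []
    for t in range(len(series) - k):
        X.append(series[t:t + k])   # input sequence of length k
        y.append(series[t + k])     # one-step-ahead target
    return np.array(X), np.array(y)

series = 0.5 + 0.4 * np.sin(np.linspace(0, 12, 300))   # stand-in for normalized CPU values
for k in (2, 10, 20, 30):
    X, y = sliding_window(series, k)
    print(f"window size {k}: {X.shape[0]} training pairs, each with {X.shape[1]} inputs")
```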
ARIMA modeling is one of the most popular and frequently used forecasting approaches in time series analysis.60
At its simplest, a time series can be described as a collection of observations over successive time intervals from which
future values may be predicted. Time series data is often composed of several fundamental components such as long term
trends, seasonal fluctuations, and correlations between sequential observations. The aim of an ARIMA model is to identify
and describe the underlying components and systematic variations in the time series data to forecast future values. An
ARIMA model is defined by three terms denoted as (p, d, q). The identification of a valid model is the process of finding
suitable values for p, d, and q, which best capture the fundamental patterns in the data.
The integrated component of the model (d) is identified prior to determining the values of p and q. One of the fundamen-
tal principles in applying an ARIMA methodology is the time series data must be stationary. The concept of stationarity
plays a crucial role in the process of fitting an ARIMA model; a nonstationary series is often unstable and can result in
false correlations in the series making it extremely difficult to model. In general, a stationary series is one whose statisti-
cal properties such as mean and variance remain constant over time. To transform a nonstationary series into a stationary
one, the series must be differenced; this is achieved by subtracting the value of an earlier observation from the value of a
later observation. The number of times the time series data is differenced determines the value of the component (d).
The autoregressive (AR) term (p) represents the lingering effect of preceding observations on current values in the series. For example, an AR(1) model forecasts future values based on the value of the preceding observation $y_{t-1}$, as denoted in Equation 8, where $\phi$ is a parameter of the model known as the autoregressive coefficient, which represents the magnitude of the relationship, and $\varepsilon_t$ represents the random variation at the current time period $t$:

$$y_t = \phi(y_{t-1}) + \varepsilon_t. \qquad (8)$$

The moving average (MA) term (q) represents the effects of previous random variation on the current period's random error. For example, an MA(1) model forecasts future values based on a combination of the current random variation and the previous error as defined

$$y_t = \theta(\varepsilon_{t-1}) + \varepsilon_t, \qquad (9)$$

where $\varepsilon_{t-1}$ is the value of the previous random shock; $\theta$ is the correlation coefficient of the model, which defines the extent of the relationship; and $\varepsilon_t$ represents the random variation at the current time period $t$. The combined model, assuming differenced data, is denoted

$$y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t. \qquad (10)$$
The ARIMA models are also capable of modeling a variety of highly seasonal data. Seasonal ARIMA models are classified by including the additional seasonal terms $(P, D, Q)_m$, where $m$ defines the number of periods per season. The seasonal portion of the model operates across previous seasonal periods as opposed to previous observations, which occurs in the standard model introduced above. However, in practice, both models are often combined to capture all of the fundamental characteristics of a seasonal time series. In this study, the Box-Jenkins methodology was employed to fit only the bandwidth model.61 This methodology consists of several steps, ie, model identification, estimation and diagnostic checking, and lastly forecasting and validation.
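For readers who want to reproduce a comparable Box-Jenkins workflow, a minimal sketch using the statsmodels library is given below; the synthetic series, the order (1, 1, 1), and the seasonal period m = 144 (one day of 10-minute intervals) are illustrative assumptions, not the configuration fitted in this study.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic stand-in for the 10-minute bandwidth series (144 points per day, 7 days)
rng = np.random.default_rng(1)
t = np.arange(144 * 7)
bandwidth = 600 + 50 * np.sin(2 * np.pi * t / 144) + rng.normal(0, 10, t.size)

# Illustrative seasonal ARIMA(p,d,q)(P,D,Q)_m fit; real orders would come from Box-Jenkins
# identification (ACF/PACF inspection, differencing, residual diagnostics). A seasonal period
# of m = 144 makes fitting slow; drop seasonal_order for a quick non-seasonal fit.
model = ARIMA(bandwidth, order=(1, 1, 1), seasonal_order=(0, 1, 1, 144))
fitted = model.fit()

# One-step-ahead forecast for the next 10-minute interval
print(fitted.forecast(steps=1))
```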
2.3.2 Linear prediction methods
Linear regression is a popular statistical approach to estimate the relationship between one or more input variables and the output variable. The case of one input variable is called simple regression; more than one input variable is called multiple regression. In all cases, regression approximates a function (the regression function) that can be considered linear or nonlinear. If the independent variable is $x = [x_1, x_2, \ldots, x_m]$ and the corresponding dependent variable is $y$, then the LR model is as shown

$$y_t = \beta_0 + \sum_{i=1}^{m} \beta_i x_i. \qquad (11)$$

The parameters $\beta_0$ and $\beta_i$ are the regression coefficients. A measure of goodness of fit, ie, how well the model predicts the output variable $y$, is the magnitude of the residual $e_i$ at each of the $n$ data points, as shown

$$e_i = \hat{y}_i - y_i, \qquad (12)$$

where $e_i$ is the difference between the predicted output $\hat{y}_i$ and the real output $y_i$ at data point $i$. In this research, LR takes two inputs of either CPU or bandwidth.
Random walk (RW) forecasting is the most basic forecasting method implemented. This approach consists of predicting the next future value as being equal to the current observed value.
The moving average (MA) is the final method used for predicting both network bandwidth and CPU utilization; it is another commonly used forecasting approach, which consists of predicting a future value by averaging the n previous values.
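These baselines need only a few lines of code; the sketch below is illustrative only and shows the random walk and moving average predictors described above, alongside an ordinary least-squares fit of the linear regression model of Equation (11) on lagged values.

```python
import numpy as np

def random_walk(series):
    """Predict the next value as the last observed value."""
    return series[-1]

def moving_average(series, n=5):
    """Predict the next value as the mean of the n most recent values."""
    return float(np.mean(series[-n:]))

def linear_regression(series, m=2):
    """Fit y = b0 + sum(b_i * x_i) on the last m observations via least squares."""
    X = np.array([series[t - m:t] for t in range(m, len(series))])
    y = np.array(series[m:])
    X = np.column_stack([np.ones(len(X)), X])        # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    x_next = np.concatenate([[1.0], series[-m:]])    # most recent m values as inputs
    return float(x_next @ beta)

cpu = [0.31, 0.35, 0.33, 0.40, 0.42, 0.45, 0.44]
print(random_walk(cpu), moving_average(cpu, n=3), linear_regression(cpu, m=2))
```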
3 EXPERIMENT DETAILS
The following section describes metrics that are used to determine the most accurate algorithms for predicting CPU
and bandwidth utilization. This section also describes the data models for CPU and bandwidth used to train and test
both nonlinear and linear models in this paper. Finally, in this section, a detailed description of the simulated cloud
environment implemented to test each algorithm's performance in terms of service level agreement violations, bandwidth usage, and energy consumption is given.
3.1 Metrics
The following describes the metrics used to compare the accuracy of each algorithm mentioned in Section 2.3. In the following descriptions of the metrics, $y$ denotes the actual value, $\hat{y}$ denotes the forecast value, and $n$ is the number of values.
Mean absolute error measures the difference between the predicted value and the actual value by the mean of the absolute error. The MAE tells us how large an error we can expect from the forecast on average, as shown

$$MAE = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|. \qquad (13)$$

Mean absolute percentage error calculates the average percentage by which the forecast values deviate from the actual values observed in the test set, as shown

$$MAPE = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|. \qquad (14)$$
Both the MAPE and MAE methods are based on the mean error, and are likely to underestimate the impact of large
infrequent errors. This is why the MSE and RMSE are also used in this paper to measure prediction accuracy.
Mean squared error is a measure of how close a fitted line is to the data points. For each data point, the vertical distance from the point to the corresponding $y$ value on the curve fit (the error) is squared. These squared values are then summed over all data points and divided by the number of values $n$. The squaring of each error prevents negative values from canceling out. The smaller the MSE, the closer the fit is to the data

$$MSE = \frac{\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}{n}. \qquad (15)$$
Root mean squared error is the square root of the MSE. By squaring the errors before calculating the mean and then taking the square root of the mean, we arrive at a measure of the size of the error that gives more weight to the large infrequent errors

$$RMSE = \sqrt{\frac{\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}{n}}. \qquad (16)$$
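The four accuracy metrics translate directly into code; the helper functions below are a sketch of Equations (13)-(16) using numpy, not the evaluation harness used in the paper.

```python
import numpy as np

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))                 # Equation (13)

def mape(y, y_hat):
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))   # Equation (14)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))                  # Equation (15)

def rmse(y, y_hat):
    return float(np.sqrt(mse(y, y_hat)))                     # Equation (16)

y = np.array([10.0, 12.0, 11.0, 13.0])
y_hat = np.array([9.5, 12.5, 10.0, 13.5])
print(mae(y, y_hat), mape(y, y_hat), mse(y, y_hat), rmse(y, y_hat))
```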
3.2 Data models
Two data models (bandwidth and CPU) were used in this study as data for the algorithm to predict and also as input to
our simulator.
Bandwidth data. The bandwidth model implemented as part of this study is based on transmission control protocol
bandwidth measurements collected from Amazon's EC2 cloud.62 In particular, the bandwidth values were taken from measurements of the network performance within Amazon's EU region. This benchmark study provides a measurement
of the available bandwidth on the network links at four points over a single day. In order to generate a bandwidth model
with a sampling distribution of 10-minute intervals, the values were interpolated resulting in a time series model com-
posed of 10-minute intervals over a 24-hour period. In general, the more data that is available to fit a predictive model, the greater the opportunity to generate better predictions. Two data sets were generated from the transmission control protocol bandwidth measurements.62 The first was the training set; in order to generate the training data, the ini-
tial bandwidth values over 24 hours were sampled at each interval and the corresponding values were inputted into a
Gaussian distribution to produce a valid bandwidth model over 7 consecutive days, as shown in Figure 2A. The Gaussian
distribution served to introduce uncertainty into the bandwidth values on the network links. The resulting model con-
sisted of 144 values for each day (6 data points per hour x 24 hours) or 1008 values in total. Using the same procedure as
aforementioned, a test set was also generated from the initial distribution and used to validate the selected models. The
test set contained 432 values (3 days workload) and was used to test the accuracy of the predictive models in this study.
In the forecasting literature, there is generally no principled approach to dividing data into training and test sets. Hynd-
man and Athanasopoulos60 suggested an 80/20 split between the training and test sets, where roughly 80% of the data is
used to train the model and the remaining 20% is used to test the model. Broadly speaking, a commonly occurring ratio
within the community is to use 60% to 80% of the data as the training set and 20% to 40% as the test set. Based on this, we
divided the bandwidth model using 70% of the data as the training data, which corresponded to the first 7 days' worth of
time series data. The remaining 30% of the data was used as a test set, which corresponded to the next 3 days. By dividing
the data using these ratios, it helps to ensure that the predictive model can generalize to unseen data.
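The generation procedure can be sketched as follows; the base profile, noise level, and interpolation grid below are illustrative placeholders rather than the actual EC2 measurements, but the shapes match the 1008-value training set and 432-value test set described above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Four benchmark bandwidth measurements over one day (illustrative values, Mbps)
hours = np.array([0.0, 8.0, 16.0, 24.0])
measured = np.array([650.0, 580.0, 540.0, 650.0])

# Interpolate to 10-minute intervals: 144 points per day
grid = np.linspace(0.0, 24.0, 144, endpoint=False)
daily_profile = np.interp(grid, hours, measured)

# Draw each interval from a Gaussian centred on the profile to add uncertainty
train = np.concatenate([rng.normal(daily_profile, 15.0) for _ in range(7)])  # 7 days = 1008 values
test = np.concatenate([rng.normal(daily_profile, 15.0) for _ in range(3)])   # 3 days = 432 values

print(train.shape, test.shape)   # roughly a 70/30 split, as in the paper
```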
CPU data. The CPU data model implemented as part of this study is Google's cluster data trace63 that details the resource
usage information of machines in a cluster for a 29-day trace period. This data set is over 300 GB in size and contains
information of over 12 000 host machines in Google's data centers. In this paper, we are only concerned with the CPU
values from these data sets. All of the algorithms mentioned in Section 2.3 are trained on a CPU data set that contains
7623 values and tested on a data set that contains 144 CPU values. The testing data contain 144 values, one for each
10-minute interval over 24 hours, and the training data span 53 days. The reason for having such a large training set is that there is a huge
amount of variation within the Google cluster data trace, and this gives each algorithm an opportunity to discover
patterns within the data. Figure 2B shows the CPU training data used in this paper.
FIGURE 2 Bandwidth and CPU training data sets. A, Bandwidth model for intra-cloud network links within the EU region; B, CPU training data set
3.3 Simulator model
In this paper, the targeted system is an infrastructure-as-a-service environment, represented by a large-scale data center consisting of a cluster of 600 host machines, represented as $H = \{h_1, h_2, \ldots, h_n\}$. Each host $h_n$ contains a list of VMs $V = \{v_1, v_2, \ldots, v_n\}$ and has a capacity of $a_h$. Each VM is of size 1024 MB and is allocated $a_v$ of CPU. Therefore, the maximum number of VMs allocated to a host is represented as $m = a_h / a_v$. In the interest of simplification, we assume that all of the host machines and VMs in the simulated environment are homogeneous. The tasks processed by each VM will be driven by the Google cluster trace data set detailed in Section 3.2. Any host whose CPU utilization is greater than 85% is
deemed to be overutilized. The overutilized host detection policies will be continuously monitoring each host machine's
CPU utilization. A host will stay in an overutilized state until necessary VM migrations take place. Live migration occurs
once a host becomes overutilized and VMs will be moved between hosts. Live migration will have a negative impact on the
performance of the host machine and VMs. Voorsluys et al64 have shown that a VM's performance degradation and downtime during live migration depend on the application's behavior (ie, how many memory pages the application transfers
during live migration). The average performance degradation including the downtime can be estimated as approximately
10% of the CPU utilization. Moreover, in our simulations, we model that the same amount of CPU capacity is allocated
to a VM on the destination host machine during the course of migration. This means that each migration may cause SLA
violation; thus, it is crucial to minimize the number of VM migrations and select a host machine, which will not become
overutilized if a VM is placed on it. The length of a live migration depends on the total amount of memory used by the VM
and the total network traffic in the cloud environment, which will vary at each time (t). Once it has been decided that a
host is overutilized, the next step is to select particular VMs to migrate from the host. We have implemented the minimum
migration time policy, which selects a VM that will require the minimum time to complete a migration relative to the other
VMs allocated to the same overutilized host. Virtual machines are selected based on their RAM utilization. If no suitable
host is available in the current time step, then no migration will take place and the host will remain overutilized. Each
VM will be assigned resources from the assigned host's RAM and CPU. In this paper, we have implemented a sequential migration transfer policy, which means that VMs will be transferred one after another. Due to workload changes, the CPU resources used by the VMs will vary, possibly leading to SLA violations and increased energy consumption. An SLA violation will occur in a host machine when the total demand for CPU performance exceeds the available CPU capacity $a_h$.
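A compact sketch of the host-overutilization check and the minimum-migration-time VM selection described above is given below; the 85% threshold comes from the paper, while the data structures, RAM figures, and bandwidth value are illustrative assumptions.

```python
OVERUTILIZATION_THRESHOLD = 0.85   # from the paper: CPU utilization above 85% is overutilized

def is_overutilized(host_cpu_utilization):
    return host_cpu_utilization > OVERUTILIZATION_THRESHOLD

def select_vm_minimum_migration_time(vms, bandwidth_mbps):
    """Pick the VM whose estimated migration time (RAM in use / available bandwidth) is smallest."""
    return min(vms, key=lambda vm: vm["ram_mb"] / bandwidth_mbps)

vms = [{"id": 1, "ram_mb": 1024}, {"id": 2, "ram_mb": 512}, {"id": 3, "ram_mb": 2048}]
if is_overutilized(0.91):
    vm = select_vm_minimum_migration_time(vms, bandwidth_mbps=600.0)
    print("migrate VM", vm["id"])   # VM 2: least RAM to transfer, hence shortest migration
```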
Power consumption by host machines in data centers is determined by the CPU, memory, disk storage, power supplies,
and cooling systems.65 Recent studies66,67 have shown that the power consumption by a host machine can be accurately
described by a linear relationship between the power consumption and CPU utilization. The power model is similar to
the power model used in the work of Beloglazov and Buyya.43 In this paper, only the HP ProLiant G4 host is used. Table 1
represents the power usage of the host at each 10% CPU utilization interval.
We implemented CloudSim's network flow model68 for calculating the current delay in the network when migrating VMs. It uses point-to-point communication for data from a source entity $u$ to a destination entity $d$, which is called a flow and is represented as $f = (size_f, u, d)$, where $size_f$ is the number of bytes in the flow and $v_n$ is the current VM being migrated from source host $u$ to destination host $d$. The bandwidth available between two entities is represented as $bw$, and the latency is denoted as $lat$. The duration of a single network flow can be calculated as shown

$$delay(u, d, v_n) = lat + size_f / bw. \qquad (17)$$
Migration from an overutilized host will stop automatically when the host is under the threshold set at 85% of the host
CPU utilization.
TABLE 1 Power model (Watts) implemented in a simulated cloud data center
Server 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
HP ProLiant G4 86 89.4 92.6 96 99.5 102 106 108 112 114 117
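The power model in Table 1 can be applied by linearly interpolating between the 10% utilization points; a sketch follows, assuming the tabulated values are in Watts, as in comparable host power models.

```python
import numpy as np

# HP ProLiant G4 power values from Table 1, at 0%, 10%, ..., 100% CPU utilization
UTILIZATION = np.linspace(0.0, 1.0, 11)
POWER_W = np.array([86, 89.4, 92.6, 96, 99.5, 102, 106, 108, 112, 114, 117])

def host_power(cpu_utilization):
    """Linearly interpolate the host power draw from CPU utilization in [0, 1]."""
    return float(np.interp(cpu_utilization, UTILIZATION, POWER_W))

print(host_power(0.45))   # power at 45% utilization, between the 40% and 50% table entries
```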
In addition, the total delay for the network at each time step is measured as shown

$$totalDelay = \sum_{v=1}^{n} delay(u, d, v_n). \qquad (18)$$

To calculate the impact that one VM migration would have on the network link bandwidth, we define $bw_l(v_n)$ as the network bandwidth consumed by one VM. The total traffic $Tr$ generated by a group of VMs $V$ from the selected host $h_n$ is as shown

$$Tr(h_n) = \sum_{v=1}^{n} bw_l(v_n), \qquad (19)$$
where $v$ is the set of VMs to be migrated. The latency model has a direct effect on the bandwidth model: the higher the latency $lat$, the lower the available $bw_l$; likewise, the higher the network utilization, the lower the available $bw_l$. When $v_n$ is migrated from $h_n$, the host's utilization is recalculated and then compared with the overutilization threshold. We only consider the host's CPU utilization as the indicator of whether a host machine is overutilized.
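Equations (17)-(19) translate directly into code; the sketch below assumes seconds of latency, megabytes of VM memory, and Mbps of available bandwidth, which are illustrative unit choices rather than the simulator's exact configuration.

```python
def flow_delay(latency_s, size_mb, bandwidth_mbps):
    """Equation (17): delay of one migration flow = latency + size / bandwidth."""
    return latency_s + (size_mb * 8.0) / bandwidth_mbps   # convert megabytes to megabits

def total_delay(flows):
    """Equation (18): sum of per-VM migration delays at this time step."""
    return sum(flow_delay(*f) for f in flows)

def total_traffic(vm_bandwidths):
    """Equation (19): total bandwidth consumed by the VMs migrating off a host."""
    return sum(vm_bandwidths)

flows = [(0.05, 1024, 600.0), (0.05, 1024, 600.0)]   # two sequentially migrated 1024-MB VMs
print(total_delay(flows), total_traffic([120.0, 95.0]))
```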
3.4 Experiments conducted
The first set of experiments involves comparing the prediction accuracy of each of the prediction algorithms on the bandwidth training and testing data. All algorithms will be judged based on their performance on the metrics in Section 3.1. As the RNN is designed for sequence prediction, it will then be applied to multi-step-ahead predictions of
the bandwidth data.
The next set of experiments will evaluate how each algorithm performs when predicting CPU data. This section is
divided into three separate experiments for CPU prediction. The first experiments will examine how each algorithm
performs when predicting CPU utilization on a time scale of one step ahead. The second experiment will examine the
performance of the RNN for “multi-time steps” ahead prediction. The purpose of the multi-step ahead experiments is to
determine how far into the near future a recurrent network can predict and still produce accurate results. Such predictions would be advantageous in real-world data centers, as knowing in advance whether a host will become overutilized or when bandwidth will become saturated allows resources to be reconfigured. The final experiment in this section will measure
the prediction accuracy for each algorithm for a single time step, and then for the RNN's “multi-time steps” ahead CPU
predictions when trained on one host's CPU data and tested on a different host's CPU data. Again, this experiment would
be advantageous in real-world data centers when a host is predicted to become overutilized, and another host must be
selected for VMs to be moved onto without causing SLAVs on the new host.
Finally, the last experiment will evaluate the performance of each one-step-ahead algorithm against the RNN and how well their predictions can improve a simulated cloud data center's efficiency. Each algorithm's effectiveness will be measured in terms of SLA violations, energy, and bandwidth usage.
4 RESULTS
This section presents the results of each of the experiments outlined. First, the results of the bandwidth experiment will
be presented, followed by the CPU result, and lastly the overall simulator results.
4.1 Bandwidth prediction results
This section presents the results from the range of models that were implemented to predict bandwidth resources. The
results of both predicting a single time step and multiple time steps ahead are presented as follows.
4.1.1 Single time step ahead
A comparison on the accuracy of each of the bandwidth forecasting models for one step ahead predictions is presented
as follows for both the training data and test data sets. The calculated metrics on the training data allow us to measure
the performance of the fitted models, while the results generated from the test data demonstrate the predictive accuracy
and ability of each model to generalize to unseen data.
TABLE 2 Bandwidth training data prediction accuracy
Algorithm MSE RMSE MAE MAPE
BPTT 184.9968 13.60135 11.0208 1.8379
Random Walk 495.9737 22.2705 17.69222 2.9171
Moving Avg 371.3637 19.2708 15.3689 2.5327
Linear Regression 2985.1861 54.6369 46.2198 7.7215
ARIMA 200.1808 14.1485 10.6497 1.7556
Backpropagation (Sliding Window 2) 432.4122 20.7945 16.7863 2.7521
Backpropagation (Sliding Window 10) 295.5467 17.1915 13.6178 2.2359
Backpropagation (Sliding Window 20) 88.7296 9.4196 4.9847 0.8179
Backpropagation (Sliding Window 30) 155.9807 12.4892 9.7944 1.6294
Abbreviations: ARIMA, autoregressive integrated moving average; BPTT, backpropagation
through time; MAE, mean absolute error; MAPE, mean absolute percentage error; MSE, mean
square error; RMSE, root mean square error.
Table 2 evaluates the models across each of the selected performance metrics. The results overall show that nonlinear
models perform best; this is largely due to the characteristics of the bandwidth model as displayed in Figure 2A. As shown,
the bandwidth data shows significant seasonal patterns with daily peaks and troughs evident throughout that need to
be modeled. However, the data also remains largely stochastic between successive time steps with persistent irregular
fluctuations while no long term trend is evident. In particular, the BP (sliding window size 20) algorithm performed best
followed by BP (sliding window size 30), while ARIMA narrowly outperformed BPTT on the training data. Although LR
is a common prediction method, our experiments showed it performed the worst out of all of the selected models
with a MAPE of 7.7215 on the training data. This is not surprising as the bandwidth data does not display a linear trend;
it is stochastic displaying various fluctuations over time, and as a result, fitting a straight line through the data results in
significant errors on both sides of the fitted line as the model struggles to capture any of the variations in the data.
The true accuracy of a predictive model can only be determined by considering how well it performs on new data,
which was not used to train the model. Table 3 presents the results of each predictive model relative to the test set. The
results show BPTT outperformed all other algorithms with a MAPE of 1.430. In these experiments, this algorithm shows
its capacity to learn complex relationships in the bandwidth time series data, but it also highlights its ability to generalize
well to unseen data, thus indicating the reliability of the model for future predictions. Similar to what was observed in the
training results, the BP (sliding window 20) algorithm performed best out of the remaining approaches. Our empirical
evaluation found that by adjusting the size of the input sequence from 10 to 20 values it provides more specific knowledge
about the underlying structure of the data resulting in an improved MAPE of 2.3798. Unlike the training results, ARIMA
achieved a slightly better result on the test data in comparison to the BP (sliding window 30). The more simplistic models perform similarly; in particular, they show their inability to model the fundamental characteristics of the bandwidth data, with MA performing worst overall with a MAPE of 7.7112. Overall, BPTT, ARIMA, and BP with
TABLE 3 Bandwidth test data prediction accuracy
Algorithm MSE RMSE MAE MAPE
BPTT 121.9398 11.0426 8.5788 1.430
Random Walk 3030.3417 55.0485 46.0220 7.6971
Moving Avg 3065.7450 55.3692 45.6627 7.7112
Linear Regression 3021.8770 54.9716 46.2478 7.6953
ARIMA 438.557 20.9418 16.7241 2.7859
Backpropagation (Sliding Window 2) 422.8099 20.5623 16.5327 2.7110
Backpropagation (Sliding Window 10) 362.5414 19.0405 15.1452 2.4981
Backpropagation (Sliding Window 20) 322.0703 17.9463 14.3671 2.3798
Backpropagation (Sliding Window 30) 457.3444 21.3856 17.1363 2.8309
Abbreviations: ARIMA, autoregressive integrated moving average; BPTT, backpropagation
through time; MAE, mean absolute error; MAPE, mean absolute percentage error; MSE, mean
square error; RMSE, root mean square error.
FIGURE 3 Predictive performance of backpropagation through time (BPTT), autoregressive integrated moving average (ARIMA), and various backpropagation (BP) approaches (predicted and actual bandwidth values, Mbps versus time). A, BPTT and BP (sliding window 20); B, BP (sliding window 10) and ARIMA
different-sized sliding windows resulted in acceptable performance. Figure 3 shows the predictive performance of these
algorithms relative to the actual values observed in the test data over a single day.
4.1.2 Multiple time steps ahead
In the previous experiment, the RNN trained with the BPTT algorithm outperformed all other approaches. In this experi-
ment, the BPTT algorithm will be evaluated in terms of its ability to accurately predict bandwidth availability for multiple
time steps into the future. In particular, the performance of the BPTT algorithm will be assessed based on its ability to
predict bandwidth from one to six time steps ahead. Each time step represents a 10-minute interval, thus equating to
1 hour of future bandwidth availability, which is adequate for predicting resources over a short time span.
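The paper does not spell out the exact multi-step mechanism, so the sketch below shows one common recursive scheme in which each prediction is fed back as an input for the next step; this is an illustrative assumption rather than the authors' confirmed procedure, and the toy one-step model stands in for the trained RNN.

```python
def recursive_forecast(model_step, history, horizon=6):
    """Roll a one-step predictor forward `horizon` times by feeding predictions back in.

    model_step(prev, curr) -> next predicted value (e.g. a trained network's step function).
    history: observed values, most recent last (at least two values).
    """
    window = list(history[-2:])          # the two inputs the networks in this paper use
    predictions = []
    for _ in range(horizon):
        y_hat = model_step(window[-2], window[-1])
        predictions.append(y_hat)
        window.append(y_hat)             # treat the prediction as the newest observation
    return predictions

# Toy one-step model: persistence with a trend term, standing in for the trained RNN
toy_model = lambda prev, curr: curr + 0.5 * (curr - prev)
print(recursive_forecast(toy_model, [600.0, 590.0, 585.0], horizon=6))
```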
Similar to the previous experiment, the results for both the training and test sets are presented in Table 4 and Table 5.
TABLE 4 Multi-step ahead training data prediction
accuracy
Step Ahead MSE RMSE MAE MAPE
1 Step Ahead 184.9968 13.6014 11.0208 1.8379
2 Step Ahead 436.0588 20.8820 16.7036 2.770
3 Step Ahead 500.8841 22.3804 17.8305 2.95
4 Step Ahead 552.0954 23.4967 18.8824 3.137
5 Step Ahead 592.6458 24.3443 19.4938 3.221
6 Step Ahead 718.8461 26.8113 21.1520 3.502
Abbreviations: MAE, mean absolute error; MAPE, mean abso-
lute percentage error; MSE, mean square error; RMSE, root mean
square error.
TABLE 5 Multi-step ahead test data prediction accuracy
Step Ahead MSE RMSE MAE MAPE
1 Step Ahead 121.9398 11.0426 8.5788 1.430
2 Step Ahead 237.5089 15.4113 12.0718 2.001
3 Step Ahead 257.1531 16.0360 12.5853 2.081
4 Step Ahead 318.3459 17.8422 14.1399 2.330
5 Step Ahead 407.0782 20.1762 16.2011 2.678
6 Step Ahead 488.2690 22.0968 17.9570 2.967
Abbreviations: MAE, mean absolute error; MAPE, mean absolute
percentage error; MSE, mean square error; RMSE, root mean
square error.
FIGURE 4 Mean absolute error (MAE) for backpropagation through time for multitime-steps-ahead bandwidth predictions (training and test data; MAE versus time step)
As shown, BPTT is capable of achieving significant predictive accuracy when forecasting multiple time steps into the
future with a MAPE of between 1.8379 and 3.502 for time steps one to six for the training data and 1.430 and 2.967 for
the test data. This indicates the reliability and overall robustness of the RNN for estimating bandwidth resources over a
longer time horizon. Also evident, as shown in Figure 4, is the approximately linear growth in the error of the algorithm
the further out into the future it attempts to predict.
4.2 CPU prediction results
This section presents the results for the three CPU prediction experiments mentioned in Section 3.4. As stated earlier, ARIMA modeling is better suited to long-term sequence prediction, such as day- or month-long forecasts, and is therefore not applied to the CPU prediction.
4.2.1 Single time step ahead
First, we examine the training prediction accuracy and then the testing prediction accuracy of all algorithms for one time step ahead. Table 6 displays the MSE, RMSE, MAE, and MAPE for each of the algorithms and shows that each algorithm performs similarly across all metrics on the training data. Based on the MAPE, BP had the least accuracy and BPTT had the highest accuracy.
Table 7 shows the results of each algorithm on the testing data. Again, for one-step-ahead prediction, each algorithm performs similarly across all metrics. Based on the MAPE metric, BP with sliding window 30 had the worst performance, while BPTT performed best overall. Figure 5A shows the prediction results for both the BPTT and LR algorithms; both perform similarly when compared with the actual CPU test data set. The RNN is typically used for sequence prediction, and the next section examines how far into the future it can accurately predict CPU data.
TABLE 6 Train CPU data prediction accuracy
Algorithm MSE RMSE MAE MAPE
BPTT 0.00761 0.0872 0.0639 0.1395
Random Walk 0.0078 0.0887 0.0604 0.2644
Moving Avg 0.0089 0.0947 0.0649 0.3238
Linear Regression 0.0075 0.0867 0.0608 0.2843
Backpropagation (Sliding Window 2) 0.0079 0.0893 0.0660 0.3621
Backpropagation (Sliding Window 10) 0.0084 0.0918 0.0673 0.1695
Backpropagation (Sliding Window 20) 0.0041 0.0638 0.0455 0.1315
Backpropagation (Sliding Window 30) 0.0064 0.0798 0.0554 0.1714
Abbreviations: ARIMA, autoregressive integrated moving average; BPTT, backpropagation
through time; MAE, mean absolute error; MAPE, mean absolute percentage error; MSE,
mean square error; RMSE, root mean square error.
TABLE 7 Test CPU data prediction accuracy
Algorithm MSE RMSE MAE MAPE
BPTT 0.0144 0.1202 0.0818 0.1237
Random Walk 0.0188 0.1370 0.0843 0.1498
Moving Avg 0.0228 0.1509 0.0988 0.1516
Linear Regression 0.0144 0.1201 0.0828 0.1248
Backpropagation (Sliding Window 2) 0.0146 0.1208 0.0859 0.1278
Backpropagation (Sliding Window 10) 0.0396 0.1990 0.1690 0.2184
Backpropagation (Sliding Window 20) 0.0691 0.2628 0.2298 0.2975
Backpropagation (Sliding Window 30) 0.0809 0.2844 0.2527 0.3220
Abbreviations: BPTT, backpropagation through time; MAE, mean absolute error; MAPE,
mean absolute percentage error; MSE, mean square error; RMSE, root mean square error.
FIGURE 5 Predictive performance of backpropagation through time (BPTT) and linear regression (LR) with mean absolute error (MAE) for one and six time steps: A, Actual, BPTT, and LR CPU values; B, MAE for one and six time steps [Colour figure can be viewed at wileyonlinelibrary.com]
4.2.2 Multiple time steps ahead
The aim of the multiple-time-steps-ahead prediction experiment was to evaluate how far into the future the RNN could accurately predict the CPU utilization of a host and to determine by how much prediction accuracy decreases the further into the future the network attempts to predict. This experiment involved predicting the CPU utilization of a host machine from one to six time steps into the future. Each of these time steps corresponds to 10 minutes; for instance, time step six relates to a prediction 1 hour into the future.
Tables 8 and 9 present the accuracy of the predictions on the training and testing data sets at each of the six time steps. The results clearly show that each performance metric exhibits an approximately linear increase in prediction error the further into the future the recurrent network tries to predict.
TABLE 8 Multi-step ahead trained data prediction
accuracy
Step Ahead MSE RMSE MAE MAPE
1 Step Ahead 0.00761 0.0872 0.0639 0.1395
2 Step Ahead 0.013 0.113 0.084 0.179
3 Step Ahead 0.015 0.126 0.098 0.191
4 Step Ahead 0.019 0.137 0.106 0.220
5 Step Ahead 0.021 0.144 0.112 0.232
6 Step Ahead 0.024 0.154 0.122 0.248
Abbreviations: MAE, mean absolute error; MAPE, mean abso-
lute percentage error; MSE, mean square error; RMSE, root
mean square error.
TABLE 9 Multi-step ahead test data prediction
accuracy
Step Ahead MSE RMSE MAE MAPE
1 Step Ahead 0.0144 0.1202 0.0818 0.1237
2 Step Ahead 0.0259 0.1608 0.1200 0.1890
3 Step Ahead 0.0301 0.1734 0.1332 0.2140
4 Step Ahead 0.0333 0.1824 0.1447 0.2320
5 Step Ahead 0.0356 0.1887 0.1555 0.2443
6 Step Ahead 0.0377 0.194 0.1617 0.255
Abbreviations: MAE, mean absolute error; MAPE, mean abso-
lute percentage error; MSE, mean square error; RMSE, root
mean square error.
Up to six time steps ahead was chosen as the maximum future time step because it corresponds to 1 hour of CPU information, which is a sufficient amount of time into the future for the planning of resources.
Figure 5B highlights the MAE for both one-time-step-ahead and six-time-steps-ahead predictions made by the BPTT algorithm on the test data. Figure 5B shows that the errors increase the further into the future the algorithm predicts. The graph also highlights where the BPTT algorithm struggles when making predictions. As shown in Figure 5A, at time step 22, there is an extreme sudden change in CPU utilization, and Figure 5B shows that the BPTT accuracy also decreases at the same time step; this is a result of the extreme variation in CPU workloads. Nevertheless, the algorithm performs well when predicting six time steps into the future.
4.2.3 Unseen host data set
The next set of experiments involved evaluating the performance of each algorithm on a host's CPU data set that it was not trained on. The results from this experiment show how well each algorithm generalizes when applied to new, unseen data. Table 10 shows, by each of the metrics, that BPTT outperforms all other algorithms, with BP with sliding window 10 having the worst performance.
Similarly, in this experiment, the RNN was examined on its ability to predict another host's CPU usage based on the training it completed on a different host's CPU data. Table 11 shows how far into the future the RNN could accurately predict CPU utilization. The results show a linear increase in prediction error the further into the future the recurrent network tries to predict. However, the predictions produced, even six time steps into the future, are relatively accurate. These predictions help the RNN approach decide well in advance if a host will become overutilized so that another VM is not placed on it, which would cause it to degrade even further.
TABLE 10 Test results from a new host's data when previously trained on
another host's CPU data
Algorithm MSE RMSE MAE MAPE
BPTT 0.012 0.107 0.081 0.122
Random Walk 0.015 0.121 0.085 0.132
Moving Avg 0.014 0.109 0.082 0.127
Linear Regression 0.013 0.108 0.081 0.123
Backpropagation (Sliding Window 2) 0.013 0.107 0.082 0.125
Backpropagation (Sliding Window 10) 0.1953 0.4419 0.4386 6.3073
Backpropagation (Sliding Window 20) 0.1685 0.4105 0.4057 5.8130
Backpropagation (Sliding Window 30) 0.1193 0.3454 0.3423 4.9288
Abbreviations: BPTT, backpropagation through time; MAE, mean absolute error; MAPE,
mean absolute percentage error; MSE, mean square error; RMSE, root mean square error.
TABLE 11 Multi-step ahead test data prediction
accuracy (different host)
Step Ahead MSE RMSE MAE MAPE
1 Step Ahead 0.012 0.107 0.081 0.122
2 Step Ahead 0.016 0.127 0.098 0.15
3 Step Ahead 0.021 0.145 0.112 0.169
4 Step Ahead 0.026 0.161 0.126 0.189
5 Step Ahead 0.03 0.172 0.134 0.199
6 Step Ahead 0.032 0.177 0.14 0.204
Abbreviations: MAE, mean absolute error; MAPE, mean
absolute percentage error; MSE, mean square error; RMSE,
root mean square error.
4.3 Simulation results
This section presents the performance of each algorithm in a simulated data center environment. The BPTT has the ability to predict multiple time steps into the future (60 minutes) for both CPU and bandwidth, whereas the remaining algorithms provide one-step-ahead predictions (ie, 10 minutes) for CPU only. As BP with sliding window 2 outperformed each of the other BP sliding windows, it is the only BP configuration evaluated in the simulation experiment. The goal of this experiment is to investigate how well a "multi-time-step-ahead" approach can improve data center efficiency by determining when a host is overutilized and when is the best time to initiate live migration.
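To illustrate how such a decision might be taken, the sketch below combines a 1-hour CPU forecast with a 1-hour bandwidth forecast: a migration is triggered only if the host is predicted to exceed an overutilization threshold within the horizon, and the migration slot with the most predicted spare bandwidth before that point is chosen. This is a simplified sketch under our own assumptions; the threshold value, function name, and data structures are hypothetical and do not correspond to the simulator used in this paper.

```python
def schedule_migration(cpu_forecast, bw_forecast, cpu_threshold=0.8):
    """Decide whether and when to schedule a live migration off a host.

    cpu_forecast : predicted CPU utilization for the next 6 steps (10 min each)
    bw_forecast  : predicted available bandwidth (Mbps) for the same 6 steps
    Returns the index of the chosen migration step, or None if no action is needed.
    """
    # Only act if the host is predicted to become overutilized within the hour.
    overutilized = [i for i, u in enumerate(cpu_forecast) if u >= cpu_threshold]
    if not overutilized:
        return None

    first_violation = overutilized[0]
    # Consider the steps before the predicted violation (or step 0 if the
    # violation is immediate) and pick the one with the most spare bandwidth,
    # so the migration completes as quickly as possible.
    candidate_steps = range(max(first_violation, 1))
    return max(candidate_steps, key=lambda i: bw_forecast[i])

# Illustrative forecasts: CPU is predicted to cross the threshold at step 3.
cpu = [0.55, 0.62, 0.71, 0.86, 0.90, 0.88]
bw = [640, 610, 655, 590, 600, 620]
print(schedule_migration(cpu, bw))  # -> 2 (the 20-30 minute slot has the most bandwidth)
```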
The first metric examined was the SLAVs. Maintaining a low occurrence of SLAVs is an essential factor in the delivery of reliable, quality-assured cloud-based services. In this regard, it is imperative to consider the number of SLAVs incurred by all approaches throughout the simulation. Table 12 presents the average SLAVs each algorithm incurred during the simulations; the lower the SLAV value, the better the data center is performing. The multiple-step-ahead predictions of BPTT enabled it to anticipate well in advance whether overutilization was going to occur in any of the hosts. This allowed the RNN approach to start live migration well before a host could become overutilized and to select a suitable destination host for a VM that would not become overutilized in the next hour, thus averaging SLAV results of just 1.80776E-06. This is nearly 87% better than the next lowest value, from RW, while BP had the worst SLAV of all approaches.
A T-test was performed between the best and the second best algorithm for SLAV. The two-tailed P value is less than 0.0001 and the results are deemed statistically significant, with a 95% confidence interval of this difference from −15.82188702780 to −8.95795613255; BPTT had a standard deviation of 5.53272815845 compared with 20.17874430023 for RW. Figure 6B presents the results for all algorithms.
Figure 6A shows the energy consumption at each of the time steps. As can be seen from the results, BP had the lowest energy consumption of all approaches, with an average of 340.67 kWh per host. The BPTT had the second lowest energy consumption of 457.72 kWh per host, and RW had the worst energy consumption of all approaches, with an average of 764.22 kWh per host. One reason for RW having the worst energy consumption is that it also has the highest average migration count in the simulation, with 197 migrations, whereas both neural network approaches had the lowest migration counts, with 175 for BPTT and 120 for BP. The BPTT achieved a relatively low average energy consumption within the data center. One reason is that it selects optimal times to migrate when more bandwidth is available and also has better prediction accuracy, thus providing faster migrations and ensuring that hosts never enter an overutilized state.
TABLE 12 SLAV and ESV results
Metric SLAV ESV
BPTT 1.80776E-06 0.000827449
Random Walk 1.41977E-05 0.010850105
Moving Average 1.89143E-05 0.013299876
Linear Regression 2.66094E-05 0.016313468
Backpropagation 2.7531E-05 0.009379006
Abbreviations: BPTT, backpropagation through time; ESV, energy and SLA violations; SLAV, service level agreement violation.
FIGURE 6 Performance of all algorithms. A, Energy performance of all algorithms; B, SLAV performance of all algorithms. BP, backpropagation; BPTT, backpropagation through time; LR, linear regression; MA, moving average; RW, random walk [Colour figure can be viewed at wileyonlinelibrary.com]
The ability of BPTT to store past information makes it an ideal algorithm for sequence prediction of cloud resources. The BPTT produced the most accurate predictions for bandwidth utilization for both one and multiple time steps ahead. This allowed the algorithm to decide the most optimal times to migrate VMs based on network bandwidth usage. The BPTT achieved an average bandwidth usage of 610 MB per time step, whereas RW achieved the worst overall result of 646 MB per time step. This led to the BPTT having lower migration times for VMs. By choosing the specific time at which to migrate a VM, the BPTT algorithm was able to better utilize the available resources at critical times.
The SLA and energy metrics combine to create a metric known as ESV. This metric measures the overall data center performance in terms of minimizing energy and reducing SLAVs (ie, ESV = energy*SLAV). If we try to reduce energy consumption too much, the SLA violations will increase, because consolidating many VMs on a host increases the probability of overload. Therefore, it is desirable to obtain a method that consumes less power while still incurring fewer SLA violations.
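As a quick worked check of this formula (our own arithmetic, using the BPTT figures reported in this section and in Table 12, and taking the energy term as the average consumption per host):

\[
\mathrm{ESV} = \mathrm{energy}\times\mathrm{SLAV} \approx 457.72 \times 1.80776\times10^{-6} \approx 8.27\times10^{-4},
\]

which agrees with the BPTT ESV of 0.000827449 listed in Table 12.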
The lower the ESV, the better the performance the data center is achieving. Overall, the BPTT algorithm achieved the best ESV performance of 0.000827449 compared with the second best, RW, at 0.010850105. This is an improvement of 61% in data center efficiency. Again, a T-test was conducted between the two best performing algorithms. The two-tailed P value is less than 0.0001 and the results are deemed statistically significant, with a 95% confidence interval of this difference from −0.01458090588 to −0.00688034372; BPTT had a standard deviation of 0.00424366520 compared with 0.02314924312 for RW.
4.4 Discussions
The results of the experiments show that the RNN has the capability to improve upon traditional prediction methods such as RW, MA, and BP, predicting CPU utilization and bandwidth with a high degree of accuracy and thereby improving the performance of a data center. Even though the bandwidth data set contains a learnable pattern, the CPU data set used has varying demands, which makes it much harder to predict the CPU utilization of host machines. For this reason, we have excluded phenomena such as heavy workloads from this paper.
The first experiment determined whether an RNN could outperform traditional forecasting methods in predicting the next bandwidth value. The results from the MSE, MAE, RMSE, and MAPE indicated that BPTT achieved the highest accuracy, as shown in Table 3. The second part of this experiment examined how far into the future the recurrent network could predict bandwidth with a high degree of accuracy. The results indicate that the RNN can achieve a reasonable degree of accuracy when predicting multiple time steps into the future, even though it only needs two previous inputs to do so.
The second experiment involved determining which algorithm would produce the most accurate results in predicting CPU utilization. From the results, BPTT achieved the best performance, with an MAE of 0.0818 on the test data. The BPTT also produced the most accurate predictions when trained on one host's CPU utilization data and tested on a new host's CPU data.
The results from the final experiment highlight the efficiency of the BPTT algorithm. Each of the forecasting methods was tested in a simulated environment on the following metrics: energy, SLAV, and bandwidth usage. The BP algorithm achieved the lowest energy consumption but the highest SLAVs. The RW algorithm achieved the second lowest SLAV but the highest energy consumption. However, the BPTT achieved statistically significantly lower SLAV and ESV values, improving the data center efficiency by 61% when compared with the next best algorithm. One of the main reasons BPTT could achieve the most efficient results is that it can identify when host machines will become overutilized and the best time to schedule migration based on the network bandwidth.
Multi-time-step-ahead prediction is a difficult area in time series research. The additional noise present in both the CPU and bandwidth utilization data makes forecasting far into the future even more difficult. The BPTT forecasting method presented in this research could potentially be incorporated into many of the other subfields of cloud computing that involve host prediction, VM migration, and task scheduling to improve overall performance.
The results presented in this paper demonstrate that RNNs are capable of predicting CPU data 1 hour ahead with a relatively high degree of accuracy despite the noise in the data set. It is known that instantiating a new VM takes between 5 and 15 minutes.59 In real-world data centers, the recurrent network could be implemented to inform the cloud management system when a host is going to become over- or underutilized, so that the management system can take action and boot up new VM instances on a different host prior to initiating live migration from the overutilized host. This, in turn, would lead to smoother transitions of VMs being moved from a source host to a destination, reducing live migration times and decreasing the occurrence of SLA violations on host machines.
This paper focused on implementing BPTT and compared it with more traditional approaches such as BP and LR. Our results showed that, even when predicting one to six time steps into the future, the RNN trained with BPTT only needed two past CPU values. One reason for this is that the data were constantly fluctuating, meaning only the most recent values were required for the RNN-BPTT to make its predictions. Other RNN training algorithms, such as long short-term memory (LSTM), have been explored in the cloud domain previously; eg, the work of Janardhanan and Barrett69 demonstrated that, for longer forecasting problems where a single day's worth of CPU values (1440 time steps) is input, an LSTM model was able to outperform ARIMA modeling. Previous work by Gers et al70 demonstrated that, when LSTM networks are presented with data such as the Mackey-Glass series problem, LSTMs do not perform well because of the constant fluctuation of the data. Their results showed that a sliding window multilayer perceptron could produce more accurate results than an LSTM approach. Predicting CPU six time steps into the future based on the previous two values suffers from similarly high fluctuations as the Mackey-Glass series problem. However, at lower granularities beyond six time steps, we find that BPTT is more effective. Zhang et al demonstrated the usefulness of RNNs to predict CPU and RAM.71 As shown through our results and highlighted by Zhang et al, the ability of RNNs to retain information and to create their own representation of the data enables the algorithm to achieve highly accurate workload predictions on the Google cloud trace data set.
Efficient management of VMs in the cloud computing services offered by Google and Amazon is crucial to lowering their costs and providing a better service. The results presented in this paper demonstrate that an RNN can produce accurate forecasts well into the future; therefore, it could have the additional benefit of reducing the overall energy consumption of the data center. The recurrent network produced promising results for power consumption, having the second lowest consumption. Koomey has stated that, in 2010, data centers consumed 1.3% of the world's energy.72 Gartner Inc has highlighted that the ICT industry contributes about 2% of the global CO2 emitted each year, placing it on the same level as the aviation industry.73 Better optimization of cloud data centers needs to be in place to dramatically decrease energy consumption.
Research has shown that, in data centers, a significant portion of the host machines operate at 10 to 50% of their full capacity.74 This results in a considerable increase in energy costs for cloud companies. Duy et al have shown how neural networks can be utilized as a predictor to reduce energy consumption in a data center by turning off hosts when the traffic load is light.48 Our approach results in a saving of 40% when comparing the energy usage of the RNN and the RW. Companies such as Google have recently implemented their own DeepMind neural network tool to reduce their data center energy cost by 40%.75 This, in turn, will lead to millions of dollars saved on powering host machines and will also reduce Google's carbon footprint.
More and more companies are hiring cloud computing services such as Amazon and Google to process their "big data." These cloud computing services need to be effectively managed to provide the best possible service to their customers. Our approach has demonstrated that the cloud environment can be managed more effectively using ML techniques. This would place cloud computing services in a better position to handle the immense computing requirements faced in the age of big data.
The results from this paper showed that, when an RNN is trained on a "big data" data set such as the Google cluster trace, it is able to accurately predict host machine utilization for several time steps into the future. This would help cloud companies such as Google and Amazon optimize their data centers by reducing the number of host machines idling at 0 to 10% utilization for at least 20 to 30 minutes, which will aid in reducing energy consumption and, by extension, decrease CO2 emissions from cloud data centers.
5 CONCLUSION
To maximize resource usage within a data center and to ensure the SLA is met at all times, resource management strategies must be able to predict how each host is going to perform in the future. In this paper, we have conducted a competitive analysis of nonlinear and linear one-step-ahead prediction algorithms against a multi-time-step-ahead prediction algorithm. We have evaluated the proposed algorithms through extensive simulations on a large-scale experimental setup using workload traces from more than 600 host machines from the Google cluster trace data. The results of the experiments have shown that the memory retention and sequence prediction capabilities of an RNN allowed it to produce the most accurate predictions for both CPU and bandwidth utilization. The RNN was also able to decrease bandwidth usage during critical times, reduce the occurrence of SLAVs, and improve the overall efficiency of a cloud data center. With respect to future work, one area that could be explored is the incorporation of other resources, such as memory, to further improve migration decisions and overall system performance.
In summary, the main findings of this research are as follows.
1. Recurrent neural networks produce better predictions for host CPU and network bandwidth utilization when compared with traditional models.
2. Recurrent neural networks achieve high accuracy when predicting multiple time steps into the future for both CPU and bandwidth data. In both cases, the accuracy of the network predictions decreases linearly the further into the future the network attempts to predict.
3. RNNs produce statistically significant improvements in the efficiency of large-scale simulated cloud data centers.
ACKNOWLEDGEMENT
The second author would like to acknowledge the ongoing financial support provided to her by the Irish Research Council.
ORCID
M. Duggan http://orcid.org/0000-0001-9576-3884
REFERENCES
1. Cisco Visual Networking Index: Forecast and Methodology, 2014–2019. White Paper. San Jose, CA: Cisco; 2016.
2. Zhang Y, Sun W, Inoguchi Y. Predict task running time in grid environments based on CPU load predictions. Futur Gener Comput Syst.
2008;24(6):489-497.
3. Dinda PA, O'Hallaron DR. Host load prediction using linear models. Clust Comput. 2000;3(4):265-280.
4. Bey KB, Benhammadi F, Mokhtari A, Guessoum Z. CPU load prediction model for distributed computing. Paper presented at: Eighth
International Symposium on Parallel and Distributed Computing (ISPDC); 2009; Lisbon, Portugal.
5. Benson T, Anand A, Akella A, Zhang M. MicroTE: fine grained traffic engineering for data centers. In: Proceedings of the Seventh
COnference on Emerging Networking EXperiments and Technologies (CoNEXT); 2011; Tokyo, Japan.
6. Armbrust M, Fox A, Griffith R, et al. A view of cloud computing. Commun ACM. 2010;53(4):50-58.
7. Hu K, Sim A, Antoniades D, Dovrolis C. Estimating and forecasting network traffic performance based on statistical patterns observed in
SNMP data. In: Machine Learning and Data Mining in Pattern Recognition: 9th International Conference, MLDM 2013, New York, NY, USA,
July 19-25, 2013. Proceedings. Berlin, Germany: Springer-Verlag Berlin Heidelberg; 2013:601-615.
8. Akoush S, Sohan R, Rice A, Moore AW, Hopper A. Predicting the performance of virtual machine migration. Paper presented at: IEEE
International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems; 2010; Miami Beach, FL.
9. Wood T, Ramakrishnan K, Shenoy P, et al. CloudNet: dynamic pooling of cloud resources by live WAN migration of virtual machines.
IEEE/ACM Trans Netw. 2015;23(5):1568-1583.
10. Mandal U, Habib MF, Zhang S, Chowdhury P, Tornatore M, Mukherjee B. Heterogeneous bandwidth provisioning for virtual machine
migration over SDN-enabled optical networks. Paper presented at: Optical Fiber Communication Conference; 2014; San Francisco, CA.
11. Clark C, Fraser K, Hand S, et al. Live migration of virtual machines. In: Proceedings of the 2nd Conference on Symposium on Networked
Systems Design & Implementation (NSDI); 2005; Boston, MA.
12. Beloglazov A, Buyya R. Energy efficient resource management in virtualized cloud data centers. In: Proceedings of the 2010 10th
IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGRI); 2010; Melbourne, Australia.
13. Verma A, Ahuja P, Neogi A. pMapper: power and migration cost aware application placement in virtualized systems. In: Middleware 2008:
ACM/IFIP/USENIX 9th International Middleware Conference Leuven, Belgium, December 1-5, 2008 Proceedings. Berlin, Germany: Springer;
2008:243-264.
14. Chen J, Liu W, Song J. Network Performance-aware Virtual Machine Migration in Data Centers. Paper presented at: Third International
Conference on Cloud Computing (CloudComp); 2012; Vienna, Austria.
15. Piao JT, Yan J. A network-aware virtual machine placement and migration approach in cloud computing. Paper presented at: The Ninth
International Conference on Grid and Cloud Computing (GCC); 2010; Nanjing, China.
16. Stage A, Setzer T. Network-aware migration control and scheduling of differentiated virtual machine workloads. In: Proceedings of the
2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing; 2009; Vancouver, Canada.
17. Chen H, Kang H, Jiang G, Zhang Y. Coordinating virtual machine migrations in enterprise data centers and clouds.
18. Ghorbani S, Caesar M. Walk the line: consistent network updates with bandwidth guarantees. In: Proceedings of the First Workshop on
Hot Topics in software Defined Networks (HotSDN); 2012; Helsinki, Finland.
19. Duggan M, Duggan J, Howley E, Barrett E. A reinforcement learning approach for the scheduling of live migration from under utilised
hosts. Memetic Comput. 2017;9(4):283-293. https://doi.org/10.1007/s12293-016-0218-x
20. Duggan M, Duggan J, Howley E, Barrett E. A network aware approach for the scheduling of virtual machine migration during peak loads.
Clust Comput. 2017;20(3):2083-2094.
21. Chujai P, Kerdprasop N, Kerdprasop K. Time series analysis of household electric consumption with ARIMA and ARMA models. In:
Proceedings of the International MultiConference of Engineers and Computer Scientists; 2013; Hong Kong.
22. Enke D, Thawornwong S. The use of data mining and neural networks for forecasting stock market returns. Expert Syst Appl.
2005;29(4):927-940.
23. Tilman D, Fargione J, Wolff B, et al. Forecasting agriculturally driven global environmental change. Science. 2001;292(5515):281-284.
24. Duggan M, Duggan J, Howley E, Barrett E. An autonomous network aware VM migration strategy in cloud data centres. Paper presented
at: International Conference on Cloud and Autonomic Computing (ICCAC) ; 2016; Augsburg, Germany.
25. Duggan M, Flesk K, Duggan J, Howley E, Barrett E. A reinforcement learning approach for dynamic selection of virtual machines in cloud
data centres. Paper presented at: Sixth International Conference on Innovative Computing Technology (INTECH); 2016; Dublin, Ireland.
26. Shaw R, Howley E, Barrett E. Predicting the available bandwidth on intra cloud network links for deadline constrained workflow schedul-
ing in public clouds. In: Service-Oriented Computing: 15th International Conference, ICSOC 2017, Malaga, Spain, November 13-16, 2017,
Proceedings. Cham, Switzerland: Springer International Publishing AG ; 2017:221-228.
27. Shaw R, Howley E, Barrett E. An advanced reinforcement learning approach for energy-aware virtual machine consolidation in cloud
data centers. Paper presented at: 12th International Conference for Internet Technology and Secured Transactions (ICITST); 2017;
Cambridge, UK.
28. Barrett E, Howley E, Duggan J. Applying reinforcement learning towards automating resource allocation and application scalability in
the cloud. Concurr Comput Pract Exp. 2013;25(12):1656-1674.
29. Barrett E, Howley E, Duggan J. A learning architecture for scheduling workflow applications in the cloud. Paper presented at: Ninth IEEE
European Conference on Web Services (ECOWS); 2011; Lugano, Switzerland.
30. Tang M, Zhang T, Liu J, Chen J. Cloud service QoS prediction via exploiting collaborative filtering and location-based data smoothing.
Concurr Comput Pract Exp. 2015;27(18):5826-5839.
31. Dabbagh M, Hamdaoui B, Guizani M, Rayes A. Toward energy-efficient cloud computing: Prediction, consolidation, and overcommitment.
IEEE Netw. 2015;29(2):56-61.
32. Wang J, Huang C, He K, Wang X, Chen X, Qin K. An energy-aware resource allocation heuristics for VM scheduling in cloud. Paper
presented at: 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International
Conference on Embedded and Ubiquitous Computing; 2013; Zhangjiajie, China.
33. Farahnakian F, Pahikkala T, Liljeberg P, Plosila J, Hieu NT, Tenhunen H. Energy-aware VM consolidation in cloud data centers using
utilization prediction model. IEEE Trans Cloud Comput. 2016.
34. Nguyen TH, Di Francesco M, Yla-Jaaski A. Virtual machine consolidation with multiple usage prediction for energy-efficient cloud data
centers. IEEE Trans Serv Comput. 2017.
35. Bobroff N, Kochut A, Beaty K. Dynamic placement of virtual machines for managing sla violations. Paper presented at: 10th IFIP/IEEE
International Symposium on Integrated Network Management (IM); 2007; Munich, Germany.
36. Fu X, Zhou C. Predicted affinity based virtual machine placement in cloud computing environments. IEEE Trans Cloud Comput. 2017.
37. Chen T, Zhu Y, Gao X, Kong L, Chen G, Wang Y. Improving resource utilization via virtual machine placement in data center networks.
Mob Netw Appl. 2017;23(2):227-238.
38. Gai K, Du Z, Qiu M, Zhao H. Efficiency-aware workload optimizations of heterogeneous cloud computing for capacity planning in
financial industry. Paper presented at: IEEE 2nd International Conference on Cyber Security and Cloud Computing (CSCloud); 2015;
New York, NY.
39. Genez TA, Bittencourt L, Fonseca N, Madeira E. Estimation of the available bandwidth in inter-cloud links for task scheduling in hybrid
clouds. IEEE Trans Cloud Comput. 2015.
40. Mason K, Duggan M, Barrett E, Duggan J, Howley E. Predicting host CPU utilization in the cloud using evolutionary neural networks.
Futur Gener Comput Syst. 2018;86:162-173.
41. Cui Y, Lin Y, Guo Y, Li R, Wang Z. Optimizing live migration of virtual machines with context based prediction algorithm. Adv Intell
Syst Res. 2013.
42. Wu Y, Zhao M. Performance modeling of virtual machine live migration. Paper presented at: IEEE 4th International Conference on Cloud
Computing (CLOUD); 2011; Washington, DC.
43. Beloglazov A, Buyya R. Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic
consolidation of virtual machines in cloud data centers. Concurr Comput Pract Exp. 2012;24(13):1397-1420.
44. Duggan M, Mason K, Duggan J, Howley E, Barrett E. Predicting host CPU utilization in cloud computing using recurrent neural networks.
Paper presented at: 12th International Conference for Internet Technology and Secured Transactions (ICITST); 2017; Cambridge, UK.
45. Zhang G, Patuwo BE, Hu MY. Forecasting with artificial neural networks: the state of the art. Int J Forecast. 1998;14(1):35-62.
46. Zhang Y, Sun W, Inoguchi Y. CPU load predictions on the computational grid. IEICE Trans Inf Syst. 2007;90(1):40-47.
47. Cao J, Fu J, Li M, Chen J. CPU load prediction for cloud environment based on a dynamic ensemble model. Softw Pract Exp.
2014;44(7):793-804.
48. Duy TVT, Sato Y, Inoguchi Y. Performance evaluation of a green scheduling algorithm for energy savings in cloud computing. Paper
presented at: IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW); 2010;
Atlanta, GA.
49. Vieira K, Schulter A, Westphall C, Westphall C. Intrusion detection techniques in grid and cloud computing environment. IT Prof IEEE
Comput Soc. 2010;12(4):38-43.
50. Joshi B, Vijayan AS, Joshi BK. Securing cloud computing environment against DDoS attacks. Paper presented at: International Conference
on Computer Communication and Informatics (ICCCI); 2012; Coimbatore, India.
51. Prevost JJ, Nagothu K, Kelley B, Jamshidi M. Prediction of cloud data center networks loads using stochastic and neural models. Paper
presented at: 6th International Conference on System of Systems Engineering (SoSE); 2011; Albuquerque, NM.
52. Zhang W, Duan P, Yang LT, et al. Resource requests prediction in the cloud computing environment with a deep belief network. Softw
Pract Exp. 2017;47(3):473-488.
53. Calheiros RN, Masoumi E, Ranjan R, Buyya R. Workload prediction using ARIMA model and its impact on cloud applications' QoS. IEEE
Trans Cloud Comput. 2015;3(4):449-458.
54. Bishop CM. Neural Networks for Pattern Recognition. Oxford, UK: Oxford University Press; 1995.
55. Mason K, Duggan J, Howley E. Evolving multi-objective neural networks using differential evolution for dynamic economic emission
dispatch. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO); 2017; Berlin, Germany.
56. Mason K, Duggan J, Howley E. Neural network topology and weight optimization through neuro differential evolution. In: Proceedings
of the Genetic and Evolutionary Computation Conference Companion (GECCO); 2017; Berlin, Germany.
57. Mason K, Duggan J, Howley E. A meta optimisation analysis of particle swarm optimisation velocity update equations for watershed
management learning. Appl Soft Comput. 2018;62:148-161.
58. Werbos PJ. Backpropagation through time: what it does and how to do it. Proc IEEE. 1990;78(10):1550-1560.
59. Islam S, Keung J, Lee K, Liu A. Empirical prediction models for adaptive resource provisioning in the cloud. Futur Gener Comput Syst.
2012;28(1):155-162.
60. Hyndman RJ, Athanasopoulos G. Forecasting: Principles and Practice. OTexts; 2014.
61. Box GEP, Jenkins GM, Reinsel GC, Ljung GM. Time Series Analysis: Forecasting and Control. Hoboken, NJ: John Wiley & Sons; 2015.
62. Sanghrajka S, Mahajan N, Sion R. Cloud Performance Benchmark Series: Network Performance-Amazon EC2. Technical Report.
Brookhaven, NY: Stony Brook University; 2011.
63. Reiss C, Wilkes J, Hellerstein JL. Google Cluster-Usage Traces: Format+ Schema. White Paper. Mountain View, CA: Google Inc; 2011.
64. Voorsluys W, Broberg J, Venugopal S, Buyya R. Cost of virtual machine live migration in clouds: a performance evaluation. In: Cloud Com-
puting: First International Conference, CloudCom 2009, Beijing, China, December 1-4, 2009. Proceedings. Berlin, Germany: Springer-Verlag
Berlin Heidelberg; 2009:254-265.
65. Minas L, Ellison B. Energy Efficiency for Information Technology: How to Reduce Power Consumption in Servers and Data Centers. Santa Clara, CA: Intel Press; 2009.
66. Fan X, Weber W-D, Barroso LA. Power provisioning for a warehouse-sized computer. ACM SIGARCH Comput Archit News.
2007;35(2):13-23.
67. Kusic D, Kephart JO, Hanson JE, Kandasamy N, Jiang G. Power and performance management of virtualized computing environments
via lookahead control. Clust Comput. 2009;12(1):1-15.
68. Garg SK, Buyya R. NetworkCloudSim: modelling parallel applications in cloud simulations. Paper presented at: IEEE 4th International
Conference on Utility and Cloud Computing (UCC); 2011; Melbourne, Australia.
69. Janardhanan D, Barrett E. CPU workload forecasting of machines in data centers using LSTM recurrent neural networks and
ARIMA models. Paper presented at: 12th International Conference for Internet Technology and Secured Transactions (ICITST); 2017;
Cambridge, UK.
70. Gers FA, Eck D, Schmidhuber J. Applying LSTM to time series predictable through time-window approaches. In: Neural Nets WIRN
Vietri-01: Proceedings of the 12th Italian Workshop on Neural Nets, Vietri sul Mare, Salerno, Italy, 17-19 May 2001. London, UK:
Springer-Verlag London; 2002.
71. Zhang W, Li B, Zhao D, Gong F, Lu Q. Workload prediction for cloud cluster using a recurrent neural network. Paper presented at:
International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI); 2016; Beijing, China.
72. Koomey JG. Estimating Total Power Consumption by Servers in the US and the World. 2007.
73. Gartner Inc. Gartner Estimates ICT Industry Accounts for 2 Percent of Global CO2 Emissions. 2007. http://www.gartner.com/newsroom/
id/503867
74. Barroso LA, Hölzle U. The case for energy-proportional computing. Computer. 2007;40(12):33-37.
75. Gao J, Jamidar R. Machine Learning Applications for Data Center Optimization. White Paper. Mountain View, CA: Google Inc; 2014.
Howtocitethisarticle: Duggan M, Shaw R, Duggan J, Howley E, Barrett E. A multitime-steps-ahead
prediction approach for scheduling live migration in cloud data centers. Softw Pract Exper. 2018;1–23.
https://doi.org/10.1002/spe.2635