Received: 29 December 2017 Revised: 29 June 2018 Accepted: 25 July 2018
DOI: 10.1002/spe.2635
SPECIAL ISSUE PAPER
A multitime-steps-ahead prediction approach for
scheduling live migration in cloud data centers
M. Duggan, R. Shaw, J. Duggan, E. Howley, E. Barrett
Information Technology, National
University of Ireland Galway, Galway,
Ireland
Correspondence
M. Duggan, Information Technology,
National University of Ireland Galway,
Galway H91 TK33, Ireland.
Email: m.duggan1@nuigalway.ie
Funding information
Irish Research Council for Science,
Engineering and Technology
Summary
One of the major challenges facing cloud computing is to accurately predict
future resource usage to provision data centers for future demands. Cloud
resources are constantly in a state of flux, making it difficult for forecasting algo-
rithms to produce accurate predictions for short time scales (ie, 5 minutes to
1 hour). This motivates the research presented in this paper, which compares
nonlinear and linear forecasting methods with a sequence prediction algorithm
known as a recurrent neural network to predict CPU utilization and network
bandwidth usage for live migration. Experimental results demonstrate that a
multitime-ahead prediction algorithm reduces bandwidth consumption during
critical times and improves overall efficiency of a data center.
KEYWORDS
cloud computing, CPU, network bandwidth, neural network, prediction algorithms
1 INTRODUCTION
It is estimated that, by 2020, there will be 51 974 GB of internet traffic generated per second,1 and with more technology companies offering computing as a service via cloud computing, this will generate massive volumes of data from hosts and virtual machines (VMs). The more information available from these host machines and VMs, the better the understanding of how they function and the more in-depth the monitoring of cloud resources can be. CPU utilization is one of the most important metrics for measuring and testing the performance of a host machine. Recent studies2-4 investigated one-step-ahead CPU utilization forecasting methods such as local regression and feed-forward neural networks. However, these one-step-ahead prediction models (which usually forecast on a time scale no longer than 5 minutes ahead) give insufficient time for cloud resources to be adjusted when sudden high demands occur; if an algorithm can predict further into the future, there is a greater chance of preventing a host from becoming overutilized. Benson et al5 have shown that predicting a workload on a short time scale, such as 5- to 10-minute intervals, is more difficult than predicting for long-term time scales (ie, time steps of days or weeks), because cloud resources over these short time scales can be extremely unpredictable. This is highlighted by Armbrust et al,6 who list performance unpredictability as one of the main obstacles preventing the growth of cloud computing. How far into the future an algorithm can accurately predict resource usage is critical to how well a data center is able to optimize its resources. This is one of the key ideas that has motivated this research. Another key metric to monitor when investigating live migration is the network bandwidth availability, as it affects how quickly a VM will be transferred from a source to a destination host. Akoush et al presented models that predict migration performance for specific workloads on a Xen virtual machine monitor, and research by Hu et al demonstrated that estimating future network traffic can enable better planning, greater resource provisioning, and faster transferring of data.7,8 At present, resource prediction approaches only examine single-time-step-ahead prediction.
In this study, we apply several forecasting algorithms to optimize the scheduling times of migration and improve a data center's efficiency by reducing service level agreement violations (SLAVs). We implement a multiple-time-step-look-ahead
algorithm known as a recurrent neural network (RNN) to predict CPU and network bandwidth and compare the results
against both traditional one step ahead nonlinear and linear forecasting algorithms. A major reason for implementing
the RNN is that it has the ability to retain information and accurately make predictions for time series problems, making
it a promising candidate to predict cloud resources with greater accuracy when compared with traditional approaches.
The main contributions of this paper are as follows.
1. Compare traditional nonlinear and linear forecasting model accuracy with an RNN when predicting CPU and band-
width “one time step” ahead. The results are validated by using evaluation metrics such as mean square error (MSE),
root MSE (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).
2. Examine the performance of the RNN when predicting host CPU and network bandwidth for multitime steps into the
future.
3. Examine the performance of a data center when both the one-time-step model and the multitime-steps-ahead model are implemented in a simulated cloud environment.
The rest of this paper is organized as follows. In Section 2, we discuss the background of live migration, related work
on network aware live migrations, discuss how artificial intelligence (AI) is impacting cloud computing, and detail the
algorithms used in this paper. Section 3 details the experiments conducted and data models used. In Section 4, we present
our experimental analysis and results and discussion. Section 5 concludes this paper.
2 RELATED WORK AND BACKGROUND
Live VM migration allows VMs to be moved from source to destination hosts while ensuring no performance degradation to a user's experience occurs. Live migration transfers the VM's entire system state, including CPU, memory, and disk. Live migration consists of two phases, precopy (VM migration time) and stop-and-copy (downtime). During the precopy phase, the VM is still running on the source host while its memory pages are copied over to the destination host. The stop-and-copy phase stops the VM on the source host after a certain number of iterations of memory page copying, and the remaining unsynchronized memory pages of the VM are then copied over to the destination host. Once this is completed, the VM is activated on the destination host. The live migration process consumes significant resources, such as CPU and network bandwidth, from both the source and the destination hosts. Several studies, such as the works of Wood et al and Mandal et al, have been conducted on provisioning adequate bandwidth.9,10 They both highlight that migration times are reduced when a high amount of bandwidth is available. Clarke et al highlighted that VM migration consumes large amounts of bandwidth for several seconds.11 The total migration time and downtime of a set of VMs depend on how many VMs are simultaneously transferred off an overutilized host machine. This highlights the need for a forecasting model that will accurately predict when a host will become overutilized and decide when live migration should be scheduled so as not to saturate the network bandwidth during peak times.
2.1 Network aware live migration
Even though live migration has attracted attention in recent years, it has primarily focused on power management and
the efficiency of physical hosts, such as the works of Beloglazov and Buyya and Verma et al.12,13 Recently, researchers have
been looking closely into how network traffic contributes to long delays when VM migrations occur. Chen et al proposed
a novel migration strategy that quantifies the benefits of VM migration and cost of VM placement to a network link load in
data centers.14 They take the network link load and bandwidth cost factors into account when conducting migration. Piao
and Yan presented a network aware VM placement and migration approach for data intensive applications.15 They devel-
oped a model that places a VM on a host machine, taking into account the network conditions between the source host and
destination. Stage and Setzer proposed a migration and scheduling model, while considering bandwidth requirements
and network topology.16 Chen et al researched how to coordinate multiple VM migrations while sharing network links and
global bandwidth resources.17 Ghorbani and Caesar presented a method for guaranteeing network bandwidth; however, their approach comes at a cost of high network resource utilization.18 Duggan et al19 examined how an autonomous
learning agent decides appropriate times to schedule live migration from underutilized hosts by observing cloud traf-
fic demand patterns. They show that, by analyzing current bandwidth values, reinforcement learning can enhance live migration, reducing energy consumption and improving overall system performance based on a service level agree-
ment (SLA). Duggan et al investigated when to migrate VMs from an overutilized host by using reinforcement learning
and a moving average (MA) prediction approach for bandwidth availability to determine the optimal time to schedule
live migration.20 The work in this paper differs from the aforementioned research as we examine several nonlinear, linear,
and sequence prediction algorithms to decide when a host will become overutilized and to determine when to migrate
VMs based on network bandwidth availability.
2.2 Forecasting
Accurate forecasting techniques are important for effective planning to predict future resource requirements and have a significant bearing on practical decision making. As a result, forecasting and prediction methodologies have been widely studied and have been applied to several different areas such as predicting household energy consumption,21
stock market returns,22 and global environmental change.23 Recently, there has been a move toward integrating AI tech-
niques in general to improve the overall efficiency of a cloud data center. Several works show how a broad range of AI
algorithms can provide cloud systems with the abilities to better adapt to the changes in cloud resource consumption to
improve resource scaling, VM live migration,19,20,24-27 and resource allocation28,29 in cloud computing. Tang et al exploited
collaborative filtering and location-based data smoothing for cloud service quality of service (QoS) prediction.30 Their
algorithm employed location-based data smoothing to fill missing QoS values in the users' QoS matrix and outperformed other collaborative filtering techniques. The smoothing algorithm was based on the observation that users from the same local neighborhood have similar QoS performance.
The inherently dynamic nature of cloud workloads makes it difficult to make informed resource management decisions
without a relatively accurate estimate of future resource requirements.31,32 In recent times, there has been a growing
interest in the application of statistical methods and machine learning (ML) techniques to improve the efficiency and
reliability of cloud services. This growing trend is largely due to the many fruitful and compelling benefits of adopting
predictive ML solutions including the potential to deliver greater energy efficiency and improved performance across a
broad range of areas. One area in particular that has seen a significant growth in the number of predictive-based solutions
is VM placement optimization.33-37 These predictive-based solutions showed that by predicting future resource demand
they could improve energy related costs and environmental sustainability of modern data centers. Predictive algorithms
have also been used to improve cloud capacity planning and execution times by predicting server workloads.32,38 Research
has also explored the application of predictive techniques to estimate future network load for workflow scheduling on
the cloud.7,26,39 The results showed that adopting more proactive solutions enables better planning, improving the efficiency of scheduling decisions while also reducing costs significantly. In particular, results showed a reduction in
data transfer times while also increasing the number of valid schedules that are better able to meet QoS requirements.
Live migration is a promising technique to improve load balancing, resource utilization, and cost management while also supporting energy efficiency in clouds.18 Despite such benefits, live VM migration remains a challenge
for cloud providers, since the reallocation of VMs in the data center results in significant overheads on both the physical
hosts and also the shared network. Poor migration decisions result in hosts becoming overloaded, data transfer delays,
and SLAVs.16,18 Currently, the vast majority of works in the literature focus on heuristic approaches, which often base
their migration decisions on current resource demands.13-15,18 However, such methods often lend themselves to less effi-
cient resource utilization and redundant decisions in the data center. Application workloads often exhibit time-varying
demand patterns; failing to consider future demand can quickly result in redundant migration decisions, having a negative
impact on energy and performance.40 By estimating future resource demand, resources utilization can be more efficiently
planned, allowing providers to determine when a host will become overutilized while also enabling data to be transferred
at more optimal times to improve the overall QoS provided. Akoush et al8 demonstrated that, by using a predictive-based
approach, they could accurately estimate migration times to enable more dynamic and intelligent placements of VMs
without degrading performance. Cui et al41 introduced a novel context-based prediction algorithm to optimize the pre-
copy phase of the live migration process. The results showed that by using prediction they could significantly shorten
total migration time, downtime, and the total pages transferred. Using regression, Wu and Zhao42 proposed a performance
model to predict migration latency. The model can be employed to predict migration time given its application's behavior
and the resources available to the migration. Forecasting CPU resource usage to enable improved migration scheduling
is an area that has always been a popular avenue of research in cloud computing. Some of the most influential work
stems from authors such as Liu et al, who introduced a 10-second look ahead CPU prediction approach to guide schedul-
ing decisions. Beloglazov and Buyya also proposed the implementation of a local regression algorithm to improve the
management of cloud resources by predicting when hosts will become overutilized in a data center based on CPU utilization.43 The results demonstrated that, by estimating CPU demand, they could migrate VMs before hosts became overloaded, improving energy efficiency and reducing SLAVs and VM migrations, overall resulting in more optimized resource utilization.
The ability to accurately predict future resource usage is a key challenge facing cloud resource management strategies
due to the growing complexity of modern data centers.44 The accuracy of a prediction model has a large impact on the
overall performance of live migration. In determining which hosts in the data center will become overutilized, inaccu-
rate predictions can result in suboptimal decisions. In cases where the prediction model is overpredicting CPU resource
usage, the system may migrate VMs off hosts whose actual resource usage in future time steps remains at an optimal
level. As a result the system would incur unnecessary migrations, increased energy consumption, and SLAVs. Conversely,
underpredicting CPU resource usage could also result in failing to migrate VMs off hosts that will be overutilized in the
future leading to SLAVs and greater energy consumption. Similar to the previous scenarios, inaccurate resource predic-
tions would also have a negative impact on scheduling live migrations. In particular, using such estimates would result in
VMs being migrated at suboptimal times, causing increased transfer times and SLAVs, while overall resulting in inefficient
usage of network resources. An accurate predictive model would allow for more optimal and proactive migration deci-
sions. By obtaining a relatively accurate estimate of CPU demand, hosts likely to become overloaded can be detected in
advance and action can be taken prior to incurring significant service degradation.43 Moreover, an accurate estimation of
bandwidth resources can also improve the live migration process. An accurate estimate would allow a solution to better
adapt to dynamically changing network conditions to generate more efficient scheduling decisions.26 Overall, employ-
ing an accurate prediction model for live migration would enable better planning, more efficient resource provisioning,
and faster transferring of data as highlighted in the following studies.7,8,41-43 Given the importance of accurate predictive
models, our work conducts a comparative study using novel and commonly used forecasting techniques including neu-
ral networks, which are known for their powerful predictive capabilities.45 We compare the accuracy of each model on
both the training and test data sets across a range of widely used performance metrics, namely, MSE, RMSE, MAE, and
MAPE. In other works, Zhang et al examined the problem of predicting available CPU performance in a time-shared grid system.46 They evaluate a new and innovative method to predict the one-step-ahead CPU load in a grid. Their prediction strategy forecasts the future CPU load based on the variation tendency over several past steps and on previous similar patterns. They evaluated their strategy with four different ML algorithms on a large load trace, reducing the prediction errors to between 22% and 86% less than those incurred by four previous methods. Cao et al applied a novel ensemble
model for online CPU load predictions in the cloud environment.47 The main focus of their model was the multiple pre-
dictor set, whose predictor members can be dynamically adjusted. Their results showed that an ensemble predictor model
can achieve high prediction accuracy.
Neural networks are one of the most effective and versatile ML algorithms and have been successfully applied to
areas of cloud computing in scheduling,48 intrusion detection,49 distributed denial of service attack defense50 and load
forecasting.51 Neural networks have previously been used to forecast resource demands in cloud computing. Duy et al
employed a neural network predictor for optimizing server power consumption in a data center.48 They use a feed-forward
neural network to predict future load demands based on historical demands to turn on/off servers to minimize the
energy usage. Prevost et al implemented neural network and linear predictor algorithms to forecast future workloads.51
Bey et al used several different models for time series prediction. They use an adaptive network to estimate the future value of CPU load for distributed computing.4 However, their hybrid predictors were designed for one-step-ahead prediction, and the work presented in this paper builds on this by predicting both one step and multiple steps ahead. All of the aforementioned research outlines how neural networks are effective at addressing many of the problems in cloud computing, in particular, CPU forecasting. The research presented in this paper makes the novel contribution of applying RNNs to predict CPU utilization to determine when overutilization will occur on a host and to predict network bandwidth to determine the optimal times to migrate VMs. Zhang et al showed that resource request prediction can be improved by
the latest deep learning techniques.52 Their deep belief network was used to predict both CPU and random access memory
(RAM). Their approach improves short-term prediction by reducing the MSE by 76% and 61% and long-term predictions
by 83% and 67% for both CPU and RAM. Mason et al40 implemented a number of state-of-the-art swarm and evolutionary optimization algorithms to train a neural network to predict CPU usage, such as particle swarm optimization, differential evolution, and covariance matrix adaptation evolutionary strategy. Their results show that the covariance matrix adaptation evolutionary strategy-trained neural network outperforms both differential evolution and particle swarm optimization, but most importantly, all swarm- and evolutionary-trained neural networks outperform traditional approaches such as linear regression (LR) and MA.
As mentioned earlier, recently, network bandwidth prediction has been gaining attention in cloud research. By esti-
mating future demand, resources can be more efficiently planned for, allocated in advance, and released once they are
no longer required to improve the overall QoS provided.53 In similar work, Hu et al applied autoregressive integrated MA
(ARIMA) modeling to estimate network traffic performance based on simple network management protocol data.7In
addition, Genez et al39 proposed the implementation of an alternative multiple LR model to combat the impact of impre-
cise estimates of bandwidth availability on workflow scheduling decisions in hybrid clouds. Their results showed that
schedulers, which adopt such an approach, can increase the number of qualified schedules that meet QoS requirements.
Shaw et al used an ARIMA model to forecast a day's worth of bandwidth utilization within a data center.26
2.3 Comparative forecasting methods
In this research, we compare several nonlinear and linear “one time step” ahead prediction algorithms with a multi-step-ahead algorithm known as an RNN to predict host CPU and network bandwidth utilization. The following section describes all algorithms used in this paper. The research presented in this paper makes a novel contribution by comparing nonlinear and linear prediction algorithms that predict host CPU and network bandwidth one step ahead with an RNN that predicts multiple time steps ahead to optimize live migration.
2.3.1 Nonlinear prediction methods
Neural networks are function approximators that are inspired by the biological neural networks that constitute the human
brain.54 Some of the applications of neural networks include power generation,55 control,56 and watershed management.57
Figure 1 illustrates the architecture for a neural network, which is arranged in a number of layers. The input layer is
responsible for taking in the inputs to the model, the hidden layer is where the computation is carried out, and the output
layer produces the output of the model.
The standard feed-forward network consists of an input layer of neurons, one or multiple hidden layers of neurons, and
an output layer. The neural network receives information in the form of a signal (normalized between 0 and 1) through the input layer neurons and then outputs a signal using the sigmoid function. The signal or input that the network receives
in this paper is in the form of two CPU utilization values from a host machine or two bandwidth utilization values. Both
sets of data are normalized between 0 and 1.
The two input values (CPU or bandwidth) are propagated forward through the hidden layers of neurons via synapses
(weighted connections). Then, the network calculates an output at the output layer neuron or neurons. In this paper, only
one output is needed, and the output signal corresponds to a future CPU or bandwidth value of a host machine. An error
signal is calculated by finding the difference between the actual and the predicted value. This error is then propagated
back through the network, and the weights (synapses) are adjusted to correct the error of the prediction.
Aside from the input layer, a neuron in any other layer receives as its input the sum of the weighted signals output by the other connected neurons. A neuron's input signal is described by

$$v_j = \sum_{i=1}^{N} w_{i,j}\, a_i, \qquad (1)$$
FIGURE 1 This figure illustrates a recurrent neural network.55 Neurons are connected by weighted synapses that pass signals between neurons. The recurrent synapses can be seen in the hidden layer of neurons. This gives the recurrent network the ability to retain information
where $v_j$ is the input to a neuron in the $j$th layer, layer $i$ is the preceding layer to $j$ that contains $N$ neurons, each neuron in layer $i$ has output $a_i$, and each of these output signals is weighted by the value $w_{i,j}$ as it is passed to each neuron in layer $j$.
Each neuron $a_i$ outputs a value between 0 and 1. This output value is determined by the activation function of the neuron. The most commonly used activation function is the sigmoid function. This is described by

$$a_j = \frac{1}{1 + \exp(-v_j)}. \qquad (2)$$
This research implements an RNN, illustrated in Figure 1. Recurrent networks are different from the standard
feed-forward networks as the hidden layer neurons have recurrent connections. These connections allow the hidden layer
neurons to connect to itself, thus giving the neural network memory of previous predictions, which makes it well suited
to the problem of predicting CPU utilization or bandwidth demand. The recurrent network in this paper is trained using
the popular backpropagation-through-time (BPTT) algorithm.58 The idea of BPTT is the unfolding of the RNN at a dis-
crete time into a multilayer feed-forward neural network each time a sequence is processed. BPTT differs from training a feed-forward neural network in that it enables the RNN to store past information; thus, it is suitable for sequential models.
The standard feed-forward algorithm is updated as follows:

$$v_j = \sum_{i=1}^{N} w_{i,j}\, a_i + \sum_{h=1}^{m} s_h(t-1)\, u_{ih}, \qquad (3)$$

where $U$ is the recurrent weight matrix and $s_h(t-1)$ is the previous hidden layer state.
In the normal backpropagation (BP) algorithm, weights are updated by calculating the cost function (the error between the actual and the predicted answers), shown as

$$C = \frac{1}{2} \sum_{p=1}^{k} \sum_{e=1}^{o} (d_{pk} - y_{pk})^2, \qquad (4)$$

where $d$ is the desired output, $k$ is the total number of training samples, and $o$ is the number of output units. Then, the change in weights for the output nodes can be calculated as shown

$$\delta_{pk} = (d_{pk} - y_{pk})\, g'(net_{pk}), \qquad (5)$$

where $g$ is the activation function ($g'$ its derivative) and $net$ represents the neuron's input. The changes in weights for the hidden layer can be represented as shown

$$\delta_{pj} = \sum_{k=1}^{o} \delta_{pk}\, w_{kj}\, g'(net_{pj}). \qquad (6)$$
Therefore, the recurrent weight updates can then be backpropagated through the network as shown

$$\Delta u_{ih} = \sum_{p=1}^{N} \delta_{pj}\, s_{ph}(t-1). \qquad (7)$$
The reason for using the RNN to predict cloud resources such as CPU and bandwidth utilization over a feed-forward
neural network is due to their ability to retain information and accurately make predictions for time series problems.
This makes it a promising candidate to predict CPU and bandwidth utilization with greater accuracy when compared
with traditional approaches. Each neural network used in this research has three hidden neurons in the hidden layer and
has two inputs from the input layer. The inputs into the networks are the current and previous values of CPU utilization
or bandwidth demand. Parameter sweeps showed that a network with three hidden neurons produced the best results. The sweeps also highlighted that using more than two inputs for the RNN, for both the bandwidth and CPU utilization data, did not increase performance. The network had one output that corresponded to
the network's prediction of future CPU utilization or bandwidth data. Each neural network will be trained over 10 000
evaluations and will be evaluated on unseen test data. The experiments are repeated over 10 runs to ensure statistically
significant results.
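As a rough illustration of the architecture just described, and not the authors' implementation, the following Python sketch performs the forward pass of Equations (2) and (3) for the 2-input, 3-hidden-neuron, 1-output Elman-style network used in this paper; the random weight values are placeholders for weights that would normally be learned with BPTT.

```python
import numpy as np

def sigmoid(v):
    # Equation (2): logistic activation, output in (0, 1)
    return 1.0 / (1.0 + np.exp(-v))

class TinyRNN:
    """Minimal Elman-style RNN: 2 inputs, 3 hidden neurons, 1 output (illustrative only)."""

    def __init__(self, n_in=2, n_hidden=3, n_out=1, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(scale=0.1, size=(n_hidden, n_in))    # input weights w_{i,j}
        self.U = rng.normal(scale=0.1, size=(n_hidden, n_hidden))   # recurrent weights u_{ih}
        self.W_out = rng.normal(scale=0.1, size=(n_out, n_hidden))  # output weights
        self.s = np.zeros(n_hidden)                                 # hidden state s_h(t-1)

    def step(self, x):
        # Equation (3): weighted inputs plus the recurrent contribution of the previous state
        v = self.W_in @ x + self.U @ self.s
        self.s = sigmoid(v)                  # hidden activations, retained for the next step
        return sigmoid(self.W_out @ self.s)  # predicted (normalized) CPU or bandwidth value

# Feed a normalized series two values at a time (previous and current observation)
rnn = TinyRNN()
series = [0.21, 0.25, 0.31, 0.28, 0.35]
for t in range(1, len(series)):
    y_hat = rnn.step(np.array([series[t - 1], series[t]]))
    print(f"t={t}: predicted next value {y_hat[0]:.3f}")
```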
Backpropagation is the most popular method of neural network training and is suitable for supervised learning prob-
lems. The algorithm works by calculating the error between the target output and the observed output. This error is
then propagated back through the network and is used to update the weights. A standard feed-forward BP algorithm
is implemented in this research as a benchmark algorithm for comparative purposes against the RNN algorithm. This
network also has three hidden neurons and one output. The BP algorithm is similar to the recurrent network described in Section 2.3.1. The feed-forward pass is calculated exactly as in Equation 2, and the BP calculation is the same as in Equations 3, 5, and 6, however, without the recurrent layer calculation. In this paper, we also examine how many inputs
a standard backprop-trained network needs to produce similar accuracy to a recurrent network. As the RNN retains information from previous predictions, we have implemented the sliding window technique59 to enable the BP algorithm to
predict both CPU and bandwidth. An important advantage of feed-forward neural networks is that they are capable of
modeling more complex nonlinear functions; as a result, they are widely used as time series forecasters. However, neural
networks rely on the assumption that there is a learnable mapping from a set of inputs sequences to output values. Unlike
RNNs, which have the ability to learn the context (temporal dependency) of observations over time, a key requirement of
feed-forward neural networks is the specification of the temporal dependency of the time series data in the design of the
model. As a result, experiments were conducted with the sliding window method, which maps an input sequence of size $k$ to an output value $y$, where the input sequence $x$ is a vector of bandwidth or CPU values over $k$ time intervals and the predicted value $y$ is a single time step ahead of the input sequence. The window sizes considered in this paper are 2, 10, 20, and 30; these correspond to the number of CPU or bandwidth inputs to the network.
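To make the sliding-window setup concrete, the sketch below, which is an illustration rather than the authors' code, converts a normalized utilization series into input windows of size k paired with one-step-ahead targets, as used to feed the BP-trained feed-forward networks with window sizes 2, 10, 20, and 30; the synthetic sine series stands in for real CPU or bandwidth data.

```python
import numpy as np

def sliding_window(series, k):
    """Map a 1-D series to (X, y) pairs: k consecutive values -> the next value."""
    X, y = [], []
    for t in range(len(series) - k):
        X.append(series[t:t + k])   # input sequence of length k
        y.append(series[t + k])     # one-step-ahead target
    return np.array(X), np.array(y)

series = 0.5 + 0.4 * np.sin(np.linspace(0, 12, 300))   # stand-in for normalized CPU values
for k in (2, 10, 20, 30):
    X, y = sliding_window(series, k)
    print(f"window size {k}: {X.shape[0]} training pairs, each with {X.shape[1]} inputs")
```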
ARIMA modeling is one of the most popular and frequently used forecasting approaches in time series analysis.60
At its simplest, a time series can be described as a collection of observations over successive time intervals from which
future values may be predicted. Time series data is often composed of several fundamental components such as long term
trends, seasonal fluctuations, and correlations between sequential observations. The aim of an ARIMA model is to identify
and describe the underlying components and systematic variations in the time series data to forecast future values. An
ARIMA model is defined by three terms denoted as (p, d, q). The identification of a valid model is the process of finding
suitable values for p, d, and q, which best capture the fundamental patterns in the data.
The integrated component of the model (d) is identified prior to determining the values of p and q. One of the fundamen-
tal principles in applying an ARIMA methodology is the time series data must be stationary. The concept of stationarity
plays a crucial role in the process of fitting an ARIMA model; a nonstationary series is often unstable and can result in
false correlations in the series making it extremely difficult to model. In general, a stationary series is one whose statisti-
cal properties such as mean and variance remain constant over time. To transform a nonstationary series into a stationary
one, the series must be differenced; this is achieved by subtracting the value of an earlier observation from the value of a
later observation. The number of times the time series data is differenced determines the value of the component (d).
The autoregressive (AR) term (p) represents the lingering effect of preceding observations on current values in the series. For example, an AR(1) model forecasts future values based on the value of the preceding observation $y_{t-1}$, as denoted in Equation 8, where $\phi$ is a parameter of the model known as the autoregressive coefficient, which represents the magnitude of the relationship, and $\varepsilon_t$ represents the random variation at the current time period $t$:

$$y_t = \phi(y_{t-1}) + \varepsilon_t. \qquad (8)$$

The moving average (MA) term (q) represents the effects of previous random variation on the current period's random error. For example, an MA(1) model forecasts future values based on a combination of the current random variation and the previous error as defined

$$y_t = \theta(\varepsilon_{t-1}) + \varepsilon_t, \qquad (9)$$

where $\varepsilon_{t-1}$ is the value of the previous random shock; $\theta$ is the correlation coefficient of the model, which defines the extent of the relationship; and $\varepsilon_t$ represents the random variation at the current time period $t$. The combined model, assuming differenced data, is denoted

$$y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t. \qquad (10)$$
The ARIMA models are also capable of modeling a variety of highly seasonal data. Seasonal ARIMA models are classified by including the additional seasonal terms $(P, D, Q)_m$, where $m$ defines the number of periods per season. The seasonal portion of the model operates across previous seasonal periods as opposed to previous observations, which occurs in the standard model introduced above. However, in practice, both models are often combined to capture all of the fundamental characteristics of a seasonal time series. In this study, the Box-Jenkins methodology was employed to fit only the bandwidth model.61 This methodology consists of several steps, ie, model identification, estimation and diagnostic checking, and lastly forecasting and validation.
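For readers who want to reproduce a comparable Box-Jenkins workflow, a minimal sketch using the statsmodels library is given below; the synthetic series, the order (1, 1, 1), and the seasonal period m = 144 (one day of 10-minute intervals) are illustrative assumptions, not the configuration fitted in this study.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic stand-in for the 10-minute bandwidth series (144 points per day, 7 days)
rng = np.random.default_rng(1)
t = np.arange(144 * 7)
bandwidth = 600 + 50 * np.sin(2 * np.pi * t / 144) + rng.normal(0, 10, t.size)

# Illustrative seasonal ARIMA(p,d,q)(P,D,Q)_m fit; real orders would come from Box-Jenkins
# identification (ACF/PACF inspection, differencing, residual diagnostics). A seasonal period
# of m = 144 makes fitting slow; drop seasonal_order for a quick non-seasonal fit.
model = ARIMA(bandwidth, order=(1, 1, 1), seasonal_order=(0, 1, 1, 144))
fitted = model.fit()

# One-step-ahead forecast for the next 10-minute interval
print(fitted.forecast(steps=1))
```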
2.3.2 Linear prediction methods
Linear regression is a popular statistical approach to estimate the relationship between one or more input variables and the output variable. The case of one input variable is called simple regression; more than one input variable is called multiple regression. In all cases, regression approximates a function (the regression function) that can be considered linear or nonlinear. If the independent variable is $x = [x_1, x_2, \ldots, x_m]$ and the corresponding dependent variable is $y$, then the LR model is as shown

$$y_t = \beta_0 + \sum_{i=1}^{m} \beta_i x_i. \qquad (11)$$

The parameters $\beta_0$ and $\beta_i$ are the regression coefficients. A measure of goodness of fit, ie, how well the model predicts the output variable $y$, is the magnitude of the residual $e_i$ at each of the $n$ data points, as shown

$$e_i = \hat{y}_i - y_i, \qquad (12)$$

where $e_i$ is the difference between the predicted output $\hat{y}_i$ and the real output $y_i$ at data point $i$. In this research, LR takes two inputs of either CPU or bandwidth.
Random walk (RW) forecasting is the most basic forecasting method implemented. This approach consists of predicting the next future value as being equal to the current observed value.
The moving average (MA) is the final method used for predicting both network bandwidth and CPU utilization; it is another commonly used forecasting approach, which consists of predicting a future value by averaging the n previous values.
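These baselines need only a few lines of code; the sketch below is illustrative only and shows the random walk and moving average predictors described above, alongside an ordinary least-squares fit of the linear regression model of Equation (11) on lagged values.

```python
import numpy as np

def random_walk(series):
    """Predict the next value as the last observed value."""
    return series[-1]

def moving_average(series, n=5):
    """Predict the next value as the mean of the n most recent values."""
    return float(np.mean(series[-n:]))

def linear_regression(series, m=2):
    """Fit y = b0 + sum(b_i * x_i) on the last m observations via least squares."""
    X = np.array([series[t - m:t] for t in range(m, len(series))])
    y = np.array(series[m:])
    X = np.column_stack([np.ones(len(X)), X])        # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    x_next = np.concatenate([[1.0], series[-m:]])    # most recent m values as inputs
    return float(x_next @ beta)

cpu = [0.31, 0.35, 0.33, 0.40, 0.42, 0.45, 0.44]
print(random_walk(cpu), moving_average(cpu, n=3), linear_regression(cpu, m=2))
```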
3 EXPERIMENT DETAILS
The following section describes metrics that are used to determine the most accurate algorithms for predicting CPU
and bandwidth utilization. This section also describes the data models for CPU and bandwidth used to train and test
both nonlinear and linear models in this paper. Finally, in this section, a detailed description of the simulated cloud
environment implemented to test each algorithm's performance in terms of service level agreement violations, bandwidth usage, and energy consumption is given.
3.1 Metrics
The following describes the metrics used to compare the accuracy of each algorithm mentioned in Section 2.3. In the following descriptions of the metrics, $y$ denotes the actual value, $\hat{y}$ denotes the forecast value, and $n$ is the number of values.
Mean absolute error measures the difference between the predicted value and the actual value by the mean of the absolute error. The MAE tells us how large an error we can expect from the forecast on average, as shown

$$MAE = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|. \qquad (13)$$

Mean absolute percentage error calculates the average percentage by which the forecast values deviate from the actual values observed in the test set, as shown

$$MAPE = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|. \qquad (14)$$
Both the MAPE and MAE methods are based on the mean error, and are likely to underestimate the impact of large
infrequent errors. This is why the MSE and RMSE are also used in this paper to measure prediction accuracy.
Mean squared error is a measure of how close a fitted line is to the data points. For each data point, the vertical distance from the point to the corresponding $y$ value on the curve fit (the error) is squared. These squared values are then summed over all data points and divided by the number of values $n$. The squaring of each error prevents negative values from canceling out. The smaller the MSE, the closer the fit is to the data

$$MSE = \frac{\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}{n}. \qquad (15)$$
Root mean squared error is the square root of the MSE. By squaring the errors before calculating the mean and then taking the square root of the mean, we arrive at a measure of the size of the error that gives more weight to the large infrequent errors

$$RMSE = \sqrt{\frac{\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}{n}}. \qquad (16)$$
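The four accuracy metrics translate directly into code; the helper functions below are a sketch of Equations (13)-(16) using numpy, not the evaluation harness used in the paper.

```python
import numpy as np

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))                 # Equation (13)

def mape(y, y_hat):
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))   # Equation (14)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))                  # Equation (15)

def rmse(y, y_hat):
    return float(np.sqrt(mse(y, y_hat)))                     # Equation (16)

y = np.array([10.0, 12.0, 11.0, 13.0])
y_hat = np.array([9.5, 12.5, 10.0, 13.5])
print(mae(y, y_hat), mape(y, y_hat), mse(y, y_hat), rmse(y, y_hat))
```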
3.2 Data models
Two data models (bandwidth and CPU) were used in this study as data for the algorithm to predict and also as input to
our simulator.
Bandwidth data. The bandwidth model implemented as part of this study is based on transmission control protocol
bandwidth measurements collected from Amazon's EC2 cloud.62 In particular, the bandwidth values were taken from measurements of the network performance within Amazon's EU region. This benchmark study provides a measurement
of the available bandwidth on the network links at four points over a single day. In order to generate a bandwidth model
with a sampling distribution of 10-minute intervals, the values were interpolated resulting in a time series model com-
posed of 10-minute intervals over a 24-hour period. In general, the more data that is available to fit a predictive model, the greater the opportunity to generate better predictions. Two data sets were generated from the transmission control protocol bandwidth measurements.62 The first was the training set; in order to generate the training data, the ini-
tial bandwidth values over 24 hours were sampled at each interval and the corresponding values were inputted into a
Gaussian distribution to produce a valid bandwidth model over 7 consecutive days, as shown in Figure 2A. The Gaussian
distribution served to introduce uncertainty into the bandwidth values on the network links. The resulting model con-
sisted of 144 values for each day (6 data points per hour x 24 hours) or 1008 values in total. Using the same procedure as
aforementioned, a test set was also generated from the initial distribution and used to validate the selected models. The
test set contained 432 values (3 days workload) and was used to test the accuracy of the predictive models in this study.
In the forecasting literature, there is generally no principled approach to dividing data into training and test sets. Hynd-
man and Athanasopoulos60 suggested an 80/20 split between the training and test sets, where roughly 80% of the data is
used to train the model and the remaining 20% is used to test the model. Broadly speaking, a commonly occurring ratio
within the community is to use 60% to 80% of the data as the training set and 20% to 40% as the test set. Based on this, we
divided the bandwidth model using 70% of the data as the training data, which corresponded to the first 7 days' worth of
time series data. The remaining 30% of the data was used as a test set, which corresponded to the next 3 days. By dividing
the data using these ratios, it helps to ensure that the predictive model can generalize to unseen data.
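The generation procedure can be sketched as follows; the base profile, noise level, and interpolation grid below are illustrative placeholders rather than the actual EC2 measurements, but the shapes match the 1008-value training set and 432-value test set described above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Four benchmark bandwidth measurements over one day (illustrative values, Mbps)
hours = np.array([0.0, 8.0, 16.0, 24.0])
measured = np.array([650.0, 580.0, 540.0, 650.0])

# Interpolate to 10-minute intervals: 144 points per day
grid = np.linspace(0.0, 24.0, 144, endpoint=False)
daily_profile = np.interp(grid, hours, measured)

# Draw each interval from a Gaussian centred on the profile to add uncertainty
train = np.concatenate([rng.normal(daily_profile, 15.0) for _ in range(7)])  # 7 days = 1008 values
test = np.concatenate([rng.normal(daily_profile, 15.0) for _ in range(3)])   # 3 days = 432 values

print(train.shape, test.shape)   # roughly a 70/30 split, as in the paper
```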
CPU data. The CPU data model implemented as part of this study is Google's cluster data trace63 that details the resource
usage information of machines in a cluster for a 29-day trace period. This data set is over 300 GB in size and contains
information of over 12 000 host machines in Google's data centers. In this paper, we are only concerned with the CPU
values from these data sets. All of the algorithms mentioned in Section 2.3 are trained on a CPU data set that contains
7623 values and tested on a data set that contains 144 CPU values. The testing data contain 144 values, one for each
10-minute interval over 24 hours, and the training data span 53 days. The reason for having such a large training set is that there is a huge
amount of variation within the Google cluster data trace, and this gives each algorithm an opportunity to discover
patterns within the data. Figure 2B shows the CPU training data used in this paper.
FIGURE 2 Bandwidth and CPU training data sets. A, Bandwidth model for intra-cloud network links within the EU region; B, CPU training data set
3.3 Simulator model
In this paper, the targeted system is an infrastructure-as-a-service environment, represented by a large-scale data center consisting of a cluster of 600 host machines, represented as $H = \{h_1, h_2, \ldots, h_n\}$. Each host $h_n$ contains a list of VMs $V = \{v_1, v_2, \ldots, v_n\}$ and has a capacity of $a_h$. Each VM is of size 1024 MB and is allocated $a_v$ of CPU. Therefore, the maximum number of VMs allocated to a host is represented as $m = a_h / a_v$. In the interest of simplification, we assume that all of the host machines and VMs in the simulated environment are homogeneous. The tasks processed by each VM will be driven by the Google cluster trace data set detailed in Section 3.2. Any host whose CPU utilization is greater than 85% is
deemed to be overutilized. The overutilized host detection policies will be continuously monitoring each host machine's
CPU utilization. A host will stay in an overutilized state until necessary VM migrations take place. Live migration occurs
once a host becomes overutilized and VMs will be moved between hosts. Live migration will have a negative impact on the
performance of the host machine and VMs. Voorsluys et al64 have shown that a VM's performance degradation and downtime during live migration depend on the application's behavior (ie, how many memory pages the application transfers
during live migration). The average performance degradation including the downtime can be estimated as approximately
10% of the CPU utilization. Moreover, in our simulations, we model that the same amount of CPU capacity is allocated
to a VM on the destination host machine during the course of migration. This means that each migration may cause SLA
violation; thus, it is crucial to minimize the number of VM migrations and select a host machine, which will not become
overutilized if a VM is placed on it. The length of a live migration depends on the total amount of memory used by the VM
and the total network traffic in the cloud environment, which will vary at each time (t). Once it has been decided that a
host is overutilized, the next step is to select particular VMs to migrate from the host. We have implemented the minimum
migration time policy, which selects a VM that will require the minimum time to complete a migration relative to the other
VMs allocated to the same overutilized host. Virtual machines are selected based on their RAM utilization. If no suitable
host is available in the current time step, then no migration will take place and the host will remain overutilized. Each
VM will be assigned resources from the assigned host's RAM and CPU. In this paper, we have implemented a sequential migration transfer policy, which means that VMs will be transferred one after another. Due to workload changes, the CPU resources used by the VMs will vary, possibly leading to SLA violations and increased energy consumption. An SLA violation will occur in a host machine when the total demand for CPU performance exceeds the available CPU capacity $a_h$.
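A compact sketch of the host-overutilization check and the minimum-migration-time VM selection described above is given below; the 85% threshold comes from the paper, while the data structures, RAM figures, and bandwidth value are illustrative assumptions.

```python
OVERUTILIZATION_THRESHOLD = 0.85   # from the paper: CPU utilization above 85% is overutilized

def is_overutilized(host_cpu_utilization):
    return host_cpu_utilization > OVERUTILIZATION_THRESHOLD

def select_vm_minimum_migration_time(vms, bandwidth_mbps):
    """Pick the VM whose estimated migration time (RAM in use / available bandwidth) is smallest."""
    return min(vms, key=lambda vm: vm["ram_mb"] / bandwidth_mbps)

vms = [{"id": 1, "ram_mb": 1024}, {"id": 2, "ram_mb": 512}, {"id": 3, "ram_mb": 2048}]
if is_overutilized(0.91):
    vm = select_vm_minimum_migration_time(vms, bandwidth_mbps=600.0)
    print("migrate VM", vm["id"])   # VM 2: least RAM to transfer, hence shortest migration
```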
Power consumption by host machines in data centers is determined by the CPU, memory, disk storage, power supplies,
and cooling systems.65 Recent studies66,67 have shown that the power consumption by a host machine can be accurately
described by a linear relationship between the power consumption and CPU utilization. The power model is similar to
the power model used in the work of Beloglazov and Buyya.43 In this paper, only the HP ProLiant G4 host is used. Table 1
represents the power usage of the host at each 10% CPU utilization interval.
We implemented CloudSim's network flow model68 for calculating the current delay in the network when migrating VMs. It uses point-to-point communication for data from a source entity $u$ to a destination entity $d$, which is called a flow and is represented as $f = (size_f, u, d)$, where $size_f$ is the number of bytes in the flow and $v_n$ is the current VM being migrated from source host $u$ to destination host $d$. The bandwidth available between two entities is represented as $bw$, and the latency is denoted as $lat$. The duration of a single network flow can be calculated as shown

$$delay(u, d, v_n) = lat + size_f / bw. \qquad (17)$$
Migration from an overutilized host will stop automatically when the host is under the threshold set at 85% of the host
CPU utilization.
TABLE 1 Power model (Watts) implemented in a simulated cloud data center
Server 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
HP ProLiant G4 86 89.4 92.6 96 99.5 102 106 108 112 114 117
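The power model in Table 1 can be applied by linearly interpolating between the 10% utilization points; a sketch follows, assuming the tabulated values are in Watts, as in comparable host power models.

```python
import numpy as np

# HP ProLiant G4 power values from Table 1, at 0%, 10%, ..., 100% CPU utilization
UTILIZATION = np.linspace(0.0, 1.0, 11)
POWER_W = np.array([86, 89.4, 92.6, 96, 99.5, 102, 106, 108, 112, 114, 117])

def host_power(cpu_utilization):
    """Linearly interpolate the host power draw from CPU utilization in [0, 1]."""
    return float(np.interp(cpu_utilization, UTILIZATION, POWER_W))

print(host_power(0.45))   # power at 45% utilization, between the 40% and 50% table entries
```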
In addition, the total delay for the network at each time step is measured as shown

$$totalDelay = \sum_{v=1}^{n} delay(u, d, v_n). \qquad (18)$$

To calculate the impact that one VM migration would have on the network link bandwidth, we define $bw_l(v_n)$ as the network bandwidth consumed by one VM. The total traffic $Tr$ generated by a group of VMs $V$ from the selected host $h_n$ is as shown

$$Tr(h_n) = \sum_{v=1}^{n} bw_l(v_n), \qquad (19)$$
where $v$ is the set of VMs to be migrated. The latency model has a direct effect on the bandwidth model: the higher the latency $lat$, the lower the available $bw_l$; likewise, the higher the network utilization, the lower the available $bw_l$. When $v_n$ is migrated from $h_n$, the host's utilization is recalculated and then compared with the overutilization threshold. We only consider the host's CPU utilization as the indicator of whether a host machine is overutilized.
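Equations (17)-(19) translate directly into code; the sketch below assumes seconds of latency, megabytes of VM memory, and Mbps of available bandwidth, which are illustrative unit choices rather than the simulator's exact configuration.

```python
def flow_delay(latency_s, size_mb, bandwidth_mbps):
    """Equation (17): delay of one migration flow = latency + size / bandwidth."""
    return latency_s + (size_mb * 8.0) / bandwidth_mbps   # convert megabytes to megabits

def total_delay(flows):
    """Equation (18): sum of per-VM migration delays at this time step."""
    return sum(flow_delay(*f) for f in flows)

def total_traffic(vm_bandwidths):
    """Equation (19): total bandwidth consumed by the VMs migrating off a host."""
    return sum(vm_bandwidths)

flows = [(0.05, 1024, 600.0), (0.05, 1024, 600.0)]   # two sequentially migrated 1024-MB VMs
print(total_delay(flows), total_traffic([120.0, 95.0]))
```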
3.4 Experiments conducted
The first set of experiments involves comparing the prediction accuracy of each of the prediction algorithms on the bandwidth training and testing data. All algorithms will be judged based on their performance on the metrics in Section 3.1. As the RNN is designed for sequence prediction, it will then be applied to multi-step-ahead predictions of
the bandwidth data.
The next set of experiments will evaluate how each algorithm performs when predicting CPU data. This section is
divided into three separate experiments for CPU prediction. The first experiments will examine how each algorithm
performs when predicting CPU utilization on a time scale of one step ahead. The second experiment will examine the
performance of the RNN for “multi-time steps” ahead prediction. The purpose of the multi-step ahead experiments is to
determine how far into the near future a recurrent network can predict and still produce accurate results. Such predictions would be advantageous in real-world data centers, as knowing in advance whether a host will become overutilized or when bandwidth will become saturated allows resources to be reconfigured. The final experiment in this section will measure
the prediction accuracy for each algorithm for a single time step, and then for the RNN's “multi-time steps” ahead CPU
predictions when trained on one host's CPU data and tested on a different host's CPU data. Again, this experiment would
be advantageous in real-world data centers when a host is predicted to become overutilized, and another host must be
selected for VMs to be moved onto without causing SLAVs on the new host.
Finally, the last experiment will evaluate the performance of each one-step-ahead algorithm against the RNN and how well their predictions can improve a simulated cloud data center's efficiency. Each algorithm's effectiveness will be measured in terms of SLA violations, energy, and bandwidth usage.
4 RESULTS
This section presents the results of each of the experiments outlined. First, the results of the bandwidth experiment will
be presented, followed by the CPU result, and lastly the overall simulator results.
4.1 Bandwidth prediction results
This section presents the results from the range of models that were implemented to predict bandwidth resources. The
results of both predicting a single time step and multiple time steps ahead are presented as follows.
4.1.1 Single time step ahead
A comparison on the accuracy of each of the bandwidth forecasting models for one step ahead predictions is presented
as follows for both the training data and test data sets. The calculated metrics on the training data allow us to measure
the performance of the fitted models, while the results generated from the test data demonstrate the predictive accuracy
and ability of each model to generalize to unseen data.
TABLE 2 Bandwidth training data prediction accuracy
Algorithm MSE RMSE MAE MAPE
BPTT 184.9968 13.60135 11.0208 1.8379
Random Walk 495.9737 22.2705 17.69222 2.9171
Moving Avg 371.3637 19.2708 15.3689 2.5327
Linear Regression 2985.1861 54.6369 46.2198 7.7215
ARIMA 200.1808 14.1485 10.6497 1.7556
Backpropagation (Sliding Window 2) 432.4122 20.7945 16.7863 2.7521
Backpropagation (Sliding Window 10) 295.5467 17.1915 13.6178 2.2359
Backpropagation (Sliding Window 20) 88.7296 9.4196 4.9847 0.8179
Backpropagation (Sliding Window 30) 155.9807 12.4892 9.7944 1.6294
Abbreviations: ARIMA, autoregressive integrated moving average; BPTT, backpropagation
through time; MAE, mean absolute error; MAPE, mean absolute percentage error; MSE, mean
square error; RMSE, root mean square error.
Table 2 evaluates the models across each of the selected performance metrics. The results overall show that nonlinear
models perform best; this is largely due to the characteristics of the bandwidth model as displayed in Figure 2A. As shown,
the bandwidth data shows significant seasonal patterns with daily peaks and troughs evident throughout that need to
be modeled. However, the data also remains largely stochastic between successive time steps with persistent irregular
fluctuations while no long term trend is evident. In particular, the BP (sliding window size 20) algorithm performed best
followed by BP (sliding window size 30), while ARIMA narrowly outperformed BPTT on the training data. Although LR
is a common prediction method, our experiments showed it performed the worst out of all of the selected models
with a MAPE of 7.7215 on the training data. This is not surprising as the bandwidth data does not display a linear trend;
it is stochastic displaying various fluctuations over time, and as a result, fitting a straight line through the data results in
significant errors on both sides of the fitted line as the model struggles to capture any of the variations in the data.
The true accuracy of a predictive model can only be determined by considering how well it performs on new data,
which was not used to train the model. Table 3 presents the results of each predictive model relative to the test set. The
results show BPTT outperformed all other algorithms with a MAPE of 1.430. In these experiments, this algorithm shows
its capacity to learn complex relationships in the bandwidth time series data, but it also highlights its ability to generalize
well to unseen data, thus indicating the reliability of the model for future predictions. Similar to what was observed in the
training results, the BP (sliding window 20) algorithm performed best out of the remaining approaches. Our empirical
evaluation found that by adjusting the size of the input sequence from 10 to 20 values it provides more specific knowledge
about the underlying structure of the data resulting in an improved MAPE of 2.3798. Unlike the training results, ARIMA
achieved a slightly better result on the test data in comparison to the BP (sliding window 30). The more simplistic models perform similarly; in particular, they show their inability to model the fundamental characteristics of the bandwidth data, with MA performing worst overall with a MAPE of 7.7112. Overall, BPTT, ARIMA, and BP with
TABLE 3 Bandwidth test data prediction accuracy
Algorithm MSE RMSE MAE MAPE
BPTT 121.9398 11.0426 8.5788 1.430
Random Walk 3030.3417 55.0485 46.0220 7.6971
Moving Avg 3065.7450 55.3692 45.6627 7.7112
Linear Regression 3021.8770 54.9716 46.2478 7.6953
ARIMA 438.557 20.9418 16.7241 2.7859
Backpropagation (Sliding Window 2) 422.8099 20.5623 16.5327 2.7110
Backpropagation (Sliding Window 10) 362.5414 19.0405 15.1452 2.4981
Backpropagation (Sliding Window 20) 322.0703 17.9463 14.3671 2.3798
Backpropagation (Sliding Window 30) 457.3444 21.3856 17.1363 2.8309
Abbreviations: ARIMA, autoregressive integrated moving average; BPTT, backpropagation
through time; MAE, mean absolute error; MAPE, mean absolute percentage error; MSE, mean
square error; RMSE, root mean square error.
FIGURE 3 Predictive performance of backpropagation through time (BPTT), autoregressive integrated moving average (ARIMA), and various backpropagation (BP) approaches (predicted and actual bandwidth values, Mbps versus time). A, BPTT and BP (sliding window 20); B, BP (sliding window 10) and ARIMA
different-sized sliding windows resulted in acceptable performance. Figure 3 shows the predictive performance of these
algorithms relative to the actual values observed in the test data over a single day.
4.1.2 Multiple time steps ahead
In the previous experiment, the RNN trained with the BPTT algorithm outperformed all other approaches. In this experi-
ment, the BPTT algorithm will be evaluated in terms of its ability to accurately predict bandwidth availability for multiple
time steps into the future. In particular, the performance of the BPTT algorithm will be assessed based on its ability to
predict bandwidth from one to six time steps ahead. Each time step represents a 10-minute interval, thus equating to
1 hour of future bandwidth availability, which is adequate for predicting resources over a short time span.
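The paper does not spell out the exact multi-step mechanism, so the sketch below shows one common recursive scheme in which each prediction is fed back as an input for the next step; this is an illustrative assumption rather than the authors' confirmed procedure, and the toy one-step model stands in for the trained RNN.

```python
def recursive_forecast(model_step, history, horizon=6):
    """Roll a one-step predictor forward `horizon` times by feeding predictions back in.

    model_step(prev, curr) -> next predicted value (e.g. a trained network's step function).
    history: observed values, most recent last (at least two values).
    """
    window = list(history[-2:])          # the two inputs the networks in this paper use
    predictions = []
    for _ in range(horizon):
        y_hat = model_step(window[-2], window[-1])
        predictions.append(y_hat)
        window.append(y_hat)             # treat the prediction as the newest observation
    return predictions

# Toy one-step model: persistence with a trend term, standing in for the trained RNN
toy_model = lambda prev, curr: curr + 0.5 * (curr - prev)
print(recursive_forecast(toy_model, [600.0, 590.0, 585.0], horizon=6))
```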
Similar to the previous experiment, the results for both the training and test sets are presented in Table 4 and Table 5.
TABLE 4 Multi-step ahead training data prediction
accuracy
Step Ahead MSE RMSE MAE MAPE
1 Step Ahead 184.9968 13.6014 11.0208 1.8379
2 Step Ahead 436.0588 20.8820 16.7036 2.770
3 Step Ahead 500.8841 22.3804 17.8305 2.95
4 Step Ahead 552.0954 23.4967 18.8824 3.137
5 Step Ahead 592.6458 24.3443 19.4938 3.221
6 Step Ahead 718.8461 26.8113 21.1520 3.502
Abbreviations: MAE, mean absolute error; MAPE, mean abso-
lute percentage error; MSE, mean square error; RMSE, root mean
square error.
TABLE 5 Multi-step ahead test data prediction accuracy
Step Ahead MSE RMSE MAE MAPE
1 Step Ahead 121.9398 11.0426 8.5788 1.430
2 Step Ahead 237.5089 15.4113 12.0718 2.001
3 Step Ahead 257.1531 16.0360 12.5853 2.081
4 Step Ahead 318.3459 17.8422 14.1399 2.330
5 Step Ahead 407.0782 20.1762 16.2011 2.678
6 Step Ahead 488.2690 22.0968 17.9570 2.967
Abbreviations: MAE, mean absolute error; MAPE, mean absolute
percentage error; MSE, mean square error; RMSE, root mean
square error.
FIGURE 4 Mean absolute error (MAE) for backpropagation through time for multitime-steps-ahead bandwidth predictions (training and test data; MAE versus time step)
As shown, BPTT is capable of achieving significant predictive accuracy when forecasting multiple time steps into the
future with a MAPE of between 1.8379 and 3.502 for time steps one to six for the training data and 1.430 and 2.967 for
the test data. This indicates the reliability and overall robustness of the RNN for estimating bandwidth resources over a
longer time horizon. Also evident, as shown in Figure 4, is the approximately linear growth in the error of the algorithm
the further out into the future it attempts to predict.
4.2 CPU prediction results
This section presents the results for the three CPU prediction experiments mentioned in Section 3.4. As stated earlier, ARIMA modeling is better suited to long-term sequence prediction, such as day- or month-long forecasts, and is therefore not applied to the CPU prediction.
4.2.1 Single time step ahead
First, we examine the training prediction accuracy and then the testing prediction accuracy of all algorithms for one time step ahead. Table 6 displays the MSE, RMSE, MAE, and MAPE for each of the algorithms and shows that each algorithm performs similarly across all metrics on the training data. Based on the MAPE, BP had the least accuracy and BPTT had the highest accuracy.
Table 7 shows the results of each algorithm on the testing data. Again, for one-step-ahead prediction, each algorithm performs similarly across all metrics. Based on the MAPE metric, BP with sliding window 30 had the worst performance, while BPTT performed best overall. Figure 5A shows the prediction results for both the BPTT and LR algorithms; both perform similarly when compared with the actual CPU test data set. The RNN is typically used for sequence prediction, and the next section examines how far into the future it can accurately predict CPU data.
TABLE 6 Train CPU data prediction accuracy
Algorithm MSE RMSE MAE MAPE
BPTT 0.00761 0.0872 0.0639 0.1395
Random Walk 0.0078 0.0887 0.0604 0.2644
Moving Avg 0.0089 0.0947 0.0649 0.3238
Linear Regression 0.0075 0.0867 0.0608 0.2843
Backpropagation (Sliding Window 2) 0.0079 0.0893 0.0660 0.3621
Backpropagation (Sliding Window 10) 0.0084 0.0918 0.0673 0.1695
Backpropagation (Sliding Window 20) 0.0041 0.0638 0.0455 0.1315
Backpropagation (Sliding Window 30) 0.0064 0.0798 0.0554 0.1714
Abbreviations: ARIMA, autoregressive integrated moving average; BPTT, backpropagation
through time; MAE, mean absolute error; MAPE, mean absolute percentage error; MSE,
mean square error; RMSE, root mean square error.
TABLE 7 Test CPU data prediction accuracy
Algorithm MSE RMSE MAE MAPE
BPTT 0.0144 0.1202 0.0818 0.1237
Random Walk 0.0188 0.1370 0.0843 0.1498
Moving Avg 0.0228 0.1509 0.0988 0.1516
Linear Regression 0.0144 0.1201 0.0828 0.1248
Backpropagation (Sliding Window 2) 0.0146 0.1208 0.0859 0.1278
Backpropagation (Sliding Window 10) 0.0396 0.1990 0.1690 0.2184
Backpropagation (Sliding Window 20) 0.0691 0.2628 0.2298 0.2975
Backpropagation (Sliding Window 30) 0.0809 0.2844 0.2527 0.3220
Abbreviations: BPTT, backpropagation through time; MAE, mean absolute error; MAPE,
mean absolute percentage error; MSE, mean square error; RMSE, root mean square error.
FIGURE 5 Predictive performance of backpropagation through time (BPTT) and linear regression (LR) with mean absolute error (MAE) for one and six time steps: A, Actual, BPTT, and LR CPU values; B, MAE for one and six time steps [Colour figure can be viewed at wileyonlinelibrary.com]
4.2.2 Multiple time steps ahead
The aim of the multiple-time-steps-ahead prediction experiment was to evaluate how far into the future the RNN could accurately predict the CPU utilization of a host and to determine by how much prediction accuracy decreases the further into the future the network attempts to predict. This experiment involved predicting the CPU utilization of a host machine from one to six time steps into the future. Each of these time steps corresponds to 10 minutes; for instance, time step six relates to a prediction 1 hour into the future.
Tables 8 and 9 present the accuracy of the predictions on the training and testing data sets at each of the six time steps. The results clearly show that each performance metric exhibits an approximately linear increase in prediction error the further into the future the recurrent network tries to predict.
TABLE 8 Multi-step ahead trained data prediction
accuracy
Step Ahead MSE RMSE MAE MAPE
1 Step Ahead 0.00761 0.0872 0.0639 0.1395
2 Step Ahead 0.013 0.113 0.084 0.179
3 Step Ahead 0.015 0.126 0.098 0.191
4 Step Ahead 0.019 0.137 0.106 0.220
5 Step Ahead 0.021 0.144 0.112 0.232
6 Step Ahead 0.024 0.154 0.122 0.248
Abbreviations: MAE, mean absolute error; MAPE, mean abso-
lute percentage error; MSE, mean square error; RMSE, root
mean square error.
TABLE 9 Multi-step ahead test data prediction
accuracy
Step Ahead MSE RMSE MAE MAPE
1 Step Ahead 0.0144 0.1202 0.0818 0.1237
2 Step Ahead 0.0259 0.1608 0.1200 0.1890
3 Step Ahead 0.0301 0.1734 0.1332 0.2140
4 Step Ahead 0.0333 0.1824 0.1447 0.2320
5 Step Ahead 0.0356 0.1887 0.1555 0.2443
6 Step Ahead 0.0377 0.194 0.1617 0.255
Abbreviations: MAE, mean absolute error; MAPE, mean abso-
lute percentage error; MSE, mean square error; RMSE, root
mean square error.
Up to six time steps ahead was chosen as the maximum future time step because it corresponds to 1 hour of CPU information, which is a sufficient amount of time into the future for the planning of resources.
Figure 5B highlights the MAE for both one-time-step-ahead and six-time-steps-ahead predictions made by the BPTT algorithm on the test data. Figure 5B shows that the errors increase the further into the future the algorithm predicts. The graph also highlights where the BPTT algorithm struggles when making predictions. As shown in Figure 5A, at time step 22, there is an extreme sudden change in CPU utilization, and Figure 5B shows that the BPTT accuracy also decreases at the same time step; this is a result of the extreme variation in CPU workloads. Nevertheless, the algorithm performs well when predicting six time steps into the future.
4.2.3 Unseen host data set
The next set of experiments involved evaluating the performance of each algorithm on a host's CPU data set that it was not trained on. The results from this experiment show how well each algorithm generalizes when applied to new, unseen data. Table 10 shows, by each of the metrics, that BPTT outperforms all other algorithms, with BP with sliding window 10 having the worst performance.
Similarly, in this experiment, the RNN was examined on its ability to predict another host's CPU usage based on the training it completed on a different host's CPU data. Table 11 shows how far into the future the RNN could accurately predict CPU utilization. The results show a linear increase in prediction error the further into the future the recurrent network tries to predict. However, the predictions produced, even six time steps into the future, are relatively accurate. These predictions help the RNN approach decide well in advance if a host will become overutilized so that another VM is not placed on it, which would cause it to degrade even further.
TABLE 10 Test results from a new host's data when previously trained on
another host's CPU data
Algorithm MSE RMSE MAE MAPE
BPTT 0.012 0.107 0.081 0.122
Random Walk 0.015 0.121 0.085 0.132
Moving Avg 0.014 0.109 0.082 0.127
Linear Regression 0.013 0.108 0.081 0.123
Backpropagation (Sliding Window 2) 0.013 0.107 0.082 0.125
Backpropagation (Sliding Window 10) 0.1953 0.4419 0.4386 6.3073
Backpropagation (Sliding Window 20) 0.1685 0.4105 0.4057 5.8130
Backpropagation (Sliding Window 30) 0.1193 0.3454 0.3423 4.9288
Abbreviations: BPTT, backpropagation through time; MAE, mean absolute error; MAPE,
mean absolute percentage error; MSE, mean square error; RMSE, root mean square error.
TABLE 11 Multi-step ahead test data prediction
accuracy (different host)
Step Ahead MSE RMSE MAE MAPE
1 Step Ahead 0.012 0.107 0.081 0.122
2 Step Ahead 0.016 0.127 0.098 0.15
3 Step Ahead 0.021 0.145 0.112 0.169
4 Step Ahead 0.026 0.161 0.126 0.189
5 Step Ahead 0.03 0.172 0.134 0.199
6 Step Ahead 0.032 0.177 0.14 0.204
Abbreviations: MAE, mean absolute error; MAPE, mean
absolute percentage error; MSE, mean square error; RMSE,
root mean square error.
4.3 Simulation results
This section presents the performance of each algorithm in a simulated data center environment. The BPTT has the ability to predict multiple time steps into the future (60 minutes) for both CPU and bandwidth, whereas the remaining algorithms provide one-step-ahead predictions (ie, 10 minutes) for CPU only. As BP with sliding window 2 outperformed each of the other BP sliding windows, it is the only BP configuration evaluated in the simulation experiment. The goal of this experiment is to investigate how well a "multi-time-step-ahead" approach can improve data center efficiency by determining when a host is overutilized and when is the best time to initiate live migration.
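To illustrate how such a decision might be taken, the sketch below combines a 1-hour CPU forecast with a 1-hour bandwidth forecast: a migration is triggered only if the host is predicted to exceed an overutilization threshold within the horizon, and the migration slot with the most predicted spare bandwidth before that point is chosen. This is a simplified sketch under our own assumptions; the threshold value, function name, and data structures are hypothetical and do not correspond to the simulator used in this paper.

```python
def schedule_migration(cpu_forecast, bw_forecast, cpu_threshold=0.8):
    """Decide whether and when to schedule a live migration off a host.

    cpu_forecast : predicted CPU utilization for the next 6 steps (10 min each)
    bw_forecast  : predicted available bandwidth (Mbps) for the same 6 steps
    Returns the index of the chosen migration step, or None if no action is needed.
    """
    # Only act if the host is predicted to become overutilized within the hour.
    overutilized = [i for i, u in enumerate(cpu_forecast) if u >= cpu_threshold]
    if not overutilized:
        return None

    first_violation = overutilized[0]
    # Consider the steps before the predicted violation (or step 0 if the
    # violation is immediate) and pick the one with the most spare bandwidth,
    # so the migration completes as quickly as possible.
    candidate_steps = range(max(first_violation, 1))
    return max(candidate_steps, key=lambda i: bw_forecast[i])

# Illustrative forecasts: CPU is predicted to cross the threshold at step 3.
cpu = [0.55, 0.62, 0.71, 0.86, 0.90, 0.88]
bw = [640, 610, 655, 590, 600, 620]
print(schedule_migration(cpu, bw))  # -> 2 (the 20-30 minute slot has the most bandwidth)
```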
The first metric examined was the SLAVs. Maintaining a low occurrence of SLAVs is an essential factor in the delivery of reliable, quality-assured cloud-based services. In this regard, it is imperative to consider the number of SLAVs incurred by all approaches throughout the simulation. Table 12 presents the average SLAVs each algorithm incurred during the simulations; the lower the SLAV value, the better the data center is performing. The multiple-step-ahead predictions of BPTT enabled it to anticipate well in advance whether overutilization was going to occur in any of the hosts. This allowed the RNN approach to start live migration well before a host could become overutilized and to select a suitable destination host for a VM that would not become overutilized in the next hour, thus averaging SLAV results of just 1.80776E-06. This is nearly 87% better than the next lowest value, from RW, while BP had the worst SLAV of all approaches.
A T-test was performed between the best and the second best algorithm for SLAV. The two-tailed P value is less than 0.0001 and the results are deemed statistically significant, with a 95% confidence interval of this difference from −15.82188702780 to −8.95795613255; BPTT had a standard deviation of 5.53272815845 compared with 20.17874430023 for RW. Figure 6B presents the results for all algorithms.
Figure 6A shows the energy consumption at each of the time steps. As can be seen from the results, BP had the lowest energy consumption of all approaches, with an average of 340.67 kWh per host. The BPTT had the second lowest energy consumption of 457.72 kWh per host, and RW had the worst energy consumption of all approaches, with an average of 764.22 kWh per host. One reason for RW having the worst energy consumption is that it also has the highest average migration count in the simulation, with 197 migrations, whereas both neural network approaches had the lowest migration counts, with 175 for BPTT and 120 for BP. The BPTT achieved a relatively low average energy consumption within the data center. One reason is that it selects optimal times to migrate when more bandwidth is available and also has better prediction accuracy, thus providing faster migrations and ensuring that hosts never enter an overutilized state.
TABLE 12 SLAV and ESV results
Metric SLAV ESV
BPTT 1.80776E-06 0.000827449
Random Walk 1.41977E-05 0.010850105
Moving Average 1.89143E-05 0.013299876
Linear Regression 2.66094E-05 0.016313468
Backpropagation 2.7531E-05 0.009379006
Abbreviations: BPTT, backpropagation through time; ESV, energy and SLA violations; SLAV, service level agreement violation.
FIGURE 6 Performance of all algorithms. A, Energy performance of all algorithms; B, SLAV performance of all algorithms. BP, backpropagation; BPTT, backpropagation through time; LR, linear regression; MA, moving average; RW, random walk [Colour figure can be viewed at wileyonlinelibrary.com]
The ability of BPTT to store past information makes it an ideal algorithm for sequence prediction of cloud resources. The BPTT produced the most accurate predictions for bandwidth utilization for both one and multiple time steps ahead. This allowed the algorithm to decide the most optimal times to migrate VMs based on network bandwidth usage. The BPTT achieved an average bandwidth usage of 610 MB per time step, whereas RW achieved the worst overall result of 646 MB per time step. This led to the BPTT having lower migration times for VMs. By choosing the specific time at which to migrate a VM, the BPTT algorithm was able to better utilize the available resources at critical times.
The SLA and energy metrics combine to create a metric known as ESV. This metric measures the overall data center performance in terms of minimizing energy and reducing SLAVs (ie, ESV = energy*SLAV). If we try to reduce energy consumption too much, the SLA violations will increase, because consolidating many VMs on a host increases the probability of overload. Therefore, it is desirable to obtain a method that consumes less power while still incurring fewer SLA violations.
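As a quick worked check of this formula (our own arithmetic, using the BPTT figures reported in this section and in Table 12, and taking the energy term as the average consumption per host):

\[
\mathrm{ESV} = \mathrm{energy}\times\mathrm{SLAV} \approx 457.72 \times 1.80776\times10^{-6} \approx 8.27\times10^{-4},
\]

which agrees with the BPTT ESV of 0.000827449 listed in Table 12.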
The lower the ESV, the better the performance the data center is achieving. Overall, the BPTT algorithm achieved the best ESV performance of 0.000827449 compared with the second best, RW, at 0.010850105. This is an improvement of 61% in data center efficiency. Again, a T-test was conducted between the two best performing algorithms. The two-tailed P value is less than 0.0001 and the results are deemed statistically significant, with a 95% confidence interval of this difference from −0.01458090588 to −0.00688034372; BPTT had a standard deviation of 0.00424366520 compared with 0.02314924312 for RW.
4.4 Discussions
The results of the experiments show that the RNN has the capability to improve upon traditional prediction methods such as RW, MA, and BP, predicting CPU utilization and bandwidth with a high degree of accuracy and thereby improving the performance of a data center. Even though the bandwidth data set contains a learnable pattern, the CPU data set used has varying demands, which makes it much harder to predict the CPU utilization of host machines. For this reason, we have excluded phenomena such as heavy workloads from this paper.
The first experiment determined whether an RNN could outperform traditional forecasting methods in predicting the next bandwidth value. The results from the MSE, MAE, RMSE, and MAPE indicated that BPTT achieved the highest accuracy, as shown in Table 3. The second part of this experiment examined how far into the future the recurrent network could predict bandwidth with a high degree of accuracy. The results indicate that the RNN can achieve a reasonable degree of accuracy when predicting multiple time steps into the future, even though it only needs two previous inputs to do so.
The second experiment involved determining which algorithm would produce the most accurate results in predicting CPU utilization. From the results, BPTT achieved the best performance, with an MAE of 0.0818 on the test data. The BPTT also produced the most accurate predictions when trained on one host's CPU utilization data and tested on a new host's CPU data.
The results from the final experiment highlight the efficiency of the BPTT algorithm. Each of the forecasting methods was tested in a simulated environment on the following metrics: energy, SLAV, and bandwidth usage. The BP algorithm achieved the lowest energy consumption but the highest SLAVs. The RW algorithm achieved the second lowest SLAV but the highest energy consumption. However, the BPTT achieved statistically significantly lower SLAV and ESV values, improving the data center efficiency by 61% when compared with the next best algorithm. One of the main reasons BPTT could achieve the most efficient results is that it can identify when host machines will become overutilized and the best time to schedule migration based on the network bandwidth.
Multi-time-step-ahead prediction is a difficult area in time series research. The additional noise present in both the CPU and bandwidth utilization data makes forecasting far into the future even more difficult. The BPTT forecasting method presented in this research could potentially be incorporated into many of the other subfields of cloud computing that involve host prediction, VM migration, and task scheduling to improve overall performance.
The results presented in this paper demonstrate that RNNs are capable of predicting CPU data 1 hour ahead with a relatively high degree of accuracy despite the noise in the data set. It is known that instantiating a new VM takes between 5 and 15 minutes.59 In real-world data centers, the recurrent network could be implemented to inform the cloud management system when a host is going to become over- or underutilized, so that the management system can take action and boot up new VM instances on a different host prior to initiating live migration from the overutilized host. This, in turn, would lead to smoother transitions of VMs being moved from a source host to a destination, reducing live migration times and decreasing the occurrence of SLA violations on host machines.
This paper focused on implementing BPTT and compared it with more traditional approaches such as BP and LR. Our results showed that, even when predicting one to six time steps into the future, the RNN trained with BPTT only needed two past CPU values. One reason for this is that the data were constantly fluctuating, meaning only the most recent values were required for the RNN-BPTT to make its predictions. Other RNN training algorithms, such as long short-term memory (LSTM), have been explored in the cloud domain previously; eg, the work of Janardhanan and Barrett69 demonstrated that, for longer forecasting problems where a single day's worth of CPU values (1440 time steps) is input, an LSTM model was able to outperform ARIMA modeling. Previous work by Gers et al70 demonstrated that, when LSTM networks are presented with data such as the Mackey-Glass series problem, LSTMs do not perform well because of the constant fluctuation of the data. Their results showed that a sliding window multilayer perceptron could produce more accurate results than an LSTM approach. Predicting CPU six time steps into the future based on the previous two values suffers from similarly high fluctuations as the Mackey-Glass series problem. However, at lower granularities beyond six time steps, we find that BPTT is more effective. Zhang et al demonstrated the usefulness of RNNs to predict CPU and RAM.71 As shown through our results and highlighted by Zhang et al, the ability of RNNs to retain information and to create their own representation of the data enables the algorithm to achieve highly accurate workload predictions on the Google cloud trace data set.
Efficient management of VMs in the cloud computing services offered by Google and Amazon is crucial to lowering their costs and providing a better service. The results presented in this paper demonstrate that an RNN can produce accurate forecasts well into the future; therefore, it could have the additional benefit of reducing the overall energy consumption of the data center. The recurrent network produced promising results for power consumption, having the second lowest consumption. Koomey has stated that, in 2010, data centers consumed 1.3% of the world's energy.72 Gartner Inc has highlighted that the ICT industry contributes about 2% of the global CO2 emitted each year, placing it on the same level as the aviation industry.73 Better optimization of cloud data centers needs to be in place to dramatically decrease energy consumption.
Research has shown that, in data centers, a significant portion of the host machines operate at 10 to 50% of their full capacity.74 This results in a considerable increase in energy costs for cloud companies. Duy et al have shown how neural networks can be utilized as a predictor to reduce energy consumption in a data center by turning off hosts when the traffic load is light.48 Our approach results in a saving of 40% when comparing the energy usage of the RNN and the RW. Companies such as Google have recently implemented their own DeepMind neural network tool to reduce their data center energy cost by 40%.75 This, in turn, will lead to millions of dollars saved on powering host machines and will also reduce Google's carbon footprint.
More and more companies are hiring cloud computing services such as Amazon and Google to process their "big data." These cloud computing services need to be effectively managed to provide the best possible service to their customers. Our approach has demonstrated that the cloud environment can be managed more effectively using ML techniques. This would place cloud computing services in a better position to handle the immense computing requirements faced in the age of big data.
The results from this paper showed that, when an RNN is trained on a "big data" data set such as the Google cluster trace, it is able to accurately predict host machine utilization for several time steps into the future. This would help cloud companies such as Google and Amazon optimize their data centers by reducing the number of host machines idling at 0 to 10% utilization for at least 20 to 30 minutes, which will aid in reducing energy consumption and, by extension, decrease CO2 emissions from cloud data centers.
5 CONCLUSION
To maximize resource usage within a data center and to ensure the SLA is met at all times, resource management strategies must be able to predict how each host is going to perform in the future. In this paper, we have conducted a competitive analysis of nonlinear and linear one-step-ahead prediction algorithms against a multi-time-step-ahead prediction algorithm. We have evaluated the proposed algorithms through extensive simulations on a large-scale experimental setup using workload traces from more than 600 host machines from the Google cluster trace data. The results of the experiments have shown that the memory retention and sequence prediction capabilities of an RNN allowed it to produce the most accurate predictions for both CPU and bandwidth utilization. The RNN was also able to decrease bandwidth usage during critical times, reduce the occurrence of SLAVs, and improve the overall efficiency of a cloud data center. With respect to future work, one area that could be explored is the incorporation of other resources, such as memory, to further improve migration decisions and overall system performance.
In summary, the main findings of this research are as follows.
1. Recurrent neural networks produce better predictions for host CPU and network bandwidth utilization when compared with traditional models.
2. Recurrent neural networks achieve high accuracy when predicting multiple time steps into the future for both CPU and bandwidth data. In both cases, the accuracy of the network predictions decreases linearly the further into the future the network attempts to predict.
3. RNNs produce statistically significant improvements in the efficiency of large-scale simulated cloud data centers.
ACKNOWLEDGEMENT
The second author would like to acknowledge the ongoing financial support provided to her by the Irish Research Council.
ORCID
M. Duggan http://orcid.org/0000-0001-9576-3884
REFERENCES
1. Cisco Visual Networking Index: Forecast and Methodology, 2014–2019. White Paper. San Jose, CA: Cisco; 2016.
2. Zhang Y, Sun W, Inoguchi Y. Predict task running time in grid environments based on CPU load predictions. Futur Gener Comput Syst.
2008;24(6):489-497.
3. Dinda PA, O'Hallaron DR. Host load prediction using linear models. Clust Comput. 2000;3(4):265-280.
4. Bey KB, Benhammadi F, Mokhtari A, Guessoum Z. CPU load prediction model for distributed computing. Paper presented at: Eighth
International Symposium on Parallel and Distributed Computing (ISPDC); 2009; Lisbon, Portugal.
5. Benson T, Anand A, Akella A, Zhang M. MicroTE: fine grained traffic engineering for data centers. In: Proceedings of the Seventh
COnference on Emerging Networking EXperiments and Technologies (CoNEXT); 2011; Tokyo, Japan.
6. Armbrust M, Fox A, Griffith R, et al. A view of cloud computing. Commun ACM. 2010;53(4):50-58.
7. Hu K, Sim A, Antoniades D, Dovrolis C. Estimating and forecasting network traffic performance based on statistical patterns observed in
SNMP data. In: Machine Learning and Data Mining in Pattern Recognition: 9th International Conference, MLDM 2013, New York, NY, USA,
July 19-25, 2013. Proceedings. Berlin, Germany: Springer-Verlag Berlin Heidelberg; 2013:601-615.
8. Akoush S, Sohan R, Rice A, Moore AW, Hopper A. Predicting the performance of virtual machine migration. Paper presented at: IEEE
International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems; 2010; Miami Beach, FL.
9. Wood T, Ramakrishnan K, Shenoy P, et al. CloudNet: dynamic pooling of cloud resources by live WAN migration of virtual machines.
IEEE/ACM Trans Netw. 2015;23(5):1568-1583.
10. Mandal U, Habib MF, Zhang S, Chowdhury P, Tornatore M, Mukherjee B. Heterogeneous bandwidth provisioning for virtual machine
migration over SDN-enabled optical networks. Paper presented at: Optical Fiber Communication Conference; 2014; San Francisco, CA.
11. Clark C, Fraser K, Hand S, et al. Live migration of virtual machines. In: Proceedings of the 2nd Conference on Symposium on Networked
Systems Design & Implementation (NSDI); 2005; Boston, MA.
12. Beloglazov A, Buyya R. Energy efficient resource management in virtualized cloud data centers. In: Proceedings of the 2010 10th
IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGRI); 2010; Melbourne, Australia.
13. Verma A, Ahuja P, Neogi A. pMapper: power and migration cost aware application placement in virtualized systems. In: Middleware 2008:
ACM/IFIP/USENIX 9th International Middleware Conference Leuven, Belgium, December 1-5, 2008 Proceedings. Berlin, Germany: Springer;
2008:243-264.
14. Chen J, Liu W, Song J. Network Performance-aware Virtual Machine Migration in Data Centers. Paper presented at: Third International
Conference on Cloud Computing (CloudComp); 2012; Vienna, Austria.
15. Piao JT, Yan J. A network-aware virtual machine placement and migration approach in cloud computing. Paper presented at: The Ninth
International Conference on Grid and Cloud Computing (GCC); 2010; Nanjing, China.
16. Stage A, Setzer T. Network-aware migration control and scheduling of differentiated virtual machine workloads. In: Proceedings of the
2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing; 2009; Vancouver, Canada.
17. Chen H, Kang H, Jiang G, Zhang Y. Coordinating virtual machine migrations in enterprise data centers and clouds.
18. Ghorbani S, Caesar M. Walk the line: consistent network updates with bandwidth guarantees. In: Proceedings of the First Workshop on
Hot Topics in software Defined Networks (HotSDN); 2012; Helsinki, Finland.
19. Duggan M, Duggan J, Howley E, Barrett E. A reinforcement learning approach for the scheduling of live migration from under utilised
hosts. Memetic Comput. 2017;9(4):283-293. https://doi.org/10.1007/s12293-016-0218-x
20. Duggan M, Duggan J, Howley E, Barrett E. A network aware approach for the scheduling of virtual machine migration during peak loads.
Clust Comput. 2017;20(3):2083-2094.
21. Chujai P, Kerdprasop N, Kerdprasop K. Time series analysis of household electric consumption with ARIMA and ARMA models. In:
Proceedings of the International MultiConference of Engineers and Computer Scientists; 2013; Hong Kong.
22. Enke D, Thawornwong S. The use of data mining and neural networks for forecasting stock market returns. Expert Syst Appl.
2005;29(4):927-940.
23. Tilman D, Fargione J, Wolff B, et al. Forecasting agriculturally driven global environmental change. Science. 2001;292(5515):281-284.
24. Duggan M, Duggan J, Howley E, Barrett E. An autonomous network aware VM migration strategy in cloud data centres. Paper presented
at: International Conference on Cloud and Autonomic Computing (ICCAC) ; 2016; Augsburg, Germany.
25. Duggan M, Flesk K, Duggan J, Howley E, Barrett E. A reinforcement learning approach for dynamic selection of virtual machines in cloud
data centres. Paper presented at: Sixth International Conference on Innovative Computing Technology (INTECH); 2016; Dublin, Ireland.
26. Shaw R, Howley E, Barrett E. Predicting the available bandwidth on intra cloud network links for deadline constrained workflow schedul-
ing in public clouds. In: Service-Oriented Computing: 15th International Conference, ICSOC 2017, Malaga, Spain, November 13-16, 2017,
Proceedings. Cham, Switzerland: Springer International Publishing AG ; 2017:221-228.
27. Shaw R, Howley E, Barrett E. An advanced reinforcement learning approach for energy-aware virtual machine consolidation in cloud
data centers. Paper presented at: 12th International Conference for Internet Technology and Secured Transactions (ICITST); 2017;
Cambridge, UK.
28. Barrett E, Howley E, Duggan J. Applying reinforcement learning towards automating resource allocation and application scalability in
the cloud. Concurr Comput Pract Exp. 2013;25(12):1656-1674.
29. Barrett E, Howley E, Duggan J. A learning architecture for scheduling workflow applications in the cloud. Paper presented at: Ninth IEEE
European Conference on Web Services (ECOWS); 2011; Lugano, Switzerland.
30. Tang M, Zhang T, Liu J, Chen J. Cloud service QoS prediction via exploiting collaborative filtering and location-based data smoothing.
Concurr Comput Pract Exp. 2015;27(18):5826-5839.
31. Dabbagh M, Hamdaoui B, Guizani M, Rayes A. Toward energy-efficient cloud computing: Prediction, consolidation, and overcommitment.
IEEE Netw. 2015;29(2):56-61.
32. Wang J, Huang C, He K, Wang X, Chen X, Qin K. An energy-aware resource allocation heuristics for VM scheduling in cloud. Paper
presented at: 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International
Conference on Embedded and Ubiquitous Computing; 2013; Zhangjiajie, China.
33. Farahnakian F, Pahikkala T, Liljeberg P, Plosila J, Hieu NT, Tenhunen H. Energy-aware VM consolidation in cloud data centers using
utilization prediction model. IEEE Trans Cloud Comput. 2016.
34. Nguyen TH, Di Francesco M, Yla-Jaaski A. Virtual machine consolidation with multiple usage prediction for energy-efficient cloud data
centers. IEEE Trans Serv Comput. 2017.
35. Bobroff N, Kochut A, Beaty K. Dynamic placement of virtual machines for managing sla violations. Paper presented at: 10th IFIP/IEEE
International Symposium on Integrated Network Management (IM); 2007; Munich, Germany.
36. Fu X, Zhou C. Predicted affinity based virtual machine placement in cloud computing environments. IEEE Trans Cloud Comput. 2017.
37. Chen T, Zhu Y, Gao X, Kong L, Chen G, Wang Y. Improving resource utilization via virtual machine placement in data center networks.
Mob Netw Appl. 2017;23(2):227-238.
38. Gai K, Du Z, Qiu M, Zhao H. Efficiency-aware workload optimizations of heterogeneous cloud computing for capacity planning in
financial industry. Paper presented at: IEEE 2nd International Conference on Cyber Security and Cloud Computing (CSCloud); 2015;
New York, NY.
39. Genez TA, Bittencourt L, Fonseca N, Madeira E. Estimation of the available bandwidth in inter-cloud links for task scheduling in hybrid
clouds. IEEE Trans Cloud Comput. 2015.
40. Mason K, Duggan M, Barrett E, Duggan J, Howley E. Predicting host CPU utilization in the cloud using evolutionary neural networks.
Futur Gener Comput Syst. 2018;86:162-173.
41. Cui Y, Lin Y, Guo Y, Li R, Wang Z. Optimizing live migration of virtual machines with context based prediction algorithm. Adv Intell
Syst Res. 2013.
42. Wu Y, Zhao M. Performance modeling of virtual machine live migration. Paper presented at: IEEE 4th International Conference on Cloud
Computing (CLOUD); 2011; Washington, DC.
43. Beloglazov A, Buyya R. Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic
consolidation of virtual machines in cloud data centers. Concurr Comput Pract Exp. 2012;24(13):1397-1420.
44. Duggan M, Mason K, Duggan J, Howley E, Barrett E. Predicting host CPU utilization in cloud computing using recurrent neural networks.
Paper presented at: 12th International Conference for Internet Technology and Secured Transactions (ICITST); 2017; Cambridge, UK.
45. Zhang G, Patuwo BE, Hu MY. Forecasting with artificial neural networks: the state of the art. Int J Forecast. 1998;14(1):35-62.
46. Zhang Y, Sun W, Inoguchi Y. CPU load predictions on the computational grid. IEICE Trans Inf Syst. 2007;90(1):40-47.
47. Cao J, Fu J, Li M, Chen J. CPU load prediction for cloud environment based on a dynamic ensemble model. Softw Pract Exp.
2014;44(7):793-804.
48. Duy TVT, Sato Y, Inoguchi Y. Performance evaluation of a green scheduling algorithm for energy savings in cloud computing. Paper
presented at: IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW); 2010;
Atlanta, GA.
49. Vieira K, Schulter A, Westphall C, Westphall C. Intrusion detection techniques in grid and cloud computing environment. IT Prof IEEE
Comput Soc. 2010;12(4):38-43.
50. Joshi B, Vijayan AS, Joshi BK. Securing cloud computing environment against DDoS attacks. Paper presented at: International Conference
on Computer Communication and Informatics (ICCCI); 2012; Coimbatore, India.
51. Prevost JJ, Nagothu K, Kelley B, Jamshidi M. Prediction of cloud data center networks loads using stochastic and neural models. Paper
presented at: 6th International Conference on System of Systems Engineering (SoSE); 2011; Albuquerque, NM.
52. Zhang W, Duan P, Yang LT, et al. Resource requests prediction in the cloud computing environment with a deep belief network. Softw
Pract Exp. 2017;47(3):473-488.
53. Calheiros RN, Masoumi E, Ranjan R, Buyya R. Workload prediction using ARIMA model and its impact on cloud applications' QoS. IEEE
Trans Cloud Comput. 2015;3(4):449-458.
54. Bishop CM. Neural Networks for Pattern Recognition. Oxford, UK: Oxford University Press; 1995.
55. Mason K, Duggan J, Howley E. Evolving multi-objective neural networks using differential evolution for dynamic economic emission
dispatch. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO); 2017; Berlin, Germany.
56. Mason K, Duggan J, Howley E. Neural network topology and weight optimization through neuro differential evolution. In: Proceedings
of the Genetic and Evolutionary Computation Conference Companion (GECCO); 2017; Berlin, Germany.
57. Mason K, Duggan J, Howley E. A meta optimisation analysis of particle swarm optimisation velocity update equations for watershed
management learning. Appl Soft Comput. 2018;62:148-161.
58. Werbos PJ. Backpropagation through time: what it does and how to do it. Proc IEEE. 1990;78(10):1550-1560.
59. Islam S, Keung J, Lee K, Liu A. Empirical prediction models for adaptive resource provisioning in the cloud. Futur Gener Comput Syst.
2012;28(1):155-162.
60. Hyndman RJ, Athanasopoulos G. Forecasting: Principles and Practice. OTexts; 2014.
61. Box GEP, Jenkins GM, Reinsel GC, Ljung GM. Time Series Analysis: Forecasting and Control. Hoboken, NJ: John Wiley & Sons; 2015.
62. Sanghrajka S, Mahajan N, Sion R. Cloud Performance Benchmark Series: Network Performance-Amazon EC2. Technical Report.
Brookhaven, NY: Stony Brook University; 2011.
63. Reiss C, Wilkes J, Hellerstein JL. Google Cluster-Usage Traces: Format+ Schema. White Paper. Mountain View, CA: Google Inc; 2011.
64. Voorsluys W, Broberg J, Venugopal S, Buyya R. Cost of virtual machine live migration in clouds: a performance evaluation. In: Cloud Com-
puting: First International Conference, CloudCom 2009, Beijing, China, December 1-4, 2009. Proceedings. Berlin, Germany: Springer-Verlag
Berlin Heidelberg; 2009:254-265.
65. Minas L, Ellison B. Energy Efficiency for Information Technology: How to Reduce Power Consumption in Servers and Data Centers. Santa Clara, CA: Intel Press; 2009.
66. Fan X, Weber W-D, Barroso LA. Power provisioning for a warehouse-sized computer. ACM SIGARCH Comput Archit News.
2007;35(2):13-23.
67. Kusic D, Kephart JO, Hanson JE, Kandasamy N, Jiang G. Power and performance management of virtualized computing environments
via lookahead control. Clust Comput. 2009;12(1):1-15.
68. Garg SK, Buyya R. NetworkCloudSim: modelling parallel applications in cloud simulations. Paper presented at: IEEE 4th International
Conference on Utility and Cloud Computing (UCC); 2011; Melbourne, Australia.
69. Janardhanan D, Barrett E. CPU workload forecasting of machines in data centers using LSTM recurrent neural networks and
ARIMA models. Paper presented at: 12th International Conference for Internet Technology and Secured Transactions (ICITST); 2017;
Cambridge, UK.
70. Gers FA, Eck D, Schmidhuber J. Applying LSTM to time series predictable through time-window approaches. In: Neural Nets WIRN
Vietri-01: Proceedings of the 12th Italian Workshop on Neural Nets, Vietri sul Mare, Salerno, Italy, 17-19 May 2001. London, UK:
Springer-Verlag London; 2002.
71. Zhang W, Li B, Zhao D, Gong F, Lu Q. Workload prediction for cloud cluster using a recurrent neural network. Paper presented at:
International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI); 2016; Beijing, China.
72. Koomey JG. Estimating Total Power Consumption by Servers in the US and the World. 2007.
73. Gartner Inc. Gartner Estimates ICT Industry Accounts for 2 Percent of Global CO2 Emissions. 2007. http://www.gartner.com/newsroom/
id/503867
74. Barroso LA, Hölzle U. The case for energy-proportional computing. Computer. 2007;40(12):33-37.
75. Gao J, Jamidar R. Machine Learning Applications for Data Center Optimization. White Paper. Mountain View, CA: Google Inc; 2014.
Howtocitethisarticle: Duggan M, Shaw R, Duggan J, Howley E, Barrett E. A multitime-steps-ahead
prediction approach for scheduling live migration in cloud data centers. Softw Pract Exper. 2018;1–23.
https://doi.org/10.1002/spe.2635