Journal of Grid Computing manuscript No.
(will be inserted by the editor)
A Review of Auto-scaling Techniques for Elastic Applications in Cloud
Environments
Tania Lorido-Botran · Jose Miguel-Alonso · Jose A. Lozano
Received: date / Accepted: date
Abstract Cloud computing environments allow customers
to dynamically scale their applications. The key problem is
how to lease the right amount of resources, on a pay-as-you-
go basis. Application re-dimensioning can be implemented
effortlessly, adapting the resources assigned to the applica-
tion to the incoming user demand. However, the identifica-
tion of the right amount of resources to lease in order to
meet the required Service Level Agreement, while keeping
the overall cost low, is not an easy task. Many techniques
have been proposed for automating application scaling. We
propose a classification of these techniques into five main
categories: static threshold-based rules, control theory, rein-
forcement learning, queuing theory and time series analysis.
Then we use this classification to carry out a literature re-
view of proposals for auto-scaling in the cloud.
Keywords Cloud computing · scalable applications · auto-scaling · service level agreement
T. Lorido-Botran · J. Miguel-Alonso · J.A. Lozano
Intelligent Systems Group
University of the Basque Country, UPV/EHU
Paseo Manuel de Lardizabal, 1
20018, Donostia-San Sebastian, SPAIN

T. Lorido-Botran
Department of Computer Architecture and Technology
E-mail: tania.lorido@ehu.es

J. Miguel-Alonso
Department of Computer Architecture and Technology
E-mail: j.miguel@ehu.es

J.A. Lozano
Department of Computer Science and Artificial Intelligence
E-mail: ja.lozano@ehu.es

1 Introduction

Cloud computing is an emerging technology that is becoming increasingly popular because, among other advantages, it allows customers to easily deploy elastic applications, greatly
simplifying the process of acquiring and releasing resources
to a running application, while paying only for the resources
actually allocated (pay-per-use or pay-as-you-go model). Elas-
ticity and dynamism are two key concepts of cloud com-
puting. In this context, resources are usually in the form of
Virtual Machines (VMs). Clouds can be used by companies
for a variety of purposes, from running batch jobs to hosting
web applications.
Three main markets are associated with cloud computing:
Infrastructure-as-a-Service (IaaS) designates the provision
of information technology and network resources such
as processing, storage and bandwidth as well as man-
agement middleware. Examples are Amazon EC2 [3],
RackSpace [20] and Google Compute Engine [13].
Platform-as-a-Service (PaaS) designates programming en-
vironments and tools hosted and supported by cloud pro-
viders that can be used by consumers to build and deploy
applications onto the cloud infrastructure. Examples in-
clude Amazon Elastic Beanstalk [5], Google App En-
gine [10], and Microsoft Windows Azure [18].
Software-as-a-Service (SaaS) designates hosted vendor ap-
plications. For example, Google Apps [11], Microsoft
Office 365 [17] and Salesforce.com [25].
In this work we focus on the IaaS client’s perspective.
A typical scenario could be a company that wants to host
an application, and for this purpose leases resources from an
IaaS provider such as Amazon EC2. From now on the fol-
lowing terms will be used:
Provider. It refers mainly to the IaaS provider, which offers
virtually unlimited resources in the form of VMs. A PaaS
provider may also play this role.
Client. The client is the customer of the IaaS or PaaS service, who uses it to host the application. In other
words, it is the application owner.
User. This is the end user that accesses the application and
generates the workload or demand that drives its behav-
ior.
As we said previously, a key characteristic of cloud com-
puting is elasticity. This can be a double-edged sword. While
it allows applications to acquire and release resources dy-
namically, adjusting to changing demands, deciding the right
amount of resources is not an easy task. It would be desirable
to have a system that automatically adjusts the resources to
the workload handled by the application. All this with min-
imum human intervention or, even better, without it. This
would be an auto-scaling system, the focus of this survey.
Resource scaling can be either horizontal or vertical. In
horizontal scaling (scaling out/in), the resource unit is the
server replica (running on a VM), and new replicas are added
or released as needed. In contrast, vertical scaling (scaling
up/down) consists of changing the resources assigned to an
already running VM, for example, increasing (or reducing)
the allocated CPU power or the memory. Most common operating systems do not allow on-the-fly (without rebooting) changes to the machine on which they run (even if it is a VM);
for this reason, most cloud providers only offer horizontal
scaling.
The auto-scaler must be aware of the economic costs
of its decisions, which depend on the pricing scheme used
by the provider, to reduce the total expenditure. The pricing
model may include types of VMs, unit charge (per minute,
per hour), etc. The auto-scaler must also secure the correct
functioning of the application, by maintaining an acceptable
Quality of Service (QoS). The QoS depends on two types
of Service Level Agreement (SLA): the application SLA,
which is a contract between the client (application owner)
and end users; and the resource SLA, which is agreed between the
provider and the client. An example of the former is a certain
response time, and of the latter, 99.9% availability of the
infrastructure. Both types of SLA are usually mixed up, as to
satisfy the application SLA, it is necessary for the provider
to comply with the resource SLA. From now on, this review
focuses on the application SLA that must be guaranteed to
end users.
Applications hosted in cloud environments can be of very
diverse nature: batch jobs, map-reduce tasks, massively mul-
tiplayer online role-playing games (MMORPGs), web appli-
cations, video streaming services, among many others. Re-
source allocation for batch applications is usually denoted
as scheduling and involves meeting a certain job execution
deadline. It has been extensively studied in grid environ-
ments (job schedulers) and also explored in cloud environ-
ments, but this is not the topic of this review. The scope of
applicability of the auto-scaler covers any scalable or elas-
tic application composed of a load balancer that receives
job requests and dispatches them to any of a collection of
replicable servers. Web applications, MMORPGs and video
streaming services fit into this scheme. The main contribu-
tions of this review are:
A problem definition of the auto-scaling process and its
different phases.
A classification for auto-scaling techniques, together with
a description of each category and its pros/cons.
A review of the literature about auto-scaling, organized
using this classification. However, given the heterogene-
ity of auto-scaling approaches and testing conditions, it
is not possible to provide an assessment of the different
proposals, alone or in comparative terms.
A description, in an Appendix, of the different tools and
environments used to implement and test auto-scalers.
Note that the management of the cloud infrastructure is
out of the scope of this paper: topics such as VM placement
into physical servers, VM migration, energy consumption
and related problems are not discussed.
The remainder of this paper is organized as follows. Sec-
tion 2 describes the general scenario of elastic applications
in which to apply auto-scaling techniques. The auto-scaling
process is described in Section 3. A classification criterion
to organize auto-scaling proposals is introduced in Section
4. Section 5, the core of this work, contains a review of the
literature on auto-scaling in the cloud. Section 6 completes
this paper with some conclusions extracted from the review
and an outline of future work. Furthermore, Appendix A
provides details about the experimental platforms, workload
generation mechanisms and application benchmarks used by
different authors to test their proposal of auto-scaling algo-
rithms.
2 Scenario: Elastic Applications
An elastic application has the capability of being scaled (hor-
izontally or vertically) so that it adjusts to the variable input workload. Applications of this class are normally based
on a load balancer (a dispatcher) and a collection of identi-
cal servers. In a cluster environment, those would be physi-
cal servers but, in cloud environments, servers are hosted in
VMs and, therefore, the terms servers and VMs can be used
interchangeably. An auto-scaler is in charge of making de-
cisions about scaling actions, without the intervention of a
human manager.
Although several applications can be considered elastic
(e.g. MMORPG or video streaming servers), most of the lit-
erature is focused on web applications. These typically in-
clude a business-logic tier, containing the application logic,
and a persistence or data base tier. The literature pays most
of its attention to the business-logic tier, which can be easily
scaled. There are some works that study auto-scaling at the
persistence tier, although replicating distributed databases
brings out additional issues. Our description of auto-scaling
techniques is as neutral as possible, without focusing on any
particular application class or tier. However, given the liter-
ature bias towards web applications, it is unavoidable to see
it reflected in this survey.
Let us consider an elastic application deployed over a
pool of n VMs. VMs may have the same or different assigned amounts of resources (e.g. 1 GB of memory, 2 CPUs, ...),
but each VM has its own unique identifier (it could be its IP
address). The requests received by the load balancer may
come from real end users or from other applications. We will
assume that the execution time of a request may vary be-
tween milliseconds and minutes; long-running tasks lasting
hours will not be considered in the auto-scaling problem, as
they rather belong to a scheduling problem.
The load balancer will receive all the incoming requests
and forward them to one of the servers in the pool. It may be
implemented in different ways. These are a few examples: a
specific, hardware/software combination (an Internet appli-
ance), a part of the Domain Name System, a service offered
by the IaaS provider, or even an ad-hoc part of the client’s
application. Whichever the chosen technology, it is assumed
that the load balancer has updated information about the
VMs to use (the active ones): it will stop immediately send-
ing requests to removed VMs, and it will start sending work-
load to newly added VMs. It is also assumed that each re-
quest will be assigned to a single VM, which will run it until completing the task associated with it. Several load balancing
policies could be used, for example, random, round-robin
or least-connection. In the case of heterogeneous collections
of VMs, workload dispatching should be proportional to the
processing power of the VMs. A review of techniques to implement load balancers, distribute workload among VMs, control the admission of new requests, or redirect requests between VMs falls outside the scope of this survey.
The next section is devoted to the auto-scaling process: its
objectives and the phases of the process.
3 Auto-scaling Process: the MAPE Loop
The aim of the auto-scaler is to dynamically adapt the re-
sources assigned to the elastic applications, depending on
the input workload. This auto-scaler may be either an ad-
hoc implementation for a particular application, or a generic
service offered by the IaaS provider. In any case, the system
should be able to find a trade-off between meeting the SLA
of the application and minimizing the cost of renting cloud
resources. Any auto-scaler faces several problems:
Under-provisioning: The application does not have enough
resources to process all the incoming requests within the
temporal limits imposed by the SLA. More resources are
needed, but it takes some time from the moment they
are requested until they are available. In case of sudden
traffic bursts, an under-provisioned application may lead
to many SLA violations, even to system congestion. It
will take some time until the application returns to its
normal operational state.
Over-provisioning: In this case the application has more re-
sources than those needed to comply with the SLA. This
is correct from the point of view of SLA, but the client is
paying an unnecessary cost if the VMs are idle or lightly
loaded. A certain degree of over-provisioning could be
desirable in order to cope with small workload fluctua-
tions. In general, neither a manual provision of resources,
nor the one carried out by an auto-scaler, aims to keep
the VMs operating at 100% of their capacity.
Oscillation: A combination of both undesirable effects. Oscillation occurs when scaling actions are carried out too quickly, before the impact of each scaling action on the application can be observed. Keeping a capacity buffer
(aiming to keep the VMs at, say, 80% instead of 100%),
or using a cooldown period (e.g. [73]) are common ap-
proaches to avoid oscillation.
The auto-scaling process matches the MAPE loop of au-
tonomous systems [68] [82], which consists of four phases:
Monitoring (M), Analysis (A), Planning (P) and Execution
(E). First, a monitoring system gathers information about the
system and application state. The auto-scaler encompasses
the analysis and planning phases: it uses the retrieved infor-
mation to make estimations of future resource utilization and
needs (analysis), and then plans a suitable resource modifi-
cation action, e.g., removing a VM or adding some extra
memory. The provider will then execute the actions as re-
quested by the auto-scaler. Each of the phases in the MAPE
loop is described in more detail in the forthcoming sections.
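As an illustration, the loop can be sketched as a small Python skeleton (a minimal sketch under the assumption that the monitoring, analysis, planning and execution back-ends are supplied as callables; all names are hypothetical and not part of any provider API):

import time

def mape_loop(monitor, analyzer, planner, executor, interval=60):
    """Skeleton of the MAPE loop run by an auto-scaler (hypothetical names).

    monitor  -- returns current measurements (e.g. CPU load, request rate)
    analyzer -- turns raw measurements into an estimate of current or future load
    planner  -- maps that estimate to a scaling action (+k/-k VMs, or None)
    executor -- asks the provider, through its API, to apply the action
    """
    while True:
        metrics = monitor()            # Monitoring (M): gather raw measurements
        estimate = analyzer(metrics)   # Analysis (A): current state or prediction
        action = planner(estimate)     # Planning (P): decide the scaling action
        if action is not None:
            executor(action)           # Execution (E): delegated to the provider
        time.sleep(interval)           # sampling granularity (e.g. once per minute)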
3.1 Monitor
An auto-scaling system requires the support of a monitor-
ing system providing measurements about user demands,
system (application) status and compliance with the expected
SLA. Cloud infrastructures provide, through an API, access
to information useful for the provider itself (for example, to
check the correct functioning of the infrastructure and the
level of utilization) and the client (for example, to check the
compliance with the SLA).
Auto-scaling decisions rely on having useful, updated
performance metrics. The performance of the auto-scaler
will depend on the quality of the available metrics, the sam-
pling granularity, and the overheads (and costs) of obtaining
the metrics [38]. In the literature, authors have used a variety
of metrics as drivers for scaling decisions. An extensive list
of those metrics is provided in [59], for both transactional
(e.g. e-commerce web sites) and batch workloads (e.g. video
transcoding or text mining). They could be easily adapted to
other types of applications.
Hardware: CPU utilization per VM, disk access, network
interface access, memory usage.
General OS Process: CPU-time per process, page faults, real
memory (resident set).
Load balancer: request queue length, session rate,
number of current sessions, transmitted bytes, number
of denied requests, number of errors.
Application server: total thread count, active thread count,
used memory, session count, processed requests, pend-
ing requests, dropped requests, response time.
Database: number of active threads, number of transactions
in a particular state (write, commit, roll-back).
Message queue: average number of jobs in the queue, job’s
queuing time.
Note that, when working with applications deployed in
the cloud, some metrics are obtained from the cloud provider
(those related to the acquired resources and the hypervisor
managing the VMs), others from the host operating sys-
tem on which the application is implemented, and yet others
from the application itself. In order to reduce the complex-
ity of the monitoring system, sometimes proxy metrics are
used: for example, CPU utilization (hypervisor-level) as a
proxy of current system workload (application-level).
As stated before, auto-scaling is mostly related to the
Analyze and Plan parts of the MAPE loop. For this reason,
monitoring will not be further studied in this review. It will
be assumed that a good monitoring tool is available, gather-
ing different and updated metrics about system and applica-
tion current state, with negligible intrusion, and at a suitable
granularity (e.g. per second, per minute).
3.2 Analyze
The analysis phase consists of processing the metrics gath-
ered directly from the monitoring system, obtaining from
them data about current system utilization, and optionally
predictions of future needs. Some auto-scalers do not per-
form any kind of prediction, just respond to current sys-
tem status: they are reactive. However, others use sophisti-
cated techniques to predict future demands in order to ar-
range resource provisioning with enough anticipation: they
are proactive. Anticipation is important because there is al-
ways a delay from the time when an auto-scaling action
is executed (for example, adding a server) until it is effec-
tive (for example, it takes several minutes to assign a phys-
ical server to deploy a VM, move the VM image to it, boot
the operating system and application, and have the server
fully operational [80]). Reactive systems might not be able
to scale in case of sudden traffic bursts (e.g. special offers
or the Slashdot effect). Therefore, proactivity might be re-
quired in order to deal with fluctuating demands and to scale in advance.
Some of the reviewed works focus only on this analysis phase, i.e., on the way of processing the metrics to either determine the current state of the application or anticipate future needs.
3.3 Plan
Once the current (or future) state is known (or predicted),
the auto-scaler is in charge of planning how to scale the
resources assigned to the application in order to find a satis-
factory trade-off between cost and SLA compliance. Exam-
ples of scaling decisions are removing a VM or adding more
memory to a particular VM. Decisions will be made consid-
ering the data obtained from the analysis phase (or directly
from the monitoring system) and the target SLA, as well
as other factors related to the cloud infrastructure, including
pricing models and VM boot-up time.
This part constitutes the core of any auto-scaling pro-
posal and, therefore, will be thoroughly studied in Sections
4 and 5.
3.4 Execute
This phase consists of actually executing the scaling actions
decided in the previous step. Conceptually, this is a straight-
forward phase, implemented through the cloud provider’s
API. Actual complexities are hidden to the client. Remem-
ber that it takes some time from the moment a resource is
requested until it is actually available, and that bounds on
these delays may be part of the resource SLA.
4 A Classification of Auto-scaling Techniques
As the body of literature dealing with proposals of auto-
scaling systems is large, we have tried to put some order
into it, to better understand and compare those proposals. To that end, we need some classification rules to group works
into meaningful sets. To the best of our knowledge, there is
no previous work proposing such a classification. The most
closely related survey, done by Guitart et al [61], targets
the performance of general Internet applications, deployed
over shared or dedicated clusters, relying on methods such
as admission control and service differentiation. However,
this review focuses on exploiting the elastic nature inherent
to cloud systems. Manvi and Shyam [78] gather many ref-
erences about resource management in IaaS environments,
but put little focus on auto-scaling.
The works we have reviewed to put together this survey are very diverse with regard to the underlying theory or technique used to implement the auto-scaler, including the metrics or models used to decide both when to scale and how
many resources are necessary. Each auto-scaling approach
has been designed with particular goals, focusing on several
application architectures or cloud providers offering differ-
ent scaling capabilities. The most differentiating factor is
the final goal/evaluation criteria used for a particular auto-
scaler: the prediction accuracy, the compliance with the SLA
(defined in many ways such as response time or availability
of resources), or the cost of resources. It seems quite obvious
that comparing auto-scaling approaches is not that straight-
forward. For example, let us consider two proposals, one fo-
cused on short-term prediction of workloads with a clearly
diurnal pattern, and another one that tries to comply with
the SLA even in the case of sudden bursts of traffic. Both
auto-scaling systems may work well for their target systems,
but clearly the latter approach is more prone to cause over-
provisioning. Still, it would be wrong to assume that this extra cost due to over-provisioning is enough to prefer one auto-scaling system over the other. Therefore, the
target goal of the auto-scaler cannot be used as the classifi-
cation criterion.
A possible grouping for auto-scalers could be done using
their anticipation capabilities, arranging them in two classes:
reactive and proactive. The former implies that the system
reacts to changes in the workload only when those changes
have been detected, using the last values obtained from the
set of monitored variables; consequently, as resource provi-
sioning takes some time, the desired effect may arrive when
it is too late. Proactive systems anticipate future demands
and make decisions taking them into consideration. We have
chosen not to use this as a classification criterion because
sometimes it is not clear whether a particular approach is
purely reactive or proactive.
We decided to adopt the underlying theory or technique
used to build the auto-scaler as our classification criterion.
This will help the reader to better understand the basic con-
cepts of a particular group or category, including their ad-
vantages and limitations. Most reviewed works can fit in one
or more of these five groups (note that some works propose
hybridizations of several techniques):
1. Threshold-based rules (rules)
2. Reinforcement learning (RL)
3. Queuing theory (QT)
4. Control theory (CT)
5. Time series analysis (TS)
Commercial cloud providers offer purely reactive auto-
scaling using threshold-based rules. The scaling decisions
are triggered based on some performance metrics and pre-
defined thresholds. This approach has become rather pop-
ular due to its (apparent) simplicity: rule-based auto-scaler
are easy to provide as a cloud service, and are also easy to
set-up by clients. However, the effectiveness of rules under
bursty workloads is questionable.
Time series analysis covers a wide range of methods to
detect patterns and predict future values on sequences of
data points. The accuracy in the forecast value (e.g. future
number of requests or average CPU utilization) will depend
on selecting the right technique and setting the parameters
correctly, especially the history window and the prediction
interval. Time-series analysis is the main enabler of proac-
tive auto-scaling techniques.
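As a simple illustration of this category (not of any specific proposal reviewed later), the sketch below forecasts the next value of a monitored metric with a sliding-window moving average and a least-squares linear trend; the window length and prediction horizon correspond to the parameters mentioned above, and the sample values are arbitrary:

import numpy as np

def forecast(history, window=12, horizon=1):
    """Predict the metric 'horizon' steps ahead from the last 'window' samples."""
    recent = np.asarray(history[-window:], dtype=float)
    moving_avg = recent.mean()                                # moving-average estimate
    t = np.arange(len(recent))
    slope, intercept = np.polyfit(t, recent, 1)               # least-squares linear trend
    linear = intercept + slope * (len(recent) - 1 + horizon)  # extrapolated value
    return moving_avg, linear

# e.g. requests per minute observed so far
print(forecast([110, 120, 135, 150, 170, 190, 220, 240, 260, 285, 310, 340]))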
There are two auto-scaling methods that rely on mod-
eling the system in order to determine its future resource
needs. This is the case of both queuing theory and control
theory. Queuing theory has been largely applied to comput-
ing systems, in order to find the relationship between the
jobs arriving and leaving a system. A simple approach con-
sists in modeling each VM (or set of VMs) as a queue of
requests in order to estimate different performance metrics
such as the response time. A main limitation of QT models
is that they are too rigid, and need to be recomputed when
there are changes in either the application or the workload.
Control theory also relies on creating a model of the ap-
plication. The aim is to define a (reactive or proactive) con-
troller to automatically adjust the required resources to the
application demands. The nature and performance of a con-
troller highly depends on both the application model and the
controller itself. As we will see, many researchers consider
that this type of auto-scaling has great potential, especially
when combined with resource prediction.
Finally, the last of our categories contains proposals based
on reinforcement learning. Similarly to control theory, RL
tries to automate the scaling task, but without using any a-
priori knowledge or model of the application. Instead, RL
tries to learn the most suitable action for each particular state
on-the-fly, with a trial-and-error approach. Although the absence of a model and the adaptability of the technique might seem very appealing for auto-scaling, the truth is that RL suffers from long learning phases: the time required by the method to converge to an optimal policy can be unfeasibly long.
As defined in Section 3, the auto-scaling process is mainly
related to the analysis and planning phases of the MAPE
loop. Some auto-scaling proposals focus on one of these
phases, but most of them cover both of them. Queuing the-
ory and time series analysis are useful in the analysis phase
in order to estimate the current utilization or future needs of
the application. Threshold-based rules, reinforcement learn-
ing and control theory can be used in the planning phase to
decide the scaling action, and they can be combined with a
previous analysis phase involving, for example, time series
analysis.
5 Review of Auto-scaling Techniques
In this section we carry out a survey of the literature on auto-
scaling systems, following the classification introduced in
the previous section. It is organized in as many sub-sections
as categories, each one starting with a description of the
technique that defines the category, including the definition,
methodologies, pros and cons. After that, we analyze a col-
lection of papers fitting into the category, also discussing
their features and limitations. This information is comple-
mented with a set of tables summarizing the reviewed lit-
erature, one table per category. Each table row includes a
synopsis of a reviewed auto-scaling paper, which includes:
1. The specific technique or combination of techniques ap-
plied
2. Whether horizontal (H) or vertical (V) scaling is per-
formed
3. The reactive (R) or proactive (P) nature of the approach
4. The performance metric considered (e.g. CPU load, in-
put request rate)
5. The monitoring tool and the granularity or monitoring
interval
6. Characteristics of the environment used to test the tech-
nique, including
The SLA considered for the application
The type of workload (either real or synthetic)
The experimental platform (simulator, custom testbed
or real provider), together with the application bench-
mark
Note that many table entries contain a “-”. This means
that the reviewed paper does not provide enough informa-
tion to know or infer the corresponding piece of information.
Unfortunately, this happens more often than desirable. Also,
note that authors have used many different mechanisms to
assess the goodness of their auto-scaler, ranging from com-
pletely synthetic simulated environments to actual produc-
tion systems. For more information about testing platforms,
the interested reader is referred to Appendix A.
5.1 Threshold-based Rules
Threshold-based auto-scaling rules or policies are very pop-
ular among cloud providers such as Amazon EC2, and third-
party tools like RightScale [22]. The simplicity and intuitive
nature of these policies make them very appealing to cloud
clients. However, setting the corresponding thresholds is a
per-application task, and requires a deep understanding of
workload trends.
5.1.1 Description of the Technique
From the MAPE loop (see Section 3), rules are purely a
decision-making technique (planning phase). The number of
VMs or the amount of resources assigned to the target ap-
plication will vary according to a set of rules, typically two:
one for scaling up/out and one for scaling down/in. Rules
are structured like these examples:
if x1 > thrU1 and/or x2 > thrU2 and/or ...
    for durU seconds then
        n = n + s and
        do nothing for inU seconds          (1)

if x1 < thrL1 and/or x2 < thrL2 and/or ...
    for durL seconds then
        n = n - s and
        do nothing for inL seconds          (2)
Each rule has two parts: the condition and the action to
be executed when the condition is met. The condition part
uses one or more performance metrics x1, x2, ..., such as the input request rate, CPU load or average response time. Each performance metric has an upper (thrU) and a lower (thrL) threshold. If the condition is met for a given time (durU or durL), then the corresponding action will be triggered. For horizontal scaling, the application manager should define a fixed amount s of VMs to be acquired or released, while for vertical scaling s refers to an increase or decrease of resources such as CPU or RAM. After executing an action, the auto-scaler inhibits itself for a small cooldown period defined by inU or inL.
The best way to understand threshold-based rules is by
means of an example: add 2 small instances when the aver-
age CPU usage is above 70% for more than 5 minutes, and
then, do nothing for 10 minutes.
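The example above can be written down as a small reactive loop; the sketch below is a hypothetical illustration (the metric source, thresholds and provider calls such as add_vms/remove_vms are placeholders, not a real API):

import time

def threshold_autoscaler(get_avg_cpu, add_vms, remove_vms,
                         thr_upper=70.0, thr_lower=30.0,
                         dur=300, cooldown=600, step=2, check_every=60):
    """Reactive threshold-based rules: scale out when the average CPU stays above
    thr_upper for 'dur' seconds, scale in when it stays below thr_lower for the
    same time, and inhibit further actions for 'cooldown' seconds afterwards."""
    above_since = below_since = None
    while True:
        cpu = get_avg_cpu()
        now = time.time()
        above_since = (above_since or now) if cpu > thr_upper else None
        below_since = (below_since or now) if cpu < thr_lower else None
        if above_since is not None and now - above_since >= dur:
            add_vms(step)                  # scale out: e.g. add 2 small instances
            above_since = below_since = None
            time.sleep(cooldown)           # calm period to avoid oscillation
        elif below_since is not None and now - below_since >= dur:
            remove_vms(step)               # scale in
            above_since = below_since = None
            time.sleep(cooldown)
        else:
            time.sleep(check_every)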
5.1.2 Review of Proposals
Threshold-based rules constitute an easy to deploy and use
mechanism to manage the amount of resources assigned to
an application hosted in a cloud platform, dynamically adapt-
ing those resources to the input demand (e.g. [54], [72], [81],
[48] [63], [64]). However, creating the rules requires an ef-
fort from the application manager (the client), who needs
to select the suitable performance metric or logical com-
bination of metrics, and also the values of several parame-
ters, mainly thresholds. The experiments carried out by [72] show that application-specific metrics (e.g. the average waiting time in queue) obtain better performance than system-specific metrics (e.g. CPU load). Application managers also
specific metrics (e.g. CPU load). Application managers also
need to set the corresponding upper (e.g. 70%) and lower
(e.g. 30%) thresholds for the performance variable (e.g. CPU
load). Thresholds are the key for the correct working of the
rules. In particular, Dutreilh et al [54] remark that thresh-
olds need to be carefully tuned in order to avoid oscillations
in the system (e.g. in the number of VMs or in the amount of
Table 1 Summary of the reviewed literature about threshold-based rules. Table rows are as follows. (1) The reference to the reviewed paper. (2)
A short description of the proposed technique. (3) The type of auto-scaling: horizontal (H) or vertical (V). (4) The reactive (R) and/or proactive
(P) nature of the proposal. (5) The performance metric or metrics driving auto-scaling. (6) The monitoring tool used to gather the metrics. The
remaining three fields are related to the environment in which the technique is tested. (7) The metric used to verify SLA compliance. (8) The
workload applied to the application managed by the auto-scaler. (9) The platform on which the technique is tested.
Ref Auto-scaling Techniques H/V R/P Metric Monitoring SLA Workloads Experimental Platform
[63] Rules Both R CPU, memory, I/O Custom tool. 1
minute
Response time Synthetic. Browsing and
ordering behavior of
customers.
Custom testbed (called IC Cloud) +
TPC
[72] Rules H R Average waiting
time in queue, CPU
load
Custom tool. - Synthetic Public cloud. FutureGrid, Eucalyp-
tus India cluster
[64] Rules Both R CPU load, response
time, network link
load, jitter and delay.
- - Only algorithm is de-
scribed, no experimenta-
tion is carried out.
[48] Rules + QT H P Request rate Amazon Cloud-
Watch. 1-5 minutes
Response time Real. Wikipedia traces Real provider. Amazon EC2 +
Httperf + MediaWiki
[52] RightScale + MA to per-
formance metric
H R Number of active
sessions
Custom tool - Synthetic. Different
number of HTTP clients
Custom testbed. Xen + custom col-
laborative web application
[73] RightScale + TS: LR and
AR(1)
H R/P Request rate, CPU
load
Simulated. - Synthetic. Three traffic
patterns: weekly oscilla-
tion, large spike and ran-
dom
Custom simulator, tuned after some
real experiments.
[59] RightScale H R CPU load Amazon Cloud-
Watch
- Real. World Cup 98 Real provider. Amazon EC2 +
RightScale (PaaS) + a simple web
application
[96] RightScale + Strategy-tree H R Number of sessions,
CPU idle
Custom tool. 4 min-
utes.
- Real. World Cup 98 Real provider. Amazon EC2 +
RightScale (PaaS) + a simple web
application.
[81] Rules V R CPU load, memory,
bandwidth, storage
Simulated. - Synthetic Custom simulator, plus Java rule
engine Drools
[77] Rules V R CPU load Simulated. 1 minute Response time Real. ClarkNet Custom simulator
CPU assigned). To prevent this problem, it is advisable to set an inertia, cooldown or calm period, a time during which no scaling decisions can be committed once a scaling action has been carried out.
Most authors and cloud providers use only two thresh-
olds per performance metric. However, Hasan et al [64] have
considered using a set of four thresholds and two durations:
ThrU, the upper threshold; ThrbU, which is slightly below
the upper threshold; ThrL, the lower threshold; ThroL, which
is slightly above the lower threshold. Used in combination,
it is possible to determine the trend of the performance met-
ric (e.g. trending up or down), and then perform finer auto-
scaling decisions.
Conditions in the rules are usually based on one or, at most, two performance metrics; the most popular are the average CPU load of the VMs, the response time, and the input request rate. Both Dutreilh et al [54] and Han et al
[63] use the average response time of the application. On the
contrary, Hasan et al [64] prefer using performance metrics
from multiple domains (compute, storage and network) or
even a correlation of several of them.
Note that in most cases the rules use the metrics directly
as obtained from the monitor, thus acting in a purely re-
active way. However, it is possible to carry out a previous
analysis of the monitored data in order to predict the fu-
ture (expected) behavior of the system and execute rules in a
proactive way [71]. This topic will be discussed later, in the
section devoted to time series analysis.
RightScale's auto-scaling algorithm [23] proposes combining regular reactive rules with a voting process. If a majority of the VMs agree that they should scale up/out or
down/in, that action is taken; otherwise, no action is planned.
Each VM votes to scale up/out or down/in based on a set
of rules evaluated individually. After each scaling action,
RightScale recommends a 15-minute calm period, because new machines generally take between 5 and 10 minutes to become operational. This auto-scaling technique has been
adopted by several authors ([73], [51], [59], [96]). Chieu
et al [51] initially proposed a set of reactive rules based on
the number of active sessions, but this work was extended
in Chieu et al [52] following the RightScale approach: if
all VMs have active sessions above the given upper thresh-
old, a new VM is provisioned; if there are VMs with ac-
tive sessions below a given lower threshold and with at least
one VM that has no active session, the idle one will be shut
down.
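A minimal sketch of such a per-VM voting decision could look as follows (the thresholds and the majority fraction are illustrative assumptions, not RightScale's actual parameters):

def voting_decision(per_vm_sessions, thr_upper=85, thr_lower=30, majority=0.5):
    """Each VM votes to scale out, scale in or abstain, based on its own metric
    (here, the number of active sessions); an action is taken only if a majority
    of the VMs agree on it."""
    n = len(per_vm_sessions)
    votes_out = sum(1 for s in per_vm_sessions if s > thr_upper)
    votes_in = sum(1 for s in per_vm_sessions if s < thr_lower)
    if votes_out > majority * n:
        return "scale_out"    # then observe a calm period (e.g. around 15 minutes)
    if votes_in > majority * n and n > 1:
        return "scale_in"     # shrink only while more than one VM remains
    return "no_action"

print(voting_decision([90, 88, 40]))   # -> 'scale_out'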
As RightScale’s voting system is based on rules, it in-
herits their main disadvantage: the technique is highly de-
pendent on manager-defined threshold values, and also, on
the characteristics of the input workload. This was the con-
clusion reached by Kupferman et al [73] after comparing
RightScale with other algorithms. Simmons et al [96] try
to overcome these problems with a strategy-tree, a tool that
evaluates the deployed policy set, and switches among alter-
native strategies over time, in a hierarchical manner. The authors created three different scaling policies, customized to dif-
ferent input workloads, and the strategy-tree would switch
among them based on the workload trend (analyzed with a
regression-based technique).
In order to save costs, Kupferman et al [73] (and other
authors [48]) came up with an idea called smart kill. Many
IaaS providers charge partial utilization hours as full hours.
Therefore, it is advisable not to terminate a VM before the
hour is over, even if the load is low. Apart from reducing
costs, smart kill may also improve system performance: in
case of an oscillating input workload, the costs of continu-
ously shutting down and starting up VMs are avoided, sparing boot-up delays and reducing SLA violations.
The popularity of rules as an auto-scaling method is probably due to their simplicity and the fact that they are easy for clients to understand. However, this kind of technique
shows two main problems: its reactive nature and the dif-
ficulty of selecting the correct set of performance metrics
and the corresponding thresholds. The effectiveness of those
thresholds is highly dependent on the workload changes,
and may require frequent tuning. In order to solve the prob-
lem of static thresholds, Lorido-Botran et al [77] introduce
the concept of dynamic thresholds: initial values are set up,
but they are automatically modified as a consequence of the
observed SLA violations. Meta-rules are included to define
how the threshold used in the scaling rules may change to
better adapt to the workload.
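As a rough illustration of the idea of meta-rules (a hypothetical sketch, not the exact rules proposed in [77]), the scale-out threshold could be tightened whenever SLA violations are observed and slowly relaxed otherwise:

def adjust_upper_threshold(thr_upper, sla_violations, step=5.0,
                           lowest=50.0, highest=90.0):
    """Hypothetical meta-rule: tighten the scale-out threshold when SLA violations
    were observed in the last interval (so the rule fires earlier next time), and
    slowly relax it towards its original value when there were none."""
    if sla_violations > 0:
        return max(lowest, thr_upper - step)
    return min(highest, thr_upper + step / 2)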
In conclusion, rules can be used to automate the scaling of a particular application without much effort, especially in the case of applications with quite regular, predictable patterns. However, in the case of bursty workloads, the client should consider a more advanced and powerful auto-scaling system from the remaining categories.
The main characteristics of the auto-scaling proposals
based on rules and reviewed in this section are summarized
in Table 1.
5.2 Reinforcement Learning
Reinforcement Learning (RL) [98] is a type of automatic
decision-making approach that has been used by several au-
thors to implement auto-scalers. Without any a priori knowl-
edge, RL techniques are able to determine the best scaling
action to take for every application state, given the input
workload.
5.2.1 Description of the Technique
Reinforcement learning [98] focuses on learning through di-
rect interaction between an agent (e.g. the auto-scaler) and
its environment (e.g. the application as defined in Section 2).
The auto-scaler will learn from experience (trial-and-error
method) the best scaling action to take, depending on the
current state, given by the input workload, performance or
other set of variables. After executing an action, the auto-
scaler gets a response or reward (e.g. performance improve-
ment) from the system, about how good that action was. So,
the auto-scaler will tend to execute actions that yield a high
reward (best actions are reinforced). From now on, in this
section the general term agent will be used, instead of auto-
scaler.
The objective of the agent is to find a policy π that maps every state s to the best action a the agent should choose.
The agent has to maximize the expected discounted rewards
obtained in the long run:
R_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ... = Σ_{k=0..∞} γ^k r_{t+k+1}     (3)

where r_{t+1} is the reward obtained at time t+1, and γ is the discount factor.
The policy is based on a value function Q(s,a), usually called the Q-value function. Every Q(s,a) value estimates the future cumulative rewards obtained by executing an action a in a state s. In other words, it represents the goodness of executing action a when in state s. The Q-value function can be defined as:

Q(s,a) = E_π [ Σ_{k=0..∞} γ^k r_{t+k+1} | s_t = s, a_t = a ]     (4)
There are many RL algorithms that can be used in order
to obtain the Q-value function, but Q-learning is the most
used in the literature. Typically, the Q(s,a) values are stored in a lookup table that maps all system states s to their best action a and the corresponding Q-value. The Q-learning al-
gorithm is sketched in Algorithm 1.
Algorithm 1 Q-learning basic steps
1: Initialize the Q-values table, Q(s,a).
2: Observe the current state, s.
3: loop {infinitely}
4:   Choose an action, a, for state s based on one of the action selection policies, such as ε-greedy.
5:   Execute the action, and observe the reward, r, as well as the new state, s'.
6:   Update the Q-value for the state using the observed reward and the maximum reward possible for the next state. The resulting update formula is:

     Q(s,a) = Q(s,a) + α [ r + γ max_{a'} Q(s',a') − Q(s,a) ]     (5)

7:   Set the state s to the new state s'.
Step 4 of the algorithm involves choosing an action for
a given state. Among the existing action selection policies,
ε-greedy is often the one selected in the literature. Most of the time (with probability 1 − ε), the action with the best expected reward will be executed (argmax_a Q(s,a)); a random action will be chosen with a low probability ε, in order to explore non-visited actions. Once the action is executed, the corresponding Q-value is updated (step 6) with the obtained reward r and the maximum reward possible for the next state, max_{a'} Q(s',a'). The parameter γ is the discount factor that adjusts the importance given to future rewards. The update formula also includes a parameter α that determines the learning rate. It can be the same for every state-action pair,
or can be adjusted based on the number of times each state
has been visited.
The policy learned by the agent is just the action that
maximizes the Q-value in each state. Watkins and Dayan
[104] proved that the discrete case of Q-learning converges
to an optimal policy under certain conditions, and this is in-
dependent of the initial values of the Q-table. If each pair (s,a) is visited an infinite number of times, then the lookup table converges to a unique set of values Q(s,a) = Q*(s,a), which defines a stationary deterministic optimal policy. Note that the auto-scaling process is a continuing task, not a stationary one. For this reason, the Q-learning policy has to be learned continuously and adapted to workload changes.
An algorithm very similar to Q-learning is SARSA [98].
In contrast to Q-learning, it does not use the maximum reward for the next state, max_{a'} Q(s',a'), to update the Q(s,a) value. Instead, the same action selection policy (see Step 4 in Algorithm 1) is applied to the new state s', in order to determine an action a'. Then, the corresponding Q(s',a') value is used to update the current Q(s,a), as shown in the following formula:

Q(s,a) = Q(s,a) + α [ r + γ Q(s',a') − Q(s,a) ]     (6)
The effect of a scaling decision takes some time to have
an impact on the application, and the reward comes after a
delay. Given that RL makes a decision with current informa-
tion (application state) about a future reward (e.g. response
time), a proactive nature is assumed for all RL approaches.
The method includes two phases of the MAPE process: an-
alyze and plan. First, information about the application and rewards is collected in a lookup table (or any other structure)
for later use (analyze phase). Then, in the planning phase,
this information is used to decide the best scaling action.
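A minimal tabular Q-learning agent for horizontal scaling could be sketched as follows (the state encoding, reward weights and action set are simplifying assumptions for illustration, not the formulation of any specific proposal reviewed below):

import random
from collections import defaultdict

ACTIONS = [-1, 0, +1]     # remove a VM, do nothing, add a VM

class QLearningScaler:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)    # lookup table Q(s, a)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        """Epsilon-greedy action selection (Step 4 of Algorithm 1)."""
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)                       # explore
        return max(ACTIONS, key=lambda a: self.q[(state, a)])   # exploit

    def update(self, state, action, reward, next_state):
        """Q-learning update: Q(s,a) += alpha*[r + gamma*max_a' Q(s',a') - Q(s,a)]."""
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        self.q[(state, action)] += self.alpha * (
            reward + self.gamma * best_next - self.q[(state, action)])

# Example state: (discretized request rate, current number of VMs). The reward
# trades off VM cost against an SLA penalty on response time (assumed weights).
def reward(n_vms, response_time, sla=0.5, vm_cost=0.1, penalty=1.0):
    return -vm_cost * n_vms - (penalty if response_time > sla else 0.0)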
5.2.2 Review of Proposals
Three basic elements have to be defined in order to apply RL
to auto-scaling: the action set A, the state space S, and the
reward function R. The first two highly depend on the type
of scaling, either horizontal or vertical, whereas the reward
function usually takes into account the cost of the acquired
resources (renting VMs, bandwidth, ...) and the penalty cost
for SLA violations. In case of horizontal scaling, the state is
mostly defined in terms of the input workload and the num-
ber of VMs. For example, Tesauro et al [99] propose using
(w, u_{t-1}, u_t), where w is the total number of user requests observed per time period, and u_t and u_{t-1} are the number of VMs allocated to the application in the current and the previous time step, respectively. Dutreilh et al [55] considered the (w, u, p) state definition, where p is the performance in terms of average response time to requests, bounded by a value P_max chosen from experimental observations. The set of actions for horizontal scaling is usually composed of three: add a new VM, remove an existing VM, and do nothing.
In contrast, the state definition for vertical scaling takes
into account the amount of resources assigned to each VM
(CPU and memory, mostly). In particular, Rao et al [92]
[105] considered the following state definition: (mem_1, time_1, vcpu_1, ..., mem_u, time_u, vcpu_u), where mem_i, time_i and vcpu_i are the i-th VM's memory size, scheduler credit and number of virtual CPUs, respectively. For each of the three vari-
ables, possible operations can be either increase, decrease or
no-operation (i.e. maintain the previous value). The actions
for the RL task are defined as the combinations of the op-
erations on each variable. Given that the variables are con-
tinuous, operations are discretized: mem is reconfigured in
blocks of 256MB; scheduler changes (time) in 256 credits;
and vcpu in 1 unit. The same authors propose in [93] a dis-
tributed approach, in which an RL agent is trained per VM. In this case, the state is configured as (CPU, bandwidth, memory, swap).
Even though Q-learning is the most widespread algorithm in auto-scaling, there are some exceptions: e.g. Tesauro et al [99] use the SARSA approach, explained before. Some of the articles referenced in this section ([92], [93], [45], [105]) do not specify the RL algorithm applied in the experiments, but the problem definition and update function resemble those of SARSA. Both Q-learning and SARSA present several problems that have been addressed in a number of ways [54], [55], [42], [99], [92]:
Bad initial performance and large training time. The per-
formance obtained during live on-line training may be
unacceptably poor, both initially and during a long train-
ing period, before a good enough solution is found. The
algorithm needs to explore the different states and ac-
tions.
Large state-space. This is often called the curse of di-
mensionality problem. The number of states grows expo-
nentially with the number of state variables, which leads
to scalability problems. In the simplest form, a lookup
table is used to store a separate value for every possible
state-action pair. As the size of such a table increases,
any access to the table will take a longer time, delaying
table updates and action selection.
Exploration actions and adaptability to environment chan-
ges. Even assuming that an optimal policy is found, the
environment conditions (e.g. workload pattern) may change
and the policy has to be re-adapted. For this purpose, the
RL algorithm executes a certain amount of exploration
actions, believed to be suboptimal. This could lead to
undesired bad performance, but it is essential in order
to adapt the current policy. Following this method, RL
algorithms are able to cope with relatively smooth changes in the behavior of the application, but not with sudden bursts in the input workload.
Table 2 Summary of the reviewed literature about reinforcement learning
Ref Auto-scaling Techniques H/V R/P Metric Monitoring SLA Workloads Experimental Platform
[54] Rules + RL H P Request rate, re-
sponse time, number
of VMs
Custom tool. 20 sec-
onds
Response time Synthetic. Made up of 5
sinusoidal oscillations
Custom testbed + RUBiS
[55] RL H P Number of user re-
quests, number of
VMs, response time
- Response time, cost Synthetic. With sinu-
soidal pattern
Custom testbed. Olio application +
Custom decision agent (VirtRL)
[42] RL H P Number of user re-
quests, number of
VMs, time
Simulated Response time, cost Synthetic. Based on
Poisson distribution
Custom simulator (Matlab)
[99] RL(+ANN) - SARSA +
Queuing model
H P Arrival rate, previ-
ous number of VMs
- Response time Synthetic. Poisson dis-
tribution (open-loop),
different number of
users and exponentially
distributed think times
(closed-loop)
Custom testbed (shared data cen-
ter). Trade3 application (a realistic
simulation of an electronic trading
platform)
[92] RL(+ANN) V P CPU and memory
usage
Custom tool Throughput, re-
sponse time
Synthetic: 3 workload
mixes (browsing, shop-
ping and ordering)
Custom testbed. Xen + 3 applica-
tions (TPC-C, TPC-W, SpecWeb)
[105] RL(+ANN) V P CPU and memory
usage
Custom tool Response time Synthetic: 3 workload
mixes (browsing, shop-
ping and ordering)
Custom testbed. Xen + 3 applica-
tions (TPC-C, TPC-W, SpecWeb)
[93] RL(+CMAC) V P CPU, I/O, memory,
swap
Custom tool
(scripts)
Response time,
throughput
Real. ClarkNet trace Custom testbed. Xen + 3 applica-
tions (TPC-C, TPC-W, SpecWeb)
[45] RL(+Simplex) V P CPU, memory - ap-
plication parameters
Custom tool Response time,
throughput
Synthetic. Different
number of clients
Custom testbed. Xen + 2 applica-
tions (TPC-C, TPC-W)
[108] CT - PID controller + RL
+ ARMAX model + SVM
regression
V P Application adap-
tive parameters
CPU and memory
- Application-related
benefit function
Synthetic Custom testbed. Xen + 2 real ap-
plications (Great Lake Forecasting
System, Volume Rendering)
The problem of the bad performance in the early steps
has been addressed in a number of ways. Its main cause is
the lack of initial information and the random nature of the
exploration policy. For this reason, Dutreilh et al [54] used a
custom heuristic to guide the state space exploration, and the
learning process lasted for 60 hours. The same authors [55]
further propose an initial approximation of the Q-function
that updates the value for all states at each iteration, and
also speeds up the convergence to the optimal policy. Re-
ducing this training time can also be addressed with a policy
that visits several states at each step [92] or using parallel
learning agents [42]. In the latter, each agent does not need
to visit every state and action; instead, it can learn the value
of non-visited states from neighboring agents. A radically
different approach to avoid the poor early performance of
on-line training consists of using an alternative model (e.g.
a queuing model) to control the system, whilst the RL model
is trained off-line on collected data [99].
Some authors have proposed methods to address the curse
of dimensionality issue, reducing the state space. Bu et al
[45] use a Simplex optimization that selects the promising
states that would return a high reward. A parallel approach
would also help coping with large state spaces. Barrett et al
[42] create an agent per VM, that keeps its own, small lookup
table. Using a lookup table for representing the Q-function
is not efficient, and other nonlinear function approximators
can be utilized, such as neural networks, CMACs (Cerebel-
lar Model Articulation Controllers), regression trees, sup-
port vector machines, wavelets and regression splines. For
example, neural networks [99], [92] take the state-action
pairs as input and output the approximated Q-value. They
are also able to predict the value for non-visited states, and
deal with continuous state spaces. Rao et al [93] combine
the parallel approach (an agent per VM) with a CMAC [37]
[93] to represent the Q function. The authors found that updates of the CMAC-based Q table only need 6.5 milliseconds, in comparison with the 50-second update time of a neural network.
It is also worth mentioning the usefulness of RL in other
tasks that can be tightly linked to the auto-scaling prob-
lem. For example, application parameter configuration [105]
[45]. Xu et al [105] use an ANN-based RL agent to tune pa-
rameters directly related to the application and VM perfor-
mance, such as MaxClients, Keepalive timeout, MinSpare-
Servers, MaxThreads and Session timeout (these are exam-
ples of tunable parameters from Tomcat or Apache applica-
tions).
RL can be used in combination with other methods such
as control theory. Following a radically different approach,
Zhu and Agrawal [108] combine a PID controller with an
RL agent in charge of estimating the derivative term (this
is further explained in Section 5.4). The controller guides
the parameter adaptation of applications (e.g. image size)
in order to meet the SLA. Then, virtual resources, CPU and
memory, are dynamically provisioned according to the change
in the adaptive parameters.
Before finishing this section, it is important to highlight the interesting capability of RL algorithms to capture the best management policy for a target scenario, without any
a-priori knowledge. The client does not need to define a par-
ticular model for the application; instead, it is learned online
and adapted if the conditions of the application, workload
or system change. In our opinion, RL techniques could be
a promising approach to solve the auto-scaling task of gen-
eral applications, but the field is not mature enough to satisfy
the requirements of a real production scenario. In this open
research field, efforts should be addressed towards provid-
ing an adequate adaptability to sudden bursts in the input
workload, and also to deal with continuous state spaces and
actions.
Table 2 shows a summary of the articles reviewed in this
section.
5.3 Queuing Theory
Classical queuing theory has been extensively used to model
Internet applications and traditional servers. Queuing theory
can be used for the analysis phase of the auto-scaling task
in elastic applications (see Section 3), i.e., estimating per-
formance metrics such as the queue length or the average
waiting time for requests. This section describes the main
characteristics of a queuing model and how they can be ap-
plied to scalable scenarios.
5.3.1 Description of the Technique
Queuing theory refers to the mathematical study of waiting lines, or queues. The basic structure of a model is depicted in Figure 1. Requests arrive at the system at a mean arrival rate λ, and are enqueued until they are processed. As the figure shows, one or more servers may be available in the model, which attend requests at a mean service rate µ.
Fig. 1 A simple queuing model with one server (left) and multiple servers (right)
Kendall’s notation is the standard system used to de-
scribe and classify queuing models. A queue is denoted by
A/B/C/K/N/D. This is the meaning of each element in the
notation:
A: Inter-arrival time distribution.
B: Service time distribution.
C: Number of servers.
K: System capacity or queue length. It refers to the maximum number of customers allowed in the system, including those in service. When the system is fully occupied, further arrivals are rejected.
N: Calling population. The size of the population from which the customers come. If the requests come from an infinite population of customers, the queuing model is open, whereas a closed model is based on a finite population of customers.
D: Service discipline or priority order, i.e., the order in which jobs in the queue are served. The most typical one is FIFO/FCFS (First In First Out / First Come First Served), in which the requests are served in the order they arrived. There are alternatives such as LIFO/LCFS (Last In First Out / Last Come First Served) and PS (Processor Sharing), among others.
Elements K, N and D are optional; if not present, it is assumed that K = ∞, N = ∞ and D = FIFO. The most typical values for both the inter-arrival time A and the service time B are M, D and G. M stands for Markovian and refers to a Poisson process, which is characterized by a parameter λ that indicates the number of arrivals (requests) per time unit; in this case, the inter-arrival or the service time follows an exponential distribution. D means deterministic or constant times. Another commonly used value is G, which corresponds to a general distribution with known parameters.
The elastic application scenario described in Section 2
can be formulated using a simple queuing model, consider-
ing a single queue representing the load balancer, that dis-
tributes the requests among n VMs (see Figure 1). In order to represent more complex systems such as multi-tier applications, a queuing network can be utilized. For example, each tier can be modeled as a queue with one or n servers.
Queuing theory is used to analyze systems with a sta-
tionary nature, characterized by constant arrival and service
rates. Its objective is to derive some performance metrics
based on the queuing model and some known parameters
(e.g. arrival rate λ). Examples of performance metrics are
the average time waiting in the queue, and the mean re-
sponse time. In case of scenarios with changing conditions
(i.e. non-constant arrival and services rates), such as our
target, scalable applications, the parameters of the queuing
model have to be periodically recalculated, and the metrics
recomputed.
There are two main ways to solve queuing models: an-
alytically and by means of simulation. The former can only
be used with simple models with well-defined distributions
for arrival and service processes, such as M/M/1 and G/G/1.
A well-known analytical way of solving queuing networks
is Mean Value Analysis (MVA) [83]. When analytical ap-
proaches are not feasible, that is, for relatively complex mod-
els, simulation can still be used to obtain the desired metrics.
M/M/1 is the simplest queuing (Poisson-based) model,
where both the arrival times and the service times follow ex-
ponential distributions. In this case, the mean response time R of an M/M/1 model can be calculated as R = 1/(µ − λ), given a service rate µ and an arrival rate λ (the response time R is the sum of the enqueued time and the service time). An-
other simple queuing model is G/G/1, in which the system’s
inter-arrival and service times are governed by general dis-
tributions with known mean and variance. The behavior of a
G/G/1 system can be captured using the following equation:
\[ \lambda \geq \left[ s + \frac{\sigma_a^2 + \sigma_b^2}{2(R - s)} \right]^{-1} \]
where R is the mean response time, s is the average service time for a request, and σ_a², σ_b² are the variances of the inter-arrival time and the service time, respectively.
In many queuing scenarios, it is useful to apply Little’s
Law, which states that the average number of customers (or
requests) E[C] in the system is equal to the average customer arrival rate λ multiplied by the average time each customer spends in the system E[T]: E[C] = λ × E[T].
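To make the use of these formulas concrete, the following Python sketch (not taken from any of the reviewed proposals) sizes a pool of identical VMs with the M/M/1 response-time formula above, assuming the load balancer splits the arrival rate evenly across VMs; the arrival rate, per-VM service rate and response-time target are hypothetical values.

```python
# Minimal sketch: size a VM pool with an M/M/1 model per VM.
# Assumption: the load balancer splits the arrival rate evenly,
# so each of the n VMs behaves as an independent M/M/1 queue.

def mm1_response_time(lam: float, mu: float) -> float:
    """Mean response time R = 1 / (mu - lambda) for a stable M/M/1 queue."""
    if lam >= mu:
        return float("inf")  # unstable queue: response time grows without bound
    return 1.0 / (mu - lam)

def vms_needed(lam: float, mu: float, r_sla: float, max_vms: int = 100) -> int:
    """Smallest number of VMs whose per-VM M/M/1 response time meets r_sla."""
    for n in range(1, max_vms + 1):
        if mm1_response_time(lam / n, mu) <= r_sla:
            return n
    return max_vms  # cap if the SLA cannot be met within the allowed pool size

if __name__ == "__main__":
    lam = 180.0   # monitored arrival rate (requests/s), hypothetical
    mu = 50.0     # service rate of one VM (requests/s), hypothetical
    r_sla = 0.05  # target mean response time (s), hypothetical
    n = vms_needed(lam, mu, r_sla)
    # Little's Law: expected number of requests in each VM's queue plus server
    in_system = (lam / n) * mm1_response_time(lam / n, mu)
    print(f"{n} VMs needed; about {in_system:.2f} requests per VM in the system")
```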
5.3.2 Review of Proposals
In the literature, both simple queuing models and more com-
plex queuing networks have been widely used to analyze the
performance of computer applications and systems.
Ali-Eldin et al [39] [40] model a cloud-hosted applica-
tion as a G/G/n queue, in which the number of servers n
is variable. The model can be solved to compute, for exam-
ple, the necessary resources required to process a given input
workload λ, or the mean response time for requests, given a
configuration of servers.
Queuing networks can also be used to model elastic ap-
plications, representing each VM (server) as a separate queue.
For example, Urgaonkar et al [100] use a network of G/G/1
queues (one per server). They use histograms to predict the
peak workload. Based on this value and the queuing model,
the number of servers per application tier is calculated. This
number can be corrected using reactive methods. The clear drawback is that provisioning for peak load leads to high under-utilization of resources.
Multi-tier applications can be studied using one or more
queues per tier. Zhang et al [107] considered a limited num-
ber of users, and thus, they used a closed system with a net-
work of queues; this model can be efficiently solved using
MVA. Han et al [62] modeled a multi-tier application as an
open system with a network of G/G/n queues (one per each
tier); this model is used to estimate the number of VMs that
need to be added to, or removed from, the bottleneck tier,
and the associated cost (VM usage is charged per minute,
instead of the typical per-hour cost model). Finally, Baci-
galupo et al [41] considered a queuing network of three tiers,
solving each tier to compute the mean response time.
The discussed approaches are used as part of the analy-
sis phase of the MAPE loop. Many different techniques can
be used to implement the planning phase, such as a predic-
tive controller [40] or an optimization algorithm (e.g. dis-
tributing servers among different applications, while maxi-
mizing the revenue [101]). The information required for a
queuing model, such as the input workload (number of re-
quests) or service time can be obtained by on-line monitor-
ing [62] [100] or estimated using different methods. For ex-
ample, Zhang et al [107] used a regression-based approxi-
mation in order to estimate the CPU demand, based on the
number and type (browsing, ordering or shopping) of client
requests.
Queuing models have been extensively used to model
applications and systems. They usually impose a fixed ar-
chitecture, and any change in structure or parameters re-
quire solving the model (with analytical or simulation-based
tools). For this reason, they are not cheap when used with
elastic (dynamically variable) applications that have to deal
with a changing input workload and a varying pool of re-
sources. Additionally, a queuing system is an analysis tool that requires additional components to implement a complete auto-scaler. Queuing models could be useful for some
particular cases of applications, e.g. when the relationship
between the number of requests and needed resources is
quite linear. Although there are efforts to model more com-
plex multi-tier applications, queuing theory might not be the
best option when trying to design a general-purpose auto-
scaling system.
Table 3 contains a summary of the articles reviewed in
this section.
5.4 Control Theory
Control theory has been applied to automate the manage-
ment of different information processing systems, such as
web server systems, storage systems and data centers/server
clusters. For cloud-hosted, elastic applications, a control sys-
tem may combine both phases of the auto-scaling task (anal-
ysis and planning).
5.4.1 Description of the Technique
The main objective of a controller is to automate the man-
agement (e.g. scaling task) of a target system (e.g. a cloud
application as defined in Section 2). The controller has to
maintain the value of a controlled variable y (e.g. CPU load) close to the desired level or set point y_ref, by adjusting the manipulated variable u (e.g. number of VMs). The manipu-
lated variable is the input to the target system, whereas the
controlled variable is measured by a sensor and considered
the output of the system.
There are three main types of control systems: open loop,
feedback and feed-forward. Open-loop controllers, also re-
ferred to as non-feedback, compute the input to the target
system using only the current state and its model of the sys-
tem. They do not use feedback to determine whether the sys-
tem output yhas achieved the desired goal yref . In contrast,
feedback controllers observe system output, and are able to
correct any deviation from the desired value (see Figure 2).
Feed-forward controllers try to anticipate errors in the
output. They predict, using a model, the behavior of the sys-
tem, and react before the error actually occurs. The predic-
tion may fail and, for this reason, feedback and feed-forward
controllers are usually combined.
Table 3 Summary of the reviewed literature about queuing theory
Ref | Auto-scaling Techniques | H/V | R/P | Metric | Monitoring | SLA | Workloads | Experimental Platform
[100] | QT + Histogram + Thresholds | H | R/P | Peak workload | Custom tool. 15 minutes | Response time | Synthetic and Real (World Cup 98) | Custom testbed. Xen + 2 applications (RUBiS and RUBBOS)
[101] | QT | H | R | Arrival rate, service time | Simulated | Response time | Real. E-commerce website (2001) + Synthetic traces | Custom simulator (Monte-Carlo)
[107] | QT + Regression (predict CPU load) | - | P | Number and type of transactions (requests), CPU load | Custom tool. 1 minute | - | Synthetic (browsing, ordering and shopping) | Custom simulator, based on C++Sim + Data collected from TPC-W
[62] | QT (model) + Reactive scaling | H | R | Arrival rate, service time | Custom tool. 1 minute | Response time, cost | Synthetic (browsing, ordering and shopping) | Custom testbed (called IC Cloud) + TPC-W benchmark
[41] | QT + Historical performance model | H | P | Arrival rate | - | Response time | Synthetic (browsing, buying) | Custom testbed. Eucalyptus + IBM Performance Benchmark Sample Trade
Fig. 2 Block diagram of a feedback control system: the controller computes the manipulated or input variable (u) of the scalable application (the system) from the control error (e), i.e. the difference between the target value or set point (y_ref) and the controlled or output variable (y) returned through the feedback path
From now on, focus will be put on feedback controllers,
as they are frequently used in the literature. They can be
classified into several categories [90]:
Fixed gain controllers: This class of controllers is very popular due to its simplicity. However, once the tuning parameters are selected, they remain fixed during the operation time of the controller. The most common controller in this class is the Proportional Integral Derivative (PID) controller,
with the following control algorithm:

\[ u_k = K_p e_k + K_i \sum_{j=1}^{k} e_j + K_d (e_k - e_{k-1}) \quad (7) \]

where u_k is the new value for the manipulated variable (e.g. the new number of VMs); e_k is the difference between the output y_k and the set point y_ref; and K_p, K_i and K_d are the proportional, integral and derivative gain parameters (respectively) that need to be adapted to a given target system. Different variants of the PID controller are used, such as the Proportional Integral (PI) controller or the Integral (I) controller. The latter can be represented as u_k = u_{k-1} + K_i e_k. (A minimal sketch of such a fixed-gain scaling controller is given after this list.)
Adaptive controllers: As the name suggests, adaptive con-
trol is able to adjust the parameter values on-line, in or-
der to adapt the controller to the changing conditions of
the environment. Examples of adaptive controllers are
self-tuning PID controllers, gain-scheduling and self-tuning
regulators. They are suitable for slowly varying work-
load conditions, but not for sudden bursts; in that case,
the on-line model estimation process may fail to capture
the dynamics of the system.
Model predictive controllers (MPC): MPCs follow a proac-
tive approach; the future behavior (output) of the system
is predicted, based on the model and the current output.
In order to maintain this output close to the target value,
the controller solves an optimization problem taking into
account a pre-defined cost function. An example of MPC
is the look-ahead controller [94].
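To illustrate the fixed-gain case referenced above, the following Python sketch implements a PI controller in incremental (velocity) form that maps the CPU-load error to a change in the number of VMs. It is only an illustration: the set point, the gains, the clamping bounds and the class name are hypothetical choices, not taken from any reviewed proposal.

```python
# Minimal sketch of a fixed-gain PI controller in incremental (velocity) form,
# assuming the controlled variable is average CPU load and the manipulated
# variable is the number of VMs. The set point and gains are hypothetical.

class PIScalingController:
    def __init__(self, set_point: float, kp: float, ki: float,
                 min_vms: int = 1, max_vms: int = 20):
        self.set_point = set_point      # desired average CPU load, e.g. 0.6
        self.kp, self.ki = kp, ki
        self.prev_error = 0.0
        self.min_vms, self.max_vms = min_vms, max_vms

    def step(self, measured_cpu: float, current_vms: int) -> int:
        """Return the number of VMs for the next control interval."""
        error = measured_cpu - self.set_point            # positive -> overloaded
        delta = self.kp * (error - self.prev_error) + self.ki * error
        self.prev_error = error
        new_vms = round(current_vms + delta)
        return max(self.min_vms, min(self.max_vms, new_vms))

controller = PIScalingController(set_point=0.6, kp=2.0, ki=5.0)
vms = 2
for cpu in [0.55, 0.75, 0.90, 0.85, 0.60, 0.40]:   # monitored CPU load samples
    vms = controller.step(cpu, vms)
    print(f"cpu={cpu:.2f} -> {vms} VMs")
```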
As explained before, the controller has to adjust the in-
put variable (e.g. number of VMs), in order to maintain the
desired value in the output variable (e.g. average CPU load
of 90%). For this purpose, a formal relationship between the
input and the output has to be modeled, so as to determine
how a change in the former affects the value of the output.
This formal relationship is denoted as transfer function in
classical control theory, state-space function in modern con-
trol theory, or simply as performance model. PID controllers
consider a simple linear equation, but there are many alter-
natives that consider non-linear approaches, and even sev-
eral input and output variables (yielding a Multiple-Input
Multiple-Output (MIMO) controller, instead being Single-
Input Single-Output (SISO)). In the literature, authors have
proposed different performance models of the system, in-
cluding these:
ARMA(X) [88]: An ARMA (auto-regressive moving aver-
age) model is able to capture the characteristics of a time
series and then makes predictions of future values. AR-
MAX (ARMA with eXogenous input) models the rela-
tionship between two time series. Both models are fur-
ther explained in Section 5.5.
Kalman filter [69]: It is a recursive algorithm for making
predictions based on time series.
Smoothing splines [43]: A method of smoothing (i.e. fitting a smooth curve to a set of noisy observations) using splines, piecewise polynomial functions defined by multiple subfunctions.
Kriging model or Gaussian Process Regression [58]: It ex-
tends traditional linear regression with a statistical frame-
work that is able to predict the value of the target func-
tion in un-sampled locations together with a confidence
measure.
Fuzzy model [106] [103] [74]: Fuzzy models are based on fuzzy rules, which build on the idea that the membership of an element in a set has a degree value in a con-
Table 4 Summary of the reviewed literature about control theory techniques
Ref | Auto-scaling Techniques | H/V | R/P | Metric | Monitoring | SLA | Workloads | Experimental Platform
[89] | CT: PI controller | V | R | Job progress | Sensor library | Job deadline | Batch jobs | Custom testbed. HyperV + 5 applications (ADCIRC, OpenLB, WRF, BLAST and Montage)
[75] | CT: PI controller (Proportional thresholding) + Exponential Smoothing for performance variable | H | R | CPU load, request rate | Hyperic HQ (Xen) | - | Synthetic. Different number of threads | Custom testbed. Xen + ORCA + simple web service
[76] | CT: PI controller (Proportional thresholding) | H | R | CPU load, request rate | Hyperic SIGAR. 10 seconds | - | Synthetic | Custom testbed. Xen + Modified CloudStone (using Hadoop Distributed File System)
[88] | CT: MIMO adaptive controller + ARMA (performance model) | V | P | CPU usage, disk I/O, response time | Xen + custom tool. 20 seconds | Response time | Synthetic and realistic (generated with MediSyn) | Custom testbed. Xen + 3 applications (RUBiS, TPC-W, media server)
[40] | CT: Adaptive controllers + QT | H | R/P | Number of requests, service rate | Simulated | Number of requests not handled | Real. World Cup 98 | Custom simulator in Python
[39] | CT: Adaptive, Proportional controller + QT | H | R/P | Number of requests, requests in buffer, service rate | Simulated. 1 minute | - | Real. World Cup 98 and Google Cluster Data | Custom simulator in Python
[43] | CT: Gain-scheduler (adaptive) + Smoothing splines (performance model) + Linear Regression | H | P | Number of requests, number of servers, response time | 20 seconds | Response time | Synthetic (Faban generator) | Real provider. Amazon EC2 + CloudStone benchmark
[58] | CT: Self-adaptive controller + Kriging model (performance model) | H | P | Number of incoming and enqueued requests, number of VMs | - | Execution time | Synthetic (Batch jobs) | Custom testbed. Private cloud + Sun Grid Engine (SGE)
[106] | CT: Fuzzy controller | V | R | Number and mixture of requests, CPU load | Custom tool. 20 seconds | Reply rate | Real (World Cup 98) and Synthetic (Httperf) | Custom testbed. VMware ESX Server + Java Pet Store
[103] | Fuzzy model | V | P | Number of queries, CPU load, disk I/O bandwidth | Xentop. 10 seconds | Response time, throughput | Synthetic and realistic (based on World Cup 98) | Custom testbed. Xen + 2 applications (RUBiS and TPC-H)
[74] | CT: Adaptive controller + Fuzzy model (+ ANN) | V | P | Number of requests, resource usage | Custom tool. 3 minutes | End-to-end delay | Synthetic (Pareto distribution) | Simulation
[69] | CT: Adaptive SISO and MIMO controllers + Kalman filter | V | P | CPU load | Custom tool. 5-10 seconds | Response time | Synthetic (Browsing and bidding mix) | Custom testbed. Xen + RUBiS application
tinuous interval between 0 and 1 (in contrast to Boolean
logic). Fuzzy models are further described in Subsection
5.4.2.
5.4.2 Review of Proposals
Fixed-gain controllers, including PID, PI and I, are the sim-
plest controller types, and have been widely used in the lit-
erature. For example, Lim et al [75] [76] use an I controller
to adjust the number of VMs based on average CPU us-
age, while Park and Humphrey [89] apply a PI controller
to manage the resources required by batch jobs, based on
their execution progress. Gain parameters K_p and K_i can
be set manually, based on trial-and-error [75] or using an
application-specific model. For example, in [89] a model is
constructed to estimate the progress of a job with respect
to the resources provisioned. Zhu and Agrawal [108] rely
on a RL agent in order to estimate the derivative term of
a PID controller. With the trial-and-error training, the RL
agent learns to minimize the sum of the squared error of the
control variables (the adaptive parameters) without violating
the time and budget constraints over time.
Adaptive control techniques are also rather popular. For
example, Ali-Eldin et al [40] propose combining two proac-
tive, adaptive controllers for scaling down, using dynamic
gain parameters based on input workload. A reactive ap-
proach is used for scaling up. The same authors propose
in [39] an adaptive, proportional controller, using a proac-
tive approach for both scaling up and down, and taking into
account the VM startup time. As stated before, adaptive con-
trol techniques rely on the use of performance models. Padala
et al [88] propose a MIMO adaptive controller that uses
a second-order ARMA to model the non-linear and time-
varying relationship between the resource allocation and its
normalized performance. The controller is able to adjust the
CPU and disk I/O usage. Bodík et al [43] combine smooth-
ing splines (used to map the workload and number of servers
to the application performance) with a gain-scheduling adap-
tive controller. Kalyvianaki et al [69] designed different SISO
and MIMO controllers to determine the CPU allocation of
VMs, relying on Kalman filters, whereas [58] utilized a Krig-
ing model to predict job completion time as a function of the
number of VMs, the number of incoming requests and the
jobs enqueued at the master node.
Fuzzy models have been used as a performance model in
control systems to relate the workload (input variable) and
the required resources (output variable). First, both input and
output variables of the system are mapped into fuzzy sets.
This mapping is defined by a membership function that de-
termines a value within the interval [0,1]. A fuzzy model
is based on a set of rules, that relate the input variables
(pre-condition of the rule), to the output variables (conse-
quence of the rule). The process of translating input val-
ues into one or more fuzzy sets is called fuzzification. De-
fuzzification is the inverse transformation which derives a
single numeric value that best represents the inferred fuzzy
values of the output variables. A control system that relies
on a rule-based fuzzy model is called a fuzzy controller.
Typically, the rule set and membership functions of a fuzzy
model are fixed at design time, and thus, the controller is
unable to adapt to a highly dynamic workload. An adaptive
approach can be used, in which the fuzzy model is repeat-
edly updated based on on-line monitored information [106],
[103]. Xu et al [106] applied an adaptive fuzzy controller to
the business-logic tier of a web application, and estimated
the required CPU load for the input workload. A similar
approach is followed by [103], but here authors focus on
the database tier. They claim that they use the fuzzy model
to predict the future resource needs; however, they use the
workload of the current time step t to calculate the resource needs r_{t+1} of the time step t+1, based on the assumption
that no sudden change happened within the duration of a
time step.
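As a rough illustration of the fuzzification, rule inference and defuzzification steps just described, the following Python sketch maps a normalized workload level to a VM adjustment using triangular membership functions and three hand-written rules. The breakpoints, rule consequents and function names are hypothetical and far simpler than the adaptive fuzzy models of [106] and [103].

```python
# Minimal sketch of a rule-based fuzzy scaling step (Mamdani-style, simplified).
# Assumption: workload is normalized to [0, 1]; rule consequents are VM deltas.

def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership degree of x in a triangular fuzzy set with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_scaling_decision(workload: float) -> int:
    # Fuzzification: membership in three fuzzy sets (hypothetical breakpoints).
    low = triangular(workload, -0.5, 0.0, 0.5)
    medium = triangular(workload, 0.2, 0.5, 0.8)
    high = triangular(workload, 0.5, 1.0, 1.5)
    # Rule base: low -> remove one VM; medium -> keep; high -> add two VMs.
    rules = [(low, -1.0), (medium, 0.0), (high, +2.0)]
    # Defuzzification: weighted average of the rule consequents.
    total_weight = sum(w for w, _ in rules)
    if total_weight == 0.0:
        return 0
    return round(sum(w * delta for w, delta in rules) / total_weight)

for load in [0.1, 0.5, 0.7, 0.95]:
    print(f"workload={load:.2f} -> change VMs by {fuzzy_scaling_decision(load)}")
```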
A further improvement is the neural fuzzy controller,
which uses a four-layer neural network (see Section 5.5) to
represent the fuzzy model. Each node in the first layer cor-
responds to one input variable. The second layer determines
the membership of each input variable to the fuzzy set (the
fuzzification process). Each node in layer 3 represents the
precondition part of one fuzzy logic rule. And finally, the out-
put layer acts as a defuzzifier, which converts fuzzy conclu-
sions from layer 3 into numeric output in terms of resource
adjustment. At early steps, the neural network only contains
the input and output layers. The membership and the rule
nodes are generated dynamically through the structure and
parameters learning. Lama and Zhou [74] relied on a neural
fuzzy controller, that is capable of self-constructing its struc-
ture (both the fuzzy rules and the membership functions) and
adapting its parameters through fast on-line learning (a re-
configuring controller type).
Finally, the last type of controllers that follow a proac-
tive approach are MPCs. Roy et al [94] combined an ARMA
model for workload forecasting, with the look-ahead con-
troller in order to optimize the resource allocation problem.
Fuzzy models can also be used to create a Fuzzy Model Pre-
dictive Controller [102].
The suitability of controllers for the auto-scaling task
highly depends on the type of controller and the dynamics
of the target system. The idea of having a controller that au-
tomates the process of adding/removing resources is very
appealing, but still, the problem is how to create a reliable
performance model that maps the input and output variables.
Simple reactive controllers could be used for applications with easy-to-predict needs. However, it seems advisable to focus efforts towards both adaptive controllers and MPCs, which are able
to adapt the application model and could be more suitable to
produce a general auto-scaling solution.
Table 4 shows a summary of the articles reviewed in this
section.
5.5 Time Series Analysis
Time series are used in many domains including finance, en-
gineering, economics and bioinformatics, generally to repre-
sent the change of a measurement over time. A time series is
a sequence of data points, measured typically at successive
time instants spaced at uniform time intervals. An example
is the number of requests that reaches an application, taken
at one-minute intervals. The time series analysis can be used
in the analysis phase of the auto-scaling process, in order to
find repeating patterns in the input workload or to try to fore-
cast future values.
5.5.1 Description of the Technique
Given the scenario described in Section 2, a certain perfor-
mance metric, such as average CPU load or the input work-
load, will be sampled periodically, at fixed intervals (e.g.
each minute). The result will be a time series X containing a sequence of w observations (w is the time series length):

\[ X = x_t, x_{t-1}, x_{t-2}, \ldots, x_{t-w+1} \quad (8) \]
Time series techniques can be applied in order to predict
future values of the metric, e.g. the future workload or re-
source usage. Based on this predicted value, a suitable auto-
scaling action can be planned, using for example a set of
predefined rules [71], or solving an optimization problem
for the resource allocation [94].
Formally, the objective of time series analysis is to fore-
cast future values of the time series, based on the last q
observations, denoted as the input window or history window (where q ≤ w). The future value x̂_{t+r} is r intervals ahead of the input window. Time series analysis techniques
can be classified into two broad groups: some of them focus
on the direct prediction of future values, whereas other tech-
niques try to identify the pattern (if present) followed by the
time series, and then extrapolate it to predict future values.
The first group includes moving average, auto-regression,
ARMA (combining both), exponential smoothing and dif-
ferent approaches based on machine learning:
Moving average methods (MA): They can be used to smooth
a time series in order to remove noise or to make predic-
tions. The forecast value x̂_{t+r} is calculated as the weighted average of the last q consecutive values. Typically, the prediction interval r is set to 1. Then, the general formula is as follows: x̂_{t+r} = a_1 x_t + a_2 x_{t-1} + ... + a_q x_{t-q+1}, where a_1, a_2, ..., a_q are a set of positive weighting factors that must sum to 1. The simple moving average MA(q) considers the arithmetic mean of the last q values, i.e., it assigns equal weight 1/q to all observations. In contrast,
the weighted moving average WMA(q) assigns a differ-
ent weight to each observation. Typically, more weight
is given to the most recent terms in the time series, and
less weight to older data.
Exponential smoothing (ES): Similarly to moving average, it calculates the weighted average of past observations, but exponential smoothing takes into account all the past history of the time series (w observations). It assigns exponentially decreasing weights over time. A new parameter is introduced, a smoothing factor α that weakens the influence of past data. There are different versions of exponential smoothing such as simple or single ES, and Brown's double ES [44]. In simple ES, the following formula is used recursively to calculate the current smoothed value s_t, based on the current observation x_t and the previous smoothed value s_{t-1}: s_t = α x_t + (1 − α) s_{t-1}. The forecast for the next period x̂_{t+1} is simply the current smoothed value s_t. The predictor formula for simple exponential smoothing can be expanded as follows:

\[
\begin{aligned}
\hat{x}_{t+1} &= \alpha x_t + (1-\alpha)\hat{x}_t \\
&= \alpha x_t + (1-\alpha)\left[\alpha x_{t-1} + (1-\alpha)\hat{x}_{t-1}\right] \\
&= \alpha x_t + \alpha(1-\alpha)x_{t-1} + (1-\alpha)^2\left[\alpha x_{t-2} + (1-\alpha)\hat{x}_{t-2}\right] \\
&\;\;\vdots \\
&= \alpha x_t + \alpha(1-\alpha)x_{t-1} + \alpha(1-\alpha)^2 x_{t-2} + \ldots + (1-\alpha)^{w-1}\hat{x}_{t-w+1}
\end{aligned}
\quad (9)
\]

where x̂_{t+1} represents the forecast value for the period t+1, based on the actual x_t value and the previous values of the time series; x̂_t refers to the forecast made for period t. The first smoothed value x̂_{t-w+1} can be set to the initial value of the time series x_{t-w+1}, or to the mean of the first few observations.
Simple ES is suitable for time series that have no sig-
nificant trend changes. For time series with an existing
linear trend, the Brown’s double ES should be applied.
It calculates two smoothing series:
\[
\begin{aligned}
s^{1}_t &= \alpha x_t + (1-\alpha)\, s^{1}_{t-1} \\
s^{2}_t &= \alpha s^{1}_t + (1-\alpha)\, s^{2}_{t-1}
\end{aligned}
\quad (10)
\]

The second smoothed series s²_t is obtained by applying simple ES to the series s¹_t. Both smoothed values s¹_t and s²_t are used to estimate the level c_t and trend d_t of the time series at time t. Based on these, the forecast value x̂_{t+r} is calculated as follows:

\[
\begin{aligned}
\hat{x}_{t+r} &= c_t + r\, d_t \\
c_t &= 2 s^{1}_t - s^{2}_t \\
d_t &= \frac{\alpha}{1-\alpha}\left( s^{1}_t - s^{2}_t \right)
\end{aligned}
\quad (11)
\]

(A small numerical sketch of ES-based forecasting is given after this list.)
Auto-regression of order p, AR(p): The prediction formula is determined as the linear weighted sum of the p previous terms in the series: x̂_{t+1} = b_1 x_t + b_2 x_{t-1} + ... + b_p x_{t-p+1} + ε_t. Parameter p corresponds to the number of terms in the AR equation, which may be different from the history window length w. The formula may include a white noise term ε_t. The key is to derive the best values for the weights or auto-regression coefficients b_1, b_2, ..., b_p. There are a variety of techniques for computing AR coefficients, such as least squares or the maximum likelihood method. In the literature, the most common method is based on the calculation of auto-correlation coefficients and the Yule-Walker equations. The auto-correlation function (ACF) of a time series gives correlations between x_t and x_{t-k} for lag k = 1, 2, 3, ...:

\[ r_k = \frac{\mathrm{covariance}(x_t, x_{t-k})}{\mathrm{var}(x_t)} = \frac{E[(x_t - \mu)(x_{t-k} - \mu)]}{\mathrm{var}(x_t)} \quad (12) \]
The autocorrelation can be estimated as:

\[ r_k = \frac{1}{(w-k)\,\sigma^2} \sum_{t=1}^{w-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \quad (13) \]

The full autocorrelation function can be derived by recursively calculating r_k = Σ_{i=1}^{p} b_i r_{k-i}. The result is a set of linear equations called the Yule-Walker equations, which can be represented in matrix form as:

\[
\begin{pmatrix}
1 & r_1 & r_2 & r_3 & \ldots & r_{p-1} \\
r_1 & 1 & r_1 & r_2 & \ldots & r_{p-2} \\
r_2 & r_1 & 1 & r_1 & \ldots & r_{p-3} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
r_{p-1} & r_{p-2} & r_{p-3} & r_{p-4} & \ldots & 1
\end{pmatrix}
\begin{pmatrix}
b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_p
\end{pmatrix}
=
\begin{pmatrix}
r_1 \\ r_2 \\ r_3 \\ \vdots \\ r_p
\end{pmatrix}
\quad (14)
\]

By solving this set of equations, the auto-regression coefficients b_1, b_2, ..., b_p can be derived for any value of p. For example, for AR(1) (with p = 1), the auto-regression coefficient b_1 is equal to the corresponding autocorrelation coefficient r_1 (b_1 = r_1).
Auto-Regressive Moving Average, ARMA(p,q): This model combines both auto-regression (of order p) and moving average (of order q). As stated before, AR takes into account the last p observations of the time series x_t, x_{t-1}, x_{t-2}, ..., x_{t-p}. The MA model, which is different from the MA method described previously, is the sum of the time series mean µ plus the innovations or white error terms ε_t, ε_{t-1}, ..., ε_{t-q}:

\[ x_t = \mu + \varepsilon_t + a_1 \varepsilon_{t-1} + a_2 \varepsilon_{t-2} + \ldots + a_q \varepsilon_{t-q} \quad (15) \]

where ε_t ∼ N(0, σ²). Then, the ARMA model is represented as:

\[ x_t = b_1 x_{t-1} + \ldots + b_p x_{t-p} + \varepsilon_t + a_1 \varepsilon_{t-1} + \ldots + a_q \varepsilon_{t-q} \quad (16) \]

Note that an ARMA(0,q) is a pure MA model, and an ARMA(p,0) corresponds to an AR model. There are several techniques for selecting the appropriate values for the orders p and q, and for estimating the coefficients a_1, a_2, ..., a_q and b_1, b_2, ..., b_p.
ARMA is a suitable model for stationary processes, i.e. the mean and variance of the time series remain constant over time. Thus, the time series must not show any trend (variations of the mean) or seasonal variations. If the AR model correctly fits the time series, the residual ε is white noise that shows no pattern. An extension of the ARMA model, called ARIMA (Auto-Regressive Integrated Moving Average), can be applied to non-stationary time series. Another extension, called ARMAX(p,q,b) or ARMA with eXogenous inputs, is able to capture the relationship between a given time series X and another external time series D. It contains the AR(p) and MA(q) models of X and a linear combination of the last b terms of the time series D.
Machine Learning-based techniques: These are two popu-
lar techniques used to carry out analysis of time series
that can be considered part of the broader “machine learn-
ing” field:
Regression is a statistical method used to determine the
polynomial function that is the closest to a set of
points (in this case, the w values of the history win-
dow). Linear regression refers to the particular case
of a polynomial of order 1. The objective is to find
a polynomial such that the distance from each of the
points to the polynomial curve is as small as possible
and therefore fits the data the best. When the number
of input variables is more than one, it is referred to as Multiple Linear Regression. The Linear Regres-
sion equation is the same used in AR, but the weight
estimation method differs.
Neural networks consist of an interconnected group of
artificial neurons, arranged in several layers: an in-
put layer with several input neurons; an output layer
with one or more output neurons; and one or more
hidden layers in between. For time series analysis,
the input layer contains one neuron for each value in
the history window, and one neuron for the predicted
value in the output layer. During the training phase, the network is initialized with random weights and fed with input vectors; the weights are then adapted, at a learning rate ρ, until the given inputs produce the desired outputs.
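The sketch announced above illustrates, in plain Python, simple ES, Brown's double ES and a one-step AR(1) forecast (exploiting the fact that b_1 = r_1). The smoothing factor and the sample series are hypothetical; a real auto-scaler would feed these functions with monitored metrics.

```python
# Minimal sketches of the forecasting methods described above, assuming a
# plain Python list of observations (oldest first). Alpha and the sample
# series are hypothetical values chosen only for illustration.

def simple_es_forecast(series, alpha=0.5):
    """Simple exponential smoothing: the forecast is the last smoothed value."""
    smoothed = series[0]
    for x in series[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return smoothed

def double_es_forecast(series, alpha=0.5, r=1):
    """Brown's double ES: forecast r steps ahead using level c_t and trend d_t."""
    s1 = s2 = series[0]
    for x in series[1:]:
        s1 = alpha * x + (1 - alpha) * s1    # first smoothing series
        s2 = alpha * s1 + (1 - alpha) * s2   # second smoothing series
    level = 2 * s1 - s2
    trend = alpha / (1 - alpha) * (s1 - s2)
    return level + r * trend

def ar1_forecast(series):
    """AR(1) forecast: b1 equals the lag-1 autocorrelation coefficient r1."""
    w = len(series)
    mean = sum(series) / w
    var = sum((x - mean) ** 2 for x in series) / w
    r1 = sum((series[t] - mean) * (series[t + 1] - mean)
             for t in range(w - 1)) / ((w - 1) * var)
    return mean + r1 * (series[-1] - mean)

requests_per_min = [100, 110, 125, 140, 160, 185]   # hypothetical workload
print(simple_es_forecast(requests_per_min))
print(double_es_forecast(requests_per_min))
print(ar1_forecast(requests_per_min))
```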
As previously stated, a group of time series analysis tech-
niques try to identify the pattern that the series follows, and
then use this pattern to extrapolate future values. Time series
patterns can be described in terms of four classes of compo-
nents: trend, seasonality, cyclical and randomness. The gen-
eral trend (e.g. increasing or decreasing pattern), together
with the seasonal variations that appear repeated over a spe-
cific period (e.g. day, week, month, or season), are the most
common components in a time series. Input workloads of
cloud applications may show different periodic components.
The trend identifies the overall slope of the workload, whereas
seasonality and cyclical determine the peaks at specific points
of time in a short term and in a long term basis, respectively.
A wide diversity of methods can be used to find repeti-
tive patterns in time series, including:
Pattern matching: It searches the history of the time series for patterns that are similar to the present one. It is very close to the string matching problem, for which several efficient algorithms are available (e.g. Knuth-Morris-Pratt [53]).
Signal processing techniques: Fast Fourier Transform (FFT)
is a technique that decomposes a signal time series into
components of different frequencies. The dominant fre-
quencies (if any) will correspond to the repeating pattern
in the time series.
Auto-correlation: In auto-correlation, the input time series
is repeatedly shifted (up to half the total window length),
and the correlation is calculated between the shifted time
series and the original one. If the correlation is higher
than a given threshold (e.g. 0.9) after s shifts, a repeating pattern is declared, with a duration of s steps (a small sketch of this procedure is given after this list).
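The auto-correlation procedure sketched below follows the description above; the 0.9 threshold matches the example in the text, while the helper name, the half-window search limit and the sample trace are illustrative assumptions.

```python
# Minimal sketch: detect a repeating pattern by shifting the series and
# computing the correlation against the original, up to half the window length.

def detect_period(series, threshold=0.9):
    """Return the smallest shift s whose autocorrelation exceeds the threshold."""
    w = len(series)
    mean = sum(series) / w
    var = sum((x - mean) ** 2 for x in series) / w
    for s in range(1, w // 2 + 1):
        pairs = zip(series[:-s], series[s:])       # original vs. shifted series
        corr = sum((a - mean) * (b - mean) for a, b in pairs) / ((w - s) * var)
        if corr >= threshold:
            return s        # repeating pattern with a period of s steps
    return None             # no repeating pattern found

workload = [10, 40, 80, 40, 10, 41, 79, 42, 11, 39, 81, 40]  # hypothetical trace
print(detect_period(workload))   # expected to report a period of 4 steps
```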
A basic tool for time series representation is the his-
togram. It involves distributing the values of the time se-
ries into several equal-width bins, and representing the fre-
quency for each bin. It has been used in the literature to rep-
resent the resource usage pattern or distribution, and then
predict future values.
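A histogram-based predictor of this kind can be approximated in a few lines; taking the centre of the most frequent bin mirrors one of the options reviewed later (see [60]), and the bin count of 10 is an arbitrary assumption.

```python
# Minimal sketch: predict the next resource-usage value as the centre of the
# most frequent histogram bin (bin count chosen arbitrarily).
import numpy as np

def histogram_prediction(samples, bins=10):
    counts, edges = np.histogram(samples, bins=bins)
    i = int(np.argmax(counts))               # index of the most frequent bin
    return (edges[i] + edges[i + 1]) / 2.0   # centre of that bin

cpu_usage = np.random.default_rng(0).normal(loc=55.0, scale=8.0, size=200)
print(histogram_prediction(cpu_usage))       # a value close to the usage mode
```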
5.5.2 Review of Proposals
In the context of elastic applications, time series analysis has been applied mostly to predict workload or resource usage. A simple moving average could be used for this pur-
pose, but with poor results [60]. For this reason, authors have
applied MA only to remove noise from the time series [89],
[75], or just to have a comparison yardstick. For example,
Huang et al [65] present a resource prediction model (for
CPU and memory utilization) based on double exponential
smoothing, and compare it with simple mean and weighted
moving average (WMA). ES clearly obtained better results,
because it takes into account the history records w (not only
the input window q) of the time series for the prediction. Mi
et al [85] also used Brown’s double ES in order to forecast
input workload of real traces (World Cup 98 and ClarkNet),
and obtained good accuracy results, with a small amount of
error (a mean relative error of 0.064 for the best case).
The auto-regression method has also been used for re-
source or workload forecasting ([73], [50], [60], [49], [71],
Table 5 Summary of the reviewed literature about techniques based on time series analysis
Ref | Auto-scaling Techniques | H/V | R/P | Metric | Monitoring | SLA | Workloads | Experimental Platform
[47] | TS: Pattern matching | H | P | Total number of CPUs | 100 seconds | Number of serviced requests, cost | Real cloud workloads: from Animoto and 7 IBM cloud applications | Analytical models
[60] | TS: FFT and Discrete Markov Chains. Compared with auto-regression, auto-correlation, histogram, max and min | V | P | CPU load | Libxenstat library. 1 minute | Response time | Real. World Cup 98 and ClarkNet. Also synthetic trace | Custom testbed. Xen + RUBiS + part of Google Cluster Data trace for CPU usage
[95] | TS: FFT and Discrete-time Markov Chain | V | P/R | CPU load, memory usage | Libxenstat library. 1 second | Response time, job progress | Synthetic. Based on World Cup 98 and EPA web server | Custom testbed. Xen + 3 applications (RUBiS, Hadoop MapReduce, IBM System S)
[57] | TS: ARMA | Both | P | Number of requests, CPU load | - | Prediction accuracy | Real traces. Collected from real applications and a real data center | Custom testbed. Xen and KVM
[65] | TS: Brown's double ES. Compared with WMA | - | P | CPU load, memory usage | Simulated | Prediction accuracy | Synthetic traffic generated with simulator | CloudSim simulator
[85] | TS: Brown's double ES | - | P | Number of requests per VM | 10 minutes | - | Real. World Cup 98 and ClarkNet. Synthetic. Poisson distribution | Custom testbed. TPC-W
[50] | TS: AR | H | P | Login rate, number of active connections, CPU load | Simulated | Service not available (login), energy consumption | Real. From Windows Live Messenger (login rate, number of active connections) | Simulator
[71] | TS: AR + Threshold-based rules | Both | P | Number of requests | Zabbix | - | Synthetic | Hybrid: Amazon EC2 + Custom testbed (Xen + Eucalyptus + PhpCollab application)
[94] | TS: AR | H | P | Number of users in the system | - | Response time, VM cost, application reconfiguration cost | Real. World Cup 98 | No experimentation on systems
[67] | TS: ML Neural Network and (Multiple) LR + Sliding window | H | P | CPU load (aggregated value for all VMs) | Amazon CloudWatch. 1 minute | Prediction accuracy | Synthetic. TPC-W generator, constant growing | Real provider. Amazon EC2 and TPC-W application to generate the dataset. Prediction models are only evaluated using cross-validation and several accuracy metrics. Experiments in R-Project
[91] | TS: ML Neural Network (compared to MA, last value and simple ES) | H | P | Number of entities (players) | 2 minutes | Prediction accuracy | Synthetic. Entities (players) with different behaviors | Simulator of a MMORPG game
[56] | TS: Polynomial regression | Both | P | Number of requests | Custom tool. 1 minute | Response time, cost for VM, application license and reconfiguration | Real traces. From a production data center of a company | Custom testbed. KVM + Olio
[66] | Threshold-based rules (scale out) + TS, polynomial regression (scale in) | H | R/P | CPU load (scale out), number of requests, number of VMs (scale in) | 1 minute | Response time | Synthetic. Httperf | Custom testbed. Eucalyptus + RUBiS
[49] | TS: AR(1) and Histogram + QT | H | P | Request rate and service demand | Simulated. 1, 5, 10 and 20 minutes | Response time | Synthetic (Poisson distribution) and Real (World Cup 98) | Custom simulator + algorithms in Matlab
[94]). For example, Roy et al [94] applied AR for workload
prediction, based on the last three observations. The pre-
dicted value is then used to estimate the response time. An
optimization controller takes this response time as an input
and computes the best resource allocation, taking into ac-
count the costs of SLA violations, leasing resources and re-
configuration. Kupferman et al [73] applied auto-regression
of order 1 to predict the request rate (requests per second)
and found that its performance depends largely on several
manager-defined parameters: the monitoring-interval length,
the size of the history window and the size of the adaptation
window. The history window determines the sensitivity of
the algorithm to short-term versus long-term trends, while
the size of the adaptation window determines how far into
the future the model extends.
ARMA models are able to capture characteristics of a
time series such as the input workload or the CPU usage.
Fang et al [57] found it useful to predict the future CPU us-
age of VMs. However, they remark on the computational cost of this technique, which includes the choice of p and q, the
estimation of the coefficients of each term and other param-
eters.
The history window values can also be the input for
a neural network [67] [91] or a multiple linear regression
equation [73], [43], [67]. The accuracy of both methods de-
pends on the input window size. Indeed, Islam et al [67]
obtained better results when using more than one past value
for prediction. Kupferman et al [73] further investigated the
topic and found that it is necessary to balance the size of
each sample in the window, to avoid overreacting, but also to
maintain a correct level of sensitivity to workload changes.
They propose regressing over windows of different sizes,
and then using the mean of all predictions. Another impor-
tant issue is the choice of the prediction interval r. Islam et al
[67] propose using a 12-minute interval, because the setup
time of VM instances in the cloud is typically around 5-15
min. In another context, Prodan and Nae [91] use a neural
network to predict a game load (i.e. the number of entities
or players) in the next two minutes. The neural network ob-
tained better accuracy than moving average and simple ex-
ponential smoothing.
Most time series analysis techniques have been applied
to vertical or horizontal scaling separately. Dutta et al [56]
claim that vertical scaling has limited range but has lower re-
source and configuration costs, while horizontal scaling can
allow the application to achieve a much larger throughput
but at a potentially higher cost. For this reason, they com-
bine both VM resizing in case of small increments in the re-
quest rate, and apply horizontal scaling for major changes in
the input workload. They use a polynomial regression to es-
timate the expected number of requests for the next interval.
Fang et al [57] focus on vertical scaling (CPU and memory)
for regular changes in workload, whereas horizontal scaling
is applied in order to handle sudden spikes and flash crowds.
Time series forecasting (associated to proactive decision
making) can be combined with reactive techniques. For ex-
ample, Iqbal et al [66] proposed a hybrid scaling technique
that utilizes reactive rules for scaling up (based on CPU us-
age) and a regression-based approach for scaling down. Af-
ter a fixed number of intervals in which response time is
satisfied, they calculate the required number of application-
tier and database-tier instances using polynomial regression
(of degree two).
Some auto-scaling proposals use time series analysis tech-
niques that deal with pattern identification, applied to the in-
put workload [46], [47], [60], [95]. The most complete com-
parison of this class of techniques is done by Gong et al [60];
they propose using FFT to identify repeating patterns in re-
source usage (CPU, memory, I/O and network), and com-
pare it with auto-correlation, auto-regression and histogram.
Pattern matching, proposed by Caron et al [46] [47], has two
main drawbacks: the large number of parameters in the al-
gorithm (such as the maximum number of matches or the
length of the predicted sequence), that highly affect the per-
formance of the algorithm, and the time required to explore
the past history trace.
Simple histograms have also been used by some authors
to predict the resource usage of applications, considering the
mean of the distribution [49], or the mean of the bin with the
highest frequency [60].
Time series analysis techniques are very appealing for
implementing auto-scalers, as they are able to predict future
demands arriving to elastic applications. Having this infor-
mation, it is possible to provide resources in advance and
deal with the time required to start up new VMs or add re-
sources to a particular instance. Despite the potential of this
set of techniques, their main drawback lies in the prediction accuracy, which highly depends on the target application,
input workload pattern and/or burstiness, the selected met-
ric, the history window and prediction interval, as well as on
the specific technique being used. Efforts should be focused
on automating the selection of the best prediction technique
for a particular application or application class.
Table 5 contains a summary of the articles reviewed in
this section.
6 Conclusions and Future Work
The focus of this review has been on auto-scaling elastic
applications in cloud environments. While capacity plan-
ning is required for any scalable application, the elastic-
ity provided by cloud infrastructures allows for an almost-
immediate adaptation of resources to application needs (driven
by an external demand). Auto-scalers try to automate this
adaptation, minimizing resource-related costs while allow-
ing the application to comply with the SLA.
In Section 3, the auto-scaling task was defined as a MAPE
process, composed of four phases: monitor, analyze, plan
and execute. Given the extensive literature about auto-scaling
for elastic applications, a classification criterion has been
proposed to organize the different proposals into five main
categories: threshold-based rules, reinforcement learning, queu-
ing theory, control theory and time series analysis. Each of
these categories has been described separately, including their
pros/cons, together with a critical review of relevant articles
using the technique that defines the category.
One of the conclusions that can be extracted from this
survey is that reactive auto-scaling systems might not be
able to cope with abrupt changes in the input workload, spe-
cially in the case of sudden traffic surges. One of the rea-
sons is the time required to acquire and set-up new resources
(e.g. the boot-up time of a new instance). It seems advis-
able to redirect research efforts towards developing proac-
tive auto-scaling systems able to predict future needs and ac-
quire the corresponding resources with enough anticipation,
maybe using some reactive methods to correct possible pre-
diction errors. As an example, Moore et al [86] demonstrate
that, compared to a purely reactive controller, their auto-
scaling system (based on both reactive and proactive mod-
els) was able to make better provisioning decisions, yielding
few QoS violations. Additionally, attention should be paid
to methods to reduce the time necessary to provision new
VMs, including those that take advantage of vertical scaling
actions, which usually need only seconds to be fulfilled.
In our opinion, auto-scaling systems should take advan-
tage of the prediction capabilities of time series analysis
techniques, together with the automation capabilities of con-
trollers. It is our plan to propose a predictive auto-scaling
technique based on time series forecasting algorithms. The
review of the literature has shown that the accuracy of these
algorithms depends on different parameters such as the sizes
of the history window and the adaptation window. Optimiza-
tion techniques could be used to adjust these values, in order
to customize the parameters for a given scenario.
When going through this review, the reader should have
noticed the large diversity of testing methodologies that au-
thors have used to assess their proposals (many of them are
described in the companion Appendix). Authors use differ-
ent ways of generating input workload, different types of ap-
plications (realistic or simulated), different SLA definitions,
different execution platforms, etc. In fact, the lack of a com-
mon testing platform capable of generating a well-defined
set of standardized metrics is the reason that prevented a
comparative analysis of the reviewed techniques in quan-
titative terms. We are currently working in the development
of a simulation-based workbench that could provide a com-
mon testing platform. It has been used already to implement
and compare several auto-scaling techniques [77], although
this work is still preliminary. In the same line, it would be of
interest to define a commonly accepted scoring metric to ob-
jectively determine whether one auto-scaling algorithm per-
forms better than another in a particular scenario.
Throughout this review it has been assumed that a client
runs an elastic application on a single and homogeneous
cloud infrastructure. However, clients may need to deploy
their applications on hybrid or federated clouds. Hybrid clouds
combine resources from both a private and a public cloud,
while federated systems comprise different public providers.
The combination of different cloud platforms represents ad-
ditional challenges for the auto-scaling task. For example,
the monitoring information has to be gathered from different
sources, probably using different tools with their own APIs.
The list of performance metrics available and the granular-
ity level will depend on the provider itself. In the analysis
phase, an extra effort will be necessary to select the suitable
metric from each provider, maybe requiring mapping func-
tions to combine different metrics. After making a scaling
decision, a variety of APIs may be available to implement
it in different platforms, and the options and their conse-
quent effects may vary greatly. Examples of these platform-
dependent particularities are the availability of horizontal or
vertical scaling (or any of them), the capabilities of the dif-
ferent VM templates and the billing schemes (per hour, per
minute). Different efforts have been carried out towards sim-
plifying the use of federated/hybrid models [70] [84], or the
interoperability among different cloud platforms, including
the proposal of open standards, such as Open Virtualization
Format (OVF), Open Cloud Computing Interface (OCCI)
and Cloud Data Management Interface (CDMI). However,
orchestrating the auto-scaling capabilities of different cloud
providers is still an open challenge.
Finally, the problem of auto-scaling is closely related
to an infrastructure-related one: VM placement, the actual
mapping of the VMs forming an application onto the phys-
ical servers of the cloud provider. We are currently working
on ways to optimize the placement of fixed-size applications
in order to maximize the revenue for the cloud provider,
while satisfying the resource SLA. Auto-scaling capability
of (elastic) applications imposes an additional challenge to
the provider, as the set of VMs of an application varies with
time.
Acknowledgements This work has been partially supported by the
Saiotek and Research Groups 2007-2012 (IT-242-07) programs (Basque
Government), TIN2010-14931 and COMBIOMED network in compu-
tational biomedicine (Carlos III Health Institute). Dr. Miguel-Alonso is a member of the HIPEAC European Network of Excellence. Mrs Lorido-Botrán is supported by a doctoral grant from the Basque Government.
Finally, we would like to thank the anonymous reviewers for their sug-
gestions and comments that greatly contributed to improve this survey.
A Performance Evaluation in the Cloud: Experimental
Platforms, Application Benchmarks and Workloads
The auto-scaling techniques proposed in the literature have been tested
under very different conditions, which makes it impossible to come up
with a fair comparative assessment. Researchers in the field have built
their own evaluation platforms, suitable to their own needs. These eval-
uation platforms can be classified into simulators, custom testbeds and
public cloud providers. Except for simple simulators, a scalable appli-
cation benchmark has to be executed on the platform to carry out an
evaluation. The input workload to the application can be either syn-
thetic, generated with specific programs, or obtained from real users.
A.1 Experimental Platforms
Experimentation could be done in production infrastructures, either
from real cloud providers or in a private cloud. The major advantage of
using a real cloud platform is that proposals can be checked in actual
scenarios, thus proving a proof of suitability. However, it has a clear
drawback: for each experimentation, the whole scenario needs to be
set. In case of using a public cloud provider, the infrastructure is al-
ready given, but the researcher still needs to configure the monitoring
and auto-scaling system, deploy an application benchmark and a load
generator over a pool of VMs, figure out how to extract the informa-
tion and store it for later processing. Probably, each execution will be
charged according to the fees established by the cloud provider.
In order to reduce the experimentation cost and to have a more
controlled environment, a custom testbed could be set up. This has a
cost in terms of system configuration effort (in addition to buying the
hardware, if not available). An initial, non-trivial step consists of in-
stalling the virtualization software that will manage the VM creation,
support scaling and so on. Virtualization can be applied at the server
level, OS level or at the application level. For custom testbeds, a server
level virtualization environment is needed, commonly referred to as
Hypervisor or Virtual Machine Monitor (VMM). Some popular hyper-
visors are Xen [36], VMWare ESXi [32] and Kernel Virtual Machine
(KVM) [15]. There are several platforms for deploying custom clouds,
including open-source alternatives like OpenStack [19] and Eucalyptus
[9], and commercial software like vCloud Director [31]. OpenStack is
an open-source initiative supported by many cloud-related companies
such as RackSpace, HP, Intel and AMD, with a large customer-base.
Eucalyptus enables the creation of on-premises Infrastructure as a Ser-
vice clouds, with support for Xen, KVM and ESXi, and the Amazon
EC2 API. VCloud Director is the commercial platform developed by
VMWare.
As an alternative to a real infrastructure (custom testbed or public
cloud provider), software tools could be used to simulate the function-
ing of a cloud platform, including resource allocation and deallocation,
VM execution, monitoring and the remaining cloud management tasks.
The researcher could select an already existing simulator and adapt it
to her needs, or implement a new one from scratch. Obviously, using a
simulator implies an initial effort to adapt or develop the software but,
in contrast, has many advantages. The evaluation process is shortened
in many ways. Simulation allows testing multiple algorithms, avoiding
infrastructure re-configurations. Besides, simulation isolates the exper-
iment from the influence of external, uncontrolled factors (e.g. interfer-
ence with other applications, provider-induced VM consolidation pro-
cesses, etc.), something impossible in real cloud infrastructures. Ex-
periments carried out in real infrastructures may last hours, whereas
in an event-based simulator, this process may take just a few minutes.
Simulators are highly configurable and allow the researcher to gather
any information about system state or performance metrics. In spite
of the advantages mentioned, simulated environments are still an ab-
straction of physical machine clusters, thus the reliability of the results
will depend on the level of implementation detail considered during the
development. Some research-oriented cloud simulators are CloudSim
[7], GreenCloud [14], and GroudSim [87].
A.2 Application Benchmarks
A scalable application benchmark is executed on top of an experimen-
tal platform, based on a public cloud provider or a custom testbed, in
order to measure the performance of the system. Simulated experimen-
tal platforms do not always require the use of a benchmark: a simplified
view of an application may be part of the simulated model. Typically,
benchmarks comprise a web application together with a workload gen-
erator that creates synthetic session-based requests to the application.
Some commonly used benchmarks for cloud research are RUBiS [1],
TPC-W [30], CloudStone [8] and WikiBench [33]. Although both RU-
BiS and TPC-W benchmarks are out-dated or declared obsolete, they
are still being used by the research community.
RUBiS [1]: It is a prototype of an auction website modeled after eBay.
It offers the core functionality of an auction site (selling, brows-
ing and bidding) and supports three kinds of user sessions: visitor,
buyer, and seller. The applications consists of three main compo-
nents: Apache load balancer server, JBoss application server and
MySQL database server. The last update in this benchmark was in
2008.
TPC-W [30]: TPC [29] is a nonprofit organization founded to define
transaction processing and database benchmarks. Among them,
TPC-W is a complex e-commerce application, specifically an on-
line bookshop. It simulates three different profiles: primarily shop-
ping, browsing and web-based ordering. The performance metric
reported is the number of web interactions processed per second.
It was declared obsolete in 2005.
CloudStone [8]: It is a multi-platform performance measurement tool
for Web 2.0 and cloud computing developed by the Rad Lab group
at the University of Berkeley. CloudStone includes a flexible, real-
istic workload generator (Faban) to generate load against a realistic
Web 2.0 application (Olio). The stack can be deployed on Amazon
EC2 instances. Olio 2.0 is a two-tier social networking benchmark,
with a web frontend and a database backend. The application met-
ric is the number of active users of the social networking applica-
tion, which drives the throughput or the number of operations per
second.
WikiBench [33]: It is a web hosting benchmark, that uses real data
from the Wikipedia database, and generates real traffic based on
the Wikipedia access traces. The core application is MediaWiki
[16], a free open source wiki package originally used on the Wikipedia
website.
There are other, less frequently used benchmarks such as RUBBoS [24], SPECweb [97] and TPC-C [28]. RUBBoS is a bulletin board benchmark modeled after an online news forum such as Slashdot; it was last updated in 2005. SPECweb, created by the Standard Performance Evaluation Corporation (SPEC), generates synthetic banking, e-commerce and support (large downloads) workloads. SPECweb was discontinued in 2012, and SPEC has since created a cloud benchmarking group [26]. TPC-C is an on-line transaction processing benchmark that simulates a complete computing environment in which a population of users executes transactions against a database; its performance metric is the number of transactions per minute.
A.3 Workloads
As described before, workloads represent inputs to be processed by
application benchmarks. They can be generated using some patterns
or gathered from real cloud applications (and typically stored in trace
files).
Cloud-based systems process two main classes of workload: batch and transactional. Batch workloads consist of arbitrary, long-running, resource-intensive jobs, such as scientific programs or video transcoding. The best-known examples of transactional workloads are web applications built to serve on-line HTTP clients. These systems usually serve content such as HTML pages, images or video streams, which can be stored statically or rendered dynamically by the servers.
A.3.1 Synthetic Workloads
Synthetic workloads can be generated based on different patterns. According to Mao and Humphrey [79][2], there are four representative workload patterns in cloud environments: Stable, Growing, Cyclic/Bursting and On-and-off. Each of them represents a typical application or scenario. A Stable workload is characterized by a constant number of requests per minute. In contrast, a Growing pattern shows a load that increases rapidly, for example, when a piece of news suddenly becomes popular. The third pattern is called Cyclic/Bursting because the load may follow regular cycles (e.g. more load during the daytime than at night) or show bursts on specific dates (e.g. a special offer). The last typical pattern is the On-and-off workload, which may represent batch processing or data analysis performed every day.
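The sketch below gives a rough illustration of how these four patterns can be synthesized as request-rate time series (in requests per minute); the concrete rates, periods and noise levels are illustrative assumptions and are not taken from [79] or [2].

import math
import random

def stable(t):
    # Roughly constant number of requests per minute.
    return max(0.0, 200 + random.gauss(0, 10))

def growing(t):
    # Load that ramps up quickly, e.g. a piece of news becoming popular.
    return max(0.0, 50 + 5 * t + random.gauss(0, 10))

def cyclic_bursting(t, burst_at=300, burst_len=30):
    # Daily cycle plus a short burst, e.g. a special offer on a given date.
    base = 200 + 150 * math.sin(2 * math.pi * t / 1440)
    burst = 400 if burst_at <= t < burst_at + burst_len else 0
    return max(0.0, base + burst + random.gauss(0, 10))

def on_and_off(t, period=240, duty=0.5):
    # Alternating active/idle phases, e.g. a batch job run every day.
    active = (t % period) < duty * period
    return max(0.0, 300 + random.gauss(0, 5)) if active else 0.0

if __name__ == "__main__":
    patterns = {"stable": stable, "growing": growing,
                "cyclic/bursting": cyclic_bursting, "on-and-off": on_and_off}
    for name, f in patterns.items():
        series = [f(t) for t in range(1440)]   # one simulated day, per minute
        print(f"{name:16s} min={min(series):7.1f}  max={max(series):7.1f}")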
There is a broad range of workload generators; they may create either simple requests or complete HTTP sessions that mix different actions (e.g. login or browsing) and simulate user think times. Examples of workload generators are:
Faban [8]: A Markov-based workload generator, included in the Cloud-
Stone stack.
Apache JMeter [4]: A workload generator implemented in Java, used
for load testing and measuring performance. It can be used to test
performance on both static and dynamic resources (files, servlets,
Perl scripts, databases and queries, FTP servers and more). It can
also generate heavy loads for a server, network or object, either to
test its strength or to analyze the overall performance under differ-
ent scenarios.
Rain [21]: A workload generation toolkit that uses parameterized sta-
tistical distributions in order to model different classes of work-
load.
Httperf [27]: A tool for measuring web server performance. It pro-
vides a flexible facility for generating various HTTP workloads
and for measuring server performance.
Synthetic workloads are suitable for carrying out controlled experiments. For example, the workload can be tuned to test the system under different numbers of users or request rates, with smooth increments or sudden peaks. However, synthetic workloads may not be realistic enough, which makes it necessary to also use workloads collected from real production systems.
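As an example of such controlled experimentation, the sketch below drives httperf through a sequence of increasing request rates. The target host and URI are placeholders, and although the options used (--server, --port, --uri, --rate, --num-conns, --timeout) are standard httperf flags, they should be checked against the installed httperf version.

import subprocess

SERVER, URI = "webapp.example.com", "/index.html"   # hypothetical target

def run_httperf(rate, duration_s=60):
    # Run one httperf step at a fixed request rate and return its report.
    num_conns = rate * duration_s            # one connection per request
    cmd = ["httperf", "--server", SERVER, "--port", "80", "--uri", URI,
           "--rate", str(rate), "--num-conns", str(num_conns),
           "--timeout", "5"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout

if __name__ == "__main__":
    # Smooth increments: 50, 100, ..., 300 requests per second.
    for rate in range(50, 301, 50):
        print(f"--- rate={rate} req/s ---")
        print(run_httperf(rate))

The same wrapper structure applies to JMeter or Rain by replacing the command line; the point is that the request rate becomes an explicit, scriptable experimental variable.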
A.3.2 Real Workloads
To the best of our knowledge, there are no publicly available, real traces
from cloud-hosted applications, and this is an evident drawback for
cloud research. In the literature, some authors have generated their own
traces, running benchmarks or real applications in cloud platforms.
There are also some references to traces from private clouds that have
not been published. However, most authors have used traces from In-
ternet servers, such as the ClarkNet trace [6], the World Cup 98 trace
[35] and Wikipedia access traces [34].
The ClarkNet trace [6] contains the HTTP requests received by the ClarkNet server over a two-week period in 1995. ClarkNet is an Internet access provider for the metro Baltimore-Washington DC area. The trace shows a clear cyclic workload pattern (see Figure 3): there is more load during the daytime than at night, and the load on weekends is lower than on weekdays.
Fig. 3 Number of requests per minute for ClarkNet Trace
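Traces such as ClarkNet are distributed as web-server access logs in Common Log Format; the sketch below shows one possible way to aggregate such a log into the requests-per-minute series plotted in Figure 3. The file name is hypothetical and the timestamp pattern is an assumption that should be adapted to the actual trace files.

import re
from collections import Counter
from datetime import datetime

# Assumed plain-text Common Log Format, e.g.:
# host - - [28/Aug/1995:00:00:34 -0400] "GET /page.html HTTP/1.0" 200 1839
LINE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})')

def requests_per_minute(path):
    # Count the requests falling into each one-minute bucket of the log.
    counts = Counter()
    with open(path, errors="replace") as log:
        for line in log:
            match = LINE_RE.search(line)
            if not match:
                continue
            ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
            counts[ts.replace(second=0)] += 1
    return counts

if __name__ == "__main__":
    counts = requests_per_minute("clarknet_access_log")   # hypothetical file
    for minute in sorted(counts)[:10]:
        print(minute.isoformat(), counts[minute])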
The World Cup 98 trace [35] has been extensively used in the lit-
erature. It contains all the HTTP requests made to the 1998 World Cup
Web site between April 30, 1998 and July 26, 1998.
More recently, several Wikipedia access traces were published for
research purposes. They contain the URL requests made to the Wikipedia
servers between September, 2007 and January, 2008. Most requests are
read-only queries to the database.
Some authors have used traces from Grid environments [46]. Although there is an extensive number of public Grid traces, their job-execution scheme is not suitable for evaluating elastic applications; however, they could be useful for batch-based workloads.
It is also worth mentioning the Google Cluster Data [12], two sets
of traces that contain the workloads running on Google compute cells.
The first dataset refers to a 7-hour period and consists of a set of tasks.
However, the data records have been anonymized, and the CPU and
RAM consumption have been normalized and obscured using a linear
transformation. The second trace includes significantly more informa-
tion about jobs, machine characteristics and constraints. This trace in-
cludes data from an 11k-machine cell over about a month-long period.
As in the first trace, all the numeric data have been normalized, and there is no information about the job type. For this reason, these Google traces cannot be directly utilized to test auto-scaling techniques, although they could be useful in other cloud-related research scenarios.
References
1. (2009) RUBiS: Rice University Bidding System. http://
rubis.ow2.org/, [Online; accessed 13-September-2012]
2. (2010) Workload Patterns for Cloud Computing. http:
//watdenkt.veenhof.nu/2010/07/13/workload-patterns-
for-cloud- computing/, [Online; accessed 29-January-2014]
3. (2012) Amazon Elastic Compute Cloud (Amazon EC2). http:
//aws.amazon.com/ec2/, [Online; accessed 13-September-
2012]
4. (2012) Apache JMeter. http://jmeter.apache.org/, [Online;
accessed 18-September-2012]
5. (2012) AWS Elastic Beanstalk (beta). Easy to begin, Impossible
to outgrow. http://aws.amazon.com/elasticbeanstalk/,
[Online; accessed 13-September-2012]
6. (2012) ClarkNet HTTP Trace (From the Internet Traf-
fic Archive). http://ita.ee.lbl.gov/html/contrib/
ClarkNet-HTTP.html, [Online; accessed 13-September-2012]
7. (2012) CloudSim: A Framework for Modeling and Simulation
of Cloud Computing Infrastructures and Services. http://www.
cloudbus.org/cloudsim/, [Online; accessed 18-September-
2012]
8. (2012) CloudStone Project by Rad Lab Group. http://radlab.
cs.berkeley.edu/wiki/Projects/Cloudstone/, [Online; ac-
cessed 13-September-2012]
9. (2012) Eucalyptus Cloud. http://www.eucalyptus.com/, [On-
line; accessed 18-September-2012]
10. (2012) Google App Engine. http://cloud.google.com/
products/, [Online; accessed 13-September-2012]
11. (2012) Google Apps for Business. http://www.google.com/
intl/es/enterprise/apps/business/products.html, [On-
line; accessed 13-September-2012]
12. (2012) Google Cluster Data. Traces of Google workloads.
http://code.google.com/p/googleclusterdata/, [Online;
accessed 13-September-2012]
13. (2012) Google Compute Engine. http://cloud.google.
com/products/compute-engine.html/, [Online; accessed 13-
September-2012]
14. (2012) Greencloud - The green cloud simulator.
http://greencloud.gforge.uni.lu/, [Online; accessed
18-September-2012]
15. (2012) Kernel Based Virtual Machine. http://www.linux-
kvm.org/, [Online; accessed 18-September-2012]
16. (2012) MediaWiki. http://www.mediawiki.org/wiki/
MediaWiki, [Online; accessed 24-November-2012]
17. (2012) Microsoft Office 365. http://www.microsoft.com/en-
us/office365/online-software.aspx, [Online; accessed 13-
September-2012]
18. (2012) Microsoft Windows Azure. https://www.
windowsazure.com/en-us/, [Online; accessed 13-September-
2012]
19. (2012) OpenStack Cloud Software. Open source software for
building private and public clouds. http://www.openstack.
org/, [Online; accessed 18-September-2012]
20. (2012) Rackspace. The open cloud company. http://www.
rackspace.com/, [Online; accessed 13-September-2012]
21. (2012) Rain Workload Toolkit. https://github.com/
yungsters/rain-workload- toolkit/wiki, [Online; ac-
cessed 13-September-2012]
22. (2012) RightScale Cloud Management. http://www.
rightscale.com/, [Online; accessed 13-September-2012]
23. (2012) RightScale. Set up Autoscaling using Voting Tags.
http://support.rightscale.com/03-Tutorials/02-
AWS/02-Website_Edition/Set_up_Autoscaling_using_
Voting_Tags, [Online; accessed 13-September-2012]
24. (2012) RUBBoS: Bulletin Board Benchmark. http:
//jmob.ow2.org/rubbos.html/, [Online; accessed 18-
September-2012]
25. (2012) Salesforce.com. http://www.salesforce.com/, [On-
line; accessed 13-September-2012]
26. (2012) SPEC forms cloud benchmarking group. http://www.
spec.org/osgcloud/press/cloudannouncement20120613.
html, [Online; accessed 18-September-2012]
27. (2012) The httperf HTTP load generator. http://code.google.
com/p/httperf/, [Online; accessed 18-September-2012]
28. (2012) TPC-C. http://www.tpc.org/tpcc/default.asp/,
[Online; accessed 18-September-2012]
29. (2012) TPC. Transaction Processing Performance Council.
http://www.tpc.org/default.asp, [Online; accessed 13-
September-2012]
30. (2012) TPC-W. http://www.tpc.org/tpcw/default.asp,
[Online; accessed 13-September-2012]
31. (2012) VMware vCloud Director. Deliver Complete Vir-
tual Datacenters for Consumption in Minutes. http://www.
eucalyptus.com/, [Online; accessed 18-September-2012]
32. (2012) VMware vSphere ESX and ESXi Info Center.
http://www.vmware.com/es/products/datacenter-
virtualization/vsphere/esxi-and- esx/overview.html,
[Online; accessed 18-September-2012]
33. (2012) WikiBench: A Web hosting benchmark. http://www.
wikibench.eu, [Online; accessed 24-November-2012]
34. (2012) Wikipedia access traces. http://www.wikibench.eu/
?page_id=60, [Online; accessed 24-November-2012]
35. (2012) World Cup 98 Trace (From the Internet Traffic Archive).
http://ita.ee.lbl.gov/html/contrib/WorldCup.html,
[Online; accessed 13-September-2012]
36. (2012) Xen hypervisor. http://www.xen.org/, [Online; accessed 18-September-2012]
37. Albus J (1975) A new approach to manipulator control: The cere-
bellar model articulation controller (CMAC). Transaction of the
ASME, Journal of dynamic systems, measurement and control
38. Alhamazani K, Ranjan R, Mitra K, Rabhi F, Khan SU, Guabtni
A, Bhatnagar V (2013) An Overview of the Commercial Cloud
Monitoring Tools: Research Dimensions, Design Issues, and
State-of-the-Art. arXiv preprint arXiv:1312.6170
39. Ali-Eldin A, Kihl M, Tordsson J, Elmroth E (2012) Efficient
provisioning of bursty scientific workloads on the cloud using
adaptive elasticity control. In: Proceedings of the 3rd workshop
on Scientific Cloud Computing Date - ScienceCloud ’12, ACM
Press, New York, New York, USA, p 31, DOI 10.1145/2287036.
2287044
40. Ali-Eldin A, Tordsson J, Elmroth E (2012) An adaptive hybrid
elasticity controller for cloud infrastructures. In: Network Opera-
tions and Management Symposium (NOMS), 2012 IEEE, IEEE,
pp 204–212
41. Bacigalupo DA, van Hemert J, Usmani A, Dillenberger DN,
Wills GB, Jarvis SA (2010) Resource management of enterprise
cloud systems using layered queuing and historical performance
models. In: 2010 IEEE International Symposium on Parallel &
Distributed Processing, Workshops and Phd Forum (IPDPSW),
IEEE, pp 1–8, DOI 10.1109/IPDPSW.2010.5470782
42. Barrett E, Howley E, Duggan J (2012) Applying reinforcement
learning towards automating resource allocation and application
scalability in the cloud. Concurrency and Computation: Practice
and Experience
43. Bodík P, Griffith R, Sutton C, Fox A, Jordan M, Patterson
D (2009) Statistical machine learning makes automatic control
practical for internet datacenters. HotCloud’09 Proceedings of
the 2009 conference on Hot topics in cloud computing p 12
44. Brown R, Meyer R (1961) The fundamental theorem of exponen-
tial smoothing. Operations Research
45. Bu X, Rao J, Xu CZ (2012) Coordinated Self-configuration of
Virtual Machines and Appliances using A Model-free Learning
Approach. IEEE Transactions on Parallel and Distributed Sys-
tems pp 1–1, DOI 10.1109/TPDS.2012.174
46. Caron E, Desprez F, Muresan A (2010) Forecasting for Cloud
computing on-demand resources based on pattern matching. Re-
search Report RR-7217, INRIA
47. Caron E, Desprez F, Muresan A (2011) Pattern Matching Based
Forecast of Non-periodic Repetitive Behavior for Cloud Clients.
Journal of Grid Computing 9(1):49–64, DOI 10.1007/s10723-
010-9178- 4
48. Casalicchio E, Silvestri L (2013) Autonomic Management of
Cloud-Based Systems: The Service Provider Perspective. In: Ge-
lenbe E, Lent R (eds) Computer and Information Sciences III,
Springer London, pp 39–47, DOI 10.1007/978-1-4471-4594-3_5
49. Chandra A, Gong W, Shenoy P (2003) Dynamic resource alloca-
tion for shared data centers using online measurements. Proceed-
ings of the 11th international conference on Quality of service pp
381–398
50. Chen G, He W, Liu J, Nath S, Rigas L, Xiao L, Zhao F
(2008) Energy-aware server provisioning and load dispatching
for connection-intensive internet services. In: Proceedings of the
5th USENIX Symposium on Networked Systems Design and Im-
plementation, USENIX Association, vol 8, pp 337–350
51. Chieu TC, Mohindra A, Karve AA, Segal A (2009) Dynamic
scaling of web applications in a virtualized cloud computing en-
vironment. In: e-Business Engineering, 2009. ICEBE’09. IEEE
International Conference on, Ieee, pp 281–286
52. Chieu TC, Mohindra A, Karve AA (2011) Scalability and Perfor-
mance of Web Applications in a Compute Cloud. In: e-Business
Engineering (ICEBE), 2011 IEEE 8th International Conference
on, IEEE, pp 317–323
53. Cormen TH, Stein C, Rivest RL, Leiserson CE (2001) Introduc-
tion to Algorithms, Chapter 32: String Matching. McGraw-Hill
Higher Education
54. Dutreilh X, Moreau A, Malenfant J, Rivierre N, Truck I (2010)
From data center resource allocation to control theory and back.
In: Cloud Computing (CLOUD), 2010 IEEE 3rd International
Conference on, IEEE, pp 410–417
55. Dutreilh X, Kirgizov S, Melekhova O, Malenfant J, Rivierre N,
Truck I (2011) Using Reinforcement Learning for Autonomic
Resource Allocation in Clouds: towards a fully automated work-
flow. In: Seventh International Conference on Autonomic and
Autonomous Systems, ICAS 2011, IEEE, pp 67–74
56. Dutta S, Gera S, Verma A, Viswanathan B (2012) SmartScale:
Automatic Application Scaling in Enterprise Clouds. In: 2012
IEEE Fifth International Conference on Cloud Computing, IEEE,
pp 221–228, DOI 10.1109/CLOUD.2012.12
57. Fang W, Lu Z, Wu J, Cao Z (2012) RPPS: A Novel Resource
Prediction and Provisioning Scheme in Cloud Data Center. In:
2012 IEEE Ninth International Conference on Services Comput-
ing, IEEE, pp 609–616, DOI 10.1109/SCC.2012.47
58. Gambi A, Toffetti G (2012) Modeling Cloud performance with
Kriging. In: 2012 34th International Conference on Software
Engineering (ICSE), IEEE, pp 1439–1440, DOI 10.1109/ICSE.
2012.6227075
59. Ghanbari H, Simmons B, Litoiu M, Iszlai G (2011) Exploring
Alternative Approaches to Implement an Elasticity Policy. In:
Cloud Computing (CLOUD), 2011 IEEE International Confer-
ence on, IEEE, pp 716–723
60. Gong Z, Gu X, Wilkes J (2010) Press: Predictive elastic resource
scaling for cloud systems. In: Network and Service Management
(CNSM), 2010 International Conference on, IEEE, pp 9–16
61. Guitart J, Torres J, Ayguadé E (2010) A survey on performance
management for internet applications. Concurrency and Compu-
tation: Practice and Experience 22(1):68–106, DOI 10.1002/cpe.
1470
62. Han R, Ghanem MM, Guo L, Guo Y, Osmond M (2012) En-
abling cost-aware and adaptive elasticity of multi-tier cloud ap-
plications. Future Generation Computer Systems null(null), DOI
10.1016/j.future.2012.05.018
63. Han R, Guo L, Ghanem MM, Guo Y (2012) Lightweight Resource Scaling for Cloud Applications. In: Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on
64. Hasan MZ, Magana E, Clemm A, Tucker L, Gudreddi SLD
(2012) Integrated and autonomic cloud resource scaling. In: Net-
work Operations and Management Symposium (NOMS), 2012
IEEE, IEEE, pp 1327–1334
65. Huang J, Li C, Yu J (2012) Resource prediction based on double
exponential smoothing in cloud computing. In: Consumer Elec-
tronics, Communications and Networks (CECNet), 2012 2nd In-
ternational Conference on, IEEE, pp 2056–2060
66. Iqbal W, Dailey MN, Carrera D, Janecek P (2011) Adaptive re-
source provisioning for read intensive multi-tier applications in
the cloud. Future Generation Computer Systems 27(6):871–879,
DOI 10.1016/j.future.2010.10.016
67. Islam S, Keung J, Lee K, Liu A (2012) Empirical prediction mod-
els for adaptive resource provisioning in the cloud. Future Gen-
eration Computer Systems 28(1):155–162, DOI 10.1016/j.future.
2011.05.027
68. Jacob B, Lanyon-Hogg R, Nadgir DK, Yassin AF (2004) A prac-
tical guide to the IBM autonomic computing toolkit
69. Kalyvianaki E, Charalambous T, Hand S (2009) Self-adaptive
and self-configured cpu resource provisioning for virtualized
servers using kalman filters. In: Proceedings of the 6th interna-
tional conference on Autonomic computing, ACM, pp 117–126
70. Kertesz A, Kecskemeti G, Oriol M, Kotcauer P, Acs S, Rodríguez M, Mercè O, Marosi AC, Marco J, Franch X (2013) Enhancing
Federated Cloud Management with an Integrated Service Moni-
toring Approach. Journal of Grid Computing pp 1–22
71. Khatua S, Ghosh A, Mukherjee N (2010) Optimizing the utiliza-
tion of virtual resources in Cloud environment. In: 2010 IEEE
International Conference on Virtual Environments, Human-
Computer Interfaces and Measurement Systems, IEEE, pp 82–
87, DOI 10.1109/VECIMS.2010.5609349
72. Koperek P, Funika W (2012) Dynamic Business Metrics-driven
Resource Provisioning in Cloud Environments. In: Wyrzykowski
R, Dongarra J, Karczewski K, Waśniewski J (eds) Parallel Pro-
cessing and Applied Mathematics, Lecture Notes in Computer
Science, vol 7204, Springer Berlin Heidelberg, pp 171–180, DOI
10.1007/978-3-642-31500-8_18
73. Kupferman J, Silverman J, Jara P, Browne J (2009) Scaling into
the cloud. Tech. rep., University of California, Santa Barbara;
CS270 - Advanced Operating Systems, URL http://cs.ucsb.
edu/~jkupferman/docs/ScalingIntoTheClouds.pdf
74. Lama P, Zhou X (2010) Autonomic Provisioning with Self-
Adaptive Neural Fuzzy Control for End-to-end Delay Guarantee.
In: 2010 IEEE International Symposium on Modeling, Analysis
and Simulation of Computer and Telecommunication Systems,
IEEE, pp 151–160, DOI 10.1109/MASCOTS.2010.24
75. Lim HC, Babu S, Chase JS, Parekh SS (2009) Automated control
in cloud computing: challenges and opportunities. In: Proceed-
ings of the 1st workshop on Automated control for datacenters
and clouds, ACM, New York, NY, USA, ACDC ’09, pp 13–18,
DOI 10.1145/1555271.1555275
76. Lim HC, Babu S, Chase JS (2010) Automated control for elas-
tic storage. In: Proceeding of the 7th international conference on
Autonomic computing - ICAC ’10, ACM Press, New York, New
York, USA, p 1, DOI 10.1145/1809049.1809051
77. Lorido-Botran T, Miguel-Alonso J, Lozano JA (2013) Compari-
son of Auto-scaling Techniques for Cloud Environments. In: Al-
berto A. Del Barrio, G. B. (editor), Actas de las XXIV Jornadas
de Paralelismo. Servicio de Publicaciones
78. Manvi SS, Shyam GK (2013) Resource management for Infras-
tructure as a Service (IaaS) in cloud computing: A survey. Journal
of Network and Computer Applications (0):–
79. Mao M, Humphrey M (2011) Auto-scaling to minimize cost and
meet application deadlines in cloud workflows. In: Proceedings
of 2011 International Conference for High Performance Comput-
ing, Networking, Storage and Analysis on - SC ’11, ACM Press,
New York, New York, USA, p 1, DOI 10.1145/2063384.2063449
80. Mao M, Humphrey M (2012) A Performance Study on the VM
Startup Time in the Cloud. In: Proceedings of the 2012 IEEE
Fifth International Conference on Cloud Computing, IEEE Com-
puter Society, Washington, DC, USA, CLOUD ’12, pp 423–430,
DOI 10.1109/CLOUD.2012.103
81. Maurer M, Brandic I, Sakellariou R (2011) Enacting slas in
clouds using rules. Euro-Par 2011 Parallel Processing
82. Maurer M, Breskovic I, Emeakaroha VC, Brandic I (2011) Re-
vealing the MAPE loop for the autonomic management of Cloud
infrastructures. In: Computers and Communications (ISCC),
2011 IEEE Symposium on, pp 147–152, DOI 10.1109/ISCC.
2011.5984008
83. Menasce DA, Dowdy LW, Almeida VAF (2004) Performance by
Design: Computer Capacity Planning By Example. Upper Saddle
River, NJ: Prentice Hall
84. Méndez Muñoz V, Casajús Ramo A, Fernández Albor V, Graciani Diaz R, Merino Arévalo G (2013) Rafhyc: an Archi-
tecture for Constructing Resilient Services on Federated Hy-
brid Clouds. Journal of Grid Computing 11(4):753–770, DOI
10.1007/s10723-013- 9279-y
85. Mi H, Wang H, Yin G, Zhou Y, Shi D, Yuan L (2010) Online self-
reconfiguration with performance guarantee for energy-efficient
large-scale cloud computing data centers. In: Services Comput-
ing (SCC), 2010 IEEE International Conference on, IEEE, pp
514–521
86. Moore LR, Bean K, Ellahi T (2013) Transforming reactive auto-
scaling into proactive auto-scaling. In: Proceedings of the 3rd In-
ternational Workshop on Cloud Data and Platforms, ACM, New
York, NY, USA, CloudDP ’13, pp 7–12, DOI 10.1145/2460756.
2460758
87. Ostermann S, Plankensteiner K, Prodan R, Fahringer T (2011)
GroudSim: An Event-Based Simulation Framework for Compu-
tational Grids and Clouds. In: Guarracino M, Vivien F, Träff J, Cannatoro M, Danelutto M, Hast A, Perla F, Knüpfer A, Martino
B, Alexander M (eds) Euro-Par 2010 Parallel Processing Work-
shops, Lecture Notes in Computer Science, vol 6586, Springer
Berlin Heidelberg, pp 305–313, DOI 10.1007/978-3-642-21878-
1_38
88. Padala P, Hou KY, Shin KG, Zhu X, Uysal M, Wang Z, Singhal
S, Merchant A (2009) Automated control of multiple virtualized
resources. In: Proceedings of the 4th ACM European conference
on Computer systems, ACM, pp 13–26
89. Park SM, Humphrey M (2009) Self-Tuning Virtual Machines
for Predictable eScience. In: 2009 9th IEEE/ACM International
Symposium on Cluster Computing and the Grid, IEEE, pp 356–
363, DOI 10.1109/CCGRID.2009.84
90. Patikirikorala T, Colman A (2010) Feedback controllers in the
cloud. APSEC 2010, Cloud workshop
91. Prodan R, Nae V (2009) Prediction-based real-time resource pro-
visioning for massively multiplayer online games. Future Gener-
ation Computer Systems 25(7):785–793, DOI 10.1016/j.future.
2008.11.002
92. Rao J, Bu X, Xu CZ, Wang L, Yin G (2009) VCONF: a reinforce-
ment learning approach to virtual machines auto-configuration.
In: Proceedings of the 6th international conference on Auto-
nomic computing, ACM, New York, NY, USA, ICAC ’09, pp
137–146, DOI 10.1145/1555228.1555263
93. Rao J, Bu X, Xu CZ, Wang K (2011) A distributed self-learning approach for elastic provisioning of virtualized cloud resources. In: 2011 IEEE 19th
Annual International Symposium on Modelling, Analysis, and
Simulation of Computer and Telecommunication Systems, IEEE,
pp 45–54, DOI 10.1109/MASCOTS.2011.47
94. Roy N, Dubey A, Gokhale A (2011) Efficient Autoscaling in the
Cloud Using Predictive Models for Workload Forecasting. In:
2011 IEEE 4th International Conference on Cloud Computing,
IEEE, pp 500–507, DOI 10.1109/CLOUD.2011.42
95. Shen Z, Subbiah S, Gu X, Wilkes J (2011) Cloudscale: Elastic
resource scaling for multi-tenant cloud systems. Proceedings of
the 2nd ACM Symposium on Cloud Computing
96. Simmons B, Ghanbari H, Litoiu M, Iszlai G (2011) Managing
a SaaS application in the cloud using PaaS policy sets and a
strategy-tree. In: Network and Service Management (CNSM),
2011 7th International Conference on, pp 1–5
97. SPECweb2009 (2012) SPECweb2009 benchmark. http://www.spec.org/web2009/, [Online; accessed 18-September-2012]
98. Sutton RS, Barto AG (1998) Introduction to Reinforcement
Learning. Cambridge Univ Press
99. Tesauro G, Jong NK, Das R, Bennani MN (2006) A Hybrid Re-
inforcement Learning Approach to Autonomic Resource Alloca-
tion. In: Proceedings of the 2006 IEEE International Conference
on Autonomic Computing, IEEE Computer Society, Washing-
ton, DC, USA, ICAC ’06, pp 65–73, DOI 10.1109/ICAC.2006.
1662383
100. Urgaonkar B, Shenoy P, Chandra A, Goyal P, Wood T (2008)
Agile dynamic provisioning of multi-tier Internet applications.
ACM Transactions on Autonomous and Adaptive Systems
3(1):1–39, DOI 10.1145/1342171.1342172
101. Villela D, Pradhan P, Rubenstein D (2004) Provisioning servers
in the application tier for e-commerce systems. In: Quality of
Service, 2004. IWQOS 2004. Twelfth IEEE International Work-
shop on, IEEE, pp 57–66
102. Wang L, Xu J, Zhao M, Fortes J (2011) Adaptive virtual re-
source management with fuzzy model predictive control. In: Pro-
ceedings of the 8th ACM international conference on Autonomic
computing - ICAC ’11, ACM Press, New York, New York, USA,
p 191, DOI 10.1145/1998582.1998623
103. Wang L, Xu J, Zhao M, Tu Y, Fortes JAB (2011) Fuzzy Modeling
Based Resource Management for Virtualized Database Systems.
In: Modeling, Analysis & Simulation of Computer and Telecom-
munication Systems (MASCOTS), 2011 IEEE 19th International
Symposium on, IEEE, pp 32–42
104. Watkins C, Dayan P (1992) Q-learning. Machine learning
105. Xu CZ, Rao J, Bu X (2012) URL: A unified reinforcement learn-
ing approach for autonomic cloud management. Journal of Par-
allel and Distributed Computing 72(2):95–105, DOI 10.1016/j.
jpdc.2011.10.003
106. Xu J, Zhao M, Fortes J, Carpenter R, Yousif M (2007) On the
Use of Fuzzy Modeling in Virtualized Data Center Management.
In: Proceedings of the Fourth International Conference on Au-
tonomic Computing, IEEE Computer Society, Washington, DC,
USA, ICAC ’07, p 25, DOI 10.1109/ICAC.2007.28
107. Zhang Q, Cherkasova L, Smirni E (2007) A regression-based
analytic model for dynamic resource provisioning of multi-tier
applications. In: Autonomic Computing, 2007. ICAC’07. Fourth
International Conference on, IEEE, p 27
108. Zhu Q, Agrawal G (2012) Resource Provisioning with Budget
Constraints for Adaptive Applications in Cloud Environments.
IEEE Transactions on Services Computing 5(4):497–511, DOI
10.1109/TSC.2011.61