Journal of Grid Computing manuscript No.
(will be inserted by the editor)
A Review of Auto-scaling Techniques for Elastic Applications in Cloud
Environments
Tania Lorido-Botran · Jose Miguel-Alonso · Jose A. Lozano
Received: date / Accepted: date
Abstract Cloud computing environments allow customers
to dynamically scale their applications. The key problem is
how to lease the right amount of resources, on a pay-as-you-
go basis. Application re-dimensioning can be implemented
effortlessly, adapting the resources assigned to the applica-
tion to the incoming user demand. However, the identifica-
tion of the right amount of resources to lease in order to
meet the required Service Level Agreement, while keeping
the overall cost low, is not an easy task. Many techniques
have been proposed for automating application scaling. We
propose a classification of these techniques into five main
categories: static threshold-based rules, control theory, rein-
forcement learning, queuing theory and time series analysis.
Then we use this classification to carry out a literature re-
view of proposals for auto-scaling in the cloud.
Keywords Cloud computing · scalable applications · auto-scaling · service level agreement
T. Lorido-Botran · J. Miguel-Alonso · J.A. Lozano
Intelligent Systems Group
University of the Basque Country, UPV/EHU
Paseo Manuel de Lardizabal, 1
20018, Donostia-San Sebastian, SPAIN

T. Lorido-Botran
Department of Computer Architecture and Technology
E-mail: tania.lorido@ehu.es

J. Miguel-Alonso
Department of Computer Architecture and Technology
E-mail: j.miguel@ehu.es

J.A. Lozano
Department of Computer Science and Artificial Intelligence
E-mail: ja.lozano@ehu.es

1 Introduction

Cloud computing is an emerging technology that is becoming increasingly popular because, among other advantages, it allows customers to easily deploy elastic applications, greatly
simplifying the process of acquiring and releasing resources
to a running application, while paying only for the resources
actually allocated (pay-per-use or pay-as-you-go model). Elas-
ticity and dynamism are two key concepts of cloud com-
puting. In this context, resources are usually in the form of
Virtual Machines (VMs). Clouds can be used by companies
for a variety of purposes, from running batch jobs to hosting
web applications.
Three main markets are associated with cloud computing:
Infrastructure-as-a-Service (IaaS) designates the provision
of information technology and network resources such
as processing, storage and bandwidth as well as man-
agement middleware. Examples are Amazon EC2 [3],
RackSpace [20] and Google Compute Engine [13].
Platform-as-a-Service (PaaS) designates programming en-
vironments and tools hosted and supported by cloud pro-
viders that can be used by consumers to build and deploy
applications onto the cloud infrastructure. Examples in-
clude Amazon Elastic Beanstalk [5], Google App En-
gine [10], and Microsoft Windows Azure [18].
Software-as-a-Service (SaaS) designates hosted vendor ap-
plications. For example, Google Apps [11], Microsoft
Office 365 [17] and Salesforce.com [25].
In this work we focus on the IaaS client’s perspective.
A typical scenario could be a company that wants to host
an application, and for this purpose leases resources from an
IaaS provider such as Amazon EC2. From now on the fol-
lowing terms will be used:
Provider. It refers mainly to the IaaS provider, which offers
virtually unlimited resources in the form of VMs. A PaaS
provider may also play this role.
Client. The client is the customer of the IaaS or PaaS service, who uses it to host the application. In other
words, it is the application owner.
User. This is the end user that accesses the application and
generates the workload or demand that drives its behav-
ior.
As we said previously, a key characteristic of cloud com-
puting is elasticity. This can be a double-edged sword. While
it allows applications to acquire and release resources dy-
namically, adjusting to changing demands, deciding the right
amount of resources is not an easy task. It would be desirable
to have a system that automatically adjusts the resources to
the workload handled by the application. All this with min-
imum human intervention or, even better, without it. This
would be an auto-scaling system, the focus of this survey.
Resource scaling can be either horizontal or vertical. In
horizontal scaling (scaling out/in), the resource unit is the
server replica (running on a VM), and new replicas are added
or released as needed. In contrast, vertical scaling (scaling
up/down) consists of changing the resources assigned to an
already running VM, for example, increasing (or reducing)
the allocated CPU power or the memory. Most common operating systems do not allow on-the-fly (without rebooting) changes to the machine on which they run (even if it is a VM);
for this reason, most cloud providers only offer horizontal
scaling.
The auto-scaler must be aware of the economic costs
of its decisions, which depend on the pricing scheme used
by the provider, to reduce the total expenditure. The pricing
model may include types of VMs, unit charge (per minute,
per hour), etc. The auto-scaler must also secure the correct
functioning of the application, by maintaining an acceptable
Quality of Service (QoS). The QoS depends on two types
of Service Level Agreement (SLA): the application SLA,
which is a contract between the client (application owner)
and end users; and the resource SLA, which is agreed between the
provider and the client. An example of the former is a certain
response time, and of the latter, 99.9% availability of the
infrastructure. Both types of SLA are usually mixed up, as to
satisfy the application SLA, it is necessary for the provider
to comply with the resource SLA. From now on, this review
focuses on the application SLA that must be guaranteed to
end users.
Applications hosted in cloud environments can be of very
diverse nature: batch jobs, map-reduce tasks, massively mul-
tiplayer online role-playing games (MMORPGs), web appli-
cations, video streaming services, among many others. Re-
source allocation for batch applications is usually denoted
as scheduling and involves meeting a certain job execution
deadline. It has been extensively studied in grid environ-
ments (job schedulers) and also explored in cloud environ-
ments, but this is not the topic of this review. The scope of
applicability of the auto-scaler covers any scalable or elas-
tic application composed of a load balancer that receives
job requests and dispatches them to any of a collection of
replicable servers. Web applications, MMORPGs and video
streaming services fit into this scheme. The main contribu-
tions of this review are:
A problem definition of the auto-scaling process and its
different phases.
A classification for auto-scaling techniques, together with
a description of each category and its pros/cons.
A review of the literature about auto-scaling, organized
using this classification. However, given the heterogene-
ity of auto-scaling approaches and testing conditions, it
is not possible to provide an assessment of the different
proposals, alone or in comparative terms.
A description, in an Appendix, of the different tools and
environments used to implement and test auto-scalers.
Note that the management of the cloud infrastructure is
out of the scope of this paper: topics such as VM placement
into physical servers, VM migration, energy consumption
and related problems are not discussed.
The remainder of this paper is organized as follows. Sec-
tion 2 describes the general scenario of elastic applications
in which to apply auto-scaling techniques. The auto-scaling
process is described in Section 3. A classification criterion
to organize auto-scaling proposals is introduced in Section
4. Section 5, the core of this work, contains a review of the
literature on auto-scaling in the cloud. Section 6 completes
this paper with some conclusions extracted from the review
and an outline of future work. Furthermore, Appendix A
provides details about the experimental platforms, workload
generation mechanisms and application benchmarks used by
different authors to test their proposal of auto-scaling algo-
rithms.
2 Scenario: Elastic Applications
An elastic application has the capability of being scaled (hor-
izontally or vertically) so that it adjusts to the variable input workload. Applications of this class are normally based
on a load balancer (a dispatcher) and a collection of identi-
cal servers. In a cluster environment, those would be physi-
cal servers but, in cloud environments, servers are hosted in
VMs and, therefore, the terms servers and VMs can be used
interchangeably. An auto-scaler is in charge of making de-
cisions about scaling actions, without the intervention of a
human manager.
Although several applications can be considered elastic
(e.g. MMORPG or video streaming servers), most of the lit-
erature is focused on web applications. These typically in-
clude a business-logic tier, containing the application logic,
and a persistence or data base tier. The literature pays most
of its attention to the business-logic tier, which can be easily
scaled. There are some works that study auto-scaling at the
persistence tier, although replicating distributed databases
brings out additional issues. Our description of auto-scaling
techniques is as neutral as possible, without focusing on any
particular application class or tier. However, given the liter-
ature bias towards web applications, it is unavoidable to see
it reflected in this survey.
Let us consider an elastic application deployed over a
pool of n VMs. VMs may have the same or different assigned amounts of resources (e.g. 1 GB of memory, 2 CPUs, ...),
but each VM has its own unique identifier (it could be its IP
address). The requests received by the load balancer may
come from real end users or from other applications. We will
assume that the execution time of a request may vary be-
tween milliseconds and minutes; long-running tasks lasting
hours will not be considered in the auto-scaling problem, as
they rather belong to a scheduling problem.
The load balancer will receive all the incoming requests
and forward them to one of the servers in the pool. It may be
implemented in different ways. These are a few examples: a
specific, hardware/software combination (an Internet appli-
ance), a part of the Domain Name System, a service offered
by the IaaS provider, or even an ad-hoc part of the client’s
application. Whichever the chosen technology, it is assumed
that the load balancer has updated information about the
VMs to use (the active ones): it will stop immediately send-
ing requests to removed VMs, and it will start sending work-
load to newly added VMs. It is also assumed that each re-
quest will be assigned to a single VM, which will run it until completing the task associated with it. Several load balancing
policies could be used, for example, random, round-robin
or least-connection. In the case of heterogeneous collections
of VMs, workload dispatching should be proportional to the
processing power of the VMs. A review of techniques to implement load balancers, distribute workload among VMs, control the admission of new requests, or redirect requests between VMs falls outside the scope of this survey.
The next section is devoted to the auto-scaling process: its
objectives and the phases of the process.
3 Auto-scaling Process: the MAPE Loop
The aim of the auto-scaler is to dynamically adapt the re-
sources assigned to the elastic applications, depending on
the input workload. This auto-scaler may be either an ad-
hoc implementation for a particular application, or a generic
service offered by the IaaS provider. In any case, the system
should be able to find a trade-off between meeting the SLA
of the application and minimizing the cost of renting cloud
resources. Any auto-scaler faces several problems:
Under-provisioning: The application does not have enough
resources to process all the incoming requests within the
temporal limits imposed by the SLA. More resources are
needed, but it takes some time from the moment they
are requested until they are available. In case of sudden
traffic bursts, an under-provisioned application may lead
to many SLA violations, even to system congestion. It
will take some time until the application returns to its
normal operational state.
Over-provisioning: In this case the application has more re-
sources than those needed to comply with the SLA. This
is correct from the point of view of SLA, but the client is
paying an unnecessary cost if the VMs are idle or lightly
loaded. A certain degree of over-provisioning could be
desirable in order to cope with small workload fluctua-
tions. In general, neither a manual provision of resources,
nor the one carried out by an auto-scaler, aims to keep
the VMs operating at 100% of their capacity.
Oscillation: A combination of both undesirable effects. Oscillation occurs when scaling actions are carried out too quickly, before the impact of each scaling action on the application can be observed. Keeping a capacity buffer
(aiming to keep the VMs at, say, 80% instead of 100%),
or using a cooldown period (e.g. [73]) are common ap-
proaches to avoid oscillation.
The auto-scaling process matches the MAPE loop of au-
tonomous systems [68] [82], which consists of four phases:
Monitoring (M), Analysis (A), Planning (P) and Execution
(E). First, a monitoring system gathers information about the
system and application state. The auto-scaler encompasses
the analysis and planning phases: it uses the retrieved infor-
mation to make estimations of future resource utilization and
needs (analysis), and then plans a suitable resource modifi-
cation action, e.g., removing a VM or adding some extra
memory. The provider will then execute the actions as re-
quested by the auto-scaler. Each of the phases in the MAPE
loop is described in more detail in the forthcoming sections.
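As an illustration, the loop can be sketched as a small Python skeleton (a minimal sketch under the assumption that the monitoring, analysis, planning and execution back-ends are supplied as callables; all names are hypothetical and not part of any provider API):

import time

def mape_loop(monitor, analyzer, planner, executor, interval=60):
    """Skeleton of the MAPE loop run by an auto-scaler (hypothetical names).

    monitor  -- returns current measurements (e.g. CPU load, request rate)
    analyzer -- turns raw measurements into an estimate of current or future load
    planner  -- maps that estimate to a scaling action (+k/-k VMs, or None)
    executor -- asks the provider, through its API, to apply the action
    """
    while True:
        metrics = monitor()            # Monitoring (M): gather raw measurements
        estimate = analyzer(metrics)   # Analysis (A): current state or prediction
        action = planner(estimate)     # Planning (P): decide the scaling action
        if action is not None:
            executor(action)           # Execution (E): delegated to the provider
        time.sleep(interval)           # sampling granularity (e.g. once per minute)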
3.1 Monitor
An auto-scaling system requires the support of a monitor-
ing system providing measurements about user demands,
system (application) status and compliance with the expected
SLA. Cloud infrastructures provide, through an API, access
to information useful for the provider itself (for example, to
check the correct functioning of the infrastructure and the
level of utilization) and the client (for example, to check the
compliance with the SLA).
Auto-scaling decisions rely on having useful, updated
performance metrics. The performance of the auto-scaler
will depend on the quality of the available metrics, the sam-
pling granularity, and the overheads (and costs) of obtaining
the metrics [38]. In the literature, authors have used a variety
of metrics as drivers for scaling decisions. An extensive list
of those metrics is provided in [59], for both transactional
(e.g. e-commerce web sites) and batch workloads (e.g. video
transcoding or text mining). They could be easily adapted to
other types of applications.
Hardware: CPU utilization per VM, disk access, network
interface access, memory usage.
General OS Process: CPU-time per process, page faults, real
memory (resident set).
Load balancer: request queue length, session rate,
number of current sessions, transmitted bytes, number
of denied requests, number of errors.
Application server: total thread count, active thread count,
used memory, session count, processed requests, pend-
ing requests, dropped requests, response time.
Database: number of active threads, number of transactions
in a particular state (write, commit, roll-back).
Message queue: average number of jobs in the queue, job’s
queuing time.
Note that, when working with applications deployed in
the cloud, some metrics are obtained from the cloud provider
(those related to the acquired resources and the hypervisor
managing the VMs), others from the host operating sys-
tem on which the application is implemented, and yet others
from the application itself. In order to reduce the complex-
ity of the monitoring system, sometimes proxy metrics are
used: for example, CPU utilization (hypervisor-level) as a
proxy of current system workload (application-level).
As stated before, auto-scaling is mostly related to the
Analyze and Plan parts of the MAPE loop. For this reason,
monitoring will not be further studied in this review. It will
be assumed that a good monitoring tool is available, gather-
ing different and updated metrics about system and applica-
tion current state, with negligible intrusion, and at a suitable
granularity (e.g. per second, per minute).
3.2 Analyze
The analysis phase consists of processing the metrics gath-
ered directly from the monitoring system, obtaining from
them data about current system utilization, and optionally
predictions of future needs. Some auto-scalers do not per-
form any kind of prediction, just respond to current sys-
tem status: they are reactive. However, others use sophisti-
cated techniques to predict future demands in order to ar-
range resource provisioning with enough anticipation: they
are proactive. Anticipation is important because there is al-
ways a delay from the time when an auto-scaling action
is executed (for example, adding a server) until it is effec-
tive (for example, it takes several minutes to assign a phys-
ical server to deploy a VM, move the VM image to it, boot
the operating system and application, and have the server
fully operational [80]). Reactive systems might not be able
to scale in case of sudden traffic bursts (e.g. special offers
or the Slashdot effect). Therefore, proactivity might be re-
quired in order to deal with fluctuating demands and to scale in advance.
Some of the reviewed works focus only on this analysis phase, i.e., on the way of processing the metrics to either determine the current state of the application or anticipate future needs.
3.3 Plan
Once the current (or future) state is known (or predicted),
the auto-scaler is in charge of planning how to scale the
resources assigned to the application in order to find a satis-
factory trade-off between cost and SLA compliance. Exam-
ples of scaling decisions are removing a VM or adding more
memory to a particular VM. Decisions will be made consid-
ering the data obtained from the analysis phase (or directly
from the monitoring system) and the target SLA, as well
as other factors related to the cloud infrastructure, including
pricing models and VM boot-up time.
This part constitutes the core of any auto-scaling pro-
posal and, therefore, will be thoroughly studied in Sections
4 and 5.
3.4 Execute
This phase consists of actually executing the scaling actions
decided in the previous step. Conceptually, this is a straight-
forward phase, implemented through the cloud provider’s
API. Actual complexities are hidden to the client. Remem-
ber that it takes some time from the moment a resource is
requested until it is actually available, and that bounds on
these delays may be part of the resource SLA.
4 A Classification of Auto-scaling Techniques
As the body of literature dealing with proposals of auto-
scaling systems is large, we have tried to put some order
into it, to better understand and compare those proposals. To that end, we need some classification rules to group works
into meaningful sets. To the best of our knowledge, there is
no previous work proposing such a classification. The most
closely related survey, done by Guitart et al [61], targets
the performance of general Internet applications, deployed
over shared or dedicated clusters, relying on methods such
as admission control and service differentiation. However,
this review focuses on exploiting the elastic nature inherent
to cloud systems. Manvi and Shyam [78] gather many ref-
erences about resource management in IaaS environments,
but put little focus on auto-scaling.
The works we have reviewed to put together this survey are very diverse with regard to the underlying theory or technique used to implement the auto-scaler, including the metrics or models used to decide both when to scale and how
many resources are necessary. Each auto-scaling approach
has been designed with particular goals, focusing on several
application architectures or cloud providers offering differ-
ent scaling capabilities. The most differentiating factor is
the final goal/evaluation criteria used for a particular auto-
scaler: the prediction accuracy, the compliance with the SLA
(defined in many ways such as response time or availability
of resources), or the cost of resources. It seems quite obvious
that comparing auto-scaling approaches is not that straight-
forward. For example, let us consider two proposals, one fo-
cused on short-term prediction of workloads with a clearly
diurnal pattern, and another one that tries to comply with
the SLA even in the case of sudden bursts of traffic. Both
auto-scaling systems may work well for their target systems,
but clearly the latter approach is more prone to cause over-
provisioning. Still, it would be wrong to assume that this extra cost due to over-provisioning is enough to prefer one auto-scaling system over the other. Therefore, the
target goal of the auto-scaler cannot be used as the classifi-
cation criterion.
A possible grouping for auto-scalers could be done using
their anticipation capabilities, arranging them in two classes:
reactive and proactive. The former implies that the system
reacts to changes in the workload only when those changes
have been detected, using the last values obtained from the
set of monitored variables; consequently, as resource provi-
sioning takes some time, the desired effect may arrive when
it is too late. Proactive systems anticipate future demands
and make decisions taking them into consideration. We have
chosen not to use this as a classification criterion because
sometimes it is not clear whether a particular approach is
purely reactive or proactive.
We decided to adopt the underlying theory or technique
used to build the auto-scaler as our classification criterion.
This will help the reader to better understand the basic con-
cepts of a particular group or category, including their ad-
vantages and limitations. Most reviewed works can fit in one
or more of these five groups (note that some works propose
hybridizations of several techniques):
1. Threshold-based rules (rules)
2. Reinforcement learning (RL)
3. Queuing theory (QT)
4. Control theory (CT)
5. Time series analysis (TS)
Commercial cloud providers offer purely reactive auto-
scaling using threshold-based rules. The scaling decisions
are triggered based on some performance metrics and pre-
defined thresholds. This approach has become rather pop-
ular due to its (apparent) simplicity: rule-based auto-scaler
are easy to provide as a cloud service, and are also easy to
set-up by clients. However, the effectiveness of rules under
bursty workloads is questionable.
Time series analysis covers a wide range of methods to
detect patterns and predict future values on sequences of
data points. The accuracy in the forecast value (e.g. future
number of requests or average CPU utilization) will depend
on selecting the right technique and setting the parameters
correctly, especially the history window and the prediction
interval. Time-series analysis is the main enabler of proac-
tive auto-scaling techniques.
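As a simple illustration of this category (not of any specific proposal reviewed later), the sketch below forecasts the next value of a monitored metric with a sliding-window moving average and a least-squares linear trend; the window length and prediction horizon correspond to the parameters mentioned above, and the sample values are arbitrary:

import numpy as np

def forecast(history, window=12, horizon=1):
    """Predict the metric 'horizon' steps ahead from the last 'window' samples."""
    recent = np.asarray(history[-window:], dtype=float)
    moving_avg = recent.mean()                                # moving-average estimate
    t = np.arange(len(recent))
    slope, intercept = np.polyfit(t, recent, 1)               # least-squares linear trend
    linear = intercept + slope * (len(recent) - 1 + horizon)  # extrapolated value
    return moving_avg, linear

# e.g. requests per minute observed so far
print(forecast([110, 120, 135, 150, 170, 190, 220, 240, 260, 285, 310, 340]))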
There are two auto-scaling methods that rely on mod-
eling the system in order to determine its future resource
needs. This is the case of both queuing theory and control
theory. Queuing theory has been largely applied to comput-
ing systems, in order to find the relationship between the
jobs arriving and leaving a system. A simple approach con-
sists in modeling each VM (or set of VMs) as a queue of
requests in order to estimate different performance metrics
such as the response time. A main limitation of QT models
is that they are too rigid, and need to be recomputed when
there are changes in either the application or the workload.
Control theory also relies on creating a model of the ap-
plication. The aim is to define a (reactive or proactive) con-
troller to automatically adjust the required resources to the
application demands. The nature and performance of a con-
troller highly depends on both the application model and the
controller itself. As we will see, many researchers consider
that this type of auto-scaling has great potential, especially
when combined with resource prediction.
Finally, the last of our categories contains proposals based
on reinforcement learning. Similarly to control theory, RL
tries to automate the scaling task, but without using any a-
priori knowledge or model of the application. Instead, RL
tries to learn the most suitable action for each particular state
on-the-fly, with a trial-and-error approach. Although the absence of a model and the adaptability of the technique might seem very appealing for auto-scaling, the truth is that RL suffers from long learning phases: the time required by the method to converge to an optimal policy can be unfeasibly long.
As defined in Section 3, the auto-scaling process is mainly
related to the analysis and planning phases of the MAPE
loop. Some auto-scaling proposals focus on one of these
phases, but most of them cover both of them. Queuing the-
ory and time series analysis are useful in the analysis phase
in order to estimate the current utilization or future needs of
the application. Threshold-based rules, reinforcement learn-
ing and control theory can be used in the planning phase to
decide the scaling action, and they can be combined with a
previous analysis phase involving, for example, time series
analysis.
5 Review of Auto-scaling Techniques
In this section we carry out a survey of the literature on auto-
scaling systems, following the classification introduced in
the previous section. It is organized in as many sub-sections
as categories, each one starting with a description of the
technique that defines the category, including the definition,
methodologies, pros and cons. After that, we analyze a col-
lection of papers fitting into the category, also discussing
their features and limitations. This information is comple-
mented with a set of tables summarizing the reviewed lit-
erature, one table per category. Each table row includes a
synopsis of a reviewed auto-scaling paper, which includes:
1. The specific technique or combination of techniques ap-
plied
2. Whether horizontal (H) or vertical (V) scaling is per-
formed
3. The reactive (R) or proactive (P) nature of the approach
4. The performance metric considered (e.g. CPU load, in-
put request rate)
5. The monitoring tool and the granularity or monitoring
interval
6. Characteristics of the environment used to test the tech-
nique, including
The SLA considered for the application
The type of workload (either real or synthetic)
The experimental platform (simulator, custom testbed
or real provider), together with the application bench-
mark
Note that many table entries contain a “-”. This means
that the reviewed paper does not provide enough informa-
tion to know or infer the corresponding piece of information.
Unfortunately, this happens more often than desirable. Also,
note that authors have used many different mechanisms to
assess the goodness of their auto-scaler, ranging from com-
pletely synthetic simulated environments to actual produc-
tion systems. For more information about testing platforms,
the interested reader is referred to Appendix A.
5.1 Threshold-based Rules
Threshold-based auto-scaling rules or policies are very pop-
ular among cloud providers such as Amazon EC2, and third-
party tools like RightScale [22]. The simplicity and intuitive
nature of these policies make them very appealing to cloud
clients. However, setting the corresponding thresholds is a
per-application task, and requires a deep understanding of
workload trends.
5.1.1 Description of the Technique
From the MAPE loop (see Section 3), rules are purely a
decision-making technique (planning phase). The number of
VMs or the amount of resources assigned to the target ap-
plication will vary according to a set of rules, typically two:
one for scaling up/out and one for scaling down/in. Rules
are structured like these examples:
if x1 > thrU1 and/or x2 > thrU2 and/or ...
    for durU seconds then
        n = n + s and
        do nothing for inU seconds          (1)

if x1 < thrL1 and/or x2 < thrL2 and/or ...
    for durL seconds then
        n = n - s and
        do nothing for inL seconds          (2)
Each rule has two parts: the condition and the action to
be executed when the condition is met. The condition part
uses one or more performance metrics x1, x2, ..., such as the input request rate, CPU load or average response time. Each performance metric has an upper (thrU) and a lower (thrL) threshold. If the condition is met for a given time (durU or durL), then the corresponding action will be triggered. For horizontal scaling, the application manager should define a fixed amount s of VMs to be acquired or released, while for vertical scaling s refers to an increase or decrease of resources such as CPU or RAM. After executing an action, the auto-scaler inhibits itself for a small cooldown period defined by inU or inL.
The best way to understand threshold-based rules is by
means of an example: add 2 small instances when the aver-
age CPU usage is above 70% for more than 5 minutes, and
then, do nothing for 10 minutes.
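The example above can be written down as a small reactive loop; the sketch below is a hypothetical illustration (the metric source, thresholds and provider calls such as add_vms/remove_vms are placeholders, not a real API):

import time

def threshold_autoscaler(get_avg_cpu, add_vms, remove_vms,
                         thr_upper=70.0, thr_lower=30.0,
                         dur=300, cooldown=600, step=2, check_every=60):
    """Reactive threshold-based rules: scale out when the average CPU stays above
    thr_upper for 'dur' seconds, scale in when it stays below thr_lower for the
    same time, and inhibit further actions for 'cooldown' seconds afterwards."""
    above_since = below_since = None
    while True:
        cpu = get_avg_cpu()
        now = time.time()
        above_since = (above_since or now) if cpu > thr_upper else None
        below_since = (below_since or now) if cpu < thr_lower else None
        if above_since is not None and now - above_since >= dur:
            add_vms(step)                  # scale out: e.g. add 2 small instances
            above_since = below_since = None
            time.sleep(cooldown)           # calm period to avoid oscillation
        elif below_since is not None and now - below_since >= dur:
            remove_vms(step)               # scale in
            above_since = below_since = None
            time.sleep(cooldown)
        else:
            time.sleep(check_every)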
5.1.2 Review of Proposals
Threshold-based rules constitute an easy to deploy and use
mechanism to manage the amount of resources assigned to
an application hosted in a cloud platform, dynamically adapt-
ing those resources to the input demand (e.g. [54], [72], [81],
[48] [63], [64]). However, creating the rules requires an ef-
fort from the application manager (the client), who needs
to select the suitable performance metric or logical com-
bination of metrics, and also the values of several parame-
ters, mainly thresholds. The experiments carried out by [72] show that application-specific metrics (e.g. the average waiting time in queue) obtain better performance than system-specific metrics (e.g. CPU load). Application managers also
specific metrics (e.g. CPU load). Application managers also
need to set the corresponding upper (e.g. 70%) and lower
(e.g. 30%) thresholds for the performance variable (e.g. CPU
load). Thresholds are the key for the correct working of the
rules. In particular, Dutreilh et al [54] remark that thresh-
olds need to be carefully tuned in order to avoid oscillations
in the system (e.g. in the number of VMs or in the amount of
Table 1 Summary of the reviewed literature about threshold-based rules. Table rows are as follows. (1) The reference to the reviewed paper. (2)
A short description of the proposed technique. (3) The type of auto-scaling: horizontal (H) or vertical (V). (4) The reactive (R) and/or proactive
(P) nature of the proposal. (5) The performance metric or metrics driving auto-scaling. (6) The monitoring tool used to gather the metrics. The
remaining three fields are related to the environment in which the technique is tested. (7) The metric used to verify SLA compliance. (8) The
workload applied to the application managed by the auto-scaler. (9) The platform on which the technique is tested.
Ref Auto-scaling Techniques H/V R/P Metric Monitoring SLA Workloads Experimental Platform
[63] Rules Both R CPU, memory, I/O Custom tool. 1
minute
Response time Synthetic. Browsing and
ordering behavior of
customers.
Custom testbed (called IC Cloud) +
TPC
[72] Rules H R Average waiting
time in queue, CPU
load
Custom tool. - Synthetic Public cloud. FutureGrid, Eucalyp-
tus India cluster
[64] Rules Both R CPU load, response
time, network link
load, jitter and delay.
- - Only algorithm is de-
scribed, no experimenta-
tion is carried out.
[48] Rules + QT H P Request rate Amazon Cloud-
Watch. 1-5 minutes
Response time Real. Wikipedia traces Real provider. Amazon EC2 +
Httperf + MediaWiki
[52] RightScale + MA to per-
formance metric
H R Number of active
sessions
Custom tool - Synthetic. Different
number of HTTP clients
Custom testbed. Xen + custom col-
laborative web application
[73] RightScale + TS: LR and
AR(1)
H R/P Request rate, CPU
load
Simulated. - Synthetic. Three traffic
patterns: weekly oscilla-
tion, large spike and ran-
dom
Custom simulator, tuned after some
real experiments.
[59] RightScale H R CPU load Amazon Cloud-
Watch
- Real. World Cup 98 Real provider. Amazon EC2 +
RightScale (PaaS) + a simple web
application
[96] RightScale + Strategy-tree H R Number of sessions,
CPU idle
Custom tool. 4 min-
utes.
- Real. World Cup 98 Real provider. Amazon EC2 +
RightScale (PaaS) + a simple web
application.
[81] Rules V R CPU load, memory,
bandwidth, storage
Simulated. - Synthetic Custom simulator, plus Java rule
engine Drools
[77] Rules V R CPU load Simulated. 1 minute Response time Real. ClarkNet Custom simulator
CPU assigned). To prevent this problem, it is advisable to set an inertia, cooldown or calm period, a time during which no scaling decisions can be committed once a scaling action has been carried out.
Most authors and cloud providers use only two thresh-
olds per performance metric. However, Hasan et al [64] have
considered using a set of four thresholds and two durations:
ThrU, the upper threshold; ThrbU, which is slightly below
the upper threshold; ThrL, the lower threshold; ThroL, which
is slightly above the lower threshold. Used in combination,
it is possible to determine the trend of the performance met-
ric (e.g. trending up or down), and then perform finer auto-
scaling decisions.
Conditions in the rules are usually based on one or, at most, two performance metrics; the most popular are the average CPU load of the VMs, the response time, and the input request rate. Both Dutreilh et al [54] and Han et al
[63] use the average response time of the application. On the
contrary, Hasan et al [64] prefer using performance metrics
from multiple domains (compute, storage and network) or
even a correlation of several of them.
Note that in most cases the rules use the metrics directly
as obtained from the monitor, thus acting in a purely re-
active way. However, it is possible to carry out a previous
analysis of the monitored data in order to predict the fu-
ture (expected) behavior of the system and execute rules in a
proactive way [71]. This topic will be discussed later, in the
section devoted to time series analysis.
RightScale's auto-scaling algorithm [23] proposes combining regular reactive rules with a voting process. If a majority of the VMs agree that they should scale up/out or
down/in, that action is taken; otherwise, no action is planned.
Each VM votes to scale up/out or down/in based on a set
of rules evaluated individually. After each scaling action,
RightScale recommends a 15-minute calm period, because new machines generally take between 5 and 10 minutes to become operational. This auto-scaling technique has been
adopted by several authors ([73], [51], [59], [96]). Chieu
et al [51] initially proposed a set of reactive rules based on
the number of active sessions, but this work was extended
in Chieu et al [52] following the RightScale approach: if
all VMs have active sessions above the given upper thresh-
old, a new VM is provisioned; if there are VMs with ac-
tive sessions below a given lower threshold and with at least
one VM that has no active session, the idle one will be shut
down.
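A minimal sketch of such a per-VM voting decision could look as follows (the thresholds and the majority fraction are illustrative assumptions, not RightScale's actual parameters):

def voting_decision(per_vm_sessions, thr_upper=85, thr_lower=30, majority=0.5):
    """Each VM votes to scale out, scale in or abstain, based on its own metric
    (here, the number of active sessions); an action is taken only if a majority
    of the VMs agree on it."""
    n = len(per_vm_sessions)
    votes_out = sum(1 for s in per_vm_sessions if s > thr_upper)
    votes_in = sum(1 for s in per_vm_sessions if s < thr_lower)
    if votes_out > majority * n:
        return "scale_out"    # then observe a calm period (e.g. around 15 minutes)
    if votes_in > majority * n and n > 1:
        return "scale_in"     # shrink only while more than one VM remains
    return "no_action"

print(voting_decision([90, 88, 40]))   # -> 'scale_out'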
As RightScale’s voting system is based on rules, it in-
herits their main disadvantage: the technique is highly de-
pendent on manager-defined threshold values, and also, on
the characteristics of the input workload. This was the con-
clusion reached by Kupferman et al [73] after comparing
RightScale with other algorithms. Simmons et al [96] try
to overcome these problems with a strategy-tree, a tool that
evaluates the deployed policy set, and switches among alter-
native strategies over time, in a hierarchical manner. The authors created three different scaling policies, customized to dif-
ferent input workloads, and the strategy-tree would switch
among them based on the workload trend (analyzed with a
regression-based technique).
In order to save costs, Kupferman et al [73] (and other
authors [48]) came up with an idea called smart kill. Many
IaaS providers charge partial utilization hours as full hours.
Therefore, it is advisable not to terminate a VM before the
hour is over, even if the load is low. Apart from reducing
costs, smart kill may also improve system performance: in
case of an oscillating input workload, the costs of continu-
ously shutting down and starting up VMs are avoided, sparing boot-up delays and reducing SLA violations.
The popularity of rules as an auto-scaling method is probably due to their simplicity and the fact that they are easy for clients to understand. However, this kind of technique
shows two main problems: its reactive nature and the dif-
ficulty of selecting the correct set of performance metrics
and the corresponding thresholds. The effectiveness of those
thresholds is highly dependent on the workload changes,
and may require frequent tuning. In order to solve the prob-
lem of static thresholds, Lorido-Botran et al [77] introduce
the concept of dynamic thresholds: initial values are set up,
but they are automatically modified as a consequence of the
observed SLA violations. Meta-rules are included to define
how the threshold used in the scaling rules may change to
better adapt to the workload.
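As a rough illustration of the idea of meta-rules (a hypothetical sketch, not the exact rules proposed in [77]), the scale-out threshold could be tightened whenever SLA violations are observed and slowly relaxed otherwise:

def adjust_upper_threshold(thr_upper, sla_violations, step=5.0,
                           lowest=50.0, highest=90.0):
    """Hypothetical meta-rule: tighten the scale-out threshold when SLA violations
    were observed in the last interval (so the rule fires earlier next time), and
    slowly relax it towards its original value when there were none."""
    if sla_violations > 0:
        return max(lowest, thr_upper - step)
    return min(highest, thr_upper + step / 2)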
In conclusion, rules can be used to automate the scaling of a particular application without much effort, especially in the case of applications with quite regular, predictable patterns. However, in the case of bursty workloads, the client should consider a more advanced and powerful auto-scaling system from the remaining categories.
The main characteristics of the auto-scaling proposals
based on rules and reviewed in this section are summarized
in Table 1.
5.2 Reinforcement Learning
Reinforcement Learning (RL) [98] is a type of automatic
decision-making approach that has been used by several au-
thors to implement auto-scalers. Without any a priori knowl-
edge, RL techniques are able to determine the best scaling
action to take for every application state, given the input
workload.
5.2.1 Description of the Technique
Reinforcement learning [98] focuses on learning through di-
rect interaction between an agent (e.g. the auto-scaler) and
its environment (e.g. the application as defined in Section 2).
The auto-scaler will learn from experience (trial-and-error
method) the best scaling action to take, depending on the
current state, given by the input workload, performance or
other set of variables. After executing an action, the auto-
scaler gets a response or reward (e.g. performance improve-
ment) from the system, about how good that action was. So,
the auto-scaler will tend to execute actions that yield a high
reward (best actions are reinforced). From now on, in this
section the general term agent will be used, instead of auto-
scaler.
The objective of the agent is to find a policy π that maps every state s to the best action a the agent should choose.
The agent has to maximize the expected discounted rewards
obtained in the long run:
R_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ... = Σ_{k=0..∞} γ^k r_{t+k+1}     (3)

where r_{t+1} is the reward obtained at time t+1, and γ is the discount factor.
The policy is based on a value function Q(s,a), usually called the Q-value function. Every Q(s,a) value estimates the future cumulative rewards obtained by executing an action a in a state s. In other words, it represents the goodness of executing action a when in state s. The Q-value function can be defined as:

Q(s,a) = E_π [ Σ_{k=0..∞} γ^k r_{t+k+1} | s_t = s, a_t = a ]     (4)
There are many RL algorithms that can be used in order
to obtain the Q-value function, but Q-learning is the most
used in the literature. Typically, the Q(s,a) values are stored in a lookup table that maps all system states s to their best action a and the corresponding Q-value. The Q-learning al-
gorithm is sketched in Algorithm 1.
Algorithm 1 Q-learning basic steps
1: Initialize the Q-values table, Q(s,a).
2: Observe the current state, s.
3: loop {infinitely}
4:   Choose an action, a, for state s based on one of the action selection policies, such as ε-greedy.
5:   Execute the action, and observe the reward, r, as well as the new state, s'.
6:   Update the Q-value for the state using the observed reward and the maximum reward possible for the next state. The resulting update formula is:

     Q(s,a) = Q(s,a) + α [ r + γ max_{a'} Q(s',a') − Q(s,a) ]     (5)

7:   Set the state s to the new state s'.
Step 4 of the algorithm involves choosing an action for
a given state. Among the existing action selection policies,
ε-greedy is often the one selected in the literature. Most of the time (with probability 1 − ε), the action with the best expected reward will be executed (argmax_a Q(s,a)); a random action will be chosen with a low probability ε, in order to explore non-visited actions. Once the action is executed, the corresponding Q-value is updated (step 6) with the obtained reward r and the maximum reward possible for the next state, max_{a'} Q(s',a'). The parameter γ is the discount factor that adjusts the importance given to future rewards. The update formula also includes a parameter α that determines the learning rate. It can be the same for every state-action pair,
or can be adjusted based on the number of times each state
has been visited.
The policy learned by the agent is just the action that
maximizes the Q-value in each state. Watkins and Dayan
[104] proved that the discrete case of Q-learning converges
to an optimal policy under certain conditions, and this is in-
dependent of the initial values of the Q-table. If each pair (s,a) is visited an infinite number of times, then the lookup table converges to a unique set of values Q(s,a) = Q*(s,a), which defines a stationary deterministic optimal policy. Note that the auto-scaling process is a continuing task, not a stationary one. For this reason, the Q-learning policy has to be learned continuously and adapted to workload changes.
An algorithm very similar to Q-learning is SARSA [98].
In contrast to Q-learning, it does not use the maximum reward for the next state, max_{a'} Q(s',a'), to update the Q(s,a) value. Instead, the same action selection policy (see Step 4 in Algorithm 1) is applied to the new state s', in order to determine an action a'. Then, the corresponding Q(s',a') value is used to update the current Q(s,a), as shown in the following formula:

Q(s,a) = Q(s,a) + α [ r + γ Q(s',a') − Q(s,a) ]     (6)
The effect of a scaling decision takes some time to have
an impact on the application, and the reward comes after a
delay. Given that RL makes a decision with current informa-
tion (application state) about a future reward (e.g. response
time), a proactive nature is assumed for all RL approaches.
The method includes two phases of the MAPE process: an-
alyze and plan. First, information about the application and rewards is collected in a lookup table (or any other structure)
for later use (analyze phase). Then, in the planning phase,
this information is used to decide the best scaling action.
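A minimal tabular Q-learning agent for horizontal scaling could be sketched as follows (the state encoding, reward weights and action set are simplifying assumptions for illustration, not the formulation of any specific proposal reviewed below):

import random
from collections import defaultdict

ACTIONS = [-1, 0, +1]     # remove a VM, do nothing, add a VM

class QLearningScaler:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)    # lookup table Q(s, a)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        """Epsilon-greedy action selection (Step 4 of Algorithm 1)."""
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)                       # explore
        return max(ACTIONS, key=lambda a: self.q[(state, a)])   # exploit

    def update(self, state, action, reward, next_state):
        """Q-learning update: Q(s,a) += alpha*[r + gamma*max_a' Q(s',a') - Q(s,a)]."""
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        self.q[(state, action)] += self.alpha * (
            reward + self.gamma * best_next - self.q[(state, action)])

# Example state: (discretized request rate, current number of VMs). The reward
# trades off VM cost against an SLA penalty on response time (assumed weights).
def reward(n_vms, response_time, sla=0.5, vm_cost=0.1, penalty=1.0):
    return -vm_cost * n_vms - (penalty if response_time > sla else 0.0)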
5.2.2 Review of Proposals
Three basic elements have to be defined in order to apply RL
to auto-scaling: the action set A, the state space S, and the
reward function R. The first two highly depend on the type
of scaling, either horizontal or vertical, whereas the reward
function usually takes into account the cost of the acquired
resources (renting VMs, bandwidth, ...) and the penalty cost
for SLA violations. In case of horizontal scaling, the state is
mostly defined in terms of the input workload and the num-
ber of VMs. For example, Tesauro et al [99] propose using
(w, u_{t-1}, u_t), where w is the total number of user requests observed per time period, and u_t and u_{t-1} are the number of VMs allocated to the application in the current and the previous time step, respectively. Dutreilh et al [55] considered the (w, u, p) state definition, where p is the performance in terms of average response time to requests, bounded by a value P_max chosen from experimental observations. The set of actions for horizontal scaling is usually composed of three: add a new VM, remove an existing VM, and do nothing.
In contrast, the state definition for vertical scaling takes
into account the amount of resources assigned to each VM
(CPU and memory, mostly). In particular, Rao et al [92]
[105] considered the following state definition: (mem_1, time_1, vcpu_1, ..., mem_u, time_u, vcpu_u), where mem_i, time_i and vcpu_i are the i-th VM's memory size, scheduler credit and number of virtual CPUs, respectively. For each of the three vari-
ables, possible operations can be either increase, decrease or
no-operation (i.e. maintain the previous value). The actions
for the RL task are defined as the combinations of the op-
erations on each variable. Given that the variables are con-
tinuous, operations are discretized: mem is reconfigured in
blocks of 256MB; scheduler changes (time) in 256 credits;
and vcpu in 1 unit. The same authors propose in [93] a dis-
tributed approach, in which an RL agent is trained per VM. In this case, the state is configured as (CPU, bandwidth, memory, swap).
Even though Q-learning is the most widespread algorithm in auto-scaling, there are some exceptions: e.g. Tesauro et al [99] use the SARSA approach, explained before. Some of the articles referenced in this section ([92], [93], [45], [105]) do not specify the RL algorithm applied in the experiments, but the problem definition and update function resemble those of SARSA. Both Q-learning and SARSA present several problems that have been addressed in a number of ways [54], [55], [42], [99], [92]:
Bad initial performance and large training time. The per-
formance obtained during live on-line training may be
unacceptably poor, both initially and during a long train-
ing period, before a good enough solution is found. The
algorithm needs to explore the different states and ac-
tions.
Large state-space. This is often called the curse of di-
mensionality problem. The number of states grows expo-
nentially with the number of state variables, which leads
to scalability problems. In the simplest form, a lookup
table is used to store a separate value for every possible
state-action pair. As the size of such a table increases,
any access to the table will take a longer time, delaying
table updates and action selection.
Exploration actions and adaptability to environment chan-
ges. Even assuming that an optimal policy is found, the
environment conditions (e.g. workload pattern) may change
and the policy has to be re-adapted. For this purpose, the
RL algorithm executes a certain amount of exploration
actions, believed to be suboptimal. This could lead to
undesired bad performance, but it is essential in order
to adapt the current policy. Following this method, RL
algorithms are able to cope with relatively smooth changes in the behavior of the application, but not with sudden bursts in the input workload.
Table 2 Summary of the reviewed literature about reinforcement learning
Ref Auto-scaling Techniques H/V R/P Metric Monitoring SLA Workloads Experimental Platform
[54] Rules + RL H P Request rate, re-
sponse time, number
of VMs
Custom tool. 20 sec-
onds
Response time Synthetic. Made up of 5
sinusoidal oscillations
Custom testbed + RUBiS
[55] RL H P Number of user re-
quests, number of
VMs, response time
- Response time, cost Synthetic. With sinu-
soidal pattern
Custom testbed. Olio application +
Custom decision agent (VirtRL)
[42] RL H P Number of user re-
quests, number of
VMs, time
Simulated Response time, cost Synthetic. Based on
Poisson distribution
Custom simulator (Matlab)
[99] RL(+ANN) - SARSA +
Queuing model
H P Arrival rate, previ-
ous number of VMs
- Response time Synthetic. Poisson dis-
tribution (open-loop),
different number of
users and exponentially
distributed think times
(closed-loop)
Custom testbed (shared data cen-
ter). Trade3 application (a realistic
simulation of an electronic trading
platform)
[92] RL(+ANN) V P CPU and memory
usage
Custom tool Throughput, re-
sponse time
Synthetic: 3 workload
mixes (browsing, shop-
ping and ordering)
Custom testbed. Xen + 3 applica-
tions (TPC-C, TPC-W, SpecWeb)
[105] RL(+ANN) V P CPU and memory
usage
Custom tool Response time Synthetic: 3 workload
mixes (browsing, shop-
ping and ordering)
Custom testbed. Xen + 3 applica-
tions (TPC-C, TPC-W, SpecWeb)
[93] RL(+CMAC) V P CPU, I/O, memory,
swap
Custom tool
(scripts)
Response time,
throughput
Real. ClarkNet trace Custom testbed. Xen + 3 applica-
tions (TPC-C, TPC-W, SpecWeb)
[45] RL(+Simplex) V P CPU, memory - ap-
plication parameters
Custom tool Response time,
throughput
Synthetic. Different
number of clients
Custom testbed. Xen + 2 applica-
tions (TPC-C, TPC-W)
[108] CT - PID controller + RL
+ ARMAX model + SVM
regression
V P Application adap-
tive parameters
CPU and memory
- Application-related
benefit function
Synthetic Custom testbed. Xen + 2 real ap-
plications (Great Lake Forecasting
System, Volume Rendering)
The problem of the bad performance in the early steps
has been addressed in a number of ways. Its main cause is
the lack of initial information and the random nature of the
exploration policy. For this reason, Dutreilh et al [54] used a
custom heuristic to guide the state space exploration, and the
learning process lasted for 60 hours. The same authors [55]
further propose an initial approximation of the Q-function
that updates the value for all states at each iteration, and
also speeds up the convergence to the optimal policy. Re-
ducing this training time can also be addressed with a policy
that visits several states at each step [92] or using parallel
learning agents [42]. In the latter, each agent does not need
to visit every state and action; instead, it can learn the value
of non-visited states from neighboring agents. A radically
different approach to avoid the poor early performance of
on-line training consists of using an alternative model (e.g.
a queuing model) to control the system, whilst the RL model
is trained off-line on collected data [99].
Some authors have proposed methods to address the curse
of dimensionality issue, reducing the state space. Bu et al
[45] use a Simplex optimization that selects the promising
states that would return a high reward. A parallel approach
would also help coping with large state spaces. Barrett et al
[42] create an agent per VM, that keeps its own, small lookup
table. Using a lookup table for representing the Q-function
is not efficient, and other nonlinear function approximators
can be utilized, such as neural networks, CMACs (Cerebel-
lar Model Articulation Controllers), regression trees, sup-
port vector machines, wavelets and regression splines. For
example, neural networks [99], [92] take the state-action
pairs as input and output the approximated Q-value. They
are also able to predict the value for non-visited states, and
deal with continuous state spaces. Rao et al [93] combine
the parallel approach (an agent per VM) with a CMAC [37]
[93] to represent the Q function. The authors found that updates of the CMAC-based Q table only need 6.5 milliseconds, in comparison with the 50-second update time of a neural network.
It is also worth mentioning the usefulness of RL in other
tasks that can be tightly linked to the auto-scaling prob-
lem. For example, application parameter configuration [105]
[45]. Xu et al [105] use an ANN-based RL agent to tune pa-
rameters directly related to the application and VM perfor-
mance, such as MaxClients, Keepalive timeout, MinSpare-
Servers, MaxThreads and Session timeout (these are exam-
ples of tunable parameters from Tomcat or Apache applica-
tions).
RL can be used in combination with other methods such
as control theory. Following a radically different approach,
Zhu and Agrawal [108] combine a PID controller with an
RL agent in charge of estimating the derivative term (this
is further explained in Section 5.4). The controller guides
the parameter adaptation of applications (e.g. image size)
in order to meet the SLA. Then, virtual resources, CPU and
memory, are dynamically provisioned according to the change
in the adaptive parameters.
Before finishing this section, it is important to highlight the interesting capability of RL algorithms to capture the best management policy for a target scenario, without any
a-priori knowledge. The client does not need to define a par-
ticular model for the application; instead, it is learned online
and adapted if the conditions of the application, workload
or system change. In our opinion, RL techniques could be
a promising approach to solve the auto-scaling task of gen-
eral applications, but the field is not mature enough to satisfy
the requirements of a real production scenario. In this open
research field, efforts should be addressed towards provid-
ing an adequate adaptability to sudden bursts in the input
workload, and also to deal with continuous state spaces and
actions.
Table 2 shows a summary of the articles reviewed in this
section.
5.3 Queuing Theory
Classical queuing theory has been extensively used to model
Internet applications and traditional servers. Queuing theory
can be used for the analysis phase of the auto-scaling task
in elastic applications (see Section 3), i.e., estimating per-
formance metrics such as the queue length or the average
waiting time for requests. This section describes the main
characteristics of a queuing model and how they can be ap-
plied to scalable scenarios.
5.3.1 Description of the Technique
Queuing theory refers to the mathematical study of waiting lines, or queues. The basic structure of a model is depicted in Figure 1. Requests arrive at the system at a mean arrival rate λ, and are enqueued until they are processed. As the figure shows, one or more servers may be available in the model, which attend requests at a mean service rate µ.
Fig. 1 A simple queuing model with one server (left) and multiple servers (right)
Kendall’s notation is the standard system used to de-
scribe and classify queuing models. A queue is denoted by
A/B/C/K/N/D. This is the meaning of each element in the
notation:
A: Inter-arrival time distribution.
B: Service time distribution.
C: Number of servers.
K: System capacity or queue length. It refers to the maximum number of customers allowed in the system, including those in service. When the system is fully occupied, further arrivals are rejected.
N: Calling population. The size of the population from which the customers come. If the requests come from an infinite population of customers, the queuing model is open, whereas a closed model is based on a finite population of customers.
D: Service discipline or priority order, i.e., the order in which jobs in the queue are served. The most typical one is FIFO/FCFS (First In First Out / First Come First Served), in which the requests are served in the order they arrived. There are alternatives such as LIFO/LCFS (Last In First Out / Last Come First Served) and PS (Processor Sharing), among others.
Elements K, N and D are optional; if not present, it is assumed that K = ∞, N = ∞ and D = FIFO. The most typical values for both the inter-arrival time A and the service time B are M, D and G. M stands for Markovian and refers to a Poisson process, which is characterized by a parameter λ that indicates the number of arrivals (requests) per time unit; in this case, the inter-arrival or the service time follows an exponential distribution. D means deterministic or constant times. Another commonly used value is G, which corresponds to a general distribution with known parameters.
The elastic application scenario described in Section 2
can be formulated using a simple queuing model, consider-
ing a single queue representing the load balancer, that dis-
tributes the requests among n VMs (see Figure 1). In order to represent more complex systems such as multi-tier applications, a queuing network can be utilized. For example, each tier can be modeled as a queue with one or n servers.
Queuing theory is used to analyze systems with a sta-
tionary nature, characterized by constant arrival and service
rates. Its objective is to derive some performance metrics
based on the queuing model and some known parameters
(e.g. arrival rate λ). Examples of performance metrics are
the average time waiting in the queue, and the mean re-
sponse time. In case of scenarios with changing conditions
(i.e. non-constant arrival and services rates), such as our
target, scalable applications, the parameters of the queuing
model have to be periodically recalculated, and the metrics
recomputed.
There are two main ways to solve queuing models: an-
alytically and by means of simulation. The former can only
be used with simple models with well-defined distributions
for arrival and service processes, such as M/M/1 and G/G/1.
A well-known analytical way of solving queuing networks
is Mean Value Analysis (MVA) [83]. When analytical ap-
proaches are not feasible, that is, for relatively complex mod-
els, simulation can still be used to obtain the desired metrics.
M/M/1 is the simplest queuing (Poisson-based) model,
where both the arrival times and the service times follow ex-
ponential distributions. In this case, the mean response time R of an M/M/1 model can be calculated as R = 1/(µ − λ), given a service rate µ and an arrival rate λ (the response time R is the sum of the enqueued time and the service time). An-
other simple queuing model is G/G/1, in which the system’s
inter-arrival and service times are governed by general dis-
tributions with known mean and variance. The behavior of a
G/G/1 system can be captured using the following equation:
\[ \lambda \geq \left[ s + \frac{\sigma_a^2 + \sigma_b^2}{2(R - s)} \right]^{-1} \]
where R is the mean response time, s is the average service time for a request, and σ_a², σ_b² are the variances of the inter-arrival time and the service time, respectively.
In many queuing scenarios, it is useful to apply Little’s
Law, which states that the average number of customers (or
requests) E[C] in the system is equal to the average customer arrival rate λ multiplied by the average time each customer spends in the system E[T]: E[C] = λ × E[T].
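To make the use of these formulas concrete, the following Python sketch (not taken from any of the reviewed proposals) sizes a pool of identical VMs with the M/M/1 response-time formula above, assuming the load balancer splits the arrival rate evenly across VMs; the arrival rate, per-VM service rate and response-time target are hypothetical values.

```python
# Minimal sketch: size a VM pool with an M/M/1 model per VM.
# Assumption: the load balancer splits the arrival rate evenly,
# so each of the n VMs behaves as an independent M/M/1 queue.

def mm1_response_time(lam: float, mu: float) -> float:
    """Mean response time R = 1 / (mu - lambda) for a stable M/M/1 queue."""
    if lam >= mu:
        return float("inf")  # unstable queue: response time grows without bound
    return 1.0 / (mu - lam)

def vms_needed(lam: float, mu: float, r_sla: float, max_vms: int = 100) -> int:
    """Smallest number of VMs whose per-VM M/M/1 response time meets r_sla."""
    for n in range(1, max_vms + 1):
        if mm1_response_time(lam / n, mu) <= r_sla:
            return n
    return max_vms  # cap if the SLA cannot be met within the allowed pool size

if __name__ == "__main__":
    lam = 180.0   # monitored arrival rate (requests/s), hypothetical
    mu = 50.0     # service rate of one VM (requests/s), hypothetical
    r_sla = 0.05  # target mean response time (s), hypothetical
    n = vms_needed(lam, mu, r_sla)
    # Little's Law: expected number of requests in each VM's queue plus server
    in_system = (lam / n) * mm1_response_time(lam / n, mu)
    print(f"{n} VMs needed; about {in_system:.2f} requests per VM in the system")
```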
5.3.2 Review of Proposals
In the literature, both simple queuing models and more com-
plex queuing networks have been widely used to analyze the
performance of computer applications and systems.
Ali-Eldin et al [39] [40] model a cloud-hosted applica-
tion as a G/G/n queue, in which the number of servers n
is variable. The model can be solved to compute, for exam-
ple, the necessary resources required to process a given input
workload λ, or the mean response time for requests, given a
configuration of servers.
Queuing networks can also be used to model elastic ap-
plications, representing each VM (server) as a separate queue.
For example, Urgaonkar et al [100] use a network of G/G/1
queues (one per server). They use histograms to predict the
peak workload. Based on this value and the queuing model,
the number of servers per application tier is calculated. This
number can be corrected using reactive methods. The clear drawback is that provisioning for peak load leads to high under-utilization of resources.
Multi-tier applications can be studied using one or more
queues per tier. Zhang et al [107] considered a limited num-
ber of users, and thus, they used a closed system with a net-
work of queues; this model can be efficiently solved using
MVA. Han et al [62] modeled a multi-tier application as an
open system with a network of G/G/n queues (one per each
tier); this model is used to estimate the number of VMs that
need to be added to, or removed from, the bottleneck tier,
and the associated cost (VM usage is charged per minute,
instead of the typical per-hour cost model). Finally, Baci-
galupo et al [41] considered a queuing network of three tiers,
solving each tier to compute the mean response time.
The discussed approaches are used as part of the analy-
sis phase of the MAPE loop. Many different techniques can
be used to implement the planning phase, such as a predic-
tive controller [40] or an optimization algorithm (e.g. dis-
tributing servers among different applications, while maxi-
mizing the revenue [101]). The information required for a
queuing model, such as the input workload (number of re-
quests) or service time can be obtained by on-line monitor-
ing [62] [100] or estimated using different methods. For ex-
ample, Zhang et al [107] used a regression-based approxi-
mation in order to estimate the CPU demand, based on the
number and type (browsing, ordering or shopping) of client
requests.
Queuing models have been extensively used to model
applications and systems. They usually impose a fixed ar-
chitecture, and any change in structure or parameters re-
quire solving the model (with analytical or simulation-based
tools). For this reason, they are not cheap when used with
elastic (dynamically variable) applications that have to deal
with a changing input workload and a varying pool of re-
sources. Additionally, a queuing system is an analysis tool that requires additional components to implement a complete auto-scaler. Queuing models could be useful for some
particular cases of applications, e.g. when the relationship
between the number of requests and needed resources is
quite linear. Although there are efforts to model more com-
plex multi-tier applications, queuing theory might not be the
best option when trying to design a general-purpose auto-
scaling system.
Table 3 contains a summary of the articles reviewed in
this section.
5.4 Control Theory
Control theory has been applied to automate the manage-
ment of different information processing systems, such as
web server systems, storage systems and data centers/server
clusters. For cloud-hosted, elastic applications, a control sys-
tem may combine both phases of the auto-scaling task (anal-
ysis and planning).
5.4.1 Description of the Technique
The main objective of a controller is to automate the man-
agement (e.g. scaling task) of a target system (e.g. a cloud
application as defined in Section 2). The controller has to
maintain the value of a controlled variable y (e.g. CPU load) close to the desired level or set point y_ref, by adjusting the manipulated variable u (e.g. number of VMs). The manipu-
lated variable is the input to the target system, whereas the
controlled variable is measured by a sensor and considered
the output of the system.
There are three main types of control systems: open loop,
feedback and feed-forward. Open-loop controllers, also re-
ferred to as non-feedback, compute the input to the target
system using only the current state and its model of the sys-
tem. They do not use feedback to determine whether the sys-
tem output yhas achieved the desired goal yref . In contrast,
feedback controllers observe system output, and are able to
correct any deviation from the desired value (see Figure 2).
Feed-forward controllers try to anticipate errors in the
output. They predict, using a model, the behavior of the sys-
tem, and react before the error actually occurs. The predic-
tion may fail and, for this reason, feedback and feed-forward
controllers are usually combined.
Table 3 Summary of the reviewed literature about queuing theory
Ref | Auto-scaling Techniques | H/V | R/P | Metric | Monitoring | SLA | Workloads | Experimental Platform
[100] | QT + Histogram + Thresholds | H | R/P | Peak workload | Custom tool. 15 minutes | Response time | Synthetic and Real (World Cup 98) | Custom testbed. Xen + 2 applications (RUBiS and RUBBOS)
[101] | QT | H | R | Arrival rate, service time | Simulated | Response time | Real. E-commerce website (2001) + Synthetic traces | Custom simulator (Monte-Carlo)
[107] | QT + Regression (predict CPU load) | - | P | Number and type of transactions (requests), CPU load | Custom tool. 1 minute | - | Synthetic (browsing, ordering and shopping) | Custom simulator, based on C++Sim + Data collected from TPC-W
[62] | QT (model) + Reactive scaling | H | R | Arrival rate, service time | Custom tool. 1 minute | Response time, cost | Synthetic (browsing, ordering and shopping) | Custom testbed (called IC Cloud) + TPC-W benchmark
[41] | QT + Historical performance model | H | P | Arrival rate | - | Response time | Synthetic (browsing, buying) | Custom testbed. Eucalyptus + IBM Performance Benchmark Sample Trade
Fig. 2 Block diagram of a feedback control system: the controller computes the manipulated or input variable (u) of the scalable application (the system) from the control error (e), i.e. the difference between the target value or set point (y_ref) and the controlled or output variable (y) returned through the feedback path
From now on, focus will be put on feedback controllers,
as they are frequently used in the literature. They can be
classified into several categories [90]:
Fixed gain controllers: This class of controllers is very popular due to its simplicity. However, once the tuning parameters are selected, they remain fixed during the operation time of the controller. The most common controller in this class is the Proportional Integral Derivative (PID) controller,
with the following control algorithm:

\[ u_k = K_p e_k + K_i \sum_{j=1}^{k} e_j + K_d (e_k - e_{k-1}) \quad (7) \]

where u_k is the new value for the manipulated variable (e.g. the new number of VMs); e_k is the difference between the output y_k and the set point y_ref; and K_p, K_i and K_d are the proportional, integral and derivative gain parameters (respectively) that need to be adapted to a given target system. Different variants of the PID controller are used, such as the Proportional Integral (PI) controller or the Integral (I) controller. The latter can be represented as u_k = u_{k-1} + K_i e_k. (A minimal sketch of such a fixed-gain scaling controller is given after this list.)
Adaptive controllers: As the name suggests, adaptive con-
trol is able to adjust the parameter values on-line, in or-
der to adapt the controller to the changing conditions of
the environment. Examples of adaptive controllers are
self-tuning PID controllers, gain-scheduling and self-tuning
regulators. They are suitable for slowly varying work-
load conditions, but not for sudden bursts; in that case,
the on-line model estimation process may fail to capture
the dynamics of the system.
Model predictive controllers (MPC): MPCs follow a proac-
tive approach; the future behavior (output) of the system
is predicted, based on the model and the current output.
In order to maintain this output close to the target value,
the controller solves an optimization problem taking into
account a pre-defined cost function. An example of MPC
is the look-ahead controller [94].
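To illustrate the fixed-gain case referenced above, the following Python sketch implements a PI controller in incremental (velocity) form that maps the CPU-load error to a change in the number of VMs. It is only an illustration: the set point, the gains, the clamping bounds and the class name are hypothetical choices, not taken from any reviewed proposal.

```python
# Minimal sketch of a fixed-gain PI controller in incremental (velocity) form,
# assuming the controlled variable is average CPU load and the manipulated
# variable is the number of VMs. The set point and gains are hypothetical.

class PIScalingController:
    def __init__(self, set_point: float, kp: float, ki: float,
                 min_vms: int = 1, max_vms: int = 20):
        self.set_point = set_point      # desired average CPU load, e.g. 0.6
        self.kp, self.ki = kp, ki
        self.prev_error = 0.0
        self.min_vms, self.max_vms = min_vms, max_vms

    def step(self, measured_cpu: float, current_vms: int) -> int:
        """Return the number of VMs for the next control interval."""
        error = measured_cpu - self.set_point            # positive -> overloaded
        delta = self.kp * (error - self.prev_error) + self.ki * error
        self.prev_error = error
        new_vms = round(current_vms + delta)
        return max(self.min_vms, min(self.max_vms, new_vms))

controller = PIScalingController(set_point=0.6, kp=2.0, ki=5.0)
vms = 2
for cpu in [0.55, 0.75, 0.90, 0.85, 0.60, 0.40]:   # monitored CPU load samples
    vms = controller.step(cpu, vms)
    print(f"cpu={cpu:.2f} -> {vms} VMs")
```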
As explained before, the controller has to adjust the in-
put variable (e.g. number of VMs), in order to maintain the
desired value in the output variable (e.g. average CPU load
of 90%). For this purpose, a formal relationship between the
input and the output has to be modeled, so as to determine
how a change in the former affects the value of the output.
This formal relationship is denoted as transfer function in
classical control theory, state-space function in modern con-
trol theory, or simply as performance model. PID controllers
consider a simple linear equation, but there are many alter-
natives that consider non-linear approaches, and even sev-
eral input and output variables (yielding a Multiple-Input
Multiple-Output (MIMO) controller, instead being Single-
Input Single-Output (SISO)). In the literature, authors have
proposed different performance models of the system, in-
cluding these:
ARMA(X) [88]: An ARMA (auto-regressive moving aver-
age) model is able to capture the characteristics of a time
series and then makes predictions of future values. AR-
MAX (ARMA with eXogenous input) models the rela-
tionship between two time series. Both models are fur-
ther explained in Section 5.5.
Kalman filter [69]: It is a recursive algorithm for making
predictions based on time series.
Smoothing splines [43]: A method of smoothing (i.e. fitting a smooth curve to a set of noisy observations) using splines, piecewise polynomial functions defined by multiple subfunctions.
Kriging model or Gaussian Process Regression [58]: It ex-
tends traditional linear regression with a statistical frame-
work that is able to predict the value of the target func-
tion in un-sampled locations together with a confidence
measure.
Fuzzy model [106] [103] [74]: Fuzzy models are based on fuzzy rules, which build on the idea that the membership of an element in a set has a degree value in a con-
Table 4 Summary of the reviewed literature about control theory techniques
Ref | Auto-scaling Techniques | H/V | R/P | Metric | Monitoring | SLA | Workloads | Experimental Platform
[89] | CT: PI controller | V | R | Job progress | Sensor library | Job deadline | Batch jobs | Custom testbed. HyperV + 5 applications (ADCIRC, OpenLB, WRF, BLAST and Montage)
[75] | CT: PI controller (Proportional thresholding) + Exponential Smoothing for performance variable | H | R | CPU load, request rate | Hyperic HQ (Xen) | - | Synthetic. Different number of threads | Custom testbed. Xen + ORCA + simple web service
[76] | CT: PI controller (Proportional thresholding) | H | R | CPU load, request rate | Hyperic SIGAR. 10 seconds | - | Synthetic | Custom testbed. Xen + Modified CloudStone (using Hadoop Distributed File System)
[88] | CT: MIMO adaptive controller + ARMA (performance model) | V | P | CPU usage, disk I/O, response time | Xen + custom tool. 20 seconds | Response time | Synthetic and realistic (generated with MediSyn) | Custom testbed. Xen + 3 applications (RUBiS, TPC-W, media server)
[40] | CT: Adaptive controllers + QT | H | R/P | Number of requests, service rate | Simulated | Number of requests not handled | Real. World Cup 98 | Custom simulator in Python
[39] | CT: Adaptive, Proportional controller + QT | H | R/P | Number of requests, requests in buffer, service rate | Simulated. 1 minute | - | Real. World Cup 98 and Google Cluster Data | Custom simulator in Python
[43] | CT: Gain-scheduler (adaptive) + Smoothing splines (performance model) + Linear Regression | H | P | Number of requests, number of servers, response time | 20 seconds | Response time | Synthetic (Faban generator) | Real provider. Amazon EC2 + CloudStone benchmark
[58] | CT: Self-adaptive controller + Kriging model (performance model) | H | P | Number of incoming and enqueued requests, number of VMs | - | Execution time | Synthetic (Batch jobs) | Custom testbed. Private cloud + Sun Grid Engine (SGE)
[106] | CT: Fuzzy controller | V | R | Number and mixture of requests, CPU load | Custom tool. 20 seconds | Reply rate | Real (World Cup 98) and Synthetic (Httperf) | Custom testbed. VMware ESX Server + Java Pet Store
[103] | Fuzzy model | V | P | Number of queries, CPU load, disk I/O bandwidth | Xentop. 10 seconds | Response time, throughput | Synthetic and realistic (based on World Cup 98) | Custom testbed. Xen + 2 applications (RUBiS and TPC-H)
[74] | CT: Adaptive controller + Fuzzy model (+ ANN) | V | P | Number of requests, resource usage | Custom tool. 3 minutes | End-to-end delay | Synthetic (Pareto distribution) | Simulation
[69] | CT: Adaptive SISO and MIMO controllers + Kalman filter | V | P | CPU load | Custom tool. 5-10 seconds | Response time | Synthetic (Browsing and bidding mix) | Custom testbed. Xen + RUBiS application
tinuous interval between 0 and 1 (in contrast to Boolean
logic). Fuzzy models are further described in Subsection
5.4.2.
5.4.2 Review of Proposals
Fixed-gain controllers, including PID, PI and I, are the sim-
plest controller types, and have been widely used in the lit-
erature. For example, Lim et al [75] [76] use an I controller
to adjust the number of VMs based on average CPU us-
age, while Park and Humphrey [89] apply a PI controller
to manage the resources required by batch jobs, based on
their execution progress. Gain parameters K_p and K_i can
be set manually, based on trial-and-error [75] or using an
application-specific model. For example, in [89] a model is
constructed to estimate the progress of a job with respect
to the resources provisioned. Zhu and Agrawal [108] rely
on a RL agent in order to estimate the derivative term of
a PID controller. With the trial-and-error training, the RL
agent learns to minimize the sum of the squared error of the
control variables (the adaptive parameters) without violating
the time and budget constraints over time.
Adaptive control techniques are also rather popular. For
example, Ali-Eldin et al [40] propose combining two proac-
tive, adaptive controllers for scaling down, using dynamic
gain parameters based on input workload. A reactive ap-
proach is used for scaling up. The same authors propose
in [39] an adaptive, proportional controller, using a proac-
tive approach for both scaling up and down, and taking into
account the VM startup time. As stated before, adaptive con-
trol techniques rely on the use of performance models. Padala
et al [88] propose a MIMO adaptive controller that uses
a second-order ARMA to model the non-linear and time-
varying relationship between the resource allocation and its
normalized performance. The controller is able to adjust the
CPU and disk I/O usage. Bodík et al [43] combine smooth-
ing splines (used to map the workload and number of servers
to the application performance) with a gain-scheduling adap-
tive controller. Kalyvianaki et al [69] designed different SISO
and MIMO controllers to determine the CPU allocation of
VMs, relying on Kalman filters, whereas [58] utilized a Krig-
ing model to predict job completion time as a function of the
number of VMs, the number of incoming requests and the
jobs enqueued at the master node.
Fuzzy models have been used as a performance model in
control systems to relate the workload (input variable) and
the required resources (output variable). First, both input and
output variables of the system are mapped into fuzzy sets.
This mapping is defined by a membership function that de-
termines a value within the interval [0,1]. A fuzzy model
is based on a set of rules, that relate the input variables
(pre-condition of the rule), to the output variables (conse-
quence of the rule). The process of translating input val-
ues into one or more fuzzy sets is called fuzzification. De-
fuzzification is the inverse transformation which derives a
single numeric value that best represents the inferred fuzzy
values of the output variables. A control system that relies
on a rule-based fuzzy model is called a fuzzy controller.
Typically, the rule set and membership functions of a fuzzy
model are fixed at design time, and thus, the controller is
unable to adapt to a highly dynamic workload. An adaptive
approach can be used, in which the fuzzy model is repeat-
edly updated based on on-line monitored information [106],
[103]. Xu et al [106] applied an adaptive fuzzy controller to
the business-logic tier of a web application, and estimated
the required CPU load for the input workload. A similar
approach is followed by [103], but here authors focus on
the database tier. They claim that they use the fuzzy model
to predict the future resource needs; however, they use the
workload of the current time step t to calculate the resource needs r_{t+1} of the time step t+1, based on the assumption
that no sudden change happened within the duration of a
time step.
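As a rough illustration of the fuzzification, rule inference and defuzzification steps just described, the following Python sketch maps a normalized workload level to a VM adjustment using triangular membership functions and three hand-written rules. The breakpoints, rule consequents and function names are hypothetical and far simpler than the adaptive fuzzy models of [106] and [103].

```python
# Minimal sketch of a rule-based fuzzy scaling step (Mamdani-style, simplified).
# Assumption: workload is normalized to [0, 1]; rule consequents are VM deltas.

def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership degree of x in a triangular fuzzy set with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_scaling_decision(workload: float) -> int:
    # Fuzzification: membership in three fuzzy sets (hypothetical breakpoints).
    low = triangular(workload, -0.5, 0.0, 0.5)
    medium = triangular(workload, 0.2, 0.5, 0.8)
    high = triangular(workload, 0.5, 1.0, 1.5)
    # Rule base: low -> remove one VM; medium -> keep; high -> add two VMs.
    rules = [(low, -1.0), (medium, 0.0), (high, +2.0)]
    # Defuzzification: weighted average of the rule consequents.
    total_weight = sum(w for w, _ in rules)
    if total_weight == 0.0:
        return 0
    return round(sum(w * delta for w, delta in rules) / total_weight)

for load in [0.1, 0.5, 0.7, 0.95]:
    print(f"workload={load:.2f} -> change VMs by {fuzzy_scaling_decision(load)}")
```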
A further improvement is the neural fuzzy controller,
which uses a four-layer neural network (see Section 5.5) to
represent the fuzzy model. Each node in the first layer cor-
responds to one input variable. The second layer determines
the membership of each input variable to the fuzzy set (the
fuzzification process). Each node in layer 3 represents the
precondition part of one fuzzy logic rule. And finally, the out-
put layer acts as a defuzzifier, which converts fuzzy conclu-
sions from layer 3 into numeric output in terms of resource
adjustment. At early steps, the neural network only contains
the input and output layers. The membership and the rule
nodes are generated dynamically through the structure and
parameters learning. Lama and Zhou [74] relied on a neural
fuzzy controller, that is capable of self-constructing its struc-
ture (both the fuzzy rules and the membership functions) and
adapting its parameters through fast on-line learning (a re-
configuring controller type).
Finally, the last type of controllers that follow a proac-
tive approach are MPCs. Roy et al [94] combined an ARMA
model for workload forecasting, with the look-ahead con-
troller in order to optimize the resource allocation problem.
Fuzzy models can also be used to create a Fuzzy Model Pre-
dictive Controller [102].
The suitability of controllers for the auto-scaling task
highly depends on the type of controller and the dynamics
of the target system. The idea of having a controller that au-
tomates the process of adding/removing resources is very
appealing, but still, the problem is how to create a reliable
performance model that maps the input and output variables.
Simple reactive controllers could be used for applications with easy-to-predict needs. However, it seems advisable to focus efforts towards both adaptive controllers and MPCs, which are able
to adapt the application model and could be more suitable to
produce a general auto-scaling solution.
Table 4 shows a summary of the articles reviewed in this
section.
5.5 Time Series Analysis
Time series are used in many domains including finance, en-
gineering, economics and bioinformatics, generally to repre-
sent the change of a measurement over time. A time series is
a sequence of data points, measured typically at successive
time instants spaced at uniform time intervals. An example
is the number of requests that reaches an application, taken
at one-minute intervals. The time series analysis can be used
in the analysis phase of the auto-scaling process, in order to
find repeating patterns in the input workload or to try to fore-
cast future values.
5.5.1 Description of the Technique
Given the scenario described in Section 2, a certain perfor-
mance metric, such as average CPU load or the input work-
load, will be sampled periodically, at fixed intervals (e.g.
each minute). The result will be a time series X containing a sequence of w observations (w is the time series length):

\[ X = x_t, x_{t-1}, x_{t-2}, \ldots, x_{t-w+1} \quad (8) \]
Time series techniques can be applied in order to predict
future values of the metric, e.g. the future workload or re-
source usage. Based on this predicted value, a suitable auto-
scaling action can be planned, using for example a set of
predefined rules [71], or solving an optimization problem
for the resource allocation [94].
Formally, the objective of time series analysis is to fore-
cast future values of the time series, based on the last q
observations, denoted as the input window or history window (where q ≤ w). The future value x̂_{t+r} is r intervals ahead of the input window. Time series analysis techniques
can be classified into two broad groups: some of them focus
on the direct prediction of future values, whereas other tech-
niques try to identify the pattern (if present) followed by the
time series, and then extrapolate it to predict future values.
The first group includes moving average, auto-regression,
ARMA (combining both), exponential smoothing and dif-
ferent approaches based on machine learning:
Moving average methods (MA): They can be used to smooth
a time series in order to remove noise or to make predic-
tions. The forecast value x̂_{t+r} is calculated as the weighted average of the last q consecutive values. Typically, the prediction interval r is set to 1. Then, the general formula is as follows: x̂_{t+r} = a_1 x_t + a_2 x_{t-1} + ... + a_q x_{t-q+1}, where a_1, a_2, ..., a_q are a set of positive weighting factors that must sum to 1. The simple moving average MA(q) considers the arithmetic mean of the last q values, i.e., it assigns equal weight 1/q to all observations. In contrast,
the weighted moving average WMA(q) assigns a differ-
ent weight to each observation. Typically, more weight
is given to the most recent terms in the time series, and
less weight to older data.
Exponential smoothing (ES): Similarly to moving average, it calculates the weighted average of past observations, but exponential smoothing takes into account all the past history of the time series (w observations). It assigns exponentially decreasing weights over time. A new parameter is introduced, a smoothing factor α that weakens the influence of past data. There are different versions of exponential smoothing such as simple or single ES, and Brown's double ES [44]. In simple ES, the following formula is used recursively to calculate the current smoothed value s_t, based on the current observation x_t and the previous smoothed value s_{t-1}: s_t = α x_t + (1 − α) s_{t-1}. The forecast for the next period x̂_{t+1} is simply the current smoothed value s_t. The predictor formula for simple exponential smoothing can be expanded as follows:

\[
\begin{aligned}
\hat{x}_{t+1} &= \alpha x_t + (1-\alpha)\hat{x}_t \\
&= \alpha x_t + (1-\alpha)\left[\alpha x_{t-1} + (1-\alpha)\hat{x}_{t-1}\right] \\
&= \alpha x_t + \alpha(1-\alpha)x_{t-1} + (1-\alpha)^2\left[\alpha x_{t-2} + (1-\alpha)\hat{x}_{t-2}\right] \\
&\;\;\vdots \\
&= \alpha x_t + \alpha(1-\alpha)x_{t-1} + \alpha(1-\alpha)^2 x_{t-2} + \ldots + (1-\alpha)^{w-1}\hat{x}_{t-w+1}
\end{aligned}
\quad (9)
\]

where x̂_{t+1} represents the forecast value for the period t+1, based on the actual x_t value and the previous values of the time series; x̂_t refers to the forecast made for period t. The first smoothed value x̂_{t-w+1} can be set to the initial value of the time series x_{t-w+1}, or to the mean of the first few observations.
Simple ES is suitable for time series that have no sig-
nificant trend changes. For time series with an existing
linear trend, the Brown’s double ES should be applied.
It calculates two smoothing series:
\[
\begin{aligned}
s^{1}_t &= \alpha x_t + (1-\alpha)\, s^{1}_{t-1} \\
s^{2}_t &= \alpha s^{1}_t + (1-\alpha)\, s^{2}_{t-1}
\end{aligned}
\quad (10)
\]

The second smoothed series s²_t is obtained by applying simple ES to the series s¹_t. Both smoothed values s¹_t and s²_t are used to estimate the level c_t and trend d_t of the time series at time t. Based on these, the forecast value x̂_{t+r} is calculated as follows:

\[
\begin{aligned}
\hat{x}_{t+r} &= c_t + r\, d_t \\
c_t &= 2 s^{1}_t - s^{2}_t \\
d_t &= \frac{\alpha}{1-\alpha}\left( s^{1}_t - s^{2}_t \right)
\end{aligned}
\quad (11)
\]

(A small numerical sketch of ES-based forecasting is given after this list.)
Auto-regression of order p, AR(p): The prediction formula is determined as the linear weighted sum of the p previous terms in the series: x̂_{t+1} = b_1 x_t + b_2 x_{t-1} + ... + b_p x_{t-p+1} + ε_t. Parameter p corresponds to the number of terms in the AR equation, which may be different from the history window length w. The formula may include a white noise term ε_t. The key is to derive the best values for the weights or auto-regression coefficients b_1, b_2, ..., b_p. There are a variety of techniques for computing AR coefficients, such as least squares or the maximum likelihood method. In the literature, the most common method is based on the calculation of auto-correlation coefficients and the Yule-Walker equations. The auto-correlation function (ACF) of a time series gives correlations between x_t and x_{t-k} for lag k = 1, 2, 3, ...:

\[ r_k = \frac{\mathrm{covariance}(x_t, x_{t-k})}{\mathrm{var}(x_t)} = \frac{E[(x_t - \mu)(x_{t-k} - \mu)]}{\mathrm{var}(x_t)} \quad (12) \]
The autocorrelation can be estimated as:

\[ r_k = \frac{1}{(w-k)\,\sigma^2} \sum_{t=1}^{w-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \quad (13) \]

The full autocorrelation function can be derived by recursively calculating r_k = Σ_{i=1}^{p} b_i r_{k-i}. The result is a set of linear equations called the Yule-Walker equations, which can be represented in matrix form as:

\[
\begin{pmatrix}
1 & r_1 & r_2 & r_3 & \ldots & r_{p-1} \\
r_1 & 1 & r_1 & r_2 & \ldots & r_{p-2} \\
r_2 & r_1 & 1 & r_1 & \ldots & r_{p-3} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
r_{p-1} & r_{p-2} & r_{p-3} & r_{p-4} & \ldots & 1
\end{pmatrix}
\begin{pmatrix}
b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_p
\end{pmatrix}
=
\begin{pmatrix}
r_1 \\ r_2 \\ r_3 \\ \vdots \\ r_p
\end{pmatrix}
\quad (14)
\]

By solving this set of equations, the auto-regression coefficients b_1, b_2, ..., b_p can be derived for any value of p. For example, for AR(1) (with p = 1), the auto-regression coefficient b_1 is equal to the corresponding autocorrelation coefficient r_1 (b_1 = r_1).
Auto-Regressive Moving Average, ARMA(p,q): This model combines both auto-regression (of order p) and moving average (of order q). As stated before, AR takes into account the last p observations of the time series x_t, x_{t-1}, x_{t-2}, ..., x_{t-p}. The MA model, which is different from the MA method described previously, is the sum of the time series mean µ plus the innovations or white error terms ε_t, ε_{t-1}, ..., ε_{t-q}:

\[ x_t = \mu + \varepsilon_t + a_1 \varepsilon_{t-1} + a_2 \varepsilon_{t-2} + \ldots + a_q \varepsilon_{t-q} \quad (15) \]

where ε_t ∼ N(0, σ²). Then, the ARMA model is represented as:

\[ x_t = b_1 x_{t-1} + \ldots + b_p x_{t-p} + \varepsilon_t + a_1 \varepsilon_{t-1} + \ldots + a_q \varepsilon_{t-q} \quad (16) \]

Note that an ARMA(0,q) is a pure MA model, and an ARMA(p,0) corresponds to an AR model. There are several techniques for selecting the appropriate values for the orders p and q, and for estimating the coefficients a_1, a_2, ..., a_q and b_1, b_2, ..., b_p.
ARMA is a suitable model for stationary processes, i.e. the mean and variance of the time series remain constant over time. Thus, the time series must not show any trend (variations of the mean) or seasonal variations. If the AR model correctly fits the time series, the residual ε is white noise that shows no pattern. An extension of the ARMA model, called ARIMA (Auto-Regressive Integrated Moving Average), can be applied to non-stationary time series. Another extension, called ARMAX(p,q,b) or ARMA with eXogenous inputs, is able to capture the relationship between a given time series X and another external time series D. It contains the AR(p) and MA(q) models of X and a linear combination of the last b terms of the time series D.
Machine Learning-based techniques: These are two popu-
lar techniques used to carry out analysis of time series
that can be considered part of the broader “machine learn-
ing” field:
Regression is a statistical method used to determine the
polynomial function that is the closest to a set of
points (in this case, the w values of the history win-
dow). Linear regression refers to the particular case
of a polynomial of order 1. The objective is to find
a polynomial such that the distance from each of the
points to the polynomial curve is as small as possible
and therefore fits the data the best. When the number
of input variables is more than one, it is referred to as Multiple Linear Regression. The Linear Regres-
sion equation is the same used in AR, but the weight
estimation method differs.
Neural networks consist of an interconnected group of
artificial neurons, arranged in several layers: an in-
put layer with several input neurons; an output layer
with one or more output neurons; and one or more
hidden layers in between. For time series analysis,
the input layer contains one neuron for each value in
the history window, and one neuron for the predicted
value in the output layer. During the training phase, the network is initialized with random weights and fed with input vectors; the weights are then adapted, at a learning rate ρ, until the given inputs produce the desired outputs.
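The sketch announced above illustrates, in plain Python, simple ES, Brown's double ES and a one-step AR(1) forecast (exploiting the fact that b_1 = r_1). The smoothing factor and the sample series are hypothetical; a real auto-scaler would feed these functions with monitored metrics.

```python
# Minimal sketches of the forecasting methods described above, assuming a
# plain Python list of observations (oldest first). Alpha and the sample
# series are hypothetical values chosen only for illustration.

def simple_es_forecast(series, alpha=0.5):
    """Simple exponential smoothing: the forecast is the last smoothed value."""
    smoothed = series[0]
    for x in series[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return smoothed

def double_es_forecast(series, alpha=0.5, r=1):
    """Brown's double ES: forecast r steps ahead using level c_t and trend d_t."""
    s1 = s2 = series[0]
    for x in series[1:]:
        s1 = alpha * x + (1 - alpha) * s1    # first smoothing series
        s2 = alpha * s1 + (1 - alpha) * s2   # second smoothing series
    level = 2 * s1 - s2
    trend = alpha / (1 - alpha) * (s1 - s2)
    return level + r * trend

def ar1_forecast(series):
    """AR(1) forecast: b1 equals the lag-1 autocorrelation coefficient r1."""
    w = len(series)
    mean = sum(series) / w
    var = sum((x - mean) ** 2 for x in series) / w
    r1 = sum((series[t] - mean) * (series[t + 1] - mean)
             for t in range(w - 1)) / ((w - 1) * var)
    return mean + r1 * (series[-1] - mean)

requests_per_min = [100, 110, 125, 140, 160, 185]   # hypothetical workload
print(simple_es_forecast(requests_per_min))
print(double_es_forecast(requests_per_min))
print(ar1_forecast(requests_per_min))
```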
As previously stated, a group of time series analysis tech-
niques try to identify the pattern that the series follows, and
then use this pattern to extrapolate future values. Time series
patterns can be described in terms of four classes of compo-
nents: trend, seasonality, cyclical and randomness. The gen-
eral trend (e.g. increasing or decreasing pattern), together
with the seasonal variations that appear repeated over a spe-
cific period (e.g. day, week, month, or season), are the most
common components in a time series. Input workloads of
cloud applications may show different periodic components.
The trend identifies the overall slope of the workload, whereas
seasonality and cyclical determine the peaks at specific points
of time in a short term and in a long term basis, respectively.
A wide diversity of methods can be used to find repeti-
tive patterns in time series, including:
Pattern matching: It searches the history of the time series for patterns that are similar to the present one. It is very close to the string matching problem, for which several efficient algorithms are available (e.g. Knuth-Morris-Pratt [53]).
Signal processing techniques: Fast Fourier Transform (FFT)
is a technique that decomposes a signal time series into
components of different frequencies. The dominant fre-
quencies (if any) will correspond to the repeating pattern
in the time series.
Auto-correlation: In auto-correlation, the input time series
is repeatedly shifted (up to half the total window length),
and the correlation is calculated between the shifted time
series and the original one. If the correlation is higher
than a given threshold (e.g. 0.9) after s shifts, a repeating pattern is declared, with a duration of s steps (a small sketch of this procedure is given after this list).
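The auto-correlation procedure sketched below follows the description above; the 0.9 threshold matches the example in the text, while the helper name, the half-window search limit and the sample trace are illustrative assumptions.

```python
# Minimal sketch: detect a repeating pattern by shifting the series and
# computing the correlation against the original, up to half the window length.

def detect_period(series, threshold=0.9):
    """Return the smallest shift s whose autocorrelation exceeds the threshold."""
    w = len(series)
    mean = sum(series) / w
    var = sum((x - mean) ** 2 for x in series) / w
    for s in range(1, w // 2 + 1):
        pairs = zip(series[:-s], series[s:])       # original vs. shifted series
        corr = sum((a - mean) * (b - mean) for a, b in pairs) / ((w - s) * var)
        if corr >= threshold:
            return s        # repeating pattern with a period of s steps
    return None             # no repeating pattern found

workload = [10, 40, 80, 40, 10, 41, 79, 42, 11, 39, 81, 40]  # hypothetical trace
print(detect_period(workload))   # expected to report a period of 4 steps
```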
A basic tool for time series representation is the his-
togram. It involves distributing the values of the time se-
ries into several equal-width bins, and representing the fre-
quency for each bin. It has been used in the literature to rep-
resent the resource usage pattern or distribution, and then
predict future values.
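A histogram-based predictor of this kind can be approximated in a few lines; taking the centre of the most frequent bin mirrors one of the options reviewed later (see [60]), and the bin count of 10 is an arbitrary assumption.

```python
# Minimal sketch: predict the next resource-usage value as the centre of the
# most frequent histogram bin (bin count chosen arbitrarily).
import numpy as np

def histogram_prediction(samples, bins=10):
    counts, edges = np.histogram(samples, bins=bins)
    i = int(np.argmax(counts))               # index of the most frequent bin
    return (edges[i] + edges[i + 1]) / 2.0   # centre of that bin

cpu_usage = np.random.default_rng(0).normal(loc=55.0, scale=8.0, size=200)
print(histogram_prediction(cpu_usage))       # a value close to the usage mode
```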
5.5.2 Review of Proposals
In the context of elastic applications, time series analysis has been applied mostly to predict workload or resource usage. A simple moving average could be used for this pur-
pose, but with poor results [60]. For this reason, authors have
applied MA only to remove noise from the time series [89],
[75], or just to have a comparison yardstick. For example,
Huang et al [65] present a resource prediction model (for
CPU and memory utilization) based on double exponential
smoothing, and compare it with simple mean and weighted
moving average (WMA). ES clearly obtained better results,
because it takes into account the history records w (not only
the input window q) of the time series for the prediction. Mi
et al [85] also used Brown’s double ES in order to forecast
input workload of real traces (World Cup 98 and ClarkNet),
and obtained good accuracy results, with a small amount of
error (a mean relative error of 0.064 for the best case).
The auto-regression method has also been used for re-
source or workload forecasting ([73], [50], [60], [49], [71],
Table 5 Summary of the reviewed literature about techniques based on time series analysis
Ref | Auto-scaling Techniques | H/V | R/P | Metric | Monitoring | SLA | Workloads | Experimental Platform
[47] | TS: Pattern matching | H | P | Total number of CPUs | 100 seconds | Number of serviced requests, cost | Real cloud workloads: from Animoto and 7 IBM cloud applications | Analytical models
[60] | TS: FFT and Discrete Markov Chains. Compared with auto-regression, auto-correlation, histogram, max and min | V | P | CPU load | Libxenstat library. 1 minute | Response time | Real. World Cup 98 and ClarkNet. Also synthetic trace | Custom testbed. Xen + RUBiS + part of Google Cluster Data trace for CPU usage
[95] | TS: FFT and Discrete-time Markov Chain | V | P/R | CPU load, memory usage | Libxenstat library. 1 second | Response time, job progress | Synthetic. Based on World Cup 98 and EPA web server | Custom testbed. Xen + 3 applications (RUBiS, Hadoop MapReduce, IBM System S)
[57] | TS: ARMA | Both | P | Number of requests, CPU load | - | Prediction accuracy | Real traces. Collected from real applications and a real data center | Custom testbed. Xen and KVM
[65] | TS: Brown's double ES. Compared with WMA | - | P | CPU load, memory usage | Simulated | Prediction accuracy | Synthetic traffic generated with simulator | CloudSim simulator
[85] | TS: Brown's double ES | - | P | Number of requests per VM | 10 minutes | - | Real. World Cup 98 and ClarkNet. Synthetic. Poisson distribution | Custom testbed. TPC-W
[50] | TS: AR | H | P | Login rate, number of active connections, CPU load | Simulated | Service not available (login), energy consumption | Real. From Windows Live Messenger (login rate, number of active connections) | Simulator
[71] | TS: AR + Threshold-based rules | Both | P | Number of requests | Zabbix | - | Synthetic | Hybrid: Amazon EC2 + Custom testbed (Xen + Eucalyptus + PhpCollab application)
[94] | TS: AR | H | P | Number of users in the system | - | Response time, VM cost, application reconfiguration cost | Real. World Cup 98 | No experimentation on systems
[67] | TS: ML Neural Network and (Multiple) LR + Sliding window | H | P | CPU load (aggregated value for all VMs) | Amazon CloudWatch. 1 minute | Prediction accuracy | Synthetic. TPC-W generator, constant growing | Real provider. Amazon EC2 and TPC-W application to generate the dataset. Prediction models are only evaluated using cross-validation and several accuracy metrics. Experiments in R-Project
[91] | TS: ML Neural Network (compared to MA, last value and simple ES) | H | P | Number of entities (players) | 2 minutes | Prediction accuracy | Synthetic. Entities (players) with different behaviors | Simulator of a MMORPG game
[56] | TS: Polynomial regression | Both | P | Number of requests | Custom tool. 1 minute | Response time, cost for VM, application license and reconfiguration | Real traces. From a production data center of a company | Custom testbed. KVM + Olio
[66] | Threshold-based rules (scale out) + TS, polynomial regression (scale in) | H | R/P | CPU load (scale out), number of requests, number of VMs (scale in) | 1 minute | Response time | Synthetic. Httperf | Custom testbed. Eucalyptus + RUBiS
[49] | TS: AR(1) and Histogram + QT | H | P | Request rate and service demand | Simulated. 1, 5, 10 and 20 minutes | Response time | Synthetic (Poisson distribution) and Real (World Cup 98) | Custom simulator + algorithms in Matlab
[94]). For example, Roy et al [94] applied AR for workload
prediction, based on the last three observations. The pre-
dicted value is then used to estimate the response time. An
optimization controller takes this response time as an input
and computes the best resource allocation, taking into ac-
count the costs of SLA violations, leasing resources and re-
configuration. Kupferman et al [73] applied auto-regression
of order 1 to predict the request rate (requests per second)
and found that its performance depends largely on several
manager-defined parameters: the monitoring-interval length,
the size of the history window and the size of the adaptation
window. The history window determines the sensitivity of
the algorithm to short-term versus long-term trends, while
the size of the adaptation window determines how far into
the future the model extends.
ARMA models are able to capture characteristics of a
time series such as the input workload or the CPU usage.
Fang et al [57] found it useful to predict the future CPU us-
age of VMs. However, they remark on the computational cost of this technique, which includes the choice of p and q, the
estimation of the coefficients of each term and other param-
eters.
The history window values can also be the input for
a neural network [67] [91] or a multiple linear regression
equation [73], [43], [67]. The accuracy of both methods de-
pends on the input window size. Indeed, Islam et al [67]
obtained better results when using more than one past value
for prediction. Kupferman et al [73] further investigated the
topic and found that it is necessary to balance the size of
each sample in the window, to avoid overreacting, but also to
maintain a correct level of sensitivity to workload changes.
They propose regressing over windows of different sizes,
and then using the mean of all predictions. Another impor-
tant issue is the choice of the prediction interval r. Islam et al
[67] propose using a 12-minute interval, because the setup
time of VM instances in the cloud is typically around 5-15
min. In another context, Prodan and Nae [91] use a neural
network to predict a game load (i.e. the number of entities
or players) in the next two minutes. The neural network ob-
tained better accuracy than moving average and simple ex-
ponential smoothing.
Most time series analysis techniques have been applied
to vertical or horizontal scaling separately. Dutta et al [56]
claim that vertical scaling has limited range but has lower re-
source and configuration costs, while horizontal scaling can
allow the application to achieve a much larger throughput
but at a potentially higher cost. For this reason, they com-
bine both VM resizing in case of small increments in the re-
quest rate, and apply horizontal scaling for major changes in
the input workload. They use a polynomial regression to es-
timate the expected number of requests for the next interval.
Fang et al [57] focus on vertical scaling (CPU and memory)
for regular changes in workload, whereas horizontal scaling
is applied in order to handle sudden spikes and flash crowds.
Time series forecasting (associated to proactive decision
making) can be combined with reactive techniques. For ex-
ample, Iqbal et al [66] proposed a hybrid scaling technique
that utilizes reactive rules for scaling up (based on CPU us-
age) and a regression-based approach for scaling down. Af-
ter a fixed number of intervals in which response time is
satisfied, they calculate the required number of application-
tier and database-tier instances using polynomial regression
(of degree two).
Some auto-scaling proposals use time series analysis tech-
niques that deal with pattern identification, applied to the in-
put workload [46], [47], [60], [95]. The most complete com-
parison of this class of techniques is done by Gong et al [60];
they propose using FFT to identify repeating patterns in re-
source usage (CPU, memory, I/O and network), and com-
pare it with auto-correlation, auto-regression and histogram.
Pattern matching, proposed by Caron et al [46] [47], has two
main drawbacks: the large number of parameters in the al-
gorithm (such as the maximum number of matches or the
length of the predicted sequence), that highly affect the per-
formance of the algorithm, and the time required to explore
the past history trace.
Simple histograms have also been used by some authors
to predict the resource usage of applications, considering the
mean of the distribution [49], or the mean of the bin with the
highest frequency [60].
Time series analysis techniques are very appealing for
implementing auto-scalers, as they are able to predict future
demands arriving to elastic applications. Having this infor-
mation, it is possible to provide resources in advance and
deal with the time required to start up new VMs or add re-
sources to a particular instance. Despite the potential of this
set of techniques, their main drawback lies in the prediction accuracy, which highly depends on the target application,
input workload pattern and/or burstiness, the selected met-
ric, the history window and prediction interval, as well as on
the specific technique being used. Efforts should be focused
on automating the selection of the best prediction technique
for a particular application or application class.
Table 5 contains a summary of the articles reviewed in
this section.
6 Conclusions and Future Work
The focus of this review has been on auto-scaling elastic
applications in cloud environments. While capacity plan-
ning is required for any scalable application, the elastic-
ity provided by cloud infrastructures allows for an almost-
immediate adaptation of resources to application needs (driven
by an external demand). Auto-scalers try to automate this
adaptation, minimizing resource-related costs while allow-
ing the application to comply with the SLA.
In Section 3, the auto-scaling task was defined as a MAPE
process, composed of four phases: monitor, analyze, plan
and execute. Given the extensive literature about auto-scaling
for elastic applications, a classification criterion has been
proposed to organize the different proposals into five main
categories: threshold-based rules, reinforcement learning, queu-
ing theory, control theory and time series analysis. Each of
these categories has been described separately, including their
pros/cons, together with a critical review of relevant articles
using the technique that defines the category.
One of the conclusions that can be extracted from this
survey is that reactive auto-scaling systems might not be
able to cope with abrupt changes in the input workload, spe-
cially in the case of sudden traffic surges. One of the rea-
sons is the time required to acquire and set-up new resources
(e.g. the boot-up time of a new instance). It seems advis-
able to redirect research efforts towards developing proac-
tive auto-scaling systems able to predict future needs and ac-
quire the corresponding resources with enough anticipation,
maybe using some reactive methods to correct possible pre-
diction errors. As an example, Moore et al [86] demonstrate
that, compared to a purely reactive controller, their auto-
scaling system (based on both reactive and proactive mod-
els) was able to make better provisioning decisions, yielding
few QoS violations. Additionally, attention should be paid
to methods to reduce the time necessary to provision new
VMs, including those that take advantage of vertical scaling
actions, which usually need only seconds to be fulfilled.
In our opinion, auto-scaling systems should take advan-
tage of the prediction capabilities of time series analysis
techniques, together with the automation capabilities of con-
trollers. It is our plan to propose a predictive auto-scaling
technique based on time series forecasting algorithms. The
review of the literature has shown that the accuracy of these
algorithms depends on different parameters such as the sizes
of the history window and the adaptation window. Optimiza-
tion techniques could be used to adjust these values, in order
to customize the parameters for a given scenario.
When going through this review, the reader should have
noticed the large diversity of testing methodologies that au-
thors have used to assess their proposals (many of them are
described in the companion Appendix). Authors use differ-
ent ways of generating input workload, different types of ap-
plications (realistic or simulated), different SLA definitions,
different execution platforms, etc. In fact, the lack of a com-
mon testing platform capable of generating a well-defined
set of standardized metrics is the reason that prevented a
comparative analysis of the reviewed techniques in quan-
titative terms. We are currently working in the development
of a simulation-based workbench that could provide a com-
mon testing platform. It has been used already to implement
and compare several auto-scaling techniques [77], although
this work is still preliminary. In the same line, it would be of
interest to define a commonly accepted scoring metric to ob-
jectively determine whether one auto-scaling algorithm per-
forms better than another in a particular scenario.
Throughout this review it has been assumed that a client
runs an elastic application on a single and homogeneous
cloud infrastructure. However, clients may need to deploy
their applications on hybrid or federated clouds. Hybrid clouds
combine resources from both a private and a public cloud,
while federated systems comprise different public providers.
The combination of different cloud platforms represents ad-
ditional challenges for the auto-scaling task. For example,
the monitoring information has to be gathered from different
sources, probably using different tools with their own APIs.
The list of performance metrics available and the granular-
ity level will depend on the provider itself. In the analysis
phase, an extra effort will be necessary to select the suitable
metric from each provider, maybe requiring mapping func-
tions to combine different metrics. After making a scaling
decision, a variety of APIs may be available to implement
it in different platforms, and the options and their conse-
quent effects may vary greatly. Examples of these platform-
dependent particularities are the availability of horizontal or
vertical scaling (or any of them), the capabilities of the dif-
ferent VM templates and the billing schemes (per hour, per
minute). Different efforts have been carried out towards sim-
plifying the use of federated/hybrid models [70] [84], or the
interoperability among different cloud platforms, including
the proposal of open standards, such as Open Virtualization
Format (OVF), Open Cloud Computing Interface (OCCI)
and Cloud Data Management Interface (CDMI). However,
orchestrating the auto-scaling capabilities of different cloud
providers is still an open challenge.
Finally, the problem of auto-scaling is closely related
to an infrastructure-related one: VM placement, the actual
mapping of the VMs forming an application onto the phys-
ical servers of the cloud provider. We are currently working
on ways to optimize the placement of fixed-size applications
in order to maximize the revenue for the cloud provider,
while satisfying the resource SLA. Auto-scaling capability
of (elastic) applications imposes an additional challenge to
the provider, as the set of VMs of an application varies with
time.
Acknowledgements This work has been partially supported by the
Saiotek and Research Groups 2007-2012 (IT-242-07) programs (Basque
Government), TIN2010-14931 and COMBIOMED network in compu-
tational biomedicine (Carlos III Health Institute). Dr. Miguel-Alonso is a member of the HIPEAC European Network of Excellence. Mrs Lorido-Botrán is supported by a doctoral grant from the Basque Government.
Finally, we would like to thank the anonymous reviewers for their sug-
gestions and comments that greatly contributed to improve this survey.
A Performance Evaluation in the Cloud: Experimental
Platforms, Application Benchmarks and Workloads
The auto-scaling techniques proposed in the literature have been tested
under very different conditions, which makes it impossible to come up
with a fair comparative assessment. Researchers in the field have built
their own evaluation platforms, suitable to their own needs. These eval-
uation platforms can be classified into simulators, custom testbeds and
public cloud providers. Except for simple simulators, a scalable appli-
cation benchmark has to be executed on the platform to carry out an
evaluation. The input workload to the application can be either syn-
thetic, generated with specific programs, or obtained from real users.
A.1 Experimental Platforms
Experimentation could be done in production infrastructures, either
from real cloud providers or in a private cloud. The major advantage of
using a real cloud platform is that proposals can be checked in actual
scenarios, thus proving a proof of suitability. However, it has a clear
drawback: for each experimentation, the whole scenario needs to be
set. In case of using a public cloud provider, the infrastructure is al-
ready given, but the researcher still needs to configure the monitoring
and auto-scaling system, deploy an application benchmark and a load
generator over a pool of VMs, figure out how to extract the informa-
tion and store it for later processing. Probably, each execution will be
charged according to the fees established by the cloud provider.
In order to reduce the experimentation cost and to have a more
controlled environment, a custom testbed could be set up. This has a
cost in terms of system configuration effort (in addition to buying the
hardware, if not available). An initial, non-trivial step consists of in-
stalling the virtualization software that will manage the VM creation,
support scaling and so on. Virtualization can be applied at the server
level, OS level or at the application level. For custom testbeds, a server
level virtualization environment is needed, commonly referred to as
Hypervisor or Virtual Machine Monitor (VMM). Some popular hyper-
visors are Xen [36], VMWare ESXi [32] and Kernel Virtual Machine
(KVM) [15]. There are several platforms for deploying custom clouds,
including open-source alternatives like OpenStack [19] and Eucalyptus
[9], and commercial software like vCloud Director [31]. OpenStack is
an open-source initiative supported by many cloud-related companies
such as RackSpace, HP, Intel and AMD, with a large customer-base.
Eucalyptus enables the creation of on-premises Infrastructure as a Ser-
vice clouds, with support for Xen, KVM and ESXi, and the Amazon
EC2 API. VCloud Director is the commercial platform developed by
VMWare.
As an alternative to a real infrastructure (custom testbed or public
cloud provider), software tools could be used to simulate the function-
ing of a cloud platform, including resource allocation and deallocation,
VM execution, monitoring and the remaining cloud management tasks.
The researcher could select an already existing simulator and adapt it
to her needs, or implement a new one from scratch. Obviously, using a
simulator implies an initial effort to adapt or develop the software but,
in contrast, has many advantages. The evaluation process is shortened
in many ways. Simulation allows testing multiple algorithms, avoiding
infrastructure re-configurations. Besides, simulation isolates the exper-
iment from the influence of external, uncontrolled factors (e.g. interfer-
ence with other applications, provider-induced VM consolidation pro-
cesses, etc.), something impossible in real cloud infrastructures. Ex-
periments carried out in real infrastructures may last hours, whereas
in an event-based simulator, this process may take just a few minutes.
Simulators are highly configurable and allow the researcher to gather
any information about system state or performance metrics. In spite
of the advantages mentioned, simulated environments are still an ab-
straction of physical machine clusters, thus the reliability of the results
will depend on the level of implementation detail considered during the
development. Some research-oriented cloud simulators are CloudSim
[7], GreenCloud [14], and GroudSim [87].
A.2 Application Benchmarks
A scalable application benchmark is executed on top of an experimen-
tal platform, based on a public cloud provider or a custom testbed, in
order to measure the performance of the system. Simulated experimen-
tal platforms do not always require the use of a benchmark: a simplified
view of an application may be part of the simulated model. Typically,
benchmarks comprise a web application together with a workload gen-
erator that creates synthetic session-based requests to the application.
Some commonly used benchmarks for cloud research are RUBiS [1],
TPC-W [30], CloudStone [8] and WikiBench [33]. Although both RU-
BiS and TPC-W benchmarks are out-dated or declared obsolete, they
are still being used by the research community.
RUBiS [1]: It is a prototype of an auction website modeled after eBay.
It offers the core functionality of an auction site (selling, brows-
ing and bidding) and supports three kinds of user sessions: visitor,
buyer, and seller. The applications consists of three main compo-
nents: Apache load balancer server, JBoss application server and
MySQL database server. The last update in this benchmark was in
2008.
TPC-W [30]: TPC [29] is a nonprofit organization founded to define
transaction processing and database benchmarks. Among them,
TPC-W is a complex e-commerce application, specifically an on-
line bookshop. It simulates three different profiles: primarily shop-
ping, browsing and web-based ordering. The performance metric
reported is the number of web interactions processed per second.
It was declared obsolete in 2005.
CloudStone [8]: It is a multi-platform performance measurement tool
for Web 2.0 and cloud computing developed by the Rad Lab group
at the University of Berkeley. CloudStone includes a flexible, real-
istic workload generator (Faban) to generate load against a realistic
Web 2.0 application (Olio). The stack can be deployed on Amazon
EC2 instances. Olio 2.0 is a two-tier social networking benchmark,
with a web frontend and a database backend. The application met-
ric is the number of active users of the social networking applica-
tion, which drives the throughput or the number of operations per
second.
WikiBench [33]: It is a web hosting benchmark, that uses real data
from the Wikipedia database, and generates real traffic based on
the Wikipedia access traces. The core application is MediaWiki
[16], a free open source wiki package originally used on the Wikipedia
website.
There are other, less frequently used benchmarks such as RUBBoS [24], SPECweb [97] and TPC-C [28]. RUBBoS is a bulletin board benchmark modeled after an online news forum such as Slashdot; it was last updated in 2005. SPECweb, created by the Standard Performance Evaluation Corporation (SPEC), generates synthetic banking, e-commerce and support (large downloads) workloads. SPECweb was discontinued in 2012, and SPEC has since created a cloud benchmarking group [26]. TPC-C is an on-line transaction processing benchmark that simulates a complete computing environment in which a population of users executes transactions against a database; its performance metric is the number of transactions per minute.
A.3 Workloads
As described before, workloads represent inputs to be processed by
application benchmarks. They can be generated using some patterns
or gathered from real cloud applications (and typically stored in trace
files).
Cloud-based systems process two main classes of workload: batch and transactional. Batch workloads consist of arbitrary, long-running, resource-intensive jobs, such as scientific programs or video transcoding. The best-known examples of transactional workloads are web applications built to serve on-line HTTP clients. These systems usually serve content such as HTML pages, images or video streams, which can be stored statically or rendered dynamically by the servers.
A.3.1 Synthetic Workloads
Synthetic workloads can be generated based on different patterns. According to Mao and Humphrey [79][2], there are four representative workload patterns in cloud environments: Stable, Growing, Cyclic/Bursting and On-and-off. Each of them represents a typical application or scenario. A Stable workload is characterized by a constant number of requests per minute. In contrast, a Growing pattern shows a load that increases rapidly, for example, when a piece of news suddenly becomes popular. The third pattern is called Cyclic/Bursting because the load may follow regular cycles (e.g. more load during the daytime than at night) or show bursts on specific dates (e.g. a special offer). The last typical pattern is the On-and-off workload, which may represent batch processing or data analysis performed every day.
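The sketch below gives a rough illustration of how these four patterns can be synthesized as request-rate time series (in requests per minute); the concrete rates, periods and noise levels are illustrative assumptions and are not taken from [79] or [2].

import math
import random

def stable(t):
    # Roughly constant number of requests per minute.
    return max(0.0, 200 + random.gauss(0, 10))

def growing(t):
    # Load that ramps up quickly, e.g. a piece of news becoming popular.
    return max(0.0, 50 + 5 * t + random.gauss(0, 10))

def cyclic_bursting(t, burst_at=300, burst_len=30):
    # Daily cycle plus a short burst, e.g. a special offer on a given date.
    base = 200 + 150 * math.sin(2 * math.pi * t / 1440)
    burst = 400 if burst_at <= t < burst_at + burst_len else 0
    return max(0.0, base + burst + random.gauss(0, 10))

def on_and_off(t, period=240, duty=0.5):
    # Alternating active/idle phases, e.g. a batch job run every day.
    active = (t % period) < duty * period
    return max(0.0, 300 + random.gauss(0, 5)) if active else 0.0

if __name__ == "__main__":
    patterns = {"stable": stable, "growing": growing,
                "cyclic/bursting": cyclic_bursting, "on-and-off": on_and_off}
    for name, f in patterns.items():
        series = [f(t) for t in range(1440)]   # one simulated day, per minute
        print(f"{name:16s} min={min(series):7.1f}  max={max(series):7.1f}")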
There is a broad range of workload generators; they may create either simple requests or complete HTTP sessions that mix different actions (e.g. login or browsing) and simulate user think times. Examples of workload generators are:
Faban [8]: A Markov-based workload generator, included in the Cloud-
Stone stack.
Apache JMeter [4]: A workload generator implemented in Java, used
for load testing and measuring performance. It can be used to test
performance on both static and dynamic resources (files, servlets,
Perl scripts, databases and queries, FTP servers and more). It can
also generate heavy loads for a server, network or object, either to
test its strength or to analyze the overall performance under differ-
ent scenarios.
Rain [21]: A workload generation toolkit that uses parameterized sta-
tistical distributions in order to model different classes of work-
load.
Httperf [27]: A tool for measuring web server performance. It pro-
vides a flexible facility for generating various HTTP workloads
and for measuring server performance.
Synthetic workloads are suitable for carrying out controlled experiments. For example, the workload can be tuned to test the system under different numbers of users or request rates, with smooth increments or sudden peaks. However, synthetic workloads may not be realistic enough, which makes it necessary to also use workloads collected from real production systems.
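As an example of such controlled experimentation, the sketch below drives httperf through a sequence of increasing request rates. The target host and URI are placeholders, and although the options used (--server, --port, --uri, --rate, --num-conns, --timeout) are standard httperf flags, they should be checked against the installed httperf version.

import subprocess

SERVER, URI = "webapp.example.com", "/index.html"   # hypothetical target

def run_httperf(rate, duration_s=60):
    # Run one httperf step at a fixed request rate and return its report.
    num_conns = rate * duration_s            # one connection per request
    cmd = ["httperf", "--server", SERVER, "--port", "80", "--uri", URI,
           "--rate", str(rate), "--num-conns", str(num_conns),
           "--timeout", "5"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout

if __name__ == "__main__":
    # Smooth increments: 50, 100, ..., 300 requests per second.
    for rate in range(50, 301, 50):
        print(f"--- rate={rate} req/s ---")
        print(run_httperf(rate))

The same wrapper structure applies to JMeter or Rain by replacing the command line; the point is that the request rate becomes an explicit, scriptable experimental variable.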
A.3.2 Real Workloads
To the best of our knowledge, there are no publicly available, real traces
from cloud-hosted applications, and this is an evident drawback for
cloud research. In the literature, some authors have generated their own
traces, running benchmarks or real applications in cloud platforms.
There are also some references to traces from private clouds that have
not been published. However, most authors have used traces from In-
ternet servers, such as the ClarkNet trace [6], the World Cup 98 trace
[35] and Wikipedia access traces [34].
The ClarkNet trace [6] contains the HTTP requests received by the ClarkNet server over a two-week period in 1995. ClarkNet is an Internet access provider for the metro Baltimore-Washington DC area. The trace shows a clear cyclic workload pattern (see Figure 3): there is more load during the daytime than at night, and the load on weekends is lower than on weekdays.
Fig. 3 Number of requests per minute for ClarkNet Trace
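Traces such as ClarkNet are distributed as web-server access logs in Common Log Format; the sketch below shows one possible way to aggregate such a log into the requests-per-minute series plotted in Figure 3. The file name is hypothetical and the timestamp pattern is an assumption that should be adapted to the actual trace files.

import re
from collections import Counter
from datetime import datetime

# Assumed plain-text Common Log Format, e.g.:
# host - - [28/Aug/1995:00:00:34 -0400] "GET /page.html HTTP/1.0" 200 1839
LINE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})')

def requests_per_minute(path):
    # Count the requests falling into each one-minute bucket of the log.
    counts = Counter()
    with open(path, errors="replace") as log:
        for line in log:
            match = LINE_RE.search(line)
            if not match:
                continue
            ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
            counts[ts.replace(second=0)] += 1
    return counts

if __name__ == "__main__":
    counts = requests_per_minute("clarknet_access_log")   # hypothetical file
    for minute in sorted(counts)[:10]:
        print(minute.isoformat(), counts[minute])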
The World Cup 98 trace [35] has been extensively used in the lit-
erature. It contains all the HTTP requests made to the 1998 World Cup
Web site between April 30, 1998 and July 26, 1998.
More recently, several Wikipedia access traces were published for
research purposes. They contain the URL requests made to the Wikipedia
servers between September, 2007 and January, 2008. Most requests are
read-only queries to the database.
Some authors have used traces from Grid environments [46]. Although there is an extensive number of public Grid traces, their job-execution scheme is not suitable for evaluating elastic applications; however, they could be useful for batch-based workloads.
It is also worth mentioning the Google Cluster Data [12], two sets
of traces that contain the workloads running on Google compute cells.
The first dataset refers to a 7-hour period and consists of a set of tasks.
However, the data records have been anonymized, and the CPU and
RAM consumption have been normalized and obscured using a linear
transformation. The second trace includes significantly more informa-
tion about jobs, machine characteristics and constraints. This trace in-
cludes data from an 11k-machine cell over about a month-long period.
As in the first trace, all the numeric data have been normalized, and there is no information about the job type. For this reason, these Google traces cannot be directly utilized to test auto-scaling techniques, although they could be useful in other cloud-related research scenarios.
References
1. (2009) RUBiS: Rice University Bidding System. http://
rubis.ow2.org/, [Online; accessed 13-September-2012]
2. (2010) Workload Patterns for Cloud Computing. http:
//watdenkt.veenhof.nu/2010/07/13/workload-patterns-
for-cloud- computing/, [Online; accessed 29-January-2014]
3. (2012) Amazon Elastic Compute Cloud (Amazon EC2). http:
//aws.amazon.com/ec2/, [Online; accessed 13-September-
2012]
4. (2012) Apache JMeter. http://jmeter.apache.org/, [Online;
accessed 18-September-2012]
5. (2012) AWS Elastic Beanstalk (beta). Easy to begin, Impossible
to outgrow. http://aws.amazon.com/elasticbeanstalk/,
[Online; accessed 13-September-2012]
6. (2012) ClarkNet HTTP Trace (From the Internet Traf-
fic Archive). http://ita.ee.lbl.gov/html/contrib/
ClarkNet-HTTP.html, [Online; accessed 13-September-2012]
7. (2012) CloudSim: A Framework for Modeling and Simulation
of Cloud Computing Infrastructures and Services. http://www.
cloudbus.org/cloudsim/, [Online; accessed 18-September-
2012]
8. (2012) CloudStone Project by Rad Lab Group. http://radlab.
cs.berkeley.edu/wiki/Projects/Cloudstone/, [Online; ac-
cessed 13-September-2012]
9. (2012) Eucalyptus Cloud. http://www.eucalyptus.com/, [On-
line; accessed 18-September-2012]
10. (2012) Google App Engine. http://cloud.google.com/
products/, [Online; accessed 13-September-2012]
11. (2012) Google Apps for Business. http://www.google.com/
intl/es/enterprise/apps/business/products.html, [On-
line; accessed 13-September-2012]
12. (2012) Google Cluster Data. Traces of Google workloads.
http://code.google.com/p/googleclusterdata/, [Online;
accessed 13-September-2012]
13. (2012) Google Compute Engine. http://cloud.google.
com/products/compute-engine.html/, [Online; accessed 13-
September-2012]
14. (2012) Greencloud - The green cloud simulator.
http://greencloud.gforge.uni.lu/, [Online; accessed
18-September-2012]
15. (2012) Kernel Based Virtual Machine. http://www.linux-
kvm.org/, [Online; accessed 18-September-2012]
16. (2012) MediaWiki. http://www.mediawiki.org/wiki/
MediaWiki, [Online; accessed 24-November-2012]
17. (2012) Microsoft Office 365. http://www.microsoft.com/en-
us/office365/online-software.aspx, [Online; accessed 13-
September-2012]
18. (2012) Microsoft Windows Azure. https://www.
windowsazure.com/en-us/, [Online; accessed 13-September-
2012]
19. (2012) OpenStack Cloud Software. Open source software for
building private and public clouds. http://www.openstack.
org/, [Online; accessed 18-September-2012]
20. (2012) Rackspace. The open cloud company. http://www.
rackspace.com/, [Online; accessed 13-September-2012]
21. (2012) Rain Workload Toolkit. https://github.com/
yungsters/rain-workload- toolkit/wiki, [Online; ac-
cessed 13-September-2012]
22. (2012) RightScale Cloud Management. http://www.
rightscale.com/, [Online; accessed 13-September-2012]
23. (2012) RightScale. Set up Autoscaling using Voting Tags.
http://support.rightscale.com/03-Tutorials/02-
AWS/02-Website_Edition/Set_up_Autoscaling_using_
Voting_Tags, [Online; accessed 13-September-2012]
24. (2012) RUBBoS: Bulletin Board Benchmark. http:
//jmob.ow2.org/rubbos.html/, [Online; accessed 18-
September-2012]
25. (2012) Salesforce.com. http://www.salesforce.com/, [On-
line; accessed 13-September-2012]
26. (2012) SPEC forms cloud benchmarking group. http://www.
spec.org/osgcloud/press/cloudannouncement20120613.
html, [Online; accessed 18-September-2012]
27. (2012) The httperf HTTP load generator. http://code.google.
com/p/httperf/, [Online; accessed 18-September-2012]
28. (2012) TPC-C. http://www.tpc.org/tpcc/default.asp/,
[Online; accessed 18-September-2012]
29. (2012) TPC. Transaction Processing Performance Council.
http://www.tpc.org/default.asp, [Online; accessed 13-
September-2012]
30. (2012) TPC-W. http://www.tpc.org/tpcw/default.asp,
[Online; accessed 13-September-2012]
31. (2012) VMware vCloud Director. Deliver Complete Vir-
tual Datacenters for Consumption in Minutes. http://www.
eucalyptus.com/, [Online; accessed 18-September-2012]
32. (2012) VMware vSphere ESX and ESXi Info Center.
http://www.vmware.com/es/products/datacenter-
virtualization/vsphere/esxi-and- esx/overview.html,
[Online; accessed 18-September-2012]
33. (2012) WikiBench: A Web hosting benchmark. http://www.
wikibench.eu, [Online; accessed 24-November-2012]
34. (2012) Wikipedia access traces. http://www.wikibench.eu/
?page_id=60, [Online; accessed 24-November-2012]
35. (2012) World Cup 98 Trace (From the Internet Traffic Archive).
http://ita.ee.lbl.gov/html/contrib/WorldCup.html,
[Online; accessed 13-September-2012]
36. (2012) Xen hypervisor. http://www.xen.org/, [Online; accessed 18-September-2012]
37. Albus J (1975) A new approach to manipulator control: The cere-
bellar model articulation controller (CMAC). Transaction of the
ASME, Journal of dynamic systems, measurement and control
38. Alhamazani K, Ranjan R, Mitra K, Rabhi F, Khan SU, Guabtni
A, Bhatnagar V (2013) An Overview of the Commercial Cloud
Monitoring Tools: Research Dimensions, Design Issues, and
State-of-the-Art. arXiv preprint arXiv:1312.6170
39. Ali-Eldin A, Kihl M, Tordsson J, Elmroth E (2012) Efficient
provisioning of bursty scientific workloads on the cloud using
adaptive elasticity control. In: Proceedings of the 3rd workshop
on Scientific Cloud Computing Date - ScienceCloud ’12, ACM
Press, New York, New York, USA, p 31, DOI 10.1145/2287036.
2287044
40. Ali-Eldin A, Tordsson J, Elmroth E (2012) An adaptive hybrid
elasticity controller for cloud infrastructures. In: Network Opera-
tions and Management Symposium (NOMS), 2012 IEEE, IEEE,
pp 204–212
41. Bacigalupo DA, van Hemert J, Usmani A, Dillenberger DN,
Wills GB, Jarvis SA (2010) Resource management of enterprise
cloud systems using layered queuing and historical performance
models. In: 2010 IEEE International Symposium on Parallel &
Distributed Processing, Workshops and Phd Forum (IPDPSW),
IEEE, pp 1–8, DOI 10.1109/IPDPSW.2010.5470782
42. Barrett E, Howley E, Duggan J (2012) Applying reinforcement
learning towards automating resource allocation and application
scalability in the cloud. Concurrency and Computation: Practice
and Experience
43. Bodík P, Griffith R, Sutton C, Fox A, Jordan M, Patterson
D (2009) Statistical machine learning makes automatic control
practical for internet datacenters. HotCloud’09 Proceedings of
the 2009 conference on Hot topics in cloud computing p 12
44. Brown R, Meyer R (1961) The fundamental theorem of exponen-
tial smoothing. Operations Research
45. Bu X, Rao J, Xu CZ (2012) Coordinated Self-configuration of
Virtual Machines and Appliances using A Model-free Learning
Approach. IEEE Transactions on Parallel and Distributed Sys-
tems pp 1–1, DOI 10.1109/TPDS.2012.174
46. Caron E, Desprez F, Muresan A (2010) Forecasting for Cloud
computing on-demand resources based on pattern matching. Re-
search Report RR-7217, INRIA
47. Caron E, Desprez F, Muresan A (2011) Pattern Matching Based
Forecast of Non-periodic Repetitive Behavior for Cloud Clients.
Journal of Grid Computing 9(1):49–64, DOI 10.1007/s10723-
010-9178- 4
48. Casalicchio E, Silvestri L (2013) Autonomic Management of
Cloud-Based Systems: The Service Provider Perspective. In: Ge-
lenbe E, Lent R (eds) Computer and Information Sciences III,
Springer London, pp 39–47, DOI 10.1007/978-1-4471-4594-3_5
49. Chandra A, Gong W, Shenoy P (2003) Dynamic resource alloca-
tion for shared data centers using online measurements. Proceed-
ings of the 11th international conference on Quality of service pp
381–398
50. Chen G, He W, Liu J, Nath S, Rigas L, Xiao L, Zhao F
(2008) Energy-aware server provisioning and load dispatching
for connection-intensive internet services. In: Proceedings of the
5th USENIX Symposium on Networked Systems Design and Im-
plementation, USENIX Association, vol 8, pp 337–350
51. Chieu TC, Mohindra A, Karve AA, Segal A (2009) Dynamic
scaling of web applications in a virtualized cloud computing en-
vironment. In: e-Business Engineering, 2009. ICEBE’09. IEEE
International Conference on, Ieee, pp 281–286
52. Chieu TC, Mohindra A, Karve AA (2011) Scalability and Perfor-
mance of Web Applications in a Compute Cloud. In: e-Business
Engineering (ICEBE), 2011 IEEE 8th International Conference
on, IEEE, pp 317–323
53. Cormen TH, Stein C, Rivest RL, Leiserson CE (2001) Introduc-
tion to Algorithms, Chapter 32: String Matching. McGraw-Hill
Higher Education
54. Dutreilh X, Moreau A, Malenfant J, Rivierre N, Truck I (2010)
From data center resource allocation to control theory and back.
In: Cloud Computing (CLOUD), 2010 IEEE 3rd International
Conference on, IEEE, pp 410–417
55. Dutreilh X, Kirgizov S, Melekhova O, Malenfant J, Rivierre N,
Truck I (2011) Using Reinforcement Learning for Autonomic
Resource Allocation in Clouds: towards a fully automated work-
flow. In: Seventh International Conference on Autonomic and
Autonomous Systems, ICAS 2011, IEEE, pp 67–74
56. Dutta S, Gera S, Verma A, Viswanathan B (2012) SmartScale:
Automatic Application Scaling in Enterprise Clouds. In: 2012
IEEE Fifth International Conference on Cloud Computing, IEEE,
pp 221–228, DOI 10.1109/CLOUD.2012.12
57. Fang W, Lu Z, Wu J, Cao Z (2012) RPPS: A Novel Resource
Prediction and Provisioning Scheme in Cloud Data Center. In:
2012 IEEE Ninth International Conference on Services Comput-
ing, IEEE, pp 609–616, DOI 10.1109/SCC.2012.47
58. Gambi A, Toffetti G (2012) Modeling Cloud performance with
Kriging. In: 2012 34th International Conference on Software
Engineering (ICSE), IEEE, pp 1439–1440, DOI 10.1109/ICSE.
2012.6227075
59. Ghanbari H, Simmons B, Litoiu M, Iszlai G (2011) Exploring
Alternative Approaches to Implement an Elasticity Policy. In:
Cloud Computing (CLOUD), 2011 IEEE International Confer-
ence on, IEEE, pp 716–723
60. Gong Z, Gu X, Wilkes J (2010) Press: Predictive elastic resource
scaling for cloud systems. In: Network and Service Management
(CNSM), 2010 International Conference on, IEEE, pp 9–16
61. Guitart J, Torres J, Ayguadé E (2010) A survey on performance
management for internet applications. Concurrency and Compu-
tation: Practice and Experience 22(1):68–106, DOI 10.1002/cpe.
1470
62. Han R, Ghanem MM, Guo L, Guo Y, Osmond M (2012) En-
abling cost-aware and adaptive elasticity of multi-tier cloud ap-
plications. Future Generation Computer Systems null(null), DOI
10.1016/j.future.2012.05.018
63. Han R, Guo L, Ghanem MM, Guo Y (2012) Lightweight Resource Scaling for Cloud Applications. In: Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on
64. Hasan MZ, Magana E, Clemm A, Tucker L, Gudreddi SLD
(2012) Integrated and autonomic cloud resource scaling. In: Net-
work Operations and Management Symposium (NOMS), 2012
IEEE, IEEE, pp 1327–1334
65. Huang J, Li C, Yu J (2012) Resource prediction based on double
exponential smoothing in cloud computing. In: Consumer Elec-
tronics, Communications and Networks (CECNet), 2012 2nd In-
ternational Conference on, IEEE, pp 2056–2060
66. Iqbal W, Dailey MN, Carrera D, Janecek P (2011) Adaptive re-
source provisioning for read intensive multi-tier applications in
the cloud. Future Generation Computer Systems 27(6):871–879,
DOI 10.1016/j.future.2010.10.016
67. Islam S, Keung J, Lee K, Liu A (2012) Empirical prediction mod-
els for adaptive resource provisioning in the cloud. Future Gen-
eration Computer Systems 28(1):155–162, DOI 10.1016/j.future.
2011.05.027
68. Jacob B, Lanyon-Hogg R, Nadgir DK, Yassin AF (2004) A prac-
tical guide to the IBM autonomic computing toolkit
69. Kalyvianaki E, Charalambous T, Hand S (2009) Self-adaptive
and self-configured cpu resource provisioning for virtualized
servers using kalman filters. In: Proceedings of the 6th interna-
tional conference on Autonomic computing, ACM, pp 117–126
70. Kertesz A, Kecskemeti G, Oriol M, Kotcauer P, Acs S, Rodríguez M, Mercè O, Marosi AC, Marco J, Franch X (2013) Enhancing
Federated Cloud Management with an Integrated Service Moni-
toring Approach. Journal of Grid Computing pp 1–22
71. Khatua S, Ghosh A, Mukherjee N (2010) Optimizing the utiliza-
tion of virtual resources in Cloud environment. In: 2010 IEEE
International Conference on Virtual Environments, Human-
Computer Interfaces and Measurement Systems, IEEE, pp 82–
87, DOI 10.1109/VECIMS.2010.5609349
72. Koperek P, Funika W (2012) Dynamic Business Metrics-driven
Resource Provisioning in Cloud Environments. In: Wyrzykowski
R, Dongarra J, Karczewski K, Waśniewski J (eds) Parallel Pro-
cessing and Applied Mathematics, Lecture Notes in Computer
Science, vol 7204, Springer Berlin Heidelberg, pp 171–180, DOI
10.1007/978-3-642-31500-8_18
73. Kupferman J, Silverman J, Jara P, Browne J (2009) Scaling into
the cloud. Tech. rep., University of California, Santa Barbara;
CS270 - Advanced Operating Systems, URL http://cs.ucsb.
edu/~jkupferman/docs/ScalingIntoTheClouds.pdf
74. Lama P, Zhou X (2010) Autonomic Provisioning with Self-
Adaptive Neural Fuzzy Control for End-to-end Delay Guarantee.
In: 2010 IEEE International Symposium on Modeling, Analysis
and Simulation of Computer and Telecommunication Systems,
IEEE, pp 151–160, DOI 10.1109/MASCOTS.2010.24
75. Lim HC, Babu S, Chase JS, Parekh SS (2009) Automated control
in cloud computing: challenges and opportunities. In: Proceed-
ings of the 1st workshop on Automated control for datacenters
and clouds, ACM, New York, NY, USA, ACDC ’09, pp 13–18,
DOI 10.1145/1555271.1555275
76. Lim HC, Babu S, Chase JS (2010) Automated control for elas-
tic storage. In: Proceeding of the 7th international conference on
Autonomic computing - ICAC ’10, ACM Press, New York, New
York, USA, p 1, DOI 10.1145/1809049.1809051
77. Lorido-Botran T, Miguel-Alonso J, Lozano JA (2013) Compari-
son of Auto-scaling Techniques for Cloud Environments. In: Al-
berto A. Del Barrio, G. B. (editor), Actas de las XXIV Jornadas
de Paralelismo. Servicio de Publicaciones
78. Manvi SS, Shyam GK (2013) Resource management for Infras-
tructure as a Service (IaaS) in cloud computing: A survey. Journal
of Network and Computer Applications (0):–
79. Mao M, Humphrey M (2011) Auto-scaling to minimize cost and
meet application deadlines in cloud workflows. In: Proceedings
of 2011 International Conference for High Performance Comput-
ing, Networking, Storage and Analysis on - SC ’11, ACM Press,
New York, New York, USA, p 1, DOI 10.1145/2063384.2063449
80. Mao M, Humphrey M (2012) A Performance Study on the VM
Startup Time in the Cloud. In: Proceedings of the 2012 IEEE
Fifth International Conference on Cloud Computing, IEEE Com-
puter Society, Washington, DC, USA, CLOUD ’12, pp 423–430,
DOI 10.1109/CLOUD.2012.103
81. Maurer M, Brandic I, Sakellariou R (2011) Enacting slas in
clouds using rules. Euro-Par 2011 Parallel Processing
82. Maurer M, Breskovic I, Emeakaroha VC, Brandic I (2011) Re-
vealing the MAPE loop for the autonomic management of Cloud
infrastructures. In: Computers and Communications (ISCC),
2011 IEEE Symposium on, pp 147–152, DOI 10.1109/ISCC.
2011.5984008
83. Menasce DA, Dowdy LW, Almeida VAF (2004) Performance by
Design: Computer Capacity Planning By Example. Upper Saddle
River, NJ: Prentice Hall
84. Méndez Muñoz V, Casajús Ramo A, Fernández Albor V, Graciani Diaz R, Merino Arévalo G (2013) Rafhyc: an Archi-
tecture for Constructing Resilient Services on Federated Hy-
brid Clouds. Journal of Grid Computing 11(4):753–770, DOI
10.1007/s10723-013- 9279-y
85. Mi H, Wang H, Yin G, Zhou Y, Shi D, Yuan L (2010) Online self-
reconfiguration with performance guarantee for energy-efficient
large-scale cloud computing data centers. In: Services Comput-
ing (SCC), 2010 IEEE International Conference on, IEEE, pp
514–521
86. Moore LR, Bean K, Ellahi T (2013) Transforming reactive auto-
scaling into proactive auto-scaling. In: Proceedings of the 3rd In-
ternational Workshop on Cloud Data and Platforms, ACM, New
York, NY, USA, CloudDP ’13, pp 7–12, DOI 10.1145/2460756.
2460758
87. Ostermann S, Plankensteiner K, Prodan R, Fahringer T (2011)
GroudSim: An Event-Based Simulation Framework for Compu-
tational Grids and Clouds. In: Guarracino M, Vivien F, Träff J, Cannatoro M, Danelutto M, Hast A, Perla F, Knüpfer A, Martino
B, Alexander M (eds) Euro-Par 2010 Parallel Processing Work-
shops, Lecture Notes in Computer Science, vol 6586, Springer
Berlin Heidelberg, pp 305–313, DOI 10.1007/978-3-642-21878-
1_38
88. Padala P, Hou KY, Shin KG, Zhu X, Uysal M, Wang Z, Singhal
S, Merchant A (2009) Automated control of multiple virtualized
resources. In: Proceedings of the 4th ACM European conference
on Computer systems, ACM, pp 13–26
89. Park SM, Humphrey M (2009) Self-Tuning Virtual Machines
for Predictable eScience. In: 2009 9th IEEE/ACM International
Symposium on Cluster Computing and the Grid, IEEE, pp 356–
363, DOI 10.1109/CCGRID.2009.84
90. Patikirikorala T, Colman A (2010) Feedback controllers in the
cloud. APSEC 2010, Cloud workshop
91. Prodan R, Nae V (2009) Prediction-based real-time resource pro-
visioning for massively multiplayer online games. Future Gener-
ation Computer Systems 25(7):785–793, DOI 10.1016/j.future.
2008.11.002
92. Rao J, Bu X, Xu CZ, Wang L, Yin G (2009) VCONF: a reinforce-
ment learning approach to virtual machines auto-configuration.
In: Proceedings of the 6th international conference on Auto-
nomic computing, ACM, New York, NY, USA, ICAC ’09, pp
137–146, DOI 10.1145/1555228.1555263
93. Rao J, Bu X, Xu CZ, Wang K (2011) A distributed self-learning approach for elastic provisioning of virtualized cloud resources. In: 2011 IEEE 19th
Annual International Symposium on Modelling, Analysis, and
Simulation of Computer and Telecommunication Systems, IEEE,
pp 45–54, DOI 10.1109/MASCOTS.2011.47
94. Roy N, Dubey A, Gokhale A (2011) Efficient Autoscaling in the
Cloud Using Predictive Models for Workload Forecasting. In:
2011 IEEE 4th International Conference on Cloud Computing,
IEEE, pp 500–507, DOI 10.1109/CLOUD.2011.42
95. Shen Z, Subbiah S, Gu X, Wilkes J (2011) Cloudscale: Elastic
resource scaling for multi-tenant cloud systems. Proceedings of
the 2nd ACM Symposium on Cloud Computing
96. Simmons B, Ghanbari H, Litoiu M, Iszlai G (2011) Managing
a SaaS application in the cloud using PaaS policy sets and a
strategy-tree. In: Network and Service Management (CNSM),
2011 7th International Conference on, pp 1–5
97. SPECweb2009 (2012) SPECweb2009 benchmark. http://www.spec.org/web2009/, [Online; accessed 18-September-2012]
98. Sutton RS, Barto AG (1998) Introduction to Reinforcement
Learning. Cambridge Univ Press
99. Tesauro G, Jong NK, Das R, Bennani MN (2006) A Hybrid Re-
inforcement Learning Approach to Autonomic Resource Alloca-
tion. In: Proceedings of the 2006 IEEE International Conference
on Autonomic Computing, IEEE Computer Society, Washing-
ton, DC, USA, ICAC ’06, pp 65–73, DOI 10.1109/ICAC.2006.
1662383
100. Urgaonkar B, Shenoy P, Chandra A, Goyal P, Wood T (2008)
Agile dynamic provisioning of multi-tier Internet applications.
ACM Transactions on Autonomous and Adaptive Systems
3(1):1–39, DOI 10.1145/1342171.1342172
101. Villela D, Pradhan P, Rubenstein D (2004) Provisioning servers
in the application tier for e-commerce systems. In: Quality of
Service, 2004. IWQOS 2004. Twelfth IEEE International Work-
shop on, IEEE, pp 57–66
102. Wang L, Xu J, Zhao M, Fortes J (2011) Adaptive virtual re-
source management with fuzzy model predictive control. In: Pro-
ceedings of the 8th ACM international conference on Autonomic
computing - ICAC ’11, ACM Press, New York, New York, USA,
p 191, DOI 10.1145/1998582.1998623
103. Wang L, Xu J, Zhao M, Tu Y, Fortes JAB (2011) Fuzzy Modeling
Based Resource Management for Virtualized Database Systems.
In: Modeling, Analysis & Simulation of Computer and Telecom-
munication Systems (MASCOTS), 2011 IEEE 19th International
Symposium on, IEEE, pp 32–42
104. Watkins C, Dayan P (1992) Q-learning. Machine learning
105. Xu CZ, Rao J, Bu X (2012) URL: A unified reinforcement learn-
ing approach for autonomic cloud management. Journal of Par-
allel and Distributed Computing 72(2):95–105, DOI 10.1016/j.
jpdc.2011.10.003
106. Xu J, Zhao M, Fortes J, Carpenter R, Yousif M (2007) On the
Use of Fuzzy Modeling in Virtualized Data Center Management.
In: Proceedings of the Fourth International Conference on Au-
tonomic Computing, IEEE Computer Society, Washington, DC,
USA, ICAC ’07, p 25, DOI 10.1109/ICAC.2007.28
107. Zhang Q, Cherkasova L, Smirni E (2007) A regression-based
analytic model for dynamic resource provisioning of multi-tier
applications. In: Autonomic Computing, 2007. ICAC’07. Fourth
International Conference on, IEEE, p 27
108. Zhu Q, Agrawal G (2012) Resource Provisioning with Budget
Constraints for Adaptive Applications in Cloud Environments.
IEEE Transactions on Services Computing 5(4):497–511, DOI
10.1109/TSC.2011.61