Adaptive Resource Provisioning for Read Intensive
Multi-tier Applications in the Cloud
Waheed Iqbal (a), Matthew N. Dailey (a), David Carrera (b), Paul Janecek (a)
(a) Computer Science and Information Management, Asian Institute of Technology, Thailand
(b) Technical University of Catalonia (UPC), Barcelona Supercomputing Center (BSC), Spain
Abstract
A Service-Level Agreement (SLA) provides surety for specific quality at-
tributes to the consumers of services. However, current SLAs offered by cloud
infrastructure providers do not address response time, which, from the user’s
point of view, is the most important quality attribute for Web applications.
Satisfying a maximum average response time guarantee for Web applications
is difficult for two main reasons: first, traffic patterns are highly dynamic
and difficult to predict accurately; second, the complex nature of multi-tier
Web applications increases the difficulty of identifying bottlenecks and resolv-
ing them automatically. This paper proposes a methodology and presents a
working prototype system for automatic detection and resolution of bottle-
necks in a multi-tier Web application hosted on a cloud in order to satisfy
specific maximum response time requirements. It also proposes a method
for identifying and retracting over-provisioned resources in multi-tier cloud-
hosted Web applications. We demonstrate the feasibility of the approach in
an experimental evaluation with a testbed EUCALYPTUS-based cloud and
a synthetic workload. Automatic bottleneck detection and resolution under
dynamic resource management has the potential to enable cloud infrastruc-
ture providers to provide SLAs for Web applications that guarantee specific
response time requirements while minimizing resource utilization.
Keywords: Cloud computing, Adaptive resource management, Quality of
Service, Multi-tier applications, Service-level agreement, Scalability
Preprint submitted to Future Generation Computer Systems October 21, 2010
1. Introduction
Cloud providers [1] use the Infrastructure as a Service model to allow
consumers to rent computational and storage resources on demand and pay ac-
cording to their usage. Cloud infrastructure providers maximize their profits
by fulfilling their obligations to consumers with minimal infrastructure and
maximal resource utilization.
Although most cloud infrastructure providers provide service-level agree-
ments (SLAs) for availability or other quality attributes, the most important
quality attribute for Web applications from the user’s point of view, response
time, is not addressed by current SLAs. Guaranteeing response time is a dif-
ficult problem for two main reasons. First, Web application traffic is highly
dynamic and difficult to predict accurately. Second, the complex nature
of multi-tier Web applications, in which bottlenecks can occur at multiple
points, means response time violations may not be easy to diagnose or rem-
edy. It is also difficult to determine an optimal static resource allocation for
multi-tier Web applications manually for certain workloads, due to the dynamic
nature of incoming requests and the exponential number of possible allo-
cation strategies. Therefore, if a cloud infrastructure provider is to guarantee
cation strategies. Therefore, if a cloud infrastructure provider is to guarantee
a particular maximum response time for any traffic level, it must automati-
cally detect bottleneck tiers and allocate additional resources to those tiers
as traffic grows.
In this paper, we take steps toward eliminating this limitation of cur-
rent cloud-based Web application hosting SLAs. We propose a methodology
and present a working prototype system running on a EUCALYPTUS-based
[2] cloud that actively monitors the response times for requests to a multi-
tier Web application, gathers CPU usage statistics, and uses heuristics to
identify the bottlenecks. When bottlenecks are identified, the system dy-
namically allocates the resources required by the application to resolve the
identified bottlenecks and maintain response time requirements. The system
furthermore predicts the optimal configuration for the dynamically varying
workload and scales down the configuration whenever possible to minimize
resource utilization.
The bottleneck resolution method is purely reactive. Reactive bottleneck
resolution has the benefit of avoiding inaccurate a priori performance models
and pre-deployment profiling. In contrast, the scale down method is neces-
sarily predictive, since we must avoid premature release of busy resources.
However, the predictive model is built using application performance statis-
tics acquired while the application is running under real-world traffic loads,
so it neither suffers from the inaccuracy of a priori models nor requires pre-
deployment profiling.
In this paper, we describe our prototype, the heuristics we have developed
for reactive scale-up of multi-tier Web applications, the predictive models
we have developed for scale-down, and an evaluation of the prototype on
a testbed cloud. The evaluation uses a specific two-tier Web application
consisting of a Web server tier and a database tier. In this context, the
resources to be minimized are the number of Web servers in the Web server
tier and the number of replicas in the database tier. We find that the system
is able to detect bottlenecks, resolve them using adaptive resource allocation,
satisfy the SLA, and free up over-provisioned resources as soon as they are
not required.
There are a few limitations to this preliminary work. We only address
scaling of the Web server tier and a read-only database tier. Our system
only performs hardware and virtual resource management for applications.
In particular, we do not address software configuration management; for
example, we assume that the number of connections from each server in
the Web server tier to the database tier is sufficient for the given workload.
Additionally, real-world cloud infrastructure providers using our approach to
response time-driven SLAs would need to protect themselves with detailed
contracts (imagine for example the rogue application owner who purposefully
inserts delays in order to force SLA violations). We plan to address some of
these limitations in future work.
In the rest of this paper, we review related work, then describe our
approach, the prototype implementation, and an experimental evaluation of
the prototype.
2. Related Work
There has been a great deal of research on dynamic resource allocation
for physical and virtual machines and clusters of virtual machines [3]. In [4]
and [5], a two-level control loop is proposed to make resource allocation deci-
sions within a single physical machine. This work does not address integrated
management of a collection of physical machines. The authors of [6] study
the overhead of a dynamic allocation scheme that relies on virtualization as
opposed to static resource allocation. None of these techniques provide a
technology to dynamically adjust allocation based on SLA objectives in the
presence of resource contention.
VMware DRS [7] provides technology to automatically adjust the amount
of physical resources available to VMs based on defined policies. This is
achieved using the live-migration automation mechanism provided by VMo-
tion. VMware DRS adopts a VM-centric view of the system: policies and
priorities are configured on a VM-level.
An approach similar to VMware DRS is presented in [8], which proposes a
dynamic adaptation technique based on rearranging VMs so as to minimize
the number of physical machines used. The application awareness is limited
to configuring physical machine utilization thresholds based on off-line anal-
ysis of application performance as a function of machine utilization. In all of
this work, runtime requirements of VMs are taken as a given and there is no
explicit mechanism to tune resource consumption by any given VM.
Foster et al. [9] address the problem of deploying a cluster of virtual
machines with given resource configurations across a set of physical machines.
Czajkowski et al. [10] define a Java API permitting developers to monitor
and manage a cluster of Java VMs and to define resource allocation policies
for such clusters.
Unlike [7] and [8], our system takes an application-centric approach; the
virtual machine is considered only as a container in which an application is
deployed. Using knowledge of application workload and performance goals,
we can utilize a more versatile set of automation mechanisms than [7], [8],
[9], or [10].
Network bandwidth allocation in the deployment of clusters of
virtual machines has also been studied in [11]. The problem there is to place
virtual machines interconnected by virtual networks on physical servers
interconnected by a wide area network. VMs may be migrated, but the
emphasis is on allocating network bandwidth for the virtual networks rather
than on resource scaling. In contrast, our focus is on data center environments,
in which network bandwidth is of lesser concern.
There have been several efforts to perform dynamic scaling of Web ap-
plications based on workload monitoring. Amazon Auto Scaling [12] allows
consumers to scale up or down according to criteria such as average CPU
utilization across a group of compute instances. [13] presents the design of
an auto-scaling solution based on incoming traffic analysis for Axis2 Web
services running on Amazon EC2. [14] presents a statistical machine learn-
ing approach to predict system performance and minimize the number of
resources required to maintain the performance of an application hosted on
a cloud. [15] monitors the CPU and bandwidth usage of virtual machines
hosted on an Amazon EC2 cloud, identifies the resource requirements of ap-
plications, and dynamically switches between different virtual machine con-
figurations to satisfy the changing workloads. However, none of these solu-
tions address the issues of multi-tier Web applications or database scalability,
a crucial step to dynamically manage multi-tier workloads.
Thus far, only a few researchers have addressed the problem of resource
provisioning for multi-tier applications. [16] presents an analytical model
using queuing networks to capture the behavior of each tier. The model is
able to predict the mean response time for a specific workload given several
parameters such as the visit ratio, service time, and think time. However, the
authors do not apply their approach toward dynamic resource management
on clouds. [17] presents a predictive and reactive approach using queuing
theory to address dynamic provisioning for multi-tier applications. The pre-
dictive approach is to allocate resources to applications on large time scales
such as days and hours, while the reactive approach is used for short time
scales such as seconds and minutes. This allows the system to overcome the
“flash crowd” phenomenon and correct prediction mistakes made by the pre-
dictive model. The technique assumes knowledge of the resource demands of
each tier. In addition to the queuing model, the authors also provide a sim-
ple black-box approach for dynamic provisioning that scales up all replicable
tiers when bottlenecks are detected. However, this work does not address
database scalability or the release of application resources when they are not
required. In contrast, our system classifies requests as either dynamic or
static and uses a black box heuristic technique to scale up and scale down
only one tier at a time. Our scale-up system is reactive in resolving bottle-
necks and our scale-down system is predictive in releasing resources.
The most recent work in this area [18] presents a technique to model
dynamic workloads for multi-tier Web applications using k-means cluster-
ing. The method uses queuing theory to model the system’s reaction to the
workload and to identify the number of instances required for an Amazon
EC2 cloud to perform well under a given workload. Although this work does
model system behavior on a per-tier basis, it does not perform multi-tier
dynamic resource provisioning. In particular, database tier scaling is not
considered.
In our own recent work [19], we consider single-tier Web applications, use
log-based monitoring to identify SLA violations, and use dynamic resource
allocation to satisfy SLAs. In [20], we consider multi-tier Web applications
and propose an algorithm based on heuristics to identify the bottlenecks.
This work uses a simple reactive technique to scale up multi-tier Web ap-
plications to satisfy SLAs. The work described in the current paper is an
extension of this work. We aim to solve the problem of dynamic resource
provisioning for multi-tier Web applications to satisfy a response time SLA
with minimal resource utilization. Our method is reactive for scale-up de-
cisions and predictive for scale-down decisions. Our method uses heuristics
and predictive models to scale each tier of a given application, with the goal
of requiring minimal knowledge of and minimal modification of the existing
application. To the best of our knowledge, our system is the first SLA-
driven resource manager for clouds based on open source technology. Our
working prototype, built on top of a EUCALYPTUS-based compute cloud,
provides dynamic resource allocation and load balancing for multi-tier Web
applications in order to satisfy a SLA that enforces specific response time
requirements.
3. System Design and Implementation Details
3.1. Dynamic provisioning for multi-tier Web applications
Here we describe our methodology for dynamic provisioning of resources
for multi-tier Web applications, including the algorithms, system design, and
implementation. A high-level flow diagram for bottleneck detection, scale-up
decision making, and scale-down decision making in our prototype system is
shown in Figure 1.
3.1.1. Reactive model for scale-up
We use heuristics and active profiling of the CPUs of virtual machine-
hosted application tiers for identification of bottlenecks. Our system reads
the Web server proxy logs for t seconds and clusters the log entries into dy-
namic content requests and static content requests. Requests to resources
(Web pages) containing server-side scripts (PHP, JSP, ASP, etc.) are consid-
ered as dynamic content requests. Requests to the static resources (HTML,
JPG, PNG, TXT, etc.) are considered as static content requests. Dynamic
resources are generated through utilization of the CPU and may depend on
other tiers, while static resources are pre-generated flat files available in the
Web server tier. Each type of request has different characteristics and is
Figure 1: Flow diagram for the prototype system, which detects the bottleneck tier in a two-tier Web application hosted on a heterogeneous cloud, dynamically scales that tier to satisfy a SLA that defines response time requirements, and releases over-provisioned resources.
monitored separately for purposes of bottleneck detection. The system cal-
culates the 95th percentile of the average response time. When static content
response time indicates saturation, the system scales the Web server tier.
When the system determines that dynamic content response time indicates
saturation, it obtains the CPU utilization across the Web server tier. If the
CPU utilization of any instance in the Web server tier has reached a satura-
tion threshold, the system scales up the Web server tier; otherwise, it scales
up the database tier. Each scale up operation adds exactly one server to a
specific tier. Our focus is on read-intensive applications, and we assume that
a mechanism such as [21] exists to ensure consistent reads after updates to a
master database. Before initiating a scale operation, the system ensures that
the effect of the last scale operation has been realized. If the system satisfies
the response time requirements for k consecutive intervals, it uses the pre-
dictive model to identify any over-provisioned resources and if appropriate,
scales down the over-provisioned tier(s). The predictive model is explained
next.
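Before turning to the predictive model, the decision logic above can be summarized in a short sketch. The Python code below is illustrative rather than the prototype implementation: the log-entry format, helper names, and the way CPU statistics are passed in are assumptions, while the one-second SLA threshold and the 85% CPU saturation threshold are the values used in the experiments described later.

```python
import numpy as np

# Values used in the experiments; both are configuration choices, not fixed constants.
SLA_THRESHOLD_MS = 1000      # maximum average response time allowed by the SLA
CPU_SATURATION = 85.0        # CPU saturation threshold (%)

# Requests for server-side scripts are treated as dynamic content.
DYNAMIC_EXTENSIONS = (".php", ".jsp", ".asp")

def classify(path):
    """Cluster a request as 'dynamic' or 'static' by its resource extension."""
    return "dynamic" if path.lower().endswith(DYNAMIC_EXTENSIONS) else "static"

def percentile_95(entries):
    """95th percentile of the response times (ms) observed in this window."""
    return float(np.percentile([e["response_ms"] for e in entries], 95)) if entries else 0.0

def scale_up_decision(log_entries, web_tier_cpu):
    """Decide which tier (if any) to scale up for the last t-second window.

    log_entries: parsed proxy log entries, each with 'path' and 'response_ms'.
    web_tier_cpu: recent CPU utilization (%) of every VM in the Web server tier.
    """
    rt_s = percentile_95([e for e in log_entries if classify(e["path"]) == "static"])
    rt_d = percentile_95([e for e in log_entries if classify(e["path"]) == "dynamic"])

    if rt_s > SLA_THRESHOLD_MS:
        return "web"                              # static content saturating the Web tier
    if rt_d > SLA_THRESHOLD_MS:
        if any(c >= CPU_SATURATION for c in web_tier_cpu):
            return "web"                          # Web tier CPU is the bottleneck
        return "db"                               # otherwise scale the database tier
    return None                                   # SLA satisfied; consider scale-down
```

Each returned decision corresponds to adding exactly one server to the indicated tier, and no new decision is taken until the effect of the previous scale operation has been observed.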
3.1.2. Predictive model for scale down
To determine when to initiate scale-down operations, we use a regression model that predicts, for each time interval $t$, the number of Web server instances $n^{web}_t$ and the number of database server instances $n^{db}_t$ required for the currently observed workload. We use polynomial regression with polynomial degree two. Our reactive scale-up algorithm feeds training observations to the model as appropriate. We retain a training observation for every interval of time that satisfies the response time requirements. Each observation contains the observed workload for each type of request and the existing configuration of the tiers for the last 60-second interval. We can express the model as follows:

$$n^{web}_t = a_0 + a_1 (h^s_t + h^d_t) + a_2 (h^s_t + h^d_t)^2 + \epsilon^{web}_t \qquad (1)$$
$$n^{db}_t = b_0 + b_1 h^d_t + b_2 (h^d_t)^2 + \epsilon^{db}_t, \qquad (2)$$

where $h^s_t$ and $h^d_t$ are the numbers of static and dynamic requests received during interval $t$. We assume the noise terms $\epsilon^{web}_t \sim N(0, (\sigma^{web})^2)$ and $\epsilon^{db}_t \sim N(0, (\sigma^{db})^2)$.

Since both static and dynamic resource requests hit the Web server tier, we assume that $n^{web}_t$ (the number of Web server instances required, Equation 1) depends on both $h^s_t$ and $h^d_t$. To keep the number of model parameters to be estimated small, we use a single parameter for the sum of the two load levels. Since the database server only handles database queries, which are normally invoked only by dynamic pages, we assume that $n^{db}_t$ (the number of database server instances required, Equation 2) depends only on $h^d_t$.

The regression coefficients $a_0$, $a_1$, $a_2$, $b_0$, $b_1$, and $b_2$ are recalculated, after updating the sufficient statistics for all of the historical data, every time a new observation is received. (The sufficient statistics are the sums and sums of squares of the variables $n^{web}_t$, $n^{db}_t$, $h^s_t$, and $h^d_t$ over the training set up to the current point in time.) The most recent predictive model is used, as shown in the flow diagram of Figure 1, to identify over-provisioned resources for the current workload and retract them from the current configuration.
3.2. System components and implementation
To manage cloud resources dynamically based on response time require-
ments, we developed three components: VLBCoordinator, VLBManager, and
VMProfiler. We use Nginx [22] as a load balancer because it offers detailed
logging and allows reloading of its configuration file without termination of
existing client sessions. VLBCoordinator and VLBManager are our service
management [23] components.
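For illustration, a scale operation can be propagated to the load balancer by rewriting the relevant upstream block in the Nginx configuration and reloading it; `nginx -s reload` applies the new configuration without terminating established client sessions. The sketch below is only an assumption about how such an update might be scripted; the configuration path and upstream name are hypothetical and not taken from the prototype.

```python
import subprocess

UPSTREAM_TEMPLATE = """upstream webtier {{
{servers}
}}
"""

def write_upstream(backend_ips, conf_path="/etc/nginx/conf.d/webtier.conf"):
    """Rewrite the upstream block with the current Web tier instances and reload Nginx."""
    servers = "\n".join(f"    server {ip}:80;" for ip in backend_ips)
    with open(conf_path, "w") as f:
        f.write(UPSTREAM_TEMPLATE.format(servers=servers))
    # The reload picks up the new configuration without dropping existing sessions.
    subprocess.run(["nginx", "-s", "reload"], check=True)
```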
VLBCoordinator interacts with a EUCALYPTUS cloud using Typica
[24]. Typica is a simple API written in Java to access a variety of Ama-
zon Web services such as EC2, SimpleDB, and DevPay. The core functions
of VLBCoordinator are instantiateVirtualMachine and getVMIP, which
are accessible through XML-RPC. VLBManager monitors the traces of the
load balancer and detects violations of response time requirements. It clus-
ters the requests into static and dynamic resource requests and calculates the
average response time for each type of request. VMProfiler is used to log the
CPU utilization of each virtual machine. It exposes XML-RPC functions to
obtain the CPU utilization of a specific virtual machine for the last n minutes.
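Because these functions are exposed over XML-RPC, other components (or scripts) can call them with a standard client. The snippet below is a hedged sketch: the endpoint hosts and ports are hypothetical, while the method names instantiateVirtualMachine, getVMIP, and getCPUusage are those described in the text and in Figure 2; the arguments shown (an image identifier, a VM identifier, and a duration in minutes) are our assumptions about their signatures.

```python
from xmlrpc.client import ServerProxy

# Hypothetical endpoints; the actual hosts and ports depend on the deployment.
coordinator = ServerProxy("http://cloud-frontend:8000/RPC2")    # VLBCoordinator
profiler = ServerProxy("http://cloud-frontend:8001/RPC2")       # VMProfiler

# Ask VLBCoordinator to start a new Web server instance and look up its IP address.
vm_id = coordinator.instantiateVirtualMachine("emi-webserver")  # image id is assumed
vm_ip = coordinator.getVMIP(vm_id)

# Ask VMProfiler for the instance's CPU utilization over the last 5 minutes.
cpu_history = profiler.getCPUusage(vm_id, 5)
```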
Every Web application has an application-specific interface between the
Web server tier and the database tier. We assume that database writes are
handled by a single master MySQL instance and that database reads can
be handled by a cluster of MySQL slaves. Under this assumption, we have
developed a component for load balancing and scaling the database tier that
requires minimal modification of the application.
Our prototype is based on the RUBiS [25] open-source benchmark Web
application for auctions. It provides core functionality of an auction site
such as browsing, selling, and bidding for items, and provides three user
roles: visitor, buyer, and seller. Visitors are not required to register and are
allowed to browse items that are available for auction. We used the PHP
implementation of RUBiS as a sample Web application for our experimental
evaluation.
To enable RUBiS to support load balancing over the database tier, we
modified it to use round-robin balancing over a set of database servers listed
in a database connection settings file, and we developed a server-side compo-
nent, DbConfigAgent, to update the database connection settings file after
a scaling operation has modified the configuration of the database tier. The
entire benchmark system consists of the physical machines supporting the
EUCALYPTUS cloud, a virtual Web server acting as a proxying load bal-
ancer for the entire Web application, a tier of virtual Web servers running the
RUBiS application software, and a tier of virtual database servers. Figure 2
shows the deployment of our components along with the main interactions.
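The database-tier load balancing and the role of DbConfigAgent can be summarized with the following sketch. The actual modification lives inside the PHP application; this Python version only illustrates the idea of round-robin selection over a list of read replicas kept in a settings file that DbConfigAgent rewrites after each scaling operation. The file name, its JSON format, and the function names are assumptions.

```python
import itertools
import json

SETTINGS_FILE = "db_settings.json"   # assumed format: {"read_replicas": ["10.0.0.5", ...]}

_replica_cycle = None                # round-robin iterator over the read replicas

def _reload_cycle(path=SETTINGS_FILE):
    global _replica_cycle
    with open(path) as f:
        _replica_cycle = itertools.cycle(json.load(f)["read_replicas"])

def next_read_host():
    """Pick the database host for the next read-only query (the master handles writes)."""
    if _replica_cycle is None:
        _reload_cycle()
    return next(_replica_cycle)

def db_config_agent_update(new_replicas, path=SETTINGS_FILE):
    """DbConfigAgent-style update after the database tier has been scaled up or down."""
    with open(path, "w") as f:
        json.dump({"read_replicas": new_replicas}, f)
    _reload_cycle(path)
```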
Figure 2: Component deployment diagram for the system components, including their main interactions.
4. Experimental Setup
In this section we describe the setup for an experimental evaluation of
our prototype based on a testbed cloud using the RUBiS Web application
and a synthetic workload generator.
4.1. Testbed cloud
We built a small private heterogeneous compute cloud on seven physical
machines (Front-end, Node1, Node2, Node3, Node4, Node5, and Node6)
using EUCALYPTUS. Figure 3 shows the design of our testbed cloud. Front-
end and Node1 are Intel Pentium 4 machines with 2.84 GHz and 2.66 GHz
CPUs, respectively. Node2 is an Intel Celeron machine with a 2.4 GHz CPU.
Node3 is an Intel Core 2 Duo machine with a 2.6 GHz CPU. Node4, Node5,
and Node6 are Intel Pentium Dual Core machines with 2.8 GHz CPUs. Front-
end, Node2, Node3, Node4, Node5, and Node6 have 2 GB RAM while Node1
and Node4 have 1.5 GB RAM.
We used EUCALYPTUS to establish a cloud architecture comprised of
one Cloud Controller (CLC), one Cluster Controller (CC), and six Node Con-
trollers (NCs). We installed the CLC and CC on a front-end node attached
to both our main LAN and the cloud’s private network. We installed the
NCs on six separate machines (Node1, Node2, Node3, Node4, Node5, and
Node6) connected to the private network.
Figure 3: EUCALYPTUS-based testbed cloud using seven physical machines. We installed the CLC and CC on a front-end node attached to both our main LAN and the cloud's private network. We installed the NCs on six separate machines (Node1, Node2, Node3, Node4, Node5, and Node6) connected to the private network. Each physical machine has the capacity to spawn a maximum number of virtual machines, as shown (highlighted in red) in the figure, based on its number of cores.
4.2. Workload generation
We use httperf [26] to generate synthetic workloads for RUBiS. We gen-
erate workloads for specific durations with a required number of user sessions
per second. A user session emulates a visitor that browses items up for auc-
tion in specific categories and geographical regions and also bids on items
up for auction. In a first cycle, every five minutes, we increment the load
level by 6, from load level 6 up to load level 108, and then we decrement
the load level by 6 from load level 108 down to load level 6. In a second
cycle, we increment the load level by 6, from load level 6 up to load level 60,
and then we decrement the load level by 6 from load level 60 down to load
Figure 4: Workload generation profile for all experiments.
level 6. Each load level represents the number of user sessions per second;
each user session makes six requests to static resources and five requests to
dynamic resources including five pauses to simulate user think time. The dy-
namic resources consist of PHP pages that make read-only database queries.
Note that while each session is closed loop (the workload generator waits
for a response before submitting the next request), session creation is open
loop: new sessions are created independently of the system’s ability to handle
them. This means that many requests may queue up, leading to exponen-
tial increases in response times. Figure 4 shows the workload levels we use
for our experiments over time. We use three workload generators distributed
over three separate machines during the experiments to ensure that workload
generation machines never reach saturation.
We performed all of our experiments based on this workload generator
and RUBiS benchmark Web application.
5. Experimental Design
To evaluate our proposed system, we performed three experiments. Ex-
periments 1 and 2 profile the system’s behavior using specific static alloca-
tions. Experiment 3 profiles the system’s behavior under adaptive resource
allocation using the proposed algorithm for bottleneck detection and resolu-
tion. Experiments 1 and 2 demonstrate system behavior using current in-
dustry practices, whereas Experiment 3 shows the strength of the proposed
alternative methodology. Table 1 summarizes the experiments, and details
follow.
5.1. Experiment 1: Simple static allocation
In Experiment 1, we statically allocate one virtual machine to the Web
server tier and one virtual machine to the database tier, and then we profile
system behavior over the synthetic workload described previously. The single
Web server/single database server configuration is the most common initial
allocation strategy used by application deployment engineers.
5.2. Experiment 2: Static over-allocation
In Experiment 2, we over-allocate resources, using a maximal static con-
figuration sufficient to process the workload. We statically allocate a cluster
of four Web server instances and four database server instances, and then
we profile the system behavior over the synthetic workload described
previously. Since it is quite difficult to determine an optimal allocation for a
multi-tier application manually, we actually derived this configuration from
the behavior of the adaptive system profiled in Experiment 3.
5.3. Experiment 3: Adaptive allocation under proposed system
In Experiment 3, we use our proposed system to adapt to changing work-
loads. Initially, we started two virtual machines on our testbed cloud. The
Nginx-based Web server farm was initialized with one virtual machine host-
ing the Web server tier, and another single virtual machine was used to host
the database tier. As discussed earlier, we modified RUBiS to perform load
balancing across the instances in the database server cluster. The system’s
goal was to satisfy a SLA that enforces a one-second maximum average re-
sponse time requirement for the RUBiS application regardless of load level
using our proposed algorithm for bottleneck detection and resolution. The
threshold for CPU saturation (refer to the flow diagram in Figure 1) was
set to 85% utilization. This gives the system a chance to handle unexpected
spikes in CPU activity, and it is a reasonable threshold for efficient use of
the server [27].
To determine good values for the important parameters t (the time to read
proxy traces) and k (the number of consecutive intervals required to satisfy
response time constraints before a scale-down operation is attempted), we
performed a grid search over a set of reasonable values for t and k.
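Such a grid search can be expressed compactly. In the sketch below, run_experiment is a hypothetical helper that replays the workload with the given parameters and returns the three metrics summarized in Table 2; the ranking criterion is our own choice and not part of the prototype.

```python
import itertools

def grid_search(run_experiment, t_values=(30, 60, 120), k_values=(4, 8)):
    """Evaluate each (t, k) pair and return the results sorted by SLA misses."""
    results = []
    for t, k in itertools.product(t_values, k_values):
        # run_experiment is assumed to return:
        #   (% requests missing the SLA, scale-down mistakes, total scale operations)
        missed, mistakes, operations = run_experiment(t=t, k=k)
        results.append({"t": t, "k": k, "missed_pct": missed,
                        "scale_down_mistakes": mistakes, "operations": operations})
    return sorted(results, key=lambda r: (r["missed_pct"], r["operations"]))
```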
Table 1: Summary of experiments.

Exp.  Description
1     Static allocation using one VM for the Web server tier and one VM for the database tier
2     Static over-allocation using a cluster of four VMs for the Web server tier and four VMs for the database tier
3     Adaptive allocation using the proposed methodology
6. Experimental Results
6.1. Experiment 1: Simple static allocation
This section describes the results we obtained in Experiment 1. Figure 5
shows the throughput of the system during the experiment. After load level
30, we do not observe any growth in the system’s throughput because one
or both of the tiers have reached their saturation points. Although the load
level increases with time, the system is unable to serve all requests, and it
either rejects or queues the remaining requests.
Figure 6 shows the 95th percentile of average response time during Ex-
periment 1. From load level 6 to load level 24, we observe a nearly constant
response time, but after load level 24, the arrival rate exceeds the limits of
the system’s processing capacity. One of the virtual machines hosting the
application tiers becomes a bottleneck, then requests begin to spend more
time in the queue and request processing time increases. From that point we
observe rapid growth in the response time. After load level 30, however, the
queue also becomes saturated, and the system rejects most requests. There-
fore, we do not observe further growth in the average response time. Clearly,
the system only works efficiently from load level 6 to load level 24.
Figure 7 shows the CPU utilization of the two virtual machines hosting
the application tiers during Experiment 1. The downward spikes at the be-
ginning of each load level occur because all user sessions are cleared between
load level increments, and it takes some time for the system to return to
a steady state. We do not observe any tier saturating its CPU during this
experiment; after load level 30, the CPU utilization remains nearly constant,
indicating that the CPU was not a bottleneck for this application with the
given workload.
Figure 5: Throughput of the system during Experiment 1.
Figure 6: 95th percentile of mean response time during Experiment 1.
6.2. Experiment 2: Static over-allocation
In Experiment 2, to observe the system’s behavior under a static al-
location policy using the maximal configuration observed during adaptive
experiments, we allocated four virtual machines to the Web server tier and
four virtual machines to the database tier, and generated the same work-
load described in Section 4.2. Figure 8 shows the throughput of the system
during Experiment 2. We observe the expected linear relationship between
load level and throughput; as load level increases, the system throughput
increases, and as load level decreases, the system throughput decreases.
Figure 9 shows the 95th percentile of average response times during Ex-
periment 2. We do not observe any response time violations during the
Figure 7: CPU utilization of virtual machines used during Experiment 1.
Figure 8: Throughput of the system during Experiment 2.
experiment. We observe a slight increase in response time during load lev-
els 80 to 100 because, during this interval, the system is serving the peak
workload and utilizing all of the allocated resources to satisfy the workload
requirements. This experiment shows that the maximal configuration identi-
fied by our adaptive resource allocation system would never lead to violations
of the response time requirements under the same load.
6.3. Experiment 3: Bottleneck detection and resolution under adaptive allo-
cation
This section describes the results of Experiment 3 using our proposed
algorithm for bottleneck detection and resolution. We first identified appro-
priate values and the impact of the important parameters (t and k) in our proposed
Figure 9: 95th percentile of mean response time during Experiment 2.
Table 2: Summary of grid search to find good values for the important parameters of the proposed system.

t (s)   k   % requests missing SLA   Scale-down mistakes   Total operations
30      4   3.228                    12                    38
30      8   2.002                    3                     22
60      4   2.413                    4                     22
60      8   2.034                    2                     20
120     4   3.227                    2                     18
120     8   3.312                    0                     15
algorithm using a grid search. We then examined the results from the best
configuration in more detail.
6.3.1. Parameter value identification
We used t = 30, 60, and 120 and k = 4 and 8 for the grid search. For
each combination of t and k, the percentage of requests missing SLA requirements,
scale-down decision mistakes, and total number of scale operations (scale-up
and scale-down) are shown in Table 2.
Figure 10 compares the percentage of requests missing SLA requirements,
scale-down decision mistakes, and total number of scale (scale-up and scale-
down) operations over different values of t and k.
We observe that a large number of requests exceed the required response
time when we use small values (t = 30, k = 4) for both parameters or a
large value (t = 120) for t. The parameter k is the number of consecutive
Figure 10: Grid search comparison for determining appropriate values of t and k for the system. (a) Percentage of requests missing the SLA; (b) scale-down decision mistakes; (c) total number of scale operations.
intervals of length t required to satisfy response time constraints before a
scale-down operation is attempted. Because a scale-down requires k consecutive
intervals of length t, using small values for both t and k lets the system react
quickly but also make hasty scale-down decisions, which increases the number
of scale-down mistakes. The system requires some time to recover from such
mistakes, so we observe additional response time violations during the recovery.
A large value of t increases the system's reaction time; this is why we also
observe a large number of requests exceeding the required response time with
t = 120. We can further observe that as t increases, the number of scale-down
mistakes decreases, since scale-down decisions are made less frequently. However,
the slower response with high values of t also means that the system takes
more time to respond to long traffic spikes and to release over-provisioned
resources. Smaller values of t with larger values of k reduce the occurrence of
scale-down mistakes without negatively affecting the system's responsiveness to
traffic spikes.
We selected the values t = 60 and k = 8 for further examination, as these
values provide a good trade-off between the percentage of requests missing
the SLA, the number of scale-down decision mistakes, and the total number
of operations. Figure 11 shows the 95th percentile of the average response
time during Experiment 3 using automatic bottleneck detection and adaptive
resource allocation under this parameter regime. The bottom graph shows
the adaptive addition and retraction of instances in each tier after a bot-
tleneck or over-provisioning is detected during the experiment. Whenever
the system detects a violation of the response time requirements, it uses the
proposed reactive algorithm to identify the bottleneck tier and then dynamically
adds another virtual machine to the server farm for that bottleneck tier.
We observe temporary violations of the required response time for short pe-
Figure 11: 95th percentile of mean response time during Experiment 3 using t = 1 and k = 8 under the proposed system.
riods of time due to the latency of virtual machine boot-up and the time
required to observe the effects of previous scale operations. Whenever the
system identifies over-provisioning of virtual machines for specific tiers us-
ing the predictive model, it scales down the specific tiers adaptively. In the
beginning, the prediction model makes some mistakes; we can observe two in-
correctly predicted scale-down decisions, at roughly 146 and 252 minutes into the experiment.
However, the reactive scale-up algorithm quickly brings the system back to a
configuration that satisfies the response time requirements. Occasional mis-
takes such as these are expected due to noise, since the predictive approach
is statistical. We could in principle reduce the occurrence of these mistakes
by incorporating traffic pattern prediction as part of the decision model.
Figure 12 shows the system throughput during the experiment. We ob-
serve linear growth in the system throughput through the full range of load
levels. The throughput increases and decreases as required with the load
level.
Figure 13 shows the CPU utilization of all virtual machines during the
experiment. Initially, the system is configured with one VM in each tier. The
system adaptively adds and removes virtual machines to each tier over time.
The differing steady-state levels of CPU utilization for the different VMs
reflect the use of round-robin balancing across differing processor speeds for
the physical nodes. We observe the same downward spike at the beginning
of each load level as in the earlier experiments due to the time for the system
to return to steady state after all user sessions are cleared.
Figure 12: Throughput of the system during Experiment 3 using t = 1 and k = 8 under the proposed system.
Figure 13: CPU utilization of all VMs during Experiment 3 using t = 1 and k = 8 under the proposed system.
The experiments demonstrate, first, that insufficient static resource allocation
policies lead to system failure; second, that maximal static resource allocation
policies lead to over-provisioning of resources; and third, that our proposed
adaptive resource allocation method is able to maintain a maximum response
time SLA while utilizing minimal resources.
7. Conclusion
In this paper, we have proposed a methodology and described a proto-
type system for automatic identification and resolution of bottlenecks and
automatic identification and resolution of overprovisioning in multi-tier ap-
plications hosted on a cloud. Our experimental results show that while we
clearly cannot provide a SLA guaranteeing a specific response time with an
undefined load level for a multi-tier Web application using static resource al-
location, our adaptive resource provisioning method could enable us to offer
such SLAs.
It is very difficult to identify a minimally resource intensive configuration
of a multi-tier Web application that satisfies given response time require-
ments for a given workload, even using pre-deployment training and testing.
However, our system is capable of identifying the minimum resources re-
quired using heuristics, a predictive model, and automatic adaptive resource
provisioning. Cloud infrastructure providers can adopt our approach not
only to offer their customers SLAs with response time guarantees but also
to minimize the resources allocated to the customers’ applications, reducing
their costs.
We are currently extending our system to support n-tier clustered ap-
plications hosted on a cloud, and we are planning to extend our prediction
model, which is currently only used to retract over-provisioned resources,
to also perform bottleneck prediction in advance, in order to overcome the
virtual machine boot-up latency problem. We are developing more sophis-
ticated methods to classify URLs into static and dynamic content requests,
rather than relying on filename extensions. Finally, we intend to incorpo-
rate the effects of heterogeneous physical machines on the prediction model
and also address issues related to best utilization of physical machines for
particular tiers.
Acknowledgments
This work was supported by graduate fellowships from the Higher Edu-
cation Commission (HEC) of Pakistan and the Asian Institute of Technology
to WI and by the Ministry of Science and Technology of Spain under contract
TIN2007-60625. We thank Faisal Bukhari, Irshad Ali, and Kifayat Ullah for
valuable discussions related to this work.
References
[1] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, I. Brandic, Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility, Future Generation Computer Systems 25 (2009) 599–616.

[2] D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, D. Zagorodnov, The EUCALYPTUS open-source cloud-computing system, in: CCA '08: Proceedings of the Cloud Computing and Its Applications Workshop, Chicago, IL, USA.

[3] P. Anedda, S. Leo, S. Manca, M. Gaggero, G. Zanetti, Suspending, migrating and resuming HPC virtual clusters, Future Generation Computer Systems 26 (2010) 1063–72.

[4] X. Zhu, Z. Wang, S. Singhal, Utility-driven workload management using nested control design, in: ACC '06: American Control Conference, Minneapolis, Minnesota, USA.

[5] P. Padala, K. G. Shin, X. Zhu, M. Uysal, Z. Wang, S. Singhal, A. Merchant, K. Salem, Adaptive control of virtualized resources in utility computing environments, in: EuroSys '07: Proceedings of the ACM SIGOPS/EuroSys European Conference on Computer Systems 2007, ACM, New York, NY, USA, 2007, pp. 289–302.

[6] Z. Wang, X. Zhu, P. Padala, S. Singhal, Capacity and performance overhead in dynamic resource allocation to virtual containers, in: IM '07: 10th IEEE International Symposium on Integrated Network Management, Dublin, Ireland, pp. 149–58.

[7] VMware, VMware Distributed Resource Scheduler (DRS), 2010. Available at http://www.vmware.com/products/drs/.

[8] G. Khanna, K. Beaty, G. Kar, A. Kochut, Application performance management in virtualized server environments, in: NOMS '06: Network Operations and Management Symposium, Vancouver, BC, pp. 373–81.

[9] I. Foster, T. Freeman, K. Keahy, D. Scheftner, B. Sotomayer, X. Zhang, Virtual clusters for grid communities, in: CCGRID '06: Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid, IEEE Computer Society, Washington, DC, USA, 2006, pp. 513–20.

[10] G. Czajkowski, M. Wegiel, L. Daynes, K. Palacz, M. Jordan, G. Skinner, C. Bryce, Resource management for clusters of virtual machines, in: CCGRID '05: Proceedings of the Fifth IEEE International Symposium on Cluster Computing and the Grid, Volume 1, IEEE Computer Society, Washington, DC, USA, 2005, pp. 382–9.

[11] A. Sundararaj, M. Sanghi, J. Lange, P. Dinda, Hardness of approximation and greedy algorithms for the adaptation problem in virtual environments, in: ICAC '06: 7th IEEE International Conference on Autonomic Computing and Communications, Washington, DC, USA, pp. 291–2.

[12] Amazon Inc., Amazon Web Services Auto Scaling, 2009. Available at http://aws.amazon.com/autoscaling/.

[13] A. Azeez, Auto-scaling Web services on Amazon EC2, 2008. Available at http://people.apache.org/~azeez/autoscaling-web-services-azeez.pdf.

[14] P. Bodik, R. Griffith, C. Sutton, A. Fox, M. Jordan, D. Patterson, Statistical machine learning makes automatic control practical for internet datacenters, in: HotCloud '09: Proceedings of the Workshop on Hot Topics in Cloud Computing.

[15] H. Liu, S. Wee, Web server farm in the cloud: Performance evaluation and dynamic architecture, in: CloudCom '09: Proceedings of the 1st International Conference on Cloud Computing, Springer-Verlag, Berlin, Heidelberg, 2009, pp. 369–80.

[16] B. Urgaonkar, G. Pacifici, P. Shenoy, M. Spreitzer, A. Tantawi, An analytical model for multi-tier internet services and its applications, in: SIGMETRICS '05: Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, volume 33, ACM, 2005, pp. 291–302.

[17] B. Urgaonkar, P. Shenoy, A. Chandra, P. Goyal, T. Wood, Agile dynamic provisioning of multi-tier internet applications, ACM Transactions on Autonomous and Adaptive Systems 3 (2008) 1–39.

[18] R. Singh, U. Sharma, E. Cecchet, P. Shenoy, Autonomic mix-aware provisioning for non-stationary data center workloads, in: ICAC '10: Proceedings of the 7th IEEE International Conference on Autonomic Computing and Communication, IEEE Computer Society, Washington, DC, USA, 2010.

[19] W. Iqbal, M. Dailey, D. Carrera, SLA-driven adaptive resource management for web applications on a heterogeneous compute cloud, in: CloudCom '09: Proceedings of the 1st International Conference on Cloud Computing, Springer-Verlag, Berlin, Heidelberg, 2009, pp. 243–53.

[20] W. Iqbal, M. N. Dailey, D. Carrera, P. Janecek, SLA-driven automatic bottleneck detection and resolution for read intensive multi-tier applications hosted on a cloud, in: GPC '10: Proceedings of the 5th International Conference on Advances in Grid and Pervasive Computing, pp. 37–46.

[21] xkoto, Gridscale, 2009. http://www.xkoto.com/products/.

[22] I. Sysoev, Nginx, 2002. Available at http://nginx.net/.

[23] L. Rodero-Merino, L. M. Vaquero, V. Gil, F. Galán, J. Fontán, R. S. Montero, I. M. Llorente, From infrastructure delivery to service management in clouds, Future Generation Computer Systems 26 (2010) 1226–40.

[24] Google Code, Typica: A Java client library for a variety of Amazon Web Services, 2008. Available at http://code.google.com/p/typica/.

[25] OW2 Consortium, RUBiS: An auction site prototype, 1999. http://rubis.ow2.org/.

[26] D. Mosberger, T. Jin, httperf: A tool for measuring web server performance, in: First Workshop on Internet Server Performance, ACM, 1998, pp. 59–67.

[27] J. Allspaw, The Art of Capacity Planning, O'Reilly Media, Inc., Sebastopol, CA, USA, 2008.
... In fact an autoscaler should consider the architecture of the application to select the adequate microservices for scaling. Actually, classic autoscaling solutions focus on monolithic applications 15,16,17,18,19,20,21,22,23 or in the best case three tier applications 24,25 . In such context, selecting the component which should be scaled is not complicated. ...
... In literature many solutions propose autoscalers treating the elasticity on VM level 15,16,17,18,19,20,21,22,23,24,25 . Whereas, few solutions propose autoscalers focusing on container level 4,5,6,7,8,9,10,11,12,13 , and these studies are dedicated for microservicesbased applications. ...
... The response time of the workload is the sum of the response time of each tier. Iqbal et al. 20 present an elastic approach dedicated for intensive multi-tier applications. This study uses reactive and predictive strategies. ...
Preprint
Full-text available
Microservices is an architectural style of development consisting of a collection of small independent and loosely coupled components. Microservices-based applications are deployed in form of many containers deployed in a pool of virtual machines (VMs). When there is a rise on the workload, microservice resources are overloaded and then it should have additional resources. One of the most important issues in cloud environment is to minimize computing resources so as to reduce the deployment cost of the application. We can resolve this issue and optimize computing resources using autoscaling techniques. An autoscaler automatically provisions essential resources at real time. Existing autoscalers have many issues which reduce their efficiency. The first issue is that existing autoscalers suppose that threshold exceeding is always caused by the rise of the workload. However, exceeding thresholds may not be caused by the increase in the workload, but may be caused by other problems such as specific requests, VM or container issues. The second issue is that in resource provisioning, many autoscalers do not select the appropriate microservices for scaling resources. The third issue is that existing autoscalers do not calculate needed resources to be allocated for each microservice
... -Modeling algorithms whose main objective is to model elastic rules, resource usage, etc. In literature, we found sybl rules employed in [34] and resource profiling used in [62] and [19]. -Graph models such as Directed Acyclic Graph (DAG) [88] used in [24] [26], Completed Partially Directed Acyclic Graph (CPDAG) [89] employed in [24] [26], Causal Bayesian Networks (CBN) [90] illustrated in [23], Graphical Variational Auto-Encoder (GVAE) [91] deployed in [23], and PC-algorithm [92] used in [24]. ...
... Iqbal et al. [62] foregrounded an elastic approach dedicated for intensive multi-tier applications. This study uses reactive and predictive strategies. ...
Preprint
Full-text available
Elasticity is an essential treatment in Cloudenvironment employed in academic and industrial contexts. The main purpose of elasticity is to reduce thedeployment cost while optimizing computing resources.Multiple studies were conducted to tackle classic applications using monolithic architecture deployed withvirtual machines (VMs). However, with the spread ofmicroservice pattern, recent studies have been investigating this new trend using containers. This paperclassifies and discusses existing approaches dealing withcloud elasticity. It provides a novel taxonomy for elasticapproaches while focusing on microservices-based solutions. We additionally specify the strength and theshortcomings of each class of works. As a conclusion,we report the challenges for microservices-based applications elasticity and provide requirements for futureinvestigations.
... In another paper [5], the authors propose another option for integrating the reactive and proactive approach, namely, scaling up is reactive, and scaling down is proactive. The reactive component constantly analyzes the current metrics of the speed of response to requests. ...
... Under dynamic workloads, resource provisioning allows a system to scale out and in resources [42]. Improved scalability is the result of efficient resource provisioning. ...
Article
Full-text available
Microservices are being used by businesses to split monolithic software into a set of small services whose instances run independently in containers. Load balancing and auto-scaling are important cloud features for cloud-based container microservices because they control the number of resources available. The current issues concerning load balancing and auto-scaling techniques in Cloud-based container microservices were investigated in this paper. Server overloaded, service failure and traffic spikes were the key challenges faced during the microservices communication phase, making it difficult to provide better Quality of Service (QoS) to users. The aim is to critically investigate the addressed issues related to Load balancing and Auto-scaling in Cloud-based Container Microservices (CBCM) in order to enhance performance for better QoS to the users.
Article
Workload characterization and subsequent prediction are significant steps in maintaining the elasticity and scalability of resources in Cloud Data Centers. Due to the high variance in cloud workloads, designing a prediction algorithm that models the variations in the workload is a non-trivial task. If the workload predictor is unable to handle the dynamism in the workloads, then the result of the predictor may lead to over-provisioning or under-provisioning of cloud resources. To address this problem, we have created a Super Markov Prediction Model (SMPM) whose behaviour changes as per the change in the workload patterns. As the time progresses, based on the workload pattern SMPM uses different sequence models to predict the future workload. To evaluate the proposed model, we have experimented with Alibaba trace 2018, Google Cluster Trace (GCT), Alibaba trace 2020 and TPC-W workload trace. We have compared SMPM's prediction results with existing state-of-the-art prediction models and empirically verified that the proposed prediction model achieves a better accuracy as quantified using Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
Article
A cluster of transcoding servers is essential for transcoding many on-demand videos. Cloud computing presents a scalable framework for online video transcoding, and the infrastructure as a service (IaaS) cloud provides heterogeneous virtual machines (VMs) for creating a dynamically scalable cluster of servers. Heterogeneous VMs consist of small or big cores, which are assigned dynamically to allocate varying sizes of videos to the appropriate VMs for transcoding. Earlier research has proposed cloud-based heterogeneous scheduling for allocating different types of videos to different types of VMs so that the quality of service is maintained by reducing video rejection. In this paper, we propose a heterogeneous multi-core video scheduling model that additionally estimates the number of VMs and cores per VM with the variation of the number of videos to optimize the resources and cost of a cloud-based transcoding system. We further estimate the model's overhead concerning the variation in the number of videos. We conducted experiments on random videos, and experimental results reveal that the proposed model provides an excellent estimation of the number of VMs and cores. The proposed model reduces the average cost by 5% and requires almost 10% fewer cores for processing video tasks than the existing work in average cases.
Article
Full-text available
Cloud computing emphasizes using the underlying infrastructure much more efficiently, which is why it is gaining importance in today's industry. Like every other field, cloud computing has key features by which the quality of a cloud provider's service can be judged, and elasticity is one of them. Elasticity in cloud computing is directly related to the response time a server exhibits toward user requests while resources are being provisioned and de-provisioned. With increasing demand and a large shift of industry toward the cloud, the problem of handling user requests has also grown. For a long time, virtualization, with all its merits and demerits, was the prevailing technology for handling multiple requests in the cloud. Its biggest disadvantage is the heavy load it places on the underlying kernel or server, but in recent years an alternative technology, containerization, has emerged and quickly become popular due to its efficiency. In this paper we discuss elasticity in the cloud and the working of containers to see how they can help improve elasticity, using several tools to analyze the two technologies, i.e., virtualization and containerization. We observe whether containers show a lower response time than virtual machines; if so, elasticity can be improved in the cloud at larger scale, which may improve cloud efficiency considerably and make the cloud more attractive.
Article
Full-text available
Microservices are containerized, loosely coupled, interactive smaller units of an application that can be deployed, reused, and maintained independently. In a microservices-based application, allocating the right computing resources to each containerized microservice is important to meet specific performance requirements while minimizing the infrastructure cost. Microservices-based applications are easy to scale automatically based on incoming workload and resource demand. However, it is challenging to identify the right amount of resources for the containers hosting microservices and then allocate them dynamically during auto-scaling. Existing auto-scaling solutions for microservices focus on identifying the appropriate time and number of containers to be added or removed dynamically for an application. However, they do not address the issue of selecting the right amount of resources, such as CPU cores, for individual containers during each scaling event. This paper presents a novel approach to dynamically allocate CPU resources to containerized microservices during auto-scaling events. Our proposed approach is based on a machine learning method that can identify the right amount of CPU resources for each container spawned dynamically for the microservices over time to satisfy the application's response time requirements. The proposed solution is evaluated using a benchmark microservices-based application driven by real-world workloads on a Kubernetes cluster. The experimental results show that the proposed solution outperforms the state-of-the-art baseline methods, yielding a 40% to 60% reduction in response time requirement violations at 0.5x to 1.5x lower cost.
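To make the idea concrete, the sketch below trains a small regressor on hypothetical profiling data (request rate and response time target versus the CPU cores that met the target) and rounds its prediction up to a scheduler granularity. It uses scikit-learn's RandomForestRegressor purely as a stand-in for whatever model the authors trained; the features, data, and rounding step are illustrative assumptions.

    # Sketch of learning a CPU-cores predictor for a container from profiling data;
    # scikit-learn is a stand-in here, and all numbers are made up for illustration.
    import math
    from sklearn.ensemble import RandomForestRegressor

    # (request_rate_rps, target_response_ms) -> CPU cores that met the target
    X = [[50, 200], [100, 200], [200, 200], [100, 100], [200, 100], [400, 100]]
    y = [0.5, 1.0, 2.0, 1.5, 3.0, 5.0]

    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

    def cores_for(request_rate, target_ms, step=0.25):
        raw = model.predict([[request_rate, target_ms]])[0]
        return math.ceil(raw / step) * step   # round up to the scheduler's granularity

    print("allocate", cores_for(300, 100), "cores to the new container")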
Chapter
Resource scaling is widely employed in cloud computing to adapt system operation to internal (i.e., application) and external (i.e., environment) changes. We present a quantitative approach for coordinated vertical scaling of resources in cloud computing workflows, aimed at satisfying an agreed Service Level Objective (SLO) by improving the workflow end-to-end (e2e) response time distribution. Workflows consist of IaaS services running on dedicated clusters, statically reserved before execution. Services are composed through sequence, choice/merge, and balanced split/join blocks, and have generally distributed (i.e., non-Markovian) durations, possibly over bounded supports, facilitating the fitting of analytical distributions from observed data. Resource allocation is performed through an efficient heuristic guided by the mean makespans of sub-workflows. The heuristic performs a top-down visit of the hierarchy of services and exploits an efficient compositional method to derive the response time distribution and the mean makespan of each sub-workflow. Experimental results on a workflow with a high degree of concurrency appear promising for the feasibility and effectiveness of the approach.
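The chapter composes full response time distributions; the crude Python sketch below only combines mean makespans for sequence and balanced split/join blocks, which is enough to convey how a top-down heuristic might be guided. Note that taking the maximum of branch means underestimates the true mean makespan of a join, so this is an assumption-laden approximation, not the chapter's method.

    # Crude approximation (assumption, not the compositional method of the chapter):
    # combine mean service times of sub-workflows for sequence and split/join blocks.
    def seq(*means):
        return sum(means)       # sequence: mean makespans add up

    def join(*means):
        return max(means)       # split/join: bounded below by the slowest branch

    # hypothetical workflow: A ; (B || C) ; D, with means in seconds
    makespan = seq(2.0, join(3.5, 4.2), 1.3)
    print("estimated mean makespan:", makespan)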
Chapter
Energy reduction has become a necessity for modern datacentres, with CPU being a key contributor to the energy consumption of nodes. Increasing the utilization of CPU resources on active nodes is a key step towards energy efficiency. However, this is a challenging undertaking, as the workload can vary significantly among the nodes and over time, exposing operators to the risk of overcommitting the CPU. In this paper, we explore the trade-off between energy efficiency and node overloads, to drive virtual machine (VM) consolidation in a cost-aware manner. We introduce a model that uses runtime information to estimate the target utilization of the nodes to control their load, identifying and considering correlated behavior among collocated workloads. Moreover, we introduce a VM allocation and node management policy that exploits the model to increase the profit of datacentre operators considering the trade-off between energy reduction and potential SLA violation costs. We evaluate our work through simulations using node profiles derived from real machines and workloads from real datacentre traces. The results show that our policy adapts the nodes’ target utilization in a highly effective way, converging to a target utilization that is statically optimal for the workload at hand. Moreover, we show that our policy closely matches, or even outperforms two state-of-the-art policies that combine VM consolidation with VFS – the second one, also operating the CPU at reduced voltage margins – even when these are configured to use a static, workload- and architecture-specific target utilization derived through offline characterization of the workload.
Article
Full-text available
Horizontally-scalable Internet services on clusters of commodity computers appear to be a great fit for automatic control: there is a target output (the service-level agreement), an observed output (actual latency), and a gain controller (adjusting the number of servers). Yet few datacenters are automated this way in practice, due in part to well-founded skepticism about whether the simple models often used in the research literature can capture complex real-life workload/performance relationships and keep up with changing conditions that might invalidate the models. We argue that these shortcomings can be fixed by importing modeling, control, and analysis techniques from statistics and machine learning. In particular, we apply rich statistical models of the application's performance, simulation-based methods for finding an optimal control policy, and change-point methods to find abrupt changes in performance. Preliminary results running a Web 2.0 benchmark application driven by real workload traces on Amazon's EC2 cloud show that our method can effectively control the number of servers, even in the face of performance anomalies.
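The target-output/observed-output/gain-controller framing in the abstract can be sketched in a few lines of Python; the proportional adjustment and gain value below are assumptions for illustration and deliberately ignore the statistical modeling, simulation-based policy search, and change-point detection that the paper actually proposes.

    # Toy gain controller (illustrative assumption, not the paper's policy):
    # nudge the server count in proportion to the latency-SLA violation.
    def next_server_count(current, observed_latency_ms, target_latency_ms,
                          gain=0.02, min_servers=1, max_servers=50):
        error = observed_latency_ms - target_latency_ms
        adjustment = round(gain * error)      # positive error -> add servers
        return max(min_servers, min(max_servers, current + adjustment))

    servers = 4
    for latency in [180, 260, 420, 300, 190]:   # hypothetical observed latencies
        servers = next_server_count(servers, latency, target_latency_ms=200)
        print("latency", latency, "ms -> run", servers, "servers")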
Conference Paper
Full-text available
As businesses have grown, so has the need to deploy IT applications rapidly to support the expanding business processes. Often, this growth was achieved in an unplanned way: each time a new application was needed, a new server along with the application software was deployed and new storage elements were purchased. In many cases this has led to what is often referred to as "server sprawl", resulting in low server utilization and high system management costs. An architectural approach that is becoming increasingly popular to address this problem is known as server virtualization. In this paper we introduce the concept of server consolidation using virtualization and point out associated issues that arise in the area of application performance. We show how some of these problems can be solved by monitoring key performance metrics and using the data to trigger migration of virtual machines among physical servers. The algorithms we present attempt to minimize the cost of migration and maintain acceptable application performance levels.
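A minimal sketch of the monitoring-triggered migration idea might look like the Python below: a host is flagged for migration only when its CPU utilization stays above a threshold for several consecutive samples. The threshold, window, and decision logic are assumptions; the paper's algorithms additionally weigh migration cost.

    # Hypothetical trigger logic, not the paper's algorithm: flag a host for VM
    # migration only after its CPU utilization stays above a threshold for several
    # consecutive monitoring intervals, to avoid reacting to transient spikes.
    def needs_migration(cpu_samples, threshold=0.85, sustained=3):
        recent = cpu_samples[-sustained:]
        return len(recent) == sustained and all(u > threshold for u in recent)

    host_cpu = [0.70, 0.88, 0.91, 0.93]    # last few utilization samples
    if needs_migration(host_cpu):
        print("select the cheapest-to-move VM on this host and migrate it")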
Conference Paper
Full-text available
Virtualization and consolidation of IT resources have created a need for more effective workload management tools that dynamically control resource allocation to a hosted application in order to achieve quality of service (QoS) goals. These goals can in turn be driven by the utility of the service, typically based on the application's service level agreement (SLA) as well as the cost of the resources allocated. In this paper, we build on our earlier work on dynamic CPU allocation to applications on shared servers and present a feedback control system consisting of two nested integral control loops for managing the QoS metric of the application along with the utilization of the allocated CPU resource. The control system was implemented on a lab testbed running an Apache Web server and using the 90th percentile of the response times as the QoS metric. Experiments using a synthetic workload based on an industry benchmark validated two important features of the nested control design. First, compared to a single loop controlling response time only, the nested design is less sensitive to the bimodal behavior of the system, resulting in more robust performance. Second, compared to a single loop controlling CPU utilization only, the new design provides a framework for dealing with the trade-off between better QoS and lower resource cost, therefore resulting in better overall utility of the service.
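The nested-loop structure can be sketched as two integral controllers, with the outer loop mapping the response-time error to a CPU-utilization set point and the inner loop adjusting the CPU entitlement to track that set point. The gains, limits, and units in the Python below are assumptions, not the values used in the paper.

    # Two nested integral controllers (assumed gains and units, for illustration):
    # outer loop: response-time error -> target CPU utilization
    # inner loop: utilization error   -> CPU entitlement (cores)
    class IntegralController:
        def __init__(self, gain, output, lo, hi):
            self.gain, self.output, self.lo, self.hi = gain, output, lo, hi

        def step(self, error):
            self.output = min(self.hi, max(self.lo, self.output + self.gain * error))
            return self.output

    outer = IntegralController(gain=-0.002, output=0.6, lo=0.3, hi=0.9)  # target utilization
    inner = IntegralController(gain=0.5, output=1.0, lo=0.2, hi=4.0)     # CPU entitlement

    def control_step(resp_time_p90_ms, target_ms, measured_utilization):
        target_util = outer.step(resp_time_p90_ms - target_ms)
        return inner.step(measured_utilization - target_util)

    print("new CPU entitlement:", control_step(450, 300, measured_utilization=0.95))

Integral (rather than purely proportional) action lets the entitlement settle at whatever value keeps the utilization on target without a steady-state error, which is the usual reason for choosing it in this kind of loop.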
Conference Paper
Full-text available
Since many Internet applications employ a multi-tier architecture, in this paper, we focus on the problem of analytically modeling the behavior of such applications. We present a model based on a network of queues, where the queues represent different tiers of the application. Our model is sufficiently general to capture (i) the behavior of tiers with significantly different performance characteristics and (ii) application idiosyncrasies such as session-based workloads, concurrency limits, and caching at intermediate tiers. We validate our model using real multi-tier applications running on a Linux server cluster. Our experiments indicate that our model faithfully captures the performance of these applications for a number of workloads and configurations. For a variety of scenarios, including those with caching at one of the application tiers, the average response times predicted by our model were within the 95% confidence intervals of the observed average response times. Our experiments also demonstrate the utility of the model for dynamic capacity provisioning, performance prediction, bottleneck identification, and session policing. In one scenario, where the request arrival rate increased from less than 1500 to nearly 4200 requests/min, a dynamic provisioning technique employing our model was able to maintain response time targets by increasing the capacity of two of the application tiers by factors of 2 and 3.5, respectively.
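A much simpler model than the closed queueing network used in the paper still conveys the core intuition: each tier behaves roughly like a queue whose mean response time grows as 1/(mu - lambda), and the end-to-end response time is the sum over tiers. The Python sketch below computes this for an open tandem of M/M/1 queues with hypothetical tier capacities; it is an illustration of the modeling idea, not the paper's model.

    # Open tandem of M/M/1 queues (a simplification of the paper's closed network):
    # per-tier mean response time is 1/(mu - lambda); end-to-end time is the sum.
    def end_to_end_response_time(arrival_rate, service_rates):
        total = 0.0
        for mu in service_rates:
            if arrival_rate >= mu:
                raise ValueError("tier saturated: arrival rate exceeds capacity")
            total += 1.0 / (mu - arrival_rate)
        return total

    # hypothetical tiers: web, app, database (requests/s each tier can serve)
    print(end_to_end_response_time(arrival_rate=40, service_rates=[120, 80, 60]), "s")

In this toy configuration the database tier (mu = 60) runs at the highest utilization and saturates first, which is exactly the kind of bottleneck the full model is used to identify and provision against.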
Conference Paper
Full-text available
A Service-Level Agreement (SLA) provides surety for specific quality attributes to the consumers of services. However, the current SLAs offered by cloud providers do not address response time, which, from the user’s point of view, is the most important quality attribute for Web applications. Satisfying a maximum average response time guarantee for Web applications is difficult for two main reasons: first, traffic patterns are unpredictable; second, the complex nature of multi-tier Web applications increases the difficulty of identifying bottlenecks and resolving them automatically. This paper presents a working prototype system that automatically detects and resolves bottlenecks in a multi-tier Web application hosted on a EUCALYPTUS-based cloud in order to satisfy specific maximum response time requirements. We demonstrate the feasibility of the approach in an experimental evaluation with a testbed cloud and a synthetic workload. Automatic bottleneck detection and resolution under dynamic resource management has the potential to enable cloud providers to provide SLAs for Web applications that guarantee specific response time requirements.
Conference Paper
Full-text available
Data centers are often under-utilized due to over-provisioning as well as time-varying resource demands of typical enterprise applications. One approach to increase resource utilization is to consolidate applications in a shared infrastructure using virtualization. Meeting application-level quality of service (QoS) goals becomes a challenge in a consolidated environment as application resource needs differ. Furthermore, for multi-tier applications, the amount of resources needed to achieve their QoS goals might be different at each tier and may also depend on availability of resources in other tiers. In this paper, we develop an adaptive resource control system that dynamically adjusts the resource shares to individual tiers in order to meet application-level QoS goals while achieving high resource utilization in the data center. Our control system is developed using classical control theory, and we used a black-box system modeling approach to overcome the absence of first principle models for complex enterprise applications and systems. To evaluate our controllers, we built a testbed simulating a virtual data center using Xen virtual machines. We experimented with two multi-tier applications in this virtual data center: a two-tier implementation of RUBiS, an online auction site, and a two-tier Java implementation of TPC-W. Our results indicate that the proposed control system is able to maintain high resource utilization and meets QoS goals in spite of varying resource demands from the applications.
Conference Paper
Full-text available
Current service-level agreements (SLAs) offered by cloud providers make guarantees about quality attributes such as availability. However, although one of the most important quality attributes from the perspective of the users of a cloud-based Web application is its response time, current SLAs do not guarantee response time. Satisfying a maximum average response time guarantee for Web applications is difficult due to unpredictable traffic patterns, but in this paper we show how it can be accomplished through dynamic resource allocation in a virtual Web farm. We present the design and implementation of a working prototype built on a EUCALYPTUS-based heterogeneous compute cloud that actively monitors the response time of each virtual machine assigned to the farm and adaptively scales up the application to satisfy a SLA promising a specific average response time. We demonstrate the feasibility of the approach in an experimental evaluation with a testbed cloud and a synthetic workload. Adaptive resource management has the potential to increase the usability of Web applications while maximizing resource utilization.
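The prototype's behaviour can be summarized as a monitor-and-scale loop: periodically sample the farm's average response time and request another VM whenever it exceeds the promised value. The Python skeleton below is only a hedged sketch of that loop; get_avg_response_time and launch_instance are hypothetical placeholders for the monitoring and cloud API calls, not functions from the actual system.

    # Skeleton of a monitor-and-scale loop (assumed thresholds and placeholder
    # callbacks, not the prototype's code).
    import time

    SLA_MS = 1000          # promised average response time
    COOLDOWN_S = 120       # let a newly launched VM warm up before re-checking

    def autoscale_loop(get_avg_response_time, launch_instance, interval_s=30):
        while True:
            avg_ms = get_avg_response_time(window_s=60)
            if avg_ms is not None and avg_ms > SLA_MS:
                launch_instance()          # add a VM to the web farm
                time.sleep(COOLDOWN_S)
            else:
                time.sleep(interval_s)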
Conference Paper
Web applications’ traffic demand fluctuates widely and unpredictably. The common practice of provisioning a fixed capacity would either result in unsatisfied customers (under-provisioning) or waste valuable capital investment (over-provisioning). By leveraging an infrastructure cloud’s on-demand, pay-per-use capabilities, we can finally match the capacity with the demand in real time. This paper investigates how we can build a web server farm in the cloud. We first present a benchmark performance study of various cloud components, which not only shows their performance results but also reveals their limitations. Because of these limitations, no single configuration of cloud components can excel in all traffic scenarios. We then propose a dynamic switching architecture which dynamically switches among several configurations depending on the workload and traffic pattern.
Article
With the significant advances in Information and Communications Technology (ICT) over the last half century, there is an increasingly perceived vision that computing will one day be the 5th utility (after water, electricity, gas, and telephony). This computing utility, like all other four existing utilities, will provide the basic level of computing service that is considered essential to meet the everyday needs of the general community. To deliver this vision, a number of computing paradigms have been proposed, of which the latest one is known as Cloud computing. Hence, in this paper, we define Cloud computing and provide the architecture for creating Clouds with market-oriented resource allocation by leveraging technologies such as Virtual Machines (VMs). We also provide insights on market-based resource management strategies that encompass both customer-driven service management and computational risk management to sustain Service Level Agreement (SLA)-oriented resource allocation. In addition, we reveal our early thoughts on interconnecting Clouds for dynamically creating global Cloud exchanges and markets. Then, we present some representative Cloud platforms, especially those developed in industries, along with our current work towards realizing market-oriented resource allocation of Clouds as realized in Aneka enterprise Cloud technology. Furthermore, we highlight the difference between High Performance Computing (HPC) workload and Internet-based services workload. We also describe a meta-negotiation infrastructure to establish global Cloud exchanges and markets, and illustrate a case study of harnessing ‘Storage Clouds’ for high performance content delivery. Finally, we conclude with the need for convergence of competing IT paradigms to deliver our 21st century vision.
Article
A systematic study of issues related to suspending, migrating, and resuming virtual clusters for data-driven HPC applications is presented. The interest is focused on nontrivial virtual clusters, that is, clusters in which the running computation is expected to be coordinated and strongly coupled. It is shown that this requires all cluster-level operations, such as start and save, to be performed as synchronously as possible on all nodes, introducing the need for barriers at the virtual-cluster computing meta-level. Once a synchronization mechanism is provided and appropriate transport strategies have been set up, it is possible to suspend, migrate, and resume whole virtual clusters composed of “heavy” (4 GB RAM, 6 GB disk images) virtual machines in times on the order of a few minutes without disrupting the parallel computation (albeit of the MapReduce type) running inside them. The approach is intrinsically parallel and should scale without problems to larger virtual clusters.