Popularity (Hit Rate) Based Replica Creation for Enhancing the Availability in Cloud Storage

January 2018
International Journal of Intelligent Engineering and Systems 11(2)

DOI:10.22266/ijies2018.0430.18

Authors:

Bishop Heber College

Received: November 18, 2017 161

International Journal of Intelligent Engineering and Systems, Vol.11, No.2, 2018 DOI: 10.22266/ijies2018.0430.18

Popularity (Hit Rate) Based Replica Creation for Enhancing the Availability in Cloud Storage

S. Annal Ezhil Selvi1* R. Anbuselvi1

1Department of Computer Science, Bishop Heber College,

Trichy, Tamilnadu, 620017, India

* Corresponding author’s Email: ezhilabel.bhc@gamil.com

Abstract: In cloud computing, replication management is widely adopted in cloud storage applications. To provide availability and reliability, the replication system copies files and stores the replicas on different servers. Replication, however, raises issues of its own: high memory consumption, high storage cost, and more complicated file access. In the existing technique, the File Accessing Frequency based Ranking (FAFR) algorithm and the Dynamically Reduced Replica for Rarely Accessed files (DRRRA) algorithm work jointly to identify rarely accessed files, retain their replicas on two servers, and delete the other replicas. Serving a growing number of requests with only 2 or 3 replicas is difficult. This paper therefore proposes a Dynamic replica Creation for Availability enhanced Storage (DRCAES) algorithm, which works jointly with the FAFR algorithm to predict the most frequently accessed files and automatically replicate them to other servers based on server memory. The aim of the proposed approach is to enhance availability and thereby reduce the request-response delay time. The proposed approach thus optimizes the number of replicas, the occupied space, and the cost.

Keywords: Cloud storage, Replication, Reduce replica, Dynamic replica, File popularity, File accessing frequency.

1. Introduction

Leading cloud service providers such as Microsoft Azure, Amazon, and Google Cloud Storage (GCS) offer different types of storage (i.e., sequences of files, etc.) at different prices. Storing and accessing files on redundant storage is a difficult problem. Each cloud service provider also provides and monitors the commands to retrieve, store and delete data through network services, which impose in- and out-network delay and cost on an application [1]. With the leading cloud service providers, in-network traffic is free, while out-network traffic (the network cost of accessing data) is charged, and the charge may differ between providers. Transferring or replicating data from one server to another therefore shows significant price differences among providers. This diversity plays an essential role in optimizing the request-response delay of data management in cloud environments [2]. The proposed technique aims at optimizing this request-response delay together with the cost, which consists of the residential cost (i.e., storage) and the potential migration cost (i.e., network server cost). Many cloud applications are moving towards distributed, interconnected network environments. In such an environment, the data storage and all computational cloud resources are distributed across different, widespread locations based on ranking.

A cloud server stores data that may be needed by a huge number of users who require access to huge data volumes. For example, consider a set of documents, images, or videos that need to be read and accessed in a distributed way by a number of users spread worldwide. Access to vast data volumes by a huge number of users can be very time consuming. As the size of the system increases, providing such a data service becomes more difficult, since its users suffer from long delays in data access.

Replica management in cloud storage is a new research field, and dynamic replication policies are still rare. This paper focuses on replica creation in distributed storage file systems. Taking the characteristics of cloud storage into account, it designs a dynamic replication strategy based on high prediction. The strategy creates replicas according to the file access history, so that users can access a nearby replica. With ranking, this brings the cloud system close to its best state: the number of replicas is minimal, access efficiency is highest, and network lifetime is increased.

A fixed number of replicas for every file is insufficient to give fast reads for hot files, while it wastes resources on storing replicas of cold files. Random selection of the replica destination requires keeping all Data Centers active to ensure data availability, which wastes power. Because random selection of the replica destination does not consider bandwidth and request-handling capacity, network congestion can occur due to the capacity restrictions of some links, and servers may become overloaded by data requests.

In this environment, data replication is essential so that users can retrieve the most requested data from storage residing on a nearby server. The replication is based on server memory [3]. The performance of distributed networks is crucially affected by the replication strategy used. The vast majority of known replication strategies determine the replicas by computing a simple ranking based on the number of requests for each file on an individual cloud server. The “most accessed” files, the ones with the highest ranking value, are selected for replication to other servers based on memory.

The rationale of this ranking technique is that files with high recent demand are quite likely to be requested again on the cloud server, with even higher hit rates. Since the computation is quite simple, the strategy mainly focuses on the problem of selecting the most suitable files for which to store replicas, based on memory [4].

The two main drawbacks of the strategies proposed so far are addressed by two techniques implemented in this work: the first is an optimization process and the second is a high prediction ranking algorithm. The schemes presented in the literature do not take into account changes that may arise in users' interest in certain files. Instead, they mostly rely on one or more factors that determine the importance of the files themselves, such as the file size, the number of requests for an individual file, or the contents of a file. In the optimization process, the user request-response time and the files are analysed [5].

In the second process, user requests are analysed to determine which files receive the most hits on each individual server. The existing replication decision algorithm achieves a better request-response time, but replicating the most-hit files is complicated [5]. The hit ratio is calculated and monitored by the high prediction ranking algorithm. This algorithm identifies the most-hit files and replicates them to other servers based on server memory. The replication process automatically reduces the delay and the request-response time on the server. With this approach, the cloud storage system acts as a Role-Based Intelligent System (RBIS), so that the user obtains efficient data storage in the cloud computing environment.

Some of the roles of the DRCAES approach are listed below:

• Dynamically predict rarely accessed files and the most frequently accessed files using the FAFR model.

• Through DRRRA, dynamically reduce the number of replicas of the rarely accessed files, so that the cost and the occupied space are minimized. To remove a replica, it finds the DC with the minimum available space among the DCs where that file exists (Removal).

• In DRCAES, dynamically create and place a new replica for the most accessed file if all of its replicas are accessed equally often; otherwise the file is not replicated. The new replica is placed in the data center that has the most available space and does not already hold that file.

• Balanced storage is retained during removal and new replica placement, because the approach analyses all aspects of the storage system, such as available space, SLA, and the accessing frequency of all existing replicas.

Existing work is discussed in Section 2, and the overall background process is described in Section 3. Section 4 describes the proposed work, which improves the overall request time and reduces the delay using the high prediction ranking algorithm. Section 5 discusses the results, and Section 6 concludes.

2. Related works

Addressing the data replication issue effectively requires taking a closer look at the arrangement of the most common services and applications deployed on storage clouds to provide services to other parties. Such applications are usually implemented as multi-tier applications running on distributed software systems. When a user issues a request to a multi-tier application, the overall response time is very high. The storage strategies so far consider the number of requests (hits) as the main factor for computing the popularity of each file [6].

The limitation of current research on data replication in cloud servers is that it is either a theoretical investigation without practical consideration, or a heuristics-based implementation without a provable performance guarantee. The work most directly related to this replication work formulates data replication and request response on cloud servers as a static optimization problem on user access [7]. The authors show that this problem is NP-hard, which means that no polynomial-time algorithm is known that provides an exact solution. They consider only static data replication for the purpose of analysis. The limitation of the static approach is that the replication cannot adapt to dynamically shifting user access patterns. Additionally, their centralized integer programming technique cannot easily be implemented in a distributed cloud server.

Other work on request response and resource sharing uses an auction protocol to make the replication choice and to trigger long-term optimization using file access patterns, and proposes utility-based replication strategies on cloud servers. That work addresses data replication for availability in the face of unreliability, which differs from the optimization work in this paper [8, 9].

Random selection of the replica destination neglects server heterogeneity (i.e., different Data Centers vary in data request handling capacities and network capacities). The writes due to creating replicas in the production clusters of a search engine application account for almost half of all cross-rack traffic. While the network within clusters is frequently underutilized, there exist some bottleneck links resulting from the network usage imbalance [10].

For the multi-facility cloud resource allocation problem, existing work is mainly interested in solutions that are amenable to parallel implementation. There are several reasons for this. First, cloud resource allocation is inherently a large convex resource optimization problem, with millions of variables or even more. A centralized cloud server resource allocation solver is extremely inefficient at solving such large-scale cloud storage problems [11].

3. Background process

Cloud services, such as search engines, education portals, parallel applications, and social networking, are often deployed on a geographically distributed infrastructure, i.e., data centers placed in different regions for better performance and reliability. A common question is then how to direct the workload from users among the set of geo-distributed data centers in order to achieve a desired trade-off between performance and delay, since the power price exhibits a significant degree of geographical diversity [12]. This question has attracted much attention recently and is usually referred to as geographical cloud server load balancing.

The resource-scheduling-based data replication problem focuses on scheduling embarrassingly parallel workloads, which are composed of a set of independent tasks with very minimal or no data synchronization. A huge number of applications fit this type of resource sharing on cloud storage. Examples include distributed relational database queries, search engine queries, BLAST searches, data processing, and image processing applications such as ray tracing. To run an embarrassingly parallel workload, each of its tasks is placed on a physical server and executed on an external server added for that task. The completion time of the workload is the completion time of its last finished request, i.e., the makespan of that set of requests [13].

The conventional data caching/replication problem has been studied extensively in the context of the Web, distributed cloud databases, and multimedia systems. What differs from Web caching is that disk memory and I/O bandwidth are the main concerns in multimedia storage systems. A number of algorithms have been proposed to attain a high acceptance rate and resource utilization by balancing the use of different resources. Unlike Web search engine and multimedia data, database contents are accessed by both read and write operations, based on the optimization process and ranking [14].

It is assumed that files accessed frequently in the past will be accessed more than the others in the future. This is called high prediction ranking on temporal locality. Using the property of temporal locality, the most popular data file is determined by analysing the number of accesses to the data files from users. After finding the most popular file, we trace the client that produces the most requests for it, and a new replica is placed there. Therefore, the application has to collect history records of the end-to-end data transfers to decide which file should be replicated [15].

4. Proposed framework and algorithm

Dynamic replication of data is another significant issue, since the access frequencies of individual data items are likely to change in most cloud server environments. The aim is to make the replication strategy adapt rapidly and accurately to such changes and to achieve an optimal ranking process for long-term performance.

In [12], the authors establish that, to take advantage of cached data, it is sometimes necessary to process individual queries using “suboptimal” plans in order to reach high system performance. In this work, data replication is triggered as a result of changes in request rates in the cloud server.

4.1 Dynamic replica creation availability enhanced storage framework

The proposed contribution uses an optimization-based high prediction ranking algorithm. The ranking algorithm (FAFR) gives better performance in request-response time and delay. ‘N’ users issue requests and access files on the cloud server. File lookup, identification of the requesting user, and request queuing are all handled by the cloud server. Fig. 1 shows the architecture of optimization-based ranking for incoming requests and their responses. The user submits a request, and the cloud analyses which files are requested. The server files are ordered by ranking (most popular first). For example, if the file java.doc is the one most often requested and downloaded by users, java.doc is replicated to another server based on server memory.

We design data-sharing-aware optimization algorithms for solving the resource request problem. Before describing the algorithms, we establish a few static definitions and assumptions concerning the cloud servers. The Data Centers managed by the cloud service provider are in one of the following two states: active (available) and replicate.

An active cloud server is a server that is powered on and is currently considered for all resource allocation by the algorithms. A replicate server is a server whose most frequently requested files are considered by the algorithms for replication to other servers based on memory allocation. Fi denotes the set of most frequently accessed files and Si the number of Data Centers. Files are hosted across all cloud Data Centers, and the most frequently accessed files are replicated using the ranking-based technique.

Figure.1 Proposed architecture

Data Centers are configured and located in different geo-locations, and files are stored in these Data Centers. The previous work of this research proposed the Dynamically Reduced Replica for Rarely Accessed files (DRRRA) algorithm [15], which reduces the replicas of the least accessed files based on file popularity, as determined by the ranking algorithm (File Accessing Frequency based Ranking) [14]. The ranking algorithm predicts the files based on their popularity. This paper, in contrast, focuses on the most frequently accessed files, i.e., the most popular files. If user demand for a file increases, that file is replicated dynamically.

4.2 Mathematical model for replica creation and placement

The following notation and equations are used in the prediction of frequently accessed files and in the replica increasing process. Files are indexed by $i$ and Data Centers by $j$ throughout.

• $m$: number of uploaded files.

• $n$: number of Data Centers (DC).

• $F_i$: $i$th file, $i = 1, \dots, m$.

• $DC_j$: $j$th DC, $j = 1, \dots, n$.

• $\mathrm{RAF}_{i,j}$: Replica Accessing Frequency (RAF) (hit rate) of the $i$th file in the $j$th DC, arranged in the following $m \times n$ matrix:

$$\mathrm{RAF} = \begin{bmatrix} \mathrm{RAF}_{1,1} & \cdots & \mathrm{RAF}_{1,n} \\ \vdots & \ddots & \vdots \\ \mathrm{RAF}_{m,1} & \cdots & \mathrm{RAF}_{m,n} \end{bmatrix}$$

• $\mathrm{FAF}_i$: File Accessing Frequency of the $i$th file, computed using Eq. (1) as

$$\mathrm{FAF}_i = \sum_{j=1}^{n} \mathrm{RAF}_{i,j}, \quad i = 1, \dots, m. \tag{1}$$

• $\mathrm{TFL}_i$: Total File List (TFL) of the $i$th file over the DCs $j = 1, \dots, n$.

• $\mathrm{AFL}_i = L_i$: Allocated File List (AFL) of the $i$th file, $L_i = \{\, j \in \mathrm{TFL}_i : \mathrm{RAF}_{i,j} > 0 \,\}$.

• $\mathrm{NAFL}_i$: Non Allocated File List of the $i$th file, $\{\, j \in \mathrm{TFL}_i : \mathrm{RAF}_{i,j} = 0 \,\}$.

• $\mathrm{TFLDC}_j = K_j$: Total File List of the $j$th Data Center over the files $i = 1, \dots, m$.

• $\mathrm{AFLDC}_j = D_j$: Allotted File List of the $j$th Data Center, $D_j = \{\, i \in K_j : \mathrm{RAF}_{i,j} > 0 \,\}$.

• $\mathrm{DCC}_j$: Data Center Capacity (DCC) of the $j$th DC, $j = 1, \dots, n$.

• $\mathrm{FS}_i$: File Size of the $i$th file, $i = 1, \dots, m$.

• $\mathrm{OS}_j$: Occupied Space of the $j$th DC, calculated using the following equation:

$$\mathrm{OS}_j = \sum_{i \in D_j} \mathrm{FS}_i, \quad j = 1, \dots, n. \tag{2}$$

• $\mathrm{AS}_j$: Available Space of the $j$th DC, calculated using Eq. (3) as

$$\mathrm{AS}_j = \mathrm{DCC}_j - \mathrm{OS}_j, \quad j = 1, \dots, n. \tag{3}$$

• FA_Th: Frequently Accessed files Threshold Value.

• Th_LL: Lower Limit of the threshold value.

• Th_UL: Upper Limit of the threshold value.

The prediction process has two levels, which together find the files that really need more replicas to meet the availability enhancement requirements. In the first-level prediction, the most frequently accessed files are predicted using Eq. (4), based on the FAF of the FAFR model.

The second-level prediction is performed on the result of the first level, which is a set of the most frequently accessed files. From these files, the second-level prediction finds the files that really require a new replica to provide seamless availability. This is done using Eq. (5), based on the RAF of the FAFR model.

$$\mathrm{FAF}_i \ge \mathrm{FA\_Th} \tag{4}$$

$$DC^{*} = \begin{cases} \arg\max_{j \in \mathrm{NAFL}_i} \mathrm{AS}_j, & \text{if } \mathrm{Th\_LL} \le \mathrm{RAF}_{i,j} \le \mathrm{Th\_UL} \text{ for every } j \in \mathrm{AFL}_i, \\ \text{none (no new replica)}, & \text{otherwise,} \end{cases} \tag{5}$$

where $DC^{*}$ denotes the data center that receives the new replica of file $i$.

To be exact, the access frequency of each replica is checked: if all replicas of the file are accessed about equally (all RAF values lie within the threshold range), the file is replicated; otherwise it is not. The replica placement is then based on the available space of the data centers. That is, the dynamically created replica is placed in the data center that has the most available space and does not already hold that file.
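The quantities defined by Eqs. (1) to (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the RAF values are assumed to be held as a list of rows (one row per file, one column per data center), a replica is taken to exist wherever its RAF is greater than 0, and the function names and sample numbers are hypothetical rather than taken from the paper.

```python
from typing import List


def file_accessing_frequency(raf: List[List[int]]) -> List[int]:
    """Eq. (1): the FAF of each file is the sum of its RAF values over all DCs."""
    return [sum(row) for row in raf]


def occupied_space(raf: List[List[int]], file_sizes: List[float]) -> List[float]:
    """Eq. (2): the OS of a DC is the total size of the files it holds (RAF > 0)."""
    n_dcs = len(raf[0]) if raf else 0
    return [
        sum(file_sizes[i] for i, row in enumerate(raf) if row[j] > 0)
        for j in range(n_dcs)
    ]


def available_space(capacities: List[float], occupied: List[float]) -> List[float]:
    """Eq. (3): AS = DCC - OS for each DC."""
    return [dcc - os_j for dcc, os_j in zip(capacities, occupied)]


if __name__ == "__main__":
    # Hypothetical data: 2 files (rows) stored across 3 data centers (columns).
    raf = [[14, 0, 20], [0, 5, 6]]
    sizes_gb = [0.5, 1.0]
    capacities_gb = [5.0, 5.0, 5.0]
    occupied = occupied_space(raf, sizes_gb)
    print("FAF:", file_accessing_frequency(raf))
    print("OS :", occupied)
    print("AS :", available_space(capacities_gb, occupied))
```

Keeping the RAF matrix as the single source of truth means FAF, OS and AS never have to be stored separately, at the price of recomputing them on demand.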

4.3 File accessing frequency based ranking (FAFR) algorithm

FAFR is the ranking algorithm proposed in [14]. In this paper, it is given a new name and refined. RAF is the Replica Accessing Frequency (hit rate). Initially, the RAF value is 0; when the file is uploaded, it becomes 1, and whenever the file is accessed, it is incremented. Finally, the sum of the RAF values of a file is calculated and taken as its File Accessing Frequency (FAF).

Input: Fi, DCj, M(RAF), k = 0, RAFi,j = 0, q

Output: Rank {q’s result set}

1. Do for file Fi in data center DCj when the user interface is triggered

2. If file upload

3. RAFi,j = 1

4. If file access

5. RAFi,j = RAFi,j + 1

6. For each Fi do

7. For each DCj do

8. k = k + RAFi,j

9. End

10. Rank.insert(all values)

11. If Rank = q

12. Return Rank
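The counting and ranking steps above can be illustrated with a small Python sketch. It is not the authors' implementation: the class name `FafrCounter`, its methods, and the reading of `q` as "how many top-ranked files to return" are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class FafrCounter:
    """Tracks the RAF of every replica and ranks files by FAF (sum of RAF)."""

    def __init__(self) -> None:
        # file name -> {data center id -> RAF}
        self.raf: Dict[str, Dict[str, int]] = defaultdict(dict)

    def upload(self, file: str, dc: str) -> None:
        """Steps 2-3: a newly uploaded replica starts with RAF = 1."""
        self.raf[file][dc] = 1

    def access(self, file: str, dc: str) -> None:
        """Steps 4-5: every access increments the RAF of that replica."""
        if dc not in self.raf.get(file, {}):
            raise KeyError(f"no replica of {file!r} in {dc!r}")
        self.raf[file][dc] += 1

    def faf(self, file: str) -> int:
        """Steps 6-8: FAF is the sum of the file's RAF values over all DCs."""
        return sum(self.raf.get(file, {}).values())

    def rank(self, q: Optional[int] = None) -> List[str]:
        """Steps 10-12: files ordered by descending FAF (ties by name)."""
        ordered = sorted(self.raf, key=lambda f: (-self.faf(f), f))
        return ordered[:q]


if __name__ == "__main__":
    counter = FafrCounter()
    counter.upload("a.docx", "DC1")
    counter.upload("a.docx", "DC2")
    counter.upload("b.pdf", "DC1")
    for _ in range(3):
        counter.access("b.pdf", "DC1")
    counter.access("a.docx", "DC1")
    print(counter.rank())
```

Breaking ties by file name is a choice made here only to keep the ranking deterministic; the paper does not specify a tie-breaking rule.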

Figure.2 Working principle of DRRRA

Fig. 2 shows the working principle of the Dynamically Reduced Replica for Rarely Accessed files (DRRRA) algorithm, which is described in [15]. FAFR and DRRRA work jointly to predict rarely accessed files and then reduce their replicas according to a 2-replica strategy. Here, the minimum number of replicas is 2, to assure reliability, and the maximum is 3 (the 3-replica strategy).

4.4 Dynamic replica creation for availability enhanced storage (DRCAES) algorithm

The following DRCAES algorithm dynamically finds the most frequently accessed files using FAFR [14]. Then, based on the FAFR result, DRCAES dynamically creates a replica and places it on the DC that has the maximum Available Space (AS) and does not already hold a replica of that file.

Input: Fi, DCi, NALk, RAFi,j, FA_Th, Th_LL, Th_UL, DC, Replica

Output: Dynamically increased replica.

1. Set values to FA_Th, Th_LL and Th_UL.

2. If user access

3. FAF ← getFAF()

4. If FAF >= FA_Th then

5. Fi ← getHighRankFiles()

6. DCi ← getHighRankFilesDC()

7. NALk ← get_allNonAllocatedList()

8. For each Fi do

9. For each DCj do

10. RAFi,j ← getRAF(Fi, Dj)

11. If (RAFi,j >= Th_LL and RAFi,j <= Th_UL) then

12. Replica = copyof(Fi)

13. DC = getDCid(max(ASi ← getAvailableSpace(NALk)))

14. REPLICATE(DC, Replica)

15. End

16. End

In the DRCAES algorithm, Steps (3) to (5) obtain the most frequently accessed files. Every user access is monitored, and when the FAF of a file reaches the FA_Th value, the second-level prediction is performed on the RAF values. That is, every RAF must fall within the range checked in Step (11). If, at the end of this check, the result set contains files, those files need more replicas; the replication is done based on the SLA specification and the Available Space of the DCs, as defined in Step (13).

Step (11) verifies whether the file needs to be replicated or not: Th_LL is the lower limit and Th_UL the upper limit of the threshold range that decides whether to replicate. Steps (12) to (14) are executed only when the condition in Step (11) holds.
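The decision logic of DRCAES for a single file can be sketched as follows. This is a hedged illustration, not the authors' code: the function name `drcaes_target` and the dictionary-based inputs are hypothetical, both threshold limits are assumed to be inclusive, and the check that the target DC has enough free space for the file is left out because the algorithm above does not state it.

```python
from typing import Dict, Optional


def drcaes_target(
    raf_by_dc: Dict[str, int],
    available_space: Dict[str, float],
    fa_th: int,
    th_ll: int,
    th_ul: int,
) -> Optional[str]:
    """Return the DC that should receive a new replica, or None.

    raf_by_dc maps each DC that already holds a replica to that replica's RAF.
    available_space maps every DC to its Available Space (AS).
    """
    # First-level prediction: FAF (sum of RAF) must reach FA_Th.
    if sum(raf_by_dc.values()) < fa_th:
        return None
    # Second-level prediction: every replica's RAF must lie in [Th_LL, Th_UL]
    # (inclusive bounds are an assumption of this sketch).
    if not all(th_ll <= raf <= th_ul for raf in raf_by_dc.values()):
        return None
    # Placement: among DCs without a replica, pick the one with the largest AS.
    candidates = {dc: space for dc, space in available_space.items()
                  if dc not in raf_by_dc}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    # Hypothetical input: three replicas, five data centers.
    space = {"DC1": 2.0, "DC2": 1.0, "DC3": 3.5, "DC4": 0.5, "DC5": 3.0}
    print(drcaes_target({"DC1": 12, "DC2": 15, "DC4": 11}, space, 15, 10, 20))
```

Returning `None` for "do not replicate" keeps the caller's control flow explicit; a production system would also need to handle ties in available space and SLA constraints, which are outside this sketch.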

Example:

The worked-out examples of the DRCAES algorithm are presented in Tables 1 to 6 and are discussed below one by one.

The prediction of frequently accessed files based on FAF is represented in Table 1. The FA_Th value used for validation is 15. When the FAF of a file reaches 15, that file comes under consideration. For example, using Eq. (4), files 1, 3, and 5 are under consideration.

Table 1 shows seven Data Centers, each configured with 5 GB of memory and located in different geo-locations. Five different files are stored among the 7 DCs.

Next, these files are verified by the second-level prediction process, i.e., the prediction of highly needed files based on RAF, which is shown in Table 2. The second-level prediction uses Eq. (5), which checks the access frequency of each individual replica; if all replicas are accessed about equally, the file is replicated in one more data center.

Eq. (5) involves two threshold values. The first is Th_LL, whose value for validation is 10. The second is Th_UL, whose value for validation is 20. The replicas of file 3 reside in DC2, DC4, and DC5, and their RAF values are 8, 10, and 6 respectively. These values are not all within the range from Th_LL to Th_UL, so this file need not be replicated. The other two files, 1 and 5, need to be replicated because their RAF values satisfy the range.

Table 3 depicts the process of replica placement, which is described in Eq. (7). The replicas of file 1 reside in DC2, DC4 and DC7, and their RAF values are 14, 20 and 15 respectively.

The replica is placed based on the Available Space (AS) of the data centers that do not hold a replica of the file. Here, DC1, DC3, DC5, and DC6 do not hold a replica of file 1, and their AS values are 4.5, 4.9, 4.7 and 4.8 GB respectively.

DC3 has the maximum AS among these DCs. So the replica is placed in DC3; the resulting changes in the metadata are shown in Table 4.
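The decisions for files 1 and 3 can be traced with the numbers quoted above. The following is an illustrative check only; it assumes that the upper limit Th_UL is inclusive, since the text treats a replica with RAF 20 as lying within the range.

```latex
\begin{align*}
\text{File 1:}\quad & \mathrm{FAF} = 14 + 20 + 15 = 49 \ge \mathrm{FA\_Th} = 15, \\
& 10 \le 14,\ 20,\ 15 \le 20 \;\Rightarrow\; \text{replicate}, \\
& \arg\max\{\mathrm{AS}_{DC1} = 4.5,\ \mathrm{AS}_{DC3} = 4.9,\ \mathrm{AS}_{DC5} = 4.7,\ \mathrm{AS}_{DC6} = 4.8\} = DC3. \\[4pt]
\text{File 3:}\quad & \mathrm{FAF} = 8 + 10 + 6 = 24 \ge \mathrm{FA\_Th} = 15, \\
& 8 < \mathrm{Th\_LL} = 10 \ \text{and}\ 6 < \mathrm{Th\_LL} = 10 \;\Rightarrow\; \text{no new replica}.
\end{align*}
```

File 3 passes the first-level test on FAF but fails the second-level test on RAF, which is why the two levels are needed together.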

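Table 1. Prediction of frequently accessed files based on FAF

| File Name | File Type | File Size | RAF DC 1 (5GB) | RAF DC 2 (5GB) | RAF DC 3 (5GB) | RAF DC 4 (5GB) | RAF DC 5 (5GB) | RAF DC 6 (5GB) | RAF DC 7 (5GB) |
|---|---|---|---|---|---|---|---|---|---|
| Array_Java | docx | 0.07 |  |  |  |  |  |  |  |
| Tree_ds | pdf | 0.11 |  |  |  |  |  |  |  |
| Img_001 | jpeg | 0.31 |  |  |  |  |  |  |  |
| CS_C | mp3 | 0.55 |  |  |  |  |  |  |  |
| HelloEnglish | mp4 |  |  |  |  |  |  |  |  |
| OS (Occupied Space in GB) |  |  | 0.5 | 1.4 | 0.1 | 0.9 | 0.3 | 0.1 | 1.0 |
| AS (Available Space in GB) |  |  | 4.5 | 3.6 | 4.9 | 4.1 | 4.7 | 4.9 | 4.0 |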

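Table 2. Prediction of highly needed files based on RAF

| S.No | File Name | File Type | File Size | RAF DC 1 (5GB) | RAF DC 2 (5GB) | RAF DC 3 (5GB) | RAF DC 4 (5GB) | RAF DC 5 (5GB) | RAF DC 6 (5GB) | RAF DC 7 (5GB) | FAF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Array_Java | docx | 0.07 |  |  |  |  |  |  |  |  |
| 2 | Tree_ds | pdf | 0.11 |  |  |  |  |  |  |  |  |
| 3 | Img_001 | jpeg | 0.31 |  |  |  |  |  |  |  |  |
| 4 | CS_C | mp3 | 0.55 |  |  |  |  |  |  |  |  |
| 5 | HelloEnglish | mp4 |  |  |  |  |  |  |  |  |  |
|  | OS (Occupied Space in GB) |  |  | 0.5 | 1.4 | 0.1 | 0.9 | 0.3 | 0.1 | 1.0 |  |
|  | AS (Available Space in GB) |  |  | 4.5 | 3.6 | 4.9 | 4.1 | 4.7 | 4.9 | 4.0 |  |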

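Table 3. Ex. 1: RAF based replica creation (Before placement)

| S.No | File Name | File Type | File Size | RAF DC 1 (5GB) | RAF DC 2 (5GB) | RAF DC 3 (5GB) | RAF DC 4 (5GB) | RAF DC 5 (5GB) | RAF DC 6 (5GB) | RAF DC 7 (5GB) | FAF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Array_Java | docx | 0.07 |  |  |  |  |  |  |  |  |
| 2 | Tree_ds | pdf | 0.11 |  |  |  |  |  |  |  |  |
| 3 | Img_001 | jpeg | 0.31 |  |  |  |  |  |  |  |  |
| 4 | CS_C | mp3 | 0.55 |  |  |  |  |  |  |  |  |
| 5 | HelloEnglish | mp4 |  |  |  |  |  |  |  |  |  |
|  | OS (Occupied Space in GB) |  |  | 0.5 | 1.4 | 0.1 | 0.9 | 0.3 | 0.1 | 1.0 |  |
|  | AS (Available Space in GB) |  |  | 4.5 | 3.6 | 4.9 | 4.1 | 4.7 | 4.8 | 4.0 |  |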

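Table 4. Ex. 1: RAF based replica creation (After placement)

| S.No | File Name | File Type | File Size | RAF DC 1 (5GB) | RAF DC 2 (5GB) | RAF DC 3 (5GB) | RAF DC 4 (5GB) | RAF DC 5 (5GB) | RAF DC 6 (5GB) | RAF DC 7 (5GB) | FAF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Array_Java | docx | 0.07 |  |  |  |  |  |  |  |  |
| 2 | Tree_ds | pdf | 0.11 |  |  |  |  |  |  |  |  |
| 3 | Img_001 | jpeg | 0.31 |  |  |  |  |  |  |  |  |
| 4 | CS_C | mp3 | 0.55 |  |  |  |  |  |  |  |  |
| 5 | HelloEnglish | mp4 |  |  |  |  |  |  |  |  |  |
|  | OS (Occupied Space in GB) |  |  | 0.5 | 1.4 | 0.4 | 0.9 | 0.3 | 0.1 | 1.0 |  |
|  | AS (Available Space in GB) |  |  | 4.5 | 3.6 | 4.6 | 4.1 | 4.7 | 4.9 | 4.0 |  |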

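Table 5. Ex. 2: RAF based replica creation (Before placement)

| File Name | File Type | File Size | RAF DC 1 (5GB) | RAF DC 2 (5GB) | RAF DC 3 (5GB) | RAF DC 4 (5GB) | RAF DC 5 (5GB) | RAF DC 6 (5GB) | RAF DC 7 (5GB) | FAF |
|---|---|---|---|---|---|---|---|---|---|---|
| Array_Java | docx | 0.07 |  |  |  |  |  |  |  | 49 +1 |
| Tree_ds | pdf | 0.11 |  |  |  |  |  |  |  |  |
| Img_001 | jpeg | 0.31 |  |  |  |  |  |  |  |  |
| CS_C | mp3 | 0.55 |  |  |  |  |  |  |  |  |
| HelloEnglish | mp4 |  |  |  |  |  |  |  |  |  |
| OS (Occupied Space in GB) |  |  | 0.5 | 1.4 | 0.4 | 0.9 | 0.3 | 0.1 | 1.0 |  |
| AS (Available Space in GB) |  |  | 4.5 | 3.6 | 4.6 | 4.1 | 4.7 | 4.9 | 4.0 |  |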

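Table 6. Ex. 2: RAF based replica creation (After placement)

| File Name | File Type | File Size | RAF DC 1 (5GB) | RAF DC 2 (5GB) | RAF DC 3 (5GB) | RAF DC 4 (5GB) | RAF DC 5 (5GB) | RAF DC 6 (5GB) | RAF DC 7 (5GB) | FAF |
|---|---|---|---|---|---|---|---|---|---|---|
| Array_Java | docx | 0.07 |  |  |  |  |  |  |  | 49 +1 |
| Tree_ds | pdf | 0.11 |  |  |  |  |  |  |  |  |
| Img_001 | jpeg | 0.31 |  |  |  |  |  |  |  |  |
| CS_C | mp3 | 0.55 |  |  |  |  |  |  |  |  |
| HelloEnglish | mp4 |  |  |  |  |  |  |  |  |  |
| OS (Occupied Space in GB) |  |  | 0.5 | 1.4 | 0.4 | 0.9 | 0.3 | 0.1 | 1.0 |  |
| AS (Available Space in GB) |  |  | 4.5 | 3.6 | 4.6 | 4.1 | 4.7 | 4.9 | 4.0 |  |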

Table 7: Comparison of Existing PRC [3], Proposed DRRRA [15] and DRCAES

PRC: existing conventional 3-replica strategy (Minimum Replica 1, Maximum Replica 3). DRRRA: proposed Dynamically Reduced Replica of Rarely Accessed files algorithm (2-Replica). DRCAES: proposed algorithm that dynamically creates the replica based on user need. In every group, OS = FS * NR and Cost = OS * 0.00067.

| S.no | FAF | PRC OS | PRC Cost | PRC RR_TD | DRRRA OS | DRRRA Cost | DRRRA RR_TD | DRCAES OS | DRCAES Cost | DRCAES RR_TD |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | >10 | 2.31 | 0.0015477 |  | 1.54 | 0.0010318 |  | 1.54 | 0.0010318 |  |
| 2 | 15-40 | 2.31 | 0.0015477 | 3.5 | 2.31 | 0.0015477 | 3.5 | 2.31 | 0.0015477 | 3.5 |
| 3 | 41-60 | 2.31 | 0.0015477 | 4.2 | 2.31 | 0.0015477 | 4.2 | 3.08 | 0.0020636 | 3.7 |
| 4 | 61-80 | 2.31 | 0.0015477 |  | 2.31 | 0.0015477 |  | 3.85 | 0.0025795 | 3.9 |

Table 5 shows the replica placement process for file 5. The replicas of file 5 reside in DC2 and DC7, with RAF values of 13 and 19 respectively. The notable point about this file is that its replica count was reduced earlier by the DRRRA approach (2-replica strategy), which is discussed in chapter 6. Since then, the demand for this file has increased, so it is going to be replicated.

DC1, DC3, DC4, DC5, and DC6 do not hold a replica of file 5, and their AS values are 4.5, 4.6, 4.1, 4.7, and 4.9 GB respectively. DC6 has the maximum AS, so the new replica is stored in DC6, as shown in Table 6. Table 5 shows the situation after ranking, when the file is replicated based on server memory.
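The placement rule used in this example can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `choose_placement_dc`, the dictionary layout, and the optional capacity check against `file_size_gb` are hypothetical. The AS values and the current replica locations are the ones quoted above for file 5.

```python
def choose_placement_dc(available_space, holders, file_size_gb=0.0):
    """Pick the data center for a new replica.

    available_space: dict mapping DC name -> available space (GB)
    holders: set of DC names that already store a replica of the file
    file_size_gb: size of the file; the capacity check is an assumption
                  added here, the paper only speaks of maximum AS
    Returns the DC name, or None when no DC qualifies.
    """
    # Only DCs that do not yet hold the file and can fit it are candidates.
    candidates = {
        dc: space
        for dc, space in available_space.items()
        if dc not in holders and space >= file_size_gb
    }
    if not candidates:
        return None
    # Among the candidates, choose the one with the maximum available space.
    return max(candidates, key=candidates.get)


# AS values (GB) for DC1..DC7 as listed in the example.
available_space_gb = {
    "DC1": 4.5, "DC2": 3.6, "DC3": 4.6, "DC4": 4.1,
    "DC5": 4.7, "DC6": 4.9, "DC7": 4.0,
}
file5_holders = {"DC2", "DC7"}  # file 5 currently resides in DC2 and DC7

print(choose_placement_dc(available_space_gb, file5_holders))  # DC6
```

With the values from the example, the sketch selects DC6, which agrees with the placement described in the text.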

5. Result and discussion

The effect of the DRCAES approach on the parameters considered in this research is presented in Table 7. The table shows four parameters and how their values are calculated: NR (Number of Replicas), OS (Occupied Space), Cost, and RR_TD (Request-Response Time Delay).

Finally, when the proposed algorithms are compared with the existing algorithm, DRCAES gives better performance in all aspects, namely the number of replicas, Occupied Space (OS), Cost, and Request-Response Time Delay (RR_TD), based on File Accessing Frequency (FAF) for a file of size 0.77 GB. This is shown in Table 7.

A change in the NR value is reflected in the OS, Cost, and RR_TD values. In the existing system, NR is decided based on the disk failure rate; the benchmark NR is 3, which is the conventional strategy.

For example, the above table represents a file with File Size (FS) 0.77 GB that is uploaded and accessed in different scenarios.

• In the proposed system, NR is decided based on the DRRRA and DRCAES approaches.

• The Occupied Space (OS) is calculated as follows:

Occupied Space (OS) = File Size (FS) * Number of Replicas (NR)

• The cost is calculated as follows:

Cost = Occupied Space (OS) * 0.00067

(0.00067 is the amount incurred for 1 GB per day, adopted from the Google Drive cost plan. It is used only for testing purposes.)

• The RR_TD values are obtained using the MATLAB tool.

These values are calculated for different File Accessing Frequency (FAF) ranges. The table values are also presented graphically.
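The two formulas can be checked against the values reported in Table 7 with a short Python sketch. The function names are hypothetical, and the replica counts 2 to 5 are an assumption inferred from the OS column of Table 7 (OS divided by FS = 0.77 GB); `Decimal` is used only to keep the arithmetic exact.

```python
from decimal import Decimal

# Rate quoted in the text: amount incurred for 1 GB per day.
COST_PER_GB_PER_DAY = Decimal("0.00067")


def occupied_space(file_size_gb, num_replicas):
    """OS = FS * NR, in GB."""
    return file_size_gb * num_replicas


def storage_cost(occupied_gb):
    """Cost = OS * 0.00067."""
    return occupied_gb * COST_PER_GB_PER_DAY


file_size = Decimal("0.77")  # FS used in the example (GB)

# NR values 2..5 are inferred from the OS column of Table 7.
for nr in (2, 3, 4, 5):
    os_gb = occupied_space(file_size, nr)
    print(nr, os_gb, storage_cost(os_gb))
# 2 1.54 0.0010318
# 3 2.31 0.0015477
# 4 3.08 0.0020636
# 5 3.85 0.0025795
```

The printed pairs match the OS and Cost entries of Table 7: 3 replicas for the conventional strategy, 2 replicas for rarely accessed files, and 4 or 5 replicas for the higher FAF ranges under DRCAES.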

Fig. 3 compares the Number of Replicas (NR) across different File Accessing Frequency (FAF) ranges. The graph shows that NR is fixed in the existing PRC, is either 2 or 3 in the DRRRA approach, and varies with FAF in the DRCAES approach.

Fig. 4 compares the Occupied Space (OS) across different File Accessing Frequency (FAF) ranges. The graph makes clear how OS follows the changes in NR. OS is fixed in the existing PRC because of the fixed NR, is minimized for rarely accessed files in the DRRRA approach, and is optimized based on FAF in the DRCAES approach.

Figure 3. FAF vs. No. of replica

Figure 4. FAF vs. Occupied space

Figure 5. FAF vs. Cost

Fig. 5 compares the Cost in dollars ($) across different File Accessing Frequency (FAF) ranges. The graph shows how the cost changes with the Occupied Space (OS). The cost is fixed in the existing PRC because of the fixed OS, is minimized for rarely accessed files in the DRRRA approach, and is optimized based on FAF in the DRCAES approach.

Figure 6. FAF vs. Request-response time delay

Fig. 6 shows the Request-Response Time Delay (RR_TD) in seconds across different File Accessing Frequency (FAF) ranges. The plotted values come from the MATLAB tool results, and the RR_TD values are determined by the value of NR. RR_TD is worst in the existing PRC and the DRRRA approach for the high FAF range because of the fixed NR, but it is reduced in the DRCAES approach because NR is optimized based on FAF.

Similarly, Table 8 compares parameters such as Number of Replicas (NR), Occupied Space (OS), Cost, and Request-Response Time Delay (RR_TD), along with reliability and availability, for the proposed DRCAES, DRRRA, and the existing PRC algorithm. The optimized cost is obtained without affecting the existing assured reliability percentage.

From the table, we can see that DRCAES provides efficient data storage in a cloud computing environment and addresses reliability and availability concerns in a cost-effective manner.

The benefits of the DRCAES approach stated in section 1 are listed below. All of them are demonstrated:

• Rarely accessed files and the most frequently accessed files are predicted dynamically using the FAFR model.

• Through DRRRA, the number of replicas of the rarely accessed files is reduced dynamically, so that the cost and occupied space are minimized. To remove a replica, it finds the DC with the minimum available space among the DCs where the file exists (Removal).

• In DRCAES, a new replica is dynamically created and placed for the most accessed file if each replica is accessed equally; otherwise the file is not replicated. The new replica is placed in the data center that has the most available space and where the file does not exist.

• Balanced storage is retained during removal and new replica placement, because the approach analyses all aspects of the storage system, such as available space, SLA, and the accessing frequency of all existing replicas.

Table 8. Comparison of parameters for existing and proposed algorithms

| Algorithm | Number of Replicas | Occupied Space | Cost | Reliability | Availability | Request-Response Time Delay |
|---|---|---|---|---|---|---|
| Existing PRC [3] | 1, 2, or 3, decided based on disk failure rate | Minimized based on replica | Minimized based on replica | 1-replica: no reliability; 2-replica: 95% assured; 3-replica: 99% | Not considered | Increased for more requests |
| Proposed DRRRA | 2 or 3, decided based on FAF | Minimized for rarely accessed files | Minimized for rarely accessed files | 2-replica: 95% assured [3] | Not considered | Increased for more requests |
| Proposed DRCAES | 2-replica is the minimum; the maximum is decided based on SLA | Optimized | Optimized | 2-replica: 95% assured [3] | Enhanced | Decreased |
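The removal rule in the list above (drop the replica held by the DC with the least available space) can be sketched as follows. This is a minimal illustration under stated assumptions, not the DRRRA implementation: the function name `choose_removal_dc`, the constant `MIN_REPLICAS`, and the three-replica holder set are hypothetical, while the AS values are the ones listed for DC1 to DC7 in the placement example.

```python
MIN_REPLICAS = 2  # DRRRA keeps at least two replicas (2-replica strategy)


def choose_removal_dc(available_space, holders):
    """Pick the DC whose replica should be removed.

    available_space: dict mapping DC name -> available space (GB)
    holders: set of DC names that currently store a replica of the file
    Returns the DC name, or None when the file is already at the minimum.
    """
    if len(holders) <= MIN_REPLICAS:
        return None
    # Among the DCs holding the file, remove from the one with the least
    # available space; sorting first makes ties deterministic.
    return min(sorted(holders), key=lambda dc: available_space[dc])


available_space_gb = {
    "DC1": 4.5, "DC2": 3.6, "DC3": 4.6, "DC4": 4.1,
    "DC5": 4.7, "DC6": 4.9, "DC7": 4.0,
}

# Hypothetical rarely accessed file with three replicas.
print(choose_removal_dc(available_space_gb, {"DC1", "DC2", "DC7"}))  # DC2
# A file already at two replicas is left untouched.
print(choose_removal_dc(available_space_gb, {"DC2", "DC7"}))  # None
```

Removing from the fullest holder and placing on the emptiest non-holder is what keeps the storage balanced across data centers in this sketch.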

6. Conclusion

To minimize the request-response time and the delay of data placement for time-varying workload applications, users need to make optimal use of the time differences between storage and network services across multiple cloud service providers. The previous work in this research dynamically predicts rarely accessed files with the help of the FAFR algorithm and, using the DRRRA algorithm, reduces the number of replicas of such a file if it satisfies the time limit. Similarly, the proposed DRCAES algorithm dynamically predicts the most frequently accessed files with the help of the FAFR algorithm. It then creates a new replica of such a file and finds the data center that has the most available space and does not already hold the file. This work optimizes occupied space, cost, and server performance, increases the server's service delivery speed, and decreases the request-response time delay. Thus, the proposed DRCAES ultimately provides efficient data storage at an optimized cost without affecting reliability and availability in the cloud, by optimizing the number of replicas based on user need and the SLA. The proposed algorithm achieves better results than the existing algorithms. In future work, replica management during a disaster could be considered, without affecting reliability and availability, using a minimum number of replicas.

References

[1] R. Han, M.M. Ghanem, L. Guo, Y. Guo, and M. Osmond, “Enabling cost-aware and adaptive elasticity of multi-tier cloud applications”, Elsevier, Future Generation Computer Systems, Vol. 32, No. 1, pp. 82-98, 2014.

[2] M. Du and F. Li, “ATOM: Efficient Tracking, Monitoring, and Orchestration of Cloud Resources”, IEEE Transactions on Parallel & Distributed Systems, Vol. 28, No. 8, pp. 2172-2189, 2017.

[3] W. Li, Y. Yang, and D. Yuan, “Ensuring Cloud Data Reliability with Minimum Replication by Proactive Replica Checking”, IEEE Transactions on Computers, Vol. 65, No. 5, pp. 1494-1506, 2016.

[4] S. Souravlas and A. Sifaleras, “Binary-Tree Based Estimation of File Requests for Efficient Data Replication”, IEEE Transactions on Parallel & Distributed Systems, Vol. 28, No. 7, pp. 1839-1852, 2017.

[5] R. Han, S. Huang, Z. Wang, and J. Zhan, “CLAP: Component-Level Approximate Processing for Low Tail Latency and High Result Accuracy in Cloud Online Services”, IEEE Transactions on Parallel & Distributed Systems, Vol. 28, No. 8, pp. 2190-2203, 2017.

[6] U. Tos, R. Mokadem, A. Hameurlain, T. Ayav, and S. Bora, “A Performance and Profit Oriented Data Replication Strategy for Cloud Systems”, In: Proc. of IEEE Conferences on Ubiquitous Intelligence & Computing, Toulouse, France, pp. 780-787, 2016.

[7] D.T. Nukarapu, B. Tang, L. Wang, and S. Lu, “Data Replication in Data Intensive Scientific Applications with Performance Guarantee”, IEEE Transactions on Parallel and Distributed Systems, Vol. 22, No. 8, pp. 1299-1306, 2011.

[8] A. Kumar, R. Tandon, and T.C. Clancy, “On the Latency and Energy Efficiency of Distributed Storage Systems”, IEEE Transactions on Cloud Computing, Vol. 5, No. 2, pp. 221-233, 2017.

[9] M. Hadji, “Scalable and Cost-Efficient Algorithms for Reliable and Distributed Cloud Storage”, Springer International Publishing Switzerland, Vol. 581, No. 1, pp. 3-12, 2016.

[10] S.Q. Long, Y.L. Zhao, and W. Chen, “MORM: A Multi-objective Optimized Replication Management strategy for cloud storage cluster”, Journal of Systems Architecture, Vol. 60, No. 1, pp. 234-244, 2014.

[11] S. Rampersaud and D. Grosu, “Sharing-Aware Online Virtual Machine Packing in Heterogeneous Resource Clouds”, IEEE Transactions on Parallel & Distributed Systems, Vol. 28, No. 7, pp. 2046-2059, 2017.

[12] Y. Lin and H. Shen, “EAFR: An Energy-Efficient Adaptive File Replication System in Data-Intensive Clusters”, IEEE Transactions on Parallel and Distributed Systems, Vol. 28, N, pp. 1017-1030, 2017.

[13] L. Shi, Z. Zhang, and T. Robertazzi, “Energy-Aware Scheduling of Embarrassingly Parallel Jobs and Resource Allocation in Cloud”, IEEE Transactions on Parallel & Distributed Systems, Vol. 28, No. 8, pp. 1607-1620, 2017.

[14] S.A.E. Selvi and R. Anbuselvi, “Ranking Algorithm Based on File’s Accessing Frequency for Cloud Storage System”, International Journal of Advanced Research Trends in Engineering and Technology, Vol. 4, No. 9, pp. 29-33, 2017.

[15] S.A.E. Selvi and R. Anbuselvi, “Optimizing the Storage Space and Cost with Reliability Assurance by Replica Reduction on Cloud Storage System”, International Journal of Advanced Research in Computer Science, Vol. 8, No. 8, pp. 327-332, 2017.

Geo-Distance Based 2-Replica Maintaining Algorithm for Ensuring the Reliability forever Even During the Natural Disaster on Cloud Storage System

Article

Full-text available

Aug 2023

Annal Ezhil Selvi S

In today's digitalized and globalized scenario, everyone has moved to cloud computing for storing their information on cloud storage to access their data from anywhere at any time. The most significant feature of cloud storage is its high availability and reliability then it has the capability of reducing management factors as well as incurred lower storage cost compared with some other storing methods, it is most suitable for a high volume of data storage. In order to meet the requirements of high availability and reliability, the system adopts a replication system concept. In replicating systems, the objects are replicated many times, with each copy residing in a different geographical location. Though it is beneficial to the users, it leads to some issues like security, integrity, consistency and hidden storage and maintenance cost, etc. Therefore, it is exposed to a few threats to the Cloud Storage System (CSS) user and the provider as well. So, this research seeks to explore the mechanisms to rectify the above-mentioned issues. Thus, the predecessor of the research work has proposed an algorithm named as 2-Replica Placing (2RP) algorithm which is used to reduce the storage cost, maintenance cost; and maintenance overheads as well as increase the available storage spaces for the providers by placing the data files on two locations based on Geo-Distance. But it fails to address the recovery mechanism when a natural disaster happens because providing reliability with less than 2 replicas is a challenging task for the providers. Thus, the research proposed Geo-distance based 2-Replica Maintaining (2RM) algorithm which is used to consider that issue for ensuring reliability forever even during natural disasters

Forecasting and Quality Control of Confectionery Products with the use of “Water Activity Using the SPSS Method

Chapter

Full-text available

Jul 2023

Water quality for a specific purpose, usually of drinking or swimming Depending on the suitability, its chemical, of water including physical and biological properties Describes the condition. The quality of water is its of water based on quality of use Refers to chemical, physical and biological properties. Usually through water purification of standards against which attainable conformity can be assessed it is often used to refer to the set. Some of the contaminants in our water are alkaline Intestinal disease, reproductive problems and Health including neurological disorders can lead to problems. Infants, young children, pregnant women, the elderly and the weak especially those with compromised immune systems May be in danger of getting sick. Values closer to 150 mg/L are generally better from an aesthetic point of view. Water below 150 mg/L for soft water and above 200 mg/L Values in are also considered hard water. Sources: In soil and rock material from primarily dissolved carbonate Minerals. Disturbances such as fire, wind blowing, or debris flows can affect stream temperature, turbidity, and other water quality parameters. Geography, geography and climate affect Water quality. Air and water quality and Hawaii for the overall natural environment Number one in the nation. This Massachusetts is second in the division is in place, followed by North There are Dakota, Virginia and Florida. Better for air and water quality Learn more about the states.

Evaluation of Forensic Medical Examination using the DEMATEL Method

Chapter

Full-text available

Jul 2023

Forensic medical examination is Searching for injuries and police to be used as evidence at trial Taking samples and A lawsuit following Sexual Assault Forensic Medicine The purpose of the test is to thevictim's Health needs assessment Treatment of any injuries When consolidating and prosecuting Gather resources for potential use trial (US Department of Justice, 2004, pp. 30-2). Because the body is the scene of the crime, testimony is time sensitive and may only last until the victim has showered, washed, and/or urinated. (1) Examination by a health care provider is recommended even if no injuries are seen. As a result of the attack, (2) the victim did not want to collect evidence, or (3) the attack was not recent. In these cases, the victim may have injuries that are not obvious or serious or have associated health concernsForensics in the criminal justice system Science is an important part. Forensic scientists at crime scenes and sources from elsewhere study commonly murder in major criminal cases including impeachment Guilt or innocence Forensic Science as a Factor in Determining was used. Forensic Medicine or Forensics Pathology is a branch of forensic science that helps in finding the results of crime evidence. Visualize by applying clinical facts to the situation. 
Course subjects help individuals become proficient in identifying the cause and time of death and other details about the deceasedDEMATEL (Decision Making Trial and Evaluation Laboratory) They are divided into analysis using the Forensic Medical Examination Assessment of the victim, Informed consent, Medical, and gynecological history, Assault history, and Physical top-to-toe examination It is the interaction between the factors Visualized and assesses dependent relationships Through the structural model Also deals with identifying important.Assessment of the victim, Informed consent, Medical, and gynecological history, Assault history, Physical top-to-toe examinationthe assessment of the victim got the first rank whereas the Informed consent number is having the lowest rank.Forensic Medical Examination. Assessment of the victim is got the first rank whereas the Armed consent number is having the Lowest Rank

A Neoteric Geo-Distance Based 2-Replica Placing Algorithms on Cloud Storage System

Article

Full-text available

Sep 2023

Annal Ezhil Selvi S

Robust Design of Aero Engine Structures: Using the Weighted Product Method

Article

Full-text available

Aug 2023

Robust Design of Aero Engine Structures: Using the Weighted Product Method

Article

Full-text available

Mar 2023

Total loss of quality in products or processes Reduction is the objective of robust design. Strong Design is an effective approach that aims to simultaneously decrease product costs and enhance quality while also significantly reducing development time. Strength is defined as a skill Raw material, operating conditions, process equipment, environmental conditions, and human Expected variation in factors Tolerant manufacturing process. Robust design is the design of products, devices, and manufacturing equipment so that their performance and functionality are insensitive to multiple variations, such as manufacturing and assembly tolerances, ambient use conditions, or degradation over time. Therefore, there is not strong design sensitivity-meaning that variation in the product will have minimal influence. In essence, robust design means minimizing the impact of variation on a product. One or more due to unforeseen circumstances Input variables or assumptions are rigorous although modified, their output and predictions are accurate A model is considered robust if. The alternatives being considered are related to specific aircraft features: the aerodynamic characteristic (C1), maximum takeoff weight (C2), armament (C3), and avionics (C4). The evaluation options are Ao, F-16, Su-35, and Mig-35. Based on the evaluation results, Ao obtained the top rank, while Mig-35 received the lowest rank.The value of the dataset for Robust Design of Aero Engine in Weighted product method shows that it results in Ao and top ranking

Chapter

Full-text available

Jul 2023

The detection and interpretation of the presence of drugs and other potentially hazardous substances in bodily fluids and tissues fall under the multidisciplinary subject of forensic toxicology. 2020, Toxicology Fifth Edition, posted. Applied clinical science is the focus of forensic medicine, often known as forensic pathology. facts to the situation to determine the results of evidence at a crime scene. Course subjects help individuals become proficient in identifying the cause and time of death and other details about the deceased. Forensic medicine is one of the most popular professions, especially in India. There are endless opportunities in this industry due to the unlimited number of crimes taking place all over the world. The average salary for freshers is between 3 lakhs to 6 lakh per annum. Professionals with 5 years or more experience can expect a salary of 6-12 lakhs per annum. Due to their significant prison impact, particularly in civil and criminal proceedings, forensic toxicology and forensic medicine are distinct among all other medical specialties. Metabolomics, the most recent of the "omics sciences," has been proven to be one of the most potent methods for tracking changes in forensic domains thanks to new excessive-throughput technology adapted from chemistry and physics. Forensic medication and toxicology are taught within the final two years of scientific college students' research, in addition to taught to third-yr college students in the School of Pharmacy. Finally, the branch integrates all studies and submit-graduate publications inside the diverse fields of forensic remedy and toxicology.

Assessment of Seasonal Variations in Surface Water Quality in SPSS Method

Chapter

Full-text available

Jul 2023

s: The primary source of water in locations without access to surface water is groundwater. In the past, groundwater served as the primary resource for all needs in the Morapur area of Tamil Nadu Dharmapuri district. According to reports, the area has a serious fluorosis problem since the groundwater contains too much fluoride. The area is made up of charnockite, epidotic hornblende gneiss, and ultramafic rocks that date back to the Achaean period. Numerous tectonic upheavals in the region led to the development of quartz/feldspathic veins and heavily mineralized vertical joints. There are two hydrological systems in the area, namely the worn and cracked aquifer water table. 149 groundwater samples were taken before and after the monsoon to better understand the variables influencing high fluoride concentration in groundwater. According to analysis's findings, 35% of groundwater samples contained fluoride concentrations greater than 1.5 bps (permissible limit). The findings show that deep aquifers and high fluoride levels have an impact on both aquifers. Groundwater in the area interacts with biotic and hornblende minerals to release calcium, magnesium, and fluoride. And according to acid-alkaline indices, sodium ions replace calcium ions due to reverse ion exchange, resulting in a high concentration of sodium with a high concentration of fluoride. The federal government has taken measures to make fluoride-free food and water from distant water sources available. To increase groundwater quality, specific water management techniques are crucial. Tamil Nadu, Dharmapuri District, Domestic and Water quality for irrigation purposes to assess water quality survey has been carried out. PH, TDS, TH, Calcium, Magnesium, Chloride, Sulphate. 
This paper notes that increasing levels of water pollution, the resulting billion-dollar utility and with control schemes, it provides a way to measure and evaluate the quality of given water body Development of water quality codes is necessary. The data output of current water monitoring stations is huge and Dimensional reporting units are different and not integrated in a straightforward algebraic way, even by scientifically trained users have few means of integrating the data to provide; water quality. That quality is locally better than hook and line to be broadly defined, Because of the importance of downstream streams less emphasized in that context. The stream is never fishable However, it is an integral part of the watershed; Protection is essential if downstream streams are to remain fishable and swimmable. The Clean Water Act's biological integrity mandate without considering local streams separately, It depends on the overview of the entire hydrological system at the water table level. Agricultural waste, applied fertilizers, soil leach ate, urban waste, Cattle excreta and sewage Sources of poor water quality. Some models have hardness and due to magnesium concentration are highly saline not suitable for irrigation purposes. In general, ground water farming activities of Dharmapuri district, anthropogenic activities, ion exchange and contaminated by weather.

Water Quality Analyzing Barriers of Green Lean Practices in Manufacturing Industries by DEMATEL Method

Chapter

Full-text available

Jul 2023

Water quality for a specific purpose, usually of drinking or swimming Depending on the suitability, its chemical, of water including physical and biological properties describes the condition. The quality of water is its of water based on quality of use Refers to chemical, physical and biological properties. Research significance: Usually through water purification of standards against which attainable conformity can be assessed it is often used to refer to the set. Some of the contaminants in our water are alkaline Intestinal disease, reproductive problems and Health including neurological disorders can lead to problems. Alternative: sulphate, chloride, magnesium, calcium, PH. sulphate, chloride, magnesium, calcium, PH Result: The result it is seen that PH is got the first rank where as is the calcium having the lowest rank. Conclusion: The value of the dataset for Water Quality in Test and evaluate decision making the lab shows that it results in PH and top ranking.

A Review on Solid State Drives Transformer Concept: A New Era in Power Supply

Chapter

Full-text available

Jul 2023

Solid State Drives (SSD) are single drive. The performance is greater than that produced by disks allow Their low latency and parallelism Possibilities for operations, operating system I/O Read data at speeds that break interfaces They also allow writing. In addition, their performance Characteristics in existing grading systems Reveal gaps. An SSD is more Efficiency, high activity rates and more to operate with minimum latency under demanding conditions We prove that it can. High efficiency SSDs with Future High-Performance Computing (HPC) Dramatic parallel I/O performance for systems This suggests that the method can be improved. our Instead of traditional hard drives in public laboratories Using solid-state drives, our Public perception of services How technology has had an immediate effect This article also explains. Solid State Drives (SSDs) are more popular now Dense and compact are NAND flash Due to the growth in the cost of memories widely is used.

OPTIMIZING THE STORAGE SPACE AND COST WITH RELIABILITY ASSURANCE BY REPLICA REDUCTION ON CLOUD STORAGE SYSTEM

Article

Full-text available

Aug 2017

Annal Ezhil Selvi S

Content Distribution Networks have been attracted a great deal of attraction in recent years on cloud computing. Replica placement problems (RPPs) as one of the key technologies in the Content Distribution Networks have been widely studied. The internet services are hosted by multiple geographically distributed datacenters. For the increasingly expanded utility of Cloud storage, the improvement of resources management and reduces the storage space, cost is complicated issues on data replication. So in order to reduce the data replication, this paper, proposed the concept of Reliability Assurance algorithm (RA). The RA is reducing the file replication from the server and improves the reliability of the server. The RA algorithm identified the data replication on un-accessing files. The unpredicted files called as replicated files, these files occupy a more space on cloud server. The low ranking prediction algorithm identified the unpredicted files on cloud server based on file accessing. Reduced data replication on cloud server it’s allows optimizing the storage space and cost. Using the RA and Ranking algorithm combined to reduce the data replication and storage space.

Ranking Algorithm Based on File's Accessing Frequency for Cloud Storage System

Article

Full-text available

Sep 2017

The number of cloud storage users has improved abundantly at recent times. The reason behind is, the Cloud Storage system minimizes the burden of maintenance and it has less storage cost compare with other storage methods. It provides high availability, reliability and it is most suitable for high volume of data storage. In order to provide high availability and reliability, the systems introduce redundancy. In replicated system, the cloud storage services are hosted by multiple geographically distributed data centres. But the file Replication is rendering little bit threat about the Cloud Storage System for the users and for the providers it is a big challenge to offer efficient Data Storage. Since the increasingly expanded utility of Cloud storage, the improvement of resources management in the shortest time to respond to the user's requests and the geographical constraints are of prime importance to both the Cloud service providers and the users. The data replication helps in attractive the data availability which reduces the overall access time of the files, but at the same time it occupies more storage space and storage cost. In order to rectify the above mentioned problems, need to identify the popularity of the file. So this paper proposed new ranking algorithm which lists the most often accessed files and less frequently accessed files. In future the least accessed a file's replications going to be reduced likewise most accessed file's replications going to be increased based on their SLA.

A Performance and Profit Oriented Data Replication Strategy for Cloud Systems

Conference Paper

Full-text available

Jul 2016

Binary-Tree Based Estimation of File Requests for Efficient Data Replication

Article

Full-text available

Feb 2017

Recently, data replication has received considerable attention in the field of grid computing. The main goal of data replication algorithms is to optimize data access performance by replicating the most popular files. When a file does not exist in the node where it was requested, it necessarily has to be transferred from another node, causing delays in the completion the file requests. The general idea behind data replication is to keep track of the most popular files requested in the grid and create copies of them in selected nodes. In this way, more file requests can be completed over a period of time and average job execution time is reduced. In this paper, we introduce an algorithm that estimates the potential of the files located in each node of the grid, using a binary tree structure. Also, the file scope and the file type are taken into account. By potential of a file, we mean its increasing or decreasing demand over a period of time. The file scope generally refers to the extent of the group of users which are interested or potentially interested in a file. The file types are divided into read and write intensive. Our scheme mainly promotes the high-potential files for replication, based on the temporal locality principle. The simulation results indicate that the proposed scheme can offer better data access performance in terms of the hit ratio and the average job execution time, compared to other state-of-the-art strategies.

Sharing-Aware Online Virtual Machine Packing in Heterogeneous Resource Clouds

Article

Full-text available

Dec 2016

One of the key problems that cloud providers need to efficiently solve when offering on-demand virtual machine (VM) instances to a large number of users is the VM Packing problem, a variant of Bin Packing. The VM Packing problem requires determining the assignment of user requested VM instances to physical servers such that the number of physical servers is minimized. In this paper, we consider a more general variant of the VM Packing problem, called the Sharing-Aware VM Packing problem, that has the same objective as the standard VM Packing problem, but allows the VM instances collocated on the same physical server to share memory pages, thus reducing the amount of cloud resources required to satisfy the users’ demand. Our main contributions consist of designing several online algorithms for solving the Sharing-Aware VM Packing problem, and performing an extensive set of experiments to compare their performance against that of several existing sharing-oblivious online algorithms. For small problem instances, we also compare the performance of the proposed online algorithms against the optimal solution obtained by solving the offline variant of the Sharing-Aware VM Packing problem (i.e., the version of the problem that assumes that the set of VM requests are known a priori). The experimental results show that our proposed sharing-aware online algorithms activate a smaller average number of physical servers relative to the sharing-oblivious algorithms, directly reduce the amount of required memory, and thus, require fewer physical servers to instantiate the VM instances requested by users.

ATOM: Efficient Tracking, Monitoring, and Orchestration of Cloud Resources

Article

Jan 2017

The emergence of the Infrastructure as a Service framework brings new opportunities, which are accompanied by new challenges in auto scaling, resource allocation, and security. A fundamental challenge underpinning these problems is the continuous tracking and monitoring of resource usage in the system. In this paper, we present ATOM, an efficient and effective framework to automatically track, monitor, and orchestrate resource usage in an Infrastructure as a Service (IaaS) system that is widely used in cloud infrastructure. We use a novel tracking method to continuously track important system usage metrics with low overhead, and develop a Principal Component Analysis (PCA) based approach to continuously monitor and automatically find anomalies based on the approximated tracking results. We show how to dynamically set the tracking threshold based on the detection results, and further, how to adjust the tracking algorithm to ensure its optimality under dynamic workloads. We demonstrate the extensibility of ATOM through virtual machine (VM) clustering. Lastly, when potential anomalies are identified, we use introspection tools to perform memory forensics on VMs, guided by the analyzed results from tracking and monitoring, to identify malicious behavior inside a VM. We evaluate the performance of our framework in an open source IaaS system.

Energy-Aware Scheduling of Embarrassingly Parallel Jobs and Resource Allocation in Cloud

Article

Nov 2016

EAFR: An Energy-Efficient Adaptive File Replication System in Data-Intensive Clusters

Article

Sep 2016

In data-intensive clusters, a large number of files are stored, processed and transferred simultaneously. To increase data availability, some file systems create and store three replicas for each file in randomly selected servers across different racks. However, they neglect the file heterogeneity and server heterogeneity, which can be leveraged to further enhance data availability and file system efficiency. As files have heterogeneous popularities, a rigid number of three replicas may not provide immediate response to an excessive number of read requests to hot files, and may waste resources (including energy) on replicas of cold files that have few read requests. Also, because servers are heterogeneous in network bandwidth, hardware configuration and capacity (i.e., the maximal number of service requests that can be supported simultaneously), it is crucial to select replica servers so as to ensure low replication delay and request response delay. In this paper, we propose an Energy-Efficient Adaptive File Replication System (EAFR), which incorporates three components. It is adaptive to time-varying file popularities to achieve a good tradeoff between data availability and efficiency. Higher popularity of a file leads to more replicas and vice versa. Also, to achieve energy efficiency, servers are classified into hot servers and cold servers with different energy consumption, and cold files are stored in cold servers. EAFR then selects a server with sufficient capacity (including network bandwidth and capacity) to hold a replica. To further improve the performance of EAFR, we propose a dynamic transmission rate adjustment strategy to prevent potential incast congestion when replicating a file to a server, a network-aware data node selection strategy to reduce file read latency, and a load-aware replica maintenance strategy to quickly create file replicas under replica node failures. Experimental results on a real-world cluster show the effectiveness of EAFR and the proposed strategies in reducing file read latency, replication time, and power consumption in large clusters.
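The statement that higher popularity leads to more replicas can be made concrete with a small sketch. The rule below is an assumption for illustration only and is not EAFR's actual formula: it sizes the replica count so that each replica serves at most a fixed request rate, then clamps the result to a minimum and maximum. The names `replica_count`, `per_replica_capacity`, `min_replicas` and `max_replicas` are hypothetical.

```python
import math


def replica_count(
    read_rate: float,
    per_replica_capacity: float,
    min_replicas: int = 1,
    max_replicas: int = 5,
) -> int:
    """Return a replica count that grows with file popularity.

    read_rate: observed read requests per unit time for the file.
    per_replica_capacity: requests per unit time one replica is assumed
        to serve without added delay (an illustrative parameter).
    The result is clamped to [min_replicas, max_replicas].
    """
    if per_replica_capacity <= 0:
        raise ValueError("per_replica_capacity must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    needed = math.ceil(read_rate / per_replica_capacity)
    return max(min_replicas, min(max_replicas, needed))


if __name__ == "__main__":
    for rate in (0, 250, 10_000):
        print(rate, replica_count(rate, per_replica_capacity=100))
```

A cold file falls back to the minimum, a hot file is capped at the maximum, and files in between receive just enough replicas to cover their read rate under the assumed per-replica capacity.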

Scalable and Cost-Efficient Algorithms for Reliable and Distributed Cloud Storage

Chapter

Feb 2016

Makhlouf Hadji

This paper focuses on jointly minimizing data storage and networking costs in a distributed cloud storage environment. We present two new efficient algorithms to place encrypted data chunks and enhance data availability while guaranteeing a minimum cost of storage and communication at the same time. The proposed underlying solutions, based on a linear programming approach, lead to an exact formulation with convergence times feasible for small and medium network sizes. A new polynomial-time algorithm is presented and shown to scale to much larger network sizes. Performance assessment results, using simulations, show the scalability and cost-efficiency of the proposed distributed cloud storage solutions.

On the Latency and Energy Efficiency of Distributed Storage Systems

Article

Jul 2015
