All content in this area was uploaded by Mahdi S.Almhanna on Oct 11, 2017
Minimizing Replica Idle Time
Dr. Mahdi S. Almhanna
Information System Management Department
University of Information Technology and Communication,
Iraq, Baghdad.
E-mail : Mahdialmhanna@gmail.com
Abstract— Specialists know that servers do not run at equal speeds:
a server's effective speed depends on several factors such as
processor speed, bandwidth, and congestion. This means that if
multiple servers are used to download a file, the fast servers must
wait for the slow ones to complete their tasks. In this work, a new
file-transfer strategy called “Minimizing Replica Idle Time” is
adopted to solve this problem; it improves performance by decreasing
file transfer time compared with the traditional strategy of splitting
the file into equal parts.
Keywords— Data grid; Replica Management System; Replica
Optimization Service; Data Transfer Service; GridFTP
I. INTRODUCTION
A data grid (DG) is a collection of services that make it possible to
access, modify, and transfer huge amounts of data spread across
multiple locations [1, 2, 3, 4, 5, 7, 8]. A DG makes this possible by
using middleware applications and services that gather data and
resources from several domains and then provide them to users. In a
DG, data can be located at multiple sites, each with its own
administrative domain governed by a set of rules that determine who
can access the data.
Replicas are distributed outside their original administrative
domains and should be made available as efficiently as possible.
Downloading large data sets from multiple replicas will inevitably
yield different performance rates. Probably the most important factor
affecting download speed is bandwidth, which varies unpredictably
due to the nature of the Internet. Another factor is congestion on the
links between server and clients. Choosing the best replica, one
reached over an uncongested link, is a good way to improve download
speed [17, 18, 19, 20].
There are several ways to download files. One is to use multiple
servers to download multiple files, but a bottleneck remains because
a faster server must wait extra time for the slowest server to finish
its job. It is therefore best to decrease the variation in finish times
among the servers.
Another way is to use co-allocation technology [10, 11, 13, 14, 15,
16, 17, 18] to download data. This technique enables a client to
download a file from multiple servers by setting up multiple
connections in parallel, which improves performance [17, 18]. In our
work, we apply this approach with a new technique called “Minimizing
Replica Idle Time” to improve transfer performance in Data Grid
environments. The proposed technique can eliminate server idle time
and decrease file transfer time.
II. DATA ACCESS SERVICE
Data grids support replication of datasets. Data access services and
data transfer services work simultaneously to control access to data
and to manage data transfers within the data grid. Using multiple
replicas allows multiple users to access the datasets. Replicas are
therefore placed strategically at the sites where users need them, but
with some restrictions, because the replication of datasets and the
creation of replicas depend on the availability of storage at the sites
and on the bandwidth between them.
The Replica Management System (RMS) controls the creation and
replication of replica datasets. The RMS determines users' needs for
replicas and produces them depending on storage and bandwidth
availability.
III. REPLICA MANAGEMENT SYSTEM (RMS)
The RMS acts as a logical entity. It is a main component of the Data
Grid and interacts with the other components as follows:
A. REPLICA LOCATION SERVICE (RLS)
Replicas reside on physical storage systems [17, 18, 19], but where
exactly does a given replica reside? The RLS answers this question by
maintaining the path to each replica: it is responsible for keeping a
catalogue of files registered by users or services at the time the files
are created, as illustrated in Figure 1.
The RLS creates a mapping between the logical and physical file
names of a replica. We can send a logical file name (LFN) to the RLS
server and request the registered physical file names (PFNs) of its
replicas. Conversely, we can ask the server for the LFN that
corresponds to a particular physical file location. We can also query
the RLS server about attributes such as file size or checksum.
Figure 1: Replica Files
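The LFN/PFN mapping described above behaves like a two-way index. The following Python sketch is a toy, in-memory catalog built under our own assumptions: the class name ReplicaCatalog, its methods, and the example LFN/PFN strings are hypothetical, and it does not reproduce the API of the Globus RLS. It only illustrates LFN-to-PFN lookup, the reverse lookup, and per-file attributes such as size or checksum.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy in-memory replica catalog (hypothetical; not the Globus RLS API)."""

    def __init__(self):
        self._pfns = defaultdict(set)  # LFN -> set of PFNs (replica locations)
        self._lfn_of = {}              # PFN -> LFN (reverse mapping)
        self._attrs = {}               # LFN -> attributes such as size or checksum

    def register(self, lfn, pfn, **attrs):
        """Record that physical file `pfn` is a replica of logical file `lfn`."""
        self._pfns[lfn].add(pfn)
        self._lfn_of[pfn] = lfn
        self._attrs.setdefault(lfn, {}).update(attrs)

    def pfns(self, lfn):
        """All registered physical locations of a logical file, sorted."""
        return sorted(self._pfns.get(lfn, ()))

    def lfn(self, pfn):
        """Logical file name registered for a physical location, or None."""
        return self._lfn_of.get(pfn)

    def attributes(self, lfn):
        """Copy of the attributes recorded for a logical file."""
        return dict(self._attrs.get(lfn, {}))


# Hypothetical example entries: one logical file with two replicas.
catalog = ReplicaCatalog()
catalog.register("lfn:/exp/run42.dat",
                 "gsiftp://siteA.example.org/data/run42.dat", size_gb=1000)
catalog.register("lfn:/exp/run42.dat",
                 "gsiftp://siteC.example.org/store/run42.dat",
                 checksum="adler32:0a1b2c3d")
print(catalog.pfns("lfn:/exp/run42.dat"))
print(catalog.lfn("gsiftp://siteC.example.org/store/run42.dat"))
print(catalog.attributes("lfn:/exp/run42.dat"))
```

A real RLS is a networked, persistent service; the dictionaries here stand in for its catalogue only to make the two lookup directions concrete.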
B. Replica Optimization Service (ROS)
This service is used to choose the best replica site, which it does by
collecting notifications from the Network Monitoring Service (NMS).
C. Data Transfer Service (DTS)
Once the physical address is known, the Replica Management System
(RMS) asks the Data Transfer Service (DTS) to transfer the related file
sets using a secure, high-performance, and authoritative data transfer
protocol such as GridFTP or UDT [17, 18].
D. Replica Management and Selection
Replica management may be used to create or remove replicas at
sites. The addresses of replica sites and files are found in a replica
catalog, and a data grid may contain multiple catalogs. The replica
manager maintains this replica catalog [10, 11, 13, 14].
IV. MULTIPLE REPLICA SITES FOR A SINGLE FILE
First of all, the appropriate replica candidates should be chosen
[17, 18, 19, 20]. The best sites [17, 18, 19, 20] may be determined by
sending a request to each site and measuring the response time: the
site with the least response time is considered the safest, meaning it
has no congestion and is therefore the fastest. After that, check
whether each site holds the requested files, and neglect sites that do
not contain them.
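The selection rule above (drop sites that lack the file, prefer the least response time) can be sketched as a filter followed by a sort. This is a minimal illustration, assuming the response times have already been measured by probing each site; the class ReplicaSite, the function select_sites, and the example site names are hypothetical, and the probing itself is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaSite:
    name: str                 # hypothetical site identifier
    response_time: float      # measured reply time to a probe request, in seconds
    files: frozenset = field(default_factory=frozenset)  # files held by the site


def select_sites(sites, requested_file, max_sites=None):
    """Return the sites holding `requested_file`, least response time first."""
    holding = [s for s in sites if requested_file in s.files]  # neglect sites without the file
    holding.sort(key=lambda s: s.response_time)                # fastest (least congested) first
    return holding if max_sites is None else holding[:max_sites]


sites = [
    ReplicaSite("siteA", 0.080, frozenset({"run42.dat"})),
    ReplicaSite("siteB", 0.015, frozenset({"other.dat"})),
    ReplicaSite("siteC", 0.030, frozenset({"run42.dat", "other.dat"})),
]
print([s.name for s in select_sites(sites, "run42.dat")])  # ['siteC', 'siteA']
```

Filtering before sorting gives the same ranking as measuring first and checking file presence afterwards, as the text describes; it simply avoids sorting sites that will be discarded.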
Data grids consist of many resources located in many countries, or
even in many counties within a country. We used the grid middleware
Globus Toolkit [9] as the data grid infrastructure. The Globus Toolkit
addresses resource management, information services, data
management, and security: its components are designed to supply
mechanisms for configuration information and resource discovery,
and it uses GridFTP [1, 6, 9] to provide data management and transfer
in a wide-area environment.
GridFTP [1, 6, 9] enables parallel data transfer: the available servers
are assigned the requested file and deliver it in parallel. The file is
divided into as many parts as there are servers, and the size of each
part depends on the speed of the server it is assigned to.
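As a hedged illustration of dividing one file among servers in proportion to their speeds, the sketch below turns a file size and per-server weights (for example, measured data rates) into contiguous, inclusive byte ranges. Weighting by speed matches formula (1), since ui = (T / ti) × U = T × (U / ti), and U / ti is the server's rate. The function name byte_ranges and the largest-remainder rounding are our own assumptions, not part of GridFTP; the rounding only ensures that every byte is covered exactly once.

```python
def byte_ranges(file_size, weights):
    """Split bytes [0, file_size) into contiguous ranges proportional to weights.

    Weights must be positive. Each range is (first_byte, last_byte), inclusive.
    Leftover bytes from rounding go to the parts with the largest fractional
    remainders, so the ranges cover the file with no gaps or overlaps.
    """
    total = sum(weights)
    exact = [file_size * w / total for w in weights]  # ideal (fractional) part sizes
    sizes = [int(x) for x in exact]                   # round every part down first
    leftover = file_size - sum(sizes)                 # bytes lost to rounding down
    order = sorted(range(len(weights)),
                   key=lambda i: exact[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    ranges, start = [], 0
    for size in sizes:
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Four servers with speeds 100, 200, 400, and 500 (any common unit).
print(byte_ranges(1000, [100, 200, 400, 500]))
# [(0, 82), (83, 249), (250, 582), (583, 999)]
```

In practice, each range would be requested from its server as a partial (offset plus length) transfer; how that request is issued depends on the transfer tool and is not shown here.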
Dividing the file in this way is an effective approach to reducing the
time needed to transfer data between two machines. However, the
idle period during which faster servers wait for a delayed one to finish
its job is a significant factor that lengthens the total time and thus
reduces efficiency. Because factors such as bandwidth, CPU speed,
and link availability differ between servers, load imbalance among
the servers downloading the file cannot be avoided.
Our approach, called Minimizing Replica Idle Time, overcomes this
dilemma by eliminating the idle time: each server is assigned a
calculated amount of work that suits its capability at that moment, so
that all servers finish their duties at the same time. With no waiting
time, the overall transfer performance improves.
i. Mathematical Formulas
Let U = size of the whole (downloaded) file.
n = number of servers (servers that hold a copy of the requested file).
ti = time taken by the ith server to download the whole file on its own.
T = time spent by all the servers when they work in parallel.
ui = size of the part of the file that the ith server can download during
time T.
After calculating the speed of each server, we can calculate the time
T that the servers need to finish the entire work by using formula (2).
We then use this value of T in formula (1) to calculate ui, the size of
the part of the file that the ith server can download during time T.
ii. How to Calculate the Data Rate of Servers?
To calculate the performance of a sliding-window protocol, let Uc be
the channel utilization (not to be confused with the file size U), W the
window size in frames, and a = tp / tt, where tt is the frame
transmission time and tp is the propagation time. Then

$$U_c = \begin{cases} 1, & W \ge 2a + 1 \\ \dfrac{W}{2a + 1}, & W < 2a + 1 \end{cases}$$

Thus, Data Rate = BW × Uc, where BW is the link bandwidth.
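The utilization formula translates directly into code. This is a minimal sketch, assuming an error-free link and the classical sliding-window model above; the function and parameter names are hypothetical. Times only need to share a unit, and the returned data rate is in the unit of the bandwidth argument.

```python
def sliding_window_data_rate(bandwidth, window, t_trans, t_prop):
    """Effective data rate of an error-free sliding-window link.

    bandwidth: link bandwidth BW
    window:    window size W, in frames
    t_trans:   frame transmission time tt
    t_prop:    one-way propagation time tp (same time unit as t_trans)
    """
    a = t_prop / t_trans
    if window >= 2 * a + 1:
        utilization = 1.0                  # window never closes: the sender never stalls
    else:
        utilization = window / (2 * a + 1)  # sender idles while waiting for acknowledgments
    return bandwidth * utilization


print(sliding_window_data_rate(100.0, 1, 1.0, 0.5))  # 50.0  (W = 1 < 2a + 1 = 2)
print(sliding_window_data_rate(100.0, 7, 1.0, 0.5))  # 100.0 (W = 7 >= 2)
```

The resulting data rate is what step 3 of the procedure divides the file size by to obtain ti.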
iii. Procedure
1. Start
2. Calculate U, the size of the requested file.
3. Calculate ti, the time each server would spend downloading
the entire file if it worked alone: ti ≈ U / Data Rate.
4. Calculate T, the total time for transferring the requested file,
from 1 = T × Σ(1 / ti), for i = 1, 2, 3, …, n.
5. Calculate ui, the part of the file assigned to server i, using
ui = (T / ti) × U.
6. Assign each part of the file to a single server.
7. End.
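Steps 3 to 5 of the procedure can be sketched in Python as follows. This is a minimal illustration, assuming the data rates have already been measured and are expressed in the same size unit as U per unit time; the function name plan_transfer is hypothetical.

```python
def plan_transfer(file_size, data_rates):
    """Split a file so that every server finishes at the same time.

    file_size:  U, size of the requested file
    data_rates: measured data rate of each server (size units per time unit)
    Returns (T, parts): the common finish time and the part size ui per server.
    """
    # Step 3: ti = U / data rate, the time server i would need for the whole file.
    t = [file_size / rate for rate in data_rates]
    # Step 4: 1 = T * sum(1 / ti), hence T = 1 / sum(1 / ti).
    T = 1 / sum(1 / ti for ti in t)
    # Step 5: ui = (T / ti) * U.
    parts = [T / ti * file_size for ti in t]
    return T, parts


rates = [100, 200, 400, 500]  # speeds used in the case study
T, parts = plan_transfer(1000, rates)
print(round(T, 4))                   # 0.8333
print([round(u, 2) for u in parts])  # [83.33, 166.67, 333.33, 416.67]
```

A quick consistency check is that ui / rate equals T for every server, which is exactly the "no idle time" condition.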
V. CASE STUDY.
Suppose that we want to download a file of size 1000 GB (U) and we
choose 4 servers (n) for this purpose. The first step is to send a small
packet to each server and measure the RTT in order to estimate its
speed. Suppose the speeds of the servers are 100, 200, 400, and
500 GB/H respectively. The servers then need ti = 10, 5, 2.5, and 2
units of time respectively to download the 1000 GB file on their own.
Server    Speed       File downloading time
S1        100 GB/H    t1 = 1000/100 = 10 unit time
S2        200 GB/H    t2 = 1000/200 = 5 unit time
S3        400 GB/H    t3 = 1000/400 = 2.5 unit time
S4        500 GB/H    t4 = 1000/500 = 2 unit time
Table 1: Time taken to download the entire file by each server.
In the traditional way, all the servers work in parallel. Before
starting, the file is divided into four equal parts and each part is
assigned to one server, so each server downloads 250 GB. Table 1 is
therefore updated as follows.
Server    Speed       Time to download a quarter of the file (250 GB)
S1        100 GB/H    t1 = 10/4 = 2.5 unit time
S2        200 GB/H    t2 = 5/4 = 1.25 unit time
S3        400 GB/H    t3 = 2.5/4 = 0.625 unit time
S4        500 GB/H    t4 = 2/4 = 0.5 unit time
Table 2: Time taken by each server to download a quarter of the file.
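The part sizes of the proposed method follow from requiring all
servers to finish at the same time T. Since the ith server downloads the
whole file in time ti, during time T it downloads

$$u_i = \frac{T}{t_i}\,U \qquad (1)$$

and because the parts must add up to the whole file,

$$\sum_{i=1}^{n} u_i = U \;\Rightarrow\; \sum_{i=1}^{n} \frac{T}{t_i}\,U = U \;\Rightarrow\; 1 = T\sum_{i=1}^{n}\frac{1}{t_i} \qquad (2)$$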
As Table 2 and Figure 2 show, Server4 finishes its job in 0.5 unit time,
whereas Server1 needs 2.5 units of time to finish its work. This means
that Server4 waits 2 units of time idle; likewise, Server3 and Server2
wait 1.875 and 1.25 units of time respectively.
Clearly, if we used only Server4, it would need only 2 units of time to
download the entire file. When the file is divided into quarters,
however, the time taken to download the entire file becomes worse,
2.5 units of time, because Server4 waits for Server1 to finish its job.
This method is therefore inappropriate: it lengthens the download time
despite the increase in the number of servers. The idle time of Server4
(the fastest server) is 2 units of time, and the total time needed to finish
the entire job equals the time of the slowest server, 2.5 units of time.
Figure 2: Idle Time of Servers
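The idle times above can be reproduced with a few lines of Python. The sketch assumes, as in Table 2, that each server keeps a constant speed throughout the transfer; the function name equal_split_idle_times is hypothetical.

```python
def equal_split_idle_times(file_size, rates):
    """Equal-split baseline: per-server finish times, overall completion
    time (set by the slowest server), and each server's idle time."""
    part = file_size / len(rates)        # every server gets the same share
    finish = [part / r for r in rates]   # time each server needs for its share
    makespan = max(finish)               # the job ends when the slowest server ends
    idle = [makespan - f for f in finish]
    return finish, makespan, idle


finish, makespan, idle = equal_split_idle_times(1000, [100, 200, 400, 500])
print(finish)    # [2.5, 1.25, 0.625, 0.5]
print(makespan)  # 2.5
print(idle)      # [0.0, 1.25, 1.875, 2.0]
```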
With our approach, all the servers again work in parallel, but before
starting, the file is divided into four parts whose sizes are determined
by the following formula:
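$$1 = T\sum_{i=1}^{n}\frac{1}{t_i} \qquad (2)$$
Thus
$$1 = T\left(\frac{1}{10} + \frac{1}{5} + \frac{1}{2.5} + \frac{1}{2}\right) = 1.2\,T \;\Rightarrow\; T = \frac{1}{1.2} = \frac{5}{6} \approx 0.8333$$
so T ≈ 0.8333 unit time is the total time needed to download the entire file.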
Using formula (1), ui = (T / ti) × U with T = 5/6, we can calculate the
part of the file assigned to each server:
u1 = (T / 10) × 1000 ≈ 83.33 GB (part of the file assigned to Server1).
u2 = (T / 5) × 1000 ≈ 166.67 GB (part of the file assigned to Server2).
u3 = (T / 2.5) × 1000 ≈ 333.33 GB (part of the file assigned to Server3).
u4 = (T / 2) × 1000 ≈ 416.67 GB (part of the file assigned to Server4).
The parts add up to 83.33 + 166.67 + 333.33 + 416.67 = 1000 GB.
Thus all servers finish the entire work at the same time with no idle
time, and the total work time is 5/6 ≈ 0.833 unit time. We save about
2.5 − 0.833 ≈ 1.67 units of time: the file is downloaded three times
faster than with the traditional way, i.e., the work time is reduced by
about 67%.
Figure 3: Speed of Downloading File for Each Server
Figure 4: Compare with Traditional Ways
Figure 5: Speed of Servers
Figure 6: Portion Ratio of File Assigned to Each Server
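The comparison can be checked numerically. With ti = U / ri, where ri is the speed of server i, formula (2) simplifies to T = U / Σ ri, so the balanced split behaves as if the servers' speeds were pooled. The short Python check below, using the speeds from Table 1, compares three options: equal quarters, the balanced split, and Server4 alone. It assumes constant server speeds, as the case study does.

```python
rates = [100, 200, 400, 500]  # GB per unit time, from Table 1
U = 1000                      # file size in GB

# Equal quarters: completion is set by the slowest server.
equal_time = max((U / len(rates)) / r for r in rates)
# Balanced split: T = 1 / sum(1/ti) with ti = U / ri, i.e. T = U / sum(ri).
balanced_time = U / sum(rates)
# Fastest server (Server4) working alone.
single_fastest = U / max(rates)

print(equal_time, round(balanced_time, 4), single_fastest)  # 2.5 0.8333 2.0
print(round(equal_time / balanced_time, 2))                 # 3.0 (speedup)
print(round(1 - balanced_time / equal_time, 3))             # 0.667 (time saved)
```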
VI. CONCLUSION:
Processor speed, bandwidth, congestion, and other factors vary among
servers, which leads to different file transfer speeds among them. In
other words, if multiple servers work simultaneously to download a
huge file, the faster servers are forced to wait for the slowest server
before the entire job is complete.
The current work eliminates this idle time. To do this, the requested
file is divided into multiple parts whose sizes match each server's
speed, and each part is assigned to a single server.
In the case study, the proposed method reduced the transfer time to
one third of that of the traditional method, i.e., it roughly tripled the
effective throughput.
REFERENCES
1. ECR 2005 – Scientific Programme – “Abstracts”, Eur Radiol Suppl
(2005) 15 (Suppl 1): 1. doi:10.1007/s10406-005-0100-2
2. Kumar, K.A., Quamar, A., Deshpande, A. et al. “SWORD:
workload-aware data placement and replica selection for cloud
data management systems”, The VLDB Journal (2014) 23: 845.
doi:10.1007/s00778-014-0362-1
3. Toporkov, V.V. & Yemelyanov, D.M. “Economic model of
scheduling and fair resource sharing in distributed computations”,
Program Comput Soft (2014) 40: 35.
doi:10.1134/S0361768814010071
4. Pethuru Raj, Anupama Raman, Dhivya Nagara and Siddhartha
Duggirala, “High-Performance Grids and Clusters”, in High-
Performance Big-Data Analytics, Computer Communications and
Networks series, pp. 275–315.
5. Foster, I., Kesselman, C.: “Globus: A Metacomputing Infrastructure
Toolkit”. International Journal of Supercomputer Applications and
High Performance Computing (1997) 11(2), 115–128
6. Linstead, E., Bajracharya, S., Ngo, T. et al. “Sourcerer: mining and
searching internet-scale software repositories”, Data Min Knowl
Disc (2009) 18: 300. doi:10.1007/s10618-008-0118-x
7. Sourav Mazumder, “Big Data Tools and Platforms” (book).
8. Manuel Sánchez, Óscar Cánovas, Diego Sevilla and
Antonio F. Gómez-Skarmeta, “A Service-Based Architecture for
Integrating Globus 2 and Globus 3”, Advances in Grid Computing,
EGC 2005.
9. Tan, J., Abramson, D. & Enticott, C.J. “Rerouting and
Multiplexing System for Grid Connectivity Across Firewalls”, Grid
Computing (2009) 7: 25. doi:10.1007/s10723-008-9104-
10. Cameron, D., Casey, J., Guy, L. et al. “Replica Management in the
European DataGrid Project”, J Grid Computing (2004) 2: 341.
doi:10.1007/s10723-004-5745-x
11. Ravimaran, S. & Maluk Mohamed, M.A. “Integrated Obj_FedRep:
Evaluation of Surrogate Object based Mobile Cloud System for
Federation, Replica and Data Management”, Arab J Sci Eng (2014)
39: 4577. doi:10.1007/s13369-014-1001-2
12. Caron, E., Desprez, F. & Muresan, A. “Pattern Matching Based
Forecast of Non-periodic Repetitive Behavior for Cloud Clients”,
Grid Computing (2011) 9: 49. doi:10.1007/s10723-010-9178-4
13. Hai Jin, Jin Huang, Xia Xie and Qin Zhang, “Using Classification
Techniques to Improve Replica Selection in Data Grid”.
14. Yang, CT., Shih, PC., Lin, CF. et al. “A resource broker with an
efficient network information model on grid environments”, J
Supercomput (2007) 40: 249. doi:10.1007/s11227-006-0025-0
15. Mansouri, N. “Adaptive data replication strategy in cloud computing
for performance improvement”, Front. Comput. Sci. (2016) 10: 925.
doi:10.1007/s11704-016-5182-6
16. Bang Zhang, Xingwei Wang and Min Huang, “A PGSA Based
Data Replica Selection Scheme for Accessing Cloud Storage
System”.
17. M. S. Almahanaa, R. M. Almuttairi, “Enhanced Replica Selection
Technique for binding Replica Sites in Data Grids”, In Proc. of the
International Conference on Intelligent Infrastructure, 47th Annual
National Convention of the Computer Society of India, organized by
the Kolkata Chapter, December 1-2, 2012, Science City, Kolkata.
18. R.M. Almuttairi, R. Wankar, A. Negi, C.R. Rao, A. Agrawal, R.
Buyya, “A two phased service oriented broker for replica selection in
data grids (2SOB)”, Future Generation Computer Systems (2012),
doi:10.1016/j.future.2012.09.007
19. R. M. Almuttairi, “Replica Optimization in Data grids”, IJSR -
International Technique Journal of science research, ISSN No.
2277-8179, Volume 4, Issue 2, February 2015.
20. R.M. Almuttairi, “Smart Vogel's Approximation Method (SVAM)”,
International Journal of Advanced Computer Research (ISSN
(print): 2249-7277, ISSN (online): 2277-7970), Volume 4, Number 1,
Issue 14, March 2014.