(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 1, 2021
An Efficient Data Replication Technique with Fault
Tolerance Approach using BVAG with Checkpoint
and Rollback-Recovery
Sharifah Hafizah Sy Ahmad Ubaidillah1, Basem Alkazemi2, A. Noraziah3
Faculty of Computing, University Malaysia Pahang, Pahang, Malaysia1, 3
College of Computer and Information System, Umm Al-Qura University, Saudi Arabia2
IBM Center of Excellence, University Malaysia Pahang, 26300 Kuantan, Pahang, Malaysia3
Abstract—Data replication has been one of the pathways for
distributed database management as well as computational
intelligence scheme as it continues to improve data access and
reliability. The performance of data replication technique can be
crucial when it involves failure interruption. In order to develop
a more efficient data replication technique which can cope with
failure, a fault tolerance approach needs to be applied in the data
replication transaction. A fault tolerance approach is a core concern in transaction management, as it keeps a transaction operational when it is prone to failure. In this study, a data replication
technique known as Binary Vote Assignment on Grid (BVAG)
has been combined with a fault tolerance approach named as
Checkpoint and Rollback-Recovery (CR) to evaluate the
effectiveness of applying fault tolerance approach in a data
replication transaction. Binary Vote Assignment on Grid with
Checkpoint and Rollback-Recovery Transaction Manager
(BVAGCRTM) is used to run the BVAGCR proposed method.
The performance of the proposed BVAGCR is compared to
standard BVAG in terms of total execution time for a single data
replication transaction. The experimental results reveal that BVAGCR improves the BVAG total execution time in a failure environment by about 31.65% using the CR fault tolerance approach. Besides improving the total execution time of BVAG, BVAGCR also reduces the time taken to execute the most critical phase in BVAGCRTM, the Update (U) phase, by 98.82%.
Therefore, based on the benefits gained, BVAGCR is recommended as a new and efficient technique for obtaining reliable data replication performance under failure conditions in distributed databases.
Keywords—Data replication; computational intelligence; fault
tolerance; binary vote assignment on grid; checkpoint and
rollback-recovery
I. INTRODUCTION
Data replication is a useful technique for a Distributed
Database System (DDS) as it can provide high availability and
efficient access to required data and can be applied in a grid
computation situation to improve the efficiency of the system
[1, 2]. Besides, data replication can be one of the influential techniques that expand the usefulness of a computational intelligence structure. Data replication involves frequent, incremental copying of data from one database to another in a continuous manner, which increases availability, provides low response times, and allows fast local access to the system [3, 4]. Despite the benefits of data replication techniques in handling distributed databases, they still have some weaknesses when dealing with failure cases.
Handling data replication in the failure cases is very
crucial in order to preserve the effectiveness of the systems.
The main challenge of data replication is that the replicas have to be kept consistent when updates occur, despite any failure that happens while a transaction is running [4]. The only way to solve this problem is by enabling fault tolerance. Fault tolerance is a crucial issue in distributed computing; it keeps a transaction in an operational condition when subject to failure. Its most important goal is to keep the transaction working even if any of its parts goes offline or becomes faulty [5]. Fault tolerance is a dynamic approach used to keep interrelated transactions together and to maintain reliability and availability in a DDS. Efficient fault tolerance approaches help in detecting faults and, if possible, recovering from them [6].
Based on previous studies, the combination of a data replication technique with the Checkpoint and Rollback-Recovery (CR) fault tolerance approach in a distributed database has rarely been analyzed, despite its promising potential to lessen the total execution time in failure-prone situations [7]. As an example, the research in [7] explored the performance of a transaction process using CR only, replication only, and the combination of both techniques in a linear workflow in the presence of failure. The results reveal the conditions under which each technique leads to improved performance. Besides that, the work in [8] concludes that the CR approach is essential not only for transaction process replication but also for security issues. Despite these good performances, only a few researchers have explored the effectiveness of combining a data replication technique with the CR fault tolerance approach. It
is common practice to utilize Checkpoint and Rollback-Recovery (CR) to facilitate adequate failure recovery and improve transaction reliability [9]. Mainly, a checkpoint is performed to save information linked with the completed portion of a transaction. When a transaction failure occurs, the transaction can be resumed from the last successful checkpoint through rollback and information retrieval. In contrast, without the checkpoint technique, the entire transaction has to be executed again from the very beginning [10]. Hence, the data replication
transaction might be time-consuming without the CR approach if any failure happens.
Therefore, in this study, a data replication technique called Binary Vote Assignment on Grid (BVAG) is combined with CR, with the aim of evaluating the efficiency of hybridizing a data replication technique with a fault tolerance approach for better performance of a single data replication transaction in the presence of failure. The proposed method, BVAGCR, is implemented in the Binary Vote Assignment on Grid with Checkpoint and Rollback-Recovery Transaction Manager (BVAGCRTM).
The paper is arranged as follows. The next section, Literature Review, details the BVAG data replication technique and the CR fault tolerance approach. Section 3, Methodology, describes the procedure of the BVAGCR technique, which is employed via BVAGCRTM. The Result and Discussion section discusses the outcomes obtained from standard BVAG and BVAGCR; also presented in this section is a comparison of both techniques in terms of execution time while managing a data replication transaction with the occurrence of failure. Finally, the conclusion of this research and suggestions for future work are provided in the Conclusion.
II. LITERATURE REVIEW
A. Binary Vote Assignment on Grid (BVAG)
The concept of Binary Vote Assignment on Grid (BVAG) is to replicate data from the primary replica to the neighbour replicas located at the sites adjacent to the primary replica [11]. Full replication can result in a huge waste of storage space and consume a lot of bandwidth [12]. By using this technique, the execution time of the replication process in a distributed database can be reduced, as data are only replicated at the specified sites [12]. Query expansion augments an initial user query with additional terms related to the user's requirements [13], whereas BVAG focuses on increasing write query availability through replication. BVAG charts a new path in replication that helps to maximize write availability with little communication cost, as a result of the minimum quorum size needed. Furthermore, the replication is interconnected with the transaction procedure [14].
In BVAG, all sites are logically organized in the form of a two-dimensional grid structure. For example, if a BVAG consists of nine sites, it is logically organized as a 3 x 3 grid, as shown in Fig. 1. As can be seen in Fig. 1, site A is a neighbour of sites B and D because A is logically located adjacent to B and D. Hence, the four sites on the corners of the grid have only two adjacent sites, the other sites on the boundaries have three neighbours, and the site located in the middle of the grid has four neighbours [14].
Fig. 1. Binary Vote Assignment on Grid (BVAG).
Each site has a primary data file. Data will be replicated from the primary site to the neighbour sites [11]. For simplicity, the primary site of any data file and its neighbours are assigned vote one (1), and vote zero (0) otherwise. A neighbour binary vote assignment on grid, B, is a function such that B(i) ∈ {0, 1}, 1 ≤ i ≤ d, where B(i) is the vote assigned to site i. This assignment is treated as an allocation of replicated copies, and a vote assigned to a site results in a copy being allocated at that neighbour. Since the data will be replicated to the neighbours, the possible number of data copies from each site, d, is then:
d = (number of neighbour sites) + 1, counting the data at the primary site itself.
For example, the primary data from site A, called 'a', are replicated to sites B and D, which are its neighbours. Site E, which holds the primary data 'e', has four neighbours, namely sites B, D, F, and H, which will receive the replicated data of 'e'; as such, site E has five replicas. Meanwhile, the primary data 'f' from site F are replicated to sites C, E, and I. The quorum used is based on the total number of replicated data plus the primary data, d, which can be three, four, or five [11, 14].
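For illustration, the following Python sketch (not from the paper; the grid labelling follows Fig. 1) computes the neighbour sets and the resulting replica count d for each kind of site:

# Illustrative sketch (not the authors' code): neighbour sets and the replica
# count d for the 3 x 3 BVAG grid of Fig. 1, with sites labelled A to I.
GRID = [["A", "B", "C"],
        ["D", "E", "F"],
        ["G", "H", "I"]]

def neighbours(site):
    """Return the sites logically adjacent (up/down/left/right) to `site`."""
    rows, cols = len(GRID), len(GRID[0])
    positions = {GRID[r][c]: (r, c) for r in range(rows) for c in range(cols)}
    r, c = positions[site]
    adjacent = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            adjacent.append(GRID[nr][nc])
    return sorted(adjacent)

for site in ("A", "E", "F"):
    nbrs = neighbours(site)
    d = len(nbrs) + 1          # replicas d = neighbours + the primary copy itself
    print(site, nbrs, "d =", d)
# A ['B', 'D'] d = 3            (corner site: two neighbours)
# E ['B', 'D', 'F', 'H'] d = 5  (centre site: four neighbours)
# F ['C', 'E', 'I'] d = 4       (boundary site: three neighbours)

This reproduces the example above: corner, boundary, and centre sites yield d of three, four, and five, respectively.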
The transaction procedure in BVAG is called the Binary Vote Assignment on Grid Transaction Manager (BVAGTM). BVAGTM is applied to control the transaction of each data replication process. The primary site of any primary data file and its replicas are assigned different votes depending on their condition. There are two types of votes used in this study, as shown in Table I. Zero (0) specifies that the site is available (free), while one (1) indicates that the site is unavailable (busy). The status of each site is statistically independent of the others. When a site is available, the copy at the site is available too; otherwise, it is unavailable [11, 14].
TABLE I. TYPE OF STATUS
Type of Status | Definition
0              | Available (site is free or not in use)
1              | Unavailable (site is busy or in use)
There are seven main phases involved in BVAGTM: Initiate Lock (IL), Propagate Lock (PL), Obtain Quorum (OQ), Update (U), Commit (C), Unlock (UL), and Release Lock (RL) [11, 14]. The IL phase locks the primary site if the primary site is in available (0) status; if the primary site is busy (status = 1), the primary site is released (RL). After the primary site has been locked, the PL phase determines the status of each neighbour site. All neighbour sites are locked if they are in available (0) status; otherwise, the neighbour sites are released (RL). Then, the OQ phase declares that the quorum obtained is enough for the transaction to continue. Next, the primary data are updated in the U phase. Afterward, the updated primary data, also called the new primary data, are replicated to the neighbour sites in the C phase. Last but not least, the transaction unlocks (UL) all the sites involved in the transaction. A summary of BVAGTM is shown in Fig. 2.
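To make the ordering of these phases concrete, the following short Python sketch (an illustrative simplification, not the BVAGTM implementation) walks through IL, PL, OQ, U, C, and UL for one transaction, with 0 meaning available and 1 meaning locked:

def bvagtm_transaction(status, primary, neighbours, new_value, database):
    # Initiate Lock (IL): lock the primary site if it is available (0).
    if status[primary] != 0:
        return "RL"                       # Release Lock: primary site is busy
    status[primary] = 1
    # Propagate Lock (PL): lock every neighbour site, or release everything.
    if any(status[n] != 0 for n in neighbours):
        status[primary] = 0
        return "RL"                       # a neighbour is busy: release locks
    for n in neighbours:
        status[n] = 1
    # Obtain Quorum (OQ): the primary plus its neighbours form the quorum d.
    quorum = len(neighbours) + 1          # three, four, or five on a 3 x 3 grid
    # Update (U): write the new primary data.
    database[primary] = new_value
    # Commit (C): replicate the updated data to the neighbour sites.
    for n in neighbours:
        database[n] = new_value
    # Unlock (UL): free all sites involved in the transaction.
    for site in [primary, *neighbours]:
        status[site] = 0
    return "committed (quorum = %d)" % quorum

# Example: primary site E with neighbours B, D, F, H on the 3 x 3 grid.
status = {s: 0 for s in "ABCDEFGHI"}
database = {s: None for s in "ABCDEFGHI"}
print(bvagtm_transaction(status, "E", ["B", "D", "F", "H"], "e_new", database))
# -> committed (quorum = 5)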
B. Checkpoint and Rollback-Recovery (CR)
Checkpoint and Rollback-Recovery (CR) is a well-known fault tolerance approach. Checkpointing is a process that stores the recent state of a transaction in stable (non-volatile) storage [9]. It is performed occasionally during the normal execution of a transaction. The information related to the transaction is saved on stable storage with the intention of using it in case of site failures. The saved information comprises the transaction state, its environment, the values of registers, etc. When an error is detected, the transaction is rolled back to the last saved state [15].
Fig. 3 shows a summary of the CR approach. The checkpoint mechanism takes a snapshot of the transaction state and stores the information on some non-volatile storage medium [16]. When failures occur, the restore mechanism copies the last known checkpointed state of the transaction back into memory and continues processing. The basic idea behind CR is the saving and restoration of transaction state: by saving the current state of the transaction occasionally, or before critical code sections, it delivers the baseline information needed to restore lost state in the event of a transaction failure. CR is one of several time-efficient fault tolerance approaches [17]. Besides reducing execution time, CR can also reduce the computing resources used [18].
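A minimal Python sketch of this save/restore idea follows (illustrative only; the file name, JSON format, and per-phase checkpointing granularity are assumptions, not the CR mechanism used later in this paper):

import json, os

CHECKPOINT_FILE = "checkpoint.json"   # assumed file name; stands in for stable storage

def save_checkpoint(state):
    """Checkpoint: persist the current transaction state to non-volatile storage."""
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump(state, f)

def run_with_cr(run_phase, phases):
    """Run `phases` in order; after a failure, resume from the last checkpoint."""
    state = {"done": []}
    if os.path.exists(CHECKPOINT_FILE):          # rollback-recovery: restore saved state
        with open(CHECKPOINT_FILE) as f:
            state = json.load(f)
    for phase in phases:
        if phase in state["done"]:
            continue                             # phase already completed before the failure
        run_phase(phase, state)                  # may raise if a failure occurs here
        state["done"].append(phase)
        save_checkpoint(state)                   # checkpoint after each completed phase
    os.remove(CHECKPOINT_FILE)                   # transaction finished; clear the checkpoint
    return state

If run_with_cr is invoked again after a failure, only the phases not recorded in the checkpoint are re-executed, which is the behaviour exploited by the proposed BVAGCR.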
Fig. 2. Binary Vote Assignment on Grid Transaction Manager (BVAGTM).
Fig. 3. Checkpoint and Rollback-Recovery (CR).
III. METHODOLOGY
In this section, the methodology of the proposed technique, called BVAGCR, is described. Fig. 4 illustrates the algorithm of the BVAGCR technique applied in the Binary Vote Assignment on Grid with Checkpoint and Rollback-Recovery Transaction Manager (BVAGCRTM) for a single data replication transaction.
First, the following notations are defined as:
1) is transaction,
2) is checkpoint transaction,
3) λ represents different group of transaction T (before and
until get quorum). λ can be either α or β,
4) is transaction of group λ
5) is the data to be updated
6) is the queue number for a transaction, i = 1, 2, 3, …
7) is transaction of λ for data χ in queue 1
8)  is checkpoint transaction of λ for data χ in queue 1
9)  is checkpoint file
10)  is checkpoint file for transaction of λ for data χ in
queue 1
11) is the status of the required site
12)  stands for Primary Replica site
13)  stands for Neighbour Replica site
14)  is status for 
15)  is status for 
16)  is the status of  which hold data χ
17)  is the status of  which hold data χ
18) is the amount of quorum needed to continue the
transaction of
19) is database
20)  is the database for 
21)  is the database for 
22)  is the database of  that consists of data χ
23)  is the database of  that consists of data χ
A data replication transaction can request to update any data file at any replica. BVAGCRTM first checks whether any checkpoint file has been saved in BVAGCRTM. If no checkpoint file exists, BVAGCRTM accepts a new data replication transaction. A new data replication transaction that requests to update a data item is placed in the first queue in BVAGCRTM. The transaction checks the status of the primary replica holding the data, which can be free (0), busy (1), or failed (-1). If the primary replica is free to be used in the transaction, its status is locked and set to 1; otherwise, the primary replica is released because it is unavailable. The status of the primary replica and the data are saved in the checkpoint file. Next, the transaction requests to lock all the neighbour replicas. If all of them are free, they are locked and set to 1; however, if any neighbour replica is busy, all of them are released for other transactions. The status of each neighbour replica is then saved in the checkpoint file.
Afterward, the total quorum is declared and saved in the checkpoint file. After that, the Update and Commit function is executed. The data are updated in the primary database that holds them, and the updated primary database state is saved in the checkpoint file. Next, the transaction commits the replication to all the neighbour databases, whose states are also saved in the checkpoint file. Lastly, the primary replica and all neighbour replicas are unlocked (status set to 0).
Meanwhile, if a checkpoint file has already been saved in BVAGCRTM, a recovery transaction retrieves the information stored in it. The information saved in the checkpoint file comprises the statuses of the primary replica and all neighbour replicas, the data, and the databases involved. The statuses of the primary and neighbour replicas are then set to 1 to declare that they are in use for the transaction. Then, the Update and Commit function is executed based on the information that has been recovered. Finally, all the locks for the primary and neighbour replicas are released and their statuses are set to 0. This algorithm can be applied when either the primary site or any of the neighbour sites fails. The transaction can only be continued once the failed site returns to normal status (0). This is because the proposed algorithm is not constructed to repair the failure itself, but to allow the failed transaction, once the failure is resolved, to continue running rather than restarting from the very beginning.
Main Algorithm
{
If exist  file,
Begin transaction ,
Read  ,
Write  = 1,
Commit ,
Write all = 1,
Commit,
Update and Commit (x),
Write = 0,
Write = 0,
Else
Begin transaction  ,
If  = 0,
Write  = 1,
Commit ,
Save  in  file,
Else
Write = 0,
End
If all = 0,
Write all = 1,
Commit all ,
Save all in  file,
Else
Write all  = 0,
End
= majority,
Update and Commit (x),
Write = 0,
Write = 0,
End
}
Function Update and Commit (x)
{
Update x in ,
Commit x in  ,
Save  , in  file,
Commit replication x in all ,
Save all , in  file,
}
Fig. 4. Algorithm of BVAGCRTM.
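For concreteness, the following is a minimal Python sketch of the BVAGCRTM flow in Fig. 4. The names (checkpoint, status, db) and the dictionary standing in for the checkpoint file are illustrative assumptions rather than the authors' notation or MATLAB implementation, and the recovery path assumes the failure interrupted the Update and Commit step, as in the experiment of Section IV:

checkpoint = {}          # stands in for the checkpoint (CF) file kept on stable storage

def update_and_commit(x, primary, neighbours, db):
    db[primary] = x                                   # Update x in the primary database
    checkpoint["primary_db"] = {primary: x}           # save the primary DB state in CF
    for n in neighbours:                              # Commit the replication of x
        db[n] = x
    checkpoint["neighbour_db"] = {n: x for n in neighbours}   # save neighbour DB states

def bvagcr_transaction(x, primary, neighbours, status, db):
    if checkpoint:                                    # a CF file exists: recovery path
        x = checkpoint["x"]                           # restore the saved information
        primary = checkpoint["primary"]
        neighbours = checkpoint["neighbours"]
        for site in [primary, *neighbours]:
            status[site] = 1                          # re-lock the recovered sites
    else:                                             # no CF file: new transaction
        if status[primary] != 0:                      # 0 = free, 1 = busy, -1 = failed
            return "released"                         # primary unavailable
        status[primary] = 1
        checkpoint.update({"primary": primary, "x": x})
        if any(status[n] != 0 for n in neighbours):
            status[primary] = 0
            return "released"                         # a neighbour is busy: release all
        for n in neighbours:
            status[n] = 1
        checkpoint["neighbours"] = neighbours
        checkpoint["quorum"] = len(neighbours) + 1    # majority quorum, saved in CF
    update_and_commit(x, primary, neighbours, db)     # recovery resumes from this step
    for site in [primary, *neighbours]:
        status[site] = 0                              # unlock all sites involved
    checkpoint.clear()                                # transaction completed; clear CF
    return "committed"

If a failure interrupts update_and_commit, the entries saved in checkpoint allow a re-invocation of bvagcr_transaction to resume from the Update phase instead of restarting from Initiate Lock, which is exactly the difference measured in Section IV.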
In the next section, an experiment considering a failure condition is described, conducted in order to evaluate the performance of BVAG and BVAGCR. The results obtained and the discussion of those results are also presented there.
IV. RESULT AND DISCUSSION
An experiment with a single transaction, in which a failure occurred at the primary replica, was conducted in this research using a MATLAB simulation. The time between failure and recovery is assumed to be 10 seconds, and the transaction is continued after failure recovery. In this transaction, site E is considered the primary replica holding the primary data e, while sites B, D, F, and H are the neighbour replicas that receive the copy of data e from site E. In this case, the transaction requests to update the data and replicate them to the neighbour replicas. A transaction failure is considered to occur in the Update (U) phase, since it is the critical phase in both BVAG and BVAGCR.
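This setup can be summarised by the following configuration sketch (a hypothetical Python snippet; the paper's own simulation was implemented in MATLAB):

# Hypothetical configuration mirroring the experiment described above.
EXPERIMENT = {
    "primary_replica": "E",                       # site holding primary data 'e'
    "neighbour_replicas": ["B", "D", "F", "H"],   # sites receiving the copy of 'e'
    "failure_phase": "Update (U)",                # failure injected in the critical phase
    "failure_downtime_s": 10.0,                   # assumed time between failure and recovery
    "compared_techniques": ["BVAG", "BVAGCR"],
    "metric": "total execution time of a single replication transaction",
}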
Fig. 5 and Fig. 6 demonstrate the flow of the transaction under the failure condition for BVAG and BVAGCR, respectively. As can be seen in Fig. 5 (BVAG), the information related to the transaction is not saved in a checkpoint file. Thus, when a failure occurs at T10, the transaction has to be started all over again, as no information recovery can be performed. Meanwhile, in BVAGCR (Fig. 6), the information related to the transaction is saved in the checkpoint file (highlighted in grey) in each phase of BVAGCRTM. Thus, once the transaction fails at T13, the information can be retrieved from the checkpoint file and the transaction can be resumed from the Update (U) phase, in which the failure occurred.
The execution times for each phase involved in the BVAG and BVAGCR methods are recorded before and after the failure occurred, as shown in Table II and Table III. As presented in Table II, the overall time taken to complete a transaction using BVAG is 15.8574 seconds, which includes the estimated failure duration (10 seconds). Before the failure occurred, the transaction had run the IL, PL, and OQ phases, which took 0.4931 seconds to execute. After failure recovery, the transaction had to be run again from the start because no checkpoint file was found; the time taken to rerun the transaction is 5.3643 seconds. For the critical phase U, the time needed to finish it is 4.7809 seconds.
As displayed in Table III, BVAGCR needs 10.8381 seconds to finish a transaction, covering the time before the failure occurred and after failure recovery, and also including the estimated failure duration (10 seconds). Before the failure happened, the transaction performed the same phases as BVAG, namely IL, PL, and OQ, which took 0.7443 seconds. Once the failure had been recovered, the transaction only needed to rerun from the U phase onwards because it retrieved the information about the transaction saved in a checkpoint file. Based on the checkpoint file, the last saved information of the current transaction is in phase U. The time used to rerun the transaction from the U phase until the data had been replicated to all the neighbour sites is 0.0938 seconds. For the critical phase (U), the time needed to finish it is 0.0516 seconds.
Fig. 7 compares BVAG and BVAGCR in terms of total time, Update phase time, execution time before failure, and execution time after failure. Based on Fig. 7, the proposed method, BVAGCR, uses more time (0.7443 seconds) than BVAG (0.4931 seconds) before failure because it takes extra time to save the information about the current transaction into a checkpoint file. However, after failure recovery, BVAGCR spends less time completing the transaction than BVAG, as it does not have to rerun the transaction from the beginning.
The transaction flow traced in Fig. 5 involves replica E (primary) and replicas B, D, F, and H (neighbours):
T1: all replicas unlocked.
T2: begin transaction.
T3-T5: initiate lock at E; write lock = 1; get lock.
T6-T8: propagate lock to B, D, F, and H; write locks = 1; get locks.
T9: obtain majority quorum.
T10: update (e) at E; failure occurred.
T11: all replicas unlocked.
T12-T18: the transaction is restarted from the beginning (begin transaction, initiate lock and get lock at E, propagate lock and get locks at B, D, F, H).
T19-T20: obtain majority quorum.
T21: update (e) at E.
T22: commit the update at E.
T23: commit the replication at B, D, F, and H.
T24: write lock = 0 at all replicas.
Fig. 5. BVAG Transaction's Flow (Failure Occurred While Updating Data).
The transaction flow traced in Fig. 6 involves replica E (primary) and replicas B, D, F, and H (neighbours):
T1: all replicas unlocked.
T2: begin transaction.
T3-T4: initiate lock at E; write lock = 1.
T5: save the lock information in the checkpoint file.
T6: get lock at E.
T7-T8: propagate lock to B, D, F, and H; write locks = 1.
T9: save the neighbour lock information in the checkpoint file.
T10: get locks at B, D, F, and H.
T11: obtain majority quorum.
T12: save the quorum in the checkpoint file.
T13: update (e) at E; failure occurred.
T14: begin the recovery transaction.
T15: retrieve the saved information from the checkpoint file.
T16: write lock = 1 at all replicas.
T17: update (e) at E.
T18: commit the update at E.
T19: save the updated primary database in the checkpoint file.
T20: commit the replication at B, D, F, and H.
T21: save the neighbour databases in the checkpoint file.
T22: write lock = 0 at all replicas.
Fig. 6. BVAGCR Transaction's Flow (Failure Occurred While Updating Data).
TABLE II. EXECUTION TIME TAKEN FOR THE BVAG DATA REPLICATION TRANSACTION (A) BEFORE FAILURE OCCURRED, (B) AFTER FAILURE OCCURRED
PHASE                     | TIME (SECONDS)
INITIATE LOCK (A)         | 0.2320
PROPAGATE LOCK (A)        | 0.2408
OBTAIN QUORUM (A)         | 0.0203
UPDATE AND FAILURE OCCUR  | 10.0000
INITIATE LOCK (B)         | 0.2331
PROPAGATE LOCK (B)        | 0.2354
OBTAIN QUORUM (B)         | 0.0300
UPDATE (B)                | 4.7809
COMMIT (B)                | 0.0602
UNLOCK (B)                | 0.0247
TOTAL                     | 15.8574
TABLE III. EXECUTION TIME TAKEN FOR THE BVAGCR DATA REPLICATION TRANSACTION (A) BEFORE FAILURE OCCURRED, (B) AFTER FAILURE OCCURRED
PHASE                     | TIME (SECONDS)
INITIATE LOCK (A)         | 0.2390
PROPAGATE LOCK (A)        | 0.4795
OBTAIN QUORUM (A)         | 0.0258
UPDATE AND FAILURE OCCUR  | 10.0000
UPDATE (B)                | 0.0516
COMMIT (B)                | 0.0149
UNLOCK (B)                | 0.0273
TOTAL                     | 10.8381
Fig. 7. Time Comparison between Standard BVAG and BVAGCR.
Meanwhile, the total time taken for a single transaction with BVAGCR (10.8381 seconds) is shorter than with BVAG (15.8574 seconds): BVAGCR improves on the standard BVAG method by 31.65% in terms of total execution time. Besides that, the efficiency of BVAGCR can also be seen in the critical Update (U) phase, where it improves on the performance of standard BVAG by 98.82%. Thus, based on the results obtained, the objective of this study, which is to improve the efficiency of standard BVAG by proposing a new data replication technique with a fault tolerance approach (BVAGCR), has been successfully achieved.
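As a quick arithmetic check, the reported totals and the 31.65% figure can be reproduced directly from the phase times in Tables II and III (illustrative Python, not the authors' analysis script):

# Reproduce the reported totals and the total-time improvement from Tables II and III.
bvag_phases   = [0.2320, 0.2408, 0.0203, 10.0000, 0.2331, 0.2354, 0.0300, 4.7809, 0.0602, 0.0247]
bvagcr_phases = [0.2390, 0.4795, 0.0258, 10.0000, 0.0516, 0.0149, 0.0273]
bvag_total   = round(sum(bvag_phases), 4)         # 15.8574 seconds (Table II)
bvagcr_total = round(sum(bvagcr_phases), 4)       # 10.8381 seconds (Table III)
improvement = (bvag_total - bvagcr_total) / bvag_total * 100
print(f"{improvement:.2f}%")                      # -> 31.65%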
V. CONCLUSION
This study has explored a new combination of data replication and a fault tolerance approach, called BVAGCR. The performance of BVAGCR was tested using a MATLAB simulation. A comparison between standard BVAG and BVAGCR was carried out in order to evaluate the effectiveness of applying the CR fault tolerance approach within the BVAG data replication technique. The results of this study show that the proposed BVAGCR outperforms standard BVAG in terms of total execution time, the time taken to execute the U phase, and the time taken to rerun the transaction after failure recovery.
Therefore, BVAGCR can be proposed as an efficient and reliable alternative technique for replicating data under failure conditions. To test the robustness of the proposed BVAGCR, future work should explore the application of this method to big data. Besides that, BVAGCR can also be combined with data mining methods in order to obtain an even more capable data replication technique.
ACKNOWLEDGMENT
This study is supported by the Fundamental Research Grant Scheme (RDU190185), Reference no. FRGS/1/2018/ICT03/UMP/02/3, sponsored by the Ministry of Higher Education Malaysia (MOHE). In addition, this study is also supported by Grant Code (20UQU0074DSR) under the Deanship of Scientific Research at Umm Al-Qura University. Appreciation is also conveyed to University Malaysia Pahang for project financing under UMP Short Term Grant RDU1903122 and UMP PGRS RDU170329.
REFERENCES
[1] N. Dogra and S. A. Singh, “A survey of dynamic replication strategies in
distributed systems,” International Journal of Computer Applications.,
vol. 110, no. 11, Jan 2015.
[2] S. H. S. A. Ubaidillah and N. Ahmad, "Fragmentation Techniques for Ideal Performance in Distributed Database—A Survey," International Journal of Software Engineering and Computer Systems, vol. 6, no. 1, pp. 18-24, May 2020.
[3] N. Kumar and G. V. Mattur, "Data replication in a database environment," U.S. Patent Application No. 15/813,828.
[4] B. Kemme, “Data Replication,” In: Liu L., Özsu M. (eds) Encyclopedia
of Database Systems. Springer, New York, NY, 2017.
[5] A. Sarı and E. Çağlar, "Performance Simulation of Gossip Relay Protocol in Multi-Hop Wireless Networks," pp. 145, 2015.
[6] A. Sari and M. Akkaya, “Fault tolerance mechanisms in distributed
systems,” International Journal of Communications, Network and
System Sciences., vol. 8, no. 12, pp. 471, Dec 2015.
[7] A. Benoit, A. Cavelan, F. M. Ciorba, V. Le Fevre and Y. Robert,
“Combining checkpointing and replication for reliable execution of
linear workflows,” in IEEE International Parallel and Distributed
Processing Symposium Workshops (IPDPSW), 2018, pp. 793-802.
[8] S. Desai, N. Pendharkar and Veritas Technologies LLC, “Data
replication techniques using incremental checkpoints,” United States
patent US 9,495,264. 15 Nov 2016.
[9] Y. Mo, L. Xing, Y. K. Lin and W. Guo, "Efficient Analysis of Repairable Computing Systems Subject to Scheduled Checkpointing," IEEE Transactions on Dependable and Secure Computing, Sep 2018.
[10] R. Garg and A. K. Singh, “Fault tolerance in grid computing: state of the
art and open issues,” International Journal of Computer Science &
Engineering Survey (IJCSES)., vol. 2, no. 1, pp. 88-97, Feb 2011.
[11] A. Noraziah, A. A. Fauzi, S. H. Ubaidillah, Z. Abdullah, R. M. Sidek,
M. A. Fakhreldin, “Managing Database Replication Using Binary Vote
Assignment on Grid Quorum with Association Rule,” Advanced Science
Letters., vol. 24, no. 10, pp. 7834-7837, Oct 2018.
[12] J. P. Yang, “Elastic load balancing using self-adaptive replication
management,” IEEE Access., vol. 22, no. 5, pp. 7495-7504, Nov 2016.
[13] Muhammad Ahsan Raza, M. Rahmah, A. Noraziah, Mahmood Ashraf,
“Sensual Semantic Analysis for Effective Query Expansion “,
International Journal of Advanced Computer Science and Applications,
vol. 9, no. 12, pp. 55-60, 2018.
[14] A. A. Fauzi, A. Noraziah, T. Herawan, Z. Abdullah, R. Gupta,
“Managing Fragmented Database Using BVAGQ-AR Replication
Model,” Advanced Science Letters., vol. 23, no. 11, pp. 11088-11091,
Nov 2017.
[15] S. Veerapandi, S. Gavaskar and A. Sumithra, “A Hybrid Fault Tolerance
System for Distributed Environment using Check Point Mechanism and
Replication,” International Journal of Computer Applications., vol. 975,
pp. 8887, 2017.
[16] J. Wu. “Checkpointing and recovery in distributed and database
systems,” 2011.
[17] D. Poola, M. A. Salehi, K. Ramamohanarao and R. Buyya, "A taxonomy and survey of fault-tolerant workflow management systems in cloud and distributed computing environments," in Software Architecture for Big Data and the Cloud, pp. 285-320, Jan 2017.
[18] T. Sterling, M. Anderson and M. Brodowicz, “High performance
computing: modern systems and practices,” Morgan Kaufmann., Dec
2017.