A Comparative Analysis of Dependable Recovery Line Accumulation Protocols for Mobile Computing Environments

January 2023

Authors:

Naheeda Zaib

NIMS University

Dr S. Senthil Kumar

NIMS University

Content uploaded by Naheeda Zaib

Content may be subject to copyright.

Naheeda Zaib¹, Dr. S. Senthil Kumar²

¹ Research Scholar, Department of Computer Science Engg., Nims University, Jaipur (Raj), Naheedazaib93@gmail.com

² Professor, Department of Computer Science Engg., Nims University, Jaipur (Raj), senthil.kumar@nimsuniversity.org

ABSTRACT

Checkpointing-based fault-tolerance techniques enable systems to continue carrying out tasks in the presence of faults. A checkpoint is a local state of a process saved on stable storage. In a distributed system, since the processes do not share memory, a global state of the system is defined as a collection of local states, one from each process. When a fault occurs in a distributed system, checkpointing-based recovery techniques enable the execution of a program to be resumed from a previous consistent global state rather than from the beginning. In this paper, we discuss various issues related to checkpointing for distributed systems and mobile computing environments. We also discuss various types of checkpointing: coordinated checkpointing, asynchronous checkpointing, communication-induced checkpointing and message-logging-based checkpointing. Finally, we present a survey of some checkpointing algorithms for distributed systems.

Key words: Checkpointing algorithms; parallel & distributed computing; rollback recovery; fault-tolerant systems.

1. INTRODUCTION

A DRL (Dependable Recovery Line) of a distributed system (DS) is a collection of the individual states of all participating processes and the states of the communication channels. Intuitively, a consistent DRL is one that may occur during a failure-free execution of a distributed computation. More precisely, a consistent system state is one in which, if a process’s state reflects the receipt of a message, then the state of the corresponding sender reflects the sending of that message [1, 3, 4]. A checkpoint is a local state (LS) of a process, and a Global State (GS) is a set of checkpoints, one from each process. A GS is consistent if no message is sent by a process after recording its checkpoint and received by another process before recording its checkpoint. The consistency of GSs depends strongly on the flow of messages exchanged by processes, and an arbitrary set of LSs may not form a consistent GS. The ultimate objective of any rollback-recovery protocol is to bring the system to a consistent state after a failure [3, 13].
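
The consistency condition can be checked mechanically. The sketch below is only an illustration under stated assumptions: events at each process are numbered by a local counter, a checkpoint is identified by the index of the last event it includes, and the names `Message`, `orphan_messages` and `is_consistent` are hypothetical helpers, not part of any surveyed protocol.

```python
from typing import Dict, List, NamedTuple


class Message(NamedTuple):
    sender: str
    send_time: int   # local event index at the sender when the message was sent
    receiver: str
    recv_time: int   # local event index at the receiver when it was received


def orphan_messages(cut: Dict[str, int], messages: List[Message]) -> List[Message]:
    """Return messages recorded as received but not recorded as sent.

    cut[p] is the index of the last local event included in p's checkpoint.
    """
    return [
        m for m in messages
        if m.recv_time <= cut[m.receiver] and m.send_time > cut[m.sender]
    ]


def is_consistent(cut: Dict[str, int], messages: List[Message]) -> bool:
    """A global state is consistent when it contains no orphan message."""
    return not orphan_messages(cut, messages)


msgs = [Message("P1", 2, "P2", 3), Message("P2", 5, "P1", 6)]
print(is_consistent({"P1": 4, "P2": 4}, msgs))  # True
print(is_consistent({"P1": 6, "P2": 4}, msgs))  # False: P1 records a receipt P2 never sent
```

In the second call, P1's checkpoint includes the receipt of the message that P2 sent at its local event 5, but P2's checkpoint at event 4 does not include the send, so the set of checkpoints is inconsistent.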

2. GLOBAL SNAPSHOT COMPILATION PROTOCOLS

DRL-accumulation protocols are classified as uncoordinated, coordinated, quasi-coordinated (communication-induced) and message-logging based.

a) Uncoordinated/Independent DRL-accumulation Protocols

Uncoordinated DRL-accumulation protocols allow each process maximum autonomy in deciding when to take checkpoints. The key benefit of this autonomy is that each process may take a checkpoint when it is most convenient. For example, a process may reduce the overhead by taking checkpoints when the amount of state information to be saved is small [13, 22]. But there are several shortcomings. First, there is the possibility of the domino effect, which may cause the loss of a large amount of useful work, possibly all the way back to the beginning of the computation. Second, a process may take useless checkpoints that will never be part of a consistent GS. Useless checkpoints are undesirable because they incur overhead and do not contribute to advancing the recovery line. Third, uncoordinated protocols require each process to maintain several checkpoints and to invoke a garbage-collection algorithm periodically to reclaim the checkpoints that are no longer useful. Fourth, uncoordinated checkpointing is not suitable for applications with frequent output commits, because these require global coordination to compute the recovery line, negating much of the benefit of autonomy. In order to determine a consistent GS during recovery, the processes record the dependencies among their checkpoints during failure-free operation [22, 25, 27].
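
The domino effect can be reproduced with a small rollback-propagation sketch. It is a simplified illustration, not an algorithm from the cited papers: it assumes every process restarts from a checkpoint, that each process has an initial checkpoint at local index 0, and that message events are numbered from 1. The function name `recovery_line` and the tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple

# (sender, send_index, receiver, receive_index); indices are local event counters
Message = Tuple[str, int, str, int]


def recovery_line(checkpoints: Dict[str, List[int]],
                  messages: List[Message]) -> Dict[str, int]:
    """Roll processes back until no message is received but not sent.

    checkpoints[p] lists the local event indices of p's checkpoints and is
    assumed to contain 0 (the initial state); message indices start at 1.
    """
    line = {p: max(ts) for p, ts in checkpoints.items()}  # start from the latest
    changed = True
    while changed:
        changed = False
        for sender, sent, receiver, received in messages:
            if received <= line[receiver] and sent > line[sender]:
                # Orphan message: the receiver must fall back to a checkpoint
                # taken before the receive event.
                line[receiver] = max(t for t in checkpoints[receiver] if t < received)
                changed = True
    return line


ckpts = {"P1": [0, 3], "P2": [0, 5]}
msgs = [("P2", 2, "P1", 2), ("P1", 4, "P2", 4)]
print(recovery_line(ckpts, msgs))  # {'P1': 0, 'P2': 0}
```

Here each rollback invalidates a message that the other process had already recorded as received, so both processes end up at their initial states even though each had taken a later checkpoint.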

b) Coordinated Global Snapshot (GS) Compilation Protocols

This approach requires the processes to coordinate their checkpoints in order to form a consistent GS. It simplifies recovery and is not susceptible to the domino effect, since every process always restarts from its most recent checkpoint after a failure. Processes take checkpoints in such a manner that the resulting GS is consistent. Usually, a two-phase commit structure is followed. In the first phase, processes take partially-committed checkpoints; in the second phase, these are converted into committed ones. The main advantage is that only one committed checkpoint and at most one partially-committed checkpoint need to be stored at any process, which reduces the storage required for fault tolerance. After a failure, all processes recover by rolling back to the most recent committed GS. A committed GS cannot be undone; it guarantees that the computation needed to reach the GS will not be repeated in case of recovery after a fault. A partially-committed checkpoint, however, is discarded in case of abort or converted into a committed one in case of commit.

Coordinated DRL-accumulation protocols can be classified as intrusive (blocking) or non-intrusive (non-blocking). In intrusive protocols, some blocking of processes takes place during DRL-accumulation [10, 61]. In non-intrusive protocols, no blocking of the computation is required for GS compilation [7, 12]. Coordinated protocols can also be classified as min-process or all-process. In all-process protocols, every process is required to take a checkpoint in an initiation. In min-process protocols, only the minimum set of interacting processes is required to take checkpoints in an initiation [28].

A simple two-phase approach assumes that processes communicate only by message passing. The initiator takes its checkpoint and broadcasts a checkpoint request to all processes. When a process receives the checkpoint request, it stops its execution, flushes all the communication channels, takes a partially-committed checkpoint, and sends an acknowledgement back to the initiator. After the initiator receives positive acknowledgements from all processes, it broadcasts a commit message that completes the two-phase protocol. On receipt of the commit, a process converts its partially-committed checkpoint into a committed one and discards its old committed checkpoint, if any. The process is then allowed to resume its execution and exchange messages with other processes. The main drawback of this approach is that the processes are blocked during DRL-accumulation [10].
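
The two phases can be sketched as a single-threaded simulation. This is a minimal illustration under assumptions: message passing and channel flushing are replaced by direct method calls, and the names `Process`, `run_checkpoint` and the `unable` parameter (processes that cannot take a checkpoint, which forces an abort) are hypothetical.

```python
class Process:
    def __init__(self, name: str, state: int) -> None:
        self.name = name
        self.state = state
        self.committed = None   # last committed checkpoint
        self.partial = None     # partially-committed checkpoint, if any
        self.blocked = False

    def on_request(self, able: bool = True) -> bool:
        self.blocked = True     # computation stops until the decision arrives
        if able:
            self.partial = self.state
        return able

    def on_decision(self, commit: bool) -> None:
        if commit:
            self.committed = self.partial   # the old committed checkpoint is replaced
        self.partial = None
        self.blocked = False    # computation may resume


def run_checkpoint(processes, unable=frozenset()) -> bool:
    # Phase 1: every process blocks and tries to take a partial checkpoint.
    acks = [p.on_request(p.name not in unable) for p in processes]
    commit = all(acks)
    # Phase 2: the initiator broadcasts commit or abort.
    for p in processes:
        p.on_decision(commit)
    return commit


procs = [Process("P1", 10), Process("P2", 20)]
print(run_checkpoint(procs), [p.committed for p in procs])  # True [10, 20]
```

At any time a process holds one committed checkpoint and at most one partially-committed checkpoint, which matches the storage argument made for coordinated protocols.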

c) Communication-induced/Quasi-coordinated Global Snapshot (GS) Compilation Protocols

In this approach, processes have the autonomy to take some of their checkpoints independently. Processes take two types of checkpoints: autonomous and forced. Autonomous checkpoints can be taken independently, while forced checkpoints are taken to guarantee the eventual advancement of the recovery line and to minimize useless checkpoints. In contrast to coordinated protocols, these protocols do not exchange any special coordination messages to determine when forced checkpoints should be taken. Instead, they piggyback protocol-specific information (commonly checkpoint sequence numbers) on each application message; the receiver then uses this information to decide whether it should take a forced checkpoint to advance the global recovery line. This decision is based on the receiver determining whether past message and checkpoint patterns can lead to the creation of useless checkpoints; a forced checkpoint is taken to break these patterns [11, 14, 15, 16].
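
One simple way to realise the piggybacking idea is an index-based rule: a receiver takes a forced checkpoint whenever a message carries a sequence number higher than its own. The sketch below illustrates that rule only; the class `CicProcess` and its methods are hypothetical names, and real protocols in this family differ in what they piggyback and when they force a checkpoint.

```python
class CicProcess:
    def __init__(self, name: str) -> None:
        self.name = name
        self.sn = 0                           # checkpoint sequence number
        self.checkpoints = [("initial", 0)]   # (kind, sequence number)

    def autonomous_checkpoint(self) -> None:
        self.sn += 1
        self.checkpoints.append(("autonomous", self.sn))

    def send(self, payload):
        return (self.sn, payload)             # piggyback the sequence number

    def receive(self, message):
        msg_sn, payload = message
        if msg_sn > self.sn:
            # The sender is ahead: checkpoint before processing the message.
            self.sn = msg_sn
            self.checkpoints.append(("forced", self.sn))
        return payload


p, q = CicProcess("P"), CicProcess("Q")
p.autonomous_checkpoint()
q.receive(p.send("hello"))
print(q.checkpoints)  # [('initial', 0), ('forced', 1)]
```

No coordination message is exchanged: the only extra cost on the wire is the integer carried by each application message.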

d) Message-logging based Protocols

Message-logging based protocols are popular for building systems that can tolerate failures. Message logging and DRL-accumulation can be used together to provide fault tolerance in a DS in which all inter-process communication is through messages. Each message received by a process is saved in a message log on stable storage. No coordination is required between the checkpointing of different processes or between message logging and checkpointing. The execution of each process is assumed to be deterministic between received messages, and all processes are assumed to follow the fail-stop model.

When a process crashes, a new process is created in its place. The new process is given the appropriate recorded checkpoint, and then the logged messages are replayed in the order in which the process originally received them. All message-logging protocols require that once a crashed process recovers, its state is consistent with the states of the other processes. This consistency requirement is usually expressed in terms of orphan processes, which are surviving processes whose states are inconsistent with the recovered states of crashed processes. Thus, message-logging protocols guarantee that upon recovery, no process is an orphan. This requirement can be enforced either by avoiding the creation of orphans during an execution, as pessimistic protocols do, or by taking appropriate actions during recovery to eliminate all orphans, as optimistic protocols do [1, 3, 17, 18, 19, 27].
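
Recovery by replay can be sketched as follows. The example assumes a pessimistic style in which each message is logged before it is processed, and it uses in-memory attributes as stand-ins for stable storage; `LoggedProcess`, `recover` and the arithmetic in `_apply` are hypothetical and chosen only because they are deterministic.

```python
from typing import List


class LoggedProcess:
    def __init__(self, state: int = 0) -> None:
        self.state = state
        self.checkpoint = state      # stands in for a checkpoint on stable storage
        self.log: List[int] = []     # stands in for the stable message log

    def _apply(self, msg: int) -> None:
        # Any deterministic function of (state, msg) will do for the sketch.
        self.state = (self.state * 31 + msg) % 1_000_003

    def deliver(self, msg: int) -> None:
        self.log.append(msg)         # log first, then process
        self._apply(msg)

    def take_checkpoint(self) -> None:
        self.checkpoint = self.state
        self.log.clear()             # earlier messages are no longer needed


def recover(checkpoint: int, log: List[int]) -> LoggedProcess:
    """Create a replacement process and replay the log in receipt order."""
    fresh = LoggedProcess(checkpoint)
    for msg in log:
        fresh.deliver(msg)
    return fresh


p = LoggedProcess()
p.deliver(4)
p.take_checkpoint()
p.deliver(7)
p.deliver(9)
q = recover(p.checkpoint, list(p.log))
print(q.state == p.state)  # True
```

Because execution is deterministic between receipts, replaying the same messages in the same order from the checkpoint reproduces the pre-crash state exactly.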

3. LITERATURE SURVEY

3.1 Koo and Toueg Minimum-Process Blocking DRL-accumulation Scheme for Distributed Systems [5]

Koo and Toueg showed that if the nodes take their local checkpoints in an uncoordinated manner, it may not be possible to build a consistent DRL from such local checkpoints, and the rollback may lead to the domino effect. In a coordinated DRL-collection scheme, a node initiates DRL collection by taking its local checkpoint and then sends checkpoint requests to other nodes to take their local checkpoints. If the nodes maintain information about causal dependencies, only a minimal number of nodes have to take local checkpoints in response to such requests. Koo and Toueg presented such a scheme, which suspends the underlying computation during DRL collection; the nodes resume the underlying computation when the DRL collection terminates.

Koo and Toueg also proposed a way to handle concurrent initiations of DRL collection: once a node takes a local checkpoint, it is unwilling to take another local checkpoint in response to a different initiator. The node sends a negative response to all subsequent checkpoint requests until its local checkpoint is made committed or until the checkpoint collection is aborted.

Their scheme makes the following assumptions about the distributed system: processes communicate by exchanging computation messages through communication channels; communication channels are FIFO; and communication failures do not partition the network. The scheme keeps two kinds of checkpoints on stable storage: committed and partially-committed. A committed checkpoint is a local checkpoint that is part of a consistent DRL. A partially-committed checkpoint is a temporary local checkpoint that is made committed on successful termination of the DRL-accumulation protocol. In case of a failure, processes roll back only to their committed checkpoints for recovery. The scheme assumes that no process fails during the execution of the protocol. The protocol consists of two phases.
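
The set of nodes that must checkpoint with an initiator can be computed as a transitive closure over "received a message from" relations. The sketch below shows only that closure; it assumes the dependency information is available in one place as a dictionary, whereas a real min-process protocol discovers it by propagating requests. The names `minimum_set` and `received_from` are hypothetical.

```python
from typing import Dict, Set


def minimum_set(initiator: str, received_from: Dict[str, Set[str]]) -> Set[str]:
    """Processes that must checkpoint together with the initiator.

    received_from[p] is the set of processes from which p has received a
    message since its last checkpoint.
    """
    needed = {initiator}
    frontier = [initiator]
    while frontier:
        p = frontier.pop()
        for q in received_from.get(p, set()):
            if q not in needed:
                needed.add(q)        # q's send must be covered by a checkpoint
                frontier.append(q)
    return needed


deps = {"P1": {"P2"}, "P2": {"P3"}, "P4": {"P1"}}
print(sorted(minimum_set("P1", deps)))  # ['P1', 'P2', 'P3']
```

P4 is left undisturbed: it received a message from P1, but P1's checkpoint does not record the receipt of any message that P4 sent.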

3.2 Silva and Silva [16] All-Process Coordinated DRL-accumulation Protocol for Distributed Systems

Silva and Silva proposed an all-process coordinated DRL-accumulation protocol for distributed systems. Non-intrusiveness during DRL-accumulation is achieved by piggybacking monotonically increasing checkpoint sequence numbers on computation messages. When a process receives a computation message with a higher checkpoint sequence number than its own, it takes a local checkpoint before processing the message. When it later receives the checkpoint request from the initiator, it ignores the request. If every process of a distributed program were allowed to initiate DRL-accumulation, the network could be flooded with control messages and processes might waste time taking unnecessary checkpoints. To avoid this, Silva and Silva give the right to initiate DRL-accumulation to a single process. The checkpoint event is triggered periodically by a local timer. When this timer expires, the initiator checkpoints the state of the processes running on its machine and forces all the others to take local checkpoints by sending a broadcast request. The interval between adjacent local checkpoints is called the checkpoint interval.

3.3 Kim-Park Scheme for DRL-accumulation and Partial Commit on Recovery [11]

Kim and Park [11] proposed a protocol for DRL-accumulation and partial commit on recovery, which exploits the dependency relationship between processes to achieve time efficiency in DRL-accumulation and rollback coordination. In other coordinated protocols, the coordinator collects the status information of the processes that it depends on and then delivers its decision. In their protocol, by contrast, a process takes a local checkpoint when it knows that all processes on which it computationally depends have taken their local checkpoints. In this way, the coordinator does not always have to deliver its decision after it collects the status of the processes it depends on, and hence one phase of the coordination is practically removed. The coordination time and the possibility of a total abort of the DRL-accumulation are substantially reduced. The rollback coordination time is also reduced by sending the restart messages from the coordinator directly to the rollback processes, and concurrent DRL-accumulation and rollback activities are handled effectively by exploiting the dependency relationship between processes.

3.4 Lai and Yang’s Global Snapshot Algorithm for Non-FIFO Systems [27]

Lai and Yang’s DRL algorithm for non-FIFO systems is based on two observations on the role of a marker in a FIFO system. The Lai-Yang algorithm fulfills this role of a marker in a non-FIFO system by coloring computation messages as follows:

1. Every process is initially white and turns red while recording its local checkpoint. The equivalent of the “marker sending rule” is executed when a process turns red.

2. Every computation message sent by a white (red) process is colored white (red). Thus a white (red) computation message is a message that was sent before (after) the sender recorded its local checkpoint.

3. Every white process takes its local checkpoint at its convenience, but no later than the instant it receives a red computation message.

Thus, when a white process receives a red computation message, it records its local checkpoint before processing the message. This ensures that no computation message sent by a process after recording its local checkpoint is processed by the destination process before the destination records its own local checkpoint. An explicit marker is therefore not required: the “marker” is piggybacked on computation messages by means of the coloring scheme.
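
The three coloring rules can be sketched as follows. The example is a simplified illustration: the process state is a single integer "balance", the class `LaiYangProcess` and its methods are hypothetical names, and the recording of channel states (white messages still in transit) is deliberately left out.

```python
WHITE, RED = "white", "red"


class LaiYangProcess:
    def __init__(self, name: str, state: int = 0) -> None:
        self.name = name
        self.state = state
        self.color = WHITE
        self.snapshot = None

    def record_snapshot(self) -> None:
        # Rule 1: a white process turns red while recording its local state.
        if self.color == WHITE:
            self.snapshot = self.state
            self.color = RED

    def send(self, amount: int):
        # Rule 2: every message carries the color of its sender.
        self.state -= amount
        return (self.color, amount)

    def receive(self, message) -> None:
        color, amount = message
        if color == RED:
            # Rule 3: record before processing a red message (no-op if already red).
            self.record_snapshot()
        self.state += amount


p, q = LaiYangProcess("P", 10), LaiYangProcess("Q", 5)
p.record_snapshot()
q.receive(p.send(3))
print(p.snapshot, q.snapshot, q.state)  # 10 5 8
```

Q's snapshot excludes the red message, so the recorded states (10 and 5) add up to the total present before the transfer, and no receipt is recorded without its send.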

3.5 Spezialetti-Kearns Marker-based DRL-accumulation Algorithm for Distributed Systems [28]

In this algorithm, a marker carries the identifier of the initiator of the algorithm. Each process has a variable master to keep track of the initiator. When a process executes the “marker sending rule” on receipt of its first marker, it records the initiator’s identifier carried in the received marker in the master variable. A process that initiates the algorithm records its own identifier in the master variable.

A key notion used by the optimizations is that of a region in the system. A region encompasses all the processes whose master variable contains the identifier of the same initiator. A region is identified by the initiator’s identifier. When there are multiple concurrent initiators, the system is partitioned into multiple regions.

When the initiator’s identifier in a marker received along a channel differs from the value in the master variable, a concurrent initiation of the algorithm is detected and the sender of the marker lies in a different region. The identifier of the concurrent initiator is recorded in a local variable id-border-set. The process receiving the marker does not take a local checkpoint for this marker and does not propagate it. Thus, the algorithm handles concurrent initiations efficiently by suppressing redundant checkpoint collections: a process does not take a local checkpoint or propagate a checkpoint request from an initiator if it has already taken a local checkpoint in response to some other initiation.
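
The marker-handling rule for concurrent initiators can be sketched as follows. Only the `master` and `id-border-set` bookkeeping is modelled; channel-state recording is omitted, and the class `SKProcess` and its method names are hypothetical.

```python
class SKProcess:
    def __init__(self, pid: str) -> None:
        self.pid = pid
        self.master = None            # initiator whose region this process joined
        self.id_border_set = set()    # initiators of neighbouring regions
        self.checkpointed = False

    def initiate(self) -> str:
        self.master = self.pid        # an initiator is its own master
        self.checkpointed = True
        return self.pid               # the marker carries the initiator's id

    def on_marker(self, initiator_id: str):
        """Return the marker to propagate, or None if it is suppressed."""
        if self.master is None:
            # First marker: join the region and run the marker sending rule.
            self.master = initiator_id
            self.checkpointed = True
            return initiator_id
        if initiator_id != self.master:
            # Concurrent initiation detected: remember the bordering region.
            self.id_border_set.add(initiator_id)
        return None                   # no second checkpoint, no propagation


b = SKProcess("B")
print(b.on_marker("A"), b.on_marker("C"), b.master, b.id_border_set)  # A None A {'C'}
```

A process therefore checkpoints at most once per round, regardless of how many initiators start the algorithm concurrently.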

The state of each channel is recorded just as in the Chandy-Lamport algorithm (including channels that cross a border between regions). This enables the local checkpoints recorded in one region to be merged with those recorded in the adjacent region. Thus, even though markers arriving at a node contain identifiers of different initiators, they are considered part of the same instance of the algorithm for the purpose of channel state recording.

Local checkpoint recording at a process is complete after it has received a marker along each of its channels. After every process has recorded its local checkpoint, the system is partitioned into as many regions as the number of concurrent initiations of the algorithm. The variable id-border-set at a process contains the identifiers of the neighboring regions.

The algorithm works as follows: local checkpoints are assigned version numbers and all computation messages carry this version number. The initiator notifies all the processes of the version number of the new local checkpoint by sending init_snap messages along the spanning tree edges. A process follows the “marker sending rule” when it receives this notification or when it receives a regular computation message with a new version number. The “marker sending rule” is modified so that the process sends regular messages along only those channels on which it has sent computation messages since the previous local checkpoint.

3.6 Ravi Prakash and Mukesh Singhal Algorithm [4]:

Prakash and Singhal described a synchronous snapshot collection algorithm for mobile systems that neither forces every node to take a local snapshot nor blocks the underlying computation during snapshot collection. If a node initiates snapshot collection, only those nodes that have directly or transitively affected the initiator take local snapshots. The authors show that the global snapshot collection terminates within a finite time of its initiation and that the collected global snapshot is consistent. They also present a minimal rollback/recovery algorithm in which the computation at a node is rolled back only if it depends on operations that have been undone due to the failure of node(s). Both algorithms have low communication and storage overheads and meet the low energy consumption and low bandwidth constraints of mobile computing systems. The snapshot collection algorithm accounts for the mobility of the nodes and for the changing network topology that this mobility causes. An interesting aspect of the algorithm is its lazy phase, which enables nodes to take local snapshots in a quasi-asynchronous fashion after the coordinated snapshot collection phase is over. This further reduces the amount of computation that is rolled back during recovery from node failures. The lazy phase advances the checkpoint slowly rather than in a burst, which avoids contention for the low-bandwidth channels. The recovery algorithm is a compromise between two diverse recovery strategies: fast recovery with high communication and storage overheads, and slow recovery with very little communication overhead.

3.7 Cao-Singhal Non-intrusive Checkpointing Algorithm [2]:

Cao and Singhal proved that no min-process non-blocking algorithm exists. There are two directions in designing efficient coordinated checkpointing algorithms: the first is to relax the non-blocking condition while keeping the min-process property; the other is to relax the min-process condition while keeping the non-blocking property. The new constraints in mobile computing systems, such as the low bandwidth of wireless channels, high search cost and limited battery life, suggest that a checkpointing algorithm should be a min-process algorithm. Since a min-process algorithm cannot also be non-blocking, they developed a non-blocking algorithm that relaxes the min-process condition. In this scheme, they introduced the concept of a mutable checkpoint, which is neither a tentative checkpoint nor a permanent checkpoint, to design efficient checkpointing algorithms for mobile computing systems. Mutable checkpoints can be saved anywhere, e.g., in the main memory or on the local disk of mobile hosts.

Coordinated checkpointing algorithms of this kind rely on the two-phase commit protocol and save two kinds of checkpoints on stable storage: tentative and permanent. In the first phase, the initiator takes a tentative checkpoint and forces all relevant processes to take tentative checkpoints. Each process informs the initiator whether it succeeded in taking a tentative checkpoint. In the second phase, when the initiator learns that all relevant processes have successfully taken tentative checkpoints, it asks them to make their tentative checkpoints permanent; otherwise, it asks them to discard them. A process, on receiving the message from the initiator, acts accordingly. A non-blocking checkpointing algorithm does not require any process to suspend its underlying computation. When processes do not suspend their computations, it is possible for a process to receive a computation message from another process that is already running in a new checkpoint interval. If this situation is not properly handled, it may result in an inconsistency.

In their algorithm, the initiator, say Pin, sends the checkpoint request to a process Pj only if Pin has received a message m from Pj in the current checkpoint interval (CI). Pj takes its tentative checkpoint if it has sent m to Pin in the current CI; otherwise, Pj concludes that the checkpoint request is a useless one. Similarly, when Pj takes its tentative checkpoint, it propagates the checkpoint request to other processes. This continues until the checkpoint request reaches all the processes on which the initiator transitively depends, and a checkpointing tree is formed. During checkpointing, if Pi receives m from Pj such that Pj has taken some checkpoint in the current initiation before sending m, Pi may be forced to take a checkpoint, called a mutable checkpoint. If Pi is not in the minimum set, its mutable checkpoint is useless and is discarded on commit. A large data structure, MR[], is also attached to the checkpoint requests to reduce the number of useless checkpoint requests. The response from each process is sent directly to the initiator.
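
The life cycle of a mutable checkpoint can be sketched as a small state machine. This is a simplified illustration, not the published algorithm: the promotion of a mutable checkpoint to a tentative one when a checkpoint request arrives is an assumption of the sketch, dependency tracking and MR[] are omitted, and `MobileProcess` and its method names are hypothetical.

```python
class MobileProcess:
    def __init__(self, state: int) -> None:
        self.state = state
        self.mutable = None      # kept in local memory or on local disk
        self.tentative = None    # kept on stable storage
        self.permanent = None    # kept on stable storage

    def on_message_from_checkpointed_sender(self) -> None:
        # Called before processing a message whose sender has already
        # checkpointed in the current initiation.
        if self.mutable is None and self.tentative is None:
            self.mutable = self.state

    def on_checkpoint_request(self) -> None:
        # The process turned out to be in the minimum set (sketch assumption:
        # an existing mutable checkpoint is promoted instead of retaken).
        if self.tentative is None:
            self.tentative = self.mutable if self.mutable is not None else self.state
            self.mutable = None

    def on_commit(self) -> None:
        if self.tentative is not None:
            self.permanent = self.tentative
        self.tentative = None
        self.mutable = None      # a leftover mutable checkpoint was useless


a = MobileProcess(1)
a.on_message_from_checkpointed_sender()
a.state = 2                      # the message is processed after the checkpoint
a.on_checkpoint_request()
a.on_commit()
print(a.permanent)  # 1
```

A process that never receives the request simply drops its mutable checkpoint on commit, so nothing has to be transferred to stable storage over the wireless link for it.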

3.8 P. Kumar and L. Kumar algorithm [3]:

In the Cao and Singhal algorithm, the number of useless checkpoints may be very high in some situations [2]. P. Kumar and L. Kumar proposed a new synchronous checkpointing protocol for mobile distributed systems [3]. They maintain exact dependencies among processes and compute an approximate (tentative) set of interacting processes at the beginning. In this way, the time to collect a coordinated checkpoint is reduced, and so are the number of useless checkpoints and the blocking of processes. A process takes a checkpoint if the probability that it will get a checkpoint request in the current initiation is high. A few processes may be blocked, but they can continue their normal computation and may send messages.

Suppose that, during the execution of the checkpointing algorithm, Pi takes its checkpoint and sends m to Pj. Pj receives m when it has not yet taken its checkpoint for the current initiation and does not know whether it will get the checkpoint request. If Pj takes its checkpoint after processing m, m will become an orphan. In order to avoid such orphan messages, they propose the following technique. If Pj has sent at least one message to a process, say Pk, and Pk is in the tentative minimum set, there is a good probability that Pj will get the checkpoint request. Therefore, Pj takes an induced checkpoint before processing m. An induced checkpoint is similar to the mutable checkpoint [14]. In this case, Pj will most probably get the checkpoint request and its induced checkpoint will be converted into a permanent one; with lower probability, Pj will not get the checkpoint request and its induced checkpoint will be discarded. Alternatively, if there is not a good probability that Pj will get the checkpoint request, Pj buffers m until it takes its checkpoint or receives the commit message. The authors thus try to minimize the number of useless checkpoints and the blocking of processes by using a probabilistic approach and by buffering selected messages at the receiver end. Because exact dependencies among processes are maintained, the protocol eliminates useless checkpoint requests and reduces the number of duplicate checkpoint requests as compared to [14].
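
The receiver's choice between an induced checkpoint and buffering can be written as a small decision function. The sketch covers only the case described above, in which the sender checkpointed before sending m; the function name, its parameters and the returned labels are hypothetical.

```python
from typing import Set


def on_message_after_sender_checkpoint(
    receiver_checkpointed: bool,
    receiver_sent_to: Set[str],
    tentative_min_set: Set[str],
) -> str:
    """Decide what Pj does with m when the sender checkpointed before sending m.

    receiver_sent_to: processes Pj has sent messages to in the current interval.
    tentative_min_set: the approximate minimum set known at the beginning.
    """
    if receiver_checkpointed:
        return "process"                           # m cannot become an orphan
    if receiver_sent_to & tentative_min_set:
        return "induced-checkpoint-then-process"   # a request is likely to arrive
    return "buffer"                                # wait for checkpoint or commit


print(on_message_after_sender_checkpoint(False, {"Pk"}, {"Pi", "Pk"}))
# induced-checkpoint-then-process
```

Buffering keeps the receiver free to compute and send while m waits, which is how the protocol limits blocking without taking a checkpoint that is unlikely to be needed.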

3.9 P. Kumar Hybrid Checkpointing Scheme:

P. Kumar proposed a hybrid coordinated DRL-accumulation protocol [31]. Minimizing the number of processes that take checkpoints is a suitable approach to introduce fault tolerance in mobile systems transparently. Minimum-process coordinated DRL-accumulation may require piggybacking some information on normal computation messages, blocking the underlying computation, or taking some local checkpoints beyond the minimum required. In this approach, some processes may not take a checkpoint for several checkpoint initiations because they are not part of the minimum set. In case of recovery after a fault, such processes may roll back to a far earlier checkpointed state and thus cause a greater loss of computation. In all-process coordinated DRL-accumulation, where all processes take local checkpoints, the recovery line is advanced for all processes, but the overhead may be exceptionally high, especially in mobile environments, because it consumes the scarce resources of mobile nodes even if they are not part of the minimum set. To balance the DRL-accumulation overhead against the loss of computation on recovery, the author proposed a hybrid protocol in which an all-process DRL-accumulation is forced after the minimum-process coordinated protocol has been executed a fixed number of times. Thus, mobile nodes with low activity or in doze mode are not disturbed during minimum-process DRL-accumulation, and the recovery line is advanced for every process after an all-process checkpoint. Additionally, the author tries to minimize the piggybacked information. For minimum-process DRL-accumulation, he proposed a blocking protocol in which no useless checkpoints are taken and an effort is made to minimize the blocking of processes. The key idea is to delay selected computation messages at the receiver end. By doing so, processes are allowed to perform their normal computation, send computation messages and partially receive them during the blocking period.
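
The alternation between the two kinds of initiation can be expressed as a simple schedule. The sketch assumes initiations are numbered from 1 and that the fixed number of minimum-process rounds is a parameter `k`; both the function name and the parameter are hypothetical.

```python
def checkpoint_kind(initiation_number: int, k: int) -> str:
    """Return which protocol runs at a given initiation (numbered from 1).

    After every k minimum-process initiations, one all-process initiation
    is forced so that the recovery line advances for every process.
    """
    if initiation_number % (k + 1) == 0:
        return "all-process"
    return "min-process"


print([checkpoint_kind(i, 3) for i in range(1, 9)])
# ['min-process', 'min-process', 'min-process', 'all-process',
#  'min-process', 'min-process', 'min-process', 'all-process']
```

A larger `k` lowers the overhead on idle mobile nodes but lets their last checkpoint fall further behind, which increases the computation lost if they must roll back.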

4. CONCLUSION:

A survey of the literature on checkpointing algorithms for mobile distributed systems shows that a large number of papers have been published. We have reviewed and compared different approaches to checkpointing in mobile distributed systems with respect to a set of properties, including the assumption of piecewise determinism, performance overhead, storage overhead, ease of output commit, ease of garbage collection, ease of recovery, useless checkpoints and energy consumption.

REFERENCES

[1] K. M. Chandy and L. Lamport, “Distributed Snapshots: Determining Global States of Distributed Systems,” ACM Transactions on Computer Systems, vol. 3, no. 1, pp. 63-75, February 1985.

[2] G. Cao and M. Singhal, “Mutable Checkpoints: A New Checkpointing Approach for Mobile Computing Systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 2, pp. 157-172, February 2001.

[3] L. K. Awasthi and P. Kumar, “A Synchronous Checkpointing Protocol for Mobile Distributed Systems: Probabilistic Approach,” Int. J. Information and Computer Security, vol. 1, no. 3, pp. 298-314, 2007.

[4] R. Prakash and M. Singhal, “Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems,” IEEE Trans. on Parallel and Distributed Systems, pp. 1035-1048, Oct. 1996.

[5] R. Koo and S. Toueg, “Checkpointing and Rollback-Recovery for Distributed Systems,” IEEE Trans. Software Eng., vol. SE-13, pp. 23-31, 1987.

[6] G. Cao and M. Singhal, “On Impossibility of Min-Process and Non-Blocking Checkpointing and an Efficient Checkpointing Algorithm for Mobile Computing Systems,” OSU Technical Report #OSU-CISRC-9/97-TR44, 1997.

[7] R. Prakash and M. Singhal, “Maximal Global Snapshot with Concurrent Initiators,” Proc. Sixth IEEE Symp. Parallel and Distributed Processing, pp. 344-351, Oct. 1994.

[8] Bidyut Gupta, S. Rahimi and Z. Lui, “A New High Performance Checkpointing Approach for Mobile Computing Systems,” IJCSNS International Journal of Computer Science and Network Security, vol. 6, no. 5B, May 2006.

[9] A. Acharya and B. R. Badrinath, “Checkpointing Distributed Applications on Mobile Computers,” Proceedings of the 3rd International Conference on Parallel and Distributed Information Systems, pp. 73-80, September 1994.

[10] Ch. D. V. Subba Rao and M. M. Naidu, “A New, Efficient Coordinated Checkpointing Protocol Combined with Selective Sender-Based Message Logging.”

[11] J. L. Kim and T. Park, “An Efficient Protocol for Checkpointing Recovery in Distributed Systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 8, pp. 955-960, Aug. 1993.

[12] Yanping Gao, Changhui Deng and Yandong Che, “An Adaptive Index-Based Algorithm Using Time-Coordination in Mobile Computing,” International Symposiums on Information Processing, 2008.

[13] Kanmani, Anitha and Ganesan, “Coordinated Checkpointing with Avalanche Avoidance for Distributed Mobile Computing System,” International Conference on Computational Intelligence and Multimedia Applications, 2007.

[14] Ajay D. Kshemkalyani, “A Symmetric O(n log n) Message Distributed Snapshot Algorithm for Large-Scale Systems,” IEEE, 2010, pp. 1-4.

[15] Ajay D. Kshemkalyani, “Fast and Message-Efficient Global Snapshot Algorithms for Large-Scale Distributed Systems,” IEEE, 2010, pp. 1281-1289.

[16] L. Silva and J. Silva, “Global Checkpointing for Distributed Programs,” Proc. IEEE 11th Symp. on Reliable Distributed Systems, pp. 155-162, 1992.

[17] Y. M. Wang and W. K. Fuchs, “Lazy Checkpoint Coordination for Bounding Rollback Propagation,” Proceedings of IEEE Symposium on Reliable Distributed Systems, pp. 78-85, 1993.

[18] P. Kumar and R. Garg, “Soft-Checkpointing Based Hybrid Synchronous Checkpointing Protocol for Mobile Distributed Systems,” International Journal of Distributed Systems and Technologies, vol. 2, no. 1, pp. 1-13, 2011.

[19] S. Venkatesan, “Message-Optimal Incremental Snapshots,” J. Comput. Software Engineering, vol. 1, pp. 211-231, 1993.

[20] Rahul Garg, Vijay K. Garg and Yogish Sabharwal, “Scalable Algorithms for Global Snapshots in Distributed Systems,” ACM, 2006.

[21] Kumar and Khunteta, “A Minimum-Process Coordinated Checkpointing Protocol for Mobile Distributed System,” IJCSE, vol. 02, no. 04, pp. 1314-1326, 2010.

[22] Gupta and Kumar, “Review of Some Checkpointing Algorithms for Distributed and Mobile Systems,” CNSA 2011, CCIS 196, pp. 167-177, 2011.

[23] R. Tuli and P. Kumar, “Minimum Process Coordinated Checkpointing Scheme for Ad Hoc Networks,” International Journal on AdHoc Networking Systems (IJANS), vol. 1, no. 2, pp. 51-63, October 2011.

[24] M. Singhal and N. Shivaratri, Advanced Concepts in Operating Systems, New York, McGraw Hill, 1994.

[25] A. Acharya, “Structuring Distributed Algorithms and Services for Networks with Mobile Hosts,” Ph.D. Thesis, Rutgers University, 1995.

[26] E. N. Elnozahy, L. Alvisi, Y. M. Wang and D. B. Johnson, “A Survey of Rollback-Recovery Protocols in Message-Passing Systems,” ACM Computing Surveys, vol. 34, no. 3, pp. 375-408, 2002.

[27] T. H. Lai and T. H. Yang, “On Distributed Snapshots,” Information Processing Letters, vol. 25, pp. 153-158, 1987.

[28] M. Spezialetti and P. Kearns, “Efficient Distributed Snapshots,” Proceedings of the 6th International Conference on Distributed Computing Systems, pp. 382-388, 1986.

[29] Praveen Choudhary and Parveen Kumar, “Low-Overhead Minimum-Method Global-Snapshot Compilation Protocol for Deterministic Mobile Computing Systems,” International Journal of Emerging Trends in Engineering Research, vol. 9, no. 8, pp. 1069-1072, Aug. 2021.

[30] Deepak Chandra Uprety, Parveen Kumar and Arun Kumar Chouhary, “Transient Snapshot Based Minimum-Process Synchronized Checkpointing Etiquette for Mobile Distributed Systems,” International Journal of Emerging Trends in Engineering Research, vol. 10, no. 4, Aug. 2021.

[31] P. Kumar, “A Low-Cost Hybrid Coordinated Checkpointing Protocol for Mobile Distributed Systems,” Mobile Information Systems, vol. 4, no. 1, pp. 13-32, 2007.
