Conference Paper

A Comparative Analysis of Dependable Recovery Line Accumulation Protocols for Mobile Computing Environments

Naheeda Zaib1, Dr. S.Senthil Kumar2
1 Research Scholar, Department of Computer Science Engg, Nims University, Jaipur (Raj),
Naheedazaib93@gmail.com
2 Professor, Department of Computer Science Engg., Nims University, Jaipur (Raj),
senthil.kumar@nimsuniversity.org
ABSTRACT
Checkpointing-based fault tolerance techniques enable systems to continue carrying out tasks in the presence of
faults. A checkpoint is a local state of a process saved on stable storage. In a distributed system, since the
processes do not share memory, a global state of the system is defined as a combination of local states, one
from each process. When a fault occurs in a distributed system, checkpointing-based recovery techniques enable
the execution of a program to be resumed from a previous consistent global state rather than from the
beginning. In this paper, we discuss various issues related to checkpointing for distributed systems and mobile
computing environments. We also discuss various types of checkpointing: coordinated checkpointing,
asynchronous checkpointing, communication-induced checkpointing and message-logging-based checkpointing.
We also present a survey of some checkpointing algorithms for distributed systems.
Key words: Checkpointing algorithms; parallel & distributed computing; rollback recovery; fault-tolerant
systems.
1. INTRODUCTION
A DRL (Dependable Recovery Line) of a distributed system (DS) is a collection of the local states of all
participating processes and the states of the communication channels. Intuitively, a consistent DRL is one that
may occur during a failure-free execution of a distributed computation. More precisely, a consistent system
state is one in which, if a process’s state reflects the receipt of a message, then the state of the
corresponding sender reflects the sending of that message [1, 3, 4]. A checkpoint (LS) is a saved local state of
a process, and a Global State (GS) is a set of checkpoints, one from each process. A GS is consistent if no
message sent by a process after recording its checkpoint is received by another process before that process
records its checkpoint. The consistency of a GS depends strongly on the flow of messages exchanged by the
processes, and an arbitrary set of LSs, one per process, may not form a consistent GS. The ultimate objective
of any rollback-recovery protocol is to bring the system to a consistent state after a failure [3, 13].
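The consistency condition above can be checked mechanically. The following is a minimal sketch, assuming that every event at a process has a local index, that a checkpoint is identified by the index of the last event it includes, and that each message records the indexes of its send and receive events. The names `Message` and `is_consistent` are illustrative and do not come from the surveyed papers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    sender: str
    send_event: int   # local event index at the sender when the message was sent
    receiver: str
    recv_event: int   # local event index at the receiver when it was received

def is_consistent(checkpoints, messages):
    """checkpoints maps each process to the index of the last event in its checkpoint."""
    for m in messages:
        received = m.recv_event <= checkpoints[m.receiver]
        sent = m.send_event <= checkpoints[m.sender]
        if received and not sent:
            return False  # receipt is recorded but the send is not: orphan message
    return True

# One message: P1 sends at its event 3, P2 receives at its event 2.
msgs = [Message("P1", 3, "P2", 2)]
```

With these assumptions, a global state in which P1 checkpoints after event 4 and P2 after event 2 is consistent, while one in which P1 checkpoints after event 2 is not, because P2 has recorded a receipt that P1 has not yet sent.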
2. GLOBAL SNAPSHOT COMPILATION PROTOCOLS
DRL-accumulation (checkpointing) protocols are classified as un-coordinated (independent), coordinated,
communication-induced (quasi-coordinated), and message-logging based.
a) Un-coordinated/Independent DRL-accumulation Protocols
Un-coordinated DRL-accumulation protocols give each process maximum autonomy in deciding when to take
checkpoints. The key benefit of this autonomy is that each process may take a checkpoint when it is most
convenient. For example, a process may reduce the overhead by taking checkpoints when the amount of state
information to be saved is small [13, 22]. But there are several shortcomings. First, there is the possibility
of the domino effect, which may cause the loss of a large amount of useful work, possibly all the way back to
the beginning of the computation. Second, a process may take useless checkpoints that will never be part of a
consistent GS. Useless checkpoints are undesirable because they incur overhead and do not contribute to
advancing the recovery line. Third, un-coordinated protocols require each process to maintain several
checkpoints and to invoke a garbage collection procedure periodically to reclaim the checkpoints that are no
longer useful. Fourth, the approach is not suitable for applications with frequent output commits, because
these require global coordination to compute the recovery line, negating much of the benefit of autonomy.
In order to determine a consistent GS during recovery, the processes record the dependencies among their
checkpoints during failure-free operation [22, 25, 27].
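The domino effect can be illustrated with a small rollback-propagation routine. This is a sketch under stated assumptions, not an algorithm from the cited papers: each process has a sorted list of checkpoint indexes (index 0 is the initial state, and a checkpoint with index c includes all local events up to c), every receive event has index 1 or higher, and messages are tuples of sender, send event, receiver and receive event. The name `recovery_line` is hypothetical.

```python
def recovery_line(checkpoints, messages):
    """Return, per process, the checkpoint index of the latest consistent line.

    checkpoints: process -> sorted list of checkpoint indexes (must contain 0).
    messages: list of (sender, send_event, receiver, recv_event) tuples.
    """
    # Start from the most recent checkpoint of every process.
    line = {p: cps[-1] for p, cps in checkpoints.items()}
    changed = True
    while changed:
        changed = False
        for sender, s_ev, receiver, r_ev in messages:
            # Orphan: the receive is inside the line but the send is not.
            if r_ev <= line[receiver] and s_ev > line[sender]:
                # Roll the receiver back to its latest checkpoint before the receive.
                line[receiver] = max(c for c in checkpoints[receiver] if c < r_ev)
                changed = True
    return line

# Independent checkpoints: P1 after event 2, P2 after event 3.
cps = {"P1": [0, 2], "P2": [0, 3]}
# m1: P2 (event 1) -> P1 (event 1); m2: P1 (event 3) -> P2 (event 2).
msgs = [("P2", 1, "P1", 1), ("P1", 3, "P2", 2)]
```

In this example the routine rolls P2 back past its receipt of m2, which in turn orphans m1 and forces P1 back as well, so both processes end at their initial states. This is the cascading rollback described above.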
b) Coordinated Global Snapshot (GS) Compilation Protocols
This approach requires the processes to coordinate their checkpoints in order to form a consistent GS. It
simplifies recovery and is not susceptible to the domino effect, since every process always restarts from its
most recent checkpoint after a failure. Processes take checkpoints in such a way that the resulting GS is
consistent. Usually, a two-phase commit structure is adopted. In the first phase, processes take
partially-committed checkpoints; in the second phase, these are converted into committed ones. The main
advantage is that only one committed checkpoint and at most one partially-committed checkpoint need to be
stored at any process, which reduces the storage required for fault tolerance. In the case of a failure, all
processes recover by rolling back to the most recent committed GS. A committed GS cannot be undone: it
guarantees that the computation needed to reach the GS will not be repeated in case of recovery after a fault.
A partially-committed checkpoint, however, can be discarded in case of abort, or converted into a committed
one in case of commit. Coordinated DRL-accumulation protocols can be classified into two kinds: intrusive
(blocking) and non-intrusive (non-blocking). In intrusive protocols, some blocking of processes takes place
during DRL-accumulation [10, 61]. In non-intrusive protocols, no blocking of the computation is required for
GS compilation [7, 12]. Coordinated DRL-accumulation protocols can also be classified into min-process and
all-process protocols. In all-process protocols, every process is required to take a checkpoint in an
initiation. In min-process protocols, only a minimum set of interacting processes is required to take
checkpoints in an initiation of the DRL-accumulation [28].
A typical blocking protocol uses a two-phase approach in which processes communicate by message passing. The
initiator process takes its checkpoint and broadcasts a checkpoint request to all processes. When a process
receives the checkpoint request, it stops its execution, flushes all the communication channels, takes a
partially-committed checkpoint, and sends an acknowledgement message back to the initiator. After the
initiator receives positive acknowledgements from all concerned processes, it broadcasts a commit message
that completes the two-phase DRL-accumulation protocol. On receipt of the commit, a process converts its
partially-committed checkpoint into a committed one and discards its old committed checkpoint, if any. The
process is then allowed to resume its execution and exchange application messages with other processes. The
main drawback of this approach is that the processes are blocked during DRL-accumulation [10].
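The two-phase blocking scheme just described can be sketched in a few lines. This is a simplified, single-threaded illustration: channel flushing and real message passing are omitted, the application state is a plain integer, and the names `Process` and `coordinated_checkpoint` are assumptions made for this example.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.state = 0          # stand-in for the application state
        self.committed = None   # last committed checkpoint
        self.tentative = None   # partially-committed checkpoint
        self.blocked = False

    def on_request(self):
        self.blocked = True             # stop execution until commit or abort
        self.tentative = self.state     # take a partially-committed checkpoint
        return True                     # positive acknowledgement

    def on_commit(self):
        # Promote the tentative checkpoint and drop the old committed one.
        self.committed, self.tentative = self.tentative, None
        self.blocked = False

    def on_abort(self):
        self.tentative = None
        self.blocked = False

def coordinated_checkpoint(initiator, others):
    procs = [initiator] + others
    acks = [p.on_request() for p in procs]   # phase 1: request and acknowledge
    if all(acks):
        for p in procs:                      # phase 2: commit
            p.on_commit()
        return True
    for p in procs:                          # phase 2: abort
        p.on_abort()
    return False
```

Every process stays blocked between `on_request` and the decision, which is the drawback the text attributes to this class of protocols.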
c) Communication-induced/Quasi-coordinated Global Snapshot (GS) Compilation Protocols
In this approach, processes have the autonomy to take some of their checkpoints independently. Processes take
two types of checkpoints, autonomous and forced. Autonomous checkpoints can be taken independently, while
forced checkpoints are taken to guarantee the eventual advancement of the recovery line and to reduce useless
checkpoints. In contrast to coordinated GS compilation protocols, these protocols do not need to exchange any
special coordination messages to determine when forced checkpoints should be taken. Instead, they piggyback
protocol-specific information (commonly checkpoint sequence numbers) on each application message; the
receiver then uses this information to decide whether it should take a forced checkpoint to advance the
global recovery line. This decision is based on the receiver determining whether past message and checkpoint
patterns can lead to the creation of useless checkpoints; a forced checkpoint is taken to break these
patterns [11, 14, 15, 16].
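A common way to realise this idea is an index-based rule: each message carries the sender's checkpoint sequence number, and a receiver whose own number is lower takes a forced checkpoint before delivering the message. The sketch below illustrates that rule only. The class name `CicProcess` and the event log are hypothetical, and the protocols cited above may use richer piggybacked information.

```python
class CicProcess:
    def __init__(self, name):
        self.name = name
        self.sn = 0      # current checkpoint sequence number
        self.log = []    # record of checkpoints and deliveries, for illustration

    def basic_checkpoint(self):
        # Autonomous checkpoint taken at the process's own convenience.
        self.sn += 1
        self.log.append(("basic", self.sn))

    def send(self, payload):
        # Piggyback the sequence number on the application message.
        return (self.sn, payload)

    def receive(self, message):
        sender_sn, payload = message
        if sender_sn > self.sn:
            # Forced checkpoint before delivery keeps equal-numbered
            # checkpoints on a consistent line.
            self.sn = sender_sn
            self.log.append(("forced", self.sn))
        self.log.append(("deliver", payload))
```

No control messages are exchanged: the only coordination is the integer attached to each application message.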
d) Message-logging based Protocols
Message-logging based protocols are popular for building systems that can tolerate failures. Message logging
and DRL-accumulation can be used together to provide fault tolerance in a DS in which all inter-process
communication is through messages. Each message received by a process is saved in a message log on stable
storage. No coordination is required between the DRL-accumulation of different processes or between
DRL-accumulation and message logging. The execution of each process is assumed to be deterministic between
received messages, and all processes are assumed to execute on fail-stop processors.
When a process crashes, a new process is created in its place. The new process is given the appropriate
recorded checkpoint, and then the logged messages are replayed in the order in which the process originally
received them. All message-logging protocols require that once a crashed process recovers, its state is
consistent with the states of the other processes. This consistency requirement is usually expressed in terms
of orphan processes, which are surviving processes whose states are inconsistent with the recovered states of
crashed processes. Thus, message-logging protocols guarantee that upon recovery, no process is an orphan.
This requirement can be enforced either by avoiding the creation of orphan messages during an execution, as
pessimistic protocols do, or by taking appropriate actions during recovery to eliminate all orphan messages,
as optimistic protocols do [1, 3, 17, 18, 19, 27].
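Recovery by replay relies on the determinism assumption stated above. The sketch below shows a pessimistic variant in which each message is logged before it is processed. An in-memory list stands in for the log on stable storage, and the class `LoggedProcess` and its arithmetic message handler are assumptions made purely for illustration.

```python
class LoggedProcess:
    def __init__(self):
        self.state = 0
        self.checkpoint = 0
        self.log = []   # stand-in for the message log on stable storage

    def handle(self, msg):
        # Deterministic: the new state depends only on the old state and msg.
        self.state = self.state * 31 + msg

    def receive(self, msg):
        self.log.append(msg)   # pessimistic logging: log first, then process
        self.handle(msg)

    def take_checkpoint(self):
        self.checkpoint = self.state
        self.log.clear()       # messages before the checkpoint are no longer needed

    def recover(self):
        # Create a replacement process from the checkpoint and replay the log
        # in the original receive order.
        fresh = LoggedProcess()
        fresh.state = fresh.checkpoint = self.checkpoint
        for msg in self.log:
            fresh.receive(msg)
        return fresh
```

Because the handler is deterministic, the replacement process reaches exactly the state the crashed process had after its last logged message.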
3. LITERATURE SURVEY
3.1 Koo and Toueg Minimum-Process Blocking DRL-accumulation Scheme for Distributed Systems [5]
Koo and Toueg showed that if the nodes take their local checkpoints in an uncoordinated manner, it may not be
possible to build a consistent DRL from such local checkpoints, and a rollback may lead to the domino effect.
In a coordinated DRL collection scheme, a node initiates DRL collection by taking its local checkpoint; it
then sends a request message to other nodes to take their local checkpoints. If the nodes maintain
information about causal dependencies, only a minimal number of nodes have to take local checkpoints in
response to such checkpoint requests. Koo and Toueg presented such a scheme, which involves suspending the
underlying computation during DRL collection. The nodes resume the underlying computation when the DRL
collection terminates. Koo and Toueg also proposed procedures to handle concurrent initiations of DRL
collection, in the following manner: once a node takes a local checkpoint, it is unwilling to take a local
checkpoint in response to another initiator. The node sends a negative response to all subsequent checkpoint
requests until its pending local checkpoint is made committed or the checkpoint collection is aborted.
Their scheme makes the following assumptions about the distributed system: processes communicate by
exchanging computation messages through communication channels. Communication channels are FIFO.
Communication failures do not partition the network. The scheme takes two kinds of checkpoints on stable
storage: committed and partially-committed. A committed checkpoint is a checkpoint at a process that is part
of a consistent DRL. A partially-committed local checkpoint is a temporary local checkpoint that is made a
committed local checkpoint on successful termination of the DRL-accumulation scheme. In case of a failure,
processes roll back only to their committed local checkpoints for recovery. The scheme assumes that no
process fails during the execution of the scheme. The scheme consists of two phases.
3.2 Silva and Silva All-Process Coordinated DRL-accumulation Protocol for Distributed Systems [16]
Silva and Silva proposed an all-process coordinated DRL-accumulation protocol for distributed systems.
Non-blocking operation during DRL-accumulation is achieved by piggybacking monotonically increasing local
checkpoint sequence numbers on computation messages. When a process receives a computation message with a
higher checkpoint sequence number than its own, it takes its local checkpoint before processing the message.
When it later receives the corresponding checkpoint request from the initiator, it ignores it. If every
process of a distributed program were allowed to initiate DRL-accumulation, the network might be flooded with
control messages and processes might waste time taking unnecessary local checkpoints. To avoid this, Silva
and Silva give the right to initiate DRL-accumulation to a single process. The checkpoint event is triggered
periodically by a local timer. When this timer expires, the initiator process saves the state of the
processes running on its machine and forces all the others to take local checkpoints by sending a broadcast
request. The interval between adjacent local checkpoints is called the checkpoint interval.
3.3 Kim-Park Scheme for DRL-accumulation and partial commit on recovery [11]
Kim and Park [11] proposed a protocol for DRL-accumulation and partial commit on recovery that exploits the
dependency relationship between processes to achieve time efficiency in DRL-accumulation and rollback
coordination. In other synchronized protocols, the DRL-accumulation coordinator collects the status
information of the processes that it depends on and then delivers its decision. In their protocol, by
contrast, a process takes a local checkpoint when it knows that all processes on which it computationally
depends have taken their local checkpoints. In this way, the coordinator of the DRL-accumulation does not
always have to deliver its decision after it collects the status of the processes it depends on, and hence
one phase of the coordination is practically removed. The DRL-accumulation coordination time and the
possibility of total abort of the DRL-accumulation are substantially reduced. The rollback coordination time
is also reduced by sending the restart messages from the coordinator directly to the processes that must
roll back, and concurrent DRL-accumulation and rollback activities are handled effectively by exploiting the
process dependency relationship.
3.4 Lai and Yang’s Global Snapshot Algorithm for Non-FIFO Systems [27]
Lai and Yang’s global snapshot algorithm for non-FIFO systems is based on two observations on the role of a
marker in a FIFO system. The Lai-Yang algorithm fulfills this role of a marker in a non-FIFO system by using
a coloring scheme on computation messages that works as follows:
1. Every process is initially white and turns red while recording a local snapshot. The equivalent of the
“marker sending rule” is executed when a process turns red.
2. Every computation message sent by a white (red) process is colored white (red). Thus a white (red)
computation message is a message that was sent before (after) the sender of that message recorded its local
snapshot.
3. Every white process takes its local snapshot at its convenience, but no later than the instant it
receives a red computation message.
Thus, when a white process receives a red computation message, it records its local snapshot before
processing the message. This ensures that no computation message sent by a process after recording its local
snapshot is processed by the destination process before the destination records its local snapshot. Thus, an
explicit marker is not required in this algorithm; the “marker” is piggybacked on computation messages using
the coloring scheme.
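The three coloring rules can be sketched as follows. The sketch covers only the local snapshot and the piggybacked color; the recording of channel states (which needs additional bookkeeping about messages in transit) is deliberately left out. The class `LaiYangProcess` and its integer state are illustrative assumptions.

```python
WHITE, RED = "white", "red"

class LaiYangProcess:
    def __init__(self, name):
        self.name = name
        self.color = WHITE
        self.state = 0          # stand-in for the application state
        self.snapshot = None

    def record_snapshot(self):
        # Rule 1: a white process turns red when it records its snapshot.
        if self.color == WHITE:
            self.snapshot = self.state
            self.color = RED

    def send(self, value):
        # Rule 2: every message carries the sender's current color.
        return (self.color, value)

    def receive(self, message):
        color, value = message
        if color == RED:
            # Rule 3: snapshot before processing a red message
            # (a no-op if this process is already red).
            self.record_snapshot()
        self.state += value
```

Because the color travels with each message, the rule holds regardless of the order in which the channel delivers messages, which is why no FIFO assumption is needed.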
3.5 Spezialetti-Kearns Marker-based Snapshot Algorithm for Distributed Systems [28]
In this algorithm, a marker carries the identifier of the initiator of the algorithm. Each process has a
variable master to keep track of the initiator. When a process executes the “marker sending rule” on receipt
of its first marker, it records the initiator’s identifier carried in the received marker in the master
variable. A process that initiates the algorithm records its own identifier in the master variable.
A key notion used by the optimizations is that of a region in the system. A region encompasses all the
processes whose master variable contains the identifier of the same initiator. A region is identified by the
initiator’s identifier. When there are multiple concurrent initiators, the system is partitioned into
multiple regions.
When the initiator’s identifier in a marker received along a channel is different from the value in the
master variable, a concurrent initiation of the algorithm is detected, and the sender of the marker lies in a
different region. The identifier of the concurrent initiator is recorded in a local variable id-border-set.
The process receiving the marker does not take a local snapshot for this marker and does not propagate this
marker. Thus, the algorithm efficiently handles concurrent snapshot initiations by suppressing redundant
snapshot collections: a process does not take a local snapshot or propagate a snapshot request initiated by a
process if it has already taken a local snapshot in response to some other snapshot initiation.
The state of each channel is recorded just as in the Chandy-Lamport algorithm (including channels that cross
a border between regions). This enables the snapshot recorded in one region to be merged with the snapshot
recorded in the adjacent region. Thus, even though markers arriving at a node contain identifiers of
different initiators, they are considered part of the same instance of the algorithm for the purpose of
channel state recording.
Local snapshot recording at a process is complete after it has received a marker along each of its channels.
After every process has recorded its local snapshot, the system is partitioned into as many regions as the
number of concurrent initiations of the algorithm. The variable id-border-set at a process contains the
identifiers of the neighboring regions.
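The handling of the master variable and id-border-set on marker receipt can be sketched as below. The sketch tracks only region membership and marker suppression; channel state recording is omitted, and the class `SKProcess` and its method names are hypothetical.

```python
class SKProcess:
    def __init__(self, pid):
        self.pid = pid
        self.master = None            # initiator whose region this process joined
        self.id_border_set = set()    # initiators of neighboring regions
        self.snapshot_taken = False

    def initiate(self):
        # An initiator records its own identifier as master.
        self.master = self.pid
        self.snapshot_taken = True
        return self.pid               # the marker carries the initiator's identifier

    def on_marker(self, initiator_id):
        """Return the marker to propagate, or None if it is suppressed."""
        if self.master is None:
            # First marker: join the initiator's region and take the snapshot.
            self.master = initiator_id
            self.snapshot_taken = True
            return initiator_id
        if initiator_id != self.master:
            # Marker from another region: remember the border, do not propagate.
            self.id_border_set.add(initiator_id)
        return None
```

A process therefore takes at most one local snapshot per round, no matter how many initiators are active concurrently.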
The strategy works as follows: local snapshots are assigned version numbers, and all computation messages
carry this version number. The initiator notifies all the processes of the version number of the new
snapshot by sending init_snap messages along the spanning tree edges. A process follows the “marker sending
rule” when it receives this notification or when it receives a regular computation message with a new
version number. The “marker sending rule” is modified so that the process sends regular markers along only
those channels on which it has sent computation messages since the previous snapshot.
3.6 Ravi Prakash and Mukesh Singhal Algorithm [4]:
They described a synchronous snapshot collection algorithm for mobile systems that neither forces every node
to take a local snapshot nor blocks the underlying computation during snapshot collection. If a node
initiates snapshot collection, only those nodes that have directly or transitively affected the initiator
take local snapshots. The paper shows that the global snapshot collection terminates within a finite time of
its request and that the collected global snapshot is consistent. It also presents a minimal
rollback/recovery algorithm in which the computation at a node is rolled back only if it depends on
operations that have been undone due to the failure of node(s). Both algorithms have low communication and
storage overheads and meet the low energy consumption and low bandwidth constraints of mobile computing
systems. The synchronous snapshot collection algorithm accounts for the mobility of the nodes. It forces a
minimal set of nodes to take their snapshots, and the underlying computation is not suspended during
snapshot collection. An interesting aspect of the algorithm is that it has a lazy phase that enables nodes
to take local snapshots in a quasi-asynchronous fashion after the coordinated snapshot collection phase is
over. This further reduces the amount of computation that is rolled back during recovery from node failure.
The lazy phase advances the checkpoint slowly rather than in a burst, which avoids contention for the
low-bandwidth channels. The algorithm also considers the changing topology of the network due to the
mobility of nodes. The recovery algorithm is a compromise between two diverse recovery strategies: fast
recovery with high communication and storage overhead, and slow recovery with very little communication
overhead.
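The set of nodes that have "directly or transitively affected the initiator" is a reachability closure over the dependency relation. The sketch below computes it centrally for illustration, assuming that `depends_on[p]` holds the processes from which p has received a message since its last checkpoint. The real algorithm discovers this set in a distributed way, and the name `minimum_set` is hypothetical.

```python
def minimum_set(initiator, depends_on):
    """Return the processes that must checkpoint along with the initiator.

    depends_on: process -> set of processes it received messages from
    since its last checkpoint (direct dependencies).
    """
    needed = {initiator}
    stack = [initiator]
    while stack:
        p = stack.pop()
        for q in depends_on.get(p, ()):
            if q not in needed:      # follow each dependency only once
                needed.add(q)
                stack.append(q)
    return needed

# P1 depends on P2, P2 on P3; P4 depends on P1 but P1 does not depend on P4.
deps = {"P1": {"P2"}, "P2": {"P3"}, "P4": {"P1"}}
```

Here an initiation by P1 involves P2 and P3 but leaves P4 undisturbed, which is the saving that matters for mobile hosts.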
3.7 Cao-Singhal Non-intrusive Checkpointing Algorithm [2]:
They proved that no min-process non-blocking algorithm exists. There are two directions in designing efficient
coordinated checkpointing algorithms. First is to relax the non-blocking condition while keeping the min-
process property. The other is to relax the min-process condition while keeping the non-blocking property. The
new constraints in mobile computing systems, such as the low bandwidth of wireless channels, high search
cost, and limited battery life, suggest that the proposed checkpointing algorithm should be a min-process
algorithm. Therefore, they developed an algorithm that relaxes the min-process condition. In this scheme,
they introduced the concept of a mutable checkpoint, which is neither a tentative checkpoint nor a permanent
checkpoint, to design efficient checkpointing algorithms for mobile computing systems. Mutable checkpoints
can be saved anywhere, e.g., in the main memory or on the local disk of mobile hosts.
Such algorithms rely on the two-phase commit protocol and save two kinds of checkpoints on the stable storage:
tentative and permanent.
In the first phase, the initiator takes a tentative checkpoint and forces all relevant processes to take tentative
checkpoints. Each process informs the initiator whether it succeeded in taking a tentative checkpoint. When the
initiator learns that all relevant processes have successfully taken tentative checkpoints, it asks them to make
their tentative checkpoints permanent; otherwise, it asks them to discard them. A process, on receiving the
message from the initiator, acts accordingly. A non-blocking checkpointing algorithm does not require any
process to suspend its underlying computation. When processes do not suspend their computations, it is possible
for a process to receive a computation message from another process which is already running in a new
checkpoint interval. If this situation is not properly handled, it may result in an inconsistency.
In their algorithm, the initiator, say Pin, sends the checkpoint request to a process, say Pj, only if Pin
has received a message m from Pj in the current checkpoint interval (CI). Pj takes its tentative checkpoint
if Pj has sent m to Pin in the current CI; otherwise, Pj concludes that the checkpoint request is a useless
one. Similarly, when Pj takes its tentative checkpoint, it propagates the checkpoint request to other
processes. This continues until the checkpoint request reaches all the processes on which the initiator
transitively depends, and a checkpointing tree is formed. During checkpointing, if Pi receives m from Pj
such that Pj has taken some checkpoint in the current initiation before sending m, Pi may be forced to take
a checkpoint, called a mutable checkpoint. If Pi is not in the minimum set, its mutable checkpoint is useless
and is discarded on commit. The huge data structure MR[] is also attached to the checkpoint requests to
reduce the number of useless checkpoint requests. The response from each process is sent directly to the
initiator.
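The life cycle of a mutable checkpoint can be sketched as a small state machine. This is a deliberately simplified illustration: the conditions under which the algorithm in [2] actually takes a mutable checkpoint are more selective than the single flag used here, and the class `MutableProcess`, the `sender_checkpointed` flag and the integer state are assumptions made for this example.

```python
class MutableProcess:
    def __init__(self, name):
        self.name = name
        self.state = 0
        self.mutable = None     # kept in local memory or on local disk
        self.tentative = None   # written to stable storage
        self.permanent = None

    def receive(self, value, sender_checkpointed):
        # If the sender already checkpointed in this initiation, save the
        # pre-delivery state cheaply as a mutable checkpoint.
        if sender_checkpointed and self.mutable is None and self.tentative is None:
            self.mutable = self.state
        self.state += value

    def on_checkpoint_request(self):
        if self.tentative is not None:
            return
        # Promote an existing mutable checkpoint, otherwise checkpoint now.
        self.tentative = self.mutable if self.mutable is not None else self.state
        self.mutable = None

    def on_commit(self):
        if self.tentative is not None:
            self.permanent, self.tentative = self.tentative, None
        # A mutable checkpoint that was never requested is useless: discard it.
        self.mutable = None
```

A process that never receives a checkpoint request pays only for a local save, not for a transfer to stable storage over the wireless link.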
3.8 P. Kumar and L. Kumar algorithm [3]:
In the Cao and Singhal algorithm, the number of useless checkpoints may be very high in some situations [2].
P. Kumar and L. Kumar proposed a new synchronous checkpointing protocol for mobile distributed systems [3].
They maintain exact dependencies among processes and compute an approximate set of interacting processes at
the beginning. In this way, the time to collect a coordinated checkpoint is reduced. The number of useless
checkpoints and the blocking of processes are also reduced. A process takes a checkpoint if the probability
that it will get a checkpoint request in the current initiation is high. A few processes may be blocked, but
they can continue their normal computation and may send messages.
Suppose, during the execution of the checkpointing algorithm, Pi takes its checkpoint and sends m to Pj. Pj
receives m when it has not taken its checkpoint for the current initiation and does not know whether it will
get the checkpoint request. If Pj takes its checkpoint after processing m, m will become an orphan. In order
to avoid such orphan messages, they propose the following technique. If Pj has sent at least one message to a
process, say Pk, and Pk is in the tentative minimum set, there is a good probability that Pj will get the
checkpoint request. Therefore, Pj takes an induced checkpoint before processing m. An induced checkpoint is
similar to the mutable checkpoint [2]. In this case, most probably, Pj will get the checkpoint request and
its induced checkpoint will be converted into a permanent one. There is a small probability that Pj will not
get the checkpoint request and its induced checkpoint will be discarded. Alternatively, if there is not a
good probability that Pj will get the checkpoint request, Pj buffers m until it takes its checkpoint or
receives the commit message. They have tried to minimize the number of useless checkpoints and the blocking
of processes by using the probabilistic approach and buffering selected messages at the receiver end. Exact
dependencies among processes are maintained. This eliminates useless checkpoint requests and reduces the
number of duplicate checkpoint requests as compared to [2].
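The receiver-side decision described above reduces to a membership test. The sketch below assumes that Pj knows the set of processes it has sent messages to in the current interval and the tentative minimum set for the current initiation. The function name and the returned labels are hypothetical.

```python
def on_message_before_checkpoint(sent_to, tentative_min_set):
    """Decide how Pj treats a message m from a process that already checkpointed.

    sent_to: processes Pj has sent at least one message to in this interval.
    tentative_min_set: processes currently believed to be in the minimum set.
    """
    if sent_to & tentative_min_set:
        # Pj will probably receive a checkpoint request: checkpoint first,
        # then process m, so that m cannot become an orphan.
        return "induced_checkpoint"
    # Otherwise hold m until Pj checkpoints or the commit message arrives.
    return "buffer"
```

The induced checkpoint is later made permanent if the request arrives and discarded otherwise, so the cost of a wrong guess is one local checkpoint or a short delay in delivering m.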
3.9 P. Kumar Hybrid Checkpointing Scheme:
P. Kumar proposed a hybrid coordinated DRL-accumulation protocol [31]. Minimizing the number of processes
that take checkpoints is a suitable approach to introduce fault tolerance in mobile systems transparently.
Minimum-process coordinated DRL-accumulation may require piggybacking of some information on normal
computation messages, blocking of the underlying computation, or taking some local checkpoints beyond the
minimum required. In this approach, some processes may not take a checkpoint for several checkpoint
initiations, as they are not part of the minimum set of processes that must checkpoint. In case of recovery
after a fault, such processes may roll back to a far earlier checkpointed state and thus cause a greater loss
of computation. In all-process coordinated DRL-accumulation, where all processes take local checkpoints, the
recovery line is advanced for all processes, but the DRL-accumulation overhead may be exceptionally high,
especially in mobile environments, because it consumes the scarce resources of mobile nodes even if they are
not part of the minimum set. To optimize both the DRL-accumulation overhead and the loss of computation on
recovery, the author proposed a hybrid DRL-accumulation protocol in which an all-process DRL-accumulation is
forced after the minimum-process coordinated protocol has executed a fixed number of times. Thus, mobile
nodes with low activity or in doze mode are not disturbed during minimum-process DRL-accumulation, and the
recovery line is advanced for each process after an all-process checkpoint. Additionally, he tries to
minimize the piggybacked information. For minimum-process DRL-accumulation, he proposed a blocking protocol
in which no useless checkpoints are taken and an effort has been made to minimize the blocking of processes.
He proposed delaying selected computation messages at the receiver end. By doing so, processes are allowed
to perform their normal computation, send computation messages, and partially receive them during the
blocking period.
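The alternation between the two kinds of initiation can be expressed as a simple schedule. The sketch assumes that initiations are numbered from 1 and that k is the fixed number of minimum-process rounds run before each all-process round. Both the function name `checkpoint_kind` and the parameter k are illustrative, since the source does not name them.

```python
def checkpoint_kind(initiation_number, k):
    """Return which protocol the given initiation runs.

    After every k minimum-process initiations, one all-process initiation
    is forced so that the recovery line advances for every process.
    """
    if initiation_number % (k + 1) == 0:
        return "all-process"
    return "min-process"

# Schedule for the first eight initiations with k = 3.
schedule = [checkpoint_kind(n, 3) for n in range(1, 9)]
```

A larger k lowers the average overhead on idle or dozing mobile nodes, while a smaller k bounds how far such nodes may have to roll back.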
4. CONCLUSION:
A survey of the literature on checkpointing algorithms for mobile distributed systems shows that a large
number of papers have been published. We have reviewed and compared different approaches to checkpointing in
mobile distributed systems with respect to a set of properties including the assumption of piecewise
determinism, performance overhead, storage overhead, ease of output commit, ease of garbage collection, ease
of recovery, useless checkpointing, and energy consumption.
REFERENCES
[1] Chandy K. M. and Lamport L., “Distributed Snapshots: Determining Global States of Distributed Systems,” ACM
Transactions on Computer Systems, vol. 3, no. 1, pp. 63-75, February 1985.
[2] G. Cao and M. Singhal, “Mutable Checkpoints: A New Checkpointing Approach for Mobile Computing
Systems”, IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 2, February 2001, pp. 157-172.
[3] Lalit Kumar Awasthi and P. Kumar, “A Synchronous Checkpointing Protocol for Mobile Distributed
Systems: Probabilistic Approach”, Int. J. Information and Computer Security, Vol. 1, No. 3, pp. 298-314, 2007.
[4] R. Prakash and M. Singhal, “Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems”,
IEEE Trans. on Parallel and Distributed Systems, pp. 1035-1048, Oct. 1996.
[5] Koo R. and Toueg S., “Checkpointing and rollback recovery for distributed systems”, IEEE Trans. Software Eng.,
SE-13: 23-31, 1987.
[6] G. Cao and M. Singhal. “On impossibility of Min-Process and Non-Blocking Checkpointing and An
Efficient Checkpointing algorithm for mobile computing Systems”. OSU Technical Report #OSU-CISRC-
9/97-TR44, 1997.
[7] Prakash R. and Singhal M. “Maximal Global Snapshot with concurrent initiators,” Proc. Sixth IEEE Symp.
Parallel and Distributed Processing, pp.344-351, Oct.1994
[8] Bidyut Gupta, S.Rahimi and Z.Lui. “A New High Performance Checkpointing Approach for Mobile
Computing Systems”. IJCSNS International Journal of Computer Science and Network Security, Vol.6
No.5B, May 2006.
[9] Acharya A. and Badrinath B. R., “Checkpointing Distributed Applications on Mobile Computers,”
Proceedings of the 3rd International Conference on Parallel and Distributed Information Systems, pp. 73-80,
September 1994.
[10] Ch.D.V. Subba Rao and M.M. Naidu. “A New, Efficient Coordinated Checkpointing Protocol Combined
with Selective Sender-Based Message Logging”
[11] J. L. Kim and T. Park, “An efficient protocol for checkpointing recovery in Distributed Systems”, IEEE
Transactions on Parallel and Distributed Systems, 4(8): pp. 955-960, Aug 1993.
[12] Yanping Gao, Changhui Deng, Yandong Che. “An Adaptive Index-Based Algorithm using Time-
Coordination in Mobile Computing”. International Symposiums on Information Processing, 2008.
[13] Kanmani, Anitha and Ganesan, “Coordinated Checkpointing with Avalanche Avoidance for Distributed Mobile
Computing System”, International Conference on Computational Intelligence and Multimedia Applications,
2007.
[14] Ajay D. Kshemkalyani, “A symmetric O(n log n) message distributed snapshot algorithm for large scale
systems”, IEEE, 2010, pp. 1-4.
[15] Ajay D. Kshemkalyani, “Fast and message efficient global snapshot algorithms for large scale distributed
systems”, IEEE, 2010, pp. 1281-1289.
[16] Silva L. and Silva J., “Global checkpointing for distributed programs”, Proc. IEEE 11th Symp. on Reliable
Distributed Syst., pp. 155-162, 1992.
[17] Wang, Y.M., Fuchs, W.K.: Lazy checkpoint coordination for bounding rollback propagation. In: Proceedings
of IEEE Symposium on Reliable Distributed Systems, pp. 78-85 (1993).
[18] Kumar, P., Garg, R.: Soft Checkpointing Based Hybrid Synchronous Checkpointing Protocol for Mobile
Distributed Systems. International Journal of Distributed Systems and Technologies 2(1), 1-13 (2011).
[19] Venkatesan S., “Message-optimal incremental snapshots”, J. Comput. Software Engineering 1, 211-31, 1993.
[20] Rahul Garg, Vijay K Garg, Yogish sabharwal “Scalable algorithms for global snapshots in distributed
systems” ACM 2006.
[21] Kumar and Khunteta “A Minimum-Process Coordinated Check pointing Protocol For Mobile Distributed
System” IJCSE, Vol. 02, No. 04, 2010, 1314-1326.
[22] Gupta and Kumar, “Review of Some Checkpointing Algorithms for Distributed and Mobile Systems”, CNSA
2011, CCIS 196, pp. 167-177, 2011.
[23] R. Tuli,P. Kumar,“ Minimum process coordinated Checkpointing scheme for ad hoc Networks”, International
Journal on AdHoc Networking Systems (IJANS) Vol. 1, No. 2, October 2011 ,pp-51-63.
[24] M. Singhal and N. Shivaratri, Advanced Concepts in Operating Systems, New York, McGraw Hill, 1994.
[25] Acharya A., “Structuring Distributed Algorithms and Services for networks with Mobile Hosts”, Ph.D.
Thesis, Rutgers University, 1995.
[26] Elnozahy E.N., Alvisi L., Wang Y.M. and Johnson D.B., “A Survey of Rollback-Recovery Protocols in
Message-Passing Systems,” ACM Computing Surveys, vol. 34, no. 3, pp. 375-408, 2002.
[27] T.H. Lai and T.H.Yang, “On distributed snapshots”, Information Processing Letters, 25, 1987, pp.153 -158.
[28] M. Spezialetti and P. Kearns, Efficient Distributed Snapshots, Proceedings of the 6th International
Conference on Distributed Computing Systems, 1986, 382-388.
[29] Praveen Choudhary, Parveen Kumar, “Low-Overhead Minimum-Method Global-Snapshot Compilation
Protocol for Deterministic Mobile Computing Systems”, International Journal of Emerging Trends in
Engineering Research, Vol. 9, Issue 8, Aug 2021, pp. 1069-1072.
[30] Deepak Chandra Uprety, Parveen Kumar, Arun Kumar Chouhary, “Transient Snapshot based Minimum-
process Synchronized Checkpointing Etiquette for Mobile Distributed Systems”, International Journal of
Emerging Trends in Engineering Research, Vol 10, No 4, Aug. 2021.
[31] Kumar, P., “A Low-Cost Hybrid Coordinated Checkpointing Protocol for Mobile Distributed Systems”,
Mobile Information Systems, Vol. 4, No. 1, pp. 13-32, 2007.