BASE: Using Abstraction to Improve Fault Tolerance

October 2001
ACM SIGOPS Operating Systems Review 35(5)

DOI:10.1145/502034.502037

Authors:

Rodrigo Rodrigues

Inesc-ID

Miguel Castro

Microsoft

Barbara Liskov

Massachusetts Institute of Technology

Software errors are a major cause of outages and they are increasingly exploited in malicious attacks. Byzantine fault tolerance allows replicated systems to mask some software errors but it is expensive to deploy. This paper describes a replication technique, BASE, which uses abstraction to reduce the cost of Byzantine fault tolerance and to improve its ability to mask software errors. BASE reduces cost because it enables reuse of off-the-shelf service implementations. It improves availability because each replica can be repaired periodically using an abstract view of the state stored by correct replicas, and because each replica can run distinct or non-deterministic service implementations, which reduces the probability of common mode failures. We built an NFS service where each replica can run a different off-the-shelf file system implementation, and an object-oriented database where the replicas ran the same, non-deterministic implementation. These examples suggest that our technique can be used in practice: in both cases, the implementation required only a modest amount of new code, and our performance results indicate that the replicated services perform comparably to the implementations that they reuse.

Using Abstraction To Improve Fault Tolerance

Miguel Castro

Microsoft Research Ltd.

1 Guildhall St., Cambridge CB2 3NH, UK

mcastro@microsoft.com

Rodrigo Rodrigues and Barbara Liskov

MIT Laboratory for Computer Science

545 Technology Sq., Cambridge, MA 02139, USA

{rodrigo,liskov}@lcs.mit.edu

Abstract

Software errors are a major cause of outages and they

are increasingly exploited in malicious attacks. Byzantine

fault tolerance allows replicated systems to mask some soft-

ware errors but it is expensive to deploy. This paper de-

scribes a replication technique, BFTA, which uses abstrac-

tion to reduce the cost of Byzantine fault tolerance and to

improve its ability to mask software errors. BFTA reduces

cost because it enables reuse of off-the-shelf service imple-

mentations. It improves availability because each replica

can be repaired periodically using an abstract view of the

state stored by correct replicas, and because each replica

can run distinct or non-deterministic service implementa-

tions, which reduces the probability of common mode fail-

ures. We built an NFS service that allows each replica to

run a different operating system. This example suggests that

BFTA can be used in practice — the replicated ﬁle system

required only a modest amount of new code, and prelimi-

nary performance results indicate that it performs compa-

rably to the off-the-shelf implementations that it wraps.

1. Introduction

There is a growing demand for highly-available systems

that provide correct service without interruptions. These

systems must tolerate software errors because these are a

major cause of outages [7]. Furthermore, there is an in-

creasing number of malicious attacks that exploit software

errors to gain control or deny access to systems that provide

important services.

This paper proposes a replication technique, BFTA, that combines Byzantine fault tolerance [12] with work on data abstraction [11]. Byzantine fault tolerance allows a replicated service to tolerate arbitrary behavior from faulty replicas, e.g., the behavior caused by a software bug, or the behavior of a replica that is controlled by an attacker. Abstraction hides implementation details to enable the reuse of off-the-shelf implementations of important services (e.g., file systems, databases, or HTTP daemons) and to improve the ability to mask software errors.

This research was partially supported by DARPA under contract F30602-98-1-0237 monitored by the Air Force Research Laboratory.

We extended the BFT library [1, 2] to implement BFTA. The original BFT library provides Byzantine fault tolerance with good performance and strong correctness guarantees if no more than 1/3 of the replicas fail within a small window of vulnerability. However, it requires all replicas to run the same service implementation and to update their state in a deterministic way. Therefore, it cannot tolerate deterministic software errors that cause all replicas to fail concurrently, and it complicates reuse of existing service implementations because it requires extensive modifications to ensure identical values for the state of each replica.

The BFTA library and methodology described in this pa-

per correct these problems — they enable replicas to run dif-

ferent or non-deterministic implementations. The method-

ology is based on the concepts of abstract speciﬁcation and

abstraction function from work on data abstraction [11]. We

start by deﬁning a common abstract speciﬁcation for the

service, which speciﬁes an abstract state and describes how

each operation manipulates the state. Then we implement

a conformance wrapper for each distinct implementation to

make it behave according to the common speciﬁcation. The

last step is to implement an abstraction function (and one of

its inverses) to map from the concrete state of each imple-

mentation to the common abstract state (and vice versa).

Our methodology offers several important advantages.

Reuse of existing code. BFTA implements a form of state

machine replication [14, 10], which allows replication of

services that perform arbitrary computations, but requires

determinism: all replicas must produce the same sequence

of results when they process the same sequence of opera-

tions. Most off-the-shelf implementations of services fail

to satisfy this condition. For example, many implementa-

tions produce timestamps by reading local clocks, which

can cause the states of replicas to diverge. The conformance

wrapper and the abstract state conversions enable the reuse

of existing implementations without modiﬁcations. Fur-

thermore, these implementations can be non-deterministic,

which reduces the probability of common mode failures.

Software rejuvenation. It has been observed [9] that there

is a correlation between the length of time software runs and

the probability that it fails. BFTA combines proactive re-

covery [2] with abstraction to counter this problem. Repli-

cas are recovered periodically even if there is no reason to

suspect they are faulty. Recoveries are staggered such that

the service remains available during rejuvenation to enable

frequent recoveries. When a replica is recovered, it is re-

booted and restarted from a clean state. Then it is brought

up to date using a correct copy of the abstract state that

is obtained from the group of replicas. Abstraction may

improve availability by hiding corrupt concrete states, and

it enables proactive recovery when replicas do not run the

same code or run code that is non-deterministic.

Opportunistic N-version programming. Replication is

not useful when there is a strong positive correlation be-

tween the failure probabilities of the different replicas, e.g.,

deterministic software bugs cause all replicas to fail at the

same time when they run the same code. BFTA enables an

opportunistic form of N-version programming [3] — repli-

cas can run distinct, off-the-shelf implementations of the

service. This is a viable option for many common services,

e.g., relational databases, HTTP daemons, ﬁle systems, and

operating systems. In all these cases, competition has led to

four or more distinct implementations that were developed

and are maintained separately but have similar (although not

identical) functionality. Furthermore, the technique is made

easier by the existence of standards that provide identical

interfaces to different implementations, e.g., ODBC [6] and

NFS [5]. We can also leverage the large effort towards stan-

dardizing data representations using XML.

It is widely believed that the beneﬁts of N-version pro-

gramming [3] do not justify its high cost [7]. It is better

to invest the same amount of money on better development,

veriﬁcation, and testing of a single implementation. But op-

portunistic N-version programming achieves low cost due

to economies of scale without compromising the quality of

individual implementations. Since each off-the-shelf imple-

mentation is sold to a large number of customers, the ven-

dors can amortize the cost of producing a high quality im-

plementation. Additionally, taking advantage of interoper-

ability standards keeps the cost of writing the conformance

wrappers and state conversion functions low.

The paper explains the methodology by walking through

an example, the implementation of a replicated ﬁle service

where replicas run different operating systems and ﬁle sys-

tems. For this methodology to be successful, the confor-

mance wrapper and the state conversion functions must be

simple to reduce the likelihood of introducing more errors

and introduce low overhead. Experimental results indicate

that this is true in our example.

The remainder of the paper is organized as follows. Sec-

tion 2 provides an overview of the BFTA methodology and

library. Section 3 explains how we applied the methodol-

ogy to build the replicated ﬁle system. Section 4 presents

our conclusions and some preliminary results.

2. The BFTA Technique

This section provides an overview of our replication tech-

nique. It starts by describing the methodology that we use to

build a replicated system from existing service implemen-

tations. It ends with a description of the BFTA library.

2.1. Methodology

The goal is to build a replicated system by reusing a set of off-the-shelf implementations, $I_1, \ldots, I_n$, of some service. Ideally, we would like $n$ to equal the number of replicas so that each replica can run a different implementation to reduce the probability of simultaneous failures. But the technique is useful even with a single implementation.

Although off-the-shelf implementations of the same service offer roughly the same functionality, they behave differently: they implement different specifications, $S_1, \ldots, S_n$, using different representations of the service state. Even the behavior of different replicas that run the same implementation may be different when the specification they implement is not strong enough to ensure deterministic behavior. For instance, the specification of the NFS protocol [5] allows implementations to choose arbitrary values for file handles.

BFTA, like any form of state machine replication, re-

quires determinism: replicas must produce the same se-

quence of results when they execute the same sequence of

operations. We achieve determinism by defining a common abstract specification, $S$, for the service that is strong

enough to ensure deterministic behavior. This speciﬁcation

deﬁnes the abstract state, an initial state value, and the be-

havior of each service operation.

The speciﬁcation is deﬁned without knowledge of the in-

ternals of each implementation unlike what happens in the

technique sketched in [13]. It is sufﬁcient to treat them as

black boxes, which is important to enable the use of existing

implementations. Additionally, the abstract state captures

only what is visible to the client rather than mimicking what

is common in the concrete states of the different implemen-

tations. This simpliﬁes the abstract state and improves the

effectiveness of our software rejuvenation technique.

The next step is to implement conformance wrappers, $C_1, \ldots, C_n$, for each of $I_1, \ldots, I_n$. The conformance wrappers implement the common specification $S$. The implementation of each wrapper $C_i$ is a veneer that invokes the operations offered by $I_i$ to implement the operations in $S$; in implementing these operations it makes use of a conformance rep that stores whatever additional information is needed to allow the translation from the concrete behavior of the implementation to the abstract behavior.

The ﬁnal step is to implement the abstraction function

and one of its inverses. These functions allow state transfer

among the replicas. State transfer is used to repair faulty

replicas, and also to bring slow replicas up-to-date when

messages they are missing have been garbage collected. For

state transfer to work, replicas must agree on the value of the state of the service after executing a sequence of operations;

they will not agree on the value of the concrete state but our

methodology ensures that they will agree on the value of

the abstract state. The abstraction function is used to con-

vert the concrete state stored by a replica into the abstract

state, which is transferred to another replica. The receiving

replica uses the inverse function to convert the abstract state

into its own concrete state representation.
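The role of the abstraction function and its inverse can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the service is a toy key/value store, implementation A keeps pairs in insertion order, implementation B keeps them in a hash table, and the abstract state is the list of pairs sorted by key. None of these names come from the BFTA library.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical abstract state: key/value pairs sorted by key.
using AbstractState = std::vector<std::pair<std::string, std::string>>;

// Hypothetical implementation A keeps pairs in insertion order.
using ConcreteA = std::vector<std::pair<std::string, std::string>>;
// Hypothetical implementation B keeps pairs in a hash table.
using ConcreteB = std::unordered_map<std::string, std::string>;

// Abstraction function for A: sorting removes the insertion order,
// which is a detail of the concrete representation.
AbstractState abstraction_a(const ConcreteA& concrete) {
    AbstractState abstract_state(concrete.begin(), concrete.end());
    std::sort(abstract_state.begin(), abstract_state.end());
    return abstract_state;
}

// Abstraction function for B: flatten the hash table, then sort.
AbstractState abstraction_b(const ConcreteB& concrete) {
    AbstractState abstract_state(concrete.begin(), concrete.end());
    std::sort(abstract_state.begin(), abstract_state.end());
    return abstract_state;
}

// One inverse of the abstraction function for B: any hash table with
// these pairs is acceptable, whatever its internal bucket layout.
ConcreteB inverse_b(const AbstractState& abstract_state) {
    return ConcreteB(abstract_state.begin(), abstract_state.end());
}
```

Two replicas that hold the same logical contents in different concrete forms map to equal abstract states, so a replica running B can be brought up to date from a replica running A by applying inverse_b to the transferred abstract state.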

To enable efﬁcient state transfer between replicas, the

abstract state is deﬁned as an array of variable-sized objects.

We explain how this representation enables efﬁcient state

transfer in Section 2.2.

There is an important trend that simplifies the methodology. Market forces push vendors towards extending their products to offer interfaces that implement standard specifications for interoperability, e.g., ODBC [6]. Usually, a standard specification $S'$ cannot be used as the common specification $S$ because it is too weak to ensure deterministic behavior. But it can be used as a basis for $S$ and, because $S$ and $S'$ are similar, three things follow: conformance wrappers and state conversion functions are relatively easy to implement, their code can be mostly reused across implementations, and most client code can use the replicated system without modification.

2.2. Library

The BFTA library extends BFT with the features neces-

sary to provide the methodology. Figure 1 presents a sum-

mary of the library’s interface.

Client call:
  int invoke(Byz_req *req, Byz_rep *rep, bool read_only);

Execution upcall:
  int execute(Byz_req *req, Byz_rep *rep, int client, Byz_buffer *non_det);

Checkpointing:
  void modify(int nobjs, int *objs);

State conversion upcalls:
  int get_obj(int i, char **obj);
  void put_objs(int nobjs, char **objs, int *is, int *szs);

Figure 1. BFTA Interface and Upcalls

The invoke procedure is called by the client to invoke

an operation on the replicated service. This procedure car-

ries out the client side of the replication protocol and returns

the result when enough replicas have responded. When the

library needs to execute an operation at a replica, it makes

an upcall to an execute procedure that is implemented

by the conformance wrapper for the service implementation

run by the replica.

To perform state transfer in the presence of Byzantine

faults, it is necessary to be able to prove that the state being

transferred is correct. Otherwise, faulty replicas could cor-

rupt the state of out-of-date but correct replicas. (A detailed

discussion of this point can be found in [2].) Consequently,

replicas cannot discard a copy of the state produced after

executing a request until they know that the state produced

by executing later requests can be proven correct. Repli-

cas could keep a copy of the state after executing each re-

quest but this would be too expensive. Instead replicas keep

just the current version of the concrete state plus copies of

the abstract state produced every k-th request (e.g., k=128).

These copies are called checkpoints.

As mentioned earlier, to implement checkpointing and

state transfer efﬁciently, we require that the abstract state

be encoded as an array of objects. Creating checkpoints by

making full copies of the abstract state would be too ex-

pensive. Instead, the library uses copy-on-write such that

checkpoints only contain the objects whose value is dif-

ferent in the current abstract state. Similarly, transferring

a complete checkpoint to bring a recovering or out-of-date

replica up to date would be too expensive. The library em-

ploys a hierarchical state partition scheme to transfer state

efﬁciently. When a replica is fetching state, it recurses down

a hierarchy of meta-data to determine which partitions are

out of date. When it reaches the leaves of the hierarchy

(which are the abstract objects), it fetches only the objects

that are corrupt or out of date.
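The hierarchical comparison can be sketched with a two-level tree. This is a simplified illustration and not the library's data structure: the fixed fanout, the names PartitionTree and objects_to_fetch, and the use of std::hash as a stand-in for a cryptographic hash are all assumptions, and both trees are assumed to cover the same number of objects.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Stand-in digest; a real system would use a cryptographic hash.
using Digest = std::size_t;

Digest digest_of(const std::string& bytes) {
    return std::hash<std::string>{}(bytes);
}

// Two-level tree: one digest per abstract object (leaf) and one digest
// per partition, where a partition covers `fanout` consecutive leaves.
struct PartitionTree {
    std::vector<Digest> leaves;
    std::vector<Digest> partitions;
};

PartitionTree build_tree(const std::vector<std::string>& objects,
                         std::size_t fanout) {
    PartitionTree tree;
    for (const std::string& object : objects)
        tree.leaves.push_back(digest_of(object));
    for (std::size_t start = 0; start < tree.leaves.size(); start += fanout) {
        std::string joined;
        for (std::size_t i = start;
             i < start + fanout && i < tree.leaves.size(); ++i)
            joined += std::to_string(tree.leaves[i]) + ",";
        tree.partitions.push_back(digest_of(joined));
    }
    return tree;
}

// Returns the indices of objects that must be fetched. Partitions whose
// digests match are skipped without looking at their leaves.
std::vector<std::size_t> objects_to_fetch(const PartitionTree& local,
                                          const PartitionTree& remote,
                                          std::size_t fanout) {
    std::vector<std::size_t> stale;
    for (std::size_t p = 0; p < remote.partitions.size(); ++p) {
        if (p < local.partitions.size() &&
            local.partitions[p] == remote.partitions[p])
            continue;
        for (std::size_t i = p * fanout;
             i < (p + 1) * fanout && i < remote.leaves.size(); ++i)
            if (i >= local.leaves.size() || local.leaves[i] != remote.leaves[i])
                stale.push_back(i);
    }
    return stale;
}
```

With many objects and few differences, most of the work is a comparison of partition digests, and only the differing objects are transferred.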

To implement state transfer, each replica must provide the library with two upcalls, which implement the abstraction function and one of its inverses. get_obj receives an object index $i$, allocates a buffer, obtains the value of the abstract object with index $i$, and places that value in the buffer. It returns the size for that object and a pointer to the buffer. put_objs receives a vector of objects with the corresponding indices and sizes. It causes the application to update its concrete state using the new values for the abstract objects passed as arguments. The library guarantees that the put_objs upcall is invoked with an argument that brings the abstract state of the replica to a consistent value (i.e., the value of a valid checkpoint). This is important to allow encodings of the abstract state with dependencies between objects, e.g., it allows objects to describe the meaning of other objects.

Each time the execute upcall is about to modify an object in the abstract state it is required to invoke a modify procedure, which is supplied by the library, passing the object index as argument. This is used to implement copy-on-write to create checkpoints incrementally: the library invokes get_obj with the appropriate index and keeps the copy of the object until the corresponding checkpoint can be discarded.
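A minimal sketch of this copy-on-write scheme follows, under simplifying assumptions: objects are strings, only one checkpoint is kept at a time, and the class CowState with its method names is hypothetical rather than the library's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy abstract state with incremental checkpointing.
class CowState {
public:
    explicit CowState(std::vector<std::string> objects)
        : objects_(std::move(objects)) {}

    // Begin a new checkpoint. Nothing is copied at this point.
    void take_checkpoint() { saved_.clear(); }

    // Stand-in for the library's modify procedure: called before object i
    // changes. emplace keeps only the first copy saved since the checkpoint.
    void modify(int i) { saved_.emplace(i, get_obj(i)); }

    // A write announces itself through modify, then updates the object.
    void write(int i, const std::string& value) {
        modify(i);
        objects_[static_cast<std::size_t>(i)] = value;
    }

    // Current value of object i.
    std::string get_obj(int i) const {
        return objects_[static_cast<std::size_t>(i)];
    }

    // Value of object i as of the checkpoint: the saved copy if the object
    // changed since then, otherwise the current value.
    std::string checkpoint_value(int i) const {
        std::map<int, std::string>::const_iterator it = saved_.find(i);
        return it != saved_.end() ? it->second : get_obj(i);
    }

    std::size_t copies_held() const { return saved_.size(); }

private:
    std::vector<std::string> objects_;
    std::map<int, std::string> saved_;
};
```

The cost of a checkpoint is proportional to the number of objects modified after it, not to the size of the abstract state.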

BFTA implements a form of state machine replication

that requires replicas to behave deterministically. The methodology uses abstraction to hide most of the non-determinism

in the implementations it reuses. However, many services

involve forms of non-determinism that cannot be hidden by

abstraction. For instance, in the case of the NFS service, the

time-last-modiﬁed for each ﬁle is set by reading the server’s

local clock. If this were done independently at each replica,

the states of the replicas would diverge. The library pro-

vides a mechanism [1] for replicas to agree on these non-

deterministic values, which are then passed as arguments to

the execute procedure.
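The effect of passing an agreed value to execute can be sketched as follows. The agreement protocol itself is described in [1] and is not reproduced here; the rule in propose_mtime (the proposer reads its clock but never lets the value go backwards) and all names are assumptions made only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

struct FileState {
    std::string data;
    long mtime = 0;  // time-last-modified
};

// Hypothetical rule for choosing the value to agree on: use the
// proposer's clock, but never go backwards past the last agreed value.
long propose_mtime(long proposer_clock, long last_agreed) {
    return std::max(proposer_clock, last_agreed + 1);
}

// Stand-in for an execute upcall handling a write. It uses the agreed
// value and never reads the replica's local clock.
void execute_write(FileState& file, const std::string& data,
                   long agreed_mtime) {
    file.data = data;
    file.mtime = agreed_mtime;
}
```

Because every replica receives the same agreed_mtime, replicas whose local clocks differ still end up with identical time-last-modified values.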

Proactive recovery periodically restarts each replica from a correct, up-to-date checkpoint of the abstract state that is obtained from the other replicas. Recoveries are staggered so that fewer than 1/3 of the replicas recover at the same time. This allows the other replicas to continue processing client requests during the recovery. Additionally, it should reduce the likelihood of simultaneous failures due to aging problems because at any instant fewer than 1/3 of the replicas have been running for the same period of time.
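One way to check a staggering schedule is sketched below. The schedule (replica i starts recovering i * period / n time units into each period) and the function names are assumptions for illustration, not the library's policy. The bound being checked is the usual Byzantine fault tolerance requirement that fewer than one third of the replicas be unavailable at once.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical schedule: recoveries are spread evenly over the period.
int recovery_offset(int replica, int n, int period) {
    return replica * period / n;
}

// Largest number of replicas that are recovering at the same instant,
// given that each recovery lasts `duration` time units. The maximum is
// reached at the moment some replica starts recovering, so only those
// instants are examined.
int max_concurrent_recoveries(int n, int period, int duration) {
    int worst = 0;
    for (int i = 0; i < n; ++i) {
        int now = recovery_offset(i, n, period);
        int busy = 0;
        for (int j = 0; j < n; ++j) {
            // Time since replica j last started a recovery, modulo the period.
            int since_start =
                ((now - recovery_offset(j, n, period)) % period + period) % period;
            if (since_start < duration) ++busy;
        }
        worst = std::max(worst, busy);
    }
    return worst;
}
```

For example, with 4 replicas, a period of 1200 time units, and recoveries that take 200 units, at most one replica recovers at a time, which satisfies the bound. If recoveries took 400 units, two would overlap and the bound would be violated, so the period would have to be lengthened.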

Recoveries are triggered by a watchdog timer. When

a replica is recovered, it reboots after saving the replica-

tion protocol state and the concrete service state to disk.

The protocol state includes the abstract objects that were

copied by the incremental checkpointing mechanism. Then

the replica is restarted, and the conformance rep is recon-

structed using the information that was saved to disk. Next,

the library uses the hierarchical state transfer mechanism to

compare the value of the abstract state it currently stores

with the abstract state values stored by the other replicas.

This is efﬁcient: the replica uses cryptographic hashes stored

in the state partition tree to determine which abstract objects

are out-of-date or corrupt and it only fetches the value of

these objects.

The object values fetched by the replica could be supplied to put_objs to update the concrete state, but the concrete state might still be corrupt. For example, an implementation may have a memory leak and simply calling put_objs will not free unreferenced memory. In fact, implementations will not typically offer an interface that can be used to fix all corrupt data structures in their concrete state. Therefore, it is better to restart the implementation from a clean initial concrete state and use the abstract state to bring it up-to-date.

3. An example: File System

This section illustrates the methodology using a repli-

cated ﬁle system as an example. The ﬁle system is based on

the NFS protocol [5]. Its replicas can run different operating

systems and ﬁle system implementations.

3.1. Abstract Speciﬁcation

The common abstract specification is based on the specification of the NFS protocol [5]. The abstract file service state consists of a fixed-size array of pairs with an object and a generation number. Each object has a unique identifier, oid, which is obtained by concatenating its index in the array and its generation number. The generation number is incremented every time the entry is assigned to a new object. There are four types of objects: files, whose data is a byte array; directories, whose data is a sequence of <name, oid> pairs ordered lexicographically; symbolic links, whose data is a small character string; and special null objects, which indicate an entry is free. All non-null objects have meta-data, which includes the attributes in the NFS fattr structure. Each entry in the array is encoded using XDR [4]. The object with index 0 is a directory object that corresponds to the root of the file system tree that was mounted.
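The construction of oids can be sketched as follows. The field widths (32 bits each, with the index in the high half) and the helper names are assumptions; the text only states that an oid concatenates the array index and the generation number.

```cpp
#include <cassert>
#include <cstdint>

using Oid = std::uint64_t;

// Assumed layout: array index in the high 32 bits, generation in the low 32.
Oid make_oid(std::uint32_t index, std::uint32_t generation) {
    return (static_cast<Oid>(index) << 32) | generation;
}

std::uint32_t oid_index(Oid oid) {
    return static_cast<std::uint32_t>(oid >> 32);
}

std::uint32_t oid_generation(Oid oid) {
    return static_cast<std::uint32_t>(oid & 0xffffffffu);
}

// One slot of the abstract state array (object contents omitted).
struct Entry {
    std::uint32_t generation = 0;
    bool in_use = false;
};

// Assign the entry to a new object: the generation number is incremented,
// so the new oid differs from every oid previously issued for this index.
Oid assign(Entry& entry, std::uint32_t index) {
    entry.generation += 1;
    entry.in_use = true;
    return make_oid(index, entry.generation);
}
```

Because the generation changes on every reuse, a client that still holds the oid of a deleted object cannot accidentally name the object that later occupies the same array entry.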

The operations in the common speciﬁcation are those de-

ﬁned by the NFS protocol. There are operations to read and

write each type of non-null object. The ﬁle handles used by

the clients are the oids of the corresponding objects. To en-

sure deterministic behavior, we deﬁne a deterministic pro-

cedure to assign oids, and require that directory entries re-

turned to a client be ordered lexicographically.

The abstraction hides many details: the allocation of file blocks, the representation of large files and directories, and the persistent storage medium and how it is accessed. This is desirable for simplicity and performance, and it improves resilience to software faults due to aging.

3.2. Conformance Wrapper

The conformance wrapper for the ﬁle service processes

NFS protocol operations and interacts with an off-the-shelf

ﬁle system implementation also using the NFS protocol as

illustrated in Figure 2. A ﬁle system exported by the repli-

cated ﬁle service is mounted on the client machine like any

regular NFS ﬁle system. Application processes run unmod-

iﬁed and interact with the mounted ﬁle system through the

NFS client in the kernel. We rely on user level relay pro-

cesses to mediate communication between the standard NFS

client and the replicas. A relay receives NFS protocol re-

quests, calls the invoke procedure of our replication li-

brary, and sends the result back to the NFS client. The

replication library invokes the execute procedure imple-

mented by the conformance wrapper to run each NFS re-

quest.

The conformance rep consists of an array that corresponds

to the one in the abstract state but it does not store copies

of the objects; instead each array entry contains the gener-

ation number, the ﬁle handle assigned to the object by the

underlying NFS server, and the value of the timestamps in

the object’s abstract meta-data. Empty entries store a null

ﬁle handle. The rep also contains a map from ﬁle handles

to oids to aid in processing replies efﬁciently.

The wrapper processes each NFS request received from a client as follows. It translates the file handles in the request, which encode oids, into the corresponding NFS server file handles. Then it sends the modified request to the underlying NFS server. The server processes the request and returns a reply.

[Figure 2. Software Architecture. On the client, the Andrew benchmark runs over the kernel NFS client, a relay, and the replication library. Each of replicas 1 to n runs the replication library, a conformance wrapper with state conversion, and an unmodified NFS daemon.]

The wrapper parses the reply and updates the confor-

mance rep. If the operation created a new object, the wrap-

per allocates a new entry in the array in the conformance

rep, increments the generation number, and updates the en-

try to contain the ﬁle handle assigned to the object by the

NFS server. If any object is deleted, the wrapper marks

its entry in the array free. In both cases, the reverse map

from ﬁle handles to oids is updated. The wrapper must also

update the abstract timestamps in the array entries corre-

sponding to objects that were accessed. We use the library

to agree on the timestamp value that is assigned to each operation [1]. This value is one of the arguments to the execute procedure implemented by the wrapper.

Finally, the wrapper returns a modiﬁed reply to the client,

using the map to translate ﬁle handles to oids and replacing

the concrete timestamp values by the abstract ones. When

handling readdir calls the wrapper reads the entire directory

and sorts it lexicographically to ensure the client receives

identical replies from all replicas.
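The bookkeeping done by the wrapper can be sketched as follows. This is a simplified illustration: file handles are modeled as strings, timestamps are omitted, and the choice of the lowest free index for a new object is an assumed deterministic policy, since the text only requires that oid assignment be deterministic. The names ConformanceRep, on_create, and sorted_readdir are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct Oid {
    std::uint32_t index;
    std::uint32_t generation;
    bool operator==(const Oid& o) const {
        return index == o.index && generation == o.generation;
    }
};

class ConformanceRep {
public:
    explicit ConformanceRep(std::size_t size) : entries_(size) {}

    // Record an object the server just created and return its oid.
    // Assumed policy: the lowest free index is used.
    Oid on_create(const std::string& server_handle) {
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            if (!entries_[i].server_handle.empty()) continue;
            entries_[i].generation += 1;
            entries_[i].server_handle = server_handle;
            Oid oid{static_cast<std::uint32_t>(i), entries_[i].generation};
            reverse_[server_handle] = oid;
            return oid;
        }
        throw std::length_error("conformance rep is full");
    }

    // Mark the entry free after the server removed the object.
    void on_remove(Oid oid) {
        Entry& entry = entries_.at(oid.index);
        reverse_.erase(entry.server_handle);
        entry.server_handle.clear();  // an empty handle marks a free entry
    }

    // Request path: oid to server handle; "" if the oid is stale or free.
    std::string to_server_handle(Oid oid) const {
        if (oid.index >= entries_.size()) return "";
        const Entry& entry = entries_[oid.index];
        return entry.generation == oid.generation ? entry.server_handle : "";
    }

    // Reply path: server handle to oid, using the reverse map.
    bool to_oid(const std::string& server_handle, Oid* out) const {
        std::map<std::string, Oid>::const_iterator it = reverse_.find(server_handle);
        if (it == reverse_.end()) return false;
        *out = it->second;
        return true;
    }

private:
    struct Entry {
        std::uint32_t generation = 0;
        std::string server_handle;
    };
    std::vector<Entry> entries_;
    std::map<std::string, Oid> reverse_;
};

// readdir replies are sorted so that every replica returns the same order.
std::vector<std::string> sorted_readdir(std::vector<std::string> names) {
    std::sort(names.begin(), names.end());
    return names;
}
```

Two replicas whose servers hand out different file handles still return the same oid to the client when they process the same sequence of operations, which is what the replication protocol needs.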

3.3. State Conversions

The abstraction function in the ﬁle service is implemented

as follows. For each ﬁle system object, it uses the ﬁle han-

dle stored in the conformance rep to invoke the NFS server

to obtain the data and meta-data for the object. Then it re-

places the concrete timestamp values by the abstract ones,

converts the ﬁle handles in directory entries to oids, and

sorts the directories lexicographically.

The inverse abstraction function in the file service works as follows. For each file system object $o$ it receives, there are three possible cases depending on the state of the entry $e$ that corresponds to $o$ in the conformance rep: (1) $e$ contains $o$'s generation number, (2) $e$ is not free and does not contain $o$'s generation number, (3) $e$ is free.

In the first case, objects that changed can be updated using the file handle in $e$ to make calls to the NFS server. This is done differently for different types of objects. For files, it is sufficient to issue a setattr and a write to update the file's meta-data and data, and for symbolic links, it is sufficient to update their meta-data. Updating directories is slightly trickier. The inverse abstraction function reads the entire directory from the NFS server, computes its current abstract value, and compares this value with $o$. Nothing is done for entries that did not change. Entries that are not present in $o$ or point to a different object are removed by issuing the appropriate calls to the NFS server. Then entries that are new or different in $o$ are created; if the object they refer to does not exist in the current abstract state, it is first created using the value for the object that is supplied to put_objs.

In the second case, the NFS server is invoked to remove

the object and then the function proceeds as in case 3.

In the third case, the NFS server is invoked to create the

object (initially in a separate unlinked directory) and the ob-

ject’s data and meta-data is updated as in case 1. It is guar-

anteed that the directories that point to the object will be

processed; the object is then linked to those directories and

removed from the unlinked directory. When new objects are

created, their ﬁle handles are recorded in the conformance

wrapper’s data structures.
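The three-way case analysis can be written down compactly. The sketch below only classifies an incoming object against its conformance rep entry; the NFS calls that each case triggers are omitted, and the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// What the inverse abstraction function must do for one incoming object.
enum class Action {
    UpdateInPlace,     // case 1: same generation, update through the handle
    RemoveThenCreate,  // case 2: entry holds another object, replace it
    Create             // case 3: entry is free, create the object
};

// The part of a conformance rep entry that the decision depends on.
struct RepEntry {
    bool is_free;
    std::uint32_t generation;
};

Action classify(const RepEntry& entry, std::uint32_t object_generation) {
    if (entry.is_free) return Action::Create;
    if (entry.generation == object_generation) return Action::UpdateInPlace;
    return Action::RemoveThenCreate;
}
```

Case 2 reduces to a removal followed by case 3, so an implementation along these lines needs only two real code paths: update in place and create.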

3.4. Proactive Recovery

NFS ﬁle handles are volatile: the same ﬁle system ob-

ject may have a different ﬁle handle after the NFS server

restarts. For proactive recovery to work efﬁciently, we need

a persistent identiﬁer for objects in the concrete ﬁle system

state that can be used to compute the abstraction function

during recovery.

The NFS specification states that each object is uniquely identified by a pair of meta-data attributes: <fsid, fileid>. We solve the problem above by maintaining an additional map from <fsid, fileid> pairs to the corresponding oids. This

map is saved to disk asynchronously when a checkpoint is

created and synchronously before a proactive recovery. Af-

ter rebooting, the replica that is recovering reads the map

from disk. Then it traverses the ﬁle system’s directory tree

depth ﬁrst from the root. It reads each object, uses the map

to obtain its oid, and uses the cryptographic hashes from the

state transfer protocol to check if the object is up-to-date. If

the object is out-of-date or corrupt, it is fetched from an-

other replica.
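The recovery traversal can be sketched as follows, under several assumptions: the concrete file system is modeled as an in-memory tree, each object's abstract value is already available as a string, oids are reduced to array indices, and std::hash stands in for the cryptographic hashes of the state transfer protocol. The names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Persistent identity of a concrete object: the <fsid, fileid> pair.
using ObjectKey = std::pair<std::uint64_t, std::uint64_t>;

// Stand-in digest; a real system would use a cryptographic hash.
using Digest = std::size_t;

Digest digest_of(const std::string& bytes) {
    return std::hash<std::string>{}(bytes);
}

// Toy model of one object in the concrete file system after a reboot.
struct ConcreteObject {
    ObjectKey key;
    std::string abstract_value;            // result of the abstraction function
    std::vector<ConcreteObject> children;  // non-empty only for directories
};

// Depth-first traversal from `object`. Each object is mapped to its index
// in the abstract state through the saved map, and its digest is compared
// with the agreed digest. Indices that differ are appended to `stale`.
void collect_stale(const ConcreteObject& object,
                   const std::map<ObjectKey, std::size_t>& key_to_index,
                   const std::vector<Digest>& agreed,
                   std::vector<std::size_t>* stale) {
    std::map<ObjectKey, std::size_t>::const_iterator it =
        key_to_index.find(object.key);
    if (it != key_to_index.end() &&
        digest_of(object.abstract_value) != agreed[it->second])
        stale->push_back(it->second);
    for (const ConcreteObject& child : object.children)
        collect_stale(child, key_to_index, agreed, stale);
}
```

Only the indices collected in stale need to be fetched from another replica; objects whose digests match are left untouched.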

Instead of simply calling put_objs with the new object

values, we intend to start an NFS server on a second empty

disk and bring it up-to-date incrementally as we obtain the

value of the abstract objects. This has the advantage of im-

proving fault-tolerance as discussed in Section 2.2. Addi-

tionally, it can improve disk locality by clustering blocks

from the same ﬁle and ﬁles that are in the same directory.

This is not done in the current prototype.

4. Conclusion

Software errors are a major cause of outages and they are

increasingly exploited in malicious attacks to gain control

or deny access to important services. Byzantine fault tolerance allows replicated systems to mask some software errors but it has been expensive to deploy. We have described

a replication technique, BFTA, which uses abstraction to re-

duce the cost of deploying Byzantine fault tolerance and to

improve its ability to mask software errors.

BFTA reduces cost because it enables reuse of off-the-

shelf service implementations without modiﬁcations, and it

improves resilience to software errors by enabling oppor-

tunistic N-version programming, and software rejuvenation

through proactive recovery.

Opportunistic N-version programming runs distinct, off-

the-shelf implementations at each replica to reduce the prob-

ability of common mode failures. To apply this technique,

it is necessary to deﬁne a common abstract behavioral spec-

iﬁcation for the service and to implement appropriate con-

version functions for the state, requests, and replies of each

implementation in order to make it behave according to the

common speciﬁcation. These tasks are greatly simpliﬁed by

basing the common speciﬁcation on standards for the inter-

operability of software from different vendors; these stan-

dards appear to be common, e.g., ODBC [6], and NFS [5].

Opportunistic N-version programming improves on previ-

ous N-version programming techniques by avoiding the high

development, testing, and maintenance costs without com-

promising the quality of individual versions.

Additionally, we provide a mechanism to repair faulty replicas. Proactive recovery allows the system to remain available provided no more than 1/3 of the replicas become faulty and corrupt the abstract state (in a correlated way) within a window of vulnerability. Abstraction may enable more than 1/3 of the replicas to be faulty because it can hide corrupt items in concrete states of faulty replicas.

The paper described a replicated NFS ﬁle system imple-

mented using our technique. The conformance wrapper and

the state conversion functions in our prototype are simple —

they have 1105 semi-colons, which is two orders of magni-

tude less than the size of the Linux 2.2 kernel. This suggests

that they are unlikely to introduce new bugs.

We ran a scaled-up version of the Andrew benchmark [8, 2] (which generates 1 GB of data) to compare the performance of our replicated file system and the off-the-shelf implementation of NFS in Linux 2.2 that it wraps. Our performance results indicate that the overhead introduced by our technique is low; it is approximately 30% for this benchmark with a window of vulnerability of 17 minutes.

These preliminary results suggest that BFTA can be used in practice. As future work, it would be important to run experiments that apply BFTA to more challenging services, e.g., a relational database. It would also be important to run fault injection experiments to evaluate the availability improvements afforded by our technique.

References

[1] M. Castro and B. Liskov. Practical Byzantine Fault Tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation, New Orleans, LA, February 1999.

[2] M. Castro and B. Liskov. Proactive Recovery in a Byzantine-Fault-Tolerant System. In Proceedings of the Fourth Symposium on Operating Systems Design and Implementation, San Diego, CA, October 2000.

[3] L. Chen and A. Avizienis. N-Version Programming: A Fault-Tolerance Approach to Reliability of Software Operation. In Fault Tolerant Computing, FTCS-8, pages 3–9, 1978.

[4] Network Working Group Request for Comments: 1014. XDR: External Data Representation Standard, June 1987.

[5] Network Working Group Request for Comments: 1094. NFS: Network File System Protocol Specification, March 1989.

[6] Kyle Geiger. Inside ODBC. Microsoft Press, 1995.

[7] J. Gray and D. Siewiorek. High-Availability Computer Systems. IEEE Computer, 24(9):39–48, September 1991.

[8] J. Howard, M. Kazar, S. Menees, D. Nichols, M. Satyanarayanan, R. Sidebotham, and M. West. Scale and performance in a distributed file system. ACM Transactions on Computer Systems, 6(1):51–81, February 1988.

[9] Y. Huang, C. Kintala, N. Kolettis, and N. D. Fulton. Software Rejuvenation: Analysis, Module and Applications. In Fault-Tolerant Computing, FTCS-25, pages 381–390, Pasadena, CA, June 1995.

[10] L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, 21(7):558–565, July 1978.

[11] B. Liskov and J. Guttag. Program Development in Java: Abstraction, Specification, and Object-Oriented Design. Addison-Wesley, 2000.

[12] M. Pease, R. Shostak, and L. Lamport. Reaching Agreement in the Presence of Faults. Journal of the ACM, 27(2):228–234, April 1980.

[13] A. Romanovsky. Abstract Object State and Version Recovery in N-Version Programming. In TOOLS Europe '99, Nancy, France, June 1999.

[14] F. Schneider. Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial. ACM Computing Surveys, 22(4):299–319, December 1990.
