Using Diversity as Fault Tolerance Tool

June 2010

Conference: Challenges in Higher Education

Authors:

Technical University of Sofia

George Popov

Technical University of Sofia, Bulgaria

email: popovg@tu-sofia.bg

Abstract: Diversity is a known approach for increasing the reliability of computer systems. The goal of this work is to present the properties of diversity as a fail-safe and fault-tolerance tool and to give a quantitative criterion for measuring diversity. For this purpose, a model of a diversity-based system with two failure types, detectable and undetectable, is presented, and a formula for calculating this measure is proposed.

Keywords: diversity, dependability, computer system, embedded system, fail-safe, fault-tolerance.

1. INTRODUCTION

As is well known, the reliability of calculations can be increased by performing them in multiple ways (multi-channel, multi-version) and comparing the results according to a defined criterion: predominance, majority, or concordance.

This paper is devoted to the two-version method, known as diversity.

Diversity is a method of solving a problem (mathematical, logical, technical, or other) in two different ways (paths), A and B, with identical input data. The criterion for a correct solution is the correspondence (in this particular case, the identity) of the output results.

If the input data are equal, the results from both channels must be equal. This holds only if both channels work correctly. If there are errors in the channels, the results differ. By comparing the two results we can detect an error and, with a suitable design, determine in which channel it occurred.

The same can be achieved with multi-channel homogeneous methods, in which the two channels perform identical processing. The advantage of diversity becomes clear when we ask what kinds of errors can occur.

The errors can be:

- the result of random faults in the system modules: malfunctions, short circuits, open circuits, etc.;

- systematic, when system rules and functional algorithms are violated.

The first group of disturbances can be detected by both diverse and homogeneous methods. The main advantage of homogeneous systems is that they are cheaper. If identical units (computers, controllers) are used in a homogeneous structure, a fault will appear at some moment in one of them. When the fault is activated by the information stream, the output results will differ.

If the errors are caused by parasitic signals from the power supply or the environment (which may enter both channels in the same way), they produce results that are equal but wrong, and therefore unrecognizable by comparison. Such results may be accepted as correct, which can be dangerous.

Homogeneous repeated sequential processing in a single unit (computer, channel, or controller) can detect sporadic disturbances, because they occur at different random times: one of the results is wrong while the other is correct. The comparison detects the error.

Homogeneous methods can be used for multichannel identification of a single event, object, or phenomenon. For example, two PIR detectors watch for an intruder, but because they are located in different places, they "see" the intruder in different ways. One of them may fail to detect the intruder. If we take the disjunction of the output results, the probability of detection is higher.
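The gain from the disjunction can be illustrated numerically. The following minimal sketch assumes that the two detectors miss the intruder independently of each other and that each has a known detection probability. Both assumptions, the function name `detection_probability_or`, and the sample values are illustrative and do not come from the paper.

```python
def detection_probability_or(p_a: float, p_b: float) -> float:
    """Probability that at least one of two detectors fires.

    Assumes the detectors miss independently (an assumption of this sketch).
    """
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # The OR of the two outputs misses only when both detectors miss.
    return 1.0 - (1.0 - p_a) * (1.0 - p_b)


# Hypothetical detection probabilities for two PIR detectors.
combined = detection_probability_or(0.9, 0.8)
print(f"combined detection probability: {combined:.2f}")
```

Under the independence assumption the combined probability is never lower than that of the better detector. If the detectors share a common cause of failure, the real gain is smaller.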

The second group of disturbances (including conceptual, design, synthesis, technological, and documentation errors) cannot be detected using homogeneous methods. A systematic error repeats in all modules fabricated from the same manufacturer's technological documentation. The most effective method for detecting such errors is diversity (in the general case, N-version processing).

2. FAULT TOLERANCE PROPERTY OF DIVERSITY

Diversity can be used in two capacities:

• Fail-safe: errors can be identified through this instrument and system processing cancelled when the system is damaged. The results need to be compared for equivalence (Fig.1).

[Block diagram: the input data feed Processing A and Processing B; output result A and output result B go to a comparison block, which signals OK when they agree.]

Fig.1: Fail safe approach of diversity
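The comparison logic of the fail-safe approach can be sketched in a few lines. The example is only an illustration under stated assumptions: the two channels are modelled as ordinary functions, the names `fail_safe`, `FailSafeStop`, `sum_by_loop` and `sum_by_formula` are hypothetical, and "cancelling system processing" is represented by raising an exception.

```python
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class FailSafeStop(Exception):
    """Raised when the two diverse channels disagree (hypothetical name)."""


def fail_safe(channel_a: Callable[[T], R], channel_b: Callable[[T], R], data: T) -> R:
    """Run both channels on identical input; release the result only if they agree."""
    result_a = channel_a(data)
    result_b = channel_b(data)
    if result_a != result_b:
        # Disagreement means at least one channel is faulty: stop instead of
        # passing on a possibly wrong result.
        raise FailSafeStop(f"channel mismatch: {result_a!r} != {result_b!r}")
    return result_a


# Two diverse ways of computing the sum 1 + 2 + ... + n.
def sum_by_loop(n: int) -> int:
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_by_formula(n: int) -> int:
    return n * (n + 1) // 2


print(fail_safe(sum_by_loop, sum_by_formula, 10))
```

The comparison alone shows that an error exists, not which channel produced it. Locating the faulty channel requires additional design measures.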

• Fault tolerance: the goal of this instrument is to tolerate errors so that the whole system remains operational. There are two ways:

- logic for system reservation (standby redundancy) with diversity switching (Fig.2);

Fig.2: A reservation with diversity switching

- logic with hot diversity reservation, i.e. a permanently connected diversity reserve (Fig.3).

Fig.3: A hot “diversity” reservation
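The difference between the two fault-tolerance schemes can be made concrete with a small sketch. It is an interpretation and not the paper's design: it assumes that a faulty result can be recognized by an acceptance check supplied by the designer (`is_acceptable`), and all function names are hypothetical. In the switched scheme the diverse reserve runs only after a fault is detected in the primary channel. In the hot scheme both diverse channels always run.

```python
import math
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def switched_reservation(primary: Callable[[T], R], reserve: Callable[[T], R],
                         is_acceptable: Callable[[T, R], bool], data: T) -> R:
    """Use the primary channel; switch to the diverse reserve on a detected fault."""
    try:
        result = primary(data)
        if is_acceptable(data, result):
            return result
    except Exception:
        pass  # a crash of the primary is treated as a detected fault
    # Switch-over: the reserve channel is started only now.
    return reserve(data)


def hot_reservation(channel_a: Callable[[T], R], channel_b: Callable[[T], R],
                    is_acceptable: Callable[[T, R], bool], data: T) -> R:
    """Both diverse channels always run; the first acceptable result is used."""
    results = []
    for channel in (channel_a, channel_b):
        try:
            results.append(channel(data))
        except Exception:
            continue  # a crashed channel contributes no result
    for result in results:
        if is_acceptable(data, result):
            return result
    raise RuntimeError("no channel produced an acceptable result")


# Illustrative channels: integer square root, one of them deliberately faulty.
def isqrt_good(n: int) -> int:
    return math.isqrt(n)


def isqrt_off_by_one(n: int) -> int:
    return math.isqrt(n) + 1  # simulated systematic error


def isqrt_check(n: int, r: int) -> bool:
    return r * r <= n < (r + 1) ** 2


print(switched_reservation(isqrt_off_by_one, isqrt_good, isqrt_check, 17))
print(hot_reservation(isqrt_off_by_one, isqrt_good, isqrt_check, 17))
```

In this sketch the switched variant costs less while the primary is healthy but incurs a switch-over delay. The hot variant always pays for both channels but has a result from the reserve immediately available.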

3. CALCULATION OF THE FAULT TOLERANCE PROPERTY OF DIVERSITY

Let a dual-channel system have a flow of failures with intensity λ. Let set {A} represent the faults of channel A and set {B} the faults of channel B (see Fig.4).

Obviously, there are common and different reasons for failures in these channels. Set {A∩B} represents the common reasons. Sets {A-B} and {B-A} represent the different reasons. Set {A∪B} represents all failures in the diversity system.

Fig.4: Schema of faults in one diversity system

Let {D} be the set of faults that do not lead to a general failure, because these faults are elements of the sets {A-B} or {B-A}. In this case one channel still works and the system remains available.

$$\{D\} = \{A \cup B\} - (\{A \cap B\}) \qquad (1)$$

If we relate the number of faults that do not lead to a general failure (Card{D}) to the number of all faults in the system (Card{A∪B}), we obtain an estimate of the depth of diversity Ω.

Labels in Fig.4:

- {A}: reasons for faults in channel A;

- {B}: reasons for faults in channel B;

- {A-B}: reasons for faults in channel A other than those of channel B;

- {B-A}: reasons for faults in channel B other than those of channel A;

- {B∩A}: reasons for faults in channel A that are common to those of channel B.

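$$\Omega = \frac{\mathrm{Card}\{A \cup B\} - \mathrm{Card}(\{A \cap B\})}{\mathrm{Card}\{A \cup B\}} \qquad (2)$$

$$\Omega = 1 - \frac{\mathrm{Card}\{A \cap B\}}{\mathrm{Card}\{A \cup B\}} \qquad (3)$$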

When Card{A∩B} = 0, Ω = 1 and the diversity is maximal. Conversely, if Card{A} = Card{B} = Card{A∩B}, the depth of diversity is zero.

For example, if the two sets partially overlap, with Card{A} = 20, Card{B} = 30 and Card{A∩B} = 10, then Card{A∪B} = 20 + 30 - 10 = 40 and the depth of diversity Ω will be:

$$\Omega = 1 - \frac{\mathrm{Card}\{A \cap B\}}{\mathrm{Card}\{A \cup B\}} = 1 - \frac{10}{40} = 0.75 \qquad (4)$$
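The calculation can be checked with a short sketch that represents the fault causes of each channel as sets. The function name `depth_of_diversity` and the integer identifiers used for the fault causes are assumptions made for illustration. Only the formula Ω = 1 - Card{A∩B} / Card{A∪B} and the cardinalities 20, 30 and 10 come from the text.

```python
def depth_of_diversity(faults_a: set, faults_b: set) -> float:
    """Depth of diversity: 1 - Card{A ∩ B} / Card{A ∪ B}."""
    union = faults_a | faults_b
    if not union:
        raise ValueError("at least one fault cause is required")
    common = faults_a & faults_b
    return 1.0 - len(common) / len(union)


# Hypothetical fault-cause identifiers reproducing the worked example:
# Card{A} = 20, Card{B} = 30, Card{A ∩ B} = 10.
faults_a = set(range(0, 20))    # causes 0..19
faults_b = set(range(10, 40))   # causes 10..39, sharing 10..19 with A

print(depth_of_diversity(faults_a, faults_b))  # 0.75
```

The two boundary cases also follow directly: disjoint sets give Ω = 1, and identical sets give Ω = 0.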

4. CONCLUSION

From the models and the research presented, the following important generalizations can be made:

1. Reliability, as an essential feature of systems, may be based on the diversity reservation approach.

2. The greater the depth of diversity, the greater the reliability.

3. To identify the factors that determine the depth of diversity, it is necessary to examine the specific scheme for the case at hand and to look for the general and local causes of failures and their intensity.

