
Using Diversity as Fault Tolerance Tool

George Popov
Technical University of Sofia, Bulgaria
email: popovg@tu-sofia.bg
Abstract: Diversity is a well-known approach to increasing the reliability of computer systems. The
goal of this work is to present the properties of diversity as a fail-safe and fault-tolerance tool and to give
a quantitative criterion for measuring diversity. For this purpose, a model of a diversity-based system
with two failure types, detectable and undetectable, is presented, and a formula for calculating
the measure is proposed.
Keywords: diversity, dependability, computer system, embedded system, fail-safe, fault tolerance.
1. INTRODUCTION
It is well known that the reliability of a computation can be increased by performing it
in several ways (multi-channel, multi-version) and comparing the results obtained
against a defined criterion: predominance, majority or concordance.
This paper is devoted to the two-version method, known as diversity.
Diversity is a method of solving a problem (mathematical, logical, technical or other)
in two different ways (paths), A and B, with identical input data. The criterion for a
correct solution is the correspondence (in this particular case, the identity) of the
output results obtained.
If the input data are equal, the results from both channels must be equal, but this holds
only if both solutions are correct. If an error occurs in one of the channels, the results differ.
By comparing the two results we can detect an error and, with a suitable design, identify
the channel in which it occurred.
However, the same can be achieved with multi-channel homogeneous methods, in which
the two channels perform identical processing. The advantage of diversity becomes clear
when we consider the kinds of errors that can occur.
The errors can be:
- the result of random faults in the system modules: wrong functioning, short
circuits, broken circuits, etc.;
- systematic, when system rules and functional algorithms are violated.
The first group of disturbances can be detected by both diverse and homogeneous methods.
The main advantage of homogeneous systems is that they are cheaper. If the homogeneous
structure uses identical units (computers, controllers), a fault will appear at some moment
in one of them. When the information stream activates the fault, the output results of the
two units will differ.
If the errors are caused by parasitic signals from the power supply or the environment,
which may enter both channels in the same way, they produce results that are equal but
wrong, and comparison cannot recognize them. Such results may be accepted as correct,
which can be dangerous.
Repeated consecutive homogeneous processing in a single unit (computer, channel
or controller) can detect sporadic disturbances, because they appear at different
random times: one of the results is wrong while the other is correct, and comparison
detects the error.
Homogeneous methods can be used for multichannel identification of a single event,
object or phenomenon. For example, two PIR detectors watch for an intruder, but because
they are placed at different locations they “see” the intruder in different ways. One of
them may fail to detect the intruder. If we take the disjunction of the output results, the
probability of detection is higher.
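The effect of taking the disjunction can be illustrated numerically. The sketch below is only an illustration: it assumes that the two detectors miss the intruder independently of each other, which the paper does not state, and the per-detector probabilities are hypothetical values.

```python
def or_detection_probability(p_first: float, p_second: float) -> float:
    """Probability that at least one of two detectors fires.

    Assumption (not from the paper): the detectors miss independently,
    so the combined miss probability is the product of the two misses.
    """
    for p in (p_first, p_second):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return 1.0 - (1.0 - p_first) * (1.0 - p_second)


# Hypothetical detection probabilities, chosen only for illustration.
combined = or_detection_probability(0.9, 0.8)
print(f"combined detection probability: {combined:.2f}")
```

Under the independence assumption the combined probability is never lower than that of the better detector.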
The second group of disturbances (including conceptual, design, synthesis, technological
and documentation errors) cannot be detected using homogeneous methods. A systematic
error is repeated in all modules manufactured from the same technical documentation.
The most effective method for detecting such errors is diversity (in the general case,
N-version processing).
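The difference between the two approaches for systematic errors can be sketched in a few lines. The example is hypothetical and not taken from the paper: the "design error" is a leap-year rule that omits the century exception, and the diverse channel reaches the answer by an independent path through the standard library calendar.

```python
import datetime


def leap_by_rule(year: int) -> bool:
    # Deliberately flawed design (illustrative): the century rule is missing.
    return year % 4 == 0


def leap_by_calendar(year: int) -> bool:
    # Independent path: check whether 29 February exists in that year.
    try:
        datetime.date(year, 2, 29)
        return True
    except ValueError:
        return False


def channels_agree(channel_a, channel_b, value) -> bool:
    """Compare the outputs of two channels for the same input."""
    return channel_a(value) == channel_b(value)


# Homogeneous pair: the same flawed design in both channels.
homogeneous_agree = channels_agree(leap_by_rule, leap_by_rule, 1900)
# Diverse pair: the flawed design against an independent one.
diverse_agree = channels_agree(leap_by_rule, leap_by_calendar, 1900)
print(homogeneous_agree, diverse_agree)
```

For the year 1900 the homogeneous pair agrees on a wrong answer, so the comparison reports no error, while the diverse pair disagrees and the error becomes detectable.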
2. FAULT TOLERANCE PROPERTY OF DIVERSITY
Diversity can be used in two roles:
Fail-safe: errors can be identified through this instrument and system
processing can be cancelled when the system is damaged. The results need
to be compared for equivalence (Fig.1).
Fig.1: Fail safe approach of diversity (block diagram: Input Data -> Processing A and Processing B -> Output result A and Output result B -> Comparison -> OK)
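The fail-safe scheme of Fig.1 can be sketched as a comparator around two diverse channels. This is a minimal sketch under assumptions: the function names and the sum-of-squares task are hypothetical, and the faulty channel only simulates an error for demonstration.

```python
from typing import Callable, Optional, Tuple


def fail_safe_run(
    channel_a: Callable[[int], int],
    channel_b: Callable[[int], int],
    data: int,
) -> Tuple[bool, Optional[int]]:
    """Run both channels on the same input and compare the results.

    Returns (True, result) when the results are identical, otherwise
    (False, None) so that the caller can cancel further processing.
    """
    result_a = channel_a(data)
    result_b = channel_b(data)
    if result_a == result_b:
        return True, result_a
    return False, None


def sum_squares_loop(n: int) -> int:
    # Channel A: direct summation.
    return sum(i * i for i in range(1, n + 1))


def sum_squares_formula(n: int) -> int:
    # Channel B: closed-form expression, a different path to the same result.
    return n * (n + 1) * (2 * n + 1) // 6


def sum_squares_faulty(n: int) -> int:
    # Simulated error in channel B (wrong sign in the formula).
    return n * (n + 1) * (2 * n - 1) // 6


ok_case = fail_safe_run(sum_squares_loop, sum_squares_formula, 10)
error_case = fail_safe_run(sum_squares_loop, sum_squares_faulty, 10)
print(ok_case, error_case)
```

The comparator only reports that the channels disagree; it does not say which channel is wrong, which matches the fail-safe role described above.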
Fault-tolerance: the goal of this instrument is to tolerate errors so that the
whole system remains operational. There are two ways:
- logic for system reservation (redundancy) with diversity switching (Fig.2);
Fig.2: A reservation with diversity switching
- logic with hot diversity reservation, i.e. permanently incorporated diversity
reservation (Fig.3).
Fig.3: A hot diversity reservation
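Reservation with diversity switching might be sketched as follows. The paper does not specify how a failure of the working channel is detected, so this sketch assumes a hypothetical acceptance test `is_acceptable`; the integer square root task and all names are illustrative only.

```python
import math
from typing import Callable, Tuple


def isqrt_bisect(n: int) -> int:
    # Diverse reserve channel: integer square root by bisection.
    low, high = 0, n + 1  # invariant: low*low <= n < high*high
    while high - low > 1:
        mid = (low + high) // 2
        if mid * mid <= n:
            low = mid
        else:
            high = mid
    return low


def faulty_primary(n: int) -> int:
    # Simulated fault in the primary channel (off by one).
    return math.isqrt(n) + 1


def is_acceptable(n: int, root: int) -> bool:
    # Hypothetical acceptance test used to detect a failed channel.
    return root * root <= n < (root + 1) * (root + 1)


def run_with_switching(
    primary: Callable[[int], int],
    reserve: Callable[[int], int],
    data: int,
) -> Tuple[int, str]:
    """Use the primary channel; switch to the diverse reserve on failure."""
    try:
        result = primary(data)
        if is_acceptable(data, result):
            return result, "primary"
    except Exception:
        pass  # a crash of the primary channel is also treated as a failure
    result = reserve(data)
    if is_acceptable(data, result):
        return result, "reserve"
    raise RuntimeError("both channels failed")


switched = run_with_switching(faulty_primary, isqrt_bisect, 17)
normal = run_with_switching(math.isqrt, isqrt_bisect, 17)
print(switched, normal)
```

In a hot diversity reservation both channels would run permanently and the output would be selected from them; the sketch covers only the switching variant of Fig.2.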
3. CALCULATION OF THE FAULT TOLERANCE PROPERTY OF DIVERSITY
Let a dual-channel system have a flow of failures with intensity λ. Let set {A} represent
the faults of channel A and set {B} the faults of channel B (see Fig.4).
Obviously, failures in these channels have both common and different causes.
Set {A∩B} represents the common causes. Sets {A-B} and {B-A} represent the causes
specific to one channel. Set {A∪B} represents all failures in the diversity system.
Fig.4: Schema of faults in one diversity system
Let {D} be the set of faults which do not lead to general failure, because these
faults are elements of {A-B} or {B-A}. In this case one channel is still working
and the system remains available.
$$\{D\} = \{A \cup B\} - \{A \cap B\} \tag{1}$$
If we relate the number of faults which do not lead to general failure (Card{D}) to the number
of all faults in the system (Card{A∪B}), we obtain an estimate of the depth of diversity Ω.
Fig.4 labels: {A}, reasons for faults in channel A; {B}, reasons for faults in channel B; {A-B}, reasons for faults in channel A other than those of channel B; {B-A}, reasons for faults in channel B other than those of channel A; {A∩B}, reasons for faults common to both channels.
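$$\Omega = \frac{\mathrm{Card}\{A \cup B\} - \mathrm{Card}\{A \cap B\}}{\mathrm{Card}\{A \cup B\}} \tag{2}$$
$$\Omega = 1 - \frac{\mathrm{Card}\{A \cap B\}}{\mathrm{Card}\{A \cup B\}} \tag{3}$$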
When Card{A∩B} = 0, Ω = 1 and diversity is at its maximum. Conversely, if
Card{A} = Card{B} = Card{A∩B}, the depth of diversity is zero.
For example, if the two sets partially overlap, with Card{A} = 20, Card{B} = 30 and
Card{A∩B} = 10, then Card{A∪B} = 20 + 30 - 10 = 40 and the depth of diversity Ω is:
$$\Omega = 1 - \frac{10}{40} = 0.75 \tag{4}$$
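The calculation of the depth of diversity can be reproduced with ordinary set operations. The sketch below assumes that each fault cause is represented by a hashable label; the integer labels are hypothetical and chosen only to reproduce the cardinalities of the example (20, 30 and 10 common causes).

```python
def depth_of_diversity(faults_a: set, faults_b: set) -> float:
    """Depth of diversity: 1 - Card{A∩B} / Card{A∪B}, as in formula (3)."""
    union = faults_a | faults_b
    if not union:
        raise ValueError("at least one fault cause is required")
    common = faults_a & faults_b
    return 1.0 - len(common) / len(union)


# Hypothetical fault-cause labels reproducing the example above.
faults_a = set(range(0, 20))    # Card{A} = 20
faults_b = set(range(10, 40))   # Card{B} = 30, causes 10..19 are common

omega = depth_of_diversity(faults_a, faults_b)

# Set {D}: faults that belong to exactly one channel.
non_fatal = faults_a ^ faults_b

print(omega, len(non_fatal) / len(faults_a | faults_b))
```

Both ways of computing the value agree: Card{D} / Card{A∪B} = 30 / 40 gives the same result as formula (3).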
4. CONCLUSION
From the models and the research carried out, the following important generalizations
can be made:
1. Reliability, as an essential feature of systems, may be based on the diversity
reservation approach.
2. The greater the depth of diversity, the greater the reliability.
3. To determine the factors that govern the depth of diversity, it is necessary to
examine the specific scheme in each case and to identify the general and local causes
of failures and their intensity.