Fully unsupervised fault detection and identification based on recursive density estimation and self-evolving cloud-based classifier

October 2014
Neurocomputing 150

DOI:10.1016/j.neucom.2014.05.086

Authors:

Bruno Sielly Jales Costa

Meta

Plamen P Angelov

Lancaster University

Luiz Affonso Guedes

Universidade Federal do Rio Grande do Norte

Fully Unsupervised Fault Detection and Identification Based on Recursive Density Estimation and Self-evolving Cloud-based Classifier

Bruno Sielly Jales Costa

Campus Natal - Zona Norte

Federal Institute of Rio Grande do Norte - IFRN, Brazil

Tel.: +55-84-40069509

Fax: +55-84-40069504

bruno.costa@ifrn.edu.br

Plamen Parvanov Angelov

School of Computing and Communications

Lancaster University, Lancaster, UK

Tel.: +44-01524-510391

p.angelov@lancaster.ac.uk

Luiz Affonso Guedes

Department of Computing Engineering and Automation

Federal University of Rio Grande do Norte - UFRN, Brazil

Tel.: +55-84-32153771

Fax: +55-84-32153738

affonso@dca.ufrn.br

Abstract

In this paper, we propose a two-stage algorithm for real-time fault detection and identification of industrial plants. Our proposal is based on the analysis of selected features using recursive density estimation and a new evolving classifier algorithm. More specifically, the proposed approach for the detection stage is based on the concept of the density in the data space, which is not the same as the probability density function, but is a very useful measure for abnormality/outlier detection. This density can be expressed by a Cauchy function and can be calculated recursively, which makes it efficient in memory and computational power and, therefore, applicable to on-line applications. The identification/diagnosis stage is based on a self-developing (evolving) fuzzy-rule-based classifier system proposed in this paper, called AutoClass. An important property of AutoClass is that it can start learning “from scratch”. Not only do the fuzzy rules not need to be prespecified, but neither does the number of classes for AutoClass (the number may grow, with new class labels being added by the online learning process), in a fully unsupervised manner. In the event that an initial rule base exists, AutoClass can evolve/develop it further based on the newly arrived faulty state data. In order to validate our proposal, we present experimental results from a level control didactic process, where control and error signals are used as features for the fault detection and identification system. The approach is generic, however, and the number of features can be significant due to the computationally lean methodology, since covariance or more complex calculations, as well as storage of old data, are not required. The obtained results are significantly better than those of the traditional approaches.

Keywords: Fault detection, fault diagnosis, fault identification, recursive density estimation, evolving classifiers, autonomous learning.

1. Introduction

In the past few decades, the fault detection and identification (FDI) field of research has received extensive attention. FDI is an important problem in control and automation engineering and is central to the Abnormal Event Management (AEM) field of research (Venkatasubramanian et al., 2003). Applications of FDI techniques in industrial environments are increasing, both to improve operational safety and to reduce the costs related to unscheduled stoppages. The importance of FDI research in control and automation engineering rests on the fact that prompt detection of an occurring fault, while the system is still operating in a controllable region, usually prevents or at least reduces productivity losses and health risks.

With the increasing complexity of procedures and the growing scope of industrial activities, AEM is a challenging field of study. The human operator plays a crucial role in this matter, since it has been shown that people responsible for AEM often make incorrect decisions. Industrial statistics show that 70% to 90% of accidents are caused by human error (Wang and Guo, 2013).

In the industrial context, several different types of faults can affect the normal operation of a plant. Among these we can list (Samantaray and Bouamama, 2008):

• Gross parameter changes: also known as parametric faults, these refer to disturbances to the process from independent variables whose dynamics are not known. Examples of parametric faults include a change in the concentration of a reactant and a blockage in a pipeline that changes the flow coefficient.

• Structural changes: these refer to equipment failures, which may change the model of the process. An appropriate corrective action for such an abnormality would require the extraction of new modeling equations to describe the current faulty status of the process. Examples of structural changes are the failure of a controller, a leaking pipe and a stuck valve.

• Faulty sensors and actuators: also known as additive faults, these refer to incorrect process inputs and outputs, which could drive the plant variables beyond acceptable limits. Examples of abnormalities in the input/output instruments are constant (positive or negative) bias, intermittent disturbances, saturation and out-of-range failure.

The entire AEM process is often divided into a series of steps, which in fault-tolerant design is called a fault diagnosis scheme. Fault detection, or anomaly detection, is the first stage and is extremely important to FDI systems. In this stage, we are able to determine whether the system is working in a normal operating state or in a faulty mode. However, vital information about the fault, such as physical location, length or intensity, is not provided to the operator at this stage (Silva, 2008).

Hence, the need for a subsequent stage arises. The detector system (first stage) continuously monitors the process variables (or attributes), looking for symptoms (deviations from the normal variable values), and sends these symptoms to the diagnosis system (second stage), which is responsible for the classification process.

Preprint submitted to Neurocomputing April 9, 2014

The diagnosis stage presents its own challenges and obstacles, and can be handled independently from the first one. It demands different techniques and solutions, and is divided into two sub-stages called isolation and identification. The term isolation refers to the determination of the type, location and time of detection of a fault, and follows the fault detection stage (Donders, 2002). Identification, on the other hand, refers to the determination of the size and time-variant behavior of a fault, and follows fault isolation.

Many approaches to FDI have been proposed in the literature. We can mention, for example, observer-based (Chen and Saif, 2007), (Li and Yang, 2012), (Maiying et al., 2004), (Sneider and Frank, 1996), analytical redundancy-based (Simani and Patton, 2008), (Anwar and Chen, 2007), (Xu and Tseng, 2007), fuzzy model-based (Oblak et al., 2007), (Yang et al., 2011), (El-Shal and Morris, 2000), (Laukonen et al., 1995), neural network-based (Leite et al., 2009), (Vemuri et al., 1998), (Bernieri et al., 1996) and immune system-based methods (Laurentys et al., 2010a), (Laurentys et al., 2010b). Unfortunately, most of the above-mentioned techniques require previous knowledge or empirical observation about the model or behaviour of the system, need extensive computational effort, or rely on too many thresholds or problem-specific parameters that must be defined in advance. These characteristics hamper their use in on-line applications and make their adoption in real problems difficult.

One group of methods that is worth mentioning, and that serves as a basis for comparison with our proposal later in this paper, is the group of statistical process control (SPC) approaches. SPC deals with data that are moving snapshot windows of the history of a process control system (Hossain et al., 1996). It is used for monitoring process variables and is based on statistical analysis (mean and standard deviation values) calculated over time windows and compared with pre-defined thresholds. Although SPC is an on-line approach, most of the applications in use today were developed on the premise that the process parameters being controlled follow Gaussian/normal distributions. Independence of the inputs and an infinite number of observations are other premises which, in reality, are not satisfied. For further information on SPC methods, the reader is referred to Martin et al. (1996), Cook et al. (1997), Liukkonen and Tuominen (2004) and Kano et al. (2010).
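To make the window-based check concrete, the following is a minimal sketch of one common SPC form, a Shewhart-style test on the window mean. The function name `spc_window_alarm`, the reference statistics and the 3-sigma default are assumptions made for illustration only; they are not the SPC configuration evaluated later in this paper.

```python
import statistics


def spc_window_alarm(window, ref_mean, ref_std, n_sigma=3.0):
    """Return True if the mean of `window` leaves the control band.

    ref_mean and ref_std are assumed to come from fault-free reference data;
    the band half-width is n_sigma standard errors of the window mean.
    """
    limit = n_sigma * ref_std / len(window) ** 0.5
    return abs(statistics.fmean(window) - ref_mean) > limit
```

Note that the control band relies on the Gaussian and independence premises discussed above, and that the threshold has to be fixed in advance.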

Being aware of these shortcomings, in this paper we propose a recursive, fully unsupervised fuzzy rule-based (FRB) classifier for fault detection and identification in industrial processes, which can be generalised to other specific problems. The proposed FDI system requires neither mathematical models based on first principles nor explicit previous knowledge about the analysed process. It is based, instead, on the estimation of the density and proximity in the data space. This density can be expressed by a Cauchy function and can be calculated recursively (Angelov, 2012b), which makes it efficient in terms of memory and computational power and, therefore, suitable for on-line applications. In this sense, it is autonomous (user-independent) and is able to perform FDI on-line without the above-mentioned disadvantages. The proposed approach has two well-defined, sequential stages - detection and identification - with a minimum of very intuitive parameters, that can be associated with other existing approaches.

The proposed on-line detection algorithm is based on the recently introduced recursive density estimation (RDE) approach (Angelov et al., 2008). This approach makes it possible to build, accumulate and self-learn a dynamically evolving information model of “normality” from the process data of a particular plant, using the normal/“good”/accident-free cases only. Theoretically, such an approach can start fault detection “from scratch”, from the very first data sample observed.

It is important to stress that only a few techniques for data density analysis in fault detection have been previously proposed, most of them applied to software fault detection and based on the probability density function (PDF), not on data distribution density. Breunig et al. (2000) present the probability density-based local outlier factor (LOF) algorithm. In this approach, the anomaly score of a data sample is defined as the average local probability density of its neighbors. Similar methods based on the KNN algorithm were presented in Tang et al. (2002), Hautamaki et al. (2004) and Papadimitriou et al. (2003). However, most of the existing algorithms suffer from high complexity and are therefore not suitable for large datasets or real-time applications.

For the identification stage, the proposed approach is based on a new self-learning (fully unsupervised) evolving classifier algorithm called AutoClass. It builds upon the family of evolving clustering algorithms - eClustering (Angelov, 2004a), ELM (Baruah and Angelov, 2012), DEC (Baruah and Angelov, 2013) - and evolving classifiers - eClass (Angelov and Zhou, 2008), simpleClass (Angelov et al., 2011). The new clustering algorithm, called AutoClass, differs from eClass0 in the way clusters are defined and updated. While the earlier algorithms are based on the concept of traditional clusters, AutoClass works with the concept of data clouds (Angelov and Yager, 2011), structures with no defined boundaries or shapes. Another innovation, when compared to eClass0, for example, is that AutoClass can store a finite vector of points (for a limited time) which do not belong to any existing class and later create a new class from them. Like eClass0, AutoClass can also start from an empty knowledge base, from the first data sample acquired.

Among the related work, it is important to mention some recently presented approaches in the field of fault detection that use adaptive and evolving FRB models. Serdio et al. (2014) present an approach to FDI based on data-driven evolving fuzzy models and dynamic residual analysis for extracting fault indicators. The authors introduce a two-stage algorithm, one stage off-line (model identification and training) and one on-line (fault detection), where neither annotated samples nor fault patterns/models need to be available a priori. The FDI system is successfully applied to the coal mills of a power plant. Lemos et al. (2013) and Lughofer and Guardiola (2008) present two different fully on-line FDI systems using evolving fuzzy classifiers based on the evolving Takagi-Sugeno (eTS) algorithm, first introduced by Angelov and Filev (2004) and Angelov and Zhou (2008). The work of Lughofer (2010) is also worth mentioning, since the author developed an evolving image classifier capable of sorting images into “good” (fault-free production items) and “bad” (faulty production items). Regarding the extraction of decision rules from data streams and the handling of time-changing data, a few approaches can be mentioned, e.g. Gama and Kosina (2011) and Kosina and Gama (2012). In the first paper, the authors present a new algorithm to learn rule sets, designed for open-ended data streams; in the latter, an on-line, any-time and one-pass algorithm for learning decision rules in the context of time-changing data is introduced. Last but not least, Suvorov et al. (2013) introduce a one-class SVM (support vector machine)-based FDI system, and the approach is applied to real flight data from the worldwide aircraft industry. Our proposed algorithm, AutoClass, differs from each of the mentioned approaches either in not needing an off-line/separate training stage or in not being based on the eTS framework. Instead, the clustering algorithm is based on AnYa (Angelov and Yager, 2012), (Angelov and Yager, 2011) fuzzy models. The inference rules have no specific parameters or shapes for the membership functions, and the approach is entirely data-driven. Also, the algorithm is fully unsupervised, which means there is no need for a pre-specified fault base, and new faults and labels are created automatically in the presence of considerable outliers. Compared specifically with the last approach (Suvorov et al., 2013), the main difficulty is that both the off-line and the on-line versions of one-class SVM require considerable computational effort and parameters that are problem- and user-specific.

The remainder of the paper is organised as follows: in Section (2) the detection proposal is described, with Subsection (2.1) describing the Recursive Density Estimation (RDE) method and Subsection (2.2) detailing the fault detection algorithm. Section (3) presents the identification proposal, with Subsection (3.1) presenting the AutoClass algorithm and Subsection (3.2) detailing the fault identification algorithm based on AutoClass. Section (4) describes the experimental setup used to validate our proposal. Section (5) presents the analysis of the results obtained using our approach and a comparison with the widely used benchmark approach called statistical process control. Finally, in Section (6), the main conclusions are presented.

2. Fault Detection Stage

2.1. Recursive Density Estimation

The RDE concept was originally introduced by Angelov et al. (2011), but received the name RDE in 2008 (Angelov et al., 2008) and its latest version is a part of a patent application (Angelov, 2012a). Since then it has been used in many applications (Angelov et al., 2008), (Kolev et al., 2013), (Ramezani et al., 2008).

This concept uses a Cauchy function, which has similar properties to the Gaussian but can be updated recursively (Angelov, 2004b) and is non-parametric. In addition, there is no need to make any assumptions about the distribution. This means that only a very small amount of data - the mean of all data samples, $\mu_k$, and the scalar product quantity, $\Sigma_k$, both calculated at the current moment in time $k$ - needs to be stored in the memory and updated. The current data sample, $x_k$, is also used, but it is available and there is no need to store or update it.

This has significant implications, because it theoretically allows an infinite amount of data (infinitely large data sets or infinitely long, open-ended data streams) to be processed in real time, very fast and exactly (not approximately). In the next section we will also present an extended approach where a small vector of density values is stored in the memory, with no noticeable implications for the real-time constraints.

Let all measurable physical variables form the vector $x \in \mathbb{R}^n$, and let the data be divided into several clusters. Then, for any vector $x \in \mathbb{R}^n$, its $\Lambda$-th cluster density value is calculated for a Euclidean type distance as (Angelov, 2012b):

$$d_\Lambda = \frac{1}{1 + \frac{1}{N_\Lambda}\sum_{i=1}^{N_\Lambda} \|x_k - x_i\|^2} \tag{1}$$

where $d_\Lambda$ denotes the local density of cluster $\Lambda$ and $N_\Lambda$ denotes the number of data samples associated with cluster $\Lambda$. In the case of fault detection applications, $x_k$ represents the feature vector with values for the instant $k$.

The distance is calculated between a given data vector (e.g. measured at the time instant $k$) and the other data vectors that belong to the same cluster as $x$ (measured at previous time instants). It can be shown that this formula can be derived as an exact (not approximated or learned) quantity as (Angelov, 2012b):

$$D(x_k) = \frac{1}{1 + \|x_k - \mu_k\|^2 + \Sigma_k - \|\mu_k\|^2} \tag{2}$$

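where both the mean, $\mu_k$, and the scalar product, $\Sigma_k$, can be updated recursively as follows:

$$\mu_k = \frac{k-1}{k}\mu_{k-1} + \frac{1}{k}x_k, \quad \mu_1 = x_1 \tag{3}$$

$$\Sigma_k = \frac{k-1}{k}\Sigma_{k-1} + \frac{1}{k}\|x_k\|^2, \quad \Sigma_1 = \|x_1\|^2 \tag{4}$$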
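The equivalence between the batch form (1) and the recursive form (2)-(4) follows from expanding the squared distances: the average of $\|x_k - x_i\|^2$ over the first $k$ samples equals $\|x_k\|^2 - 2x_k^T\mu_k + \Sigma_k$, which is the same as $\|x_k - \mu_k\|^2 + \Sigma_k - \|\mu_k\|^2$. The sketch below illustrates this numerically for a single cluster. The class and function names are hypothetical, and the code is a plain-Python illustration under these assumptions rather than the implementation used in the experiments.

```python
class RecursiveDensity:
    """Recursive density estimate following Eqs. (2)-(4) for one cluster."""

    def __init__(self):
        self.k = 0          # number of samples seen so far
        self.mean = []      # mu_k
        self.scalar = 0.0   # Sigma_k, running mean of ||x||^2

    def update(self, x):
        """Consume sample x and return its density D(x_k)."""
        x = [float(v) for v in x]
        self.k += 1
        k = self.k
        sq = sum(v * v for v in x)
        if k == 1:
            self.mean = x[:]                      # mu_1 = x_1
            self.scalar = sq                      # Sigma_1 = ||x_1||^2
        else:
            w = (k - 1) / k
            self.mean = [w * m + v / k for m, v in zip(self.mean, x)]  # Eq. (3)
            self.scalar = w * self.scalar + sq / k                     # Eq. (4)
        dist2 = sum((v - m) ** 2 for v, m in zip(x, self.mean))
        mean2 = sum(m * m for m in self.mean)
        return 1.0 / (1.0 + dist2 + (self.scalar - mean2))             # Eq. (2)


def batch_density(x, samples):
    """Eq. (1) evaluated directly over all stored samples (for comparison only)."""
    total = sum(sum((a - b) ** 2 for a, b in zip(x, s)) for s in samples)
    return 1.0 / (1.0 + total / len(samples))
```

Only `k`, `mean` and `scalar` are kept between calls, so the memory footprint does not grow with the length of the data stream, whereas `batch_density` needs every past sample.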

The data is collected continuously, in on-line mode, during the process run. Some of the new data reinforce and confirm the information contained in the previous data. Other data, however, bring new information, which could indicate a change in operating conditions, the development of a fault or simply a more significant change in the dynamics of the process (Angelov, 2002), (Angelov and Buswell, 2002), (Angelov and Filev, 2004), (Angelov and Filev, 2002). The judgment of the importance of the data is made based on their spatial proximity, which corresponds to operating conditions, possibly seasonal variations or different faults.

In order to detect outliers within a data stream, the assumption is that, for a set of features, the normal behaviour of the system is invariant. We understand “invariant” as a state/regime which is not substantially oscillatory but, obviously, may vary within the operating regime boundaries of a real industrial system, that is, within 3 standard deviations in terms of data density. The vector $x_k$ is an $n$-dimensional vector composed of the values of the $n$ selected features at the discrete time step $k$.

The feature selection procedure is an important stage of the overall problem, since the set of selected features represents the overall idea of density variation. It is defined from the input/output variable space and possible pre-processing operations.

It is important to stress that such an on-line fault detection approach, since it is based entirely on the concept of density in the data space (RDE), is highly suitable in conditions where it is not possible to perform a training stage or to predetermine all possible faults. Unexpected faults can appear over time, particularly in dynamic environments such as the operation of industrial processes. Neural networks, for example, due to their intrinsic nature, are often restricted to very narrow settings, neglecting the implicit evolution of the environment due to variations in the raw materials, contamination and other reasons (ageing equipment, etc.). Traditional models, such as neural networks, start to drift and a re-calibration is needed. The proposed method does not suffer from this disadvantage because it is adapting and evolving. This is a crucial matter for the evolving systems field of study.

2.2. On-line Fault Detection Based on RDE

The on-line fault detection procedure starts with the initialisation of the time-step counters $k = 1$ and $k_s = 1$. While $k$ counts the number of data samples which are read (hence, the total number of iterations of the algorithm), $k_s$ counts the number of time steps in which the system remains in the same status (“normal”/“fault”). The variable status is also initialised with the value “normal”.

From this point, the $n$-dimensional input data sample $x_k$ is read from one of the system interfaces, e.g. text file, database or industrial real-time protocol. In the first execution ($k = 1$), the density ($D(x_k) = 1.0$), the mean value of density ($\mu_D = D(x_k)$), $\mu_k$ and $\Sigma_k$ are initialised, and the time steps $k$ and $k_s$ are incremented by 1 ($k = k + 1$, $k_s = k_s + 1$).

From the second time step ($k > 1$) onwards, the variables $\mu_k$, $\Sigma_k$ and $D(x_k)$ are recursively updated by equations (3), (4) and (2), respectively. The variable $\Delta D$ is then calculated as the absolute value of $D(x_k) - D(x_{k-1})$, where $D(x_k)$ is the density calculated for the current data sample ($x_k$) and $D(x_{k-1})$ is the density calculated for the immediately previous data sample ($x_{k-1}$). Note that we only need to store one previous value of $D$.

The mean of density ($\mu_D$) is now calculated as follows:

$$\mu_D = \left(\frac{k_s - 1}{k_s}\mu_D + \frac{1}{k_s}D(x_k)\right)(1 - \Delta D) + D(x_k)\,\Delta D \tag{5}$$

This information will be used as a measure for deciding whether the system should enter or exit a faulty state. Since it is recursively calculated, it does not require storing any previous values in the memory, which is appropriate for an on-line approach. The calculation of $\mu_D$ follows the premise of equation (3); however, it is much less conservative, in that $\mu_D$ is based on the past values of $D$ but is also sensitive to abrupt changes. The coefficient $(1 - \Delta D)$ will lead $\mu_D$ towards the actual mean of density when there is a smooth change in the signal, and $\Delta D$ will lead $\mu_D$ towards the new value of $D(x_k)$ in the presence of an abrupt change.
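The two limiting behaviours of equation (5) can be checked with a short numerical sketch. The function name `update_mean_density` and its argument names are hypothetical; the body is a direct transcription of equation (5), assuming $k_s \geq 1$ and densities in the interval $(0, 1]$.

```python
def update_mean_density(mu_d, d_now, d_prev, ks):
    """Eq. (5): blend the running mean of D with the newest density value.

    mu_d   -- previous mean of density
    d_now  -- D(x_k), density of the current sample
    d_prev -- D(x_{k-1}), density of the previous sample
    ks     -- number of time steps spent in the current status (ks >= 1)
    """
    delta = abs(d_now - d_prev)                    # Delta D
    running = (ks - 1) / ks * mu_d + d_now / ks    # recursive mean, as in Eq. (3)
    return running * (1.0 - delta) + d_now * delta
```

With an unchanged density (`d_now == d_prev`), the result is the ordinary recursive mean; with a jump from 0.9 to 0.1, the weight of 0.8 on the new value pulls the mean down to about 0.205 in a single step.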

At this point, the following scenarios can occur:

a) If the current status of the system is “normal” and $D(x_k) < \mu_D$ for the past 2 seconds, Then change the status to “fault” and re-initialise $k_s$ ($k_s = 0$)

b) Else If the current status of the system is “fault” and $D(x_k) \geq \mu_D$ for the past 8 seconds, Then change the status to “normal” and re-initialise $k_s$ ($k_s = 0$)

c) Else do nothing.

Note that in cases a) and b), we use two intuitive enter/exit thresholds (2 seconds and 8 seconds, respectively). After 2 seconds with the density below the mean, the system will enter a faulty state and, after 8 seconds with the density above the mean, the system will exit a faulty state. These values represent a good trade-off between response time and robustness of the detection system and are based on the order of magnitude of the process (in this case, we are working with a fast-response plant, thus, seconds). Note also that 2 and 8 seconds do not directly give the number of time steps ($k$). For a process with a sampling period equal to 100 ms (10 Hz), for example, 8 seconds will be equal to 80 time steps.
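The enter/exit logic of cases a) to c) can be sketched as a small state machine. The class name `StatusSwitch` and its attributes are hypothetical, and the sketch assumes that “for the past 2 (or 8) seconds” means that the condition held on every consecutive sample in that interval; the thresholds in seconds are converted to time steps using the sampling period.

```python
class StatusSwitch:
    """Hysteresis between "normal" and "fault" driven by D(x_k) and mu_D."""

    def __init__(self, sample_period_s, enter_s=2.0, exit_s=8.0):
        # Convert the thresholds from seconds to a number of time steps.
        self.enter_steps = round(enter_s / sample_period_s)
        self.exit_steps = round(exit_s / sample_period_s)
        self.status = "normal"
        self.run = 0  # consecutive steps on which the switching condition held

    def step(self, density, mean_density):
        """Process one sample; return True if the status changed."""
        if self.status == "normal":
            condition, needed = density < mean_density, self.enter_steps
        else:
            condition, needed = density >= mean_density, self.exit_steps
        self.run = self.run + 1 if condition else 0
        if self.run >= needed:
            self.status = "fault" if self.status == "normal" else "normal"
            self.run = 0
            return True   # the caller would re-initialise k_s here
        return False      # case c): do nothing
```

For a 100 ms sampling period, `StatusSwitch(0.1)` requires 20 consecutive samples below the mean to enter the faulty state and 80 consecutive samples at or above the mean to leave it.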

The iteration then ends and the procedure starts again from the reading of the next data sample $x_k$, with $k = k + 1$ and $k_s = k_s + 1$. Since it is an on-line process, the total number of readings and iterations is theoretically unbounded and in practice can be defined by the user.

The proposed recursive procedure for on-line fault detection using density estimation is detailed in Figure 1.

3. Fault Identiﬁcation Stage by AutoClass

Fuzzy rule-based (FRB) systems have been successfully applied to different classification tasks (Angelov and Zhou, 2008) including, but not limited to, decision making, pattern recognition, image processing and, of course, fault identification. The challenges faced by information processing, and by classification in particular, are related to: i) the need to cope with huge amounts of data, and ii) the need to process streaming data online and in real time (Fayyad et al., 1996), (Angelov, 2012b). Storing the complete dataset and analysing the data streams in an offline (batch) mode is often impossible or impractical, and data streams are very often non-stationary.

Thus, in order to overcome these problems, in the second stage of the proposed approach we introduce an AnYa-like FRB classifier, capable of identifying different types of faults in a hydraulic pilot plant application. The proposed algorithm is called AutoClass and is described as follows.

Figure 1: Proposed fault detection algorithm

3.1. AutoClass Algorithm

Unlike traditional Mamdani (Mamdani and Assilian, 1975) and Takagi-Sugeno (TS) (Takagi and Sugeno, 1985) fuzzy systems, AnYa does not require an explicit definition of fuzzy sets (and their corresponding membership functions) for each input variable. Instead, AnYa applies the concepts of data clouds (Angelov and Yager, 2012) and relative data density to define antecedents that represent exactly the real data density and distribution and that can be obtained online from data streams.

Data clouds are subsets of previous data samples with common properties (closeness in the data space) (Angelov and Yager, 2012). Contrary to traditional membership functions (MFs), they represent directly and exactly all the (previous) data samples. A given data sample can belong to all the data clouds with a different degree $\gamma \in [0, 1]$, thus the fuzziness in the model is preserved. It is important to stress that data clouds are different from traditional clusters in that they do not have specific shapes and, thereby, do not require the definition of boundaries or parameters.

AutoClass, like any other classifier, is a mapping from the feature space to the class label space. It is important to stress that the set of labels in AutoClass is auto-generated. A general FRB classifier describes, with its antecedent part, a fuzzy partitioning of the feature space $x \in \mathbb{R}^n$ and, with its consequent part, the class label $Class_i$, $i = [1, K]$. The structure of AutoClass follows the construct of an AnYa FRB system:

$$R_i: \text{IF } \vec{x} \sim X_i \text{ THEN } Class_i \tag{6}$$

where $\sim$ denotes the fuzzy membership expressed linguistically as “is associated with”, $X_i \in \mathbb{R}^n$ is the $i$-th data cloud defined in the input space, $\vec{x} = [x_1, x_2, \ldots, x_n]^T$ is the vector of features and $Class_i$ is the label of the class of the $i$-th data cloud.

The inference in AutoClass is produced using the well-known “winner takes all” rule (Ishibuchi et al., 1995):

$$Class = Class_{i^*}, \quad i^* = \arg\max_{i=1}^{n}(\gamma_i) \tag{7}$$

where $\gamma_i$ denotes the degree of membership of the data sample vector $x_k$ to the data cloud $N_i$, defined here as a normalised relative density, as follows (Angelov and Filev, 2004):

$$\lambda_k^i = \frac{\gamma_k^i}{\sum_{j=1}^{n} \gamma_k^j} \tag{8}$$

where $\gamma_k^i$ is the local density of the $i$-th cloud estimated from that data sample.

This local density is defined by a Cauchy function over the distance between $x_k$ and all the other samples in the data cloud, which can be recursively computed (Angelov, 2012b) as

$$\gamma_k^i = \frac{1}{1 + \|x_k - \mu_k\|^2 + \Sigma_k - \|\mu_k\|^2} \tag{9}$$

where $\gamma_k^i$ denotes the relative density to the $i$-th data cloud calculated at the $k$-th time instant, $\mu_k$ denotes the mean and $\Sigma_k$ the scalar product for the data sample $x_k$, calculated by equations (3) and (4), respectively.
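The inference step can be illustrated with a short sketch. It assumes that each data cloud keeps its own mean and scalar product, updated with the samples associated with that cloud; the function names and the `(label, mean, scalar)` tuple layout are hypothetical choices made for this illustration.

```python
def cloud_density(x, mean, scalar):
    """Eq. (9): local density of x with respect to one data cloud."""
    dist2 = sum((v - m) ** 2 for v, m in zip(x, mean))
    mean2 = sum(m * m for m in mean)
    return 1.0 / (1.0 + dist2 + (scalar - mean2))


def classify(x, clouds):
    """Winner-takes-all inference over a list of (label, mean, scalar) clouds.

    Returns the winning label and the normalised memberships of Eq. (8).
    """
    gammas = [cloud_density(x, mean, scalar) for _, mean, scalar in clouds]
    total = sum(gammas)
    lambdas = [g / total for g in gammas]                              # Eq. (8)
    winner = max(range(len(clouds)), key=lambdas.__getitem__)          # Eq. (7)
    return clouds[winner][0], lambdas
```

Because the normalisation in equation (8) divides every density by the same sum, it does not change which cloud wins; it only rescales the memberships so that they sum to one.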

The AutoClass algorithm starts with the definition of the initial “zone of influence” by the user. Although the concept of data clouds differs from traditional clusters in the sense that there are no well-defined boundaries, we still use a measure of the zone of influence of a data cloud. This is the only user-defined parameter in the two stages of the proposal, and it is very intuitive for the user. Too large a value of the zone of influence $r$ leads to averaging; too small a value leads to over-fitting. Initial values of $r \in [0.3, 0.5]$ can be recommended (Angelov and Filev, 2004), assuming the feature range is $[0, 1]$ (normalised). Then, the first data sample is read at the time step $k = 1$.

Initially, the rule base is completely empty, which means that no fuzzy inference rules, data clouds or labels have been created yet. After reading the first data sample, a data cloud $nc$ is created. The focal point of $nc$ (in this case, the name given to the mean of the data samples) will be the data sample $x_k$ itself and the zone of influence is the initial zone of influence (initialZI) defined by the user. Since $x_k$ for $k = 1$ is the first data sample, the number of points associated with $nc$ will be 1. The newly created cloud $nc$ is added to the vector clouds, and a label $Class_1$ as the consequent part completes the first inference rule:

$$R_1: \text{IF } \vec{x} \sim cloud_1 \text{ THEN } Class_1 \tag{10}$$

Note that there is no need for storing all read data samples. The information representing an

existing cloud includes its focal point (mean), its zone of inﬂuence, its density and the number

of points associated with the referred cloud. This is very important for decreasing computational

eﬀort in on-line executions.

From the second iteration (k=2) onwards, AutoClass will work with the existing fuzzy rule

base, updating the existing rules and creating new ones when necessary. Note that the number of

steps is not pre-deﬁned since AutoClass is performed on-line.

With each subsequent data sample $x_k$ that is read, for $k > 1$, two scenarios can occur: a) the data sample $x_k$ is associated with an existing data cloud, or b) the data sample $x_k$ is not within the zone of influence of any existing cloud, which means that $x_k$ is either i) an outlier, or ii) a point that may in future create a new data cloud.

In the first case a), a point is considered close to a cloud if it lies within two times the zone of influence of that cloud, and all clouds which exercise some influence over $x_k$ are updated (note that we again use the Euclidean type distance; however, other approaches are also acceptable). This is a very important step in order to preserve the fuzzy aspect of the system. For each affected cloud $cc$, the following steps are performed by AutoClass:

• The focal point (mean) of cc is updated. The amount of shift with respect to the previous focal point is defined by i) the location of x_k in the n-dimensional space of the selected features and ii) the number of points already under the influence of the cloud cc. In this way, the update equation for the focal point of cc is a weighted sum of the current focal point and the new data sample x_k, weighted by the number of points associated with cc. The more populated the cloud is, the less its focal point is driven towards x_k in the n-dimensional feature space.

• The zone of influence of cc is updated. Following the same idea as for the focal point, the amount of shift with respect to the previous zone of influence is defined by i) the distance from x_k to the current focal point of cc in the n-dimensional space of the selected features and ii) the number of points already under the influence of the cloud cc. In this way, the update equation for the zone of influence of cc is a weighted sum of the current zone of influence and the distance from x_k to the current focal point, weighted by the number of points associated with cc. Here, we used the Euclidean distance, but alternative forms of distance can be used as well. The more populated the cloud is, the less its zone of influence is increased or decreased. Note that, after a number of time steps, the lengths of the projections of the zone of influence of each cloud will be considerably different from those of the other existing clouds. Denser clouds tend to decrease their zones of influence, while sparse clouds tend to increase theirs.

• The number of points under the influence of the cloud cc is increased by 1.
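The three update steps above can be sketched as follows. The exact update equations are not reproduced in this passage, so the sketch assumes the simplest weighted sum consistent with the description: a running mean with weights N/(N+1) and 1/(N+1), where N is the number of points already in the cloud. The per-feature treatment of the zone of influence and the function names `is_close` and `update_cloud` are likewise assumptions.

```python
import math


def is_close(focal_point, zone, x):
    """True if x lies within twice the zone of influence of the cloud.

    Assumption: the zone is a per-feature vector, so the Euclidean
    distance is taken after scaling each feature by 2 * zone[j].
    """
    scaled = sum(((xj - cj) / (2.0 * rj)) ** 2
                 for xj, cj, rj in zip(x, focal_point, zone))
    return math.sqrt(scaled) <= 1.0


def update_cloud(focal_point, zone, n_points, x):
    """Return the updated (focal_point, zone, n_points) after absorbing x."""
    w_old = n_points / (n_points + 1.0)  # weight of the accumulated history
    w_new = 1.0 / (n_points + 1.0)       # weight of the new sample
    # Per-feature distance from x to the focal point before it is shifted.
    dist = [abs(xj - cj) for xj, cj in zip(x, focal_point)]
    new_focal = [w_old * cj + w_new * xj for cj, xj in zip(focal_point, x)]
    new_zone = [w_old * rj + w_new * dj for rj, dj in zip(zone, dist)]
    return new_focal, new_zone, n_points + 1
```

With these weights, a cloud holding many points barely moves when a new sample arrives, while its zone of influence drifts towards the typical distance of its samples from the focal point, which matches the qualitative behaviour described above.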

In the second case b), the point x_k is not close to any existing cloud and is considered a temporary outlier. Over time, a certain number of outliers close to each other can form a new cloud. AutoClass stores the outliers in a small vector, so that an immediate outlier, which may later belong to a cloud, is not discarded. Note that this vector does not significantly increase the computational effort of AutoClass, since its size is limited. The maximum size of the outliers vector is the smaller of 100 and 5% of the current k. These values represent a good trade-off between the accessibility of past data samples and the memory needed for execution. If, after reading a new data sample x_k, the maximum size of the vector is exceeded, the oldest stored data sample is removed.
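A bounded outlier store of this kind can be sketched with a double-ended queue. The rounding of 5% of k is not specified in the text, so the sketch assumes integer division and a minimum capacity of one sample; `max_outliers` and `store_outlier` are hypothetical names.

```python
from collections import deque


def max_outliers(k):
    """Capacity of the outlier vector: the smaller of 100 and 5% of k.

    Assumptions: 5% of k is rounded down, and at least one slot is kept.
    """
    return min(100, max(1, k // 20))


def store_outlier(outliers, x, k):
    """Append x and drop the oldest samples if the capacity is exceeded."""
    outliers.append(x)
    while len(outliers) > max_outliers(k):
        outliers.popleft()  # remove the oldest stored sample
    return outliers


outliers = deque()
store_outlier(outliers, [0.1, 0.2], k=40)
store_outlier(outliers, [0.3, 0.4], k=41)
store_outlier(outliers, [0.5, 0.6], k=42)  # capacity is 2, oldest is dropped
```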

After updating the outliers vector, two scenarios can occur: i) there are enough stored outliers close to each other to create a new cloud, and the density of this potential new cloud is higher than the average density of all existing clouds, or ii) x_k is an actual outlier and is temporarily ignored. The number of mutually close outliers necessary to form a new cloud, here called minPoints, is defined as the maximum of 3 and 15% of the total number of points of the least populated cloud. In this way, the formation of a cloud depends not only on a fixed minimum number of points (in this case, >= 3), but also on the size of the existing clouds and, consequently, on the number of time steps/data samples read so far, which avoids size disparities between the existing and the newly created clouds. The density is also a crucial factor to be considered since, together with the number of points, it reflects the informativeness of the new cloud. Here, we use the concept of relative local density, measured for each existing cloud and calculated by equation (8).

If both conditions are satisfied, namely that the number of close outliers is higher than minPoints and that the density of the new candidate cloud is higher than the average of the densities of all existing clouds, the following steps are performed by AutoClass:

• A new cloud nc is created.

• The focal point of nc is defined as the mean of all data samples associated with nc (former outliers), here called µ_x.

• The zone of influence of nc is defined as the average of i) the mean of the zones of influence of all existing clouds and ii) the initial zone of influence defined at the beginning of the algorithm. Note that this relation considers both the already updated zones of influence of the existing clouds and the initial value defined by the user. This is a conservative feature of AutoClass, which merges the current knowledge base of the system with the expertise of the operator.

• The number of points under the influence of the new cloud nc is set to the number of stored outliers close to each other that were considered in the creation of nc.

• The former outliers, which are now part of the cloud nc, are removed from the outliers vector.

• The newly created i-th cloud nc is added to the vector clouds, and a label Class i as the consequent part completes the i-th inference rule:

Ri: IF ~

x∼cloudsiTHEN Classi(11)

It should be noted that the class labels are generated automatically in sequence (“Class 1”, “Class 2” and so on) as different faults are detected. Of course, these labels do not represent the actual type or location of the fault, but they are very useful for distinguishing different faults. Since there is no training or pre-definition of faults or models, the correct labelling can be performed in a semi-supervised manner by the human operators, without requiring prompt/synchronised actions from the user.

Finally, the time step k is incremented by 1 (k = k + 1) and the algorithm continues by reading the next data sample, x_k. The full procedure is detailed in Figure 2.

Figure 2: AutoClass algorithm

3.2. On-line Fault Classiﬁcation Based on AutoClass

Fault identification, the second stage of the FDI scheme, can be viewed as a classification problem. The overall idea of the proposed approach is to select specific features, which can be process variables or attributes, and to cluster the incoming data on-line in the n-dimensional feature space. The AutoClass algorithm is responsible for generating and updating fuzzy inference rules in a fully unsupervised manner, creating the different classes to which each data sample that is read will be assigned.

As an evolving classifier, and unlike traditional fuzzy models, AutoClass is able to change its structure, growing and updating when necessary, and hence presents a higher level of adaptation (Angelov and Kasabov, 2006). This means that inference rules, and not only parameters, can be created or updated at each time step, and that they represent new types of faults discovered autonomously from the data pattern.

The main goal is to spatially separate data belonging to different plant operating states/regimes and to group the data belonging to similar states. Here we stress again the importance of the feature selection procedure, as mentioned in Subsection 2.1. The choice of which process variables or attributes (processed variables) to monitor is crucial when developing a classification system. The selected features need to reflect the differences among the operating states of the plant, and a good trade-off between the number of features and the computational effort must be reached. While a large number of selected features will ensure a more realistic representation of the data, the computational requirements might be prohibitive. With a small number of selected features, on the other hand, the computational effort is kept to a minimum, but the system may not be able to distinguish different classes. Once again, feature selection procedures are extensively discussed in the literature and will not be detailed in this paper.

Here, two process attributes were selected to form the 2-dimensional feature space:

• Feature 1: The period of the control signal. In most generated faults, the control signal u assumes a periodic behaviour, with nearly constant intervals. This measure can be used to distinguish different classes of faults.

• Feature 2: The amplitude of the control signal. Different amplitudes of u can be used to distinguish both different classes of faults and different levels of the same fault.

At each iteration, if the proposed fault detection algorithm signals a faulty state, AutoClass receives as input the 2-dimensional data vector x = {Feature 1, Feature 2}. Note that, although we are presenting a two-stage algorithm for detection and identification, both stages can be used separately and combined with other existing approaches. The fault can then be automatically associated with a previous similar fault (as will be shown later), or a new data cloud can be initiated.
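The text does not state how the period and amplitude of the control signal are measured. As one possible illustration only, the sketch below estimates the amplitude as half of the peak-to-peak range over a window, and the period as the average time between upward crossings of the window mean. Both estimators, the function name `control_signal_features` and the synthetic test signal are assumptions.

```python
import math


def control_signal_features(u, sampling_period):
    """Return (period, amplitude) estimated from a window of samples u.

    The period is None when fewer than two upward mean crossings are found.
    """
    mean_u = sum(u) / len(u)
    amplitude = (max(u) - min(u)) / 2.0  # half of the peak-to-peak range
    # Indices where the signal crosses its mean from below.
    crossings = [i for i in range(1, len(u)) if u[i - 1] < mean_u <= u[i]]
    if len(crossings) < 2:
        return None, amplitude
    intervals = [b - a for a, b in zip(crossings, crossings[1:])]
    period = sampling_period * sum(intervals) / len(intervals)
    return period, amplitude


# Synthetic control signal: 2 s period, unit amplitude, sampled every 100 ms.
u = [math.sin(math.pi * (0.1 * i + 0.05)) for i in range(100)]
period, amplitude = control_signal_features(u, sampling_period=0.1)
x = [period, amplitude]  # 2-dimensional feature vector passed to the classifier
```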

4. Experimental Setup

To validate our proposal, we used a pilot plant for industrial process control (Marins, 2009). The pilot plant allows the study of continuous process control based on the four typical variables, namely pressure, temperature, flow and tank level, defined here as the input space vector S = (u, t, f, y)^T.

The pilot plant includes (DeLorenzo, 2009): indicators and sensors; transmitters that convert the physical signals into electrical ones, to be processed by the programmable logic controller (PLC); a terminal bus, where all electrical signals are available to an external controller; and supervisory control and data acquisition software for parametric configuration and process visualisation. It is composed of: a control panel with the PLC and all electrical components for plant control; two pressurised vessels, one made of acrylic, T1, and one made of stainless steel, T2; a centrifugal recirculation pump controlled by a frequency inverter; a heater and a heat exchanger; two directional valves, V1 and V2; temperature, pressure, flow and level sensors; and an electrical power controller.

Figure 3: Pilot plant scheme

The two tanks are connected by a piping system, which enables liquid flow between them. The plant is arranged so that the liquid can be transferred in both directions. It should be mentioned that T1 is positioned above T2 in relation to the ground level. The liquid always circulates in the same way: from T1 to T2 by gravity, and from T2 to T1 by the pressure generated by the centrifugal pump. The plant scheme is shown in Figure 3 (Costa et al., 2013).

In this work, we have considered only the liquid level application. The plant is controlled by a multistage fuzzy controller, developed in the JFuzZ software tool (Costa et al., 2010), through an OPC (OLE for Process Control) interface (Liu et al., 2005; Schwarz and Boercsoek, 2007). The behaviour generated by the controller represents the “normal” state of operation of the plant. The details of the controller implementation are presented in (Costa et al., 2012). Figure 4(a) illustrates the variables level (y, observed variable), reference (r, user-defined set point) and pressure (u, control action) to the pump, as the vector x = (r, y, u), for r = 0.5 (50% of the maximum capacity of the tank), within a normal state of operation. The evolution of the density (equation (2)) is shown in Figure 4(b).

It should be noted that the transient state of control, after a change of the set point, is ignored when calculating the density. The level (observed variable) then reaches the reference and remains stable, with the error (e = r − y) close to zero and no significant oscillation. Likewise, the control signal is nearly constant, apart from a minor oscillation due to the noise intrinsic to real applications and industrial environments. From now on, this dynamic pattern (Figure 4) will be our reference for the “normal” operation of the plant, and signals that vary significantly from it may be interpreted as faulty states.

Figure 4: Plant in normal operating state. (a) Variables chart; (b) density chart.

The subject of this study is a set of 16 different faults, most of them physically generated in the pilot plant. The faults are divided into 4 groups: actuator, leakage, stuck valves and disturbance-related.

Each group contains experiments with diﬀerent patterns and levels. In the “actuator” group,

there are 6 levels of oﬀsets in the centrifugal pump; in the “structural” group there are 3 levels of

open drain, which simulate a physical leakage in the tank T1, and 3 levels of jamming of each

valve; in the “disturbance” group there is 1 environmental disturbance with the manual addition

of water to the tank. All generated faults are described in Table 1.

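Table 1: Set of generated faults

| Fault ID | Group | Type | Fault level |
|---|---|---|---|
| F1 | Actuator | Positive offset | +2% |
| F2 | Actuator | Positive offset | +4% |
| F3 | Actuator | Positive offset | +8% |
| F4 | Actuator | Negative offset | -2% |
| F5 | Actuator | Negative offset | -4% |
| F6 | Actuator | Negative offset | -8% |
| F7 | Structural | Tank leakage | 33% |
| F8 | Structural | Tank leakage | 66% |
| F9 | Structural | Tank leakage | 100% |
| F10 | Structural | Stuck valve 1 | 30% |
| F11 | Structural | Stuck valve 1 | 50% |
| F12 | Structural | Stuck valve 1 | 85% |
| F13 | Structural | Stuck valve 2 | 25% |
| F14 | Structural | Stuck valve 2 | 50% |
| F15 | Structural | Stuck valve 2 | 75% |
| F16 | Disturbance | Environment disturbance | Low |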

5. Results

The experiment was divided into two stages, i) detection and ii) identification, in which a set of different faults was analysed separately.

5.1. Fault detection results

For comparison purposes, we first analysed the faulty process data with a statistical process control (SPC) application, which is a well-known approach to outlier detection in industrial processes. The details of the procedure have been extensively presented in the literature (Hossain et al., 1996; Martin et al., 1996; Cook et al., 1997; Liukkonen and Tuominen, 2004; Kano et al., 2010). In this paper, the SPC algorithm was implemented in Java and performed on-line over 100 data samples at each time step, which represents 10 seconds of the real process time frame in this application (sampling frequency of 10 Hz). The variables monitored in this experiment are the control signal (u) and the error signal (e).

The results of the SPC approach are usually presented as X-Bar charts (Hossain et al., 1996). An X-Bar chart shows the behaviour over time of the monitored variable, together with its upper and lower limits. Figure 5 presents the resulting X-Bar charts for the fault F11, with 5(a) showing the control signal u and 5(b) showing the error e. Indications of outliers and normal states are also highlighted in the image.

Figure 5: Results for fault F11 with SPC application. (a) X-Bar control signal chart; (b) X-Bar error chart.
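For readers unfamiliar with X-Bar charts, the sketch below shows a simplified textbook form of the idea: control limits are placed three standard errors around the grand mean of reference subgroups, and a new subgroup is flagged when its mean falls outside them. This is not necessarily the configuration used in the experiments; the limit formula, the function names and the toy data are assumptions.

```python
import math
import statistics


def xbar_limits(reference_subgroups):
    """Return (lower, centre, upper) limits from equally sized subgroups."""
    n = len(reference_subgroups[0])
    means = [statistics.fmean(g) for g in reference_subgroups]
    centre = statistics.fmean(means)
    # Pooled within-subgroup standard deviation (sample variances averaged).
    pooled_var = statistics.fmean([statistics.variance(g)
                                   for g in reference_subgroups])
    half_width = 3.0 * math.sqrt(pooled_var) / math.sqrt(n)
    return centre - half_width, centre, centre + half_width


def is_out_of_control(subgroup, limits):
    """True if the subgroup mean lies outside the control limits."""
    lower, _, upper = limits
    mean = statistics.fmean(subgroup)
    return mean < lower or mean > upper


reference = [[0.9, 1.1, 1.0, 1.0], [1.0, 1.2, 0.8, 1.0]]
limits = xbar_limits(reference)
```

Note that this kind of scheme needs a window of stored samples to compute each subgroup mean and the limits.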

After the ﬁrst round of experiments, the same data was analysed with the new approach.

The proposed algorithm was also implemented in Java language and performed on-line. The

variables monitored in this experiment are also the control signal (u) and the error (e). Figure

6 represents the resulting charts, with 6(a) showing the control behaviour and 6(b) showing the

density evolution, also for the fault F11. The reference (r), tank level (y), control signal (u) are

highlighted in the image. Note, also, that black and grey vertical bars indicate the beginning and

the end of faulty states, respectively.

For comparison purposes, we analyse here i) the hit and miss rates, which are complementary and are calculated as the sum of hits/misses with respect to the correct classification, both when the system is operating normally and when it is under a fault, and ii) the execution time on the same machine. The results of all 16 experiments with the SPC and the proposed approaches are detailed in Table 2.
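As a small illustration of how such complementary rates can be computed, the sketch below compares a per-sample detector output with the true state of the plant. The function name and the Boolean encoding (True for "faulty", False for "normal") are assumptions.

```python
def hit_miss_rates(predicted, actual):
    """Return (hit_rate, miss_rate) in percent for per-sample decisions."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    hit_rate = 100.0 * hits / len(actual)
    return hit_rate, 100.0 - hit_rate  # the two rates are complementary


# Toy example: the detector reacts one sample late to the fault.
actual = [False, True, True, True, False]
predicted = [False, False, True, True, False]
hit_rate, miss_rate = hit_miss_rates(predicted, actual)
```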

Figure 6: Results for fault F11 with the proposed application. (a) Control behaviour chart; (b) density.

Table 2: Results with SPC and RDE fault detection algorithms

| Fault | Samples | Execution time SPC (ms) | Execution time RDE (ms) | Hit rate SPC (%) | Hit rate RDE (%) | Miss rate SPC (%) | Miss rate RDE (%) |
|---|---|---|---|---|---|---|---|
| F1 | 973 | 728 | 568 | 64.13 | 97.84 | 35.87 | 2.16 |
| F2 | 1384 | 605 | 389 | 71.46 | 98.48 | 28.54 | 1.52 |
| F3 | 1535 | 360 | 271 | 50.23 | 98.63 | 49.77 | 1.37 |
| F4 | 1696 | 284 | 277 | 56.66 | 96.70 | 43.34 | 3.30 |
| F5 | 2174 | 397 | 161 | 61.41 | 96.46 | 38.59 | 3.54 |
| F6 | 1379 | 332 | 308 | 74.76 | 98.48 | 25.24 | 1.52 |
| F7 | 2046 | 221 | 171 | 50.29 | 48.36 | 49.71 | 51.64 |
| F8 | 2422 | 570 | 275 | 45.46 | 75.59 | 54.54 | 24.41 |
| F9 | 1632 | 293 | 351 | 61.64 | 95.46 | 38.36 | 4.54 |
| F10 | 2241 | 352 | 247 | 45.78 | 98.93 | 54.22 | 1.07 |
| F11 | 2319 | 241 | 293 | 62.40 | 88.61 | 37.60 | 11.39 |
| F12 | 1851 | 334 | 173 | 52.30 | 77.14 | 47.70 | 22.86 |
| F13 | 1969 | 302 | 218 | 48.25 | 66.92 | 51.75 | 33.08 |
| F14 | 2302 | 505 | 312 | 33.62 | 90.31 | 66.38 | 9.69 |
| F15 | 1766 | 290 | 173 | 46.21 | 98.75 | 53.79 | 1.25 |
| F16 | 1744 | 524 | 145 | 61.30 | 64.86 | 38.70 | 35.14 |

While both approaches used in the experiments are on-line and data-driven, they perform quite differently, with the proposed fault detection approach demonstrating a substantial improvement over the SPC application. While SPC obtained an average hit rate of only 55.37% over the 16 experiments, the proposed fault detection approach achieved an average hit rate of 86.97%. Individually, the proposed approach also produced better results for 15 of the 16 faults analysed. The robust nature of the proposal should also be noted. While several switches from the “normal” to the “faulty” state and vice versa can be visually identified on the charts during the execution of the SPC algorithm, the proposed approach is clearly more conservative in deciding when to enter or exit a “faulty” state. In this sense, the proposed detection system tries to ignore transient signals and to present a more accurate alert to the user.

Another aspect to be considered in this comparison is the execution time of the two algorithms. The total execution time for the 16 data files with the proposed detector is 31.65% lower than with the widely used SPC algorithm. The main reason for the better performance is that the proposed approach needs neither to store past data samples nor to perform off-line calculations such as the mean, standard deviation and so on. In the proposed algorithm, the density and its mean are calculated recursively, and their values are updated at each time step, without the need to store any previous data samples.
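The aggregate figures quoted in this comparison can be reproduced from the per-fault values of Table 2. The short check below uses the execution times and hit rates exactly as listed in the table; the only assumption is that the overall hit rates are unweighted means over the 16 experiments, which is what reproduces the quoted values.

```python
# Per-fault values from Table 2, in the order F1 to F16.
time_spc = [728, 605, 360, 284, 397, 332, 221, 570,
            293, 352, 241, 334, 302, 505, 290, 524]
time_rde = [568, 389, 271, 277, 161, 308, 171, 275,
            351, 247, 293, 173, 218, 312, 173, 145]
hit_spc = [64.13, 71.46, 50.23, 56.66, 61.41, 74.76, 50.29, 45.46,
           61.64, 45.78, 62.40, 52.30, 48.25, 33.62, 46.21, 61.30]
hit_rde = [97.84, 98.48, 98.63, 96.70, 96.46, 98.48, 48.36, 75.59,
           95.46, 98.93, 88.61, 77.14, 66.92, 90.31, 98.75, 64.86]

mean_hit_spc = sum(hit_spc) / len(hit_spc)      # about 55.37 %
mean_hit_rde = sum(hit_rde) / len(hit_rde)      # about 86.97 %
faults_rde_better = sum(r > s for r, s in zip(hit_rde, hit_spc))  # 15
total_spc = sum(time_spc)                       # 6338 ms
total_rde = sum(time_rde)                       # 4332 ms
time_reduction = 100.0 * (total_spc - total_rde) / total_spc  # about 31.65 %
```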

It is important to highlight that the main idea of the approach, as in other fully on-line algorithms with no training stage, is based on the concept of “normality”. This means that, by default, the algorithm will consider the most frequent operating point as the “fault-free” state. It is important to consider the fact that faulty states are usually not dense at all. Analysing the charts of Figure 6, for example, it is easy to see that, even though the system is in a faulty state for the majority of the experiment, the density signal does not rise continuously. In practice, therefore, a fault can be detected very early if the faulty data presents an oscillatory (not dense) behaviour, which is what often occurs.
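The recursive calculation of the density can be illustrated with the form of RDE commonly given in the literature, in which the density of sample x_k is 1 / (1 + ||x_k − µ_k||² + X_k − ||µ_k||²), with the mean µ_k and the mean squared norm X_k both updated recursively. The sketch below assumes this form, which may differ in detail from equation (2) of the paper; the function name `rde_stream` is hypothetical.

```python
def rde_stream(samples):
    """Yield (density, mean_density) for each sample, using O(n) memory."""
    mean = None          # recursive mean of the samples
    sq_norm = 0.0        # recursive mean of the squared norms
    mean_density = 0.0   # recursive mean of the density itself
    for k, x in enumerate(samples, start=1):
        x_sq = sum(v * v for v in x)
        if k == 1:
            mean = list(x)
            sq_norm = x_sq
        else:
            mean = [(k - 1) / k * m + v / k for m, v in zip(mean, x)]
            sq_norm = (k - 1) / k * sq_norm + x_sq / k
        dist_sq = sum((v - m) ** 2 for v, m in zip(x, mean))
        mean_sq = sum(m * m for m in mean)
        density = 1.0 / (1.0 + dist_sq + sq_norm - mean_sq)
        mean_density = (k - 1) / k * mean_density + density / k
        yield density, mean_density


# Nine samples at the normal operating point, then one far from it.
stream = [[0.0, 0.0]] * 9 + [[3.0, 4.0]]
results = list(rde_stream(stream))
```

In this toy stream the density stays at 1 while the samples coincide with the mean and drops sharply for the last sample, which is the kind of change a detector can threshold against the recursively updated mean density.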

Figure 7: Fault detection and identification - system status for k = 650. (a) Detection stage - input signals; (b) detection stage - density; (c) identification - 2 selected features.

5.2. Fault identiﬁcation results

The second stage of the proposed approach, which deals with fault identification, is quite unique in the sense that it identifies the types of faults autonomously and in a completely unsupervised manner (without any pre-training or prior knowledge and information). Therefore, it is difficult to compare this new approach directly with any existing alternative approach. We consider a large data stream of sequential faults. The classification process is performed “from scratch”, without any a priori information, by the proposed AutoClass algorithm, fully unsupervised, starting from the first data sample acquired and an empty fuzzy rule base. Note that AutoClass is called only if the system is in a “faulty” state, as detected by the proposed detection algorithm described in the previous section. The progress of the execution and the behaviour of the system are illustrated in the following charts. As in the previous figures, black bars indicate the moment when the fault is detected and grey bars indicate the moment when the system exits a faulty state.

Figure 7 shows the system state 400 data samples after the first fault is detected, where subfigure 7(a) shows the input signals, subfigure 7(b) shows the calculated density and subfigure 7(c) shows the clustering and classification in the 2-dimensional feature space. Since the sampling period of the analysed process is 100 ms, 400 data samples correspond to 40 s, and so on. This convention is also used in the following figures. The fault F2 (actuator with +4% offset) is detected around data sample 250, and AutoClass then creates the first cluster, which represents the first class of faults, automatically named “Class 1” or “Fault type 1”.

After the identification of the first faulty data, the system returns to a “normal” status. The next fault, F4 (actuator with -2% offset), is then detected around data sample k = 1,700. After 400 data samples within the faulty state, AutoClass creates a new class of fault, automatically named “Class 2”. Note that the first data samples within this faulty state are classified as outliers, since the local density of the potential new cluster is not yet high enough to create a new class label, as seen in Figure 8. Subfigure 8(d) shows a zoomed image of the density chart 8(b), focusing on the interval [1,600; 2,200].

The next data stream acquired belongs to the data set with the fault F1 (actuator with +2% offset). Note that F1 and F2, the first identified fault, are in the same class of faults, with different levels of strength (see Table 1). Figure 9 shows the system state after data sample k = 4,000. At this point, no new classes have been added to the existing “Class 1” and “Class 2” set; however, the cluster related to “Class 1” is updated to cover the latest data stream.

Finally, the fault F9 (leakage of 100%) is detected, and it is identified once the third cluster and its corresponding class, “Class 3”, are created. Figure 10 shows the system state after the reading of the last data sample (k = 5,600) and the final classification chart.

The final AnYa rule base, after the execution of the 5,600 data samples, is detailed below.

$$\begin{aligned} R_1 &:\ \text{IF } (\vec{x} \sim \text{cloud}_1) \text{ THEN “Class 1”} \\ R_2 &:\ \text{IF } (\vec{x} \sim \text{cloud}_2) \text{ THEN “Class 2”} \\ R_3 &:\ \text{IF } (\vec{x} \sim \text{cloud}_3) \text{ THEN “Class 3”} \end{aligned} \tag{12}$$

with

cloud_1: c_1 = [0.416, 3.316] and r_1 = [0.251, 0.756]

cloud_2: c_2 = [-0.513, 2.706] and r_2 = [0.250, 0.601]

cloud_3: c_3 = [-0.416, 1.491] and r_3 = [0.197, 0.451]

where c_i is the focal point and r_i is the zone of influence of cloud i.
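To illustrate how a rule base of this kind can be used, the sketch below assigns a new feature vector to the class of the nearest cloud, using the focal points and zones of influence listed above. The decision rule is a simplifying assumption made for this example (smallest Euclidean distance after scaling each feature by the corresponding zone of influence); AnYa-type systems normally compare relative densities, which are not reproduced here. The name `classify` and the sample vectors are hypothetical.

```python
import math

# Final rule base: label -> (focal point c_i, zone of influence r_i).
RULE_BASE = {
    "Class 1": ([0.416, 3.316], [0.251, 0.756]),
    "Class 2": ([-0.513, 2.706], [0.250, 0.601]),
    "Class 3": ([-0.416, 1.491], [0.197, 0.451]),
}


def classify(x, rule_base=RULE_BASE):
    """Return the label of the cloud with the smallest scaled distance to x."""
    def scaled_distance(cloud):
        focal, zone = cloud
        return math.sqrt(sum(((xj - cj) / rj) ** 2
                             for xj, cj, rj in zip(x, focal, zone)))
    return min(rule_base, key=lambda label: scaled_distance(rule_base[label]))


label_a = classify([0.40, 3.30])    # near the focal point of cloud 1
label_b = classify([-0.50, 2.70])   # near the focal point of cloud 2
label_c = classify([-0.40, 1.50])   # near the focal point of cloud 3
```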

It is important to note in Figure 10(c) that, even though the system was able to distinguish faults F1 and F2, which are positive offsets of the actuator, from fault F4, which is a negative offset of the actuator, they are still close together, because all of these faults concern the actuator. Note also that faults F2 and F4 are close to each other, although one has a negative Feature 1 while the other has a positive one. Fault F9, on the other hand, concerns structural changes and, in Figure 10(c), is further from faults F1 and F2; however, since a leakage is logically closer to a negative change, F9 is close to F4.

6. Conclusion

An entirely new approach to FDI of industrial processes is introduced in this paper. With two well-defined and independent stages, the proposed approach is able to perform detection and classification of faults of different types, lengths and levels, in a fully unsupervised and on-line manner, with no a priori knowledge about the process. RDE, which was recently introduced, is used in the first stage for outlier/anomaly detection over data streams. It does not require pre-defined models or user-defined parameters, as standard techniques do, and it is completely data-driven. For the identification stage, a new approach called AutoClass is introduced in this paper. AutoClass can be used for classification problems assuming autonomous labelling, similarly to the self-learning autonomous classifier eClass0, which was also introduced recently and is now widely used in many areas. AutoClass, unlike traditional approaches, works with the concept of data clouds, which are structures with no specific shape, boundaries, centre or parametric function to describe them, and yet they are represented by an aggregated measure (data density). In this paper, the proposed FDI system is successfully applied to a liquid level control plant, with several physically and software-generated faults, and its first stage is compared to the well-known SPC approach. The results demonstrate the superiority of the proposed approach, as well as the fact that an open-structure grouping and autonomously labelled FRB classifier can be generated on-line from streaming data, achieving high classification rates and using limited computational resources.

Figure 8: Fault detection and identification - system status for k = 2,150. (a) Detection stage - input signals; (b) detection stage - density; (c) identification - 2 selected features; (d) zoom on chart 8(b).

Figure 9: Fault detection and identification - system status for k = 4,050. (a) Detection stage - input signals; (b) detection stage - density; (c) identification - 2 selected features.

Figure 10: Fault detection and identification - system status for k = 5,600. (a) Detection stage - input signals; (b) detection stage - density; (c) identification - 2 selected features.

References

Angelov, P., 2002. Evolving Rule-based Models: A Tool for Design of Flexible Adaptive Systems. Studies in Fuzziness

and Soft Computing, Springer Verlag.

Angelov, P., 2004a. An approach for fuzzy rule-base adaptation using on-line clustering. International Journal of Approximate Reasoning 35, 275–289.

Angelov, P., 2004b. An approach for fuzzy rule-base adaptation using on-line clustering. International Journal of Approximate Reasoning 35, 275–289.

Angelov, P., 2012a. Anomalous system state identification. Patent GB1208542.9, priority date 15 May 2012.

Angelov, P., 2012b. Autonomous Learning Systems: From Data to Knowledge in Real Time. John Wiley and Sons.

Angelov, P., Baruah, R.D., Andreu, J., 2011. Simpl_eClass: simple potential-free evolving fuzzy rule-based on-line classifiers, in: Proceedings of 2011 IEEE International Conference on Systems, Man and Cybernetics, SMC 2011, Anchorage, Alaska, USA, 7-9 Oct, 2011, IEEE. pp. 2249–2254.

Angelov, P., Buswell, R., 2002. Identiﬁcation of evolving fuzzy rule-based models. Fuzzy Systems, IEEE Transactions

on 10, 667–677.

Angelov, P., Filev, D., 2002. Flexible models with evolving structure, in: Intelligent Systems, 2002. Proceedings. 2002

First International IEEE Symposium, pp. 28–33 vol.2.

Angelov, P., Filev, D., 2004. An approach to online identiﬁcation of Takagi-Sugeno fuzzy models. Systems, Man, and

Cybernetics, Part B: Cybernetics, IEEE Transactions on 34, 484–498.

Angelov, P., Kasabov, N., 2006. Evolving intelligent systems, eis. IEEE SMC eNewsLetter 15, 1–13.

Angelov, P., Ramezani, R., Zhou, X., 2008. Autonomous novelty detection and object tracking in video streams using

evolving clustering and Takagi-Sugeno type neuro-fuzzy system, in: Neural Networks, 2008. IJCNN 2008. (IEEE

World Congress on Computational Intelligence). IEEE International Joint Conference on, pp. 1456–1463.

Angelov, P., Yager, R., 2011. Simpliﬁed fuzzy rule-based systems using non-parametric antecedents and relative data

density, in: Evolving and Adaptive Intelligent Systems (EAIS), 2011 IEEE Workshop on, pp. 62–69.

Angelov, P., Yager, R., 2012. A new type of simpliﬁed fuzzy rule-based systems. International Journal of General

Systems 41, 163–185.

Angelov, P., Zhou, X., 2008. Evolving fuzzy-rule-based classiﬁers from data streams. Fuzzy Systems, IEEE Transactions

on 16, 1462–1475.

Anwar, S., Chen, L., 2007. An analytical redundancy-based fault detection and isolation algorithm for a road-wheel

control subsystem in a steer-by-wire system. Vehicular Technology, IEEE Transactions on 56, 2859–2869.

Baruah, R., Angelov, P., 2012. Evolving local means method for clustering of streaming data, in: Fuzzy Systems (FUZZ-IEEE), 2012 IEEE International Conference on, pp. 1–8.

Baruah, R.D., Angelov, P.P., 2013. Online learning and prediction of data streams using dynamically evolving fuzzy

approach, in: FUZZ-IEEE, pp. 1–8.

Bernieri, A., Betta, G., Liguori, C., 1996. On-line fault detection and diagnosis obtained by implementing neural algorithms on a digital signal processor. Instrumentation and Measurement, IEEE Transactions on 45, 894–899.

Breunig, M., Kriegel, H.P., Ng, R.T., Sander, J., 2000. LOF: Identifying density-based local outliers, in: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, ACM. pp. 93–104.

Chen, W., Saif, M., 2007. Observer-based strategies for actuator fault detection, isolation and estimation for certain class

of uncertain nonlinear systems. Control Theory Applications, IET 1, 1672–1680.

Cook, G., Maxwell, J., Barnett, R., Strauss, A., 1997. Statistical process control application to weld process. Industry

Applications, IEEE Transactions on 33, 454–463.

Costa, B., Bezerra, C., Guedes, L., 2012. A multistage fuzzy controller: Toolbox for industrial applications, in: Industrial

Technology (ICIT), 2012 IEEE International Conference on, pp. 1142–1147.

Costa, B., Bezerra, C.G., Guedes, L.A., 2010. Java fuzzy logic toolbox for industrial process control, in: Brazilian

Conference on Automatics (CBA), Brazilian Society for Automatics (SBA), Bonito-MS, Brazil.

Costa, B., Skrjanc, I., Blazic, S., Angelov, P., 2013. A practical implementation of self-evolving cloud-based control of

a pilot plant, in: 2013 IEEE International Conference on Cybernetics, Lausanne, Switzerland.

DeLorenzo, 2009. DL 2314BR - Didactic process control pilot plant. Catalog. DeLorenzo Italy. Italy.

Donders, S., 2002. Fault Detection and Identiﬁcation for Wind Turbine Systems: a closed-loop analysis. Master’s thesis.

Faculty of Applied Physics, Systems and Control Engineering, University of Twente. The Netherlands.

El-Shal, S., Morris, A., 2000. A fuzzy expert system for fault detection in statistical process control of industrial

processes. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on 30, 281–289.

Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., 1996. Advances in knowledge discovery and data mining, American

Association for Artiﬁcial Intelligence, Menlo Park, CA, USA. chapter From Data Mining to Knowledge Discovery:

An Overview, pp. 1–34.

Gama, J., Kosina, P., 2011. Learning decision rules from data streams, in: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, pp. 1255–1260.

Hautamaki, V., Karkkainen, I., Franti, P., 2004. Outlier detection using k-nearest neighbour graph, in: Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, pp. 430–433 Vol.3.

Hossain, A., Choudhury, Z., Suyut, S., 1996. Statistical process control of an industrial process in real time. Industry

Applications, IEEE Transactions on 32, 243–249.

Ishibuchi, H., Nozaki, K., Yamamoto, N., Tanaka, H., 1995. Selecting fuzzy if-then rules for classiﬁcation problems

using genetic algorithms. Fuzzy Systems, IEEE Transactions on 3, 260–270.

Kano, M., Sakata, T., Hasebe, S., 2010. Just-in-time statistical process control for ﬂexible fault management, in: SICE

Annual Conference 2010, Proceedings of, pp. 1482–1485.

Kolev, D., Angelov, P., Markarian, G., Suvorov, M., Lysanov, S., 2013. Arfa: Automated real-time ﬂight data analysis

using evolving clustering, classiﬁers and recursive density estimation, in: Proceedings of the IEEE Symposium Series

on Computational Intelligence SSCI-2013, Singapore. pp. 91–97.

Kosina, P., Gama, J., 2012. Handling time changing data with adaptive very fast decision rules, in: Machine Learning and

Knowledge Discovery in Databases. Springer Berlin Heidelberg. volume 7523 of Lecture Notes in Computer Science,

pp. 827–842.

Laukonen, E., Passino, K., Krishnaswami, V., Luh, G.C., Rizzoni, G., 1995. Fault detection and isolation for an experimental internal combustion engine via fuzzy identification. Control Systems Technology, IEEE Transactions on 3, 347–355.

Laurentys, C., Palhares, R., Caminhas, W., 2010a. Design of an artificial immune system based on danger model for fault detection. Expert Systems with Applications 37, 5145–5152.

Laurentys, C., Ronacher, G., Palhares, R., Caminhas, W., 2010b. Design of an artificial immune system for fault detection: A negative selection approach. Expert Systems with Applications 37, 5507–5513.

Leite, D.F., Hell, M.B., Jr., P.C., Gomide, F., 2009. Real-time fault diagnosis of nonlinear systems. Nonlinear Analysis: Theory, Methods & Applications 71, e2665–e2673.

Lemos, A., Caminhas, W., Gomide, F., 2013. Adaptive fault detection and diagnosis using an evolving fuzzy classifier. Information Sciences 220, 64–85. Online Fuzzy Machine Learning and Data Mining.

Li, X.J., Yang, G.H., 2012. Dynamic observer-based robust control and fault detection for linear systems. IET Control Theory & Applications 6, 2657–2666.

Liu, J., Lim, K.W., Ho, W.K., Tan, K.C., Tay, A., Srinivasan, R., 2005. Using the OPC standard for real-time process monitoring and control. IEEE Software 22, 54–59.

Liukkonen, T., Tuominen, A., 2004. A case study of SPC in circuit board assembly: statistical mounting process control, in: Microelectronics, 2004. 24th International Conference on, pp. 445–448 vol. 2.

Lughofer, E., 2010. On-line evolving image classifiers and their application to surface inspection. Image and Vision Computing 28, 1065–1079.

Lughofer, E., Guardiola, C., 2008. On-line fault detection with data-driven evolving fuzzy models. Control and Intelligent Systems 36, 307–317.

Maiying, Z., Chenghui, Z., Steven, D., James, L., 2004. Observer-based fault detection scheme for a class of discrete time-delay systems. Journal of Systems Engineering and Electronics 15, 288–294.

Mamdani, E., Assilian, S., 1975. An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies 7, 1–13.

Marins, A., 2009. Continuous Process Workbench. Technical Manual. DeLorenzo Brazil. Brazil.

Martin, E., Morris, A.J., Zhang, J., 1996. Process performance monitoring using multivariate statistical process control. IEE Proceedings - Control Theory and Applications 143, 132–144.

Oblak, S., Skrjanc, I., Blazic, S., 2007. Fault detection for nonlinear systems with uncertain parameters based on the interval fuzzy model. Engineering Applications of Artificial Intelligence 20, 503–510.

Papadimitriou, S., Kitagawa, H., Gibbons, P., Faloutsos, C., 2003. LOCI: fast outlier detection using the local correlation integral, in: Data Engineering, 2003. Proceedings. 19th International Conference on, pp. 315–326.

Ramezani, R., Angelov, P., Zhou, X., 2008. A fast approach to novelty detection in video streams using recursive density estimation, in: Intelligent Systems, 2008. IS ’08. 4th International IEEE Conference, pp. 14–2–14–7.

Samantaray, A.K., Bouamama, B.O., 2008. Model-based Process Supervision: A Bond Graph Approach. 1st ed., Springer Publishing Company, Incorporated.

Schwarz, M.H., Boercsoek, J., 2007. A survey on OLE for process control (OPC), in: Proceedings of the 7th WSEAS International Conference on Applied Computer Science, World Scientific and Engineering Academy and Society (WSEAS), Stevens Point, Wisconsin, USA. pp. 186–191.

Serdio, F., Lughofer, E., Pichler, K., Buchegger, T., Efendic, H., 2014. Residual-based fault detection using soft computing techniques for condition monitoring at rolling mills. Information Sciences 259, 304–320.

Silva, D.R.C., 2008. Sistema de Deteccao e Isolamento de Falhas em Sistemas Dinamicos Baseado em Identificacao Parametrica. PhD in Computer Engineering. Departamento de Engenharia de Computacao e Automacao - Universidade Federal do Rio Grande do Norte (UFRN). Brazil.

Simani, S., Patton, R.J., 2008. Fault diagnosis of an industrial gas turbine prototype using a system identification approach. Control Engineering Practice 16, 769–786.

Sneider, H., Frank, P., 1996. Observer-based supervision and fault detection in robots using nonlinear and fuzzy logic residual evaluation. IEEE Transactions on Control Systems Technology 4, 274–282.

Suvorov, M., Ivliev, S., Markarian, G., Kolev, D., Zvikhachevskiy, D., Angelov, P., 2013. OSA: One-class recursive SVM algorithm with negative samples for fault detection, in: Artificial Neural Networks and Machine Learning - ICANN 2013. Springer Berlin Heidelberg. volume 8131 of Lecture Notes in Computer Science, pp. 194–207.

Takagi, T., Sugeno, M., 1985. Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man and Cybernetics 15, 116–132.

Tang, J., Chen, Z., Fu, A.W.C., Cheung, D.W.L., 2002. Enhancing effectiveness of outlier detections for low density patterns, in: Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Springer-Verlag, London, UK. pp. 535–548.

Vemuri, A., Polycarpou, M., Diakourtis, S., 1998. Neural network based fault detection in robotic manipulators. IEEE Transactions on Robotics and Automation 14, 342–348.

Venkatasubramanian, V., Rengaswamy, R., Yin, K., Kavuri, S.N., 2003. A review of process fault detection and diagnosis: Part I: Quantitative model-based methods. Computers & Chemical Engineering 27, 293–311.

Wang, P., Guo, C., 2013. Based on the coal mine’s essential safety management system of safety accident cause analysis. American Journal of Environment, Energy and Power Research 1, 62–68.

Xu, L., Tseng, H., 2007. Robust model-based fault detection for a roll stability control system. IEEE Transactions on Control Systems Technology 15, 519–528.

Yang, H., Xia, Y., Liu, B., 2011. Fault detection for T-S fuzzy discrete systems in finite-frequency domain. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 41, 911–920.
