Evolving Granular Systems
Daniel Furtado Leite
Supervisor: Fernando Gomide
Co-supervisor: Pyramo Costa Jr.
Department of Computer Engineering and Industrial Automation
School of Electrical and Computer Engineering
University of Campinas
A thesis submitted in partial fulfillment of the requirements for
the degree of Doctor of Philosophy
July, 2012
Acknowledgement
First and foremost, I would like to thank my supervisor Fernando Gomide. It is
difficult to imagine this thesis without his inspiring mentorship, insightful sug-
gestions, and words of encouragement. I owe him a debt of gratitude for all I
have learned from him. I am also greatly fortunate to have Pyramo Costa as my
co-supervisor. Apart from providing me with valuable research feedback, he
has been a constant source of support and stimulation.
My academic life was shaped by great professors. I would like to acknowledge
Fernando Von Zuben, Romis Attux, Akebo Yamakami, and Takaaki Ohishi for
lecturing some of the best classes I have ever taken, and for nurturing my early
interests in computational intelligence, mathematics and optimization. I am also
indebted to Professor Rosangela Ballini for her help in obtaining and analyzing
datasets, and for sharing her expertise on several occasions.
I would like to thank the committee members Andre Lemos, Weldon Lodwick,
and again Fernando Von Zuben, Romis Attux, and Fernando Gomide for the
constructive feedback they provided me in order to improve this thesis.
I was privileged to have known talented and enthusiastic folks in our LCA
group throughout these years. Special thanks go to Joelma Costa, Glaucya
Boechat, Yi Liu, Fernando Bordignon, Leandro Maciel, and Vitor Marques for
being a great support in hard times and joyful company in good times. My grat-
itude extends to Lucas Nascimento, Alan Barbosa, Luiz Bergo, Enderson Cruz,
and Israel Mendes, fellows from our LASI group at PUC Minas.
I appreciated the help of Carmen Fonseca, Mariana Silva, Noemia Benatti,
Edson Filho, Maria Waldman, Carolina Velho, Jerusa Soares, and Zilda Padovan,
who made their support available in countless ways whenever I needed it.
I have much gratitude for CAPES, the Brazilian Ministry of Education, for
the fellowship which enabled me to pursue this research.
Last, but certainly not least, I would like to thank Daniela and Lucia for
their endless support and encouragement during this long endeavor. Their love
and perseverance have kept me going through the roughest of times. To them I
dedicate this thesis.
Abstract
In recent years there has been increasing interest in computational modeling ap-
proaches to deal with real-world data streams. Methods and algorithms have been
proposed to uncover meaningful knowledge from very large (often unbounded)
data sets in principle with no apparent value. This thesis introduces a framework
for evolving granular modeling of uncertain data streams. Evolving granular sys-
tems comprise an array of online modeling approaches inspired by the way in
which humans deal with complexity. These systems explore the information flow
in dynamic environments and derive from it models that can be linguistically un-
derstood. In particular, information granulation is a natural technique to dispense with unnecessary details and emphasize transparency, interpretability and scalability
of information systems. Uncertain (granular) data arise from imprecise percep-
tion or description of the value of a variable. Broadly stated, various factors
can affect one’s choice of data representation such that the representing object
conveys the meaning of the concept it is being used to represent. Of particular
concern to this work are numerical, interval, and fuzzy types of granular data; and
interval, fuzzy, and neurofuzzy modeling frameworks. Learning in evolving gran-
ular systems is based on incremental algorithms that build model structure from
scratch on a per-sample basis and adapt model parameters whenever necessary.
This learning paradigm is meaningful because it avoids redesigning and retraining models whenever the system changes. Application examples in classification,
function approximation, time-series prediction and control using real and syn-
thetic data illustrate the usefulness of the granular approaches and framework
proposed. The behavior of nonstationary data streams with gradual and abrupt
regime shifts is also analyzed in the realm of evolving granular computing. We
shed light upon the role of interval, fuzzy, and neurofuzzy computing in process-
ing uncertain data and providing high-quality approximate solutions and rule
summary of input-output data sets. The approaches and framework introduced
constitute a natural extension of evolving intelligent systems over numeric data
streams to evolving granular systems over granular data streams.
Publications
During the course of this research, a number of publications were produced which
are based on or otherwise related to the content of this thesis. They are listed
below for reference.
Book chapters
Leite, D.; Costa, P.; Gomide, F. “Interval approach for evolving granular
system modeling.” In: Mouchaweh, M.; Lughofer, E. (Eds.) Learning in
Non-stationary Environments: Methods and Applications, Springer - New
York, pp: 271-301, 2012.
Leite, D.; Gomide, F. “Evolving linguistic fuzzy models from data streams.”
In: Trillas, E.; Bonissone, P.; Magdalena, L.; Kacprzyk, J. (Eds.) Combining Experimentation and Theory: A Hommage to Abe Mamdani (Studies in Fuzziness and Soft Computing), Springer-Verlag, pp: 209-223, 2011.
Leite, D.; Costa, P.; Gomide, F. “Granular approach for evolving systems
modeling.” In: Hüllermeier, E.; Kruse, R.; Hoffmann, F. (Eds.) Lecture Notes in Artificial Intelligence (LNAI/IPMU), Vol. 6178, pp: 340-349, Springer-Verlag Berlin Heidelberg, 2010.
Journals
Leite, D.; Costa, P.; Gomide, F. “Evolving granular neural networks from
fuzzy data streams.” Neural Networks. (Accepted).
Leite, D.; Ballini, R.; Costa, P.; Gomide, F. “Evolving fuzzy granular mod-
eling from nonstationary fuzzy data streams.” Evolving Systems, Springer.
Vol. 3, Issue 2, pp: 65-79, 2012.
Leite, D.; Hell, M.; Costa, P.; Gomide, F. “Real-time fault diagnosis of
nonlinear systems.” Nonlinear Analysis: Theory, Methods & Applications.
Vol. 71, Issue 12, pp: 2665-2673, 2009.
International conferences
Leite, D.; Costa, P.; Gomide, F. “Evolving granular neural network for
fuzzy time series forecasting.” World Congress on Computational Intelli-
gence (WCCI: IJCNN), Brisbane - AU, 8p. 2012.
Lemos, A.; Leite, D.; Maciel, L.; Ballini, R.; Caminhas, W.; Gomide, F.
“Evolving Fuzzy Linear Regression Tree Approach for Forecasting Sales
Volume of Petroleum Products.” World Congress on Computational Intel-
ligence (WCCI: FUZZ-IEEE), Brisbane - AU, 8p. 2012.
Leite, D.; Gomide, F.; Ballini, R.; Costa, P. “Fuzzy granular evolving mod-
eling for time series prediction.” IEEE International Conference on Fuzzy
Systems, Taipei - TW, pp: 2794-2801, 2011.
Leite, D.; Costa, P.; Gomide, F. “Evolving granular neural network for semi-
supervised data stream classification.” World Congress on Computational
Intelligence (WCCI: IJCNN), Barcelona - ES, pp: 1877-1884, 2010.
Leite, D.; Costa, P.; Gomide, F. “Evolving granular classification neural
networks.” IEEE International Joint Conference on Neural Networks, At-
lanta - US, pp: 1736-1743, 2009.
Leite, D.; Attux, R.; Von Zuben, F.; Costa P.; Gomide, F. “Evolutionary
neural network applied to induction motors stator fault detection.” IEEE
International Electric Machines and Drives Conference, Miami - US, pp:
1721-1728, 2009.
Leite, D.; Costa P.; Gomide, F. “Interval-based evolving modeling.” IEEE
Symposium Series on Computational Intelligence: Workshop on Evolving
Systems, Nashville - US, pp: 1-8, 2009.
Brazilian conferences
Leite, D.; Ballini, R.; Costa, P.; Gomide, F. “Fuzzy granular evolving mod-
eling.” 10th Brazilian Symposium on Intelligent Automation, Sao Joao Del
Rey - MG, 6p. 2011. (In Portuguese).
Leite, D.; Costa, P.; Gomide, F. “Granular neural networks for semi-
supervised learning.” 18th Brazilian Congress on Automatics, Bonito - MS,
8p. 2010. (In Portuguese).
Leite, D.; Costa, P.; Gomide, F. “Evolving granular systems: real-time
processing of data streams.” (Abstract) 1st Brazilian Congress on Fuzzy
Systems, Sorocaba - SP, 2p. 2010. (In Portuguese).
Leite, D.; Nascimento, L.; Barbosa, A.; Costa, P.; Ferreira, D.; Gomide,
F. “Evolving approach for power transformer fault detection.” 6th Interna-
tional Workshop on Power Transformers, Foz do Iguacu - PR, 8p. 2010.
(In Portuguese).
Leite, D.; Gomide, F. “Granular neural network for evolving classification.”
(Extended abstract) Annual Meeting of the Department of Computer Engi-
neering and Industrial Automation, UNICAMP, Campinas - SP, 4p. 2010.
(In Portuguese).
Leite, D.; Bergo, L.; Costa, P.; Gomide, F. “Evolving granular neural net-
works in systems modeling.” 9th Brazilian Congress on Neural Networks,
Ouro Preto - MG, 5p. 2009. (In Portuguese).
Leite, D.; Costa, P.; Gomide, F. “Evolving granular neural networks.” 9th
Brazilian Symposium on Intelligent Automation, Brasilia - DF, 6p. 2009.
(In Portuguese).
Leite, D.; Costa, P.; Gomide, F. “Evolving connectionist systems.” 9th
Brazilian Symposium on Intelligent Automation, Brasilia - DF, 6p. 2009.
(In Portuguese).
Leite, D.; Gomide, F. “Interval-based evolving modeling for streamflow
forecasting.” (Extended abstract) Annual Meeting of the Department of
Computer Engineering and Industrial Automation, UNICAMP, Campinas
- SP, 4p. 2009.
Contents
Acknowledgement
Abstract
Publications

1 Introduction
  1.1 Background Research
  1.2 Objective
  1.3 Contributions
  1.4 Organization

2 Foundations of Granular Computing
  2.1 Introduction
  2.2 Interval Analysis
  2.3 From Interval Analysis to Fuzzy Set Theory
  2.4 Fuzzy Sets
  2.5 Aggregation Operators
  2.6 Summary

3 Evolving Granular Systems
  3.1 Introduction
  3.2 Evolving Intelligent Systems
  3.3 Granular Data Streams
  3.4 Evolving Granular Modeling
  3.5 Time and Space Granulation
  3.6 Summary

4 Interval Based Evolving Modeling
  4.1 Introduction
  4.2 Related Work
  4.3 Structure and Processing
  4.4 Learning in IBeM
  4.5 Summary

5 Fuzzy Set Based Evolving Modeling
  5.1 Introduction
  5.2 Related Work
  5.3 Structure and Processing
  5.4 Learning in FBeM
  5.5 Summary

6 Evolving Granular Neural Networks
  6.1 Introduction
  6.2 Related Work
  6.3 Fuzzy Aggregation Neuron Model
  6.4 Structure and Processing
  6.5 Learning in eGNN
  6.6 Summary

7 Application Examples
  7.1 Introduction
  7.2 Semi-Supervised Classification
  7.3 Time Series Prediction
  7.4 Function Approximation
  7.5 Control
  7.6 Summary

8 Conclusion
  8.1 Summary
  8.2 Contributions
  8.3 Future Research

A Universal Approximation
B Recursive Least Squares Method
References
List of Figures
2.1 Image $f$ of box $I$ and inclusion functions $F$ and $F^*$
3.1 Granular models: (a) single-valued function, (b) granular function
3.2 Time and space granulation
4.1 Expansion region of an IBeM granule
4.2 Creation and recursive adaptation of IBeM granules
4.3 Inter-granular conflict and data accommodation
5.1 Scattering approach for fuzzy data granulation
5.2 Creation and recursive adaptation of FBeM granules
6.1 Fuzzy aggregation neuron model
6.2 Examples of input/output functions of fuzzy aggregation neurons
6.3 Single-valued approximation provided from input data processing
6.4 Granular approximation formed by input and output data granulation
6.5 eGNN single-valued (a) and granular (b) approximation of a function
6.6 Stability-plasticity tradeoff and the role of $\rho$ in eGNN systems
7.1 The rotating Gaussians problem
7.2 ROC curves of different methods for the rotating Gaussians
7.3 eGNN decision boundary and last 200 data at particular time steps
7.4 A third class appears at $h = 200$ and remains
7.5 FBeM evolution of the Acc index, rule base and granularity for the new-class problem
7.6 eGNN decision boundaries for the 3-class problem
7.7 Performance of evolving granular classifiers using different proportions of unlabeled data
7.8 FBeM Death Valley temperature forecasts
7.9 eGNN Helsinki temperature forecasts
7.10 Comparing the narrowness of granular forecasts using rule bases of different sizes
7.11 FBeM processing time and RMSE using different numbers of input variables from temperature time series
7.12 FBeM processing time and RMSE for the Death Valley, Ottawa, and Lisbon time series considering different numbers of rules
7.13 FBeM prediction of the Death Valley, Ottawa, and Lisbon temperature time series combined
7.14 eGNN approximation of the concrete compressive strength function, and evolution of the rule base, error indices, and granularity
7.15 Evolving granular systems results on leave-one-variable-out approach to find less correlated subsets of input variables
7.16 FBeM approximation of the Parkinson’s telemonitoring function, and evolution of the rule base, error indices, and granularity
7.17 Environment for sensor-based navigation
7.18 Initial conditions for the autonomous navigation control problem
7.19 Granular controllers navigating at different speeds
7.20 Detail of the FBeM navigation at different speeds
7.21 FBeM navigating with noisy input
List of Tables
7.1 Rotating Gaussians: comparing evolving/non-evolving methods
7.2 New class problem: comparing evolving granular methods
7.3 Monthly temperature values
7.4 Temperature forecasts
7.5 Concrete compressive strength prediction: evaluation of different types of eGNN neurons
7.6 Concrete compressive strength prediction: evaluating different evolving methods
7.7 Parkinson’s telemonitoring prediction: evaluation of different types of eGNN neurons
7.8 Parkinson’s telemonitoring prediction: evaluating different methods
7.9 Comparison of different evolving granular controllers
Chapter 1
Introduction
The computing world has experienced a rapid growth of information.
A proliferation of automated systems, small scale computing devices, sensor net-
works, and data capture technologies has contributed to the production of large
volumes of data. Data set growth sometimes outpaces available storage capacity
and other times data are stored with no prospective use. Broadly stated, the
focus of data processing and analysis has changed from offline batch processing
of data to the incremental handling of online data streams.
Online data streams originate from a variety of sources such as media enter-
tainment, surveillance systems, mobile devices, multimedia, industrial monitoring
and control, oceanographic and atmospheric systems, health care, stock market,
satellites, financial and meteorological systems, web traffic and clickstreams, to
name a few. Their prominence in real-world systems, along with the necessity
of modeling, analyzing, and understanding these systems, has brought new chal-
lenges, greater demands, and new research directions.
Research and development of conceptual frameworks, methods, and algo-
rithms capable of extracting knowledge from data streams have taken place mo-
tivated by a manifold of relevant applications. Data stream modeling is funda-
mentally based on computational learning approaches that process data continuously in an attempt to find similarities in their spatio-temporal features, and thereafter provide insights about the phenomenon that governs the data.
The ultimate goal is to obtain more abstract (often human-centric) representa-
tions of large amounts of detailed data with no apparent value.
Modeling, processing, and disposing of information become more complex as
real-world systems become more complex. Data streams are characterized by
nonstationarity, nonlinearity, and heterogeneity; they are potentially endless and
may be subject to changes of various kinds. Direct application of machine learning
and data mining algorithms to data streams is very often infeasible because it
is difficult to maintain all the data in memory. A particular challenge faced in
stream modeling concerns how to handle uncertainty.
The primary research question of this thesis is how to obtain accurate and in-
terpretable human-centered models from uncertain data streams. We introduce
evolving granular systems, a granular modeling framework able to capture the
essence of uncertain data streams in a more abstract and compact representa-
tion. While the term ‘evolving’ refers to models whose structure adapts to data streams, the term ‘granular’ comes from granular computing theory and
emphasizes comprehensible models of uncertainty. This thesis combines evolving
intelligence and granular computing concepts and ideas, and explicitly realizes
them into a practical evolving granular framework.
1.1 Background Research
Uncertainty is an attribute of information since our ability to perceive reality is
often limited (27) (154). The more complex a system is, potentially, the more
uncertain we are of the available information, and the more imprecise is our un-
derstanding of that system. The imprecision of real-world perception is evident
in natural languages and also in empirical measurements where it is known that
the process which generated the data is uncertain. As Kreinovich stated, mea-
surements and expert estimates are never exact (72). Modeling complex systems
raises doubts about the necessity of precise models (118). Granular computing
theory (15) (89) (116) (141) (144) (151) hypothesizes that accepting some level of
uncertainty may be beneficial and therefore suggests a balance between precision
and uncertainty.
Information granulation for uncertainty representation is a fundamental man-
ifestation of human knowledge (15). Information granulation means that,
instead of dealing with detailed real-world data, the data are considered in a
more abstract and conceptual perspective. The result of information granulation
is called an information granule - a granule being a cluster of points put together
by indistinguishability, similarity, proximity, or functionality (152). Examples of
granules include hyperboxes, fuzzy sets, bell-shaped probability distributions and
rough sets (153). A set of granules constitutes a vocabulary of generic descrip-
tors (115), and underlies the basic concepts of linguistic variable and rule-based
systems. Put differently, granules are semantically meaningful building blocks
of granular rule-based systems (18). Granular rules connect the elements of the
vocabulary and therefore play an important role in computation with information
described in natural language.
The notion of granulation emerged as a natural need to abstract and sum-
marize information and data to support various processes of comprehension and
decision making (15). For example, when we observe an environment we seldom
take into account all of the details of that environment. Because of our physical
and cognitive limitations, a reduced number of samples, variables, and attributes
of interest are brought into focus. To avoid distracting details we are provided
with effective abstraction mechanisms. Detailed numeric data are integrated (ag-
gregated) into kinds of information granules where the granules themselves are
regarded as sets of elements that are perceived as being functionally equivalent
(116). There are close relations between granulation, data mining (135), data
fusion (86), and knowledge discovery (99).
From a more practical point of view, granular computing is a framework for
problem solving, complexity reduction and structured thinking (142). It deals
with data granulation and granular data processing. Information granulation
splits a complex problem into simpler sub-problems and treats them on an in-
dividual basis. Granular systems able to self-adapt their structures from data
streams have only been formally investigated since the early 2000s.
Granular models developed from data streams can be expressed in several
computationally tractable frameworks such as interval mathematics, statistics,
fuzzy sets, rough sets, shadow sets, cluster analysis, decision trees, neighborhood
systems, or hybrids. On top of these are generalized constraints, in the sense
of Zadeh’s general theory of uncertainty (154), which are used to delimit and
represent granules within the different frameworks. Computing with granules
grants ample freedom to choose representative granular objects and handling
tools. Regardless of the framework chosen, online granulation aims to retain the
essence of stream data as granular objects. Online granular computing models
consider online granular data streams under simpler (less detailed) resolutions.
The fundamental objective is to extract features of interest from the data to
attain efficient solutions and a better rapport with reality.
Evolving intelligent systems are a mainstream of research in online data mod-
eling (5) (11) (65) (66) (97). These systems encompass one-pass recursive al-
gorithms (algorithms independent of previous data) and manage to build the
structure of models from scratch as new information arises. We use the term
‘evolving’ in the sense of gradual development of the system structure (rule base
or the architecture of a neural network) and their parameters. This learning
paradigm mimics the evolution of individuals during their life-cycle, especially
humans: learning from experience, inheritance, and gradual change. Knowledge
can be generated from repetitive tasks and from data streams produced through
perceptions and sent to the brain. The development of the rule-base or neural
network structure is gradual, where the rules/neurons are not fixed or pre-defined.
Evolving systems generate new rules (neurons) each time new data does not fit
into the existing model/understanding (fuzzy rule-base or neural network), but at
the same time only when this new data is informative enough (9) (11). Classifi-
cation, clustering, frequent pattern mining, time series prediction, regression and
control are examples of problems addressed in the evolving systems literature.
Note that stream data modeling should not be confused with time series data
modeling (51). Although related, time series carry static objects that can, in
principle, be analyzed offline whereas stream data require the evolution of the
model structure in online mode, which is not a requirement in time series analysis
(21). Particularly, conventional statistical (54), computational intelligence (44),
and machine learning (101) systems do not meet the requirements of data stream
modeling because they assume forms of linearity and stationarity, or demand
multiple passes over entire data sets and offline processing. Very often there are
real-time constraints that must be met by stream algorithms. Evolving intelligent
systems arose as a framework to model online data streams and overcome the
drawbacks of existing conventional systems.
Currently, a number of evolving intelligent systems have succeeded in dealing
with time-varying numeric data by means of recursive clustering algorithms and
adaptive local models. Notwithstanding, these systems are quite often unable
to process granular data and realize granular-data-stream-oriented computing in
unknown nonstationary environments. Informally, if $u$ is a value which is known precisely (exactly), we refer to $u$ as a singular (point) value. Conversely, if $u$ is not known precisely, but there is some information which constrains possible values of $u$, then the constraint on $u$ defines a granular value (155).
This thesis introduces evolving granular systems, which extend evolving intel-
ligent systems in two ways. First, evolving granular systems deal with granular
input and output data such as intervals, fuzzy numbers, and fuzzy intervals.
Granular data may arise from expert judgment, readings from unreliable sen-
sors, and summaries of numeric data over time periods. Interval and fuzzy data
stream modeling generalizes numeric data stream modeling by allowing interval
and fuzzy data granulation. Numeric (singular) data stream is a special case of
granular data stream. Second, evolving granular systems provide granular ap-
proximation of functions. Granular approximation refers to an enclosure that contains the output data. The granular approxima-
tion may come with a linguistic description, in addition to a numeric pointwise
approximation common of evolving intelligent systems. Granular output is useful
for interpretability purposes and helps to enhance model acceptability.
Numeric, interval and fuzzy granular data streams and interval, fuzzy and
neurofuzzy granular frameworks are of special concern to this study. Indepen-
dent of the data type and framework, evolving granular systems aim to pro-
vide transparent rule-based models that are built from a data sequence. Interval
mathematics (60) (69) (104) and fuzzy set theory (41) (159), as practical frame-
works of granular computing, capture our innate conception of transitional set
belonging and uncertainty. While ‘below 100’, ‘around 10 and 20’, and ‘above
100’ are examples of interval data, ‘about 20’, and ‘around 90’ are examples of
fuzzy data. The fundamental distinction between the interval and fuzzy granular
frameworks concerns the notion of partial membership supported by fuzzy sets.
Whenever interval yes-or-no quantification of concepts becomes too restrictive,
fuzzy sets offer an important feature of describing information granules whose
constituting elements may belong only partially, i.e., more-or-less quantification
of concepts. Fuzzy sets avoid specifying solid borders between full belongingness
and full exclusion by means of smooth transition boundaries (116). Artificial
neural networks (55) are nonlinear, highly plastic systems equipped with signif-
icant learning capability. Fuzzy sets and fuzzy neurons provide neural networks
with mechanisms of approximate reasoning and transparency of the resulting con-
struction. Fuzzy sets and neurocomputing are complementary in terms of their
strengths thus motivating neurofuzzy granular computing. Granules formalized
in any of these frameworks, interval, fuzzy or neurofuzzy, can facilitate a vast
array of human-centric pursuits (116).
1.2 Objective
The main objective of this thesis is to introduce and characterize a theoretical
evolving granular modeling framework and a suite of practical approaches to learn
from and process uncertain data streams with a focus on accuracy, transparency
and interpretability of models.
Many issues arose during the course of this research which had to be overcome
in order to achieve the main objective. The research issues included:
- how to process interval and fuzzy types of data;
- how to fit uncertain data into rule-based granular models;
- how to create, delete and refine granules and rules without the need to redesign and retrain the system structure from scratch;
- how to analyze large volumes of stream data efficiently;
- how to adjust the granularity of models based on stream data;
- how to obtain more flexible granular constructs;
- how the interval, fuzzy and neurofuzzy approaches compare to each other and to alternative approaches.
Other, more subtle, issues are discussed in context throughout the chapters.
1.3 Contributions
The contributions of this thesis can be broadly divided into three groups: con-
ceptual, methodological and computational.
The conceptual contribution is the introduction of a new modeling framework
to represent and process granular data streams. The framework allows input and
output data to be real numbers, intervals, and fuzzy sets. We discuss the notion of
granular data streams as well as learning and model building driven by such data
streams. Central to the proposed framework is not only computational efficiency,
but also interpretability and transparency. The framework intends to develop
human-oriented models whose results are readily understood. Formulations and
conceptualizations are provided which are intended to establish foundations for
online granular data processing and uncertainty management.
The methodological contributions of this thesis are three practical approaches
to handle granular data streams. The approaches are oriented to different types
of input and output data and each is supported by concepts and tools derived
from different theories. In common, all three approaches are designed to capture the
very essence of the underlying data stream.
First, we introduce interval-based evolving modeling. Interval-based evolving
modeling (IBeM) is a granular approach to enclose imprecise data and produce
rule-based summary. IBeM emphasizes imprecise data manifesting as tolerance
intervals and learning procedures grounded in fundamentals of interval mathemat-
ics. Antecedent and consequent parts of interval rules are interval hyperboxes,
which are linked by an interval granular mapping - or inclusion function in the
interval analysis terminology. The interval granular approach to systems modeling
makes no specific assumption about the data including probability distributions,
membership functions and belief or possibility values.
Second, we address fuzzy set based evolving modeling. Fuzzy set based
evolving modeling (FBeM) employs fuzzy granular models to deal with more de-
tailed fuzzy granular data and therefore provide a more comprehensible (human-
intelligible) representation of the data. For each fuzzy model there exists an asso-
ciated fuzzy rule base. The structure of the fuzzy rule base is gradually developed
by an incremental learning algorithm suitable to process potentially unbounded
fuzzy data streams. FBeM renders linguistic models of information systems and
single-valued and granular approximation of nonstationary functions.
Third, we consider evolving neurofuzzy networks. Evolving granular neural
networks (eGNN) use fuzzy granules and fuzzy aggregation neurons for informa-
tion fusion. The fuzzy aspect allows a neural network to be translated into a
knowledge base and a rule-based inference system that can be promptly read and
understood. The eGNN learning algorithm is committed to building and adapt-
ing the neural network structure using fuzzy data streams. It may add or remove
granules, neurons and respective connections whenever necessary. This means
that the neural network captures new information from data streams, adapts
itself to the new scenario, and avoids redesigning and retraining.
The third primary contribution of this thesis concerns an extensive set of com-
putational results detailing the performance and demonstrating the usefulness of
the proposed approaches. The interval IBeM, fuzzy FBeM, and neurofuzzy eGNN
approaches are evaluated in a variety of applications such as semi-supervised clas-
sification, time series prediction, function approximation, and control. The ap-
plication examples emphasize the difficulty of currently existing methods to deal
with nonstationary data streams. The results demonstrate the competitiveness
of the proposed evolving granular approaches and framework.
1.4 Organization
This thesis is organized into eight chapters as summarized below.
This chapter contains a general statement of the problem dealt with in this
thesis and places the research into a broader perspective by connecting it
to well-established information systems theories.
Chapter 2 provides a review on concepts of granular computing and uncer-
tainty processing. We provide essentials from interval analysis, fuzzy sets,
and aggregation operators to form a background of concepts and support
our developments.
Chapter 3 covers the state-of-the-art research in evolving intelligent systems
and introduces a theoretical framework for the analysis and representation
of granular data.
Chapter 4 introduces an interval learning method. Interval-based evolving
modeling is an approach to deal with stream interval data. Interval granules
are characterized by sharp lower and upper bounds and empty content.
We present details along with some intuition behind learning heuristics of
interval algorithms.
Chapter 5 presents a fuzzy extension of the aforementioned interval method.
Fuzzy set based evolving modeling uses fuzzy data streams to develop rule-
based fuzzy granular models. Fuzzy sets avoid specifying solid borders
between full belongingness and full exclusion by means of sets with partial
membership.
Chapter 6 proposes evolving granular neural networks. These networks use
fuzzy granules and fuzzy neurons for information fusion and uncertainty rep-
resentation. The underlying granular construction is incrementally evolved
from a learning algorithm. It pictures a set of fuzzy rules and a fuzzy
inference system, which are obtained from fuzzy data streams.
Chapter 7 addresses application examples of evolving granular systems in
semi-supervised classification, function approximation, time series predic-
tion, and control problems. They are accompanied with discussions and
comparisons. We contrast the methods introduced in chapters 4, 5 and 6
with the state-of-the-art online and traditional offline methods.
Chapter 8 concludes this thesis and proposes future research directions.
Chapter 2
Foundations of Granular Computing
This chapter provides definitions and principles of granular computing. Essen-
tial notions of interval analysis and fuzzy sets are addressed from the granular
computing point of view. Some notation to be used throughout this thesis is in-
troduced. The chapter also covers different types of aggregation operators which
map several real inputs in the unit hypercube onto a single output in the unit
interval. Aggregation operators perform information fusion by gathering large
volumes of dissimilar information into a more compact form. Intervals and fuzzy
sets are instances of practical frameworks of granular computing.
2.1 Introduction
Theories and methodologies that make use of granules to solve problems involving huge amounts of data, information, and knowledge constitute a multidisciplinary area of study called Granular Computing (15) (89) (117) (141) (144) (151). Granular computing, as a paradigm of information processing, spotlights multiple levels of data detail to provide useful abstractions and approximate solutions to hard real-world problems (18) (19) (118) (146).
Granular information systems have appeared under different names in related
fields such as interval analysis, fuzzy and rough sets, divide and conquer, quotient
space theory, information fusion, and others (see (141)). Elementary processing
units in granular systems are referred to as information granules. An information
granule is defined as a clump of entities that may originate at the numeric (singu-
lar) or granular level and are arranged together due to their similarity, proximity,
indistinguishability, or coherency.
The goal of a granule is to catch the very essence of the overall data in a
concise and explainable manner (15) (118); it defines a subset of a universal
set and conveys an internal representation. Granules may be interpreted from
two points of view: from the perspective of uncertainty theory, they are units
lacking precise knowledge; from that of knowledge engineering, they are units of
elementary knowledge.
Granular computing is intended to identify manifestations of granules by moving back and forth among granularities to yield more or less differentiation.
Too much detail is wasteful whereas too little renders a system useless. In general,
there is no universal level of granularity of information: the size of granules is
problem-oriented and user-dependent. Granularity is defined as the extent to
which a larger and more complex system is broken down into smaller and simpler
parts. We can quantify the granularity of a granule, for example, by counting its
number of elements. The more elements are located in a granule, the lower is its
granularity, and the higher is its generality (116). High granularities can produce
substantial computational overhead for data storage. In excess, granularities
and granules bring undesirable scalability issues such as incapacity to satisfy the
required throughput. The granularity of information that is explicitly inbuilt
into granules provides useful features in information systems modeling such as
transparency and flexibility.
Let the result of data granulation be designated as a granular structure. A
granular structure is a family of granules which, when considered together, re-
assemble the more complex original system. Handling a complex phenomenon by
means of granular structures allows us to arrive at meaningful solutions. Based on
some carefully chosen granularity, granular computing systems attempt to solve
a problem by isolating its loosely connected sub-problems and handling them on
an individual basis.
Granules of multiple sizes are related to the depth of penetration that characterizes a system. A coarse granular structure contains fewer granules than a fine granular structure. This can be stated more precisely as follows. A coarse granular system comprises a small number of large granules, usually characterized by low precision and high interpretability. A fine granular system comprises a large number of small granules, high precision, and limited interpretability. Low-level refined granules provide details about the system functionality.
More abstract, high-level granules are easier to manage and interpret, but may
lose important minutiae.
Input and output data sets generate input and output granular structures,
respectively, which should be somehow connected. We name the correspondence
between input and output granular structures as granular mapping. A granular
mapping is defined over information granules lying in an input space and maps
them into a collection of granules expressed in some output space. Granular
mappings can be encountered quite frequently in rule-based systems, where the
mapping is given as If-Then statements (18).
In granular computing, everything, including data, variables and parameters,
is allowed to be granular. In general, inaccurate measurements and perception-based information are granular, for example: ‘x is small’, ‘approximately 90’, ‘temperature is high’, ‘probability is high’, [20, 25]. In this sense, a granular system provides NL-capability (126), that is, capability to operate on information
described in Natural Language. NL-capability is important because much of hu-
man knowledge is described in natural language. Imprecision of human sensory
organs and brain is passed on to natural language (154). More specifically, when
a proposition expressed in a natural language is represented as a system of gen-
eralized constraints (153), it is, in effect, a granular system. Computation with
information described in natural language ultimately reduces to computation with
granular values.
Computing with granules brings together existing formalisms of interval anal-
ysis, fuzzy sets, rough sets, etc. under one roof. In spite of several visible distinct
underpinnings of these theories, they exhibit fundamental synergies, which are
exploited in the granular computing framework (117).
2.2 Interval Analysis
2.2 Interval Analysis
Interval analysis is a branch of mathematics that provides reliable numerical
tools for problem solving; it treats an interval both as a set and as a number (53)
(60) (69) (103) (104) (109). While arithmetic performs operations on numbers,
interval arithmetic performs operations on intervals. Generally speaking, intervals
are instances of granules. Granular computing materializes in the framework of
interval analysis and provides features for interpretability.
Interval analysis is a theory oriented toward computational implementation
because it supports the development of interval-based granular algorithms. These
algorithms are mainly designed to automatically provide rigorous bounds on ap-
proximation errors, rounding errors, and propagated uncertainties in initial data.
This is of utmost importance because modeling of complex systems must com-
promise between complexity and precision. Operations involving imprecise objects must
consider the nature of the imprecision.
The main concern of interval analysis is to provide a guaranteed approx-
imation of the set of solutions of the underlying problem. ‘Guaranteed’ in this
context means that outer approximations (enclosure) of intervals can always be
obtained and, moreover, be made as precise as desired when further information
yields intervals of narrower width. Intervals acknowledge limited precision by as-
sociating with a variable of the model under investigation a set of reals as possible
values. For ease of storage and fast computation, these sets are restricted to inter-
vals (56). Essentials of interval theory, which form a background of fundamentals
for our investigations, are summarized next.
2.2.1 Interval Vectors
An interval $I$ is a closed bounded set of real numbers

$$[l, L] = \{x : l \leq x \leq L\}, \qquad (2.1)$$

where $l$ and $L$ denote its endpoints. An $n$-dimensional interval vector is an ordered $n$-tuple of intervals $(I_1, \ldots, I_j, \ldots, I_n)$. If $I$ is, e.g., a two-dimensional interval vector, then $I = (I_1, I_2)$ for some $I_1 = [l_1, L_1]$ and $I_2 = [l_2, L_2]$.
Set-theoretic operations of intersection, $\cap$, and union, $\cup$, are applicable to intervals. The intersection of two intervals, $I^1$ and $I^2$, is empty, $I^1 \cap I^2 = \emptyset$, if either $l^1 > L^2$ or $L^1 < l^2$. This indicates that $I^1$ and $I^2$ have no common points. Otherwise, the intersection of $I^1$ and $I^2$ is again an interval:

$$I^1 \cap I^2 = [\max(l^1, l^2), \min(L^1, L^2)]. \qquad (2.2)$$

The intersection of interval vectors is empty if the intersection of any of their items is empty. Otherwise, for $I^1 = (I^1_1, \ldots, I^1_j, \ldots, I^1_n)$ and $I^2 = (I^2_1, \ldots, I^2_j, \ldots, I^2_n)$ we have:

$$I^1 \cap I^2 = (I^1_1 \cap I^2_1, \ldots, I^1_j \cap I^2_j, \ldots, I^1_n \cap I^2_n). \qquad (2.3)$$
If two intervals have nonempty intersection, then their union,

$$I^1 \cup I^2 = [\min(l^1, l^2), \max(L^1, L^2)], \qquad (2.4)$$

is an interval. Disconnected sets must not be expressed as a single interval.
The convex hull of two interval vectors, $I^1$ and $I^2$, namely $ch(I^1, I^2)$, is the smallest interval vector containing all their elements. Then,

$$ch(I^1_j, I^2_j) = [\min(l^1_j, l^2_j), \max(L^1_j, L^2_j)], \quad j = 1, \ldots, n. \qquad (2.5)$$

Hull computation is an efficient procedure to combine sets independently of their connection. It follows that $I^1 \cup I^2 \subseteq ch(I^1, I^2)$ for any $I^1$ and $I^2$.
If $I^1 = (I^1_1, \ldots, I^1_j, \ldots, I^1_n)$ and $I^2 = (I^2_1, \ldots, I^2_j, \ldots, I^2_n)$ are interval vectors, then

$$I^1 \subseteq I^2 \text{ if and only if } I^1_j \subseteq I^2_j, \quad j = 1, \ldots, n. \qquad (2.6)$$
We denote the width of an interval vector, namely $wdt(I)$, as the length of its largest side:

$$wdt(I) = \max(wdt(I_1), \ldots, wdt(I_j), \ldots, wdt(I_n)), \qquad (2.7)$$

where

$$wdt(I_j) = L_j - l_j, \quad j = 1, \ldots, n. \qquad (2.8)$$

Finally, it is worth defining the midpoint of an interval $I$:

$$mp(I) = \frac{l + L}{2}. \qquad (2.9)$$

Analogously, if $I = (I_1, \ldots, I_j, \ldots, I_n)$ is an interval vector, then:

$$mp(I) = (mp(I_1), \ldots, mp(I_j), \ldots, mp(I_n)). \qquad (2.10)$$
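The set operations above translate directly into code. The following minimal Python sketch (illustrative helper functions under assumed names, not part of the thesis software) implements intersection (2.2), convex hull (2.5), width (2.8), and midpoint (2.9) for intervals represented as (l, L) pairs; the vector versions follow component-wise:

    # Intervals are (l, L) tuples with l <= L.

    def intersection(i1, i2):
        # Eq. (2.2): None encodes the empty intersection (l1 > L2 or L1 < l2).
        l, L = max(i1[0], i2[0]), min(i1[1], i2[1])
        return (l, L) if l <= L else None

    def hull(i1, i2):
        # Eq. (2.5): smallest interval containing both operands.
        return (min(i1[0], i2[0]), max(i1[1], i2[1]))

    def width(i):
        # Eq. (2.8): L - l.
        return i[1] - i[0]

    def midpoint(i):
        # Eq. (2.9): (l + L) / 2.
        return (i[0] + i[1]) / 2

    # [1, 3] and [2, 5] overlap on [2, 3]; their hull is [1, 5].
    assert intersection((1, 3), (2, 5)) == (2, 3)
    assert hull((1, 3), (2, 5)) == (1, 5)
    assert width((1, 5)) == 4 and midpoint((1, 5)) == 3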
2.2.2 Interval Arithmetic
Operations on real numbers can be extended to intervals. Interval arithmetic treats intervals as numbers: adding, subtracting, multiplying, and dividing them. The rules for interval addition and subtraction are:

$$I^1 + I^2 = [l^1, L^1] + [l^2, L^2] = [l^1 + l^2, L^1 + L^2], \qquad (2.11)$$

$$I^1 - I^2 = [l^1, L^1] - [l^2, L^2] = [l^1 - L^2, L^1 - l^2]. \qquad (2.12)$$

Operations of addition and subtraction for interval vectors are understood to be component-wise. For two interval vectors, $I^1 = (I^1_1, \ldots, I^1_j, \ldots, I^1_n)$ and $I^2 = (I^2_1, \ldots, I^2_j, \ldots, I^2_n)$, we have

$$I^1 + I^2 = (I^1_1 + I^2_1, \ldots, I^1_j + I^2_j, \ldots, I^1_n + I^2_n), \qquad (2.13)$$

$$I^1 - I^2 = (I^1_1 - I^2_1, \ldots, I^1_j - I^2_j, \ldots, I^1_n - I^2_n). \qquad (2.14)$$
For the product of two independent intervals, $I^1$ and $I^2$, we get

$$I^1 I^2 = \{x^1 x^2 : x^1 \in I^1, x^2 \in I^2\}. \qquad (2.15)$$

Clearly, the result is again an interval, say $I^3$, whose endpoints are

$$[l^3, L^3] = [\min(l^1 l^2, l^1 L^2, L^1 l^2, L^1 L^2), \max(l^1 l^2, l^1 L^2, L^1 l^2, L^1 L^2)]. \qquad (2.16)$$

The reciprocal of an interval $I$ yields:

$$1/I = \{1/x : x \in I\}. \qquad (2.17)$$

If $I$ is an interval not containing the number 0, then $1/I = [1/L, 1/l]$, since $1/x$ is decreasing on any interval that excludes 0. In case $I$ contains 0, so that $l \leq 0 \leq L$, the set is unbounded and cannot be represented as an interval whose endpoints are real numbers. For the quotient of two intervals, we have:

$$I^1 / I^2 = I^1 (1/I^2) = \{x^1 / x^2 : x^1 \in I^1, x^2 \in I^2\}. \qquad (2.18)$$

$I^1 / I^2$ is again an interval if 0 is not contained in $I^2$; $I^1$ and $I^2$ are independent. The product and quotient operations for interval numbers carry over to interval vectors. For two interval vectors, $I^1 = (I^1_1, \ldots, I^1_j, \ldots, I^1_n)$ and $I^2 = (I^2_1, \ldots, I^2_j, \ldots, I^2_n)$, it follows that:

$$I^1 I^2 = (I^1_1 I^2_1, \ldots, I^1_j I^2_j, \ldots, I^1_n I^2_n), \qquad (2.19)$$

$$I^1 / I^2 = (I^1_1 / I^2_1, \ldots, I^1_j / I^2_j, \ldots, I^1_n / I^2_n). \qquad (2.20)$$
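As a sketch of the arithmetic rules above (again illustrative code, not the thesis implementation), the operations (2.11), (2.12), (2.16), (2.17), and (2.18) can be written directly in Python:

    # Intervals are (l, L) tuples with l <= L.

    def add(i1, i2):
        # Eq. (2.11): [l1 + l2, L1 + L2].
        return (i1[0] + i2[0], i1[1] + i2[1])

    def sub(i1, i2):
        # Eq. (2.12): [l1 - L2, L1 - l2].
        return (i1[0] - i2[1], i1[1] - i2[0])

    def mul(i1, i2):
        # Eq. (2.16): min/max over the four endpoint products.
        p = (i1[0]*i2[0], i1[0]*i2[1], i1[1]*i2[0], i1[1]*i2[1])
        return (min(p), max(p))

    def reciprocal(i):
        # Eq. (2.17): 1/I = [1/L, 1/l]; unbounded if I contains 0.
        if i[0] <= 0 <= i[1]:
            raise ValueError("interval contains 0; 1/I is unbounded")
        return (1 / i[1], 1 / i[0])

    def div(i1, i2):
        # Eq. (2.18): I1 / I2 = I1 * (1/I2).
        return mul(i1, reciprocal(i2))

    # [1, 2] - [0, 1] = [0, 2]; [1, 2] / [2, 4] = [0.25, 1.0].
    assert sub((1, 2), (0, 1)) == (0, 2)
    assert div((1, 2), (2, 4)) == (0.25, 1.0)

Note that interval subtraction is not the inverse of addition: $I - I = [l - L, L - l]$ generally contains 0 without being degenerate, which is one reason rigorous interval bounds tend to widen as computations proceed.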
2.2.3 Distance Between Intervals
A suitable metric to measure the distance between two intervals, $I^1$ and $I^2$, is:

$$d(I^1, I^2) = \max(|l^1 - l^2|, |L^1 - L^2|). \qquad (2.21)$$

With this metric, the correspondence between the interval number system and the real number system, $[x, x] \leftrightarrow x$, holds (106). The metric $d(\cdot)$ preserves the distance between the corresponding items. We have that

$$d([x^1, x^1], [x^2, x^2]) = \max(|x^1 - x^2|, |x^1 - x^2|) = |x^1 - x^2| \qquad (2.22)$$

for any $x^1$ and $x^2$. The real line is isometrically embedded into the metric space of intervals (106).

The distance between two interval vectors, $I^1 = (I^1_1, \ldots, I^1_n)$ and $I^2 = (I^2_1, \ldots, I^2_n)$,

$$d(I^1, I^2) = (\max(|l^1_1 - l^2_1|, |L^1_1 - L^2_1|), \ldots, \max(|l^1_n - l^2_n|, |L^1_n - L^2_n|)), \qquad (2.23)$$

is a vector whose components are the distances between corresponding intervals. Sometimes we are more interested in a single number to represent the overall distance between interval vectors. A measure for the overall distance between two interval vectors, $I^1$ and $I^2$, is

$$D(I^1, I^2) = \max(d(I^1, I^2)). \qquad (2.24)$$
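A direct transcription of this metric, as a sketch under the same (l, L) representation used in the earlier snippets:

    def dist(i1, i2):
        # Eq. (2.21); on degenerate intervals [x, x] it reduces to |x1 - x2|,
        # as in Eq. (2.22).
        return max(abs(i1[0] - i2[0]), abs(i1[1] - i2[1]))

    def dist_overall(v1, v2):
        # Eq. (2.24): overall distance between two interval vectors.
        return max(dist(i1, i2) for i1, i2 in zip(v1, v2))

    assert dist((1, 3), (2, 5)) == 2
    assert dist((4, 4), (7, 7)) == 3  # the real line embeds isometrically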
2.2.4 Interval Functions
Consider a real-valued function $f(x)$ and a corresponding interval-valued function $f(I)$. $f(I)$ is a united extension of $f(x)$ if $f(I) = f(x)$ for any value of $x \in I$. If the parameters of $f(I)$ are degenerated, then $f(I)$ is a degenerated interval equal to $f(x)$. Formally, the image of an interval $I$ under a real mapping $f$ is
$$f(I) = \{f(x) : x \in I\}. \qquad (2.25)$$

More generally, the image of a specified $n$-dimensional vector $I$ admitting a multivariable real function $f$ is:

$$f(I_1, \ldots, I_j, \ldots, I_n) = \{f(x_1, \ldots, x_j, \ldots, x_n) : x_j \in I_j \ \forall j\}. \qquad (2.26)$$

Generally, the image of an interval through $f$ is not a box (see Fig. 2.1) and it may be difficult to obtain in closed form. In practice, $f(I)$ can be approximated by an inclusion function $F(I)$, which is a box in the range of $f$ if $f$ is continuous. An interval function $F$ from $\mathbb{IR}^n$ to $\mathbb{IR}^m$ is called an interval inclusion function of $f$ if

$$f(I) \subseteq F(I) \quad \forall I \in \mathbb{IR}^n. \qquad (2.27)$$

Inclusion functions are not unique and they depend on how we choose $F$. An inclusion function is optimal if $F^*(I)$ is the interval hull of $f(I)$. In other words, the optimal interval inclusion function for $f(I)$ is the smallest box $F^*(I)$ that contains $f(I)$. Figure 2.1 illustrates the idea. $F^*(I)$ is unique.

Figure 2.1: Image $f$ of box $I$ and inclusion functions $F$ and $F^*$
In particular, for degenerated intervals $I$, it follows that:

$$F(I) = f(I) = F^*(I). \qquad (2.28)$$

Consider $f$ monotonically increasing in $I = [l, L]$. Then, assuming continuity or upper semicontinuity of $f$, we can obtain $f(I)$ using:

$$f(I) = [f(l), f(L)]. \qquad (2.29)$$

Consequently,

$$f(x) \in [f(l), f(L)] \quad \forall x \in I. \qquad (2.30)$$

With monotonic decreasing functions, we order the resulting endpoints properly; in these cases $f(I) = [f(L), f(l)]$, and the inclusion relationship still holds.

Nonmonotonic functions can be monotonic under endpoint constraints. For example, $f(I) = \sin(I)$ is not monotonic in general, but defining $I = [-\pi/2, \pi/2]$, then $f(I)$ is monotonic and $f(I) = \sin(I) = [\sin(l), \sin(L)]$.

An interval function $f(I)$ is inclusion isotonic when, for any interval vectors $I^1$ and $I^2$,

$$\text{if } I^1 \subseteq I^2, \text{ then } f(I^1) \subseteq f(I^2). \qquad (2.31)$$

Finite interval arithmetic (104) is inclusion isotonic. Let $\circ$ denote any of the operations of addition, subtraction, multiplication, and division; thus

$$I^1 \circ I^2 \subseteq I^3 \circ I^4 \qquad (2.32)$$

holds whenever $I^1 \subseteq I^3$ and $I^2 \subseteq I^4$. In this thesis all interval enclosures are inclusion isotonic interval extensions of real-valued continuous functions.
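The gap between an inclusion function $F$ and the optimal $F^*$ can be seen in a small sketch. Evaluating $f(x) = x \cdot x$ with the interval product (2.16), the natural interval extension, yields a valid but pessimistic enclosure when $I$ contains 0, whereas the exact image of $x^2$ is easy to write down for this simple $f$ (illustrative code; the function names are ours):

    def mul(i1, i2):
        # Interval product, Eq. (2.16).
        p = (i1[0]*i2[0], i1[0]*i2[1], i1[1]*i2[0], i1[1]*i2[1])
        return (min(p), max(p))

    def F(i):
        # Natural interval extension of f(x) = x*x: an inclusion function (2.27).
        return mul(i, i)

    def F_star(i):
        # Optimal inclusion F*(I): the exact image of x^2 over I.
        lo = 0.0 if i[0] <= 0 <= i[1] else min(i[0]**2, i[1]**2)
        return (lo, max(i[0]**2, i[1]**2))

    I = (-1.0, 2.0)
    print(F(I))       # (-2.0, 4.0): encloses f(I) but overestimates
    print(F_star(I))  # (0.0, 4.0): the smallest box containing f(I)

The overestimation arises because the two occurrences of $I$ in mul(i, i) are treated as independent, the so-called dependency problem of interval arithmetic.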
An interval function $f(I) \in \mathbb{IR}$ is called ‘thin’ when it involves only degenerate interval parameters or, equivalently, singular parameters. For instance,

$$f(I) = a_0 + \sum_{j=1}^{n} a_j I_j \qquad (2.33)$$

is thin for $(a_0, \ldots, a_n)$ degenerated intervals. When an interval function involves at least one interval parameter of nonzero width, it is called ‘thick’. This thesis considers thin interval functions only.

Interval analysis goes far beyond what has been covered in this section. For instance, we do not address interval statistics (49), intervals in fuzzy set theory (105), interval integration (106), or complex interval arithmetic (120), but only what is essential to the completeness of this work.
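As an illustration (a sketch, not the thesis code), the thin linear function (2.33) with singular coefficients can be evaluated endpoint-wise:

    def thin_linear(a, intervals):
        # Eq. (2.33): f(I) = a0 + a1*I1 + ... + an*In with real (degenerate)
        # coefficients a = [a0, a1, ..., an]; intervals are (l, L) tuples.
        lo = hi = a[0]
        for aj, (l, L) in zip(a[1:], intervals):
            lo += min(aj * l, aj * L)  # a negative aj swaps the endpoints
            hi += max(aj * l, aj * L)
        return (lo, hi)

    # 1 + 2*[0, 1] - 1*[1, 2] = [-1, 2].
    assert thin_linear([1.0, 2.0, -1.0], [(0.0, 1.0), (1.0, 2.0)]) == (-1.0, 2.0)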
2.3 From Interval Analysis to Fuzzy Set Theory
While interval analysis arose out of a need to analyze error and uncertainty on
digital computers (103), fuzzy set theory arose from a need for more complete and
inclusive mathematical models of uncertainty (149). Relationships between fuzzy
set theory and interval mathematics have been reported by Lodwick (93).
Fuzzy arithmetic (67) is defined by means of the extension principle for fuzzy
sets (70) (149). The extension principle for fuzzy sets is the united extension in
the interval analysis terminology when the fuzzy set is restricted to be an interval
(93). When intervals and fuzzy sets are non-interactive, arithmetic on alpha level
sets is a united extension arithmetic. Both concepts are related fundamentally
through what is known as set functions (131).
From the point of view of intervals as sets, interval analysis can be considered
as a subset of fuzzy set theory. For instance, an interval $[l, L]$ is a trapezoidal fuzzy set $[l, \lambda, \Lambda, L]$ where $l = \lambda$ and $\Lambda = L$ (138).
Fuzzy interval analysis (40) and interval type-2 fuzzy logic systems (100) (150)
are explicit examples of joint efforts between fuzzy set theory and interval analysis
to overcome the difficulties of uncertainty modeling.
Interval analysis and fuzzy set theory are instances of practical frameworks
used to represent granular information and construct granular mappings. Con-
ceptually, intervals and fuzzy sets are different ways to model imprecise quanti-
ties and capture our inherent notion of approximate numbers. ‘Above 100’ and
‘around 1.5 and 1.7’ are instances of intervals whereas ‘approximately 100’ and
‘around 1.6’ are instances of fuzzy sets.
A striking difference between intervals and fuzzy sets comes from the idea
of partial membership intrinsic to fuzzy sets. Whenever interval quantification
becomes too restrictive, fuzzy sets provide an important feature of describing in-
formation granules whose constituting elements may belong only partially. Fuzzy
sets prevent defining hard borders between full belongingness and full exclusion
by means of smooth transition boundaries. Granules formalized in the language
of fuzzy sets support a vast array of human-centric pursuits (116).
2.4 Fuzzy Sets
Fuzzy sets (70) (149) constitute one of the most influential notions in science and
engineering. A fuzzy set captures in a granular way the manner in which much of physical phenomena is observed and described. Fuzzy information granulation
underlies the basic concepts of linguistic variables, fuzzy rules, and fuzzy rule
base (116). In fuzzy set theory, objects, variables and concepts are a matter of
degree. In particular, fuzzy information granulation allows both the incorporation
of domain knowledge and knowledge discovery from data.
Fuzzy sets extend the notion of set by assigning to each element of a reference
set a value representing its degree of membership in the fuzzy set. Membership
values correspond to the degree the element is similar with typical elements rep-
resenting the concept associated with the fuzzy set. This characteristic of fuzzy
sets facilitates the management of the uncertainty carried by such elements.
Concepts and definitions related to fuzzy sets which are useful for our inves-
tigations are summarized in next.
2.4.1 Fuzzy Set Definitions
Fuzzy sets are fully characterized by their membership functions. Any function
A: X → [0,1] may serve as a membership function of a fuzzy set A. In this
thesis we assume trapezoidal membership functions, which are piecewise linear
functions described by four parameters (l, λ, Λ, L). The membership degree of an
element x in the trapezoidal fuzzy set A is

A(x) = \begin{cases} 0, & x < l \\ \frac{x - l}{\lambda - l}, & x \in [l, \lambda[ \\ 1, & x \in [\lambda, \Lambda] \\ \frac{L - x}{L - \Lambda}, & x \in \,]\Lambda, L] \\ 0, & x > L \end{cases}    (2.34)
A fuzzy set A is normal if it produces a membership degree equal to 1 for at
least one element x of the universe X. Denote sup as the supremum value of A
for some element x; then A is normal if

\sup_{x \in X} A(x) = 1.    (2.35)
We denote the support and core of a trapezoidal membership function A, respectively,
as the set of elements of X with nonzero membership degrees in A, and the
set of elements of X with membership degrees equal to 1, that is, for a trapezoidal
membership function A,

supp(A) = \{x \in X \mid A(x) > 0\} = [l, L],  and    (2.36)

core(A) = \{x \in X \mid A(x) = 1\} = [\lambda, \Lambda].    (2.37)
The α-cut of a fuzzy set A, denoted A_α, is the set containing all elements of X
whose membership degrees are greater than the value α. We have

A_\alpha = \{x \in X \mid A(x) > \alpha\}.    (2.38)

Support (α = 0) and core (α = 1) are boundary cases of α-level sets.
A fuzzy set is convex if for all x_1, x_2 \in X and all \kappa \in [0,1] it follows that

A(\kappa x_1 + (1 - \kappa) x_2) \geq \min(A(x_1), A(x_2)).    (2.39)
A fuzzy set A_1 is a subset of A_2 if and only if every element of A_1 is also an
element of A_2:

A_1(x) \leq A_2(x),  for all x \in X.    (2.40)
The midpoint and width of a membership function A are, respectively:

mp(A) = \frac{\lambda + \Lambda}{2},    (2.41)

wdt(A) = L - l.    (2.42)
Intersection and union of two fuzzy sets, say A_1 and A_2, are defined as

(A_1 \cap A_2)(x) = \min(A_1(x), A_2(x))  \;\forall x \in X,    (2.43)

(A_1 \cup A_2)(x) = \max(A_1(x), A_2(x))  \;\forall x \in X.    (2.44)

The convex hull of two trapezoidal fuzzy sets A_1 and A_2 is a trapezoidal fuzzy
set determined as follows:

ch(A_1, A_2) = (\min(l_1, l_2), \min(\lambda_1, \lambda_2), \max(\Lambda_1, \Lambda_2), \max(L_1, L_2)).    (2.45)
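As an illustration of the definitions above, the following Python sketch computes the trapezoidal membership degree (2.34), midpoint (2.41), width (2.42), and convex hull (2.45), assuming a trapezoid is stored as a 4-tuple (l, lam, Lam, L) with l <= lam <= Lam <= L. Names are illustrative only.

def membership(x, A):
    """Trapezoidal membership degree of x in A = (l, lam, Lam, L), Eq. (2.34)."""
    l, lam, Lam, L = A
    if x < l or x > L:
        return 0.0
    if lam <= x <= Lam:
        return 1.0                    # core: full membership
    if x < lam:
        return (x - l) / (lam - l)    # ascending edge on [l, lam[
    return (L - x) / (L - Lam)        # descending edge on ]Lam, L]

def midpoint(A):
    """mp(A) = (lam + Lam) / 2, Eq. (2.41)."""
    return (A[1] + A[2]) / 2.0

def width(A):
    """wdt(A) = L - l, Eq. (2.42)."""
    return A[3] - A[0]

def convex_hull(A1, A2):
    """ch(A1, A2), Eq. (2.45): the tightest trapezoid enclosing A1 and A2."""
    return (min(A1[0], A2[0]), min(A1[1], A2[1]),
            max(A1[2], A2[2]), max(A1[3], A2[3]))

A = (0.0, 1.0, 2.0, 4.0)
print(membership(0.5, A), midpoint(A), width(A))   # 0.5, 1.5, 4.0
print(convex_hull(A, (3.0, 3.5, 5.0, 6.0)))        # (0.0, 1.0, 5.0, 6.0)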
2.4.2 Fuzzy Interval
Granular data may take various forms depending on how they are modeled. They
can be intervals, probability distributions, rough sets, fuzzy numbers, and fuzzy
intervals (42). Fuzzy intervals and fuzzy numbers are instances of fuzzy granular
data. Fuzzy data arise in the realm of expert knowledge, whenever measurements
are inaccurate, variables are hard to quantify precisely, or pre-processing
steps introduce uncertainty in singular data.
A membership function A: X → [0,1] is upper semi-continuous if the set
{x ∈ X | A(x) > α} is closed, that is, if the α-cuts of A are closed intervals. If
the universe X is the set of real numbers and A is normal, with A(x) = 1 ∀x ∈ [λ, Λ],
then A is a model of a fuzzy interval, with monotone increasing function
φ_A: [l, λ[ → [0,1], monotone decreasing function ι_A: ]Λ, L] → [0,1], and zero
otherwise. A fuzzy interval A has the following canonical form:

A(x) = \begin{cases} \varphi_A, & x \in [l, \lambda[ \\ 1, & x \in [\lambda, \Lambda] \\ \iota_A, & x \in \,]\Lambda, L] \\ 0, & \text{otherwise,} \end{cases}    (2.46)

where x is a real number in X. The fuzzy interval A satisfies the conditions of
normality (A(x) = 1 for at least one x ∈ X) and convexity (A(κx_1 + (1 − κ)x_2) ≥
min{A(x_1), A(x_2)}, ∀x_1, x_2 ∈ X, κ ∈ [0,1]). If

\varphi_A = \frac{x - l}{\lambda - l}  and    (2.47)

\iota_A = \frac{L - x}{L - \Lambda},    (2.48)

then the fuzzy membership function (2.46) reduces to the model of the trapezoidal
membership function (2.34). Moreover, when λ = Λ, then A(x) = 1 for a single
element x. In this case the corresponding fuzzy entity is called a fuzzy number
(116). Fuzzy data generalize numeric data by allowing fuzziness.
2.4.3 Similarity Between Fuzzy Sets
Granular data and models are fuzzy objects of trapezoidal nature. In this case,
a useful similarity measure for trapezoids, say A_1 and A_2, is:

S(A_1, A_2) = 1 - \frac{|l_1 - l_2| + |\lambda_1 - \lambda_2| + |\Lambda_1 - \Lambda_2| + |L_1 - L_2|}{4}.    (2.49)

This measure translates the relation between the trapezoids into a number. It
returns 1 for identical trapezoids (indicating the maximum degree of matching
between them) and decreases linearly as A_1 and A_2 move away from each other.
Particularly, equation (2.49) is a Hamming-like metric (52) where the parameters
of the trapezoids are compared one by one. A thorough discussion of similarity
and compatibility measures can be found in (33).
The distance between two vectors of trapezoids, say A^1 = (A_1^1, ..., A_n^1) and
A^2 = (A_1^2, ..., A_n^2),

S(A^1, A^2) = 1 - \frac{1}{4n} \sum_{j=1}^{n} \left(|l_j^1 - l_j^2| + |\lambda_j^1 - \lambda_j^2| + |\Lambda_j^1 - \Lambda_j^2| + |L_j^1 - L_j^2|\right),    (2.50)

is also a number, which quantifies their relationship.
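A small Python sketch of the similarity measures (2.49) and (2.50) follows, reusing the (l, lam, Lam, L) tuple representation for trapezoids over normalized data in [0, 1]; names are illustrative.

def similarity(A1, A2):
    """S(A1, A2), Eq. (2.49): parameter-wise absolute differences, averaged."""
    return 1.0 - sum(abs(p1 - p2) for p1, p2 in zip(A1, A2)) / 4.0

def vector_similarity(V1, V2):
    """Eq. (2.50): average the parameter-wise differences over n trapezoids."""
    n = len(V1)
    total = sum(abs(p1 - p2)
                for A1, A2 in zip(V1, V2)
                for p1, p2 in zip(A1, A2))
    return 1.0 - total / (4.0 * n)

A1 = (0.1, 0.2, 0.3, 0.4)
A2 = (0.2, 0.3, 0.4, 0.5)
print(similarity(A1, A1))   # 1.0: identical trapezoids
print(similarity(A1, A2))   # 0.9: decreases linearly with the shift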
2.5 Aggregation Operators
Aggregation operators C: [0,1]^n → [0,1], n > 1, combine input values in the unit
hypercube [0,1]^n into a single output value in [0,1]. They must satisfy two
fundamental properties: (i) monotonicity in all arguments, i.e., given
x^1 = (x_1^1, ..., x_n^1) and x^2 = (x_1^2, ..., x_n^2), if x_j^1 ≤ x_j^2 ∀j then
C(x^1) ≤ C(x^2); (ii) boundary conditions: C(0, 0, ..., 0) = 0 and
C(1, 1, ..., 1) = 1. Important classes of aggregation operators are summarized
below. See (20) (116) for details.
2.5.1 T-norm Aggregation
T-norms (T) are commutative, associative, and monotone operators on the unit
hypercube whose boundary conditions are T(α, α, ..., 0) = 0 and T(α, 1, ..., 1) =
α, α ∈ [0,1]. The neutral element of T-norms is e = 1. An example is the
minimum operator:

T_{min}(x) = \min_{j=1,...,n} x_j,    (2.51)

which is the strongest T-norm because

T(x) \leq T_{min}(x)  for any  x \in [0,1]^n.    (2.52)

The minimum is also idempotent, symmetric, and Lipschitz-continuous. Further
examples of T-norms include the product,

T_{prod}(x) = \prod_{j=1}^{n} x_j,    (2.53)

and the Lukasiewicz T-norm,

T_L(x) = \max\left(0, \sum_{j=1}^{n} x_j - (n - 1)\right).    (2.54)
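The three T-norms above can be sketched in a few lines of Python, acting on a list x of membership degrees in [0, 1]; function names are illustrative.

import math

def t_min(x):
    return min(x)                              # minimum, Eq. (2.51)

def t_prod(x):
    return math.prod(x)                        # product, Eq. (2.53)

def t_lukasiewicz(x):
    return max(0.0, sum(x) - (len(x) - 1))     # Lukasiewicz, Eq. (2.54)

x = [0.9, 0.8, 0.7]
# T(x) <= Tmin(x) for any T-norm, Eq. (2.52):
print(t_min(x), t_prod(x), t_lukasiewicz(x))   # 0.7, 0.504, 0.4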
2.5.2 S-norm Aggregation
S-norms (S) are operators on the unit hypercube which are commutative, associative,
and monotone. S(α, α, ..., 1) = 1 and S(α, 0, ..., 0) = α are the boundary
conditions of S-norms. It follows that e = 0 is the neutral element of S-norms.
S-norms are stronger than T-norms. The maximum operator:

S_{max}(x) = \max_{j=1,...,n} x_j,    (2.55)

is the weakest S-norm, that is,

S(x) \geq S_{max}(x) \geq T(x),  for any  x \in [0,1]^n.    (2.56)
Other examples of S-norms include the probabilistic sum,

S_{prob}(x) = 1 - \prod_{j=1}^{n} (1 - x_j),    (2.57)

and the Lukasiewicz S-norm,

S_L(x) = \min\left(1, \sum_{j=1}^{n} x_j\right).    (2.58)

The dual C^D of an aggregation operator C is

C^D(x_1, ..., x_n) = 1 - C(1 - x_1, ..., 1 - x_n).    (2.59)

Maximum and minimum, probabilistic sum and product, and the Lukasiewicz S- and
T-norms are examples of dual pairs of aggregation operators.
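The S-norms and the duality transform (2.59) admit an equally short Python sketch; as before, names are illustrative and x is a list of degrees in [0, 1].

import math

def s_max(x):
    return max(x)                                    # maximum, Eq. (2.55)

def s_prob(x):
    return 1.0 - math.prod(1.0 - xj for xj in x)     # probabilistic sum, Eq. (2.57)

def s_lukasiewicz(x):
    return min(1.0, sum(x))                          # Lukasiewicz, Eq. (2.58)

def dual(C):
    """Return the dual operator C^D of C, Eq. (2.59)."""
    return lambda x: 1.0 - C([1.0 - xj for xj in x])

x = [0.9, 0.8, 0.7]
print(s_max(x), s_prob(x), s_lukasiewicz(x))         # 0.9, 0.994, 1.0
print(dual(s_max)(x))                                # 0.7 = min(x): the dual of max is min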
2.5.3 Uninorm Aggregation
Uninorms (U) are bivariate, associative and symmetric operators closed under
duality. Similarly as with T-norms and S-norms, associativity allows n-ary
extension of uninorms. Uninorms U: [0,1]^n → [0,1] generalize triangular norms
by relaxing the assumption about the neutral element e, allowing it to take values
in [0,1]. Input values higher than e are interpreted as beneficial, a positive
evidence; input values lower than e are considered detrimental, a negative evidence.
Naturally, when e is equal to 0 a uninorm turns into an S-norm, and when e = 1 the
uninorm becomes a T-norm.
28
2.5 Aggregation Operators
This work considers the following family of uninorms:

U(x) = \begin{cases} e \, T\!\left(\frac{x_1}{e}, ..., \frac{x_n}{e}\right), & \text{if } x \in [0, e]^n \\ e + (1 - e) \, S\!\left(\frac{x_1 - e}{1 - e}, ..., \frac{x_n - e}{1 - e}\right), & \text{if } x \in [e, 1]^n \\ T(x_1, ..., x_n), & \text{otherwise,} \end{cases}    (2.60)

where e ≠ 0 and e ≠ 1. Any pair of T- and S-norms may be used to construct the
uninorm U independently of their properties or duality.
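A sketch of the uninorm family (2.60) in Python follows, assuming T and S are functions over lists of degrees (such as min and max); inputs jointly below e are aggregated by the scaled T-norm, inputs jointly above e by the scaled S-norm, and mixed inputs fall back to the T-norm, as in the text.

def uninorm(x, e, T, S):
    """Family of uninorms of Eq. (2.60), with neutral element e in ]0, 1[."""
    if all(xj <= e for xj in x):
        return e * T([xj / e for xj in x])
    if all(xj >= e for xj in x):
        return e + (1.0 - e) * S([(xj - e) / (1.0 - e) for xj in x])
    return T(x)

# Using the min/max pair: values above e reinforce, values below e penalize.
print(uninorm([0.8, 0.9], 0.5, min, max))   # 0.5 + 0.5*max(0.6, 0.8) = 0.9
print(uninorm([0.2, 0.4], 0.5, min, max))   # 0.5*min(0.4, 0.8) = 0.2
print(uninorm([0.2, 0.9], 0.5, min, max))   # mixed arguments: T(x) = 0.2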
2.5.4 Averaging Aggregation
An aggregation operator C is averaging if for every x ∈ [0,1]^n it is bounded by

T_{min}(x) \leq C(x) \leq S_{max}(x).    (2.61)

The basic rule is that the output value cannot be lower than the lowest input
value nor higher than the highest one. An example of an averaging operator is the
arithmetic mean:

M(x) = \frac{1}{n} \sum_{j=1}^{n} x_j.    (2.62)

Averaging operators are assumed to be idempotent, strictly increasing, symmetric,
homogeneous, and Lipschitz continuous.
2.5.5 Compensatory T-S Aggregation
Compensatory T-S operators combine T-norms and S-norms to counterbalance
their opposite effects. Contrary to uninorm aggregation, T-S aggregation is
uniform in the sense that it does not depend on parts of the underlying domain.
T-S operators use both a T-norm and an S-norm and average the two values
obtained by means of a weighted quasi-arithmetic mean. The linear convex
operator
L(x) = (1 - v) \, T(x_1, ..., x_n) + v \, S(x_1, ..., x_n),    (2.63)

where v ∈ [0,1], is an example of a T-S operator of the family of weighted
quasi-arithmetic means. T-S operators need not be dual in terms of T and S. It
follows that:

S(x) \geq L(x) \geq T(x),  for any  x \in [0,1]^n.    (2.64)
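The linear convex operator (2.63) reduces to a one-line Python sketch; the weight v trades off the pessimism of the T-norm against the optimism of the S-norm, with v = 0 recovering T and v = 1 recovering S.

def t_s_linear(x, v, T, S):
    """Compensatory T-S operator, Eq. (2.63)."""
    return (1.0 - v) * T(x) + v * S(x)

x = [0.9, 0.8, 0.7]
print(t_s_linear(x, 0.5, min, max))   # 0.5*0.7 + 0.5*0.9 = 0.8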
2.6 Summary
This chapter has addressed principles and definitions of granular computing that
are useful for the comprehension of subsequent chapters. We argued that in-
formation granulation plays a primary role both in handling data of uncertain
nature and in representing concepts described in natural language. We empha-
sized interval and fuzzy granular computing frameworks - with intervals and fuzzy
sets being instances of information granules. When processing granular data we
are in fact handling a significant number of similar individual elements at the
same time and therefore ignoring details. This chapter also covered aggregation
operators, which are pertinent for information fusion within a granular computing
environment.
Chapter 3
Evolving Granular Systems
Evolving granular systems are a modeling framework that considers online gran-
ular data stream processing and structurally adaptive rule-based models. As
uncertain data prevail in stream applications, excessive data granularity becomes
unnecessary and inefficient. This chapter starts with the motivation which led to
the development of evolving intelligent systems. We briefly summarize the main
historical landmarks of the research area leading to the state of the art. Next, we
introduce evolving granular systems, which extend evolving intelligent systems
allowing data, variables and parameters to be granular (intervals and fuzzy sets).
The aim of evolving granular systems is to fit the information carried by input-
output data streams from online nonstationary processes into rule-based models
and, at the same time, provide granular approximation of functions and linguistic
description of the system behavior.
3.1 Introduction
Adaptability is of paramount importance for intelligent systems. As Darwin
observed (35), it is neither the strongest nor the most intelligent that survives, but
the most adaptable to change. Building adaptive models from large volumes of
real-world online data flows requires developing non-conventional learning algo-
rithms able to continuously track system and environment changes. Rethinking
traditional data mining and modeling techniques is essential to support structural
adaptation of information systems based on sequences of data, possibly of
uncertain nature.
Because data acquisition systems and small scale computing devices became
mere components of complex systems, large amounts of data have been produced
uninterruptedly. Storage of large-scale data sets and offline processing are fre-
quently impractical, especially in online applications. In addition, data from
different sources may be temporally and spatially related. Online learning algo-
rithms should benefit from time and space data stream correlations to capture
essential information and recursively translate it into structured knowledge. The
effectiveness of data stream-oriented learning algorithms is rooted in their apti-
tude to quickly evolve models from nonstationary data.
Learning system models from data streams in online mode is a challenging
task for most statistical and computational intelligence methods. Adaptive -
and naturally non-adaptive - learning methods face a number of drawbacks when
dealing with evolving data streams including: (i) difficulty in choosing the model
structure since data sets and related information are not available; (ii ) forget-
fulness when trying to acquire new information after concept changes; and (iii)
limited transparency and interpretability of the resulting model. In particular,
there is a need for developing recursive learning methods that explore the na-
ture of data streams (11) and at the same time fulfill accuracy, transparency and
interpretability requirements (4).
3.2 Evolving Intelligent Systems
Approaches to extract meaningful information from data streams have recently
been developed (1) (9) (10) (24) (25) (32) (46) (59) (66) (75) (79) (83) (84)
(94) (122) (124) (125) (134). Methods and algorithms directed toward this end
are known as Evolving Intelligent Systems. Evolving intelligent systems focus on
nonstationary processes and embody online learning methods and one-pass incre-
mental algorithms that evolve or gradually change individual models to guarantee
life-long learning and self-organization of the system structure.
Evolving systems are a step toward a higher level of adaptability compared
to conventional adaptive systems from control theory (13), classical identification
systems (92), and traditional data mining systems (54) (135). While the term
‘intelligent’ comes from the use of fuzzy and neuro-fuzzy (computational intelli-
gence) techniques, the evolving aspect of these systems accounts for unbounded
(infinite) amounts of data, changing concepts, and structural adaptation of mod-
els.
Formally stated, a system is said to be evolving if it:
- learns continuously from data streams;
- does not store previous samples;
- does not depend upon prior structural knowledge;
- self-adapts its structure when needed;
- is independent of statistical properties of data; and
- does not use ‘prototype’ initialization.
Moreover, it is highly desirable that evolving systems assimilate knowledge quickly
with small memory requirements to support real-time applications. Evolving systems
must account for the fact that the unknown is likely to matter.
In terms of implementation, evolving systems usually achieve their final purpose
at the software level, but they may be realized in physical embodiments including
intelligent agents, embedded systems, and ubiquitous computing (11).
3.2.1 Historical Landmarks
In the beginning of this century, two mainstreams of research in evolving intelli-
gent systems were introduced: evolving fuzzy systems (5) and evolving connec-
tionist systems (65). Their origins are independent of one another.
Evolving fuzzy systems (eFS) were proposed by Angelov (5), with evolving
Takagi-Sugeno (eTS) fuzzy systems (6) being a milestone in the field of structurally
adaptive rule-based systems. The eTS is an eFS paradigm for function ap-
proximation and control that fulfils the requirements for flexible and adaptive
approaches of a variety of modern applications such as automation processes,
autonomous systems, intelligent sensors, and defense. eTS assumes that the an-
tecedent and consequent parameters of functional fuzzy rules as well as the num-
ber of rules in a rule base can gradually change by learning from experience based
on data streams. This characteristic provides eTS approaches with the funda-
mental ability to pursue online modeling of time-varying nonstationary functions.
Evolving fuzzy classifiers (eClass) (8) (10) are another approach derived from eFS
when the consequent part of fuzzy rules is a class label. In eClass the number
of classes need not be known in advance and new classes can be incorporated
at any time. eClass models were seminal to the field of evolving classifiers which
possess the ability to capture both concept drift and shift (95).
Evolving connectionist systems (eCOS) were proposed by Kasabov (65) (66).
eCOS are artificial neural networks that operate continuously in time and adapt
their structure and functionality through interaction with the environment and
other systems. A paradigm of eCOS is called evolving fuzzy neural network
(EFuNN) (63), which is the earliest and perhaps most influential model of eCOS.
All neurons in EFuNN are created and updated during learning. They represent
membership functions and rules. Information carried by a data stream is memo-
rized on neurons and connections, and further used for predictions. The EFuNN
structure evolves from hybrid (supervised and unsupervised) algorithms. Partic-
ularly, the fuzzy aspect of EFuNN permits the neural network to be interpreted as
a fuzzy rule-base. Other noteworthy approaches supporting the context of eCOS
are evolving self-organizing maps (eSOM) (34) and dynamic evolving neural-fuzzy
inference systems (DENFIS) (64).
Common to both eFS and neurofuzzy eCOS are fuzzy sets, which are formed
on a basis of numeric data through incremental clustering. Clusters give rise to
fuzzy membership functions that considered together convey a global view of the
available data. In evolving systems, fuzzy membership functions play a key role
as the core of modeling approaches. They aim to represent similar data in a
concise manner. After cluster identification, a recursive algorithm is usually used
to refine local parameters and functions. In both platforms, eFS and neurofuzzy
eCOS, expert knowledge can be incorporated, but it is not compulsory.
From the granular computing point of view, eFS and great part of eCOS can be
considered granular modeling frameworks. Fuzzy sets, used to represent numeric
data, are instances of granules whereas computations in eFS and eCOS are based
on the result of information granulation. However, in general, evolving intelligent
systems cannot be regarded as evolving granular systems in the greatest sense of
the term because they do not deal with input and output granular data and quite
often do not produce granular estimation. In other words, evolving systems are
granular systems internally, and singular systems externally.
Since the conception of evolving intelligent systems, a diversity of studies
suggesting extensions of the original concepts has appeared. Approaches regarding
primarily computational intelligence principles and ideas follow the essential no-
tions of the original evolving intelligent systems. Conversely, there exist parallel
research lines where structurally adaptive learning approaches from data streams
are mostly based on data mining and statistics. Such approaches are often not re-
ferred to as ‘evolving’; however, the central idea of capturing gradual and abrupt
changes in nonstationary data streams is the same independently of the different
terminologies. The next section reviews some state-of-the-art works.
3.2.2 State of the Art
This section summarizes recent research related to learning methods capable of
handling numeric data streams. We do not intend to give an exhaustive review of
the literature. The purpose is to overview works closely related to the approaches
addressed in this thesis.
The evolving participatory learning (ePL) approach (87) combines the concept
of participatory learning (137) with evolving Takagi-Sugeno fuzzy systems. The
ePL approach is based on unsupervised clustering and therefore is a candidate
to find rule base structures in adaptive fuzzy modeling. ePL uses participatory
learning fuzzy clustering instead of scattering or information potential-based clus-
tering used by eTS. At each time step, ePL updates the rule base structure using
convex combinations of new data samples and the closest cluster center. The
parameters of the consequent part of a rule are adapted using a recursive least
squares algorithm.
The evolving multivariable Gaussian approach (eMG) (84) is an evolving func-
tional fuzzy modeling approach which, differently from eTS, uses an evolving
Gaussian clustering algorithm based on the concept of participatory learning.
The clustering algorithm is one-pass and updates the eMG rule base continuously.
Fuzzy sets in eMG are multivariable Gaussian membership functions which are
adopted to preserve information between input variable interactions. The param-
eters of the membership functions, that is, cluster centers and dispersion matrices,
are estimated by the clustering algorithm. A weighted recursive least squares al-
gorithm updates the parameters of the rule consequents. The eMG clustering
algorithm is particularly robust to noisy data and outliers through the use of a
mechanism to smooth incompatible input data.
A data-driven incremental algorithm called flexible fuzzy inference system
(FLEXFIS) was proposed in (94) to evolve Takagi-Sugeno fuzzy systems. A
modified version of vector quantization was suggested for rule evolution. The
FLEXFIS algorithm adapts linear rule consequent functions and premise parameters
(fuzzy membership functions) in online mode. Clusters of data are
automatically generated based on the nature, distribution and quality of new
data. Convergence toward the optimal parameter set in the least-squares sense
has been achieved by the algorithm.
Self-organizing fuzzy modified least-square neural network (SOFMLS) (124)
is a neurofuzzy network capable of adapting itself in real-time to a changing envi-
ronment. In SOFMLS, parametric and structural model adaptation is performed
simultaneously. The neural network generates a new rule if the smallest distance
between a new numeric data vector and rule parameters is higher than a pre-
specified radius. A density-based pruning procedure controls the network growth
over time. SOFMLS does not require retraining of the whole model and has
proved to be able to escape from local minima and be stable to concept changes.
The general fuzzy min-max neural network (GFMM) (46) is a generalization of the
fuzzy min-max clustering and classification neural networks (129) (130). It han-
dles labeled and unlabeled data simultaneously in a single neural model. GFMM
combines supervised and unsupervised learning to give hybrid clustering and
classification. The learning process places and adjusts hyperboxes (expansion-
contraction paradigm) in the feature space in a few or one pass over data sets.
GFMM is able to classify interval data and can be viewed as an incremental
granular classifier.
Learn++.NSE (43) is an ensemble of classifiers-based approach for time-
varying data distribution modeling. Learn++.NSE considers consecutive batches
of data and makes no assumptions about the nature and rate of concept drift.
The algorithm learns incrementally, similar to other algorithms of the Learn++
family (107) (121). Learn++.NSE trains one new classifier for each batch of data
it receives and combines these classifiers using a dynamically weighted majority
voting procedure. This procedure allows the algorithm to recognize and react to
changes in the underlying data distributions. Since data batches are discarded
after use, Learn++.NSE is suitable for online modeling of large volumes of data.
Very fast decision trees (VFDT) (38) is a method to discover knowledge in
databases that builds decision trees using constant memory space and constant
time to process a sample. VFDT operates on high-volume data streams and
gradually creates branches and leaves if necessary. The approach uses Hoeffding
bounds to guarantee that its output is asymptotically nearly identical to that of
a conventional batch learner. VFDT is designed for classification purposes.
The ultra fast forest of trees (UFFT) (48) is a one-pass incremental algorithm
able to detect concept drift. Trees are split according to new information appear-
ing in a numeric data stream. In multi-class classification problems UFFT builds
a binary tree for each possible pair of classes, leading to a forest of trees. De-
cision nodes and leaves contain naive Bayes classifiers to detect changes in class
distribution and classify test examples. When changes in class distributions are
detected, sub-trees rooted at representative nodes are pruned.
Differently from VFDT and UFFT, evolving fuzzy linear regression trees
(eFRT) (83) (85) convey a linear regression model in each leaf. Thus, eFRT
can be used for function approximation and prediction. In general, the number
of tree nodes and the number of inputs can be changed given a new sample. The
tree starts with a single leaf and grows replacing leaves with sub-trees and adding
more variables to the regression model. The eFRT topology is updated on the
fly using a statistical model selection test that considers accuracy and number of
parameters to provide accurate and parsimonious trees.
Massive Online Analysis (MOA) (23) is a software environment for learning
from evolving data streams. MOA supports incremental classification and clus-
tering approaches that do not scale with the volume of information. For
classification, MOA considers boosting, bagging, and Hoeffding trees with and without
naive Bayes classifiers at the leaves. For clustering, it implements the algorithms
StreamKM++, CluStream, ClusTree, Den-Stream, D-Stream, and CobWeb. The
aim of MOA is to provide analysis tools and insight about real-world data stream
mining problems. MOA can interact with the software WEKA, the Waikato
Environment for Knowledge Analysis (135).
3.3 Granular Data Streams
Physical systems change over time and usually produce considerable amounts of
nonstationary data. Data streams in online environment can be granular from
different perspectives. A more intuitive perspective concerns data that are granu-
lar by themselves. To elaborate, consider a simple example of predicting a variable
y from the last available observation x. This leads us to search for an approximand
p to describe the process function f based on pairs (x, y). Here, instances x
and y are singular (real numbers), and function f is single-valued. Singular data
do not restrain models to be singular; rather, a granular system may use granular
models whose size and placement reflect the information carried by singular
data. A hypothesis is that granular representation helps to assess the structure
of detailed singular data and organizes the data into a more interpretable format.
Consider x = [\underline{x}, \overline{x}] and y = [\underline{y}, \overline{y}] as instances of a granular data stream,
intervals in this case. To exemplify, \underline{x} and \overline{x} may denote the minimum and
maximum price of an economic index during a day, and \underline{y} and \overline{y} the range of
fluctuation of the price in the next day. In this example, data are originally
granular, and models [\underline{p}, \overline{p}] must be granular to support granular data. Figure
3.1 illustrates the granular modeling approach for function approximation.

Figure 3.1: Granular models: (a) single-valued function, (b) granular function
Figures 3.1(a) and 3.1(b) show that granular models outer approximate single-
valued and granular functions, respectively. Outer approximations of functions
can always be obtained, e.g., at the top level, the coarsest possible granular
approximation is the problem domain. Although merely enclosing a solution may
sound at first shallower than finding the solution itself, we should reflect that the
degree of satisfaction involved in embracing a solution depends strongly on the
width of the enclosure obtained (60). Moreover, when processing stream data,
we rarely have an idea about the error range and uncertainty associated with the
data. On the contrary, if we can compute with granules containing a solution,
then we can take for example the midpoint as a numeric approximation. Hence,
we obtain both an approximate numeric solution and tolerance bounds on the
approximation. The key task of approximating functions with granules is to seek
the tightest envelope for the approximand.
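The point about enclosures yielding both a numeric estimate and tolerance bounds can be made concrete with a two-line Python sketch; the function name is illustrative.

def numeric_from_enclosure(y_lo, y_hi):
    """From an enclosure [y_lo, y_hi] of a solution, return the midpoint as a
    numeric approximation and the radius as its tolerance bound."""
    return (y_lo + y_hi) / 2.0, (y_hi - y_lo) / 2.0

print(numeric_from_enclosure(1.2, 1.8))   # (1.5, 0.3): estimate 1.5 +/- 0.3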
Another perspective for the materialization of granules in data streams is con-
cerned with the uncertainty introduced during preprocessing steps. Incomplete
data makes precise discrimination of examples difficult. Missing values are usu-
ally predicted through imputation methods (91) (127) where the imputed data
is uncertain by the very nature of the prediction and motivates granules. In
privacy-preserving data mining, uncertainty may be added to the data in order
to preserve the privacy of the results (3). Additionally, noise and disturbances of
bounded-error dynamic context also demand information granulation.
Granular data may arise when measurements are inaccurate or variables are
hard to be quantified. For example, in sensor streams imprecision arises from in-
accuracies in the underlying data acquisition equipment. Often, data are purely
numerical, but the process which generated the data is uncertain. In these cases,
uncertainty in data representation may be useful to improve the quality of the re-
sults. For example, an instance with greater uncertainty may not be as important
as one with smaller uncertainty.
Sometimes, stream data are derived from expert knowledge. Granular computing
provides a general framework to represent real-world perception in natural
language (42) (138). Various considerations can affect one’s choice of data rep-
resentation. Foremost among these is what Zadeh calls cointention (154), the
ability of the representing object to convey the meaning of the concept it is being
used to represent (140).
In a nutshell, stream data can be intervals, probability distributions, rough
sets, and fuzzy intervals (42). We define granular data streams as a sequence of
samples that conveys granular information about a process. Evolving granular
models are built from granular data streams. Interval and fuzzy granular data
streams generalize numeric data streams by allowing interval representation and
fuzziness.
3.4 Evolving Granular Modeling
Nonstationary granular system modeling encompasses adaptive and flexible learn-
ing procedures to deal with many types of data such as numbers, intervals, and
fuzzy intervals. Granular computing provides a rich framework for modeling non-
stationary systems using granular data streams.
Evolving granular modeling (6) (16) (66) (75) (77) (78) (79) (80) (81) (97)
(119) comes not only as an approach to capture the essence of stream data but
also as a framework to extrapolate spatio-temporal correlations from lower-level
raw data and provide a more abstract human-like representation of them. Re-
search effort into granular computing toward online environment-related tasks is
supported by a manifold of relevant applications such as financial, health care,
video and image processing, GPS navigation, click stream analysis, etc.
Our definition of evolving granular system is as follows: evolving granular
systems are systems that are able to derive interpretable rule-based models and
provide granular function approximation using an incremental learning algorithm
and imprecise stream data (with imprecise data being numbers, intervals, fuzzy
intervals, etc.). Association rules given in the form of If-Then statements can be
extracted from an evolving granular construct at any time. The evolved rule base
means, in essence, a granular description of a process.
In practice, evolving granular systems extend evolving intelligent systems in
their capability to handle singular and granular input-output data, and give
single-valued and granular approximations of original single-valued or granular
functions. Granular approximation comes with a linguistic description in addi-
tion to a numeric, pointwise approximation typical of evolving intelligent systems.
Evolving granular systems rely fundamentally on the concepts of granular
view, information granule, and granular mapping (see Section 2.1) in the process
of modeling stream data. Emphasis is on the tasks of data granulation and
computing with granules (143) (145) (146) (152). The granularity of information
explicitly embedded into granular systems offers valuable features in dynamic
modeling such as transparency and flexibility. Naturally, we are concerned with
a certain way of compressing granular data into more intelligible granular models.
Granular data streams are responsible for creation, expansion and shrink-
age of granular models along one or more dimensions of the input and output
spaces, guide parameter adaptation, and order the most appropriate granulari-
ties. Concept change, missing and noisy values, superfluous and outlier samples
are common in online environments and require automatic intervention. When-
ever a sample arrives, evolving algorithms should decide whether to discard it or
to use it to update the current knowledge. Evolving granular learning algorithms
designed to handle online granular data face particular challenges concerning the
value of the current knowledge, which decreases as the concept changes, and the
impossibility of storing or retrieving the data once read. Learning must be one-pass.
Constructive (bottom-up) and decomposition-based (top-down) mechanisms pre-
dominate.
3.5 Time and Space Granulation
Data granulation may be performed in time and space domains. Approaches to
building granules consider temporal granulation before spatial granulation,
as illustrated in Fig. 3.2. This order of granulation is maintained for several
reasons. Occasionally, samples are recorded at different time intervals, e.g., as in
event streams. The need for synchronized analysis of manifold data streams and
search for time-correlated structures give support to the possibility of considering
temporal granulation first. Temporal granulation tends to slow down the data
flow, since several streaming instances can be wrapped by a granular object and
further computations based on granules. Time granules grant synchronism
and a smaller amount of granular data for subsequent spatial analysis. Spatial
correlation uniting heterogeneous data with multiple levels of granularity and
different representations (intervals, fuzzy sets, rough sets, etc.) is captured during
the process of spatial granulation. Structured representation of data is preserved
over time as a synopsis of the data stream; it warrants structured problem solving
at the practical level.
Figure 3.2: Time and space granulation
The flexibility of handling data streams using a granular computing frame-
work enables us to describe granules in different application domains without
deep knowledge about the problem. Tight time and memory constraints of on-
line environment and interpretability requirements inspire granulated views of
detailed data and computing at coarser granularities.
3.5.1 Time Domain Granulation
Time granulation aims at both reducing the sampling rate of fast data streams and
synchronizing concurrent data streams that are input at random time intervals.
A time granule describes the data for a certain time period.
Whenever the bounds of a time granule are aligned with significant shifts in
the target function, the underlying granulation provides a good abstraction of
the data. Conversely, if the alignment is poor, models may be inadequate (17).
Manifold granularities require temporal reasoning and respective formalizations.
Time granules and time windows are distinguished as follows.
Time window (74) (110) stands for a pre-specified or adaptive duration interval
within which data samples assemble a representation. Generally, a fixed number
of samplings or error values defines the size of the window. Windowing the
time domain attempts to produce as few segments as possible to avoid data
overfitting. Few time segments may hide information if the concept changes.
Nonstationarity modifies “ideal” window lengths by its own dynamic. Approaches
to testing window lengths are computationally costly and, hence, infeasible in
environments with narrow time constraints. Essentially, there may exist several
information granules in a time window. Data chunk analysis belongs to window-
based approaches for information extraction and analysis.
A time granule groups data according to their indistinguishability in time.
Since a time granule conveys similar data indexed in time, its bounds are naturally
aligned with substantial changes in the function. The result of dynamic time
granulation is a unique granule per segment. Time granules assume manifold
levels of data abstraction and are aware of the pace of concept changes.
Event streams are examples of streams that usually come about at different
time granularities. They require analysis of time-domain granules for common-
alities extraction prior to space-domain analysis. Broadly stated, information
evoked from time granules can be bounds of intervals, probability distributions
or membership functions, and features such as frequency and correlation between
events, patterns, prototypes. The internal structure of a granule and its associ-
ated variables provide full description and characterization of the granule.
Whenever manifold data streams mismatch each other at finer time granular-
ities we resort to a granulated view of the time domain and a data mining and
modeling approach. The resulting granulation should be at least as coarse as the
coarsest individual stream to agree with the notion of outer approximation of
functions and guaranteed solution (Section 2.1).
3.5.2 Space Domain Granulation
Data granulation over the space domain is a process of organization for compre-
hension (15). Granulation enables us to view different samples as being the same
if low level details are neglected. Granulating the domain space is fundamental in
methods of clustering (2) (17) and information integration (50). Resulting gran-
ules may compose antecedent and consequent parts of rules in rule-based systems
(11) (71) (73).
Whenever variables are recorded simultaneously and the sampling frequency is
low enough that there is time to step recursive algorithms, the time
granulation stage can be ignored and efforts fully concentrated on spatial granu-
lation. In fact, time and space granulation are somewhat related. For instance,
(i) with the minimal and maximal values occurring in a time granule we may
form an interval granular object; (ii) taking a representative mean or median of
instances resting into a time granule and a confidence interval around it we may
form a statistical granular object; (iii ) capturing the core and the uncertainty of
instances falling in a same time granule may give rise to a fuzzy granular object.
Granular objects of any precedence may be taken into consideration as input to
the stage of spatial granulation.
The location and size of a granule play a role in the process of granulation.
Original stream data are compressed to a few granules whose location and gran-
ularity reflect the structure of the data. There are many granulated views of
the same problem. When evolving granular structures, granules are created as
instances of the current knowledge. Next, granules may expand and occupy the
space wherever new instances arrive. Operations on granules combine granules to
form a coarser granule or decompose a granule into finer granules. Operations on
granules should be consistent with the size of the granules and relations between
granules; they provide the basic ingredients for the granular computing.
While concept drift and shift are terms related to the joint time-space domain
(95), the descriptions of data density and information specificity (139) concern
the space domain and are options to guide spatial granulation. Bargiela and
Pedrycz (15) state that granules should encompass as many data as possible while
maintaining certain specificity in what they called principle of the maximization
of the information density. The principle of the balanced information granularity
(15) gives preference to the design of granules balanced along all dimensions
rather than granules with unbalanced geometry. In particular, hyperbox-based
spatial granulation provides descriptions fully compatible with the descriptions of
intervals and fuzzy sets. With intervals and fuzzy sets, the pursuit of a balanced
granularity and refining and coarsening of granules are reduced to operations on
bounds of intervals and parameters of fuzzy membership functions.
3.6 Summary
Evolving granular systems combine granular computing and evolving intelligent
systems concepts into a single framework. We argued that it is sometimes unnec-
essary or inefficient to discriminate numeric data precisely. Moreover, we argued
that systems are better supported by a granular framework to suit uncertain,
granular stream data. Numeric data is a particular case in which a granule de-
generates into a singleton. The necessity of building models at finer granularities,
close to the singularity, is justified only when there are clear benefits in doing so.
This chapter presented the state of the art of the research in evolving granular
systems and discussed adaptive rule-based modeling from granular data streams.
Chapter 4
Interval Based Evolving
Modeling
This chapter introduces an interval-based evolving modeling approach to develop
system models using data streams. The approach consists of a rule-based model-
ing scheme that gradually adapts its antecedent and consequent parts over time.
Its main purpose is continuous (inductive) learning, self-organization, and adap-
tation to unknown, nonstationary environments. While traditional functional
rule-based modeling approaches use numeric data and produce numeric results,
the suggested interval approach uses interval data and presents results in numeric
and granular format. Interval rule-based approaches are highly human-centric in
the sense that antecedent and consequent of rules are intervals, which may convey
a linguistic meaning. Interval outputs are more informative and comprehensible
than numeric outputs.
4.1 Introduction
Interval based evolving modeling (IBeM) is an adaptive granular framework whose
idea is to enclose similar interval data into coarser albeit more interpretable inter-
val models. The outcomes of IBeM are single-valued and interval predictions of
a target function, and a rule summary, which describes the behavior of a system.
IBeM emphasizes uncertain data manifesting as multi-dimensional tolerance intervals
and recursive learning procedures rooted in fundamentals of the interval
mathematics theory. Antecedent and consequent terms of IBeM rules are deter-
mined by input and output crisp hyperboxes (granules) formed over time. Crisp
hyperboxes are referred to as a modality of crisp granular precisiation in the gen-
eralized theory of uncertainty (27) (153). IBeM input and output hyperboxes
are linked by a granular mapping, also called inclusion function in the interval
analysis terminology (75) (78) (81).
IBeM is equipped with a one-pass-through-the-data recursive algorithm which
builds its rule set gradually from scratch, captures new concepts from data
streams, and copes with uncertainty. The IBeM approach makes no specific
assumption about the properties of the data including probability distributions,
belief intervals, possibility values, membership functions. Moreover, no human
intervention is necessary during model construction. Contrarily, interval data
stream guides learning and refinements. Examples of concepts translated into
interval data IBeM manages to handle include: the number of red balls in the
box is between 8 and 12; tonight’s temperature will be from 65 to 72 degrees;
the normal count of leukocytes in adult humans is 4500-11000 per cubic millime-
ter of blood; [0.8, 0.85]. Intervals also arise after preprocessing singular data by
compressing it into a smaller amount of interval data.
Interval representation of interval data streams is attractive for several
reasons: (i) ease of acquiring parameters: only two parameters related to
real features (upper and lower bounds) need to be captured; (ii ) adaptation of
intervals demands basic fully-formalized operations of interval arithmetic; (iii )
intervals make no specific assumption about the content of a granule. Higher-
level interval models are everything we wish to know from large quantities of
detailed, low-level, interval data; (iv) intervals can be translated quite easily to
linguistic propositions. Interval granular precision facilitates comprehension when
supported by a context. Naturally, an interval model has a great deal of appeal
to represent counterpart interval data.
4.2 Related Work
Literature in interval data modeling using interval representation is scarce. Some
works related to adaptive (non-evolving) interval modeling approaches that take
into account interval data are summarized next. The approaches do not support
online learning and require the whole data set to be available. Neural network
approaches able to learn from interval data streams (for example (46) (108)) are
discussed in Chapter 6.
A partitioning dynamical clustering algorithm which considers interval data
and Pompeiu-Hausdorff distance was addressed in (31). The algorithm builds
clusters and identifies their representative prototypes concomitantly at each pro-
cessing step. Interval data are compared using adaptive Pompeiu-Hausdorff dis-
tance. This distance varies for each existing cluster according to intra-class struc-
tures. Although the clustering algorithm considers interval data, it is not suitable
for evolving modeling since prototype identification is based on optimizing an ad-
equacy criterion that requires all data samples within a cluster to be available.
An extension of the radial basis function (kernel method) approach to interval
data mining is proposed in (37). Here, interval data result from the aggregation of
large data sets into smaller ones to represent uncertainty. Aggregation is carried
out through a Pompeiu-Hausdorff distance-based approach that clusters numeric
data into crisp hyperboxes. The underlying learning approach can deal with
classification, regression and novelty detection problems. The learning approach
cannot handle online data streams.
Reference (12) proposes an interval analysis based adaptive approach for an
extended Kalman filter. The approach is aimed at mobile robot navigation and,
particularly, at obstacle avoidance and robot position estimation. Since Kalman
filters are often affected by noise and drift, the interval adaptive approach is useful to
model and correct robot position estimates. Interval analysis methods dispense with
deterministic modeling of the robot system and have been shown to give more accurate
position estimates when compared to estimates using non-adaptive non-interval
Kalman filter methods.
4.3 Structure and Processing
The mathematical formalism of the interval analysis (Section 2.2) provides a
robust framework for the analysis of granular structures. Interval mathematics
supports the core of the IBeM learning algorithm and gives simplicity, correctness,
totality, closeness, efficiency, and optimality in the sense of Hickey et al. (56).
Let (x, y)^{[h]}, h = 1, ..., be the h-th observation of the target function f. The
output y^{[h]} is known given the input x^{[h]}, or will become known some steps later. In
this chapter each attribute x_j of x = (x_1, ..., x_n) is an interval [\underline{x}_j, \overline{x}_j]. The same
holds true for the output y, that is, y = [\underline{y}, \overline{y}]. Therefore, (x, y) assembles a crisp
hyperbox (granule) in the Cartesian product space X × Y.
Let γ^i, i = 1, ..., c, be the current collection of IBeM granules built on the basis
of (x, y). Granules γ^i are defined in the Cartesian product space X × Y. The
internal representation of γ^i with respect to the input variables x = (x_1, ..., x_n) is
empty. This means that bounds of intervals, say [l_j^i, L_j^i], j = 1, ..., n, are all that
IBeM records from the input data stream. The output variable y is granulated using
bounds [u^i, U^i]. The content of a granule with respect to the output y conveys an
additional piece of information, an inclusion monotonic function p^i. The inclusion
function uses the bounds of the input variables to produce a granular approximation
of f.
Rules R^i associated with granules γ^i are of the type:

R^i: IF (l_1^i ≤ x_1 ≤ L_1^i) AND ... AND (l_n^i ≤ x_n ≤ L_n^i)
     THEN (u^i ≤ y ≤ U^i) AND ŷ = p^i(x_1, ..., x_n)

where

p^i(x_1, ..., x_n) = \hat{y} = a_0^i + \sum_{j=1}^{n} a_j^i [\underline{x}_j, \overline{x}_j].    (4.1)

Functions p^i are thin (with single-valued parameters) and of first order in this
case. In general, each p^i can be of a different type, thick, and need not be
linear. Computing p^i using x gives an interval granular or single-valued
approximation of f depending on the width of x, that is, greater than zero or zero.
The recursive least squares algorithm as described in Appendix B is used to
determine the coefficients a_j^i of p^i. Bounds [u^i, U^i] are obtained from output data
granulation. They provide an enclosure of the solution, an outer approximation
of f.
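A minimal Python sketch of how a rule of this type could be evaluated follows, assuming intervals are (lower, upper) tuples. The antecedent checks whether the input hyperbox is enclosed by the rule granule; the consequent returns the output enclosure [u, U] together with the interval value of the thin inclusion function p of Eq. (4.1). The Rule class and its attribute names are illustrative, not part of the thesis.

class Rule:
    def __init__(self, l, L, u, U, a0, a):
        self.l, self.L = l, L        # per-attribute antecedent bounds
        self.u, self.U = u, U        # output bounds from data granulation
        self.a0, self.a = a0, a      # single-valued coefficients of p (thin)

    def fires(self, x):
        """True if every input interval x[j] = (xlo, xhi) is enclosed."""
        return all(self.l[j] <= xlo and xhi <= self.L[j]
                   for j, (xlo, xhi) in enumerate(x))

    def predict(self, x):
        """Granular enclosure [u, U] and the interval estimate p(x), Eq. (4.1)."""
        lo = self.a0 + sum(min(aj * xlo, aj * xhi)
                           for aj, (xlo, xhi) in zip(self.a, x))
        hi = self.a0 + sum(max(aj * xlo, aj * xhi)
                           for aj, (xlo, xhi) in zip(self.a, x))
        return (self.u, self.U), (lo, hi)

r = Rule(l=[0.0], L=[1.0], u=0.2, U=0.9, a0=0.1, a=[0.5])
x = [(0.4, 0.6)]
print(r.fires(x), r.predict(x))   # True, ((0.2, 0.9), (0.3, 0.4))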
IBeM exploits predominantly bottom-up incremental learning procedures to
form higher level granules and interval-valued rules from finer granular data. In
a sense, it performs input-output data compaction to provide more human-like
models. A -closure granular structure ensues from more specific local gran-
ules. In particular, granulation eases incremental updating and discovering of
the essence of the time and space structure of the data with modest storage
and processing costs. Experts usually prefer models that approximate real sys-
tem outputs and provide estimates of the approximation bounds. Taking into
account intervals in bounded-error context is the IBeM approach to deal with
uncertainty.
The next section introduces a learning approach to construct an IBeM model
from the very beginning, and adapt its structure and parameters on the fly.
4.4 Learning in IBeM
This section addresses the working principle of the IBeM learning algorithm.
The learning algorithm detailed next is used to evolve the structure and pa-
rameters of IBeM models whenever new information appears in the data stream.
By IBeM structure we mean interval-type granules, If-Then rules, and a concept.
From an overall point of view, when new samples do not fit current knowledge,
learning creates new granules and rules managing the granules. Conversely, when
new samples fit current knowledge, learning adapts parameters of existing gran-
ules and rules if necessary. Eventually, the resulting granular structure may be
refined or coarsened agreeing with inter-granule relationships.
The IBeM framework grants important characteristics for online modeling.
Its incremental learning algorithm spends a small and constant processing time:
the processing time does not scale with the number of samples. Continuous
processing on a per-sample basis enables IBeM to deal with concept drift and
shift within online environment. Constructive bottom-up mechanisms of learning
usually prevail over top-down, decomposing mechanisms.
51
4.4 Learning in IBeM
4.4.1 Choosing the Granularity
Let ρ be the maximum width that interval granules may assume:

wdt([l_j^i, L_j^i]) ≤ ρ,  wdt([u^i, U^i]) ≤ ρ,  j = 1, ..., n;  i = 1, ..., c.    (4.2)
Values of ρ allow different representations of the same problem at different levels
of detail. ρ works as an upper bound on the level of modeling abstraction.
For normalized data, ρ takes values in [0,1]. If ρ equals 0, granules cannot be
expanded and each data sample is accommodated by a new granule. Conversely,
if ρ equals 1, a single granule encloses the entire data set. Counterbalancing
these extremes means establishing a tradeoff between complexity and precision.
In the most general case, IBeM starts learning with an empty rule base and
devoid of knowledge about the data stream. It is reasonable in this case to set ρ
halfway to regard rule creation and rule adaptation equally. We consider ρ^{[0]} = 0.5
as the default initial value. A simple and fast approach to adapt the maximum
width ρ allowed for granules is as follows. Let r be the number of rules created
during h_r time steps. If the number of rules grows faster than a given rate η, that
is, r > η, then ρ is increased,

ρ(new) = \left(1 + \frac{r}{h_r}\right) ρ(old).    (4.3)
The idea here is to reject large rule bases because they increase model complexity
and may not help generalization. Equation (4.3) acts against outbursts of growth,
letting intervals and granules expand larger.
Otherwise, if the number of rules grows at a rate smaller than η, that is, r < η,
then ρ is decreased as follows:

ρ(new) = \left(1 - \frac{η - r}{h_r}\right) ρ(old).    (4.4)
With this mechanism we maintain a data-dependent fluctuating granularity.
Alternative heuristic approaches to evolve the value of ρ over time take into
account estimation errors and their derivatives, as addressed in (79). Time-varying
granularity avoids guesses on how fast and how often the data stream changes.
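The adaptation heuristic of Eqs. (4.3) and (4.4) is directly executable; the sketch below assumes r rules were created in the last h_r steps and that η is the target growth rate, with the default ρ^{[0]} = 0.5.

def adapt_rho(rho, r, eta, hr):
    """Adapt the maximum granule width rho after a window of hr time steps."""
    if r > eta:
        return (1.0 + r / hr) * rho          # Eq. (4.3): allow coarser granules
    return (1.0 - (eta - r) / hr) * rho      # Eq. (4.4): demand finer granules

rho = 0.5                                    # default initial value rho[0]
print(adapt_rho(rho, r=8, eta=4, hr=100))    # 0.54: rule base growing too fast
print(adapt_rho(rho, r=1, eta=4, hr=100))    # 0.485: few rules created, refine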
4.4.2 Time Granulation
Consider an interval data stream (x, y)^{[h]}, h = 1, .... Time granulation groups a
set of successive instances (x, y)^{[h]}, h = h_b, h_b + 1, ..., h_e, where h_b and h_e denote
the lower and upper bounds of a time interval [h_b, h_e]. The set of instances input
during [h_b, h_e] is considered indistinguishable and the inequalities

wdt(ch(x_j^{[h_b]}, ..., x_j^{[h_e]})) ≤ ρ,  j = 1, ..., n,  and  wdt(ch(y^{[h_b]}, ..., y^{[h_e]})) ≤ ρ    (4.5)

hold true. Literally, the width of the convex hull of all samples available during
[h_b, h_e] is less than or equal to the maximum width allowed for granules, ρ. The
sample indexed by h_e + 1 conveys at least one contrasting value.
The collection (x, y)^{[h]}, h = h_b, h_b + 1, ..., h_e, produces a unique granule γ^{[H]}
whose lower and upper endpoints are:

[l_j^{[H]}, L_j^{[H]}] = [\min(\underline{x}_j^{[h_b]}, ..., \underline{x}_j^{[h_e]}), \max(\overline{x}_j^{[h_b]}, ..., \overline{x}_j^{[h_e]})],  and    (4.6)

[u^{[H]}, U^{[H]}] = [\min(\underline{y}^{[h_b]}, ..., \underline{y}^{[h_e]}), \max(\overline{y}^{[h_b]}, ..., \overline{y}^{[h_e]})].    (4.7)

Thus, a single granule γ^{[H]} summarizes the content of several samples (x, y)^{[h]}.
Both γ^{[H]} and (x, y)^{[h]} are of the same interval nature.
In the IBeM framework, time granulation is used as a preprocessing step
whenever input data arrive at different rates; for example, x_1 arrives every 10
milliseconds and x_2 every 30 milliseconds. Multiple time granularities allow
synchronized analysis of concurrent data streams. Therefore, learning within the
space domain is based on the resulting interval granule γ^{[H]} rather than on original
data (x, y)^{[h]}. IBeM is not exposed to all original data, which are sometimes far
more abundant than time granules.
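A small Python sketch of Eqs. (4.5)-(4.7) for a single attribute follows, assuming samples are (lower, upper) tuples; a run of samples is collapsed into one time granule by their convex hull while the hull width respects ρ.

def time_granule(samples):
    """Convex hull endpoints of interval samples in one time granule, Eqs. (4.6)-(4.7)."""
    lower = min(lo for lo, hi in samples)
    upper = max(hi for lo, hi in samples)
    return (lower, upper)

def fits(samples, rho):
    """Eq. (4.5): the run stays in one time granule while its hull width <= rho."""
    lo, hi = time_granule(samples)
    return hi - lo <= rho

stream = [(0.40, 0.45), (0.42, 0.50), (0.39, 0.44)]
print(fits(stream, rho=0.5), time_granule(stream))   # True, (0.39, 0.50)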
4.4.3 Creating and Adapting Granules
No IBeM rules need to be preconceived nor does the amount of granules need to
be set in advance. From scratch, granules and rules are created and adapted on
demand, dynamically, steered by the behavior of the target process and informa-
tion mirrored in the measured data. Whenever data (x, y)^{[h]} become available,
a decision mechanism is triggered and new granules and rules can be inserted into
the IBeM structure or existing ones can be refined.
A key decision when building IBeM models concerns when and how to cre-
ate or adapt granules and rules recursively to consider never seen data samples
potentially bringing new information.
Define E^i = (E_1^i, ..., E_n^i, E_k^i), the expansion region of granule γ^i:

E_j^i = [L_j^i - ρ, \, l_j^i + ρ],  j = 1, ..., n,    (4.8)

E_k^i = [U^i - ρ, \, u^i + ρ].    (4.9)
Expansion regions help to derive criteria for deciding whether or not data should
be gathered into a common granule. Figure 4.1 illustrates the expansion region
E_j^i of an attribute [l_j^i, L_j^i] of the granule γ^i.
Figure 4.1: Expansion region of an IBeM granule
An IBeM granule is created when the expansion regions E^i, i = 1, ..., c, do not
fit the sample (x, y). In this case, none of the existing granules can expand its
bounds beyond the limits imposed by ρ to include the sample. Connective AND
operators of IBeM rules suggest the complete enclosing of both the inputs x_j and
the output y for the corresponding granule to be considered. The new granule γ^{c+1}
matches perfectly the sample that caused its creation, that is,

[l_j^{c+1}, L_j^{c+1}] = [\underline{x}_j, \overline{x}_j],  and    (4.10)

[u^{c+1}, U^{c+1}] = [\underline{y}, \overline{y}].    (4.11)

The parameters of the thin local function p^{c+1} are:

a_j^{c+1} = 0,  j ≠ 0,  and  a_0^{c+1} = mp(y).    (4.12)
Adaptation of granules γ^i sets the boundaries [l_j^i, L_j^i] and [u^i, U^i] to enclose
a sample (x, y). Meanwhile, parameters a_j^i of the local inclusion function p^i
are updated using recursive least squares as described in Appendix B. A granule
γ^i is chosen to be adapted whenever (x, y) falls within its expansion region
E^i. Adaptation of only one granule is enough to ensure that the information
is incorporated in the model. Conflict resolution is addressed in Section 4.4.5.
Figure 4.2 summarizes nine situations that may happen depending on where the
data are confined and the appropriate adaptations.
In Fig. 4.2, the recently arrived datum x = [\underline{x}, \overline{x}] can be placed either outside,
partially inside, or inside granule γ^i. Depending on the location of x, IBeM may
create a new granule γ^{c+1} and/or adapt the bounds of γ^i. Expansion of granules
is chiefly based on union and convex hull operations. All uncertainty is covered
by some granule to guarantee outer approximation of the input and output data.
Although a datum and a granule may have some level of overlap, two granules are
forbidden to overlap as a result of these adaptation procedures.
Figure 4.2: Creation and recursive adaptation of IBeM granules
4.4.4 Refining the Rule Base
Once granules are identified, IBeM analyzes the relationship among them and
proceeds accordingly. Top-down and bottom-up structural operations support
refining and coarsening of granules over time. Structural knowledge is generated
to help visualize the relationships between different parts of the problem.
Top-down processes produce finer granular models by splitting a large granule into smaller granules. Situations in which the maximum width allowed for a granule reduces (see Section 4.4.1) may require top-down refinements to fit some granules to the new value. Formally, $\mathrm{wdt}(\gamma^i) \leq \rho$, $i = 1, \ldots, c$, may become false for some $i$. In this case, granule $\gamma^i$ is split into $\gamma^{i1}$ and $\gamma^{i2}$ so that
\[
[l_j^{i1}, L_j^{i1}] = [l_j^i, \; \mathrm{mp}([l_j^i, L_j^i])], \ \text{and} \tag{4.13}
\]
\[
[l_j^{i2}, L_j^{i2}] = [\mathrm{mp}([l_j^i, L_j^i]), \; L_j^i], \ \ j = 1, \ldots, n. \tag{4.14}
\]
The same splitting procedure holds for the output interval $[u^i, U^i]$. The procedure is repeated until $\mathrm{wdt}(\gamma^i) \leq \rho$ for all $i$.
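A minimal sketch of the splitting procedure (4.13)-(4.14) follows, assuming granules stored as lists of per-attribute intervals plus an output interval; the representation is hypothetical.

——————————————————————
# Hypothetical sketch of the top-down split (Eqs. 4.13-4.14).

def split_granule(x_bounds, y_bounds):
    """Split a granule into gamma_i1 and gamma_i2 at attribute midpoints."""
    halves = [((l, (l + L) / 2.0), ((l + L) / 2.0, L)) for l, L in x_bounds]
    u, U = y_bounds
    g1 = ([h[0] for h in halves], (u, (u + U) / 2.0))
    g2 = ([h[1] for h in halves], ((u + U) / 2.0, U))
    return g1, g2

print(split_granule([(0.0, 0.4)], (1.0, 1.2)))
# (([(0.0, 0.2)], (1.0, 1.1)), ([(0.2, 0.4)], (1.1, 1.2)))
——————————————————————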
A coarser granular model results from a bottom-up process that involves forming a larger granule from smaller granules. Consider the overall distance between two interval vectors as described in Section 2.2.3 and let

\[
D = \begin{bmatrix}
D(\gamma^1, \gamma^1) & \cdots & D(\gamma^1, \gamma^i) & \cdots & D(\gamma^1, \gamma^c) \\
\vdots & \ddots & \vdots & & \vdots \\
D(\gamma^i, \gamma^1) & \cdots & D(\gamma^i, \gamma^i) & \cdots & D(\gamma^i, \gamma^c) \\
\vdots & & \vdots & \ddots & \vdots \\
D(\gamma^c, \gamma^1) & \cdots & D(\gamma^c, \gamma^i) & \cdots & D(\gamma^c, \gamma^c)
\end{bmatrix} \tag{4.15}
\]
be a distance matrix relating any pair of granules. Matrix $D$ is symmetric with zeros on the main diagonal. Neighboring granules can be located close enough to justify their combination into a coarser granule. The combination is based on the minimum entry of matrix $D$, say $D(\gamma^{i1}, \gamma^{i2})$, and depends on whether
\[
\mathrm{wdt}(\mathrm{ch}([l_j^{i1}, L_j^{i1}], [l_j^{i2}, L_j^{i2}])) \leq \rho, \ \ j = 1, \ldots, n, \ \text{and} \tag{4.16}
\]
\[
\mathrm{wdt}(\mathrm{ch}([u^{i1}, U^{i1}], [u^{i2}, U^{i2}])) \leq \rho. \tag{4.17}
\]
Granule $\gamma^i = \mathrm{ch}(\gamma^{i1}, \gamma^{i2})$ is the coarsening of $\gamma^{i1}$ and $\gamma^{i2}$.

Coarsening provides more compact rule bases and helps eliminate gaps between similar granules. At the top level, IBeM is closed by the most general granule, formed by the convex hull of all elementary granules.
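The combination step can be sketched as below. The distance used here is a simple endpoint-based stand-in for the interval-vector distance of Section 2.2.3, and the granule representation matches the hypothetical one used in the previous sketches.

——————————————————————
# Sketch of the bottom-up combination step (Eqs. 4.15-4.17).
import itertools

def dist(g1, g2):
    """Endpoint-based stand-in for the distance of Section 2.2.3."""
    (xb1, yb1), (xb2, yb2) = g1, g2
    d = sum(abs(l1 - l2) + abs(L1 - L2)
            for (l1, L1), (l2, L2) in zip(xb1, xb2))
    return d + abs(yb1[0] - yb2[0]) + abs(yb1[1] - yb2[1])

def closest_pair(granules):
    """Index pair attaining the minimum entry of the distance matrix D."""
    return min(itertools.combinations(range(len(granules)), 2),
               key=lambda p: dist(granules[p[0]], granules[p[1]]))

def try_combine(g1, g2, rho):
    """Convex hull of two granules if it respects rho (Eqs. 4.16-4.17)."""
    xb = [(min(l1, l2), max(L1, L2))
          for (l1, L1), (l2, L2) in zip(g1[0], g2[0])]
    yb = (min(g1[1][0], g2[1][0]), max(g1[1][1], g2[1][1]))
    widths = [L - l for l, L in xb] + [yb[1] - yb[0]]
    return (xb, yb) if max(widths) <= rho else None
——————————————————————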
4.4.5 Conflict of Interest
A requirement when designing granular systems such as IBeM is to include all the information that assembles a solution. At the same time, it is desirable to keep the system as simple as possible. During the course of learning, conflicting situations may arise. In these cases, adaptation procedures that result in narrower granules must be considered. Conflict of interest happens when two or more
granules can be expanded to embrace a data sample. Figure 4.3 shows four typical situations considering the current input $x$ and two granules, say $\gamma^{i1}$ and $\gamma^{i2}$. They are: (i) $\underline{x} \in E^{i1} = [L^{i1} - \rho, \; l^{i1} + \rho]$, but $\overline{x}$ does not; conversely, $\overline{x} \in E^{i2}$, but $\underline{x}$ does not; (ii) $x \cap E^{i2} \neq \emptyset$, but $x \subset E^{i1}$ and $x \not\subset E^{i2}$; (iii) $x \subset (E^{i1} \cap E^{i2})$; and (iv) $x \cap E^{i1} \neq \emptyset$, but $x \subset E^{i2}$ and $x \not\subset E^{i1}$. The respective adaptation procedures are shown in the figure.

Figure 4.3: Inter-granular conflict and data accommodation
In case (i), a new granule is created to include $x$ because $\gamma^{i1}$ and $\gamma^{i2}$ cannot expand beyond $\rho$. Cases (ii) and (iv) avoid redundancy and inconsistency by neglecting the adaptation of the granule that cannot enclose $x$ entirely. Case (iii) chooses the granule closest to $x$ according to the distance $D$.
Inter-granular conflict resolution helps to choose which IBeM rule to adapt
and prevents overlapped intervals and contradiction. The tightest envelope for
the data generates a more concise description of the information it carries.
4.4.6 Removing Granules
Depending on the data sequence, small undesirable granules might be formed quite close to large granules. These small granules, often called satellites (97), contain residual information that is better neglected. The deletion procedure we propose is useful to retain only the necessary information.

Broadly speaking, a granule should be removed from the IBeM structure if it is inconsistent with the current concept. Common removal strategies either (i) remove granules by age, (ii) exclude the weakest granules based on error values, or (iii) delete the most inactive granules. IBeM adopts strategy (iii) and deletes the most inactive granules. Old granules may still be useful in the current environment, whereas weak granules are strengthened, when possible, by adapting the parameters of their local inclusion functions.
IBeM granules are deleted whenever they remain inactive during $h_r$ time steps. If the application requires memorization of rare events, or if cyclical drifts are anticipated, then it may be the case to let the granules remain forever. Removing inactive granules periodically helps to keep the rule set updated and concise.
4.4.7 Learning Algorithm
The IBeM modeling procedure can be summarized as follows:

——————————————————————
BEGIN
  Set parameters ρ, h_r, η; c = 0;
  Read (x, y)^[h], h = 1;
  Create granule γ^{c+1} and rule R^{c+1};
  For h = 2, ... do
    Read (x, y)^[h];
    Provide single-valued approximation mp(p(x^[h]));
    Provide granular approximation [u^{i*}, U^{i*}];
    Calculate output error ε^[h] = mp(y^[h]) − mp(p(x^[h]));
    If (x_j^[h] ∉ E_j^i or y^[h] ∉ E_k^i, for i = 1, ..., c, and some j)
      Create γ^{c+1} and R^{c+1};
    Else
      Resolve possible conflicts;
      Update γ^i and R^i to accommodate (x, y)^[h];
      Adapt local inclusion function parameters a_j^i using RLS;
    If h = α h_r, α = 1, 2, ...
      Update model granularity ρ;
      Split and combine granules when feasible;
      Remove inactive granules and rules;
END
——————————————————————
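The decision logic of the loop above can be rendered compactly as follows. This is a simplified, hypothetical reading for numeric samples: local functions, RLS updates, and the conflict resolution of Section 4.4.5 (which would pick the closest granule rather than the first match) are omitted.

——————————————————————
# Simplified IBeM create-or-adapt decision for one incoming sample.

def fits(x, y, granule, rho):
    """True if (x, y) lies inside all expansion regions (Eqs. 4.8-4.9)."""
    xb, (u, U) = granule
    ok_x = all(L - rho <= xj <= l + rho for xj, (l, L) in zip(x, xb))
    return ok_x and (U - rho <= y <= u + rho)

def ibem_step(granules, x, y, rho):
    for i, (xb, yb) in enumerate(granules):
        if fits(x, y, (xb, yb), rho):
            # adapt: expand endpoints just enough to enclose the sample
            new_xb = [(min(l, xj), max(L, xj)) for xj, (l, L) in zip(x, xb)]
            granules[i] = (new_xb, (min(yb[0], y), max(yb[1], y)))
            return granules
    # no granule can expand within rho: create gamma_{c+1} at the sample
    granules.append(([(xj, xj) for xj in x], (y, y)))
    return granules
——————————————————————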
4.5 Summary
This chapter has introduced an interval-based evolving modeling approach to assess the essence of interval data streams and simplify complex problems characterized by nonlinearity and nonstationarity. The IBeM modeling approach for interval data is based on the incremental evolution of hyperrectangle-like forms of information granules and associated interval inclusion functions. Stream data guide the construction of both the granules and the rule base, without the need for human intervention or additional care. IBeM provides human-centric interval approximations as well as precise numeric approximations of functions from uncertain data, using a fast one-pass learning algorithm with modest memory requirements.
Chapter 5
Fuzzy Set Based Evolving Modeling
This chapter introduces an evolving fuzzy-set based granular framework to learn
from and model time-varying fuzzy input-output data streams. The framework
consists of a recursive algorithm capable of developing the structure of fuzzy
rule-based models on-demand. The framework is particularly suitable to handle
potentially unbounded fuzzy data streams and provide single-valued and granular
approximations of unknown nonstationary functions.
5.1 Introduction
A primary requirement of a broad class of evolving intelligent systems is to pro-
cess a sequence of numeric data over time. The fuzzy set based evolving modeling
(FBeM) framework employs fuzzy granular models to deal with more detailed
fuzzy granular data and therefore provide a more intelligible exposition of the
data. For each granular model there exists an associated fuzzy rule base. The
antecedent part of FBeM rules consists of fuzzy hyperboxes, which are inter-
pretable transparent descriptors of input granular data. The consequent part of
FBeM rules has a linguistic and a functional component. The linguistic compo-
nent arises from fuzzy hyperboxes formed by output data granulation. It facili-
tates model interpretation and encloses possible model outputs. The functional
component is derived from input data and real-valued local functions. This component produces more accurate approximations. The rationale behind the FBeM approach is that it looks at input-output data streams under different resolutions and decides when to adopt coarser or more detailed granularities.
The function of FBeM is to deliver simultaneous single-valued and granular
function approximation and linguistic description of the behavior of a system.
Local FBeM models are a set of If-Then rules developed incrementally from input-
output data streams. Learning can start from scratch and, as new information is
brought by the data stream, granules and rules are created and their parameters
adjusted. Therefore, FBeM handles data flexibly, so that redesigning and retraining models along the way is unnecessary. The resulting input-output granular mapping may eventually be either refined or coarsened according to inter-granule relationships and error indices.
5.2 Related Work
This section summarizes works related to incremental learning approaches to
handle data streams. The approaches described next are closely related to the
approach suggested in this chapter.
Fuzzy ARTMAP (30) is an adaptive resonance neural network. Its incremental
learning ability suggests its use in online data processing. Fuzzy ARTMAP is a
supervised neural network characterized by weight vectors. Half of the positions
of weight vectors represent one corner of a hyperbox, and the remaining half the
opposite corner. When new data arrive, the smallest box that encloses the data
is chosen. If no such box exists, then either the box that needs less expansion to
enclose the data is selected, or a new one is created. Next, a neuron representing
a hyperbox is selected and a vigilance criterion is checked. Vigilance serves to
choose another box if the one selected is too large. Fuzzy ARTMAP compares
the mapped class for an input with the actual label. If the labels are different,
then it creates a new neuron and connects it to the actual label. Fuzzy ARTMAP
is not appropriate to fit fuzzy data and provide granular function approximation.
Evolving Mamdani-Takagi-Sugeno neural fuzzy inference system (eMTSFIS)
(58) is a neurofuzzy approach to model the dynamic nature of real-world prob-
lems. eMTSFIS comes with an incremental learning algorithm that evolves its
structure and parameters according to time-varying numeric data streams. The
learning algorithm is life-long and addresses the stability-plasticity dilemma typ-
ical of neurofuzzy constructs. Essentially, the eMTSFIS approach combines the
relatively higher interpretability of Mamdani-type systems with the precision of
Takagi-Sugeno fuzzy systems in numeric data stream modeling.
The uncertain micro-clustering algorithm (UMicro) (2) considers that stream
data arrive together with their underlying standard error instead of assuming the
entire probability distribution function of the data is known. The algorithm uses
uncertainty information to improve the quality of the underlying results. UMicro
incorporates a time decay method to update the statistics of micro-clusters. The
decaying method is especially useful to model drifting concepts in evolving data
streams. The efficiency of the UMicro approach has been demonstrated in a
variety of data sets.
5.3 Structure and Processing
The formalism of fuzzy sets (Section 2.4) provides a framework for the analysis
and representation of fuzzy granular structures.
Let $(x, y)^{[h]}$, $h = 1, \ldots$, be the $h$-th observation of a data stream. The output $y^{[h]}$ is known given the input $x^{[h]}$, or will be known some steps later. In this chapter, each attribute $x_j$ of $x = (x_1, \ldots, x_n)$ is a trapezoidal fuzzy interval $(\underline{\underline{x}}_j, \underline{x}_j, \overline{x}_j, \overline{\overline{x}}_j)$. The same holds for the output $y$, that is, $y = (\underline{\underline{y}}, \underline{y}, \overline{y}, \overline{\overline{y}})$. Therefore, $(x, y)$ shapes a fuzzy hyperbox in the Cartesian product space $X \times Y$.

Let $\gamma^i$, $i = 1, \ldots, c$, be the current collection of FBeM granules built on the basis of $(x, y)$. Granules $\gamma^i$ are defined in the Cartesian product space $X \times Y$.
Rules $R^i$ governing FBeM information granules $\gamma^i$ are of the type:

$R^i$: IF ($x_1$ is $A_1^i$) AND ... AND ($x_n$ is $A_n^i$)
THEN ($y$ is $B^i$) [linguistic] AND $\hat{y} = p^i(x_1, \ldots, x_n)$ [functional],
where $A_j^i$ and $B^i$ are trapezoidal membership functions built in light of the input and
output data available; $p^i$ is a local approximation function. The collection of rules $R^i$, $i = 1, \ldots, c$, casts a rule base. Rules in FBeM are created and adapted on the fly whenever the data ask for improvement of the current model. Notice that an FBeM rule combines both linguistic and functional consequents. The linguistic part of the consequent favors interpretability, since fuzzy sets may come with a label. The functional part offers accuracy. Thus, FBeM takes advantage of both linguistic and functional consequents within a single framework.
Fuzzy sets $A_j^i$ and $B^i$ are generated from scattered fuzzy granulation. The scattering approach clusters the data into fuzzy sets when appropriate, and takes into account the coexistence of a manifold of granularities in the data stream. Sets $A_j^i$ and $B^i$ can be easily extended to fuzzy hyperboxes $\gamma^i$ (granules) in the product space. Granules are positioned at locations populated by input and output data in the product space. Figure 5.1 illustrates the scatter granulation mechanism for fuzzy data. Note in the figure that the granularity of models is coarser than the granularity of data. This provides data compression and a more abstract, human-interpretable representation of the data.
Figure 5.1: Scattering approach for fuzzy data granulation
Fitting data into conveniently placed and sized granules through scattering
leaves substantial flexibility for incremental learning. The FBeM approach grants
freedom in choosing the internal structure of granules.
Yager (138) (140) has demonstrated that a trapezoidal fuzzy set $A_j^i = (l_j^i, \lambda_j^i, \Lambda_j^i, L_j^i)$ allows the modeling of a wide class of granular objects. A triangular fuzzy set is a trapezoid where $\lambda_j^i = \Lambda_j^i$; an interval is a trapezoid where $l_j^i = \lambda_j^i$ and $\Lambda_j^i = L_j^i$; a singleton (singular datum) is a trapezoid where $l_j^i = \lambda_j^i = \Lambda_j^i = L_j^i$. Additional features that make the trapezoidal representation attractive include: (i) ease of acquiring the necessary parameters: only four parameters need to be captured; (ii) many operations on trapezoids can be performed using the endpoints of intervals which are level sets of trapezoids; moreover, the piecewise linearity of the trapezoidal representation allows calculation of only two level sets, corresponding to the core and support, respectively, to obtain a complete implementation; (iii) trapezoids are easy to translate into linguistic labels.
Fuzzy sets $B^i = (u^i, \upsilon^i, \Upsilon^i, U^i)$ are used to assemble granules in the output space. The local function $p^i$ is adapted using samples that rest inside the granule $\gamma^i$. In general, functions $p^i$ can be of different types and are not required to be linear. Here we assume affine functions,

\[
p^i(x_1, \ldots, x_n) = a_0^i + \sum_{j=1}^{n} a_j^i \, \mathrm{mp}(x_j), \tag{5.1}
\]

for simplicity. If higher-order functions $p^i$ are used to approximate a function $f$, then the number of coefficients to be estimated increases, especially when the number of input variables $n$ is large. The recursive least squares algorithm (Appendix B) is used to calculate the coefficients $a_j^i$ of $p^i$.
Trapezoidal fuzzy sets and scatter granulation allow granules to overlap. Therefore, two or more granules can accommodate the same data sample. The FBeM singular output is found as the weighted mean value:

\[
p = \frac{\displaystyle\sum_{i=1}^{c} \min(A_1^i(x_1), \ldots, A_n^i(x_n)) \, p^i(x_1, \ldots, x_n)}{\displaystyle\sum_{i=1}^{c} \min(A_1^i(x_1), \ldots, A_n^i(x_n))}. \tag{5.2}
\]
The granular output is given by the convex hull of the output fuzzy sets $B^{i^*}$, where $i^*$ are the indices of granules that can accommodate the data sample. The convex hull of trapezoidal fuzzy sets $B^1, \ldots, B^c$ is given as follows:

\[
\mathrm{ch}(B^1, \ldots, B^c) = (\min(u^1, \ldots, u^c), \, \min(\upsilon^1, \ldots, \upsilon^c), \, \max(\Upsilon^1, \ldots, \Upsilon^c), \, \max(U^1, \ldots, U^c)). \tag{5.3}
\]

The granular output given by $B^{i^*}$ enriches decision making and motivates interpretability. While being specific, as determined by $p$, we risk being incorrect; being unspecific, as given by $B^{i^*}$, increases our confidence of being correct.
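For concreteness, the sketch below computes the singular output (5.2) and the granular output (5.3) for numeric inputs. Trapezoids are (l, λ, Λ, L) tuples, affine coefficients follow (5.1), and at least one active rule is assumed; the encoding is ours.

——————————————————————
# Sketch of the FBeM output stage (Eqs. 5.1-5.3) for numeric inputs.

def trap_mu(x, t):
    """Trapezoidal membership of numeric x in t = (l, lam, Lam, L)."""
    l, lam, Lam, L = t
    if lam <= x <= Lam:
        return 1.0
    if l < x < lam:
        return (x - l) / (lam - l)
    if Lam < x < L:
        return (L - x) / (L - Lam)
    return 0.0

def fbem_output(x, rules):
    """rules: list of (A, a, B); A: n antecedent trapezoids, a: affine
    coefficients [a0, ..., an], B: consequent trapezoid (u, v, V, U)."""
    acts = [min(trap_mu(xj, Aj) for xj, Aj in zip(x, A)) for A, _, _ in rules]
    ps = [a[0] + sum(aj * xj for aj, xj in zip(a[1:], x)) for _, a, _ in rules]
    p = sum(o * pi for o, pi in zip(acts, ps)) / sum(acts)   # Eq. (5.2)
    active = [B for o, (_, _, B) in zip(acts, rules) if o > 0]
    hull = tuple(f(b[k] for b in active)                     # Eq. (5.3)
                 for k, f in enumerate((min, min, max, max)))
    return p, hull
——————————————————————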
5.4 Learning in FBeM
5.4.1 Setting the Granularity
The maximum width that fuzzy sets $A_j^i$ are allowed to expand to is denoted by $\rho$, that is, $\mathrm{wdt}(A_j^i) = L_j^i - l_j^i \leq \rho$, $j = 1, \ldots, n$; $i = 1, \ldots, c$. Different values of $\rho$ yield different representations of the same data stream at different levels of granularity.

Let the expansion region of a set $A_j^i$ be denoted by

\[
E_j^i = \left[ \mathrm{mp}(A_j^i) - \frac{\rho}{2}, \; \mathrm{mp}(A_j^i) + \frac{\rho}{2} \right], \tag{5.4}
\]

where $\mathrm{mp}(A_j^i) = (\lambda_j^i + \Lambda_j^i)/2$ is the midpoint of $A_j^i$. Expansion regions help to derive criteria for deciding whether or not data samples should be considered in the same granule.
For normalized data, $\rho$ takes values in $[0, 1]$. If $\rho$ equals 0, then FBeM granules are not enlarged. Learning creates a new rule for each sample, which may cause overfitting and lead to excessive complexity and irreproducible, optimistic results. If $\rho$ equals 1, then a single granule may cover the entire data domain, so that FBeM becomes unable to manage nonstationarity in the data. Meaningful life-long adaptability is reached by choosing intermediate values for $\rho$.
In the most general case, FBeM starts learning with an empty rule base and without any knowledge about the data-generating process. In this case, a reasonable approach is to initialize $\rho$ halfway to yield structural stability and plasticity equally. We consider $\rho^{[0]} = 0.5$ as the default initial value.
A fast procedure to evolve $\rho$ over time is as follows. Let $r$ be the difference between the current number of granules and the number of granules $h_r$ steps earlier, $r = c^{[h]} - c^{[h - h_r]}$. If the quantity of granules grows faster than a given rate $\eta$, that is, $r > \eta$, then $\rho$ is increased:

\[
\rho(\text{new}) = \left( 1 + \frac{r}{h_r} \right) \rho(\text{old}). \tag{5.5}
\]

Equation (5.5) controls the value of $\rho$ so as to reject large rule bases and therefore avoid increasing complexity. Large rule bases may not help generalization of the results.

If the number of granules grows at a rate smaller than $\eta$, that is, $r \leq \eta$, then $\rho$ is decreased as follows:

\[
\rho(\text{new}) = \left( 1 - \frac{(\eta - r)}{h_r} \right) \rho(\text{old}). \tag{5.6}
\]

This procedure keeps the model granularity time-varying according to the data stream. Other granularity adaptation procedures may take into consideration estimation errors and their derivatives, as suggested in (79) (80).
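A direct rendering of (5.5)-(5.6) follows, with illustrative parameter values:

——————————————————————
# Sketch of the granularity update (Eqs. 5.5-5.6), run every h_r steps.

def update_rho(rho, c_now, c_before, hr, eta):
    r = c_now - c_before                 # growth in the number of granules
    if r > eta:                          # too many new granules: Eq. (5.5)
        return (1 + r / hr) * rho
    return (1 - (eta - r) / hr) * rho    # otherwise shrink: Eq. (5.6)

# Example: 5 new granules over h_r = 50 steps with eta = 2 enlarges rho
print(update_rho(0.5, c_now=12, c_before=7, hr=50, eta=2))  # 0.55
——————————————————————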
Reducing the maximum width allowed for granules requires shrinking larger granules to fit them to the new value. In this case, the support of a fuzzy set $A_j^i$ is narrowed as follows:

If $\mathrm{mp}(A_j^i) - \frac{\rho(\text{new})}{2} > l_j^i$ then $l_j^i(\text{new}) = \mathrm{mp}(A_j^i) - \frac{\rho(\text{new})}{2}$
If $\mathrm{mp}(A_j^i) + \frac{\rho(\text{new})}{2} < L_j^i$ then $L_j^i(\text{new}) = \mathrm{mp}(A_j^i) + \frac{\rho(\text{new})}{2}$.
Cores $[\lambda_j^i, \Lambda_j^i]$ are handled similarly. Time-varying granularity is useful to avoid guesses about how fast the data stream changes.
5.4.2 Time Granulation
Consider a fuzzy data stream $(x, y)^{[h]}$, $h = 1, \ldots$ Time granulation groups a set of successive samples $(x, y)^{[h]}$, $h = h_b, h_b+1, \ldots, h_e$, where $h_b$ and $h_e$ denote the lower and upper bounds of a time interval $[h_b, h_e]$. The set of instances input during $[h_b, h_e]$ produces a unique granule $\gamma^{[H]}$ whose corresponding fuzzy sets are

\[
A_j^{[H]} = (\min(\underline{\underline{x}}_j^{[h_b]}, \ldots, \underline{\underline{x}}_j^{[h_e]}), \; \min(\underline{x}_j^{[h_b]}, \ldots, \underline{x}_j^{[h_e]}), \; \max(\overline{x}_j^{[h_b]}, \ldots, \overline{x}_j^{[h_e]}), \; \max(\overline{\overline{x}}_j^{[h_b]}, \ldots, \overline{\overline{x}}_j^{[h_e]})), \tag{5.7}
\]

and $B^{[H]}$, which is constructed similarly from the output stream. Instances falling within $A_j^{[H]}$, $j = 1, \ldots, n$, and $B^{[H]}$ are considered indiscernible, and the inequalities

\[
\mathrm{wdt}(A_j^{[H]}) \leq \rho, \ \ j = 1, \ldots, n, \quad \text{and} \quad \mathrm{wdt}(B^{[H]}) \leq \rho \tag{5.8}
\]

hold true.
Whenever input data arrive at different rates, for example, $x_1$ arrives every 2 seconds and $x_2$ every 3 seconds, or the amount of data exceeds the affordable computational cost (e.g., in high-frequency applications), we resort to granulated views of the time domain. Thereafter, rule construction is based on the resulting fuzzy granules, $A_j^{[H]}$ and $B^{[H]}$, rather than on the original data $(x, y)^{[h]}$. FBeM does not need to be exposed to all original data.
5.4.3 Creating Granules
No rule necessarily exists before learning starts. The incremental procedure to create rules runs whenever at least one entry of an input $(x_1, \ldots, x_n)$ does not belong to the expansion regions $(E_1^i, \ldots, E_n^i)$, $i = 1, \ldots, c$, or the output $y \not\subset E_k^i$, $i = 1, \ldots, c$. Otherwise, the current rule base is not modified.
A new granule $\gamma^{c+1}$ is assembled from fuzzy sets $A_j^{c+1}$ and $B^{c+1}$ whose parameters match the sample, that is,

\[
A_j^{c+1} = (l_j^{c+1}, \lambda_j^{c+1}, \Lambda_j^{c+1}, L_j^{c+1}) = (\underline{\underline{x}}_j, \underline{x}_j, \overline{x}_j, \overline{\overline{x}}_j),
\]
\[
B^{c+1} = (u^{c+1}, \upsilon^{c+1}, \Upsilon^{c+1}, U^{c+1}) = (\underline{\underline{y}}, \underline{y}, \overline{y}, \overline{\overline{y}}). \tag{5.9}
\]

Coefficients of the real-valued local function $p^{c+1}$ are set to

\[
a_0^{c+1} = \mathrm{mp}(y); \quad a_j^{c+1} = 0, \ \ j \neq 0. \tag{5.10}
\]
5.4.4 Adapting Granules
Adaptation of granules either expands or contracts the support and the core of fuzzy sets $A_j^i$ and $B^i$ to enclose new data, and simultaneously refines the coefficients of the local functions $p^i$ to improve accuracy. A granule is chosen to be adapted whenever an instance of the data stream falls within its expansion region. In situations in which two or more granules qualify to enclose the data, adapting only one of the granules is enough.
Data and granules are fuzzy objects of trapezoidal nature. A possible similarity measure for vectors of trapezoids is:

\[
S(x, A^i) = 1 - \frac{1}{4n} \sum_{j=1}^{n} \left( |\underline{\underline{x}}_j - l_j^i| + |\underline{x}_j - \lambda_j^i| + |\overline{x}_j - \Lambda_j^i| + |\overline{\overline{x}}_j - L_j^i| \right). \tag{5.11}
\]
This measure is based on the Hamming (city-block) distance (52); it quantifies the degree to which input data match the current knowledge. In particular, equation (5.11) returns 1 for identical trapezoids, indicating the maximum degree of matching. The output value decreases linearly as the trapezoids $x$ and $A^i$ move away from each other. Among all granules qualified to accommodate a particular sample, the one with the highest similarity should be chosen. This procedure prevents conflict and helps to keep the FBeM construction simple. Although equation (5.11) is
simple to compute, involving only basic arithmetic operations, there are no strong
principled reasons to impose this measure. In fact, there is no generally accepted
consensus on a best similarity measure for a given application (33).
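As a sketch, (5.11) amounts to an average of coordinate-wise Hamming distances. The 4-tuple encoding below and normalized $[0, 1]$ domains are our assumptions.

——————————————————————
# Sketch of the trapezoid similarity (Eq. 5.11).

def similarity(x, A):
    """x, A: lists of n trapezoids given as (l, lam, Lam, L) tuples."""
    n = len(x)
    s = sum(abs(xj[k] - Aj[k]) for xj, Aj in zip(x, A) for k in range(4))
    return 1.0 - s / (4.0 * n)

print(similarity([(0.2, 0.3, 0.4, 0.5)], [(0.2, 0.3, 0.4, 0.5)]))  # 1.0
——————————————————————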
Adaptation of granules proceeds depending on how far an input datum $x_j$ is from the fuzzy set $A_j^i$. Namely,

If $x_j \in [\mathrm{mp}(A_j^i) - \frac{\rho}{2}, \, l_j^i]$ then $l_j^i(\text{new}) = x_j$
If $x_j \in [\mathrm{mp}(A_j^i) - \frac{\rho}{2}, \, \lambda_j^i]$ then $\lambda_j^i(\text{new}) = x_j$
If $x_j \in [\lambda_j^i, \, \mathrm{mp}(A_j^i)]$ then $\lambda_j^i(\text{new}) = x_j$
If $x_j \in [\mathrm{mp}(A_j^i), \, \mathrm{mp}(A_j^i) + \frac{\rho}{2}]$ then $\lambda_j^i(\text{new}) = \mathrm{mp}(A_j^i)$
If $x_j \in [\mathrm{mp}(A_j^i) - \frac{\rho}{2}, \, \mathrm{mp}(A_j^i)]$ then $\Lambda_j^i(\text{new}) = \mathrm{mp}(A_j^i)$
If $x_j \in [\mathrm{mp}(A_j^i), \, \Lambda_j^i]$ then $\Lambda_j^i(\text{new}) = x_j$
If $x_j \in [\Lambda_j^i, \, \mathrm{mp}(A_j^i) + \frac{\rho}{2}]$ then $\Lambda_j^i(\text{new}) = x_j$
If $x_j \in [L_j^i, \, \mathrm{mp}(A_j^i) + \frac{\rho}{2}]$ then $L_j^i(\text{new}) = x_j$.
The first and eighth rules suggest support expansion, while the second and seventh recommend core expansion. The remaining cases advise core contraction. Figure 5.2 illustrates seven possible adaptation situations. In the figure, the datum $x = (\underline{\underline{x}}, \underline{x}, \overline{x}, \overline{\overline{x}})$ is placed either outside, partially inside, or inside granule $\gamma^i$. FBeM creates a new granule $\gamma^{c+1}$ or adapts the parameters of $\gamma^i$ accordingly.
Operations on the core parameters, $\lambda_j^i$ and $\Lambda_j^i$, require further adjustment of the midpoint of the respective fuzzy set:

\[
\mathrm{mp}(A_j^i)(\text{new}) = \frac{\lambda_j^i(\text{new}) + \Lambda_j^i(\text{new})}{2}. \tag{5.12}
\]

As a result, support contraction may happen on two occasions:

If $\mathrm{mp}(A_j^i)(\text{new}) - \frac{\rho}{2} > l_j^i$ then $l_j^i(\text{new}) = \mathrm{mp}(A_j^i)(\text{new}) - \frac{\rho}{2}$
If $\mathrm{mp}(A_j^i)(\text{new}) + \frac{\rho}{2} < L_j^i$ then $L_j^i(\text{new}) = \mathrm{mp}(A_j^i)(\text{new}) + \frac{\rho}{2}$.
Adaptation of the consequent fuzzy sets $B^i$ is done similarly using the output data $y$.
Figure 5.2: Creation and recursive adaptation of FBeM granules
Coefficients $a_j^i$ of the local function $p^i$ are updated using the recursive least squares algorithm described in Appendix B. Storing a number of recent instances may be useful to guide alternative coefficient identification algorithms, e.g., algorithms oriented to data chunks. However, this comes with additional memory and processing-time costs.
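The eight rules, the midpoint update (5.12), and the two support contractions can be gathered into a single routine. The sketch below handles one attribute and, for simplicity, a numeric datum rather than a trapezoidal one; it is an illustration, not the thesis implementation.

——————————————————————
# Sketch of the Section 5.4.4 adaptation for one attribute: the list
# A = [l, lam, Lam, L] is modified in place given a numeric datum x.

def adapt_set(x, A, rho):
    l, lam, Lam, L = A
    mp = (lam + Lam) / 2.0
    if mp - rho / 2 <= x <= l:
        A[0] = x                       # rule 1: support expansion
    if mp - rho / 2 <= x <= lam:
        A[1] = x                       # rule 2: core expansion
    elif lam <= x <= mp:
        A[1] = x                       # rule 3: core contraction
    elif mp <= x <= mp + rho / 2:
        A[1] = mp                      # rule 4: core contraction
    if mp - rho / 2 <= x <= mp:
        A[2] = mp                      # rule 5: core contraction
    elif mp <= x <= Lam:
        A[2] = x                       # rule 6: core contraction
    elif Lam <= x <= mp + rho / 2:
        A[2] = x                       # rule 7: core expansion
    if L <= x <= mp + rho / 2:
        A[3] = x                       # rule 8: support expansion
    mp_new = (A[1] + A[2]) / 2.0       # Eq. (5.12)
    if mp_new - rho / 2 > A[0]:
        A[0] = mp_new - rho / 2        # support contraction
    if mp_new + rho / 2 < A[3]:
        A[3] = mp_new + rho / 2        # support contraction
    return A
——————————————————————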
5.4.5 Coarsening the Granular Model
Relationships between granules may be strong enough to justify assembling a more abstract granule that inherits the information of the lower-level granules. The similarity measure (5.11) can be used to quantify granule-to-granule resemblance if we restate it as:

\[
S(A^{i1}, A^{i2}) = 1 - \frac{1}{4n} \sum_{j=1}^{n} \left( |l_j^{i1} - l_j^{i2}| + |\lambda_j^{i1} - \lambda_j^{i2}| + |\Lambda_j^{i1} - \Lambda_j^{i2}| + |L_j^{i1} - L_j^{i2}| \right). \tag{5.13}
\]
This measure has better discrimination capability than, for example, the distance between midpoints of granules (136), and it is fast to calculate.

FBeM combines granules at intervals of $h_r$ steps considering the pair with the highest value of $S(A^{i1}, A^{i2})$, $i1, i2 = 1, \ldots, c$, $i1 \neq i2$, and a decision criterion. The decision may be based on whether the new granule obeys the maximum width allowed, $\rho$.
A new granule $\gamma^i$, the coarsening of $\gamma^{i1}$ and $\gamma^{i2}$, is formed by trapezoidal membership functions $A_j^i$ with parameters derived from $A_j^{i1}$ and $A_j^{i2}$ as follows:

\[
l_j^i = \min(l_j^{i1}, l_j^{i2}), \quad
\lambda_j^i = \min(\lambda_j^{i1}, \lambda_j^{i2}), \quad
\Lambda_j^i = \max(\Lambda_j^{i1}, \Lambda_j^{i2}), \quad
L_j^i = \max(L_j^{i1}, L_j^{i2}). \tag{5.14}
\]
The granule $\gamma^i$ encloses all the content of the granules $\gamma^{i1}$ and $\gamma^{i2}$. The same coarsening procedure is used to determine the parameters of the output membership function $B^i$. The coefficients of the local function of granule $\gamma^i$ are:

\[
a_j^i = \frac{1}{2} (a_j^{i1} + a_j^{i2}), \ \ j = 0, \ldots, n. \tag{5.15}
\]

Combining granules reduces the size of the rule base and eliminates redundancy. The importance of reducing the number of rules in evolving rule-based systems
is discussed in (96).
5.4.6 Removing Granules
A granule should be removed from the system model if it appears inconsistent with the current knowledge. A common approach consists in deleting the most inactive granules (79).

Let

\[
\Theta^i = 2^{\left( -\psi \left( h - h_a^i \right) \right)} \tag{5.16}
\]

be the activity factor associated with the granule $\gamma^i$, similar to (3). The constant $\psi$ is a decay rate, $h$ the current time step, and $h_a^i$ the last time step at which granule $\gamma^i$ was processed. The factor $\Theta^i$ decreases exponentially as $h$ increases. The half-life of a granule is the time spent to reduce the factor $\Theta^i$ by half, that is, $1/\psi$.

The half-life $1/\psi$ is a value that suggests deletion of inactive granules. As a rule, $\psi$ is domain-dependent. Large values of $\psi$ express lower tolerance to inactivity and privilege more compact structures. Small values of $\psi$ add robustness and prevent catastrophic forgetting. If the application requires memorization of isolated events, or if seasonality is expected, then it may be the case to set $\psi$ to 0 and let granules and rules exist forever. In general, $\psi$ should be set in $]0, 1[$ to keep model evolution active.
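For instance, assuming the factor in (5.16) is evaluated at each step:

——————————————————————
# Sketch of the activity factor (Eq. 5.16) driving granule removal.

def activity(h, h_last_active, psi):
    """Theta halves after every 1/psi steps of inactivity."""
    return 2.0 ** (-psi * (h - h_last_active))

# With psi = 0.1 the factor drops to one half after 10 idle steps
print(activity(h=20, h_last_active=10, psi=0.1))  # 0.5
——————————————————————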
5.4.7 Learning Algorithm
The learning procedure to evolve FBeM models can be summarized by the algorithm described below. The algorithm underlines the essence of data-stream-oriented approaches, where instances are read and discarded one at a time. Historical data are dispensable, and evolution proceeds continuously on an incremental basis.
———————————————————————————
BEGIN
  Set parameters ρ, h_r, η, ψ; c = 0;
  Read (x, y)^[h], h = 1;
  Create granule γ^{c+1};
  For h = 2, ... do
    Read (x, y)^[h];
    Provide single-valued approximation p(x^[h]);
    Provide granular approximation B^{i*};
    Calculate output error ε^[h] = mp(y^[h]) − p(x^[h]);
    If x^[h] or y^[h] is not within the expansion regions E^i ∀i
      Create granule γ^{c+1};
    Else
      Adapt the most active granule γ^{i*}, i* = arg max_i (S(x, A^1), ..., S(x, A^c));
      Adapt local function parameters a_j^i using RLS;
    If h = α h_r, α = 1, 2, ...
      Combine granules when feasible;
      Update model granularity ρ;
      Remove inactive granules;
END
———————————————————————————
5.5 Summary
This chapter has introduced FBeM, an evolving granular fuzzy modeling framework for fuzzy granular data streams. FBeM carries a series of properties that make it suitable to model nonstationary functions online using fuzzy data. FBeM provides accurate and granular information simultaneously. Granular model outputs contain a range of possible values delimited by soft boundaries, which makes the outputs more reliable and trustworthy.
Chapter 6
Evolving Granular Neural Networks
This chapter introduces evolving granular neural networks for neurofuzzy mod-
eling of fuzzy data streams. The evolving granular neural network (eGNN) ef-
ficiently handles concept changes, distinctive events of nonstationary environ-
ments. eGNN builds interpretable multi-sized local models using fuzzy neural
information fusion. An incremental learning algorithm develops the neural net-
work topology from the information contained in data streams. We emphasize
fuzzy intervals and objects with trapezoidal membership function representation.
More precisely, the framework considers triangular, interval, and numeric types of
data to construct granular fuzzy models as particular arrangements of trapezoids.
6.1 Introduction
Artificial neural network approaches are able to perform parallel processing, and
identify and generalize patterns in data sets. Although classical neural networks
can approximate any nonlinear continuous function in compact domains, they
usually demand high quality training data and time-consuming offline learning.
Generally speaking, neural networks give black box, non-transparent, difficult to
interpret models.
Granular neural networks were introduced in (112) as a framework to process
information granules. By considering sets of objects sharing commonalities and
imprecise data items instead of precise singular data items, granular neural net-
works avoid processing detailed and costly data. Granular neural networks do not
need to consider all data, which are far more numerous than granules. Rather,
data can be discarded whenever they match an existing information granule.
The eGNN approach described in this chapter, differently from (112), is com-
mitted to online modeling of potentially unbounded fuzzy data streams. We
focus on fuzzy trapezoidal data, namely granular data expressed by trapezoidal
fuzzy sets. Trapezoids allow some freedom in the choice of representative granules
since they embody triangular fuzzy sets, intervals, and real values as particular
realizations (138). The eGNN structure uses fuzzy aggregation neurons as basic
processing units and encodes a set of fuzzy rules and a fuzzy inference system.
Its structure results from a gradual neurofuzzy construction that is transparent
and interpretable. eGNN manages to discover more abstract high-level granular
knowledge from finer granular data. High-level granular knowledge can be easily
translated into a fuzzy knowledge base. Each rule of the knowledge base consists
of two parts: rule antecedent (If part) and rule consequent (Then part). The
consequent part is composed by a linguistic and a local functional (real-valued
function) term. Independently of the choice of aggregation neurons, network
parameters and the nature of input-output data, the linguistic term of the rule
consequent provides a granular output while the functional term gives a singular
(pointwise) output.
The eGNN framework addresses four issues: (i) non-interpretability and lack
of transparency of black box neural network models; (ii ) online processing of
granular data streams; (iii ) trading off precision and interpretability; and (iv )
handling of large volumes of nonstationary data. The first issue is addressed
using fuzzy hyperbox representations, which are interpretable and transparent
descriptors of granular data, together with fuzzy neurons to aggregate data. The
second issue is dealt with through incremental learning mechanisms capable of
processing granular data. The third issue is handled by combining functional
and linguistic fuzzy models into a single model. The last issue is managed by
resorting to a scalable recursive learning algorithm that works on a per-sample
basis, requiring only features of a sample plus a small amount of aggregated
information such as fuzzy rule bases or neural networks. Learning should be one-
pass, neglecting all previously seen data samples: each sample is processed only
once and removed from memory (10).
Learning in eGNN fundamentally means to accommodate new data into ex-
isting granular models on a recursive basis. Learning may add new granules,
neurons and respective connections into the network structure whenever neces-
sary. The parameters of the real-valued functions of rule consequents are also
object of learning. This means that eGNN captures new information from data
streams, adapts itself to the new scenario, and avoids redesigning and retrain-
ing. The granular neural structure may be coarsened or refined depending on
inter-granules relationships (transparency and interpretability) and error indices
(accuracy).
Practical applications of eGNN include evolving regressors, classifiers, fore-
casters and neurofuzzy controllers (76) (77) (82).
6.2 Related Work
There are two main approaches related to evolving granular neural networks.
The first concerns granular neural networks for data granulation, granularity
adaptation, and granular data processing. These are grounded in the princi-
ples of granular computing aiming at problem solving, complexity reduction, and
structured thinking. In this case, granular neural networks often require multi-
ple passes over data sets and offline learning. The second involves data stream
oriented connectionist systems whose focus is on online tracking of nongranular
data in nonstationary environments. Although a number of evolving fuzzy neural
networks have succeeded in dealing with time-varying information, as it will be
shown next, they are not able to process granular fuzzy input-output data. The
evolving granular neural network framework addressed in this chapter benefits
from and enhances both approaches.
6.2.1 Granular Neural Networks
On conceptualizing the world at different granularities, humans usually deal with
information granules hierarchically (88) (142) (156). Human learning profits from
the aggregation of local fragments to form a global picture. Granular computing
offers human-like learning to be gradually embedded into modeling approaches
through incremental granular data processing. Granule oriented neural networks
generalize numeric data oriented neural networks because they provide mecha-
nisms to process both singular and granular data.
Granular self-organizing maps (grSOM) (62) are neurofuzzy models for struc-
ture identification in linguistic system modeling. grSOM induces fuzzy intervals
(granules) from the data using a metric tuned nonlinearly by a mass function. Its
learning approach is supported by an analysis from the lattice theory and genetic
algorithms. grSOM copes with ambiguity and can process fuzzy and interval
input data. Experimental results (62) recommend grSOM as a support tool in
decision making and performance gain in classification tasks. Although grSOM
copes with ambiguity and can process fuzzy and interval input data, it needs the
entire data set to be available a priori. Learning is not incremental, and model
structure and parameters are not adaptive to test data.
Fuzzy granular neural networks (FGNN) (157) consider numeric-linguistic
data fusion, missing value prediction, and granular knowledge discovering in het-
erogeneous data sets. The FGNN packs relatively detailed granular data into
coarser and more intelligible granules. Since FGNN requires offline learning and
uses a gradient descent method to back propagate errors and to adapt parameters,
it is not suitable to handle nonstationary data available gradually.
Granular reflex fuzzy min-max neural networks (GrRFMN) (108) learn from
and classify interval granular online data. The network structure emulates the
reflex mechanism of the human brain and deals with class overlapping using com-
pensation neurons. The GrRFMN training algorithm gives a way to compute
datum-model membership which leads to better network performance. Experi-
ments with real data sets (108) assert the effectiveness of the approach.
The fuzzy min-max neural network (GFMM) (46) summarized in Section 3.2.2
is another example of a closely related granular neurofuzzy approach.
The eGNN approach differs from the granular neural network approaches described above because it is oriented to fuzzy interval data streams, self-adapts its structure continuously, and does not need to store past data.
6.2.2 Data Stream Oriented Neural Networks
The continuous increase in availability of large amounts of data has motivated
development of algorithms to process online data streams (11) (66). Evolving
the structure of models using new information in nonstationary data streams is a
challenging issue. Most of the traditional data mining and machine learning and
statistical algorithms usually assume a form of stationarity and demand multi-
ple passes over data sets. Thus, they do not meet the requirements needed for
online learning. The design of evolving models is concerned with gradual model
construction aiming at inducing new knowledge without catastrophic forgetting,
and refining current knowledge keeping the system in operation.
Unsupervised evolving neural networks (65) can perform classification from
unlabeled data streams. An example in this category is the Evolving Self-Organiz-
ing Map (eSOM), which uses standard principles of self-organization in an incre-
mental basis. eSOM allows prototype neurons to evolve in the input space to si-
multaneously acquire and keep a topological representation. The neighborhoods
of evolved neurons are not predefined, and differently from non-evolving SOM,
they are determined online according to distances between neurons (65). The
eSOM learning algorithm is fast since it dispenses neighborhood ranking search,
but it is unable to cope with fuzzy intervals. eSOM is not suitable for function
approximation.
Evolving fuzzy neural networks (EFuNN) (63) adapt their structure and pa-
rameters through incremental, hybrid supervised/unsupervised online learning.
EFuNN is able to model streams of data, new features, and new classes. Neurons
and connections are created during learning. Fuzzy or non-fuzzy rules can be
extracted from the network at any time. EFuNN has shown to be efficient in pat-
tern recognition. The dynamic evolving neural-fuzzy inference system (DENFIS)
(64) is another type of fuzzy rule-based system for adaptive online learning. Like
EFuNN, DENFIS evolves using a hybrid incremental algorithm to fit new input
data. Fuzzy rules may be created, updated or deleted before or during system op-
eration. It has been shown that DENFIS can learn complex temporal sequences
and outperform similar approaches. However, neither EFuNN nor DENFIS can process fuzzy intervals or provide granular output.
The eGNN approach is different from the evolving neural network approaches
described above primarily because of its ability to process fuzzy granular data
streams. Moreover, eGNN is able to simultaneously provide single-valued and
granular function approximation or classification.
6.3 Fuzzy Aggregation Neuron Model
This section introduces fuzzy aggregation neurons which are pertinent when pro-
cessing data through successive layers of evolving granular neural networks.
Aggregation neurons are artificial neuron models based on aggregation opera-
tors (see Section 2.5). Evolving granular neural networks may use different types
of aggregation neurons to perform information fusion (26). In general, there are
no specific guidelines to choose a particular aggregation operator to construct a
fuzzy neuron. The choice depends on the application environment and domain
knowledge. Although the choice usually conforms to simplicity, transparency, and
flexibility requirements, occasionally, it may conform to the system performance
using the available data.
Let $\tilde{x} = (\tilde{x}_1, \ldots, \tilde{x}_n)$ be a vector of membership degrees of a sample $x = (x_1, \ldots, x_n)$ in the fuzzy sets $G_j$ of $G = (G_1, \ldots, G_n)$. Let $w = (w_1, \ldots, w_n)$ be a weighting vector such that

\[
w_j \in [0, 1], \ \ j = 1, \ldots, n. \tag{6.1}
\]

Fuzzy aggregation neurons employ the product T-norm to perform synaptic processing and an aggregation operator $C$ to fuse the individual results of synaptic processing in the neuron body. The output of a fuzzy aggregation neuron is:
\[
o = C(\tilde{x}_1 w_1, \ldots, \tilde{x}_n w_n). \tag{6.2}
\]
An aggregation neuron produces a diversity of nonlinear mappings between neuron inputs and output, depending on the choice of the weights $w$, the triangular norms $T$ and $S$, and the parameters $e$ (neutral element of uninorms, see Section 2.5.3) and $v$ ($v$-factor of compensatory aggregations, see Section 2.5.5). The structure of a fuzzy aggregation neuron is shown in Fig. 6.1. Examples of outputs generated by fuzzy aggregation neurons using a uninorm $U$ with $T = \min$, $S = \max$, $e = 0.3$, $v = 0$, and a T-S aggregation $L$ with $T = \min$, $S = \max$, $e = 0$, $v = 0.3$, are illustrated in Figs. 6.2(a) and 6.2(b), respectively.
Figure 6.1: Fuzzy aggregation neuron model
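A minimal rendering of neuron (6.2) is shown below, with product T-norm synapses and, as one possible choice for $C$, the minimum; uninorms or T-S operators from Section 2.5 could be plugged in instead. Names are illustrative.

——————————————————————
# Sketch of a fuzzy aggregation neuron (Eq. 6.2) with C = min.

def aggregation_neuron(memberships, weights, C=min):
    """memberships: degrees x~_j in [0,1]; weights: w_j in [0,1]."""
    synapses = [m * w for m, w in zip(memberships, weights)]  # product T-norm
    return C(synapses)

print(aggregation_neuron([0.8, 0.6, 0.9], [1.0, 0.5, 1.0]))  # 0.3
——————————————————————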
6.4 Structure and Processing
Let $x = (x_1, \ldots, x_n)$ be an input vector and $y$ its corresponding output. Assume that the data stream $(x, y)^{[h]}$, $h = 1, \ldots$, consists of samples derived from a time-varying, nonstationary function $f$. Inputs $x_j$ and output $y$ are symmetric fuzzy data.

eGNN has a four-layer structure, as shown in Fig. 6.3. The input layer delivers samples $x^{[h]}$, one at a time, to the network. The granular layer consists of a collection of fuzzy sets $G_j^i$, $j = 1, \ldots, n$; $i = 1, \ldots, c$, stratified from the input data. Fuzzy sets $G_j^i$, $i = 1, \ldots, c$, form a fuzzy partition of the $j$-th input domain, $X_j$. Similarly, fuzzy sets $\Gamma^i$, $i = 1, \ldots, c$, assemble a fuzzy partition of the output domain $Y$.
Figure 6.2: Examples of input/output functions of fuzzy aggregation neurons: (a) uninorm $U_{\min,\max}$ with $e = 0.3$, $v = 0$; (b) T-S operator $L_{\min,\max}$ with $e = 0$, $v = 0.3$
A granule $\gamma^i = G_1^i \times \ldots \times G_n^i \times \Gamma^i$ is a fuzzy relation in $X_1 \times \ldots \times X_n \times Y$. Thus, granule $\gamma^i$ has membership function $\gamma^i(x, y) = \min\{G_1^i(x_1), \ldots, G_n^i(x_n), \Gamma^i(y)\}$ in $X_1 \times \ldots \times X_n \times Y$. Granule $\gamma^i$ is denoted by $\gamma^i = (G^i, \Gamma^i)$, with $G^i = (G_1^i, \ldots, G_n^i)$, for short. Moreover, the granule $\gamma^i$ comes with an associated local function $p^i$. In general, the functions $p^i$ can be of different types and are not required to be linear. This study employs non-fuzzy, real-valued affine functions:

\[
p^i(\hat{x}_1, \ldots, \hat{x}_n) = \hat{y}^i = a_0^i + \sum_{j=1}^{n} a_j^i \hat{x}_j, \tag{6.3}
\]
Figure 6.3: Single-valued approximation provided from input data processing
for simplicity. Parameters $a_0^i$ and $a_j^i$ are real values; $\hat{x}_j$ is the midpoint of $x_j = (\underline{\underline{x}}_j, \underline{x}_j, \overline{x}_j, \overline{\overline{x}}_j)$, which is obtained from:

\[
\mathrm{mp}(x_j) = \hat{x}_j = \frac{\underline{x}_j + \overline{x}_j}{2}. \tag{6.4}
\]

Similarity degrees $\tilde{x}^i = (\tilde{x}_1^i, \ldots, \tilde{x}_n^i)$ arise as the result of matching between an input $x = (x_1, \ldots, x_n)$ and the fuzzy sets of $G^i = (G_1^i, \ldots, G_n^i)$; see Section 2.4.3. The aggregation layer comprises fuzzy aggregation neurons $C^i$, $i = 1, \ldots, c$, whose role is to combine information from the different inputs. A fuzzy aggregation
neuron $C^i$ combines the weighted similarity degrees $(\tilde{x}_1^i w_1^i, \ldots, \tilde{x}_n^i w_n^i)$ into a single value $o^i$. The output layer aggregates the weighted values $(o^1 \hat{y}^1 \delta^1, \ldots, o^c \hat{y}^c \delta^c)$ using a fuzzy aggregation neuron $C^f$ to produce the single network output $\hat{y}^{[h]}$. Formally,

\[
\hat{y} = C^f(o^1 \hat{y}^1 \delta^1, \ldots, o^c \hat{y}^c \delta^c). \tag{6.5}
\]
An $m$-output eGNN requires a vector of local functions $(p_1^i, \ldots, p_m^i)$, output layer neurons $(C_1^f, \ldots, C_m^f)$, and outputs $(\hat{y}_1, \ldots, \hat{y}_m)$. The network output $\hat{y}$, obtained as illustrated in Fig. 6.3, is a single-valued approximation of $f$, regardless of whether the input data are real numbers, fuzzy numbers, or fuzzy intervals.
The granular approximation of a function $f$ at step $H$ is given by a set of granules $\gamma^i$, $i = 1, \ldots, c$, such that:

\[
(x, y)^{[h]} \subset \bigcup_{i=1}^{c} \gamma^i, \ \ h = 1, \ldots, H. \tag{6.6}
\]

The granular approximation is formed by granulating the input data $x^{[h]}$ into fuzzy sets of $G^i$, as in Fig. 6.3, and the output data $y^{[h]}$ into fuzzy sets $\Gamma^i$, as in Fig. 6.4. Note in Fig. 6.4 that the granular approximation is the convex hull of the output fuzzy sets $\Gamma^{i^*}$, where $i^*$ are the indices of the active granules, that is, those for which $o^i > 0$. This guarantees that the single-valued approximation $\hat{y}^{[h]}$ is included in the granule.
Figure 6.4: Granular approximation formed by input and output data granulation
The convex hull of trapezoidal fuzzy sets $\Gamma^1, \ldots, \Gamma^c$, with $\Gamma^i = (\underline{\underline{u}}^i, \underline{u}^i, \overline{u}^i, \overline{\overline{u}}^i)$, is a trapezoidal fuzzy set $\mathrm{ch}(\Gamma^1, \ldots, \Gamma^c)$ such that:

\[
\mathrm{ch}(\Gamma^1, \ldots, \Gamma^c) = (\min(\underline{\underline{u}}^1, \ldots, \underline{\underline{u}}^c), \; \min(\underline{u}^1, \ldots, \underline{u}^c), \; \max(\overline{u}^1, \ldots, \overline{u}^c), \; \max(\overline{\overline{u}}^1, \ldots, \overline{\overline{u}}^c)). \tag{6.7}
\]

In particular, in Fig. 6.4 the trapezoid $(\underline{\underline{u}}^{i^*}, \underline{u}^{i^*}, \overline{u}^{i^*}, \overline{\overline{u}}^{i^*})$ that results from $\mathrm{ch}(\Gamma^{i^*})$, where $i^* = \{i : o^i > 0, \; i = 1, \ldots, c\}$, is a granular approximation of the output data $y$. It is worth noting that the granular approximation at instant $h$ does not depend on the availability of $y^{[h]}$, because $o^i$ is obtained from $x^{[h]}$ (see Fig. 6.3). Only the collection of output fuzzy sets $\Gamma^i$ is required.
Figure 6.5 shows an example of single-valued and granular approximations, $p$ and $\bigcup_{i=1}^{c} \gamma^i$, of a function $f$. In Fig. 6.5(a), a singular input $x^{[h1]}$ and a granular input $x^{[h2]}$ produce singular outputs $\hat{y}^{[h1]}$ and $\hat{y}^{[h2]}$ by using $p$. In Fig. 6.5(b), the granular input $x^{[h]}$ activates the fuzzy sets of $G^2$ and $G^3$. Therefore, the granular output is obtained from the convex hull operation $\mathrm{ch}(\Gamma^2, \Gamma^3)$. It follows that $y^{[h]} \subset \mathrm{ch}(\Gamma^2, \Gamma^3)$. If $y^{[h]} \not\subset \mathrm{ch}(\Gamma^2, \Gamma^3)$, either fuzzy set $\Gamma^2$ or $\Gamma^3$ is adapted to enclose $y^{[h]}$. Granule adaptation is addressed in the next section.
Information processing in the intermediate layers of eGNN is single-valued to speed up calculations. Thus, input and output fuzzy sets can be viewed as decoders and encoders of granular data. eGNN comes with an incremental learning algorithm to adapt its structure and parameters over time. The algorithm is detailed in Section 6.5.

eGNN evolves functional and linguistic fuzzy models. While functional fuzzy systems are more precise, linguistic fuzzy systems are more interpretable. Accuracy and interpretability require tradeoffs, and one usually excels over the other. eGNN joins functional and linguistic systems into a single framework. Under the assumption of specific weights and neurons, fuzzy rules extracted from eGNN can be of the type:
Figure 6.5: eGNN single-valued (a) and granular (b) approximation of a function $f$: (a) single-valued approximation $p$; (b) granular approximation $\bigcup_{i=1}^{5} \gamma^i$
$R^i$: IF ($x_1$ is $G_1^i$) AND ... AND ($x_n$ is $G_n^i$)
THEN ($y$ is $\Gamma^i$) [linguistic] AND $\hat{y} = p^i(x_1, \ldots, x_n)$ [functional].
This type of eGNN combines Mamdani and Takagi-Sugeno fuzzy models.
6.5 Learning in eGNN
This section details the eGNN learning algorithm. Differently from the usual top-down granular approaches, because the data domain is unknown beforehand,
the eGNN learning approach is mostly bottom-up.
Developing the fuzzy rules encoded in the network structure and approximat-
ing nonstationary functions from granular data streams are the key concerns of
the learning approach. The eGNN learning employs a sample-per-sample testing-
before-training method on a recursive basis. This method portrays a truly online
data stream scenario. We assume that no granules, neurons and connections
exist before training starts. The algorithm builds the network structure in plug-
and-play mode. Single pass over data enables eGNN to address the issues of
unbounded data sets and scalability of computationally hard problems.
We assume trapezoidal membership functions $G_j^i = (\underline{\underline{g}}_j^i, \underline{g}_j^i, \overline{g}_j^i, \overline{\overline{g}}_j^i)$ and input data $x_j = (\underline{\underline{x}}_j, \underline{x}_j, \overline{x}_j, \overline{\overline{x}}_j)$. Similarly, $\Gamma^i = (\underline{\underline{u}}^i, \underline{u}^i, \overline{u}^i, \overline{\overline{u}}^i)$ and the output data $y = (\underline{\underline{y}}, \underline{y}, \overline{y}, \overline{\overline{y}})$ are trapezoids. Each rule antecedent $G^i = (G_1^i, \ldots, G_n^i)$ has a corresponding consequent $\Gamma^i$. With $\gamma^i = (G^i, \Gamma^i)$, eGNN looks at examples $(x, y)$ at a coarser granule size.
6.5.1 Adapting the Granularity
Balancing parametric and structural adaptation is a key to capture gradual and
abrupt changes of nonstationary systems online. The neural pattern recognition
literature refers to this issue as the stability-plasticity dilemma (29). Structural
plasticity in eGNN means creating new granules and rules to memorize new con-
cepts. This avoids the rules learned to be exposed to catastrophic forgetting.
Structural stability preserves the eGNN structure, but allows adaptation of the
existing granules and rules to smooth and slow changes. Parametric refinement
partially retains the information. The procedure suggested below is a way to
parsimoniously reconcile plasticity and stability in eGNN.
The maximum width that fuzzy sets $G_j^i$ are allowed to expand to is denoted by $\rho$, that is, $\mathrm{wdt}(G_j^i) \leq \rho$, $j = 1, \ldots, n$; $i = 1, \ldots, c$. The value of $\rho$ affects the granularity, accuracy, and transparency of the models. Similarly to Chapter 5, the expansion region of a fuzzy set $G_j^i$ is

\[
E_j^i = \left[ \mathrm{mp}(G_j^i) - \frac{\rho}{2}, \; \mathrm{mp}(G_j^i) + \frac{\rho}{2} \right]. \tag{6.8}
\]
It follows that

\[
\mathrm{wdt}(G_j^i) \leq \mathrm{wdt}(E_j^i) \quad \forall j, i. \tag{6.9}
\]

$\rho$ plays a pivotal role in mediating the plasticity and stability of eGNN structures. Expressions similar to (6.8) and (6.9) can be derived for the fuzzy sets $\Gamma^i$. Expansion regions help to derive criteria for deciding whether or not granular data should be considered enclosed by the current granular model.

In practice, $\rho \in [0, 1]$ determines the need to create or adapt rules. In the most general case, the neural network starts learning with an empty rule base and devoid of knowledge about the data properties. In such cases it is reasonable to initialize $\rho$ with an intermediate value to allow structural stability and plasticity equally. We use $\rho^{[0]} = 0.5$ as the default initial value.
A simple and fast procedure to evolve $\rho$ is as follows. Let $r$ be the number of rules created in the last $h_r$ steps. If the number of rules grows faster than a rate $\eta$, that is, $r > \eta$, then $\rho$ is increased:

\[
\rho(\text{new}) = \left( 1 + \frac{r}{h_r} \right) \rho(\text{old}). \tag{6.10}
\]

The idea is to reject large rule bases because they increase model complexity and may not help generalization. Equation (6.10) acts against outbursts of growth.

Otherwise, if the number of rules grows at a rate smaller than $\eta$, that is, $r \leq \eta$, then $\rho$ is decreased:

\[
\rho(\text{new}) = \left( 1 - \frac{(\eta - r)}{h_r} \right) \rho(\text{old}). \tag{6.11}
\]

If $\rho = 1$, then eGNN is structurally stable, but unable to capture abrupt changes. Conversely, if $\rho = 0$, then eGNN overfits the data, causing excessive complexity and irreproducible, optimistic results. Life-long adaptability is reached by choosing intermediate values for $\rho$, as depicted in Fig. 6.6.
Figure 6.6: Stability-plasticity tradeoff and the role of $\rho$ in eGNN systems
Reducing the maximum width allowed for granules may require shrinking larger granules to fit them to the new value. In this case, the support of a fuzzy set $G_j^i$ is narrowed as follows:

If $\mathrm{mp}(G_j^i) - \frac{\rho(\text{new})}{2} > \underline{\underline{g}}_j^i$ then $\underline{\underline{g}}_j^i(\text{new}) = \mathrm{mp}(G_j^i) - \frac{\rho(\text{new})}{2}$
If $\mathrm{mp}(G_j^i) + \frac{\rho(\text{new})}{2} < \overline{\overline{g}}_j^i$ then $\overline{\overline{g}}_j^i(\text{new}) = \mathrm{mp}(G_j^i) + \frac{\rho(\text{new})}{2}$
Cores $[\underline{g}_j^i, \overline{g}_j^i]$, and the supports $[\underline{\underline{u}}^i, \overline{\overline{u}}^i]$ and cores $[\underline{u}^i, \overline{u}^i]$ of the fuzzy sets $\Gamma^i$, are handled similarly. Time-varying granularity is useful to avoid guesses about how fast and how often the data stream changes. The accuracy-interpretability tradeoff is an important issue in neurofuzzy computing (113).
6.5.2 Calculating Similarity Degree
As input data and granules are trapezoidal fuzzy objects, a potential similarity measure to determine how well they match is:

\[
\tilde{x}_j^i = \begin{cases}
1 - \dfrac{|\underline{\underline{g}}_j^i - \underline{\underline{x}}_j| + |\underline{g}_j^i - \underline{x}_j| + |\overline{g}_j^i - \overline{x}_j| + |\overline{\overline{g}}_j^i - \overline{\overline{x}}_j|}{4 \left( \max(\overline{\overline{g}}_j^i, \overline{\overline{x}}_j) - \min(\underline{\underline{g}}_j^i, \underline{\underline{x}}_j) \right)} & \text{if } x_j \cap G_j^i \neq \emptyset \\[3mm]
0 & \text{otherwise.}
\end{cases} \tag{6.12}
\]
This measure returns $\tilde{x}_j^i$ equal to 1 for superposed trapezoids and decreases linearly as any numerator term increases. The numerator terms are Hamming distances between pairs of corresponding parameters of two different trapezoids. The denominator scales the result to lie in the range from zero to one. Non-overlapping trapezoids
are considered dissimilar, yielding $\tilde{x}_j^i$ equal to 0.
Reference (46) introduces a similarity measure between an interval and a fuzzy
set. The measure is based on the minimum T-norm and decreases monotonically
as the distance between the interval and the fuzzy set increases. Reference (108)
pointed out that this similarity measure gives low values to significantly over-
lapped objects. In extreme cases, the similarity measure proposed in (46) returns
zero for a fuzzy set contained in an interval. To avoid this situation, (108) pro-
poses the average of the overlapping area between the interval and fuzzy set as
similarity measure.
The similarity measure in (6.12) extends the measures in (46) and (108) to
trapezoidal fuzzy sets. It overcomes problems which may arise due to some T-
norms when considering the boundaries of intervals, as in (46), but is faster than
the measure in (108) because it does not need to compute the area of arbitrary
polygons.
6.5.3 Creating Granules
The incremental procedure to create granules runs whenever the support of at least one entry of an input vector $(x_1, \ldots, x_n)$ is not enclosed by the expansion regions $(E_1^i, \ldots, E_n^i)$, $i = 1, \ldots, c$. In this case, the fuzzy sets $G^i$ cannot be expanded beyond the limit $\rho$ to fit the sample. Analogously, if $\mathrm{supp}(y)$ is not enclosed by $E^i$ for at least one $\Gamma^i$, then the sample should be enclosed by a new granule.

A new granule $\gamma^{c+1}$ is assembled from fuzzy sets $G_j^{c+1}$ and $\Gamma^{c+1}$ whose parameters match the sample:

\[
(\underline{\underline{g}}_j^{c+1}, \underline{g}_j^{c+1}, \overline{g}_j^{c+1}, \overline{\overline{g}}_j^{c+1}) = (\underline{\underline{x}}_j, \underline{x}_j, \overline{x}_j, \overline{\overline{x}}_j), \tag{6.13}
\]
\[
(\underline{\underline{u}}^{c+1}, \underline{u}^{c+1}, \overline{u}^{c+1}, \overline{\overline{u}}^{c+1}) = (\underline{\underline{y}}, \underline{y}, \overline{y}, \overline{\overline{y}}). \tag{6.14}
\]

Coefficients of the real-valued local function $p^{c+1}$ are set to

\[
a_0^{c+1} = \mathrm{mp}(y), \quad a_j^{c+1} = 0, \ \ j \neq 0. \tag{6.15}
\]
6.5.4 Adapting Granules
Adaptation of granules means expanding or contracting the support and the core of fuzzy sets $G_j^i$ and $\Gamma^i$ to enclose new data, and refining the coefficients of the local functions $p^i$.

Granule $\gamma^i$ can be adapted whenever a sample $(x, y)$ falls within its expansion region, that is,

\[
\mathrm{supp}(x_j) \subset E_j^i, \ \ j = 1, \ldots, n, \quad \text{and} \quad \mathrm{supp}(y) \subset E^i. \tag{6.16}
\]

This means that either the sample is enclosed by granule $\gamma^i$, or it is close enough that the granule can be expanded to enclose it. In situations in which two or more granules qualify to enclose the data, adapting only one of them is enough to guarantee data inclusion. In particular, we may choose $\gamma^{i^*}$ such that $i^* = \arg\max(o^1, \ldots, o^c)$. In other words, $\gamma^{i^*}$ is the granule with the highest activation level for the given sample.
Adaptation proceeds depending on where the input datum $x_j$ lies with respect to the fuzzy set $G_j^i$:

If $x_j \in [\mathrm{mp}(G_j^i) - \frac{\rho}{2}, \, \underline{\underline{g}}_j^i]$ then $\underline{\underline{g}}_j^i(\text{new}) = \underline{\underline{x}}_j$
If $x_j \in [\mathrm{mp}(G_j^i) - \frac{\rho}{2}, \, \underline{g}_j^i]$ then $\underline{g}_j^i(\text{new}) = \underline{x}_j$
If $x_j \in [\underline{g}_j^i, \, \mathrm{mp}(G_j^i)]$ then $\underline{g}_j^i(\text{new}) = \underline{x}_j$
If $x_j \in [\mathrm{mp}(G_j^i), \, \mathrm{mp}(G_j^i) + \frac{\rho}{2}]$ then $\underline{g}_j^i(\text{new}) = \mathrm{mp}(G_j^i)$
If $x_j \in [\mathrm{mp}(G_j^i) - \frac{\rho}{2}, \, \mathrm{mp}(G_j^i)]$ then $\overline{g}_j^i(\text{new}) = \mathrm{mp}(G_j^i)$
If $x_j \in [\mathrm{mp}(G_j^i), \, \overline{g}_j^i]$ then $\overline{g}_j^i(\text{new}) = \overline{x}_j$
If $x_j \in [\overline{g}_j^i, \, \mathrm{mp}(G_j^i) + \frac{\rho}{2}]$ then $\overline{g}_j^i(\text{new}) = \overline{x}_j$
If $x_j \in [\overline{\overline{g}}_j^i, \, \mathrm{mp}(G_j^i) + \frac{\rho}{2}]$ then $\overline{\overline{g}}_j^i(\text{new}) = \overline{\overline{x}}_j$
The first and last rules imply support expansion; the second and seventh rules, core expansion; and the remaining cases, core contraction. Notice that these adaptation procedures are similar to those of Section 5.4.4 for the FBeM approach (refer to Fig. 5.2 for examples).
Operations on the core parameters, $\underline{g}_j^i$ and $\overline{g}_j^i$, require adjustment of the midpoint of the respective fuzzy set:

\[
\mathrm{mp}(G_j^i)(\text{new}) = \frac{\underline{g}_j^i(\text{new}) + \overline{g}_j^i(\text{new})}{2}. \tag{6.17}
\]

As a result, support contraction may happen on two occasions:

If $\mathrm{mp}(G_j^i)(\text{new}) - \frac{\rho}{2} > \underline{\underline{g}}_j^i$ then $\underline{\underline{g}}_j^i(\text{new}) = \mathrm{mp}(G_j^i)(\text{new}) - \frac{\rho}{2}$
If $\mathrm{mp}(G_j^i)(\text{new}) + \frac{\rho}{2} < \overline{\overline{g}}_j^i$ then $\overline{\overline{g}}_j^i(\text{new}) = \mathrm{mp}(G_j^i)(\text{new}) + \frac{\rho}{2}$
The adaptation of the consequent fuzzy sets $\Gamma^i$ is done similarly using the output data $y$. Coefficients $a_j^i$ of the local functions $p^i$ are updated using the recursive least squares algorithm detailed in Appendix B.
6.5.5 Incremental Weighting
Aggregation layer weights $w_j^i \in [0, 1]$ embody the importance of the membership degree of the $j$-th attribute in fuzzy set $G_j^i$ to the neural network output. If $w_j^i = 1$, then the output is not affected. A relatively lower value of $w_j^i$ discounts the impact of the respective attribute. If $w_j^i = 0$, then the attribute is ignored. The procedure described below assigns lower weight values to less helpful attributes.

Whenever a new granule $\gamma^{c+1}$ is created, the learning procedure sets $w_j^{c+1} = 1$, $j = 1, \ldots, n$. If it is known a priori that the input variables have different importances, then the values of $w_j^{c+1}$ can be chosen differently to reflect domain knowledge.
Taking into account the similarity measure (6.12) and the approximation error (B.4), the weights $w_j^i$ corresponding to the most active granule $\gamma^i$, where $i = \arg\max(o^1, \ldots, o^c)$, are recursively updated using:

\[
w_j^i(\text{new}) = w_j^i(\text{old}) - \tilde{x}_j^i \, o^i \, |\epsilon|. \tag{6.18}
\]
The idea here is that the single-valued approximation $\hat{y}$ is more strongly affected by the more active granules and attributes. Equation (6.18) ascribes to the $j$-th attribute of $G^i$ a proportion of the approximation error.

Incremental weighting looks for relevant subsets of input variables. The procedure (6.18) is particularly simple and fast to compute. More elaborate approaches for incremental weighting are addressed in (98).
6.5.6 Pruning
Pruning granules simplifies the neural network structure and keeps it flexible to
track dynamic behavior. We opt to prune the most inactive granules because
retaining a small number of highly active granules favors compactness and speed.
Output layer weights $\delta^i \in [0,1]$ help pruning by encoding the amount of data assigned to granule $\gamma^i$. Learning starts with $\delta^i = 1$. During the next steps, $\delta^i$ is reduced whenever $\gamma^i$ is not activated within $h_r$ steps, as follows:

$\delta^i(\mathrm{new}) = \zeta \, \delta^i(\mathrm{old})$,  (6.19)

where $\zeta \in [0,1]$. Otherwise, if $\gamma^i$ is activated at least once within $h_r$ steps, then $\delta^i$ is increased:

$\delta^i(\mathrm{new}) = \delta^i(\mathrm{old}) + \zeta \, (1 - \delta^i(\mathrm{old}))$.  (6.20)

If the value of $\delta^i$ falls below a threshold $\vartheta$, then granule $\gamma^i$, its respective neuron $A^i$, and connections are pruned, since they do not affect system accuracy significantly. If the application requires memorization of rare events, or cyclical behavior is envisioned, then it may be the case to set $\vartheta = 0$ and let $\delta^i \to 0^+$. In this case, the granule is kept in the network structure.
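The pruning bookkeeping of (6.19)-(6.20) can be sketched as follows; the flat list representation and all names are illustrative.
———————————————————————
# Sketch of the pruning bookkeeping in (6.19)-(6.20): `delta` holds one value
# per granule, `active` flags which granules fired at least once during the
# last h_r steps. Names are illustrative.

def update_deltas(delta, active, zeta=0.5):
    for i in range(len(delta)):
        if active[i]:
            delta[i] = delta[i] + zeta * (1.0 - delta[i])   # reward, eq. (6.20)
        else:
            delta[i] = zeta * delta[i]                      # decay, eq. (6.19)
    return delta

def prunable(delta, theta=0.5):
    """Indices of granules whose delta fell below the threshold."""
    return [i for i, d in enumerate(delta) if d < theta]

print(prunable(update_deltas([1.0, 0.6], [True, False])))   # -> [1]
———————————————————————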
6.5.7 Combining Granules
Relationships between granules may be strong enough to justify forming a larger granule that inherits the information of the lower-level granules. A metric to measure the distance between trapezoidal objects, say granules $\gamma^{i1}$ and $\gamma^{i2}$, is:

$D(\gamma^{i1}, \gamma^{i2}) = \dfrac{1}{4(n+1)} \Big( \sum_{j=1}^{n} \big( |\underline{\underline{g}}_j^{i1} - \underline{\underline{g}}_j^{i2}| + |\underline{g}_j^{i1} - \underline{g}_j^{i2}| + |\overline{g}_j^{i1} - \overline{g}_j^{i2}| + |\overline{\overline{g}}_j^{i1} - \overline{\overline{g}}_j^{i2}| \big) + |\underline{\underline{u}}^{i1} - \underline{\underline{u}}^{i2}| + |\underline{u}^{i1} - \underline{u}^{i2}| + |\overline{u}^{i1} - \overline{u}^{i2}| + |\overline{\overline{u}}^{i1} - \overline{\overline{u}}^{i2}| \Big)$.  (6.21)
$D$ is a distance measure since it satisfies

$D(\gamma^{i1}, \gamma^{i2}) \geq 0$
$D(\gamma^{i1}, \gamma^{i2}) = 0$ if and only if $\gamma^{i1} = \gamma^{i2}$
$D(\gamma^{i1}, \gamma^{i2}) = D(\gamma^{i2}, \gamma^{i1})$
$D(\gamma^{i1}, \gamma^{i3}) \leq D(\gamma^{i1}, \gamma^{i2}) + D(\gamma^{i2}, \gamma^{i3})$

for any $\gamma^{i1}$, $\gamma^{i2}$ and $\gamma^{i3}$. Distance $D$ is fast to compute and more accurate than both the distance between midpoints of trapezoids and the distance between their closest points: a change in any parameter of the underlying trapezoids is reflected in the value of $D$.
Granules are combined after $h_r$ steps considering the lowest value of $D(\gamma^{i1}, \gamma^{i2})$, $i1, i2 = 1, \ldots, c$, $i1 \neq i2$, and a decision criterion. For instance, the decision criterion may check whether the new granule obeys the maximum width allowed, $\rho$.

A new granule $\gamma^i$, a coarsening of $\gamma^{i1}$ and $\gamma^{i2}$, is formed by trapezoidal membership functions $G_j^i$ as follows:

$G_j^i = ch(G_j^{i1}, G_j^{i2}), \; j = 1, \ldots, n$,  (6.22)

where $ch$ denotes the convex hull. $\Gamma^i$ is obtained similarly. The new granule $\gamma^i$ encloses the support and core of the combined granules.
The coefficients of the new local function $p^i$ are found as:

$a_j^i = \dfrac{1}{2} \big( a_j^{i1} + a_j^{i2} \big), \; j = 0, \ldots, n$.  (6.23)
Combining granules avoids redundancy by eliminating similar rules from the rule base. Reference (96) has emphasized the importance of a compact rule base in evolving fuzzy systems.
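A sketch of the distance (6.21) and the coarsening (6.22)-(6.23) follows, for granules represented as lists of 4-tuple trapezoids plus one output trapezoid; this representation is an assumption made for illustration.
———————————————————————
# Sketch of the trapezoid distance (6.21) and convex-hull coarsening
# (6.22)-(6.23). Names are illustrative.

def trapezoid_gap(a, b):
    return sum(abs(pa - pb) for pa, pb in zip(a, b))

def distance(g1_ants, g1_out, g2_ants, g2_out):
    """Distance D between two granules with n antecedent trapezoids each."""
    n = len(g1_ants)
    total = sum(trapezoid_gap(a, b) for a, b in zip(g1_ants, g2_ants))
    total += trapezoid_gap(g1_out, g2_out)
    return total / (4.0 * (n + 1))

def convex_hull(a, b):
    """Smallest trapezoid enclosing the supports and cores of a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def combine(g1_ants, g2_ants, a1, a2):
    ants = [convex_hull(x, y) for x, y in zip(g1_ants, g2_ants)]
    coeffs = [(c1 + c2) / 2.0 for c1, c2 in zip(a1, a2)]   # eq. (6.23)
    return ants, coeffs
———————————————————————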
6.5.8 Learning Algorithm
The learning procedure to evolve granular neural networks can be summarized
by the following algorithm:
———————————————————————
BEGIN
  Select a type of neuron for the aggregation and output layers;
  Set parameters $\rho$, $h_r$, $\eta$, $\zeta$, $\vartheta$; set $c = 0$;
  Read $(x, y)^{[h]}$, $h = 1$;
  Create granule $\gamma^{c+1}$, neurons $C^{c+1}$, $C^f$, and respective connections;
  For $h = 2, \ldots$ do
    Read $(x, y)^{[h]}$;
    Input $x^{[h]}$ to the network;
    Compute compatibility degrees $(o^1, \ldots, o^c)$;
    Aggregate values using $C^f$ to get the single-valued approximation $\hat{y}^{[h]}$;
    Compute the convex hull of the $\Gamma^i$ with $o^i > 0$;
    Find the granular approximation $(\underline{\underline{u}}, \underline{u}, \overline{u}, \overline{\overline{u}})$;
    Compute the output error $\epsilon^{[h]} = mp(y^{[h]}) - \hat{y}^{[h]}$;
    If $x^{[h]}$ is not within the expansion regions $E^i$ $\forall i$
      Create granule $\gamma^{c+1}$, neuron $C^{c+1}$ and connections;
    Else
      Update the most active granule $\gamma^i$, $i = \arg\max(o^1, \ldots, o^c)$;
      Update local function parameters $a_j^i$ using RLS;
      Update connection weights $w_j^i$ $\forall j, i$;
    If $h = \alpha h_r$, $\alpha = 1, 2, \ldots$
      Combine granules when feasible;
      Update model granularity $\rho$;
      Adapt connection weights $\delta^i$ $\forall i$;
      Prune inactive granules and respective connections;
END
———————————————————————
6.6 Summary
This chapter has introduced a fuzzy data stream modeling framework based on an evolving fuzzy granular neural network approach. The eGNN framework processes fuzzy data streams using fuzzy granular models, fuzzy aggregation neurons, and an online incremental learning algorithm. Its neurofuzzy structure encodes a set of fuzzy rules and a fuzzy inference system that establishes a tradeoff between precision and interpretability by combining functional and linguistic fuzzy models. eGNN provides single-valued as well as granular approximations of functions.
Chapter 7
Application Examples
The application examples addressed in this chapter consider numeric, interval, and fuzzy data streams to demonstrate the usefulness of evolving granular approaches. We aim for low error rates, concise constructs, high processing speed, and meaningful, understandable rules in semi-supervised classification, function approximation, time series prediction, and control problems.
7.1 Introduction
The experimental work described in this chapter is based on data sets that have already been collected for some purpose. Information extraction and knowledge discovery are therefore based on simulations of singular and granular data streams in an online environment. All experiments require evolving granular systems to deal with data they have never seen before and that demand a prompt response before being used for model adaptation. The following assumptions hold true:
- online approaches start learning from scratch, unless otherwise stated;
- previous data are neither stored nor retrieved (space constraint);
- the data streams, no matter whether single-valued or granular, dictate the granularity of the underlying models;
- no missing values are found in the original data sets;
- the data sampling frequency is constant across the different scenarios;
- the per-sample latency of the algorithms is no larger than the time interval between samples (time constraint).
Ideally, online modeling methods, such as IBeM, FBeM and eGNN, should re-
tain all previous relevant knowledge and rely on the newest input data to perform
classification, prediction, approximation or control.
7.2 Semi-Supervised Classification
Semi-supervised learning methods use both labeled and unlabeled data to build
pattern classification systems. Mixtures of labeled and unlabeled data are easily
found in practice (47) (102) (111) (158). Often, the acquisition of labeled data
requires human experts to manually classify training instances. Manual classification can be greatly influenced by subjectivity, and it may be infeasible when handling large data sets in an online environment. There are situations in which instances are labeled and apparently call for fully supervised learning methods and standard procedures of classifier design. However, the labeling process may have been unreliable, so that our confidence in the labels already assigned is relatively low (114). In these cases we resort to semi-supervised learning and accept only the fraction of instances that we deem to have been labeled correctly.
Let an input-output pair $(x, y)$ be related through $y = f(x)$. We seek an approximation to $f$ that allows us to predict the value of $y$ given $x$. In classification problems, $y$ is a class label, a value in the set $\{C_1, \ldots, C_m\} \subset \mathbb{N}$, and the relation $f$ specifies class boundaries. In the more general, semi-supervised case, $C_k$ may or may not be known when $x$ arrives. Classification of data streams involves pairs $(x, C)^{[h]}$ of time-sequenced data indexed by $h$. Nonstationarity requires evolving granular classifiers to identify time-varying relations $f^{[h]}$.
The experiments described next aim to demonstrate the ability of evolving granular methods to classify unbalanced, single-valued, partially-supervised streaming data subject to gradual and abrupt concept changes.
7.2.1 Rotation of Twin Gaussians
Gradual change is evaluated with a two-attribute classification problem in which two partially overlapping Gaussians rotate anti-clockwise around a central point, as shown in Fig. 7.1. The Gaussians are initially centered at (4,4) and (6,6) with standard deviation fixed at 0.8. The kinematics of their movement around the point (5,5) is as follows:

$\theta^{[h]} = \theta^{[h-1]} + \phi$  (7.1)
$x_1^{[h]} = 5 + 2\cos(\theta^{[h]})$  (7.2)
$x_2^{[h]} = 5 + 2\sin(\theta^{[h]})$  (7.3)
The initial reference angle $\theta^{[0]}$ is 225° for Class 1 and 45° for Class 2. For $h = 1, \ldots, 200$, the rotation rate $\phi$ is kept equal to 0, meaning that no drift is present. The rotation starts at $h = 201$ and, for $h = 201, \ldots, 400$, $\phi = 0.45$°. The final positions of the Gaussians are (6,4) and (4,6), respectively, for classes 1 and 2; the total rotation angle is 90°. Samples from both classes arrive randomly and sequentially.
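For reproducibility, a minimal generator of this data stream, following (7.1)-(7.3), is sketched below; angles are handled in degrees and the random sampling policy is an illustrative assumption.
———————————————————————
# Illustrative generator for the rotating twin-Gaussians stream.
import math
import random

def rotating_gaussians(n_steps=400, sigma=0.8, seed=0):
    rng = random.Random(seed)
    theta = {1: 225.0, 2: 45.0}                 # initial reference angles (deg)
    for h in range(1, n_steps + 1):
        if h > 200:                             # rotation starts at h = 201
            theta[1] += 0.45
            theta[2] += 0.45
        label = rng.choice([1, 2])              # classes arrive at random
        cx = 5 + 2 * math.cos(math.radians(theta[label]))   # eq. (7.2)
        cy = 5 + 2 * math.sin(math.radians(theta[label]))   # eq. (7.3)
        yield (rng.gauss(cx, sigma), rng.gauss(cy, sigma)), label
———————————————————————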
Figure 7.1: The rotating Gaussians problem
Assume classes 1 and 2 in Fig. 7.1 correspond to the positive and negative
classes, respectively. Consider a confusion matrix consisting of two rows and two
columns representing the number of true positives (TP), false positives (FP), true
negatives (TN), and false negatives (FN) as in the receiver operating characteristic
(ROC) method (45). The prediction accuracy of a classifier can be defined as

$\mathrm{Acc} = \dfrac{TP + TN}{TP + FP + TN + FN} \cdot 100\%$.  (7.4)
This metric is usually employed for balanced data sets. The ROC method provides a convenient way to evaluate the quality of evolving classifiers in unknown nonstationary environments because it is insensitive to changes in both the class distribution and the proportion of samples per class (45).
The ROC space is defined over the TP ratio and the FP ratio,

$\mathrm{TP\ ratio} = \dfrac{TP}{TP + FN}$  (7.5)

$\mathrm{FP\ ratio} = \dfrac{FP}{FP + TN}$.  (7.6)
For each class, the ROC method applies threshold values across the interval [0,1] to the outputs. Each cut-off threshold corresponds to a point (a sensitivity/specificity pair) in the ROC space. The closer the ROC curve is to the upper left corner, the better the classification. A test with perfect discrimination (no overlap between the two distributions) has a ROC curve that passes through the upper left corner.
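The quantities (7.4)-(7.6) are straightforward to compute from a confusion matrix, as the following illustrative helper shows.
———————————————————————
# Confusion-matrix summaries used in (7.4)-(7.6); names are illustrative.

def confusion_summaries(tp, fp, tn, fn):
    acc = 100.0 * (tp + tn) / (tp + fp + tn + fn)    # eq. (7.4)
    tp_ratio = tp / (tp + fn)                        # eq. (7.5)
    fp_ratio = fp / (fp + tn)                        # eq. (7.6)
    return acc, tp_ratio, fp_ratio

# Example: 90 TP, 10 FP, 85 TN, 15 FN ->
# Acc = 87.5%, TP ratio ~ 0.857, FP ratio ~ 0.105.
print(confusion_summaries(90, 10, 85, 15))
———————————————————————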
We look for the decision boundary between the Gaussians using the newest input data. The starting IBeM parameters are $\rho = 0.35$, $h_r = 40$ and $\eta = 2$; FBeM starts with the same parameters as IBeM, with the half-life set equal to $h_r$; and eGNN employs $C^i = T_{min}$, $C^f = S_{max}$, $\rho = 0.42$, $h_r = 40$, $\eta = 2$ and $\zeta = \vartheta = 0.5$. These values have worked well for a range of classification problems. The functional consequents of IBeM, FBeM and eGNN can be neglected in classification tasks.
To emphasize the importance of incremental learning in nonstationary data stream classification, we first compare evolving and non-evolving approaches on the rotating Gaussians problem. We consider widely known non-evolving methods, viz. a multi-layer perceptron (MLP) neural network trained offline via gradient descent (55) and a fuzzy C-means (FCM) clustering method (22). Table 7.1 summarizes the results averaged over 5 runs for each method.
Table 7.1: Rotating Gaussians: comparing evolving/non-evolving methods

Model   # Avg. Rules   Acc (%)   CPU*
MLP     -              57.0      41.0
FCM     10.0           60.5      0.8
IBeM    2.8            87.4      0.9
FBeM    3.4            92.3      0.4
eGNN    4.3            92.1      0.3

* Average CPU time per sample in milliseconds
Table 7.1 shows that FBeM is the most accurate approach in this example, producing an Acc index slightly superior to that of eGNN. The performance of the non-evolving methods degrades when the concept changes in online drifting scenarios because the structure and parameters of the underlying models are fixed: MLP and FCM could not track the rotation of the Gaussians, leading to relatively worse results. We also consider for performance evaluation the average number of rules in the model structure over the learning steps, and the CPU time on a dual-core 2.54GHz processor with 4GB of RAM. IBeM provided the most compact model, with an average of 2.8 rules during the learning process, whereas eGNN was the fastest method, processing each sample in 0.3 milliseconds.
Figure 7.2 shows the ROC curves produced by the evolving and non-evolving
methods. The results obtained were essentially the same in the different runs.
The diagonal line corresponds to random guessing, e.g., coin flipping.
Note in Fig. 7.2 that the area under the ROC curves of the evolving granular approaches is larger than that of the non-evolving approaches. The ROC analysis confirms that FBeM is slightly superior to eGNN and IBeM in this classification problem, no matter whether the Gaussian distribution is changed to any other distribution or the dataset is unbalanced. The area above the FBeM ROC curve reflects in part the 7.7% classification error and in part the overlap between granules with different assigned labels.
Figure 7.3 shows an example of the eGNN decision boundary (the overall best of all experiments) for a 0.5 ROC cut-off threshold applied to the outputs. At $h = 200$, eGNN has 5 granules in its structure, two associated with Class 1 and
Figure 7.2: ROC curves of different methods for the rotating Gaussians
three with Class 2. It attained a 94.5% Acc classification rate. After the rotation, that is, after $h = 400$, eGNN employs 5 granules in its structure, three for Class 1 and two for Class 2. It achieves a 97.5% Acc recognition performance.
As an example, a highly active eGNN rule at $h = 400$ is:
————————————————————————————–
$R^4$: IF ($x_1$ is [3.1774, 4.0022, 4.605, 4.8683] AND
  $x_2$ is [4.9950, 5.1767, 5.4811, 6.9495])
THEN $\hat{y}$ is Class 1
————————————————————————————–
Linguistically, assume that five partitions of the input variables are described by the adjectives 'very low', 'low', 'medium', 'high' and 'very high', and consider, for example, the problem of classifying wines produced with grapes from different vineyards. Rule $R^4$ can then be read: if the 'concentration of flavonoids' ($x_1$) is 'medium' and the 'color intensity' ($x_2$) is 'high', then the wine was produced by 'vineyard number 1' ($\hat{y}$).
(a) Snapshot at h= 200
(b) Snapshot at h= 400
Figure 7.3: eGNN decision boundary and last 200 data at particular time steps
7.2.2 New Class
A second experiment concerns an abrupt change: a new class appearing in the data stream. We introduce a new Gaussian class centered at (7,3) with dispersion 0.8 at $h = 200$, as shown in Fig. 7.4. Evolving granular methods should learn the previously unknown class as soon as related information appears in the data stream.
Table 7.2 shows the results of the evolving methods averaged over 5 independent runs. Non-evolving and even parametric adaptive methods are unable to discover
Figure 7.4: A third class appears at h= 200 and remains
new classes in data streams without redesigning and retraining the classifier from scratch; hence, they are inappropriate for this problem.
Table 7.2: New class problem: comparing evolving granular methods

Model   # Avg. Rules   Acc (%)   CPU*
IBeM    3.1            83.8      0.7
FBeM    3.4            89.5      0.4
eGNN    4.5            88.1      0.3

* Average CPU time per sample in milliseconds
We note from Table 7.2 that FBeM is the most accurate method, giving a marginally better Acc index than eGNN. IBeM provided the most concise classifier; eGNN was the fastest method. Figure 7.5 illustrates the evolution of the Acc index, the number of rules, and the granularity for FBeM. The results for the remaining methods are essentially the same.
We observe in Fig. 7.5 that the accuracy of the FBeM classifier is kept at a similar level after the concept shift at $h = 200$: the Acc index dropped from 90.05% to 89.00%, which is quite acceptable. The robustness of the FBeM system to nonstationarities, as shown in the figure, is typical of evolving granular systems in view of their structural and parametric flexibility.
Figure 7.6 depicts the eGNN decision boundaries and the last 200 instances at $h = 200$ and $h = 400$. The neural network evolved a total of 6
Figure 7.5: FBeM evolution of the Acc index, rule base and granularity for the
new-class problem
granules during the first 200 steps, three associated with each of the first two classes. At this point, the eGNN Acc rate was 94.5%. Data about the third class started to arrive at $h = 200$, and at $h = 400$ eGNN had developed 8 granules: three assigned to Class 1, two to Class 2, and three to Class 3. Assuming classes 2 and 3 as negative classes, eGNN reached a 92.5% Acc classification rate, the overall best accuracy of all experiments conducted.
7.2.3 Combining Labeled and Unlabeled Data
We analyze the behavior of granular approaches in semi-supervised online classification. Partially supervised learning methods combine labeled and unlabeled data for training; such mixtures are frequently found in practice (47) (102) (158). Our approach to hybrid clustering and classification is: if an unlabeled sample causes the creation of a granule, then the class of the granule remains undefined until a labeled sample falls within its bounds. The class label of that sample tags the granule. Conversely, if an unlabeled sample rests within the
(a) Snapshot at h= 200
(b) Snapshot at h= 400
Figure 7.6: eGNN decision boundaries for the 3-class problem
bounds of an existing granule whose label is known, it borrows the granule label.
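A minimal sketch of this labeling policy follows, using crisp axis-aligned boxes as stand-ins for the fuzzy granules; the Box class, its fixed width, and all names are illustrative assumptions.
———————————————————————
# Sketch of the hybrid clustering/classification policy described above.

class Box:
    def __init__(self, x, label, rho=0.35):
        self.lo = [v - rho / 2 for v in x]
        self.hi = [v + rho / 2 for v in x]
        self.label = label                   # None if created by unlabeled data

    def encloses(self, x):
        return all(l <= v <= h for l, v, h in zip(self.lo, x, self.hi))

def process_sample(granules, x, label=None):
    g = next((g for g in granules if g.encloses(x)), None)
    if g is None:
        g = Box(x, label)                    # new granule, possibly unlabeled
        granules.append(g)
    elif g.label is None and label is not None:
        g.label = label                      # first labeled sample tags the granule
    return g.label                           # predicted class (None if unknown)

granules = []
process_sample(granules, (0.5, 0.5))           # unlabeled -> unlabeled granule
process_sample(granules, (0.52, 0.48), 1)      # labeled sample tags it
print(process_sample(granules, (0.49, 0.51)))  # -> 1
———————————————————————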
We propose changing the proportion of unlabeled data from 0% to 100% in the twin (rotating and non-rotating) Gaussians problems and in the new-arising-class problem, so that the whole spectrum of semi-supervised learning possibilities can be evaluated. Non-rotating and rotating Gaussians generate stationary and gradually-changing data streams, respectively; a new class represents a sudden, abrupt shift.
Granularities $\rho$ were chosen in the range [0.3, 0.45] to keep the number of granules from exceeding 5 and to emphasize the semantic aspect of the resulting constructs. The remaining parameters are the same as in the previous sections and are repeated here for convenience: IBeM and FBeM use $h_r = 40$ (with FBeM's half-life equal to $h_r$) and $\eta = 2$; eGNN employs $C^i = T_{min}$, $C^f = S_{max}$, $h_r = 40$, $\eta = 2$ and $\zeta = \vartheta = 0.5$. Figure 7.7 illustrates the performance of the evolving granular methods averaged over 5 runs for each condition.
Figure 7.7 shows that evolving granular methods benefit from all the information contained in the data stream, including that from unlabeled samples (input domain information), to perform classification. Conventional and evolving classifiers that operate on a purely supervised basis, simply discarding unlabeled data, cannot deal with small fractions of labeled samples, as in the situations on the right side of the graphs. Note that the left and right extremes of the plots indicate total supervision and no supervision, respectively; in both cases the final result is a partition of the data into classes. Contrasting Fig. 7.7(a) with Figs. 7.7(b)-(c), none of the classifiers, IBeM, FBeM and eGNN, is significantly affected by concept drift and shift. The generalization from the sharp boundaries of IBeM to the fuzzy boundaries of FBeM and eGNN is particularly decisive for the precision of the models in these classification applications. FBeM and eGNN alternated as the most efficient approach. When the environment is stationary or the process is not overly complex, as in the non-rotating Gaussians problem, the high learning capability and structural plasticity of eGNN seem unnecessary, and FBeM has shown to be superior. Conversely, as the complexity of the problem increases with nonstationarities and mixtures of labeled and unlabeled data, the eGNN modeling approach has proved equivalent or superior to FBeM.
7.3 Time Series Prediction
Observing past outcomes of a system to estimate its future behavior is the essence
of forecasting and prediction (28) (51). When a complete mathematical model of
a system can be developed and the corresponding initial conditions are known,
prediction is an easy task. However, when no mathematical model is available or
only partially known models are feasible, an alternative to forecasting is to build
(a) Non-rotating Gaussians
(b) Rotating Gaussians
(c) New class
Figure 7.7: Performance of evolving granular classifiers using different proportions
of unlabeled data
models that consider current and past outcomes of the system while neglecting any external inputs. This is a "look at what it does, not why" approach (39).

Time series prediction is based on the idea that the series carries the potential information needed to predict its future behavior. Analyzing data produced by actual phenomena can give good insights into the phenomena themselves and knowledge about the laws underlying the data.
Forward prediction of a discrete time series can be defined as follows. Given a finite sequence $x^{[h]}, x^{[h-1]}, \ldots, x^{[h-M]}$, find the continuation $x^{[h+1]}, x^{[h+2]}, \ldots$ This involves finding a scalar $M$ and a function $f$ such that the value $x^{[h+1]}$ can be estimated by:

$\hat{x}^{[h+1]} = f(x^{[h]}, x^{[h-1]}, \ldots, x^{[h-M]})$.  (7.7)

This is equivalent to modeling the time series as

$x^{[h+1]} = f(x^{[h]}, x^{[h-1]}, \ldots, x^{[h-M]}) + \psi^{[h+1]}$,  (7.8)
with $\psi^{[h+1]}$ being a white noise process. If the statistics of the time series are non-Gaussian or the time series is the result of some nonlinear operation, the function $f$ is nonlinear; $f$ is the function we aim to model using evolving granular systems.
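The setup in (7.7) amounts to building lagged input vectors paired with the next observation, as in the following illustrative sketch.
———————————————————————
# Minimal sketch of the one-step-ahead setup in (7.7): build lagged input
# vectors from a series and pair each with the next value as the target.
# M is the number of past observations used; names are illustrative.

def lagged_pairs(series, M):
    """Yield ((x[h-M], ..., x[h]), x[h+1]) pairs from a list of values."""
    for h in range(M, len(series) - 1):
        yield tuple(series[h - M:h + 1]), series[h + 1]

# Example: with series [1, 2, 3, 4, 5] and M = 2,
# the pairs are ((1, 2, 3), 4) and ((2, 3, 4), 5).
print(list(lagged_pairs([1, 2, 3, 4, 5], 2)))
———————————————————————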
This section considers interval and fuzzy granular data streams derived from the monthly mean, minimum, and maximum temperatures of weather time series from geographic regions with different climatic patterns. The aim is to predict the monthly temperatures for all regions.
7.3.1 Weather Prediction
Weather predictions help people plan activities and protect property, and assist decision making in many sectors such as energy, transportation, aviation, agriculture, and inventory planning. Any system that is sensitive to the state of the atmosphere may benefit from weather forecasts.
Monthly temperature data carry a degree of uncertainty due to imprecision of atmospheric measurements, instrument malfunction, transcription errors, and different standards in acquiring and pre-processing the collected data. Usually temperature data are numerical, but the processes that originate and supply the data are imprecise. Temperature estimates at finer time granularities (days, weeks) are commonly demanded. Evolving granular approaches provide guaranteed granular predictions of the time series in these cases. How satisfactory a granular prediction is depends on the compactness of the prediction model. Granular predictions together with single-valued predictions are important because they convey both a value and a range of possible temperature values.
In the experiments we translate the minimum, mean, and maximum monthly temperatures into triangular fuzzy numbers. Numerically-driven modeling approaches use the mean monthly temperatures only; interval approaches consider the minimum and maximum temperatures. The data were linearly scaled to the range [0,1]. We use data from different weather stations, as summarized in Table 7.3 (data available at http://eca.knmi.nl and http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn.html).
Table 7.3: Monthly temperature values

Station        # Samples   From       To         Std. Dev.
Bucharest      960         Jan 1930   Dec 2010   0.1795
Death Valley   1308        Jan 1901   Dec 2009   0.1835
Helsinki       1680        Jan 1871   Dec 2010   0.1842
Lisbon         1200        Jan 1910   Dec 2009   0.1556
Ottawa         1380        Jan 1895   Dec 2009   0.1790
As shown in Table 7.3, we consider five weather stations. In Death Valley (Furnace Creek), super-heated moving air masses are trapped in the valley by the surrounding steep mountain ranges, creating an extremely dry climate with high temperatures. Refer to (123) for a complete list of factors that produce high air temperatures in Death Valley. Conversely, Ottawa is one of the coldest capitals in the world: a wide range of temperatures can be observed during the year, but the winters are very cold and snowy. Lisbon experiences more usual weather patterns; summers are warm, sometimes hot, whereas winters are mild and moist. Helsinki and Bucharest are further weather stations considered for evaluation. Bucharest has a continental climate owing to its distance from the open sea: summers are generally hot while winters are quite cold. Helsinki combines characteristics of maritime and continental climates; the proximity of the Arctic Ocean and the North Atlantic creates cold weather, while the Gulf Stream conveys warm air.
During the computational experiments described subsequently, IBeM, FBeM and eGNN scan the data only once to build their structure and adapt parameters. This simulates online data stream processing. Testing and training are performed concomitantly on a per-sample basis. The performance of the algorithms is evaluated
using the root mean square error of singular predictions,

$RMSE = \sqrt{\dfrac{1}{H} \sum_{h=1}^{H} \big( mp(y)^{[h]} - \hat{y}^{[h]} \big)^2}$,  (7.9)

the non-dimensional error index,

$NDEI = \dfrac{RMSE}{\mathrm{std}\big( mp(y)^{[h]} \; \forall h \big)}$,  (7.10)

the average number of rules in the model structure, and the per-sample CPU time. The computer has a dual-core 2.54GHz processor with 4GB of RAM.
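Both indices are simple to compute from the stream of targets and predictions, as the following sketch illustrates; the midpoints $mp(y)$ are assumed to be available as plain numbers.
———————————————————————
# Error indices (7.9)-(7.10) for singular predictions; `targets` holds the
# midpoints mp(y) of the (possibly granular) actual outputs. Illustrative.
import math

def rmse(targets, preds):
    H = len(targets)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, preds)) / H)

def ndei(targets, preds):
    mean = sum(targets) / len(targets)
    std = math.sqrt(sum((t - mean) ** 2 for t in targets) / len(targets))
    return rmse(targets, preds) / std

print(ndei([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))  # ~0.141
———————————————————————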
7.3.2 Performance Analysis
Different computational intelligence methods were chosen for performance assessment: the multilayer perceptron neural network (MLP) (55), evolving Takagi-Sugeno (eTS) (6), extended Takagi-Sugeno (xTS) (7), the dynamic evolving neuro-fuzzy inference system (DENFIS) (64); and IBeM, FBeM and eGNN.
The task of the different methods is to provide a one-step-ahead forecast of the monthly temperature $y^{[h+1]}$ using the last 12 observations, $x^{[h-11]}, \ldots, x^{[h]}$. The number of previous observations was chosen by trial and error to provide relatively accurate predictions. Online methods employ the sample-by-sample testing-before-training approach as follows. First, an estimate $\hat{y}^{[h+1]}$ is derived for a given input $(x^{[h-11]}, \ldots, x^{[h]})$. One time step later, the actual value $y^{[h+1]}$ becomes available and model adaptation is performed if necessary. In general, models should capture the trend and seasonal components of the time series, but not the random noise component. Because the observed data contain random noise and irregular patterns, models that do not overfit them produce better generalizations and predictions of future values. Table 7.4 summarizes the forecasting results for the Bucharest, Death Valley, Helsinki, Lisbon, and Ottawa monthly temperature data. IBeM starts with $\rho = 0.6$, $h_r = 84$, $\eta = 2$; FBeM uses $\rho = 0.7$, $h_r = 48$ (with the half-life set equal to $h_r$), $\eta = 2$; and eGNN uses $C^i = T_{min}$, $C^f = M$, $\rho = 0.45$, $h_r = 84$, $\eta = 2$ and $\zeta = \vartheta = 0.5$.
Table 7.4 shows that eGNN gives the most precise forecasts in 3 of the 5 temperature data sets, seconded by eTS and FBeM with one each. The eGNN structures are, on average, the most parsimonious. Alternative evolving approaches such as DENFIS, eTS and xTS use numeric data, namely the mean temperature. In contrast, granular approaches such as eGNN, IBeM and FBeM take into account the mean and its neighboring data to bound the forecasts. The trend component of the time series is handled in granular systems by procedures that gradually adapt granules and rules. The seasonal component is captured through different granules, which represent different seasons and transitions between seasons. Since the content of a granule carries seasonal information, its corresponding rule tends to be activated in the corresponding months. IBeM and xTS are the fastest among the algorithms evaluated in this section.
We also notice in Table 7.4 that the MLP neural network behaved well on all temperature time series. Our hypothesis is that the temperature time series recorded by the weather stations have not changed very much during the period considered. In general, offline methods such as the MLP cannot deal with nonstationary functions, do not support one-pass training, and require more CPU time and memory than online methods. Moreover, the MLP neural network does not provide comprehensible models to support data description and interpretation.
As examples, the one-step singular and granular forecasts of FBeM for the Death Valley time series and of eGNN for the Helsinki time series are shown in Figs. 7.8 and 7.9. The additional plots in both figures show the granularity, error indices, and number of rules developed.
Table 7.4: Temperature forecasts

Station        Method   # Avg. Rules   RMSE     NDEI     CPU*
Bucharest      DENFIS   5.00           0.0800   0.4457   4.7
               eGNN     3.80           0.0594   0.3309   1.6
               eTS      3.00           0.0598   0.3331   1.1
               FBeM     7.57           0.0603   0.3359   1.1
               IBeM     5.88           0.0643   0.3582   1.0
               MLP      -              0.0892   0.4969   35.5
               xTS      10.00          0.0643   0.3582   1.0
Death Valley   DENFIS   8.00           0.0600   0.3270   4.7
               eGNN     3.91           0.0498   0.2714   1.6
               eTS      3.00           0.0491   0.2676   1.0
               FBeM     8.00           0.0506   0.2757   1.1
               IBeM     8.79           0.0541   0.2948   1.0
               MLP      -              0.0584   0.3183   44.2
               xTS      10.00          0.0503   0.2741   1.1
Helsinki       DENFIS   24.00          0.0780   0.4235   5.7
               eGNN     2.78           0.0607   0.3295   1.6
               eTS      4.00           0.0634   0.3442   1.4
               FBeM     6.00           0.0602   0.3268   1.1
               IBeM     10.38          0.0764   0.4148   1.2
               MLP      -              0.0892   0.4843   35.5
               xTS      16.00          0.0651   0.3534   1.1
Lisbon         DENFIS   12.00          0.0880   0.5656   5.2
               eGNN     2.77           0.0577   0.3708   1.7
               eTS      4.00           0.0714   0.4589   2.3
               FBeM     5.63           0.0599   0.3850   1.2
               IBeM     3.59           0.0687   0.4415   1.0
               MLP      -              0.0955   0.6138   48.2
               xTS      11.00          0.0744   0.4781   1.0
Ottawa         DENFIS   7.00           0.0770   0.4302   4.9
               eGNN     3.88           0.0575   0.3212   1.5
               eTS      3.00           0.0604   0.3374   1.0
               FBeM     6.80           0.0609   0.3402   1.1
               IBeM     9.28           0.0734   0.4101   1.1
               MLP      -              0.0769   0.4296   41.3
               xTS      14.00          0.0631   0.3525   1.1

* Average CPU time per sample in milliseconds
Figure 7.8: FBeM Death Valley temperature forecasts
Figure 7.9: eGNN Helsinki temperature forecasts
Note that while the singular prediction $p$ attempts to match the actual mean temperature value, the corresponding granular information $[u, U]$, formed by the lower and upper bounds of the consequent trapezoidal membership functions, intends to envelop previous data and the uncertainty of the unknown temperature function $f$.
Figure 7.10 enlarges the temperature predictions of Figs. 7.8 and 7.9 for the time intervals [739, 807] and [1009, 1069], respectively. During these time intervals the respective granular models support 8 and 3 rules.
(a) Zoom of FBeM Death Valley forecast (Fig. 7.8) using 8 rules
(b) Zoom of eGNN Helsinki forecast (Fig. 7.9) using 3 rules
Figure 7.10: Comparing the narrowness of granular forecasts using rule bases of
different sizes
We notice from Fig. 7.10 that relatively larger rule bases produce narrower ranges of values $[u, U]$ to bound the predictions. Granular forecasts are determined from past actual temperature values; they are particularly important since they usually come with a label and a linguistic description. FBeM and
eGNN are evolving approaches that handle fuzzy granular data streams and simultaneously provide singular and granular predictions.

Overall, the results in this section suggest that evolving granular systems benefit from data uncertainty and from interval, fuzzy, and neurofuzzy granular frameworks to provide accurate and linguistic predictions of granular time series.
7.3.3 Time Complexity
This section examines how the performance of granular systems is affected by the number of input variables and rules. Here, performance concerns temporal scalability and the RMSE, to assess processing time and prediction error, respectively.
For these purposes we first performed several independent experiments varying the number of input variables (lagged observations of the temperature values). Initial parameters were chosen to give rule bases with about ten granular rules, so that the size of the rule base does not interfere with the temporal scalability analysis of evolving granular systems. We evaluate the processing time and prediction error as the number of input variables increases. The evaluation was performed in the context of temperature prediction. We consider FBeM and the Death Valley, Lisbon and Ottawa time series, as the results for IBeM, eGNN and the remaining time series are fundamentally the same. Figure 7.11 shows the processing time and RMSE for the chosen time series.
The bottom plot of Fig. 7.11 suggests that the time complexity of FBeM, and of evolving granular systems in general, is quasi-linear in the number of inputs. This is important since many computational intelligence and statistical algorithms scale polynomially or exponentially, which prohibits their use in handling massive data streams and modeling large-scale online processes. Evolving granular systems run linearly with respect to the number of samples since their learning algorithms are one-pass and incremental.
It is worth noting at the top of Fig. 7.11 that the weather time series require a small number of input variables; additional inputs tend to confuse the underlying predictor. The RMSE indices for Death Valley, Ottawa, and Lisbon suggest local optima in the range of six to twelve input variables.
In the next experiment we fix the number of input variables to five, and
Figure 7.11: FBeM processing time and RMSE using different amounts of input
variables from temperature time series
run the FBeM algorithm with parameters that force it to generate an increasing number of rules. The goal here is to evaluate temporal scalability and the RMSE as the size of the rule base increases. Figure 7.12 shows the results obtained for the Death Valley, Ottawa, and Lisbon time series data.
The bottom plot of Fig. 7.12 shows that the processing time of FBeM grows exponentially with the number of rules. Although the algorithm deals linearly with the number of samples and input variables, granularity constraints within the evolving granular framework are of utmost importance to keep the system operating online. Effective procedures to bound the rule base and protect evolving granular systems from outbursts of growth are: (i) using the half-life value or the deletion threshold $h_r$. The total number of rules, $c$, is guaranteed to be at most the half-life value at any time. For example, suppose the half-life value is 6 and the rule base contains 7 rules. The last 6 samples can only activate 6 or fewer of the existing rules. Thus, at least one of the rules must have been inactive for 7 time steps,
Figure 7.12: FBeM processing time and RMSE for the Death Valley, Ottawa, and Lisbon time series considering different numbers of rules
which contradicts a half-life of 6; (ii) adapting the maximum width allowed for granules, $\rho$. This procedure develops only the necessary quantity of granules and rules. Notice that the points at the right of the plots in Fig. 7.12 can only be obtained by setting the half-life to a very large value, e.g., 10000, and turning the granularity adaptation procedure off.
The error curves in the top plot of Fig. 7.12 show that both very small and very large rule bases decrease model accuracy. We employ piecewise cubic Hermite interpolating polynomials to fit the error data. Curiously, the error values suggest that the most appropriate models have about 6 to 12 rules. This reinforces the hypothesis that seasonal trends are better modeled by single rules. Excessive granularity is detrimental because similar information is forcibly split into different granules and the underlying local models do not profit from the full information.
The average number of rules in FBeM depends on the choice of $\rho$ and the half-life. Reference (80) recommends $\rho^{[0]} = 0.5$ to balance structural stability and plasticity whenever we lack detailed knowledge of the modeling task and data properties. Monthly mean temperature prediction experiments suggest $\rho^{[0]}$ in the range from 0.5 to 0.8 to avoid rule overshoot after learning starts. This helps to attain smoother structural development along the next time steps. Gradual adaptation of the granularity also alleviates initial guesses and guides the value of $\rho$ according to the data stream. For monthly weather prediction, we suggest $h_r$ values between 48 and 84; the idea here is: if a trend does not appear again in the next four to eight years, then remove its corresponding rule.
7.3.4 Handling Abrupt Regime Changes
Long-term climate changes cause average monthly temperatures to drift gradually over time, and abrupt shifts are hardly noticeable. The experiment addressed in this section shows how evolving granular systems react when abrupt changes occur in nonstationary time series. We assume the FBeM method, as the behavior of IBeM and eGNN is essentially equivalent.
For this purpose, we consider a hypothetical situation in which the time series of Death Valley, Ottawa, and Lisbon occur sequentially, forming a single time series. Two severe regime shifts are easily identified, as the top plot of Fig. 7.13 illustrates. The bottom plot of Fig. 7.13 shows the fuzzy temperature predictions during the Ottawa-Lisbon shift (time interval between 2661 and 2740). In this experiment, FBeM should adapt the model to capture the new temperature profile and forget what is no longer relevant in the current environment. The initial parameters of FBeM were $\rho = 0.6$, $h_r = 48$ (with the half-life set equal to $h_r$) and $\eta = 2$. Figure 7.13 shows the RMSE, the number of rules, and the granular and singular predictions. Notice that the number of rules peaks after the Death Valley-Ottawa and Ottawa-Lisbon transitions, but returns to the usual values afterwards. Similarly, the RMSE increases slightly and decreases in the steps following the transitions. Online adaptability improves prediction accuracy after the transitions. Evolving granular systems are stable in the face of abrupt changes in granular data streams, a challenge to a variety of machine learning algorithms.
Figure 7.13: FBeM prediction of the Death Valley, Ottawa, and Lisbon temper-
ature time series combined
7.4 Function Approximation
Function approximation consists in finding a function that matches, to some extent, a target function in a task-specific way. Here, the target functions are unknown, but perceived as streams of intervals or fuzzy intervals, with intervals and fuzzy intervals representing our intuitive notion of approximate data. Differently from time series prediction, in function approximation problems external variables are available and the time span in which the data are obtained is unimportant.
The generic form of the function approximation problem is as follows: given a time-varying unknown function $f^{[h]}$, where $h = 1, \ldots$ is the time index, and a pair of observations $(x, y)^{[h]}$, $x \in X$ and $y \in Y$, find a finite collection of information granules $\gamma = \{\gamma^1, \ldots, \gamma^c\}$ and a time-varying real-valued map $p^{[h]}: X \to Y$ such that $\gamma^i \subseteq X \times Y$ and $p^{[h]}$ minimizes $(f^{[h]} - p^{[h]})^2$. The output $y^{[h]}$ is unknown when the input $x^{[h]}$ arrives, but becomes known afterwards. The attributes $x_j$ of an input vector $x = (x_1, \ldots, x_n)$ and the output $y$ are trapezoidal fuzzy data.

Because every continuous function $f$ can be approximated uniformly on a finite interval by continuous piecewise linear functions $p$ (36), evolving granular systems are universal approximators (proof in Appendix A).
The following sections consider recent benchmark data sets in materials and biomedical engineering to evaluate and illustrate the usefulness of the proposed evolving granular approaches in the function approximation task.
7.4.1 Concrete Compressive Strength
Compressive strength is the capacity of a material to withstand axially-directed pushing forces. When the limit of compressive strength is reached, the material is crushed. When building with concrete, it is important to know whether it can bear the compressive forces, for safety's sake (68).
Compressive tests measure how well concrete holds up to the compressive pressures around it. Test standards neglect uncertainties of different natures. For example, (i) the concrete cross-sectional area changes as a function of the compressive load applied: the material tends to spread laterally, increasing the cross-sectional area; (ii) compression tests clamp materials at the edges, so a variable frictional force (the barreling phenomenon) arises that opposes the lateral spread. This results in a slightly inaccurate value of stress being obtained from the experiment (68); (iii) standards for concrete structures do not consider weather conditions and permit specimen density to vary by about 2%.
The Concrete Compressive Strength data set, available at the UCI Machine Learning Repository, consists of 1030 singular samples. We assume that the data are perceptions of the values of a variable. Thus, we consider 2% of imprecision in each input $x_j$ and output $y$ and represent it by symmetrical triangular fuzzy objects of the form $(.98x_j, x_j, x_j, 1.02x_j)$ and $(.98y, y, y, 1.02y)$, respectively. Interval methods naturally consider the intervals $(.98x_j, .98x_j, 1.02x_j, 1.02x_j)$ and $(.98y, .98y, 1.02y, 1.02y)$. Concrete ingredients and the age of the mixture are the independent variables of the compression function. Ingredients include cement, blast furnace slag, fly ash, water, superplasticizer, and coarse and fine aggregate (147).
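The granulation just described can be sketched as follows; function names are illustrative.
———————————————————————
# Sketch of the 2%-imprecision granulation described above: a numeric value
# v becomes a symmetric triangular fuzzy number stored as a degenerate
# trapezoid (support at 98% and 102% of v, core collapsed onto v), or a
# plain interval for interval-based methods.

def fuzzify(v, imprecision=0.02):
    lo, hi = v * (1 - imprecision), v * (1 + imprecision)
    return (lo, v, v, hi)            # triangular: core is the singleton {v}

def intervalize(v, imprecision=0.02):
    lo, hi = v * (1 - imprecision), v * (1 + imprecision)
    return (lo, lo, hi, hi)          # interval: core equals support

print(fuzzify(10.0))       # (9.8, 10.0, 10.0, 10.2)
print(intervalize(10.0))   # (9.8, 9.8, 10.2, 10.2)
———————————————————————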
First, we perform a preliminary experiment to analyze different fuzzy aggregation neurons in eGNN. We consider 0.5 as the default value of the uninorm neutral element $e$ and of the T-S norm $\nu$-factors; therefore, in this experiment fuzzy neurons weigh the T and S norms occurring in these constructs equally. Other fuzzy aggregation neurons are used without restrictions.
Performance evaluation is based on the RMSE and NDEI indices computed as in (7.9) and (7.10), respectively. We also consider the average number of rules and the CPU time on a dual-core 2.54GHz processor with 4GB of RAM. The original samples were shuffled and linearly scaled to the range [0,1]. eGNN adopts $\rho = 0.45$, $h_r = 50$, $\eta = 2$ and $\zeta = \vartheta = 0.5$. Table 7.5 shows the best performance of each network setting (using different types of neurons in the aggregation and output layers) over 10 independent runs.
We notice in Table 7.5 that eGNN using product T-norm ($T_{prod}$) neurons in the aggregation layer and the combined T-S neuron ($L_{min,max}$) in the output layer provides the most accurate results while employing a relatively small number of rules. Although the averaging ($M$) output neuron configuration with $T_{prod}$ aggregation produces very close results, the construct adopting $T_{prod}$ and $L_{min,max}$ neurons is kept for the next experiments.
Because outlier points can disturb the computation of the mean and standard
deviation of a dataset, we consider replacing them with the mean of the values
Table 7.5: Concrete compressive strength prediction: evaluation of different types of eGNN neurons

Aggregation   Output     # Avg. Rules   RMSE     NDEI     CPU*
Tmin          M          4.01           0.1268   0.6354   1.6
Tprod         M          3.72           0.1210   0.6064   1.8
TL            M          5.26           0.1438   0.7209   1.8
Umin,max      M          4.53           0.1402   0.7029   1.5
Uprod,prob    M          3.86           0.1295   0.6488   1.7
Tmin          Lmin,max   4.01           0.1275   0.6389   1.8
Tprod         Lmin,max   3.72           0.1205   0.6040   1.8
TL            Lmin,max   5.26           0.1417   0.7100   1.9
Uprod,prob    Lmin,max   3.86           0.1276   0.6393   1.7

* Average CPU time per sample in milliseconds
available so far. This procedure avoids biased estimates caused by low activation levels of granules. Put simply, if the concentration of a concrete ingredient or the compressive strength exceeds the accumulated mean value plus 4 standard deviations, then it is replaced by the mean. Note that this procedure discards the uncommon value and handles the sample as one with a missing datum. Imputation methods for missing data (127) require certain constraints to be met: for instance, if the number of outliers is large compared to the total number of samples, then we run the risk of distorting the covariance structure of the data and of biasing covariances toward zero. The feasibility of the proposed method using 4 standard deviations was confirmed for the underlying data, since the quantity of outliers is smaller than 2%. A more elaborate mechanism to deal with outliers is, e.g., an arousal mechanism such as the one suggested in (128).
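A sketch of this outlier handling follows, using Welford's online algorithm for the running mean and standard deviation; the two-sided test and the class name are illustrative assumptions (the text describes exceeding the mean plus 4 standard deviations).
———————————————————————
# Sketch of the outlier replacement: keep running statistics and replace any
# incoming value farther than k standard deviations from the mean with the
# mean. Names are illustrative.
class OutlierFilter:
    def __init__(self, k=4.0):
        self.k, self.n, self.mean, self.m2 = k, 0, 0.0, 0.0

    def __call__(self, v):
        if self.n > 1:
            std = (self.m2 / self.n) ** 0.5
            if abs(v - self.mean) > self.k * std:
                v = self.mean            # replace the uncommon value
        # update running statistics with the (possibly replaced) value
        self.n += 1
        d = v - self.mean
        self.mean += d / self.n
        self.m2 += d * (v - self.mean)
        return v
———————————————————————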
We compare evolving granular systems against alternative evolving methods. The approaches evaluated are: evolving Takagi-Sugeno (eTS) (6), extended Takagi-Sugeno (xTS) (7), dynamic evolving neural-fuzzy inference system (DENFIS) (64), evolving fuzzy linear regression tree (eFT) (83), evolving participatory learning (ePL) (87), IBeM, FBeM, and eGNN. In this comparative experiment we consider the dataset as originally provided instead of shuffling the samples. Based on a possible temporal correlation of the data, one lagged value of the compressive strength was considered as input to all of the methods; therefore, the number of input variables in this experiment totals 9. IBeM employs $\rho = 0.45$, $h_r = 50$ and $\eta = 2$; FBeM uses the same parameters as IBeM except for $h_r = 40$ (with the half-life set equal to $h_r$); eGNN also adopts the same parameters as IBeM plus $\zeta = \vartheta = 0.5$. Table 7.6 shows the performance of each method.
Table 7.6: Concrete compressive strength prediction: evaluating different evolving methods

Model    # Avg. Rules   RMSE     NDEI     CPU*
DENFIS   5.00           0.1130   0.5670   19.9
eFT      7.00           0.1380   0.6518   79.7
eGNN     3.72           0.1205   0.6040   1.8
ePL      6.00           0.1847   0.9259   24.4
eTS      7.00           0.1554   0.7343   0.9
FBeM     3.48           0.1398   0.7009   1.7
IBeM     4.43           0.1334   0.6683   1.2
xTS      8.00           0.1552   0.7333   0.9

* Average CPU time per sample in milliseconds
We observe from Table 7.6 that DENFIS is the most accurate method for approximating the compressive strength function, although it uses more rules and spends more time processing the data collection than the evolving granular systems. From the accuracy/compactness point of view, IBeM, FBeM and eGNN have all shown to be competitive. In particular, eGNN reached an error rate only slightly higher than that of DENFIS, 0.1205 against 0.1130, using an average of only 3.72 rules and a maximum of 10 rules. eTS and xTS were the fastest methods in this function approximation application. Non-granular methods give no prediction bounds during the processing steps.
Figure 7.14 shows an example of the $T_{prod}$ with $L_{min,max}$ eGNN approximation of the concrete compressive strength function and the evolution of the number of rules, error indices, and granularity. The bottom plot expands the approximation of the top plots in the range [885, 987].

In Fig. 7.14, the single-valued approximation $p$ together with the granular approximation $[u, U]$ gives a value of compressive strength and a range of values in the neighborhood of $p$ induced by the input data. Moreover, the granular approximation may come with a label and a proper linguistic description; it
Figure 7.14: eGNN approximation of the concrete compressive strength function,
and evolution of the rule base, error indices, and granularity
enhances model acceptability, and the neighborhood can be made tighter if we accept a larger number of rules. The performance of eGNN profits from the combination of structural evolution and fuzzy granules. The results may recommend changing the ingredients' mix ratio and/or adding special hardeners to the concrete compound.
7.4.2 Parkinson’s Telemonitoring
Parkinson's disease is one of the most common neurodegenerative disorders (133). Early diagnosis is key to improving patients' quality of life and to prolonging it. Frequent symptoms of Parkinson's disease include movement disorders and vocal impairment (dysphonia). In particular, vocal degradation is one of the earliest indicators of the disease, and one which patients consider a major barrier (57) (133).
The Parkinson's telemonitoring data set, accessible at the UCI Machine Learning Repository, consists of 5875 biomedical voice measurements from 42 patients with early-stage Parkinson's disease recruited to a six-month trial of a telemonitoring device for remote symptom progression monitoring. Recordings were captured in typical home acoustic environments and transmitted over the Internet to a clinic. Inputs comprise 5 jitter and 6 shimmer measures, related respectively to the frequency and amplitude of the speech signal; 2 measures of the ratio of noise to tonal components; 2 measures associated with entropy; and a measure of detrended fluctuation, for a total of 16 inputs. Uncertainty may arise from the processes of audio capture and web transmission, steadiness of phonation and loudness, and environmental conditions, to name a few. The output considered in this study is the total score of the 'unified Parkinson's disease rating scale' (UPDRS), which reflects the presence and severity of symptoms. The larger the UPDRS, the more severe the patient's disabilities.
We analyze different aggregation neurons in eGNN similarly to the concrete compressive strength function approximation problem. New outlier data are replaced with the mean of the values available so far whenever they surpass the range of plus or minus 3 standard deviations around the mean. We do not shuffle the data samples, and we take advantage of 2 lagged UPDRS scores to benefit from temporal information. Table 7.7 summarizes the results.
Table 7.7: Parkinson's telemonitoring prediction: evaluation of different types of eGNN neurons

Aggregation   Output     # Rules   RMSE     NDEI     CPU*
Tmin          M          9.10      0.0667   0.3219   1.8
Tprod         M          8.75      0.0679   0.3280   1.9
Umin,max      M          10.56     0.0755   0.3643   2.0
Uprod,prob    M          9.52      0.0754   0.3641   2.2
Tmin          Lmin,max   9.10      0.0668   0.3226   1.8
Tprod         Lmin,max   8.75      0.0683   0.3296   2.1
Umin,max      Lmin,max   10.56     0.0735   0.3550   2.1
Uprod,prob    Lmin,max   9.52      0.0752   0.3630   2.3

* Average CPU time per sample in milliseconds
Table 7.7 shows that the type of fuzzy neuron used in the aggregation layer of eGNN affects prediction accuracy more than the type used in the output layer. The eGNN construct that combines minimum T-norm ($T_{min}$) and averaging ($M$) neurons in the aggregation and output layers, respectively, performs better than the remaining constructs according to the error indices; it is therefore kept in the next experiments.

Studies in (90) (133) point out that some dysphonia measures are highly correlated. Overall, highly correlated input variables tend to misguide the underlying evolving granular method. Variable selection was considered to improve prediction accuracy, speed up the learning process, and provide easier-to-interpret models.
The input variables of IBeM, FBeM, and eGNN may be the $m$ variables of the speech function, or we may select the $n$ least correlated variables. We conducted offline ranking and progressive elimination of correlated variables. Leaving out one of two highly correlated variables allows assessing how well the results generalize to relatively independent data sets. Similarly to (14) (133), the sequence of removed variables was chosen based on their maximum redundancy as calculated by partial autocorrelation analysis. The variables were: Shimmer:DDA, Jitter:DDP, Shimmer(dB), Shimmer:APQ5, Shimmer:APQ3, Jitter:RAP, Jitter:PPQ5, Jitter(Abs), Shimmer:APQ11, and HNR, in this order. Refer to the UCI Machine Learning Repository for a detailed description of the meaning and significance of the variables. We remove one variable at a time until a statistically significant degradation of the systems is noticed. Figure 7.15 shows the average results considering IBeM, FBeM, eGNN, and independent runs for each set of variables.
Figure 7.15: Evolving granular systems results on leave-one-variable-out approach
to find less correlated subsets of input variables
The top plot of Fig. 7.15 shows that a substantial degradation of the RMSE performance occurs only when the leave-one-out approach results in a 6-variable model (not counting the two lagged variables related to previous UPDRS values). According to the principle of parsimony, which states that, other things being equal, the simplest solution is best, the models with the 7 least correlated variables are sufficient.
The bottom plot of Fig. 7.15 shows the average per-sample CPU time spent by the evolving granular systems for different numbers of input variables. The data were fitted with a quadratic function whose small second-order coefficient suggests that time complexity is quasi-linear in the number of input variables. This is a major characteristic given that many computational intelligence and statistical algorithms behave polynomially or exponentially and are unable to process massive data streams in large-scale online modeling. Moreover, evolving granular systems run in linear time with respect to the number of samples because their learning algorithms are one-pass and incremental.
A comparison between granular and alternative methods is given in Table 7.8. The methods analyzed were: multi-layer perceptron (MLP) (55), least squares (LS) (54), iteratively reweighted least squares (IRLS) (133), least absolute shrinkage and selection operator (LASSO) (132), classification and regression trees (CART) (54), extended Takagi-Sugeno (xTS) (7), and evolving Takagi-Sugeno (eTS) (6). In this experiment samples were shuffled so that their order does not matter to the online methods. Lagged values of the UPDRS score become superfluous since temporal information is lost when the data sequence is mixed. The purpose of the experiment is to compare state-of-the-art function approximation methods only. IBeM uses the following parameters: $\rho = 0.5$, $h_r = 120$, $\eta = 2$; FBeM employs $\rho = 0.45$, $h_r = 120$, $\eta = 2$; and eGNN employs $\rho = 0.4$, $h_r = 120$, $\eta = 2$ and $\zeta = \vartheta = 0.5$. IBeM processes input information translated into interval data, whereas FBeM and eGNN process symmetrical trapezoidal fuzzy data. More precisely, the original input and output data, $x_j$ and $y$, were assumed to be $(.98x_j, .98x_j, 1.02x_j, 1.02x_j)$ and $(.98y, .98y, 1.02y, 1.02y)$ in the case of IBeM; and $(.98x_j, .99x_j, 1.01x_j, 1.02x_j)$ and $(.98y, .99y, 1.01y, 1.02y)$ in the case of FBeM and eGNN. Table 7.8 shows the results.
Table 7.8 shows that both granular evolving systems achieve satisfactory results considering the accuracy/compactness relation. Interestingly, FBeM outperforms the remaining methods without requiring a large number of rules. Based on its structure, learning algorithm, and fuzzy granular framework, FBeM reaches a 0.1245 RMSE using an average of 4.91 ± 1.68 rules, with a maximum of
Table 7.8: Parkinson's telemonitoring prediction: evaluating different methods

Model    # Rules   RMSE     NDEI
LASSO    -         0.3842   1.8402
LS       -         0.3820   1.8294
IRLS     -         0.3797   1.8186
CART     -         0.3588   1.7185
MLP      -         0.3559   1.7046
eTS*     7         0.1452   0.6954
xTS*     7         0.1443   0.6911
eGNN*    6         0.1394   0.6673
IBeM*    5         0.1358   0.6504
FBeM*    5         0.1245   0.5963

* Online methods
9 rules. Evolving granular methods have shown clear advantages over the traditional statistical LASSO, LS, and IRLS methods because of their nonlinear nature.
Figure 7.16 depicts an example of the FBeM approximation of the Parkinson's telemonitoring function and the evolution of the number of rules, error indices, and granularity. The results in the figure stand for the best approximation attained in all experiments performed. It considers an FBeM model with 8 inputs: the 7 least correlated features (obtained as previously described using the leave-one-variable-out approach) and 1 lagged UPDRS value. Moreover, we discard samples that convey values outside the range of plus or minus 3 standard deviations around the current mean value of a variable.

The top plot of Fig. 7.16 shows that the FBeM single-valued approximation $p$ provides quite accurate estimates when temporal information is intrinsic to the data stream. In these cases, the FBeM learning algorithm explores spatial information from the original features and, simultaneously, temporal information from past outputs. Note that the average number of rules during processing is only 4.16, which means that FBeM is not overfitting the data to achieve the underlying error rate (RMSE = 0.0509), but generalizing the behavior of the actual function. The granular prediction $[u, U]$ (support of trapezoids) provides lower and upper bounds within which single-valued predictions must lie. Therefore, given a limited range of possible values, the chances of obtaining more accurate single-valued predictions tend to be higher. The bottom plot of the figure ex-
Figure 7.16: FBeM approximation of the Parkinson’s telemonitoring function,
and evolution of the rule base, error indices, and granularity
pands the top plot in the interval [1181, 1273]. If we accept larger rule bases by choosing higher values of the rate $\eta$ and lower values of $\rho$, then FBeM takes advantage of a larger number of rules and may improve its accuracy, because the granular approximation tends to become tighter around $p$. However, FBeM interpretability may decrease in this case, as the number of linguistic terms and granules increases. The same phenomenon is observed in other granular systems.
Rules of particular interest can be displayed at any time. An example of a highly active FBeM rule at h = 5460 is:

------------------------------------------------------------------
R^i: IF (x_1 is [0.0381, 0.1081, 0.1154, 0.4283],
         x_2 is [0.0756, 0.1469, 0.1619, 0.4582],
         x_3 is [0.0169, 0.0556, 0.0667, 0.3899],
         x_4 is [0.3259, 0.5531, 0.5753, 0.7571],
         x_5 is [0.2813, 0.5241, 0.6081, 0.7232],
         x_6 is [0.0692, 0.1783, 0.1890, 0.4656],
         x_7 is [0.1272, 0.2724, 0.2887, 0.5295],
         y[1] is [0.4283, 0.5202, 0.5300, 0.8147])
     THEN ŷ is [0.4327, 0.5103, 0.5202, 0.8177] AND
          ŷ = 0.0367 + 0.0300 x_1 − 0.0815 x_2 − 0.0449 x_3 + 0.0115 x_4 +
              0.0225 x_5 + 0.0396 x_6 + 0.0532 x_7 + 0.8812 y[1]
------------------------------------------------------------------

Here, x_1 stands for ‘Jitter(%)’; x_2, ‘Shimmer’; x_3, ‘NHR’; x_4, ‘HNR’; x_5, ‘RPDE’; x_6, ‘DFA’; x_7, ‘PPE’; y[1], ‘last UPDRS’; and ŷ is the predicted ‘UPDRS’. Linguistically, and based on all existing rules at h = 5460, rule R^i can be read: if ‘Jitter(%)’ is ‘very low’, ‘Shimmer’ is ‘low’, ‘NHR’ is ‘very low’, ‘HNR’ is ‘high’, ‘RPDE’ is ‘moderate’, ‘DFA’ is ‘low’, ‘PPE’ is ‘low’, and ‘last UPDRS’ is ‘high’, then ‘UPDRS’ is ‘high’. The outputs of granular modeling approaches help to monitor Parkinson’s disease symptoms.
Evolving granular systems are capable of processing interval and fuzzy granular data online, as well as of handling ranges of possible values to approximate functions. In addition, they lend transparency and interpretability to the resulting models.
7.5 Control
Process controllers aim at keeping the output of a specific process within a desired
range. In the following, we shall show how to use IBeM, FBeM and eGNN as
controllers of a feedback system.
7.5.1 Sensor-Based Robust Navigation
We consider an instance of autonomous robot navigation in an unknown environ-
ment with obstacle avoidance. From the point of view of control, the autonomous
navigation problem consists in designing driving rules based on available sensor
data. Evolving granular systems for sensor-based navigation play the role of reac-
tive adaptive controllers that prevent the robot from colliding with obstacles. We
assume that obstacle detection relies on a pair of infrared sensors directed head-on and symmetrically, as shown in Fig. 7.17.

Figure 7.17: Environment for sensor-based navigation

Measurements from sensors SL and SR give a linear approximation of the surface of an obstacle. The control variable is the wheel steering angle φ. The variable θ stands for the reference angle between the robot and the border of the track.
We assume the navigation environment is flat, without slopes, but unknown. Coordinates z_1 and z_2 range over [0, 3000] and [0, 5000], respectively. Positive values of the steering angle φ represent clockwise rotation of the steering wheel, and negative values mean counterclockwise rotation. At every processing step, the controller yields a steering angle. The input sensor readings, SL and SR, are proportional to the distance between the robot and an obstacle and are limited to 500. The perpendicular distance between the infrared beams is 40. We want the robot to drive through the path without hitting the borderline.
Simple kinematic relations approximate the robot movement. For example, if the robot moves from position (z_1, z_2) to position (z'_1, z'_2) at step h with speed S, then:

    θ' = θ + φ    (7.11)
    z'_1 = z_1 + S sin(θ')    (7.12)
    z'_2 = z_2 + S cos(θ')    (7.13)
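In code, one simulation step following (7.11)-(7.13) is direct. This Python sketch assumes angles in radians and is not the simulator used in the thesis:

    import math

    def step(z1, z2, theta, phi, speed):
        """One kinematic update following Eqs. (7.11)-(7.13): the steering
        angle phi increments the heading theta, and the robot advances
        by `speed` along the new heading."""
        theta_new = theta + phi                     # (7.11)
        z1_new = z1 + speed * math.sin(theta_new)   # (7.12)
        z2_new = z2 + speed * math.cos(theta_new)   # (7.13)
        return z1_new, z2_new, theta_new

    # Example: robot at (1900, 100), heading 0, turning slightly clockwise
    print(step(1900.0, 100.0, 0.0, math.radians(5), 30.0))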
Obstacle-avoidance simulation models often ignore physical limitations and processing delays. Estimated paths are often unrealistic, since the feasibility of the trajectory is not guaranteed. In addition, uncertainty in measurements may hinder the robot from following trajectories precisely. Evolving granular systems deal with these constraints by keeping the robot between tolerance bounds [u, U] around the more precise estimated path p.
Experiments with different navigation speeds and noisy data were performed.
Experts provided a few common-sense associations of how the state and control
variables behave prior to learning and navigation. Three rules were considered:
R^1: IF (SL is big) AND (SR is big) THEN (φ is zero) AND (φ = p^1(SL, SR))
R^2: IF (SL is small) AND (SR is big) THEN (φ is positive) AND (φ = p^2(SL, SR))
R^3: IF (SL is big) AND (SR is small) THEN (φ is negative) AND (φ = p^3(SL, SR))
The parameters of the functions p^i are a^1 = (0, 0.034, 0.034), a^2 = (5, 0.04, 0.1), and a^3 = (5, 0.1, 0.04). In the case of the interval framework, sharp boundaries define the subsets ‘big’ and ‘small’, and ‘negative’, ‘zero’, and ‘positive’, as shown in Fig. 7.18(a). In the cases of the fuzzy and neurofuzzy frameworks, trapezoidal membership functions define the same subsets, as illustrated in Fig. 7.18(b).
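To illustrate how such initial rules can drive the robot, the sketch below combines the three rules through a weighted average of their affine consequents, with the coefficients as printed above. The membership shapes and the aggregation scheme here are assumptions made for illustration; the actual shapes are those of Fig. 7.18, and each framework (IBeM, FBeM, eGNN) applies its own inference mechanism:

    # Illustrative three-rule controller: affine consequents
    # phi = a0 + a1*SL + a2*SR, combined by a weighted average of
    # rule activations (membership shapes are assumed, not the thesis ones).

    def trap(x, a, b, c, d):
        """Trapezoidal membership degree of x in (a, b, c, d)."""
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)

    small = lambda s: trap(s, -0.1, 0.0, 0.3, 0.6)  # scaled readings in [0, 1]
    big = lambda s: trap(s, 0.4, 0.7, 1.0, 1.1)

    RULES = [  # (activation, consequent coefficients (a0, a1, a2) as printed)
        (lambda sl, sr: min(big(sl), big(sr)), (0.0, 0.034, 0.034)),    # R^1
        (lambda sl, sr: min(small(sl), big(sr)), (5.0, 0.04, 0.1)),     # R^2
        (lambda sl, sr: min(big(sl), small(sr)), (5.0, 0.1, 0.04)),     # R^3
    ]

    def steering(sl, sr):
        """Weighted-average (Takagi-Sugeno-like) combination of the rules."""
        acts = [(w(sl, sr), a) for w, a in RULES]
        total = sum(w for w, _ in acts) or 1.0
        return sum(w * (a0 + a1 * sl + a2 * sr)
                   for w, (a0, a1, a2) in acts) / total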
Figure 7.18: Initial conditions for the autonomous navigation control problem. (a) Interval attributes; (b) Fuzzy attributes
The robot is initially placed at position (1900, 100) with reference angle θ = 0° in all experiments (see Fig. 7.17). Sensor data streams are singular and linearly scaled to the range [0, 1]. The IBeM and FBeM controllers start with parameters ρ = 0.5, h_r = 1, … = 10000, and η = 1; eGNN adopts the same parameters as IBeM and FBeM plus C^i = T_L, C^f = M, and ζ = ϑ = 0.5. It is worth noting that although the supports of the initial membership functions in Fig. 7.18 cover the whole domain of the variables, the search for more specific rules to fit never-before-seen stream data may contract granules and therefore trigger structural adaptation of the models.
Figure 7.19 shows different trajectories for the robot driving at speeds 5, 10, 20, 30, and 40, using the different granular controllers. We ran each algorithm five times independently in this experiment. We notice in the figure that the robot responded faster to obstacle detection when driving at lower speeds. Moreover, alignment (parallel to the obstacle) after left and right turns tended to be more accurate at lower navigation speeds. Alignment yields smoother and shorter
paths, which are intuitively preferable. Some classes of problems emphasize fast
environment exploration though. Table 7.9 compares the performance of IBeM,
FBeM and eGNN based on the results shown in Fig. 7.19.
Table 7.9: Comparison of different evolving granular controllers

    Model   Speed   # Steps to Goal    CPU    # Rules
    IBeM      5          1164          5.57      3
             10           581          3.18      3
             20           289          1.95      3
             30           194          1.60      3
             40           156          1.29      4
    FBeM      5          1094          4.70      3
             10           546          2.78      3
             20           290          1.91      4
             30           202          1.61      3
             40           157          1.51      4
    eGNN      5          1086          5.71      3
             10           550          3.33      3
             20           279          2.15      3
             30           201          1.86      4
             40           154          1.70      4
Table 7.9 shows that eGNN provided the shortest path in 3 of the 5 speed settings, followed by IBeM and FBeM with one shortest path each. IBeM was able to process data faster than the remaining methods and is therefore the best controller in terms of time spent to achieve the goal.
Figure 7.19: Granular controllers navigating at different speeds. (a) IBeM navigation; (b) FBeM navigation; (c) eGNN navigation

Figure 7.20 shows the numerical and granular outputs of an experiment with the FBeM controller. The numerical output p is provided by the functional part of the FBeM controller, and the granular output [u, U] is given by the bounds of the granular part of the FBeM consequent. The granular output is interpreted as a guaranteed safe path for navigation and maneuvering.

Figure 7.20: Detail of the FBeM navigation at different speeds

The simulation at speed 30 (Fig. 7.20) started with 3 rules and ended up with 4. When the robot approached the obstacle and the range of sensor readings shrank quickly, after contraction and drifting of the initial antecedent membership functions toward the frequently requested region around 500, a new rule:

R^4: IF (SL is very small) AND (SR is small)
     THEN (φ is big positive) AND (φ = p^4(SL, SR))

was created to help the robot turn right faster and avoid collision.
An experiment adding noise in the range ϑ = [−0.05, 0.05] to the input data was conducted. Noise may swing the robot from one side to the other. In this experiment, the robot speed remains fixed at 5 during the simulations. The initial parameters of IBeM, FBeM, and eGNN are the same as in the previous experiment. Figure 7.21 illustrates trajectories from independent simulations considering FBeM; trajectories for IBeM and eGNN are similar. We notice that when obstacles are out of sight, the controller accepts input data as they are and lets the robot explore the environment freely. Otherwise, when obstacles are detected, the controller responds by turning the robot left and right satisfactorily. Granular representation of the data alleviates the swing effect.
Naturally, the accuracy of the granular controllers for navigation can be improved by using more sensors and by considering the speed as an additional control variable. Evolving algorithms offer model-free estimation of the control system. Even if a mathematical model is available, evolving controllers may prove more robust and easier to adapt, and they give additional linguistically interpretable granular information, which may help design, analysis, and supervision. If experts can provide structured knowledge of the control system, or if training data are unavailable, the evolving granular approach proceeds as an adaptive controller.

Figure 7.21: FBeM navigating with noisy input
7.6 Summary
This chapter has shown that evolving granular modeling approaches can successfully handle singular and granular data from distinct scenarios. Moreover, they can outperform other offline, online adaptive, and evolving approaches in terms of accuracy, conciseness, processing speed, and interpretability. IBeM, FBeM, and eGNN have provided meaningful, understandable rules in semi-supervised classification, function approximation, time-series prediction, and control problems.
Chapter 8
Conclusion
This chapter summarizes the major aspects of this work, reiterates the thesis
contributions, and discusses potential future research.
8.1 Summary
This thesis has introduced a framework for evolving granular system modeling
and a suite of methods for uncertain data processing. Evolving granular sys-
tems emphasize structural learning of rule-based models and their realization
from online data streams. We consider uncertain data streams and transparent
and linguistically appealing approaches that explore the data uncertainty. Evolv-
ing granular systems provide a way to construct models of real-world processes
involving domain knowledge, experience, and empirical data.
The evolving granular framework is supported by notions of granular com-
puting such as data granulation, granularity adaptation, and granular data pro-
cessing. Development of the structure of models from scratch on a plug-and-play
incremental basis and computation with multi-sized granules are fundamental
characteristics of the framework. Instead of dealing with the problem as a whole,
we gradually granulate it into simpler sub-parts. The premise is to discover more
abstract granular knowledge from finer granular input-output data. Because un-
certain data prevail in stream applications, excessive granularity (close to the
singularity) becomes unnecessary and inefficient.
In spite of the fact that granular computing is a unified framework for granular
information processing, current literature is fairly dispersed and related ideas
have been developed independently under different terminologies. Of particular
concern to this work are numeric, interval, and fuzzy types of granular data; and
interval, fuzzy, and neurofuzzy modeling frameworks. We have introduced three
methods, viz., interval based evolving modeling (IBeM), fuzzy set based evolving
modeling (FBeM), and evolving granular neural network (eGNN). IBeM uses
interval data and interval preserving operations rooted in the theory of interval
mathematics. FBeM deals with fuzzy data and produces results in fuzzy granular
format. eGNN essentially encodes a set of fuzzy rules in its topology; therefore,
neural processing conforms with a fuzzy inference system. Differently from FBeM,
eGNN is equipped with fuzzy aggregation neurons, which provide it with a higher
level of adaptability. Fuzzy sets and neurocomputing are complementary in terms
of their strengths, thus motivating neurofuzzy granular computing.
We evaluated the methods in several nonstationary environments, namely, semi-supervised classification, weather time-series prediction, function approximation in materials and biomedical engineering, and control of an autonomous robot. Put broadly, spatial and temporal aspects of data stream processing were
examined from a granular perspective. In general, the IBeM, FBeM, and eGNN
methods were able to provide: (i) computational tractability and scalability with the number of samples and input variables; (ii) improved interpretability and transparency of models; (iii) reduced cost of data processing in relation to non-evolving methods; and (iv) approximate results, bounds on the approximations, and descriptions of actions. A user is presented with a range of values without the pressure of committing to a specific numeric solution. A numeric value is also given as the result of the local linear functions associated with granules.
All methods have proven to be extremely general and able to outperform state-of-the-art evolving approaches. FBeM and eGNN alternate as the most accurate methods. The main disadvantage of IBeM is its relatively lower accuracy. This is a consequence of its parameter-free internal representation, as opposed to the fuzzy membership functions of FBeM and eGNN. IBeM usually processes data faster than FBeM and eGNN for the same reason. An advantage of eGNN with respect to IBeM and FBeM is its ability to weight features: eGNN tends to be more robust against irrelevant input variables by changing the values of its connection
weights over time. However, this characteristic was not of particular importance
to the chosen applications.
All evolving granular modeling approaches provide a complete model description and offer different linguistic insights into the nature of granular data relationships. The results were at least as strong as those obtained by recently proposed methods in the field of evolving intelligent systems.
8.2 Contributions
This thesis has proposed evolving granular systems, a rule-based modeling frame-
work able to handle uncertain data streams from online information systems.
Evolving granular systems suggest a paradigm shift in online data analysis, supported by the fact that storage and offline processing of large amounts of data are quite often impractical or simply not cost-effective. Accuracy, transparency, and interpretability are key in evolving granular systems.
Three practical approaches founded on principles from different theories were
suggested to handle granular data streams. Interval based evolving modeling
is an interval granular approach to enclose imprecise data streams revealed as
tolerance intervals. Interval-based modeling comes with a recursive learning al-
gorithm rooted in fundamentals of interval mathematics. Antecedent and con-
sequent parts of interval rules are interval hyperboxes which are connected by
an inclusion function. Fuzzy set based evolving modeling uses fuzzy granular
models to deal with finer fuzzy data. For each fuzzy granule, there exists an
associated fuzzy rule. The structure of the fuzzy rule base is gradually developed
from an incremental learning algorithm suitable to process potentially unbounded
fuzzy data streams. This approach renders linguistic models of systems and fuzzy
granular approximation of functions. Evolving granular neural networks use fuzzy
granules and fuzzy aggregation neurons for information fusion. The network can
be translated into a knowledge base and a comprehensible rule-based inference
system. Learning in evolving granular neural networks consists in building and
adapting the network structure from fuzzy data streams. This means that the
neural network captures new information from data streams, adapts itself to the
new scenario, and avoids redesigning and retraining.
A large set of experiments was performed to show the usefulness of the evolving granular approaches. The interval IBeM, fuzzy FBeM, and neurofuzzy eGNN approaches were evaluated in a variety of applications such as semi-supervised classification, time-series prediction, function approximation, and control. The experiments emphasized the difficulty that existing machine learning and computational intelligence approaches have in dealing with nonstationary data streams. Comparative results have demonstrated the relevance of the proposed framework. Although important results have been achieved in this thesis, many challenges still lie ahead.
8.3 Future Research
The results of this thesis provide potential insights for further research. Next, we briefly list some of the most immediate lines of work that can improve upon our achievements.
In the near future, an important research topic is to elaborate new clustering methods for interval and fuzzy interval data streams. Different closeness metrics for intervals and fuzzy intervals are likely to be rethought in a recursive way and then considered in unsupervised evolving granular modeling. Techniques for clustering categorical data streams are also worth addressing.
Another potential area of future work is the study of multi-dimensional granu-
lar models. Skewed, non-aligned, multi-dimensional granules allow dependencies
among several input variables to be captured without necessarily committing to
any directional association of the underlying variables. Multi-dimensional data
representation preserves information about interactions between input variables
through the use of dispersion matrices.
In this thesis, we did not address chunk-based learning. Chunks of data are sets of sequential data samples buffered to be analyzed at once. Chunks are discarded soon after use. Although incremental chunk-driven learning algorithms usually require additional memory and processing time compared to incremental instance-based algorithms, we envision that they may provide interesting insights into one-class classification problems and learning from imbalanced data sets.
We have addressed data granulation in the time (sampling) and space (clustering) domains. We hope to extend this further to granulate the feature domain (feature selection). In this direction, uncertainty in data representation may be useful to help choose granular features. For example, a feature with greater uncertainty may not be as important as one with smaller uncertainty. Data uncertainty works as a guideline for incremental granular feature selection.
The interval, fuzzy, and neurofuzzy approaches discussed in this work are considered semi-active. In semi-active learning, not all available data samples are used for model adaptation. Implicitly, the approaches ignore indistinguishable samples as a result of temporal and spatial granulation. Conversely, active learning approaches employ filtering mechanisms. Filtering mechanisms, such as those based on the participatory learning paradigm, may prevent evolving systems from being exposed to outliers, and are thus an interesting issue to be explored.
This thesis did not cover the qualitative effects of data granulation, which relate directly to computing with words. In computing with words, the objects of computation are words and propositions drawn from natural language. Often, we translate information expressed in words (soft information) into some tractable granular computing framework. When people want this information back from a fusion system, they want it retranslated in a way that it can be used in the task they are really interested in. More precisely, the retranslation process consists in converting a formal mathematical representation into natural-language statements that can be understood by human beings. We believe there is room for research on finding better criteria to retranslate granules into words based on external goals.
We envision hybrids between evolving rule-based systems and methods from the rough set and support vector machine theories as research topics of great importance in the near future. In particular, rough set theory is an instance of a granular computing framework. Imprecision in the rough set approach is expressed by a boundary region, and not by partial membership, as in fuzzy set theory. Two crisp sets, called the lower and upper approximations, are associated with a rough set. The lower approximation of a set consists of all elements that surely belong to the set, whereas the upper approximation consists of all elements that possibly belong to the set. The difference between the upper and the lower approximations is the boundary region. Any rough set, in contrast to a crisp set, has
a non-empty boundary region. Evolving rough sets from uncertain data streams
is still an approach to be explored. Support vector machines are characterized
by the use of kernel mapping techniques. Although the support vector machine
approach has been primarily applied to pattern recognition, many of its ideas
carry over to the case of function approximation. A support vector machine can
be, for example, used in the consequent part of a granular rule, thus providing
local nonlinear models for classification or regression of uncertain data.
Although granular computing extends real-valued computing to computing with intervals, fuzzy sets, etc., it remains deterministic in its practical aspects, because calculations are still based on the real-valued parameters that characterize granules. Computing with granules and words is recognized to be of great relevance to matters in which the confusion of goals does not justify perfection of means. However, current digital signal processing technology and the discrete/continuous dichotomy do not allow computing at a more abstract, human-like level of thinking in its ultimate meaning. These are more philosophical issues worthy of future research.
On a more practical level, there is a need to reproduce our results in virtual or real-world environments, such as an online business or industrial setting. Although the data sets used in the experiments were recorded from actual applications, other software development issues must be addressed so that the proposed methods may effectively provide online decision support. System integration is needed to link together the different methods, computer networking, and software applications to act as a coordinated whole. Software testing is necessary to validate that the requirements of the applications are met and that the methods work as expected, producing results similar to those obtained in the simulations. The extent to which the presented results can be reproduced as part of an integrated system still remains to be fully determined.
We expect research on evolving granular systems to grow further, and believe
that it may have a valuable role in prediction and decision support systems.
Appendix A
Universal Approximation
In this appendix we provide a constructive proof that an evolving granular system can approximate any continuous function with arbitrary accuracy on compact domains. The proof is based on a work of Davis (36). Here we emphasize multiple-input single-output models.
Let

    p = Σ_{i=1}^{c} p^i,    (A.1)

be a piecewise linear continuous function where

    p^i = a_0^i + Σ_{j=1}^{n} a_j^i x_j   if x_j ∈ [l_j^i, L_j^i] ∀j,
    p^i = 0                               otherwise.
The purpose of p is to approximate a continuous function f : L → ℝ, where L = [l_1^i, L_1^i] × … × [l_j^i, L_j^i] × … × [l_n^i, L_n^i]; index i refers to an ordered set of non-overlapping granules, that is, l_j^1 = min(x_j^[1], …, x_j^[H]), L_j^c = max(x_j^[1], …, x_j^[H]), and H is the total number of samples at a given instant of time. Lower and upper bounds of intervals are such that l_j^1 ≤ L_j^1 = … = l_j^i ≤ L_j^i = … = l_j^c ≤ L_j^c. In other words, our purpose is to approximate the function f by piecewise linear functions.
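To make the construction concrete, the following sketch evaluates a function of the form (A.1). It treats granule bounds as half-open so that adjoining granules do not overlap; this, and the toy granules themselves, are assumptions made only for illustration:

    # Numeric illustration of (A.1): p(x) sums local affine pieces, each
    # active only inside its granule [l_j^i, L_j^i) for every input j.

    def p_i(x, a, lo, hi):
        """Local affine piece: a[0] + sum(a[j]*x[j]) inside the granule."""
        if all(l <= xj < h for xj, l, h in zip(x, lo, hi)):
            return a[0] + sum(aj * xj for aj, xj in zip(a[1:], x))
        return 0.0

    def p(x, granules):
        """Piecewise linear approximator, Eq. (A.1)."""
        return sum(p_i(x, a, lo, hi) for a, lo, hi in granules)

    # Two adjoining one-dimensional granules covering [0, 1)
    granules = [((0.0, 1.0), (0.0,), (0.5,)),   # p^1(x) = x   on [0.0, 0.5)
                ((0.5, 0.0), (0.5,), (1.0,))]   # p^2(x) = 0.5 on [0.5, 1.0)
    print(p((0.25,), granules))  # 0.25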
In what follows we limit our discussion to real and continuous functions f that are defined on L, i.e., the convex hull of the current set of granules γ = (γ^1, …, γ^c). Observations from the function f are designated as x^[A], x^[B], x^[C], etc., instead of x^[h], in order to stress independence from the order of data presentation.
Theorem 1: Let P be an enclosure of a family of functions p(x) such that p ∈ P implies

    max(p^1, …, p^c) ∈ P and min(p^1, …, p^c) ∈ P.    (A.2)

A continuous function f is uniformly approximable by members of p within P if and only if, for any two points x^[A] and x^[B] and for any e > 0, there exists a p such that

    |f(x^[A]) − p(x^[A])| < e and |f(x^[B]) − p(x^[B])| < e.    (A.3)
Proof: If uniform approximation is possible, then given e > 0 we can find a p(x^[A]) ∈ P such that

    |f(x^[A]) − p(x^[A])| < e,

and so (A.3) follows trivially. Conversely, suppose that (A.3) holds. Select a fixed x^[B] ∈ P and a fixed e > 0. Then, for any point x^[C], we can find a function p(x^[A]) = p(x^[A]; x^[B], x^[C], e) such that |f(x^[B]) − p(x^[B])| < e and |f(x^[C]) − p(x^[C])| < e. In particular,

    p(x^[C]) < f(x^[C]) + e.

By continuity of p and f, this inequality must persist in a certain neighborhood N of x^[C]. As x^[C] runs over all the points of P, the corresponding neighborhoods must cover P. By the Heine-Borel theorem, which states that every closed interval in ℝ^n is compact, we can find a finite number of them, N_1, …, N_c, that covers P. The corresponding functions p(x^[A]; x^[B], x^[C_i]) satisfy

    p(x^[A]; x^[B], x^[C_i]) < f(x^[A]) + e,   x^[A] ∈ N_i,   i = 1, …, c.    (A.4)

Define

    p(x^[A]; x^[B]) = min{p(x^[A]; x^[B], x^[C_1]), …, p(x^[A]; x^[B], x^[C_c])}.    (A.5)

By (A.2) iterated, p ∈ P, and by (A.4),

    p(x^[A]; x^[B]) < f(x^[A]) + e,   x^[A] ∈ P.    (A.6)

Once again, for each i we have

    |f(x^[B]) − p(x^[B]; x^[B], x^[C_i])| < e,

so that

    p(x^[B]; x^[B], x^[C_i]) > f(x^[B]) − e.

It follows from (A.5) that

    p(x^[B]; x^[B]) > f(x^[B]) − e.    (A.7)

By continuity, (A.7) must persist in a neighborhood O of x^[B]:

    p(x^[A]; x^[B]) > f(x^[A]) − e.

Now, let x^[B] run over P. These neighborhoods O cover P, and we may find a finite number of them, O_1, O_2, …, O_c, corresponding to x^[B_1], …, x^[B_c], that covers P. Since

    p(x^[A]; x^[B_i]) > f(x^[A]) − e,   x^[A] ∈ O_i,   i = 1, 2, …, c,

and since the O_i cover P, for every x^[A] ∈ P the inequality

    p(x^[A]; x^[B_i]) > f(x^[A]) − e

must hold for some i. If we set

    s(x^[A]) = max{p(x^[A]; x^[B_1]), …, p(x^[A]; x^[B_c])},

then by what we have just said,

    s(x^[A]) > f(x^[A]) − e,   x^[A] ∈ P.    (A.8)

On the other hand, by (A.6), p(x^[A]; x^[B]) < f(x^[A]) + e, ∀x^[A] ∈ P, ∀x^[B]. Hence,

    s(x^[A]) < f(x^[A]) + e,   x^[A] ∈ P.    (A.9)

Combining (A.9) with (A.8),

    |f(x^[A]) − s(x^[A])| < e,   x^[A] ∈ P.

Finally, by (A.2) iterated, s(x^[A]) ∈ P.
Being P a finite domain and p the set of all piecewise linear functions (A.1) defined on P, it is easy to verify that p satisfies (A.2). Condition (A.3) can be satisfied with e = 0 by means of a linear function. These results theoretically guarantee that the desired approximation is always achievable. Q.E.D. This is stated more precisely in the following corollary.
Corollary 1: Every continuous function can be approximated uniformly on a
finite interval by continuous piecewise linear functions.
Let H be a sufficiently large number of stream data x^[h], h = 1, …, H, so that there exists coverage for all granules in the problem space L = [l_1^i, L_1^i] × … × [l_j^i, L_j^i] × … × [l_n^i, L_n^i]. Superscript i = 1, …, c, refers to an ordered set of granules. Moreover, being H finite, the total number of granules in L is finite, with 1 ≤ c ≤ H depending on the granularity value ρ.

Complete coverage of L after H time steps is guaranteed by creating granules that match every never-before-seen value and by forbidding learning algorithms to delete granules. Then, for a model with c granules, the difference between a certain p^i(x^[h]) and y^[h], a measure of f, provides the worst-case approximation error e^i = max(e^1, …, e^c). Being all functions p^i, i = 1, …, c, piecewise linear and defined for all x^[h] in L, the function p = Σ_{i=1}^{c} p^i satisfies condition (A.3) for any (x, y)^[h]. Therefore, Corollary 2 follows trivially from Corollary 1.
Corollary 2: Evolving granular systems are universal approximators.
Appendix B
Recursive Least Squares Method
The recursive least squares (RLS) algorithm is used to adapt the consequent function parameters a_j^i as follows.

Let (x, y)^[h] be the sample available for training at step h. We adjust the coefficients a_j^i of p^i assuming that

    y^[h] = a_0^i + Σ_{j=1}^{n} a_j^i x_j^[h].    (B.1)

If x_j and y are intervals or symmetric trapezoids, then to adapt the coefficients a_j^i using the standard form of the RLS algorithm we take advantage of the midpoints of the respective intervals or trapezoids. In the remainder of this appendix we assume, for short, that (x, y)^[h] are real numbers (midpoints of intervals or of trapezoidal fuzzy data). In case trapezoids are asymmetric, an alternative is to use the center of area.
In matrix form, equation (B.1) becomes

    Y = X Ω^i,    (B.2)

where Y = [y^[h]], X = [1 x_1^[h] … x_n^[h]], and Ω^i = [a_0^i … a_n^i]^T is the vector of unknown parameters. To estimate the coefficients a_j^i we let

    Y = X Ω^i + E,    (B.3)

where

    E = ε^[h] = y^[h] − p(x^[h])    (B.4)

is the approximation error. While in batch estimation the rows of Y, X, and E increase with the number of available instances, in recursive mode only two rows are kept and we reformulate equations (B.2)-(B.4) as follows:

        ⎡ y^[h−1] ⎤        ⎡ 1  x_1^[h−1] … x_n^[h−1] ⎤           ⎡ ε^[h−1] ⎤
    Y = ⎣ y^[h]   ⎦,   X = ⎣ 1  x_1^[h]   … x_n^[h]   ⎦,  and E = ⎣ ε^[h]   ⎦.    (B.5)

The rows in (B.5) refer to values before and just after adaptation. The RLS algorithm chooses Ω^i to minimize the functional

    J(Ω^i) = E^T E.    (B.6)

Ω^i is given by

    Ω^i = (X^T X)^{−1} X^T Y.    (B.7)

Assuming P = (X^T X)^{−1} and using the matrix inversion lemma (148), we avoid inverting X^T X by computing:

    P(new) = P(old) [ I − X X^T P(old) / (1 + X^T P(old) X) ],    (B.8)

where I is the identity matrix. In practice it is usual to choose large initial values for the entries of the main diagonal of P. We use P[0] = 10^3 I as the default value. After simple mathematical transformations, the vector of parameters is rearranged recursively as follows:

    Ω^i(new) = Ω^i(old) + P(new) X (Y − X^T Ω^i(old)).    (B.9)

Detailed derivations of the RLS algorithm can be found in (13) and a convergence proof in (61).
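A compact realization of the update (B.8)-(B.9) is sketched below, using a single regressor row per step (sample by sample, rather than the two-row arrangement of (B.5)). It is illustrative only, not the thesis code:

    import numpy as np

    class RLS:
        """Recursive least squares for one rule's consequent parameters.
        The matrix inversion lemma reduces the update of P to scalar
        divisions, so no matrix inversion is needed."""

        def __init__(self, n_inputs, p0=1e3):
            self.omega = np.zeros(n_inputs + 1)   # [a0, a1, ..., an]
            self.P = p0 * np.eye(n_inputs + 1)    # P[0] = 10^3 I

        def update(self, x, y):
            """One recursive step with sample (x, y); x has n entries."""
            phi = np.concatenate(([1.0], x))      # regressor [1 x1 ... xn]
            denom = 1.0 + phi @ self.P @ phi
            self.P -= np.outer(self.P @ phi, phi @ self.P) / denom   # (B.8)
            self.omega += self.P @ phi * (y - phi @ self.omega)      # (B.9)
            return self.omega

    # Example: fitting y = 1 + 2*x1 - x2 from a few samples
    rls = RLS(n_inputs=2)
    for x1, x2 in [(0.1, 0.5), (0.7, 0.2), (0.4, 0.9), (0.8, 0.8)]:
        rls.update(np.array([x1, x2]), 1 + 2 * x1 - x2)
    print(rls.omega)  # approaches [1, 2, -1]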
Bibliography
[1] Aggarwal, C. C.; Han, J.; Wang, J.; Yu, P. S. “A framework for on-demand
classification of evolving data streams.” IEEE Transactions on Knowledge
and Data Engineering, Vol. 18, Issue 5, pp: 577-589, 2006. 32
[2] Aggarwal, C. C.; Yu, P. S. “A framework for clustering uncertain data
streams.” IEEE International Conference on Data Engineering, pp: 150-
159, 2008. 44,63
[3] Aggarwal, C. C.; Yu, P. S. (Eds.) Privacy-Preserving Data Mining: Models
and Algorithms. Springer-Verlag (Series: Advances in Database Systems),
Vol. 34, 513p. 2008. 39,73
[4] Alonso, J. M.; Magdalena, L. “Special issue on interpretable fuzzy systems.”
Information Sciences, Vol. 181, pp: 4331-4339, 2011. 32
[5] Angelov, P. Evolving Rule-Based Models: A Tool for Design of Flexible
Adaptive Systems. Springer-Verlag, Heidelberg, New York (Studies in
Fuzziness and Soft Computing), 227p. 2002. 4,33
[6] Angelov, P.; Filev, D. “An approach to online identification of Takagi-Sugeno
fuzzy models.” IEEE Transactions on Systems, Man, and Cybernetics -
Part B, Vol. 34, Issue 1, pp: 484-498, 2004. 33,40,111,124,130
[7] Angelov, P.; Zhou, X. “Evolving fuzzy systems from data streams in real-
time.” IEEE Symposium on Evolving Fuzzy Systems, pp: 29-35, 2006.
111,124,130
[8] Angelov, P.; Zhou, X.; Filev, D.; Lughofer, E. “Architectures for evolving
fuzzy rule-based classifiers.” IEEE International Conference on Systems,
Man and Cybernetics, pp: 2050-2055, 2007. 34
[9] Angelov, P.; Filev, D.; Kasabov, N. (Eds.) Evolving Fuzzy Systems - Preface
to the Special Section. IEEE Transactions on Fuzzy Systems, Vol. 6, Issue
6, pp: 1390-1392, 2008. 4,32
[10] Angelov, P.; Zhou, X. “Evolving fuzzy-rule-based classifiers from data
streams.” IEEE Transactions on Fuzzy Systems, Vol. 16, Issue 6, pp: 1462-
1475, 2008. 32,34,77
[11] Angelov, P.; Filev, D.; Kasabov, N. (Eds.) Evolving Intelligent Systems:
Methodology and Applications. Wiley-IEEE Press Series on Computa-
tional Intelligence, 444p. 2010. 4,32,33,44,79
[12] Ashokaraj, I.; Tsourdos, A.; Silson, P.; White, B. “Sensor based robot lo-
calisation and navigation: using interval analysis and extended Kalman
filter.” 5th Asian Control Conference, Vol. 2, pp: 1086-1093, 2004. 49
[13] Astrom, K. J.; Wittenmark, B. Adaptive Control. Prentice-Hall, Addison-
Wesley, Boston, 2nd edition, 580p. 1994. 32,155
[14] Ballini, R.; Mendonca, A.; Gomide, F. “Evolving fuzzy modeling of sovereign
bonds.” Journal of Financial Decision Making, Special Issue: The Fuzzy
Logic in the Financial Uncertainty, Vol. 5, Issue 2, pp: 3-15, 2009. 128
[15] Bargiela, A.; Pedrycz, W. Granular Computing: An Introduction. Kluwer
Academic Publishers - Boston, 1st edition, 452p. 2002. 2,3,11,12,44,45
[16] Bargiela A.; Pedrycz, W. “Granulation of temporal data: a global view
on time series.” International Conference of the North American Fuzzy
Information Processing Society, pp: 191-196, 2003. 40
[17] Bargiela, A.; Pedrycz, W. “Recursive information granulation: aggregation
and interpretation issues.” IEEE Transactions on Systems, Man, and Cy-
bernetics - Part B, Vol. 33, Issue 1, pp: 96-112, 2003. 43,44
[18] Bargiela, A.; Pedrycz, W. “Granular mappings.” IEEE Transactions on Sys-
tem, Man, and Cybernetics - Part A, Vol. 35, Issue 2, pp: 292-297, 2005.
3,11,13
[19] Bargiela, A.; Pedrycz, W. “Toward a theory of granular computing for
human-centered information processing.” IEEE Transactions on Fuzzy
Systems, Vol. 16, Issue 2, pp: 320-330, 2008. 11
[20] Beliakov, G.; Pradera, A.; Calvo, T. Aggregation Functions: A Guide for
Practitioners. Springer-Verlag, Berlin, Heidelberg, 1st edition (Studies in
Fuzziness and Soft Computing), 361p. 2007. 26
[21] Beringer, J.; Hullermeier, E. “Efficient instance-based learning on data
streams.” Intelligent Data Analysis, Vol. 11, Issue 6, pp: 627-650, 2007. 4
[22] Bezdek, J. Pattern Recognition with Fuzzy Objective Function Algorithms.
Plenum Press, New York, 1981. 100
[23] Bifet, A.; Holmes, G.; Pfahringer, B.; Kranen, P.; Kremer, H.; Jansen, T.;
Seidl, T. “MOA: Massive online analysis, a framework for stream classi-
fication and clustering.” Journal of Machine Learning Research, Vol. 11,
pp: 44-50, 2010. 37
[24] Bouchachia, A.; Gabrys, B.; Sahel, Z. “Overview of some incremental learn-
ing algorithms.” IEEE International Conference on Fuzzy Systems, pp:
1-6, 2007. 32
[25] Bouchachia, A. “An evolving classification cascade with self-learning.”
Evolving Systems, Vol. 1, Issue 3, pp: 143-160, 2010. 32
[26] Bouchon-Meunier, B. (Ed.) Aggregation and Fusion of Imperfect Informa-
tion. Physica-Verlag, Heidelberg, New York (Studies in Fuzziness and Soft
Computing), 278p. 1998. 80
[27] Bouchon-Meunier, B.; Marsala, C.; Rifqi, M.; Yager, R. R. (Eds.) Uncer-
tainty in Intelligent and Information Systems. World Scientific - Singapore,
536p. 2008. 2,48
[28] Box, G. E. P.; Jenkins, G. M.; Reinsel, G. C. Time Series Analysis: Forecast-
ing and Control. Wiley Series in Probability and Statistics, 4th edition,
746p. 2008. 107
[29] Carpenter, G. A.; Grossberg, S. “A massively parallel architecture for a self-
organizing neural pattern recognition machine.” Computer Vision, Graph-
ics, and Image Processing, Vol. 37, pp: 54-115, 1987. 87
[30] Carpenter, G. A.; Grossberg, S.; Markuzon, N.; Reynolds, J. H.; Rosen,
D. B. “Fuzzy ARTMAP: A neural network architecture for incremental
supervised learning of analog multidimensional maps.” IEEE Transactions
on Neural Networks, Vol. 3, Issue 5, pp: 698-713, 1992. 62
[31] Carvalho, F. A. T.; Souza, R. M. C. R.; Chavent, M.; Lechevallier, Y. “Adap-
tative Hausdorff distances and dynamic clustering of symbolic interval
data.” Pattern Recognition Letters, Vol. 27, Issue 3, pp: 167-179, 2006.
49
[32] Chen, S.; He, H. “Towards incremental learning of nonstationary imbalanced
data stream: a multiple selectively recursive approach.” Evolving Systems,
Vol. 2, Issue 1, pp: 35-50, 2011. 32
[33] Cross, V. V.; Sudkamp, T. A. Similarity and compatibility in fuzzy set the-
ory: assessment and applications. Physica-Verlag Heidelberg (Studies in
Fuzziness and Soft Computing), 209p. 2002. 26,70
[34] Da Deng; Kasabov, N. “ESOM: An algorithm to evolve self-organizing maps
from online data streams.” IEEE International Joint Conference on Neural
Networks, Vol. 6, pp: 3-8, 2000. 34
[35] Darwin, C. R. The origin of species by means of natural selection, or the
preservation of favoured races in the struggle for life. John Murray - Lon-
don, 6th edition, 1872. 31
[36] Davis, P. J. Interpolation and Approximation. Dover Publications, 393p.
1963. 122,147
[37] Do, T.-N.; Poulet, F. “Kernel-based algorithms and visualization for interval
data mining.” In: Zighed, D. A.; Tsumoto, S.; Ras, Z. W., Mining Com-
plex Data, SCI 165, Springer-Verlag, Berlin, Heidelberg, pp: 75-91, 2009.
49
[38] Domingos, P.; Hulten, G. “Mining high-speed data streams.” International
Conference on Knowledge Discovery and Data Mining, pp: 71-80, 2000. 37
[39] Drossu, R.; Obradovic, Z. “Rapid design of neural networks for time series
prediction.” IEEE Computational Science & Engineering, Vol. 3, Issue 2,
pp: 78-89, 1996. 109
[40] Dubois, D.; Kerre, E.; Mesiar, R.; Prade, H. “Fuzzy interval analysis.” In:
The Handbook of Fuzzy Sets, Vol. 1 - Fundamentals of Fuzzy Sets, Kluwer
Academic - Bordrecht, pp: 483-581, 2000. 21
[41] Dubois D.; Prade, H. (Eds.) Fundamentals of Fuzzy Sets. Kluwer Academic
Publishers, 1st edition, 653p. 2000. 5
[42] Dubois, D.; Prade, H. “On the use of aggregation operations in information
fusion processes.” Fuzzy Sets and Systems, Vol. 142, Issue 1, pp: 143-161,
2004. 25,40
[43] Elwell, R.; Polikar, R. “Incremental learning of concept drift in nonstationary
environments.” IEEE Transactions on Neural Networks, Vol. 22, Issue 10,
pp: 1517-1531, 2011. 37
[44] Engelbrecht, A. P. Computational Intelligence: An Introduction. Wiley -
Chichester, England, 2nd edition, 597p. 2007. 4
[45] Fawcett, T. “An introduction to ROC analysis.” Pattern Recognition Letters,
Vol. 27, pp: 861-874, 2006. 100
[46] Gabrys, B.; Bargiela, A. “General fuzzy min-max neural network for cluster-
ing and classification.” IEEE Transactions on Neural Networks, Vol. 11,
Issue 3, pp: 769-783, 2000. 32,36,49,78,90
[47] Gabrys, B.; Petrakieva, L. “Combining labelled and unlabelled data in the
design of pattern classification systems.” International Journal of Approx-
imate Reasoning, Vol. 35, Issue 3, pp: 251-273, 2004. 98,105
[48] Gama, J.; Medas, P. “Learning decision trees from dynamic data streams.”
Journal of Universal Computer Science, Vol. 11, Issue 8, pp: 1353-1366,
2005. 37
[49] Hahn, G. J.; Meeker, W. Q. Statistical Intervals: A Guide for Practitioners.
Wiley, USA, 387p. 1991. 21
[50] Hall, D. L.; Llinas, J. “An introduction to multisensor data fusion.” Pro-
ceedings of the IEEE, Vol. 85, Issue 1, pp: 6-23, 1997. 44
[51] Hamilton, J. D. Time Series Analysis. Princeton University Press, 1st edition,
799p. 1994. 4,107
[52] Hamming, R. W. “Error detecting and error correcting codes.” Bell System
Technical Journal, Vol. 29, Issue 2, pp: 147-160, 1950. 26,69
[53] Hansen, E. R.; Walster, G. W. Global Optimization using Interval Analysis.
2nd edition, Marcel Dekker, New York - Basel, 489p. 2004. 14
[54] Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning:
Data Mining, Inference and Prediction. Springer-Verlag, 2nd edition, 768p.
2009. 4,33,130
[55] Haykin, S. Neural Networks: A Comprehensive Foundation. Prentice Hall,
2nd edition, 823p. 1999. 6,100,111,130
[56] Hickey, T.; Ju, Q.; van Emden, M. H. “Interval arithmetic: from principles
to implementation.” Journal of the ACM, Vol. 48, Issue 5, pp: 1038-1068,
2001. 14,50
[57] Ho, A.; Iansek, R.; Marigliani, C.; Bradshaw, J.; Gates, S. “Speech impair-
ment in a large sample of patients with Parkinson’s disease.” Behavioral
Neurology, Vol. 11, pp: 131-137, 1998. 127
[58] Ho, W. L.; Tung, W. L.; Quek, C. “An evolving Mamdani-Takagi-Sugeno
based neural-fuzzy inference system with improved interpretability-
accuracy.” IEEE International Conference on Fuzzy Systems, pp: 1-8,
July, 2010. 62
[59] Iglesias, J. A.; Angelov, P.; Ledezma, A.; Sanchis, A. “Evolving classification
of agents behaviors: a general approach.” Evolving Systems, Vol. 1, Issue
3, pp: 161-171, 2010. 32
[60] Jaulin, L.; Keiffer, M.; Didrit, O.; Walter, E. Applied Interval Analysis.
Springer-Verlag - London, 379p. 2001. 5,14,38
[61] Johnson, C. R. Lectures on Adaptive Parameter Estimation. Prentice-Hall -
Upper Saddle River, USA, 185p. 1988. 155
[62] Kaburlasos, V. G.; Papadakis, S. E. “Granular self-organizing map (grSOM)
for structure identification.” Neural Networks, Vol. 19, Issue 5, pp: 623-
643, 2006. 78
[63] Kasabov, N. “Evolving fuzzy neural networks for supervised / unsupervised
online knowledge-based learning.” IEEE Transactions on Systems, Man,
and Cybernetics - Part B, Vol. 31, Issue 6, pp: 902-918, 2001. 34,79
[64] Kasabov, N.; Song, Q. “DENFIS: Dynamic evolving neural-fuzzy inference
system and its application.” IEEE Transactions on Fuzzy Systems, Vol.
10, Issue 2, pp: 144-154, 2002. 34,79,111,124
[65] Kasabov, N. Evolving Connectionist Systems: Methods and Applications in
Bioinformatics, Brain Study and Intelligent Machines. Springer-Verlag -
London, 1st edition, 320p. 2003. 4,33,34,79
[66] Kasabov, N. Evolving Connectionist Systems: The Knowledge Engineering
Approach. Springer-Verlag - London, 2nd edition, 451p. 2007. 4,32,34,
40,79
[67] Kaufmann, A.; Gupta, M. M. Introduction to Fuzzy Arithmetic: Theory
and Applications. Van Nostrand Reinhold Company Inc., New York, 350p.
1985. 21
[68] Kausay, T.; Simon, T. K. “Acceptance of concrete compressive strength.”
Concrete Structures, Vol. 8, pp: 54-63, 2007. 122,123
[69] Kearfott, R. B.; Kreinovich, V. Applications of Interval Computations.
Kluwer Academic Publishers, 425p. 1996. 5,14
[70] Klir, G. K.; Yuan, B. Fuzzy Sets and Fuzzy Logic: Theory and Applications.
Prentice Hall, 1st edition, 592p. 1995. 21,22
[71] Kosko, B. Neural Networks and Fuzzy Systems. A Dynamical Systems Ap-
proach to Machine Intelligence. Prentice-Hall, Englewood Cliffs, 449p.
1991. 44
[72] Kreinovich, V. “Interval computations as an important part of granular com-
puting: an introduction.” In: Pedrycz, W.; Skowron, A.; Kreinovich, V.
(Eds.) Handbook of Granular Computing, pp: 1-31, 2008. 2
[73] Kuncheva L. I. Fuzzy Classifier Design. Springer-Verlag, Heidelberg, 321p.
2000. 44
[74] Last, M. “Online classification of nonstationary data streams.” Intelligent
Data Analysis, Vol. 6, Issue 2, pp: 129-147, 2002. 43
[75] Leite, D.; Costa, P.; Gomide, F. “Interval-based evolving modeling.” IEEE
Symposium Series on Computational Intelligence, pp: 1-8, 2009. 32,40,
48
[76] Leite, D.; Costa, P.; Gomide, F. “Evolving granular classification neural
networks.” IEEE International Joint Conference on Neural Networks, pp:
1736-1743, 2009. 77
[77] Leite, D.; Costa, P.; Gomide, F. “Evolving granular neural network for
semi-supervised data stream classification.” World Congress on Compu-
tational Intelligence - International Joint Conference on Neural Networks,
pp: 1877-1884, 2010. 40,77
[78] Leite, D.; Costa, P.; Gomide, F. “Granular approach for evolving system
modeling.” In: Hullermeier, E.; Kruse, R.; Hoffmann F. (Eds.) Lecture
Notes in Artificial Intelligence, Vol. 6178, pp: 340-349, Springer, 2010. 40,
48
[79] Leite, D.; Gomide, F.; Ballini, R.; Costa, P. “Fuzzy granular evolving mod-
eling for time series prediction.” IEEE International Conference on Fuzzy
Systems, pp: 2794-2801, 2011. 32,40,53,67,73
[80] Leite, D.; Gomide, F. “Evolving linguistic fuzzy models from data streams.”
In: Trillas, E.; Bonissone, P.; Magdalena, L.; Kacprycz, J. (Eds.) Combin-
ing Experimentation and Theory: A Hommage to Abe Mamdani (Studies
in Fuzziness and Soft Computing), pp: 209-223, 2011. 40,67,120
[81] Leite, D.; Costa, P.; Gomide, F. “Interval approach for evolving granular
system modeling.” In: Mouchaweh, M. S.; Lughofer, E. (Eds.) Learning
in Non-Stationary Environments: Methods and Applications, Springer-
Verlag, pp: 271-301, 2012. 40,48
[82] Leite, D.; Costa, P.; Gomide, F. “Evolving granular neural network for fuzzy
time series forecasting.” World Congress on Computational Intelligence -
IEEE Joint Conference on Neural Networks, 8p. 2012. 77
[83] Lemos, A.; Caminhas, W.; Gomide, F. “Fuzzy evolving linear regression
trees.” Evolving Systems, Vol. 2, Issue 1, pp: 1-14, 2011. 32,37,124
[84] Lemos, A.; Caminhas, W.; Gomide, F. “Multivariable Gaussian evolving
fuzzy modeling system.” IEEE Transactions on Fuzzy Systems, Vol. 19,
Issue 1, pp: 91-104, 2011. 32,35
[85] Lemos, A.; Caminhas, W.; Gomide, F. “Evolving fuzzy linear regression
trees with feature selection.” IEEE Workshop on Evolving and Adaptive
Intelligent Systems, pp: 31-38, 2011. 37
[86] Liggins, M. E.; Hall, D. L.; Llinas, J. (Eds.) Handbook of Multisensor Data
Fusion: Theory and Practice. CRC Press, 2nd edition, 849p. 2008. 3
[87] Lima, E.; Gomide, F.; Ballini, R. “Participatory evolving fuzzy modeling.”
International Symposium on Evolving Fuzzy Systems, pp: 36-41, 2006.
35,124
[88] Lin, T. Y. “Granular computing on binary relations.” International Confer-
ence on Rough Sets and Current Trends in Computing, pp: 296-299, 2002.
78
[89] Lin, T. Y. “Neural networks, qualitative fuzzy logic and granular adaptive
systems.” World Congress of Computational Intelligence, pp: 566-571,
2002. 2,11
[90] Little, M. A.; McSharry, P. E.; Hunter, E. J.; Spielman, J.; Ramig, L. O.
“Suitability of dysphonia measurements for telemonitoring of Parkinson’s
disease.” IEEE Transactions on Biomedical Engineering, Vol. 56, Issue 4,
pp: 1015-1022, 2009. 128
[91] Little, R. J. A.; Rubin, D. B. Statistical Analysis with Missing Data. Wiley-
Interscience, 2nd edition, 381p. 2002. 39
[92] Ljung, L. System Identification - Theory for the User. Prentice-Hall, Engle-
wood Cliffs, NJ, 519p. 1988. 33
[93] Lodwick, W.; Jamison, K. D. “Special issue: interfaces between fuzzy set
theory and interval analysis.” Fuzzy Sets and Systems, Vol. 135, pp: 1-3,
2003. 21
[94] Lughofer, E. “FLEXFIS: A robust incremental learning approach for evolving
Takagi-Sugeno fuzzy models.” IEEE Transactions on Fuzzy Systems, Vol.
16 , Issue 6, pp: 1393-1410, 2008. 32,36
[95] Lughofer, E.; Angelov, P. “Handling drifts and shifts in on-line data streams
with evolving fuzzy systems.” Applied Soft Computing, Vol. 11, Issue 2,
pp: 2057-2068, 2011. 34,44
[96] Lughofer, E.; Bouchot, J.-L.; Shaker, A. “On-line elimination of local redun-
dancies in evolving fuzzy systems.” Evolving Systems, Vol. 2, Issue 3, pp:
165-187, 2011. 73,95
[97] Lughofer, E. Evolving Fuzzy Systems - Methodologies, Advanced Concepts
and Applications. Springer-Verlag, Berlin Heidelberg, 460p. 2011. 4,40,
59
[98] Lughofer, E. “On-line incremental feature weighting in evolving fuzzy clas-
sifiers.” Fuzzy Sets and Systems, Vol. 163, Issue 1, pp: 1-23, 2011. 93
[99] Maimon, O. Z.; Rokach, L. The Data Mining and Knowledge Discovery
Handbook. Springer - New York, USA, 1383p. 2005. 3
[100] Mendel, J. M. “Type-2 fuzzy sets and systems: an overview.” IEEE Com-
putational Intelligence Magazine, Vol. 2, Issue 2, pp: 20-29, 2007. 21
[101] Mitchell, T. M. Machine Learning. McGraw-Hill Sci-
ence/Engineering/Math, 1st edition, 414p. 1997. 4
[102] Mitchell, T. M. “The role of unlabeled data in supervised learning.” Pro-
ceedings of the Sixth International Colloquium on Cognitive Science, 8p.
1999. 98,105
[103] Moore, R. E. Interval Analysis. Prentice Hall - Englewood Cliffs, NJ, 145p.
1966. 14,21
[104] Moore, R. E. Methods and Applications of Interval Analysis. SIAM -
Philadelphia, 190p. 1979. 5,14,20
[105] Moore, R. E.; Lodwick, W. “Interval analysis and fuzzy set theory.” Fuzzy
Sets and Systems, Vol. 135, Issue 1, pp: 5-9, 2003. 21
[106] Moore, R. E.; Kearfott, R. B.; Cloud, M. J. Introduction to Interval Anal-
ysis. SIAM - Philadelphia, 223p. 2009. 18,21
[107] Muhlbaier, M.; Topalis, A.; Polikar, R. “Learn++.NC: Combining ensem-
ble of classifiers with dynamically weighted consult-and-vote for efficient
incremental learning of new classes.” IEEE Transactions on Neural Net-
works, Vol. 20, Issue 1, pp: 152-168, 2009. 37
[108] Nandedkar, A. V.; Biswas, P. K. “A granular reflex fuzzy min-max neural
network for classification.” IEEE Transactions on Neural Networks, Vol.
20, Issue 7, pp: 1117-1134, 2009. 49,78,90
[109] Neumaier, A. Interval Methods for Systems of Equations. Cambridge Uni-
versity Press, Cambridge, 272p. 1990. 14
[110] Ozawa, S.; Pang, S.; Kasabov, N. “Incremental learning of chunk data
for online pattern classification systems.” IEEE Transactions on Neural
Networks, Vol. 19, Issue 6, pp: 1061-1074, 2008. 43
[111] Pedrycz, W.; Waletzky, J. “Fuzzy clustering with partial supervision.”
IEEE Transactions on Systems, Man and Cybernetics - Part B, Vol. 27,
Issue 5, pp: 787-795, 1997. 98
[112] Pedrycz, W.; Vukovich, W. “Granular neural networks.” Neurocomputing,
Vol. 36, pp: 205-224, 2001. 75,76
[113] Pedrycz, W. “Heterogeneous fuzzy logic networks: fundamentals and devel-
opment studies.” IEEE Transactions on Neural Networks, Vol. 15, Issue
6, pp: 1466-1481, 2004. 89
[114] Pedrycz, W. Knowledge-based Clustering: From Data to Information Gran-
ules. Wiley, 1st edition, 336p. 2005. 98
[115] Pedrycz, W.; Kwak, K.-C. “The development of incremental models.” IEEE
Transactions on Fuzzy Systems, Vol. 15, Issue 3, pp: 507-518, 2007. 3
[116] Pedrycz, W.; Gomide, F. Fuzzy Systems Engineering: Toward Human-
Centric Computing. Wiley - Hoboken, NJ, USA, 526p. 2007. 2,3,6,12,
22,25,26
[117] Pedrycz, W. “Granular computing - the emerging paradigm.” Journal of
Uncertain Systems, Vol. 1, pp: 38-61, 2007. 11,13
[118] Pedrycz, W.; Skowron, A.; Kreinovich, V. (Eds.) Handbook of Granular
Computing. Wiley - Chichester, England, 1116p. 2008. 2,11,12
[119] Pedrycz, W. “Evolvable fuzzy systems: some insights and challenges.”
Evolving Systems, Vol. 1, Issue 2, pp: 73-82, 2010. 40
[120] Petkovic, M. S.; Petkovic, L. D. Complex Interval Arithmetic and Its Ap-
plications. Wiley - VCH, Germany, 280p. 1998. 21
[121] Polikar, R.; Udpa, L.; Udpa, S. S.; Honavar, V. “Learn++: An incremental
learning algorithm for supervised neural networks.” IEEE Transactions on
Systems, Man, and Cybernetics - Part C, Vol. 31, Issue 4, pp: 497-508,
2001. 37
[122] Pouzols, F. M.; Lendasse, A. “Evolving fuzzy optimally pruned extreme
learning machine for regression problems.” Evolving Systems, Vol. 1, Issue
1, pp: 43-58, 2010. 32
[123] Roof, S.; Callagan, C. “The climate of Death Valley, California.” Bulletin
of the American Meteorological Society, Vol. 84, pp: 1725-1739, 2003.
110
[124] Rubio, J. J. “SOFMLS: Online self-organizing fuzzy modified least-squares
network.” IEEE Transactions on Fuzzy Systems, Vol. 17, Issue 6, pp: 1296
- 1309, 2009. 32,36
[125] Rubio, J. J. “Stability analysis for an online evolving neuro-fuzzy recurrent
network.” In: Angelov, P.; Filev, D.; Kasabov, N. (Eds.) Evolving Intel-
ligent Systems: Methodology and Applications, Wiley - IEEE Press, pp:
173-199, 2010. 32
[126] Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach. Series
in Artificial Intelligence, 3rd edition, 2009. 13
[127] Shafer, J. L. Analysis of Incomplete Multivariate Data. Chapman and Hall
- London, 430p. 1997. 39,124
[128] Silva, L.; Gomide, F.; Yager, R. “Participatory learning in fuzzy clustering.”
IEEE International Conference on Fuzzy Systems, pp: 857-861, 2005. 124
[129] Simpson, P. K. “Fuzzy min-max neural networks. Part I: classification.”
IEEE Transactions on Neural Networks, Vol. 3, Issue 5, pp: 776-786, 1992.
36
[130] Simpson, P. K. “Fuzzy min-max neural networks. Part II: clustering.” IEEE
Transactions on Fuzzy Systems, Vol. 1, Issue 1, pp: 32-45, 1993. 36
[131] Strother, W. “Continuous multi-valued functions.” The Bulletin of Sao
Paulo Mathematical Society, Vol. 10, pp: 87-120, 1958. 21
[132] Tibshirani, R. “Regression shrinkage and selection via the Lasso.” Journal
of the Royal Statistical Society - Series B (Methodological), Vol. 58, Issue
1, pp: 267-288, 1996. 130
[133] Tsanas, A.; Little, M. A.; McSharry, P. E.; Ramig, L. O. “Accurate tele-
monitoring of Parkinson’s disease progression by noninvasive speech tests.”
IEEE Transactions on Biomedical Engineering, Vol. 57, Issue 4, pp: 884-
893, 2010. 127,128,130
[134] Vachkov, G. “Spatial-temporal knowledge base for modeling and analysis
of evolving systems.” Evolving Systems, Vol. 2, Issue 2, pp: 131-143, 2011.
32
[135] Witten, I. H.; Frank, E.; Hall, M. A. Data Mining: Practical Machine
Learning Tools and Techniques. Morgan Kaufmann, 3rd edition, 664p.
2011. 3,33,38
[136] Xiao, L.; Hung, E. “An efficient distance calculation method for uncer-
tain objects.” IEEE Symposium on Computational Intelligence and Data
Mining, pp: 10-17, 2007. 72
[137] Yager, R. R. “A model of participatory learning.” IEEE Transactions on
Systems, Man and Cybernetics, Vol. 20, Issue 5, pp: 1229-1234, 1990. 35
[138] Yager, R. R. “Learning from imprecise granular data using trapezoidal fuzzy
set representations.” In: Prade, H.; Subrahmanian, V. S. (Eds.) Lecture
Notes in Computer Science, Springer - Berlin, Heidelberg, Vol. 4772, pp:
244-254, 2007. 21,40,65,76
[139] Yager, R. R. “Measures of specificity over continuous spaces under similarity
relations.” Fuzzy Sets and Systems, Vol. 159, Issue 17, pp: 2193-2210,
2008. 44
[140] Yager, R. R. “Participatory learning with granular observations.” IEEE
Transactions on Fuzzy Systems, Vol. 17, Issue 1, pp: 1-13, 2009. 40,65
[141] Yao, J. T. “A ten-year review of granular computing.” IEEE International
Conference on Granular Computing, pp: 734-739, 2007. 2,11,12
[142] Yao, Y. Y. “Perspectives of granular computing.” IEEE International Con-
ference on Granular Computing, pp: 85-90, 2005. 3,78
[143] Yao, Y. Y. “The art of granular computing.” International Conference on
Rough Sets and Emerging Intelligent Systems Paradigms, LNAI Vol. 4585,
pp: 101-112, 2007. 41
[144] Yao, Y. Y. “Granular computing: past, present and future.” IEEE Inter-
national Conference on Granular Computing, pp: 80-85, 2008. 2,11
[145] Yao, Y. Y. “Interpreting concept learning in cognitive informatics and gran-
ular computing.” IEEE Transactions on Systems, Man, and Cybernetics -
Part B, Vol. 39, Issue 4, pp: 855-866, 2009. 41
[146] Yao, Y. Y. “Human-inspired granular computing.” In: Yao, J. T. (Ed.)
Novel Developments in Granular Computing: Applications for Advanced
Human Reasoning and Soft Computing, 2010. 11,41
[147] Yeh, I.-C. “Modeling of strength of high performance concrete using artifi-
cial neural networks.” Cement and Concrete Research, Vol. 28, Issue 12,
pp: 1797-1808, 1998. 123
[148] Young, P. C. Recursive Estimation and Time-Series Analysis: An Introduc-
tion. Springer-Verlag - Berlin, 300p. 1984. 155
[149] Zadeh, L. “Fuzzy sets.” Information Control, Vol. 8, pp: 338-353, 1965. 21,
22
[150] Zadeh, L. “The concept of a linguistic variable and its application to ap-
proximate reasoning.” Information Science, Vol. 8, pp: 199-249, 1975. 21
[151] Zadeh, L. A. “Fuzzy sets and information granularity.” In: Gupta, M. M.;
Ragade, R. K.; Yager, R. R. (Eds.) Advances in Fuzzy Set Theory and
Applications, North Holland - Amsterdam, pp: 3-18, 1979. 2,11
[152] Zadeh, L. A. “Toward a theory of fuzzy information granulation and its
centrality in human reasoning and fuzzy logic.” Fuzzy Sets and Systems,
Vol. 90, Issue 2, pp: 111-127, 1997. 3,41
[153] Zadeh, L. A. “Toward a generalized theory of uncertainty (GTU) - an out-
line.” Information Sciences, Vol. 172, pp: 1-40, 2005. 3,13,48
[154] Zadeh, L. A. ”Generalized theory of uncertainty (GTU) - principal concepts
and ideas.” Computational Statistics & Data Analysis, Vol. 51, pp: 15-46,
2006. 2,3,13,40
[155] Zadeh, L. A. “Is there a need for fuzzy logic?” Information Sciences, Vol.
178, Issue 13, pp: 2751-2779, 2008. 5
[156] Zhang, L.; Zhang, B. “Fuzzy reasoning model under quotient space struc-
ture.” Information Sciences, Vol. 173, pp: 353-364, 2005. 78
[157] Zhang, Y.-Q.; Fraser, M. D.; Gagliano, R. A.; Kandel, A. “Granular neural
networks for numerical-linguistic data fusion and knowledge discovery.”
IEEE Transactions on Neural Networks, Vol. 11, Issue 3, pp: 658-667,
2000. 78
[158] Zhu, X.; Goldberg, A. B. Introduction to Semi-Supervised Learning. Mor-
gan and Claypool Publishers (Synthesis Lectures on Artificial Intelligence
and Machine Learning), 116p. 2009. 98,105
[159] Zimmermann, H.-J. Fuzzy Set Theory and its Applications. Kluwer Aca-
demic Publishers, 4th edition, 544p. 2001. 5
... aspects of a problem may assume a granular value. In other words, a non-pointwise uncertain characterization (e.g., interval, fuzzy, rough, statistical, and mixtures of uncertain objects) can be admitted for original data instances, pre-processed or space-transformed instances, model parameters, learning equations, covering regions, and so on [1,3,5]. Generalized constraints, in the sense of Zadeh's general theory of uncertainty [6], are used to delimit granules. ...
... As the algorithms operate on an instance-per-instance basis, they are typically much faster than conventional machine learning algorithms. Nonetheless, time granulation, which aims at reducing the sampling rate of fast data streams and/or synchronizing concurrent data streams that arrive at random time intervals, has also been discussed [1]. A time granule describes the data stream for a certain time period. ...
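As a rough sketch of time granulation under the description above, the following Python snippet collapses each window of a fast pointwise stream into a [min, max] interval granule; the function name and the fixed-window policy are illustrative assumptions, not the exact procedure of [1].

```python
import numpy as np

def time_granulate(stream, window):
    """Summarize a fast pointwise stream into interval time granules.

    Each granule [min, max] describes the stream over one window of
    samples, reducing the effective sampling rate by a factor of `window`.
    """
    granules = []
    for start in range(0, len(stream) - window + 1, window):
        chunk = stream[start:start + window]
        granules.append((float(np.min(chunk)), float(np.max(chunk))))
    return granules

# A 1000-sample stream summarized into 10 interval time granules
stream = np.sin(np.linspace(0, 2 * np.pi, 1000)) + 0.05 * np.random.randn(1000)
print(time_granulate(stream, window=100)[:3])
```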
... Evolving granular-computing models are structures with online learning, summarization, and representation capabilities [1,2,3,4]. An evolving granular model is equipped with an incremental algorithm that gradually builds its structure of interconnected elements. ...
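To make "gradually builds its structure" concrete, here is a minimal single-pass sketch assuming one-dimensional interval granules with a fixed maximum width rho; the absorb-or-create policy and the threshold value are illustrative simplifications, not the learning algorithms cited above.

```python
def learn_granules(stream, rho=0.3):
    """One-pass structure building: interval granules grown from scratch.

    Each sample either fits an existing granule (possibly expanding it,
    bounded by the maximum width rho) or spawns a new granule.
    """
    granules = []                      # each granule is [lower, upper]
    for x in stream:
        for g in granules:
            lo, up = min(g[0], x), max(g[1], x)
            if up - lo <= rho:         # sample absorbed by bounded expansion
                g[0], g[1] = lo, up
                break
        else:                          # no granule can cover the sample
            granules.append([x, x])
    return granules

print(learn_granules([0.10, 0.12, 0.90, 0.15, 0.95]))
# -> [[0.1, 0.15], [0.9, 0.95]]
```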
Conference Paper
Full-text available
We present an approach for data-driven modeling and evolving control of unknown dynamic systems called State-Space Evolving Granular Control. The approach is based on elements of granular computing, discrete state-space systems, and online learning. First, the structure and parameters of a granular model are developed from a stream of state data. The model is formed by information granules comprising first-order difference equations. Partial activation of granules gives global nonlinear approximation capability. The model is supplied with an algorithm that constantly updates the granules toward covering new data while keeping memory of previous patterns. A granular controller is derived from the granular model for parallel distributed compensation. Instead of difference equations, the content of a control granule is a gain matrix, which can be redesigned in real-time from the solution of a relaxed locally-valid linear matrix inequality derived from a Lyapunov function and bounded control-input conditions. We have shown asymptotic stabilization of a chaotic map assuming no previous knowledge about the source that produces the stream of data.
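The claim that "partial activation of granules gives global nonlinear approximation capability" can be pictured with a toy blend of two local first-order difference equations; the matrices, centers, and Gaussian-style activation below are assumed for illustration and are not taken from the paper.

```python
import numpy as np

# Illustrative granule contents: first-order difference equations x+ = A x + b
A = [np.array([[0.9, 0.1], [0.0, 0.8]]),
     np.array([[0.5, -0.2], [0.1, 0.7]])]
b = [np.zeros(2), np.array([0.1, 0.0])]
centers = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]

def activation(x, c, sigma=0.5):
    # Assumed Gaussian-style activation of a granule centered at c
    return np.exp(-np.sum((x - c) ** 2) / (2 * sigma ** 2))

def predict_next(x):
    """Blend locally valid linear models by normalized activation degrees."""
    mu = np.array([activation(x, c) for c in centers])
    mu /= mu.sum()
    return sum(m * (Ai @ x + bi) for m, Ai, bi in zip(mu, A, b))

print(predict_next(np.array([0.5, 0.5])))
```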
... Informally, if x is a vector whose entries are known exactly, then we refer to x as pointwise, or a real vector for short. If x is not known precisely, but there is some information that constrains the possible or probable values of the entries of x, then the constraint on x defines a granular entity or object [5,6,2]. In practice, data uncertainty originates from inaccurate sensors and devices, expert decision or judgment, and summaries of data over time periods, for example. ...
... Evolving granular systems embrace a form of adaptive machine learning able to learn from uncertain and nonstationary data streams [2,9,10]. They arose as computational-intelligence-based modeling approaches driven by pointwise data streams [11,12]; however, they generalize pointwise data and parameters to granular objects. ...
... Rule-based granular models supplied with incremental learning algorithms that work on a per-sample basis have been a prominent research line to deal with big, numerical, and uncertain data streams [1][2][3][4]. Generally speaking, an uncertain data sample is a vector ...
• it develops a new human-centric online Interval Incremental Learning (IIL) algorithm that builds and updates interval rule-based models on the fly, from uncertain data streams;
• it introduces the concepts of space and time granulation, which are useful to operate at different levels of detail and abstraction, and to synchronize non-uniformly-sampled data streams;
• it describes a recursive procedure for a balanced information granularity and, therefore, for obtaining stable and understandable rule-based models that comply with the idea of XAI;
• it suggests the Uncertainty-Weighted Recursive-Least-Squares (UW-RLS) algorithm to take into consideration the granularity of the data in the updating of consequent parameters of interval rules within the IIL framework; and
• it offers the Driving-Through-Manhattan interval dataset as a benchmark to encourage further research, development, and evaluation of alternative approaches. ...
Article
Full-text available
This paper presents a method called Interval Incremental Learning (IIL) to capture spatial and temporal patterns in uncertain data streams. The patterns are represented by information granules and a granular rule base with the purpose of developing explainable human-centered computational models of virtual and physical systems. Fundamentally, interval data are either included into wider and more meaningful information granules recursively, or used for structural adaptation of the rule base. An Uncertainty-Weighted Recursive-Least-Squares (UW-RLS) method is proposed to update affine local functions associated with the rules. Online recursive procedures that build interval-based models from scratch and guarantee balanced information granularity are described. The procedures assure stable and understandable rule-based modeling. In general, the model can play the role of a predictor, a controller, or a classifier, with online sample-per-sample structural adaptation and parameter estimation done concurrently. The IIL method is aligned with issues and needs of the Internet of Things, Big Data processing, and eXplainable Artificial Intelligence. An application example concerning real-time land-vehicle localization and tracking in an uncertain environment illustrates the usefulness of the method. We also provide the Driving Through Manhattan interval dataset to foster future investigation.
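As a hedged illustration of the UW-RLS idea in the abstract above, the sketch below runs a standard weighted recursive least squares in which the per-sample weight shrinks with interval width; the width-to-weight map is an assumption for the example, not the paper's formulation.

```python
import numpy as np

class WeightedRLS:
    """Weighted recursive least squares (a sketch of the UW-RLS idea).

    Per-sample weight w in (0, 1]: certain (narrow) samples get w near 1,
    wide (uncertain) interval samples get smaller w and thus less pull
    on the affine consequent parameters.
    """
    def __init__(self, n, lam=0.99):
        self.theta = np.zeros(n)        # affine parameters of a local rule
        self.P = 1e3 * np.eye(n)        # inverse-covariance-like matrix
        self.lam = lam                  # forgetting factor

    def update(self, phi, y, w=1.0):
        Pphi = self.P @ phi
        k = Pphi / (self.lam / w + phi @ Pphi)       # weighted gain vector
        self.theta = self.theta + k * (y - phi @ self.theta)
        self.P = (self.P - np.outer(k, Pphi)) / self.lam

# Interval inputs: regress on midpoints, down-weight wide intervals
# (the width-to-weight map below is an assumption, not the paper's rule)
rls = WeightedRLS(n=2)
for (lo, hi), y in [((0.9, 1.1), 2.0), ((1.9, 2.1), 4.1), ((2.5, 3.5), 6.0)]:
    mid, width = (lo + hi) / 2.0, hi - lo
    rls.update(np.array([1.0, mid]), y, w=1.0 / (1.0 + width))
print(rls.theta)    # approaches [0, 2] for data following y = 2x
```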
... Evolving granular computing [15] is a general-purpose online learning framework, i.e., a family of algorithms and methods to construct classifiers, regressors, predictors, and controllers in which any aspect of a problem may assume a non-pointwise (e.g., interval, fuzzy, rough, statistical) uncertain characterization, including data, parameters, attributes, learning equations, covering regions [11] [15] [26]. In particular, we have proposed a state-space variety of a granular eFS known as Fuzzy-set-Based evolving Modeling (FBeM) [16], and a model-based control design method that guarantees Lyapunov stability and bounded inputs to the closed-loop evolving granular system. ...
... Due to an aspect of the SS-FBeM learning algorithm, namely that inactive rules do not change, recalculation of the gains (16) is needed only for the rules that are active at a given time step. Therefore, the number of LMIs in (15) can be greatly reduced by considering active rules only. ...
Conference Paper
Full-text available
We present a method for incremental modeling and time-varying control of unknown nonlinear systems. The method combines elements of evolving intelligence, granular machine learning, and multi-variable control. We propose a State-Space Fuzzy-set-Based evolving Modeling (SS-FBeM) approach. The resulting fuzzy model is structurally and parametrically developed from a data stream with focus on memory and data coverage. The fuzzy controller also evolves, based on the data instances and fuzzy model parameters. Its local gains are redesigned in real-time – whenever the corresponding local fuzzy models change – from the solution of a linear matrix inequality problem derived from a fuzzy Lyapunov function and bounded input conditions. We have shown one-step prediction and asymptotic stabilization of the Henon chaos.
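The gains mentioned in the abstract come from a relaxed locally-valid linear matrix inequality; as a simplified stand-in, the snippet below checks only the plain discrete-time Lyapunov LMI for one assumed closed-loop matrix, using cvxpy.

```python
import cvxpy as cp
import numpy as np

# Illustrative closed-loop matrix of one active granule (A_i - B_i K_i)
Acl = np.array([[0.5, 0.2],
                [0.0, 0.7]])

# Discrete-time Lyapunov LMI: find P > 0 with Acl' P Acl - P < 0
P = cp.Variable((2, 2), symmetric=True)
eps = 1e-6
constraints = [P >> eps * np.eye(2),
               Acl.T @ P @ Acl - P << -eps * np.eye(2)]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status, P.value)   # 'optimal' means this local loop is stable
```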
... Conventional machine learning algorithms for building models from EEG data are often infeasible, since the volume of data is enormous and the patterns change. Transformations, attribute extraction, nonlinear models, time windows, and algorithms capable of handling data streams are needed (Leite, 2012). ...
... A recursive algorithm builds its rule base and updates granules to deal with novelties. The method handles unbounded amounts of data and offers computational scalability (Leite, 2012; Decker et al., 2020). ...
... The rules R^i, ∀i, form a rule base. The number of rules, c, is variable, which is a notable characteristic of the approach, since no assumption about how many partitions exist is required (Skrjanc et al., 2019; Leite, 2012). ...
Conference Paper
Full-text available
We describe an online machine learning algorithm for building evolving Gaussian Fuzzy Classifiers (eGFC). We present a method for extracting and selecting attributes from the Fourier spectrum of electroencephalogram data. The data are obtained from 28 individuals exposed to the computer games Train Sim World, Unravel, Slender The Arrival, and Goat Simulator. According to the Arousal-Valence system, four emotions prevail (boredom, calmness, horror, and joy). We analyze individual electrodes and the effect of time windows and dimensionality reduction on eGFC performance. We conclude that electrodes on both brain hemispheres assist classification, especially those on the temporal (T7-T8), occipital (O1-O2), and frontal (Af3-Af4) lobes. We observe that patterns may arise anywhere in the frequency spectrum, between 1 and 64 Hz. The eGFC approach is effective for the real-time Big data problem. It reaches an accuracy of 72.2% using a compact rule structure, at a processing speed of 1.8 ms/sample.
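A minimal sketch of the Fourier attribute-extraction step is given below, assuming a 128 Hz sampling rate (consistent with the 1-64 Hz range above) and the mean spectral magnitude per band as the attribute; the paper's exact windowing and selection steps may differ.

```python
import numpy as np

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 64)}

def band_attributes(window, fs=128):
    """Mean Fourier magnitude per band for one electrode window.

    fs = 128 Hz is an assumed sampling rate consistent with the 1-64 Hz
    range mentioned above; the actual pipeline may differ.
    """
    mag = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    return {band: float(mag[(freqs >= lo) & (freqs < hi)].mean())
            for band, (lo, hi) in BANDS.items()}

eeg_window = np.random.randn(10 * 128)     # a 10-second window, one electrode
print(band_attributes(eeg_window))
```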
... Successful applications of these systems in complex real-world problems, including control, prediction, classification, identification, and function approximation, are found in [4], [11]-[15]. A further advantage of evolving fuzzy systems is that they may provide linguistically-appealing granular information [13], [14], [16], that is, these systems may explain their results or actions. Online structural adaptation of a fuzzy model to handle nonstationarities is pursued by adding, merging, and removing rules from a knowledge base [4]. ...
... This paper addresses a new fuzzy modeling framework called the Evolving Gaussian Fuzzy Classification (EGFC) framework. EGFC is a semi-supervised variation of the evolving granular rule-based approach [13], [17] for the construction of nonlinear and time-varying classifiers, with unsupervised and supervised learning as the boundary cases. We aim to detect and classify anomalies, broadly speaking, and power quality disturbances as a particular application example. ...
... Fundamentally, an HP filter and the DFT are applied to raw voltage data within a time window. A more discriminative set of attributes facilitates model interpretation, reduces data overfitting, and may produce better results due to the elimination of attributes and noise that may mislead online systems [13], [16]. ...
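To illustrate the HP-filter-plus-DFT pipeline mentioned above, here is a sketch using the Hodrick-Prescott filter from statsmodels; the smoothing parameter and the chosen summary attributes are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from statsmodels.tsa.filters.hp_filter import hpfilter

def voltage_attributes(v, lamb=1600):
    """HP-filter the window into trend + cycle, then summarize the
    cycle's DFT. Both lamb and the chosen summary attributes are
    illustrative, not the paper's configuration."""
    cycle, trend = hpfilter(np.asarray(v, dtype=float), lamb=lamb)
    mag = np.abs(np.fft.rfft(cycle))
    return {"trend_mean": float(np.mean(trend)),
            "dominant_bin": int(np.argmax(mag[1:]) + 1),
            "spectral_energy": float(np.sum(mag ** 2))}

t = np.linspace(0.0, 1.0, 256, endpoint=False)
v = np.sin(2 * np.pi * 60 * t) + 0.1 * np.random.randn(256)  # noisy 60 Hz wave
print(voltage_attributes(v))
```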
... In this technique, the information on each water flow and precipitation observed in the basins is associated with fuzzy sets instead of single values, which helps handle imprecise data or data with uncertainty arising from the acquisition process. Examples of fuzzy time series implementations are found in the areas of renewable energy resources [5] [6], engineering [7] [8], meteorology [9], and financial markets [10]. ...
... Based on the runoff values of the reservoirs (Rsolo), (Rsub), (Rsup), (Rsup2) and on the drainage area of the basin under study, the total water flow at time instant t can be determined through equation (9). The division by the factor 86.4 changes the time unit. ...
Conference Paper
This study proposes a comparative analysis between the Soil Moisture Accounting Procedure (SMAP/ONS) and the fuzzy time series technique for water flow forecasting in Brazilian hydroelectric power generation systems. The reservoir crisis of 2021, caused by the worst drought since 1931, which affected the country’s energy sector and increased energy tariffs, served as a warning of the importance of water resource planning and management and how it is essential for electricity generation in Brazil. The SMAP/ONS model, currently used by the Brazilian National Electric System Operator (ONS), calculates water flow based on evapotranspiration and precipitation. In contrast, the fuzzy time series technique is a machine learning-based approach that uses fuzzy logic to handle uncertain data. Results show that the fuzzy time series technique presented competitive performance in most assessed cases.
... We summarize a recently-proposed evolving Gaussian Fuzzy Classification (eGFC) method [12]. eGFC is an instance of Evolving Granular System [13] [14], which is a general-purpose online learning framework, i.e., a family of approaches to autonomously construct classifiers, regressors, predictors and controllers in which any aspect of a problem may have a non-pointwise (e.g., interval, fuzzy, statistical) characterization, including data, parameters, features, learning equations, and covering regions [15] [16]. eGFC has been applied to power quality classification in smart grids [12] and anomaly detection in data centers [17]. ...
... ρ[h] dictates how large granules can be. Different values result in different granular perspectives [13]. Section II-D gives recursive equations to update ρ[h]. ...
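The actual recursive equations for ρ[h] are in the cited paper's Section II-D; the snippet below only illustrates one plausible policy, assumed for the example: coarsen ρ when rules are created faster than a target rate, refine it otherwise.

```python
def update_rho(rho, rules_created, eta=1, beta=0.1):
    """Plausible recursive policy for the maximum granule size rho.

    rules_created counts rule creations over the last monitoring window
    and eta is the target count. Too many new rules means granules are
    too tight (coarsen); too few means the granular perspective can be
    refined. Illustrative only; not the cited paper's equations.
    """
    if rules_created > eta:
        return rho * (1.0 + beta)      # coarser granular perspective
    if rules_created < eta:
        return rho * (1.0 - beta)      # finer granular perspective
    return rho

print(update_rho(0.5, rules_created=4))   # 0.55: enlarge granules
print(update_rho(0.5, rules_created=0))   # 0.45: shrink granules
```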
Conference Paper
Emotion recognition has become a need for more realistic and interactive machines and computer systems. The greatest challenge is the availability of high-performance algorithms to effectively manage individual differences and nonstationarities in physiological data, i.e., algorithms that customize models to users with no subject-specific calibration data. We describe an evolving Gaussian Fuzzy Classifier (eGFC), which is supported by a semi-supervised learning algorithm to recognize emotion patterns from electroencephalogram (EEG) data streams. We extract features from the Fourier spectrum of EEG data. The data are provided by 28 individuals playing the games ‘Train Sim World’, ‘Unravel’, ‘Slender The Arrival’, and ‘Goat Simulator’ – a public dataset. Different emotions prevail, namely, boredom, calmness, horror and joy. We analyze the effect of individual electrodes, time window lengths, and frequency bands on the accuracy of user-independent eGFCs. We conclude that both brain hemispheres may assist classification, especially electrodes on the frontal (Af3-Af4), occipital (O1-O2), and temporal (T7-T8) areas. We observe that patterns may be eventually found in any frequency band; however, the Alpha (8-13Hz), Delta (1-4Hz), and Theta (4-8Hz) bands, in this order, are more correlated with the emotion classes. eGFC has been shown to be effective for real-time learning of EEG data. It reaches a 72.2% accuracy using a variable rule base, 10-second windows, and 1.8ms/sample processing time in a highly-stochastic time-varying 4-class classification problem.
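A simplified view of the decision step in such a classifier: each rule is a Gaussian granule tagged with a class, and the most active granule labels the sample. The two rules and their parameters below are hypothetical.

```python
import numpy as np

def membership(x, mu, sigma):
    """Gaussian membership degree of sample x in granule (mu, sigma)."""
    return float(np.exp(-0.5 * np.sum(((x - mu) / sigma) ** 2)))

# Hypothetical two-rule base over normalized band-power attributes
rules = [(np.array([0.2, 0.1]), np.array([0.2, 0.2]), "calmness"),
         (np.array([0.8, 0.9]), np.array([0.2, 0.2]), "horror")]

def classify(x):
    """Winner-takes-all over granule activations (simplified decision step)."""
    return max((membership(x, mu, s), label) for mu, s, label in rules)

print(classify(np.array([0.75, 0.85])))   # ('horror' wins with high degree)
```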
Article
We introduce an incremental learning method for the optimal construction of rule-based granular systems from numerical data streams. The method is developed within a multiobjective optimization framework considering the specificity of information, model compactness, and variability and granular coverage of the data. We use α-level sets over Gaussian membership functions to set model granularity and operate with hyperrectangular forms of granules in nonstationary environments. The resulting rule-based systems are formed in a formal and systematic fashion. They can be useful in time series modeling, dynamic system identification, predictive analytics, and adaptive control. Precise estimates and enclosures are given by linear piecewise and inclusion functions related to optimal granular mappings.
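The α-level sets over Gaussian membership functions mentioned in the abstract reduce, per attribute, to intervals whose Cartesian product is the hyperrectangular granule: solving exp(-(x-μ)²/(2σ²)) ≥ α gives |x-μ| ≤ σ√(-2 ln α), as the sketch below computes.

```python
import math

def alpha_cut(mu, sigma, alpha):
    """Interval where a Gaussian membership function is at least alpha:
    exp(-(x - mu)^2 / (2 sigma^2)) >= alpha  <=>
    |x - mu| <= sigma * sqrt(-2 ln alpha)."""
    half_width = sigma * math.sqrt(-2.0 * math.log(alpha))
    return (mu - half_width, mu + half_width)

# Per-attribute alpha-cuts are the sides of the hyperrectangular granule
box = [alpha_cut(mu, sigma, alpha=0.5) for mu, sigma in [(0.3, 0.1), (0.7, 0.2)]]
print(box)   # approximately [(0.182, 0.418), (0.465, 0.935)]
```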
Book
Primary Audience for the Book:
• Specialists in numerical computations who are interested in algorithms with automatic result verification.
• Engineers, scientists, and practitioners who desire results with automatic verification and who would therefore benefit from the experience of successful applications.
• Students in applied mathematics and computer science who want to learn these methods.
Goal of the Book: This book contains surveys of applications of interval computations, i.e., applications of numerical methods with automatic result verification, that were presented at an international workshop on the subject in El Paso, Texas, February 23-25, 1995. The purpose of this book is to disseminate detailed and surveyed information about existing and potential applications of this new growing field.
Brief Description of the Papers: At the most fundamental level, interval arithmetic operations work with sets: the result of a single arithmetic operation is the set of all possible results as the operands range over the domain. For example, [0.9, 1.1] + [2.9, 3.1] = [3.8, 4.2], where [3.8, 4.2] = {x + y | x ∈ [0.9, 1.1] and y ∈ [2.9, 3.1]}. The power of interval arithmetic comes from the fact that (i) the elementary operations and standard functions can be computed for intervals with formulas and subroutines; and (ii) directed roundings can be used, so that the images of these operations (e.g.
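The blurb's example is easy to reproduce; a few lines of Python implement interval addition (and, for contrast, multiplication, where the extremes of the endpoint products must be taken).

```python
def iadd(a, b):
    """Interval addition: [a1, a2] + [b1, b2] = [a1 + b1, a2 + b2]."""
    return (a[0] + b[0], a[1] + b[1])

def imul(a, b):
    """Interval multiplication: take the extremes of the endpoint products."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

print(iadd((0.9, 1.1), (2.9, 3.1)))    # (3.8, 4.2) up to float rounding
print(imul((-1.0, 2.0), (3.0, 4.0)))   # (-4.0, 8.0)
```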
Book
Intelligent systems are necessary to handle modern computer-based technologies managing information and knowledge. This book discusses the theories required to help provide solutions to difficult problems in the construction of intelligent systems. Particular attention is paid to situations in which the available information and data may be imprecise, uncertain, incomplete or of a linguistic nature. The main aspects of clustering, classification, summarization, decision making and systems modeling are also addressed. Topics covered in the book include fundamental issues in uncertainty, the rapidly emerging discipline of information aggregation, neural networks, Bayesian networks and other network methods, as well as logic-based systems.
Book
Although the term is relatively recent, the notions and principles of Granular Computing (GrC) have appeared in different guises in many related fields, including granularity in Artificial Intelligence, interval computing, cluster analysis, quotient space theory, and many others. Recent years have witnessed a renewed and expanding interest in the topic as it begins to play a key role in bioinformatics, e-commerce, machine learning, security, data mining, and wireless mobile computing when it comes to the issues of effectiveness, robustness, and uncertainty. The Handbook of Granular Computing offers a comprehensive reference source for the granular computing community, edited by and with contributions from leading experts in the field. It includes chapters covering the foundations of granular computing, interval analysis and fuzzy set theory; hybrid methods and models of granular computing; and applications and case studies. It is divided into five sections: Preliminaries, Fundamentals, Methodology and Algorithms, Development of Hybrid Models, and Applications and Case Studies. The handbook presents the flow of ideas in a systematic, well-organized manner, starting with the concepts and motivation and proceeding to detailed design that materializes in specific algorithms, applications, and case studies, and provides the reader with a self-contained reference that includes all prerequisite knowledge, augmented with step-by-step explanations of more advanced concepts. The Handbook of Granular Computing represents a significant and valuable contribution to the literature and will appeal to a broad audience including researchers, students, and practitioners in the fields of Computational Intelligence, pattern recognition, fuzzy sets and neural networks, system modelling, operations research, and bioinformatics.