Evolving Granular Systems
Daniel Furtado Leite
Supervisor: Fernando Gomide
Co-supervisor: Pyramo Costa Jr.
Department of Computer Engineering and Industrial Automation
School of Electrical and Computer Engineering
University of Campinas
A thesis submitted in partial fulfillment of the requirements for
the degree of Doctor of Philosophy
July, 2012
Acknowledgement
First and foremost, I would like to thank my supervisor Fernando Gomide. It is
difficult to imagine this thesis without his inspiring mentorship, insightful sug-
gestions, and words of encouragement. I owe him a debt of gratitude for all I
have learned from him. I am also greatly fortunate to have Pyramo Costa as my
co-supervisor. Apart from providing me with valuable research feedback, he
has been a constant source of support and stimulation.
My academic life was shaped by great professors. I would like to acknowledge
Fernando Von Zuben, Romis Attux, Akebo Yamakami, and Takaaki Ohishi for
lecturing some of the best classes I have ever taken, and for nurturing my early
interests in computational intelligence, mathematics and optimization. I am also
indebted to Professor Rosangela Ballini for her help in obtaining and analyzing
datasets, and for sharing her expertise on several occasions.
I would like to thank the committee members Andre Lemos, Weldon Lodwick,
and again Fernando Von Zuben, Romis Attux, and Fernando Gomide for the
constructive feedback they provided me in order to improve this thesis.
I was privileged to have known talented and enthusiastic folks in our LCA
group throughout these years. Special thanks go to Joelma Costa, Glaucya
Boechat, Yi Liu, Fernando Bordignon, Leandro Maciel, and Vitor Marques for
being a great support in hard times and joyful company in good times. My grat-
itude extends to Lucas Nascimento, Alan Barbosa, Luiz Bergo, Enderson Cruz,
and Israel Mendes, fellows from our LASI group at PUC Minas.
I appreciated the help of Carmen Fonseca, Mariana Silva, Noemia Benatti,
Edson Filho, Maria Waldman, Carolina Velho, Jerusa Soares, and Zilda Padovan,
who made their support available in countless ways whenever I needed it.
I have much gratitude for CAPES, the Brazilian Ministry of Education, for
the fellowship which enabled me to pursue this research.
Last, but certainly not least, I would like to thank Daniela and Lucia for
their endless support and encouragement during this long endeavor. Their love
and perseverance have kept me going through the roughest of times. To them I
dedicate this thesis.
Abstract
In recent years there has been increasing interest in computational modeling ap-
proaches to deal with real-world data streams. Methods and algorithms have been
proposed to uncover meaningful knowledge from very large (often unbounded)
data sets in principle with no apparent value. This thesis introduces a framework
for evolving granular modeling of uncertain data streams. Evolving granular sys-
tems comprise an array of online modeling approaches inspired by the way in
which humans deal with complexity. These systems explore the information flow
in dynamic environments and derive from it models that can be linguistically un-
derstood. In particular, information granulation is a natural technique to dispense with unnecessary details and emphasize transparency, interpretability and scalability
of information systems. Uncertain (granular) data arise from imprecise percep-
tion or description of the value of a variable. Broadly stated, various factors
can affect one’s choice of data representation such that the representing object
conveys the meaning of the concept it is being used to represent. Of particular
concern to this work are numerical, interval, and fuzzy types of granular data; and
interval, fuzzy, and neurofuzzy modeling frameworks. Learning in evolving gran-
ular systems is based on incremental algorithms that build model structure from
scratch on a per-sample basis and adapt model parameters whenever necessary.
This learning paradigm is meaningful because it avoids redesigning and retraining models whenever the system changes. Application examples in classification,
function approximation, time-series prediction and control using real and syn-
thetic data illustrate the usefulness of the granular approaches and framework
proposed. The behavior of nonstationary data streams with gradual and abrupt
regime shifts is also analyzed in the realm of evolving granular computing. We
shed light upon the role of interval, fuzzy, and neurofuzzy computing in process-
ing uncertain data and providing high-quality approximate solutions and rule
summary of input-output data sets. The approaches and framework introduced
constitute a natural extension of evolving intelligent systems over numeric data
streams to evolving granular systems over granular data streams.
Publications
During the course of this research, a number of publications were produced which
are based on or otherwise related to the content of this thesis. They are listed
below for reference.
Book chapters
Leite, D.; Costa, P.; Gomide, F. “Interval approach for evolving granular
system modeling.” In: Mouchaweh, M.; Lughofer, E. (Eds.) Learning in
Non-stationary Environments: Methods and Applications, Springer - New
York, pp: 271-301, 2012.
Leite, D.; Gomide, F. “Evolving linguistic fuzzy models from data streams.”
In: Trillas, E.; Bonissone, P.; Magdalena, L.; Kacprzyk, J. (Eds.) Combining Experimentation and Theory: A Hommage to Abe Mamdani (Studies in Fuzziness and Soft Computing), Springer-Verlag, pp: 209-223, 2011.
Leite, D.; Costa, P.; Gomide, F. “Granular approach for evolving systems
modeling.” In: Hüllermeier, E.; Kruse, R.; Hoffmann, F. (Eds.) Lecture Notes in Artificial Intelligence (LNAI/IPMU), Vol. 6178, pp: 340-349, Springer-Verlag Berlin Heidelberg, 2010.
Journals
Leite, D.; Costa, P.; Gomide, F. “Evolving granular neural networks from
fuzzy data streams.” Neural Networks. (Accepted).
Leite, D.; Ballini, R.; Costa, P.; Gomide, F. “Evolving fuzzy granular mod-
eling from nonstationary fuzzy data streams.” Evolving Systems, Springer.
Vol. 3, Issue 2, pp: 65-79, 2012.
Leite, D.; Hell, M.; Costa, P.; Gomide, F. “Real-time fault diagnosis of
nonlinear systems.” Nonlinear Analysis: Theory, Methods & Applications.
Vol. 71, Issue 12, pp: 2665-2673, 2009.
International conferences
Leite, D.; Costa, P.; Gomide, F. “Evolving granular neural network for
fuzzy time series forecasting.” World Congress on Computational Intelli-
gence (WCCI: IJCNN), Brisbane - AU, 8p. 2012.
Lemos, A.; Leite, D.; Maciel, L.; Ballini, R.; Caminhas, W.; Gomide, F.
“Evolving Fuzzy Linear Regression Tree Approach for Forecasting Sales
Volume of Petroleum Products.” World Congress on Computational Intel-
ligence (WCCI: FUZZ-IEEE), Brisbane - AU, 8p. 2012.
Leite, D.; Gomide, F.; Ballini, R.; Costa, P. “Fuzzy granular evolving mod-
eling for time series prediction.” IEEE International Conference on Fuzzy
Systems, Taipei - TW, pp: 2794-2801, 2011.
Leite, D.; Costa, P.; Gomide, F. “Evolving granular neural network for semi-
supervised data stream classification.” World Congress on Computational
Intelligence (WCCI: IJCNN), Barcelona - ES, pp: 1877-1884, 2010.
Leite, D.; Costa, P.; Gomide, F. “Evolving granular classification neural
networks.” IEEE International Joint Conference on Neural Networks, At-
lanta - US, pp: 1736-1743, 2009.
Leite, D.; Attux, R.; Von Zuben, F.; Costa P.; Gomide, F. “Evolutionary
neural network applied to induction motors stator fault detection.” IEEE
International Electric Machines and Drives Conference, Miami - US, pp:
1721-1728, 2009.
Leite, D.; Costa P.; Gomide, F. “Interval-based evolving modeling.” IEEE
Symposium Series on Computational Intelligence: Workshop on Evolving
Systems, Nashville - US, pp: 1-8, 2009.
Brazilian conferences
Leite, D.; Ballini, R.; Costa, P.; Gomide, F. “Fuzzy granular evolving mod-
eling.” 10th Brazilian Symposium on Intelligent Automation, Sao Joao Del
Rey - MG, 6p. 2011. (In Portuguese).
Leite, D.; Costa, P.; Gomide, F. “Granular neural networks for semi-
supervised learning.” 18th Brazilian Congress on Automatics, Bonito - MS,
8p. 2010. (In Portuguese).
Leite, D.; Costa, P.; Gomide, F. “Evolving granular systems: real-time
processing of data streams.” (Abstract) 1st Brazilian Congress on Fuzzy
Systems, Sorocaba - SP, 2p. 2010. (In Portuguese).
Leite, D.; Nascimento, L.; Barbosa, A.; Costa, P.; Ferreira, D.; Gomide,
F. “Evolving approach for power transformer fault detection.” 6th Interna-
tional Workshop on Power Transformers, Foz do Iguacu - PR, 8p. 2010.
(In Portuguese).
Leite, D.; Gomide, F. “Granular neural network for evolving classification.”
(Extended abstract) Annual Meeting of the Department of Computer Engi-
neering and Industrial Automation, UNICAMP, Campinas - SP, 4p. 2010.
(In Portuguese).
Leite, D.; Bergo, L.; Costa, P.; Gomide, F. “Evolving granular neural net-
works in systems modeling.” 9th Brazilian Congress on Neural Networks,
Ouro Preto - MG, 5p. 2009. (In Portuguese).
Leite, D.; Costa, P.; Gomide, F. “Evolving granular neural networks.” 9th
Brazilian Symposium on Intelligent Automation, Brasilia - DF, 6p. 2009.
(In Portuguese).
Leite, D.; Costa, P.; Gomide, F. “Evolving connectionist systems.” 9th
Brazilian Symposium on Intelligent Automation, Brasilia - DF, 6p. 2009.
(In Portuguese).
Leite, D.; Gomide, F. “Interval-based evolving modeling for streamflow
forecasting.” (Extended abstract) Annual Meeting of the Department of
Computer Engineering and Industrial Automation, UNICAMP, Campinas
- SP, 4p. 2009.
Contents
Acknowledgement
Abstract
Publications

1 Introduction
  1.1 Background Research
  1.2 Objective
  1.3 Contributions
  1.4 Organization

2 Foundations of Granular Computing
  2.1 Introduction
  2.2 Interval Analysis
  2.3 From Interval Analysis to Fuzzy Set Theory
  2.4 Fuzzy Sets
  2.5 Aggregation Operators
  2.6 Summary

3 Evolving Granular Systems
  3.1 Introduction
  3.2 Evolving Intelligent Systems
  3.3 Granular Data Streams
  3.4 Evolving Granular Modeling
  3.5 Time and Space Granulation
  3.6 Summary

4 Interval Based Evolving Modeling
  4.1 Introduction
  4.2 Related Work
  4.3 Structure and Processing
  4.4 Learning in IBeM
  4.5 Summary

5 Fuzzy Set Based Evolving Modeling
  5.1 Introduction
  5.2 Related Work
  5.3 Structure and Processing
  5.4 Learning in FBeM
  5.5 Summary

6 Evolving Granular Neural Networks
  6.1 Introduction
  6.2 Related Work
  6.3 Fuzzy Aggregation Neuron Model
  6.4 Structure and Processing
  6.5 Learning in eGNN
  6.6 Summary

7 Application Examples
  7.1 Introduction
  7.2 Semi-Supervised Classification
  7.3 Time Series Prediction
  7.4 Function Approximation
  7.5 Control
  7.6 Summary

8 Conclusion
  8.1 Summary
  8.2 Contributions
  8.3 Future Research

A Universal Approximation
B Recursive Least Squares Method
References
List of Figures
2.1 Image $f$ of box $I$ and inclusion functions $F$ and $F^*$
3.1 Granular models: (a) single-valued function, (b) granular function
3.2 Time and space granulation
4.1 Expansion region of an IBeM granule
4.2 Creation and recursive adaptation of IBeM granules
4.3 Inter-granular conflict and data accommodation
5.1 Scattering approach for fuzzy data granulation
5.2 Creation and recursive adaptation of FBeM granules
6.1 Fuzzy aggregation neuron model
6.2 Examples of input/output functions of fuzzy aggregation neurons
6.3 Single-valued approximation provided from input data processing
6.4 Granular approximation formed by input and output data granulation
6.5 eGNN single-valued (a) and granular (b) approximation of a function
6.6 Stability-plasticity tradeoff and the role of $\rho$ in eGNN systems
7.1 The rotating Gaussians problem
7.2 ROC curves of different methods for the rotating Gaussians
7.3 eGNN decision boundary and last 200 data at particular time steps
7.4 A third class appears at $h = 200$ and remains
7.5 FBeM evolution of the Acc index, rule base and granularity for the new-class problem
7.6 eGNN decision boundaries for the 3-class problem
7.7 Performance of evolving granular classifiers using different proportions of unlabeled data
7.8 FBeM Death Valley temperature forecasts
7.9 eGNN Helsinki temperature forecasts
7.10 Comparing the narrowness of granular forecasts using rule bases of different sizes
7.11 FBeM processing time and RMSE using different numbers of input variables from temperature time series
7.12 FBeM processing time and RMSE for the Death Valley, Ottawa, and Lisbon time series considering different numbers of rules
7.13 FBeM prediction of the Death Valley, Ottawa, and Lisbon temperature time series combined
7.14 eGNN approximation of the concrete compressive strength function, and evolution of the rule base, error indices, and granularity
7.15 Evolving granular systems results on leave-one-variable-out approach to find less correlated subsets of input variables
7.16 FBeM approximation of the Parkinson’s telemonitoring function, and evolution of the rule base, error indices, and granularity
7.17 Environment for sensor-based navigation
7.18 Initial conditions for the autonomous navigation control problem
7.19 Granular controllers navigating at different speeds
7.20 Detail of the FBeM navigation at different speeds
7.21 FBeM navigating with noisy input
List of Tables
7.1 Rotating Gaussians: comparing evolving/non-evolving methods
7.2 New class problem: comparing evolving granular methods
7.3 Monthly temperature values
7.4 Temperature forecasts
7.5 Concrete compressive strength prediction: evaluation of different types of eGNN neurons
7.6 Concrete compressive strength prediction: evaluating different evolving methods
7.7 Parkinson’s telemonitoring prediction: evaluation of different types of eGNN neurons
7.8 Parkinson’s telemonitoring prediction: evaluating different methods
7.9 Comparison of different evolving granular controllers
Chapter 1
Introduction
The computing world has experienced a rapid growth of information.
A proliferation of automated systems, small scale computing devices, sensor net-
works, and data capture technologies has contributed to the production of large
volumes of data. Data set growth sometimes outpaces available storage capacity
and other times data are stored with no prospective use. Broadly stated, the
focus of data processing and analysis has changed from offline batch processing
of data to the incremental handling of online data streams.
Online data streams originate from a variety of sources such as media enter-
tainment, surveillance systems, mobile devices, multimedia, industrial monitoring
and control, oceanographic and atmospheric systems, health care, stock market,
satellites, financial and meteorological systems, web traffic and clickstreams, to
name a few. Their prominence in real-world systems, along with the necessity
of modeling, analyzing, and understanding these systems, has brought new chal-
lenges, greater demands, and new research directions.
Research and development of conceptual frameworks, methods, and algo-
rithms capable of extracting knowledge from data streams have taken place mo-
tivated by a manifold of relevant applications. Data stream modeling is funda-
mentally based on computational learning approaches that process data continuously in an attempt to find similarities in their spatio-temporal features, and thereafter provide insights about the phenomenon that governs the data.
The ultimate goal is to obtain more abstract (often human-centric) representa-
tions of large amounts of detailed data with no apparent value.
Modeling, processing, and disposing of information become more complex as
real-world systems become more complex. Data streams are characterized by
nonstationarity, nonlinearity, and heterogeneity; they are potentially endless and
may be subject to changes of various kinds. Direct application of machine learning
and data mining algorithms to data streams is very often infeasible because it
is difficult to maintain all the data in memory. A particular challenge faced in
stream modeling concerns how to handle uncertainty.
The primary research question of this thesis is how to obtain accurate and in-
terpretable human-centered models from uncertain data streams. We introduce
evolving granular systems, a granular modeling framework able to capture the
essence of uncertain data streams in a more abstract and compact representa-
tion. While the term ‘evolving’ refers to models whose structure adapts to data streams, the term ‘granular’ comes from granular computing theory and
emphasizes comprehensible models of uncertainty. This thesis combines evolving
intelligence and granular computing concepts and ideas, and explicitly realizes
them into a practical evolving granular framework.
1.1 Background Research
Uncertainty is an attribute of information since our ability to perceive reality is
often limited (27) (154). The more complex a system is, potentially, the more
uncertain we are of the available information, and the more imprecise is our un-
derstanding of that system. The imprecision of real-world perception is evident
in natural languages and also in empirical measurements where it is known that
the process which generated the data is uncertain. As Kreinovich stated, mea-
surements and expert estimates are never exact (72). Modeling complex systems
raises doubts about the necessity of precise models (118). Granular computing
theory (15) (89) (116) (141) (144) (151) hypothesizes that accepting some level of
uncertainty may be beneficial and therefore suggests a balance between precision
and uncertainty.
Information granulation for uncertainty representation is a fundamental man-
ifestation of human knowledge (15). Information granulation means that,
instead of dealing with detailed real-world data, the data are considered in a
more abstract and conceptual perspective. The result of information granulation
is called an information granule - a granule being a cluster of points put together
by indistinguishability, similarity, proximity, or functionality (152). Examples of
granules include hyperboxes, fuzzy sets, bell-shaped probability distributions and
rough sets (153). A set of granules constitutes a vocabulary of generic descrip-
tors (115), and underlies the basic concepts of linguistic variable and rule-based
systems. Put differently, granules are semantically meaningful building blocks
of granular rule-based systems (18). Granular rules connect the elements of the
vocabulary and therefore play an important role in computation with information
described in natural language.
The notion of granulation emerged as a natural need to abstract and sum-
marize information and data to support various processes of comprehension and
decision making (15). For example, when we observe an environment we seldom
take into account all of the details of that environment. Because of our physical
and cognitive limitations, a reduced number of samples, variables, and attributes
of interest are brought into focus. To avoid distracting details we are provided
with effective abstraction mechanisms. Detailed numeric data are integrated (ag-
gregated) into kinds of information granules where the granules themselves are
regarded as sets of elements that are perceived as being functionally equivalent
(116). There are close relations between granulation, data mining (135), data
fusion (86), and knowledge discovery (99).
From a more practical point of view, granular computing is a framework for
problem solving, complexity reduction and structured thinking (142). It deals
with data granulation and granular data processing. Information granulation
splits a complex problem into simpler sub-problems and treats them on an in-
dividual basis. Granular systems able to self-adapt their structures from data
streams have only been formally investigated since the early 2000s.
Granular models developed from data streams can be expressed in several
computationally tractable frameworks such as interval mathematics, statistics,
fuzzy sets, rough sets, shadow sets, cluster analysis, decision trees, neighborhood
systems, or hybrids. On top of these are generalized constraints, in the sense
of Zadeh’s general theory of uncertainty (154), which are used to delimit and
represent granules within the different frameworks. Computing with granules
grants ample freedom to choose representative granular objects and handling
tools. Regardless of the framework chosen, online granulation aims to retain the
essence of stream data as granular objects. Online granular computing models
consider online granular data streams under simpler (less detailed) resolutions.
The fundamental objective is to extract features of interest from the data to
attain efficient solutions and a better rapport with reality.
Evolving intelligent systems are a mainstream of research in online data mod-
eling (5) (11) (65) (66) (97). These systems encompass one-pass recursive al-
gorithms (algorithms independent of previous data) and manage to build the
structure of models from scratch as new information arises. We use the term
‘evolving’ in the sense of gradual development of the system structure (rule base
or the architecture of a neural network) and their parameters. This learning
paradigm mimics the evolution of individuals during their life-cycle, especially
humans: learning from experience, inheritance, and gradual change. Knowledge
can be generated from repetitive tasks and from data streams produced through
perceptions and sent to the brain. The development of the rule-base or neural
network structure is gradual, where the rules/neurons are not fixed or pre-defined.
Evolving systems generate new rules (neurons) each time new data does not fit
into the existing model/understanding (fuzzy rule-base or neural network), but at
the same time only when this new data is informative enough (9) (11). Classifi-
cation, clustering, frequent pattern mining, time series prediction, regression and
control are examples of problems addressed in the evolving systems literature.
Note that stream data modeling should not be confused with time series data
modeling (51). Although related, time series carry static objects that can, in
principle, be analyzed offline whereas stream data require the evolution of the
model structure in online mode, which is not a requirement in time series analysis
(21). Particularly, conventional statistical (54), computational intelligence (44),
and machine learning (101) systems do not meet the requirements of data stream
modeling because they assume forms of linearity and stationarity, or demand
multiple passes over entire data sets and offline processing. Very often there are
real-time constraints that must be met by stream algorithms. Evolving intelligent
systems arose as a framework to model online data streams and overcome the
drawbacks of existing conventional systems.
Currently, a number of evolving intelligent systems have succeeded in dealing
with time-varying numeric data by means of recursive clustering algorithms and
adaptive local models. Notwithstanding, these systems are quite often unable
to process granular data and realize granular-data-stream-oriented computing in
unknown nonstationary environments. Informally, if $u$ is a value which is known precisely (exactly), we refer to $u$ as a singular (point) value. Conversely, if $u$ is not known precisely, but there is some information which constrains possible values of $u$, then the constraint on $u$ defines a granular value (155).
This thesis introduces evolving granular systems, which extend evolving intel-
ligent systems in two ways. First, evolving granular systems deal with granular
input and output data such as intervals, fuzzy numbers, and fuzzy intervals.
Granular data may arise from expert judgment, readings from unreliable sen-
sors, and summaries of numeric data over time periods. Interval and fuzzy data
stream modeling generalizes numeric data stream modeling by allowing interval
and fuzzy data granulation. Numeric (singular) data stream is a special case of
granular data stream. Second, evolving granular systems provide granular ap-
proximation of functions. Granular approximation refers to an enclosure that contains the output data. The granular approxima-
tion may come with a linguistic description, in addition to a numeric pointwise
approximation common of evolving intelligent systems. Granular output is useful
for interpretability purposes and helps to enhance model acceptability.
Numeric, interval and fuzzy granular data streams and interval, fuzzy and
neurofuzzy granular frameworks are of special concern to this study. Indepen-
dent of the data type and framework, evolving granular systems aim to pro-
vide transparent rule-based models that are built from a data sequence. Interval
mathematics (60) (69) (104) and fuzzy set theory (41) (159), as practical frame-
works of granular computing, capture our innate conception of transitional set
belonging and uncertainty. While ‘below 100’, ‘around 10 and 20’, and ‘above
100’ are examples of interval data, ‘about 20’, and ‘around 90’ are examples of
fuzzy data. The fundamental distinction between the interval and fuzzy granular
frameworks concerns the notion of partial membership supported by fuzzy sets.
Whenever interval yes-or-no quantification of concepts becomes too restrictive,
fuzzy sets offer an important feature of describing information granules whose
constituting elements may belong only partially, i.e., more-or-less quantification
of concepts. Fuzzy sets avoid specifying solid borders between full belongingness
and full exclusion by means of smooth transition boundaries (116). Artificial
neural networks (55) are nonlinear, highly plastic systems equipped with signif-
icant learning capability. Fuzzy sets and fuzzy neurons provide neural networks
with mechanisms of approximate reasoning and transparency of the resulting con-
struction. Fuzzy sets and neurocomputing are complementary in terms of their
strengths thus motivating neurofuzzy granular computing. Granules formalized
in any of these frameworks, interval, fuzzy or neurofuzzy, can facilitate a vast
array of human-centric pursuits (116).
1.2 Objective
The main objective of this thesis is to introduce and characterize a theoretical
evolving granular modeling framework and a suite of practical approaches to learn
from and process uncertain data streams with a focus on accuracy, transparency
and interpretability of models.
Many issues arose during the course of this research which had to be overcome
in order to achieve the main objective. The research issues included:
- how to process interval and fuzzy types of data;
- how to fit uncertain data into rule-based granular models;
- how to create, delete and refine granules and rules without the need to redesign and retrain the system structure from scratch;
- how to analyze large volumes of stream data efficiently;
- how to adjust the granularity of models based on stream data;
- how to obtain more flexible granular constructs;
- how the interval, fuzzy and neurofuzzy approaches compare to each other and to alternative approaches.
Other, more subtle, issues are discussed in context throughout the chapters.
1.3 Contributions
The contributions of this thesis can be broadly divided into three groups: con-
ceptual, methodological and computational.
The conceptual contribution is the introduction of a new modeling framework
to represent and process granular data streams. The framework allows input and
output data to be real numbers, intervals, and fuzzy sets. We discuss the notion of
granular data streams as well as learning and model building driven by such data
streams. Central to the proposed framework is not only computational efficiency,
but also interpretability and transparency. The framework intends to develop
human-oriented models whose results are readily understood. Formulations and
conceptualizations are provided which are intended to establish foundations for
online granular data processing and uncertainty management.
The methodological contributions of this thesis are three practical approaches
to handle granular data streams. The approaches are oriented to different types
of input and output data and each is supported by concepts and tools derived
from different theories. In common, all three approaches are designed to capture the
very essence of the underlying data stream.
First, we introduce interval-based evolving modeling. Interval-based evolving
modeling (IBeM) is a granular approach to enclose imprecise data and produce
rule-based summary. IBeM emphasizes imprecise data manifesting as tolerance
intervals and learning procedures grounded in fundamentals of interval mathemat-
ics. Antecedent and consequent parts of interval rules are interval hyperboxes,
which are linked by an interval granular mapping - or inclusion function in the
interval analysis terminology. The interval granular approach to systems modeling
makes no specific assumption about the data including probability distributions,
membership functions and belief or possibility values.
Second, we address fuzzy set based evolving modeling. Fuzzy set based
evolving modeling (FBeM) employs fuzzy granular models to deal with more de-
tailed fuzzy granular data and therefore provide a more comprehensible (human-
intelligible) representation of the data. For each fuzzy model there exists an asso-
ciated fuzzy rule base. The structure of the fuzzy rule base is gradually developed
by an incremental learning algorithm suitable to process potentially unbounded
fuzzy data streams. FBeM renders linguistic models of information systems and
single-valued and granular approximation of nonstationary functions.
Third, we consider evolving neurofuzzy networks. Evolving granular neural
networks (eGNN) use fuzzy granules and fuzzy aggregation neurons for informa-
tion fusion. The fuzzy aspect allows a neural network to be translated into a
knowledge base and a rule-based inference system that can be promptly read and
understood. The eGNN learning algorithm is committed to building and adapt-
ing the neural network structure using fuzzy data streams. It may add or remove
granules, neurons and respective connections whenever necessary. This means
that the neural network captures new information from data streams, adapts
itself to the new scenario, and avoids redesigning and retraining.
The third primary contribution of this thesis concerns an extensive set of com-
putational results detailing the performance and demonstrating the usefulness of
the proposed approaches. The interval IBeM, fuzzy FBeM, and neurofuzzy eGNN
approaches are evaluated in a variety of applications such as semi-supervised clas-
sification, time series prediction, function approximation, and control. The ap-
plication examples emphasize the difficulty of currently existing methods to deal
with nonstationary data streams. The results demonstrate the competitiveness
of the proposed evolving granular approaches and framework.
1.4 Organization
This thesis is organized into eight chapters as summarized below.
This chapter contains a general statement of the problem dealt with in this
thesis and places the research into a broader perspective by connecting it
to well-established information systems theories.
Chapter 2 provides a review on concepts of granular computing and uncer-
tainty processing. We provide essentials from interval analysis, fuzzy sets,
and aggregation operators to form a background of concepts and support
our developments.
Chapter 3 covers the state-of-the-art research in evolving intelligent systems
and introduces a theoretical framework for the analysis and representation
of granular data.
Chapter 4 introduces an interval learning method. Interval-based evolving
modeling is an approach to deal with stream interval data. Interval granules
are characterized by sharp lower and upper bounds and empty content.
We present details along with some intuition behind learning heuristics of
interval algorithms.
Chapter 5 presents a fuzzy extension of the aforementioned interval method.
Fuzzy set based evolving modeling uses fuzzy data streams to develop rule-
based fuzzy granular models. Fuzzy sets avoid specifying solid borders
between full belongingness and full exclusion by means of sets with partial
membership.
Chapter 6 proposes evolving granular neural networks. These networks use
fuzzy granules and fuzzy neurons for information fusion and uncertainty rep-
resentation. The underlying granular construction is incrementally evolved
from a learning algorithm. It pictures a set of fuzzy rules and a fuzzy
inference system, which are obtained from fuzzy data streams.
Chapter 7 addresses application examples of evolving granular systems in
semi-supervised classification, function approximation, time series predic-
tion, and control problems. They are accompanied with discussions and
comparisons. We contrast the methods introduced in chapters 4, 5 and 6
with the state-of-the-art online and traditional offline methods.
Chapter 8 concludes this thesis and proposes future research directions.
Chapter 2
Foundations of Granular Computing
This chapter provides definitions and principles of granular computing. Essen-
tial notions of interval analysis and fuzzy sets are addressed from the granular
computing point of view. Some notation to be used throughout this thesis is in-
troduced. The chapter also covers different types of aggregation operators which
map several real inputs in the unit hypercube onto a single output in the unit
interval. Aggregation operators perform information fusion by gathering large
volumes of dissimilar information into a more compact form. Intervals and fuzzy
sets are instances of practical frameworks of granular computing.
2.1 Introduction
Theories and methodologies that make use of granules to solve problems involving huge amounts of data, information, and knowledge constitute a multidisciplinary area of study called Granular Computing (15) (89) (117) (141) (144) (151). Granular computing, as a paradigm of information processing, spotlights multiple levels of data detail to provide useful abstractions and approximate solutions to hard real-world problems (18) (19) (118) (146).
Granular information systems have appeared under different names in related
fields such as interval analysis, fuzzy and rough sets, divide and conquer, quotient
space theory, information fusion, and others (see (141)). Elementary processing
units in granular systems are referred to as information granules. An information
granule is defined as a clump of entities that may originate at the numeric (singu-
lar) or granular level and are arranged together due to their similarity, proximity,
indistinguishability, or coherency.
The goal of a granule is to catch the very essence of the overall data in a
concise and explainable manner (15) (118); it defines a subset of a universal
set and conveys an internal representation. Granules may be interpreted from
two points of view: from the perspective of uncertainty theory, they are units
lacking precise knowledge; from that of knowledge engineering, they are units of
elementary knowledge.
Granular computing is intended to identify manifestations of granules by moving back and forth among granularities to yield more or less differentiation.
Too much detail is wasteful whereas too little renders a system useless. In general,
there is no universal level of granularity of information: the size of granules is
problem-oriented and user-dependent. Granularity is defined as the extent to
which a larger and more complex system is broken down into smaller and simpler
parts. We can quantify the granularity of a granule, for example, by counting its
number of elements. The more elements are located in a granule, the lower is its
granularity, and the higher is its generality (116). High granularities can produce
substantial computational overhead for data storage. In excess, granularities
and granules bring undesirable scalability issues such as incapacity to satisfy the
required throughput. The granularity of information that is explicitly inbuilt
into granules provides useful features in information systems modeling such as
transparency and flexibility.
Let the result of data granulation be designated as a granular structure. A
granular structure is a family of granules which, when considered together, re-
assemble the more complex original system. Handling a complex phenomenon by
means of granular structures allows us to arrive at meaningful solutions. Based on
some carefully chosen granularity, granular computing systems attempt to solve
a problem by isolating its loosely connected sub-problems and handling them on
an individual basis.
Granules of multiple sizes are related to the depth of penetration that characterizes a system. A coarse granular structure contains fewer granules than a fine granular structure. This can be stated more precisely as follows. A coarse granular system comprises a small number of large granules, usually characterized by low precision and high interpretability. A fine granular system comprises a large number of small granules, high precision, and limited interpretability. Low-level refined granules provide details about the system functionality.
More abstract, high-level granules are easier to manage and interpret, but may
lose important minutiae.
Input and output data sets generate input and output granular structures,
respectively, which should be somehow connected. We name the correspondence
between input and output granular structures as granular mapping. A granular
mapping is defined over information granules lying in an input space and maps
them into a collection of granules expressed in some output space. Granular
mappings can be encountered quite frequently in rule-based systems, where the
mapping is given as If-Then statements (18).
In granular computing, everything, including data, variables and parameters,
is allowed to be granular. In general, inaccurate measurements and perception-based information are granular, for example: ‘x is small’, ‘approximately 90’, ‘temperature is high’, ‘probability is high’, [20, 25]. In this sense, a granular system provides NL-capability (126), that is, capability to operate on information
described in Natural Language. NL-capability is important because much of hu-
man knowledge is described in natural language. Imprecision of human sensory
organs and brain is passed on to natural language (154). More specifically, when
a proposition expressed in a natural language is represented as a system of gen-
eralized constraints (153), it is, in effect, a granular system. Computation with
information described in natural language ultimately reduces to computation with
granular values.
Computing with granules brings together existing formalisms of interval anal-
ysis, fuzzy sets, rough sets, etc. under one roof. In spite of several visible distinct
underpinnings of these theories, they exhibit fundamental synergies, which are
exploited in the granular computing framework (117).
2.2 Interval Analysis
2.2 Interval Analysis
Interval analysis is a branch of mathematics that provides reliable numerical
tools for problem solving; it treats an interval both as a set and as a number (53)
(60) (69) (103) (104) (109). While arithmetic performs operations on numbers,
interval arithmetic performs operations on intervals. Generally speaking, intervals
are instances of granules. Granular computing materializes in the framework of
interval analysis and provides features for interpretability.
Interval analysis is a theory oriented toward computational implementation
because it supports the development of interval-based granular algorithms. These
algorithms are mainly designed to automatically provide rigorous bounds on ap-
proximation errors, rounding errors, and propagated uncertainties in initial data.
This is of utmost importance because modeling of complex systems must com-
promise between complexity and precision. Operations involving imprecise objects must
consider the nature of the imprecision.
The main concern of interval analysis is to provide a guaranteed approx-
imation of the set of solutions of the underlying problem. ‘Guaranteed’ in this
context means that outer approximations (enclosure) of intervals can always be
obtained and, moreover, be made as precise as desired when further information
yields intervals of narrower width. Intervals acknowledge limited precision by as-
sociating with a variable of the model under investigation a set of reals as possible
values. For ease of storage and fast computation, these sets are restricted to inter-
vals (56). Essentials of interval theory, which form a background of fundamentals
for our investigations, are summarized next.
2.2.1 Interval Vectors
An interval $I$ is a closed bounded set of real numbers

$$[l, L] = \{x : l \leq x \leq L\}, \qquad (2.1)$$

where $l$ and $L$ denote its endpoints. An $n$-dimensional interval vector is an ordered $n$-tuple of intervals $(I_1, \ldots, I_j, \ldots, I_n)$. If $I$ is, e.g., a two-dimensional interval vector, then $I = (I_1, I_2)$ for some $I_1 = [l_1, L_1]$ and $I_2 = [l_2, L_2]$.
Set-theoretic operations of intersection, $\cap$, and union, $\cup$, are applicable to intervals. The intersection of two intervals, $I^1$ and $I^2$, is empty, $I^1 \cap I^2 = \emptyset$, if either $l^1 > L^2$ or $L^1 < l^2$. This indicates that $I^1$ and $I^2$ have no common points. Otherwise, the intersection of $I^1$ and $I^2$ is again an interval:

$$I^1 \cap I^2 = [\max(l^1, l^2), \min(L^1, L^2)]. \qquad (2.2)$$

The intersection of interval vectors is empty if the intersection of any of their items is empty. Otherwise, for $I^1 = (I^1_1, \ldots, I^1_j, \ldots, I^1_n)$ and $I^2 = (I^2_1, \ldots, I^2_j, \ldots, I^2_n)$ we have:

$$I^1 \cap I^2 = (I^1_1 \cap I^2_1, \ldots, I^1_j \cap I^2_j, \ldots, I^1_n \cap I^2_n). \qquad (2.3)$$
If two intervals have nonempty intersection, then their union,

$$I^1 \cup I^2 = [\min(l^1, l^2), \max(L^1, L^2)], \qquad (2.4)$$

is an interval. Disconnected sets must not be expressed as a single interval.
The convex hull of two interval vectors, $I^1$ and $I^2$, namely $ch(I^1, I^2)$, is the smallest interval vector containing all their elements. Then,

$$ch(I^1_j, I^2_j) = [\min(l^1_j, l^2_j), \max(L^1_j, L^2_j)], \quad j = 1, \ldots, n. \qquad (2.5)$$

Hull computation is an efficient procedure to combine sets independently of their connection. It follows that $I^1 \cup I^2 \subseteq ch(I^1, I^2)$ for any $I^1$ and $I^2$.
If $I^1 = (I^1_1, \ldots, I^1_j, \ldots, I^1_n)$ and $I^2 = (I^2_1, \ldots, I^2_j, \ldots, I^2_n)$ are interval vectors, then

$$I^1 \subseteq I^2 \text{ if and only if } I^1_j \subseteq I^2_j, \quad j = 1, \ldots, n. \qquad (2.6)$$
We denote the width of an interval vector, namely $wdt(I)$, as the length of its largest side:

$$wdt(I) = \max(wdt(I_1), \ldots, wdt(I_j), \ldots, wdt(I_n)), \qquad (2.7)$$

where

$$wdt(I_j) = L_j - l_j, \quad j = 1, \ldots, n. \qquad (2.8)$$

Finally, it is worth defining the midpoint of an interval $I$:

$$mp(I) = \frac{l + L}{2}. \qquad (2.9)$$

Analogously, if $I = (I_1, \ldots, I_j, \ldots, I_n)$ is an interval vector, then:

$$mp(I) = (mp(I_1), \ldots, mp(I_j), \ldots, mp(I_n)). \qquad (2.10)$$
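The set operations above translate directly into code. The following minimal Python sketch (illustrative helper functions under assumed names, not part of the thesis software) implements intersection (2.2), convex hull (2.5), width (2.8), and midpoint (2.9) for intervals represented as (l, L) pairs; the vector versions follow component-wise:

    # Intervals are (l, L) tuples with l <= L.

    def intersection(i1, i2):
        # Eq. (2.2): None encodes the empty intersection (l1 > L2 or L1 < l2).
        l, L = max(i1[0], i2[0]), min(i1[1], i2[1])
        return (l, L) if l <= L else None

    def hull(i1, i2):
        # Eq. (2.5): smallest interval containing both operands.
        return (min(i1[0], i2[0]), max(i1[1], i2[1]))

    def width(i):
        # Eq. (2.8): L - l.
        return i[1] - i[0]

    def midpoint(i):
        # Eq. (2.9): (l + L) / 2.
        return (i[0] + i[1]) / 2

    # [1, 3] and [2, 5] overlap on [2, 3]; their hull is [1, 5].
    assert intersection((1, 3), (2, 5)) == (2, 3)
    assert hull((1, 3), (2, 5)) == (1, 5)
    assert width((1, 5)) == 4 and midpoint((1, 5)) == 3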
2.2.2 Interval Arithmetic
Operations on real numbers can be extended to intervals. Interval arithmetic treats intervals as numbers: adding, subtracting, multiplying, and dividing them. The rules for interval addition and subtraction are:

$$I^1 + I^2 = [l^1, L^1] + [l^2, L^2] = [l^1 + l^2, L^1 + L^2], \qquad (2.11)$$

$$I^1 - I^2 = [l^1, L^1] - [l^2, L^2] = [l^1 - L^2, L^1 - l^2]. \qquad (2.12)$$

Operations of addition and subtraction for interval vectors are understood to be component-wise. For two interval vectors, $I^1 = (I^1_1, \ldots, I^1_j, \ldots, I^1_n)$ and $I^2 = (I^2_1, \ldots, I^2_j, \ldots, I^2_n)$, we have

$$I^1 + I^2 = (I^1_1 + I^2_1, \ldots, I^1_j + I^2_j, \ldots, I^1_n + I^2_n), \qquad (2.13)$$

$$I^1 - I^2 = (I^1_1 - I^2_1, \ldots, I^1_j - I^2_j, \ldots, I^1_n - I^2_n). \qquad (2.14)$$
For the product of two independent intervals, $I^1$ and $I^2$, we get

$$I^1 I^2 = \{x^1 x^2 : x^1 \in I^1, x^2 \in I^2\}. \qquad (2.15)$$

Clearly, the result is again an interval, say $I^3$, whose endpoints are

$$[l^3, L^3] = [\min(l^1 l^2, l^1 L^2, L^1 l^2, L^1 L^2), \max(l^1 l^2, l^1 L^2, L^1 l^2, L^1 L^2)]. \qquad (2.16)$$

The reciprocal of an interval $I$ yields:

$$1/I = \{1/x : x \in I\}. \qquad (2.17)$$

If $I$ is an interval not containing the number 0, then $1/I = [1/L, 1/l]$, since $1/x$ is decreasing on any interval that excludes 0. In case $I$ contains 0, so that $l \leq 0 \leq L$, the set is unbounded and cannot be represented as an interval whose endpoints are real numbers. For the quotient of two intervals, we have:

$$I^1 / I^2 = I^1 (1/I^2) = \{x^1 / x^2 : x^1 \in I^1, x^2 \in I^2\}. \qquad (2.18)$$

$I^1 / I^2$ is again an interval if 0 is not contained in $I^2$; $I^1$ and $I^2$ are independent. The product and quotient operations for interval numbers carry over to interval vectors. For two interval vectors, $I^1 = (I^1_1, \ldots, I^1_j, \ldots, I^1_n)$ and $I^2 = (I^2_1, \ldots, I^2_j, \ldots, I^2_n)$, it follows that:

$$I^1 I^2 = (I^1_1 I^2_1, \ldots, I^1_j I^2_j, \ldots, I^1_n I^2_n), \qquad (2.19)$$

$$I^1 / I^2 = (I^1_1 / I^2_1, \ldots, I^1_j / I^2_j, \ldots, I^1_n / I^2_n). \qquad (2.20)$$
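As a sketch of the arithmetic rules above (again illustrative code, not the thesis implementation), the operations (2.11), (2.12), (2.16), (2.17), and (2.18) can be written directly in Python:

    # Intervals are (l, L) tuples with l <= L.

    def add(i1, i2):
        # Eq. (2.11): [l1 + l2, L1 + L2].
        return (i1[0] + i2[0], i1[1] + i2[1])

    def sub(i1, i2):
        # Eq. (2.12): [l1 - L2, L1 - l2].
        return (i1[0] - i2[1], i1[1] - i2[0])

    def mul(i1, i2):
        # Eq. (2.16): min/max over the four endpoint products.
        p = (i1[0]*i2[0], i1[0]*i2[1], i1[1]*i2[0], i1[1]*i2[1])
        return (min(p), max(p))

    def reciprocal(i):
        # Eq. (2.17): 1/I = [1/L, 1/l]; unbounded if I contains 0.
        if i[0] <= 0 <= i[1]:
            raise ValueError("interval contains 0; 1/I is unbounded")
        return (1 / i[1], 1 / i[0])

    def div(i1, i2):
        # Eq. (2.18): I1 / I2 = I1 * (1/I2).
        return mul(i1, reciprocal(i2))

    # [1, 2] - [0, 1] = [0, 2]; [1, 2] / [2, 4] = [0.25, 1.0].
    assert sub((1, 2), (0, 1)) == (0, 2)
    assert div((1, 2), (2, 4)) == (0.25, 1.0)

Note that interval subtraction is not the inverse of addition: $I - I = [l - L, L - l]$ generally contains 0 without being degenerate, which is one reason rigorous interval bounds tend to widen as computations proceed.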
2.2.3 Distance Between Intervals
A suitable metric to measure the distance between two intervals, $I^1$ and $I^2$, is:

$$d(I^1, I^2) = \max(|l^1 - l^2|, |L^1 - L^2|). \qquad (2.21)$$

With this metric, the correspondence between the interval number system and the real number system, $[x, x] \leftrightarrow x$, holds (106). The metric $d(\cdot)$ preserves the distance between the corresponding items. We have that

$$d([x^1, x^1], [x^2, x^2]) = \max(|x^1 - x^2|, |x^1 - x^2|) = |x^1 - x^2| \qquad (2.22)$$

for any $x^1$ and $x^2$. The real line is isometrically embedded into the metric space of intervals (106).

The distance between two interval vectors, $I^1 = (I^1_1, \ldots, I^1_n)$ and $I^2 = (I^2_1, \ldots, I^2_n)$,

$$d(I^1, I^2) = (\max(|l^1_1 - l^2_1|, |L^1_1 - L^2_1|), \ldots, \max(|l^1_n - l^2_n|, |L^1_n - L^2_n|)), \qquad (2.23)$$

is a vector whose components are the distances between corresponding intervals. Sometimes we are more interested in a single number to represent the overall distance between interval vectors. A measure for the overall distance between two interval vectors, $I^1$ and $I^2$, is

$$D(I^1, I^2) = \max(d(I^1, I^2)). \qquad (2.24)$$
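A direct transcription of this metric, as a sketch under the same (l, L) representation used in the earlier snippets:

    def dist(i1, i2):
        # Eq. (2.21); on degenerate intervals [x, x] it reduces to |x1 - x2|,
        # as in Eq. (2.22).
        return max(abs(i1[0] - i2[0]), abs(i1[1] - i2[1]))

    def dist_overall(v1, v2):
        # Eq. (2.24): overall distance between two interval vectors.
        return max(dist(i1, i2) for i1, i2 in zip(v1, v2))

    assert dist((1, 3), (2, 5)) == 2
    assert dist((4, 4), (7, 7)) == 3  # the real line embeds isometrically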
2.2.4 Interval Functions
Consider a real-valued function $f(x)$ and a corresponding interval-valued function $f(I)$. $f(I)$ is a united extension of $f(x)$ if $f(I) = f(x)$ for any value of $x \in I$. If the parameters of $f(I)$ are degenerated, then $f(I)$ is a degenerated interval equal to $f(x)$. Formally, the image of an interval $I$ under a real mapping $f$ is
$$f(I) = \{f(x) : x \in I\}. \qquad (2.25)$$

More generally, the image of a specified $n$-dimensional vector $I$ admitting a multivariable real function $f$ is:

$$f(I_1, \ldots, I_j, \ldots, I_n) = \{f(x_1, \ldots, x_j, \ldots, x_n) : x_j \in I_j \ \forall j\}. \qquad (2.26)$$

Generally, the image of an interval through $f$ is not a box (see Fig. 2.1) and it may be difficult to obtain in closed form. In practice, $f(I)$ can be approximated by an inclusion function $F(I)$, which is a box in the range of $f$ if $f$ is continuous. An interval function $F$ from $\mathbb{IR}^n$ to $\mathbb{IR}^m$ is called an interval inclusion function of $f$ if

$$f(I) \subseteq F(I) \quad \forall I \in \mathbb{IR}^n. \qquad (2.27)$$

Inclusion functions are not unique and they depend on how we choose $F$. An inclusion function is optimal if $F^*(I)$ is the interval hull of $f(I)$. In other words, the optimal interval inclusion function for $f(I)$ is the smallest box $F^*(I)$ that contains $f(I)$. Figure 2.1 illustrates the idea. $F^*(I)$ is unique.

Figure 2.1: Image $f$ of box $I$ and inclusion functions $F$ and $F^*$
In particular, for degenerated intervals $I$, it follows that:

$$F(I) = f(I) = F^*(I). \qquad (2.28)$$

Consider $f$ monotonically increasing in $I = [l, L]$. Then, assuming continuity or upper semicontinuity of $f$, we can obtain $f(I)$ using:

$$f(I) = [f(l), f(L)]. \qquad (2.29)$$

Consequently,

$$f(x) \in [f(l), f(L)] \quad \forall x \in I. \qquad (2.30)$$

With monotonic decreasing functions, we order the resulting endpoints properly; in these cases $f(I) = [f(L), f(l)]$, and the inclusion relationship still holds.

Nonmonotonic functions can be monotonic under endpoint constraints. For example, $f(I) = \sin(I)$ is not monotonic in general, but defining $I = [-\pi/2, \pi/2]$, then $f(I)$ is monotonic and $f(I) = \sin(I) = [\sin(l), \sin(L)]$.

An interval function $f(I)$ is inclusion isotonic when, for any interval vectors $I^1$ and $I^2$,

$$\text{if } I^1 \subseteq I^2, \text{ then } f(I^1) \subseteq f(I^2). \qquad (2.31)$$

Finite interval arithmetic (104) is inclusion isotonic. Let $\circ$ denote any of the operations of addition, subtraction, multiplication, and division; thus

$$I^1 \circ I^2 \subseteq I^3 \circ I^4 \qquad (2.32)$$

holds whenever $I^1 \subseteq I^3$ and $I^2 \subseteq I^4$. In this thesis all interval enclosures are inclusion isotonic interval extensions of real-valued continuous functions.
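The gap between an inclusion function $F$ and the optimal $F^*$ can be seen in a small sketch. Evaluating $f(x) = x \cdot x$ with the interval product (2.16), the natural interval extension, yields a valid but pessimistic enclosure when $I$ contains 0, whereas the exact image of $x^2$ is easy to write down for this simple $f$ (illustrative code; the function names are ours):

    def mul(i1, i2):
        # Interval product, Eq. (2.16).
        p = (i1[0]*i2[0], i1[0]*i2[1], i1[1]*i2[0], i1[1]*i2[1])
        return (min(p), max(p))

    def F(i):
        # Natural interval extension of f(x) = x*x: an inclusion function (2.27).
        return mul(i, i)

    def F_star(i):
        # Optimal inclusion F*(I): the exact image of x^2 over I.
        lo = 0.0 if i[0] <= 0 <= i[1] else min(i[0]**2, i[1]**2)
        return (lo, max(i[0]**2, i[1]**2))

    I = (-1.0, 2.0)
    print(F(I))       # (-2.0, 4.0): encloses f(I) but overestimates
    print(F_star(I))  # (0.0, 4.0): the smallest box containing f(I)

The overestimation arises because the two occurrences of $I$ in mul(i, i) are treated as independent, the so-called dependency problem of interval arithmetic.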
An interval function $f(I) \in \mathbb{IR}$ is called ‘thin’ when it involves only degenerate interval parameters or, equivalently, singular parameters. For instance,

$$f(I) = a_0 + \sum_{j=1}^{n} a_j I_j \qquad (2.33)$$

is thin for $(a_0, \ldots, a_n)$ degenerated intervals. When an interval function involves at least one interval parameter of nonzero width, it is called ‘thick’. This thesis considers thin interval functions only.

Interval analysis goes far beyond what has been covered in this section. For instance, we do not address interval statistics (49), intervals in fuzzy set theory (105), interval integration (106), or complex interval arithmetic (120), but only what is essential to the completeness of this work.
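As an illustration (a sketch, not the thesis code), the thin linear function (2.33) with singular coefficients can be evaluated endpoint-wise:

    def thin_linear(a, intervals):
        # Eq. (2.33): f(I) = a0 + a1*I1 + ... + an*In with real (degenerate)
        # coefficients a = [a0, a1, ..., an]; intervals are (l, L) tuples.
        lo = hi = a[0]
        for aj, (l, L) in zip(a[1:], intervals):
            lo += min(aj * l, aj * L)  # a negative aj swaps the endpoints
            hi += max(aj * l, aj * L)
        return (lo, hi)

    # 1 + 2*[0, 1] - 1*[1, 2] = [-1, 2].
    assert thin_linear([1.0, 2.0, -1.0], [(0.0, 1.0), (1.0, 2.0)]) == (-1.0, 2.0)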
2.3 From Interval Analysis to Fuzzy Set Theory
While interval analysis arose out of a need to analyze error and uncertainty on
digital computers (103), fuzzy set theory arose from a need for more complete and
inclusive mathematical models of uncertainty (149). Relationships between fuzzy
set theory and interval mathematics have been reported by Lodwick (93).
Fuzzy arithmetic (67) is defined by means of the extension principle for fuzzy
sets (70) (149). The extension principle for fuzzy sets is the united extension in
the interval analysis terminology when the fuzzy set is restricted to be an interval
(93). When intervals and fuzzy sets are non-interactive, arithmetic on alpha level
sets is a united extension arithmetic. Both concepts are related fundamentally
through what is known as set functions (131).
From the point of view of intervals as sets, interval analysis can be considered
as a subset of fuzzy set theory. For instance, an interval $[l, L]$ is a trapezoidal fuzzy set $[l, \lambda, \Lambda, L]$ where $l = \lambda$ and $\Lambda = L$ (138).
Fuzzy interval analysis (40) and interval type-2 fuzzy logic systems (100) (150)
are explicit examples of joint efforts between fuzzy set theory and interval analysis
to overcome the difficulties of uncertainty modeling.
Interval analysis and fuzzy set theory are instances of practical frameworks
used to represent granular information and construct granular mappings. Con-
ceptually, intervals and fuzzy sets are different ways to model imprecise quanti-
ties and capture our inherent notion of approximate numbers. ‘Above 100’ and
‘around 1.5 and 1.7’ are instances of intervals whereas ‘approximately 100’ and
‘around 1.6’ are instances of fuzzy sets.
A striking difference between intervals and fuzzy sets comes from the idea
of partial membership intrinsic to fuzzy sets. Whenever interval quantification
becomes too restrictive, fuzzy sets provide an important feature of describing in-
formation granules whose constituting elements may belong only partially. Fuzzy
sets prevent defining hard borders between full belongingness and full exclusion
by means of smooth transition boundaries. Granules formalized in the language
of fuzzy sets support a vast array of human-centric pursuits (116).
2.4 Fuzzy Sets
Fuzzy sets (70) (149) constitute one of the most influential notions in science and
engineering. A fuzzy set captures in a granular way the manner in which much of physical phenomena is observed and described. Fuzzy information granulation
underlies the basic concepts of linguistic variables, fuzzy rules, and fuzzy rule
base (116). In fuzzy set theory, objects, variables and concepts are a matter of
degree. In particular, fuzzy information granulation allows both the incorporation
of domain knowledge and knowledge discovery from data.
Fuzzy sets extend the notion of set by assigning to each element of a reference
set a value representing its degree of membership in the fuzzy set. Membership
values correspond to the degree the element is similar with typical elements rep-
resenting the concept associated with the fuzzy set. This characteristic of fuzzy
sets facilitates the management of the uncertainty carried by such elements.
Concepts and definitions related to fuzzy sets which are useful for our inves-
tigations are summarized in next.
2.4.1 Fuzzy Set Definitions
Fuzzy sets are fully characterized by their membership functions. Any function
A: X → [0,1] may serve as a membership function of a fuzzy set A. In this
thesis we assume trapezoidal membership functions, which are piecewise linear
functions described by four parameters (l, λ, Λ, L). The membership degree of an
element x in the trapezoidal fuzzy set A is

A(x) = \begin{cases} 0, & x < l \\ \frac{x - l}{\lambda - l}, & x \in [l, \lambda[ \\ 1, & x \in [\lambda, \Lambda] \\ \frac{L - x}{L - \Lambda}, & x \in \,]\Lambda, L] \\ 0, & x > L \end{cases}    (2.34)
A fuzzy set A is normal if it produces a membership degree equal to 1 for at
least one element x of the universe X. Denote sup as the supremum value of A
for some element x; then A is normal if

\sup_{x \in X} A(x) = 1.    (2.35)
We denote the support and core of a trapezoidal membership function A, respectively,
as the set of elements of X with nonzero membership degrees in A, and the
set of elements of X with membership degrees equal to 1, that is, for a trapezoidal
membership function A,

supp(A) = \{x \in X \mid A(x) > 0\} = [l, L],  and    (2.36)

core(A) = \{x \in X \mid A(x) = 1\} = [\lambda, \Lambda].    (2.37)
The α-cut of a fuzzy set A, denoted A_α, is the set containing all elements of X
whose membership degrees are greater than the value α. We have

A_\alpha = \{x \in X \mid A(x) > \alpha\}.    (2.38)

Support (α = 0) and core (α = 1) are boundary cases of α-level sets.
A fuzzy set is convex if for all x_1, x_2 \in X and all \kappa \in [0,1] it follows that

A(\kappa x_1 + (1 - \kappa) x_2) \geq \min(A(x_1), A(x_2)).    (2.39)
A fuzzy set A_1 is a subset of A_2 if and only if every element of A_1 is also an
element of A_2:

A_1(x) \leq A_2(x),  for all x \in X.    (2.40)
The midpoint and width of a membership function A are, respectively:

mp(A) = \frac{\lambda + \Lambda}{2},    (2.41)

wdt(A) = L - l.    (2.42)
Intersection and union of two fuzzy sets, say A_1 and A_2, are defined as

(A_1 \cap A_2)(x) = \min(A_1(x), A_2(x))  \;\forall x \in X,    (2.43)

(A_1 \cup A_2)(x) = \max(A_1(x), A_2(x))  \;\forall x \in X.    (2.44)

The convex hull of two trapezoidal fuzzy sets A_1 and A_2 is a trapezoidal fuzzy
set determined as follows:

ch(A_1, A_2) = (\min(l_1, l_2), \min(\lambda_1, \lambda_2), \max(\Lambda_1, \Lambda_2), \max(L_1, L_2)).    (2.45)
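As an illustration of the definitions above, the following Python sketch computes the trapezoidal membership degree (2.34), midpoint (2.41), width (2.42), and convex hull (2.45), assuming a trapezoid is stored as a 4-tuple (l, lam, Lam, L) with l <= lam <= Lam <= L. Names are illustrative only.

def membership(x, A):
    """Trapezoidal membership degree of x in A = (l, lam, Lam, L), Eq. (2.34)."""
    l, lam, Lam, L = A
    if x < l or x > L:
        return 0.0
    if lam <= x <= Lam:
        return 1.0                    # core: full membership
    if x < lam:
        return (x - l) / (lam - l)    # ascending edge on [l, lam[
    return (L - x) / (L - Lam)        # descending edge on ]Lam, L]

def midpoint(A):
    """mp(A) = (lam + Lam) / 2, Eq. (2.41)."""
    return (A[1] + A[2]) / 2.0

def width(A):
    """wdt(A) = L - l, Eq. (2.42)."""
    return A[3] - A[0]

def convex_hull(A1, A2):
    """ch(A1, A2), Eq. (2.45): the tightest trapezoid enclosing A1 and A2."""
    return (min(A1[0], A2[0]), min(A1[1], A2[1]),
            max(A1[2], A2[2]), max(A1[3], A2[3]))

A = (0.0, 1.0, 2.0, 4.0)
print(membership(0.5, A), midpoint(A), width(A))   # 0.5, 1.5, 4.0
print(convex_hull(A, (3.0, 3.5, 5.0, 6.0)))        # (0.0, 1.0, 5.0, 6.0)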
2.4.2 Fuzzy Interval
Granular data may take various forms depending on how they are modeled. They
can be intervals, probability distributions, rough sets, fuzzy numbers, and fuzzy
intervals (42). Fuzzy intervals and fuzzy numbers are instances of fuzzy granular
data. Fuzzy data arise in the realm of expert knowledge, whenever measurements
are inaccurate, variables are hard to quantify precisely, or pre-processing
steps introduce uncertainty in singular data.
A membership function A: X → [0,1] is upper semi-continuous if the set
{x ∈ X | A(x) > α} is closed, that is, if the α-cuts of A are closed intervals. If
the universe X is the set of real numbers and A is normal, with A(x) = 1 ∀x ∈ [λ, Λ],
then A is a model of a fuzzy interval, with monotone increasing function
φ_A: [l, λ[ → [0,1], monotone decreasing function ι_A: ]Λ, L] → [0,1], and zero
otherwise. A fuzzy interval A has the following canonical form:

A(x) = \begin{cases} \varphi_A, & x \in [l, \lambda[ \\ 1, & x \in [\lambda, \Lambda] \\ \iota_A, & x \in \,]\Lambda, L] \\ 0, & \text{otherwise,} \end{cases}    (2.46)

where x is a real number in X. The fuzzy interval A satisfies the conditions of
normality (A(x) = 1 for at least one x ∈ X) and convexity (A(κx_1 + (1 − κ)x_2) ≥
min{A(x_1), A(x_2)}, ∀x_1, x_2 ∈ X, κ ∈ [0,1]). If

\varphi_A = \frac{x - l}{\lambda - l}  and    (2.47)

\iota_A = \frac{L - x}{L - \Lambda},    (2.48)

then the fuzzy membership function (2.46) reduces to the model of the trapezoidal
membership function (2.34). Moreover, when λ = Λ, then A(x) = 1 for a single
element x. In this case the corresponding fuzzy entity is called a fuzzy number
(116). Fuzzy data generalize numeric data by allowing fuzziness.
2.4.3 Similarity Between Fuzzy Sets
Granular data and models are fuzzy objects of trapezoidal nature. In this case,
a useful similarity measure for trapezoids, say A_1 and A_2, is:

S(A_1, A_2) = 1 - \frac{|l_1 - l_2| + |\lambda_1 - \lambda_2| + |\Lambda_1 - \Lambda_2| + |L_1 - L_2|}{4}.    (2.49)

This measure translates the relation between the trapezoids into a number. It
returns 1 for identical trapezoids (indicating the maximum degree of matching
between them) and decreases linearly as A_1 and A_2 move away from each other.
Particularly, equation (2.49) is a Hamming-like metric (52) where the parameters
of the trapezoids are compared one by one. A thorough discussion of similarity
and compatibility measures can be found in (33).
The distance between two vectors of trapezoids, say A^1 = (A_1^1, ..., A_n^1) and
A^2 = (A_1^2, ..., A_n^2),

S(A^1, A^2) = 1 - \frac{1}{4n} \sum_{j=1}^{n} \left(|l_j^1 - l_j^2| + |\lambda_j^1 - \lambda_j^2| + |\Lambda_j^1 - \Lambda_j^2| + |L_j^1 - L_j^2|\right),    (2.50)

is also a number, which quantifies their relationship.
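A small Python sketch of the similarity measures (2.49) and (2.50) follows, reusing the (l, lam, Lam, L) tuple representation for trapezoids over normalized data in [0, 1]; names are illustrative.

def similarity(A1, A2):
    """S(A1, A2), Eq. (2.49): parameter-wise absolute differences, averaged."""
    return 1.0 - sum(abs(p1 - p2) for p1, p2 in zip(A1, A2)) / 4.0

def vector_similarity(V1, V2):
    """Eq. (2.50): average the parameter-wise differences over n trapezoids."""
    n = len(V1)
    total = sum(abs(p1 - p2)
                for A1, A2 in zip(V1, V2)
                for p1, p2 in zip(A1, A2))
    return 1.0 - total / (4.0 * n)

A1 = (0.1, 0.2, 0.3, 0.4)
A2 = (0.2, 0.3, 0.4, 0.5)
print(similarity(A1, A1))   # 1.0: identical trapezoids
print(similarity(A1, A2))   # 0.9: decreases linearly with the shift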
2.5 Aggregation Operators
Aggregation operators C: [0,1]^n → [0,1], n > 1, combine input values in the unit
hypercube [0,1]^n into a single output value in [0,1]. They must satisfy two
fundamental properties: (i) monotonicity in all arguments, i.e., given
x^1 = (x_1^1, ..., x_n^1) and x^2 = (x_1^2, ..., x_n^2), if x_j^1 ≤ x_j^2 ∀j then
C(x^1) ≤ C(x^2); (ii) boundary conditions: C(0, 0, ..., 0) = 0 and
C(1, 1, ..., 1) = 1. Important classes of aggregation operators are summarized
below. See (20) (116) for details.
2.5.1 T-norm Aggregation
T-norms (T) are commutative, associative, and monotone operators on the unit
hypercube whose boundary conditions are T(α, α, ..., 0) = 0 and T(α, 1, ..., 1) =
α, α ∈ [0,1]. The neutral element of T-norms is e = 1. An example is the
minimum operator:

T_{min}(x) = \min_{j=1,...,n} x_j,    (2.51)

which is the strongest T-norm because

T(x) \leq T_{min}(x)  for any  x \in [0,1]^n.    (2.52)

The minimum is also idempotent, symmetric, and Lipschitz-continuous. Further
examples of T-norms include the product,

T_{prod}(x) = \prod_{j=1}^{n} x_j,    (2.53)

and the Lukasiewicz T-norm,

T_L(x) = \max\left(0, \sum_{j=1}^{n} x_j - (n - 1)\right).    (2.54)
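The three T-norms above can be sketched in a few lines of Python, acting on a list x of membership degrees in [0, 1]; function names are illustrative.

import math

def t_min(x):
    return min(x)                              # minimum, Eq. (2.51)

def t_prod(x):
    return math.prod(x)                        # product, Eq. (2.53)

def t_lukasiewicz(x):
    return max(0.0, sum(x) - (len(x) - 1))     # Lukasiewicz, Eq. (2.54)

x = [0.9, 0.8, 0.7]
# T(x) <= Tmin(x) for any T-norm, Eq. (2.52):
print(t_min(x), t_prod(x), t_lukasiewicz(x))   # 0.7, 0.504, 0.4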
2.5.2 S-norm Aggregation
S-norms (S) are operators on the unit hypercube which are commutative, associative,
and monotone. S(α, α, ..., 1) = 1 and S(α, 0, ..., 0) = α are the boundary
conditions of S-norms. It follows that e = 0 is the neutral element of S-norms.
S-norms are stronger than T-norms. The maximum operator:

S_{max}(x) = \max_{j=1,...,n} x_j,    (2.55)

is the weakest S-norm, that is,

S(x) \geq S_{max}(x) \geq T(x),  for any  x \in [0,1]^n.    (2.56)
Other examples of S-norms include the probabilistic sum,

S_{prob}(x) = 1 - \prod_{j=1}^{n} (1 - x_j),    (2.57)

and the Lukasiewicz S-norm,

S_L(x) = \min\left(1, \sum_{j=1}^{n} x_j\right).    (2.58)

The dual C^D of an aggregation operator C is

C^D(x_1, ..., x_n) = 1 - C(1 - x_1, ..., 1 - x_n).    (2.59)

Maximum and minimum, probabilistic sum and product, and the Lukasiewicz S- and
T-norms are examples of dual pairs of aggregation operators.
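The S-norms and the duality transform (2.59) admit an equally short Python sketch; as before, names are illustrative and x is a list of degrees in [0, 1].

import math

def s_max(x):
    return max(x)                                    # maximum, Eq. (2.55)

def s_prob(x):
    return 1.0 - math.prod(1.0 - xj for xj in x)     # probabilistic sum, Eq. (2.57)

def s_lukasiewicz(x):
    return min(1.0, sum(x))                          # Lukasiewicz, Eq. (2.58)

def dual(C):
    """Return the dual operator C^D of C, Eq. (2.59)."""
    return lambda x: 1.0 - C([1.0 - xj for xj in x])

x = [0.9, 0.8, 0.7]
print(s_max(x), s_prob(x), s_lukasiewicz(x))         # 0.9, 0.994, 1.0
print(dual(s_max)(x))                                # 0.7 = min(x): the dual of max is min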
2.5.3 Uninorm Aggregation
Uninorms (U) are bivariate, associative and symmetric operators closed under
duality. Similarly as with T-norms and S-norms, associativity allows n-ary
extension of uninorms. Uninorms U: [0,1]^n → [0,1] generalize triangular norms
by relaxing the assumption about the neutral element e, allowing it to take values
in [0,1]. Input values higher than e are interpreted as beneficial, a positive
evidence; input values lower than e are considered detrimental, a negative evidence.
Naturally, when e is equal to 0 a uninorm turns into an S-norm, and when e = 1 the
uninorm becomes a T-norm.
28
2.5 Aggregation Operators
This work considers the following family of uninorms:

U(x) = \begin{cases} e \, T\!\left(\frac{x_1}{e}, ..., \frac{x_n}{e}\right), & \text{if } x \in [0, e]^n \\ e + (1 - e) \, S\!\left(\frac{x_1 - e}{1 - e}, ..., \frac{x_n - e}{1 - e}\right), & \text{if } x \in [e, 1]^n \\ T(x_1, ..., x_n), & \text{otherwise,} \end{cases}    (2.60)

where e ≠ 0 and e ≠ 1. Any pair of T- and S-norms may be used to construct the
uninorm U independently of their properties or duality.
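A sketch of the uninorm family (2.60) in Python follows, assuming T and S are functions over lists of degrees (such as min and max); inputs jointly below e are aggregated by the scaled T-norm, inputs jointly above e by the scaled S-norm, and mixed inputs fall back to the T-norm, as in the text.

def uninorm(x, e, T, S):
    """Family of uninorms of Eq. (2.60), with neutral element e in ]0, 1[."""
    if all(xj <= e for xj in x):
        return e * T([xj / e for xj in x])
    if all(xj >= e for xj in x):
        return e + (1.0 - e) * S([(xj - e) / (1.0 - e) for xj in x])
    return T(x)

# Using the min/max pair: values above e reinforce, values below e penalize.
print(uninorm([0.8, 0.9], 0.5, min, max))   # 0.5 + 0.5*max(0.6, 0.8) = 0.9
print(uninorm([0.2, 0.4], 0.5, min, max))   # 0.5*min(0.4, 0.8) = 0.2
print(uninorm([0.2, 0.9], 0.5, min, max))   # mixed arguments: T(x) = 0.2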
2.5.4 Averaging Aggregation
An aggregation operator C is averaging if for every x ∈ [0,1]^n it is bounded by

T_{min}(x) \leq C(x) \leq S_{max}(x).    (2.61)

The basic rule is that the output value cannot be lower than the lowest input
value nor higher than the highest one. An example of an averaging operator is the
arithmetic mean:

M(x) = \frac{1}{n} \sum_{j=1}^{n} x_j.    (2.62)

Averaging operators are assumed to be idempotent, strictly increasing, symmetric,
homogeneous, and Lipschitz continuous.
2.5.5 Compensatory T-S Aggregation
Compensatory T-S operators combine T-norms and S-norms to counterbalance
their opposite effects. Contrary to uninorm aggregation, T-S aggregation is
uniform in the sense that it does not depend on parts of the underlying domain.
T-S operators use both a T-norm and an S-norm and average the two values
obtained by means of a weighted quasi-arithmetic mean. The linear convex
operator
L(x) = (1 - v) \, T(x_1, ..., x_n) + v \, S(x_1, ..., x_n),    (2.63)

where v ∈ [0,1], is an example of a T-S operator of the family of weighted
quasi-arithmetic means. T-S operators need not be dual in terms of T and S. It
follows that:

S(x) \geq L(x) \geq T(x),  for any  x \in [0,1]^n.    (2.64)
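The linear convex operator (2.63) reduces to a one-line Python sketch; the weight v trades off the pessimism of the T-norm against the optimism of the S-norm, with v = 0 recovering T and v = 1 recovering S.

def t_s_linear(x, v, T, S):
    """Compensatory T-S operator, Eq. (2.63)."""
    return (1.0 - v) * T(x) + v * S(x)

x = [0.9, 0.8, 0.7]
print(t_s_linear(x, 0.5, min, max))   # 0.5*0.7 + 0.5*0.9 = 0.8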
2.6 Summary
This chapter has addressed principles and definitions of granular computing that
are useful for the comprehension of subsequent chapters. We argued that in-
formation granulation plays a primary role both in handling data of uncertain
nature and in representing concepts described in natural language. We empha-
sized interval and fuzzy granular computing frameworks - with intervals and fuzzy
sets being instances of information granules. When processing granular data we
are in fact handling a significant number of similar individual elements at the
same time and therefore ignoring details. This chapter also covered aggregation
operators, which are pertinent for information fusion within a granular computing
environment.
Chapter 3
Evolving Granular Systems
Evolving granular systems are a modeling framework that considers online gran-
ular data stream processing and structurally adaptive rule-based models. As
uncertain data prevail in stream applications, excessive data granularity becomes
unnecessary and inefficient. This chapter starts with the motivation which led to
the development of evolving intelligent systems. We briefly summarize the main
historical landmarks of the research area leading to the state of the art. Next, we
introduce evolving granular systems, which extend evolving intelligent systems
allowing data, variables and parameters to be granular (intervals and fuzzy sets).
The aim of evolving granular systems is to fit the information carried by input-
output data streams from online nonstationary processes into rule-based models
and, at the same time, provide granular approximation of functions and linguistic
description of the system behavior.
3.1 Introduction
Adaptability is of paramount importance for intelligent systems. As Darwin
observed (35), it is neither the strongest nor the most intelligent that survives, but
the most adaptable to change. Building adaptive models from large volumes of
real-world online data flows requires developing non-conventional learning algo-
rithms able to continuously track system and environment changes. Rethinking
traditional data mining and modeling techniques is essential to support structural
adaptation of information systems based on sequences of data, possibly of
uncertain nature.
Because data acquisition systems and small scale computing devices became
mere components of complex systems, large amounts of data have been produced
uninterruptedly. Storage of large-scale data sets and offline processing are fre-
quently impractical, especially in online applications. In addition, data from
different sources may be temporally and spatially related. Online learning algo-
rithms should benefit from time and space data stream correlations to capture
essential information and recursively translate it into structured knowledge. The
effectiveness of data stream-oriented learning algorithms is rooted in their apti-
tude to quickly evolve models from nonstationary data.
Learning system models from data streams in online mode is a challenging
task for most statistical and computational intelligence methods. Adaptive -
and naturally non-adaptive - learning methods face a number of drawbacks when
dealing with evolving data streams including: (i) difficulty in choosing the model
structure since data sets and related information are not available; (ii ) forget-
fulness when trying to acquire new information after concept changes; and (iii)
limited transparency and interpretability of the resulting model. In particular,
there is a need for developing recursive learning methods that explore the na-
ture of data streams (11) and at the same time fulfill accuracy, transparency and
interpretability requirements (4).
3.2 Evolving Intelligent Systems
Approaches to extract meaningful information from data streams have recently
been developed (1) (9) (10) (24) (25) (32) (46) (59) (66) (75) (79) (83) (84)
(94) (122) (124) (125) (134). Methods and algorithms directed toward this end
are known as Evolving Intelligent Systems. Evolving intelligent systems focus on
nonstationary processes and embody online learning methods and one-pass incre-
mental algorithms that evolve or gradually change individual models to guarantee
life-long learning and self-organization of the system structure.
Evolving systems are a step toward a higher level of adaptability compared
to conventional adaptive systems from control theory (13), classical identification
systems (92), and traditional data mining systems (54) (135). While the term
‘intelligent’ comes from the use of fuzzy and neuro-fuzzy (computational intelli-
gence) techniques, the evolving aspect of these systems accounts for unbounded
(infinite) amounts of data, changing concepts, and structural adaptation of mod-
els.
Formally stated, a system is said to be evolving if it:
- learns continuously from data streams;
- does not store previous samples;
- does not depend upon prior structural knowledge;
- self-adapts its structure when needed;
- is independent of statistical properties of data; and
- does not use ‘prototype’ initialization.
Moreover, it is highly desirable that evolving systems assimilate knowledge quickly
with small memory requirements to support real-time applications. Evolving systems
must account for the fact that the unknown is likely to matter.
In terms of implementation, evolving systems usually achieve their final purpose
at the software level, but they may be realized in physical embodiments including
intelligent agents, embedded systems, and ubiquitous computing (11).
3.2.1 Historical Landmarks
In the beginning of this century, two mainstreams of research in evolving intelli-
gent systems were introduced: evolving fuzzy systems (5) and evolving connec-
tionist systems (65). Their origins are independent of one another.
Evolving fuzzy systems (eFS) were proposed by Angelov (5), with evolving
Takagi-Sugeno (eTS) fuzzy systems (6) being a milestone in the field of structurally
adaptive rule-based systems. The eTS is an eFS paradigm for function ap-
proximation and control that fulfils the requirements for flexible and adaptive
approaches of a variety of modern applications such as automation processes,
autonomous systems, intelligent sensors, and defense. eTS assumes that the an-
tecedent and consequent parameters of functional fuzzy rules as well as the num-
ber of rules in a rule base can gradually change by learning from experience based
on data streams. This characteristic provides eTS approaches with the funda-
mental ability to pursue online modeling of time-varying nonstationary functions.
Evolving fuzzy classifiers (eClass) (8) (10) are another approach derived from eFS
when the consequent part of fuzzy rules is a class label. In eClass the number
of classes need not be known in advance and new classes can be incorporated
at any time. eClass models were seminal to the field of evolving classifiers which
possess the ability to capture both concept drift and shift (95).
Evolving connectionist systems (eCOS) were proposed by Kasabov (65) (66).
eCOS are artificial neural networks that operate continuously in time and adapt
their structure and functionality through interaction with the environment and
other systems. A paradigm of eCOS is called evolving fuzzy neural network
(EFuNN) (63), which is the earliest and perhaps most influential model of eCOS.
All neurons in EFuNN are created and updated during learning. They represent
membership functions and rules. Information carried by a data stream is memo-
rized on neurons and connections, and further used for predictions. The EFuNN
structure evolves from hybrid (supervised and unsupervised) algorithms. Partic-
ularly, the fuzzy aspect of EFuNN permits the neural network to be interpreted as
a fuzzy rule-base. Other noteworthy approaches supporting the context of eCOS
are evolving self-organizing maps (eSOM) (34) and dynamic evolving neural-fuzzy
inference systems (DENFIS) (64).
Common to both eFS and neurofuzzy eCOS are fuzzy sets, which are formed
on a basis of numeric data through incremental clustering. Clusters give rise to
fuzzy membership functions that considered together convey a global view of the
available data. In evolving systems, fuzzy membership functions play a key role
as the core of modeling approaches. They aim to represent similar data in a
concise manner. After cluster identification, a recursive algorithm is usually used
to refine local parameters and functions. In both platforms, eFS and neurofuzzy
eCOS, expert knowledge can be incorporated, but it is not compulsory.
From the granular computing point of view, eFS and great part of eCOS can be
considered granular modeling frameworks. Fuzzy sets, used to represent numeric
data, are instances of granules whereas computations in eFS and eCOS are based
on the result of information granulation. However, in general, evolving intelligent
systems cannot be regarded as evolving granular systems in the greatest sense of
the term because they do not deal with input and output granular data and quite
often do not produce granular estimation. In other words, evolving systems are
granular systems internally, and singular systems externally.
Since the conception of evolving intelligent systems, a diversity of studies
suggesting extensions of the original concepts has appeared. Approaches regarding
primarily computational intelligence principles and ideas follow the essential no-
tions of the original evolving intelligent systems. Conversely, there exist parallel
research lines where structurally adaptive learning approaches from data streams
are mostly based on data mining and statistics. Such approaches are often not re-
ferred to as ‘evolving’; however, the central idea of capturing gradual and abrupt
changes in nonstationary data streams is the same independently of the different
terminologies. The next section reviews some state-of-the-art works.
3.2.2 State of the Art
This section summarizes recent research related to learning methods capable of
handling numeric data streams. We do not intend to give an exhaustive review of
the literature. The purpose is to overview works closely related to the approaches
addressed in this thesis.
The evolving participatory learning (ePL) approach (87) combines the concept
of participatory learning (137) with evolving Takagi-Sugeno fuzzy systems. The
ePL approach is based on unsupervised clustering and therefore is a candidate
to find rule base structures in adaptive fuzzy modeling. ePL uses participatory
learning fuzzy clustering instead of scattering or information potential-based clus-
tering used by eTS. At each time step, ePL updates the rule base structure using
convex combinations of new data samples and the closest cluster center. The
parameters of the consequent part of a rule are adapted using a recursive least
squares algorithm.
The evolving multivariable Gaussian approach (eMG) (84) is an evolving func-
tional fuzzy modeling approach which, differently from eTS, uses an evolving
Gaussian clustering algorithm based on the concept of participatory learning.
The clustering algorithm is one-pass and updates the eMG rule base continuously.
Fuzzy sets in eMG are multivariable Gaussian membership functions which are
adopted to preserve information between input variable interactions. The param-
eters of the membership functions, that is, cluster centers and dispersion matrices,
are estimated by the clustering algorithm. A weighted recursive least squares al-
gorithm updates the parameters of the rule consequents. The eMG clustering
algorithm is particularly robust to noisy data and outliers through the use of a
mechanism to smooth incompatible input data.
A data-driven incremental algorithm called flexible fuzzy inference system
(FLEXFIS) was proposed in (94) to evolve Takagi-Sugeno fuzzy systems. A
modified version of vector quantization was suggested for rule evolution. The
FLEXFIS algorithm adapts linear rule consequent functions and premise parameters
(fuzzy membership functions) in online mode. Clusters of data are
automatically generated based on the nature, distribution and quality of new
data. Convergence toward the optimal parameter set in the least-squares sense
has been achieved by the algorithm.
Self-organizing fuzzy modified least-square neural network (SOFMLS) (124)
is a neurofuzzy network capable of adapting itself in real-time to a changing envi-
ronment. In SOFMLS, parametric and structural model adaptation is performed
simultaneously. The neural network generates a new rule if the smallest distance
between a new numeric data vector and rule parameters is higher than a pre-
specified radius. A density-based pruning procedure controls the network growth
over time. SOFMLS does not require retraining of the whole model and has
proved to be able to escape from local minima and be stable to concept changes.
The general fuzzy min-max neural network (GFMM) (46) is a generalization of the
fuzzy min-max clustering and classification neural networks (129) (130). It han-
dles labeled and unlabeled data simultaneously in a single neural model. GFMM
combines supervised and unsupervised learning to give hybrid clustering and
classification. The learning process places and adjusts hyperboxes (expansion-
contraction paradigm) in the feature space in a few or one pass over data sets.
GFMM is able to classify interval data and can be viewed as an incremental
granular classifier.
Learn++.NSE (43) is an ensemble of classifiers-based approach for time-
varying data distribution modeling. Learn++.NSE considers consecutive batches
of data and makes no assumptions about the nature and rate of concept drift.
The algorithm learns incrementally, similar to other algorithms of the Learn++
family (107) (121). Learn++.NSE trains one new classifier for each batch of data
it receives and combines these classifiers using a dynamically weighted majority
voting procedure. This procedure allows the algorithm to recognize and react to
changes in the underlying data distributions. Since data batches are discarded
after use, Learn++.NSE is suitable for online modeling of large volumes of data.
Very fast decision trees (VFDT) (38) is a method to discover knowledge in
databases that builds decision trees using constant memory space and constant
time to process a sample. VFDT operates on high-volume data streams and
gradually creates branches and leaves if necessary. The approach uses Hoeffding
bounds to guarantee that its output is asymptotically nearly identical to that of
a conventional batch learner. VFDT is designed for classification purposes.
The ultra fast forest of trees (UFFT) (48) is a one-pass incremental algorithm
able to detect concept drift. Trees are split according to new information appear-
ing in a numeric data stream. In multi-class classification problems UFFT builds
a binary tree for each possible pair of classes, leading to a forest of trees. De-
cision nodes and leaves contain naive Bayes classifiers to detect changes in class
distribution and classify test examples. When changes in class distributions are
detected, sub-trees rooted at representative nodes are pruned.
Differently from VFDT and UFFT, evolving fuzzy linear regression trees
(eFRT) (83) (85) convey a linear regression model in each leaf. Thus, eFRT
can be used for function approximation and prediction. In general, the number
of tree nodes and the number of inputs can be changed given a new sample. The
tree starts with a single leaf and grows replacing leaves with sub-trees and adding
more variables to the regression model. The eFRT topology is updated on the
fly using a statistical model selection test that considers accuracy and number of
parameters to provide accurate and parsimonious trees.
Massive Online Analysis (MOA) (23) is a software environment for learning
from evolving data streams. MOA supports incremental classification and clus-
tering approaches that do not scale with the volume of information. For
classification, MOA considers boosting, bagging, and Hoeffding trees with and without
naive Bayes classifiers at the leaves. For clustering, it implements the algorithms
StreamKM++, CluStream, ClusTree, Den-Stream, D-Stream, and CobWeb. The
aim of MOA is to provide analysis tools and insight about real-world data stream
mining problems. MOA can interact with the software WEKA, the Waikato
Environment for Knowledge Analysis (135).
3.3 Granular Data Streams
Physical systems change over time and usually produce considerable amounts of
nonstationary data. Data streams in online environment can be granular from
different perspectives. A more intuitive perspective concerns data that are granu-
lar by themselves. To elaborate, consider a simple example of predicting a variable
y from the last available observation x. This leads us to search for an approximand
p to describe the process function f based on pairs (x, y). Here, instances x
and y are singular (real numbers), and function f is single-valued. Singular data
do not restrain models to be singular; rather, a granular system may use granular
models whose size and placement reflect the information carried by singular
data. A hypothesis is that granular representation helps to assess the structure
of detailed singular data and organizes the data into a more interpretable format.
Consider x = [\underline{x}, \overline{x}] and y = [\underline{y}, \overline{y}] as instances of a granular data stream,
intervals in this case. To exemplify, \underline{x} and \overline{x} may denote the minimum and
maximum price of an economic index during a day, and \underline{y} and \overline{y} the range of
fluctuation of the price in the next day. In this example, data are originally
granular, and models [\underline{p}, \overline{p}] must be granular to support granular data. Figure
3.1 illustrates the granular modeling approach for function approximation.

Figure 3.1: Granular models: (a) single-valued function, (b) granular function
Figures 3.1(a) and 3.1(b) show that granular models outer approximate single-
valued and granular functions, respectively. Outer approximations of functions
can always be obtained, e.g., at the top level, the coarsest possible granular
approximation is the problem domain. Although merely enclosing a solution may
sound at first shallower than finding the solution itself, we should reflect that the
degree of satisfaction involved in embracing a solution depends strongly on the
width of the enclosure obtained (60). Moreover, when processing stream data,
we rarely have an idea about the error range and uncertainty associated with the
data. On the contrary, if we can compute with granules containing a solution,
then we can take for example the midpoint as a numeric approximation. Hence,
we obtain both an approximate numeric solution and tolerance bounds on the
approximation. The key task of approximating functions with granules is to seek
the tightest envelope for the approximand.
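The point about enclosures yielding both a numeric estimate and tolerance bounds can be made concrete with a two-line Python sketch; the function name is illustrative.

def numeric_from_enclosure(y_lo, y_hi):
    """From an enclosure [y_lo, y_hi] of a solution, return the midpoint as a
    numeric approximation and the radius as its tolerance bound."""
    return (y_lo + y_hi) / 2.0, (y_hi - y_lo) / 2.0

print(numeric_from_enclosure(1.2, 1.8))   # (1.5, 0.3): estimate 1.5 +/- 0.3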
Another perspective for the materialization of granules in data streams is con-
cerned with the uncertainty introduced during preprocessing steps. Incomplete
data makes precise discrimination of examples difficult. Missing values are usu-
ally predicted through imputation methods (91) (127) where the imputed data
is uncertain by the very nature of the prediction and motivates granules. In
privacy-preserving data mining, uncertainty may be added to the data in order
to preserve the privacy of the results (3). Additionally, noise and disturbances of
bounded-error dynamic context also demand information granulation.
Granular data may arise when measurements are inaccurate or variables are
hard to be quantified. For example, in sensor streams imprecision arises from in-
accuracies in the underlying data acquisition equipment. Often, data are purely
numerical, but the process which generated the data is uncertain. In these cases,
uncertainty in data representation may be useful to improve the quality of the re-
sults. For example, an instance with greater uncertainty may not be as important
as one with smaller uncertainty.
Sometimes, stream data are derived from expert knowledge. Granular computing
provides a general framework to represent real-world perception in natural
language (42) (138). Various considerations can affect one’s choice of data rep-
resentation. Foremost among these is what Zadeh calls cointention (154), the
ability of the representing object to convey the meaning of the concept it is being
used to represent (140).
In a nutshell, stream data can be intervals, probability distributions, rough
sets, and fuzzy intervals (42). We define granular data streams as a sequence of
samples that conveys granular information about a process. Evolving granular
models are built from granular data streams. Interval and fuzzy granular data
streams generalize numeric data streams by allowing interval representation and
fuzziness.
3.4 Evolving Granular Modeling
Nonstationary granular system modeling encompasses adaptive and flexible learn-
ing procedures to deal with many types of data such as numbers, intervals, and
fuzzy intervals. Granular computing provides a rich framework for modeling non-
stationary systems using granular data streams.
Evolving granular modeling (6) (16) (66) (75) (77) (78) (79) (80) (81) (97)
(119) comes not only as an approach to capture the essence of stream data but
also as a framework to extrapolate spatio-temporal correlations from lower-level
raw data and provide a more abstract human-like representation of them. Re-
search effort into granular computing toward online environment-related tasks is
supported by a manifold of relevant applications such as financial, health care,
video and image processing, GPS navigation, click stream analysis, etc.
Our definition of evolving granular system is as follows: evolving granular
systems are systems that are able to derive interpretable rule-based models and
provide granular function approximation using an incremental learning algorithm
and imprecise stream data (with imprecise data being numbers, intervals, fuzzy
intervals, etc.). Association rules given in the form of If-Then statements can be
extracted from an evolving granular construct at any time. The evolved rule base
means, in essence, a granular description of a process.
In practice, evolving granular systems extend evolving intelligent systems in
their capability to handle singular and granular input-output data, and give
single-valued and granular approximations of original single-valued or granular
functions. Granular approximation comes with a linguistic description in addi-
tion to a numeric, pointwise approximation typical of evolving intelligent systems.
Evolving granular systems rely fundamentally on the concepts of granular
view, information granule, and granular mapping (see Section 2.1) in the process
of modeling stream data. Emphasis is on the tasks of data granulation and
computing with granules (143) (145) (146) (152). The granularity of information
explicitly embedded into granular systems offers valuable features in dynamic
modeling such as transparency and flexibility. Naturally, we are concerned with
a certain way of compressing granular data into more intelligible granular models.
Granular data streams are responsible for creation, expansion and shrink-
age of granular models along one or more dimensions of the input and output
spaces, guide parameter adaptation, and order the most appropriate granulari-
ties. Concept change, missing and noisy values, superfluous and outlier samples
are common in online environments and require automatic intervention. When-
ever a sample arrives, evolving algorithms should decide whether to discard it or
to use it to update the current knowledge. Evolving granular learning algorithms
designed to handle online granular data face particular challenges concerning the
value of the current knowledge, which decreases as the concept changes, and the
impossibility of storing or retrieving the data once read. Learning must be one-pass.
Constructive (bottom-up) and decomposition-based (top-down) mechanisms pre-
dominate.
3.5 Time and Space Granulation
Data granulation may be performed in time and space domains. Approaches to
building granules consider temporal granulation before spatial granulation,
as illustrated in Fig. 3.2. This order of granulation is maintained for several
reasons. Occasionally, samples are recorded at different time intervals, e.g., as in
event streams. The need for synchronized analysis of manifold data streams and
search for time-correlated structures give support to the possibility of considering
temporal granulation first. Temporal granulation tends to slow down the data
flow, since several streaming instances can be wrapped by a granular object and
further computations based on granules. Time granules grant synchronism
and a smaller amount of granular data for subsequent spatial analysis. Spatial
correlation uniting heterogeneous data with multiple levels of granularity and
different representations (intervals, fuzzy sets, rough sets, etc.) is captured during
the process of spatial granulation. Structured representation of data is preserved
over time as a synopsis of the data stream; it warrants structured problem solving
at the practical level.
Figure 3.2: Time and space granulation
The flexibility of handling data streams using a granular computing frame-
work enables us to describe granules in different application domains without
deep knowledge about the problem. Tight time and memory constraints of on-
line environment and interpretability requirements inspire granulated views of
detailed data and computing at coarser granularities.
3.5.1 Time Domain Granulation
Time granulation aims at both reducing the sampling rate of fast data streams and
synchronizing concurrent data streams that are input at random time intervals.
A time granule describes the data for a certain time period.
Whenever the bounds of a time granule are aligned with significant shifts in
the target function, the underlying granulation provides a good abstraction of
the data. Conversely, if the alignment is poor, models may be inadequate (17).
Manifold granularities require temporal reasoning and respective formalizations.
Time granules and time windows are distinguished as follows.
Time window (74) (110) stands for a pre-specified or adaptive duration interval
within which data samples assemble a representation. Generally, a fixed number
of samplings or error values defines the size of the window. Windowing the
time domain attempts to produce as few segments as possible to avoid data
overfitting. Few time segments may hide information if the concept changes.
Nonstationarity modifies “ideal” window lengths by its own dynamic. Approaches
to testing window lengths are computationally costly and, hence, infeasible in
environments with narrow time constraints. Essentially, there may exist several
information granules in a time window. Data chunk analysis belongs to window-
based approaches for information extraction and analysis.
A time granule groups data according to their indistinguishability in time.
Since a time granule conveys similar data indexed in time, its bounds are naturally
aligned with substantial changes in the function. The result of dynamic time
granulation is a unique granule per segment. Time granules assume manifold
levels of data abstraction and are aware of the pace of concept changes.
Event streams are examples of streams that usually come about at different
time granularities. They require analysis of time-domain granules for common-
alities extraction prior to space-domain analysis. Broadly stated, information
evoked from time granules can be bounds of intervals, probability distributions
or membership functions, and features such as frequency and correlation between
events, patterns, prototypes. The internal structure of a granule and its associ-
ated variables provide full description and characterization of the granule.
Whenever manifold data streams mismatch each other at finer time granular-
ities we resort to a granulated view of the time domain and a data mining and
modeling approach. The resulting granulation should be at least as coarse as the
coarsest individual stream to agree with the notion of outer approximation of
functions and guaranteed solution (Section 2.1).
3.5.2 Space Domain Granulation
Data granulation over the space domain is a process of organization for compre-
hension (15). Granulation enables us to view different samples as being the same
if low level details are neglected. Granulating the domain space is fundamental in
methods of clustering (2) (17) and information integration (50). Resulting gran-
ules may compose antecedent and consequent parts of rules in rule-based systems
(11) (71) (73).
Whenever variables are recorded simultaneously and the sampling frequency is
low enough that there is time to step recursive algorithms, the time
granulation stage can be ignored and efforts fully concentrated on spatial granu-
lation. In fact, time and space granulation are somewhat related. For instance,
(i) with the minimal and maximal values occurring in a time granule we may
form an interval granular object; (ii) taking a representative mean or median of
instances resting into a time granule and a confidence interval around it we may
form a statistical granular object; (iii ) capturing the core and the uncertainty of
instances falling in a same time granule may give rise to a fuzzy granular object.
Granular objects of any precedence may be taken into consideration as input to
the stage of spatial granulation.
The location and size of a granule play a role in the process of granulation.
Original stream data are compressed to a few granules whose location and gran-
ularity reflect the structure of the data. There are many granulated views of
the same problem. When evolving granular structures, granules are created as
instances of the current knowledge. Next, granules may expand and occupy the
space wherever new instances arrive. Operations on granules combine granules to
form a coarser granule or decompose a granule into finer granules. Operations on
granules should be consistent with the size of the granules and relations between
granules; they provide the basic ingredients for the granular computing.
While concept drift and shift are terms related to the joint time-space domain
(95), the descriptions of data density and information specificity (139) concern
the space domain and are options to guide spatial granulation. Bargiela and
Pedrycz (15) state that granules should encompass as many data as possible while
maintaining certain specificity in what they called principle of the maximization
of the information density. The principle of the balanced information granularity
(15) gives preference to the design of granules balanced along all dimensions
rather than granules with unbalanced geometry. In particular, hyperbox-based
spatial granulation provides descriptions fully compatible with the descriptions of
intervals and fuzzy sets. With intervals and fuzzy sets, the pursuit of a balanced
granularity and refining and coarsening of granules are reduced to operations on
bounds of intervals and parameters of fuzzy membership functions.
3.6 Summary
Evolving granular systems combine granular computing and evolving intelligent
systems concepts into a single framework. We argued that it is sometimes unnec-
essary or inefficient to discriminate numeric data precisely. Moreover, we argued
that systems are better supported by a granular framework to suit uncertain,
granular stream data. Numeric data is a particular case in which a granule de-
generates into a singleton. The necessity of building models at finer granularities,
close to the singularity, is justified only when there are clear benefits in doing so.
This chapter presented the state of the art of the research in evolving granular
systems and discussed adaptive rule-based modeling from granular data streams.
Chapter 4
Interval Based Evolving
Modeling
This chapter introduces an interval-based evolving modeling approach to develop
system models using data streams. The approach consists of a rule-based model-
ing scheme that gradually adapts its antecedent and consequent parts over time.
Its main purpose is continuous (inductive) learning, self-organization, and adap-
tation to unknown, nonstationary environments. While traditional functional
rule-based modeling approaches use numeric data and produce numeric results,
the suggested interval approach uses interval data and presents results in numeric
and granular format. Interval rule-based approaches are highly human-centric in
the sense that antecedent and consequent of rules are intervals, which may convey
a linguistic meaning. Interval outputs are more informative and comprehensible
than numeric outputs.
4.1 Introduction
Interval based evolving modeling (IBeM) is an adaptive granular framework whose
idea is to enclose similar interval data into coarser albeit more interpretable inter-
val models. The outcomes of IBeM are single-valued and interval predictions of
a target function, and a rule summary, which describes the behavior of a system.
IBeM emphasizes uncertain data manifesting as multi-dimensional tolerance intervals
and recursive learning procedures rooted in fundamentals of the interval
mathematics theory. Antecedent and consequent terms of IBeM rules are deter-
mined by input and output crisp hyperboxes (granules) formed over time. Crisp
hyperboxes are referred to as a modality of crisp granular precisiation in the gen-
eralized theory of uncertainty (27) (153). IBeM input and output hyperboxes
are linked by a granular mapping, also called inclusion function in the interval
analysis terminology (75) (78) (81).
IBeM is equipped with a one-pass-through-the-data recursive algorithm which
builds its rule set gradually from scratch, captures new concepts from data
streams, and copes with uncertainty. The IBeM approach makes no specific
assumption about the properties of the data including probability distributions,
belief intervals, possibility values, membership functions. Moreover, no human
intervention is necessary during model construction. Contrarily, interval data
stream guides learning and refinements. Examples of concepts translated into
interval data IBeM manages to handle include: the number of red balls in the
box is between 8 and 12; tonight’s temperature will be from 65 to 72 degrees;
the normal count of leukocytes in adult humans is 4500-11000 per cubic millime-
ter of blood; [0.8, 0.85]. Intervals also arise after preprocessing singular data by
compressing it into a smaller amount of interval data.
Interval representation of interval data streams is attractive for several
reasons: (i) ease of acquiring parameters: only two parameters related to
real features (upper and lower bounds) need to be captured; (ii ) adaptation of
intervals demands basic fully-formalized operations of interval arithmetic; (iii )
intervals make no specific assumption about the content of a granule. Higher-
level interval models are everything we wish to know from large quantities of
detailed, low-level, interval data; (iv) intervals can be translated quite easily to
linguistic propositions. Interval granular precision facilitates comprehension when
supported by a context. Naturally, an interval model has a great deal of appeal
to represent counterpart interval data.
4.2 Related Work
Literature in interval data modeling using interval representation is scarce. Some
works related to adaptive (non-evolving) interval modeling approaches that take
into account interval data are summarized next. The approaches do not support
online learning and require the whole data set to be available. Neural network
approaches able to learn from interval data streams (for example (46) (108)) are
discussed in Chapter 6.
A partitioning dynamical clustering algorithm which considers interval data
and Pompeiu-Hausdorff distance was addressed in (31). The algorithm builds
clusters and identifies their representative prototypes concomitantly at each pro-
cessing step. Interval data are compared using adaptive Pompeiu-Hausdorff dis-
tance. This distance varies for each existing cluster according to intra-class struc-
tures. Although the clustering algorithm considers interval data, it is not suitable
for evolving modeling since prototype identification is based on optimizing an ad-
equacy criterion that requires all data samples within a cluster to be available.
An extension of the radial basis function (kernel method) approach to interval
data mining is proposed in (37). Here, interval data result from the aggregation of
large data sets into smaller ones to represent uncertainty. Aggregation is carried
out through a Pompeiu-Hausdorff distance-based approach that clusters numeric
data into crisp hyperboxes. The underlying learning approach can deal with
classification, regression and novelty detection problems. The learning approach
cannot handle online data streams.
Reference (12) proposes an interval analysis based adaptive approach for an
extended Kalman filter. The approach is aimed at mobile robot navigation and,
particularly, at obstacle avoidance and robot position estimation. Since Kalman
filters are often affected by noise and drift, the interval adaptive approach is useful to
model and correct robot position estimates. Interval analysis methods dispense with
deterministic modeling of the robot system and have been shown to give more accurate
position estimates when compared to estimates using non-adaptive non-interval
Kalman filter methods.
4.3 Structure and Processing
The mathematical formalism of the interval analysis (Section 2.2) provides a
robust framework for the analysis of granular structures. Interval mathematics
supports the core of the IBeM learning algorithm and gives simplicity, correctness,
totality, closeness, efficiency, and optimality in the sense of Hickey et al. (56).
Let (x, y)^{[h]}, h = 1, ..., be the h-th observation of the target function f. The
output y^{[h]} is known given the input x^{[h]}, or will become known some steps later. In
this chapter each attribute x_j of x = (x_1, ..., x_n) is an interval [\underline{x}_j, \overline{x}_j]. The same
holds true for the output y, that is, y = [\underline{y}, \overline{y}]. Therefore, (x, y) assembles a crisp
hyperbox (granule) in the Cartesian product space X × Y.
Let γ^i, i = 1, ..., c, be the current collection of IBeM granules built on the basis
of (x, y). Granules γ^i are defined in the Cartesian product space X × Y. The
internal representation of γ^i with respect to the input variables x = (x_1, ..., x_n) is
empty. This means that bounds of intervals, say [l_j^i, L_j^i], j = 1, ..., n, are all that
IBeM records from the input data stream. The output variable y is granulated using
bounds [u^i, U^i]. The content of a granule with respect to the output y conveys an
additional piece of information, an inclusion monotonic function p^i. The inclusion
function uses the bounds of the input variables to produce a granular approximation
of f.
Rules R^i associated with granules γ^i are of the type:

R^i: IF (l_1^i ≤ x_1 ≤ L_1^i) AND ... AND (l_n^i ≤ x_n ≤ L_n^i)
     THEN (u^i ≤ y ≤ U^i) AND ŷ = p^i(x_1, ..., x_n)

where

p^i(x_1, ..., x_n) = \hat{y} = a_0^i + \sum_{j=1}^{n} a_j^i [\underline{x}_j, \overline{x}_j].    (4.1)

Functions p^i are thin (with single-valued parameters) and of first order in this
case. In general, each p^i can be of a different type, thick, and need not be
linear. Computing p^i using x gives an interval granular or single-valued
approximation of f depending on the width of x, that is, greater than zero or zero.
The recursive least squares algorithm as described in Appendix B is used to
determine the coefficients a_j^i of p^i. Bounds [u^i, U^i] are obtained from output data
granulation. They provide an enclosure of the solution, an outer approximation
of f.
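A minimal Python sketch of how a rule of this type could be evaluated follows, assuming intervals are (lower, upper) tuples. The antecedent checks whether the input hyperbox is enclosed by the rule granule; the consequent returns the output enclosure [u, U] together with the interval value of the thin inclusion function p of Eq. (4.1). The Rule class and its attribute names are illustrative, not part of the thesis.

class Rule:
    def __init__(self, l, L, u, U, a0, a):
        self.l, self.L = l, L        # per-attribute antecedent bounds
        self.u, self.U = u, U        # output bounds from data granulation
        self.a0, self.a = a0, a      # single-valued coefficients of p (thin)

    def fires(self, x):
        """True if every input interval x[j] = (xlo, xhi) is enclosed."""
        return all(self.l[j] <= xlo and xhi <= self.L[j]
                   for j, (xlo, xhi) in enumerate(x))

    def predict(self, x):
        """Granular enclosure [u, U] and the interval estimate p(x), Eq. (4.1)."""
        lo = self.a0 + sum(min(aj * xlo, aj * xhi)
                           for aj, (xlo, xhi) in zip(self.a, x))
        hi = self.a0 + sum(max(aj * xlo, aj * xhi)
                           for aj, (xlo, xhi) in zip(self.a, x))
        return (self.u, self.U), (lo, hi)

r = Rule(l=[0.0], L=[1.0], u=0.2, U=0.9, a0=0.1, a=[0.5])
x = [(0.4, 0.6)]
print(r.fires(x), r.predict(x))   # True, ((0.2, 0.9), (0.3, 0.4))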
IBeM exploits predominantly bottom-up incremental learning procedures to
form higher level granules and interval-valued rules from finer granular data. In
a sense, it performs input-output data compaction to provide more human-like
models. A -closure granular structure ensues from more specific local gran-
ules. In particular, granulation eases incremental updating and discovering of
the essence of the time and space structure of the data with modest storage
and processing costs. Experts usually prefer models that approximate real sys-
tem outputs and provide estimates of the approximation bounds. Taking into
account intervals in bounded-error context is the IBeM approach to deal with
uncertainty.
The next section introduces a learning approach to construct an IBeM model
from the very beginning, and adapt its structure and parameters on the fly.
4.4 Learning in IBeM
This section addresses the working principle of the IBeM learning algorithm.
The learning algorithm detailed next is used to evolve the structure and pa-
rameters of IBeM models whenever new information appears in the data stream.
By IBeM structure we mean interval-type granules, If-Then rules, and a concept.
From an overall point of view, when new samples do not fit current knowledge,
learning creates new granules and rules managing the granules. Conversely, when
new samples fit current knowledge, learning adapts parameters of existing gran-
ules and rules if necessary. Eventually, the resulting granular structure may be
refined or coarsened agreeing with inter-granule relationships.
The IBeM framework grants important characteristics for online modeling.
Its incremental learning algorithm spends a small and constant processing time:
the processing time does not scale with the number of samples. Continuous
processing on a per-sample basis enables IBeM to deal with concept drift and
shift within online environment. Constructive bottom-up mechanisms of learning
usually prevail over top-down, decomposing mechanisms.
51
4.4 Learning in IBeM
4.4.1 Choosing the Granularity
Let ρ be the maximum width that interval granules may assume:

wdt([l_j^i, L_j^i]) ≤ ρ,  wdt([u^i, U^i]) ≤ ρ,  j = 1, ..., n;  i = 1, ..., c.    (4.2)
Values of ρ allow different representations of the same problem at different levels
of detail. ρ works as an upper bound on the level of modeling abstraction.
For normalized data, ρ takes values in [0,1]. If ρ equals 0, granules cannot be
expanded and each data sample is accommodated by a new granule. Conversely,
if ρ equals 1, a single granule encloses the entire data set. Counterbalancing
these extremes means establishing a tradeoff between complexity and precision.
In the most general case, IBeM starts learning with an empty rule base and
devoid of knowledge about the data stream. It is reasonable in this case to set ρ
halfway to regard rule creation and rule adaptation equally. We consider ρ^{[0]} = 0.5
as the default initial value. A simple and fast approach to adapt the maximum
width ρ allowed for granules is as follows. Let r be the number of rules created
during h_r time steps. If the number of rules grows faster than a given rate η, that
is, r > η, then ρ is increased,

ρ(new) = \left(1 + \frac{r}{h_r}\right) ρ(old).    (4.3)
The idea here is to reject large rule bases because they increase model complexity
and may not help generalization. Equation (4.3) acts against outbursts of growth,
letting intervals and granules expand larger.
Otherwise, if the number of rules grows at a rate smaller than η, that is, r < η,
then ρ is decreased as follows:

ρ(new) = \left(1 - \frac{η - r}{h_r}\right) ρ(old).    (4.4)
With this mechanism we maintain a data-dependent fluctuating granularity.
Alternative heuristic approaches to evolve the value of ρ over time take into
account estimation errors and their derivatives, as addressed in (79). Time-varying
granularity avoids guesses on how fast and how often the data stream changes.
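The adaptation heuristic of Eqs. (4.3) and (4.4) is directly executable; the sketch below assumes r rules were created in the last h_r steps and that η is the target growth rate, with the default ρ^{[0]} = 0.5.

def adapt_rho(rho, r, eta, hr):
    """Adapt the maximum granule width rho after a window of hr time steps."""
    if r > eta:
        return (1.0 + r / hr) * rho          # Eq. (4.3): allow coarser granules
    return (1.0 - (eta - r) / hr) * rho      # Eq. (4.4): demand finer granules

rho = 0.5                                    # default initial value rho[0]
print(adapt_rho(rho, r=8, eta=4, hr=100))    # 0.54: rule base growing too fast
print(adapt_rho(rho, r=1, eta=4, hr=100))    # 0.485: few rules created, refine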
4.4.2 Time Granulation
Consider an interval data stream (x, y)^{[h]}, h = 1, .... Time granulation groups a
set of successive instances (x, y)^{[h]}, h = h_b, h_b + 1, ..., h_e, where h_b and h_e denote
the lower and upper bounds of a time interval [h_b, h_e]. The set of instances input
during [h_b, h_e] is considered indistinguishable and the inequalities

wdt(ch(x_j^{[h_b]}, ..., x_j^{[h_e]})) ≤ ρ,  j = 1, ..., n,  and  wdt(ch(y^{[h_b]}, ..., y^{[h_e]})) ≤ ρ    (4.5)

hold true. Literally, the width of the convex hull of all samples available during
[h_b, h_e] is less than or equal to the maximum width allowed for granules, ρ. The
sample indexed by h_e + 1 conveys at least one contrasting value.
The collection (x, y)^{[h]}, h = h_b, h_b + 1, ..., h_e, produces a unique granule γ^{[H]}
whose lower and upper endpoints are:

[l_j^{[H]}, L_j^{[H]}] = [\min(\underline{x}_j^{[h_b]}, ..., \underline{x}_j^{[h_e]}), \max(\overline{x}_j^{[h_b]}, ..., \overline{x}_j^{[h_e]})],  and    (4.6)

[u^{[H]}, U^{[H]}] = [\min(\underline{y}^{[h_b]}, ..., \underline{y}^{[h_e]}), \max(\overline{y}^{[h_b]}, ..., \overline{y}^{[h_e]})].    (4.7)

Thus, a single granule γ^{[H]} summarizes the content of several samples (x, y)^{[h]}.
Both γ^{[H]} and (x, y)^{[h]} are of the same interval nature.
In the IBeM framework, time granulation is used as a preprocessing step
whenever input data arrive at different rates; for example, x_1 arrives every 10
milliseconds and x_2 every 30 milliseconds. Multiple time granularities allow
synchronized analysis of concurrent data streams. Therefore, learning within the
space domain is based on the resulting interval granule γ^{[H]} rather than on original
data (x, y)^{[h]}. IBeM is not exposed to all original data, which are sometimes far
more abundant than time granules.
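A small Python sketch of Eqs. (4.5)-(4.7) for a single attribute follows, assuming samples are (lower, upper) tuples; a run of samples is collapsed into one time granule by their convex hull while the hull width respects ρ.

def time_granule(samples):
    """Convex hull endpoints of interval samples in one time granule, Eqs. (4.6)-(4.7)."""
    lower = min(lo for lo, hi in samples)
    upper = max(hi for lo, hi in samples)
    return (lower, upper)

def fits(samples, rho):
    """Eq. (4.5): the run stays in one time granule while its hull width <= rho."""
    lo, hi = time_granule(samples)
    return hi - lo <= rho

stream = [(0.40, 0.45), (0.42, 0.50), (0.39, 0.44)]
print(fits(stream, rho=0.5), time_granule(stream))   # True, (0.39, 0.50)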
4.4.3 Creating and Adapting Granules
No IBeM rules need to be preconceived nor does the amount of granules need to
be set in advance. From scratch, granules and rules are created and adapted on
demand, dynamically, steered by the behavior of the target process and informa-
tion mirrored in the measured data. Whenever data (x, y)^{[h]} become available,
a decision mechanism is triggered and new granules and rules can be inserted into
the IBeM structure or existing ones can be refined.
A key decision when building IBeM models concerns when and how to cre-
ate or adapt granules and rules recursively to consider never seen data samples
potentially bringing new information.
Define E^i = (E_1^i, ..., E_n^i, E_k^i), the expansion region of granule γ^i:

E_j^i = [L_j^i - ρ, \, l_j^i + ρ],  j = 1, ..., n,    (4.8)

E_k^i = [U^i - ρ, \, u^i + ρ].    (4.9)
Expansion regions help to derive criteria for deciding whether or not data should
be gathered into a common granule. Figure 4.1 illustrates the expansion region
E_j^i of an attribute [l_j^i, L_j^i] of the granule γ^i.
Figure 4.1: Expansion region of an IBeM granule
An IBeM granule is created when the expansion regions E^i, i = 1, ..., c, do not
fit the sample (x, y). In this case, none of the existing granules can expand its
bounds beyond the limits imposed by ρ to include the sample. Connective AND
operators of IBeM rules suggest the complete enclosing of both the inputs x_j and
the output y for the corresponding granule to be considered. The new granule γ^{c+1}
matches perfectly the sample that caused its creation, that is,

[l_j^{c+1}, L_j^{c+1}] = [\underline{x}_j, \overline{x}_j],  and    (4.10)

[u^{c+1}, U^{c+1}] = [\underline{y}, \overline{y}].    (4.11)

The parameters of the thin local function p^{c+1} are:

a_j^{c+1} = 0,  j ≠ 0,  and  a_0^{c+1} = mp(y).    (4.12)
Adaptation of granules γ^i sets the boundaries [l_j^i, L_j^i] and [u^i, U^i] to enclose
a sample (x, y). Meanwhile, parameters a_j^i of the local inclusion function p^i
are updated using recursive least squares as described in Appendix B. A granule
γ^i is chosen to be adapted whenever (x, y) falls within its expansion region
E^i. Adaptation of only one granule is enough to ensure that the information
is incorporated in the model. Conflict resolution is addressed in Section 4.4.5.
Figure 4.2 summarizes nine situations that may happen depending on where the
data are confined and the appropriate adaptations.
In Fig. 4.2, the recently arrived datum x = [\underline{x}, \overline{x}] can be placed either outside,
partially inside, or inside granule γ^i. Depending on the location of x, IBeM may
create a new granule γ^{c+1} and/or adapt the bounds of γ^i. Expansion of granules
is chiefly based on union and convex hull operations. All uncertainty is covered
by some granule to guarantee outer approximation of the input and output data.
Although a datum and a granule may have some level of overlap, two granules are
forbidden to overlap as a result of these adaptation procedures.
Figure 4.2: Creation and recursive adaptation of IBeM granules
4.4.4 Refining the Rule Base
Once granules are identified, IBeM analyzes the relationship among them and
proceeds accordingly. Top-down and bottom-up structural operations support
refining and coarsening of granules over time. Structural knowledge is generated
to help visualize the relationships between different parts of the problem.
Top-down processes produce finer granular models by splitting a large granule into smaller granules. Situations in which the maximum width allowed for a granule reduces (see Section 4.4.1) may require top-down refinements to fit some granules to the new value. Formally, $\mathrm{wdt}(\gamma^i) \leq \rho$, $i = 1, \ldots, c$, may become false for some $i$. In this case, granule $\gamma^i$ is split into $\gamma^{i1}$ and $\gamma^{i2}$ so that
\[
[l_j^{i1}, L_j^{i1}] = [l_j^i, \; \mathrm{mp}([l_j^i, L_j^i])], \ \text{and} \tag{4.13}
\]
\[
[l_j^{i2}, L_j^{i2}] = [\mathrm{mp}([l_j^i, L_j^i]), \; L_j^i], \ \ j = 1, \ldots, n. \tag{4.14}
\]
The same splitting procedure holds for the output interval $[u^i, U^i]$. The procedure is repeated until $\mathrm{wdt}(\gamma^i) \leq \rho$ for all $i$.
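A minimal sketch of the splitting procedure (4.13)-(4.14) follows, assuming granules stored as lists of per-attribute intervals plus an output interval; the representation is hypothetical.

——————————————————————
# Hypothetical sketch of the top-down split (Eqs. 4.13-4.14).

def split_granule(x_bounds, y_bounds):
    """Split a granule into gamma_i1 and gamma_i2 at attribute midpoints."""
    halves = [((l, (l + L) / 2.0), ((l + L) / 2.0, L)) for l, L in x_bounds]
    u, U = y_bounds
    g1 = ([h[0] for h in halves], (u, (u + U) / 2.0))
    g2 = ([h[1] for h in halves], ((u + U) / 2.0, U))
    return g1, g2

print(split_granule([(0.0, 0.4)], (1.0, 1.2)))
# (([(0.0, 0.2)], (1.0, 1.1)), ([(0.2, 0.4)], (1.1, 1.2)))
——————————————————————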
A coarser granular model results from a bottom-up process that involves forming a larger granule from smaller granules. Consider the overall distance between two interval vectors as described in Section 2.2.3 and let

\[
D = \begin{bmatrix}
D(\gamma^1, \gamma^1) & \cdots & D(\gamma^1, \gamma^i) & \cdots & D(\gamma^1, \gamma^c) \\
\vdots & \ddots & \vdots & & \vdots \\
D(\gamma^i, \gamma^1) & \cdots & D(\gamma^i, \gamma^i) & \cdots & D(\gamma^i, \gamma^c) \\
\vdots & & \vdots & \ddots & \vdots \\
D(\gamma^c, \gamma^1) & \cdots & D(\gamma^c, \gamma^i) & \cdots & D(\gamma^c, \gamma^c)
\end{bmatrix} \tag{4.15}
\]
be a distance matrix relating any pair of granules. Matrix $D$ is symmetric with zeros on the main diagonal. Neighboring granules can be located close enough to justify their combination into a coarser granule. The combination is based on the minimum entry of matrix $D$, say $D(\gamma^{i1}, \gamma^{i2})$, and depends on whether
\[
\mathrm{wdt}(\mathrm{ch}([l_j^{i1}, L_j^{i1}], [l_j^{i2}, L_j^{i2}])) \leq \rho, \ \ j = 1, \ldots, n, \ \text{and} \tag{4.16}
\]
\[
\mathrm{wdt}(\mathrm{ch}([u^{i1}, U^{i1}], [u^{i2}, U^{i2}])) \leq \rho. \tag{4.17}
\]
Granule $\gamma^i = \mathrm{ch}(\gamma^{i1}, \gamma^{i2})$ is the coarsening of $\gamma^{i1}$ and $\gamma^{i2}$.

Coarsening provides more compact rule bases and helps eliminate gaps between similar granules. At the top level, IBeM is closed by the most general granule, formed by the convex hull of all elementary granules.
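The combination step can be sketched as below. The distance used here is a simple endpoint-based stand-in for the interval-vector distance of Section 2.2.3, and the granule representation matches the hypothetical one used in the previous sketches.

——————————————————————
# Sketch of the bottom-up combination step (Eqs. 4.15-4.17).
import itertools

def dist(g1, g2):
    """Endpoint-based stand-in for the distance of Section 2.2.3."""
    (xb1, yb1), (xb2, yb2) = g1, g2
    d = sum(abs(l1 - l2) + abs(L1 - L2)
            for (l1, L1), (l2, L2) in zip(xb1, xb2))
    return d + abs(yb1[0] - yb2[0]) + abs(yb1[1] - yb2[1])

def closest_pair(granules):
    """Index pair attaining the minimum entry of the distance matrix D."""
    return min(itertools.combinations(range(len(granules)), 2),
               key=lambda p: dist(granules[p[0]], granules[p[1]]))

def try_combine(g1, g2, rho):
    """Convex hull of two granules if it respects rho (Eqs. 4.16-4.17)."""
    xb = [(min(l1, l2), max(L1, L2))
          for (l1, L1), (l2, L2) in zip(g1[0], g2[0])]
    yb = (min(g1[1][0], g2[1][0]), max(g1[1][1], g2[1][1]))
    widths = [L - l for l, L in xb] + [yb[1] - yb[0]]
    return (xb, yb) if max(widths) <= rho else None
——————————————————————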
4.4.5 Conflict of Interest
A requirement when designing granular systems such as IBeM is to include all the information that assembles a solution. At the same time, it is desirable to keep the system as simple as possible. During the course of learning, conflicting situations may arise. In these cases, adaptation procedures that result in narrower granules must be considered. Conflict of interest happens when two or more
granules can be expanded to embrace a data sample. Figure 4.3 shows four typical situations considering the current input $x$ and two granules, say $\gamma^{i1}$ and $\gamma^{i2}$. They are: (i) $\underline{x} \in E^{i1} = [L^{i1} - \rho, \; l^{i1} + \rho]$, but $\overline{x}$ does not; conversely, $\overline{x} \in E^{i2}$, but $\underline{x}$ does not; (ii) $x \cap E^{i2} \neq \emptyset$, but $x \subset E^{i1}$ and $x \not\subset E^{i2}$; (iii) $x \subset (E^{i1} \cap E^{i2})$; and (iv) $x \cap E^{i1} \neq \emptyset$, but $x \subset E^{i2}$ and $x \not\subset E^{i1}$. The respective adaptation procedures are shown in the figure.

Figure 4.3: Inter-granular conflict and data accommodation
In case (i), a new granule is created to include $x$ because $\gamma^{i1}$ and $\gamma^{i2}$ cannot expand beyond $\rho$. Cases (ii) and (iv) avoid redundancy and inconsistency by neglecting the adaptation of the granule that cannot enclose $x$ entirely. Case (iii) chooses the granule closest to $x$ according to the distance $D$.
Inter-granular conflict resolution helps to choose which IBeM rule to adapt
and prevents overlapped intervals and contradiction. The tightest envelope for
the data generates a more concise description of the information it carries.
4.4.6 Removing Granules
Depending on the data sequence, small undesirable granules might be formed quite close to large granules. These small granules, often called satellites (97), contain residual information that is better neglected. The deletion procedure we propose is useful to retain only the necessary information.

Broadly speaking, a granule should be removed from the IBeM structure if it is inconsistent with the current concept. Common removal strategies either (i) remove granules by age, (ii) exclude the weakest granules based on error values, or (iii) delete the most inactive granules. IBeM adopts strategy (iii) and deletes the most inactive granules. Old granules may still be useful in the current environment, whereas weak granules are strengthened, when possible, by adapting the parameters of their local inclusion functions.
IBeM granules are deleted whenever they remain inactive during $h_r$ time steps. If the application requires memorization of rare events, or if cyclical drifts are anticipated, then it may be the case to let the granules remain forever. Removing inactive granules periodically helps to keep the rule set updated and concise.
4.4.7 Learning Algorithm
The IBeM modeling procedure can be summarized as follows:

——————————————————————
BEGIN
  Set parameters ρ, h_r, η; c = 0;
  Read (x, y)^[h], h = 1;
  Create granule γ^{c+1} and rule R^{c+1};
  For h = 2, ... do
    Read (x, y)^[h];
    Provide single-valued approximation mp(p(x^[h]));
    Provide granular approximation [u^{i*}, U^{i*}];
    Calculate output error ε^[h] = mp(y^[h]) − mp(p(x^[h]));
    If (x_j^[h] ∉ E_j^i or y^[h] ∉ E_k^i, for i = 1, ..., c, and some j)
      Create γ^{c+1} and R^{c+1};
    Else
      Resolve possible conflicts;
      Update γ^i and R^i to accommodate (x, y)^[h];
      Adapt local inclusion function parameters a_j^i using RLS;
    If h = α h_r, α = 1, 2, ...
      Update model granularity ρ;
      Split and combine granules when feasible;
      Remove inactive granules and rules;
END
——————————————————————
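The decision logic of the loop above can be rendered compactly as follows. This is a simplified, hypothetical reading for numeric samples: local functions, RLS updates, and the conflict resolution of Section 4.4.5 (which would pick the closest granule rather than the first match) are omitted.

——————————————————————
# Simplified IBeM create-or-adapt decision for one incoming sample.

def fits(x, y, granule, rho):
    """True if (x, y) lies inside all expansion regions (Eqs. 4.8-4.9)."""
    xb, (u, U) = granule
    ok_x = all(L - rho <= xj <= l + rho for xj, (l, L) in zip(x, xb))
    return ok_x and (U - rho <= y <= u + rho)

def ibem_step(granules, x, y, rho):
    for i, (xb, yb) in enumerate(granules):
        if fits(x, y, (xb, yb), rho):
            # adapt: expand endpoints just enough to enclose the sample
            new_xb = [(min(l, xj), max(L, xj)) for xj, (l, L) in zip(x, xb)]
            granules[i] = (new_xb, (min(yb[0], y), max(yb[1], y)))
            return granules
    # no granule can expand within rho: create gamma_{c+1} at the sample
    granules.append(([(xj, xj) for xj in x], (y, y)))
    return granules
——————————————————————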
4.5 Summary
This chapter has introduced an interval-based evolving modeling approach to assess the essence of interval data streams and simplify complex problems characterized by nonlinearity and nonstationarity. The IBeM modeling approach for interval data is based on the incremental evolution of hyperrectangle-like forms of information granules and associated interval inclusion functions. Stream data guide the construction of both the granules and the rule base, without the need for human intervention or additional care. IBeM provides human-centric interval approximations as well as precise numeric approximations of functions from uncertain data, using a fast one-pass learning algorithm with modest memory requirements.
Chapter 5
Fuzzy Set Based Evolving Modeling
This chapter introduces an evolving fuzzy-set based granular framework to learn
from and model time-varying fuzzy input-output data streams. The framework
consists of a recursive algorithm capable of developing the structure of fuzzy
rule-based models on-demand. The framework is particularly suitable to handle
potentially unbounded fuzzy data streams and provide single-valued and granular
approximations of unknown nonstationary functions.
5.1 Introduction
A primary requirement of a broad class of evolving intelligent systems is to pro-
cess a sequence of numeric data over time. The fuzzy set based evolving modeling
(FBeM) framework employs fuzzy granular models to deal with more detailed
fuzzy granular data and therefore provide a more intelligible exposition of the
data. For each granular model there exists an associated fuzzy rule base. The
antecedent part of FBeM rules consists of fuzzy hyperboxes, which are inter-
pretable transparent descriptors of input granular data. The consequent part of
FBeM rules has a linguistic and a functional component. The linguistic compo-
nent arises from fuzzy hyperboxes formed by output data granulation. It facili-
tates model interpretation and encloses possible model outputs. The functional
component is derived from input data and real-valued local functions. This component produces more accurate approximations. The rationale behind the FBeM approach is that it looks at input-output data streams under different resolutions and decides when to adopt coarser or more detailed granularities.
The function of FBeM is to deliver simultaneous single-valued and granular
function approximation and linguistic description of the behavior of a system.
Local FBeM models are a set of If-Then rules developed incrementally from input-
output data streams. Learning can start from scratch and, as new information is
brought by the data stream, granules and rules are created and their parameters
adjusted. Therefore, FBeM handles data flexibly, so that redesigning and retraining models along the way is unnecessary. The resulting input-output granular mapping may eventually be either refined or coarsened according to inter-granule relationships and error indices.
5.2 Related Work
This section summarizes works related to incremental learning approaches to
handle data streams. The approaches described next are closely related to the
approach suggested in this chapter.
Fuzzy ARTMAP (30) is an adaptive resonance neural network. Its incremental
learning ability suggests its use in online data processing. Fuzzy ARTMAP is a
supervised neural network characterized by weight vectors. Half of the positions
of weight vectors represent one corner of a hyperbox, and the remaining half the
opposite corner. When new data arrive, the smallest box that encloses the data
is chosen. If no such box exists, then either the box that needs less expansion to
enclose the data is selected, or a new one is created. Next, a neuron representing
a hyperbox is selected and a vigilance criterion is checked. Vigilance serves to
choose another box if the one selected is too large. Fuzzy ARTMAP compares
the mapped class for an input with the actual label. If the labels are different,
then it creates a new neuron and connects it to the actual label. Fuzzy ARTMAP
is not appropriate to fit fuzzy data and provide granular function approximation.
Evolving Mamdani-Takagi-Sugeno neural fuzzy inference system (eMTSFIS)
(58) is a neurofuzzy approach to model the dynamic nature of real-world prob-
lems. eMTSFIS comes with an incremental learning algorithm that evolves its
structure and parameters according to time-varying numeric data streams. The
learning algorithm is life-long and addresses the stability-plasticity dilemma typ-
ical of neurofuzzy constructs. Essentially, the eMTSFIS approach combines the
relatively higher interpretability of Mamdani-type systems with the precision of
Takagi-Sugeno fuzzy systems in numeric data stream modeling.
The uncertain micro-clustering algorithm (UMicro) (2) considers that stream
data arrive together with their underlying standard error instead of assuming the
entire probability distribution function of the data is known. The algorithm uses
uncertainty information to improve the quality of the underlying results. UMicro
incorporates a time decay method to update the statistics of micro-clusters. The
decaying method is especially useful to model drifting concepts in evolving data
streams. The efficiency of the UMicro approach has been demonstrated in a
variety of data sets.
5.3 Structure and Processing
The formalism of fuzzy sets (Section 2.4) provides a framework for the analysis
and representation of fuzzy granular structures.
Let $(x, y)^{[h]}$, $h = 1, \ldots$, be the $h$-th observation of a data stream. The output $y^{[h]}$ is known given the input $x^{[h]}$, or will be known some steps later. In this chapter, each attribute $x_j$ of $x = (x_1, \ldots, x_n)$ is a trapezoidal fuzzy interval $(\underline{\underline{x}}_j, \underline{x}_j, \overline{x}_j, \overline{\overline{x}}_j)$. The same holds for the output $y$, that is, $y = (\underline{\underline{y}}, \underline{y}, \overline{y}, \overline{\overline{y}})$. Therefore, $(x, y)$ shapes a fuzzy hyperbox in the Cartesian product space $X \times Y$.

Let $\gamma^i$, $i = 1, \ldots, c$, be the current collection of FBeM granules built on the basis of $(x, y)$. Granules $\gamma^i$ are defined in the Cartesian product space $X \times Y$.
Rules $R^i$ governing FBeM information granules $\gamma^i$ are of the type:

$R^i$: IF ($x_1$ is $A_1^i$) AND ... AND ($x_n$ is $A_n^i$)
THEN ($y$ is $B^i$) [linguistic] AND $\hat{y} = p^i(x_1, \ldots, x_n)$ [functional],
where $A_j^i$ and $B^i$ are trapezoidal membership functions built in light of the input and
output data available; $p^i$ is a local approximation function. The collection of rules $R^i$, $i = 1, \ldots, c$, casts a rule base. Rules in FBeM are created and adapted on the fly whenever the data ask for improvement of the current model. Notice that an FBeM rule combines both linguistic and functional consequents. The linguistic part of the consequent favors interpretability, since fuzzy sets may come with a label. The functional part offers accuracy. Thus, FBeM takes advantage of both linguistic and functional consequents within a single framework.
Fuzzy sets $A_j^i$ and $B^i$ are generated from scattered fuzzy granulation. The scattering approach clusters the data into fuzzy sets when appropriate, and takes into account the coexistence of a manifold of granularities in the data stream. Sets $A_j^i$ and $B^i$ can be easily extended to fuzzy hyperboxes $\gamma^i$ (granules) in the product space. Granules are positioned at locations populated by input and output data in the product space. Figure 5.1 illustrates the scatter granulation mechanism for fuzzy data. Note in the figure that the granularity of models is coarser than the granularity of data. This provides data compression and a more abstract, human-interpretable representation of the data.
Figure 5.1: Scattering approach for fuzzy data granulation
Fitting data into conveniently placed and sized granules through scattering
leaves substantial flexibility for incremental learning. The FBeM approach grants
freedom in choosing the internal structure of granules.
Yager (138) (140) has demonstrated that a trapezoidal fuzzy set $A_j^i = (l_j^i, \lambda_j^i, \Lambda_j^i, L_j^i)$ allows the modeling of a wide class of granular objects. A triangular fuzzy set is a trapezoid where $\lambda_j^i = \Lambda_j^i$; an interval is a trapezoid where $l_j^i = \lambda_j^i$ and $\Lambda_j^i = L_j^i$; a singleton (singular datum) is a trapezoid where $l_j^i = \lambda_j^i = \Lambda_j^i = L_j^i$. Additional features that make the trapezoidal representation attractive include: (i) ease of acquiring the necessary parameters: only four parameters need to be captured; (ii) many operations on trapezoids can be performed using the endpoints of intervals which are level sets of trapezoids; moreover, the piecewise linearity of the trapezoidal representation allows calculation of only two level sets, corresponding to the core and support, respectively, to obtain a complete implementation; (iii) trapezoids are easy to translate into linguistic labels.
Fuzzy sets $B^i = (u^i, \upsilon^i, \Upsilon^i, U^i)$ are used to assemble granules in the output space. The local function $p^i$ is adapted using samples that rest inside the granule $\gamma^i$. In general, functions $p^i$ can be of different types and are not required to be linear. Here we assume affine functions,

\[
p^i(x_1, \ldots, x_n) = a_0^i + \sum_{j=1}^{n} a_j^i \, \mathrm{mp}(x_j), \tag{5.1}
\]

for simplicity. If higher-order functions $p^i$ are used to approximate a function $f$, then the number of coefficients to be estimated increases, especially when the number of input variables $n$ is large. The recursive least squares algorithm (Appendix B) is used to calculate the coefficients $a_j^i$ of $p^i$.
Trapezoidal fuzzy sets and scatter granulation allow granules to overlap. Therefore, two or more granules can accommodate the same data sample. The FBeM singular output is found as the weighted mean value:

\[
p = \frac{\displaystyle\sum_{i=1}^{c} \min(A_1^i(x_1), \ldots, A_n^i(x_n)) \, p^i(x_1, \ldots, x_n)}{\displaystyle\sum_{i=1}^{c} \min(A_1^i(x_1), \ldots, A_n^i(x_n))}. \tag{5.2}
\]
The granular output is given by the convex hull of the output fuzzy sets $B^{i^*}$, where $i^*$ are the indices of granules that can accommodate the data sample. The convex hull of trapezoidal fuzzy sets $B^1, \ldots, B^c$ is given as follows:

\[
\mathrm{ch}(B^1, \ldots, B^c) = (\min(u^1, \ldots, u^c), \, \min(\upsilon^1, \ldots, \upsilon^c), \, \max(\Upsilon^1, \ldots, \Upsilon^c), \, \max(U^1, \ldots, U^c)). \tag{5.3}
\]

The granular output given by $B^{i^*}$ enriches decision making and motivates interpretability. While being specific, as determined by $p$, we risk being incorrect; being unspecific, as given by $B^{i^*}$, increases our confidence of being correct.
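For concreteness, the sketch below computes the singular output (5.2) and the granular output (5.3) for numeric inputs. Trapezoids are (l, λ, Λ, L) tuples, affine coefficients follow (5.1), and at least one active rule is assumed; the encoding is ours.

——————————————————————
# Sketch of the FBeM output stage (Eqs. 5.1-5.3) for numeric inputs.

def trap_mu(x, t):
    """Trapezoidal membership of numeric x in t = (l, lam, Lam, L)."""
    l, lam, Lam, L = t
    if lam <= x <= Lam:
        return 1.0
    if l < x < lam:
        return (x - l) / (lam - l)
    if Lam < x < L:
        return (L - x) / (L - Lam)
    return 0.0

def fbem_output(x, rules):
    """rules: list of (A, a, B); A: n antecedent trapezoids, a: affine
    coefficients [a0, ..., an], B: consequent trapezoid (u, v, V, U)."""
    acts = [min(trap_mu(xj, Aj) for xj, Aj in zip(x, A)) for A, _, _ in rules]
    ps = [a[0] + sum(aj * xj for aj, xj in zip(a[1:], x)) for _, a, _ in rules]
    p = sum(o * pi for o, pi in zip(acts, ps)) / sum(acts)   # Eq. (5.2)
    active = [B for o, (_, _, B) in zip(acts, rules) if o > 0]
    hull = tuple(f(b[k] for b in active)                     # Eq. (5.3)
                 for k, f in enumerate((min, min, max, max)))
    return p, hull
——————————————————————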
5.4 Learning in FBeM
5.4.1 Setting the Granularity
The maximum width that fuzzy sets $A_j^i$ are allowed to expand to is denoted by $\rho$, that is, $\mathrm{wdt}(A_j^i) = L_j^i - l_j^i \leq \rho$, $j = 1, \ldots, n$; $i = 1, \ldots, c$. Different values of $\rho$ yield different representations of the same data stream at different levels of granularity.

Let the expansion region of a set $A_j^i$ be denoted by

\[
E_j^i = \left[ \mathrm{mp}(A_j^i) - \frac{\rho}{2}, \; \mathrm{mp}(A_j^i) + \frac{\rho}{2} \right], \tag{5.4}
\]

where $\mathrm{mp}(A_j^i) = (\lambda_j^i + \Lambda_j^i)/2$ is the midpoint of $A_j^i$. Expansion regions help to derive criteria for deciding whether or not data samples should be considered in the same granule.
For normalized data, $\rho$ takes values in $[0, 1]$. If $\rho$ equals 0, then FBeM granules are not enlarged. Learning creates a new rule for each sample, which may cause overfitting and lead to excessive complexity and irreproducible, optimistic results. If $\rho$ equals 1, then a single granule may cover the entire data domain, so that FBeM becomes unable to manage nonstationarity in the data. Meaningful life-long adaptability is reached by choosing intermediate values for $\rho$.
In the most general case, FBeM starts learning with an empty rule base and without any knowledge about the data-generating process. In this case, a reasonable approach is to initialize $\rho$ halfway to yield structural stability and plasticity equally. We consider $\rho^{[0]} = 0.5$ as the default initial value.
A fast procedure to evolve $\rho$ over time is as follows. Let $r$ be the difference between the current number of granules and the number of granules $h_r$ steps earlier, $r = c^{[h]} - c^{[h - h_r]}$. If the quantity of granules grows faster than a given rate $\eta$, that is, $r > \eta$, then $\rho$ is increased:

\[
\rho(\text{new}) = \left( 1 + \frac{r}{h_r} \right) \rho(\text{old}). \tag{5.5}
\]

Equation (5.5) controls the value of $\rho$ so as to reject large rule bases and therefore avoid increasing complexity. Large rule bases may not help generalization of the results.

If the number of granules grows at a rate smaller than $\eta$, that is, $r \leq \eta$, then $\rho$ is decreased as follows:

\[
\rho(\text{new}) = \left( 1 - \frac{(\eta - r)}{h_r} \right) \rho(\text{old}). \tag{5.6}
\]

This procedure keeps the model granularity time-varying according to the data stream. Other granularity adaptation procedures may take into consideration estimation errors and their derivatives, as suggested in (79) (80).
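A direct rendering of (5.5)-(5.6) follows, with illustrative parameter values:

——————————————————————
# Sketch of the granularity update (Eqs. 5.5-5.6), run every h_r steps.

def update_rho(rho, c_now, c_before, hr, eta):
    r = c_now - c_before                 # growth in the number of granules
    if r > eta:                          # too many new granules: Eq. (5.5)
        return (1 + r / hr) * rho
    return (1 - (eta - r) / hr) * rho    # otherwise shrink: Eq. (5.6)

# Example: 5 new granules over h_r = 50 steps with eta = 2 enlarges rho
print(update_rho(0.5, c_now=12, c_before=7, hr=50, eta=2))  # 0.55
——————————————————————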
Reducing the maximum width allowed for granules requires shrinking larger granules to fit them to the new value. In this case, the support of a fuzzy set $A_j^i$ is narrowed as follows:

If $\mathrm{mp}(A_j^i) - \frac{\rho(\text{new})}{2} > l_j^i$ then $l_j^i(\text{new}) = \mathrm{mp}(A_j^i) - \frac{\rho(\text{new})}{2}$
If $\mathrm{mp}(A_j^i) + \frac{\rho(\text{new})}{2} < L_j^i$ then $L_j^i(\text{new}) = \mathrm{mp}(A_j^i) + \frac{\rho(\text{new})}{2}$.
Cores $[\lambda_j^i, \Lambda_j^i]$ are handled similarly. Time-varying granularity is useful to avoid guesses about how fast the data stream changes.
5.4.2 Time Granulation
Consider a fuzzy data stream $(x, y)^{[h]}$, $h = 1, \ldots$ Time granulation groups a set of successive samples $(x, y)^{[h]}$, $h = h_b, h_b+1, \ldots, h_e$, where $h_b$ and $h_e$ denote the lower and upper bounds of a time interval $[h_b, h_e]$. The set of instances input during $[h_b, h_e]$ produces a unique granule $\gamma^{[H]}$ whose corresponding fuzzy sets are

\[
A_j^{[H]} = (\min(\underline{\underline{x}}_j^{[h_b]}, \ldots, \underline{\underline{x}}_j^{[h_e]}), \; \min(\underline{x}_j^{[h_b]}, \ldots, \underline{x}_j^{[h_e]}), \; \max(\overline{x}_j^{[h_b]}, \ldots, \overline{x}_j^{[h_e]}), \; \max(\overline{\overline{x}}_j^{[h_b]}, \ldots, \overline{\overline{x}}_j^{[h_e]})), \tag{5.7}
\]

and $B^{[H]}$, which is constructed similarly from the output stream. Instances falling within $A_j^{[H]}$, $j = 1, \ldots, n$, and $B^{[H]}$ are considered indiscernible, and the inequalities

\[
\mathrm{wdt}(A_j^{[H]}) \leq \rho, \ \ j = 1, \ldots, n, \quad \text{and} \quad \mathrm{wdt}(B^{[H]}) \leq \rho \tag{5.8}
\]

hold true.
Whenever input data arrive at different rates, for example, $x_1$ arrives every 2 seconds and $x_2$ every 3 seconds, or the amount of data exceeds the affordable computational cost (e.g., in high-frequency applications), we resort to granulated views of the time domain. Thereafter, rule construction is based on the resulting fuzzy granules, $A_j^{[H]}$ and $B^{[H]}$, rather than on the original data $(x, y)^{[h]}$. FBeM does not need to be exposed to all original data.
5.4.3 Creating Granules
No rule necessarily exists before learning starts. The incremental procedure to create rules runs whenever at least one entry of an input $(x_1, \ldots, x_n)$ does not belong to the expansion regions $(E_1^i, \ldots, E_n^i)$, $i = 1, \ldots, c$, or the output $y \not\subset E_k^i$, $i = 1, \ldots, c$. Otherwise, the current rule base is not modified.
A new granule $\gamma^{c+1}$ is assembled from fuzzy sets $A_j^{c+1}$ and $B^{c+1}$ whose parameters match the sample, that is,

\[
A_j^{c+1} = (l_j^{c+1}, \lambda_j^{c+1}, \Lambda_j^{c+1}, L_j^{c+1}) = (\underline{\underline{x}}_j, \underline{x}_j, \overline{x}_j, \overline{\overline{x}}_j),
\]
\[
B^{c+1} = (u^{c+1}, \upsilon^{c+1}, \Upsilon^{c+1}, U^{c+1}) = (\underline{\underline{y}}, \underline{y}, \overline{y}, \overline{\overline{y}}). \tag{5.9}
\]

Coefficients of the real-valued local function $p^{c+1}$ are set to

\[
a_0^{c+1} = \mathrm{mp}(y); \quad a_j^{c+1} = 0, \ \ j \neq 0. \tag{5.10}
\]
5.4.4 Adapting Granules
Adaptation of granules either expands or contracts the support and the core of fuzzy sets $A_j^i$ and $B^i$ to enclose new data, and simultaneously refines the coefficients of the local functions $p^i$ to improve accuracy. A granule is chosen to be adapted whenever an instance of the data stream falls within its expansion region. In situations in which two or more granules qualify to enclose the data, adapting only one of the granules is enough.
Data and granules are fuzzy objects of trapezoidal nature. A possible similarity measure for vectors of trapezoids is:

\[
S(x, A^i) = 1 - \frac{1}{4n} \sum_{j=1}^{n} \left( |\underline{\underline{x}}_j - l_j^i| + |\underline{x}_j - \lambda_j^i| + |\overline{x}_j - \Lambda_j^i| + |\overline{\overline{x}}_j - L_j^i| \right). \tag{5.11}
\]
This measure is based on the Hamming (city-block) distance (52); it quantifies the degree to which input data match the current knowledge. In particular, equation (5.11) returns 1 for identical trapezoids, indicating the maximum degree of matching. The output value decreases linearly as the trapezoids $x$ and $A^i$ move away from each other. Among all granules qualified to accommodate a particular sample, the one with the highest similarity should be chosen. This procedure prevents conflict and helps to keep the FBeM construction simple. Although equation (5.11) is
simple to compute, involving only basic arithmetic operations, there are no strong
principled reasons to impose this measure. In fact, there is no generally accepted
consensus on a best similarity measure for a given application (33).
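As a sketch, (5.11) amounts to an average of coordinate-wise Hamming distances. The 4-tuple encoding below and normalized $[0, 1]$ domains are our assumptions.

——————————————————————
# Sketch of the trapezoid similarity (Eq. 5.11).

def similarity(x, A):
    """x, A: lists of n trapezoids given as (l, lam, Lam, L) tuples."""
    n = len(x)
    s = sum(abs(xj[k] - Aj[k]) for xj, Aj in zip(x, A) for k in range(4))
    return 1.0 - s / (4.0 * n)

print(similarity([(0.2, 0.3, 0.4, 0.5)], [(0.2, 0.3, 0.4, 0.5)]))  # 1.0
——————————————————————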
Adaptation of granules proceeds depending on how far an input datum $x_j$ is from the fuzzy set $A_j^i$. Namely,

If $x_j \in [\mathrm{mp}(A_j^i) - \frac{\rho}{2}, \, l_j^i]$ then $l_j^i(\text{new}) = x_j$
If $x_j \in [\mathrm{mp}(A_j^i) - \frac{\rho}{2}, \, \lambda_j^i]$ then $\lambda_j^i(\text{new}) = x_j$
If $x_j \in [\lambda_j^i, \, \mathrm{mp}(A_j^i)]$ then $\lambda_j^i(\text{new}) = x_j$
If $x_j \in [\mathrm{mp}(A_j^i), \, \mathrm{mp}(A_j^i) + \frac{\rho}{2}]$ then $\lambda_j^i(\text{new}) = \mathrm{mp}(A_j^i)$
If $x_j \in [\mathrm{mp}(A_j^i) - \frac{\rho}{2}, \, \mathrm{mp}(A_j^i)]$ then $\Lambda_j^i(\text{new}) = \mathrm{mp}(A_j^i)$
If $x_j \in [\mathrm{mp}(A_j^i), \, \Lambda_j^i]$ then $\Lambda_j^i(\text{new}) = x_j$
If $x_j \in [\Lambda_j^i, \, \mathrm{mp}(A_j^i) + \frac{\rho}{2}]$ then $\Lambda_j^i(\text{new}) = x_j$
If $x_j \in [L_j^i, \, \mathrm{mp}(A_j^i) + \frac{\rho}{2}]$ then $L_j^i(\text{new}) = x_j$.
The first and eighth rules suggest support expansion, while the second and seventh recommend core expansion. The remaining cases advise core contraction. Figure 5.2 illustrates seven possible adaptation situations. In the figure, the datum $x = (\underline{\underline{x}}, \underline{x}, \overline{x}, \overline{\overline{x}})$ is placed either outside, partially inside, or inside granule $\gamma^i$. FBeM creates a new granule $\gamma^{c+1}$ or adapts the parameters of $\gamma^i$ accordingly.
Operations on the core parameters, $\lambda_j^i$ and $\Lambda_j^i$, require further adjustment of the midpoint of the respective fuzzy set:

\[
\mathrm{mp}(A_j^i)(\text{new}) = \frac{\lambda_j^i(\text{new}) + \Lambda_j^i(\text{new})}{2}. \tag{5.12}
\]

As a result, support contraction may happen on two occasions:

If $\mathrm{mp}(A_j^i)(\text{new}) - \frac{\rho}{2} > l_j^i$ then $l_j^i(\text{new}) = \mathrm{mp}(A_j^i)(\text{new}) - \frac{\rho}{2}$
If $\mathrm{mp}(A_j^i)(\text{new}) + \frac{\rho}{2} < L_j^i$ then $L_j^i(\text{new}) = \mathrm{mp}(A_j^i)(\text{new}) + \frac{\rho}{2}$.
Adaptation of the consequent fuzzy sets $B^i$ is done similarly using the output data $y$.
Figure 5.2: Creation and recursive adaptation of FBeM granules
Coefficients $a_j^i$ of the local function $p^i$ are updated using the recursive least squares algorithm described in Appendix B. Storing a number of recent instances may be useful to guide alternative coefficient identification algorithms, e.g., algorithms oriented to data chunks. However, this comes with additional memory and processing-time costs.
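The eight rules, the midpoint update (5.12), and the two support contractions can be gathered into a single routine. The sketch below handles one attribute and, for simplicity, a numeric datum rather than a trapezoidal one; it is an illustration, not the thesis implementation.

——————————————————————
# Sketch of the Section 5.4.4 adaptation for one attribute: the list
# A = [l, lam, Lam, L] is modified in place given a numeric datum x.

def adapt_set(x, A, rho):
    l, lam, Lam, L = A
    mp = (lam + Lam) / 2.0
    if mp - rho / 2 <= x <= l:
        A[0] = x                       # rule 1: support expansion
    if mp - rho / 2 <= x <= lam:
        A[1] = x                       # rule 2: core expansion
    elif lam <= x <= mp:
        A[1] = x                       # rule 3: core contraction
    elif mp <= x <= mp + rho / 2:
        A[1] = mp                      # rule 4: core contraction
    if mp - rho / 2 <= x <= mp:
        A[2] = mp                      # rule 5: core contraction
    elif mp <= x <= Lam:
        A[2] = x                       # rule 6: core contraction
    elif Lam <= x <= mp + rho / 2:
        A[2] = x                       # rule 7: core expansion
    if L <= x <= mp + rho / 2:
        A[3] = x                       # rule 8: support expansion
    mp_new = (A[1] + A[2]) / 2.0       # Eq. (5.12)
    if mp_new - rho / 2 > A[0]:
        A[0] = mp_new - rho / 2        # support contraction
    if mp_new + rho / 2 < A[3]:
        A[3] = mp_new + rho / 2        # support contraction
    return A
——————————————————————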
5.4.5 Coarsening the Granular Model
Relationships between granules may be strong enough to justify assembling a more abstract granule that inherits the information of the lower-level granules. The similarity measure (5.11) can be used to quantify granule-to-granule resemblance if we restate it as:

\[
S(A^{i1}, A^{i2}) = 1 - \frac{1}{4n} \sum_{j=1}^{n} \left( |l_j^{i1} - l_j^{i2}| + |\lambda_j^{i1} - \lambda_j^{i2}| + |\Lambda_j^{i1} - \Lambda_j^{i2}| + |L_j^{i1} - L_j^{i2}| \right). \tag{5.13}
\]
This measure has better discrimination capability than, for example, the distance between midpoints of granules (136), and it is fast to calculate.

FBeM combines granules at intervals of $h_r$ steps considering the pair with the highest value of $S(A^{i1}, A^{i2})$, $i1, i2 = 1, \ldots, c$, $i1 \neq i2$, and a decision criterion. The decision may be based on whether the new granule obeys the maximum width allowed, $\rho$.
A new granule $\gamma^i$, the coarsening of $\gamma^{i1}$ and $\gamma^{i2}$, is formed by trapezoidal membership functions $A_j^i$ with parameters derived from $A_j^{i1}$ and $A_j^{i2}$ as follows:

\[
l_j^i = \min(l_j^{i1}, l_j^{i2}), \quad
\lambda_j^i = \min(\lambda_j^{i1}, \lambda_j^{i2}), \quad
\Lambda_j^i = \max(\Lambda_j^{i1}, \Lambda_j^{i2}), \quad
L_j^i = \max(L_j^{i1}, L_j^{i2}). \tag{5.14}
\]
The granule $\gamma^i$ encloses all the content of the granules $\gamma^{i1}$ and $\gamma^{i2}$. The same coarsening procedure is used to determine the parameters of the output membership function $B^i$. The coefficients of the local function of granule $\gamma^i$ are:

\[
a_j^i = \frac{1}{2} (a_j^{i1} + a_j^{i2}), \ \ j = 0, \ldots, n. \tag{5.15}
\]

Combining granules reduces the size of the rule base and eliminates redundancy. The importance of reducing the number of rules in evolving rule-based systems
is discussed in (96).
5.4.6 Removing Granules
A granule should be removed from the system model if it appears inconsistent with the current knowledge. A common approach consists in deleting the most inactive granules (79).

Let

\[
\Theta^i = 2^{\left( -\psi \left( h - h_a^i \right) \right)} \tag{5.16}
\]

be the activity factor associated with the granule $\gamma^i$, similar to (3). The constant $\psi$ is a decay rate, $h$ the current time step, and $h_a^i$ the last time step at which granule $\gamma^i$ was processed. The factor $\Theta^i$ decreases exponentially as $h$ increases. The half-life of a granule is the time spent to reduce the factor $\Theta^i$ by half, that is, $1/\psi$.

The half-life $1/\psi$ is a value that suggests deletion of inactive granules. As a rule, $\psi$ is domain-dependent. Large values of $\psi$ express lower tolerance to inactivity and privilege more compact structures. Small values of $\psi$ add robustness and prevent catastrophic forgetting. If the application requires memorization of isolated events, or if seasonality is expected, then it may be the case to set $\psi$ to 0 and let granules and rules exist forever. In general, $\psi$ should be set in $]0, 1[$ to keep model evolution active.
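For instance, assuming the factor in (5.16) is evaluated at each step:

——————————————————————
# Sketch of the activity factor (Eq. 5.16) driving granule removal.

def activity(h, h_last_active, psi):
    """Theta halves after every 1/psi steps of inactivity."""
    return 2.0 ** (-psi * (h - h_last_active))

# With psi = 0.1 the factor drops to one half after 10 idle steps
print(activity(h=20, h_last_active=10, psi=0.1))  # 0.5
——————————————————————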
5.4.7 Learning Algorithm
The learning procedure to evolve FBeM models can be summarized by the algorithm described below. The algorithm underlines the essence of data-stream-oriented approaches, where instances are read and discarded one at a time. Historical data are dispensable, and evolution proceeds continuously on an incremental basis.
———————————————————————————
BEGIN
  Set parameters ρ, h_r, η, ψ; c = 0;
  Read (x, y)^[h], h = 1;
  Create granule γ^{c+1};
  For h = 2, ... do
    Read (x, y)^[h];
    Provide single-valued approximation p(x^[h]);
    Provide granular approximation B^{i*};
    Calculate output error ε^[h] = mp(y^[h]) − p(x^[h]);
    If x^[h] or y^[h] is not within the expansion regions E^i ∀i
      Create granule γ^{c+1};
    Else
      Adapt the most active granule γ^{i*}, i* = arg max_i (S(x, A^1), ..., S(x, A^c));
      Adapt local function parameters a_j^i using RLS;
    If h = α h_r, α = 1, 2, ...
      Combine granules when feasible;
      Update model granularity ρ;
      Remove inactive granules;
END
———————————————————————————
5.5 Summary
This chapter has introduced FBeM, an evolving granular fuzzy modeling framework for fuzzy granular data streams. FBeM carries a series of properties that make it suitable to model nonstationary functions online using fuzzy data. FBeM provides accurate and granular information simultaneously. Granular model outputs contain a range of possible values delimited by soft boundaries, which makes the outputs more reliable and trustworthy.
Chapter 6
Evolving Granular Neural Networks
This chapter introduces evolving granular neural networks for neurofuzzy mod-
eling of fuzzy data streams. The evolving granular neural network (eGNN) ef-
ficiently handles concept changes, distinctive events of nonstationary environ-
ments. eGNN builds interpretable multi-sized local models using fuzzy neural
information fusion. An incremental learning algorithm develops the neural net-
work topology from the information contained in data streams. We emphasize
fuzzy intervals and objects with trapezoidal membership function representation.
More precisely, the framework considers triangular, interval, and numeric types of
data to construct granular fuzzy models as particular arrangements of trapezoids.
6.1 Introduction
Artificial neural network approaches are able to perform parallel processing, and
identify and generalize patterns in data sets. Although classical neural networks
can approximate any nonlinear continuous function in compact domains, they
usually demand high quality training data and time-consuming offline learning.
Generally speaking, neural networks give black box, non-transparent, difficult to
interpret models.
Granular neural networks were introduced in (112) as a framework to process
information granules. By considering sets of objects sharing commonalities and
imprecise data items instead of precise singular data items, granular neural net-
works avoid processing detailed and costly data. Granular neural networks do not
need to consider all data, which are far more numerous than granules. Rather,
data can be discarded whenever they match an existing information granule.
The eGNN approach described in this chapter, differently from (112), is com-
mitted to online modeling of potentially unbounded fuzzy data streams. We
focus on fuzzy trapezoidal data, namely granular data expressed by trapezoidal
fuzzy sets. Trapezoids allow some freedom in the choice of representative granules
since they embody triangular fuzzy sets, intervals, and real values as particular
realizations (138). The eGNN structure uses fuzzy aggregation neurons as basic
processing units and encodes a set of fuzzy rules and a fuzzy inference system.
Its structure results from a gradual neurofuzzy construction that is transparent
and interpretable. eGNN manages to discover more abstract high-level granular
knowledge from finer granular data. High-level granular knowledge can be easily
translated into a fuzzy knowledge base. Each rule of the knowledge base consists
of two parts: rule antecedent (If part) and rule consequent (Then part). The
consequent part is composed by a linguistic and a local functional (real-valued
function) term. Independently of the choice of aggregation neurons, network
parameters and the nature of input-output data, the linguistic term of the rule
consequent provides a granular output while the functional term gives a singular
(pointwise) output.
The eGNN framework addresses four issues: (i) non-interpretability and lack
of transparency of black box neural network models; (ii ) online processing of
granular data streams; (iii ) trading off precision and interpretability; and (iv )
handling of large volumes of nonstationary data. The first issue is addressed
using fuzzy hyperbox representations, which are interpretable and transparent
descriptors of granular data, together with fuzzy neurons to aggregate data. The
second issue is dealt with through incremental learning mechanisms capable of
processing granular data. The third issue is handled by combining functional
and linguistic fuzzy models into a single model. The last issue is managed by
resorting to a scalable recursive learning algorithm that works on a per-sample
basis, requiring only features of a sample plus a small amount of aggregated
information such as fuzzy rule bases or neural networks. Learning should be one-
pass, neglecting all previously seen data samples: each sample is processed only
once and removed from memory (10).
Learning in eGNN fundamentally means to accommodate new data into ex-
isting granular models on a recursive basis. Learning may add new granules,
neurons and respective connections into the network structure whenever neces-
sary. The parameters of the real-valued functions of rule consequents are also
object of learning. This means that eGNN captures new information from data
streams, adapts itself to the new scenario, and avoids redesigning and retrain-
ing. The granular neural structure may be coarsened or refined depending on
inter-granules relationships (transparency and interpretability) and error indices
(accuracy).
Practical applications of eGNN include evolving regressors, classifiers, fore-
casters and neurofuzzy controllers (76) (77) (82).
6.2 Related Work
There are two main approaches related to evolving granular neural networks.
The first concerns granular neural networks for data granulation, granularity
adaptation, and granular data processing. These are grounded in the princi-
ples of granular computing aiming at problem solving, complexity reduction, and
structured thinking. In this case, granular neural networks often require multi-
ple passes over data sets and offline learning. The second involves data stream
oriented connectionist systems whose focus is on online tracking of nongranular
data in nonstationary environments. Although a number of evolving fuzzy neural
networks have succeeded in dealing with time-varying information, as it will be
shown next, they are not able to process granular fuzzy input-output data. The
evolving granular neural network framework addressed in this chapter benefits
from and enhances both approaches.
6.2.1 Granular Neural Networks
On conceptualizing the world at different granularities, humans usually deal with
information granules hierarchically (88) (142) (156). Human learning profits from
the aggregation of local fragments to form a global picture. Granular computing
offers human-like learning to be gradually embedded into modeling approaches
through incremental granular data processing. Granule oriented neural networks
generalize numeric data oriented neural networks because they provide mecha-
nisms to process both singular and granular data.
Granular self-organizing maps (grSOM) (62) are neurofuzzy models for struc-
ture identification in linguistic system modeling. grSOM induces fuzzy intervals
(granules) from the data using a metric tuned nonlinearly by a mass function. Its
learning approach is supported by an analysis from the lattice theory and genetic
algorithms. grSOM copes with ambiguity and can process fuzzy and interval
input data. Experimental results (62) recommend grSOM as a support tool in
decision making and performance gain in classification tasks. Although grSOM
copes with ambiguity and can process fuzzy and interval input data, it needs the
entire data set to be available a priori. Learning is not incremental, and model
structure and parameters are not adaptive to test data.
Fuzzy granular neural networks (FGNN) (157) consider numeric-linguistic
data fusion, missing value prediction, and granular knowledge discovering in het-
erogeneous data sets. The FGNN packs relatively detailed granular data into
coarser and more intelligible granules. Since FGNN requires offline learning and
uses a gradient descent method to back propagate errors and to adapt parameters,
it is not suitable to handle nonstationary data available gradually.
Granular reflex fuzzy min-max neural networks (GrRFMN) (108) learn from
and classify interval granular online data. The network structure emulates the
reflex mechanism of the human brain and deals with class overlapping using com-
pensation neurons. The GrRFMN training algorithm gives a way to compute
datum-model membership which leads to better network performance. Experi-
ments with real data sets (108) assert the effectiveness of the approach.
The fuzzy min-max neural network (GFMM) (46) summarized in Section 3.2.2
is another example of a closely related granular neurofuzzy approach.
The eGNN approach differs from the granular neural network approaches described above because it is oriented to fuzzy interval data streams, self-adapts its structure continuously, and does not need to store past data.
6.2.2 Data Stream Oriented Neural Networks
The continuous increase in availability of large amounts of data has motivated
development of algorithms to process online data streams (11) (66). Evolving
the structure of models using new information in nonstationary data streams is a
challenging issue. Most of the traditional data mining and machine learning and
statistical algorithms usually assume a form of stationarity and demand multi-
ple passes over data sets. Thus, they do not meet the requirements needed for
online learning. The design of evolving models is concerned with gradual model
construction aiming at inducing new knowledge without catastrophic forgetting,
and refining current knowledge keeping the system in operation.
Unsupervised evolving neural networks (65) can perform classification from
unlabeled data streams. An example in this category is the Evolving Self-Organiz-
ing Map (eSOM), which uses standard principles of self-organization in an incre-
mental basis. eSOM allows prototype neurons to evolve in the input space to si-
multaneously acquire and keep a topological representation. The neighborhoods
of evolved neurons are not predefined, and differently from non-evolving SOM,
they are determined online according to distances between neurons (65). The
eSOM learning algorithm is fast since it dispenses neighborhood ranking search,
but it is unable to cope with fuzzy intervals. eSOM is not suitable for function
approximation.
Evolving fuzzy neural networks (EFuNN) (63) adapt their structure and pa-
rameters through incremental, hybrid supervised/unsupervised online learning.
EFuNN is able to model streams of data, new features, and new classes. Neurons
and connections are created during learning. Fuzzy or non-fuzzy rules can be
extracted from the network at any time. EFuNN has shown to be efficient in pat-
tern recognition. The dynamic evolving neural-fuzzy inference system (DENFIS)
(64) is another type of fuzzy rule-based system for adaptive online learning. Like
EFuNN, DENFIS evolves using a hybrid incremental algorithm to fit new input
data. Fuzzy rules may be created, updated or deleted before or during system op-
eration. It has been shown that DENFIS can learn complex temporal sequences
and outperform similar approaches. However, neither EFuNN nor DENFIS can process fuzzy intervals or provide granular output.
The eGNN approach is different from the evolving neural network approaches
described above primarily because of its ability to process fuzzy granular data
streams. Moreover, eGNN is able to simultaneously provide single-valued and
granular function approximation or classification.
6.3 Fuzzy Aggregation Neuron Model
This section introduces fuzzy aggregation neurons which are pertinent when pro-
cessing data through successive layers of evolving granular neural networks.
Aggregation neurons are artificial neuron models based on aggregation opera-
tors (see Section 2.5). Evolving granular neural networks may use different types
of aggregation neurons to perform information fusion (26). In general, there are
no specific guidelines to choose a particular aggregation operator to construct a
fuzzy neuron. The choice depends on the application environment and domain
knowledge. Although the choice usually conforms to simplicity, transparency, and
flexibility requirements, occasionally, it may conform to the system performance
using the available data.
Let $\tilde{x} = (\tilde{x}_1, \ldots, \tilde{x}_n)$ be a vector of membership degrees of a sample $x = (x_1, \ldots, x_n)$ in the fuzzy sets $G_j$ of $G = (G_1, \ldots, G_n)$. Let $w = (w_1, \ldots, w_n)$ be a weighting vector such that

\[
w_j \in [0, 1], \ \ j = 1, \ldots, n. \tag{6.1}
\]

Fuzzy aggregation neurons employ the product T-norm to perform synaptic processing and an aggregation operator $C$ to fuse the individual results of synaptic processing in the neuron body. The output of a fuzzy aggregation neuron is:
\[
o = C(\tilde{x}_1 w_1, \ldots, \tilde{x}_n w_n). \tag{6.2}
\]
An aggregation neuron produces a diversity of nonlinear mappings between neuron inputs and output, depending on the choice of the weights $w$, the triangular norms $T$ and $S$, and the parameters $e$ (neutral element of uninorms, see Section 2.5.3) and $v$ ($v$-factor of compensatory aggregations, see Section 2.5.5). The structure of a fuzzy aggregation neuron is shown in Fig. 6.1. Examples of outputs generated by fuzzy aggregation neurons using a uninorm $U$ with $T = \min$, $S = \max$, $e = 0.3$, $v = 0$, and a T-S aggregation $L$ with $T = \min$, $S = \max$, $e = 0$, $v = 0.3$, are illustrated in Figs. 6.2(a) and 6.2(b), respectively.
Figure 6.1: Fuzzy aggregation neuron model
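A minimal rendering of neuron (6.2) is shown below, with product T-norm synapses and, as one possible choice for $C$, the minimum; uninorms or T-S operators from Section 2.5 could be plugged in instead. Names are illustrative.

——————————————————————
# Sketch of a fuzzy aggregation neuron (Eq. 6.2) with C = min.

def aggregation_neuron(memberships, weights, C=min):
    """memberships: degrees x~_j in [0,1]; weights: w_j in [0,1]."""
    synapses = [m * w for m, w in zip(memberships, weights)]  # product T-norm
    return C(synapses)

print(aggregation_neuron([0.8, 0.6, 0.9], [1.0, 0.5, 1.0]))  # 0.3
——————————————————————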
6.4 Structure and Processing
Let $x = (x_1, \ldots, x_n)$ be an input vector and $y$ its corresponding output. Assume that the data stream $(x, y)^{[h]}$, $h = 1, \ldots$, consists of samples derived from a time-varying, nonstationary function $f$. Inputs $x_j$ and output $y$ are symmetric fuzzy data.

eGNN has a four-layer structure, as shown in Fig. 6.3. The input layer delivers samples $x^{[h]}$, one at a time, to the network. The granular layer consists of a collection of fuzzy sets $G_j^i$, $j = 1, \ldots, n$; $i = 1, \ldots, c$, stratified from the input data. Fuzzy sets $G_j^i$, $i = 1, \ldots, c$, form a fuzzy partition of the $j$-th input domain, $X_j$. Similarly, fuzzy sets $\Gamma^i$, $i = 1, \ldots, c$, assemble a fuzzy partition of the output domain $Y$.
Figure 6.2: Examples of input/output functions of fuzzy aggregation neurons: (a) uninorm $U_{\min,\max}$ with $e = 0.3$, $v = 0$; (b) T-S operator $L_{\min,\max}$ with $e = 0$, $v = 0.3$
A granule $\gamma^i = G_1^i \times \ldots \times G_n^i \times \Gamma^i$ is a fuzzy relation in $X_1 \times \ldots \times X_n \times Y$. Thus, granule $\gamma^i$ has membership function $\gamma^i(x, y) = \min\{G_1^i(x_1), \ldots, G_n^i(x_n), \Gamma^i(y)\}$ in $X_1 \times \ldots \times X_n \times Y$. Granule $\gamma^i$ is denoted by $\gamma^i = (G^i, \Gamma^i)$, with $G^i = (G_1^i, \ldots, G_n^i)$, for short. Moreover, the granule $\gamma^i$ comes with an associated local function $p^i$. In general, the functions $p^i$ can be of different types and are not required to be linear. This study employs non-fuzzy, real-valued affine functions:

\[
p^i(\hat{x}_1, \ldots, \hat{x}_n) = \hat{y}^i = a_0^i + \sum_{j=1}^{n} a_j^i \hat{x}_j, \tag{6.3}
\]
Figure 6.3: Single-valued approximation provided from input data processing
for simplicity. Parameters $a_0^i$ and $a_j^i$ are real values; $\hat{x}_j$ is the midpoint of $x_j = (\underline{\underline{x}}_j, \underline{x}_j, \overline{x}_j, \overline{\overline{x}}_j)$, which is obtained from:

\[
\mathrm{mp}(x_j) = \hat{x}_j = \frac{\underline{x}_j + \overline{x}_j}{2}. \tag{6.4}
\]

Similarity degrees $\tilde{x}^i = (\tilde{x}_1^i, \ldots, \tilde{x}_n^i)$ arise as the result of matching between an input $x = (x_1, \ldots, x_n)$ and the fuzzy sets of $G^i = (G_1^i, \ldots, G_n^i)$; see Section 2.4.3. The aggregation layer comprises fuzzy aggregation neurons $C^i$, $i = 1, \ldots, c$, whose role is to combine information from the different inputs. A fuzzy aggregation
neuron $C^i$ combines the weighted similarity degrees $(\tilde{x}_1^i w_1^i, \ldots, \tilde{x}_n^i w_n^i)$ into a single value $o^i$. The output layer aggregates the weighted values $(o^1 \hat{y}^1 \delta^1, \ldots, o^c \hat{y}^c \delta^c)$ using a fuzzy aggregation neuron $C^f$ to produce the single network output $\hat{y}^{[h]}$. Formally,

\[
\hat{y} = C^f(o^1 \hat{y}^1 \delta^1, \ldots, o^c \hat{y}^c \delta^c). \tag{6.5}
\]
An $m$-output eGNN requires a vector of local functions $(p_1^i, \ldots, p_m^i)$, output layer neurons $(C_1^f, \ldots, C_m^f)$, and outputs $(\hat{y}_1, \ldots, \hat{y}_m)$. The network output $\hat{y}$, obtained as illustrated in Fig. 6.3, is a single-valued approximation of $f$, regardless of whether the input data are real numbers, fuzzy numbers, or fuzzy intervals.
The granular approximation of a function $f$ at step $H$ is given by a set of granules $\gamma^i$, $i = 1, \ldots, c$, such that:

\[
(x, y)^{[h]} \subset \bigcup_{i=1}^{c} \gamma^i, \ \ h = 1, \ldots, H. \tag{6.6}
\]

The granular approximation is formed by granulating the input data $x^{[h]}$ into fuzzy sets of $G^i$, as in Fig. 6.3, and the output data $y^{[h]}$ into fuzzy sets $\Gamma^i$, as in Fig. 6.4. Note in Fig. 6.4 that the granular approximation is the convex hull of the output fuzzy sets $\Gamma^{i^*}$, where $i^*$ are the indices of the active granules, that is, those for which $o^i > 0$. This guarantees that the single-valued approximation $\hat{y}^{[h]}$ is included in the granule.
Figure 6.4: Granular approximation formed by input and output data granulation
The convex hull of trapezoidal fuzzy sets $\Gamma^1, \ldots, \Gamma^c$, with $\Gamma^i = (\underline{\underline{u}}^i, \underline{u}^i, \overline{u}^i, \overline{\overline{u}}^i)$, is a trapezoidal fuzzy set $\mathrm{ch}(\Gamma^1, \ldots, \Gamma^c)$ such that:

\[
\mathrm{ch}(\Gamma^1, \ldots, \Gamma^c) = (\min(\underline{\underline{u}}^1, \ldots, \underline{\underline{u}}^c), \; \min(\underline{u}^1, \ldots, \underline{u}^c), \; \max(\overline{u}^1, \ldots, \overline{u}^c), \; \max(\overline{\overline{u}}^1, \ldots, \overline{\overline{u}}^c)). \tag{6.7}
\]

In particular, in Fig. 6.4 the trapezoid $(\underline{\underline{u}}^{i^*}, \underline{u}^{i^*}, \overline{u}^{i^*}, \overline{\overline{u}}^{i^*})$ that results from $\mathrm{ch}(\Gamma^{i^*})$, where $i^* = \{i : o^i > 0, \; i = 1, \ldots, c\}$, is a granular approximation of the output data $y$. It is worth noting that the granular approximation at instant $h$ does not depend on the availability of $y^{[h]}$, because $o^i$ is obtained from $x^{[h]}$ (see Fig. 6.3). Only the collection of output fuzzy sets $\Gamma^i$ is required.
Figure 6.5 shows an example of single-valued and granular approximations, $p$ and $\bigcup_{i=1}^{c} \gamma^i$, of a function $f$. In Fig. 6.5(a), a singular input $x^{[h1]}$ and a granular input $x^{[h2]}$ produce singular outputs $\hat{y}^{[h1]}$ and $\hat{y}^{[h2]}$ by using $p$. In Fig. 6.5(b), the granular input $x^{[h]}$ activates the fuzzy sets of $G^2$ and $G^3$. Therefore, the granular output is obtained from the convex hull operation $\mathrm{ch}(\Gamma^2, \Gamma^3)$. It follows that $y^{[h]} \subset \mathrm{ch}(\Gamma^2, \Gamma^3)$. If $y^{[h]} \not\subset \mathrm{ch}(\Gamma^2, \Gamma^3)$, either fuzzy set $\Gamma^2$ or $\Gamma^3$ is adapted to enclose $y^{[h]}$. Granule adaptation is addressed in the next section.
Information processing in the intermediate layers of eGNN is single-valued to speed up calculations. Thus, input and output fuzzy sets can be viewed as decoders and encoders of granular data. eGNN comes with an incremental learning algorithm to adapt its structure and parameters over time. The algorithm is detailed in Section 6.5.

eGNN evolves functional and linguistic fuzzy models. While functional fuzzy systems are more precise, linguistic fuzzy systems are more interpretable. Accuracy and interpretability require tradeoffs, and one usually excels over the other. eGNN joins functional and linguistic systems into a single framework. Under the assumption of specific weights and neurons, fuzzy rules extracted from eGNN can be of the type:
Figure 6.5: eGNN single-valued (a) and granular (b) approximation of a function $f$: (a) single-valued approximation $p$; (b) granular approximation $\bigcup_{i=1}^{5} \gamma^i$
$R^i$: IF ($x_1$ is $G_1^i$) AND ... AND ($x_n$ is $G_n^i$)
THEN ($y$ is $\Gamma^i$) [linguistic] AND $\hat{y} = p^i(x_1, \ldots, x_n)$ [functional].
This type of eGNN combines Mamdani and Takagi-Sugeno fuzzy models.
6.5 Learning in eGNN
This section details the eGNN learning algorithm. Differently from the usual top-down granular approaches, because the data domain is unknown beforehand,
the eGNN learning approach is mostly bottom-up.
Developing the fuzzy rules encoded in the network structure and approximat-
ing nonstationary functions from granular data streams are the key concerns of
the learning approach. The eGNN learning employs a sample-per-sample testing-
before-training method on a recursive basis. This method portrays a truly online
data stream scenario. We assume that no granules, neurons and connections
exist before training starts. The algorithm builds the network structure in plug-
and-play mode. Single pass over data enables eGNN to address the issues of
unbounded data sets and scalability of computationally hard problems.
We assume trapezoidal membership functions $G_j^i = (\underline{\underline{g}}_j^i, \underline{g}_j^i, \overline{g}_j^i, \overline{\overline{g}}_j^i)$ and input data $x_j = (\underline{\underline{x}}_j, \underline{x}_j, \overline{x}_j, \overline{\overline{x}}_j)$. Similarly, $\Gamma^i = (\underline{\underline{u}}^i, \underline{u}^i, \overline{u}^i, \overline{\overline{u}}^i)$ and the output data $y = (\underline{\underline{y}}, \underline{y}, \overline{y}, \overline{\overline{y}})$ are trapezoids. Each rule antecedent $G^i = (G_1^i, \ldots, G_n^i)$ has a corresponding consequent $\Gamma^i$. With $\gamma^i = (G^i, \Gamma^i)$, eGNN looks at examples $(x, y)$ at a coarser granule size.
6.5.1 Adapting the Granularity
Balancing parametric and structural adaptation is a key to capture gradual and
abrupt changes of nonstationary systems online. The neural pattern recognition
literature refers to this issue as the stability-plasticity dilemma (29). Structural
plasticity in eGNN means creating new granules and rules to memorize new con-
cepts. This avoids the rules learned to be exposed to catastrophic forgetting.
Structural stability preserves the eGNN structure, but allows adaptation of the
existing granules and rules to smooth and slow changes. Parametric refinement
partially retains the information. The procedure suggested below is a way to
parsimoniously reconcile plasticity and stability in eGNN.
The maximum width that fuzzy sets $G_j^i$ are allowed to expand to is denoted by $\rho$, that is, $\mathrm{wdt}(G_j^i) \leq \rho$, $j = 1, \ldots, n$; $i = 1, \ldots, c$. The value of $\rho$ affects the granularity, accuracy, and transparency of the models. Similarly to Chapter 5, the expansion region of a fuzzy set $G_j^i$ is

\[
E_j^i = \left[ \mathrm{mp}(G_j^i) - \frac{\rho}{2}, \; \mathrm{mp}(G_j^i) + \frac{\rho}{2} \right]. \tag{6.8}
\]
It follows that

\[
\mathrm{wdt}(G_j^i) \leq \mathrm{wdt}(E_j^i) \quad \forall j, i. \tag{6.9}
\]

$\rho$ plays a pivotal role in mediating the plasticity and stability of eGNN structures. Expressions similar to (6.8) and (6.9) can be derived for the fuzzy sets $\Gamma^i$. Expansion regions help to derive criteria for deciding whether or not granular data should be considered enclosed by the current granular model.

In practice, $\rho \in [0, 1]$ determines the need to create or adapt rules. In the most general case, the neural network starts learning with an empty rule base and devoid of knowledge about the data properties. In such cases it is reasonable to initialize $\rho$ with an intermediate value to allow structural stability and plasticity equally. We use $\rho^{[0]} = 0.5$ as the default initial value.
A simple and fast procedure to evolve $\rho$ is as follows. Let $r$ be the number of rules created in the last $h_r$ steps. If the number of rules grows faster than a rate $\eta$, that is, $r > \eta$, then $\rho$ is increased:

\[
\rho(\text{new}) = \left( 1 + \frac{r}{h_r} \right) \rho(\text{old}). \tag{6.10}
\]

The idea is to reject large rule bases because they increase model complexity and may not help generalization. Equation (6.10) acts against outbursts of growth.

Otherwise, if the number of rules grows at a rate smaller than $\eta$, that is, $r \leq \eta$, then $\rho$ is decreased:

\[
\rho(\text{new}) = \left( 1 - \frac{(\eta - r)}{h_r} \right) \rho(\text{old}). \tag{6.11}
\]

If $\rho = 1$, then eGNN is structurally stable, but unable to capture abrupt changes. Conversely, if $\rho = 0$, then eGNN overfits the data, causing excessive complexity and irreproducible, optimistic results. Life-long adaptability is reached by choosing intermediate values for $\rho$, as depicted in Fig. 6.6.
Figure 6.6: Stability-plasticity tradeoff and the role of $\rho$ in eGNN systems
Reducing the maximum width allowed for granules may require shrinking larger granules to fit them to the new value. In this case, the support of a fuzzy set $G_j^i$ is narrowed as follows:

If $\mathrm{mp}(G_j^i) - \frac{\rho(\text{new})}{2} > \underline{\underline{g}}_j^i$ then $\underline{\underline{g}}_j^i(\text{new}) = \mathrm{mp}(G_j^i) - \frac{\rho(\text{new})}{2}$
If $\mathrm{mp}(G_j^i) + \frac{\rho(\text{new})}{2} < \overline{\overline{g}}_j^i$ then $\overline{\overline{g}}_j^i(\text{new}) = \mathrm{mp}(G_j^i) + \frac{\rho(\text{new})}{2}$
Cores $[\underline{g}_j^i, \overline{g}_j^i]$, and the supports $[\underline{\underline{u}}^i, \overline{\overline{u}}^i]$ and cores $[\underline{u}^i, \overline{u}^i]$ of the fuzzy sets $\Gamma^i$, are handled similarly. Time-varying granularity is useful to avoid guesses about how fast and how often the data stream changes. The accuracy-interpretability tradeoff is an important issue in neurofuzzy computing (113).
6.5.2 Calculating Similarity Degree
As input data and granules are trapezoidal fuzzy objects, a potential similarity measure to determine how well they match is:

\[
\tilde{x}_j^i = \begin{cases}
1 - \dfrac{|\underline{\underline{g}}_j^i - \underline{\underline{x}}_j| + |\underline{g}_j^i - \underline{x}_j| + |\overline{g}_j^i - \overline{x}_j| + |\overline{\overline{g}}_j^i - \overline{\overline{x}}_j|}{4 \left( \max(\overline{\overline{g}}_j^i, \overline{\overline{x}}_j) - \min(\underline{\underline{g}}_j^i, \underline{\underline{x}}_j) \right)} & \text{if } x_j \cap G_j^i \neq \emptyset \\[3mm]
0 & \text{otherwise.}
\end{cases} \tag{6.12}
\]
This measure returns $\tilde{x}_j^i$ equal to 1 for superposed trapezoids and decreases linearly as any numerator term increases. The numerator terms are Hamming distances between pairs of corresponding parameters of two different trapezoids. The denominator scales the result to lie in the range from zero to one. Non-overlapping trapezoids
are considered dissimilar, yielding $\tilde{x}_j^i$ equal to 0.
Reference (46) introduces a similarity measure between an interval and a fuzzy
set. The measure is based on the minimum T-norm and decreases monotonically
as the distance between the interval and the fuzzy set increases. Reference (108)
pointed out that this similarity measure gives low values to significantly over-
lapped objects. In extreme cases, the similarity measure proposed in (46) returns
zero for a fuzzy set contained in an interval. To avoid this situation, (108) pro-
poses the average of the overlapping area between the interval and fuzzy set as
similarity measure.
The similarity measure in (6.12) extends the measures in (46) and (108) to
trapezoidal fuzzy sets. It overcomes problems which may arise due to some T-
norms when considering the boundaries of intervals, as in (46), but is faster than
the measure in (108) because it does not need to compute the area of arbitrary
polygons.
6.5.3 Creating Granules
The incremental procedure to create granules runs whenever the support of at least one entry of an input vector $(x_1, \ldots, x_n)$ is not enclosed by the expansion regions $(E_1^i, \ldots, E_n^i)$, $i = 1, \ldots, c$. In this case, the fuzzy sets $G^i$ cannot be expanded beyond the limit $\rho$ to fit the sample. Analogously, if $\mathrm{supp}(y)$ is not enclosed by $E^i$ for at least one $\Gamma^i$, then the sample should be enclosed by a new granule.

A new granule $\gamma^{c+1}$ is assembled from fuzzy sets $G_j^{c+1}$ and $\Gamma^{c+1}$ whose parameters match the sample:

\[
(\underline{\underline{g}}_j^{c+1}, \underline{g}_j^{c+1}, \overline{g}_j^{c+1}, \overline{\overline{g}}_j^{c+1}) = (\underline{\underline{x}}_j, \underline{x}_j, \overline{x}_j, \overline{\overline{x}}_j), \tag{6.13}
\]
\[
(\underline{\underline{u}}^{c+1}, \underline{u}^{c+1}, \overline{u}^{c+1}, \overline{\overline{u}}^{c+1}) = (\underline{\underline{y}}, \underline{y}, \overline{y}, \overline{\overline{y}}). \tag{6.14}
\]

Coefficients of the real-valued local function $p^{c+1}$ are set to

\[
a_0^{c+1} = \mathrm{mp}(y), \quad a_j^{c+1} = 0, \ \ j \neq 0. \tag{6.15}
\]
6.5.4 Adapting Granules
Adaptation of granules means expanding or contracting the support and the core of fuzzy sets $G_j^i$ and $\Gamma^i$ to enclose new data, and refining the coefficients of the local functions $p^i$.

Granule $\gamma^i$ can be adapted whenever a sample $(x, y)$ falls within its expansion region, that is,

\[
\mathrm{supp}(x_j) \subset E_j^i, \ \ j = 1, \ldots, n, \quad \text{and} \quad \mathrm{supp}(y) \subset E^i. \tag{6.16}
\]

This means that either the sample is enclosed by granule $\gamma^i$, or it is close enough that the granule can be expanded to enclose it. In situations in which two or more granules qualify to enclose the data, adapting only one of them is enough to guarantee data inclusion. In particular, we may choose $\gamma^{i^*}$ such that $i^* = \arg\max(o^1, \ldots, o^c)$. In other words, $\gamma^{i^*}$ is the granule with the highest activation level for the given sample.
Adaptation proceeds depending on where the input datum $x_j$ lies with respect to the fuzzy set $G_j^i$:

If $x_j \in [\mathrm{mp}(G_j^i) - \frac{\rho}{2}, \, \underline{\underline{g}}_j^i]$ then $\underline{\underline{g}}_j^i(\text{new}) = \underline{\underline{x}}_j$
If $x_j \in [\mathrm{mp}(G_j^i) - \frac{\rho}{2}, \, \underline{g}_j^i]$ then $\underline{g}_j^i(\text{new}) = \underline{x}_j$
If $x_j \in [\underline{g}_j^i, \, \mathrm{mp}(G_j^i)]$ then $\underline{g}_j^i(\text{new}) = \underline{x}_j$
If $x_j \in [\mathrm{mp}(G_j^i), \, \mathrm{mp}(G_j^i) + \frac{\rho}{2}]$ then $\underline{g}_j^i(\text{new}) = \mathrm{mp}(G_j^i)$
If $x_j \in [\mathrm{mp}(G_j^i) - \frac{\rho}{2}, \, \mathrm{mp}(G_j^i)]$ then $\overline{g}_j^i(\text{new}) = \mathrm{mp}(G_j^i)$
If $x_j \in [\mathrm{mp}(G_j^i), \, \overline{g}_j^i]$ then $\overline{g}_j^i(\text{new}) = \overline{x}_j$
If $x_j \in [\overline{g}_j^i, \, \mathrm{mp}(G_j^i) + \frac{\rho}{2}]$ then $\overline{g}_j^i(\text{new}) = \overline{x}_j$
If $x_j \in [\overline{\overline{g}}_j^i, \, \mathrm{mp}(G_j^i) + \frac{\rho}{2}]$ then $\overline{\overline{g}}_j^i(\text{new}) = \overline{\overline{x}}_j$
The first and last rules imply support expansion; the second and seventh rules, core expansion; and the remaining cases, core contraction. Notice that these adaptation procedures are similar to those of Section 5.4.4 for the FBeM approach (refer to Fig. 5.2 for examples).
Operations on the core parameters, $\underline{g}_j^i$ and $\overline{g}_j^i$, require adjustment of the midpoint of the respective fuzzy set:

\[
\mathrm{mp}(G_j^i)(\text{new}) = \frac{\underline{g}_j^i(\text{new}) + \overline{g}_j^i(\text{new})}{2}. \tag{6.17}
\]

As a result, support contraction may happen on two occasions:

If $\mathrm{mp}(G_j^i)(\text{new}) - \frac{\rho}{2} > \underline{\underline{g}}_j^i$ then $\underline{\underline{g}}_j^i(\text{new}) = \mathrm{mp}(G_j^i)(\text{new}) - \frac{\rho}{2}$
If $\mathrm{mp}(G_j^i)(\text{new}) + \frac{\rho}{2} < \overline{\overline{g}}_j^i$ then $\overline{\overline{g}}_j^i(\text{new}) = \mathrm{mp}(G_j^i)(\text{new}) + \frac{\rho}{2}$
The adaptation of the consequent fuzzy sets $\Gamma^i$ is done similarly using the output data $y$. Coefficients $a_j^i$ of the local functions $p^i$ are updated using the recursive least squares algorithm detailed in Appendix B.
6.5.5 Incremental Weighting
Aggregation layer weights $w_j^i \in [0, 1]$ embody the importance of the membership degree of the $j$-th attribute in fuzzy set $G_j^i$ to the neural network output. If $w_j^i = 1$, then the output is not affected. A relatively lower value of $w_j^i$ discounts the impact of the respective attribute. If $w_j^i = 0$, then the attribute is ignored. The procedure described below assigns lower weight values to less helpful attributes.

Whenever a new granule $\gamma^{c+1}$ is created, the learning procedure sets $w_j^{c+1} = 1$, $j = 1, \ldots, n$. If it is known a priori that the input variables have different importances, then the values of $w_j^{c+1}$ can be chosen differently to reflect domain knowledge.
Taking into account the similarity measure (6.12) and the approximation error (B.4), the weights $w_j^i$ corresponding to the most active granule $\gamma^i$, where $i = \arg\max(o^1, \ldots, o^c)$, are recursively updated using:

\[
w_j^i(\text{new}) = w_j^i(\text{old}) - \tilde{x}_j^i \, o^i \, |\epsilon|. \tag{6.18}
\]
The idea here is that the single-valued approximation $\hat{y}$ is more strongly affected by the more active granules and attributes. Equation (6.18) ascribes to the $j$-th attribute of $G^i$ a proportion of the approximation error.

Incremental weighting looks for relevant subsets of input variables. The procedure (6.18) is particularly simple and fast to compute. More elaborate approaches for incremental weighting are addressed in (98).
6.5.6 Pruning
Pruning granules simplifies the neural network structure and keeps it flexible to
track dynamic behavior. We opt to prune the most inactive granules because
retaining a small number of highly active granules favors compactness and speed.
Output layer weights $\delta^i \in [0,1]$ help pruning by encoding the amount of data assigned to granule $\gamma^i$. Learning starts with $\delta^i = 1$. During the next steps, $\delta^i$ is reduced whenever $\gamma^i$ is not activated within $h_r$ steps, as follows:

$\delta^i(\mathrm{new}) = \zeta \, \delta^i(\mathrm{old})$,  (6.19)

where $\zeta \in [0,1]$. Otherwise, if $\gamma^i$ is activated at least once within $h_r$ steps, then $\delta^i$ is increased:

$\delta^i(\mathrm{new}) = \delta^i(\mathrm{old}) + \zeta \, (1 - \delta^i(\mathrm{old}))$.  (6.20)

If the value of $\delta^i$ falls below a threshold $\vartheta$, then granule $\gamma^i$, its respective neuron $A^i$, and connections are pruned, since they do not affect system accuracy significantly. If the application requires memorization of rare events, or cyclical behavior is envisioned, then it may be the case to set $\vartheta = 0$ and let $\delta^i \to 0^+$. In this case, the granule is kept in the network structure.
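The pruning bookkeeping of (6.19)-(6.20) can be sketched as follows; the flat list representation and all names are illustrative.
———————————————————————
# Sketch of the pruning bookkeeping in (6.19)-(6.20): `delta` holds one value
# per granule, `active` flags which granules fired at least once during the
# last h_r steps. Names are illustrative.

def update_deltas(delta, active, zeta=0.5):
    for i in range(len(delta)):
        if active[i]:
            delta[i] = delta[i] + zeta * (1.0 - delta[i])   # reward, eq. (6.20)
        else:
            delta[i] = zeta * delta[i]                      # decay, eq. (6.19)
    return delta

def prunable(delta, theta=0.5):
    """Indices of granules whose delta fell below the threshold."""
    return [i for i, d in enumerate(delta) if d < theta]

print(prunable(update_deltas([1.0, 0.6], [True, False])))   # -> [1]
———————————————————————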
6.5.7 Combining Granules
Relationships between granules may be strong enough to justify forming a larger granule that inherits the information of the lower-level granules. A metric to measure the distance between trapezoidal objects, say granules $\gamma^{i1}$ and $\gamma^{i2}$, is:

$D(\gamma^{i1}, \gamma^{i2}) = \dfrac{1}{4(n+1)} \Big( \sum_{j=1}^{n} \big( |\underline{\underline{g}}_j^{i1} - \underline{\underline{g}}_j^{i2}| + |\underline{g}_j^{i1} - \underline{g}_j^{i2}| + |\overline{g}_j^{i1} - \overline{g}_j^{i2}| + |\overline{\overline{g}}_j^{i1} - \overline{\overline{g}}_j^{i2}| \big) + |\underline{\underline{u}}^{i1} - \underline{\underline{u}}^{i2}| + |\underline{u}^{i1} - \underline{u}^{i2}| + |\overline{u}^{i1} - \overline{u}^{i2}| + |\overline{\overline{u}}^{i1} - \overline{\overline{u}}^{i2}| \Big)$.  (6.21)
$D$ is a distance measure since it satisfies

$D(\gamma^{i1}, \gamma^{i2}) \geq 0$
$D(\gamma^{i1}, \gamma^{i2}) = 0$ if and only if $\gamma^{i1} = \gamma^{i2}$
$D(\gamma^{i1}, \gamma^{i2}) = D(\gamma^{i2}, \gamma^{i1})$
$D(\gamma^{i1}, \gamma^{i3}) \leq D(\gamma^{i1}, \gamma^{i2}) + D(\gamma^{i2}, \gamma^{i3})$

for any $\gamma^{i1}$, $\gamma^{i2}$ and $\gamma^{i3}$. Distance $D$ is fast to compute and more accurate than both the distance between midpoints of trapezoids and the distance between their closest points: a change in any parameter of the underlying trapezoids is reflected in the value of $D$.
Granules are combined after $h_r$ steps considering the lowest value of $D(\gamma^{i1}, \gamma^{i2})$, $i1, i2 = 1, \ldots, c$, $i1 \neq i2$, and a decision criterion. For instance, the decision criterion may check whether the new granule obeys the maximum width allowed, $\rho$.

A new granule $\gamma^i$, a coarsening of $\gamma^{i1}$ and $\gamma^{i2}$, is formed by trapezoidal membership functions $G_j^i$ as follows:

$G_j^i = ch(G_j^{i1}, G_j^{i2}), \; j = 1, \ldots, n$,  (6.22)

where $ch$ denotes the convex hull. $\Gamma^i$ is obtained similarly. The new granule $\gamma^i$ encloses the support and core of the combined granules.
The coefficients of the new local function $p^i$ are found as:

$a_j^i = \dfrac{1}{2} \big( a_j^{i1} + a_j^{i2} \big), \; j = 0, \ldots, n$.  (6.23)
Combining granules avoids redundancy by eliminating similar rules from the rule base. Reference (96) has emphasized the importance of a compact rule base in evolving fuzzy systems.
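A sketch of the distance (6.21) and the coarsening (6.22)-(6.23) follows, for granules represented as lists of 4-tuple trapezoids plus one output trapezoid; this representation is an assumption made for illustration.
———————————————————————
# Sketch of the trapezoid distance (6.21) and convex-hull coarsening
# (6.22)-(6.23). Names are illustrative.

def trapezoid_gap(a, b):
    return sum(abs(pa - pb) for pa, pb in zip(a, b))

def distance(g1_ants, g1_out, g2_ants, g2_out):
    """Distance D between two granules with n antecedent trapezoids each."""
    n = len(g1_ants)
    total = sum(trapezoid_gap(a, b) for a, b in zip(g1_ants, g2_ants))
    total += trapezoid_gap(g1_out, g2_out)
    return total / (4.0 * (n + 1))

def convex_hull(a, b):
    """Smallest trapezoid enclosing the supports and cores of a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def combine(g1_ants, g2_ants, a1, a2):
    ants = [convex_hull(x, y) for x, y in zip(g1_ants, g2_ants)]
    coeffs = [(c1 + c2) / 2.0 for c1, c2 in zip(a1, a2)]   # eq. (6.23)
    return ants, coeffs
———————————————————————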
6.5.8 Learning Algorithm
The learning procedure to evolve granular neural networks can be summarized
by the following algorithm:
———————————————————————
BEGIN
  Select a type of neuron for the aggregation and output layers;
  Set parameters $\rho$, $h_r$, $\eta$, $\zeta$, $\vartheta$; set $c = 0$;
  Read $(x, y)^{[h]}$, $h = 1$;
  Create granule $\gamma^{c+1}$, neurons $C^{c+1}$, $C^f$, and respective connections;
  For $h = 2, \ldots$ do
    Read $(x, y)^{[h]}$;
    Input $x^{[h]}$ to the network;
    Compute compatibility degrees $(o^1, \ldots, o^c)$;
    Aggregate values using $C^f$ to get the single-valued approximation $\hat{y}^{[h]}$;
    Compute the convex hull of the $\Gamma^i$ with $o^i > 0$;
    Find the granular approximation $(\underline{\underline{u}}, \underline{u}, \overline{u}, \overline{\overline{u}})$;
    Compute the output error $\epsilon^{[h]} = mp(y^{[h]}) - \hat{y}^{[h]}$;
    If $x^{[h]}$ is not within the expansion regions $E^i$ $\forall i$
      Create granule $\gamma^{c+1}$, neuron $C^{c+1}$ and connections;
    Else
      Update the most active granule $\gamma^i$, $i = \arg\max(o^1, \ldots, o^c)$;
      Update local function parameters $a_j^i$ using RLS;
      Update connection weights $w_j^i$ $\forall j, i$;
    If $h = \alpha h_r$, $\alpha = 1, 2, \ldots$
      Combine granules when feasible;
      Update model granularity $\rho$;
      Adapt connection weights $\delta^i$ $\forall i$;
      Prune inactive granules and respective connections;
END
———————————————————————
6.6 Summary
This chapter has introduced a fuzzy data stream modeling framework based on an evolving fuzzy granular neural network approach. The eGNN framework processes fuzzy data streams using fuzzy granular models, fuzzy aggregation neurons, and an online incremental learning algorithm. Its neurofuzzy structure encodes a set of fuzzy rules and a fuzzy inference system that establishes a tradeoff between precision and interpretability by combining functional and linguistic fuzzy models. eGNN provides single-valued as well as granular approximations of functions.
Chapter 7
Application Examples
The application examples addressed in this chapter consider numeric, interval, and fuzzy data streams to demonstrate the usefulness of evolving granular approaches. We aim for low error rates, concise constructs, high processing speed, and meaningful, understandable rules in semi-supervised classification, function approximation, time series prediction, and control problems.
7.1 Introduction
The experimental work described in this chapter is based on data sets that have already been collected for some purpose. Information extraction and knowledge discovery are therefore based on simulations of singular and granular data streams in an online environment. All experiments require evolving granular systems to deal with data they have never seen before and that demand a prompt response before being used for model adaptation. The following assumptions hold true:
- online approaches start learning from scratch, unless otherwise stated;
- previous data are neither stored nor retrieved (space constraint);
- the data streams, no matter whether single-valued or granular, dictate the granularity of the underlying models;
- no missing values are found in the original data sets;
- the data sampling frequency is constant across the different scenarios;
- the per-sample latency of the algorithms is no larger than the time interval between samples (time constraint).
Ideally, online modeling methods, such as IBeM, FBeM and eGNN, should re-
tain all previous relevant knowledge and rely on the newest input data to perform
classification, prediction, approximation or control.
7.2 Semi-Supervised Classification
Semi-supervised learning methods use both labeled and unlabeled data to build
pattern classification systems. Mixtures of labeled and unlabeled data are easily
found in practice (47) (102) (111) (158). Often, the acquisition of labeled data
requires human experts to manually classify training instances. Manual classification can be greatly influenced by subjectivity, and it may be infeasible when handling large data sets in an online environment. There are situations in which instances are labeled and apparently call for fully supervised learning methods and standard procedures of classifier design. However, the labeling process may have been unreliable, so that our confidence in the labels already assigned is relatively low (114). In these cases we resort to semi-supervised learning and accept only the fraction of instances that we deem to have been labeled correctly.
Let an input-output pair $(x, y)$ be related through $y = f(x)$. We seek an approximation to $f$ that allows us to predict the value of $y$ given $x$. In classification problems, $y$ is a class label, a value in the set $\{C_1, \ldots, C_m\} \subset \mathbb{N}$, and the relation $f$ specifies class boundaries. In the more general, semi-supervised case, $C_k$ may or may not be known when $x$ arrives. Classification of data streams involves pairs $(x, C)^{[h]}$ of time-sequenced data indexed by $h$. Nonstationarity requires evolving granular classifiers to identify time-varying relations $f^{[h]}$.
The experiments described next aim to demonstrate the ability of evolving granular methods to classify unbalanced, single-valued, partially-supervised streaming data subject to gradual and abrupt concept changes.
7.2.1 Rotation of Twin Gaussians
Gradual change is evaluated with a two-attribute classification problem in which two partially overlapping Gaussians rotate anti-clockwise around a central point, as shown in Fig. 7.1. The Gaussians are initially centered at (4,4) and (6,6) with standard deviation fixed at 0.8. The kinematics of their movement around the point (5,5) is as follows:

$\theta^{[h]} = \theta^{[h-1]} + \phi$  (7.1)
$x_1^{[h]} = 5 + 2\cos(\theta^{[h]})$  (7.2)
$x_2^{[h]} = 5 + 2\sin(\theta^{[h]})$  (7.3)
The initial reference angle $\theta^{[0]}$ is 225° for Class 1 and 45° for Class 2. For $h = 1, \ldots, 200$, the rotation rate $\phi$ is kept equal to 0, meaning that no drift is present. The rotation starts at $h = 201$ and, for $h = 201, \ldots, 400$, $\phi = 0.45$°. The final positions of the Gaussians are (6,4) and (4,6), respectively, for classes 1 and 2; the total rotation angle is 90°. Samples from both classes arrive randomly and sequentially.
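For reproducibility, a minimal generator of this data stream, following (7.1)-(7.3), is sketched below; angles are handled in degrees and the random sampling policy is an illustrative assumption.
———————————————————————
# Illustrative generator for the rotating twin-Gaussians stream.
import math
import random

def rotating_gaussians(n_steps=400, sigma=0.8, seed=0):
    rng = random.Random(seed)
    theta = {1: 225.0, 2: 45.0}                 # initial reference angles (deg)
    for h in range(1, n_steps + 1):
        if h > 200:                             # rotation starts at h = 201
            theta[1] += 0.45
            theta[2] += 0.45
        label = rng.choice([1, 2])              # classes arrive at random
        cx = 5 + 2 * math.cos(math.radians(theta[label]))   # eq. (7.2)
        cy = 5 + 2 * math.sin(math.radians(theta[label]))   # eq. (7.3)
        yield (rng.gauss(cx, sigma), rng.gauss(cy, sigma)), label
———————————————————————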
Figure 7.1: The rotating Gaussians problem
Assume classes 1 and 2 in Fig. 7.1 correspond to the positive and negative
classes, respectively. Consider a confusion matrix consisting of two rows and two
columns representing the number of true positives (TP), false positives (FP), true
negatives (TN), and false negatives (FN) as in the receiver operating characteristic
(ROC) method (45). The prediction accuracy of a classifier can be defined as

$\mathrm{Acc} = \dfrac{TP + TN}{TP + FP + TN + FN} \cdot 100\%$.  (7.4)
This metric is usually employed for balanced data sets. The ROC method provides a convenient way to evaluate the quality of evolving classifiers in unknown nonstationary environments because it is insensitive to changes in both the class distribution and the proportion of samples per class (45).
The ROC space is defined over the TP ratio and the FP ratio,

$\mathrm{TP\ ratio} = \dfrac{TP}{TP + FN}$  (7.5)

$\mathrm{FP\ ratio} = \dfrac{FP}{FP + TN}$.  (7.6)
For each class, the ROC method applies threshold values across the interval [0,1] to the outputs. Each cut-off threshold corresponds to a point (a sensitivity/specificity pair) in the ROC space. The closer the ROC curve is to the upper left corner, the better the classification. A test with perfect discrimination (no overlap between the two distributions) has a ROC curve that passes through the upper left corner.
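The quantities (7.4)-(7.6) are straightforward to compute from a confusion matrix, as the following illustrative helper shows.
———————————————————————
# Confusion-matrix summaries used in (7.4)-(7.6); names are illustrative.

def confusion_summaries(tp, fp, tn, fn):
    acc = 100.0 * (tp + tn) / (tp + fp + tn + fn)    # eq. (7.4)
    tp_ratio = tp / (tp + fn)                        # eq. (7.5)
    fp_ratio = fp / (fp + tn)                        # eq. (7.6)
    return acc, tp_ratio, fp_ratio

# Example: 90 TP, 10 FP, 85 TN, 15 FN ->
# Acc = 87.5%, TP ratio ~ 0.857, FP ratio ~ 0.105.
print(confusion_summaries(90, 10, 85, 15))
———————————————————————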
We look for the decision boundary between the Gaussians using the newest input data. The starting IBeM parameters are $\rho = 0.35$, $h_r = 40$ and $\eta = 2$; FBeM starts with the same parameters as IBeM, with the half-life set equal to $h_r$; and eGNN employs $C^i = T_{min}$, $C^f = S_{max}$, $\rho = 0.42$, $h_r = 40$, $\eta = 2$ and $\zeta = \vartheta = 0.5$. These values have worked well for a range of classification problems. The functional consequents of IBeM, FBeM and eGNN can be neglected in classification tasks.
To emphasize the importance of incremental learning in nonstationary data stream classification, we first compare evolving and non-evolving approaches on the rotating Gaussians problem. We consider widely known non-evolving methods, viz. a multi-layer perceptron (MLP) neural network trained offline via gradient descent (55) and a fuzzy C-means (FCM) clustering method (22). Table 7.1 summarizes the results averaged over 5 runs for each method.
Table 7.1: Rotating Gaussians: comparing evolving/non-evolving methods

Model   # Avg. Rules   Acc (%)   CPU*
MLP     -              57.0      41.0
FCM     10.0           60.5      0.8
IBeM    2.8            87.4      0.9
FBeM    3.4            92.3      0.4
eGNN    4.3            92.1      0.3

* Average CPU time per sample in milliseconds
Table 7.1 shows that FBeM is the most accurate approach in this example, producing an Acc index slightly superior to that of eGNN. The performance of the non-evolving methods degrades when the concept changes in online drifting scenarios because the structure and parameters of the underlying models are fixed: MLP and FCM could not track the rotation of the Gaussians, leading to relatively worse results. We also consider for performance evaluation the average number of rules in the model structure over the learning steps, and the CPU time on a dual-core 2.54GHz processor with 4GB of RAM. IBeM provided the most compact model, with an average of 2.8 rules during the learning process, whereas eGNN was the fastest method, processing each sample in 0.3 milliseconds.
Figure 7.2 shows the ROC curves produced by the evolving and non-evolving
methods. The results obtained were essentially the same in the different runs.
The diagonal line corresponds to random guessing, e.g., coin flipping.
Note in Fig. 7.2 that the area under the ROC curves of the evolving granular approaches is larger than that of the non-evolving approaches. The ROC analysis confirms that FBeM is slightly superior to eGNN and IBeM in this classification problem, no matter whether the Gaussian distribution is changed to any other distribution or the dataset is unbalanced. The area above the FBeM ROC curve reflects in part the 7.7% classification error and in part the overlap between granules with different assigned labels.
Figure 7.3 shows an example of the eGNN decision boundary (the overall best of all experiments) for a 0.5 ROC cut-off threshold applied to the outputs. At $h = 200$, eGNN has 5 granules in its structure, two associated with Class 1 and
Figure 7.2: ROC curves of different methods for the rotating Gaussians
three with Class 2. It attained a 94.5% Acc classification rate. After the rotation, that is, after $h = 400$, eGNN employs 5 granules in its structure, three for Class 1 and two for Class 2. It achieves a 97.5% Acc recognition performance.
As an example, a highly active eGNN rule at $h = 400$ is:
————————————————————————————–
$R^4$: IF ($x_1$ is [3.1774, 4.0022, 4.605, 4.8683] AND
  $x_2$ is [4.9950, 5.1767, 5.4811, 6.9495])
THEN $\hat{y}$ is Class 1
————————————————————————————–
Linguistically, assume that five partitions of the input variables are described by the adjectives 'very low', 'low', 'medium', 'high' and 'very high', and consider, for example, the problem of classifying wines produced with grapes from different vineyards. Rule $R^4$ can then be read: if the 'concentration of flavonoids' ($x_1$) is 'medium' and the 'color intensity' ($x_2$) is 'high', then the wine was produced by 'vineyard number 1' ($\hat{y}$).
(a) Snapshot at h= 200
(b) Snapshot at h= 400
Figure 7.3: eGNN decision boundary and last 200 data at particular time steps
7.2.2 New Class
A second experiment concerns an abrupt change: a new class appearing in the data stream. We introduce a new Gaussian class centered at (7,3) with dispersion 0.8 at $h = 200$, as shown in Fig. 7.4. Evolving granular methods should learn the previously unknown class as soon as related information appears in the data stream.
Table 7.2 shows the results of the evolving methods averaged over 5 independent runs. Non-evolving and even parametric adaptive methods are unable to discover
Figure 7.4: A third class appears at h= 200 and remains
new classes in data streams without redesigning and retraining the classifier from scratch; hence, they are inappropriate for this problem.
Table 7.2: New class problem: comparing evolving granular methods

Model   # Avg. Rules   Acc (%)   CPU*
IBeM    3.1            83.8      0.7
FBeM    3.4            89.5      0.4
eGNN    4.5            88.1      0.3

* Average CPU time per sample in milliseconds
We note from Table 7.2 that FBeM is the most accurate method, giving a marginally better Acc index than eGNN. IBeM provided the most concise classifier; eGNN was the fastest method. Figure 7.5 illustrates the evolution of the Acc index, the number of rules, and the granularity for FBeM. The results for the remaining methods are essentially the same.
We observe in Fig. 7.5 that the accuracy of the FBeM classifier is kept at a similar level after the concept shift at $h = 200$: the Acc index dropped from 90.05% to 89.00%, which is quite acceptable. The robustness of the FBeM system to nonstationarities, as shown in the figure, is typical of evolving granular systems in view of their structural and parametric flexibility.
Figure 7.6 depicts the eGNN decision boundaries and the last 200 instances at $h = 200$ and $h = 400$. The neural network evolved a total of 6
Figure 7.5: FBeM evolution of the Acc index, rule base and granularity for the
new-class problem
granules during the first 200 steps, three associated with each of the first two classes. At this point, the eGNN Acc rate was 94.5%. Data about the third class started to arrive at $h = 200$, and at $h = 400$ eGNN had developed 8 granules: three assigned to Class 1, two to Class 2, and three to Class 3. Assuming classes 2 and 3 as negative classes, eGNN reached a 92.5% Acc classification rate, the overall best accuracy of all experiments conducted.
7.2.3 Combining Labeled and Unlabeled Data
We analyze the behavior of granular approaches in semi-supervised online classification. Partially supervised learning methods combine labeled and unlabeled data for training; such mixtures are frequently found in practice (47) (102) (158). Our approach to hybrid clustering and classification is: if an unlabeled sample causes the creation of a granule, then the class of the granule remains undefined until a labeled sample falls within its bounds. The class label of that sample tags the granule. Conversely, if an unlabeled sample rests within the
(a) Snapshot at h= 200
(b) Snapshot at h= 400
Figure 7.6: eGNN decision boundaries for the 3-class problem
bounds of an existing granule whose label is known, it borrows the granule label.
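A minimal sketch of this labeling policy follows, using crisp axis-aligned boxes as stand-ins for the fuzzy granules; the Box class, its fixed width, and all names are illustrative assumptions.
———————————————————————
# Sketch of the hybrid clustering/classification policy described above.

class Box:
    def __init__(self, x, label, rho=0.35):
        self.lo = [v - rho / 2 for v in x]
        self.hi = [v + rho / 2 for v in x]
        self.label = label                   # None if created by unlabeled data

    def encloses(self, x):
        return all(l <= v <= h for l, v, h in zip(self.lo, x, self.hi))

def process_sample(granules, x, label=None):
    g = next((g for g in granules if g.encloses(x)), None)
    if g is None:
        g = Box(x, label)                    # new granule, possibly unlabeled
        granules.append(g)
    elif g.label is None and label is not None:
        g.label = label                      # first labeled sample tags the granule
    return g.label                           # predicted class (None if unknown)

granules = []
process_sample(granules, (0.5, 0.5))           # unlabeled -> unlabeled granule
process_sample(granules, (0.52, 0.48), 1)      # labeled sample tags it
print(process_sample(granules, (0.49, 0.51)))  # -> 1
———————————————————————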
We propose changing the proportion of unlabeled data from 0% to 100% in the twin (rotating and non-rotating) Gaussians problems and in the new-arising-class problem, so that the whole spectrum of semi-supervised learning possibilities can be evaluated. Non-rotating and rotating Gaussians generate stationary and gradually-changing data streams, respectively; a new class represents a sudden, abrupt shift.
Granularities $\rho$ were chosen in the range [0.3, 0.45] to keep the number of granules from exceeding 5 and to emphasize the semantic aspect of the resulting constructs. The remaining parameters are the same as in the previous sections and are repeated here for convenience: IBeM and FBeM use $h_r = 40$ (with FBeM's half-life equal to $h_r$) and $\eta = 2$; eGNN employs $C^i = T_{min}$, $C^f = S_{max}$, $h_r = 40$, $\eta = 2$ and $\zeta = \vartheta = 0.5$. Figure 7.7 illustrates the performance of the evolving granular methods averaged over 5 runs for each condition.
Figure 7.7 shows that evolving granular methods benefit from all the information contained in the data stream, including that from unlabeled samples (input domain information), to perform classification. Conventional and evolving classifiers that operate on a purely supervised basis, simply discarding unlabeled data, cannot deal with small fractions of labeled samples, as in the situations on the right side of the graphs. Note that the left and right extremes of the plots indicate total supervision and no supervision, respectively; in both cases the final result is a partition of the data into classes. Contrasting Fig. 7.7(a) with Figs. 7.7(b)-(c), none of the classifiers, IBeM, FBeM and eGNN, is significantly affected by concept drift and shift. The generalization from the sharp boundaries of IBeM to the fuzzy boundaries of FBeM and eGNN is particularly decisive for the precision of the models in these classification applications. FBeM and eGNN alternated as the most efficient approach. When the environment is stationary or the process is not overly complex, as in the non-rotating Gaussians problem, the high learning capability and structural plasticity of eGNN seem unnecessary, and FBeM has shown to be superior. Conversely, as the complexity of the problem increases with nonstationarities and mixtures of labeled and unlabeled data, the eGNN modeling approach has proved equivalent or superior to FBeM.
7.3 Time Series Prediction
Observing past outcomes of a system to estimate its future behavior is the essence
of forecasting and prediction (28) (51). When a complete mathematical model of
a system can be developed and the corresponding initial conditions are known,
prediction is an easy task. However, when no mathematical model is available or
only partially known models are feasible, an alternative to forecasting is to build
(a) Non-rotating Gaussians
(b) Rotating Gaussians
(c) New class
Figure 7.7: Performance of evolving granular classifiers using different proportions
of unlabeled data
models that consider current and past outcomes of the system while neglecting any external inputs. This is a "look at what it does, not why" approach (39).

Time series prediction is based on the idea that the series carries the potential information needed to predict its future behavior. Analyzing data produced by actual phenomena can give good insights into the phenomena themselves and knowledge about the laws underlying the data.
Forward prediction of a discrete time series can be defined as follows. Given a finite sequence $x^{[h]}, x^{[h-1]}, \ldots, x^{[h-M]}$, find the continuation $x^{[h+1]}, x^{[h+2]}, \ldots$ This involves finding a scalar $M$ and a function $f$ such that the value $x^{[h+1]}$ can be estimated by:

$\hat{x}^{[h+1]} = f(x^{[h]}, x^{[h-1]}, \ldots, x^{[h-M]})$.  (7.7)

This is equivalent to modeling the time series as

$x^{[h+1]} = f(x^{[h]}, x^{[h-1]}, \ldots, x^{[h-M]}) + \psi^{[h+1]}$,  (7.8)
with $\psi^{[h+1]}$ being a white noise process. If the statistics of the time series are non-Gaussian or the time series is the result of some nonlinear operation, the function $f$ is nonlinear; $f$ is the function we aim to model using evolving granular systems.
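The setup in (7.7) amounts to building lagged input vectors paired with the next observation, as in the following illustrative sketch.
———————————————————————
# Minimal sketch of the one-step-ahead setup in (7.7): build lagged input
# vectors from a series and pair each with the next value as the target.
# M is the number of past observations used; names are illustrative.

def lagged_pairs(series, M):
    """Yield ((x[h-M], ..., x[h]), x[h+1]) pairs from a list of values."""
    for h in range(M, len(series) - 1):
        yield tuple(series[h - M:h + 1]), series[h + 1]

# Example: with series [1, 2, 3, 4, 5] and M = 2,
# the pairs are ((1, 2, 3), 4) and ((2, 3, 4), 5).
print(list(lagged_pairs([1, 2, 3, 4, 5], 2)))
———————————————————————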
This section considers interval and fuzzy granular data streams derived from the monthly mean, minimum, and maximum temperatures of weather time series from geographic regions with different climatic patterns. The aim is to predict the monthly temperatures for all regions.
7.3.1 Weather Prediction
Weather predictions help people plan activities and protect property, and assist decision making in many sectors such as energy, transportation, aviation, agriculture, and inventory planning. Any system that is sensitive to the state of the atmosphere may benefit from weather forecasts.
Monthly temperature data carry a degree of uncertainty due to imprecision of atmospheric measurements, instrument malfunction, transcription errors, and different standards in acquiring and pre-processing the collected data. Usually temperature data are numerical, but the processes that originate and supply the data are imprecise. Temperature estimates at finer time granularities (days, weeks) are commonly demanded. Evolving granular approaches provide guaranteed granular predictions of the time series in these cases. How satisfactory a granular prediction is depends on the compactness of the prediction model. Granular predictions together with single-valued predictions are important because they convey both a value and a range of possible temperature values.
In the experiments we translate the minimum, mean, and maximum monthly temperatures into triangular fuzzy numbers. Numerically-driven modeling approaches use the mean monthly temperatures only; interval approaches consider the minimum and maximum temperatures. The data were linearly scaled to the range [0,1]. We use data from different weather stations, as summarized in Table 7.3 (data available at http://eca.knmi.nl and http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn.html).
Table 7.3: Monthly temperature values

Station        # Samples   From       To         Std. Dev.
Bucharest      960         Jan 1930   Dec 2010   0.1795
Death Valley   1308        Jan 1901   Dec 2009   0.1835
Helsinki       1680        Jan 1871   Dec 2010   0.1842
Lisbon         1200        Jan 1910   Dec 2009   0.1556
Ottawa         1380        Jan 1895   Dec 2009   0.1790
As shown in Table 7.3, we consider five weather stations. In Death Valley (Furnace Creek), super-heated moving air masses are trapped in the valley by the surrounding steep mountain ranges, creating an extremely dry climate with high temperatures. Refer to (123) for a complete list of factors that produce high air temperatures in Death Valley. Conversely, Ottawa is one of the coldest capitals in the world: a wide range of temperatures can be observed during the year, but the winters are very cold and snowy. Lisbon experiences more usual weather patterns; summers are warm, sometimes hot, whereas winters are mild and moist. Helsinki and Bucharest are further weather stations considered for evaluation. Bucharest has a continental climate owing to its distance from the open sea: summers are generally hot while winters are quite cold. Helsinki combines characteristics of maritime and continental climates; the proximity of the Arctic Ocean and the North Atlantic creates cold weather, while the Gulf Stream conveys warm air.
During the computational experiments described subsequently, IBeM, FBeM and eGNN scan the data only once to build their structure and adapt parameters. This simulates online data stream processing. Testing and training are performed concomitantly on a per-sample basis. The performance of the algorithms is evaluated
using the root mean square error of singular predictions,

$RMSE = \sqrt{\dfrac{1}{H} \sum_{h=1}^{H} \big( mp(y)^{[h]} - \hat{y}^{[h]} \big)^2}$,  (7.9)

the non-dimensional error index,

$NDEI = \dfrac{RMSE}{\mathrm{std}\big( mp(y)^{[h]} \; \forall h \big)}$,  (7.10)

the average number of rules in the model structure, and the per-sample CPU time. The computer has a dual-core 2.54GHz processor with 4GB of RAM.
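Both indices are simple to compute from the stream of targets and predictions, as the following sketch illustrates; the midpoints $mp(y)$ are assumed to be available as plain numbers.
———————————————————————
# Error indices (7.9)-(7.10) for singular predictions; `targets` holds the
# midpoints mp(y) of the (possibly granular) actual outputs. Illustrative.
import math

def rmse(targets, preds):
    H = len(targets)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, preds)) / H)

def ndei(targets, preds):
    mean = sum(targets) / len(targets)
    std = math.sqrt(sum((t - mean) ** 2 for t in targets) / len(targets))
    return rmse(targets, preds) / std

print(ndei([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))  # ~0.141
———————————————————————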
7.3.2 Performance Analysis
Different computational intelligence methods were chosen for performance assessment: the multilayer perceptron neural network (MLP) (55), evolving Takagi-Sugeno (eTS) (6), extended Takagi-Sugeno (xTS) (7), the dynamic evolving neuro-fuzzy inference system (DENFIS) (64); and IBeM, FBeM and eGNN.
The task of the different methods is to provide a one-step-ahead forecast of the monthly temperature $y^{[h+1]}$ using the last 12 observations, $x^{[h-11]}, \ldots, x^{[h]}$. The number of previous observations was chosen by trial and error to provide relatively accurate predictions. Online methods employ the sample-by-sample testing-before-training approach as follows. First, an estimate $\hat{y}^{[h+1]}$ is derived for a given input $(x^{[h-11]}, \ldots, x^{[h]})$. One time step later, the actual value $y^{[h+1]}$ becomes available and model adaptation is performed if necessary. In general, models should capture the trend and seasonal components of the time series, but not the random noise component. Because the observed data contain random noise and irregular patterns, models that do not overfit them produce better generalizations and predictions of future values. Table 7.4 summarizes the forecasting results for the Bucharest, Death Valley, Helsinki, Lisbon, and Ottawa monthly temperature data. IBeM starts with $\rho = 0.6$, $h_r = 84$, $\eta = 2$; FBeM uses $\rho = 0.7$, $h_r = 48$ (with the half-life set equal to $h_r$), $\eta = 2$; and eGNN uses $C^i = T_{min}$, $C^f = M$, $\rho = 0.45$, $h_r = 84$, $\eta = 2$ and $\zeta = \vartheta = 0.5$.
Table 7.4 shows that eGNN gives the most precise forecasts in 3 of the 5 temperature data sets, seconded by eTS and FBeM with one each. The eGNN structures are, on average, the most parsimonious. Alternative evolving approaches such as DENFIS, eTS and xTS use numeric data, namely the mean temperature. In contrast, granular approaches such as eGNN, IBeM and FBeM take into account the mean and its neighboring data to bound the forecasts. The trend component of the time series is handled in granular systems by procedures that gradually adapt granules and rules. The seasonal component is captured through different granules, which represent different seasons and transitions between seasons. Since the content of a granule carries seasonal information, its corresponding rule tends to be activated in the corresponding months. IBeM and xTS are the fastest among the algorithms evaluated in this section.
We also notice in Table 7.4 that the MLP neural network behaved well on all temperature time series. Our hypothesis is that the temperature time series recorded by the weather stations have not changed very much during the period considered. In general, offline methods such as the MLP cannot deal with nonstationary functions, do not support one-pass training, and require more CPU time and memory than online methods. Moreover, the MLP neural network does not provide comprehensible models to support data description and interpretation.
As examples, the one-step singular and granular forecasts of FBeM for the Death Valley time series and of eGNN for the Helsinki time series are shown in Figs. 7.8 and 7.9. The additional plots in both figures show the granularity, error indices, and number of rules developed.
Table 7.4: Temperature forecasts

Station        Method   # Avg. Rules   RMSE     NDEI     CPU*
Bucharest      DENFIS   5.00           0.0800   0.4457   4.7
               eGNN     3.80           0.0594   0.3309   1.6
               eTS      3.00           0.0598   0.3331   1.1
               FBeM     7.57           0.0603   0.3359   1.1
               IBeM     5.88           0.0643   0.3582   1.0
               MLP      -              0.0892   0.4969   35.5
               xTS      10.00          0.0643   0.3582   1.0
Death Valley   DENFIS   8.00           0.0600   0.3270   4.7
               eGNN     3.91           0.0498   0.2714   1.6
               eTS      3.00           0.0491   0.2676   1.0
               FBeM     8.00           0.0506   0.2757   1.1
               IBeM     8.79           0.0541   0.2948   1.0
               MLP      -              0.0584   0.3183   44.2
               xTS      10.00          0.0503   0.2741   1.1
Helsinki       DENFIS   24.00          0.0780   0.4235   5.7
               eGNN     2.78           0.0607   0.3295   1.6
               eTS      4.00           0.0634   0.3442   1.4
               FBeM     6.00           0.0602   0.3268   1.1
               IBeM     10.38          0.0764   0.4148   1.2
               MLP      -              0.0892   0.4843   35.5
               xTS      16.00          0.0651   0.3534   1.1
Lisbon         DENFIS   12.00          0.0880   0.5656   5.2
               eGNN     2.77           0.0577   0.3708   1.7
               eTS      4.00           0.0714   0.4589   2.3
               FBeM     5.63           0.0599   0.3850   1.2
               IBeM     3.59           0.0687   0.4415   1.0
               MLP      -              0.0955   0.6138   48.2
               xTS      11.00          0.0744   0.4781   1.0
Ottawa         DENFIS   7.00           0.0770   0.4302   4.9
               eGNN     3.88           0.0575   0.3212   1.5
               eTS      3.00           0.0604   0.3374   1.0
               FBeM     6.80           0.0609   0.3402   1.1
               IBeM     9.28           0.0734   0.4101   1.1
               MLP      -              0.0769   0.4296   41.3
               xTS      14.00          0.0631   0.3525   1.1

* Average CPU time per sample in milliseconds
Figure 7.8: FBeM Death Valley temperature forecasts
Figure 7.9: eGNN Helsinki temperature forecasts
Note that while the singular prediction $p$ attempts to match the actual mean temperature value, the corresponding granular information $[u, U]$, formed by the lower and upper bounds of the consequent trapezoidal membership functions, intends to envelop previous data and the uncertainty of the unknown temperature function $f$.
Figure 7.10 enlarges the temperature predictions of Figs. 7.8 and 7.9 for the time intervals [739, 807] and [1009, 1069], respectively. During these time intervals the respective granular models support 8 and 3 rules.
(a) Zoom of FBeM Death Valley forecast (Fig. 7.8) using 8 rules
(b) Zoom of eGNN Helsinki forecast (Fig. 7.9) using 3 rules
Figure 7.10: Comparing the narrowness of granular forecasts using rule bases of
different sizes
We notice from Fig. 7.10 that relatively larger rule bases produce narrower ranges of values $[u, U]$ to bound the predictions. Granular forecasts are determined from past actual temperature values; they are particularly important since they usually come with a label and a linguistic description. FBeM and
eGNN are evolving approaches that handle fuzzy granular data streams and simultaneously provide singular and granular predictions.

Overall, the results in this section suggest that evolving granular systems benefit from data uncertainty and from interval, fuzzy, and neurofuzzy granular frameworks to provide accurate and linguistic predictions of granular time series.
7.3.3 Time Complexity
This section examines how the performance of granular systems is affected by the number of input variables and rules. Here, performance concerns temporal scalability and the RMSE, to assess processing time and prediction error, respectively.
For these purposes we first performed several independent experiments varying the number of input variables (lagged observations of the temperature values). Initial parameters were chosen to give rule bases with about ten granular rules, so that the size of the rule base does not interfere with the temporal scalability analysis of evolving granular systems. We evaluate the processing time and prediction error as the number of input variables increases. The evaluation was performed in the context of temperature prediction. We consider FBeM and the Death Valley, Lisbon and Ottawa time series, as the results for IBeM, eGNN and the remaining time series are fundamentally the same. Figure 7.11 shows the processing time and RMSE for the chosen time series.
The bottom plot of Fig. 7.11 suggests that the time complexity of FBeM, and of evolving granular systems in general, is quasi-linear in the number of inputs. This is important since many computational intelligence and statistical algorithms scale polynomially or exponentially, which prohibits their use in handling massive data streams and modeling large-scale online processes. Evolving granular systems run linearly with respect to the number of samples since their learning algorithms are one-pass and incremental.
It is worth noting at the top of Fig. 7.11 that the weather time series require a small number of input variables; additional inputs tend to confuse the underlying predictor. The RMSE indices for Death Valley, Ottawa, and Lisbon suggest local optima in the range of six to twelve input variables.
In the next experiment we fix the number of input variables to five, and
Figure 7.11: FBeM processing time and RMSE using different amounts of input
variables from temperature time series
run the FBeM algorithm with parameters that force it to generate an increasing number of rules. The goal here is to evaluate temporal scalability and the RMSE as the size of the rule base increases. Figure 7.12 shows the results obtained for the Death Valley, Ottawa, and Lisbon time series data.
The bottom plot of Fig. 7.12 shows that the processing time of FBeM grows exponentially with the number of rules. Although the algorithm deals linearly with the number of samples and input variables, granularity constraints within the evolving granular framework are of utmost importance to keep the system operating online. Effective procedures to bound the rule base and protect evolving granular systems from outbursts of growth are: (i) using the half-life value or the deletion threshold $h_r$. The total number of rules, $c$, is guaranteed to be at most the half-life value at any time. For example, suppose the half-life value is 6 and the rule base contains 7 rules. The last 6 samples can only activate 6 or fewer of the existing rules. Thus, at least one of the rules must have been inactive for 7 time steps,
Figure 7.12: FBeM processing time and RMSE for the Death Valley, Ottawa, and Lisbon time series considering different numbers of rules
which contradicts a half-life of 6; (ii) adapting the maximum width allowed for granules, $\rho$. This procedure develops only the necessary quantity of granules and rules. Notice that the points at the right of the plots in Fig. 7.12 can only be obtained by setting the half-life to a very large value, e.g., 10000, and turning the granularity adaptation procedure off.
The error curves in the top plot of Fig. 7.12 show that both very small and very large rule bases decrease model accuracy. We employ piecewise cubic Hermite interpolating polynomials to fit the error data. Curiously, the error values suggest that the most appropriate models have about 6 to 12 rules. This reinforces the hypothesis that seasonal trends are better modeled by single rules. Excessive granularity is detrimental because similar information is forcibly split into different granules and the underlying local models do not profit from the full information.
The average number of rules in FBeM depends on the choice of $\rho$ and the half-life. Reference (80) recommends $\rho^{[0]} = 0.5$ to balance structural stability and plasticity whenever we lack detailed knowledge of the modeling task and data properties. Monthly mean temperature prediction experiments suggest $\rho^{[0]}$ in the range from 0.5 to 0.8 to avoid rule overshoot after learning starts. This helps to attain smoother structural development along the next time steps. Gradual adaptation of the granularity also alleviates initial guesses and guides the value of $\rho$ according to the data stream. For monthly weather prediction, we suggest $h_r$ values between 48 and 84; the idea here is: if a trend does not appear again in the next four to eight years, then remove its corresponding rule.
7.3.4 Handling Abrupt Regime Changes
Long-term climate changes cause average monthly temperatures to drift gradually over time, and abrupt shifts are hardly noticeable. The experiment addressed in this section shows how evolving granular systems react when abrupt changes occur in nonstationary time series. We assume the FBeM method, as the behavior of IBeM and eGNN is essentially equivalent.
For this purpose, we consider a hypothetical situation in which the time series of Death Valley, Ottawa, and Lisbon occur sequentially, forming a single time series. Two severe regime shifts are easily identified, as the top plot of Fig. 7.13 illustrates. The bottom plot of Fig. 7.13 shows the fuzzy temperature predictions during the Ottawa-Lisbon shift (time interval between 2661 and 2740). In this experiment, FBeM should adapt the model to capture the new temperature profile and forget what is no longer relevant in the current environment. The initial parameters of FBeM were $\rho = 0.6$, $h_r = 48$ (with the half-life set equal to $h_r$) and $\eta = 2$. Figure 7.13 shows the RMSE, the number of rules, and the granular and singular predictions. Notice that the number of rules peaks after the Death Valley-Ottawa and Ottawa-Lisbon transitions, but returns to the usual values afterwards. Similarly, the RMSE increases slightly and decreases in the steps following the transitions. Online adaptability improves prediction accuracy after the transitions. Evolving granular systems are stable in the face of abrupt changes in granular data streams, a challenge to a variety of machine learning algorithms.
Figure 7.13: FBeM prediction of the Death Valley, Ottawa, and Lisbon temper-
ature time series combined
7.4 Function Approximation
Function approximation consists in finding a function that matches, to some extent, a target function in a task-specific way. Here, the target functions are unknown, but perceived as streams of intervals or fuzzy intervals, with intervals and fuzzy intervals representing our intuitive notion of approximate data. Differently from time series prediction, in function approximation problems external variables are available and the time span in which the data are obtained is unimportant.
The generic form of the function approximation problem is as follows: given a time-varying unknown function $f^{[h]}$, where $h = 1, \ldots$ is the time index, and a pair of observations $(x, y)^{[h]}$, $x \in X$ and $y \in Y$, find a finite collection of information granules $\gamma = \{\gamma^1, \ldots, \gamma^c\}$ and a time-varying real-valued map $p^{[h]}: X \to Y$ such that $\gamma^i \subseteq X \times Y$ and $p^{[h]}$ minimizes $(f^{[h]} - p^{[h]})^2$. The output $y^{[h]}$ is unknown when the input $x^{[h]}$ arrives, but becomes known afterwards. The attributes $x_j$ of an input vector $x = (x_1, \ldots, x_n)$ and the output $y$ are trapezoidal fuzzy data.

Because every continuous function $f$ can be approximated uniformly on a finite interval by continuous piecewise linear functions $p$ (36), evolving granular systems are universal approximators (proof in Appendix A).
The following sections consider recent benchmark data sets in materials and biomedical engineering to evaluate and illustrate the usefulness of the proposed evolving granular approaches in the function approximation task.
7.4.1 Concrete Compressive Strength
Compressive strength is the capacity of a material to withstand axially-directed pushing forces. When the limit of compressive strength is reached, the material is crushed. When building with concrete, it is important to know whether it can bear the compressive forces, for safety's sake (68).
Compressive tests measure how well concrete holds up to the compressive pressures around it. Test standards neglect uncertainties of different natures. For example, (i) the concrete cross-sectional area changes as a function of the compressive load applied: the material tends to spread laterally, increasing the cross-sectional area; (ii) compression tests clamp materials at the edges, so a variable frictional force (the barreling phenomenon) arises that opposes the lateral spread. This results in a slightly inaccurate value of stress being obtained from the experiment (68); (iii) standards for concrete structures do not consider weather conditions and permit specimen density to vary by about 2%.
The Concrete Compressive Strength data set, available at the UCI Machine Learning Repository, consists of 1030 singular samples. We assume that the data are perceptions of the values of a variable. Thus, we consider 2% of imprecision in each input $x_j$ and output $y$ and represent it by symmetrical triangular fuzzy objects of the form $(.98x_j, x_j, x_j, 1.02x_j)$ and $(.98y, y, y, 1.02y)$, respectively. Interval methods naturally consider the intervals $(.98x_j, .98x_j, 1.02x_j, 1.02x_j)$ and $(.98y, .98y, 1.02y, 1.02y)$. Concrete ingredients and the age of the mixture are the independent variables of the compression function. Ingredients include cement, blast furnace slag, fly ash, water, superplasticizer, and coarse and fine aggregate (147).
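The granulation just described can be sketched as follows; function names are illustrative.
———————————————————————
# Sketch of the 2%-imprecision granulation described above: a numeric value
# v becomes a symmetric triangular fuzzy number stored as a degenerate
# trapezoid (support at 98% and 102% of v, core collapsed onto v), or a
# plain interval for interval-based methods.

def fuzzify(v, imprecision=0.02):
    lo, hi = v * (1 - imprecision), v * (1 + imprecision)
    return (lo, v, v, hi)            # triangular: core is the singleton {v}

def intervalize(v, imprecision=0.02):
    lo, hi = v * (1 - imprecision), v * (1 + imprecision)
    return (lo, lo, hi, hi)          # interval: core equals support

print(fuzzify(10.0))       # (9.8, 10.0, 10.0, 10.2)
print(intervalize(10.0))   # (9.8, 9.8, 10.2, 10.2)
———————————————————————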
First, we perform a preliminary experiment to analyze different fuzzy aggregation neurons in eGNN. We consider 0.5 as the default value of the uninorm neutral element $e$ and of the T-S norm $\nu$-factors; therefore, in this experiment fuzzy neurons weigh the T and S norms occurring in these constructs equally. Other fuzzy aggregation neurons are used without restrictions.
Performance evaluation is based on the RMSE and NDEI indices computed as in (7.9) and (7.10), respectively. We also consider the average number of rules and the CPU time on a dual-core 2.54GHz processor with 4GB of RAM. The original samples were shuffled and linearly scaled to the range [0,1]. eGNN adopts $\rho = 0.45$, $h_r = 50$, $\eta = 2$ and $\zeta = \vartheta = 0.5$. Table 7.5 shows the best performance of each network setting (using different types of neurons in the aggregation and output layers) over 10 independent runs.
We notice in Table 7.5 that eGNN using product T-norm ($T_{prod}$) neurons in the aggregation layer and the combined T-S neuron ($L_{min,max}$) in the output layer provides the most accurate results while employing a relatively small number of rules. Although the averaging ($M$) output neuron configuration with $T_{prod}$ aggregation produces very close results, the construct adopting $T_{prod}$ and $L_{min,max}$ neurons is kept for the next experiments.
Because outlier points can disturb the computation of the mean and standard
deviation of a dataset, we consider replacing them with the mean of the values
Table 7.5: Concrete compressive strength prediction: evaluation of different types of eGNN neurons

Aggregation   Output     # Avg. Rules   RMSE     NDEI     CPU*
Tmin          M          4.01           0.1268   0.6354   1.6
Tprod         M          3.72           0.1210   0.6064   1.8
TL            M          5.26           0.1438   0.7209   1.8
Umin,max      M          4.53           0.1402   0.7029   1.5
Uprod,prob    M          3.86           0.1295   0.6488   1.7
Tmin          Lmin,max   4.01           0.1275   0.6389   1.8
Tprod         Lmin,max   3.72           0.1205   0.6040   1.8
TL            Lmin,max   5.26           0.1417   0.7100   1.9
Uprod,prob    Lmin,max   3.86           0.1276   0.6393   1.7

* Average CPU time per sample in milliseconds
available so far. This procedure avoids biased estimates caused by low activation levels of granules. Put simply, if the concentration of a concrete ingredient or the compressive strength exceeds the accumulated mean value plus 4 standard deviations, then it is replaced by the mean. Note that this procedure discards the uncommon value and handles the sample as one with a missing datum. Imputation methods for missing data (127) require certain constraints to be met: for instance, if the number of outliers is large compared to the total number of samples, then we run the risk of distorting the covariance structure of the data and of biasing covariances toward zero. The feasibility of the proposed method using 4 standard deviations was confirmed for the underlying data, since the quantity of outliers is smaller than 2%. A more elaborate mechanism to deal with outliers is, e.g., an arousal mechanism such as the one suggested in (128).
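A sketch of this outlier handling follows, using Welford's online algorithm for the running mean and standard deviation; the two-sided test and the class name are illustrative assumptions (the text describes exceeding the mean plus 4 standard deviations).
———————————————————————
# Sketch of the outlier replacement: keep running statistics and replace any
# incoming value farther than k standard deviations from the mean with the
# mean. Names are illustrative.
class OutlierFilter:
    def __init__(self, k=4.0):
        self.k, self.n, self.mean, self.m2 = k, 0, 0.0, 0.0

    def __call__(self, v):
        if self.n > 1:
            std = (self.m2 / self.n) ** 0.5
            if abs(v - self.mean) > self.k * std:
                v = self.mean            # replace the uncommon value
        # update running statistics with the (possibly replaced) value
        self.n += 1
        d = v - self.mean
        self.mean += d / self.n
        self.m2 += d * (v - self.mean)
        return v
———————————————————————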
We compare evolving granular systems against alternative evolving methods. The approaches evaluated are: evolving Takagi-Sugeno (eTS) (6), extended Takagi-Sugeno (xTS) (7), dynamic evolving neural-fuzzy inference system (DENFIS) (64), evolving fuzzy linear regression tree (eFT) (83), evolving participatory learning (ePL) (87), IBeM, FBeM, and eGNN. In this comparative experiment we consider the dataset as originally provided instead of shuffling the samples. Based on a possible temporal correlation of the data, one lagged value of the compressive strength was considered as input to all of the methods; therefore, the number of input variables in this experiment totals 9. IBeM employs $\rho = 0.45$, $h_r = 50$ and $\eta = 2$; FBeM uses the same parameters as IBeM except for $h_r = 40$ (with the half-life set equal to $h_r$); eGNN also adopts the same parameters as IBeM plus $\zeta = \vartheta = 0.5$. Table 7.6 shows the performance of each method.
Table 7.6: Concrete compressive strength prediction: evaluating different evolving methods

Model    # Avg. Rules   RMSE     NDEI     CPU*
DENFIS   5.00           0.1130   0.5670   19.9
eFT      7.00           0.1380   0.6518   79.7
eGNN     3.72           0.1205   0.6040   1.8
ePL      6.00           0.1847   0.9259   24.4
eTS      7.00           0.1554   0.7343   0.9
FBeM     3.48           0.1398   0.7009   1.7
IBeM     4.43           0.1334   0.6683   1.2
xTS      8.00           0.1552   0.7333   0.9

* Average CPU time per sample in milliseconds
We observe from Table 7.6 that DENFIS is the most accurate method for approximating the compressive strength function, although it uses more rules and spends more time processing the data collection than the evolving granular systems. From the accuracy/compactness point of view, IBeM, FBeM and eGNN have all shown to be competitive. In particular, eGNN reached an error rate only slightly higher than that of DENFIS, 0.1205 against 0.1130, using an average of only 3.72 rules and a maximum of 10 rules. eTS and xTS were the fastest methods in this function approximation application. Non-granular methods give no prediction bounds during the processing steps.
Figure 7.14 shows an example of the $T_{prod}$ with $L_{min,max}$ eGNN approximation of the concrete compressive strength function and the evolution of the number of rules, error indices, and granularity. The bottom plot expands the approximation of the top plots in the range [885, 987].

In Fig. 7.14, the single-valued approximation $p$ together with the granular approximation $[u, U]$ gives a value of compressive strength and a range of values in the neighborhood of $p$ induced by the input data. Moreover, the granular approximation may come with a label and a proper linguistic description; it
Figure 7.14: eGNN approximation of the concrete compressive strength function,
and evolution of the rule base, error indices, and granularity
enhances model acceptability, and the neighborhood can be made tighter if we accept a larger number of rules. The performance of eGNN profits from the combination of structural evolution and fuzzy granules. The results may recommend changing the ingredients' mix ratio and/or adding special hardeners to the concrete compound.
7.4.2 Parkinson’s Telemonitoring
Parkinson's disease is one of the most common neurodegenerative disorders (133). Early diagnosis is key to improving patients' quality of life and to prolonging it. Frequent symptoms of Parkinson's disease include movement disorders and vocal impairment (dysphonia). In particular, vocal degradation is one of the earliest indicators of the disease, and one which patients consider a major barrier (57) (133).
The Parkinson's telemonitoring data set, accessible at the UCI Machine Learning Repository, consists of 5875 biomedical voice measurements from 42 patients with early-stage Parkinson's disease recruited to a six-month trial of a telemonitoring device for remote symptom progression monitoring. Recordings were captured in typical home acoustic environments and transmitted over the Internet to a clinic. Inputs comprise 5 jitter and 6 shimmer measures, related respectively to the frequency and amplitude of the speech signal; 2 measures of the ratio of noise to tonal components; 2 measures associated with entropy; and a measure of detrended fluctuation, for a total of 16 inputs. Uncertainty may arise from the processes of audio capture and web transmission, steadiness of phonation and loudness, and environmental conditions, to name a few. The output considered in this study is the total score of the 'unified Parkinson's disease rating scale' (UPDRS), which reflects the presence and severity of symptoms. The larger the UPDRS, the more severe the patient's disabilities.
We analyze different aggregation neurons in eGNN similarly to the concrete compressive strength function approximation problem. New outlier data are replaced with the mean of the values available so far whenever they surpass the range of plus or minus 3 standard deviations around the mean. We do not shuffle the data samples, and we take advantage of 2 lagged UPDRS scores to benefit from temporal information. Table 7.7 summarizes the results.
Table 7.7: Parkinson's telemonitoring prediction: evaluation of different types of eGNN neurons

Aggregation   Output     # Rules   RMSE     NDEI     CPU*
Tmin          M          9.10      0.0667   0.3219   1.8
Tprod         M          8.75      0.0679   0.3280   1.9
Umin,max      M          10.56     0.0755   0.3643   2.0
Uprod,prob    M          9.52      0.0754   0.3641   2.2
Tmin          Lmin,max   9.10      0.0668   0.3226   1.8
Tprod         Lmin,max   8.75      0.0683   0.3296   2.1
Umin,max      Lmin,max   10.56     0.0735   0.3550   2.1
Uprod,prob    Lmin,max   9.52      0.0752   0.3630   2.3

* Average CPU time per sample in milliseconds
Table 7.7 shows that the type of fuzzy neuron used in the aggregation layer of eGNN affects prediction accuracy more than the type used in the output layer. The eGNN construct that combines minimum T-norm ($T_{min}$) and averaging ($M$) neurons in the aggregation and output layers, respectively, performs better than the remaining constructs according to the error indices; it is therefore kept in the next experiments.

Studies in (90) (133) point out that some dysphonia measures are highly correlated. Overall, highly correlated input variables tend to misguide the underlying evolving granular method. Variable selection was considered to improve prediction accuracy, speed up the learning process, and provide easier-to-interpret models.
The input variables of IBeM, FBeM, and eGNN may be the $m$ variables of the speech function, or we may select the $n$ least correlated variables. We conducted offline ranking and progressive elimination of correlated variables. Leaving out one of two highly correlated variables allows assessing how well the results generalize to relatively independent data sets. Similarly to (14) (133), the sequence of removed variables was chosen based on their maximum redundancy as calculated by partial autocorrelation analysis. The variables were: Shimmer:DDA, Jitter:DDP, Shimmer(dB), Shimmer:APQ5, Shimmer:APQ3, Jitter:RAP, Jitter:PPQ5, Jitter(Abs), Shimmer:APQ11, and HNR, in this order. Refer to the UCI Machine Learning Repository for a detailed description of the meaning and significance of the variables. We remove one variable at a time until a statistically significant degradation of the systems is noticed. Figure 7.15 shows the average results considering IBeM, FBeM, eGNN, and independent runs for each set of variables.
Figure 7.15: Evolving granular systems results on leave-one-variable-out approach
to find less correlated subsets of input variables
The top plot of Fig. 7.15 shows that a substantial degradation of the RMSE performance occurs only when the leave-one-out approach results in a 6-variable model (not counting the two lagged variables related to previous UPDRS values). According to the principle of parsimony, which states that, other things being equal, the simplest solution is best, the models with the 7 least correlated variables are sufficient.
The bottom plot of Fig. 7.15 shows the average per-sample CPU time spent by the evolving granular systems for different numbers of input variables. The data were fitted with a quadratic function whose small second-order coefficient suggests that time complexity is quasi-linear in the number of input variables. This is a major characteristic given that many computational intelligence and statistical algorithms behave polynomially or exponentially and are unable to process massive data streams in large-scale online modeling. Moreover, evolving granular systems run in linear time with respect to the number of samples because their learning algorithms are one-pass and incremental.
A comparison between granular and alternative methods is given in Table 7.8. The methods analyzed were: multi-layer perceptron (MLP) (55), least squares (LS) (54), iteratively reweighted least squares (IRLS) (133), least absolute shrinkage and selection operator (LASSO) (132), classification and regression trees (CART) (54), extended Takagi-Sugeno (xTS) (7), and evolving Takagi-Sugeno (eTS) (6). In this experiment samples were shuffled so that their order does not matter to the online methods. Lagged values of the UPDRS score become superfluous since temporal information is lost when the data sequence is mixed. The purpose of the experiment is to compare state-of-the-art function approximation methods only. IBeM uses the following parameters: $\rho = 0.5$, $h_r = 120$, $\eta = 2$; FBeM employs $\rho = 0.45$, $h_r = 120$, $\eta = 2$; and eGNN employs $\rho = 0.4$, $h_r = 120$, $\eta = 2$ and $\zeta = \vartheta = 0.5$. IBeM processes input information translated into interval data, whereas FBeM and eGNN process symmetrical trapezoidal fuzzy data. More precisely, the original input and output data, $x_j$ and $y$, were assumed to be $(.98x_j, .98x_j, 1.02x_j, 1.02x_j)$ and $(.98y, .98y, 1.02y, 1.02y)$ in the case of IBeM; and $(.98x_j, .99x_j, 1.01x_j, 1.02x_j)$ and $(.98y, .99y, 1.01y, 1.02y)$ in the case of FBeM and eGNN. Table 7.8 shows the results.
Table 7.8 shows that both granular evolving systems achieve satisfactory results considering the accuracy/compactness relation. Interestingly, FBeM outperforms the remaining methods without requiring a large number of rules. Based on its structure, learning algorithm, and fuzzy granular framework, FBeM reaches a 0.1245 RMSE using an average of 4.91 ± 1.68 rules, with a maximum of
Table 7.8: Parkinson's telemonitoring prediction: evaluating different methods

Model    # Rules   RMSE     NDEI
LASSO    -         0.3842   1.8402
LS       -         0.3820   1.8294
IRLS     -         0.3797   1.8186
CART     -         0.3588   1.7185
MLP      -         0.3559   1.7046
eTS*     7         0.1452   0.6954
xTS*     7         0.1443   0.6911
eGNN*    6         0.1394   0.6673
IBeM*    5         0.1358   0.6504
FBeM*    5         0.1245   0.5963

* Online methods
9 rules. Evolving granular methods have shown clear advantages over the traditional statistical LASSO, LS, and IRLS methods because of their nonlinear nature.
Figure 7.16 depicts an example of the FBeM approximation of the Parkinson's telemonitoring function and the evolution of the number of rules, error indices, and granularity. The results in the figure stand for the best approximation attained in all experiments performed. It considers an FBeM model with 8 inputs: the 7 least correlated features (obtained as previously described using the leave-one-variable-out approach) and 1 lagged UPDRS value. Moreover, we discard samples that convey values outside the range of plus or minus 3 standard deviations around the current mean value of a variable.

The top plot of Fig. 7.16 shows that the FBeM single-valued approximation $p$ provides quite accurate estimates when temporal information is intrinsic to the data stream. In these cases, the FBeM learning algorithm explores spatial information from the original features and, simultaneously, temporal information from past outputs. Note that the average number of rules during processing is only 4.16, which means that FBeM is not overfitting the data to achieve the underlying error rate (RMSE = 0.0509), but generalizing the behavior of the actual function. The granular prediction $[u, U]$ (support of trapezoids) provides lower and upper bounds within which single-valued predictions must lie. Therefore, given a limited range of possible values, the chances of obtaining more accurate single-valued predictions tend to be higher. The bottom plot of the figure ex-
Figure 7.16: FBeM approximation of the Parkinson’s telemonitoring function,
and evolution of the rule base, error indices, and granularity
pands the top plot in the interval [1181, 1273]. If we accept larger rule bases by choosing higher values of the rate $\eta$ and lower values of $\rho$, then FBeM takes advantage of a larger number of rules and may improve its accuracy, because the granular approximation tends to become tighter around $p$. However, FBeM interpretability may decrease in this case, as the number of linguistic terms and granules increases. The same phenomenon is observed in other granular systems.
Rules of particular interest can be displayed at any time. An example of a highly active FBeM rule at h = 5460 is:

------------------------------------------------------------------
R^i: IF (x_1 is [0.0381, 0.1081, 0.1154, 0.4283],
         x_2 is [0.0756, 0.1469, 0.1619, 0.4582],
         x_3 is [0.0169, 0.0556, 0.0667, 0.3899],
         x_4 is [0.3259, 0.5531, 0.5753, 0.7571],
         x_5 is [0.2813, 0.5241, 0.6081, 0.7232],
         x_6 is [0.0692, 0.1783, 0.1890, 0.4656],
         x_7 is [0.1272, 0.2724, 0.2887, 0.5295],
         y[1] is [0.4283, 0.5202, 0.5300, 0.8147])
     THEN ŷ is [0.4327, 0.5103, 0.5202, 0.8177] AND
          ŷ = 0.0367 + 0.0300 x_1 − 0.0815 x_2 − 0.0449 x_3 + 0.0115 x_4 +
              0.0225 x_5 + 0.0396 x_6 + 0.0532 x_7 + 0.8812 y[1]
------------------------------------------------------------------

Here, x_1 stands for ‘Jitter(%)’; x_2, ‘Shimmer’; x_3, ‘NHR’; x_4, ‘HNR’; x_5, ‘RPDE’; x_6, ‘DFA’; x_7, ‘PPE’; y[1], ‘last UPDRS’; and ŷ is the predicted ‘UPDRS’. Linguistically, and based on all existing rules at h = 5460, rule R^i can be read: if ‘Jitter(%)’ is ‘very low’, ‘Shimmer’ is ‘low’, ‘NHR’ is ‘very low’, ‘HNR’ is ‘high’, ‘RPDE’ is ‘moderate’, ‘DFA’ is ‘low’, ‘PPE’ is ‘low’, and ‘last UPDRS’ is ‘high’, then ‘UPDRS’ is ‘high’. The outputs of granular modeling approaches help to monitor Parkinson’s disease symptoms.
Evolving granular systems are capable of processing interval and fuzzy granular data online, as well as of handling ranges of possible values to approximate functions. In addition, they lend transparency and interpretability to the resulting models.
7.5 Control
Process controllers aim at keeping the output of a specific process within a desired
range. In the following, we shall show how to use IBeM, FBeM and eGNN as
controllers of a feedback system.
7.5.1 Sensor-Based Robust Navigation
We consider an instance of autonomous robot navigation in an unknown environ-
ment with obstacle avoidance. From the point of view of control, the autonomous
navigation problem consists in designing driving rules based on available sensor
data. Evolving granular systems for sensor-based navigation play the role of reac-
tive adaptive controllers that prevent the robot from colliding with obstacles. We
assume that obstacle detection relies on a pair of infrared sensors directed head-on and symmetrically, as shown in Fig. 7.17.

Figure 7.17: Environment for sensor-based navigation

Measurements from sensors SL and SR give a linear approximation of the surface of an obstacle. The control variable is the wheel steering angle φ. The variable θ stands for the reference angle between the robot and the border of the track.
We assume the navigation environment is flat, without slopes, but unknown. Coordinates z_1 and z_2 range over [0, 3000] and [0, 5000], respectively. Positive values of the steering angle φ represent clockwise rotation of the steering wheel, and negative values mean counterclockwise rotation. At every processing step, the controller yields a steering angle. The input sensor readings, SL and SR, are proportional to the distance between the robot and an obstacle and are limited to 500. The perpendicular distance between the infrared beams is 40. We want the robot to drive through the path without hitting the borderline.
Simple kinematic relations approximate the robot movement. For example, if the robot moves from position (z_1, z_2) to position (z'_1, z'_2) at step h with speed S, then:

    θ' = θ + φ    (7.11)
    z'_1 = z_1 + S sin(θ')    (7.12)
    z'_2 = z_2 + S cos(θ')    (7.13)
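In code, one simulation step following (7.11)-(7.13) is direct. This Python sketch assumes angles in radians and is not the simulator used in the thesis:

    import math

    def step(z1, z2, theta, phi, speed):
        """One kinematic update following Eqs. (7.11)-(7.13): the steering
        angle phi increments the heading theta, and the robot advances
        by `speed` along the new heading."""
        theta_new = theta + phi                     # (7.11)
        z1_new = z1 + speed * math.sin(theta_new)   # (7.12)
        z2_new = z2 + speed * math.cos(theta_new)   # (7.13)
        return z1_new, z2_new, theta_new

    # Example: robot at (1900, 100), heading 0, turning slightly clockwise
    print(step(1900.0, 100.0, 0.0, math.radians(5), 30.0))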
Obstacle-avoidance simulation models often ignore physical limitations and processing delays. Estimated paths are often unrealistic, since the feasibility of the trajectory is not guaranteed. In addition, uncertainty in measurements may hinder the robot from following trajectories precisely. Evolving granular systems deal with these constraints by keeping the robot between tolerance bounds [u, U] around the more precise estimated path p.
Experiments with different navigation speeds and noisy data were performed.
Experts provided a few common-sense associations of how the state and control
variables behave prior to learning and navigation. Three rules were considered:
R^1: IF (SL is big) AND (SR is big) THEN (φ is zero) AND (φ = p^1(SL, SR))
R^2: IF (SL is small) AND (SR is big) THEN (φ is positive) AND (φ = p^2(SL, SR))
R^3: IF (SL is big) AND (SR is small) THEN (φ is negative) AND (φ = p^3(SL, SR))
The parameters of the functions p^i are a^1 = (0, 0.034, 0.034), a^2 = (5, 0.04, 0.1), and a^3 = (5, 0.1, 0.04). In the case of the interval framework, sharp boundaries define the subsets ‘big’ and ‘small’, and ‘negative’, ‘zero’, and ‘positive’, as shown in Fig. 7.18(a). In the cases of the fuzzy and neurofuzzy frameworks, trapezoidal membership functions define the same subsets, as illustrated in Fig. 7.18(b).
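To illustrate how such initial rules can drive the robot, the sketch below combines the three rules through a weighted average of their affine consequents, with the coefficients as printed above. The membership shapes and the aggregation scheme here are assumptions made for illustration; the actual shapes are those of Fig. 7.18, and each framework (IBeM, FBeM, eGNN) applies its own inference mechanism:

    # Illustrative three-rule controller: affine consequents
    # phi = a0 + a1*SL + a2*SR, combined by a weighted average of
    # rule activations (membership shapes are assumed, not the thesis ones).

    def trap(x, a, b, c, d):
        """Trapezoidal membership degree of x in (a, b, c, d)."""
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)

    small = lambda s: trap(s, -0.1, 0.0, 0.3, 0.6)  # scaled readings in [0, 1]
    big = lambda s: trap(s, 0.4, 0.7, 1.0, 1.1)

    RULES = [  # (activation, consequent coefficients (a0, a1, a2) as printed)
        (lambda sl, sr: min(big(sl), big(sr)), (0.0, 0.034, 0.034)),    # R^1
        (lambda sl, sr: min(small(sl), big(sr)), (5.0, 0.04, 0.1)),     # R^2
        (lambda sl, sr: min(big(sl), small(sr)), (5.0, 0.1, 0.04)),     # R^3
    ]

    def steering(sl, sr):
        """Weighted-average (Takagi-Sugeno-like) combination of the rules."""
        acts = [(w(sl, sr), a) for w, a in RULES]
        total = sum(w for w, _ in acts) or 1.0
        return sum(w * (a0 + a1 * sl + a2 * sr)
                   for w, (a0, a1, a2) in acts) / total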
Figure 7.18: Initial conditions for the autonomous navigation control problem. (a) Interval attributes; (b) Fuzzy attributes
The robot is initially placed at position (1900, 100) with reference angle θ = 0° in all experiments (see Fig. 7.17). Sensor data streams are singular and linearly scaled to the range [0, 1]. The IBeM and FBeM controllers start with parameters ρ = 0.5, h_r = 1, … = 10000, and η = 1; eGNN adopts the same parameters as IBeM and FBeM plus C^i = T_L, C^f = M, and ζ = ϑ = 0.5. It is worth noting that although the supports of the initial membership functions in Fig. 7.18 cover the whole domain of the variables, the search for more specific rules to fit never-before-seen stream data may contract granules and therefore trigger structural adaptation of the models.
Figure 7.19 shows different trajectories for the robot driving at speeds 5, 10, 20, 30, and 40, using the different granular controllers. We ran each algorithm five times independently in this experiment. We notice in the figure that the robot responded faster to obstacle detection when driving at lower speeds. Moreover, alignment (parallel to the obstacle) after left and right turns tended to be more accurate at lower navigation speeds. Alignment yields smoother and shorter
paths, which are intuitively preferable. Some classes of problems emphasize fast
environment exploration though. Table 7.9 compares the performance of IBeM,
FBeM and eGNN based on the results shown in Fig. 7.19.
Table 7.9: Comparison of different evolving granular controllers

    Model   Speed   # Steps to Goal    CPU    # Rules
    IBeM      5          1164          5.57      3
             10           581          3.18      3
             20           289          1.95      3
             30           194          1.60      3
             40           156          1.29      4
    FBeM      5          1094          4.70      3
             10           546          2.78      3
             20           290          1.91      4
             30           202          1.61      3
             40           157          1.51      4
    eGNN      5          1086          5.71      3
             10           550          3.33      3
             20           279          2.15      3
             30           201          1.86      4
             40           154          1.70      4
Table 7.9 shows that eGNN provided the shortest path in 3 of the 5 speed settings, followed by IBeM and FBeM with one shortest path each. IBeM was able to process data faster than the remaining methods and is therefore the best controller in terms of time spent to achieve the goal.
Figure 7.19: Granular controllers navigating at different speeds. (a) IBeM navigation; (b) FBeM navigation; (c) eGNN navigation

Figure 7.20 shows the numerical and granular outputs of an experiment with the FBeM controller. The numerical output p is provided by the functional part of the FBeM controller, and the granular output [u, U] is given by the bounds of the granular part of the FBeM consequent. The granular output is interpreted as a guaranteed safe path for navigation and maneuvering.

Figure 7.20: Detail of the FBeM navigation at different speeds

The simulation at speed 30 (Fig. 7.20) started with 3 rules and ended up with 4. When the robot approached the obstacle and the range of sensor readings shrank quickly, after contraction and drifting of the initial antecedent membership functions toward the frequently requested region around 500, a new rule:

R^4: IF (SL is very small) AND (SR is small)
     THEN (φ is big positive) AND (φ = p^4(SL, SR))

was created to help the robot turn right faster and avoid collision.
An experiment adding noise in the range ϑ = [−0.05, 0.05] to the input data was conducted. Noise may swing the robot from one side to the other. In this experiment, the robot speed remains fixed at 5 during the simulations. The initial parameters of IBeM, FBeM, and eGNN are the same as in the previous experiment. Figure 7.21 illustrates trajectories from independent simulations considering FBeM; trajectories for IBeM and eGNN are similar. We notice that when obstacles are out of sight, the controller accepts input data as they are and lets the robot explore the environment freely. Otherwise, when obstacles are detected, the controller responds by turning the robot left and right satisfactorily. Granular representation of the data alleviates the swing effect.
Naturally, the accuracy of the granular controllers for navigation can be improved by using more sensors and by considering the speed as an additional control variable. Evolving algorithms offer model-free estimation of the control system. Even if a mathematical model is available, evolving controllers may prove more robust and easier to adapt, and they give additional linguistically interpretable granular information, which may help design, analysis, and supervision. If experts can provide structured knowledge of the control system, or if training data are unavailable, the evolving granular approach proceeds as an adaptive controller.

Figure 7.21: FBeM navigating with noisy input
7.6 Summary
This chapter has shown that evolving granular modeling approaches can successfully handle singular and granular data from distinct scenarios. Moreover, they can outperform other offline, online adaptive, and evolving approaches in terms of accuracy, conciseness, processing speed, and interpretability. IBeM, FBeM, and eGNN have provided meaningful, understandable rules in semi-supervised classification, function approximation, time-series prediction, and control problems.
Chapter 8
Conclusion
This chapter summarizes the major aspects of this work, reiterates the thesis
contributions, and discusses potential future research.
8.1 Summary
This thesis has introduced a framework for evolving granular system modeling
and a suite of methods for uncertain data processing. Evolving granular sys-
tems emphasize structural learning of rule-based models and their realization
from online data streams. We consider uncertain data streams and transparent
and linguistically appealing approaches that explore the data uncertainty. Evolv-
ing granular systems provide a way to construct models of real-world processes
involving domain knowledge, experience, and empirical data.
The evolving granular framework is supported by notions of granular com-
puting such as data granulation, granularity adaptation, and granular data pro-
cessing. Development of the structure of models from scratch on a plug-and-play
incremental basis and computation with multi-sized granules are fundamental
characteristics of the framework. Instead of dealing with the problem as a whole,
we gradually granulate it into simpler sub-parts. The premise is to discover more
abstract granular knowledge from finer granular input-output data. Because un-
certain data prevail in stream applications, excessive granularity (close to the
singularity) becomes unnecessary and inefficient.
In spite of the fact that granular computing is a unified framework for granular
information processing, current literature is fairly dispersed and related ideas
have been developed independently under different terminologies. Of particular
concern to this work are numeric, interval, and fuzzy types of granular data; and
interval, fuzzy, and neurofuzzy modeling frameworks. We have introduced three
methods, viz., interval based evolving modeling (IBeM), fuzzy set based evolving
modeling (FBeM), and evolving granular neural network (eGNN). IBeM uses
interval data and interval preserving operations rooted in the theory of interval
mathematics. FBeM deals with fuzzy data and produces results in fuzzy granular
format. eGNN essentially encodes a set of fuzzy rules in its topology; therefore,
neural processing conforms with a fuzzy inference system. Differently from FBeM,
eGNN is equipped with fuzzy aggregation neurons, which provide it with a higher
level of adaptability. Fuzzy sets and neurocomputing are complementary in terms
of their strengths, thus motivating neurofuzzy granular computing.
We evaluated the methods in several nonstationary environments, namely, semi-supervised classification, weather time-series prediction, function approximation in materials and biomedical engineering, and control of an autonomous robot. Put broadly, spatial and temporal aspects of data stream processing were
examined from a granular perspective. In general, the IBeM, FBeM, and eGNN
methods were able to provide: (i) computational tractability and scalability with the number of samples and input variables; (ii) improved interpretability and transparency of models; (iii) reduced cost of data processing in relation to non-evolving methods; and (iv) approximate results, bounds on the approximations, and descriptions of actions. A user is presented with a range of values without the pressure of committing to a specific numeric solution. A numeric value is also given as the result of the local linear functions associated with granules.
All methods have proven to be extremely general and able to outperform state-of-the-art evolving approaches. FBeM and eGNN alternate as the most accurate methods. The main disadvantage of IBeM is its relatively lower accuracy. This is a consequence of its parameter-free internal representation, as opposed to the fuzzy membership functions of FBeM and eGNN. IBeM usually processes data faster than FBeM and eGNN for the same reason. An advantage of eGNN with respect to IBeM and FBeM is its ability to weight features: eGNN tends to be more robust against irrelevant input variables by changing the values of its connection
weights over time. However, this characteristic was not of particular importance
to the chosen applications.
All evolving granular modeling approaches provide a complete model description and offer different linguistic insights into the nature of granular data relationships. The results were at least as strong as those obtained by recently proposed methods in the field of evolving intelligent systems.
8.2 Contributions
This thesis has proposed evolving granular systems, a rule-based modeling frame-
work able to handle uncertain data streams from online information systems.
Evolving granular systems suggest a paradigm shift in online data analysis, supported by the fact that storage and offline processing of large amounts of data are quite often impractical or simply not cost-effective. Accuracy, transparency, and interpretability are key in evolving granular systems.
Three practical approaches founded on principles from different theories were
suggested to handle granular data streams. Interval based evolving modeling
is an interval granular approach to enclose imprecise data streams revealed as
tolerance intervals. Interval-based modeling comes with a recursive learning al-
gorithm rooted in fundamentals of interval mathematics. Antecedent and con-
sequent parts of interval rules are interval hyperboxes which are connected by
an inclusion function. Fuzzy set based evolving modeling uses fuzzy granular
models to deal with finer fuzzy data. For each fuzzy granule, there exists an
associated fuzzy rule. The structure of the fuzzy rule base is gradually developed
from an incremental learning algorithm suitable to process potentially unbounded
fuzzy data streams. This approach renders linguistic models of systems and fuzzy
granular approximation of functions. Evolving granular neural networks use fuzzy
granules and fuzzy aggregation neurons for information fusion. The network can
be translated into a knowledge base and a comprehensible rule-based inference
system. Learning in evolving granular neural networks consists in building and
adapting the network structure from fuzzy data streams. This means that the
neural network captures new information from data streams, adapts itself to the
new scenario, and avoids redesigning and retraining.
A large set of experiments was performed to show the usefulness of the evolving granular approaches. The interval IBeM, fuzzy FBeM, and neurofuzzy eGNN approaches were evaluated in a variety of applications such as semi-supervised classification, time-series prediction, function approximation, and control. The experiments emphasized the difficulty that existing machine learning and computational intelligence approaches have in dealing with nonstationary data streams. Comparative results have demonstrated the relevance of the proposed framework. Although important results have been achieved in this thesis, many challenges still lie ahead.
8.3 Future Research
The results of this thesis provide potential insights for further research. Next, we briefly list some of the most immediate lines of work that can improve upon our achievements.
In the near future, an important research topic is to elaborate new clustering methods for interval and fuzzy interval data streams. Different closeness metrics for intervals and fuzzy intervals are likely to be rethought in a recursive way and then considered in unsupervised evolving granular modeling. Techniques for clustering categorical data streams are also worth addressing.
Another potential area of future work is the study of multi-dimensional granu-
lar models. Skewed, non-aligned, multi-dimensional granules allow dependencies
among several input variables to be captured without necessarily committing to
any directional association of the underlying variables. Multi-dimensional data
representation preserves information about interactions between input variables
through the use of dispersion matrices.
In this thesis, we did not address chunk-based learning. Chunks of data are sets of sequential data samples buffered to be analyzed at once. Chunks are discarded soon after use. Although incremental chunk-driven learning algorithms usually require additional memory and processing time compared to incremental instance-based algorithms, we envision that they may provide interesting insights into one-class classification problems and learning from imbalanced data sets.
We have addressed data granulation in the time (sampling) and space (clustering) domains. We hope to extend this further to granulate the feature domain (feature selection). In this direction, uncertainty in data representation may be useful to help choose granular features. For example, a feature with greater uncertainty may not be as important as one with smaller uncertainty. Data uncertainty works as a guideline for incremental granular feature selection.
The interval, fuzzy, and neurofuzzy approaches discussed in this work are considered semi-active. In semi-active learning, not all available data samples are used for model adaptation. Implicitly, the approaches ignore indistinguishable samples as a result of temporal and spatial granulation. Conversely, active learning approaches employ filtering mechanisms. Filtering mechanisms, such as those based on the participatory learning paradigm, may prevent evolving systems from being exposed to outliers, and are thus an interesting issue to be explored.
This thesis did not cover the qualitative effects of data granulation, which relate directly to computing with words. In computing with words, the objects of computation are words and propositions drawn from natural language. Often, we translate information expressed in words (soft information) into some tractable granular computing framework. When people want this information back from a fusion system, they want it retranslated in a way that it can be used in the task they are really interested in. More precisely, the retranslation process consists in converting a formal mathematical representation into natural-language statements that can be understood by human beings. We believe there is room for research on finding better criteria to retranslate granules into words based on external goals.
We envision hybrids between evolving rule-based systems and methods from the rough set and support vector machine theories as research topics of great importance in the near future. In particular, rough set theory is an instance of a granular computing framework. Imprecision in the rough set approach is expressed by a boundary region, and not by partial membership, as in fuzzy set theory. Two crisp sets, called the lower and upper approximations, are associated with a rough set. The lower approximation of a set consists of all elements that surely belong to the set, whereas the upper approximation consists of all elements that possibly belong to the set. The difference between the upper and the lower approximations is the boundary region. Any rough set, in contrast to a crisp set, has
a non-empty boundary region. Evolving rough sets from uncertain data streams
is still an approach to be explored. Support vector machines are characterized
by the use of kernel mapping techniques. Although the support vector machine
approach has been primarily applied to pattern recognition, many of its ideas
carry over to the case of function approximation. A support vector machine can
be, for example, used in the consequent part of a granular rule, thus providing
local nonlinear models for classification or regression of uncertain data.
Although granular computing extends real-valued computing to computing with intervals, fuzzy sets, etc., it remains deterministic in its practical aspects, because calculations are still based on the real-valued parameters that characterize granules. Computing with granules and words is recognized to be of great relevance to matters in which the confusion of goals does not justify perfection of means. However, current digital signal processing technology and the discrete/continuous dichotomy do not allow computing at a more abstract, human-like level of thinking in its ultimate meaning. These are more philosophical issues worthy of future research.
On a more practical level, there is a need to reproduce our results in virtual or real-world environments, such as an online business or industrial setting. Although the data sets used in the experiments were recorded from actual applications, other software development issues must be addressed so that the proposed methods may effectively provide online decision support. System integration is needed to link together the different methods, computer networking, and software applications to act as a coordinated whole. Software testing is necessary to validate that the requirements of the applications are met and that the methods work as expected, producing results similar to those obtained in the simulations. The extent to which the presented results can be reproduced as part of an integrated system still remains to be fully determined.
We expect research on evolving granular systems to grow further, and believe
that it may have a valuable role in prediction and decision support systems.
Appendix A
Universal Approximation
In this appendix we provide a constructive proof that an evolving granular system can approximate any continuous function with arbitrary accuracy on compact domains. The proof is based on a work of Davis (36). Here we emphasize multiple-input single-output models.
Let

    p = Σ_{i=1}^{c} p^i,    (A.1)

be a piecewise linear continuous function where

    p^i = a_0^i + Σ_{j=1}^{n} a_j^i x_j   if x_j ∈ [l_j^i, L_j^i] ∀j,
    p^i = 0                               otherwise.
The purpose of p is to approximate a continuous function f : L → ℝ, where L = [l_1^i, L_1^i] × … × [l_j^i, L_j^i] × … × [l_n^i, L_n^i]; index i refers to an ordered set of non-overlapping granules, that is, l_j^1 = min(x_j^[1], …, x_j^[H]), L_j^c = max(x_j^[1], …, x_j^[H]), and H is the total number of samples at a given instant of time. Lower and upper bounds of intervals are such that l_j^1 ≤ L_j^1 = … = l_j^i ≤ L_j^i = … = l_j^c ≤ L_j^c. In other words, our purpose is to approximate the function f by piecewise linear functions.
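To make the construction concrete, the following sketch evaluates a function of the form (A.1). It treats granule bounds as half-open so that adjoining granules do not overlap; this, and the toy granules themselves, are assumptions made only for illustration:

    # Numeric illustration of (A.1): p(x) sums local affine pieces, each
    # active only inside its granule [l_j^i, L_j^i) for every input j.

    def p_i(x, a, lo, hi):
        """Local affine piece: a[0] + sum(a[j]*x[j]) inside the granule."""
        if all(l <= xj < h for xj, l, h in zip(x, lo, hi)):
            return a[0] + sum(aj * xj for aj, xj in zip(a[1:], x))
        return 0.0

    def p(x, granules):
        """Piecewise linear approximator, Eq. (A.1)."""
        return sum(p_i(x, a, lo, hi) for a, lo, hi in granules)

    # Two adjoining one-dimensional granules covering [0, 1)
    granules = [((0.0, 1.0), (0.0,), (0.5,)),   # p^1(x) = x   on [0.0, 0.5)
                ((0.5, 0.0), (0.5,), (1.0,))]   # p^2(x) = 0.5 on [0.5, 1.0)
    print(p((0.25,), granules))  # 0.25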
In what follows we limit our discussion to real and continuous functions f that are defined on L, i.e., the convex hull of the current set of granules γ = (γ^1, …, γ^c). Observations from the function f are designated as x^[A], x^[B], x^[C], etc., instead of x^[h], in order to stress independence from the order of data presentation.
Theorem 1: Let P be an enclosure of a family of functions p(x) such that p ∈ P implies

    max(p^1, …, p^c) ∈ P and min(p^1, …, p^c) ∈ P.    (A.2)

A continuous function f is uniformly approximable by members of p within P if and only if, for any two points x^[A] and x^[B] and for any e > 0, there exists a p such that

    |f(x^[A]) − p(x^[A])| < e and |f(x^[B]) − p(x^[B])| < e.    (A.3)
Proof: If uniform approximation is possible, then given e > 0 we can find a p(x^[A]) ∈ P such that

    |f(x^[A]) − p(x^[A])| < e,

and so (A.3) follows trivially. Conversely, suppose that (A.3) holds. Select a fixed x^[B] ∈ P and a fixed e > 0. Then, for any point x^[C], we can find a function p(x^[A]) = p(x^[A]; x^[B], x^[C], e) such that |f(x^[B]) − p(x^[B])| < e and |f(x^[C]) − p(x^[C])| < e. In particular,

    p(x^[C]) < f(x^[C]) + e.

By continuity of p and f, this inequality must persist in a certain neighborhood N of x^[C]. As x^[C] runs over all the points of P, the corresponding neighborhoods must cover P. By the Heine-Borel theorem, which states that every closed interval in ℝ^n is compact, we can find a finite number of them, N_1, …, N_c, that covers P. The corresponding functions p(x^[A]; x^[B], x^[C_i]) satisfy

    p(x^[A]; x^[B], x^[C_i]) < f(x^[A]) + e,   x^[A] ∈ N_i,   i = 1, …, c.    (A.4)

Define

    p(x^[A]; x^[B]) = min{p(x^[A]; x^[B], x^[C_1]), …, p(x^[A]; x^[B], x^[C_c])}.    (A.5)

By (A.2) iterated, p ∈ P, and by (A.4),

    p(x^[A]; x^[B]) < f(x^[A]) + e,   x^[A] ∈ P.    (A.6)

Once again, for each i we have

    |f(x^[B]) − p(x^[B]; x^[B], x^[C_i])| < e,

so that

    p(x^[B]; x^[B], x^[C_i]) > f(x^[B]) − e.

It follows from (A.5) that

    p(x^[B]; x^[B]) > f(x^[B]) − e.    (A.7)

By continuity, (A.7) must persist in a neighborhood O of x^[B]:

    p(x^[A]; x^[B]) > f(x^[A]) − e.

Now, let x^[B] run over P. These neighborhoods O cover P, and we may find a finite number of them, O_1, O_2, …, O_c, corresponding to x^[B_1], …, x^[B_c], that covers P. Since

    p(x^[A]; x^[B_i]) > f(x^[A]) − e,   x^[A] ∈ O_i,   i = 1, 2, …, c,

and since the O_i cover P, for every x^[A] ∈ P the inequality

    p(x^[A]; x^[B_i]) > f(x^[A]) − e

must hold for some i. If we set

    s(x^[A]) = max{p(x^[A]; x^[B_1]), …, p(x^[A]; x^[B_c])},

then by what we have just said,

    s(x^[A]) > f(x^[A]) − e,   x^[A] ∈ P.    (A.8)

On the other hand, by (A.6), p(x^[A]; x^[B]) < f(x^[A]) + e, ∀x^[A] ∈ P, ∀x^[B]. Hence,

    s(x^[A]) < f(x^[A]) + e,   x^[A] ∈ P.    (A.9)

Combining (A.9) with (A.8),

    |f(x^[A]) − s(x^[A])| < e,   x^[A] ∈ P.

Finally, by (A.2) iterated, s(x^[A]) ∈ P.
Being P a finite domain and p the set of all piecewise linear functions (A.1) defined on P, it is easy to verify that p satisfies (A.2). Condition (A.3) can be satisfied with e = 0 by means of a linear function. These results theoretically guarantee that the desired approximation is always achievable. Q.E.D. This is stated more precisely in the following corollary.
Corollary 1: Every continuous function can be approximated uniformly on a
finite interval by continuous piecewise linear functions.
Let H be a sufficiently large number of stream data x^[h], h = 1, …, H, so that there exists coverage for all granules in the problem space L = [l_1^i, L_1^i] × … × [l_j^i, L_j^i] × … × [l_n^i, L_n^i]. Superscript i = 1, …, c, refers to an ordered set of granules. Moreover, being H finite, the total number of granules in L is finite, with 1 ≤ c ≤ H depending on the granularity value ρ.

Complete coverage of L after H time steps is guaranteed by creating granules that match every never-before-seen value and by forbidding learning algorithms to delete granules. Then, for a model with c granules, the difference between a certain p^i(x^[h]) and y^[h], a measure of f, provides the worst-case approximation error e^i = max(e^1, …, e^c). Being all functions p^i, i = 1, …, c, piecewise linear and defined for all x^[h] in L, the function p = Σ_{i=1}^{c} p^i satisfies condition (A.3) for any (x, y)^[h]. Therefore, Corollary 2 follows trivially from Corollary 1.
Corollary 2: Evolving granular systems are universal approximators.
Appendix B
Recursive Least Squares Method
The recursive least squares (RLS) algorithm is used to adapt the consequent function parameters a_j^i as follows.

Let (x, y)^[h] be the sample available for training at step h. We adjust the coefficients a_j^i of p^i assuming that

    y^[h] = a_0^i + Σ_{j=1}^{n} a_j^i x_j^[h].    (B.1)

If x_j and y are intervals or symmetric trapezoids, then to adapt the coefficients a_j^i using the standard form of the RLS algorithm we take advantage of the midpoints of the respective intervals or trapezoids. In the remainder of this appendix we assume, for short, that (x, y)^[h] are real numbers (midpoints of intervals or of trapezoidal fuzzy data). In case trapezoids are asymmetric, an alternative is to use the center of area.
In matrix form, equation (B.1) becomes

    Y = X Ω^i,    (B.2)

where Y = [y^[h]], X = [1 x_1^[h] … x_n^[h]], and Ω^i = [a_0^i … a_n^i]^T is the vector of unknown parameters. To estimate the coefficients a_j^i we let

    Y = X Ω^i + E,    (B.3)

where

    E = ε^[h] = y^[h] − p(x^[h])    (B.4)

is the approximation error. While in batch estimation the rows of Y, X, and E increase with the number of available instances, in recursive mode only two rows are kept and we reformulate equations (B.2)-(B.4) as follows:

        ⎡ y^[h−1] ⎤        ⎡ 1  x_1^[h−1] … x_n^[h−1] ⎤           ⎡ ε^[h−1] ⎤
    Y = ⎣ y^[h]   ⎦,   X = ⎣ 1  x_1^[h]   … x_n^[h]   ⎦,  and E = ⎣ ε^[h]   ⎦.    (B.5)

The rows in (B.5) refer to values before and just after adaptation. The RLS algorithm chooses Ω^i to minimize the functional

    J(Ω^i) = E^T E.    (B.6)

Ω^i is given by

    Ω^i = (X^T X)^{−1} X^T Y.    (B.7)

Assuming P = (X^T X)^{−1} and using the matrix inversion lemma (148), we avoid inverting X^T X by computing:

    P(new) = P(old) [ I − X X^T P(old) / (1 + X^T P(old) X) ],    (B.8)

where I is the identity matrix. In practice it is usual to choose large initial values for the entries of the main diagonal of P. We use P[0] = 10^3 I as the default value. After simple mathematical transformations, the vector of parameters is rearranged recursively as follows:

    Ω^i(new) = Ω^i(old) + P(new) X (Y − X^T Ω^i(old)).    (B.9)

Detailed derivations of the RLS algorithm can be found in (13) and a convergence proof in (61).
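A compact realization of the update (B.8)-(B.9) is sketched below, using a single regressor row per step (sample by sample, rather than the two-row arrangement of (B.5)). It is illustrative only, not the thesis code:

    import numpy as np

    class RLS:
        """Recursive least squares for one rule's consequent parameters.
        The matrix inversion lemma reduces the update of P to scalar
        divisions, so no matrix inversion is needed."""

        def __init__(self, n_inputs, p0=1e3):
            self.omega = np.zeros(n_inputs + 1)   # [a0, a1, ..., an]
            self.P = p0 * np.eye(n_inputs + 1)    # P[0] = 10^3 I

        def update(self, x, y):
            """One recursive step with sample (x, y); x has n entries."""
            phi = np.concatenate(([1.0], x))      # regressor [1 x1 ... xn]
            denom = 1.0 + phi @ self.P @ phi
            self.P -= np.outer(self.P @ phi, phi @ self.P) / denom   # (B.8)
            self.omega += self.P @ phi * (y - phi @ self.omega)      # (B.9)
            return self.omega

    # Example: fitting y = 1 + 2*x1 - x2 from a few samples
    rls = RLS(n_inputs=2)
    for x1, x2 in [(0.1, 0.5), (0.7, 0.2), (0.4, 0.9), (0.8, 0.8)]:
        rls.update(np.array([x1, x2]), 1 + 2 * x1 - x2)
    print(rls.omega)  # approaches [1, 2, -1]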
Bibliography
[1] Aggarwal, C. C.; Han, J.; Wang, J.; Yu, P. S. “A framework for on-demand
classification of evolving data streams.” IEEE Transactions on Knowledge
and Data Engineering, Vol. 18, Issue 5, pp: 577-589, 2006. 32
[2] Aggarwal, C. C.; Yu, P. S. “A framework for clustering uncertain data
streams.” IEEE International Conference on Data Engineering, pp: 150-
159, 2008. 44,63
[3] Aggarwal, C. C.; Yu, P. S. (Eds.) Privacy-Preserving Data Mining: Models
and Algorithms. Springer-Verlag (Series: Advances in Database Systems),
Vol. 34, 513p. 2008. 39,73
[4] Alonso, J. M.; Magdalena, L. “Special issue on interpretable fuzzy systems.”
Information Sciences, Vol. 181, pp: 4331-4339, 2011. 32
[5] Angelov, P. Evolving Rule-Based Models: A Tool for Design of Flexible
Adaptive Systems. Springer-Verlag, Heidelberg, New York (Studies in
Fuzziness and Soft Computing), 227p. 2002. 4,33
[6] Angelov, P.; Filev, D. “An approach to online identification of Takagi-Sugeno
fuzzy models.” IEEE Transactions on Systems, Man, and Cybernetics -
Part B, Vol. 34, Issue 1, pp: 484-498, 2004. 33,40,111,124,130
[7] Angelov, P.; Zhou, X. “Evolving fuzzy systems from data streams in real-
time.” IEEE Symposium on Evolving Fuzzy Systems, pp: 29-35, 2006.
111,124,130
[8] Angelov, P.; Zhou, X.; Filev, D.; Lughofer, E. “Architectures for evolving
fuzzy rule-based classifiers.” IEEE International Conference on Systems,
Man and Cybernetics, pp: 2050-2055, 2007. 34
[9] Angelov, P.; Filev, D.; Kasabov, N. (Eds.) Evolving Fuzzy Systems - Preface
to the Special Section. IEEE Transactions on Fuzzy Systems, Vol. 6, Issue
6, pp: 1390-1392, 2008. 4,32
[10] Angelov, P.; Zhou, X. “Evolving fuzzy-rule-based classifiers from data
streams.” IEEE Transactions on Fuzzy Systems, Vol. 16, Issue 6, pp: 1462-
1475, 2008. 32,34,77
[11] Angelov, P.; Filev, D.; Kasabov, N. (Eds.) Evolving Intelligent Systems:
Methodology and Applications. Wiley-IEEE Press Series on Computa-
tional Intelligence, 444p. 2010. 4,32,33,44,79
[12] Ashokaraj, I.; Tsourdos, A.; Silson, P.; White, B. “Sensor based robot lo-
calisation and navigation: using interval analysis and extended Kalman
filter.” 5th Asian Control Conference, Vol. 2, pp: 1086-1093, 2004. 49
[13] Astrom, K. J.; Wittenmark, B. Adaptive Control. Prentice-Hall, Addison-
Wesley, Boston, 2nd edition, 580p. 1994. 32,155
[14] Ballini, R.; Mendonca, A.; Gomide, F. “Evolving fuzzy modeling of sovereign
bonds.” Journal of Financial Decision Making, Special Issue: The Fuzzy
Logic in the Financial Uncertainty, Vol. 5, Issue 2, pp: 3-15, 2009. 128
[15] Bargiela, A.; Pedrycz, W. Granular Computing: An Introduction. Kluwer
Academic Publishers - Boston, 1st edition, 452p. 2002. 2,3,11,12,44,45
[16] Bargiela A.; Pedrycz, W. “Granulation of temporal data: a global view
on time series.” International Conference of the North American Fuzzy
Information Processing Society, pp: 191-196, 2003. 40
[17] Bargiela, A.; Pedrycz, W. “Recursive information granulation: aggregation
and interpretation issues.” IEEE Transactions on Systems, Man, and Cy-
bernetics - Part B, Vol. 33, Issue 1, pp: 96-112, 2003. 43,44
[18] Bargiela, A.; Pedrycz, W. “Granular mappings.” IEEE Transactions on Sys-
tem, Man, and Cybernetics - Part A, Vol. 35, Issue 2, pp: 292-297, 2005.
3,11,13
[19] Bargiela, A.; Pedrycz, W. “Toward a theory of granular computing for
human-centered information processing.” IEEE Transactions on Fuzzy
Systems, Vol. 16, Issue 2, pp: 320-330, 2008. 11
[20] Beliakov, G.; Pradera, A.; Calvo, T. Aggregation Functions: A Guide for
Practitioners. Springer-Verlag, Berlin, Heidelberg, 1st edition (Studies in
Fuzziness and Soft Computing), 361p. 2007. 26
[21] Beringer, J.; Hullermeier, E. “Efficient instance-based learning on data
streams.” Intelligent Data Analysis, Vol. 11, Issue 6, pp: 627-650, 2007. 4
[22] Bezdek, J. Pattern Recognition with Fuzzy Objective Function Algorithms.
Plenum Press, New York, 1981. 100
[23] Bifet, A.; Holmes, G.; Pfahringer, B.; Kranen, P.; Kremer, H.; Jansen, T.;
Seidl, T. “MOA: Massive online analysis, a framework for stream classi-
fication and clustering.” Journal of Machine Learning Research, Vol. 11,
pp: 44-50, 2010. 37
[24] Bouchachia, A.; Gabrys, B.; Sahel, Z. “Overview of some incremental learn-
ing algorithms.” IEEE International Conference on Fuzzy Systems, pp:
1-6, 2007. 32
[25] Bouchachia, A. “An evolving classification cascade with self-learning.”
Evolving Systems, Vol. 1, Issue 3, pp: 143-160, 2010. 32
[26] Bouchon-Meunier, B. (Ed.) Aggregation and Fusion of Imperfect Informa-
tion. Physica-Verlag, Heidelberg, New York (Studies in Fuzziness and Soft
Computing), 278p. 1998. 80
[27] Bouchon-Meunier, B.; Marsala, C.; Rifqi, M.; Yager, R. R. (Eds.) Uncer-
tainty in Intelligent and Information Systems. World Scientific - Singapore,
536p. 2008. 2,48
[28] Box, G. E. P.; Jenkins, G. M.; Reinsel, G. C. Time Series Analysis: Forecast-
ing and Control. Wiley Series in Probability and Statistics, 4th edition,
746p. 2008. 107
[29] Carpenter, G. A.; Grossberg, S. “A massively parallel architecture for a self-
organizing neural pattern recognition machine.” Computer Vision, Graph-
ics, and Image Processing, Vol. 37, pp: 54-115, 1987. 87
[30] Carpenter, G. A.; Grossberg, S.; Markuzon, N.; Reynolds, J. H.; Rosen,
D. B. “Fuzzy ARTMAP: A neural network architecture for incremental
supervised learning of analog multidimensional maps.” IEEE Transactions
on Neural Networks, Vol. 3, Issue 5, pp: 698-713, 1992. 62
[31] Carvalho, F. A. T.; Souza, R. M. C. R.; Chavent, M.; Lechevallier, Y. “Adap-
tative Hausdorff distances and dynamic clustering of symbolic interval
data.” Pattern Recognition Letters, Vol. 27, Issue 3, pp: 167-179, 2006.
49
[32] Chen, S.; He, H. “Towards incremental learning of nonstationary imbalanced
data stream: a multiple selectively recursive approach.” Evolving Systems,
Vol. 2, Issue 1, pp: 35-50, 2011. 32
[33] Cross, V. V.; Sudkamp, T. A. Similarity and compatibility in fuzzy set the-
ory: assessment and applications. Physica-Verlag Heidelberg (Studies in
Fuzziness and Soft Computing), 209p. 2002. 26,70
[34] Da Deng; Kasabov, N. “ESOM: An algorithm to evolve self-organizing maps
from online data streams.” IEEE International Joint Conference on Neural
Networks, Vol. 6, pp: 3-8, 2000. 34
[35] Darwin, C. R. The origin of species by means of natural selection, or the
preservation of favoured races in the struggle for life. John Murray - Lon-
don, 6th edition, 1872. 31
[36] Davis, P. J. Interpolation and Approximation. Dover Publications, 393p.
1963. 122,147
[37] Do, T.-N.; Poulet, F. “Kernel-based algorithms and visualization for interval
data mining.” In: Zighed, D. A.; Tsumoto, S.; Ras, Z. W., Mining Com-
plex Data, SCI 165, Springer-Verlag, Berlin, Heidelberg, pp: 75-91, 2009.
49
[38] Domingos, P.; Hulten, G. “Mining high-speed data streams.” International
Conference on Knowledge Discovery and Data Mining, pp: 71-80, 2000. 37
[39] Drossu, R.; Obradovic, Z. “Rapid design of neural networks for time series
prediction.” IEEE Computational Science & Engineering, Vol. 3, Issue 2,
pp: 78-89, 1996. 109
[40] Dubois, D.; Kerre, E.; Mesiar, R.; Prade, H. “Fuzzy interval analysis.” In:
The Handbook of Fuzzy Sets, Vol. 1 - Fundamentals of Fuzzy Sets, Kluwer
Academic - Bordrecht, pp: 483-581, 2000. 21
[41] Dubois D.; Prade, H. (Eds.) Fundamentals of Fuzzy Sets. Kluwer Academic
Publishers, 1st edition, 653p. 2000. 5
[42] Dubois, D.; Prade, H. “On the use of aggregation operations in information
fusion processes.” Fuzzy Sets and Systems, Vol. 142, Issue 1, pp: 143-161,
2004. 25,40
[43] Elwell, R.; Polikar, R. “Incremental learning of concept drift in nonstationary
environments.” IEEE Transactions on Neural Networks, Vol. 22, Issue 10,
pp: 1517-1531, 2011. 37
[44] Engelbrecht, A. P. Computational Intelligence: An Introduction. Wiley -
Chichester, England, 2nd edition, 597p. 2007. 4
[45] Fawcett, T. “An introduction to ROC analysis.” Pattern Recognition Letters,
Vol. 27, pp: 861-874, 2006. 100
[46] Gabrys, B.; Bargiela, A. “General fuzzy min-max neural network for cluster-
ing and classification.” IEEE Transactions on Neural Networks, Vol. 11,
Issue 3, pp: 769-783, 2000. 32,36,49,78,90
[47] Gabrys, B.; Petrakieva, L. “Combining labelled and unlabelled data in the
design of pattern classification systems.” International Journal of Approx-
imate Reasoning, Vol. 35, Issue 3, pp: 251-273, 2004. 98,105
[48] Gama, J.; Medas, P. “Learning decision trees from dynamic data streams.”
Journal of Universal Computer Science, Vol. 11, Issue 8, pp: 1353-1366,
2005. 37
[49] Hahn, G. J.; Meeker, W. Q. Statistical Intervals: A Guide for Practitioners.
Wiley, USA, 387p. 1991. 21
[50] Hall, D. L.; Llinas, J. “An introduction to multisensor data fusion.” Pro-
ceedings of the IEEE, Vol. 85, Issue 1, pp: 6-23, 1997. 44
[51] Hamilton, J. D. Time Series Analysis. Princeton University Press, 1st edition,
799p. 1994. 4,107
[52] Hamming, R. W. “Error detecting and error correcting codes.” Bell System
Technical Journal, Vol. 29, Issue 2, pp: 147-160, 1950. 26,69
[53] Hansen, E. R.; Walster, G. W. Global Optimization using Interval Analysis.
2nd edition, Marcel Dekker, New York - Basel, 489p. 2004. 14
[54] Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning:
Data Mining, Inference and Prediction. Springer-Verlag, 2nd edition, 768p.
2009. 4,33,130
[55] Haykin, S. Neural Networks: A Comprehensive Foundation. Prentice Hall,
2nd edition, 823p. 1999. 6,100,111,130
[56] Hickey, T.; Ju, Q.; van Emden, M. H. “Interval arithmetic: from principles
to implementation.” Journal of the ACM, Vol. 48, Issue 5, pp: 1038-1068,
2001. 14,50
[57] Ho, A.; Iansek, R.; Marigliani, C.; Bradshaw, J.; Gates, S. “Speech impair-
ment in a large sample of patients with Parkinson’s disease.” Behavioral
Neurology, Vol. 11, pp: 131-137, 1998. 127
[58] Ho, W. L.; Tung, W. L.; Quek, C. “An evolving Mamdani-Takagi-Sugeno
based neural-fuzzy inference system with improved interpretability-
accuracy.” IEEE International Conference on Fuzzy Systems, pp: 1-8,
July, 2010. 62
[59] Iglesias, J. A.; Angelov, P.; Ledezma, A.; Sanchis, A. “Evolving classification
of agents behaviors: a general approach.” Evolving Systems, Vol. 1, Issue
3, pp: 161-171, 2010. 32
[60] Jaulin, L.; Keiffer, M.; Didrit, O.; Walter, E. Applied Interval Analysis.
Springer-Verlag - London, 379p. 2001. 5,14,38
[61] Johnson, C. R. Lectures on Adaptive Parameter Estimation. Prentice-Hall -
Upper Saddle River, USA, 185p. 1988. 155
[62] Kaburlasos, V. G.; Papadakis, S. E. “Granular self-organizing map (grSOM)
for structure identification.” Neural Networks, Vol. 19, Issue 5, pp: 623-
643, 2006. 78
[63] Kasabov, N. “Evolving fuzzy neural networks for supervised / unsupervised
online knowledge-based learning.” IEEE Transactions on Systems, Man,
and Cybernetics - Part B, Vol. 31, Issue 6, pp: 902-918, 2001. 34,79
[64] Kasabov, N.; Song, Q. “DENFIS: Dynamic evolving neural-fuzzy inference
system and its application.” IEEE Transactions on Fuzzy Systems, Vol.
10, Issue 2, pp: 144-154, 2002. 34,79,111,124
[65] Kasabov, N. Evolving Connectionist Systems: Methods and Applications in
Bioinformatics, Brain Study and Intelligent Machines. Springer-Verlag -
London, 1st edition, 320p. 2003. 4,33,34,79
[66] Kasabov, N. Evolving Connectionist Systems: The Knowledge Engineering
Approach. Springer-Verlag - London, 2nd edition, 451p. 2007. 4,32,34,
40,79
[67] Kaufmann, A.; Gupta, M. M. Introduction to Fuzzy Arithmetic: Theory
and Applications. Van Nostrand Reinhold Company Inc., New York, 350p.
1985. 21
[68] Kausay, T.; Simon, T. K. “Acceptance of concrete compressive strength.”
Concrete Structures, Vol. 8, pp: 54-63, 2007. 122,123
[69] Kearfott, R. B.; Kreinovich, V. Applications of Interval Computations.
Kluwer Academic Publishers, 425p. 1996. 5,14
[70] Klir, G. K.; Yuan, B. Fuzzy Sets and Fuzzy Logic: Theory and Applications.
Prentice Hall, 1st edition, 592p. 1995. 21,22
[71] Kosko, B. Neural Networks and Fuzzy Systems. A Dynamical Systems Ap-
proach to Machine Intelligence. Prentice-Hall, Englewood Cliffs, 449p.
1991. 44
[72] Kreinovich, V. “Interval computations as an important part of granular com-
puting: an introduction.” In: Pedrycz, W.; Skowron, A.; Kreinovich, V.
(Eds.) Handbook of Granular Computing, pp: 1-31, 2008. 2
[73] Kuncheva L. I. Fuzzy Classifier Design. Springer-Verlag, Heidelberg, 321p.
2000. 44
[74] Last, M. “Online classification of nonstationary data streams.” Intelligent
Data Analysis, Vol. 6, Issue 2, pp: 129-147, 2002. 43
[75] Leite, D.; Costa, P.; Gomide, F. “Interval-based evolving modeling.” IEEE
Symposium Series on Computational Intelligence, pp: 1-8, 2009. 32,40,
48
[76] Leite, D.; Costa, P.; Gomide, F. “Evolving granular classification neural
networks.” IEEE International Joint Conference on Neural Networks, pp:
1736-1743, 2009. 77
[77] Leite, D.; Costa, P.; Gomide, F. “Evolving granular neural network for
semi-supervised data stream classification.” World Congress on Compu-
tational Intelligence - International Joint Conference on Neural Networks,
pp: 1877-1884, 2010. 40,77
[78] Leite, D.; Costa, P.; Gomide, F. “Granular approach for evolving system
modeling.” In: Hullermeier, E.; Kruse, R.; Hoffmann F. (Eds.) Lecture
Notes in Artificial Intelligence, Vol. 6178, pp: 340-349, Springer, 2010. 40,
48
[79] Leite, D.; Gomide, F.; Ballini, R.; Costa, P. “Fuzzy granular evolving mod-
eling for time series prediction.” IEEE International Conference on Fuzzy
Systems, pp: 2794-2801, 2011. 32,40,53,67,73
[80] Leite, D.; Gomide, F. “Evolving linguistic fuzzy models from data streams.”
In: Trillas, E.; Bonissone, P.; Magdalena, L.; Kacprycz, J. (Eds.) Combin-
ing Experimentation and Theory: A Hommage to Abe Mamdani (Studies
in Fuzziness and Soft Computing), pp: 209-223, 2011. 40,67,120
[81] Leite, D.; Costa, P.; Gomide, F. “Interval approach for evolving granular
system modeling.” In: Mouchaweh, M. S.; Lughofer, E. (Eds.) Learning
in Non-Stationary Environments: Methods and Applications, Springer-
Verlag, pp: 271-301, 2012. 40,48
[82] Leite, D.; Costa, P.; Gomide, F. “Evolving granular neural network for fuzzy
time series forecasting.” World Congress on Computational Intelligence -
IEEE Joint Conference on Neural Networks, 8p. 2012. 77
[83] Lemos, A.; Caminhas, W.; Gomide, F. “Fuzzy evolving linear regression
trees.” Evolving Systems, Vol. 2, Issue 1, pp: 1-14, 2011. 32,37,124
[84] Lemos, A.; Caminhas, W.; Gomide, F. “Multivariable Gaussian evolving
fuzzy modeling system.” IEEE Transactions on Fuzzy Systems, Vol. 19,
Issue 1, pp: 91-104, 2011. 32,35
[85] Lemos, A.; Caminhas, W.; Gomide, F. “Evolving fuzzy linear regression
trees with feature selection.” IEEE Workshop on Evolving and Adaptive
Intelligent Systems, pp: 31-38, 2011. 37
[86] Liggins, M. E.; Hall, D. L.; Llinas, J. (Eds.) Handbook of Multisensor Data
Fusion: Theory and Practice. CRC Press, 2nd edition, 849p. 2008. 3
[87] Lima, E.; Gomide, F.; Ballini, R. “Participatory evolving fuzzy modeling.”
International Symposium on Evolving Fuzzy Systems, pp: 36-41, 2006.
35,124
[88] Lin, T. Y. “Granular computing on binary relations.” International Confer-
ence on Rough Sets and Current Trends in Computing, pp: 296-299, 2002.
78
[89] Lin, T. Y. “Neural networks, qualitative fuzzy logic and granular adaptive
systems.” World Congress of Computational Intelligence, pp: 566-571,
2002. 2,11
[90] Little, M. A.; McSharry, P. E.; Hunter, E. J.; Spielman, J.; Ramig, L. O.
“Suitability of dysphonia measurements for telemonitoring of Parkinson’s
disease.” IEEE Transactions on Biomedical Engineering, Vol. 56, Issue 4,
pp: 1015-1022, 2009. 128
[91] Little, R. J. A.; Rubin, D. B. Statistical Analysis with Missing Data. Wiley-
Interscience, 2nd edition, 381p. 2002. 39
[92] Ljung, L. System Identification - Theory for the User. Prentice-Hall, Engle-
wood Cliffs, NJ, 519p. 1988. 33
[93] Lodwick, W.; Jamison, K. D. “Special issue: interfaces between fuzzy set
theory and interval analysis.” Fuzzy Sets and Systems, Vol. 135, pp: 1-3,
2003. 21
[94] Lughofer, E. “FLEXFIS: A robust incremental learning approach for evolving
Takagi-Sugeno fuzzy models.” IEEE Transactions on Fuzzy Systems, Vol.
16 , Issue 6, pp: 1393-1410, 2008. 32,36
[95] Lughofer, E.; Angelov, P. “Handling drifts and shifts in on-line data streams
with evolving fuzzy systems.” Applied Soft Computing, Vol. 11, Issue 2,
pp: 2057-2068, 2011. 34,44
[96] Lughofer, E.; Bouchot, J.-L.; Shaker, A. “On-line elimination of local redun-
dancies in evolving fuzzy systems.” Evolving Systems, Vol. 2, Issue 3, pp:
165-187, 2011. 73,95
[97] Lughofer, E. Evolving Fuzzy Systems - Methodologies, Advanced Concepts
and Applications. Springer-Verlag, Berlin Heidelberg, 460p. 2011. 4,40,
59
[98] Lughofer, E. “On-line incremental feature weighting in evolving fuzzy clas-
sifiers.” Fuzzy Sets and Systems, Vol. 163, Issue 1, pp: 1-23, 2011. 93
[99] Maimon, O. Z.; Rokach, L. The Data Mining and Knowledge Discovery
Handbook. Springer - New York, USA, 1383p. 2005. 3
[100] Mendel, J. M. “Type-2 fuzzy sets and systems: an overview.” IEEE Com-
putational Intelligence Magazine, Vol. 2, Issue 2, pp: 20-29, 2007. 21
[101] Mitchell, T. M. Machine Learning. McGraw-Hill Sci-
ence/Engineering/Math, 1st edition, 414p. 1997. 4
[102] Mitchell, T. M. “The role of unlabeled data in supervised learning.” Pro-
ceedings of the Sixth International Colloquium on Cognitive Science, 8p.
1999. 98,105
[103] Moore, R. E. Interval Analysis. Prentice Hall - Englewood Cliffs, NJ, 145p.
1966. 14,21
[104] Moore, R. E. Methods and Applications of Interval Analysis. SIAM -
Philadelphia, 190p. 1979. 5,14,20
[105] Moore, R. E.; Lodwick, W. “Interval analysis and fuzzy set theory.” Fuzzy
Sets and Systems, Vol. 135, Issue 1, pp: 5-9, 2003. 21
[106] Moore, R. E.; Kearfott, R. B.; Cloud, M. J. Introduction to Interval Anal-
ysis. SIAM - Philadelphia, 223p. 2009. 18,21
[107] Muhlbaier, M.; Topalis, A.; Polikar, R. “Learn++.NC: Combining ensem-
ble of classifiers with dynamically weighted consult-and-vote for efficient
incremental learning of new classes.” IEEE Transactions on Neural Net-
works, Vol. 20, Issue 1, pp: 152-168, 2009. 37
[108] Nandedkar, A. V.; Biswas, P. K. “A granular reflex fuzzy min-max neural
network for classification.” IEEE Transactions on Neural Networks, Vol.
20, Issue 7, pp: 1117-1134, 2009. 49,78,90
[109] Neumaier, A. Interval Methods for Systems of Equations. Cambridge Uni-
versity Press, Cambridge, 272p. 1990. 14
[110] Ozawa, S.; Pang, S.; Kasabov, N. “Incremental learning of chunk data
for online pattern classification systems.” IEEE Transactions on Neural
Networks, Vol. 19, Issue 6, pp: 1061-1074, 2008. 43
[111] Pedrycz, W.; Waletzky, J. “Fuzzy clustering with partial supervision.”
IEEE Transactions on Systems, Man and Cybernetics - Part B, Vol. 27,
Issue 5, pp: 787-795, 1997. 98
[112] Pedrycz, W.; Vukovich, W. “Granular neural networks.” Neurocomputing,
Vol. 36, pp: 205-224, 2001. 75,76
[113] Pedrycz, W. “Heterogeneous fuzzy logic networks: fundamentals and devel-
opment studies.” IEEE Transactions on Neural Networks, Vol. 15, Issue
6, pp: 1466-1481, 2004. 89
[114] Pedrycz, W. Knowledge-based Clustering: From Data to Information Gran-
ules. Wiley, 1st edition, 336p. 2005. 98
[115] Pedrycz, W.; Kwak, K.-C. “The development of incremental models.” IEEE
Transactions on Fuzzy Systems, Vol. 15, Issue 3, pp: 507-518, 2007. 3
[116] Pedrycz, W.; Gomide, F. Fuzzy Systems Engineering: Toward Human-
Centric Computing. Wiley - Hoboken, NJ, USA, 526p. 2007. 2,3,6,12,
22,25,26
[117] Pedrycz, W. “Granular computing - the emerging paradigm.” Journal of
Uncertain Systems, Vol. 1, pp: 38-61, 2007. 11,13
[118] Pedrycz, W.; Skowron, A.; Kreinovich, V. (Eds.) Handbook of Granular
Computing. Wiley - Chichester, England, 1116p. 2008. 2,11,12
[119] Pedrycz, W. “Evolvable fuzzy systems: some insights and challenges.”
Evolving Systems, Vol. 1, Issue 2, pp: 73-82, 2010. 40
[120] Petkovic, M. S.; Petkovic, L. D. Complex Interval Arithmetic and Its Ap-
plications. Wiley - VCH, Germany, 280p. 1998. 21
[121] Polikar, R.; Udpa, L.; Udpa, S. S.; Honavar, V. “Learn++: An incremental
learning algorithm for supervised neural networks.” IEEE Transactions on
Systems, Man, and Cybernetics - Part C, Vol. 31, Issue 4, pp: 497-508,
2001. 37
[122] Pouzols, F. M.; Lendasse, A. “Evolving fuzzy optimally pruned extreme
learning machine for regression problems.” Evolving Systems, Vol. 1, Issue
1, pp: 43-58, 2010. 32
[123] Roof, S.; Callagan, C. “The climate of Death Valley, California.” Bulletin
of the American Meteorological Society, Vol. 84, pp: 1725-1739, 2003.
110
[124] Rubio, J. J. “SOFMLS: Online self-organizing fuzzy modified least-squares
network.” IEEE Transactions on Fuzzy Systems, Vol. 17, Issue 6, pp: 1296
- 1309, 2009. 32,36
[125] Rubio, J. J. “Stability analysis for an online evolving neuro-fuzzy recurrent
network.” In: Angelov, P.; Filev, D.; Kasabov, N. (Eds.) Evolving Intel-
ligent Systems: Methodology and Applications, Wiley - IEEE Press, pp:
173-199, 2010. 32
[126] Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach. Series
in Artificial Intelligence, 3rd edition, 2009. 13
[127] Shafer, J. L. Analysis of Incomplete Multivariate Data. Chapman and Hall
- London, 430p. 1997. 39,124
[128] Silva, L.; Gomide, F.; Yager, R. “Participatory learning in fuzzy clustering.”
IEEE International Conference on Fuzzy Systems, pp: 857-861, 2005. 124
[129] Simpson, P. K. “Fuzzy min-max neural networks. Part I: classification.”
IEEE Transactions on Neural Networks, Vol. 3, Issue 5, pp: 776-786, 1992.
36
[130] Simpson, P. K. “Fuzzy min-max neural networks. Part II: clustering.” IEEE
Transactions on Fuzzy Systems, Vol. 1, Issue 1, pp: 32-45, 1993. 36
[131] Strother, W. “Continuous multi-valued functions.” The Bulletin of Sao
Paulo Mathematical Society, Vol. 10, pp: 87-120, 1958. 21
[132] Tibshirani, R. “Regression shrinkage and selection via the Lasso.” Journal
of the Royal Statistical Society - Series B (Methodological), Vol. 58, Issue
1, pp: 267-288, 1996. 130
[133] Tsanas, A.; Little, M. A.; McSharry, P. E.; Ramig, L. O. “Accurate tele-
monitoring of Parkinson’s disease progression by noninvasive speech tests.”
IEEE Transactions on Biomedical Engineering, Vol. 57, Issue 4, pp: 884-
893, 2010. 127,128,130
[134] Vachkov, G. “Spatial-temporal knowledge base for modeling and analysis
of evolving systems.” Evolving Systems, Vol. 2, Issue 2, pp: 131-143, 2011.
32
[135] Witten, I. H.; Frank, E.; Hall, M. A. Data Mining: Practical Machine
Learning Tools and Techniques. Morgan Kaufmann, 3rd edition, 664p.
2011. 3,33,38
[136] Xiao, L.; Hung, E. “An efficient distance calculation method for uncer-
tain objects.” IEEE Symposium on Computational Intelligence and Data
Mining, pp: 10-17, 2007. 72
[137] Yager, R. R. “A model of participatory learning.” IEEE Transactions on
Systems, Man and Cybernetics, Vol. 20, Issue 5, pp: 1229-1234, 1990. 35
[138] Yager, R. R. “Learning from imprecise granular data using trapezoidal fuzzy
set representations.” In: Prade, H.; Subrahmanian, V. S. (Eds.) Lecture
Notes in Computer Science, Springer - Berlin, Heidelberg, Vol. 4772, pp:
244-254, 2007. 21,40,65,76
[139] Yager, R. R. “Measures of specificity over continuous spaces under similarity
relations.” Fuzzy Sets and Systems, Vol. 159, Issue 17, pp: 2193-2210,
2008. 44
[140] Yager, R. R. “Participatory learning with granular observations.” IEEE
Transactions on Fuzzy Systems, Vol. 17, Issue 1, pp: 1-13, 2009. 40,65
[141] Yao, J. T. “A ten-year review of granular computing.” IEEE International
Conference on Granular Computing, pp: 734-739, 2007. 2,11,12
[142] Yao, Y. Y. “Perspectives of granular computing.” IEEE International Con-
ference on Granular Computing, pp: 85-90, 2005. 3,78
[143] Yao, Y. Y. “The art of granular computing.” International Conference on
Rough Sets and Emerging Intelligent Systems Paradigms, LNAI Vol. 4585,
pp: 101-112, 2007. 41
[144] Yao, Y. Y. “Granular computing: past, present and future.” IEEE Inter-
national Conference on Granular Computing, pp: 80-85, 2008. 2,11
[145] Yao, Y. Y. “Interpreting concept learning in cognitive informatics and gran-
ular computing.” IEEE Transactions on Systems, Man, and Cybernetics -
Part B, Vol. 39, Issue 4, pp: 855-866, 2009. 41
[146] Yao, Y. Y. “Human-inspired granular computing.” In: Yao, J. T. (Ed.)
Novel Developments in Granular Computing: Applications for Advanced
Human Reasoning and Soft Computing, 2010. 11,41
[147] Yeh, I.-C. “Modeling of strength of high performance concrete using artifi-
cial neural networks.” Cement and Concrete Research, Vol. 28, Issue 12,
pp: 1797-1808, 1998. 123
[148] Young, P. C. Recursive Estimation and Time-Series Analysis: An Introduc-
tion. Springer-Verlag - Berlin, 300p. 1984. 155
[149] Zadeh, L. “Fuzzy sets.” Information Control, Vol. 8, pp: 338-353, 1965. 21,
22
[150] Zadeh, L. “The concept of a linguistic variable and its application to ap-
proximate reasoning.” Information Science, Vol. 8, pp: 199-249, 1975. 21
[151] Zadeh, L. A. “Fuzzy sets and information granularity.” In: Gupta, M. M.;
Ragade, R. K.; Yager, R. R. (Eds.) Advances in Fuzzy Set Theory and
Applications, North Holland - Amsterdam, pp: 3-18, 1979. 2,11
[152] Zadeh, L. A. “Toward a theory of fuzzy information granulation and its
centrality in human reasoning and fuzzy logic.” Fuzzy Sets and Systems,
Vol. 90, Issue 2, pp: 111-127, 1997. 3,41
[153] Zadeh, L. A. “Toward a generalized theory of uncertainty (GTU) - an out-
line.” Information Sciences, Vol. 172, pp: 1-40, 2005. 3,13,48
[154] Zadeh, L. A. ”Generalized theory of uncertainty (GTU) - principal concepts
and ideas.” Computational Statistics & Data Analysis, Vol. 51, pp: 15-46,
2006. 2,3,13,40
[155] Zadeh, L. A. “Is there a need for fuzzy logic?” Information Sciences, Vol.
178, Issue 13, pp: 2751-2779, 2008. 5
[156] Zhang, L.; Zhang, B. “Fuzzy reasoning model under quotient space struc-
ture.” Information Sciences, Vol. 173, pp: 353-364, 2005. 78
[157] Zhang, Y.-Q.; Fraser, M. D.; Gagliano, R. A.; Kandel, A. “Granular neural
networks for numerical-linguistic data fusion and knowledge discovery.”
IEEE Transactions on Neural Networks, Vol. 11, Issue 3, pp: 658-667,
2000. 78
[158] Zhu, X.; Goldberg, A. B. Introduction to Semi-Supervised Learning. Mor-
gan and Claypool Publishers (Synthesis Lectures on Artificial Intelligence
and Machine Learning), 116p. 2009. 98,105
[159] Zimmermann, H.-J. Fuzzy Set Theory and its Applications. Kluwer Aca-
demic Publishers, 4th edition, 544p. 2001. 5
... aspects of a problem may assume a granular value. In other words, a non-pointwise uncertain characterization (e.g., interval, fuzzy, rough, statistical, and mixtures of uncertain objects) can be admitted for original data instances, pre-processed or space-transformed instances, model parameters, learning equations, covering regions, and so on [1,3,5]. Generalized constraints, in the sense of Zadeh's general theory of uncertainty [6], are used to delimit granules. ...
... As the algorithms operate on an instance-per-instance basis, they are typically much faster than conventional machine learning algorithms. Nonetheless, time granulation, which aims at reducing the sampling rate of fast data streams and/or synchronizing concurrent data streams that arrive at random time intervals, has also been discussed [1]. A time granule describes the data stream for a certain time period. ...
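As a rough sketch of time granulation under the description above, the following Python snippet collapses each window of a fast pointwise stream into a [min, max] interval granule; the function name and the fixed-window policy are illustrative assumptions, not the exact procedure of [1].

```python
import numpy as np

def time_granulate(stream, window):
    """Summarize a fast pointwise stream into interval time granules.

    Each granule [min, max] describes the stream over one window of
    samples, reducing the effective sampling rate by a factor of `window`.
    """
    granules = []
    for start in range(0, len(stream) - window + 1, window):
        chunk = stream[start:start + window]
        granules.append((float(np.min(chunk)), float(np.max(chunk))))
    return granules

# A 1000-sample stream summarized into 10 interval time granules
stream = np.sin(np.linspace(0, 2 * np.pi, 1000)) + 0.05 * np.random.randn(1000)
print(time_granulate(stream, window=100)[:3])
```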
... Evolving granular-computing models are structures with online learning, summarization, and representation capabilities [1,2,3,4]. An evolving granular model is equipped with an incremental algorithm that gradually builds its structure of interconnected elements. ...
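To make "gradually builds its structure" concrete, here is a minimal single-pass sketch assuming one-dimensional interval granules with a fixed maximum width rho; the absorb-or-create policy and the threshold value are illustrative simplifications, not the learning algorithms cited above.

```python
def learn_granules(stream, rho=0.3):
    """One-pass structure building: interval granules grown from scratch.

    Each sample either fits an existing granule (possibly expanding it,
    bounded by the maximum width rho) or spawns a new granule.
    """
    granules = []                      # each granule is [lower, upper]
    for x in stream:
        for g in granules:
            lo, up = min(g[0], x), max(g[1], x)
            if up - lo <= rho:         # sample absorbed by bounded expansion
                g[0], g[1] = lo, up
                break
        else:                          # no granule can cover the sample
            granules.append([x, x])
    return granules

print(learn_granules([0.10, 0.12, 0.90, 0.15, 0.95]))
# -> [[0.1, 0.15], [0.9, 0.95]]
```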
Conference Paper
Full-text available
We present an approach for data-driven modeling and evolving control of unknown dynamic systems called State-Space Evolving Granular Control. The approach is based on elements of granular computing, discrete state-space systems, and online learning. First, the structure and parameters of a granular model are developed from a stream of state data. The model is formed by information granules comprising first-order difference equations. Partial activation of granules gives global nonlinear approximation capability. The model is supplied with an algorithm that constantly updates the granules toward covering new data while keeping memory of previous patterns. A granular controller is derived from the granular model for parallel distributed compensation. Instead of difference equations, the content of a control granule is a gain matrix, which can be redesigned in real-time from the solution of a relaxed locally-valid linear matrix inequality derived from a Lyapunov function and bounded control-input conditions. We have shown asymptotic stabilization of a chaotic map assuming no previous knowledge about the source that produces the stream of data.
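The claim that "partial activation of granules gives global nonlinear approximation capability" can be pictured with a toy blend of two local first-order difference equations; the matrices, centers, and Gaussian-style activation below are assumed for illustration and are not taken from the paper.

```python
import numpy as np

# Illustrative granule contents: first-order difference equations x+ = A x + b
A = [np.array([[0.9, 0.1], [0.0, 0.8]]),
     np.array([[0.5, -0.2], [0.1, 0.7]])]
b = [np.zeros(2), np.array([0.1, 0.0])]
centers = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]

def activation(x, c, sigma=0.5):
    # Assumed Gaussian-style activation of a granule centered at c
    return np.exp(-np.sum((x - c) ** 2) / (2 * sigma ** 2))

def predict_next(x):
    """Blend locally valid linear models by normalized activation degrees."""
    mu = np.array([activation(x, c) for c in centers])
    mu /= mu.sum()
    return sum(m * (Ai @ x + bi) for m, Ai, bi in zip(mu, A, b))

print(predict_next(np.array([0.5, 0.5])))
```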
... Informally, if x is a vector whose entries are known exactly, then we refer to x as pointwise, or a real vector for short. If x is not known precisely, but there is some information that constrains the possible or probable values of the entries of x, then the constraint on x defines a granular entity or object [5,6,2]. In practice, data uncertainty originates from inaccurate sensors and devices, expert decision or judgment, and summaries of data over time periods, for example. ...
... Evolving granular systems embrace a form of adaptive machine learning able to learn from uncertain and nonstationary data streams [2,9,10]. They arose as computational-intelligence-based modeling approaches driven by pointwise data streams [11,12]; however, they generalize pointwise data and parameters to granular objects. ...
... Rule-based granular models supplied with incremental learning algorithms that work on a per-sample basis have been a prominent research line to deal with big, numerical, and uncertain data streams [1][2][3][4]. Generally speaking, an uncertain data sample is a vector ...
• it develops a new human-centric online Interval Incremental Learning (IIL) algorithm that builds and updates interval rule-based models on the fly, from uncertain data streams;
• it introduces the concepts of space and time granulation, which are useful to operate at different levels of detail and abstraction, and to synchronize non-uniformly-sampled data streams;
• it describes a recursive procedure for a balanced information granularity and, therefore, for obtaining stable and understandable rule-based models that comply with the idea of XAI;
• it suggests the Uncertainty-Weighted Recursive-Least-Squares (UW-RLS) algorithm to take into consideration the granularity of the data in the updating of consequent parameters of interval rules within the IIL framework; and
• it offers the Driving-Through-Manhattan interval dataset as a benchmark to encourage further research, development, and evaluation of alternative approaches. ...
Article
Full-text available
This paper presents a method called Interval Incremental Learning (IIL) to capture spatial and temporal patterns in uncertain data streams. The patterns are represented by information granules and a granular rule base with the purpose of developing explainable human-centered computational models of virtual and physical systems. Fundamentally, interval data are either included into wider and more meaningful information granules recursively, or used for structural adaptation of the rule base. An Uncertainty-Weighted Recursive-Least-Squares (UW-RLS) method is proposed to update affine local functions associated with the rules. Online recursive procedures that build interval-based models from scratch and guarantee balanced information granularity are described. The procedures assure stable and understandable rule-based modeling. In general, the model can play the role of a predictor, a controller, or a classifier, with online sample-per-sample structural adaptation and parameter estimation done concurrently. The IIL method is aligned with issues and needs of the Internet of Things, Big Data processing, and eXplainable Artificial Intelligence. An application example concerning real-time land-vehicle localization and tracking in an uncertain environment illustrates the usefulness of the method. We also provide the Driving Through Manhattan interval dataset to foster future investigation.
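As a hedged illustration of the UW-RLS idea in the abstract above, the sketch below runs a standard weighted recursive least squares in which the per-sample weight shrinks with interval width; the width-to-weight map is an assumption for the example, not the paper's formulation.

```python
import numpy as np

class WeightedRLS:
    """Weighted recursive least squares (a sketch of the UW-RLS idea).

    Per-sample weight w in (0, 1]: certain (narrow) samples get w near 1,
    wide (uncertain) interval samples get smaller w and thus less pull
    on the affine consequent parameters.
    """
    def __init__(self, n, lam=0.99):
        self.theta = np.zeros(n)        # affine parameters of a local rule
        self.P = 1e3 * np.eye(n)        # inverse-covariance-like matrix
        self.lam = lam                  # forgetting factor

    def update(self, phi, y, w=1.0):
        Pphi = self.P @ phi
        k = Pphi / (self.lam / w + phi @ Pphi)       # weighted gain vector
        self.theta = self.theta + k * (y - phi @ self.theta)
        self.P = (self.P - np.outer(k, Pphi)) / self.lam

# Interval inputs: regress on midpoints, down-weight wide intervals
# (the width-to-weight map below is an assumption, not the paper's rule)
rls = WeightedRLS(n=2)
for (lo, hi), y in [((0.9, 1.1), 2.0), ((1.9, 2.1), 4.1), ((2.5, 3.5), 6.0)]:
    mid, width = (lo + hi) / 2.0, hi - lo
    rls.update(np.array([1.0, mid]), y, w=1.0 / (1.0 + width))
print(rls.theta)    # approaches [0, 2] for data following y = 2x
```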
... Evolving granular computing [15] is a general-purpose online learning framework, i.e., a family of algorithms and methods to construct classifiers, regressors, predictors, and controllers in which any aspect of a problem may assume a non-pointwise (e.g., interval, fuzzy, rough, statistical) uncertain characterization, including data, parameters, attributes, learning equations, covering regions [11] [15] [26]. In particular, we have proposed a state-space variety of a granular eFS known as Fuzzy-set-Based evolving Modeling (FBeM) [16], and a model-based control design method that guarantees Lyapunov stability and bounded inputs to the closed-loop evolving granular system. ...
... Due to an aspect of the SS-FBeM learning algorithm, namely that inactive rules do not change, recalculation of the gains (16) is needed only for the rules that are active at a given time step. Therefore, the number of LMIs in (15) can be greatly reduced by considering active rules only. ...
Conference Paper
Full-text available
We present a method for incremental modeling and time-varying control of unknown nonlinear systems. The method combines elements of evolving intelligence, granular machine learning, and multi-variable control. We propose a State-Space Fuzzy-set-Based evolving Modeling (SS-FBeM) approach. The resulting fuzzy model is structurally and parametrically developed from a data stream with focus on memory and data coverage. The fuzzy controller also evolves, based on the data instances and fuzzy model parameters. Its local gains are redesigned in real-time – whenever the corresponding local fuzzy models change – from the solution of a linear matrix inequality problem derived from a fuzzy Lyapunov function and bounded input conditions. We have shown one-step prediction and asymptotic stabilization of the Henon chaos.
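The gains mentioned in the abstract come from a relaxed locally-valid linear matrix inequality; as a simplified stand-in, the snippet below checks only the plain discrete-time Lyapunov LMI for one assumed closed-loop matrix, using cvxpy.

```python
import cvxpy as cp
import numpy as np

# Illustrative closed-loop matrix of one active granule (A_i - B_i K_i)
Acl = np.array([[0.5, 0.2],
                [0.0, 0.7]])

# Discrete-time Lyapunov LMI: find P > 0 with Acl' P Acl - P < 0
P = cp.Variable((2, 2), symmetric=True)
eps = 1e-6
constraints = [P >> eps * np.eye(2),
               Acl.T @ P @ Acl - P << -eps * np.eye(2)]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status, P.value)   # 'optimal' means this local loop is stable
```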
... Conventional machine learning algorithms for building models from EEG data are often infeasible, since the volume of data is enormous and the patterns change. Transformations, attribute extraction, nonlinear models, time windows, and algorithms capable of handling data streams are needed (Leite, 2012). ...
... A recursive algorithm builds its rule base and updates granules to deal with novelties. The method handles unbounded amounts of data and offers computational scalability (Leite, 2012; Decker et al., 2020). ...
... The rules R^i, ∀i, form a rule base. The number of rules, c, is variable, which is a notable characteristic of the approach, since no assumption about how many partitions exist is required (Skrjanc et al., 2019; Leite, 2012). ...
Conference Paper
Full-text available
We describe an online machine learning algorithm for building evolving Gaussian Fuzzy Classifiers (eGFC). We present a method for extracting and selecting attributes from the Fourier spectrum of electroencephalogram data. The data are obtained from 28 individuals exposed to the computer games Train Sim World, Unravel, Slender The Arrival, and Goat Simulator. According to the Arousal-Valence system, four emotions prevail (boredom, calmness, horror, and joy). We analyze individual electrodes and the effect of time windows and dimensionality reduction on eGFC performance. We conclude that electrodes on both brain hemispheres assist classification, especially those on the temporal (T7-T8), occipital (O1-O2), and frontal (Af3-Af4) lobes. We observe that patterns may arise anywhere in the frequency spectrum, between 1 and 64 Hz. The eGFC approach is effective for the real-time Big data problem. It reaches an accuracy of 72.2% using a compact rule structure, at a processing speed of 1.8 ms/sample.
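A minimal sketch of the Fourier attribute-extraction step is given below, assuming a 128 Hz sampling rate (consistent with the 1-64 Hz range above) and the mean spectral magnitude per band as the attribute; the paper's exact windowing and selection steps may differ.

```python
import numpy as np

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 64)}

def band_attributes(window, fs=128):
    """Mean Fourier magnitude per band for one electrode window.

    fs = 128 Hz is an assumed sampling rate consistent with the 1-64 Hz
    range mentioned above; the actual pipeline may differ.
    """
    mag = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    return {band: float(mag[(freqs >= lo) & (freqs < hi)].mean())
            for band, (lo, hi) in BANDS.items()}

eeg_window = np.random.randn(10 * 128)     # a 10-second window, one electrode
print(band_attributes(eeg_window))
```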
... Successful applications of these systems in complex real-world problems, including control, prediction, classification, identification, and function approximation, are found in [4], [11]-[15]. A further advantage of evolving fuzzy systems is that they may provide linguistically-appealing granular information [13], [14], [16], that is, these systems may explain their results or actions. Online structural adaptation of a fuzzy model to handle nonstationarities is pursued by adding, merging, and removing rules from a knowledge base [4]. ...
... This paper addresses a new fuzzy modeling framework called the Evolving Gaussian Fuzzy Classification (EGFC) framework. EGFC is a semi-supervised variation of the evolving granular rule-based approach [13], [17] for the construction of nonlinear and time-varying classifiers, with unsupervised and supervised learning as the boundary cases. We aim to detect and classify anomalies, broadly speaking, and power quality disturbances as a particular application example. ...
... Fundamentally, an HP filter and the DFT are applied to raw voltage data within a time window. A more discriminative set of attributes facilitates model interpretation, reduces data overfitting, and may produce better results due to the elimination of attributes and noise that may mislead online systems [13], [16]. ...
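To illustrate the HP-filter-plus-DFT pipeline mentioned above, here is a sketch using the Hodrick-Prescott filter from statsmodels; the smoothing parameter and the chosen summary attributes are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from statsmodels.tsa.filters.hp_filter import hpfilter

def voltage_attributes(v, lamb=1600):
    """HP-filter the window into trend + cycle, then summarize the
    cycle's DFT. Both lamb and the chosen summary attributes are
    illustrative, not the paper's configuration."""
    cycle, trend = hpfilter(np.asarray(v, dtype=float), lamb=lamb)
    mag = np.abs(np.fft.rfft(cycle))
    return {"trend_mean": float(np.mean(trend)),
            "dominant_bin": int(np.argmax(mag[1:]) + 1),
            "spectral_energy": float(np.sum(mag ** 2))}

t = np.linspace(0.0, 1.0, 256, endpoint=False)
v = np.sin(2 * np.pi * 60 * t) + 0.1 * np.random.randn(256)  # noisy 60 Hz wave
print(voltage_attributes(v))
```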
... In this technique, the information on each water flow and precipitation observed in the basins is associated with fuzzy sets instead of single values, which helps handle imprecise data or data with uncertainty arising from the acquisition process. Examples of fuzzy time series implementations are found in the areas of renewable energy resources [5] [6], engineering [7] [8], meteorology [9], and financial markets [10]. ...
... Based on the runoff values of the reservoirs (Rsolo), (Rsub), (Rsup), (Rsup2) and on the drainage area of the basin under study, the total water flow at time instant t can be determined through equation (9). The division by the factor 86.4 changes the time unit. ...
Conference Paper
This study proposes a comparative analysis between the Soil Moisture Accounting Procedure (SMAP/ONS) and the fuzzy time series technique for water flow forecasting in Brazilian hydroelectric power generation systems. The reservoir crisis of 2021, caused by the worst drought since 1931, which affected the country’s energy sector and increased energy tariffs, served as a warning of the importance of water resource planning and management and how it is essential for electricity generation in Brazil. The SMAP/ONS model, currently used by the Brazilian National Electric System Operator (ONS), calculates water flow based on evapotranspiration and precipitation. In contrast, the fuzzy time series technique is a machine learning-based approach that uses fuzzy logic to handle uncertain data. Results show that the fuzzy time series technique presented competitive performance in most assessed cases.
... We summarize a recently-proposed evolving Gaussian Fuzzy Classification (eGFC) method [12]. eGFC is an instance of Evolving Granular System [13] [14], which is a general-purpose online learning framework, i.e., a family of approaches to autonomously construct classifiers, regressors, predictors and controllers in which any aspect of a problem may have a non-pointwise (e.g., interval, fuzzy, statistical) characterization, including data, parameters, features, learning equations, and covering regions [15] [16]. eGFC has been applied to power quality classification in smart grids [12] and anomaly detection in data centers [17]. ...
... ρ[h] dictates how large granules can be. Different values result in different granular perspectives [13]. Section II-D gives recursive equations to update ρ[h]. ...
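The actual recursive equations for ρ[h] are in the cited paper's Section II-D; the snippet below only illustrates one plausible policy, assumed for the example: coarsen ρ when rules are created faster than a target rate, refine it otherwise.

```python
def update_rho(rho, rules_created, eta=1, beta=0.1):
    """Plausible recursive policy for the maximum granule size rho.

    rules_created counts rule creations over the last monitoring window
    and eta is the target count. Too many new rules means granules are
    too tight (coarsen); too few means the granular perspective can be
    refined. Illustrative only; not the cited paper's equations.
    """
    if rules_created > eta:
        return rho * (1.0 + beta)      # coarser granular perspective
    if rules_created < eta:
        return rho * (1.0 - beta)      # finer granular perspective
    return rho

print(update_rho(0.5, rules_created=4))   # 0.55: enlarge granules
print(update_rho(0.5, rules_created=0))   # 0.45: shrink granules
```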
Conference Paper
Emotion recognition has become a need for more realistic and interactive machines and computer systems. The greatest challenge is the availability of high-performance algorithms to effectively manage individual differences and nonstationarities in physiological data, i.e., algorithms that customize models to users with no subject-specific calibration data. We describe an evolving Gaussian Fuzzy Classifier (eGFC), which is supported by a semi-supervised learning algorithm to recognize emotion patterns from electroencephalogram (EEG) data streams. We extract features from the Fourier spectrum of EEG data. The data are provided by 28 individuals playing the games ‘Train Sim World’, ‘Unravel’, ‘Slender The Arrival’, and ‘Goat Simulator’ – a public dataset. Different emotions prevail, namely, boredom, calmness, horror and joy. We analyze the effect of individual electrodes, time window lengths, and frequency bands on the accuracy of user-independent eGFCs. We conclude that both brain hemispheres may assist classification, especially electrodes on the frontal (Af3-Af4), occipital (O1-O2), and temporal (T7-T8) areas. We observe that patterns may be eventually found in any frequency band; however, the Alpha (8-13Hz), Delta (1-4Hz), and Theta (4-8Hz) bands, in this order, are more correlated with the emotion classes. eGFC has been shown to be effective for real-time learning of EEG data. It reaches a 72.2% accuracy using a variable rule base, 10-second windows, and 1.8ms/sample processing time in a highly-stochastic time-varying 4-class classification problem.
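A simplified view of the decision step in such a classifier: each rule is a Gaussian granule tagged with a class, and the most active granule labels the sample. The two rules and their parameters below are hypothetical.

```python
import numpy as np

def membership(x, mu, sigma):
    """Gaussian membership degree of sample x in granule (mu, sigma)."""
    return float(np.exp(-0.5 * np.sum(((x - mu) / sigma) ** 2)))

# Hypothetical two-rule base over normalized band-power attributes
rules = [(np.array([0.2, 0.1]), np.array([0.2, 0.2]), "calmness"),
         (np.array([0.8, 0.9]), np.array([0.2, 0.2]), "horror")]

def classify(x):
    """Winner-takes-all over granule activations (simplified decision step)."""
    return max((membership(x, mu, s), label) for mu, s, label in rules)

print(classify(np.array([0.75, 0.85])))   # ('horror' wins with high degree)
```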
Article
We introduce an incremental learning method for the optimal construction of rule-based granular systems from numerical data streams. The method is developed within a multiobjective optimization framework considering the specificity of information, model compactness, and variability and granular coverage of the data. We use α-level sets over Gaussian membership functions to set model granularity and operate with hyperrectangular forms of granules in nonstationary environments. The resulting rule-based systems are formed in a formal and systematic fashion. They can be useful in time series modeling, dynamic system identification, predictive analytics, and adaptive control. Precise estimates and enclosures are given by linear piecewise and inclusion functions related to optimal granular mappings.
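The α-level sets over Gaussian membership functions mentioned in the abstract reduce, per attribute, to intervals whose Cartesian product is the hyperrectangular granule: solving exp(-(x-μ)²/(2σ²)) ≥ α gives |x-μ| ≤ σ√(-2 ln α), as the sketch below computes.

```python
import math

def alpha_cut(mu, sigma, alpha):
    """Interval where a Gaussian membership function is at least alpha:
    exp(-(x - mu)^2 / (2 sigma^2)) >= alpha  <=>
    |x - mu| <= sigma * sqrt(-2 ln alpha)."""
    half_width = sigma * math.sqrt(-2.0 * math.log(alpha))
    return (mu - half_width, mu + half_width)

# Per-attribute alpha-cuts are the sides of the hyperrectangular granule
box = [alpha_cut(mu, sigma, alpha=0.5) for mu, sigma in [(0.3, 0.1), (0.7, 0.2)]]
print(box)   # approximately [(0.182, 0.418), (0.465, 0.935)]
```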
Book
Primary Audience for the Book:
• Specialists in numerical computations who are interested in algorithms with automatic result verification.
• Engineers, scientists, and practitioners who desire results with automatic verification and who would therefore benefit from the experience of successful applications.
• Students in applied mathematics and computer science who want to learn these methods.
Goal of the Book: This book contains surveys of applications of interval computations, i.e., applications of numerical methods with automatic result verification, that were presented at an international workshop on the subject in El Paso, Texas, February 23-25, 1995. The purpose of this book is to disseminate detailed and surveyed information about existing and potential applications of this new growing field.
Brief Description of the Papers: At the most fundamental level, interval arithmetic operations work with sets: the result of a single arithmetic operation is the set of all possible results as the operands range over the domain. For example, [0.9, 1.1] + [2.9, 3.1] = [3.8, 4.2], where [3.8, 4.2] = {x + y | x ∈ [0.9, 1.1] and y ∈ [2.9, 3.1]}. The power of interval arithmetic comes from the fact that (i) the elementary operations and standard functions can be computed for intervals with formulas and subroutines; and (ii) directed roundings can be used, so that the images of these operations (e.g.
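The blurb's example is easy to reproduce; a few lines of Python implement interval addition (and, for contrast, multiplication, where the extremes of the endpoint products must be taken).

```python
def iadd(a, b):
    """Interval addition: [a1, a2] + [b1, b2] = [a1 + b1, a2 + b2]."""
    return (a[0] + b[0], a[1] + b[1])

def imul(a, b):
    """Interval multiplication: take the extremes of the endpoint products."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

print(iadd((0.9, 1.1), (2.9, 3.1)))    # (3.8, 4.2) up to float rounding
print(imul((-1.0, 2.0), (3.0, 4.0)))   # (-4.0, 8.0)
```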
Book
Intelligent systems are necessary to handle modern computer-based technologies managing information and knowledge. This book discusses the theories required to help provide solutions to difficult problems in the construction of intelligent systems. Particular attention is paid to situations in which the available information and data may be imprecise, uncertain, incomplete or of a linguistic nature. The main aspects of clustering, classification, summarization, decision making and systems modeling are also addressed. Topics covered in the book include fundamental issues in uncertainty, the rapidly emerging discipline of information aggregation, neural networks, Bayesian networks and other network methods, as well as logic-based systems.
Book
Although the term is relatively recent, the notions and principles of Granular Computing (GrC) have appeared in different guises in many related fields, including granularity in Artificial Intelligence, interval computing, cluster analysis, quotient space theory, and many others. Recent years have witnessed a renewed and expanding interest in the topic as it begins to play a key role in bioinformatics, e-commerce, machine learning, security, data mining, and wireless mobile computing when it comes to the issues of effectiveness, robustness, and uncertainty. The Handbook of Granular Computing offers a comprehensive reference source for the granular computing community, edited by and with contributions from leading experts in the field. It includes chapters covering the foundations of granular computing, interval analysis and fuzzy set theory; hybrid methods and models of granular computing; and applications and case studies. It is divided into five sections: Preliminaries, Fundamentals, Methodology and Algorithms, Development of Hybrid Models, and Applications and Case Studies. The handbook presents the flow of ideas in a systematic, well-organized manner, starting with the concepts and motivation and proceeding to detailed design that materializes in specific algorithms, applications, and case studies, and provides the reader with a self-contained reference that includes all prerequisite knowledge, augmented with step-by-step explanations of more advanced concepts. The Handbook of Granular Computing represents a significant and valuable contribution to the literature and will appeal to a broad audience including researchers, students, and practitioners in the fields of Computational Intelligence, pattern recognition, fuzzy sets and neural networks, system modelling, operations research, and bioinformatics.