All content in this area was uploaded by Fabien Belloir on Mar 16, 2015
New RBF neural network classifier with optimized hidden neurons number
Larbi Beheim, Adel Zitouni, Fabien Belloir
Laboratoire d’Automatique et Microélectronique
Université de Reims Champagne-Ardenne
Campus du Moulin de la Housse,
B.P. 1039, 51687 Reims Cedex 2
FRANCE
Abstract: - This article presents a marked improvement in the performance of a neural classifier based on an RBF
network. By relying on the Mahalanobis distance, the new classifier slightly increases the recognition rate while
greatly reducing the number of hidden-layer neurons. We thus obtain a new RBF classifier that is very general, very
simple, requires no tuning parameter, and offers an excellent ratio of performance to number of neurons. A
comparative study of its performance is presented and illustrated by examples on artificial and real databases.
Key-Words: - RBF neural networks, Mahalanobis distance, clustering, training algorithms, optimization of the
number of hidden neurons, buried tag identification.
1 Introduction
The radial basis function (RBF) neural network has become,
in recent years, a serious alternative to the traditional
Multi-Layer Perceptron (MLP) network for
multidimensional approximation problems. RBF
networks have been used since the 1970s under the
name of potential functions, and it was only later that [1]
and [2] rediscovered this particular structure in
neural form. Since then, this type of network has benefited
from many theoretical studies such as [3], [4] and [5]. In
pattern recognition, the RBF network is very attractive
because of its locality property, which makes it possible
to discriminate complex classes such as nonconvex ones.
We consider in this article the Gaussian RBF classifier,
each of whose m outputs $s_j$ is evaluated according to the
following formula:
$$s_j(X) = \sum_{l=1}^{N_h} w_{lj}\,\varphi_l(X) = \sum_{l=1}^{N_h} w_{lj} \exp\left(-\frac{1}{2}\,(X - C_l)^T\,\Sigma_l^{-1}\,(X - C_l)\right) \tag{1}$$
where $X = [x_1 \ldots x_n]^T \in \mathbb{R}^n$ is a prototype to be
classified and $N_h$ is the total number of hidden neurons.
Each of these nonlinear neurons is characterized by a
center $C_l \in \mathbb{R}^n$ and a covariance matrix $\Sigma_l$.
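As an illustration of equation (1), the outputs can be evaluated as in the minimal NumPy sketch below. The function name `rbf_outputs` and the array layouts are assumptions made for this example, not part of the original work.

```python
import numpy as np

def rbf_outputs(X, centers, covariances, weights):
    """Evaluate the m outputs s_j(X) of equation (1) for one input vector.

    Assumed (hypothetical) array layout:
      X:           (n,)          input prototype
      centers:     (N_h, n)      one center C_l per row
      covariances: (N_h, n, n)   one matrix Sigma_l per hidden neuron
      weights:     (N_h, m)      entry [l, j] is w_lj
    """
    diffs = X - centers                                        # rows are X - C_l
    # Solve Sigma_l y = (X - C_l) instead of forming the inverse explicitly.
    solved = np.linalg.solve(covariances, diffs[..., None])[..., 0]
    quad = np.sum(diffs * solved, axis=1)                      # (X-C_l)^T Sigma_l^{-1} (X-C_l)
    phi = np.exp(-0.5 * quad)                                  # hidden activations phi_l(X)
    return phi @ weights                                       # outputs s_j(X), shape (m,)
```

A natural decision rule, not stated explicitly in the text, would be to assign X to the class with the largest output, e.g. `np.argmax(rbf_outputs(...))`.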
From a training set $S_{train} = \{X_p, \omega_p\}$, $p = 1 \ldots N$, made up of
pairs of a prototype $X_p$ and its membership class
$\omega_p \in \{1, \ldots, m\}$, the supervised training problem of the RBF
classifier amounts to determining its structure, i.e. the
number of hidden neurons $N_h$ and the different
parameters appearing in the output equation (1).
Whereas these parameters can be computed by various
heuristics, estimating $N_h$ is often delicate. Many
methods have been developed for this purpose, among
which we can cite [6], [7] and [8]. Whether basic or very
sophisticated, these methods generally require a heavy
computational load without guaranteeing good
performance. Moreover, they often require a number of
parameters that must be fixed a priori and optimized for
a particular problem. These methods therefore cannot be
applied systematically, without particular precautions, to
any type of classification problem. The article [9]
proposed a very simple algorithm which automatically
generates a powerful RBF network without any
optimization or parameters fixed a priori. Indeed, the
algorithm automatically selects the number of
hidden-layer neurons. Although this network is
characterized by its great simplicity, it has a major
limitation: it requires a rather large number of neurons
in the hidden layer. This limitation makes it very heavy
and leads to very long training times for very large
databases.
In this article we propose a solution to this problem by
introducing the Mahalanobis distance. We thus obtain a
new RBF network that is very general, very simple, and
offers an excellent ratio of performance to number of
neurons.
The article is organized as follows. In section 2, we
describe the construction principle of the new RBF
network and present the associated algorithm; its
operation is illustrated by an example on an artificial
database. In section 3, we study its performance on both
artificial and real problems.
2 Algorithm
In this section, we describe the proposed construction
principle as well as the algorithm that implements it.
We then illustrate its operation on a classification
problem with two classes, one of which is nonconvex.
2.1 Principle
The principle of the algorithm builds on [9]. Because of
the exponential nature of the function $\varphi_l(\cdot)$ of each
hidden neuron, its activation decreases quickly when the
input vector X moves away from the neuron center. So
only a region of the input space centered on $C_l$ yields a
significantly nonzero activation. Contrary to [9], our
algorithm regards this region as a hyperellipsoid
centered on $C_l$. Indeed, the use of the Mahalanobis
distance makes it possible to take into account the
statistical distribution of the prototypes around the
centers $C_l$ and thus to represent the shape of the classes
better. Our algorithm divides a nonconvex class into a
set of hyperellipsoids called clusters. Each cluster
corresponds to a hidden neuron and is thus characterized
by a center locating it in the input space, a covariance
matrix indicating the privileged directions, and a width
setting the extent of the hyperellipsoid. In the following,
we no longer distinguish between a neuron and a
cluster.
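To give an order of magnitude for this locality, write $d^2$ for the quadratic form in the exponent of equation (1). The sample values of $d^2$ below are chosen here purely for illustration:

```latex
\varphi_l(X) = \exp\left(-\tfrac{1}{2}\,d^2\right), \qquad
d^2 = (X - C_l)^T\,\Sigma_l^{-1}\,(X - C_l)

d^2 = 0 \;\Rightarrow\; \varphi_l = 1, \qquad
d^2 = 1 \;\Rightarrow\; \varphi_l \approx 0.607, \qquad
d^2 = 4 \;\Rightarrow\; \varphi_l \approx 0.135, \qquad
d^2 = 9 \;\Rightarrow\; \varphi_l \approx 0.011
```

The activation is thus already close to zero a few units away from the center, measured in the metric induced by the covariance matrix of the neuron.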
2.2 Description
Before describing the construction algorithm of the
RBF classifier, we introduce some notation used
below. At the k-th iteration, $C_{ij}^{(k)}$ denotes the i-th
center ($i = 1 \ldots m_j^{(k)}$) characterizing the class $\Omega_j$. With each
center are associated a covariance matrix $\Sigma_{ij}^{(k)}$ and a
width $L_{ij}^{(k)}$. We denote by $H_{ij}^{(k)}$ the hyperellipsoid of center
$C_{ij}^{(k)}$ such that:
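$$H_{ij}^{(k)} = \left\{ X_p \in \Omega_j \;:\; \left(X_p - C_{ij}^{(k)}\right)^T \left(\Sigma_{ij}^{(k)}\right)^{-1} \left(X_p - C_{ij}^{(k)}\right) < L_{ij}^{(k)} \right\} \tag{2}$$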
Each class is characterized by the region $R_j^{(k)}$ defined as
the union of all the hyperellipsoids $H_{ij}^{(k)}$ ($i = 1 \ldots m_j^{(k)}$).
We will also use the distance $d(R_j^{(k)}, X)$ between a point
$X \in \Omega_j$ and its associated region $R_j^{(k)}$. It is defined as
the Mahalanobis distance between X and the nearest
center $C_{ij}^{(k)}$ of $R_j^{(k)}$:
$$d\left(R_j^{(k)}, X\right) = \min_{i = 1 \ldots m_j^{(k)}} \left(X - C_{ij}^{(k)}\right)^T \left(\Sigma_{ij}^{(k)}\right)^{-1} \left(X - C_{ij}^{(k)}\right) \tag{3}$$
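The membership test of equation (2) and the distance of equation (3) can be sketched as follows. The helper names are hypothetical, and the quadratic form is used without a square root, as the equations are written.

```python
import numpy as np

def mahalanobis_sq(X, center, cov):
    """Quadratic form (X - C)^T Sigma^{-1} (X - C) used in equations (2) and (3)."""
    diff = X - center
    return float(diff @ np.linalg.solve(cov, diff))

def in_hyperellipsoid(X, center, cov, width):
    """Equation (2): X lies in the cluster if the quadratic form is below the width."""
    return mahalanobis_sq(X, center, cov) < width

def distance_to_region(X, centers, covs):
    """Equation (3): smallest quadratic form over all clusters of the class of X."""
    return min(mahalanobis_sq(X, c, s) for c, s in zip(centers, covs))
```

For example, with a center at the origin and covariance `np.diag([4.0, 1.0])`, the point `[2, 0]` is at distance 1 while `[0, 2]` is at distance 4, which shows how the covariance stretches the cluster along its first axis.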
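Step 0 (initialization): For k=0, we define m clusters
whose centers are the barycentres of the different
classes $\Omega_j$ ($N_j$ is the number of elements of $\Omega_j$):
$$m_j^{(0)} = 1 \quad\text{and}\quad C_{1j}^{(0)} = \frac{1}{N_j} \sum_{X_p \in \Omega_j} X_p, \qquad j = 1 \ldots m \tag{4}$$
$$\Sigma_{1j}^{(0)} = \frac{1}{N_j - 1} \sum_{X_p \in \Omega_j} \left(X_p - C_{1j}^{(0)}\right)\left(X_p - C_{1j}^{(0)}\right)^T, \qquad j = 1 \ldots m \tag{5}$$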
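Step 0 amounts to one mean and one sample covariance per class. A minimal sketch, assuming integer class labels and a hypothetical dictionary layout for the clusters:

```python
import numpy as np

def initialize_clusters(X, labels):
    """Step 0: one cluster per class, following equations (4) and (5).

    X:      (N, n) training prototypes
    labels: (N,)   integer class labels (an assumption of this sketch)
    Returns {class: {"centers": [C_1j], "covs": [Sigma_1j]}}.
    """
    clusters = {}
    for j in np.unique(labels):
        pts = X[labels == j]
        center = pts.mean(axis=0)                       # barycentre, eq. (4)
        # np.cov divides by N_j - 1 by default, matching eq. (5).
        cov = np.atleast_2d(np.cov(pts, rowvar=False))
        clusters[int(j)] = {"centers": [center], "covs": [cov]}
    return clusters
```

Each class needs at least two prototypes for the covariance to be defined, and more than n prototypes in general position for it to be invertible.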
Step 1 (adjustment of the widths): The width $L_{ij}^{(k)}$
of the center $C_{ij}^{(k)}$ is defined as half the
Mahalanobis distance between this center and the nearest
center of another class:
$$L_{ij}^{(k)} = \frac{1}{2} \min_{\substack{t \neq j \\ s = 1 \ldots m_t^{(k)}}} \left(C_{ij}^{(k)} - C_{st}^{(k)}\right)^T \left(\Sigma_{ij}^{(k-1)}\right)^{-1} \left(C_{ij}^{(k)} - C_{st}^{(k)}\right), \qquad i = 1 \ldots m_j^{(k)},\; j = 1 \ldots m \tag{6}$$
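The width computation of equation (6) can be sketched as below. The flat array layout (all centers stacked, with a parallel array giving the class of each center) is an assumption of this example, and `covs` is simply whatever covariance matrices are available when step 1 runs.

```python
import numpy as np

def cluster_widths(centers, covs, classes):
    """Step 1: half the smallest quadratic form to a center of another class.

    centers: (K, n)     all current centers, every class together
    covs:    (K, n, n)  covariance matrix attached to each center
    classes: (K,)       class label of each center
    Requires at least two classes.
    """
    widths = np.empty(len(centers))
    for a in range(len(centers)):
        others = centers[classes != classes[a]]          # centers of other classes
        diffs = others - centers[a]                      # (M, n)
        solved = np.linalg.solve(covs[a], diffs.T).T     # Sigma^{-1} (C_st - C_ij)
        widths[a] = 0.5 * np.min(np.sum(diffs * solved, axis=1))
    return widths
```

Because only centers of other classes enter the minimum, two clusters of the same class are free to overlap.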
Step 2 (search for an orphan point): We seek a point
$X_i \in S_{train}$ that does not belong to its associated region $R_{\omega_i}^{(k)}$
and is the most distant from it:
$$X_i = \arg\max_{X_s \in S_{train}} d\left(R_{\omega_s}^{(k)}, X_s\right) \tag{7}$$
If no such point exists, go to step 5.
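A possible sketch of the orphan search of step 2, using the same assumed flat layout of centers, covariances, widths and per-center class labels (all names are hypothetical):

```python
import numpy as np

def find_orphan(X, labels, centers, covs, widths, classes):
    """Step 2: index of the uncovered training point farthest from its region.

    X: (N, n) prototypes, labels: (N,) their classes.
    centers (K, n), covs (K, n, n), widths (K,), classes (K,) describe the clusters.
    Returns None when every point lies inside a cluster of its own class.
    """
    best_index, best_dist = None, -np.inf
    for p in range(len(X)):
        own = np.flatnonzero(classes == labels[p])       # clusters of the point's class
        diffs = X[p] - centers[own]                      # (n_own, n)
        solved = np.linalg.solve(covs[own], diffs[..., None])[..., 0]
        quad = np.sum(diffs * solved, axis=1)            # quadratic forms of eq. (2)
        if np.any(quad < widths[own]):
            continue                                     # covered: not an orphan
        if quad.min() > best_dist:                       # eq. (3), maximized as in eq. (7)
            best_index, best_dist = p, quad.min()
    return best_index
```

Returning `None` corresponds to the exit condition that sends the algorithm to step 5.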
Step 3 (creation of a new center): The point $X_i$ found at step 2
becomes a new center of the class $\Omega_{\omega_i}$ (with $j = \omega_i$):
$$m_j^{(k)} = m_j^{(k)} + 1, \qquad C_{m_j^{(k)}\,j}^{(k)} = X_i \tag{8}$$
Step 4 (reorganization of the centers): The K-means
clustering algorithm [17] is applied to the points of $S_{train}$
belonging to the class $\Omega_{\omega_i}$ in order to distribute the
$m_j^{(k)}$ centers as well as possible. Compute the new
covariance matrices:
$$\Sigma_{ij}^{(k)} = \frac{1}{\mathrm{Card}\left(H_{ij}^{(k)}\right) - 1} \sum_{X_p \in H_{ij}^{(k)}} \left(X_p - C_{ij}^{(k)}\right)\left(X_p - C_{ij}^{(k)}\right)^T \tag{9}$$
Set k=k+1 and go to step 1.
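Steps 3 and 4 can be sketched as follows for a single class. This is a simplified illustration: the K-means routine is a plain Lloyd iteration started from the current centers plus the orphan point, and the covariance of each cluster is estimated from the points K-means assigns to it, which is an assumption standing in for the membership set $H_{ij}^{(k)}$ of equation (9). All function names are hypothetical.

```python
import numpy as np

def nearest_center(points, centers):
    """Index of the closest center (Euclidean) for every point."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans_from(points, init_centers, n_iter=100):
    """Plain Lloyd iterations started from the given centers."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        assign = nearest_center(points, centers)
        updated = centers.copy()
        for i in range(len(centers)):
            members = points[assign == i]
            if len(members) > 0:              # an empty cluster keeps its center
                updated[i] = members.mean(axis=0)
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers, nearest_center(points, centers)

def add_center_and_reorganize(class_points, class_centers, orphan):
    """Step 3 (append the orphan as a center) then step 4 (K-means, covariances)."""
    init = np.vstack([class_centers, orphan])
    centers, assign = kmeans_from(class_points, init)
    # Sample covariance per cluster (divides by count - 1, as in eq. (9));
    # each cluster needs at least two members for this to be defined.
    covs = [np.atleast_2d(np.cov(class_points[assign == i], rowvar=False))
            for i in range(len(centers))]
    return centers, covs
```

Only the points and centers of the class that received the new center are passed in, since the centers of the other classes do not move.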
Step 5 (determination of the weights): The weight
matrix $W^*$ that minimizes an error function, here
chosen as the sum of squared classification errors, is
given by:
$$W^* = \left(H^T H\right)^{-1} H^T T \tag{10}$$
where H and T are the matrices gathering, respectively,
the activation states of the hidden neurons and the target
outputs. The latter are set to 1 when they correspond to
the class of the point and to 0 elsewhere.
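The weight computation of step 5 is an ordinary linear least-squares problem. The sketch below uses `np.linalg.lstsq`, which returns the same solution as equation (10) whenever $H^T H$ is invertible and is numerically safer than forming the inverse. The function name and the 0-based integer labels are assumptions of this example.

```python
import numpy as np

def output_weights(H, labels, n_classes):
    """Step 5: least-squares weights for one-hot targets.

    H:      (N, N_h) activation of every hidden neuron for every training point
    labels: (N,)     integer class labels in 0 .. n_classes-1 (assumed)
    Returns W of shape (N_h, n_classes).
    """
    T = np.eye(n_classes)[labels]              # 1 for the class of the point, 0 elsewhere
    W, *_ = np.linalg.lstsq(H, T, rcond=None)  # minimizes ||H W - T||^2
    return W
```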
2.3 Discussion
The initialization of the algorithm (step 0) could have
been done by randomly placing a given number of
centers, a technique very common in the definition of
RBF networks. Choosing the initial centers as the
barycentres of the points $X_p$ avoids this unpredictability
and moreover provides the number of these centers. Thus,
the result of the algorithm depends only on the
composition of the training data. In certain cases where
the classes are nonconvex, the barycentre of a class may
lie inside another class. This situation is not harmful to
the algorithm since this center will be moved during the
following iterations. The covariance matrix
corresponding to each center is obtained from the
associated training data. We will see later that other
definitions of this matrix can give different results. In
step 1, $L_{ij}^{(k)}$ is defined relative to the minimal distance
between the center $C_{ij}^{(k)}$ and the centers of the other classes.
This means that partial overlap between clusters of the
same class is allowed. From a practical point of view,
this optimizes the coverage of the attribute space by the
various receptive fields and thus reduces the number of
clusters needed to compose each class. The elliptic
volume covered by each cluster is maximal without
encroaching on neighboring classes. In step 2, choosing
the point furthest from the region $R_j^{(k)}$ improves the
effectiveness of the K-means clustering used at step 4.
It should be noted that the latter involves only the
centers of the same class, since the other centers have
not changed position. This choice moreover guarantees a
fast growth of this region. In the network training
(step 5), the target outputs are set arbitrarily to 1 when
they correspond to the class of the point and to 0
elsewhere. The motivation for this practice is to
artificially create a sharp drop in the membership
degree at the geometrical border of the class.
After k iterations, all the points of $S_{train}$ belong to a
cluster and the algorithm has generated m+k clusters
defining as many subclasses. The RBF network thus built
comprises $N_h = m+k$ hidden neurons. Note that the
algorithm necessarily converges. Indeed, in the "worst
case" where none of the classes is separable, a cluster is
created for each point of $S_{train}$.
2.4 Illustration of operation
We illustrate the main phases of the algorithm
on a classification problem with two concentric classes
from the databases of the "ELENA" project [10] [11]. This
database makes it possible to assess the capacity of a
classifier to separate two classes that do not overlap but
one of which is enclosed in the other.
Fig. 1a. Initialization of the algorithm.
Fig. 1b. 1st iteration of the algorithm.
Fig. 1c. 2nd iteration of the algorithm.
Fig. 1d. Result of classification of the algorithm.
The RBF network comprises 2 inputs and 2 outputs.
Figure 1a shows the 2 initial centers $\{C_1, C_2\}$ obtained
after step 0. We can see that the two centers almost
coincide. Each induced cluster is delimited by an
ellipse whose width is calculated at step 1. Obviously, the
cluster of center $C_2$ is not sufficient to represent the
class $\Omega_2$ entirely. This class will thus be subdivided into
several subclasses. At the first iteration of the algorithm,
the point denoted $X_i$ in figure 1b is the furthest from the
center $C_2$ and lies outside the corresponding cluster.
Adding a new center at this point leads, after
application of K-means, to the new distribution
$\{C_1, C_2, C_3\}$ illustrated in figure 1b. The point $X_j$ is
now the furthest from the center $C_3$ in this figure.
Applying K-means to this new configuration leads to
figure 1c. After 4 iterations, the 2 classes are perfectly
discriminated and the neural classifier comprises a total
of 5 neurons (see figure 1d). Once the necessary number
of centers and their positions have been determined, the
weights of the network are calculated according to
equation (10) of step 5.
The algorithm thus manages to separate the two classes
with only 5 neurons, against 108 neurons for the old
algorithm using the Euclidean distance, and with a
slightly higher recognition rate: 98% against 97.75%
for the old RBF.
3 Results
The purpose of this section is to evaluate the performance
of the RBF classifier built by the algorithm presented in
section 2. To this end, we applied the classifier to various
classification problems with varying numbers of
attributes and classes, on synthetic data as well as on
real-world data.
3.1 Benchmarks
The benchmarks carried out here are studied in detail in
the ELENA project [10]. For each classification problem,
we report the results of the RBF classifier generated by
the proposed algorithm together with the performance of
certain classifiers studied in [11]. These are the
"k-nearest neighbor" (kNN) classifier [12], which gives
the best approximation of the Bayes recognition error,
and the Multi-Layer Perceptron (MLP) classifier, very
widespread in connectionist pattern recognition [13]. The
Learning Vector Quantization (LVQ) classifier proposed
by Kohonen is a simple adaptive method of vector
quantization. For other types of neural classifiers, see
[11] and the references therein. The RBFE classifier
is the one proposed in [9], which uses the Euclidean
distance.
For each classifier, we calculate the average recognition
error (in %) on the test set over 5 different
experiments, using the "hold out" method for
counting the classification errors. The experimental
protocol, which follows that used in the ELENA project,
consists in training the classifier on half of the data and
then testing its performance on the other half.
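The hold-out protocol described above can be sketched as follows. How the two halves are drawn in each of the 5 experiments is not specified in the text, so the random shuffling (and the fixed seed) is an assumption of this example. `train_and_predict` is a hypothetical callable standing for any of the classifiers.

```python
import random

def holdout_error(samples, labels, train_and_predict, n_runs=5, seed=0):
    """Average test error (in %) over n_runs half/half hold-out experiments.

    train_and_predict(train_x, train_y, test_x) must return one predicted
    label per element of test_x (hypothetical interface).
    """
    rng = random.Random(seed)
    n = len(samples)
    errors = []
    for _ in range(n_runs):
        idx = list(range(n))
        rng.shuffle(idx)                       # assumed: a fresh random split per run
        train, test = idx[: n // 2], idx[n // 2:]
        predicted = train_and_predict(
            [samples[i] for i in train],
            [labels[i] for i in train],
            [samples[i] for i in test],
        )
        wrong = sum(p != labels[i] for p, i in zip(predicted, test))
        errors.append(100.0 * wrong / len(test))
    return sum(errors) / len(errors)
```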
The first database is created artificially to highlight
certain properties or weaknesses of the tested classifiers.
The objective of the "Clouds" problem is to study the
influence of two interlaced classes with nonlinear
boundaries. The last three databases come from real
problems. The "Phoneme" problem relates to the speech
recognition studied in the European ESPRIT project
"ROARS" [14]. The principal difficulty of this problem is
the strong imbalance in the number of instances of each
class. We will not describe the "Iris" data, which is very
well known in pattern recognition [15]. Finally, the data
of the "Texture" file relates to the recognition of 11
natural micro-textures such as grass, sand, paper or
certain textiles [16]. Various information concerning the
statistics and the principal component analysis of
these data files can be found in [10] and the
references therein.
Figure 2 presents the results on these various problems.
The performance of the RBF classifier is slightly
lower than that of the other classifiers for the first problem.
This is explained by the significant interlacing of the two
classes: the algorithm generates a number of neurons
close to the number of training points, and the
generalization capacity on the test set is thus very poor.
The error rate of our RBF classifier is generally the lowest
for each of the last three problems. This holds
whatever the number of classes to be distinguished and
the quantity of data available for training.
Fig. 2. Classification results on four different bases.
3.2 Study according to the neurons number
The purpose of this study is to show the excellent ratio of
performance to number of neurons of our new classifier.
Since the number of neurons of the MLP network is not
available, this study is limited to the RBF networks of
the preceding section.
Table 1 presents a comparison between the old RBF and
the new one. The table shows the "compact" quality of
our new classifier, which gives comparable or even lower
error rates while minimizing the number of hidden
neurons $N_h$. Training times are consequently much shorter.
For the "Texture" database, for example, the error rate is
divided by 4 while the number of neurons is divided by
39. Less than two minutes were needed to train our
classifier, against more than one hour for the old one.
Training times are given here for an execution of the
algorithm under MATLAB on a PC with an AMD Athlon
XP 1800+.
Database   Classifier        Error (%)   N_h   Learning time (s)
Clouds     Old classifier    13.60       162   60
           This classifier   13.25       72    30
Phoneme    Old classifier    10.90       227   100
           This classifier   10.43       59    24
Iris       Old classifier    2.90        24    0.5
           This classifier   1.94        3     0.1
Texture    Old classifier    1.73        858   3900
           This classifier   0.41        22    100
Table 1. Comparison of the performance, number of neurons
and training times of the two RBF classifiers.
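The reduction factors quoted in the text can be recomputed directly from the values of Table 1. The short script below only divides the "old" figures by the "new" ones; the dictionary layout is chosen here for illustration.

```python
# (error %, N_h, learning time in s) copied from Table 1
table1 = {
    "Clouds":  {"old": (13.60, 162, 60.0),   "new": (13.25, 72, 30.0)},
    "Phoneme": {"old": (10.90, 227, 100.0),  "new": (10.43, 59, 24.0)},
    "Iris":    {"old": (2.90, 24, 0.5),      "new": (1.94, 3, 0.1)},
    "Texture": {"old": (1.73, 858, 3900.0),  "new": (0.41, 22, 100.0)},
}

ratios = {}
for name, rows in table1.items():
    (err_old, nh_old, t_old), (err_new, nh_new, t_new) = rows["old"], rows["new"]
    ratios[name] = {
        "error": err_old / err_new,      # how many times smaller the error is
        "neurons": nh_old / nh_new,      # how many times fewer hidden neurons
        "time": t_old / t_new,           # how many times faster the training
    }

for name, r in ratios.items():
    print(f"{name}: error /{r['error']:.2f}, neurons /{r['neurons']:.2f}, time /{r['time']:.2f}")
```

For "Texture" this gives an error ratio of about 4.2 and a neuron ratio of exactly 39, in line with the figures quoted above.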
3.3 The choice of the covariance matrix
One of the limitations of this classifier is the estimation
of the covariance matrix. The larger the clusters, the
better the estimate of this matrix; when clusters are
small, a poorly estimated matrix can therefore reduce the
recognition rates. To remedy this problem, other ways of
computing this matrix can be proposed that take more
prototypes into account in the estimate. Table 2
gives examples of such computations, with the
corresponding errors and numbers of neurons. The table
shows that choosing a covariance matrix different from
that proposed in section 2 can increase or decrease the
error rate, but the number of hidden neurons can only
increase. This number nevertheless always remains far
lower than the number of neurons required by the old RBF.
Covariance matrix      Error (%)   N_h
Σ = cov(C)     (a)     0.14        247
Σ = cov(J_i)   (b)     0.27        38
Σ = cov(JC_i)  (c)     0.41        22
Σ = cov(J)     (d)     0.32        283
Table 2. Error rate on the Texture database according to the
choice of the covariance matrix: (a) covariance of the
centers, (b) covariance of the data of each class, (c)
covariance of the data of each center, (d) covariance of
the total database.
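One possible reading of the four choices of Table 2 is sketched below. The exact sets of points used for each option are only described by the table caption, so the selection rules (in particular, taking all centers together for option (a)) are assumptions of this example, as are the function name and arguments.

```python
import numpy as np

def covariance_choice(option, X, labels, cluster_ids, centers, j, i):
    """Covariance matrix for cluster i of class j under options (a) to (d).

    X:           (N, n) training prototypes
    labels:      (N,)   class of each prototype
    cluster_ids: (N,)   index of the cluster each prototype is attached to
                        within its class (hypothetical bookkeeping)
    centers:     (K, n) all current centers
    """
    if option == "a":       # covariance of the centers (assumed: all of them)
        data = centers
    elif option == "b":     # covariance of the data of class j
        data = X[labels == j]
    elif option == "c":     # covariance of the data attached to center i of class j
        data = X[(labels == j) & (cluster_ids == i)]
    elif option == "d":     # covariance of the whole database
        data = X
    else:
        raise ValueError(f"unknown option: {option!r}")
    return np.atleast_2d(np.cov(data, rowvar=False))
```

Options (a), (b) and (d) pool more points than option (c), which is the sense in which they "take more prototypes into account".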
3.4 Application in code identification
The goal of our application is to detect and reliably
identify different buried metallic codes with a smart eddy
current sensor. Based on the principle of the induction
balance, our detector measures the modifications of the
magnetic field emitted by a coil. These modifications are
due to the presence of the metal codes buried on top
of the drains. A code is built from a succession of
different metal pieces separated by empty spaces. Thus
the identification of the codes allows the identification
and the localization of the pipes (water, gas, …) [18].
Several hardware improvements were made to our
detector [21], but the identification of the codes still
poses problems because of the similarity between the
codes, the nonlinearity of the response as a function of
depth, and the choice of a suitable coding of the signals
[22]. To solve these problems, various classification
methods were proposed. The first methods were
based on fuzzy logic theory, the Kohonen SOM,
and an RBF classifier. The methods based on fuzzy
logic theory are the well-known Fuzzy Pattern Matching
(FPM) [19] and the distributed rules (DR) [20]
developed among others by Ishibuchi. Among all these
methods, the RBFE (Euclidean RBF) classifier gives
the best results. These results nevertheless remain
insufficient at great depths. This is why we
developed this new classifier, to try to reduce the error
rates and the number of neurons with a view to a future
integration of the classifier on programmable microchips.
A comparison is made between these different methods
and the new RBF classifier.
            This classifier   RBFE   SOM    FPM   DR
Error (%)   5.0               6.2    11.3   8.3   7.1
N_h         68                135    -      -     -
Table 3. Code misclassification results for the 5
pattern recognition methods implemented.
For a burying depth of up to 80 cm, we obtain the results
given in table 3. The new RBF classifier gives better
results than the others, and with fewer hidden neurons.
4 Conclusion
We proposed a marked improvement in the performance of
a neural classifier based on an RBF network. The new
classifier is very general and simple. It automatically
generates a powerful RBF network without any
parameters fixed a priori.
The number of hidden neurons is greatly reduced, which
will allow its use on very large databases. Indeed,
the new classifier obtains excellent recognition results
on a variety of databases.
References:
[1] D.S. Broomhead and D. Lowe, Multivariable
functional interpolation and adaptive networks,
Complex Systems, Vol.2, 1988, pp. 321-355.
[2] J. Moody and C.J. Darken, Fast Learning in
Networks of Locally-Tuned Processing Units, Neural
Computation, Vol.1, 1989, pp. 281-294.
[3] F. Girosi and T. Poggio, Networks and The Best
Approximation Property, Technical Report C.B.I.P.
No. 45, Artificial Intelligence Laboratory,
Massachusetts Institute of Technology, 1989.
[4] Park J. and Sandberg I.W., Universal Approximation
Using Radial-Basis-Function Networks, Neural
Computation, Vol.3, 1991, pp. 246-257.
[5] M. Bianchini, P. Frasconi and M. Gori, Learning
without Local Minima in Radial Basis Function
Networks, IEEE Transactions on Neural Networks,
Vol.6:3, 1995, pp. 749-756.
[6] B. Fritzke, Supervised Learning with Growing Cell
Structures, In Advances in Neural Information
Processing Systems 6, J.C. Cowan, Tesauro G. and
Alspector J. (eds.), Morgan Kaufmann, San Mateo, CA.,
1994.
[7] B. Fritzke, Transforming Hard Problems into
Linearly Separable Ones with Incremental Radial
Basis Function Networks, In M.J. Van Der Heyden,
J. Mrsic-Flögel and K. Weigel (eds.), HELNET
International Workshop on Neural Networks,
Proceedings Volume I/II 1994/1995, VU University
Press, 1996.
[8] C.G. Looney, Pattern Recognition Using Neural
Network - Theory and Algorithms for Engineers and
Scientists, Oxford University Press, Oxford - New
York, 1997.
[9] F. Belloir, A. Fache and A. Billat, A General
Approach to Construct RBF Net-Based
Classifier, Proc. of the European Symposium on
Artificial Neural Networks (ESANN’99), April 21-23,
Bruges Belgium, 1999, pp. 399-404.
[10] C. Aviles-Cruz, A. Guerin-Dugué, J.L. Voz and
D. Van Cappel, Deliverable R3-B1-P Task B1:
Databases, Technical Report ELENA ESPRIT Basic
Research Project Number 6891, June 1995.
[11] F. Blayo, Y. Cheneval, A. Guerin-Dugué,
R. Chentouf, C.Aviles-Cruz, J. Madrenas,
M. Moreno and J.L. Voz, Deliverable R3-B4-P Task
B4: Benchmarks, Technical Report ELENA ESPRIT
Basic Research Project Number 6891, June 1995.
[12] R. Duda and P. Hart, Pattern Recognition and
Scene Analysis, J. Wiley & sons Edition, 1973.
[13] D.E. Rumelhart and J.L. McClelland, Parallel
Distributed Processing: Explorations in the
Microstructure of Cognition, MIT Press, 1986.
[14] P. Alinat, Periodic Progress Report 4, Technical
Report, ROARS Project ESPRIT II-Number 5516,
Thomson Report TS. ASM 93/S/EGS/NC/079,
February 1993.
[15] G.W. Gates, The Reduced Nearest Neighbor Rule,
IEEE Trans. on Information Theory, Vol. May, 1972,
pp. 431-433.
[16] A. Guerin-Dugué and C. Aviles-Cruz, High Order
Statistics from Natural Textured Images, ATHOS
Workshop on System Identification and High Order
Statistics, Sophia-Antipolis, France, September 1993.
[17] S.P. Lloyd, Least Squares Quantization in PCM,
IEEE Transactions on Information Theory, Vol. IT-28:2,
1982, pp. 129-137.
[18] F. Belloir, F. Klein and A. Billat, Pattern
Recognition Methods for Identification of Metallic
Codes Detected by Eddy Current Sensor, Signal and
Image Processing (SIP'97), Proceedings of the
IASTED International Conference, 1997, pp. 293-297.
[19] M. Grabisch and Sugeno, A Comparison of some
Methods of Fuzzy Classification on Real Data, Proc.
of IIZUKA'92, Iizuka, Japan, July 1992, pp. 659-662.
[20] Ishibuchi H., Nosaki K. and Tanaka H., Selecting
Fuzzy If-Then Rules for Classification Problems
Using Genetic Algorithms, IEEE Transactions on
Fuzzy Systems, Vol.3, N°3, 1995.
[21] L. Beheim, A. Zitouni, F. Belloir, Problem of
Optimal Pertinent Parameter Selection in Buried
Conductive Tag Recognition, Proceedings of
WISP’2003, IEEE International Symposium on
Intelligent Signal Processing, Budapest (Hungary),
4-6 September 2003, pp. 87-91.
[22] F. Belloir, L. Beheim, A. Zitouni, N. Liebaux, D.
Placko, Modélisation et Optimisation d'un Capteur à
Courants de Foucault pour l'Identification d'Ouvrages
Enfouis, 3e Colloque Interdisciplinaire en
Instrumentation (C2I’2004), Cachan (France), 29-30
janvier 2004.