Fusion of multiple approximate nearest neighbor classifiers for fast and efficient classification

P. Viswanath*, M. Narasimha Murty, Shalabh Bhatnagar*

Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India

Received 8 May 2003; received in revised form 25 February 2004; accepted 25 February 2004
Available online 20 March 2004
Abstract
The nearest neighbor classifier (NNC) is a popular non-parametric classifier. It is a simple classifier with no design phase and shows good performance. Important factors affecting the efficiency and performance of the NNC are (i) the memory required to store the training set, (ii) the classification time required to search for the nearest neighbor of a given test pattern, and (iii) the curse of dimensionality, due to which the classifier becomes severely biased when the dimensionality of the data is high and only finite samples are available. In this paper we propose (i) a novel pattern synthesis technique that increases the density of patterns in the input feature space and can thereby reduce the curse of dimensionality effect, (ii) a compact representation of the training set that reduces the memory requirement, (iii) a weak approximate nearest neighbor classifier with constant classification time, and (iv) an ensemble of the approximate nearest neighbor classifiers in which the individual classifiers' decisions are combined by majority vote. The ensemble has a constant upper bound on classification time and, according to empirical results, shows good classification accuracy. An empirical comparison between our approaches and other related classifiers is also presented.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Multi-classifier fusion; Ensemble of classifiers; Nearest neighbor classifier; Pattern synthesis; Approximate nearest neighbor classifier;
Compact representation
1. Introduction
The nearest neighbor classifier (NNC) is a very popular non-parametric classifier [1,2]. It is widely used because of its simplicity and good performance. It has no design phase but simply stores the training set. The test pattern is assigned to the class of its nearest neighbor in the training set. So the classification time required by the NNC is largely due to reading the entire training set to find the nearest neighbor(s).¹ Thus two major shortcomings of the classifier are that (i) the entire training set needs to be stored and (ii) the entire training set needs to be searched. In addition, when the dimensionality of the data is high, the classifier becomes severely biased with a finite training set due to the curse of dimensionality [2].
Cover and Hart [3] show that the error of the NNC is bounded by twice the Bayes error when the available sample size is infinite. However, in practice, one can never have an infinite number of training samples. With a fixed number of training samples, the error of the NNC tends to increase as the dimensionality of the data gets large. This is called the peaking phenomenon [4,5]. Jain and Chandrasekharan [6] point out that the number of training samples per class should be about 5–10 times the dimensionality of the data. The peaking phenomenon with the NNC is known to be more severe than with parametric classifiers such as Fisher's linear and quadratic classifiers [7,8]. Thus, it is widely believed that the training set size needed to achieve a given classification accuracy would be prohibitively large when the dimensionality of the data is high.
Increasing the training set size has two problems.
These are: (i) space and time requirements get increased
* Corresponding authors. Tel.: +91803942368; fax: +910803600683.
E-mail addresses: viswanath@csa.iisc.ernet.in (P. Viswanath), mnm@csa.iisc.ernet.in (M. Narasimha Murty), shalabh@csa.iisc.ernet.in (S. Bhatnagar).
¹ We assume that the training set is not preprocessed (e.g., indexed) to reduce the time needed to find the neighbor.
1566-2535/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.inffus.2004.02.003
Information Fusion 5 (2004) 239–250
www.elsevier.com/locate/inffus
and (ii) it may be expensive to get training patterns from the real world. The space requirement problem can be solved to some extent by using a compact representation of the training data like the PC-tree [9], FP-tree [10], CF-tree [11], etc., or by using editing techniques [1] which reduce the training set size without affecting the performance. The classification time requirement problem can be solved by building an index over the training set, like the R-tree [12], while the curse of dimensionality problem can be solved by using re-sampling techniques like bootstrapping [13], which is widely studied [14–18]. These remedies are orthogonal, i.e., they have to be applied one after the other (they cannot be combined into a single step). This paper, however, attempts to give a unified solution.
In this paper, we propose a novel bootstrap technique for NNC design, which we call partition based pattern synthesis, that reduces the curse of dimensionality effect. The artificial training set generated by this method can be exponentially larger than the original set.² As a result, the synthetic patterns cannot be explicitly stored. We propose a compact data structure called the partitioned pattern count tree (PPC-tree), which is a compact representation of the original set and is suitable for performing the synthesis. The classification time requirement problem is solved as follows. Finding an approximate nearest neighbor (NN) is computationally less demanding since it avoids an exhaustive search of the training set. We propose an approximate NN classifier called PPC-aNNC whose classification time is independent of the training set size. PPC-aNNC works directly with the PPC-tree, which does implicit pattern synthesis, and finds an approximate nearest neighbor of the given test pattern from the entire synthetic set. Thus an explicit bootstrap step to generate the artificial training set is avoided. However, PPC-aNNC is a weak classifier, having a lower classification accuracy (CA) than NNC. Classification decision fusion of multiple PPC-aNNC's is empirically shown to achieve better CA than the conventional NNC. This ensemble of PPC-aNNC's is based on a simple majority voting technique and is suitable for parallel implementation. The proposed ensemble is a faster and better classifier than NNC and some of the classifiers of its kind. PPC-tree and PPC-aNNC assume discrete valued features. For other domains, the data sets need to be discretized appropriately.
Some of the earlier attempts at combining nearest neighbor (NN) classifiers are as follows. Breiman [19] experimentally demonstrated that combining NN classifiers does not improve performance as compared to a single NN classifier. He attributed this behavior to the characteristic of NN classifiers that the addition or removal of a small number of training instances does not change NN classification boundaries significantly. Decision trees and neural networks, he argued, are in this sense less stable than NN classifiers. In his experiments, the component NN classifiers stored a large number of prototypes, so the combination is computationally less efficient as well. Skalak [20] used a few selected prototypes for each component NN classifier and showed that the composite classifier outperforms the conventional NNC. Alpaydin [21] used multiple condensed sets generated by accessing the training set in various random orders. Each individual NNC works with a condensed training set, and the final decision is made by a majority vote (either simple or weighted) over the individual classifiers. Experimental results show that this improves performance. Kubat and Chen [22] propose an ensemble of several NNCs such that each independent classifier considers only one of the available features. Class assignment to new patterns is done through weighted majority voting over the individual classifiers. This does not work well in domains where the mutual inter-correlation between pairs of attributes is high. Bay [23] combined multiple NN classifiers where each component uses only a random subset of features. Experimentally, this is also shown to improve performance in most cases.
Hamamoto et al. [18] proposed a bootstrap technique for NNC design which is experimentally shown to perform well. In their approach, each training pattern is replaced by a weighted average (which is the centroid if the weights are equal) of its r nearest neighbors in the training set.
We present experimental results in this paper with six different data sets (having both discrete and continuous valued features), and a comparison is drawn between our approaches and (i) NNC, (ii) k-NNC, (iii) the Naive Bayes classifier, (iv) NNC based on the bootstrap technique of Hamamoto et al. [18], (v) voting over multiple condensed nearest neighbors [21] and (vi) the weighted nearest neighbor with feature projection [22].
This paper is organized as follows: partition based pattern synthesis is described in Section 2, compact data structures in Section 3, PPC-aNNC in Section 4.2, the ensemble of PPC-aNNC's in Section 4.3, experimental results in Section 5, and conclusions in Section 6.
2. Partition based pattern synthesis
We use the following notation and definitions to de-
scribe partition based pattern synthesis and various
other concepts throughout this paper.
2.1. Notation and definitions
Set of features:
F¼ff1;f2;...;fdgis the set of features. Feature fi
can get its value from domain Di(1 6i6d).
² By original set we mean the given training set.
Pattern:
X = (x1, x2, ..., xd)^T is a pattern in d-dimensional vector format.
X[fi] is the feature-value of pattern X for feature fi. X[fi] ∈ Di (1 ≤ i ≤ d).
Thus, X[fi] = xi for pattern X.
Set of class labels:
Ω = {1, 2, ..., c} is the set of class labels. Each training pattern has a class label.
Set of training patterns:
𝒳 is the set of all training patterns. 𝒳l is the set of training patterns for the class with label l. 𝒳 = 𝒳1 ∪ 𝒳2 ∪ ... ∪ 𝒳c.
Partition:
πl = {B1, B2, ..., Bp} is a partition of F for the class with label l, i.e., Bi ⊆ F for all i, ∪i Bi = F, and Bi ∩ Bj = ∅ if i ≠ j.
The set of partitions is P = {πl | 1 ≤ l ≤ c}.
Sub-pattern:
A pattern for which zero or more feature-values are absent (missing or unknown) is called a sub-pattern. An absent feature-value is represented by H. Thus, if Y is a sub-pattern, then Y[fi] ∈ Di ∪ {H}, 1 ≤ i ≤ d.
Scheme of a sub-pattern:
A sub-pattern Y is said to be of scheme S, where S ⊆ F, if for 1 ≤ i ≤ d,
Y[fi] ∈ Di, if fi ∈ S;
Y[fi] = H, otherwise.
Sub-pattern of a pattern:
X^S is said to be the sub-pattern of pattern X with scheme S, provided
X^S[fi] = X[fi], if fi ∈ S;
X^S[fi] = H, otherwise.
Set of sub-patterns:
A collection of sub-patterns with all members having the same scheme. A collection of sub-patterns of different schemes is not a set of sub-patterns.
We further define the set of sub-patterns for a set of patterns with respect to a scheme as follows. If 𝒲 is a set of patterns, then 𝒲^S is called the set of sub-patterns of 𝒲 with respect to scheme S, with 𝒲^S = {W^S | W ∈ 𝒲}.
Merge operation (⋈):
If P, Q are two sub-patterns of schemes Si, Sj respectively, then the merge of P and Q, written P ⋈ Q, is a sub-pattern of scheme Si ∪ Sj and is defined only if Si ∩ Sj = ∅.
If R = P ⋈ Q, then for 1 ≤ k ≤ d,
R[fk] = P[fk], if fk ∈ Si;
R[fk] = Q[fk], if fk ∈ Sj;
R[fk] = H, otherwise.
Join operation (⋈):
If Ym, Yn are sets of sub-patterns of schemes Si, Sj respectively, then the join of Ym and Yn, written Ym ⋈ Yn, is defined only if Si ∩ Sj = ∅, and Ym ⋈ Yn = {R | R = P ⋈ Q, P ∈ Ym, Q ∈ Yn}.
The join operation is commutative and associative.³ So, Ym ⋈ (Yn ⋈ Yo) = (Ym ⋈ Yn) ⋈ Yo, and is written as Ym ⋈ Yn ⋈ Yo.
2.2. Synthetic pattern generation

The method of synthetic pattern generation is as follows.

(1) Choose an appropriate set of partitions P = {πl | 1 ≤ l ≤ c}, where πl = {B1, B2, ..., Bp} is a partition of F for the class with label l.
(2) Replace the set of training patterns for the class with label l, that is 𝒳l, by its synthetic counterpart SP(𝒳l), where SP(𝒳l) = 𝒳l^{B1} ⋈ 𝒳l^{B2} ⋈ ... ⋈ 𝒳l^{Bp}.
(3) Repeat step 2 for each label l ∈ Ω.

Note 1. The partition can be different for each class. However, we assume |πl| = p, a constant, for all l ∈ Ω. This simplifies the analysis of the classification methods and the cross-validation method described in subsequent sections.

Note 2. If each pattern is seen as an ordered tuple, then 𝒳l ⊆ SP(𝒳l) ⊆ D1 × D2 × ... × Dd.
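As a sketch, the synthesis above can be written directly as a Cartesian-product join of per-block sub-pattern sets. The following minimal Python illustration is ours, not the paper's: blocks are given as lists of feature indices, and None stands for the absent value H; it assumes the blocks are disjoint, as the partition definition requires.

```python
from itertools import product

def sub_pattern(x, block, d):
    # Keep the feature values inside the block; mark the rest absent (None = H).
    return tuple(x[i] if i in block else None for i in range(d))

def merge(p, q):
    # Merge two sub-patterns of disjoint schemes: take whichever value is present.
    return tuple(a if a is not None else b for a, b in zip(p, q))

def synthesize(X, partition):
    """Join the per-block sub-pattern sets of X into the synthetic set SP(X)."""
    d = len(X[0])
    # One set of sub-patterns per block (duplicates collapse, as in the PC-tree).
    block_sets = [{sub_pattern(x, set(b), d) for x in X} for b in partition]
    # Join = every merge of one sub-pattern chosen from each block's set.
    synthetic = set()
    for combo in product(*block_sets):
        r = combo[0]
        for q in combo[1:]:
            r = merge(r, q)
        synthetic.add(r)
    return synthetic
```

On the two-pattern example of Section 2.3 (partition {{f1, f3}, {f2, f4}}), this produces the four synthetic patterns listed there.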
2.3. Example

This example illustrates the concept of synthetic pattern generation. Let F = {f1, f2, f3, f4}, D1 = {red, green, blue}, D2 = {2, 3, 4, 5}, D3 = {small, big} and D4 = {1.75, 2.04}, respectively.
Let 𝒳l = {(red, 3, big, 1.75)^T, (green, 2, small, 1.75)^T} be the training set for the class with label l. Also, let the partition for this class be πl = {B1, B2}, where B1 = {f1, f3} and B2 = {f2, f4}. Then 𝒳l^{B1} = {(red, H, big, H)^T, (green, H, small, H)^T} and 𝒳l^{B2} = {(H, 3, H, 1.75)^T, (H, 2, H, 1.75)^T}, respectively.
The set of synthetic patterns for class l, i.e. SP(𝒳l), is
SP(𝒳l) = 𝒳l^{B1} ⋈ 𝒳l^{B2}
       = {(red, 3, big, 1.75)^T, (red, 2, big, 1.75)^T, (green, 3, small, 1.75)^T, (green, 2, small, 1.75)^T}.
³ This can be directly proved from the definitions of the join and merge operations.
2.4. A partitioning method

An appropriate partition needs to be chosen for the given classification problem. We present a simple heuristic method to find a partition. The method is based on the pair-wise correlation between features and is therefore suitable only for domains having numerical feature values. Domain knowledge can also be used to obtain an appropriate partition. The synthesis method itself can work with any domain, provided a partition is given.
The partitioning method is given below. Its objective is to find a partition such that the average correlation between features within a block is high and that between features of different blocks is low. Since meeting this objective exactly is computationally demanding, we give a greedy method which finds only a locally optimal partition.
Find-Partition()
{
Input: (i) Set of features, F = {f1, ..., fd}.
       (ii) Pair-wise correlations between features, C = {c[fi][fj] = correlation between fi and fj | 1 ≤ i, j ≤ d}.
       (iii) p = number of blocks required in the partition, such that p ≤ d.
Output: Partition, π = {B1, B2, ..., Bp}.

(1) Mark all features in F as unused.
(2) Find c[f'1][f'2], the minimum element in C such that f'1 ≠ f'2.
(3) B1 = {f'1}, B2 = {f'2}.
(4) Mark f'1, f'2 as used.
(5) For i = 3 to p
    {
    (i) Choose an unmarked feature f'i such that (c[f'i][f'1] + ... + c[f'i][f'(i-1)])/(i-1) is minimum, where f'1, ..., f'(i-1) are the marked (used) features.
    (ii) Bi = {f'i}.
    (iii) Mark f'i as used.
    }
(6) For each unmarked feature f'
    {
    (i) For i = 1 to p: Ti = (Σ_{j=1}^{|Bi|} c[f'][f'j]) / |Bi|, where f'1, ..., f'|Bi| are the features in Bi.
    (ii) Find the maximum element of {T1, T2, ..., Tp}. Let it be Tk.
    (iii) Bk = Bk ∪ {f'}.
    (iv) Mark f' as used.
    }
(7) Output the partition, π = {B1, ..., Bp}.
}
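The steps above can be sketched as follows. This is an illustrative Python version, not the paper's code: features are represented by their indices, C is assumed to be a symmetric matrix of pair-wise correlations, and 2 ≤ p ≤ d is assumed (for p = 1 the partition is trivially {F}).

```python
def find_partition(C, p):
    """Greedy heuristic: seed p blocks with mutually least-correlated features,
    then attach each remaining feature to its most-correlated block."""
    d = len(C)
    # Steps 2-4: seed the first two blocks with the least-correlated pair.
    _, f1, f2 = min((C[i][j], i, j) for i in range(d) for j in range(i + 1, d))
    blocks = [[f1], [f2]]
    used = {f1, f2}
    # Step 5: seed the remaining blocks with features least correlated,
    # on average, with the seeds chosen so far.
    while len(blocks) < p:
        seeds = [b[0] for b in blocks]
        f = min((f for f in range(d) if f not in used),
                key=lambda f: sum(C[f][s] for s in seeds) / len(seeds))
        blocks.append([f])
        used.add(f)
    # Step 6: attach every remaining feature to the block with which its
    # average correlation is highest.
    for f in range(d):
        if f not in used:
            k = max(range(p),
                    key=lambda k: sum(C[f][g] for g in blocks[k]) / len(blocks[k]))
            blocks[k].append(f)
            used.add(f)
    return blocks
```

For a toy 4-feature problem where features 0,1 and 2,3 form two highly correlated pairs, the heuristic recovers the partition {{0, 1}, {2, 3}}.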
The above method is applied to each class of training patterns separately. Experiments (Section 5) are done with the number of blocks (i.e., p) set to 1, 2, 3 and d, respectively, where d is the total number of features.
3. The data structures
Partition based pattern synthesis can generate a synthetic set of size O(n^p), where n is the original set size and p is the number of blocks in the partition. Hence explicitly storing the synthetic set consumes a lot of space. In this section we present a compact representation of the original set which is suitable for the synthesis. For large data sets, this representation requires less storage space than the original set. This representation is called the partitioned pattern count tree (PPC-tree).
The partitioned pattern count tree (PPC-tree) is a generalization of the pattern count tree (PC-tree). For the sake of completeness, we first give a brief overview of the PC-tree; details can be found in [9]. These data structures are suitable when each feature takes discrete values (which can also be categorical). For continuous valued features, an appropriate discretization needs to be done first. Later, we present a simple discretization process which is used in our experimental studies.
3.1. PC-tree
The PC-tree is a complete and compact representation of the training patterns which belong to a class. An order is imposed on the set of features F = {f1, ..., fd}, where fi denotes the ith feature. Patterns belonging to a class are stored in a tree structure (the PC-tree), where each feature-value occupies a node. Every training pattern is present in a path from the root to a leaf. Two patterns X, Y of a class can share a common node for their respective nth feature-values if X[fi] = Y[fi] for 1 ≤ i ≤ n. Along with a feature-value, each node stores a count indicating how many patterns share that node. A compact representation of the training set is obtained because many patterns share common nodes in the tree. The given training set is represented as the set {T1, T2, ..., Tc}, where each element Ti is the PC-tree for the class of training patterns with label i.
3.1.1. Example

Let {(a, b, c, x, y, z)^T, (a, b, d, x, y, z)^T, (a, e, c, x, y, u)^T, (f, b, c, x, y, v)^T} be the original training set for a class with label i. Then the corresponding PC-tree Ti (the same symbol is used for the tree and the root node of the tree) for this training set is shown in Fig. 1. Each node of the tree is of the format (feature-value : count).
3.2. PPC-tree

Let 𝒳i be the set of original patterns which belong to the class with label i. Let πi = {B1, B2, ..., Bp} be a partition of the feature set F, where each block Bj = {fj1, ..., fj|Bj|} (for 1 ≤ j ≤ p) is an ordered set whose nth feature is fjn. Then the PPC-tree for 𝒳i with respect to πi is Ti = {Ti1, ..., Tip}, a set of PC-trees such that Tij is the PC-tree for the set of sub-patterns 𝒳i^{Bj} (1 ≤ j ≤ p) in which the H-valued features (see Section 2.1) are ignored. Each PC-tree Tij corresponds to a class (with label i) and to a block (Bj ∈ πi) of the partition of that class. The given training set is represented as the set {T1, T2, ..., Tc}, where each element Ti is the PPC-tree for the class of training patterns with label i, and Ti = {Ti1, ..., Tip} is a set of PC-trees.
A path from the root to a leaf of the PC-tree Tij (excluding the root node) corresponds to a unique sub-pattern with scheme Bj ∈ πi. If (x1, x2, ..., x|Bj|) is a path in Tij, then the corresponding sub-pattern is P such that P[fj1] = x1, P[fj2] = x2, ..., P[fj|Bj|] = x|Bj|, and P[f] = H for the remaining features f ∈ F − Bj. If Qj is the sub-pattern corresponding to a path in Tij for 1 ≤ j ≤ p, then Q = Q1 ⋈ Q2 ⋈ ... ⋈ Qp is a synthetic pattern in the class with label i. Algorithms 1 and 2 give the construction procedures.
Algorithm 1 (Build-PPC-trees())

{Input: (i) The original training set.
        (ii) A partition for each class, i.e., π1, π2, ..., πc.
Output: The set of PPC-trees, T = {T1, ..., Tc}, where Ti = {Ti1, ..., Tip} for 1 ≤ i ≤ c. Tij is the PC-tree for the class with label i and block Bj ∈ πi.
Assumptions: (i) The number of blocks in each πi, 1 ≤ i ≤ c, is the same and is equal to p.
             (ii) Each Tij is empty (i.e., has only a root node) to start with.}

for i = 1 to c do
  for each training pattern X ∈ 𝒳i do
    for j = 1 to p do
      Add-Pattern(Tij, X)
    end for
  end for
end for
3.2.1. Example

For the example considered in Section 3.1.1, the PPC-tree is shown in Fig. 2, where the partition is πi = {B1, B2} such that B1 = {f1, f2, f3} and B2 = {f4, f5, f6}, respectively. The ordering of features considered for each block is the same as that in Example 3.1.1. Thus the PPC-tree is the set of PC-trees {Ti1, Ti2}. Ti1 is the PC-tree for the set of sub-patterns 𝒳i^{B1} = {(a, b, c, H, H, H)^T, (a, b, d, H, H, H)^T, (a, e, c, H, H, H)^T, (f, b, c, H, H, H)^T}, where the H-valued features are ignored. Similarly, Ti2 is the PC-tree for the set of sub-patterns 𝒳i^{B2}; see Fig. 2.
Note that the PPC-tree is a more compact representation than the corresponding PC-tree. From the examples, it can be seen that the number of nodes in the PPC-tree is 16, but that in the PC-tree is 22. A path from root to leaf of Ti1 represents a sub-pattern with scheme B1, and a path in Ti2 represents a sub-pattern with scheme B2. Merging the two sub-patterns gives a synthetic pattern according to the partition. Further,
Algorithm 2 (Add-Pattern(PC-tree Tij, Pattern X))

X' = X^{Bj}, such that Bj ∈ πi {X' is the sub-pattern of X with scheme Bj ∈ πi.}
Node current-node = Tij.root
for j = 1 to d do
  {d is the dimensionality of X}
  if (X'[fj] ≠ H) then
    L = list of child nodes of current-node
    if (L is empty) then
      Node new-node = create a new node
      new-node.feature-value = X'[fj]
      new-node.count = 1
      Make new-node a child of current-node
      current-node = new-node
    else
      if (a node v ∈ L exists such that v.feature-value = X'[fj]) then
        v.count = v.count + 1
        current-node = v
      else
        Node new-node = create a new node
        new-node.feature-value = X'[fj]
        new-node.count = 1
        Make new-node a child of current-node
        current-node = new-node
      end if
    end if
  end if
end for
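Algorithm 2 can be sketched in Python as follows; the Node layout and helper names are ours, not the paper's. Running it on the example of Section 3.1.1 reproduces the node counts quoted in Section 3.2.1 (21 nodes excluding the root, i.e., 22 counting the root; splitting the patterns into the two blocks of Section 3.2.1 gives 14 nodes, i.e., 16 counting the two roots).

```python
class Node:
    def __init__(self, value=None):
        self.value = value    # feature-value stored at this node
        self.count = 0        # number of patterns sharing this node
        self.children = {}    # feature-value -> child Node

def add_pattern(root, sub_pattern):
    # Walk down the tree, reusing a child when the value matches
    # (as in Algorithm 2) and creating one otherwise; bump counts
    # along the path. None stands for the absent value H.
    node = root
    for v in sub_pattern:
        if v is None:         # H-valued features are ignored
            continue
        child = node.children.get(v)
        if child is None:
            child = node.children[v] = Node(v)
        child.count += 1
        node = child

def count_nodes(root):
    # Number of nodes below (not counting) the given root.
    return sum(1 + count_nodes(c) for c in root.children.values())
```

Building one PC-tree over the four 6-feature patterns, versus two PC-trees over their 3-feature sub-patterns, makes the compaction concrete.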
Fig. 1. PC-tree Ti (nodes shown in the format feature-value : count).
Fig. 2. PPC-tree Ti = {Ti1, Ti2}: Ti1 is the PC-tree for block 1 and Ti2 is the PC-tree for block 2.
both the PC-tree and the PPC-tree can be built incrementally by scanning the database of patterns only once, and both are suitable for discrete valued features, which can also be of categorical type.
4. Classification methods with synthetic patterns
We present three classification methods that work with synthetic patterns, viz., NNC(SP), PPC-aNNC, and an ensemble of several PPC-aNNC's.

4.1. NNC(SP)

NNC(SP) is the nearest neighbor classifier with synthetic patterns. Explicit generation of the synthetic set is done first, and then NNC is applied. This method is computationally inefficient, as the space and classification time requirements are both O(n^p), where p is the number of blocks in the partition and n is the original training set size. This method is presented for comparison with the other methods that use the synthetic set. The distance measure used by this method is Euclidean distance.
4.2. PPC-aNNC

PPC-aNNC finds an approximate nearest neighbor of a given test pattern. The distance measure used here is the Manhattan distance (city-block distance). The method is suitable for discrete, numeric valued features only. PPC-aNNC is described in Algorithm 3. Let Q be the given test pattern. The quantity distij is the distance between the sub-pattern Q^{Bj} and its approximate nearest neighbor in the set 𝒳i^{Bj} (the set of sub-patterns of 𝒳i with respect to the scheme Bj ∈ πi), where the H-valued features are ignored. The quantity di = Σ_{j=1}^{p} distij is then the distance between Q and its approximate nearest neighbor in the class with label i.
The method progressively finds a path in each Tij from the root to a leaf. The ordering of features in Q^{Bj} must be the same as that of Bj ∈ πi used to construct the PC-tree Tij. At each node, it finds a child whose feature-value is nearest to the corresponding feature-value in Q^{Bj} (based on the absolute difference between the values) and proceeds to that node. If there is more than one such child, it proceeds to the child with the maximum count value. Let the chosen child node be v and let the value of the corresponding feature in Q^{Bj} be q. Then the distance distij is increased by |v.feature-value − q|.
If Q is present in the original training set, then PPC-aNNC will find it, and in this case the neighbor obtained is the exact nearest neighbor.
4.2.1. Computational requirements of PPC-aNNC

Let the number of discrete values any feature can take be at most l, the dimensionality of each pattern be d, and the number of classes be c. Then the time complexity of PPC-aNNC is O(cld), since it finds only one path in each of the c PPC-trees (one per class), and at any node it searches only the child list (of size O(l)) of that node to find the next node in the path. The path traversed for a class has d nodes in total. For a given problem, c, l and d are constants (i.e., independent of the number of training patterns) that are typically much smaller than the number of training patterns. Thus, the effective time complexity of the method is only O(1). That is, the classification time of PPC-aNNC is constant and independent of the training set size. However, since it avoids an exhaustive search of the PPC-tree, it can find only an approximate nearest neighbor.
Algorithm 3 (PPC-aNNC(Test Pattern Q))

{Assumption (i): The set of PPC-trees {T1, ..., Tc}, where Ti = {Ti1, ..., Tip} for 1 ≤ i ≤ c, is assumed to be already built.
Assumption (ii): πi = {B1, B2, ..., Bp} (1 ≤ i ≤ c) is the partition of the feature set F for the class with label i, and is the same as that used in the construction of the PPC-tree Ti, where each block Bj = {fj1, ..., fj|Bj|} (for 1 ≤ j ≤ p) is an ordered set with the nth feature of block Bj being fjn.}

for each class with label i = 1 to c do
  for each Bj ∈ πi (1 ≤ j ≤ p) do
    Q' = Q^{Bj}
    current-node = Tij.root
    distij = 0
    for l = 1 to |Bj| do
      L = list of child nodes of current-node
      Choose the sublist L' of nodes v ∈ L for which |Q'[fjl] − v.feature-value| is minimum.
      Choose a node v ∈ L' such that v.count is maximum. {Ties are broken arbitrarily}
      distij = distij + |Q'[fjl] − v.feature-value|
      current-node = v
    end for
  end for
end for
for i = 1 to c do
  di = 0
  for j = 1 to p do
    di = di + distij
  end for
end for
Find dx = minimum element in {d1, d2, ..., dc}
Output (class label of Q = x)
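The greedy descent of Algorithm 3 over a single PC-tree can be sketched as follows. This is a minimal illustration assuming numeric feature values; the tree layout (nested dicts mapping a feature value to a (count, subtree) pair) and the function names are ours, not the paper's. A full classifier would run this once per class and per block, sum the per-block distances into di, and output the class with the smallest total.

```python
def build(patterns):
    # Build one PC-tree as nested dicts: value -> (count, subtree).
    root = {}
    for x in patterns:
        node = root
        for v in x:
            cnt, sub = node.get(v, (0, {}))
            node[v] = (cnt + 1, sub)
            node = node[v][1]
    return root

def approx_nn_distance(root, query):
    # Follow one root-to-leaf path: at each level pick the child value
    # closest to the query feature (absolute difference), breaking ties
    # in favor of the larger count, and accumulate the city-block distance.
    dist, node = 0, root
    for q in query:
        v = min(node, key=lambda v: (abs(v - q), -node[v][0]))
        dist += abs(v - q)
        node = node[v][1]
    return dist
```

As in the text, a query that is itself a stored pattern descends along its own path and gets distance 0, i.e., its exact nearest neighbor.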
The space requirement of the method is mostly due to the PPC-tree structures. PPC-trees require O(n) space, where n is the total number of original patterns. For medium to large data sets, empirical studies show that the space requirement is much smaller than that of the conventional vector format (i.e., each pattern represented by a list of feature values). However, for small data sets the space required may increase because of the data structure overhead (the space needed for pointers, etc.).
4.3. The ensemble of PPC-aNNC's

PPC-aNNC is a weak classifier since it finds only an approximate nearest neighbor of the test pattern. Partition based pattern synthesis depends on the partition chosen. The PPC-tree, and hence PPC-aNNC, depends not only on the partition chosen for each class, but also on the ordering of features within each block of the partitions. Thus various orderings of features in each block of the partitions result in various PPC-aNNC's. An ensemble of PPC-aNNC's, where the final decision is made by simple majority voting, is empirically shown to perform well. Let there be r component classifiers in the ensemble. Each component classifier is based on a random ordering of the features in each block.
Intuitively, the functioning of PPC-aNNC can be explained as follows. While finding the approximate nearest neighbor, PPC-aNNC emphasizes the features in a block according to their order: the first feature in a block is emphasized the most and the last feature the least. Notice that if there is only one feature in each block, for the partitions of all classes, then PPC-aNNC finds the exact nearest neighbor in the entire synthetic set (generated according to this partitioning). This is because all features are then emphasized equally (since each block contains a single feature). Since each PPC-aNNC in the ensemble is based on a random ordering of the features, the emphasis each of them places on the features differs considerably from the others. Because of this, the errors made by the individual classifiers are significantly uncorrelated, causing the ensemble to perform well.
The ensemble is suitable for parallel implementation with r machines, where each machine implements a different PPC-aNNC. Communication is needed only to send the test pattern to the individual classifiers and to collect the majority vote, and therefore incurs very little overhead. On the other hand, if the ensemble is implemented on a single machine, then the space and time requirements are r times those of a single PPC-aNNC, which may not be feasible for large data sets.
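A minimal sketch of the two ingredients described above, assuming each component classifier has already produced a label: simple majority voting over the r component decisions, and one random within-block feature ordering per component. All names are ours, and tie handling in the vote (take the first mode) is our choice, not specified by the paper.

```python
import random
from collections import Counter

def majority_vote(labels):
    # Simple majority vote over the r component classifiers' labels;
    # ties are broken by taking the first most-common label.
    return Counter(labels).most_common(1)[0][0]

def random_orderings(partition, r, seed=0):
    # One random within-block feature ordering per component classifier;
    # each ordering would define one PPC-aNNC in the ensemble.
    rng = random.Random(seed)
    return [[rng.sample(block, len(block)) for block in partition]
            for _ in range(r)]
```

For example, with component decisions [1, 2, 1, 3, 1], the ensemble outputs class 1.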
5. Experiments
5.1. Datasets
We performed experiments with six different datasets, viz., OCR, WINE, VOWEL, THYROID, GLASS and PENDIGITS. Except for the OCR dataset, all are from the UCI Repository [24]. The OCR dataset is also used in [9,25], while the WINE, VOWEL, THYROID and GLASS datasets are used in [21]. The properties of the datasets are given in Table 1. All the datasets have only numeric valued features. The OCR dataset has binary discrete features, while the others have continuous valued features. Except for the OCR dataset, all datasets are normalized to have zero mean and unit variance for each feature and are subsequently discretized. Let a be a feature value after normalization, and a' be its discrete value. We used the following discretization procedure.
If (a < −0.75) then a' = −1;
Else-If (a < −0.25) then a' = −0.5;
Else-If (a < 0.25) then a' = 0;
Else-If (a < 0.75) then a' = 0.5;
Else a' = 1.
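The five-level discretization can be sketched as a single function (an illustrative Python version, assuming z-normalized inputs with thresholds at ±0.75 and ±0.25, consistent with the five symmetric output levels −1, −0.5, 0, 0.5, 1):

```python
def discretize(a):
    # Map a z-normalized feature value to one of five discrete levels.
    if a < -0.75:
        return -1.0
    elif a < -0.25:
        return -0.5
    elif a < 0.25:
        return 0.0
    elif a < 0.75:
        return 0.5
    else:
        return 1.0
```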
5.2. Classifiers for comparison
The classifiers chosen for comparison purposes are as
follows.
NNC: The test pattern is assigned to the class of its nearest neighbor in the training set. The distance measure used is Euclidean distance.
k-NNC: A simple extension of NNC, where the most common class among the k nearest neighbors is chosen. The distance measure is Euclidean distance. Three-fold cross-validation is done to choose the value of k.
Table 1
Properties of the datasets used

Dataset     Number of features  Number of classes  Number of training examples  Number of test examples
OCR         192                 10                 6670                         3333
WINE        13                  3                  100                          78
VOWEL       10                  11                 528                          462
THYROID     21                  3                  3772                         3428
GLASS       9                   7                  100                          114
PENDIGITS   16                  10                 7494                         3498
Naive Bayes classifier (NBC): This is a specialization of the Bayes classifier in which the features are assumed to be statistically independent. Further, the features are assumed to be of discrete type. Let X = (x1, ..., xd)^T be a pattern and l be a class label. Then the class-conditional probability is P(X | l) = P(x1 | l) ··· P(xd | l). P(xi | l) is taken as the ratio of the number of patterns in the class with label l having value xi for feature fi to the total number of patterns in that class. The a priori probability of each class is taken as the ratio of the number of patterns in that class to the total training set size. The given test pattern is assigned to the class for which the a posteriori probability is maximum. The OCR dataset is used as it is, whereas the other datasets are normalized (to have zero mean and unit variance for each feature) and discretized as done for PPC-aNNC.
NNC with bootstrapped training set (NNC(BS)): We used the bootstrap method given by Hamamoto et al. [18] to generate an artificial training set. The bootstrapping method is as follows. Let X be a training pattern and let X1, ..., Xr be its r nearest neighbors in its class. Then X' = (Σ_{i=1}^{r} Xi)/r is the artificial pattern generated for X. In this manner, an artificial pattern is generated for each training pattern. NNC is then applied with the new bootstrapped training set. The value of r is chosen by three-fold cross-validation.
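Under the stated scheme with equal weights, the bootstrap can be sketched as follows. This is an illustrative Python version with our own names; whether a pattern counts among its own r nearest neighbors is not specified here, and this sketch includes it.

```python
def bootstrap(class_patterns, r):
    # Replace each pattern by the centroid of its r nearest neighbors
    # within its own class (Euclidean distance; squared distances give
    # the same ordering, so the square root is omitted).
    out = []
    for x in class_patterns:
        nn = sorted(class_patterns,
                    key=lambda y: sum((a - b) ** 2 for a, b in zip(x, y)))[:r]
        centroid = tuple(sum(coords) / r for coords in zip(*nn))
        out.append(centroid)
    return out
```

For instance, with class patterns (0,0), (2,0), (10,0) and r = 2, the pattern (0,0) is replaced by the centroid of (0,0) and (2,0), i.e., (1,0).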
Voting over multiple condensed nearest neighbors
(MCNNC): The condensed nearest neighbor classifier
(CNNC) first finds a condensed training set, a subset
of the training set such that NNC with
the condensed set classifies each training pattern
correctly. The condensed set is built incrementally.
Changing the order in which the training patterns are
considered can give a new condensed set. Alpaydin [21]
proposed training multiple such subsets and taking a vote
over them, thus combining predictions from a set of
concept descriptions. Two voting schemes are given:
simple voting, where voters have equal weight, and
weighted voting, where weights depend on the classifiers'
confidence in their predictions. The second scheme
is shown empirically to do well, so it is taken for
comparison purposes. The paper [21] proposes some
additional improvements based on bootstrapping,
etc., which are not considered here.
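The incremental condensing step can be sketched as below. This is a simplified illustration (the name `condense` is ours, and [21] adds refinements not shown): patterns are added whenever the current subset misclassifies them, and passes repeat until the subset classifies every training pattern correctly.

```python
import numpy as np

def condense(X, y):
    """Build one condensed subset: scan the training set in order,
    adding any pattern the current subset's 1-NN rule misclassifies,
    until a full pass makes no additions."""
    keep = [0]            # seed the subset with the first pattern
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            d = np.linalg.norm(X[keep] - X[i], axis=1)
            if y[keep[int(np.argmin(d))]] != y[i]:
                keep.append(i)      # misclassified -> add to subset
                changed = True
    return sorted(set(keep))
```

A different scan order of `X` can yield a different condensed subset, which is exactly what MCNNC exploits by voting over several of them.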
Weighted nearest neighbor with feature projection
(wNNFP): This is given by Kubat and Chen in [22]. If d
is the number of features, then d individual nearest
neighbor classifiers are considered, each classifier
taking only one feature into account. That is, d separate
projected training sets are formed, each being used by an
individual NNC. Weighted majority voting is used
to combine the decisions of the individual NNCs.
The weights for the individual classifiers are given
based on their classification accuracies. Three-fold
cross-validation is done for this.
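The weighted vote over single-feature classifiers can be sketched as follows. This is our own minimal illustration (the name `wnnfp_predict` is not from [22]); in practice the weights would come from each feature's validation accuracy.

```python
import numpy as np

def wnnfp_predict(X, y, weights, x):
    """Each single-feature 1-NN classifier votes for the class of the
    training pattern nearest to x along its own feature; votes are
    combined by weighted majority."""
    votes = {}
    for i, w in enumerate(weights):
        nearest = int(np.argmin(np.abs(X[:, i] - x[i])))  # 1-NN on feature i
        c = y[nearest]
        votes[c] = votes.get(c, 0.0) + w
    return max(votes, key=votes.get)
```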
NNC with synthetic patterns (NNC(SP)): This method is
given in Section 4.1. The parameter P, i.e., the set of
partitions, is chosen based on the cross-validation
method given in Section 5.3.
PPC-aNNC: This method is given in Section 4.2, and the
cross-validation method used to choose the parameter
values in Section 5.3.
Ensemble of PPC-aNNC's: This method is given in Section
4.3. The cross-validation method used to choose the
parameter values is given in Section 5.3.
5.3. Validation method
Three-fold cross-validation is used to fix the param-
eter values for various classifiers described in this paper.
For the methods proposed in this paper, viz., NNC(SP),
PPC-aNNC and Ensemble of PPC-aNNC's, we give a
detailed cross-validation procedure below.
The training set is randomly divided into three equal
non-overlapping subsets. If an equal division is not possible,
one or two randomly chosen training patterns are
replicated to obtain one. Two of these subsets are
combined to form a training set called the validation training
set, and the remaining one is called the validation test set.
In this way we get three different validation training sets
and corresponding validation test sets. We call these sets
val-train-set-1, val-train-set-2, val-train-set-3 for the
validation training sets and val-test-set-1, val-test-set-2,
val-test-set-3 for the corresponding validation test sets,
respectively. For a given set of parameter values, val-
train-set-i is used as the training set for the classifier, and
the classification accuracy (CA) measured over val-test-
set-i is called val-CA-i, where i = 1, 2 or 3. The average
value of {val-CA-1, val-CA-2, val-CA-3} is called avg-
val-CA and its standard deviation val-SD. The objective
of cross-validation is to find a set of parameter values
for which avg-val-CA is maximum. val-SD measures the
spread of val-CA-i, i = 1, 2 or 3, around avg-val-CA.
An exhaustive search over all possible sets of parameter
values is computationally expensive, and hence
we give a greedy approach for choosing the set of
parameter values.
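The splitting-with-replication step above can be sketched as follows. This is an illustrative helper (the name `three_fold_split` and the `seed` parameter are ours, added for reproducibility), not the authors' code.

```python
import random

def three_fold_split(patterns, seed=0):
    """Randomly divide the training set into three equal parts,
    replicating randomly chosen patterns if the size is not divisible
    by three, then pair each part (validation test set) with the
    other two combined (validation training set)."""
    pats = list(patterns)
    rng = random.Random(seed)
    rng.shuffle(pats)
    while len(pats) % 3 != 0:
        pats.append(rng.choice(pats))   # replicate to get an equal division
    k = len(pats) // 3
    folds = [pats[0:k], pats[k:2 * k], pats[2 * k:3 * k]]
    pairs = []
    for i in range(3):
        val_test = folds[i]
        val_train = folds[(i + 1) % 3] + folds[(i + 2) % 3]
        pairs.append((val_train, val_test))
    return pairs
```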
(1) NNC(SP): The parameters used in NNC(SP) are the
partitions of the set of features (F) for each class,
which are used for performing partition based pattern
synthesis. Let these partitions be represented as a
set P = {p1, p2, ..., pc}, where pi (1 ≤ i ≤ c) is the
partition used for the class with label i. Further, let Pp
be the set of partitions where each element (i.e., partition)
has exactly p blocks. The element of {P1, P2, P3, Pd}
(where d = |F|) which gives the maximum avg-val-CA is
chosen. Pp for a given p is obtained either from
domain knowledge or by using the method given in
Section 2.4.
The OCR dataset consists of handwritten images on a
two-dimensional rectangular grid of size 16 × 12, where for
246 P. Viswanath et al. / Information Fusion 5 (2004) 239–250
each cell, the presence of ink is represented by 1 and its
absence by 0. It is known that for a given class, the
values in nearby cells are more highly dependent than
those in far apart cells (nearness here is based on physical
closeness between the cells). This knowledge is used for
obtaining the partitions. An entire image is represented
as a 192-dimensional vector, where the first 12 features
correspond to the first row of the grid, the second 12
features correspond to the second row of the grid, and
so on. Let the set of features in this order be F =
{f1, f2, ..., f192}. A partitioning of F with p (where
p = 1, 2, 3 or 192) blocks, i.e., {B1, B2, ..., Bp}, is
obtained in the following manner: the first 192/p features
go into block B1, the next 192/p features into block B2,
and so on.
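This consecutive-block construction is straightforward; a sketch (the name `row_blocks` is ours), where features are indexed 0..191 in the row-major order described above:

```python
def row_blocks(num_features=192, p=3):
    """Split features f1..f192 (row-major order of the 16 x 12 grid)
    into p consecutive blocks of 192/p features each."""
    size = num_features // p
    return [list(range(i * size, (i + 1) * size)) for i in range(p)]
```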
For the other datasets, viz., WINE, VOWEL, THYROID,
GLASS and PENDIGITS, partitions are obtained
by using the method given in Section 2.4.
(2) PPC-aNNC: The parameters are the partitions of the
set of features, as used by NNC(SP), and the ordering of
the features in each block of each partition. These are
chosen as follows. Pp for p = 1, 2, 3 and d is obtained
as done for NNC(SP). For each Pp (where p = 1, 2, 3 or d),
the features in each block (for each partition) are randomly
ordered and avg-val-CA is obtained. 100 such runs
(each with a different random ordering of features) are
performed for each Pp. The Pp along with the ordering of
features for which avg-val-CA is maximum is then
chosen.
(3) Ensemble of PPC-aNNC's: The parameters here
are (i) the number of component classifiers (r), (ii) the set
of partitions Pi = {p1, ..., pc} used by each component
i (1 ≤ i ≤ r), and (iii) the ordering of features in each block
of each partition. These parameters are chosen by
restricting the search space as given below.
The set of partitions used by each component classifier is
the same (except for the ordering of features). That is,
P1 = P2 = ··· = Pr, and it is chosen as done for PPC-
aNNC. With 100 random orderings of features, PPC-
aNNC is run and the respective classification accuracies
(CA) are obtained. Let avg-CA and max-CA be the
average and maximum CA of these 100 runs, respectively.
We define a threshold classification accuracy
thr-CA = (avg-CA + max-CA)/2. 50 component
classifiers are then obtained by finding 50 random orderings
of features such that each component has CA greater
than thr-CA. This process is done to choose good
component classifiers. For these 50 component classifiers,
the respective CAs and orderings of features are
stored. This is done with each pair (val-train-set-i, val-
test-set-i) for i = 1, 2 and 3. So we get in total 150
(i.e., 50 × 3) orderings of features along with their
respective CAs. This corresponds to the list of orderings.
This list is sorted based on CA values, and the best r
orderings are chosen for the final ensemble,
where r, the number of component classifiers, is chosen
as described below.
For each pair (val-train-set-i, val-test-set-i), for i = 1,
2 and 3, we obtain 50 component classifiers as explained
above. From these 50 components, we randomly choose
m distinct components to form an ensemble. The CA of this
ensemble is measured and is called val-CA-i_m. The above
is done for m = 1 to 50 and for i = 1, 2 and 3. The
quantity avg-val-CA_m is the average value of {val-CA-
1_m, val-CA-2_m, val-CA-3_m}, and val-SD_m is its standard
deviation. The number of components r is chosen such
that avg-val-CA_r is the maximum element in {avg-val-
CA_1, avg-val-CA_2, ..., avg-val-CA_50}.
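Choosing r then reduces to picking the m with the largest avg-val-CA_m; a trivial sketch of that final step (the function name and the dictionary layout `val_ca[m] = [val-CA-1_m, val-CA-2_m, val-CA-3_m]` are our own illustration):

```python
def choose_num_components(val_ca):
    """Given val_ca[m] = list of val-CA-i_m over the three folds for
    each candidate ensemble size m, return the m whose average CA
    over the folds is maximum."""
    avg = {m: sum(cas) / len(cas) for m, cas in val_ca.items()}
    return max(avg, key=avg.get)
```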
5.4. Experimental results
Tables 2 and 3 give the comparison between these
classifiers. These show the classification accuracy (CA)
for each of the classifiers as a percentage over respective
test sets. The parameter values are chosen by per-
forming cross-validation as described in Section 5.3.
Table 3 shows the CAs for the methods proposed by us.
Along with the CA values, it also shows the parameter
values p (the number of blocks used in the synthesis)
and r (the number of components used in the ensemble).
The points worth noting are as follows: (i) For
large values of n (the number of original training
patterns) and p, it may not be feasible to consider the entire
synthetic set, which is required in the case of NNC(SP).
(ii) If p = d, where d is the total number of features,
each feature goes into a separate block (i.e., each block
contains only one feature) and therefore only one
ordering of features is possible. This means that, in the case
of the ensemble of PPC-aNNC's, each component is the same,
and hence the CA of one component equals the CA of
the ensemble. (iii) If p = 1, then the synthetic and
original sets are the same.
Table 2
A comparison between the classifiers (showing CA (%))
Dataset     NNC     k-NNC   NBC     NNC(BS)   MCNNC   wNNFP
OCR         91.12   92.68   81.01   92.88     91.97   10.02
WINE        94.87   96.15   91.03   97.44     95.00   92.30
VOWEL       56.28   60.17   36.80   57.36     55.97   23.38
THYROID     93.14   94.40   83.96   94.57     92.23   92.71
GLASS       71.93   71.93   60.53   71.93     71.67   53.5
PENDIGITS   96.08   97.54   83.08   97.57     97.25   45.05
The cross-validation results for four datasets (WINE,
VOWEL, GLASS and PENDIGITS) for the ensemble of
PPC-aNNC's are given in Tables 4–7, respectively. These
show the average CA (avg-val-CA_m) and standard
deviation (val-SD_m) for various values of m (i.e., the number
of components). For the remaining two datasets (OCR and
THYROID), results similar to those for WINE, GLASS
and PENDIGITS are observed and hence are not presented.
From the results presented, some of the observations
are:
(1) The methods given by us (viz., NNC(SP) and
Ensemble of PPC-aNNC’s) outperform the other
methods in the case of OCR and THYROID data-
sets. For the remaining datasets, our methods show
good performance.
Table 3
A comparison between the classifiers

Dataset     NNC(SP) (# blocks)   PPC-aNNC (# blocks)   Ensemble of PPC-aNNC's (# blocks) (# components)
OCR         93.01 (3)            84.91 (3)             94.15 (3) (45)
WINE        96.15 (2)            89.74 (2)             94.87 (2) (7)
VOWEL       56.28 (1)            43.51 (1)             46.32 (1) (33)
THYROID     97.23 (d)            94.16 (2)             94.66 (2) (49)
GLASS       71.93 (1)            60.53 (1)             67.54 (1) (7)
PENDIGITS   96.08 (1)            90.19 (1)             96.34 (1) (29)
Table 4
Cross validation results for ensemble of PPC-aNNC’s for WINE dataset
# component classifiers Number of blocks
1              2              3              d
1 95.09 (1.38) 96.08 (1.38) 95.09 (1.38) 77.45 (5.00)
7 98.03 (1.38) 99.06 (1.38) 98.03 (1.38) 77.45 (5.00)
10 98.03 (1.38) 99.02 (1.38) 98.03 (2.77) 77.45 (5.00)
20 98.03 (1.38) 98.04 (1.38) 99.01 (1.38) 77.45 (5.00)
30 98.03 (1.38) 98.04 (1.38) 99.01 (1.38) 77.45 (5.00)
40 98.03 (1.38) 98.04 (1.38) 99.01 (1.38) 77.45 (5.00)
50 99.01 (1.38) 98.04 (1.38) 99.01 (1.38) 77.45 (5.00)
Table 5
Cross validation results for ensemble of PPC-aNNC’s for VOWEL dataset
# component classifiers Number of blocks
1              2              3              d
1 82.77 (1.49) 82.57 (3.15) 79.73 (2.83) 23.86 (2.45)
10 91.86 (2.63) 88.06 (2.78) 85.22 (2.41) 23.86 (2.45)
20 91.86 (1.75) 89.58 (2.09) 85.41 (2.28) 23.86 (2.45)
30 92.23 (1.93) 90.15 (2.09) 85.98 (1.93) 23.86 (2.45)
33 93.18 (2.02) 90.72 (2.56) 85.98 (1.93) 23.86 (2.45)
40 92.42 (2.14) 89.58 (2.33) 85.98 (2.09) 23.86 (2.45)
50 92.61 (1.67) 89.96 (2.38) 85.22 (2.45) 23.86 (2.45)
Table 6
Cross validation results for ensemble of PPC-aNNC’s for GLASS dataset
# component classifiers Number of blocks
1              2              3              d
1 67.65 (2.40) 71.57 (5.00) 63.73 (6.04) 46.08 (7.34)
7 76.47 (2.40) 67.65 (6.35) 66.67 (7.34) 46.08 (7.34)
10 70.59 (2.40) 69.61 (6.04) 65.69 (7.34) 46.08 (7.34)
20 72.55 (1.39) 71.57 (5.00) 64.71 (7.20) 46.08 (7.34)
30 73.53 (0.00) 70.59 (4.80) 66.67 (7.34) 46.08 (7.34)
40 73.53 (2.40) 70.59 (4.16) 65.69 (6.04) 46.08 (7.34)
50 71.57 (1.39) 72.55 (6.04) 65.69 (6.04) 46.08 (7.34)
(2) The Ensemble of PPC-aNNC's performs uniformly
better than NBC, wNNFP and PPC-aNNC over all
datasets.
(3) It is interesting to note that PPC-aNNC outperforms
NBC and wNNFP over all datasets except
the WINE dataset.
The actual space requirement of the PPC-tree is, on
average, about 60–90% of that of the respective original
sets for the OCR, THYROID and PENDIGITS datasets.
For the other datasets, the actual space requirement is
slightly more than that required for the original set. This is
because, for small datasets, the data structure overhead is
larger than the space saved through the sharing of nodes
in the PPC-tree.
6. Conclusions
This paper presented a fusion of multiple approximate
nearest neighbor classifiers having a constant (O(1))
classification time upper bound and good classification
accuracy. Each individual classifier of the ensemble is a
weak classifier which works with a synthetic set generated
by the novel pattern synthesis technique called
partition based pattern synthesis, which reduces the curse
of dimensionality effect. Further, explicit generation of
the synthetic set is avoided by performing implicit pattern
synthesis within the classifier, which works directly with
a compact representation of the original training set
called the PPC-tree. The proposed ensemble of PPC-aNNC's
with a parallel implementation is a fast and efficient
classifier suitable for large, high dimensional datasets.
Since it has a constant classification time upper bound, it
is a suitable classifier for online, real-time applications.
7. Future work
A formal explanation for the good behavior of the
ensemble of PPC-aNNC's needs to be given. Next, one
needs to answer questions such as 'What is a good
partition for partition based pattern synthesis, and
how can it be found?' We gave a partitioning method based on
pair-wise correlations between features within a class,
but this takes into account only linear dependency
between the features, so it can fail to capture higher
order dependencies. A general partitioning method
that is also computationally efficient, and that can be used
for both numerical and categorical features, needs to be
found.
Acknowledgements
Research work reported here is supported in part by
AOARD Grant F62562-03-P-0318. Thanks to the three
anonymous reviewers for constructive comments. Spe-
cial thanks to B.V. Dasarathy for prompt feedback and
many suggestions during the revision that significantly
improved the content of the paper.
References
[1] B.V. Dasarathy, Nearest neighbor (NN) norms: NN pattern
classification techniques, IEEE Computer Society Press, Los
Alamitos, California, 1991.
[2] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, second
ed., A Wiley-interscience Publication, John Wiley & Sons, 2000.
[3] T. Cover, P. Hart, Nearest neighbor pattern classification, IEEE
Transactions on Information Theory 13 (1) (1967) 21–27.
[4] K. Fukunaga, D. Hummels, Bias of nearest neighbor error
estimates, IEEE Transactions on Pattern Analysis and Machine
Intelligence 9 (1987) 103–112.
[5] G. Hughes, On the mean accuracy of statistical pattern
recognizers, IEEE Transactions on Information Theory 14 (1) (1968)
55–63.
[6] A. Jain, B. Chandrasekharan, Dimensionality and sample size
considerations in pattern recognition practice, in: P. Krishnaiah,
L. Kanal (Eds.), Handbook of Statistics, vol. 2, North Holland,
1982, pp. 835–855.
[7] K. Fukunaga, D. Hummels, Bayes error estimation using Parzen
and k-NN procedures, IEEE Transactions on Pattern Analysis and
Machine Intelligence 9 (1987) 634–643.
[8] K. Fukunaga, Introduction to Statistical Pattern Recognition,
second ed., Academic Press, 1990.
[9] V. Ananthanarayana, M.N. Murty, D. Subramanian, An
incremental data mining algorithm for compact realization of
prototypes, Pattern Recognition 34 (2001) 2249–2251.
Table 7
Cross validation results for ensemble of PPC-aNNC’s for PENDIGITS dataset
# component classifiers Number of blocks
123d
1 93.70 (0.32) 93.68 (0.42) 90.28 (0.94) 29.06 (5.00)
10 98.16 (0.26) 96.99 (0.37) 93.15 (1.19) 29.06 (5.00)
20 98.45 (0.18) 97.17 (0.30) 93.76 (1.03) 29.06 (5.00)
29 98.71 (0.14) 97.30 (0.30) 93.79 (1.13) 29.06 (5.00)
30 98.57 (0.27) 97.41 (0.21) 93.79 (1.13) 29.06 (5.00)
40 98.64 (0.14) 97.37 (0.31) 93.78 (0.99) 29.06 (5.00)
50 98.51 (0.19) 97.43 (0.36) 93.78 (1.04) 29.06 (5.00)
[10] J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate
generation, in: Proceedings of ACM SIGMOD International
Conference on Management of Data, Dallas, Texas, USA, 2000.
[11] Z. Tian, R. Raghu, L. Micon, BIRCH: an efficient data clustering
method for very large databases, in: Proceedings of ACM
SIGMOD International Conference on Management of Data,
1996.
[12] A. Guttman, R-trees: a dynamic index structure for spatial
searching, in: Proceedings of ACM SIGMOD International
Conference on Management of Data, 1984, pp. 47–57.
[13] B. Efron, Bootstrap methods: another look at the jackknife,
Annals of Statistics 7 (1979) 1–26.
[14] A. Jain, R. Dubes, C. Chen, Bootstrap technique for error
estimation, IEEE Transactions on Pattern Analysis and Machine
Intelligence 9 (1987) 628–633.
[15] M. Chernick, V. Murthy, C. Nealy, Application of bootstrap and
other resampling techniques: Evaluation of classifier performance,
Pattern Recognition Letters 3 (1985) 167–178.
[16] S. Weiss, Small sample error rate estimation for k-NN classifiers,
IEEE Transactions on Pattern Analysis and Machine Intelligence
13 (1991) 285–289.
[17] D. Hand, Recent advances in error rate estimation, Pattern
Recognition Letters 4 (1986) 335–346.
[18] Y. Hamamoto, S. Uchimura, S. Tomita, A bootstrap technique
for nearest neighbor classifier design, IEEE Transactions on
Pattern Analysis and Machine Intelligence 19 (1) (1997) 73–79.
[19] L. Breiman, Bagging predictors, Machine Learning 24 (1996) 123–
140.
[20] D.B. Skalak, Prototype Selection for Composite Nearest
Neighbor Classifiers, Ph.D. Thesis, Department of Computer Science,
University of Massachusetts Amherst, 1997.
[21] E. Alpaydin, Voting over multiple condensed nearest neighbors,
Artificial Intelligence Review 11 (1997) 115–132.
[22] M. Kubat, W.K. Chen, Weighted projection in nearest-neighbor
classifiers, in: Proceedings of the First Southern Symposium on
Computing, The University of Southern Mississippi, December 4–
5, 1998.
[23] S.D. Bay, Combining nearest neighbor classifiers through multiple
feature subsets, Intelligent Data Analysis 3 (3) (1999) 191–209.
[24] P.M. Murphy, UCI Repository of Machine Learning Databases
[http://www.ics.uci.edu/mlearn/MLRepository.html], Department
of Information and Computer Science, University of California,
Irvine, CA, 1994.
[25] T.R. Babu, M.N. Murty, Comparison of genetic algorithms based
prototype selection schemes, Pattern Recognition 34 (2001) 523–525.