Appl Intell (2012) 37:80–99
DOI 10.1007/s10489-011-0314-z
An enhanced Support Vector Machine classification framework
by using Euclidean distance function for text document
categorization
Lam Hong Lee ·Chin Heng Wan ·
Rajprasad Rajkumar ·Dino Isa
Published online: 25 August 2011
© Springer Science+Business Media, LLC 2011
Abstract This paper presents the implementation of a new
text document classification framework that uses the Sup-
port Vector Machine (SVM) approach in the training phase
and the Euclidean distance function in the classification
phase, coined as Euclidean-SVM. The SVM constructs a
classifier by generating a decision surface, namely the opti-
mal separating hyper-plane, to partition different categories
of data points in the vector space. The concept of the opti-
mal separating hyper-plane can be generalized for the non-
linearly separable cases by introducing kernel functions to
map the data points from the input space into a high dimen-
sional feature space so that they could be separated by a lin-
ear hyper-plane. This characteristic causes the implementa-
tion of different kernel functions to have a high impact on
the classification accuracy of the SVM. Other than the ker-
nel functions, the value of soft margin parameter, C is an-
other critical component in determining the performance of
the SVM classifier. Hence, one of the critical problems of
the conventional SVM classification framework is the ne-
L.H. Lee ()·C.H. Wan
Faculty of Information and Communication Technology,
Universiti Tunku Abdul Rahman, Bandar Barat, 31900 Kampar,
Perak, Malaysia
e-mail: leelh@utar.edu.my
C.H. Wan
e-mail: wanchinheng@yahoo.com
R. Rajkumar ·D. Isa
Intelligent Systems Research Group, Faculty of Engineering,
The University of Nottingham, Malaysia Campus, Jalan Broga,
43500, Semenyih, Selangor, Malaysia
R. Rajkumar
e-mail: Rajprasad.Rajkumar@nottingham.edu.my
D. Isa
e-mail: Dino.Isa@nottingham.edu.my
cessity of determining the appropriate kernel function and
the appropriate value of parameter C for different datasets of
varying characteristics, in order to guarantee high accuracy
of the classifier. In this paper, we introduce a distance mea-
surement technique, using the Euclidean distance function
to replace the optimal separating hyper-plane as the classi-
fication decision making function in the SVM. In our ap-
proach, the support vectors for each category are identified
from the training data points during training phase using the
SVM. In the classification phase, when a new data point is
mapped into the original vector space, the average distances
between the new data point and the support vectors from
different categories are measured using the Euclidean dis-
tance function. The classification decision is made based on
the category of support vectors which has the lowest average
distance with the new data point, and this makes the classi-
fication decision irrespective of the efficacy of hyper-plane
formed by applying the particular kernel function and soft
margin parameter. We tested our proposed framework us-
ing several text datasets. The experimental results show that
this approach renders the accuracy of the Euclidean-SVM
text classifier largely insensitive to the implementation
of kernel functions and soft margin parameter C.
Keywords Text document classification · Support Vector
Machine · Euclidean distance function · Kernel function ·
Soft margin parameter
1 Introduction
This paper presents a novel text document classification
framework that uses the Support Vector Machine (SVM) ap-
proach in the training phase to identify the set of support
vectors for each category, and uses the Euclidean distance
function in the classification phase to compute the average
distances between the testing data point and each of the sets
of support vectors from different categories. Classification
decision is made based on the category which has the low-
est average distance between its set of support vectors and
the new data point and this makes the classification decision
irrespective of the efficacy of hyper-plane formed by apply-
ing the particular kernel function and soft margin parameter.
Text document classification denotes the task of automati-
cally assigning collections of electronic text documents into
their annotated categories, based on their contents and simi-
larities. For some decades now, text document classification
has become important due to the rapid growth of creating,
editing, manipulating and storing text documents in digi-
tal form. In recent years, an increasing number of statistical
and computational approaches have been developed for text
classification, including k-nearest-neighbor classification [1,
2], Bayesian classification [3–12], support vector machine
[2, 13–17], maximum entropy [18], decision tree induction
[19], rule induction [20, 21], and artificial neural networks
[22]. Besides the supervised classification approaches, unsu-
pervised clustering techniques such as self-organizing maps
[23,24] have also been introduced for text document seg-
mentation.
Over the past decade, the SVM has gained popularity
in various types of classification applications and has been
reported as one of the best performing classification ap-
proaches [2, 13–17, 25–34]. It can be used as a discrimi-
native classifier and has been shown to be more accurate
than most other classification models [15,27,28,31,32,35,
36]. The good generalization characteristic of the SVM is
due to the implementation of Structural Risk Minimization
(SRM) principle, which entails finding an optimal separat-
ing hyper-plane, thus guaranteeing a highly accurate clas-
sifier in most applications. Equation (1) represents the equa-
tion of a hyper-plane which can be used to partition data
points in a SVM.
$w \cdot x + b = 0$  (1)
Figure 1 illustrates a linearly separable case, where the data
points of one category and the data points
of another category are separated by
the linear optimal separating hyper-plane (the solid straight
line).
There are actually an infinite number of hyper-planes
which are able to partition the data points into two categories
(as illustrated by the dashed lines in Fig. 1). According to
the SVM methodology, there is just one optimal sep-
arating hyper-plane. This optimal separating hyper-plane
lies half-way within the maximal margin, where the
margin is defined as the sum of distances of the hyper-plane
Fig. 1 Optimal separating hyper-plane
to the support vectors. In the case illustrated in Fig. 1, the
margin is $d_1 + d_2$.
The optimal separating hyper-plane is only determined
by the closest data points of each category. These points are
called Support Vectors (SVs). As only the SVs determine
the optimal separating hyper-plane, there is a certain way
to represent them for a given set of training points. It has
been shown in [37] that the maximal margin can be found
by minimizing $\frac{1}{2}\|w\|^2$, as shown in (2).
$\min \frac{1}{2}\|w\|^2$  (2)
Therefore, the optimal separating hyper-plane can be con-
figured by minimizing (2) under the constraint of (3), that
the training data points are correctly separated.
$y_i \cdot (w \cdot x_i + b) \geq 1, \quad \forall i$  (3)
A more detailed discussion of the SVM has been presented
in our previous work [14].
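As an illustrative sketch (not the authors' code), the following trains a linear SVM on hypothetical toy data with scikit-learn's SVC class, a wrapper around the LIBSVM library used in the experiments of Sect. 4, and inspects the hyper-plane of (1) and the identified support vectors.

```python
# Minimal sketch: a linear SVM on toy 2-D data, showing the
# hyper-plane w.x + b = 0 of (1) and the support vectors of Fig. 1.
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data for two categories.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # category 0
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])  # category 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# w and b of the optimal separating hyper-plane (linear kernel only).
w, b = clf.coef_[0], clf.intercept_[0]
print("hyper-plane w, b:", w, b)

# The training points retained as support vectors (the SVs).
print("support vectors:\n", clf.support_vectors_)
```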
The concept of the optimal separating hyper-plane can be
generalized for the non-linearly separable cases. One of the
methods which can be used to partition non-linearly separa-
ble data points is the implementation of kernel functions. By
implementing the kernel functions, the non-linearly separa-
ble data points are mapped from the original input space to
a high dimensional feature space through a non-linear trans-
formation rather than fitting non-linear decision surfaces to
the input space to separate the data points, as illustrated in
Fig. 2. This is to ensure that a linear optimal separating
hyper-plane could be generated in the new feature space to
separate the data points. By introducing the kernel function
as shown by (4), it is not necessary to explicitly know $\Phi(\cdot)$
[38]. Hence, the optimization problem can be translated di-
rectly to the more general kernel version, as shown by (5).
$K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$  (4)
Fig. 2 Mapping data points into high dimensional feature space using
kernel functions
Table 1 Common kernel functions for SVM with promising classification performance in most cases

Kernels             Formula
Linear              K(u, v) = u · v
Sigmoid             K(u, v) = tanh(a u · v + b)
Polynomial          K(u, v) = (1 + u · v)^d
RBF                 K(u, v) = exp(−a‖u − v‖²)
Exponential RBF     K(u, v) = exp(−a‖u − v‖)
$W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$,
subject to $C \geq \alpha_i \geq 0$, $\sum_{i=1}^{n} \alpha_i y_i = 0$  (5)
The algorithm of the non-linear classification is formally
similar to the linear classification, except that every dot
product is replaced by a non-linear kernel function. This al-
lows a linear optimal separating hyper-plane to be fitted in
the high dimensional feature space, and it may be non-linear
in the original input space, as illustrated in Fig. 2. By map-
ping the data points into a high dimensional feature space us-
ing kernel functions, superb classification performance
can be obtained, since the SVM model can perform
data point separation even with very complex boundaries.
There could be an infinite number of kernel functions, but
only certain kernel functions have been found to perform
well in a wide variety of classification tasks. Table 1 shows
some of the common well-performing kernel functions in
most cases.
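For illustration, the kernels of Table 1 can be written directly as functions of two vectors. The sketch below uses numpy; the parameter values a, b and d are arbitrary examples, not recommended settings.

```python
# Illustrative numpy implementations of the Table 1 kernels.
import numpy as np

def linear(u, v):
    return np.dot(u, v)

def sigmoid(u, v, a=0.5, b=0.0):
    return np.tanh(a * np.dot(u, v) + b)

def polynomial(u, v, d=3):
    return (1 + np.dot(u, v)) ** d

def rbf(u, v, a=0.5):
    # exp(-a * ||u - v||^2)
    return np.exp(-a * np.linalg.norm(np.subtract(u, v)) ** 2)

def exponential_rbf(u, v, a=0.5):
    # exp(-a * ||u - v||)
    return np.exp(-a * np.linalg.norm(np.subtract(u, v)))
```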
Each of the kernel functions listed in Table 1 has its own
properties and unique responses in handling different types
of data. For example, the SVM model equipped with a sig-
moid kernel function is equivalent to a two-layer perceptron
neural network [38], while the SVM model using a Radial
Basis Function (RBF) kernel is closely related to RBF
neural networks, with a feature space of infinite di-
mension. The selection of kernel function for the SVM clas-
sification model is based on the classification task’s require-
ments and the patterns of the data points distribution. With
an appropriate and optimal kernel function implemented in
the SVM model, the classifier is able to scale high dimen-
sional data relatively well, and trade-off between classifier
complexity and classification error can be controlled explic-
itly.
Therefore, an appropriate implementation of optimal ker-
nel function is a necessity for the SVM classification frame-
work in order to obtain optimal performance. This unique
characteristic causes the classification accuracy of the SVM
to be highly dependent on the selection of kernel func-
tions. This is due to the fact that the data points distribu-
tion may change in different high dimensional feature spaces,
and the linear optimal separating hyper-plane can only be
constructed after the data points have been mapped into the
higher dimensional feature space using kernel functions. In
other words, the decision surface of the SVM classifier for
non-linearly separable cases is constructed based on the im-
plementation of kernel functions. As a result, one of the
critical problems of the conventional SVM classification ap-
proach is the selection of appropriate kernel function, based
on the varying characteristics of different datasets, in or-
der to obtain high classification accuracy. There is no
generally optimal kernel function which is able to guaran-
tee good classification performance on all types of datasets
of varying characteristics. In recent years, many research
works have been carried out seeking solutions to this
problem. However, there is
no ultimate solution in the form of an all-round, optimal
kernel function which will suit most of the SVM classifica-
tion tasks on different datasets of varying characteristics.
Apart from the kernel functions, the performance of the
SVM classifier is also heavily dependent on the soft mar-
gin parameter, C. The parameter C controls the trade-off
between the margin and the size of slack variables [39]. It
creates the soft margin that allows some of the classification
errors, especially when non-separable points exist during the
classification phase. If the value of C is small, the number of
training errors will increase due to underfitting [17]. On the
other hand, the large value of C will lead to overfitting where
a high penalty for non-separable points occurs [40], and the
classifier will behave like a hard margin SVM [17].
As the SVM classification approach requires the selec-
tion of appropriate combination of kernel function and pa-
rameters, the optimization of kernel function and parameters
has to be incorporated into the training phase of the SVM,
in order to guarantee high accuracy of the classification task.
Typically, convoluted computations have been carried out to
optimize the set of kernel function and parameters combina-
tion for the SVM. This could be done by conducting an iterative
cross-validation process to predict the best performing com-
bination of kernel function and parameters for the trained
SVM classifier, using a validation set. This method leads to
a computationally intensive and highly time-consuming train-
ing process for the SVM, hence degrading the efficiency of
the classifier. Furthermore, for certain cases in which the
training samples are limited, there exists a critical problem
in preparing sufficient training set and validation set for the
SVM to train the classifier and to conduct the kernel func-
tion and parameters optimization process.
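The sketch below illustrates this conventional tuning practice using scikit-learn's GridSearchCV over an SVC; the candidate grid is an arbitrary example rather than the paper's setup, and X_train, y_train stand for a vectorized training corpus.

```python
# Sketch of conventional kernel/parameter tuning: every kernel/C
# combination is trained and cross-validated, which is the costly
# iterative process described above.
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

param_grid = [
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-2, 1e-1], "C": [1, 10, 100, 1000]},
    {"kernel": ["poly"], "degree": [2, 3], "C": [1, 10, 100, 1000]},
]

search = GridSearchCV(SVC(), param_grid, cv=5)
# search.fit(X_train, y_train)   # X_train, y_train: vectorized corpus
# print(search.best_params_)    # best kernel/C combination found
```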
In this paper, we propose an enhanced classification
framework for text document classification, coined as the
Euclidean-SVM. This new classification framework is pro-
posed by introducing the Euclidean distance measurement
function to replace the optimal separating hyper-plane as the
decision making function for the conventional SVM classi-
fication. In our proposed approach, the SVs for each of the
categories are identified using the SVM training algorithm
and the SVs are then mapped into the original vector space
to construct the trained Euclidean-SVM classifier. In the
classification phase, an unlabeled new data point is mapped
into the same original vector space and the average distances
between the new data point and each set of the SVs of dif-
ferent categories are computed using the Euclidean distance
function. The classification decision is made based on the
category of SVs which has the lowest average distance with
the new data point. With the classification decision making
function using the Euclidean distance function instead of the
optimal separating hyper-plane, the impact of kernel func-
tion and soft margin parameter C on accuracy of the SVM
classifier could be minimized, hence contributing to a kernel
function and soft margin parameter independent Euclidean-
SVM text classification framework.
2 Related works
Although the SVM has been reported as one of the best
performing machine learning approaches for classification,
there exists a critical problem of the SVM in determining
the optimal combination of kernel function and parame-
ters, in order to guarantee high efficiency and effectiveness
of the classification tasks. Typically, the optimal combina-
tion of the SVM kernel and parameters is determined by
using the computationally intensive grid search algorithms
[41,42]. This method varies different types of kernel func-
tions and parameters through a wide range of values using
geometric steps. The set of kernel and parameters combina-
tion with the best cross-validation accuracy is selected. In
recent years, many research works have been carried out in
order to solve the problem of automatically finding the most
appropriate kernel function and parameters for the SVM in
order to guarantee high accuracy of the classifier. Most of
them perform the kernel and parameters optimization using
evolutionary algorithms. These methods conduct iterative
computation in order to configure the optimal set of kernel
and parameters for the SVM, and they will further compli-
cate the convoluted training process, hence leading to a high
computational cost for the SVM during the training phase.
As a result, the efficiency of the SVM classifier has been
severely degraded by having such methods in determining
the appropriate combination of kernel and parameters.
Quang et al. have presented an evolutionary algorithm,
specifically a genetic algorithm, to optimize SVM pa-
rameters, including kernel type, kernel parameters and up-
per bound C. This is an iterative process by repeating the
crossover, mutation and selection procedures to produce the
optimal set of parameters [43]. Friedrichs and Igel have also
proposed an evolution strategy, specifically the Covariance
Matrix Adaptation Evolution Strategy for obtaining optimal
set of multiple hyper-parameters (kernel parameters and reg-
ularization parameter) for the SVM [44]. Briggs and Oates
have introduced the idea of domain-specific composite ker-
nels to the SVM classification for better generalization abil-
ity as compared to the base kernels [45]. An evolutionary
algorithm was employed in order to search through a large
number of composite kernels and the hill climbing technique
was chosen as the composite kernel search algorithm [45].
Dong et al. have also presented a genetic algorithm-based
technique to select the optimal value of the cost parameter
C and kernel parameters for the SVM, using cross-valida-
tion [46]. All the approaches mentioned above suffer from
inefficient classification due to highly time-consuming iter-
ative computation, as they employ evolutionary algorithms
for the optimal kernel and parameters selection.
Avci has proposed a hybrid method of genetic algorithm
and SVM, coined as HGASVM, for automatic digital mod-
ulation classification. Avci has shown that the proposed
HGASVM has better accuracy than the combination of the
SVM classifiers with randomly selected parameters, in the
specified application of automatic digital modulation classi-
fication [47]. However, as the parameters optimization pro-
cess is based on the genetic algorithm, the common prob-
lem of high time consumption is still a disadvantage to this
proposed approach. Zhang et al. have used the combination
of simulated annealing and genetic algorithm to optimize
the parameters for the SVM. This hybrid approach takes the
advantages from both of these techniques to overcome the
disadvantages of each other. As a result, this hybrid tech-
nique has been proven to have better performance than sim-
ulated annealing or genetic algorithm alone in selecting op-
timal kernel and parameters for the SVM [48]. Diosan et al.
have proposed another hybridized framework of genetic pro-
gramming and SVM to choose the most efficient expression
of the kernel of kernels function and to select the optimal set
of SVM hyper-parameters. This approach conducts an itera-
tive process to optimize the SVM parameters and due to the
complexity of kernel function, the computational complex-
ity of the proposed algorithm is high, even higher
than that of an evolutionary linear multiple kernel [49].
Besides using the evolutionary algorithms, there exist
some approaches which determine the optimal kernel pa-
rameters using distance between two classes in the feature
space [50–52]. Sun et al. have proposed a method in which
the training phase of the SVM and the iterative process of
evaluating the performance for all the parameters combina-
tions can be avoided [50,51]. The optimal parameters can
be determined by a sigmoid function. According to Sun et al.,
this method has good accuracy with sigmoid function and
drastically reduces the time spent searching for the optimal ker-
nel parameters of the conventional SVM compared with other ex-
isting algorithms, since the iterative computation of select-
ing optimal parameters using evolutionary algorithms can be
avoided [50,51]. Wu and Wang have also proposed a sim-
ilar kernel parameters selection approach which uses data
separation index (inter-class distance in the feature space) to
predict the optimal SVM kernel parameters [52]. However,
both of these methods do not perform the optimization of
parameter C; the training time of the SVM could be
further reduced if the proposed methods incorporate param-
eter C into their optimization strategy.
As the existing kernel function and parameters optimiza-
tion methods for the SVM involve convoluted and iterative
computation, this problem is considered and investigated by
our group. Therefore, it is the goal of this paper to propose
an enhanced SVM framework on which the implementation
of different kernel functions and parame-
ter C has low impact, for text document classification.
3 The Euclidean-SVM text classification framework
We propose and implement a new text classification frame-
work by introducing the Euclidean distance function to re-
place the optimal separating hyper-plane in the conventional
SVM as the classification decision making function. We uti-
lize the SVM training algorithm to reduce the training data
points by identifying and retaining only the SVs, and elim-
inating the rest of the training data points. In the classifica-
tion phase, the Euclidean distance function is used to make
the classification decision based on the average distance be-
tween the testing data point to each group of SVs from dif-
ferent categories. We eliminate the use of optimal separat-
ing hyper-plane as the decision surface in the conventional
SVM approach, as the construction of the optimal separat-
ing hyper-plane is highly dependent on the kernel functions.
In fact, the construction of the linear separating hyper-plane
in the high dimensional feature space is based on the SVs
and kernel function, in which the kernel function is incorpo-
rated to map the data points into a high dimensional feature
space, so that the data points (specifically the SVs) are possi-
bly separable by a linear separating hyper-plane. This causes
the kernel functions to have high impact on the construction
Table 2 Kernel functions and number of errors [56]
Method Sample # Error #
ESTScan, closest ATG 2350 729
Salzberg method 3312 1095
SVM, Salzberg kernel 3312 530
TISHunter 3312 13
Table 3 Kernel functions and number of support vectors [56]
Kernel Average # of SVs
Edit kernel I 2312
Edit kernel II 2316
Edit kernel III, SCM120 319
Edit kernel III, SCM250 230
Edit kernel III, ASCM120 507
Edit kernel III, ASCM250 293
Edit kernel III, PAM250 821
Table 4 Kernel functions and accuracy [53]
Kernels Best accuracy Average accuracy
Linear 56.04% 56.04%
Polynomial 74.60% 66.81%
RBF 70.34% 59.01%
of the separating hyper-plane, and hence affect the classifi-
cation accuracy of the SVM. Previous research works have
shown that the implementation of different kernel functions
greatly influences the accuracy of the SVM classifier, as
well as the number of SVs [53–59]. Tables 2, 3 and 4 illus-
trate results from previous research works which investi-
gated the impact of different kernel functions on the accuracy
and number of SVs of the SVM classification approach.
In this paper, we propose the utilization of the Euclidean
distance function to replace the optimal separating hyper-
plane as the decision making function of the SVM approach.
Prior to the training phase of our proposed Euclidean-
SVM text classification framework, we apply a pre-
processing approach to transform text documents into a rep-
resentation format suitable for the SVM and the Euclidean-
SVM, which is typically numerical. The text doc-
uments have been pre-processed by using the Bayesian vec-
torization technique [14,60]. The Bayesian vectorization
technique is carried out in order to transform each of the text
documents in the dataset into the format of probability dis-
tribution in the vector space, by using the Bayesian formula.
By applying the Bayesian vectorization technique to pre-
process the text documents, the textual data are transformed
into numerical format and the dimensionality of data has
been greatly reduced from thousands (equal to the number of
words in the document, such as when using Term Frequency
Inverse Document Frequency (TFIDF) method to transform
text to numerical form) to typically less than a hundred (the number of
categories the document may be classified into). This trans-
formation is a necessity of the Euclidean-SVM text classi-
fication framework due to the fact that the Euclidean-SVM
approach is a classification framework based on vector space
model, which requires data in numerical format so that the
data could be mapped into a vector space, for both train-
ing and classifying purposes. Besides this, as the Euclidean-
SVM approach may suffer from high computational time
consumption in handling data with high dimensionality, due
to its convoluted computation (requiring the computation of
distances between the SVs and the input data points), the di-
mensionality reduction of data (from the number of words to
the number of available categories in the classification task)
is also a crucial requirement for the Euclidean-SVM classi-
fication approach. The details of the Bayesian vectorization
technique have been discussed in our previous works pre-
sented in [14,60].
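The following is a rough sketch of the idea as described above, assuming a multinomial naive Bayes model as the source of the per-category probabilities; the exact formulation of the Bayesian vectorization technique is the one given in [14, 60].

```python
# Sketch of Bayesian vectorization: each document is reduced from a
# bag-of-words vector (vocabulary-sized) to a vector of per-category
# probabilities, so dimensionality drops to the number of categories.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def bayesian_vectorize(train_docs, train_labels, docs):
    """Return one probability-distribution vector per document in docs."""
    counts = CountVectorizer()
    X_train = counts.fit_transform(train_docs)   # thousands of dimensions
    nb = MultinomialNB().fit(X_train, train_labels)
    # Posterior P(category | document): one value per category, so each
    # document becomes a vector with len(nb.classes_) dimensions.
    return nb.predict_proba(counts.transform(docs))
```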
We have conducted an experiment to validate the perfor-
mance of the Bayesian vectorization technique over TFIDF
vectorization on the preprocessing of textual data for the
SVM classifier and the Euclidean-SVM classifier. The re-
sults showed that the classifiers which use the Bayesian vec-
torization as the textual data transformation technique out-
performed the classifiers which use the TFIDF vectorization
technique. The results for this experiment can be seen in Ta-
bles 16 and 17 in Sect. 4.6 of this paper.
During the training phase of the Euclidean-SVM classifi-
cation framework, the conventional SVM training algorithm
is used to map all the training data points into the vector
space and identify the set of SVs for each of the categories.
The construction of the optimal separating hyper-plane is
still a necessity in order to identify the SVs, since the op-
timal separating hyper-plane is lying half-way in between
the maximal margin, where the margin is defined as the sum
of distances of the hyper-plane to the SVs. Figure 3 illus-
trates the construction of the optimal separating hyper-plane
in the vector space which separates the training data points
of two different categories, after implementing the conven-
tional SVM training algorithm.
As illustrated in Fig. 3, there are two categories of train-
ing data points, represented by spheres and squares respec-
tively. The black spheres represent the SVs of the category
“Sphere” and black squares represent the SVs of the cate-
gory “Square”. The optimal separating hyper-plane is con-
structed by maximizing the margin $d_1 + d_2$. However, the
optimal separating hyper-plane is discarded in the classifi-
cation phase as it does not act as the decision surface in
our proposed Euclidean-SVM classification framework. Our
proposal in this paper is to replace the optimal separating
Fig. 3 Vector space of the conventional SVM classifier with optimal
separating hyper-plane
Fig. 4 Vector space of the Euclidean-SVM classifier with the Euclidean distance function as the classification decision making algorithm
hyper-plane by introducing the Euclidean distance function
in making the decision for the classification task. After the
SVs for each of the categories have been identified, they are
mapped into the original vector space and the rest of the
training data points are eliminated. During the classification
phase, a new unlabeled data point is mapped into the same
vector space with the SVs, and the average distances be-
tween the new data point and each set of the SVs of different
categories are computed using the Euclidean distance func-
tion. Figure 4 illustrates the vector space of the Euclidean-
SVM classifier during the classification phase.
The “Triangle” in Fig. 4 represents the new unlabeled
data point to be classified. The distances between the new
input data point and each of the SVs are computed. The
Euclidean distance function is used to calculate the dis-
tance between two points, new vector P, and support vec-
tor Q. Equation (6) illustrates the Euclidean distance for-
mula, where $p_i$ and $q_i$ are the coordinates of $P$ and $Q$ in di-
mension $i$, and $n$ is the number of dimensions.
$D = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$  (6)
As illustrated in Fig. 4, $D_1$ and $D_2$ represent the Euclidean
distances between the new data point and the SVs of cat-
Fig. 5 Block diagram of the Euclidean-SVM text classification framework
egory “Sphere”, while $D_3$, $D_4$ and $D_5$ represent the Eu-
clidean distances between the new data point and the SVs of
category “Square”. After obtaining the Euclidean distances
between the new data point and each of the SVs from differ-
ent categories, the average distance of the new data point to
the set of SVs of each of the categories is computed.
This could be done by adding up the Euclidean distances of
the new data point to the SVs from the same category, and
dividing the sum by the total number of SVs for that par-
ticular category, as illustrated by (7). Based on the example
as illustrated in Fig. 4, the average distance of the new data
point to the SVs of category “Sphere” is $(D_1 + D_2)/2$, and
the average distance of the new data point to the SVs of cat-
egory “Square” is $(D_3 + D_4 + D_5)/3$.
$D_{avg} = \frac{\sum_{I=1}^{N} \left( \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \right)_I}{N}$  (7)
After computing the average distance of the new data point
to the set of SVs of each of the categories, the classifica-
tion decision is made based on the category which has the
lowest average distance between its set of SVs and the new
data point. In other words, the new input data point will be
labeled with the category which has the lowest average dis-
tance between its SVs and the new data point itself.
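In code, (6), (7) and the resulting decision rule amount to the short numpy computation below, using hypothetical coordinates for the scenario of Fig. 4.

```python
# Equations (6) and (7) in numpy: Euclidean distances from a new point
# to each support vector, and their average per category (toy data).
import numpy as np

new_point = np.array([2.0, 3.0])
svs_sphere = np.array([[1.0, 2.0], [2.5, 2.0]])              # D1, D2
svs_square = np.array([[4.0, 4.0], [5.0, 3.5], [4.5, 5.0]])  # D3, D4, D5

d_sphere = np.linalg.norm(svs_sphere - new_point, axis=1)    # eq. (6)
d_square = np.linalg.norm(svs_square - new_point, axis=1)

avg_sphere = d_sphere.mean()   # (D1 + D2) / 2, eq. (7)
avg_square = d_square.mean()   # (D3 + D4 + D5) / 3

# Label the new point with the category of lowest average distance.
label = "Sphere" if avg_sphere < avg_square else "Square"
print(label)
```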
The Euclidean-SVM classification approach and the
k-Nearest Neighbor (k-NN) classification approach share
some similarities as both of these classification approaches
map the training data points into a vector space, and dis-
tance measurement technique is used to make classification
decision. Nevertheless, the Euclidean-SVM approach differs from
the k-NN approach. The Euclidean-SVM approach makes
the classification decision based on the category which has
the shortest average Euclidean distance between its set of
SVs and the new data point. On the other hand, the k-NN
approach assigns a testing data point to a particular category
if it is the most frequent category among the k nearest train-
ing data points. Figure 5 illustrates the block diagram and
Table 5 illustrates the algorithm of the Euclidean-SVM text
classification approach.
With the combination of the SVM training algorithm and
the Euclidean distance function to make the classification
decision, the impact of kernel function and parameter C on
the classification accuracy of the conventional SVM can be
minimized. This is due to the fact that the optimal separat-
ing hyper-plane, whose construction is highly dependent
on kernel functions, is replaced by the Euclidean distance
function. Since the Euclidean distance function is able to
perform its classification decision making task sufficiently
as long as both the training data points (support vectors) and
data points to be classified are mapped into the same vec-
tor space, the transformation of existing vector space into
a higher dimensional feature space by the kernel functions
is not needed during the classification phase and hence does
not have a great impact on the classification performance. In
Table 5 Algorithms of the Euclidean-SVM text classification framework in pre-processing phase, training phase and classification phase
Algorithms of the Euclidean-SVM text classification framework
Pre-Processing Phase
1. Transform all the text documents (in both training set and testing set) into numerical format using the Bayesian Vectorization
Technique.
Training Phase
1. Map all the training data points into the vector space of a SVM.
2. Identify and obtain the set of support vectors for each of the categories using SVM algorithm, and eliminate the rest of the
training data points which are not identified as support vectors.
3. Map all the support vectors into the original vector space.
Classification Phase
1. Map the new unlabeled data point into the same original vector space with support vectors.
2. Use the Euclidean distance formula to calculate the average distances between the new data point and each of the sets of
support vectors from different categories.
3. Identify the category which has the lowest average distance between its set of support vectors and the new data point.
4. Generate classification result for the new data point based on the identified category.
Table 6 List of categories of the vehicles dataset
1. Aircrafts
2. Boats
3. Cars
4. Train
other words, the problem of selecting the right kernel func-
tions for the classifier does not exist if the optimal separating
hyper-plane is replaced by the Euclidean distance function.
As a result, by integrating the SVM training algorithm
and the Euclidean distance function to construct a classifica-
tion framework, we can obtain an enhanced Euclidean-SVM
classifier with better performance in which the accuracy is
comparable to the conventional SVM, while being immune to
the problem of determining the appropriate kernel functions
and parameter C.
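To make the procedure of Table 5 concrete, the following is a minimal, self-contained sketch (not the authors' MATLAB/LIBSVM implementation), with scikit-learn's SVC supplying the support vectors; per the argument above, the kernel and C passed to SVC should have little influence on the final decision.

```python
# Sketch of the Table 5 algorithm: SVM in the training phase (to find
# the support vectors), Euclidean distances in the classification phase.
import numpy as np
from sklearn.svm import SVC

class EuclideanSVM:
    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        clf = SVC(kernel="linear", C=1.0).fit(X, y)  # any kernel/C choice
        # Keep only the support vectors, grouped by their category;
        # the remaining training points are discarded.
        sv_labels = y[clf.support_]
        self.sv_by_class_ = {c: clf.support_vectors_[sv_labels == c]
                             for c in np.unique(y)}
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X):
            # Average Euclidean distance to each category's SV set, eq. (7).
            avg = {c: np.linalg.norm(svs - x, axis=1).mean()
                   for c, svs in self.sv_by_class_.items()}
            preds.append(min(avg, key=avg.get))  # lowest average distance
        return np.array(preds)
```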
4 Experimental results
Our proposed Euclidean-SVM text classification framework
has been tested and evaluated using five text corpora. Three
of them were collected by our research group, namely the
Vehicles dataset [8,9,14,23], the Mathematics dataset [8,
9, 14], and the Automobiles dataset [8, 9, 14]. These three
text datasets have been constructed by collecting text arti-
cles from different sources, such as Wikipedia website and
arxiv.org website.
The Vehicles dataset was built by acquiring vehicle-re-
lated articles from Wikipedia website. This dataset consists
of 4 categories of vehicles. All four categories are eas-
ily differentiated in terms of the content since each category
Table 7 List of categories of the mathematics dataset
1. Algebraic Geometry
2. Analysis of PDEs
3. Combinatorics
4. Differential Geometry
5. Mathematical Physics
6. Number Theory
7. Probability
8. Statistics
has its own unique set of keywords. The list of categories of
the Vehicles dataset is illustrated in Table 6.
A dataset containing articles about mathematical topics,
namely the Mathematics dataset, has been acquired by our
research group from arxiv.org website. This dataset consists
of 8 mathematical sub-categories. The list of categories of
the Mathematics dataset is shown in Table 7.
The Automobiles dataset was designed and organized by
collecting articles about automobiles from Wikipedia web-
site. This dataset consists of nine categories of automobile,
differentiated in terms of geographical regions and classifi-
cations. Table 8 illustrates the list of categories of the Auto-
mobiles dataset.
Besides the three text corpora that we constructed by
acquiring documents from different sources and organized
by ourselves, we have also acquired the WebKB dataset and
the Reuters-21578 dataset for more generic evaluations of
the performance of our proposed Euclidean-SVM text clas-
sification approach.
The WebKB collection was originally constructed by
“the World Wide Knowledge Base (Web->Kb) Project of
Table 8 List of categories of the automobiles dataset
1. American Mini Vans
2. American Sports Cars
3. American SUVs
4. Asian Mini Vans
5. Asian Sports Cars
6. Asian SUVs
7. European Mini Vans
8. European Sports Cars
9. European SUVs
Table 9 List of categories of the WebKB dataset
1. Course
2. Faculty
3. Project
4. Student
the CMU Text Learning Group” and this dataset has been
widely used for experiments in text applications of ma-
chine learning techniques, such as text classification and text
clustering [61]. Many research groups have used the We-
bKB dataset to evaluate the performance of their presented
text classification approaches [62–66]. The original WebKB
dataset consists of files collected from computer science de-
partments of various universities in 1997. These documents
were manually classified into seven different categories: stu-
dent, faculty, staff, department, course, project, and other.
In the experiments here, the categories “Department” and
“Staff” were discarded due to the fact that there were only
a few pages from each university. The category “Other” was
also discarded because the documents in this category
vary greatly from each other. In conclusion, the list of
categories in the WebKB dataset which has been used in the
experiments carried out in this paper is illustrated in Table 9.
The Reuters-21578 dataset was originally collected by
Carnegie Group Inc. and Reuters Ltd. This text collection
has been reported as one of the most common benchmark
for text classification approaches and it has been widely
used by text classification research groups in evaluating their
classification models [1, 2, 6, 7, 10, 15, 17, 18, 35, 36].
This dataset consists of documents that appeared on the Reuters
newswire in 1987, and they were manually organized into
categories by personnel from Reuters Ltd. There exist many
versions of Reuters-21578 text collection due to the fact
that different researchers have different evaluation criteria
on their classification models. In our experiment, we have
adopted the Reuters-21578 R8 dataset which is the set of
the 8 categories with the highest number of positive training
data; this collection consists only of single-labeled text
documents. The R8 version of the Reuters-21578 text col-
lection consists of the categories as illustrated in Table 10.
Table 10 List of categories of the Reuters-21578 R8 dataset
1. Acq
2. Crude
3. Earn
4. Grain
5. Interest
6. Money-FX
7. Ship
8. Trade
We have conducted the experiments by implement-
ing the conventional SVM classification approach and
the Euclidean-SVM classification approach independently.
Each of these approaches has been evaluated with the imple-
mentation of different kernel functions and different values
of parameter C during the training phase. We have imple-
mented the conventional SVM classification approach using
MATLAB version 7.6.0.324 (R2008a) with LIBSVM tool-
box version 2.91 [67]. As for the Euclidean-SVM, we im-
plemented the proposed classification approach by using the
same version of MATLAB and LIBSVM toolbox to identify
the set of SVs of each of the categories, and we developed
an additional module which performs the computation of the
average Euclidean distances of the new data point to the set
of SVs of each of the categories, and makes the classification
decision based on the category which has the lowest average
distance between its SVs and the new data point.
In our experiments, we have implemented the classifica-
tion approaches with four common kernel functions for the
SVM: linear kernel, polynomial kernel, radial basis function
(RBF) kernel and sigmoid kernel. As for the parameter C,
the range of values 1, 10^1, 10^2, 10^3, 10^4 and 10^5 has
been applied to both of the tested classifiers. By conduct-
ing the experiments on these two classification approaches
separately with different kernel functions and different val-
ues of parameter C, we are able to evaluate the performance
of each approach and to determine the improvement of the
Euclidean-SVM approach (if any) in contrast to the conven-
tional SVM model in terms of classification accuracy. Be-
sides this, we are also able to evaluate the impact of the
implementation of different kernel functions and parame-
ter C, on the conventional SVM approach, as well as the
Euclidean-SVM approach.
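The experimental protocol can be summarized by a loop of the shape below (a sketch, not the actual MATLAB scripts); the data-loading names are placeholders, and the variance column of Tables 11–15 corresponds to np.var over the accuracies recorded across the tested C values.

```python
# Sketch of the experimental protocol: every kernel is paired with every
# tested value of C, and the variance of accuracies across C is recorded.
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

KERNELS = ["linear", "poly", "rbf", "sigmoid"]
C_VALUES = [1, 10, 100, 1000, 10000, 100000]

def evaluate(X_train, y_train, X_test, y_test):
    for kernel in KERNELS:
        accs = []
        for C in C_VALUES:
            clf = SVC(kernel=kernel, C=C).fit(X_train, y_train)
            accs.append(100 * accuracy_score(y_test, clf.predict(X_test)))
        # Variance across the tested C values (last column, Tables 11-15).
        print(kernel, accs, "variance:", np.var(accs))
```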
4.1 Experiment on the vehicles dataset
The Vehicles dataset consists of 4 categories with a total of
640 documents. Each category consists of 160 documents
where 50 documents were used to build the training set,
and the remaining 110 documents were used for testing pur-
poses. In other words, the Vehicles dataset had been split
Table 11 Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different kernels and different values of parameter C, on the Vehicles dataset (training set: 200 documents; testing set: 440 documents)

Classification approach (kernels)   Accuracy (%) at C = 1, 10, 100, 1000, 10000, 100000   Variance of accuracies across values of C
SVM (Linear) 93.75 93.33 92.92 92.92 92.92 92.92 0.1201
SVM (Polynomial) 92.50 92.50 92.50 92.50 92.50 92.50 0
SVM (RBF) 25.00 25.00 25.00 25.00 25.00 25.00 0
SVM (Sigmoid) 25.00 25.00 25.00 25.00 25.00 25.00 0
Euclidean-SVM (Linear) 93.75 93.33 93.75 94.17 94.17 94.17 0.1176
Euclidean-SVM (Polynomial) 94.17 94.17 94.17 94.17 94.17 94.17 0
Euclidean-SVM (RBF) 93.33 93.33 93.33 93.33 93.33 93.33 0
Euclidean-SVM (Sigmoid) 93.33 93.33 93.33 93.33 93.33 93.33 0
into a training set with 200 documents and a testing set with
440 documents.
Table 11 shows the experimental results of the conven-
tional SVM classifier and the Euclidean-SVM classifier,
which have been implemented with different kernels and dif-
ferent values of parameter C, on the Vehicles dataset.
As illustrated in Table 11, the performance of the conven-
tional SVM classifier on the Vehicles dataset is highly depen-
dent on the implementation of kernel functions. Both linear
kernel and polynomial kernel have contributed high classifi-
cation accuracies to the SVM, in the range
of 92.50% to 93.75%. On the other hand, the SVM clas-
sifier with RBF kernel and the SVM classifier with sigmoid
kernel have performed poorly on the Vehicles dataset, with
accuracies of 25.00%. This is due to the fact that the
implementation of an appropriate kernel function is a necessity
for the SVM classifier to guarantee good generalization abil-
ity. A wrong choice of kernel function will lead
to seriously poor classification performance of the SVM.
In other words, the implementation of kernel functions has
a very high impact on the classification accuracy of the SVM
classification approach.
As for the performance of the Euclidean-SVM classifi-
cation approach on the Vehicles dataset, we have obtained
classification accuracies in the range of 93.33% to
94.17%, with the implementation of different kernels and
different values of parameter C. Hence, we can conclude
that the Euclidean-SVM classification approach is nearly
immune to the implementation of kernel function and pa-
rameter C, while still obtaining good classification accuracy.
In this experiment, parameter C does not have a great im-
pact on either the conventional SVM classifier or the
Euclidean-SVM classifier. As illustrated in Table 11, the
variances of classification accuracies across the tested val-
ues of parameter C for both of the tested classification ap-
proaches are at most approximately 0.12. This is due to the fact
that the data points for each of the available categories in
this dataset are very dissimilar, as they are easily differen-
tiated with their own unique features. Non-separable data
points are hardly found during the classification phase. With
a small number of non-separable points found in the clas-
sification phase, the effect of parameter C, which creates
the soft margin that allows some of the classification errors
when non-separable points occur, becomes minimal in
the classification task.
4.2 Experiment on the mathematics dataset
The Mathematics dataset consists of 8 categories with a to-
tal of 320 documents. Each of the 8 categories consists of
an equal number of 40 documents. 10 documents from each
category were obtained to construct the training set with a
total number of 80 documents, and the remaining 30 docu-
ments from each category were used for building the testing
set with 240 documents.
Table 12 illustrates the experimental results of the con-
ventional SVM classifier and the Euclidean-SVM classifier
with different kernels and different values of parameter C,
on the Mathematics dataset.
Based on Table 12, we can observe that the implementa-
tion of different kernel functions has affected the classifica-
tion performance of the conventional SVM classifier on the
Mathematics dataset. As illustrated in Table 12, both the
linear kernel and the polynomial kernel have contributed rela-
tively high classification accuracies to the SVM. As the
value of parameter C varies, the classification ac-
curacies of the SVM with linear kernel have a variance of
27.8211, while the classification accuracies of the SVM with
Table 12 Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different kernels and different values of parameter C, on the Mathematics dataset (training set: 80 documents; testing set: 240 documents)

Classification approach (kernels)   Accuracy (%) at C = 1, 10, 100, 1000, 10000, 100000   Variance of accuracies across values of C
SVM (Linear) 61.25 74.17 74.17 74.17 74.17 74.17 27.8211
SVM (Polynomial) 70.00 70.00 70.00 70.42 70.00 70.00 0.0290
SVM (RBF) 41.67 41.67 41.67 41.67 41.67 41.67 0
SVM (Sigmoid) 12.50 12.50 12.50 12.50 12.50 12.50 0
Euclidean-SVM (Linear) 75.83 74.17 73.75 73.75 73.75 73.75 0.6922
Euclidean-SVM (Polynomial) 72.92 72.92 72.92 72.92 72.92 72.92 0
Euclidean-SVM (RBF) 75.83 75.83 75.83 75.83 75.83 75.83 0
Euclidean-SVM (Sigmoid) 75.83 75.83 75.83 75.83 75.83 75.83 0
polynomial kernel have the variance of 0.029, which is more
consistent as compared to the classification accuracies of the
SVM with linear kernel. On the other hand, the SVM clas-
sifier with RBF kernel and the SVM classifier with sigmoid
kernel have performed poorly on the Mathematics dataset.
As the value of parameter C varies, the classifica-
tion performance of the SVM with RBF kernel is consistent,
with accuracies of 41.67%, while the SVM with sigmoid
kernel has achieved poor performance with low but consis-
tent accuracies of 12.50%. Given the nature of the conventional
SVM, the inconsistency of the SVM classification perfor-
mance in this experiment is due to the implementation of
different kernel functions. In order to guarantee good gener-
alization ability for the SVM, the determination of the right
kernel function is considered mandatory in this experi-
ment.
On the other hand, the Euclidean-SVM approach does not
depend highly on the implementation of kernel functions
and parameter C. The Euclidean-SVM has achieved accu-
racies in the range of 72.92% to 75.83%, with the
implementation of different kernels and different values of
parameter C. In other words, the Euclidean-SVM classifier
has better consistency in terms of accuracy, with the imple-
mentation of different kernel functions and different values
of parameter C, as compared to the conventional SVM clas-
sifier.
4.3 Experiment on the automobiles dataset
The Automobiles dataset consists of 9 categories, each
consisting of an equal number of 30 documents.
In other words, this dataset consists of a total of 270
documents. 10 documents from each category had been uti-
lized to construct the training set with 90 documents, and
the remaining 20 documents from each category were used
to build the testing set with 180 documents.
Table 13 shows the experimental results of the conven-
tional SVM classifier and the Euclidean-SVM classifier,
which have been implemented with different kernels and dif-
ferent values of parameter C, on the Automobiles dataset.
Table 13 shows that the implementation of kernel func-
tions and parameter C has a very high impact on the classi-
fication performance of the conventional SVM approach,
while the Euclidean-SVM approach does not suffer from
this problem. As illustrated in Table 13, the SVM classifier
with linear kernel has achieved medium classification per-
formance on the Automobiles dataset, with accuracies in
the range of 56.11% to 68.89% (variance of classifi-
cation accuracies is 24.6998), as the value of parameter
C varies from 1 to 10^5. The SVM classifier with polynomial
kernel has achieved consistent but poor classification accu-
racies of 30.56%, as the value of parameter
C varies from 1 to 10^5. The SVM classifier with RBF kernel and
the SVM classifier with sigmoid kernel have achieved the
lowest classification performance in this experiment, with
consistent accuracies of 11.11% while the value of parameter
C varies within the tested range. These results further justify
that the SVM is highly dependent on the implementation
of kernel functions. The implementation of an inappropri-
ate kernel function may lead to a high risk of obtaining low
classification accuracy from the SVM classifier.
While the SVM suffers from the problem of being highly
dependent on the implementation of kernel function, the
Euclidean-SVM has achieved classification accuracies in
the range of 59.44% to 67.78% (variance of classifica-
tion accuracies across the tested values of parameter C is
8.8515), with the implementation of linear kernel and differ-
ent values of parameter C. Even though the gap between the
Table 13 Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different kernels and different values of parameter C, on the Automobiles dataset (training set: 90 documents; testing set: 180 documents)

Classification approach (kernels)   Accuracy (%) at C = 1, 10, 100, 1000, 10000, 100000   Variance of accuracies across values of C
SVM (Linear) 62.78 68.89 58.33 56.11 56.67 57.22 24.6998
SVM (Polynomial) 30.56 30.56 30.56 30.56 30.56 30.56 0
SVM (RBF) 11.11 11.11 11.11 11.11 11.11 11.11 0
SVM (Sigmoid) 11.11 11.11 11.11 11.11 11.11 11.11 0
Euclidean-SVM (Linear) 67.78 64.44 63.33 60.56 59.44 62.22 8.8515
Euclidean-SVM (Polynomial) 67.78 67.78 67.78 67.78 67.78 67.78 0
Euclidean-SVM (RBF) 62.78 62.22 62.22 62.22 62.22 62.22 0
Euclidean-SVM (Sigmoid) 67.78 67.78 67.78 67.78 67.78 67.78 0
Table 14 Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different kernels and different values of parameter C, on the WebKB dataset (training set: 2803 documents; testing set: 1396 documents)

Classification approach (kernels)   Accuracy (%) at C = 1, 10, 100, 1000, 10000, 100000   Variance of accuracies across values of C
SVM (Linear) 48.60 72.68 73.11 74.12 74.33 73.90 104.7953
SVM (Polynomial) 38.99 38.99 38.99 38.99 73.62 73.33 317.1325
SVM (RBF) 62.36 72.83 73.33 74.33 74.40 74.40 22.4632
SVM (Sigmoid) 48.60 72.68 73.11 74.12 74.33 73.97 104.9205
Euclidean-SVM (Linear) 68.60 70.68 68.53 64.01 62.22 61.57 14.5823
Euclidean-SVM (Polynomial) 68.60 68.45 68.67 68.67 69.89 68.17 0.3522
Euclidean-SVM (RBF) 69.75 69.53 67.16 63.51 61.72 62.29 13.0916
Euclidean-SVM (Sigmoid) 68.60 70.68 68.53 64.01 62.22 61.57 14.5823
highest accuracy and the lowest accuracy for the Euclidean-
SVM classifier in this experiment is approximately 8%, the
classification performance of the Euclidean-SVM approach
is still considered to have low dependence on the imple-
mentation of different kernel functions and different values
of parameter C, as compared to the conven-
tional SVM.
4.4 Experiment on the WebKB dataset
The WebKB dataset which had been utilized in our exper-
iments was acquired from Ana Cardoso-Cachopo’s web-
site [68]. It consists of 4 categories with a total of 4199 doc-
uments. The training set is constructed by 2803 documents,
while the testing set consists of 1396 documents.
Table 14 illustrates the experimental results of the con-
ventional SVM classifier and the Euclidean-SVM classifier
with different kernels and different values of parameter C,
on the WebKB dataset.
Table 14 again shows the inconsistency in terms of clas-
sification accuracy for the SVM with different kernel func-
tions and different values of parameter C. Based on Table 14,
we can observe that the SVM has achieved high classifica-
tion accuracies of approximately 74% for every kernel
function, given a suitable value of parameter C. However, an in-
appropriate value of parameter C will severely degrade the
accuracy of the SVM classifiers to below 50% (48.60% for
the SVM with linear kernel, C = 1, and the SVM with sigmoid
kernel, C = 1) or even down to below 40% (38.99% for the
SVM with polynomial kernel, C = 1 to 1000).
On the other hand, the implementation of different ker-
nel functions and different values of parameter C does not
have a high impact on the Euclidean-SVM classification ap-
proach. The Euclidean-SVM has achieved classification ac-
curacies in the range of 61.57% to 70.68%, with the
implementation of different kernels and different values of
parameter C. These results are considered consistent even
though the gap between the highest classification accuracy
and the lowest classification accuracy in this experiment is
approximately 9%, as compared to the conventional SVM,
where the gap between the highest classification accuracy
and the lowest classification accuracy is approximately 35%.
The results in this experiment have further justified that the
Euclidean-SVM has lower dependency on the implementa-
tion of kernel functions and value of parameter C, as com-
pared to the conventional SVM.
In this experiment, parameter C greatly affected the performance of the conventional SVM classifier with different kernel functions. Based on the results illustrated in Table 14, high variances of classification accuracies across the tested values of parameter C, up to 317.13, were recorded for the conventional SVM. This is because the WebKB dataset consists of files collected from the computer science departments of various universities, so the text documents in all of the categories describe closely related computer science topics. Documents from different categories are therefore very similar to each other, and non-separable cases occur frequently during the classification phase. As a result, parameter C has a significant effect on the classification performance of the conventional SVM, because a soft margin is needed to tolerate the classification errors caused by the non-separable cases. The Euclidean-SVM, on the other hand, is not as sensitive as the conventional SVM to the setting of parameter C: low variances of classification accuracies across the tested values of parameter C (less than 15) were recorded in this experiment.
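As an aside on how the variance column is computed, the value of 317.13 quoted above can be reproduced as the sample variance of the six accuracies in the polynomial-kernel SVM row of Table 14; a minimal Python sketch (the row values are copied from the table):

```python
# Reproduce the "variance across values of parameter C" column of Table 14
# for the polynomial-kernel SVM row (sample variance of the six accuracies).
import statistics

accuracies = [38.99, 38.99, 38.99, 38.99, 73.62, 73.33]  # C = 1 ... 100000
print(round(statistics.variance(accuracies), 4))  # -> 317.1325, as in Table 14
```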
4.5 Experiment on the Reuters-21578 R8 dataset
The Reuters-21578 R8 dataset used in our experiments was acquired from Ana Cardoso-Cachopo's website [68], the same source from which the WebKB dataset was obtained. This collection consists of 7670 documents categorized into 8 categories, divided into a training set of 5483 documents and a testing set of 2187 documents.
Table 15 illustrates the experimental results of the con-
ventional SVM classifier and the Euclidean-SVM classifier
with different kernels and different values of parameter C,
on the Reuters-21578 R8 dataset.
The results illustrated in Table 15 show that, compared to the conventional SVM approach, the Euclidean-SVM has better consistency in terms of classification accuracy across the implementation of different kernel functions and different values of parameter C. The conventional SVM classifier scored high accuracies in this experiment only when an appropriate combination of kernel function and parameter C was implemented. The best classification accuracy of 94.97% was achieved by the conventional SVM with the linear kernel function and parameter C set at 100000. In general, however, the conventional SVM achieved high classification accuracies (in the range of 87.75% to 94.97%) only when the value of parameter C was high (C = 100000), and this does not hold at all for the polynomial kernel function, with which the SVM classifier scores its lowest accuracy of 49.52%.
Table 15 Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different kernels and different values of parameter C, on the Reuters-21578 R8 dataset (training set: 5483 documents; testing set: 2187 documents). Entries are classification accuracies (%); the last column is the variance of the accuracies across the tested values of C.

Classification approach (kernel)   C=1      C=10     C=100    C=1000   C=10000  C=100000   Variance
SVM (Linear)                       52.17    87.75    93.14    94.19    94.51    94.97      283.6759
SVM (Polynomial)                   49.52    49.52    49.52    49.52    49.52    49.52      0
SVM (RBF)                          49.52    49.52    49.52    49.52    66.16    89.21      264.6682
SVM (Sigmoid)                      49.52    49.52    49.52    49.52    52.17    87.75      238.0053
Euclidean-SVM (Linear)             81.48    77.73    80.43    63.69    65.11    54.55      120.2538
Euclidean-SVM (Polynomial)         82.72    82.72    82.81    82.72    82.72    82.72      0.0014
Euclidean-SVM (RBF)                82.30    82.30    82.30    82.30    80.80    77.05      4.4438
Euclidean-SVM (Sigmoid)            84.73    84.73    82.58    81.48    81.48    77.73      6.7854

As illustrated in Table 15, in general, we can observe that the SVM
classifier with the linear kernel function obtained high accuracies (in the range of 87.75% to 94.97%) across the different values of parameter C, except when parameter C is set at 1. However, when an inappropriate kernel function is implemented, the classification performance of the SVM is severely degraded and low classification accuracies are obtained. The SVM classifier with the polynomial kernel function achieved only the baseline accuracy of 49.52% in this experiment, even as the value of parameter C was varied from 1 to 100000. As for the SVM classifiers with the RBF and sigmoid kernel functions, low classification accuracies were recorded unless the right value of parameter C was applied. The results of this experiment prove again that the performance of the conventional SVM classifier is highly dependent on the implementation of kernel functions and parameter C.
As for the Euclidean-SVM classification approach, even though its highest recorded classification accuracy (84.73%, with the sigmoid kernel and C = 1 to 10) is lower than the highest recorded classification accuracy of the conventional SVM (94.97%, with the linear kernel function and parameter C = 100000), the overall consistency in terms of accuracy across different kernel functions and values of parameter C is much better for the Euclidean-SVM than for the conventional SVM. As illustrated in Table 15, the highest variance of classification accuracies across the tested values of parameter C is 120.25 for the Euclidean-SVM, against 283.68 for the conventional SVM. In most combinations of kernel function and parameter C, the Euclidean-SVM achieved high classification accuracies (77.05% to 84.73%), except when the linear kernel function is implemented with parameter C in the range from 1000 to 100000. The conventional SVM approach, on the other hand, achieved low classification accuracies in most combinations of kernel function and parameter C, the exceptions being the linear kernel function with parameter C in the range from 10 to 100000, the RBF kernel function with parameter C set at 100000, and the sigmoid kernel function with parameter C set at 100000. These experimental results further confirm that the Euclidean-SVM achieves better overall performance and a lower dependency on the implementation of kernel functions and the value of parameter C than the conventional SVM.
As illustrated in Table 15, the Euclidean-SVM achieved a low accuracy when the linear kernel function is applied and parameter C is set at 100000. This is because a high value of parameter C leads the SVM training algorithm towards overfitting [17], so that fewer training data points are identified as SVs. In such a situation, the Euclidean-SVM lacks sufficient information for computing the average Euclidean distance between the input data points and each set of SVs from the different categories, and the classification performance is degraded. This problem can be avoided by setting parameter C to a low value. Based on the results obtained in this experiment, values of parameter C in the range from 1 to 100 lead the Euclidean-SVM classification approach to good accuracies (approximately 80%), regardless of the kernel function implemented.
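The inverse relationship between C and the number of SVs described above is easy to observe empirically. A small sketch (assuming scikit-learn's SVC as the SVM implementation, with synthetic data standing in for the text vectors):

```python
# Illustrate that a smaller soft margin parameter C retains more support
# vectors, giving the Euclidean distance step more points to average over.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
for C in [1, 100, 100000]:
    model = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C = {C:>6}: {model.n_support_.sum()} support vectors")
```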
4.6 Comparison of the performance of the Bayesian
vectorization technique and TFIDF vectorization
technique
In this paper, the Bayesian vectorization technique was utilized to transform textual data into numerical format. The TFIDF (Term Frequency-Inverse Document Frequency) technique, on the other hand, has been reported as one of the most widely used pre-processing techniques by many text mining research groups for the same purpose. To validate that the enhancement of our proposed Euclidean-SVM classification framework is contributed by the implementation of the Euclidean distance function to replace the optimal separating hyper-plane of the conventional SVM classification approach, rather than by the preprocessing technique, we conducted an additional experiment using the Reuters-21578 dataset to compare the performance of the Bayesian
vectorization over the TFIDF vectorization as the preprocessing technique for the SVM classifier. The experimental results of this comparison are presented in Table 16 below.

Table 16 Comparison of the TFIDF-SVM classifiers and the Bayesian-SVM classifiers with different kernel functions (dataset: Reuters-21578 R8; training set: 5483 documents; testing set: 2187 documents). Entries are classification accuracies (%).

Vectorization technique-classifier   Linear   Polynomial   RBF      Sigmoid
TFIDF-SVM                            90.29    80.14        90.58    90.29
Bayes-SVM                            94.97    92.87        94.97    94.92
Based on the comparison conducted using the Reuters-21578 dataset, the Bayesian-SVM classifiers always outperform the TFIDF-SVM classifiers, for all tested types of kernel function. The results presented in Table 16 show that the Bayesian vectorization technique provides a better transformation of textual data to numerical data for the SVM classifier than the TFIDF vectorization technique. In our previous works [14, 60], the SVM classifier with the Bayesian vectorization was likewise shown to outperform the SVM classifier with the TFIDF vectorization technique in terms of both classification accuracy and time consumption.
As the Bayesian vectorization contributes to the improvement of the classifier in the pre-processing stage, the Euclidean-SVM approach further improves the conventional SVM by replacing the optimal separating hyper-plane with the Euclidean distance function in making the classification decision. We also conducted an experiment on the Euclidean-SVM classification approach with the two vectorization techniques, to further justify that the main contribution of our proposed classification framework over the conventional SVM approach is delivered by the implementation of the Euclidean distance function in place of the optimal separating hyper-plane, rather than by the Bayesian vectorization preprocessing technique. Table 17 illustrates the results of the comparison between the TFIDF-Euclidean-SVM classification approach and the Bayesian-Euclidean-SVM classification approach.

Table 17 Comparison of the TFIDF-Euclidean-SVM classifier and the Bayesian-Euclidean-SVM classifier (dataset: Reuters-21578 R8; training set: 5483 documents; testing set: 2187 documents)

                               TFIDF-Euclidean-SVM   Bayesian-Euclidean-SVM
Classification accuracy (%)    1.02                  84.73
Training time (hh:mm:ss)       00:00:29              00:00:06
Testing time (hh:mm:ss)        27:50:32              02:51:29
As illustrated in Table 17, the TFIDF-Euclidean-SVM classifier performs badly, with an accuracy of only 1.02%. This is because the high dimensionality of the data vectorized by TFIDF (approximately 18,000 dimensions, equal to the number of distinct words in the text collection) leads to high computational complexity for the Euclidean-SVM approach in making classification decisions. As the Euclidean distance function may suffer from the curse of dimensionality, high-dimensional data can severely degrade the effectiveness and efficiency of the classification process through convoluted computation, hence the poor performance of the Euclidean-SVM classifier. On the other hand, the transformation of data by the Bayesian vectorization technique, which reduces the dimensionality from thousands to typically fewer than one hundred (the number of categories a document may be classified into), contributes to the better performance of the Euclidean-SVM classification approach, as the computational cost of the Euclidean distance function is drastically reduced. The Bayesian-Euclidean-SVM classifier achieved a classification accuracy of 84.73%.
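To make the dimensionality contrast concrete, the following sketch illustrates the idea as we read it from [14]: the Bayesian vectorization represents a document by its per-category posterior probabilities, so the feature dimension equals the number of categories. Here 20 Newsgroups and scikit-learn are stand-ins for the actual datasets and implementation:

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

train = fetch_20newsgroups(subset="train")

# TFIDF vectorization: one dimension per vocabulary word (tens of thousands).
tfidf = TfidfVectorizer().fit_transform(train.data)
print("TFIDF dimensions:   ", tfidf.shape[1])

# Bayesian vectorization: one dimension per category (here, 20).
counts = CountVectorizer().fit_transform(train.data)
nb = MultinomialNB().fit(counts, train.target)
print("Bayesian dimensions:", nb.predict_proba(counts).shape[1])
```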
Besides this, the high dimensionality of the TFIDF-vectorized data also results in higher training and testing time consumption for the Euclidean-SVM classifier. In this experiment, the TFIDF-Euclidean-SVM classifier recorded a training time of 29 seconds and a testing time of 27 hours, 50 minutes and 32 seconds. The Bayesian-Euclidean-SVM classifier, on the other hand, recorded a training time of 6 seconds and a testing time of 2 hours, 51 minutes and 29 seconds. The time consumption of the TFIDF-Euclidean-SVM classifier, especially the testing time, is much higher than that of the Bayesian-Euclidean-SVM classifier. This is again due to the high dimensionality of the TFIDF-vectorized data, which leads to high computational complexity in the Euclidean-SVM classification approach. The system used for running both the TFIDF-Euclidean-SVM approach and the Bayesian-Euclidean-SVM approach was an Intel Core i3 CPU 550 at 3.2 GHz, with 2 GB of RAM, running Windows 7 Home Basic 32-bit.
Based on the experimental results presented in this section, the Bayesian vectorization technique enhances the pre-processing stage of the conventional SVM approach, which typically implements the TFIDF vectorization technique; this has also been shown in our previous works [14, 60]. In the classification phase, the Euclidean-SVM approach further improves the performance of the conventional SVM approach, obtaining better effectiveness and efficiency in performing classification tasks. In conclusion, the Bayesian vectorization technique and the Euclidean distance function provide a combination of enhancements to the conventional SVM approach that improves the performance of the baseline approach in terms of both classification accuracy and time consumption.
4.7 Discussion on the experimental results
Based on the results obtained from a series of experiments using different text datasets, we found that the performance of the Euclidean-SVM classification framework has a low dependency on the implementation of kernel functions and parameter C. For all five datasets used in our experiments, high classification accuracy can always be obtained by the Euclidean-SVM classifier with the linear kernel function and a small value of parameter C. In all of our experiments, on each of the five datasets, when the linear kernel is used and parameter C is set to 1, the Euclidean-SVM classifier always outperforms the conventional SVM classifier. This shows that, by performing classification tasks using the Euclidean-SVM approach, high accuracies can be obtained without transforming the original vector space into a high dimensional feature space using kernel functions. This is because the Euclidean distance function can perform effective classification decision making as long as all the training data points (the SVs) and the input data points are mapped into the same vector space. Besides this, the selection of an optimal value of parameter C can also be avoided in the Euclidean-SVM classification approach. As shown by the experimental results, in most cases the variances of classification accuracies across different values of parameter C are much higher for the SVM approach than for the Euclidean-SVM approach. This reiterates that the Euclidean-SVM approach has less dependency on the value of parameter C. Moreover, the Euclidean-SVM classification approach performs well with a small value of parameter C (based on our experiments, the optimal value for C is 1), whereas a small value of parameter C leads to underfitting for the conventional SVM classifier [17]. According to SVM methodology, a small value of parameter C leads to more training data points being identified as SVs. When more SVs are identified during the training phase, more information is available for the computation of the average Euclidean distance between the input data points and each set of SVs from the different categories; therefore, more accurate classification results are obtained by the Euclidean-SVM approach. In contrast to the conventional SVM classification approach, which conducts iterative processes to determine the right kernel function and the appropriate value of parameter C, the convoluted and computationally intensive training process and the preparation of an additional validation set can be avoided by implementing the Euclidean-SVM classification approach. In conclusion, our proposed Euclidean-SVM approach contributes to a more effective and efficient classification task, with the unique characteristic of being independent of the implementation of different kernel functions and different values of parameter C.
Besides this, in our experiments, when the SVs are identified using the SVM approach with certain kernel functions, such as the RBF kernel and the sigmoid kernel, the Euclidean-SVM approach drastically outperforms the conventional SVM approach, even though both approaches use the same SVs. This situation can be observed in the experiments on the Vehicles dataset, the Mathematics dataset and the Automobiles dataset. Based on our analysis, this is because, in the conventional SVM classification phase, the testing data point is weighted by the alpha (α_i) value of each support vector and the contributions are summed in order to determine the category of the testing data point, using (8) [37].
F(x) = \mathrm{sign}\Bigl(\sum_{i=1}^{l} y_i \alpha_i K(x, x_i) + b\Bigr)   (8)
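Read directly, (8) is a kernel-weighted vote over the support vectors; a minimal Python transcription (a sketch: `svs`, `sv_labels`, `alphas`, `b` and `kernel` are assumed to come from a trained SVM):

```python
import numpy as np

def svm_decision(x, svs, sv_labels, alphas, b, kernel):
    """Equation (8): sign of the alpha-weighted kernel sum over the SVs."""
    total = sum(y_i * a_i * kernel(x, sv)
                for sv, y_i, a_i in zip(svs, sv_labels, alphas))
    return np.sign(total + b)

def rbf_kernel(x, z, gamma=0.5):
    """Example kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.linalg.norm(np.asarray(x) - np.asarray(z)) ** 2)
```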
The α_i values play a significant role in the performance of the conventional SVM approach, and wrongly weighted SVs play a very strong role in misclassification. The hyper-plane resulting from a kernel function that is not optimized yields α_i values that are not accurate. This leads to the low classification accuracy experienced by the conventional SVM classifier with non-optimal kernels.
The Euclidean-SVM approach, on the other hand, computes the average distance between the testing data point and each set of support vectors from the different categories before making the classification decision. Hence, the distance and location of the testing data point and the SVs are given higher priority in the classification process, and there is no reliance on the α_i weight values. The average distance also does not change drastically when the SVs change due to kernel manipulation. By using the average Euclidean distance, the effect of the wrongly weighted SVs resulting from a non-optimal kernel function is diluted. This causes the accuracy of the Euclidean-SVM approach to be much higher and more consistent than that of the conventional SVM approach, even though the SVs used are the same for both approaches.
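This decision rule can be written down compactly; a minimal sketch of the description above (an illustration, not the authors' implementation; the SV coordinates are hypothetical):

```python
import numpy as np

def euclidean_svm_predict(x, svs_by_category):
    """Assign x to the category whose SVs have the lowest average
    Euclidean distance to x; svs_by_category maps label -> array of SVs."""
    avg_dist = {
        label: np.mean(np.linalg.norm(svs - x, axis=1))
        for label, svs in svs_by_category.items()
    }
    return min(avg_dist, key=avg_dist.get)

# Usage with hypothetical SVs for two categories:
svs_by_category = {
    "sport": np.array([[0.9, 0.1], [0.8, 0.2]]),
    "politics": np.array([[0.1, 0.9], [0.2, 0.7]]),
}
print(euclidean_svm_predict(np.array([0.85, 0.15]), svs_by_category))  # sport
```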
Another possible cause of this situation is the setting of a constant value for one of the kernel parameters of the SVM. Changing the value of this kernel parameter varies the construction of the optimal separating hyper-plane and the number of SVs found during the training phase, hence resulting in different classification accuracies for the conventional SVM approach as well as the Euclidean-SVM approach. The key point is that, in terms of classification accuracy, the Euclidean-SVM approach has better consistency than the conventional SVM approach across the range of values of parameter C and the types of kernel function.
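This effect is also straightforward to observe empirically; a sketch (scikit-learn assumed; the RBF width gamma is used as an illustrative stand-in for the kernel parameter held constant in our experiments):

```python
# Varying a kernel parameter (here RBF gamma) changes the separating
# hyper-plane and, with it, the set of training points retained as SVs.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=1)
for gamma in [0.01, 0.1, 1.0]:
    model = SVC(kernel="rbf", gamma=gamma, C=10).fit(X, y)
    print(f"gamma = {gamma}: {model.n_support_.sum()} support vectors")
```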
As for the computational complexity of the Euclidean-SVM classification approach: since it contributes an enhanced SVM classification framework that is more independent of the implementation of kernel functions and parameters, the trade-off is that the classification time consumption of the Euclidean-SVM approach is higher than that of the conventional SVM. The classification complexity of the Euclidean-SVM approach depends on the dimensionality of the data points and on the number of SVs generated in the training phase. This is because the Euclidean-SVM approach inherits a characteristic of the nearest neighbor approach, calculating the distances between the new input data point and each set of SVs from the different categories using the Euclidean distance formula. Based on our experimental results, we found that the Euclidean-SVM classifier performs well when the value of parameter C is small, which yields a high number of SVs during the training phase. Due to the high number of SVs, a large amount of computational time is consumed in the classification phase for calculating the average Euclidean distance between the input data points and each set of SVs from the different categories. The higher classification time consumption of our proposed Euclidean-SVM approach compared to the conventional SVM approach is reasonable, since the training time of the Euclidean-SVM model is much less than the training time of a conventional SVM classifier that implements evolutionary algorithms to determine the optimal combination of kernel functions and parameters. Besides this, the relatively consistent classification accuracy of the Euclidean-SVM model across all ranges of kernel functions and values of parameter C, as compared to the conventional SVM, is one of the outstanding characteristics of our proposed Euclidean-SVM classification framework.
5 Conclusion
A new text classification framework is presented and described here. The Euclidean-SVM classification approach is shown to have a low dependency on the implementation of the kernel function and the soft margin parameter C. The classification accuracy of the Euclidean-SVM approach is relatively consistent across the implementation of different kernel functions and different values of parameter C, as compared to the conventional SVM, whose classification accuracy is severely degraded by the implementation of an inappropriate kernel function or value of parameter C. This is achieved through the implementation of the Euclidean distance function to replace the optimal separating hyper-plane as the classification decision making function of the SVM. Unlike the optimal separating hyper-plane of the conventional SVM, whose construction is highly dependent on kernel functions, the Euclidean distance function can perform effective classification decision making as long as all the training data points and the input data points are in the same vector space. Hence, the issue of selecting an appropriate kernel function and value of parameter C is avoided in the Euclidean-SVM classification framework. However, the classification phase of the Euclidean-SVM approach consumes more time than that of the conventional SVM. Besides this, for certain classification tasks where the similarity between categories is high, for example on the WebKB dataset used in our experiments, the classification accuracy of the Euclidean-SVM approach is lower than that of the conventional SVM approach. This is because the Euclidean distance calculation, which inherits a characteristic of the nearest neighbor approach, may suffer from the curse of dimensionality, leading to inefficient classification. As future work, we will investigate alternative distance and similarity measurement functions to replace the Euclidean distance function, which may reduce the time consumption of the distance or similarity calculation and contribute to a more accurate distance or similarity measurement between the SVs and the input data point, hence leading to a more effective and efficient SVM-based text classification framework.
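As a pointer to that future work, the decision rule is agnostic to the metric used; a sketch of swapping in cosine distance for the Euclidean norm (the function names are ours, for illustration only):

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more alike."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def predict_with_metric(x, svs_by_category, metric=cosine_distance):
    """Category whose SVs have the lowest average distance to x."""
    avg = {label: np.mean([metric(sv, x) for sv in svs])
           for label, svs in svs_by_category.items()}
    return min(avg, key=avg.get)
```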
References
1. Han EH, Karypis G, Kumar V (1999) Text categorization us-
ing weighted adjusted k-nearest neighbor classification. Techni-
cal Report, Department of Computer Science and Engineering,
Army HPC Research Centre, University of Minnesota, Minneapo-
lis, USA
2. He J, Tan AH, Tan CL (2003) On machine learning methods for
Chinese document categorization. Appl Intell 18(3):311–322
3. Androutsopoulos I, Koutsias J, Chandrinos KV, Spyropoulos
CD (2000) An experimental comparison of Naïve Bayesian and
keyword-based anti-spam filtering with personal e-mail messages.
In: Proceedings of the 23rd annual international ACM SIGIR con-
ference on research and development in information retrieval, pp
160–167
4. Chen JN, Huang HK, Tian SF, Qu YL (2009) Feature selec-
tion for text classification with Naïve Bayes. Expert Syst Appl
36(3):5423–5435
5. Domingos P, Pazzani M (1997) On the optimality of the simple
Bayesian classifier under zero-one loss. Mach Learn 29(2–3):103–
130
6. Eyheramendy S, Genkin A, Ju WH, Lewis D, Madigan D (2003)
Sparse Bayesian classifiers for text categorization. Technical
Report, Department of Statistics, Rutgers University, 2003.
URL:http://www.stat.rutgers.edu/~madigan/PAPERS/jicrd-v13.
pdf
7. Kim SB, Rim HC, Yook DS, Lim HS (2002) Effective methods for
improving Naïve Bayes text classification. In: Proceedings of the
7th Pacific Rim international conference on artificial intelligence.
Springer, Heidelberg, pp 414–423
8. Lee LH, Isa D, Choo WO, Chue WY (2010) Tournament structure
ranking techniques for Bayesian text classification with highly
similar categories. J Appl Sci—Asian Netw Sci Inf 10(13):1243–
1254
9. Lee LH, Isa D (2010) Automatically computed document depen-
dent weighting factor facility for Naïve Bayes classification. Ex-
pert Syst Appl 37(12):8471–8478
10. McCallum A, Nigam K (1998) A comparison of event models for
Naïve Bayes text classification. In: AAAI-98 workshop on learn-
ing for text categorization, pp 41–48
11. O’Brien C, Vogel C (2003) Spam filters: Bayes vs. chi-squared.
Letters vs. words. In: Proceedings of the 1st international sympo-
sium on information and communication technologies, pp 298–
303
12. Sahami M, Dumais S, Heckerman D, Horvitz E (1998) A Bayesian
approach to filtering junk e-mail. In: AAAI-98 workshop on learn-
ing for text categorization, Madison, Wisconsin, pp 55–62
13. Diederich J, Kindermann J, Leopold E, Paass G (2003) Author-
ship attribution with support vector machines. Appl Intell 19(1–
2):109–123
14. Isa D, Lee LH, Kallimani VP, Rajkumar R (2008) Text document
pre-processing with the Bayes formula for classification using the
support vector machine. IEEE Trans Knowl Data Eng 20(9):1264–
1272
15. Joachims T (1998) Text categorization with support vector ma-
chines: learning with many relevant features. In: Proceedings of
the 10th European conference on machine learning (ECML-98),
pp 137–142
16. Joachims T (1999) Making large-scale SVM learning practical. In:
Advances in kernel methods: support vector learning, pp 169–
184
17. Joachims T (2002) Learning to classify text using Support Vector
Machines. Kluwer Academic Publishers, Dordrecht
18. Nigam K, Lafferty J, McCallum A (1999) Using maximum en-
tropy for text classification. In: Proceedings of the IJCAI-99 work-
shop on machine learning for information filtering, pp 61–67
19. Greiner R, Schaffer J (2001) AIxploratorium—decision trees.
Department of Computing Science, University of Alberta, Ed-
monton, AB T6G 2H1, Canada. URL:http://www.cs.ualberta.
ca/~aixplore/learning/DecisionTrees
20. Apte C, Damerau F, Weiss SM (1994) Automated learning of de-
cision rules for text categorization. ACM Trans Inf Sys 12(3):233–
251
21. Apte C, Damerau F, Weiss SM (1994) Towards language indepen-
dent automated learning of text categorization models. In: Pro-
ceedings of the 17th annual international ACM-SIGIR conference
on research and development in information retrieval, pp 23–30
22. Chen CM, Lee HM, Hwang CW (2005) A hierarchical neural net-
work document classifier with linguistic feature selection. Appl
Intell 23(3):5423–5435
23. Isa D, Kallimani VP, Lee LH (2009) Using self-organizing map for
clustering of text document. Expert Syst Appl 36(5):9584–9591
24. Lee CH, Yang HC (2003) A multilingual text mining approach
based on self-organizing maps. Appl Intell 18(3):295–310
25. Bosnic Z, Kononenko I (2008) Estimation of individual predic-
tion reliability using the local sensitivity analysis. Appl Intell
29(3):187–203
26. Hao PY, Chiang JH, Lin YH (2009) A new maximal-margin
spherical-structured multi-class support vector machine. Appl In-
tell 30(2):98–111
27. Kocsor A, Toth L (2004) Application of kernel-based feature space
transformations and learning methods to phoneme classification.
Appl Intell 21(2):129–142
28. Kyriacou E, Pattichis MS, Pattichis CS, Mavrommatis A,
Christodoulou CI, Kakkos S, Nicolaides A (2009) Classification
of atherosclerotic carotid plaques using morphological analysis on
ultrasound images. Appl Intell 30(1):3–23
29. Li YM, Lai CY, Kao CP (2011) Building a qualitative recruitment
system via SVM with MCDM approach. Appl Intell 35(1):75–88
30. Li C, Liu K, Wang H (2011) The incremental learning algorithm
with support vector machine based on hyperplane-distance. Appl
Intell 34(1):19–27
31. Maglogiannis I, Zafiropoulos E, Anagnostopoulos I (2009) An in-
telligent system for automated breast cancer diagnosis and prog-
nosis using svm based classifiers. Appl Intell 30(1):24–36
32. Mahmoud SA, Al-Khatib WG (2010) Recognition of Arabic
(Indian) bank check digits using log-Gabor filters. Appl Intell.
doi:10.1007/s10489-010-0235-2
33. Maudes J, Rodriguez JJ, Garcia-Osorio C, Pardo C (2011)
Random projections for linear SVM ensembles. Appl Intell
34(3):347–359
34. Yu B, Yang Z (2009) A dynamic holding strategy in public transit
systems with real-time information. Appl Intell 31(1):69–80
35. Chakrabarti S, Roy S, Soundalgekar MV (2003) Fast and accu-
rate text classification via multiple linear discriminant projection.
VLDB J 12(2):170–185
36. Yang YM, Liu X (1999) A re-examination of text categorization
methods. In: Proceedings of the 22nd annual international ACM
SIGIR conference on research and development in information re-
trieval (SIGIR’99), pp 42–49
37. Haykin S (1999) Neural network, a comprehensive foundation,
2nd edn. Prentice Hall, New York
38. Burges CJC (1998) A tutorial on Support Vector Machines
for pattern recognition. Bell Laboratories, Lucent Technologies.
Data Mining and Knowledge Discovery. URL:http://research.
microsoft.com/~cburges/papers/SVMTutorial.pdf
39. Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern
analysis. Cambridge University Press, Cambridge
40. Alpaydin E (2004) Introduction to machine learning. MIT Press,
Cambridge
41. Hsu CW, Lin CJ (2002) A comparison of methods for multiclass
support vector machines. IEEE Trans Neural Netw 13(2):415–425
42. Staelin C (2003) Parameter selection for Support Vector Ma-
chines. Technical Report HPL-2002-354R1, Hewlett Packard Lab-
oratories
43. Quang AT, Zhang QL, Li X (2002) Evolving Support Vector Ma-
chine parameters. In: Proceedings of the 1st international confer-
ence on machine learning and cybernetics, pp 548–551
44. Friedrichs F, Igel C (2004) Evolutionary tuning of multiple SVM
parameters. In: Proceedings of European symposium on artificial
neural networks (ESANN’2004), pp 519–524
45. Briggs T, Oates T (2005) Discovering domain-specific composite
kernels. In: Proceedings of the 20th national conference of artifi-
cial intelligence. AAAI Press, Menlo Park, pp 732–738
46. Dong Y, Xia Z, Tu M (2007) Selecting optimal parameters
in Support Vector Machines. In: Proceedings of the IEEE 6th
international conference on machine learning and applications
(ICMLA07).
47. Avci E (2009) Selecting of the optimal feature subset and kernel
parameters in digital modulation classification by using hybrid ge-
netic algorithm-support vector machines: HGASVM. Expert Syst
Appl 36(2):1391–1402
48. Zhang Q, Shan G, Duan X, Zhang Z (2009) Parameters optimiza-
tion of Support Vector Machine based on simulated annealing and
genetic algorithm. In: Proceedings of the IEEE international con-
ference on robotics and biomimetics, pp 1302–1306
49. Diosan L, Rogozan A, Pecuchet JP (2010) Improving clas-
sification performance of Support Vector Machine by geneti-
cally optimising kernel shape and hyper-parameters. Appl Intell. doi:10.1007/s10489-010-0260-1
50. Sun J (2008) Fast tuning of SVM kernel parameter using dis-
tance between two classes. In: Proceedings of the 3rd interna-
tional conference on intelligent system and knowledge engineer-
ing (ISKE2008), pp 108–113
51. Sun J, Zheng C, Li X, Zhou Y (2010) Analysis of the distance be-
tween two classes for tuning SVM hyperparameters. IEEE Trans
Neural Netw 21(2):305–318
52. Wu KP, Wang SD (2009) Choosing the kernel parameters for Sup-
port Vector Machines by the inter-cluster distance in the feature
space. Pattern Recognit 42(5):710–717
53. Buck TAE, Zhang B (2006) SVM kernel optimization: an example
in yeast protein subcellular localization prediction. Project Report,
School of Computer Science, Carnegie Mellon University, Pitts-
burgh, USA
54. Doniger S, Hofmann T, Yeh J (2002) Predicting CNS permeability
of drugs molecules: comparison of neural network and Support
Vector Machines algorithms. J Comput Biol 9(6):849–864
55. Kim H, Cha S (2005) Empirical evaluation of SVM-based
masquerade detection using UNIX commands. Comput Secur
24(2):160–168
56. Li H, Jiang T (2004) A class of edit kernels for SVMs to predict
translation initiation in eukaryotic mRNAs. In: Proceedings of the
8th annual international conference on research in computational
molecular biology, pp 262–271
57. Lu M, Chen CLP, Huo J, Wang X (2008) Optimization of combined
kernel function for SVM based on large margin learning theory.
In: Proceedings of the IEEE international conference on systems,
man and cybernetics (SMC 2008), pp 353–358
58. Schölkopf B, Burges CJC, Smola AJ (1999) Advances in kernel
methods: support vector learning. MIT Press, Cambridge
59. Yuan R, Li Z, Guan X, Xu L (2010) An SVM-based machine
learning method for accurate Internet traffic classification. Inf Syst
Front 12(2):149–156
60. Lee LH, Rajkumar R, Isa D (2010) Automatic folder allocation
system using Bayesian-support Vector Machines hybrid classifi-
cation approach. Appl Intell. doi:10.1007/s10489-010-0261-0
61. Craven M, DiPasquo D, Freitag D, McCallum A, Mitchell T,
Nigam K, Slattery S (1998) Learning to construct knowledge
bases from the World Wide Web. In: Proceedings of the 15th na-
tional conference for artificial intelligence, pp 509–516
62. Callut J, Franscoisse K, Saerens M, Dupont P (2008) Semi-
supervised classification from discriminative random walks. In:
Proceedings of the 2008 European conference on machine learn-
ing and knowledge discovery in databases—Part 1 (ECML PKDD
’08), pp 162–177
63. Ko Y, Seo J (2009) Text classification from unlabeled documents
with bootstrapping and feature projection techniques. Inf Process
Manag 45(1):70–83
64. Li T, Zhu S, Ogihara M (2008) Text categorization via generalized
discriminant analysis. Inf Process Manag 44(5):1684–1697
65. Xue XB, Zhou ZH (2009) Distributional features for text catego-
rization. IEEE Trans Knowl Data Eng 21(3):428–442
66. Zhang D, Mao R (2008) A new kernel for classification of net-
worked entities. In: Proceedings of the 6th international workshop
on mining and learning with graphs, Helsinki, Finland
67. Chang C, Lin C (2001) LIBSVM: a library for support vec-
tor machines. Software available at: http://www.csie.ntu.edu.tw/~
cjlin/libsvm
68. Cardoso-Cachopo A (2011) Datasets for single label text catego-
rization. Artificial Intelligence Group, Department of Information
Systems and Computer Science, Instituto Superior Tecnico, Por-
tugal. URL:http://web.ist.utl.pt/~acardoso/datasets/
Lam Hong Lee received a Bache-
lor of Computer Science from Uni-
versiti Putra Malaysia in 2004, and
a Ph.D. degree in Computer Sci-
ence from the University of Not-
tingham in 2009. He joined Univer-
siti Tunku Abdul Rahman, Malaysia
in March 2009 as an assistant
professor. His current research in-
terest lies in improving text catego-
rization using AI techniques, specif-
ically Support Vector Machines. Be-
sides this, he is also investigating
the implementation of data mining,
pattern recognition and machine learning techniques in various kinds
of intelligent systems.
Chin Heng Wan received his Bach-
elor of Information System (Hons)
in Information System Engineering,
from Universiti Tunku Abdul Rah-
man, Malaysia, in 2005. He is cur-
rently pursuing a Master of Com-
puter Science in Universiti Tunku
Abdul Rahman, Malaysia. His re-
search interests are in Artificial In-
telligence, Machine Learning, Pat-
tern Recognition, Text Mining, and
Intelligent Systems.
Rajprasad Rajkumar received his
Ph.D. degree and Master’s degree
from the University of Nottingham
in 2011 and 2005 respectively, and
Bachelor of Engineering in Electri-
cal and Electronic from Universiti
Tenaga Nasional in 2003. He is cur-
rently working at the Department
of Electrical and Electronic Engi-
neering, University of Nottingham
Malaysia Campus as an Assistant
Professor. His main research inter-
ests are in the use of support vec-
tor machines and signal processing
techniques in various domains. He is currently working in the area
of nondestructive testing, remote sensing, text document classification
and developing unsupervised learning techniques in real-time systems.
Dino Isa is a Professor in the De-
partment of Electrical and Elec-
tronics Engineering of the Univer-
sity of Nottingham, Malaysia Cam-
pus. He obtained a BSEE (Hons)
from the University of Tennessee,
USA in 1986 and a Ph.D. from the
University of Nottingham, UK in
1991. Following his Ph.D., he was
employed as Engineering Section
Head in Motorola’s Power Products
Division in Seremban, Malaysia.
Subsequently he was recruited as
Plant Manager and then promoted
to Chief Technology Officer of Crystal Clear Technology (CCT), a
subsidiary of the Malaysian government’s investment arm, Khazanah
Nasional Berhad. He spent three years in the Westlake Village facility
in California as Director of Operations in the R&D phase of the project
before resuming his duties in CCT Malaysia. He joined the Univer-
sity of Nottingham in 2001. To date Prof. Isa has won five research
contracts worth RM 6,500,000 (£ 1,000,000) while at the University.
His research interest lies in the application of Machine Learning tech-
niques for various kinds of problems. The main aim of his research is
to formulate strategies which lead to the successful implementations of
“Intelligent Systems” in various domains.