Appl Intell (2012) 37:80–99
DOI 10.1007/s10489-011-0314-z
An enhanced Support Vector Machine classification framework
by using Euclidean distance function for text document
categorization
Lam Hong Lee ·Chin Heng Wan ·
Rajprasad Rajkumar ·Dino Isa
Published online: 25 August 2011
© Springer Science+Business Media, LLC 2011
Abstract This paper presents the implementation of a new
text document classification framework that uses the Sup-
port Vector Machine (SVM) approach in the training phase
and the Euclidean distance function in the classification
phase, coined as Euclidean-SVM. The SVM constructs a
classifier by generating a decision surface, namely the opti-
mal separating hyper-plane, to partition different categories
of data points in the vector space. The concept of the opti-
mal separating hyper-plane can be generalized for the non-
linearly separable cases by introducing kernel functions to
map the data points from the input space into a high dimen-
sional feature space so that they could be separated by a lin-
ear hyper-plane. This characteristic causes the implementa-
tion of different kernel functions to have a high impact on
the classification accuracy of the SVM. Other than the ker-
nel functions, the value of soft margin parameter, C is an-
other critical component in determining the performance of
the SVM classifier. Hence, one of the critical problems of
the conventional SVM classification framework is the ne-
L.H. Lee ()·C.H. Wan
Faculty of Information and Communication Technology,
Universiti Tunku Abdul Rahman, Bandar Barat, 31900 Kampar,
Perak, Malaysia
e-mail: leelh@utar.edu.my
C.H. Wan
e-mail: wanchinheng@yahoo.com
R. Rajkumar ·D. Isa
Intelligent Systems Research Group, Faculty of Engineering,
The University of Nottingham, Malaysia Campus, Jalan Broga,
43500, Semenyih, Selangor, Malaysia
R. Rajkumar
e-mail: Rajprasad.Rajkumar@nottingham.edu.my
D. Isa
e-mail: Dino.Isa@nottingham.edu.my
cessity of determining the appropriate kernel function and
the appropriate value of parameter C for different datasets of
varying characteristics, in order to guarantee high accuracy
of the classifier. In this paper, we introduce a distance mea-
surement technique, using the Euclidean distance function
to replace the optimal separating hyper-plane as the classi-
fication decision making function in the SVM. In our ap-
proach, the support vectors for each category are identified
from the training data points during training phase using the
SVM. In the classification phase, when a new data point is
mapped into the original vector space, the average distances
between the new data point and the support vectors from
different categories are measured using the Euclidean dis-
tance function. The classification decision is made based on
the category of support vectors which has the lowest average
distance with the new data point, and this makes the classi-
fication decision irrespective of the efficacy of hyper-plane
formed by applying the particular kernel function and soft
margin parameter. We tested our proposed framework us-
ing several text datasets. The experimental results show that
this approach renders the accuracy of the Euclidean-SVM
text classifier largely insensitive to the implementation
of kernel functions and soft margin parameter C.
Keywords Text document classification · Support Vector
Machine · Euclidean distance function · Kernel function ·
Soft margin parameter
1 Introduction
This paper presents a novel text document classification
framework that uses the Support Vector Machine (SVM) ap-
proach in the training phase to identify the set of support
vectors for each category, and uses the Euclidean distance
function in the classification phase to compute the average
distances between the testing data point and each of the sets
of support vectors from different categories. Classification
decision is made based on the category which has the low-
est average distance between its set of support vectors and
the new data point and this makes the classification decision
irrespective of the efficacy of hyper-plane formed by apply-
ing the particular kernel function and soft margin parameter.
Text document classification denotes the task of automati-
cally assigning collections of electronic text documents into
their annotated categories, based on their contents and simi-
larities. For some decades now, text document classification
has become important due to the rapid growth of creating,
editing, manipulating and storing text documents in digi-
tal form. In recent years, an increasing number of statistical
and computational approaches have been developed for text
classification, including k-nearest-neighbor classification [1,
2], Bayesian classification [3–12], support vector machine
[2, 13–17], maximum entropy [18], decision tree induction
[19], rule induction [20, 21], and artificial neural networks
[22]. Besides the supervised classification approaches, unsu-
pervised clustering techniques such as self-organizing maps
[23,24] have also been introduced for text document seg-
mentation.
Over the past decade, the SVM has gained popularity
in various types of classification applications and has been
reported as one of the best performing classification ap-
proaches [2, 13–17, 25–34]. It can be used as a discrimi-
native classifier and has been shown to be more accurate
than most other classification models [15,27,28,31,32,35,
36]. The good generalization characteristic of the SVM is
due to the implementation of Structural Risk Minimization
(SRM) principle, which entails finding an optimal separat-
ing hyper-plane, thus guaranteeing a highly accurate clas-
sifier in most applications. Equation (1) represents the equa-
tion of a hyper-plane which can be used to partition data
points in a SVM.
$w \cdot x + b = 0$  (1)
Figure 1 illustrates a linearly separable case, where the data
points of one category and the data points
of another category are separated by
the linear optimal separating hyper-plane (the solid straight
line).
There are actually an infinite number of hyper-planes
which are able to partition the data points into two categories
(as illustrated by the dashed lines in Fig. 1). According to
the SVM methodology, there is just one optimal sep-
arating hyper-plane. This optimal separating hyper-plane
lies half-way within the maximal margin, where the
margin is defined as the sum of distances of the hyper-plane
Fig. 1 Optimal separating hyper-plane
to the support vectors. In the case illustrated in Fig. 1, the
margin is $d_1 + d_2$.
The optimal separating hyper-plane is only determined
by the closest data points of each category. These points are
called Support Vectors (SVs). As only the SVs determine
the optimal separating hyper-plane, there is a certain way
to represent them for a given set of training points. It has
been shown in [37] that the maximal margin can be found
by minimizing $\frac{1}{2}\|w\|^2$, as shown in (2).
$\min \frac{1}{2}\|w\|^2$  (2)
Therefore, the optimal separating hyper-plane can be con-
figured by minimizing (2) under the constraint of (3), that
the training data points are correctly separated.
$y_i \cdot (w \cdot x_i + b) \geq 1, \quad \forall i$  (3)
A more detailed discussion of the SVM has been presented
in our previous work [14].
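As an illustrative sketch (not the authors' code), the following trains a linear SVM on hypothetical toy data with scikit-learn's SVC class, a wrapper around the LIBSVM library used in the experiments of Sect. 4, and inspects the hyper-plane of (1) and the identified support vectors.

```python
# Minimal sketch: a linear SVM on toy 2-D data, showing the
# hyper-plane w.x + b = 0 of (1) and the support vectors of Fig. 1.
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data for two categories.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # category 0
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])  # category 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# w and b of the optimal separating hyper-plane (linear kernel only).
w, b = clf.coef_[0], clf.intercept_[0]
print("hyper-plane w, b:", w, b)

# The training points retained as support vectors (the SVs).
print("support vectors:\n", clf.support_vectors_)
```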
The concept of the optimal separating hyper-plane can be
generalized for the non-linearly separable cases. One of the
methods which can be used to partition non-linearly separa-
ble data points is the implementation of kernel functions. By
implementing the kernel functions, the non-linearly separa-
ble data points are mapped from the original input space to
a high dimensional feature space through a non-linear trans-
formation rather than fitting non-linear decision surfaces to
the input space to separate the data points, as illustrated in
Fig. 2. This is to ensure that a linear optimal separating
hyper-plane could be generated in the new feature space to
separate the data points. By introducing the kernel function
as shown by (4), it is not necessary to explicitly know $\Phi(\cdot)$
[38]. Hence, the optimization problem can be translated di-
rectly to the more general kernel version, as shown by (5).
$K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$  (4)
Fig. 2 Mapping data points into high dimensional feature space using
kernel functions
Table 1 Common kernel functions for SVM with promising classification performance in most cases

Kernels             Formula
Linear              K(u, v) = u · v
Sigmoid             K(u, v) = tanh(a u · v + b)
Polynomial          K(u, v) = (1 + u · v)^d
RBF                 K(u, v) = exp(−a‖u − v‖²)
Exponential RBF     K(u, v) = exp(−a‖u − v‖)
$W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$,
subject to $C \geq \alpha_i \geq 0$, $\sum_{i=1}^{n} \alpha_i y_i = 0$  (5)
The algorithm of the non-linear classification is formally
similar to the linear classification, except that every dot
product is replaced by a non-linear kernel function. This al-
lows a linear optimal separating hyper-plane to be fitted in
the high dimensional feature space, and it may be non-linear
in the original input space, as illustrated in Fig. 2. By map-
ping the data points into a high dimensional feature space us-
ing kernel functions, superb classification performance
can be obtained, since the SVM model can perform
data point separation even with very complex boundaries.
There could be an infinite number of kernel functions, but
only certain kernel functions have been found to perform
well in a wide variety of classification tasks. Table 1 shows
some of the common well-performing kernel functions in
most cases.
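For illustration, the kernels of Table 1 can be written directly as functions of two vectors. The sketch below uses numpy; the parameter values a, b and d are arbitrary examples, not recommended settings.

```python
# Illustrative numpy implementations of the Table 1 kernels.
import numpy as np

def linear(u, v):
    return np.dot(u, v)

def sigmoid(u, v, a=0.5, b=0.0):
    return np.tanh(a * np.dot(u, v) + b)

def polynomial(u, v, d=3):
    return (1 + np.dot(u, v)) ** d

def rbf(u, v, a=0.5):
    # exp(-a * ||u - v||^2)
    return np.exp(-a * np.linalg.norm(np.subtract(u, v)) ** 2)

def exponential_rbf(u, v, a=0.5):
    # exp(-a * ||u - v||)
    return np.exp(-a * np.linalg.norm(np.subtract(u, v)))
```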
Each of the kernel functions listed in Table 1 has its own
properties and unique responses in handling different types
of data. For example, the SVM model equipped with a sig-
moid kernel function is equivalent to a two-layer perceptron
neural network [38], while the SVM model using a Radial
Basis Function (RBF) kernel is closely related to RBF
neural networks, with a feature space of infinite di-
mension. The selection of kernel function for the SVM clas-
sification model is based on the classification task’s require-
ments and the patterns of the data points distribution. With
an appropriate and optimal kernel function implemented in
the SVM model, the classifier is able to scale high dimen-
sional data relatively well, and trade-off between classifier
complexity and classification error can be controlled explic-
itly.
Therefore, an appropriate implementation of optimal ker-
nel function is a necessity for the SVM classification frame-
work in order to obtain optimal performance. This unique
characteristic causes the classification accuracy of the SVM
to be highly dependent on the selection of kernel func-
tions. This is due to the fact that the data points distribu-
tion may change in different high dimensional feature spaces,
and the linear optimal separating hyper-plane can only be
constructed after the data points have been mapped into the
higher dimensional feature space using kernel functions. In
other words, the decision surface of the SVM classifier for
non-linearly separable cases is constructed based on the im-
plementation of kernel functions. As a result, one of the
critical problems of the conventional SVM classification ap-
proach is the selection of appropriate kernel function, based
on the varying characteristics of different datasets, in or-
der to obtain high classification accuracy. There is no
generally optimal kernel function which is able to guaran-
tee good classification performance on all types of datasets
of varying characteristics. In recent years, many research
works have been carried out seeking solutions to this
problem. However, there is
no ultimate solution in the form of an all-round, optimal
kernel function which will suit most of the SVM classifica-
tion tasks on different datasets of varying characteristics.
Apart from the kernel functions, the performance of the
SVM classifier is also heavily dependent on the soft mar-
gin parameter, C. The parameter C controls the trade-off
between the margin and the size of slack variables [39]. It
creates the soft margin that allows some of the classification
errors, especially when non-separable points exist during the
classification phase. If the value of C is small, the number of
training errors will increase due to underfitting [17]. On the
other hand, the large value of C will lead to overfitting where
a high penalty for non-separable points occurs [40], and the
classifier will behave like a hard margin SVM [17].
As the SVM classification approach requires the selec-
tion of appropriate combination of kernel function and pa-
rameters, the optimization of kernel function and parameters
has to be incorporated into the training phase of the SVM,
in order to guarantee high accuracy of the classification task.
Typically, convoluted computations have been carried out to
optimize the set of kernel function and parameters combina-
tion for the SVM. This could be done by conducting an iterative
cross-validation process to predict the best performing com-
bination of kernel function and parameters for the trained
SVM classifier, using a validation set. This method leads to
a computationally intensive and highly time-consuming train-
ing process for the SVM, hence degrading the efficiency of
the classifier. Furthermore, for certain cases in which the
training samples are limited, there exists a critical problem
in preparing sufficient training set and validation set for the
SVM to train the classifier and to conduct the kernel func-
tion and parameters optimization process.
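The sketch below illustrates this conventional tuning practice using scikit-learn's GridSearchCV over an SVC; the candidate grid is an arbitrary example rather than the paper's setup, and X_train, y_train stand for a vectorized training corpus.

```python
# Sketch of conventional kernel/parameter tuning: every kernel/C
# combination is trained and cross-validated, which is the costly
# iterative process described above.
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

param_grid = [
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-2, 1e-1], "C": [1, 10, 100, 1000]},
    {"kernel": ["poly"], "degree": [2, 3], "C": [1, 10, 100, 1000]},
]

search = GridSearchCV(SVC(), param_grid, cv=5)
# search.fit(X_train, y_train)   # X_train, y_train: vectorized corpus
# print(search.best_params_)    # best kernel/C combination found
```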
In this paper, we propose an enhanced classification
framework for text document classification, coined as the
Euclidean-SVM. This new classification framework is pro-
posed by introducing the Euclidean distance measurement
function to replace the optimal separating hyper-plane as the
decision making function for the conventional SVM classi-
fication. In our proposed approach, the SVs for each of the
categories are identified using the SVM training algorithm
and the SVs are then mapped into the original vector space
to construct the trained Euclidean-SVM classifier. In the
classification phase, an unlabeled new data point is mapped
into the same original vector space and the average distances
between the new data point and each set of the SVs of dif-
ferent categories are computed using the Euclidean distance
function. The classification decision is made based on the
category of SVs which has the lowest average distance with
the new data point. With the classification decision making
function using the Euclidean distance function instead of the
optimal separating hyper-plane, the impact of kernel func-
tion and soft margin parameter C on accuracy of the SVM
classifier could be minimized, hence contributing to a kernel
function and soft margin parameter independent Euclidean-
SVM text classification framework.
2 Related works
Although the SVM has been reported as one of the best
performing machine learning approaches for classification,
there exists a critical problem of the SVM in determining
the optimal combination of kernel function and parame-
ters, in order to guarantee high efficiency and effectiveness
of the classification tasks. Typically, the optimal combina-
tion of the SVM kernel and parameters is determined by
using the computationally intensive grid search algorithms
[41,42]. This method varies different types of kernel func-
tions and parameters through a wide range of values using
geometric steps. The set of kernel and parameters combina-
tion with the best cross-validation accuracy is selected. In
recent years, many research works have been carried out in
order to solve the problem of automatically finding the most
appropriate kernel function and parameters for the SVM in
order to guarantee high accuracy of the classifier. Most of
them perform the kernel and parameters optimization using
evolutionary algorithms. These methods conduct iterative
computation in order to configure the optimal set of kernel
and parameters for the SVM, and they will further compli-
cate the convoluted training process, hence leading to a high
computational cost for the SVM during the training phase.
As a result, the efficiency of the SVM classifier has been
severely degraded by having such methods in determining
the appropriate combination of kernel and parameters.
Quang et al. have presented an evolutionary algorithm,
specifically a genetic algorithm, to optimize SVM pa-
rameters, including kernel type, kernel parameters and up-
per bound C. This is an iterative process by repeating the
crossover, mutation and selection procedures to produce the
optimal set of parameters [43]. Friedrichs and Igel have also
proposed an evolution strategy, specifically the Covariance
Matrix Adaptation Evolution Strategy for obtaining optimal
set of multiple hyper-parameters (kernel parameters and reg-
ularization parameter) for the SVM [44]. Briggs and Oates
have introduced the idea of domain-specific composite ker-
nels to the SVM classification for better generalization abil-
ity as compared to the base kernels [45]. An evolutionary
algorithm was employed in order to search through a large
number of composite kernels and the hill climbing technique
was chosen as the composite kernel search algorithm [45].
Dong et al. have also presented a genetic algorithm-based
technique to select the optimal value of the cost parameter
C and kernel parameters for the SVM, using cross-valida-
tion [46]. All the approaches mentioned above suffer from
inefficient classification due to highly time-consuming iter-
ative computation, as they employ evolutionary algorithms
for the optimal kernel and parameters selection.
Avci has proposed a hybrid method of genetic algorithm
and SVM, coined as HGASVM, for automatic digital mod-
ulation classification. Avci has shown that the proposed
HGASVM has better accuracy than the combination of the
SVM classifiers with randomly selected parameters, in the
specified application of automatic digital modulation classi-
fication [47]. However, as the parameters optimization pro-
cess is based on the genetic algorithm, the common prob-
lem of high time consumption is still a disadvantage to this
proposed approach. Zhang et al. have used the combination
of simulated annealing and genetic algorithm to optimize
the parameters for the SVM. This hybrid approach takes the
advantages from both of these techniques to overcome the
disadvantages of each other. As a result, this hybrid tech-
nique has been proven to have better performance than sim-
ulated annealing or genetic algorithm alone in selecting op-
timal kernel and parameters for the SVM [48]. Diosan et al.
have proposed another hybridized framework of genetic pro-
gramming and SVM to choose the most efficient expression
of the kernel of kernels function and to select the optimal set
of SVM hyper-parameters. This approach conducts an itera-
tive process to optimize the SVM parameters and due to the
complexity of kernel function, the computational complex-
ity of the proposed algorithm is high, even higher
than that of an evolutionary linear multiple kernel [49].
Besides using the evolutionary algorithms, there exist
some approaches which determine the optimal kernel pa-
rameters using distance between two classes in the feature
space [50–52]. Sun et al. have proposed a method in which
the training phase of the SVM and the iterative process of
evaluating the performance for all the parameters combina-
tions can be avoided [50,51]. The optimal parameters can
be determined by a sigmoid function. According to Sun et al.,
this method has good accuracy with sigmoid function and
drastically reduces the time spent searching for the optimal ker-
nel parameters of the conventional SVM compared with other ex-
isting algorithms, since the iterative computation of select-
ing optimal parameters using evolutionary algorithms can be
avoided [50,51]. Wu and Wang have also proposed a sim-
ilar kernel parameters selection approach which uses data
separation index (inter-class distance in the feature space) to
predict the optimal SVM kernel parameters [52]. However,
both of these methods do not perform the optimization of
parameter C; the training time of the SVM could be
further reduced if the proposed methods incorporate param-
eter C into their optimization strategy.
As the existing kernel function and parameters optimiza-
tion methods for the SVM involve convoluted and iterative
computation, this problem is considered and investigated by
our group. Therefore, it is the goal of this paper to propose
an enhanced SVM framework on which the implementation
of different kernel functions and parame-
ter C has low impact, for text document classification.
3 The Euclidean-SVM text classification framework
We propose and implement a new text classification frame-
work by introducing the Euclidean distance function to re-
place the optimal separating hyper-plane in the conventional
SVM as the classification decision making function. We uti-
lize the SVM training algorithm to reduce the training data
points by identifying and retaining only the SVs, and elim-
inating the rest of the training data points. In the classifica-
tion phase, the Euclidean distance function is used to make
the classification decision based on the average distance be-
tween the testing data point to each group of SVs from dif-
ferent categories. We eliminate the use of optimal separat-
ing hyper-plane as the decision surface in the conventional
SVM approach, as the construction of the optimal separat-
ing hyper-plane is highly dependent on the kernel functions.
In fact, the construction of the linear separating hyper-plane
in the high dimensional feature space is based on the SVs
and kernel function, in which the kernel function is incorpo-
rated to map the data points into a high dimensional feature
space, so that the data points (specifically the SVs) are possi-
bly separable by a linear separating hyper-plane. This causes
the kernel functions to have high impact on the construction
Table 2 Kernel functions and number of errors [56]
Method Sample # Error #
ESTScan, closest ATG 2350 729
Salzberg method 3312 1095
SVM, Salzberg kernel 3312 530
TISHunter 3312 13
Table 3 Kernel functions and number of support vectors [56]
Kernel Average # of SVs
Edit kernel I 2312
Edit kernel II 2316
Edit kernel III, SCM120 319
Edit kernel III, SCM250 230
Edit kernel III, ASCM120 507
Edit kernel III, ASCM250 293
Edit kernel III, PAM250 821
Table 4 Kernel functions and accuracy [53]
Kernels Best accuracy Average accuracy
Linear 56.04% 56.04%
Polynomial 74.60% 66.81%
RBF 70.34% 59.01%
of the separating hyper-plane, and hence affect the classifi-
cation accuracy of the SVM. Previous research works have
shown that the implementation of different kernel functions
greatly influences the accuracy of the SVM classifier, as
well as the number of SVs [53–59]. Tables 2, 3 and 4 illus-
trate results from previous research works which investi-
gated the impact of different kernel functions on the accuracy
and number of SVs of the SVM classification approach.
In this paper, we propose the utilization of the Euclidean
distance function to replace the optimal separating hyper-
plane as the decision making function of the SVM approach.
Prior to the training phase of our proposed Euclidean-
SVM text classification framework, we apply a pre-
processing approach to transform text documents into a rep-
resentation format suitable for the SVM and the Euclidean-
SVM, which is typically numerical. The text doc-
uments have been pre-processed by using the Bayesian vec-
torization technique [14,60]. The Bayesian vectorization
technique is carried out in order to transform each of the text
documents in the dataset into the format of probability dis-
tribution in the vector space, by using the Bayesian formula.
By applying the Bayesian vectorization technique to pre-
process the text documents, the textual data are transformed
into numerical format and the dimensionality of data has
been greatly reduced from thousands (equal to the number of
words in the document, such as when using Term Frequency
Inverse Document Frequency (TFIDF) method to transform
text to numerical form) to typically less than a hundred (the number of
categories the document may be classified into). This trans-
formation is a necessity of the Euclidean-SVM text classi-
fication framework due to the fact that the Euclidean-SVM
approach is a classification framework based on vector space
model, which requires data in numerical format so that the
data could be mapped into a vector space, for both train-
ing and classifying purposes. Besides this, as the Euclidean-
SVM approach may suffer from high computational time
consumption in handling data with high dimensionality, due
to its convoluted computation (requiring the computation of
distances between the SVs and the input data points), the di-
mensionality reduction of data (from the number of words to
the number of available categories in the classification task)
is also a crucial requirement for the Euclidean-SVM classi-
fication approach. The details of the Bayesian vectorization
technique have been discussed in our previous works pre-
sented in [14,60].
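The following is a rough sketch of the idea as described above, assuming a multinomial naive Bayes model as the source of the per-category probabilities; the exact formulation of the Bayesian vectorization technique is the one given in [14, 60].

```python
# Sketch of Bayesian vectorization: each document is reduced from a
# bag-of-words vector (vocabulary-sized) to a vector of per-category
# probabilities, so dimensionality drops to the number of categories.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def bayesian_vectorize(train_docs, train_labels, docs):
    """Return one probability-distribution vector per document in docs."""
    counts = CountVectorizer()
    X_train = counts.fit_transform(train_docs)   # thousands of dimensions
    nb = MultinomialNB().fit(X_train, train_labels)
    # Posterior P(category | document): one value per category, so each
    # document becomes a vector with len(nb.classes_) dimensions.
    return nb.predict_proba(counts.transform(docs))
```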
We have conducted an experiment to validate the perfor-
mance of the Bayesian vectorization technique over TFIDF
vectorization on the preprocessing of textual data for the
SVM classifier and the Euclidean-SVM classifier. The re-
sults showed that the classifiers which use the Bayesian vec-
torization as the textual data transformation technique out-
performed the classifiers which use the TFIDF vectorization
technique. The results for this experiment can be seen in Ta-
bles 16 and 17 in Sect. 4.6 of this paper.
During the training phase of the Euclidean-SVM classifi-
cation framework, the conventional SVM training algorithm
is used to map all the training data points into the vector
space and identify the set of SVs for each of the categories.
The construction of the optimal separating hyper-plane is
still a necessity in order to identify the SVs, since the op-
timal separating hyper-plane is lying half-way in between
the maximal margin, where the margin is defined as the sum
of distances of the hyper-plane to the SVs. Figure 3 illus-
trates the construction of the optimal separating hyper-plane
in the vector space which separates the training data points
of two different categories, after implementing the conven-
tional SVM training algorithm.
As illustrated in Fig. 3, there are two categories of train-
ing data points, represented by spheres and squares respec-
tively. The black spheres represent the SVs of the category
“Sphere” and black squares represent the SVs of the cate-
gory “Square”. The optimal separating hyper-plane is con-
structed by maximizing the margin $d_1 + d_2$. However, the
optimal separating hyper-plane is discarded in the classifi-
cation phase as it does not act as the decision surface in
our proposed Euclidean-SVM classification framework. Our
proposal in this paper is to replace the optimal separating
Fig. 3 Vector space of the conventional SVM classifier with optimal
separating hyper-plane
Fig. 4 Vector space of the Euclidean-SVM classifier with the Euclidean distance function as the classification decision making algorithm
hyper-plane by introducing the Euclidean distance function
in making the decision for the classification task. After the
SVs for each of the categories have been identified, they are
mapped into the original vector space and the rest of the
training data points are eliminated. During the classification
phase, a new unlabeled data point is mapped into the same
vector space with the SVs, and the average distances be-
tween the new data point and each set of the SVs of different
categories are computed using the Euclidean distance func-
tion. Figure 4 illustrates the vector space of the Euclidean-
SVM classifier during the classification phase.
The “Triangle” in Fig. 4 represents the new unlabeled
data point to be classified. The distances between the new
input data point and each of the SVs are computed. The
Euclidean distance function is used to calculate the dis-
tance between two points, new vector P, and support vec-
tor Q. Equation (6) illustrates the Euclidean distance for-
mula, where $p_i$ and $q_i$ are the coordinates of $P$ and $Q$ in di-
mension $i$, and $n$ is the number of dimensions.
$D = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$  (6)
As illustrated in Fig. 4, $D_1$ and $D_2$ represent the Euclidean
distances between the new data point and the SVs of cat-
Fig. 5 Block diagram of the Euclidean-SVM text classification framework
egory “Sphere”, while $D_3$, $D_4$ and $D_5$ represent the Eu-
clidean distances between the new data point and the SVs of
category “Square”. After obtaining the Euclidean distances
between the new data point and each of the SVs from differ-
ent categories, the average distance of the new data point to
the set of SVs of each of the categories is computed.
This could be done by adding up the Euclidean distances of
the new data point to the SVs from the same category, and
dividing the sum by the total number of SVs for that par-
ticular category, as illustrated by (7). Based on the example
as illustrated in Fig. 4, the average distance of the new data
point to the SVs of category “Sphere” is $(D_1 + D_2)/2$, and
the average distance of the new data point to the SVs of cat-
egory “Square” is $(D_3 + D_4 + D_5)/3$.
$D_{avg} = \frac{\sum_{I=1}^{N} \left( \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \right)_I}{N}$  (7)
After computing the average distance of the new data point
to the set of SVs of each of the categories, the classifica-
tion decision is made based on the category which has the
lowest average distance between its set of SVs and the new
data point. In other words, the new input data point will be
labeled with the category which has the lowest average dis-
tance between its SVs and the new data point itself.
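In code, (6), (7) and the resulting decision rule amount to the short numpy computation below, using hypothetical coordinates for the scenario of Fig. 4.

```python
# Equations (6) and (7) in numpy: Euclidean distances from a new point
# to each support vector, and their average per category (toy data).
import numpy as np

new_point = np.array([2.0, 3.0])
svs_sphere = np.array([[1.0, 2.0], [2.5, 2.0]])              # D1, D2
svs_square = np.array([[4.0, 4.0], [5.0, 3.5], [4.5, 5.0]])  # D3, D4, D5

d_sphere = np.linalg.norm(svs_sphere - new_point, axis=1)    # eq. (6)
d_square = np.linalg.norm(svs_square - new_point, axis=1)

avg_sphere = d_sphere.mean()   # (D1 + D2) / 2, eq. (7)
avg_square = d_square.mean()   # (D3 + D4 + D5) / 3

# Label the new point with the category of lowest average distance.
label = "Sphere" if avg_sphere < avg_square else "Square"
print(label)
```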
The Euclidean-SVM classification approach and the
k-Nearest Neighbor (k-NN) classification approach share
some similarities as both of these classification approaches
map the training data points into a vector space, and dis-
tance measurement technique is used to make classification
decision. Nevertheless, the Euclidean-SVM approach differs from
the k-NN approach. The Euclidean-SVM approach makes
the classification decision based on the category which has
the shortest average Euclidean distance between its set of
SVs and the new data point. On the other hand, the k-NN
approach assigns a testing data point to a particular category
if it is the most frequent category among the k nearest train-
ing data points. Figure 5 illustrates the block diagram and
Table 5 illustrates the algorithm of the Euclidean-SVM text
classification approach.
With the combination of the SVM training algorithm and
the Euclidean distance function to make the classification
decision, the impact of kernel function and parameter C on
the classification accuracy of the conventional SVM can be
minimized. This is due to the fact that the optimal separat-
ing hyper-plane, whose construction is highly dependent
on kernel functions, is replaced by the Euclidean distance
function. Since the Euclidean distance function is able to
perform its classification decision making task sufficiently
as long as both the training data points (support vectors) and
data points to be classified are mapped into the same vec-
tor space, the transformation of existing vector space into
a higher dimensional feature space by the kernel functions
is not needed during the classification phase and hence does
not have a great impact on the classification performance. In
Table 5 Algorithms of the Euclidean-SVM text classification framework in pre-processing phase, training phase and classification phase
Algorithms of the Euclidean-SVM text classification framework
Pre-Processing Phase
1. Transform all the text documents (in both training set and testing set) into numerical format using the Bayesian Vectorization
Technique.
Training Phase
1. Map all the training data points into the vector space of a SVM.
2. Identify and obtain the set of support vectors for each of the categories using SVM algorithm, and eliminate the rest of the
training data points which are not identified as support vectors.
3. Map all the support vectors into the original vector space.
Classification Phase
1. Map the new unlabeled data point into the same original vector space with support vectors.
2. Use the Euclidean distance formula to calculate the average distances between the new data point and each of the sets of
support vectors from different categories.
3. Identify the category which has the lowest average distance between its set of support vectors and the new data point.
4. Generate classification result for the new data point based on the identified category.
Table 6 List of categories of the vehicles dataset
1. Aircrafts
2. Boats
3. Cars
4. Train
other words, the problem of selecting the right kernel func-
tions for the classifier does not exist if the optimal separating
hyper-plane is replaced by the Euclidean distance function.
As a result, by integrating the SVM training algorithm
and the Euclidean distance function to construct a classifica-
tion framework, we can obtain an enhanced Euclidean-SVM
classifier with better performance in which the accuracy is
comparable to the conventional SVM, while being immune to
the problem of determining the appropriate kernel functions
and parameter C.
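To make the procedure of Table 5 concrete, the following is a minimal, self-contained sketch (not the authors' MATLAB/LIBSVM implementation), with scikit-learn's SVC supplying the support vectors; per the argument above, the kernel and C passed to SVC should have little influence on the final decision.

```python
# Sketch of the Table 5 algorithm: SVM in the training phase (to find
# the support vectors), Euclidean distances in the classification phase.
import numpy as np
from sklearn.svm import SVC

class EuclideanSVM:
    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        clf = SVC(kernel="linear", C=1.0).fit(X, y)  # any kernel/C choice
        # Keep only the support vectors, grouped by their category;
        # the remaining training points are discarded.
        sv_labels = y[clf.support_]
        self.sv_by_class_ = {c: clf.support_vectors_[sv_labels == c]
                             for c in np.unique(y)}
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X):
            # Average Euclidean distance to each category's SV set, eq. (7).
            avg = {c: np.linalg.norm(svs - x, axis=1).mean()
                   for c, svs in self.sv_by_class_.items()}
            preds.append(min(avg, key=avg.get))  # lowest average distance
        return np.array(preds)
```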
4 Experimental results
Our proposed Euclidean-SVM text classification framework
has been tested and evaluated using five text corpora. Three
of them were collected by our research group, namely the
Vehicles dataset [8,9,14,23], the Mathematics dataset [8,
9, 14], and the Automobiles dataset [8, 9, 14]. These three
text datasets have been constructed by collecting text arti-
cles from different sources, such as Wikipedia website and
arxiv.org website.
The Vehicles dataset was built by acquiring vehicle-re-
lated articles from Wikipedia website. This dataset consists
of 4 categories of vehicles. All four categories are eas-
ily differentiated in terms of the content since each category
Table 7 List of categories of the mathematics dataset
1. Algebraic Geometry
2. Analysis of PDEs
3. Combinatorics
4. Differential Geometry
5. Mathematical Physics
6. Number Theory
7. Probability
8. Statistics
has its own unique set of keywords. The list of categories of
the Vehicles dataset is illustrated in Table 6.
A dataset containing articles about mathematical topics,
namely the Mathematics dataset, has been acquired by our
research group from arxiv.org website. This dataset consists
of 8 mathematical sub-categories. The list of categories of
the Mathematics dataset is shown in Table 7.
The Automobiles dataset was designed and organized by
collecting articles about automobiles from Wikipedia web-
site. This dataset consists of nine categories of automobile,
differentiated in terms of geographical regions and classifi-
cations. Table 8 illustrates the list of categories of the Auto-
mobiles dataset.
Besides the three text corpora that we constructed by
acquiring documents from different sources and organized
by ourselves, we have also acquired the WebKB dataset and
the Reuters-21578 dataset for more generic evaluations of
the performance of our proposed Euclidean-SVM text clas-
sification approach.
The WebKB collection was originally constructed by
“the World Wide Knowledge Base (Web->Kb) Project of
Table 8 List of categories of the automobiles dataset
1. American Mini Vans
2. American Sports Cars
3. American SUVs
4. Asian Mini Vans
5. Asian Sports Cars
6. Asian SUVs
7. European Mini Vans
8. European Sports Cars
9. European SUVs
Table 9 List of categories of the WebKB dataset
1. Course
2. Faculty
3. Project
4. Student
the CMU Text Learning Group” and this dataset has been
widely used for experiments in text applications of ma-
chine learning techniques, such as text classification and text
clustering [61]. Many research groups have used the We-
bKB dataset to evaluate the performance of their presented
text classification approaches [62–66]. The original WebKB
dataset consists of files collected from computer science de-
partments of various universities in 1997. These documents
were manually classified into seven different categories: stu-
dent, faculty, staff, department, course, project, and other.
In the experiments here, the categories “Department” and
“Staff” were discarded due to the fact that there were only
a few pages from each university. The category “Other” was
also discarded because the documents in this category
vary greatly from each other. In conclusion, the list of
categories in the WebKB dataset which has been used in the
experiments carried out in this paper is illustrated in Table 9.
The Reuters-21578 dataset was originally collected by
Carnegie Group Inc. and Reuters Ltd. This text collection
has been reported as one of the most common benchmark
for text classification approaches and it has been widely
used by text classification research groups in evaluating their
classification models [1, 2, 6, 7, 10, 15, 17, 18, 35, 36].
This dataset consists of documents that appeared on the Reuters
newswire in 1987, and they were manually organized into
categories by personnel from Reuters Ltd. There exist many
versions of Reuters-21578 text collection due to the fact
that different researchers have different evaluation criteria
on their classification models. In our experiment, we have
adopted the Reuters-21578 R8 dataset which is the set of
the 8 categories with the highest number of positive training
data; this collection consists only of single-labeled text
documents. The R8 version of the Reuters-21578 text col-
lection consists of the categories as illustrated in Table 10.
Table 10 List of categories of the Reuters-21578 R8 dataset
1. Acq
2. Crude
3. Earn
4. Grain
5. Interest
6. Money-FX
7. Ship
8. Trade
We have conducted the experiments by implement-
ing the conventional SVM classification approach and
the Euclidean-SVM classification approach independently.
Each of these approaches has been evaluated with the imple-
mentation of different kernel functions and different values
of parameter C during the training phase. We have imple-
mented the conventional SVM classification approach using
MATLAB version 7.6.0.324 (R2008a) with LIBSVM tool-
box version 2.91 [67]. As for the Euclidean-SVM, we im-
plemented the proposed classification approach by using the
same version of MATLAB and LIBSVM toolbox to identify
the set of SVs of each of the categories, and we developed
an additional module which performs the computation of the
average Euclidean distances of the new data point to the set
of SVs of each of the categories, and makes the classification
decision based on the category which has the lowest average
distance between its SVs and the new data point.
In our experiments, we have implemented the classifica-
tion approaches with four common kernel functions for the
SVM: linear kernel, polynomial kernel, radial basis function
(RBF) kernel and sigmoid kernel. As for the parameter C,
the range of values 1, 10^1, 10^2, 10^3, 10^4 and 10^5 has
been applied to both of the tested classifiers. By conduct-
ing the experiments on these two classification approaches
separately with different kernel functions and different val-
ues of parameter C, we are able to evaluate the performance
of each approach and to determine the improvement of the
Euclidean-SVM approach (if any) in contrast to the conven-
tional SVM model in terms of classification accuracy. Be-
sides this, we are also able to evaluate the impact of the
implementation of different kernel functions and parame-
ter C, on the conventional SVM approach, as well as the
Euclidean-SVM approach.
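The experimental protocol can be summarized by a loop of the shape below (a sketch, not the actual MATLAB scripts); the data-loading names are placeholders, and the variance column of Tables 11–15 corresponds to np.var over the accuracies recorded across the tested C values.

```python
# Sketch of the experimental protocol: every kernel is paired with every
# tested value of C, and the variance of accuracies across C is recorded.
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

KERNELS = ["linear", "poly", "rbf", "sigmoid"]
C_VALUES = [1, 10, 100, 1000, 10000, 100000]

def evaluate(X_train, y_train, X_test, y_test):
    for kernel in KERNELS:
        accs = []
        for C in C_VALUES:
            clf = SVC(kernel=kernel, C=C).fit(X_train, y_train)
            accs.append(100 * accuracy_score(y_test, clf.predict(X_test)))
        # Variance across the tested C values (last column, Tables 11-15).
        print(kernel, accs, "variance:", np.var(accs))
```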
4.1 Experiment on the vehicles dataset
The Vehicles dataset consists of 4 categories with a total of
640 documents. Each category consists of 160 documents
where 50 documents were used to build the training set,
and the remaining 110 documents were used for testing pur-
poses. In other words, the Vehicles dataset had been split
Table 11 Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different kernels and different values of parameter C, on the Vehicles dataset (training set: 200 documents; testing set: 440 documents)

Classification approach (kernels)   Accuracy (%) at C = 1, 10, 100, 1000, 10000, 100000   Variance of accuracies across values of C
SVM (Linear) 93.75 93.33 92.92 92.92 92.92 92.92 0.1201
SVM (Polynomial) 92.50 92.50 92.50 92.50 92.50 92.50 0
SVM (RBF) 25.00 25.00 25.00 25.00 25.00 25.00 0
SVM (Sigmoid) 25.00 25.00 25.00 25.00 25.00 25.00 0
Euclidean-SVM (Linear) 93.75 93.33 93.75 94.17 94.17 94.17 0.1176
Euclidean-SVM (Polynomial) 94.17 94.17 94.17 94.17 94.17 94.17 0
Euclidean-SVM (RBF) 93.33 93.33 93.33 93.33 93.33 93.33 0
Euclidean-SVM (Sigmoid) 93.33 93.33 93.33 93.33 93.33 93.33 0
into a training set with 200 documents and a testing set with
440 documents.
Table 11 shows the experimental results of the conven-
tional SVM classifier and the Euclidean-SVM classifier,
which have been implemented with different kernels and dif-
ferent values of parameter C, on the Vehicles dataset.
As illustrated in Table 11, the performance of the conven-
tional SVM classifier on the Vehicles dataset is highly depen-
dent on the implementation of kernel functions. Both linear
kernel and polynomial kernel have contributed high classifi-
cation accuracies to the SVM, in the range
of 92.50% to 93.75%. On the other hand, the SVM clas-
sifier with RBF kernel and the SVM classifier with sigmoid
kernel have performed poorly on the Vehicles dataset, with
accuracies of 25.00%. This is due to the fact that the
implementation of an appropriate kernel function is a necessity
for the SVM classifier to guarantee good generalization abil-
ity. A wrong choice of kernel function will lead
to seriously poor classification performance of the SVM.
In other words, the implementation of kernel functions has
a very high impact on the classification accuracy of the SVM
classification approach.
As for the performance of the Euclidean-SVM classifi-
cation approach on the Vehicles dataset, we have obtained
classification accuracies in the range of 93.33% to
94.17%, with the implementation of different kernels and
different values of parameter C. Hence, we can conclude
that the Euclidean-SVM classification approach is nearly
immune to the implementation of kernel function and pa-
rameter C, while still obtaining good classification accuracy.
In this experiment, parameter C does not have a great im-
pact on either the conventional SVM classifier or the
Euclidean-SVM classifier. As illustrated in Table 11, the
variances of classification accuracies across the tested val-
ues of parameter C for both of the tested classification ap-
proaches are at most approximately 0.12. This is due to the fact
that the data points for each of the available categories in
this dataset are very dissimilar, as they are easily differen-
tiated with their own unique features. Non-separable data
points are hardly found during the classification phase. With
a small number of non-separable points found in the clas-
sification phase, the effect of parameter C, which creates
the soft margin that allows some of the classification errors
when non-separable points occur, becomes minimal in
the classification task.
4.2 Experiment on the mathematics dataset
The Mathematics dataset consists of 8 categories with a to-
tal of 320 documents. Each of the 8 categories consists of
an equal number of 40 documents. 10 documents from each
category were obtained to construct the training set with a
total number of 80 documents, and the remaining 30 docu-
ments from each category were used for building the testing
set with 240 documents.
Table 12 illustrates the experimental results of the con-
ventional SVM classifier and the Euclidean-SVM classifier
with different kernels and different values of parameter C,
on the Mathematics dataset.
Based on Table 12, we can observe that the implementa-
tion of different kernel functions has affected the classifica-
tion performance of the conventional SVM classifier on the
Mathematics dataset. As illustrated in Table 12, both the
linear kernel and the polynomial kernel have contributed rela-
tively high classification accuracies to the SVM. As the
value of parameter C varies, the classification ac-
curacies of the SVM with linear kernel have a variance of
27.8211, while the classification accuracies of the SVM with
Table 12 Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different kernels and different values of parameter C, on the Mathematics dataset (training set: 80 documents; testing set: 240 documents)

Classification approach (kernels)   Accuracy (%) at C = 1, 10, 100, 1000, 10000, 100000   Variance of accuracies across values of C
SVM (Linear) 61.25 74.17 74.17 74.17 74.17 74.17 27.8211
SVM (Polynomial) 70.00 70.00 70.00 70.42 70.00 70.00 0.0290
SVM (RBF) 41.67 41.67 41.67 41.67 41.67 41.67 0
SVM (Sigmoid) 12.50 12.50 12.50 12.50 12.50 12.50 0
Euclidean-SVM (Linear) 75.83 74.17 73.75 73.75 73.75 73.75 0.6922
Euclidean-SVM (Polynomial) 72.92 72.92 72.92 72.92 72.92 72.92 0
Euclidean-SVM (RBF) 75.83 75.83 75.83 75.83 75.83 75.83 0
Euclidean-SVM (Sigmoid) 75.83 75.83 75.83 75.83 75.83 75.83 0
polynomial kernel have the variance of 0.029, which is more
consistent as compared to the classification accuracies of the
SVM with linear kernel. On the other hand, the SVM clas-
sifier with RBF kernel and the SVM classifier with sigmoid
kernel have performed poorly on the Mathematics dataset.
As the value of parameter C varies, the classifica-
tion performance of the SVM with RBF kernel is consistent,
with accuracies of 41.67%, while the SVM with sigmoid
kernel has achieved poor performance with low but consis-
tent accuracies of 12.50%. Given the nature of the conventional
SVM, the inconsistency of the SVM classification perfor-
mance in this experiment is due to the implementation of
different kernel functions. In order to guarantee good gener-
alization ability for the SVM, the determination of the right
kernel function is considered mandatory in this experi-
ment.
On the other hand, the Euclidean-SVM approach does not
depend highly on the implementation of kernel functions
and parameter C. The Euclidean-SVM has achieved accu-
racies in the range of 72.92% to 75.83%, with the
implementation of different kernels and different values of
parameter C. In other words, the Euclidean-SVM classifier
has better consistency in terms of accuracy, with the imple-
mentation of different kernel functions and different values
of parameter C, as compared to the conventional SVM clas-
sifier.
4.3 Experiment on the automobiles dataset
The Automobiles dataset consists of 9 categories, each
consisting of an equal number of 30 documents.
In other words, this dataset consists of a total of 270
documents. 10 documents from each category had been uti-
lized to construct the training set with 90 documents, and
the remaining 20 documents from each category were used
to build the testing set with 180 documents.
Table 13 shows the experimental results of the conven-
tional SVM classifier and the Euclidean-SVM classifier,
which have been implemented with different kernels and dif-
ferent values of parameter C, on the Automobiles dataset.
Table 13 shows that the implementation of kernel func-
tions and parameter C has a very high impact on the classi-
fication performance of the conventional SVM approach,
while the Euclidean-SVM approach does not suffer from
this problem. As illustrated in Table 13, the SVM classifier
with linear kernel has achieved medium classification per-
formance on the Automobiles dataset, with accuracies in
the range of 56.11% to 68.89% (variance of classifi-
cation accuracies is 24.6998), as the value of parameter
C varies from 1 to 10^5. The SVM classifier with polynomial
kernel has achieved consistent but poor classification accu-
racies of 30.56%, as the value of parameter
C varies from 1 to 10^5. The SVM classifier with RBF kernel and
the SVM classifier with sigmoid kernel have achieved the
lowest classification performance in this experiment, with
consistent accuracies of 11.11% while the value of parameter
C varies within the tested range. These results further justify
that the SVM is highly dependent on the implementation
of kernel functions. The implementation of an inappropri-
ate kernel function may lead to a high risk of obtaining low
classification accuracy from the SVM classifier.
While the SVM suffers from the problem of being highly
dependent on the implementation of kernel function, the
Euclidean-SVM has achieved classification accuracies in
the range of 59.44% to 67.78% (variance of classifica-
tion accuracies across the tested values of parameter C is
8.8515), with the implementation of linear kernel and differ-
ent values of parameter C. Even though the gap between the
Table 13 Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different kernels and different values of parameter C, on the Automobiles dataset (training set: 90 documents; testing set: 180 documents)

Classification approach (kernels)   Accuracy (%) at C = 1, 10, 100, 1000, 10000, 100000   Variance of accuracies across values of C
SVM (Linear) 62.78 68.89 58.33 56.11 56.67 57.22 24.6998
SVM (Polynomial) 30.56 30.56 30.56 30.56 30.56 30.56 0
SVM (RBF) 11.11 11.11 11.11 11.11 11.11 11.11 0
SVM (Sigmoid) 11.11 11.11 11.11 11.11 11.11 11.11 0
Euclidean-SVM (Linear) 67.78 64.44 63.33 60.56 59.44 62.22 8.8515
Euclidean-SVM (Polynomial) 67.78 67.78 67.78 67.78 67.78 67.78 0
Euclidean-SVM (RBF) 62.78 62.22 62.22 62.22 62.22 62.22 0
Euclidean-SVM (Sigmoid) 67.78 67.78 67.78 67.78 67.78 67.78 0
Table 14 Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different kernels and different values of parameter C, on the WebKB dataset (training set: 2803 documents; testing set: 1396 documents)

Classification approach (kernels)   Accuracy (%) at C = 1, 10, 100, 1000, 10000, 100000   Variance of accuracies across values of C
SVM (Linear) 48.60 72.68 73.11 74.12 74.33 73.90 104.7953
SVM (Polynomial) 38.99 38.99 38.99 38.99 73.62 73.33 317.1325
SVM (RBF) 62.36 72.83 73.33 74.33 74.40 74.40 22.4632
SVM (Sigmoid) 48.60 72.68 73.11 74.12 74.33 73.97 104.9205
Euclidean-SVM (Linear) 68.60 70.68 68.53 64.01 62.22 61.57 14.5823
Euclidean-SVM (Polynomial) 68.60 68.45 68.67 68.67 69.89 68.17 0.3522
Euclidean-SVM (RBF) 69.75 69.53 67.16 63.51 61.72 62.29 13.0916
Euclidean-SVM (Sigmoid) 68.60 70.68 68.53 64.01 62.22 61.57 14.5823
highest accuracy and the lowest accuracy for the Euclidean-
SVM classifier in this experiment is approximately 8%, the
classification performance of the Euclidean-SVM approach
is still considered to have low dependence on the imple-
mentation of different kernel functions and different values
of parameter C, as compared to the conven-
tional SVM.
4.4 Experiment on the WebKB dataset
The WebKB dataset which had been utilized in our exper-
iments was acquired from Ana Cardoso-Cachopo’s web-
site [68]. It consists of 4 categories with a total of 4199 doc-
uments. The training set is constructed by 2803 documents,
while the testing set consists of 1396 documents.
Table 14 illustrates the experimental results of the con-
ventional SVM classifier and the Euclidean-SVM classifier
with different kernels and different values of parameter C,
on the WebKB dataset.
Table 14 again shows the inconsistency in terms of clas-
sification accuracy for the SVM with different kernel func-
tions and different values of parameter C. Based on Table 14,
we can observe that the SVM has achieved high classifica-
tion accuracies of approximately 74% for every kernel
function, given a suitable value of parameter C. However, an in-
appropriate value of parameter C will severely degrade the
accuracy of the SVM classifiers to below 50% (48.60% for
the SVM with linear kernel, C = 1, and the SVM with sigmoid
kernel, C = 1) or even down to below 40% (38.99% for the
SVM with polynomial kernel, C = 1 to 1000).
On the other hand, the implementation of different ker-
nel functions and different values of parameter C does not
have a high impact on the Euclidean-SVM classification ap-
proach. The Euclidean-SVM has achieved classification ac-
curacies in the range of 61.57% to 70.68%, with the
implementation of different kernels and different values of
parameter C. These results are considered consistent even
though the gap between the highest classification accuracy
and the lowest classification accuracy in this experiment is
approximately 9%, as compared to the conventional SVM,
where the gap between the highest classification accuracy
and the lowest classification accuracy is approximately 35%.
The results in this experiment have further justified that the
Euclidean-SVM has lower dependency on the implementa-
tion of kernel functions and value of parameter C, as com-
pared to the conventional SVM.
In this experiment, parameter C greatly affected the performance of the conventional SVM classifier with different kernel functions. Based on the results illustrated in Table 14, high variances of classification accuracies across the tested values of parameter C, up to 317.13, were recorded for the conventional SVM. This is because the WebKB dataset consists of files collected from the computer science departments of various universities, so the text documents in all of the categories describe closely related computer science topics. Documents from different categories are therefore very similar to each other, and non-separable cases occur frequently during the classification phase. As a result, parameter C has a significant effect on the classification performance of the conventional SVM, because a soft margin is needed to tolerate the classification errors caused by the non-separable cases. The Euclidean-SVM, on the other hand, is not as sensitive as the conventional SVM to the setting of parameter C: low variances of classification accuracies across the tested values of parameter C (less than 15) were recorded in this experiment.
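As an aside on how the variance column is computed, the value of 317.13 quoted above can be reproduced as the sample variance of the six accuracies in the polynomial-kernel SVM row of Table 14; a minimal Python sketch (the row values are copied from the table):

```python
# Reproduce the "variance across values of parameter C" column of Table 14
# for the polynomial-kernel SVM row (sample variance of the six accuracies).
import statistics

accuracies = [38.99, 38.99, 38.99, 38.99, 73.62, 73.33]  # C = 1 ... 100000
print(round(statistics.variance(accuracies), 4))  # -> 317.1325, as in Table 14
```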
4.5 Experiment on the Reuters-21578 R8 dataset
The Reuters-21578 R8 dataset used in our experiments was acquired from Ana Cardoso-Cachopo's website [68], the same source from which the WebKB dataset was obtained. This collection consists of 7670 documents categorized into 8 categories, divided into a training set of 5483 documents and a testing set of 2187 documents.
Table 15 illustrates the experimental results of the con-
ventional SVM classifier and the Euclidean-SVM classifier
with different kernels and different values of parameter C,
on the Reuters-21578 R8 dataset.
The results illustrated in Table 15 show that, compared to the conventional SVM approach, the Euclidean-SVM has better consistency in terms of classification accuracy across the implementation of different kernel functions and different values of parameter C. The conventional SVM classifier scored high accuracies in this experiment only when an appropriate combination of kernel function and parameter C was implemented. The best classification accuracy of 94.97% was achieved by the conventional SVM with the linear kernel function and parameter C set at 100000. In general, however, the conventional SVM achieved high classification accuracies (in the range of 87.75% to 94.97%) only when the value of parameter C was high (C = 100000), and this does not hold at all for the polynomial kernel function, with which the SVM classifier scores its lowest accuracy of 49.52%.
Table 15 Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different kernels and different values of parameter C, on the Reuters-21578 R8 dataset (training set: 5483 documents; testing set: 2187 documents). Entries are classification accuracies (%); the last column is the variance of the accuracies across the tested values of C.

Classification approach (kernel)   C=1      C=10     C=100    C=1000   C=10000  C=100000   Variance
SVM (Linear)                       52.17    87.75    93.14    94.19    94.51    94.97      283.6759
SVM (Polynomial)                   49.52    49.52    49.52    49.52    49.52    49.52      0
SVM (RBF)                          49.52    49.52    49.52    49.52    66.16    89.21      264.6682
SVM (Sigmoid)                      49.52    49.52    49.52    49.52    52.17    87.75      238.0053
Euclidean-SVM (Linear)             81.48    77.73    80.43    63.69    65.11    54.55      120.2538
Euclidean-SVM (Polynomial)         82.72    82.72    82.81    82.72    82.72    82.72      0.0014
Euclidean-SVM (RBF)                82.30    82.30    82.30    82.30    80.80    77.05      4.4438
Euclidean-SVM (Sigmoid)            84.73    84.73    82.58    81.48    81.48    77.73      6.7854

As illustrated in Table 15, in general, we can observe that the SVM
classifier with the linear kernel function obtained high accuracies (in the range of 87.75% to 94.97%) across the different values of parameter C, except when parameter C is set at 1. However, when an inappropriate kernel function is implemented, the classification performance of the SVM is severely degraded and low classification accuracies are obtained. The SVM classifier with the polynomial kernel function achieved only the baseline accuracy of 49.52% in this experiment, even as the value of parameter C was varied from 1 to 100000. As for the SVM classifiers with the RBF and sigmoid kernel functions, low classification accuracies were recorded unless the right value of parameter C was applied. The results of this experiment prove again that the performance of the conventional SVM classifier is highly dependent on the implementation of kernel functions and parameter C.
As for the Euclidean-SVM classification approach, even though its highest recorded classification accuracy (84.73%, with the sigmoid kernel and C = 1 to 10) is lower than the highest recorded classification accuracy of the conventional SVM (94.97%, with the linear kernel function and parameter C = 100000), the overall consistency in terms of accuracy across different kernel functions and values of parameter C is much better for the Euclidean-SVM than for the conventional SVM. As illustrated in Table 15, the highest variance of classification accuracies across the tested values of parameter C is 120.25 for the Euclidean-SVM, against 283.68 for the conventional SVM. In most combinations of kernel function and parameter C, the Euclidean-SVM achieved high classification accuracies (77.05% to 84.73%), except when the linear kernel function is implemented with parameter C in the range from 1000 to 100000. The conventional SVM approach, on the other hand, achieved low classification accuracies in most combinations of kernel function and parameter C, the exceptions being the linear kernel function with parameter C in the range from 10 to 100000, the RBF kernel function with parameter C set at 100000, and the sigmoid kernel function with parameter C set at 100000. These experimental results further confirm that the Euclidean-SVM achieves better overall performance and a lower dependency on the implementation of kernel functions and the value of parameter C than the conventional SVM.
As illustrated in Table 15, the Euclidean-SVM achieved a low accuracy when the linear kernel function is applied and parameter C is set at 100000. This is because a high value of parameter C leads the SVM training algorithm towards overfitting [17], so that fewer training data points are identified as SVs. In such a situation, the Euclidean-SVM lacks sufficient information for computing the average Euclidean distance between the input data points and each set of SVs from the different categories, and the classification performance is degraded. This problem can be avoided by setting parameter C to a low value. Based on the results obtained in this experiment, values of parameter C in the range from 1 to 100 lead the Euclidean-SVM classification approach to good accuracies (approximately 80%), regardless of the kernel function implemented.
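The inverse relationship between C and the number of SVs described above is easy to observe empirically. A small sketch (assuming scikit-learn's SVC as the SVM implementation, with synthetic data standing in for the text vectors):

```python
# Illustrate that a smaller soft margin parameter C retains more support
# vectors, giving the Euclidean distance step more points to average over.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
for C in [1, 100, 100000]:
    model = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C = {C:>6}: {model.n_support_.sum()} support vectors")
```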
4.6 Comparison of the performance of the Bayesian
vectorization technique and TFIDF vectorization
technique
In this paper, the Bayesian vectorization technique was utilized to transform textual data into numerical format. The TFIDF (Term Frequency-Inverse Document Frequency) technique, on the other hand, has been reported as one of the most widely used pre-processing techniques by many text mining research groups for the same purpose. To validate that the enhancement of our proposed Euclidean-SVM classification framework is contributed by the implementation of the Euclidean distance function to replace the optimal separating hyper-plane of the conventional SVM classification approach, rather than by the preprocessing technique, we conducted an additional experiment using the Reuters-21578 dataset to compare the performance of the Bayesian
vectorization over the TFIDF vectorization as the preprocessing technique for the SVM classifier. The experimental results of this comparison are presented in Table 16 below.

Table 16 Comparison of the TFIDF-SVM classifiers and the Bayesian-SVM classifiers with different kernel functions (dataset: Reuters-21578 R8; training set: 5483 documents; testing set: 2187 documents). Entries are classification accuracies (%).

Vectorization technique-classifier   Linear   Polynomial   RBF      Sigmoid
TFIDF-SVM                            90.29    80.14        90.58    90.29
Bayes-SVM                            94.97    92.87        94.97    94.92
Based on the comparison conducted using the Reuters-21578 dataset, the Bayesian-SVM classifiers always outperform the TFIDF-SVM classifiers, for all tested types of kernel function. The results presented in Table 16 show that the Bayesian vectorization technique provides a better transformation of textual data to numerical data for the SVM classifier than the TFIDF vectorization technique. In our previous works [14, 60], the SVM classifier with the Bayesian vectorization was likewise shown to outperform the SVM classifier with the TFIDF vectorization technique in terms of both classification accuracy and time consumption.
As the Bayesian vectorization contributes to the improvement of the classifier in the pre-processing stage, the Euclidean-SVM approach further improves the conventional SVM by replacing the optimal separating hyper-plane with the Euclidean distance function in making the classification decision. We also conducted an experiment on the Euclidean-SVM classification approach with the two vectorization techniques, to further justify that the main contribution of our proposed classification framework over the conventional SVM approach is delivered by the implementation of the Euclidean distance function in place of the optimal separating hyper-plane, rather than by the Bayesian vectorization preprocessing technique. Table 17 illustrates the results of the comparison between the TFIDF-Euclidean-SVM classification approach and the Bayesian-Euclidean-SVM classification approach.

Table 17 Comparison of the TFIDF-Euclidean-SVM classifier and the Bayesian-Euclidean-SVM classifier (dataset: Reuters-21578 R8; training set: 5483 documents; testing set: 2187 documents)

                               TFIDF-Euclidean-SVM   Bayesian-Euclidean-SVM
Classification accuracy (%)    1.02                  84.73
Training time (hh:mm:ss)       00:00:29              00:00:06
Testing time (hh:mm:ss)        27:50:32              02:51:29
As illustrated in Table 17, the TFIDF-Euclidean-SVM classifier performs badly, with an accuracy of only 1.02%. This is because the high dimensionality of the data vectorized by TFIDF (approximately 18,000 dimensions, equal to the number of distinct words in the text collection) leads to high computational complexity for the Euclidean-SVM approach in making classification decisions. As the Euclidean distance function may suffer from the curse of dimensionality, high-dimensional data can severely degrade the effectiveness and efficiency of the classification process through convoluted computation, hence the poor performance of the Euclidean-SVM classifier. On the other hand, the transformation of data by the Bayesian vectorization technique, which reduces the dimensionality from thousands to typically fewer than one hundred (the number of categories a document may be classified into), contributes to the better performance of the Euclidean-SVM classification approach, as the computational cost of the Euclidean distance function is drastically reduced. The Bayesian-Euclidean-SVM classifier achieved a classification accuracy of 84.73%.
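To make the dimensionality contrast concrete, the following sketch illustrates the idea as we read it from [14]: the Bayesian vectorization represents a document by its per-category posterior probabilities, so the feature dimension equals the number of categories. Here 20 Newsgroups and scikit-learn are stand-ins for the actual datasets and implementation:

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

train = fetch_20newsgroups(subset="train")

# TFIDF vectorization: one dimension per vocabulary word (tens of thousands).
tfidf = TfidfVectorizer().fit_transform(train.data)
print("TFIDF dimensions:   ", tfidf.shape[1])

# Bayesian vectorization: one dimension per category (here, 20).
counts = CountVectorizer().fit_transform(train.data)
nb = MultinomialNB().fit(counts, train.target)
print("Bayesian dimensions:", nb.predict_proba(counts).shape[1])
```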
Besides this, the high dimensionality of the TFIDF-vectorized data also results in higher training and testing time consumption for the Euclidean-SVM classifier. In this experiment, the TFIDF-Euclidean-SVM classifier recorded a training time of 29 seconds and a testing time of 27 hours, 50 minutes and 32 seconds. The Bayesian-Euclidean-SVM classifier, on the other hand, recorded a training time of 6 seconds and a testing time of 2 hours, 51 minutes and 29 seconds. The time consumption of the TFIDF-Euclidean-SVM classifier, especially the testing time, is much higher than that of the Bayesian-Euclidean-SVM classifier. This is again due to the high dimensionality of the TFIDF-vectorized data, which leads to high computational complexity in the Euclidean-SVM classification approach. The system used for running both the TFIDF-Euclidean-SVM approach and the Bayesian-Euclidean-SVM approach was an Intel Core i3 CPU 550 at 3.2 GHz, with 2 GB of RAM, running Windows 7 Home Basic 32-bit.
Based on the experimental results presented in this section, the Bayesian vectorization technique enhances the pre-processing stage of the conventional SVM approach, which typically implements the TFIDF vectorization technique; this has also been shown in our previous works [14, 60]. In the classification phase, the Euclidean-SVM approach further improves the performance of the conventional SVM approach, obtaining better effectiveness and efficiency in performing classification tasks. In conclusion, the Bayesian vectorization technique and the Euclidean distance function provide a combination of enhancements to the conventional SVM approach that improves the performance of the baseline approach in terms of both classification accuracy and time consumption.
4.7 Discussion on the experimental results
Based on the results obtained from a series of experiments using different text datasets, we found that the performance of the Euclidean-SVM classification framework has a low dependency on the implementation of kernel functions and parameter C. For all five datasets used in our experiments, high classification accuracy can always be obtained by the Euclidean-SVM classifier with the linear kernel function and a small value of parameter C. In all of our experiments, on each of the five datasets, when the linear kernel is used and parameter C is set to 1, the Euclidean-SVM classifier always outperforms the conventional SVM classifier. This shows that, by performing classification tasks using the Euclidean-SVM approach, high accuracies can be obtained without transforming the original vector space into a high dimensional feature space using kernel functions. This is because the Euclidean distance function can perform effective classification decision making as long as all the training data points (the SVs) and the input data points are mapped into the same vector space. Besides this, the selection of an optimal value of parameter C can also be avoided in the Euclidean-SVM classification approach. As shown by the experimental results, in most cases the variances of classification accuracies across different values of parameter C are much higher for the SVM approach than for the Euclidean-SVM approach. This reiterates that the Euclidean-SVM approach has less dependency on the value of parameter C. Moreover, the Euclidean-SVM classification approach performs well with a small value of parameter C (based on our experiments, the optimal value for C is 1), whereas a small value of parameter C leads to underfitting for the conventional SVM classifier [17]. According to SVM methodology, a small value of parameter C leads to more training data points being identified as SVs. When more SVs are identified during the training phase, more information is available for the computation of the average Euclidean distance between the input data points and each set of SVs from the different categories; therefore, more accurate classification results are obtained by the Euclidean-SVM approach. In contrast to the conventional SVM classification approach, which conducts iterative processes to determine the right kernel function and the appropriate value of parameter C, the convoluted and computationally intensive training process and the preparation of an additional validation set can be avoided by implementing the Euclidean-SVM classification approach. In conclusion, our proposed Euclidean-SVM approach contributes to a more effective and efficient classification task, with the unique characteristic of being independent of the implementation of different kernel functions and different values of parameter C.
Besides this, in our experiments, when the SVs are identified using the SVM approach with certain kernel functions, such as the RBF kernel and the sigmoid kernel, the Euclidean-SVM approach drastically outperforms the conventional SVM approach, even though both approaches use the same SVs. This situation can be observed in the experiments on the Vehicles dataset, the Mathematics dataset and the Automobiles dataset. Based on our analysis, this is because, in the conventional SVM classification phase, the testing data point is weighted by the alpha (α_i) value of each support vector and the contributions are summed in order to determine the category of the testing data point, using (8) [37].
F(x) = \mathrm{sign}\Bigl(\sum_{i=1}^{l} y_i \alpha_i K(x, x_i) + b\Bigr)   (8)
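Read directly, (8) is a kernel-weighted vote over the support vectors; a minimal Python transcription (a sketch: `svs`, `sv_labels`, `alphas`, `b` and `kernel` are assumed to come from a trained SVM):

```python
import numpy as np

def svm_decision(x, svs, sv_labels, alphas, b, kernel):
    """Equation (8): sign of the alpha-weighted kernel sum over the SVs."""
    total = sum(y_i * a_i * kernel(x, sv)
                for sv, y_i, a_i in zip(svs, sv_labels, alphas))
    return np.sign(total + b)

def rbf_kernel(x, z, gamma=0.5):
    """Example kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.linalg.norm(np.asarray(x) - np.asarray(z)) ** 2)
```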
The α_i values play a significant role in the performance of the conventional SVM approach, and wrongly weighted SVs play a very strong role in misclassification. The hyper-plane resulting from a kernel function that is not optimized yields α_i values that are not accurate. This leads to the low classification accuracy experienced by the conventional SVM classifier with non-optimal kernels.
The Euclidean-SVM approach, on the other hand, computes the average distance between the testing data point and each set of support vectors from the different categories before making the classification decision. Hence, the distance and location of the testing data point and the SVs are given higher priority in the classification process, and there is no reliance on the α_i weight values. The average distance also does not change drastically when the SVs change due to kernel manipulation. By using the average Euclidean distance, the effect of the wrongly weighted SVs resulting from a non-optimal kernel function is diluted. This causes the accuracy of the Euclidean-SVM approach to be much higher and more consistent than that of the conventional SVM approach, even though the SVs used are the same for both approaches.
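This decision rule can be written down compactly; a minimal sketch of the description above (an illustration, not the authors' implementation; the SV coordinates are hypothetical):

```python
import numpy as np

def euclidean_svm_predict(x, svs_by_category):
    """Assign x to the category whose SVs have the lowest average
    Euclidean distance to x; svs_by_category maps label -> array of SVs."""
    avg_dist = {
        label: np.mean(np.linalg.norm(svs - x, axis=1))
        for label, svs in svs_by_category.items()
    }
    return min(avg_dist, key=avg_dist.get)

# Usage with hypothetical SVs for two categories:
svs_by_category = {
    "sport": np.array([[0.9, 0.1], [0.8, 0.2]]),
    "politics": np.array([[0.1, 0.9], [0.2, 0.7]]),
}
print(euclidean_svm_predict(np.array([0.85, 0.15]), svs_by_category))  # sport
```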
Another possible cause of this situation is the setting of a constant value for one of the kernel parameters of the SVM. Changing the value of this kernel parameter varies the construction of the optimal separating hyper-plane and the number of SVs found during the training phase, hence resulting in different classification accuracies for the conventional SVM approach as well as the Euclidean-SVM approach. The key point is that, in terms of classification accuracy, the Euclidean-SVM approach has better consistency than the conventional SVM approach across the range of values of parameter C and the types of kernel function.
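This effect is also straightforward to observe empirically; a sketch (scikit-learn assumed; the RBF width gamma is used as an illustrative stand-in for the kernel parameter held constant in our experiments):

```python
# Varying a kernel parameter (here RBF gamma) changes the separating
# hyper-plane and, with it, the set of training points retained as SVs.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=1)
for gamma in [0.01, 0.1, 1.0]:
    model = SVC(kernel="rbf", gamma=gamma, C=10).fit(X, y)
    print(f"gamma = {gamma}: {model.n_support_.sum()} support vectors")
```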
As for the computational complexity of the Euclidean-SVM classification approach: since it contributes an enhanced SVM classification framework that is more independent of the implementation of kernel functions and parameters, the trade-off is that the classification time consumption of the Euclidean-SVM approach is higher than that of the conventional SVM. The classification complexity of the Euclidean-SVM approach depends on the dimensionality of the data points and on the number of SVs generated in the training phase. This is because the Euclidean-SVM approach inherits a characteristic of the nearest neighbor approach, calculating the distances between the new input data point and each set of SVs from the different categories using the Euclidean distance formula. Based on our experimental results, we found that the Euclidean-SVM classifier performs well when the value of parameter C is small, which yields a high number of SVs during the training phase. Due to the high number of SVs, a large amount of computational time is consumed in the classification phase for calculating the average Euclidean distance between the input data points and each set of SVs from the different categories. The higher classification time consumption of our proposed Euclidean-SVM approach compared to the conventional SVM approach is reasonable, since the training time of the Euclidean-SVM model is much less than the training time of a conventional SVM classifier that implements evolutionary algorithms to determine the optimal combination of kernel functions and parameters. Besides this, the relatively consistent classification accuracy of the Euclidean-SVM model across all ranges of kernel functions and values of parameter C, as compared to the conventional SVM, is one of the outstanding characteristics of our proposed Euclidean-SVM classification framework.
5 Conclusion
A new text classification framework is presented and described here. The Euclidean-SVM classification approach is shown to have a low dependency on the implementation of the kernel function and the soft margin parameter C. The classification accuracy of the Euclidean-SVM approach is relatively consistent across the implementation of different kernel functions and different values of parameter C, as compared to the conventional SVM, whose classification accuracy is severely degraded by the implementation of an inappropriate kernel function or value of parameter C. This is achieved through the implementation of the Euclidean distance function to replace the optimal separating hyper-plane as the classification decision making function of the SVM. Unlike the optimal separating hyper-plane of the conventional SVM, whose construction is highly dependent on kernel functions, the Euclidean distance function can perform effective classification decision making as long as all the training data points and the input data points are in the same vector space. Hence, the issue of selecting an appropriate kernel function and value of parameter C is avoided in the Euclidean-SVM classification framework. However, the classification phase of the Euclidean-SVM approach consumes more time than that of the conventional SVM. Besides this, for certain classification tasks where the similarity between categories is high, for example on the WebKB dataset used in our experiments, the classification accuracy of the Euclidean-SVM approach is lower than that of the conventional SVM approach. This is because the Euclidean distance calculation, which inherits a characteristic of the nearest neighbor approach, may suffer from the curse of dimensionality, leading to inefficient classification. As future work, we will investigate alternative distance and similarity measurement functions to replace the Euclidean distance function, which may reduce the time consumption of the distance or similarity calculation and contribute to a more accurate distance or similarity measurement between the SVs and the input data point, hence leading to a more effective and efficient SVM-based text classification framework.
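As a pointer to that future work, the decision rule is agnostic to the metric used; a sketch of swapping in cosine distance for the Euclidean norm (the function names are ours, for illustration only):

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more alike."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def predict_with_metric(x, svs_by_category, metric=cosine_distance):
    """Category whose SVs have the lowest average distance to x."""
    avg = {label: np.mean([metric(sv, x) for sv in svs])
           for label, svs in svs_by_category.items()}
    return min(avg, key=avg.get)
```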
References
1. Han EH, Karypis G, Kumar V (1999) Text categorization us-
ing weighted adjusted k-nearest neighbor classification. Techni-
cal Report, Department of Computer Science and Engineering,
Army HPC Research Centre, University of Minnesota, Minneapo-
lis, USA
2. He J, Tan AH, Tan CL (2003) On machine learning methods for
Chinese document categorization. Appl Intell 18(3):311–322
3. Androutsopoulos I, Koutsias J, Chandrinos KV, Spyropoulos
CD (2000) An experimental comparison of Naïve Bayesian and
keyword-based anti-spam filtering with personal e-mail messages.
In: Proceedings of the 23rd annual international ACM SIGIR con-
ference on research and development in information retrieval, pp
160–167
4. Chen JN, Huang HK, Tian SF, Qu YL (2009) Feature selec-
tion for text classification with Naïve Bayes. Expert Syst Appl
36(3):5423–5435
5. Domingos P, Pazzani M (1997) On the optimality of the simple
Bayesian classifier under zero-one loss. Mach Learn 29(2–3):103–
130
6. Eyheramendy S, Genkin A, Ju WH, Lewis D, Madigan D (2003)
Sparse Bayesian classifiers for text categorization. Technical
Report, Department of Statistics, Rutgers University, 2003.
URL:http://www.stat.rutgers.edu/~madigan/PAPERS/jicrd-v13.
pdf
7. Kim SB, Rim HC, Yook DS, Lim HS (2002) Effective methods for
improving Naïve Bayes text classification. In: Proceedings of the
7th Pacific Rim international conference on artificial intelligence.
Springer, Heidelberg, pp 414–423
8. Lee LH, Isa D, Choo WO, Chue WY (2010) Tournament structure
ranking techniques for Bayesian text classification with highly
similar categories. J Appl Sci—Asian Netw Sci Inf 10(13):1243–
1254
9. Lee LH, Isa D (2010) Automatically computed document depen-
dent weighting factor facility for Naïve Bayes classification. Ex-
pert Syst Appl 37(12):8471–8478
10. McCallum A, Nigam K (1998) A comparison of event models for
Naïve Bayes text classification. In: AAAI-98 workshop on learn-
ing for text categorization, pp 41–48
11. O’Brien C, Vogel C (2003) Spam filters: Bayes vs. chi-squared.
Letters vs. words. In: Proceedings of the 1st international sympo-
sium on information and communication technologies, pp 298–
303
12. Sahami M, Dumais S, Heckerman D, Horvitz E (1998) A Bayesian
approach to filtering junk e-mail. In: AAAI-98 workshop on learn-
ing for text categorization, Madison, Wisconsin, pp 55–62
13. Diederich J, Kindermann J, Leopold E, Paass G (2003) Author-
ship attribution with support vector machines. Appl Intell 19(1–
2):109–123
14. Isa D, Lee LH, Kallimani VP, Rajkumar R (2008) Text document
pre-processing with the Bayes formula for classification using the
support vector machine. IEEE Trans Knowl Data Eng 20(9):1264–
1272
15. Joachims T (1998) Text categorization with support vector ma-
chines: learning with many relevant features. In: Proceedings of
the 10th European conference on machine learning (ECML-98),
pp 137–142
16. Joachims T (1999) Making large-scale SVM learning practical. In:
Advances in kernel methods: support vector learning, pp 169–
184
17. Joachims T (2002) Learning to classify text using Support Vector
Machines. Kluwer Academic Publishers, Dordrecht
18. Nigam K, Lafferty J, McCallum A (1999) Using maximum en-
tropy for text classification. In: Proceedings of the IJCAI-99 work-
shop on machine learning for information filtering, pp 61–67
19. Greiner R, Schaffer J (2001) AIxploratorium—decision trees.
Department of Computing Science, University of Alberta, Ed-
monton, AB T6G 2H1, Canada. URL:http://www.cs.ualberta.
ca/~aixplore/learning/DecisionTrees
20. Apte C, Damerau F, Weiss SM (1994) Automated learning of de-
cision rules for text categorization. ACM Trans Inf Sys 12(3):233–
251
21. Apte C, Damerau F, Weiss SM (1994) Towards language indepen-
dent automated learning of text categorization models. In: Pro-
ceedings of the 17th annual international ACM-SIGIR conference
on research and development in information retrieval, pp 23–30
22. Chen CM, Lee HM, Hwang CW (2005) A hierarchical neural net-
work document classifier with linguistic feature selection. Appl
Intell 23(3):5423–5435
23. Isa D, Kallimani VP, Lee LH (2009) Using self-organizing map for
clustering of text document. Expert Syst Appl 36(5):9584–9591
24. Lee CH, Yang HC (2003) A multilingual text mining approach
based on self-organizing maps. Appl Intell 18(3):295–310
25. Bosnic Z, Kononenko I (2008) Estimation of individual predic-
tion reliability using the local sensitivity analysis. Appl Intell
29(3):187–203
26. Hao PY, Chiang JH, Lin YH (2009) A new maximal-margin
spherical-structured multi-class support vector machine. Appl In-
tell 30(2):98–111
27. Kocsor A, Toth L (2004) Application of kernel-based feature space
transformations and learning methods to phoneme classification.
Appl Intell 21(2):129–142
28. Kyriacou E, Pattichis MS, Pattichis CS, Mavrommatis A,
Christodoulou CI, Kakkos S, Nicolaides A (2009) Classification
of atherosclerotic carotid plaques using morphological analysis on
ultrasound images. Appl Intell 30(1):3–23
29. Li YM, Lai CY, Kao CP (2011) Building a qualitative recruitment
system via SVM with MCDM approach. Appl Intell 35(1):75–88
30. Li C, Liu K, Wang H (2011) The incremental learning algorithm
with support vector machine based on hyperplane-distance. Appl
Intell 34(1):19–27
31. Maglogiannis I, Zafiropoulos E, Anagnostopoulos I (2009) An in-
telligent system for automated breast cancer diagnosis and prog-
nosis using svm based classifiers. Appl Intell 30(1):24–36
32. Mahmoud SA, Al-Khatib WG (2010) Recognition of Arabic
(Indian) bank check digits using log-Gabor filters. Appl Intell.
doi:10.1007/s10489-010-0235-2
33. Maudes J, Rodriguez JJ, Garcia-Osorio C, Pardo C (2011)
Random projections for linear SVM ensembles. Appl Intell
34(3):347–359
34. Yu B, Yang Z (2009) A dynamic holding strategy in public transit
systems with real-time information. Appl Intell 31(1):69–80
35. Chakrabarti S, Roy S, Soundalgekar MV (2003) Fast and accu-
rate text classification via multiple linear discriminant projection.
VLDB J 12(2):170–185
36. Yang YM, Liu X (1999) A re-examination of text categorization
methods. In: Proceedings of the 22nd annual international ACM
SIGIR conference on research and development in information re-
trieval (SIGIR’99), pp 42–49
37. Haykin S (1999) Neural network, a comprehensive foundation,
2nd edn. Prentice Hall, New York
38. Burges CJC (1998) A tutorial on Support Vector Machines
for pattern recognition. Bell Laboratories, Lucent Technologies.
Data Mining and Knowledge Discovery. URL:http://research.
microsoft.com/~cburges/papers/SVMTutorial.pdf
39. Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern
analysis. Cambridge University Press, Cambridge
40. Alpaydin E (2004) Introduction to machine learning. MIT Press,
Cambridge
41. Hsu CW, Lin CJ (2002) A comparison of methods for multiclass
support vector machines. IEEE Trans Neural Netw 13(2):415–425
42. Staelin C (2003) Parameter selection for Support Vector Ma-
chines. Technical Report HPL-2002-354R1, Hewlett Packard Lab-
oratories
43. Quang AT, Zhang QL, Li X (2002) Evolving Support Vector Ma-
chine parameters. In: Proceedings of the 1st international confer-
ence on machine learning and cybernetics, pp 548–551
44. Friedrichs F, Igel C (2004) Evolutionary tuning of multiple SVM
parameters. In: Proceedings of European symposium on artificial
neural networks (ESANN’2004), pp 519–524
45. Briggs T, Oates T (2005) Discovering domain-specific composite
kernels. In: Proceedings of the 20th national conference of artifi-
cial intelligence. AAAI Press, Menlo Park, pp 732–738
46. Dong Y, Xia Z, Tu M (2007) Selecting optimal parameters
in Support Vector Machines. In: Proceedings of the IEEE 6th
international conference on machine learning and applications
(ICMLA07).
47. Avci E (2009) Selecting of the optimal feature subset and kernel
parameters in digital modulation classification by using hybrid ge-
netic algorithm-support vector machines: HGASVM. Expert Syst
Appl 36(2):1391–1402
48. Zhang Q, Shan G, Duan X, Zhang Z (2009) Parameters optimiza-
tion of Support Vector Machine based on simulated annealing and
genetic algorithm. In: Proceedings of the IEEE international con-
ference on robotics and biomimetics, pp 1302–1306
49. Diosan L, Rogozan A, Pecuchet JP (2010) Improving clas-
sification performance of Support Vector Machine by geneti-
cally optimising kernel shape and hyper-parameters. Appl Intell. doi:10.1007/s10489-010-0260-1
50. Sun J (2008) Fast tuning of SVM kernel parameter using dis-
tance between two classes. In: Proceedings of the 3rd interna-
tional conference on intelligent system and knowledge engineer-
ing (ISKE2008), pp 108–113
51. Sun J, Zheng C, Li X, Zhou Y (2010) Analysis of the distance be-
tween two classes for tuning SVM hyperparameters. IEEE Trans
Neural Netw 21(2):305–318
52. Wu KP, Wang SD (2009) Choosing the kernel parameters for Sup-
port Vector Machines by the inter-cluster distance in the feature
space. Pattern Recognit 42(5):710–717
53. Buck TAE, Zhang B (2006) SVM kernel optimization: an example
in yeast protein subcellular localization prediction. Project Report,
School of Computer Science, Carnegie Mellon University, Pitts-
burgh, USA
54. Doniger S, Hofmann T, Yeh J (2002) Predicting CNS permeability
of drugs molecules: comparison of neural network and Support
Vector Machines algorithms. J Comput Biol 9(6):849–864
55. Kim H, Cha S (2005) Empirical evaluation of SVM-based
masquerade detection using UNIX commands. Comput Secur
24(2):160–168
56. Li H, Jiang T (2004) A class of edit kernels for SVMs to predict
translation initiation in eukaryotic mRNAs. In: Proceedings of the
8th annual international conference on research in computational
molecular biology, pp 262–271
57. Lu M, Chen CLP, Huo J, Wang X (2008) Optimization of combined
kernel function for SVM based on large margin learning theory.
In: Proceedings of the IEEE international conference on systems,
man and cybernetics (SMC 2008), pp 353–358
58. Schölkopf B, Burges CJC, Smola AJ (1999) Advances in kernel
methods: support vector learning. MIT Press, Cambridge
59. Yuan R, Li Z, Guan X, Xu L (2010) An SVM-based machine
learning method for accurate Internet traffic classification. Inf Syst
Front 12(2):149–156
60. Lee LH, Rajkumar R, Isa D (2010) Automatic folder allocation
system using Bayesian-support Vector Machines hybrid classifi-
cation approach. Appl Intell. doi:10.1007/s10489-010-0261-0
61. Craven M, DiPasquo D, Freitag D, McCallum A, Mitchell T,
Nigam K, Slattery S (1998) Learning to construct knowledge
bases from the World Wide Web. In: Proceedings of the 15th na-
tional conference for artificial intelligence, pp 509–516
62. Callut J, Franscoisse K, Saerens M, Dupont P (2008) Semi-
supervised classification from discriminative random walks. In:
Proceedings of the 2008 European conference on machine learn-
ing and knowledge discovery in databases—Part 1 (ECML PKDD
’08), pp 162–177
63. Ko Y, Seo J (2009) Text classification from unlabeled documents
with bootstrapping and feature projection techniques. Inf Process
Manag 45(1):70–83
64. Li T, Zhu S, Ogihara M (2008) Text categorization via generalized
discriminant analysis. Inf Process Manag 44(5):1684–1697
65. Xue XB, Zhou ZH (2009) Distributional features for text catego-
rization. IEEE Trans Knowl Data Eng 21(3):428–442
66. Zhang D, Mao R (2008) A new kernel for classification of net-
worked entities. In: Proceedings of the 6th international workshop
on mining and learning with graphs, Helsinki, Finland
67. Chang C, Lin C (2001) LIBSVM: a library for support vec-
tor machines. Software available at: http://www.csie.ntu.edu.tw/~
cjlin/libsvm
68. Cardoso-Cachopo A (2011) Datasets for single label text catego-
rization. Artificial Intelligence Group, Department of Information
Systems and Computer Science, Instituto Superior Tecnico, Por-
tugal. URL:http://web.ist.utl.pt/~acardoso/datasets/
Lam Hong Lee received a Bache-
lor of Computer Science from Uni-
versiti Putra Malaysia in 2004, and
a Ph.D. degree in Computer Sci-
ence from the University of Not-
tingham in 2009. He joined Univer-
siti Tunku Abdul Rahman, Malaysia
in March 2009 as an assistant
professor. His current research in-
terest lies in improving text catego-
rization using AI techniques, specif-
ically Support Vector Machines. Be-
sides this, he is also investigating
the implementation of data mining,
pattern recognition and machine learning techniques in various kinds
of intelligent systems.
Chin Heng Wan received his Bach-
elor of Information System (Hons)
in Information System Engineering,
from Universiti Tunku Abdul Rah-
man, Malaysia, in 2005. He is cur-
rently pursuing a Master of Com-
puter Science in Universiti Tunku
Abdul Rahman, Malaysia. His re-
search interests are in Artificial In-
telligence, Machine Learning, Pat-
tern Recognition, Text Mining, and
Intelligent Systems.
Rajprasad Rajkumar received his
Ph.D. degree and Master’s degree
from the University of Nottingham
in 2011 and 2005 respectively, and
Bachelor of Engineering in Electri-
cal and Electronic from Universiti
Tenaga Nasional in 2003. He is cur-
rently working at the Department
of Electrical and Electronic Engi-
neering, University of Nottingham
Malaysia Campus as an Assistant
Professor. His main research inter-
ests are in the use of support vec-
tor machines and signal processing
techniques in various domains. He is currently working in the area
of nondestructive testing, remote sensing, text document classification
and developing unsupervised learning techniques in real-time systems.
Dino Isa is a Professor in the De-
partment of Electrical and Elec-
tronics Engineering of the Univer-
sity of Nottingham, Malaysia Cam-
pus. He obtained a BSEE (Hons)
from the University of Tennessee,
USA in 1986 and a Ph.D. from the
University of Nottingham, UK in
1991. Following his Ph.D., he was
employed as Engineering Section
Head in Motorola’s Power Products
Division in Seremban, Malaysia.
Subsequently he was recruited as
Plant Manager and then promoted
to Chief Technology Officer of Crystal Clear Technology (CCT), a
subsidiary of the Malaysian government’s investment arm, Khazanah
Nasional Berhad. He spent three years in the Westlake Village facility
in California as Director of Operations in the R&D phase of the project
before resuming his duties in CCT Malaysia. He joined the Univer-
sity of Nottingham in 2001. To date Prof. Isa has won five research
contracts worth RM 6,500,000 (£ 1,000,000) while at the University.
His research interest lies in the application of Machine Learning tech-
niques for various kinds of problems. The main aim of his research is
to formulate strategies which lead to the successful implementations of
“Intelligent Systems” in various domains.