
Training and Analysis of Support Vector Machine using Sequential Minimal Optimization

November 2008

DOI:10.1109/ICSMC.2008.4811304

Source
IEEE Xplore

Conference: Systems, Man and Cybernetics, 2008. SMC 2008. IEEE International Conference on

Authors:

Shahrani Shahbudin

Universiti Teknologi MARA

Aini Hussain

Universiti Kebangsaan Malaysia

Salina Abdul Samad

Universiti Kebangsaan Malaysia

Noorita Tahir

Universiti Teknologi MARA


Training and Analysis of Support Vector Machine

using Sequential Minimal Optimization

S. Shahbudin¹, A. Hussain², S. A. Samad³

Electrical, Electronics & System Engineering Department

Faculty of Engineering and Built Environment,

National University of Malaysia

Bangi, Selangor Darul Ehsan, Malaysia

shaqay@yahoo.com¹, aini@vlsi.eng.ukm.my²

N. Md Tahir⁴

Faculty of Electrical Engineering

Technology University of Mara,

Shah Alam, Selangor Darul Ehsan, Malaysia

Abstract— Maximizing classification performance on the training data is a typical objective when training a classifier. It is well known that training a Support Vector Machine (SVM) requires the solution of a very large quadratic programming (QP) optimization problem. The resulting computational burden on large training sets can be addressed using Sequential Minimal Optimization (SMO). This paper investigates the performance of the SMO solver in terms of CPU time, number of support vectors and decision boundaries when applied to 2-dimensional datasets. The chunking algorithm is employed for comparison. Initial results demonstrate that the SMO algorithm trains faster than the chunking algorithm. Both algorithms produce similar decision boundaries, and the classification rates achieved by both solvers are high.

Keywords— Sequential Minimal Optimization, Chunking

algorithm, decision boundaries, support vector machine.

I. INTRODUCTION

The Support Vector Machine (SVM) is a well-established technique for solving classification problems. The goal of SVM is to determine a classifier or regression machine that minimizes both the empirical risk, namely the training set error, and the confidence interval, which corresponds to the generalization or test set error [1]-[2].

To obtain an SVM classifier with the best generalization performance, appropriate training is required. Training an SVM entails the solution of a very large quadratic programming (QP) optimization problem. Sequential Minimal Optimization (SMO), which uses a fixed working set size, is among the most popular decomposition methods for this task, even for very large data sets. Several studies [3]-[7] report that SMO performs very well when training on large, high-dimensional data. For example, in [3], SMO is applied to train SVMs on large datasets such as the UCI "adult" data set, text categorization data and sparse datasets. The results indicate that the SMO algorithm provides better scaling for both linear and non-linear SVMs with the Gaussian RBF as the kernel function. It also performs extremely well on sparse data sets, even for non-linear SVMs. In [7], various sample datasets such as the ionosphere, breast cancer and adult datasets were tested, and the results show that the SMO-based algorithm is significantly more efficient than other methods available in optimization toolboxes.

However, the performance of the SMO algorithm in training on 2-dimensional (2D) data is rarely visualized, analyzed and studied. Thus, the purpose of this paper is to explore the capability of SMO in training on 2D data as compared to the chunking algorithm, along with a visualization of the decision boundaries produced by both algorithms. To analyze the performance of the SMO algorithm, the CPU time, the number of support vectors and the shape of the decision boundaries are examined.

A description of SVM is given in Section 2. In Section 3, several previous SVM training algorithms are explained. Experimental results for both algorithms are presented in Section 4. Finally, Section 5 concludes our findings.

II. OVERVIEW OF SVM

In general, a Support Vector Machine (SVM) is a learning machine for two-class classification problems. It is given a labeled training dataset $(x_1, y_1), \dots, (x_l, y_l)$, where $x_i \in \mathbb{R}^N$ is a feature vector and $y_i \in \{-1, 1\}$ is a class label.

The algorithm seeks to define a decision surface which gives the largest margin, or separation, between the data classes whilst at the same time minimizing the number of errors. However, this decision surface is not created in the input space, but rather in a very high-dimensional feature space. The resulting model is nonlinear, and is accomplished by the use of kernel functions. The kernel function $K$ indicates a measure of similarity between a pattern $x_i$ and a pattern $x_j$ from the stored training set. Using the kernel, the dual QP problem in terms of the Lagrange multipliers $\alpha_i$ in the feature space is given in equation (1), that is, maximize

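$$W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \qquad (1)$$

subject to the constraints

$$\sum_{i=1}^{l} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C \qquad (2)$$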

1-4244-2384-2/08/$20.00 c

2008 IEEE

Authorized licensed use limited to: GOOGLE. Downloaded on October 6, 2009 at 13:05 from IEEE Xplore. Restrictions apply.

where i=1,…,l.
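As a minimal sketch, the dual objective and its constraints can be evaluated directly from a precomputed kernel matrix. The function names `dual_objective` and `is_feasible` are illustrative and do not come from the paper or from any toolbox; the code assumes the standard soft-margin dual with the conventional factor of 1/2 on the quadratic term.

```python
def dual_objective(alpha, y, K):
    """Evaluate W(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K_ij.

    alpha: list of Lagrange multipliers
    y:     list of class labels in {-1, +1}
    K:     kernel matrix as a list of lists, K[i][j] = K(x_i, x_j)
    """
    l = len(alpha)
    linear = sum(alpha)
    quadratic = sum(
        alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
        for i in range(l)
        for j in range(l)
    )
    return linear - 0.5 * quadratic


def is_feasible(alpha, y, C, tol=1e-9):
    """Check the constraints sum_i alpha_i y_i = 0 and 0 <= alpha_i <= C."""
    balanced = abs(sum(a * label for a, label in zip(alpha, y))) <= tol
    boxed = all(-tol <= a <= C + tol for a in alpha)
    return balanced and boxed
```

A solver for the QP problem searches for the feasible `alpha` that makes `dual_objective` as large as possible; the two functions above only score and validate a candidate.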

After finding the optimal values of $\alpha_i$, the decision boundary that needs to be constructed is of the form

$$f(x) = \sum_{\alpha_i \neq 0} \alpha_i y_i K(x_i, x) + b \qquad (3)$$

where the class of $x$ is determined from the sign of $f(x)$. Those $x_i$ with corresponding $\alpha_i \neq 0$ are called support vectors. The value $b$ is the threshold, or offset, of the decision boundary from the origin. The regularization parameter $C$ is the margin parameter that determines the trade-off between maximizing the margin and minimizing the classification error, and is chosen by means of a validation set [7].
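The decision rule can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, and a Gaussian RBF kernel with width `sigma=1.0` is assumed as the kernel $K$.

```python
import math


def rbf_kernel(xi, xj, sigma=1.0):
    """Gaussian RBF kernel; sigma=1.0 is an assumed default."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision_function(x, support_vectors, alphas, labels, b, sigma=1.0):
    """f(x) = sum over support vectors of alpha_i * y_i * K(x_i, x) + b."""
    total = sum(
        a * y * rbf_kernel(sv, x, sigma)
        for sv, a, y in zip(support_vectors, alphas, labels)
    )
    return total + b


def classify(x, support_vectors, alphas, labels, b, sigma=1.0):
    """The predicted class is the sign of f(x)."""
    f = decision_function(x, support_vectors, alphas, labels, b, sigma)
    return 1 if f >= 0 else -1
```

Only the training points with non-zero multipliers need to be stored and passed in, since every other point contributes nothing to the sum.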

One of the attractive features of SVM classification is the sparse representation of its decision boundary. According to [8], the position of the separating hyperplane in the feature space is determined via real-valued weights on the training set examples. Training examples that are situated far away from the hyperplane do not participate in its specification and thus receive weights of zero. Only the training examples that lie close to the decision boundary between the two classes receive nonzero weights. These training examples are called the support vectors, since removing them would change the location of the separating hyperplane. As an example, Fig. 1 illustrates the support vectors in a two-dimensional feature space. Typically, the SVM learning algorithm is defined such that the number of support vectors is small compared to the total number of training examples, thus allowing the SVM to classify new examples efficiently, since the majority of the training examples can be safely ignored.

III. PREVIOUS SVM TRAINING METHODS

SVM training on small data sets was first introduced by Vapnik [9], using a constrained conjugate gradient algorithm. Conjugate gradient ascent starts with an initial estimate of the solution, denoted by $\alpha^0$, and then updates the vector iteratively along the steepest ascent path, that is, moving in the direction of the gradient of $W(\alpha)$ evaluated at the position $\alpha^t$ for update $t+1$. At each iteration the direction of the update is determined by the steepest ascent strategy, while the step length is kept fixed. In this method, every time an $\alpha_i$ reaches zero, the corresponding data point is eliminated and the process is restarted.
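The fixed-step ascent idea can be illustrated with a simplified sketch. Note that this is not the constrained conjugate gradient method of [9]: it is plain gradient ascent on the dual with clipping to the box $0 \le \alpha_i \le C$, it assumes the equality constraint is dropped (equivalent to fixing the threshold at zero), and it omits the elimination and restart step. The function name and default values are hypothetical.

```python
def gradient_ascent_dual(K, y, C, step=0.5, iterations=50):
    """Fixed-step gradient ascent on the SVM dual, clipped to [0, C].

    Simplifying assumption: the constraint sum_i alpha_i y_i = 0 is ignored,
    so this is only a sketch of the steepest ascent update, not a full solver.
    """
    l = len(y)
    alpha = [0.0] * l  # initial estimate of the solution
    for _ in range(iterations):
        # dW/dalpha_i = 1 - y_i * sum_j alpha_j y_j K_ij
        grad = [
            1.0 - y[i] * sum(alpha[j] * y[j] * K[i][j] for j in range(l))
            for i in range(l)
        ]
        # Move along the gradient with a fixed step, then clip to the box.
        alpha = [min(C, max(0.0, alpha[i] + step * grad[i])) for i in range(l)]
    return alpha
```

Every iteration touches the full kernel matrix, which is one reason such whole-vector updates become costly as the training set grows.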

For this reason, SMO, a decomposition or working set method known to be an excellent way to train on large data sets, is investigated here for its capability in dealing with 2D data. In this study, the training capability of SMO is explored and compared to another training method called the chunking algorithm.

A. Chunking

The chunking algorithm, proposed by Vapnik [10], starts with an arbitrary subset or ‘chunk’ of the data and trains an SVM on it. The support vectors then remain in the chunk, while the other points are discarded and replaced by a new working set of points that grossly violate the KKT (Karush-Kuhn-Tucker) conditions. In the final iteration the entire set of non-zero Lagrange multipliers has been extracted, and hence the algorithm solves the QP problem.
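The selection of KKT violators can be sketched as follows. The code assumes the standard KKT conditions for the soft-margin dual ($\alpha_i = 0 \Rightarrow y_i f(x_i) \ge 1$; $0 < \alpha_i < C \Rightarrow y_i f(x_i) = 1$; $\alpha_i = C \Rightarrow y_i f(x_i) \le 1$). The function names and the default tolerance are hypothetical and are not taken from any particular chunking implementation.

```python
def kkt_violation(alpha_i, y_i, f_i, C, tol=1e-3):
    """Return how strongly one training point violates the KKT conditions.

    f_i is the current SVM output f(x_i); 0.0 means no violation within tol.
    """
    margin = y_i * f_i
    if alpha_i <= 0.0:
        violation = max(0.0, 1.0 - margin)   # should satisfy margin >= 1
    elif alpha_i >= C:
        violation = max(0.0, margin - 1.0)   # should satisfy margin <= 1
    else:
        violation = abs(margin - 1.0)        # should satisfy margin == 1
    return violation if violation > tol else 0.0


def worst_violators(alphas, labels, outputs, C, k, tol=1e-3):
    """Indices of the k points with the largest KKT violations."""
    scored = [
        (kkt_violation(a, y, f, C, tol), i)
        for i, (a, y, f) in enumerate(zip(alphas, labels, outputs))
    ]
    violators = sorted((s for s in scored if s[0] > 0.0), reverse=True)
    return [i for _, i in violators[:k]]
```

In a chunking loop, the indices returned by `worst_violators` would be added to the retained support vectors to form the next chunk; the loop stops once no violators remain.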

B. Sequential Minimal Optimization (SMO)

More recently, SMO has been employed to train SVMs rapidly. The idea behind SMO is that the QP problem can be broken up into a series of the smallest possible QP sub-problems, each of which is solved analytically by optimizing two $\alpha_i$ at each iteration while keeping the remaining $\alpha_i$ fixed. These two values can be computed easily and rapidly, which avoids large matrix computations. Details of SMO can be found in [3]-[4].
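A minimal sketch of the analytic two-multiplier step, following the update rule described by Platt [3], is shown below. The function name and argument layout are illustrative assumptions; the heuristics for choosing the pair, the threshold update and the handling of degenerate cases are left out.

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, k11, k12, k22, C):
    """Jointly optimize two multipliers while all others stay fixed.

    E1, E2 are the current errors f(x_i) - y_i; k11, k12, k22 are kernel
    values for the chosen pair. Returns the new (a1, a2).
    """
    # Bounds that keep both multipliers in [0, C] on the constraint line.
    if y1 != y2:
        low, high = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        low, high = max(0.0, a1 + a2 - C), min(C, a1 + a2)

    eta = k11 + k22 - 2.0 * k12  # curvature along the constraint line
    if low >= high or eta <= 0.0:
        return a1, a2  # this sketch simply skips degenerate pairs

    # Unconstrained optimum for a2, then clip to [low, high].
    a2_new = a2 + y2 * (E1 - E2) / eta
    a2_new = min(high, max(low, a2_new))

    # Move a1 so that y1*a1 + y2*a2 is unchanged.
    a1_new = a1 + y1 * y2 * (a2 - a2_new)
    return a1_new, a2_new
```

Each step needs only three kernel values and a handful of scalar operations, which is why no numerical QP routine or stored kernel matrix is required.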

The main difference between the SMO method and the chunking algorithm is that SMO solves each QP sub-problem analytically without any extra matrix storage, whereas the chunking algorithm solves the QP problem iteratively with a numerical QP routine, which requires memory that grows roughly quadratically with the number of non-zero Lagrange multipliers [3].

IV. EXPERIMENTAL RESULTS

The SMO solver is implemented to train the binary SVM classifier with an L1 soft margin. Based on [12], for each solver the convergence tolerance is set to $\varepsilon = 0.001$, the kernel argument is set to 1, and the Gaussian radial basis function (RBF) kernel is chosen, that is, $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$.
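For reference, the Gaussian RBF kernel can be written as a short Python function. This is an illustrative sketch, not the toolbox code from [12]; treating the kernel argument of 1 as the width `sigma` is an assumption.

```python
import math


def rbf_kernel(xi, xj, sigma=1.0):
    """K(xi, xj) = exp(-||xi - xj||^2 / (2 * sigma^2)).

    sigma=1.0 is assumed to correspond to the kernel argument of 1.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(points, sigma=1.0):
    """Full kernel matrix for a list of feature vectors."""
    return [[rbf_kernel(p, q, sigma) for q in points] for p in points]
```

The kernel equals 1 for identical points and decays towards 0 as the points move apart, so a smaller `sigma` yields a more local, more flexible decision boundary.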

The CPU times of both algorithms were measured on a 3.0 GHz Pentium(R) computer with 768 MB of RAM. The SMO and chunking solvers were implemented using the Statistical Pattern Recognition Toolbox [12] and Matlab 7.0, respectively.

Over 190 2D data points acquired from [11] are divided equally for training and testing. These values represent the feature vectors of the second and fourth eigenpostures, which are generated using the PCA technique. Both the SMO and chunking solvers are used. The system is trained on the training data and its performance is measured on the test data. The trained SVM classifier is evaluated on the 2D data using various values of the regularization parameter C. An example of the decision boundary attained with C=10 is shown in Fig. 2.

Figure 1. A 2D feature space with a separating hyperplane for a non-linear boundary. The classification boundary and the accompanying soft margins are represented by the solid line and the dotted lines, respectively, whereas positive and negative examples fall on opposite sides of the decision boundary. The circled points are the support vectors that lie closest to the decision boundary.

374 2008 IEEE International Conference on Systems, Man and Cybernetics (SMC 2008)

A similar decision boundary is obtained using the Chunking

solver as illustrated in Fig. 3.

From both figures, it is observed that the patterns of the decision boundaries are alike. However, the CPU times differ, as tabulated in Table I. The CPU time of the chunking algorithm is 1.6853 seconds, whilst the SMO algorithm recorded a CPU time of only 0.1288 seconds. The numbers of support vectors for both solvers are almost equal, that is, 54 for the chunking solver and 52 for the SMO solver. The same holds for the classification rates, with both solvers reaching an accuracy of about 91% (91.67% for SMO and 91.51% for chunking in Table I).

Further, the decision boundaries of the SMO and chunking solvers with regularization parameter C=100 are depicted in Fig. 4 and Fig. 5, respectively. It is again observed that both decision boundaries are almost identical, with an equal number of 30 support vectors. Again, the classification rates of both solvers are similar, specifically 93.75% for SMO and 93.47% for the chunking solver. As before, the SMO solver is faster than the chunking solver. More results with various values of C are tabulated in Table I.

TABLE I. PERFORMANCE COMPARISON OF SMO AND CHUNKING SOLVERS FROM C=10 TO C=500 (EIGENPOSTURES DATASET)

Training algorithm   Regularization parameter, C   Number of support vectors   CPU time (s)   Classification accuracy (%)
SMO solver           10                            52                          0.1288         91.67
SMO solver           50                            38                          0.2591         92.78
SMO solver           100                           30                          0.4097         93.75
SMO solver           500                           24                          1.6363         93.75
Chunking solver      10                            54                          1.6853         91.51
Chunking solver      50                            37                          1.3853         92.65
Chunking solver      100                           30                          1.7758         93.47
Chunking solver      500                           25                          2.4234         96.76

Figure 2. Sample of 2D eigenpostures dataset using SMO solver (C=10)

Figure 3. Sample of 2D eigenpostures dataset using Chunking solver (C=10)

Figure 4. Sample of 2D eigenpostures dataset using SMO solver (C=100)

Figure 5. Sample of 2D eigenpostures dataset using Chunking solver (C=100)

From the results, the decision boundaries of both solvers show similar patterns, but the recorded CPU times differ. The CPU time of the SMO solver is shorter than that of the chunking solver irrespective of the value of C. It is also observed that the regularization parameter C is inversely related to the number of support vectors, that is, as C increases, the number of support vectors decreases. Additionally, even though the CPU times of the two solvers differ, their classification rates remain comparable.
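The size of the CPU time gap can be made explicit with a small calculation over the Table I values. The snippet below only copies the reported CPU times; the dictionary layout and the function name `speedups` are illustrative.

```python
# CPU times in seconds from Table I, keyed by C: (SMO, chunking).
TABLE_I_CPU = {
    10: (0.1288, 1.6853),
    50: (0.2591, 1.3853),
    100: (0.4097, 1.7758),
    500: (1.6363, 2.4234),
}


def speedups(cpu_times):
    """Ratio of chunking CPU time to SMO CPU time for each value of C."""
    return {c: chunking / smo for c, (smo, chunking) in cpu_times.items()}


if __name__ == "__main__":
    for c, ratio in sorted(speedups(TABLE_I_CPU).items()):
        print(f"C={c}: chunking/SMO CPU time ratio = {ratio:.2f}")
```

On these figures the ratio falls from roughly 13 at C=10 to about 1.5 at C=500, so the advantage of SMO on the eigenpostures data narrows as C grows.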

Next, the Iris and Ripley datasets, acquired from the UCI Machine Learning Repository and the software [12] respectively, are used to verify these findings. Both datasets are in 2D form. The Ripley dataset [12]-[13] comprises 250 training patterns and 1000 testing patterns. The Iris dataset consists of 120 patterns, of which 60 are used as training data and the remainder as testing data. The trained SVM classifier is evaluated on both datasets using various values of the regularization parameter, from C=10 to C=500. The results are summarized in Table II and Table III, respectively.

TABLE II. PERFORMANCE COMPARISON OF SMO AND CHUNKING SOLVERS FROM C=10 TO C=500 (IRIS DATASET)

Training algorithm   Regularization parameter, C   Number of support vectors   CPU time (s)   Classification accuracy (%)
SMO solver           10                            12                          0.0556         95.00
SMO solver           50                            10                          0.0563         95.00
SMO solver           100                           9                           0.0420         98.33
SMO solver           500                           8                           0.0161         98.33
Chunking solver      10                            12                          0.3042         95.00
Chunking solver      50                            9                           0.7916         96.67
Chunking solver      100                           9                           0.4533         98.33
Chunking solver      500                           8                           0.2266         98.33

TABLE III. PERFORMANCE COMPARISON OF SMO AND CHUNKING SOLVERS FROM C=10 TO C=500 (RIPLEY DATASET)

Training algorithm   Regularization parameter, C   Number of support vectors   CPU time (s)   Classification accuracy (%)
SMO solver           10                            94                          0.3469         85.60
SMO solver           50                            85                          0.4931         87.60
SMO solver           100                           83                          0.5743         87.60
SMO solver           500                           77                          2.9142         89.20
Chunking solver      10                            94                          0.7706         82.60
Chunking solver      50                            85                          0.9766         84.35
Chunking solver      100                           83                          1.4559         84.40
Chunking solver      500                           76                          3.0644         89.45

From Tables II and III, it is observed that for both datasets the SMO solver ran faster than the chunking solver in terms of CPU time. In addition, the number of support vectors obtained is nearly identical for both solvers on the Ripley and Iris datasets, and it decreases as C increases. Both solvers achieved good classification rates on the Iris and Ripley datasets for the various values of C. It is also observed that both solvers generated similar decision boundaries, as depicted in Fig. 6(a)-(d) and Fig. 7(a)-(d) for the different values of C applied. Fig. 6 depicts the results for the Iris dataset, whereas Fig. 7 depicts the results for the Ripley dataset.

V. CONCLUSIONS

In conclusion, the SMO algorithm proved to be the better training method for SVMs on 2D data, based on the CPU time attained. The visualized decision boundaries of both solvers show similar patterns for the values of C applied, along with similar and high classification rates. Furthermore, the number of support vectors is nearly equal for both solvers at each value of C. These initial results demonstrate that SMO is an efficient approach for training SVMs even on 2D data samples.

REFERENCES

[1] V. N. Vapnik, “The Nature of Statistical Learning Theory,” Springer, New York, 1995.

[2] N. Cristianini, J. Shawe-Taylor, “An Introduction to Support Vector Machines: and other kernel-based learning methods,” New York: Cambridge University Press, 2000.

[3] J. Platt, “Fast training of support vector machines using sequential minimal optimization,” in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, pages 185-208. MIT Press, Cambridge, MA, 1999.

[4] Ginny Mak, “The Implementation of Support Vector Machines Using the Sequential Minimal Optimization Algorithm,” Master thesis, 2000.

[5] E. Osuna, R. Freund and F. Girosi, “Support vector machines: Training and Applications,” A.I. Memo AIM-1602, MIT A.I. Lab, 1996.

[6] Francis R. Bach, Gert R. G. Lanckriet, Michael I. Jordan, “Fast kernel learning using sequential minimal optimization,” Report No. UCB/CSD-04-1307, February 2004.

[7] Shigeo Abe, “Support Vector Machines for Pattern Classification,” Advances in Pattern Recognition, Springer, 2005.

[8] Michael P. S. Brown, William Noble Grundy, David Lin, Nello Cristianini, Charles Sugnet, Manuel Ares, Jr., David Haussler, “Support Vector Machine Classification of Microarray Gene Expression Data,” Technical Report No. UCSC-CRL-99-09, June 12, 1999.

[9] C. Burges and V. Vapnik, “A new method for constructing artificial neural networks,” Technical report, AT&T Bell Laboratories, May 1995.

[10] V. Vapnik, “Estimation of Dependences Based on Empirical Data,” Springer-Verlag, 1982.

[11] Nooritawati Md Tahir, Aini Hussain, Salina Abdul Samad, Hafizah Husain & Mohd Marzuki Mustafa, “Eigenposture For Classification,” Journal of Applied Sciences, Asian Network for Scientific Information, ANSINET, 6(2), 2006.

[12] Statistical Pattern Recognition Toolbox, http://cmp.felk.cvut.cz/cmp/software/stprtool/index.html

[13] B. D. Ripley, “Neural networks and related methods for classification,” J. Royal Statistical Soc. Series B, 56: pp. 409–456, 1994.

Figure 6. Results obtained using Iris dataset: (a) C=10 for SMO solver; (b) C=10 for chunking solver; (d) C=100 for chunking solver.

Figure 7. Results obtained using Ripley dataset: (a) C=10 for SMO solver; (b) C=10 for chunking solver; (d) C=100 for chunking solver.

An Efficient Machine Learning-based Sentiment Analysis of E-Commerce Reviews

Article

Full-text available

Jun 2022

The internet has enabled people to instantly access enormous amounts of information from anywhere in the world. In the process of decision-making, the other's opinion is important. It is familiar for e-commerce websites to have a review segmentfor each of their products where users may share their thoughts and help potential customers assess the products.Sentiment analysis is the most common field to obtain insights from text data from various e-commerce sites like Amazon, Ajio, etc. In sentiment analysis, machine learning techniques give the best results whenanalyzing and categorizing positive and negative reviews. This article provides a detailed analysis of sentiment analysis of e-commerce reviews from the Amazon website which contains reviews of beauty products. The Amazon dataset was collected from Kaggle and pre-processed for better results. The classifiers from the two machine learning categories are considered as follows: Naive Bayes (NB) from Bayes theorem, Sequential Minimal Optimization (SMO) from Support Vector Machine (SVM). The result of this paper is determined by analyzing and comparing these two outcomesefficiently.

Comparison of machine learning techniques for spam detection

Article

Full-text available

Feb 2023
MULTIMED TOOLS APPL

Email is a useful communication medium for better reach. There are two types of emails, those are ham or legitimate email and spam email. Spam is a kind of bulk or unsolicited email that contains an advertisement, phishing website link, malware, Trojan, etc. This research aims to classify spam emails using machine learning classifiers and evaluate the performance of classifiers. In the pre-processing step, the dataset has been analyzed in terms of attributes and instances. In the next step, thirteen machine learning classifiers are implemented for performing classification. Those classifiers are Adaptive Booster, Artificial Neural Network, Bootstrap Aggregating, Decision Table, Decision Tree, J48, K-Nearest Neighbor, Linear Regression, Logistic Regression, Naïve Bayes, Random Forest, Sequential Minimal Optimization and, Support Vector Machine. In terms of accuracy, the Random Forest classifier performs best and the performance of the Naïve Bayes classifier is substandard compared to the rest of the classifiers. Random Forest classifier had the accuracy of 99.91% and 99.93% for the Spam Corpus and Spambase datasets respectively. The naïve Bayes classifier had the accuracy of 87.63% and 79.53% for the Spam Corpus and Spambase datasets respectively.

Support vector machines for automated classification of plastic bottles

Article

Jan 2010

Many recycling activities adopt manual sorting for plastic recycling that relies on plant personnel who visually identify and pick plastic bottles as they travel along the conveyor belt. These bottles are then sorted into the respective containers. Manual sorting may not be a suitable option for recycling facilities of high throughput. It has also been noted that the high turnover among sorting line workers had caused difficulties in achieving consistency in the plastic separation process. As a result, an intelligent system for automated sorting is greatly needed to replace manual sorting system. The core components of machine vision for this intelligent sorting system is the image recognition and classification.[3]Therefore, in this work, an automated classification of plastic bottles based on the extraction of best feature vectors to represent the type of plastic bottles is performed using the morphological based approach. Morphological operations are used to describe the structure or form of an image. By using the two-dimensional description of plastic bottle silhouettes, edge detection of the object silhouette is performed followed by the erosion process. This procedure can be considered as two stages; a) a feature vector is extracted from the analysis of morphological operation and structure element used and b) a classification technique is applied to that input vector in order to provide a meaningful categorization of the data content. In this study, Support Vector Machines (SVM) was employed merely to classify the image of two groups of plastic bottles namely polyethyleneterephthalate (PET) and non-PET. Additionally, for detailed classification task, the pattern of decision boundary for classification of extracted feature vectors based on morphological approach is also illustrated. 
Furthermore, the optimal features for input to SVM classifier is identified.The initial results indicate that the performance of the SVM in terms of classification accuracy is more than 90%.

AI SoC-Based Accelerator for Speech Classification Accélérateur de classification de la parole basé sur un AI SoC

Article

Jul 2022

Speech classification acceleration using field-programmable gate arrays (FPGAs) is a well-studied field and enables the potential to gain both speed and better energy efficiency over other processor-intensive classifiers. System-on-chip (SoC) architecture allows for an integrated system between programmable logic and processor and for increased bandwidth communications to on-chip peripherals and memory. This article serves as an investigation of the utility of an edge-based support-vector machine (SVM) implemented onto a Zynq-XC7Z020 multiprocessor system on a chip (MPSoC) for the acceleration of three speech class pairs. The system allows for a parallelized structure, which yielded a faster classifier model. The results were found to be an acceleration factor of 2.08 $\times$ . This appears to have come at the cost of a decrease in prediction accuracy, lowering from 92.5% to 83.5% positive prediction percentage likely due to decreased data resolution. The resolution used in this model was a 16-bit fixed-point format for the hardware interpretation and a floating-point format for the software benchmark. The resource usage of the FPGA was also analyzed for both overlays and can yield a 21% reduction in CPU usage.

Fault Diagnosis of Rolling Bearings Based on an Improved Stack Autoencoder and Support Vector Machine

Article

Oct 2020

In recent years, autoencoder has been widely used for the fault diagnosis of mechanical equipment because of its excellent performance in feature extraction and dimension reduction; however, the original autoencoder only has limited feature extraction ability due to the lack of label information. To solve this issue, this study proposes a feature distance stack autoencoder (FD-SAE) for rolling bearing fault diagnosis. Compared with the existing methods, FD-SAE has stronger feature extraction ability and faster network convergence speed. By analyzing the characteristics of original rolling bearing data, it is found that there are evident differences between normal data and faulty data. Therefore, a simple linear support vector machine (SVM) is used to classify normal data and faulty data, and then the proposed FD-SAE is used for fault classification. The novel combination of SVM and FD-SAE has simple structure and little computational complexity. Finally, the proposed method is verified on the rolling bearing data set of Case Western Reserve University (CWRU).

Optimal Feature Selection for SVM based Weed Classification via Visual Analysis

Article

Nov 2010

Weed classification is a serious issue in the agricultural research. Weed classification is a necessity in identifying weed species for control. Many classification techniques have been used to identify weed based on images, however, most of the techniques only measure the percentages of accuracy but the detailed of classifier parameter are not analyzed and discussed. Therefore, in this work, feature vectors of weed images extracted using Gabor Wavelet and Fast Fourier Transform (FFT) were employed in analyzing weed pattern based on images using Support Vector Machines (SVM). The decision boundaries of the categorized extracted feature vectors are illustrated and optimal feature vectors are identified. Results are discussed and displayed with illustrations to prove the SVM classifier performance.

Analysis of PCA based feature vectors for SVM posture classification

Conference Paper

Jun 2010

Many classifiers have been employed to classify human posture classification; however, most of them only presents the average accuracy of the classification. Furthermore, the details of the measured parameters especially for SVM classifier are not measured. Therefore, the objective of this work is to analyse and classify human body posture using Support Vector Machine (SVM) techniques based on various two combinations of eigenpostures by considering two different solvers in the training process. The two solvers namely Sequential Minimal Optimization (SMO) and Matlab Quadratics Programming (QP) solvers have been studied and analyzed to perform the SVM training. The principal component analysis (PCA) method is applied to extract the features from human shape silhouettes. These extracted feature vectors are then used to perform human posture classification. Human posture evaluates which eigenpostures (feature vectors of the several eigenvalues) can be used to classify either human standing posture or human non-standing posture. Next, the solvers that produced the best performance in classifying human postures as well as the best combination of eigenpostures were selected. The results verified that the combination of second and fourth eigenpostures gives the superb performance with 100% correct classification and it is shown that the best solver in training process to classify human body posture classification is the SMO based on the shortest CPU time attained.

