Comprehensive Evaluation of Twin SVM based
Classifiers on UCI Datasets
M. Tanveer^a,*, C. Gautam^b, P.N. Suganthan^c,*

^a Discipline of Mathematics, Indian Institute of Technology Indore, Simrol, Indore 453552, India
^b Discipline of Computer Science and Engineering, Indian Institute of Technology Indore, Simrol, Indore 453552, India
^c School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798
Abstract
In the past decade, twin support vector machine (TWSVM) based classifiers have received considerable attention from the research community. In this paper, we analyze the performance of 8 variants of TWSVM based classifiers along with the 179 classifiers evaluated in [23], from 17 different families, on 90 University of California Irvine (UCI) benchmark datasets from various domains. Results of these classifiers are exhaustively analyzed using various performance criteria. Statistical testing is performed using the Friedman Rank (FRank). Our experiments show that two least squares TWSVM based classifiers (ILSTSVM m and RELS-TSVM m) are the top two ranked methods among the 187 classifiers, and that they significantly outperform all other classifiers according to the Friedman Rank. Overall, this paper bridges the benchmarking gap between the various TWSVM variants and classifiers from other families. Code to reproduce the presented results and figures is provided on the authors' homepages.
Keywords: Benchmarking classifiers · Twin support vector machines · Least squares twin support vector machines · Support vector machines · Machine learning

*Corresponding author
Email addresses: mtanveer@iiti.ac.in (M. Tanveer), chandangautam31@gmail.com (C. Gautam), epnsugan@ntu.edu.sg (P.N. Suganthan)

Preprint submitted to Applied Soft Computing, Elsevier, July 18, 2019
1. Introduction
Among kernel-based methods, SVM has been well explored by researchers in the past, primarily in the context of pattern recognition [1, 2, 3, 4, 5]. Most work on SVM endeavors to maximize the margin between two parallel hyperplanes while minimizing the generalization error. In 2007, Jayadeva et al. [8] introduced the concept of non-parallel supporting hyperplanes, referred to as the twin support vector machine (TWSVM). It solves two smaller-sized quadratic programming problems (QPPs) instead of the single large QPP of traditional SVM, and shows better performance in both computational time and classification accuracy. Then, Kumar et al. [9] proposed the least squares TWSVM (LSTSVM), an extremely simple and fast algorithm for generating binary classifiers. In the last decade, the TWSVM formulation has attracted considerable attention from the research community for replacing the parallel hyperplanes of SVM with non-parallel ones [8]. Generally, fuzziness is embedded in SVM to handle such situations, which introduces extra complexity into the model; TWSVM, however, can handle them effectively without introducing further complexity. Various variants of TWSVM have been developed by researchers in the last decade [14, 16, 19, 26, 28, 29, 32, 33]. Shao et al. [12] added one more regularization term to TWSVM and proposed a new variant termed the twin bounded support vector machine (TBSVM); the formulation of TWSVM can be viewed as a special case of TBSVM. An improved LSTSVM (ILSTSVM) has also been proposed by Xu et al. [16] by introducing a regularization term. Later, the weighted Lagrangian twin support vector machine (WLTSVM) [19] was proposed for imbalanced data classification. Recently, two more variants, viz. the robust and sparse linear programming TWSVM (LPTSVM) [18, 22] and the robust energy-based LSTSVM (RELS-TSVM) [20], were proposed for classification problems. Most recently, the pinball loss-based TWSVM (pinTSVM) [27] was proposed, which takes the quantile distance into account and is robust to noisy samples. Apart from the above-discussed variants of
TSVM, researchers have also developed evolutionary algorithm-based TWSVM [28, 30], where evolutionary algorithms are employed to select optimal parameter values for TWSVM. We have selected 8 competitive variants from among the various existing TWSVM variants in the literature [15]. A brief description of these 8 variants is provided in Section 3. Further, we analyze the outcomes of the classifiers based on various performance criteria, viz. Friedman Rank (FRank), Average accuracy (Acc), Probability of Achieving the Maximum Accuracy (PAMA), probability of achieving more than 95% of the maximum accuracy (P95), and Percentage of the Maximum Accuracy (PMA) [23]. Before further discussion, we provide our motivation in the next section.
2. Motivation and contributions
The main focus of this paper is to analyze the performance of the TWSVM variants as well as existing classifiers on 90 datasets. Vanschoren et al. [25] provided a good analysis using 86 datasets and 93 classifiers in Weka. Recently, Fernandez et al. [23] performed exhaustive experiments on 121 UCI repository datasets with 179 classifiers from 17 different families and provided rankings of these classifiers on various binary and multi-class datasets.
They focused on the combined analysis of binary and multi-class classification and empirically showed that classifier performance depends on whether a dataset is binary or multi-class; however, their separate analysis of the binary datasets is very brief compared to that of the multi-class datasets. Most recently, Zhang and Suganthan [31] performed similar experiments with their proposed kernel ridge regression-based classifiers. Apart from this, the above-mentioned papers [25, 23, 31] did not consider TWSVM, a quite popular method of the last decade. TWSVM has exhibited very good performance in the literature [17]; therefore, it needs to be tested on the same experimental setup as used in [23]. Hence, taking a cue from [23], we provide in this paper a broad analysis of the 8 variants of TWSVM together with the 179 classifiers used in [23] over 90 UCI datasets (44 binary and 46 multi-class). These UCI datasets and their indices for training and testing have been taken from [23] and are listed, along with the detailed results, on this web page1. Moreover, for multi-class datasets we use the one-vs.-rest strategy, and analysis is provided separately for binary and multi-class datasets in this paper.
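The one-vs.-rest strategy mentioned above can be outlined as follows; the per-class scoring functions are a placeholder of our own (e.g. the negated distance to that class's proximal hyperplane), not the authors' exact decision rule:

```python
import numpy as np

def ovr_predict(X, class_scores):
    """One-vs.-rest prediction: `class_scores` is a list of per-class
    scoring functions f_k(X) (each from a binary classifier trained as
    class k vs. the rest); every row of X is assigned the class whose
    binary classifier scores it highest."""
    S = np.column_stack([f(X) for f in class_scores])
    return S.argmax(axis=1)  # index of the winning class
```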
The rest of the paper is organized as follows: Section 3 briefly discusses the eight variants of TWSVM. Section 4 provides the comparative analysis of the eight TWSVM based classifiers with the 179 classifiers, followed by the conclusion in the last section.
3. Variants of twin support vector machines
In this section, eight variants of TWSVM are discussed briefly. These variants can be divided into 3 categories. The first category contains three basic TWSVM variants, viz. TWSVM, TBSVM, and LPTSVM. The second category is based on weighted TWSVM, with the variants pinTSVM and WLTSVM. The third category contains three least squares variants, viz. LSTSVM, ILSTSVM, and RELS-TSVM. Out of the eight variants, RELS-TSVM [20] and ILSTSVM [16] emerge as the best classifiers among the 187 classifiers and yield the lowest FRank as well as the highest average accuracy.
3.1. Basic TWSVM variants
3.1.1. Twin support vector machine (TWSVM)
Let us denote all the data points in class $+1$ by a matrix $A \in \mathbb{R}^{m_1 \times n}$, where the $i$th data point is $A_i \in \mathbb{R}^n$, and let the matrix $B \in \mathbb{R}^{m_2 \times n}$ represent the data points of class $-1$. Unlike SVM, the linear TWSVM [8] seeks a pair of non-parallel
1http://people.iiti.ac.in/~phd1501101001/TSVM_JMLR_Binary_Multi.html
hyperplanes

$$f_1(x) = w_1^t x + b_1 \quad \text{and} \quad f_2(x) = w_2^t x + b_2 \tag{1}$$

such that each hyperplane is proximal to the data points of one class and far from the data points of the other class, where $w_1, w_2 \in \mathbb{R}^n$ and $b_1, b_2 \in \mathbb{R}$.
The formulation of TWSVM can be written as follows:

$$\min_{(w_1,b_1)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Aw_1+e_2b_1\|^2 + c_1\|\xi_1\|
\quad \text{s.t. } -(Bw_1+e_1b_1)+\xi_1 \ge e_1,\ \xi_1 \ge 0, \tag{2}$$

and

$$\min_{(w_2,b_2)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Bw_2+e_1b_2\|^2 + c_2\|\xi_2\|
\quad \text{s.t. } (Aw_2+e_2b_2)+\xi_2 \ge e_2,\ \xi_2 \ge 0, \tag{3}$$

respectively, where $c_1, c_2$ are positive parameters and $e_1, e_2$ are vectors of ones of appropriate dimensions. In order to derive the corresponding dual problems, TWSVM assumes that the matrices $G^tG$ and $H^tH$ are nonsingular, where $G=[A\ e_2]$ and $H=[B\ e_1]$ are augmented matrices of sizes $m_1\times(n+1)$ and $m_2\times(n+1)$, respectively. Under this condition, the dual problems are
$$\max_{\alpha\in\mathbb{R}^{m_2}} \ e_1^t\alpha - \frac{1}{2}\alpha^t H(G^tG)^{-1}H^t\alpha
\quad \text{s.t. } 0 \le \alpha \le c_1, \tag{4}$$

and

$$\max_{\gamma\in\mathbb{R}^{m_1}} \ e_2^t\gamma - \frac{1}{2}\gamma^t G(H^tH)^{-1}G^t\gamma
\quad \text{s.t. } 0 \le \gamma \le c_2, \tag{5}$$

respectively.
In the above optimization problems, $G^tG$ or $H^tH$ can be singular or ill-conditioned. To avoid these cases, the inverse matrices $(G^tG)^{-1}$ and $(H^tH)^{-1}$ are modified to $(G^tG+\delta I)^{-1}$ and $(H^tH+\delta I)^{-1}$, respectively, where $\delta$ is a very small positive scalar and $I$ is an identity matrix of appropriate dimensions. The dual problems then become:

$$\max_{\alpha\in\mathbb{R}^{m_2}} \ e_1^t\alpha - \frac{1}{2}\alpha^t H(G^tG+\delta I)^{-1}H^t\alpha
\quad \text{s.t. } 0 \le \alpha \le c_1, \tag{6}$$

and

$$\max_{\gamma\in\mathbb{R}^{m_1}} \ e_2^t\gamma - \frac{1}{2}\gamma^t G(H^tH+\delta I)^{-1}G^t\gamma
\quad \text{s.t. } 0 \le \gamma \le c_2, \tag{7}$$

respectively.
Thus, we obtain the solutions of the above problems as follows:

$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = -(G^tG+\delta I)^{-1}H^t\alpha
\quad \text{and} \quad
\begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = (H^tH+\delta I)^{-1}G^t\gamma. \tag{8}$$

The dual problems in Eqns. (6) and (7) are derived and solved in [8]. Experimental results show that the performance of TWSVM is better than that of the conventional SVM and the generalized eigenvalue proximal SVM (GEPSVM) [10].
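As a concrete illustration of the regularized duals (6)-(7) and the solution (8), the sketch below solves each box-constrained QP with a simple projected-gradient ascent and recovers the two hyperplanes; this is a minimal numpy sketch under our own solver choice, not the MATLAB implementation evaluated in this paper:

```python
import numpy as np

def solve_box_qp(K, c, iters=3000):
    """Maximize e'a - 0.5 a'Ka subject to 0 <= a <= c by projected
    gradient ascent (step 1/L, where L is the spectral norm of K)."""
    step = 1.0 / (np.linalg.norm(K, 2) + 1e-12)
    a = np.full(K.shape[0], c / 2.0)
    for _ in range(iters):
        a = np.clip(a + step * (1.0 - K @ a), 0.0, c)
    return a

def twsvm_train(A, B, c1=1.0, c2=1.0, delta=1e-4):
    """Linear TWSVM via Eqns. (6)-(8), with G = [A e] and H = [B e]."""
    G = np.hstack([A, np.ones((A.shape[0], 1))])
    H = np.hstack([B, np.ones((B.shape[0], 1))])
    I = np.eye(G.shape[1])
    GG = np.linalg.inv(G.T @ G + delta * I)
    HH = np.linalg.inv(H.T @ H + delta * I)
    alpha = solve_box_qp(H @ GG @ H.T, c1)  # dual (6)
    gamma = solve_box_qp(G @ HH @ G.T, c2)  # dual (7)
    u1 = -GG @ H.T @ alpha                  # [w1; b1], Eqn. (8)
    u2 = HH @ G.T @ gamma                   # [w2; b2], Eqn. (8)
    return u1, u2

def twsvm_predict(X, u1, u2):
    """Assign each row of X to the class of the nearer hyperplane
    (the nearest-plane rule, cf. Eqn. (16))."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    d1 = np.abs(Xa @ u1) / np.linalg.norm(u1[:-1])
    d2 = np.abs(Xa @ u2) / np.linalg.norm(u2[:-1])
    return np.where(d1 <= d2, 1, -1)
```

The projected-gradient solver is deliberately simple; any off-the-shelf box-constrained QP solver can be substituted.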
3.1.2. Twin bounded support vector machine (TBSVM)
It is well known that the implementation of the structural risk minimization principle is one of the significant advantages of SVM. However, the primal problems of TWSVM implement only empirical risk minimization. In addition, TWSVM assumes the existence of the inverse matrices $(G^tG)^{-1}$ and $(H^tH)^{-1}$, a requirement that cannot always be satisfied. Shao et al. [12] proposed an improved and more efficient algorithm termed the twin bounded support vector machine (TBSVM). The formulation of TBSVM implements the structural risk minimization principle by including one more regularization term in each objective of TWSVM, and its dual formulation can be derived without the additional nonsingularity requirement. Thus, the formulation of TBSVM is theoretically superior to that of TWSVM [12].
The linear TBSVM [12] seeks a pair of non-parallel proximal hyperplanes

$$f_1(x) = w_1^t x + b_1 = 0 \quad \text{and} \quad f_2(x) = w_2^t x + b_2 = 0 \tag{9}$$

by solving the following primal problems

$$\min_{(w_1,b_1)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Aw_1+e_2b_1\|^2 + c_1\|\xi_1\| + \frac{c_3}{2}\left\|\begin{bmatrix} w_1 \\ b_1 \end{bmatrix}\right\|^2
\quad \text{s.t. } -(Bw_1+e_1b_1)+\xi_1 \ge e_1,\ \xi_1 \ge 0, \tag{10}$$

and

$$\min_{(w_2,b_2)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Bw_2+e_1b_2\|^2 + c_2\|\xi_2\| + \frac{c_4}{2}\left\|\begin{bmatrix} w_2 \\ b_2 \end{bmatrix}\right\|^2
\quad \text{s.t. } (Aw_2+e_2b_2)+\xi_2 \ge e_2,\ \xi_2 \ge 0, \tag{11}$$

respectively, where $c_i,\ i=1,2,3,4$ are the penalty parameters, $e_1$ and $e_2$ are vectors of ones of appropriate dimensions, and $\xi_1$ and $\xi_2$ are slack variables of appropriate dimensions. Their corresponding Lagrange dual problems are
$$\max_{\alpha} \ e_2^t\alpha - \frac{1}{2}\alpha^t G(H^tH+c_3 I)^{-1}G^t\alpha
\quad \text{s.t. } 0 \le \alpha \le c_1, \tag{12}$$

$$\max_{\gamma} \ e_1^t\gamma - \frac{1}{2}\gamma^t H(G^tG+c_4 I)^{-1}H^t\gamma
\quad \text{s.t. } 0 \le \gamma \le c_2, \tag{13}$$

where $\alpha$ and $\gamma$ are Lagrange multipliers, $G=[B\ e_1]$ and $H=[A\ e_2]$. The solutions of the problems in Eqns. (10) and (11) are obtained by

$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = -(H^tH+c_3 I)^{-1}G^t\alpha \tag{14}$$

and

$$\begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = (G^tG+c_4 I)^{-1}H^t\gamma. \tag{15}$$
Once the solutions of the problems in Eqns. (12) and (13) are obtained, a new point $x\in\mathbb{R}^n$ is assigned to class $i$ ($i=+1,-1$) depending on which of the two hyperplanes in (9) it is closer to:

$$\text{Class } i = \arg\min_{k=1,2} \ \frac{|w_k^T x + b_k|}{\|w_k\|}, \tag{16}$$

where $|\cdot|$ is the absolute value.
3.1.3. Linear programming twin support vector machines (LPTSVM)
TWSVM and TBSVM are not capable of generating sparse solutions. To overcome this issue, the robust and sparse linear programming twin support vector machine (LPTSVM) [22] was proposed. The solution of LPTSVM is obtained by solving a pair of dual exterior penalty problems as unconstrained optimization problems using the Newton method. Unlike the two QPPs of TWSVM and TBSVM, the unconstrained optimization problems of LPTSVM are reduced to solving two systems of linear equations, which leads to an extremely fast and efficient algorithm.
The formulation of LPTSVM [22] can be expressed as follows:

$$\min_{(w_1,b_1)\in\mathbb{R}^{n+1}} \ \|Aw_1+e_2b_1\|_1 + c_1\|\xi_1\|_1 + c_3\left\|\begin{bmatrix} w_1 \\ b_1 \end{bmatrix}\right\|_1
\quad \text{s.t. } -(Bw_1+e_1b_1)+\xi_1 \ge e_1,\ \xi_1 \ge 0, \tag{17}$$

$$\min_{(w_2,b_2)\in\mathbb{R}^{n+1}} \ \|Bw_2+e_1b_2\|_1 + c_2\|\xi_2\|_1 + c_4\left\|\begin{bmatrix} w_2 \\ b_2 \end{bmatrix}\right\|_1
\quad \text{s.t. } (Aw_2+e_2b_2)+\xi_2 \ge e_2,\ \xi_2 \ge 0, \tag{18}$$

where $A$ and $B$ are matrices of sizes $m_1\times n$ and $m_2\times n$, respectively; $c_i,\ i=1,2,3,4$ are the penalty parameters; and $e_1$ and $e_2$ are the vectors of ones of sizes $m_1$ and $m_2$, respectively.
Following the approach of [11], we obtain the solutions of the 1-norm TWSVM in Eqns. (17) and (18) by converting them into a pair of linear programming problems (LPPs) in the primal and solving the exterior penalty functions of their duals for a finite value of a penalty parameter $\theta$.

Let $G=[A\ e_2]$ and $H=[B\ e_1]$ be two augmented matrices of sizes $m_1\times(n+1)$ and $m_2\times(n+1)$, respectively. Then, by setting

$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = p_1 - q_1, \quad G(p_1-q_1) = r_1 - s_1, \qquad
\begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = p_2 - q_2, \quad H(p_2-q_2) = r_2 - s_2, \tag{19}$$

where $p_1, q_1, p_2, q_2 \in \mathbb{R}^{n+1}$, $r_1, s_1 \in \mathbb{R}^{m_1}$ and $r_2, s_2 \in \mathbb{R}^{m_2}$ satisfy the non-negativity constraints

$$p_1, q_1, p_2, q_2, r_1, s_1, r_2, s_2 \ge 0,$$

the above pair of problems in Eqns. (17) and (18) can be converted into the following pair of linear programming twin support vector machine (LPTSVM) problems:
$$\min_{r_1,s_1\in\mathbb{R}^{m_1},\ p_1,q_1\in\mathbb{R}^{n+1},\ \xi_1\in\mathbb{R}^{m_2}} \ e_1^t(r_1+s_1) + c_1 e_2^t\xi_1 + c_3 e^t(p_1+q_1)
\quad \text{s.t. } -H(p_1-q_1)+\xi_1 \ge e_2, \quad G(p_1-q_1)-(r_1-s_1)=0, \quad p_1,q_1,r_1,s_1,\xi_1 \ge 0, \tag{20}$$

and

$$\min_{r_2,s_2\in\mathbb{R}^{m_2},\ p_2,q_2\in\mathbb{R}^{n+1},\ \xi_2\in\mathbb{R}^{m_1}} \ e_2^t(r_2+s_2) + c_2 e_1^t\xi_2 + c_4 e^t(p_2+q_2)
\quad \text{s.t. } G(p_2-q_2)+\xi_2 \ge e_1, \quad H(p_2-q_2)-(r_2-s_2)=0, \quad p_2,q_2,r_2,s_2,\xi_2 \ge 0, \tag{21}$$

respectively, where $e$ is the vector of ones of size $(n+1)$.
3.2. Weighted TWSVM variants
3.2.1. Weighted Lagrangian twin support vector machine (WLTSVM)
The above-discussed TWSVM variants do not handle the issue of imbalanced data. The weighted Lagrangian twin support vector machine (WLTSVM) [19] was developed for handling imbalanced data. It uses a graph-based under-sampling strategy, which provides robustness against outliers, and it embeds weight biases in the Lagrangian TWSVM to enable the algorithm to handle imbalanced data.
The primal problems of WLTSVM can be written as

$$\min_{(w_1,b_1,\xi_1)} \ \frac{1}{2}\big(\|w_1\|^2+b_1^2\big) + \frac{c_1}{2}\Big((Aw_1+e_2b_1)^t(Aw_1+e_2b_1) + \xi_1^t D_2 \xi_1\Big)
\quad \text{s.t. } -(B_2w_1+e_1b_1)+\xi_1 \ge e_1,\ \xi_1 \ge 0, \tag{22}$$

and

$$\min_{(w_2,b_2,\xi_2)} \ \frac{1}{2}\big(\|w_2\|^2+b_2^2\big) + \frac{c_2}{2}\Big((B_1w_2+e_1b_2)^t D_1 (B_1w_2+e_1b_2) + \xi_2^t \xi_2\Big)
\quad \text{s.t. } (Aw_2+e_2b_2)+\xi_2 \ge e_2,\ \xi_2 \ge 0, \tag{23}$$

where $c_i,\ i=1,2$ are the penalty parameters, $e_1$ and $e_2$ are vectors of ones with appropriate dimensions, $\xi_1$ and $\xi_2$ are slack variables of appropriate dimensions, $B_1$ and $B_2$ are under-sampled training sets, and $D_1$ and $D_2$ are weight matrices used to determine the minority and majority planes, respectively, in the case of imbalanced data.
The dual forms of Eqns. (22) and (23) are:

$$\max_{\alpha} \ -\frac{1}{2}\alpha^t\Big(R_1\big(S^tS+c_1I\big)^{-1}R_1^t + \frac{1}{c_1}D_2^{-1}\Big)\alpha + e_2^t\alpha
\quad \text{s.t. } \alpha \ge 0, \tag{24}$$

$$\max_{\gamma} \ -\frac{1}{2}\gamma^t\Big(S\big(R_2^tR_2+c_2I\big)^{-1}S^t + \frac{1}{c_2}D_1^{-1}\Big)\gamma + e_1^t\gamma
\quad \text{s.t. } \gamma \ge 0, \tag{25}$$

where $S=[A\ e_2]$, $R_1=[B_2\ e_1]$, $R_2=[B_1\ e_2]$, and $\alpha, \gamma$ are Lagrangian multipliers.

Similar to the earlier subsections, the solutions $(w_1, b_1)$ and $(w_2, b_2)$ can be obtained by solving Eqns. (24) and (25).
3.2.2. Pinball loss based twin support vector machine (pinTSVM)
The twin support vector machine (TWSVM) [8], twin bounded SVM (TBSVM) [12] and twin parametric-margin support vector machine (TPMSVM) [26] are efficient classifiers, but they are sensitive to noise. To overcome this noise sensitivity and further enhance generalization ability, Xu et al. [27] introduced the pinball loss into TPMSVM and proposed the twin support vector machine with pinball loss (pinTSVM), especially for noise-corrupted data.
Let the numbers of data points belonging to classes $+1$ and $-1$ be $\ell_1$ and $\ell_2$, respectively, in the $n$-dimensional real space $\mathbb{R}^n$. The nonlinear pinTSVM seeks two kernel-generated surfaces defined as follows:

$$K(x^T, D^T)u_+ + b_+ = 0 \quad \text{and} \quad K(x^T, D^T)u_- + b_- = 0,$$

where $D=[A;B]$, $u_+, u_- \in \mathbb{R}^n$, and $K$ is an arbitrary kernel function. The nonlinear pinTSVM formulation can be expressed as follows:
$$\min_{u_+,\, b_+,\, \xi_1} \ \frac{1}{2}\|u_+\|^2 + \frac{\nu_1}{\ell_2} e_2^T\big(K(B,D^T)u_+ + e_2 b_+\big) + \frac{c_1}{\ell_1} e_1^T \xi_1
\quad \text{s.t. } K(A,D^T)u_+ + e_1 b_+ \ge -\xi_1, \quad K(A,D^T)u_+ + e_1 b_+ \le \frac{\xi_1}{\tau_1}, \tag{26}$$

and

$$\min_{u_-,\, b_-,\, \xi_2} \ \frac{1}{2}\|u_-\|^2 - \frac{\nu_2}{\ell_1} e_1^T\big(K(A,D^T)u_- + e_1 b_-\big) + \frac{c_2}{\ell_2} e_2^T \xi_2
\quad \text{s.t. } -\big(K(B,D^T)u_- + e_2 b_-\big) \ge -\xi_2, \quad -\big(K(B,D^T)u_- + e_2 b_-\big) \le \frac{\xi_2}{\tau_2}, \tag{27}$$

where $c_1, c_2$ are positive parameters, $\nu_1, \nu_2 > 0$ are margin parameters, and $\tau_1, \tau_2 \in [0,1]$ are pinball loss function parameters. When $\tau_1$ and $\tau_2$ are zero, the QPPs in Eqns. (26) and (27) reduce to the QPPs of TPMSVM. By introducing the Lagrange function and using the Karush-Kuhn-Tucker (KKT) optimality conditions, we obtain the dual formulations of the QPPs in Eqns. (26) and (27) as follows:
$$\max_{\alpha,\, \beta} \ \frac{\nu_1}{\ell_2} e_2^T K(B,A)^T(\alpha-\beta) - \frac{1}{2}(\alpha-\beta)^T K(A,A)^T(\alpha-\beta)
\quad \text{s.t. } e_1^T(\alpha-\beta) = \nu_1, \quad \alpha + \frac{\beta}{\tau_1} = \frac{c_1}{\ell_1} e_1, \quad \alpha \ge 0,\ \beta \ge 0, \tag{28}$$

and

$$\max_{\gamma,\, \sigma} \ \frac{\nu_2}{\ell_1} e_1^T K(A,B)^T(\gamma-\sigma) - \frac{1}{2}(\gamma-\sigma)^T K(B,B)^T(\gamma-\sigma)
\quad \text{s.t. } e_2^T(\gamma-\sigma) = \nu_2, \quad \gamma + \frac{\sigma}{\tau_2} = \frac{c_2}{\ell_2} e_2, \quad \gamma \ge 0,\ \sigma \ge 0, \tag{29}$$

where $\alpha$, $\beta$, $\gamma$ and $\sigma$ are Lagrange multipliers.
After optimizing the QPPs in Eqns. (28) and (29), we obtain $u_+$ and $u_-$ as follows:

$$u_+ = K(A,D^T)^T(\alpha-\beta) - \frac{\nu_1}{\ell_2}K(B,D^T)^T e_2$$

and

$$u_- = -K(B,D^T)^T(\gamma-\sigma) + \frac{\nu_2}{\ell_1}K(A,D^T)^T e_1.$$

The value of the bias term $b_+$ is given by

$$O_+ = \{i : \alpha_i > 0 \text{ and } \beta_i > 0\}, \quad b_+ = -\frac{1}{|O_+|}\sum_{i\in O_+} K(x_i^T, D^T)u_+.$$

Similarly, the value of the bias term $b_-$ is given by

$$O_- = \{i : \gamma_i > 0 \text{ and } \sigma_i > 0\}, \quad b_- = -\frac{1}{|O_-|}\sum_{i\in O_-} K(x_i^T, D^T)u_-.$$
A new data point $x\in\mathbb{R}^n$ is assigned to class $i$ ($i=+1,-1$) depending on which of the kernel-generated surfaces is closer to $x$, i.e.,

$$\text{class}(i) = \operatorname{sign}\left(\frac{K(x^T,D^T)u_+ + b_+}{\|u_+\|} + \frac{K(x^T,D^T)u_- + b_-}{\|u_-\|}\right),$$

where $\operatorname{sign}(\cdot)$ is the signum function.
3.3. Least squares TWSVM variants
3.3.1. Least squares twin support vector machine (LSTSVM)
The formulation of the least squares twin support vector machine (LSTSVM) [9] simply solves systems of linear equations, as opposed to the QPPs of TWSVM, TBSVM and pinTSVM. Therefore, it is a simple and fast algorithm. To derive the primal problems of LSTSVM, the inequality constraints of TWSVM are replaced by equality constraints and the 1-norm on the slack variables is replaced by the 2-norm. Thus, the primal problems of LSTSVM [9] can be expressed as
$$\min_{(w_1,b_1)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Aw_1+e_2b_1\|^2 + \frac{c_1}{2}\|\xi_1\|^2
\quad \text{s.t. } -(Bw_1+e_1b_1)+\xi_1 = e_1, \tag{30}$$

$$\min_{(w_2,b_2)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Bw_2+e_1b_2\|^2 + \frac{c_2}{2}\|\xi_2\|^2
\quad \text{s.t. } (Aw_2+e_2b_2)+\xi_2 = e_2. \tag{31}$$
The solution of linear LSTSVM is obtained by computing the inverses of two matrices [9], and can be expressed in the form of two nonparallel hyperplanes as follows:

$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = -\big(c_1 Q^tQ + P^tP\big)^{-1} c_1 Q^t e_1, \tag{32}$$

$$\begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = \big(c_2 P^tP + Q^tQ\big)^{-1} c_2 P^t e_2, \tag{33}$$

where $c_1$ and $c_2$ are positive penalty parameters, $P=[A\ e_2]$ and $Q=[B\ e_1]$.

Here, we can see that both TWSVM and LSTSVM minimize only the empirical risk, and the matrices in Eqns. (32) and (33) may be singular.
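A minimal numpy sketch of the closed-form training in Eqns. (32)-(33) follows; the small ridge term `eps*I` is our own addition to guard against the singularity noted above, and is not part of the original formulation:

```python
import numpy as np

def lstsvm_train(A, B, c1=1.0, c2=1.0, eps=1e-8):
    # Augmented matrices P = [A e2], Q = [B e1]
    P = np.hstack([A, np.ones((A.shape[0], 1))])
    Q = np.hstack([B, np.ones((B.shape[0], 1))])
    I = np.eye(P.shape[1])
    e1 = np.ones(Q.shape[0])
    e2 = np.ones(P.shape[0])
    # Eqns. (32) and (33); eps*I guards against singular matrices
    u1 = -np.linalg.solve(c1 * Q.T @ Q + P.T @ P + eps * I, c1 * Q.T @ e1)
    u2 = np.linalg.solve(c2 * P.T @ P + Q.T @ Q + eps * I, c2 * P.T @ e2)
    return u1, u2  # each stacks [w; b]

def lstsvm_predict(X, u1, u2):
    # Assign to the class whose hyperplane is nearer (perpendicular distance)
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    d1 = np.abs(Xa @ u1) / np.linalg.norm(u1[:-1])
    d2 = np.abs(Xa @ u2) / np.linalg.norm(u2[:-1])
    return np.where(d1 <= d2, 1, -1)
```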
3.3.2. Improved least squares twin support vector machine (ILSTSVM)
The least squares twin support vector machine (LSTSVM) implements only the empirical risk minimization principle, which reduces its generalization performance. To overcome this drawback, Xu et al. [16] proposed an improved version of LSTSVM by introducing an extra regularization term into each objective function. This improvement implements the structural risk minimization principle and yields better generalization performance compared to LSTSVM.
The primal problems of ILSTSVM [16] can be written as

$$\min_{(w_1,b_1,\xi_1)} \ \frac{1}{2}\|Aw_1+e_2b_1\|^2 + \frac{c_1}{2}\xi_1^t\xi_1 + \frac{c_3}{2}\left\|\begin{bmatrix} w_1 \\ b_1 \end{bmatrix}\right\|^2
\quad \text{s.t. } -(Bw_1+e_1b_1)+\xi_1 = e_1, \tag{34}$$

$$\min_{(w_2,b_2,\xi_2)} \ \frac{1}{2}\|Bw_2+e_1b_2\|^2 + \frac{c_2}{2}\xi_2^t\xi_2 + \frac{c_4}{2}\left\|\begin{bmatrix} w_2 \\ b_2 \end{bmatrix}\right\|^2
\quad \text{s.t. } (Aw_2+e_2b_2)+\xi_2 = e_2, \tag{35}$$

where $c_i,\ i=1,2,3,4$ are the penalty parameters, $e_1$ and $e_2$ are vectors of ones of appropriate dimensions, and $\xi_1$ and $\xi_2$ are slack variables of appropriate dimensions.
The solution of linear ILSTSVM is obtained by computing the inverses of two matrices [16], and can be expressed in the form of two nonparallel hyperplanes as follows:

$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = -\Big[G^tG + \frac{1}{c_1}H^tH + \frac{c_3}{c_1}I\Big]^{-1} G^t e_1, \tag{36}$$

$$\begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = \Big[H^tH + \frac{1}{c_2}G^tG + \frac{c_4}{c_2}I\Big]^{-1} H^t e_2, \tag{37}$$

where, as in Section 3.1.2, $G=[B\ e_1]$ and $H=[A\ e_2]$.
Note that the solutions of the primal problems in Eqns. (34) and (35) are obtained directly by solving two systems of linear equations instead of the two QPPs of TBSVM, which implies that ILSTSVM is faster than TBSVM.
3.3.3. Robust energy-based least squares twin support vector machines (RELS-TSVM)
By introducing an energy parameter for each hyperplane and an extra regularization term in each objective function, Tanveer et al. [20] recently presented the robust energy-based least squares twin support vector machine (RELS-TSVM) algorithm for classification problems. This algorithm is not only robust to noise and outliers but also more stable.
The primal problems of RELS-TSVM can be expressed as follows:

$$\min_{(w_1,b_1,\xi_1)} \ \frac{1}{2}\|Aw_1+e_2b_1\|^2 + \frac{c_1}{2}\xi_1^t\xi_1 + \frac{c_3}{2}\left\|\begin{bmatrix} w_1 \\ b_1 \end{bmatrix}\right\|^2
\quad \text{s.t. } -(Bw_1+e_1b_1)+\xi_1 = E_1, \tag{38}$$

$$\min_{(w_2,b_2,\xi_2)} \ \frac{1}{2}\|Bw_2+e_1b_2\|^2 + \frac{c_2}{2}\xi_2^t\xi_2 + \frac{c_4}{2}\left\|\begin{bmatrix} w_2 \\ b_2 \end{bmatrix}\right\|^2
\quad \text{s.t. } (Aw_2+e_2b_2)+\xi_2 = E_2, \tag{39}$$

where $c_i,\ i=1,2,3,4$ are the penalty parameters, $E_1$ and $E_2$ are energy parameters of the hyperplanes, and $\xi_1$ and $\xi_2$ are slack variables of appropriate dimensions.
One can obtain the solutions of the problems in Eqns. (38) and (39) as

$$z_1 = -\big(c_1 N^tN + M^tM + c_3 I\big)^{-1} c_1 N^t E_1 \tag{40}$$

$$z_2 = \big(c_2 M^tM + N^tN + c_4 I\big)^{-1} c_2 M^t E_2 \tag{41}$$

respectively, where $N=[B\ e_1]$ and $M=[A\ e_2]$.
It should be pointed out that both $(c_1N^tN + M^tM + c_3I)$ and $(c_2M^tM + N^tN + c_4I)$ are positive definite matrices owing to the extra regularization terms, which provide additional robustness and stability to the algorithm. We also note that RELS-TSVM is not affected by matrix singularity. The parameters $c_3$ and $c_4$ used in the formulation are penalty parameters rather than perturbation terms. Once the training of RELS-TSVM is complete, the class of an unknown data point $x_i$ is assigned based on the following decision function:
$$f(x_i) = \begin{cases} +1 & \text{if } \left|\dfrac{x_i w_1 + e b_1}{x_i w_2 + e b_2}\right| \le 1, \\[1ex] -1 & \text{if } \left|\dfrac{x_i w_1 + e b_1}{x_i w_2 + e b_2}\right| > 1, \end{cases} \tag{42}$$

where $|\cdot|$ is the absolute value.
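Since RELS-TSVM tops the rankings reported later, a minimal numpy sketch of its closed-form training (Eqns. (40)-(41)) and decision rule (42) may be useful; the default parameter values and the tiny denominator guard are our illustrative assumptions, not tuned settings from the paper:

```python
import numpy as np

def rels_tsvm_train(A, B, c1=1.0, c2=1.0, c3=0.1, c4=0.1, E1=1.0, E2=1.0):
    """Closed-form RELS-TSVM solution, Eqns. (40)-(41): M = [A e], N = [B e]."""
    M = np.hstack([A, np.ones((A.shape[0], 1))])
    N = np.hstack([B, np.ones((B.shape[0], 1))])
    I = np.eye(M.shape[1])
    E1v = E1 * np.ones(N.shape[0])  # energy of the first hyperplane
    E2v = E2 * np.ones(M.shape[0])  # energy of the second hyperplane
    z1 = -np.linalg.solve(c1 * N.T @ N + M.T @ M + c3 * I, c1 * N.T @ E1v)
    z2 = np.linalg.solve(c2 * M.T @ M + N.T @ N + c4 * I, c2 * M.T @ E2v)
    return z1, z2  # each stacks [w; b]

def rels_tsvm_predict(X, z1, z2):
    """Decision function (42): ratio of the two (unnormalized) plane values."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    ratio = np.abs(Xa @ z1) / (np.abs(Xa @ z2) + 1e-12)  # guard against /0
    return np.where(ratio <= 1.0, 1, -1)
```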
4. Numerical experiments
All 90 datasets are taken from the UCI repository [24]. Out of the 90 datasets, 44 are binary and 46 are multi-class. The names of these datasets2 are available at http://people.iiti.ac.in/~phd1501101001/TSVM_Binary_JMLR/results-Binary_Final.xlsx and http://people.iiti.ac.in/~phd1501101001/TSVM_Binary_JMLR/results-Multi_Final.xlsx. We have performed Z-score normalization on all the datasets, as done in [23]. Our experimental setup is identical to that in [23] and consists of two steps. In the first step, one training and one testing set are generated by randomly dividing the dataset into two equal parts; the parameters of each classifier are tuned, and the best-performing parameters on the testing set are selected as the optimal parameters. The indices for this random division of the datasets have been taken directly from [23]; the authors have provided all indices at http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/data.tar.gz. In the second step, experiments have been performed in two ways using the optimal parameters for final training and testing, as follows:

2The whole dataset collection and partitions are available from: http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/data.tar.gz
(i) If the dataset is not originally (as provided by the creator of the dataset) available in two sets, i.e. training and testing sets, then 4-fold cross validation is performed on the whole dataset. We have used the same indices as in [23] for the training and testing sets of each fold, for all the classifiers used in this paper. The final result is the average result over the 4 folds.

(ii) If the dataset is originally (as provided by the creator of the dataset) available in two sets, i.e. training and testing sets (like hill-valley, horse-colic, monks-1, spectf, etc.), then we train and test the classifiers on the respective partitions using the optimal parameters obtained in the first step.
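The per-attribute Z-score normalization applied above can be sketched as follows; whether the statistics are computed on the training partition (as here) or on the whole dataset is a convention we leave open, since [23] only specifies standardization to zero mean and unit variance:

```python
import numpy as np

def zscore(train, test):
    # Standardize each attribute to zero mean and unit variance.
    # Statistics are computed on the training partition here (one common
    # convention; computing them on the whole dataset is also used).
    mu = train.mean(axis=0)
    sd = train.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant attributes unchanged
    return (train - mu) / sd, (test - mu) / sd
```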
All partitions and indices used in our experiments are the same as those used in [23]. The eight variants of TWSVM, viz. TWSVM, TBSVM, LPTSVM, pinTSVM, WLTSVM, LSTSVM, ILSTSVM, and RELS-TSVM, are discussed in the previous section. We have followed the same naming convention for the classifiers as in [23]. All experiments have been conducted in MATLAB 2016a under Windows 7 (64-bit) with 64 GB RAM and a 3.00 GHz Intel Xeon processor. The names of the 8 variants of TWSVM are appended with 'm' ('m' stands for MATLAB): TBSVM m, TWSVM m, LPTSVM m, pinTSVM m, WLTSVM m, LSTSVM m, ILSTSVM m, and RELS-TSVM m. Note that from now onward, TWSVM denotes the twin SVM family collectively, while TWSVM m denotes the basic variant of the twin SVM family. These variants of TWSVM use the Gaussian kernel, which has one parameter, σ. The ranges of all parameters of the TWSVM variants are provided in Table 1.
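The Gaussian kernel matrix used by these variants can be computed as below; the exact parameterization varies across papers, so the form $K(x,y)=\exp(-\|x-y\|^2/(2\sigma^2))$ here is one common convention, not necessarily the one in the authors' code:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Pairwise Gaussian (RBF) kernel matrix:
    K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))
```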
4.1. Comparison of the 8 TWSVM variants with the 179 classifiers from [23] on binary class datasets

In this section, we discuss performance only on the binary datasets; the multi-class datasets are discussed in Section 4.2. The performance of all 179 classifiers from [23] together with the 8 variants of TWSVM is presented in Tables 2 and 3. All values of the 8 TWSVM variants are set in bold face in Tables 2 and 3. The first column in Tables 2 and 3 shows the position (Pos) as per the Friedman Rank (FRank).
Table 1: Ranges of all parameters of the TWSVM variants

Parameter | ILSTSVM m | LPTSVM m | LSTSVM m | pinTSVM m | RELS-TSVM m | TBSVM m | TWSVM m | WLTSVM m
Regularization (c1) | 10^-5 to 10^5 | 10^-5 to 10^5 | 10^-5 to 10^5 | 2^-3 to 2^7 | 10^-5 to 10^5 | 10^-5 to 10^5 | 10^-5 to 10^5 | 10^-5 to 10^5
Regularization (c2) | 10^-5 to 10^5 | 10^-5 to 10^5 | 10^-5 to 10^5 | 2^-3 to 2^7 | 10^-5 to 10^5 | 10^-5 to 10^5 | 10^-5 to 10^5 | 10^-5 to 10^5
Epsilon (1) | 10^-5 to 10^5 | -- | -- | -- | 10^-5 to 10^5 | 10^-5 to 10^5 | -- | 10^-5 to 10^5
Epsilon (2) | 10^-5 to 10^5 | -- | -- | -- | 10^-5 to 10^5 | 10^-5 to 10^5 | -- | 10^-5 to 10^5
Sigma (σ) | 2^-10 to 2^10 | 2^-10 to 2^10 | 2^-10 to 2^10 | 2^-10 to 2^10 | 2^-10 to 2^10 | 2^-10 to 2^10 | 2^-10 to 2^10 | 2^-10 to 2^10
Tau (τ1) | -- | -- | -- | [0.05, 0.1, 0.2, 0.5, 1] | -- | -- | -- | --
Tau (τ2) | -- | -- | -- | [0.05, 0.1, 0.2, 0.5, 1] | -- | -- | -- | --
Supplementary parameter (ν1) | -- | -- | -- | 2^-3 to 2^7 | -- | -- | -- | --
Supplementary parameter (ν2) | -- | -- | -- | 2^-3 to 2^7 | -- | -- | -- | --
Energy (E1) | -- | -- | -- | -- | [0.5, 0.6, 0.7, 0.8, 0.9, 1.0] | -- | -- | --
Energy (E2) | -- | -- | -- | -- | [0.5, 0.6, 0.7, 0.8, 0.9, 1.0] | -- | -- | --
The second column contains the classifiers with their family names in brackets; these names are kept the same as in [23], where their detailed descriptions are available. The third and fourth columns contain the FRank and average accuracy (Acc) of the respective classifiers. These average accuracies and ranks differ from those reported in [23], as they are based only on the 44 binary datasets. Out of the 179 classifiers, vbmpRadial t did not work on the binary class datasets, as it requires at least a 3-class dataset; for the sake of completeness, vbmpRadial t is simply added at the last place in Table 3. We have performed the analysis in the same way as in [23], by calculating the FRank, Average accuracy (Acc), Probability of Achieving the Maximum Accuracy (PAMA), probability of achieving more than 95% of the maximum accuracy (P95), and Percentage of the Maximum Accuracy (PMA). PAMA, P95 and PMA are defined as follows [23]:
$$\text{PAMA} = \frac{\#\{\text{datasets on which the classifier achieves the maximum accuracy}\}}{\#\{\text{datasets}\}} \times 100$$

$$\text{P95} = \frac{\#\{\text{datasets on which the classifier achieves more than 95\% of the maximum accuracy}\}}{\#\{\text{datasets}\}} \times 100$$

$$\text{PMA} = \frac{1}{\#\{\text{datasets}\}} \sum_{i=1}^{\#\{\text{datasets}\}} \frac{\text{accuracy of the classifier on the } i\text{th dataset}}{\text{maximum accuracy achieved on the } i\text{th dataset}} \times 100$$
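The three measures just defined can be computed from an accuracy matrix (classifiers by datasets) as below; this is a small numpy sketch, with the matrix orientation being our own assumption:

```python
import numpy as np

def benchmark_measures(acc):
    """acc: (n_classifiers, n_datasets) accuracies in percent.
    Returns PAMA, P95 and PMA, each per classifier, in percent."""
    best = acc.max(axis=0)                     # maximum accuracy per dataset
    pama = 100.0 * (acc == best).mean(axis=1)  # achieves the maximum
    p95 = 100.0 * (acc >= 0.95 * best).mean(axis=1)
    pma = (100.0 * acc / best).mean(axis=1)    # percentage of the maximum
    return pama, p95, pma
```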
All the above-mentioned measures are recalculated for the binary datasets, now including the 8 variants of TWSVM, using the detailed results3 provided by Fernandez et al. [23]. These are extensively discussed in the following subsections.
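The FRank itself is the average Friedman rank over datasets; a sketch using scipy's tie-averaging `rankdata` is given below (this is the standard computation, not necessarily the authors' exact script):

```python
import numpy as np
from scipy.stats import rankdata

def friedman_rank(acc):
    """acc: (n_classifiers, n_datasets). On each dataset, rank classifiers
    by accuracy (rank 1 = best; ties get the mean rank); FRank is the
    average rank of each classifier across all datasets."""
    ranks = np.column_stack([rankdata(-acc[:, j]) for j in range(acc.shape[1])])
    return ranks.mean(axis=1)
```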
4.1.1. FRank and Average Accuracy analysis of 8 variants of TWSVM with 179
classifiers from [23]
Friedman ranking is performed on the 44 binary datasets with 187 classifiers (179 classifiers from [23] and the 8 variants of TWSVM). Their FRank and Average accuracy (Acc) are presented in Tables 2 and 3. As the detailed per-dataset results of each classifier are very large, they are provided in full at http://people.iiti.ac.in/~phd1501101001/TSVM_JMLR_Binary_Multi.html. As mentioned in [23], some classifiers yield erroneous output; before calculating the FRank value, every erroneous output is replaced by the average accuracy of that specific dataset over the 187 classifiers (as done in [23]). Those erroneous values are represented by '–' in the result table, which is available at the above-mentioned webpage. The top three classifiers by average accuracy are RELS-TSVM m, ILSTSVM m and KRR m/KELM m. However, FRank is not fully consistent with average accuracy, as can be seen in Tables 2 and 3: by FRank, RELS-TSVM m, ILSTSVM m and avNNet t are the top three classifiers. According to both criteria, viz. FRank and Average accuracy, the improved least squares variants of TWSVM (RELS-TSVM m and ILSTSVM m) share the top place in the table. However, the basic least squares variant, LSTSVM m, is the second-worst performer among all 8 variants of TWSVM. Looking at the top-20 list, 5 TWSVM based classifiers are among the top 20 of the 187 classifiers according to both FRank and Average accuracy (Acc). Among the 187 classifiers, RELS-TSVM m and ILSTSVM m show stable and better results compared to all remaining classifiers. Fig. 1
3The detailed results are available at: http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/
Table 2: Position (Pos) of the classifiers on the binary class datasets as per FRank, together with the FRank and Average accuracy (Acc) of each classifier, ordered by increasing FRank. This table is continued in Table 3.
Pos Classifiers FRank Acc
1 RELS-TSVM m(TWSVM) 36.9 83.1
1 ILSTSVM m(TWSVM) 36.9 83.1
2 avNNet t(NNET) 39.7 82.0
3 svmPoly t(SVM) 40.4 81.8
4 TBSVM m(TWSVM) 43.2 81.8
5 svmRadialCost t(SVM) 43.9 81.9
6 pcaNNet t(NNET) 45.1 81.9
7 KRR m/KELM m(NNET) 48.1 82.7
8 svm C(SVM) 48.4 80.8
9 rf t(RF) 48.6 81.6
10 LPTSVM m(TWSVM) 48.9 81.9
11 svmRadial t(SVM) 51.2 81.2
12 parRF t(RF) 52.9 81.1
13 nnet t(NNET) 54.1 80.9
14 C5.0 t(BST) 56.6 80.4
15 mlp t(NNET) 56.8 80.7
16 cforest t(RF) 57.3 79.5
17 mlpWeightDecay t(NNET) 57.4 80.3
18 TWSVM m(TWSVM) 58.9 79.6
19 svmLinear t(SVM) 60.9 80.6
20 svmBag R(BAG) 61.5 80.6
21 RotationForest w(RF) 62.6 80.6
22 rforest R(RF) 63.4 82.5
22 gaussprRadial R(OM) 63.4 81.2
23 bayesglm t(GLM) 64.2 80.3
24 fda t(DA) 64.5 79.7
25 glmnet R(GLM) 64.8 81.0
26 BG LibSVM w(BAG) 65.2 79.5
27 rda R(DA) 66.1 80.5
28 pls t(PLSR) 66.9 80.6
29 rbf t(NNET) 67.2 78.7
30 knn t(NN) 67.5 79.4
31 WLTSVM m(TWSVM) 68.5 81.1
31 pnn m(NNET) 68.5 79.3
32 svmlight C(NNET) 68.8 81.1
33 pda t(DA) 68.9 80.5
34 rbfDDA t(NNET) 69.3 80.0
35 MAB LibSVM w(BST) 70.0 79.7
36 simpls R(PLSR) 70.7 80.5
37 widekernelpls R(PLSR) 71.3 80.5
38 RRFglobal t(RF) 71.4 80.2
39 multinom t(LMR) 71.5 80.8
40 nnetBag R(BAG) 71.6 79.5
41 dkp C(NNET) 72.4 79.6
42 mlm R(GLM) 73.2 80.9
43 adaboost R(BST) 73.5 80.0
44 fda R(DA) 73.6 79.8
45 plsBag R(BAG) 73.9 79.0
46 lda R(DA) 74.2 79.5
47 LibLINEAR w(SVM) 74.6 79.5
Pos Classifiers FRank Acc
48 mda t(DA) 74.7 72.8
49 kernelpls R(PLSR) 74.9 78.4
50 LibSVM w(SVM) 76.0 78.6
50 RRF t(RF) 76.0 79.8
51 MCC w(LMR) 76.4 79.2
51 Logistic w(OEN) 76.4 79.2
52 BG RandomForest w(BAG) 76.8 79.3
53 RandomForest w(RF) 77.4 79.7
54 knn R(NN) 77.5 79.2
55 gcvEarth t(MARS) 77.6 78.7
56 lvq t(NNET) 78.5 79.0
57 Decorate w(OEN) 78.7 79.8
58 MAB PART w(BST) 79.2 79.3
59 bagging R(BAG) 79.5 77.0
60 ldaBag R(BAG) 79.7 79.6
60 sda t(DA) 79.7 79.5
61 SMO w(SVM) 79.8 78.9
62 mars R(MARS) 79.9 78.4
63 SimpleLogistic w(LMR) 80.0 78.2
64 glmStepAIC t(GLM) 80.3 78.7
65 BG PART w(BAG) 81.3 78.8
66 MAB RandomForest w(BST) 81.4 79.5
67 lda2 t(DA) 81.6 79.1
68 mlp C(NNET) 82.1 77.9
69 BG Logistic w(BAG) 82.3 79.2
69 MAB MLP w(BST) 82.3 79.7
70 MAB Logistic w(BST) 83.5 79.3
71 MAB J48 w(BST) 83.9 79.1
72 BG DecisionTable w(BAG) 84.9 77.8
73 gpls R(PLSR) 85.1 76.3
74 hdda R(DA) 85.3 79.0
75 MLP w(NNET) 85.9 79.2
76 glm R(GLM) 86.4 76.8
77 ctreeBag R(BAG) 87.1 77.7
77 BG J48 w(BAG) 87.1 78.4
78 elm m(NNET) 87.2 78.0
79 RandomSubSpace w(DT) 87.3 77.6
80 CVR w(OM) 88.5 78.4
81 MAB REPTree w(BST) 88.6 78.3
82 JRip t(RL) 89.0 77.7
83 ctree2 t(DT) 89.5 77.3
84 ctree t(DT) 90.1 77.1
85 AdaBoostM1 J48 w(BST) 90.4 79.3
86 Dagging w(OEN) 90.6 77.6
87 LSTSVM m(TWSVM) 91.1 74.0
88 BG Ibk w(BAG) 91.6 78.5
88 BG LWL w(BAG) 91.6 78.5
88 BG REPTree w(BAG) 91.6 78.5
89 mda R(DA) 92.6 78.4
90 RandomCommittee w(OEN) 92.7 78.6
Table 3: Continuation of Table 2
Pos Classifiers FRank Acc
91 treebag t(BAG) 92.8 78.1
91 obliqueTree R(DT) 92.8 78.6
92 PenalizedLDA R(DA) 94.2 76.2
93 BG RandomTree w(BAG) 95.9 78.4
94 MAB DecisionTable w(BST) 96.0 76.5
95 C5.0Rules t(RL) 96.9 78.4
96 mlp m(NNET) 97.4 77.4
97 NBTree w(DT) 97.6 77.2
97 LogitBoost w(BST) 97.6 77.2
98 lssvmRadial t(SVM) 97.7 77.5
99 RBFNetwork w(NNET) 98.8 76.3
100 C5.0Tree t(DT) 99.5 77.9
101 DTNB w(RL) 99.6 77.3
102 rpart t(DT) 99.9 77.0
103 slda t(DA) 100.5 76.4
104 ASC w(OM) 100.7 77.0
105 AdaBoostM1 w(BST) 100.8 76.8
105 JRip w(RL) 100.8 77.4
106 FilteredClassifier w(OM) 101.4 77.0
107 Ridor w(RL) 102.8 77.0
108 lvq R(NNET) 103.2 73.5
109 pam t(OM) 103.7 76.2
110 MAB RandomTree w(BST) 103.8 77.7
111 sddaLDA R(DA) 104.1 76.6
112 PART w(DT) 104.3 77.7
113 J48 w(DT) 104.5 77.8
113 OCC w(OEN) 104.5 77.8
113 END w(OEN) 104.5 77.8
114 rbf m(NNET) 104.7 74.0
115 rpart2 t(DT) 104.9 78.3
116 qda t(DA) 105.4 76.1
116 BayesNet w(BY) 105.4 75.9
116 PART t(DT) 105.4 78.2
117 stepLDA t(DA) 106.4 75.4
118 bdk R(NNET) 106.8 77.7
119 J48 t(DT) 108.0 77.6
120 sddaQDA R(DA) 109.4 75.1
121 rpart R(DT) 111.3 76.3
122 sparseLDA R(DA) 112.5 73.4
123 REPTree w(DT) 113.5 76.1
124 stepQDA t(DA) 114.0 74.2
125 DecisionTable w(RL) 114.1 75.4
126 MAB Ibk w(BST) 114.3 74.8
Pos Classifiers FRank Acc
126 MAB w(BST) 114.6 74.8
127 MAB NaiveBayes w(BST) 115.1 74.5
127 IBk w(NN) 115.1 76.5
128 KStar w(OM) 116.3 76.2
129 NNge w(NN) 116.7 76.1
130 cascor C(NNET) 117.6 75.5
131 nbBag R(BAG) 117.7 74.5
132 rrlda R(DA) 117.8 74.7
132 naiveBayes R(BY) 117.9 74.3
133 IB1 w(NN) 120.1 76.2
134 NaiveBayes w(BY) 121.2 73.7
135 LWL w(OEN) 121.3 74.5
136 BG DecisionStump w(BAG) 122.0 73.8
137 BG NaiveBayes w(BAG) 124.5 73.1
138 BG OneR w(BAG) 125.3 74.0
139 DecisionStump w(DT) 125.7 73.5
139 NBUpdateable w(BY) 125.7 73.1
140 ConjunctiveRule w(RL) 128.7 72.7
141 MAB OneR w(BST) 130.1 74.0
142 NaiveBayesSimple w(BY) 131.3 71.9
143 OneR t(RL) 132.2 72.5
144 dpp C(NNET) 132.3 69.4
145 spls R(PLSR) 133.9 65.8
146 logitboost R(BST) 136.0 69.3
147 QdaCov t(DA) 136.6 73.1
148 OneR w(RL) 137.6 71.9
149 RandomTree w(DT) 138.7 74.6
150 BG MLP w(BAG) 138.7 66.5
151 pinTSVM m(TWSVM) 139.7 66.3
152 BG HyperPipes w(BAG) 141.3 67.0
152 Stacking w(STC) 145.2 63.2
152 Grading w(OEN) 145.2 63.2
153 CVPS w(OM) 145.2 63.2
154 StackingC w(STC) 145.3 63.1
155 RILB w(BST) 145.7 63.4
155 VFI w(OM) 146.0 68.2
156 HyperPipes w(OM) 146.1 65.1
156 ZeroR w(RL) 146.8 62.6
156 MultiScheme w(OEN) 146.8 62.6
156 CSC w(OEN) 146.8 62.6
157 Vote w(OEN) 146.8 62.6
158 MetaCost w(BAG) 147.3 62.5
159 CVC w(OM) 165.6 61.6
160 vbmpRadial t(BY) NA NA
exhibits that RELS-TSVM m and ILSTSVM m achieve either the maximum accuracy or near-maximum accuracy on all datasets except hill-valley (53.6%) and horse-colic (60.29%). We further provide two separate analyses of the top 25 classifiers in Figs. 2 and 3, using FRank and Acc, respectively. Fig. 2 shows that the top 2 classifiers, i.e., RELS-TSVM m and ILSTSVM m (TWSVM family), exhibit identical performance, and that there is a significant difference (2.8) between the FRank of RELS-TSVM m (or ILSTSVM m) and that of avNNet t. Furthermore, we statistically verify the results presented in Tables 2 and 3. We selected the top 20 classifiers from these tables as per their FRanks and performed the Friedman test. Under its null hypothesis, the accuracies of the compared methods are not significantly different; the hypothesis is retained at significance level α = 0.05 when p-value > 0.05. We computed three quantities for the top 20 classifiers: the F-score, the critical value (Cval), and the p-value, using both the Friedman test and the modified Friedman test [34]. The computed p-values in both cases (Friedman and modified Friedman test) are less than 0.05, i.e., 0.0012 and 0.0010, respectively, and the computed F-scores in both cases exceed the corresponding critical values, i.e., 43.1672 > 30.1435 and 2.3412 > 1.5993. Hence, the null hypothesis is rejected. Based on this analysis, a few interesting facts can be observed from Table 2 and Figs. 2 and 3:
(i) Despite the identical average accuracy of svmPoly t and TBSVM m, svmPoly t yields a lower FRank than TBSVM m, with a difference of 2.8.
(ii) Similarly, the Acc of TBSVM m is lower than that of KRR m/KELM m; however, TBSVM m achieves a better FRank (difference in FRanks: 4.9) than KRR m/KELM m.
(iii) KRR m/KELM m yields the third-highest accuracy among the 187 classifiers, and only 0.4 separates its Acc from that of RELS-TSVM m (or ILSTSVM m). However, KRR m/KELM m takes only 7th place among the 187 classifiers as per FRank.
(iv) Similar facts to those discussed in the above three points can be stated for LPTSVM m, TWSVM m and WLTSVM m. These facts mainly exhibit
Figure 1: Accuracy (in %) achieved by RELS-TSVM m (and ILSTSVM m) vs. maximum accuracy for each dataset (ordered by increasing maximum accuracies).
the unstable behavior of these three variants of TWSVM based classifiers. Notably, WLTSVM m performs better in terms of Acc but yields an inferior FRank compared to TWSVM m.
(v) One more interesting fact emerges from the first column (i.e., Pos) of Tables 2, 3, 7 and 8: several classifiers that perform poorly on the multi-class datasets perform very well on the binary-class datasets, in terms of both Acc and FRank.
The analysis in this section clearly shows the significant dominance of RELS-TSVM m and ILSTSVM m over all remaining classifiers. However, FRank and Acc cannot be the only criteria for measuring the performance of the classifiers. In the next two subsections, we analyze the performance of the classifiers based on PAMA, p95 and PMA.
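For reference, the FRank and Friedman / modified-Friedman computations described above can be sketched as follows. This is an illustrative sketch on random placeholder accuracies (the real matrix holds each classifier's accuracy on the 44 binary UCI datasets); the formulas are the standard Friedman chi-square and Iman-Davenport statistics [34], not the authors' released code:

```python
import random

# Placeholder accuracy matrix: acc[d][c] = accuracy of classifier c on dataset d.
random.seed(0)
N, k = 44, 20                      # 44 binary datasets, top-20 classifiers
acc = [[random.uniform(60, 90) for _ in range(k)] for _ in range(N)]

def ranks_on_dataset(row):
    """Rank 1 = highest accuracy on this dataset (ties would share ranks)."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    for pos, j in enumerate(order, start=1):
        ranks[j] = float(pos)
    return ranks

# FRank: average rank of each classifier across datasets (lower is better).
per_dataset = [ranks_on_dataset(row) for row in acc]
frank = [sum(r[j] for r in per_dataset) / N for j in range(k)]

# Friedman chi-square statistic, then the Iman-Davenport correction used as
# the "modified Friedman test"; each is compared against its critical value.
sum_R2 = sum(R * R for R in frank)
chi2_f = 12.0 * N / (k * (k + 1)) * (sum_R2 - k * (k + 1) ** 2 / 4.0)
f_f = (N - 1) * chi2_f / (N * (k - 1) - chi2_f)
```

Statistics exceeding their critical values at α = 0.05 reject the null hypothesis of equal accuracies, as in the analysis above.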
Figure 2: Top 25 classifiers as per FRank in increasing order of FRank.
Figure 3: Top 25 classifiers as per average accuracy in decreasing order of Acc.
4.1.2. PAMA and p95 analysis of 8 variants of TWSVM with 179 classifiers
from [23]
PAMA and p95 were calculated for the 187 classifiers over the 44 binary datasets. The top 20 classifiers as per the PAMA criterion are listed in Table 4, and the PAMA values of all 187 classifiers are provided on this web page1. As can be seen from Table 4, TBSVM m, rather than RELS-TSVM m (or ILSTSVM m), emerges as the best classifier as per the PAMA value. Surprisingly, the top 2 classifiers as per FRank in Table 2 (RELS-TSVM m and ILSTSVM m) are not able to secure a position in the top 20 list. It can be noted that all three least squares based TWSVM variants (LSTSVM m, RELS-TSVM m and ILSTSVM m) yield the same PAMA value of 6.8. Four variants of TWSVM, viz., TBSVM m, LPTSVM m, TWSVM m and WLTSVM m, are in the top 20 list, which shows the dominance of TWSVM based classifiers over the other classifiers. However, PAMA provides a biased insight into a classifier, as some classifiers do not achieve the maximum accuracy yet come very near to it [23]. Therefore, we also consider the p95 criterion for the evaluation. The top 20 classifiers as per the p95 criterion are listed in Table 5, and the p95 values of all 187 classifiers are provided on this web page1. As per this criterion, 4 variants of TWSVM attain a position in the top 20 list in Table 5. Both improved least squares variants, viz., RELS-TSVM m and ILSTSVM m, rank in the top 10, while TBSVM m performs better than both of them. This reflects the fact that RELS-TSVM m and ILSTSVM m might not achieve the best accuracy on a given dataset but improve the generalization capability of TWSVM m.
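As a concrete reference, the two criteria can be computed as below. The accuracy matrix is a random placeholder, and the definitions follow [23]: PAMA is the percentage of datasets on which a classifier attains the maximum accuracy reached by any classifier, and p95 the percentage of datasets on which it reaches at least 95% of that maximum.

```python
import random

# Placeholder accuracies: acc[d][c] for dataset d and classifier c.
random.seed(1)
N, k = 44, 10
acc = [[random.uniform(60, 95) for _ in range(k)] for _ in range(N)]
max_acc = [max(row) for row in acc]          # best accuracy on each dataset

# PAMA(%): share of datasets where classifier c hits the maximum accuracy.
pama = [100.0 * sum(acc[d][c] == max_acc[d] for d in range(N)) / N
        for c in range(k)]

# p95(%): share of datasets where classifier c reaches >= 95% of the maximum.
p95 = [100.0 * sum(acc[d][c] >= 0.95 * max_acc[d] for d in range(N)) / N
       for c in range(k)]
```

By construction p95 can never be lower than PAMA for the same classifier, which is why p95 gives the less biased view discussed above.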
4.1.3. PMA analysis of 8 variants of TWSVM with 179 classifiers from [23]
The top 20 classifiers as per the PMA criterion are listed in Table 6, and the PMA values of all 187 classifiers are provided on the web page1. Before calculating the PMA value, all erroneous outputs of the classifiers are replaced by zero (as done in [23]). As per this criterion, 2 variants of TWSVM, i.e., RELS-TSVM m and ILSTSVM m, attain the top 2 positions, as with the FRank and Acc criteria. Four TWSVM based classifiers secured positions among the top 10
Table 4: Top 20 classifiers as per the highest PAMA (%) value.
S.No. Classifier PAMA(%)
1TBSVM m(TWSVM) 15.9
2 KRR m/KELM m(NNET) 13.6
3 mda t(DA) 11.4
4 mlp t(NNET) 11.4
5 svmRadialCost t(SVM) 11.4
6LPTSVM m(TWSVM) 11.4
7 pcaNNet t(NNET) 9.1
8 pnn m(NNET) 9.1
9 dkp C(NNET) 9.1
10 svm C(SVM) 9.1
S.No. Classifier PAMA(%)
11 adaboost R(BST) 9.1
12 nnetBag R(BAG) 9.1
13 rforest R(RF) 9.1
14 gpls R(PLSR) 9.1
15 TWSVM m(TWSVM) 9.1
16 WLTSVM m(TWSVM) 9.1
17 MAB DecisionTable w(BST) 6.8
18 pda t(DA) 6.8
19 rda R(DA) 6.8
20 rbf m(NNET) 6.8
Table 5: Top 20 classifiers as per the highest p95 (%) value.
S.No. Classifier p95(%)
1 svmRadialCost t(SVM) 77.3
2 svm C(SVM) 72.7
3 svmPoly t(SVM) 72.7
4TBSVM m(TWSVM) 72.7
5 svmRadial t(SVM) 70.5
6RELS-TSVM m(TWSVM) 70.5
7ILSTSVM m(TWSVM) 70.5
8 BG LibSVM w(BAG) 68.2
9LPTSVM m(TWSVM) 68.2
10 avNNet t(NNET) 65.9
S.No. Classifier p95(%)
11 KRR m/KELM m(NNET) 63.6
12 MAB LibSVM w(BST) 61.4
13 pcaNNet t(NNET) 61.4
14 LibSVM w(SVM) 61.4
15 C5.0 t(BST) 61.4
16 rf t(RF) 61.4
17 parRF t(RF) 61.4
18 TWSVM m(TWSVM) 61.4
19 svmBag R(BAG) 59.1
20 mlpWeightDecay t(NNET) 56.8
Table 6: Top 20 classifiers as per the highest PMA (%) value.
S.No. Classifier PMA(%)
1RELS-TSVM m(TWSVM) 95.3
2ILSTSVM m(TWSVM) 95.3
3 KRR m/KELM m(NNET) 94.8
4 avNNet t(NNET) 94.1
5 pcaNNet t(NNET) 93.9
6 svmRadialCost t(SVM) 93.9
7 svmPoly t(SVM) 93.8
8TBSVM m(TWSVM) 93.7
9LPTSVM m(TWSVM) 93.7
10 rf t(RF) 93.5
S.No. Classifier PMA(%)
11 svmRadial t(SVM) 93.1
12 rforest R(RF) 93.0
13 nnet t(NNET) 93.0
14 parRF t(RF) 93.0
15 glmnet R(GLM) 92.9
16 WLTSVM m(TWSVM) 92.8
17 svm C(SVM) 92.8
18 mlp t(NNET) 92.7
19 nnetBag R(BAG) 92.7
20 svmLinear t(SVM) 92.6
Figure 4: Top 20 classifiers as per PMA criterion in decreasing order of PMA value.
Figure 5: PMA value over 44 datasets in increasing order for RELS-TSVM m and avNNet t.
classifiers. In Fig. 4, the top 20 classifiers are plotted using their PMA values. One can observe that RELS-TSVM m and ILSTSVM m clearly outperform the top two classifiers of [23], as the differences in PMA value between RELS-TSVM m and KRR m/KELM m and avNNet t are 0.5 and 1.2, respectively. The best performing classifier as per FRank among the 179 classifiers of [23] (i.e., avNNet t) and among the TWSVM based classifiers (i.e., RELS-TSVM m or ILSTSVM m) are plotted with their PMA values in increasing order over the 44 binary datasets in Fig. 5. It can easily be observed that RELS-TSVM m performs similarly to or significantly better than avNNet t on the 44 binary datasets.
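The PMA computation used in this subsection can be sketched in the same style, again on placeholder data; following [23], erroneous outputs are set to zero before the per-dataset maxima are taken.

```python
import random

# Placeholder accuracies; None marks an erroneous run, replaced by 0 as in [23].
random.seed(2)
N, k = 44, 10
acc = [[random.uniform(60, 95) for _ in range(k)] for _ in range(N)]
acc[3][2] = None                                  # one simulated failed run
acc = [[0.0 if a is None else a for a in row] for row in acc]
max_acc = [max(row) for row in acc]               # per-dataset maximum accuracy

# PMA(%): average, over datasets, of each accuracy expressed as a percentage
# of the maximum accuracy achieved by any classifier on that dataset.
pma = [100.0 * sum(acc[d][c] / max_acc[d] for d in range(N)) / N
       for c in range(k)]
```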
4.2. Comparison of 8 TWSVM variants with 179 classifiers from [23] for multi-class datasets
In this subsection, we compare the 8 TWSVM variants with the 179 classifiers from [23] on 46 multi-class datasets. Results are provided in Tables 7 and 8, which contain the FRank and Acc of the classifiers. As can be observed from these tables, 4 TWSVM variants achieve top 20 ranks and outperform most of the classifiers. However, the performance is not as strong as in the binary case, where TWSVM variants achieved the top 2 positions. Further, we have also calculated PAMA for these multi-class datasets, and the results are presented in Table 9. Here, one TWSVM variant achieves the 2nd position and another the 3rd position, and in total 4 TWSVM variants secure a place in the top 20 as per the PAMA criterion. It is to be noted that only one variant, TBSVM m, is common between Table 4 and Table 9. All results are provided in detail at http://people.iiti.ac.in/~phd1501101001/TSVM_JMLR_Binary_Multi.html. A comparison among the TWSVM variants is discussed in detail in the subsequent subsection. Furthermore, as for the binary-class datasets, we computed the F-score, p-value and critical value for the multi-class datasets. The computed p-values for both cases (Friedman and modified Friedman test) are less than 0.05, i.e., 0.0094 and 0.0085, respectively. The computed F-scores for both cases also exceed the corresponding critical values, i.e., 36.4090 > 30.1435 and 1.9561 > 1.5987. Based on the above discussion, we can state that the outcomes presented in this paper are significantly different, and we can reject the null hypothesis.
4.3. Comparison among TWSVM variants
TWSVM variants can be divided into three categories. The first category contains the three basic TWSVM variants, viz., TBSVM m, TWSVM m, and LPTSVM m. The second category is based on weighted TWSVM and contains the variants pinTSVM m and WLTSVM m. The third category contains the three least squares variants, viz., LSTSVM m, ILSTSVM m, and RELS-TSVM m. It can be observed from Tables 2 and 3 that the basic least squares version of TWSVM does not perform well, but the other two variants from the least squares category (ILSTSVM m and RELS-TSVM m) perform better than the rest of the variants as per the Acc value. A similar observation is made for the multi-class datasets in Tables 7 and 8: the basic least squares version does not perform well, but another variant from this category, ILSTSVM m, yields the best Acc value among all. That said, one variant from the basic TWSVM category (TWSVM m), the two improved least squares variants (ILSTSVM m and RELS-TSVM m), and one variant from the weighted TWSVM category (WLTSVM m) yield similar Acc values with minor differences, as shown in Table 7. As per the PAMA values for the binary datasets in Table 4, the three basic TWSVM variants take the top 3 positions among the 8 TWSVM variants. As per the PAMA values for the multi-class datasets in Table 9, the scenario is completely different: the three variants from the least squares category take the top 3 positions among the 8 TWSVM variants. A training-time comparison among all 8 variants on the 90 datasets is provided in Tables 10 and 11. The weighted TWSVM variants consume the least time among the three categories, and the least squares category stands in second position. The basic TWSVM variants take more time to solve their optimization problems on a few datasets; this can be observed in Tables 10 and 11 for datasets such as thyroid, cardiotocography-10clases, statlog-image and steel-plates, for which the average training time is higher for TBSVM m, TWSVM m, and LPTSVM m.
Table 7: Position (Pos) of classifiers for the multi-class datasets as per FRank [23], FRank, and average accuracy (Acc) for each classifier, ordered by increasing FRank. This table is continued in Table 8.
Pos Classifier FRank Acc
1 parRF t(RF) 24.3 79.4
2 rf t(RF) 28.0 79.1
3 rforest R(RF) 31.2 78.9
4 nnet t(NNET) 36.3 78.6
5 svmPoly t(SVM) 37.5 77.7
6 svm C(SVM) 38.8 78.3
7 svmRadial t(SVM) 39.3 77.4
8 KRR m/KELM m(NNET) 39.4 77.4
9 RRF t(RF) 39.5 78.2
10 svmRadialCost t(SVM) 40.1 77.5
11 mlp t(NNET) 41.8 78.4
12 C5.0 t(BST) 43.4 77.2
13 avNNet t(NNET) 44.5 77.8
14 BG LibSVM w(BAG) 45.0 77.2
15 pcaNNet t(NNET) 45.4 76.7
16 TBSVM m(TWSVM) 45.9 76.4
17 adaboost R(BST) 46.0 76.9
18 TWSVM m(TWSVM) 46.1 76.8
19 RELS-TSVM m(TWSVM) 46.4 76.7
20 ILSTSVM m(TWSVM) 46.6 76.9
20 RotationForest w(RF) 46.6 77.6
21 RRFglobal t(RF) 48.2 76.9
22 LibSVM w(SVM) 50.9 76.0
23 MAB LibSVM w(BST) 52.5 76.5
24 RandomCommittee w(OEN) 56.8 76.4
25 Decorate w(OEN) 57.0 76.1
26 MAB RandomForest w(BST) 57.2 75.2
27 LPTSVM m(TWSVM) 58.2 74.3
28 mlpWeightDecay t(NNET) 58.6 76.7
29 CVR w(OM) 59.4 75.6
30 svmLinear t(SVM) 59.7 75.6
31 cforest t(RF) 60.1 75.4
32 pnn m(NNET) 60.4 76.2
33 gaussprRadial R(OM) 60.5 76.2
34 dkp C(NNET) 60.8 76.0
35 multinom t(LMR) 60.9 75.9
36 glmnet R(GLM) 61.1 75.0
37 treebag t(BAG) 61.3 76.2
38 mlp C(NNET) 61.8 75.8
39 RandomForest w(RF) 61.9 74.8
40 SimpleLogistic w(LMR) 63.3 75.2
41 elm m(NNET) 64.8 76.2
42 rda R(DA) 66.0 75.2
43 MAB MLP w(BST) 66.8 74.2
43 mda t(DA) 66.8 73.4
44 END w(OEN) 67.2 75.4
45 pda t(DA) 67.3 74.3
46 BG RandomForest w(BAG) 67.7 74.9
Pos Classifier FRank Acc
47 MAB PART w(BST) 68.4 74.7
47 LogitBoost w(BST) 68.4 74.8
47 svmlight C(NNET) 68.4 74.2
48 MAB J48 w(BST) 69.0 74.7
49 fda R(DA) 69.1 73.9
50 ldaBag R(BAG) 69.2 73.7
51 fda t(DA) 69.4 74.8
52 knn R(NN) 69.5 74.8
53 rbf t(NNET) 69.9 73.5
53 gcvEarth t(MARS) 69.9 74.4
54 lda R(DA) 70.3 73.8
55 BG PART w(BAG) 70.4 74.3
56 BG REPTree w(BAG) 70.6 74.8
57 BG J48 w(BAG) 71.2 74.4
58 rbfDDA t(NNET) 71.3 74.6
59 MLP w(NNET) 71.5 74.4
60 lda2 t(DA) 72.2 73.3
61 knn t(NN) 72.8 74.2
62 mlm R(GLM) 73.1 73.6
63 AdaBoostM1 J48 w(BST) 73.6 74.2
64 ctreeBag R(BAG) 74.1 73.5
65 BG RandomTree w(BAG) 75.1 73.4
66 LibLINEAR w(SVM) 75.3 74.9
67 lssvmRadial t(SVM) 76.0 75.8
68 BG Ibk w(BAG) 76.2 73.7
69 sda t(DA) 76.6 73.3
70 lvq t(NNET) 77.8 74.3
71 BG LWL w(BAG) 78.5 73.0
72 SMO w(SVM) 79.4 73.5
73 pls t(PLSR) 80.2 70.5
74 MAB RandomTree w(BST) 80.3 73.2
75 KStar w(OM) 81.3 73.8
76 hdda R(DA) 81.5 72.9
77 mda R(DA) 81.8 73.3
78 LSTSVM m(TWSVM) 82.5 71.8
79 RandomSubSpace w(DT) 82.6 73.1
80 RBFNetwork w(NNET) 83.6 73.6
81 C5.0Tree t(DT) 85.0 73.4
82 J48 t(DT) 85.5 72.9
82 rpart R(DT) 85.5 72.2
83 MAB REPTree w(BST) 85.6 72.4
84 NNge w(NN) 86.0 73.5
85 Logistic w(OEN) 86.2 72.2
86 C5.0Rules t(RL) 86.3 73.1
87 BG Logistic w(BAG) 86.9 72.0
88 JRip t(RL) 87.0 72.4
89 PART t(DT) 87.6 72.6
89 J48 w(DT) 87.6 73.3
Table 8: Continuation of Table 7
Pos Classifier FRank Acc
90 ASC w(OM) 87.8 73.0
91 MAB Logistic w(BST) 89.3 71.6
92 logitboost R(BST) 89.7 72.4
93 PART w(DT) 90.6 72.4
94 rpart2 t(DT) 90.8 72.0
95 lvq R(NNET) 90.9 70.7
96 svmBag R(BAG) 91.0 67.5
96 MCC w(LMR) 91.0 71.8
97 nbBag R(BAG) 91.2 72.1
98 WLTSVM m(TWSVM) 91.5 71.4
99 IB1 w(NN) 92.1 72.4
100 rpart t(DT) 96.1 71.3
100 MAB DecisionTable w(BST) 96.1 70.8
101 BayesNet w(BY) 96.2 71.1
102 NBTree w(DT) 96.3 71.8
103 REPTree w(DT) 97.6 71.2
104 naiveBayes R(BY) 98.1 71.0
105 BG DecisionTable w(BAG) 98.7 71.3
106 DTNB w(RL) 98.8 71.4
107 ctree t(DT) 99.0 70.5
108 cascor C(NNET) 99.3 70.2
109 IBk w(NN) 99.6 70.8
110 ctree2 t(DT) 99.8 70.4
111 JRip w(RL) 100.4 71.2
112 qda t(DA) 100.8 69.3
113 NaiveBayes w(BY) 101.9 69.5
114 bagging R(BAG) 101.9 60.0
115 bdk R(NNET) 102.1 71.6
116 BG NaiveBayes w(BAG) 103.0 68.8
117 FilteredClassifier w(OM) 103.9 70.6
118 NBUpdateable w(BY) 104.9 68.0
119 MAB NaiveBayes w(BST) 105.9 68.9
120 Ridor w(RL) 106.6 71.0
121 pam t(OM) 107.0 68.4
122 OCC w(OEN) 107.1 70.0
123 rrlda R(DA) 108.1 67.1
124 RandomTree w(DT) 109.1 69.8
125 slda t(DA) 110.7 67.5
126 vbmpRadial t (BY) 110.8 66.0
127 sparseLDA R(DA) 110.9 65.6
128 Dagging w(OEN) 111.8 67.9
129 plsBag R(BAG) 112.1 63.1
130 rbf m(NNET) 113.2 64.9
131 QdaCov t(DA) 113.9 66.1
132 obliqueTree R(DT) 114.2 62.1
Pos Classifier FRank Acc
132 DecisionTable w(RL) 114.2 68.4
133 PenalizedLDA R(DA) 114.8 63.1
134 NaiveBayesSimple w(BY) 116.2 73.3
135 stepQDA t(DA) 118.8 66.0
136 mlp m(NNET) 120.4 65.3
137 stepLDA t(DA) 120.6 66.6
138 sddaLDA R(DA) 125.1 63.1
139 dpp C(NNET) 125.3 63.0
140 sddaQDA R(DA) 127.8 61.4
141 LWL w(OEN) 128.2 63.4
142 nnetBag R(BAG) 132.4 49.5
143 VFI w(OM) 133.9 63.2
144 OneR w(RL) 137.9 58.0
145 kernelpls R(PLSR) 139.2 51.9
146 OneR t(RL) 140.7 57.5
147 BG OneR w(BAG) 141.1 58.2
148 simpls R(PLSR) 141.8 51.0
149 BG HyperPipes w(BAG) 142.2 57.0
150 mars R(MARS) 142.3 54.8
151 MAB OneR w(BST) 142.4 57.9
152 widekernelpls R(PLSR) 142.5 52.0
153 MAB w(BST) 143.8 54.5
154 ConjunctiveRule w(RL) 145.4 52.6
155 BG DecisionStump w(BAG) 146.4 54.9
156 AdaBoostM1 w(BST) 147.6 54.3
157 MAB Ibk w(BST) 148.3 53.9
158 DecisionStump w(DT) 152.3 51.6
159 HyperPipes w(OM) 152.6 53.6
160 spls R(PLSR) 153.4 44.7
161 BG MLP w(BAG) 155.9 44.6
162 gpls R(PLSR) 158.4 38.5
163 bayesglm t(GLM) 159.8 41.1
164 CVC w(OM) 160.0 47.6
165 RILB w(BST) 161.8 43.4
166 glmStepAIC t(GLM) 163.1 40.6
167 StackingC w(STC) 165.2 40.8
168 MultiScheme w(OEN) 165.4 40.8
169 pinTSVM m(TWSVM) 166.6 34.3
170 Grading w(OEN) 166.9 40.6
171 glm R(GLM) 167.3 25.8
172 Vote w(OEN) 167.4 40.5
173 ZeroR w(RL) 167.5 40.5
173 MetaCost w(BAG) 167.5 40.4
174 Stacking w(STC) 168.1 40.3
174 CSC w(OEN) 168.1 40.3
175 CVPS w(OM) 168.5 40.1
Table 9: Top 20 classifiers as per the highest PAMA (%) value on the multi-class datasets.
S.No. Classifier PAMA(%)
1 svm C(SVM) 10.9
2 parRF t(RF) 8.7
3ILSTSVM m(TWSVM) 8.7
4 adaboost R(BST) 6.5
5 RRF t(RF) 6.5
6RELS-TSVM m(TWSVM) 6.5
7 KRR m/KELM m(NNET) 4.3
8 BG RandomForest w(BAG) 4.3
9 lda R(DA) 4.3
10 sda t(DA) 4.3
S.No. Classifier PAMA(%)
11 nnet t(NNET) 4.3
12 lvq R(NNET) 4.3
13 C5.0 t(BST) 4.3
14 LSTSVM m(TWSVM) 4.3
15 TBSVM m(TWSVM) 4.3
16 MAB LibSVM w(BST) 2.2
17 MAB RandomForest w(BST) 2.2
18 MAB RandomTree w(BST) 2.2
19 lda2 t(DA) 2.2
20 PenalizedLDA R(DA) 2.2
5. Conclusions and future directions
This paper has provided an exhaustive benchmarking of 8 variants of TWSVM based classifiers, drawn from three categories, against 179 classifiers from 17 families. The eight TWSVM variants were evaluated along with the 179 classifiers on various performance criteria, viz., Acc, FRank, PAMA, p95, and PMA. Two variants from the least squares category (ILSTSVM m and RELS-TSVM m) performed the best among all 187 classifiers on the binary-class datasets as per the FRank, Acc and PMA criteria, and another TWSVM variant, TBSVM m, performed the best as per the PAMA criterion. Overall, 5 and 4 TWSVM variants secured a place among the top 20 classifiers according to FRank for the binary and multi-class datasets, respectively. An interesting fact is observed among the TWSVM variants on the binary datasets: the basic least squares version of TWSVM, i.e., LSTSVM m, performed second-worst among all TWSVM variants as per the Acc criterion, whereas its improved variants, RELS-TSVM m and ILSTSVM m, performed the best among all TWSVM variants. A similar observation is made for the multi-class datasets. Moreover, the TWSVM variants did not attain even a top 5 position for the multi-class datasets. Although the TWSVM variants have not emerged as the best classifiers for multi-class datasets, they can still be a good alternative, as a TWSVM variant obtained the 2nd position as per the PAMA criterion. On the other hand, they can be a better alternative to other state-of-the-art classifiers for binary-class datasets. Further-
Table 10: Training time (in seconds) of 8 variants for 90 datasets and continued to Table 11
Dataset ILSTSVM m LPTSVM m LSTSVM m pinTSVM m RELS-TSVM m TBSVM m TWSVM m WLTSVM m
acute-inflammation 0.0320 0.1779 0.0343 0.0161 0.0342 0.0662 0.1137 0.0023
acute-nephritis 0.0078 0.1487 0.0109 0.0125 0.0084 0.0610 0.0342 0.0037
annealing 0.7267 21.6805 0.7004 0.2043 0.7097 7.4263 1.7779 0.2041
arrhythmia 0.7148 6.7270 0.6239 0.0625 0.5953 0.8841 1.0891 0.0213
balance-scale 0.1603 6.4621 0.1684 0.0507 0.1080 1.2528 0.3692 0.0401
balloons 0.0026 0.0073 0.0056 0.0096 0.0030 0.0051 0.0191 0.0005
blood 0.1058 2.7389 0.1100 0.0832 0.1510 4.5923 0.4953 0.0635
breast-cancer 0.0145 0.6903 0.0147 0.0283 0.0155 0.1350 0.0642 0.0070
breast-cancer-wisc 0.1141 2.9150 0.0950 0.0688 0.0903 0.3817 0.3483 0.0576
breast-cancer-wisc-diag 0.1133 2.6128 0.0620 0.0438 0.0650 0.1578 0.2355 0.0313
breast-cancer-wisc-prog 0.0090 0.2316 0.0209 0.0193 0.0125 0.0428 0.0430 0.0066
breast-tissue 0.0106 0.3496 0.0108 0.0119 0.0129 0.0771 0.0817 0.0031
car 1.5964 40.2947 1.8283 0.4875 1.5445 37.5490 4.7756 0.5331
cardiotocography-10clases 6.6195 222.4167 8.2245 0.8833 7.7333 116.9285 18.1605 0.9085
cardiotocography-3clases 1.9685 104.8104 2.6934 1.2793 2.2968 16.3837 5.1091 1.0034
chess-krvkp 3.6092 444.7051 5.5426 1.3464 3.8869 10.8307 8.5032 2.4955
congressional-voting 0.0301 0.7976 0.0656 0.0248 0.0331 0.1125 0.1397 0.0196
conn-bench-sonar-mines-rocks 0.0102 0.1526 0.0105 0.0129 0.0310 0.0553 0.0608 0.0063
conn-bench-vowel-deterding 0.5993 15.4250 0.7889 0.0888 0.6403 11.4379 2.0700 0.0732
contrac 0.8070 17.0902 1.0402 0.2032 0.9466 32.2308 1.6210 0.3612
credit-approval 0.0884 3.3101 0.1561 0.0596 0.1697 0.4281 0.2311 0.1338
cylinder-bands 0.0487 4.3855 0.0482 0.0382 0.0887 0.2398 0.1274 0.0266
dermatology 0.1315 2.1967 0.0689 0.0916 0.0870 0.4126 0.2594 0.0129
echocardiogram 0.0058 0.0646 0.0054 0.0147 0.0059 0.0708 0.0440 0.0051
ecoli 0.0751 2.0166 0.1132 0.0283 0.1065 0.9302 0.3958 0.0124
energy-y1 0.1709 5.6710 0.2506 0.1099 0.1742 2.9909 0.3892 0.2076
energy-y2 0.1620 5.5776 0.2259 0.0820 0.2345 1.1859 0.4503 0.0786
fertility 0.0044 0.0328 0.0044 0.0128 0.0059 0.0271 0.0568 0.0264
flags 0.0314 0.5832 0.0961 0.0148 0.0397 0.1773 0.1897 0.0112
glass 0.0245 0.5198 0.0229 0.0150 0.0432 0.4832 0.1756 0.0053
haberman-survival 0.0158 0.2859 0.0155 0.0194 0.0164 0.3252 0.1313 0.0096
hayes-roth 0.0100 0.3890 0.0106 0.0189 0.0137 0.1103 0.0763 0.0039
heart-cleveland 0.0390 1.2811 0.0348 0.0264 0.0489 0.5683 0.1989 0.0168
heart-hungarian 0.0142 0.5106 0.0140 0.0450 0.0185 0.1827 0.0732 0.0137
heart-switzerland 0.0101 0.2188 0.0133 0.0159 0.0368 0.0886 0.0828 0.0081
heart-va 0.0198 0.5101 0.0267 0.0304 0.0219 0.3522 0.2045 0.0064
hepatitis 0.0067 0.1055 0.0151 0.0139 0.0072 0.0496 0.0631 0.0039
hill-valley 0.1623 4.7513 0.2599 0.1048 0.2243 0.8187 0.2973 0.2008
horse-colic 0.0579 1.0534 0.0274 0.0290 0.0401 0.0797 0.1556 0.0141
ilpd-indian-liver 0.1184 1.5416 0.0615 0.0456 0.0601 1.0810 0.1632 0.1038
image-segmentation 0.1606 7.1914 0.4763 0.0876 0.5045 0.4171 0.3660 0.0241
ionosphere 0.0251 0.5386 0.0206 0.0239 0.0282 0.0968 0.0731 0.0169
iris 0.0080 0.2485 0.0076 0.0181 0.0551 0.1585 0.0637 0.0038
led-display 1.1272 22.0068 1.2954 0.1208 1.4207 11.6267 2.7106 0.2657
lenses 0.0025 0.0192 0.0037 0.0202 0.0030 0.0071 0.0514 0.0006
libras 0.2190 5.0433 0.2063 0.0338 0.2571 1.6482 0.8661 0.0141
low-res-spect 0.3521 7.2346 0.3484 0.0758 0.3577 1.0836 0.8815 0.0598
lung-cancer 0.0029 0.0061 0.0033 0.0215 0.0033 0.0073 0.0541 0.0007
Table 11: Continuing from Table 10
Dataset ILSTSVM m LPTSVM m LSTSVM m pinTSVM m RELS-TSVM m TBSVM m TWSVM m WLTSVM m
lymphography 0.0108 0.4371 0.0439 0.0206 0.0121 0.0379 0.1242 0.0119
mammographic 0.2450 5.5149 0.2959 0.0944 0.1832 0.7785 0.4427 0.1512
molec-biol-promoter 0.0050 0.0293 0.0410 0.0218 0.0093 0.0130 0.0431 0.0035
molec-biol-splice 5.7153 616.4776 7.6826 1.1327 5.8671 17.0395 10.3194 2.5694
monks-1 0.0088 0.5434 0.0127 0.0203 0.0116 0.0207 0.0690 0.0049
monks-2 0.0116 0.3640 0.0143 0.0347 0.0157 0.1047 0.0735 0.0093
monks-3 0.0074 0.6456 0.0105 0.0234 0.0115 0.0332 0.0608 0.0064
musk-1 0.0503 1.2807 0.0538 0.0479 0.0526 0.1084 0.1253 0.0268
oocytes merluccius nucleus 4d 0.2280 6.9678 0.2587 0.1192 0.2559 6.8678 0.6275 0.1540
oocytes merluccius states 2f 0.4011 11.9716 0.4037 0.1528 0.3747 1.0210 1.0321 0.1372
oocytes trisopterus nucleus 2f 0.1661 4.4441 0.2069 0.1021 0.1760 1.5097 0.3527 0.1762
oocytes trisopterus states 5b 0.3013 9.4324 0.3281 0.0990 0.2702 1.2043 0.6508 0.1317
ozone 2.4402 65.2626 3.0110 1.8025 2.4534 43.0555 6.7500 1.5644
parkinsons 0.0088 0.4617 0.0084 0.0221 0.0096 0.0298 0.0752 0.0034
pima 0.1514 6.3792 0.1133 0.0715 0.1099 0.4495 0.3621 0.0666
pittsburg-bridges-MATERIAL 0.0062 0.0759 0.0057 0.0237 0.0069 0.0295 0.0654 0.0029
pittsburg-bridges-REL-L 0.0055 0.1070 0.0055 0.0231 0.0059 0.0287 0.0930 0.0024
pittsburg-bridges-SPAN 0.0282 0.2568 0.0089 0.0244 0.0061 0.0289 0.0837 0.0029
pittsburg-bridges-T-OR-D 0.0043 0.0455 0.0044 0.0237 0.0048 0.0218 0.0440 0.0028
pittsburg-bridges-TYPE 0.0097 0.2723 0.0238 0.0223 0.0138 0.0649 0.1499 0.0025
planning 0.0085 0.1060 0.0071 0.0254 0.0086 0.0877 0.0861 0.0685
primary-tumor 0.1901 3.2089 0.1568 0.0377 0.1767 0.9176 1.7308 0.0114
seeds 0.0129 0.4570 0.0544 0.0250 0.0136 0.0479 0.2573 0.0048
soybean 0.2723 10.8691 0.3824 0.0439 0.3483 1.0151 1.6679 0.0166
spect 0.0058 0.0908 0.0069 0.0239 0.0073 0.0126 0.0687 0.0031
spectf 0.0055 0.0865 0.0095 0.0240 0.0083 0.0151 0.0608 0.0046
statlog-australian-credit 0.1066 2.5125 0.1669 0.0560 0.0888 2.1936 0.5037 0.0504
statlog-german-credit 0.2577 18.5491 0.2260 0.1189 0.2264 1.4237 0.7356 0.1430
statlog-heart 0.4155 0.7680 0.0130 0.0240 0.0138 0.0336 0.6537 0.0066
statlog-image 5.4597 248.1873 6.7212 0.8627 5.8391 107.6823 16.1723 1.2836
statlog-vehicle 0.2838 12.8589 0.3178 0.1154 0.2891 5.5666 1.0063 0.0942
steel-plates 3.8250 113.8698 4.4728 0.5176 3.8220 63.9410 10.8581 0.6868
synthetic-control 0.3117 6.7058 0.2292 0.0935 0.2404 1.8294 0.8085 0.0449
teaching 0.0081 0.1407 0.0079 0.0218 0.0093 0.0734 0.1540 0.0125
thyroid 18.8930 1252.4298 25.6913 11.0068 20.1259 471.9164 92.1563 8.1767
tic-tac-toe 0.1825 7.2642 0.2024 0.1488 0.2229 1.1397 0.4431 0.1126
titanic 1.3967 20.6183 1.7539 0.4372 1.7101 9.5813 2.4346 0.8920
trains 0.0020 0.0042 0.0028 0.0220 0.0022 0.0042 0.0387 0.0005
vertebral-column-2clases 0.0261 0.7050 0.0145 0.0244 0.0165 0.0685 0.0890 0.0081
vertebral-column-3clases 0.0264 0.8039 0.0210 0.0286 0.0533 0.1170 0.1396 0.0080
wine 0.0761 0.5782 0.0096 0.0220 0.0138 0.0337 0.1395 0.0039
wine-quality-red 1.9174 55.3645 2.2870 0.4359 1.9982 40.9643 4.9816 0.4275
Average Time (in Seconds) 0.7093 38.3875 0.9019 0.2684 0.7564 11.6374 2.3408 0.2691
Maximum Time (in Seconds) 18.8930 1252.4298 25.6913 11.0068 20.1259 471.9164 92.1563 8.1767
more, most of the TWSVM variants have been developed for the stationary environment (batch learning); hence, considerable scope remains for the development of TWSVM for the non-stationary environment (online learning). These variants can also be developed for various frameworks, such as the LUPI and graph-embedding frameworks. As the TWSVM variants emerge as a viable alternative for binary datasets, they should also be tested on applications where binary classification is required.
Acknowledgement
This work is supported by Science and Engineering Research Board (SERB)
funded Research Projects, Government of India under Early Career Research
Award Scheme, Grant No. ECR/2017/000053 and Ramanujan Fellowship Scheme,
Grant No. SB/S2/RJN-001/2016. We gratefully acknowledge the Indian Insti-
tute of Technology Indore for providing facilities and support.
References
[1] C. J. C. Burges, A tutorial on support vector machines for pattern recogni-
tion, Data Mining and Knowledge Discovery (2) (1998) 1-43.
[2] C. C. Chang, C. J. Lin, LIBSVM: a library for support vector machines,
ACM Transactions on Intelligent Systems and Technology (TIST) 2 (3)
(2011) 27.
[3] C. Cortes, V. N. Vapnik, Support vector networks, Machine Learning (20)
(1995) 273–297.
[4] N. Cristianini, J. Shawe-Taylor, An introduction to support vector machines and other kernel-based learning methods, Cambridge University Press, Cambridge, 2000.
[5] M. Tanveer, M. Mangal, I. Ahmad, Y.H. Shao, One norm linear program-
ming support vector regression, Neurocomputing (173) (2016) 1508-1518.
[6] J. Demsar, Statistical comparisons of classifiers over multiple data sets, Jour-
nal of Machine Learning Research (7) (2006) 1-30.
[7] G. H. Golub, C. F. Van Loan, Matrix Computations, JHU Press, 2012.
[8] Jayadeva, R. Khemchandani, S. Chandra, Twin support vector machines for
pattern classification, IEEE Transactions on Pattern Analysis and Machine
Intelligence 29 (5) (2007) 905-910.
36
[9] M. A. Kumar, M. Gopal, Least squares twin support vector machines for pattern classification, Expert Systems with Applications 36 (2009) 7535-7543.
[10] O. L. Mangasarian, E. W. Wild, Multisurface proximal support vector clas-
sification via generalized eigenvalues, IEEE Transactions on Pattern Analy-
sis and Machine Intelligence 28 (1) (2006) 69-74.
[11] O. L. Mangasarian, Exact 1-norm support vector machines via unconstrained convex differentiable minimization, Journal of Machine Learning Research 7 (2006) 1517-1530.
[12] Y. H. Shao, C. H. Zhang, X. B. Wang, N. Y. Deng, Improvements on
twin support vector machines, IEEE Transactions on Neural Networks 22
(6) (2011) 962-968.
[13] V. N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[14] M. Tanveer, A. Tiwari, R. Choudhary, S. Jalan, Sparse pinball twin support vector machines, Applied Soft Computing 78 (2019) 164-175.
[15] Y. Tian, Z. Qi, Review on: Twin Support Vector Machines, Annals of Data
Science 1 (2) (2014) 253-277.
[16] Y. Xu, W. Xi, X. Lv, R. Guo, An improved least squares twin support vector machine, Journal of Information and Computational Science 9 (4) (2012) 1063-1071.
[17] H. Huang, X. Wei, Y. Zhou, Twin support vector machines: A survey, Neurocomputing 300 (2018) 34-43.
[18] M. Tanveer, Application of smoothing techniques for linear programming twin support vector machines, Knowledge and Information Systems 45 (1) (2015) 191-214.
[19] Y. H. Shao, W. J. Chen, J. J. Zhang, Z. Wang, N. Y. Deng, An efficient
weighted Lagrangian twin support vector machine for imbalanced data clas-
sification, Pattern Recognition 47 (9) (2014) 3158-3167.
[20] M. Tanveer, M. A. Khan, S. S. Ho, Robust energy-based least squares twin
support vector machines, Applied Intelligence 45 (1) (2016) 174-186.
[21] A. N. Tikhonov, V. Y. Arsenin, Solutions of Ill-posed Problems, John Wiley & Sons, New York, 1977.
[22] M. Tanveer, Robust and sparse linear programming twin support vector machines, Cognitive Computation 7 (1) (2015) 137-149.
[23] M. Fernandez-Delgado, E. Cernadas, S. Barro, D. Amorim, Do we need hundreds of classifiers to solve real world classification problems?, Journal of Machine Learning Research 15 (1) (2014) 3133-3181.
[24] M. Lichman, UCI Machine Learning Repository, 2013.
[25] J. Vanschoren, H. Blockeel, B. Pfahringer, G. Holmes, Experiment
databases, Machine Learning, 87 (2) (2012) 127-158.
[26] X. Peng, TPMSVM: a novel twin parametric-margin support vector machine for pattern recognition, Pattern Recognition 44 (10-11) (2011) 2678-2692.
[27] Y. Xu, Z. Yang, X. Pan, A novel twin support-vector machine with pinball
loss, IEEE Transactions on Neural Networks and Learning Systems, 28 (2)
(2017) 359-370.
[28] Z. Wang, Y.-H. Shao, T. R. Wu, A GA-based model selection for smooth
twin parametric-margin support vector machine, Pattern Recognition, 46
(8) (2013) 2267-2277.
[29] M. Tanveer, Newton method for implicit Lagrangian twin support vector
machines, International Journal of Machine Learning and Cybernetics 6(6)
(2015) 1029-1040.
[30] N. Parastalooi, A. Amiri, P. Aliheidari, Modified twin support vector re-
gression, Neurocomputing, 211 (2016) 84-97.
[31] L. Zhang, P. N. Suganthan, Benchmarking ensemble classifiers with novel co-trained kernel ridge regression and random vector functional link ensembles [Research Frontier], IEEE Computational Intelligence Magazine 12 (4) (2017) 61-72.
[32] R. Rastogi, S. Sharma, Fast Laplacian twin support vector machine with
active learning for pattern classification, Applied Soft Computing 74 (2019)
424-439.
[33] B. Richhariya, M. Tanveer, EEG signal classification using universum sup-
port vector machine, Expert Systems with Applications 106 (2018) 169-182.
[34] R. L. Iman, J. M. Davenport, Approximations of the critical region of the Friedman statistic, Communications in Statistics - Theory and Methods 9 (6) (1980) 571-595.