Comprehensive Evaluation of Twin SVM based
Classifiers on UCI Datasets
M. Tanveer^a,*, C. Gautam^b, P.N. Suganthan^c
^a Discipline of Mathematics, Indian Institute of Technology Indore, Simrol, Indore 453552, India
^b Discipline of Computer Science and Engineering, Indian Institute of Technology Indore, Simrol, Indore 453552, India
^c School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore
Abstract
In the past decade, twin support vector machine (TWSVM) based classifiers have received considerable attention from the research community. In this paper, we analyze the performance of 8 variants of TWSVM based classifiers along with 179 classifiers evaluated in [23] from 17 different families on 90 University of California Irvine (UCI) benchmark datasets from various domains. Results of these classifiers are exhaustively analyzed using various performance criteria. Statistical testing is performed using the Friedman Rank (FRank). Our experiments show that two least squares TWSVM based classifiers (ILSTSVM_m and RELS-TSVM_m) are the top two ranked methods among the 187 classifiers, and they significantly outperform all other classifiers according to the Friedman Rank. Overall, this paper bridges the evaluation benchmarking gap between various TWSVM variants and the classifiers from other families. Codes are provided on the authors' homepages to reproduce the results and figures presented in this paper.
Keywords: Benchmarking classifiers . Twin support vector machines . Least squares twin support vector machines . Support vector machines . Machine learning

* Corresponding author
Email addresses: mtanveer@iiti.ac.in (M. Tanveer), chandangautam31@gmail.com (C. Gautam), epnsugan@ntu.edu.sg (P.N. Suganthan)

Preprint submitted to Applied Soft Computing, Elsevier, July 18, 2019
1. Introduction
Among kernel based methods, SVM has been well explored in the past by researchers, primarily in the context of pattern recognition [1, 2, 3, 4, 5]. Most works on SVM endeavor to maximize the margin between two parallel hyperplanes in order to minimize the generalization error. In 2007, Jayadeva et al. [8] introduced the concept of non-parallel supporting hyperplanes, which was referred to as the twin support vector machine (TWSVM). It solves two smaller-sized quadratic programming problems (QPPs) instead of one large QPP as in traditional SVM, and shows better performance in both computational time and classification accuracy. Then, Kumar et al. [9] proposed a least squares TWSVM (LSTSVM), which is an extremely simple and fast algorithm for generating binary classifiers. In the last decade, the TWSVM formulation has attracted considerable attention from the research community for replacing the parallel hyperplanes of SVM with non-parallel ones [8]. Generally, fuzziness is embedded in SVM to solve this type of issue, which introduces extra complexity to the model. However, TWSVM can handle such a situation effectively without introducing further complexity to the model. Various variants of TWSVM have been developed by researchers in the last decade [14, 16, 19, 26, 28, 29, 32, 33]. Shao et al. [12] added one more regularization term to TWSVM and proposed a new variant termed the twin bounded support vector machine (TBSVM). The formulation of TWSVM can be viewed as a special case of TBSVM. An improvement to LSTSVM (ILSTSVM) has also been proposed by Xu et al. [16] by introducing a regularization term. Later, the weighted Lagrangian twin support vector machine (WLTSVM) [19] was proposed for imbalanced data classification. Recently, two more variants, viz., the robust and sparse linear programming TWSVM (LPTSVM) [18, 22] and the robust energy based LSTSVM (RELS-TSVM) [20], were proposed for classification problems. Most recently, the pinball loss-based TWSVM (pinTSVM) [27] was proposed, which takes the quantile distance into account and is robust to noisy samples. Apart from the above discussed variants of TWSVM, researchers have also developed evolutionary algorithm-based TWSVM [28, 30], where evolutionary algorithms are employed to select optimal values of the parameters of TWSVM. We have selected 8 competitive variants among the various existing TWSVM variants in the literature [15]. A brief description of these 8 variants is provided in Section 3. Further, we analyze the outcomes of the classifiers based on various performance analysis criteria, viz., Friedman Rank (FRank), Average accuracy (Acc), Probability of Achieving the Maximum Accuracy (PAMA), Probability of achieving more than 95% of the maximum accuracy (P95) and Percentage of the Maximum Accuracy (PMA) [23]. Before proceeding, we provide the motivation in the next section.
2. Motivation and contributions
The main focus of this paper is to analyze the performance of the TWSVM variants as well as existing classifiers on 90 datasets. Vanschoren et al. [25] have provided a good analysis using 86 datasets and 93 classifiers in Weka. Recently, Fernandez et al. [23] performed exhaustive experiments on 121 UCI repository datasets with 179 classifiers from 17 different families and provided the ranking of these classifiers on various binary class and multi-class datasets. They focused on the combined analysis of binary and multi-class classification; although they provided separate analyses for binary and multi-class datasets, the analysis of the binary datasets was very brief compared to that of the multi-class datasets. Fernandez et al. [23] have empirically exhibited that the performance of a classifier depends on whether the dataset is binary or multi-class. Most recently, Zhang and Suganthan [31] also performed similar experiments with their proposed kernel ridge regression-based classifiers. Apart from this, the above mentioned papers [25, 23, 31] did not consider a method that has been quite popular in the last decade, namely TWSVM. TWSVM has exhibited very good performance in the literature [17]; therefore, TWSVM needs to be tested on the same experimental setup as used in [23]. Hence, taking a cue from [23], we provide in this paper a broad analysis of the 8 variants of TWSVM together with the 179 classifiers used in [23] over 90 UCI datasets (44 binary and 46 multi-class datasets). These UCI datasets and their indices for training and testing have been taken from [23] and are listed, along with the detailed results, at http://people.iiti.ac.in/~phd1501101001/TSVM_JMLR_Binary_Multi.html. Moreover, for the multi-class datasets, we use the one vs. rest strategy, and the analysis is provided separately for binary and multi-class datasets in this paper.
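As an illustration of the one vs. rest strategy used for the multi-class datasets, the following minimal NumPy sketch wraps a generic binary scorer into a multi-class decision; the helper names are ours and do not correspond to the released MATLAB implementations.

```python
import numpy as np

def one_vs_rest_predict(X_train, y_train, X_test, fit_binary):
    """Hypothetical one-vs.-rest wrapper: train one binary TWSVM-style
    model per class and assign each test point to the class whose model
    returns the highest score."""
    classes = np.unique(y_train)
    scores = np.zeros((X_test.shape[0], classes.size))
    for k, c in enumerate(classes):
        # relabel the problem: current class vs. the rest
        y_bin = np.where(y_train == c, 1, -1)
        model = fit_binary(X_train, y_bin)   # returns a scoring callable
        scores[:, k] = model(X_test)         # larger score = more like class c
    return classes[np.argmax(scores, axis=1)]
```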
The remainder of the paper is organized as follows: Section 3 briefly discusses the eight variants of TWSVM. Section 4 provides the comparative analysis of the eight TWSVM based classifiers with the 179 classifiers, which is followed by the conclusion in the last section.
3. Variants of twin support vector machines
In this section, eight variants of TWSVM are discussed briefly. These variants can be divided into 3 categories. The first category contains three basic TWSVM variants, viz., TBSVM, TWSVM, and LPTSVM. The second category is based on weighted TWSVM, and its variants are pinTSVM and WLTSVM. The third category contains three least squares variants, viz., LSTSVM, ILSTSVM, and RELS-TSVM. Out of the eight variants, RELS-TSVM [20] and ILSTSVM [16] emerge as the best classifiers among the 187 classifiers and yield the lowest FRank as well as the highest average accuracy.
3.1. Basic TWSVM variants
3.1.1. Twin support vector machine (TWSVM)
Let us denote all the data points of class $+1$ by a matrix $A \in \mathbb{R}^{m_1 \times n}$, where the $i$th data point is $A_i \in \mathbb{R}^n$, and let the matrix $B \in \mathbb{R}^{m_2 \times n}$ represent the data points of class $-1$. Unlike SVM, the linear TWSVM [8] seeks a pair of non-parallel hyperplanes
$$f_1(x) = w_1^t x + b_1 \quad \text{and} \quad f_2(x) = w_2^t x + b_2 \qquad (1)$$
such that each hyperplane is proximal to the data points of one class and far from the data points of the other class, where $w_1 \in \mathbb{R}^n$, $w_2 \in \mathbb{R}^n$, $b_1 \in \mathbb{R}$ and $b_2 \in \mathbb{R}$. The formulation of TWSVM can be written as follows:
$$\min_{(w_1,b_1)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Aw_1+e_2b_1\|^2 + c_1\|\xi_1\| \quad \text{s.t.} \ -(Bw_1+e_1b_1)+\xi_1 \ge e_1, \ \ \xi_1 \ge 0 \qquad (2)$$
and
$$\min_{(w_2,b_2)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Bw_2+e_1b_2\|^2 + c_2\|\xi_2\| \quad \text{s.t.} \ (Aw_2+e_2b_2)+\xi_2 \ge e_2, \ \ \xi_2 \ge 0 \qquad (3)$$
respectively, where $c_1, c_2$ are positive parameters and $e_1, e_2$ are vectors of ones with appropriate dimensions.

In order to derive the corresponding dual problems, TWSVM assumes that the matrices $G^tG$ and $H^tH$ are nonsingular, where $G=[A \ e_2]$ and $H=[B \ e_1]$ are augmented matrices of sizes $m_1\times(n+1)$ and $m_2\times(n+1)$, respectively. Under this condition, the dual problems are
$$\max_{\alpha\in\mathbb{R}^{m_2}} \ e_1^t\alpha - \frac{1}{2}\alpha^t H\left(G^tG\right)^{-1}H^t\alpha \quad \text{s.t.} \ 0 \le \alpha \le c_1 \qquad (4)$$
and
$$\max_{\gamma\in\mathbb{R}^{m_1}} \ e_2^t\gamma - \frac{1}{2}\gamma^t G\left(H^tH\right)^{-1}G^t\gamma \quad \text{s.t.} \ 0 \le \gamma \le c_2 \qquad (5)$$
respectively.

In the above optimization problems, $G^tG$ or $H^tH$ can be singular or ill conditioned. Hence, in order to avoid these cases, the inverse matrices $(G^tG)^{-1}$ and $(H^tH)^{-1}$ are modified to $(G^tG+\delta I)^{-1}$ and $(H^tH+\delta I)^{-1}$, respectively. Here $\delta$ is a very small positive scalar and $I$ is an identity matrix of appropriate dimensions. Now, the dual forms of the above problems can be written as:
$$\max_{\alpha\in\mathbb{R}^{m_2}} \ e_1^t\alpha - \frac{1}{2}\alpha^t H\left(G^tG+\delta I\right)^{-1}H^t\alpha \quad \text{s.t.} \ 0 \le \alpha \le c_1 \qquad (6)$$
and
$$\max_{\gamma\in\mathbb{R}^{m_1}} \ e_2^t\gamma - \frac{1}{2}\gamma^t G\left(H^tH+\delta I\right)^{-1}G^t\gamma \quad \text{s.t.} \ 0 \le \gamma \le c_2 \qquad (7)$$
respectively. Thus, we obtain the solutions of the above problems as follows:
$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = -\left(G^tG+\delta I\right)^{-1}H^t\alpha \quad \text{and} \quad \begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = \left(H^tH+\delta I\right)^{-1}G^t\gamma. \qquad (8)$$
The dual problems in Eqns. (6) and (7) are derived and solved in [8]. Experimental results show that the performance of TWSVM is better than that of the conventional SVM and the generalized eigenvalue proximal SVM (GEPSVM) [10].
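For illustration, the following sketch puts the regularized duals (6)-(7) and the solution (8) together for the linear case. It is our own simplified rendering (a generic box-constrained quasi-Newton solver is used in place of a dedicated QP solver) and not the authors' MATLAB code.

```python
import numpy as np
from scipy.optimize import minimize

def twsvm_linear(A, B, c1, c2, delta=1e-7):
    """Sketch of linear TWSVM: solve the box-constrained duals (6)-(7)
    with L-BFGS-B and recover the two hyperplanes via Eqn. (8)."""
    G = np.hstack([A, np.ones((A.shape[0], 1))])      # G = [A e2]
    H = np.hstack([B, np.ones((B.shape[0], 1))])      # H = [B e1]
    I = np.eye(G.shape[1])

    def solve_dual(P, Q, c):
        # maximize e^T u - 0.5 u^T Q (P^t P + delta I)^{-1} Q^t u, 0 <= u <= c
        K = Q @ np.linalg.solve(P.T @ P + delta * I, Q.T)
        fun = lambda u: -(u.sum() - 0.5 * u @ K @ u)   # negated for minimization
        jac = lambda u: -(np.ones_like(u) - K @ u)
        u0 = np.full(Q.shape[0], 0.5 * c)
        res = minimize(fun, u0, jac=jac, method="L-BFGS-B",
                       bounds=[(0.0, c)] * Q.shape[0])
        return res.x

    alpha = solve_dual(G, H, c1)                       # dual (6)
    gamma = solve_dual(H, G, c2)                       # dual (7)
    z1 = -np.linalg.solve(G.T @ G + delta * I, H.T @ alpha)   # Eqn. (8)
    z2 = np.linalg.solve(H.T @ H + delta * I, G.T @ gamma)
    return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])        # (w1, b1), (w2, b2)
```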
3.1.2. Twin bounded support vector machine (TBSVM)
It is well known that the implementation of the structural risk minimization principle is one of the significant advantages of SVM. However, the primal problems of TWSVM implement only empirical risk minimization. In addition, we note that TWSVM assumes the existence of the inverse matrices $(G^tG)^{-1}$ and $(H^tH)^{-1}$. However, this requirement cannot always be satisfied. Shao et al. [12] proposed an improved and more efficient algorithm termed the twin bounded support vector machine (TBSVM). The formulation of TBSVM implements the structural risk minimization principle by adding one more regularization term to TWSVM. The dual formulation of TBSVM can be derived without this additional requirement. Thus, the formulation of TBSVM is theoretically better than that of TWSVM [12].
The linear TBSVM [12] seeks a pair of non-parallel proximal hyperplanes
$$f_1(x) = w_1^t x + b_1 = 0 \quad \text{and} \quad f_2(x) = w_2^t x + b_2 = 0 \qquad (9)$$
by solving the following primal problems
$$\min_{(w_1,b_1)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Aw_1+e_2b_1\|^2 + c_1\|\xi_1\| + \frac{c_3}{2}\left\|\begin{bmatrix} w_1 \\ b_1 \end{bmatrix}\right\|^2 \quad \text{s.t.} \ -(Bw_1+e_1b_1)+\xi_1 \ge e_1, \ \ \xi_1 \ge 0 \qquad (10)$$
and
$$\min_{(w_2,b_2)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Bw_2+e_1b_2\|^2 + c_2\|\xi_2\| + \frac{c_4}{2}\left\|\begin{bmatrix} w_2 \\ b_2 \end{bmatrix}\right\|^2 \quad \text{s.t.} \ (Aw_2+e_2b_2)+\xi_2 \ge e_2, \ \ \xi_2 \ge 0 \qquad (11)$$
respectively, where $c_i$, $i=1,2,3,4$, are the penalty parameters, $e_1$ and $e_2$ are vectors of ones of appropriate dimensions, and $\xi_1$ and $\xi_2$ are slack variables of appropriate dimensions. Their corresponding Lagrangian dual problems are
$$\max_{\alpha} \ e_2^t\alpha - \frac{1}{2}\alpha^t G\left(H^tH+c_3I\right)^{-1}G^t\alpha \quad \text{s.t.} \ 0 \le \alpha \le c_1 \qquad (12)$$
$$\max_{\gamma} \ e_1^t\gamma - \frac{1}{2}\gamma^t H\left(G^tG+c_4I\right)^{-1}H^t\gamma \quad \text{s.t.} \ 0 \le \gamma \le c_2 \qquad (13)$$
where $\alpha$ and $\gamma$ are Lagrange multipliers, $G=[B \ e_1]$ and $H=[A \ e_2]$. The solutions of the problems in Eqns. (10) and (11) are obtained by
$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = -\left(H^tH+c_3I\right)^{-1}G^t\alpha \qquad (14)$$
and
$$\begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = \left(G^tG+c_4I\right)^{-1}H^t\gamma. \qquad (15)$$
Once the solutions of the problems in Eqns. (12) and (13) are obtained, a new point $x \in \mathbb{R}^n$ is assigned to class $i$ ($i = +1, -1$) depending on which of the two hyperplanes in (9) it is closer to:
$$\text{Class } i = \arg\min_{k=1,2} \ \frac{|w_k^t x + b_k|}{\|w_k\|}, \qquad (16)$$
where $|\cdot|$ is the absolute value.
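A minimal sketch of the nearest-hyperplane decision rule in Eqn. (16) is given below; the helper is our own illustration and assumes the $+1$/$-1$ labels used above.

```python
import numpy as np

def twin_svm_predict(X, w1, b1, w2, b2):
    """Assign each row of X to class +1 or -1 according to Eqn. (16):
    the class whose hyperplane has the smaller normalized distance."""
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)   # distance to plane of class +1
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)   # distance to plane of class -1
    return np.where(d1 <= d2, 1, -1)
```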
3.1.3. Linear programming twin support vector machines (LPTSVM)
The solutions of TWSVM and TBSVM are not sparse. To overcome this issue, a robust and sparse linear programming twin support vector machine (LPTSVM) [22] was proposed. The solution of LPTSVM is obtained by solving a pair of dual exterior penalty problems as unconstrained optimization problems using the Newton method. Unlike the two QPPs solved in TWSVM and TBSVM, the unconstrained optimization problems of LPTSVM are reduced to solving two systems of linear equations, which leads to an extremely fast and efficient algorithm.
The formulation of LPTSVM [22] can be expressed as follows:
$$\min_{(w_1,b_1)\in\mathbb{R}^{n+1}} \ \|Aw_1+e_2b_1\|_1 + c_1\|\xi_1\|_1 + c_3\left\|\begin{bmatrix} w_1 \\ b_1 \end{bmatrix}\right\|_1 \quad \text{s.t.} \ -(Bw_1+e_1b_1)+\xi_1 \ge e_1, \ \ \xi_1 \ge 0 \qquad (17)$$
$$\min_{(w_2,b_2)\in\mathbb{R}^{n+1}} \ \|Bw_2+e_1b_2\|_1 + c_2\|\xi_2\|_1 + c_4\left\|\begin{bmatrix} w_2 \\ b_2 \end{bmatrix}\right\|_1 \quad \text{s.t.} \ (Aw_2+e_2b_2)+\xi_2 \ge e_2, \ \ \xi_2 \ge 0 \qquad (18)$$
where $A$ and $B$ are matrices of sizes $m_1\times n$ and $m_2\times n$ respectively, $c_i$, $i=1,2,3,4$, are the penalty parameters, and $e_1$ and $e_2$ are vectors of ones of appropriate dimensions.
Following the approach of [11], we obtain the solutions of the 1-norm TWSVM in Eqns. (17) and (18) by converting them into a pair of linear programming problems (LPPs) in primal form and solving the exterior penalty functions of their duals for a finite value of a penalty parameter $\theta$.

Let $G=[A \ e_2]$ and $H=[B \ e_1]$ be two augmented matrices of sizes $m_1\times(n+1)$ and $m_2\times(n+1)$, respectively. Then, by setting
$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = p_1-q_1, \quad G(p_1-q_1)=r_1-s_1, \quad \begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = p_2-q_2, \quad H(p_2-q_2)=r_2-s_2, \qquad (19)$$
where $p_1,q_1,p_2,q_2\in\mathbb{R}^{n+1}$, $r_1,s_1\in\mathbb{R}^{m_1}$ and $r_2,s_2\in\mathbb{R}^{m_2}$ satisfy the non-negativity constraints
$$p_1,\,q_1,\,p_2,\,q_2,\,r_1,\,s_1,\,r_2,\,s_2 \ge 0,$$
the above pair of problems in Eqns. (17) and (18) can be converted into the following pair of linear programming twin support vector machine (LPTSVM) problems of the form:
$$\min_{r_1,s_1\in\mathbb{R}^{m_1},\ p_1,q_1\in\mathbb{R}^{n+1},\ \xi_1\in\mathbb{R}^{m_2}} \ e_1^t(r_1+s_1)+c_1e_2^t\xi_1+c_3e^t(p_1+q_1)$$
$$\text{s.t.} \ -H(p_1-q_1)+\xi_1 \ge e_2, \quad G(p_1-q_1)-(r_1-s_1)=0, \quad p_1,\,q_1,\,r_1,\,s_1,\,\xi_1 \ge 0 \qquad (20)$$
and
$$\min_{r_2,s_2\in\mathbb{R}^{m_2},\ p_2,q_2\in\mathbb{R}^{n+1},\ \xi_2\in\mathbb{R}^{m_1}} \ e_2^t(r_2+s_2)+c_2e_1^t\xi_2+c_4e^t(p_2+q_2)$$
$$\text{s.t.} \ G(p_2-q_2)+\xi_2 \ge e_1, \quad H(p_2-q_2)-(r_2-s_2)=0, \quad p_2,\,q_2,\,r_2,\,s_2,\,\xi_2 \ge 0 \qquad (21)$$
respectively, where $e$ is the vector of ones of size $(n+1)$.
3.2. Weighted TWSVM variants
3.2.1. Weighted Lagrangian twin support vector machine (WLTSVM)
The above-discussed TWSVM variants do not handle the issue of imbalanced data. The weighted Lagrangian twin support vector machine (WLTSVM) [19] was developed for handling imbalanced data. It uses a graph-based under-sampling strategy, which provides robustness against outliers. It also embeds weight biases in the Lagrangian TWSVM to enable the algorithm to handle imbalanced data.
The primal problems of WLTSVM can be written as
$$\min_{(w_1,b_1,\xi_1)} \ \frac{1}{2}\left(\|w_1\|^2+b_1^2\right) + \frac{c_1}{2}\left((Aw_1+e_2b_1)^t(Aw_1+e_2b_1)+\xi_1^tD_2\xi_1\right) \quad \text{s.t.} \ -(B_2w_1+e_1b_1)+\xi_1 \ge e_1, \ \ \xi_1 \ge 0 \qquad (22)$$
and
$$\min_{(w_2,b_2,\xi_2)} \ \frac{1}{2}\left(\|w_2\|^2+b_2^2\right) + \frac{c_2}{2}\left((B_1w_2+e_1b_2)^tD_1(B_1w_2+e_1b_2)+\xi_2^t\xi_2\right) \quad \text{s.t.} \ (Aw_2+e_2b_2)+\xi_2 \ge e_2, \ \ \xi_2 \ge 0 \qquad (23)$$
where $c_i$, $i=1,2$, are the penalty parameters, $e_1$ and $e_2$ are vectors of ones with appropriate dimensions, $\xi_1$ and $\xi_2$ are slack variables of appropriate dimensions, $B_1$ and $B_2$ are under-sampled training sets, and $D_1$ and $D_2$ are weight matrices used to determine the minority and majority planes, respectively, in the case of imbalanced data.

The dual forms of Eqns. (22) and (23) are:
$$\max_{\alpha} \ -\frac{1}{2}\alpha^t\left(R_1\left(S^tS+c_1I\right)^{-1}R_1^t+\frac{1}{c_1}D_2^{-1}\right)\alpha + e_2^t\alpha \quad \text{s.t.} \ \alpha \ge 0 \qquad (24)$$
$$\max_{\gamma} \ -\frac{1}{2}\gamma^t\left(S\left(R_2^tR_2+c_2I\right)^{-1}S^t+\frac{1}{c_2}D_1^{-1}\right)\gamma + e_1^t\gamma \quad \text{s.t.} \ \gamma \ge 0 \qquad (25)$$
where $S=[A \ e_2]$, $R_1=[B_2 \ e_1]$, $R_2=[B_1 \ e_2]$ and $\alpha,\gamma$ are Lagrangian multipliers.

Similar to the earlier subsections, the solutions $(w_1,b_1)$ and $(w_2,b_2)$ can be obtained by solving Eqns. (24) and (25).
3.2.2. Pinball loss based twin support vector machine (pinTSVM)
Twin support vector machines (TWSVM) [8], twin bounded SVM (TBSVM) [12] and the twin parametric-margin support vector machine (TPMSVM) [26] are efficient classifiers, but they are sensitive to noise. To overcome the issue of noise sensitivity and further enhance the generalization ability, Xu et al. [27] introduced the pinball loss into TPMSVM and proposed the twin support vector machine with pinball loss (pinTSVM), especially for noise-corrupted data.
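For reference, the pinball (quantile) loss penalizes positive and negative deviations asymmetrically, which is what makes pinTSVM less sensitive to noise around the decision boundary. A small sketch of the generic textbook form of this loss is given below; the exact parameterization used in [27] may differ.

```python
import numpy as np

def pinball_loss(u, tau):
    """Standard pinball (quantile) loss: u for u >= 0, -tau*u for u < 0.
    With tau = 0 it reduces to a one-sided, hinge-like penalty."""
    return np.where(u >= 0, u, -tau * u)
```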
Let the numbers of data points belonging to classes $+1$ and $-1$ be $\ell_1$ and $\ell_2$, respectively, where the data points lie in the $n$-dimensional real space $\mathbb{R}^n$. The nonlinear pinTSVM seeks two kernel-generated surfaces defined as follows:
$$K(x^T,D^T)u_+ + b_+ = 0 \quad \text{and} \quad K(x^T,D^T)u_- + b_- = 0,$$
where $D=[A;B]$, $u_+, u_- \in \mathbb{R}^n$ and $K$ is an arbitrary kernel function. The nonlinear pinTSVM formulation can be expressed as follows:
$$\min_{u_+,\,b_+,\,\xi_1} \ \frac{1}{2}\|u_+\|^2 + \frac{\nu_1}{\ell_2}e_2^T\left(K(B,D^T)u_+ + e_2b_+\right) + \frac{c_1}{\ell_1}e_1^T\xi_1 \quad \text{s.t.} \ K(A,D^T)u_+ + e_1b_+ \ge -\xi_1, \ \ K(A,D^T)u_+ + e_1b_+ \le \frac{\xi_1}{\tau_1} \qquad (26)$$
and
$$\min_{u_-,\,b_-,\,\xi_2} \ \frac{1}{2}\|u_-\|^2 - \frac{\nu_2}{\ell_1}e_1^T\left(K(A,D^T)u_- + e_1b_-\right) + \frac{c_2}{\ell_2}e_2^T\xi_2 \quad \text{s.t.} \ -\left(K(B,D^T)u_- + e_2b_-\right) \ge -\xi_2, \ \ -\left(K(B,D^T)u_- + e_2b_-\right) \le \frac{\xi_2}{\tau_2}, \qquad (27)$$
where $c_1, c_2$ are positive parameters, $\nu_1, \nu_2 > 0$ are margin parameters and $\tau_1, \tau_2 \in [0,1]$ are pinball loss function parameters. When $\tau_1$ and $\tau_2$ are zero, the QPPs in Eqns. (26) and (27) reduce to the QPPs of TPMSVM. By introducing the Lagrangian function and using the Karush-Kuhn-Tucker (K.K.T.) optimality conditions, we obtain the dual formulations of the QPPs in Eqns. (26) and (27) as follows:
$$\max_{\alpha,\,\beta} \ \frac{\nu_1}{\ell_2}e_2^TK(B,A)^T(\alpha-\beta) - \frac{1}{2}(\alpha-\beta)^TK(A,A)^T(\alpha-\beta) \quad \text{s.t.} \ e_1^T(\alpha-\beta)=\nu_1, \ \ \alpha+\frac{\beta}{\tau_1}=\frac{c_1}{\ell_1}e_1, \ \ \alpha \ge 0, \ \beta \ge 0 \qquad (28)$$
and
$$\max_{\gamma,\,\sigma} \ \frac{\nu_2}{\ell_1}e_1^TK(A,B)^T(\gamma-\sigma) - \frac{1}{2}(\gamma-\sigma)^TK(B,B)^T(\gamma-\sigma) \quad \text{s.t.} \ e_2^T(\gamma-\sigma)=\nu_2, \ \ \gamma+\frac{\sigma}{\tau_2}=\frac{c_2}{\ell_2}e_2, \ \ \gamma \ge 0, \ \sigma \ge 0, \qquad (29)$$
where $\alpha$, $\beta$, $\gamma$ and $\sigma$ are Lagrange multipliers.
After optimizing the QPPs in Eqns. (28) and (29), we obtain $u_i$ ($i=+,-$) as follows:
$$u_+ = K(A,D^T)^T(\alpha-\beta) - \frac{\nu_1}{\ell_2}K(B,D^T)^Te_2$$
and
$$u_- = -K(B,D^T)^T(\gamma-\sigma) + \frac{\nu_2}{\ell_1}K(A,D^T)^Te_1.$$
The value of the bias term $b_+$ is given by:
$$O_+=\{i:\alpha_i>0 \text{ and } \beta_i>0\}, \qquad b_+ = -\frac{1}{|O_+|}\sum_{i\in O_+}K(x_i^T,D^T)u_+.$$
Similarly, the value of the bias term $b_-$ is given by:
$$O_-=\{i:\gamma_i>0 \text{ and } \sigma_i>0\}, \qquad b_- = -\frac{1}{|O_-|}\sum_{i\in O_-}K(x_i^T,D^T)u_-.$$
A new data point $x\in\mathbb{R}^n$ is assigned to class $i$ ($i=+1,-1$) depending on which of the kernel-generated surfaces is closer to $x$, i.e.,
$$\text{class}(i) = \text{sign}\left(\frac{K(x^T,D^T)u_+ + b_+}{\|u_+\|} + \frac{K(x^T,D^T)u_- + b_-}{\|u_-\|}\right),$$
where $\text{sign}(\cdot)$ is the signum function.
3.3. Least squares TWSVM variants
3.3.1. Least squares twin support vector machine (LSTSVM)
The formulation of the least squares twin support vector machine (LSTSVM) [9] simply solves two systems of linear equations, as opposed to solving QPPs in TWSVM, TBSVM and pinTSVM. Therefore, it is a simple and fast algorithm. To derive the primal problems of LSTSVM, the inequality constraints are replaced by equality constraints and the 1-norm of the slack variables is replaced by the 2-norm in the formulation of TWSVM. Thus, the primal problems of LSTSVM [9] can be expressed as
$$\min_{(w_1,b_1)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Aw_1+e_2b_1\|^2 + \frac{c_1}{2}\|\xi_1\|^2 \quad \text{s.t.} \ -(Bw_1+e_1b_1)+\xi_1=e_1, \qquad (30)$$
$$\min_{(w_2,b_2)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Bw_2+e_1b_2\|^2 + \frac{c_2}{2}\|\xi_2\|^2 \quad \text{s.t.} \ (Aw_2+e_2b_2)+\xi_2=e_2. \qquad (31)$$
The solution of linear LSTSVM is obtained by computing the inverses of two matrices [9], and can be expressed in the form of two nonparallel hyperplanes as follows:
$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = -\left[c_1Q^tQ+P^tP\right]^{-1}c_1Q^te_1, \qquad (32)$$
$$\begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = \left[c_2P^tP+Q^tQ\right]^{-1}c_2P^te_2, \qquad (33)$$
where $c_1$ and $c_2$ are positive penalty parameters, $P=[A \ e_2]$ and $Q=[B \ e_1]$. Here, we can see that both TWSVM and LSTSVM minimize only the empirical risk, and the matrices in Eqns. (32) and (33) may be singular.
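Since Eqns. (32) and (33) only require solving two linear systems, linear LSTSVM training can be sketched as follows; this is our own NumPy illustration, not the released MATLAB code.

```python
import numpy as np

def lstsvm_linear(A, B, c1, c2):
    """Sketch of linear LSTSVM via Eqns. (32)-(33): two linear systems,
    with P = [A e2] and Q = [B e1]. As noted in the text, these matrices
    may be singular; in practice a small ridge term can be added."""
    P = np.hstack([A, np.ones((A.shape[0], 1))])
    Q = np.hstack([B, np.ones((B.shape[0], 1))])
    e1 = np.ones(Q.shape[0])
    e2 = np.ones(P.shape[0])
    z1 = -np.linalg.solve(c1 * Q.T @ Q + P.T @ P, c1 * (Q.T @ e1))   # Eqn. (32)
    z2 = np.linalg.solve(c2 * P.T @ P + Q.T @ Q, c2 * (P.T @ e2))    # Eqn. (33)
    return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])                       # (w1, b1), (w2, b2)
```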
3.3.2. Improved least squares twin support vector machine (ILSTSVM)
The least squares twin support vector machine (LSTSVM) implements only the empirical risk minimization principle, which limits its generalization performance. To overcome this drawback, Xu et al. [16] proposed an improved version of LSTSVM by introducing an extra regularization term in each objective function. This improvement implements the structural risk minimization principle and leads to better generalization performance compared to LSTSVM.
The primal problems of ILSTSVM [16] can be written as
$$\min_{(w_1,b_1,\xi_1)} \ \frac{1}{2}\|Aw_1+e_2b_1\|^2 + \frac{c_1}{2}\xi_1^t\xi_1 + \frac{c_3}{2}\left\|\begin{bmatrix} w_1 \\ b_1 \end{bmatrix}\right\|^2 \quad \text{s.t.} \ -(Bw_1+e_1b_1)+\xi_1=e_1, \qquad (34)$$
$$\min_{(w_2,b_2,\xi_2)} \ \frac{1}{2}\|Bw_2+e_1b_2\|^2 + \frac{c_2}{2}\xi_2^t\xi_2 + \frac{c_4}{2}\left\|\begin{bmatrix} w_2 \\ b_2 \end{bmatrix}\right\|^2 \quad \text{s.t.} \ (Aw_2+e_2b_2)+\xi_2=e_2, \qquad (35)$$
where $c_i$, $i=1,2,3,4$, are the penalty parameters, $e_1$ and $e_2$ are vectors of ones of appropriate dimensions, and $\xi_1$ and $\xi_2$ are slack variables of appropriate dimensions.
The solution of linear ILSTSVM is obtained by computing the inverses of two matrices [16], and can be expressed in the form of two nonparallel hyperplanes as follows:
$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = -\left[G^tG+\frac{1}{c_1}H^tH+\frac{c_3}{c_1}I\right]^{-1}G^te_1, \qquad (36)$$
$$\begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = \left[H^tH+\frac{1}{c_2}G^tG+\frac{c_4}{c_2}I\right]^{-1}H^te_2, \qquad (37)$$
where, as in Section 3.1.2, $G=[B \ e_1]$ and $H=[A \ e_2]$.
Note that the solutions of the primal problems in Eqns. (34) and (35) can be obtained directly by solving two systems of linear equations instead of solving two QPPs as in TBSVM, which implies that ILSTSVM is faster than TBSVM.
3.3.3. Robust energy-based least squares twin support vector machines (RELS-TSVM)
By introducing an energy parameter for each hyperplane and an extra regularization term in each objective function, Tanveer et al. [20] recently presented a robust energy-based least squares twin support vector machine (RELS-TSVM) algorithm for classification problems. This algorithm is not only robust to noise and outliers but also more stable.
The primal problems of RELS-TSVM can be expressed as follows:
$$\min_{(w_1,b_1,\xi_1)} \ \frac{1}{2}\|Aw_1+e_2b_1\|^2 + \frac{c_1}{2}\xi_1^t\xi_1 + \frac{c_3}{2}\left\|\begin{bmatrix} w_1 \\ b_1 \end{bmatrix}\right\|^2 \quad \text{s.t.} \ -(Bw_1+e_1b_1)+\xi_1=E_1, \qquad (38)$$
$$\min_{(w_2,b_2,\xi_2)} \ \frac{1}{2}\|Bw_2+e_1b_2\|^2 + \frac{c_2}{2}\xi_2^t\xi_2 + \frac{c_4}{2}\left\|\begin{bmatrix} w_2 \\ b_2 \end{bmatrix}\right\|^2 \quad \text{s.t.} \ (Aw_2+e_2b_2)+\xi_2=E_2, \qquad (39)$$
where $c_i$, $i=1,2,3,4$, are the penalty parameters, $E_1$ and $E_2$ are the energy parameters of the hyperplanes, and $\xi_1$ and $\xi_2$ are the slack variables of appropriate dimensions.
One can obtain the solutions of the problems in Eqns. (38) and (39) as
$$z_1 = -\left(c_1N^tN+M^tM+c_3I\right)^{-1}c_1N^tE_1 \qquad (40)$$
$$z_2 = \left(c_2M^tM+N^tN+c_4I\right)^{-1}c_2M^tE_2 \qquad (41)$$
respectively, where $N=[B \ e_1]$ and $M=[A \ e_2]$.
It should be pointed out that both $(c_1N^tN+M^tM+c_3I)$ and $(c_2M^tM+N^tN+c_4I)$ are positive definite matrices due to the addition of the extra regularization terms. This extra regularization provides more robustness and stability to the algorithm. We also note that RELS-TSVM is not affected by matrix singularity. The parameters $c_3$ and $c_4$ used in the formulation are penalty parameters rather than perturbation terms. Once the training of RELS-TSVM is complete, the class of an unknown data point $x_i$ is assigned based on the following decision function:
$$f(x_i) = \begin{cases} +1 & \text{if } \left|\dfrac{x_iw_1+eb_1}{x_iw_2+eb_2}\right| \le 1 \\[2mm] -1 & \text{if } \left|\dfrac{x_iw_1+eb_1}{x_iw_2+eb_2}\right| > 1 \end{cases} \qquad (42)$$
where $|\cdot|$ is the absolute value.
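A minimal sketch of Eqns. (40)-(42) for the linear case is given below; it is our own illustration and not the authors' MATLAB implementation.

```python
import numpy as np

def rels_tsvm_linear(A, B, c1, c2, c3, c4, E1, E2):
    """Sketch of linear RELS-TSVM via Eqns. (40)-(41), with M = [A e2],
    N = [B e1]; E1 and E2 are scalar energy values broadcast to the
    corresponding ones vectors."""
    M = np.hstack([A, np.ones((A.shape[0], 1))])
    N = np.hstack([B, np.ones((B.shape[0], 1))])
    I = np.eye(M.shape[1])
    e_B = np.ones(N.shape[0])
    e_A = np.ones(M.shape[0])
    z1 = -np.linalg.solve(c1 * N.T @ N + M.T @ M + c3 * I, c1 * (N.T @ (E1 * e_B)))  # Eqn. (40)
    z2 = np.linalg.solve(c2 * M.T @ M + N.T @ N + c4 * I, c2 * (M.T @ (E2 * e_A)))   # Eqn. (41)
    return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])

def rels_tsvm_predict(X, w1, b1, w2, b2):
    """Decision rule of Eqn. (42): class +1 if the point is at least as
    close to the first hyperplane as to the second (compared via absolute
    values to avoid dividing by zero)."""
    return np.where(np.abs(X @ w1 + b1) <= np.abs(X @ w2 + b2), 1, -1)
```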
4. Numerical experiments
All 90 datasets are taken from the UCI repository [24]. Out of the 90 datasets, 44 are binary and 46 are multi-class datasets. The names of these datasets are available at http://people.iiti.ac.in/~phd1501101001/TSVM_Binary_JMLR/results-Binary_Final.xlsx and http://people.iiti.ac.in/~phd1501101001/TSVM_Binary_JMLR/results-Multi_Final.xlsx, and the whole datasets and partitions are available from http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/data.tar.gz. We have performed Z-score normalization on all the datasets, as done in [23]. Our experimental setup is identical to that in [23] and consists of two steps. In the first step, one training and one testing set are generated randomly by dividing the dataset into two equal parts for tuning the parameters of the classifiers, and the best performing parameters on the testing set are selected as the optimal parameters. The indices for this random division of the datasets have been taken directly from [23]; the authors have provided all indices at http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/data.tar.gz. In the second step, experiments have been performed in two ways using the optimal parameters for final training and testing as follows:
(i) If the dataset is not originally available in two sets (as provided by the creator of the dataset), i.e., training and testing sets, then 4-fold cross-validation is performed on the whole dataset. We have used the same training and testing indices for each fold as in [23] for all the classifiers used in this paper. The final result is the average result over the 4 folds.

(ii) If the dataset is originally available in two sets (as provided by the creator of the dataset), i.e., training and testing sets (like hill-valley, horse-colic, monks-1, spectf, etc.), then the classifiers are trained and tested on the respective partitions using the optimal parameters obtained in the first step (a schematic sketch of this protocol is given after this list).
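Under our reading of this two-step setup, the protocol can be sketched as follows; `fit_predict` and the index arrays are hypothetical placeholders standing in for the classifiers and the fixed partitions of [23].

```python
import numpy as np
from itertools import product

def evaluate_classifier(X, y, tune_indices, fold_indices, param_grid, fit_predict):
    """Sketch of the two-step protocol of [23]:
    (1) pick the parameter setting with the best accuracy on the fixed
        tuning split; (2) report the average accuracy over the fixed folds."""
    tr, te = tune_indices                        # fixed half/half split from [23]
    best_params, best_acc = None, -1.0
    for values in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        acc = np.mean(fit_predict(X[tr], y[tr], X[te], params) == y[te])
        if acc > best_acc:
            best_params, best_acc = params, acc
    fold_accs = []
    for tr, te in fold_indices:                  # fixed 4-fold partition from [23]
        fold_accs.append(np.mean(fit_predict(X[tr], y[tr], X[te], best_params) == y[te]))
    return best_params, float(np.mean(fold_accs))
```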
All partitions and indices used in our experiments are the same as those used in [23]. The eight variants of TWSVM, viz., TBSVM, TWSVM, LPTSVM, pinTSVM, WLTSVM, LSTSVM, ILSTSVM, and RELS-TSVM, are discussed in the previous section. We have followed the same naming convention for the classifiers as in [23]. All the experiments have been conducted in MATLAB 2016a in a Windows 7 (64 bit) environment with 64 GB RAM and a 3.00 GHz Intel Xeon processor. The names of the 8 variants of TWSVM are appended with '_m', where 'm' stands for MATLAB: TBSVM_m, TWSVM_m, LPTSVM_m, pinTSVM_m, WLTSVM_m, LSTSVM_m, ILSTSVM_m, and RELS-TSVM_m. It is to be noted that, from now onward, when we mention TWSVM instead of TWSVM_m, TWSVM denotes the twin SVM family collectively, while TWSVM_m denotes the basic variant of the twin SVM family. These variants of TWSVM use the Gaussian kernel, which has one parameter σ. The ranges of all parameters of the TWSVM variants are provided in Table 1.
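As an illustration, the Gaussian kernel and search grids of the kind listed in Table 1 could be set up as in the following sketch; the kernel parameterization shown is one common convention, and the authors' exact choices should be taken from their released code.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """One common parameterization of the Gaussian (RBF) kernel:
    K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2.0 * sigma**2))

# Hypothetical grids mirroring Table 1 (most variants).
c_grid = [10.0**i for i in range(-5, 6)]        # regularization: 10^-5 ... 10^5
sigma_grid = [2.0**i for i in range(-10, 11)]   # kernel width:   2^-10 ... 2^10
```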
4.1. Comparison of 8 TWSVM variants with 179 classifiers from [23] on binary class datasets
In this section, we discuss the performance on binary datasets only; the discussion on multi-class datasets is provided in Section 4.2. The performance of all 179 classifiers from [23] together with the 8 variants of TWSVM is presented in Tables 2 and 3. All values of the 8 variants of TWSVM are shown in bold face in Tables 2 and 3. The first column in Tables 2 and 3 shows the position (Pos) as per the Friedman Rank (FRank).
Table 1: Ranges of all parameters of the TWSVM variants

Regularization (c1): 2^-3 to 2^7 for pinTSVM_m; 10^-5 to 10^5 for all other variants
Regularization (c2): 2^-3 to 2^7 for pinTSVM_m; 10^-5 to 10^5 for all other variants
Epsilon (1): 10^-5 to 10^5 for ILSTSVM_m, RELS-TSVM_m, TBSVM_m and WLTSVM_m; not used by the other variants
Epsilon (2): 10^-5 to 10^5 for ILSTSVM_m, RELS-TSVM_m, TBSVM_m and WLTSVM_m; not used by the other variants
Sigma (σ): 2^-10 to 2^10 for all variants
Tau (τ1, τ2): [0.05, 0.1, 0.2, 0.5, 1] for pinTSVM_m only
Supplementary parameters (v1, v2): 2^-3 to 2^7 for pinTSVM_m only
Energy (E1, E2): [0.5, 0.6, 0.7, 0.8, 0.9, 1.0] for RELS-TSVM_m only
The second column contains the classifiers with their family names in brackets; these names are kept the same as in [23], and their detailed descriptions are available in [23]. The third and fourth columns contain the FRank and the average accuracy (Acc) of the respective classifiers. These average accuracies and ranks differ from the average accuracies reported in [23], as they are computed only over the 44 binary datasets. Out of the 179 classifiers, vbmpRadial_t did not work on the binary class datasets, as it needs at least a 3-class dataset. Just for the sake of completeness, vbmpRadial_t is simply added at the last place in Table 3. We have performed the analysis in the same way as discussed in [23] by calculating the FRank, Average accuracy (Acc), Probability of Achieving the Maximum Accuracy (PAMA), Probability of achieving more than 95% of the maximum accuracy (p95) and Percentage of the Maximum Accuracy (PMA). PAMA, p95 and PMA are defined as follows [23]:
$$\text{PAMA} = \frac{\text{Number of datasets for which the classifier achieves the maximum accuracy}}{\text{Number of datasets}} \times 100$$
$$\text{p95} = \frac{\text{Number of datasets for which the classifier achieves more than 95\% of the maximum accuracy}}{\text{Number of datasets}} \times 100$$
$$\text{PMA} = \frac{1}{\text{Number of datasets}}\sum_{i=1}^{\text{Number of datasets}} \frac{\text{Accuracy of the classifier on the } i\text{th dataset}}{\text{Maximum accuracy achieved on the } i\text{th dataset}} \times 100$$
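Assuming an accuracy matrix with one row per dataset and one column per classifier, these measures (together with the Friedman rank used above) can be computed with a small helper of the following kind; this is our own sketch, not the evaluation scripts of [23].

```python
import numpy as np
from scipy.stats import rankdata

def ranking_measures(acc):
    """acc: (n_datasets, n_classifiers) accuracy matrix in percent.
    Returns FRank, Acc, PAMA, p95 and PMA for every classifier."""
    # Friedman rank: rank classifiers per dataset (1 = best), then average.
    frank = rankdata(-acc, axis=1).mean(axis=0)
    best = acc.max(axis=1, keepdims=True)               # per-dataset maximum accuracy
    pama = 100.0 * (acc == best).mean(axis=0)            # % of datasets where the maximum is reached
    p95 = 100.0 * (acc > 0.95 * best).mean(axis=0)       # % of datasets above 95% of the maximum
    pma = 100.0 * (acc / best).mean(axis=0)              # average % of the maximum accuracy
    return frank, acc.mean(axis=0), pama, p95, pma
```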
All the above-mentioned measures are recalculated for the binary datasets together with the 8 variants of TWSVM by using the detailed results provided by Fernandez et al. [23], which are available at http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/. These measures are discussed extensively in the following subsections.
4.1.1. FRank and Average Accuracy analysis of 8 variants of TWSVM with 179
classifiers from [23]
Friedman ranking is performed on the 44 binary datasets with 187 classifiers (179 classifiers from [23] and 8 variants of TWSVM). Their FRank and Average accuracy (Acc) are presented in Tables 2 and 3. As the detailed results of each classifier on each dataset are very large, all results are provided in detail at http://people.iiti.ac.in/~phd1501101001/TSVM_JMLR_Binary_Multi.html. As mentioned in [23], some classifiers yield erroneous output; all erroneous outputs of the classifiers are replaced by the average accuracy of that specific dataset over the 187 classifiers before calculating the FRank value (as done in [23]). Those erroneous values are represented by '–' in the result table, which is available at the above-mentioned web page. The top three best performing classifiers as per their average accuracies are RELS-TSVM_m, ILSTSVM_m and KRR_m/KELM_m. However, FRank is not consistent with average accuracy, as can be seen in Tables 2 and 3. As per FRank, RELS-TSVM_m, ILSTSVM_m and avNNet_t are the top three classifiers. According to both criteria, viz., FRank and Average accuracy, the improved least squares variants of TWSVM based classifiers (RELS-TSVM_m and ILSTSVM_m) take the joint top place in the table. However, the basic least squares variant, i.e., LSTSVM_m, is the second worst performer among all 8 variants of TWSVM. Looking at the top 20 classifiers, we observe that 5 TWSVM based classifiers are in the top 20 list according to both FRank and Average accuracy (Acc) among the 187 classifiers. Among the 187 classifiers, RELS-TSVM_m and ILSTSVM_m show stable and better results compared to all remaining classifiers.
Table 2: Position of classifiers based on binary class datasets (Pos) as per FRank in [23], FRank, Average accuracy (Acc) for each classifier, ordered by increasing FRank. This table is continued in Table 3
Pos Classifiers FRank Acc
1 RELS-TSVM m(TWSVM) 36.9 83.1
1 ILSTSVM m(TWSVM) 36.9 83.1
2 avNNet t(NNET) 39.7 82.0
3 svmPoly t(SVM) 40.4 81.8
4 TBSVM m(TWSVM) 43.2 81.8
5 svmRadialCost t(SVM) 43.9 81.9
6 pcaNNet t(NNET) 45.1 81.9
7 KRR m/KELM m(NNET) 48.1 82.7
8 svm C(SVM) 48.4 80.8
9 rf t(RF) 48.6 81.6
10 LPTSVM m(TWSVM) 48.9 81.9
11 svmRadial t(SVM) 51.2 81.2
12 parRF t(RF) 52.9 81.1
13 nnet t(NNET) 54.1 80.9
14 C5.0 t(BST) 56.6 80.4
15 mlp t(NNET) 56.8 80.7
16 cforest t(RF) 57.3 79.5
17 mlpWeightDecay t(NNET) 57.4 80.3
18 TWSVM m(TWSVM) 58.9 79.6
19 svmLinear t(SVM) 60.9 80.6
20 svmBag R(BAG) 61.5 80.6
21 RotationForest w(RF) 62.6 80.6
22 rforest R(RF) 63.4 82.5
22 gaussprRadial R(OM) 63.4 81.2
23 bayesglm t(GLM) 64.2 80.3
24 fda t(DA) 64.5 79.7
25 glmnet R(GLM) 64.8 81.0
26 BG LibSVM w(BAG) 65.2 79.5
27 rda R(DA) 66.1 80.5
28 pls t(PLSR) 66.9 80.6
29 rbf t(NNET) 67.2 78.7
30 knn t(NN) 67.5 79.4
31 WLTSVM m(TWSVM) 68.5 81.1
31 pnn m(NNET) 68.5 79.3
32 svmlight C(NNET) 68.8 81.1
33 pda t(DA) 68.9 80.5
34 rbfDDA t(NNET) 69.3 80.0
35 MAB LibSVM w(BST) 70.0 79.7
36 simpls R(PLSR) 70.7 80.5
37 widekernelpls R(PLSR) 71.3 80.5
38 RRFglobal t(RF) 71.4 80.2
39 multinom t(LMR) 71.5 80.8
40 nnetBag R(BAG) 71.6 79.5
41 dkp C(NNET) 72.4 79.6
42 mlm R(GLM) 73.2 80.9
43 adaboost R(BST) 73.5 80.0
44 fda R(DA) 73.6 79.8
45 plsBag R(BAG) 73.9 79.0
46 lda R(DA) 74.2 79.5
47 LibLINEAR w(SVM) 74.6 79.5
Pos Classifiers FRank Acc
48 mda t(DA) 74.7 72.8
49 kernelpls R(PLSR) 74.9 78.4
50 LibSVM w(SVM) 76.0 78.6
50 RRF t(RF) 76.0 79.8
51 MCC w(LMR) 76.4 79.2
51 Logistic w(OEN) 76.4 79.2
52 BG RandomForest w(BAG) 76.8 79.3
53 RandomForest w(RF) 77.4 79.7
54 knn R(NN) 77.5 79.2
55 gcvEarth t(MARS) 77.6 78.7
56 lvq t(NNET) 78.5 79.0
57 Decorate w(OEN) 78.7 79.8
58 MAB PART w(BST) 79.2 79.3
59 bagging R(BAG) 79.5 77.0
60 ldaBag R(BAG) 79.7 79.6
60 sda t(DA) 79.7 79.5
61 SMO w(SVM) 79.8 78.9
62 mars R(MARS) 79.9 78.4
63 SimpleLogistic w(LMR) 80.0 78.2
64 glmStepAIC t(GLM) 80.3 78.7
65 BG PART w(BAG) 81.3 78.8
66 MAB RandomForest w(BST) 81.4 79.5
67 lda2 t(DA) 81.6 79.1
68 mlp C(NNET) 82.1 77.9
69 BG Logistic w(BAG) 82.3 79.2
69 MAB MLP w(BST) 82.3 79.7
70 MAB Logistic w(BST) 83.5 79.3
71 MAB J48 w(BST) 83.9 79.1
72 BG DecisionTable w(BAG) 84.9 77.8
73 gpls R(PLSR) 85.1 76.3
74 hdda R(DA) 85.3 79.0
75 MLP w(NNET) 85.9 79.2
76 glm R(GLM) 86.4 76.8
77 ctreeBag R(BAG) 87.1 77.7
77 BG J48 w(BAG) 87.1 78.4
78 elm m(NNET) 87.2 78.0
79 RandomSubSpace w(DT) 87.3 77.6
80 CVR w(OM) 88.5 78.4
81 MAB REPTree w(BST) 88.6 78.3
82 JRip t(RL) 89.0 77.7
83 ctree2 t(DT) 89.5 77.3
84 ctree t(DT) 90.1 77.1
85 AdaBoostM1 J48 w(BST) 90.4 79.3
86 Dagging w(OEN) 90.6 77.6
87 LSTSVM m(TWSVM) 91.1 74.0
88 BG Ibk w(BAG) 91.6 78.5
88 BG LWL w(BAG) 91.6 78.5
88 BG REPTree w(BAG) 91.6 78.5
89 mda R(DA) 92.6 78.4
90 RandomCommittee w(OEN) 92.7 78.6
Table 3: Continuation of Table 2
Pos Classifiers FRank Acc
91 treebag t(BAG) 92.8 78.1
91 obliqueTree R(DT) 92.8 78.6
92 PenalizedLDA R(DA) 94.2 76.2
93 BG RandomTree w(BAG) 95.9 78.4
94 MAB DecisionTable w(BST) 96.0 76.5
95 C5.0Rules t(RL) 96.9 78.4
96 mlp m(NNET) 97.4 77.4
97 NBTree w(DT) 97.6 77.2
97 LogitBoost w(BST) 97.6 77.2
98 lssvmRadial t(SVM) 97.7 77.5
99 RBFNetwork w(NNET) 98.8 76.3
100 C5.0Tree t(DT) 99.5 77.9
101 DTNB w(RL) 99.6 77.3
102 rpart t(DT) 99.9 77.0
103 slda t(DA) 100.5 76.4
104 ASC w(OM) 100.7 77.0
105 AdaBoostM1 w(BST) 100.8 76.8
105 JRip w(RL) 100.8 77.4
106 FilteredClassier w(OM) 101.4 77.0
107 Ridor w(RL) 102.8 77.0
108 lvq R(NNET) 103.2 73.5
109 pam t(OM) 103.7 76.2
110 MAB RandomTree w(BST) 103.8 77.7
111 sddaLDA R(DA) 104.1 76.6
112 PART w(DT) 104.3 77.7
113 J48 w(DT) 104.5 77.8
113 OCC w(OEN) 104.5 77.8
113 END w(OEN) 104.5 77.8
114 rbf m(NNET) 104.7 74.0
115 rpart2 t(DT) 104.9 78.3
116 qda t(DA) 105.4 76.1
116 BayesNet w(BY) 105.4 75.9
116 PART t(DT) 105.4 78.2
117 stepLDA t(DA) 106.4 75.4
118 bdk R(NNET) 106.8 77.7
119 J48 t(DT) 108.0 77.6
120 sddaQDA R(DA) 109.4 75.1
121 rpart R(DT) 111.3 76.3
122 sparseLDA R(DA) 112.5 73.4
123 REPTree w(DT) 113.5 76.1
124 stepQDA t(DA) 114.0 74.2
125 DecisionTable w(RL) 114.1 75.4
126 MAB Ibk w(BST) 114.3 74.8
Pos Classifiers FRank Acc
126 MAB w(BST) 114.6 74.8
127 MAB NaiveBayes w(BST) 115.1 74.5
127 IBk w(NN) 115.1 76.5
128 KStar w(OM) 116.3 76.2
129 NNge w(NN) 116.7 76.1
130 cascor C(NNET) 117.6 75.5
131 nbBag R(BAG) 117.7 74.5
132 rrlda R(DA) 117.8 74.7
132 naiveBayes R(BY) 117.9 74.3
133 IB1 w(NN) 120.1 76.2
134 NaiveBayes w(BY) 121.2 73.7
135 LWL w(OEN) 121.3 74.5
136 BG DecisionStump w(BAG) 122.0 73.8
137 BG NaiveBayes w(BAG) 124.5 73.1
138 BG OneR w(BAG) 125.3 74.0
139 DecisionStump w(DT) 125.7 73.5
139 NBUpdateable w(BY) 125.7 73.1
140 ConjunctiveRule w(RL) 128.7 72.7
141 MAB OneR w(BST) 130.1 74.0
142 NaiveBayesSimple w(BY) 131.3 71.9
143 OneR t(RL) 132.2 72.5
144 dpp C(NNET) 132.3 69.4
145 spls R(PLSR) 133.9 65.8
146 logitboost R(BST) 136.0 69.3
147 QdaCov t(DA) 136.6 73.1
148 OneR w(RL) 137.6 71.9
149 RandomTree w(DT) 138.7 74.6
150 BG MLP w(BAG) 138.7 66.5
151 pinTSVM m(TWSVM) 139.7 66.3
152 BG HyperPipes w(BAG) 141.3 67.0
152 Stacking w(STC) 145.2 63.2
152 Grading w(OEN) 145.2 63.2
153 CVPS w(OM) 145.2 63.2
154 StackingC w(STC) 145.3 63.1
155 RILB w(BST) 145.7 63.4
155 VFI w(OM) 146.0 68.2
156 HyperPipes w(OM) 146.1 65.1
156 ZeroR w(RL) 146.8 62.6
156 MultiScheme w(OEN) 146.8 62.6
156 CSC w(OEN) 146.8 62.6
157 Vote w(OEN) 146.8 62.6
158 MetaCost w(BAG) 147.3 62.5
159 CVC w(OM) 165.6 61.6
160 vbmpRadial t(BY) NA NA
Fig. 1 exhibits that RELS-TSVM_m and ILSTSVM_m achieve either the maximum accuracy or near-maximum accuracy for all datasets except the hill-valley (53.6%) and horse-colic (60.29%) datasets.

Figure 1: Accuracy (in %) achieved by RELS-TSVM_m (and ILSTSVM_m) vs. maximum accuracy for each dataset (ordered by increasing maximum accuracies).

We further provide two separate analyses in Figs. 2 and 3 of the top 25 classifiers using FRank and Acc. Fig. 2 shows that the top 2 classifiers, i.e., RELS-TSVM_m and ILSTSVM_m (TWSVM family), exhibit identical performance. There is a significant difference (2.8) between the FRank of RELS-TSVM_m (or ILSTSVM_m) and that of avNNet_t. Furthermore, we need to statistically verify the results presented in Tables 2 and 3. We have selected the top 20 classifiers from these tables as per their FRanks and performed the Friedman test. In the Friedman test, the null hypothesis states that the accuracies of the compared methods are not significantly different at the significance level α = 0.05; it is retained when the p-value > 0.05. We have computed three quantities under this test for the top 20 classifiers: the F-score, the critical value (Cval) and the p-value. Moreover, we have computed both the Friedman test and the modified Friedman test [34]. The computed p-values for both cases (Friedman and modified Friedman test) are less than 0.05, i.e., 0.0012 and 0.0010, respectively. The computed F-scores for both cases are also greater than the corresponding critical values, i.e., 43.1672 > 30.1435 and 2.3412 > 1.5993 (a minimal sketch of these test statistics is given below).
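The sketch below shows the standard Friedman chi-square statistic and its Iman-Davenport modification as commonly described in the statistics literature; the exact implementation used to obtain the values reported above may differ.

```python
import numpy as np
from scipy.stats import rankdata, chi2, f

def friedman_tests(acc):
    """acc: (n_datasets, k_classifiers) accuracies.
    Returns (chi2_F, p_chi2) for the Friedman test and (F_F, p_F) for the
    Iman-Davenport (modified Friedman) test."""
    n, k = acc.shape
    R = rankdata(-acc, axis=1).mean(axis=0)                       # average ranks
    chi2_F = 12.0 * n / (k * (k + 1)) * (np.sum(R**2) - k * (k + 1)**2 / 4.0)
    F_F = (n - 1) * chi2_F / (n * (k - 1) - chi2_F)               # Iman-Davenport statistic
    p_chi2 = chi2.sf(chi2_F, k - 1)
    p_F = f.sf(F_F, k - 1, (k - 1) * (n - 1))
    return chi2_F, p_chi2, F_F, p_F
```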
Based on the above analysis, a few interesting facts can be observed from Table 2 and Figs. 2 and 3:

(i) Despite the identical average accuracy of svmPoly_t and TBSVM_m, svmPoly_t yields a lower FRank than TBSVM_m, with a difference of 2.8.

(ii) Similarly, the Acc of TBSVM_m is lower than that of KRR_m/KELM_m; however, TBSVM_m achieves a better FRank (difference of FRanks: 4.9) than KRR_m/KELM_m.

(iii) KRR_m/KELM_m yields the third highest accuracy among the 187 classifiers, and there is only a difference of 0.4 between the Acc of KRR_m/KELM_m and that of RELS-TSVM_m (or ILSTSVM_m). However, KRR_m/KELM_m takes only 7th place among the 187 classifiers.

(iv) Similar facts as discussed in the above three points can be stated for LPTSVM_m, TWSVM_m and WLTSVM_m. These facts mainly exhibit the unstable behavior of these 3 variants of TWSVM based classifiers. Even WLTSVM_m performs better in terms of Acc but yields an inferior FRank compared to TWSVM_m.

(v) One more interesting fact can be stated from the first column (i.e., Pos) of Tables 2, 3, 7 and 8: various low performing classifiers for multi-class datasets have performed very well for binary class datasets in terms of Acc as well as FRank.

The analysis of this section clearly shows the significant dominance of RELS-TSVM_m and ILSTSVM_m over all remaining classifiers. However, these cannot be the only criteria to measure the performance of the classifiers. In the next two subsections, we analyze the performance of the classifiers based on PAMA, p95 and PMA.
Figure 2: Top 25 classifiers as per FRank in increasing order of FRank.

Figure 3: Top 25 classifiers as per average accuracy in decreasing order of Acc.
4.1.2. PAMA and p95 analysis of 8 variants of TWSVM with 179 classifiers
from [23]
PAMA and p95 were calculated for the 187 classifiers over the 44 binary datasets. The top 20 classifiers as per the PAMA criterion are listed in Table 4, and the PAMA values of all 187 classifiers are provided on the aforementioned web page. As we can see from Table 4, TBSVM_m emerges as the best classifier, instead of RELS-TSVM_m (or ILSTSVM_m), as per the PAMA value. Surprisingly, the top 2 classifiers (RELS-TSVM_m and ILSTSVM_m) as per FRank in Table 2 are not able to secure a position in the top 20 list. It can be noted that all three least squares based TWSVM variants (LSTSVM_m, RELS-TSVM_m and ILSTSVM_m) yield the same PAMA value of 6.8. Four variants of TWSVM, viz., TBSVM_m, LPTSVM_m, TWSVM_m and WLTSVM_m, are in the top 20 list, which shows the dominance of TWSVM based classifiers over other classifiers. However, PAMA provides a biased insight about a classifier, as some classifiers do not achieve the maximum accuracy but may be very near to it [23]. Therefore, we also consider the p95 criterion for the evaluation. The top 20 classifiers as per the p95 criterion are listed in Table 5, and the p95 values of all 187 classifiers are provided on the aforementioned web page. As per this criterion, 4 variants of TWSVM attain a position in the list of the top 20 classifiers in Table 5. Both improved least squares variants, viz., RELS-TSVM_m and ILSTSVM_m, secure positions in the top 10, while TBSVM_m performs better than RELS-TSVM_m and ILSTSVM_m. This indicates that RELS-TSVM_m and ILSTSVM_m might not achieve the best accuracy on every dataset, but they improve the generalization capability of TWSVM_m.
4.1.3. PMA analysis of 8 variants of TWSVM with 179 classifiers from [23]
Top 20 classifiers as per the PMA criterion are listed in Table 6, and the PMA values of all 187 classifiers are provided on the aforementioned web page. Before calculating the PMA value, all erroneous outputs of the classifiers are replaced by zero (as done in [23]). As per this criterion, 2 variants of TWSVM, i.e., RELS-TSVM_m and ILSTSVM_m, attain the top 2 positions, similar to the FRank and Acc criteria. Four TWSVM based classifiers secured their positions among the top 10 classifiers.
Table 4: Top 20 classifiers as per the highest PAMA (%) value.
S.No. Classifier PAMA(%)
1TBSVM m(TWSVM) 15.9
2 KRR m/KELM m(NNET) 13.6
3 mda t(DA) 11.4
4 mlp t(NNET) 11.4
5 svmRadialCost t(SVM) 11.4
6LPTSVM m(TWSVM) 11.4
7 pcaNNet t(NNET) 9.1
8 pnn m(NNET) 9.1
9 dkp C(NNET) 9.1
10 svm C(SVM) 9.1
S.No. Classifier PAMA(%)
11 adaboost R(BST) 9.1
12 nnetBag R(BAG) 9.1
13 rforest R(RF) 9.1
14 gpls R(PLSR) 9.1
15 TWSVM m(TWSVM) 9.1
16 WLTSVM m(TWSVM) 9.1
17 MAB DecisionTable w(BST) 6.8
18 pda t(DA) 6.8
19 rda R(DA) 6.8
20 rbf m(NNET) 6.8
Table 5: Top 20 classifiers as per the highest p95 (%) value.
S.No. Classifier p95(%)
1 svmRadialCost t(SVM) 77.3
2 svm C(SVM) 72.7
3 svmPoly t(SVM) 72.7
4TBSVM m(TWSVM) 72.7
5 svmRadial t(SVM) 70.5
6RELS-TSVM m(TWSVM) 70.5
7ILSTSVM m(TWSVM) 70.5
8 BG LibSVM w(BAG) 68.2
9LPTSVM m(TWSVM) 68.2
10 avNNet t(NNET) 65.9
S.No. Classifier p95(%)
11 KRR m/KELM m(NNET) 63.6
12 MAB LibSVM w(BST) 61.4
13 pcaNNet t(NNET) 61.4
14 LibSVM w(SVM) 61.4
15 C5.0 t(BST) 61.4
16 rf t(RF) 61.4
17 parRF t(RF) 61.4
18 TWSVM m(TWSVM) 61.4
19 svmBag R(BAG) 59.1
20 mlpWeightDecay t(NNET) 56.8
Table 6: Top 20 classifiers as per the highest PMA (%) value.
S.No. Classifier PMA(%)
1RELS-TSVM m(TWSVM) 95.3
2ILSTSVM m(TWSVM) 95.3
3 KRR m/KELM m(NNET) 94.8
4 avNNet t(NNET) 94.1
5 pcaNNet t(NNET) 93.9
6 svmRadialCost t(SVM) 93.9
7 svmPoly t(SVM) 93.8
8TBSVM m(TWSVM) 93.7
9LPTSVM m(TWSVM) 93.7
10 rf t(RF) 93.5
S.No. Classifier PMA(%)
11 svmRadial t(SVM) 93.1
12 rforest R(RF) 93.0
13 nnet t(NNET) 93.0
14 parRF t(RF) 93.0
15 glmnet R(GLM) 92.9
16 WLTSVM m(TWSVM) 92.8
17 svm C(SVM) 92.8
18 mlp t(NNET) 92.7
19 nnetBag R(BAG) 92.7
20 svmLinear t(SVM) 92.6
Figure 4: Top 20 classifiers as per PMA criterion in decreasing order of PMA value.

Figure 5: PMA value over 44 datasets in increasing order for RELS-TSVM_m and avNNet_t.
In Fig. 4, the top 20 classifiers are plotted using their PMA values. One can observe that RELS-TSVM_m and ILSTSVM_m clearly outperform the top two classifiers of [23], as the differences between the PMA value of RELS-TSVM_m and those of KRR_m/KELM_m and avNNet_t are 0.5 and 1.2, respectively. The best performing classifier among the 179 classifiers of [23] (i.e., avNNet_t) and the best among the TWSVM based classifiers (i.e., RELS-TSVM_m or ILSTSVM_m) as per FRank are plotted with their PMA values in increasing order over the 44 binary datasets in Fig. 5. It can be easily observed that RELS-TSVM_m performs either similarly to or significantly better than avNNet_t on the 44 binary datasets.
4.2. Comparison of 8 TWSVM variants with 179 classifiers from [23] on multi-class datasets
In this subsection, we compare the 8 TWSVM variants with the 179 classifiers from [23] on 46 multi-class datasets. Results are provided in Table 7 and Table 8, which contain the FRank and Acc of the classifiers. As can be observed from these tables, 4 TWSVM variants achieve top 20 ranks and outperform most of the classifiers. However, the performance is not the same as in the case of binary datasets, where TWSVM variants achieved the top 2 positions. Further, we have also calculated PAMA for these multi-class datasets, and the results are presented in Table 9. Here, 1 classifier manages to achieve the 2nd position and 1 classifier achieves the 3rd position. In total, 4 TWSVM variants get a position in the top 20 as per the PAMA criterion. It is to be noted that only one variant, TBSVM_m, is common between Table 4 and Table 9. All results are provided in detail at http://people.iiti.ac.in/~phd1501101001/TSVM_JMLR_Binary_Multi.html. A comparison of the TWSVM variants is discussed in detail in the subsequent subsection. Furthermore, similar to the binary class datasets, we have computed the F-score, p-value and critical value for the multi-class datasets. The computed p-values for both cases (Friedman and modified Friedman test) are less than 0.05, i.e., 0.0094 and 0.0085, respectively. The computed F-scores for both cases are also greater than the corresponding critical values, i.e., 36.4090 > 30.1435 and 1.9561 > 1.5987. Based on the above discussion, we can state that the outcomes presented in this paper are significantly different and we can reject the null hypothesis.
4.3. Comparison among TWSVM variants
The TWSVM variants can be divided into 3 categories. The first category contains three basic TWSVM variants, viz., TBSVM_m, TWSVM_m, and LPTSVM_m. The second category is based on weighted TWSVM, and its variants are pinTSVM_m and WLTSVM_m. The third category contains three least squares variants, viz., LSTSVM_m, ILSTSVM_m, and RELS-TSVM_m. It can be observed from Tables 2 and 3 that the basic least squares version of TWSVM did not perform well, but two variants from the least squares category (ILSTSVM_m and RELS-TSVM_m) performed better than the rest of the variants as per the Acc value. A similar observation is made for the multi-class datasets in Tables 7 and 8: the basic least squares version did not perform well, but another variant from this category, ILSTSVM_m, yields the best Acc value among all variants. However, one variant from the basic TWSVM category (TWSVM_m), two least squares variants (ILSTSVM_m and RELS-TSVM_m), and one variant from the weighted TWSVM category (WLTSVM_m) yield similar Acc values with minor differences, as shown in Table 7. As per the PAMA values for the binary datasets in Table 4, all three basic TWSVM variants achieve the top 3 positions among the 8 TWSVM variants. As per the PAMA values for the multi-class datasets in Table 9, the scenario is completely different, and all three variants from the least squares category achieve the top 3 positions among the 8 TWSVM variants. A time comparison among all 8 variants for the 90 datasets is provided in Tables 10 and 11. The weighted TWSVM variants consume the least time compared to the rest of the categories, and the least squares category stands in second position among the three categories. The basic TWSVM variants consume more time in solving the optimization problems for a few datasets. This fact can be observed in Tables 10 and 11 for a few datasets, viz., thyroid, cardiotocography-10clases, statlog-image, steel-plates, etc. For these datasets, the average training time is higher for TBSVM_m, TWSVM_m, and LPTSVM_m.
Table 7: Position of classifiers for multi-class datasets (Pos) as per FRank in [23], FRank, Average accuracy (Acc) for each classifier, ordered by increasing FRank. This table is continued in Table 8
Pos Classifier FRank Acc
1 parRF t(RF) 24.3 79.4
2 rf t(RF) 28.0 79.1
3 rforest R(RF) 31.2 78.9
4 nnet t(NNET) 36.3 78.6
5 svmPoly t(SVM) 37.5 77.7
6 svm C(SVM) 38.8 78.3
7 svmRadial t(SVM) 39.3 77.4
8 KRR m/KELM m(NNET) 39.4 77.4
9 RRF t(RF) 39.5 78.2
10 svmRadialCost t(SVM) 40.1 77.5
11 mlp t(NNET) 41.8 78.4
12 C5.0 t(BST) 43.4 77.2
13 avNNet t(NNET) 44.5 77.8
14 BG LibSVM w(BAG) 45.0 77.2
15 pcaNNet t(NNET) 45.4 76.7
16 TBSVM m(TWSVM) 45.9 76.4
17 adaboost R(BST) 46.0 76.9
18 TWSVM m(TWSVM) 46.1 76.8
19 RELS-TSVM m(TWSVM) 46.4 76.7
20 ILSTSVM m(TWSVM) 46.6 76.9
20 RotationForest w(RF) 46.6 77.6
21 RRFglobal t(RF) 48.2 76.9
22 LibSVM w(SVM) 50.9 76.0
23 MAB LibSVM w(BST) 52.5 76.5
24 RandomCommittee w(OEN) 56.8 76.4
25 Decorate w(OEN) 57.0 76.1
26 MAB RandomForest w(BST) 57.2 75.2
27 LPTSVM m(TWSVM) 58.2 74.3
28 mlpWeightDecay t(NNET) 58.6 76.7
29 CVR w(OM) 59.4 75.6
30 svmLinear t(SVM) 59.7 75.6
31 cforest t(RF) 60.1 75.4
32 pnn m(NNET) 60.4 76.2
33 gaussprRadial R(OM) 60.5 76.2
34 dkp C(NNET) 60.8 76.0
35 multinom t(LMR) 60.9 75.9
36 glmnet R(GLM) 61.1 75.0
37 treebag t(BAG) 61.3 76.2
38 mlp C(NNET) 61.8 75.8
39 RandomForest w(RF) 61.9 74.8
40 SimpleLogistic w(LMR) 63.3 75.2
41 elm m(NNET) 64.8 76.2
42 rda R(DA) 66.0 75.2
43 MAB MLP w(BST) 66.8 74.2
43 mda t(DA) 66.8 73.4
44 END w(OEN) 67.2 75.4
45 pda t(DA) 67.3 74.3
46 BG RandomForest w(BAG) 67.7 74.9
Pos Classifier FRank Acc
47 MAB PART w(BST) 68.4 74.7
47 LogitBoost w(BST) 68.4 74.8
47 svmlight C(NNET) 68.4 74.2
48 MAB J48 w(BST) 69.0 74.7
49 fda R(DA) 69.1 73.9
50 ldaBag R(BAG) 69.2 73.7
51 fda t(DA) 69.4 74.8
52 knn R(NN) 69.5 74.8
53 rbf t(NNET) 69.9 73.5
53 gcvEarth t(MARS) 69.9 74.4
54 lda R(DA) 70.3 73.8
55 BG PART w(BAG) 70.4 74.3
56 BG REPTree w(BAG) 70.6 74.8
57 BG J48 w(BAG) 71.2 74.4
58 rbfDDA t(NNET) 71.3 74.6
59 MLP w(NNET) 71.5 74.4
60 lda2 t(DA) 72.2 73.3
61 knn t(NN) 72.8 74.2
62 mlm R(GLM) 73.1 73.6
63 AdaBoostM1 J48 w(BST) 73.6 74.2
64 ctreeBag R(BAG) 74.1 73.5
65 BG RandomTree w(BAG) 75.1 73.4
66 LibLINEAR w(SVM) 75.3 74.9
67 lssvmRadial t(SVM) 76.0 75.8
68 BG Ibk w(BAG) 76.2 73.7
69 sda t(DA) 76.6 73.3
70 lvq t(NNET) 77.8 74.3
71 BG LWL w(BAG) 78.5 73.0
72 SMO w(SVM) 79.4 73.5
73 pls t(PLSR) 80.2 70.5
74 MAB RandomTree w(BST) 80.3 73.2
75 KStar w(OM) 81.3 73.8
76 hdda R(DA) 81.5 72.9
77 mda R(DA) 81.8 73.3
78 LSTSVM m(TWSVM) 82.5 71.8
79 RandomSubSpace w(DT) 82.6 73.1
80 RBFNetwork w(NNET) 83.6 73.6
81 C5.0Tree t(DT) 85.0 73.4
82 J48 t(DT) 85.5 72.9
82 rpart R(DT) 85.5 72.2
83 MAB REPTree w(BST) 85.6 72.4
84 NNge w(NN) 86.0 73.5
85 Logistic w(OEN) 86.2 72.2
86 C5.0Rules t(RL) 86.3 73.1
87 BG Logistic w(BAG) 86.9 72.0
88 JRip t(RL) 87.0 72.4
89 PART t(DT) 87.6 72.6
89 J48 w(DT) 87.6 73.3
Table 8: Continuation of Table 7
Pos Classifier FRank Acc
90 ASC w(OM) 87.8 73.0
91 MAB Logistic w(BST) 89.3 71.6
92 logitboost R(BST) 89.7 72.4
93 PART w(DT) 90.6 72.4
94 rpart2 t(DT) 90.8 72.0
95 lvq R(NNET) 90.9 70.7
96 svmBag R(BAG) 91.0 67.5
96 MCC w(LMR) 91.0 71.8
97 nbBag R(BAG) 91.2 72.1
98 WLTSVM m(TWSVM) 91.5 71.4
99 IB1 w(NN) 92.1 72.4
100 rpart t(DT) 96.1 71.3
100 MAB DecisionTable w(BST) 96.1 70.8
101 BayesNet w(BY) 96.2 71.1
102 NBTree w(DT) 96.3 71.8
103 REPTree w(DT) 97.6 71.2
104 naiveBayes R(BY) 98.1 71.0
105 BG DecisionTable w(BAG) 98.7 71.3
106 DTNB w(RL) 98.8 71.4
107 ctree t(DT) 99.0 70.5
108 cascor C(NNET) 99.3 70.2
109 IBk w(NN) 99.6 70.8
110 ctree2 t(DT) 99.8 70.4
111 JRip w(RL) 100.4 71.2
112 qda t(DA) 100.8 69.3
113 NaiveBayes w(BY) 101.9 69.5
114 bagging R(BAG) 101.9 60.0
115 bdk R(NNET) 102.1 71.6
116 BG NaiveBayes w(BAG) 103.0 68.8
117 FilteredClassifier w(OM) 103.9 70.6
118 NBUpdateable w(BY) 104.9 68.0
119 MAB NaiveBayes w(BST) 105.9 68.9
120 Ridor w(RL) 106.6 71.0
121 pam t(OM) 107.0 68.4
122 OCC w(OEN) 107.1 70.0
123 rrlda R(DA) 108.1 67.1
124 RandomTree w(DT) 109.1 69.8
125 slda t(DA) 110.7 67.5
126 vbmpRadial t (BY) 110.8 66.0
127 sparseLDA R(DA) 110.9 65.6
128 Dagging w(OEN) 111.8 67.9
129 plsBag R(BAG) 112.1 63.1
130 rbf m(NNET) 113.2 64.9
131 QdaCov t(DA) 113.9 66.1
132 obliqueTree R(DT) 114.2 62.1
Pos Classifier FRank Acc
132 DecisionTable w(RL) 114.2 68.4
133 PenalizedLDA R(DA) 114.8 63.1
134 NaiveBayesSimple w(BY) 116.2 73.3
135 stepQDA t(DA) 118.8 66.0
136 mlp m(NNET) 120.4 65.3
137 stepLDA t(DA) 120.6 66.6
138 sddaLDA R(DA) 125.1 63.1
139 dpp C(NNET) 125.3 63.0
140 sddaQDA R(DA) 127.8 61.4
141 LWL w(OEN) 128.2 63.4
142 nnetBag R(BAG) 132.4 49.5
143 VFI w(OM) 133.9 63.2
144 OneR w(RL) 137.9 58.0
145 kernelpls R(PLSR) 139.2 51.9
146 OneR t(RL) 140.7 57.5
147 BG OneR w(BAG) 141.1 58.2
148 simpls R(PLSR) 141.8 51.0
149 BG HyperPipes w(BAG) 142.2 57.0
150 mars R(MARS) 142.3 54.8
151 MAB OneR w(BST) 142.4 57.9
152 widekernelpls R(PLSR) 142.5 52.0
153 MAB w(BST) 143.8 54.5
154 ConjunctiveRule w(RL) 145.4 52.6
155 BG DecisionStump w(BAG) 146.4 54.9
156 AdaBoostM1 w(BST) 147.6 54.3
157 MAB Ibk w(BST) 148.3 53.9
158 DecisionStump w(DT) 152.3 51.6
159 HyperPipes w(OM) 152.6 53.6
160 spls R(PLSR) 153.4 44.7
161 BG MLP w(BAG) 155.9 44.6
162 gpls R(PLSR) 158.4 38.5
163 bayesglm t(GLM) 159.8 41.1
164 CVC w(OM) 160.0 47.6
165 RILB w(BST) 161.8 43.4
166 glmStepAIC t(GLM) 163.1 40.6
167 StackingC w(STC) 165.2 40.8
168 MultiScheme w(OEN) 165.4 40.8
169 pinTSVM m(TWSVM) 166.6 34.3
170 Grading w(OEN) 166.9 40.6
171 glm R(GLM) 167.3 25.8
172 Vote w(OEN) 167.4 40.5
173 ZeroR w(RL) 167.5 40.5
173 MetaCost w(BAG) 167.5 40.4
174 Stacking w(STC) 168.1 40.3
174 CSC w(OEN) 168.1 40.3
175 CVPS w(OM) 168.5 40.1
Table 9: Top 20 classifiers as per the highest PAMA (%) value.
S.No. Classifier PAMA(%)
1 svm C(SVM) 10.9
2 parRF t(RF) 8.7
3ILSTSVM m(TWSVM) 8.7
4 adaboost R(BST) 6.5
5 RRF t(RF) 6.5
6RELS-TSVM m(TWSVM) 6.5
7 KRR m/KELM m(NNET) 4.3
8 BG RandomForest w(BAG) 4.3
9 lda R(DA) 4.3
10 sda t(DA) 4.3
S.No. Classifier PAMA(%)
11 nnet t(NNET) 4.3
12 lvq R(NNET) 4.3
13 C5.0 t(BST) 4.3
14 LSTSVM m(TWSVM) 4.3
15 TBSVM m(TWSVM) 4.3
16 MAB LibSVM w(BST) 2.2
17 MAB RandomForest w(BST) 2.2
18 MAB RandomTree w(BST) 2.2
19 lda2 t(DA) 2.2
20 PenalizedLDA R(DA) 2.2
5. Conclusions and future directions
This paper has provided an exhaustive benchmarking of 8 variants of TWSVM based classifiers from three categories against 179 classifiers from 17 families. The eight variants of TWSVM based classifiers have been tested along with the 179 classifiers on various performance criteria, viz., Acc, FRank, PAMA, p95, and PMA. Two variants from the least squares category (ILSTSVM_m and RELS-TSVM_m) have performed the best among all 187 classifiers for the binary class datasets as per the FRank, Acc and PMA criteria, and another TWSVM variant, TBSVM_m, has performed the best as per the PAMA criterion. Overall, 5 and 4 TWSVM variants are able to secure a place in the top 20 best classifiers according to FRank for the binary and multi-class datasets, respectively. An interesting fact is observed among the TWSVM variants for the binary datasets: the basic least squares version of TWSVM, i.e., LSTSVM_m, has performed second worst among all TWSVM variants as per the Acc criterion, whereas its improved variants, i.e., RELS-TSVM_m and ILSTSVM_m, have performed the best among all TWSVM variants. A similar observation is made for the multi-class datasets. Moreover, the TWSVM variants did not attain even the top 5 positions for the multi-class datasets. Although the TWSVM variants did not emerge as the best classifiers for the multi-class datasets, they can be a good alternative because a TWSVM variant obtained the 2nd position as per the PAMA criterion. On the other hand, TWSVM can be a better alternative for binary class datasets compared to other state-of-the-art classifiers. Further-
Table 10: Training time (in seconds) of the 8 variants for 90 datasets; continued in Table 11
Dataset  ILSTSVM_m  LPTSVM_m  LSTSVM_m  pinTSVM_m  RELS-TSVM_m  TBSVM_m  TWSVM_m  WLTSVM_m
acute-inflammation 0.0320 0.1779 0.0343 0.0161 0.0342 0.0662 0.1137 0.0023
acute-nephritis 0.0078 0.1487 0.0109 0.0125 0.0084 0.0610 0.0342 0.0037
annealing 0.7267 21.6805 0.7004 0.2043 0.7097 7.4263 1.7779 0.2041
arrhythmia 0.7148 6.7270 0.6239 0.0625 0.5953 0.8841 1.0891 0.0213
balance-scale 0.1603 6.4621 0.1684 0.0507 0.1080 1.2528 0.3692 0.0401
balloons 0.0026 0.0073 0.0056 0.0096 0.0030 0.0051 0.0191 0.0005
blood 0.1058 2.7389 0.1100 0.0832 0.1510 4.5923 0.4953 0.0635
breast-cancer 0.0145 0.6903 0.0147 0.0283 0.0155 0.1350 0.0642 0.0070
breast-cancer-wisc 0.1141 2.9150 0.0950 0.0688 0.0903 0.3817 0.3483 0.0576
breast-cancer-wisc-diag 0.1133 2.6128 0.0620 0.0438 0.0650 0.1578 0.2355 0.0313
breast-cancer-wisc-prog 0.0090 0.2316 0.0209 0.0193 0.0125 0.0428 0.0430 0.0066
breast-tissue 0.0106 0.3496 0.0108 0.0119 0.0129 0.0771 0.0817 0.0031
car 1.5964 40.2947 1.8283 0.4875 1.5445 37.5490 4.7756 0.5331
cardiotocography-10clases 6.6195 222.4167 8.2245 0.8833 7.7333 116.9285 18.1605 0.9085
cardiotocography-3clases 1.9685 104.8104 2.6934 1.2793 2.2968 16.3837 5.1091 1.0034
chess-krvkp 3.6092 444.7051 5.5426 1.3464 3.8869 10.8307 8.5032 2.4955
congressional-voting 0.0301 0.7976 0.0656 0.0248 0.0331 0.1125 0.1397 0.0196
conn-bench-sonar-mines-rocks 0.0102 0.1526 0.0105 0.0129 0.0310 0.0553 0.0608 0.0063
conn-bench-vowel-deterding 0.5993 15.4250 0.7889 0.0888 0.6403 11.4379 2.0700 0.0732
contrac 0.8070 17.0902 1.0402 0.2032 0.9466 32.2308 1.6210 0.3612
credit-approval 0.0884 3.3101 0.1561 0.0596 0.1697 0.4281 0.2311 0.1338
cylinder-bands 0.0487 4.3855 0.0482 0.0382 0.0887 0.2398 0.1274 0.0266
dermatology 0.1315 2.1967 0.0689 0.0916 0.0870 0.4126 0.2594 0.0129
echocardiogram 0.0058 0.0646 0.0054 0.0147 0.0059 0.0708 0.0440 0.0051
ecoli 0.0751 2.0166 0.1132 0.0283 0.1065 0.9302 0.3958 0.0124
energy-y1 0.1709 5.6710 0.2506 0.1099 0.1742 2.9909 0.3892 0.2076
energy-y2 0.1620 5.5776 0.2259 0.0820 0.2345 1.1859 0.4503 0.0786
fertility 0.0044 0.0328 0.0044 0.0128 0.0059 0.0271 0.0568 0.0264
flags 0.0314 0.5832 0.0961 0.0148 0.0397 0.1773 0.1897 0.0112
glass 0.0245 0.5198 0.0229 0.0150 0.0432 0.4832 0.1756 0.0053
haberman-survival 0.0158 0.2859 0.0155 0.0194 0.0164 0.3252 0.1313 0.0096
hayes-roth 0.0100 0.3890 0.0106 0.0189 0.0137 0.1103 0.0763 0.0039
heart-cleveland 0.0390 1.2811 0.0348 0.0264 0.0489 0.5683 0.1989 0.0168
heart-hungarian 0.0142 0.5106 0.0140 0.0450 0.0185 0.1827 0.0732 0.0137
heart-switzerland 0.0101 0.2188 0.0133 0.0159 0.0368 0.0886 0.0828 0.0081
heart-va 0.0198 0.5101 0.0267 0.0304 0.0219 0.3522 0.2045 0.0064
hepatitis 0.0067 0.1055 0.0151 0.0139 0.0072 0.0496 0.0631 0.0039
hill-valley 0.1623 4.7513 0.2599 0.1048 0.2243 0.8187 0.2973 0.2008
horse-colic 0.0579 1.0534 0.0274 0.0290 0.0401 0.0797 0.1556 0.0141
ilpd-indian-liver 0.1184 1.5416 0.0615 0.0456 0.0601 1.0810 0.1632 0.1038
image-segmentation 0.1606 7.1914 0.4763 0.0876 0.5045 0.4171 0.3660 0.0241
ionosphere 0.0251 0.5386 0.0206 0.0239 0.0282 0.0968 0.0731 0.0169
iris 0.0080 0.2485 0.0076 0.0181 0.0551 0.1585 0.0637 0.0038
led-display 1.1272 22.0068 1.2954 0.1208 1.4207 11.6267 2.7106 0.2657
lenses 0.0025 0.0192 0.0037 0.0202 0.0030 0.0071 0.0514 0.0006
libras 0.2190 5.0433 0.2063 0.0338 0.2571 1.6482 0.8661 0.0141
low-res-spect 0.3521 7.2346 0.3484 0.0758 0.3577 1.0836 0.8815 0.0598
lung-cancer 0.0029 0.0061 0.0033 0.0215 0.0033 0.0073 0.0541 0.0007
Table 11: Training time (in seconds) of the 8 TWSVM variants, continued from Table 10.
Dataset ILSTSVM m LPTSVM m LSTSVM m pinTSVM m RELS-TSVM m TBSVM m TWSVM m WLTSVM m
lymphography 0.0108 0.4371 0.0439 0.0206 0.0121 0.0379 0.1242 0.0119
mammographic 0.2450 5.5149 0.2959 0.0944 0.1832 0.7785 0.4427 0.1512
molec-biol-promoter 0.0050 0.0293 0.0410 0.0218 0.0093 0.0130 0.0431 0.0035
molec-biol-splice 5.7153 616.4776 7.6826 1.1327 5.8671 17.0395 10.3194 2.5694
monks-1 0.0088 0.5434 0.0127 0.0203 0.0116 0.0207 0.0690 0.0049
monks-2 0.0116 0.3640 0.0143 0.0347 0.0157 0.1047 0.0735 0.0093
monks-3 0.0074 0.6456 0.0105 0.0234 0.0115 0.0332 0.0608 0.0064
musk-1 0.0503 1.2807 0.0538 0.0479 0.0526 0.1084 0.1253 0.0268
oocytes merluccius nucleus 4d 0.2280 6.9678 0.2587 0.1192 0.2559 6.8678 0.6275 0.1540
oocytes merluccius states 2f 0.4011 11.9716 0.4037 0.1528 0.3747 1.0210 1.0321 0.1372
oocytes trisopterus nucleus 2f 0.1661 4.4441 0.2069 0.1021 0.1760 1.5097 0.3527 0.1762
oocytes trisopterus states 5b 0.3013 9.4324 0.3281 0.0990 0.2702 1.2043 0.6508 0.1317
ozone 2.4402 65.2626 3.0110 1.8025 2.4534 43.0555 6.7500 1.5644
parkinsons 0.0088 0.4617 0.0084 0.0221 0.0096 0.0298 0.0752 0.0034
pima 0.1514 6.3792 0.1133 0.0715 0.1099 0.4495 0.3621 0.0666
pittsburg-bridges-MATERIAL 0.0062 0.0759 0.0057 0.0237 0.0069 0.0295 0.0654 0.0029
pittsburg-bridges-REL-L 0.0055 0.1070 0.0055 0.0231 0.0059 0.0287 0.0930 0.0024
pittsburg-bridges-SPAN 0.0282 0.2568 0.0089 0.0244 0.0061 0.0289 0.0837 0.0029
pittsburg-bridges-T-OR-D 0.0043 0.0455 0.0044 0.0237 0.0048 0.0218 0.0440 0.0028
pittsburg-bridges-TYPE 0.0097 0.2723 0.0238 0.0223 0.0138 0.0649 0.1499 0.0025
planning 0.0085 0.1060 0.0071 0.0254 0.0086 0.0877 0.0861 0.0685
primary-tumor 0.1901 3.2089 0.1568 0.0377 0.1767 0.9176 1.7308 0.0114
seeds 0.0129 0.4570 0.0544 0.0250 0.0136 0.0479 0.2573 0.0048
soybean 0.2723 10.8691 0.3824 0.0439 0.3483 1.0151 1.6679 0.0166
spect 0.0058 0.0908 0.0069 0.0239 0.0073 0.0126 0.0687 0.0031
spectf 0.0055 0.0865 0.0095 0.0240 0.0083 0.0151 0.0608 0.0046
statlog-australian-credit 0.1066 2.5125 0.1669 0.0560 0.0888 2.1936 0.5037 0.0504
statlog-german-credit 0.2577 18.5491 0.2260 0.1189 0.2264 1.4237 0.7356 0.1430
statlog-heart 0.4155 0.7680 0.0130 0.0240 0.0138 0.0336 0.6537 0.0066
statlog-image 5.4597 248.1873 6.7212 0.8627 5.8391 107.6823 16.1723 1.2836
statlog-vehicle 0.2838 12.8589 0.3178 0.1154 0.2891 5.5666 1.0063 0.0942
steel-plates 3.8250 113.8698 4.4728 0.5176 3.8220 63.9410 10.8581 0.6868
synthetic-control 0.3117 6.7058 0.2292 0.0935 0.2404 1.8294 0.8085 0.0449
teaching 0.0081 0.1407 0.0079 0.0218 0.0093 0.0734 0.1540 0.0125
thyroid 18.8930 1252.4298 25.6913 11.0068 20.1259 471.9164 92.1563 8.1767
tic-tac-toe 0.1825 7.2642 0.2024 0.1488 0.2229 1.1397 0.4431 0.1126
titanic 1.3967 20.6183 1.7539 0.4372 1.7101 9.5813 2.4346 0.8920
trains 0.0020 0.0042 0.0028 0.0220 0.0022 0.0042 0.0387 0.0005
vertebral-column-2clases 0.0261 0.7050 0.0145 0.0244 0.0165 0.0685 0.0890 0.0081
vertebral-column-3clases 0.0264 0.8039 0.0210 0.0286 0.0533 0.1170 0.1396 0.0080
wine 0.0761 0.5782 0.0096 0.0220 0.0138 0.0337 0.1395 0.0039
wine-quality-red 1.9174 55.3645 2.2870 0.4359 1.9982 40.9643 4.9816 0.4275
Average Time (in Seconds) 0.7093 38.3875 0.9019 0.2684 0.7564 11.6374 2.3408 0.2691
Maximum Time (in Seconds) 18.8930 1252.4298 25.6913 11.0068 20.1259 471.9164 92.1563 8.1767
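The last two rows of Tables 10 and 11 summarise the per-dataset training times. A minimal sketch of how such a summary can be produced is given below; the training function and dataset iterator are placeholders, not code released with this paper.

import time
import numpy as np

def training_times(train_fn, datasets):
    # train_fn(X, y) fits one classifier; datasets yields (X, y) pairs (both placeholders)
    times = []
    for X, y in datasets:
        t0 = time.perf_counter()
        train_fn(X, y)                          # wall-clock training only, no prediction
        times.append(time.perf_counter() - t0)
    return np.array(times)

# times = training_times(twsvm_fit, uci_datasets)   # hypothetical objects
# print("Average Time (s):", times.mean(), "Maximum Time (s):", times.max())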
Furthermore, most of the TWSVM variants have been developed for the stationary environment (batch learning); hence, considerable scope remains for developing TWSVM for the non-stationary environment (online learning). These variants can also be developed for various frameworks such as the LUPI framework and the graph-embedding framework. As the TWSVM variants emerge as a viable alternative for binary datasets, they need to be tested on applications where binary classification is required.
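As background to the FRank comparisons discussed above, recall the standard Friedman testing methodology described in [6] and [34]. With N datasets, k classifiers, and R_j denoting the average rank of classifier j, the usual statistics are

\chi^2_F = \frac{12N}{k(k+1)} \left[ \sum_{j=1}^{k} R_j^2 - \frac{k(k+1)^2}{4} \right], \qquad F_F = \frac{(N-1)\,\chi^2_F}{N(k-1) - \chi^2_F},

where F_F is the Iman-Davenport correction and follows an F-distribution with (k-1) and (k-1)(N-1) degrees of freedom under the null hypothesis of equal performance. These formulas are stated only as a reminder; the exact test configuration follows the cited works.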
Acknowledgement
This work is supported by the Science and Engineering Research Board (SERB), Government of India, under the Early Career Research Award Scheme (Grant No. ECR/2017/000053) and the Ramanujan Fellowship Scheme (Grant No. SB/S2/RJN-001/2016). We gratefully acknowledge the Indian Institute of Technology Indore for providing facilities and support.
References
[1] C. J. C. Burges, A tutorial on support vector machines for pattern recogni-
tion, Data Mining and Knowledge Discovery (2) (1998) 1-43.
[2] C. C. Chang, C. J. Lin, LIBSVM: a library for support vector machines,
ACM Transactions on Intelligent Systems and Technology (TIST) 2 (3)
(2011) 27.
[3] C. Cortes, V. N. Vapnik, Support vector networks, Machine Learning (20)
(1995) 273–297.
[4] N. Cristianini, J. Shawe-Taylor, An introduction to support vector machines
and other kernel based learning method, Cambridge University Press, Cam-
bridge, 2000.
[5] M. Tanveer, M. Mangal, I. Ahmad, Y.H. Shao, One norm linear program-
ming support vector regression, Neurocomputing (173) (2016) 1508-1518.
[6] J. Demsar, Statistical comparisons of classifiers over multiple data sets, Jour-
nal of Machine Learning Research (7) (2006) 1-30.
[7] G. H. Golub, C. F. V. Loan, Matrix Computations. Vol. 3 JHU Press, 2012.
[8] Jayadeva, R. Khemchandani, S. Chandra, Twin support vector machines for
pattern classification, IEEE Transactions on Pattern Analysis and Machine
Intelligence 29 (5) (2007) 905-910.
[9] M. A. Kumar, M. Gopal, Least squares twin support vector machines for
pattern classification, Expert Systems with Applications (36) (2009) 7535-
7543.
[10] O. L. Mangasarian, E. W. Wild, Multisurface proximal support vector clas-
sification via generalized eigenvalues, IEEE Transactions on Pattern Analy-
sis and Machine Intelligence 28 (1) (2006) 69-74.
[11] O. L. Mangasarian, Exact 1-norm support vector machines via unconstrained convex differentiable minimization, Journal of Machine Learning Research (7) (2006) 1517-1530.
[12] Y. H. Shao, C. H. Zhang, X. B. Wang, N. Y. Deng, Improvements on
twin support vector machines, IEEE Transactions on Neural Networks 22
(6) (2011) 962-968.
[13] V. N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[14] M. Tanveer, A. Tiwari, R. Choudhary, S. Jalan, Sparse pinball twin support
vector machines, Applied Soft Computing (78) (2019) 164-175.
[15] Y. Tian, Z. Qi, Review on: Twin Support Vector Machines, Annals of Data
Science 1 (2) (2014) 253-277.
[16] Y. Xu, W. Xi, X. Lv, R. Guo, An improved least squares twin support vec-
tor machine, Journal of information and computational science, 9(4) (2012)
1063-1071.
[17] H. Huang, X. Wei, Y. Zhou, Twin support vector machines: A survey, Neurocomputing (300) (2018) 34-43.
[18] M. Tanveer, Application of smoothing techniques for linear programming
twin support vector machines, Knowledge and Information Systems 45(1)
(2015) 191-214.
[19] Y. H. Shao, W. J. Chen, J. J. Zhang, Z. Wang, N. Y. Deng, An efficient
weighted Lagrangian twin support vector machine for imbalanced data clas-
sification, Pattern Recognition 47 (9) (2014) 3158-3167.
[20] M. Tanveer, M. A. Khan, S. S. Ho, Robust energy-based least squares twin
support vector machines, Applied Intelligence 45 (1) (2016) 174-186.
[21] A. N. Tikhonov, V. Y. Arsenin, Solutions of Ill-posed Problems, John Wiley & Sons, New York, 1977.
[22] M. Tanveer, Robust and sparse linear programming twin support vector
machines, Cognitive Computation, 7(1) (2015) 137-149.
[23] M. Fernandez-Delgado, E. Cernadas, S. Barro, D. Amorim, Do we need hundreds of classifiers to solve real world classification problems?, Journal of Machine Learning Research, 15 (1) (2014) 3133-3181.
[24] M. Lichman, UCI machine learning repository, (2013).
[25] J. Vanschoren, H. Blockeel, B. Pfahringer, G. Holmes, Experiment
databases, Machine Learning, 87 (2) (2012) 127-158.
[26] X. Peng, TPMSVM: a novel twin parametric-margin support vector ma-
chine for pattern recognition, Pattern Recognition, 44(10-11) (2011) 2678-
2692.
[27] Y. Xu, Z. Yang, X. Pan, A novel twin support-vector machine with pinball
loss, IEEE Transactions on Neural Networks and Learning Systems, 28 (2)
(2017) 359-370.
[28] Z. Wang, Y.-H. Shao, T. R. Wu, A GA-based model selection for smooth
twin parametric-margin support vector machine, Pattern Recognition, 46
(8) (2013) 2267-2277.
[29] M. Tanveer, Newton method for implicit Lagrangian twin support vector
machines, International Journal of Machine Learning and Cybernetics 6(6)
(2015) 1029-1040.
[30] N. Parastalooi, A. Amiri, P. Aliheidari, Modified twin support vector re-
gression, Neurocomputing, 211 (2016) 84-97.
[31] L. Zhang, P. N. Suganthan, Benchmarking ensemble classifiers with novel co-trained kernel ridge regression and random vector functional link ensembles, IEEE Computational Intelligence Magazine, 12 (4) (2017) 61-72.
[32] R. Rastogi, S. Sharma, Fast Laplacian twin support vector machine with
active learning for pattern classification, Applied Soft Computing 74 (2019)
424-439.
[33] B. Richhariya, M. Tanveer, EEG signal classification using universum sup-
port vector machine, Expert Systems with Applications 106 (2018) 169-182.
[34] R. L. Iman, J. M. Davenport, Approximations of the critical region of the Friedman statistic, Communications in Statistics - Theory and Methods, 9 (6) (1980) 571-595.