Comprehensive Evaluation of Twin SVM based
Classifiers on UCI Datasets
M. Tanveer^a,*, C. Gautam^b, P.N. Suganthan^c,*

^a Discipline of Mathematics, Indian Institute of Technology Indore, Simrol, Indore 453552, India
^b Discipline of Computer Science and Engineering, Indian Institute of Technology Indore, Simrol, Indore 453552, India
^c School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798
Abstract
In the past decade, twin support vector machine (TWSVM) based classifiers have received considerable attention from the research community. In this paper, we analyze the performance of 8 variants of TWSVM based classifiers along with the 179 classifiers evaluated in [23], from 17 different families, on 90 University of California Irvine (UCI) benchmark datasets from various domains. Results of these classifiers are exhaustively analyzed using various performance criteria. Statistical testing is performed using the Friedman Rank (FRank). Our experiments show that two least squares TWSVM based classifiers (ILSTSVM m and RELS-TSVM m) are the top two ranked methods among the 187 classifiers, and that they significantly outperform all other classifiers according to the Friedman Rank. Overall, this paper bridges the benchmarking gap between the various TWSVM variants and classifiers from other families. Code to reproduce the presented results and figures is provided on the authors' homepages.
Keywords: Benchmarking classifiers · Twin support vector machines · Least squares twin support vector machines · Support vector machines · Machine learning

*Corresponding author
Email addresses: mtanveer@iiti.ac.in (M. Tanveer), chandangautam31@gmail.com (C. Gautam), epnsugan@ntu.edu.sg (P.N. Suganthan)

Preprint submitted to Applied Soft Computing, Elsevier, July 18, 2019
1. Introduction
Among kernel-based methods, SVM has been well explored by researchers in the past, primarily in the context of pattern recognition [1, 2, 3, 4, 5]. Most work on SVM endeavors to maximize the margin between two parallel hyperplanes while minimizing the generalization error. In 2007, Jayadeva et al. [8] introduced the concept of non-parallel supporting hyperplanes, referred to as the twin support vector machine (TWSVM). It solves two smaller-sized quadratic programming problems (QPPs) instead of the single large QPP of traditional SVM, and shows better performance in both computational time and classification accuracy. Then, Kumar et al. [9] proposed the least squares TWSVM (LSTSVM), an extremely simple and fast algorithm for generating binary classifiers. In the last decade, the TWSVM formulation has attracted considerable attention from the research community for replacing the parallel hyperplanes of SVM with non-parallel ones [8]. Generally, fuzziness is embedded in SVM to handle such situations, which introduces extra complexity into the model; TWSVM, however, can handle them effectively without introducing further complexity. Various variants of TWSVM have been developed by researchers in the last decade [14, 16, 19, 26, 28, 29, 32, 33]. Shao et al. [12] added one more regularization term to TWSVM and proposed a new variant termed the twin bounded support vector machine (TBSVM); the formulation of TWSVM can be viewed as a special case of TBSVM. An improved LSTSVM (ILSTSVM) has also been proposed by Xu et al. [16] by introducing a regularization term. Later, the weighted Lagrangian twin support vector machine (WLTSVM) [19] was proposed for imbalanced data classification. Recently, two more variants, viz. the robust and sparse linear programming TWSVM (LPTSVM) [18, 22] and the robust energy-based LSTSVM (RELS-TSVM) [20], were proposed for classification problems. Most recently, the pinball loss-based TWSVM (pinTSVM) [27] was proposed, which takes the quantile distance into account and is robust to noisy samples. Apart from the above-discussed variants of
TSVM, researchers have also developed evolutionary algorithm-based TWSVM [28, 30], where evolutionary algorithms are employed to select optimal parameter values for TWSVM. We have selected 8 competitive variants from among the various existing TWSVM variants in the literature [15]. A brief description of these 8 variants is provided in Section 3. Further, we analyze the outcomes of the classifiers based on various performance criteria, viz. Friedman Rank (FRank), Average accuracy (Acc), Probability of Achieving the Maximum Accuracy (PAMA), probability of achieving more than 95% of the maximum accuracy (P95), and Percentage of the Maximum Accuracy (PMA) [23]. Before further discussion, we provide our motivation in the next section.
2. Motivation and contributions
The main focus of this paper is to analyze the performance of the TWSVM variants as well as existing classifiers on 90 datasets. Vanschoren et al. [25] provided a good analysis using 86 datasets and 93 classifiers in Weka. Recently, Fernandez et al. [23] performed exhaustive experiments on 121 UCI repository datasets with 179 classifiers from 17 different families and provided rankings of these classifiers on various binary and multi-class datasets.
They focused on the combined analysis of binary and multi-class classification and empirically showed that classifier performance depends on whether a dataset is binary or multi-class; however, their separate analysis of the binary datasets is very brief compared to that of the multi-class datasets. Most recently, Zhang and Suganthan [31] performed similar experiments with their proposed kernel ridge regression-based classifiers. Apart from this, the above-mentioned papers [25, 23, 31] did not consider TWSVM, a quite popular method of the last decade. TWSVM has exhibited very good performance in the literature [17]; therefore, it needs to be tested on the same experimental setup as used in [23]. Hence, taking a cue from [23], we provide in this paper a broad analysis of the 8 variants of TWSVM together with the 179 classifiers used in [23] over 90 UCI datasets (44 binary and 46 multi-class). These UCI datasets and their indices for training and testing have been taken from [23] and are listed, along with the detailed results, on this web page1. Moreover, for multi-class datasets we use the one-vs.-rest strategy, and analysis is provided separately for binary and multi-class datasets in this paper.
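The one-vs.-rest strategy mentioned above can be outlined as follows; the per-class scoring functions are a placeholder of our own (e.g. the negated distance to that class's proximal hyperplane), not the authors' exact decision rule:

```python
import numpy as np

def ovr_predict(X, class_scores):
    """One-vs.-rest prediction: `class_scores` is a list of per-class
    scoring functions f_k(X) (each from a binary classifier trained as
    class k vs. the rest); every row of X is assigned the class whose
    binary classifier scores it highest."""
    S = np.column_stack([f(X) for f in class_scores])
    return S.argmax(axis=1)  # index of the winning class
```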
The rest of the paper is organized as follows: Section 3 briefly discusses the eight variants of TWSVM. Section 4 provides the comparative analysis of the eight TWSVM based classifiers with the 179 classifiers, followed by the conclusion in the last section.
3. Variants of twin support vector machines
In this section, eight variants of TWSVM are discussed briefly. These variants can be divided into 3 categories. The first category contains three basic TWSVM variants, viz. TWSVM, TBSVM, and LPTSVM. The second category is based on weighted TWSVM, with the variants pinTSVM and WLTSVM. The third category contains three least squares variants, viz. LSTSVM, ILSTSVM, and RELS-TSVM. Out of the eight variants, RELS-TSVM [20] and ILSTSVM [16] emerge as the best classifiers among the 187 classifiers and yield the lowest FRank as well as the highest average accuracy.
3.1. Basic TWSVM variants
3.1.1. Twin support vector machine (TWSVM)
Let us denote all the data points in class $+1$ by a matrix $A \in \mathbb{R}^{m_1 \times n}$, where the $i$th data point is $A_i \in \mathbb{R}^n$, and let the matrix $B \in \mathbb{R}^{m_2 \times n}$ represent the data points of class $-1$. Unlike SVM, the linear TWSVM [8] seeks a pair of non-parallel
1http://people.iiti.ac.in/~phd1501101001/TSVM_JMLR_Binary_Multi.html
hyperplanes

$$f_1(x) = w_1^t x + b_1 \quad \text{and} \quad f_2(x) = w_2^t x + b_2 \tag{1}$$

such that each hyperplane is proximal to the data points of one class and far from the data points of the other class, where $w_1, w_2 \in \mathbb{R}^n$ and $b_1, b_2 \in \mathbb{R}$.
The formulation of TWSVM can be written as follows:

$$\min_{(w_1,b_1)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Aw_1+e_2b_1\|^2 + c_1\|\xi_1\|
\quad \text{s.t. } -(Bw_1+e_1b_1)+\xi_1 \ge e_1,\ \xi_1 \ge 0, \tag{2}$$

and

$$\min_{(w_2,b_2)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Bw_2+e_1b_2\|^2 + c_2\|\xi_2\|
\quad \text{s.t. } (Aw_2+e_2b_2)+\xi_2 \ge e_2,\ \xi_2 \ge 0, \tag{3}$$

respectively, where $c_1, c_2$ are positive parameters and $e_1, e_2$ are vectors of ones of appropriate dimensions. In order to derive the corresponding dual problems, TWSVM assumes that the matrices $G^tG$ and $H^tH$ are nonsingular, where $G=[A\ e_2]$ and $H=[B\ e_1]$ are augmented matrices of sizes $m_1\times(n+1)$ and $m_2\times(n+1)$, respectively. Under this condition, the dual problems are
$$\max_{\alpha\in\mathbb{R}^{m_2}} \ e_1^t\alpha - \frac{1}{2}\alpha^t H(G^tG)^{-1}H^t\alpha
\quad \text{s.t. } 0 \le \alpha \le c_1, \tag{4}$$

and

$$\max_{\gamma\in\mathbb{R}^{m_1}} \ e_2^t\gamma - \frac{1}{2}\gamma^t G(H^tH)^{-1}G^t\gamma
\quad \text{s.t. } 0 \le \gamma \le c_2, \tag{5}$$

respectively.
In the above optimization problems, $G^tG$ or $H^tH$ can be singular or ill-conditioned. To avoid these cases, the inverse matrices $(G^tG)^{-1}$ and $(H^tH)^{-1}$ are modified to $(G^tG+\delta I)^{-1}$ and $(H^tH+\delta I)^{-1}$, respectively, where $\delta$ is a very small positive scalar and $I$ is an identity matrix of appropriate dimensions. The dual problems then become:

$$\max_{\alpha\in\mathbb{R}^{m_2}} \ e_1^t\alpha - \frac{1}{2}\alpha^t H(G^tG+\delta I)^{-1}H^t\alpha
\quad \text{s.t. } 0 \le \alpha \le c_1, \tag{6}$$

and

$$\max_{\gamma\in\mathbb{R}^{m_1}} \ e_2^t\gamma - \frac{1}{2}\gamma^t G(H^tH+\delta I)^{-1}G^t\gamma
\quad \text{s.t. } 0 \le \gamma \le c_2, \tag{7}$$

respectively.
Thus, we obtain the solutions of the above problems as follows:

$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = -(G^tG+\delta I)^{-1}H^t\alpha
\quad \text{and} \quad
\begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = (H^tH+\delta I)^{-1}G^t\gamma. \tag{8}$$

The dual problems in Eqns. (6) and (7) are derived and solved in [8]. Experimental results show that the performance of TWSVM is better than that of the conventional SVM and the generalized eigenvalue proximal SVM (GEPSVM) [10].
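As a concrete illustration of the regularized duals (6)-(7) and the solution (8), the sketch below solves each box-constrained QP with a simple projected-gradient ascent and recovers the two hyperplanes; this is a minimal numpy sketch under our own solver choice, not the MATLAB implementation evaluated in this paper:

```python
import numpy as np

def solve_box_qp(K, c, iters=3000):
    """Maximize e'a - 0.5 a'Ka subject to 0 <= a <= c by projected
    gradient ascent (step 1/L, where L is the spectral norm of K)."""
    step = 1.0 / (np.linalg.norm(K, 2) + 1e-12)
    a = np.full(K.shape[0], c / 2.0)
    for _ in range(iters):
        a = np.clip(a + step * (1.0 - K @ a), 0.0, c)
    return a

def twsvm_train(A, B, c1=1.0, c2=1.0, delta=1e-4):
    """Linear TWSVM via Eqns. (6)-(8), with G = [A e] and H = [B e]."""
    G = np.hstack([A, np.ones((A.shape[0], 1))])
    H = np.hstack([B, np.ones((B.shape[0], 1))])
    I = np.eye(G.shape[1])
    GG = np.linalg.inv(G.T @ G + delta * I)
    HH = np.linalg.inv(H.T @ H + delta * I)
    alpha = solve_box_qp(H @ GG @ H.T, c1)  # dual (6)
    gamma = solve_box_qp(G @ HH @ G.T, c2)  # dual (7)
    u1 = -GG @ H.T @ alpha                  # [w1; b1], Eqn. (8)
    u2 = HH @ G.T @ gamma                   # [w2; b2], Eqn. (8)
    return u1, u2

def twsvm_predict(X, u1, u2):
    """Assign each row of X to the class of the nearer hyperplane
    (the nearest-plane rule, cf. Eqn. (16))."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    d1 = np.abs(Xa @ u1) / np.linalg.norm(u1[:-1])
    d2 = np.abs(Xa @ u2) / np.linalg.norm(u2[:-1])
    return np.where(d1 <= d2, 1, -1)
```

The projected-gradient solver is deliberately simple; any off-the-shelf box-constrained QP solver can be substituted.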
3.1.2. Twin bounded support vector machine (TBSVM)
It is well known that the implementation of the structural risk minimization principle is one of the significant advantages of SVM. However, the primal problems of TWSVM implement only empirical risk minimization. In addition, TWSVM assumes the existence of the inverse matrices $(G^tG)^{-1}$ and $(H^tH)^{-1}$, a requirement that cannot always be satisfied. Shao et al. [12] proposed an improved and more efficient algorithm termed the twin bounded support vector machine (TBSVM). The formulation of TBSVM implements the structural risk minimization principle by including one more regularization term in each objective of TWSVM, and its dual formulation can be derived without the additional nonsingularity requirement. Thus, the formulation of TBSVM is theoretically superior to that of TWSVM [12].
The linear TBSVM [12] seeks a pair of non-parallel proximal hyperplanes

$$f_1(x) = w_1^t x + b_1 = 0 \quad \text{and} \quad f_2(x) = w_2^t x + b_2 = 0 \tag{9}$$

by solving the following primal problems

$$\min_{(w_1,b_1)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Aw_1+e_2b_1\|^2 + c_1\|\xi_1\| + \frac{c_3}{2}\left\|\begin{bmatrix} w_1 \\ b_1 \end{bmatrix}\right\|^2
\quad \text{s.t. } -(Bw_1+e_1b_1)+\xi_1 \ge e_1,\ \xi_1 \ge 0, \tag{10}$$

and

$$\min_{(w_2,b_2)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Bw_2+e_1b_2\|^2 + c_2\|\xi_2\| + \frac{c_4}{2}\left\|\begin{bmatrix} w_2 \\ b_2 \end{bmatrix}\right\|^2
\quad \text{s.t. } (Aw_2+e_2b_2)+\xi_2 \ge e_2,\ \xi_2 \ge 0, \tag{11}$$

respectively, where $c_i,\ i=1,2,3,4$ are the penalty parameters, $e_1$ and $e_2$ are vectors of ones of appropriate dimensions, and $\xi_1$ and $\xi_2$ are slack variables of appropriate dimensions. Their corresponding Lagrange dual problems are
$$\max_{\alpha} \ e_2^t\alpha - \frac{1}{2}\alpha^t G(H^tH+c_3 I)^{-1}G^t\alpha
\quad \text{s.t. } 0 \le \alpha \le c_1, \tag{12}$$

$$\max_{\gamma} \ e_1^t\gamma - \frac{1}{2}\gamma^t H(G^tG+c_4 I)^{-1}H^t\gamma
\quad \text{s.t. } 0 \le \gamma \le c_2, \tag{13}$$

where $\alpha$ and $\gamma$ are Lagrange multipliers, $G=[B\ e_1]$ and $H=[A\ e_2]$. The solutions of the problems in Eqns. (10) and (11) are obtained by

$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = -(H^tH+c_3 I)^{-1}G^t\alpha \tag{14}$$

and

$$\begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = (G^tG+c_4 I)^{-1}H^t\gamma. \tag{15}$$
Once the solutions of the problems in Eqns. (12) and (13) are obtained, a new point $x\in\mathbb{R}^n$ is assigned to class $i$ ($i=+1,-1$) depending on which of the two hyperplanes in (9) it is closer to:

$$\text{Class } i = \arg\min_{k=1,2} \ \frac{|w_k^T x + b_k|}{\|w_k\|}, \tag{16}$$

where $|\cdot|$ is the absolute value.
3.1.3. Linear programming twin support vector machines (LPTSVM)
TWSVM and TBSVM are not capable of generating sparse solutions. To overcome this issue, the robust and sparse linear programming twin support vector machine (LPTSVM) [22] was proposed. The solution of LPTSVM is obtained by solving a pair of dual exterior penalty problems as unconstrained optimization problems using the Newton method. Unlike the two QPPs of TWSVM and TBSVM, the unconstrained optimization problems of LPTSVM are reduced to solving two systems of linear equations, which leads to an extremely fast and efficient algorithm.
The formulation of LPTSVM [22] can be expressed as follows:

$$\min_{(w_1,b_1)\in\mathbb{R}^{n+1}} \ \|Aw_1+e_2b_1\|_1 + c_1\|\xi_1\|_1 + c_3\left\|\begin{bmatrix} w_1 \\ b_1 \end{bmatrix}\right\|_1
\quad \text{s.t. } -(Bw_1+e_1b_1)+\xi_1 \ge e_1,\ \xi_1 \ge 0, \tag{17}$$

$$\min_{(w_2,b_2)\in\mathbb{R}^{n+1}} \ \|Bw_2+e_1b_2\|_1 + c_2\|\xi_2\|_1 + c_4\left\|\begin{bmatrix} w_2 \\ b_2 \end{bmatrix}\right\|_1
\quad \text{s.t. } (Aw_2+e_2b_2)+\xi_2 \ge e_2,\ \xi_2 \ge 0, \tag{18}$$

where $A$ and $B$ are matrices of sizes $m_1\times n$ and $m_2\times n$, respectively; $c_i,\ i=1,2,3,4$ are the penalty parameters; and $e_1$ and $e_2$ are the vectors of ones of sizes $m_1$ and $m_2$, respectively.
Following the approach of [11], we obtain the solutions of the 1-norm TWSVM in Eqns. (17) and (18) by converting them into a pair of linear programming problems (LPPs) in the primal and solving the exterior penalty functions of their duals for a finite value of a penalty parameter $\theta$.

Let $G=[A\ e_2]$ and $H=[B\ e_1]$ be two augmented matrices of sizes $m_1\times(n+1)$ and $m_2\times(n+1)$, respectively. Then, by setting

$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = p_1 - q_1, \quad G(p_1-q_1) = r_1 - s_1, \qquad
\begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = p_2 - q_2, \quad H(p_2-q_2) = r_2 - s_2, \tag{19}$$

where $p_1, q_1, p_2, q_2 \in \mathbb{R}^{n+1}$, $r_1, s_1 \in \mathbb{R}^{m_1}$ and $r_2, s_2 \in \mathbb{R}^{m_2}$ satisfy the non-negativity constraints

$$p_1, q_1, p_2, q_2, r_1, s_1, r_2, s_2 \ge 0,$$

the above pair of problems in Eqns. (17) and (18) can be converted into the following pair of linear programming twin support vector machine (LPTSVM) problems:
$$\min_{r_1,s_1\in\mathbb{R}^{m_1},\ p_1,q_1\in\mathbb{R}^{n+1},\ \xi_1\in\mathbb{R}^{m_2}} \ e_1^t(r_1+s_1) + c_1 e_2^t\xi_1 + c_3 e^t(p_1+q_1)
\quad \text{s.t. } -H(p_1-q_1)+\xi_1 \ge e_2, \quad G(p_1-q_1)-(r_1-s_1)=0, \quad p_1,q_1,r_1,s_1,\xi_1 \ge 0, \tag{20}$$

and

$$\min_{r_2,s_2\in\mathbb{R}^{m_2},\ p_2,q_2\in\mathbb{R}^{n+1},\ \xi_2\in\mathbb{R}^{m_1}} \ e_2^t(r_2+s_2) + c_2 e_1^t\xi_2 + c_4 e^t(p_2+q_2)
\quad \text{s.t. } G(p_2-q_2)+\xi_2 \ge e_1, \quad H(p_2-q_2)-(r_2-s_2)=0, \quad p_2,q_2,r_2,s_2,\xi_2 \ge 0, \tag{21}$$

respectively, where $e$ is the vector of ones of size $(n+1)$.
3.2. Weighted TWSVM variants
3.2.1. Weighted Lagrangian twin support vector machine (WLTSVM)
The above-discussed TWSVM variants do not handle the issue of imbalanced data. The weighted Lagrangian twin support vector machine (WLTSVM) [19] was developed for handling imbalanced data. It uses a graph-based under-sampling strategy, which provides robustness against outliers, and it embeds weight biases in the Lagrangian TWSVM to enable the algorithm to handle imbalanced data.
The primal problems of WLTSVM can be written as

$$\min_{(w_1,b_1,\xi_1)} \ \frac{1}{2}\big(\|w_1\|^2+b_1^2\big) + \frac{c_1}{2}\Big((Aw_1+e_2b_1)^t(Aw_1+e_2b_1) + \xi_1^t D_2 \xi_1\Big)
\quad \text{s.t. } -(B_2w_1+e_1b_1)+\xi_1 \ge e_1,\ \xi_1 \ge 0, \tag{22}$$

and

$$\min_{(w_2,b_2,\xi_2)} \ \frac{1}{2}\big(\|w_2\|^2+b_2^2\big) + \frac{c_2}{2}\Big((B_1w_2+e_1b_2)^t D_1 (B_1w_2+e_1b_2) + \xi_2^t \xi_2\Big)
\quad \text{s.t. } (Aw_2+e_2b_2)+\xi_2 \ge e_2,\ \xi_2 \ge 0, \tag{23}$$

where $c_i,\ i=1,2$ are the penalty parameters, $e_1$ and $e_2$ are vectors of ones with appropriate dimensions, $\xi_1$ and $\xi_2$ are slack variables of appropriate dimensions, $B_1$ and $B_2$ are under-sampled training sets, and $D_1$ and $D_2$ are weight matrices used to determine the minority and majority planes, respectively, in the case of imbalanced data.
The dual forms of Eqns. (22) and (23) are:

$$\max_{\alpha} \ -\frac{1}{2}\alpha^t\Big(R_1\big(S^tS+c_1I\big)^{-1}R_1^t + \frac{1}{c_1}D_2^{-1}\Big)\alpha + e_2^t\alpha
\quad \text{s.t. } \alpha \ge 0, \tag{24}$$

$$\max_{\gamma} \ -\frac{1}{2}\gamma^t\Big(S\big(R_2^tR_2+c_2I\big)^{-1}S^t + \frac{1}{c_2}D_1^{-1}\Big)\gamma + e_1^t\gamma
\quad \text{s.t. } \gamma \ge 0, \tag{25}$$

where $S=[A\ e_2]$, $R_1=[B_2\ e_1]$, $R_2=[B_1\ e_2]$, and $\alpha, \gamma$ are Lagrangian multipliers.

Similar to the earlier subsections, the solutions $(w_1, b_1)$ and $(w_2, b_2)$ can be obtained by solving Eqns. (24) and (25).
3.2.2. Pinball loss based twin support vector machine (pinTSVM)
The twin support vector machine (TWSVM) [8], twin bounded SVM (TBSVM) [12] and twin parametric-margin support vector machine (TPMSVM) [26] are efficient classifiers, but they are sensitive to noise. To overcome this noise sensitivity and further enhance generalization ability, Xu et al. [27] introduced the pinball loss into TPMSVM and proposed the twin support vector machine with pinball loss (pinTSVM), especially for noise-corrupted data.
Let the numbers of data points belonging to classes $+1$ and $-1$ be $\ell_1$ and $\ell_2$, respectively, in the $n$-dimensional real space $\mathbb{R}^n$. The nonlinear pinTSVM seeks two kernel-generated surfaces defined as follows:

$$K(x^T, D^T)u_+ + b_+ = 0 \quad \text{and} \quad K(x^T, D^T)u_- + b_- = 0,$$

where $D=[A;B]$, $u_+, u_- \in \mathbb{R}^n$, and $K$ is an arbitrary kernel function. The nonlinear pinTSVM formulation can be expressed as follows:
$$\min_{u_+,\, b_+,\, \xi_1} \ \frac{1}{2}\|u_+\|^2 + \frac{\nu_1}{\ell_2} e_2^T\big(K(B,D^T)u_+ + e_2 b_+\big) + \frac{c_1}{\ell_1} e_1^T \xi_1
\quad \text{s.t. } K(A,D^T)u_+ + e_1 b_+ \ge -\xi_1, \quad K(A,D^T)u_+ + e_1 b_+ \le \frac{\xi_1}{\tau_1}, \tag{26}$$

and

$$\min_{u_-,\, b_-,\, \xi_2} \ \frac{1}{2}\|u_-\|^2 - \frac{\nu_2}{\ell_1} e_1^T\big(K(A,D^T)u_- + e_1 b_-\big) + \frac{c_2}{\ell_2} e_2^T \xi_2
\quad \text{s.t. } -\big(K(B,D^T)u_- + e_2 b_-\big) \ge -\xi_2, \quad -\big(K(B,D^T)u_- + e_2 b_-\big) \le \frac{\xi_2}{\tau_2}, \tag{27}$$

where $c_1, c_2$ are positive parameters, $\nu_1, \nu_2 > 0$ are margin parameters, and $\tau_1, \tau_2 \in [0,1]$ are pinball loss function parameters. When $\tau_1$ and $\tau_2$ are zero, the QPPs in Eqns. (26) and (27) reduce to the QPPs of TPMSVM. By introducing the Lagrange function and using the Karush-Kuhn-Tucker (KKT) optimality conditions, we obtain the dual formulations of the QPPs in Eqns. (26) and (27) as follows:
$$\max_{\alpha,\, \beta} \ \frac{\nu_1}{\ell_2} e_2^T K(B,A)^T(\alpha-\beta) - \frac{1}{2}(\alpha-\beta)^T K(A,A)^T(\alpha-\beta)
\quad \text{s.t. } e_1^T(\alpha-\beta) = \nu_1, \quad \alpha + \frac{\beta}{\tau_1} = \frac{c_1}{\ell_1} e_1, \quad \alpha \ge 0,\ \beta \ge 0, \tag{28}$$

and

$$\max_{\gamma,\, \sigma} \ \frac{\nu_2}{\ell_1} e_1^T K(A,B)^T(\gamma-\sigma) - \frac{1}{2}(\gamma-\sigma)^T K(B,B)^T(\gamma-\sigma)
\quad \text{s.t. } e_2^T(\gamma-\sigma) = \nu_2, \quad \gamma + \frac{\sigma}{\tau_2} = \frac{c_2}{\ell_2} e_2, \quad \gamma \ge 0,\ \sigma \ge 0, \tag{29}$$

where $\alpha$, $\beta$, $\gamma$ and $\sigma$ are Lagrange multipliers.
After optimizing the QPPs in Eqns. (28) and (29), we obtain $u_+$ and $u_-$ as follows:

$$u_+ = K(A,D^T)^T(\alpha-\beta) - \frac{\nu_1}{\ell_2}K(B,D^T)^T e_2$$

and

$$u_- = -K(B,D^T)^T(\gamma-\sigma) + \frac{\nu_2}{\ell_1}K(A,D^T)^T e_1.$$

The value of the bias term $b_+$ is given by

$$O_+ = \{i : \alpha_i > 0 \text{ and } \beta_i > 0\}, \quad b_+ = -\frac{1}{|O_+|}\sum_{i\in O_+} K(x_i^T, D^T)u_+.$$

Similarly, the value of the bias term $b_-$ is given by

$$O_- = \{i : \gamma_i > 0 \text{ and } \sigma_i > 0\}, \quad b_- = -\frac{1}{|O_-|}\sum_{i\in O_-} K(x_i^T, D^T)u_-.$$
A new data point $x\in\mathbb{R}^n$ is assigned to class $i$ ($i=+1,-1$) depending on which of the kernel-generated surfaces is closer to $x$, i.e.,

$$\text{class}(i) = \operatorname{sign}\left(\frac{K(x^T,D^T)u_+ + b_+}{\|u_+\|} + \frac{K(x^T,D^T)u_- + b_-}{\|u_-\|}\right),$$

where $\operatorname{sign}(\cdot)$ is the signum function.
3.3. Least squares TWSVM variants
3.3.1. Least squares twin support vector machine (LSTSVM)
The formulation of the least squares twin support vector machine (LSTSVM) [9] simply solves systems of linear equations, as opposed to the QPPs of TWSVM, TBSVM and pinTSVM. Therefore, it is a simple and fast algorithm. To derive the primal problems of LSTSVM, the inequality constraints of TWSVM are replaced by equality constraints and the 1-norm on the slack variables is replaced by the 2-norm. Thus, the primal problems of LSTSVM [9] can be expressed as
$$\min_{(w_1,b_1)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Aw_1+e_2b_1\|^2 + \frac{c_1}{2}\|\xi_1\|^2
\quad \text{s.t. } -(Bw_1+e_1b_1)+\xi_1 = e_1, \tag{30}$$

$$\min_{(w_2,b_2)\in\mathbb{R}^{n+1}} \ \frac{1}{2}\|Bw_2+e_1b_2\|^2 + \frac{c_2}{2}\|\xi_2\|^2
\quad \text{s.t. } (Aw_2+e_2b_2)+\xi_2 = e_2. \tag{31}$$
The solution of linear LSTSVM is obtained by computing the inverses of two matrices [9], and can be expressed in the form of two nonparallel hyperplanes as follows:

$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = -\big(c_1 Q^tQ + P^tP\big)^{-1} c_1 Q^t e_1, \tag{32}$$

$$\begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = \big(c_2 P^tP + Q^tQ\big)^{-1} c_2 P^t e_2, \tag{33}$$

where $c_1$ and $c_2$ are positive penalty parameters, $P=[A\ e_2]$ and $Q=[B\ e_1]$.

Here, we can see that both TWSVM and LSTSVM minimize only the empirical risk, and the matrices in Eqns. (32) and (33) may be singular.
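A minimal numpy sketch of the closed-form training in Eqns. (32)-(33) follows; the small ridge term `eps*I` is our own addition to guard against the singularity noted above, and is not part of the original formulation:

```python
import numpy as np

def lstsvm_train(A, B, c1=1.0, c2=1.0, eps=1e-8):
    # Augmented matrices P = [A e2], Q = [B e1]
    P = np.hstack([A, np.ones((A.shape[0], 1))])
    Q = np.hstack([B, np.ones((B.shape[0], 1))])
    I = np.eye(P.shape[1])
    e1 = np.ones(Q.shape[0])
    e2 = np.ones(P.shape[0])
    # Eqns. (32) and (33); eps*I guards against singular matrices
    u1 = -np.linalg.solve(c1 * Q.T @ Q + P.T @ P + eps * I, c1 * Q.T @ e1)
    u2 = np.linalg.solve(c2 * P.T @ P + Q.T @ Q + eps * I, c2 * P.T @ e2)
    return u1, u2  # each stacks [w; b]

def lstsvm_predict(X, u1, u2):
    # Assign to the class whose hyperplane is nearer (perpendicular distance)
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    d1 = np.abs(Xa @ u1) / np.linalg.norm(u1[:-1])
    d2 = np.abs(Xa @ u2) / np.linalg.norm(u2[:-1])
    return np.where(d1 <= d2, 1, -1)
```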
3.3.2. Improved least squares twin support vector machine (ILSTSVM)
The least squares twin support vector machine (LSTSVM) implements only the empirical risk minimization principle, which reduces its generalization performance. To overcome this drawback, Xu et al. [16] proposed an improved version of LSTSVM by introducing an extra regularization term into each objective function. This improvement implements the structural risk minimization principle and yields better generalization performance compared to LSTSVM.
The primal problems of ILSTSVM [16] can be written as

$$\min_{(w_1,b_1,\xi_1)} \ \frac{1}{2}\|Aw_1+e_2b_1\|^2 + \frac{c_1}{2}\xi_1^t\xi_1 + \frac{c_3}{2}\left\|\begin{bmatrix} w_1 \\ b_1 \end{bmatrix}\right\|^2
\quad \text{s.t. } -(Bw_1+e_1b_1)+\xi_1 = e_1, \tag{34}$$

$$\min_{(w_2,b_2,\xi_2)} \ \frac{1}{2}\|Bw_2+e_1b_2\|^2 + \frac{c_2}{2}\xi_2^t\xi_2 + \frac{c_4}{2}\left\|\begin{bmatrix} w_2 \\ b_2 \end{bmatrix}\right\|^2
\quad \text{s.t. } (Aw_2+e_2b_2)+\xi_2 = e_2, \tag{35}$$

where $c_i,\ i=1,2,3,4$ are the penalty parameters, $e_1$ and $e_2$ are vectors of ones of appropriate dimensions, and $\xi_1$ and $\xi_2$ are slack variables of appropriate dimensions.
The solution of linear ILSTSVM is obtained by computing the inverses of two matrices [16], and can be expressed in the form of two nonparallel hyperplanes as follows:

$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = -\Big[G^tG + \frac{1}{c_1}H^tH + \frac{c_3}{c_1}I\Big]^{-1} G^t e_1, \tag{36}$$

$$\begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = \Big[H^tH + \frac{1}{c_2}G^tG + \frac{c_4}{c_2}I\Big]^{-1} H^t e_2, \tag{37}$$

where, as in Section 3.1.2, $G=[B\ e_1]$ and $H=[A\ e_2]$.
Note that the solutions of the primal problems in Eqns. (34) and (35) are obtained directly by solving two systems of linear equations instead of the two QPPs of TBSVM, which implies that ILSTSVM is faster than TBSVM.
3.3.3. Robust energy-based least squares twin support vector machines (RELS-TSVM)
By introducing an energy parameter for each hyperplane and an extra regularization term in each objective function, Tanveer et al. [20] recently presented the robust energy-based least squares twin support vector machine (RELS-TSVM) algorithm for classification problems. This algorithm is not only robust to noise and outliers but also more stable.
The primal problems of RELS-TSVM can be expressed as follows:

$$\min_{(w_1,b_1,\xi_1)} \ \frac{1}{2}\|Aw_1+e_2b_1\|^2 + \frac{c_1}{2}\xi_1^t\xi_1 + \frac{c_3}{2}\left\|\begin{bmatrix} w_1 \\ b_1 \end{bmatrix}\right\|^2
\quad \text{s.t. } -(Bw_1+e_1b_1)+\xi_1 = E_1, \tag{38}$$

$$\min_{(w_2,b_2,\xi_2)} \ \frac{1}{2}\|Bw_2+e_1b_2\|^2 + \frac{c_2}{2}\xi_2^t\xi_2 + \frac{c_4}{2}\left\|\begin{bmatrix} w_2 \\ b_2 \end{bmatrix}\right\|^2
\quad \text{s.t. } (Aw_2+e_2b_2)+\xi_2 = E_2, \tag{39}$$

where $c_i,\ i=1,2,3,4$ are the penalty parameters, $E_1$ and $E_2$ are energy parameters of the hyperplanes, and $\xi_1$ and $\xi_2$ are slack variables of appropriate dimensions.
One can obtain the solutions of the problems in Eqns. (38) and (39) as

$$z_1 = -\big(c_1 N^tN + M^tM + c_3 I\big)^{-1} c_1 N^t E_1 \tag{40}$$

$$z_2 = \big(c_2 M^tM + N^tN + c_4 I\big)^{-1} c_2 M^t E_2 \tag{41}$$

respectively, where $N=[B\ e_1]$ and $M=[A\ e_2]$.
It should be pointed out that both $(c_1N^tN + M^tM + c_3I)$ and $(c_2M^tM + N^tN + c_4I)$ are positive definite matrices owing to the extra regularization terms, which provide additional robustness and stability to the algorithm. We also note that RELS-TSVM is not affected by matrix singularity. The parameters $c_3$ and $c_4$ used in the formulation are penalty parameters rather than perturbation terms. Once the training of RELS-TSVM is complete, the class of an unknown data point $x_i$ is assigned based on the following decision function:
$$f(x_i) = \begin{cases} +1 & \text{if } \left|\dfrac{x_i w_1 + e b_1}{x_i w_2 + e b_2}\right| \le 1, \\[1ex] -1 & \text{if } \left|\dfrac{x_i w_1 + e b_1}{x_i w_2 + e b_2}\right| > 1, \end{cases} \tag{42}$$

where $|\cdot|$ is the absolute value.
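Since RELS-TSVM tops the rankings reported later, a minimal numpy sketch of its closed-form training (Eqns. (40)-(41)) and decision rule (42) may be useful; the default parameter values and the tiny denominator guard are our illustrative assumptions, not tuned settings from the paper:

```python
import numpy as np

def rels_tsvm_train(A, B, c1=1.0, c2=1.0, c3=0.1, c4=0.1, E1=1.0, E2=1.0):
    """Closed-form RELS-TSVM solution, Eqns. (40)-(41): M = [A e], N = [B e]."""
    M = np.hstack([A, np.ones((A.shape[0], 1))])
    N = np.hstack([B, np.ones((B.shape[0], 1))])
    I = np.eye(M.shape[1])
    E1v = E1 * np.ones(N.shape[0])  # energy of the first hyperplane
    E2v = E2 * np.ones(M.shape[0])  # energy of the second hyperplane
    z1 = -np.linalg.solve(c1 * N.T @ N + M.T @ M + c3 * I, c1 * N.T @ E1v)
    z2 = np.linalg.solve(c2 * M.T @ M + N.T @ N + c4 * I, c2 * M.T @ E2v)
    return z1, z2  # each stacks [w; b]

def rels_tsvm_predict(X, z1, z2):
    """Decision function (42): ratio of the two (unnormalized) plane values."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    ratio = np.abs(Xa @ z1) / (np.abs(Xa @ z2) + 1e-12)  # guard against /0
    return np.where(ratio <= 1.0, 1, -1)
```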
4. Numerical experiments
All 90 datasets are taken from the UCI repository [24]. Out of the 90 datasets, 44 are binary and 46 are multi-class. The names of these datasets2 are available at http://people.iiti.ac.in/~phd1501101001/TSVM_Binary_JMLR/results-Binary_Final.xlsx and http://people.iiti.ac.in/~phd1501101001/TSVM_Binary_JMLR/results-Multi_Final.xlsx. We have performed Z-score normalization on all the datasets, as done in [23]. Our experimental setup is identical to that in [23] and consists of two steps. In the first step, one training and one testing set are generated by randomly dividing the dataset into two equal parts; the parameters of each classifier are tuned, and the best-performing parameters on the testing set are selected as the optimal parameters. The indices for this random division of the datasets have been taken directly from [23]; the authors have provided all indices at http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/data.tar.gz. In the second step, experiments have been performed in two ways using the optimal parameters for final training and testing, as follows:

2The whole dataset collection and partitions are available from: http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/data.tar.gz
(i) If the dataset is not originally (as provided by the creator of the dataset) available in two sets, i.e. training and testing sets, then 4-fold cross validation is performed on the whole dataset. We have used the same indices as in [23] for the training and testing sets of each fold, for all the classifiers used in this paper. The final result is the average result over the 4 folds.

(ii) If the dataset is originally (as provided by the creator of the dataset) available in two sets, i.e. training and testing sets (like hill-valley, horse-colic, monks-1, spectf, etc.), then we train and test the classifiers on the respective partitions using the optimal parameters obtained in the first step.
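The per-attribute Z-score normalization applied above can be sketched as follows; whether the statistics are computed on the training partition (as here) or on the whole dataset is a convention we leave open, since [23] only specifies standardization to zero mean and unit variance:

```python
import numpy as np

def zscore(train, test):
    # Standardize each attribute to zero mean and unit variance.
    # Statistics are computed on the training partition here (one common
    # convention; computing them on the whole dataset is also used).
    mu = train.mean(axis=0)
    sd = train.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant attributes unchanged
    return (train - mu) / sd, (test - mu) / sd
```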
All partitions and indices used in our experiments are the same as those used in [23]. The eight variants of TWSVM, viz. TWSVM, TBSVM, LPTSVM, pinTSVM, WLTSVM, LSTSVM, ILSTSVM, and RELS-TSVM, are discussed in the previous section. We have followed the same naming convention for the classifiers as in [23]. All experiments have been conducted in MATLAB 2016a under Windows 7 (64-bit) with 64 GB RAM and a 3.00 GHz Intel Xeon processor. The names of the 8 variants of TWSVM are appended with 'm' ('m' stands for MATLAB): TBSVM m, TWSVM m, LPTSVM m, pinTSVM m, WLTSVM m, LSTSVM m, ILSTSVM m, and RELS-TSVM m. Note that from now onward, TWSVM denotes the twin SVM family collectively, while TWSVM m denotes the basic variant of the twin SVM family. These variants of TWSVM use the Gaussian kernel, which has one parameter, σ. The ranges of all parameters of the TWSVM variants are provided in Table 1.
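The Gaussian kernel matrix used by these variants can be computed as below; the exact parameterization varies across papers, so the form $K(x,y)=\exp(-\|x-y\|^2/(2\sigma^2))$ here is one common convention, not necessarily the one in the authors' code:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Pairwise Gaussian (RBF) kernel matrix:
    K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))
```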
4.1. Comparison of the 8 TWSVM variants with the 179 classifiers from [23] on binary class datasets

In this section, we discuss performance only on the binary datasets; the multi-class datasets are discussed in Section 4.2. The performance of all 179 classifiers from [23] together with the 8 variants of TWSVM is presented in Tables 2 and 3. All values of the 8 TWSVM variants are set in bold face in Tables 2 and 3. The first column in Tables 2 and 3 shows the position (Pos) as per the Friedman Rank (FRank).
Table 1: Ranges of all parameters of the TWSVM variants

Parameter | ILSTSVM m | LPTSVM m | LSTSVM m | pinTSVM m | RELS-TSVM m | TBSVM m | TWSVM m | WLTSVM m
Regularization (c1) | 10^-5 to 10^5 | 10^-5 to 10^5 | 10^-5 to 10^5 | 2^-3 to 2^7 | 10^-5 to 10^5 | 10^-5 to 10^5 | 10^-5 to 10^5 | 10^-5 to 10^5
Regularization (c2) | 10^-5 to 10^5 | 10^-5 to 10^5 | 10^-5 to 10^5 | 2^-3 to 2^7 | 10^-5 to 10^5 | 10^-5 to 10^5 | 10^-5 to 10^5 | 10^-5 to 10^5
Epsilon (1) | 10^-5 to 10^5 | -- | -- | -- | 10^-5 to 10^5 | 10^-5 to 10^5 | -- | 10^-5 to 10^5
Epsilon (2) | 10^-5 to 10^5 | -- | -- | -- | 10^-5 to 10^5 | 10^-5 to 10^5 | -- | 10^-5 to 10^5
Sigma (σ) | 2^-10 to 2^10 | 2^-10 to 2^10 | 2^-10 to 2^10 | 2^-10 to 2^10 | 2^-10 to 2^10 | 2^-10 to 2^10 | 2^-10 to 2^10 | 2^-10 to 2^10
Tau (τ1) | -- | -- | -- | [0.05, 0.1, 0.2, 0.5, 1] | -- | -- | -- | --
Tau (τ2) | -- | -- | -- | [0.05, 0.1, 0.2, 0.5, 1] | -- | -- | -- | --
Supplementary parameter (ν1) | -- | -- | -- | 2^-3 to 2^7 | -- | -- | -- | --
Supplementary parameter (ν2) | -- | -- | -- | 2^-3 to 2^7 | -- | -- | -- | --
Energy (E1) | -- | -- | -- | -- | [0.5, 0.6, 0.7, 0.8, 0.9, 1.0] | -- | -- | --
Energy (E2) | -- | -- | -- | -- | [0.5, 0.6, 0.7, 0.8, 0.9, 1.0] | -- | -- | --
The second column contains the classifiers with their family names in brackets; these names are kept the same as in [23], where their detailed descriptions are available. The third and fourth columns contain the FRank and average accuracy (Acc) of the respective classifiers. These average accuracies and ranks differ from those reported in [23], as they are based only on the 44 binary datasets. Out of the 179 classifiers, vbmpRadial t did not work on the binary class datasets, as it requires at least a 3-class dataset; for the sake of completeness, vbmpRadial t is simply added at the last place in Table 3. We have performed the analysis in the same way as in [23], by calculating the FRank, Average accuracy (Acc), Probability of Achieving the Maximum Accuracy (PAMA), probability of achieving more than 95% of the maximum accuracy (P95), and Percentage of the Maximum Accuracy (PMA). PAMA, P95 and PMA are defined as follows [23]:
$$\text{PAMA} = \frac{\#\{\text{datasets on which the classifier achieves the maximum accuracy}\}}{\#\{\text{datasets}\}} \times 100$$

$$\text{P95} = \frac{\#\{\text{datasets on which the classifier achieves more than 95\% of the maximum accuracy}\}}{\#\{\text{datasets}\}} \times 100$$

$$\text{PMA} = \frac{1}{\#\{\text{datasets}\}} \sum_{i=1}^{\#\{\text{datasets}\}} \frac{\text{accuracy of the classifier on the } i\text{th dataset}}{\text{maximum accuracy achieved on the } i\text{th dataset}} \times 100$$
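The three measures just defined can be computed from an accuracy matrix (classifiers by datasets) as below; this is a small numpy sketch, with the matrix orientation being our own assumption:

```python
import numpy as np

def benchmark_measures(acc):
    """acc: (n_classifiers, n_datasets) accuracies in percent.
    Returns PAMA, P95 and PMA, each per classifier, in percent."""
    best = acc.max(axis=0)                     # maximum accuracy per dataset
    pama = 100.0 * (acc == best).mean(axis=1)  # achieves the maximum
    p95 = 100.0 * (acc >= 0.95 * best).mean(axis=1)
    pma = (100.0 * acc / best).mean(axis=1)    # percentage of the maximum
    return pama, p95, pma
```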
All the above-mentioned measures are recalculated for the binary datasets, now including the 8 variants of TWSVM, using the detailed results3 provided by Fernandez et al. [23]. These are extensively discussed in the following subsections.
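The FRank itself is the average Friedman rank over datasets; a sketch using scipy's tie-averaging `rankdata` is given below (this is the standard computation, not necessarily the authors' exact script):

```python
import numpy as np
from scipy.stats import rankdata

def friedman_rank(acc):
    """acc: (n_classifiers, n_datasets). On each dataset, rank classifiers
    by accuracy (rank 1 = best; ties get the mean rank); FRank is the
    average rank of each classifier across all datasets."""
    ranks = np.column_stack([rankdata(-acc[:, j]) for j in range(acc.shape[1])])
    return ranks.mean(axis=1)
```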
4.1.1. FRank and Average Accuracy analysis of 8 variants of TWSVM with 179
classifiers from [23]
Friedman ranking is performed on the 44 binary datasets with 187 classifiers (179 classifiers from [23] and the 8 variants of TWSVM). Their FRank and Average accuracy (Acc) are presented in Tables 2 and 3. As the detailed per-dataset results of each classifier are very large, they are provided in full at http://people.iiti.ac.in/~phd1501101001/TSVM_JMLR_Binary_Multi.html. As mentioned in [23], some classifiers yield erroneous output; before calculating the FRank value, every erroneous output is replaced by the average accuracy of that specific dataset over the 187 classifiers (as done in [23]). Those erroneous values are represented by '–' in the result table, which is available at the above-mentioned webpage. The top three classifiers by average accuracy are RELS-TSVM m, ILSTSVM m and KRR m/KELM m. However, FRank is not fully consistent with average accuracy, as can be seen in Tables 2 and 3: by FRank, RELS-TSVM m, ILSTSVM m and avNNet t are the top three classifiers. According to both criteria, viz. FRank and Average accuracy, the improved least squares variants of TWSVM (RELS-TSVM m and ILSTSVM m) share the top place in the table. However, the basic least squares variant, LSTSVM m, is the second-worst performer among all 8 variants of TWSVM. Looking at the top-20 list, 5 TWSVM based classifiers are among the top 20 of the 187 classifiers according to both FRank and Average accuracy (Acc). Among the 187 classifiers, RELS-TSVM m and ILSTSVM m show stable and better results compared to all remaining classifiers. Fig. 1
3The detailed results are available at: http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/
Table 2: Position (Pos) of the classifiers on the binary class datasets as per FRank, together with the FRank and Average accuracy (Acc) of each classifier, ordered by increasing FRank. This table is continued in Table 3.
Pos Classifiers FRank Acc
1 RELS-TSVM m(TWSVM) 36.9 83.1
1 ILSTSVM m(TWSVM) 36.9 83.1
2 avNNet t(NNET) 39.7 82.0
3 svmPoly t(SVM) 40.4 81.8
4 TBSVM m(TWSVM) 43.2 81.8
5 svmRadialCost t(SVM) 43.9 81.9
6 pcaNNet t(NNET) 45.1 81.9
7 KRR m/KELM m(NNET) 48.1 82.7
8 svm C(SVM) 48.4 80.8
9 rf t(RF) 48.6 81.6
10 LPTSVM m(TWSVM) 48.9 81.9
11 svmRadial t(SVM) 51.2 81.2
12 parRF t(RF) 52.9 81.1
13 nnet t(NNET) 54.1 80.9
14 C5.0 t(BST) 56.6 80.4
15 mlp t(NNET) 56.8 80.7
16 cforest t(RF) 57.3 79.5
17 mlpWeightDecay t(NNET) 57.4 80.3
18 TWSVM m(TWSVM) 58.9 79.6
19 svmLinear t(SVM) 60.9 80.6
20 svmBag R(BAG) 61.5 80.6
21 RotationForest w(RF) 62.6 80.6
22 rforest R(RF) 63.4 82.5
22 gaussprRadial R(OM) 63.4 81.2
23 bayesglm t(GLM) 64.2 80.3
24 fda t(DA) 64.5 79.7
25 glmnet R(GLM) 64.8 81.0
26 BG LibSVM w(BAG) 65.2 79.5
27 rda R(DA) 66.1 80.5
28 pls t(PLSR) 66.9 80.6
29 rbf t(NNET) 67.2 78.7
30 knn t(NN) 67.5 79.4
31 WLTSVM m(TWSVM) 68.5 81.1
31 pnn m(NNET) 68.5 79.3
32 svmlight C(NNET) 68.8 81.1
33 pda t(DA) 68.9 80.5
34 rbfDDA t(NNET) 69.3 80.0
35 MAB LibSVM w(BST) 70.0 79.7
36 simpls R(PLSR) 70.7 80.5
37 widekernelpls R(PLSR) 71.3 80.5
38 RRFglobal t(RF) 71.4 80.2
39 multinom t(LMR) 71.5 80.8
40 nnetBag R(BAG) 71.6 79.5
41 dkp C(NNET) 72.4 79.6
42 mlm R(GLM) 73.2 80.9
43 adaboost R(BST) 73.5 80.0
44 fda R(DA) 73.6 79.8
45 plsBag R(BAG) 73.9 79.0
46 lda R(DA) 74.2 79.5
47 LibLINEAR w(SVM) 74.6 79.5
Pos Classifiers FRank Acc
48 mda t(DA) 74.7 72.8
49 kernelpls R(PLSR) 74.9 78.4
50 LibSVM w(SVM) 76.0 78.6
50 RRF t(RF) 76.0 79.8
51 MCC w(LMR) 76.4 79.2
51 Logistic w(OEN) 76.4 79.2
52 BG RandomForest w(BAG) 76.8 79.3
53 RandomForest w(RF) 77.4 79.7
54 knn R(NN) 77.5 79.2
55 gcvEarth t(MARS) 77.6 78.7
56 lvq t(NNET) 78.5 79.0
57 Decorate w(OEN) 78.7 79.8
58 MAB PART w(BST) 79.2 79.3
59 bagging R(BAG) 79.5 77.0
60 ldaBag R(BAG) 79.7 79.6
60 sda t(DA) 79.7 79.5
61 SMO w(SVM) 79.8 78.9
62 mars R(MARS) 79.9 78.4
63 SimpleLogistic w(LMR) 80.0 78.2
64 glmStepAIC t(GLM) 80.3 78.7
65 BG PART w(BAG) 81.3 78.8
66 MAB RandomForest w(BST) 81.4 79.5
67 lda2 t(DA) 81.6 79.1
68 mlp C(NNET) 82.1 77.9
69 BG Logistic w(BAG) 82.3 79.2
69 MAB MLP w(BST) 82.3 79.7
70 MAB Logistic w(BST) 83.5 79.3
71 MAB J48 w(BST) 83.9 79.1
72 BG DecisionTable w(BAG) 84.9 77.8
73 gpls R(PLSR) 85.1 76.3
74 hdda R(DA) 85.3 79.0
75 MLP w(NNET) 85.9 79.2
76 glm R(GLM) 86.4 76.8
77 ctreeBag R(BAG) 87.1 77.7
77 BG J48 w(BAG) 87.1 78.4
78 elm m(NNET) 87.2 78.0
79 RandomSubSpace w(DT) 87.3 77.6
80 CVR w(OM) 88.5 78.4
81 MAB REPTree w(BST) 88.6 78.3
82 JRip t(RL) 89.0 77.7
83 ctree2 t(DT) 89.5 77.3
84 ctree t(DT) 90.1 77.1
85 AdaBoostM1 J48 w(BST) 90.4 79.3
86 Dagging w(OEN) 90.6 77.6
87 LSTSVM m(TWSVM) 91.1 74.0
88 BG Ibk w(BAG) 91.6 78.5
88 BG LWL w(BAG) 91.6 78.5
88 BG REPTree w(BAG) 91.6 78.5
89 mda R(DA) 92.6 78.4
90 RandomCommittee w(OEN) 92.7 78.6
Table 3: Continuation of Table 2
Pos Classifiers FRank Acc
91 treebag t(BAG) 92.8 78.1
91 obliqueTree R(DT) 92.8 78.6
92 PenalizedLDA R(DA) 94.2 76.2
93 BG RandomTree w(BAG) 95.9 78.4
94 MAB DecisionTable w(BST) 96.0 76.5
95 C5.0Rules t(RL) 96.9 78.4
96 mlp m(NNET) 97.4 77.4
97 NBTree w(DT) 97.6 77.2
97 LogitBoost w(BST) 97.6 77.2
98 lssvmRadial t(SVM) 97.7 77.5
99 RBFNetwork w(NNET) 98.8 76.3
100 C5.0Tree t(DT) 99.5 77.9
101 DTNB w(RL) 99.6 77.3
102 rpart t(DT) 99.9 77.0
103 slda t(DA) 100.5 76.4
104 ASC w(OM) 100.7 77.0
105 AdaBoostM1 w(BST) 100.8 76.8
105 JRip w(RL) 100.8 77.4
106 FilteredClassifier w(OM) 101.4 77.0
107 Ridor w(RL) 102.8 77.0
108 lvq R(NNET) 103.2 73.5
109 pam t(OM) 103.7 76.2
110 MAB RandomTree w(BST) 103.8 77.7
111 sddaLDA R(DA) 104.1 76.6
112 PART w(DT) 104.3 77.7
113 J48 w(DT) 104.5 77.8
113 OCC w(OEN) 104.5 77.8
113 END w(OEN) 104.5 77.8
114 rbf m(NNET) 104.7 74.0
115 rpart2 t(DT) 104.9 78.3
116 qda t(DA) 105.4 76.1
116 BayesNet w(BY) 105.4 75.9
116 PART t(DT) 105.4 78.2
117 stepLDA t(DA) 106.4 75.4
118 bdk R(NNET) 106.8 77.7
119 J48 t(DT) 108.0 77.6
120 sddaQDA R(DA) 109.4 75.1
121 rpart R(DT) 111.3 76.3
122 sparseLDA R(DA) 112.5 73.4
123 REPTree w(DT) 113.5 76.1
124 stepQDA t(DA) 114.0 74.2
125 DecisionTable w(RL) 114.1 75.4
126 MAB Ibk w(BST) 114.3 74.8
Pos Classifiers FRank Acc
126 MAB w(BST) 114.6 74.8
127 MAB NaiveBayes w(BST) 115.1 74.5
127 IBk w(NN) 115.1 76.5
128 KStar w(OM) 116.3 76.2
129 NNge w(NN) 116.7 76.1
130 cascor C(NNET) 117.6 75.5
131 nbBag R(BAG) 117.7 74.5
132 rrlda R(DA) 117.8 74.7
132 naiveBayes R(BY) 117.9 74.3
133 IB1 w(NN) 120.1 76.2
134 NaiveBayes w(BY) 121.2 73.7
135 LWL w(OEN) 121.3 74.5
136 BG DecisionStump w(BAG) 122.0 73.8
137 BG NaiveBayes w(BAG) 124.5 73.1
138 BG OneR w(BAG) 125.3 74.0
139 DecisionStump w(DT) 125.7 73.5
139 NBUpdateable w(BY) 125.7 73.1
140 ConjunctiveRule w(RL) 128.7 72.7
141 MAB OneR w(BST) 130.1 74.0
142 NaiveBayesSimple w(BY) 131.3 71.9
143 OneR t(RL) 132.2 72.5
144 dpp C(NNET) 132.3 69.4
145 spls R(PLSR) 133.9 65.8
146 logitboost R(BST) 136.0 69.3
147 QdaCov t(DA) 136.6 73.1
148 OneR w(RL) 137.6 71.9
149 RandomTree w(DT) 138.7 74.6
150 BG MLP w(BAG) 138.7 66.5
151 pinTSVM m(TWSVM) 139.7 66.3
152 BG HyperPipes w(BAG) 141.3 67.0
152 Stacking w(STC) 145.2 63.2
152 Grading w(OEN) 145.2 63.2
153 CVPS w(OM) 145.2 63.2
154 StackingC w(STC) 145.3 63.1
155 RILB w(BST) 145.7 63.4
155 VFI w(OM) 146.0 68.2
156 HyperPipes w(OM) 146.1 65.1
156 ZeroR w(RL) 146.8 62.6
156 MultiScheme w(OEN) 146.8 62.6
156 CSC w(OEN) 146.8 62.6
157 Vote w(OEN) 146.8 62.6
158 MetaCost w(BAG) 147.3 62.5
159 CVC w(OM) 165.6 61.6
160 vbmpRadial t(BY) NA NA
exhibits that RELS-TSVM m and ILSTSVM m achieve either the maximum accuracy or near-maximum accuracy on all datasets except hill-valley (53.6%) and horse-colic (60.29%). We further provide two separate analyses of the top 25 classifiers in Figs. 2 and 3, using FRank and Acc, respectively. Fig. 2 shows that the top 2 classifiers, i.e., RELS-TSVM m and ILSTSVM m (TWSVM family), exhibit identical performance, and that there is a significant difference (2.8) between the FRank of RELS-TSVM m (or ILSTSVM m) and that of avNNet t. Furthermore, we statistically verify the results presented in Tables 2 and 3. We selected the top 20 classifiers from these tables as per their FRanks and performed the Friedman test. Under its null hypothesis, the accuracies of the compared methods are not significantly different; the hypothesis is retained at significance level α = 0.05 when p-value > 0.05. We computed three quantities for the top 20 classifiers: the F-score, the critical value (Cval), and the p-value, using both the Friedman test and the modified Friedman test [34]. The computed p-values in both cases (Friedman and modified Friedman test) are less than 0.05, i.e., 0.0012 and 0.0010, respectively, and the computed F-scores in both cases exceed the corresponding critical values, i.e., 43.1672 > 30.1435 and 2.3412 > 1.5993. Hence, the null hypothesis is rejected. Based on this analysis, a few interesting facts can be observed from Table 2 and Figs. 2 and 3:
(i) Despite the identical average accuracy of svmPoly t and TBSVM m, svmPoly t yields a lower FRank than TBSVM m, with a difference of 2.8.
(ii) Similarly, the Acc of TBSVM m is lower than that of KRR m/KELM m; however, TBSVM m achieves a better FRank (difference in FRanks: 4.9) than KRR m/KELM m.
(iii) KRR m/KELM m yields the third-highest accuracy among the 187 classifiers, and only 0.4 separates its Acc from that of RELS-TSVM m (or ILSTSVM m). However, KRR m/KELM m takes only 7th place among the 187 classifiers as per FRank.
(iv) Similar facts to those discussed in the above three points can be stated for LPTSVM m, TWSVM m and WLTSVM m. These facts mainly exhibit
Figure 1: Accuracy (in %) achieved by RELS-TSVM m (and ILSTSVM m) vs. maximum accuracy for each dataset (ordered by increasing maximum accuracies).
the unstable behavior of these three variants of TWSVM based classifiers. Notably, WLTSVM m performs better in terms of Acc but yields an inferior FRank compared to TWSVM m.
(v) One more interesting fact emerges from the first column (i.e., Pos) of Tables 2, 3, 7 and 8: several classifiers that perform poorly on the multi-class datasets perform very well on the binary-class datasets, in terms of both Acc and FRank.
The analysis in this section clearly shows the significant dominance of RELS-TSVM m and ILSTSVM m over all remaining classifiers. However, FRank and Acc cannot be the only criteria for measuring the performance of the classifiers. In the next two subsections, we analyze the performance of the classifiers based on PAMA, p95 and PMA.
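For reference, the FRank and Friedman / modified-Friedman computations described above can be sketched as follows. This is an illustrative sketch on random placeholder accuracies (the real matrix holds each classifier's accuracy on the 44 binary UCI datasets); the formulas are the standard Friedman chi-square and Iman-Davenport statistics [34], not the authors' released code:

```python
import random

# Placeholder accuracy matrix: acc[d][c] = accuracy of classifier c on dataset d.
random.seed(0)
N, k = 44, 20                      # 44 binary datasets, top-20 classifiers
acc = [[random.uniform(60, 90) for _ in range(k)] for _ in range(N)]

def ranks_on_dataset(row):
    """Rank 1 = highest accuracy on this dataset (ties would share ranks)."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    for pos, j in enumerate(order, start=1):
        ranks[j] = float(pos)
    return ranks

# FRank: average rank of each classifier across datasets (lower is better).
per_dataset = [ranks_on_dataset(row) for row in acc]
frank = [sum(r[j] for r in per_dataset) / N for j in range(k)]

# Friedman chi-square statistic, then the Iman-Davenport correction used as
# the "modified Friedman test"; each is compared against its critical value.
sum_R2 = sum(R * R for R in frank)
chi2_f = 12.0 * N / (k * (k + 1)) * (sum_R2 - k * (k + 1) ** 2 / 4.0)
f_f = (N - 1) * chi2_f / (N * (k - 1) - chi2_f)
```

Statistics exceeding their critical values at α = 0.05 reject the null hypothesis of equal accuracies, as in the analysis above.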
Figure 2: Top 25 classifiers as per FRank in increasing order of FRank.
Figure 3: Top 25 classifiers as per average accuracy in decreasing order of Acc.
4.1.2. PAMA and p95 analysis of 8 variants of TWSVM with 179 classifiers
from [23]
PAMA and p95 were calculated for the 187 classifiers over the 44 binary datasets. The top 20 classifiers as per the PAMA criterion are listed in Table 4, and the PAMA values of all 187 classifiers are provided on this web page1. As can be seen from Table 4, TBSVM m, rather than RELS-TSVM m (or ILSTSVM m), emerges as the best classifier as per the PAMA value. Surprisingly, the top 2 classifiers as per FRank in Table 2 (RELS-TSVM m and ILSTSVM m) are not able to secure a position in the top 20 list. It can be noted that all three least squares based TWSVM variants (LSTSVM m, RELS-TSVM m and ILSTSVM m) yield the same PAMA value of 6.8. Four variants of TWSVM, viz., TBSVM m, LPTSVM m, TWSVM m and WLTSVM m, are in the top 20 list, which shows the dominance of TWSVM based classifiers over the other classifiers. However, PAMA provides a biased insight into a classifier, as some classifiers do not achieve the maximum accuracy yet come very near to it [23]. Therefore, we also consider the p95 criterion for the evaluation. The top 20 classifiers as per the p95 criterion are listed in Table 5, and the p95 values of all 187 classifiers are provided on this web page1. As per this criterion, 4 variants of TWSVM attain a position in the top 20 list in Table 5. Both improved least squares variants, viz., RELS-TSVM m and ILSTSVM m, rank in the top 10, while TBSVM m performs better than both of them. This reflects the fact that RELS-TSVM m and ILSTSVM m might not achieve the best accuracy on a given dataset but improve the generalization capability of TWSVM m.
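As a concrete reference, the two criteria can be computed as below. The accuracy matrix is a random placeholder, and the definitions follow [23]: PAMA is the percentage of datasets on which a classifier attains the maximum accuracy reached by any classifier, and p95 the percentage of datasets on which it reaches at least 95% of that maximum.

```python
import random

# Placeholder accuracies: acc[d][c] for dataset d and classifier c.
random.seed(1)
N, k = 44, 10
acc = [[random.uniform(60, 95) for _ in range(k)] for _ in range(N)]
max_acc = [max(row) for row in acc]          # best accuracy on each dataset

# PAMA(%): share of datasets where classifier c hits the maximum accuracy.
pama = [100.0 * sum(acc[d][c] == max_acc[d] for d in range(N)) / N
        for c in range(k)]

# p95(%): share of datasets where classifier c reaches >= 95% of the maximum.
p95 = [100.0 * sum(acc[d][c] >= 0.95 * max_acc[d] for d in range(N)) / N
       for c in range(k)]
```

By construction p95 can never be lower than PAMA for the same classifier, which is why p95 gives the less biased view discussed above.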
4.1.3. PMA analysis of 8 variants of TWSVM with 179 classifiers from [23]
The top 20 classifiers as per the PMA criterion are listed in Table 6, and the PMA values of all 187 classifiers are provided on the web page1. Before calculating the PMA value, all erroneous outputs of the classifiers are replaced by zero (as done in [23]). As per this criterion, 2 variants of TWSVM, i.e., RELS-TSVM m and ILSTSVM m, attain the top 2 positions, as with the FRank and Acc criteria. Four TWSVM based classifiers secured positions among the top 10
Table 4: Top 20 classifiers as per the highest PAMA (%) value.
S.No. Classifier PAMA(%)
1TBSVM m(TWSVM) 15.9
2 KRR m/KELM m(NNET) 13.6
3 mda t(DA) 11.4
4 mlp t(NNET) 11.4
5 svmRadialCost t(SVM) 11.4
6LPTSVM m(TWSVM) 11.4
7 pcaNNet t(NNET) 9.1
8 pnn m(NNET) 9.1
9 dkp C(NNET) 9.1
10 svm C(SVM) 9.1
S.No. Classifier PAMA(%)
11 adaboost R(BST) 9.1
12 nnetBag R(BAG) 9.1
13 rforest R(RF) 9.1
14 gpls R(PLSR) 9.1
15 TWSVM m(TWSVM) 9.1
16 WLTSVM m(TWSVM) 9.1
17 MAB DecisionTable w(BST) 6.8
18 pda t(DA) 6.8
19 rda R(DA) 6.8
20 rbf m(NNET) 6.8
Table 5: Top 20 classifiers as per the highest p95 (%) value.
S.No. Classifier p95(%)
1 svmRadialCost t(SVM) 77.3
2 svm C(SVM) 72.7
3 svmPoly t(SVM) 72.7
4TBSVM m(TWSVM) 72.7
5 svmRadial t(SVM) 70.5
6RELS-TSVM m(TWSVM) 70.5
7ILSTSVM m(TWSVM) 70.5
8 BG LibSVM w(BAG) 68.2
9LPTSVM m(TWSVM) 68.2
10 avNNet t(NNET) 65.9
S.No. Classifier p95(%)
11 KRR m/KELM m(NNET) 63.6
12 MAB LibSVM w(BST) 61.4
13 pcaNNet t(NNET) 61.4
14 LibSVM w(SVM) 61.4
15 C5.0 t(BST) 61.4
16 rf t(RF) 61.4
17 parRF t(RF) 61.4
18 TWSVM m(TWSVM) 61.4
19 svmBag R(BAG) 59.1
20 mlpWeightDecay t(NNET) 56.8
Table 6: Top 20 classifiers as per the highest PMA (%) value.
S.No. Classifier PMA(%)
1RELS-TSVM m(TWSVM) 95.3
2ILSTSVM m(TWSVM) 95.3
3 KRR m/KELM m(NNET) 94.8
4 avNNet t(NNET) 94.1
5 pcaNNet t(NNET) 93.9
6 svmRadialCost t(SVM) 93.9
7 svmPoly t(SVM) 93.8
8TBSVM m(TWSVM) 93.7
9LPTSVM m(TWSVM) 93.7
10 rf t(RF) 93.5
S.No. Classifier PMA(%)
11 svmRadial t(SVM) 93.1
12 rforest R(RF) 93.0
13 nnet t(NNET) 93.0
14 parRF t(RF) 93.0
15 glmnet R(GLM) 92.9
16 WLTSVM m(TWSVM) 92.8
17 svm C(SVM) 92.8
18 mlp t(NNET) 92.7
19 nnetBag R(BAG) 92.7
20 svmLinear t(SVM) 92.6
Figure 4: Top 20 classifiers as per PMA criterion in decreasing order of PMA value.
Figure 5: PMA value over 44 datasets in increasing order for RELS-TSVM m and avNNet t.
classifiers. In Fig. 4, the top 20 classifiers are plotted using their PMA values. One can observe that RELS-TSVM m and ILSTSVM m clearly outperform the top two classifiers of [23], as the differences in PMA value between RELS-TSVM m and KRR m/KELM m and avNNet t are 0.5 and 1.2, respectively. The best performing classifier as per FRank among the 179 classifiers of [23] (i.e., avNNet t) and among the TWSVM based classifiers (i.e., RELS-TSVM m or ILSTSVM m) are plotted with their PMA values in increasing order over the 44 binary datasets in Fig. 5. It can easily be observed that RELS-TSVM m performs similarly to or significantly better than avNNet t on the 44 binary datasets.
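The PMA computation used in this subsection can be sketched in the same style, again on placeholder data; following [23], erroneous outputs are set to zero before the per-dataset maxima are taken.

```python
import random

# Placeholder accuracies; None marks an erroneous run, replaced by 0 as in [23].
random.seed(2)
N, k = 44, 10
acc = [[random.uniform(60, 95) for _ in range(k)] for _ in range(N)]
acc[3][2] = None                                  # one simulated failed run
acc = [[0.0 if a is None else a for a in row] for row in acc]
max_acc = [max(row) for row in acc]               # per-dataset maximum accuracy

# PMA(%): average, over datasets, of each accuracy expressed as a percentage
# of the maximum accuracy achieved by any classifier on that dataset.
pma = [100.0 * sum(acc[d][c] / max_acc[d] for d in range(N)) / N
       for c in range(k)]
```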
4.2. Comparison of 8 TWSVM variants with 179 classifiers from [23] for multi-class datasets
In this subsection, we compare the 8 TWSVM variants with the 179 classifiers from [23] on 46 multi-class datasets. Results are provided in Tables 7 and 8, which contain the FRank and Acc of the classifiers. As can be observed from these tables, 4 TWSVM variants achieve top 20 ranks and outperform most of the classifiers. However, the performance is not as strong as in the binary case, where TWSVM variants achieved the top 2 positions. Further, we have also calculated PAMA for these multi-class datasets, and the results are presented in Table 9. Here, one TWSVM variant achieves the 2nd position and another the 3rd position, and in total 4 TWSVM variants secure a place in the top 20 as per the PAMA criterion. It is to be noted that only one variant, TBSVM m, is common between Table 4 and Table 9. All results are provided in detail at http://people.iiti.ac.in/~phd1501101001/TSVM_JMLR_Binary_Multi.html. A comparison among the TWSVM variants is discussed in detail in the subsequent subsection. Furthermore, as for the binary-class datasets, we computed the F-score, p-value and critical value for the multi-class datasets. The computed p-values for both cases (Friedman and modified Friedman test) are less than 0.05, i.e., 0.0094 and 0.0085, respectively. The computed F-scores for both cases also exceed the corresponding critical values, i.e., 36.4090 > 30.1435 and 1.9561 > 1.5987. Based on the above discussion, we can state that the outcomes presented in this paper are significantly different, and we can reject the null hypothesis.
4.3. Comparison among TWSVM variants
TWSVM variants can be divided into three categories. The first category contains the three basic TWSVM variants, viz., TBSVM m, TWSVM m, and LPTSVM m. The second category is based on weighted TWSVM and contains the variants pinTSVM m and WLTSVM m. The third category contains the three least squares variants, viz., LSTSVM m, ILSTSVM m, and RELS-TSVM m. It can be observed from Tables 2 and 3 that the basic least squares version of TWSVM does not perform well, but the other two variants from the least squares category (ILSTSVM m and RELS-TSVM m) perform better than the rest of the variants as per the Acc value. A similar observation is made for the multi-class datasets in Tables 7 and 8: the basic least squares version does not perform well, but another variant from this category, ILSTSVM m, yields the best Acc value among all. That said, one variant from the basic TWSVM category (TWSVM m), the two improved least squares variants (ILSTSVM m and RELS-TSVM m), and one variant from the weighted TWSVM category (WLTSVM m) yield similar Acc values with minor differences, as shown in Table 7. As per the PAMA values for the binary datasets in Table 4, the three basic TWSVM variants take the top 3 positions among the 8 TWSVM variants. As per the PAMA values for the multi-class datasets in Table 9, the scenario is completely different: the three variants from the least squares category take the top 3 positions among the 8 TWSVM variants. A training-time comparison among all 8 variants on the 90 datasets is provided in Tables 10 and 11. The weighted TWSVM variants consume the least time among the three categories, and the least squares category stands in second position. The basic TWSVM variants take more time to solve their optimization problems on a few datasets; this can be observed in Tables 10 and 11 for datasets such as thyroid, cardiotocography-10clases, statlog-image and steel-plates, for which the average training time is higher for TBSVM m, TWSVM m, and LPTSVM m.
Table 7: Position (Pos) of classifiers for the multi-class datasets as per FRank [23], FRank, and average accuracy (Acc) for each classifier, ordered by increasing FRank. This table is continued in Table 8.
Pos Classifier FRank Acc
1 parRF t(RF) 24.3 79.4
2 rf t(RF) 28.0 79.1
3 rforest R(RF) 31.2 78.9
4 nnet t(NNET) 36.3 78.6
5 svmPoly t(SVM) 37.5 77.7
6 svm C(SVM) 38.8 78.3
7 svmRadial t(SVM) 39.3 77.4
8 KRR m/KELM m(NNET) 39.4 77.4
9 RRF t(RF) 39.5 78.2
10 svmRadialCost t(SVM) 40.1 77.5
11 mlp t(NNET) 41.8 78.4
12 C5.0 t(BST) 43.4 77.2
13 avNNet t(NNET) 44.5 77.8
14 BG LibSVM w(BAG) 45.0 77.2
15 pcaNNet t(NNET) 45.4 76.7
16 TBSVM m(TWSVM) 45.9 76.4
17 adaboost R(BST) 46.0 76.9
18 TWSVM m(TWSVM) 46.1 76.8
19 RELS-TSVM m(TWSVM) 46.4 76.7
20 ILSTSVM m(TWSVM) 46.6 76.9
20 RotationForest w(RF) 46.6 77.6
21 RRFglobal t(RF) 48.2 76.9
22 LibSVM w(SVM) 50.9 76.0
23 MAB LibSVM w(BST) 52.5 76.5
24 RandomCommittee w(OEN) 56.8 76.4
25 Decorate w(OEN) 57.0 76.1
26 MAB RandomForest w(BST) 57.2 75.2
27 LPTSVM m(TWSVM) 58.2 74.3
28 mlpWeightDecay t(NNET) 58.6 76.7
29 CVR w(OM) 59.4 75.6
30 svmLinear t(SVM) 59.7 75.6
31 cforest t(RF) 60.1 75.4
32 pnn m(NNET) 60.4 76.2
33 gaussprRadial R(OM) 60.5 76.2
34 dkp C(NNET) 60.8 76.0
35 multinom t(LMR) 60.9 75.9
36 glmnet R(GLM) 61.1 75.0
37 treebag t(BAG) 61.3 76.2
38 mlp C(NNET) 61.8 75.8
39 RandomForest w(RF) 61.9 74.8
40 SimpleLogistic w(LMR) 63.3 75.2
41 elm m(NNET) 64.8 76.2
42 rda R(DA) 66.0 75.2
43 MAB MLP w(BST) 66.8 74.2
43 mda t(DA) 66.8 73.4
44 END w(OEN) 67.2 75.4
45 pda t(DA) 67.3 74.3
46 BG RandomForest w(BAG) 67.7 74.9
Pos Classifier FRank Acc
47 MAB PART w(BST) 68.4 74.7
47 LogitBoost w(BST) 68.4 74.8
47 svmlight C(NNET) 68.4 74.2
48 MAB J48 w(BST) 69.0 74.7
49 fda R(DA) 69.1 73.9
50 ldaBag R(BAG) 69.2 73.7
51 fda t(DA) 69.4 74.8
52 knn R(NN) 69.5 74.8
53 rbf t(NNET) 69.9 73.5
53 gcvEarth t(MARS) 69.9 74.4
54 lda R(DA) 70.3 73.8
55 BG PART w(BAG) 70.4 74.3
56 BG REPTree w(BAG) 70.6 74.8
57 BG J48 w(BAG) 71.2 74.4
58 rbfDDA t(NNET) 71.3 74.6
59 MLP w(NNET) 71.5 74.4
60 lda2 t(DA) 72.2 73.3
61 knn t(NN) 72.8 74.2
62 mlm R(GLM) 73.1 73.6
63 AdaBoostM1 J48 w(BST) 73.6 74.2
64 ctreeBag R(BAG) 74.1 73.5
65 BG RandomTree w(BAG) 75.1 73.4
66 LibLINEAR w(SVM) 75.3 74.9
67 lssvmRadial t(SVM) 76.0 75.8
68 BG Ibk w(BAG) 76.2 73.7
69 sda t(DA) 76.6 73.3
70 lvq t(NNET) 77.8 74.3
71 BG LWL w(BAG) 78.5 73.0
72 SMO w(SVM) 79.4 73.5
73 pls t(PLSR) 80.2 70.5
74 MAB RandomTree w(BST) 80.3 73.2
75 KStar w(OM) 81.3 73.8
76 hdda R(DA) 81.5 72.9
77 mda R(DA) 81.8 73.3
78 LSTSVM m(TWSVM) 82.5 71.8
79 RandomSubSpace w(DT) 82.6 73.1
80 RBFNetwork w(NNET) 83.6 73.6
81 C5.0Tree t(DT) 85.0 73.4
82 J48 t(DT) 85.5 72.9
82 rpart R(DT) 85.5 72.2
83 MAB REPTree w(BST) 85.6 72.4
84 NNge w(NN) 86.0 73.5
85 Logistic w(OEN) 86.2 72.2
86 C5.0Rules t(RL) 86.3 73.1
87 BG Logistic w(BAG) 86.9 72.0
88 JRip t(RL) 87.0 72.4
89 PART t(DT) 87.6 72.6
89 J48 w(DT) 87.6 73.3
Table 8: Continuation of Table 7
Pos Classifier FRank Acc
90 ASC w(OM) 87.8 73.0
91 MAB Logistic w(BST) 89.3 71.6
92 logitboost R(BST) 89.7 72.4
93 PART w(DT) 90.6 72.4
94 rpart2 t(DT) 90.8 72.0
95 lvq R(NNET) 90.9 70.7
96 svmBag R(BAG) 91.0 67.5
96 MCC w(LMR) 91.0 71.8
97 nbBag R(BAG) 91.2 72.1
98 WLTSVM m(TWSVM) 91.5 71.4
99 IB1 w(NN) 92.1 72.4
100 rpart t(DT) 96.1 71.3
100 MAB DecisionTable w(BST) 96.1 70.8
101 BayesNet w(BY) 96.2 71.1
102 NBTree w(DT) 96.3 71.8
103 REPTree w(DT) 97.6 71.2
104 naiveBayes R(BY) 98.1 71.0
105 BG DecisionTable w(BAG) 98.7 71.3
106 DTNB w(RL) 98.8 71.4
107 ctree t(DT) 99.0 70.5
108 cascor C(NNET) 99.3 70.2
109 IBk w(NN) 99.6 70.8
110 ctree2 t(DT) 99.8 70.4
111 JRip w(RL) 100.4 71.2
112 qda t(DA) 100.8 69.3
113 NaiveBayes w(BY) 101.9 69.5
114 bagging R(BAG) 101.9 60.0
115 bdk R(NNET) 102.1 71.6
116 BG NaiveBayes w(BAG) 103.0 68.8
117 FilteredClassifier w(OM) 103.9 70.6
118 NBUpdateable w(BY) 104.9 68.0
119 MAB NaiveBayes w(BST) 105.9 68.9
120 Ridor w(RL) 106.6 71.0
121 pam t(OM) 107.0 68.4
122 OCC w(OEN) 107.1 70.0
123 rrlda R(DA) 108.1 67.1
124 RandomTree w(DT) 109.1 69.8
125 slda t(DA) 110.7 67.5
126 vbmpRadial t (BY) 110.8 66.0
127 sparseLDA R(DA) 110.9 65.6
128 Dagging w(OEN) 111.8 67.9
129 plsBag R(BAG) 112.1 63.1
130 rbf m(NNET) 113.2 64.9
131 QdaCov t(DA) 113.9 66.1
132 obliqueTree R(DT) 114.2 62.1
Pos Classifier FRank Acc
132 DecisionTable w(RL) 114.2 68.4
133 PenalizedLDA R(DA) 114.8 63.1
134 NaiveBayesSimple w(BY) 116.2 73.3
135 stepQDA t(DA) 118.8 66.0
136 mlp m(NNET) 120.4 65.3
137 stepLDA t(DA) 120.6 66.6
138 sddaLDA R(DA) 125.1 63.1
139 dpp C(NNET) 125.3 63.0
140 sddaQDA R(DA) 127.8 61.4
141 LWL w(OEN) 128.2 63.4
142 nnetBag R(BAG) 132.4 49.5
143 VFI w(OM) 133.9 63.2
144 OneR w(RL) 137.9 58.0
145 kernelpls R(PLSR) 139.2 51.9
146 OneR t(RL) 140.7 57.5
147 BG OneR w(BAG) 141.1 58.2
148 simpls R(PLSR) 141.8 51.0
149 BG HyperPipes w(BAG) 142.2 57.0
150 mars R(MARS) 142.3 54.8
151 MAB OneR w(BST) 142.4 57.9
152 widekernelpls R(PLSR) 142.5 52.0
153 MAB w(BST) 143.8 54.5
154 ConjunctiveRule w(RL) 145.4 52.6
155 BG DecisionStump w(BAG) 146.4 54.9
156 AdaBoostM1 w(BST) 147.6 54.3
157 MAB Ibk w(BST) 148.3 53.9
158 DecisionStump w(DT) 152.3 51.6
159 HyperPipes w(OM) 152.6 53.6
160 spls R(PLSR) 153.4 44.7
161 BG MLP w(BAG) 155.9 44.6
162 gpls R(PLSR) 158.4 38.5
163 bayesglm t(GLM) 159.8 41.1
164 CVC w(OM) 160.0 47.6
165 RILB w(BST) 161.8 43.4
166 glmStepAIC t(GLM) 163.1 40.6
167 StackingC w(STC) 165.2 40.8
168 MultiScheme w(OEN) 165.4 40.8
169 pinTSVM m(TWSVM) 166.6 34.3
170 Grading w(OEN) 166.9 40.6
171 glm R(GLM) 167.3 25.8
172 Vote w(OEN) 167.4 40.5
173 ZeroR w(RL) 167.5 40.5
173 MetaCost w(BAG) 167.5 40.4
174 Stacking w(STC) 168.1 40.3
174 CSC w(OEN) 168.1 40.3
175 CVPS w(OM) 168.5 40.1
Table 9: Top 20 classifiers as per the highest PAMA (%) value on the multi-class datasets.
S.No. Classifier PAMA(%)
1 svm C(SVM) 10.9
2 parRF t(RF) 8.7
3ILSTSVM m(TWSVM) 8.7
4 adaboost R(BST) 6.5
5 RRF t(RF) 6.5
6RELS-TSVM m(TWSVM) 6.5
7 KRR m/KELM m(NNET) 4.3
8 BG RandomForest w(BAG) 4.3
9 lda R(DA) 4.3
10 sda t(DA) 4.3
S.No. Classifier PAMA(%)
11 nnet t(NNET) 4.3
12 lvq R(NNET) 4.3
13 C5.0 t(BST) 4.3
14 LSTSVM m(TWSVM) 4.3
15 TBSVM m(TWSVM) 4.3
16 MAB LibSVM w(BST) 2.2
17 MAB RandomForest w(BST) 2.2
18 MAB RandomTree w(BST) 2.2
19 lda2 t(DA) 2.2
20 PenalizedLDA R(DA) 2.2
5. Conclusions and future directions
This paper has provided an exhaustive benchmarking of 8 variants of TWSVM based classifiers, drawn from three categories, against 179 classifiers from 17 families. The eight TWSVM variants were evaluated along with the 179 classifiers on various performance criteria, viz., Acc, FRank, PAMA, p95, and PMA. Two variants from the least squares category (ILSTSVM m and RELS-TSVM m) performed the best among all 187 classifiers on the binary-class datasets as per the FRank, Acc and PMA criteria, and another TWSVM variant, TBSVM m, performed the best as per the PAMA criterion. Overall, 5 and 4 TWSVM variants secured a place among the top 20 classifiers according to FRank for the binary and multi-class datasets, respectively. An interesting fact is observed among the TWSVM variants on the binary datasets: the basic least squares version of TWSVM, i.e., LSTSVM m, performed second-worst among all TWSVM variants as per the Acc criterion, whereas its improved variants, RELS-TSVM m and ILSTSVM m, performed the best among all TWSVM variants. A similar observation is made for the multi-class datasets. Moreover, the TWSVM variants did not attain even a top 5 position for the multi-class datasets. Although the TWSVM variants have not emerged as the best classifiers for multi-class datasets, they can still be a good alternative, as a TWSVM variant obtained the 2nd position as per the PAMA criterion. On the other hand, they can be a better alternative to other state-of-the-art classifiers for binary-class datasets. Further-
Table 10: Training time (in seconds) of 8 variants for 90 datasets and continued to Table 11
Dataset ILSTSVM m LPTSVM m LSTSVM m pinTSVM m RELS-TSVM m TBSVM m TWSVM m WLTSVM m
acute-inflammation 0.0320 0.1779 0.0343 0.0161 0.0342 0.0662 0.1137 0.0023
acute-nephritis 0.0078 0.1487 0.0109 0.0125 0.0084 0.0610 0.0342 0.0037
annealing 0.7267 21.6805 0.7004 0.2043 0.7097 7.4263 1.7779 0.2041
arrhythmia 0.7148 6.7270 0.6239 0.0625 0.5953 0.8841 1.0891 0.0213
balance-scale 0.1603 6.4621 0.1684 0.0507 0.1080 1.2528 0.3692 0.0401
balloons 0.0026 0.0073 0.0056 0.0096 0.0030 0.0051 0.0191 0.0005
blood 0.1058 2.7389 0.1100 0.0832 0.1510 4.5923 0.4953 0.0635
breast-cancer 0.0145 0.6903 0.0147 0.0283 0.0155 0.1350 0.0642 0.0070
breast-cancer-wisc 0.1141 2.9150 0.0950 0.0688 0.0903 0.3817 0.3483 0.0576
breast-cancer-wisc-diag 0.1133 2.6128 0.0620 0.0438 0.0650 0.1578 0.2355 0.0313
breast-cancer-wisc-prog 0.0090 0.2316 0.0209 0.0193 0.0125 0.0428 0.0430 0.0066
breast-tissue 0.0106 0.3496 0.0108 0.0119 0.0129 0.0771 0.0817 0.0031
car 1.5964 40.2947 1.8283 0.4875 1.5445 37.5490 4.7756 0.5331
cardiotocography-10clases 6.6195 222.4167 8.2245 0.8833 7.7333 116.9285 18.1605 0.9085
cardiotocography-3clases 1.9685 104.8104 2.6934 1.2793 2.2968 16.3837 5.1091 1.0034
chess-krvkp 3.6092 444.7051 5.5426 1.3464 3.8869 10.8307 8.5032 2.4955
congressional-voting 0.0301 0.7976 0.0656 0.0248 0.0331 0.1125 0.1397 0.0196
conn-bench-sonar-mines-rocks 0.0102 0.1526 0.0105 0.0129 0.0310 0.0553 0.0608 0.0063
conn-bench-vowel-deterding 0.5993 15.4250 0.7889 0.0888 0.6403 11.4379 2.0700 0.0732
contrac 0.8070 17.0902 1.0402 0.2032 0.9466 32.2308 1.6210 0.3612
credit-approval 0.0884 3.3101 0.1561 0.0596 0.1697 0.4281 0.2311 0.1338
cylinder-bands 0.0487 4.3855 0.0482 0.0382 0.0887 0.2398 0.1274 0.0266
dermatology 0.1315 2.1967 0.0689 0.0916 0.0870 0.4126 0.2594 0.0129
echocardiogram 0.0058 0.0646 0.0054 0.0147 0.0059 0.0708 0.0440 0.0051
ecoli 0.0751 2.0166 0.1132 0.0283 0.1065 0.9302 0.3958 0.0124
energy-y1 0.1709 5.6710 0.2506 0.1099 0.1742 2.9909 0.3892 0.2076
energy-y2 0.1620 5.5776 0.2259 0.0820 0.2345 1.1859 0.4503 0.0786
fertility 0.0044 0.0328 0.0044 0.0128 0.0059 0.0271 0.0568 0.0264
flags 0.0314 0.5832 0.0961 0.0148 0.0397 0.1773 0.1897 0.0112
glass 0.0245 0.5198 0.0229 0.0150 0.0432 0.4832 0.1756 0.0053
haberman-survival 0.0158 0.2859 0.0155 0.0194 0.0164 0.3252 0.1313 0.0096
hayes-roth 0.0100 0.3890 0.0106 0.0189 0.0137 0.1103 0.0763 0.0039
heart-cleveland 0.0390 1.2811 0.0348 0.0264 0.0489 0.5683 0.1989 0.0168
heart-hungarian 0.0142 0.5106 0.0140 0.0450 0.0185 0.1827 0.0732 0.0137
heart-switzerland 0.0101 0.2188 0.0133 0.0159 0.0368 0.0886 0.0828 0.0081
heart-va 0.0198 0.5101 0.0267 0.0304 0.0219 0.3522 0.2045 0.0064
hepatitis 0.0067 0.1055 0.0151 0.0139 0.0072 0.0496 0.0631 0.0039
hill-valley 0.1623 4.7513 0.2599 0.1048 0.2243 0.8187 0.2973 0.2008
horse-colic 0.0579 1.0534 0.0274 0.0290 0.0401 0.0797 0.1556 0.0141
ilpd-indian-liver 0.1184 1.5416 0.0615 0.0456 0.0601 1.0810 0.1632 0.1038
image-segmentation 0.1606 7.1914 0.4763 0.0876 0.5045 0.4171 0.3660 0.0241
ionosphere 0.0251 0.5386 0.0206 0.0239 0.0282 0.0968 0.0731 0.0169
iris 0.0080 0.2485 0.0076 0.0181 0.0551 0.1585 0.0637 0.0038
led-display 1.1272 22.0068 1.2954 0.1208 1.4207 11.6267 2.7106 0.2657
lenses 0.0025 0.0192 0.0037 0.0202 0.0030 0.0071 0.0514 0.0006
libras 0.2190 5.0433 0.2063 0.0338 0.2571 1.6482 0.8661 0.0141
low-res-spect 0.3521 7.2346 0.3484 0.0758 0.3577 1.0836 0.8815 0.0598
lung-cancer 0.0029 0.0061 0.0033 0.0215 0.0033 0.0073 0.0541 0.0007
Table 11: Continuing from Table 10
Dataset ILSTSVM m LPTSVM m LSTSVM m pinTSVM m RELS-TSVM m TBSVM m TWSVM m WLTSVM m
lymphography 0.0108 0.4371 0.0439 0.0206 0.0121 0.0379 0.1242 0.0119
mammographic 0.2450 5.5149 0.2959 0.0944 0.1832 0.7785 0.4427 0.1512
molec-biol-promoter 0.0050 0.0293 0.0410 0.0218 0.0093 0.0130 0.0431 0.0035
molec-biol-splice 5.7153 616.4776 7.6826 1.1327 5.8671 17.0395 10.3194 2.5694
monks-1 0.0088 0.5434 0.0127 0.0203 0.0116 0.0207 0.0690 0.0049
monks-2 0.0116 0.3640 0.0143 0.0347 0.0157 0.1047 0.0735 0.0093
monks-3 0.0074 0.6456 0.0105 0.0234 0.0115 0.0332 0.0608 0.0064
musk-1 0.0503 1.2807 0.0538 0.0479 0.0526 0.1084 0.1253 0.0268
oocytes merluccius nucleus 4d 0.2280 6.9678 0.2587 0.1192 0.2559 6.8678 0.6275 0.1540
oocytes merluccius states 2f 0.4011 11.9716 0.4037 0.1528 0.3747 1.0210 1.0321 0.1372
oocytes trisopterus nucleus 2f 0.1661 4.4441 0.2069 0.1021 0.1760 1.5097 0.3527 0.1762
oocytes trisopterus states 5b 0.3013 9.4324 0.3281 0.0990 0.2702 1.2043 0.6508 0.1317
ozone 2.4402 65.2626 3.0110 1.8025 2.4534 43.0555 6.7500 1.5644
parkinsons 0.0088 0.4617 0.0084 0.0221 0.0096 0.0298 0.0752 0.0034
pima 0.1514 6.3792 0.1133 0.0715 0.1099 0.4495 0.3621 0.0666
pittsburg-bridges-MATERIAL 0.0062 0.0759 0.0057 0.0237 0.0069 0.0295 0.0654 0.0029
pittsburg-bridges-REL-L 0.0055 0.1070 0.0055 0.0231 0.0059 0.0287 0.0930 0.0024
pittsburg-bridges-SPAN 0.0282 0.2568 0.0089 0.0244 0.0061 0.0289 0.0837 0.0029
pittsburg-bridges-T-OR-D 0.0043 0.0455 0.0044 0.0237 0.0048 0.0218 0.0440 0.0028
pittsburg-bridges-TYPE 0.0097 0.2723 0.0238 0.0223 0.0138 0.0649 0.1499 0.0025
planning 0.0085 0.1060 0.0071 0.0254 0.0086 0.0877 0.0861 0.0685
primary-tumor 0.1901 3.2089 0.1568 0.0377 0.1767 0.9176 1.7308 0.0114
seeds 0.0129 0.4570 0.0544 0.0250 0.0136 0.0479 0.2573 0.0048
soybean 0.2723 10.8691 0.3824 0.0439 0.3483 1.0151 1.6679 0.0166
spect 0.0058 0.0908 0.0069 0.0239 0.0073 0.0126 0.0687 0.0031
spectf 0.0055 0.0865 0.0095 0.0240 0.0083 0.0151 0.0608 0.0046
statlog-australian-credit 0.1066 2.5125 0.1669 0.0560 0.0888 2.1936 0.5037 0.0504
statlog-german-credit 0.2577 18.5491 0.2260 0.1189 0.2264 1.4237 0.7356 0.1430
statlog-heart 0.4155 0.7680 0.0130 0.0240 0.0138 0.0336 0.6537 0.0066
statlog-image 5.4597 248.1873 6.7212 0.8627 5.8391 107.6823 16.1723 1.2836
statlog-vehicle 0.2838 12.8589 0.3178 0.1154 0.2891 5.5666 1.0063 0.0942
steel-plates 3.8250 113.8698 4.4728 0.5176 3.8220 63.9410 10.8581 0.6868
synthetic-control 0.3117 6.7058 0.2292 0.0935 0.2404 1.8294 0.8085 0.0449
teaching 0.0081 0.1407 0.0079 0.0218 0.0093 0.0734 0.1540 0.0125
thyroid 18.8930 1252.4298 25.6913 11.0068 20.1259 471.9164 92.1563 8.1767
tic-tac-toe 0.1825 7.2642 0.2024 0.1488 0.2229 1.1397 0.4431 0.1126
titanic 1.3967 20.6183 1.7539 0.4372 1.7101 9.5813 2.4346 0.8920
trains 0.0020 0.0042 0.0028 0.0220 0.0022 0.0042 0.0387 0.0005
vertebral-column-2clases 0.0261 0.7050 0.0145 0.0244 0.0165 0.0685 0.0890 0.0081
vertebral-column-3clases 0.0264 0.8039 0.0210 0.0286 0.0533 0.1170 0.1396 0.0080
wine 0.0761 0.5782 0.0096 0.0220 0.0138 0.0337 0.1395 0.0039
wine-quality-red 1.9174 55.3645 2.2870 0.4359 1.9982 40.9643 4.9816 0.4275
Average Time (in Seconds) 0.7093 38.3875 0.9019 0.2684 0.7564 11.6374 2.3408 0.2691
Maximum Time (in Seconds) 18.8930 1252.4298 25.6913 11.0068 20.1259 471.9164 92.1563 8.1767
more, most of the TWSVM variants have been developed for the stationary environment (batch learning); hence, considerable scope remains for the development of TWSVM for the non-stationary environment (online learning). These variants can also be developed for various frameworks, such as the LUPI and graph-embedding frameworks. As the TWSVM variants emerge as a viable alternative for binary datasets, they should also be tested on applications where binary classification is required.
Acknowledgement
This work is supported by Science and Engineering Research Board (SERB)
funded Research Projects, Government of India under Early Career Research
Award Scheme, Grant No. ECR/2017/000053 and Ramanujan Fellowship Scheme,
Grant No. SB/S2/RJN-001/2016. We gratefully acknowledge the Indian Insti-
tute of Technology Indore for providing facilities and support.
References
[1] C. J. C. Burges, A tutorial on support vector machines for pattern recogni-
tion, Data Mining and Knowledge Discovery (2) (1998) 1-43.
[2] C. C. Chang, C. J. Lin, LIBSVM: a library for support vector machines,
ACM Transactions on Intelligent Systems and Technology (TIST) 2 (3)
(2011) 27.
[3] C. Cortes, V. N. Vapnik, Support vector networks, Machine Learning (20)
(1995) 273–297.
[4] N. Cristianini, J. Shawe-Taylor, An introduction to support vector machines and other kernel-based learning methods, Cambridge University Press, Cambridge, 2000.
[5] M. Tanveer, M. Mangal, I. Ahmad, Y.H. Shao, One norm linear program-
ming support vector regression, Neurocomputing (173) (2016) 1508-1518.
[6] J. Demsar, Statistical comparisons of classifiers over multiple data sets, Jour-
nal of Machine Learning Research (7) (2006) 1-30.
[7] G. H. Golub, C. F. Van Loan, Matrix Computations, JHU Press, 2012.
[8] Jayadeva, R. Khemchandani, S. Chandra, Twin support vector machines for
pattern classification, IEEE Transactions on Pattern Analysis and Machine
Intelligence 29 (5) (2007) 905-910.
36
[9] M. A. Kumar, M. Gopal, Least squares twin support vector machines for pattern classification, Expert Systems with Applications 36 (2009) 7535-7543.
[10] O. L. Mangasarian, E. W. Wild, Multisurface proximal support vector clas-
sification via generalized eigenvalues, IEEE Transactions on Pattern Analy-
sis and Machine Intelligence 28 (1) (2006) 69-74.
[11] O. L. Mangasarian, Exact 1-norm support vector machines via unconstrained convex differentiable minimization, Journal of Machine Learning Research 7 (2006) 1517-1530.
[12] Y. H. Shao, C. H. Zhang, X. B. Wang, N. Y. Deng, Improvements on
twin support vector machines, IEEE Transactions on Neural Networks 22
(6) (2011) 962-968.
[13] V. N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[14] M. Tanveer, A. Tiwari, R. Choudhary, S. Jalan, Sparse pinball twin support vector machines, Applied Soft Computing 78 (2019) 164-175.
[15] Y. Tian, Z. Qi, Review on: Twin Support Vector Machines, Annals of Data
Science 1 (2) (2014) 253-277.
[16] Y. Xu, W. Xi, X. Lv, R. Guo, An improved least squares twin support vector machine, Journal of Information and Computational Science 9 (4) (2012) 1063-1071.
[17] H. Huang, X. Wei, Y. Zhou, Twin support vector machines: A survey, Neurocomputing 300 (2018) 34-43.
[18] M. Tanveer, Application of smoothing techniques for linear programming twin support vector machines, Knowledge and Information Systems 45 (1) (2015) 191-214.
[19] Y. H. Shao, W. J. Chen, J. J. Zhang, Z. Wang, N. Y. Deng, An efficient
weighted Lagrangian twin support vector machine for imbalanced data clas-
sification, Pattern Recognition 47 (9) (2014) 3158-3167.
[20] M. Tanveer, M. A. Khan, S. S. Ho, Robust energy-based least squares twin
support vector machines, Applied Intelligence 45 (1) (2016) 174-186.
[21] A. N. Tikhonov, V. Y. Arsenin, Solutions of Ill-posed Problems, John Wiley & Sons, New York, 1977.
[22] M. Tanveer, Robust and sparse linear programming twin support vector machines, Cognitive Computation 7 (1) (2015) 137-149.
[23] M. Fernandez-Delgado, E. Cernadas, S. Barro, D. Amorim, Do we need hundreds of classifiers to solve real world classification problems?, Journal of Machine Learning Research 15 (1) (2014) 3133-3181.
[24] M. Lichman, UCI Machine Learning Repository, 2013.
[25] J. Vanschoren, H. Blockeel, B. Pfahringer, G. Holmes, Experiment
databases, Machine Learning, 87 (2) (2012) 127-158.
[26] X. Peng, TPMSVM: a novel twin parametric-margin support vector machine for pattern recognition, Pattern Recognition 44 (10-11) (2011) 2678-2692.
[27] Y. Xu, Z. Yang, X. Pan, A novel twin support-vector machine with pinball
loss, IEEE Transactions on Neural Networks and Learning Systems, 28 (2)
(2017) 359-370.
[28] Z. Wang, Y.-H. Shao, T. R. Wu, A GA-based model selection for smooth
twin parametric-margin support vector machine, Pattern Recognition, 46
(8) (2013) 2267-2277.
[29] M. Tanveer, Newton method for implicit Lagrangian twin support vector
machines, International Journal of Machine Learning and Cybernetics 6(6)
(2015) 1029-1040.
[30] N. Parastalooi, A. Amiri, P. Aliheidari, Modified twin support vector re-
gression, Neurocomputing, 211 (2016) 84-97.
[31] L. Zhang, P. N. Suganthan, Benchmarking ensemble classifiers with novel co-trained kernel ridge regression and random vector functional link ensembles [Research Frontier], IEEE Computational Intelligence Magazine 12 (4) (2017) 61-72.
[32] R. Rastogi, S. Sharma, Fast Laplacian twin support vector machine with
active learning for pattern classification, Applied Soft Computing 74 (2019)
424-439.
[33] B. Richhariya, M. Tanveer, EEG signal classification using universum sup-
port vector machine, Expert Systems with Applications 106 (2018) 169-182.
[34] R. L. Iman, J. M. Davenport, Approximations of the critical region of the Friedman statistic, Communications in Statistics - Theory and Methods 9 (6) (1980) 571-595.