Large Margin Classifier Based on Hyperdisks
Hakan Cevikalp
Electrical and Electronics Engineering Department
Machine Learning and Computer Vision Laboratory, Eskisehir Osmangazi University
Meselik, Eskisehir, 26480 Turkey. Email:hakan.cevikalp@gmail.com
Abstract—This paper introduces a binary large margin classifier that approximates each class with a hyperdisk constructed from its training samples. For any pair of classes approximated with hyperdisks, there is a corresponding linear separating hyperplane that maximizes the margin between them, and it can be found by solving a convex program that finds the closest pair of points on the hyperdisks. More precisely, the best separating hyperplane is chosen to be the one that is orthogonal to the line segment connecting the closest points on the hyperdisks and at the same time bisects that segment. The method is extended to the nonlinear case by using the kernel trick, and multi-class classification problems are dealt with by constructing and combining several binary classifiers, as in the Support Vector Machine (SVM) classifier. Experiments on several databases show that the proposed method compares favorably to other popular large margin classifiers.
Keywords-classification; convex hull; hyperdisk; kernel methods; large margin classifier; quadratic programming; support vector machines.
I. INTRODUCTION
Large margin classifiers have recently enjoyed increased
attention due to their successful applications in various fields
such as computer vision (visual object classification and detec-
tion), text classification, biometrics, and genetic microarrays
[1,2,3,4]. The most popular large margin classifier, the Support Vector Machine (SVM) [4], is a binary classification method that simultaneously minimizes the empirical classification error and maximizes the geometric margin, which
is defined to be the distance between the best separating
hyperplane and closest samples from the classes. If the classes
are not linearly separable in the original input space, the
data samples are mapped onto a higher-dimensional space
where they become linearly separable, and the best separat-
ing hyperplane is constructed in the mapped space. Finding
such an hyperplane involves the minimization of a convex
quadratic function subject to linear inequality constraints,
and the quadratic optimization problem can be efficiently
solved using sequential minimal optimization [5, 6, 7] or using
minimum enclosing balls [8]. The solution of the quadratic
programming problem leads to a sparse solution that enables
us to evaluate the decision function by using a small number
of samples in the vicinity of the class decision boundaries
(more precisely, the training samples that lie either on the margin or on the "wrong" side of it), called the support vectors. Therefore, the SVM classifier is relatively fast compared to other classification algorithms, which makes it attractive for most pattern classification tasks.
From a geometrical point of view, in the linearly separable case,
SVM classifier approximates each class with a convex hull
and finds the closest points in these hulls [9,10]. Then these
two closest points are connected with a line segment. The
separating hyperplane, that is orthogonal to the line segment
and at the same time bisects the line, is chosen to be the best
separating hyperplane. In other words, the two closest points
on the convex hulls determine the separating hyperplane, and
the SVM margin is merely equivalent to the minimum distance
between the convex hulls that represent classes. However,
convex hulls may not be the best models for approximating classes, especially in high-dimensional spaces, because convex hull approximations tend to be unrealistically tight there: the classes typically extend beyond the convex hulls of their training samples. (It should be noted that even if the original dimensionality of the input space is low, the data samples are mapped to a higher-dimensional feature space through a kernel mapping during the estimation of the nonlinear decision boundaries with SVMs.) For example, for classes that are ellipsoids or boxes in high dimensions and for any placement of any number of samples sub-exponential in the dimension, the volume of their convex hull is exponentially smaller than the volume of the real class region. Similarly, for Gaussians, the convex hull of any probable placement of a sub-exponential number of samples contains exponentially little probability mass. Other
alternative models to the convex hulls may be affine hulls,
hyperspheres, hyperdisks, and hyperellipsoids, which are all
convex geometric models: Affine hulls are linear subspaces
that have been shifted to pass through the centroids of the
classes. The hyperdisk model of a class is the intersection of
the affine hull and the smallest bounding hypersphere of its
training samples [11]. Hyperellipsoids on the other hand are
characterized by the covariance matrix of the class samples
as well as their mean. Different studies [12,11,13,1,14,15]
show that when such convex models are used in “nearest
convex model” type classifiers, convex hulls of samples are
often outperformed by simpler convex models such as affine
hulls or hyperdisks. These results are not surprising due to the
fact that high-dimensional approximations tend to be simple:
For a fixed sample size, the amount of geometric details that
can be resolved usually decreases rapidly as the dimensionality
increases. Note that the methods we just cited are “nearest
convex model” type of classifiers rather than a “large margin
classifier between the convex models”.
This paper introduces a new binary large margin clas-
sifier between the convex models (rather than the nearest
convex model classifier) that approximates each class with
a hyperdisk model. One motivation for replacing nearest-
convex-model approaches with margin-based ones is that for
the nearest convex model classifier, the pairwise decision
boundaries (surfaces equidistant from the two convex models)
are generically at least quadratic or piecewise quadratic in
complexity. Such decision boundaries are more flexible than
linear ones, but in high dimensions when the training data is
scarce this may lead to overfitting, thus damaging general-
ization to unseen examples. Linear margin based approaches
have fewer degrees of freedom, so they are typically less
sensitive to the precise arrangement of the training samples.
For example for an SVM classifier, motions of the SVM
support vectors parallel to the SVM decision surface do not
alter the margin and hence do not invalidate the classifier
(although they might allow an even better one to be found),
whereas they do typically change the piecewise quadratic
decision surface of the equivalent nearest-convex-hull classifier.
The best separating hyperplane between hyperdisks is chosen
to be the one that maximizes the distance between them.
Finding such a hyperplane was first discussed in [3], and a solution based on a linear system and a 2D Newton root-finding process was given there for linearly separable data. Here this problem is formulated as a quadratically constrained quadratic optimization problem, and it is also extended to the nonlinear case by using the kernel trick. To handle multi-class problems, we construct several binary classifiers and combine them by using different techniques (e.g., one-against-one, one-against-rest, etc.) as in SVM.
The rest of the paper is organized as follows: In Section
2, we introduce the proposed method. Section 3 describes the
experimental results. Concluding remarks are given in Section
4.
II. METHOD
A. Motivation and Problem Setting
Consider a binary classification problem with the training data given in the form $\{x_i, y_i\}$, $i = 1, \ldots, n$, where $y_i \in \{-1, +1\}$ and $x_i \in \mathbb{R}^d$. The most popular large margin classifier, SVM, finds
a separating hyperplane that maximizes the margin, which is
defined as the distance between the hyperplane and closest
samples from the classes. To do so, SVM first approximates
each class with a convex hull [9]. A convex hull consists of
all points that can be written as a linear combination of data
points where all coefficients are nonnegative and sum up to
1. More formally, the convex hull of samples $\{x_{ci}\}_{i=1,\ldots,n_c}$ of class $c$ can be written as
$$H_c^{\mathrm{convex}} = \left\{ x = \sum_{i=1}^{n_c} \alpha_i x_{ci} \;\Big|\; \sum_{i=1}^{n_c} \alpha_i = 1, \; \alpha_i \geq 0 \right\}. \qquad (1)$$
Following this approximation, SVM finds the closest points
in these convex hulls. Then, these two points are connected
with a line segment. The plane, orthogonal to the line segment,
that bisects the line is selected to be the separating hyperplane
[9,10]. The convex hull model is the tightest possible convex
approximation to the class samples, and for classes with
more general convex forms, it is typically a substantial under-
approximation.
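To make this geometric view concrete, the closest-points problem between the two convex hulls can be written directly as a small quadratic program. The sketch below is our illustration in Python using numpy and cvxpy, which are assumptions rather than tools mentioned in the paper; it implements the convex-hull formulation of [9,10], not the hyperdisk method proposed here.

```python
import numpy as np
import cvxpy as cp

def closest_points_convex_hulls(Xp, Xm):
    """Closest pair of points on the convex hulls (1) of two classes.
    Xp and Xm are d x n_+ and d x n_- matrices of training samples.
    The SVM separating hyperplane perpendicularly bisects the segment
    joining the two returned points."""
    ap = cp.Variable(Xp.shape[1], nonneg=True)   # convex combination weights
    am = cp.Variable(Xm.shape[1], nonneg=True)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(Xp @ ap - Xm @ am)),
                      [cp.sum(ap) == 1, cp.sum(am) == 1])
    prob.solve()
    return Xp @ ap.value, Xm @ am.value
```

Replacing the nonnegativity constraints with the hyperdisk constraints developed below yields the proposed classifier.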
The large margin classifier using affine hulls on the other
hand approximates each class with an affine hull [1]. An affine
hull of a class $c$ is the smallest affine subspace containing the class samples, and the affine hull of samples $\{x_{ci}\}_{i=1,\ldots,n_c}$ can be written as
$$H_c^{\mathrm{affine}} = \left\{ x = \sum_{i=1}^{n_c} \alpha_i x_{ci} \;\Big|\; \sum_{i=1}^{n_c} \alpha_i = 1 \right\}. \qquad (2)$$
This is an unbounded and hence typically rather loose model
for a class in contrast to the convex hull approximation. Affine
hulls surprisingly often work better than convex hulls, especially in high-dimensional spaces with a limited number of samples [12,1,3]. However, one may have problems with affine hull models
if the classes have similar or intersecting affine hulls, but very
different distributions of samples within their affine hulls.
The hyperdisk is a model between convex and affine hulls,
and it captures the best aspects of each model. The hyperdisk
of a class is the intersection of the affine hull and the smallest
bounding hypersphere of its training samples as illustrated
in Fig. 1, and it maintains the stability of the affine hull
and hypersphere methods while providing better localiza-
tion of the training samples and hence potentially a better
discrimination. The hyperdisk of a class consists of affine combinations of the class samples as before, with an additional constraint $\|\sum_{i=1}^{n_c} \alpha_i x_{ci} - s_c\|^2 \leq r_c^2$. Thus, the hyperdisk of a class can be written as
$$H_c^{\mathrm{disk}} = \left\{ x = \sum_{i=1}^{n_c} \alpha_i x_{ci} \;\Big|\; \sum_{i=1}^{n_c} \alpha_i = 1, \; \Big\| \sum_{i=1}^{n_c} \alpha_i x_{ci} - s_c \Big\|^2 \leq r_c^2 \right\}. \qquad (3)$$
Here, $s_c$ is the center of the bounding hypersphere and $r_c$ is its radius. These hypersphere parameters can be found by solving the following quadratic program [16]
$$\min_{s_c, r_c} \; \left( r_c^2 + \gamma \sum_i \xi_i \right) \quad \text{s.t.} \quad \|x_{ci} - s_c\|^2 \leq r_c^2 + \xi_i, \quad i = 1, \ldots, n_c, \qquad (4)$$
or its dual
$$\min_{\alpha} \; \sum_{i,j} \alpha_i \alpha_j \langle x_{ci}, x_{cj} \rangle - \sum_i \alpha_i \langle x_{ci}, x_{ci} \rangle \quad \text{s.t.} \quad \sum_i \alpha_i = 1, \; 0 \leq \alpha_i \leq \gamma, \quad i, j = 1, \ldots, n_c, \qquad (5)$$
where $\langle \cdot, \cdot \rangle$ denotes the inner product between samples. Here the $\alpha_i$ are Lagrange multipliers and $\gamma \in [0, 1]$ is a ceiling parameter that can be set to a finite value to eliminate overdistant points as outliers. Given the solution, the center of the hypersphere is $s_c = \sum_i \alpha_i x_{ci}$ and the radius is $r_c = \|x_{ci} - s_c\|$ for any $x_{ci}$ with $0 < \alpha_i < \gamma$.
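For illustration, the hypersphere fit can be obtained by handing the dual (5) to an off-the-shelf convex solver. The following sketch uses Python with numpy and cvxpy as an assumed toolchain (the experiments in this paper rely on MOSEK instead), so treat it only as a minimal reference implementation of (5).

```python
import numpy as np
import cvxpy as cp

def bounding_hypersphere(X, gamma=1.0):
    """Smallest bounding hypersphere of the columns of X (d x n_c) via the
    dual QP (5).  Returns the center s_c and radius r_c.  Setting gamma < 1
    softens the sphere so that overdistant points can be treated as outliers."""
    n = X.shape[1]
    norms_sq = np.sum(X ** 2, axis=0)                  # <x_ci, x_ci> terms
    alpha = cp.Variable(n)
    objective = cp.Minimize(cp.sum_squares(X @ alpha)  # sum_ij a_i a_j <x_ci, x_cj>
                            - cp.sum(cp.multiply(norms_sq, alpha)))
    constraints = [cp.sum(alpha) == 1, alpha >= 0, alpha <= gamma]
    cp.Problem(objective, constraints).solve()
    a = alpha.value
    s_c = X @ a                                        # center: s_c = sum_i alpha_i x_ci
    # radius: distance to any sample whose multiplier is strictly between 0 and gamma
    idx = np.where((a > 1e-6) & (a < gamma - 1e-6))[0]
    i = idx[0] if idx.size else int(np.argmax(a))
    r_c = np.linalg.norm(X[:, i] - s_c)
    return s_c, r_c
```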
Our goal is to find the linear separating hyperplane that
yields the maximum margin between hyperdisks of classes.
Fig. 1. Hyperdisk model of a class is the intersection of affine hull and
bounding hypersphere of class samples.
The points $x$ which lie on the separating hyperplane satisfy $\langle w, x \rangle + b = 0$, where $w$ is the normal of the separating hyperplane, $|b|/\|w\|$ is the perpendicular distance from the hyperplane to the origin, and $\|w\|$ is the Euclidean norm of $w$. For any separating hyperplane, all points $x_i$ in the positive class satisfy $\langle w, x_i \rangle + b > 0$ and all points $x_i$ in the negative class satisfy $\langle w, x_i \rangle + b < 0$, so that $y_i(\langle w, x_i \rangle + b) > 0$ for all training data points. Finding the best separating hyperplane maximizing the margin between hyperdisks can be solved by computing the closest points on them. The optimal separating hyperplane will be the one that perpendicularly bisects the line segment connecting the closest points, as in SVM. The offset (also called the threshold), $b$, can be chosen as the distance from the origin to the point halfway between the closest points along the normal $w$. Once the best separating hyperplane is determined, a new sample $x_{\mathrm{test}}$ is classified based on the decision function $f(x_{\mathrm{test}}) = \mathrm{sign}(\langle w, x_{\mathrm{test}} \rangle + b)$.
B. Formulation Based on Quadratically Constrained
Quadratic Optimization
In this setup, we formulate finding the closest points on the hyperdisks as a quadratically constrained quadratic optimization (QCQP) problem. Now let $X_+$ and $X_-$ denote the matrices whose columns are the samples belonging to the positive and negative classes, respectively. We first compute the hypersphere center and radius for both classes. Then, finding the closest points on the hyperdisks of the classes can be written as the following optimization problem
$$\begin{aligned} \min_{\alpha_+, \alpha_-} \;& \|X_+ \alpha_+ - X_- \alpha_-\|^2 \\ \text{s.t.} \;& \sum_i \alpha_{+i} = 1, \quad \sum_j \alpha_{-j} = 1, \quad i = 1, \ldots, n_+, \; j = 1, \ldots, n_-, \\ & \|X_+ \alpha_+ - s_+\|^2 \leq r_+^2, \quad \|X_- \alpha_- - s_-\|^2 \leq r_-^2. \end{aligned} \qquad (6)$$
If we let $\alpha = [\alpha_+^\top \; \alpha_-^\top]^\top$, $y$ be a column vector of the combined labels, and $e$ be a column vector of ones of appropriate dimension, the optimization problem can be written as
$$\begin{aligned} \min_{\alpha} \;& \alpha^\top G \alpha \\ \text{s.t.} \;& \alpha_+^\top e_+ = 1, \quad \alpha_-^\top e_- = 1, \\ & \alpha_+^\top G_+ \alpha_+ - 2\alpha_+^\top X_+^\top s_+ + s_+^\top s_+ \leq r_+^2, \\ & \alpha_-^\top G_- \alpha_- - 2\alpha_-^\top X_-^\top s_- + s_-^\top s_- \leq r_-^2, \end{aligned} \qquad (7)$$
where $G = (yy^\top) \circ \left([X_+ \; X_-]^\top [X_+ \; X_-]\right)$, $G_+ = X_+^\top X_+$, and $G_- = X_-^\top X_-$. Here $\circ$ denotes the element-wise (Hadamard) multiplication of matrices. This is a quadratically constrained quadratic programming problem, and it is convex since the Hessian matrix $G$ of the objective function and the other two Hessian matrices $G_+$ and $G_-$ of the constraints are positive semi-definite.
QCQP problems can be transformed into semi-definite programs (SDPs), that is, optimization problems over the intersection of an affine set and the cone of positive semi-definite matrices [17]. The CVX software (http://cvxr.com/cvx/) uses this approach. However, in our simulations with synthetic data, CVX sometimes failed to find a solution or returned a wrong solution. Therefore we used the MOSEK software (http://www.mosek.com/) in the experiments since it always successfully returned correct solutions on the simulated data. MOSEK transforms the QCQP problem into a second-order cone programming (SOCP) problem, and SOCP problems can be solved in polynomial time by interior point methods more efficiently than SDPs [18]. Recently, more efficient algorithms have been introduced for solving QCQP problems [19,20].
Given the optimal $\alpha = [\alpha_+^\top \; \alpha_-^\top]^\top$, the closest pair of points on the two disks and the normal of the maximum-margin separating hyperplane can be found by using the following equation
$$w = \frac{1}{2}(x_+ - x_-) = \frac{1}{2}(X_+ \alpha_+ - X_- \alpha_-), \qquad (8)$$
where $x_+$ and $x_-$ denote the closest points on the hyperdisks of the positive and negative classes, respectively. The offset $b$ of the separating hyperplane will be
$$b = -\frac{1}{2} w^\top (x_+ + x_-). \qquad (9)$$
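A compact way to prototype the whole construction is to state problem (6) directly in a modeling language and then apply (8) and (9). The sketch below again assumes numpy and cvxpy; it is not the MOSEK-based implementation used for the experiments, only an illustration of the formulation.

```python
import numpy as np
import cvxpy as cp

def hyperdisk_separator(Xp, Xm, sp, rp, sm, rm):
    """Closest points on two hyperdisks, problem (6), and the resulting
    maximum-margin hyperplane (w, b) from (8)-(9).  Xp and Xm are d x n_+
    and d x n_- sample matrices; (sp, rp) and (sm, rm) are the hypersphere
    parameters found beforehand, e.g. with bounding_hypersphere above."""
    ap = cp.Variable(Xp.shape[1])
    am = cp.Variable(Xm.shape[1])
    constraints = [cp.sum(ap) == 1, cp.sum(am) == 1,
                   cp.sum_squares(Xp @ ap - sp) <= rp ** 2,
                   cp.sum_squares(Xm @ am - sm) <= rm ** 2]
    cp.Problem(cp.Minimize(cp.sum_squares(Xp @ ap - Xm @ am)), constraints).solve()
    x_plus, x_minus = Xp @ ap.value, Xm @ am.value   # closest pair of points
    w = 0.5 * (x_plus - x_minus)                      # Eq. (8)
    b = -0.5 * w @ (x_plus + x_minus)                 # Eq. (9): hyperplane bisects the segment
    return w, b

# A new sample x is then labeled by sign(w @ x + b).
```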
If the hyperdisks are close to being linearly separable and they overlap only because of a few outliers, there are several ways to overcome this problem. Firstly, the ceiling parameter $\gamma$ can be set to a value smaller than 1 to find a smaller, more compact hypersphere that does not include the outliers. If this does not resolve the overlap between the hyperdisks, we can introduce lower and upper bounds on the Lagrange coefficients $\alpha_i$ in (7) to shrink the hyperdisks so that they no longer overlap, as in the reduced convex hulls introduced in [9].
In the case of linearly inseparable hyperdisks, we can map the data to a higher-dimensional space where the linear hyperdisks constructed in the mapped space become separable, by using the kernel trick. Extension of the QCQP algorithm to the nonlinear case is easy. Note that the objective function of (7) can be written in terms of the dot products of samples, which allows the use of the kernel trick, i.e., replacing the inner product $\langle x_i, x_j \rangle$ with the kernel function $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$.
Now let $\Phi_+ = [\phi(x_1^+), \ldots, \phi(x_{n_+}^+)]$ and $\Phi_- = [\phi(x_1^-), \ldots, \phi(x_{n_-}^-)]$ be the matrices whose columns are the mapped samples belonging to the positive and negative classes, respectively. In the nonlinear case, the hypersphere center of each class (consider the positive class, for example) is also expressed in terms of the mapped samples, i.e., $\phi(s_+) = \Phi_+^s \beta_+$, where $\Phi_+^s = [\phi(x_1^+), \ldots, \phi(x_{l_+}^+)]$ is the matrix whose columns are the mapped samples associated with the nonzero coefficients returned by the hypersphere algorithm, and $\beta_+$ is the vector of nonzero coefficients. Note that $l_+ \leq n_+$. Then, the optimization problem becomes
$$\begin{aligned} \min_{\alpha} \;& \alpha^\top G \alpha \\ \text{s.t.} \;& \alpha_+^\top e_+ = 1, \quad \alpha_-^\top e_- = 1, \\ & \alpha_+^\top K_+ \alpha_+ - 2\alpha_+^\top K_+^s \beta_+ + \beta_+^\top K_s^+ \beta_+ \leq r_+^2, \\ & \alpha_-^\top K_- \alpha_- - 2\alpha_-^\top K_-^s \beta_- + \beta_-^\top K_s^- \beta_- \leq r_-^2, \end{aligned} \qquad (10)$$
where $G = (yy^\top) \circ \left([\Phi_+ \; \Phi_-]^\top [\Phi_+ \; \Phi_-]\right) = (yy^\top) \circ K$, $K_+ = \Phi_+^\top \Phi_+$, $K_- = \Phi_-^\top \Phi_-$, $K_+^s = \Phi_+^\top \Phi_+^s$, $K_-^s = \Phi_-^\top \Phi_-^s$, $K_s^+ = (\Phi_+^s)^\top \Phi_+^s$, and $K_s^- = (\Phi_-^s)^\top \Phi_-^s$. Note that all of the kernel matrices $K$, $K_+$, $K_-$, $K_+^s$, $K_-^s$, $K_s^+$, and $K_s^-$ can be easily computed by using the kernel function $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$. Given the solution $\alpha$, the normal of the separating hyperplane is $w = \frac{1}{2} \sum_{i=1}^{n} \alpha_i y_i \phi(x_i)$.
The bias $b$ can be computed using (9). A new sample $x_{\mathrm{test}}$ is classified by using
$$f(x_{\mathrm{test}}) = \mathrm{sign}\left( \frac{1}{2} \sum_{i=1}^{n} \alpha_i y_i k(x_i, x_{\mathrm{test}}) + b \right). \qquad (11)$$
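Evaluating the resulting classifier only requires kernel evaluations between the test sample and the training samples. The following minimal sketch of (11) assumes a Gaussian kernel and that the coefficient vector alpha and the bias b have already been computed from (10) and (9); the function names are ours, not taken from the paper's software.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2)), the kernel used in the experiments."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def predict(x_test, X, y, alpha, b, kernel=gaussian_kernel):
    """Kernelized decision function (11).  X is a d x n matrix of training
    samples, y holds labels in {-1, +1}, and alpha solves problem (10)."""
    score = 0.5 * sum(alpha[i] * y[i] * kernel(X[:, i], x_test)
                      for i in range(X.shape[1]))
    return int(np.sign(score + b))
```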
C. Extension to the Multi-Class Classification Problems
To use the proposed methods in multi-class classification
problems, we can use most of the strategies adopted for
extending binary SVM classifiers to the multi-class case. We
used the two most popular strategies in our experiments: one-against-one (OAO) and one-against-rest (OAR). For a $c$-class classification problem, the OAR strategy trains $c$ binary classifiers, in which each classifier separates one class from the remaining $c-1$ classes. All classifiers need to be trained on the entire training set, and the class label of a test sample is determined according to the highest output of the classifiers in the ensemble. On the other hand, the OAO strategy constructs all possible $c(c-1)/2$ binary classifiers out of the $c$ classes. The decision of the ensemble is made by a max-wins voting scheme, illustrated in the sketch below: each OAO classifier casts one vote for its preferred class, and the final decision is the class with the most votes. In addition to these, one can also use Directed Acyclic Graphs (DAGs) [21] or Binary Decision Trees [22] for multi-class classification.
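As referenced above, the max-wins rule for combining OAO classifiers can be made explicit in a few lines. The pairwise-classifier interface below is hypothetical and only serves to illustrate the voting scheme.

```python
from itertools import combinations

def oao_predict(x, classes, pairwise_clf):
    """Max-wins voting over one-against-one classifiers.
    pairwise_clf[(ci, cj)] is a hypothetical callable returning +1 if x is
    assigned to class ci and -1 if it is assigned to class cj."""
    votes = {c: 0 for c in classes}
    for ci, cj in combinations(classes, 2):
        winner = ci if pairwise_clf[(ci, cj)](x) > 0 else cj
        votes[winner] += 1
    return max(votes, key=votes.get)
```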
III. EXPERIMENTS
We tested the proposed method, the Large Margin Classifier based on HyperDisks (LMC-HD), on a number of datasets and compared it to the SVM classifier and the large margin classifier based on affine hulls (LMC-AH); software is available at http://www2.ogu.edu.tr/mlcv/softwares.html. Both OAR and OAO approaches are used for the multi-class classification problems, and we report the results of whichever yields the better accuracy.
A. Experiments with Linear Large Margin Classifiers
Here we test large margin classifiers on high-dimensional
linearly separable datasets.
Fig. 2. Aligned images of one subject from the AR face database.
1) AR Face Database: The AR Face data set [23] contains
26 frontal images with different facial expressions, illumi-
nation conditions and occlusions for each of 126 subjects,
recorded in two 13-image sessions spaced by 14 days. For
this experiment, we randomly selected 20 male and 20 female
subjects. The images were down-scaled (from 768×576),
aligned so that centers of the two eyes fell at fixed coordinates,
then cropped to size 105×78. Some cropped images are
shown in Fig. 2. Raw pixel values were used as features. The
design parameters are set based on grid search using random
partitions of datasets into training and test set. For training
we randomly selected n = 5, 10, 15, 20, 25 samples for each
individual, keeping the remaining 26 − n for testing. This
process was repeated 15 times, with the final classification
rates being obtained by averaging the 15 results. The results
are plotted in Fig. 3.
Fig. 3. Classification rates as a function of different number of samples per
class on the AR Face Database.
When 5 samples per class are used, large margin classifiers
based on affine hulls and hyperdisks respectively yield 92.64%
and 92.34% classification accuracy, and they significantly out-
perform SVM, which yields 90.24% accuracy. As the number of samples per class is increased, all methods begin to yield similar classification accuracies. These results show that affine hulls and hyperdisks are better models for representing classes in high-dimensional spaces with a limited number of samples.
Fig. 4. Selected 40 objects from the Coil100 database.
2) Coil100 Object Database: The Coil100 dataset (available at www1.cs.columbia.edu/CAVE/software/softlib/coil-100.php) includes 72 views each of 100 different objects taken on a turntable at orientations spaced at 5 degree intervals. We chose 40 objects randomly for the experiments, and these objects are shown in Fig. 4. We used the raw grayscale pixels of the down-sampled 32×32 images as input features, without applying any further visual preprocessing. For training we randomly selected n = 10, 20, ..., 60 images of each object, keeping the remaining 72 − n for testing. The results are given in Fig. 5. As can be seen in the figure, all classification methods yielded the same classification accuracies for this particular database.
Fig. 5. Classification rates for different number of samples per class on the
Coil Database.
B. Experiments with Nonlinear Large Margin Classifiers
In this group of experiments, we tested the kernelized versions of the methods on eight lower-dimensional datasets from the UCI repository (available at http://archive.ics.uci.edu/ml/): Ionosphere, Iris, Image Segmentation (IS), Letter Recognition (LR), Multiple Features (MF) - pixel averages, Pima Indian Diabetes (PID), Wine, and Wisconsin Diagnostic Breast Cancer (WDBC).

TABLE I
LOW-DIMENSIONAL DATABASES SELECTED FROM THE UCI REPOSITORY
Databases  | Number of Classes | Data Set Size | Dimensionality
Ionosphere | 2  | 351   | 34
Iris       | 3  | 150   | 4
IS         | 7  | 2310  | 19
LR         | 26 | 20000 | 16
MF         | 10 | 2000  | 256
PID        | 2  | 768   | 8
Wine       | 3  | 178   | 13
WDBC       | 2  | 569   | 30

TABLE II
CLASSIFICATION RATES (%) ON THE UCI DATASETS
UCI        | LMC-HD     | LMC-AH     | SVM
Ionosphere | 94.01±3.1  | 93.73±3.4  | 92.87±3.2
Iris       | 96.67±2.3  | 94.67±2.9  | 95.33±3.8
IS         | 97.23±0.3  | 95.28±0.7  | 97.10±0.4
LR         | 99.99±0.02 | 99.99±0.02 | 99.64±0.12
MF         | 98.30±0.5  | 98.30±0.5  | 98.00±0.4
PID        | 99.87±0.3  | 99.87±0.3  | 99.87±0.3
Wine       | 98.82±1.6  | 98.82±1.6  | 98.20±1.6
WDBC       | 97.01±0.5  | 96.00±0.8  | 97.36±0.9

The key parameters of
UCI Repository datasets are summarized in Table I. We used
the Gaussian kernels for all datasets. All design parameters are
set based on grid search using random partitions of datasets.
Classification accuracies obtained by 5-fold cross-validation
for UCI databases are given in Table II. Among all tested
methods, the proposed hyperdisk based large margin classifier
achieves the best results except for the WDBC database. For
Ionosphere, LR, MF, and Wine databases, both affine hull and
hyperdisk based classifiers achieve better results than SVM.
On the other hand, for the IS and Iris databases SVM achieves better results than the affine hull based classifier, yet the hyperdisk classifier yields even better results than SVM. There is only
one case where the hyperdisk classifier is outperformed by
SVM. Overall these results show that the hyperdisk model
captures the best aspects of affine hulls and convex hulls. Thus,
the corresponding classifier using hyperdisks either achieves
the best classification accuracy or comparable results to the
other convex class model classifiers using affine or convex
hulls as demonstrated in Table II.
IV. SUMMARY AND CONCLUSION
We investigated the idea of basing large margin classifiers
on hyperdisks of classes as an alternative to the affine and
convex hull classifiers. Given two hyperdisk models, their
corresponding large margin classifier is easily determined by
finding a closest pair of points on these two models and
bisecting the displacement between them. To this end, we
formulated the problem as a convex quadratically constrained
quadratic optimization problem. Extension to the nonlinear
case is realized by using the kernel trick.
The hyperdisk is a model between an affine hull and a convex hull, and it captures the best aspects of both. More precisely, since the affine hull is restricted to lie within a bounding hypersphere in the hyperdisk model, hyperdisks provide better localization of the class samples compared to affine hulls. At the same time, hyperdisks are looser than convex hulls, and this enables us to approximate classes more accurately in high-dimensional spaces. The experimental results verify these points. The hyperdisk classifier always produced either the best classification results or results comparable to the best-performing convex class model classifier. There is not a single case where the hyperdisk classifier is significantly outperformed by the other convex class model classifiers, whereas it significantly outperforms the others on some databases. However, these improvements come at a price. The training time of the hyperdisk classifier is longer than that of the other large margin classifiers since it requires running a quadratic programming algorithm (for finding the hypersphere parameters) followed by the QCQP algorithm. Another limitation is related to real-time efficiency (testing time). The hyperdisk classifier does not return a sparse solution, as is also the case for the affine hull classifier, so its real-time efficiency is low compared to the SVM classifier. However, this limitation can be overcome by running a reduced set algorithm [24,25] that enables us to derive a sparse solution. As future work, we are considering revising the QCQP algorithm so that it returns sparse solutions.
ACKNOWLEDGMENT
This work is supported by the Young Scientists Award Programme (TÜBA-GEBİP/2010-11) of the Turkish Academy of Sciences.
REFERENCES
[1] H. Cevikalp, B. Triggs, H. S. Yavuz, Y. Kucuk, M. Kucuk,
and A. Barkana. Large margin classifiers based on affine hulls.
Neurocomputing, 73:3160–3168, 2010.
[2] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection
for cancer classification using support vector machines. Ma-
chine Learning, 46:389–422, 2002.
[3] H. Cevikalp and B. Triggs. Large margin classifiers based on
convex class models. In International Conference on Computer
Vision Workshops, 2009.
[4] C. Cortes and V. Vapnik. Support vector networks. Machine
Learning, 20:273–297, 1995.
[5] J. Platt. Fast training of support vector machines using sequen-
tial minimal optimization. In Advances in Kernel Methods: Sup-
port Vector Learning, pages 185–208. MIT Press, Cambridge,
1999.
[6] T. Joachims. Making large-scale support vector machine learn-
ing practical. In Advances in Kernel Methods: Support Vector
Learning, pages 169–184. MIT Press, Cambridge, 1999.
[7] R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using second order information for training SVM. Journal of Machine Learning Research, 6:1889–1918, 2005.
[8] I. W. Tsang, J. T. Kwok, and P.-M. Cheung. Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6:363–392, 2005.
[9] K. P. Bennett and E. J. Bredensteiner. Duality and geometry in SVM classifiers. In ICML, 2000.
[10] D. J. Crisp and C. J. Burges. A geometric interpretation of ν-SVM classifiers. In Neural Information Processing Systems, 1999.
[11] H. Cevikalp, B. Triggs, and R. Polikar. Nearest hyperdisk
methods for high-dimensional classification. In ICML ’08:
Proceedings of the 25th international conference on Machine
learning, pages 120–127, 2008.
[12] P. Vincent and Y. Bengio. K-local hyperplane and convex
distance nearest neighbor algorithms. In NIPS, 2001.
[13] J. Laaksonen. Subspace classifiers in recognition of handwritten
digits. PhD thesis, Helsinki University of Technology, 1997.
[14] H. Cevikalp, D. Larlus, M. Neamtu, B. Triggs, and F. Ju-
rie. Manifold based local classifiers: linear and nonlinear
approaches. Journal of Signal Processing Systems, 61:61–73,
2010.
[15] M. B. Gulmezoglu, V. Dzhafarov, and A. Barkana. The common
vector approach and its relation to principal component analysis.
IEEE Trans. Speech Audio Proc., 9:655–662, 2001.
[16] D. M. J. Tax and R. P. W. Duin. Support vector data description.
Machine Learning, 54:45–66, 2004.
[17] L. Vandenberghe and S. Boyd. Semidefinite programming.
SIAM Review, 38:49–95, 1996.
[18] F. Alizadeh and D. Goldfarb. Second-order cone programming.
Mathematical Programming, 95:3–51, 2003.
[19] C.-M. Tang and J.-B. Jian. A sequential quadratically con-
strained quadratic programming method with an augmented
lagrangian line search function. Journal of Computational and
Applied Mathematics, 220:527–547, 2008.
[20] H. Tuy and N. T. Hoai-Phuong. A robust algorithm for quadratic
optimization under quadratic constraints. Journal of Global
Optimization, 37:557–569, 2007.
[21] J. C. Platt, N. Cristianini, and J. Shawe-taylor. Large margin
dags for multiclass classification. In Advances in Neural
Information Processing Systems, pages 547–553. MIT Press,
2000.
[22] V. Vural and J. G. Dy. A hierarchical method for multi-class
support vector machines. In ICML ’04: Proceedings of the
twenty-first international conference on Machine learning, page
105, New York, NY, USA, 2004. ACM.
[23] A. M. Martinez and R. Benavente. The AR face database.
Technical report, Computer Vision Center, Barcelona, Spain,
1998.
[24] B. Schölkopf, S. Mika, C. J. C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A. J. Smola. Input space versus feature space in kernel-based methods. IEEE Transactions on Neural Networks, 10:1000–1017, 1999.
[25] S. Mika, B. Schölkopf, A. Smola, K.-R. Müller, M. Scholz, and G. Rätsch. Kernel PCA and de-noising in feature spaces. In Neural Information Processing Systems (NIPS), 1998.