SVM CLASSIFICATION WITH LINEAR AND RBF KERNELS
Vasileios Apostolidis-Afentoulis
Department of Information Technology
Alexander TEI of Thessaloniki
P.O. Box 141, 574 00,
Thessaloniki, Greece
vapostolidis@gmail.com
Konstantina-Ina Lioufi
Department of Information Technology
Alexander TEI of Thessaloniki
P.O. Box 141, 574 00,
Thessaloniki, Greece
ntinaki_l@hotmail.com
ABSTRACT
This paper surveys existing research and development efforts involving the use of Matlab for classification. In particular, it aims at providing a representative view of support vector machines and of the way they can be trained on and learn from various kinds of data. Two kernel algorithms are presented with short overviews, then discussed separately and finally compared on the basis of the results, including a few figures. Finally, a summary of the considered systems is presented together with the experimental results.
Index Terms— SVM, Classification, Matlab, Linear,
RBF
1. INTRODUCTION
1.1. Support vector machines
The support vector machine (SVM) technique was introduced by Vapnik [1] and has developed rapidly in recent years. Several studies have reported that SVMs are generally able to deliver higher classification accuracy than other existing classification algorithms [2], [3]. In the last decade, Support Vector Machines (SVMs) have emerged as an important learning technique for solving classification and regression problems in various fields, most notably in computational biology, finance and text categorization. This is due in part to built-in mechanisms that ensure good generalization, which leads to accurate prediction; the use of kernel functions to model non-linear distributions; the ability to train relatively quickly on large datasets using novel mathematical optimization techniques; and, most significantly, the possibility of theoretical analysis using computational learning theory [5]. The main objective of statistical learning
is to find a description of an unknown dependency between
measurements of objects and certain properties of these
objects. The measurements, also known as "input variables",
are assumed to be observable in all objects of interest. On
the contrary, the properties of the objects, or "output
variables", are in general available only for a small subset of
objects known as examples. The purpose of estimating the
dependency between the input and output variables is to be
able to determine the values of output variables for any
object of interest. In pattern recognition, this relates to trying to estimate a function f: R^N → {±1} that can correctly classify new examples based on past observations [4]. The SVM software that has been used is LIBSVM [6], with the linear kernel and the RBF (radial basis function) kernel. As
the execution time of model selection is such an important
issue for practical applications of SVM, a number of studies
have been conducted on this topic [7], [8], [9], [10]. The
basic approach employed by these recent studies is to reduce
the search space of the parameter combinations [11].
1.2. Linear SVMs
1.2.1 Separable case
In the binary classification setting, let $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ be the training dataset, where the $x_i$ are the feature vectors representing the instances (i.e. observations) and $y_i \in \{-1, +1\}$ are the labels of the instances. Support vector learning
is the problem of finding a separating hyperplane that
separates the positive examples (labeled +1) from the
negative examples (labeled -1) with the largest margin. The
margin of the hyperplane is defined as the shortest distance
between the positive and negative instances that are closest
to the hyperplane. The intuition behind searching for the
hyperplane with a large margin is that a hyperplane with the
largest margin should be more resistant to noise than a
hyperplane with a smaller margin.
Formally, suppose that all the data satisfy the constraints

$$ x_i \cdot w + b \ge +1 \quad \text{for } y_i = +1, \qquad (1) $$
$$ x_i \cdot w + b \le -1 \quad \text{for } y_i = -1, \qquad (2) $$

where $w$ is the normal to the hyperplane, $|b| / \|w\|$ is the perpendicular distance from the hyperplane to the origin, and $\|w\|$ is the Euclidean norm of $w$.
Figure 1-A hyperplane separating two classes with the maximum margin.
The circled examples that lie on the canonical hyperplanes are called
support vectors.
The two constraints can be conveniently combined into the following:

$$ y_i (x_i \cdot w + b) - 1 \ge 0 \quad \forall i. \qquad (3) $$
The training examples for which (3) holds with equality lie on the canonical hyperplanes (H1 and H2 in figure 1). The margin ρ can then be easily computed as the distance between H1 and H2:

$$ \rho = \frac{2}{\|w\|}. \qquad (4) $$

Hence, the maximum-margin separating hyperplane can be constructed by solving the following primal optimization problem:

$$ \min_{w, b} \; \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (x_i \cdot w + b) \ge 1 \;\; \forall i. \qquad (5) $$
We switch to the Lagrangian formulation of this problem for two main reasons: i) the constraints are easier to handle, and ii) the training data only appear as dot products between vectors. This formulation introduces a Lagrange multiplier $\alpha_i \ge 0$ for each constraint in (5), and the minimization problem then becomes

$$ L_P = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i y_i (x_i \cdot w + b) + \sum_i \alpha_i. \qquad (6) $$

The objective is then to minimize (6) with respect to $w$ and $b$, and simultaneously require that the derivatives of $L_P$ with respect to all the $\alpha_i$ vanish.
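For completeness, setting the derivatives of $L_P$ with respect to $w$ and $b$ to zero gives the standard stationarity conditions, and substituting them back into (6) yields the dual problem that is solved in practice:

$$ \frac{\partial L_P}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i, \qquad \frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0, $$
$$ L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j), \quad \text{maximized subject to } \alpha_i \ge 0 \text{ and } \sum_i \alpha_i y_i = 0. $$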
1.2.2 Non-Separable case
The previous section discussed the case where it is
possible to linearly separate the training instances that
belong to different classes. Obviously this SVM formulation
will not find a solution if the data cannot be separated by a
hyperplane. Even in the cases where the data is linearly
separable, SVM may overfit to the training data in its search
for the hyperplane that completely separates all of the
instances of both classes. For instance, an individual outlier
in a dataset, such as a pattern which is mislabeled, can
crucially affect the hyperplane. These concerns prompted the development of soft-margin SVMs [10], which can handle linearly non-separable data by introducing positive slack variables $\xi_i$ that relax the constraints in (1) and (2) at a cost proportional to the value of $\xi_i$. Based on this new criterion, the relaxed constraints with the slack variables become

$$ x_i \cdot w + b \ge +1 - \xi_i \quad \text{for } y_i = +1, $$
$$ x_i \cdot w + b \le -1 + \xi_i \quad \text{for } y_i = -1, \qquad (7) $$
$$ \xi_i \ge 0 \quad \forall i, $$

which permits some instances to lie inside the margin or even cross further into the region of the opposite class (see figure 2). While this relaxation gives the SVM flexibility to decrease the influence of outliers, from an optimization perspective it is not desirable to have arbitrarily large values for $\xi_i$, as that would cause the SVM to obtain trivial and sub-optimal solutions.
Figure 2 -Soft margin SVM
Thus, the relaxation is constrained by making the slack variables part of the objective function (5), yielding

$$ \min_{w, b, \xi} \;\; \frac{1}{2}\|w\|^2 + C \sum_i \xi_i \qquad (8) $$
subject to the constraints in (7). The cost coefficient C > 0 is a hyperparameter that specifies the misclassification penalty and is tuned by the user based on the classification task and dataset characteristics. As in the separable case, the solution to (8) can be shown to have an expansion

$$ w = \sum_i \alpha_i y_i x_i, \qquad (9) $$

where the training instances with $\alpha_i > 0$ are the support vectors of the SVM solution. Note that the penalty term for the slack variables is linear, so the $\xi_i$ disappear when (8) is transformed into the dual formulation

$$ \max_{\alpha} \;\; \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0. \qquad (10) $$
The dual formulation is conveniently very similar to the linearly separable case, the only difference being the extra upper bound C on the coefficients $\alpha_i$. Obviously, as the misclassification penalty $C \to \infty$, (10) converges to the linearly separable case.
1.3 RBF SVMs
1.3.1 General
In general, the RBF kernel is a reasonable first choice.
This kernel nonlinearly maps samples into a higher
dimensional space, so it, unlike the linear kernel, can handle
the case when the relation between class labels and
attributes is nonlinear. Furthermore, the linear kernel is a
special case of the RBF kernel [13], since the linear kernel with a penalty parameter $\tilde{C}$ has the same performance as the RBF
kernel with some parameters (C, γ). In addition, the sigmoid
kernel behaves like RBF for certain parameters [14].
The second reason is the number of hyperparameters
which influences the complexity of model selection. The
polynomial kernel has more hyperparameters than the RBF
kernel. Finally, the RBF kernel has fewer numerical
difficulties. One key point is that $0 < K_{ij} \le 1$, in contrast to polynomial kernels, whose values may go to infinity ($\gamma x_i^T x_j + r > 1$) or zero ($\gamma x_i^T x_j + r < 1$) when the degree is large. Moreover, we must note that the sigmoid kernel is not valid (i.e. not the inner product of two vectors) under some parameters [15]. There are some situations
where the RBF kernel is not suitable. In particular, when the
number of features is very large, one may just use the linear
kernel.
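As a concrete illustration, the following is a minimal Matlab sketch (not part of the original experiments) of the RBF kernel matrix K(i,j) = exp(-γ‖x_i − x_j‖^2) that underlies the discussion above; pdist2 is assumed to be available from the Statistics Toolbox:

% Minimal sketch of the RBF (Gaussian) kernel matrix between two sets of row vectors.
% K(i,j) = exp(-gamma * ||X1(i,:) - X2(j,:)||^2); all values lie in (0, 1].
function K = rbf_kernel(X1, X2, gamma)
    D2 = pdist2(X1, X2).^2;   % squared Euclidean distances (Statistics Toolbox)
    K  = exp(-gamma * D2);    % shrinking gamma flattens K towards all ones
end

For gamma close to 0 the kernel matrix tends to all ones (underfitting), while for very large gamma it tends to the identity (overfitting), which is one more reason why γ has to be tuned.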
1.3.2 Cross-validation and Grid-search
There are two parameters for an RBF kernel: C and γ. It is
not known beforehand which C and γ are best for a given
problem; consequently some kind of model selection
(parameter search) must be done. The goal is to identify
good (C, γ) so that the classifier can accurately predict
unknown data (i.e. testing data). Note that it may not be
useful to achieve high training accuracy (i.e. a classifier
which accurately predicts training data whose class labels
are indeed known). As discussed above, a common strategy
is to separate the data set into two parts, of which one is
considered unknown. The prediction accuracy obtained
from the “unknown” set more precisely reflects the
performance on classifying an independent data set. An
improved version of this procedure is known as cross-
validation [17].
In v-fold cross-validation, we first divide the training set
into v subsets of equal size. Sequentially one subset is tested
using the classifier trained on the remaining v −1 subsets.
Thus, each instance of the whole training set is predicted
once so the cross-validation accuracy is the percentage of
data which are correctly classified.
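With the LIBSVM Matlab interface used in this work, v-fold cross-validation can be obtained directly through the '-v' option, which makes svmtrain return the cross-validation accuracy instead of a model. A small sketch, assuming the training labels and instances are stored in ytrain and Xtrain:

% 5-fold cross-validation accuracy for one (C, gamma) pair with LIBSVM.
% With '-v' in the options, svmtrain returns the CV accuracy (in percent), not a model.
C = 2^3;  gamma = 2^-5;
opts   = sprintf('-t 2 -c %g -g %g -v 5 -q', C, gamma);   % -t 2 selects the RBF kernel
cv_acc = svmtrain(ytrain, Xtrain, opts);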
The cross-validation procedure can prevent the overfitting
problem. Figure 3 represents a binary classification problem
to illustrate this issue. Filled circles and triangles are the
training data while hollow circles and triangles are the
testing data. The testing accuracy of the classifier in Figures
3a and 3b is not good since it overfits the training data. If
we think of the training and testing data in Figure 3a and 3b
as the training and validation sets in cross-validation, the
accuracy is not good. On the other hand, the classifier in 3c
and 3d does not overfit the training data and gives better
cross-validation as well as testing accuracy.
A “grid-search” on C and γ using cross-validation is recommended. Various pairs of (C, γ) values are tried and the one with the best cross-validation accuracy is picked. Trying exponentially growing sequences of C and γ has been found to be a practical method to identify good parameters (for example, C = 2^-5, 2^-3, ..., 2^15, γ = 2^-15, 2^-13, ..., 2^3). The grid-search is straightforward but may seem naive. In fact, there are several advanced methods which can save computational cost by, for example, approximating the cross-validation rate. However, there are two reasons why we prefer the simple grid-search approach [17].
Figure 3 - (a) Training data and an overfitting classifier; (b) applying an overfitting classifier on testing data; (c) training data and a better classifier; (d) applying a better classifier on testing data. An overfitting classifier and a better classifier (● and ▲: training data; O and ∆: testing data).
One is that, psychologically, we may not feel safe to
use methods which avoid doing an exhaustive parameter
search by approximations or heuristics. The other reason
is that the computational time required to find good
parameters by grid-search is not much more than that by
advanced methods since there are only two parameters.
Furthermore, the grid-search can be easily parallelized
because each (C, γ) is independent. Many advanced methods are iterative processes, e.g. walking along a path, which can be hard to parallelize [17].
Since doing a complete grid-search may still be time-consuming, we recommend using a coarse grid first. After identifying a “better” region on the grid, a finer grid search on that region can be conducted. To illustrate this, we do an experiment on the problem german from the Statlog collection [16]. After scaling this set, we first use a coarse grid (Figure 5) and find that the best (C, γ) is (2^3, 2^-5) with a cross-validation rate of 77.5%. Next we conduct a finer grid search on the neighborhood of (2^3, 2^-5) (Figure 6) and obtain a better cross-validation rate of 77.6% at (2^3.25, 2^-5.25). After the best (C, γ) is found, the whole training set is trained again to generate the final classifier.
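A minimal Matlab sketch of this coarse-to-fine search with LIBSVM, assuming the (scaled) training features and labels are stored in Xtrain and ytrain:

% Coarse grid search over (C, gamma) using 5-fold cross-validation accuracy.
best = struct('acc', -Inf, 'C', NaN, 'gamma', NaN);
for log2c = -5:2:15                       % C = 2^-5, 2^-3, ..., 2^15
    for log2g = -15:2:3                   % gamma = 2^-15, 2^-13, ..., 2^3
        opts = sprintf('-t 2 -c %g -g %g -v 5 -q', 2^log2c, 2^log2g);
        acc  = svmtrain(ytrain, Xtrain, opts);      % '-v 5' => CV accuracy in percent
        if acc > best.acc
            best = struct('acc', acc, 'C', 2^log2c, 'gamma', 2^log2g);
        end
    end
end
% A finer grid (e.g. steps of 0.25 in log2) is then searched around (best.C, best.gamma),
% and the final classifier is retrained on the whole training set with the best pair:
final_model = svmtrain(ytrain, Xtrain, sprintf('-t 2 -c %g -g %g -q', best.C, best.gamma));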
The above approach works well for problems with
thousands or more data points. For very large data sets a
feasible approach is to randomly choose a subset of the data
set, conduct grid-search on them, and then do a better-
region-only grid-search on the complete data set.
Figure 5 - Loose grid search on C = 2^-5, 2^-3, ..., 2^15 and γ = 2^-15, 2^-13, ..., 2^3. [17]
Figure 6 - Fine grid search on C = 2^1, 2^1.25, ..., 2^5 and γ = 2^-7, 2^-6.75, ..., 2^-3. [17]
1.4 Dataset Description
Title of Dataset: ISOLET (Isolated Letter Speech
Recognition) [12]
This data set was generated as follows: 150 subjects spoke
the name of each letter of the alphabet twice. Hence, we
have 52 training examples from each speaker. The speakers
are grouped into sets of 30 speakers each, and are referred to
as isolet1, isolet2, isolet3, isolet4, and isolet5. The data
appears in isolet1+2+3+4.data in sequential order, first the
speakers from isolet1, then isolet2, and so on. The test set,
isolet5, is a separate file. Note that 3 examples are missing.
They were dropped due to difficulties in recording. This is a
good domain for a noisy, perceptual task. It is also a very
good domain for testing the scaling abilities of algorithms.
We have formatted the two separate files into one data file (isolet12345.data) for convenience, and we provide it as well.
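A possible Matlab sketch of this file combination (the file names are those used in this paper; the sketch assumes the files contain only comma-separated numeric values, so any non-numeric trailing characters would first have to be stripped):

% Combine the ISOLET training files and the test file into a single data file.
train_part = dlmread('isolet1+2+3+4.data');   % 6238 x 618 (617 features + class label)
test_part  = dlmread('isolet5.data');         % 1559 x 618
all_data   = [train_part; test_part];         % 7797 instances in total
dlmwrite('isolet12345.data', all_data);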
The number of instances from isolet1+2+3+4.data is 6238
and from isolet5.data is 1559. The total number of instances
is 7797. The number of attributes is 617 plus 1 for the class
which is the last column. All attributes are continuous and real-valued, scaled into the range of -1.0 to 1.0. The
features include spectral coefficients, contour features,
sonorant features, pre-sonorant and post-sonorant features.
There are no missing attribute values.
2. EXPERIMENTS
2.1 General Explanation - Linear Experiments
First of all, recall that the given dataset was split into two separate files, so we had to combine them into one. The file combination was verified with a program called WinMerge, an open-source differencing and merging tool for Windows.
A hold-out cross-validation split has been used to divide the dataset into random subsets of data. For this implementation in
Matlab, the «Holdout» parameter is used to set the fraction of data that is left out of the training procedure. Keeping only 10% of the data for training, two kinds of indexes are created, one for the training and one for the testing data. Up to this point, the process serves to determine the total number of instances. The next part contains the selection of the SVM training model, where the parameter C (the misclassification penalty explained in the introduction) is also included, and a vector is created to store the values needed for the experiments.
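A minimal sketch of this split and of the linear-kernel training, assuming the combined data matrix is loaded in a variable named data (last column = class label), cvpartition from the Statistics Toolbox is used for the hold-out split, and svmtrain comes from LIBSVM's Matlab interface:

% Hold-out split: keep only ~10% of the data for training, the rest for testing.
X = data(:, 1:end-1);   y = data(:, end);
cvp      = cvpartition(numel(y), 'HoldOut', 0.9);   % 90% held out for testing
trainIdx = training(cvp);   testIdx = test(cvp);
C = 10;                                             % misclassification penalty
model = svmtrain(y(trainIdx), X(trainIdx,:), sprintf('-t 0 -c %g -q', C));  % -t 0: linear kernel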
After «svmtrain» has been set up, it is «svmpredict»'s turn to take the lead. As returned values, we get the predicted labels and the accuracy, and we ignore the last output, which carries the decision values of the trained model. This step is split into two parts, one for the training and one for the testing data. The results of the training and testing predictions are also printed into a plot with two subplots: the first subplot presents the training data and the other one the testing data.
The whole procedure is encapsulated in a loop of one hundred iterations, so that an average accuracy value can be captured for the training and testing predictions.
Finally, there is one more plot, a semi-logarithmic one which, because of the nature of the C parameter, helps us comprehend the results more easily. This plot compares the mean accuracy with the C parameter.
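A compact sketch of this evaluation loop and of the final semi-logarithmic plot (the variable names are illustrative; the split and training follow the previous sketch):

% Average test accuracy over 100 random hold-out splits, for each value of C.
C_values = [1e-4 1e-3 1e1 1e2];
mean_acc = zeros(size(C_values));
for k = 1:numel(C_values)
    accs = zeros(100, 1);
    for it = 1:100                                    % one hundred iterations
        cvp = cvpartition(numel(y), 'HoldOut', 0.9);  % fresh random 10% training split
        tr = training(cvp);   te = test(cvp);
        model = svmtrain(y(tr), X(tr,:), sprintf('-t 0 -c %g -q', C_values(k)));
        [~, acc, ~] = svmpredict(y(te), X(te,:), model);
        accs(it) = acc(1);                            % acc(1) is the accuracy in percent
    end
    mean_acc(k) = mean(accs);
end
semilogx(C_values, mean_acc, '-o');                   % mean accuracy vs. C on a log axis
xlabel('C');  ylabel('Mean accuracy (%)');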
In the following figures, you can see the visualization of the training and testing procedure for some specific values of C. The X axis represents the number of classes and the Y axis represents the total number of instances that have been used. From the results, we can see that for C=10^-4 we get a very low accuracy of 0.85% (figure 7). For C=10^-3, the accuracy rises to 8.28% (figure 8). For C=10^1 and higher values, our system reaches its highest peak (figure 9), with the accuracy reaching 92.61% (figure 10).
Figure 7 - Training & testing data for C=10^-4
Figure 8 - Training & testing data for C=10^-3
Figure 9 - Training & testing data for C=10^1 and higher values
Figure 10 - Final plot, comparison of accuracy with C
2.2 RBF Experiments
The first step in the SVM-RBF code is to load the dataset and to set values for the parameters C and G (gamma). The initial value of both C and G is set to 100. C is kept fixed, while G changes constantly. Then the data are initialized and the path where the graphs will be stored is defined. A large part of the program consists of the iterations; in this section, cross-validation is implemented. Two pairs of indexes have been made:
- The first pair of indexes contains the instances and the labels of the training data.
- The second pair of indexes contains the instances and the labels of the testing data.
The training of the RBF kernel then starts, followed by the handling of the parameters that have already been set. A series of runs tests the model, and the prediction values, the accuracy and the predicted labels are returned. The elements of the accuracy vector are then inspected and converted into string format. The next part of the program concerns the graphs. A plot is created with two subplots; green circles and red dots are displayed in the subplots. Moreover, a legend is created, including the titles of the graphs, the number of each iteration, the accuracy and the values of the parameters C and G. After that, the output data is temporarily exported into an xls file, the image file type for saving is set, and the kind of information that will appear in each graph (minimum/maximum accuracy) is selected. The most important part of this section is the reduction of G, by dividing it by 1.1 in each rerun. The X axis represents the number of classes and the Y axis represents the total number of instances that have been used.
Finally, there is a procedure that compares the values of G with the accuracy, and this appears in the final graph. Taking a quick look at this graph (figure 14), it turns out that the accuracy reaches very high values, up to 92.71%.
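A minimal sketch of this RBF loop and of the final accuracy-versus-G comparison, reusing the variables of the earlier linear-kernel sketches (X, y, trainIdx, testIdx) and keeping C fixed at 100 while G is divided by 1.1 in every rerun:

% RBF experiments: C fixed, gamma (G) reduced by a factor of 1.1 at each iteration.
C = 100;   G = 100;   n_runs = 100;
test_acc = zeros(n_runs, 1);   G_values = zeros(n_runs, 1);
for it = 1:n_runs
    model = svmtrain(y(trainIdx), X(trainIdx,:), sprintf('-t 2 -c %g -g %g -q', C, G));
    [~, acc, ~] = svmpredict(y(testIdx), X(testIdx,:), model);
    test_acc(it) = acc(1);
    G_values(it) = G;
    G = G / 1.1;                                 % reduce gamma for the next rerun
end
semilogx(G_values, test_acc, '-o');              % final comparison of accuracy with G
xlabel('G (gamma)');   ylabel('Accuracy (%)');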
Figure 11 - Training & testing data for C=10^2 and G=3*10^-1
Figure 12 - Training & testing data for C=10^2 and G=8*10^-2
Figure 13 - Training & testing data for C=10^2, G=9*10^-3 and lower
Figure 14 - Final plot, comparison of accuracy with G
3. COMPARISON OF EXPERIMENTAL RESULTS
In order to show the validity and the accuracy of
classification of our algorithms, we performed a series of
experiments on standard benchmark data-sets. In this series
of experiments, the data were split into training and test sets.
The differences between the algorithms are the following: in the linear kernel, four values have been used for the C parameter, so that the output can be checked. On the other hand, in the radial basis function kernel, the C parameter is kept fixed at 10^2, while the gamma parameter changes constantly from 10^2 down to 8*10^-2.
As far as we can perceive from the two final graphical representations, the results from the two kernels are almost the same. More specifically, both kernels achieve the same level of accuracy, almost 93%.
In the current dataset the data are essentially linearly separable, so we cannot make a real comparison of the two kernels. In a different dataset with non-linear data, however, the radial basis function kernel would generalize much better than the linear kernel.
In the table below, we can observe the results of each algorithm separately and compare them.
Table 1 - Linear kernel in comparison with RBF kernel results (Isolet dataset)

                            Linear Kernel                       RBF Kernel
Instances                   7797                                7797
Attributes                  617                                 617
Train data                  780                                 780
Test data                   7017                                7017
Iterations                  100                                 100
C                           10^-4 / 10^-3 / 10^1 / 10^2         10^2
G                           -                                   8*10^-2 - 10^2
Accuracy (%)                0.85 / 8.28 / 92.61                 0.85 / 8.49 / 92.73
Decision boundary           Linear                              Nonlinear
Related distance function   Euclidean distance                  Euclidean distance
Regularization [18]         Training-set cross-validation       Training-set cross-validation
                            to select C (defining the           to select C and γ (defining
                            misclassification penalty)          the RBF width)
4. REFERENCES
[1] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, pp. 273–297, 1995.
[2] C.-W. Hsu and C.-J. Lin, “A comparison of methods for multi-
class support vector machines,” IEEE Transactions on Neural
Networks, vol. 13, no. 2, pp. 415–425, 2002.
[3] T. Joachims, “Text categorization with support vector
machines: learning with many relevant features,” in Proceedings of
ECML-98, 10th European Conference on Machine Learning, 1998,
number 1398, pp. 137–142.
[4] S. Ertekin, “Learning in Extreme Conditions: Online and
Active Learning with Massive, Imbalanced and Noisy Data,”
Citeseer, 2009.
[5] R. S. Shah, “Support Vector Machines for Classification and
Regression,” McGill University, 2007.
[6] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support
vector machines, 2001, Software available at
http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[7] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee,
“Choosing multiple parameters for support vector machines,”
Machine Learning, vol. 46, pp. 131–159, 2002.
[8] S. Sathiya Keerthi, “Efficient tuning of SVM hyperparameters
using radius/margin bound and iterative algorithms,” IEEE
Transactions on Neural Networks, 2002.
[9] K. Duan, S. S. Keerthi, and A. N. Poo, “Evaluation of simple
performance measures for tuning SVM hyperparameters,”
Neurocomputing, 2002.
[10] D. DeCoste and K. Wagstaff, “Alpha seeding for support
vector machines,” in Proceedings of International Conference on
Knowledge Discovery and Data Mining (KDD-2000), 2000.
[11] Y.-Y. Ou, C.-Y. Chen, S.-C. Hwang, and Y.-J. Oyang, “Expediting model selection for support vector machines based on data reduction,” in Systems, Man and Cybernetics, 2003 IEEE International Conference on, 2003, vol. 1, pp. 786–791.
[12] UCI Machine Learning Repository, http://www.ics.uci.edu/mlearn/MLRepository.html
[13] S. S. Keerthi and C.-J. Lin, “Asymptotic behaviors of support vector machines with Gaussian kernel,” Neural Computation, 15(7):1667–1689, 2003.
[14] H.-T. Lin and C.-J. Lin. A study on sigmoid kernels for SVM
and the training of non-PSD kernels by SMO-type methods.
Technical report, Department of Computer Science, National
Taiwan University, 2003.
[15] V. Vapnik. The Nature of Statistical Learning Theory.
Springer-Verlag, New York, NY, 1995.
[16] D. Michie, D. J. Spiegelhalter, C. C. Taylor, and J. Campbell,
editors. Machine learning, neural and statistical classification.
Ellis Horwood, Upper Saddle River, NJ, USA, 1994. ISBN 0-13-
106360-X. Data available at http://archive.ics.uci.edu/ml/machine-
learning-databases/statlog/
[17] Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin. A
Practical Guide to Support Vector Classification. National Taiwan
University, Taipei 106, Taiwan. Last updated: April 15, 2010.
[18] M. Misaki, Y. Kim, P. A. Bandettini, and N. Kriegeskorte,
“Comparison of multivariate classifiers and response
normalizations for pattern-information fMRI,” Neuroimage, vol.
53, no. 1, pp. 103–118, Oct. 2010.