512 IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 13, NO. 4, JULY 2009
Heartbeat Time Series Classification With Support
Vector Machines
Argyro Kampouraki, George Manis, and Christophoros Nikou, Member, IEEE
Abstract—In this study, heartbeat time series are classified using support vector machines (SVMs). Statistical methods and signal analysis techniques are used to extract features from the signals. The SVM classifier is favorably compared to other neural network-based classification approaches by performing leave-one-out cross validation. The performance of the SVM with respect to other state-of-the-art classifiers is also confirmed by the classification of signals presenting very low signal-to-noise ratio. Finally, the influence of the number of features on the classification rate was also investigated for two real datasets. The first dataset consists of long-term ECG recordings of young and elderly healthy subjects. The second dataset consists of long-term ECG recordings of normal subjects and subjects suffering from coronary artery disease.
Index Terms—Feature extraction, heartbeat time series, heart
rate variability (HRV), support vector machine (SVM).
I. INTRODUCTION
Heart rate variability (HRV) analysis is based on measuring the variability of heart rate signals, and more specifically, variations per unit of time of the number of heartbeats (also referred to as the RR interval, since it is the time interval between consecutive R points of the QRS complex of the electrocardiogram). A large value of this index reveals a complicated system that can respond better to a wide variety of conditions. Thus, a healthy person usually presents large values of HRV, while a decreased value may indicate pathological cases. HRV analysis has gained significant clinical attention, as can be seen from the large number of research efforts of the past two decades. Several categorizations for heart rate variability measures have been proposed. Guidelines for standards are summarized in [1], a summary of measures and models is presented in [2], and a review examining the physiological origins and mechanisms of heart rate may be found in [3].
Several techniques have been proposed for the investigation of
HRV time series. Among them, spectral methods [4] based on
fast Fourier transform (FFT) or standard autoregressive mod-
eling, nonlinear approaches, including Markov modeling [5],
entropy-based metrics [6], the mutual information measure [7],
and probabilistic modeling [8] are widely used. The application of the Karhunen–Loève transformation [9] and modulation analysis [10] have also been considered.
Artificial intelligence and machine learning methods consti-
tute a powerful tool in HRV analysis. Radial basis function
Manuscript received November 11, 2007; revised April 14, 2008. First
published August 4, 2008; current version published July 6, 2009.
The authors are with the Department of Computer Science, University of Ioannina, 45110 Ioannina, Greece (e-mail: akampour@cs.uoi.gr; manis@cs.uoi.gr; cnikou@cs.uoi.gr).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org
Digital Object Identifier 10.1109/TITB.2008.2003323
networks were applied for learning and predicting the HRV
dynamics [11]. In [12], neural networks were used as a pre-
diction and approximation tool for HRV analysis and the mean
prediction error was used as a HRV index. In [13], coronary
disease risk was predicted based on short-term RR interval
measurements.
In this study, we investigate the potential benefit of using
support vector machine (SVM) learning [14], [15] to classify
heart rate signals. In order to enforce independence of the results
of our study to the recordings and their physiological conditions,
we have experimented with two different datasets. The first
dataset is available on the Web [16] and consists of recordings
acquired from young and elderly subjects. The second dataset
consists of normal subjects and subjects suffering from coronary
artery disease [17].
Support vector classifiers are based on recent advances in
statistical learning theory [18]. They use a hypothesis space
of linear functions in a high-dimensional feature space, trained
with a learning algorithm from optimization theory that imple-
ments a learning bias derived from statistical learning theory.
In the last decade, SVM learning has found a wide range of
applications [19], including image segmentation [20] and clas-
sification [21], object recognition [22], image fusion [23], and
stereo correspondence [24]. More recently, SVMs have been
employed in several applications in biomedicine: gait degen-
eration due to age [25], EEG signal classification [26], brain
computer interfacing (BCI) [27], [28], analysis and prediction
of scoliosis [29], [30], electrogastrogram analysis [31], and
color Doppler echocardiography [32]. Relevant studies involv-
ing SVM and heart rate time series are the Hermite charac-
terization of QRS complex [33], where a heartbeat is char-
acterized as normal or abnormal and the detection of risky
situations for fetal assessment [34]. Previous research of our
group on HRV was presented in [35] and [36]. This study in-
vestigates the problem further, examines a larger number of
HRV computation methods, extracts more features, and uses
different datasets. It also compares SVM with other neural
network-based classifiers and examines their robustness to noisy
data.
Heart rate variability analysis is applied to the estimation of
the autonomic nervous balance, to the estimation of the stress or
relaxation condition, and to the evaluation of mental or physio-
logical workload. All of these issues may be handled using clas-
sification methods. In machine learning theory, SVMs are con-
sidered as being the state-of-the-art classifier. Therefore, they
could be employed in such situations as the problem addressed
here. Since they have not been used for the aforesaid cases, their
performance is still an open issue.
1089-7771/$25.00 © 2009 IEEE
Authorized licensed use limited to: Illinois Institute of Technology. Downloaded on September 3, 2009 at 10:29 from IEEE Xplore. Restrictions apply.
The rest of the paper is structured as follows: Section II
presents a background on SVMs and Section III discusses the
most common HRV features that are the input to the SVM classi-
fier. Description of the datasets employed in our experimentation
is provided in Section IV, numerical experiments are presented
in Section V, and conclusions are drawn in Section VI.
II. BACKGROUND ON SVMS
Support vector learning strategy is a principled and very pow-
erful method that has outperformed most other systems in a
wide variety of applications [19]. The learning machine is given
a training set of examples (or inputs), belonging to two classes,
with associated labels (or output values). The examples are in
form of attribute vectors and the SVM finds the hyperplane
separating the input data and being furthest from both convex
hulls. If the data are not linearly separable, a set of slack vari-
ables is introduced representing the amount by which the linear
constraint is violated by each data point.
In this study, we are concerned with a two-class pattern classification problem. Let vector $\mathbf{x} \in \mathbb{R}^n$ denote a pattern to be classified and let scalar $y$ denote its class ($y \in \{\pm 1\}$). Also let $\{(\mathbf{x}_i, y_i),\ i = 1, \ldots, l\}$ denote a set of $l$ training examples. The problem is how to construct a decision function $f(\mathbf{x})$ that correctly classifies an input pattern that is not necessarily in the training set.
A. Linear SVM Classifiers
If the training patterns are linearly separable, there exists a linear function of the form
$$f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b \qquad (1)$$
such that $y_i f(\mathbf{x}_i) \geq 0$, or $f(\mathbf{x}_i) \geq 0$ for $y_i = +1$ and $f(\mathbf{x}_i) < 0$ for $y_i = -1$. Vector $\mathbf{w}$ and scalar $b$ represent the hyperplane $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = 0$ separating the two classes.
While there may exist many hyperplanes separating the two classes, the SVM classifier finds the hyperplane that maximizes the separating margin between the two classes [14], [15]. This hyperplane can be found by minimizing the cost function
$$J(\mathbf{w}) = \frac{1}{2}\mathbf{w}^T\mathbf{w} = \frac{1}{2}\|\mathbf{w}\|^2 \qquad (2)$$
subject to the separability constraints
$$y_i(\mathbf{w}^T\mathbf{x}_i + b) \geq 1, \quad i = 1, \ldots, l. \qquad (3)$$
If the training data are not completely separable by a hyperplane, a set of slack variables $\xi_i \geq 0$, $i = 1, \ldots, l$ is introduced that represents the amount by which the linearity constraint is violated:
$$y_i(\mathbf{w}^T\mathbf{x}_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, l. \qquad (4)$$
In that case, the cost function is modified to take into account the extent of the constraint violations. Hence, the function to be minimized becomes
$$J(\mathbf{w}) = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{l}\xi_i \qquad (5)$$
subject to the constraints in (4). Here, $C$ weighs the significance of the constraint violations with respect to the distance between the points and the hyperplane, and $\boldsymbol{\xi}$ is a vector containing the slack variables. The cost function in (5) is called the structural risk and represents a tradeoff between the empirical risk (the training errors, reflected by the second term) and the model complexity (the first term) [37]. The purpose of using model complexity to constrain the optimization of empirical risk is to avoid overfitting, a situation in which the decision boundary corresponds too closely to the training data, and thereby fails to perform well on data outside the training set.
The problem in (5) with the constraints in (4) can be solved by introducing Lagrange multipliers. With some manipulation, it can be shown that the vector $\mathbf{w}$ is formed by a linear combination of the training vectors
$$\mathbf{w} = \sum_{i=1}^{l}\alpha_i y_i \mathbf{x}_i \qquad (6)$$
where $\alpha_i \geq 0$, $i = 1, \ldots, l$ are the Lagrange multipliers associated with the constraints in (4). The Lagrange multipliers are solved for in the dual problem of (5), which is expressed as
$$\max_{\alpha_i}\ \sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j \qquad (7)$$
subject to the constraints
$$\alpha_i \geq 0, \quad \sum_{i=1}^{l}\alpha_i y_i = 0. \qquad (8)$$
The cost function to be maximized in (7) is convex and quadratic with respect to the unknown parameters $\alpha_i$, and in practice, it is solved numerically through quadratic programming.
Note that only a few parameters $\alpha_i$ will be nonzero at the solution. The corresponding training vectors $\mathbf{x}_i$ are called support vectors. Vector $\mathbf{w}$ is computed from (6), while scalar $b$ is computed from $y_i(\mathbf{w}^T\mathbf{x}_i + b) = 1$ for any support vector. The classification of a vector $\mathbf{x}$ outside the training set is performed by
$$f(\mathbf{x}) = \mathrm{sign}\left(\sum_{i=1}^{l}\alpha_i y_i \mathbf{x}^T\mathbf{x}_i + b\right). \qquad (9)$$
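As an illustration, the decision rule in (9) can be checked numerically. The sketch below is our own (the paper specifies no implementation): it assumes scikit-learn and toy two-dimensional data, trains a linear SVM, and verifies that $\mathrm{sign}(\mathbf{w}^T\mathbf{x} + b)$ reproduces the fitted classifier's predictions.

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for feature vectors: two linearly separable clouds.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# A linear SVM solves the dual problem (7)-(8); C bounds the multipliers.
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# w is the combination of support vectors in (6); b is the intercept.
w, b = clf.coef_[0], clf.intercept_[0]

# Decision rule (9): sign(w^T x + b) matches the fitted classifier.
pred = np.sign(X @ w + b).astype(int)
```

Since the two clouds are well separated, every training point ends up on the correct side of the hyperplane.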
B. Kernel-based SVM Classifiers
For many datasets, it is unlikely that a hyperplane will yield
a good classifier. Instead, we want a decision boundary with
more complex geometry. One way to achieve this is to map
the attribute vector into some new space of higher dimension-
ality and look for a hyperplane in that new space, leading to
kernel-based SVMs [38], [39]. The interesting point about ker-
nel functions is that although classification is accomplished in
a space of higher dimension, any dot product between vectors
involved in the optimization process can be implicitly computed
in the low-dimensional space [15].
Let $\Phi(\cdot)$ be a nonlinear operator mapping the input vector $\mathbf{x}$ to a higher dimensional space. The optimization problem for the new points $\Phi(\mathbf{x})$ becomes
$$\min\ J(\mathbf{w}) = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{l}\xi_i \qquad (10)$$
subject to the constraints
$$y_i(\mathbf{w}^T\Phi(\mathbf{x}_i) + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, l. \qquad (11)$$
Following the same principles as in the linear case, we note that the only form in which the mapping appears is in terms of $K(\mathbf{x}_i, \mathbf{x}_j) = \Phi^T(\mathbf{x}_i)\Phi(\mathbf{x}_j)$. That is, the mapping appears only implicitly through the kernel function $K(\cdot,\cdot)$. There are a variety of possible kernels. However, when choosing a kernel, it is necessary to check that it is associated with the inner product of some nonlinear mapping [37]. Some typical choices for kernels are polynomials and radial basis functions.
Finally, the dual problem to be solved is
$$\max_{\alpha_i}\ \sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) \qquad (12)$$
subject to the constraints $\alpha_i \geq 0$ and $\sum_{i=1}^{l}\alpha_i y_i = 0$, and the classifier becomes
$$f(\mathbf{x}) = \mathrm{sign}\left(\sum_{i=1}^{l}\alpha_i y_i K(\mathbf{x}, \mathbf{x}_i) + b\right). \qquad (13)$$
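A brief sketch of (12)–(13) with a Gaussian kernel, again assuming scikit-learn and synthetic data of our own choosing: the decision function is rebuilt explicitly from the support vectors, their coefficients $\alpha_i y_i$, and the bias $b$, and checked against the library's output.

```python
import numpy as np
from sklearn.svm import SVC

# A center-vs-ring problem that no single hyperplane in the input space solves.
rng = np.random.default_rng(1)
r = np.hstack([rng.uniform(0.0, 1.0, 50), rng.uniform(2.0, 3.0, 50)])
t = rng.uniform(0.0, 2.0 * np.pi, 100)
X = np.column_stack([r * np.cos(t), r * np.sin(t)])
y = np.array([1] * 50 + [-1] * 50)

gamma = 0.5
clf = SVC(kernel="rbf", C=10.0, gamma=gamma).fit(X, y)

# Rebuild f(x) from (13): sum_i alpha_i y_i K(x, x_i) + b over support vectors.
sv = clf.support_vectors_
coef = clf.dual_coef_[0]  # holds alpha_i * y_i for each support vector
K = np.exp(-gamma * ((X[:, None, :] - sv[None, :, :]) ** 2).sum(-1))
f = K @ coef + clf.intercept_[0]
```

The kernel expansion agrees with `clf.decision_function` to machine precision, even though the decision boundary is a closed curve rather than a hyperplane.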
III. HEART RATE VARIABILITY FEATURES
First, we describe how the RR intervals are obtained by a QRS detection algorithm [40]. Then, the results of the most commonly used HRV analysis methods [1], [2] were selected as features for the Gaussian kernel-based SVM classifier.
A. RR Detection Algorithm
In the first step, the algorithm [40] passes the signal through
a low-pass and a high-pass filter in order to reduce the influence
of the muscle noise, the power line interference, the baseline
wander, and the T-wave interference. After filtering, the signal is differentiated to provide the QRS slope information, and it is squared, making all data points positive and emphasizing the higher frequencies.
After squaring, the algorithm performs sliding window in-
tegration in order to obtain the waveform feature. A temporal
location of the QRS is marked from the rising edge of the inte-
grated waveform. In the last step, two thresholds are adjusted.
The higher of the two thresholds identifies peaks of the signal.
The lower threshold is used when no peak has been detected by
the higher threshold in a certain time interval. In this case, the
algorithm has to search back in time for a lost peak. When a
new peak is identified (as a local maximum–change of direction
within a predefined time interval), then this peak is classified as
a signal peak if it exceeds the high threshold (or the low thresh-
old if we search back in time for a lost peak) or as a noise peak
otherwise. In order to detect a QRS complex, both the integrated waveform and the filtered signal are investigated, using different values for the aforesaid thresholds. To be identified as a QRS complex, a peak must be recognized as a QRS in both the integrated and the filtered waveforms.
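The steps above (band-pass filtering, differentiation, squaring, moving-window integration, and thresholding) can be sketched as follows. This is a simplified, hypothetical stand-in for the algorithm of [40]: the synthetic spike-train "ECG", the 5–15 Hz band, and the single fixed threshold are our assumptions, not the paper's implementation (which uses an adaptive two-threshold search-back).

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 250  # Hz, the sampling rate of Data Set I

# Hypothetical "ECG": flat noisy baseline with sharp R spikes every 0.8 s.
t = np.arange(0.0, 10.0, 1.0 / fs)
ecg = 0.05 * np.random.default_rng(2).normal(size=t.size)
r_true = np.arange(int(0.4 * fs), t.size, int(0.8 * fs))
ecg[r_true] += 1.0

# 1) Band-pass (an illustrative 5-15 Hz band) against baseline wander and noise.
b, a = butter(2, [5.0 / (fs / 2), 15.0 / (fs / 2)], btype="band")
filtered = lfilter(b, a, ecg)

# 2) Differentiate for the QRS slope, 3) square to make all points positive
#    and emphasize the higher frequencies.
squared = np.diff(filtered) ** 2

# 4) Moving-window integration yields the waveform feature.
win = int(0.15 * fs)
integrated = np.convolve(squared, np.ones(win) / win, mode="same")

# 5) Threshold detection; one fixed threshold replaces the adaptive scheme.
above = integrated > 0.5 * integrated.max()
edges = np.flatnonzero(above[1:] & ~above[:-1])  # rising edges mark QRS onsets
rr = np.diff(edges) / fs                          # RR intervals in seconds
```

On this toy signal the recovered RR intervals cluster around the true 0.8 s beat period.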
B. Statistical HRV Features
Let $x_i$, $i = 1, 2, \ldots, N$ be the series of the RR intervals (the time interval between consecutive R points of the QRS) of a heartbeat signal. Let, also, $\bar{x}$ be the RR signal mean value. The statistical methods considered in this study are:
1) the standard deviation
$$\mathrm{sdnn} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$
2) the standard deviation of mean values of intervals (sdann), defined as the standard deviation of the mean values of successive equal-sized window intervals. A typical value of the window size is 5 min of recording;
3) the root-mean-square of successive differences
$$\mathrm{rmssd} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N-1}(x_{i+1} - x_i)^2}$$
4) the mean standard deviation of intervals (sdnni), described by the mean standard deviation of successive equal-sized window intervals, in a way similar to sdann. Again, we define a window size, but this time we first calculate the standard deviation for every successive window and then compute the mean value of the standard deviations;
5) the percentage of differences greater than $x$ (pNNx), which calculates the percentage of the differences between successive samples that are greater than a given value $x$;
6) the standard deviation of differences
$$\mathrm{sdsd} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(dx_i - \overline{dx})^2}$$
where $dx_i = x_{i+1} - x_i$, $x_i$ is a sample point, $\overline{dx}$ is the mean value of all $dx_i$, and $N$ is the total number of $dx_i$ intervals;
7) the autocorrelation
$$\mathrm{corr}(\tau) = \frac{\sum_{i=1}^{N-\tau}(x_i - \bar{x})(x_{i+\tau} - \bar{x})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}$$
where $\tau$ is a time lag;
8) the Shannon entropy
$$\mathrm{entr} = -\sum_{i=1}^{B}f_i \log f_i$$
where $f_i$ is the relative frequency of the $i$th bin of the RR intervals histogram. The RR intervals are quantized into $B$ bins that span the range $[0, \max\{x_i\}]$.
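The statistical indices above map directly onto a few lines of NumPy. The sketch below is a minimal illustration, assuming RR intervals in milliseconds; the bin count, the pNNx threshold of 50 ms, the lag $\tau = 5$, and the toy sinusoidal series are our illustrative choices.

```python
import numpy as np

def hrv_features(x, pnn_ms=50.0, bins=16, tau=5):
    """Statistical HRV indices for an RR-interval series x (assumed in ms)."""
    x = np.asarray(x, float)
    d = np.diff(x)                                # successive differences dx_i
    m = x.mean()
    sdnn = x.std()                                # standard deviation of RR
    rmssd = np.sqrt(np.mean(d ** 2))              # RMS of successive differences
    sdsd = d.std()                                # SD of successive differences
    pnnx = 100.0 * np.mean(np.abs(d) > pnn_ms)    # % of |dx_i| above pnn_ms
    corr = ((x[:-tau] - m) * (x[tau:] - m)).sum() / ((x - m) ** 2).sum()
    # Shannon entropy of the RR histogram over B bins spanning [0, max(x)].
    f, _ = np.histogram(x, bins=bins, range=(0.0, x.max()))
    f = f / f.sum()
    entr = -np.sum(f[f > 0] * np.log(f[f > 0]))
    return dict(sdnn=sdnn, rmssd=rmssd, sdsd=sdsd, pnnx=pnnx, corr=corr, entr=entr)

# Toy RR series: 800 ms baseline with a slow 60 ms oscillation.
feats = hrv_features(800.0 + 60.0 * np.sin(np.arange(300) / 10.0))
```

For this smooth series the beat-to-beat differences stay well below 50 ms, so pNN50 is zero while sdnn remains large.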
C. Prediction-Based HRV Feature
1) The local linear prediction (llp) [17], [41] is a simple autoregressive prediction method in which future samples of a time series $x_1, x_2, \ldots, x_i, \ldots, x_N$ are predicted by using a linear combination of the previous $k$ samples:
$$\hat{x}_i = \frac{1}{k}\sum_{j=i-k}^{i-1}x_j.$$
The index is calculated as the mean value of the absolute differences between predicted and actual values:
$$\mathrm{llp} = \frac{1}{N-k}\sum_{i=k+1}^{N}|\hat{x}_i - x_i|.$$
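Under these definitions, the llp index reduces to the mean absolute error of a $k$-sample moving-average predictor; a minimal sketch (the window size and the toy inputs are illustrative):

```python
import numpy as np

def llp(x, k=8):
    """Mean absolute error of the k-sample moving-average predictor."""
    x = np.asarray(x, float)
    # Prediction for x_i: mean of the k previous samples x_{i-k}..x_{i-1}.
    pred = np.convolve(x, np.ones(k) / k, mode="valid")[:-1]
    return np.mean(np.abs(pred - x[k:]))
```

A constant series yields an index of zero, while for a linear ramp the moving-average predictor lags its target by $(k+1)/2$ samples, so the index is 4.5 for $k = 8$.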
D. Wavelet HRV Features
1) The signal is decomposed by the discrete wavelet transform using the Haar wavelet, whose mother function is simply a step function. The standard deviation of the detail coefficients (representing the high-frequency content of the signal) at each scale of analysis is computed. A detailed description of the method is given in [42].
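A minimal sketch of this feature, implementing the Haar decomposition directly in NumPy rather than with a wavelet library; the $\sqrt{2}$ normalization and the handling of odd-length signals are our assumptions, not details taken from [42].

```python
import numpy as np

def haar_detail_std(x, levels=7):
    """Std of the Haar detail coefficients at each of the first `levels` scales."""
    x = np.asarray(x, float)
    feats = []
    for _ in range(levels):
        n = (len(x) // 2) * 2                     # drop a trailing odd sample
        even, odd = x[0:n:2], x[1:n:2]
        detail = (even - odd) / np.sqrt(2.0)      # high-pass (detail) branch
        x = (even + odd) / np.sqrt(2.0)           # low-pass branch, next scale
        feats.append(detail.std())
    return np.array(feats)

# For unit-variance white noise, the detail coefficients at the first
# scale should themselves have a standard deviation close to one.
f = haar_detail_std(np.random.default_rng(7).normal(size=1024))
```

With 1024 input samples, seven levels of decomposition are available, matching the seventh scale used later in the experiments.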
IV. MATERIALS AND METHODS
We applied the Gaussian kernel-based SVM classification
to two different datasets. The first dataset consists of long-
term ECG recordings, in which twenty young (21–34 years
old) and twenty elderly (68–85 years old) rigorously screened
healthy subjects underwent 120 min of continuous supine resting
while continuous electrocardiographic signals were collected.
Each subgroup of subjects includes equal numbers of men and
women. All subjects remained in a resting state in sinus rhythm
while watching the movie Fantasia (Disney, 1940) to help main-
tain wakefulness. The continuous ECG signals were digitized
at 250 Hz. Each heartbeat was annotated using an automated
arrhythmia detection algorithm, and each beat annotation was
verified by visual inspection. The data are available and further
described in [16] and [43]. We refer to these recordings as Data
Set I.
The second dataset consists of long-term ECG recordings (approximately 2 h long) of six normal subjects and six subjects suffering from coronary artery disease [17]. The normal subjects were young males aged 25–29 years, with unremarkable medical histories and normal physical examinations. All of the subjects were nonsmokers, received no drugs, and abstained from caffeine for 24 h prior to acquisition. The recordings were performed in a controlled environment that was similar for all patients. The patient subjects were hospitalized, had
one- or two-vessel coronary disease, which was angiographi-
cally confirmed, and normal left ventricular function (defined as
an ejection fraction greater than or equal to 50%). Subjects with
a history of myocardial infarction, coronary angioplasty or by-
pass grafting, cardiac rhythm disturbances, left ventricular dys-
function (defined as an ejection fraction less than 50%), severe
Fig. 1. Features computed for the heartbeat signals of Data Set I.
Fig. 2. Features computed for the heartbeat signals of Data Set II. N indicates normal subjects and C indicates subjects suffering from coronary artery disease.
arterial hypertension, and medical conditions affecting heart rate
variability (e.g., diabetes mellitus, hormonal disturbances, treat-
ment with psychotropic drugs, and respiration diseases) were
excluded. All patient subjects were undergoing treatment with
nitrates, angiotensin converting enzyme inhibitors, salicylics,
and calcium antagonists. No one had a history of stroke, periph-
eral vascular disease, or clinically significant valvular abnormal-
ities. Their ages were 42–52 years old. To guarantee that valid
and precise data were acquired, a cardiologist was present to
ensure that all preparation and procedure details during electro-
cardiogram acquisition were followed properly. All recordings
were performed in a quiet room, between the hours of 15.00
and 17.00, in the supine position under continuous monitoring
by the cardiologist who confirmed the absence of any cardiac
rhythm disturbances throughout the recording. Subjects were
told to breathe normally and an attempt was made to maintain
the respiratory rate at around 12/min. The continuous ECG sig-
nals were digitized at 300 Hz. We refer to these recordings as
Data Set II.
In HRV analysis methods, a set of parameters should be spec-
ified for the feature extraction processes. This is nontrivial and
usually a data-dependent task. We have selected these param-
eters after a trial and error procedure. In the performed ex-
periments, the window size for both sdann and sdnni is set to
5 min of recording, as suggested in [1]. The llp has been applied with a window size equal to 8, as suggested in [17]. The $\tau$ parameter of the autocorrelation function has been set to 5 beats. The Haar wavelet was applied to extract multiresolution features. Since the seventh scale of analysis produced the clearest separation compared to all other scales of analysis, we selected this scale as the output of the method.
V. NUMERICAL EXPERIMENTS
To motivate the use of vector-valued features and nonlinear
classification using SVM, we present in Figs. 1 and 2 the features
computed for the signals of the two datasets. Also, in Table I, the
p-values of the features of the datasets used in the experiments
are presented. As can be easily observed, although the features
are statistically significant (small p-values), the signals may not
be classified correctly by simple thresholding.
In our evaluation framework, we also compared the SVM
approach with the learning vector quantization (LVQ) neural
network [44] and a backpropagation neural network using the
Levenberg–Marquardt (LM) minimization algorithm [45].
LVQ is an autoassociative nearest-neighbor classifier that classifies arbitrary patterns into classes using an error-correction encoding procedure related to competitive learning. The
main idea is to cover the input space of samples with code-
book vectors, each representing a region labeled with a class. A
codebook vector can be seen as a prototype of a class member,
localized in the center of a class or decision region (Voronoi
cell) in the input space. A class can be represented by an ar-
bitrary number of codebook vectors, but one codebook vector
represents one class only.
Neural network minimization problems are often very ill-conditioned because of a nearly singular Hessian matrix. For such problems, the LM algorithm is often a good choice. It approximates the Hessian by a product of Jacobians, where the Jacobian is the matrix of first derivatives of the errors with respect to the weights. The Jacobian is much less complex to compute, and the approximation speeds up the minimization algorithm.
Let us also notice that SVM has the computational cost of a
quadratic programming optimization algorithm that is not pro-
hibitive if the vector dimensions are not very large, as is the case
in our experiments.
A first experiment consists in applying the SVM classifier
to feature vectors generated by the signals of Data Set I and
Data Set II. In that framework, leave-one-out cross validation
was implemented for all of the compared methods (SVM, LVQ,
and LM). The classifier was trained with all but one signal
and the remaining signal was used for testing the classifier. The
procedure is repeated in a cyclic way until all of the signals have
been used for testing. The percentage of correctly classified
signals for the SVM, the LVQ, and the LM neural networks
are summarized in Table II. As can be seen, SVM achieves to
correctly differentiate young and elderly subjects at 100% for
Data Set I. The same stands for Data Set II, where healthy
and pathological cases are correctly classified with the SVM
while the other methods present weaker performances. This is
an important conclusion that reveals the efficiency of the SVM
methodology for heart rate variability signal classification.
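The leave-one-out protocol described above can be sketched as follows, assuming scikit-learn and synthetic stand-ins for the HRV feature vectors (the paper's actual features, kernel parameters, and data are not reproduced here).

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

# Hypothetical stand-ins: 20 + 20 subjects, 10 HRV-like features each.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, (20, 10)), rng.normal(2.0, 1.0, (20, 10))])
y = np.array([0] * 20 + [1] * 20)

# Train on all signals but one, test on the held-out signal, cycle through all.
correct = 0
for train, test in LeaveOneOut().split(X):
    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X[train], y[train])
    correct += int(clf.predict(X[test])[0] == y[test][0])
rate = 100.0 * correct / len(y)
```

With 40 signals, this yields 40 training/testing cycles; `rate` is the percentage of correctly classified held-out signals, as reported in Table II.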
A second experiment consists in investigating the robustness to noise of the SVM classifier. The whole set of the original signals (the original heartbeat time series) was corrupted by zero-mean white Gaussian noise. The standard deviation of the noise was selected appropriately in order to obtain SNRs between $-5$ and 0 dB. Attribute vectors were created from the degraded signals and the classifiers were trained with a total number of 500 signals for each SNR level. Two hundred new test signals were then created by the same procedure for each SNR level and the classifiers were evaluated on them. Table III summarizes the performances of the compared algorithms, where the SVM classifier clearly outperforms the other methods. Let us notice that when a significant amount of noise is added to the signal, the statistical attributes totally misclassify the signals by simple thresholding.
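Corrupting a signal at a prescribed SNR can be sketched as follows; the helper name and the power-based definition $\mathrm{SNR} = 10\log_{10}(P_s/P_n)$ are our assumptions about the procedure.

```python
import numpy as np

def add_noise(signal, snr_db, rng=None):
    """Corrupt `signal` with zero-mean white Gaussian noise at a target SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)  # SNR = 10 log10(Ps / Pn)
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
```

At 0 dB the noise power equals the signal power, which is the hardest setting considered in the experiment.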
We have also investigated the relation between the number
of features and the classification performance which is a first
step toward feature selection. To this end, we have randomly
selected 20 signals of Data Set I (10 young and 10 elderly)
and used them as a training set for the SVM classifier. The
remaining signals (10 young and 10 elderly) were used as a test
set. At first, we removed one feature from the feature vectors and performed both training and test phases with the remaining 9 features. This was done for all possible combinations of 9 features chosen out of 10 (10 configurations in total). Then, the same experiment was repeated by omitting 2 features out of 10, resulting in 45 configurations. This procedure was repeated until only 4 features out of 10 were kept (210 configurations).
In every configuration, the values of Recall and Precision were computed. Recall and Precision are defined as
$$\mathrm{Recall} = \frac{tp}{tp + fn}, \qquad \mathrm{Precision} = \frac{tp}{tp + fp} \qquad (14)$$
where $tp$, $fp$, and $fn$ are the true positive, false positive, and false negative classification results for the examined signals. More specifically, Recall shows the percentage of the ground truth that was retrieved, and Precision represents the percentage of the retrieved signals that were relevant (i.e., correctly classified).
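The feature-subset sweep and the metrics in (14) can be sketched together, assuming scikit-learn and synthetic feature vectors in place of the paper's HRV features; the class separation and kernel defaults below are illustrative choices.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

# Hypothetical feature vectors: 20 subjects per class, 10 features each.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1.0, (20, 10)), rng.normal(1.5, 1.0, (20, 10))])
y = np.array([0] * 20 + [1] * 20)
train = np.r_[0:10, 20:30]          # 10 signals per class for training
test = np.r_[10:20, 30:40]          # the remaining signals for testing

def recall_precision(y_true, y_pred, positive=1):
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    return tp / (tp + fn), tp / (tp + fp)  # Recall, Precision as in (14)

# Leave one feature out at a time: all C(10, 9) = 10 subsets of 9 features.
scores = []
for cols in combinations(range(10), 9):
    cols = list(cols)
    clf = SVC(kernel="rbf").fit(X[np.ix_(train, cols)], y[train])
    scores.append(recall_precision(y[test], clf.predict(X[np.ix_(test, cols)])))
```

Replacing the subset size 9 with 8, ..., 4 reproduces the 45 to 210 configurations enumerated above.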
A statistical representation of the results is presented in Fig. 3. As can be observed, leaving out features gradually deteriorates the Recall measure, while Precision is less affected. The same experiment was repeated for the case of a noisy signal with an SNR of 2 dB, and the respective results are presented in Fig. 4. In that case, the Precision measure provides less coherent classification rates, as is intuitively expected.
In order to study the relationship between the number of training signals and the classification performance, one has to perform a very large number of experiments. In our case, we have performed leave-one-out cross validation and a second experiment,
TABLE I
p-VALUES OF THE FEATURES OF THE DATASETS USED IN THE EXPERIMENTS
TABLE II
PERCENTAGE OF CORRECTLY CLASSIFIED SIGNALS USING LVQ, LM, AND SVM CLASSIFIERS WITH LEAVE-ONE-OUT CROSS VALIDATION. FOR EACH SIGNAL, A FEATURE VECTOR WAS CREATED WITH COMPONENTS OF THE HRV CHARACTERISTICS DESCRIBED IN SECTION III
TABLE III
PERCENTAGE OF CORRECTLY CLASSIFIED SIGNALS USING LVQ, LM, AND SVM CLASSIFIERS APPLIED TO DATA CORRUPTED BY ZERO-MEAN WHITE GAUSSIAN NOISE. IN EACH CASE, 500 NEW DEGRADED SIGNALS WERE USED FOR TRAINING AND 200 SIGNALS FOR TESTING THE ALGORITHMS. FOR EACH CORRUPTED SIGNAL, A FEATURE VECTOR WAS CREATED WITH COMPONENTS OF THE HRV CHARACTERISTICS DESCRIBED IN SECTION III
Fig. 3. Recall and Precision values as a function of the number of features. For every number of features on the abscissa, all the possible combinations of the initial ten features were examined. The boxes summarize the statistics of the respective experiments.
Fig. 4. Recall and Precision values as a function of the number of features for the signals degraded by white Gaussian noise at 2 dB. For every number of features on the abscissa, all the possible combinations of the initial ten features were examined. The boxes summarize the statistics of the respective experiments.
where we have randomly selected half of the signals as training signals. The obtained results showed that there is some difference in the classification rates, though not a significant one. For instance, in Fig. 3, for the Recall measure, we can observe that when leaving one feature out (i.e., with nine features), the median value is 0.9 and that there are two cases with lower rates. This is due to the selection of the training signals. On the other hand, the majority of the values are around 0.9, and the Precision rates are much higher and not significantly affected.
VI. DISCUSSION AND CONCLUSION
The categorization of the ECGs into two distinct groups according to their heart rate variability can be very accurate using an SVM in cases where standard methods fail to provide a satisfactory categorization. Experiments comparing SVM classification of heart rate signals with the classifications obtained by other nonlinear classifiers have also confirmed the effectiveness of the former methodology, even in the presence of a significant amount of noise.
In the relevant literature, the features employed for HRV
characterization are either applied as scalars [1], [2] or used in
totally different contexts by being adapted to a specific appli-
cation [4]–[10]. For instance, in [6], the authors present a study
for the detection of regularities in the time series, while in [9],
the authors detect abrupt changes in the signal. The main characteristic of the already proposed methods is that they focus on a specific feature and try to improve its performance. Generally, there is no effort to combine different features. Admittedly, the majority of the presented studies are application dependent, and therefore their authors try to improve the performance of a specific feature. From this point of view, our approach is more general and flexible.
In our experiments, for the specific datasets examined, we
achieved a very accurate classification of subjects, contrary to
the most common HRV analysis methods that failed to catego-
rize the same signals accurately. An important open issue and
a perspective of this study is the problem of feature selection.
We would preferably provide the classifier with a feature space
that would make some signal characteristic obvious. Although a
first approach was proposed in this study, these aspects of signal
classification are still the subject of ongoing research.
REFERENCES
[1] Task Force of the European Society of Cardiology and the North Amer-
ican Society of Pacing and Electrophysiology, “Heart rate variability:
Standards of measurement, physiological interpretation, and clinical use,”
Eur. Heart J., vol. 17, pp. 354–381, 1996.
[2] M. Teich, S. Lowen, K. Vibe-Rheymer, and C. Heneghan, “Heart rate variability: Measures and models,” in Nonlinear Biomedical Signal Processing, vol. II, Dynamic Analysis and Modelling. New York: IEEE Press, 2001, pp. 159–213.
[3] G. Berntson, J. Bigger, D. Eckberg, P. Grossman, P. Kaufmann, M. Malik,
H. Nagaraja, S. Porges, J. Saul, P. Stone, and M. van der Molen, “Heart
rate variability: Origins, methods, and interpretive caveats,” Psychophys-
iology, vol. 34, no. 6, pp. 623–648, Nov. 1997.
[4] M. Kamath and E. Fallen, “Power spectral analysis of HRV: A noninva-
sive signature of cardiac autonomic functions,” Crit. Rev. Biomed. Eng.,
vol. 21, pp. 245–311, 1993.
518 IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 13, NO. 4, JULY 2009
[5] R. Silipo, G. Deco, R. Vergassola, and C. Gremigni, “A characterization
of HRV’s nonlinear hidden dynamics by means of Markov models,” IEEE
Trans. Biomed. Eng., vol. 46, no. 8, pp. 978–986, Aug. 1999.
[6] M. Ferrario, M. Signorini, G. Magenes, and S. Cerutti, “Comparison of
entropy-based regularity estimators: Application to the fetal heart rate
signal for the identification of fetal distress,” IEEE Trans. Biomed. Eng.,
vol. 53, no. 1, pp. 119–125, Jan. 2006.
[7] D. Hoyer, B. Pompe, K. Chon, H. Hardhalt, C. Wicher, and U. Zwiener,
“Mutual information function assesses autonomic information flow of
heart rate dynamics at different time scales,” IEEE Trans. Biomed. Eng.,
vol. 52, no. 4, pp. 584–592, Apr. 2005.
[8] R. Barbieri and E. Brown, “Analysis of heartbeat dynamics by point
process adaptive filtering,” IEEE Trans. Biomed. Eng., vol. 53, no. 1,
pp. 4–12, Jan. 2006.
[9] B. Aysin, L. Chaparro, I. Gravé, and V. Shusterman, “Orthonormal basis partitioning and time frequency representation of cardiac rhythm dynamics,” IEEE Trans. Biomed. Eng., vol. 52, no. 5, pp. 878–889, May 2005.
[10] J. Mateo and P. Laguna, “Improved heart rate variability signal analysis from the beat occurrence times according to the IPFM model,” IEEE Trans. Biomed. Eng., vol. 47, no. 8, pp. 997–1009, Aug. 2000.
[11] A. Bezerianos, S. Papadimitriou, and D. Alexopoulos, “Radial basis func-
tion neural networks for the characterization of heart rate variability dy-
namics,” Artif. Intell. Med., vol. 15, no. 3, pp. 215–234, 1999.
[12] A. Alexandridi, C. D. Stylios, and G. Manis, “Neural networks and fuzzy
logic approximation and prediction for HRV analysis,” presented at the
Eur. Symp. Intell. Technol., Hybrid Syst. Implement. Smart Adapt. Syst.,
Oulu, Finland, Jul. 2003.
[13] F. Azuaje, W. Dubitzky, P. Lopes, N. Black, K. Adamson, X. Wu, and
J. A. White, “Predicting coronary disease risk based on short-term RR
interval measurements: A neural network approach,” Artif. Intell. Med.,
vol. 15, no. 3, pp. 275–297, 1999.
[14] C. Cortes and V. N. Vapnik, “Support vector networks,” Mach. Learn.,
vol. 20, pp. 1–25, 1995.
[15] N. Cristianini and J. Shawe-Taylor, Support Vector Machines and Other Kernel-based Methods. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[16] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals,” Circulation, vol. 101, no. 23, pp. e215–e220, Jun. 2000, Circulation Electronic Pages. [Online]. Available: http://circ.ahajournals.org/cgi/content/full/101/23/e215
[17] G. Manis, S. Nikolopoulos, A. Alexandridi, and C. Davos, “Assessment
of the classification capability of prediction and approximation methods
for HRV analysis,” Comput. Biol. Med., vol. 37, no. 5, pp. 642–654, 2007.
[18] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[19] H. Byun and S. W. Lee, “A survey of pattern recognition applications of
support vector machines,” Int. J. Pattern Recognit. Artif. Intell., vol. 17,
no. 3, pp. 459–486, 2003.
[20] I. El-Naqa, Y. Yang, M. Wernick, N. Galatsanos, and R. Nishikawa, “A
support vector machine approach for detection of microcalcifications,”
IEEE Trans. Med. Imag., vol. 21, no. 12, pp. 1552–1563, Dec. 2002.
[21] K. I. Kim, K. Jung, S. H. Park, and H. J. Kim, “Support vector machines for
texture classification,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24,
no. 11, pp. 1542–1550, Nov. 2002.
[22] M. Pontil and A. Verri, “Support vector machines for 3D object recogni-
tion,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 6, pp. 637–646,
Jun. 1998.
[23] S. Li, T. Y. Kwok, I. Wai-Hang, and Y. Wang, “Fusing images with differ-
ent focuses using support vector machines,” IEEE Trans. Neural Netw.,
vol. 15, no. 6, pp. 1555–1561, Nov. 2004.
[24] G. Pajares and J. M. de la Cruz, “On combining support vector machines
and simulated annealing in stereovision matching,” IEEE Trans. Syst.,
Man, Cybern. B, Cybern., vol. 34, no. 4, pp. 1646–1657, Aug. 2004.
[25] R. Begg, M. Palaniswami, and B. Owen, “Support vector machines for
automated gait classification,” IEEE Trans. Biomed. Eng., vol. 52, no. 5,
pp. 828–838, May 2005.
[26] I. Güler and E. D. Übeyli, “Multiclass support vector machines for EEG signals classification,” IEEE Trans. Inf. Technol. Biomed., vol. 11, no. 2, pp. 117–126, Mar. 2007.
[27] T. N. Lal, M. Schröder, T. Hinterberger, J. Weston, M. Bogdan, N. Birbaumer, and B. Schölkopf, “Support vector channel selection for BCI,” IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1003–1010, Jun. 2004.
[28] M. Kaper, P. Meinicke, U. Grossekathoefer, T. Lingner, and H. Ritter,
“BCI competition 2003-Data set IIb: Support vector machines for the P300
speller paradigm,” IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1073–
1076, Jun. 2004.
[29] L. Ramirez, N. Durdle, D. Hill, and J. Raso, “A support vector classifier
approach to predicting the risk of progression of adolescent idiopathic
scoliosis,” IEEE Trans. Inf. Technol. Biomed., vol. 9, no. 2, pp. 276–282,
Jun. 2005.
[30] L. Ramirez, N. Durdle, J. Raso, and D. Hill, “A support vector machines
classifier to assess the severity of idiopathic scoliosis from surface topog-
raphy,” IEEE Trans. Inf. Technol. Biomed., vol. 10, no. 1, pp. 84–91, Jan.
2006.
[31] H. Liang and Z. Lin, “Detection of delayed gastric emptying from electro-
gastrograms with support vector machines,” IEEE Trans. Biomed. Eng.,
vol. 48, no. 5, pp. 601–604, May 2001.
[32] J. L. Rojo-Alvarez, J. Bermejo, V. M. Juarez-Caballero, R. Yotti,
C. Cortina, M. A. Garcia-Fernandez, and J. C. Antoranz, “Support vector
analysis of color-doppler images: A new approach for estimating indices
of left ventricular function,” IEEE Trans. Med. Imag., vol. 25, no. 8,
pp. 1037–1043, Aug. 2006.
[33] S. Osowski, L. T. Hoai, and T. Markiewicz, “Support vector machine-based
expert system for reliable heartbeat recognition,” IEEE Trans. Biomed.
Eng., vol. 51, no. 4, pp. 584–589, Apr. 2004.
[34] G. Georgoulas, C. Stylios, and P. Groumpos, “Predicting the risk of
metabolic acidosis for newborns based on fetal heart rate signal clas-
sification using support vector machines,” IEEE Trans. Biomed. Eng.,
vol. 53, no. 5, pp. 875–884, May 2006.
[35] A. Kampouraki, C. Nikou, and G. Manis, “Classification of heart rate
signals using support vector machines,” in Proc. BioSignal Conf., Brno,
Czech Republic, Jun. 2006, pp. 9–11.
[36] A. Kampouraki, C. Nikou, and G. Manis, “Robustness of support vector machine-based classification of heart rate signals,” in Proc. IEEE Conf. Eng. Med. Biol. Soc. (EMBS 2006), New York, Aug.–Sep. 2006, pp. 2159–2162.
[37] B. Schölkopf, C. Burges, and A. Smola, Advances in Kernel Methods: Support Vector Learning. New York: MIT Press, 1999.
[38] R. Kondor and T. Jebara, “A kernel between sets of vectors,” in Proc. 20th
Int. Conf. Mach. Learn. (ICML), Washington, DC, 2003, pp. 361–368.
[39] K. R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, “An introduction to kernel-based learning algorithms,” IEEE Trans. Neural Netw., vol. 12, no. 2, pp. 181–202, Mar. 2001.
[40] J. Pan and W. J. Tompkins, “A real-time QRS detection algorithm,” IEEE
Trans. Biomed. Eng., vol. 32, no. 3, pp. 230–236, Mar. 1985.
[41] G. Manis, S. Nikolopoulos, and A. Alexandridi, “Prediction techniques
and HRV analysis,” presented at the Medicon Health Telematics, Naples,
Italy, 2004.
[42] M. C. Teich, “Multiresolution wavelet analysis of heart-rate variability for
heart-failure and heart-transplant patients,” in Proc. IEEE Int. Conf. Eng.
Med. Biol. Soc., vol. 3, Hong Kong, 1998, pp. 1136–1141.
[43] N. Iyengar, C.-K. Peng, R. Morin, A. Goldberger, and L. Lipsitz, “Age-related alterations in the fractal scaling of cardiac interbeat interval dynamics,” Amer. J. Physiol., vol. 271, pp. 1078–1084, 1996.
[44] T. K. Kohonen, “The self-organizing map,” Proc. IEEE, vol. 78, no. 9,
pp. 1464–1480, Sep. 1990.
[45] M. T. Hagan and M. Menhaj, “Training feedforward networks with the
Marquardt algorithm,” IEEE Trans. Neural Netw., vol. 5, no. 6, pp. 989–
993, Nov. 1994.
Authors’ photographs and biographies not available at the time of publication.