Cardiac Abnormalities Detection from Compressed
ECG in Wireless Telemonitoring using Principal
Components Analysis (PCA)
Ayman Ibaida #1 , Ibrahim Khalil #2 and Fahim Sufi #3
Distributed Systems & Networking
School of Computer Science & IT
RMIT University, Melbourne, Australia
1ayman.ibaida@student.rmit.edu.au
2ibrahimk@cs.rmit.edu.au
3fahim.sufi@student.rmit.edu.au
Abstract—In wireless telecardiology applications, the ECG signal is compressed before transmission to support faster data delivery and to reduce bandwidth consumption. However, most ECG analysis and diagnosis algorithms operate on the original ECG signal. Therefore, compressed ECG data must first be decompressed before existing algorithms and tools can be applied to detect cardiovascular abnormalities. Decompression causes delay on the doctor's mobile device and in the wireless nodes responsible for detecting and prioritizing abnormal data for faster processing. This is undesirable in body sensor networks (BSNs), as the heavy processing involved in decompression wastes valuable energy in the resource- and power-constrained sensor nodes. In this paper, in order to diagnose cardiac abnormalities such as ventricular tachycardia, we apply a novel system that analyses and classifies the compressed ECG signal directly, using Principal Component Analysis (PCA) for feature extraction and k-means for clustering of normal and abnormal ECG signals.
I. INTRODUCTION
Cardiac diseases are the number one killer in the modern world, as many people die of sudden heart attack. At the same time, a large number of people die because of delays or errors in diagnosing their cardiac diseases. The electrocardiogram (ECG) signal has been used intensively by cardiac specialists to effectively diagnose cardiovascular diseases [1]. Several researchers have proposed various methods, such as digital signal processing, filtering methods, data mining tools and neural networks, for classification of cardiac anomalies [1]. The ECG can also be used for continuous patient monitoring as well as in biometric authentication techniques [2], [3], [4].
A typical ECG signal, as shown in Fig. 1, contains special waves such as the P and T waves as well as the QRS complex. Cardiologists investigate each of these waves and complexes, along with other features such as the RR interval, PR interval, PR segment, ST interval and ST segment [5], to diagnose various types of abnormal cardiac symptoms. However, accurate extraction of features using signal processing techniques [6], [7] can be complex and difficult. Recently, wavelet-based QRS detectors have been suggested by a variety of researchers [8]. Such methods have a post-processing phase in which redundant R waves or noise peaks are removed. Other researchers used template matching methods to classify the ECG signal using neural networks [9]. The techniques mentioned above are applied to the original, heavily sampled ECG signal, which makes processing both time and resource consuming.
Fig. 1. ECG waves: the P, Q, R, S and T points of a typical beat (amplitude vs. time/samples).
In a typical wireless telemonitoring scenario, as shown in Fig. 2, a patient wears wireless sensors capable of reading ECG samples, possibly compressing and diagnosing them, and sending them wirelessly, together with the diagnosis result, to a central server and to e-doctors (i.e. doctors roaming around with mobile devices) who can take quick action according to the result's priority [10]. However, wireless nodes (e.g. sensor nodes in body sensor networks (BSNs) or a roaming doctor's smartphone) are power and resource constrained. Therefore, the existing [11], [12], [13], [14] and above-mentioned diagnosis techniques are suitable neither for body sensor networks nor for other resource-constrained wireless environments.
In wireless telemonitoring scenarios, digitized ECG data need to be transferred as fast as possible using mobile technologies such as MMS, GPRS, HSDPA or ZigBee. However, these technologies cannot provide high-speed communication [15], and the data must be compressed first to make the transmission energy efficient. Therefore, in this paper we propose a novel technique to analyse abnormalities from compressed ECG data without decompressing the data.
978-1-4244-3518-0/09/$25.00 © 2009 IEEE ISSNIP 2009
Fig. 2. A typical wireless telemonitoring scenario. Compression saves energy on the power-hungry Bluetooth device, the resource-constrained wireless sensor nodes and the doctor's smartphone. Compression also helps transmit faster over bandwidth-constrained wireless links. Diagnosis of diseases is possible on mobile nodes directly from compressed data.
The abnormal cardiac condition considered in this paper is ventricular tachycardia, a life-threatening cardiac disease consisting of a rapid rhythm originating from the lower chambers of the heart. The rapid rate prevents the heart from filling adequately with blood, so less blood is pumped through the body. To achieve this, we first applied the lossless compression method described in [15] before transmission. We then analysed the compressed ECG signal directly and extracted its important features from the compressed data using Principal Component Analysis (PCA). The extracted features are classified as normal or abnormal using the k-means algorithm [16], [17]. In this research we have made contributions by answering the following research questions:
• How can we classify and detect normal and abnormal ECG data directly from the compressed ECG signal without decompression?
• How can we implement an attribute selection technique and apply it to extract compressed ECG features using Principal Component Analysis (PCA)?
The rest of the paper is organized as follows. Section II briefly discusses our previously proposed compression algorithm that is used in this paper. In Section III we discuss the basic system and present an analysis of feature subset selection from compressed ECGs using Principal Component Analysis (PCA). Next, in Section IV we show the results of PCA and a simple k-means algorithm used to cluster the data into abnormal and normal segments. Finally, Section V concludes the paper.
II. BACKGROUND: THE COMPRESSION ALGORITHM
The ECG encoding algorithm is a symbol-substitution-based technique preceded by some mathematical transformations. According to our previous experiments [15], up to 95% compression (a compression ratio of 20) was achieved without any loss of information (lossless compression) when our encoding algorithm was applied jointly with existing compression and encryption algorithms. According to the literature, this was the highest compression ratio achieved for compressing the publicly available ECGs of the MIT-BIH Arrhythmia Database. The character set used for substitution is shown in Fig. 3. Applying the algorithm with this character set generates, for example, the compressed ECGs shown in Fig. 5.
Apart from providing the highest possible compression ratio, the compression algorithm also preserves features for cardiovascular diagnosis directly from the compressed ECG. According to the literature [18] and to the best of our knowledge, multiple diseases were diagnosed directly from compressed MIT-BIH ECGs for the very first time.
The benefit of diagnosis from compressed ECG is immense. As a compressed ECG contains fewer characters, diagnosis from the compressed ECG is possible (using the techniques shown in [18]) with fewer reading operations (I/O). Most importantly, for telecardiology applications, where the ECG is transmitted and stored in compressed format, cardiovascular diagnosis is possible without performing decompression, saving processing power, resources and time. Minimizing delays in diagnosis saves patients' lives.
The selected compression algorithm is not only a compression algorithm but also an encryption algorithm, ensuring that secure transmission of data can be achieved. The compression algorithm consists of the following stages:
• Normalization stage: to rescale the ECG signal and convert the samples to small integer values
Fig. 3. Character Set for the compressed ECG signal
• Differencing stage: to lower the amplitude of the signal
• Value encoding: to encode the unsigned normalized differences
• Sign encoding: to encode the signs of the values
• Decimal value permutation stage: a mapping function
• Substitution stage: mapping ASCII character codes to ASCII characters
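As an illustration, the first few stages can be sketched as follows (a minimal sketch with hypothetical parameters; the actual algorithm in [15] uses its own scaling factor, permutation and 148-character substitution set):

```python
# Illustrative sketch of the normalization, differencing and sign-encoding
# stages. The scale factor and sample values are hypothetical.
def encode_ecg(samples, scale=100):
    """Normalize, difference, and split sign/magnitude of ECG samples."""
    # Normalization stage: rescale the signal to small integer values
    normalized = [round(s * scale) for s in samples]
    # Differencing stage: lower the amplitude by storing deltas
    diffs = [normalized[0]] + [normalized[i] - normalized[i - 1]
                               for i in range(1, len(normalized))]
    # Sign encoding: separate the signs from the unsigned magnitudes,
    # which the value-encoding stage would then map onto the character set
    signs = [1 if d < 0 else 0 for d in diffs]
    magnitudes = [abs(d) for d in diffs]
    return signs, magnitudes

signs, mags = encode_ecg([0.00, 0.05, 0.12, 0.10])
# signs == [0, 0, 0, 1], mags == [0, 5, 7, 2]
```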
III. THE METHODOLOGY
Fig. 4. (a) Normal ECG sample for patient CU01. (b) Abnormal ECG sample for patient CU01.
A huge amount of ECG data has to be transmitted over bandwidth-constrained wireless networks and through power-limited sensor nodes. However, sending large amounts of data consumes power and reduces the lifetime of body sensor networks. Therefore, compression of ECGs and diagnosis of abnormalities from the compressed ECGs play key roles in extending the lifetime of body sensor networks. In this paper, we deployed the compression algorithm proposed in [15] because it is lossless. We analysed the resulting compressed ECG and used data mining tools to classify it as normal or abnormal. Fig. 4(a) shows a normal ECG sample for patient CU01 from the CU Ventricular Tachyarrhythmia Database and Fig. 4(b) shows an abnormal sample for the same patient.
For the purpose of classifying normal and abnormal cases, we use only the compressed ECGs, as shown in Fig. 5. Even on careful inspection, no significant difference between the normal and abnormal compressed signals is apparent. However, using suitable data mining algorithms we can detect abnormalities in the compressed ECG data.
A. Analysis of Compressed ECG signal
The compressed ECG data contains the character set shown in Fig. 3. The compression is performed in a wireless node of the body sensors carried by a patient. Before transmission of the compressed ECGs over the wireless network, the data mining module in the nodes needs to be trained with normal samples of compressed ECGs.
As shown in Fig. 7, character frequency calculation is performed for each compressed ECG segment, giving a frequency count for each character. Since there are 148 characters, if each character is regarded as an attribute, we obtain 148 attributes. However, 148 attributes is a large number for clustering (into normal and abnormal). Therefore, we applied an attribute selection technique, Principal Component Analysis (PCA), for dimensionality reduction.
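The frequency-count step can be sketched as follows (a minimal illustration; the real system uses the 148-character set of Fig. 3, for which a three-character set stands in here):

```python
from collections import Counter

# Turn a compressed-ECG string into a frequency vector with one
# attribute per character of the substitution character set.
def frequency_vector(compressed_ecg, charset):
    counts = Counter(compressed_ecg)
    return [counts.get(c, 0) for c in charset]

charset = "abc"  # stand-in for the 148-character set of Fig. 3
vec = frequency_vector("abacab", charset)
# vec == [3, 2, 1]
```

In the actual system each compressed ECG segment would yield a 148-dimensional vector, one row of the data set of Fig. 6.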
ECG Signal → ECG Compression → Frequency Count → Attribute Selection (PCA) → Clustering of Normal/Abnormal (k-means) → Disease Detection
Fig. 7. Block diagram of the proposed ECG detection system.
B. Attribute Subset Selection
Preprocessing the data with an attribute selection algorithm is a critical step in data mining solutions, since training with a large number of attributes is hard and inaccurate. It also makes the system more complicated, and processing time grows as the number of attributes increases. In this paper, we adopted PCA, which is appropriate when there is a set of samples with a large number of variables (attributes). The algorithm generates a small set of new artificial variables, called principal components, which can be selected and fed to the clustering system. As an example, we first prepared the data set for patient CU01. For experimentation we took 12 samples, of which 6 are normal and the rest abnormal. We then compressed each sample and calculated its character frequencies to derive the final data set shown in Fig. 6.
Applying PCA to this data set, we first generate the covariance matrix of the data. Next, we derive the eigenvectors and eigenvalues of the covariance matrix, which are then rearranged into a new matrix starting with the eigenvector that corresponds to the highest eigenvalue, and so on. The resulting matrix is an (n × n) matrix, where n is the number of variables (in this case n = 148). After this, we calculate the scores matrix, an (m × n) matrix where m is the number of samples and n is the number of variables. Equation 1 shows
(a) (b)
Fig. 5. Compressed ECG samples for patient CU01 (a) Abnormal ECG of 4(b) in Compressed Format (b) Normal ECG of 4(a) in Compressed Format
Fig. 6. Data set for patient CU01: the first six plots show the character frequencies of the abnormal samples and the second six plots those of the normal samples.
the general form to calculate the score for the first principal component:

C1 = b11(X1) + b21(X2) + ... + bp1(Xp)    (1)

where
C1 = the sample's score on principal component 1,
bp1 = the regression coefficient (or weight) for observed variable p, and
Xp = the sample's value of variable p.
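The covariance and eigendecomposition steps above can be sketched in a few lines (a minimal NumPy illustration of the procedure, not the authors' implementation; the data here are random stand-ins):

```python
import numpy as np

# PCA as described above: covariance matrix, eigendecomposition sorted
# by descending eigenvalue, then sample scores as in Eq. (1),
# i.e. centered data projected onto the leading eigenvectors.
def pca_scores(X, n_components=2):
    Xc = X - X.mean(axis=0)                 # center each attribute
    cov = np.cov(Xc, rowvar=False)          # (n x n) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]       # highest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :n_components] # (m x n_components) scores
    return scores, eigvals

# Example: 12 samples with 5 attributes standing in for the 148 of Fig. 3
rng = np.random.default_rng(0)
scores, eigvals = pca_scores(rng.normal(size=(12, 5)))
```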
Similarly, the other principal components (PC2, PC3, PC4, and so on) can be calculated. The challenge now is how many principal components to keep. A simple calculation reveals that the first few components represent a high proportion of the data, as shown in Table I, which lists the eigenvalues and the proportion of the total data that each one accounts for.
Looking at Table I, we can clearly see that the first and second eigenvalues represent approximately 70% of the total data. The proportion of each eigenvalue in this table is derived by dividing the eigenvalue by the sum of all eigenvalues, as in Equation 2.
Pi = ei / Σ_{k=1}^{m} ek    (2)
Fig. 8. Class distribution for Principal Component 1 (PC1) and Principal Component 2 (PC2) for patients CU01, CU03 and CU09 respectively (normal vs. abnormal samples).
TABLE I
EIGENVALUES FOR VARIOUS PRINCIPAL COMPONENTS OF PATIENT CU01
Principal Components Eigenvalue Proportion
Principal Component 1 (PC1) 82.56614 0.56552
Principal Component 2 (PC2) 20.04823 0.13732
Principal Component 3 (PC3) 7.47957 0.05123
Principal Component 4 (PC4) 6.37829 0.04369
Principal Component 5 (PC5) 5.9081 0.04047
Principal Component 6 (PC6) 5.45423 0.03736
Principal Component 7 (PC7) 5.16522 0.03538
Principal Component 8 (PC8) 3.76667 0.0258
Principal Component 9 (PC9) 3.6941 0.0253
where
Pi is the proportion of the ith eigenvalue,
ei is the ith eigenvalue, and
m is the number of eigenvalues, which equals the number of variables.
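Equation 2 is straightforward to compute. The sketch below applies it to a truncated eigenvalue list (the first three eigenvalues of Table I), so the resulting proportions differ from those in Table I, which divides by the sum over all 148 eigenvalues:

```python
# Proportion of each eigenvalue, as in Eq. (2): e_i divided by the
# sum of all eigenvalues in the list.
def proportions(eigenvalues):
    total = sum(eigenvalues)
    return [e / total for e in eigenvalues]

# First three eigenvalues of Table I (truncated list for illustration)
p = proportions([82.56614, 20.04823, 7.47957])
```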
Therefore, in our work we keep just the first two principal components, and their corresponding scores are used as input to the clustering stage. The same procedure was repeated for all other patients of the CU Ventricular Tachyarrhythmia Database to determine their eigenvalues and principal components.
TABLE II
SCORES FOR PC1 AND PC2 (COMPONENT 1 &2) OF CU01
Sample Number    PC1 Score    PC2 Score    Class
1 -1.18113 -2.54187 abnormal
2 -1.26634 -2.64893 abnormal
3 -0.88636 -2.97021 abnormal
4 -0.12991 -1.99849 abnormal
5 3.246313 1.579908 abnormal
6 2.38746 0.798551 abnormal
7 -4.92702 0.419675 normal
8 -5.10988 -0.03882 normal
9 -4.82858 0.275417 normal
10 -4.99787 -0.12245 normal
11 -5.02603 0.221886 normal
12 -4.94025 0.02097 normal
IV. RESULTS AND DISCUSSION
Using the procedure discussed earlier we can now derive Ta-
ble II that shows the first two principal component scores for
every normal and abnormal ECG segment. Since we performed
TABLE III
K-MEAN RESULTS FOR CU01
Sample Number    Distance from Class 1    Distance from Class 2    Class
1 21.50359 3.930326 2 (abnormal)
2 21.44839 4.478564 2 (abnormal)
3 26.29707 4.35777 2 (abnormal)
4 27.9701 0.733969 2 (abnormal)
5 69.63799 16.59684 2 (abnormal)
6 54.60353 8.494487 2 (abnormal)
7 0.086221 30.91665 1 (normal)
8 0.047432 31.52044 1 (normal)
9 0.041762 29.41073 1 (normal)
10 0.064142 30.10386 1 (normal)
11 0.011507 31.33386 1 (normal)
12 0.01275 29.84699 1 (normal)
tests on 6 normal and 6 abnormal ECG segments for every patient, we have 12 sets of values for the principal component scores. Table II in particular corresponds to patient CU01 of the CU Ventricular Tachyarrhythmia Database. Similarly, scores can be derived for the other patients in the database. Figure 8 shows the class distributions over Principal Components 1 and 2 for patients CU01, CU03 and CU09 respectively. It is obvious from these distributions that abnormal ECGs can be easily separated from the normal ones. Similar tests were performed on all other patients from the Ventricular Tachyarrhythmia Database, and they all follow the same trend, which confirms that abnormal ECGs can be distinguished from normal ECGs when PCA is applied to the patients' compressed ECGs.
Now we have just two variables (PC1 and PC2), which can easily be fed to k-means clustering to classify abnormal and normal ECG segments. This further validates our earlier observations. Table III shows the results of the k-means algorithm applied to the data shown in Table II. From the results it is clear that for samples 1-6 the distance to class 2 (abnormal) is small and the distance to class 1 (normal) is large, which is why they are classified as class 2 (abnormal). Similarly, samples 7-12 have a small distance to class 1 and a large distance from class 2, and are therefore classified as class 1 (normal). This is clearly visible in Fig. 8, which shows the score plots for three patients.
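The clustering step can be reproduced on the Table II scores with a minimal k-means sketch (the seeding and iteration count here are illustrative choices, not those of the original experiment):

```python
import math

# Minimal k-means with k = 2 on (PC1, PC2) score pairs: alternately
# assign points to the nearest center and recompute the centers.
def kmeans2(points, iters=10):
    centers = [points[0], points[-1]]   # seed with first and last sample
    for _ in range(iters):
        clusters = ([], [])
        for p in points:
            d0 = math.dist(p, centers[0])
            d1 = math.dist(p, centers[1])
            clusters[0 if d0 <= d1 else 1].append(p)
        centers = [tuple(sum(v) / len(c) for v in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    labels = [0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
              for p in points]
    return centers, labels

# (PC1, PC2) scores of Table II: samples 1-6 abnormal, 7-12 normal
scores = [(-1.18113, -2.54187), (-1.26634, -2.64893), (-0.88636, -2.97021),
          (-0.12991, -1.99849), (3.246313, 1.579908), (2.38746, 0.798551),
          (-4.92702, 0.419675), (-5.10988, -0.03882), (-4.82858, 0.275417),
          (-4.99787, -0.12245), (-5.02603, 0.221886), (-4.94025, 0.02097)]
centers, labels = kmeans2(scores)
# samples 1-6 (abnormal) fall in one cluster, samples 7-12 (normal) in the other
```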
V. CONCLUSION AND FUTURE WORK
Because ECG signal is enormous in size [19],compression
algorithms must be used to make the whole tele-cardiology
faster and efficient. A faster solution is of crucial importance
for diagnoses and treatment of cardiovascular diseases. Al-
though ECG compression enables faster transmission, it also
introduces a delay in the processing phase because of the de-
compression. Since existing methods process the original ECG
signal and not the compressed one, this decompression time
can be enough to threaten patient life. However decompression
in wireless telemonitoring will cause delay on the doctor’s
mobile devices. This (decompression) is also undesirable in
body sensor network as more processing will waste valuable
energy resources. To overcome the decompression delay and
make body sensor network energy efficient, in this paper we
implemented the ECG analysis and data mining solution on
the compressed ECG signal using PCA for feature extraction
and k-mean as a clustering technique. Compressed ECG signal
can be fast in transmission, and now we have clearly shown
that we can classify and analyse the compressed ECG signal
to detect cardiac abnormalities. Encouraged by these results
we intend to develop a neural network model to be trained to
classify more diseases in a node of body sensor networks.
REFERENCES
[1] G. Clifford, F. Azuaje, and P. McSharry, Advanced methods and tools
for ECG data analysis. Artech House.
[2] I. Khalil and F. Sufi, “Legendre Polynomials based biometric authen-
tication using QRS complex of ECG,” in Intelligent Sensors, Sensor
Networks and Information Processing, 2008. ISSNIP 2008. International
Conference on, 2008, pp. 297–302.
[3] F. Sufi and I. Khalil, “An automated patient authentication system for
remote telecardiology,” in Intelligent Sensors, Sensor Networks and
Information Processing, 2008. ISSNIP 2008. International Conference
on, 2008, pp. 279–284.
[4] F. Sufi, I. Khalil, and I. Habib, “Polynomial distance measurement
for ECG based biometric authentication.” John Wiley & Sons, Ltd.
Chichester, UK, 2008.
[5] Y. Suzuki, “Self-organizing QRS-wave recognition in ECG using neural
networks,” IEEE Transactions on Neural Networks, vol. 6, no. 6, pp.
1469–1477, 1995.
[6] S. Mahmoodabadi, A. Ahmadian, and M. Abolhasani, “ECG feature
extraction using Daubechies wavelets,” in Proceedings of the Fifth
IASTED International Conference, Visualization, Imaging, and Image
Processing, Benidorm, Spain, 2005.
[7] K. Minami, H. Nakajima, and T. Toyoshima, “Real-time discrimination of ventricular tachyarrhythmia with Fourier-transform neural network,” IEEE Transactions on Biomedical Engineering, vol. 46, no. 2, pp. 179–185, 1999.
[8] A. Ghaffari, H. Golbayani, and M. Ghasemi, “A new mathematical
based QRS detector using continuous wavelet transform,” Computers
and Electrical Engineering, vol. 34, no. 2, pp. 81–91, 2008.
[9] S. Barro, M. Fernandez-Delgado, J. Vila-Sobrino, C. Regueiro, and E. Sanchez, “Classifying multichannel ECG patterns with an adaptive neural network,” IEEE Engineering in Medicine and Biology Magazine, vol. 17, no. 1, pp. 45–55, 1998.
[10] B. Lo, S. Thiemjarus, R. King, and G. Yang, “Body sensor network–a
wireless sensor platform for pervasive healthcare monitoring,” in The
3rd International Conference on Pervasive Computing. Citeseer, 2005.
[11] J. De Vos and M. Blanckenberg, “Automated pediatric cardiac auscul-
tation,” IEEE Transactions on Biomedical Engineering, vol. 54, no. 2,
pp. 244–252, 2007.
[12] E. Ubeyli, “Eigenvector Methods for Automated Detection of Electro-
cardiographic Changes in Partial Epileptic Patients.”
[13] W. Jiang and S. Kong, “Block-based neural networks for personalized
ECG signal classification,” IEEE Transactions on Neural Networks,
vol. 18, no. 6, 2007.
[14] F. Melgani and Y. Bazi, “Classification of Electrocardiogram Signals
With Support Vector Machines and Particle Swarm Optimization,” IEEE
Transactions on Information Technology in Biomedicine, vol. 12, no. 5,
pp. 667–677, 2008.
[15] F. Sufi and I. Khalil, “Enforcing secured ECG transmission for real time
telemonitoring: A joint encoding, compression, encryption mechanism,”
Security and Communication Networks, vol. 1, no. 5, 2008.
[16] I. Jolliffe, Principal Component Analysis. Springer Verlag, 2002.
[17] M. Dunham, Data mining introductory and advanced topics. Prentice
Hall/Pearson Education, 2003.
[18] F. Sufi, Q. Fang, I. Khalil, and S. S. Mahmoud, “Novel methods of faster
cardiovascular diagnosis in wireless telecardiology,” IEEE Journal on
Selected Areas in Communications, vol. 27, no. 4, MAY 2009.
[19] F. Sufi, I. Khalil, Q. Fang, and I. Cosic, “A mobile web grid based
physiological signal monitoring system,” in Technology and Applications
in Biomedicine, 2008. ITAB 2008. International Conference on, 2008,
pp. 252–255.