A clustering based system for instant detection of cardiac abnormalities from
compressed ECG
Fahim Sufi, Ibrahim Khalil, Abdun Naser Mahmood
RMIT University, School of Computer Science and Information Technology, 123 Latrobe St., Melbourne, VIC 3000, Australia
Keywords: Cardiac abnormality classification; Compressed ECG; CVD diagnosis; Symmetricity of bi-class clustering; CVD alert mechanism

Abstract
Compressed Electrocardiography (ECG) is used in modern telecardiology applications for faster and more efficient transmission. However, existing ECG diagnosis algorithms require the compressed ECG packets to be decompressed before diagnosis can be applied. This additional decompression step before every diagnosis introduces undesirable delays, which can severely impact the longevity of the patient. In this paper, we first used an attribute selection method that selects only a few features from the compressed ECG. Then we used the Expectation Maximization (EM) clustering technique to create normal and abnormal ECG clusters. Twenty different segments (13 normal and 7 abnormal) of compressed ECG from a MIT-BIH subject were tested with 100% success using our model. Apart from automatic clustering of normal and abnormal compressed ECG segments, this paper presents an algorithm to identify the initiation of abnormality, so that emergency personnel can be contacted for a rescue mission at the earliest possible time. This innovative technique, based on data mining of compressed ECG attributes, enables faster identification of cardiac abnormalities, resulting in an efficient telecardiology diagnosis system.
© 2010 Published by Elsevier Ltd.
1. Introduction
The electrocardiogram (ECG) signal has long been used for diagnosing Cardiovascular Diseases (CVD), which are the number one killer of modern times. The existing diagnosis algorithms are mostly suited to plain ECG signals (i.e. not in compressed form) and work by detecting the ECG fiducial points, namely P, Q, R, S and T (Friesen et al., 1990; Hamilton & Tompkins, 1986; Sufi, Fang, & Cosic, 2007; Surez, Silva, Berthoumieu, Gomis, & Najim, 2007), as shown in Fig. 1. After detecting the fiducial points, the existing ECG diagnosis algorithms employ computationally intensive processing to ascertain particular cardiac anomalies.
According to existing research on mobile phone based telecardiology applications, the death rate associated with CVD can be tackled by harnessing the processing power of mobile technologies (Blount, 2007; Hung & Zhang, 2003; Lee, Chen, Hsiao, & Tseng, 2007). More recent research confirms that specially designed compression technologies can yield faster telecardiology solutions (Istepanian & Petrosian, 2000; Kim, Yoo, & Lee, 2006; Sufi, Fang, Mahmoud, & Cosic, 2006; Sufi, Fang, Khalil, & Mahmoud, 2009; Sufi & Khalil, 2008). However, if the ECG packets remain in compressed format during data transmission and storage, existing ECG diagnosis algorithms cannot be applied directly: the compressed ECG must be decompressed before most of the CVD detection algorithms can be applied (Friesen et al., 1990; Hamilton & Tompkins, 1986; Surez et al., 2007; Sufi et al., 2007). If a hospital has hundreds of remotely monitored (real-time) CVD patients, the hospital server might have to perform this additional decompression task for millions of compressed ECG packets per second. This added decompression step may therefore create an enormous computational burden on existing infrastructure. To mitigate the burden imposed by compression technology, research in Sufi et al. (2009) demonstrates a new set of CVD diagnosis algorithms that works directly on compressed ECG (i.e. decompression of the compressed ECG packet is no longer required). However, the technique for detecting cardiac abnormality from compressed ECG presented in Sufi et al. (2009) employs a rule based algorithm for detecting one particular disease. To identify all cardiac abnormalities, the system presented in Sufi et al. (2009) would require hundreds of complex algorithms integrated into one computationally intensive system, and maintaining and updating such a complex system for every new abnormality is difficult.
This introduces the problem of finding a simple and fast solution for heart abnormality detection from compressed
ECG that raises an alert to the cardiac specialist as soon as a cardiac abnormality is detected.
In this paper, we present a simple but efficient data mining based solution that detects abnormality from the compressed ECG instantly. This technique can be placed within a wireless monitoring facility to alert emergency personnel in the event of cardiac abnormality in a subscribed patient.
2. Background
The human heart keeps oxygenated blood circulating throughout the body by beating about 100,000 times per day. A human heart contains four chambers: two atria and two ventricles. Deoxygenated blood initially enters the right atrium, which contracts and forces the blood into the right ventricle. From the right ventricle, the oxygen-deficient blood rushes to the lungs, where gas exchange takes place and the blood attains oxygen (and releases carbon dioxide). The oxygenated (i.e. oxygen enriched) blood then enters the left atrium, from where it is redirected to the left ventricle. Finally, the left ventricle forces the blood out to the rest of the body. Both atria contract together, and likewise both ventricles contract at the same time.
An ECG signal, a representation of the electrical activity of the heart, has three major feature waves: the P wave, the QRS complex and the T wave (as seen in Fig. 1). An atrial contraction results in a P wave, and a ventricular contraction is reflected by a QRS complex. The T wave, on the other hand, represents the ventricular relaxation that occurs after ventricular contraction. Cardiologists have used different features of these feature waves to assess the condition of the heart (see Table 1).
As seen in Fig. 2, the patient wears a portable ECG acquisition device, which collects the ECG signal from the patient's body and transmits ECG packets to the mobile phone via the Bluetooth, Wifi, Near Field Communication (NFC) or Zigbee protocol. The mobile phone then compresses and encrypts the ECG packets and forwards them (i.e. the compressed and encrypted packets) to the
Fig. 1. An ECG segment with its fiducial points P, Q, R, S and T (amplitude vs. time/samples).
Table 1
ECG features related to the P wave, QRS complex and T wave.

P wave duration | QRS complex duration | T wave duration
P wave amplitude | QRS complex amplitude | T wave amplitude
P wave onset slope | Q onset slope | T wave onset slope
P wave offset slope | Q offset slope | T wave offset slope
QT Interval | R onset slope | P wave direction
RR Interval | R offset slope | T wave direction
ST Segment | S onset slope |
RR Interval | S offset slope |
Fig. 2. Architecture of the data mining based compressed ECG diagnosis system. The ECG acquisition device sends the signal to the patient's mobile phone via Bluetooth, NFC, Zigbee or Wifi; the phone compresses and encrypts the ECG packets and transmits them via HTTP, MMS or SMS to the hospital/monitoring service, which employs a Data Mining Agent to detect abnormality; an ambulance or rescue team is notified when an abnormality is detected.
hospital or monitoring service via HTTP or MMS. The monitoring service executes a background monitoring agent implementing data mining techniques. This mobile phone based compressed ECG transmission has been proposed and used in our earlier research (Sufi, 2007; Sufi et al., 2006, 2009; Sufi & Khalil, 2008).
In this paper we add a data mining module (situated in the hospital) that identifies CVD abnormality from the compressed ECG sent by the patient, using clustering techniques. These data mining techniques use knowledge of what is normal and what is abnormal for the monitored patient's ECG. The input and output of the mining agent are the compressed ECG and a Boolean value denoting abnormality, respectively. Therefore, in this telemonitoring solution, if the compressed ECG is derived from a normal ECG, the output of the data mining agent is negative. In the case of an abnormal ECG signal from the patient, the agent outputs a positive detection, signalling abnormality, and the alert mechanism is activated.
3. Architecture of the proposed disease identification system
In remote telemonitoring, massive amounts of ECG data are transferred (Sufi, Khalil, Fang, & Cosic, 2008); therefore, adoption of a specialized compression technology (as demonstrated in our earlier research in Sufi et al. (2009) and Sufi and Khalil (2008)) is often required. Our ECG compression technique uses an encoding function \(\varepsilon(\cdot)\) that transforms the ECG signal \(X_n\) into a compressed ECG \(C_r\) (Eq. (1)). The lossless nature of our compression technique ensures that the ECG feature set \(F\) (a subset of the ECG signal \(X_n\), as shown in Eq. (2)) also exists within the encoded (compressed) ECG \(C_r\) (Eq. (3)). New algorithms can be designed to reveal this encoded ECG feature set for CVD diagnosis directly from the compressed ECG.
As an example, Fig. 3 shows a normal ECG segment for entry CU1 of the CU Ventricular Tachyarrhythmia database (Physiobank, 2009). Fig. 4 demonstrates the initiation of abnormality (i.e. Ventricular Tachyarrhythmia) for that particular patient. Lastly, Fig. 5 depicts a complete episode of Ventricular Tachyarrhythmia for the same patient. Fig. 6 shows the compressed ECG (i.e. compressed using our specialized ECG compression algorithm (Sufi et al., 2009; Sufi & Khalil, 2008)) of Figs. 3–5. Eq. (1) reflects the fact that Fig. 6 preserves the ECG features of Figs. 3–5. Within this paper, our proposed idea is to harness data mining routines for efficient detection of CVD anomalies (i.e. cardiac abnormalities)
Fig. 3. A normal ECG segment of a patient (a random CU1 entry, MIT-BIH CU Ventricular Tachyarrhythmia Database).
Fig. 4. Initiation of abnormality (Ventricular Tachyarrhythmia) within the ECG segment (CU1).
Fig. 5. An abnormal (Ventricular Tachyarrhythmia) ECG segment of a patient (CU1).
Fig. 6. Compressed ECG for Fig. 3 (normal ECG), Fig. 4 (normal and abnormal) and Fig. 5 (abnormal ECG).
directly from the compressed ECG (e.g. the compressed ECG shown in Fig. 6).
\[ \varepsilon(X_n) = C_r \tag{1} \]
\[ F \subseteq X_n \tag{2} \]
\[ F \subseteq C_r \tag{3} \]
During the compression process, 148 characters and numeric values (0–9) are used to encode the plain text ECG signal, as seen in Fig. 7 (ECG compression is performed inside the patient's mobile phone). The data mining agent (DMA) at the hospital (Fig. 2) needs to be trained with normal and abnormal ECG (from compressed ECG) of patients. Once trained, the DMA can be tested for irregularities (abnormal ECG). Our proposed algorithm (Algorithm 1) instantly identifies abnormal ECG segments, directly from the compressed ECG.
3.1. Training of the proposed model
During this training phase, the proposed model learns what normal and abnormal ECG look like. Fig. 8 shows the main stages of this learning process from compressed ECG.
3.1.1. Character frequency calculation
As shown in Fig. 8, the frequency of each encoded character is computed first from the compressed ECG. There are about 148 characters and 6 numeric subgroups for which the frequencies are generated (Fig. 7). The frequencies of these 157 characters (and numeric subgroups) are utilized as the attributes for clustering. However, 157 attributes are too many for generating the clusters (normal and abnormal ECG); therefore, attribute subset selection is necessary. Using proven techniques, we first select the characters from the compressed ECG that are mainly responsible for identifying diseases. Then, based on the selected characters (or attributes), classification of abnormality and normality is possible.
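As an illustration of this counting step, the character tally can be computed in a few lines. The sketch below is not the authors' implementation; the symbol table SYMBOLS (standing in for the 157 attributes of Fig. 7) and the example segment are hypothetical.

```python
from collections import Counter

def attribute_vector(compressed_ecg, symbols):
    """Count how often each encoding symbol occurs in a compressed ECG
    segment, returning the counts in a fixed attribute order."""
    counts = Counter(compressed_ecg)
    return [counts[ch] for ch in symbols]

# Hypothetical symbol table standing in for the attributes of Fig. 7.
SYMBOLS = list("@$[]&(*:;klmorstuvwxyz") + list("ABCDEFGHIJKLMNOR")

segment = "@@kk$rrss::ou"  # toy compressed ECG segment, for illustration
print(attribute_vector(segment, SYMBOLS))
```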
3.1.2. Attribute subset selection
Data pre-processing using attribute selection is an important step in data mining, since a large number of attributes often leads to poor learning due to an untenably large combinatorial search space for the solution (Han & Kamber, 2006). The goals of feature subset selection are to (a) reduce the dimensionality of the data to be analysed, (b) speed up the execution of learning algorithms, (c) improve the performance of data mining techniques, including learning time and predictive accuracy, and (d) improve the comprehensibility of the output. Recent studies have shown that attribute subset selection helps improve the performance of clustering algorithms operating on the reduced attributes (Sufi & Khalil, 2009; Talavera, 1999a, 1999b). In this paper, we have adapted for use with continuous ECG signals a correlation based feature subset selection technique (Hall, 1999; Sufi & Khalil, 2009), which outperforms other feature selection algorithms such as ReliefF (Kira & Rendell, 1992) and RReliefF (Robnik-Sikonja & Kononenko, 1997). The attribute selection is based on an attribute's relative utility with regard to the predicted class, while also taking into consideration its correlation with the other attributes in the subset. The utility of an attribute can be represented using Pearson's correlation coefficient, where the variables are standardized, as in Eqs. (4) and (5):
\[ r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\,\sigma_x \sigma_y} \tag{4} \]
\[ U_S = \frac{C\,\overline{r}_{ap}}{\sqrt{C + C(C-1)\,\overline{r}_{a\bar{a}}}} \tag{5} \]
where \(\bar{x}\) and \(\bar{y}\) are the sample means calculated from the data, \(\sigma_x\) and \(\sigma_y\) are the standard deviations, \(a, \bar{a} \in S\), \(C = |S|\), and \(\overline{r}_{xy}\) denotes the average correlation between features \(x\) and \(y\). For a subset \(S\) of \(C\) features, the utility function measures how strongly the features \((a, \bar{a})\) are related to the predicted class \(p\) (through \(\overline{r}_{ap}\)) while being less related to each other (through \(\overline{r}_{a\bar{a}}\)). The utility function reduces the effect of irrelevant attributes, as they are less correlated with the predicted class; it also discards redundant attributes, as they are highly correlated with each other.
We used a greedy best-first algorithm to search through the candidate subsets for a locally optimal solution. The algorithm starts with an empty subset, adding one attribute at a time and evaluating the utility function to determine the correlation of the subset with the predicted class. The next attribute is added as long as the utility value does not decline for the best subset. If there is a decrease, the algorithm selects the next best subset and resumes adding attributes to it. In some datasets there are groups of features that are locally predictive of the predicted class, so we revisit the attributes that were initially discarded while building the best subset. In this case, after the best subset has been generated, the algorithm examines the rejected attributes one by one and evaluates each one's correlation to the predicted class against its average correlation to the subset. If its correlation to the class is greater than its correlation with the attribute subset, signalling a stronger attraction to the class than to the subset, the attribute is incorporated into the subset. A sketch of this procedure follows.
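The following is a minimal sketch of the merit function of Eq. (5) combined with the greedy forward search (omitting the rejected-attribute revisit). It is an illustrative rendering of the CFS idea rather than the implementation used in this work; X is an instances-by-attributes matrix and y holds the class labels, both assumed names.

```python
import numpy as np

def merit(X, y, subset):
    """CFS utility U_S = C*r_ap / sqrt(C + C*(C-1)*r_aa) for a feature subset."""
    C = len(subset)
    r_ap = np.mean([abs(np.corrcoef(X[:, a], y)[0, 1]) for a in subset])
    if C == 1:
        return r_ap
    r_aa = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return C * r_ap / np.sqrt(C + C * (C - 1) * r_aa)

def greedy_cfs(X, y):
    """Forward selection: keep adding the attribute that raises the merit most,
    stopping once the merit of the best subset declines."""
    remaining, subset, best = list(range(X.shape[1])), [], 0.0
    while remaining:
        score, a = max((merit(X, y, subset + [a]), a) for a in remaining)
        if score <= best:  # merit declined: stop growing the subset
            break
        best, subset = score, subset + [a]
        remaining.remove(a)
    return subset
```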
Fig. 7. The 157 characters and numeric subgroups (attributes) used for generating compressed ECG (from the plain ECG signal). Details of this character substitution based compression technique are described in Sufi et al. (2009) and Sufi and Khalil (2008).
Fig. 8. Step by step procedure of the proposed cardiac abnormality detection technique: compressed ECG → calculate frequency of each character → attribute subset selection → clustering of reduced attributes → detection of normal and abnormal clusters.
3.1.3. Automatic learning of normal and abnormal patterns using clustering of compressed ECG features
Using the smaller subset of attributes, we can now produce a cluster from the normal compressed ECG patterns. This cluster of normal patterns serves as the benchmark against which future ECG sent from the observed patient is tested. Under normal circumstances, any incoming ECG closely matches the stored cluster. However, if there is any abnormality, the clustering algorithm creates a different cluster from the abnormal ECG. This generates an alarm requiring the urgent attention of a physician or a cardiologist. It should be noted that the procedure given in this paper works solely on the compressed ECG character frequencies and does not require decompression, which would take valuable extra time from a patient's life.
The aim of clustering is to group a given set of objects so that similar objects (also known as cases, instances or patterns) are grouped together and dissimilar objects are kept apart. Although there are many different techniques for building multi-dimensional clusters (Mahmood, Leckie, & Udaya, 2008), we have chosen a statistical clustering technique called Expectation Maximization (EM) (Han & Kamber, 2006) to cluster the compressed ECG data, since it can find the correct number of clusters automatically. Assuming two clusters A and B, representing the normal and abnormal classes of ECG, the steps of EM clustering for two clusters are:
1. Choose the model parameters, mean \(\mu\), standard deviation \(\sigma\) and cluster probability \(p\), arbitrarily for clusters A and B.
2. For each iteration \(j\), calculate the probability that instance \(I\) belongs to clusters A and B:
\[ P(A|I) = \frac{p_A^j\, P^j(I|A)}{P^j(I)}, \qquad P(B|I) = \frac{p_B^j\, P^j(I|B)}{P^j(I)} \tag{6} \]
The probability \(P(I|A)\) can be modelled using any distribution function. For the commonly used Gaussian distribution that we have adopted in this paper, it is given by
\[ P(I|A) = \frac{1}{\sqrt{2\pi}\,\sigma_A} \exp\!\left( -\frac{(I - \mu_A)^2}{2\sigma_A^2} \right) \tag{7} \]
3. Update the mixture parameters on the basis of the new estimates:
\[ p_A^{j+1} = \frac{\sum_I P(A|I)}{n}, \qquad p_B^{j+1} = \frac{\sum_I P(B|I)}{n} \tag{8} \]
\[ \mu_A^{j+1} = \frac{\sum_I I\,P(A|I)}{\sum_I P(A|I)}, \qquad \mu_B^{j+1} = \frac{\sum_I I\,P(B|I)}{\sum_I P(B|I)} \tag{9} \]
\[ \left(\sigma_A^{j+1}\right)^2 = \frac{\sum_I P(A|I)\,(I - \mu_A^{j+1})^2}{\sum_I P(A|I)} \tag{10} \]
\[ \left(\sigma_B^{j+1}\right)^2 = \frac{\sum_I P(B|I)\,(I - \mu_B^{j+1})^2}{\sum_I P(B|I)} \tag{11} \]
4. Calculate the log likelihood value \(E_j = \sum_I \log(P^j(I))\). For a fixed stopping criterion \(\epsilon\), if \(|E_j - E_{j+1}| \leq \epsilon\) then stop; otherwise set \(j = j + 1\) and repeat from step 2.
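These updates are exactly what off-the-shelf EM implementations perform internally. Below is a minimal sketch using scikit-learn's GaussianMixture as a stand-in for the EM implementation actually used in this work; the file name and the attribute matrix X are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# X: one row per compressed ECG segment, one column per selected attribute.
X = np.loadtxt("attribute_vectors.csv", delimiter=",")  # hypothetical file

# Fit a two-component Gaussian mixture via EM (normal vs. abnormal).
gm = GaussianMixture(n_components=2, covariance_type="diag",
                     tol=1e-4, random_state=0).fit(X)

labels = gm.predict(X)                     # hard cluster assignment per segment
print("total log-likelihood:", gm.score(X) * len(X))
print("cluster means:", gm.means_)
```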
EM can decide how many clusters to create by cross validation (as is the case in the present study), or the number may be specified a priori (normal and abnormal clusters); a cross-validated model-selection sketch is given below. In the scenario depicted in Fig. 2, the patient continuously sends compressed ECG information to the hospital, which clusters the new information and checks whether there are two clearly segregated clusters. When the compressed ECG falls under the abnormal cluster (or inclines towards the abnormal cluster), as shown in Fig. 12, abnormality is detected. If such an abnormality is observed, an immediate alarm is raised, since the ECG pattern has been found to be significantly different from the normal patterns.
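The cross-validated choice of the cluster count can be sketched as follows; this mirrors the automatic selection described above but is an assumed re-implementation, not the original tooling.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score

def choose_n_clusters(X, max_k=5):
    """Pick the number of EM clusters by mean held-out log-likelihood."""
    best_k, best_ll = 1, -np.inf
    for k in range(1, max_k + 1):
        gm = GaussianMixture(n_components=k, covariance_type="diag",
                             random_state=0)
        # GaussianMixture.score returns the average log-likelihood, so
        # cross_val_score yields held-out log-likelihood per fold.
        ll = cross_val_score(gm, X, cv=5).mean()
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k
```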
Fig. 9. 20 randomly selected ECG segments for the CU1 entry (from the CU Ventricular Tachyarrhythmia MIT-BIH database).
In our experiments, the EM algorithm has been successful in isolating the normal and abnormal compressed ECG with remarkable accuracy (100%) on the 20 ECG segment dataset.
3.2. Instant abnormality detection from compressed ECG
Once the proposed model is trained, we know the cluster centers (or means) of all the selected attributes for both classes. With this knowledge, whenever a new compressed ECG is sent by the patient, the DMA calculates the frequencies of the selected characters (the attributes selected in the training stage). These inputs (the attribute values of the instance) are fed, along with the cluster centers, to Algorithm 1, which determines the initiation of abnormality.
During an initiation of abnormality, we expect the compressed ECG packet to contain both normal and abnormal ECG. Therefore, for these initiation-of-abnormality packets, the distances from the normal cluster centers (for the selected attributes) start to increase. Abnormality can be signalled once the distance between the instance (the initiation ECG packet) and the normal cluster mean exceeds a threshold value, as sketched below.
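A minimal sketch of this distance-threshold check follows; the function name, the normal-cluster center and the threshold value are illustrative assumptions, the latter requiring patient-specific calibration.

```python
import numpy as np

def abnormality_alarm(f, normal_center, threshold=50.0):
    """Flag a newly arrived compressed ECG packet as abnormal when its
    attribute vector f drifts too far from the normal cluster center."""
    distance = np.linalg.norm(np.asarray(f) - np.asarray(normal_center))
    return distance > threshold  # True should trigger the emergency alert
```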
Fig. 10. Frequency distribution of the 20 randomly selected ECG segments for the CU1 entry (of Fig. 9). The boxed region shows high frequencies of attributes 115–131, denoting abnormality in the compressed ECG.
Table 2
Selected characters (first half of the attributes) and their respective frequencies in compressed ECGs (normal) for 13 different instances.

At. | N1 N2 N3 N4 N5 N6 N7 N8 N9 N10 N11 N12 N13 | CCtr
@ | 7 10 8 7 7 5 10 8 9 8 10 9 7 | 8.0769
$ | 6 8 5 5 10 15 9 7 4 9 11 6 10 | 8.0769
Ø | 5 9 6 5 8 8 6 2 5 8 12 9 7 | 6.9231
Å | 8 10 8 12 9 9 15 9 18 11 9 10 6 | 10.3077
å | 5 12 14 11 7 7 5 11 7 7 15 12 11 | 9.5385
_ | 6 9 9 7 8 17 2 7 6 5 8 12 8 | 8
[ | 5 3 5 7 4 7 9 4 2 8 5 5 4 | 5.2308
] | 13 9 5 8 10 7 7 8 6 12 4 10 12 | 8.5385
j | 15 13 11 11 11 15 8 17 11 11 8 5 7 | 11
Æ | 8 14 10 6 7 8 15 12 8 11 12 8 13 | 10.1538
& | 14 13 10 10 8 10 9 7 12 11 6 8 8 | 9.6923
( | 7 11 5 15 11 10 11 12 10 14 6 13 9 | 10.3077
* | 4 7 7 7 9 6 5 6 6 14 5 8 6 | 6.9231
: | 12 8 8 13 8 8 11 5 8 8 9 13 8 | 9.1538
; | 12 14 6 12 10 13 15 13 10 6 11 10 9 | 10.8462
ü | 11 5 12 7 7 6 8 6 7 6 12 8 2 | 7.4615
Á | 9 9 5 11 14 7 8 9 12 7 13 8 13 | 9.6154
Ë | 5 0 3 3 3 1 2 4 3 2 2 6 6 | 3.0769
k | 14 8 7 11 9 4 11 5 10 9 9 6 6 | 8.3846
l | 9 6 4 4 5 9 6 9 4 8 3 7 7 | 6.2308
m | 7 6 7 8 7 3 5 4 4 6 6 4 7 | 5.6923
o | 2 8 4 2 1 6 3 4 1 3 3 4 5 | 3.5385
r | 10 14 7 11 9 9 12 11 8 16 13 11 8 | 10.6923
s | 19 10 9 9 9 9 6 8 9 7 8 11 6 | 9.2308
After the detection of abnormality initiation, the emergency personnel can be contacted for the rescue of the patient (Fig. 2).
4. Results and discussion
Fig. 9 shows 20 different segments of ECG for the CU1 entry of the CU Ventricular Arrhythmia database (Physiobank, 2009) in a matrix format. Sub-figures 1–3 ([1,1], [1,2] and [1,3]) of Fig. 9 are normal ECG segments. Sub-figure 4, or [1,4], shows the initiation of ventricular arrhythmia. Sub-figures 5–10 represent continued cardiac abnormality (a Ventricular Tachyarrhythmia episode). The remaining sub-figures of Fig. 9 show normal ECG segments for patient CU1. It should be noted that in our proposed architecture (Fig. 2), plain ECG (as in Fig. 9) is not viewed anywhere; Fig. 9 only serves to illustrate the concept behind this paper.
As shown in Fig. 8, we only receive compressed ECG, from which the frequencies of all attributes (Fig. 7) are calculated. After calculating the frequencies of the 157 attributes from the compressed ECGs of Fig. 9, we can observe that a certain group of characters has different frequency bands for normal and abnormal ECGs. Fig. 10 illustrates that sub-figures 4–10 have notably higher frequencies for attributes 115–131 (the character set {[t–z],[A–J]}); these sub-figures correspond to abnormal ECG. Therefore, Fig. 10 shows that certain compressed character frequencies behave differently for abnormal ECG.
However, rather than manually inspecting the characters responsible for signalling abnormality, an accurate and automated attribute selection procedure is highly desirable. Our attribute selection process on the 20 different instances yields 48 key characters or attributes, which are shown in the left column of Tables 2–5. Based on these 48 attributes, we generated clusters using the previously described EM methodology.
Table 3
Selected characters (last half of the attributes) and their respective frequencies in compressed ECGs (normal) for 13 different instances.

At. | N1 N2 N3 N4 N5 N6 N7 N8 N9 N10 N11 N12 N13 | CCtr
t | 6 6 6 5 6 6 7 5 6 4 5 6 2 | 5.3846
u | 3 2 6 3 1 3 5 2 4 3 4 1 5 | 3.2308
v | 4 4 4 1 2 5 3 1 3 1 3 1 3 | 2.6923
w | 0 1 3 1 2 3 3 2 2 4 4 0 1 | 2
x | 0 4 2 0 3 1 0 1 0 2 0 0 2 | 1.1538
y | 2 0 0 4 0 2 0 1 2 0 0 0 1 | 0.9231
z | 0 2 2 1 1 3 0 1 2 2 0 1 1 | 1.2308
A | 0 1 1 2 0 0 1 2 0 2 1 0 1 | 0.8462
B | 3 0 2 2 3 2 0 1 3 1 1 0 0 | 1.3846
C | 1 3 2 0 0 0 0 0 1 0 0 0 0 | 0.5385
D | 1 0 0 2 2 2 0 0 1 3 2 1 0 | 1.0769
E | 0 1 2 0 0 2 0 1 1 0 1 0 3 | 0.8462
F | 1 2 2 0 1 1 3 2 1 0 0 1 0 | 1.0769
G | 1 0 1 1 2 2 2 1 2 2 2 3 4 | 1.7692
H | 0 0 0 0 0 0 0 0 2 0 0 1 0 | 0.2308
I | 0 1 0 1 0 1 0 1 1 0 0 1 1 | 0.5385
J | 0 1 0 0 0 0 2 1 2 0 0 1 0 | 0.5385
K | 0 1 1 0 0 2 0 0 0 0 1 0 0 | 0.3846
L | 0 0 0 0 0 0 0 2 0 0 0 1 0 | 0.2308
M | 0 1 0 1 0 1 2 0 0 1 2 0 1 | 0.6923
N | 0 0 1 0 0 0 0 0 0 0 0 0 1 | 0.1538
O | 0 0 0 0 0 1 0 0 1 1 0 0 0 | 0.2308
R | 0 0 0 1 1 0 0 0 1 0 0 0 1 | 0.3077
50–100 | 31 35 30 36 30 31 29 29 34 34 26 29 31 | 31.1538
Table 4
Selected characters (first half of the attributes) and their respective frequencies in compressed ECGs (abnormal) for seven different instances.

At. | An1 An2 An3 An4 An5 An6 An7 | CCtr
@ | 24 50 47 53 52 46 48 | 45.7143
$ | 6 3 1 1 2 1 2 | 2.2857
Ø | 4 1 1 1 1 0 0 | 1.1429
Å | 4 3 3 1 1 2 1 | 2.1429
å | 4 2 5 0 2 2 0 | 2.1429
_ | 0 2 2 1 3 2 1 | 1.5714
[ | 2 2 0 1 0 0 0 | 0.7143
] | 1 6 1 2 0 0 0 | 1.4286
j | 5 4 0 3 0 0 0 | 1.7143
Æ | 4 3 3 3 3 1 1 | 2.5714
& | 6 1 2 3 2 3 0 | 2.4286
( | 3 1 0 3 0 2 1 | 1.4286
* | 0 0 3 2 1 3 1 | 1.4286
: | 2 1 1 0 1 0 1 | 0.8571
; | 5 2 1 4 3 0 1 | 2.2857
ü | 5 2 1 0 2 1 0 | 1.5714
Á | 5 3 3 0 1 1 0 | 1.8571
Ë | 0 0 1 1 1 0 1 | 0.5714
k | 14 13 13 12 19 14 12 | 13.8571
l | 14 17 11 9 14 6 11 | 11.7143
m | 8 16 16 14 15 15 16 | 14.2857
o | 10 13 18 15 26 20 10 | 16
r | 32 34 44 33 47 37 26 | 36.1429
s | 33 39 32 29 54 38 26 | 35.8571
Table 5
Selected characters (last half of the attributes) and their respective frequencies in compressed ECGs (abnormal) for seven different instances.

At. | An1 An2 An3 An4 An5 An6 An7 | CCtr
t | 14 34 34 34 28 28 21 | 27.5714
u | 21 36 28 40 37 39 34 | 33.5714
v | 19 29 29 31 31 26 27 | 27.4286
w | 16 33 36 19 30 30 32 | 28
x | 15 37 26 30 33 33 20 | 27.7143
y | 17 28 24 19 37 35 31 | 27.2857
z | 15 48 23 39 41 33 34 | 33.2857
A | 8 31 20 20 21 27 17 | 20.5714
B | 15 23 27 22 30 25 23 | 23.5714
C | 14 28 21 21 19 36 16 | 22.1429
D | 9 21 26 13 23 25 16 | 19
E | 10 9 19 26 14 31 21 | 18.5714
F | 17 23 19 27 13 21 20 | 20
G | 25 46 39 41 40 63 52 | 43.7143
H | 7 17 15 17 11 18 19 | 14.8571
I | 7 14 8 20 4 19 14 | 12.2857
J | 6 8 13 12 12 16 15 | 11.7143
K | 5 8 12 13 8 14 13 | 10.4286
L | 3 9 8 12 6 13 22 | 10.4286
M | 4 8 7 12 3 10 17 | 8.7143
N | 2 7 8 2 2 7 19 | 6.7143
O | 3 3 12 6 2 9 16 | 7.2857
R | 3 4 4 4 1 5 7 | 4
50–100 | 15 13 15 49 6 9 42 | 21.2857
EM generated two clusters with 100% accuracy when the clusters were compared (i.e. cross-validated) against the known classes (abnormal and normal ECG segments). It is worth mentioning that EM was not informed of the number of clusters (i.e. 2). The log likelihood measured by EM, after creation of the 2 clusters based on the 48 selected attributes, is 100.27906. Tables 2 and 3 show the frequencies of these characters for the 13 normal ECG instances; Tables 4 and 5 show the 7 abnormal instances. In all the tables, the cluster means or centers (rightmost column, CCtr) are distant from each other, and for the normal and abnormal cases the respective attributes show affinity towards their corresponding class means. Fig. 11 shows the difference between normal and abnormal ECGs for the selected 48 attributes. Unlike Fig. 10, where 16 characters (attributes 115–131) show a visual distinction, Fig. 11 shows a clear distinction across all 48 automatically selected attributes.
Algorithm 1. Detection of the abnormality initiation

// Input: attribute values for all the instances
// Input: cluster means of the 2 clusters for all the attributes
// Output: the most equidistant instance

Step 1. Create distance vectors \(A_j\) and \(B_j\) for clusters 1 and 2, where \(j\) indexes the instances:
\[ A_j = \sqrt{\sum_{i=1}^{I} \left( f_i^j - C_i^1 \right)^2}, \qquad B_j = \sqrt{\sum_{i=1}^{I} \left( f_i^j - C_i^2 \right)^2} \]
Here, \(f^j\) is the attribute value vector of instance \(j\) over all \(I\) attributes, \(C^1\) and \(C^2\) are the centroid vectors of cluster means 1 and 2 (normal and abnormal), and \(i = 1, 2, 3, \ldots, I\) indexes the attributes.

Step 2. The symmetricity metric is generated by normalizing the difference of the distance vectors for the 2 clusters:
\[ S_j = \frac{|A_j - B_j|}{\mathrm{Max}(A_j, B_j)} \]

Step 3. The most equidistant instance \(R\) has the lowest value of \(S_j\):
\[ S_R = \mathrm{Min}(S_j) \]
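Algorithm 1 transcribes directly into a few lines of numpy. In the sketch below, F is assumed to be the instances-by-attributes frequency matrix and c1, c2 the two cluster centroids; it is an illustrative rendering, not the authors' code.

```python
import numpy as np

def most_equidistant(F, c1, c2):
    """Return the index of the instance most equidistant from both centroids,
    together with the symmetricity metric S for every instance."""
    A = np.linalg.norm(F - c1, axis=1)    # Step 1: distances to cluster 1
    B = np.linalg.norm(F - c2, axis=1)    # Step 1: distances to cluster 2
    S = np.abs(A - B) / np.maximum(A, B)  # Step 2: symmetricity metric
    return int(np.argmin(S)), S           # Step 3: R = argmin_j S_j
```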
This study was first demonstrated in Sufi and Khalil (2009) and is enhanced in the current work to show the feasibility of an automated alert mechanism, based on data mining techniques and compressed ECG, designed to save the lives of monitored CVD patients.
Now that we can observe two distinct clusters for normal and abnormal compressed ECG segments, the question of belonging arises for compressed ECG segments that contain half normal and half abnormal ECG. For example, this situation can be observed in the 4th sub-figure of Fig. 9 (row 1, column 4); this initiation of abnormality is also depicted in Fig. 4. It should be mentioned again that, for the sake of clarity, the original ECG segments are shown in Figs. 4 and 9; in a real monitoring scenario, only compressed ECGs are handled by the patient and the DM agent (Fig. 2). For this initiation-of-abnormality case (Fig. 4), we logically expect the segment to be equidistant from the two clusters, as it contains both normal and abnormal ECG. Representing this fact in a two-dimensional coordinate system is not straightforward, as we are dealing with 48 attributes and each attribute provides an individual decision of belonging towards a particular cluster.
To represent the fact that compressed ECG packets containing both normal and abnormal ECG are nearly equidistant from both clusters in a two-dimensional coordinate system, we define the concept of symmetricity of instances in a bi-class clustering: an instance is said to be symmetric with respect to a bi-class clustering when its location is nearly equidistant from both cluster centroids.
Algorithm 1 essentially determines the instance that is equidistant from both classes. In the first step, Algorithm 1 calculates the cluster distances for all 20 instances of the example case (i.e. the distance from the normal cluster, \(A_j\), and the distance from the abnormal cluster, \(B_j\)). For this example case, the cardinality of \(A_j\) and \(B_j\) is 20 (\(|A_j| = |B_j| = 20\)).
Fig. 11. Normal and abnormal cluster means.
Fig. 12. Segregation of normal and abnormal ECG into two different clusters. Instances 1, 2, 3 and 11–20 fall in the normal cluster; instances 5–10 fall in the abnormal cluster; instance 4 (the initiation of abnormality) lies between the two.
Using step 2 of Algorithm 1, we can also compute our proposed symmetricity metric \(S_j\) for the 20 instances of our example case (as seen in Table 6). We can clearly see that the most equidistant case \(R\) is the 4th one (the 4th subplot of Fig. 9, i.e. Fig. 4). Therefore, \(R = 4\), as \(S_4 = \mathrm{Min}(S_j)\), where \(j = 1, 2, 3, \ldots, 20\).
Algorithm 1 can clearly identify the initiation of abnormality, and as soon as the algorithm detects a shift away from the normal cluster, it can notify the emergency personnel to assist the monitored patient. This paper serves as a proof of concept showing that cardiac abnormality can be detected directly from compressed ECG with the application of a data mining technique like EM.
Fig. 12 shows that sub-figure 4 of Fig. 9 (i.e. Fig. 4) is nearly equidistant from the 2 clusters according to Algorithm 1 (while being somewhat closer to the abnormal cluster), even though it belongs to the abnormal cluster according to EM. The other instances (or compressed ECG segments) are clearly identified as members of either the normal or the abnormal cluster.
5. Conclusion
In this paper, we have used data mining techniques, namely CFS based attribute selection and EM based clustering, to instantly detect cardiac abnormalities in CVD affected subjects. The proposed DM driven CVD detection framework detects ECG abnormalities without incurring delays, which is ideal for CVD affected patients, as every second counts towards the mortality of these patients (Luca, Suryapranata, Ottervanger, & Antman, 2004). For detecting the ECG anomalies, our proposed model does not have to decompress the compressed ECG.
With CVD related deaths being the number one killer of modern times, our proposed instant ECG anomaly detection algorithm has the potential to save the life of a CVD affected patient. This is because, without proper monitoring (i.e. the real-time ECG monitoring demonstrated in Lee et al. (2007), Hung and Zhang (2003), Blount (2007), Sufi et al. (2009) and Sufi and Khalil (2008)), 40% of the patients having their first symptom of CVD might be dead within years (Access Economics Pty Limited, 2008).
According to our experimentation on MIT-BIH entries (Physiobank, 2009), 100% accuracy can be achieved in detecting cardiac abnormality from compressed ECG. However, in this paper we focused on essentially two clusters (i.e. normal and abnormal), which can only determine abnormality. To identify the type of abnormality (e.g. Ventricular Fibrillation, Atrial Fibrillation, Premature Ventricular Beat, etc.), a multicluster system, where each cluster represents one particular disease, needs to be implemented in the future.
References

Access Economics Pty Limited. The shifting burden of cardiovascular disease in Australia. A report for the Heart Foundation. <http://www.heartfoundation.com.au/media/nhfashifting_burden_cvd_0505.pdf> Accessed 2008.
Blount, M. et al. (2007). Remote health-care monitoring using personal care connect. IBM Systems Journal, 46(1), 95–113.
Friesen, G., Jannett, T., Jadallah, M., Yates, S., Quint, S., & Nagle, H. (1990). A comparison of the noise sensitivity of nine QRS detection algorithms. IEEE Transactions on Biomedical Engineering, 37(1), 85–98.
Hall, M. (1999). Correlation-based feature selection of discrete and numeric class machine learning. Computer science working paper 00/08. University of Waikato, Department of Computer Science.
Hamilton, P. S., & Tompkins, W. J. (1986). Quantitative investigation of QRS detection rules using the MIT/BIH arrhythmia database. IEEE Transactions on Biomedical Engineering, BME-33(12), 1157–1165.
Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques. Morgan Kaufmann.
Hung, K., & Zhang, Y.-T. (2003). Implementation of a WAP-based telemedicine system for patient monitoring. IEEE Transactions on Information Technology in Biomedicine, 7(2), 101–107.
Istepanian, R., & Petrosian, A. (2000). Optimal zonal wavelet-based ECG data compression for a mobile telecardiology system. IEEE Transactions on Information Technology in Biomedicine, 4(3), 200–211.
Kim, B., Yoo, S., & Lee, M. (2006). Wavelet-based low-delay ECG compression algorithm for continuous ECG transmission. IEEE Transactions on Information Technology in Biomedicine, 10(1), 77–83.
Kira, K., & Rendell, L. (1992). A practical approach to feature selection. In Proceedings of the ninth international workshop on machine learning (pp. 249–256). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Lee, R.-G., Chen, K.-C., Hsiao, C.-C., & Tseng, C.-L. (2007). A mobile care system with alert mechanism. IEEE Transactions on Information Technology in Biomedicine, 11(5), 507–517.
Luca, G. D., Suryapranata, H., Ottervanger, J. P., & Antman, E. M. (2004). Time delay to treatment and mortality in primary angioplasty for acute myocardial infarction: Every minute of delay counts. Circulation, 109, 1223–1225.
Mahmood, A., Leckie, C., & Udaya, P. (2008). An efficient clustering scheme to exploit hierarchical data in network traffic analysis. IEEE Transactions on Knowledge and Data Engineering, 20(6), 752–767.
Physiobank: Physiologic signal archives for biomedical research. <http://www.physionet.org/physiobank/> Accessed 2009.
Robnik-Sikonja, M., & Kononenko, I. (1997). An adaptation of Relief for attribute estimation in regression. In Proceedings of the fourteenth international conference on machine learning (pp. 296–304). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Sufi, F. (2007). Mobile phone programming Java 2 Micro Edition. In Proceedings of the 2007 international workshop on mobile computing technologies for pervasive healthcare, Phillip Island, Melbourne, December 2007 (pp. 64–80).
Sufi, F., Fang, Q., & Cosic, I. (2007). ECG R-R peak detection on mobile phones. In 29th annual international conference of the IEEE Engineering in Medicine and Biology Society, EMBS 2007, August 2007 (pp. 3697–3700).
Sufi, F., Fang, Q., Khalil, I., & Mahmoud, S. S. (2009). Novel methods of faster cardiovascular diagnosis in wireless telecardiology. IEEE Journal on Selected Areas in Communications, 27(4).
Sufi, F., Fang, Q., Mahmoud, S., & Cosic, I. (2006). A mobile phone based intelligent telemonitoring platform. In 3rd IEEE/EMBS international summer school on medical devices and biosensors, ISSMDBS 2006, September 2006 (pp. 101–104).
Sufi, F., & Khalil, I. (2008). Enforcing secured ECG transmission for realtime telemonitoring: A joint encoding, compression, encryption mechanism. Security and Communication Networks, 1(5), 389–405.
Sufi, F., & Khalil, I. (2009). Diagnosis of cardiovascular abnormalities from compressed ECG: A data mining based approach. In 9th international conference on information technology and applications in biomedicine, ITAB 2009, Cyprus, November 2009.
Sufi, F., Khalil, I., Fang, Q., & Cosic, I. (2008). A mobile web grid based physiological signal monitoring system. In International conference on technology and applications in biomedicine, ITAB 2008, May 2008 (pp. 252–255).
Surez, K. V., Silva, J. C., Berthoumieu, Y., Gomis, P., & Najim, M. (2007). ECG beat detection using a geometrical matching approach. IEEE Transactions on Biomedical Engineering, 54(4), 641–650.
Talavera, L. (1999a). Dependency-based feature selection for clustering symbolic data. Intelligent Data Analysis, 4(1/2000), 19–28.
Talavera, L. (1999b). Feature selection as a preprocessing step for hierarchical clustering. In Proceedings of the sixteenth international conference on machine learning (pp. 389–397). Morgan Kaufmann Publishers Inc.
Table 6
\(A_j\), \(B_j\) and \(S_j\) values for the 20 ECG segments. The fourth value of \(S_j\) is the lowest (\(S_4 = 0.24573771\)), signalling equidistance from both the normal and abnormal clusters (i.e. initiation of abnormality).

A_j = {17.30530722, 12.36044732, 13.77612638, 72.26985027, 134.7904569, 120.5433543, 125.0978411, 137.5469333, 144.1012035, 123.8610014, 12.08989901, 9.406054328, 16.00034556, 14.48650607, 12.22903341, 13.18553215, 14.27789403, 15.17882927, 12.32616964, 13.55373225}
B_j = {118.8628916, 118.3491715, 117.4859846, 54.51042274, 29.27631103, 23.41155244, 38.1514297, 37.79026049, 35.32039337, 38.08583448, 121.1030833, 120.3706758, 119.8604513, 119.2606355, 120.2970672, 118.8009722, 119.3707786, 118.8893224, 120.5503405, 120.1616153}
S_j = {0.854409505, 0.895559494, 0.882742385, 0.24573771, 0.782801307, 0.805783134, 0.695027273, 0.725255521, 0.754891753, 0.692511492, 0.900168529, 0.921857593, 0.866508549, 0.878530699, 0.898343046, 0.88901158, 0.880390375, 0.87232807, 0.897750852, 0.887204144}
... Previously, we reported AI and Machine Learning (ML) algorithms for solving multi-disciplinary problems ranging from abnormality detection ( Sufi, Fang, Khalil, & Mahmoud, 2009 ;Sufi & Khalil, 2010 ;Sufi & Khalil, 2011a ), person identification, ( Sufi & Khalil, 2011b ), and AI based knowledge discovery from Global Landslide ( Sufi & Alsulami, 2021b ;Sufi, 2021 ). All these previous studies required feature attributes to be present on which ML algorithms could operate. ...
... Moreover, for this study we only used linear and logistics regression to discover the related factors on negative news. Sophisticated algorithms like greedy Correlation based Feature Selection (CFS), which was used in our previous studies at ( Sufi & Khalil, 2010 ;Sufi & Khalil, 2011a ;Sufi & Khalil, 2011b ), were never utilized. ...
Article
Full-text available
Modern-day news agencies cater for a wide range of negative news, since multiple studies show general people are more attracted towards negative news. Once a highly negative incident is reported by a local news agency, it is often propagated by many other foreign news agencies at a global scale characterizing the news as breaking news. This propagation of negative news generates significant impacts on groups (who conducted the event), location (where the event was conducted), societies (that was impacted by the news) along with other factors. This research critically analyzes the impacts of negative news or breaking news with the help Artificial Intelligence (AI) based techniques like sentiment analysis, entity detection and automated regression analysis. The methodology described within this paper was implemented with a unique algorithm that allowed identification of all related factors or topics that drive negative perceptions towards global news. The solution was hosted in cloud environment from 2nd June 2021 till 1st September 2021. It automatically captured and analyzed 22,425 global news from 2397 different news sources of 192 countries. During this time, 34,975 entities are automatically categorized into 13 different entity groups. The classification accuracy of the entity detection was found to be 0.992, 0.995 and 0.994 in terms of precision, recall and F1-score. Moreover, the accuracies of logistic regression and linear regression were found to be 0.895 in AUC and 0.255 in MAPE on an average. Finally, the presented solution was successfully deployed in a wide range of environments including smartphones, tablets, and desktops.
... We have reported on the use of AI for solving multidisciplinary problems, ranging from abnormality detection from biomedical signals [5][6][7], person identification from electrocardiograms [8], knowledge discovery from landslide data [9], and global event analysis from online news data [10]. Sentiment analysis, entity detection and translation techniques were used in [10], decomposition tree analysis was used in [9] and other AI based techniques were used in [5][6][7][8]. ...
... We have reported on the use of AI for solving multidisciplinary problems, ranging from abnormality detection from biomedical signals [5][6][7], person identification from electrocardiograms [8], knowledge discovery from landslide data [9], and global event analysis from online news data [10]. Sentiment analysis, entity detection and translation techniques were used in [10], decomposition tree analysis was used in [9] and other AI based techniques were used in [5][6][7][8]. However, none of our previous study were focused into COVID-19 data analysis. ...
... It is methodologically possible to perform behavioral analysis of Twitter users by harnessing big data extracted from their social media accounts [8] and then using various artificial intelligence (AI)-based techniques reported in our earlier studies [9][10][11][12][13][14][15][16][17][18]. Existing studies in the political science domain either used geotagged tweets or completely skipped usage of location extraction algorithms, as observed in [4][5][6][7]19,20]. ...
... Moreover, in the future, we endeavor to use deep learning algorithms such as convolution neural network (CNN), linear regression, logistic regression, clustering algorithms such as expectation maximization (EM), similar to our previously demonstrated work in AI and machine learning [9][10][11][12][13][14][15][16][17]. ...
Article
Full-text available
Social media platforms such as Twitter have been used by political leaders, heads of states, political parties, and their supporters to strategically influence public opinions. Leaders can post about a location, a state, a country, or even a region in their social media accounts, and the posts can immediately be viewed and reacted to by millions of their followers. The effect of social media posts by political leaders could be automatically measured by extracting, analyzing, and producing real-time geospatial intelligence for social scientists and researchers. This paper proposed a novel approach in automatically processing real-time social media messages of political leaders with artificial intelligence (AI)-based language detection, translation, sentiment analysis, and named entity recognition (NER). This method automatically generates geospatial and location intelligence on both ESRI ArcGIS Maps and Microsoft Bing Maps. The proposed system was deployed from 1 January 2020 to 6 February 2022 to analyze 1.5 million tweets. During this 25-month period, 95K locations were successfully identified and mapped using data of 271,885 Twitter handles. With an overall 90% precision, recall, and F1score, along with 97% accuracy, the proposed system reports the most accurate system to produce geospatial intelligence directly from live Twitter feeds of political leaders with AI.
... The MIT-BIH database has significantly influenced the field of machine learning-based ECG analysis for over 10 years. Scientists have utilized this large dataset to create and improve algorithms that can increase the accuracy and efficiency of automated ECG interpretation [15,16]. ...
Article
Full-text available
Worldwide, cardiovascular diseases are some of the primary causes of death; yet the early detection and diagnosis of such diseases have the potential to save many lives. Technological means of detection are becoming increasingly essential and numerous techniques have been created for this purpose, such as forecasting. Of these techniques, the time series forecasting technique seeks to predict future events. The long-term time series forecasting of physiological data could assist medical professionals in predicting and treating patients based on very early diagnosis. This article presents a model that utilizes a deep learning technique to predict long-term ECG signals. The forecasting model can learn signals’ nonlinearity, nonstationarity, and complexity based on a long short-term memory architecture. However, this is not a trivial task as the correct forecasting of a signal that closely resembles the original complex signal’s structure and behavior while minimizing any differences in amplitude continues to pose challenges. To achieve this goal, we used a dataset available on the Physio net database, called MIT-BIH, with 48 ECG recordings of 30 min each. The developed model starts with pre-processing to reduce interference in the original signals, then applies a deep learning algorithm, based on a long short-term memory (LTSM) neural network with two hidden layers. Next, we applied the root mean square error (RMSE) and mean absolute error (MAE) metrics to evaluate the performance of the model and obtained an average RMSE of 0.0070±0.0028 and an average MAE of 0.0522±0.0098 across all simulations. The results indicate that the proposed LSTM model is a promising technique for ECG forecasting, considering the trends of the changes in the original data series, most notably in R-peak amplitude. Given the model’s accuracy and the features of the physiological signals, the system could be used to improve existing predictive healthcare systems for cardiovascular monitoring.
... Previously, we utilized AI algorithms for knowledge discovery on landslides [29,30], global event analysis [6], person identification [31], and cardiac abnormality detection [32][33][34]. These studies mandated feature attributes to be existent on which the AI algorithms could execute. ...
Article
Full-text available
Negative events are prevalent all over the globe round the clock. People demonstrate psychological affinity to negative events, and they incline to stay away from troubled locations. This paper proposes an automated geospatial imagery application that would allow a user to remotely extract knowledge of troubled locations. The autonomous application uses thousands of connected news sensors to obtain real-time news pertaining to all global troubles. From the captured news, the proposed application uses artificial intelligence-based services and algorithms like sentiment analysis, entity detection, geolocation decoder, news fidelity analysis, and decomposition tree analysis to reconstruct global threat maps representing troubled locations interactively. The fully deployed system was evaluated for full three months of summer 2021, during which the autonomous system processed above 22 k news from 2397 connected news sources involving BBC, CNN, NY Times, Government websites of 192 countries, and all possible major social media sites. The study revealed 11,668 troubled locations classified successfully with outstanding precision, recall, and F1-score, all evaluated in ubiquitous environment covering mobile, tablet, desktop, and cloud platforms. The system generated interesting global threat maps for robust scenario set of $$3.71 \times {10}^{29}$$ 3.71 × 10 29 , to be reported as original fully autonomous remote sensing application of this kind. The research discloses attractive news and global threat-maps with trusted overall classification accuracy.
... In our future work, we endeavor to employ deep learning algorithms like CNN, Linear Regression, Logistic Regression, Decomposition Analysis, Clustering Algorithms like EM similar to our previously demonstrated work in AI [13,14,16,17,22,[44][45][46]. ...
Article
Full-text available
Existing studies on Twitter-based natural disaster analysis suffer from shortcomings like limitations on supported languages, lack of sentiment analysis, regional restrictions, lack of end-to-end automation, and lack of Mobile App support. In this study, we design and develop a fully-automated artificial intelligence (AI) based Decision Support System (DSS) available through multiple platforms like iOS, Android, and Windows. The proposed DSS uses a live Twitter feed to obtain natural disaster-related Tweets in 110 supported languages. The system automatically executes AI-based translation, sentiment analysis, and automated K-Means algorithm to generate AI-driven insights for disaster strategists. The proposed DSS was tested with 67,528 real-time Tweets captured between 28 September 2021 and 6 October 2021 in 39 different languages under two different scenarios. The system revealed critical information for disaster planners or strategists like which clusters of natural disasters were associated with the most negative sentiments. We evaluated the proposed system’s accuracy and user experiences from 12 different disaster strategists. 83.33% of users found the proposed solution easy to use, effective, and self-explanatory. With 97% and 99.7% accuracy in Twitter keyword extraction and entity classification, this DSS reported the most accurate disaster intelligence system on a mobile platform.
... As a result, any AI tasks performed on false or erroneous data would produce false results. For example, biosignals contain very important information of cardiovascular disease (CVD) patients [49,50]. Hence, ensuring the authenticity of biosignals is very important before using them in healthcare systems. ...
Article
Edge computing is an emerging technology for the acquisition of Internet-of-Things (IoT) data and provisioning different services in connected living. Artificial Intelligence (AI) powered edge devices (edge-AI) facilitate intelligent IoT data acquisition and services through data analytics. However, data in edge networks are prone to several security threats such as external and internal attacks and transmission errors. Attackers can inject false data during data acquisition or modify stored data in the edge data storage to hamper data analytics. Therefore, an edge-AI device must verify the authenticity of IoT data before using them in data analytics. This article presents an IoT data authenticity model in edge-AI for a connected living using data hiding techniques. Our proposed data authenticity model securely hides the data source’s identification number within IoT data before sending it to edge devices. Edge-AI devices extract hidden information for verifying data authenticity. Existing data hiding approaches for biosignal cannot reconstruct original IoT data after extracting the hidden message from it (i.e., lossy) and are not usable for IoT data authenticity. We propose the first lossless IoT data hiding technique in this article based on error-correcting codes (ECCs). We conduct several experiments to demonstrate the performance of our proposed method. Experimental results establish the lossless property of the proposed approach while maintaining other data hiding properties.
... It appears that for all the cases most of tweets were in English followed by French and Dutch languages. It should be mentioned that this study did not used any Data Mining (DM) or Artificial Intelligence (AI) based approaches like our previous research in global event analysis [6], [2], landslide analysis [3,8], cardiovascular disease detection [9,10], or person identification [11]. Rather, this paper utilizes generic statistical techniques like frequency ranking and others simplified calculations [12] to extract meaningful information which might be critical for a social scientist or decision makers who are not familiar with complex AI or DM based techniques. ...
Article
Full-text available
SARS-CoV-2, or more popularly known COVID-19 has claimed more than 5.5 million lives since it has been declared as a global pandemic. Similar to other viruses, COVID-19 is also undergoing several mutations and has many variants like Alpha, Beta, Gamma, Delta, Omicron and others. With so many variants, social media users are confused and posting their frustrations and angers with Tweets or Posts in public social media platforms. These publicly accessible social media posts provide a wealth of information for a social scientist or political leader or a strategic decision maker. This study demonstrates a feasible approach to extract meaningful critical information from social media posts. By programmatically accessing Twitter database from 11th January 2022 till 20th January 2022, we retrieved almost 9 K Tweet messages on 6 different keywords like “COVID Variants”, “Omicron”, “Alpha Variant”, “Beta Variant”, “Gamma Variant” and “Delta Variant”. Results were compared against metrics like users, posts, engagement, and influence. Omicron was found to be the most popular topic compared to other variants with an influence score of 70.2 million and 2.1 K posts during the monitored period. The most popular sources for influences on COVID-19 Variant related posts were found to be @reuters with 24.2M, @forbes with 17.4M, @timesofindia with 14.2M and @inquirerdotnet with 3.4 followers. This study also found out that the most popular Tweet languages were English followed by French and Dutch. Lastly, this study ranked user mentions, word frequency (with word cloud) and hashtags for COVID-19 Variant related twitter posts during the monitored timeframe.
Article
Full-text available
Tropical cyclones devastate large areas, take numerous lives and damage extensive property in Bangladesh. Research on landfalling tropical cyclones affecting Bangladesh has primarily focused on events occurring since AD 1960, with limited work examining earlier historical records. We rectify this gap by developing a new tornado catalogue that includes present and past records of tornados across Bangladesh, maximizing use of available sources. Within this new tornado database, 119 records were captured from 1838 to 2020, accounting for 8,735 deaths and 97,868 injuries and leaving more than 102,776 people affected in total. Moreover, using this new tornado data, we developed an end-to-end system that allows a user to explore and analyze the full range of tornado data under multiple scenarios. The user of this new system can select a date range or search a particular location, and all the tornado information, along with Artificial Intelligence (AI) based insights within that selected scope, is then dynamically presented on a range of devices including iOS, Android, and Windows. Using a set of interactive maps, charts, graphs, and visualizations, the user gains a comprehensive understanding of the historical records of tornados, cyclones and associated landfalls with detailed data distributions and statistics.
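As a sketch of the date-range and location queries the system supports, consider the following Python fragment; the record layout, field names, and sample rows are illustrative assumptions rather than the catalogue's actual schema.

```python
from datetime import date

# Illustrative entries only; the real catalogue holds 119 records.
RECORDS = [
    {"date": date(1838, 4, 26), "district": "Dhaka",     "deaths": 25},
    {"date": date(1989, 4, 26), "district": "Manikganj", "deaths": 1300},
]

def explore(records, start, end, district=None):
    """Summarise records falling in [start, end], optionally by district."""
    hits = [r for r in records
            if start <= r["date"] <= end
            and (district is None or r["district"] == district)]
    return {"count": len(hits),
            "total_deaths": sum(r["deaths"] for r in hits)}

print(explore(RECORDS, date(1900, 1, 1), date(2020, 12, 31), "Manikganj"))
```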
Conference Paper
Full-text available
In this paper, we propose a generic smart telemonitoring platform in which the computational power of the mobile phone is highly utilized. In this approach, compression of ECG is performed in real time by the mobile phone for the very first time. The fast and effective compression scheme, designed for the proposed telemonitoring system, outperforms most real-time lossless ECG compression algorithms. This mobile phone based computation platform is a promising solution to privacy issues in telemonitoring through encryption. Moreover, the mobile phones used in this platform perform preliminary detection of abnormal biosignals in real time. Apart from the usage of mobile phones, this platform supports background biosignal abnormality surveillance using a data mining agent.
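The abstract does not disclose the codec itself, so the following Python sketch shows the kind of lightweight lossless scheme a phone can run in real time: delta coding of successive samples followed by zigzag/varint byte packing. It is a generic illustration, not the paper's algorithm.

```python
def compress(samples):
    """Lossless delta + zigzag + varint coding of integer ECG samples."""
    out, prev = bytearray(), 0
    for s in samples:
        d, prev = s - prev, s
        z = (d << 1) ^ (d >> 63)          # zigzag: small |delta| -> small code
        while z >= 0x80:                  # varint: 7 payload bits per byte
            out.append((z & 0x7F) | 0x80)
            z >>= 7
        out.append(z)
    return bytes(out)

def decompress(data):
    samples, prev, z, shift = [], 0, 0, 0
    for b in data:
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:                  # last byte of this varint
            d = (z >> 1) ^ -(z & 1)       # undo zigzag
            prev += d
            samples.append(prev)
            z, shift = 0, 0
    return samples

assert decompress(compress([512, 515, 511, 509])) == [512, 515, 511, 509]
```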
Article
Full-text available
With the rapid development of wireless technologies, mobile phones are gaining acceptance as an effective tool for cardiovascular monitoring. However, existing technologies have limitations in terms of efficient transmission of compressed ECG over text messaging communications like SMS and MMS. In this paper, we first propose an ECG compression algorithm which allows lossless transmission of compressed ECG over a bandwidth-constrained wireless link. Then, we propose several algorithms for cardiovascular abnormality detection directly from the compressed ECG, maintaining end-to-end security and patient privacy while offering the benefit of faster diagnosis. Next, we show that our mobile phone based cardiovascular monitoring solution is capable of achieving up to 6.72 times faster diagnosis compared to existing technologies. As the decompression time on a doctor's mobile phone can be significant, our method is highly advantageous in a patient wellness monitoring system where a doctor has to read and diagnose from the compressed ECGs of several patients assigned to him. Finally, we successfully implemented the prototype system by establishing mobile phone based cardiovascular patient monitoring.
Article
Full-text available
A new integrated design approach for an optimal zonal wavelet-based ECG data compression (OZWC) method for a mobile telecardiology model is presented. The hybrid implementation issues of this wavelet method with a GSM-based mobile telecardiology system are also introduced. The performance of the mobile system with compressed ECG data segments selected from the MIT-BIH arrhythmia database is evaluated in terms of bit error rate (BER), percent rms difference (PRD), and visual clinical inspection. The compression performance of the OZWC is compared with another wavelet-based approach (Discrete Symmetric Wavelet Compression). The optimal wavelet algorithm achieved a maximum compression ratio of 18:1 with low PRD ratios. The mobile telemedical simulation results show successful compressed ECG transmission at speeds of 100 km/h with BERs of less than 10⁻¹⁵, providing a 73% reduction in total mobile transmission time with clinically acceptable reconstruction of the received signals. This approach will provide a framework for the design and functionality issues of GSM-based wireless telemedicine systems with wavelet compression techniques and their future integration into the next generation of mobile telecardiology systems.
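For readers unfamiliar with the metrics quoted above, the sketch below implements a generic threshold-based wavelet compressor (not the OZWC algorithm itself) together with the percent rms difference, PRD = 100·sqrt(Σ(x − x̂)² / Σx²); PyWavelets (pywt) is an assumed dependency.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_compress(ecg, wavelet="db4", level=4, keep=0.10):
    """Keep only the largest `keep` fraction of wavelet coefficients,
    then reconstruct; a generic stand-in for zonal wavelet compression."""
    coeffs = pywt.wavedec(ecg, wavelet, level=level)
    flat, slices = pywt.coeffs_to_array(coeffs)
    thresh = np.quantile(np.abs(flat), 1.0 - keep)
    flat[np.abs(flat) < thresh] = 0.0              # discard small coefficients
    kept = pywt.array_to_coeffs(flat, slices, output_format="wavedec")
    return pywt.waverec(kept, wavelet)[: len(ecg)]

def prd(x, x_hat):
    """Percent rms difference between original and reconstructed signals."""
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    return 100.0 * np.sqrt(np.sum((x - x_hat) ** 2) / np.sum(x ** 2))
```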
Article
Full-text available
Caring for patients with chronic illnesses is costly—nearly $1.27 trillion today and predicted to grow much larger. To address this trend, we have designed and built a platform, called Personal Care Connect (PCC), to facilitate the remote monitoring of patients. By providing caregivers with timely access to a patient's health status, they can provide patients with appropriate preventive interventions, helping to avoid hospitalization and to improve the patient's quality of care and quality of life. PCC may reduce health-care costs by focusing on preventive measures and monitoring instead of emergency care and hospital admissions. Although PCC may have features in common with other remote monitoring systems, it differs from them in that it is a standards-based, open platform designed to integrate with devices from device vendors and applications from independent software vendors. One of the motivations for PCC is to create and propagate a working environment of medical devices and applications that results in innovative solutions. In this paper, we describe the PCC remote monitoring system, including our pilot tests of the system.
Article
Full-text available
Realtime telemonitoring of critical, acute and chronic patients has become increasingly popular with the emergence of portable acquisition devices and IP enabled mobile phones. During telemonitoring, enormous volumes of physiological signals are transmitted through the public communication network in realtime. However, these physiological signals can be intercepted with minimal effort, since existing telemonitoring practice ignores privacy and security requirements. In this paper, to achieve end-to-end security, we first proposed an encoding method capable of securing Electrocardiogram (ECG) data transmission from an acquisition device to a mobile phone, and then from a mobile phone to a centralised medical server, by concealing cardiovascular details as well as the features in ECG data required to identify an individual. The encoding method not only conceals the cardiovascular condition, but also reduces the enormous file size of the ECG with a compression ratio of up to 3.84, thus making it suitable for energy-constrained small acquisition devices. As ECG data transfer faces even greater security vulnerabilities while traversing the public Internet, we further designed and implemented a three-phase encoding-compression-encryption mechanism on mobile phones using the proposed encoding method and existing compression and encryption tools. This new mechanism elevates the security strength of the system even further. Apart from higher security, we also achieved a higher compression ratio of up to 20.06, which enables faster transmission and makes the system suitable for realtime telemonitoring. Copyright © 2008 John Wiley & Sons, Ltd.
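The following Python sketch shows the shape of such a three-phase pipeline, assuming zlib and Fernet (from the `cryptography` package) as stand-ins for the "existing compression and encryption tools", and a simple byte permutation standing in for the paper's feature-concealing encoder.

```python
import zlib
from cryptography.fernet import Fernet

def protect(ecg_bytes: bytes, key: bytes, perm: list) -> bytes:
    """Phase 1 encode (permute), phase 2 compress, phase 3 encrypt."""
    encoded = bytes(ecg_bytes[i] for i in perm)        # conceal structure
    compressed = zlib.compress(encoded, level=9)       # shrink for transport
    return Fernet(key).encrypt(compressed)             # end-to-end secrecy

def recover(blob: bytes, key: bytes, perm: list) -> bytes:
    compressed = Fernet(key).decrypt(blob)
    encoded = zlib.decompress(compressed)
    inverse = sorted(range(len(perm)), key=perm.__getitem__)
    return bytes(encoded[i] for i in inverse)          # undo the permutation

# key = Fernet.generate_key(); perm is a shared secret permutation of
# range(len(ecg_bytes)), e.g. produced by random.sample.
```

One trade-off this ordering implies: permuting before compression can hurt the compression ratio, which is presumably why the paper's encoder is designed to compress as it conceals.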
Conference Paper
Monitoring the physiological signals of disaster-affected patients presents several challenges. First of all, the care providers in a disaster zone are themselves susceptible to health hazards. Fast and easy transportation of monitoring equipment is crucial for survival of the injured. Moreover, the enormous physiological data transmission from the disaster zone to the medical server needs to be regulated to prevent network congestion. In this paper, we propose a mobile grid based health content delivery service, which can be useful for vital signal monitoring from a remote location. The proposed system is specifically designed for monitoring a group that is highly mobile and dynamic in nature. Therefore, during a catastrophic event like an earthquake, flood, or cyclone, the whole system can be transported with minimal effort to the disaster-affected patients. Minimally trained people are capable of installing the system within the disaster-affected area in an entirely ad hoc manner. Medical experts can monitor the group from a safe location and provide specialist advice for the early recovery of the affected patients. To deal with network congestion, local intelligence is applied within the mobile patient monitoring system, so that only medically urgent information is transmitted to the hospital server or central server. The grid network provides additional computational power to analyze raw physiological signals and identify possible health hazards for the monitored patients. In addition, the proposed mobile grid provides load sharing and redundancy of patient data, which are of prime importance in a disaster zone.
Conference Paper
In real-world concept learning problems, the representation of data often uses many features, only a few of which may be related to the target concept. In this situation, feature selection is important both to speed up learning and to improve concept quality. A new feature selection algorithm, Relief, uses a statistical method and avoids heuristic search. Relief requires time linear in the number of given features and the number of training instances, regardless of the target concept to be learned. Although the algorithm does not necessarily find the smallest subset of features, the size tends to be small because only statistically relevant features are selected. This paper focuses on empirical test results in two artificial domains: the LED Display domain and the Parity domain, with and without noise. Comparison with other feature selection algorithms shows Relief's advantages in terms of learning time and the accuracy of the learned concept, suggesting Relief's practicality.
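Relief's update rule fits in a few lines; the numpy sketch below follows the description above (the nearest same-class "hit" pulls a feature's weight down, the nearest different-class "miss" pushes it up). The Manhattan distance, range normalisation, and sampling details are assumptions of this sketch.

```python
import numpy as np

def relief(X, y, m=None, seed=0):
    """Relief feature weights for a two-class problem.
    X: (n, d) feature matrix; y: (n,) binary labels; m: sampled instances.
    Assumes each class has at least two members."""
    n, d = X.shape
    m = m or n
    span = X.max(axis=0) - X.min(axis=0) + 1e-12   # per-feature range
    w = np.zeros(d)
    rng = np.random.default_rng(seed)
    for i in rng.integers(0, n, size=m):
        dist = np.abs(X - X[i]).sum(axis=1)        # Manhattan distances
        dist[i] = np.inf                           # exclude the instance itself
        hit = np.argmin(np.where(y == y[i], dist, np.inf))
        miss = np.argmin(np.where(y != y[i], dist, np.inf))
        w -= np.abs(X[i] - X[hit]) / span / m      # near-hit differences shrink w
        w += np.abs(X[i] - X[miss]) / span / m     # near-miss differences grow w
    return w
```

Features whose weights exceed a relevance threshold are then kept, which is how Relief avoids searching the subset lattice.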
Conference Paper
Algorithms for feature selection fall into two broad categories: wrappers, which use the learning algorithm itself to evaluate the usefulness of features, and filters, which evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven more practical than wrappers because they are much faster. However, most existing filter algorithms only work with discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to both continuous and discrete problems. The algorithm often outperforms the well-known ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees. It performs more aggressive feature selection than ReliefF does, reducing the data dimensionality by fifty percent in most cases. Also, decision and model trees built from the preprocessed data are often significantly smaller.
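The heuristic behind such correlation-based filters is usually expressed as a subset "merit" score that rewards features correlated with the class but not with one another; the sketch below computes the standard formulation (as in Hall's CFS). Treating this as the paper's exact score is an assumption.

```python
import numpy as np

def cfs_merit(feature_class_corr, feature_feature_corr):
    """merit = k * mean(r_cf) / sqrt(k + k*(k-1) * mean(r_ff))
    feature_class_corr:   correlation of each subset feature with the class
    feature_feature_corr: pairwise correlations among the subset features"""
    k = len(feature_class_corr)
    r_cf = float(np.mean(feature_class_corr))
    r_ff = float(np.mean(feature_feature_corr)) if k > 1 else 0.0
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

# A subset of strongly class-correlated but mutually redundant features
# scores lower than a smaller, less redundant one:
print(cfs_merit([0.8, 0.8, 0.8], [0.9, 0.9, 0.9]))  # ~0.83
print(cfs_merit([0.8, 0.8], [0.2]))                 # ~1.03
```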
Book
This is the third edition of the premier professional reference on the subject of data mining, expanding and updating the previous market-leading edition, which was the first of its kind and remains the most popular. It combines sound theory with truly practical applications to prepare students for real-world challenges in data mining. Like the first and second editions, Data Mining: Concepts and Techniques, 3rd Edition equips professionals with a sound understanding of data mining principles and teaches proven methods for knowledge discovery in large corporate databases. The first and second editions also established themselves as the market leaders for courses in data mining, data analytics, and knowledge discovery. Revisions incorporate input from instructors, changes in the field, and new and important topics such as data warehouse and data cube technology, mining stream data, mining social networks, and mining spatial, multimedia and other complex data. The book begins with a conceptual introduction followed by comprehensive, state-of-the-art coverage of concepts and techniques. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. Wherever possible, the authors raise and answer questions of utility, feasibility, optimization, and scalability. The book offers: a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data; updates that incorporate input from readers, changes in the field, and more material on statistics and machine learning; scores of algorithms and implementation examples, all in easily understood pseudo-code and suitable for use in real-world, large-scale data mining projects; and complete classroom support for instructors, with bonus content available at the companion website.