
A clustering based system for instant detection of cardiac abnormalities from compressed ECG

May 2011
Expert Systems with Applications 38(5):4705-4713


DOI:10.1016/j.eswa.2010.08.149


Authors:

Fahim Sufi

Federal Government

Ibrahim Khalil

RMIT University

Abdun Mahmood

La Trobe University

Compressed electrocardiography (ECG) is used in modern telecardiology applications for faster and more efficient transmission. However, existing ECG diagnosis algorithms require the compressed ECG packets to be decompressed before diagnosis can be applied. This additional decompression step for every ECG packet introduces undesirable delays, which can severely affect the patient's chances of survival. In this paper, we first used an attribute selection method that selects only a few features from the compressed ECG, and then used the Expectation Maximization (EM) clustering technique to create normal and abnormal ECG clusters. Twenty different segments (13 normal and 7 abnormal) of compressed ECG from an MIT-BIH subject were tested with 100% success using our model. Apart from automatically clustering normal and abnormal compressed ECG segments, this paper presents an algorithm to identify the initiation of abnormality, so that emergency personnel can be contacted for a rescue mission at the earliest possible time. This technique, based on data mining of compressed ECG attributes, enables faster identification of cardiac abnormalities and thus an efficient telecardiology diagnosis system.


Fahim Sufi (corresponding author), Ibrahim Khalil, Abdun Naser Mahmood

RMIT University, School of Computer Science and Information Technology, 123 Latrobe St., Melbourne, VIC 3000, Australia

Keywords: Cardiac abnormality classification; Compressed ECG; CVD diagnosis; Symmetricity of bi-class clustering; CVD alert mechanism


© 2010 Published by Elsevier Ltd.

1. Introduction

Electrocardiogram (ECG) signals have been used extensively to diagnose cardiovascular diseases (CVD), which have been the leading cause of death in modern times. Most existing diagnosis algorithms are suited to plain ECG signals (i.e. not in compressed form) and work by detecting the ECG fiducial points, namely P, Q, R, S and T (Friesen et al., 1990; Hamilton & Tompkins, 1986; Sufi, Fang, & Cosic, 2007; Suárez, Silva, Berthoumieu, Gomis, & Najim, 2007) (as shown in Fig. 1). After detecting the fiducial points, these algorithms employ computationally intensive processing to identify particular cardiac anomalies.

According to existing research on mobile phone based telecardiology applications, the death rate associated with CVD can be reduced by harnessing the processing power of mobile technologies (Blount, 2007; Hung & Zhang, 2003; Lee, Chen, Hsiao, & Tseng, 2007). More recent research confirms that specially designed compression technologies can yield faster telecardiology solutions (Istepanian & Petrosian, 2000; Kim, Yoo, & Lee, 2006; Sufi, Fang, Mahmoud, & Cosic, 2006; Sufi, Fang, Khalil, & Mahmoud, 2009; Sufi & Khalil, 2008). However, if the ECG packets remain in compressed format during transmission and storage, existing ECG diagnosis algorithms cannot be applied directly: the compressed ECG must be decompressed before most CVD detection algorithms can be used (Friesen et al., 1990; Hamilton & Tompkins, 1986; Suárez et al., 2007; Sufi et al., 2007). If a hospital has hundreds of remotely monitored (real-time) CVD patients, the hospital server might have to perform this additional decompression for millions of compressed ECG packets per second, which may place an enormous computational burden on existing infrastructure. To mitigate the burden imposed by compression technology, Sufi et al. (2009) demonstrate a new set of CVD diagnosis algorithms that work on compressed ECG directly (i.e. decompression of the compressed ECG packet is no longer required). However, the technique presented in Sufi et al. (2009) uses a separate rule-based algorithm to detect each particular disease. To identify all cardiac abnormalities, that system would require hundreds of complex algorithms to be integrated into one computationally intensive system, and maintaining and updating such a complex system for every new abnormality is difficult.

This raises the problem of finding a simple and fast solution for detecting heart abnormalities from compressed ECG, one that alerts the cardiac specialist as soon as a cardiac abnormality is detected.

0957-4174/$ - see front matter © 2010 Published by Elsevier Ltd.

* Corresponding author.
E-mail addresses: research@fahimsufi.com (F. Sufi), ibrahimk@cs.rmit.edu.au (I. Khalil), abdun.mahmood@rmit.edu.au (A.N. Mahmood).
Tel.: +61399252879.
Tel.: +61399251902.

In this paper, we present a simple but efficient data mining based solution that detects abnormality instantly from the compressed ECG. This technique can be deployed within a wireless monitoring facility to alert emergency personnel in the event of a cardiac abnormality in a subscribed patient.

2. Background

The human heart keeps oxygenated blood circulating throughout the body, beating about 100,000 times per day. It contains four chambers: two atria and two ventricles. Deoxygenated blood first enters the right atrium, which contracts and forces the blood into the right ventricle. From the right ventricle, the oxygen-poor blood is pumped to the lungs, where gas exchange takes place: the blood takes up oxygen and releases carbon dioxide. The oxygenated (i.e. oxygen-enriched) blood then enters the left atrium, from where it passes to the left ventricle. Finally, the left ventricle pumps the blood to the rest of the body. The two atria contract together, and the two ventricles likewise contract at the same time.

An ECG signal, a representation of the electrical activity of the heart, has three major feature waves, namely the P wave, the QRS complex and the T wave (as seen in Fig. 1). The P wave reflects atrial depolarization, which triggers atrial contraction, and the QRS complex reflects ventricular depolarization, which triggers ventricular contraction. The T wave represents ventricular repolarization, the recovery (relaxation) phase that follows ventricular contraction. Cardiologists use different features of these waves to assess the condition of the heart (see Table 1).

As seen in Fig. 2, the patient wears a portable ECG acquisition device, which collects the ECG signal from the patient's body and transmits ECG packets to the mobile phone via Bluetooth, Wi-Fi, Near Field Communication (NFC) or Zigbee. The mobile phone then compresses and encrypts the ECG packets and forwards them to the hospital or monitoring service via HTTP or MMS. The monitoring service executes a background monitoring agent that implements data mining techniques. This mobile phone based compressed ECG transmission has been proposed and used in our earlier research (Sufi, 2007; Sufi et al., 2009, 2006; Sufi & Khalil, 2008).

[Fig. 1: plot of amplitude against time (samples).]
Fig. 1. The proposed cardiac diagnosis system.

Table 1
ECG features related to the P wave, QRS complex and T wave.

P wave duration | QRS complex duration | T wave duration
P wave amplitude | QRS complex amplitude | T wave amplitude
P wave onset slope | Q onset slope | T wave onset slope
P wave offset slope | Q offset slope | T wave offset slope
QT interval | R onset slope | P wave direction
RR interval | R offset slope | T wave direction
ST segment | S onset slope |
RR interval | S offset slope |

[Fig. 2 diagram: the ECG acquisition device sends ECG to the patient's mobile phone via Bluetooth, NFC, Zigbee or Wi-Fi; the phone compresses and encrypts the ECG packets and transmits them via HTTP, MMS or SMS to the hospital/monitoring service; the monitoring service employs a Data Mining Agent to detect abnormality; the ambulance or rescue team is notified when the Data Mining Agent detects an abnormality.]
Fig. 2. Architecture of the data mining based compressed ECG diagnosis system.

For this paper, we add a data mining module (situated in the hospital) that uses clustering techniques to identify CVD abnormalities from the compressed ECG sent by the patient. These data mining techniques rely on knowledge of what is normal and what is abnormal in the monitored patient's ECG. The input to the mining agent is the compressed ECG, and its output is a Boolean value denoting abnormality. Thus, in this telemonitoring solution, if the compressed ECG is derived from a normal ECG, the output of the data mining agent is negative. If the patient's ECG signal is abnormal, the agent outputs a positive detection, signalling abnormality, and the alert mechanism is activated.

3. Architecture of the proposed disease identification system

In remote telemonitoring, massive amounts of ECG data are transferred (Sufi, Khalil, Fang, & Cosic, 2008); therefore, adopting a specialized compression technology (as demonstrated in our earlier research, Sufi et al. (2009) and Sufi & Khalil (2008)) is often required. Our ECG compression technique uses an encoding function that transforms the ECG signal X into a compressed ECG C (Eq. (1)). Because the compression is lossless, the ECG feature set F, a subset of the ECG signal X (Eq. (2)), also exists within the encoded (compressed) ECG C (Eq. (3)). New algorithms can therefore be designed to reveal this encoded feature set for CVD diagnosis directly from the compressed ECG.

As an example, Fig. 3 shows a normal ECG segment for entry ID CU1 of the CU Ventricular Tachyarrhythmia database (Physiobank, 2009). Fig. 4 shows the initiation of abnormality (i.e. ventricular tachyarrhythmia) for that patient, and Fig. 5 depicts a complete episode of ventricular tachyarrhythmia for the same patient. Fig. 6 shows the compressed ECG of Figs. 3–5 (compressed using our specialized ECG compression algorithm (Sufi et al., 2009; Sufi & Khalil, 2008)). Eqs. (1)–(3) express the fact that the compressed ECG of Fig. 6 preserves the ECG features of Figs. 3–5. In this paper, we propose to harness data mining routines for efficient detection of CVD anomalies (i.e. cardiac abnormalities) directly from compressed ECG (e.g. the compressed ECG shown in Fig. 6).

Fig. 3. A normal ECG segment of a patient (a random segment of entry CU1, MIT-BIH CU Ventricular Tachyarrhythmia Database).

Fig. 4. Initiation of abnormality (ventricular tachyarrhythmia) within the ECG segment for CU1.

Fig. 5. An abnormal (ventricular tachyarrhythmia) ECG segment of a patient (CU1).

Fig. 6. Compressed ECG for Fig. 3 (normal ECG), Fig. 4 (normal and abnormal) and Fig. 5 (abnormal ECG).

$$\mathcal{E}(X) = C \tag{1}$$

$$F \subset X \tag{2}$$

$$F \subset C \tag{3}$$

During compression, which is performed inside the patient's mobile phone, 148 characters and numeric values (0–9) are used to encode the plain-text ECG signal, as seen in Fig. 7. The data mining agent (DMA) at the hospital (Fig. 2) must first be trained with normal and abnormal ECG (in compressed form) from patients. Once trained, the DMA can test incoming compressed ECG for irregularities (abnormal ECG). Our proposed algorithm (Algorithm 1) then identifies abnormal ECG segments instantly, directly from the compressed ECG.

3.1. Training of the proposed model

During this training phase, the proposed model learns what normal ECG and abnormal ECG look like. Fig. 8 shows the main stages of this learning process from compressed ECG.

3.1.1. Character frequency calculation

As shown in Fig. 8, the frequency of each encoded character is first computed from the compressed ECG. There are about 148 characters and 6 numeric subgroups for which frequencies are generated (Fig. 7). The frequencies of these 157 characters (and numeric subgroups) are used as the attributes for clustering. However, 157 attributes are too many for generating the normal and abnormal ECG clusters, so attribute subset selection is necessary. Using established techniques, we first select the characters of the compressed ECG that are mainly responsible for identifying diseases. Based on the selected characters (attributes), normal and abnormal ECG can then be classified.
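As an illustration of the frequency step, the sketch below counts how often each symbol of an alphabet occurs in one compressed packet, producing a fixed-length attribute vector. It assumes the compressed packet is available as a Python string and that the alphabet is supplied by the caller; the real 157-attribute alphabet of Fig. 7 (including the numeric subgroups) is not reproduced here, so the function name, toy alphabet and packet are hypothetical.

```python
from collections import Counter

def char_frequencies(packet: str, alphabet: list[str]) -> list[int]:
    """Count each alphabet symbol in one compressed ECG packet.

    Returns one count per symbol, in alphabet order, so every packet maps to
    a fixed-length attribute vector. Numeric subgroups (Fig. 7) would need
    their own tokenizer and are not handled in this sketch.
    """
    counts = Counter(packet)
    return [counts[symbol] for symbol in alphabet]  # Counter yields 0 for absent symbols

# Hypothetical toy alphabet and packet, for illustration only.
toy_alphabet = ["@", "$", "k", "r", "s"]
print(char_frequencies("@@k$rrs@", toy_alphabet))  # → [3, 1, 1, 2, 1]
```

Keeping the output in alphabet order matters because the later clustering compares packets attribute by attribute.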

3.1.2. Attribute subset selection

Data pre-processing using attribute selection is an important step in data mining, since a large number of attributes often leads to poor learning due to an untenably large combinatorial search space (Han & Kamber, 2006). The goals of feature subset selection are to (a) reduce the dimensionality of the data to be analysed, (b) speed up the execution of learning algorithms, (c) improve the performance of data mining techniques, including learning time and predictive accuracy, and (d) improve the comprehensibility of the output. Recent studies have shown that attribute subset selection improves the performance of clustering algorithms (Sufi & Khalil, 2009; Talavera, 1999a, 1999b). In this paper, we adapt a correlation-based feature subset selection technique (Hall, 1999; Sufi & Khalil, 2009) for use with continuous ECG signals; this technique outperforms other feature selection algorithms such as ReliefF (Kira & Rendell, 1992) and RReliefF (Robnik-Sikonja & Kononenko, 1997). Attribute selection is based on an attribute's utility with respect to the predicted class, while also taking into account its correlation with the other attributes in the subset. The utility of an attribute can be represented using Pearson's correlation coefficient, with the variables standardized, as in Eqs. (4) and (5).

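$$r_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\,\sigma_x \sigma_y} \tag{4}$$

$$U_S = \frac{C\,\bar{r}_{ap}}{\sqrt{C + C(C - 1)\,\bar{r}_{aa'}}} \tag{5}$$

where $\bar{x}$ and $\bar{y}$ are the sample means calculated from the data, $\sigma_x$ and $\sigma_y$ are the standard deviations, $n$ is the number of observations, $a, a' \in S$, $C = |S|$, and $\bar{r}$ denotes an average correlation between features. For a subset $S$ of $C$ features, the utility function $U_S$ measures how strongly the features $a, a'$ are related to the predicted class $p$ ($\bar{r}_{ap}$), while being less related to each other ($\bar{r}_{aa'}$).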

The utility function reduces the effect of irrelevant attributes, as they are less correlated with the predicted class. It also discards redundant attributes, as they are highly correlated with each other.
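A small worked example of Eq. (5) shows why redundant attributes are penalized. Writing U for the utility, suppose (hypothetically, for illustration only) that each attribute correlates with the class at $\bar{r}_{ap} = 0.7$, and compare a single attribute with a pair whose inter-correlation is either low ($\bar{r}_{aa'} = 0.2$) or high ($\bar{r}_{aa'} = 0.9$):

```latex
\begin{aligned}
C = 1: &\quad U = \frac{1 \times 0.7}{\sqrt{1}} = 0.7 \\
C = 2,\ \bar{r}_{aa'} = 0.2: &\quad U = \frac{2 \times 0.7}{\sqrt{2 + 2(1)(0.2)}} = \frac{1.4}{\sqrt{2.4}} \approx 0.904 \\
C = 2,\ \bar{r}_{aa'} = 0.9: &\quad U = \frac{2 \times 0.7}{\sqrt{2 + 2(1)(0.9)}} = \frac{1.4}{\sqrt{3.8}} \approx 0.718
\end{aligned}
```

A complementary second attribute raises the utility from 0.7 to about 0.90, whereas a near-duplicate adds almost nothing (about 0.72), which is how the measure discards redundant attributes.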

We used a greedy best-first algorithm to search through the candidate subsets for a locally optimal solution. The algorithm starts with an empty subset and adds one attribute at a time, evaluating the utility function to determine the correlation of the subset with the predicted class. Attributes continue to be added as long as the utility of the best subset does not decline. If it declines, the algorithm selects the next best subset and continues adding attributes to it. Some datasets contain groups of features that are locally predictive of the class, so we also reconsider the attributes that were initially discarded while building the best subset. After the best subset has been generated, the algorithm examines the rejected attributes one by one and compares each attribute's correlation with the predicted class against its average correlation with the subset. If its correlation with the class is greater than its correlation with the attribute subset, signalling a stronger attraction to the class than to the subset, the attribute is incorporated into the subset.
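The search just described can be sketched as follows. This is a simplified forward-selection variant, not the full best-first search with backtracking to the "next best subset": it scores subsets with Eq. (5) using absolute Pearson correlations from NumPy's `corrcoef`, stops as soon as adding the best remaining attribute would lower the utility, and then applies the locally predictive re-admission pass. The function names and the toy data are hypothetical.

```python
import numpy as np

def _abs_corr(u: np.ndarray, v: np.ndarray) -> float:
    """Absolute Pearson correlation (Eq. (4)); constant inputs give NaN."""
    return abs(float(np.corrcoef(u, v)[0, 1]))

def utility(X: np.ndarray, y: np.ndarray, subset: list[int]) -> float:
    """Eq. (5): C * r_ap / sqrt(C + C(C-1) * r_aa') for the given columns."""
    c = len(subset)
    r_ap = np.mean([_abs_corr(X[:, a], y) for a in subset])
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_aa = np.mean([_abs_corr(X[:, a], X[:, b]) for a, b in pairs]) if pairs else 0.0
    return float(c * r_ap / np.sqrt(c + c * (c - 1) * r_aa))

def select_attributes(X: np.ndarray, y: np.ndarray) -> list[int]:
    """Greedy forward selection followed by the locally predictive pass."""
    remaining = list(range(X.shape[1]))
    selected: list[int] = []
    best = 0.0
    while remaining:
        # Score every one-attribute extension of the current subset.
        scores = {a: utility(X, y, selected + [a]) for a in remaining}
        candidate = max(scores, key=scores.get)
        if scores[candidate] < best:
            break  # utility would decline: stop growing the subset
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    # Re-admit a rejected attribute if it correlates more with the class
    # than, on average, with the attributes already selected.
    for a in remaining:
        to_class = _abs_corr(X[:, a], y)
        to_subset = np.mean([_abs_corr(X[:, a], X[:, s]) for s in selected])
        if to_class > to_subset:
            selected.append(a)
    return selected

# Toy data: column 0 tracks the class; column 1 is weakly related but
# uncorrelated with column 0, so it still raises the utility.
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)
X = np.column_stack([[0, 0, 1, 1, 1, 1], [1, 0, 1, 0, 1, 0]]).astype(float)
print(select_attributes(X, y))  # → [0, 1]
```

Here the first attribute alone scores about 0.707, and adding the second raises the utility to about 0.736 because the two are uncorrelated with each other.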

Fig. 7. The 157 characters and numeric subgroups (attributes) used to generate the compressed ECG from the plain ECG signal. Details of this character-substitution-based compression technique are given in Sufi et al. (2009) and Sufi and Khalil (2008).

[Fig. 8 diagram: Compressed ECG → Calculate frequency of each character → Attribute subset selection → Clustering of reduced attributes → Detection of normal and abnormal clusters, with a group labelled "Data Mining Techniques".]
Fig. 8. Step-by-step procedure of the proposed cardiac abnormality detection technique.

3.1.3. Automatic learning of normal and abnormal patterns using clustering of compressed ECG features

Using the smaller subset of attributes, we can now produce a cluster from the normal compressed ECG patterns. This cluster of normal patterns serves as the benchmark against which future ECG sent from the monitored patient is tested. Under normal circumstances, any incoming ECG closely matches the stored cluster. However, if there is an abnormality, the clustering algorithm creates a different cluster from the abnormal ECG. This generates an alarm that requires the urgent attention of a physician or cardiologist. Note that the procedure given in this paper works solely on the character frequencies of the compressed ECG and does not require decompression, which would cost the patient valuable extra time.

The aim of clustering is to group a given set of objects so that similar objects (also known as cases, instances or patterns) are grouped together and dissimilar objects are kept apart. Although there are many techniques for building multi-dimensional clusters (Mahmood, Leckie, & Udaya, 2008), we chose a statistical clustering technique called Expectation Maximization (EM) (Han & Kamber, 2006) to cluster the compressed ECG data since, combined with cross-validation, it can determine the number of clusters automatically. Assuming two clusters A and B, representing the normal and abnormal classes of ECG, the steps of EM clustering are as follows:

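1. Arbitrarily choose the model parameters for clusters A and B: the means $\mu_A, \mu_B$, the standard deviations $\sigma_A, \sigma_B$ and the cluster probabilities $p_A, p_B$.

2. For each iteration $j$, calculate the probability that instance $I$ belongs to clusters A and B:

$$P(A \mid I) = \frac{p_A\,P(I \mid A)}{P(I)}, \qquad P(B \mid I) = \frac{p_B\,P(I \mid B)}{P(I)} \tag{6}$$

where $P(I) = p_A P(I \mid A) + p_B P(I \mid B)$. The probability $P(I \mid A)$ can be modelled using any distribution function. For the commonly used Gaussian distribution adopted in this paper, it is given by

$$P(I \mid A) = \frac{1}{\sqrt{2\pi}\,\sigma_A} \exp\left(-\frac{(I - \mu_A)^2}{2\sigma_A^2}\right) \tag{7}$$

3. Update the mixture parameters on the basis of the new estimates, where $n$ is the number of instances:

$$p_A^{j+1} = \frac{\sum_I P(A \mid I)}{n}, \qquad p_B^{j+1} = \frac{\sum_I P(B \mid I)}{n} \tag{8}$$

$$\mu_A^{j+1} = \frac{\sum_I I\,P(A \mid I)}{\sum_I P(A \mid I)}, \qquad \mu_B^{j+1} = \frac{\sum_I I\,P(B \mid I)}{\sum_I P(B \mid I)} \tag{9}$$

$$\left(\sigma_A^{j+1}\right)^2 = \frac{\sum_I P(A \mid I)\,\left(I - \mu_A^{j+1}\right)^2}{\sum_I P(A \mid I)} \tag{10}$$

$$\left(\sigma_B^{j+1}\right)^2 = \frac{\sum_I P(B \mid I)\,\left(I - \mu_B^{j+1}\right)^2}{\sum_I P(B \mid I)} \tag{11}$$

4. Calculate the log likelihood $E^j = \sum_I \log P^j(I)$. For a fixed stopping criterion $\epsilon$, stop if $|E^{j+1} - E^{j}| < \epsilon$; otherwise set $j = j + 1$ and return to step 2.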
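Steps 1–4 can be sketched for a single attribute as below. This is a minimal one-dimensional illustration of Eqs. (6)–(11) with two Gaussian components, not the authors' implementation: the initial means (smallest and largest observation), the pooled initial standard deviation, the variance floor and the tolerance are all assumptions of this sketch, and the paper's clustering operates on the 48 selected attributes rather than one.

```python
import math
import statistics

def gaussian(x: float, mu: float, sigma: float) -> float:
    """Eq. (7): normal density P(I | cluster)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def em_two_clusters(data: list[float], eps: float = 1e-6, max_iter: int = 500):
    # Step 1: initial parameters (assumed: extremes as means, pooled std, equal priors).
    mu = [min(data), max(data)]
    sigma = [statistics.pstdev(data)] * 2
    p = [0.5, 0.5]
    n = len(data)
    prev_ll = -math.inf
    for _ in range(max_iter):
        # Step 2, Eq. (6): membership probabilities P(A|I) and P(B|I).
        resp, ll = [], 0.0
        for x in data:
            w = [p[k] * gaussian(x, mu[k], sigma[k]) for k in range(2)]
            total = w[0] + w[1]        # P(I)
            ll += math.log(total)      # Step 4: E = sum of log P(I)
            resp.append([wk / total for wk in w])
        if abs(ll - prev_ll) < eps:    # Step 4: stopping criterion
            break
        prev_ll = ll
        # Step 3, Eqs. (8)-(11): update priors, means and spreads.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            p[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = math.sqrt(max(var, 1e-6))  # floor keeps a cluster from collapsing
    return mu, sigma, p, ll

def assign(x: float, mu, sigma, p) -> int:
    """Hard assignment: index of the cluster with the larger posterior."""
    return max(range(2), key=lambda k: p[k] * gaussian(x, mu[k], sigma[k]))

data = [0.9, 1.0, 1.1, 1.2, 4.9, 5.0, 5.1, 5.2]
mu, sigma, p, ll = em_two_clusters(data)
print([round(m, 2) for m in mu])  # → [1.05, 5.05]
```

The log-likelihood check sits before the update, so when the loop stops the returned parameters are exactly those that produced the final log likelihood.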

EM can decide how many clusters to create by cross-validation (as in the present study), or the number can be specified a priori (normal and abnormal clusters). In the scenario depicted in Fig. 2, the patient continuously sends compressed ECG information to the hospital, which clusters the new information and checks whether there are two clearly segregated clusters. When the compressed ECG falls into the abnormal cluster (or inclines towards it), as shown in Fig. 12, an abnormality is detected. In that case an immediate alarm is raised, since the ECG pattern has been found to be significantly different from normal patterns. In our experiments, the EM algorithm isolated the normal and abnormal compressed ECG with 100% accuracy on the 20-segment ECG dataset.

[Fig. 9: 20 panels, each plotting one ECG segment over samples 0 to 1000.]
Fig. 9. 20 randomly selected ECG segments for entry CU1 (from the MIT-BIH CU Ventricular Tachyarrhythmia database).

3.2. Instant abnormality detection from compressed ECG

Once the proposed model is trained, the cluster centres (means) of all the selected attributes are known for each class. With this knowledge, whenever the patient sends a new compressed ECG, the DMA calculates the frequencies of the selected characters (the attributes selected in the training stage). These inputs (the attribute values of the instance) are fed, along with the cluster centres, to Algorithm 1, which determines the initiation of abnormality.

During the initiation of an abnormality, we expect the compressed ECG packet to contain both normal and abnormal ECG. For these initiation packets, the distance from the normal cluster centres (for the selected attributes) therefore starts to increase. Abnormality can be signalled once the distance between the instance (the initiation ECG packet) and the normal cluster mean goes beyond a threshold value. After the initiation of abnormality is detected, emergency personnel can be contacted to rescue the patient (Fig. 2).

Fig. 10. Frequency distribution of the 20 randomly selected ECG segments for entry CU1 (of Fig. 9). The boxed region shows high frequencies of attributes 115–131, denoting abnormality in the compressed ECG.

Table 2
Selected characters (first half of the attributes) and their respective frequencies in compressed ECGs (normal) for 13 different instances.

At. N1 N2 N3 N4 N5 N6 N7 N8 N9 N10 N11 N12 N13 CCtr
@ 7 10 8 7 7 5 10 8 9 8 10 9 7 8.0769
$ 6 8 5 5 10 15 9 7 4 9 11 6 10 8.0769
Ø 5 9 6 5 8 8 6 2 5 8 12 9 7 6.9231
Å 8 10 8 12 9 9 15 9 18 11 9 10 6 10.3077
å 5 12 14 11 7 7 5 11 7 7 15 12 11 9.5385
_ 6 9 9 7 8 17 2 7 6 5 8 12 8 8
[ 5 3 5 7 4 7 9 4 2 8 5 5 4 5.2308
] 13 9 5 8 10 7 7 8 6 12 4 10 12 8.5385
j 15 13 11 11 11 15 8 17 11 11 8 5 7 11
Æ 8 14 10 6 7 8 15 12 8 11 12 8 13 10.1538
& 14 13 10 10 8 10 9 7 12 11 6 8 8 9.6923
( 7 11 5 15 11 10 11 12 10 14 6 13 9 10.3077
* 4 7 7 7 9 6 5 6 6 14 5 8 6 6.9231
: 12 8 8 13 8 8 11 5 8 8 9 13 8 9.1538
; 12 14 6 12 10 13 15 13 10 6 11 10 9 10.8462
ü ? ? 12 7 7 6 8 6 7 6 12 8 2 7.4615
Á 9 9 5 11 14 7 8 9 12 7 13 8 13 9.6154
Ë 5 0 3 3 3 1 2 4 3 2 2 6 6 3.0769
k 14 8 7 11 9 4 11 5 10 9 9 6 6 8.3846
l 9 6 4 4 5 9 6 9 4 8 3 7 7 6.2308
m 7 6 7 8 7 3 5 4 4 6 6 4 7 5.6923
o 2 8 4 2 1 6 3 4 1 3 3 4 5 3.5385
r 10 14 7 11 9 9 12 11 8 16 13 11 8 10.6923
s 19 10 9 9 9 9 6 8 9 7 8 11 6 9.2308
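The threshold test of Section 3.2 can be sketched as below, assuming the same Euclidean distance used in Step 1 of Algorithm 1. The paper does not state a threshold value, so `THRESHOLD` is hypothetical, as are the function names. Only three attributes (@, k and r) are used, with values copied from Tables 2 and 4 purely for illustration; the actual model uses all 48 selected attributes.

```python
import math
from typing import Optional, Sequence

def distance_to_normal(freqs: Sequence[float], normal_centre: Sequence[float]) -> float:
    """Euclidean distance from one packet's attribute frequencies to the normal cluster centre."""
    return math.dist(freqs, normal_centre)

def first_alert(packets: Sequence[Sequence[float]], normal_centre: Sequence[float],
                threshold: float) -> Optional[int]:
    """Return the index of the first packet whose distance exceeds the threshold, else None."""
    for index, freqs in enumerate(packets):
        if distance_to_normal(freqs, normal_centre) > threshold:
            return index  # an alert to emergency personnel would be raised here
    return None

# Attributes @, k and r only (values from Tables 2 and 4); the model uses all 48.
normal_centre = [8.0769, 8.3846, 10.6923]  # CCtr of the normal cluster
n1 = [7, 14, 10]     # normal instance N1
an1 = [24, 14, 32]   # abnormal instance An1
THRESHOLD = 15.0     # hypothetical value; the paper does not specify one
print(first_alert([n1, an1], normal_centre, THRESHOLD))  # → 1
```

With these values N1 lies about 5.8 units from the normal centre and An1 about 27.2 units away, so only the second packet crosses the assumed threshold.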

4. Results and discussion

Fig. 9 shows 20 different ECG segments for entry CU1 of the CU Ventricular Tachyarrhythmia database (Physiobank, 2009) in matrix format. Sub-figures 1–3 ([1,1], [1,2] and [1,3]) of Fig. 9 are normal ECG segments. Sub-figure 4 ([1,4]) shows the initiation of ventricular arrhythmia. Sub-figures 5–10 represent continual cardiac abnormality (a ventricular tachyarrhythmia episode). The remaining sub-figures of Fig. 9 show normal ECG segments for patient CU1. Note that in our proposed architecture (Fig. 2), plain ECG (as in Fig. 9) is never viewed; Fig. 9 serves only to explain the concept behind this paper.

As shown in Fig. 8, we receive only compressed ECG, from which the frequencies of all attributes (Fig. 7) are calculated. After calculating the frequencies of the 157 attributes from the compressed ECGs of Fig. 9, we can observe that certain groups of characters have different frequency bands for normal and abnormal ECGs. Fig. 10 illustrates that sub-figures 3–10 have notably higher frequencies for attributes 115–131 (the character set {[t–z], [A–J]}); these sub-figures correspond to abnormal ECG. Fig. 10 therefore shows that certain compressed-character frequencies behave differently for abnormal ECG.

However, rather than manually inspecting the characters responsible for signalling abnormality, an accurate and automated attribute selection procedure is highly desirable. Our attribute selection process on the 20 instances yields 48 key characters (attributes), which are shown in the left columns of Tables 2–5. Based on these 48 attributes, we generated clusters with the EM methodology described previously. EM generates 2 clusters with 100% accuracy when the clusters are compared (cross-validated) with the known classes (abnormal and normal ECG segments). Note that EM was not told the number of clusters (i.e. 2). The log likelihood measured by EM after creating the 2 clusters from the 48 selected attributes is 100.27906. Tables 2 and 3 show the frequencies of these characters in the 13 normal ECG instances, while Tables 4 and 5 show the 7 abnormal instances. In all the tables, the normal and abnormal cluster means (centres, given in the right-most CCtr columns of Tables 2–5) are far apart, and for both normal and abnormal cases the attribute values lie close to their corresponding class means. Fig. 11 shows the difference between normal and abnormal ECGs for the 48 selected attributes. Unlike Fig. 10, where only 17 characters (attributes 115–131) show a visual distinction, Fig. 11 shows a clear distinction across all 48 automatically selected attributes.

Table 3
Selected characters (second half of the attributes) and their respective frequencies in compressed ECGs (normal) for 13 different instances.

At. N1 N2 N3 N4 N5 N6 N7 N8 N9 N10 N11 N12 N13 CCtr
t 6 6 6 5 6 6 7 5 6 4 5 6 2 5.3846
u 3 2 6 3 1 3 5 2 4 3 4 1 5 3.2308
v 4 4 4 1 2 5 3 1 3 1 3 1 3 2.6923
w 0 1 3 1 2 3 3 2 2 4 4 0 1 2
x 0 4 2 0 3 1 0 1 0 2 0 0 2 1.1538
y 2 0 0 4 0 2 0 1 2 0 0 0 1 0.9231
z 0 2 2 1 1 3 0 1 2 2 0 1 1 1.2308
A 0 1 1 2 0 0 1 2 0 2 1 0 1 0.8462
B 3 0 2 2 3 2 0 1 3 1 1 0 0 1.3846
C 1 3 2 0 0 0 0 0 1 0 0 0 0 0.5385
D 1 0 0 2 2 2 0 0 1 3 2 1 0 1.0769
E 0 1 2 0 0 2 0 1 1 0 1 0 3 0.8462
F 1 2 2 0 1 1 3 2 1 0 0 1 0 1.0769
G 1 0 1 1 2 2 2 1 2 2 2 3 4 1.7692
H 0 0 0 0 0 0 0 0 2 0 0 1 0 0.2308
I 0 1 0 1 0 1 0 1 1 0 0 1 1 0.5385
J 0 1 0 0 0 0 2 1 2 0 0 1 0 0.5385
K 0 1 1 0 0 2 0 0 0 0 1 0 0 0.3846
L 0 0 0 0 0 0 0 2 0 0 0 1 0 0.2308
M 0 1 0 1 0 1 2 0 0 1 2 0 1 0.6923
N 0 0 1 0 0 0 0 0 0 0 0 0 1 0.1538
O 0 0 0 0 0 1 0 0 1 1 0 0 0 0.2308
R 0 0 0 1 1 0 0 0 1 0 0 0 1 0.3077
50–100 31 35 30 36 30 31 29 29 34 34 26 29 31 31.1538

Table 4
Selected characters (first half of the attributes) and their respective frequencies in compressed ECGs (abnormal) for seven different instances.

At. An1 An2 An3 An4 An5 An6 An7 CCtr
@ 24 50 47 53 52 46 48 45.7143
$ 6 3 1 1 2 1 2 2.2857
Ø 4 1 1 1 1 0 0 1.1429
Å 4 3 3 1 1 2 1 2.1429
å 4 2 5 0 2 2 0 2.1429
_ 0 2 2 1 3 2 1 1.5714
[ 2 2 0 1 0 0 0 0.7143
] 1 6 1 2 0 0 0 1.4286
j 5 4 0 3 0 0 0 1.7143
Æ 4 3 3 3 3 1 1 2.5714
& 6 1 2 3 2 3 0 2.4286
( 3 1 0 3 0 2 1 1.4286
* 0 0 3 2 1 3 1 1.4286
: 2 1 1 0 1 0 1 0.8571
; 5 2 1 4 3 0 1 2.2857
ü 5 2 1 0 2 1 0 1.5714
Á 5 3 3 0 1 1 0 1.8571
Ë 0 0 1 1 1 0 1 0.5714
k 14 13 13 12 19 14 12 13.8571
l 14 17 11 9 14 6 11 11.7143
m 8 16 16 14 15 15 16 14.2857
o 10 13 18 15 26 20 10 16
r 32 34 44 33 47 37 26 36.1429
s 33 39 32 29 54 38 26 35.8571

Table 5
Selected characters (second half of the attributes) and their respective frequencies in compressed ECGs (abnormal) for seven different instances.

At. An1 An2 An3 An4 An5 An6 An7 CCtr
t 14 34 34 34 28 28 21 27.5714
u 21 36 28 40 37 39 34 33.5714
v 19 29 29 31 31 26 27 27.4286
w 16 33 36 19 30 30 32 28
x 15 37 26 30 33 33 20 27.7143
y 17 28 24 19 37 35 31 27.2857
z 15 48 23 39 41 33 34 33.2857
A 8 31 20 20 21 27 17 20.5714
B 15 23 27 22 30 25 23 23.5714
C 14 28 21 21 19 36 16 22.1429
D 9 21 26 13 23 25 16 19
E 10 9 19 26 14 31 21 18.5714
F 17 23 19 27 13 21 20 20
G 25 46 39 41 40 63 52 43.7143
H 7 17 15 17 11 18 19 14.8571
I 7 14 8 20 4 19 14 12.2857
J 6 8 13 12 12 16 15 11.7143
K 5 8 12 13 8 14 13 10.4286
L 3 9 8 12 6 13 22 10.4286
M 4 8 7 12 3 10 17 8.7143
N 2 7 8 2 2 7 19 6.7143
O 3 3 12 6 2 9 16 7.2857
R 3 4 4 4 1 5 7 4
50–100 15 13 15 49 6 9 42 21.2857
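As a quick check on the CCtr columns, each cluster centre in Tables 2–5 equals the arithmetic mean of that attribute's frequencies over the cluster's instances. The sketch below recomputes the "@" row of Tables 2 and 4 with Python's `statistics.mean`; the helper name `cluster_centre` is illustrative only.

```python
from statistics import mean

# Frequencies of the character '@', copied from Tables 2 and 4.
normal_at = [7, 10, 8, 7, 7, 5, 10, 8, 9, 8, 10, 9, 7]   # N1..N13
abnormal_at = [24, 50, 47, 53, 52, 46, 48]               # An1..An7

def cluster_centre(values: list[int]) -> float:
    """Cluster centre (CCtr) of one attribute: the mean over the cluster's instances."""
    return round(mean(values), 4)

print(cluster_centre(normal_at), cluster_centre(abnormal_at))  # → 8.0769 45.7143
```

The same check applied to any other row reproduces its CCtr value, which is what makes the large gap between normal and abnormal centres (8.08 versus 45.71 for "@") directly readable from the tables.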

Algorithm 1. Detection of the abnormality initiation

// Notation description:
// Input: attribute values for all the instances
// Input: cluster means of the 2 clusters for all the attributes
// Output: the most equidistant instance

Step 1. Create the distance vectors A and B for clusters 1 and 2, with one entry per instance j:

$$A_j = \sqrt{\sum_{i=1}^{I} \left(f_{j,i} - C_{1,i}\right)^2}, \qquad B_j = \sqrt{\sum_{i=1}^{I} \left(f_{j,i} - C_{2,i}\right)^2}$$

Here, $f_j$ is the attribute value vector of instance j over all I attributes, $C_1$ and $C_2$ are the centroid vectors of cluster means 1 and 2 (normal and abnormal), and $i = 1, 2, 3, \ldots, I$ indexes the attributes.

Step 2. Generate the symmetricity metric by normalizing the difference between the two distances:

$$S_j = \frac{|A_j - B_j|}{\max(A_j, B_j)}$$

Step 3. The most equidistant instance R has the lowest value of S:

$$R = \arg\min_j S_j$$
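Algorithm 1 can be sketched as follows, under two stated assumptions: the normalization in Step 2 is taken to divide by max(A_j, B_j), so every score lies between 0 (exactly equidistant) and 1 (sitting on a centroid), and only three of the 48 attributes (@, k and r) are used, with centroids and instances copied from Tables 2 and 4. The function names are hypothetical.

```python
import math
from typing import Sequence

def symmetricity(instances: Sequence[Sequence[float]], c_normal: Sequence[float],
                 c_abnormal: Sequence[float]) -> list[float]:
    """Steps 1-2: S_j = |A_j - B_j| / max(A_j, B_j) for every instance j."""
    scores = []
    for f in instances:
        a = math.dist(f, c_normal)    # A_j: distance to the normal centroid
        b = math.dist(f, c_abnormal)  # B_j: distance to the abnormal centroid
        denom = max(a, b)
        scores.append(abs(a - b) / denom if denom > 0 else 0.0)
    return scores

def most_equidistant(instances: Sequence[Sequence[float]], c_normal: Sequence[float],
                     c_abnormal: Sequence[float]) -> int:
    """Step 3: index R of the instance with the lowest symmetricity score."""
    scores = symmetricity(instances, c_normal, c_abnormal)
    return min(range(len(scores)), key=scores.__getitem__)

# Attributes @, k and r only, with values from Tables 2 and 4 (illustrative subset).
c_normal = [8.0769, 8.3846, 10.6923]
c_abnormal = [45.7143, 13.8571, 36.1429]
instances = [[7, 14, 10], [24, 14, 32]]  # N1 and An1
print(most_equidistant(instances, c_normal, c_abnormal))  # → 1
```

On this reduced attribute set, N1 scores about 0.88 (close to the normal centroid) and An1 about 0.19, so An1 is reported as the most equidistant instance; with all 48 attributes the ranking could differ.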

This approach was first demonstrated in Sufi and Khalil (2009) and has been extended in the current work to show the feasibility of an automated alert mechanism, based on data mining techniques and compressed ECG, designed to save the lives of monitored CVD patients.

Now that we can observe two distinct clusters for normal and abnormal compressed ECG segments, the question of cluster membership arises for compressed ECG segments that contain half normal and half abnormal ECG. This situation can be observed, for example, in the 4th sub-figure of Fig. 9 (row 1, column 4), and this initiation of abnormality is also depicted in Fig. 4. We stress again that original ECG segments are shown in Figs. 4 and 9 only for clarity; in a real monitoring scenario, the patient and the DM agent (Fig. 2) deal only with compressed ECG. For this initiation-of-abnormality case (Fig. 4), we logically expect the segment to be equidistant from the two clusters, as it contains both normal and abnormal ECG. Representing this fact in two-dimensional coordinates is not straightforward, since we are dealing with 48 attributes and each attribute gives an individual indication of membership in a particular cluster.

To represent the fact that compressed ECG packets containing both normal and abnormal ECG are nearly equidistant from both clusters in a two-dimensional coordinate system, we define the concept of symmetricity of instances in a bi-class clustering. An instance is said to be symmetric with respect to a bi-class clustering when it is located nearly equidistant from both cluster centroids.
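A two-dimensional toy case makes the definition concrete: an instance lying on the perpendicular bisector of the segment joining the two centroids is exactly equidistant from both, so the normalized distance difference used in Step 2 of Algorithm 1 drops to zero. The centroid and instance coordinates below are hypothetical.

```python
import math

# Hypothetical 2-D centroids for the normal and abnormal clusters.
c_normal = (0.0, 0.0)
c_abnormal = (4.0, 0.0)

# This instance lies on the perpendicular bisector x = 2, so it is symmetric.
instance = (2.0, 3.0)

a = math.dist(instance, c_normal)    # sqrt(13)
b = math.dist(instance, c_abnormal)  # sqrt(13)
s = abs(a - b) / max(a, b)           # 0 for a perfectly symmetric instance

print(f"A = {a:.4f}, B = {b:.4f}, S = {s:.4f}")  # A = 3.6056, B = 3.6056, S = 0.0000
```

Moving the instance onto either centroid raises the value towards 1, which is why the lowest value marks the most symmetric instance.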

Algorithm 1 determines the instance that is most nearly equidistant from both classes. In the first step, Algorithm 1 calculates the cluster distances for all 20 instances of the example case (i.e. the distance from the normal cluster, $A_j$, and the distance from the abnormal cluster, $B_j$). For this example case, the cardinality of $A$ and $B$ is 20 ($|A| = |B| = 20$).

Fig. 11. Normal and abnormal cluster means.

[Fig. 12 panels: Normal cluster (segments 1, 2, 3, 11 to 20); Abnormal cluster (segments 5 to 10); Initiation of abnormality.]

Fig. 12. Segregation of normal and abnormal ECG (in two different clusters).

F. Sufi et al. / Expert Systems with Applications 38 (2011) 4705–4713

Using Step 2 of Algorithm 1, we can also compute our proposed symmetricity metric, $S_j$, for the 20 instances of our example case (see Table 6). The most equidistant case, $R$, is clearly the 4th one (the 4th subplot of Fig. 9, or Fig. 4). Therefore, $R = 4$, as $S_4 = \min(S_j)$, where $j = 1, 2, 3, \ldots, 20$.
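The selection of $R$ can be checked directly against the values listed in Table 6. The short script below recomputes $S_j$ from the tabulated $A_j$ and $B_j$ using Step 2 of Algorithm 1 and then applies Step 3; only the variable names are our own.

```python
# A_j and B_j as listed in Table 6 (distances of the 20 compressed ECG
# segments from the normal and abnormal cluster centroids).
A = [17.30530722, 12.36044732, 13.77612638, 72.26985027, 134.7904569,
     120.5433543, 125.0978411, 137.5469333, 144.1012035, 123.8610014,
     12.08989901, 9.406054328, 16.00034556, 14.48650607, 12.22903341,
     13.18553215, 14.27789403, 15.17882927, 12.32616964, 13.55373225]
B = [118.8628916, 118.3491715, 117.4859846, 54.51042274, 29.27631103,
     23.41155244, 38.1514297, 37.79026049, 35.32039337, 38.08583448,
     121.1030833, 120.3706758, 119.8604513, 119.2606355, 120.2970672,
     118.8009722, 119.3707786, 118.8893224, 120.5503405, 120.1616153]

# Step 2 of Algorithm 1: S_j = |A_j - B_j| / max(A_j, B_j)
S = [abs(a - b) / max(a, b) for a, b in zip(A, B)]

# Step 3: R is the (1-based) index of the smallest S_j
R = min(range(len(S)), key=S.__getitem__) + 1
print(R, round(S[R - 1], 8))  # 4 0.24573771
```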

Algorithm 1 can clearly identify the initiation of abnormality, and as soon as the algorithm detects a shift away from the normal cluster, the system can notify emergency personnel to assist the monitored patient. This paper serves as a proof of concept that cardiac abnormality can be detected directly from compressed ECG by applying a data mining technique such as EM.
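One way such an alert could be triggered in a monitoring loop is sketched below. The trigger rule is an assumption for illustration, not the paper's stated rule: a segment is treated as having shifted away from the normal cluster once it lies closer to the abnormal centroid than to the normal one ($B_j < A_j$). The function name `first_shift_from_normal` and the notification hook are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def first_shift_from_normal(distances: Iterable[Tuple[float, float]]) -> Optional[int]:
    """Return the 1-based index of the first segment closer to the abnormal
    centroid than to the normal one, or None if no such segment arrives.

    Each item is an (A_j, B_j) pair: the distances of segment j from the
    normal and abnormal cluster centroids, in arrival order.
    """
    for j, (a, b) in enumerate(distances, start=1):
        if b < a:
            # Hypothetical hook: notify emergency personnel at this point.
            return j
    return None


# First five (A_j, B_j) pairs from Table 6.
stream = [(17.30530722, 118.8628916), (12.36044732, 118.3491715),
          (13.77612638, 117.4859846), (72.26985027, 54.51042274),
          (134.7904569, 29.27631103)]
print(first_shift_from_normal(stream))  # 4
```

Because the function consumes any iterable, it also works on a generator that yields distances as compressed packets arrive; a threshold on $S_j$ could be substituted for the nearest-centroid rule.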

Fig. 12 shows that the 4th subplot of Fig. 9 (i.e. Fig. 4) is nearly equidistant from the 2 clusters according to Algorithm 1 (lying somewhat closer to the abnormal cluster), even though EM assigns it to the abnormal cluster. The other instances (compressed ECG segments) are clearly identified as members of either the normal or the abnormal cluster.
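The segregation in Fig. 12 can be approximated from the Table 6 distances by assigning each segment to its nearer centroid. This nearest-centroid rule is a stand-in chosen for illustration; in the paper, cluster membership comes from EM. Under this rule, segment 4 also falls in the abnormal cluster, matching the EM assignment described above, and the remaining segments split into the normal and abnormal groups shown in Fig. 12.

```python
# A_j (distance to normal centroid) and B_j (distance to abnormal centroid) from Table 6.
A = [17.30530722, 12.36044732, 13.77612638, 72.26985027, 134.7904569,
     120.5433543, 125.0978411, 137.5469333, 144.1012035, 123.8610014,
     12.08989901, 9.406054328, 16.00034556, 14.48650607, 12.22903341,
     13.18553215, 14.27789403, 15.17882927, 12.32616964, 13.55373225]
B = [118.8628916, 118.3491715, 117.4859846, 54.51042274, 29.27631103,
     23.41155244, 38.1514297, 37.79026049, 35.32039337, 38.08583448,
     121.1030833, 120.3706758, 119.8604513, 119.2606355, 120.2970672,
     118.8009722, 119.3707786, 118.8893224, 120.5503405, 120.1616153]

# Nearest-centroid labeling (illustrative stand-in for the EM assignment).
labels = {j: ("abnormal" if b < a else "normal")
          for j, (a, b) in enumerate(zip(A, B), start=1)}

abnormal = [j for j, label in labels.items() if label == "abnormal"]
normal = [j for j, label in labels.items() if label == "normal"]
print("abnormal:", abnormal)  # abnormal: [4, 5, 6, 7, 8, 9, 10]
print("normal:", normal)      # normal: [1, 2, 3, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
```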

5. Conclusion

In this paper, we have used data mining techniques, namely CFS-based attribute selection and EM-based clustering, to instantly detect cardiac abnormalities in CVD-affected subjects. The proposed DM-driven CVD detection framework detects ECG abnormalities without incurring decompression delays, which is ideal for CVD-affected patients, as every second of delay counts towards the mortality of these patients (Luca, Suryapranata, Ottervanger, & Antman, 2004). To detect ECG anomalies, our proposed model does not need to decompress the compressed ECG.

With CVD-related deaths being the number one killer of modern times, our proposed instant ECG anomaly detection algorithm has the potential to save the lives of CVD-affected patients. This is because, without proper monitoring (i.e. real-time ECG monitoring as demonstrated in Lee et al. (2007), Hung & Zhang (2003), Blount et al. (2007), Sufi et al. (2009), and Sufi & Khalil (2008)), 40% of patients having their first CVD symptom might die within years (Access Economics Pty Limited, 2008).

In our experiments on MIT-BIH entries (Physiobank, 2009), 100% accuracy was achieved in detecting cardiac abnormality from compressed ECG across the 20 tested segments. However, in this paper we focused on only two clusters (i.e. normal and abnormal), which can only determine whether an abnormality is present. To identify the type of abnormality (e.g. Ventricular Fibrillation, Atrial Fibrillation, Premature Ventricular Beat, etc.), a multi-cluster system, in which each cluster represents one particular disease, needs to be implemented in future work.

References

Access Economics Pty Limited. The shifting burden of cardiovascular disease in Australia, a report of Heart Foundation. <http://www.heartfoundation.com.au/media/nhfashifting_burden_cvd_0505.pdf> Accessed 2008.

Blount, M. et al. (2007). Remote health-care monitoring using Personal Care Connect. IBM Systems Journal, 46(1), 95–113.

Friesen, G., Jannett, T., Jadallah, M., Yates, S., Quint, S., & Nagle, H. (1990). A comparison of the noise sensitivity of nine QRS detection algorithms. IEEE Transactions on Biomedical Engineering, 37(1), 85–98.

Hall, M. (1999). Correlation-based feature selection of discrete and numeric class machine learning. In Computer science working papers, 2000. Working paper 00/08. University of Waikato, Department of Computer Science.

Hamilton, P. S., & Tompkins, W. J. (1986). Quantitative investigation of QRS detection rules using the MIT/BIH arrhythmia database. IEEE Transactions on Biomedical Engineering, BME-33(12), 1157–1165.

Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques. Morgan Kaufmann.

Hung, K., & Zhang, Y.-T. (2003). Implementation of a WAP-based telemedicine system for patient monitoring. IEEE Transactions on Information Technology in Biomedicine, 7(2), 101–107.

Istepanian, R., & Petrosian, A. (2000). Optimal zonal wavelet-based ECG data compression for a mobile telecardiology system. IEEE Transactions on Information Technology in Biomedicine, 4(3), 200–211.

Kim, B., Yoo, S., & Lee, M. (2006). Wavelet-based low-delay ECG compression algorithm for continuous ECG transmission. IEEE Transactions on Information Technology in Biomedicine, 10(1), 77–83.

Kira, K., & Rendell, L. (1992). A practical approach to feature selection. In Proceedings of the ninth international workshop on Machine learning (pp. 249–256). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Lee, R.-G., Chen, K.-C., Hsiao, C.-C., & Tseng, C.-L. (2007). A mobile care system with alert mechanism. IEEE Transactions on Information Technology in Biomedicine, 11(5), 507–517.

Luca, G. D., Suryapranata, H., Ottervanger, J. P., & Antman, E. M. (2004). Time delay to treatment and mortality in primary angioplasty for acute myocardial infarction: Every minute of delay counts. Circulation, 109, 1223–1225.

Mahmood, A., Leckie, C., & Udaya, P. (2008). An efficient clustering scheme to exploit hierarchical data in network traffic analysis. IEEE Transactions on Knowledge and Data Engineering, 20(6), 752–767.

Physiobank: Physiologic signal archives for biomedical research. <http://www.physionet.org/physiobank/> Accessed 2009.

Robnik-Sikonja, M., & Kononenko, I. (1997). An adaptation of Relief for attribute estimation in regression. In Proceedings of the Fourteenth International Conference on Machine Learning (pp. 296–304). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Sufi, F. (2007). Mobile phone programming Java 2 Micro Edition. In Proceedings of the 2007 international workshop on mobile computing technologies for pervasive healthcare, Philip Island, Melbourne, December 2007 (pp. 64–80).

Sufi, F., Fang, Q., & Cosic, I. (2007). ECG R-R peak detection on mobile phones. In 29th Annual international conference of the IEEE engineering in medicine and biology society, 2007, EMBS 2007, August 2007 (pp. 3697–3700).

Sufi, F., Fang, Q., Khalil, I., & Mahmoud, S. S. (2009). Novel methods of faster cardiovascular diagnosis in wireless telecardiology. IEEE Journal on Selected Areas in Communications, 27(4).

Sufi, F., Fang, Q., Mahmoud, S., & Cosic, I. (2006). A mobile phone based intelligent telemonitoring platform. In Medical devices and biosensors, 2006. 3rd IEEE/EMBS International Summer School on ISSMDBS, September 2006 (pp. 101–104).

Sufi, F., & Khalil, I. (2008). Enforcing secured ECG transmission for realtime telemonitoring: A joint encoding, compression, encryption mechanism. Security and Communication Networks, 1(5), 389–405.

Sufi, F., & Khalil, I. (2009). Diagnosis of cardiovascular abnormalities from compressed ECG: A data mining based approach. In 9th International conference on information technology and application in biomedicine, ITAB 2009, Cyprus, November 2009.

Sufi, F., Khalil, I., Fang, Q., & Cosic, I. (2008). A mobile web grid based physiological signal monitoring system. In International conference on technology and applications in biomedicine, 2008. ITAB 2008, May 2008 (pp. 252–255).

Surez, K. V., Silva, J. C., Berthoumieu, Y., Gomis, P., & Najim, M. (2007). ECG beat detection using a geometrical matching approach. IEEE Transactions on Biomedical Engineering, 54(4), 641–650.

Talavera, L. (1999a). Dependency-based feature selection for clustering symbolic data. Intelligent Data Analysis, 4(1/2000), 19–28.

Talavera, L. (1999b). Feature selection as a preprocessing step for hierarchical clustering. In Proceedings of the Sixteenth International Conference on Machine Learning (pp. 389–397). Morgan Kaufmann Publishers Inc.

Table 6
$A_j$, $B_j$ and $S_j$ values for the 20 ECG segments. The fourth value of $S_j$ is the lowest (i.e. $S_4 = 0.24573771$), signaling that this segment is nearly equidistant from the normal and abnormal clusters (i.e. initiation of abnormality).

j     A_j            B_j            S_j
1     17.30530722    118.8628916    0.854409505
2     12.36044732    118.3491715    0.895559494
3     13.77612638    117.4859846    0.882742385
4     72.26985027    54.51042274    0.24573771
5     134.7904569    29.27631103    0.782801307
6     120.5433543    23.41155244    0.805783134
7     125.0978411    38.1514297     0.695027273
8     137.5469333    37.79026049    0.725255521
9     144.1012035    35.32039337    0.754891753
10    123.8610014    38.08583448    0.692511492
11    12.08989901    121.1030833    0.900168529
12    9.406054328    120.3706758    0.921857593
13    16.00034556    119.8604513    0.866508549
14    14.48650607    119.2606355    0.878530699
15    12.22903341    120.2970672    0.898343046
16    13.18553215    118.8009722    0.88901158
17    14.27789403    119.3707786    0.880390375
18    15.17882927    118.8893224    0.87232807
19    12.32616964    120.5503405    0.897750852
20    13.55373225    120.1616153    0.887204144

F. Suﬁ et al. / Expert Systems with Applications 38 (2011) 4705–4713 4713

Identifying the drivers of negative news with sentiment, entity and regression analysis

Article

Full-text available

Apr 2022

Fahim Sufi

Modern-day news agencies cater for a wide range of negative news, since multiple studies show general people are more attracted towards negative news. Once a highly negative incident is reported by a local news agency, it is often propagated by many other foreign news agencies at a global scale characterizing the news as breaking news. This propagation of negative news generates significant impacts on groups (who conducted the event), location (where the event was conducted), societies (that was impacted by the news) along with other factors. This research critically analyzes the impacts of negative news or breaking news with the help Artificial Intelligence (AI) based techniques like sentiment analysis, entity detection and automated regression analysis. The methodology described within this paper was implemented with a unique algorithm that allowed identification of all related factors or topics that drive negative perceptions towards global news. The solution was hosted in cloud environment from 2nd June 2021 till 1st September 2021. It automatically captured and analyzed 22,425 global news from 2397 different news sources of 192 countries. During this time, 34,975 entities are automatically categorized into 13 different entity groups. The classification accuracy of the entity detection was found to be 0.992, 0.995 and 0.994 in terms of precision, recall and F1-score. Moreover, the accuracies of logistic regression and linear regression were found to be 0.895 in AUC and 0.255 in MAPE on an average. Finally, the presented solution was successfully deployed in a wide range of environments including smartphones, tablets, and desktops.

AI-based Automated Extraction of Location-Oriented COVID-19 Sentiments

Article

Full-text available

Jan 2022
CMC-COMPUT MATER CON

A Novel Method of Generating Geospatial Intelligence from Social Media Posts of Political Leaders

Article

Full-text available

Feb 2022

Social media platforms such as Twitter have been used by political leaders, heads of states, political parties, and their supporters to strategically influence public opinions. Leaders can post about a location, a state, a country, or even a region in their social media accounts, and the posts can immediately be viewed and reacted to by millions of their followers. The effect of social media posts by political leaders could be automatically measured by extracting, analyzing, and producing real-time geospatial intelligence for social scientists and researchers. This paper proposed a novel approach in automatically processing real-time social media messages of political leaders with artificial intelligence (AI)-based language detection, translation, sentiment analysis, and named entity recognition (NER). This method automatically generates geospatial and location intelligence on both ESRI ArcGIS Maps and Microsoft Bing Maps. The proposed system was deployed from 1 January 2020 to 6 February 2022 to analyze 1.5 million tweets. During this 25-month period, 95K locations were successfully identified and mapped using data of 271,885 Twitter handles. With an overall 90% precision, recall, and F1score, along with 97% accuracy, the proposed system reports the most accurate system to produce geospatial intelligence directly from live Twitter feeds of political leaders with AI.

ECG Forecasting System Based on Long Short-Term Memory

Article

Full-text available

Jan 2024

Worldwide, cardiovascular diseases are some of the primary causes of death; yet the early detection and diagnosis of such diseases have the potential to save many lives. Technological means of detection are becoming increasingly essential and numerous techniques have been created for this purpose, such as forecasting. Of these techniques, the time series forecasting technique seeks to predict future events. The long-term time series forecasting of physiological data could assist medical professionals in predicting and treating patients based on very early diagnosis. This article presents a model that utilizes a deep learning technique to predict long-term ECG signals. The forecasting model can learn signals’ nonlinearity, nonstationarity, and complexity based on a long short-term memory architecture. However, this is not a trivial task as the correct forecasting of a signal that closely resembles the original complex signal’s structure and behavior while minimizing any differences in amplitude continues to pose challenges. To achieve this goal, we used a dataset available on the Physio net database, called MIT-BIH, with 48 ECG recordings of 30 min each. The developed model starts with pre-processing to reduce interference in the original signals, then applies a deep learning algorithm, based on a long short-term memory (LTSM) neural network with two hidden layers. Next, we applied the root mean square error (RMSE) and mean absolute error (MAE) metrics to evaluate the performance of the model and obtained an average RMSE of 0.0070±0.0028 and an average MAE of 0.0522±0.0098 across all simulations. The results indicate that the proposed LSTM model is a promising technique for ECG forecasting, considering the trends of the changes in the original data series, most notably in R-peak amplitude. Given the model’s accuracy and the features of the physiological signals, the system could be used to improve existing predictive healthcare systems for cardiovascular monitoring.

Automating Global Threat-Maps Generation via Advancements of News Sensors and AI

Article

Full-text available

Oct 2022

Negative events are prevalent all over the globe round the clock. People demonstrate psychological affinity to negative events, and they incline to stay away from troubled locations. This paper proposes an automated geospatial imagery application that would allow a user to remotely extract knowledge of troubled locations. The autonomous application uses thousands of connected news sensors to obtain real-time news pertaining to all global troubles. From the captured news, the proposed application uses artificial intelligence-based services and algorithms like sentiment analysis, entity detection, geolocation decoder, news fidelity analysis, and decomposition tree analysis to reconstruct global threat maps representing troubled locations interactively. The fully deployed system was evaluated for full three months of summer 2021, during which the autonomous system processed above 22 k news from 2397 connected news sources involving BBC, CNN, NY Times, Government websites of 192 countries, and all possible major social media sites. The study revealed 11,668 troubled locations classified successfully with outstanding precision, recall, and F1-score, all evaluated in ubiquitous environment covering mobile, tablet, desktop, and cloud platforms. The system generated interesting global threat maps for robust scenario set of $$3.71 \times {10}^{29}$$ 3.71 × 10 29 , to be reported as original fully autonomous remote sensing application of this kind. The research discloses attractive news and global threat-maps with trusted overall classification accuracy.

A decision support system for extracting artificial intelligence-driven insights from live twitter feeds on natural disasters

Article

Full-text available

Sep 2022

Fahim Sufi

Existing studies on Twitter-based natural disaster analysis suffer from shortcomings like limitations on supported languages, lack of sentiment analysis, regional restrictions, lack of end-to-end automation, and lack of Mobile App support. In this study, we design and develop a fully-automated artificial intelligence (AI) based Decision Support System (DSS) available through multiple platforms like iOS, Android, and Windows. The proposed DSS uses a live Twitter feed to obtain natural disaster-related Tweets in 110 supported languages. The system automatically executes AI-based translation, sentiment analysis, and automated K-Means algorithm to generate AI-driven insights for disaster strategists. The proposed DSS was tested with 67,528 real-time Tweets captured between 28 September 2021 and 6 October 2021 in 39 different languages under two different scenarios. The system revealed critical information for disaster planners or strategists like which clusters of natural disasters were associated with the most negative sentiments. We evaluated the proposed system’s accuracy and user experiences from 12 different disaster strategists. 83.33% of users found the proposed solution easy to use, effective, and self-explanatory. With 97% and 99.7% accuracy in Twitter keyword extraction and entity classification, this DSS reported the most accurate disaster intelligence system on a mobile platform.

A Lossless Data-Hiding based IoT Data Authenticity Model in Edge-AI for Connected Living

Article

Aug 2022

Edge computing is an emerging technology for the acquisition of Internet-of-Things (IoT) data and provisioning different services in connected living. Artificial Intelligence (AI) powered edge devices (edge-AI) facilitate intelligent IoT data acquisition and services through data analytics. However, data in edge networks are prone to several security threats such as external and internal attacks and transmission errors. Attackers can inject false data during data acquisition or modify stored data in the edge data storage to hamper data analytics. Therefore, an edge-AI device must verify the authenticity of IoT data before using them in data analytics. This article presents an IoT data authenticity model in edge-AI for a connected living using data hiding techniques. Our proposed data authenticity model securely hides the data source’s identification number within IoT data before sending it to edge devices. Edge-AI devices extract hidden information for verifying data authenticity. Existing data hiding approaches for biosignal cannot reconstruct original IoT data after extracting the hidden message from it (i.e., lossy) and are not usable for IoT data authenticity. We propose the first lossless IoT data hiding technique in this article based on error-correcting codes (ECCs). We conduct several experiments to demonstrate the performance of our proposed method. Experimental results establish the lossless property of the proposed approach while maintaining other data hiding properties.

Analyzing Public Concerns Over COVID-19 Variants Using Social Media

Article

Full-text available

May 2022

Musleh Alsulami

SARS-CoV-2, or more popularly known COVID-19 has claimed more than 5.5 million lives since it has been declared as a global pandemic. Similar to other viruses, COVID-19 is also undergoing several mutations and has many variants like Alpha, Beta, Gamma, Delta, Omicron and others. With so many variants, social media users are confused and posting their frustrations and angers with Tweets or Posts in public social media platforms. These publicly accessible social media posts provide a wealth of information for a social scientist or political leader or a strategic decision maker. This study demonstrates a feasible approach to extract meaningful critical information from social media posts. By programmatically accessing Twitter database from 11th January 2022 till 20th January 2022, we retrieved almost 9 K Tweet messages on 6 different keywords like “COVID Variants”, “Omicron”, “Alpha Variant”, “Beta Variant”, “Gamma Variant” and “Delta Variant”. Results were compared against metrics like users, posts, engagement, and influence. Omicron was found to be the most popular topic compared to other variants with an influence score of 70.2 million and 2.1 K posts during the monitored period. The most popular sources for influences on COVID-19 Variant related posts were found to be @reuters with 24.2M, @forbes with 17.4M, @timesofindia with 14.2M and @inquirerdotnet with 3.4 followers. This study also found out that the most popular Tweet languages were English followed by French and Dutch. Lastly, this study ranked user mentions, word frequency (with word cloud) and hashtags for COVID-19 Variant related twitter posts during the monitored timeframe.

A New Decision Support System for Analyzing Factors of Tornado Related Deaths in Bangladesh

Article

Full-text available

May 2022

Tropical cyclones devastate large areas, take numerous lives and damage extensive property in Bangladesh. Research on landfalling tropical cyclones affecting Bangladesh has primarily focused on events occurring since AD1960 with limited work examining earlier historical records. We rectify this gap by developing a new Tornado catalogue that include present and past records of Tornados across Bangladesh maximizing use of available sources. Within this new Tornado database, 119 records were captured starting from 1838 till 2020 causing 8735 deaths and 97,868 injuries leaving more than 102,776 people affected in total. Moreover, using this new Tornado data, we developed an end-to-end system that allows a user to explore and analyze the full range of Tornado data on multiple scenarios. The user of this new system can select a date range or search a particular location, and then, all the Tornado information along with Artificial Intelligence (AI) based insights within that selected scope would be dynamically presented in a range of devices including iOS, Android, and Windows. Using a set of interactive maps, charts, graphs, and visualizations the user would have a comprehensive understanding of the historical records of Tornados, Cyclones and associated landfalls with detailed data distributions and statistics.

A New Decision Support System for Analyzing Factors of Tornado Related Deaths in Bangladesh

Preprint

Full-text available

May 2022

Tropical cyclones devastate large areas, take numerous lives and damage extensive property in Bangladesh. Research on landfalling tropical cyclones affecting Bangladesh has primarily focused on events occurring since AD1960 with limited work examining earlier historical records. We rectify this gap by developing a new tornado catalogue that include present and past records of tornados across Bangladesh maximizing use of available sources. Within this new tornado database, 119 records were captured starting from 1838 till 2020 causing 8,735 deaths and 97,868 injuries leaving more than 1,02,776 people affected in total. Moreover, using this new tornado data, we developed an end-to-end system that allows a user to explore and analyze the full range of tornado data on multiple scenarios. The user of this new system can select a date range or search a particular location, and then, all the tornado information along with Artificial Intelligence (AI) based insights within that selected scope would be dynamically presented in a range of devices including iOS, Android, and Windows. Using a set of interactive maps, charts, graphs, and visualizations the user would have a comprehensive understanding of the historical records of Tornados, Cyclones and associated landfalls with detailed data distributions and statistics.

A Mobile Phone Based Intelligent Telemonitoring Platform

Conference Paper

Full-text available

Oct 2006

In this paper, we propose a generic smart telemonitoring platform in which the computation power of the mobile phone is highly utilized. In this approach, compression of ECG is done in real-time by the mobile phone for the very first time. The fast and effective compression scheme, designed for the proposed telemonitoring system, outperforms most of the real-time lossless ECG compression algorithms. This mobile phone based computation platform is a promising solution for privacy issues in telemonitoring through encryptions. Moreover, the mobile phones used in this platform performs preliminary detection of abnormal biosignal in realtime. Apart from the usage of mobile phones, this platform supports background biosignal abnormality surveillance using data mining agent.

Novel Methods of Faster Cardiovascular Diagnosis in Wireless Telecardiology

Article

Full-text available

Jun 2009

With the rapid development wireless technologies, mobile phones are gaining acceptance to become an effective tool for cardiovascular monitoring. However, existing technologies have limitations in terms of efficient transmission of compressed ECG over text messaging communications like SMS and MMS. In this paper, we first propose an ECG compression algorithm which allows lossless transmission of compressed ECG over bandwidth constrained wireless link. Then, we propose several algorithms for cardiovascular abnormality detection directly from the compressed ECG maintaining end to end security, patient privacy while offering the benefits of faster diagnosis. Next, we show that our mobile phone based cardiovascular monitoring solution is capable of harnessing up to 6.72 times faster diagnosis compared to existing technologies. As the decompression time on a doctor's mobile phone could be significant, our method will be highly advantageous in patient wellness monitoring system where a doctor has to read and diagnose from compressed ECGs of several patients assigned to him. Finally, we successfully implemented the prototype system by establishing mobile phone based cardiovascular patient monitoring.

Optimal zonal wavelet-based ECG data compression for a mobile telecardiology system

Article

Full-text available

Sep 2000

A new integrated design approach for an optimal zonal wavelet-based ECG data compression (OZWC) method for a mobile telecardiology model is presented. The hybrid implementation issues of this wavelet method with a GSM-based mobile telecardiology system are also introduced. The performance of the mobile system with compressed ECG data segments selected from the MIT-BIH arrhythmia database is evaluated in terms of bit error rate (BER), percent rms difference (PRD), and visual clinical inspection. The compression performance analysis of the OZWC is compared with another wavelet-based (Discrete Symmetric Wavelet Compression) approach. The optimal wavelet algorithm achieved a maximum compression ratio of 18:1 with low PRD ratios. The mobile telemedical simulation results show the successful compressed ECG transmission at speeds of 100 (km/h) with BER rates of less than 10 -15, providing a 73% reduction in total mobile transmission time with clinically acceptable reconstruction of the received signals. This approach will provide a framework for the design and functionality issues of GSM-based wireless telemedicine systems with wavelet compression techniques and their future integration for the next generation of mobile telecardiology systems.

Remote health-care monitoring using Personal Care Connect

Article

Full-text available

Jan 2007
IBM SYST J

Caring for patients with chronic illnesses is costly—nearly $1.27 trillion today and predicted to grow much larger. To address this trend, we have designed and built a platform, called Personal Care Connect (PCC), to facilitate the remote monitoring of patients. By providing caregivers with timely access to a patient's health status, they can provide patients with appropriate preventive interventions, helping to avoid hospitalization and to improve the patient's quality of care and quality of life. PCC may reduce health-care costs by focusing on preventive measures and monitoring instead of emergency care and hospital admissions. Although PCC may have features in common with other remote monitoring systems, it differs from them in that it is a standards-based, open platform designed to integrate with devices from device vendors and applications from independent software vendors. One of the motivations for PCC is to create and propagate a working environment of medical devices and applications that results in innovative solutions. In this paper, we describe the PCC remote monitoring system, including our pilot tests of the system.

Enforcing secured ECG Transmission for realtime Telemonitoring: a joint encoding, compression, encryption mechanism

Article

Full-text available

Sep 2008

Realtime telemonitoring of critical, acute and chronic patients has become increasingly popular with the emergence of portable acquisition devices and IP enabled mobile phones. During telemonitoring, enormous physiological signals are transmitted through the public communication network in realtime. However, these physiological signals can be intercepted with minimal effort, since existing telemonitoring practise ignores the privacy and security requirements. In this paper, to achieve end-to-end security, we first proposed an encoding method capable of securing Electrocardiogram (ECG) data transmission from an acquisition device to a mobile phone, and then from a mobile phone to a centralised medical server by concealing cardiovascular details as well as features in ECG data required to identify an individual. The encoding method not only conceals cardiovascular condition, but also reduces the enormous file size of the ECG with a compression ratio of up to 3.84, thus making it suitable in energy constrained small acquisition devices. As ECG data transfer faces even greater security vulnerabilities while traversing through the public Internet, we further designed and implemented 3 phase encoding—compression—encryption mechanism on mobile phones using the proposed encoding method and existing compression and encryption tools. This new mechanism elevates the security strength of the system even further. Apart from higher security, we also achieved higher compression ratio of up to 20.06, which will enable faster transmission and make the system suitable for realtime telemonitoring. Copyright © 2008 John Wiley & Sons, Ltd.

Data Mining: Concepts and Techniques

Article

Jan 2006

Mobile web grid based physiological signal monitoring system

Conference Paper

Jul 2008

Monitoring the physiological signals of disaster-affected patients poses several challenges. First, care providers in a disaster zone are themselves exposed to health hazards. Fast and easy transportation of monitoring equipment is crucial for the survival of the injured. Moreover, the transmission of large volumes of physiological data from the disaster zone to the medical server needs to be regulated to prevent network congestion. In this paper, we propose a mobile grid based health content delivery service that can be used for vital signal monitoring from a remote location. The proposed system is specifically designed for monitoring a group that is highly mobile and dynamic in nature. Therefore, during a catastrophic event such as an earthquake, flood or cyclone, the whole system can be transported to the disaster-affected patients with minimal effort. Minimally trained people can install the system within the disaster-affected area in an entirely ad hoc manner. Medical experts can monitor the group from a safe location and provide specialist advice for the early recovery of the affected patients. To deal with network congestion, local intelligence is applied within the mobile patient monitoring system, so that only medically urgent information is transmitted to the hospital server or central server. The grid network provides additional computational power to analyze raw physiological signals and identify possible health hazards for the monitored patients. In addition, the proposed mobile grid provides load sharing and redundancy of patient data, both of which are of prime importance in a disaster zone.

A Practical Approach to Feature Selection

Conference Paper

Dec 1992

In real-world concept learning problems, the representation of data often uses many features, only a few of which may be related to the target concept. In this situation, feature selection is important both to speed up learning and to improve concept quality. A new feature selection algorithm, Relief, uses a statistical method and avoids heuristic search. Relief requires time linear in the number of given features and the number of training instances, regardless of the target concept to be learned. Although the algorithm does not necessarily find the smallest subset of features, the size tends to be small because only statistically relevant features are selected. This paper focuses on empirical test results in two artificial domains: the LED Display domain and the Parity domain, with and without noise. Comparison with other feature selection algorithms shows Relief's advantages in terms of learning time and the accuracy of the learned concept, suggesting Relief's practicality.
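The Relief procedure summarized above can be sketched as follows. This is a minimal, illustrative Python sketch for a two-class problem, not Kira and Rendell's original code: the function names `relief_weights` and `relief_select`, the min-max scaling of feature differences, the squared Euclidean distance used to find the nearest hit and nearest miss, and the relevance threshold `tau` are assumptions chosen for illustration. Each of the `m` sampled instances costs one pass over all `n` instances and `d` features, so for a fixed `m` the run time grows linearly in both, consistent with the linear-time claim in the abstract.

```python
import random
from typing import List, Sequence


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int],
                   m: int, seed: int = 0) -> List[float]:
    """Estimate a relevance weight per feature (two-class Relief sketch)."""
    n, d = len(X), len(X[0])
    # Scale each feature difference by the feature's range so diffs lie in [0, 1].
    span = []
    for j in range(d):
        col = [row[j] for row in X]
        span.append((max(col) - min(col)) or 1.0)

    def diff(j: int, a: Sequence[float], b: Sequence[float]) -> float:
        return abs(a[j] - b[j]) / span[j]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        # Squared Euclidean distance over scaled features (an assumption).
        return sum(diff(j, a, b) ** 2 for j in range(d))

    rng = random.Random(seed)
    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        xi = X[i]
        # Nearest hit: closest other instance of the same class.
        hit = min((X[k] for k in range(n) if k != i and y[k] == y[i]),
                  key=lambda other: dist(xi, other))
        # Nearest miss: closest instance of the other class.
        miss = min((X[k] for k in range(n) if y[k] != y[i]),
                   key=lambda other: dist(xi, other))
        for j in range(d):
            # Reward features that differ across classes and penalise
            # features that differ within a class.
            w[j] += (diff(j, xi, miss) - diff(j, xi, hit)) / m
    return w


def relief_select(X, y, m: int, tau: float, seed: int = 0) -> List[int]:
    """Keep the indices of features whose weight reaches the threshold tau."""
    w = relief_weights(X, y, m, seed)
    return [j for j, wj in enumerate(w) if wj >= tau]
```

On a toy dataset such as `X = [[0.0, 0.0], [0.1, 1.0], [0.9, 0.0], [1.0, 1.0]]` with labels `[0, 0, 1, 1]`, feature 0 separates the classes and receives a positive weight, while feature 1 varies within each class and is penalised. The sketch assumes every class has at least two instances so that a nearest hit always exists.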

Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning

Conference Paper

Jan 2000

Mark A. Hall

Algorithms for feature selection fall into two broad categories: wrappers, which use the learning algorithm itself to evaluate the usefulness of features, and filters, which evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven to be more practical than wrappers because they are much faster. However, most existing filter algorithms only work with discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to both continuous and discrete problems. The algorithm often outperforms the well-known ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees. It performs more feature selection than ReliefF does, reducing the data dimensionality by fifty percent in most cases. Also, decision and model trees built from the preprocessed data are often significantly smaller.
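The correlation-based filter described above can be illustrated with the merit heuristic used in Hall's CFS work, which favours subsets whose features correlate strongly with the class but weakly with each other: for a subset of k features, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. The Python sketch below is illustrative only. Using absolute Pearson correlation as the correlation measure, the greedy forward search (Hall's CFS uses best-first search), and the names `pearson`, `cfs_merit` and `cfs_forward` are assumptions for this example.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cfs_merit(cols: Sequence[Sequence[float]], target: Sequence[float],
              subset: List[int]) -> float:
    """Merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(cols[j], target)) for j in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(cols[a], cols[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(cols: Sequence[Sequence[float]],
                target: Sequence[float]) -> List[int]:
    """Greedy forward search: add the feature that most raises the merit."""
    selected: List[int] = []
    best = 0.0
    remaining = list(range(len(cols)))
    while remaining:
        cand = max(remaining,
                   key=lambda j: cfs_merit(cols, target, selected + [j]))
        merit = cfs_merit(cols, target, selected + [cand])
        if merit <= best:
            break  # no remaining feature improves the subset's merit
        selected.append(cand)
        remaining.remove(cand)
        best = merit
    return selected
```

With a class vector `[0, 0, 1, 1]` and features `[0, 0, 1, 1]`, an exact copy of it, and `[1, 0, 1, 0]`, the search keeps only the first feature: the redundant copy does not raise the merit, and the uncorrelated feature lowers it. This is the dimensionality-reducing behaviour the abstract attributes to the filter.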

Data Mining: Concepts and Techniques

Book

Jan 2000

This is the third edition of a leading professional reference on data mining, expanding and updating the previous edition. The book was the first of its kind and combines sound theory with practical applications to prepare students for real-world challenges in data mining. Like the first and second editions, which established the book as the market leader for courses in data mining, data analytics, and knowledge discovery, Data Mining: Concepts and Techniques, 3rd Edition gives professionals a sound understanding of data mining principles and teaches proven methods for knowledge discovery in large corporate databases. Revisions incorporate input from instructors, changes in the field, and new topics such as data warehouse and data cube technology, mining stream data, mining social networks, and mining spatial, multimedia and other complex data. The book begins with a conceptual introduction, followed by comprehensive, state-of-the-art coverage of concepts and techniques. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly, or with strategic modification, against live data. Wherever possible, the authors raise and answer questions of utility, feasibility, optimization, and scalability.
- A comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
- Updates that incorporate input from readers, changes in the field, and more material on statistics and machine learning.
- Scores of algorithms and implementation examples, all in easily understood pseudo-code and suitable for use in real-world, large-scale data mining projects.
- Complete classroom support for instructors, as well as bonus content available at the companion website.
A comprehensive and practical look at the concepts and techniques you need in the area of data mining and knowledge discovery.