All content in this area was uploaded by Saraju P. Mohanty on Mar 20, 2024
iCardo 3.0: Machine Learning Framework for
Prediction of Conduction Disturbance in Heart
Nidhi Sinha1, Amit Joshi2, and Saraju P. Mohanty3
1 Research Scholar, Malviya National Institute of Technology, Jaipur 302017, India,
2018rec9033@mnit.ac.in
2 Assistant Professor, Malviya National Institute of Technology, Jaipur 302017, India,
amjoshi.ece@mnit.ac.in
3 Dept. of CSE, University of North Texas, Denton, USA, saraju.mohanty@unt.edu
Abstract. Cardiovascular disease is one of the main causes of death
globally. Electrocardiography (ECG) is a non-invasive method for
assessing disorders of heart function. This paper presents the
prediction of conduction disturbances or disorders (CD), which can lead
to chronic heart failure or cardiac arrest, from a 12-lead
electrocardiogram (ECG). A large, publicly available electrocardiography
data set named PTB-XL is used in the study. Bagging- and boosting-based
machine learning algorithms (i.e., Random Forest (RF) and XG Boost),
along with the Support Vector Machine (SVM), have been used to classify
CD and normal subjects. Two demographic features, the age and sex of the
subject, have been added to the ECG data to prepare the final input to
the classifiers. Random Forest and XG Boost achieve similar accuracy,
whereas the total number of true predictions is higher for RF.
Keywords: Conduction Disturbance · ECG · CVD · Chronic Heart Failure ·
Random Forest · XG Boost · SVM
1 Introduction
Conduction in a human heart refers to how the electrical impulse travels within
the heart, which causes it to beat. Normally the impulse generated by the Sinoa-
trial (SA) node activates the atria. The conduction pathways continue to bundle
branches, i.e., Left Bundle Branch (LBB) and Right Bundle Branch (RBB), and
finally cause the contraction of the left and right ventricles simultaneously. Each
impulse contributes to one heartbeat. Depending on the individual’s age, the
heart typically beats 60 to 100 times per minute at rest. Conduction
disturbance (CD), or conduction disorder, is a condition in which there is a
block in the heart’s conduction pathways. Conduction disorders can lead to
chronic heart failure.
Conduction disorders can be broadly classified into three categories: First-degree
heart block, Second-degree heart block, and Third-degree heart block [1]. In
first-degree heart block, the electric impulse moves slower through the heart’s
atrioventricular (AV) node than normal. It does not cause any symptoms. In
second-degree heart block, only some of the electrical impulses pass from the
heart’s upper chambers (the atria) to its lower chambers (the ventricles).
In this situation, the heart may miss beats, or the heartbeat may be
irregular or slow. Symptoms include heart palpitations, shortness of breath,
and chest pain. In third-degree heart block, also known as complete heart
block, the electrical impulse cannot pass from the heart’s upper chambers to
its lower chambers. The ventricles still contract and pump blood, but at a
slower rate; the contraction is not proper and the pumping of blood is
ineffective. In this situation, the patient requires immediate help because of
the high risk of cardiac arrest. Since CD disrupts the electrical impulse of the
heart, it is reflected in the ECG, making the ECG a suitable tool for predicting CD.
Some work related to this field is discussed next.
Many researchers are working on the prediction of CD. First- and
second-degree heart blocks do not produce distinctive symptoms that
allow proper identification. Prediction of CD at an early stage can decrease
mortality. Conduction disturbances are quite common after transcatheter
aortic valve replacement (TAVR) [3][8]. A comparison of machine learning models
and neural networks for predicting atrioventricular block with single-lead ECG is
presented in [6]. To predict CD accurately after TAVR, [5] presented an
ML-based model for patient-specific monitoring. [4] assessed the risk factors of
conduction disturbances, atrial fibrillation, sudden coronary death, and device
infection. On the other hand, [10] investigated conduction disorders and
arrhythmia associated with septal defects. [9] studied conduction disturbances
in patients with systemic sclerosis. In this paper, CD is predicted with
12-lead ECG and two additional features, i.e., ’age’ and ’sex’. The details of the
data used are given in Section 2, while the detailed methodology is explained in
Section 3.
2 Dataset
A large, publicly available electrocardiography data set named PTB-XL [7] has
been used in this work; it was obtained from PhysioNet [2]. It contains 21837
12-lead ECG records (leads I, II, III, aVL, aVR, aVF, V1, V2, V3, V4, V5, and V6)
taken from 18885 subjects, and each recording is
ten seconds long. The ECG data is a multi-label data set, as each record was
annotated by up to two cardiologists. The annotations were later aggregated into
diagnostic superclasses and subclasses.
The five superclasses are: Normal ECG (NORM), Conduction Disturbance (CD),
Myocardial Infarction (MI), Hypertrophy (HYP), and ST/T change (STTC).
Here only two classes are considered: NORM and CD. Conduction
disturbance or disorder (CD) includes the following diseases:
– LAFB/LPFB: left anterior/left posterior fascicular block
– IRBBB: incomplete right bundle branch block
– ILBBB: incomplete left bundle branch block
– CLBBB: complete left bundle branch block
– CRBBB: complete right bundle branch block
– AVB: AV block
– IVCB: non-specific intraventricular conduction block or disturbance
– WPW: Wolff-Parkinson-White syndrome
The data inclusion and exclusion for the presented work are shown in Fig. 1.
[Figure: PTB-XL Dataset. Total samples = 27,799 (data with single label = 16,244;
data with multiple labels = 5,555). Data excluded: 17,022 (MI: 5,469; STTC: 5,235;
HYP: 2,649). Data included: 10,777 (CD: 1,708; NORM: 9,069).]
Fig. 1. Data Inclusion and Exclusion Process Flow
3 MLCardio: Proposed Machine Learning Method
3.1 Data Preparation
The problem is modelled as a binary classification problem, in which class 1 is
the normal subjects (NORM) and class 2 is the subjects with CD. The 12-lead
ECG data for all 10,777 subjects have been converted into a 3-D array using
NumPy in Python 3.0. The array was then flattened into a 2-D array and converted
into a data frame using Pandas. After that, two features, ’age’ and ’sex’, have been
added to the data. The training data are scaled (standardized) with the help
of StandardScaler from the scikit-learn (sklearn) library. Finally, the prepared
training data are used to train three different classifiers.
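The preparation steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the array sizes, the "ecg_<i>" column names, the 0/1 encoding of 'sex', and the random values are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the ECG recordings, shaped (records, samples, leads).
# All three sizes are assumptions; the study uses 10,777 twelve-lead records.
n_records, n_samples, n_leads = 20, 50, 12
ecg_3d = rng.normal(size=(n_records, n_samples, n_leads))

# Flatten each recording into a single row: (records, samples * leads).
ecg_2d = ecg_3d.reshape(n_records, n_samples * n_leads)

# Wrap the 2-D array in a data frame; the column names are hypothetical.
columns = [f"ecg_{i}" for i in range(ecg_2d.shape[1])]
features = pd.DataFrame(ecg_2d, columns=columns)

# Append the two demographic features (random values, assumed 0/1 sex encoding).
features["age"] = rng.integers(18, 90, size=n_records)
features["sex"] = rng.integers(0, 2, size=n_records)

# Standardize every column to zero mean and unit variance.
# For brevity the whole frame is scaled here; the paper scales the training data.
scaler = StandardScaler()
prepared = scaler.fit_transform(features)

print(prepared.shape)  # (20, 602): 600 flattened ECG values plus age and sex
```

Flattening keeps every sample of every lead as its own feature, so the two demographic columns are simply two more columns in the same frame.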
3.2 Machine Learning Models
Here, the Support Vector Machine (SVM), chosen for its discriminative power in
classification [11], is used along with two ensemble machine learning algorithms,
Random Forest and XG Boost, for the prediction of conduction disturbances.
Ensemble ML algorithms have two benefits in predictive modelling: (i) performance
and (ii) robustness. An ensemble ML model can make more accurate
predictions than any single constituent model, and it
also reduces the variance or dispersion of predictions. Adding two demographic
features, i.e., ’age’ and ’sex’, to the raw ECG signal significantly improved the
models’ performance.
3.3 Classification of CD
The three classifiers, which are Random Forest (RF), XG Boost (XGB), and Support
Vector Machine (SVM), are initialized with their default hyper-parameters
and trained with the prepared data. After training, the models are evaluated
on the testing data. Overall accuracy, precision, recall, and F1 score are
the performance measures.
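The training and evaluation procedure can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the feature matrix and labels are synthetic, the 0/1 label encoding is assumed, and `random_state` is set only for reproducibility (all other hyper-parameters are left at their defaults). XG Boost is included only if the `xgboost` package is installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the prepared data: 10,777 rows as in the paper,
# but only 10 made-up features so that the sketch runs quickly.
n_records = 10777
X = rng.normal(size=(n_records, 10))
# Assumed encoding: 0 = NORM, 1 = CD; the labels are artificial.
y = (X[:, 0] + 0.5 * rng.normal(size=n_records) > 1.0).astype(int)

# 80:20 split of the prepared data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(X_train), len(X_test))  # 8621 2156

# Classifiers with default hyper-parameters.
models = {
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
}
try:
    from xgboost import XGBClassifier  # optional dependency
    models["XG Boost"] = XGBClassifier()
except ImportError:
    pass  # skip XG Boost when the package is unavailable

# Train each model and evaluate it on the held-out testing data.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = accuracy_score(y_test, model.predict(X_test))

print(accuracy)
```

With 10,777 rows and `test_size=0.2`, scikit-learn rounds the test set up to 2156 samples, which leaves 8621 for training.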
Performance Measures There are various performance measures for clas-
sification models, although the most common is accuracy. Here is the list of
performance metrics evaluated for each machine-learning model:
– Accuracy: the ratio of correct predictions to the total number of predictions
– Precision: the ratio of True Positives (TP) to total positive predictions
(i.e., the sum of TP and False Positives (FP))
– Recall: the ratio of True Positives to actual positives (i.e., the sum of TP
and False Negatives (FN))
– F1 score: the harmonic mean of precision and recall
All the above performance metrics are calculated for all the classifiers, i.e., SVM,
RF, and XG Boost, and listed in the next section, and the process flow of the
work is shown in Fig. 2.
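Written as formulas, the four measures listed above take the following standard form. TN (true negatives) is not named in the list and is introduced here only to express accuracy; the symbols refer to one class treated as positive.

```latex
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align*}
```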
[Figure: Raw ECG data; age, sex; Data Preparation; Data split 80:20; Training Data;
Testing Data; ML Classifier(s) Training and Evaluation; Performance Metrics
(Precision, Recall, F1 score, Accuracy).]
Fig. 2. Process Flow of the Proposed Work
4 Results and Discussion
The data included for the work has 10,777 samples, which were split into train-
ing and testing sets. Eighty per cent of the data, which is 8621 samples, is used
for training the models, while twenty per cent of the data, i.e., 2156 samples, is
used for evaluating the models’ performance, which is also called testing. Two
features, ’age’ and ’sex’, have been added to the ECG signal data before training
and testing. These two features improved the classifiers’ accuracy from 83-84% to
87-90% (Tables 1 and 2).
Usually, either the raw ECG signals or features extracted from them are used alone
in ML classification. Such approaches have limitations. If raw ECG signals alone
are used with ML classifiers, the accuracy is low, and if they are used with neural
networks, comparatively more computation and resources are required. On the other
hand, extracting features from raw ECG signals before applying ML classification
adds complexity to the model. In the presented work, by contrast, raw
ECG is combined with two readily available features, i.e., ’age’ and ’sex’, which
improves the classifiers’ performance. The overall performance
metrics of each classifier before and after adding the ’age’ and ’sex’ features are
listed in Tables 1 and 2, and the confusion matrices for each classifier after
adding the ’age’ and ’sex’ features are given in Table 3.
Table 1. Performance metrics of SVM, XG Boost, and RF before adding ’age’ and
’sex’ features
ML Classifiers                  Precision (%)   Recall (%)   F1 score (%)   Accuracy (%)
Support Vector Machine (SVM)    70              84           77             84
XG Boost                        72              84           76             84
Random Forest (RF)              74              83           77             83
Table 2. Performance metrics of SVM, XG Boost, and RF after adding ’age’ and ’sex’
features
ML Classifiers                  Precision (%)   Recall (%)   F1 score (%)   Accuracy (%)
Support Vector Machine (SVM)    89              87           83             87
XG Boost                        89              90           88             90
Random Forest (RF)              90              90           89             90
Table 3. Confusion matrices of SVM, XG Boost, and RF, respectively (rows: actual
class; columns: predicted class)
SVM
          NORM    CD
NORM      1815     1
CD         279    61
XG Boost
          NORM    CD
NORM      1797    19
CD         206   137
RF
          NORM    CD
NORM      1791    25
CD         189   151
It is evident that the ensemble ML algorithms perform better than the SVM.
RF uses the bagging approach, in which the trees can be built in parallel, while XG
Boost works on the concept of boosting, which is a sequential process. It
can therefore be concluded that RF is better in terms of speed and complexity as
well as overall performance metrics. The performance measures for each class are
listed in Table 4 for a better understanding of the classifiers’ behavior.
Table 4. Class-wise performance metrics of each classifier after adding ’age’ and ’sex’
features
Performance Metrics    SVM             XG Boost        RF
                       NORM    CD      NORM    CD      NORM    CD
Precision              0.87    0.98    0.90    0.88    0.90    0.86
Recall                 1.00    0.18    0.99    0.39    0.99    0.44
F1 Score               0.93    0.30    0.94    0.54    0.94    0.59
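As a worked check, the RF column of Table 4 can be recomputed from the RF confusion matrix in Table 3. The sketch below assumes that the rows of the confusion matrix are the actual classes and the columns are the predicted classes; the function and variable names are illustrative only.

```python
def class_metrics(tp, fp, fn):
    """Return (precision, recall, F1) for one class treated as positive."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# RF confusion matrix from Table 3 (assumed: rows = actual, columns = predicted).
norm_as_norm, norm_as_cd = 1791, 25
cd_as_norm, cd_as_cd = 189, 151

total = norm_as_norm + norm_as_cd + cd_as_norm + cd_as_cd
accuracy = (norm_as_norm + cd_as_cd) / total

# CD as the positive class: NORM predicted as CD is a false positive,
# CD predicted as NORM is a false negative.
cd_scores = class_metrics(tp=cd_as_cd, fp=norm_as_cd, fn=cd_as_norm)

# NORM as the positive class: the roles of the off-diagonal cells swap.
norm_scores = class_metrics(tp=norm_as_norm, fp=cd_as_norm, fn=norm_as_cd)

print(total, round(accuracy, 2))             # 2156 0.9
print([round(s, 2) for s in norm_scores])    # [0.9, 0.99, 0.94]
print([round(s, 2) for s in cd_scores])      # [0.86, 0.44, 0.59]
```

The rounded values agree with the RF entries of Table 4 and with the 90% RF accuracy in Table 2. The low CD recall (0.44) shows that more than half of the CD records in the test set are predicted as NORM, even though overall accuracy is high.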
In future work, neural-network-based classifiers can be studied. Apart
from this, hyper-parameter optimization with Optuna can be performed on the
presented ML models to improve the performance further.
References
1. Heart conduction disorders.
https://www.heart.org/en/health-topics/arrhythmia/about-arrhythmia/conduction-disorders,
accessed: 2023-05-14
2. Ary L. Goldberger, L.A.N.A., et al: Physiobank, physiotoolkit, and physionet. Cir-
culation 101(23), e215–e220 (2000). https://doi.org/10.1161/01.CIR.101.23.e215
3. Auffret, V., Puri, R., Urena, M., Chamandi, C., Rodriguez-Gabella, T., Philippon,
F., Rodes-Cabau, J.: Conduction disturbances after transcatheter aortic valve re-
placement: current status and future perspectives. Circulation 136(11), 1049–1069
(2017)
4. Crea, F.: Novel risk factors for atrial fibrillation, conduction disturbances,
sudden coronary death, and device infection. European Heart Journal 43(47),
4853–4857 (12 2022). https://doi.org/10.1093/eurheartj/ehac734
5. Galli V, L.F., et al: Towards patient-specific prediction of conduction abnormal-
ities induced by transcatheter aortic valve implantation: a combined mechanistic
modelling and machine learning approach. Eur Heart J Digit Health pp. 606–615
(Aug 2021). https://doi.org/10.1093/ehjdh/ztab063
6. Kirti Singh, V.N., et al: Machine learning algorithms for atrioven-
tricular conduction defects prediction using ecg: A comparative study.
In: 2022 IEEE Delhi Section Conference (DELCON). pp. 1–5 (2022).
https://doi.org/10.1109/DELCON54057.2022.9753488
7. Patrick Wagner, N.S., et al: PTB-XL, a large publicly available electrocardiography
dataset. Scientific Data 7(1), 154 (May 2020).
https://doi.org/10.1038/s41597-020-0495-6
8. Sammour, Y., Kapadia, S.R.: Conduction disturbance after TAVR. Cardiac
Interventions,
https://citoday.com/articles/2022-mar-apr/conduction-disturbance-after-tavr
9. Vrancianu, C.A., Gheorghiu, A.M., Popa, D.E., Chan, J.S.K., Satti, D.I., Lee,
Y.H.A., Hui, J.M.H., Tse, G., Ancuta, I., Ciobanu, A., et al.: Arrhythmias and
conduction disturbances in patients with systemic sclerosis—a systematic literature
review. International Journal of Molecular Sciences 23(21), 12963 (2022)
10. Williams, M.R., Perry, J.C.: Arrhythmias and conduction disorders associated
with atrial septal defects. Journal of Thoracic Disease 10(Suppl 24) (2018),
https://jtd.amegroups.com/article/view/23665
11. Youn-Jung Son, RN, H.G.K., et al: Application of support vector machine for
prediction of medication adherence in heart failure patients. Healthc Inform Res
16(4), 253–259 (Dec 2010)