
Fusion-Based Machine Learning Architecture for Heart Disease Prediction

Tech Science Press
Computers, Materials & Continua
DOI:10.32604/cmc.2021.014649
Article
Muhammad Waqas Nadeem1,2, Hock Guan Goh1,*, Muhammad Adnan Khan3, Muzammil Hussain4,
Muhammad Faheem Mushtaq5 and Vasaki a/p Ponnusamy1
1Faculty of Information and Communication Technology (FICT), Universiti Tunku Abdul Rahman (UTAR),
Kampar, Perak, 31900, Malaysia
2Department of Computer Science, Lahore Garrison University, Lahore, 54000, Pakistan
3Department of Computer Science, Faculty of Computing, Riphah International University, Lahore Campus,
Lahore, 54000, Pakistan
4Department of Computer Science, School of Systems and Technology, University of Management and Technology,
Lahore, 54000, Pakistan
5Department of Information Technology, Khwaja Fareed University of Engineering and Information Technology,
Rahim Yar Khan, 64200, Pakistan
*Corresponding Author: Hock Guan Goh. Email: gohhg@utar.edu.my
Received: 05 October 2020; Accepted: 21 November 2020
Abstract: The ongoing evolution of healthcare technologies plays a significant role in improving medical services and saving human lives. Heart disease, or cardiovascular disease, is among the most fatal and complex diseases; it is hard to detect with the naked eye, and numerous people suffer from it globally. Heart attacks occur when vital signs such as blood pressure, pulse rate, and body temperature exceed their normal ranges. Efficient diagnosis of heart disease could play a substantial role in cardiology while reducing diagnostic time. Diagnosing heart disease accurately and in a timely manner remains a key challenge for researchers and medical experts. Machine learning-based techniques are therefore used to improve diagnostic accuracy, using datasets compiled from the reports of former patients. In recent years, numerous studies in the literature have proposed machine learning techniques for diagnosing heart disease; however, the existing techniques are limited in their accuracy. In this paper, a novel Support Vector Machine (SVM) based architecture for heart disease prediction, empowered with fuzzy-based decision-level fusion, is presented. The proposed architecture achieves 96.23% accuracy, a significant improvement over existing solutions.
Keywords: Heart disease; machine learning; support vector machine; fuzzy
logic; fusion; cardiovascular
This work is licensed under a Creative Commons Attribution 4.0 International License,
which permits unrestricted use, distribution, and reproduction in any medium, provided
the original work is properly cited.
2482 CMC, 2021, vol.67, no.2
1 Introduction
Heart disease (HD) is a serious health issue around the world, and numerous people are affected by it [1]. The most common symptoms of HD are physical weakness, shortness of breath, and swollen feet [2]. In recent years, many researchers have presented machine learning methods for the early prediction of heart disease, but the existing diagnostic techniques are not efficient and effective for several reasons, such as the execution time and accuracy of the machine learning models [3]. Where medical experts and modern technology are unavailable, the diagnosis and treatment of heart disease are difficult to carry out appropriately [4]. Effective and accurate diagnostic technologies can save the lives of numerous people [5]. According to the European Society of Cardiology, 3.6 million people are diagnosed with HD annually around the world [6,7]. A large number of people in the United States (US) are affected by HD [8]. Approximately 50% of patients suffering from heart disease survive for only 1–2 years, and 3% of the healthcare budget is spent on the management of heart disease [9]. Traditionally, physicians use the presenting symptoms, the patient's medical history, and physical examination reports to diagnose heart disease. The results obtained from these methods are not effective and accurate for identifying HD patients. Moreover, these methods are computationally difficult and expensive [10].
The development of machine learning-based noninvasive diagnostic systems is needed for the effective diagnosis of HD [11–16]. Machine learning-based expert decision systems and applications of Artificial Fuzzy Logic (AFL) diagnose heart disease patients efficiently, which reduces the death ratio [17,18]. The Cleveland heart disease data set has been used by several researchers [19–24] for the prediction of HD. Predictive machine learning models require suitable data for training and testing. Using a balanced data set for training and testing improves the performance of a machine learning model, and using relevant features from the data set improves its predictive capability. Hence, feature selection and data balancing are key to improving model performance. In the literature, several machine learning-based diagnostic techniques such as Neuro Fuzzy, Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Tree (DT), and Naïve Bayes (NB) have been proposed, but these techniques have limitations that include a lack of large training data, inconsistent accuracy, and improper data balancing. As a result, they do not diagnose heart disease effectively.
Data standardization at the data processing layer also improves the predictive capability of machine learning models. Furthermore, other preprocessing techniques, including the Min–Max scaler, the standard scaler, and the removal of missing feature values from the dataset, improve the performance of the model [20]. Several feature selection techniques such as Principal Component Analysis (PCA), Local Learning-Based Features Selection (LLBFS), and Greedy Algorithm (GA) are used to select important parameters. In addition, several optimization techniques, including Bacterial Foraging Optimization (BFO) and Ant Colony Optimization (ACO), are used to optimize features before the machine learning models are trained [25].
Furthermore, in recent years, several machine learning algorithms such as ANN, SVM, and K-Nearest Neighbour (KNN) have been used in Internet of Things (IoT) based systems for prediction and classification [26]. Unsupervised machine learning algorithms are used to label the data collected by different IoT devices. Data labeled by machine learning algorithms gives more accurate results than manual labeling.
Moreover, neural network-based tools have achieved state-of-the-art performance in the prediction of brain and heart diseases. In recent years, Carotid Artery Stenting (CAS) treatment has been commonly used in the field of medicine. The CAS methods give an overview of the Major Adverse Cardiovascular Events (MACE) of HD patients at an early stage. The ANN produces more accurate results than the simple CAS method [27]. The proposed ANN-based methods not only combine posterior probabilities but also produce values from multiple predecessor techniques. The ANN-based methods achieved much better results than existing methods [28].
In this paper, a supervised machine learning architecture empowered with a fuzzy-based decision-level fusion medical expert system is proposed for the prediction of heart disease. The proposed architecture consists of two phases: a supervised machine learning phase and a fuzzy-based decision-level fusion phase. Its main objective is to improve the accuracy of machine learning-based solutions for the diagnosis of heart disease. In recent years, many studies have relied on feature selection methods that restrict the features available to the model. The proposed model instead works on the mechanism of parallel computation, which allows all features to be used without a feature selection method at the pre-processing layer. The experimental results show that the proposed architecture is more accurate in diagnosing heart disease than existing machine learning methods.
The rest of the paper is organized as follows: Section 2 describes the related work. Section 3
presents the materials and methods for the diagnosis of heart disease. Section 4 discusses the
simulation results of the proposed architecture. Section 5 concludes the study.
2 Related Work
In the literature, researchers have designed numerous machine learning-based medical expert systems for the diagnosis of heart disease. This section gives an overview of some existing machine learning-based diagnosis systems for heart disease and highlights the importance of the proposed work. ANN-based diagnostic models give the highest prediction accuracy in the domain of healthcare [27]. Similarly, a Big Data and Optimal Artificial Neural Network (OANN) based model presented in [29] achieved 90.91% prediction accuracy. Heart disease data sets from Kaggle and the UCI Machine Learning Repository have been used by researchers to discover patterns using different machine learning algorithms such as DT, ANN, NB, and SVM. Hybrid methods are more accurate than a single machine learning algorithm [30].
Furthermore, numerous machine learning-based noninvasive medical support systems using ANN, SVM, DT, NB, KNN, Logistic Regression (LR), Fuzzy Logic (FL), and AdaBoost (AB) have been developed in recent years for the diagnosis of heart disease [18,31]. The use of machine learning-based medical expert systems for the diagnosis of heart disease is gradually increasing, which decreases the death ratio of heart patients [32]. Several machine learning-based medical expert systems for the diagnosis of HD have been reported in scientific research studies.
A Support Vector Machine and Principal Component Analysis (SVM-PCA) based system presented in [33] achieved 88.24% classification accuracy. Another SVM-based model presented in [34] to predict the risk of heart disease achieved 89.9% accuracy. In [35], an ANN and Neuro Fuzzy based predictive model for heart disease obtained 87.04% accuracy. Olaniyi et al. [36] presented a three-phase ANN model to diagnose heart disease that achieved 88.89% classification accuracy and can be easily deployed in healthcare information systems. Another ensemble-based ANN predictive model presented in [29] used a statistical analysis technique for the diagnosis of heart disease and obtained 89.01% accuracy, 95.91% specificity, and 80.09% sensitivity. Furthermore, an ANN and Fuzzy Analytical Hierarchical Processing (F-AHP) based integrated medical decision support system presented in [37] achieved 91.10% classification accuracy.
3 Materials and Method
This section briefly describes the materials and research method of the paper.
3.1 Dataset
Two different heart disease datasets are used in this paper to train the supervised machine learning algorithm. The first is the “heart disease dataset 2019”, which has been used by various researchers [13] in recent years for the diagnosis of heart disease and is publicly available in the online Kaggle repository. It has 1025 samples, 13 features, and some missing values. The target output label has two classes, indicating whether the patient is normal or a heart patient. The second is the “cardiovascular disease dataset 2019”, which is also available in the online Kaggle repository. It has 70,000 patient samples, 11 unique features, and some missing values. Detailed descriptions of these datasets are given in Tabs. 1 and 2.
3.2 Experimental Design Setup
A supervised prediction experiment has been conducted to evaluate the performance of the proposed architecture. First, the performance of the Support Vector Machine (SVM) is evaluated on two different data sets. The K-fold cross-validation method is applied to split the data. To assess the performance of the architecture, several performance evaluation metrics are computed. All computational experiments were performed in a Python 3.7 environment using several machine learning libraries on a PC with an Intel® Core i3-3217U CPU @ 1.80 GHz.
3.3 Proposed System Model
The proposed supervised machine learning architecture empowered with fuzzy-based decision-level fusion is presented in Fig. 1. Data sets generated by Internet of Medical Things (IoMT) enabled devices are used to train the machine learning algorithms. The proposed architecture consists of two phases: the supervised machine learning phase and the fuzzy-based decision-level fusion phase. The supervised machine learning phase has three distinct layers: the pre-processing layer, the application layer, and the performance layer. The pre-processing layer receives raw data, which may contain missing values and noise. At this layer, methods such as mean, mode, and average are applied to predict missing values, and normalization is used to remove noise. The application layer then receives the processed data and uses it to train the supervised machine learning technique, SVM. The same mechanism is executed in parallel inside the proposed architecture.
3.3.1 Preprocessing
In the proposed architecture, the preprocessing step consists of handling missing values, applying a moving average, and normalization, described as follows.
In the first step, the null and missing values in the data set are filled, because they can lead a machine learning model to wrong predictions. In the proposed architecture, the mean method is selected to impute missing or null values because it imputes continuous data without introducing outliers. The mean method is formulated as:
$$Q(x) = \begin{cases} \operatorname{mean}(x), & \text{if } x = \text{null/missing} \\ x, & \text{otherwise} \end{cases}$$
where $x$ represents the instances of the feature vectors, which lie in an n-dimensional space, $x \in \mathbb{R}^n$.
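As an illustration of the mean method above, the following minimal sketch imputes one feature column. It assumes missing entries are encoded as `None`; the function name `impute_mean` and the sample readings are hypothetical and are not taken from the paper's implementation.

```python
from statistics import mean


def impute_mean(column):
    """Apply Q(x): replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in column]


# Hypothetical resting blood pressure readings with two missing entries.
trestbps = [120, None, 140, 130, None]
imputed = impute_mean(trestbps)
```

Observed values pass through unchanged, so only the gaps are filled with the column mean.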
Table 1: Description of heart disease dataset 2019 with feature information

| Sr # | Feature name | Feature code | Description | Type | Value range (min–max) |
|---|---|---|---|---|---|
| 1 | Age | Age | Age in years | Numeric | 29–77 |
| 2 | Sex | Sex | 1 = male; 0 = female | Nominal | 1, 0 |
| 3 | Chest pain type | Cp | 1 = atypical angina; 2 = typical angina; 3 = asymptomatic; 4 = nonanginal pain | Nominal | 1, 2, 3, 4 |
| 4 | Resting blood pressure | Trestbps | In mm Hg on admission to the hospital | Numeric | 94–200 |
| 5 | Serum cholesterol | Chol | In mg/dl | Numeric | 126–564 |
| 6 | Fasting blood sugar | Fbs | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false) | Nominal | 1, 0 |
| 7 | Resting electrocardiographic results | Rest ECG | 0 = normal; 1 = having ST-T; 2 = hypertrophy | Nominal | 0, 1, 2 |
| 8 | Maximum heart rate achieved | Thalach | Not mentioned | Numeric | 77–202 |
| 9 | Exercise-induced angina | Exang | 1 = yes; 0 = no | Nominal | 1, 0 |
| 10 | Old peak = ST depression induced by exercise relative to rest | oldpeak | Not mentioned | Numeric | 0–6.2 |
| 11 | Slope of the peak exercise ST segment | slope | 1 = up sloping; 2 = flat; 3 = down sloping | Nominal | 1, 2, 3 |
| 12 | Number of major vessels (0–3) colored by fluoroscopy | Ca | Not mentioned | Nominal | 1, 2, 3 |
| 13 | Thallium scan | Thal | 3 = normal; 6 = fixed defect; 7 = reversible defect | Nominal | 3, 6, 7 |
Table 2: Description of cardiovascular disease dataset 2019 with feature information

| Sr # | Feature name | Feature code | Description | Type | Value range (min–max) |
|---|---|---|---|---|---|
| 1 | Age | Age | In days | Numeric | 10798–23713 |
| 2 | Gender | Gender | 1 = women; 2 = men | Nominal | 1, 2 |
| 3 | Height | Height | In cm | Numeric | 55–250 |
| 4 | Weight | Weight | In kg | Numeric | 10–200 |
| 5 | Systolic blood pressure | ap_hi | Not mentioned | Numeric | 150–16020 |
| 6 | Diastolic blood pressure | ap_lo | Not mentioned | Numeric | 70–11000 |
| 7 | Cholesterol | Cholesterol | 1 = normal; 2 = above normal; 3 = well above normal | Nominal | 1, 2, 3 |
| 8 | Glucose | Gluc | 1 = normal; 2 = above normal; 3 = well above normal | Nominal | 1, 2, 3 |
| 9 | Smoke | Smoke | Whether the patient smokes: 1 = smoke; 0 = not smoke | Nominal | 1, 0 |
| 10 | Alcohol intake | Alco | Whether the patient takes alcohol: 1 = take alcohol; 0 = not take alcohol | Nominal | 1, 0 |
| 11 | Physical activity | Active | Whether the patient is physically active: 1 = active; 0 = not active | Nominal | 1, 0 |
In the moving average (MA) step, noise in the data set is reduced by computing a series of averages over different subsets of the full data set. The arithmetic mean of each set of values is taken to calculate the moving average, which is formulated as:
$$MA = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{N}$$
where $x_1, x_2, x_3, \ldots, x_n$ represent instances of the feature vector and $N$ represents the total number of attributes.
In the normalization step, standardization (Z-score normalization) is used to rescale the values of the attributes. Standardization centers each attribute at zero mean and scales it to unit standard deviation. It is formulated as:
$$R(x) = \frac{x - \bar{x}}{\sigma}$$
where $x$ is an instance of the feature vectors in n-dimensional space, $x \in \mathbb{R}^n$, and $\bar{x}$ and $\sigma$ represent the mean and standard deviation of the attributes, respectively.
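The two preprocessing formulas above can be sketched in a few lines of standard-library Python. Two details are assumptions rather than choices stated in the paper: the moving average is taken over a sliding window of fixed size, and the Z-score uses the population standard deviation. The cholesterol values are hypothetical.

```python
from statistics import mean, pstdev


def moving_average(values, window):
    """Arithmetic mean of each consecutive run of `window` values."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def z_score(values):
    """R(x) = (x - mean) / sigma, using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("a constant feature cannot be standardized")
    return [(v - mu) / sigma for v in values]


chol = [200, 220, 240, 260, 280]  # hypothetical serum cholesterol values
smoothed = moving_average(chol, window=3)
scaled = z_score(chol)
```

After standardization the feature has zero mean and unit standard deviation, which puts attributes measured in different units on a comparable scale before they reach the SVM.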
Figure 1: Schematic diagram of the proposed supervised machine learning architecture empowered with fuzzy-based decision-level fusion
3.3.2 K-Fold Cross Validation
The K-fold cross-validation method is widely used by researchers for selecting machine learning models and estimating classifier error [28]. In the proposed architecture, 5-fold cross-validation is used to split the data set for the training and testing of the SVM. The fold-1 is used to train the model and fine-tune the hyper-parameters in an inner loop, where a grid search algorithm is employed [29]. In the outer loop (k times), the performance of the model is evaluated using test data. Furthermore, the data sets used for training and testing the proposed architecture have imbalanced negative and positive samples, so stratified K-fold cross-validation (KCV) is used to preserve the ratio of each class. The final performance of the model is evaluated by using the following formula:
$$M = \frac{1}{K} \sum_{n=1}^{K} P_n \pm \sqrt{\frac{\sum_{n=1}^{K} \left(P_n - \bar{P}\right)^2}{K - 1}}$$
where $M$ denotes the final performance metric for the classifiers, $P_n \in \mathbb{R}$, $n = 1, 2, 3, \ldots, K$, represents the performance metric for each fold, and $\bar{P}$ is the mean of the $P_n$.
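A minimal sketch of the two ideas in this subsection follows: a stratified split that preserves the class ratio in every fold, and the aggregation of per-fold scores into a mean and a sample standard deviation. The round-robin assignment is one simple way to stratify and is an assumption, not necessarily the procedure used in the paper; the labels and fold accuracies are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def stratified_folds(labels, k):
    """Assign sample indices to k folds, dealing each class out round-robin."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def summarize(fold_scores):
    """Return the mean and the sample standard deviation (divisor K - 1) of the P_n."""
    return mean(fold_scores), stdev(fold_scores)


labels = [1] * 10 + [0] * 5  # hypothetical imbalanced labels (2:1)
folds = stratified_folds(labels, k=5)

fold_accuracy = [0.90, 0.92, 0.94, 0.96, 0.98]  # hypothetical per-fold accuracies
m, s = summarize(fold_accuracy)
print(f"M = {m:.3f} +/- {s:.3f}")
```

Each of the five folds receives two positive samples and one negative sample, so the 2:1 class ratio of the full set is kept in every test split.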
3.3.3 Support Vector Machine
The SVM algorithm is used for regression and classification. In SVM-based models, the data points are represented in a space and categorized into groups, so that points with similar properties fall in the same group. In a linear SVM, each data point is treated as a p-dimensional vector, and the groups are separated by planes of dimension at most $p - 1$, known as hyperplanes. These planes divide the data space among the different data groups for regression and classification. The mathematical representation of the SVM is formulated as follows.
The equation of a line is:
$$a_1 = a_2 x + b \tag{1}$$
In Eq. (1), $x$ is the slope of the line and $b$ is the intercept, so
$$a_2 x - a_1 + b = 0$$
Let $a = (a_1, a_2)^T$ and $z = (-1, x)$, so the above equation can be written as
$$z \cdot a + b = 0 \tag{2}$$
Eq. (2) is derived for 2-dimensional vectors, but it applies to any number of dimensions. Eq. (2) is also called the hyperplane equation.
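The direction of the vector $a = (a_1, a_2)^T$ is written as $z$ and defined as:
$$z = \left( \frac{a_1}{\|a\|}, \frac{a_2}{\|a\|} \right) \tag{3}$$
where
$$\|a\| = \sqrt{a_1^2 + a_2^2 + a_3^2 + \cdots + a_n^2}$$
Since
$$\cos(\theta_1) = \frac{a_1}{\|a\|} \quad \text{and} \quad \cos(\theta_2) = \frac{a_2}{\|a\|}$$
Eq. (3) can also be written as
$$z = (\cos(\theta_1), \cos(\theta_2))$$
The dot product of two vectors $z$ and $a$ is
$$z \cdot a = \|z\| \|a\| \cos(\theta), \quad \theta = \theta_1 - \theta_2$$
where $\theta_1$ and $\theta_2$ here denote the angles that $z$ and $a$ make with the horizontal axis. Then
$$\cos(\theta) = \cos(\theta_1 - \theta_2) = \cos(\theta_1)\cos(\theta_2) + \sin(\theta_1)\sin(\theta_2) = \frac{z_1}{\|z\|}\frac{a_1}{\|a\|} + \frac{z_2}{\|z\|}\frac{a_2}{\|a\|} = \frac{z_1 a_1 + z_2 a_2}{\|z\| \|a\|}$$
so that
$$z \cdot a = \|z\| \|a\| \, \frac{z_1 a_1 + z_2 a_2}{\|z\| \|a\|} = z_1 a_1 + z_2 a_2$$
For n-dimensional vectors, the dot product is computed as:
$$z \cdot a = \sum_{i=1}^{n} z_i a_i$$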
Let $f = y(z \cdot a + b)$. If $f > 0$, the classification is correct; if $f < 0$, the classification is incorrect. Given a dataset $D$, $f$ is computed for each training example:
$$f_i = y_i (z \cdot a_i + b) \tag{4}$$
The functional margin $F$ of the dataset is computed as:
$$F = \min_{i = 1, \ldots, m} f_i$$
When hyperplanes are compared, the hyperplane with the largest $F$ is selected; in this comparison, $F$ is known as the geometric margin of the dataset. The optimal values of $z$ and $b$ must be found to select the optimal hyperplane.
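To make the margin comparison concrete, the sketch below evaluates $F = \min_i y_i (z \cdot a_i + b)$ for two candidate hyperplanes on a toy dataset. The points, labels, and candidate parameters are hypothetical. Note that a raw comparison of $F$ is only meaningful when the candidates are normalized to the same $\|z\|$, which this small example does not enforce.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def functional_margin(z, b, points, labels):
    """F = min over the dataset of y_i * (z . a_i + b)."""
    return min(y * (dot(z, a) + b) for a, y in zip(points, labels))


# Hypothetical 2-D training points with labels in {+1, -1}.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [+1, +1, -1, -1]

f_first = functional_margin((1.0, 1.0), -2.0, points, labels)
f_second = functional_margin((1.0, 0.0), -2.0, points, labels)
```

The first candidate leaves every point strictly on its correct side, while the second passes exactly through the point (2, 2), so its margin collapses to zero and the first hyperplane is preferred.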
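The Lagrangian function is
$$L(z, b, \alpha) = \frac{1}{2} z \cdot z - \sum_{i=1}^{m} \alpha_i \left[ y_i (z \cdot a_i + b) - 1 \right]$$
$$\nabla_z L(z, b, \alpha) = z - \sum_{i=1}^{m} \alpha_i y_i a_i = 0 \tag{5}$$
$$\nabla_b L(z, b, \alpha) = -\sum_{i=1}^{m} \alpha_i y_i = 0 \tag{6}$$
By using Eqs. (5) and (6) we get
$$z = \sum_{i=1}^{m} \alpha_i y_i a_i \quad \text{and} \quad \sum_{i=1}^{m} \alpha_i y_i = 0 \tag{7}$$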
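After substituting into the Lagrangian function $L$ we get
$$z(\alpha, b) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, a_i \cdot a_j$$
Thus,
$$\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, a_i \cdot a_j \tag{8}$$
subject to
$$\alpha_i \ge 0, \; i = 1, \ldots, m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0$$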
Because the constraints are inequalities, the Lagrangian approach is extended with the Karush-Kuhn-Tucker (KKT) conditions. Eq. (9) describes the complementary condition of KKT:
$$\alpha_i \left[ y_i (z \cdot a^{*} + b) - 1 \right] = 0 \tag{9}$$
where $a^{*}$ is the optimal point and $\alpha_i$ is a positive value. For other points, the value of $\alpha_i$ is 0. So,
$$y_i (z \cdot a^{*} + b) - 1 = 0 \tag{10}$$
Eq. (10) describes the support vectors, which are the points closest to the hyperplane.
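$$z - \sum_{i=1}^{m} \alpha_i y_i a_i = 0$$
$$z = \sum_{i=1}^{m} \alpha_i y_i a_i \tag{11}$$
The value of $b$ is computed from
$$y_i (z \cdot a^{*} + b) - 1 = 0 \tag{12}$$
Multiplying both sides of Eq. (12) by $y_i$ gives
$$y_i^2 (z \cdot a^{*} + b) - y_i = 0, \quad \text{where } y_i^2 = 1$$
$$z \cdot a^{*} + b - y_i = 0$$
$$b = y_i - z \cdot a^{*} \tag{13}$$
Then, averaging over the support vectors $a_i$,
$$b = \frac{1}{S} \sum_{i=1}^{S} (y_i - z \cdot a_i) \tag{14}$$
In Eq. (14), $S$ represents the number of support vectors. These support vectors define the hyperplane, which is then used for prediction.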
The hypothesis function is described as:
$$h(a_i) = \begin{cases} +1 & \text{if } z \cdot a_i + b \ge 0 \\ -1 & \text{if } z \cdot a_i + b < 0 \end{cases} \tag{15}$$
If the point is on or above the hyperplane, it is classified as class $+1$, meaning that HD is found; if the point is below the hyperplane, it is classified as class $-1$, meaning that HD is not found.
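The last steps of the derivation can be traced numerically. The sketch below builds the weight vector from the multipliers as in Eq. (11), averages the bias over the support vectors as in Eq. (14), and applies the hypothesis function of Eq. (15). The two support vectors and their multipliers are a hand-picked toy solution, chosen as an assumption so that the constraint on the sum of $\alpha_i y_i$ holds; no optimizer is run here.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def weight_vector(alphas, labels, points):
    """z = sum_i alpha_i * y_i * a_i, as in Eq. (11)."""
    dim = len(points[0])
    return tuple(
        sum(al * y * a[d] for al, y, a in zip(alphas, labels, points))
        for d in range(dim)
    )


def bias(z, labels, points):
    """b = (1/S) * sum_i (y_i - z . a_i) over the S support vectors, as in Eq. (14)."""
    return sum(y - dot(z, a) for y, a in zip(labels, points)) / len(points)


def predict(z, b, a):
    """Eq. (15): +1 on or above the hyperplane, -1 below it."""
    return 1 if dot(z, a) + b >= 0 else -1


# Toy solution: two support vectors, one per class.
support = [(1.0, 1.0), (-1.0, -1.0)]
y = [+1, -1]
alpha = [0.25, 0.25]  # hypothetical multipliers; sum(alpha_i * y_i) == 0

z = weight_vector(alpha, y, support)
b = bias(z, y, support)
pred_positive = predict(z, b, (2.0, 0.5))
pred_negative = predict(z, b, (-3.0, 1.0))
```

With these values both support vectors satisfy $y_i (z \cdot a_i + b) = 1$, which is the condition that Eq. (10) states for points lying on the margin.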
3.3.4 Fuzzy Based Fusion
After the SVM has been trained, evaluation parameters such as accuracy, sensitivity, and specificity are used to evaluate the performance of the proposed architecture. Once the performance evaluation is completed for both SVMs individually, a fuzzy-based decision-level fusion process is applied to integrate the performance of both SVMs into the final decision:
$$\mu_{FHD}(fh) = \mu_{SVM1 \cap SVM2}(s_1, s_2)$$
$$\mu_{FHD}(fh) = \min\left[ \mu_{SVM1}(s_1), \mu_{SVM2}(s_2) \right] \tag{16}$$
In Eq. (16), FHD denotes the fusion-based heart disease prediction. The t-norm function for fuzzy-based fusion is defined as:
$$t : [0, 1] \times [0, 1] \rightarrow [0, 1]$$
After the t-norm function is implemented, the fuzzy-based fusion implication is applied as:
$$\mu_{Q_6}(s_1, s_2) = \min\left[ \mu_{FP1}(s_1), \mu_{FP2}(s_2) \right]$$
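The minimum t-norm used for the fusion can be illustrated with a short sketch. The membership degrees assigned to the two SVM outputs are hypothetical; the paper does not list the membership functions that produce them.

```python
def t_norm_min(mu1, mu2):
    """Minimum t-norm: maps [0, 1] x [0, 1] to [0, 1]."""
    for mu in (mu1, mu2):
        if not 0.0 <= mu <= 1.0:
            raise ValueError("membership degrees must lie in [0, 1]")
    return min(mu1, mu2)


mu_svm1 = 0.82  # hypothetical membership degree from the first SVM
mu_svm2 = 0.67  # hypothetical membership degree from the second SVM
mu_fhd = t_norm_min(mu_svm1, mu_svm2)
```

The fused degree is bounded by the less confident of the two classifiers, which makes the minimum a conservative way to combine the decisions.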
Eq. (17) shows the relationship between SVM1 and SVM2:
$$Q_6 = \bigcup_{e=1}^{6} Ru_e \tag{17}$$
Eq. (18) integrates the performance of SVM1 and SVM2 in crisp form as follows:
$$\mu_{\phi}(L) = \max_{1 \le i \le 6} \left[ \sup_{I(s_1, s_2)} \left( \prod_{k=1}^{6} \mu_{s_{1k}, s_{2k}}(s_1, s_2) \right) \right] \tag{18}$$
The defuzzifier is a very important component of an expert system. Defuzzification is the process of mapping fuzzy sets to a crisp output. The center of gravity defuzzifier is used to obtain the final fused decision of the proposed architecture. It specifies the COG as the center of the area covered by the membership function of $\phi$ for fuzzy-based decision-level fusion, that is,
$$COG = \frac{\int_{v} \phi \, \mu_{\phi}(\phi) \, d\phi}{\int_{v} \mu_{\phi}(\phi) \, d\phi}$$
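The center of gravity can be approximated numerically by sampling the output membership function on a grid and applying the trapezoidal rule to both integrals. Everything specific in this sketch is an assumption made for illustration: the triangular output set, its clipping at a fused degree of 0.67, and the grid of 101 points on [0, 1].

```python
def trapezoid(xs, ys):
    """Trapezoidal-rule approximation of the integral of y over x."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))


def centre_of_gravity(xs, mus):
    """COG = integral(x * mu(x)) / integral(mu(x)) on a sampled grid."""
    area = trapezoid(xs, mus)
    if area == 0:
        raise ValueError("membership function has zero area")
    moment = trapezoid(xs, [x * m for x, m in zip(xs, mus)])
    return moment / area


def triangular(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


xs = [i / 100 for i in range(101)]
# Hypothetical output set, clipped at a fused membership degree of 0.67.
mus = [min(triangular(x, 0.4, 0.7, 1.0), 0.67) for x in xs]
cog = centre_of_gravity(xs, mus)
```

Because the clipped set is symmetric about its peak, the crisp output falls at the peak position; an asymmetric output set would pull the result toward its heavier side.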
4 Results and Discussion
The proposed supervised machine learning architecture empowered with fuzzy-based decision-level fusion has been applied to two different datasets. The proposed architecture works on the mechanism of parallel computing. Furthermore, the k-fold cross-validation method is used to split the dataset into different folds for the training and testing of the proposed architecture. The following evaluation metrics, computed from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), are used to assess the performance of the architecture:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%$$
$$\text{Miss rate} = \frac{FP + FN}{TP + TN + FP + FN} \times 100\%$$
$$\text{Sensitivity/recall} = \frac{TP}{TP + FN} \times 100\%$$
$$\text{Specificity} = \frac{TN}{TN + FP} \times 100\%$$
$$\text{Precision} = \frac{TP}{TP + FP} \times 100\%$$
$$\text{False positive ratio} = 1 - \text{specificity}$$
$$\text{False negative ratio} = 1 - \text{sensitivity}$$
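The metrics above follow directly from the four confusion-matrix counts, as the following sketch shows. The function name `evaluate` and the counts are hypothetical and are not results from the paper. All values are returned in percent, so the two ratios are computed as 100 minus specificity and 100 minus sensitivity.

```python
def evaluate(tp, tn, fp, fn):
    """Return the evaluation metrics, in percent, from confusion-matrix counts."""
    total = tp + tn + fp + fn
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "miss_rate": 100.0 * (fp + fn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": 100.0 * tp / (tp + fp),
        "false_positive_ratio": 100.0 - specificity,
        "false_negative_ratio": 100.0 - sensitivity,
    }


# Hypothetical counts for a test fold of 200 patients.
metrics = evaluate(tp=90, tn=80, fp=20, fn=10)
```

By construction, accuracy and miss rate sum to 100%, which gives a quick consistency check when reading a results table.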
The proposed architecture predicts the output as positive ($+1$) or negative ($-1$). An output of $+1$ indicates the presence of heart disease, and $-1$ indicates that no symptoms of heart disease were found in the patient. The performance of the proposed architecture in terms of different statistical metrics is shown in Tab. 3. Tab. 3 shows that the proposed architecture achieved its highest accuracy with 5-fold cross-validation: 96.23% accuracy, 95.64% specificity, 94.36% precision, 97.01% sensitivity, a 3.7% miss rate, a 4.36% false positive ratio, and a 2.99% false negative ratio.
Table 3: The performance of the proposed architecture

| K-fold | Accuracy (%) | Specificity (%) | Precision (%) | Sensitivity (%) | Miss rate (%) | False positive ratio (FPR) (%) | False negative ratio (FNR) (%) |
|---|---|---|---|---|---|---|---|
| 2-fold | 84.41 | 83.06 | 76.62 | 86.52 | 15.58 | 16.94 | 13.48 |
| 3-fold | 94.70 | 94.10 | 92.32 | 95.75 | 5.19 | 5.9 | 4.25 |
| 4-fold | 94.80 | 95.85 | 94.70 | 95.52 | 4.29 | 4.15 | 4.48 |
| 5-fold | 96.23 | 95.64 | 94.36 | 97.01 | 3.7 | 4.36 | 2.99 |
The comparison of the proposed architecture with existing methods is given in Tab. 4. Machine learning methods and architectures for the diagnosis of heart disease presented by researchers in recent years, such as Multilayer Perceptron combined with SVM (MLP + SVM), a hybrid machine learning-based diagnostic system, ANN combined with FL (ANN + FL), Hybrid Random Forest with a Linear Model (HRFLM), and ANN combined with Fuzzy Analytical Hierarchy Process (ANN + Fuzzy AHP), are studied for the comparative analysis. Accuracy is used as the metric to compare the proposed architecture with existing methods in the field of heart disease. The proposed architecture gives better accuracy than the other existing methods.
Table 4: Comparative analysis of the proposed architecture with existing methods

| Study | Method | Year | Accuracy (%) |
|---|---|---|---|
| [33] | Support vector machine + principal component analysis (SVM + PCA) | 2018 | 88.24 |
| [28] | Hybrid machine learning-based diagnostic system | 2019 | 88.47 |
| [35] | Artificial neural network + Neuro Fuzzy logic (ANN + NF) | 2014 | 87.04 |
| [34] | Support vector machine based heart disease risk prediction model | 2020 | 89.9 |
| [29] | Big data + optimal artificial neural network (OANN)-based diagnostic system | 2020 | 90.91 |
| [36] | ANN-based three-phase method | 2015 | 88.89 |
| [37] | Artificial neural network + Fuzzy analytical hierarchy process (ANN + Fuzzy AHP) | 2017 | 91.1 |
| [38] | HD detection system based on a set of relief-rough | 2017 | 92.32 |
| [25] | Fast conditional mutual information (FCMIM) + support vector machine (FCMIM + SVM) | 2020 | 92.37 |
| [39] | Cloud computing and machine learning algorithm support vector machine (SVM) | 2020 | 93.33 |
| The proposed architecture | Fusion-based machine learning | 2020 | 96.23 |
5 Conclusion
The early diagnosis of heart abnormalities and the extraction of information related to heart condition from raw healthcare data are very important and could help to save human lives in the long term. In recent years, machine learning methods have been effective at processing raw data and have provided new insight into heart disease. The prediction of heart disease is an important and challenging task in the medical field. The mortality rate of heart disease can be significantly controlled if the disease is diagnosed at an early stage and preventative measures are adopted. Different machine learning methods for the diagnosis of heart disease have been presented in recent years, but they have limitations in terms of accuracy. The proposed supervised machine learning architecture empowered with fuzzy-based decision-level fusion achieved 96.23% accuracy, which is higher than that of the existing methods compared in this study. The proposed work can be extended by using other machine learning algorithms, such as Artificial Neural Network, Decision Tree, and Random Forest, along with SVM.
Acknowledgement: Thanks to our families and colleagues, who supported us morally.
Funding Statement: The author(s) received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
References
[1] A. L. Bui, T. B. Horwich and G. C. Fonarow, “Epidemiology and risk profile of heart failure,” Nature Reviews Cardiology, vol. 8, no. 1, pp. 30–41, 2011.
[2] M. Durairaj and N. Ramasamy, “A comparison of the perceptive approaches for preprocessing the data set for predicting fertility success rate,” International Journal of Control Theory and Applications, vol. 9, no. 27, pp. 1–7, 2016.
[3] L. A. Allen, L. W. Stevenson, K. L. Grady, N. E. Goldstein, D. D. Matlock et al., “Decision making in advanced heart failure: A scientific statement from the American Heart Association,” Circulation, vol. 125, no. 15, pp. 1928–1952, 2012.
[4] S. Ghwanmeh, A. Mohammad and A. A. Ibrahim, “Innovative artificial neural networks-based decision support system for heart diseases diagnosis,” Journal of Intelligent Learning Systems and Applications, vol. 5, pp. 1–6, 2013.
[5] Q. K. A. Shayea, “Artificial neural networks in medical diagnosis,” International Journal of Computer Science, vol. 8, no. 2, pp. 150–154, 2011.
[6] A. J. S. Coats, “Ageing, demographics, and heart failure,” European Heart Journal Supplements, vol. 21, no. Supplement_L, pp. L4–L7, 2019.
[7] I. Spoletini and M. Lainscak, “Epidemiology and prognosis of heart failure,” International Cardiovascular Forum Journal, vol. 10, pp. 1–6, 2017.
[8] P. A. Heidenreich, J. G. Trogdon, O. A. Khavjou, J. Butler, K. Dracup et al., “Forecasting the future of cardiovascular disease in the United States: A policy statement from the American Heart Association,” Circulation, vol. 123, no. 8, pp. 933–944, 2011.
[9] A. U. Haq, J. P. Li, M. H. Memon, S. Nazir and R. Sun, “A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms,” Mobile Information Systems, vol. 2018, no. 8, pp. 1–21, 2018.
[10] A. Tsanas, M. A. Little, P. E. M. Sharry and L. O. Ramig, “Nonlinear speech analysis algorithms mapped to a standard metric achieve clinically useful quantification of average Parkinson’s disease symptom severity,” Journal of the Royal Society Interface, vol. 8, no. 59, pp. 842–855, 2011.
[11] A. H. Gonsalves, F. Thabtah, R. M. A. Mohammad and G. Singh, “Prediction of coronary heart disease using machine learning: An experimental analysis,” in Int. Conf. on Deep Learning Technologies, Xiamen, China, pp. 51–56, 2019.
[12] P. Sharma, K. Choudhary, K. Gupta, R. Chawla, D. Gupta et al., “Artificial plant optimization algorithm to detect heart rate & presence of heart disease using machine learning,” Artificial Intelligent Medicine, vol. 102, pp. 101752–101765, 2020.
[13] Y. Khan, U. Qamar, N. Yousaf and A. Khan, “Machine learning techniques for heart disease datasets: A survey,” in Int. Conf. on Machine Learning and Computing, Zhuhai, China, pp. 27–35, 2019.
[14] G. P. Diller, A. Kempny, S. V. B. Narayan, M. Henrichs, M. Brida et al., “Machine learning algorithms estimating prognosis and guiding therapy in adult congenital heart disease: Data from a single tertiary centre including 10019 patients,” European Heart Journal, vol. 40, no. 13, pp. 1069–1077, 2019.
[15] N. S. C. Reddy, S. S. Nee, L. Z. Min and C. X. Ying, “Classification and feature selection approaches by machine learning techniques: Heart disease prediction,” International Journal of Innovative Computing, vol. 9, no. 1, pp. 39–46, 2019.
CMC, 2021, vol.67, no.2 2495
[16] Y. Meng, W. Speier, C. Shufelt, S. Joung, J. E. V. Eyk et al., A machine learning approach to classifying
self-reported health status in a cohort of patients with heart disease using activity tracker data,IEEE
Journal of Biomedical and Health Informatics, vol. 24, no. 3, pp. 878–884, 2019.
[17] S. I. Ansarullah and P. Kumar, “A systematic literature review on cardiovascular disorder identication
using knowledge mining and machine learning method,International Journal of Recent Technology and
Engineering, vol. 7, no. 65, pp. 1009–1015, 2019.
[18] S. Nazir, S. Shahzad, S. Mahfooz and M. Nazir, “Fuzzy logic based decision support system for
component security evaluation,International Arab Journal of Information Technology, vol. 15, no. 2,
pp. 224–231, 2018.
[19] J. Nahar, T. Imam, K. S. Tickle and Y. P. P. Chen, “Computational intelligence for heart disease
diagnosis: A medical knowledge driven approach,Expert Systems with Applications, vol. 40, no. 1,
pp. 96–104, 2013.
[20] J. Nahar, T. Imam, K. S. Tickle and Y. P. P. Chen, “Association rule mining to detect factors which
contribute to heart disease in males and females,Expert Systems with Applications, vol. 40, no. 4,
pp. 1086–1093, 2013.
[21] C. B. Gokulnath and S. P. Shantharajah, “An optimized feature selection based on genetic approach
and support vector machine for heart disease,Cluster Computing, vol. 22, no. S6, pp. 14777–
14787, 2019.
[22] M. S. Amin, Y. K. Chiam and K. D. Varathan, “Identication of signicant features and data mining
techniques in predicting heart disease,Telematics and Informatics, vol. 36, pp. 82–93, 2019.
[23] H. Ahmed, E. M. G. Younis, A. Hendawi and A. A. Ali, “Heart disease identication from patients’
social posts, machine learning solution on spark,Future Generation Computor Systems, vol. 111,
pp. 714–722, 2020.
[24] R. E. Bialy, M. A. Salamay, O. H. Karam and M. E. Khalifa, “Feature analysis of coronary artery
heart disease data sets,” Procedia Computer Science, vol. 65, pp. 459–468, 2015.
[25] J. P. Li, A. U. Haq, S. U. Din, J. Khan, A. Khan et al., “Heart disease identication method using
machine learning classication in e-healthcare,” IEEE Access, vol. 8, pp. 107562–107582, 2020.
[26] Y. Meidan, M. Bohadana, A. Shabtai, J. D. Guarnizo, M. Ochoa et al., “Proliot: A machine learning
approach for IoT device identication based on network trafc analysis,” in Proc. of the Symp. on Applied
Computing, Marrakech, Morocco, pp. 506–509, 2017.
[27] L. Baccour, “Amended fused TOPSIS-VIKOR for classication (ATOVIC) applied to some uci data
sets,Expert Systems with Applications, vol. 99, pp. 115–125, 2018.
[28] S. Mohan, C. Thirumalai and G. Srivastava, “Effective heart disease prediction using hybrid machine
learning techniques,” IEEE Access, vol. 7, pp. 81542–81554, 2019.
[29] R. T. Selvi and I. Muthulakshmi, “An optimal articial neural network based big data application
for heart disease diagnosis and classication model,” Journal of Ambient Intelligence and Humanized
Computing, vol. 10, pp. 1–11, 2020.
[30] C. A. Cheng and H. W. Chiu, “An articial neural network model for the evaluation of carotid artery
stenting prognosis using a national-wide database,” in Annual Int. Conf. of the IEEE Engineering in
Medicine and Biology Society, Seogwipo, South Korea, pp. 2566–2569, 2017.
[31] S. Nazir, S. Shahzad and L. S. Riza, “Birthmark-based software classication using rough sets,” Arabian
Journal for Science and Engineering, vol. 42, no. 2, pp. 859–871, 2017.
[32] A. Methaila, P. Kansal, H. Arya and P. Kumar, “Early heart disease prediction using data mining
techniques,Computer Science and Information Technology Journal, vol. 5, pp. 53–59, 2014.
[33] C. Yang, B. An and S. Yin, “Heart-disease diagnosis via support vector machine-based approaches,
in IEEE Int. Conf. on Systems, Man, and Cybernetics, Miyazaki, Japan, pp. 3153–3158, 2018.
[34] H. Y. Lu, “Applying propensity score and support vector machine to construct a predictive model for
heart disease,” in 4th Int. Conf. on Medical and Health Informatics, Kamakura, Japan, pp. 18–21, 2020.
[35] M. A. M. Abushariah, A. A. M. Alqudah, O. Y. Adwan and R. M. M. Yousef, “Automatic heart
disease diagnosis system based on articial neural network and adaptive neuro-fuzzy inference systems
approaches,Journal of Software Engineering and Applications, vol. 7, no. 12, pp. 1055–1064, 2014.
2496 CMC, 2021, vol.67, no.2
[36] E. O. Olaniyi, O. K. Oyedotun and K. Adnan, “Heart diseases diagnosis using neural networks
arbitration,International Journal of Intelligent Systems and Applications, vol. 7, no. 12, pp. 72–78, 2015.
[37] O. W. Samuel, G. M. Asogbon, A. K. Sangaiah, P. Fang and G. Li, “An integrated decision sup-
port system based on ANN and Fuzzy_AHP for heart failure risk prediction,Expert Systems with
Applications, vol. 68, pp. 163–172, 2017.
[38] X. Liu, X. Wang, Q. Su, M. Zhang, Y. Zhu et al., A hybrid classication system for heart disease
diagnosis based on the rfrs method,Computational and Mathematical Methods in Medicine, vol. 2017,
pp. 1–10, 2017.
[39] M. A. Khan, S. Abbas, A. Atta, A. Ditta, H. Alquhayz et al., “Intelligent cloud based heart disease
prediction system empowered with supervised machine learning,Computers, Materials & Continua,
vol. 65, no. 1, pp. 139–151, 2020.