A Comparison of Three Data Mining Algorithms for Heart Disease Prediction

978-1-6654-1591-0/21/$31.00 ©2021 IEEE
Abstract— Heart disease is one of the most common causes of
death worldwide. This paper discusses methods for forecasting heart disease in real time from medical data that describe a patient's current health status. The main aim of the proposed system is to find the data mining algorithm that predicts heart disease with the highest accuracy. We compare three algorithms: Decision Tree (DT), Support Vector Machine (SVM), and Naïve Bayes (NB). All three are supervised learning algorithms, which learn from labelled training data. The results show that DT provides the best accuracy with less training time than SVM and Naïve Bayes (NB).
Keywords— Data Mining, Heart Disease Prediction, Support
Vector Machine, Decision Trees, Naïve Bayes.
I. INTRODUCTION
The data mining (DM) method relies entirely on computers and is crucial for extracting valuable and useful knowledge from huge databases [1]. Because large data sets contain much information that is not intuitively apparent, data extraction is very interesting and useful for exploring and analysing big data [2]. DM is useful in the clinical sector because many hidden patterns can be derived from large medical data sets [3]. Moreover, disease prediction is an important application of this extraction process, and data mining techniques used in health care are very useful for answering a series of questions about heart disease prediction [4] [5]. The rising occurrence of heart disease has become a worldwide concern, and the healthcare sector must improve and intensify the way these diseases are treated in order to reduce their social effects [6]. Healthcare produces a great deal of data, especially on heart disease, that needs to be analysed quickly so that faster decisions can be made about the patient [7]. Medical data, drawn from clinical records and hospital management systems, doubles every three years, producing very large data sets. DM techniques are therefore extremely important in medical data analysis and information extraction [8] [9]. The rising rates of morbidity and mortality from heart disease around the world have prompted researchers to perform a slew of studies in an attempt to reduce these numbers [10] [11] [12].
Data mining methods have been widely used in the development of clinical decision support systems for heart disease prediction [13]. DM tools help improve patient policy-making and prevent hospital errors, and they support early detection, disease prevention, and the avoidance of preventable hospital deaths [14]. There is therefore a need for a highly accurate approach that can serve as an analysis tool to uncover hidden heart disease trends in medical data and predict heart disease before it occurs [15]. Such an approach would improve the control of heart disease by using data mining algorithms to predict it at an early stage [16]. The data mining algorithms most commonly used in heart disease prediction include SVM, Decision Tree, and NB, and these are the three algorithms adopted in this study [17] [18]. These classification algorithms find a model that represents and distinguishes classes or concepts; classification is one of the predictive data mining tasks and helps make predictions with high accuracy. The paper is organized as follows: Section II reviews related work, Section III describes the dataset, Section IV presents the methodology, Section V reports the results, and Section VI concludes the paper.
II. RELATED WORK
M. J. A. Alkhafaji et al. [19] used three techniques to prepare data with acceptable precision, noting that classification requirements must be reviewed before the data are accessed to support an appropriate decision on predicting heart disease for a patient. Their findings show that, in terms of efficiency, prediction accuracy, and diagnosis, the decision tree technique outperforms the Bayesian classification and neural network techniques (98.85%, 98.16%, and 91.31%, respectively).
H. Ahmed et al. [20] presented a method based on Apache Spark and Apache Kafka that predicts heart disease in real time. They evaluated the features in the dataset and chose the best feature set using two feature selection algorithms, univariate feature selection and Relief. Several machine learning classifiers, including SVM, DT, random forest (RF), and logistic regression (LR), were then applied both to the full feature set and to the selected features. The results show that the random forest classifier outperforms the other models, with an accuracy of 94.9%.
S. Anitha et al. [21] compared three supervised machine learning algorithms, K-Nearest Neighbour, Naive Bayes, and SVM, on a heart disease dataset to determine whether or not a patient has heart disease, which can help in diagnosis. According to their experimental results, the Naive Bayes algorithm predicts the disease correctly 86.6% of the time.
Jwan Najeeb Saeed
Department of Information Technology,
Duhok Polytechnic University, Duhok,
Kurdistan Region, Iraq
Jwan.najeeb@dpu.edu.krd
Adnan Mohsin Abdulazeez
Research Centre of
Duhok Polytechnic University
Duhok, Kurdistan Region, Iraq
adnan.mohsin@dpu.edu.krd
Noor Salah Hassan
Technical College of Informatics-Akre
Duhok Polytechnic University
Duhok, Kurdistan Region, Iraq
noor.salah.hassan6@gmail.com
Diyar Qader Zeebaree
Research Center of
Duhok Polytechnic University,
Duhok, Kurdistan Region, Iraq
dqszeebaree@dpu.edu.krd
Falah Y.H. Ahmed
Faculty of Information Science and Engineering
(FISE), Management and Science University,
Shah Alam, Selangor Malaysia
falah_ahmed@msu.edu.my
Adel Al-zebari
Technical College of Informatics-Akre
Duhok Polytechnic University
Duhok, Kurdistan Region, Iraq
adel.ali@dpu.edu.krd
2021 IEEE Symposium on Industrial Electronics & Applications (ISIEA) | 978-1-6654-1591-0/21/$31.00 ©2021 IEEE | DOI: 10.1109/ISIEA51897.2021.9509985
Authorized licensed use limited to: UNIVERSITY TEKNOLOGI MALAYSIA. Downloaded on August 13,2021 at 20:15:08 UTC from IEEE Xplore. Restrictions apply.
M. Tarawneh et al. [22] proposed a hybrid approach to predicting heart disease that combines all the methods into a single algorithm. Their findings indicate that a composite model of all the approaches can be used to make an effective diagnosis. An accuracy of 89.2% was reached using data mining algorithms including Naive Bayes, SVM, K-Nearest Neighbour, Neural Network, J4.8, and Random Forest, with the Naive Bayes and SVM techniques achieving the highest accuracy on the entire data collection.
S. Bashir et al. [23] presented the use of data science to predict heart disease in the medical field. They focused on feature selection techniques, experimenting with several data mining techniques (Decision Tree, Logistic Regression, Logistic Regression SVM, Naive Bayes, and Random Forest) on a variety of heart disease datasets and demonstrating improved accuracy. They report logistic regression as the best tool for predicting heart disease with the selected features, since it achieves the highest precision.
R. U. Khan et al. [24] developed an Intelligent Heart Diseases Diagnosis Algorithm (IHDDA) that reads heart signals (such as ECG graphs) and allows an accurate diagnosis within seconds. The proposed algorithm was evaluated on a supercomputer cluster with 300 patients who had different heart problems. The results show that the IHDDA can detect heart problems in 1.5 seconds with 97% accuracy, giving the highest diagnostic yield with the least physician effort.
S. Ramasamy et al. [25] proposed using an association rule mining algorithm to extract matching features from the hospital knowledge database and a keyword-based clustering algorithm to identify the patient's specific disease [46,47]. The aim is to use data mining techniques to predict potential disease from a patient data set and to determine which model generates the most accurate diagnosis predictions. Their findings indicate that this algorithm can help diagnose heart disease more efficiently and quickly.
Y. Sharma et al. [26] proposed combining two data mining methods for predicting heart disease, K-Means clustering and a Decision Tree, which take very different approaches: the first uses unsupervised learning and the second supervised learning. Their results show that merging K-Means and the Decision Tree into a single hybrid classifier predicts heart disease with higher accuracy than either classifier alone.
C. Beyene et al. [27] proposed using data mining and machine learning algorithms for automated disease detection in healthcare centres to predict the occurrence of heart disease. Support Vector Machine, Decision Tree, Naive Bayes, K-Nearest Neighbour, and Artificial Neural Network are used to assist doctors in making decisions. The findings show that using the J48, Naive Bayes, and Support Vector Machine algorithms to predict the occurrence of heart disease, for early automated diagnosis and rapid retrieval of results, helps deliver high-quality services while reducing costs, ultimately saving lives.
S. K. J. et al. [28] applied two supervised data mining algorithms, the Naive Bayes classifier and the Decision Tree classifier, to a dataset to estimate the likelihood of a patient developing heart disease. Both algorithms were tested on the same dataset to determine which is more accurate. The Decision Tree model correctly predicted heart disease patients 91% of the time, and the Naive Bayes classifier 87% of the time. The developed framework, together with the machine learning classification algorithm, could be used to predict or diagnose other diseases.
III. DATASETS
The heart is a vital organ of the human body. If the heart does not function correctly, other organs such as the kidneys and brain are affected. According to WHO statistics, heart and blood vessel (cardiovascular) diseases cause about one-third of all deaths worldwide.
We used the heart disease dataset from the Kaggle website, a widely used source of public datasets. The dataset includes the following attributes:
1. age
2. sex
3. chest pain type (4 values)
4. resting blood pressure
5. serum cholesterol in mg/dl
6. fasting blood sugar > 120 mg/dl
7. resting electrocardiographic results (values 0,1,2)
8. maximum heart rate achieved
9. exercise-induced angina
10. oldpeak = ST depression induced by exercise relative to rest
11. the slope of the peak exercise ST segment
12. number of major vessels (0-3) colored by fluoroscopy
13. thal: 0 = normal; 1 = fixed defect; 2 = reversible defect
The dataset, which dates back to 1988, combines four databases (Cleveland, Hungary, Switzerland, and Long Beach V). It contains 76 attributes, including the predicted attribute, but all reported studies use only a subset of 14: the 13 attributes listed above plus the predicted attribute. The "goal" field indicates whether or not the patient has heart disease.
IV. METHODOLOGY
In healthcare organisations, data mining is critical for automating systems and improving the working environment. DM helps improve service quality while lowering costs; for example, heart disease prediction helps the doctor identify the disease more accurately [29]. Today, healthcare facilities process such large amounts of data electronically that conventional analysis is impossible [30]. Software can instead be used to analyse vast amounts of data in databases or other information repositories in order to save people's lives [31]. The main goal of the proposed method is to forecast the incidence of heart disease so that an early automated diagnosis can be made and results retrieved in a limited amount of time. This is important for healthcare professionals, who must treat their patients based on accurate decision-making and provide high-quality services. The proposed approach is also valuable in healthcare organizations whose experts lack experience and skills. A key limitation of current methodologies is their inability to provide reliable results when they are needed [32]. To predict the incidence of heart disease, the proposed approach employs data mining techniques and machine learning algorithms, namely Decision Tree, Naive Bayes, and Support Vector Machine. It uses the more important medical attributes, such as age, sex, blood pressure, cholesterol, blood sugar, and heart rate, to determine whether or not a person has heart disease. The Orange data mining software is used to perform the analyses on the dataset collected from the Kaggle website. The prediction algorithms used in this paper are described below.
A. Decision Tree (DT)
A decision tree is a supervised learning classifier that is simple to understand and interpret. It works for both numerical and categorical data sets. Each internal node tests an attribute of the given dataset, and each leaf node gives the expected class, that is, the result [33]. To classify an instance, the rule begins at the root node and, guided by the predictive attributes and the learned rules, works its way down to a leaf node [34]. In this study the algorithm achieved higher accuracy than the other two. This high accuracy is attributed to the tree-shaped format in which the algorithm analyses the dataset, which ensures that every attribute of the dataset is examined [35].
The decision tree model analyses the data in a tree-shaped structure [36], and its decisions are determined by following a tree-shaped diagram [45,48,49]. The model consists of three kinds of nodes:
Root node - the starting node of the tree, on which everything else is built.
Leaf node - holds the final result, the predicted class.
Interior node - tests an attribute and directs the instance to one of its branches.
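The root-to-leaf walk described above can be sketched in Python. This is a hypothetical illustration, not the tree learned in this study: the `Node` class, the `classify` function, the attribute keys (`oldpeak`, `max_heart_rate`), the thresholds, and the 1 = disease / 0 = no disease labels are all assumptions chosen for the example, and binary threshold splits are assumed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    # Root/interior node: tests `attribute` against `threshold`.
    attribute: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None   # branch taken when value <= threshold
    right: Optional["Node"] = None  # branch taken when value > threshold
    # Leaf node: holds the predicted class (hypothetical: 1 = disease, 0 = none).
    label: Optional[int] = None


def classify(node: Node, record: Dict[str, float]) -> int:
    """Start at the root and apply one attribute test per node until a leaf is reached."""
    while node.label is None:
        value = record[node.attribute]
        node = node.left if value <= node.threshold else node.right
    return node.label


# Hypothetical toy tree; the thresholds are illustrative, not learned from data.
tree = Node(
    attribute="oldpeak", threshold=1.5,
    left=Node(attribute="max_heart_rate", threshold=140.0,
              left=Node(label=1), right=Node(label=0)),
    right=Node(label=1),
)

print(classify(tree, {"oldpeak": 0.8, "max_heart_rate": 165.0}))  # 0
print(classify(tree, {"oldpeak": 2.3, "max_heart_rate": 150.0}))  # 1
```

A learned tree would have its attributes and thresholds chosen from training data (for example, by the entropy criterion below); only the traversal is shown here.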
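Entropy of the class:
$$E(P,N) = -\frac{P}{P+N}\log_2\frac{P}{P+N} - \frac{N}{P+N}\log_2\frac{N}{P+N} \qquad (1)$$
P = number of "Yes" instances.
N = number of "No" instances.
To find the entropy of an attribute A whose v values split the data into subsets containing p_i "Yes" and n_i "No" instances (i = 1, ..., v):
$$E(A) = \sum_{i=1}^{v} \frac{p_i+n_i}{P+N}\, E(p_i,n_i) \qquad (2)$$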
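Equations (1) and (2) can be checked numerically with the Python sketch below. The function names are hypothetical, and the `information_gain` helper (class entropy minus attribute entropy, the usual ID3 splitting criterion) is an assumption added for illustration, since the text does not state how the attribute entropies are compared. The counts in the usage lines are made up.

```python
import math
from typing import Sequence, Tuple


def entropy(p: int, n: int) -> float:
    """Eq. (1): entropy of a set with p 'Yes' and n 'No' instances."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # 0 * log2(0) is treated as 0
            frac = count / total
            result -= frac * math.log2(frac)
    return result


def attribute_entropy(splits: Sequence[Tuple[int, int]]) -> float:
    """Eq. (2): size-weighted entropy of the subsets (p_i, n_i) produced by attribute A."""
    P = sum(p for p, _ in splits)
    N = sum(n for _, n in splits)
    return sum((p + n) / (P + N) * entropy(p, n) for p, n in splits)


def information_gain(splits: Sequence[Tuple[int, int]]) -> float:
    """Assumed ID3 criterion: class entropy minus attribute entropy."""
    P = sum(p for p, _ in splits)
    N = sum(n for _, n in splits)
    return entropy(P, N) - attribute_entropy(splits)


# Hypothetical attribute with three values splitting 9 'Yes' / 5 'No' instances.
splits = [(2, 3), (4, 0), (3, 2)]
print(round(entropy(9, 5), 3))             # 0.94
print(round(information_gain(splits), 3))  # 0.247
```

Under this criterion, the attribute with the largest gain (lowest weighted entropy) would be chosen for the split.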
B. Support Vector Machine (SVM)
SVM is a classification method that can handle both linear and nonlinear data sets. SVM classification maximizes the margin of the hyperplane that divides the two groups. It helps predict the incidence of heart disease by constructing a multidimensional hyperplane that separates the groups and maximizing the margin between them to boost classification accuracy [37]. Available kernels include linear (dot product), quadratic, polynomial, Radial Basis Function, and Multilayer Perceptron kernels, among others [38]. SVM can also be trained using a variety of techniques, including quadratic programming, sequential minimal optimization, and least squares. Selecting the kernel and the training method is the difficult part of getting SVM right, since a poor choice can bias the model toward positive or negative predictions [39] [40]. Figure 1 depicts the basic idea of SVM: the data points are labelled as positive or negative, and the aim is to find the hyperplane that separates them with the greatest possible margin [41].
Fig. 1: SVM example.
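The margin-maximisation idea can be illustrated with a minimal linear (dot-product kernel) soft-margin SVM in plain Python, trained by full-batch subgradient descent on the regularised hinge loss. This is a sketch under stated assumptions, not the SVM configuration used in Orange for this study: the function names, the toy points, and the hyperparameters `lam`, `lr`, and `epochs` are hypothetical, and labels are assumed to be +1 (positive) and -1 (negative) as in Fig. 1.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b)))."""
    m, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / m
                grad_b -= yi / m
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    """Which side of the hyperplane w . x + b = 0 the point falls on."""
    return 1 if dot(w, x) + b >= 0 else -1


def margin_width(w: Vector) -> float:
    """Distance between the hyperplanes w . x + b = +1 and w . x + b = -1."""
    return 2 / math.sqrt(dot(w, w))


# Hypothetical, linearly separable toy points (not heart disease data).
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

Shrinking ||w|| through the regulariser widens the margin 2/||w||, while the hinge term pushes points outside it; nonlinear kernels such as RBF replace the dot product and are usually trained with SMO or quadratic programming instead.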
C. Naive Bayes (NB)
Naive Bayes is a classifier based on Bayes' theorem. The naive Bayesian classifier assumes that, given the class, the presence or absence of a particular feature is independent of the presence or absence of any other feature. It is a reliable classifier for heart disease prediction. To classify an instance, Naive Bayes computes the posterior probability of each class from conditional probabilities [42], using the following equation:
$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)} \qquad (1)$$
where X denotes the instance to be classified and C denotes a class value. Under the independence assumption, P(X | C) is the product of the conditional probabilities of the individual attribute values of X given C. The instance is assigned to the class with the highest posterior probability [43][44].
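Eq. (1), combined with the independence assumption, can be sketched as a small categorical Naive Bayes in Python. This is an illustration under assumptions, not Orange's implementation: the function names and toy records are hypothetical, the attribute codes are made up, and the add-one (Laplace) smoothing that avoids zero probabilities is our addition rather than something stated in the paper.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def train_nb(rows: List[Sequence[Hashable]], labels: List[Hashable]):
    """Count class frequencies and per-class frequencies of each attribute value."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (attribute index, class) -> Counter of values
    values = defaultdict(set)      # attribute index -> distinct values seen
    for row, c in zip(rows, labels):
        for k, v in enumerate(row):
            counts[(k, c)][v] += 1
            values[k].add(v)
    return priors, counts, values


def posterior(model, x: Sequence[Hashable]) -> Dict[Hashable, float]:
    """P(C|X) for each class C, with P(X|C) = prod_k P(x_k|C) and add-one smoothing."""
    priors, counts, values = model
    total = sum(priors.values())
    log_scores = {}
    for c, n_c in priors.items():
        score = math.log(n_c / total)  # log P(C)
        for k, v in enumerate(x):
            score += math.log((counts[(k, c)][v] + 1) / (n_c + len(values[k])))
        log_scores[c] = score
    # Dividing by P(X) is the same as normalising the scores to sum to 1.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


# Hypothetical toy records: (exercise-induced angina, chest pain type), label 1 = disease.
rows = [(1, 0), (1, 0), (1, 2), (0, 0), (0, 2), (0, 3)]
labels = [1, 1, 1, 0, 0, 0]
model = train_nb(rows, labels)
post = posterior(model, (1, 0))
print(max(post, key=post.get))  # 1
```

Working with log-probabilities avoids numerical underflow when many attribute probabilities are multiplied together.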
V. EXPERIMENTAL RESULTS AND DISCUSSION
This section evaluates how well the data mining algorithms predict heart disease on our dataset; the classification process is shown in Fig. 2. The dataset used in this research includes several attributes as well as the known output class, which is predicted from the other available attributes.
Fig. 2. The Classification Process of Data Mining Techniques Using Orange (Ver. 3.28.0)
The results of the algorithms are shown in Table 1. Because the dataset includes the known output class along with the other features, it can be used to evaluate the efficiency and accuracy of the data mining techniques. After processing, the predicted outputs are compared with the known class, and performance is measured using training time, AUC, classification accuracy (CA), F1 score, precision, and recall.
Model  Training time (s)  AUC    CA     F1     Precision  Recall
DT     0.093              0.983  0.961  0.960  0.960      0.960
SVM    0.393              0.974  0.934  0.930  0.960      0.902
NB     0.358              0.930  0.864  0.859  0.863      0.856
Table 1: The results of the algorithm
Table 1 shows the results of the Decision Tree, SVM, and Naïve Bayes models, together with the training time of each. AUC (area under the ROC curve) is a performance metric that aggregates over all possible classification thresholds; one way to interpret it is as the probability that the model ranks a random positive example higher than a random negative example. Specificity (true negative rate) measures the proportion of actual negatives that are correctly identified, i.e., the proportion of people who do not have the disease who are correctly identified as not having the disease.
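Both definitions can be made concrete with a short Python sketch. The pairwise count below is the rank-based (Mann-Whitney) form of AUC, which equals the area under the ROC curve; tools such as Orange typically compute it from the curve instead, so this is an illustration rather than the evaluation code used here. The function names, scores, and counts are hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity(tn: int, fp: int) -> float:
    """True negative rate: correctly identified negatives / all actual negatives."""
    return tn / (tn + fp)


# Hypothetical classifier scores for three patients with (1) and three without (0) the disease.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))       # 0.888... (8 of 9 positive/negative pairs ranked correctly)
print(specificity(tn=45, fp=5))  # 0.9
```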
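$$\text{Precision} = \frac{TP}{TP+FP} \qquad (1)$$
TP (true positives) = positive instances that have been correctly classified as positive.
FP (false positives) = negative instances that have been incorrectly classified as positive.
FN (false negatives) = positive instances that have been incorrectly classified as negative.
$$\text{Recall} = \frac{TP}{TP+FN} \qquad (2)$$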
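The two ratios can be computed from predicted and actual labels as in the following Python sketch; the function names and the labels are hypothetical, and class 1 (heart disease) is assumed to be the positive class.

```python
from typing import Sequence, Tuple


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for a binary task where 1 is the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)  # Eq. (1)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)  # Eq. (2)


# Hypothetical labels: 1 = heart disease, 0 = no heart disease.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0]
tp, fp, fn, tn = confusion_counts(actual, predicted)
print(tp, fp, fn, tn)                     # 3 2 1 2
print(precision(tp, fp), recall(tp, fn))  # 0.6 0.75
```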
Precision and recall alone may not settle which algorithm performs better: for example, one mining algorithm may have higher precision but lower recall than another. This problem can be solved using the F-measure, the harmonic mean of precision and recall, calculated as:
$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (3)$$
CA (classification accuracy) is calculated by dividing the number of correct predictions by the total number of instances. If the classifier's accuracy is deemed acceptable, the classifier can be used to classify future data tuples whose class label is unknown.
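A minimal Python version of this ratio follows; the function name and labels are hypothetical. For a binary task it equals (TP + TN) divided by the total number of instances.

```python
from typing import Sequence


def classification_accuracy(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """CA: number of correct predictions divided by the total number of instances."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical labels: 1 = heart disease, 0 = no heart disease.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0]
print(classification_accuracy(actual, predicted))  # 0.625
```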
The results in Table 1 show that DT gives the best prediction of heart disease. It achieved an AUC of 98.3% and a classification accuracy of 96.1%, with F1, precision, and recall all at 96%, and it required less training time (0.093 s) than the other techniques. Accuracy also varies depending on which parameters are chosen and how they are set. The DT was tested with various split percentages and had the quickest training time. SVM gives the next-best results: a training time of 0.393 s, 97.4% AUC, 93.4% CA, 93% F1, 96% precision, and 90.2% recall. NB trained in 0.358 s, faster than SVM but slower than DT, and was the least accurate of the three, with 93% AUC, 86.4% CA, and 85.9%, 86.3%, and 85.6% F1, precision, and recall, respectively. Fig. 3 illustrates the ROC analysis of the three algorithms.
Fig. 3. ROC analysis of Decision Tree, SVM and Naïve Bayes
VI. CONCLUSION
According to World Health Organization (WHO) statistics, heart disease is the most common cause of death in the world, particularly in developing countries. Medical experts do not all have the same level of experience and skill to make accurate decisions, and some make poor decisions that put people at risk. Forecasting the incidence of disease is important for addressing these issues. One advantage of this work is that it can be used to improve existing methodologies for better decision-making by incorporating various algorithms and feature selection methods. In this paper, the Decision Tree, Naive Bayes, and Support Vector Machine algorithms are recommended for predicting the incidence of heart disease with data mining, enabling early automatic diagnosis and fast retrieval of results, which will help improve service quality at lower cost and help doctors save lives. The results show that the Decision Tree gives the highest accuracy with the shortest training time, followed in accuracy by SVM and then Naive Bayes.
REFERENCES
[1] M. A. Sulaiman, “Evaluating Data Mining Classification Methods
Performance in Internet of Things Applications,” J. Soft Comput.
Data Min., vol. 1, no. 2, pp. 11–25, 2020.
[2] J. Thomas and R. T. Princy, “Human heart disease prediction
system using data mining techniques,” in 2016 international
conference on circuit, power and computing technologies
(ICCPCT), 2016, pp. 1–5.
[3] B. T. Chicho, A. M. Abdulazeez, D. Q. Zeebaree, and D. A. Zebari, “Machine Learning Classifiers Based Classification For IRIS Recognition,” Qubahan Academic Journal, vol. 1, no. 2, pp. 106–118, 2021.
[4] R. Alizadehsani et al., “A database for using machine learning and
data mining techniques for coronary artery disease diagnosis,”
Sci. Data, vol. 6, no. 1, pp. 1–13, 2019.
[5] M. M. Islam, C.-C. Wu, T. N. Poly, H.-C. Yang, and Y.-C. (Jack)
Li, “Applications of Machine Learning in Fatty Live Disease
Prediction.,” in MIE, 2018, pp. 166–170.
[6] Y. Khourdifi and M. Bahaj, “Heart disease prediction and
classification using machine learning algorithms optimized by
particle swarm optimization and ant colony optimization,” Int. J.
Intell. Eng. Syst., vol. 12, no. 1, pp. 242–252, 2019.
[7] N. Bhatla and K. Jyoti, “An analysis of heart disease prediction
using different data mining techniques,” Int. J. Eng., vol. 1, no. 8,
pp. 1–4, 2012.
[8] J. Patel, D. TejalUpadhyay, and S. Patel, “Heart disease prediction
using machine learning and data mining technique,” Heart Dis.,
vol. 7, no. 1, pp. 129–137, 2015.
[9] D. Q. Zeebaree, H. Haron, A. M. Abdulazeez, and D. A. Zebari,
“Machine learning and region growing for breast cancer
segmentation,” in 2019 International Conference on Advanced
Science and Engineering (ICOASE), 2019, pp. 88–93.
[10] L. Yahaya, N. D. Oye, and E. J. Garba, “A comprehensive review
on heart disease prediction using data mining and machine
learning techniques,” Am. J. Artif. Intell., vol. 4, no. 1, pp. 20–29,
2020.
[11] J. K. Kim and S. Kang, “Neural network-based coronary heart
disease risk prediction using feature correlation analysis,” J.
Healthc. Eng., vol. 2017, 2017.
[12] D. A. Zebari, D. Q. Zeebaree, A. M. Abdulazeez, H. Haron, and
H. N. A. Hamed, “Improved Threshold Based and Trainable Fully
Automated Segmentation for Breast Cancer Boundary and
Pectoral Muscle in Mammogram Images,” IEEE Access, vol. 8,
pp. 203097–203116, 2020.
[13] S. Nashif, M. R. Raihan, M. R. Islam, and M. H. Imam, “Heart
disease detection by using machine learning algorithms and a real-
time cardiovascular health monitoring system,” World J. Eng.
Technol., vol. 6, no. 4, pp. 854–873, 2018.
[14] A. Kashyap, “Artificial intelligence and medical diagnosis,” Sch.
J. Appl. Med. Sci., pp. 4982–4985, 2018.
[15] S. D. Desai, S. Giraddi, P. Narayankar, N. R. Pudakalakatti, and
S. Sulegaon, “Back-propagation neural network versus logistic
regression in heart disease classification,” in Advanced computing
and communication technologies, Springer, 2019, pp. 133–144.
[16] I. K. A. Enriko, M. Suryanegara, and D. Gunawan, “Heart Disease
Prediction System using k-Nearest Neighbor Algorithm with
Simplified Patient’s Health Parameters,” J. Telecommun.
Electron. Comput. Eng. JTEC, vol. 8, no. 12, pp. 59–65, 2016.
[17] A. M. Abdulazeez, D. Q. Zeebaree, and D. M. Abdulqader,
“Wavelet Applications in Medical Images: A Review,”
Transform. DWT, vol. 21, p. 22, 2020.
[18] N. M. Abdulkareem and A. M. Abdulazeez, “Machine Learning
Classification Based on Radom Forest Algorithm: A Review,”
Int. J. Sci. Bus., vol. 5, no. 2, pp. 128–142, 2021.
[19] M. J. A. Alkhafaji, A. F. Aljuboori, and A. A. Ibrahim, “Clean
medical data and predict heart disease,” in 2020 International
Congress on Human-Computer Interaction, Optimization and
Robotic Applications (HORA), 2020, pp. 1–7.
[20] H. Ahmed, E. M. Younis, A. Hendawi, and A. A. Ali, “Heart disease identification from patients’ social posts, machine learning solution on Spark,” Future Gener. Comput. Syst., vol. 111, pp. 714–722, 2020.
[21] S. Anitha and N. Sridevi, “HEART DISEASE PREDICTION
USING DATA MINING TECHNIQUES,” p. 9, 2019.
[22] M. Tarawneh and O. Embarak, “Hybrid Approach for Heart
Disease Prediction Using Data Mining Techniques,” in Advances
in Internet, Data and Web Technologies, vol. 29, L. Barolli, F.
Xhafa, Z. A. Khan, and H. Odhabi, Eds. Cham: Springer
International Publishing, 2019, pp. 447–454.
[23] S. Bashir, Z. S. Khan, F. Hassan Khan, A. Anjum, and K. Bashir,
“Improving Heart Disease Prediction Using Feature Selection
Approaches,” in 2019 16th International Bhurban Conference on
Applied Sciences and Technology (IBCAST), Islamabad, Pakistan,
Jan. 2019, pp. 619–623, doi: 10.1109/IBCAST.2019.8667106.
[24] R. U. Khan, T. Hussain, H. Quddus, A. Haider, A. Adnan, and Z.
Mehmood, “An Intelligent Real-time Heart Diseases Diagnosis
Algorithm,” in 2019 2nd International Conference on Computing,
Mathematics and Engineering Technologies (iCoMET), Sukkur,
Pakistan, Jan. 2019, pp. 1–6, doi:
10.1109/ICOMET.2019.8673506.
[25] S. Ramasamy and K. Nirmala, “Disease prediction in data mining
using association rule mining and keyword-based clustering
algorithms,” Int. J. Comput. Appl., vol. 42, no. 1, pp. 1–8, Jan.
2020, doi: 10.1080/1206212X.2017.1396415.
[26] Y. Sharma, R. Veliyambara, and R. Shettar, “Hybrid Classifier for
Identification of Heart Disease,” in 2019 4th International
Conference on Computational Systems and Information
Technology for Sustainable Solution (CSITSS), Bengaluru, India,
Dec. 2019, pp. 1–3, doi: 10.1109/CSITSS47250.2019.9031037.
[27] C. Beyene and P. Kamat, “Survey on prediction and analysis the
occurrence of heart disease using data mining techniques,” Int. J.
Pure Appl. Math., vol. 118, pp. 165–173, Jan. 2018.
[28] S. K. J. and G. S., “Prediction of Heart Disease Using Machine
Learning Algorithms.,” in 2019 1st International Conference on
Innovations in Information and Communication Technology
(ICIICT), CHENNAI, India, Apr. 2019, pp. 1–5, doi:
10.1109/ICIICT1.2019.8741465.
[29] A. Taneja, “Heart disease prediction system using data mining
techniques,” Orient. J. Comput. Sci. Technol., vol. 6, no. 4, pp.
457–466, 2013.
[30] V. Chaurasia and S. Pal, “Early prediction of heart diseases using
data mining techniques,” Caribb. J. Sci. Technol., vol. 1, pp. 208–
217, 2013.
[31] M. Saqlain, W. Hussain, N. A. Saqib, and M. A. Khan,
“Identification of heart failure by using unstructured data of
cardiac patients,” in 2016 45th International Conference on
Parallel Processing Workshops (ICPPW), 2016, pp. 426–431.
[32] D. Maulud and A. M. Abdulazeez, “A Review on Linear
Regression Comprehensive in Machine Learning,” J. Appl. Sci.
Technol. Trends, vol. 1, no. 4, pp. 140–147, 2020.
[33] D. A. Zebari, A. M. Abdulazeez, D. Q. Zeebaree, and M. S. Salih, “A Fusion Scheme of Texture Features for COVID-19 Detection of CT Scan Images,” in 2020 International Conference on Advanced Science and Engineering (ICOASE), Dec. 2020, pp. 1–6.
[34] M. Sultana, A. Haider, and M. S. Uddin, “Analysis of data mining
techniques for heart disease prediction,” in 2016 3rd international
conference on electrical engineering and information
communication technology (ICEEICT), 2016, pp. 1–5.
[35] A. Aldallal and A. A. A. Al-Moosa, “Using Data Mining
Techniques to Predict Diabetes and Heart Diseases,” in 2018 4th
International Conference on Frontiers of Signal Processing
(ICFSP), 2018, pp. 150–154.
[36] J. Soni, U. Ansari, D. Sharma, and S. Soni, “Predictive data
mining for medical diagnosis: An overview of heart disease
prediction,” Int. J. Comput. Appl., vol. 17, no. 8, pp. 43–48, 2011.
[37] C. Raju, E. Philipsy, S. Chacko, L. P. Suresh, and S. D. Rajan, “A
survey on predicting heart disease using data mining techniques,”
in 2018 conference on emerging devices and smart systems
(ICEDSS), 2018, pp. 253–255.
[38] H. D. Masethe and M. A. Masethe, “Prediction of heart disease
using classification algorithms,” in Proceedings of the world
Congress on Engineering and computer Science, 2014, vol. 2, pp.
22–24.
[39] M. T, D. Mukherji, N. Padalia, and A. Naidu, “A Heart Disease Prediction Model using SVM-Decision Trees-Logistic Regression (SDL).”
[40] D. Q. Zeebaree, A. M. Abdulazeez, D. A. Zebari, H. Haron, and
H. N. A. Hamed, “Multi-Level Fusion in Ultrasound for Cancer
Detection Based on Uniform LBP Features.”
[41] V. Mohan, “Liver Disease Prediction using SVM and Naïve
Bayes Algorithms,” Apr. 2015.
[42] M. S. B. Sinal, “Quick identification of Arrhythmia Symptoms
using Empirical Approach in Long Sequence of Heart Cycles.”
[43] M. Gandhi and S. N. Singh, “Predictions in heart disease using techniques of data mining,” in 2015 International Conference on Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE), 2015, pp. 520–525.
[44] D. Q. Zeebaree, H. Haron, A. M. Abdulazeez, and D. A. Zebari, “Trainable model based on new uniform LBP feature to identify the risk of the breast cancer,” in 2019 International Conference on Advanced Science and Engineering (ICOASE), Apr. 2019, pp. 106–111.
[45] F. Y. Ahmed, K. L. T. Aik, A. S. Radzi, and M. D. Salleh, “Develop Attendance Management System with Feedback and Complaint Management Function,” in 2019 IEEE 7th Conference on Systems, Process and Control (ICSPC), 2019, pp. 248–252.
[46] D. Nugraha and F. Y. Ahmed, “MEAN stack to enhance the advancement of parking application: A narrative review,” in Journal of Physics: Conference Series, vol. 1167, no. 1, p. 012075, 2019.
[47] O. A. Mahmood, A. S. Yousif, and F. Y. A. Shamsuddin, “A new approach to solving Transportation Model Based on the Standard Deviation,” in 2020 IEEE 10th Symposium on Computer Applications & Industrial Electronics (ISCAIE), 2020, pp. 1–5.
[48] M. H. Alkawaz, A. A. M. Zelani, H. Razalli, and S. N. Saud, “A Digital Eye Navigators for the Visually Impaired,” in 2019 IEEE 9th International Conference on System Engineering and Technology (ICSET), 2019, pp. 477–481.
[49] M. H. Alkawaz, H. Rajandran, and M. I. Abdullah, “The Impact of Current Relation between Facebook Utilization and E-Stalking towards Users Privacy,” in 2020 IEEE International Conference on Automatic Control and Intelligent Systems (I2CACIS), 2020, pp. 141–147.