A Comparison of Three Data Mining Algorithms for Heart Disease Prediction

July 2021

DOI:10.1109/ISIEA51897.2021.9509985

Conference: 2021 IEEE Symposium on Industrial Electronics & Applications (ISIEA)

Authors:

Noor Salah

Duhok Polytechnic University

Adnan Mohsin Abdulazeez

Duhok Polytechnic University

Jwan Najeeb Saeed

Duhok Polytechnic University

Diyar Zeebaree

Duhok Polytechnic University

Abstract— Heart disease is one of the most common causes of

death worldwide. This paper discusses real-time methods for predicting heart disease from medical data sources that describe a patient's current health status. The main aim of the proposed system is to identify the data mining algorithm that predicts heart disease with the highest accuracy. We compare three algorithms: Decision Tree (DT), Support Vector Machine (SVM) and Naïve Bayes (NB). All three are supervised learning algorithms and therefore learn from labelled training data. The results show that the DT algorithm provides the best accuracy with the shortest training time when compared to SVM and NB.

Keywords— Data Mining, Heart Disease Prediction, Support

Vector Machine, Decision Trees, Naïve Bayes.

I. INTRODUCTION

Data mining (DM) is a computer-based process that is crucial for extracting valuable and useful knowledge from huge databases [1]. Because large collections of facts contain a great deal of non-intuitive information, data extraction is both interesting and useful for exploring and analysing big data [2]. DM is useful in the clinical sector because many hidden patterns can be derived from huge medical data sets [3]. Moreover, disease prediction is a significant part of this extraction process, and data mining techniques used in health care are very useful for answering a series of questions about heart disease prediction [4] [5]. The rising occurrence of heart disease has become a worldwide concern, and the healthcare sector must reshape and intensify the way these diseases are treated in order to reduce their social effects [6]. Healthcare produces a lot of data, especially on heart disease, that needs to be analysed quickly in order to make faster decisions about the patient [7]. According to clinical records and hospital management data, the volume of medical data doubles every three years, which produces very large data sets. DM techniques are therefore extremely important in medical data analysis and information extraction [8] [9]. The rising rates of morbidity and mortality from heart disease around the world have prompted researchers to perform many studies in an attempt to reduce the numbers [10] [11] [12]. Data mining methods have been widely used in the development of clinical decision support systems for heart disease prediction [13]. DM tools are used to improve patient-related policy-making and prevent hospital errors, and to support early detection, disease prevention, and the reduction of avoidable hospital deaths [14]. Consequently, there is a need for a highly accurate approach that can be used as an analysis tool to uncover hidden heart disease patterns in medical data and predict heart disease before it occurs [15]. Such an approach would improve the control of heart disease by using data mining algorithms that help to predict it at an early stage [16]. The data mining algorithms most commonly used in heart disease prediction are SVM, Decision Tree and NB, which are the algorithms used in this study [17] [18]. These classification algorithms find a model that describes and distinguishes classes or concepts; classification is one of the predictive data mining tasks, and the aim is to predict with high accuracy. The paper is organized as follows: Section 2 reviews the literature, Section 3 describes the dataset, Section 4 presents the method, Section 5 reports the results, and Section 6 concludes the paper.

II. RELATED WORK

M. J. A. Alkhafaji et al. [19] suggested using three techniques to collect data with acceptable precision before the information is accessed, so that the appropriate decision can be made for the patient. To predict heart disease, the requirements of the classification review must be met. The findings show that, in terms of efficiency, prediction accuracy, and diagnosis, the decision tree technique outperforms the Bayesian classification technique and the neural network technique (98.85%, 98.16%, and 91.31%, respectively).

H. Ahmed et al. [20] presented a method based on Apache Spark and Apache Kafka that predicts heart disease in real time. In this component, they evaluate the features in the dataset and choose the best set of features using two feature selection algorithms, univariate feature selection and Relief. In addition, a number of machine learning classification algorithms, including SVM, DT, RF, and LR, were used to classify both the entire collection of features and the selected features. The results show that the random forest classifier outperforms the other models with an accuracy of 94.9%.

S. Anitha et al. [21] compared three supervised machine learning algorithms, K-Nearest Neighbour, Naïve Bayes, and SVM, on a heart disease dataset to determine whether or not a patient has heart disease, which helps in the diagnosis. According to the results of the experiments, the Naïve Bayes algorithm correctly predicts the disease in 86.6% of cases.

Author affiliations:
Jwan Najeeb Saeed, Department of Information Technology, Duhok Polytechnic University, Duhok, Kurdistan Region, Iraq, Jwan.najeeb@dpu.edu.krd
Adnan Mohsin Abdulazeez, Research Centre of Duhok Polytechnic University, Duhok, Kurdistan Region, Iraq, adnan.mohsin@dpu.edu.krd
Noor Salah Hassan, Technical College of Informatics-Akre, Duhok Polytechnic University, Duhok, Kurdistan Region, Iraq, noor.salah.hassan6@gmail.com
Diyar Qader Zeebaree, Research Center of Duhok Polytechnic University, Duhok, Kurdistan Region, Iraq, dqszeebaree@dpu.edu.krd
Falah Y.H. Ahmed, Faculty of Information Science and Engineering (FISE), Management and Science University, Shah Alam, Selangor, Malaysia, falah_ahmed@msu.edu.my
Adel Al-zebari, Technical College of Informatics-Akre, Duhok Polytechnic University, Duhok, Kurdistan Region, Iraq, adel.ali@dpu.edu.krd

Authorized licensed use limited to: UNIVERSITY TEKNOLOGI MALAYSIA. Downloaded on August 13,2021 at 20:15:08 UTC from IEEE Xplore. Restrictions apply.

M. Tarawneh et al. [22] proposed a hybrid system, a new method for predicting heart disease that combines all of the methods into a single algorithm. The findings indicate that a composite model of all approaches can be used to make an effective diagnosis. An accuracy of 89.2% was reached using data mining algorithms such as Naïve Bayes, SVM, K-Nearest Neighbour, Neural Network, J4.8, and Random Forest. The Naïve Bayes and SVM techniques had the highest accuracy on the entire data collection.

S. Bashir et al. [23] presented the use of data science to predict heart disease in the medical field. They focused on feature selection techniques and demonstrated improved accuracy by applying data mining techniques (Decision Tree, Logistic Regression, Logistic Regression SVM, Naïve Bayes, and Random Forest) to a variety of heart disease datasets. Logistic regression was found to be the best tool for predicting heart disease, since it achieved the highest level of precision.

R. U. Khan et al. [24] proposed an Intelligent Heart Diseases Diagnosis Algorithm (IHDDA) that reads heart signals (such as ECG graphs) and allows an accurate diagnosis in a very short time. The proposed algorithm was evaluated on a supercomputer cluster with 300 patients with different heart problems. The results show that the IHDDA can detect heart problems in 1.5 seconds with 97% accuracy, allowing for the highest diagnostic yield with the least amount of physician effort.

S. Ramasamy et al. [25] proposed using an association rule mining algorithm to extract matched features from the hospital knowledge database and a keyword-based clustering algorithm to identify the patient's specific disease [46,47]. The aim is to use data mining techniques to predict potential disease from a patient data set and to determine which model generates the most accurate diagnosis predictions. Their findings indicate that this approach can help diagnose heart disease more efficiently and quickly.

Y. Sharma et al. [26] suggested combining two data mining methods, K-Means clustering and the Decision Tree algorithm. These two methods take very different approaches to predicting heart disease, one using unsupervised learning and the other supervised learning. The results show that merging the K-Means and Decision Tree algorithms into a single hybrid classifier performs better than either individual classifier and predicts heart disease with high accuracy.

C. Beyene et al. [27] proposed using data mining and machine learning algorithms for automated disease detection in healthcare centres to predict the occurrence of heart disease. Support Vector Machine, Decision Tree, Naïve Bayes, K-Nearest Neighbour, and Artificial Neural Network are used to assist doctors in making decisions. The findings show that using the J48, Naïve Bayes, and Support Vector Machine algorithms to predict the occurrence of heart disease, for early automated diagnosis and rapid retrieval of results, aids in delivering high-quality services while reducing costs, ultimately saving lives.

S. K. J. et al. [28] proposed using two supervised data mining algorithms to estimate the likelihood of a patient developing heart disease. The Naïve Bayes and Decision Tree classification models were evaluated on the same dataset in order to determine which is the most accurate. The Decision Tree model correctly predicted heart disease patients 91% of the time, and the Naïve Bayes classifier 87% of the time. The developed framework, together with the machine learning classification algorithm, could be used to predict or diagnose other diseases.

III. DATASETS

The heart is a vital organ of the human body. If the heart does not function correctly, other organs such as the kidneys and the brain are affected. According to WHO statistics, heart disease accounts for about one-third of deaths worldwide.

We used the heart disease dataset from the Kaggle website, which hosts a large number of different datasets and is a common source for them. The heart disease dataset has the following attributes:

1. age
2. sex
3. chest pain type (4 values)
4. resting blood pressure
5. serum cholesterol in mg/dl
6. fasting blood sugar > 120 mg/dl
7. resting electrocardiographic results (values 0,1,2)
8. maximum heart rate achieved
9. exercise-induced angina
10. oldpeak = ST depression induced by exercise relative to rest
11. the slope of the peak exercise ST segment
12. number of major vessels (0-3) colored by fluoroscopy
13. thal: 0 = normal; 1 = fixed defect; 2 = reversible defect

The dataset combines four databases (Cleveland, Hungary, Switzerland, and Long Beach V) and dates back to 1988. It has 76 attributes, including the predicted attribute, but all reported studies use only a subset of 14: the 13 attributes listed above plus the predicted attribute. The "goal" field indicates whether or not the patient has heart disease.
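As an illustration of how these attributes could be separated from the "goal" field before classification, the following is a minimal sketch. The short column names (`cp`, `trestbps`, `thalach`, and so on) and the label name `target` are assumptions based on the commonly distributed CSV version of this dataset; they are not stated in this paper, and the example record is made up.

```python
# Assumed short column names for the 13 attributes (not given in the paper).
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]
TARGET = "target"  # assumed name of the "goal" field


def split_features_target(rows):
    """Split a list of row dicts into feature vectors X and class labels y."""
    X, y = [], []
    for row in rows:
        missing = [name for name in FEATURES + [TARGET] if name not in row]
        if missing:
            raise KeyError(f"row is missing columns: {missing}")
        X.append([float(row[name]) for name in FEATURES])
        y.append(int(row[TARGET]))
    return X, y


# Hypothetical record: the values are illustrative, not taken from the dataset.
example = {"age": 52, "sex": 1, "cp": 0, "trestbps": 125, "chol": 212,
           "fbs": 0, "restecg": 1, "thalach": 168, "exang": 0,
           "oldpeak": 1.0, "slope": 2, "ca": 2, "thal": 2, "target": 0}
X, y = split_features_target([example])
```

Keeping the label in a separate list mirrors the way the output class is held out and predicted from the remaining attributes.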

IV. METHODOLOGY

In healthcare organisations, data mining is critical for automating systems and improving the working environment. DM helps to enhance service quality while lowering costs; for example, heart disease prediction helps the doctor to identify the disease more accurately [29]. Today, a large amount of data is processed electronically in healthcare facilities, which makes conventional analysis impractical [30]. Software can, however, be used to analyse the vast amounts of data held in databases or other information repositories in order to save people's lives [31]. The main goal of the proposed method is to predict the incidence of heart disease so that an early automated diagnosis can be made and results retrieved in a limited amount of time. This allows healthcare professionals to treat their patients on the basis of accurate decision-making and to provide high-quality services. The proposed approach is also valuable in healthcare organizations whose experts lack experience and skills. One of the key limitations of current methodology is its inability to provide reliable results when they are needed [32]. To predict the incidence of heart disease, this study employs data mining techniques and machine learning algorithms, namely Decision Tree, Naïve Bayes, and Support Vector Machine. It uses a number of the more important medical attributes, such as age, sex, blood pressure, cholesterol, blood sugar, and heart rate, to determine whether or not a person has heart disease. The Orange program is used to analyse the dataset, which was collected from the Kaggle website. The following prediction algorithms are used in this paper:

A. Decision Tree (DT)

A decision tree is a supervised learning classifier that is simple to understand and interpret. It works for numerical as well as categorical data sets. A DT performs tests on a set of attributes: each internal node represents a test on an attribute value of the given dataset, and each leaf node gives the predicted class, i.e. the result [33]. Classification begins at the root node and, following the predictive attributes and the given rules, works its way through to the leaf nodes [34]. The high accuracy of this algorithm is attributed to the fact that it analyses the dataset in a tree-shaped format in which every attribute of the dataset is examined [35].

This model gives higher accuracy than the SVM and Naïve Bayes algorithms. The data are analysed in a tree-shaped structure [36], and the decisions are determined by a tree-shaped diagram [45,48,49]. The decision tree model consists of three types of node:

• Root node - this is the most important node; everything else is built around it.
• Leaf node - the final result is carried by a leaf node.
• Interior node - the status of the dependent variables is handled by this node.

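Entropy of the class:

$$\mathrm{Entropy}(\mathrm{Class}) = I(P, N) = -\frac{P}{P+N}\log_2\frac{P}{P+N} - \frac{N}{P+N}\log_2\frac{N}{P+N} \quad (1)$$

P = number of "Yes" instances.
N = number of "No" instances.

To find the entropy of an attribute whose values split the data into subsets $i$, each containing $P_i$ "Yes" and $N_i$ "No" instances:

$$\mathrm{Entropy}(\mathrm{Attribute}) = \sum_{i}\frac{P_i+N_i}{P+N}\, I(P_i, N_i) \quad (2)$$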
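The two entropy quantities can be illustrated with a short sketch. It assumes that P and N are instance counts and that an attribute is summarised by a list of per-value (P_i, N_i) pairs. The information gain function is the usual companion quantity in decision tree induction and is added here as an assumption; the paper does not define it. The counts are illustrative only and do not come from the heart disease dataset.

```python
import math


def class_entropy(p, n):
    """Entropy I(P, N) in bits of a set with p 'yes' and n 'no' instances."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count > 0:  # the term 0 * log2(0) is taken as 0
            frac = count / total
            result -= frac * math.log2(frac)
    return result


def attribute_entropy(subsets):
    """Weighted entropy after a split; subsets is a list of (p_i, n_i) pairs."""
    total = sum(p + n for p, n in subsets)
    return sum((p + n) / total * class_entropy(p, n) for p, n in subsets)


def information_gain(p, n, subsets):
    """Reduction in entropy obtained by splitting on the attribute."""
    return class_entropy(p, n) - attribute_entropy(subsets)


# Illustrative counts: 9 yes / 5 no overall, split into three subsets.
gain = information_gain(9, 5, [(2, 3), (4, 0), (3, 2)])
```

In this kind of induction, the attribute with the largest gain would typically be chosen as the test at the current node.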

B. Support Vector Machine (SVM)

SVM is a classification method that can handle both linear and nonlinear data sets. In SVM classification, the margin of the hyperplane that separates the two groups is maximized. It helps predict the incidence of heart disease by constructing a multidimensional hyperplane that separates the groups and maximizes the margin between them to improve classification accuracy [37]. Available kernels include linear (dot product), quadratic, polynomial, Radial Basis Function, and Multilayer Perceptron kernels, among others [38]. SVM can also be trained using a variety of techniques, including quadratic programming, sequential minimal optimization, and least squares. Choosing the kernel and the training method is a difficult aspect of SVM, and both must be chosen carefully so that the model is neither overly optimistic nor overly pessimistic [39] [40]. Figure 1 depicts the basic idea of SVM: the data points are labelled as positive or negative, and the aim is to find the hyperplane that separates them by the greatest possible distance [41].

Fig. 1: SVM example.
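To make the kernel and margin ideas concrete, the sketch below implements three of the kernels named above, together with the decision rule and margin width of a linear SVM. The parameter names and defaults (`degree`, `coef0`, `gamma`) are illustrative assumptions, not settings reported in this paper, and the weight vector is assumed to come from an already trained model.

```python
import math


def linear_kernel(x, z):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (x.z + coef0) ** degree; degree 2 is the quadratic case."""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """Radial Basis Function kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def linear_decision(w, b, x):
    """Predicted side of the hyperplane w.x + b = 0 (+1 or -1)."""
    return 1 if linear_kernel(w, x) + b >= 0 else -1


def margin_width(w):
    """Distance between the margin hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(linear_kernel(w, w))
```

Because the margin width is 2/||w||, maximizing the margin amounts to minimizing the norm of the weight vector.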

C. Naive Bayes (NB)

Naïve Bayes is a classification method based on Bayes' theorem. The naïve Bayes classifier assumes that the occurrence of a specific feature of a class is independent of the presence or absence of the other features. It is a reliable classifier for heart disease prediction. To classify data sets, Naïve Bayes computes the posterior probability of each class, which depends on conditional probabilities [42]. The NB equation is as follows:

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)} \quad (3)$$

where X denotes the instance to be predicted and C denotes the class value of the instance. This equation is used to determine the class to which an instance should be assigned [43][44].
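A minimal sketch of this posterior computation for categorical features follows. It assumes the priors P(C) and the per-feature likelihoods are already known. The class names, feature names, and probabilities are hypothetical and were not estimated from the heart disease dataset.

```python
def naive_bayes_posterior(priors, likelihoods, instance):
    """Return P(C | X) for every class C.

    priors:      {class: P(C)}
    likelihoods: {class: {feature: {value: P(value | C)}}}
    instance:    {feature: value}
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature, value in instance.items():
            # Independence assumption: P(X | C) is a product over features.
            score *= likelihoods[cls][feature][value]
        scores[cls] = score
    evidence = sum(scores.values())  # P(X), the normalising constant
    return {cls: score / evidence for cls, score in scores.items()}


# Hypothetical probabilities, for illustration only.
priors = {"disease": 0.5, "healthy": 0.5}
likelihoods = {
    "disease": {"exang": {1: 0.6, 0: 0.4}, "fbs": {1: 0.2, 0: 0.8}},
    "healthy": {"exang": {1: 0.2, 0: 0.8}, "fbs": {1: 0.1, 0: 0.9}},
}
posterior = naive_bayes_posterior(priors, likelihoods, {"exang": 1, "fbs": 0})
predicted = max(posterior, key=posterior.get)
```

The predicted class is the one with the largest posterior. Since P(X) is the same for every class, it affects the probabilities but not the choice.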

V. EXPERIMENTAL RESULTS AND DISCUSSION

This study examines the performance of data mining algorithms for heart disease prediction on our dataset, using the classification workflow shown in Fig. 2. The data set used in this research includes several attributes as well as the known output class. The output class is the one that is predicted from the other available attributes.

Fig. 2. The Classification Process of Data Mining Techniques Using (Orange Ver 3.28.0)

The results of the algorithms are shown in Table 1. The output class is included in the dataset, along with the other features, so that the efficiency and accuracy of the data mining techniques can be evaluated. After processing, the predicted outputs are compared with the known class, and performance is assessed using training time, AUC, CA, F1 score, precision, and recall.

Model   Training Time   AUC    CA     F1     Precision   Recall
DT      0.093s          0.98   0.96   …      0.960       0.960
SVM     0.393s          0.97   0.93   …      0.960       0.902
NB      0.358s          0.93   0.86   0.85   0.863       0.856

Table 1: The results of the algorithms

Table 1 shows the results of the Decision Tree, SVM, and Naïve Bayes models together with the training time of each model. AUC (Area Under the ROC Curve) is a performance metric that aggregates over all possible classification thresholds. One way to interpret AUC is as the probability that the model ranks a random positive example higher than a random negative example. Specificity (the True Negative Rate) measures the proportion of negatives that are correctly identified, i.e. the proportion of those who do not have the disease (unaffected) who are correctly identified as not having it.
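The ranking interpretation of AUC can be sketched directly by counting how often a positive example receives a higher score than a negative one. This pairwise formulation is an illustration of the interpretation, not the procedure used by Orange. The convention of counting ties as one half is an assumption, and the scores and labels in the usage lines are made up.

```python
def auc_by_ranking(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: predicted scores, higher meaning more likely positive
    labels: 1 for positive (disease), 0 for negative (no disease)
    Tied pairs count as half a correct ranking.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def specificity(tn, fp):
    """True negative rate: share of actual negatives identified as negative."""
    return tn / (tn + fp)


# Made-up scores: one positive (0.4) is ranked below one negative (0.6).
auc = auc_by_ranking([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

The double loop takes time proportional to the number of positive-negative pairs, which is acceptable for a data set of this size.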

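$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (4)$$

TP = the number of positive instances that have been correctly classified as positive.
FP = the number of negative instances that have been incorrectly classified as positive.

$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (5)$$

FN = the number of positive instances that have been incorrectly classified as negative.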

Precision and recall alone are sometimes not enough to compare classifiers; for example, the question of which mining algorithm is better arises when one has higher precision but lower recall than another. This problem can be solved using the F-measure, which is the harmonic mean of precision and recall. The F-measure can be calculated with the following formula:

$$F\text{-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (6)$$

CA (classification accuracy) is calculated by dividing the number of correct predictions by the total number of instances. If the classifier's accuracy is deemed acceptable, the classifier can be used to classify future data tuples for which the class label is unknown.
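The measures defined in this section can be computed together from the four confusion-matrix counts, as in the sketch below. The counts in the usage line are illustrative and are not the counts behind Table 1. The function assumes that each denominator is non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F-measure and CA from confusion-matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0 (and therefore precision + recall > 0).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "ca": accuracy,
    }


# Illustrative counts only.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
```

With these counts, precision (0.90) is higher than recall (about 0.82), and the F-measure (about 0.86) falls between the two, which is why it is useful when the two measures disagree.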

The results in Table 1 show that DT gives the best result for the prediction of heart disease. It achieved an accuracy of 98.3%, an AUC of 96.1%, an F1-measure of 96%, and precision and recall of 96% and 90% respectively, with a shorter training time (0.092s) than the other techniques. Accuracy varies depending on which parameters are chosen. The DT was checked with various split percentages and obtained the quickest training time. SVM gives the next-best results after DT: from Table 1, it has a training time of 0.393s, with 97.4% AUC, 93.4% CA, 96% precision and 90% recall. NB has a training time of 0.358s and the lowest scores of the three, with 93% AUC, 86.4% CA, and 85.9%, 86.3%, and 85.6% for F1, precision and recall respectively. Fig. 3 illustrates the ROC analysis of the three algorithms.

Fig. 3. ROC analysis of Decision Tree, SVM and Naïve Bayes
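The evaluation with different split percentages and the measurement of training time could be organised along the following lines. This is a generic sketch and not the Orange workflow used in the study. The helper names `split_by_fraction` and `timed_fit`, and the fixed random seed, are hypothetical, and `fit` stands for any training function.

```python
import random
import time


def split_by_fraction(rows, train_fraction, seed=0):
    """Shuffle rows reproducibly and split them into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def timed_fit(fit, train_rows):
    """Run a training function and return (model, elapsed seconds)."""
    start = time.perf_counter()
    model = fit(train_rows)
    return model, time.perf_counter() - start


# Placeholder data and a placeholder "training" function (len) for illustration.
train, test = split_by_fraction(range(10), 0.7)
model, elapsed = timed_fit(len, train)
```

Repeating the split for several values of `train_fraction` and recording both the scores and `elapsed` for each model would give a comparison of the kind summarised in Table 1.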

VI. CONCLUSION

According to World Health Organization (WHO) statistics, heart disease is the most common cause of death in the world, particularly in developing countries. Medical experts do not all have the same level of experience and skill to make an accurate decision, and some experts make poor decisions that put people at risk. Predicting the incidence of disease is important for addressing these issues. One of the advantages of this paper is that it can be used to develop existing methodologies for better decision-making by incorporating various algorithms and feature selection methods. In this paper, the Decision Tree, Naïve Bayes, and Support Vector Machine algorithms are recommended for use in data mining to predict the incidence of heart disease for early automatic diagnosis and fast retrieval of results, which will help to improve service quality at lower cost and help the doctor to save people's lives. The results show that Decision Tree gives the highest accuracy with the shortest training time, followed in accuracy by SVM and then Naïve Bayes.

REFERENCES

[1] M. A. Sulaiman, “Evaluating Data Mining Classification Methods

Performance in Internet of Things Applications,” J. Soft Comput.

Data Min., vol. 1, no. 2, pp. 11–25, 2020.

[2] J. Thomas and R. T. Princy, “Human heart disease prediction

system using data mining techniques,” in 2016 international

conference on circuit, power and computing technologies

(ICCPCT), 2016, pp. 1–5.

[3] B. T. Chicho, A. M. Abdulazeez, D. Q. Zeebaree, and D. A. Zebari, “Machine Learning Classifiers Based Classification For IRIS Recognition,” Qubahan Academic Journal, vol. 1, no. 2, pp. 106–118, 2021.

[4] R. Alizadehsani et al., “A database for using machine learning and

data mining techniques for coronary artery disease diagnosis,”

Sci. Data, vol. 6, no. 1, pp. 1–13, 2019.

[5] M. M. Islam, C.-C. Wu, T. N. Poly, H.-C. Yang, and Y.-C. (Jack) Li, “Applications of Machine Learning in Fatty Liver Disease Prediction,” in MIE, 2018, pp. 166–170.

[6] Y. Khourdifi and M. Bahaj, “Heart disease prediction and

classification using machine learning algorithms optimized by

particle swarm optimization and ant colony optimization,” Int. J.

Intell. Eng. Syst., vol. 12, no. 1, pp. 242–252, 2019.

[7] N. Bhatla and K. Jyoti, “An analysis of heart disease prediction

using different data mining techniques,” Int. J. Eng., vol. 1, no. 8,

pp. 1–4, 2012.

[8] J. Patel, D. TejalUpadhyay, and S. Patel, “Heart disease prediction

using machine learning and data mining technique,” Heart Dis.,

vol. 7, no. 1, pp. 129–137, 2015.

[9] D. Q. Zeebaree, H. Haron, A. M. Abdulazeez, and D. A. Zebari, “Machine learning and region growing for breast cancer segmentation,” in 2019 International Conference on Advanced Science and Engineering (ICOASE), 2019, pp. 88–93.

[10] L. Yahaya, N. D. Oye, and E. J. Garba, “A comprehensive review

on heart disease prediction using data mining and machine

learning techniques,” Am. J. Artif. Intell., vol. 4, no. 1, pp. 20–29,

2020.

[11] J. K. Kim and S. Kang, “Neural network-based coronary heart

disease risk prediction using feature correlation analysis,” J.

Healthc. Eng., vol. 2017, 2017.

[12] D. A. Zebari, D. Q. Zeebaree, A. M. Abdulazeez, H. Haron, and

H. N. A. Hamed, “Improved Threshold Based and Trainable Fully

Automated Segmentation for Breast Cancer Boundary and

Pectoral Muscle in Mammogram Images,” IEEE Access, vol. 8,

pp. 203097–203116, 2020.

[13] S. Nashif, M. R. Raihan, M. R. Islam, and M. H. Imam, “Heart

disease detection by using machine learning algorithms and a real-

time cardiovascular health monitoring system,” World J. Eng.

Technol., vol. 6, no. 4, pp. 854–873, 2018.

[14] A. Kashyap, “Artificial intelligence and medical diagnosis,” Sch.

J. Appl. Med. Sci., pp. 4982–4985, 2018.

[15] S. D. Desai, S. Giraddi, P. Narayankar, N. R. Pudakalakatti, and

S. Sulegaon, “Back-propagation neural network versus logistic

regression in heart disease classification,” in Advanced computing

and communication technologies, Springer, 2019, pp. 133–144.

[16] I. K. A. Enriko, M. Suryanegara, and D. Gunawan, “Heart Disease

Prediction System using k-Nearest Neighbor Algorithm with

Simplified Patient’s Health Parameters,” J. Telecommun.

Electron. Comput. Eng. JTEC, vol. 8, no. 12, pp. 59–65, 2016.

[17] A. M. Abdulazeez, D. Q. Zeebaree, and D. M. Abdulqader,

“Wavelet Applications in Medical Images: A Review,”

Transform. DWT, vol. 21, p. 22, 2020.

[18] N. M. Abdulkareem and A. M. Abdulazeez, “Machine Learning

Classification Based on Radom Forest Algorithm: A Review,”

Int. J. Sci. Bus., vol. 5, no. 2, pp. 128–142, 2021.

[19] M. J. A. Alkhafaji, A. F. Aljuboori, and A. A. Ibrahim, “Clean

medical data and predict heart disease,” in 2020 International

Congress on Human-Computer Interaction, Optimization and

Robotic Applications (HORA), 2020, pp. 1–7.

[20] H. Ahmed, E. M. Younis, A. Hendawi, and A. A. Ali, “Heart

disease identification from patients’ social posts, machine

learning solution on Spark,” Future Gener. Comput. Syst., vol.

111, pp. 714–722, 2020.

[21] S. Anitha and N. Sridevi, “HEART DISEASE PREDICTION

USING DATA MINING TECHNIQUES,” p. 9, 2019.

[22] M. Tarawneh and O. Embarak, “Hybrid Approach for Heart

Disease Prediction Using Data Mining Techniques,” in Advances

in Internet, Data and Web Technologies, vol. 29, L. Barolli, F.

Xhafa, Z. A. Khan, and H. Odhabi, Eds. Cham: Springer

International Publishing, 2019, pp. 447–454.

[23] S. Bashir, Z. S. Khan, F. Hassan Khan, A. Anjum, and K. Bashir,

“Improving Heart Disease Prediction Using Feature Selection

Approaches,” in 2019 16th International Bhurban Conference on

Applied Sciences and Technology (IBCAST), Islamabad, Pakistan,

Jan. 2019, pp. 619–623, doi: 10.1109/IBCAST.2019.8667106.

[24] R. U. Khan, T. Hussain, H. Quddus, A. Haider, A. Adnan, and Z.

Mehmood, “An Intelligent Real-time Heart Diseases Diagnosis

Algorithm,” in 2019 2nd International Conference on Computing,

Mathematics and Engineering Technologies (iCoMET), Sukkur,

Pakistan, Jan. 2019, pp. 1–6, doi:

10.1109/ICOMET.2019.8673506.

[25] S. Ramasamy and K. Nirmala, “Disease prediction in data mining

using association rule mining and keyword-based clustering

algorithms,” Int. J. Comput. Appl., vol. 42, no. 1, pp. 1–8, Jan.

2020, doi: 10.1080/1206212X.2017.1396415.

[26] Y. Sharma, R. Veliyambara, and R. Shettar, “Hybrid Classifier for

Identification of Heart Disease,” in 2019 4th International

Conference on Computational Systems and Information

Technology for Sustainable Solution (CSITSS), Bengaluru, India,

Dec. 2019, pp. 1–3, doi: 10.1109/CSITSS47250.2019.9031037.

[27] C. Beyene and P. Kamat, “Survey on prediction and analysis the

occurrence of heart disease using data mining techniques,” Int. J.

Pure Appl. Math., vol. 118, pp. 165–173, Jan. 2018.

[28] S. K. J. and G. S., “Prediction of Heart Disease Using Machine

Learning Algorithms.,” in 2019 1st International Conference on

Innovations in Information and Communication Technology

(ICIICT), CHENNAI, India, Apr. 2019, pp. 1–5, doi:

10.1109/ICIICT1.2019.8741465.

[29] A. Taneja, “Heart disease prediction system using data mining

techniques,” Orient. J. Comput. Sci. Technol., vol. 6, no. 4, pp.

457–466, 2013.

[30] V. Chaurasia and S. Pal, “Early prediction of heart diseases using

data mining techniques,” Caribb. J. Sci. Technol., vol. 1, pp. 208–

217, 2013.

[31] M. Saqlain, W. Hussain, N. A. Saqib, and M. A. Khan,

“Identification of heart failure by using unstructured data of

cardiac patients,” in 2016 45th International Conference on

Parallel Processing Workshops (ICPPW), 2016, pp. 426–431.

[32] D. Maulud and A. M. Abdulazeez, “A Review on Linear

Regression Comprehensive in Machine Learning,” J. Appl. Sci.

Technol. Trends, vol. 1, no. 4, pp. 140–147, 2020.

[33] D. A. Zebari, A. M. Abdulazeez, D. Q. Zeebaree, and M. S. Salih, “A Fusion Scheme of Texture Features for COVID-19 Detection of CT Scan Images,” in 2020 International Conference on Advanced Science and Engineering (ICOASE), Dec. 2020, pp. 1–6.

[34] M. Sultana, A. Haider, and M. S. Uddin, “Analysis of data mining

techniques for heart disease prediction,” in 2016 3rd international

conference on electrical engineering and information

communication technology (ICEEICT), 2016, pp. 1–5.

[35] A. Aldallal and A. A. A. Al-Moosa, “Using Data Mining

Techniques to Predict Diabetes and Heart Diseases,” in 2018 4th

International Conference on Frontiers of Signal Processing

(ICFSP), 2018, pp. 150–154.

[36] J. Soni, U. Ansari, D. Sharma, and S. Soni, “Predictive data

mining for medical diagnosis: An overview of heart disease

prediction,” Int. J. Comput. Appl., vol. 17, no. 8, pp. 43–48, 2011.

[37] C. Raju, E. Philipsy, S. Chacko, L. P. Suresh, and S. D. Rajan, “A

survey on predicting heart disease using data mining techniques,”

in 2018 conference on emerging devices and smart systems

(ICEDSS), 2018, pp. 253–255.

[38] H. D. Masethe and M. A. Masethe, “Prediction of heart disease

using classification algorithms,” in Proceedings of the world

Congress on Engineering and computer Science, 2014, vol. 2, pp.

22–24.

[39] M. T, D. Mukherji, N. Padalia, and A. Naidu, A Heart Disease

Prediction Model using SVM-Decision Trees-Logistic Regression

(SDL).

[40] D. Q. Zeebaree, A. M. Abdulazeez, D. A. Zebari, H. Haron, and

H. N. A. Hamed, “Multi-Level Fusion in Ultrasound for Cancer

Detection Based on Uniform LBP Features.”

[41] V. Mohan, “Liver Disease Prediction using SVM and Naïve

Bayes Algorithms,” Apr. 2015.

[42] M. S. B. Sinal, “Quick identification of Arrhythmia Symptoms

using Empirical Approach in Long Sequence of Heart Cycles.”

[43] M. Gandhi and S. N. Singh, “Predictions in heart disease using

techniques of data mining,” in 2015 International Conference on

Futuristic Trends on Computational Analysis and Knowledge

Management (ABLAZE), 2015, pp. 520–525.

[44] D. Q. Zeebaree, H. Haron, A. M. Abdulazeez, and D. A. Zebari, “Trainable model based on new uniform LBP feature to identify the risk of the breast cancer,” in 2019 International Conference on Advanced Science and Engineering (ICOASE), Apr. 2019, pp. 106–111.

[45] F. Y. Ahmed, K. L. T. Aik, A. S. Radzi, and M. D. Salleh,

"Develop Attendance Management System with Feedback and

Complaint Management Function," in 2019 IEEE 7th Conference

on Systems, Process and Control (ICSPC), 2019: IEEE, pp. 248-

252.

[46] D. Nugraha and F. Y. Ahmed, "MEAN stack to enhance the

advancement of parking application: A narrative review," in

Journal of Physics: Conference Series, 2019, vol. 1167, no. 1: IOP

Publishing, p. 012075.

[47] O. A. Mahmood, A. S. Yousif, and F. Y. A. Shamsuddin, "A new

approach to solving Transportation Model Based on the Standard

Deviation," in 2020 IEEE 10th Symposium on Computer

Applications & Industrial Electronics (ISCAIE), 2020: IEEE, pp. 1-5.

[48] M. H. Alkawaz, A. A. M. Zelani, H. Razalli, and S. N. Saud,

"A Digital Eye Navigators for the Visually Impaired," in 2019 IEEE

9th International Conference on System Engineering and

Technology (ICSET), 2019: IEEE, pp. 477-481.

[49] M. H. Alkawaz, H. Rajandran, and M. I. Abdullah, "The Impact

of Current Relation between Facebook Utilization and E-Stalking

towards Users Privacy," in 2020 IEEE International Conference on

Automatic Control and Intelligent Systems (I2CACIS), 2020: IEEE,

pp. 141-147.
