ArticlePDF Available

Heart Disease Prediction Using Machine Learning Techniques

July 2022
American Journal of Computer Science and Technology Vol. 5, No. 3, 2022,:146-154

July 2022
Vol. 5, No. 3, 2022,:146-154

DOI:10.11648/j.ajcst.20220503.11

Authors:

Dalhousie University

Machine learning and artificial intelligence have been found useful in various disciplines during the course of their development, especially in the enormous increasing data in recent years. It can be more reliable for making better and faster decisions for disease predictions. So, machine learning algorithms are increasingly finding their application to predict various diseases. Constructing a model can also help us visualize and analyze diseases to improve reporting consistency and accuracy. This article has investigated how to detect heart disease by applying various machine learning algorithms. The study in this article has shown a two-step process. The heart disease dataset is first prepared into a required format for running through machine learning algorithms. Medical records and other information about patients are gathered from the UCI repository. The heart disease dataset is then used to determine whether or not the patients have heart disease. Secondly, Many valuable results are shown in this article. The accuracy rate of the machine learning algorithms, such as Logistic Regression, Support vector machine, K-Nearest-Neighbors, Random Forest, and Gradient Boosting Classifier, are validated through the confusion matrix. Current findings suggest that the Logistic Regression algorithm gives a high accuracy rate of 95% compared to other algorithms. It also shows high accuracy for f 1-score, recall, and precision than the other four different algorithms. However, increasing the accuracy rates to approximately 97% to 100% of the machine learning algorithms is the future study and challenging part of this research.

Support Vector Machine.

…

K-Nearest Neighbors.

…

General procedure of Random Forest.

…

Correlation with the target feature.

…

(a) cp v/s target, (b) slope v/s target.

…

Figures - uploaded by Mohammed Khalid Hossen

Content may be subject to copyright.

Content uploaded by Mohammed Khalid Hossen

Content may be subject to copyright.

American Journal of Computer Science and Technology

2022; 5(3): 146-154

http://www.sciencepublishinggroup.com/j/ajcst

doi: 10.11648/j.ajcst.20220503.11

ISSN: 2640-0111 (Print); ISSN: 2640-012X (Online)

Heart Disease Prediction Using Machine Learning

Techniques

Mohammed Khalid Hossen

Department of Computer Science and Engineering, Sylhet Agricultural University, Sylhet, Bangladesh

Email address:

To cite this article:

Mohammed Khalid Hossen. Heart Disease Prediction Using Machine Learning Techniques. American Journal of Computer Science and

Technology. Vol. 5, No. 3, 2022, pp. 146-154. doi: 10.11648/j.ajcst.20220503.11

Received: June 27, 2022; Accepted: July 13, 2022; Published: July 20, 2022

Abstract:

Machine learning and artificial intelligence have been found useful in various disciplines during the course of their

development, especially in the enormous increasing data in recent years. It can be more reliable for making better and faster

decisions for disease predictions. So, machine learning algorithms are increasingly finding their application to predict various

diseases. Constructing a model can also help us visualize and analyze diseases to improve reporting consistency and accuracy.

This article has investigated how to detect heart disease by applying various machine learning algorithms. The study in this article

has shown a two-step process. The heart disease dataset is first prepared into a required format for running through machine

learning algorithms. Medical records and other information about patients are gathered from the UCI repository. The heart disease

dataset is then used to determine whether or not the patients have heart disease. Secondly, Many valuable results are shown in this

article. The accuracy rate of the machine learning algorithms, such as Logistic Regression, Support vector machine, K-Nearest-

Neighbors, Random Forest, and Gradient Boosting Classifier, are validated through the confusion matrix. Current findings suggest

that the Logistic Regression algorithm gives a high accuracy rate of 95% compared to other algorithms. It also shows high

accuracy for f

-score, recall, and precision than the other four different algorithms. However, increasing the accuracy rates to

approximately 97% to 100% of the machine learning algorithms is the future study and challenging part of this research.

Keywords:

Machine Learning, Artificial Intelligence, Heart Disease, Linear Regression, Support Vector Machine,

K-Nearest-Neighbors, Random Forest, Decision Tree, Gradient Boosting

1. Introduction

Machine learning (ML) is a part of artificial intelligence

(AI) that allows a software application to improve its

prediction accuracy without being formally programmed. In

order to forecast new output values, machine learning

algorithms use historical data as input [1]. Machine learning

is a significant and diversified field, and its scope and

application are expanding daily. For this reason, machine

learning has become a crucial competitive differentiation in

many organizations. Machine learning includes supervised,

unsupervised, and ensemble learning classifiers that are used

to predict and find the accuracy of a dataset. ML algorithms

can build a model based on sample data called train data to

make a decision or prediction [1, 2].

The use of machine learning methods in the medical

industry is the subject of the current study, which mainly

focuses on mimicking some human activities or mental

processes and recognizing diseases from a variety of inputs [3].

The term “heart disease” refers to a group of conditions that

affect the heart. According to World Health Organization

reports, cardiovascular diseases are now the leading cause of

death worldwide, approximately 17.9 million [4, 5]. Many

types of research have been studied and performed with

various machine learning algorithms to diagnose heart diseases.

According to Ghumbre et al., machine learning and deep

learning algorithms are applied to predict heart diseases in the

UCI dataset [3]. The authors concluded that machine learning

algorithms performed better for this analysis. Machine learning

techniques for heart disease prediction are published by Rohit

Bharti et al., where the article concluded that different data

mining and neural system should be used to find the

seriousness of HD among patients [4]. Some analysis has been

led to think about the implementation of a predictive data

mining strategy on the same dataset [5]. Prediction of heart

disease using machine learning is studied by Jee S H et al. in

147 Mohammed Khalid Hossen: Heart Disease Prediction Using Machine Learning Techniques

which training and the testing dataset are performed by using a

neural network algorithm [6]. K-Nearest Neighbor algorithm is

reviewed to diagnose heart disease by Mai Shouman et al. [7].

Some efficient algorithms have been used to detect HD, which

shows results that each algorithm has its strength to register the

defined objectives [8]. The supervised network has been

applied for HD diagnosis, which is studied by Raihan M et al.

[9]. This research idea has been broadened and inspired us

worldwide by publishing many articles [10-15].

This article will construct an ML predictive model, which

will help analyze heart disease regarding the medical history.

Data is collected from the UCI repository with patients'

medical records and attributes. This dataset would be utilized

to predict whether the patients have heart disease or not. To

diagnose the HD dataset, this article considers 14 attributes of

a patient. It classifies whether the disease is present or not and

can help us diagnose diseases with fewer medical treatments [1,

5]. For this study, this article considers various attributes of

patients like age, sex, serum cholesterol, blood pressure, exang,

etc. Five different ML algorithms such as Logistic Regression

(LR), Support vector machine (SVM), K-Nearest-Neighbors

(KNN), Random Forest (RF), and Gradient Boosting Classifier

(GBC) are applied for the purpose of classification and

prediction of heart disease. Many beneficial results are

presented in this article. The attributes of the given dataset are

trained under these algorithms. Based on the characteristics of

the HD dataset, a comparative analysis of algorithms has been

studied regarding the accuracy rate. All the selected ML

algorithms are efficient by showing their accuracy, which is

greater than 80%. The most efficient algorithm is Logistic

Regression (LR), which gives us an accuracy rate of

approximately 95%. Finally, Logistic Regression (LR)

algorithm will be considered to predict and diagnose for heart

disease of a patient.

This article is rearranged sequentially. In section 2, the

methodology has been discussed. Various ML algorithms are

studied briefly in section 3. Results and analysis are shown in

section 4. In the result section, algorithms are compared

regarding the confusion matrix. Finally, a conclusion and

future scope have been drawn in section 5.

2. Methodology

In this section, the method and analysis are described,

which is performed in this research work. First of all, the

collection of data and selection of relevant attributes are the

initial steps in this study. After that, the relevant data is pre-

processed into the required format. The given data is then

separated into two categories: training and testing datasets.

The algorithms are then used, and the given data train the

model. The accuracy of this model is obtained by using the

testing data. The procedures of this study are loaded by using

several modules such as a collection of data, selection of

attributes, pre-processing of data, data balancing, and

prediction of disease.

2.1. Data Collection

In this article, the dataset is collected from the UCI

repository, which is considered in research analysis by the

many authors [4, 7]. So, the first step is organizing the

dataset from the UCI repository to predict the heart disease

and then dividing the dataset into two sections: training and

testing. In this article, 80% data has been considered as a

training dataset, and 20% dataset is used for testing purposes.

2.2. Dataset and Attributes

Attributes of a dataset are properties of a dataset, which

are important to analyze and make a prediction regarding our

concern. Various attributes of the patient, like gender, chest

pain, serum cholesterol, fasting blood pressure, exang, etc.,

are considered for predicting diseases. However, the

correlation matrix can be used for attribute selection to

construct a model.

Table 1. Attributes used are listed.

Sl. No. Attributes Description Values

1. Age Patients age in years Continuous

2. Sex Sex of subject (male-0, female-1) Male/Female

3. CP Chest pain type Four types

4. Trestbps Resting blood pressure Continuous

5. Chol Serum cholesterol in mg/dl Continuous

6. FBS Fasting blood pressure < or >120 mg/dl

7. Restecg Resting Electrocardiograph Five values

8. Thalach Maximum heart rate achieved Continuous

9. exang Exercise Induced Angina Yes/No

10. oldpeak ST Depression introduced by exer. Continuous

11. slope Slope of Peak Exercise ST segment up/flat/down

12. Ca Number of major vessels 0-3

13. thal Defect type Reversible/Fixed/Normal

14. Targets Heart disease 1 (disease), 0 (no disease)

2.3. Pre-processing of Data

We need to clean and remove the missing or noise values

from the dataset to obtain accurate and perfect results, known

as data cleaning. Using some standard techniques in python

3.8, we can fill missing and noise values, see [16]. Then we

need to transform our dataset by considering the dataset's

normalization, smoothing, generalization, and aggregation.

American Journal of Computer Science and Technology 2022; 5(3): 146-154 148

Integration is one of the crucial phases in data pre-

processing, and various issues are considered here to

integrate. Sometimes the dataset is more complex or difficult

to understand. In this case, the dataset needs to be reduced in

a required format, which is best to get a good result.

2.4. Balancing of Data

Balancing the dataset is necessary to improve the

performance of machine learning algorithms. A balanced

dataset has the same amount of input samples for each output

class (or target class). The imbalanced dataset can be

balanced by considering two methods, such as under

sampling and over sampling.

2.5. Prediction of Disease

In this article, five different machine learning algorithms

are implemented for classification. A comparative analysis of

the algorithms has been studied. Finally, this article considers

an ML algorithm that gives the highest accuracy rate for

heart disease prediction, see Figure 1.

Figure 1. Architecture of prediction models.

3. Machine Learning Algorithms

A data analysis technique called machine learning

automates the development of analytical models. In this

observation, five different algorithms are studied to obtain

the accuracy for finding the best one.

3.1. Logistic Regression Model

This ML model is often used for classification and

predictive analysis, also known as logit regression [16]. It is

also utilized to estimate the discrete values, like the binary

outcome, from a collection of independent variables. A

binary result means two possibilities will happen: either the

event happens (say 1), or it does not happen (say 0).

Here below are the working procedures of the Logistic

Regression model.

Figure 2. Logistic Regression model.

Where z is a function of x1, x2, w1, w2, and b. So, z is a

linear equation given to a sigmoid function to predict the

output. We calculate the loss to evaluate the performance of

this model. In this case, we use the cross-entropy loss

149 Mohammed Khalid Hossen: Heart Disease Prediction Using Machine Learning Techniques

function [17].

3.2. Support Vector Machine (SVM)

SVM is the most popular supervised machine learning

algorithm, which is used for classification as well as regression

[23]. Although, we primarily consider this algorithm for

classification problems in ML. The purpose of the SVM

algorithm is to construct the optimum decision boundary or

line that can divide n-dimensional space into classes so that we

can quickly put the new data point in the correct category. This

optimal decision boundary is known as hyperplane [23]. SVM

selects the extreme vectors that help to create the hyperplane.

The extreme vectors are known as support vectors, and the

algorithm with support vectors is called the support vector

machine. Here below is a figure of SVM, where the decision

boundaries or hyperplane classifies two different categories.

The training sample dataset is (x2, x1), where x1 is the x-axis

vector, and x2 is the target vector, see figure 3.

Figure 3. Support Vector Machine.

3.3. K-Nearest Neighbors (K-NN)

K-NN is the most straightforward classification algorithm

based on supervised learning techniques. However, the K-NN

algorithm can also be used for regression but is mostly used

for classification [18]. A new data point is classified by using

the K-NN algorithm depending on how similar the existing

data is stored. It indicates that the K-NN algorithm can

quickly classify new data when it appears in a suitable

category, see Figure 4.

Figure 4. K-Nearest Neighbors.

Here, the horizontal x-axis and vertical y-axis are

independent and dependent variables of a function,

respectively. Figure 4 is a simple example of the K-NN

classification algorithm. The test sample (Yellow Square with

what symbol) should be classified as either a green triangle

or a red star in this algorithm. When k=3 is considered in a

small dash circle, the yellow square would be a green triangle

because the majority number in this region is green triangles,

not red stars. Now, if we consider k=7, which is in a large

dash circle, then the yellow square would be red stars

because the number of red stars is four and the green

triangles are 3. So, It can conclude that the majority vote in a

specific region is important here, see Figure 5.

3.4. Random Forest

Random Forest (RF) is a popular supervised machine

learning algorithm used for both classifications and

regression. However, it is mainly used in classification

problems. RF algorithm is based on the concept of ensemble

learning. Ensemble learning is a general machine learning

procedure that can be used for multiple learning algorithms

to seek better predictive performance [2, 19]. So, the RF

technique creates several decision trees on the data samples,

obtains the prediction from each tress, and finally gets the

better solution by considering the majority voting. It is noted

that the ensemble method is better than a single decision tree

because it mitigates the over-fitting by averaging results. The

large number of decision trees in RF helps us to get the

accuracy and prevent over-fitting of the problems. The

following procedures are completed by RF algorithm, see

also figure 6:

Step 1: First, n numbers of the random sample are selected

from a given dataset.

Step 2: A decision tree will be constructed for every

individual.

Step 3: Each decision tree will predict an output.

Step 4: Final result has come through a majority voting or

averaging.

Figure 5. General procedure of Random Forest.

3.5. Gradient Boosting

Gradient Boosting (GB) is a machine learning technique

American Journal of Computer Science and Technology 2022; 5(3): 146-154 150

that is used in classification and regression problems like

others. It is a powerful algorithm in the field of machine

learning [21]. As is well known, the errors are classified into

two categories in machine learning algorithms: Bias error and

Variance error. GBC helps us to minimize bias error

sequentially in the model, see Figure 6. A diagram is

described as follows below,

Figure 6. Diagram Gradient Boosting (Source [22]).

As we can see that the ensemble consists of N trees; see

Figure 6. First of all, the feature matrix X and the labels y are

used to train Tree 1. To calculate the training set residual

error r1, the predictions labeled are used. Then, Tree 2 is

trained using feature matrix X and residual errors r1 of Tree 1

as labels. The residual error r2 is then calculated by using

predictive error, see Figure 6.

4. Result Analysis

4.1. Analysis of Heart Disease Dataset

Figure 7. Target class.

Before going to study the performance of considering

machine learning algorithms in this research, analysis of

the features of the heart disease dataset will be focused on

here. The total number of observations in the target

attributes is 1025, where not having heart disease 499

(denoted by 0) and having heart disease 526 (represented

by 1), see Figure 7. So, the percentage of not having heart

disease is 45.7%, and the percentage of having heart

disease is 54.3%, see Figure 8(a). It is shown that the rate

of heart disease is more than the rate of no heart disease.

In Figure 8(b), the sex feature of the HD dataset is

observed through the target feature. In sex attribute, the

female and male numbers are 312 and 713, respectively.

So, the male number is more than double of female

number. We can see in this figure 8(b) that the number of

heart diseases in males is higher than in females.

Similarly, no heart disease among males is higher than in

females. Figure 8(b) concludes that male is sufferer than

female; for more information, see figure 8(b).

Figure 8. (a) Percentage of no heart disease and heart disease, (b)

Comparison between sex and target feature.

Figure 9(a) shows a relationship between age and

cholesterol with the target feature. For an experiment, these

features from the dataset are considered randomly. The trend

of no heart disease is higher from 55 to 68 when the

cholesterol level is between 200 mg/dl and 300 mg/dl. For

validation, the KDE plot 9(b) is studied and shows similar

statistics.

151 Mohammed Khalid Hossen: Heart Disease Prediction Using Machine Learning Techniques

Figure 9. (a) Age v/s Cholesterol with the target feature, (b) Kernel density estimate (kde) plot of age v/s cholesterol.

The correlation of the features is drawn in figure 10. The

main purpose of the correlation plot is to define the positive

and negative correlation between the features. However, it

assumes that figure 10 is complex for getting the strong and

weak correlation. For this reason, this article added another

figure 11 to obtain these correlations efficiently. In figure 11,

we can see that three features like cp, thalach, and slope

positively correlate with target features.

Figure 10. Correlation matrix of the attributes.

American Journal of Computer Science and Technology 2022; 5(3): 146-154 152

Figure 11. Correlation with the target feature.

Two strong correlations by cp and slope with target feature

are studied statistically. As we can see in figure 12(a), there is

no heart disease when the cp level is more than 350;

however, heart disease is sustained more when the cp is

between 200 and 250. In addition, when the slope is in 300 <

slope-1 < 350, it shows that there is no disease, see figure

12(a). In contrast, for slope-2, there is a heart disease in 300

< slope-2 < 350.

Figure 12. (a) cp v/s target, (b) slope v/s target.

153 Mohammed Khalid Hossen: Heart Disease Prediction Using Machine Learning Techniques

4.2. Performance Analysis

In this article, various machine learning algorithms like

Logistic Regression (LR), Support vector machine (SVM), k-

Nearest-Neighbors (KNN), Random Forest Classifier (RF),

and Gradient Boosting Classifier (GBC) are studied broadly

to predict the heart disease. The accuracy rate of each

algorithm has been measured, and selects the algorithm with

the highest accuracy. The accuracy rate is a correct prediction

ratio to the total number of given datasets. It can be written

as,

Accuracy =



 

Where, TP: True Positive

TN: True Negative

FP: False Positive

FN: False Negative

After performing the machine learning algorithms for

training and testing the dataset, we can find the better

algorithm by considering the accuracy rate. The rate of

accuracy is calculated with the support of a confusion matrix.

As shown in Table 2, the Logistic Regression algorithm gives

us the best accuracy to compare with other ML algorithms.

Table 2. Accuracy comparison of algorithms.

Algorithms Accuracy

Logistic Regression (LR) 0.95

Support vector machine (SVM) 0.90

K-Nearest-Neighbors (KNN) 0.87

Random Forest Classifier (RF) 0.79

Gradient Boosting Classifier (GBC) 0.80

Figure 13. Accuracy comparison of machine learning algorithms by bar

diagram.

This has been studied more on the LR machine learning

algorithm through confusion matrix and f

-score. The

confusion matrix shows that the correct predicted value is

95%, see figure 14. f

-score is calculated by, which is shown

in figure 15,

= 2 *



















Where, Precision, P =





, and Recall R =





Figure 14. Confusion matrix of LR algorithm.

Figure 15. f

-score, precision, and recall of LR algorithm.

5. Conclusion and Future Scope

The heart is a vital organ in the human body; however,

heart disease is a major concern in the world because this

disease is increasing day by day. So, we can handle this

disease if we have a model which can predict the initial

condition of heart disease. So, we need to create a machine

learning model that can be more accurate and help to

diagnose heart disease with less doubt and cost. It can be a

primary technique for knowing the condition of the heart. For

this reason, this article focuses on the heart disease prediction

based on the accuracy rate of the confusion matrix.

Following this idea, the statistics of the given algorithms are

used to estimate the accuracy rate of confusion matrix and

validated the statistics among the machine learning

algorithms. When five algorithms are compared, it is found

that Logistic Regression algorithm is selected regarding the

performance of high accuracy rate. The accuracy rate of

Logistic Regression model is 95%, which indicates that

machine learning algorithm will be considered as a pre-

defined tool to seek heart diseases in the near future. Other

statistics such as f

-score, recall, and precision rate have been

calculated for Logistic Regression as 95%, 95%, and 95%,

respectively. These estimated values suggest the highest

accuracy of this algorithm.

These findings suggest that machine learning algorithms

can effectively learn about the disease predictions. We may

extend this kind of study to diagnose other diseases. We may

also analyze the past history of data and combine other

American Journal of Computer Science and Technology 2022; 5(3): 146-154 154

machine learning techniques for better study. Other possible

further applications of this study can include such as,

cardiovascular disease prediction, diabetic prediction, breast

cancer prediction, tumor prediction, and multiple disease

predictions.

References

[1] Wikipedia contributors. (2022, June 22). Machine learning. In

Wikipedia, The Free Encyclopedia. Retrieved 06:31, June 26,

2022, from

https://en.wikipedia.org/w/index.php?title=Machine_learning

&oldid=1094363111.

[2] Victor Chang, Vallabhanent Rupa Bhavani, Ariel Qianwen Xu,

MA Hossain. An artificial intellegence model for heart disease

detection using machine learning. Healthcare Analytics,

volume 2, November 2022, 100016.

https://doi.org/10.1016/j.health.2022.100016.

[3] Ghumbre, S. U., & Ghatol, A. A. (2012). Heart disease

diagnosis using machine learning algorithm. In Proceedings of

the International Conference on Information Systems Design

and Intelligent Applications 2012 (INDIA 2012) held in

Visakhapatnam, India, January 2012 (pp. 217-225). Springer,

Berlin, Heidelberg.

[4] Rohit Bharti, Aditya Khamparia, Mohammed Shabaz, Gaurav

Dhiman, Sagar pande, and Parneet Singh. Prediction of Heart

Disease Using a combination of Machine Learning and Deep

learning. Hindawi Computational Intelligence and

Neuroscience, Volume 2021, Article ID 8387680, 11 pages.

https://doi.org/10.1155/2021/8387680.

[5] Khaled Mohamed Almustafa. Prediction of heart disease and

classifiers sensitivity analysis. Almustafa BMC Bioinfirmatics

(2020) 21: 278. https://doi.org/10.1186/s12859-020-03626-y.

[6] Jee S H, Jang Y, Oh D J, Oh B H, Lee S H, Park S W & Yun

Y D (2014), A coronary heart disease prediction model. The

Korean Heart Study. BMJ open, 4 (5), e005025.

[7] Mai Shouman, Tim Turner, and Rob Stocker. Applying k-

Nearest Neighbour in diagnosis heart disease patients..

International Journal of Information and Education

Technology, vol. 2, No. 3, June 2012.

[8] Ganna A, Magnusson P K, Pedersen N L, de Faire U, Reilly

M, Arnlov J & Ingelsson E (2013). Multilocus genetic risk

scores for coronary heart disease prediction. Arteriosclerosis,

thrombosis, and vascular biology, 33 (9), 2267-72.

[9] Raihan M, Mondal S, More A, Sagor M O F, Sikder G,

Majumder M A & Ghosh K (2016, December). Smartphone

based ischeme heart disease (heart attact) risk prediction using

clinical data and data mining approaches, a prototype design.

International conference on Computer and Information

Technology (ICCIT) (pp. 299-303). IEEE.

[10] Acharya U R, Fujita H, Oh S L, Hagiwara Y, Tan J H &

Adam M (2017). Application of deep convolutional neural

network for automated detection of myocardial infarction

using ECG signals. Information Sciences, 415, 190-8.

[11] Takci H (2018). Improvement of heart attack prediction by the

feature selection methods. Turkish Journal of Electrical

Engineering & Computer Sciences, 26 (1), 1-10.

[12] Brown N, Young T, Gray D, Skene A M & Hampton J R

(1997). Inpatient deaths from acute myocardial infarction,

1982-92: analysis of data in the Nottingham heart attack

[13] Soni J, Ansari U, Sharman D & Soni S (2011). Predictive data

mining for medical diagnosis: an overview of heart disease

prediction. International Journal of Computer Applications, 17

(8), 43-8.

[14] Bashir S, Qamar U & Javed M Y (2014, November). An

ensemble-based decision support framework for intelligent

heart disease diagnosis. International Conference on

Information Sociaty (i-Sociaty 2014) (pp. 259-64). IEEE.

[15] Ordonez C (2006). Associate rule discovery with the train and

test approach for heart disease prediction. IEEE Transaction

on Information Technology in Biomedicine, 10 (2), 334-43.

[16] Wikipedia contributors. (2022, June 21). Logistic regression.

In Wikipedia, The Free Encyclopedia. Retrieved 06:36, June

26, 2022, from

https://en.wikipedia.org/w/index.php?title=Logistic_regressio

n&oldid=1094256072.

[17] Wikipedia contributors. (2022, June 1). Linear regression. In

Wikipedia, The Free Encyclopedia. Retrieved 06:39, June 26,

2022, from

https://en.wikipedia.org/w/index.php?title=Linear_regression

&oldid=1091044459.

[18] Wikipedia contributors. (2022, June 4). K-nearest neighbors

algorithm. In Wikipedia, The Free Encyclopedia. Retrieved

06:40, June 26, 2022, from

https://en.wikipedia.org/w/index.php?title=K-

nearest_neighbors_algorithm&oldid=1091525121.

[19] Wikipedia contributors. (2022, June 20). Random forest. In

Wikipedia, The Free Encyclopedia. Retrieved 06:41, June 26,

2022, from

https://en.wikipedia.org/w/index.php?title=Random_forest&ol

did=1094130824.

[20] Wikipedia contributors. (2022, June 15). Decision tree

learning. In Wikipedia, The Free Encyclopedia. Retrieved

06:42, June 26, 2022, from

https://en.wikipedia.org/w/index.php?title=Decision_tree_lear

ning&oldid=1093316444.

[21] Wikipedia contributors. (2022, June 24). Gradient boosting. In

Wikipedia, The Free Encyclopedia. Retrieved 06:43, June 26,

2022, from

https://en.wikipedia.org/w/index.php?title=Gradient_boosting

&oldid=1094845596.

[22] nikki2398. (02 Sep, 2020). ML–Gradient Boosting.

https://www.geeksforgeeks.org/ml-gradient-boosting/.

[23] Wikipedia contributors. (2022, June 20). Support-vector

machine. In Wikipedia, The Free Encyclopedia. Retrieved

06:51, June 26, 2022, from

https://en.wikipedia.org/w/index.php?title=Support-

vector_machine&oldid=1094109362.

Performance Comparison of Machine Learning Models for Handwritten Devanagari Numerals Classification

Article

Full-text available

Jan 2023

This work focuses on comparing the suitability of different machine learning models for the classification of handwritten digits in the Devanagari script. The models that will be compared in this study are: K-Nearest Neighbours (K-NN), Support Vector Machine (SVM), Convolutional Neural Network (CNN), GoogLeNet (Inception v1), and ResNet-50. GoogLeNet and ResNet-50 are complex, deep neural networks. They possess a large number of hidden layers, and are generally used for more complex image classification tasks. The use of these models in this project is to gauge how well they perform on simpler image data. The foundation of this research is based on the ever increasing demand for accurate and efficient digit classification models in India, for purposes such as document scanning, ID card recognition, and the digitization of institutional records. The primary objective of this research project is to identify the most accurate and efficient digit classification model for numbers in the Devanagari script. Surprisingly, proposed simple CNN model outperforms the other complex GoogleNet and ResNet-50 models. Accuracy and Fl score of proposed CNN model is 99.522% and 0.9978 respectively. Also, the proposed CNN model used in this study outperforms other CNN model considered for Devanagari numerals classification.

A Study of Large-eddy Simulation using Statistical and Machine Learning Techniques

Article

Full-text available

Sep 2022

Mohammed Khalid Hossen

The numerical solution of Navier-Stokes (N-S) equations has been found useful in various disciplines during its development, especially in recent years. However, a large-eddy simulation method has been developed to model the subgrid-scale dissipation rate by closing the Navier-Stokes equations. Because the instantaneous and time-averaged statistic characteristics of the subgrid-scale turbulent kinetic energy and dissipation have been studied by large eddy simulation. The purpose of this study is to check the statistical and machine learning of the subgrid-scale energy dissipation. As we know that the current turbulence theory states that the vortex stretching mechanism transports energy from large to small scales and leads to a high energy dissipation rate in a turbulent flow. Hence, a vortex-stretching-based subgrid-scale model is considered regarding the square of the velocity gradient to detect the playing role of the vortex stretching mechanism. The study in this article has shown a two-step process. Considering a posteriori statistic of the velocity gradient is analyzed through higher-order statistics and joint probability density function. Secondly, a machine learning approach is studied on the same data. The results of the vortex-stretching-based subgrid-scale model are then compared with the other two dynamic subgrid models, such as the localized dynamic kinetic energy equation model and the TKE-based Deardorff model. The results suggest that the vortex-stretching-based model can detect the significant subgrid-scale dissipation of small-scale motions and predict satisfactory turbulence statistics of the velocity gradient tensor.

Pre-Processing and Normalization of the Historical Weather Data Collected from Secondary Data Source for Rainfall Prediction

Article

Nov 2023

In the twenty first century, data analysis has become the talk of the town. Almost every company or organization depends on data analysis for taking future decision. The most important step in data analysis after data collection is the preprocessing of the collected data. The main aim of data analysis is to find meaningful pattern by processing large amount of data. In data preprocessing, the inconsistency of collected data has been removed. After storing data for a relatively longer period, it becomes noisy and inconsistent. While measuring various parameter due to error in the instrument or human error, the value become incorrect or invalid. It is necessary to remove the invalid data otherwise it will deflect the results and produce error in the prediction. In this work preprocessing of the weather data has been analyzed for rainfall prediction using data mining.

Heart Disease Prediction Classification using Machine Learning

Article

Nov 2023

Heart disease is a leading cause of mortality worldwide, and early detection and accurate prediction of heart disease can significantly improve patient outcomes. Machine learning techniques have shown great promise in assisting healthcare professionals in diagnosing and predicting heart disease. The diagnosis and prognosis of heart disease must be improved, refined, and accurate, because a small mistake can cause weakness or death. According to a recent World Health Organization study, 17.5 million people die each year. By 2030, this number will increase to 75 million.[2] This document explains how to enable online KSRM capabilities. The KSRM smart system allows users to report heart-related problems. This research paper aims to explore the use of machine learning algorithms for effective heart disease prediction classification with Ada boost for improve the accuracy of algorithm.

Machine Learning-Based Diabetic Risk Prediction Model for Early Detection

Article

Full-text available

Sep 2023

Diabetes is a chronic disease that affects millions of people worldwide. Early detection and effective management of diabetes can significantly reduce the risk of complications and improve the quality of life of individuals with diabetes. In recent years, machine learning techniques have been applied to predict the risk of diabetes and to develop personalized treatment plans. In this study, we propose a machine learning-based diabetic risk prediction model for early detection and management. The proposed model uses various clinical and demographic variables such as age, gender, BMI, blood pressure, and fasting blood glucose levels to predict the risk of developing diabetes. We evaluated the performance of the proposed models using a dataset of patients with diabetes and non-diabetic individuals. Machine learning techniques including Logistic Regression, Support Vector Machine, K-Nearest Neighbors, and Random Forest are evaluated using the confusion matrices. The experimental results show that the Random Forest classifier achieved an accuracy of 80%, sensitivity of 82%, specificity of 80% in predicting the risk of diabetes. However, Increasing the accuracy rates of machine learning algorithms to 90% to 100% will be the challenging part of this study.

Progressive Heart Disease Prediction Model Using Machine Learning: A Comprehensive Staging Approach

Conference Paper

May 2024

A Comparative Evaluation of Machine Learning Techniques for Data-Driven Heart Disease Prediction

Conference Paper

Oct 2023

Improved and Intelligent Heart Disease Prediction System Using Machine Learning Algorithm

Chapter

Jun 2023

Predicting heart disease needs more perfection, precision, and correctness because a little fault may cause a big danger for a patient. In the field of machine learning, there are many classification algorithms for predicting heart disease. This paper presents the probability of heart disease prediction by some machine learning classifiers which are processed by feature engineering techniques on datasets. Feature engineering is used for building features by the process of using domain knowledge of data. Here a comparison has been shown before and after feature engineering of those supervised learning algorithms and identified the best algorithm for the best accuracy. The performance of each algorithm is determined and a comparison is made for each algorithm based on the precision of the calculation and the evaluation time. The proposed method has used the Cleveland dataset and another dataset consists of four datasets (Switzerland, Hungary, Cleveland, and Long Beach) downloaded from the Kaggle repository. Here the better accuracy has been gained from Ridge Classifier 86.89% for the Cleveland database. Another dataset has given 100% accuracy for the Gradient Boosting classifier, Bagging Classifier, and Gaussian Process classifier. This research will help to predict heart disease at an early stage which will reduce the death rate of heart disease.KeywordsMachine learning algorithmsDisease PredictionHeart Disease predictionFeature selectionFeature engineering technique

Heart Disease Prediction using Ensemble ML

Conference Paper

Mar 2023

Prediction Of Heart Disease and Identification of Severity Level Using Machine Learning Classifier with Android App

Conference Paper

Dec 2022

An artificial intelligence model for heart disease detection using machine learning algorithms

Article

Full-text available

Jan 2022

The paper focuses on the construction of an artificial intelligence-based heart disease detection system using machine learning algorithms. We show how machine learning can help predict whether a person will develop heart disease. In this paper, a python-based application is developed for healthcare research as it is more reliable and helps track and establish different types of health monitoring applications. We present data processing that entails working with categorical variables and conversion of categorical columns. We describe the main phases of application developments: collecting databases, performing logistic regression, and evaluating the dataset’s attributes. A random forest classifier algorithm is developed to identify heart diseases with higher accuracy. Data analysis is needed for this application, which is considered significant according to its approximately 83% accuracy rate over training data. We then discuss the random forest classifier algorithm, including the experiments and the results, which provide better accuracies for research diagnoses. We conclude the paper with objectives, limitations and research contributions.

Prediction of Heart Disease Using a Combination of Machine Learning and Deep Learning

Article

Full-text available

Jul 2021
Comput Intell Neurosci

The correct prediction of heart disease can prevent life threats, and incorrect prediction can prove to be fatal at the same time. In this paper different machine learning algorithms and deep learning are applied to compare the results and analysis of the UCI Machine Learning Heart Disease dataset. The dataset consists of 14 main attributes used for performing the analysis. Various promising results are achieved and are validated using accuracy and confusion matrix. The dataset consists of some irrelevant features which are handled using Isolation Forest, and data are also normalized for getting better results. And how this study can be combined with some multimedia technology like mobile devices is also discussed. Using deep learning approach, 94.2% accuracy was obtained.

Prediction of heart disease and classifiers’ sensitivity analysis

Article

Full-text available

Jul 2020
BMC BIOINFORMATICS

Khaled Mohamad Almustafa

Background: Heart disease (HD) is one of the most common diseases nowadays, and an early diagnosis of such a disease is a crucial task for many health care providers to prevent their patients for such a disease and to save lives. In this paper, a comparative analysis of different classifiers was performed for the classification of the Heart Disease dataset in order to correctly classify and or predict HD cases with minimal attributes. The set contains 76 attributes including the class attribute, for 1025 patients collected from Cleveland, Hungary, Switzerland, and Long Beach, but in this paper, only a subset of 14 attributes are used, and each attribute has a given set value. The algorithms used K- Nearest Neighbor (K-NN), Naive Bayes, Decision tree J48, JRip, SVM, Adaboost, Stochastic Gradient Decent (SGD) and Decision Table (DT) classifiers to show the performance of the selected classifications algorithms to best classify, and or predict, the HD cases. Results: It was shown that using different classification algorithms for the classification of the HD dataset gives very promising results in term of the classification accuracy for the K-NN (K = 1), Decision tree J48 and JRip classifiers with accuracy of classification of 99.7073, 98.0488 and 97.2683% respectively. A feature extraction method was performed using Classifier Subset Evaluator on the HD dataset, and results show enhanced performance in term of the classification accuracy for K-NN (N = 1) and Decision Table classifiers to 100 and 93.8537% respectively after using the selected features by only applying a combination of up to 4 attributes instead of 13 attributes for the predication of the HD cases. Conclusion: Different classifiers were used and compared to classify the HD dataset, and we concluded the benefit of having a reliable feature selection method for HD disease prediction with using minimal number of attributes instead of having to consider all available ones.

Improvement of heart attack prediction by the feature selection methods

Article

Full-text available

Jan 2018

Hidayet Takci

Prediction of a heart attack is very important since it is one of the leading causes of sudden death, especially in low-income countries. Although cardiologists use traditional clinical methods such as electrocardiography and blood tests for heart attack prediction, computer aided diagnosis systems that use machine learning methods are also in use for this task. In this study, we used machine learning and feature selection algorithms together. Our aim is to determine the best machine learning method and the best feature selection algorithm to predict heart attacks. For this purpose, many machine learning methods with optimum parameters and several feature selection methods were used and evaluated on the Statlog (Heart) dataset. According to the experimental results, the best machine learning algorithm is the support vector machine algorithm with the linear kernel, while the best feature selection algorithm is the reliefF method. This pair gave the highest accuracy value of 84.81%.

An Ensemble based Decision Support Framework for Intelligent Heart Disease Diagnosis

Conference Paper

Full-text available

Nov 2014

Large amount of medical data leads to the need of intelligent data mining tools in order to extract useful knowledge. Researchers have been using several statistical analysis and data mining techniques to improve the disease diagnosis accuracy in medical healthcare. Heart disease is considered as the leading cause of deaths worldwide over the past 10 years. Several researchers have introduced different data mining techniques for heart disease diagnosis. Using a single data mining technique shows an acceptable level of accuracy for disease diagnosis. Recently, more research is carried out towards hybrid models which show tremendous improvement in heart disease diagnosis accuracy. The objective of the proposed research is to predict the heart disease in a patient more accurately. The proposed framework uses majority vote based novel classifier ensemble to combine different data mining classifiers. UCI heart disease dataset is used for results and evaluation. Analysis of the results shows that the sensitivity, specificity and accuracy of the ensemble framework are higher as compared to the individual techniques. We obtained 82% accuracy, 74% sensitivity and 93% specificity for heart disease dataset.

A Classification Approach for Heart Disease Diagnosis using Machine Learning

Conference Paper

Oct 2021

Applying k-Nearest Neighbour in Diagnosing Heart Disease Patients

Article

Jan 2012

Heart disease is the leading cause of death in the world over the past 10 years. Researchers have been using several data mining techniques to help health care professionals in the diagnosis of heart disease. K-Nearest-Neighbour(KNN) is one of the successful data mining techniques used in classification problems. However, it is less used in the diagnosis of heart disease patients. Recently, researchers are showing that combining different classifiers through voting is outperforming other single classifiers. This paper investigates applying KNN to help healthcare professionals in the diagnosis of heart disease. It also investigates if integrating voting with KNN can enhance its accuracy in the diagnosis of heart disease patients. The results show that applying KNN could achieve higher accuracy than neural network ensemble in the diagnosis of heart disease patients. The results also show that applying voting could not enhance the KNN accuracy in the diagnosis of heart disease.

Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction

Article

Mar 2011

The successful application of data mining in highly visible fields like e-business, marketing and retail has led to its application in other industries and sectors. Among these sectors just discovering is healthcare. The healthcare environment is still "information rich" but "knowledge poor". There is a wealth of data available within the healthcare systems. However, there is a lack of effective analysis tools to discover hidden relationships and trends in data. This research paper intends to provide a survey of current techniques of knowledge discovery in databases using data mining techniques that are in use in today"s medical research particularly in Heart Disease Prediction. Number of experiment has been conducted to compare the performance of predictive data mining technique on the same dataset and the outcome reveals that Decision Tree outperforms and some time Bayesian classification is having similar accuracy as of decision tree but other predictive methods like KNN, Neural Networks, Classification based on clustering are not performing well. The second conclusion is that the accuracy of the Decision Tree and Bayesian Classification further improves after applying genetic algorithm to reduce the actual data size to get the optimal subset of attribute sufficient for heart disease prediction.

Multilocus Genetic Risk Scores for Coronary Heart Disease Prediction

Article

May 2013
ARTERIOSCL THROM VAS

Objective: Current guidelines do not support the use of genetic profiles in risk assessment of coronary heart disease (CHD). However, new single nucleotide polymorphisms associated with CHD and intermediate cardiovascular traits have recently been discovered. We aimed to compare several multilocus genetic risk score (MGRS) in terms of association with CHD and to evaluate clinical use. Approach and results: We investigated 6 Swedish prospective cohort studies with 10 612 participants free of CHD at baseline. We developed 1 overall MGRS based on 395 single nucleotide polymorphisms reported as being associated with cardiovascular traits, 1 CHD-specific MGRS, including 46 single nucleotide polymorphisms, and 6 trait-specific MGRS for each established CHD risk factors. Both the overall and the CHD-specific MGRS were significantly associated with CHD risk (781 incident events; hazard ratios for fourth versus first quartile, 1.54 and 1.52; P<0.001) and improved risk classification beyond established risk factors (net reclassification improvement, 4.2% and 4.9%; P=0.006 and 0.017). Discrimination improvement was modest (C-index improvement, 0.004). A polygene MGRS performed worse than the CHD-specific MGRS. We estimate that 1 additional CHD event for every 318 people screened at intermediate risk could be saved by measuring the CHD-specific genetic score in addition to the established risk factors. Conclusions: Our results indicate that genetic information could be of some clinical value for prediction of CHD, although further studies are needed to address aspects, such as feasibility, ethics, and cost efficiency of genetic profiling in the primary prevention setting.

Association Rule Discovery With the Train and Test Approach for Heart Disease Prediction

Article

May 2006

Carlos Ordonez

Association rules represent a promising technique to improve heart disease prediction. Unfortunately, when association rules are applied on a medical data set, they produce an extremely large number of rules. Most of such rules are medically irrelevant and the time required to find them can be impractical. A more important issue is that, in general, association rules are mined on the entire data set without validation on an independent sample. To solve these limitations, we introduce an algorithm that uses search constraints to reduce the number of rules, searches for association rules on a training set, and finally validates them on an independent test set. The medical significance of discovered rules is evaluated with support, confidence, and lift. Association rules are applied on a real data set containing medical records of patients with heart disease. In medical terms, association rules relate heart perfusion measurements and risk factors to the degree of disease in four specific arteries. Search constraints and test set validation significantly reduce the number of association rules and produce a set of rules with high predictive accuracy. We exhibit important rules with high confidence, high lift, or both, that remain valid on the test set on several runs. These rules represent valuable medical knowledge.

Heart Disease Prediction Using Machine Learning Techniques

Abstract and Figures

Recommended publications

KFCM-Based Direct Marketing

A Weighted Average Ensemble Technique to Predict Heart Disease

A hybrid cost-sensitive ensemble for heart disease prediction

Heart Disease Prediction Using ML

Heart disease prediction using machine learning algorithms

Heart Disease Prediction Using ML Algorthim

Hybrid Classification Using Ensemble Model to Predict Cardiovascular Diseases