
MLHeartDis: Can Machine Learning Techniques Enable to Predict Heart Diseases?

October 2022


DOI:10.1109/UEMCON54665.2022.9965714

Conference: 2022 IEEE 13th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON)

Authors:

Muntasir Mamun

The University of Arizona

Md Milon Uddin

University of Texas at Tyler

Asm Mohaimenul Islam

University of South Dakota


MLHeartDis: Can Machine Learning Techniques Enable to Predict Heart Diseases?

Muntasir Mamun
Department of Computer Science
The University of South Dakota, Vermillion, SD, USA, 57069
Muntasir.Mamun@coyotes.usd.edu

Md. Milon Uddin
Department of Electrical Engineering
The University of Texas at Tyler, Tyler, TX, USA 75799
muddin3@patriots.uttyler.edu

Vivek Kumar Tiwari
Department of Electronic Engineering
The University of Texas at Tyler, USA, 75701
vtiwari@patriots.uttyler.edu

Asm Mohaimenul Islam
Department of Computer Science
University of South Dakota, SD, USA 57069
asm.islam@coyotes.usd.edu

Ahmed Ullah Ferdous
Department of Electronic and Telecommunication
University of Liberal Arts Bangladesh, Dhaka, Bangladesh
ahmdferdous@gmail.com

Abstract— Heart disease is one of the leading causes of death in the contemporary world. The three major risk factors for heart disease are smoking, high blood pressure, and high cholesterol, and 47% of all US citizens have at least one of these risk factors. Predicting cardiovascular disease is a major challenge in clinical data analysis. Machine learning (ML) can support decisions and predictions about heart disease based on personal key indicators (e.g., blood pressure, cholesterol level, smoking, diabetic status, obesity, stroke, alcohol drinking). In this paper, we apply six machine learning models to survey data of over 400k US residents from the year 2020. The six models, XGBoost, AdaBoost, Random Forest, Decision Tree, Logistic Regression, and Naïve Bayes, are compared in detail. The logistic regression model achieved the best performance, with an accuracy of 91.57% for the prediction of heart disease.

Keywords— Machine learning, heart disease prediction, Centers for Disease Control and Prevention (CDC), classification algorithms, cardiovascular disease (CVD), regression model

I. INTRODUCTION

The heart is an essential organ of the human body, and life depends on its proper functioning. According to the World Health Organization, heart disease will cause over 23.6 million deaths worldwide by 2030 [1].

Many different heart problems are categorized as cardiovascular diseases (CVDs). Coronary heart disease, the most frequent of them, can cause heart attacks, which claim the lives of more than 370,000 people annually. Heart failure is one of the first CVD presentations and another cause of morbidity and mortality. The World Heart Federation recently listed risk factors that raise the incidence of heart failure, including arterial hypertension, diabetes, smoking, injured heart muscles, malfunctioning heart valves, and obesity [2].

Machine learning (ML), a technology for evaluating clinical data and generating predictions for early disease detection, is one of today's techniques for computer-aided detection, with applications such as cancer diagnosis [3], heart failure diagnosis [4], Alzheimer's detection using brain MRI images [5], and Parkinson's disease detection [6], as well as in other fields such as virtual reality [7] and 360-degree video caching [8].

In this research, we use a survey dataset of personal key health indicators to determine the performance of six machine learning algorithms (MLAs). Machine learning is a fast-expanding trend in healthcare as a result of improvements in wearable technology and sensors that use data to evaluate a patient's health in real time. By using machine learning to anticipate heart disease, an accurate diagnosis can be made at a lower cost than with the conventional method [9].

Our study's primary contribution is the identification of the top features from the raw dataset, with a primary focus on the prediction of heart diseases using machine learning techniques. This would enable quick and correct treatment of the identified risk factors during any necessary preventive diagnosis of these cardiovascular illnesses. We also compare our results with previous works.

The paper is structured as follows: Section II covers the literature on using machine learning to detect heart problems. Section III presents the methodology. Section IV describes the dataset. Section V presents and discusses the results. Finally, Section VI offers our conclusions and recommendations for future work.

II. LITERATURE REVIEW

Using Logistic Regression, Random Forest, Support Vector Machine, Gaussian Naïve Bayes, Gradient Boosting, K-Nearest Neighbors, Multinomial Naïve Bayes, and Decision Trees, Padmaja et al. [10] developed a machine learning


model to identify cardiac illnesses. To validate the process, the authors used the Cleveland dataset from the UCI repository, which has 303 data samples with 14 attributes. They found that Random Forest achieved the highest accuracy, 93.44%, and outperformed the other machine learning models. In addition, the classifier's performance was enhanced by shortening the execution time and selecting important features from the input dataset using the chi-square feature selection approach.

Mary et al. [11] applied several machine learning algorithms to the Cleveland dataset to predict cardiac disease. The dataset has 303 instances and 14 attributes. The authors used KNN, SVM, Logistic Regression, Neural Network, RF, Naïve Bayes, DT, and GDBT-Bagging Tree models for prediction. They achieved the highest accuracy of 90% using the Artificial Neural Network (ANN) algorithm with SAE.

For predicting heart illnesses, Singh et al. [12] employed four different machine learning models. The authors used the UCI repository-based Cleveland dataset, which has 303 instances and 14 attributes. They found that k-nearest neighbor achieved the highest accuracy, 87%, for predicting heart diseases.

For the purpose of predicting heart diseases, Shah et al. [13] employed supervised machine learning methods such as Naïve Bayes, decision trees, K-nearest neighbors, and random forests. The authors used the UCI repository-based Cleveland dataset, which has 303 instances and 14 attributes, and pre-processed it with WEKA tools. Of the four models implemented, k-nearest neighbor achieved the highest accuracy, 90.789%.

Ghosh et al. [14] used a combined UCI-based dataset (Cleveland, Long Beach VA, Switzerland, Hungarian, and Statlog), which has more than 1190 instances and 14 attributes, for heart disease prediction. After evaluating all the models, the authors achieved the highest accuracy of 99.05% using the Random Forest Bagging Method (RFBM) model.

Based on patients' available health indicators, the authors in [15] developed a framework that is effective at predicting risk factors. The study's goal is to shed light on the best safeguards that medical specialists can employ in the case of heart disease risk. The algorithms employed include C4.5, SVM, CMAR, and Bayesian classifiers. The system is trained and tested using 10-fold methods, which is a drawback of this approach.

Machine learning techniques were employed in [16] to analyze the raw data and deliver the patient's disease prediction and health status. The hybrid approach employed by the authors combines the strengths of fuzzy logic and the k-nearest neighbor algorithm, with a reported accuracy of 94 percent.

The study in [17] examines ensemble classification, which combines various classifiers to increase the accuracy of weak estimators. The prediction model is presented with different feature combinations and several well-known classification techniques. The authors achieved an improved performance level with an accuracy of 88.7 percent.

The authors in [18] evaluated the performance of 10 algorithms in classifications with two and four attributes. Their findings show that the most important CVD risk factors in the majority of the datasets are age, heart rate, and blood pressure, followed by weight, cholesterol, smoking, serum creatinine, ejection fraction, type of chest pain, number of arteries, platelet count, and obesity. All of these characteristics stood out in the prediction performance study and therefore affect CVD detection.

The authors in [19] employed a convolutional neural network to forecast heart diseases. They used key variables that can help predict a person's susceptibility to heart disease, including age, sex, cholesterol, and ECG slope. The network uses four layers. The highest accuracy (87.09 percent) was obtained with an exponential linear unit (ELU). Since the research's dataset was very limited, expanding it would further increase accuracy.

Table 1. Performance of previous work

| Authors (year) | Dataset collection (samples) | Applied models | Performance (proposed model) |
|---|---|---|---|
| Mary et al. (2020) [11] | UCI Cleveland dataset (303) | Support Vector Machine, Logistic Regression, Artificial Neural Network (Proposed), K-Nearest Neighbor, Random Forest, Naïve Bayes, Decision Tree, and GDBT-Bagging Tree | Accuracy: 90% |
| Singh et al. (2020) [12] | UCI Cleveland dataset (303) | K-Nearest Neighbor (Proposed), Decision Tree, Linear Regression, and Support Vector Machine | Accuracy: 87% |
| Shah et al. (2020) [13] | UCI Cleveland dataset (303) | Naïve Bayes, Decision Tree, K-Nearest Neighbor (Proposed), and Random Forest | Accuracy: 90.789% |
| Padmaja et al. (2021) [10] | UCI Cleveland dataset (303) | Logistic Regression, Random Forest (Proposed), Support Vector Machine, Gaussian Naïve Bayes, Gradient Boosting, K-Nearest Neighbours, Multinomial Naïve Bayes, and Decision Trees | Accuracy: 93.44% |
| Ghosh et al. (2021) [14] | UCI (Cleveland + Long Beach VA + Switzerland + Hungarian and Statlog) (1190+) | Decision Tree Bagging Method (DTBM), Random Forest Bagging Method (RFBM) (Proposed), K-Nearest Neighbors Bagging Method (KNNBM), AdaBoost Boosting Method (ABBM), and Gradient Boosting Boosting Method (GBBM) | Accuracy: 99.05% |
| Ours (2022) | Kaggle dataset (319795) | XGBoost, AdaBoost, Random Forest, Decision Tree, Naïve Bayes, and Logistic Regression (Proposed) | Accuracy: 91.57% |

III. METHODOLOGY

In the proposed methodology, pre-processing follows data collection. The chosen classifiers, XGBoost, AdaBoost, Random Forest, Decision Tree, Naïve Bayes, and Logistic Regression, are then trained and tested on the heart disease dataset using the common hold-out validation method. The results are computed and examined to find the best approach for predicting heart diseases. The outline of the proposed strategy is shown in Fig. 1.

A. Dataset Collection

We used the dataset titled "Key Indicators of Heart Disease", obtained from Kaggle [20]. This dataset has 319,795 instances and 18 attributes, of which 17 are predictive attributes and 1 is the class attribute. Proper heart disease prediction relies on appropriate use of the attributes, which describe the symptoms. The predictive attributes include gender, age, race, obesity (high BMI), diabetic condition, insufficient physical activity, physical health, general health, mental health, heavy alcohol drinking, smoking, stroke status, difficulty walking, asthma, kidney disease, and skin cancer; the class attribute is heart disease.

B. Dataset Pre-process:

The original dataset of approximately 300 variables was reduced to about 18 variables (9 booleans, 5 strings, and 4 decimals) for predicting the presence of heart disease. Pre-processing consisted of feature extraction, data cleaning, missing-value handling, and transformation of categorical variables.
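The pre-processing steps named above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the column names (`HeartDisease`, `Smoking`, `BMI`, `GenHealth`), the choice to drop rows with missing values, and the one-hot encoding of string columns are all hypothetical and are not confirmed by the paper.

```python
# Hypothetical records; column names and values are illustrative only.
records = [
    {"HeartDisease": "No", "Smoking": "Yes", "BMI": "26.5", "GenHealth": "Good"},
    {"HeartDisease": "Yes", "Smoking": "No", "BMI": "31.2", "GenHealth": "Fair"},
    {"HeartDisease": "No", "Smoking": "No", "BMI": "", "GenHealth": "Excellent"},
]

BOOLEAN_COLUMNS = ["HeartDisease", "Smoking"]
NUMERIC_COLUMNS = ["BMI"]
CATEGORICAL_COLUMNS = ["GenHealth"]


def drop_incomplete(rows):
    """Missing-value handling (assumed strategy): discard rows with an empty field."""
    return [row for row in rows if all(value != "" for value in row.values())]


def encode(rows):
    """Map Yes/No to 1/0, parse decimals, and one-hot encode string columns."""
    # Collect the levels of each categorical column from the rows that remain.
    categories = {
        col: sorted({row[col] for row in rows}) for col in CATEGORICAL_COLUMNS
    }
    encoded = []
    for row in rows:
        out = {}
        for col in BOOLEAN_COLUMNS:
            out[col] = 1 if row[col] == "Yes" else 0
        for col in NUMERIC_COLUMNS:
            out[col] = float(row[col])
        for col in CATEGORICAL_COLUMNS:
            for level in categories[col]:
                out[f"{col}_{level}"] = 1 if row[col] == level else 0
        encoded.append(out)
    return encoded


clean_rows = encode(drop_incomplete(records))
print(clean_rows)
```

The third record is dropped because its `BMI` field is empty, so the remaining rows carry only numeric values that a classifier can consume.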

C. Validation Process:

It is essential to choose the right validation procedure for a specific dataset. Hold-out validation is most effective for obtaining appropriate results when the dataset is large [21]. We applied hold-out validation, training on 80% of the dataset and testing on the remaining 20%. Using this process, we calculated the accuracy, sensitivity, specificity, precision, area under the curve, and F1-score performance metrics for each machine learning approach. The performance metrics and visualization graphs are presented in detail in the results section. The flowchart in Fig. 1 gives a step-by-step overview of the research work.
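As a minimal sketch of the validation step, the following plain-Python code performs an 80/20 hold-out split and derives the listed metrics from the confusion counts of a binary classifier. The function names, the fixed seed, and the toy labels are assumptions made for illustration; the paper does not state which tooling or random seed was used.

```python
import random


def hold_out_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test) partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def classification_metrics(y_true, y_pred):
    """Compute the metrics from binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Guard against empty classes instead of dividing by zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy data: 10 row indices and 8 hand-written label/prediction pairs.
train, test = hold_out_split(range(10))
metrics = classification_metrics(
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
)
print(len(train), len(test))
print(metrics)
```

With the toy labels the confusion counts are TP = 2, FN = 1, FP = 1, and TN = 4, which gives an accuracy of 0.75 and a specificity of 0.8.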

Fig. 1. An overview of study

IV. DATASET

The dataset comes from the CDC and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to gather data on Americans' health conditions. The BRFSS collects data from all 50 states, the District of Columbia, and three U.S. territories. It is the largest continuously running health survey system in the world, conducting over 400,000 adult interviews each year. This dataset has 319,795 instances and 18 attributes, of which 17 are predictive and 1 is the class attribute. The great majority of the columns are questions asked of respondents about their health, such as "Do you have considerable trouble walking or climbing stairs?" or "Do you have a lifetime cigarette smoking total of at least 100? 5 packs equal 100 cigarettes". We found numerous variables (questions) in this dataset that directly or indirectly affect heart disease, so we chose the most pertinent ones and cleaned the data so that it could be used in machine learning projects. Only roughly 20 variables remained from the original dataset's almost 300 variables. Heart disease was treated as a binary variable, with "Yes" denoting the presence of heart disease and "No" denoting its absence.

V. RESULTS

The performance of the six machine learning techniques was assessed using accuracy, F1-score, precision, recall, specificity, and area under the curve. Table 2 compares the models.


Table 2. Comparison of results

Accuracy alone, however, is not an adequate metric for evaluating a model's performance. The AUC is also an important metric, as it assesses a model's capacity to distinguish between the positive and negative classes. The underlying ROC curve plots the True Positive Rate against the False Positive Rate at various thresholds. The higher the AUC, the better the outcome. AUC values range between 0 and 1, where 0 denotes a completely inaccurate test and 1 a perfectly accurate test. An AUC of 0.5 generally means there is no discrimination (i.e., no capacity to distinguish between patients with and without the condition), 0.7 to 0.8 is considered acceptable, 0.8 to 0.9 is regarded as excellent, and higher than 0.9 is regarded as exemplary performance [22].
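To make the interpretation of the AUC concrete, the sketch below computes it directly as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, which is equivalent to the area under the ROC curve. The function name `roc_auc` and the toy scores are illustrative assumptions, not the authors' implementation, and the pairwise loop is quadratic, so it is suitable only for small examples.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
print(roc_auc(labels, [0.1, 0.4, 0.35, 0.8]))  # 0.75: one of four pairs is misranked
print(roc_auc(labels, [0.1, 0.2, 0.8, 0.9]))   # 1.0: perfect separation
print(roc_auc(labels, [0.5, 0.5, 0.5, 0.5]))   # 0.5: constant score, no discrimination
```

The last call shows why 0.5 corresponds to no discrimination: a model that assigns every patient the same score ranks positives above negatives only by chance.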

We present the AUC curves and average outcomes obtained with the hold-out validation procedure, which trains on 80% of the dataset and tests on 20%. The results of XGBoost, AdaBoost, and Naïve Bayes are comparable. When accuracy, specificity, and F1-score are taken into account, logistic regression outperforms the other models (shown in Table 2).

Fig. 2. AUC curve for AdaBoost

Fig. 3. AUC curve for Decision Tree

Fig. 4. AUC curve for Logistic Regression

Fig. 5. AUC curve for Naïve Bayes

Fig. 6. AUC curve for Random Forest

Fig. 7. AUC curve for XGBoost

The highest AUC score, 0.84, is obtained for AdaBoost, Logistic Regression, and XGBoost. On the other hand, the Naïve Bayes model exhibits a considerably lower AUC score of 0.65.

VI. CONCLUSION

Predicting heart disease is one of the difficult tasks in medicine. If the disease is recognized early with machine learning techniques, the death rate can be significantly reduced. Six machine learning algorithms were employed in this paper to predict heart diseases. In our findings, the logistic regression algorithm showed the best accuracy, 91.57%, for the prediction of heart diseases. In future work, accuracy could be increased by using a bigger dataset and selecting more features efficiently.

REFERENCES

[1] V. Krishnaiah, G. Narsimha, N. Subhash Chandra, “Heart Disease Prediction System using Data Mining Techniques and Intelligent Fuzzy Approach: A Review”, International Journal of Computer Applications, February 2016.

[2] What is CVD?—World Heart Federation. Available online: https://world-heart-federation.org/what-is-cvd/ (accessed on 15 July 2022).

[3] M. Mamun, A. Farjana, M. Al Mamun and M. S. Ahammed, "Lung cancer prediction model using ensemble learning techniques and a systematic review analysis," 2022 IEEE World AI IoT Congress (AIIoT), 2022, pp. 187-193, doi: 10.1109/AIIoT54504.2022.9817326.

[4] M. Mamun, A. Farjana, M. A. Mamun, M. S. Ahammed and M. M. Rahman, "Heart failure survival prediction using machine learning algorithm: am I safe from heart failure?," 2022 IEEE World AI IoT Congress (AIIoT), 2022, pp. 194-200, doi: 10.1109/AIIoT54504.2022.9817303.

[5] M. Mamun, S. B. Shawkat, M. S. Ahammed, M. M. Uddin, M. I. Mahmud, A. M. Islam, "Deep Learning Based Model for Alzheimer's Disease Detection Using Brain MRI Images", 2022 IEEE 13th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2022, (Preprint).

[6] M. Mamun, M. I. Mahmud, M. I. Hossain, A. M. Islam, M. S. Ahammed, M. M. Uddin, "Vocal Feature Guided Detection of Parkinson's Disease Using Machine Learning Algorithms", 2022 IEEE 13th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2022, (Preprint).

[7] M. M. Uddin and J. Park, "Machine learning model evaluation for 360° video caching," 2022 IEEE World AI IoT Congress (AIIoT), 2022, pp. 238-244, doi: 10.1109/AIIoT54504.2022.9817292.

[8] M. Milon Uddin and J. Park, "360 Degree Video Caching with LRU & LFU," 2021 IEEE 12th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2021, pp. 0045-0050, doi: 10.1109/UEMCON53757.2021.9666668.

[9] T. Karayılan, O. Kılıç, “Prediction of Heart Disease Using Neural Network”, 2nd International Conference of Computer Science and Engineering, IEEE, 2017.

[10] B Padmaja, E. (2022). Early and Accurate Prediction of Heart Disease Using Machine Learning Model. Retrieved 13 July 2022, from https://turcomat.org/index.php/turkbilmat/article/view/8438

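Table 2. Comparison of results

| Model | Acc. (%) | Sen. (%) | Spec. (%) | Precision (%) | F1-score (%) | AUC |
|---|---|---|---|---|---|---|
| XGBoost | 91.50 | 92.22 | 50.28 | 99.10 | 95.53 | 0.84 |
| AdaBoost | 91.55 | 92.35 | 51.70 | 99.00 | 95.56 | 0.84 |
| Random Forest | 90.28 | 92.40 | 33.09 | 97.46 | 94.86 | 0.79 |
| Decision Tree | 86.44 | 92.96 | 22.65 | 92.16 | 92.56 | 0.59 |
| Logistic Regression | 91.57 | 92.32 | 52.61 | 99.07 | 95.58 | 0.84 |
| Naïve Bayes | 91.40 | 91.51 | 15.38 | 99.96 | 95.55 | 0.65 |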


[11] Mary, M. (2020). Heart Disease Prediction using Machine Learning Techniques: A Survey. International Journal For Research In Applied Science And Engineering Technology, 8(10), 441-447.

[12] A. Singh and R. Kumar, "Heart Disease Prediction Using Machine Learning Algorithms," 2020 International Conference on Electrical and Electronics Engineering (ICE3), 2020, pp. 452-457.

[13] Shah, D., Patel, S. & Bharti, S.K. Heart Disease Prediction using Machine Learning Techniques. SN COMPUT. SCI. 1, 345 (2020).

[14] Ghosh, P., Azam, S., Jonkman, M., Karim, A., Shamrat, F., & Ignatious, E. et al. (2021). Efficient Prediction of Cardiovascular Disease Using Machine Learning Algorithms With Relief and LASSO Feature Selection Techniques. IEEE Access, 9, 19304-19326.

[15] Purushottama. C, Kanak Saxenab, Richa Sharma (2016), “Efficient Heart Disease Prediction System”, Elsevier, Procedia Computer Science, No. 85, pp. 962-969.

[16] Sharanyaa, S., S. Lavanya, M. R. Chandhini, R. Bharathi, and K. Madhulekha. "Hybrid Machine Learning Techniques for Heart Disease Prediction." International Journal of Advanced Engineering Research and Science 7, no. 3 (2020), pp. 44-8.

[17] B. Keerthi Samhitha, M. R. Sarika Priya., C. Sanjana., S. C. Mana and J. Jose, "Improving the Accuracy in Prediction of Heart Disease using Machine Learning Algorithms," 2020 International Conference on Communication and Signal Processing (ICCSP), 2020, pp. 1326-1330.

[18] L. R. Guarneros-Nolasco, N. A. Cruz-Ramos, G. Alor-Hernández, L. Rodríguez-Mazahua, and J. L. Sánchez-Cervantes, “Identifying the main risk factors for cardiovascular diseases prediction using machine learning algorithms,” Mathematics, vol. 9, no. 20, p. 2537, 2021.

[19] I. Dokare, A. Prithiani, H. Ochani, S. Kanjan, and D. Tarachandani, “Prediction of having a heart disease using machine learning,” SSRN Electronic Journal, 2020.

[20] Kamil Pytlak. (2022, February). Personal Key Indicators of Heart Disease. Version 1. Retrieved July 3, 2022 from https://www.kaggle.com/datasets/kamilpytlak/personal-key-indicators-of-heart-disease.

[21] Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., & Roth, A. (2015). The reusable holdout: Preserving validity in adaptive data analysis. Science, 349(6248), 636-638. doi: 10.1126/science.aaa9375

[22] Mandrekar, J. (2010). Receiver Operating Characteristic Curve in Diagnostic Test Assessment. Journal Of Thoracic Oncology, 5(9), 1315-1316. doi: 10.1097/jto.0b013e3181ec173d

0565

Authorized licensed use limited to: CMU Libraries - library.cmich.edu. Downloaded on December 02,2022 at 16:36:35 UTC from IEEE Xplore. Restrictions apply.

Machine Learning Based Heart Disease Prediction Using Random Forest

Article

Apr 2024

. Swarna S

A recent study by the World Health Organization sheds light on the alarming increase in cardiovascular diseases, contributing to approximately 17.9 million deaths annually. This study delves into the effectiveness of employing the Random Forest algorithm, a robust machine learning approach, to forecast the likelihood of heart disease based on diverse risk factors. By leveraging a dataset encompassing demographic, clinical, and lifestyle attributes, the Random Forest model underwent training to categorize individuals into two groups: those with or without heart disease. Through meticulous feature selection and ensemble learning, the algorithm adeptly captures intricate relationships among predictors, thereby augmenting prediction accuracy. Evaluation metrics including accuracy and AUC-ROC curve were employed in order to determine model's effectiveness. Impressively, our model achieves a prediction accuracy of 97%. Moreover, a comparative analysis with other prominent machine learning models such as Naive Bayes, Support Vector Machine (SVM), Logistic Regression (LR), XGBoost, Decision Tree revealed that the Random Forest approach outperforms others in terms of accuracy and efficiency in prediction tasks. Keywords: Random Forest (RF), Machine Learning (ML), Accuracy, Classification.

Improving Heart Disease Probability Prediction Sensitivity with a Grow Network Model

Preprint

Full-text available

Mar 2024

The traditional approaches in heart disease prediction across a vast amount of data encountered a huge amount of class imbalances. Applying the conventional approaches that are available to resolve the class imbalances provides a low recall for the minority class or results in imbalance outcomes. A lightweight GrowNet-based architecture has been proposed that can obtain higher recall for the minority class using the Behavioral Risk Factor Surveillance System (BRFSS) 2022 dataset. A Synthetic Refinement Pipeline using Adaptive-TomekLinks has been employed to resolve the class imbalances. The proposed model has been tested in different versions of BRFSS datasets including BRFSS 2022, BRFSS 2021, and BRFSS 2020. The proposed model has obtained the highest specificity and sensitivity of 0.74 and 0.81 respectively across the BRFSS 2022 dataset. The proposed approach achieved an Area Under the Curve (AUC) of 0.8709. Additionally, applying explainable AI (XAI) to the proposed model has revealed the impacts of transitioning from smoking to e-cigarettes and chewing tobacco on heart disease.

Heart Disease Detection Using ML

Conference Paper

Full-text available

Mar 2023

Hearth disease is one of the leading causes of death globally and a common disease in the middle and old ages. Among all heart diseases, heart attack and strokes are the most common cardiac illness that is the responsible majority of heart disease death. To identify heart diseases, for instance, Angiography is costly and has significant side effects. Therefore, machine learning can play an important role in identifying and predicting the potential risk factor of cardiac disease based on clinical and patient data, which is affordable and reliable. This study proposed and evaluated six machine learning models using survey data of 400k US residents to predict heart disease. This study also compared the evaluated six machine learning models, which are Xgboost, Bagging, Random Forest, Decision Tree, K-Nearest Neighbor, and Naïve Bayes. The accuracy, sensitivity, F1-score, and AUC of six machine learning …

A Hybrid Technique to Predict Brain Tumour using MRI Image

Article

Full-text available

May 2024

Currently, the radiologist can more accurately identify brain tumours through the development of Computer-Assisted Diagnosis (CAD), Machine Learning and Deep Learning. Recently, Deep Learning (DL) strategies have gained traction as a means to rapidly and accurately construct automated systems for diagnosing and segmenting the image. The standard approach to this issue is to create a custom feature for classification. Most neurological diseases originate from abnormal growth of brain cells, which can compromise brain architecture and even lead to malignant brain tumours. Brain tumour detection and classification algorithms that are both quick and accurate have been the subject of extensive study. This facilitates the straight forward diagnosis of brain tumours using Magnetic Resonance Image (MRI) images. Through Deep Learning (DL) model the diagnosis of brain malignancies in MRI images using Convolutional Neural Network (CNN) is possible by training the data. So, in this paper the brain tumouris predicted byproposing a Hybridfeature extraction technique i.e., tuned CNN model with ResNet150 and U-net.

Heart Disease Prediction Using GridSearchCV and Random Forest

Article

Full-text available

Mar 2024

INTRODUCTION: This study explores machine learning algorithms (SVM, Adaboost, Logistic Regression, Naive Bayes, and Random Forest) for heart disease prediction, utilizing comprehensive cardiovascular and clinical data. Our research enables early detection, aiding timely interventions and preventive measures. Hyperparameter tuning via GridSearchCV enhances model accuracy, reducing heart disease's burdens. Methodology includes preprocessing, feature engineering, model training, and cross-validation. Results favor Random Forest for heart disease prediction, promising clinical applications. This work advances predictive healthcare analytics, highlighting machine learning's pivotal role. Our findings have implications for healthcare and policy, advocating efficient predictive models for early heart disease management. Advanced analytics can save lives, cut costs, and elevate care quality. OBJECTIVES: Evaluate the models to enable early detection, timely interventions, and preventive measures. METHODS: Utilize GridSearchCV for hyperparameter tuning to enhance model accuracy. Employ preprocessing, feature engineering, model training, and cross-validation methodologies. Evaluate the performance of SVM, Adaboost, Logistic Regression, Naive Bayes, and Random Forest algorithms. RESULTS: The study reveals Random Forest as the favored algorithm for heart disease prediction, showing promise for clinical applications. Advanced analytics and hyperparameter tuning contribute to improved model accuracy, reducing the burden of heart disease. CONCLUSION: The research underscores machine learning's pivotal role in predictive healthcare analytics, advocating efficient models for early heart disease management.

Assessing the Robustness of Machine Learning Algorithms for Cardiovascular Disease Detection Across Diverse Clinical Datasets

Conference Paper

Full-text available

Mar 2024

Cardiovascular diseases (CVD) continue to pose significant health risks globally, accentuating the need for early and precise detection mechanisms. With the evolution of computational methods in healthcare, machine learning offers transformative solutions for diagnostic accuracy. This research aims to identify an algorithm with consistent performance across multiple datasets for potential integration into a cardiac disease prediction platform. We examined nine prominent machine learning algorithms, namely Support Vector Machine (SVM), Gradient Boosting (GB), Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), K-Nearest Neighbor (KNN), Naive Bayes (NB), Extreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP), and evaluated their predictive performance across two heterogeneous datasets. Both datasets encompass 14 attributes but differ in instance sizes: 303 and 1025, respectively. Through a meticulous methodological framework, the data underwent preprocessing, splitting, and model training, followed by validation using metrics such as Precision, Recall, F1 score, and Accuracy, coupled with a confusion matrix for detailed class-based evaluation. Our findings revealed that the Random Forest and MLP algorithms exhibited superior consistency and robustness in disease prediction across both datasets, achieving a peak accuracy of 95.14%. While XGBoost performed proficiently on one dataset, its performance wavered in a cross-dataset scenario. Based on these findings, either the Random Forest or Multilayer Perceptron models are recommended for developing a robust heart disease prediction system. This research not only affirms the potential of machine learning in revolutionizing CVD diagnostics but also underscores the importance of algorithm selection based on dataset characteristics.

Distributed information fusion for secure healthcare

Chapter

Jan 2024

A Novel Heart Disease Prediction System Using XGBoost Classifier Coupled With ADASYN SMOTE

Conference Paper

Nov 2023

Brain Tumor Detection Using Convolutional Neural Networks: A Comparative Study

Article

Dec 2023

Detecting brain tumors from Magnetic Resonance Imaging (MRI) images is mundane and error-prone work for medical practitioners. Misdiagnosis of brain tumors can be life-threatening, so computational techniques can be used in concert with medical professionals to reduce it. Deep learning approaches have been gaining popularity in modeling and developing systems for medical image processing that can detect abnormalities quickly. The methods proposed herein are based on Convolutional Neural Networks (CNN) trained on the 'BR35H::Brain Tumor Detection 2020' dataset. A custom CNN architecture was designed, followed by transfer learning with four pre-trained models: InceptionV3, ResNet101, VGG19, and DenseNet169. A comparative analysis of these architectures is presented in this paper. The experimental results show that the DenseNet169 model outperformed the other models with a training accuracy of 99.83%, test accuracy of 99.66%, precision of 99.67%, and recall of 99.67%. Additionally, ResNet101 has a test accuracy of 95.92%, VGG19 has a test accuracy of 97.83%, the custom architecture has a test accuracy of 98.16%, and InceptionV3 has the lowest test accuracy of 91.66%. It is concluded that DenseNet169 provides better results for the classification of brain tumors than the other models.

Undersampling for Fairness: Achieving More Equitable Predictions in Diabetes and Prediabetes

Preprint

Full-text available

May 2023

While type 2 diabetes is predominantly found in the elderly population, recent publications indicate an increasing prevalence in the young adult population. Failing to predict it in the minority younger age group could have significant adverse effects on their health. Previous work acknowledges the bias of machine learning models towards different gender and race groups and proposes various approaches to mitigate it. However, prior work has not proposed any effective methodologies to predict diabetes in the young population, which is the minority group in the diabetic population. In this paper, we identify this deficiency in traditional machine learning models and implement double prioritization (DP) bias correction techniques to mitigate the bias towards the young population when predicting diabetes. Deviating from the traditional concept of one-model-fits-all, we train customized machine-learning models for each age group. The DP model consistently improves recall of the diabetes class by 26% to 40% in the young age group (30-44). Moreover, the DP technique outperforms 7 commonly used whole-group sampling techniques, such as random oversampling, SMOTE, and ADASYN, by at least 36% in terms of diabetes recall in the young age group. We also analyze the feature importance to investigate the source of bias in the original model. Data and Code Availability: We use a publicly available dataset called Behavioral Risk Factor Surveillance System (BRFSS) from 2021 CDC. To reproduce the result, the anonymised code has been attached as supplementary files. The code will be uploaded to a public repository upon publication. Institutional Review Board (IRB): Our research does not require IRB approval.

Heart Disease Prediction Using Logistic Regression

Article

Full-text available

Feb 2023

Keywords: Machine learning, logistic regression, Framingham dataset, heart diseases. Abstract: Myocardial infarctions and brain attacks are responsible for fatalities from cardiovascular diseases (CVDs), with deaths occurring especially before age 70. An estimated 17.9 million people die from CVDs annually. Accurate monitoring of each patient individually is not always possible, and clinicians cannot consult with patients every 24 hours due to the additional time and knowledge required. Using the patient's various cardiac characteristics and the machine learning approach of logistic regression on a publicly accessible dataset from Kaggle, we developed and examined models for predicting heart disease in this research. The main objective is to ascertain the 10-year risk of acquiring coronary heart disease (CHD). The dataset includes more than 4,000 patient records and 15 attributes. Logistic regression predicts a dependent variable from one or more independent variables and can be used for both binary and multi-class classification. This study aims to establish the most significant heart disease risk factors and estimate the overall risk using logistic regression.

Lung cancer prediction model using ensemble learning techniques and a systematic review analysis

Conference Paper

Full-text available

Dec 2022

Vocal Feature Guided Detection of Parkinson’s Disease Using Machine Learning Algorithms

Conference Paper

Full-text available

Oct 2022

Deep Learning Based Model for Alzheimer's Disease Detection Using Brain MRI Images

Conference Paper

Full-text available

Oct 2022

Heart failure survival prediction using machine learning algorithm: am I safe from heart failure?

Conference Paper

Full-text available

Jun 2022

Machine learning model evaluation for 360° video caching

Conference Paper

Full-text available

Jun 2022

360-degree virtual reality videos enhance the viewing experience by giving a more immersive and interactive environment compared to traditional videos. These videos require large bandwidth to transmit. Typically, viewers observe only a part of the entire 360° video, called the field of view (FoV). Edge caching can be a good solution to optimize bandwidth utilization as well as improve user quality of experience (QoE). In this research, three machine learning models utilizing random forest, linear regression, and Bayesian regression have been proposed to develop a 360° video caching algorithm. Tile frequency, user's view prediction probability, and tile resolution have been used as features. The purpose of the developed machine learning models is to determine the caching strategy of 360-degree video tiles. The models are capable of predicting the viewing frequency of 360-degree video tiles (subsets of a full video). We have compared the results of the three developed models, and the results show that the random forest regression model outperforms the other proposed models with a predictive R² value of 0.79.

360 Degree Video Caching with LRU & LFU

Conference Paper

Full-text available

Dec 2021

360-degree videos, which provide a means to enjoy virtual reality, have gained in popularity among people around the world. They allow users to view video scenes at any angle while watching videos. 360-degree video caching at the edge server can be a good solution to minimize the bandwidth cost and to deliver the video with less latency. Popular video contents can be divided into tiles which are cached at the edge server in a potential 360-degree video streaming system. In this research, a system architecture for 360-degree video caching has been proposed, and video caching has been performed using the Least Recently Used (LRU) and Least Frequently Used (LFU) algorithms. Recency and frequency are used for cache eviction. In the experiment, 48 users' head movement data is utilized in a sequential and randomized order for two 360-degree videos, and caching is compared between the LRU cache and LFU cache by varying cache size. The results show that the average cache hit rate is greater when using LFU caching as compared to LRU caching for a varying cache size.

Identifying the Main Risk Factors for Cardiovascular Diseases Prediction Using Machine Learning Algorithms

Article

Full-text available

Oct 2021

Cardiovascular Diseases (CVDs) are a leading cause of death globally. In CVDs, the heart is unable to deliver enough blood to other body regions. As an effective and accurate diagnosis of CVDs is essential for CVD prevention and treatment, machine learning (ML) techniques can be effectively and reliably used to discern patients suffering from a CVD from those who do not suffer from any heart condition. Namely, machine learning algorithms (MLAs) play a key role in the diagnosis of CVDs through predictive models that allow us to identify the main risk factors influencing CVD development. In this study, we analyze the performance of ten MLAs on two datasets for CVD prediction and two for CVD diagnosis. Algorithm performance is analyzed on top-two and top-four dataset attributes/features with respect to five performance metrics (accuracy, precision, recall, F1-score, and ROC-AUC) using the train-test split technique and k-fold cross-validation. Our study identifies the top-two and top-four attributes from the CVD datasets by analyzing accuracy to determine which are best for predicting and diagnosing CVD. Our main finding is that the ten ML classifiers achieved appropriate classification and predictive performance, in terms of accuracy, with the top-two attributes, identifying three main attributes for the diagnosis and prediction of CVDs such as arrhythmia and tachycardia; hence, they can be successfully implemented to improve current CVD diagnosis efforts and help patients around the world, especially in regions where medical staff is lacking.

Efficient Prediction of Cardiovascular Disease Using Machine Learning Algorithms With Relief and LASSO Feature Selection Techniques

Article

Full-text available

Jan 2021

Cardiovascular diseases are among the most common serious illnesses affecting human health. CVDs may be prevented or mitigated by early diagnosis, and this may reduce mortality rates. Identifying risk factors using machine learning models is a promising approach. We would like to propose a model that incorporates different methods to achieve effective prediction of heart disease. For our proposed model to be successful, we have used efficient Data Collection, Data Pre-processing and Data Transformation methods to create accurate information for the training model. We have used a combined dataset (Cleveland, Long Beach VA, Switzerland, Hungarian and Stat log). Suitable features are selected by using the Relief, and Least Absolute Shrinkage and Selection Operator (LASSO) techniques. New hybrid classifiers like Decision Tree Bagging Method (DTBM), Random Forest Bagging Method (RFBM), K-Nearest Neighbors Bagging Method (KNNBM), AdaBoost Boosting Method (ABBM), and Gradient Boosting Boosting Method (GBBM) are developed by integrating the traditional classifiers with bagging and boosting methods, which are used in the training process. We have also instrumented some machine learning algorithms to calculate the Accuracy (ACC), Sensitivity (SEN), Error Rate, Precision (PRE) and F1 Score (F1) of our model, along with the Negative Predictive Value (NPR), False Positive Rate (FPR), and False Negative Rate (FNR). The results are shown separately to provide comparisons. Based on the result analysis, we can conclude that our proposed model produced the highest accuracy while using RFBM and Relief feature selection methods (99.05%).

Heart Disease Prediction Model using Machine Learning

Conference Paper

Dec 2022
