Enhancing Heart Disease Prediction Accuracy: A Comparative Study of Machine Learning Models with Ensemble Method

June 2024

10(3):4827-4833

Authors:

Chandra Shekhar Gautam

AKS University Satna

Akhilesh A. Waoo

AKS University, Satna

Vol-10 Issue-3 2024 IJARIIE-ISSN(O)-2395-4396

24234 ijariie.com 4827

Sanjana Chaudhari, Mr. Chandra Shekhar Gautam, Dr. Akhilesh A. Waoo

AKS University, Satna (M.P.), India

sanjanachaudhari7828@gmail.com

AKS University, Satna (M.P.), India

Shekharg84@gmail.com

AKS University, Satna (M.P.), India

akhileshwaoo@gmail.com

ABSTRACT

Heart disease remains a critical global health concern, driving mortality rates and presenting challenges for early detection and treatment. Leveraging modern medical advancements, our study employs a multifaceted approach integrating electronic health records and online-connected regulators with wearable medical sensors. We utilize data mining techniques to efficiently process the continuous stream of human-generated health data, focusing on accurate classification for early heart disease detection. Our methodology encompasses meticulous data pre-processing, including missing value imputation, normalization, and categorical feature encoding. We employ a diverse array of machine learning algorithms, ranging from traditional logistic regression to advanced methods like random forests and support vector machines, optimizing them through rigorous experimentation and hyper-parameter tuning. Crucially, we emphasize feature selection to identify the most influential predictors of heart disease risk. Evaluation metrics such as accuracy, precision, recall, F1 score, and AUC-ROC underscore the effectiveness of our models, highlighting significant performance advantages for certain algorithms.

Keywords: Heart Disease, Machine Learning, Ensemble Methods, LR, RF, SVM, NB.

INTRODUCTION

Heart disease is the leading cause of death worldwide, claiming approximately 17.9 million lives each year, or 31% of all deaths, according to the World Health Organization (WHO) [1]. The term covers conditions such as heart failure, hypertension, and coronary artery disease, which affect both the heart and the blood vessels. Identifying individuals at risk is crucial for implementing preventive measures and timely intervention. Traditional risk assessment methods rely on factors such as blood pressure, cholesterol levels, and family medical history, and may not fully capture the complexity of an individual's risk profile. Other estimates put annual deaths from cardiovascular diseases at approximately 17.5 million, with over 75% of these deaths occurring in low- and middle-income nations and 80% attributed to strokes and heart attacks [2]. India faces a concerning trend: the number of cardiovascular disease cases rises each year, affecting an estimated 30 million people annually.

Heart diseases, or cardiovascular diseases (CVD), encompass various types, including coronary heart disease, arteriosclerosis, rheumatic heart disease, congenital heart disease, myocarditis, angina pectoris, and cardiac arrhythmias. The risk factors associated with heart disease underscore the importance of preventive measures. These factors can be divided into modifiable and non-modifiable categories. Non-modifiable risk factors, which cannot be changed and are often the main contributors to heart disease, include gender, age, and heredity. Modifiable risk factors, on the other hand, relate to habits, stress, diet, and various biochemical factors.

LITERATURE REVIEW

Abhishek Taneja et al. [3] developed a heart disease prediction system using data mining techniques and several supervised machine learning algorithms (J48, Naïve Bayes, and Multilayer Perceptron) in the WEKA machine learning software, evaluated with 10-fold cross-validation. The study found that J48 outperformed Naïve Bayes and the neural network, depending on the nature of the dataset.

Priti Chandra et al. [4] explored computational intelligence techniques for early diagnosis of heart disease using WEKA and 10-fold cross-validation. Their Naïve Bayes classifier achieved an accuracy of 86.29%, which was deemed satisfactory but not optimal for automated heart disease diagnosis.

Ashok Kumar Dwivedi et al. [5] evaluated several machine learning techniques for heart disease prediction using 10-fold cross-validation. The study compared Naïve Bayes, Classification Tree, KNN, Logistic Regression, SVM, and ANN, with Logistic Regression achieving the highest accuracy.

Bo Jin, Chao Che, et al. (2018) proposed a neural network model in "Predicting the Risk of Heart Failure with EHR Sequential Data Modeling". The study used real-world electronic health record (EHR) data on congestive heart disease to predict the onset of the condition in advance. The authors used one-hot encoding and word vectors to model diagnosis events, and predicted heart failure events using the basic principles of a long short-term memory (LSTM) network model. Their findings underscore the importance of respecting the chronological order of medical records [6].

Senthilkumar Mohan, Chandrasegar Thirumalai, and their collaborators (2019) proposed an efficient hybrid machine learning technique in "Effective Heart Disease Prediction Using Hybrid Machine Learning Techniques". The hybrid approach combines a random forest with a linear model. Specific attributes were selected from the pre-processed cardiovascular disease dataset to form the subset used for prediction modeling. After pre-processing, the hybrid technique was applied to diagnose cardiovascular disease [7].

METHODOLOGY

A systematic approach is used to improve the accuracy of heart disease prediction through machine learning techniques. First, the dataset undergoes data preprocessing, including handling missing values and normalizing features with Min-Max scaling. Correlation analysis, visualized through bar plots, is then conducted to identify relationships between the features and the target variable. Next, the dataset is split into training and testing subsets to support model evaluation. To ensure robustness, cross-validation techniques such as k-fold cross-validation are applied. Ensemble methods such as Random Forest (RF) are employed alongside traditional algorithms, including Logistic Regression (LR), Support Vector Machines (SVM), and Naive Bayes (NB), to draw on diverse modeling approaches. Evaluation metrics such as accuracy, precision, recall, and area under the ROC curve (AUC) are used to gauge model performance.
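The workflow described above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the authors' code: the data are synthetic (generated with `make_classification` as a stand-in for the 11-feature heart disease dataset), and the 80/20 split, the random seeds, and the model settings are assumptions made only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 11 features, binary target (assumption).
X, y = make_classification(n_samples=300, n_features=11, n_informative=6, random_state=0)

# Hold out a test set; the 80/20 ratio is an assumption, not taken from the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=100, criterion="entropy", random_state=0),
    "SVM": SVC(),
    "NB": GaussianNB(),
}

# 10-fold cross-validation on the training split; Min-Max scaling is fitted
# inside each fold because it is part of the pipeline.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_accuracy = {}
for name, model in models.items():
    pipeline = make_pipeline(MinMaxScaler(), model)
    scores = cross_val_score(pipeline, X_train, y_train, cv=cv, scoring="accuracy")
    cv_accuracy[name] = scores.mean()

# Refit the best cross-validated model on the full training split and
# score it once on the held-out test split.
best_name = max(cv_accuracy, key=cv_accuracy.get)
best_pipeline = make_pipeline(MinMaxScaler(), models[best_name]).fit(X_train, y_train)
test_accuracy = best_pipeline.score(X_test, y_test)
```

Placing the scaler inside the pipeline keeps the scaling parameters from being computed on validation folds, which would otherwise leak information into the cross-validation estimate.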

1. Data Source

The dataset, available from IEEE DataPort, offers detailed information on human heart health, comprising 11 features and a target variable. The features include 6 nominal and 5 numeric attributes. The "target" attribute indicates the presence of heart disease, with 0 denoting absence and 1 denoting presence. The attributes are described below:

1. Age: Patient's age in years (numeric)
2. Sex: Gender of the patient (1 for male, 0 for female) (nominal)
3. Type of chest pain: Categorized as typical angina, atypical angina, non-anginal pain, or asymptomatic (nominal)
4. Resting blood pressure: Resting blood pressure measured in mm Hg (numeric)
5. Cholesterol: Serum cholesterol level measured in mg/dl (numeric)
6. Fasting blood sugar: Whether fasting blood sugar exceeds 120 mg/dl (1 for true, 0 for false) (nominal)
7. Resting ECG: Result of the resting electrocardiogram, categorized as normal, ST-T wave abnormality, or left ventricular hypertrophy (nominal)
8. Maximum heart rate: Maximum heart rate achieved (numeric)
9. Exercise angina: Presence of exercise-induced angina (1 for yes, 0 for no) (nominal)
10. Old peak: ST segment depression induced by exercise relative to rest (numeric)
11. ST slope: Slope of the ST segment during peak exercise, categorized as normal, up-sloping, flat, or down-sloping (nominal)
12. Target: The target variable indicates the patient's risk for heart disease (1 for at-risk, 0 for healthy).
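The multi-category nominal attributes above (chest pain type, resting ECG, ST slope) have to be encoded numerically before training. The sketch below shows one common option, one-hot encoding, in plain Python. The helper `one_hot`, the sample records, and the `<attribute>_<category>` column naming are illustrative assumptions; the naming merely mirrors feature names such as `st_slope_flat` that appear later in the paper.

```python
def one_hot(records, column, categories):
    """Replace `column` with one 0/1 indicator per category, named '<column>_<category>'."""
    encoded = []
    for record in records:
        # Copy every field except the nominal column being encoded.
        row = {key: value for key, value in record.items() if key != column}
        for category in categories:
            row[f"{column}_{category}"] = 1 if record[column] == category else 0
        encoded.append(row)
    return encoded


# Made-up patient records, for illustration only.
patients = [
    {"age": 54, "st_slope": "flat"},
    {"age": 41, "st_slope": "up-sloping"},
]
ST_SLOPE_LEVELS = ["normal", "up-sloping", "flat", "down-sloping"]

encoded = one_hot(patients, "st_slope", ST_SLOPE_LEVELS)
```

Each record now carries four indicator columns in place of the single `st_slope` field, exactly one of which is 1.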

Figure: Diagram of Proposed Methodology

2. Data preprocessing

Data preprocessing is crucial for both data analysis and training machine learning models. Normalization brings features measured on different scales into a common range, much as temperatures recorded in Celsius and Fahrenheit must be converted to one unit before they can be compared; Min-Max scaling, for example, maps each feature to the interval [0, 1]. Standardization instead rescales each feature to a mean of 0 and a standard deviation of 1, so that values express deviations from the mean, which can improve classifier performance.
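To make the difference between the two rescaling schemes concrete, here is a small sketch using only the Python standard library. The cholesterol values are made up for illustration, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Map values linearly onto [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Made-up serum cholesterol readings in mg/dl.
cholesterol = [180, 200, 240, 300]

scaled = min_max_scale(cholesterol)   # smallest value maps to 0, largest to 1
z_scores = standardize(cholesterol)   # values expressed as deviations from the mean
```

In practice the minimum, maximum, mean, and standard deviation should be computed on the training split only and then reused for the test split.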

3. In this step, hyperparameter tuning is used to find hyperparameter values that improve accuracy. Using the GridSearchCV class from Scikit-learn, we systematically explore combinations of hyperparameter values, which are fixed before a classifier is trained, to identify the best settings. The fit method of GridSearchCV handles this process for each algorithm within a unified framework: for every candidate combination, 10-fold cross-validation repeatedly trains and evaluates the model on different subsets of the training data, and the combination with the highest cross-validated classification accuracy is selected. Once the optimal hyperparameter values are determined, the entire training dataset is used to fit the final model.
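A minimal sketch of this tuning step with scikit-learn's `GridSearchCV` is shown below. The parameter grid, the number of trees, and the synthetic data are assumptions chosen to keep the example small; the paper does not list the grids it searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training split (assumption).
X_train, y_train = make_classification(
    n_samples=200, n_features=11, n_informative=6, random_state=0
)

# Hypothetical grid: every combination of these values is evaluated.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, None],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=25, random_state=0),
    param_grid=param_grid,
    cv=10,               # 10-fold cross-validation for each combination
    scoring="accuracy",
    refit=True,          # refit the best combination on all training data
)
search.fit(X_train, y_train)

best_params = search.best_params_        # winning hyperparameter combination
best_cv_accuracy = search.best_score_    # its mean cross-validated accuracy
final_model = search.best_estimator_     # model refitted on the full training set
```

With `refit=True`, `best_estimator_` corresponds to the final model trained on the entire training dataset, as the step above describes.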

[Flowchart of the proposed methodology; box labels: Heart Disease Dataset, Data Preprocessing, Training Split, Testing Split, Cross Validation, Classification Techniques, Evaluation Methods]
4. In this step, we apply the machine learning algorithms: AdaBoost, logistic regression, extra trees, multinomial Naive Bayes, support vector machine, linear discriminant analysis, classification and regression tree (CART), random forest, and XGBoost.

5. In this step, the performance of each prediction model is assessed using a range of metrics, including accuracy, precision, recall, and F-measure. The model selected is the one that attains the highest values across these metrics.

6. Performance metrics

Performance metrics in machine learning assess the effectiveness of an algorithm against criteria such as accuracy, precision, and sensitivity.

A confusion matrix aids in evaluating a model's performance by organizing classification outcomes according to actual and predicted values. It distinguishes true positives (TP, positive cases correctly identified as positive), false positives (FP, negative cases incorrectly identified as positive), false negatives (FN, positive cases incorrectly identified as negative), and true negatives (TN, negative cases correctly identified as negative).

Accuracy measures the proportion of all predictions made by a classifier that are correct:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision measures the correctness of positive predictions, as the ratio of true positives to all positive predictions:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Sensitivity (recall) measures the model's ability to identify actual positive cases, as the ratio of true positives to all actual positive cases:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

The F-measure (F1 score) balances precision and sensitivity and is calculated as their harmonic mean:

$$F_1\ \text{Score} = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$

The main evaluation measures for this problem area are sensitivity, specificity, precision, F1 measure, and the area under the ROC curve (ROC-AUC). In addition to these, we use two further performance measures that are more reliable statistically, namely the Matthews Correlation Coefficient (MCC) and Log Loss.
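As a worked example, the confusion-matrix metrics can be computed directly from the four counts. The counts used below (TP = 118, FP = 15, FN = 5, TN = 97) are not reported in the paper; they are an assumption, chosen because they reproduce the accuracy, precision, sensitivity, and F1 values listed for Random Forest in the results table when rounded. Specificity and MCC are computed with their standard definitions, TN / (TN + FP) and (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute confusion-matrix metrics from raw counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Assumed counts (not from the paper); see the note above.
metrics = classification_metrics(tp=118, fp=15, fn=5, tn=97)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix symmetrically, which is why it is often preferred when the two classes are not equally frequent.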

RESULT AND DISCUSSION

Following the systematic approach described above, our findings show clear improvements in heart disease prediction accuracy. Although algorithms such as SVC and Decision Trees are widely adopted for diagnosing heart disease, KNN, the Random Forest Classifier, and Logistic Regression outperform them in our experiments [12]. These algorithms not only exhibit superior accuracy but also offer cost-efficiency and faster processing than earlier methodologies. Notably, KNN and Logistic Regression achieve maximum accuracies of 88.5%, matching or surpassing those reported in prior studies. After training and evaluating ten machine learning models and comparing their performance, the Random Forest model using the entropy criterion consistently outperforms the others, achieving an accuracy of 90.63%. Additionally, after a majority-vote feature selection technique that combines several selection methods, the Random Forest model remains the best performer, with an accuracy of 89.36%. This is a decrease of about 1.3 percentage points relative to its accuracy before feature selection.

Model          Accuracy (%)  Precision (%)  Sensitivity (%)  F1 Score  ROC       Log Loss
Random Forest  91.4894       88.7218        95.935           0.921875  0.912711  3.067545

The Random Forest model exhibited the highest performance during cross-validation.

Fig -1 Performance of RF Model

Fig -2 Comparison with Other Models

[Chart comparing Random Forest, MLP, KNN, Extra Tree Classifier, XGB, SVC, SGD, AdaBoost, CART, and GBM on Accuracy, Precision, Sensitivity, Specificity, F1 Score, ROC, and Log Loss]

The above results indicate that the XGBoost Classifier performs best, achieving the highest test accuracy of 0.9191, a sensitivity of 0.943, a specificity of 0.89, the highest F1-score of 0.9243, and the lowest Log Loss of 2.792. Random Forest, on the other hand, attained the highest sensitivity, 95.122%.

Only 11 features were selected by at least one feature selection method. Notably, fasting_blood_sugar, chest_pain_type_typical angina, rest_ecg_left ventricular hypertrophy, and rest_ecg_normal are absent from the table because none of the feature selection methods deemed them important.

The top 6 features, selected on the basis of a majority vote of 5 out of 6 feature selection methods, are st_slope_flat, st_depression, max_heart_rate_achieved, exercise_induced_angina, cholesterol, and age. The machine learning models are then retrained using these 6 features, and their performance is compared to assess any drop in performance after feature selection.
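The majority-vote idea can be sketched in a few lines of Python. The paper does not name the six selection methods or report their individual votes, so the method names and votes below are entirely made up; the sketch also assumes that "5 out of 6" means a feature is kept when at least five methods select it.

```python
from collections import Counter


def majority_vote(selections, min_votes):
    """Keep features chosen by at least `min_votes` of the selection methods."""
    votes = Counter(
        feature
        for selected in selections.values()
        for feature in set(selected)      # each method votes once per feature
    )
    return sorted(feature for feature, count in votes.items() if count >= min_votes)


# Hypothetical votes from six hypothetical feature selection methods.
selections = {
    "method_1": ["age", "cholesterol", "st_slope_flat"],
    "method_2": ["age", "st_slope_flat"],
    "method_3": ["age", "st_slope_flat", "sex"],
    "method_4": ["age", "cholesterol", "st_slope_flat"],
    "method_5": ["age", "st_slope_flat"],
    "method_6": ["cholesterol", "sex"],
}

kept = majority_vote(selections, min_votes=5)
selected_by_any = majority_vote(selections, min_votes=1)
```

Lowering `min_votes` to 1 gives the set of features chosen by at least one method, which corresponds to the broader list of features discussed above.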

CONCLUSION

In this study, we employed 10-fold cross-validation to evaluate the performance of 13 machine learning algorithms with varying hyperparameters. We then trained and assessed these models on the test set, ultimately identifying the top three performers. Notably, a stacked ensemble of powerful algorithms demonstrated superior performance compared to individual models. Specifically, the Random Forest model using the entropy criterion achieved the highest accuracy, 90.63%. Even after feature selection, the Random Forest model remained the top performer, with only a small decrease in accuracy. Feature importance analysis highlighted Cholesterol, Max Heart Rate achieved, and ST Depression as the most influential features.

REFERENCE

[1] "Cardiovascular diseases (CVDs)," WHO, 2020, https://www.who.int/zh/news-room/factsheets/detail/cardiovascular-diseases-(cvds).

[2] R. Rao, survey of the prediction of heart morbidity using data mining techniques, Knowledge Management, vol. 1, no. 3, pp. 14–34, 2011.

[3] A. Taneja, "Heart Disease Prediction System Using Data Mining Techniques," Oriental Journal, 2013.

[4] N. O. Fowler, paper on the diagnosis of heart disease, vol. V, journal unspecified, March 2012.

[5] A. K. Dwivedi, "Analysis of Computational Intelligence Techniques for Diabetes Mellitus Prediction," Neural Computing Applications, 2017.

[6] B. Jin, C. Che, Z. Liu, S. Zhang, X. Yin, and X. Wei, "Predicting the Risk of Heart Failure With EHR Sequential Data Modelling," IEEE Access, 2018.

[7] S. Mohan, C. Thirumalai, and G. Srivastava, "Effective Heart Disease Prediction Using Hybrid Machine Learning Techniques," IEEE Access, 2019.

[8] A. Lakshmanarao, Y. Swathi, and P. Sri Sai Sundareswar, study of machine learning techniques for predicting heart disease, International Journal of Scientific & Technology Research, vol. 8, no. 11, November 2019.

[9] M. M. Mijwil, A. H. Al-Mistarehi, and D. S. Mutur, "The Practices of Artificial Intelligence Techniques and Their Value in Addressing the COVID-19 Pandemic," Mobile Forensics Journal, vol. 4, no. 1, pp. 11–30, 2022.

[10] A. Taneja, "Heart Disease Prediction System Using Data Mining Techniques," Oriental Journal, 2013.

[11] N. O. Fowler, paper on the diagnosis of heart disease, vol. V, journal unspecified, March 2012.

[12] A. K. Dwivedi, "Analysis of Computational Intelligence Techniques for Diabetes Mellitus Prediction," Neural Computing Applications, 2017.

[13] M. Shahi and R. Kaur Gurm, heart disease prediction system using data mining techniques, Orient Journal of Computer Science and Technology, 2013.

[14] S. M. S. Shah et al., "Feature extraction through parallel Probabilistic Principal Component Analysis for heart disease diagnosis," Physica A: Statistical Mechanics and its Applications, 2017.

[15] T. Karthikeyan, B. Raghavan, and V. A. Kanimozhi, study of data mining classification algorithms for heart disease prediction, International Journal of Advanced Research in Computer Engineering and Technology, vol. 5, no. 4, pp. 1076–1081, 2016.

[16] A. R. Folsom, R. J. Prineas, S. A. Kaye, and J. T. Soler, study of the association between body fat distribution and the self-reported prevalence of hypertension, heart attack, and other heart diseases in older women, 1989.

[17] S. Gour, P. Panwar, D. Dwivedi, and C. Mali, "A Machine Learning Approach for Heart Attack Prediction," in Intelligent Sustainable Systems, Springer, Singapore, pp. 741–747, 2022.

[18] A. Konda Babu, V. Siddhartha, B. B. Kumar, and B. Penumutchi, "A comparative study on machine learning-based heart disease prediction," Materials Today: Proceedings, 2021.

[19] E. F. Gudmundsson, G. Bjornsdottir, S. Sigurdsson, K. Andersen, B. Thorsson, T. Aspelund, and V. Gudnason, population-based cohort study finding that carotid plaque correlates significantly with coronary artery calcium and predicts incident coronary heart disease, 2022.

[19] A. Jagtap, P. Malewadkar, O. Baswat, and H. Rambade, "Heart disease prediction using machine learning," International Journal of Research in Engineering, Science, and Management, 2019.

[20] U. N. Dulhare, "Prediction system for heart disease using naive Bayes and particle swarm optimization," Biomedical Research Journal, 2018.

[21] J. K. Kim and S. Kang, "Neural network-based coronary heart disease risk prediction using feature correlation analysis," Journal of Healthcare Engineering, 2017.

[22] K. C. Siontis, P. A. Noseworthy, Z. I. Attia, and A. Paul, "Artificial intelligence-enhanced electrocardiography in cardiovascular disease management," Nature Reviews Cardiology, 2021.

[23] P. S. Linda, W. Yin, P. A. Gregory, Z. Amanda, and G. Margaux, "Development of a Novel Clinical Decision Support System for Exercise Prescription among Patients with multiple cardiovascular disease risk factors," Mayo Clinic Proceedings: Innovations, Quality & Outcomes, 2021.

[24] Y. Ali, R. Amir, and A.-M. Fardin, "Profile-based assessment of diseases affecting factors using fuzzy association rule mining approach: a case study in heart diseases," Journal of Biomedical Informatics, 2021.

[25] S. Ghwanmeh, A. Mohammad, and A. Al-Ibrahim, "Innovative artificial neural network-based decision support system for heart disease diagnosis," Journal of Intelligent Learning Systems and Applications, vol. 5, no. 3, pp. 353–396, 2013.

[26] Q. K. Al-Shayea, "Artificial neural networks in medical diagnosis," International Journal of Computer Science Issues, vol. 8, no. 2, pp. 150–154, 2011.

ResearchGate has not been able to resolve any citations for this publication.

Effective Heart Disease Prediction Using Hybrid Machine Learning Techniques

Article

Full-text available

Jun 2019

Heart disease is one of the most significant causes of mortality in the world today. Prediction of cardiovascular disease is a critical challenge in the area of clinical data analysis. Machine learning has been shown to be effective in assisting in making decisions and predictions from the large quantity of data produced by the healthcare industry. We have also seen machine learning (ML) techniques being used in recent developments in different areas of Internet of Things (IoT). Various studies give only a glimpse into predicting heart disease with machine learning techniques. In this paper, we propose a novel method that aims at finding significant features by applying machine learning techniques resulting in improving the accuracy in the prediction of cardiovascular disease. The prediction model is introduced with different combinations of features, and several known classification techniques. We produce an enhanced performance level with accuracy level of 88.7% through the prediction model for heart disease with Hybrid Random Forest with Linear Model (HRFLM).

Neural Network-Based Coronary Heart Disease Risk Prediction Using Feature Correlation Analysis

Article

Full-text available

Sep 2017

Background Of the machine learning techniques used in predicting coronary heart disease (CHD), neural network (NN) is popularly used to improve performance accuracy. Objective Even though NN-based systems provide meaningful results based on clinical experiments, medical experts are not satisfied with their predictive performances because NN is trained in a “black-box” style. Method We sought to devise an NN-based prediction of CHD risk using feature correlation analysis (NN-FCA) using two stages. First, the feature selection stage, which makes features acceding to the importance in predicting CHD risk, is ranked, and second, the feature correlation analysis stage, during which one learns about the existence of correlations between feature relations and the data of each NN predictor output, is determined. Result Of the 4146 individuals in the Korean dataset evaluated, 3031 had low CHD risk and 1115 had CHD high risk. The area under the receiver operating characteristic (ROC) curve of the proposed model (0.749 ± 0.010) was larger than the Framingham risk score (FRS) (0.393 ± 0.010). Conclusions The proposed NN-FCA, which utilizes feature correlation analysis, was found to be better than FRS in terms of CHD risk prediction. Furthermore, the proposed model resulted in a larger ROC curve and more accurate predictions of CHD risk in the Korean population than the FRS.

Artificial Neural Networks in Medical Diagnosis

Article

Full-text available

Feb 2011

Qeethara Al-Shayea

Artificial neural networks are finding many uses in the medical diagnosis application. The goal of this paper is to evaluate artificial neural network in disease diagnosis. Two cases are studied. The first one is acute nephritis disease; data is the disease symptoms. The second is the heart disease; data is on cardiac Single Proton Emission Computed Tomography (SPECT) images. Each patient classified into two categories: infected and non-infected. Classification is an important tool in medical diagnosis decision support. Feed-forward back propagation neural network is used as a classifier to distinguish between infected or non-infected person in both cases. The results of applying the artificial neural networks methodology to acute nephritis diagnosis based upon selected symptoms show abilities of the network to learn the patterns corresponding to symptoms of the person. In this study, the data were obtained from UCI machine learning repository in order to diagnosed diseases. The data is separated into inputs and targets. The targets for the neural network will be identified with 1's as infected and will be identified with 0's as non-infected. In the diagnosis of acute nephritis disease; the percent correctly classified in the simulation sample by the feed-forward back propagation network is 99 percent while in the diagnosis of heart disease; the percent correctly classified in the simulation sample by the feed-forward back propagation network is 95 percent.

Innovative Artificial Neural Networks-Based Decision Support System for Heart Diseases Diagnosis

Article

Full-text available

Jan 2013

Heart diagnosis is not always possible at every medical center, especially in the rural areas where less support and care, due to lack of advanced heart diagnosis equipment. Also, physician intuition and experience are not always sufficient to achieve high quality medical procedures results. Therefore, medical errors and undesirable results are reasons for a need for unconventional computer-based diagnosis systems, which in turns reduce medical fatal errors, increasing the patient safety and save lives. The proposed solution, which is based on an Artificial Neural Networks (ANNs), provides a decision support system to identify three main heart diseases: mitral stenosis, aortic stenosis and ventricular septal defect. Furthermore, the system deals with an encouraging opportunity to develop an operational screening and testing device for heart disease diagnosis and can deliver great assistance for clinicians to make advanced heart diagnosis. Using real medical data, series of experiments have been conducted to examine the performance and accuracy of the proposed solution. Compared results revealed that the system performance and accuracy are acceptable, with a heart diseases classification accuracy of 92%.

A comparative study on machine learning based heart disease prediction

Article

Feb 2021

This article has been withdrawn: please see Elsevier Policy on Article Withdrawal (https://www.elsevier.com/about/our-business/policies/article-withdrawal). This article has been withdrawn as part of the withdrawal of the Proceedings of the International Conference on Emerging Trends in Materials Science, Technology and Engineering (ICMSTE2K21). Subsequent to acceptance of these Proceedings papers by the responsible Guest Editors, Dr S. Sakthivel, Dr S. Karthikeyan and Dr I. A. Palani, several serious concerns arose regarding the integrity and veracity of the conference organisation and peer-review process. After a thorough investigation, the peer-review process was confirmed to fall beneath the high standards expected by Materials Today: Proceedings. The veracity of the conference also remains subject to serious doubt and therefore the entire Proceedings has been withdrawn in order to correct the scholarly record.

Profile-Based Assessment of Diseases Affective Factors Using Fuzzy Association Rule Mining Approach: A Case Study in Heart Diseases

Article

Feb 2021

The existing data mining solutions to identify risk factors associated with diseases are burdened with quite a few shortcomings. They usually use crisp partitions for numerical features and also do not use patient-specific profiles. These shortcomings create limitations for solving real problems. Discretizing a numerical feature through crisp partitions can also generate substantial partitioning errors, particularly for features whose values are closer to crisp boundaries. Since the normal range of each numerical feature varies according to the age, gender, and medical conditions of the patients, then ignoring these differences can undermine the accuracy of the extracted itemsets and rules. This paper presents a profile-based fuzzy association rule mining (PB-FARM) approach for the assessment of risk factors highly correlated with diseases. The proposed approach has three phases. Phase I involves creating profiles for patients based on their age, gender, and medical conditions, to determine a normal range of each numerical feature. Then fuzzy partitioning is done for all features (namely, numerical and categorical), and consequently, a structure, called FirstScan, is created. In Phase II, the FirstScan structure is utilized to mine for large fuzzy k-itemsets. Ultimately, in Phase III, the given k-itemsets are employed to generate fuzzy rules for associations between risk factors and diseases. To evaluate the performance of the proposed method the Z-Alizadeh Sani coronary artery disease (CAD) dataset, containing 303 records and 54 features, was used. The results show a positive correlation between typical chest pain and old age with the incidence of CAD. The comparisons made in this study showed that, firstly, the proposed algorithm has a higher partitioning accuracy than other methods, and secondly, it has a reasonably short execution time

Artificial intelligence-enhanced electrocardiography in cardiovascular disease management

Article

Feb 2021

The application of artificial intelligence (AI) to the electrocardiogram (ECG), a ubiquitous and standardized test, is an example of the ongoing transformative effect of AI on cardiovascular medicine. Although the ECG has long offered valuable insights into cardiac and non-cardiac health and disease, its interpretation requires considerable human expertise. Advanced AI methods, such as deep-learning convolutional neural networks, have enabled rapid, human-like interpretation of the ECG, while signals and patterns largely unrecognizable to human interpreters can be detected by multilayer AI networks with precision, making the ECG a powerful, non-invasive biomarker. Large sets of digital ECGs linked to rich clinical data have been used to develop AI models for the detection of left ventricular dysfunction, silent (previously undocumented and asymptomatic) atrial fibrillation and hypertrophic cardiomyopathy, as well as the determination of a person’s age, sex and race, among other phenotypes. The clinical and population-level implications of AI-based ECG phenotyping continue to emerge, particularly with the rapid rise in the availability of mobile and wearable ECG technologies. In this Review, we summarize the current and future state of the AI-enhanced ECG in the detection of cardiovascular disease in at-risk populations, discuss its implications for clinical decision-making in patients with cardiovascular disease and critically appraise potential limitations and unknowns.

Feature extraction through parallel Probabilistic Principal Component Analysis for heart disease diagnosis

Article

Apr 2017
PHYSICA A

Automatic diagnosis of human diseases is mostly achieved through decision support systems. The performance of these systems is mainly dependent on the selection of the most relevant features. This becomes harder when the dataset contains missing values for the different features. Probabilistic Principal Component Analysis (PPCA) has a reputation for dealing with the problem of missing attribute values. This research presents a methodology which uses the results of medical tests as input, extracts a reduced dimensional feature subset and provides diagnosis of heart disease. The proposed methodology extracts high impact features in a new projection by using PPCA. PPCA extracts the projection vectors which contribute to the highest covariance, and these projection vectors are used to reduce the feature dimension. The selection of projection vectors is done through Parallel Analysis (PA). The feature subset with the reduced dimension is provided to radial basis function (RBF) kernel based Support Vector Machines (SVM). The RBF based SVM serves the purpose of classification into two categories, i.e., Heart Patient (HP) and Normal Subject (NS). The proposed methodology is evaluated through accuracy, specificity and sensitivity over three UCI datasets, i.e., Cleveland, Switzerland and Hungarian. The statistical results achieved through the proposed technique are presented in comparison to the existing research showing its impact. The proposed technique achieved an accuracy of 82.18%, 85.82% and 91.30% for the Cleveland, Hungarian and Switzerland datasets, respectively.

A Study on Data mining Classification Algorithms in Heart Disease Prediction

Article

Apr 2016

Cardiovascular diseases (CVDs)

Jan 2020

"Cardiovascular diseases (CVDs)." WHO, 2020, https://www.who.int/zh/newsroom/factsheets/detail/cardiovascular-diseases-(cvds).
