
Enhancing Heart Disease Prediction Accuracy: A Comparative Study of Machine Learning Models with Ensemble Method

Authors:
  • AKS University Satna

Abstract

Heart disease remains a critical global health concern, driving mortality rates and presenting challenges for early detection and treatment. Leveraging modern medical advancements, our study employs a multifaceted approach integrating electronic health records and online-connected regulators with wearable medical sensors. We utilize data mining techniques to efficiently process the continuous stream of human-generated health data, focusing on accurate classification for early heart disease detection. Our methodology encompasses meticulous data pre-processing, including missing value imputation, normalization, and categorical feature encoding. We employ a diverse array of machine learning algorithms, ranging from traditional logistic regression to advanced methods like random forests and support vector machines, optimizing them through rigorous experimentation and hyper-parameter tuning. Crucially, we emphasize feature selection to identify the most influential predictors of heart disease risk. Evaluation metrics such as accuracy, precision, recall, F1 score, and AUC-ROC underscore the effectiveness of our models, highlighting significant performance advantages for certain algorithms.
Vol-10 Issue-3 2024 IJARIIE-ISSN(O)-2395-4396
24234 ijariie.com 4827
Sanjana Chaudhari, Mr. Chandra Shekhar Gautam, Dr. Akhilesh A. Waoo
AKS University, Satna (M.P.), India
sanjanachaudhari7828@gmail.com
AKS University, Satna (M.P.), India
Shekharg84@gmail.com
AKS University, Satna (M.P.), India
akhileshwaoo@gmail.com
Keywords: Heart Disease, Machine Learning, Ensemble Methods, LR, RF, SVM, NB.
INTRODUCTION
Heart disease is the leading cause of death worldwide, claiming approximately 17.9 million lives annually, or 31% of
all deaths, according to the World Health Organization (WHO) [1] (some estimates put the figure at approximately 17.5
million). The term covers conditions such as heart failure, hypertension, and coronary artery disease, which affect both
the heart and the blood vessels. Identifying individuals at risk is crucial for implementing preventive measures and
timely intervention. Traditional risk assessment methods rely on factors like blood pressure, cholesterol levels, and
family medical history, but they may not fully capture the complexity of an individual's risk profile. Low- and
middle-income nations account for over 75% of cardiovascular deaths, and 80% of cardiovascular fatalities are
attributed to strokes and heart attacks [2]. India faces a concerning trend, with the number of cardiovascular disease
cases rising each year and affecting an estimated 30 million people annually.
Heart diseases, or cardiovascular diseases (CVDs), encompass various types, including coronary heart disease,
arteriosclerosis, rheumatic diseases, congenital diseases, myocarditis, angina pectoris, and cardiac arrhythmias. Risk
factors associated with heart disease underscore the importance of preventive measures. These factors can be divided
into modifiable and non-modifiable categories. Non-modifiable risk factors include gender, age, and heredity, which
cannot be changed and often serve as the main causes of heart disease. Modifiable risk factors, on the other hand, are
related to habits, stress, diet, and various biochemical factors.
LITERATURE REVIEW
Abhishek Taneja et al. [3] developed a heart disease prediction system using data mining techniques and supervised
machine learning algorithms such as J48, Naïve Bayes, and Multilayer Perceptron in the WEKA machine learning
software, employing 10-fold cross-validation. Their study found that J48 outperformed Naïve Bayes and the neural
network, depending on the nature of the dataset.
Priti Chandra et al. [4] explored computational intelligence techniques for early diagnosis of heart disease using
WEKA and 10-fold cross-validation. Using the Naïve Bayes algorithm, they achieved an accuracy of 86.29%, which
was deemed satisfactory but not optimal for automated heart disease diagnosis.
Ashok Kumar Dwivedi et al. [5] evaluated several machine learning techniques for heart disease prediction using
10-fold cross-validation. Their study compared Naïve Bayes, Classification Tree, KNN, Logistic Regression, SVM, and
ANN, with Logistic Regression achieving the highest accuracy.
Bo Jin, Chao Che, et al. [6] proposed a model in 2018 titled "Predicting the Risk of Heart Failure with EHR Sequential
Data Modeling," which employs a neural network approach. The study used real-world electronic health record (EHR)
data on congestive heart disease to predict the onset of the condition in advance. The authors used one-hot encoding
and word vectors to model diagnosis events, and predicted heart failure events using the basic principles of a long
short-term memory (LSTM) network. Their findings underscore the importance of respecting the chronological order of
medical records.
Senthilkumar Mohan, Chandrasegar Thirumalai, and Gautam Srivastava [7] proposed an efficient hybrid machine
learning technique in "Effective Heart Disease Prediction Using Hybrid Machine Learning Techniques" (2019). The
hybrid approach combines a random forest with a linear model. Specific attribute subsets were selected from the
pre-processed cardiovascular disease dataset for prediction modeling, and the hybrid technique was then applied to
diagnose cardiovascular disease.
METHODOLOGY
A systematic approach is utilized to improve the accuracy of heart disease prediction through machine learning
techniques. Initially, the dataset undergoes rigorous data preprocessing, encompassing steps like handling missing
values and normalizing features using Min-Max scaling. Correlation analysis is then conducted, often visualized
through bar plots, to discern relationships between features and the target variable. Subsequently, the dataset is split
into testing and training subsets to facilitate model evaluation. To ensure robustness, cross-validation techniques such
as k-fold cross-validation are applied. Ensemble methods like Random Forest (RF), alongside traditional algorithms
including Logistic Regression (LR), Support Vector Machines (SVM), and Naive Bayes (NB), are employed to
harness diverse modeling approaches. Evaluation metrics such as accuracy, precision, recall, and area under the ROC
curve (AUC) are used to gauge model performance.
1. Data Source
The dataset, available from IEEE DataPort, offers detailed information on human heart health, comprising 11
features and a target variable. It includes 6 nominal and 5 numeric attributes. The "target" attribute indicates the
presence of heart disease, with 0 denoting absence and 1 indicating presence. The attributes are described below:
1. Age: Patient's age in years (numeric)
2. Sex: Gender of the patient (1 for male, 0 for female) (nominal)
3. Chest pain type: Categorized as typical angina, atypical angina, non-anginal pain, or asymptomatic (nominal)
4. Resting blood pressure: Resting blood pressure measured in mm Hg (numeric)
5. Cholesterol: Serum cholesterol level measured in mg/dl (numeric)
6. Fasting blood sugar: Whether fasting blood sugar exceeds 120 mg/dl (1 for true, 0 for false) (nominal)
7. Resting ECG: Result of the resting electrocardiogram, categorized as normal, ST-T wave abnormality, or left
ventricular hypertrophy (nominal)
8. Maximum heart rate: Maximum heart rate achieved (numeric)
9. Exercise angina: Presence of exercise-induced angina (1 for yes, 0 for no) (nominal)
10. Oldpeak: ST segment depression induced by exercise relative to rest (numeric)
11. ST slope: Slope of the ST segment during peak exercise, categorized as normal, upsloping, flat, or downsloping
(nominal)
12. Target: The target variable indicates the patient's risk for heart disease (1 for at-risk, 0 for healthy).
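The nominal attributes in the list above have to be turned into numbers before most classifiers can use them. Feature names that appear later in the paper, such as st_slope_flat and rest_ecg_normal, suggest one indicator column per category (one-hot encoding). The following is a minimal sketch of that idea, not the authors' code: the dictionary NOMINAL_LEVELS, the function one_hot_encode, the column names, and the example patient are all assumptions made for illustration.

```python
# Hypothetical category labels, following the attribute descriptions above.
NOMINAL_LEVELS = {
    "chest_pain_type": ["typical angina", "atypical angina",
                        "non-anginal pain", "asymptomatic"],
    "rest_ecg": ["normal", "ST-T wave abnormality",
                 "left ventricular hypertrophy"],
    "st_slope": ["normal", "upsloping", "flat", "downsloping"],
}


def one_hot_encode(record):
    """Return a new dict where each multi-level nominal field becomes 0/1 columns."""
    encoded = {}
    for name, value in record.items():
        levels = NOMINAL_LEVELS.get(name)
        if levels is None:
            # Numeric or already-binary field: copy it unchanged.
            encoded[name] = value
            continue
        if value not in levels:
            raise ValueError(f"unexpected level {value!r} for {name}")
        for level in levels:
            encoded[f"{name}_{level}"] = 1 if value == level else 0
    return encoded


# Hypothetical patient record (values invented for the example).
patient = {
    "age": 54,
    "sex": 1,
    "chest_pain_type": "asymptomatic",
    "rest_ecg": "normal",
    "st_slope": "flat",
    "cholesterol": 239,
}
encoded_patient = one_hot_encode(patient)
print(encoded_patient)
```

Binary attributes such as sex or exercise angina are already 0/1 and pass through unchanged; only attributes with more than two categories are expanded.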
Figure: Diagram of Proposed Methodology
2. Data preprocessing
Data preprocessing is crucial for both data analysis and training machine learning models. Normalization rescales
features measured on different scales to a common range; Min-Max scaling, for example, maps each feature to the
interval [0, 1]. Standardization rescales each feature to a mean of 0 and a standard deviation of 1, so that values are
expressed as deviations from the mean, which can improve classifier performance.
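The difference between the two rescaling schemes is easiest to see on a single column. The sketch below applies Min-Max scaling and standardization to a short list of cholesterol readings; the readings and the function names min_max_scale and standardize are assumptions for illustration, and the population standard deviation is used for the z-scores.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Map values linearly onto [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical serum cholesterol readings in mg/dl.
cholesterol = [180, 200, 240, 260, 320]
scaled = min_max_scale(cholesterol)
z_scores = standardize(cholesterol)
print(scaled)
print(z_scores)
```

In practice the minimum, maximum, mean, and standard deviation would be estimated on the training split only and then reused to transform the test split, so that no information leaks from the test data.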
3. Hyperparameter tuning
Hyperparameter tuning aims to find the hyperparameter values that yield the best accuracy. Hyperparameters are set
before a classifier is trained, so we use the Scikit-learn GridSearchCV class to explore combinations of candidate
values systematically. Its fit function trains each algorithm for every combination within a unified framework and
scores it with 10-fold cross-validation, repeatedly training and evaluating the model on different subsets of the
training data. The combination with the best cross-validated accuracy is selected, and the entire training dataset is
then used to fit the final model.
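To make the tuning procedure concrete, the sketch below re-implements its core loop in plain Python: every hyperparameter combination is scored by the mean accuracy over k folds, and the best-scoring combination is kept. It is an illustration of the idea only, not the Scikit-learn implementation and not the authors' code. The one-dimensional k-nearest-neighbour classifier, the toy data, and all function names are assumptions.

```python
from itertools import product


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx); fold f holds samples with index % n_folds == f."""
    for fold in range(n_folds):
        test_idx = [i for i in range(n_samples) if i % n_folds == fold]
        train_idx = [i for i in range(n_samples) if i % n_folds != fold]
        yield train_idx, test_idx


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training points (1-D distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def cross_val_accuracy(xs, ys, k, n_folds=10):
    """Mean accuracy of k-NN over the folds."""
    fold_scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), n_folds):
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        hits = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i]
                   for i in test_idx)
        fold_scores.append(hits / len(test_idx))
    return sum(fold_scores) / len(fold_scores)


def grid_search(xs, ys, param_grid, n_folds=10):
    """Score every combination in param_grid and return the best one."""
    names = list(param_grid)
    results = []
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = cross_val_accuracy(xs, ys, n_folds=n_folds, **params)
        results.append((score, params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, results


# Hypothetical toy data: one feature, two classes, two deliberately noisy labels.
xs = list(range(40))
ys = [0] * 20 + [1] * 20
ys[5], ys[30] = 1, 0

best_params, best_score, results = grid_search(xs, ys, {"k": [1, 3, 5, 7]})
print(best_params, round(best_score, 3))
```

With Scikit-learn the same loop is handled by GridSearchCV with cv=10, which also refits the best configuration on the full training set by default. The folds here are interleaved by index for simplicity; real data would normally be shuffled or stratified first.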
[Diagram blocks: Heart Disease Dataset, Data Preprocessing, Training Split, Testing Split, Cross Validation,
Classification Techniques, Evaluation Methods]
4. Classification
The machine learning algorithms applied in this step include AdaBoost, logistic regression, extra trees, multinomial
Naive Bayes, support vector machine, linear discriminant analysis, classification and regression tree, random forest,
and XGBoost.
5. Model evaluation
The performance of each prediction model is assessed using accuracy, precision, recall, and F-measure. The model
selected is the one that attains the highest values across these metrics.
6. Performance metrics
Performance metrics in machine learning assess the effectiveness of an algorithm based on various criteria like
accuracy, precision, sensitivity, and more.
A confusion matrix aids in evaluating a model's performance by tabulating actual classes against predicted classes.
It distinguishes true positives (TP, positive cases correctly predicted as positive), false positives (FP, negative
cases incorrectly predicted as positive), false negatives (FN, positive cases incorrectly predicted as negative), and
true negatives (TN, negative cases correctly predicted as negative).
The accuracy metric gauges the correctness of predictions made by a machine learning or classifier model and is
mathematically represented by an equation.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision measures the accuracy of positive predictions by evaluating the ratio of true positives to all positive
predictions. Mathematically, it is expressed as:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Sensitivity (recall) evaluates the model's ability to identify actual positive cases: it is the ratio of true positives
to all actual positive cases, including those the model missed. Mathematically, it is expressed as:
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
The F-measure (F1 score) balances precision and recall and is calculated as their harmonic mean:
$$\text{F1 Score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$
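The four formulas above can be evaluated directly from the confusion matrix counts. The sketch below does so for one set of counts. The paper does not report a confusion matrix, so the counts (TP = 118, FP = 15, FN = 5, TN = 97) are an assumption: they were chosen only because they yield values close to the Random Forest accuracy, precision, sensitivity, and F1 score reported in the results section. The function name classification_metrics is likewise illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, sensitivity and F1 from confusion counts.

    Assumes at least one predicted positive and one actual positive,
    so that no denominator is zero.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Hypothetical counts, not taken from the paper.
metrics = classification_metrics(tp=118, fp=15, fn=5, tn=97)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

Because F1 is a harmonic mean, it sits closer to the smaller of precision and sensitivity than an arithmetic mean would, which is why it penalizes a model that trades one heavily for the other.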
The main evaluation measures for this problem area are sensitivity, specificity, precision, F1 score, and the area
under the ROC curve (ROC-AUC). Specificity is the fraction of actual negative cases that are correctly identified,
TN / (TN + FP). In addition, we report two further measures that are more reliable statistical measures, namely the
Matthews Correlation Coefficient (MCC) and Log Loss.
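Neither MCC nor Log Loss is defined in the text, so a short sketch of their standard definitions may help. MCC is computed from the four confusion matrix counts and ranges from -1 to 1; Log Loss is the mean negative log-likelihood of the true labels under the predicted probabilities. The function names, the clipping constant eps, and all input values below are assumptions for illustration, not values from the paper.

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to [eps, 1 - eps]."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# Hypothetical confusion counts and predicted probabilities.
mcc = matthews_corrcoef(tp=118, fp=15, fn=5, tn=97)
soft_loss = log_loss([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])
# Feeding hard 0/1 predictions instead of probabilities: one of two is wrong.
hard_loss = log_loss([1, 0], [0, 0])
print(round(mcc, 4), round(soft_loss, 4), round(hard_loss, 2))
```

The last call illustrates a property of the formula: a confidently wrong prediction is penalized very heavily (about 34.5 per error with this clipping constant), so Log Loss values well above 1 typically indicate that hard class labels, rather than calibrated probabilities, were passed to the metric.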
RESULT AND DISCUSSION
Following the systematic approach to enhancing heart disease prediction accuracy through machine learning
techniques, our findings demonstrate significant advancements. Despite the widespread adoption of algorithms such
as SVC and Decision Trees in diagnosing heart disease, our utilization of KNN, Random Forest Classifier, and
Logistic Regression outperforms them [12]. These selected algorithms not only exhibit superior accuracy but also
offer cost-efficiency and faster processing compared to earlier methodologies. Notably, KNN and Logistic Regression
achieve maximum accuracies of 88.5%, matching or surpassing those reported in prior studies. After training and
evaluating ten machine learning models and comparing their performances, the Random Forest model using the
entropy criterion consistently outperforms others, achieving an accuracy of 90.63%. Additionally, a majority-vote
feature selection technique, which combines several selection methods, preserves the Random Forest model's lead
after feature selection, with an accuracy of 89.36%. This is a decrease of about 1.3 percentage points relative to its
accuracy before feature selection.
Metric            Value
Accuracy (%)      91.4894
Precision (%)     88.7218
Sensitivity (%)   95.935
F1 Score          0.921875
ROC               0.912711
Log Loss          3.067545
The Random Forest model exhibited the highest performance during cross-validation.
Fig -1 Performance of RF Model
Fig -2 Comparison with Other Models (models: MLP, KNN, Extra tree classifier, XGB, SVC, SGD, Adaboost, CART, GBM;
metrics: Accuracy, Precision, Sensitivity, Specificity, F1 Score, ROC, Log Loss)
The above results indicate that the XGBoost classifier performs best overall, with the highest test accuracy (0.9191)
and F1 score (0.9243), the lowest Log Loss (2.792), a sensitivity of 0.943, and a specificity of 0.89. Random Forest,
on the other hand, attained the highest sensitivity, 95.122%.
Only 11 features were selected by at least one feature selection method. Notably, features like
fasting_blood_sugar, chest_pain_type_typical angina, rest_ecg_left ventricular hypertrophy, and rest_ecg_normal are
absent from the table because none of the feature selection methods deemed them important.
The top 6 features, selected by a 5-out-of-6 majority vote across the feature selection methods, are st_slope_flat,
st_depression, max_heart_rate_achieved, exercise_induced_angina, cholesterol, and age. The machine learning
models are retrained using these 6 features, and their performance is compared to assess any drop in performance
after feature selection.
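The majority-vote step can be sketched as a simple tally: each selection method contributes the set of features it considers important, and a feature is kept when enough methods agree. The paper does not name the six methods or list their individual selections, so the method labels (method_1 to method_6), the per-method feature sets, the extra features sex_male and resting_blood_pressure, and the reading of "5 out of 6" as "at least 5 votes" are all assumptions made for this example.

```python
from collections import Counter


def majority_vote_selection(selections, min_votes):
    """Keep features chosen by at least min_votes of the selection methods."""
    votes = Counter(
        feature
        for chosen in selections.values()
        for feature in set(chosen)  # each method votes at most once per feature
    )
    selected = sorted(f for f, n in votes.items() if n >= min_votes)
    return selected, votes


# Hypothetical per-method selections (the paper does not report these).
core = ["st_slope_flat", "st_depression", "max_heart_rate_achieved",
        "exercise_induced_angina"]
selections = {
    "method_1": core + ["cholesterol", "age", "sex_male"],
    "method_2": core + ["cholesterol", "age"],
    "method_3": core + ["cholesterol", "age", "resting_blood_pressure"],
    "method_4": core + ["cholesterol", "sex_male"],
    "method_5": core + ["cholesterol", "age"],
    "method_6": core + ["age", "resting_blood_pressure"],
}

selected, votes = majority_vote_selection(selections, min_votes=5)
print(selected)
print(votes.most_common())
```

Lowering min_votes makes the selection more permissive, and setting it to 1 recovers every feature chosen by at least one method, which corresponds to the 11-feature set mentioned above.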
CONCLUSION
In this study, we employed 10-fold cross-validation to evaluate the performance of 13 machine-learning algorithms
with varying hyperparameters. We then trained and assessed these models on the test set and identified the top three
performers. A stacked ensemble of powerful algorithms demonstrated superior performance compared to individual
models. The Random Forest model using the entropy criterion exhibited the highest accuracy, 90.63%. Even after
feature selection, the Random Forest model remained the top performer, with accuracy decreasing only slightly, to
89.36%. Feature importance analysis highlighted cholesterol, maximum heart rate achieved, and ST depression as the
most influential features.
REFERENCES
[1] "Cardiovascular diseases (CVDs)." WHO, 2020, https://www.who.int/zh/news-room/factsheets/detail/cardiovascular-diseases-(cvds).
[2] R. Rao, survey on the prediction of heart morbidity using data mining techniques, Knowledge Management, vol. 1,
no. 3, pp. 14-34, 2011.
[3] A. Taneja, "Heart Disease Prediction System Using Data Mining Techniques," Oriental Journal, 2013.
[4] N. O. Fowler, paper on the diagnosis of heart disease, vol. V, journal unspecified, March 2012.
[5] A. K. Dwivedi, "Analysis of Computational Intelligence Techniques for Diabetes Mellitus Prediction," Neural
Computing Applications, 2017.
[6] B. Jin, C. Che, Z. Liu, S. Zhang, X. Yin, and X. Wei, "Predicting the Risk of Heart Failure With EHR Sequential
Data Modelling," IEEE Access, 2018.
[7] S. Mohan, C. Thirumalai, and G. Srivastava, "Effective Heart Disease Prediction Using Hybrid Machine Learning
Techniques," IEEE Access, 2019.
[8] A. Lakshmanarao, Y. Swathi, and P. Sri Sai Sundareswar, study on the application of machine learning techniques
in predicting heart disease, International Journal of Scientific & Technology Research, vol. 8, no. 11, November 2019.
[9] M. M. Mijwil, A. H. Al-Mistarehi, and D. S. Mutar, "The Practices of Artificial Intelligence Techniques and Their
Value in Addressing the COVID-19 Pandemic," literature review, Mobile Forensics Journal, vol. 4, no. 1, pp. 11-30,
2022.
[10] A. Taneja, "Heart Disease Prediction System Using Data Mining Techniques," Oriental Journal, 2013.
[11] N. O. Fowler, paper on the diagnosis of heart disease, vol. V, journal unspecified, March 2012.
[12] A. K. Dwivedi, "Analysis of Computational Intelligence Techniques for Diabetes Mellitus Prediction," Neural
Computing Applications, 2017.
[13] M. Shahi and R. Kaur Gurm, heart disease prediction system using data mining techniques, Orient Journal of
Computer Science and Technology, 2013.
[14] S. M. S. Shah et al., heart disease diagnosis using parallel probabilistic principal component analysis to extract
relevant features, Physica A: Statistical Mechanics and its Applications, 2017.
[15] T. Karthikeyan, B. Raghavan, and V. A. Kanimozhi, study on the application of data mining classification
algorithms for heart disease prediction, International Journal of Advanced Research in Computer Engineering and
Technology, vol. 5, no. 4, pp. 1076-1081, 2016.
[16] A. R. Folsom, R. J. Prineas, S. A. Kaye, and J. T. Soler, study of the association between body fat distribution
and the self-reported prevalence of hypertension, heart attack, and other heart diseases in older women, 1989.
[17] S. Gour, P. Panwar, D. Dwivedi, and C. Mali, "A Machine Learning Approach for Heart Attack Prediction," in
Intelligent Sustainable Systems, Springer, Singapore, pp. 741-747, 2022.
[18] A. Konda Babu, V. Siddhartha, B. B. Kumar, and B. Penumutchi, "A comparative study on machine learning-based
heart disease prediction," Materials Today Proceedings, 2021.
[19] E. F. Gudmundsson, G. Bjornsdottir, S. Sigurdsson, K. Andersen, B. Thorsson, T. Aspelund, and V. Gudnason,
population-based cohort study showing that the presence of carotid plaque correlates significantly with coronary artery
calcium and predicts incident coronary heart disease, 2022.
[20] A. Jagtap, P. Malewadkar, O. Baswat, and H. Rambade, "Heart disease prediction using machine learning,"
International Journal of Research in Engineering, Science, and Management, 2019.
[21] U. N. Dulhare, "Prediction system for heart disease using naive Bayes and particle swarm optimization,"
Biomedical Research Journal, 2018.
[22] J. K. Kim and S. Kang, "Neural network-based coronary heart disease risk prediction using feature correlation
analysis," Journal of Healthcare Engineering, 2017.
[23] K. C. Siontis, P. A. Noseworthy, Z. I. Attia, and A. Paul, "Artificial intelligence-enhanced electrocardiography in
cardiovascular disease management," Nature Reviews Cardiology, 2021.
[24] P. S. Linda, W. Yin, P. A. Gregory, Z. Amanda, and G. Margaux, "Development of a Novel Clinical Decision
Support System for Exercise Prescription among Patients with multiple cardiovascular disease risk factors," Mayo
Clinic Proceedings: Innovations, Quality & Outcomes, 2021.
[25] Y. Ali, R. Amir, and A.-M. Fardin, "Profile-based assessment of diseases affecting factors using fuzzy association
rule mining approach: a case study in heart diseases," Journal of Biomedical Informatics, 2021.
[26] S. Ghwanmeh, A. Mohammad, and A. Al-Ibrahim, "Innovative artificial neural network-based decision support
system for heart disease diagnosis," Journal of Intelligent Learning Systems and Applications, vol. 5, no. 3,
pp. 353-396, 2013.
[27] Q. K. Al-Shayea, "Artificial neural networks in medical diagnosis," International Journal of Computer Science
Issues, vol. 8, no. 2, pp. 150-154, 2011.