Prediction of Heart Disease Using Classification Based Data Mining Techniques

December 2015
32:503-511

DOI:10.1007/978-81-322-2208-8_46

Authors:

Sujata Joshi

Nitte Meenakshi Institute of Technology

Mydhili K. Nair

M.S. Ramaiah Institute of Technology

Abstract Data mining is a field of research whose major objective is to find interesting and useful patterns in huge data sets. These patterns can then be used to make important decisions based on the results of the analysis. The healthcare industry today generates a huge amount of data on a day-to-day basis. This data has to be analysed so that hidden and meaningful patterns can be discovered, and data mining plays a promising and significant role here. Data mining techniques can be used for disease prediction. In this research, classification-based data mining techniques are applied to healthcare data. The research focuses on the prediction of heart disease using three classification techniques, namely Decision Trees, Naïve Bayes and K-Nearest Neighbor.


Keywords Data mining · Classification technique · Heart disease · Healthcare · Decision tree · Naïve Bayes · K-nearest neighbor · Dataset

1 Introduction

Heart disease is a class of diseases that involve the heart, the blood vessels or both. The most common causes of heart disease are atherosclerosis and/or hypertension. Atherosclerosis is a condition that develops when a substance called plaque builds up in the walls of the arteries. This buildup narrows the arteries, making it harder for blood to flow through. If a blood clot forms, it can stop the blood flow, which can cause a heart attack or stroke. The major risk factors for heart disease are age, gender, high blood pressure, diabetes mellitus, tobacco smoking, processed meat consumption, excessive alcohol consumption, sugar consumption, family history, obesity, lack of physical activity, psychosocial factors, and air pollution.

S. Joshi (✉)
Department of Computer Science and Engineering, Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India
e-mail: sujata_msrp@yahoo.com

M.K. Nair
Department of Information Science and Engineering, M. S. Ramaiah Institute of Technology, Bangalore, Karnataka, India
e-mail: mydhili.nair@gmail.com

© Springer India 2015
L.C. Jain et al. (eds.), Computational Intelligence in Data Mining - Volume 2, Smart Innovation, Systems and Technologies 32, DOI 10.1007/978-81-322-2208-8_46

Heart disease is the leading cause of death worldwide. However, since the 1970s, mortality rates due to heart-related diseases have declined in many high-income countries. At the same time, heart-related deaths and diseases have increased at a fast rate in low- and middle-income countries. Although heart disease usually affects older adults, the symptoms may begin in early life, making primary prevention efforts necessary from childhood. Risk factors may therefore be modified by healthy eating habits, regular exercise, and avoiding tobacco smoking.

In today's world, most hospitals maintain their patient data in electronic form through a hospital database management system. These systems generate a huge amount of data on a daily basis. The data may be in the form of free text, structured records as in databases, or images. It can be used to extract useful information for decision making. This requirement has led to the use of Knowledge Discovery in Databases (KDD), which is responsible for transforming low-level data into high-level knowledge for decision making. Data mining, which is one step of the KDD process, aims at finding useful patterns in large datasets. These patterns can be further analyzed, and the results can be used for effective decision making. The main data mining tasks are classification, clustering, association analysis and outlier detection. In this paper, various data mining classification techniques are applied to healthcare data related to heart disease. This has helped to determine the best prediction technique in terms of accuracy and error rate on the specific dataset.

2 Related Work

There has been an increase in the number of people suffering from heart disease in recent years [1]. With the advent of information technology and its applications, data mining plays an important role in the early detection of diseases. Data mining is used extensively in all fields, and in the healthcare industry in particular [2–6]. In the healthcare industry, data mining techniques are used for the diagnosis of diseases [7], disease prediction [8], and analysis [9]. Data mining techniques can be applied to predict an outcome of interest, which makes prediction a very important task. The issues and guidelines of predictive data mining in clinical medicine are discussed in [10]. Research work [7, 11, 12] on heart disease diagnosis using data mining techniques is the motivation for this work. Classification based on the Gini index is discussed in [13]. The data mining techniques Decision Tree, Naïve Bayes and KNN are discussed in [8, 10, 14, 15]. A model based on a combination of the Naïve Bayes classifier and K-Nearest Neighbor is proposed in [16]. A clinical decision support system using association rule mining is discussed in [17]. A prediction system for lung cancer detection is proposed in [18]. A diagnostic tool for skin diseases is proposed in [19]. In [6, 9], the researchers analyze healthcare data using different data mining techniques. An extensive survey of the datasets, algorithms, methods, results and future work reported by these authors shows that there is considerable scope for discovering efficient methods of medical diagnosis and analysis for various diseases. This work is an attempt to predict the occurrence of heart disease using the classification techniques Decision Tree, Naïve Bayes and K-Nearest Neighbor.

3 Classification

Classification is one of the important data mining tasks. Its objective is to accurately assign a class to previously unseen data. Classification consists of two stages:

Stage 1: Model construction

Stage 2: Model usage

Classification creates a model over the attributes of the dataset. The dataset is divided into a training set and a test set. In the first stage, the training set is used to build the classification model using a learning algorithm. In the second stage, the learned model is put into operational use, i.e. it is applied to the test set to validate it. If the model performs well, it is then ready to be used for prediction.
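The two stages can be illustrated with a minimal Python sketch. It is not the WEKA workflow used in this study: the records are hypothetical toy data, the function names are illustrative, and the "learning algorithm" is a trivial majority-class baseline chosen only to keep the example short.

```python
import random
from collections import Counter

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle the records and divide them into a training set and a test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def build_model(training_set):
    """Stage 1 (model construction): learn the majority class of the training set."""
    majority_label = Counter(label for _, label in training_set).most_common(1)[0][0]
    return lambda features: majority_label

def validate(model, test_set):
    """Stage 2 (model usage): fraction of test records that are classified correctly."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return correct / len(test_set)

# Hypothetical toy records of the form (features, class label).
records = [({"age": 40 + i}, "sick" if i % 3 == 0 else "healthy") for i in range(30)]

training_set, test_set = train_test_split(records)
model = build_model(training_set)
test_accuracy = validate(model, test_set)
```

Any real learning algorithm can take the place of `build_model`; the point of the sketch is only that the model is built from the training set and judged on records it has not seen.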

3.1 Classification Techniques

In this study, the classification techniques Decision Tree, Naïve Bayes and KNN are explored and applied to the dataset.

3.1.1 Decision Trees

A decision tree is a structure consisting of a root node, branches and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes an outcome of the test, and each leaf node holds a class label. The first node in the tree is the root node. To build the tree, an attribute is first selected and placed at the root node, and a branch is made for each possible value. This splits the data set into subsets, one for every value of the attribute. The process is then repeated recursively for each branch, using only those instances that actually reach the branch. When all instances at a node have the same classification, development of that part of the tree can stop. The measures generally used to select the best split are the Gini index, entropy and classification error.
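The three split measures named above can be written down in a few lines. The following Python sketch uses their standard textbook definitions; the function names and the toy label lists are illustrative and do not come from the study.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Relative frequency of each class among the labels at a node."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    """Entropy in bits: sum of -p * log2(p) over the classes present."""
    return sum(-p * math.log2(p) for p in class_proportions(labels))

def classification_error(labels):
    """Classification error: 1 minus the proportion of the majority class."""
    return 1.0 - max(class_proportions(labels))

def weighted_impurity(subsets, measure):
    """Impurity of a split: size-weighted average impurity of its subsets."""
    total = sum(len(subset) for subset in subsets)
    return sum(len(subset) / total * measure(subset) for subset in subsets)

pure_node = ["yes", "yes", "yes", "yes"]
mixed_node = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
```

All three measures are zero for a node whose instances share one class and largest for an evenly mixed node, so the best split is the one with the lowest weighted impurity.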


3.1.2 Naïve Bayes Classifier

Classification based on Bayes' theorem is known as Bayesian classification. The Naïve Bayes classifier is a statistical classifier based on Bayes' theorem. It assumes that the attributes are statistically independent of one another given the class.

Given two events A and B, where P(A) is the prior probability of A and P(A|B) is the posterior probability of A given B, Bayes' theorem states that

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad \text{where } P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$

These Bayesian probabilities are used to determine the most likely class for a given instance, given all the training data. The conditional probabilities are estimated from the training data.

This classifier yields optimal predictions when its assumptions hold. It can handle both discrete and numeric attribute values.
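As a minimal sketch of how these probabilities lead to a prediction, the Python code below trains a Naïve Bayes classifier on categorical attributes only. The training data is hypothetical, the function names are illustrative, and the add-one (Laplace) smoothing of the counts is an assumption of this sketch rather than something stated in the text.

```python
from collections import Counter, defaultdict

def train_naive_bayes(instances, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute, class) -> counts of values
    attribute_values = defaultdict(set)   # attribute -> set of observed values
    for instance, label in zip(instances, labels):
        for attribute, value in instance.items():
            value_counts[(attribute, label)][value] += 1
            attribute_values[attribute].add(value)
    return class_counts, value_counts, attribute_values

def predict(model, instance):
    """Return the class with the highest score P(class) * product of P(value | class)."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for attribute, value in instance.items():
            # Add-one smoothed estimate of P(value | class); smoothing is an assumption here.
            numerator = value_counts[(attribute, label)][value] + 1
            denominator = count + len(attribute_values[attribute])
            score *= numerator / denominator
        scores[label] = score
    return max(scores, key=scores.get), scores

# Hypothetical training data with two categorical attributes.
instances = [
    {"exang": "yes", "sex": "male"},
    {"exang": "yes", "sex": "female"},
    {"exang": "no", "sex": "male"},
    {"exang": "no", "sex": "female"},
    {"exang": "no", "sex": "male"},
]
labels = ["sick", "sick", "healthy", "healthy", "healthy"]

model = train_naive_bayes(instances, labels)
predicted_label, scores = predict(model, {"exang": "yes", "sex": "male"})
```

The scores are proportional to the posterior probabilities because the denominator P(B) of Bayes' theorem is the same for every class and can be dropped when only the most likely class is needed.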

3.1.3 K-Nearest Neighbor

The nearest neighbor method is an instance-based classification technique that remembers all the training instances. When a new instance is encountered, it is compared with the stored instances, which serve as the model. The prediction for the new instance is the class of the most similar previously observed instance. K-NN generalizes this by classifying an instance using its K nearest neighbors. The classifier trains quickly, but prediction is slow when the dataset is large, since the new instance has to be compared with all stored instances.
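A minimal Python sketch of this procedure is given below. The Euclidean distance, the majority vote among the K neighbors and the toy points are assumptions made for illustration; they are not taken from the configuration used in this study.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(training_points, training_labels, query, k=3):
    """Classify the query by majority vote among its k nearest stored instances."""
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(training_points, training_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-dimensional training data.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = ["healthy", "healthy", "healthy", "sick", "sick", "sick"]

prediction_near_first_group = knn_predict(points, labels, (1.2, 1.4), k=3)
prediction_near_second_group = knn_predict(points, labels, (6.4, 6.2), k=3)

# With k=1, every stored instance is its own nearest neighbor.
resubstitution = [knn_predict(points, labels, point, k=1) for point in points]
```

The last line shows why evaluating a 1-nearest-neighbor classifier on its own training data gives perfect accuracy, provided that no two identical points carry different labels: there is no training step beyond storing the instances, and all the work happens at prediction time.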

4 Methodology

4.1 Tool Used

WEKA (Waikato Environment for Knowledge Analysis) [20] is a collection of data mining algorithms and tools that can be used for data analysis. WEKA is developed in Java. It can analyze data sets saved in the .arff format using various algorithms. In this study, the Decision Tree, Naïve Bayes and K-NN algorithms are applied to the heart data set, and the results of applying these techniques are shown.

4.2 Data Source

The heart disease data set from the UCI Machine Learning Repository [21] is used for this study. The data set consists of 303 records and 14 attributes. The attributes are listed in Table 1.


4.3 Decision Tree

The decision tree is created by selecting the best split at every node. To select the best attribute for the split, the information gain is computed at each node and the attributes are ranked accordingly. Here the attribute evaluator used is GainRatioAttributeEval and the search method is the Ranker method, both from the WEKA tool. The ranked attributes are listed in Table 2.

Table 1 Attributes of the heart.arff file

| No | Attribute | Type |
|----|-----------|------|
| 1 | age | real |
| 2 | sex | {female, male} |
| 3 | cp | {typ_angina, asympt, non_anginal, atyp_angina} |
| 4 | trestbps | real |
| 5 | chol | real |
| 6 | fbs | {t, f} |
| 7 | restecg | {left_vent_hyper, normal, st_t_wave_abnormality} |
| 8 | thalach | real |
| 9 | exang | {no, yes} |
| 10 | oldpeak | real |
| 11 | slope | {up, flat, down} |
| 12 | ca | real |
| 13 | thal | {fixed_defect, normal, reversable_defect} |
| 14 | num | {'<50', '>50_1', '>50_2', '>50_3', '>50_4'} |

Table 2 Attribute ranking based on information gain

| Info gain | Rank | Attribute |
|-----------|------|-----------|
| 0.17 | 12 | thal |
| 0.16 | 13 | ca |
| 0.15 | 9 | exang |
| 0.13 | 8 | thalach |
| 0.11 | 3 | cp |
| 0.10 | 10 | oldpeak |
| 0.09 | 11 | slope |
| 0.065 | 2 | sex |
| 0.060 | 1 | age |
| 0.022 | 7 | restecg |
| 0 | 6 | fbs |
| 0 | 5 | chol |
| 0 | 4 | trestbps |

The attributes, in the order selected, are: 12, 13, 9, 8, 3, 10, 11, 2, 1, 7, 6, 5, 4.
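To illustrate how such a ranking can be produced, the following Python sketch computes the information gain and the gain ratio of categorical attributes and sorts the attributes by score. It is a simplified stand-in for the WEKA evaluator and Ranker, not a reimplementation of them: it handles categorical attributes only, the helper names are illustrative, and the small table is hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(items):
    """Entropy in bits of the value distribution in a list."""
    total = len(items)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(items).values())

def partition(values, labels):
    """Group the class labels by the attribute value of each instance."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    return groups

def information_gain(values, labels):
    """Class entropy minus the size-weighted class entropy after the split."""
    total = len(labels)
    remainder = sum(
        len(group) / total * entropy(group)
        for group in partition(values, labels).values()
    )
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain divided by the entropy of the attribute itself."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

def rank_attributes(table, labels, score=information_gain):
    """Sort attribute names by decreasing score."""
    scores = {name: score(column, labels) for name, column in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical data: one informative and one uninformative attribute.
table = {
    "sex": ["male", "female", "male", "female"],
    "exang": ["yes", "yes", "no", "no"],
}
labels = ["sick", "sick", "healthy", "healthy"]

ranking = rank_attributes(table, labels)
```

In this toy table `exang` separates the classes perfectly and is ranked first, while `sex` carries no information about the class and receives a score of zero, in the same way that the last three attributes of Table 2 do.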

The decision tree algorithm J48 is then applied to the heart data set, generating the decision tree shown in Fig. 1. This decision tree can be used for prediction. The results are shown in Table 3.

4.4 Naïve Bayes

The attribute evaluator used is GainRatioAttributeEval and the search method is the Ranker method. The ranked attributes are the same as for the decision tree. The Naïve Bayes algorithm is applied to the heart data set, and the results for a few attributes are shown in Table 4.

The overall results are shown in Table 5.

Fig. 1 Decision tree generated using J48 algorithm

Table 3 Results of decision tree algorithm

| | No of instances | Percentage (%) |
|---|---|---|
| Correctly classified instances | 279 | 92.0792 |
| Incorrectly classified instances | 24 | 7.9208 |
| Total instances | 303 | |

4.5 K-Nearest Neighbor

The KNN algorithm is applied to the heart data set and the results are shown in

Table 6.

5 Results and Conclusion

The evaluation measures used are sensitivity, specificity and accuracy:

(i) Sensitivity = TP/P

(ii) Specificity = TN/N

(iii) Accuracy = (TP + TN)/(P + N)
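These three measures translate directly into code. In the Python sketch below, TP and TN are the numbers of true positives and true negatives, and P and N are the numbers of actual positives and actual negatives. The confusion counts are hypothetical; only the 279 correct classifications out of 303 instances are taken from Table 3.

```python
def sensitivity(tp, p):
    """Fraction of actual positives that are predicted positive."""
    return tp / p

def specificity(tn, n):
    """Fraction of actual negatives that are predicted negative."""
    return tn / n

def accuracy(tp, tn, p, n):
    """Fraction of all instances that are classified correctly."""
    return (tp + tn) / (p + n)

# Hypothetical confusion counts for a two-class problem.
P, N = 100, 200      # actual positives and actual negatives
TP, TN = 90, 180     # correctly predicted positives and negatives

example_sensitivity = sensitivity(TP, P)
example_specificity = specificity(TN, N)
example_accuracy = accuracy(TP, TN, P, N)
example_false_positive_rate = 1.0 - example_specificity

# Accuracy as a percentage from the counts reported in Table 3.
decision_tree_accuracy_percent = round(100 * 279 / 303, 4)
```

The false positive rate is included because it is the complement of specificity: a classifier with a specificity of 0.9 misclassifies one tenth of the actual negatives.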

Table 4 Results for a few attributes using the Naïve Bayes technique

| Attribute | <50 (0.54) | >50_1 (0.45) | >50_2 (0) | >50_3 (0) | >50_4 (0) |
|---|---|---|---|---|---|
| cp | | | | | |
| typ_angina | 17.0 | 8.0 | 1.0 | 1.0 | 1.0 |
| asympt | 40.0 | 105.0 | 1.0 | 1.0 | 1.0 |
| non_anginal | 70.0 | 19.0 | 1.0 | 1.0 | 1.0 |
| atyp_angina | 42.0 | 10.0 | 1.0 | 1.0 | 1.0 |
| [total] | 169.0 | 142.0 | 4.0 | 4.0 | 4.0 |
| restecg | | | | | |
| left_vent_hyper | 69.0 | 80.0 | 1.0 | 1.0 | 1.0 |
| normal | 97.0 | 57.0 | 1.0 | 1.0 | 1.0 |
| st_t_wave_abnormality | 2.0 | 4.0 | 1.0 | 1.0 | 1.0 |
| [total] | 168.0 | 141.0 | 3.0 | 3.0 | 3.0 |
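The counts in Table 4 can be read as follows: dividing a count by its column total gives an estimate of the conditional probability of that attribute value given the class, and the number in each column header is the class prior. The Python sketch below combines these for the two classes with non-zero priors. It uses only the two attributes shown in the table and a hypothetical instance (cp = asympt, restecg = normal), so it illustrates the calculation and does not reproduce the WEKA output over all thirteen attributes.

```python
# Class priors and counts copied from Table 4 (classes <50 and >50_1 only).
priors = {"<50": 0.54, ">50_1": 0.45}
counts = {
    "cp": {
        "<50": {"typ_angina": 17.0, "asympt": 40.0, "non_anginal": 70.0, "atyp_angina": 42.0},
        ">50_1": {"typ_angina": 8.0, "asympt": 105.0, "non_anginal": 19.0, "atyp_angina": 10.0},
    },
    "restecg": {
        "<50": {"left_vent_hyper": 69.0, "normal": 97.0, "st_t_wave_abnormality": 2.0},
        ">50_1": {"left_vent_hyper": 80.0, "normal": 57.0, "st_t_wave_abnormality": 4.0},
    },
}

def conditional_probability(attribute, value, label):
    """Estimate P(value | class) as the count divided by the column total."""
    column = counts[attribute][label]
    return column[value] / sum(column.values())

def class_scores(instance):
    """Prior times the product of the conditional probabilities, per class."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for attribute, value in instance.items():
            score *= conditional_probability(attribute, value, label)
        scores[label] = score
    return scores

# Hypothetical instance described by the two attributes of Table 4.
instance = {"cp": "asympt", "restecg": "normal"}
scores = class_scores(instance)
predicted_class = max(scores, key=scores.get)
```

For this instance the score of class >50_1 (about 0.135) exceeds that of class <50 (about 0.074), mainly because asymptomatic chest pain accounts for 105 of the 142 counts in the >50_1 column but only 40 of the 169 counts in the <50 column.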

Table 5 Results of Naïve Bayes technique

| | No of instances | Percentage (%) |
|---|---|---|
| Correctly classified instances | 255 | 84.1584 |
| Incorrectly classified instances | 48 | 15.8416 |
| Total instances | 303 | |

Table 6 Results of K-nearest neighbor technique

| | No of instances | Percentage (%) |
|---|---|---|
| Correctly classified instances | 303 | 100 |
| Incorrectly classified instances | 0 | 0 |
| Total instances | 303 | |

where TP is the number of true positives, TN is the number of true negatives, and P and N are the numbers of actual positives and actual negatives, respectively. A good predictor must have high sensitivity, high specificity (equivalently, a low false positive rate, 1 − specificity) and high accuracy. The three prediction techniques are compared on these measures in Table 7.

The experiments are conducted with the WEKA tool, with the algorithms applied to the heart dataset. The graph in Fig. 2 reveals that sensitivity and accuracy are high, while the values reported under specificity are low; these values are consistent with false positive rates (1 − specificity) rather than with TN/N, so low values are favorable. Hence the predictors perform well in operational use. With respect to model creation, the results show that KNN has the highest accuracy, as expected, since KNN remembers all the instances. When used for prediction, however, the Decision Tree performs well compared with the other two methods on the given heart dataset.

References

1. Heart Disease - General Info and Peer reviewed studies. [Online] Available http://www.aristoloft.com

2. Patka, S., et al.: Recent trends and rapid development of applications in data mining. IOSR J. Comput. Sci. (IOSR-JCE) 73–78. e-ISSN: 2278-0661, p-ISSN: 2278-8727 (2014)

3. Tomar, D., Agarwal, S.: A survey of data mining approaches for healthcare. Int. J. Bio-Science and Bio-Technology 5(5), 241–256 (2013)

4. El-Sappagh, S.H., et al.: Data mining and knowledge discovery: applications, techniques, challenges and process models in healthcare. Int. J. Eng. Res. Appl. (IJERA) 3(3), 900–906. ISSN: 2248-9622 www.ijera.com (2013)

5. Koh, H.C., Tan, G.: Data mining applications in healthcare. J. Healthc. Inf. Manag. 19(2), 65 (2011)

Table 7 Summary of the performance of the prediction techniques

| Prediction technique | Sensitivity | Specificity | Accuracy |
|---|---|---|---|
| Decision tree | 0.921 | 0.085 | 0.922 |
| Naïve Bayes | 0.842 | 0.165 | 0.842 |
| KNN | 1 | 0 | 1 |

Fig. 2 Comparison of prediction techniques (bar chart titled "Predictor Comparison" showing sensitivity, specificity and accuracy for Decision Tree, Naïve Bayes and KNN)

6. Obenshain, M.K.: Application of data mining techniques to healthcare data. Infect. Control Hosp. Epidemiol. 25(8), 690–695 (2004)

7. Shouman, M., Turner, T., Stocker, R.: Using data mining techniques in heart disease diagnosis and treatment. In: Proceedings in Japan–Egypt Conference on Electronics, Communications and Computers, vol. 2, pp. 174–177. IEEE (2012)

8. Bellazzi, R., Zupan, B.: Predictive data mining in clinical medicine: current issues and guidelines. Int. J. Med. Inf. 77(2), 81–97 (2006)

9. Gosain, A.: Analysis of healthcare data using different data mining techniques. IEEE. ISBN: 978-1-4244-4711-4 (2009)

10. Milovic, B., Milovic, M.: Prediction and decision making in health care using data mining. Int. J. Public Health Sci. (IJPHS) 1(2), 69–78 (2012). ISSN: 2252-8806

11. Melillo, P., De Luca, N., Bracale, M., Pecchia, L.: Classification tree for risk assessment in patients suffering from congestive heart failure via long-term heart rate variability. IEEE J. Biomed Health Inf. 17(3), 727–733 (2013)

12. Rao, R.B., Krishan, S., Niculescu, R.S.: Data mining for improved cardiac care. ACM SIGKDD Explor. Newsl. 8(1), 3–10 (2006)

13. Suneetha, N., Hari, V.M.K., Kumar, V.S.: Modified gini index classification: a case study of heart disease dataset. Int. J. Comput. Sci. Eng. 2(6), 1959–1965 (2010)

14. Han, J., Kamber, M.: Data Mining: Concepts And Techniques. Morgan Kaufmann, San Francisco (2001)

15. Tan, P.N., Steinbach, M., Kumar, V.: Introduction to Data Mining, 4th edn. Pearson Publications, Boston

16. Ferdousy, E.Z., Islam, M.M., Matin, M.A.: Combination of Naïve Bayes classifier and K-nearest neighbor in the classification based predictive models. J. Comput. Inf. Sci. 6(3), 48–56. ISSN: 1913-8989 (2013)

17. Cheng, C., Chanani, N., Vengopalan, J., Maher, K., Wang, D.: icuARM - an ICU clinical decision support system using association rule mining. IEEE J. Transl. Eng. Health Med. 1 (2013)

18. Krishnaiah, V., Narasimha, G., Chandra, N.S.: Diagnosis of lung cancer prediction system using data mining classification techniques. Int. J. Comput. Sci. Inf. Technol. 4, 39–45 (2013)

19. Cataloluk, H., Kesler, M.: A diagnostic software tool for skin diseases with basic and weighted K-NN. IEEE. ISBN: 978-1-4673-1448-0/12 (2012)

20. WEKA: Data Mining Machine Learning Software. [Online] Available http://www.cs.waikato.ac.nz/ml/weka/

21. UCI Machine Learning Repository. [Online] Available http://archive.ics.uci.edu/ml/datasets.html

22. Cios, K.J., William Moore, G.: Uniqueness of medical data mining. J. Artif. Intell. Med. 26(1), 1–24 (2002)

Prediction of Heart Disease …511

RELATIVE STUDY OF DIFFERENT MACHINE LEARNING CLASSIFICATION ALGORITHMS TO FORECAST THE HEART DISEASE

Article

May 2022

Cardiovascular is comprehended as heart disease, and it covers different cases that impact the heart and have been the direct cause of death. It allies numerous risk elements in heart disease and the necessity to build practical strategies for earlier diagnosis to manage the condition promptly. The machine learning classifier has evolved significantly in the medical database, particularly for diagnosing disease. Nowadays, numerous organizations utilize these machine learning strategies to improve medical diagnostics for the earlier prognosis of conditions. This paper summarizes these machine learning algorithms in disease prediction and computational. Classification algorithms considered are logic regression, Naive Bayes, K-NN (nearest neighbor), (Support vector machine), SVM, (Decision Tree) DT, K-means clustering, and RF (random forest). This work reviewed 20 papers from 2018- 2021 that employed Classifier to detect specifically heart diseases in the medical sector during the last four years.

Heart Diseases Forecast Using Data Mining Techniques and Tools

Article

Full-text available

Dec 2019

Data Mining have always been a field and combination of both computer science and statistical knowledge. From the beginning it is used to ascertain designs, patterns and arrangements which are formed in the information pool. The motive of the data mining development is to produce useful information from the pool of raw data and convert it into useful information which can be used for future arrangements. The tools which are used in data mining are helpful in predicting the future trends and predictions across the market, which also help in decision making and building the knowledge to make decisions. The “Healthcare Industry” is generally information rich. It has been collecting data to improve the continuing problems and help to identify the solutions for that problems. Data mining techniques can be used to predict heart conditions from the voluminous and complex data which are kept by the hospitals for decision making which are difficult to analyze by outmoded methods. Unfortunately, outmoded methods are less accurate in discovering hidden information from effective decision making. Data mining helps in altering the huge amount of data into knowledge driven which takes, as compared to others, less time and effort for the prediction and with greater accuracy. Our effort is to apply different data mining techniques that are used to solve the problem of biased forecasts and decision making and help in calculating the results with more accuracy.

Cardiovascular Abnormalities Identification by Machine Learning Classifiers and its Comparision

Conference Paper

Full-text available

Dec 2023

Data Mining Techniques in Disease Classification: Descriptive Bibliometric Analysis and Visualization of Global Publications

Article

Full-text available

Jan 2023

Many literature searches are required in scientific study, and these take a significant amount of time and effort. The bibliometric analysis is useful for locating research hotspots and gaining an understanding of research trends, according to the published literature. This bibliometric analysis was carried out with the assistance of the tools Bibliometric package in R (biblioshyny) and VOSviewer; the data used in this analysis was primarily derived from the Scopus repository to analyze the data mining approaches for disease classification (DMDC). A sample of 804 articles was selected by utilizing a query including essential key terms such as (”data mining” OR “data-mining”) AND (“disease classification” OR “disease identification” OR “disease prediction”). Overall, the findings of the study indicate the highest number of publications on applying DMDC published in 2019 with 141 research articles. As for individual researchers, the most productive authors are Jabbar MA, Li J, and Wang X. Jabbar MA in the field of DMDC, and 57 articles (1.2%) were written by one author, while the rest of the 747 articles were written by multiple authors. The Advances in Intelligent Systems and Computing journal (26 articles) has the greatest number of published articles connected to the DMDC field. Total of 804 articles in 474 distinct journals, there are 57 journals that have already published more than three papers, accounting for 12.03%. The USA was the most productive country, and the University of California is the affiliated university that comes from most top research in this field. The most cited research article published by Moore JH in 2006 included 489 citations. There are different types of diseases identified using data mining techniques such as heart disease, Brest cancer, liver disease, chronic kidney disease, Parkinson’s disease, diabetes mellitus, and Alzheimer’s disease. 
The most widely used algorithms in the research community include the random forest, decision tree, support vector machine, and naive bayes algorithms. In the future, the field of data mining could grow in many different directions and could be an effective way to increase the accuracy of disease prediction.

Augmentasi Awal untuk Diagnosis Diabetes Menggunakan Pendekatan Data Mining

Article

Full-text available

Jan 2021

Wildan Akbar Kombat Ginting

Di bidang medis, spesialis ingin data yang tersedia untuk membuat keputusan. Teknik data mining saat ini terkait dengan penyelidikan terapeutik untuk memecah sejumlah besar data penyembuhan. Penelitian ini bertujuan untuk menggunakan teknik data mining untuk mengeksplorasi database diabetes yang dapat digunakan untuk diagnosis yang lebih baik. Oleh karena itu, penelitian ini berfokus pada analisis data diabetes menggunakan berbagai sistem data mining. Pengobatan mutakhir menghasilkan sejumlah besar data yang dimasukkan ke dalam database terapeutik. Analisis yang tepat dari data tersebut dapat mengungkapkan beberapa realitas mengejutkan yang telah disembunyikan atau didistribusikan entah bagaimana. Data mining adalah bidang yang bertujuan untuk fokus pada fakta menarik dari pengumpulan data yang ekstensif. Penelitian ini bertujuan untuk memecah kumpulan data diabetes dan mengekstrak beberapa fakta menarik yang dapat digunakan untuk mengembangkan model harapan. Salah satu aplikasi di mana instrumen data mining menunjukkan nilainya adalah menentukan penyebab penyakit. Sepanjang tahun-tahun sebelumnya, diabetes telah menjadi penyebab utama kematian di seluruh dunia. Beberapa ilmuwan menggunakan data faktual. Ketersediaan data terapeutik dalam jumlah besar memerlukan pengembangan alat penambangan yang canggih untuk membantu para profesional medis dalam mendiagnosis infeksi diabetes. Penggunaan data mining sebagai bagian dari diagnosis diabetes telah diselidiki dengan cermat, menampilkan tingkat presisi yang memuaskan. Spesialis baru-baru ini mengeksplorasi pengaruh menggabungkan banyak prosedur dengan hasil yang ditingkatkan dalam mendeteksi penyakit diabetes.

A Hybrid Method: Hierarchical Agglomerative Clustering Algorithm with Classification Techniques for Effective Heart Disease Prediction

Article

Jan 2022

Prediction of heart disease is challenging because countless data are collected for clinical data analysis, but all this information is not equally important for making the right decisions. We have proposed a hybrid method: Hierarchical Agglomerative Clustering algorithm combined with conventional classification techniques such as K-Nearest Neighbors (K-NN), Decision Tree (J48), and Naïve Bayes which aims to reduce the prediction time by clustering the patients having almost similar symptoms of heart failure. This approach minimizes the forecasting time based on clusters of patients instead of individual patients. Moreover, a comparison between the classification techniques and our approach is depicted based on precision, recall, F1 score, accuracy, and prediction time. The accuracies of the classifiers (K-NN-66.67%, J48-83.33%, and Naïve Bayes83.33%) of our system have slightly decreased compared with the conventional methods (K-NNN-69.128%, J48-83.8926%, and Naïve Bayes-87.248%) but the prediction time was significantly low (K-NNN-230ms, J48-203ms, and Naïve Bayes-195ms).

Early Exposing Cardiovascular Disease Identification Using Machine Learning Approach

Conference Paper

Full-text available

Oct 2022

Heart Disease Prediction System Using Linear Regression, Smoreg, And Rep Trees Algorithms

Article

Full-text available

Aug 2019

The broad area of data mining covers most of the fields of research. Its role in medical diagnosis is very motivative to the researchers. It is very easy for the medical practitioners to analyze and treat the disease of the patients at an early stage. The proposed work deals with predicting heart disease of the patients at an early stage. The method was organized in three stages, Data collection, Data preprocessing and Data classification. The dataset for the work was collected from UCI repository. The collected sample was first preprocessed to clean unwanted information from the dataset. Classification operation is then performed on the preprocessed data. Classification is carried out with three different techniques, Linear regression model, SMOreg and REP trees. The results of the three methods were compared based on Root mean squared error and the Absolute error and are tabulated.

PREDICTIVE THE HEART DISEASE USING THE WEIGHTED GAIN DECISION TREE ALGORTHIM

Article

May 2022

Data mining techniques have been mostly used in medical area for prediction and diagnosis of various diseases. These techniques discover the hidden pattern and relationship in medical data and therefore have been very important in designing clinical support. Now a day's data mining techniques are widely used in diagnosis of heart disease because of increasing death rate worldwide. The reason of this may be the complex and expensive tests conducted in labs to predict the heart disease. Systems based on these risk factors not only benet healthcare professionals, but warn them of the potential presence of heart disease even before a patient is admitted to the hospital or undergoes an expensive medical examination. This in order to reduce the risk of this disease a better approach would to identify risk factor the result in heart disease. This study is an effort in this direction. This approach to predict the heart disease in early stage is developed in present study by analyzing risk factors. This technique developed weighted gain decision tree predicts the risk of heart disease with an accuracy of 90%.

Multilingual Conversational AI incorporated with Visual Questions Answering and Intelligent Disease Prediction for Healthcare Industry

Conference Paper

Apr 2022

Combination of Naïve Bayes Classifier and K-Nearest Neighbor (cNK) in the Classification Based Predictive Models

Article

Full-text available

May 2013

In this study, we present a new classifier that combines the distance-based algorithm K-Nearest Neighbor and statistical based Naïve Bayes Classifier. That is equipped with the power of both but avoid their weakness. The performance of the proposed algorithm in terms of accuracy is experimented on some standard datasets from the machine-learning repository of University of California and compared with some of the art algorithms. The experiments show that in most of the cases the proposed algorithm outperforms the other to some extent. Finally we apply the algorithm for predicting profitability positions of some financial institutions of Bangladesh using data provided by the central bank.

A diagnostic software tool for skin diseases with basic and weighted K-NN

Conference Paper

Full-text available

Jul 2012

In the field of dermatology, to able to make differential diagnosis of erythemato-squamous diseases between each other accurately, is quite significant for the treatment of the disease. Especially the symptoms seen in the early stages of diseases in this group, may be very similar to each other. And this situation makes it difficult to determine accurate diagnosis for patients. Hence this study presents a data mining application for this problem in the medical field. Mentioned software tool in this study is trying to obtain correct diagnosis of erythemato-squamous diseases by using the basic and weighted K-NN algorithms on medical data. In this way, this paper presents a comparison between these two methods by evaluating and presenting the performances of them. Furthermore there is also a comparison between the Euclidean and Manhattan distance measures.

icuARM-An ICU Clinical Decision Support System Using Association Rule Mining

Article

Full-text available

Nov 2013

The rapid development of biomedical monitoring technologies has enabled modern intensive care units (ICUs) to gather vast amounts of multimodal measurement data about their patients. However, processing large volumes of complex data in real-time has become a big challenge. Together with ICU physicians, we have designed and developed an ICU clinical decision support system icuARM based on associate rule mining (ARM), and a publicly available research database MIMIC-II (Multi-parameter Intelligent Monitoring in Intensive Care II) that contains more than 40,000 ICU records for 30,000+patients. icuARM is constructed with multiple association rules and an easy-to-use graphical user interface (GUI) for care providers to perform real-time data and information mining in the ICU setting. To validate icuARM, we have investigated the associations between patients' conditions such as comorbidities, demographics, and medications and their ICU outcomes such as ICU length of stay. Coagulopathy surfaced as the most dangerous co-morbidity that leads to the highest possibility (54.1%) of prolonged ICU stay. In addition, women who are older than 50 years have the highest possibility (38.8%) of prolonged ICU stay. For clinical conditions treatable with multiple drugs, icuARM suggests that medication choice can be optimized based on patient-specific characteristics. Overall, icuARM can provide valuable insights for ICU physicians to tailor a patient's treatment based on his or her clinical status in real time.

Classification Tree for Risk Assessment in Patients Suffering From Congestive Heart Failure via Long-Term Heart Rate Variability

Article

May 2013

This study aims to develop an automatic classifier for risk assessment in patients suffering from congestive heart failure (CHF). The proposed classifier separates lower risk patients from higher risk ones, using standard long-term heart rate variability (HRV) measures. Patients are labeled as lower or higher risk according to the New York Heart Association (NYHA) classification. A retrospective analysis of two public Holter databases was performed, analyzing the data of 12 patients suffering from mild CHF (NYHA I and II), labeled as lower risk, and 32 suffering from severe CHF (NYHA III and IV), labeled as higher risk. To ensure satisfactory signal quality, only patients whose fraction of total heartbeat intervals (RR) classified as normal-to-normal (NN) intervals (NN/RR) was higher than 80% were selected as eligible. Classification and regression tree (CART) was employed to develop the classifiers. A total of 30 higher risk and 11 lower risk patients were included in the analysis. The proposed classification trees achieved a sensitivity and a specificity of 93.3% and 63.6%, respectively, in identifying higher risk patients. Finally, the rules obtained by CART are comprehensible and consistent with the consensus of previous studies that depressed HRV is a useful tool for risk assessment in patients suffering from CHF.
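The reported rates can be checked against the cohort sizes with a short calculation. The counts of 28 correctly identified higher risk patients and 7 correctly identified lower risk patients are not stated in the abstract; they are the only whole-number counts out of 30 and 11 that round to the reported 93.3% and 63.6%, so they are used here as an inference.

```python
def sensitivity(true_positives, false_negatives):
    """Share of actual positives (higher risk) that were identified."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of actual negatives (lower risk) that were identified."""
    return true_negatives / (true_negatives + false_positives)


# Inferred counts (assumption): 28 of 30 higher risk, 7 of 11 lower risk.
sens_pct = round(100 * sensitivity(28, 2), 1)
spec_pct = round(100 * specificity(7, 4), 1)
```

Under these inferred counts, two higher risk patients would be missed and four lower risk patients would be flagged as higher risk.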

A survey on Data Mining approaches for Healthcare

Article

Oct 2013

Divya Tomar

Data Mining is one of the most motivating areas of research and has become increasingly popular in health organizations. Data Mining plays an important role in uncovering new trends in healthcare organizations, which in turn is helpful for all the parties associated with this field. This survey explores the utility of various Data Mining techniques such as classification, clustering, association, and regression in the health domain. In this paper, we present a brief introduction to these techniques and their advantages and disadvantages. This survey also highlights applications, challenges, and future issues of Data Mining in healthcare. Recommendations regarding the suitable choice of available Data Mining techniques are also discussed in this paper.

Data Mining: Concepts and Techniques

Book

Jan 2012

This is the third edition of the premier professional reference on the subject of data mining, expanding and updating the previous market-leading edition. This was the first (and is still the best and most popular) of its kind. It combines sound theory with truly practical applications to prepare students for real-world challenges in data mining. Like the first and second editions, Data Mining: Concepts and Techniques, 3rd Edition equips professionals with a sound understanding of data mining principles and teaches proven methods for knowledge discovery in large corporate databases. The first and second editions also established themselves as the market leaders for courses in data mining, data analytics, and knowledge discovery. Revisions incorporate input from instructors, changes in the field, and new and important topics such as data warehouse and data cube technology, mining stream data, mining social networks, and mining spatial, multimedia and other complex data. The book begins with a conceptual introduction followed by comprehensive and state-of-the-art coverage of concepts and techniques. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. Wherever possible, the authors raise and answer questions of utility, feasibility, optimization, and scalability. Highlights include a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data; updates that incorporate input from readers, changes in the field, and more material on statistics and machine learning; scores of algorithms and implementation examples, all in easily understood pseudo-code and suitable for use in real-world, large-scale data mining projects; and complete classroom support for instructors as well as bonus content available at the companion website.
A comprehensive and practical look at the concepts and techniques you need in the area of data mining and knowledge discovery.

Data Mining: Concepts and Techniques

Article

Jan 2006

Prediction and decision making in Health Care using Data Mining

Article

Aug 2012

Interest in applying data mining to healthcare is high today because the healthcare sector is rich in information, and data mining is becoming a necessity. Healthcare organizations produce and collect large volumes of information on a daily basis. Information technologies allow the automation of data extraction processes that help to uncover interesting knowledge and regularities. This eliminates manual tasks and makes it easier to extract data directly from electronic records and transfer it to a secure electronic medical record system, which will save lives and reduce the cost of healthcare services, and it enables early discovery of contagious diseases through advanced data collection. Data mining can enable healthcare organizations to predict trends in patient conditions and behaviors, which is accomplished by analyzing data from different perspectives and discovering connections and relations in seemingly unrelated information. Raw data from healthcare organizations are voluminous and heterogeneous. They need to be collected and stored in organized forms, and their integration enables the formation of a hospital information system. Healthcare data mining provides countless possibilities for investigating hidden patterns in these data sets. These patterns can be used by physicians to determine diagnoses, prognoses and treatments for patients in healthcare organizations.

Using data mining techniques in heart disease diagnosis and treatment

Article

Mar 2012

The availability of huge amounts of medical data leads to the need for powerful data analysis tools to extract useful knowledge. Researchers have long been concerned with applying statistical and data mining tools to improve data analysis on large data sets. Disease diagnosis is one of the applications where data mining tools are producing successful results. Heart disease has been the leading cause of death worldwide over the past ten years. Several researchers are using statistical and data mining tools to help health care professionals in the diagnosis of heart disease. The use of a single data mining technique in the diagnosis of heart disease has been comprehensively investigated and shows acceptable levels of accuracy. Recently, researchers have been investigating the effect of hybridizing more than one technique, which shows enhanced results in the diagnosis of heart disease. However, using data mining techniques to identify a suitable treatment for heart disease patients has received less attention. This paper identifies gaps in the research on heart disease diagnosis and treatment and proposes a model to systematically close those gaps, in order to discover whether applying data mining techniques to heart disease treatment data can provide performance as reliable as that achieved in diagnosing heart disease.

Analysis of health care data using different data mining techniques

Article

Jul 2009

Data mining is an interesting field of research whose major objective is to acquire knowledge from large amounts of data. With advances in health care related research, there is a wealth of data available. However, there is a lack of effective analytical tools to discover hidden and meaningful patterns and trends in data, which is essential for any research. In recent years, human immunodeficiency virus (HIV) related illnesses have become a threat to the modern world. Researchers all over the world, including in India, are trying hard to find a suitable answer to this, and this has led to a great deal of research in the field. Therefore, a tool that can process data in a meaningful way is the need of the hour. In this study, we briefly examine the potential use of classification based data mining techniques, such as decision tree and association rule, on massive volumes of health care data. Further, we developed a prototype/approach that is specially designed to monitor patients receiving antiretroviral therapy (ART). Monitoring an individual is not a difficult task; however, deriving inferences from a large cohort and then using this information for future guidelines needs this kind of prototype/approach. We expect this would have a great impact on current management and future strategies against HIV.
