
Prediction of Heart Disease Using Classification Based Data Mining Techniques

Sujata Joshi and Mydhili K. Nair
Abstract Data mining is a field of research whose major objective is to find
interesting and useful patterns in large data sets. These patterns can then be used
to make important decisions based on the results of the analysis. The healthcare
industry today generates huge amounts of data on a day-to-day basis. This data has
to be analysed so that hidden, meaningful patterns can be discovered, and data
mining plays a promising and significant role here. Data mining techniques can be
used for disease prediction. In this research, classification-based data mining
techniques are applied to healthcare data. The research focuses on the prediction
of heart disease using three classification techniques, namely Decision Trees,
Naïve Bayes and K-Nearest Neighbour.
Keywords Data mining · Classification technique · Heart disease · Healthcare ·
Decision tree · Naïve Bayes · K-Nearest neighbor · Dataset
1 Introduction
Heart disease is a class of diseases that involve the heart, the blood vessels or both.
The most common causes of heart disease are atherosclerosis and/or hypertension.
Atherosclerosis is a condition that develops when a substance called plaque builds
up in the walls of the arteries. This buildup narrows the arteries, making it harder
for blood to flow through. If a blood clot forms, it can stop the blood flow. This can
cause a heart attack or stroke. The major risk factors for heart disease are age,
gender, high blood pressure, diabetes mellitus, tobacco smoking, processed meat
consumption, excessive alcohol consumption, sugar consumption, family history,
obesity, lack of physical activity, psychosocial factors, and air pollution.

S. Joshi (✉)
Department of Computer Science and Engineering, Nitte Meenakshi Institute of Technology,
Bangalore, Karnataka, India
e-mail: sujata_msrp@yahoo.com
M.K. Nair
Department of Information Science and Engineering, M. S. Ramaiah Institute of Technology,
Bangalore, Karnataka, India
e-mail: mydhili.nair@gmail.com
© Springer India 2015
L.C. Jain et al. (eds.), Computational Intelligence in Data Mining - Volume 2,
Smart Innovation, Systems and Technologies 32, DOI 10.1007/978-81-322-2208-8_46

Heart disease is the leading cause of death worldwide. Since the 1970s, however, the
mortality rate due to heart-related diseases has declined in many high-income
countries, while heart-related deaths and diseases have increased at a fast rate in
low- and middle-income countries. Although heart disease usually affects older
adults, the symptoms may begin in early life, making primary prevention efforts
necessary from childhood. Risk factors may therefore be modified by healthy eating
habits, regular exercise, and avoiding tobacco smoking.
In today's world, most hospitals maintain their patient data in electronic form
through a hospital database management system. These systems generate huge
amounts of data on a daily basis. The data may be free text, structured records as
in databases, or images. It can be used to extract useful information for decision
making. This requirement has led to the use of Knowledge Discovery in Databases
(KDD), which transforms low-level data into high-level knowledge for decision
making. Data mining, which is one step of the KDD process, aims at finding useful
patterns in large datasets. These patterns can be analyzed further and the results
used for effective decision making. The main tasks of data mining are
classification, clustering, association analysis and outlier detection. In this
paper, several data mining classification techniques are applied to healthcare data
related to heart disease, in order to determine the best prediction technique in
terms of accuracy and error rate on the specific dataset.
2 Related Work
The number of people suffering from heart disease has increased in recent years [1].
With the advent of information technology and its applications, data mining plays
an important role in the early detection of diseases. Data mining is used
extensively in all fields, and in the healthcare industry in particular [2–6]. In
the healthcare industry, data mining techniques are used for diagnosis of diseases
[7], disease prediction [8], and analysis [9]. Data mining techniques can be applied
to predict an outcome of interest, which makes prediction a very important task.
The issues and guidelines of predictive data mining in clinical medicine are
discussed in [10]. Research work [7, 11, 12] on heart disease diagnosis using data
mining techniques is the motivation for this work. Classification based on the Gini
index is discussed in [13]. The data mining techniques Decision Tree, Naïve Bayes
and KNN are discussed in [8, 10, 14, 15]. A model based on a combination of the
Naïve Bayes classifier and K-Nearest Neighbor is proposed in [16]. A clinical
decision support system using association rule mining is discussed in [17]. A
prediction system for lung cancer detection is proposed in [18]. A diagnostic tool
for skin diseases is proposed in [19]. In [6, 9], the researchers analyze healthcare
data using different data mining techniques. An extensive survey of the datasets,
algorithms, methods, results and future work reported by these authors shows that
there is considerable scope for discovering efficient methods of medical diagnosis
and analysis for various diseases. This work is an attempt to predict the occurrence
of heart disease using the classification techniques Decision Tree, Naïve Bayes and
K-Nearest Neighbor.
3 Classification
Classification is one of the important data mining tasks. Its objective is to assign
a class to previously unseen data accurately. Classification consists of two stages:
Stage 1: Model construction
Stage 2: Model usage
Classification creates a model over the attributes of the dataset. The dataset is
divided into a training set and a test set. In the first stage, the training set is
used to build the classification model using a learning algorithm. In the second
stage, the learned model is applied to the test set to validate it. If the model
performs well on the test set, it is ready to be used for prediction.
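
The two stages can be illustrated with a short Python sketch. It is not the WEKA
workflow used in this study: the records are hypothetical toy data, the helper
names are made up for illustration, and the learner is a deliberately trivial
majority-class stand-in where any learning algorithm could be substituted.

```python
import random
from collections import Counter


def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle the records and split them into a training set and a test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def build_model(training_set):
    """Stage 1 (model construction): learn from the training set.

    This stand-in learner only memorises the majority class label.
    """
    labels = [label for _, label in training_set]
    majority = Counter(labels).most_common(1)[0][0]
    return lambda features: majority


def validate(model, test_set):
    """Stage 2 (model usage): fraction of test records classified correctly."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return correct / len(test_set)


# Hypothetical toy records of the form (features, class label).
records = [({"exang": "yes"}, "sick")] * 6 + [({"exang": "no"}, "healthy")] * 4

train, test = holdout_split(records)
model = build_model(train)
accuracy = validate(model, test)
print(f"{len(train)} training records, {len(test)} test records, accuracy {accuracy:.2f}")
```

Keeping the test records out of model construction is what makes the second-stage
accuracy an estimate of performance on unseen data.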
3.1 Classification Techniques
In this study, the classification techniques Decision Tree, Naïve Bayes and KNN are
explored and applied to the dataset.
3.1.1 Decision Trees
A decision tree is a structure made up of a root node, branches and leaf nodes. Each
internal node denotes a test on an attribute, each branch denotes an outcome of the
test, and each leaf node holds a class label. The first node in the tree is the root
node. First, an attribute is selected and placed at the root node, and a branch is
made for each possible value. This splits the data set into subsets, one for every
value of the attribute. The process is then repeated recursively for each branch,
using only those instances that actually reach the branch. When all instances at a
node have the same classification, development of that part of the tree can stop. To
select the best split, the measures generally used are the Gini index, entropy or
classification error.
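
As an illustration of these three split measures, the following Python sketch
computes each of them for the class labels at a node, and the size-weighted
impurity of a candidate split. The function names and the toy labels are
assumptions made for this example; they are not taken from WEKA.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Relative frequency of each class among the labels at a node."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy in bits: minus the sum of p * log2(p) over the classes present."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def classification_error(labels):
    """Classification error: 1 minus the proportion of the majority class."""
    return 1.0 - max(class_proportions(labels))


def weighted_impurity(subsets, measure):
    """Impurity of a split: size-weighted average over the child subsets."""
    total = sum(len(subset) for subset in subsets)
    return sum(len(subset) / total * measure(subset) for subset in subsets)


# Hypothetical node with five instances of each class, and one candidate split.
node = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["no"] * 4 + ["yes"]]

for measure in (gini, entropy, classification_error):
    before = measure(node)
    after = weighted_impurity(split, measure)
    print(f"{measure.__name__}: {before:.3f} before split, {after:.3f} after")
```

All three measures are zero for a pure node and largest when the classes are
evenly mixed, so the preferred split is the one that reduces the chosen measure
the most.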
3.1.2 Naïve Bayes Classifier
Classification based on Bayes' theorem is known as Bayesian classification. The
Naïve Bayes classifier is a statistical classifier based on Bayes' theorem. It
assumes that the attributes are statistically independent given the class.
Given two events A and B, where P(A) is the prior probability of A and P(A|B) is its
posterior probability, Bayes' theorem states
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},$$
where P(B|A) is computed as
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$
These probabilities are used to determine the most likely class for a given
instance, given all the training data. The conditional probabilities are estimated
from the training data.
This classifier yields optimal predictions when its assumptions hold. It can handle
both discrete and numeric attribute values.
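
A minimal Python sketch of this classifier for nominal attributes is given
below. It is an illustration under stated assumptions rather than WEKA's
implementation: the function names and the six training rows are hypothetical,
and add-one (Laplace) smoothing is assumed so that an unseen attribute value
does not force a probability of zero. The score for each class is the prior
multiplied by the conditional probability of every attribute value; it is
proportional to the posterior because the shared denominator P(B) is omitted.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    domains = defaultdict(set)           # attribute -> values seen in training
    for row, label in zip(rows, labels):
        for attribute, value in row.items():
            value_counts[(attribute, label)][value] += 1
            domains[attribute].add(value)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the most likely class and the unnormalised score of each class."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for attribute, value in row.items():
            # Add-one smoothed estimate of P(value | class).
            numerator = value_counts[(attribute, label)][value] + 1
            denominator = count + len(domains[attribute])
            score *= numerator / denominator
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical training rows using two attribute names from the heart data set.
rows = [
    {"cp": "asympt", "exang": "yes"},
    {"cp": "asympt", "exang": "yes"},
    {"cp": "asympt", "exang": "no"},
    {"cp": "non_anginal", "exang": "no"},
    {"cp": "non_anginal", "exang": "no"},
    {"cp": "asympt", "exang": "no"},
]
labels = ["sick", "sick", "sick", "healthy", "healthy", "healthy"]

model = train_naive_bayes(rows, labels)
predicted, scores = predict(model, {"cp": "asympt", "exang": "yes"})
print(predicted, scores)
```

For the query row, the score for "sick" is 0.5 × 4/5 × 3/5 = 0.24 and the score
for "healthy" is 0.5 × 2/5 × 1/5 = 0.04, so "sick" is predicted.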
3.1.3 K-Nearest Neighbor
The nearest neighbor method is an instance-based classification technique that
stores all the training instances. When a new instance is encountered, it is
compared with the stored instances, and the prediction is the class of the most
similar previously observed instance. K-NN generalizes this by classifying the new
instance from its K nearest neighbors. The classifier trains quickly, since training
only stores the instances, but it is slow at prediction time when the dataset is
large, since it has to evaluate all stored instances.
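
The method can be sketched in a few lines of Python. The sketch assumes
numeric feature vectors, Euclidean distance and a simple majority vote; the
(age, thalach) pairs are hypothetical values chosen for illustration, and the
function names are not from WEKA.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(training, query, k=3):
    """Classify query by majority vote among its k nearest training instances."""
    neighbors = sorted(training, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical (age, thalach) vectors with class labels.
training = [
    ((65, 110), "sick"),
    ((67, 108), "sick"),
    ((62, 115), "sick"),
    ((41, 172), "healthy"),
    ((45, 178), "healthy"),
    ((38, 182), "healthy"),
]

print(knn_predict(training, (60, 118)))
print(knn_predict(training, (43, 175)))

# With k = 1, every stored instance is its own nearest neighbor.
resubstitution = [knn_predict(training, vector, k=1) == label
                  for vector, label in training]
print(all(resubstitution))
```

The last check shows why evaluating K-NN on the same instances it has stored
gives a perfect score when K is 1 and the instances are distinct: each instance
is at distance zero from itself.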
4 Methodology
4.1 Tool Used
WEKA (Waikato Environment for Knowledge Analysis) [20] is a collection of data
mining algorithms and tools for data analysis. WEKA is developed in Java. It can
analyze data sets saved in the .arff format using various algorithms. In this study,
the Decision Tree, Naïve Bayes and K-NN algorithms are applied to the heart data
set, and the results of applying these techniques are shown.
4.2 Data Source
The heart disease data set from the UCI Machine Learning Repository [21] is used for
this study. The heart data set consists of 303 records and 14 attributes. The
attributes are listed in Table 1.

Table 1 Attributes of the heart.arff file
No  Attribute  Type
1   age        real
2   sex        {female, male}
3   cp         {typ_angina, asympt, non_anginal, atyp_angina}
4   trestbps   real
5   chol       real
6   fbs        {t, f}
7   restecg    {left_vent_hyper, normal, st_t_wave_abnormality}
8   thalach    real
9   exang      {no, yes}
10  oldpeak    real
11  slope      {up, flat, down}
12  ca         real
13  thal       {fixed_defect, normal, reversable_defect}
14  num        {<50, >50_1, >50_2, >50_3, >50_4}

4.3 Decision Tree
The decision tree is created by selecting the best split at every node. To select the
best attribute for the split, the information gain is computed at each node and the
attributes are ranked accordingly. Here the attribute evaluator used is
GainRatioAttributeEval and the search method is Ranker, both from the WEKA tool.
The ranked attributes are listed in Table 2.

Table 2 Attribute ranking based on information gain
Info gain  Attribute no.  Attribute
0.17       12             thal
0.16       13             ca
0.15       9              exang
0.13       8              thalach
0.11       3              cp
0.10       10             oldpeak
0.09       11             slope
0.065      2              sex
0.060      1              age
0.022      7              restecg
0          6              fbs
0          5              chol
0          4              trestbps

The attributes selected, in order, are: 12, 13, 9, 8, 3, 10, 11, 2, 1, 7, 6, 5, 4.
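
The kind of ranking reported in Table 2 can be sketched in Python as follows.
This is an illustration only, not the WEKA evaluator: the function names and the
six-row toy table are hypothetical, and the sketch handles nominal attributes
only, so numeric attributes would first need to be discretized. Information gain
is the entropy of the class labels minus the size-weighted entropy after
splitting on an attribute; gain ratio divides that gain by the entropy of the
attribute's own value distribution.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Entropy in bits of the empirical distribution of items."""
    total = len(items)
    return -sum(c / total * math.log2(c / total) for c in Counter(items).values())


def information_gain(values, labels):
    """Class entropy minus the weighted class entropy after splitting on values."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the attribute values."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


def rank_attributes(table, labels, score=information_gain):
    """Return (attribute, score) pairs sorted from most to least informative."""
    scores = {name: score(column, labels) for name, column in table.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical six-instance table using two attribute names from Table 1.
labels = ["sick", "sick", "sick", "healthy", "healthy", "healthy"]
table = {
    "sex": ["male", "female", "male", "female", "male", "female"],
    "exang": ["yes", "yes", "yes", "no", "no", "no"],
}

ranking = rank_attributes(table, labels)
for name, value in ranking:
    print(f"{value:.3f}  {name}")
```

In this toy table exang separates the two classes perfectly and therefore ranks
first, while sex carries little information about the class.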
The decision tree algorithm J48 is then applied to the heart data set, generating
the decision tree in Fig. 1. This decision tree can be used for prediction. The
results are shown in Table 3.

Fig. 1 Decision tree generated using J48 algorithm

Table 3 Results of decision tree algorithm
                                  No of instances  Percentage (%)
Correctly classified instances    279              92.0792
Incorrectly classified instances  24               7.9208
Total instances                   303

4.4 Naïve Bayes
The attribute evaluator used is GainRatioAttributeEval and the search method is
Ranker. The ranked attributes are the same as for the decision tree. The Naïve
Bayes algorithm is applied to the heart data set; the results for a few attributes
are shown in Table 4, and the classification results are shown in Table 5.

Table 4 Results for a few attributes using the Naïve Bayes technique
Attribute                <50 (0.54)  >50_1 (0.45)  >50_2 (0)  >50_3 (0)  >50_4 (0)
cp
  typ_angina             17.0        8.0           1.0        1.0        1.0
  asympt                 40.0        105.0         1.0        1.0        1.0
  non_anginal            70.0        19.0          1.0        1.0        1.0
  atyp_angina            42.0        10.0          1.0        1.0        1.0
  [total]                169.0       142.0         4.0        4.0        4.0
restecg
  left_vent_hyper        69.0        80.0          1.0        1.0        1.0
  normal                 97.0        57.0          1.0        1.0        1.0
  st_t_wave_abnormality  2.0         4.0           1.0        1.0        1.0
  [total]                168.0       141.0         3.0        3.0        3.0

Table 5 Results of Naïve Bayes technique
                                  No of instances  Percentage (%)
Correctly classified instances    255              84.1584
Incorrectly classified instances  48               15.8416
Total instances                   303

4.5 K-Nearest Neighbor
The KNN algorithm is applied to the heart data set and the results are shown in
Table 6.

Table 6 Results of K-nearest neighbor technique
                                  No of instances  Percentage (%)
Correctly classified instances    303              100
Incorrectly classified instances  0                0
Total instances                   303

5 Results and Conclusion
The evaluation measures used are sensitivity, specificity and accuracy:
(i) Sensitivity = TP/P
(ii) Specificity = TN/N
(iii) Accuracy = (TP + TN)/(P + N)
where TP is the number of true positives, TN is the number of true negatives, and P
and N are the numbers of actual positives and actual negatives, respectively. A good
predictor must have high sensitivity, high specificity (that is, a low false-positive
rate, 1 − specificity) and high accuracy. The comparisons of these measures with
respect to the three prediction techniques are summarized in Table 7.
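
Measures (i) to (iii) translate directly into code. In the Python sketch below,
the confusion counts used in the first example are hypothetical, since the
per-class counts are not listed in this paper; the second part recomputes
accuracy from the correctly classified counts reported in Tables 3, 5 and 6.

```python
def sensitivity(tp, p):
    """Measure (i): true positives divided by actual positives."""
    return tp / p


def specificity(tn, n):
    """Measure (ii): true negatives divided by actual negatives."""
    return tn / n


def accuracy(tp, tn, p, n):
    """Measure (iii): correctly classified instances divided by all instances."""
    return (tp + tn) / (p + n)


# Hypothetical confusion counts, for illustration only.
tp, p, tn, n = 90, 100, 180, 200
example = (sensitivity(tp, p), specificity(tn, n), accuracy(tp, tn, p, n))
print(example)

# Correctly classified instances reported in Tables 3, 5 and 6 (303 in total).
total_instances = 303
reported_correct = {"Decision tree": 279, "Naive Bayes": 255, "KNN": 303}
accuracies = {name: correct / total_instances
              for name, correct in reported_correct.items()}
for name, value in accuracies.items():
    print(f"{name}: {value:.4f}")
```

The recomputed accuracies (0.9208, 0.8416 and 1.0000) agree with the percentages
given in Tables 3, 5 and 6.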
The experiments are conducted with the WEKA tool, applying the algorithms to the
heart dataset. The graph in Fig. 2 shows that sensitivity and accuracy are high and
the false-positive rate (1 − specificity) is low. Hence the predictors perform well
in operational use. With respect to model creation, the results show that KNN has
the highest accuracy, as expected, since KNN remembers all the instances. When used
for prediction, however, the Decision Tree performs well compared to the other two
methods on the given heart dataset.

Table 7 Summarization of prediction techniques with performance
Prediction technique  Sensitivity  False-positive rate (1 − specificity)  Accuracy
Decision tree         0.921        0.085                                  0.922
Naïve Bayes           0.842        0.165                                  0.842
KNN                   1            0                                      1

Fig. 2 Comparison of prediction techniques (bar chart titled "Predictor Comparison";
groups: Sensitivity, Specificity, Accuracy; series: Decision Tree, Naïve Bayes, KNN;
vertical axis from 0 to 1)

References
1. Heart Disease – General Info and Peer reviewed studies: [Online] Available
http://www.aristoloft.com
2. Patka, S., et al.: Recent trends and rapid development of applications in data mining. IOSR
J. Comput. Sci. (IOSR-JCE) 73–78. e-ISSN: 2278-0661, p-ISSN: 2278-8727 (2014)
3. Tomar, D., Agarwal, S.: A survey of data mining approaches for healthcare. Int. J. Bio-Science
and Bio-Technology 5(5), 241–256 (2013)
4. El-Sappagh, S.H., et al.: Data mining and knowledge discovery: applications, techniques,
challenges and process models in healthcare. Int. J. Eng. Res. Appl. (IJERA) 3(3), 900–906.
ISSN: 2248-9622 www.ijera.com (2013)
5. Koh, H.C., Tan, G.: Data mining applications in healthcare. J. Healthc. Inf. Manag. 19(2), 65
(2011)
6. Obenshain, M.K.: Application of data mining techniques to healthcare data. Infect. Control
Hosp. Epidemiol. 25(8), 690–695 (2004)
7. Shouman, M., Turner, T., Stocker, R.: Using data mining techniques in heart disease diagnosis
and treatment. In: Proceedings in Japan–Egypt Conference on Electronics, Communications
and Computers, vol. 2, pp. 174–177. IEEE (2012)
8. Bellazzi, R., Zupan, B.: Predictive data mining in clinical medicine: current issues and
guidelines. Int. J. Med. Inf. 77(2), 81–97 (2006)
9. Gosain, A.: Analysis of healthcare data using different data mining techniques, IEEE, ISBN:
978-1-4244-4711-4 (2009)
10. Milovic, B., Milovic, M.: Prediction and decision making in health care using data mining. Int.
J. Public Health Sci. (IJPHS) 1(2), 69–78 (2012). ISSN: 2252-8806
11. Melillo, P., De Luca, N., Bracale, M., Pecchia, L.: Classification tree for risk assessment in
patients suffering from congestive heart failure via long-term heart rate variability. IEEE
J. Biomed Health Inf. 17(3), 727–733 (2013)
12. Rao, R.B., Krishan, S., Niculescu, R.S.: Data mining for improved cardiac care. ACM
SIGKDD Explor. Newsl. 8(1), 3–10 (2006)
13. Suneetha, N., Hari, V.M.K., Kumar, V.S.: Modified gini index classification: a case study of
heart disease dataset. Int. J. Comput. Sci. Eng. 2(6), 1959–1965 (2010)
14. Han, J., Kamber, M.: Data Mining: Concepts And Techniques. Morgan Kaufmann, San
Francisco (2001)
15. Tan, P.N., Steinbach, M., Kumar, V.: Introduction to Data Mining, 4th edn. Pearson
Publications, Boston
16. Ferdousy, E.Z., Islam, M.M., Matin, M.A.: Combination of Naïve Bayes classifier and
K-nearest neighbor in the classification based predictive models. J. Comput. Inf. Sci. 6(3),
48–56. ISSN: 1913-8989 (2013)
17. Cheng, C., Chanani, N., Vengopalan, J., Maher, K., Wang, D.: icuARM – an ICU clinical
decision support system using association rule mining. IEEE J. Transl. Eng. Health Med. 1
(2013)
18. Krishnaiah, V., Narasimha, G., Chandra, N.S.: Diagnosis of lung cancer prediction system
using data mining classification techniques. Int. J. Comput. Sci. Inf. Technol. 4, 39–45 (2013)
19. Cataloluk, H., Kesler, M.: A diagnostic software tool for skin diseases with basic and weighted
K-NN, IEEE. ISBN: 978-1-4673-1448-0/12 (2012)
20. WEKA: Data Mining Machine Learning Software. [Online] Available
http://www.cs.waikato.ac.nz/ml/weka/
21. UCI Machine Learning Repository. [Online] Available
http://archive.ics.uci.edu/ml/datasets.html
22. Cios, K.J., William Moore, G.: Uniqueness of medical data mining. J. Artif. Intell. Med. 26(1),
1–24 (2002)
... Moreover, that build causes narrow arteries and composed the more blood flow. Blood clot forms (Joshi & Nair, 2015). Nowadays, cardiovascular diseases (CVD) are a hot subject in the Healthcare industry globally (Qaiser et al., 2015). ...
... Furthermore, the power of numbers can help eliminate specific errors and delays in algorithm and construct better distinct learning algorithms. The major pros are that it can proficiently address massive quantity of data (Joshi & Nair, 2015). The parameter with the best prediction accuracy as the number of estimator increases is shown in Fig. 2. ...
... This classifier contains numerous essential attributes in terms of effective performance and computation (Ao et al., 2014;Bharti et al., 2021;Research & 2018Research & , 2018. For the input elements, number of scaling instructed is as shown in Fig. 5. Fig. 5. Logistic function (Joshi & Nair, 2015) ...
Article
Cardiovascular is comprehended as heart disease, and it covers different cases that impact the heart and have been the direct cause of death. It allies numerous risk elements in heart disease and the necessity to build practical strategies for earlier diagnosis to manage the condition promptly. The machine learning classifier has evolved significantly in the medical database, particularly for diagnosing disease. Nowadays, numerous organizations utilize these machine learning strategies to improve medical diagnostics for the earlier prognosis of conditions. This paper summarizes these machine learning algorithms in disease prediction and computational. Classification algorithms considered are logic regression, Naive Bayes, K-NN (nearest neighbor), (Support vector machine), SVM, (Decision Tree) DT, K-means clustering, and RF (random forest). This work reviewed 20 papers from 2018- 2021 that employed Classifier to detect specifically heart diseases in the medical sector during the last four years.
... Data mining methods can be implemented for predicting the results of the domain. Hence prediction plays a very important role [4]. The methodology and implementation of Naïve Bayes and Decision Tree technique used for prediction of heart diseases is discussed in [1]. ...
... The results show the short but accurate outcomes by applying classification which is used for the newly added patients compared to the already added ones. A model based on Combination of Naïve Bayes Classifier and K-Nearest Neighbor is proposed in [3,4,11]. Adding two more attributes helps in defining the clear result. ...
... Though the result can be predicted from any algorithm but the more accurate an algorithm could be the more it is usable [4]. From figure 4 it can be clearly predicted that the K-NN is more accurate and sensitive than others. ...
Article
Full-text available
Data Mining have always been a field and combination of both computer science and statistical knowledge. From the beginning it is used to ascertain designs, patterns and arrangements which are formed in the information pool. The motive of the data mining development is to produce useful information from the pool of raw data and convert it into useful information which can be used for future arrangements. The tools which are used in data mining are helpful in predicting the future trends and predictions across the market, which also help in decision making and building the knowledge to make decisions. The “Healthcare Industry” is generally information rich. It has been collecting data to improve the continuing problems and help to identify the solutions for that problems. Data mining techniques can be used to predict heart conditions from the voluminous and complex data which are kept by the hospitals for decision making which are difficult to analyze by outmoded methods. Unfortunately, outmoded methods are less accurate in discovering hidden information from effective decision making. Data mining helps in altering the huge amount of data into knowledge driven which takes, as compared to others, less time and effort for the prediction and with greater accuracy. Our effort is to apply different data mining techniques that are used to solve the problem of biased forecasts and decision making and help in calculating the results with more accuracy.
... To focus on the heart disease prediction, Sujata Joshi [4] writers employed three classifiers, including DT, Naive Bayes, and KNN. They demonstrated that predictors performed better in real-world applications. ...
... Data mining is a subfield of artificial intelligence that entails the examination of enormous amounts of data in order to uncover previously unrecognized patterns [5]. Additionally, data mining approaches can be used to group variables with similar behaviours and forecast future events. ...
Article
Full-text available
Many literature searches are required in scientific study, and these take a significant amount of time and effort. The bibliometric analysis is useful for locating research hotspots and gaining an understanding of research trends, according to the published literature. This bibliometric analysis was carried out with the assistance of the tools Bibliometric package in R (biblioshyny) and VOSviewer; the data used in this analysis was primarily derived from the Scopus repository to analyze the data mining approaches for disease classification (DMDC). A sample of 804 articles was selected by utilizing a query including essential key terms such as (”data mining” OR “data-mining”) AND (“disease classification” OR “disease identification” OR “disease prediction”). Overall, the findings of the study indicate the highest number of publications on applying DMDC published in 2019 with 141 research articles. As for individual researchers, the most productive authors are Jabbar MA, Li J, and Wang X. Jabbar MA in the field of DMDC, and 57 articles (1.2%) were written by one author, while the rest of the 747 articles were written by multiple authors. The Advances in Intelligent Systems and Computing journal (26 articles) has the greatest number of published articles connected to the DMDC field. Total of 804 articles in 474 distinct journals, there are 57 journals that have already published more than three papers, accounting for 12.03%. The USA was the most productive country, and the University of California is the affiliated university that comes from most top research in this field. The most cited research article published by Moore JH in 2006 included 489 citations. There are different types of diseases identified using data mining techniques such as heart disease, Brest cancer, liver disease, chronic kidney disease, Parkinson’s disease, diabetes mellitus, and Alzheimer’s disease. 
The most widely used algorithms in the research community include the random forest, decision tree, support vector machine, and naive bayes algorithms. In the future, the field of data mining could grow in many different directions and could be an effective way to increase the accuracy of disease prediction.
... Ketika Anda menderita diabetes, tubuh Anda tidak menghasilkan cukup insulin atau tidak menggunakannya dengan benar. Gula menumpuk dalam darah Anda, menyebabkan konsekuensi seperti penyakit jantung, stroke, neuropati, sirkulasi yang buruk, amputasi anggota badan, kebutaan, gagal ginjal, kerusakan saraf, dan kematian [21]. ...
Article
Full-text available
Di bidang medis, spesialis ingin data yang tersedia untuk membuat keputusan. Teknik data mining saat ini terkait dengan penyelidikan terapeutik untuk memecah sejumlah besar data penyembuhan. Penelitian ini bertujuan untuk menggunakan teknik data mining untuk mengeksplorasi database diabetes yang dapat digunakan untuk diagnosis yang lebih baik. Oleh karena itu, penelitian ini berfokus pada analisis data diabetes menggunakan berbagai sistem data mining. Pengobatan mutakhir menghasilkan sejumlah besar data yang dimasukkan ke dalam database terapeutik. Analisis yang tepat dari data tersebut dapat mengungkapkan beberapa realitas mengejutkan yang telah disembunyikan atau didistribusikan entah bagaimana. Data mining adalah bidang yang bertujuan untuk fokus pada fakta menarik dari pengumpulan data yang ekstensif. Penelitian ini bertujuan untuk memecah kumpulan data diabetes dan mengekstrak beberapa fakta menarik yang dapat digunakan untuk mengembangkan model harapan. Salah satu aplikasi di mana instrumen data mining menunjukkan nilainya adalah menentukan penyebab penyakit. Sepanjang tahun-tahun sebelumnya, diabetes telah menjadi penyebab utama kematian di seluruh dunia. Beberapa ilmuwan menggunakan data faktual. Ketersediaan data terapeutik dalam jumlah besar memerlukan pengembangan alat penambangan yang canggih untuk membantu para profesional medis dalam mendiagnosis infeksi diabetes. Penggunaan data mining sebagai bagian dari diagnosis diabetes telah diselidiki dengan cermat, menampilkan tingkat presisi yang memuaskan. Spesialis baru-baru ini mengeksplorasi pengaruh menggabungkan banyak prosedur dengan hasil yang ditingkatkan dalam mendeteksi penyakit diabetes.
... S. Joshi et al. [10] performed a heart disease prediction system using three classification techniques such as K-NN, Decision Tree, and Naïve Bayes. They also used a dataset with 303 entities and 14 attributes from the UCI repository. ...
Article
Prediction of heart disease is challenging because countless data are collected for clinical data analysis, but all this information is not equally important for making the right decisions. We have proposed a hybrid method: Hierarchical Agglomerative Clustering algorithm combined with conventional classification techniques such as K-Nearest Neighbors (K-NN), Decision Tree (J48), and Naïve Bayes which aims to reduce the prediction time by clustering the patients having almost similar symptoms of heart failure. This approach minimizes the forecasting time based on clusters of patients instead of individual patients. Moreover, a comparison between the classification techniques and our approach is depicted based on precision, recall, F1 score, accuracy, and prediction time. The accuracies of the classifiers (K-NN-66.67%, J48-83.33%, and Naïve Bayes83.33%) of our system have slightly decreased compared with the conventional methods (K-NNN-69.128%, J48-83.8926%, and Naïve Bayes-87.248%) but the prediction time was significantly low (K-NNN-230ms, J48-203ms, and Naïve Bayes-195ms).
Article
Full-text available
The broad area of data mining covers most of the fields of research. Its role in medical diagnosis is very motivative to the researchers. It is very easy for the medical practitioners to analyze and treat the disease of the patients at an early stage. The proposed work deals with predicting heart disease of the patients at an early stage. The method was organized in three stages, Data collection, Data preprocessing and Data classification. The dataset for the work was collected from UCI repository. The collected sample was first preprocessed to clean unwanted information from the dataset. Classification operation is then performed on the preprocessed data. Classification is carried out with three different techniques, Linear regression model, SMOreg and REP trees. The results of the three methods were compared based on Root mean squared error and the Absolute error and are tabulated.
Article
Data mining techniques have been mostly used in medical area for prediction and diagnosis of various diseases. These techniques discover the hidden pattern and relationship in medical data and therefore have been very important in designing clinical support. Now a day's data mining techniques are widely used in diagnosis of heart disease because of increasing death rate worldwide. The reason of this may be the complex and expensive tests conducted in labs to predict the heart disease. Systems based on these risk factors not only benet healthcare professionals, but warn them of the potential presence of heart disease even before a patient is admitted to the hospital or undergoes an expensive medical examination. This in order to reduce the risk of this disease a better approach would to identify risk factor the result in heart disease. This study is an effort in this direction. This approach to predict the heart disease in early stage is developed in present study by analyzing risk factors. This technique developed weighted gain decision tree predicts the risk of heart disease with an accuracy of 90%.
Article
Full-text available
In this study, we present a new classifier that combines the distance-based algorithm K-Nearest Neighbor and statistical based Naïve Bayes Classifier. That is equipped with the power of both but avoid their weakness. The performance of the proposed algorithm in terms of accuracy is experimented on some standard datasets from the machine-learning repository of University of California and compared with some of the art algorithms. The experiments show that in most of the cases the proposed algorithm outperforms the other to some extent. Finally we apply the algorithm for predicting profitability positions of some financial institutions of Bangladesh using data provided by the central bank.
Conference Paper
Full-text available
In the field of dermatology, to able to make differential diagnosis of erythemato-squamous diseases between each other accurately, is quite significant for the treatment of the disease. Especially the symptoms seen in the early stages of diseases in this group, may be very similar to each other. And this situation makes it difficult to determine accurate diagnosis for patients. Hence this study presents a data mining application for this problem in the medical field. Mentioned software tool in this study is trying to obtain correct diagnosis of erythemato-squamous diseases by using the basic and weighted K-NN algorithms on medical data. In this way, this paper presents a comparison between these two methods by evaluating and presenting the performances of them. Furthermore there is also a comparison between the Euclidean and Manhattan distance measures.
Article
Full-text available
The rapid development of biomedical monitoring technologies has enabled modern intensive care units (ICUs) to gather vast amounts of multimodal measurement data about their patients. However, processing large volumes of complex data in real-time has become a big challenge. Together with ICU physicians, we have designed and developed an ICU clinical decision support system icuARM based on associate rule mining (ARM), and a publicly available research database MIMIC-II (Multi-parameter Intelligent Monitoring in Intensive Care II) that contains more than 40,000 ICU records for 30,000+patients. icuARM is constructed with multiple association rules and an easy-to-use graphical user interface (GUI) for care providers to perform real-time data and information mining in the ICU setting. To validate icuARM, we have investigated the associations between patients' conditions such as comorbidities, demographics, and medications and their ICU outcomes such as ICU length of stay. Coagulopathy surfaced as the most dangerous co-morbidity that leads to the highest possibility (54.1%) of prolonged ICU stay. In addition, women who are older than 50 years have the highest possibility (38.8%) of prolonged ICU stay. For clinical conditions treatable with multiple drugs, icuARM suggests that medication choice can be optimized based on patient-specific characteristics. Overall, icuARM can provide valuable insights for ICU physicians to tailor a patient's treatment based on his or her clinical status in real time.
Article
This study aims to develop an automatic classifier for risk assessment in patients suffering from congestive heart failure (CHF). The proposed classifier separates lower risk patients from higher risk ones using standard long-term heart rate variability (HRV) measures. Patients are labeled as lower or higher risk according to the New York Heart Association (NYHA) classification. A retrospective analysis of two public Holter databases was performed, analyzing the data of 12 patients suffering from mild CHF (NYHA I and II), labeled as lower risk, and 32 suffering from severe CHF (NYHA III and IV), labeled as higher risk. To ensure satisfactory signal quality, only patients in whom more than 80% of total heartbeat intervals (RR) were classified as normal-to-normal (NN) intervals (NN/RR) were selected as eligible. Classification and regression trees (CART) were employed to develop the classifiers. A total of 30 higher risk and 11 lower risk patients were included in the analysis. The proposed classification trees achieved a sensitivity of 93.3% and a specificity of 63.6% in identifying higher risk patients. Finally, the rules obtained by CART are comprehensible and consistent with the consensus shown by previous studies that depressed HRV is a useful tool for risk assessment in patients suffering from CHF.
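The reported rates can be checked against the cohort sizes with a short worked calculation. The abstract does not state the confusion-matrix counts. The values below (28 of 30 higher risk patients correctly identified, 7 of 11 lower risk patients correctly identified) are an assumption, chosen because they are the only integer counts that reproduce 93.3% and 63.6% for groups of 30 and 11.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positives (higher risk) that were detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual negatives (lower risk) that were correctly cleared."""
    return true_neg / (true_neg + false_pos)


# Assumed counts, inferred from the reported percentages and group sizes.
HIGHER_RISK_DETECTED, HIGHER_RISK_MISSED = 28, 2   # 30 higher risk patients
LOWER_RISK_CLEARED, LOWER_RISK_FLAGGED = 7, 4      # 11 lower risk patients

sens_pct = round(100 * sensitivity(HIGHER_RISK_DETECTED, HIGHER_RISK_MISSED), 1)
spec_pct = round(100 * specificity(LOWER_RISK_CLEARED, LOWER_RISK_FLAGGED), 1)
print(f"sensitivity = {sens_pct}%, specificity = {spec_pct}%")
```

With only 11 lower risk patients, each misclassification moves the specificity by about nine percentage points, so that figure carries more uncertainty than the sensitivity.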
Article
Data mining is one of the most motivating areas of research and is becoming increasingly popular in health organizations. It plays an important role in uncovering new trends in healthcare organizations, which in turn is helpful for all parties associated with the field. This survey explores the utility of various data mining techniques, such as classification, clustering, association, and regression, in the health domain. The paper presents a brief introduction to these techniques together with their advantages and disadvantages. The survey also highlights applications, challenges, and future issues of data mining in healthcare, and it discusses recommendations on choosing a suitable technique from those available.
Book
This is the third edition of the premier professional reference on the subject of data mining, expanding and updating the previous market-leading edition. It was the first (and is still the best and most popular) book of its kind, and it combines sound theory with truly practical applications to prepare students for real-world challenges in data mining. Like the first and second editions, Data Mining: Concepts and Techniques, 3rd Edition equips professionals with a sound understanding of data mining principles and teaches proven methods for knowledge discovery in large corporate databases. The first and second editions also established themselves as the market leaders for courses in data mining, data analytics, and knowledge discovery. Revisions incorporate input from instructors, changes in the field, and new and important topics such as data warehouse and data cube technology, mining stream data, mining social networks, and mining spatial, multimedia, and other complex data. The book begins with a conceptual introduction, followed by comprehensive and state-of-the-art coverage of concepts and techniques. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. Wherever possible, the authors raise and answer questions of utility, feasibility, optimization, and scalability. The book offers a comprehensive, practical look at the concepts and techniques needed to get the most out of real business data; updates that incorporate input from readers, changes in the field, and more material on statistics and machine learning; scores of algorithms and implementation examples, all in easily understood pseudo-code and suitable for use in real-world, large-scale data mining projects; and complete classroom support for instructors, as well as bonus content available at the companion website.
A comprehensive and practical look at the concepts and techniques you need in the area of data mining and knowledge discovery.
Article
Interest in applying data mining to healthcare is high today, because the healthcare sector is rich in information and data mining is becoming a necessity. Healthcare organizations produce and collect large volumes of information on a daily basis. Information technologies allow the automation of data extraction processes that yield interesting knowledge and regularities. This eliminates manual tasks and makes it easier to extract data directly from electronic records and transfer it to a secure electronic medical records system, which can save lives, reduce the cost of healthcare services, and support the early discovery of contagious diseases through advanced data collection. Data mining can enable healthcare organizations to predict trends in patient conditions and behaviors by analyzing data from different perspectives and discovering connections and relations in seemingly unrelated information. Raw data from healthcare organizations are voluminous and heterogeneous. They need to be collected and stored in organized forms, and their integration enables the formation of a hospital information system. Healthcare data mining provides countless possibilities for investigating hidden patterns in these data sets. Physicians can use these patterns to determine diagnoses, prognoses, and treatments for patients in healthcare organizations.
Article
The availability of huge amounts of medical data leads to the need for powerful data analysis tools to extract useful knowledge. Researchers have long been concerned with applying statistical and data mining tools to improve data analysis on large data sets. Disease diagnosis is one of the applications where data mining tools are proving successful. Heart disease has been the leading cause of death worldwide over the past ten years. Several researchers are using statistical and data mining tools to help health care professionals diagnose heart disease. The use of a single data mining technique in the diagnosis of heart disease has been comprehensively investigated and shows acceptable levels of accuracy. Recently, researchers have been investigating the effect of hybridizing more than one technique, with enhanced results in the diagnosis of heart disease. However, using data mining techniques to identify a suitable treatment for heart disease patients has received less attention. This paper identifies gaps in the research on heart disease diagnosis and treatment and proposes a model to systematically close those gaps, in order to discover whether applying data mining techniques to heart disease treatment data can provide performance as reliable as that achieved in diagnosing heart disease.
Article
Data mining is an interesting field of research whose major objective is to acquire knowledge from large amounts of data. With advances in health care related research, there is a wealth of data available. However, there is a lack of effective analytical tools to discover hidden and meaningful patterns and trends in data, which is essential for any research. In recent years, human immunodeficiency virus (HIV) related illnesses have become a threat to the modern world. Researchers all over the world, including in India, are working hard to find a suitable answer, and this has led to a great deal of research in the field. A tool that can process data in a meaningful way is therefore the need of the hour. In this study, we briefly examine the potential use of classification based data mining techniques, such as decision trees and association rules, on massive volumes of health care data. Further, we developed a prototype approach that is specially designed to monitor patients receiving antiretroviral therapy (ART). Monitoring an individual is not a difficult task, but deriving inferences from a large cohort and then using that information for future guidelines requires this kind of prototype approach. We expect this to have a great impact on current management of, and future strategies against, HIV.