Early Prediction of Heart Disease Using Decision Tree Algorithm

April 2017

3(3)

Authors:

A. Sankari Karthiga

Manonmaniam Sundaranar University

Safish Mary

St. Xavier's College, Palayamkottai

M. Yogasini

Sadakathullah Appa College

ISSN (ONLINE):2395-695X

ISSN (PRINT):2395-695X

International Journal of Advanced Research in Basic Engineering Sciences and Technology (IJARBEST)

Vol.3, Issue.3, March 2017

Early Prediction of Heart Disease Using Decision Tree Algorithm

A. Sankari Karthiga1, M. Safish Mary2, M. Yogasini3

M.Phil. Scholar1, Assistant Professor2, Assistant Professor3,

Mother Teresa Women’s University, Kodaikanal1

St. Xavier’s College, Tirunelveli2

Sadakathdulla Appa College, Tirunelveli3

sankarikarthiga.sk@gmail.com

Abstract- Numerous techniques are used to process large amounts of data, and data mining is among the most widely used. Data mining combines traditional data analysis with sophisticated algorithms. Medical data mining is an important area of data mining and an active research field because of its applications in healthcare. Classification and prediction on medical datasets remain challenging. Heart disease is the leading cause of death worldwide. Predicting a heart attack is difficult for medical practitioners because it is a complex task that requires experience and knowledge. Health-sector data contain hidden information that can be important in making decisions. In this research, the decision tree and Naïve Bayes data mining algorithms are applied to predict heart attacks. The results show a prediction accuracy of 99%. Data mining enables the health sector to predict patterns in the dataset.

Index Terms- Decision Tree Algorithm, Naïve Bayes Algorithm.

I. INTRODUCTION

1.1. DATA MINING

Data Mining is about explaining the past and predicting the future by means of data

analysis. Data mining is a multi-disciplinary field that combines statistics, machine learning,

artificial intelligence and database technology. The value of data mining applications is often

estimated to be very high. Many businesses have stored large amounts of data over years of

operation, and data mining is able to extract very valuable knowledge from this data. The

businesses are then able to leverage the extracted knowledge into more clients, more sales,

and greater profits. This is also true in the engineering and medical fields.

1.1.1. Statistics

Statistics is the science of collecting, classifying, summarizing, organizing, analyzing, and interpreting data.

1.1.2. Artificial Intelligence

Artificial intelligence is the study of computer algorithms that simulate intelligent behaviour in order to perform activities that are normally thought to require intelligence.

A. Sankari Karthiga, M. Safish Mary, M. Yogasini ©IJARBEST PUBLICATIONS

This work by IJARBEST is licensed under Creative Commons Attribution 4.0 International License. Available at https://www.ijarbest.com/Archive

1.1.3. Machine Learning

Machine learning is the study of computer algorithms that improve automatically through experience.

1.1.4. Database

The science and technology of collecting, storing and managing data so users can retrieve,

add, update or remove such data.

1.1.5. Data warehousing

The science and technology of collecting, storing and managing data with advanced multi-dimensional reporting services in support of the decision-making processes.

1.1.6. Explaining the Past

Data mining explains the past through data exploration.

1.1.7. Predicting the Future

Data mining predicts the future by means of modeling.

1.1.8. Data Exploration

Data Exploration is about describing the data by means of statistical and visualization

techniques. We explore data in order to bring important aspects of that data into focus for

further analysis.

“Data Mining is a non-trivial extraction of implicit, previously unknown and potentially useful information about data” [1]. In short, it is the process of analyzing data from different perspectives and gathering knowledge from it. The discovered knowledge can be used in different applications, for example in the healthcare industry. Nowadays the healthcare industry generates large amounts of data about patients, disease diagnosis, and so on. Data mining provides a set of techniques to discover hidden patterns in these data. A major challenge facing the healthcare industry is quality of service, which means diagnosing disease correctly and providing effective treatment to patients. Poor diagnosis can lead to disastrous consequences that are unacceptable.

According to a WHO survey, 17 million deaths worldwide are due to heart attacks and strokes. In many countries, deaths from heart disease are attributed to work overload, mental stress and many other problems. Overall, heart disease is the primary cause of death in adults. Diagnosis is a complicated and important task that needs to be executed accurately and efficiently. It is often based on the doctor’s experience and knowledge, which can lead to unwanted results and excessive treatment costs for patients. Therefore, an automatic medical diagnosis system is designed that takes advantage of a collected database and a decision support system. Such a system can help diagnose disease with fewer medical tests and more effective treatment.

1.2. MEDICAL DATA MINING

Medical data mining has great potential for exploring the hidden patterns in the data

sets of the medical domain. These patterns can be utilized for clinical diagnosis. However,

the available raw medical data are widely distributed, heterogeneous in nature, and

voluminous. These data need to be collected in an organized form. This collected data can be

then integrated to form a hospital information system. Data mining technology provides a

user oriented approach to novel and hidden patterns in the data.

The World Health Organization has estimated that 12 million deaths occur worldwide every year due to heart disease. Half the deaths in the United States and other developed countries are due to cardiovascular diseases. Heart disease is also the chief cause of death in numerous developing countries, including India, and on the whole it is regarded as the primary cause of death in adults. Heart disease kills one person every 34 seconds in the United States. The term heart disease encompasses the diverse diseases that affect the heart; coronary heart disease, cardiomyopathy and cardiovascular disease are some of its categories. The term “cardiovascular disease” (CVD) covers a wide range of conditions that affect the heart and the blood vessels and the manner in which blood is pumped and circulated through the body. CVD results in severe illness, disability, and death. The diagnosis of diseases is a vital and intricate job in medicine.

Medical diagnosis is regarded as an important yet complicated task that needs to be executed accurately and efficiently, and automating it would be extremely advantageous. Regrettably, not all doctors possess expertise in every subspecialty, and there is a shortage of resource persons in certain places. An automatic medical diagnosis system would therefore probably be exceedingly beneficial by bringing this expertise together. Appropriate computer-based information and/or decision support systems can help carry out clinical tests at a reduced cost. Efficient and accurate implementation of an automated system requires a comparative study of the available techniques. This paper aims to analyze the different predictive and descriptive data mining techniques proposed in recent years for the diagnosis of heart disease.

Clinical decisions are often made based on the doctor’s intuition and experience rather than on the knowledge-rich data hidden in the database. This practice leads to unwanted biases, errors and excessive medical costs, which affect the quality of service provided to patients. Data mining has the potential to generate a knowledge-rich environment that can significantly improve the quality of clinical decisions.

The decision tree is a popular classifier that is simple and easy to implement. It requires no domain knowledge or parameter setting and can handle high-dimensional data. The results obtained from decision trees are easy to read and interpret. The drill-through feature for accessing detailed patients’ profiles is available only in decision trees.

Naïve Bayes is a statistical classifier that assumes no dependency between attributes. It determines the class by maximizing the posterior probability. An advantage of Naïve Bayes is that one can work with the model without using any Bayesian methods. Naïve Bayes classifiers work well in many complex real-world situations.

1.3. HEART DISEASE

The heart is a vital organ of the human body. It is essentially a pump that circulates blood through the body. If circulation is inefficient, organs such as the brain suffer, and if the heart stops working altogether, death occurs within minutes. Life is completely dependent on the efficient working of the heart. The term heart disease refers to disease of the heart and the blood vessel system within it.

A number of factors have been shown to increase the risk of heart disease:

• Family history

• Smoking

• Poor diet

• High blood pressure

• High blood cholesterol

• Obesity

• Physical inactivity

• Hypertension

Factors like these are used to analyze heart disease. In many cases, diagnosis is based on the patient’s current test results and the doctor’s experience. Diagnosis is thus a complex task that requires much experience and high skill.

Heart disease is a broad term that includes all types of diseases affecting different components of the heart. 'Cardio' means heart; therefore, all heart diseases belong to the category of cardiovascular diseases. Some types of heart disease are:

1. Coronary heart disease: also known as coronary artery disease (CAD), it is the most common type of heart disease worldwide. It is a condition in which plaque deposits block the coronary blood vessels, reducing the supply of blood and oxygen to the heart.

2. Angina pectoris: the medical term for chest pain that occurs due to an insufficient supply of blood to the heart. Also known as angina, it is a warning signal for heart attack. The chest pain occurs at intervals and lasts from a few seconds to a few minutes.

3. Congestive heart failure: a condition in which the heart cannot pump enough blood to the rest of the body. It is commonly known as heart failure.

4. Cardiomyopathy: a weakening of the heart muscle or a change in the structure of the muscle due to inadequate heart pumping. Some of the common causes of cardiomyopathy are hypertension, alcohol consumption, viral infections, and genetic defects.

5. Congenital heart disease: also known as congenital heart defect, it refers to the formation of an abnormal heart due to a defect in the structure of the heart or its functioning. It is a type of congenital disease that children are born with.

6. Arrhythmia: a disorder in the rhythm of the heartbeat. The heartbeat can be slow, fast, or irregular. These abnormal heartbeats are caused by a short circuit in the heart's electrical system.

7. Myocarditis: an inflammation of the heart muscle, usually caused by viral, fungal, or bacterial infections affecting the heart. It is an uncommon disease with few symptoms, such as joint pain, leg swelling or fever, that cannot be directly related to the heart.

1.4. DECISION TREES

The decision tree approach is a powerful technique for classification problems. It involves two steps: building a tree and applying the tree to the dataset. Popular decision tree algorithms include CART, ID3, C4.5, CHAID, and J48; the J48 algorithm is used for this system. J48 uses pruning when building a tree. Pruning reduces the size of the tree by removing branches that overfit the training data, since overfitting leads to poor accuracy in predictions. The J48 algorithm recursively classifies data until it has been categorized as perfectly as possible, which gives maximum accuracy on the training data. The overall aim is to build a tree that balances flexibility and accuracy.

1.5. NAIVE BAYES

The Naive Bayes classifier is based on Bayes’ theorem. It assumes class conditional independence, that is, the value of an attribute for a given class is independent of the values of the other attributes.

1.6. ORGANIZATION OF THE THESIS

This chapter is organized as follows: first, we outline the basics of patient physiology

and fetus response to different stages of oxygen deficiency - hypo anemia, hypoxia, and

asphyxia. Next, we describe an interaction between mother and fetus during gestation with

emphasis on the antepartum and intrapartum period. Finally, we introduce methods for the

patient hypoxia diagnostics with focus on electronic patient monitoring that involves

observation of CTG or FECG changes. We stress the significance of signal interpretation and

describe advantages and disadvantages of respective methods.

II. CLASSIFICATION USING DECISION TREE ALGORITHM

2.1. INTRODUCTION

A decision tree builds classification or regression models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy). A leaf node (e.g., Play) represents a classification or decision. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
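To make the node terminology concrete, the following minimal Python sketch stores a tree as nested dictionaries and classifies one record by walking from the root to a leaf. The tree shape, the `Windy` attribute and the leaf labels are hypothetical choices for illustration; only the Outlook, Sunny, Overcast, Rainy and Play names come from the example above.

```python
# Hypothetical toy tree: a decision node is {attribute: {branch value: subtree}},
# and a leaf is a plain string holding the decision.
TREE = {
    "Outlook": {
        "Sunny": {"Windy": {"yes": "Don't play", "no": "Play"}},
        "Overcast": "Play",
        "Rainy": "Don't play",
    }
}


def classify(tree, record):
    """Follow the branch matching the record at each decision node until a leaf is reached."""
    while isinstance(tree, dict):
        (attribute, branches), = tree.items()  # a decision node tests exactly one attribute
        tree = branches[record[attribute]]
    return tree


print(classify(TREE, {"Outlook": "Sunny", "Windy": "no"}))   # Play
print(classify(TREE, {"Outlook": "Rainy", "Windy": "yes"}))  # Don't play
```

Here the root node is the `Outlook` test, and every path ends in a leaf that carries the decision.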

2.2. ALGORITHM

The core algorithm for building decision trees is C4.5 by J. R. Quinlan, the successor to his ID3 algorithm. It employs a top-down, greedy search through the space of possible branches with no backtracking. C4.5 uses entropy and information gain to construct a decision tree.

2.3. ENTROPY

A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided between two classes the entropy is one. To build a decision tree, we need to calculate two types of entropy using frequency tables, as follows:

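a) Entropy using the frequency table of one attribute, where $p_i$ is the proportion of records of the sample $S$ that belong to class $i$, out of $c$ classes:

$$E(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$$

b) Entropy using the frequency table of two attributes, where the target $T$ is split on attribute $X$, $P(v)$ is the proportion of records with $X = v$, and $T_v$ is the subset of those records:

$$E(T, X) = \sum_{v \in X} P(v)\, E(T_v)$$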
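The two entropy quantities can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names are assumptions, and the 14-record Outlook/Play table is hypothetical toy data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy (base 2) of a list of class labels, from its frequency table."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def conditional_entropy(attribute_values, labels):
    """Entropy of the labels after grouping by one attribute, weighted by group size."""
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    return sum(len(group) / total * entropy(group) for group in groups.values())


# Hypothetical toy data: 14 records, 9 "yes" and 5 "no".
outlook = ["Sunny"] * 5 + ["Overcast"] * 4 + ["Rainy"] * 5
play = ["yes", "yes", "no", "no", "no"] + ["yes"] * 4 + ["yes", "yes", "yes", "no", "no"]

print(f"{entropy(play):.3f}")                       # 0.940
print(f"{conditional_entropy(outlook, play):.3f}")  # 0.694
```

A completely homogeneous list such as `["yes"] * 4` gives an entropy of zero, and an evenly split two-class list gives one, matching the description above.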

2.4. INFORMATION GAIN

The information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).

Step 1: Calculate entropy of the target.

Step 2: The dataset is then split on the different attributes. The entropy for each branch is

calculated. Then it is added proportionally, to get total entropy for the split. The resulting

entropy is subtracted from the entropy before the split. The result is the Information Gain, or

decrease in entropy.

Step 3: Choose the attribute with the largest information gain as the decision node.

Step 4(a): A branch with entropy of 0 is a leaf node.

Step 4(b): A branch with entropy more than 0 needs further splitting.

Step 5: The ID3 algorithm is run recursively on the non-leaf branches, until all data is

classified.
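Steps 1 to 5 can be sketched as a short recursive routine. The Python code below is a minimal illustration of ID3-style tree building, not the J48 or MATLAB implementation used in the experiments; the function names and the six-record toy dataset are hypothetical, and pruning is not included.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Step 1: entropy of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Step 2: entropy before the split minus the weighted entropy of the branches."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        branch = [r[target] for r in rows if r[attribute] == value]
        after += len(branch) / len(rows) * entropy(branch)
    return before - after


def build_tree(rows, attributes, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:       # Step 4(a): entropy 0, so this branch is a leaf
        return labels[0]
    if not attributes:              # nothing left to split on: use the majority class
        return Counter(labels).most_common(1)[0][0]
    # Step 3: the attribute with the largest information gain becomes the decision node.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    # Steps 4(b) and 5: recurse on every branch that still needs splitting.
    return {best: {value: build_tree([r for r in rows if r[best] == value], remaining, target)
                   for value in sorted({r[best] for r in rows})}}


# Hypothetical toy records.
ROWS = [
    {"outlook": "Sunny", "windy": "no", "play": "no"},
    {"outlook": "Sunny", "windy": "yes", "play": "no"},
    {"outlook": "Overcast", "windy": "no", "play": "yes"},
    {"outlook": "Overcast", "windy": "yes", "play": "yes"},
    {"outlook": "Rainy", "windy": "no", "play": "yes"},
    {"outlook": "Rainy", "windy": "yes", "play": "no"},
]

print(build_tree(ROWS, ["outlook", "windy"], "play"))
```

On this toy data `outlook` has the larger gain (about 0.67 against about 0.08 for `windy`), so it becomes the root, and only the Rainy branch needs a further split.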

2.5. DECISION TREE TO DECISION RULES

A decision tree can easily be transformed to a set of rules by mapping from the root

node to the leaf nodes one by one.
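The mapping from paths to rules can be illustrated with a short Python sketch. The nested-dictionary tree and the rule wording are hypothetical; the point is only that each root-to-leaf path yields one IF ... THEN rule.

```python
# Hypothetical toy tree: {attribute: {branch value: subtree or leaf label}}.
TREE = {
    "Outlook": {
        "Sunny": "Don't play",
        "Overcast": "Play",
        "Rainy": {"Windy": {"no": "Play", "yes": "Don't play"}},
    }
}


def tree_to_rules(tree, conditions=()):
    """Walk every root-to-leaf path and emit one rule per leaf."""
    if not isinstance(tree, dict):  # reached a leaf: close the rule
        premise = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {premise} THEN {tree}"]
    rules = []
    (attribute, branches), = tree.items()  # one attribute is tested per decision node
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, conditions + ((attribute, value),))
    return rules


for rule in tree_to_rules(TREE):
    print(rule)
```

The number of rules equals the number of leaf nodes, four in this toy tree.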

III. CLASSIFICATION USING NAIVE BAYES CLASSIFIER

A. INTRODUCTION

The Naive Bayesian classifier is based on Bayes’ theorem with independence

assumptions between predictors. A Naive Bayesian model is easy to build, with no

complicated iterative parameter estimation which makes it particularly useful for very large

datasets. Despite its simplicity, the Naive Bayesian classifier often does surprisingly well and

is widely used because it often outperforms more sophisticated classification methods.

B. ALGORITHM

Bayes’ theorem provides a way of calculating the posterior probability, P(c|x), from P(c), P(x), and P(x|c). The Naive Bayes classifier assumes that the effect of the value of a predictor (x) on a given class (c) is independent of the values of other predictors. This assumption is called class conditional independence.

• P(c|x) is the posterior probability of class (target) given predictor (attribute).

• P(c) is the prior probability of class.

• P(x|c) is the likelihood, which is the probability of predictor given class.

• P(x) is the prior probability of predictor.

Thus, we can write:

$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}$$

and, for predictors $x_1, \dots, x_n$ under class conditional independence,

$$P(c \mid x_1, \dots, x_n) \propto P(x_1 \mid c) \times P(x_2 \mid c) \times \cdots \times P(x_n \mid c) \times P(c)$$

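As an illustration, consider a set of objects that are each labelled GREEN or RED, and a new object X that must be assigned to one of the two classes. Since there is a total of 60 objects, 40 of which are GREEN and 20 RED, our prior probabilities for class membership are:

$$P(\text{GREEN}) = \frac{40}{60}, \qquad P(\text{RED}) = \frac{20}{60}$$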

Having formulated our prior probabilities, we are now ready to classify a new object X (WHITE circle). Since the objects are well clustered, it is reasonable to assume that the more GREEN (or RED) objects there are in the vicinity of X, the more likely it is that the new case belongs to that particular color. To measure this likelihood, we draw a circle around X which encompasses a number (to be chosen a priori) of points irrespective of their class labels. Then we count the points in the circle belonging to each class label. From this we calculate the likelihood:

$$P(X \mid \text{GREEN}) \propto \frac{\text{number of GREEN objects in the vicinity of } X}{\text{total number of GREEN objects}}, \qquad P(X \mid \text{RED}) \propto \frac{\text{number of RED objects in the vicinity of } X}{\text{total number of RED objects}}$$

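From the illustration above, it is clear that the likelihood of X given GREEN is smaller than the likelihood of X given RED, since the circle encompasses 1 GREEN object and 3 RED ones. Thus:

$$P(X \mid \text{GREEN}) \propto \frac{1}{40}, \qquad P(X \mid \text{RED}) \propto \frac{3}{20}$$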

Although the prior probabilities indicate that X may belong to GREEN (given that there are twice as many GREEN objects as RED), the likelihood indicates otherwise: that the class membership of X is RED (given that there are more RED objects in the vicinity of X than GREEN). In Bayesian analysis, the final classification is produced by combining both sources of information, i.e., the prior and the likelihood, to form a posterior probability using Bayes' rule (named after Rev. Thomas Bayes, 1702-1761):

$$P(\text{GREEN} \mid X) \propto \frac{40}{60} \times \frac{1}{40} = \frac{1}{60}, \qquad P(\text{RED} \mid X) \propto \frac{20}{60} \times \frac{3}{20} = \frac{1}{20}$$

Finally, we classify X as RED since its class membership achieves the largest posterior probability.
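The arithmetic of this example can be checked with a short Python sketch. It uses only the counts stated in the text (60 objects, 40 GREEN and 20 RED, with 1 GREEN and 3 RED inside the circle around X); the variable names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

total = {"GREEN": 40, "RED": 20}     # class counts from the example
in_circle = {"GREEN": 1, "RED": 3}   # objects of each class in the vicinity of X

n = sum(total.values())
prior = {c: Fraction(total[c], n) for c in total}
likelihood = {c: Fraction(in_circle[c], total[c]) for c in total}

# Posterior up to a common normalising constant: prior times likelihood.
posterior = {c: prior[c] * likelihood[c] for c in total}
predicted = max(posterior, key=posterior.get)

print(posterior["GREEN"], posterior["RED"])  # 1/60 1/20
print(predicted)                             # RED
```

Dividing each value by their sum (4/60) gives normalised posteriors of 1/4 for GREEN and 3/4 for RED, so the decision is unchanged.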

Naive Bayes can be modelled in several different ways including normal, lognormal, gamma

and Poisson density functions:

C. PERFORMANCE ANALYSES

(i) DECISION TREE CLASSIFIER-CROSS VALIDATION (EXPERIMENTAL RESULTS)

(ii) DECISION TREE PERFORMANCE METRICS

METHOD        DECISION TREE
Accuracy      98.2753
Sensitivity   95.452
Specificity   97.7919

(iii) SUMMARY

Decision tree construction techniques are generally computationally inexpensive, making it possible to construct models quickly even when the training set is very large. Furthermore, once a decision tree has been built, classifying a test record is extremely fast.

D. NAÏVE BAYES

(i) EXPERIMENTAL RESULTS

(ii) NAÏVE BAYES PERFORMANCE METRICS

METHOD        NAIVE BAYES
Accuracy      89.9028
Sensitivity   70.9042
Specificity   85.5353

[Figure: Decision tree performance metrics: Accuracy 98.2753, Sensitivity 95.452, Specificity 97.7919]

(iii) SUMMARY

Poisson variables are regarded here as continuous since they are ordinal rather than truly

categorical. For categorical variables, a discrete probability is used with values of the

categorical level being proportional to their conditional frequency in the training data.

IV. RESULT ANALYSIS

The heart disease database consists of 573 records in total. The records are divided into two data sets: one of 303 records used for training and another of 270 records used for testing. The data mining tool MATLAB is used for the experiment.

Initially, some fields in the dataset had missing values in some records. These were identified and replaced with the most appropriate values using the ReplaceMissingValues filter from MATLAB. The ReplaceMissingValues filter scans all records and replaces missing values using the mean/mode method. This process is known as data pre-processing. After pre-processing the data, data mining classification techniques such as Neural Networks, Decision Trees, and Naive Bayes were applied.
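The mean/mode replacement described above can be sketched in plain Python. This is an illustrative sketch of the idea, not the ReplaceMissingValues filter itself: the function name, the use of `None` as the missing-value marker, and the `chol` and `chest_pain` columns are assumptions made for the example.

```python
from statistics import mean, mode


def impute(rows, missing=None):
    """Return a copy of rows in which each missing cell is replaced by the
    column mean (numeric columns) or the column mode (categorical columns)."""
    columns = list(rows[0].keys())
    fill = {}
    for col in columns:
        present = [r[col] for r in rows if r[col] is not missing]
        numeric = all(isinstance(v, (int, float)) for v in present)
        fill[col] = mean(present) if numeric else mode(present)
    return [{c: (fill[c] if r[c] is missing else r[c]) for c in columns} for r in rows]


# Hypothetical records with two missing cells.
records = [
    {"chol": 200, "chest_pain": "typical"},
    {"chol": None, "chest_pain": "typical"},
    {"chol": 240, "chest_pain": None},
    {"chol": 220, "chest_pain": "atypical"},
]

for row in impute(records):
    print(row)
```

The original list is left untouched, so the raw and pre-processed data can be compared side by side.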

A confusion matrix is obtained to calculate the accuracy of classification. A confusion

matrix shows how many instances have been assigned to each class. In our experiment we

have two classes, and therefore we have a 2x2 confusion matrix.

Class a = YES (has heart disease)

Class b = NO (no heart disease)

[Figure: Naive Bayes performance metrics: Accuracy 89.9028, Sensitivity 70.9042, Specificity 85.5353]

V. CONFUSION MATRIX

TP (True Positive): It denotes the number of records classified as true while they were

actually true.

FN (False Negative): It denotes the number of records classified as false while they were

actually true.

FP (False Positive): It denotes the number of records classified as true while they were

actually false.

TN (True Negative): It denotes the number of records classified as false while they were

actually false.
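The four counts defined above are combined into accuracy, sensitivity and specificity. The Python sketch below shows the computation; the function name and the example counts are hypothetical and are not the confusion matrices obtained in this study.

```python
def metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from the four confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # share of all records classified correctly
        "sensitivity": tp / (tp + fn),                # share of actual positives that were found
        "specificity": tn / (tn + fp),                # share of actual negatives that were found
    }


# Hypothetical counts for a 2x2 confusion matrix of 100 records.
result = metrics(tp=40, fn=10, fp=5, tn=45)
print(result)
```

With these counts the accuracy is 0.85, the sensitivity 0.80 and the specificity 0.90; multiplying by 100 gives percentages in the form reported in the tables.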

Confusion matrix obtained for three classification methods with 13 attributes

CONFUSION MATRIX FOR NAIVE BAYES

CONFUSION MATRIX FOR DECISION TREES

The classification task is to generalize well on unseen/independent data. A classifier is

learned on training/learning data and then tested on data that has not been used for learning

(unseen test data). There exist many measures to assess performance of a classifier and a lot

of techniques to create training and test data in order to estimate generalization ability of a

classifier on test (unseen) data.
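One common way to create such training and test data is k-fold cross-validation, which is also the scheme named in the result tables of this paper. The Python sketch below shows how 573 record indices could be divided into 10 folds; the function name, the shuffling and the fixed seed are assumptions for illustration, not the exact procedure used in the experiments.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists; every record is used for testing exactly once."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)         # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]    # k folds of nearly equal size
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


for fold_number, (train, test) in enumerate(k_fold_indices(573, k=10), start=1):
    print(fold_number, len(train), len(test))
```

A classifier would be trained on `train` and evaluated on `test` in each round, and the per-fold metrics averaged to estimate performance on unseen data.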

Heart disease dataset: UCI Machine Learning Repository.

CHARACTERISTICS OF A DATA SET

Data Set Characteristics    Multivariate
Attribute Characteristics   Real
Associated tasks            Classification
Number of Instances         573
Number of Attributes

CLASS INFORMATION:

The PHR pattern classification uses two classes:

– Category I (Normal)

– Category II (Disease)

VI. PERFORMANCE EVALUATION

This is a measurement tool to calculate the performance:

• Accuracy = $\dfrac{TP + TN}{TP + TN + FP + FN}$

• Sensitivity = $\dfrac{TP}{TP + FN}$

• Specificity = $\dfrac{TN}{TN + FP}$

PERFORMANCE METRICS OF DT AND NB

METHOD        DECISION TREE   NAIVE BAYES
Accuracy      98.2753         89.9028
Sensitivity   95.452          70.9042
Specificity   97.7919         85.5353

[Figure: Accuracy of the two classifiers: Decision Tree 98.2753, Naive Bayes 89.9028]

[Figure: Sensitivity of the two classifiers (Decision Tree, Naive Bayes)]

SUMMARY OF THE CLASSIFICATION ACCURACY – DECISION TREE CLASSIFIER - CROSS VALIDATION

              NORMAL    DISEASE
Accuracy      97.6482   99.3885
Sensitivity   98.3686   93.7500
Specificity   95.1168   99.8974

SUMMARY OF THE CLASSIFICATION ACCURACY – NAIVE BAYES CLASSIFIER - CROSS VALIDATION

              NORMAL    DISEASE
Accuracy      87.3001   93.3208
Sensitivity   93.7160   82.3864
Specificity   64.7558   94.3077

[Figure: Specificity of the two classifiers: Decision Tree 97.7919, Naive Bayes 85.5353]

[Figure: Decision tree classification using cross validation; percentage of accuracy by diagnosis result (Normal, Disease)]

PERFORMANCE ANALYSIS FOR TWO CLASSIFIERS - 10 FOLD CROSS VALIDATION

METHOD        DECISION TREE   NAIVE BAYES
Accuracy      98.2753         89.9028
Sensitivity   95.4520         70.9042
Specificity   97.7919         85.5353

VII. CONCLUSION

The overall objective of our work is to predict the presence of heart disease more accurately. In this paper, the UCI repository dataset is used to obtain more accurate results. Two data mining classification techniques were applied, namely decision trees and Naive Bayes. The results show that decision trees provide more accurate results than Naive Bayes. This system can be further expanded to use a larger number of inputs. Other data mining techniques, e.g. clustering, time series and association rules, can also be used for prediction. Text mining can be used to mine the huge amount of unstructured data available in healthcare industry databases.

[Figure: Naive Bayes classification using cross validation; percentage of accuracy (Accuracy, Sensitivity, Specificity) by diagnosis result (Normal, Disease)]

[Figure: Comparison of sensitivity for two classifiers (Decision Tree, Naive Bayes); performance metrics in percent]

REFERENCES

[1] Frawley and G. Piatetsky -Shapiro, Knowledge Discovery in Databases: An Overview. Published

by the AAAI Press/ The MIT Press, Menlo Park, C.A 1996.

[2] Yanwei, X.; Wang, J.; Zhao, Z.; GAO, Y., “Combination data mining models with new medical data to

predict outcome of coronary heart disease”. Proceedings International Conference on Convergence

Information Technology 2007, pp. 868 – 872.

[3] SellappanPalaniappan, RafiahAwang, "Intelligent Heart Disease Prediction System Using Data Mining

Techniques", IJCSNS International Journal of Computer Science and Network Security, Vol.8 No.8,

August 2008

[4] Niti Guru, Anil Dahiya, NavinRajpal, "Decision Support System for Heart Disease Diagnosis Using

Neural Network", Delhi Business Review, Vol. 8, No. 1 (January - June 2007).

[5] HeonGyu Lee, Ki Yong Noh, KeunHoRyu, “Mining Bio signal Data: Coronary Artery Disease

Diagnosis using Linear and Nonlinear Features of HRV,” LNAI 4819: Emerging Technologies in

Knowledge Discovery and Data Mining, pp. 56-66, May 2007.

[6] ShantakumarB.Patil, Y.S.Kumaraswamy “Intelligent and Effective Heart Attack Prediction System

Using Data Mining and Artificial Neural Network”. ISSN 1450-216X Vol.31 No.4 (2009), pp.642-656.

[7] Carlos Ordonez, "Improving Heart Disease Prediction Using Constrained Association Rules,"

Seminar Presentation at University of Tokyo, 2004.

[8] Kiyong Noh, HeonGyu Lee, Ho-Sun Shon, Bum Ju Lee, and KeunHoRyu, "Associative Classification

Approach for Diagnosing Cardiovascular Disease", Springer, Vol: 345, pp: 721- 727, 2006.

[9] Franck Le Duff, CristianMunteanb, Marc Cuggiaa, Philippe Mabob, "Predicting Survival Causes After

Out of Hospital Cardiac Arrest using Data Mining Method”, Studies in health technology and

informatics, Vol. 107, No. Pt 2, pp. 1256-9, 2004.

[10] LathaParthiban and R.Subramanian, "Intelligent Heart Disease Prediction System using CANFIS and

Genetic Algorithm", International Journal of Biological, Biomedical and Medical Sciences, Vol. 3,

No. 3, 2008.

[11] Antepartum patient heart rate feature extraction and classification using empirical mode decomposition

and support vector machine, Niranjana KrupaEmail author, Mohd Ali MA, Edmond Zahedi, Shuhaila

Ahmed and Fauziah M Hassan, January 2011

[12] Nidhi Singh, Divakar Singh, "Performance Evaluation of K-Means and Heirarichal Clustering in Terms

of Accuracy and Running Time", Department of Computer Science & Engg., BUIT, BU, Bhopal (M.P.),

India, 2012.

[13] Ersen Yılmaz and Çağlar Kılıkçıer, "Determination of Patient State from Cardiotocogram Using

LS-SVM with Particle Swarm Optimization and Binary Decision Tree", Electrical-Electronic Engineering

Department, Uludag University, 16059 Gorukle, Bursa, Turkey, received 26 June 2013; accepted

6 September 2013.
