Cardiovascular Disease Forecast using Machine Learning Paradigms

Saiful Islam
Senior Lecturer
Daffodil International University
E-mail: saiful.cse@diu.edu.bd
Nusrat Jahan
Senior Lecturer
Daffodil International University
E-mail: nusratjahan.cse@diu.edu.bd
Mst. Eshita Khatun
Lecturer
Daffodil International University
E-mail: eshita.cse@diu.edu.bd
Abstract— In recent years, the spread of cardiovascular disease
(CVD) has made it a growing cause of death worldwide among the
non-communicable diseases. In particular, people in South Asian
countries face a high risk of cardiovascular disease at an
earlier age than any other ethnic group. Predicting
cardiovascular disease is often challenging for medical
practitioners because it requires experience and knowledge,
which makes it a complex task. The health industry holds
enormous amounts of data whose hidden information is useful for
drawing effective conclusions. To obtain appropriate results and
make effective decisions from such data, advanced data analysis
techniques such as Naive Bayes and Decision Tree are used. The
chance of cardiovascular disease can be predicted from
properties such as age, gender, blood pressure (bp), and stress.
In this study, we collected 301 samples with 12 clinical
attributes. Logistic regression, Decision tree, SVM, and Naive
Bayes classification algorithms were applied to predict heart
disease. Of these, logistic regression provided 86.25% accuracy.
We also compared our model with results based on the UCI
dataset.
Keywords— Classification Algorithm, Heart Diseases, Decision
Tree, SVM, Logistic Regression, Naive Bayes, UCI dataset.
I. INTRODUCTION
Cardiovascular disease is one of the leading causes of death
throughout the world. More people die from cardiovascular
diseases than from any other cause. According to the World
Health Organization, it accounts for nearly one in every three
deaths worldwide each year. If the heart stops functioning
normally, the body's other organs stop working as well. The
mortality rate from cardiovascular disease is increasing in
many countries, including Bangladesh. In 2017, the World Health
Organization estimated that deaths from heart disease in
Bangladesh reached 14.31 percent of total deaths, and every
year cardiovascular disease kills 17.9 million people, 31
percent of global deaths [1].
In the medical domain, data mining can be used to extract
information from hidden patterns in a dataset. At present,
medical data usually arrives in a distributed way as paper
documents, and it needs to be brought into a consolidated
structure. The data must be pre-processed before machine
learning algorithms are applied in order to obtain more accurate
results. With data mining techniques, the extracted data helps
to predict the medical diagnosis. A prediction-based approach
also helps doctors take the right steps to treat the patient in
time, using previous patterns in the dataset. The data mining
techniques and the prediction model are responsible for a proper
prognosis of the disease [2].
The rate of cardiovascular disease is rising gradually in
relation to people's lifestyle, including habits such as
smoking, high fat intake, and lack of physical activity. The
heart is a pump that circulates blood throughout the whole body.
According to the United States National Institutes of Health
(NIH), heart rate varies from person to person depending on
different parameters and averages 60-100 beats per minute [3].
This study aims to find an appropriate model to predict
cardiovascular disease. Heart disease is also one of the major
causes of death in our country. It is crucial to make people
aware of the risk factors of cardiovascular diseases.
In Section II we discuss a few related works. Section III
describes our proposed approach to finding the risk factors as
well as a better model for heart disease prediction. Section IV
discusses the results, and Section V concludes with some notes
for future work.
II. RELATED WORK
This section presents the research demand on this topic and
some works that motivate our study. We found that a lot of
research has focused on cardiovascular disease. We were keen to
know the risk factors for predicting heart disease.
Nabaouia Louridi, Meryem Amar and Bouabid El Ouahidi used
different machine learning algorithms to identify
cardiovascular disease (CVD) and finally proposed an SVM with a
linear kernel. This approach used 13 features and reached an
accuracy of 86.8% [4]. In 2019, N. Satish Chandra Reddy, Song
Shue Nee, Lim Zhi Min and Chew Xin Ying stated that Random
Forest can be used as a classification algorithm to train a
system for identifying CVD with 90% to 95% accuracy. They used
14 features [5].
Using a dataset from the UCI library, Aditi Gavhane, Gouthami
Kokkula, Isha Pandya and Prof. Kailas Devadkar conducted a study
in 2018 to predict heart disease using 13 of 76 features. They
used an MLP and obtained an average precision of 0.91 [6].
Proceedings of the Fourth International Conference on Computing Methodologies and Communication (ICCMC 2020)
IEEE Xplore Part Number:CFP20K25-ART; ISBN:978-1-7281-4889-2
978-1-7281-4889-2/20/$31.00 ©2020 IEEE 487
2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC) 978-1-7281-4889-2/20/$31.00 ©2020 IEEE 10.1109/ICCMC48092.2020.ICCMC-00091
Sonakshi Harjai and Sunil Kumar Khatri used correlation-based
feature selection and a Multilayer Perceptron classifier to
propose a new model. The model was tested on 297 inputs and
reached 89.2% accuracy [7].
In 2019, Senthilkumar Mohan, Chandrasegar Thirumalai and Gautam
Srivastava proposed a Hybrid Random Forest with a Linear Model
that predicts heart disease with 88.7% accuracy. They collected
the dataset from the UCI machine learning repository and used
multi-class variables and binary classification for data
pre-processing [8].
In 2018, Uma N Dulhare described a methodology to improve the
performance of a Bayesian classifier for predicting heart
disease. The author used the Statlog (Heart) data set from UCI,
which has 14 attributes with 1 class label and 270 instances
with no missing values. Particle swarm optimization (PSO)
combined with a Naive Bayes classifier obtained 87.91% accuracy
[9].
So, it is clear that a proper solution is needed to help
prevent heart disease as early as possible. This is why
researchers all over the world are keen to work on
cardiovascular disease prediction, since it reveals the risk
factors of this disease.
III. PROPOSED MODEL
In this study we address the problem of predicting heart
disease. We collected 301 samples with 13 attributes [10]. The
proposed approach classifies the risk of having cardiovascular
disease. In this system, we used the Logistic regression, Naive
Bayes, SVM, and Decision Tree classification algorithms to
obtain better accuracy in predicting heart disease. Finally, we
analysed the obtained results by comparing the models through
their confusion matrices.
As mentioned, a total of 301 samples with 13 attributes were
obtained. The samples were then divided into two sets: training
data (80%) and testing data (20%). To avoid bias, the samples
for each set were selected randomly. Fig. 1 illustrates the
whole working process. We started with feature selection, using
the UCI (Heart) dataset as the reference against which to
compare our study. After feature selection we collected all the
sample data. At the end of the study, we applied the four
selected classification algorithms to predict the risk level of
cardiovascular disease. Among the four classifiers, logistic
regression gave the best result. Finally, our model is ready to
predict heart disease.
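The random 80%/20% division described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name split_samples, the fixed seed, and rounding the test size up (which yields 61 test samples out of 301, the total support shown for the logistic regression results) are choices made for this sketch and are not taken from the study.

```python
import math
import random

def split_samples(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of `samples` and return (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # random selection avoids ordering bias
    # Assumption: the test size is rounded up, so 301 samples give 61 test rows.
    n_test = math.ceil(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder records standing in for the 301 collected samples.
records = list(range(301))
train, test = split_samples(records)
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, which helps when the four classifiers are compared on the same test set.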
Logistic regression is a machine learning algorithm borrowed
from the area of statistics and is known as the go-to technique
for classification problems. It is somewhat similar to linear
regression, because both have the goal of estimating the values
of the parameters, or coefficients. After training the model on
the training data, we evaluated it on the testing data. Logistic
regression gave 86.25% accuracy.
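To make the idea of estimating coefficients concrete, the following minimal sketch fits a logistic regression model by batch gradient descent on the log-loss. The training procedure, the learning rate, the number of epochs, and the toy data are all assumptions for illustration; the study does not state how its model was fitted.

```python
import math

def sigmoid(z):
    """Logistic function, written to stay numerically stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Estimate weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 when the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Toy, made-up data: one scaled feature, label 1 = at risk.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

The learned weight plays the role of the coefficient mentioned above: its sign shows whether a larger feature value raises or lowers the predicted risk.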
Naive Bayes is another classification algorithm, which applies
Bayes' theorem with an independence assumption between
features. A one-dimensional Naive Bayes classifier computes the
ratio of the log probabilities of the features belonging to
each of the classes. Naive Bayes does not consider the
correlation between attributes. It is a very scalable
classifier, but it can create a bias towards one or more
attributes. In our approach we obtained 73.77% accuracy with
Naive Bayes.
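A minimal sketch of the log-probability computation is given below. It assumes a Gaussian likelihood for each feature, which is one common variant of Naive Bayes and is not stated in the study. The two features (age and systolic blood pressure) and all values are made up for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Return {label: (prior, [(mean, variance) per feature])}."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model

def log_posterior(model, x):
    """Unnormalised log P(class | x); features are treated as independent."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        scores[label] = score
    return scores

def predict_nb(model, x):
    """Pick the class with the largest log posterior."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)

# Made-up rows of (age, systolic bp); label 1 = at risk.
X = [[35, 118], [42, 122], [38, 115], [61, 150], [58, 145], [66, 160]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_nb(model, [40, 120]), predict_nb(model, [63, 155]))
```

Because the per-feature log terms are simply added, any correlation between attributes is ignored, which is the limitation noted above.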
For large amounts of data, the Support Vector Machine (SVM)
provides a classification learning model. When the data domain
is divided linearly (e.g., by a straight line or hyperplane),
the model is called a linear SVM. When the data domain is first
transformed into another space, called the feature space, in
which the classes can be separated linearly, the model is called
a non-linear support vector machine [11]. On our dataset, SVM
achieved 83.61% accuracy.
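The linear case can be sketched as below, covering only the separating hyperplane and not the non-linear feature-space transformation. Training by subgradient descent on the regularised hinge loss is an assumption made for this sketch, as are the learning rate, the regularisation strength, and the toy data; the study does not describe its SVM solver.

```python
def fit_linear_svm(X, y, lr=0.01, reg=0.01, epochs=500):
    """Minimise the L2-regularised hinge loss; labels must be -1 or +1."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Point is inside the margin: move the hyperplane towards it.
                w = [wj + lr * (yi * xj - reg * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Point is safely classified: apply only the regulariser.
                w = [wj - lr * reg * wj for wj in w]
    return w, b

def decision(w, b, x):
    """Return the side of the hyperplane w.x + b = 0 on which x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy, made-up, linearly separable points in two dimensions.
X = [[1.0, 1.0], [1.5, 0.5], [2.0, 1.5],
     [-1.0, -1.0], [-1.5, -0.5], [-2.0, -1.5]]
y = [1, 1, 1, -1, -1, -1]
w, b = fit_linear_svm(X, y)
labels = [decision(w, b, x) for x in X]
print(labels)
```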
Finally, our last selected algorithm was the Decision tree. The
decision tree is one of the most popular classification and
decision making algorithms, and different types of decision
tree are available. Here, we used a classification decision
tree to classify the dataset. However, this algorithm did not
give a better result (75.41%). Arundhati Navada et al. presented
a paper in 2011 describing the basic idea of the decision tree,
in which they discussed different types of decision tree with
equations and explanations [12].
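The core step of a classification decision tree, choosing the test at a node, can be sketched as below. The sketch assumes Gini impurity as the split criterion and a single numeric feature. Neither the criterion nor the cholesterol readings come from the study; they are placeholders for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split."""
    best = (None, gini(labels))  # no split: impurity of the parent node
    points = sorted(set(values))
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2  # candidate threshold between adjacent values
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Made-up cholesterol readings; label 1 = at risk.
chol = [180, 195, 210, 240, 265, 290]
risk = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(chol, risk)
print(threshold, impurity)
```

A full tree repeats this search recursively on the left and right subsets until the nodes are pure or a depth limit is reached.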
Fig. 1. Flowchart of our proposed Model
We now describe our dataset. As mentioned, we took the UCI
repository as the standard set of clinical attributes from
which to select features for our study. We then fixed our
attribute list, with the class label, and collected the data
through a Google form. Table 1 presents the attribute list of
our dataset, describing all 12 attributes with their
measurement values.
[Bar chart: Heart Disease Frequency According to Exercise; categories: Low, Normal, High; series: Risk, No Risk; vertical axis from 0 to 100]
TABLE 1. DESCRIPTION OF ATTRIBUTES
We now explore our dataset. The number of affected people
according to gender and a comparison of blood cholesterol by
age are illustrated in Fig. 2 and Fig. 3, respectively.
Fig. 2. Explore Gender Dataset
Fig. 3. Explore Blood Cholesterol Dataset
Fig. 4. Explore Class label for exercise.
Exercise or physical activity is one of the most effective
factors for predicting heart disease. In Fig. 4, we present the
class label according to daily exercise. The bar chart shows
that exercise helps to reduce the risk of heart disease.
Fig. 5. Risk Factor of CVD
We demonstrate the risk factors of Cardiovascular Disease (CVD)
in Fig. 5. Here, we found that obesity is one of the crucial
risk factors for CVD.
IV. RESULT ANALYSIS
Our prediction approach was developed with 13 attributes with
class labels. Table 2 presents the results of our proposed
approach. Logistic regression clearly gave the best result
among the four selected models. In our study we applied all of
these classifiers to our own collected data. Elsewhere,
researchers who worked with the (Heart) dataset from the UCI
repository reported higher accuracy.
TABLE 2. COMPARISON OF THE ACCURACY OF VARIOUS ALGORITHMS
Classification Algorithm      Accuracy
Logistic Regression           86.25%
Support Vector Classifier     83.61%
Decision Trees                75.41%
Naïve Bayes                   73.77%
A. Comparative Study
In this study we also examined the UCI Heart disease dataset
and found many research works on it. Senthilkumar Mohan et al.
worked with the UCI dataset and also applied these four
classification algorithms. Table 3 presents the UCI dataset
based results for comparison with ours.
TABLE 3. UCI DATASET BASED RESULT COMPARISON
Here, we can see that they obtained the lowest accuracy for
Naive Bayes and the best result for SVM, whereas we obtained
the best result for logistic regression. Table 4 presents the
results of the logistic regression classifier.
TABLE 4. LOGISTIC REGRESSION RESULT ANALYSIS
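The precision, recall and F1-score reported for logistic regression follow from a confusion matrix. The sketch below shows the arithmetic. The four counts are not published in the study: they are an assumption, chosen because they sum to the reported supports (34 and 27) and reproduce the reported per-class recall (0.97 and 0.74) and precision (0.82 and 0.95) after rounding. The "Avg" row is treated here as an unweighted mean over the two classes, which is also an assumption.

```python
def class_metrics(tp, fp, fn):
    """Return (precision, recall, f1) for one class from its counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Assumed counts (rows = true class, columns = predicted class).
confusion = [[33, 1],   # true class 0: 33 predicted as 0, 1 predicted as 1
             [7, 20]]   # true class 1: 7 predicted as 0, 20 predicted as 1

p0, r0, f1_0 = class_metrics(tp=confusion[0][0], fp=confusion[1][0], fn=confusion[0][1])
p1, r1, f1_1 = class_metrics(tp=confusion[1][1], fp=confusion[0][1], fn=confusion[1][0])

# Unweighted (macro) average over the two classes.
avg_precision = (p0 + p1) / 2
avg_recall = (r0 + r1) / 2
avg_f1 = (f1_0 + f1_1) / 2

for name, row in (("0", (p0, r0, f1_0)),
                  ("1", (p1, r1, f1_1)),
                  ("Avg", (avg_precision, avg_recall, avg_f1))):
    print(name, " ".join(f"{v:.3f}" for v in row))
```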
V. CONCLUSION
Heart disease is one of the major causes of death anywhere in
the world. We addressed this issue and proposed an approach to
predict the risk factors so that this disease can be prevented
as early as possible. In this study, logistic regression
provided the best results with 12 attributes. We also compared
against the UCI dataset and identified the factors for
predicting heart disease. Many research works that used the UCI
repository reported more accurate results; however, we
considered our own collected data.
In future, we wish to predict this disease more accurately, and
for this we will have to collect as much data as possible. At
the same time, accurate clinical data is needed to predict any
type of disease. Everyone needs good health to live a beautiful
life; otherwise social development will stall.
REFERENCES
[1] “Cardiovascular diseases” Available: https://www.who.int/en/news-
room/fact-sheets/detail/cardiovascular-diseases-(cvds). [Accessed: 25-
January-2020]
[2] Wu, Ching-seh Mike, Mustafa Badshah, and Vishwa Bhagwat, “Heart
Disease Prediction Using Data Mining Techniques.” In Proceedings of
the 2019 2nd International Conference on Data Science and Information
Technology, pp. 7-11. 2019.
[3] “What should my heart rate be?”
https://www.medicalnewstoday.com/articles/235710.php#normal-
resting-heart-rate. [Accessed: 21-January-2020]
[4] Nabaouia Louridi, Meryem Amar, Bouabid El Ouahidi,
“IDENTIFICATION OF CARDIOVASCULAR DISEASES USING
MACHINE LEARNING”, 7th Mediterranean Congress of
Telecommunications (CMT), 2019, DOI: 10.1109/CMT.2019.8931411.
[5] N. Satish Chandra Reddy, Song Shue Nee, Lim Zhi Min & Chew Xin
Ying, “Classification and Feature Selection Approaches by Machine
Learning Techniques: Heart Disease Prediction”, International Journal
of Innovative Computing, 2019, DOI:
https://doi.org/10.11113/ijic.v9n1.210.
[6] Aditi Gavhane, Gouthami Kokkula, Isha Pandya & Prof. Kailas
Devadkar (PhD), “Prediction of Heart Disease Using Machine Learning”,
ICECA 2018, IEEE Xplore ISBN:978-1-5386-0965-1.
[7] Sonakshi Harjai & Sunil Kumar Khatri, “An Intelligent Clinical
Decision Support System Based on Artificial Neural Network for Early
Diagnosis of Cardiovascular Diseases in Rural Areas”, AICAI, 2019,
DOI: 10.1109/AICAI.2019.8701237.
[8] Senthilkumar Mohan, Chandrasegar Thirumalai And Gautam Srivastava,
“Effective Heart Disease Prediction Using Hybrid Machine Learning
Techniques”, Special Section On Smart Caching, Communications,
Computing and Cybersecurity For Information-centric Internet Of
Things, IEEE Access Volume 7, 2019,
DOI:10.1109/ACCESS.2019.2923707.
[9] Duraipandian, M. “Performance Evaluation of Routing Algorithm for
MANET based on the Machine Learning Techniques.” Journal of trends
in Computer Science and Smart technology (TCSST) 1, no. 01 (2019):
25-38.
[10] “Dataset” Available: https://github.com/istyak/hd/blob/master/a4.csv.
[Accessed: 12-Dec-2019].
[11] Suthaharan S., “Support Vector Machine. In: Machine Learning Models
and Algorithms for Big Data Classification”. Integrated Series in
Information Systems, vol 36. Springer, Boston, MA, 2016.
[12] Arundhati Navada, Aamir Nizam Ansari, Siddharth Patil, Balwant
A. Sonkamble, “Overview of Use of Decision Tree algorithms in
Machine Learning”. IEEE Control and System Graduate Research
Colloquium, 2011.
TABLE 3 (data)
Classification Algorithm      Accuracy
Logistic Regression           82.9%
Support Vector Classifier     86.1%
Decision Trees                85%
Naïve Bayes                   75.8%
TABLE 4 (data)
Class Label   Precision   Recall   F1-score   Support
0             0.82        0.97     0.89       34
1             0.95        0.74     0.83       27
Avg           0.89        0.86     0.86       61