Risk Factor Prediction of Chronic Kidney Disease
Based on Machine Learning Algorithms.
Md. Ashiqul Islam 1
Dept. of Computer Science and Engineering
Daffodil International University
Dhaka, Bangladesh
ashiqul15-951@diu.edu.bd
Shamima Akter 2
Dept. of Bioinformatics and Computational Biology
George Mason University
Manassas, VA -20110, USA
sakter5@gmu.edu
Md. Sagar Hossen 3
Dept. of Computer Science and Engineering
Daffodil International University
Dhaka, Bangladesh
sagar15-1504@diu.edu.bd
Sadia Ahmed Keya 4
Dept. of Computer Science and Engineering
Daffodil International University
Dhaka, Bangladesh
sadia15-1442@diu.edu.bd
Sadia Afrin Tisha 5
Dept. of Computer Science and Engineering
Daffodil International University
Dhaka, Bangladesh
sadia15-1478@diu.edu.bd
Shahed Hossain 6
Dept. of Computer Science and Engineering
Daffodil International University
Dhaka, Bangladesh
shahed15-2659@diu.edu.bd
Abstract: Chronic kidney disease (CKD) is a growing medical
problem in which renal function declines progressively and the
kidneys are damaged. CKD is very common today; cardiovascular
disease and end-stage renal disease are two life-threatening
conditions that can follow from it. Both are potentially
preventable through early detection and treatment of people at
risk. Predicting medical problems is a difficult task, and CKD is
among the most lethal diseases in clinical practice. Predicting
its risk factors at an early stage, before CKD becomes too
advanced to manage, is therefore a necessary step. In this paper
we apply six algorithms to predict the risk factors of CKD: Naïve
Bayes, Random Forest, Simple Logistic regression, Decision Stump,
a Linear regression model, and Simple linear regression. Based on
a systematic implementation and analysis of these methods, the
six algorithms give good and fast classification performance.
Each algorithm is applied to the dataset individually, and the
best results are obtained through classification-based prediction
of the risk factors.
Keywords: Risk Factor, Classification, Cardiovascular,
Chronic Kidney Disease.
I. INTRODUCTION
Nowadays the prevalence of chronic kidney disease is rising
rapidly. CKD hampers people's day-to-day lives and can lead to
heart failure. Many people in Bangladesh face this problem. In
most cases people in rural areas are unaware of it, and this lack
of awareness is a main reason CKD goes unaddressed [9].
Technology is advancing rapidly, but people are not alert to the
disease, so they face a serious risk to their kidneys [17]. When
a kidney no longer works properly, the patient needs a kidney
transplant, which is often not feasible. Kidney diseases present
with various symptoms, and a damaged kidney cannot filter blood
the way it should [12]. Sometimes the disease becomes incurable,
that is, chronic. Several of these symptoms can be used to
predict risk factors for kidney disease. In this paper we propose
to analyze the risk factors of CKD and warn patients so that they
can stay healthy. Above all, this can help doctors identify the
symptoms easily and take proper steps at an early stage. For this
prediction analysis we use six algorithms: Naïve Bayes, Random
Forest, Simple Logistic regression, Decision Stump, a Linear
regression model, and Simple linear regression.
II. RELATED WORK
Various studies have extracted useful information from chronic
kidney disease datasets using data mining methods [8]. This was
done to reduce diagnosis time and to increase prediction accuracy
with the help of data mining classification techniques [1]. Data
mining is also used for the diagnosis and prognosis of several
other diseases [2] [3] [4]. K. Eroğlu and T. Palabaş [5] proposed
an approach that combined six classifiers (KNN, NB, SVM, decision
tables, RF, and J48) with three ensemble methods. The authors of
[6] experimented with chronic kidney disease using the k-means
algorithm and Apriori. A study was presented that detects CKD
using the SVM, DT, NB, and KNN algorithms [8]. Ani R et al. [7]
applied several classification algorithms, such as DT, NB, an LDA
classifier, Back Propagation Network (BPN), Random Subspace, and
KNN. To help reduce the death rate caused by CKD, DT and NB
classification methods were applied to predict CKD [9]. The
authors of [10] designed a system that can detect chronic kidney
disease at an early stage using several neural network
algorithms. A study [11] by M. S. Wibawa, I. M. D. Maysanjaya, and
I. M. A. W. Putra tested a combination of KNN, CFS, and AdaBoost;
its accuracy was 98.1%. M.P.N.M. Wickramasinghe et al. [12]
present a study that takes data from patients' clinical records
and applies classification algorithms to them in order to
recommend a suitable diet plan for CKD patients. Arora, M., and
Sharma, E. A. [13] proposed a data mining technique for CKD
detection implemented in the Weka tool. Ms. Astha Ameta et al.
[14] primarily reviewed data mining techniques and how they can
predict chronic kidney disease. It is therefore clear that data
mining is a practical tool for predicting chronic kidney diseases
[15]. Our proposed method examines the performance of the Naïve
Bayes, Random Forest, and Simple Logistic regression classifiers
to find the best accuracy for CKD and the best solution for
detecting it. A. J. Aljaaf et al. and Deepika B et al. [18] [21]
analyzed the early stage of CKD with machine learning algorithms
and identified the most significant factors. Siddheswar Tekale
et al. and S. Pitchumani Angayarkanni [16] [19] [20] predicted
the early stage of CKD and sought better accuracy in order to
prevent it. Marwa Almasoud et al. [22] detected CKD using machine
learning with the least number of predictors. In our experimental
results, the Decision Stump, Linear regression model, and Simple
linear regression algorithms give a better ranking of the risk
factors than the other algorithms.
III. PROPOSED SCHEME
Data mining for diagnosis has become an established trend in our
technologically advanced world. The human body has many vital
organs, and if they do not work properly, a person's life is in
danger [17]. The kidney is one of the major ones. It removes the
waste products that circulate in the body: it filters not only
excess fluid but also toxins from the blood. The kidneys also
help regulate red blood cell production and blood pressure, and
they release erythropoietin and enzymes such as kallikreins [1].
Chronic kidney disease has become a worldwide health concern with
rising prevalence. Chronic kidney disease, also called chronic
kidney failure, describes the gradual loss of kidney function
[9]. Steps must be taken to predict and control it, and data
mining techniques can be used for this purpose. The proposed
model predicts kidney disease using a large dataset to increase
model accuracy, and identifies the significant and
non-significant risk factors of CKD. The Naive Bayes, Simple
Logistic regression, and Random Forest algorithms are used to
measure the prediction accuracy of the model, and the Linear
regression model, Decision Stump, and Simple linear regression
model are used to find the most significant and non-significant
risk factors of CKD.
We compare the classifiers to identify the most effective method.
Naive Bayes classifiers are simple probabilistic classifiers
based on Bayes' theorem. The prediction accuracies are 93.9056%
for Naive Bayes, 98.8858% for Random Forest, and 94.7679% for
Simple Logistic. These accuracies indicate how reliably each
model identifies which patients have chronic kidney disease at a
given time. Random Forest performed best in terms of precision
and F-measure on the dataset, although Naive Bayes also showed
good accuracy. It can therefore be said that Random Forest
performed better than Simple Logistic and Naive Bayes in
predicting CKD in our analysis.
In the methodology, six algorithms are implemented to predict
the risk factors of CKD. The algorithms are combined into an
effective method for predicting CKD risk factors with very few
prediction errors [6].
In the first step, the data was prepared by pre-processing for
the actual analysis. First, a subset of the information was
selected and integrated into small parts. The data was then
cleaned and separated. Finally, a synthetic dataset for CKD was
obtained.
After obtaining the synthetic dataset, it passes through two
separate steps: data normalization and data formatting. When both
steps are complete, their results are combined and examined to
find the “Z” score. The condition Z > -2 is then checked. If the
condition is not met, the disease is not found and the process
ends. If it is met, the disease is found.
Z-score normalization is a strategy for normalizing data that
reduces the effect of outliers. The formula for Z-score
normalization is:
$$z = \frac{value - \mu}{\sigma}$$
Here, μ is the mean of the feature and σ is its standard
deviation. The value -2 is the Z-score threshold used here: it
is the minimum condition that a value must meet in the standard
normalization process. The Z-score is very helpful for
understanding the distribution of the data and normalizing it
easily.
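As an illustration, the Z-score step described above can be
sketched in Python. This is a minimal sketch, not the authors'
implementation: the function names `z_scores` and
`passes_threshold`, the sample hemoglobin values, and the choice
of the population standard deviation are assumptions made for the
example. Only the formula and the -2 threshold come from the text.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (value - mean) / standard deviation for every value."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # A constant feature carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def passes_threshold(z, threshold=-2.0):
    """Check the condition Z > -2 used in the proposed scheme."""
    return z > threshold


# Hypothetical hemoglobin readings, for illustration only.
hemoglobin = [15.4, 11.3, 9.6, 13.2, 12.5]
scores = z_scores(hemoglobin)
flags = [passes_threshold(z) for z in scores]
```

After normalization the scores have mean 0 and unit standard
deviation, so the threshold is expressed in standard deviations
from the mean rather than in the original units.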
Fig. 1. Flowchart of proposed model.
After the disease is found, the data is split into two sets:
test data and training data. Six algorithms are then applied to
find the risk factors. Naïve Bayes, Random Forest, and Simple
Logistic regression are used to measure the prediction accuracy
for CKD, and Decision Stump, the Linear regression model, and
Simple linear regression are used to calculate the risk factors
of CKD. The results are then obtained, a model is produced, and
the model is visualized [13]. The predictions are then analyzed.
Finally, processing is completed and the operation ends.
IV. DATASET DESCRIPTION
Building on a previous study, information was gathered through a
survey, and a questionnaire was created for data collection.
Patient medical data was then collected from a reputed medical
college in Bangladesh. The questionnaire included both case and
control type questions, covering age, blood pressure,
hypertension, and so on. The data collected through the
questionnaire was formatted as a CSV dataset containing 1032
patient records. We use 68% of the data for training and 32% for
testing. The overall preprocessing pipeline is easy to maintain
and yields output that is useful for analysis.
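The 68%/32% partition of the 1032 records can be sketched as
follows. This is only an illustrative sketch: the paper does not
say how the split was drawn, so the shuffling, the fixed seed, and
the helper name `train_test_split` are assumptions, and the
integer records stand in for patient rows.

```python
import random


def train_test_split(rows, train_fraction=0.68, seed=42):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder records: one integer per patient row (hypothetical).
records = list(range(1032))
train, test = train_test_split(records)
```

With 1032 records this gives 702 training rows and 330 test rows.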
TABLE 1: Stages of CKD risk level
Attribute | Description
Blood Pressure | Given in mmHg
Specific Gravity | Ranges from 1.005 to 1.025 (the higher the risk)
Albumin | Range is 0 to 5 (the higher the better)
Sugar level | 5 levels indicating severity
Red Blood Cells | Is abnormal or normal
Blood urea | It is in mg/dL
Serum creatinine | High level is not good
Sodium | It is measured in mEq/L
Potassium | It is measured in mEq/L
Hemoglobin | Less than 15 is kidney failure
White Blood Cell Count | This is numerical cell count
Red Blood Cell Count | Should not be higher or less than normal
Hypertension | It is categorical (yes or no)
Class | Given as CKD or not CKD
A. Dataset Preprocessing
After collecting the dataset, we preprocessed it, because
real-world datasets often have missing or garbage values [10]. We
filled each missing value with the mean of its attribute column
and smoothed noisy values. Data encoding was also applied to
convert string values to numeric ones. Some attributes, such as
age, specific gravity, blood glucose, hemoglobin, etc., were
kept in a continuous format.
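The two preprocessing operations named above, mean imputation and
string-to-numeric encoding, can be sketched as follows. This is a
minimal sketch under stated assumptions: missing values are
represented as `None`, the helper names `impute_mean` and
`encode_labels` are hypothetical, and the sample columns are
invented for illustration.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    mean_value = sum(observed) / len(observed)
    return [mean_value if v is None else v for v in column]


def encode_labels(column):
    """Map each distinct string to an integer code, in sorted order."""
    mapping = {label: code for code, label in enumerate(sorted(set(column)))}
    return [mapping[v] for v in column], mapping


# Hypothetical columns: hemoglobin with a gap, hypertension as text.
hemo = [15.0, None, 9.0, 12.0]
htn = ["yes", "no", "no", "yes"]

hemo_filled = impute_mean(hemo)          # gap filled with the mean 12.0
htn_codes, htn_map = encode_labels(htn)  # "no" -> 0, "yes" -> 1
```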
B. Algorithms
Decision stump: A decision stump is a one-level decision tree.
It uses a single attribute for splitting. For discrete
attributes this typically means that the tree consists of a
single internal node (the root) whose only successors are leaf
nodes. If the attribute is numerical, the tree may be more
complex. A decision stump is thus a machine learning model whose
one internal node is directly connected to the terminal nodes.
Decision stumps perform surprisingly well on some commonly used
benchmark datasets from the UCI repository, which illustrates
that learners with high bias and low variance may perform well
because they are less prone to overfitting.
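A decision stump on one numerical attribute can be sketched as
below. This is a simplified illustration, not the implementation
used in the paper: the split is chosen by counting
misclassifications (other criteria such as entropy are common),
the names `fit_stump` and `predict_stump` are hypothetical, and
the hemoglobin values and labels are invented.

```python
def fit_stump(xs, ys):
    """Pick the threshold on one numeric feature with the fewest errors.

    Each side of the split predicts the majority label of the training
    examples that fall on that side.
    """
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = max(set(left), key=left.count)
        right_label = max(set(right), key=right.count)
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    """Apply the single split: left label if x <= threshold, else right."""
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


# Hypothetical training data: hemoglobin values with class labels.
hemo = [9.5, 10.2, 11.8, 12.9, 13.4, 14.1, 15.0, 16.2]
label = ["ckd", "ckd", "ckd", "ckd", "notckd", "notckd", "notckd", "notckd"]
stump = fit_stump(hemo, label)
```

Because the model has a single split, the fitted threshold can be
read directly as a rule on one attribute, which is what makes the
stump useful for ranking individual risk factors.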
Simple linear regression: Simple linear regression is a
statistical method. It allows us to summarize and study the
relationship between two continuous (quantitative) variables.
One variable, denoted x, is regarded as the predictor,
explanatory, or independent variable. Regression allows us to
estimate how a dependent variable changes as the independent
variable(s) change. Simple linear regression is used to estimate
the relationship between two quantitative variables. Because of
its restricted form, it is among the fastest regression methods
to compute. Apart from the fitted coefficient and intercept term,
it also returns basic statistics such as the R² coefficient and
the standard error.
Equation: Y = a + bX + e    (1)
Here, Y is the dependent variable, a is a constant (the
intercept), X is the independent variable, b is the coefficient
of X, and e is the error term.
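Equation (1) can be fitted by ordinary least squares, as in the
following sketch. The function name `fit_simple_linear` and the
sample points are assumptions for illustration; the symbols `a`
and `b` follow Equation (1).

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of Y = a + bX; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must take at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope: coefficient of X
    a = mean_y - b * mean_x  # intercept: the constant term
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return a, b, r_squared


# Points that lie exactly on Y = 1 + 2X, for illustration only.
a, b, r2 = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
```

For points that lie exactly on a line the error term e is zero
everywhere and R² equals 1.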
Naive Bayes: Naive Bayes is a probabilistic machine learning
algorithm that can be used for a wide range of classification
tasks. Typical applications include document classification,
sentiment prediction, etc. Because it is a probabilistic model,
the algorithm can be implemented easily and its predictions are
made very quickly. It is based on Bayes' theorem, a law of
conditional probability. Its parameters can be estimated from a
small amount of training data, and it performs well in
multi-class prediction.
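A Gaussian variant of Naive Bayes for numerical attributes can be
sketched as follows. This is an illustrative sketch only: the
paper does not state which variant was used, the names
`fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical, and
the (hemoglobin, serum creatinine) rows are invented.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior and per-feature (mean, variance) for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mu = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mu) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mu, var))
        model[label] = (prior, stats)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the largest log posterior (Bayes' theorem)."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        for value, (mu, var) in zip(row, stats):
            total += (-0.5 * math.log(2 * math.pi * var)
                      - (value - mu) ** 2 / (2 * var))
        return total
    return max(model, key=log_posterior)


# Hypothetical rows: (hemoglobin, serum creatinine).
rows = [(9.5, 3.2), (10.2, 2.8), (11.0, 4.1),
        (14.5, 0.9), (15.1, 1.1), (13.9, 0.8)]
labels = ["ckd", "ckd", "ckd", "notckd", "notckd", "notckd"]
nb_model = fit_gaussian_nb(rows, labels)
```

The "naive" assumption is visible in `log_posterior`: the
per-feature log likelihoods are simply added, which treats the
features as independent given the class.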
Simple logistic regression: It is a simple algorithm that can be
used as a performance baseline; it is easy to implement and
performs well enough in many tasks. Every machine learning
engineer should therefore be familiar with its concepts. Like
many other machine learning techniques, it is borrowed from the
field of statistics, and despite its name it is not an algorithm
for regression problems, where the goal is to predict a
continuous outcome. It estimates a probability between 0 and 1,
which is turned into a discrete binary outcome. In simpler words,
the result is either one thing or another.
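The step from a probability between 0 and 1 to a binary outcome
can be sketched as follows. The weight, bias, and 0.5 threshold
are hypothetical values chosen for the example, not coefficients
fitted in the paper, and the function names are assumptions.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_class(weights, bias, features, threshold=0.5):
    """Turn the probability into a discrete 0/1 outcome."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical one-feature model on hemoglobin (not fitted to real data).
weights, bias = [-1.2], 15.6
low_hemo_class = predict_class(weights, bias, [10.0])
high_hemo_class = predict_class(weights, bias, [15.0])
```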
Random Forest: The random forest is a classification algorithm
consisting of many decision trees. It uses bagging and feature
randomness when building each individual tree to try to create
an uncorrelated forest of trees whose prediction by committee is
more accurate than that of any individual tree. Typically a few
hundred to several thousand trees are used, depending on the
size of the training set; the number of trees is denoted B. The
out-of-bag error is the mean prediction error on each training
sample Xi, computed using only the trees that did not have Xi in
their bootstrap sample. The training and test error tend to level
off after some number of trees have been fit.
…(2)
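Two ingredients of the bagging procedure described above,
bootstrap sampling and voting by committee, can be sketched as
follows. This is not a full random forest: tree construction and
feature randomness are omitted, and the helper names are
hypothetical.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag(n, indices):
    """Rows never drawn in this bootstrap sample."""
    chosen = set(indices)
    return [i for i in range(n) if i not in chosen]


def majority_vote(predictions):
    """Committee prediction: the label returned by the most trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
sample = bootstrap_indices(10, rng)  # indices used to train one tree
held_out = out_of_bag(10, sample)    # indices usable for out-of-bag error
vote = majority_vote(["ckd", "notckd", "ckd"])
```

Each tree would be trained on its own bootstrap sample, and the
rows in `held_out` are the ones on which that tree may be scored
when estimating the out-of-bag error.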
Linear Regression: Linear regression is a basic and commonly
used type of predictive analysis. The overall idea of regression
is to examine two things: (1) Does a set of predictor variables
do a good job of predicting the dependent variable? (2) Which
variables in particular are significant predictors of the
outcome variable, and in what way do they, as indicated by the
magnitude and sign of the beta estimates, affect the outcome
variable? The simplest form of the regression equation is
defined by the formula y = c + b*x, where y = estimated
dependent variable score, c = constant, b = regression
coefficient, and x = score on the independent variable.
V. RESULT AND ANALYSIS
In this analysis, we obtained results from three supervised
algorithms: Naïve Bayes, Random Forest, and Simple Logistic. We
examined the results and obtained accuracies above 90%, so we
can say that these models are highly effective for this dataset.
We trained on 63% of the data and obtained the results by
testing on the remaining 37%. Furthermore, we identified the
causes of kidney disease using Decision Stump, the Linear
Regression Model, and Simple Linear Regression. Hemoglobin is
the main factor in kidney disease.
TABLE 2: Accuracy Table of CKD
Algorithm | Accuracy
Naïve Bayes | 93.9056 %
Random Forest | 98.8858 %
Simple Logistic | 94.7679 %
The result analysis shows that the Random Forest algorithm
achieves the highest accuracy, 98.8858%, so our model performs
better than the other models compared. Using this model, chronic
kidney disease can be predicted easily. This can help the medical
community advance biomedical science.
[Figure 2: accuracy chart; vertical axis from 90% to 100%; legend: Model accuracy, Loss, Validation]
Fig. 2. Accuracy chart diagram of CKD.
TABLE 3: Most significant risk factor prediction table
Algorithm | Factor
Decision Stump | Hemo <= 13.05 : 0.9578606158833063; Hemo > 13.05 : 0.12310286677908938
Linear Regression Model | -0.0002 * Bu + -0.091 * Hemo + 0 * Wbcc + -0.0076 * Rbcc + 0.288 * Htn + 1.5892
Simple Linear Regression | -0.12 * Hemo + 2.13
From this overview of the results we can easily identify the
most significant and non-significant risk factors. Both the
decision stump and the simple linear regression model show that
hemoglobin is the most significant risk factor.
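To make the fitted models of Table 3 concrete, the sketch below
evaluates them for one patient. The coefficients and the 13.05
split are copied from Table 3; everything else is an assumption:
the function names are hypothetical, the patient values are
invented, hypertension is assumed to be encoded as 1 for yes and 0
for no, and the outputs are treated only as scores on the encoded
class attribute, whose encoding the paper does not state.

```python
def decision_stump_score(hemo):
    """Decision stump from Table 3: one split on hemoglobin at 13.05."""
    return 0.9578606158833063 if hemo <= 13.05 else 0.12310286677908938


def linear_regression_score(bu, hemo, wbcc, rbcc, htn):
    """Linear regression model from Table 3 (htn assumed 1 = yes, 0 = no)."""
    return (-0.0002 * bu - 0.091 * hemo + 0 * wbcc
            - 0.0076 * rbcc + 0.288 * htn + 1.5892)


def simple_linear_score(hemo):
    """Simple linear regression from Table 3: hemoglobin only."""
    return -0.12 * hemo + 2.13


# Hypothetical patient, for illustration only.
patient = {"bu": 50.0, "hemo": 12.0, "wbcc": 8000.0, "rbcc": 4.5, "htn": 1}
stump_out = decision_stump_score(patient["hemo"])
linear_out = linear_regression_score(**patient)
simple_out = simple_linear_score(patient["hemo"])
```

In all three models hemoglobin enters with a negative sign or
as the only split attribute, which matches the statement that it
is the dominant factor; the white blood cell count has a zero
coefficient and does not affect the linear score.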
[Figure 3: pie chart titled "Factor Analysis"; slice labels: 71%, 23%, 6%, 0%; legend: Hemo, Htn, Rbcc, Bu]
Fig. 3. Pie chart of risk factor prediction.
From the factor analysis we can identify the most significant
and non-significant risk factors of CKD. The analysis shows that
hemoglobin is the most significant risk factor for CKD and that
hypertension is a less significant one.
A. Comparison Table
TABLE 4: Comparative analysis of approach model
Algorithms | Precision | Recall | F-Measure | ROC Area
Naïve Bayes | 0.940 | 0.939 | 0.939 | 0.972
Random Forest | 0.989 | 0.989 | 0.989 | 0.999
Simple Logistic | 0.948 | 0.948 | 0.948 | 0.976
In the comparative analysis of the three algorithms, Random
Forest gave the best accuracy and is the best fit for our
dataset. Our model also identifies the most significant and
non-significant risk factors of CKD better than the other models
compared.
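For reference, the precision, recall, and F-measure columns of
Table 4 are defined from confusion-matrix counts as sketched
below. The counts are hypothetical, the function names are
assumptions, and the paper does not state how the reported values
were averaged over the two classes, so the sketch shows only the
single-class definitions.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true positives that the model recovers."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts: true positives, false positives, false negatives.
tp, fp, fn = 90, 10, 5
p = precision(tp, fp)
r = recall(tp, fn)
f = f_measure(p, r)
```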
VI. FUTURE WORK
In this paper, we completed our work with a large dataset. In
the future, CKD risk prediction is expected to reveal new aspects
of CKD prevention and control and to play a vital role in
diagnosing CKD [13]. Doctors will be able to predict CKD by
observing the results. Medical scientists will be able to use
this dataset and its results in controlling and preventing CKD.
We expect this study to open a new horizon in CKD control.
VII. CONCLUSION
In this paper, we predicted the risk factors of chronic kidney
disease (CKD) and the progression of CKD. Risk factor prediction
plays an essential role in recognizing the risk of, and helping
to prevent, CKD [9]. The best results in predicting risk factors
were obtained by evaluating each algorithm individually. We
obtained high accuracy with the Random Forest algorithm
(98.8858%). In this context, CKD risk prediction is particularly
effective for identifying and listing people at risk, and
listing people by risk score makes it possible to treat them
more effectively. Moreover, a significant portion of the
population at high risk of CKD can be identified within the
community using CKD risk factor prediction, without hospital
admission.
REFERENCES
[1] Ramya, S., & Radha, N. (2016). Diagnosis of chronic kidney disease using
machine learning algorithms. International Journal of Innovative Research in
Computer and Communication Engineering, 4(1), 812-820.
[2] Purushottam Sharma, Kanak Saxena and Richa Sharma, "Heart Disease
Prediction System Evaluation Using C4.5 Rules and Partial Tree," in
Computational Intelligence in Data Mining, Volume 2, ed: Springer,
pp. 285-294, 2016.
[3] Ritika Chadha, Shubhankar Mayank, Anurag Vardhan and Tribikram
Pradhan, "Application of Data Mining Techniques on Heart Disease
Prediction: A Survey," in Emerging Research in Computing, Information,
Communication and Applications, ed: Springer, pp. 413-426, 2016.
[4] Moloud Abdar, Mariam Zomorodi-Moghadam, Resul Das and I-Hsien
Ting, "Performance analysis of classification algorithms on early detection
of liver disease," Expert Systems with Applications, vol. 67, pp. 239-251,
2017.
[5] K. Eroğlu and T. Palabaş, "The impact on the classification
performance of the combined use of different classification methods and
different ensemble algorithms in chronic kidney disease detection," 2016
National Conference on Electrical, Electronics and Biomedical
Engineering (ELECO), Bursa, 2016, pp. 512-516.
[6] N. Tazin, S. A. Sabab and M. T. Chowdhury, "Diagnosis of
Chronic Kidney Disease using effective classification and feature selection
technique," 2016 International Conference on Medical Engineering,
Health Informatics and Technology (MediTec), Dhaka, 2016, pp. 1-6,
doi: 10.1109/MEDITEC.2016.7835365.
[7] R. Ani, G. Sasi, U. R. Sankar and O. S. Deepa, "Decision support system
for diagnosis and prediction of chronic renal failure using random subspace
classification," 2016 International Conference on Advances in Computing,
Communications and Informatics (ICACCI), Jaipur, 2016, pp. 1287-1292,
doi: 10.1109/ICACCI.2016.7732224.
[8] Arif-Ul-Islam and S. H. Ripon, "Rule Induction and Prediction of Chronic
Kidney Disease Using Boosting Classifiers, Ant-Miner and J48 Decision
Tree," 2019 International Conference on Electrical, Computer and
Communication Engineering (ECCE), Cox's Bazar, Bangladesh, 2019, pp. 1-6,
doi: 10.1109/ECACE.2019.8679388.
[9] K. Anantha Padmanaban and G. Parthiban, "Applying Machine
Learning Techniques for Predicting the Risk of Chronic Kidney Disease,"
Indian Journal of Science and Technology, vol. 9, no. 29, 2016.
[10] R. K. Chiu, R. Y. Chen, Shin-An Wang and Sheng-Jen Jian,
"Intelligent systems on the cloud for the early detection of chronic kidney
disease," 2012 International Conference on Machine Learning and
Cybernetics, Xian, 2012, pp. 1737-1742, doi: 10.1109/ICMLC.2012.6359637.
[11] M. S. Wibawa, I. M. D. Maysanjaya and I. M. A. W. Putra, "Boosted
classifier and features selection for enhancing chronic kidney
disease diagnose," 2017 5th International Conference on Cyber and IT
Service Management (CITSM), Denpasar, 2017, pp. 1-6.
[12] D. P. a. K. K. M.P.N.M. Wickramasinghe, "Dietary prediction for
patients with Chronic Kidney Disease (CKD) by considering blood potassium
level using machine learning algorithms," in 2017 IEEE Life Sciences
Conference (LSC), Sydney, NSW, Australia, 2017.
[13] Arora, M., & Sharma, E. A. (2016). Chronic Kidney Disease Detection
by Analyzing Medical Datasets in Weka. International Journal of Computer
Application, 6(4), 20-26.
[14] M. K. J. Ms. Astha Ameta, "Data Mining Techniques for the Prediction
of Kidney Diseases and Treatment: A Review," International Journal Of
Engineering.
[15] A. Maurya, R. Wable, R. Shinde, S. John, R. Jadhav and R. Dakshayani,
"Chronic Kidney Disease Prediction and Recommendation of Suitable Diet
Plan by using Machine Learning," 2019 International Conference on Nascent
Technologies in Engineering (ICNTE), Navi Mumbai, India, 2019, pp. 1-4,
doi: 10.1109/ICNTE44896.2019.8946029.
[16] J. Snegha, V. Tharani, S. D. Preetha, R. Charanya and S. Bhavani,
"Chronic Kidney Disease Prediction Using Data Mining," 2020 International
Conference on Emerging Trends in Information Technology and Engineering
(ic-ETITE), Vellore, India, 2020, pp. 1-5, doi: 10.1109/ic-
ETITE47903.2020.482.
[17] Satukumati S.B., Satla S., Kogila R. (2019). Feature extraction
techniques for chronic kidney disease identification, Ingénierie des
Systèmes d'Information, Vol. 24, No. 1, pp. 95-99.
https://doi.org/10.18280/isi.240114
[18] A. J. Aljaaf et al., "Early Prediction of Chronic Kidney Disease Using
Machine Learning Supported by Predictive Analytics," 2018 IEEE Congress
on Evolutionary Computation (CEC), Rio de Janeiro, 2018, pp. 1-9, doi:
10.1109/CEC.2018.8477876.
[19] Siddheswar Tekale, Pranjal Shingavi, Sukanya Wandhekar, Ankit
Chatorikar, "Prediction of Chronic Kidney Disease Using Machine Learning
Algorithm," International Journal of Advanced Research in Computer and
Communication Engineering, vol. 7, issue 10, 2018.
[20] S. Pitchumani Angayarkanni, "Predictive Analysis of Chronic Kidney
Disease Using Machine Learning Algorithm," International Journal of Recent
Technology and Engineering (IJRTE), ISSN: 2277-3878, vol. 8, issue 2, 2019.
[21] Deepika B, Rao VKR, Rampure DN, Prajwal P, Gowda DG, et al (2020)
Early Prediction of Chronic Kidney Disease by using Machine Learning
Techniques. Am J Comput Sci Eng Surv Vol. 8 No. 2:7.
[22] Marwa Almasoud, Tomas E Ward, "Detection of Chronic Kidney
Disease using Machine Learning Algorithms with Least Number of
Predictors," International Journal of Advanced Computer Science and
Applications (IJACSA), vol. 10, no. 8, 2019.
... This transparency is crucial for improving trust and acceptance of AI-assisted diagnostics in the medical community. Moreover, explainability facilitates effective communication between healthcare providers and patients, enabling them to comprehend the reasoning behind diagnostic outcomes (Islam et al., 2020). The study by Raihan et al. (Raihan et al., 2023) utilises XGB and Biogeography-Based Optimization (BBO) to improve Chronic Kidney Disease diagnosis through a transparent machine-learning model that aids clinicians, especially in areas with a shortage of nephrologists. ...
Article
Full-text available
Chronic Kidney Disease (CKD) is increasingly recognised as a major health concern due to its rising prevalence. The average survival period without functioning kidneys is typically limited to approximately 18 days, creating a significant need for kidney transplants and dialysis. Early detection of CKD is crucial, and machine learning methods have proven effective in diagnosing the condition, despite their often opaque decision-making processes. This study utilised explainable machine learning to predict CKD, thereby overcoming the 'black box' nature of traditional machine learning predictions. Of the six machine learning algorithms evaluated, the extreme gradient boost (XGB) demonstrated the highest accuracy. For interpretability, the study employed Shapley Additive Explanations (SHAP) and Partial Dependency Plots (PDP), which elucidate the rationale behind the predictions and support the decision-making process. Moreover, for the first time, a graphical user interface with explanations was developed to diagnose the likelihood of CKD. Given the critical nature and high stakes of CKD, the use of explainable machine learning can aid healthcare professionals in making accurate diagnoses and identifying root causes.
... Risk factor prediction of CKD dataset (CKD-2) [28]: This dataset is gathered from Enam Medical College in Dhaka, Bangladesh, through a survey involving 200 patients, each associated with equal number of features. It comprises of 202 instances and 28 features. ...
Article
Full-text available
In clinical decision-making for chronic disorders like chronic kidney disease, high variability often leads to uncertainty and negative outcomes. Deep learning techniques have been developed as useful tools for minimizing the chance and improving clinical decision-making. Moreover, traditional techniques for chronic kidney disease recognition frequently the accuracy is compromised as it relies on limited sets of biological attributes. Therefore, in the proposed work, a combination of deep radial bias network and the puma optimization algorithm is suggested for precised chronic kidney disease classification. Initially, the accessed data undergo preprocessing using Spectral Z score Bag Boost K-Means SMOTE transformation, which includes robust scaling, data cleaning, balancing, encoding, handling missing values, min–max scaling, and z-standardization. Feature selection is then conducted using the hybrid methodology of Role-oriented Binary Walrus Grey Wolf Algorithm to choose discriminative features for improving classification accuracy. Then, Auto Encoder with Patch-Based Principal Component Analysis is employed for dimensionality reduction to minimize the processing time. Finally, the proposed classification method utilizes deep radial bias and the puma optimization search algorithm for effective chronic kidney disease classification. The introduced scheme is tested on two datasets: the risk factor prediction of chronic kidney disease dataset and chronic kidney disease dataset, which provides accuracies of 99.02%, and 99.15%, respectively. Experiments demonstrate that the proposed model identifies chronic kidney disease more accurately than the existing approaches.
... Predicting the presence of topographic features within extensive image datasets poses a formidable computational challenge, primarily attributed to diverse factors, including the suboptimal choice of predictive variables, the inherent limitations in dataset size, and the conventional reliance on feature sets in conjunction with machine learning classifiers (Nandhini & Aravinth, 2021;Islam et al., 2020;Alassaf et al., 2018). Furthermore, the utilization of deep learning models for image prediction has been hampered by the inadequacies in predictor variable selection and the lack of hybridized models. ...
Article
The detection of natural images, such as glaciers and mountains, has practical applications in transportation automation and outdoor activities. Convolutional neural networks (CNNs) have been widely employed for image recognition and classification tasks. While previous studies have focused on fruits, landslides, and medical images, further research is needed on the detection of natural images, particularly glaciers and mountains. To address the limitations of traditional CNNs, such as vanishing gradients and the need for many layers, the proposed work introduces a novel model called DenseHillNet, a type of CNN with densely connected layers that classifies images as glaciers or mountains. The model contributes to the development of automation technologies in transportation and outdoor activities. The dataset used in this study comprises 3,096 images in each of the “glacier” and “mountain” categories. A rigorous methodology was employed for dataset preparation and model training to ensure the validity of the results. A comparison with a previous work revealed that the proposed DenseHillNet model, trained on both glacier and mountain images, achieved higher accuracy (86%) than a CNN model that used only glacier images (72%). The intended audience of this article is researchers and graduate students.
Article
Numerous machine learning approaches can perform predictive analytics on vast volumes of data across a range of industries. Although using predictive analytics in healthcare is challenging, it will eventually help practitioners make quick decisions about the health and treatment of patients based on large amounts of data. Globally, diseases including liver disease, diabetes, kidney disease, cancer and heart disease are responsible for a large number of fatalities; however, the majority of these deaths result from check-ups that come too late. This problem stems from a lack of medical infrastructure and a low doctor-to-population ratio. According to data, India has a doctor-to-population ratio of 1:1456, compared with the WHO's suggested ratio of 1:1000, which indicates a physician shortage. If not identified early, diabetes, liver disease, kidney disease, cancer and heart disease pose a serious risk, so many lives can be saved by early detection and diagnosis of these disorders. The main goal of this research is to use machine learning classification algorithms to predict dangerous diseases. Diabetes, kidney disease, liver disease, cancer and heart disease are all covered in this study. To make these predictions accessible to the general public, our team developed an online medical test application that uses machine learning to forecast these diseases.
Conference Paper
Chronic kidney disease is a rising health issue that affects millions of people worldwide. Early detection and characterization of this disease are essential for effective management and control. The disease is associated with several serious health risks, such as cardiovascular disease, increased risk of stroke, and end-stage renal disease, which can be effectively prevented by early detection and treatment. Medical scientists rely on machine learning algorithms to diagnose the disease accurately at its outset. Recently, machine learning algorithms have been adding value to healthcare through their integration into mobile health solutions. Considering this, this paper proposes a predictive model based on three machine learning classifiers, Support Vector Machine, Decision Tree, and Multilayer Perceptron, for chronic kidney disease prediction. The performance of the model was assessed using a confusion matrix, and the model was executed in popular machine learning software tools such as WEKA and RapidMiner. The study found that, using 10-fold cross-validation in WEKA, the support vector machine yielded the highest accuracy (98%) among the standard classifiers in predicting chronic kidney disease. In addition, the proposed prediction model has been compared with existing models in terms of accuracy, sensitivity, and specificity. The experimental results indicate that the proposed predictive model shows promising results. These findings could be integrated into the development of mobile health solutions and other innovative approaches to prevent and treat this debilitating condition.
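The evaluation measures mentioned in this abstract (accuracy, sensitivity, and specificity, all derived from a confusion matrix) can be computed as in the following minimal sketch. The function names and the binary label encoding (1 for CKD, 0 for non-CKD) are assumptions for illustration and do not reproduce the WEKA or RapidMiner output.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity (recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Sensitivity: share of true positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of true negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }
```

In a screening setting, sensitivity is often the more critical of the two, since a missed CKD case (false negative) delays treatment.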
Article
Rough set theory offers a novel approach to identifying structural correlations amid imprecise or noisy data, and is particularly applicable to variables with diverse values. It presents a promising avenue for handling fuzzy, conflicting, and uncertain data, with recent models incorporating various fuzzy generalizations. This technique stands out as a popular solution within artificial intelligence, particularly in data analysis and processing tasks. In the medical domain, where missing data poses a significant challenge, rough set theory is commonly combined with machine learning algorithms for disease prediction. This paper proposes a model that predicts missing values using rough set theory, addressing the prevalent issue of incomplete data. By providing a systematic approach and a robust algorithm, the model demonstrates the adaptability and potential of rough set theory in contemporary data analysis scenarios. Classifying the predicted dataset with a supervised learning model (SVM) yields an accuracy of 82.1% and an F1 score of 82.6%. Through validation on real-life medical datasets using supervised classification techniques, the paper underscores the accuracy and applicability of the proposed algorithm, offering a valuable tool for researchers and practitioners grappling with the complexities of modern data analysis.
Chapter
The kidneys are the principal organs that remove waste and toxic material from the body. Kidney malfunction occurs for various reasons, and if certain symptoms are ignored and not treated in time, it may become persistent and lead to Chronic Renal Disease (CRD). This condition hastens kidney failure and, in turn, death if not treated appropriately. This work identifies the relevant and correlated attributes among all the attributes and reduces the features in the patients' dataset using the chi-squared test, for better detection and prediction of CRD. The CRDP algorithm is implemented, and its results are used in logistic regression and K-nearest neighbor classification techniques to improve their prediction accuracy on CRD.
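The chi-squared test used for feature reduction in this abstract can be sketched as follows. This is a minimal illustration of the standard Pearson statistic on a contingency table (rows are values of a categorical feature, columns are class labels); the function name is hypothetical and this is not the CRDP algorithm itself.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of feature and class.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat
```

A larger statistic suggests a stronger association between the feature and the class, so features can be ranked by this value and the lowest-ranked ones dropped.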
Conference Paper
Chronic kidney disease (CKD) is a heterogeneous disorder in which the kidneys' functionality degenerates over time. Although there is a range of abnormalities in kidney function, malfunction beyond a threshold leads, if untreated, to kidney failure, also referred to as end-stage renal disease. However, high-end complex treatments such as kidney transplantation or dialysis may at times also be life-threatening for CKD patients. The condition often leads to irreversible changes in kidney structure and function, which may also involve cardiac, endocrine, and xenobiotic toxic complications. CKD is identified by a decrease in GFR and/or a rise in albuminuria. As this health disorder becomes more prevalent, quality of life deteriorates. Moreover, the consequences affect the nation's economy directly or indirectly. At this juncture, suitable preventive measures and strategic planning are imperative. Meanwhile, the world is advancing with modern innovations. Artificial Intelligence, Machine Learning, and Deep Learning are technologies employed extensively in every sector. These disruptive technologies have not bypassed the health sector and have proved their strength in several contexts. Accurate disease prediction and early detection are among the outcomes that could be expected from these technologies, so that preventive measures can be suggested beforehand. In this article, a comprehensive body of work by distinguished researchers is explored and presented. Around 100 articles published during the past decade are examined in depth, and their respective contributions are cited.
Article
According to the health statistics of India on Chronic Kidney Disease (CKD), a total of 63538 cases have been registered. The average age of men and women prone to kidney disease lies in the range of 48 to 70 years. CKD is more prevalent among males than among females. India ranked 17th in CKD during 2015 [1]. This paper focuses on a predictive analytics architecture to analyse a CKD dataset using feature engineering and classification algorithms. The proposed model incorporates techniques to validate the feasibility of the data points used for analysis. The main focus of this research work is to analyse the dataset of chronic kidney failure and classify CKD and non-CKD cases. The feasibility of the proposed dataset is determined through learning curve performance. The features that play a vital role in classification are determined using the sequential forward selection algorithm. The training dataset with the selected features is fed into various classifiers to determine which one detects CKD most accurately. The dataset is classified using classification algorithms such as Linear Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Tree (CART), Naive Bayes (NB), Support Vector Machine (SVM), Random Forest (RF), eXtreme Gradient Boosting (XGBoost) and Ada Boost Regressor (ABR). It was found that, for the given CKD dataset with 25 attributes (11 numeric and 14 nominal), the classifiers LR, LDA, CART, NB, RF, XGB and ABR provide accuracies ranging from 98% to 100%. The proposed architecture validates the dataset against the rule of thumb for working with a small number of data points, and the classifiers are validated against underfitting and overfitting. The performance of the classifiers is evaluated using accuracy and F-score. The proposed architecture indicates that LR, RF and ABR provide very high accuracy and F-score.
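Sequential forward selection, as named in this abstract, greedily grows a feature subset by adding whichever remaining feature most improves a score. The sketch below is a simplified fixed-size variant under stated assumptions: `score` is a hypothetical caller-supplied function (for example, cross-validated accuracy of a classifier trained on the given subset), and selection stops after `k` features rather than when the score stops improving.

```python
def sequential_forward_selection(features, score, k):
    """Greedily pick up to k features, maximising score(subset) at each step.

    features: iterable of feature names.
    score:    callable taking a list of feature names and returning a number
              (hypothetical; e.g. validation accuracy on that subset).
    k:        number of features to select.
    """
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # Try each remaining feature and keep the one giving the best score.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because each step only looks one feature ahead, the result is not guaranteed to be the globally best subset, but the search costs far fewer evaluations than trying every combination.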
Conference Paper
Chronic Kidney Disease (CKD) is one of the deadliest diseases and slowly damages the human kidney. The disease remains undetected in its early stage, and patients only realize its severity once it has advanced. Hence, detecting the disease at an earlier stage is a key challenge. Data mining is a branch of Artificial Intelligence that is widely used to derive interesting patterns from large volumes of medical data. While experts have used various data mining techniques, boosting and rule extraction techniques have rarely been applied to the analysis of kidney diseases. Boosting is an ensemble technique that enhances the predictive power of a data mining model. AdaBoost and LogitBoost are used here to compare classification performance. Ant-Miner is a data mining algorithm that applies the Ant Colony Optimization technique. Ant-Miner and a Decision Tree have been used in the paper to derive rules. The aim of this paper is two-fold: analyzing the performance of boosting algorithms for detecting CKD, and deriving rules that illustrate the relationships among the attributes of CKD. The best information retrieved by both the classification and rule generation techniques is promising and can be adopted by medical scientists for their research.
Chapter
Predictive analysis plays a major role in the healthcare industry, where forecasting disease reduces the risk to patients. Statistics show that cardiovascular diseases have increased the mortality rate in India. Machine learning, which is used to develop predictive models in various domains, is now applied in the field of medical diagnostics and plays an integral role in predicting the presence or absence of heart disease. Such predictions, if made well in advance, can help doctors treat patients and mitigate their health risk. Biological samples such as blood or tissue are collected from the human body to predict cardiovascular diseases. The proposed work focuses on developing machine learning predictive models using support vector machine, decision tree, neural network and K-nearest neighbour for the prediction of heart disease. For this work, the Cleveland heart disease dataset is used, which consists of 14 attributes and 294 records. A comparative analysis of the prediction models was carried out. The results show that the support vector machine, decision tree and KNN (k = 15) classifiers yield better accuracy in predicting heart disease than the other models.
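The K-nearest neighbour classifier mentioned in this abstract can be sketched in a few lines. This is a minimal illustration, assuming numeric feature vectors, Euclidean distance and a simple majority vote; the function name and the default `k=3` are assumptions for the example (the abstract reports k = 15 on its own dataset).

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Sort training points by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(
        zip(train_X, train_y), key=lambda pair: dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Because the method is distance-based, features on very different scales should normally be rescaled first so that no single attribute dominates the distance.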
Conference Paper
Chronic Kidney Disease is a serious lifelong condition that is induced by either kidney pathology or reduced kidney function. Early prediction and proper treatment can possibly stop or slow the progression of this chronic disease to the end stage, where dialysis or kidney transplantation is the only way to save the patient's life. In this study, we examine the ability of several machine learning methods to predict Chronic Kidney Disease early. This matter has been studied widely; however, we support our methodology with predictive analytics, in which we examine the relationships among the data parameters as well as with the target class attribute. Predictive analytics enables us to identify the optimal subset of parameters with which to build a set of predictive machine learning models. This study starts with 24 parameters in addition to the class attribute and ends with 30% of them as the ideal subset for predicting Chronic Kidney Disease. A total of 4 machine learning based classifiers have been evaluated within a supervised learning setting, achieving best performance outcomes of AUC 0.995, sensitivity 0.9897, and specificity 1. The experimental procedure concludes that advances in machine learning, with the assistance of predictive analytics, represent a promising setting in which to develop intelligent solutions, which in turn demonstrate the capability of prediction in the kidney disease domain and beyond.
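The AUC figure reported in this abstract measures how well a classifier's scores rank positive cases above negative ones. A minimal sketch, assuming binary labels (1 for CKD, 0 otherwise) and using the pairwise-comparison definition of AUC, is shown below; the function name is hypothetical and this is not the authors' evaluation code.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count pairs where the positive outranks the negative; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

An AUC of 1.0 means every positive case is scored above every negative case, while 0.5 corresponds to random ranking.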