
Risk Factor Prediction of Chronic Kidney Disease Based on Machine Learning Algorithms

January 2021


DOI:10.1109/ICISS49785.2020.9315878

Conference: 3rd International Conference on Intelligent Sustainable Systems (ICISS 2020)
At: India

Authors:

Md. Ashiqul Islam

Daffodil International University

Shamima Akter

George Mason University

Md. Sagar Hossen

Institut Teknologi Sepuluh Nopember

Sadia Ahmed Keya

Daffodil International University




Risk Factor Prediction of Chronic Kidney Disease Based on Machine Learning Algorithms

Md. Ashiqul Islam 1

Dept. of Computer Science and Engineering

Daffodil International University

Dhaka, Bangladesh

ashiqul15-951@diu.edu.bd

Shamima Akter 2

Dept. of Bioinformatics and Computational Biology

George Mason University

Manassas, VA -20110, USA

sakter5@gmu.edu

Md. Sagar Hossen 3

Dept. of Computer Science and Engineering

Daffodil International University

Dhaka, Bangladesh

sagar15-1504@diu.edu.bd

Sadia Ahmed Keya 4

Dept. of Computer Science and Engineering

Daffodil International University

Dhaka, Bangladesh

sadia15-1442@diu.edu.bd

Sadia Afrin Tisha 5

Dept. of Computer Science and Engineering

Daffodil International University

Dhaka, Bangladesh

sadia15-1478@diu.edu.bd

Shahed Hossain 6

Dept. of Computer Science and Engineering

Daffodil International University

Dhaka, Bangladesh

shahed15-2659@diu.edu.bd

Abstract— Chronic kidney disease (CKD) is a growing medical problem in which renal function progressively declines and the kidneys are eventually damaged. CKD is very common nowadays; cardiovascular disease and end-stage renal disease are two life-threatening conditions that can follow from it. Both are potentially preventable through early detection and treatment of people who are at risk. Predicting medical problems is a difficult task, and CKD is one of the most lethal diseases in clinical practice. To reduce risk before it is too late to detect CKD, predicting its risk factors at an early stage is a necessary step. In this paper, we apply six algorithms: Naïve Bayes, Random Forest, Simple Logistic Regression, Decision Stump, the Linear Regression model, and Simple Linear Regression are used to predict the risk factors of CKD. Based on a systematic implementation and analysis of these methods, the six algorithms give good and fast classification performance. The six algorithms are applied individually to the dataset, and the best results are obtained through classification-based prediction of risk factors.

Keywords—Risk Factor, Classification, Cardiovascular, Chronic Kidney Disease.

I. INTRODUCTION

Nowadays, the prevalence of chronic kidney disease is rising rapidly. CKD hampers people's day-to-day lives and can cause heart failure. Many people in Bangladesh face this problem. In most cases, people in rural areas are not aware of it, and this lack of awareness is one of the main reasons for CKD [9]. Technology is advancing rapidly, but people are still not alert to the disease, so they face a large risk to their kidneys [17]. When a kidney no longer functions properly, the patient needs a kidney transplant, which is often not a suitable option. Kidney diseases occur with various symptoms, and once the kidneys are damaged they cannot filter blood the way they should [12]. Sometimes the condition becomes incurable and chronic. Several of these symptoms can be used to predict the risk factors for kidney disease. In this paper, we propose to analyze the risk factors of CKD and warn patients so that they can stay healthy. Above all, this can help doctors identify the symptoms easily and take proper steps to reduce the risk at an early stage. For this prediction analysis, we use several algorithms, namely Naïve Bayes, Random Forest, Simple Logistic Regression, Decision Stump, the Linear Regression model, and Simple Linear Regression, to predict the risk factors.

II. RELATED WORK

Various kinds of work have been carried out to extract useful information from chronic kidney disease datasets using data mining methods [8]. This was done to reduce the time needed for analysis and, moreover, to increase prediction accuracy with the help of data mining classification techniques [1]. Data mining is also used for the diagnosis and prognosis of several diseases [2] [3] [4]. K. Eroğlu and T. Palabaş [5] proposed an approach that combined six classifiers (KNN, NB, SVM, decision tables, RF, and J48) with three ensemble methods. The authors of [6] experimented with chronic kidney disease using the k-means algorithm and Apriori. A study was presented to detect CKD using the SVM, DT, NB, and KNN algorithms [8]. Ani R et al. [7] modified different classification algorithms, such as DT, NB, the LDA classifier, Back Propagation Network (BPN), Random Subspace, and KNN. To help prevent deaths caused by CKD, DT and NB classification methods were applied to predict CKD [9]. The authors of [10] built a system that can predict chronic kidney disease at an early stage; they used several neural network algorithms. An experiment [11] conducted by M. S. Wibawa, I. M. D. Maysanjaya, and I. M. A. W. Putra tested a combination of KNN, CFS, and AdaBoost; its accuracy was 98.1%. M.P.N.M. Wickramasinghe et al. [12] present a study that takes data from patients' clinical records and then applies a classification algorithm to these records, which provides CKD patients with a suitable diet plan. Arora, M., and Sharma, E. A. [13] proposed a data mining technique for detecting CKD from medical datasets using the Weka tool. Ms. Astha Ameta et al. [14] reviewed data mining techniques and the ways in which they can predict chronic kidney disease. Data mining has thus clearly been a practical tool for predicting chronic kidney diseases [15]. Our proposed method examines the performance of the Naïve Bayes, Random Forest, and Simple Logistic Regression classifiers to find the better accuracy for CKD and to search for the best solution for detecting CKD. A. J. Aljaaf et al. and Deepika B et al. [18] [21] analyzed the early stage of CKD using machine learning algorithms and identified the most significant factors. Siddheswar Tekale et al. and S. Pitchumani Angayarkanni [16] [19] [20] predicted the early stage of CKD and sought better accuracy to help prevent it. Marwa Almasoud et al. [22] detected CKD using the least number of predictors with machine learning. From the experimental results of the Decision Stump, Linear Regression model, and Simple Linear Regression algorithms, we find a better factor ranking than with other algorithms.

III. PROPOSED SCHEME

Data mining for diagnosis has become an established trend in our technologically advanced world. The human body has many vital organs, and if they do not work properly a person's life is in danger [17]. The kidney is one of these major organs. It helps remove the waste products that circulate in the body, filtering not only excess fluid but also toxins from the blood. The kidneys also help regulate the body's red blood cells and blood pressure, and release erythropoietin and enzymes such as kallikreins [1]. Chronic kidney disease has become a worldwide health concern with rising prevalence. Chronic kidney disease, also called chronic kidney failure, describes the gradual loss of kidney function [9]. Steps must be taken to predict and control it, and data mining techniques can be used for this purpose. The proposed model predicts kidney disease using a large dataset to increase model accuracy and to find the significant and non-significant risk factors of CKD. The Naive Bayes, Simple Logistic Regression, and Random Forest algorithms are used to assess the prediction accuracy of the model, and the Linear Regression model, Decision Stump, and Simple Linear Regression model are used to find the most significant and non-significant risk factors of CKD.

We apply Naive Bayes classification and analyze which method is most effective. Naive Bayes classifiers are simple probabilistic classifiers based on Bayes' theorem. Random Forest performed better than Naive Bayes in the prediction of CKD in our analysis. The prediction accuracies are 93.9056% for Naive Bayes, 98.8858% for Random Forest, and 94.7679% for Simple Logistic. From this accuracy we can estimate how many patients have chronic kidney disease within a particular time interval. Random Forest performed better in terms of precision and F-measure on the dataset, while Naive Bayes also shows good accuracy. Consequently, it can be said that Random Forest performed better than Simple Logistic and Naive Bayes in the prediction of CKD in this analysis.

In the methodology, six algorithms are implemented to predict the risk factors of CKD. We combined the algorithms to build a highly effective method of predicting CKD risk factors with very few prediction errors [6].

In the first step, the data was prepared by pre-processing for the actual operation. First, some of the information was selected and split into small parts. Then the data was cleaned and separated. Finally, a synthetic dataset for CKD was obtained.

After obtaining the synthetic dataset, it goes through two separate actions: normalizing the data and formatting the data. After these two actions are completed, their results are combined and examined to find the "Z" score. We then check whether Z > -2. If not, the disease is not found and the process ends. If the condition is met, the disease is found.

Z-score normalization is a strategy for normalizing data that is less sensitive to outliers. The formula for Z-score normalization is:

$$z = \frac{\text{value} - \mu}{\sigma}$$

Here, μ is the mean of the feature and σ is the standard deviation of the feature. We use -2 as the threshold value of the Z-score; it is the minimum condition that must be satisfied in the standard normalization process. The Z-score is very helpful for understanding the distribution of the data and for normalizing it easily.
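The Z-score step can be illustrated with a minimal sketch, assuming the population standard deviation is used for σ and that the check is simply Z > -2 as stated in the text. The function names and the sample values are hypothetical and are not taken from the paper's dataset.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (value - mu) / sigma for every value in one feature column."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def passes_threshold(z, threshold=-2.0):
    """Condition from the text: the Z-score must be greater than -2."""
    return z > threshold

# Hypothetical feature column, for illustration only.
column = [2, 4, 4, 4, 5, 5, 7, 9]
scores = z_scores(column)
flags = [passes_threshold(z) for z in scores]
```

After this transformation the column has mean 0 and standard deviation 1, so the fixed threshold of -2 means "more than two standard deviations below the mean" regardless of the feature's original unit.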

Fig. 1. Flowchart of proposed model.

After detecting the disease, we split the data into two sections, test data and train data. Six algorithms were then applied to find the risk factors. Naïve Bayes, Random Forest, and Simple Logistic Regression are used to determine the prediction accuracy for CKD, and Decision Stump, the Linear Regression model, and Simple Linear Regression are used to calculate the risk factors of CKD. After that, the results were obtained, a model was produced, and the model was visualized [13]. The prediction was then analyzed. Finally, the processing is completed and the operation ends.

IV. DATASET DESCRIPTION

Based on previous studies, information was gathered from a survey and a questionnaire was created for data collection. Patient medical data was then collected from a reputed medical college in Bangladesh. The questionnaire included both case-type and control-type questions, covering age, blood pressure, hypertension, and so on. The data collected through this questionnaire was formatted as a CSV dataset; in total, we collected data for 1032 patients. We use 68% of the data for training and 32% for testing. The overall data preprocessing is easy to maintain and yields output that is valuable for analysis.
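As a rough illustration of the 68%/32% split on 1032 records, the following sketch shuffles the records and cuts them at the rounded 68% mark. The shuffling, the seed, and the function name are assumptions; the paper does not state how the split was drawn.

```python
import random

def train_test_split(rows, train_fraction=0.68, seed=0):
    """Shuffle a copy of the rows and cut it at the training fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in record IDs for the 1032 patients (hypothetical placeholders).
records = list(range(1032))
train, test = train_test_split(records)
print(len(train), len(test))  # 702 330
```

With 1032 records, 68% corresponds to about 702 training records, leaving 330 for testing.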

TABLE 1: Stages of CKD risk level

Attribute | Description
Blood Pressure | Given in mmHg
Specific Gravity | Ranges from 1005 to 1025 (the higher the risk)
Albumin | Range is 0 to 5 (the higher the better)
Sugar level | 5 levels indicating severity
Red Blood Cells | Is abnormal or normal
Blood urea | It is in mg/dL
Serum creatinine | High level is not good
Sodium | It is measured in mEq/L
Potassium | It is measured in mEq/L
Hemoglobin | Less than 15 is kidney failure
White Blood Cell Count | This is numerical cell count
Red Blood Cell Count | Should not be higher or less than normal
Hypertension | It is categorical (yes or no)
Class | Given as CKD or not CKD

A. Dataset Preprocessing

After collecting the dataset, we preprocessed it, because real-world datasets often have missing or garbage values [10]. We filled each missing value with the mean of its attribute column and smoothed the noisy values. Data encoding was also applied to convert string data to numeric form. Some attributes, such as age, specific gravity, blood glucose regulator, hemoglobin, etc., were organized in a continuous format.
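The two preprocessing steps described above, mean imputation and string-to-numeric encoding, can be sketched as follows. This is a minimal illustration, assuming missing values are represented as `None` and that categorical answers are mapped through an explicit dictionary; the helper names and sample values are hypothetical.

```python
def impute_with_mean(column):
    """Replace None entries with the mean of the known values in the column."""
    known = [v for v in column if v is not None]
    if not known:
        return list(column)  # nothing to estimate the mean from
    fill = sum(known) / len(known)
    return [fill if v is None else v for v in column]

def encode_labels(column, mapping):
    """Convert string categories to numbers, ignoring case and stray spaces."""
    return [mapping[v.strip().lower()] for v in column]

# Hypothetical columns, for illustration only.
hemoglobin = [1.0, None, 3.0]
hypertension = ["yes", "no", "No ", "YES"]

hemoglobin_filled = impute_with_mean(hemoglobin)
hypertension_encoded = encode_labels(hypertension, {"yes": 1, "no": 0})
```

Normalizing case and whitespace before the lookup is one simple way to absorb the kind of noisy entries that questionnaire data tends to contain.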

B. Algorithms

Decision stump: A decision stump is a decision tree that uses only a single attribute for splitting. For discrete attributes, this typically means that the tree consists of a single interior node; that is, the root has only leaves as successor nodes. If the attribute is numerical, the tree may be more complex. A decision stump is a machine learning model consisting of a one-level decision tree: a decision tree with one internal node that is immediately connected to the terminal nodes. Decision stumps perform surprisingly well on some commonly used benchmark datasets from the UCI repository, which illustrates that learners with high bias and low variance may perform well because they are less prone to overfitting.
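A decision stump on one numeric attribute can be sketched as a search over candidate thresholds. This is a simplified illustration that minimizes training errors for a rule of the form "x <= t"; it is not the Weka implementation, and the function names and sample values are hypothetical.

```python
def fit_stump(xs, ys):
    """Fit a one-level tree on one numeric attribute with 0/1 labels.

    Returns (threshold, label_if_x_le_threshold, label_if_x_gt_threshold).
    """
    values = sorted(set(xs))
    if len(values) < 2:
        raise ValueError("need at least two distinct attribute values")
    best = None
    # Candidate thresholds are midpoints between neighbouring values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Each leaf predicts its majority class.
        left_label = int(sum(left) * 2 >= len(left))
        right_label = int(sum(right) * 2 >= len(right))
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label

# Hypothetical hemoglobin readings and labels (1 = CKD, 0 = not CKD).
hemo = [9.0, 10.0, 11.0, 12.0, 14.0, 15.0, 16.0, 17.0]
ckd = [1, 1, 1, 1, 0, 0, 0, 0]
stump = fit_stump(hemo, ckd)
```

The whole model is a single threshold and two leaf labels, which is why a stump has high bias and low variance.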

Simple linear regression: Simple linear regression is a statistical technique. It allows us to summarize and study relationships between two continuous (quantitative) variables. One variable, denoted x, is regarded as the predictor, explanatory, or independent variable. Regression allows us to estimate how a dependent variable changes as the independent variable(s) change. Simple linear regression is used to estimate the relationship between two quantitative variables. Because of its specific nature, this method is one of the fastest forms of regression. Apart from the fitted coefficient and intercept term, it also returns basic statistics such as the R² coefficient and the standard error.

Equation:

$$Y = a + bX + e \qquad (1)$$

Here, Y is the dependent variable, a is a constant (the intercept), X is the independent variable, b is the coefficient of X, and e is the error term.
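For Equation (1), the intercept a and coefficient b can be estimated by ordinary least squares. The sketch below is a minimal illustration that also returns R²; the function name and the sample data are hypothetical.

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of Y = a + bX; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope: coefficient of X
    a = mean_y - b * mean_x    # intercept: the constant term
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return a, b, r_squared

# Hypothetical points lying exactly on Y = 1 + 2X.
a, b, r2 = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
```

The error term e in Equation (1) corresponds to the residuals y - (a + bx), whose squared sum is what the fit minimizes.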

Naive Bayes: Naive Bayes is a probabilistic machine learning algorithm that can be used for a variety of classification tasks. Typical applications include document classification, sentiment prediction, etc. Because it is a probabilistic model, the algorithm can be coded up easily and its predictions are made very quickly. It has been used successfully for many purposes. Naive Bayes is based on Bayes' theorem, which is a law of conditional probability. It needs only a small amount of training data to estimate the parameters required for classification, and it performs well in multi-class prediction.
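To make the use of Bayes' theorem concrete, here is a minimal sketch of a categorical Naive Bayes classifier with Laplace (add-one) smoothing. The smoothing choice, the function names, and the toy records are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> counts
    seen_values = defaultdict(set)         # feature index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
            seen_values[i].add(value)
    return class_counts, feature_counts, seen_values

def predict_nb(model, row):
    """Pick the class maximizing log P(class) + sum of log P(value | class)."""
    class_counts, feature_counts, seen_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Laplace smoothing keeps unseen values from zeroing the product.
            numerator = feature_counts[(label, i)][value] + 1
            denominator = count + len(seen_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical records: (red blood cells, hypertension) -> class.
rows = [("abnormal", "yes"), ("abnormal", "yes"), ("abnormal", "no"),
        ("normal", "no"), ("normal", "no"), ("normal", "yes")]
labels = ["ckd", "ckd", "ckd", "notckd", "notckd", "notckd"]
model = train_nb(rows, labels)
```

The "naive" part is the assumption that features are independent given the class, which is what lets the per-feature log-probabilities simply be added.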

Simple logistic regression: It is a simple algorithm that can be used as a performance baseline; it is easy to implement and performs well enough in many tasks. Therefore, every machine learning engineer should be familiar with its concepts. Like many other machine learning techniques, it is borrowed from the field of statistics, and despite its name it is not an algorithm for regression problems, where the goal is to predict a continuous outcome. It gives a discrete binary outcome, either 0 or 1. Put simply, the result is either one thing or another.
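The binary outcome comes from passing a linear score through the logistic (sigmoid) function and thresholding the resulting probability. The following is a minimal sketch with one feature, trained by plain gradient descent; the learning rate, number of epochs, 0.5 cut-off, and sample data are assumptions for illustration and do not reproduce Weka's SimpleLogistic.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit weight w and bias b by gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def predict(w, b, x, threshold=0.5):
    """Turn the predicted probability into a 0/1 decision."""
    return int(sigmoid(w * x + b) >= threshold)

# Hypothetical centred hemoglobin values (reading minus 13) and labels.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(xs, ys)
```

In this toy data lower values go with the positive class, so the fitted weight is negative and the predicted probability falls as the feature increases.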

Random Forest: A random forest is a classification algorithm consisting of many decision trees. It uses bagging and feature randomness when building each individual tree to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree. Depending on the size of the training set, a few hundred to several thousand trees are used; the number of trees is denoted B. The prediction error for each training sample Xi can be estimated using only the trees that did not have Xi in their bootstrap sample, and the training and test errors tend to level off after some number of trees have been fit.

……(2)
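Two ingredients of the method described above, bootstrap sampling with its out-of-bag samples and the committee vote, can be sketched on their own. This is a partial illustration only: fitting the individual trees and the per-split feature randomness are omitted, and the function names, seed, and sizes are hypothetical.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Draw n indices with replacement; also return the out-of-bag indices."""
    drawn = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(drawn))
    return drawn, out_of_bag

def majority_vote(predictions):
    """Committee decision: the class predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(42)
B = 5          # number of trees (kept tiny for the example)
n_samples = 10
samples = [bootstrap_indices(n_samples, rng) for _ in range(B)]

# Votes that B hypothetical trees might cast for one patient.
decision = majority_vote(["ckd", "notckd", "ckd", "ckd", "notckd"])
```

Each tree would be trained on its `drawn` indices, and the samples listed in `out_of_bag` are the ones that tree never saw, which is what makes them usable for an error estimate.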

Linear Regression: Linear regression is a basic and commonly used type of predictive analysis. The overall idea of regression is to examine two things: (1) Does a set of predictor variables do a good job of predicting an outcome (dependent) variable? (2) Which variables in particular are significant predictors of the outcome variable, and in what way do they, as indicated by the magnitude and sign of the beta estimates, affect the outcome variable? The simplest form of the regression equation is defined by the formula y = c + b*x, where y = estimated dependent variable score, c = constant, b = regression coefficient, and x = score on the independent variable.

V. RESULT AND ANALYSIS

In this analysis, we obtained results from three distinct supervised algorithms: Naïve Bayes, Random Forest, and Simple Logistic. We examined the results and obtained accuracies above 90%, so we can say that these models are highly effective for this dataset. Our training set was 63% of the data; once training was complete, we obtained the results by testing on the remaining 37%. Furthermore, we identified the causes of kidney disease using the Decision Stump, Linear Regression model, and Simple Linear Regression algorithms. The main driver is hemoglobin, which is the principal factor in kidney disease.

TABLE 2: Accuracy Table of CKD

Algorithm | Accuracy
Naïve Bayes | 93.9056 %
Random Forest | 98.8858 %
Simple Logistic | 94.7679 %

From the result analysis, we can see that the Random Forest algorithm achieves the highest accuracy, 98.8858%, so it performs better than the other models we evaluated. Using this model, we can easily predict chronic kidney disease. It is a development that can help the medical community advance biomedical science.

[Chart: "Model accuracy", axis from 90% to 100%; legend: Loss, Validation]

Fig. 2. Accuracy chart diagram of CKD.

TABLE 3: Most significant risk factor prediction table

Algorithm | Factor
Decision Stump | Hemo <= 13.05 : 0.9578606158833063; Hemo > 13.05 : 0.12310286677908938
Linear Regression Model | -0.0002 * Bu + -0.091 * Hemo + 0 * Wbcc + -0.0076 * Rbcc + 0.288 * Htn + 1.5892
Simple Linear Regression | -0.12 * Hemo + 2.13

From the overview of the results, we can easily identify the most significant and non-significant risk factors. The decision stump and simple linear regression models both indicate that hemoglobin is the most significant risk factor.
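The three fitted models reported in Table 3 can be written out as functions to see how each responds to hemoglobin. The coefficients below are copied from Table 3; the reading of the output as a score that is higher for CKD, the coding of Htn as 1 for yes and 0 for no, and the example patient are assumptions made for illustration.

```python
def decision_stump(hemo):
    """Table 3 stump: one split on hemoglobin at 13.05."""
    return 0.9578606158833063 if hemo <= 13.05 else 0.12310286677908938

def linear_regression_model(bu, hemo, wbcc, rbcc, htn):
    """Table 3 multi-variable linear model (htn assumed coded 1/0)."""
    return (-0.0002 * bu
            + -0.091 * hemo
            + 0 * wbcc
            + -0.0076 * rbcc
            + 0.288 * htn
            + 1.5892)

def simple_linear_regression(hemo):
    """Table 3 single-variable model on hemoglobin."""
    return -0.12 * hemo + 2.13

# Hypothetical patient, not taken from the paper's dataset.
patient = {"bu": 50, "hemo": 10.0, "wbcc": 8000, "rbcc": 4.0, "htn": 1}

stump_score = decision_stump(patient["hemo"])
linear_score = linear_regression_model(**patient)
simple_score = simple_linear_regression(patient["hemo"])
```

In all three models the score drops as hemoglobin rises, which is consistent with the text's reading of hemoglobin as the dominant factor; the Wbcc coefficient of 0 means that attribute has no effect on the linear model's output.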

[Pie chart: "Factor Analysis"; segments labelled Hemo, Htn, Rbcc; values shown: 71%, 23%, 6%, 0%]

Fig. 3. Pie chart of risk factor prediction.

From the factor analysis, we can find the most significant and non-significant risk factors of CKD. The analysis shows that hemoglobin is the most significant risk factor for CKD and hypertension is a less significant risk factor.

A. Comparison Table

TABLE 4: Comparative analysis of approach model

Algorithms

Precision

Recall

F-Measure

ROC Area

Naïve Bayes

0.940

0.939

0.972

Random Forest

0.989

0.999

Simple Logistic

0.948

0.976

Through the comparative analysis of the three algorithms, we obtained the best accuracy from the Random Forest algorithm, and it is the best fit for our dataset. Our approach is better than other models at finding the most significant and non-significant risk factors of CKD.
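For reference, the precision, recall, and F-measure columns used in the comparison follow from the counts of a confusion matrix. The sketch below computes them for a binary problem; the function name and the toy label vectors are hypothetical, and it does not reproduce the class-weighted averaging a tool such as Weka may report.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F-measure for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Hypothetical true and predicted labels (1 = CKD, 0 = not CKD).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Precision penalizes false alarms and recall penalizes missed cases, so for a screening task such as CKD detection the two are worth reading together rather than relying on accuracy alone.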

VI. FUTURE WORK

In this paper, we completed our work with a large dataset. In the future, CKD risk prediction will reveal new aspects of CKD prevention and control and will play a vital role in diagnosing CKD [13]. Doctors will be able to predict CKD by observing the results. Medical scientists will be able to use this dataset and its results in controlling and preventing CKD. Through this study, a new horizon in CKD control will be opened in the future.

VII. CONCLUSION

In this paper, we predicted chronic kidney disease (CKD) risk factors and the progression of CKD. Risk factor prediction plays an essential role in recognizing and reducing the risk of chronic kidney disease [9]. The best results are achieved by applying each algorithm individually to predict the risk factors. We obtain high accuracy with the Random Forest algorithm (98.8858%). In this context, predicting CKD outcomes is particularly effective when people at risk are identified or listed, and treatment is especially effective when people at risk are listed with scores that predict outcomes. Moreover, a significant portion of the population at high risk of CKD can be identified within the community using CKD risk factor prediction, without admission to hospital for care.

REFERENCES

[1] Ramya, S., & Radha, N. (2016). Diagnosis of chronic kidney disease using machine learning algorithms. International Journal of Innovative Research in Computer and Communication Engineering, 4(1), 812-820.

[2] Purushottam Sharma, Kanak Saxena and Richa Sharma, "Heart Disease

Prediction System Evaluation Using C4. 5 Rules and Partial Tree," in

Computational Intelligence in Data Mining—Volume 2, ed: Springer, pp. 285-

294, 2016.

[3] Ritika Chadha, Shubhankar Mayank, Anurag Vardhan and Tribikram Pradhan, "Application of Data Mining Techniques on Heart Disease Prediction: A Survey," in Emerging Research in Computing, Information, Communication and Applications, ed: Springer, pp. 413-426, 2016.

[4] Moloud Abdar, Mariam Zomorodi-Moghadam, Resul Das and I-Hsien Ting, "Performance analysis of classification algorithms on early detection of liver disease," Expert Systems with Applications, vol. 67, pp. 239-251, 2017.

[5] K. Eroğlu and T. Palabaş, "The impact on the classification performance of the combined use of different classification methods and different ensemble algorithms in chronic kidney disease detection," 2016 National Conference on Electrical, Electronics and Biomedical Engineering (ELECO), Bursa, 2016, pp. 512-516.

[6] N. Tazin, S. A. Sabab and M. T. Chowdhury, "Diagnosis of Chronic Kidney Disease using effective classification and feature selection technique," 2016 International Conference on Medical Engineering, Health Informatics and Technology (MediTec), Dhaka, 2016, pp. 1-6, doi: 10.1109/MEDITEC.2016.7835365.

[7] R. Ani, G. Sasi, U. R. Sankar and O. S. Deepa, "Decision support system for diagnosis and prediction of chronic renal failure using random subspace classification," 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Jaipur, 2016, pp. 1287-1292, doi: 10.1109/ICACCI.2016.7732224.

[8] Arif-Ul-Islam and S. H. Ripon, "Rule Induction and Prediction of Chronic Kidney Disease Using Boosting Classifiers, Ant-Miner and J48 Decision Tree," 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), Cox's Bazar, Bangladesh, 2019, pp. 1-6, doi: 10.1109/ECACE.2019.8679388.

[9] K. Anantha Padmanaban and G. Parthiban, "Applying Machine Learning Techniques for Predicting the Risk of Chronic Kidney Disease," Indian Journal of Science and Technology, vol. 9, no. 29, 2016.

[10] R. K. Chiu, R. Y. Chen, Shin-An Wang and Sheng-Jen Jian, "Intelligent systems on the cloud for the early detection of chronic kidney disease," 2012 International Conference on Machine Learning and Cybernetics, Xian, 2012, pp. 1737-1742, doi: 10.1109/ICMLC.2012.6359637.

[11] M. S. Wibawa, I. M. D. Maysanjaya and I. M. A. W. Putra, "Boosted classifier and features selection for enhancing chronic kidney disease diagnose," 2017 5th International Conference on Cyber and IT Service Management (CITSM), Denpasar, 2017, pp. 1-6.

[12] D. P. a. K. K. M.P.N.M. Wickramasinghe, "Dietary prediction for patients with Chronic Kidney Disease (CKD) by considering blood potassium level using machine learning algorithms," in 2017 IEEE Life Sciences Conference (LSC), Sydney, NSW, Australia, 2017.

[13] Arora, M., & Sharma, E. A. (2016). Chronic Kidney Disease Detection by Analyzing Medical Datasets in Weka. International Journal of Computer Application, 6(4), 20-26.

[14] M. K. J. Ms. Astha Ameta, "Data Mining Techniques for the Prediction of Kidney Diseases and Treatment: A Review," International Journal Of Engineering.

[15] A. Maurya, R. Wable, R. Shinde, S. John, R. Jadhav and R. Dakshayani, "Chronic Kidney Disease Prediction and Recommendation of Suitable Diet Plan by using Machine Learning," 2019 International Conference on Nascent Technologies in Engineering (ICNTE), Navi Mumbai, India, 2019, pp. 1-4, doi: 10.1109/ICNTE44896.2019.8946029.

[16] J. Snegha, V. Tharani, S. D. Preetha, R. Charanya and S. Bhavani, "Chronic Kidney Disease Prediction Using Data Mining," 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, 2020, pp. 1-5, doi: 10.1109/ic-ETITE47903.2020.482.

[17] Satukumati S.B., Satla S., Kogila R. (2019). Feature extraction techniques for chronic kidney disease identification, Ingenierie des Systemes d'Information, Vol. 24, No. 1, pp. 95-99. https://doi.org/10.18280/isi.240114

[18] A. J. Aljaaf et al., "Early Prediction of Chronic Kidney Disease Using Machine Learning Supported by Predictive Analytics," 2018 IEEE Congress on Evolutionary Computation (CEC), Rio de Janeiro, 2018, pp. 1-9, doi: 10.1109/CEC.2018.8477876.

[19] Siddheswar Tekale, Pranjal Shingavi, Sukanya Wandhekar, Ankit Chatorikar, "Prediction of Chronic Kidney Disease Using Machine Learning Algorithm," International Journal of Advanced Research in Computer and Communication Engineering, vol. 7, issue 10, 2018.

[20] S. Pitchumani Angayarkanni, "Predictive Analysis of Chronic Kidney Disease Using Machine Learning Algorithm," International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, vol. 8, issue 2, 2019.

[21] Deepika B, Rao VKR, Rampure DN, Prajwal P, Gowda DG, et al (2020) Early Prediction of Chronic Kidney Disease by using Machine Learning Techniques. Am J Comput Sci Eng Surv Vol. 8 No. 2:7.

[22] Marwa Almasoud, Tomas E Ward, "Detection of Chronic Kidney Disease using Machine Learning Algorithms with Least Number of Predictors," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 10, no. 8, 2019.

On the diagnosis of chronic kidney disease using a machine learning-based interface with explainable artificial intelligence

Article

Full-text available

Jun 2024

Chronic Kidney Disease (CKD) is increasingly recognised as a major health concern due to its rising prevalence. The average survival period without functioning kidneys is typically limited to approximately 18 days, creating a significant need for kidney transplants and dialysis. Early detection of CKD is crucial, and machine learning methods have proven effective in diagnosing the condition, despite their often opaque decision-making processes. This study utilised explainable machine learning to predict CKD, thereby overcoming the 'black box' nature of traditional machine learning predictions. Of the six machine learning algorithms evaluated, the extreme gradient boost (XGB) demonstrated the highest accuracy. For interpretability, the study employed Shapley Additive Explanations (SHAP) and Partial Dependency Plots (PDP), which elucidate the rationale behind the predictions and support the decision-making process. Moreover, for the first time, a graphical user interface with explanations was developed to diagnose the likelihood of CKD. Given the critical nature and high stakes of CKD, the use of explainable machine learning can aid healthcare professionals in making accurate diagnoses and identifying root causes.

An effective role-oriented binary Walrus Grey Wolf approach for feature selection in early-stage chronic kidney disease detection

Article

Full-text available

May 2024
INT UROL NEPHROL

In clinical decision-making for chronic disorders like chronic kidney disease, high variability often leads to uncertainty and negative outcomes. Deep learning techniques have been developed as useful tools for minimizing the chance and improving clinical decision-making. Moreover, traditional techniques for chronic kidney disease recognition frequently the accuracy is compromised as it relies on limited sets of biological attributes. Therefore, in the proposed work, a combination of deep radial bias network and the puma optimization algorithm is suggested for precised chronic kidney disease classification. Initially, the accessed data undergo preprocessing using Spectral Z score Bag Boost K-Means SMOTE transformation, which includes robust scaling, data cleaning, balancing, encoding, handling missing values, min–max scaling, and z-standardization. Feature selection is then conducted using the hybrid methodology of Role-oriented Binary Walrus Grey Wolf Algorithm to choose discriminative features for improving classification accuracy. Then, Auto Encoder with Patch-Based Principal Component Analysis is employed for dimensionality reduction to minimize the processing time. Finally, the proposed classification method utilizes deep radial bias and the puma optimization search algorithm for effective chronic kidney disease classification. The introduced scheme is tested on two datasets: the risk factor prediction of chronic kidney disease dataset and chronic kidney disease dataset, which provides accuracies of 99.02%, and 99.15%, respectively. Experiments demonstrate that the proposed model identifies chronic kidney disease more accurately than the existing approaches.

DenseHillNet: a lightweight CNN for accurate classification of natural images

Article

Apr 2024

The detection of natural images, such as glaciers and mountains, holds practical applications in transportation automation and outdoor activities. Convolutional neural networks (CNNs) have been widely employed for image recognition and classification tasks. While previous studies have focused on fruits, landslides, and medical images, there is a need for further research on the detection of natural images, particularly glaciers and mountains. To address the limitations of traditional CNNs, such as vanishing gradients and the need for many layers, the proposed work introduces a novel model called DenseHillNet. The model is a type of CNN with densely connected layers that accurately classifies images as glaciers or mountains. The model contributes to the development of automation technologies in transportation and outdoor activities. The dataset used in this study comprises 3,096 images of each of the “glacier” and “mountain” categories. Rigorous methodology was employed for dataset preparation and model training, ensuring the validity of the results. A comparison with a previous work revealed that the proposed DenseHillNet model, trained on both glacier and mountain images, achieved higher accuracy (86%) than a CNN model that only utilized glacier images (72%). This article is intended for researchers and graduate students.

Comparative Analysis of Various Algorithms on Multi-Diseases Using Machine Learning

Article

Mar 2024

There are numerous machine learning approaches that can perform predictive analytics on vast volumes of data in a range of businesses. Although using predictive analytics in healthcare is challenging, it will eventually help practitioners make quick choices about the health and treatment of patients based on vast amounts of data. Globally, diseases including liver disease, diabetes, kidney diseases, cancer and heart-related diseases are responsible for a large number of fatalities; however, the majority of these deaths are the result of improperly timed disease check-ups. This issue exists because of a lack of medical infrastructure and a low doctor-to-population ratio. According to data, India has a doctor-to-population ratio of 1:1456 compared to the WHO's suggested ratio of 1 doctor to 1000 patients, demonstrating a physician shortage. If not identified early, diseases including diabetes, liver, kidney, cancer and heart disease pose a risk to humanity. As a result, many lives can be saved by early detection and diagnosis of these disorders. The main goal of this research is to use machine learning classification algorithms to anticipate dangerous diseases. Diabetes, liver disease, cancer and heart disease are all covered in this study. To make this accessible to the general public, our team developed an online medical test application that uses machine learning to make predictions about various diseases. Our goal in this effort is to create a web application that uses machine learning to forecast numerous ailments, such as liver, diabetes, kidney, cancer and heart disorders.

Examination of Unremitting Kidney Illness by Utilizing Machine Learning Classifiers

Conference Paper

Jul 2023

Chronic kidney disease is a rising health issue that affects millions of people worldwide. Early detection and characterization of this disease are essential for effective management and control. This disease is associated with several serious health risks, such as cardiovascular disease, increased risk of stroke, and end-stage renal disease, which can be effectively prevented by early detection and treatment. Medical scientists rely on machine learning algorithms to diagnose the disease accurately at its outset. Recently, value has been added to healthcare through the integration of machine learning algorithms into mobile health solutions. Considering this, this paper proposes a predictive model of three machine learning classifiers, including Support Vector Machine, Decision Tree, and Multilayer Perceptron, for chronic kidney disease prediction. The performance of the model was assessed using a confusion matrix, and the model was executed in popular machine learning software tools such as WEKA and RapidMiner. The study found that the support vector machine yielded the highest accuracy rate of 98% in predicting chronic kidney disease in WEKA among other standard classifiers by using 10-fold cross validation. In addition, the proposed prediction model has been compared with existing models in terms of accuracy, sensitivity, and specificity. The experimental results indicate that the proposed predictive model shows promising results. These findings could be integrated into the development of mobile health solutions and other innovative approaches to prevent and treat this debilitating condition.
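
The abstract above evaluates classifiers with a confusion matrix and compares them by accuracy, sensitivity, and specificity. As a minimal sketch of how those three figures follow from the four confusion-matrix counts, the snippet below uses plain Python. The function names and the toy label vectors are hypothetical illustrations, not data or code from the cited paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Derive accuracy, sensitivity (recall on positives) and specificity."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical toy labels: 1 = CKD, 0 = not CKD (made up for illustration).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = summarize(*counts)
print(counts, metrics)
```

In a k-fold setting such as the 10-fold cross validation mentioned above, the same counts would typically be accumulated over the held-out folds before the ratios are taken.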

Coupling of Rough Set Theory and Predictive Power of SVM Towards Mining of Missing Data

Article

Jan 2024

Rough set theory offers a novel approach to identifying structural correlations amidst imprecise or noisy data, particularly applicable to variables with diverse values. It presents a promising avenue for handling fuzzy, conflicting, and uncertain data, with recent models incorporating various fuzzy generalizations. This technique stands out as a popular solution within artificial intelligence, particularly in data analysis and processing tasks. In the medical domain, where missing data poses a significant challenge, leveraging rough set theory alongside machine learning algorithms for disease prediction is common. This paper proposes a model that effectively predicts missing values using rough set theory, addressing the prevalent issue of incomplete data. By providing a systematic approach and robust algorithm, the model demonstrates the adaptability and potential of rough set theory in contemporary data analysis scenarios. Classification of the predicted data set using a supervised learning model (SVM) yields an accuracy of 82.1% and an F1 score of 82.6%. Through validation with real-life medical datasets using supervised classification techniques, the paper underscores the accuracy and applicability of the proposed algorithm, offering a valuable tool for researchers and practitioners grappling with the complexities of modern data analysis.

Predicting Factors Affecting Kidney Functions using Machine Learning

Conference Paper

Nov 2023

Forecasting of Long term Renal disease using Machine learning

Conference Paper

Jan 2024

CRDP: Chronic Renal Disease Prediction and Evaluation with Reduced Prominent Features

Chapter

Mar 2024

The kidneys are the prominent organs which help in the removal of waste and toxic material from the body. Kidney malfunctioning occurs due to various reasons, but if certain symptoms are ignored and not treated on time, then it may lead to persistent malfunctioning leading to Chronic Renal Disease (CRD). This condition expedites kidney failure and, in turn, death if not attended to appropriately. This work identifies the appropriate, relevant, and correlated attributes among all the attributes and reduces the features in the dataset using a chi-squared test on the patients’ dataset for better detection and prediction of CRD. The CRDP algorithm is implemented, and its results are used in logistic regression and K-nearest neighbor classification techniques to improve their prediction accuracy on CRD.
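
The abstract above reduces the feature set with a chi-squared test before classification. The following is a minimal sketch of that general idea only, not the CRDP algorithm itself, which the abstract does not specify: it scores each categorical feature by the chi-squared statistic of its contingency table against the class label and keeps the highest-scoring ones. The feature names, toy values, and the choice of keeping two features are all hypothetical.

```python
def chi_squared(feature, labels):
    """Chi-squared statistic of the feature-vs-label contingency table."""
    f_vals = sorted(set(feature))
    l_vals = sorted(set(labels))
    n = len(feature)
    observed = {(f, l): 0 for f in f_vals for l in l_vals}
    for f, l in zip(feature, labels):
        observed[(f, l)] += 1
    row = {f: sum(observed[(f, l)] for l in l_vals) for f in f_vals}
    col = {l: sum(observed[(f, l)] for f in f_vals) for l in l_vals}
    stat = 0.0
    for f in f_vals:
        for l in l_vals:
            # Expected count under independence of feature and label.
            expected = row[f] * col[l] / n
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: 1 = disease present, 0 = absent.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "hypertension": [1, 1, 1, 1, 0, 0, 0, 0],   # tracks the label exactly
    "anemia": [1, 1, 1, 0, 0, 0, 0, 1],         # partly associated
    "poor_appetite": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
}

scores = {name: chi_squared(vals, labels) for name, vals in features.items()}
# Keep the two highest-scoring features (the cut-off is an arbitrary choice).
selected = sorted(scores, key=scores.get, reverse=True)[:2]
print(scores, selected)
```

A higher statistic indicates a stronger association with the class label. In practice, the retained columns would then be passed to a classifier such as logistic regression or K-nearest neighbor, as the abstract describes.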

A Study on Machine Learning and Deep Learning Techniques Applied in Predicting Chronic Kidney Diseases

Conference Paper

Feb 2024

Chronic kidney disease (CKD) is one of the heterogeneous disorders in which the kidneys’ functionality degenerates over time. Although there is a range of abnormalities in kidney function, malfunction beyond a threshold leads, if untreated, to kidney failure, also referred to as end-stage renal disorder. However, at times, high-end complex treatments such as kidney transplantation or dialysis may also be life-threatening in CKD patients. The situation often leads to irreversible changes in kidney structure and function, which may also entail cardiac, endocrine, and xenobiotic toxic complications. CKD is identified as a decrease in GFR and/or a rise in albuminuria. As this health disorder becomes more prevalent, the quality of life index deteriorates. Moreover, the consequences impact the nation’s economy directly or indirectly. At this juncture, suitable preventive measures and strategic planning are imperative. On the other hand, the world is advancing with modern innovations. Artificial Intelligence, Machine Learning, and Deep Learning are technologies employed extensively in every sector. These disruptive technologies have not bypassed the health segment and have even proved their supremacy in several contexts. Accurate disease prediction and early detection are among the outcomes that could be expected from these technologies, so preventive measures could be suggested beforehand. In this article, a comprehensive survey of investigations by distinguished researchers is presented. Around 100 articles published during the past decade are part of our study; they are examined in depth, and the respective contributions are cited.

Predictive Analytics of Chronic Kidney Disease using Machine Learning Algorithm

Article

Jul 2019

Dr. S. Pitchumani Angayarkanni

According to the health statistics of India on Chronic Kidney Disease (CKD), a total of 63538 cases have been registered. The average age of men and women prone to kidney disease lies in the range of 48 to 70 years. CKD is more prevalent among males than among females. India ranked 17th in CKD during 2015[1]. This paper focuses on a predictive analytics architecture to analyse a CKD dataset using feature engineering and classification algorithms. The proposed model incorporates techniques to validate the feasibility of the data points used for analysis. The main focus of this research work is to analyze the dataset of chronic kidney failure and perform the classification of CKD and non-CKD cases. The feasibility of the proposed dataset is determined through learning curve performance. The features which play a vital role in classification are determined using the sequential forward selection algorithm. The training dataset with the selected features is fed into various classifiers to determine which classifier detects CKD most accurately. The proposed dataset is classified using various classification algorithms: Linear Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Tree (CART), Naive Bayes (NB), Support Vector Machine (SVM), Random Forest (RF), eXtreme Gradient Boosting (XGBoost) and Ada Boost Regressor (ABR). It was found that, for the given CKD dataset with 25 attributes (11 numeric and 14 nominal), the classifiers LR, LDA, CART, NB, RF, XGB and ABR provide accuracies ranging from 98% to 100%. The proposed architecture validates the dataset against the rule of thumb for working with a small number of data points, and the classifier is validated against underfitting and overfitting. The performance of the classifier is evaluated using accuracy and F-score. The proposed architecture indicates that LR, RF and ABR provide very high accuracy and F-score.

Chronic Kidney Disease Prediction and Recommendation of Suitable Diet Plan by using Machine Learning

Conference Paper

Jan 2019

Detection of Chronic Kidney Disease using Machine Learning Algorithms with Least Number of Predictors

Article

Jan 2019

Rule Induction and Prediction of Chronic Kidney Disease Using Boosting Classifiers, Ant-Miner and J48 Decision Tree

Conference Paper

Feb 2019

Chronic Kidney Disease (CKD) is one of the deadliest diseases and slowly damages the human kidney. The disease remains undetected in its early stage, and patients only realize its severity once it has advanced. Hence, detecting the disease at an earlier stage is a key challenge now. Data mining is a branch of Artificial Intelligence that is widely used to derive interesting patterns from a large volume of medical data. While various data mining techniques are used by experts, boosting and rule extraction techniques have rarely been applied in analyzing kidney diseases. Boosting is an ensemble technique that enhances the prediction power of a data mining model. AdaBoost and LogitBoost are used here for comparing the performance of classification. Ant-Miner is also a data mining algorithm that applies the Ant Colony Optimization technique. Ant-Miner along with a Decision tree has been used in the paper to derive rules. The aim of this paper is two-fold: analyzing the performance of boosting algorithms for detecting CKD and deriving rules illustrating relationships among the attributes of CKD. The best information retrieved by both the classification and rule generation techniques is promising and can be adopted by medical scientists for their research purposes.

Prediction of Chronic Kidney Disease Using Machine Learning Algorithm

Article

Oct 2018

A Predictive Analysis for Heart Disease Using Machine Learning

Chapter

Sep 2020

Predictive analysis plays a major role in the healthcare industry, where forecasting disease reduces the risk to patients. Statistics show that cardiovascular diseases have increased the mortality rate in India. Machine learning, which is used in developing predictive models for various domains, is nowadays applied in the field of medical diagnostics. Machine learning is playing an integral role in predicting the presence or absence of heart diseases. Such predictions, if done well in advance, can help doctors carry out treatment for patients and mitigate their health risk. Biological samples such as blood or tissues are collected from the human body to predict cardiovascular diseases. The proposed work is focused on developing various machine learning predictive models using support vector machine, decision tree, neural network and K-nearest neighbour for prediction of heart disease. For this work, the Cleveland heart disease dataset is used, which consists of 14 attributes and 294 records. A comparative analysis of the prediction models was carried out. From the results, it was found that support vector machine, decision tree and KNN (k = 15) classifiers yield better accuracy in predicting heart disease than the other models.

Chronic Kidney Disease Prediction Using Data Mining

Conference Paper

Feb 2020

Feature Extraction Techniques for Chronic Kidney Disease Identification

Article

Apr 2019

Early Prediction of Chronic Kidney Disease Using Machine Learning Supported by Predictive Analytics

Conference Paper

Jul 2018

Chronic Kidney Disease is a serious lifelong condition that is induced by either kidney pathology or reduced kidney function. Early prediction and proper treatment can possibly stop or slow the progression of this chronic disease to the end stage, where dialysis or kidney transplantation is the only way to save the patient’s life. In this study, we examine the ability of several machine-learning methods for early prediction of Chronic Kidney Disease. This matter has been studied widely; however, we support our methodology with predictive analytics, in which we examine the relationships among the data parameters as well as with the target class attribute. Predictive analytics enables us to identify the optimal subset of parameters to feed machine learning to build a set of predictive models. This study starts with 24 parameters in addition to the class attribute and ends up with 30% of them as the ideal subset for predicting Chronic Kidney Disease. A total of 4 machine-learning-based classifiers have been evaluated within a supervised learning setting, achieving highest performance outcomes of AUC 0.995, sensitivity 0.9897, and specificity 1. The experimental procedure concludes that advances in machine learning, with the assistance of predictive analytics, represent a promising setting by which to recognize intelligent solutions, which in turn prove the ability of prediction in the kidney disease domain and beyond.

Dietary prediction for patients with Chronic Kidney Disease (CKD) by considering blood potassium level using machine learning algorithms

Conference Paper

Dec 2017

Recommended publications

Prediction of Chronic Kidney Disease Using Machine Learning

An Effective Way to Identify Chronic Kidney Disease Using Machine Learning

Chronic Kidney Disease Detection Using Machine Learning Approach

Comparative analysis of machine learning techniques based on chronic kidney disease dataset