A Novel Approach to Predict Chronic Kidney Disease using Machine Learning Algorithms

Bhavya Gudeti
School of Computer Science and
Engineering
Vellore Institute of Technology,
Chennai, 600127, Tamilnadu, India.
bhavyagudeti@gmail.com
Terrance Frederick Fernandez
[0000-0002-7317-3362]*
Rajiv Gandhi College of Engineering
and Technology, Puducherry, India
frederick@pec.edu
Shashvi Mishra
School of Computer Science and
Engineering
Vellore Institute of Technology,
Chennai, 600127,
Tamilnadu, India.
shashvimishra@gmail.com
Amit Kumar Tyagi [0000-0003-2657-8700]*
School of Computer Science and
Engineering,
Vellore Institute of Technology,
Chennai, 600127, Tamilnadu, India.
amitkrtyagi025@gmail.com
Shaveta Malik
Terna College of Engineering,
University of Mumbai,
Maharashtra, India
shavetamalik687@gmail.com
Shabnam Kumari
Anumit Academy of Research and
Research Network, India
shabnam.kt25@gmail.com
Abstract: According to India's health statistics, a staggering
63,538 cases of Chronic Kidney Disease (CKD) have been
registered. Nephropathy typically sets in between 48 and 70
years of age, and CKD is more prevalent among males than
females. Regrettably, India has ranked among the top 17
countries for CKD since 2015. The disease is characterized by a
gradual loss of kidney function over time. Early detection of
the illness followed by treatment can keep this dreaded disease
at bay. Machine learning is finding practical applications in
areas such as analyzing medical outcomes and detecting fraud,
and various machine learning algorithms have been applied to
the prediction of chronic diseases.
Our main aim is to compare the performance of several machine
learning algorithms, primarily on the basis of their accuracy.
In this work we used R code to compare their performance. The
pivotal purpose of this study is to analyze the Chronic Kidney
Disease dataset and classify cases as CKD or non-CKD.
Keywords: Machine Learning, Chronic Kidney Disease,
Classification, Accuracy, Logistic Regression, Support Vector
Machine
I. INTRODUCTION
As far back as the 1950s, communication among humans was
predominantly oral. As technology has progressed since then,
mankind has become preoccupied with it. The million-dollar
question remains: "Why are humans so obsessed with
technology?" The answer is straightforward. Rising demand in
manufacturing generates surplus data on trade growth, products,
business perspectives and sales. Today, industries such as
automation, aerospace and health care operate through
machine-to-machine communication or interconnected Internet of
Things (IoT) devices. These interconnected IoT devices produce
large volumes of data that must be analyzed efficiently with
modern tools and approaches. Traditional tools are no longer
sufficient to analyze such huge volumes of data.
Clustering can be thought of as grouping objects into clusters
of similar items. Objects within a cluster resemble each other,
while objects in other clusters are dissimilar. Clustering is
widely used in areas such as marketing, the World Wide Web
(WWW), earthquake studies, aerospace, biology and insurance. On
the other hand, if the data comes with category or class
labels, a classification technique is employed to assign the
given data to one of a number of classes. Applications of
classification include speech and handwriting recognition,
biometric identification and document classification.
Association Rule Mining (ARM) uses if-then statements to
indicate relationships between data items in transactional
databases. Further, regression (here, linear regression) is
employed to find the relationship between two continuous
variables: one is termed the predictor or independent variable
and the other the dependent or response variable. Outlier
detection is described as “The method of distinguishing the
extreme points within the data. It is a branch of data mining.”
All the techniques discussed above are part of data mining,
machine learning and computer vision.
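As an illustration of the regression idea mentioned above, the following is a minimal Python sketch of an ordinary least-squares fit between one predictor and one response variable. It is not part of the authors' R code; the function name fit_line and the toy numbers are hypothetical and are not taken from the CKD dataset.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data (not CKD measurements): the points lie on y = 2x + 1.
predictor = [1.0, 2.0, 3.0, 4.0]
response = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(predictor, response)
print(slope, intercept)
```

The predictor is the independent variable and the response is the dependent variable, matching the terminology used in the text.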
In the human body, the kidney is instrumental in filtering out
and discharging toxic and unneeded materials, typically wastes,
through excretion. In India, there are approximately one
million cases of Chronic Kidney Disease (CKD) every year. CKD
damages the kidney and produces a gradual loss of kidney
function. Because its symptoms develop gradually and are not
specific to the disorder, CKD is hard to predict, so it is
important to detect it at an early stage. Kidneys filter wastes
and excess fluids from the blood, which are then excreted in
urine. In the early stages of CKD, there may be few signs or
symptoms.
In healthcare, classification is one of the most commonly used
machine learning methods. A classification model assigns a
class to each data point.
Common classification methods are Decision Tree, Support Vector
Machine, K-Nearest Neighbor (KNN), Naïve Bayes classifier, and
Neural Networks. KNN is used to examine the relationship
between different CKD risk factors in order to predict the
disease at an early stage. Machine learning is a growing field
dealing with the study of large and varied data. It grew out of
the study of pattern recognition (speech and handwriting) and
computational learning theory in Artificial Intelligence, and
offers numerous methods, algorithms and techniques to analyze
data and make predictions. Machine learning techniques have
been highly successful in the detection and recognition of many
important diseases in medical science. Machine learning would
thus be useful for predicting whether or not a patient has CKD.
It does so by training a predictive model on past CKD patient
data.
A. Analysis of Chronic Kidney Disease (CKD)
The kidney acts as a powerful and effective filter that rids
the body of waste and harmful substances and returns
supplements, amino acids, insulin, hormones and other essential
substances to the bloodstream. Sometimes, however, things can
go badly wrong. Chronic Kidney Disease (CKD) is the term used
around the world to refer to any form of kidney disease that
persists or recurs. Here, “disease” covers any abnormality of
kidney structure or function, regardless of whether it is
likely to make a person feel unwell or cause complications. It
is a common problem that can affect anyone at any age. It is
estimated that about 3 million people in the United Kingdom are
at risk of CKD. CKD is usually caused by a combination of
different conditions that place a strain on the kidneys.
The rest of the manuscript is organized as follows. Section II
reviews related work on this research topic. Section III
discusses the proposed system to classify Chronic Kidney
Disease (CKD). Section IV describes the dataset used and gives
a brief introduction to its attributes. Section V deals with
the machine learning algorithms, their code and the results
obtained with each classification algorithm. Section VI
presents an open discussion of the current view and results on
chronic disease. Finally, Section VII concludes the research
work and outlines future enhancements.
II. RELATED WORK
Several researchers have worked on CKD prediction using
different classification algorithms, and each has reported the
performance of the resulting models. Gunarathne, W.H.S.D. [7]
compared the results of different models and concluded that the
Multiclass Decision Forest algorithm gives high accuracy on the
reduced, 14-attribute dataset. S. Dilli Arasu and Dr. R.
Thirumalaiselvi [2] worked on missing values in a Chronic
Kidney Disease dataset. They observed that missing values in
the dataset reduce not only the accuracy of the model but also
the quality of its predictions. They addressed this issue by
applying a calculation based on the stages of Chronic Kidney
Disease to recompute the unknown values, and substituted the
missing values with the recalculated ones.
Asif Salekin and John Stankovic [2] used a novel approach to
detect Chronic Kidney Disease with machine learning algorithms.
They reported results on a dataset of 400 records and 25
attributes that indicates whether or not a patient has CKD. To
obtain their results they used KNN, random forest and neural
network algorithms. For feature reduction they used a wrapper
method, which detects CKD with high accuracy.
Sahil Sharma, Vinod Sharma, and Atul Sharma [6] tested 12
different classification algorithms on datasets with 400
records and 24 attributes each. They compared the predicted
outcomes with the actual results in order to determine
predictive accuracy, and used metrics such as precision,
sensitivity, accuracy and specificity to measure the
performance of the classifiers. Note that Chronic Kidney
Disease (CKD) is not uncommon.
However, more accurate information regarding the risk of
progression to nephropathy is badly needed for clinical
decisions concerning testing, treatment and referral. This
section has highlighted the state of the art in the field of
CKD. The next section discusses our work in detail.
III. PROPOSED SYSTEM
The proposed system deals with the detection of Chronic Kidney
Disease. Healthcare systems generate colossal amounts of data,
so it is essential to use this data productively to analyze,
predict and treat a specific disease. A classification model
draws conclusions from observed values: given one or more
inputs, it tries to predict the value of one or more outcomes.
In supervised machine learning, the classification algorithm
learns from a training dataset and predicts the categorical
class labels of the data.
This research work presents a machine learning framework for
knowledge discovery on the Chronic Kidney Disease dataset. To
classify the disease at an early stage, three machine learning
algorithms are used, namely Logistic Regression, Support Vector
Machine, and K-Nearest Neighbors (KNN). The performance of each
algorithm is examined. Our proposed model combines these three
algorithms as shown in Fig.1.
This section gave a snapshot of the system proposed in this
paper. The next section discusses the dataset employed and
introduces a table listing its attributes with their
descriptions.
Fig.1. Proposed Model using Various Classification Algorithms
IV. DATASET USED
The proposed framework uses the Chronic Kidney Disease (CKD)
dataset from the UCI Machine Learning Repository. It has 25
attributes, of which 11 are numerical and 14 are nominal. All
400 instances of the dataset are used to train and evaluate the
machine learning algorithms. Of the 400 instances, 250 are
labeled as Chronic Kidney Disease (CKD) and 150 are labeled as
non-CKD. The attributes present in the dataset are Bacteria,
Sodium, Age, Hemoglobin, Diabetes Mellitus, Classification,
Appetite, Coronary Artery Disease, Blood Pressure, Pus Cell,
Anemia, Pedal Edema, Sugar, White Blood Cell Count,
Hypertension, Red Blood Cell Count, Potassium, Specific
Gravity, Pus Cell Clumps, Packed Cell Volume, Albumin, Serum
Creatinine, Red Blood Cells, Blood Urea, and Blood Glucose
Random.
The dataset is divided into two groups, one for training and
one for testing. 70% of the samples are used for training and
30% for testing. The attributes of the dataset are listed in
Table 1. Readers can obtain the data from the URL given in
[16]. The next section discusses the machine learning
algorithms used to classify CKD.
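To make the 70%/30% division concrete, here is a minimal Python sketch of a random split of 400 row indices. It is an illustration only, not the authors' R code: the function name train_test_split_indices and the seed value are assumptions.

```python
import random


def train_test_split_indices(n_rows, train_fraction=0.7, seed=1):
    """Shuffle row indices and cut them into training and test parts."""
    rng = random.Random(seed)  # fixed seed (assumed value) for a reproducible split
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = round(train_fraction * n_rows)
    return indices[:n_train], indices[n_train:]


# 400 instances, as in the CKD dataset described above
train_idx, test_idx = train_test_split_indices(400)
print(len(train_idx), len(test_idx))  # 280 120
```

Every row lands in exactly one of the two groups, so no test sample is seen during training.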
TABLE 1: DATA SET USED
S.No  Attribute                            Description of the attribute
1.    Bacteria (nominal)                   ba (present / notpresent)
2.    Sodium (numerical)                   sod in mEq/L
3.    Age (numerical)                      age in years
4.    Haemoglobin (numerical)              hemo in grams
5.    Diabetes Mellitus (nominal)          dm (yes / no)
6.    Class (nominal)                      class (ckd / notckd)
7.    Appetite (nominal)                   appet (good / poor)
8.    Coronary Artery Disease (nominal)    cad (yes / no)
9.    Blood Pressure (numerical)           bp in mm/Hg
10.   Pus Cell (nominal)                   pc (normal / abnormal)
11.   Anemia (nominal)                     ane (yes / no)
12.   Pedal Edema (nominal)                pe (yes / no)
13.   Sugar (nominal)                      su (0/1/2/3/4/5)
14.   White Blood Cell Count (numerical)   wc in cells/cumm
15.   Hypertension (nominal)               htn (yes / no)
16.   Red Blood Cell Count (numerical)     rc in cells/cumm
17.   Potassium (numerical)                pot in mEq/L
18.   Specific Gravity (nominal)           sg (1.005/1.010/1.015/1.020/1.025)
19.   Pus Cell Clumps (nominal)            pcc (present / notpresent)
20.   Packed Cell Volume (numerical)       pcv
21.   Albumin (nominal)                    al (0/1/2/3/4/5)
22.   Serum Creatinine (numerical)         sc in mgs/dl
23.   Red Blood Cells (nominal)            rbc (normal / abnormal)
24.   Blood Urea (numerical)               bu in mgs/dl
25.   Blood Glucose Random (numerical)     bgr in mgs/dl
V. SIMULATION RESULTS
This section describes the simulation results obtained in this
work.
A. Logistic Regression
Logistic Regression is a classification algorithm. Given a set
of independent variables, it predicts a binary outcome such as
1/0, Yes/No or True/False, so it can be employed to model a
binary response. The log of the odds is used as the dependent
variable. Logistic Regression is a predictive analysis
algorithm for classification problems in machine learning and
is based on the concept of probability: given certain factors,
it is used to predict an outcome that has two possible values.
The source code is given in Table II and the output in Fig.2.
From them, we can deduce that the accuracy of Logistic
Regression is 0.7725.
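TABLE II: R CODE FOR LOGISTIC REGRESSION
# Load the CKD dataset; text columns are read as factors
ckd <- read.csv("C:/Users/bhavya/Desktop/ckd.csv",
                stringsAsFactors = TRUE)
ckd$Type <- NULL
head(ckd)
dim(ckd)
summary(ckd)
names(ckd)
# Show how R codes the two classes as 0/1
contrasts(ckd$classification)
# Logistic regression on four predictors
glm.fit <- glm(classification ~ age + bp + pcv + bu,
               data = ckd, family = binomial)
summary(glm.fit)
# predict() returns a vector of fitted probabilities
glm.probab <- predict(glm.fit, type = "response")
glm.probab[1:20]
# Label every row "ckd", then relabel rows whose fitted
# probability exceeds 0.5 (assumes no rows were dropped for
# missing values)
glm.predc <- rep("ckd", nrow(ckd))
glm.predc[glm.probab > .5] <- "notckd"
table(glm.predc, ckd$classification)
# Fraction of correct predictions (accuracy)
mean(glm.predc == ckd$classification)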
Fig.2. Output for Logistic Regression
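The prediction step of logistic regression can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the fitted model from Table II: the weights and intercept below are hypothetical, and the sketch assumes, as the R listing's 0.5 rule suggests, that the modelled probability is that of the "notckd" class.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_class(features, weights, intercept, threshold=0.5):
    """Return (label, probability) from a linear score passed through sigmoid."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    prob_notckd = sigmoid(z)  # assumed: probability of the "notckd" class
    label = "notckd" if prob_notckd > threshold else "ckd"
    return label, prob_notckd


# Hypothetical coefficients, chosen only for illustration
toy_weights = [0.8, -0.5]
toy_intercept = 0.1
label, prob = predict_class([2.0, 1.0], toy_weights, toy_intercept)
print(label, round(prob, 3))
```

The linear score is the log of the odds, which is why a score of zero corresponds to a probability of exactly 0.5.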
B. Support Vector Machines (SVM)
Support Vector Machine, abbreviated as SVM, can be used for
both regression and classification tasks. Many researchers
favor it because it provides high accuracy with less
computation power. In machine learning, support vector machines
are supervised learning models, and they can be used to solve
both linear and non-linear problems. The algorithm uses a
hyperplane to categorize the data points. Each data point is
plotted as a point in n-dimensional space, with the value of
each attribute being the value of a given coordinate.
Classification is accomplished by searching for the hyperplane
that best separates the two groups, CKD and not CKD. Table III
presents the code behind SVM and, from the results in Fig.3, we
can see that the accuracy of SVM is 0.9925187.
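TABLE III: R CODE FOR SVM
library(e1071)  # provides svm()
# Fix the seed before sampling so the split is reproducible
set.seed(1)
# Randomly select 70% of the row indices for training
ckd1 <- sample(1:nrow(ckd), 0.7 * nrow(ckd))
ckd.train <- ckd[ckd1, ]
ckd.test <- ckd[-ckd1, ]
# Predictors are columns 9 to 13; the response is the class label
dataset <- data.frame(x = ckd.train[, 9:13],
                      y = as.factor(ckd.train$classification))
# Support Vector Machine with a radial kernel
svmfit <- svm(y ~ ., data = dataset, kernel = "radial", gamma = 1,
              cost = 1)
summary(svmfit)
# Predict the class of the held-out rows
# (assumes columns 9 to 13 have no missing values)
testset <- data.frame(x = ckd.test[, 9:13])
svm.pred <- predict(svmfit, newdata = testset)
table(svm.pred, ckd.test$classification)
# Fraction of correct predictions; the value depends on the split
mean(as.character(svm.pred) == as.character(ckd.test$classification))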
Fig.3. Output for SVM
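The listing in Table III requests a radial kernel with gamma = 1. As a rough illustration of what that kernel measures, the Python sketch below computes the standard radial basis function similarity exp(-gamma * ||u - v||^2) between two points. The function name rbf_kernel and the sample points are hypothetical; this is not the internal code of any SVM library.

```python
import math


def rbf_kernel(u, v, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


# Hypothetical points: identical points score 1, distant points approach 0
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))
print(rbf_kernel([0.0, 0.0], [1.0, 0.0]))
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))
```

Because the similarity decays with distance, a hyperplane in the kernel-induced space corresponds to a non-linear boundary in the original attribute space, which is how SVM handles non-linear problems.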
C. K-Nearest Neighbors Classification
The K-Nearest Neighbors classifier predicts the target variable
from the classes of the nearest neighbors. The nearest
neighbors are identified using a distance measure such as
Euclidean distance.
Algorithm:
1. Initialize the parameter K.
2. Calculate the distance between the test sample and all
the training samples.
3. Sort the distances in ascending order.
4. Take the K nearest neighbors.
5. Assign the most common class among those neighbors.
In our simulation, the accuracy of KNN is 0.7875.
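The steps listed above can be sketched in Python as follows. This is a minimal illustration, not the code that produced the reported accuracy; the function name knn_predict and the toy points are hypothetical, and math.dist requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the query to every training sample
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 3: sort the distances in ascending order
    distances.sort(key=lambda pair: pair[0])
    # Step 4: keep the k nearest neighbours
    nearest_labels = [label for _, label in distances[:k]]
    # Step 5: the most common class among them is the prediction
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-attribute toy data (not taken from the CKD dataset)
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["ckd", "ckd", "ckd", "notckd", "notckd", "notckd"]
print(knn_predict(points, labels, (1.1, 1.0), k=3))
```

Choosing an odd K for a two-class problem avoids tied votes.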
Comparing the three classifiers, it is evident that the Support
Vector Machine gives the best results, with an accuracy of
0.9925187, as summarized in Table IV. The next section presents
an open discussion of the results.
TABLE IV: SIMULATION RESULTS
Algorithm                 Accuracy
Logistic Regression       0.7725
Support Vector Machine    0.9925187
K-Nearest Neighbors       0.7875
VI. AN OPEN DISCUSSION
Each classifier's results were evaluated using different
evaluation parameters and checked against over-fitting with
10-fold cross-validation. Nested cross-validation was also used
to fine-tune the model parameters. The tests were carried out
in the Python 3.3 programming language through the Jupyter
Notebook web application. Several scikit-learn libraries were
used; scikit-learn is a free machine learning library for
Python. Accuracy, F1-measure, sensitivity, specificity and Area
Under the Curve (AUC) are the assessment measures considered in
this analysis. Each model produces different outputs depending
on its parameter values. With the GB model we achieve the best
detection performance. This result is better than the result
obtained using a multilayer perceptron (MLP) with a single
split and seven attributes, which gave a 98.4 percent
F1-measure. It also compares favorably with a study using RF
and five attributes, which obtained a 98.0 percent F1-measure.
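The assessment measures named above can all be derived from the four cells of a confusion matrix. The following Python sketch shows the standard definitions; the function name classification_metrics and the toy label vectors are hypothetical, and "ckd" is assumed to be the positive class.

```python
def classification_metrics(y_true, y_pred, positive="ckd"):
    """Accuracy, sensitivity, specificity, precision and F1 from two label lists."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))

    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero for degenerate inputs
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical toy labels (not results from the paper)
actual = ["ckd", "ckd", "ckd", "notckd", "notckd"]
predicted = ["ckd", "ckd", "notckd", "notckd", "ckd"]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

On an imbalanced dataset, sensitivity and specificity show which class the errors fall on, which accuracy alone cannot reveal.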
Some limitations of the dataset used are, however, important to
this analysis. First, the sample size (400 instances) is small
and may affect the reliability of the study. Second, it is
difficult to identify another dataset with the same features on
which the performance could be assessed. Readers are also
referred to [17, 18, 19, 20, 21 and 22] to learn more about
artificial intelligence, machine learning and deep learning
techniques and their scope in the near future.
VII. CONCLUSION AND FUTURE WORK
Aiming to diagnose Chronic Kidney Disease (CKD) at an early
stage, this manuscript applied a variety of machine learning
algorithms. The models are trained and validated on data
obtained from CKD patients using the input parameters described
above. Support Vector Machine, Logistic Regression and
K-Nearest Neighbors are analyzed to conduct the study of CKD.
The performance of these algorithms was assessed primarily on
the basis of accuracy. Our results show that the Support Vector
Machine algorithm predicts Chronic Kidney Disease better than
Logistic Regression and K-Nearest Neighbors within the narrow
limits of this medical scenario. The benefit of this approach
is that the prediction process takes far less time, which helps
doctors to initiate treatment at the earliest for patients with
CKD and to classify a larger population of patients within a
shorter span. Because the dataset used in this paper is small,
with 400 examples, we plan to work with larger datasets in the
future or to compare the results on this dataset with those on
a different dataset with the same attributes. In addition, to
help minimise the incidence of CKD, we will try to predict,
using an appropriate dataset, whether a person with risk
factors such as hypertension, a family history of kidney
failure and diabetes is likely to develop the disease.
AUTHORS CONTRIBUTIONS
Shashvi and Bhavya have drafted this manuscript. Amit
Kumar Tyagi and Terrance Frederick Fernandez have
analyzed and approved this manuscript for publication.
CONFLICT OF INTEREST
The authors do not have any conflict concerning
publication of this manuscript.
REFERENCES
[1] L. Rubini, “Early stage of chronic kidney disease UCI machine learning
repository,” 2015. [Online].
Available: http://archive.ics.uci.edu/ml/datasets/Chronic_Kidney_Disease.
[2] Asif Salekin, John Stankovic, "Detection of Chronic Kidney Disease and
Selecting Important Predictive Attributes," Proc. IEEE International
Conference on Healthcare Informatics (ICHI), IEEE, Oct. 2016,
doi:10.1109/ICHI.2016.36.
[3] Q. Zhang and D. Rothenbacher, "Prevalence of chronic kidney disease
in population-based studies: systematic review," BMC Public Health,
vol. 8, (1), pp. 117, 2008.
[4] K. A. Padmanaban and G. Parthiban, "Applying Machine Learning
Techniques for Predicting the Risk of Chronic Kidney Disease," Indian
Journal of Science and Technology, vol. 9, (29), 2016.
[5] J. Xiao et al, "Comparison and development of machine learning tools in
the prediction of chronic kidney disease progression," Journal of
Translational Medicine, vol. 17, (1), pp. 119, 2019.
[6] Sahil Sharma, Vinod Sharma, Atul Sharma, “Performance Based
Evaluation of Various Machine Learning Classification Techniques for
Chronic Kidney Disease Diagnosis,” July 18, 2016.
[7] Gunarathne W.H.S.D, Perera K.D.M, Kahandawaarachchi K.A.D.C.P,
“Performance Evaluation on Machine Learning Classification
Techniques for Disease Classification and Forecasting through Data
Analytics for Chronic Kidney Disease (CKD)”, 2017 IEEE
17th International Conference on Bioinformatics and Bioengineering.
[8] S.Ramya, Dr.N.Radha, "Diagnosis of Chronic Kidney Disease Using
Machine Learning Algorithms," Proc. International Journal of
Innovative Research in Computer and Communication Engineering, Vol.
4, Issue 1, January 2016.
[9] S. A. Shinde and P. R. Rajeswari, “Intelligent health risk prediction
systems using machine learning: a review,” IJET, vol. 7, no. 3, pp.
1019-1023, 2018.
[10] A.J. Aljaaf et al, "Early prediction of chronic kidney disease using
machine learning supported by predictive analytics," in 2018 IEEE
Congress on Evolutionary Computation (CEC), 2018.
[11] J. Xiao et al, "Comparison and development of machine learning tools in
the prediction of chronic kidney disease progression," Journal of
Translational Medicine, vol. 17, (1), pp. 119, 2019.
[12] P. Yang et al, "A review of ensemble methods in bioinformatics,"
Current Bioinformatics, vol. 5, (4), pp. 296-308, 2010.
[13] L. Deng et al, "Prediction of protein-protein interaction sites using
an ensemble method," BMC Bioinformatics, vol. 10, (1), pp. 426,
2009.
[14] M. Moslem and M. Pasha, "Survey of machine learning algorithms
for disease diagnostic," Journal of Intelligent Learning Systems
and Applications, vol. 9, (01), pp. 1, 2017.
[15] S. Karamizadeh et al, "Advantage and drawback of support vector
machine functionality," in 2014 International Conference on Computer,
Communications, and Control Technology (I4CT), 2014.
[16] http://archive.ics.uci.edu/ml/datasets/Chronic_Kidney_Disease
[17] Akshara Pramod, Harsh Sankar Naicker, Amit Kumar Tyagi, “Machine
Learning and Deep Learning: Open Issues and Future Research
Directions for Next Ten Years”, Book: Computational Analysis and
Understanding of Deep Learning for Medical Care: Principles, Methods,
and Applications, 2020, Wiley Scrivener, 2020.
[18] Tyagi, Amit Kumar and G, Rekha, Machine Learning with Big Data
(March 20, 2019). Proceedings of International Conference on
Sustainable Computing in Science, Technology and Management
(SUSCOM), Amity University Rajasthan, Jaipur - India, February 26-
28, 2019.
[19] Amit Kumar Tyagi, Poonam Chahal, “Artificial Intelligence and
Machine Learning Algorithms”, Book: Challenges and Applications for
Implementing Machine Learning in Computer Vision, IGI Global, 2020.
[20] Amit Kumar Tyagi, G. Rekha, “Challenges of Applying Deep Learning
in Real-World Applications”, Book: Challenges and Applications for
Implementing Machine Learning in Computer Vision, IGI Global 2020,
p. 92-118.
[21] Terrance Frederick Fernandez and M. Pradeep, "Multi-level Predictive
with Training Framework (MP with TF) for ranking machine learning
algorithms", IEEE proceeding of 4th International conference on I-
SMAC (IoT in Social, Mobile, Analytics and Cloud (I-SMAC 2020),
pp.697-703, ISBN: 978-1-7281-5464-0/20, 7th to 9th October 2020,
SCAD Palladum, Tamil Nadu.
[22] Aravindan C, Terrance Frederick Fernandez, Hema Malini V and
Catherine Madhu Vidha J, “An Extensive Research on Cyber Threats
using Learning Algorithm”, IEEE proceeding of International
Conference on Emerging Trends in Information Technology and
Engineering, ISBN: 978-1-7281-4141-1, 25th February 2020, Vellore,
India.
... The difficulties encountered during model implementation, ethical concerns, the requirement for high-quality data in order to prepare robust models, and external model validation through the use of an independent data set are all covered in detail in this paper along with other challenges faced by machine learning approaches. In [37,38], a chronic kidney disease prediction model is described. Support vector machines (SVM), logistic regression (LR), decision trees (DT), and kernel neighborhood network (KNN) are the four machine learning techniques used in the development of this model. ...
Article
Full-text available
People today suffer from a wide range of illnesses as a result of their lifestyle choices and the state of the environment. In order to stop such diseases from getting worse, it is crucial to recognize and anticipate them early on. Most of the time, doctors find it challenging to precisely identify the disorders by hand. This article aims to predict and identify patients with more prevalent chronic illnesses. This might be accomplished by making sure that this classification accurately identifies those who have chronic illnesses by applying a state-of-the-art machine learning technique. Another difficult task is predicting diseases. Therefore, data mining is essential to the prediction of disease. By using machine learning algorithms like convolutional neural network (CNN) for automatic feature extraction and disease prediction and K-nearest neighbor (KNN) for distance calculation to find the exact match in the data set and the final disease prediction outcome, the proposed system provides a broad disease prognosis based on the patient's symptoms. The creation of the data set involved gathering information on the symptoms of the sickness, the individual's lifestyle, and specifics about medical consultations, all of which were factored into this broad illness prediction. In conclusion, this research presents a comparative analysis of the suggested system using different algorithms, including logistic regression, decision trees, and Naive Bayes.
... These algorithms can detect subtle changes in kidney function that may not be observable via conventional diagnostic methods (Ebiaredoh-Mienye et al., 2022). Moreover, machine learning models can continuously learn and improve, adapting to new data and emerging trends in patient health profiles (Gudeti et al., 2020). This dynamic nature of machine learning makes it particularly suited for managing chronic conditions like CKD, where patient data evolves over time. ...
Article
Full-text available
Chronic Kidney Disease (CKD) is increasingly recognised as a major health concern due to its rising prevalence. The average survival period without functioning kidneys is typically limited to approximately 18 days, creating a significant need for kidney transplants and dialysis. Early detection of CKD is crucial, and machine learning methods have proven effective in diagnosing the condition, despite their often opaque decision-making processes. This study utilised explainable machine learning to predict CKD, thereby overcoming the 'black box' nature of traditional machine learning predictions. Of the six machine learning algorithms evaluated, the extreme gradient boost (XGB) demonstrated the highest accuracy. For interpretability, the study employed Shapley Additive Explanations (SHAP) and Partial Dependency Plots (PDP), which elucidate the rationale behind the predictions and support the decision-making process. Moreover, for the first time, a graphical user interface with explanations was developed to diagnose the likelihood of CKD. Given the critical nature and high stakes of CKD, the use of explainable machine learning can aid healthcare professionals in making accurate diagnoses and identifying root causes.
... However, this work focused on kidney disease prediction in the network. Many machine learning schemes [6]- [10] suggested to train and test the kidney data on the different nodes. ...
Article
Full-text available
The number of people with kidney disease rises every day for many reasons. Many existing machine-learning-enabled mechanisms for processing kidney disease suffer from long delays and consume much more resources during processing. In this paper, the study shows how federated and reinforcement learning schemes can be used to develop the best delay scheme. The scheme must optimize both the internal and external states of reinforcement learning and the federated learning fog cloud network. This work presents the Adaptive Federated Reinforcement Learning-Enabled System (AFRLS) for Internet of Things (IoT) consumers’ kidney disease image processing. The main relationship between IoT consumers and kidney image is that the data is collected from different IoT consumer sources, such as ultrasound and X-rays in healthcare clinics. In healthcare applications, kidney urinary tasks reduce the time it takes to preprocess federated learning datasets for training and testing and run them on different fog and cloud nodes. AFRLS decides the scheduling on other nodes and improves constraints based on the decision tree. Based on the simulation results, AFRLS is a new strategy that reduces the time tasks need to be delayed compared to other machine learning methods used in fog cloud networks. The AFRLS improved the delay among nodes by 55%, the delay among internal states by 40%, and the training and testing delay by 51%.
Chapter
When it comes to the smart healthcare sector, blockchain technology presents several prospects. Aside from its usage in the financial industry, blockchain technology is now also utilised in the process of establishing trust, protecting privacy, and ensuring security. Within the scope of this work, we will provide an explanation of a new development in the healthcare business that strives to enhance the effectiveness and safety of the administration of healthcare data. We employ blockchain technology to construct a decentralised and tamper-proof network that facilitates safe data exchange among healthcare stakeholders such as patients, providers, and insurers. This technique is known as Blockchain-based Intelligent and Interactive Healthcare Systems (Blockchain-based IHS). The purpose of this chapter is to present an overview of BIIHS, including its advantages, disadvantages, and potential future paths. The BIIHS has the potential to enhance patient outcomes by facilitating personalised treatment plans, lowering the number of medical mistakes, and offering real-time access to vital and sensitive health data. Nevertheless, in order to fully realise the promise of BIIHS, it is necessary to solve problems such as regulation compliance, interoperability, and privacy concerns. Artificial intelligence and the internet of things are two examples of upcoming technologies that might be included into BIIHS in the future. This would allow for the healthcare sector to further improve its capabilities.
Chapter
In today's smart era, the healthcare landscape is rapidly evolving, driven by advancements in technology and the growing healthcare needs of an aging and increasingly interconnected society. To address these challenges, the concept of digital twins has emerged as a promising solution to transform healthcare services for the next generation. This work provides an overview of the key aspects and benefits of digital twin-based smart healthcare services and their potential to revolutionize the healthcare industry. A digital twin involves creating a digital replica or model of a physical entity, in this case, an individual's health and medical data. By harnessing real-time data from various sources, including wearable devices, electronic health records, and medical imaging, digital twins provide a holistic view of an individual's health status, treatment history, and predictive analytics for future health outcomes. This work describes how this data-driven approach enables healthcare providers to make more informed decisions and tailor personalized treatment plans, improving patient outcomes.
Conference Paper
Kidney disease encompasses various abnormalities in renal function, ranging from subtle damage to severe conditions such as excessive cell expansion, impaired blood filtration, and the deposition of crystalline minerals. Recognizing its significant impact on mortality and years of life lost, early detection becomes paramount in providing timely and focused medical interventions. This research employs a diverse set of machine learning algorithms, including K-Nearest Neighbors, Decision Tree Classifier, Random Forest Classifier, AdaBoost Classifier, Gradient Boosting Classifier, Stochastic Gradient Boosting, XgBoost, CatBoost Classifier, Extra Trees Classifier, Light Gradient Boosting Machine (LGBM) Classifier, Logistic Regression, Support Vector Machine (SVM), Naive Bayes, and Artificial Neural Network (ANN). Evaluation of these algorithms reveals outstanding performance by the AdaBoost, XgBoost, and LGBM classifiers, each achieving an accuracy of 98%. This study underscores the pivotal role of machine learning in the early prediction of kidney disease, paving the way for personalized patient care.
Chapter
End-stage renal disease (ESRD), commonly known as kidney failure, is a critical medical condition that has a significant impact on global health. Early detection of kidney failure is crucial in preventing and managing this condition. In recent years, machine learning (ML) models have emerged as promising tools for predicting kidney failure, offering the potential to improve patient outcomes through timely intervention. This comprehensive review provides an overview of the current state of research on kidney failure prediction using various ML models. The review begins by presenting an overview of kidney failure, its prevalence, and the challenges associated with its early detection. It then delves into the role of ML in healthcare and specifically focuses on its application in predicting kidney failure. The discussion encompasses a wide range of ML techniques, including logistic regression, decision trees, support vector machines, and deep learning. The review analyzes key studies and methodologies employed in predicting kidney failure, highlighting the strengths and limitations of different ML approaches. It emphasizes the importance of feature selection, data preprocessing, and model evaluation in enhancing the accuracy and reliability of predictions. Furthermore, it addresses the issue of data imbalance, a common challenge in medical datasets, and explores strategies to mitigate its impact on model performance. In addition to summarizing existing research, the review identifies current gaps in the literature and suggests avenues for future research. This includes the exploration of novel data sources, the integration of multi-modal data, and the development of interpretable models that can assist healthcare professionals in making informed decisions. Overall, this review serves as a valuable resource for researchers, clinicians, and healthcare professionals interested in the application of ML models for kidney failure prediction. 
By synthesizing the current state of knowledge, it provides insights into the potential of ML models to improve patient outcomes and highlights areas for further research.
Article
Full-text available
Chronic kidney disease is a significant health problem worldwide that affects millions of people, and early detection of this disease is crucial for successful treatment and improved patient outcomes. In this research paper, we conducted a comprehensive comparative analysis of several machine learning algorithms, including logistic regression, Gaussian Naive Bayes, Bernoulli Naive Bayes, Support Vector Machine, X Gradient Boosting, Decision Tree Classifier, Grid Search CV, Random Forest Classifier, AdaBoost Classifier, Gradient Boosting Classifier, XgBoost, Cat Boost Classifier, Extra Trees Classifier, KNN, MLP Classifier, Stochastic gradient descent, and Artificial Neural Network, for the prediction of kidney disease. In this study, a dataset of patient records was utilized, where each record consisted of twenty-five clinical features, including hypertension, blood pressure, diabetes mellitus, appetite and blood urea. The results of our analysis showed that Artificial Neural Network (ANN) outperformed other machine learning algorithms with a maximum accuracy of 100%, while Gaussian Naive Bayes had the lowest accuracy of 94.0%. This suggests that ANN can provide accurate and reliable predictions for kidney disease. The comparative analysis of these algorithms provides valuable insights into their strengths and weaknesses, which can help clinicians choose the most appropriate algorithm for their specific requirements.
Chapter
Full-text available
With the development of technology, many new technologies such as machine learning (ML), deep learning, blockchain technology, the Internet of Things, and quantum computing have emerged in the current era. These technologies are helping human beings to live comfortably and without hurdles. Today, technology is helping humans and protecting nature with minimum waste of available/limited resources. Among these inventions, ML and deep learning are two unique inventions which have attracted many researchers (and research communities) to solve complex problems. Today, ML use has spread to many sectors to increase the productivity of businesses; for example, retail/marketing, customer churn prediction, e-healthcare, and detecting disease in its early stages. These are a few examples of where ML is used in the current smart era. Alongside this, deep learning has gained importance over ML in many applications such as bio-informatics, health informatics, identification of images or handwritten languages, and audio recognition. Many researchers face difficulty when they are not sure about the particular use of machine and deep learning. This work addresses such requirements and provides complete details about ML and deep learning, i.e., from their evolution to their forefront use, their use in many applications, their benefits to society, and the challenges and potential limitations of the respective learning techniques.
Chapter
Full-text available
Due to developments in technology, millions of devices (Internet of Things: IoT) are generating a large amount of data (called big data). This data requires analysis processes, analytics tools, or techniques. Over the past several decades, a lot of research has used data mining, machine learning, and deep learning techniques. Here, machine learning is a subset of artificial intelligence and deep learning is a subset of machine learning. Deep learning is more efficient than machine learning techniques (in terms of providing accurate results) because it uses perceptrons, neurons, and the back propagation method (i.e., these techniques solve a problem by learning by themselves [without being explicitly programmed by a human being]). Deep learning is used in several applications such as healthcare and retail (or any real-world problems). However, using deep learning techniques in such applications creates several problems and raises several critical issues and challenges, which need to be overcome to obtain accurate results.
Chapter
Full-text available
With recent developments in technology and the integration of millions of Internet of Things devices, a lot of data is being generated every day (known as Big Data). This data is required to improve the growth of several organizations and applications such as e-healthcare. Also, we are entering an era of a smart world, where robotics is going to be used in most applications (to solve the world's problems). Implementing robotics in applications such as medical and automobile systems is a goal of computer vision. Computer vision (CV) is realized through several components such as artificial intelligence (AI), machine learning (ML), and deep learning (DL). Here, machine learning and deep learning techniques/algorithms are used to analyze Big Data. Today, various organizations such as Google and Facebook are using ML techniques to search for particular data or to recommend posts. Hence, the requirements of computer vision are fulfilled through these three components: AI, ML, and DL.
Article
Full-text available
Background Urinary protein quantification is critical for assessing the severity of chronic kidney disease (CKD). However, the current procedure for determining the severity of CKD is completed through evaluating 24-h urinary protein, which is inconvenient during follow-up. Objective To quickly predict the severity of CKD using more easily available demographic and blood biochemical features during follow-up, we developed and compared several predictive models using statistical, machine learning and neural network approaches. Methods The clinical and blood biochemical results from 551 patients with proteinuria were collected. Thirteen blood-derived tests and 5 demographic features were used as non-urinary clinical variables to predict the 24-h urinary protein outcome response. Nine predictive models were established and compared, including logistic regression, Elastic Net, lasso regression, ridge regression, support vector machine, random forest, XGBoost, neural network and k-nearest neighbor. The AU-ROC, sensitivity (recall), specificity, accuracy, log-loss and precision of each of the models were evaluated. The effect sizes of each variable were analysed and ranked. Results The linear models including Elastic Net, lasso regression, ridge regression and logistic regression showed the highest overall predictive power, with an average AUC and a precision above 0.87 and 0.8, respectively. Logistic regression ranked first, reaching an AUC of 0.873, with a sensitivity and specificity of 0.83 and 0.82, respectively. The model with the highest sensitivity was Elastic Net (0.85), while XGBoost showed the highest specificity (0.83). In the effect size analyses, we identified that ALB, Scr, TG, LDL and EGFR had important impacts on the predictability of the models, while other predictors such as CRP, HDL and SNA were less important. Conclusions Blood-derived tests could be applied as non-urinary predictors during outpatient follow-up. 
Features in routine blood tests, including ALB, Scr, TG, LDL and EGFR levels, showed predictive ability for CKD severity. The developed online tool can facilitate the prediction of proteinuria progress during follow-up in clinical practice. Electronic supplementary material The online version of this article (10.1186/s12967-019-1860-0) contains supplementary material, which is available to authorized users.
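The threshold-based metrics named in the abstract above (sensitivity, specificity, accuracy and precision) all derive from the four cells of a binary confusion matrix. The following is a minimal sketch in plain Python, not the authors' evaluation code; the function name `binary_metrics` and the toy labels are hypothetical, and labels are assumed to be encoded as 1 (positive) and 0 (negative).

```python
def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity, accuracy and precision for 0/1 labels.

    Hypothetical helper for illustration; assumes 1 = positive, 0 = negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # recall on the positive class
        "specificity": ratio(tn, tn + fp),  # recall on the negative class
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    preds = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(truth, preds))
```

AU-ROC and log-loss are not covered by this sketch because they require predicted probabilities rather than hard class labels.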
Article
Full-text available
Humans are considered to be the most intelligent species on mother earth and are inherently more health conscious. Over the centuries, mankind has discovered various proven healthcare systems. To automate the process and predict diseases more accurately, machine learning methods are gaining popularity in the research community. Machine learning methods facilitate the development of intelligence in a machine, so that it can perform better in the future using learned experience. The application of machine learning methods to electronic health record datasets could provide valuable information and prediction of health risks. The aims of this research review paper are four-fold: i) to serve as a guideline for researchers who are new to the machine learning area and want to contribute to it, ii) to provide a state-of-the-art survey of machine learning, iii) to review the application of machine learning techniques in health prediction, and iv) to provide further research directions required for health prediction systems using machine learning.
Article
Full-text available
In medical imaging, Computer Aided Diagnosis (CAD) is a rapidly growing dynamic area of research. In recent years, significant attempts are made for the enhancement of computer aided diagnosis applications because errors in medical diagnostic systems can result in seriously misleading medical treatments. Machine learning is important in Computer Aided Diagnosis. After using an easy equation, objects such as organs may not be indicated accurately. So, pattern recognition fundamentally involves learning from examples. In the field of bio-medical, pattern recognition and machine learning promise the improved accuracy of perception and diagnosis of disease. They also promote the objectivity of decision-making process. For the analysis of high-dimensional and multimodal bio-medical data, machine learning offers a worthy approach for making classy and automatic algorithms. This survey paper provides the comparative analysis of different machine learning algorithms for diagnosis of different diseases such as heart disease, diabetes disease, liver disease, dengue disease and hepatitis disease. It brings attention towards the suite of machine learning algorithms and tools that are used for the analysis of diseases and decision-making process accordingly.
Conference Paper
Chronic Kidney Disease is a serious lifelong condition that is induced by either kidney pathology or reduced kidney function. Early prediction and proper treatment can possibly stop or slow the progression of this chronic disease to its end stage, where dialysis or kidney transplantation is the only way to save the patient's life. In this study, we examine the ability of several machine-learning methods to predict Chronic Kidney Disease early. This matter has been studied widely; however, we support our methodology with predictive analytics, in which we examine the relationships among data parameters as well as with the target class attribute. Predictive analytics enables us to identify the optimal subset of parameters to feed to machine learning in order to build a set of predictive models. This study starts with 24 parameters in addition to the class attribute and ends up with 30% of them as the ideal subset to predict Chronic Kidney Disease. A total of 4 machine-learning-based classifiers have been evaluated within a supervised learning setting, achieving highest performance outcomes of AUC 0.995, sensitivity 0.9897, and specificity 1. The experimental procedure concludes that advances in machine learning, with the assistance of predictive analytics, represent a promising setting in which to realize intelligent solutions, which in turn demonstrate the ability of prediction in the kidney disease domain and beyond.
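The abstract above describes reducing 24 parameters to roughly 30% of them by examining how each parameter relates to the class attribute, but it does not say which statistic was used. As a rough sketch only, the snippet below ranks features by the absolute Pearson correlation with a 0/1 class label and keeps the top fraction. The Pearson criterion, the function names and the toy data are all assumptions made for illustration and are not taken from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_top_fraction(features, target, fraction=0.3):
    """Keep the given fraction of features most correlated with the target.

    `features` maps a feature name to its column of values (hypothetical layout).
    """
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    keep = max(1, round(fraction * len(features)))  # 24 features -> 7 kept
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


if __name__ == "__main__":
    # Toy columns, invented for illustration; "a" tracks the class, "b" and "c" do not.
    toy = {"a": [1, 2, 3, 4], "b": [1, 1, 1, 1], "c": [2, 1, 2, 1]}
    print(select_top_fraction(toy, [0, 0, 1, 1]))
```

A univariate filter such as this ignores interactions between parameters, so it should be read as a starting point rather than a reproduction of the study's selection procedure.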