Research Article
Machine Learning Hybrid Model for the Prediction of Chronic
Kidney Disease
Hira Khalid,1 Ajab Khan,1 Muhammad Zahid Khan,2 Gulzar Mehmood,3 and Muhammad Shuaib Qureshi4
1 Department of Information Technology, Abbottabad University of Science and Technology, Havelian 22500, Abbottabad, Pakistan
2 Department of Computer Science and I.T, Network Systems and Security Research Group, University of Malakand, Chakdara 18800, Khyber Pakhtunkhwa, Pakistan
3 Department of Computer Science, IQRA National University, Swat Campus 19220, Peshawar, Pakistan
4 Department of Computer Science, School of Arts and Sciences, University of Central Asia, Bishkek, Kyrgyzstan
Correspondence should be addressed to Muhammad Shuaib Qureshi; muhammad.qureshi@ucentralasia.org
Received 25 July 2022; Revised 6 September 2022; Accepted 19 September 2022; Published 14 March 2023
Academic Editor: Farman Ali
Copyright © 2023 Hira Khalid et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
To diagnose an illness, doctors typically conduct physical exams and review the patient's medical history, followed by diagnostic tests and procedures to determine the underlying cause of symptoms. Chronic kidney disease (CKD) is a leading cause of death, with a rapidly increasing number of patients, resulting in 1.7 million deaths annually. While various diagnostic methods are available, this study uses machine learning because of its high accuracy. We use a hybrid technique to build the proposed model, with Pearson correlation for feature selection. In the first step, the best models were selected on the basis of a critical analysis of the literature. In the second step, a combination of these models is used in the proposed hybrid model: Gaussian Naïve Bayes, gradient boosting, and decision tree classifiers serve as base classifiers, and a random forest classifier serves as the meta-classifier. The objective of this study is to evaluate the best machine learning classification techniques and identify the best-performing classifier in terms of accuracy. The proposed model addresses overfitting and achieves the highest accuracy. The study also highlights some of the challenges that affect performance. We critically review the existing machine learning classification techniques, evaluate them in terms of accuracy, and present a comprehensive analytical evaluation of the related work in tabular form. In the implementation, we used the top four models and built a hybrid model using the UCI chronic kidney disease dataset for prediction. Gradient boosting achieves around 99% accuracy, random forest achieves 98%, the decision tree classifier achieves 96%, and the proposed hybrid model performs best, reaching 100% accuracy on the same dataset. Some of the main machine learning algorithms used to predict the occurrence of CKD are Naïve Bayes, decision tree, K-nearest neighbor, random forest, support vector machine, LDA, GB, and neural network. In this study, we apply GB (gradient boosting), Gaussian Naïve Bayes, and decision tree along with random forest on the same set of features and compare the accuracy scores.
Hindawi
Computational Intelligence and Neuroscience
Volume 2023, Article ID 9266889, 14 pages
https://doi.org/10.1155/2023/9266889
1. Introduction
Chronic kidney disease (CKD) is a rapidly growing health problem, and millions of people die for lack of timely, affordable treatment. Chronic kidney disease patients are found mainly in low- and middle-income countries [1, 2].
In 2013, about one million people died of chronic kidney disease [3]. The developing world suffers more from chronic kidney disease: low- and middle-income countries have a total of 387.5 million CKD patients, of whom 177.4 million are male and 210.1 million are female [4]. These figures show that a large number of people in developing countries suffer from chronic kidney disease, and this number is increasing day by day. Much work has been done on the early diagnosis of chronic kidney disease so that the disease can be treated at an early stage. In this article, we focus on machine learning prediction models for chronic kidney disease, with an emphasis on accuracy.
Chronic kidney disease is a common type of kidney disease that occurs when both kidneys are damaged, and CKD patients suffer from this condition over the long term. Here, kidney damage means any kidney condition that can cause the kidney to function improperly. This could be caused by a disorder or by a reduction in the glomerular filtration rate (GFR) [5]. Our proposed prediction model takes the clinical symptoms as input and predicts the result using a stacking classifier with the random forest algorithm as the meta-classifier.
Machine learning is gaining significance in healthcare diagnosis because it enables intricate analysis, thereby minimizing human error and enhancing the precision of predictions. Machine learning algorithms and classifiers are now considered among the most reliable techniques for diagnosing diseases such as heart disease, diabetes, tumors, and liver disease [6].
Naïve Bayes, SVM, and decision tree algorithms have been used for classification, while random forest, logistic regression, and linear regression have been used for regression in medical prediction. With efficient use of these algorithms, the death rate can be reduced through early-stage diagnosis and timely treatment. Along with managing their clinical symptoms, chronic kidney disease patients should include physical activity in daily life: they should exercise, drink water, and avoid junk food. The common symptoms of chronic kidney disease are shown in Figure 1.
This article provides an overview and analysis, followed by an implementation and evaluation, of the machine learning classifiers used in CKD diagnosis. It also discusses the importance of machine learning classifiers in healthcare and explains how they can make more accurate predictions. Figure 2 shows the block diagram of the chronic kidney disease prediction model.
The core objective of this article is to propose and implement a hybrid machine learning prediction model for chronic kidney disease, with due importance given to accuracy. We analyze the accuracy of different machine learning algorithms on the same dataset and compare their accuracy scores to obtain a better model. Our focus is on solving the overfitting problem using cross-validation while achieving the highest accuracy, building a hybrid model from a combination of popular machine learning classifiers: decision tree, gradient boosting, Gaussian Naïve Bayes, and random forest. The ultimate goal is to deliver accurate and effective treatment to CKD patients at a reduced cost. Before proceeding, it helps to know a little more about common kidney diseases. Table 1 lists some of the most common kidney diseases, and Table 2 lists the equations used to measure accuracy in the related work.
The remainder of the article is organized as follows. Section 2 contains the literature survey, along with a tabular comparison of the machine learning algorithms used and an analysis of the results. Section 3 contains the proposed methodology. Section 4 contains the dataset details. Section 5 contains the results and discussion. Section 6 contains the conclusion and future work.
2. Literature Review
This section covers research work related to these algorithms and assesses some of them based on their accuracy. In [7], data mining techniques were applied to the analysis of clinical records and found to be effective. The decision tree method achieved an accuracy of 91%, higher than that of the Naïve Bayes method. The classification algorithm for the diabetes dataset had 94% specificity and 95% sensitivity. The authors also found that data mining helps retrieve correlations among attributes that are not direct indicators of the class being predicted. Similar work still needs to be done to improve the overall accuracy of the prediction engine in the statistical analysis of neural networks and clustering algorithms.
Figure 1: Symptoms in CKD patients [7]. [Bar chart titled "Symptoms in CKD Patients"; axis: percentage (%), 0 to 80. Categories: ethnicity, marital status, educational level, diabetes, hypertension, cerebrovascular disease, myocardial infarction, malignancy, psychiatric disease, BMI, albumin, haemoglobin, proteinuria.]
Figure 2: Block diagram of the machine learning hybrid model. [Stages: CKD EDA, feature selection, hybrid model, predicted output.]
In [8], the authors described prediction models using machine learning techniques, including K-nearest neighbor (KNN), support vector machine (SVM), logistic regression (LR), and decision tree classifiers, for CKD prediction. The experiment showed that the SVM classifier provides the highest accuracy, 98.3%. SVM also had the best sensitivity after training and testing with the proposed method. According to this comparison, an SVM classifier is suitable for predicting chronic kidney disease.
In [9], the authors chose four different algorithms and compared them to obtain an accurate prediction rate on the dataset. Of all the approaches presented, the gradient boosting classifier gave the best results. It achieved an accuracy of 99.80%, whereas AdaBoost and LDA achieved the lowest value, 97.91%. The gradient boosting classifier takes more time to make predictions than the others, but it has a higher predictive value on both curves (ROC and AUC). Accurate prediction therefore depends heavily on the preprocessing strategy, and preprocessing methods must be chosen carefully to achieve reliable results.
In [7], the authors investigated the ability of machine learning, supported by predictive analysis, to predict CKD early. An experiment was performed on a dataset of 400 cases collected by Apollo Hospitals, India. Two labels were used as the output targets (patients with CKD and healthy individuals), and four different machine learning classifiers were implemented. In the comparison of these classifiers, the classification and regression tree model, RPART, showed remarkably better results in terms of accuracy. The authors used the information gain ratio as the splitting criterion, and the optimal splitting reduces the noise in the resulting feature subsets. In this study, the minimum value of the RPART splitting criterion was five, meaning that splits are attempted repeatedly while five instances are present in a leaf node. In addition, they assigned equal prior probabilities to the class attribute. The RPART prediction model used seven terminal nodes for the early prediction of CKD. The experimental results showed that the highest AUC and TPR were obtained with the multilayer perceptron (MLP) model, whereas the highest TNR (1.00) was achieved with RPART. The RPART model can be described as a set of decision rules. However, the major drawback of RPART is that it considers a single factor as a parameter at every split, whereas considering combinations of parameters could result in better CKD predictions. The MLP model, by contrast, gives the lowest error rate, mainly because the MLP can adapt to and handle complex predictions. Its hidden nodes are useful because they allow the neural network to model complex relationships between parameters and to handle nonlinearity in the data. The overall results indicate that machine learning algorithms offer a promising and feasible methodology for early CKD prediction.
Table 1: Description of common diseases of the kidney.
Diseases | Description
CKD | Chronic kidney disease (CKD) can occur when a disease or condition impairs kidney function, causing kidney damage to worsen over a few months or years.
Kidney stones | Kidney stones (also called renal calculi) are hard deposits made of salts and minerals that form inside the kidney.
Glomerulonephritis | Glomerulonephritis causes inflammation and damage to the filtering part of the kidneys (the glomeruli). It can occur quickly or develop over a longer period. Toxins, metabolic wastes, and surplus fluid are not properly filtered into the urine. Instead, they build up in the body, producing inflammation and fatigue.
Polycystic kidney disease | Polycystic kidney disease (PKD) is a genetic disorder in which many fluid-filled cysts grow inside the kidneys. Usually, they are harmless. The cysts can change the shape of the kidneys and make them much larger.
Table 2: Equations for accuracy measurement.
S. no | Authors | Accuracy equations
1 | Padmanaban and Parthiban [8] | $\mathrm{Precision}_i = TP_i / (TP_i + FP_i)$
2 | Charleonnan et al. [9] | $\mathrm{ACC} = (TP + TN) / (P + N)$
3 | Ghosh et al. [7] | The results of the performance indices depend on TP, TN, FP, and FN
4 | Fu et al. [10] | Extreme values: points $> Q_3 + 1.5\,\mathrm{IQR}$ or points $< Q_1 - 1.5\,\mathrm{IQR}$
5 | Devika et al. [11] | Accuracy = number of correctly classified samples / total number of samples
6 | Revathy et al. [12] | $\mathrm{Accuracy} = (TP + TN) / (TP + TN + FP + FN)$
7 | Nishat et al. [14] | $\mathrm{Accuracy} = (TP + TN) / (TP + TN + FP + FN)$
8 | Rabby et al. [13] | Descriptive analysis of the data as well as the experimental results
9 | Pouriyeh et al. [15] | Finding the most significant feature using the chi-square test
10 | Jabbar et al. [16] | Experimental results only
True positive (TP): CKD cases that are correctly classified as CKD. False positive (FP): non-CKD cases that are incorrectly classified as CKD. True negative (TN): non-CKD cases that are correctly classified as non-CKD. False negative (FN): CKD cases that are incorrectly classified as non-CKD.
As we have seen, various machine learning prediction models and learning programs are available to assist practitioners. In [5], the authors used a new selection guide for predicting CKD. In that work, CKD is predicted using specific classifiers, together with an analysis of their overall performance. They evaluated the Naïve Bayes, random forest, and artificial neural network classifiers and concluded that the random forest classifier performs better than the others. The value of CKD prediction has grown steadily, and several evolutionary strategies could be used to improve the results of the proposed classifiers. Here, Naïve Bayes, random forest, and KNN were applied to predict CKD. Early diagnosis of CKD helps to treat those affected in good time and prevents the disease from progressing to a worse stage. Early detection and timely treatment of this type of disease is one of the main objectives of the medical field.
In [10], a machine learning prediction model was developed for the early prediction of CKD. The input features were taken from the CKD dataset, and the models were tested and validated on these features. Decision tree, random forest, and support vector classifiers were constructed for the diagnosis of CKD. The performance of the models was assessed on the basis of their accuracy scores. The comparison showed that the random forest classifier performs much better at predicting CKD than the decision tree and support vector classifiers.
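A comparison of this kind, in which several classifiers are trained on the same features and ranked by accuracy, might look like the sketch below. It assumes scikit-learn and uses a synthetic dataset from `make_classification` as a stand-in for the CKD data. The sample size, split ratio, and default hyperparameters are illustrative assumptions, not the settings used in [10].

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a CKD table (assumption: 400 rows, binary target).
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "support_vector": SVC(),
}

# Fit every model on the same training split and score it on the same test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Because the data are synthetic, the ranking printed here says nothing about which classifier is best on the real CKD dataset.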
The kidneys play a vital role in maintaining the body's blood pressure, acid-base balance, and electrolyte balance, in addition to filtering toxins from the body. Kidney malfunction can cause illnesses ranging from minor to fatal, as well as dysfunction in other organs. Researchers all over the world have therefore dedicated themselves to finding techniques to accurately diagnose and effectively treat chronic kidney disease. As machine learning classifiers are increasingly used in the medical field for diagnosis, CKD is now among the diseases that can be predicted using machine learning classifiers. Research on detecting CKD with ML algorithms has progressively improved both the procedure and the accuracy of the results. The authors proposed the random forest classifier (99.75% accuracy) as the most effective of the classifiers compared. The study demonstrates effective handling of missing values through four techniques, namely, the mode, mean, median, and zero-point methods. It also evaluates the performance of machine learning models under two scenarios, with and without hyperparameter tuning, and observes a significant improvement in the classifiers' performance, which is presented visually through graphs [11].
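The four missing-value techniques named above (mode, mean, median, and zero) can be illustrated with scikit-learn's `SimpleImputer`. This is a sketch under stated assumptions: the small array is hypothetical, and the choice of `SimpleImputer` is an assumption made here for illustration, since the reviewed study's own implementation is not described in this section.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical two-column table with missing entries (np.nan).
X = np.array([
    [1.0, 80.0],
    [2.0, np.nan],
    [np.nan, 70.0],
    [2.0, 90.0],
    [7.0, 80.0],
])

# One imputer per technique; each fills a column's gaps from that column only.
strategies = {
    "mode": SimpleImputer(strategy="most_frequent"),
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "zero": SimpleImputer(strategy="constant", fill_value=0.0),
}

filled = {name: imputer.fit_transform(X) for name, imputer in strategies.items()}

for name, table in filled.items():
    print(name, table[2, 0], table[1, 1])
```

In practice the imputer would be fitted on the training split only and then applied to the test split, so that test data do not influence the fill values.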
Overall, the motive of the study is to examine the applicability of specific supervised machine learning classifiers in the field of bioinformatics and to demonstrate their suitability for detecting several serious diseases, including the early-stage diagnosis of CKD [12].
The authors built an updated and efficient machine learning (ML) application that can detect and predict the state of chronic kidney disease. In this work, the ten most important machine learning methods for predicting chronic kidney disease were considered. The authors reported that the accuracy of the classification algorithms they used met their expectations.
Detecting the disease, which is the first and most essential step in prediction, is costly in developing countries such as Pakistan and Bangladesh, where many people suffer from CKD. The proportion of CKD patients is currently increasing rapidly in Pakistan and Bangladesh. In that article, the authors therefore tried to develop a system that helps predict the risk of CKD. In the proposed model, they processed UCI datasets and real-time datasets, dealt with missing data, and trained the model using random forest and ANN classifiers, implementing both algorithms in Python. The accuracy obtained was 97.12% with the random forest algorithm and 94.5% with ANN, which is relatively good. With this method, risk prediction of CKD at an early stage is possible.
In [13], the authors predicted CKD based on sugar levels, albumin levels, and red blood cell percentage. In this work, five classifiers were applied, namely, Naïve Bayes, logistic regression, decision table, random tree, and random forest, and for each classifier the results were recorded (i) without preprocessing, (ii) with SMOTE and resampling, and (iii) with a class equalizer. The random forest classifier gave the highest accuracy, 98.93%, with SMOTE and resampling.
2.1. Comparison of Machine Learning Classifiers for CKD. In this section, a comprehensive comparison of the state of the art is presented in the form of a table. The evaluation is made in terms of accuracy and is summarized in Table 3. The table has eight columns, described below:
Author: this column contains the names of the authors of each article along with the reference.
Year: this column provides the year of the paper's publication.
Input data: this column shows the type of dataset that was used as input for the machine learning classifiers.
Disease type: this column shows the type of disease that was predicted using the different classifiers. The table also shows the best classifier found in each research paper, that is, the classifier with the maximum accuracy.
Classifiers: this column lists the different machine learning classifiers that were used in the research and compared with one another.
Tool: this column gives the programming language or framework that was used to build the model. The researchers used these tools to preprocess the input data, create a prediction model, and finally carry out the testing stage.
Cross-validation: this column gives information about the validation of the classifiers and allows the research papers to be compared by the number of cross-validation folds used.
Accuracy: this column gives the accuracy of the outcomes of the recommended model. If the article makes a comparison, the accuracy column only contains the accuracy percentage of the best classifier confirmed by the author.
Table 3: Comparison of classifiers for CKD.
S. no | Authors | Year | Input data | Disease type | Tools | Classifiers | Cross-validation | Accuracy
1 | Padmanaban and Parthiban [8] | 2016 | Diabetic patients, UCI machine learning | CKD | WEKA, YALE | Naïve Bayes | 10 folds | 86%
 |  |  |  |  |  | Decision tree |  | 91%
2 | Charleonnan et al. [9] | 2016 | Clinical data | CKD | WEKA, MATLAB | SVM | 5 folds | 98.3%
 |  |  |  |  |  | Logistic regression |  | 96.55%
 |  |  |  |  |  | Decision tree |  | 94.81%
 |  |  |  |  |  | KNN |  | 98.1%
3 | Ghosh et al. [7] | 2020 | Apollo Hospitals India | CKD | Python | SVM | 5 folds | 99.56%
 |  |  |  |  |  | AB |  | 97.91%
 |  |  |  |  |  | LDA |  | 97.91%
 |  |  |  |  |  | GB |  | 99.80%
4 | Fu et al. [10] | 2018 | UCI repository (CKD dataset) | CKD | Python | RPART | No cross-validation | 98.2%
 |  |  |  |  |  | SVM |  | 97.3%
 |  |  |  |  |  | LOGR |  | 99.4%
 |  |  |  |  |  | MLP |  | 99.5%
5 | Devika et al. [11] | 2019 | UCI repository (CKD dataset) | Chronic renal disorder | C Sharp | Naïve Bayes | No cross-validation | 99.63%
 |  |  |  |  |  | KNN |  | 87.78%
 |  |  |  |  |  | Random forest |  | 99.84%
6 | Revathy et al. [12] | 2019 | UCI repository (CKD dataset) | CKD | Python | Decision tree | No cross-validation | 94.16%
 |  |  |  |  |  | SVM |  | 98.33%
 |  |  |  |  |  | Random forest |  | 99.16%
7 | Nishat et al. [14] | 2021 | Learning repository of University of California, Irvine | CKD | Python | CNN | No cross-validation | 78%
 |  |  |  |  |  | LR |  | 98.25%
 |  |  |  |  |  | DT |  | 99%
 |  |  |  |  |  | RF |  | 99.75%
 |  |  |  |  |  | SVM |  | 85%
 |  |  |  |  |  | NB |  | 96.5%
 |  |  |  |  |  | MLP |  | 81.25%
 |  |  |  |  |  | QDA |  | 37.5%
8 | Rabby et al. [13] | 2019 | UCI repository (CKD dataset) | CKD | Python | K-nearest neighbor | No cross-validation | 71.25%
 |  |  |  |  |  | RF |  | 98.75%
 |  |  |  |  |  | SVM |  | 97.50%
 |  |  |  |  |  | GNB |  | 100%
 |  |  |  |  |  | AB |  | 98.75%
 |  |  |  |  |  | DT |  | 100%
 |  |  |  |  |  | LDA |  | 97.50%
 |  |  |  |  |  | GB |  | 98.75%
 |  |  |  |  |  | LR |  | 97.50%
 |  |  |  |  |  | ANN |  | 65%
9 | Pouriyeh et al. [15] | 2020 | UCI repository (CKD dataset) | CKD | Python | RF | 10 folds | 97.12%
 |  |  |  |  |  | ANN |  | 94.5%
10 | Jabbar et al. [16] | 2020 | UCI repository (CKD dataset) | CKD | Python | Decision tree | No cross-validation | 96.79%
 |  |  |  |  |  | Logistic regression |  | 97.86%
 |  |  |  |  |  | Naïve Bayes |  | 97.33%
 |  |  |  |  |  | Random forest |  | 98.93%
11 | Bmc [17] | 2013 | UCI repository | Diabetic kidney disease | MATLAB | SVM | No cross-validation | 0.91
 |  |  |  |  |  | PLS |  | 0.83
 |  |  |  |  |  | FFNN |  | 0.85
 |  |  |  |  |  | RPART |  | 0.87
 |  |  |  |  |  | Random forest |  | 0.91
 |  |  |  |  |  | Naïve Bayes |  | 0.86
 |  |  |  |  |  | C5.0 |  | 0.90
12 | Ramya and Radha [18] | 2016 | UCI repository | Chronic kidney disease | R | BP | No cross-validation | 80.4
 |  |  |  |  |  | RBF |  | 85.3
 |  |  |  |  |  | Random forest (RF) |  | 78.6
13 | Kumar [19] | 2016 | UCI repository | CKD | MATLAB | RF | No cross-validation | 95.67
 |  |  |  |  |  | SMO |  | 90
 |  |  |  |  |  | Naïve Bayes |  | 87.64
 |  |  |  |  |  | RBF |  | 83.78
 |  |  |  |  |  | MLPC |  | 89
 |  |  |  |  |  | SLG |  | 87
14 | Basarslan and Kayaalp [20] | 2019 | UCI repository | Chronic kidney disease | MATLAB | K-nearest neighbor | No cross-validation | 97
 |  |  |  |  |  | Naïve Bayes |  | 96.5
 |  |  |  |  |  | LR |  | 97.56
 |  |  |  |  |  | RF |  | 99
15 | Dowluru and Rayavarapu [21] | 2012 | UCI repository | Kidney stone | WEKA tool | Naïve Bayes classification | No cross-validation | 0.99
 |  |  |  |  |  | Logistic regression |  | 1.00
 |  |  |  |  |  | J48 algorithm |  | 0.97
 |  |  |  |  |  | Random forest |  | 0.98
 |  |  |  |  | Orange tool | Naïve Bayes |  | 0.79
 |  |  |  |  |  | KNN |  | 0.7377
 |  |  |  |  |  | Classification tree |  | 0.9352
 |  |  |  |  |  | C4.5 |  | 0.9352
 |  |  |  |  |  | SVM |  | 0.9198
 |  |  |  |  |  | Random forest |  | 0.9352
In the original table, bold values represent the highest accuracy in the relevant paper.
2.2. ML Classifier with Highest Accuracy. The machine learning algorithms with the highest accuracy in the literature reviewed above are listed in Table 4 and Figure 3.
3. Proposed Methodology
The proposed hybrid model is implemented in Python with pandas, sklearn, Matplotlib, Plotly, and other essential libraries. We downloaded the CKD dataset from the UCI repository. The dataset contains two classes: CKD, represented by 1, and non-CKD, represented by 0. The machine learning algorithms with the best accuracy are selected for analysis and implementation so that the results can be reproduced. We have also developed a hybrid model based on the knowledge gained during the analysis and implementation. The hybrid model consists of Gaussian Naïve Bayes, gradient boosting, and decision tree as base classifiers and random forest as the meta-classifier. We selected tree-based machine learning algorithms to achieve the highest accuracy while also handling the overfitting problem. In this paper, we detect outliers with violin plots, as shown in Figure 4. To address overfitting, we implement the k-fold technique and design our model so that it reduces overfitting while achieving the highest accuracy. The classifiers are discussed below.
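The k-fold technique mentioned above can be sketched with scikit-learn as follows. This is a minimal illustration, not the paper's code: it assumes a synthetic dataset from `make_classification` in place of the UCI CKD data, uses a single decision tree rather than the hybrid model, and picks 10 stratified folds as an assumption. Comparing the training accuracy with the mean cross-validated accuracy gives a rough indication of overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CKD data (assumption: 400 rows, some label noise).
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           flip_y=0.05, random_state=0)

model = DecisionTreeClassifier(random_state=0)

# Each of the 10 folds is held out once while the model trains on the other 9.
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")

# Accuracy on the data the model was fitted on, for comparison.
train_accuracy = model.fit(X, y).score(X, y)
gap = train_accuracy - cv_scores.mean()

print("train accuracy:", train_accuracy)
print("mean 10-fold accuracy:", cv_scores.mean())
print("gap (larger suggests more overfitting):", gap)
```

A fully grown tree usually fits its training data almost perfectly, so a large gap between the two numbers is the signal to look for, rather than the training accuracy alone.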
3.1. Naïve Bayes (NB). The NB classifier belongs to the family of probabilistic classifiers and is built on Bayes' theorem. It assumes strong independence between the features, and this assumption is central to how the classifier makes predictions. It is easy to build and is well suited to predicting different diseases in the medical field [15].
3.2. Decision Tree (DT). The decision tree classifier has a tree-like or flowchart-like structure. It consists of branches, leaf (child) nodes, and a root (parent) node. The internal nodes represent tests on the features, whereas the branches represent the outcome of each test at each node. The decision tree is one of the most commonly used classifiers because it requires neither extensive domain knowledge nor parameter setting to work [15].
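To make the idea of a test at an internal node concrete, the sketch below scores candidate thresholds on a single numeric feature by information gain and picks the best one. Information gain is only one common splitting criterion and is an assumption here, since this section does not state which criterion the authors' decision tree used. The haemoglobin-like values and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Try each distinct value except the largest as a threshold; keep the best."""
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Hypothetical feature values with class labels (1 = CKD, 0 = non-CKD).
hemo = [9.5, 10.2, 11.0, 12.8, 13.5, 14.1, 15.0, 15.6]
label = [1, 1, 1, 1, 0, 0, 0, 0]

threshold = best_threshold(hemo, label)
gain = information_gain(hemo, label, threshold)
print("split on hemo <=", threshold, "gain:", gain)
```

A full tree repeats this search over every feature at every node and recurses into the two resulting subsets until a stopping rule is met.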
3.3. Random Forest (RF). Among ensemble and stacking classification approaches, random forest (RF) is one of the most effective machine learning algorithms. It is used for prediction and probability estimation. A random forest classifier consists of many decision trees. Tin Kam Ho of Bell Labs introduced the concept of random forest in 1995; each decision tree casts a vote to determine the object's class. The RF method combines bagging with random selection of attributes. The random forest classifier has three tunable hyperparameters [16]:
(i) Number of decision trees (ntree) used by the random forest classifier
(ii) Minimum size of a node in the trees
(iii) Number of attributes considered when splitting each node of each tree (mtry), where m is the number of attributes
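The three hyperparameters listed above have close counterparts in scikit-learn's `RandomForestClassifier`. The mapping in the comments below (ntree to `n_estimators`, minimum node size to `min_samples_leaf`, mtry to `max_features`) is an interpretation made for this sketch, and the values and the synthetic dataset are illustrative assumptions rather than the paper's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the CKD data (assumption).
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,    # (i) number of decision trees, "ntree"
    min_samples_leaf=2,  # (ii) minimum number of samples in a leaf node
    max_features=3,      # (iii) attributes considered at each split, "mtry"
    random_state=0,
)
forest.fit(X, y)

# Class predictions for the first five rows.
predictions = forest.predict(X[:5])
print(len(forest.estimators_), "trees; predictions:", predictions)
```

Note that scikit-learn combines the trees by averaging their predicted class probabilities rather than by counting hard votes, which is a slight variation on the voting scheme described in the text.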
Table 4: Machine learning algorithms and classifiers.
Articles | Classifiers | Highest accuracy (%)
1 | Decision tree | 91
2 | SVM | 98.3
3 | GB | 99.80
4 | MLP | 99.5
5 | Random forest | 99.84
6 | Random forest | 99.16
7 | Random forest | 99.75
8 | GNB | 100
 | Decision tree | 100
9 | Random forest | 97.12
10 | Random forest | 98.93
In the original table, bold values represent the highest accuracy in the literature.
Figure 3: Comparison of machine learning classifiers. [Chart titled "Highest Accuracy": decision tree 91%, SVM 98.30%, GB 99.80%, MLP 99.50%, random forest 99.84%, random forest 99.16%, random forest 99.75%, GNB 100%, decision tree 100%, random forest 97.12%, random forest 98.93%.]
Figure 4 (a): [Violin plots of the attributes age, bp, sg, al, su, bgr, sg, and bu, each grouped by class (0 and 1).]
Some of the advantages of the random forest classifier are listed as follows.
(i) For ensemble learning algorithms, the random forest is the most appropriate choice
(ii) For large datasets, the random forest classifier performs well
(iii) Random forest (RF) is able to handle hundreds of input attributes
(iv) Random forest can estimate which attributes are more important in classification
(v) Missing values can be handled by using the random forest classifier
(vi) Random forest can balance the error across classes in unbalanced datasets
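Advantage (iv), estimating which attributes matter most, can be illustrated with the `feature_importances_` attribute of a fitted scikit-learn forest. This is a sketch on synthetic data: the feature names are hypothetical placeholders, and impurity-based importance is only one of several ways to rank attributes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (assumption); with shuffle=False the first three columns
# are the informative ones and the rest are noise.
X, y = make_classification(n_samples=400, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances: one non-negative score per feature, summing to 1.
importances = forest.feature_importances_
ranking = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)

for name, score in ranking:
    print(f"{name}: {score:.3f}")
```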
(b) Violin plots by class (0.0, 1.0); panels as labeled: wc, rc, pcv, sod, hemo, pot.
Figure 4: Violin plot of attributes.
3.4. Gaussian Naïve Bayes (GNB). Gaussian Naïve Bayes
(GNB) calculates the mean and standard deviation of each
attribute at the training stage. These means and standard
deviations are then used to calculate the probabilities for
the test data. Because the model relies only on these statistics,
attribute values that lie far above or far below the calculated
mean are poorly represented. This affects the classifier's
performance when test data patterns contain such attribute
values, and the classifier sometimes assigns wrong output labels [22].
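The Gaussian Naïve Bayes training and scoring steps described above can be sketched as follows. This is a minimal, standard-library illustration, not the implementation used in this study; the function names (`fit_gnb`, `log_gaussian`, `predict_gnb`), the toy values, and the small variance floor are assumptions made for the example.

```python
import math
from collections import defaultdict


def fit_gnb(rows, labels):
    """Estimate per-class priors and per-attribute mean/variance."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    model = {}
    for label, members in by_class.items():
        n = len(members)
        means = [sum(col) / n for col in zip(*members)]
        # Population variance with a small floor (assumed) to avoid division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(zip(*members), means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_gnb(model, row):
    """Return the class with the highest log-posterior (up to a constant)."""
    scores = {
        label: math.log(prior)
        + sum(log_gaussian(x, m, v) for x, m, v in zip(row, means, variances))
        for label, (prior, means, variances) in model.items()
    }
    return max(scores, key=scores.get)


# Toy two-attribute data (hypothetical values, not taken from the CKD dataset).
train_rows = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.8, 4.0]]
train_labels = [0, 0, 0, 1, 1, 1]

model = fit_gnb(train_rows, train_labels)
print(predict_gnb(model, [1.1, 2.0]))
print(predict_gnb(model, [2.9, 4.0]))
```

A value far from a class mean, relative to that class's standard deviation, receives a very small density. This is how outlying attribute values can push the prediction toward the wrong label.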
3.5. Hybrid Model. We use the concept of stacking for our
hybrid model. Stacking is an ensemble technique in which
multiple classification models are combined through a main
(meta) classifier. The models are arranged in layers placed one
after the other: each model passes its predictions upward, and
the model in the uppermost layer makes its decision on the basis
of the combined outputs of the base models. The models
in the lower layer receive attributes from the original data as input.
The topmost layer receives the output of the lower
layers and produces the final prediction. The stacking
technique thus uses multiple independent machine-learning
models to process the original data. After
that, the meta-classifier is used to predict from the input along with
the output of each machine learning model, and the weights of the
individual algorithms are estimated. The algorithms that
perform best are selected, and those with low performance
are removed. In this technique, multiple classifiers built with
different machine learning algorithms are trained on the same
dataset as base models and are then combined
through the use of a meta-classifier [23]. Figure 5 shows the
flow diagram for the proposed hybrid model.
The execution of the model, with the sequence of
steps, is given below:
(i) Collect the CKD data from the UCI repository
(ii) Perform exploratory data analysis (EDA) on
the dataset
(iii) Split the dataset into two parts: test data and
train data
(iv) Apply 10-fold cross-validation
(v) Train the base models Gaussian Naïve Bayes, gradient
boosting, and decision tree on the train set, giving
the predictions M1, M2, and M3, respectively
(vi) Use the outputs of the base models (M1, M2, and M3),
together with the test set data, as input for training
the random forest
(vii) Once the random forest is trained, it gives its
prediction on the basis of the training dataset and the
output predictions of the base models
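The steps above can be sketched with scikit-learn's `StackingClassifier`. This is an illustrative sketch under stated assumptions, not the authors' code: it uses synthetic data from `make_classification` in place of the UCI CKD dataset, default hyperparameters, and `cv=10` inside the stacker so that the random forest meta-classifier is trained on out-of-fold predictions of the base models. Unlike step (vi), the sketch keeps the test set out of meta-classifier training and uses it only for the final accuracy estimate.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the data: 400 instances, 14 numeric attributes (assumed).
X, y = make_classification(
    n_samples=400, n_features=14, n_informative=6, random_state=0
)

# 80/20 train/test split, stratified so both parts keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Base models corresponding to M1, M2, and M3.
base_models = [
    ("gnb", GaussianNB()),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
]

# Random forest as meta-classifier; passthrough=True also feeds it the
# original attributes alongside the base-model predictions.
hybrid = StackingClassifier(
    estimators=base_models,
    final_estimator=RandomForestClassifier(random_state=0),
    cv=10,
    passthrough=True,
)
hybrid.fit(X_train, y_train)

predictions = hybrid.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"hybrid accuracy on held-out data: {accuracy:.3f}")
```

The accuracy printed here describes the synthetic data only and says nothing about the CKD results reported in this paper.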
In this study, we have considered the UCI CKD dataset,
which is split into two parts. 80% of the data is used for
training as input to the machine learning algorithms.
We used Gaussian Naïve Bayes, gradient
boosting, decision tree, and a stacking classifier with the random
forest algorithm to predict chronic kidney disease on the
remaining 20% test data, then plotted the
predicted values and compared them. Our proposed
methodology has the following advantages.
(i) We implemented four machine learning algorithms:
decision tree, gradient boosting, Gaussian
Naïve Bayes, and random forest. We applied a
stacking classifier to build the hybrid model that
combines these four algorithms.
(ii) We analyzed the accuracy of different machine
learning algorithms on the same dataset
and compared their accuracy scores to find the
best model
(iii) We implemented a stacking classifier technique to
build a new model with improved accuracy
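The 80/20 split and the 10-fold cross-validation mentioned above can be illustrated with plain index arithmetic. The sketch below assumes 400 instances, a fixed shuffle seed, and folds drawn from the training part only; the helper name `k_fold_indices` is hypothetical, and the study does not state how its folds were formed.

```python
import random


def k_fold_indices(indices, k):
    """Yield (train, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


n_instances = 400  # number of instances in the dataset
indices = list(range(n_instances))
random.Random(42).shuffle(indices)  # fixed seed (assumed) for reproducibility

# 80% of the instances for training, the remaining 20% for testing.
cut = n_instances * 80 // 100
train_idx, test_idx = indices[:cut], indices[cut:]

# 10-fold cross-validation over the training part only.
folds = list(k_fold_indices(train_idx, 10))

print(len(train_idx), len(test_idx))  # 320 80
print(len(folds), len(folds[0][1]))   # 10 32
```

With 80 test instances, each misclassified instance changes the test accuracy by 1.25 percentage points.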
4. Dataset Details
We selected 14 attributes from the UCI repository chronic
kidney disease dataset as input features, as shown in Table 5.
Here age is the patient's age, bp indicates the blood
pressure, sg indicates the specific gravity of the urine, al
indicates the level of albumin in the patient's urine, bgr
(blood glucose random) indicates the random blood glucose
level, su represents the sugar level, bu indicates
the blood urea, sod indicates the amount of sodium,
sc indicates the serum creatinine, pot indicates the
amount of potassium, hemo indicates the hemoglobin,
and pcv indicates the packed cell volume. Further, wc
indicates the white blood cell count, and rc indicates the
red blood cell count.
To identify the number of chronic kidney disease patients
and the number of healthy individuals, we visualized the
CKD dataset as the histogram plot in Figure 6. Here 0.0
represents the healthy cases, while 1.0 represents the chronic
kidney disease patients. In this dataset, there are 250 chronic
kidney disease patients and 150 healthy people.
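As a point of reference for the accuracy figures reported later, the class counts above imply the class proportions and the accuracy of a trivial classifier that always predicts the majority class. The short sketch below only restates that arithmetic; the dictionary keys are hypothetical labels.

```python
# Class counts reported for the dataset in this section.
counts = {"ckd": 250, "not_ckd": 150}

total = sum(counts.values())
proportions = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline = max(counts.values()) / total

print(total)              # 400
print(majority_baseline)  # 0.625
```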
The Pearson correlation feature selection method is used
to get the best combination of features for the prediction of
chronic kidney disease. The correlation of the 14 attributes
and 1 output label is presented in Figure 7.
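For reference, the Pearson correlation coefficient between an attribute x and the label y is r = cov(x, y) / (σ_x σ_y). The sketch below computes it with the standard library and keeps the attributes whose absolute correlation with the label reaches a threshold. The toy values, the threshold of 0.5, and the helper names (`pearson`, `select_features`) are assumptions for illustration; the study does not report the cutoff it used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, threshold):
    """Keep attributes whose |r| with the label is at least the threshold."""
    scores = {name: pearson(values, labels) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores


# Hypothetical toy values; not taken from the CKD dataset.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "attr_a": [15.0, 14.5, 16.0, 9.0, 8.5, 10.0],  # strongly (negatively) related
    "attr_b": [1.0, 3.0, 2.0, 2.0, 1.0, 3.0],      # unrelated to the label
}

kept, scores = select_features(columns, labels, threshold=0.5)
print(kept)  # ['attr_a']
```

The absolute value is used so that strongly negative correlations, such as an attribute that is lower in patients than in healthy cases, are kept as well.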
Moving from the exploratory data analysis stage to
the pair plot visualization proved very helpful, as the
pair plot can be used to find the relationships
between attributes for both categorical and continuous
variables. We use the Seaborn library to generate the pair plot.
It presents the information about all the attributes clearly in
one picture, and the statistical information is represented in
an accessible format, as shown in Figure 8.
Violin plots are used in exploratory data analysis for all
the attributes that are used in the hybrid model.
These give additional useful information such as the density
trace and the distribution of the data. Violin plots show the
shape of the distribution across the whole range of the data,
which a box plot does not show.
The violin plots of all 14 attributes are given in Figure 4.
Figure 9 shows the comparison of the accuracy scores of the
different models in the form of a chart.
Figure 5: Flowchart for the proposed model. (Flowchart nodes: Start; CKD dataset; exploratory data analysis; data preprocessing; data splitting into train set and test set; cross-validation; implementing base models; trained base models M1, M2, and M3; random forest; final model; prediction: ckd or not-ckd.)

Table 5: The attribute set with their data types.

| # | Attribute | Full form | Data type | Non-empty values | Missing values |
|---|---|---|---|---|---|
| 0 | age | Age | float64 | 400 | 0 |
| 1 | bp | Blood pressure | float64 | 400 | 0 |
| 2 | sg | Specific gravity of urine | float64 | 400 | 0 |
| 3 | al | Level of albumin | float64 | 400 | 0 |
| 4 | su | Sugar level | float64 | 400 | 0 |
| 5 | bgr | Blood glucose random | float64 | 400 | 0 |
| 6 | bu | Blood urea | float64 | 400 | 0 |
| 7 | sc | Serum creatinine | float64 | 400 | 0 |
| 8 | sod | Amount of sodium | float64 | 400 | 0 |
| 9 | pot | Amount of potassium | float64 | 400 | 0 |
| 10 | hemo | Hemoglobin | float64 | 400 | 0 |
| 11 | pcv | Packed cell volume | float64 | 400 | 0 |
| 12 | wc | White blood cell count | float64 | 400 | 0 |
| 13 | rc | Red blood cell count | float64 | 400 | 0 |

5. Results and Discussion
Machine learning algorithms such as gradient boosting,
Gaussian Naïve Bayes, decision tree, and random forest
classifier were used in the proposed hybrid model. These
different machine learning classifiers were used in combination
for chronic kidney disease prediction. This also
overcomes the overfitting problem and results in higher
accuracy. To improve accuracy and to offer
a novel approach compared with the existing work, we have
implemented the proposed hybrid model with the best
combination of GB, GNB, and decision tree, along with the
random forest classifier [24–27]. The results described in
Table 6 show that diagnosis of chronic kidney disease is
effective when the random forest is combined with the base
models through stacking in the hybrid model. Gradient boosting
achieves 99% accuracy, random forest achieves 98% accuracy,
and our hybrid model achieves 100% accuracy while, at
the same time, reducing the chances of overfitting.
To identify regional contributions to the development of
prediction models for chronic kidney disease, a region-wise
analysis was performed. As discussed in the Introduction
section, the populations of developing countries suffer
more from chronic kidney disease, and it was observed that most
of the research work is performed in developing countries. A
summary of this region-wise contribution is presented in
Figure 10.
Figure 6: Histogram plot (count of instances per class; x-axis: class, 0.0 and 1.0; y-axis: count, 0 to 250).
Figure 7: Heat map of chosen attributes (pairwise correlations among age, bp, sg, al, su, bgr, bu, sc, sod, pot, hemo, pcv, wc, rc, and class; color scale from −0.6 to 1.0).
Figure 8: Pair plot of each attribute.
Figure 9: Accuracy score of implemented machine learning classifiers.

Table 6: Accuracy score of implemented machine learning classifiers.

| ML algorithm | Accuracy (%) |
|---|---|
| Gradient boosting | 99 |
| Gaussian Naïve Bayes | 93 |
| Decision tree | 96 |
| Random forest | 98 |
| Hybrid model | 100 |
Figure 10: Region-wise contributions (Asia 50%, Europe 20%, America 20%, Africa 10%).
6. Conclusion
Chronic kidney disease is considered one of the prominent
life-threatening diseases in the developing world. The
most obvious cause seems to be lack of physical exercise.
Medical practitioners use a number of diagnostic processes
and procedures, of which machine learning is a recent
development. In this paper, we have selected machine learning
because, in terms of accuracy, it performs better than
other available approaches. We have used
the Pearson correlation feature selection method and applied
the selected features to machine learning classifiers. GB, GNB,
and decision tree are the base classifiers for the stacking
algorithm, with random forest as the meta-classifier; these are
implemented with cross-validation and compared on the basis
of accuracy score. In this study, we evaluated these algorithms
on the same dataset. Furthermore, we have used the CKD dataset
from the UCI repository, with 14 selected attributes and 400
instances. On the basis of these attributes, our proposed stacking
model is able to predict whether a person is a CKD patient or not
with 100% accuracy. The best features are selected using the
Pearson correlation method, and the stacking algorithm is
implemented with the best machine learning classifiers.
Cross-validation enhances the performance of the stacking
model. As we have worked on binary-class chronic kidney disease
data, the stacking algorithm performs
better with these combinations of algorithms. The stacking
technique can also be applied to the prediction of other
diseases to obtain a better accuracy score.
Data Availability
No data were used to support this study.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
[1] V. Jha, G. Garcia-Garcia, K. Iseki et al., “Chronic kidney
disease: global dimension and perspectives,” The Lancet,
vol. 382, no. 9888, pp. 260–272, 2013.
[2] R. Ruiz-Arenas, “A summary of worldwide national activities
in chronic kidney disease (CKD) testing, the electronic
journal of the international federation of,” Clinical Chemistry
and Laboratory Medicine, vol. 28, no. 4, pp. 302–314, 2017.
[3] Thedailystar, “Over 35,000 develop kidney failure in Bangladesh
every year,” 2019, https://www.thedailystar.net/city/
news/18m-kidney-patients-bangladesh-every-year-1703665.
[4] Prothomalo, “Women more affected by kidney diseases,”
2018, https://en.prothomalo.com/bangladesh/Womenmore-
aected-by-kidney-diseases.
[5] Scottish Intercollegiate Guidelines Network (Sign), Diagnosis
and Management of Chronic Kidney Disease: A National
Clinical Guideline, SIGN, Victoria, Australia, 2008.
[6] M. Kavitha, G. Gnaneswar, R. Dinesh, Y. R. Sai, and
R. S. Suraj, “Heart disease prediction using hybrid machine
learning model,” in Proceedings of the 2021 6th International
Conference on Inventive Computation Technologies (ICICT),
Coimbatore, India, January 2021.
[7] P. Ghosh, F. M. Javed Mehedi Shamrat, S. Shultana, S. Afrin,
A. A. Anjum, and A. A. Khan, “Optimization of prediction
method of chronic kidney disease using machine learning
algorithm,” in Proceedings of the 2020 15th International Joint
Symposium on Artificial Intelligence and Natural Language
Processing (iSAI-NLP), Bangkok, Thailand, November 2020.
[8] K. R. A. Padmanaban and G. Parthiban, “Applying machine
learning techniques for predicting the risk of chronic kidney
disease,” Indian Journal of Science and Technology, vol. 9,
no. 29, 2016.
[9] A. Charleonnan, T. Fufaung, T. Niyomwong,
W. Chokchueypattanakit, S. Suwannawach, and
N. Ninchawee, “Predictive analytics for chronic kidney dis-
ease using machine learning techniques,” in Proceedings of the
2016 Management and Innovation Technology International
Conference (MITicon), Bang-San, Thailand, October 2016.
[10] G.-S. Fu, Y. Levin-Schwartz, Q.-H. Lin, and D. Zhang,
“Machine learning for medical imaging,” Journal of healthcare
engineering, vol. 2019, pp. 1-2, 2019.
[11] R. Devika, S. V. Avilala, and V. Subramaniyaswamy,
“Comparative study of classifier for chronic kidney disease
prediction using naive Bayes, KNN and random forest,” in
Proceedings of the 2019 3rd International Conference on
Computing Methodologies and Communication (ICCMC),
Erode, India, March 2019.
[12] S. Revathy, B. Bharathi, P. Jeyanthi, and M. Ramesh, “Chronic
kidney disease prediction using machine learning models,”
International Journal of Engineering and Advanced Technol-
ogy, vol. 9, no. 1, pp. 6364–6367, 2019.
[13] A. S. A. Rabby, R. Mamata, M. A. Laboni, Ohidujjaman, and
S. Abujar, “Machine learning applied to kidney disease pre-
diction: comparison study,” in Proceedings of the 2019 10th
International Conference on Computing, Communication and
Networking Technologies (ICCCNT), Kanpur, India, July 2019.
[14] M. Nishat, F. Faisal, R. Dip et al., “A comprehensive analysis
on detecting chronic kidney disease by employing machine
learning algorithms,” EAI Endorsed Transactions on Pervasive
Health and Technology, vol. 7, Article ID 170671, 2018.
[15] S. Pouriyeh, S. Vahid, G. Sannino, G. De Pietro, H. Arabnia,
and J. Gutierrez, “A comprehensive investigation and com-
parison of Machine Learning Techniques in the domain of
heart disease,” in Proceedings of the 2017 IEEE Symposium on
Computers and Communications (ISCC), Heraklion, Greece,
July 2017.
[16] M. A. Jabbar, B. L. Deekshatulu, and P. Chandra, “Intelligent
heart disease prediction system using random forest and
evolutionary approach,” Journal of network and innovative
computing, vol. 4, pp. 175–184, 2016.
[17] Bmc, “Biomedcentral,” 2022.
[18] S. Ramya and N. Radha, “Diagnosis of chronic kidney disease
using machine learning algorithms,” International Journal of
Innovative Research in Computer and Communication Engi-
neering, vol. 4, no. 1, 2016.
[19] M. Kumar, “Prediction of chronic kidney disease using
random forest machine learning algorithm,” International
Journal of Computer Science and Mobile Computing, vol. 5,
pp. 24–33, 2016.
[20] M. S. Basarslan and F. Kayaalp, “Performance analysis of fuzzy
rough set-based and correlation-based attribute selection
methods on detection of chronic kidney disease with various
classifiers,” in Proceedings of the 2019 Scientific Meeting on
Electrical-Electronics and Biomedical Engineering and Com-
puter Science (EBBT), April 2019.
[21] S. K. Dowluru and A. K. Rayavarapu, “Statistical and data
mining aspects on kidney stones: a systematic review and
meta-analysis,” Open Access Scientific Reports, vol. 1, no. 12,
2012.
[22] S. M. M. Hasan, M. A. Mamun, M. P. Uddin, and
M. A. Hossain, “Comparative analysis of classification approaches
for heart disease prediction,” in Proceedings of the
2018 International Conference on Computer, Communication,
Chemical, Material and Electronic Engineering (IC4ME2),
pp. 1–4, Rajshahi, Bangladesh, February 2018.
[23] C. B. C. Latha and S. C. Jeeva, “Improving the accuracy of
prediction of heart disease risk based on ensemble classification
techniques,” Informatics in Medicine Unlocked, vol. 16,
Article ID 100203, 2019.
[24] A. J. Aljaaf, A.-J. Dhiya, H. M. Hussein et al., “Early prediction
of chronic kidney disease using machine learning supported
by predictive analytics,” in Proceedings of the 2018 IEEE
Congress on Evolutionary Computation (CEC), Rio de Janeiro,
Brazil, July 2018.
[25] S. Khan, M. Z. Khan, P. Khan, G. Mehmood, A. Khan, and
M. Fayaz, “An ant-hocnet routing protocol based on opti-
mized fuzzy logic for swarm of UAVs in FANET,” Wireless
Communications and Mobile Computing, vol. 2022, Article ID
6783777, 12 pages, 2022.
[26] M. Fayaz, G. Mehmood, A. Khan, S. Abbas, M. Fayaz, and
J. Gwak, “Counteracting selfish nodes using reputation based
system in mobile ad hoc networks,” Electronics, vol. 11, no. 2,
p. 185, 2022.
[27] M. Z. U. Haq, M. Z. Khan, H. U. Rehman et al., “An adaptive
topology management scheme to maintain network con-
nectivity in Wireless Sensor Networks,” Sensors, vol. 22, no. 8,
p. 2855, 2022.
... Education and backing may disseminate through texting services and online discussions relaying prevention information and encouragement amid lifestyle changes decreasing condition danger. mHealth technologies may also improve care accessibility for those remote or underserved, possibly confronting barriers accessing traditional providers using telemedicine permitting remote supervision and consultation [40]. ...
Article
Full-text available
Background: Glomerulonephritis refers to a range of conditions involving inflammation and injury to the kidneys' glomeruli, often leading to significant morbidity if left untreated. Purpose: This review aims to examine emerging advancements in the prevention and treatment of glomerulonephritis and highlight progress in transforming the prognosis of this spectrum of diseases, while also identifying gaps requiring ongoing effort. Main body: Novel targeted immunotherapies utilizing engineered delivery platforms and biologicals like monoclonal antibodies are progressing in research pipelines, potentially offering safer, more efficacious alternatives to current standard immunosuppression. High-throughput biomarker assays and AI/machine learning algorithms have demonstrated the ability to improve early detection of kidney damage and guide personalized treatment plans. Further prevention opportunities emerge from modulating microbiome-immune interactions, lifestyle factors, and vaccinations shielding against infections triggering renal disorders. Conclusion: Although challenges remain, recent advancements in unraveling the pathogenesis of glomerulonephritis coupled with the emergence of cutting-edge diagnostics and targeted interventions set the stage for a new era combating the risk and progression of this spectrum of diseases.
... Khalid et al. 16 conducted a study that made significant strides in predicting chronic kidney disease (CKD) by developing a hybrid model. This model integrates four machine learning classifiers: Gaussian Naive Bayes, gradient boosting, decision tree, and random forest, and achieved a 100% accuracy in predicting CKD. ...
Article
Full-text available
The air quality index (AQI) is a commonly employed metric for evaluating air quality across diverse locations and temporal spans. Similar to other environmental datasets, AQI data can exhibit outliers data points markedly divergent from the norm, signifying instances of exceptionally favorable or adverse air quality. This becomes crucial in identifying and comprehending severe pollution episodes with far‐reaching environmental and public health implications. This study utilizes air quality data from January 1, 2014, to January 31, 2021, collected at daily intervals in Shanghai City, China, as the experimental dataset. The dataset includes daily AQI measurements, along with six pollutant concentrations: particulate matter (PM2.5 and PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3), and carbon monoxide (CO). Each pollutant's concentration is measured in micrograms per cubic meter (μ$$ \upmu $$g/m 3$$ {}^3 $$). The dataset is then preprocessed by cleaning and normalizing it before using K‐means clustering to discover different patterns. A stacked ensemble machine learning model that incorporates K‐means clustering, random forest (RF) and gradient boosting classifier (GBC) is developed and compared to decision tree, support vector machine, K‐nearest neighbor and Naive Bayes algorithms to evaluate its performance in identifying outliers using accuracy, precision, recall, and F1‐score. The stacked model outperformed all other established models based on the accuracy, precision, recall, and F1‐score of 0.99, 0.99, 0.97, and 0.99, respectively.
... -Current research [9][10][11] on chronic kidney disease focused on machine learning-based classification algorithms, presuming that various risk variables were equally significant. ...
Article
Full-text available
An extensive world population, in particular aged people, is suffering from chronic kidney disease (CKD). Early prediction of CKD is crucial in mitigating disease complications, slowing its progression, and improving patient survival rates. This work analyzed CKD ailment-related issues using three computational approaches: (1) Several statistical methods were investigated to find the relationship between a heterogeneous risk factor and disease. In addition, different significance tests were exercised to classify the risk factors for two classes, with and without CKD. (2) A hybrid statistical approach was followed to identify the most critical risk factors significant to predict CKD. (3) Machine learning techniques were used to predict the onset of chronic kidney disease in terms of the significant risk factors. Several experiments were conducted to substantiate the efficacy of the proposed analysis and prognosis. Proposing a statistical approach that outperforms existing methods to identify the minimum number of significant risk factors and predict CKD using those factors without compromising maximum prediction accuracy strengthens the contribution of the research. Indeed, it incorporates a low-cost approach in the field of affordable healthcare.
Article
Full-text available
Drones or unmanned aircraft are commonly known as unmanned aerial vehicles (UAVs), and the ad hoc network formed by these UAVs is commonly known as Flying Ad Hoc Network (FANET). UAVs and FANET were initially associated with military surveillance and intelligence gathering; moreover, they are now excessively used in civilian roles including search and rescue, traffic monitoring, firefighting, videography, and smart agriculture. However, due to the distinctive architecture, they pose considerable design and deployment challenges, prominently related to routing protocols, as the traditional routing protocols cannot be used directly in FANET. For instance, due to high mobility and sparse topology, frequent link breakage and route maintenance incur high overhead and latency. In this paper, we employ the bio-inspired Ant Colony Optimization (ACO) algorithm called "Ant-Hocnet" based on optimized fuzzy logic to improve routing in FANET. Fuzzy logic is used to analyze the information about the status of the wireless links, such as available bandwidth, node mobility, and link quality, and calculate the best wireless links without a mathematical model. To evaluate and compare our design, we implemented it in the MATLAB simulator. The results show that our approach offers improvements in throughput and end-to-end delays, hence enhancing the reliability and efficiency of the FANET.
Article
Full-text available
The roots of Wireless Sensor Networks (WSNs) are tracked back to US military developments, and, currently, WSNs have paved their way into a vast domain of civil applications, especially environmental, critical infrastructure, habitat monitoring, etc. In the majority of these applications, WSNs have been deployed to monitor critical and inaccessible terrains; however, due to their unique and resource-constrained nature, WSNs face many design and deployment challenges in these difficult-to-access working environments, including connectivity maintenance, topology management, reliability, etc. However, for WSNs, topology management and connectivity still remain a major concern in WSNs that hampers their operations, with a direct impact on the overall application performance of WSNs. To address this issue, in this paper, we propose a new topology management and connectivity maintenance scheme called a Tolerating Fault and Maintaining Network Connectivity using Array Antenna (ToMaCAA) for WSNs. ToMaCAA is a system designed to adapt to dynamic structures and maintain network connectivity while consuming fewer network resources. Thereafter, we incorporated a Phase Array Antenna into the existing topology management technologies, proving ToMaCAA to be a novel contribution. This new approach allows a node to connect to the farthest node in the network while conserving resources and energy. Moreover, data transmission is restricted to one route, reducing overheads and conserving energy in various other nodes’ idle listening state. For the implementation of ToMaCAA, the MATLAB network simulation platform has been used to test and analyse its performance. The output results were compared with the benchmark schemes, i.e., Disjoint Path Vector (DPV), Adaptive Disjoint Path Vector (ADPV), and Pickup Non-Critical Node Based k-Connectivity (PINC). 
The performance of ToMaCAA was evaluated based on different performance metrics, i.e., the network lifetime, total number of transmitted messages, and node failure in WSNs. The output results revealed that the ToMaCAA outperformed the DPV, ADPV, and PINC schemes in terms of maintaining network connectivity during link failures and made the network more fault-tolerant and reliable.
Article
Full-text available
A mobile ad hoc network (MANET) is a group of nodes constituting a network of mobile nodes without predefined and pre-established architecture where mobile nodes can communicate without any dedicated access points or base stations. In MANETs, a node may act as a host as well as a router. Nodes in the network can send and receive packets through intermediate nodes. However, the existence of malicious and selfish nodes in MANETs severely degrades network performance. The identification of such nodes in the network and their isolation from the network is a challenging problem. Therefore, in this paper, a simple reputation-based scheme is proposed which uses the consumption and contribution information for selfish node detection and cooperation enforcement. Nodes failing to cooperate are detached from the network to save resources of other nodes with good reputation. The simulation results show that our proposed scheme outperforms the benchmark scheme in terms of NRL (normalized routing load), PDF (packet delivery fraction), and packet drop in the presence of malicious and selfish attacks. Furthermore, our scheme identifies the selfish nodes quickly and accurately as compared to the benchmark scheme.
Article
Full-text available
Chronic Kidney Disease refers to the slow, progressive deterioration of kidney functions. However, the impairment is irreversible and imperceptible up until the disease reaches one of the later stages, demanding early detection and initiation of treatment in order to ensure a good prognosis and prolonged life. In this aspect, machine learning algorithms have proven to be promising, and points towards the future of disease diagnosis. We aim to apply different machine learning algorithms for the purpose of assessing and comparing their accuracies and other performance parameters for the detection of chronic kidney disease. The 'chronic kidney disease dataset' from the machine learning repository of University of California, Irvine, has been harnessed, and eight supervised machine learning models have been developed by utilizing the python programming language for the detection of the disease. A comparative analysis is portrayed among eight machine learning models by evaluating different performance parameters like accuracy, precision, sensitivity, F1 score and ROC-AUC. Among the models, Random Forest displayed the highest accuracy of 99.75%. We observed that machine learning algorithms can contribute significantly to the domain of predictive analysis of chronic kidney disease, and can assist in developing a robust computer-aided diagnosis system to aid the healthcare professionals in treating the patients properly and efficiently.
Preprint
Full-text available
Chronic Kidney disease (CKD), a slow and late-diagnosed disease, is one of the most important problems of mortality rate in the medical sector nowadays. Based on this critical issue, a significant number of men and women are now suffering due to the lack of early screening systems and appropriate care each year. However, patients' lives can be saved with the fast detection of disease in the earliest stage. In addition, the evaluation process of machine learning algorithm can detect the stage of this deadly disease much quicker with a reliable dataset. In this paper, the overall study has been implemented based on four reliable approaches, such as Support Vector Machine (henceforth SVM), AdaBoost (henceforth AB), Linear Discriminant Analysis (henceforth LDA), and Gradient Boosting (henceforth GB) to get highly accurate results of prediction. These algorithms are implemented on an online dataset of UCI machine learning repository. The highest predictable accuracy is obtained from Gradient Boosting (GB) Classifiers which is about to 99.80% accuracy. Later, different performance evaluation metrics have also been displayed to show appropriate outcomes. To end with, the most efficient and optimized algorithms for the proposed job can be selected depending on these benchmarks.
Article
Full-text available
: The field of biosciences have advanced to a larger extent and have generated large amounts of information from Electronic Health Records. This have given rise to the acute need of knowledge generation from this enormous amount of data. Data mining methods and machine learning play a major role in this aspect of biosciences. Chronic Kidney Disease(CKD) is a condition in which the kidneys are damaged and cannot filter blood as they always do. A family history of kidney diseases or failure, high blood pressure, type 2 diabetes may lead to CKD. This is a lasting damage to the kidney and chances of getting worser by time is high. The very common complications that results due to a kidney failure are heart diseases, anemia, bone diseases, high potasium and calcium. The worst case situation leads to complete kidney failure and necessitates kidney transplant to live. An early detection of CKD can improve the quality of life to a greater extent. This calls for good prediction algorithm to predict CKD at an earlier stage. Literature shows a wide range of machine learning algorithms employed for the prediction of CKD. This paper uses data preprocessing,data transformation and various classifiers to predict CKD and also proposes best Prediction framework for CKD. The results of the framework show promising results of better prediction at an early stage of CKD
Conference Paper
Full-text available
Machine learning has earned a remarkable position in the healthcare sector because of its capability to enhance disease prediction in the healthcare sector. Artificial intelligence and Machine learning techniques are being used in the healthcare sector. Nowadays, one of the world's crucial health-related problem is kidney disease. It is increasing day by day because of not maintaining proper food habits, drinking less amount of water and lack of health consciousness. So we need some technique that will continuously monitor health conditions effectively. Here, we have proposed an approach for real-time kidney disease prediction, monitoring, and application (KDPMA). Our aim is to find an optimized and efficient machine learning (ML) technique that can effectively recognize and predict the condition of chronic kidney disease. In this work, we used ten most popular machine learning technique to predict kidney disease. In this process, the data has been divided into two sections. In one section train dataset got trained and another section got evaluated by test dataset. The analysis results show that the Decision Tree Classifier and Gaussian Naive Bayes achieved the highest performance than the other classifiers, obtaining the accuracy score of 100% and 1 recall(Sensitivity) score. Now we are developing a mobile applications based on the best output results classifier technique to predict Kidney Disease from the patient reports.
Article
Diabetes and high blood pressure are the primary causes of Chronic Kidney Disease (CKD). A person with CKD has a higher chance of dying young. Doctors face a difficult task in diagnosing the different diseases linked to CKD at an early stage to prevent the disease. Early detection of CKD gives patients the opportunity to receive treatment that can slow the progression of the disease. CKD is among the top 20 causes of death worldwide and affects approximately 10% of the world's adult population. CKD is a disorder that disrupts normal kidney function. The novelty of this study lies in developing a diagnosis system to detect chronic kidney disease. The study evaluated a dataset collected from 400 patients containing 24 features. Missing numerical values were replaced with the mean and missing nominal values with the mode. Recursive Feature Elimination (RFE) was applied to choose the most important features. Three classification algorithms were applied: k-nearest neighbors (KNN), Random Forest Classifier (RFC), and AdaBoost Classifier (ABC). All three achieved promising performance. The RFC and ABC algorithms performed best, reaching 100% for accuracy, precision, recall, and F1-score. Machine learning techniques are therefore of great importance in the early detection of CKD, and they support experts and doctors in early diagnosis so that progression to kidney failure can be avoided.
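The mean and mode replacement step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `impute_mean_mode`, the column names, and the toy records are all hypothetical, and only the Python standard library is assumed.

```python
from collections import Counter
from statistics import mean


def impute_mean_mode(rows, numeric_cols, nominal_cols):
    """Return a copy of rows where None is replaced by the column
    mean (numeric columns) or the column mode (nominal columns)."""
    fill = {}
    for col in numeric_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        fill[col] = mean(observed)
    for col in nominal_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        # most_common(1) returns [(value, count)] for the most frequent value
        fill[col] = Counter(observed).most_common(1)[0][0]
    # Build new dicts so the input records are left unchanged
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
        for r in rows
    ]


# Hypothetical toy records; None marks a missing value
patients = [
    {"age": 48, "bp": 80, "htn": "yes"},
    {"age": None, "bp": 70, "htn": "no"},
    {"age": 62, "bp": None, "htn": "yes"},
    {"age": 52, "bp": 90, "htn": None},
]

filled = impute_mean_mode(patients, ["age", "bp"], ["htn"])
```

In a real pipeline the fill values would normally be computed on the training split only and then reused on the test split, so that no information from the test data leaks into preprocessing.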
Conference Paper
Heart disease causes significant mortality around the world and has become a health threat for many people. Early prediction of heart disease may save many lives, but detecting cardiovascular diseases such as heart attacks and coronary artery disease through routine clinical data analysis is a critical challenge. Machine learning (ML) can provide an effective solution for decision making and accurate prediction, and the medical industry is rapidly adopting machine learning techniques. This work proposes a novel machine learning approach to predict heart disease. The study used the Cleveland heart disease dataset and applied data mining techniques such as regression and classification. Three machine learning models were implemented: (1) Random Forest, (2) Decision Tree, and (3) a hybrid model combining Random Forest and Decision Tree. Experimental results show that the heart disease prediction model reaches an accuracy of 88.7% with the hybrid model. An interface was designed to take the user's input parameters and predict heart disease using the hybrid model of Decision Tree and Random Forest.
Conference Paper
Chronic Kidney Disease (CKD), a slowly progressing and often late-diagnosed disease, is one of the most important contributors to mortality in the medical sector today. Each year, a significant number of men and women suffer because of the lack of early screening systems and appropriate care. However, patients' lives can be saved if the disease is detected quickly at its earliest stage. With a reliable dataset, machine learning algorithms can detect the stage of this deadly disease much more quickly. In this paper, the study is implemented using four approaches to obtain highly accurate predictions: Support Vector Machine (henceforth SVM), AdaBoost (henceforth AB), Linear Discriminant Analysis (henceforth LDA), and Gradient Boosting (henceforth GB). These algorithms are applied to an online dataset from the UCI Machine Learning Repository. The highest prediction accuracy, about 99.80%, is obtained with the Gradient Boosting (GB) classifier. Different performance evaluation metrics are also reported to show the outcomes. Finally, the most efficient and optimized algorithm for the task can be selected on the basis of these benchmarks.