
Machine Learning Hybrid Model for the Prediction of Chronic Kidney Disease

March 2023
Computational Intelligence and Neuroscience 2023(4)

DOI:10.1155/2023/9266889

License
CC BY

Authors:

Muhammad Zahid Khan

University of Malakand

Gulzar Mehmood

Iqra National University

To diagnose an illness, doctors typically conduct physical exams and review the patient's medical history, followed by diagnostic tests and procedures to determine the underlying cause of symptoms. Chronic kidney disease (CKD) is currently a leading cause of death, with a rapidly increasing number of patients, resulting in 1.7 million deaths annually. While various diagnostic methods are available, this study utilizes machine learning due to its high accuracy. We use a hybrid technique to build our proposed model, with Pearson correlation for feature selection. In the first step, the best models were selected on the basis of critical literature analysis. In the second step, a combination of these models is used in our proposed hybrid model. Gaussian Naïve Bayes, gradient boosting, and decision tree classifiers are used as base classifiers, and the random forest classifier is used as the meta-classifier in the proposed hybrid model. The objective of this study is to evaluate the best machine learning classification techniques and identify the best classifier in terms of accuracy. This provides a solution for overfitting and achieves the highest accuracy. It also highlights some of the challenges that affect performance. In this study, we critically review the existing machine learning classification techniques. We evaluate them in terms of accuracy, and a comprehensive analytical evaluation of the related work is presented in tabular form. In the implementation, we have used the top four models and built a hybrid model using the UCI chronic kidney disease dataset for prediction. Gradient boosting achieves around 99% accuracy, random forest achieves 98%, and the decision tree classifier achieves 96% accuracy; our proposed hybrid model performs best, reaching 100% accuracy on the same dataset. Some of the main machine learning algorithms used to predict the occurrence of CKD are Naïve Bayes, decision tree, K-nearest neighbor, random forest, support vector machine, linear discriminant analysis (LDA), gradient boosting (GB), and neural networks. In this study, we apply gradient boosting, Gaussian Naïve Bayes, and decision tree along with random forest on the same set of features and compare the accuracy scores.

Research Article

Machine Learning Hybrid Model for the Prediction of Chronic Kidney Disease

Hira Khalid, Ajab Khan, Muhammad Zahid Khan, Gulzar Mehmood, and Muhammad Shuaib Qureshi

Department of Information Technology, Abbottabad University of Science and Technology, Havelian 22500, Abbottabad, Pakistan

Department of Computer Science and I.T, Network Systems and Security Research Group, University of Malakand, Chakdara 18800, Khyber Pakhtunkhwa, Pakistan

Department of Computer Science, IQRA National University, Swat Campus 19220, Peshawar, Pakistan

Department of Computer Science, School of Arts and Sciences, University of Central Asia, Bishkek, Kyrgyzstan

Correspondence should be addressed to Muhammad Shuaib Qureshi; muhammad.qureshi@ucentralasia.org

Received 25 July 2022; Revised 6 September 2022; Accepted 19 September 2022; Published 14 March 2023

Academic Editor: Farman Ali

This is an open access article distributed under the Creative Commons Attribution License (CC BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Hindawi, Computational Intelligence and Neuroscience, Volume 2023, Article ID 9266889, 14 pages, https://doi.org/10.1155/2023/9266889

1. Introduction

Chronic kidney disease (CKD) is a rapidly growing disease, and millions of people die due to a lack of timely, affordable treatment. Chronic kidney disease patients predominantly belong to low-income and middle-income countries [1, 2].

In 2013, about one million people died due to chronic kidney disease [3]. The developing world suffers more from chronic kidney disease: low-income and middle-income countries account for a total of 387.5 million CKD patients, of whom 177.4 million are male and 210.1 million are female [4]. These figures show that a large number of people in developing countries suffer from chronic kidney disease, and this number is increasing day by day. A lot of work has been done on the early diagnosis of chronic kidney disease so that the disease can be treated at an early stage. In this article, we focus on machine learning prediction models for chronic kidney disease, with particular attention to accuracy.

Chronic kidney disease is a common type of kidney disease that occurs when both kidneys are damaged, and CKD patients suffer from this condition over the long term. Here, the term kidney damage means any kidney condition that can cause improper functioning of the kidney. This could be caused by a disorder or by a reduction in essential functions such as the glomerular filtration rate (GFR) [5]. Our proposed prediction model takes the clinical symptoms as input and predicts the results using a stacking classifier with the random forest algorithm as the meta-classifier.

Machine learning is gaining significance in healthcare diagnosis as it enables intricate analysis, thereby minimizing human errors and enhancing the precision of predictions. Machine learning algorithms and classifiers are now considered among the most reliable techniques for the diagnosis of different diseases such as heart disease, diabetes, tumors, and liver disease [6].

Dierent machine learning algorithms used the Na¨

ıve Bayes,

SVM, and the decision tree for the classication purpose, while

random forest, logistic regression, and linear regression were

used for the regression purpose in the medical elds for the

prediction. With the ecient use of these algorithms, the death

rate can be minimized due to early-stage diagnosis and patients

can be treated timely. Along with maintaining the clinical

symptoms, chronic kidney disease patients should include

physical activities in daily life. ey should exercise, drink water,

and avoid junk food. e common symptoms of chronic kidney

disease are shown in Figure 1.

This article delivers an overview and analysis, followed by an implementation and evaluation, of the machine learning classifiers used in CKD diagnosis. Further, it discusses the importance of machine learning classifiers in healthcare and explains how they can make more accurate predictions. Figure 2 presents the block diagram of the chronic kidney disease prediction model.

The core objective of this article is to propose and implement a hybrid machine learning prediction model for chronic kidney disease in which due importance is given to accuracy. We have analyzed the accuracy of different machine learning algorithms on the same dataset and compared their accuracy scores so as to obtain a better model. Our focus remains on addressing the overfitting problem using cross-validation while achieving the highest accuracy, building the best hybrid model from a combination of popular machine learning classifiers such as decision tree, gradient boosting, Gaussian Naïve Bayes, and random forest. The ultimate goal is to deliver accurate and effective treatment to CKD patients at a reduced cost. Before we proceed further, we need to know a little more about common diseases of the kidney. Table 1 lists some of the most common kidney diseases, and Table 2 lists the equations used for accuracy measurement in the related work.

The remainder of the article is organized as follows. Section 2 contains the literature survey along with a tabular comparison of the different machine learning algorithms used and an analysis of the results. Section 3 contains the proposed methodology. Section 4 contains the dataset details. Section 5 contains the results and discussion. Section 6 contains the conclusion and future work.

2. Literature Review

This section covers research work related to these algorithms and assesses some algorithms based on their accuracy. In [7], data mining applied to the analysis of clinical records is reported to be a good method. The decision tree method reached 91% accuracy, outperforming the Naïve Bayes method. The classification algorithm for the diabetes dataset had 94% specificity and 95% sensitivity. The authors also found that mining helps retrieve correlations among attributes that are not direct indicators of the class they are trying to predict. Similar work still needs to be done to improve the overall accuracy of the prediction engine in the statistical analysis of neural networks and clustering algorithms.

Figure 1: Symptoms in CKD patients [7]. (Bar chart with a percentage axis from 0 to 80%; categories: ethnicity, marital status, educational level, diabetes, hypertension, cerebrovascular disease, myocardial infarction, malignancy, psychiatric disease, BMI, albumin, haemoglobin, proteinuria.)

Figure 2: Block diagram of the machine learning hybrid model. (Stages: CKD EDA, feature selection, hybrid model, predicted output.)

In [8], the authors described prediction models using machine learning techniques including K-nearest neighbor (KNN), support vector machine (SVM), logistic regression (LR), and decision tree classifiers for CKD prediction. From the experiment, it was concluded that the SVM classifier provides the highest accuracy, 98.3%. SVM also had the best sensitivity after training and testing with the proposed method. Therefore, according to this comparison, it was concluded that an SVM classifier can be used to predict chronic kidney disease.

In [9], the authors chose four different algorithms and compared them to obtain an accurate prediction rate over the dataset. Of all the approaches presented, they got the best results from the gradient boosting classifier, which achieves an accuracy of 99.80%, whereas AdaBoost and LDA achieve a lower 97.91%. However, the gradient boosting classifier takes more time to make a prediction than the others, and it has a higher predictive value in both curves (ROC and AUC). An accurate prediction therefore depends heavily on the preprocessing strategy, and the preprocessing methods must be chosen carefully to achieve reliable results.

In [7], the authors investigated the ability of machine learning, supported by predictive analysis, to predict CKD early. An experimental procedure was performed on a dataset of 400 cases collected by Apollo Hospitals, India. Two labels were used as output targets (patients having CKD and those who are healthy), and four different machine learning classifiers were implemented. In the comparison of these classifiers, the classification and regression tree model (RPART) showed remarkably better results in terms of accuracy. They used the information gain quotient as the splitting criterion, and the optimum splitting reduces the noise of the resulting feature subsets. In this study, the RPART minimum split criterion was five, meaning that splits occur repeatedly until five instances remain in the leaf node. In addition, they assumed equal prior probabilities for the class attribute. The RPART prediction model used seven terminal nodes for the early prediction of CKD. The experimental results showed that the highest AUC and TPR were obtained with the MLP prediction model, whereas the highest TNR (1.00) was achieved with the RPART model. The RPART model can be described as a set of decision rules. However, the major drawback of RPART is that it considers a single factor as the parameter in every division procedure, while considering different parameter combinations could result in better CKD predictions. The MLP prediction model gives the lowest error rate. The major reason is that the MLP can adapt to and handle complex predictions. Complex relationships require hidden nodes, which are useful because they allow neural networks to model relationships between parameters while also dealing with nonlinearity in the data. The overall results indicate that machine learning algorithms offer a promising and feasible methodology for early CKD prediction.

Table 1: Description of common diseases of the kidney.

Disease | Description
CKD | Chronic kidney disease (CKD) can occur when a disease or condition damages kidney function, causing kidney damage to worsen over a few months or years.
Kidney stones | Kidney stones (also called renal calculi) are hard deposits made of salts and minerals that form inside the kidney.
Glomerulonephritis | Glomerulonephritis causes infection and damage to the filtering part of the kidneys (glomerulus). It can occur quickly or develop over a longer period. Poisons, metabolic wastes, and surplus fluid are not properly filtered into the urine. Instead, they build up in the body, producing inflammation and fatigue.
Polycystic kidney disease | Polycystic kidney disease (PKD) is a genetic disorder that can produce many fluid-filled cysts that grow inside the kidneys. Usually, they are harmless. The cysts can change the shape of the kidneys while making them much bigger.

Table 2: Equations for accuracy measurement.

S. no | Authors | Accuracy equations
1 | Padmanaban and Parthiban [8] | $\text{Precision}_i = TP_i / (TP_i + FP_i)$
2 | Charleonnan et al. [9] | $ACC = (TP + TN) / (P + N)$
3 | Ghosh et al. [7] | The results of performance degree indices are dependent on TP, TN, FP, and FN
4 | Fu et al. [10] | Extreme values: points $> Q_3 + 1.5\,\text{IQR}$ or points $< Q_1 - 1.5\,\text{IQR}$
5 | Devika et al. [11] | Accuracy = number of properly classified samples / total number of samples
6 | Revathy et al. [12] | $\text{Accuracy} = (TP + TN) / (TP + TN + FP + FN)$
7 | Nishat et al. [14] | $\text{Accuracy} = (TP + TN) / (TP + TN + FP + FN)$
8 | Rabby et al. [13] | Descriptive analysis of the data as well as the experimental results
9 | Pouriyeh et al. [15] | Finding the most significant feature using the chi-square test
10 | Jabbar et al. [16] | Experimental results only

True positive (TP) = CKD cases that are correctly categorized as CKD. False positive (FP) = non-CKD cases that are incorrectly categorized as CKD. True negative (TN) = non-CKD cases that are correctly categorized as non-CKD. False negative (FN) = CKD cases that are incorrectly categorized as non-CKD.
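The accuracy and precision equations collected in Table 2 can be illustrated with a short sketch. The label vectors below are made-up toy values, not data from any of the cited studies, and the helper names `confusion_counts`, `accuracy`, and `precision` are hypothetical; CKD is assumed to be the positive class (label 1).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem (positive = CKD)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1          # CKD case predicted as CKD
        elif truth != positive and pred != positive:
            tn += 1          # non-CKD case predicted as non-CKD
        elif truth != positive and pred == positive:
            fp += 1          # non-CKD case predicted as CKD
        else:
            fn += 1          # CKD case predicted as non-CKD
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Precision = TP / (TP + FP); defined as 0.0 when nothing is predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy labels for illustration only (1 = CKD, 0 = non-CKD).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)                # 4 2 1 1
print(accuracy(tp, tn, fp, fn))      # 0.75
print(precision(tp, fp))             # 0.8
```

On a strongly imbalanced dataset, accuracy alone can look high even when many CKD cases are missed, which is why several of the cited studies also report sensitivity and specificity.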

As we have already seen, different machine learning prediction models and learning programs are available to assist practitioners. In [5], the authors used a new selection guide for predicting CKD. In this work, CKD is predicted using specific classifiers, together with a comparative study of overall performance. They evaluated the Naïve Bayes, random forest, and artificial neural network classifiers and concluded that the random forest classifier performs better than the other classifiers. The value of forecasting CKD has been growing, and several evolutionary strategies can be used to improve the outcomes of the suggested classifiers. Here, Naïve Bayes, random forest, and KNN were applied to predict CKD. Early diagnosis of CKD helps to treat those affected in good time and prevents the disease from progressing to a worse stage. The early detection of this type of disease and well-timed treatment is one of the main objectives of the medical field.

In [10], a machine learning prediction model was developed for the early prediction of CKD. The input features were gathered from the CKD dataset, and the models were tested and validated on these input features. Decision tree, random forest, and support vector classifiers were constructed for the diagnosis of CKD. The performance of the models was assessed on the basis of the accuracy score of the prediction model. On comparison, the results showed that the random forest classifier performs much better at predicting CKD than the decision tree and support vector classifiers.

The kidneys do more than filter toxins from the body; they also play a vital role in maintaining the body's blood pressure, acid-base balance, and electrolyte balance. Kidney malfunction is responsible for illnesses ranging from minor to fatal, in addition to dysfunction in other body organs. Therefore, researchers all over the world have dedicated themselves to finding techniques to accurately diagnose and effectively treat chronic kidney disease. As machine learning classifiers are increasingly used in the medical field for diagnosis, CKD is now also included in the list of diseases that can be predicted using machine learning classifiers. Research on detecting CKD with ML algorithms has progressively improved both the procedure and the accuracy of the results. The authors proposed the random forest classifier (99.75% accuracy) as the most efficient classifier among all those considered. The study demonstrates the effective handling of missing values in data through four techniques, namely, the mode, mean, median, and zero-point methods. It also evaluates the performance of machine learning models under two scenarios, with and without tuning the hyperparameters, and observes significant improvement in the classifiers' performance, which is presented visually through graphs [11].

Overall, the motive of the study is to examine the applicability of specific supervised machine learning classifiers in the field of bioinformatics and to demonstrate their suitability for detecting several serious diseases, such as the diagnosis of CKD at an early stage [12].

They built an updated and proficient machine learning (ML) application that can perceive and predict the state of chronic kidney disease. In this work, the ten most important machine learning methods for predicting chronic kidney disease were considered. The authors report that the accuracy of the classification algorithms used in their project was as good as they wanted.

The first and most essential step in predicting a disease is detecting it, which is costly in developing countries such as Pakistan and Bangladesh, whose people suffer greatly from CKD. Currently, the proportion of CKD patients is increasing rapidly in Pakistan and Bangladesh. In that article, the authors therefore tried to develop a system that helps in predicting the risk of CKD. In the proposed model, they processed UCI datasets and real-time datasets, dealt with missing data, and trained the model using random forest and ANN classifiers. They then implemented these two algorithms in Python. The accuracy they obtained with the random forest algorithm is 97.12% and that with ANN is 94.5%, which is relatively good. With this proposed method, risk prediction of CKD at an early stage is possible.

In [13], the authors predicted CKD based on sugar levels, albumin levels, and red blood cell percentage. Five classifiers were applied, namely, Naïve Bayes, logistic regression, decision table, random tree, and random forest, and for each classifier the results were recorded (i) without preprocessing, (ii) with SMOTE and resampling, and (iii) with the class equalizer. The random forest classifier was observed to give the highest accuracy, 98.93%, with SMOTE and resampling.

2.1. Comparison of Machine Learning Classifiers for CKD. In this section, a comprehensive comparison of the state of the art is presented in the form of a table. The evaluation is made in terms of accuracy and is summarized in Table 3. The table has eight features, which are described below:

Author: this column contains the names of the authors of each article along with the reference.

Year: this column provides the year of the paper's publication.

Input data: this column shows the type of dataset that was used as input for the machine learning classifiers.

Disease type: this column shows the type of disease that was predicted by using different classifiers. It shows the best classifier found in the research paper, which is the classifier with the maximum accuracy.

Classifiers: this column lists the different machine learning classifiers that were used in the research and compared with one another.

Tool: this column gives the programming language or framework that was used in building the model. The researchers used these tools to preprocess the input data, create a prediction model, and finally carry out testing.

Cross-validation: this column gives information about the validation of the classifiers and allows a comparison of the research papers with regard to the number of cross-validation folds used.

Accuracy: this column gives the accuracy of the outcomes of the recommended model. If the article makes a comparison, the accuracy column only contains the accuracy of the best classifier confirmed by the author.

Table 3: Comparison of classifiers for CKD.

No. | Authors | Year | Input data | Disease type | Tools | Classifiers | Cross-validation | Accuracy
1 | Padmanaban and Parthiban [8] | 2016 | Diabetic patients, UCI machine learning | CKD | WEKA, YALE | Naïve Bayes; decision tree | 10 folds | 86%; 91%
2 | Charleonnan et al. [9] | 2016 | Clinical data | CKD | WEKA, MATLAB | SVM; logistic regression; decision tree; KNN | 5 folds | 98.3%; 96.55%; 94.81%; 98.1%
3 | Ghosh et al. [7] | 2020 | Apollo Hospitals India | CKD | Python | SVM; AB; LDA; GB | 5 folds | 99.56%; 97.91%; 97.91%; 99.80%
4 | Fu et al. [10] | 2018 | UCI repository (CKD dataset) | CKD | Python | RPART; SVM; LOGR; MLP | Cross-validation | 98.2%; 97.3%; 99.4%; 99.5%
5 | Devika et al. [11] | 2019 | UCI repository (CKD dataset) | Chronic renal disorder | C Sharp | Naïve Bayes; KNN; random forest | No cross-validation | 99.63%; 87.78%; 99.84%
6 | Revathy et al. [12] | 2019 | UCI repository (CKD dataset) | CKD | Python | Decision tree; SVM; random forest | No cross-validation | 94.16%; 98.33%; 99.16%
7 | Nishat et al. [14] | 2021 | Learning repository of University of California, Irvine | CKD | Python | CNN; LR; DT; RF; SVM; NB; MLP; QDA | Cross-validation | 78%; 98.25%; 99%; 99.75%; 85%; 96.5%; 81.25%; 37.5%
8 | Rabby et al. [13] | 2019 | UCI repository (CKD dataset) | CKD | Python | K-nearest neighbor; RF; SVM; GNB; AB; DT; LDA; GB; LR; ANN | Cross-validation | 71.25%; 98.75%; 97.50%; 100%; 98.75%; 100%; 97.50%; 98.75%; 97.50%; 65%
9 | Pouriyeh et al. [15] | 2020 | UCI repository (CKD dataset) | CKD | Python | RF; ANN | 10 folds | 97.12%; 94.5%
10 | Jabbar et al. [16] | 2020 | UCI repository (CKD dataset) | CKD | Python | Decision tree; logistic regression; Naïve Bayes; random forest | Cross-validation | 96.79%; 97.86%; 97.33%; 98.93%
11 | Bmc [17] | 2013 | UCI repository | Diabetic kidney disease | MATLAB | SVM; PLS; FFNN; RPART; random forest; Naïve Bayes; C5.0 | Cross-validation | 0.91; 0.83; 0.85; 0.87; 0.91; 0.86; 0.90
12 | Ramya and Radha [18] | 2016 | UCI repository | Chronic kidney disease | R | BP; RBF; random forest (RF) | No cross-validation | 80.4; 85.3; 78.6
13 | Kumar [19] | 2016 | UCI repository | CKD | MATLAB | [missing]; SMO; Naïve Bayes; RBF; MLPC; SLG | Cross-validation | 95.67; 90; 87.64; 83.78; 89; 87
14 | Basarslan and Kayaalp [20] | 2019 | UCI repository | Chronic kidney disease | MATLAB | K-nearest neighbor; Naïve Bayes; LR; RF | Cross-validation | [missing]; 96.5; 97.56; 99
15 | Dowluru and Rayavarapu [21] | 2012 | UCI repository | Kidney stone | WEKA tool | Naïve Bayes classification; logistic regression; J48 algorithm; random forest | No cross-validation | 0.99; 1.00; 0.97; 0.98
15 (continued) | Dowluru and Rayavarapu [21] | 2012 | UCI repository | Kidney stone | Orange tool | Naïve Bayes; KNN; classification tree; C4.5; SVM; random forest | No cross-validation | 0.79; 0.7377; 0.9352; 0.9352; 0.9198; 0.9352

Bold values represent the highest accuracy in the relevant paper.

2.2. ML Classifier with Highest Accuracy. The machine learning algorithms that we analyzed from the above literature are listed in Table 4 and Figure 3.

3. Proposed Methodology

The proposed hybrid model is implemented in Python with pandas, sklearn, Matplotlib, Plotly, and other essential libraries. We downloaded the CKD dataset from the UCI repository. The dataset contains two classes (CKD, represented by 1, and non-CKD, represented by 0). The machine learning algorithms with the best accuracy are selected for analysis and implementation so that the results are reproducible. We have also developed a hybrid model based on the knowledge gained during the analysis and implementation. The hybrid model consists of Gaussian Naïve Bayes, gradient boosting, and decision tree as base classifiers and random forest as the meta-classifier. We have selected tree-based machine learning algorithms to achieve the highest accuracy while handling the overfitting problem. In this paper, we detect outliers with the violin plot, as shown in Figure 4. To address this problem, we implement the k-fold technique and design our model in such a way that it reduces overfitting while achieving the highest accuracy. The classifiers are discussed below.
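The stacking arrangement with k-fold validation described above can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic (generated with `make_classification` as a stand-in for the UCI CKD table), and the choice of 5 folds, 24 features, and all hyperparameter values are assumptions made here for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CKD data (assumption: 400 rows, 24 features, binary target).
X, y = make_classification(
    n_samples=400, n_features=24, n_informative=8, random_state=0
)

# Base classifiers named in the text.
base_classifiers = [
    ("gnb", GaussianNB()),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
]

# Random forest as the meta-classifier; cv=5 means the meta-classifier is
# trained on out-of-fold predictions of the base classifiers.
hybrid = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    cv=5,
)

# Outer k-fold evaluation (k = 5 is an assumption; the text does not state k).
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(hybrid, X, y, cv=folds, scoring="accuracy")
print("fold accuracies:", scores.round(3))
print("mean accuracy:", round(scores.mean(), 3))
```

Reporting the per-fold scores alongside the mean makes it easier to see whether a high accuracy is stable across folds or depends on one favourable split.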

3.1. Naïve Bayes (NB). The NB classifier belongs to the group of probabilistic classifiers and is built on Bayes' theorem. It assumes strong independence between the features, and this assumption is the most crucial part of how the classifier makes predictions. It can be built easily and is well suited to the medical field for the prediction of different diseases [15].
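The independence assumption can be written compactly. The following is a sketch in standard textbook notation; the symbols $C_k$ (a class, here CKD or non-CKD) and $x_1,\dots,x_n$ (the feature values of one patient) are introduced here for illustration and do not appear in the original text.

```latex
\[
P(C_k \mid x_1,\dots,x_n) \;\propto\; P(C_k)\prod_{i=1}^{n} P(x_i \mid C_k),
\qquad
\hat{y} \;=\; \arg\max_{k}\; P(C_k)\prod_{i=1}^{n} P(x_i \mid C_k)
\]
```

The product over features is exactly where the independence assumption enters: each feature contributes its own likelihood term, with no interaction terms between features.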

3.2. Decision Tree (DT). The decision tree classifier has a tree-like, flowchart-like structure. It consists of branches, leaves (child nodes), and a root (parent node). The internal nodes represent tests on the features, whereas the branches represent the outcome of the test at each node. The decision tree is one of the most commonly used classifiers because it does not need extensive domain knowledge or parameter settings to work [15].

3.3. Random Forest (RF). In the ensemble and stacking classification approach, random forest (RF) is among the most effective machine learning algorithms. It is used for prediction and probability estimation. A random forest classifier consists of many decision trees, each of which casts a vote to determine the object's class. Tin Kam Ho of Bell Labs introduced the concept of random forest in 1995. The RF method combines bagging with random selection of attributes. The random forest classifier has three hyperparameters to tune [16]:

(i) Number of decision trees (ntree) used by the random forest classifier

(ii) Minimum size of the nodes in the trees

(iii) Number of attributes considered when splitting each node of each tree (mtry), where m is the number of attributes
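As an illustration, the three hyperparameters listed above map onto scikit-learn's `RandomForestClassifier` roughly as shown below. The mapping of ntree, minimum node size, and mtry to `n_estimators`, `min_samples_leaf`, and `max_features` is an interpretation made here, the specific values are arbitrary, and the data are synthetic rather than the CKD dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data for illustration only (assumption: 24 features, binary target).
X, y = make_classification(
    n_samples=200, n_features=24, n_informative=8, random_state=0
)

forest = RandomForestClassifier(
    n_estimators=200,      # (i) number of decision trees (ntree)
    min_samples_leaf=1,    # (ii) minimum size of a terminal node
    max_features="sqrt",   # (iii) attributes tried at each split (mtry)
    random_state=0,
)
forest.fit(X, y)

print("trees grown:", len(forest.estimators_))
print("most important feature index:", int(forest.feature_importances_.argmax()))
```

The fitted model also exposes `feature_importances_`, which is one way to estimate which attributes matter most for the classification.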

Table 4: Machine learning algorithms and classifiers.

Article | Classifier | Highest accuracy (%)
1 | Decision tree | 91
2 | SVM | 98.3
3 | GB | 99.80
4 | MLP | 99.5
5 | Random forest | 99.84
6 | Random forest | 99.16
7 | Random forest | 99.75
8 | GNB; decision tree | 100; 100
9 | Random forest | 97.12
10 | Random forest | 98.93

Bold values represent the highest accuracy in the literature.

Figure 3: Comparison of machine learning classifiers. (Chart of the highest accuracies listed in Table 4.)

Figure 4: Violin plot of attributes, panel (a). (Violin plots by class, 0.0 and 1.0; legible attribute labels: age, bgr.) Continued.

Some of the advantages of the random forest classifier are as follows:

(i) Random forest is a highly appropriate choice for ensemble learning

(ii) The random forest classifier performs well on large datasets

(iii) Random forest can handle hundreds of input attributes

(iv) Random forest can estimate which attributes are more important in classification

(v) Missing values can be handled by the random forest classifier

(vi) Random forest balances the error between classes in unbalanced datasets

Figure 4: Violin plot of attributes.

3.4. Gaussian Naïve Bayes (GNB). Gaussian Naïve Bayes (GNB) calculates the mean and standard deviation of each attribute at the training stage. These means and standard deviations are then used to calculate the probabilities for the test data. As a result, when some attribute values are much larger or much smaller than the calculated mean, the classifier's performance is affected: test patterns containing such values sometimes receive wrong output labels [22].
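The two stages described above can be sketched as follows. This is a minimal illustration for a single attribute, assuming equal class priors and hypothetical attribute values (they are not taken from the CKD dataset); the function names are ours.

```python
import math
from statistics import mean, pstdev


def fit_gaussian(values):
    """Training stage: estimate the mean and standard deviation of one attribute."""
    return mean(values), pstdev(values)


def gaussian_pdf(x, mu, sigma):
    """Gaussian likelihood of x given the class mean and standard deviation."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# Hypothetical values of one attribute per class (illustrative only).
train = {
    "ckd": [9.5, 10.2, 11.0, 10.8, 9.9],
    "notckd": [14.8, 15.2, 15.9, 14.5, 15.4],
}
stats = {label: fit_gaussian(vals) for label, vals in train.items()}


def classify(x):
    """Test stage: pick the class with the largest likelihood (equal priors assumed)."""
    return max(stats, key=lambda label: gaussian_pdf(x, *stats[label]))


# A value near the class mean gets a high likelihood; an extreme value gets
# a likelihood close to zero under the same class.
typical = gaussian_pdf(10.3, *stats["ckd"])
extreme = gaussian_pdf(30.0, *stats["ckd"])
```

Because the likelihood falls off exponentially with the squared distance from the mean, an attribute value far from every class mean yields very small likelihoods for all classes, which is one way the wrong label can be chosen.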

3.5. Hybrid Model. We use the concept of stacking for our hybrid model. Stacking is an ensemble technique in which multiple classification models are combined through a main (meta) classifier. The models are arranged in layers, one after the other: each layer passes its predictions upward, and the model in the topmost layer makes the decision based on the combined outputs of the base models. The models in the lowest layer take the attributes of the original data as input. The topmost model takes the outputs of the lower layers and produces the final prediction. In other words, the stacking technique first processes the original data with multiple independent machine learning models. The meta-classifier then makes its prediction from the input together with the output of each of these models, and the weights of the individual algorithms are estimated. The best-performing algorithms are selected, and those with low performance are removed. In this technique, multiple classifiers built with different machine learning algorithms are trained on the same dataset as base models and then combined through a meta-classifier [23]. Figure 5 shows the flow diagram for the proposed hybrid model.

The model is executed in the following sequence of steps:

(i) Collect the CKD data from the UCI repository

(ii) Perform exploratory data analysis (EDA) on the dataset

(iii) Split the dataset into two parts: train data and test data

(iv) Apply 10-fold cross-validation

(v) Train the base models Gaussian Naïve Bayes, gradient boosting, and decision tree on the train set, giving the predictions M1, M2, and M3, respectively

(vi) Use the outputs of the base models (M1, M2, and M3) together with the test set data as the training input for the random forest

(vii) Once the random forest is trained, it gives its prediction on the basis of the training dataset and the output predictions of the base models
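The steps above can be sketched with scikit-learn's StackingClassifier. This is a minimal sketch under stated assumptions, not the authors' implementation: the library choice, the synthetic data from make_classification (standing in for the CKD dataset), the default hyperparameters, and the passthrough setting are all ours. Note also that scikit-learn trains the meta-classifier only on out-of-fold predictions from the training split, which may differ from the procedure described in step (vi).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (400 rows, 14 attributes); NOT the UCI CKD dataset.
X, y = make_classification(
    n_samples=400, n_features=14, n_informative=6, random_state=0
)
# 80% train / 20% test split, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Base classifiers producing the predictions M1, M2, and M3.
base_models = [
    ("gnb", GaussianNB()),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
]

hybrid = StackingClassifier(
    estimators=base_models,
    final_estimator=RandomForestClassifier(random_state=0),  # meta-classifier
    cv=10,             # 10-fold out-of-fold predictions feed the meta-classifier
    passthrough=True,  # meta-classifier also receives the original attributes
)
hybrid.fit(X_train, y_train)

# Accuracy on the held-out 20%.
test_accuracy = hybrid.score(X_test, y_test)
```

Keeping the test split out of every training step is a deliberate choice in this sketch, so that the reported accuracy reflects data the stacked model has not seen.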

In this study, we considered the UCI CKD dataset and split it into two parts. 80% of the data is used for training, as input to the machine learning algorithms. We used Gaussian Naïve Bayes, gradient boosting, decision tree, and a stacking classifier with the random forest algorithm to predict chronic kidney disease on the remaining 20% test data, then plotted the predicted values and compared them. Our proposed methodology has the following advantages.

(i) We implemented four machine learning algorithms: decision tree, gradient boosting, Gaussian Naïve Bayes, and random forest. We applied a stacking classifier to build the hybrid model that combines these four algorithms

(ii) We analyzed the accuracy of different machine learning algorithms on the same dataset and compared their accuracy scores to find the best model

(iii) We implemented a stacking classifier technique to build a new model with improved accuracy

4. Dataset Details

We selected 14 attributes from the UCI repository dataset of chronic kidney disease as input features, as shown in Table 5. The age attribute gives the patient's age, bp indicates the blood pressure, sg indicates the specific gravity of the urine, al indicates the level of albumin in the patient's urine, su represents the sugar level, bgr (blood glucose random) indicates the random blood sugar level, bu indicates the blood urea, sc indicates the serum creatinine, sod indicates the amount of sodium, pot indicates the amount of potassium, hemo indicates the hemoglobin, and pcv indicates the packed cell volume. Further, wc indicates the white blood cell count, and rc indicates the red blood cell count.

To identify the number of chronic kidney disease patients and the number of healthy people, we visualized the CKD dataset with the histogram plot in Figure 6. Here, 0.0 represents the healthy cases, while 1.0 represents the chronic kidney disease patients. The dataset contains 250 chronic kidney disease patients and 150 healthy people.
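The class balance implied by these counts can be checked with a short calculation. The label vector below is rebuilt from the counts stated in the text rather than loaded from the dataset file, which is an assumption of this sketch.

```python
from collections import Counter

# Label vector rebuilt from the stated counts: 1.0 = CKD, 0.0 = healthy.
labels = [1.0] * 250 + [0.0] * 150

counts = Counter(labels)
total = sum(counts.values())
ckd_share = counts[1.0] / total  # fraction of CKD cases in the dataset
```

With these counts, CKD cases make up 62.5% of the 400 instances, so the two classes are moderately unbalanced.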

The Pearson correlation feature selection method is used to get the best combination of features for the prediction of chronic kidney disease. The correlation of the 14 attributes and 1 output label is presented in Figure 7.
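A minimal sketch of correlation-based feature selection is given below. The selection rule (keep attributes whose absolute correlation with the class label reaches a threshold), the threshold of 0.5, and the toy values are all assumptions made for illustration; the text does not state the exact criterion used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.5):
    """Keep attributes whose |r| with the class label meets the threshold."""
    scores = {name: pearson(vals, target) for name, vals in columns.items()}
    selected = [name for name, r in scores.items() if abs(r) >= threshold]
    return selected, scores


# Hypothetical toy values (illustrative only, not taken from the dataset).
columns = {
    "hemo": [9.8, 10.5, 11.2, 15.1, 14.7, 15.6],
    "pot": [4.4, 4.3, 4.6, 4.7, 4.5, 4.2],
}
target = [1, 1, 1, 0, 0, 0]  # 1 = CKD, 0 = healthy

selected, scores = select_features(columns, target)
```

In this toy example, the first attribute is strongly (negatively) correlated with the label and is kept, while the second is only weakly correlated and is dropped.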

Moving from the exploratory data analysis stage to the pair plot visualization proved very helpful, as the pair plot can be used to find the relationships between attributes for both categorical and continuous variables. We import the Seaborn library to produce the pair plot. It presents the information about all the attributes clearly in one picture, and the statistical information is shown in an attractive format, as seen in Figure 8.

Violin plots are used in the exploratory data analysis for all the attributes used in the hybrid model. They give additional useful information, such as the density trace and the distribution of the dataset. Violin plots show the whole range of the dataset, which a box plot cannot. The violin plots of all 14 attributes are given in Figure 4. Figure 9 shows the comparison of the different models' accuracy scores in the form of a chart.

Figure 5: Flowchart for the proposed model.

Table 5: The attribute set with their data types.

#    Attribute   Full form                   Data type   Nonempty values   Missing values
0    age         Age                         float64     400               0
1    bp          Blood pressure              float64     400               0
2    sg          Specific gravity of urine   float64     400               0
3    al          Level of albumin            float64     400               0
4    su          Sugar level                 float64     400               0
5    bgr         Blood glucose random        float64     400               0
6    bu          Blood urea                  float64     400               0
7    sc          Serum creatinine            float64     400               0
8    sod         Amount of sodium            float64     400               0
9    pot         Amount of potassium         float64     400               0
10   hemo        Hemoglobin                  float64     400               0
11   pcv         Packed cell volume          float64     400               0
12   wc          White blood cell count      float64     400               0
13   rc          Red blood cell count        float64     400               0

5. Results and Discussion

Machine learning algorithms such as gradient boosting, Gaussian Naïve Bayes, decision tree, and the random forest classifier were used in the proposed hybrid model. These different machine learning classifiers were used in combination for the chronic kidney disease predictions. This also overcomes the overfitting problem and results in higher accuracy. In order to improve accuracy and to offer a novel approach compared to the existing work, we implemented the proposed hybrid model with the best combination of GB, GNB, and decision tree, along with the random forest classifier [24–27]. The results in Table 6 show that the diagnosis of chronic kidney disease is effective when the random forest is combined with the other classifiers through stacking in the hybrid model. Gradient boosting achieves 99% accuracy, random forest achieves 98% accuracy, and our hybrid model achieves 100% accuracy while also reducing the chances of overfitting.
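Accuracy here is the percentage of test samples whose predicted label matches the true label. The following sketch shows the calculation and its resolution, assuming a single hold-out of 20% of the 400 instances (80 test samples); whether the reported scores come from one hold-out split or from averaged cross-validation folds is not stated, and the label vectors below are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Assumption: a 20% hold-out of the 400 instances gives 80 test samples.
n_test = 400 * 20 // 100

# Hypothetical test labels and a prediction vector with a single error.
y_true = [1] * 50 + [0] * 30
y_one_error = [1] * 49 + [0] * 31  # one CKD case predicted as healthy

step = 100.0 / n_test  # change in accuracy per misclassified sample
acc_perfect = accuracy(y_true, y_true)
acc_one_error = accuracy(y_true, y_one_error)
```

Under this assumption, each misclassified test sample changes the accuracy by 1.25 percentage points, so a single error gives 98.75% and a perfect score requires all 80 samples to be classified correctly.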

To identify the contributions to the development of prediction models for chronic kidney disease, a region-wise analysis was performed. As discussed in the Introduction, populations in developing countries suffer more from chronic kidney disease, and it was observed that most of the research work is performed in developing countries. A summary of the region-wise contributions is presented in Figure 10.

Figure 6: Histogram plot.

Figure 7: Heat map of chosen attributes.

Figure 8: Pair plot of each attribute.

Figure 9: Accuracy score of implemented machine learning classifiers.

Table 6: Accuracy score of implemented machine learning classifiers.

ML algorithm            Accuracy (%)
Gradient boosting       99
Gaussian Naïve Bayes    93
Decision tree           96
Random forest           98
Hybrid model            100

Figure 10: Region-wise contributions (Asia 50%, Europe 20%, America 20%, Africa 10%).

6. Conclusion

Chronic kidney disease is considered one of the prominent life-threatening diseases in the developing world. The most obvious cause seems to be a lack of physical exercise. Medical practitioners use a number of diagnostic processes and procedures, of which machine learning is a recent development. In this paper, we selected machine learning because it performs better in terms of accuracy than other available approaches. We used the Pearson correlation feature selection method and applied it to the machine learning classifiers. GB, GNB, and decision tree are the base classifiers for the stacking algorithm, with random forest as the meta-classifier, and they are implemented with cross-validation and evaluated on the basis of accuracy score. We evaluated these algorithms on the same dataset, the CKD dataset from the UCI repository, with 14 attributes and 400 instances. On the basis of these attributes, our proposed stacking model is able to predict whether a person is a CKD patient or not with 100% accuracy. The best features are selected using the Pearson correlation method, and the stacking algorithm is implemented with the best machine learning classifiers. Cross-validation enhances the performance of the stacking model. As we worked on chronic kidney disease data with binary classes, the stacking algorithm performs better with these combinations of algorithms. The stacking technique can also be implemented for the prediction of other diseases to obtain a better accuracy score.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] V. Jha, G. Garcia-Garcia, K. Iseki et al., “Chronic kidney disease: global dimension and perspectives,” The Lancet, vol. 382, no. 9888, pp. 260–272, 2013.

[2] R. Ruiz-Arenas, “A summary of worldwide national activities in chronic kidney disease (CKD) testing,” The Electronic Journal of the International Federation of Clinical Chemistry and Laboratory Medicine, vol. 28, no. 4, pp. 302–314, 2017.

[3] Thedailystar, “Over 35,000 develop kidney failure in Bangladesh every year,” 2019, https://www.thedailystar.net/city/news/18m-kidney-patients-bangladesh-every-year-1703665.

[4] Prothomalo, “Women more affected by kidney diseases,” 2018, https://en.prothomalo.com/bangladesh/Womenmore-affected-by-kidney-diseases.

[5] Scottish Intercollegiate Guidelines Network (Sign), Diagnosis and Management of Chronic Kidney Disease: A National Clinical Guideline, SIGN, Victoria, Australia, 2008.

[6] M. Kavitha, G. Gnaneswar, R. Dinesh, Y. R. Sai, and R. S. Suraj, “Heart disease prediction using hybrid machine learning model,” in Proceedings of the 2021 6th International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, January 2021.

[7] P. Ghosh, F. M. Javed Mehedi Shamrat, S. Shultana, S. Afrin, A. A. Anjum, and A. A. Khan, “Optimization of prediction method of chronic kidney disease using machine learning algorithm,” in Proceedings of the 2020 15th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), Bangkok, Thailand, November 2020.

[8] K. R. A. Padmanaban and G. Parthiban, “Applying machine learning techniques for predicting the risk of chronic kidney disease,” Indian Journal of Science and Technology, vol. 9, no. 29, 2016.

[9] A. Charleonnan, T. Fufaung, T. Niyomwong, W. Chokchueypattanakit, S. Suwannawach, and N. Ninchawee, “Predictive analytics for chronic kidney disease using machine learning techniques,” in Proceedings of the 2016 Management and Innovation Technology International Conference (MITicon), Bang-San, Thailand, October 2016.

[10] G.-S. Fu, Y. Levin-Schwartz, Q.-H. Lin, and D. Zhang, “Machine learning for medical imaging,” Journal of healthcare engineering, vol. 2019, pp. 1-2, 2019.

[11] R. Devika, S. V. Avilala, and V. Subramaniyaswamy, “Comparative study of classifier for chronic kidney disease prediction using naive Bayes, KNN and random forest,” in Proceedings of the 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, March 2019.

[12] S. Revathy, B. Bharathi, P. Jeyanthi, and M. Ramesh, “Chronic kidney disease prediction using machine learning models,” International Journal of Engineering and Advanced Technology, vol. 9, no. 1, pp. 6364–6367, 2019.

[13] A. S. A. Rabby, R. Mamata, M. A. Laboni, Ohidujjaman, and S. Abujar, “Machine learning applied to kidney disease prediction: comparison study,” in Proceedings of the 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kanpur, India, July 2019.

[14] M. Nishat, F. Faisal, R. Dip et al., “A comprehensive analysis on detecting chronic kidney disease by employing machine learning algorithms,” EAI Endorsed Transactions on Pervasive Health and Technology, vol. 7, Article ID 170671, 2018.

[15] S. Pouriyeh, S. Vahid, G. Sannino, G. De Pietro, H. Arabnia, and J. Gutierrez, “A comprehensive investigation and comparison of Machine Learning Techniques in the domain of heart disease,” in Proceedings of the 2017 IEEE Symposium on Computers and Communications (ISCC), Heraklion, Greece, July 2017.

[16] M. A. Jabbar, B. L. Deekshatulu, and P. Chandra, “Intelligent heart disease prediction system using random forest and evolutionary approach,” Journal of network and innovative computing, vol. 4, pp. 175–184, 2016.

[17] Bmc, “Biomedcentral,” 2022.

[18] S. Ramya and N. Radha, “Diagnosis of chronic kidney disease using machine learning algorithms,” International Journal of Innovative Research in Computer and Communication Engineering, vol. 4, no. 1, 2016.

[19] M. Kumar, “Prediction of chronic kidney disease using random forest machine learning algorithm,” International Journal of Computer Science and Mobile Computing, vol. 5, pp. 24–33, 2016.

[20] M. S. Basarslan and F. Kayaalp, “Performance analysis of fuzzy rough set-based and correlation-based attribute selection methods on detection of chronic kidney disease with various classifiers,” in Proceedings of the 2019 Scientific Meeting on Electrical-Electronics and Biomedical Engineering and Computer Science (EBBT), April 2019.

[21] S. K. Dowluru and A. K. Rayavarapu, “Statistical and data mining aspects on kidney stones: a systematic review and meta-analysis,” Open Access Scientific Reports, vol. 1, no. 12, 2012.

[22] S. M. M. Hasan, M. A. Mamun, M. P. Uddin, and M. A. Hossain, “Comparative analysis of classification approaches for heart disease prediction,” in Proceedings of the 2018 International Conference on Computer, Communication, Chemical, Material and Electronic Engineering (IC4ME2), pp. 1–4, Rajshahi, Bangladesh, February 2018.

[23] C. B. C. Latha and S. C. Jeeva, “Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques,” Informatics in Medicine Unlocked, vol. 16, Article ID 100203, 2019.

[24] A. J. Aljaaf, A.-J. Dhiya, H. M. Hussein et al., “Early prediction of chronic kidney disease using machine learning supported by predictive analytics,” in Proceedings of the 2018 IEEE Congress on Evolutionary Computation (CEC), Rio de Janeiro, Brazil, July 2018.

[25] S. Khan, M. Z. Khan, P. Khan, G. Mehmood, A. Khan, and M. Fayaz, “An ant-hocnet routing protocol based on optimized fuzzy logic for swarm of UAVs in FANET,” Wireless Communications and Mobile Computing, vol. 2022, Article ID 6783777, 12 pages, 2022.

[26] M. Fayaz, G. Mehmood, A. Khan, S. Abbas, M. Fayaz, and J. Gwak, “Counteracting selfish nodes using reputation based system in mobile ad hoc networks,” Electronics, vol. 11, no. 2, p. 185, 2022.

[27] M. Z. U. Haq, M. Z. Khan, H. U. Rehman et al., “An adaptive topology management scheme to maintain network connectivity in Wireless Sensor Networks,” Sensors, vol. 22, no. 8, p. 2855, 2022.

14 Computational Intelligence and Neuroscience

Content uploaded by Gulzar Mehmood

Content may be subject to copyright.

Transforming glomerulonephritis care through emerging diagnostics and therapeutics

Article

Full-text available

Jun 2024

Background: Glomerulonephritis refers to a range of conditions involving inflammation and injury to the kidneys' glomeruli, often leading to significant morbidity if left untreated. Purpose: This review aims to examine emerging advancements in the prevention and treatment of glomerulonephritis and highlight progress in transforming the prognosis of this spectrum of diseases, while also identifying gaps requiring ongoing effort. Main body: Novel targeted immunotherapies utilizing engineered delivery platforms and biologicals like monoclonal antibodies are progressing in research pipelines, potentially offering safer, more efficacious alternatives to current standard immunosuppression. High-throughput biomarker assays and AI/machine learning algorithms have demonstrated the ability to improve early detection of kidney damage and guide personalized treatment plans. Further prevention opportunities emerge from modulating microbiome-immune interactions, lifestyle factors, and vaccinations shielding against infections triggering renal disorders. Conclusion: Although challenges remain, recent advancements in unraveling the pathogenesis of glomerulonephritis coupled with the emergence of cutting-edge diagnostics and targeted interventions set the stage for a new era combating the risk and progression of this spectrum of diseases.

Enhancing outlier detection in air quality index data using a stacked machine learning model

Article

Full-text available

May 2024

The air quality index (AQI) is a commonly employed metric for evaluating air quality across diverse locations and temporal spans. Similar to other environmental datasets, AQI data can exhibit outliers data points markedly divergent from the norm, signifying instances of exceptionally favorable or adverse air quality. This becomes crucial in identifying and comprehending severe pollution episodes with far‐reaching environmental and public health implications. This study utilizes air quality data from January 1, 2014, to January 31, 2021, collected at daily intervals in Shanghai City, China, as the experimental dataset. The dataset includes daily AQI measurements, along with six pollutant concentrations: particulate matter (PM2.5 and PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3), and carbon monoxide (CO). Each pollutant's concentration is measured in micrograms per cubic meter (μ$$ \upmu $$g/m 3$$ {}^3 $$). The dataset is then preprocessed by cleaning and normalizing it before using K‐means clustering to discover different patterns. A stacked ensemble machine learning model that incorporates K‐means clustering, random forest (RF) and gradient boosting classifier (GBC) is developed and compared to decision tree, support vector machine, K‐nearest neighbor and Naive Bayes algorithms to evaluate its performance in identifying outliers using accuracy, precision, recall, and F1‐score. The stacked model outperformed all other established models based on the accuracy, precision, recall, and F1‐score of 0.99, 0.99, 0.97, and 0.99, respectively.

Statistical Analysis of Renal Risk Factors and Prediction of Chronic Kidney Disease

Article

Full-text available

Apr 2024

An extensive world population, in particular aged people, is suffering from chronic kidney disease (CKD). Early prediction of CKD is crucial in mitigating disease complications, slowing its progression, and improving patient survival rates. This work analyzed CKD ailment-related issues using three computational approaches: (1) Several statistical methods were investigated to find the relationship between a heterogeneous risk factor and disease. In addition, different significance tests were exercised to classify the risk factors for two classes, with and without CKD. (2) A hybrid statistical approach was followed to identify the most critical risk factors significant to predict CKD. (3) Machine learning techniques were used to predict the onset of chronic kidney disease in terms of the significant risk factors. Several experiments were conducted to substantiate the efficacy of the proposed analysis and prognosis. Proposing a statistical approach that outperforms existing methods to identify the minimum number of significant risk factors and predict CKD using those factors without compromising maximum prediction accuracy strengthens the contribution of the research. Indeed, it incorporates a low-cost approach in the field of affordable healthcare.

Breast Cancer Detection: An Evaluation of Machine Learning, Ensemble Learning, and Deep Learning Algorithms

Chapter

Jun 2024

Early Prediction and Progrssion of Chronic Kidney Disease Using Machine Lerning Techniques

Conference Paper

Apr 2024

Assessing Chronic Kidney Disease Prediction Models: A Comparative Analysis of Smart AI and Intelligent Machine Learning Approaches

Conference Paper

Jan 2024

Computer-Aided Diagnosis of Duchenne Muscular Dystrophy Based on Texture Pattern Recognition on Ultrasound Images Using Unsupervised Clustering Algorithms and Deep Learning

Article

Apr 2024

Machine Learning Predictive Model for Chronic Kidney Disease Classification Using Python

Conference Paper

Dec 2023

Early Prediction of Chronic Kidney Disease: A Comprehensive Survey

Conference Paper

Jan 2024

Kidney Disease Prediction using ML techniques

Conference Paper

Feb 2024

An Ant-Hocnet Routing Protocol Based on Optimized Fuzzy Logic for Swarm of UAVs in FANET

Article

Full-text available

Jun 2022
WIREL COMMUN MOB COM

Drones or unmanned aircraft are commonly known as unmanned aerial vehicles (UAVs), and the ad hoc network formed by these UAVs is commonly known as Flying Ad Hoc Network (FANET). UAVs and FANET were initially associated with military surveillance and intelligence gathering; moreover, they are now excessively used in civilian roles including search and rescue, traffic monitoring, firefighting, videography, and smart agriculture. However, due to the distinctive architecture, they pose considerable design and deployment challenges, prominently related to routing protocols, as the traditional routing protocols cannot be used directly in FANET. For instance, due to high mobility and sparse topology, frequent link breakage and route maintenance incur high overhead and latency. In this paper, we employ the bio-inspired Ant Colony Optimization (ACO) algorithm called "Ant-Hocnet" based on optimized fuzzy logic to improve routing in FANET. Fuzzy logic is used to analyze the information about the status of the wireless links, such as available bandwidth, node mobility, and link quality, and calculate the best wireless links without a mathematical model. To evaluate and compare our design, we implemented it in the MATLAB simulator. The results show that our approach offers improvements in throughput and end-to-end delays, hence enhancing the reliability and efficiency of the FANET.

An Adaptive Topology Management Scheme to Maintain Network Connectivity in Wireless Sensor Networks

Article

Full-text available

Apr 2022
SENSORS-BASEL

The roots of Wireless Sensor Networks (WSNs) are tracked back to US military developments, and, currently, WSNs have paved their way into a vast domain of civil applications, especially environmental, critical infrastructure, habitat monitoring, etc. In the majority of these applications, WSNs have been deployed to monitor critical and inaccessible terrains; however, due to their unique and resource-constrained nature, WSNs face many design and deployment challenges in these difficult-to-access working environments, including connectivity maintenance, topology management, reliability, etc. However, for WSNs, topology management and connectivity still remain a major concern in WSNs that hampers their operations, with a direct impact on the overall application performance of WSNs. To address this issue, in this paper, we propose a new topology management and connectivity maintenance scheme called a Tolerating Fault and Maintaining Network Connectivity using Array Antenna (ToMaCAA) for WSNs. ToMaCAA is a system designed to adapt to dynamic structures and maintain network connectivity while consuming fewer network resources. Thereafter, we incorporated a Phase Array Antenna into the existing topology management technologies, proving ToMaCAA to be a novel contribution. This new approach allows a node to connect to the farthest node in the network while conserving resources and energy. Moreover, data transmission is restricted to one route, reducing overheads and conserving energy in various other nodes’ idle listening state. For the implementation of ToMaCAA, the MATLAB network simulation platform has been used to test and analyse its performance. The output results were compared with the benchmark schemes, i.e., Disjoint Path Vector (DPV), Adaptive Disjoint Path Vector (ADPV), and Pickup Non-Critical Node Based k-Connectivity (PINC). 
The performance of ToMaCAA was evaluated based on different performance metrics, i.e., the network lifetime, total number of transmitted messages, and node failure in WSNs. The output results revealed that the ToMaCAA outperformed the DPV, ADPV, and PINC schemes in terms of maintaining network connectivity during link failures and made the network more fault-tolerant and reliable.

Counteracting Selfish Nodes Using Reputation Based System in Mobile Ad Hoc Networks

Article

Full-text available

Jan 2022

A mobile ad hoc network (MANET) is a group of nodes constituting a network of mobile nodes without predefined and pre-established architecture where mobile nodes can communicate without any dedicated access points or base stations. In MANETs, a node may act as a host as well as a router. Nodes in the network can send and receive packets through intermediate nodes. However, the existence of malicious and selfish nodes in MANETs severely degrades network performance. The identification of such nodes in the network and their isolation from the network is a challenging problem. Therefore, in this paper, a simple reputation-based scheme is proposed which uses the consumption and contribution information for selfish node detection and cooperation enforcement. Nodes failing to cooperate are detached from the network to save resources of other nodes with good reputation. The simulation results show that our proposed scheme outperforms the benchmark scheme in terms of NRL (normalized routing load), PDF (packet delivery fraction), and packet drop in the presence of malicious and selfish attacks. Furthermore, our scheme identifies the selfish nodes quickly and accurately as compared to the benchmark scheme.

A Comprehensive Analysis on Detecting Chronic Kidney Disease by Employing Machine Learning Algorithms

Article

Full-text available

Aug 2021

Chronic Kidney Disease refers to the slow, progressive deterioration of kidney functions. However, the impairment is irreversible and imperceptible up until the disease reaches one of the later stages, demanding early detection and initiation of treatment in order to ensure a good prognosis and prolonged life. In this aspect, machine learning algorithms have proven to be promising, and points towards the future of disease diagnosis. We aim to apply different machine learning algorithms for the purpose of assessing and comparing their accuracies and other performance parameters for the detection of chronic kidney disease. The 'chronic kidney disease dataset' from the machine learning repository of University of California, Irvine, has been harnessed, and eight supervised machine learning models have been developed by utilizing the python programming language for the detection of the disease. A comparative analysis is portrayed among eight machine learning models by evaluating different performance parameters like accuracy, precision, sensitivity, F1 score and ROC-AUC. Among the models, Random Forest displayed the highest accuracy of 99.75%. We observed that machine learning algorithms can contribute significantly to the domain of predictive analysis of chronic kidney disease, and can assist in developing a robust computer-aided diagnosis system to aid the healthcare professionals in treating the patients properly and efficiently.

Optimization of Prediction Method of Chronic Kidney Disease Using Machine Learning Algorithm

Preprint

Full-text available

Nov 2020

Chronic Kidney disease (CKD), a slow and late-diagnosed disease, is one of the most important problems of mortality rate in the medical sector nowadays. Based on this critical issue, a significant number of men and women are now suffering due to the lack of early screening systems and appropriate care each year. However, patients' lives can be saved with the fast detection of disease in the earliest stage. In addition, the evaluation process of machine learning algorithm can detect the stage of this deadly disease much quicker with a reliable dataset. In this paper, the overall study has been implemented based on four reliable approaches, such as Support Vector Machine (henceforth SVM), AdaBoost (henceforth AB), Linear Discriminant Analysis (henceforth LDA), and Gradient Boosting (henceforth GB) to get highly accurate results of prediction. These algorithms are implemented on an online dataset of UCI machine learning repository. The highest predictable accuracy is obtained from Gradient Boosting (GB) Classifiers which is about to 99.80% accuracy. Later, different performance evaluation metrics have also been displayed to show appropriate outcomes. To end with, the most efficient and optimized algorithms for the proposed job can be selected depending on these benchmarks.

Chronic Kidney Disease Prediction using Machine Learning Models

Article

Full-text available

May 2020

Revathy Ramesh

: The field of biosciences have advanced to a larger extent and have generated large amounts of information from Electronic Health Records. This have given rise to the acute need of knowledge generation from this enormous amount of data. Data mining methods and machine learning play a major role in this aspect of biosciences. Chronic Kidney Disease(CKD) is a condition in which the kidneys are damaged and cannot filter blood as they always do. A family history of kidney diseases or failure, high blood pressure, type 2 diabetes may lead to CKD. This is a lasting damage to the kidney and chances of getting worser by time is high. The very common complications that results due to a kidney failure are heart diseases, anemia, bone diseases, high potasium and calcium. The worst case situation leads to complete kidney failure and necessitates kidney transplant to live. An early detection of CKD can improve the quality of life to a greater extent. This calls for good prediction algorithm to predict CKD at an earlier stage. Literature shows a wide range of machine learning algorithms employed for the prediction of CKD. This paper uses data preprocessing,data transformation and various classifiers to predict CKD and also proposes best Prediction framework for CKD. The results of the framework show promising results of better prediction at an early stage of CKD

Machine Learning Applied to Kidney Disease Prediction: Comparison Study

Conference Paper

Dec 2019

Machine learning has earned a remarkable position in the healthcare sector because of its capability to enhance disease prediction. Artificial intelligence and machine learning techniques are being used across the sector. Kidney disease is now one of the world's crucial health-related problems. It is increasing day by day because of poor food habits, insufficient water intake, and a lack of health consciousness. A technique is therefore needed that continuously and effectively monitors health conditions. Here, we propose an approach for real-time kidney disease prediction, monitoring, and application (KDPMA). Our aim is to find an optimized and efficient machine learning (ML) technique that can effectively recognize and predict chronic kidney disease. In this work, we used the ten most popular machine learning techniques to predict kidney disease. The data were divided into two parts: a training set used to fit the models and a test set used to evaluate them. The analysis shows that the Decision Tree classifier and Gaussian Naive Bayes achieved the highest performance among the classifiers, obtaining an accuracy of 100% and a recall (sensitivity) of 1. We are now developing a mobile application, based on the best-performing classifier, to predict kidney disease from patient reports.
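The two scores quoted in this abstract, accuracy and recall (sensitivity), can be illustrated with a minimal Python sketch. The function names, the label encoding (1 = CKD, 0 = not CKD), and the toy labels are assumptions made for illustration only; they are not taken from the cited paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Sensitivity: true positives / (true positives + false negatives)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp + fn == 0:
        raise ValueError("no positive samples in y_true")
    return tp / (tp + fn)


# Hypothetical held-out labels (1 = CKD, 0 = not CKD) and model predictions.
y_test = [1, 1, 0, 1, 0, 0, 1, 0]
y_hat = [1, 1, 0, 0, 0, 0, 1, 0]

print(accuracy(y_test, y_hat))  # 7 of 8 predictions are correct
print(recall(y_test, y_hat))    # 3 of 4 CKD cases are detected
```

A recall of 1 means that no CKD case in the test set was missed, which is why it is reported alongside accuracy for a screening task.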

Chronic Kidney Disease Prediction using Machine Learning Algorithms

Article

May 2023

Diabetes and high blood pressure are the primary causes of chronic kidney disease (CKD). A person with CKD has a higher chance of dying young. Doctors face a difficult task in diagnosing the different diseases linked to CKD at an early stage to prevent the disease. Early detection of CKD enables patients to receive timely treatment to slow the progression of the disease. CKD is among the top 20 causes of death worldwide and affects approximately 10% of the world's adult population. CKD is a disorder that disrupts normal kidney function. The novelty of this study lies in developing a diagnosis system to detect chronic kidney disease. This study evaluated a dataset collected from 400 patients and containing 24 features. Missing numerical values were replaced with the mean and missing nominal values with the mode. Recursive Feature Elimination (RFE) was applied to choose the most important features. Three classification algorithms were applied: k-nearest neighbors (KNN), Random Forest Classifier (RFC), and AdaBoost Classifier (ABC). All the classification algorithms achieved promising performance. RFC and ABC outperformed KNN, reaching 100% accuracy, precision, recall, and F1-score. Machine learning techniques are therefore of great importance in the early detection of CKD. These techniques support experts and doctors in early diagnosis to prevent the development of kidney failure.
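The missing-value step described in this abstract (mean for numerical features, mode for nominal features) can be sketched in plain Python as follows. The helper `impute_column`, the column names, and the toy values are hypothetical and chosen only for illustration; the cited study's actual implementation is not shown in the abstract.

```python
from statistics import mean, mode


def impute_column(values, kind):
    """Replace None entries with the column mean (numeric) or mode (nominal)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    if kind == "numeric":
        fill = mean(observed)
    elif kind == "nominal":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return [fill if v is None else v for v in values]


# Hypothetical toy columns; None marks a missing entry.
blood_pressure = [80, None, 70, 90]
hypertension = ["yes", "no", None, "no"]

bp_filled = impute_column(blood_pressure, "numeric")
htn_filled = impute_column(hypertension, "nominal")

print(bp_filled)   # the gap is filled with the mean of 80, 70, 90
print(htn_filled)  # the gap is filled with the most frequent value
```

The fill value is computed from observed entries only, so the imputed column keeps the same mean (numeric) or most frequent category (nominal) as the observed data.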

Heart Disease Prediction using Hybrid machine Learning Model

Conference Paper

Jan 2021

Heart disease causes significant mortality around the world and has become a health threat for many people. Early prediction of heart disease may save many lives; detecting cardiovascular diseases such as heart attacks and coronary artery disease through regular clinical data analysis is a critical challenge. Machine learning (ML) can offer an effective solution for decision making and accurate prediction. The medical industry is showing enormous development in using machine learning techniques. In this work, a novel machine learning approach is proposed to predict heart disease. The study uses the Cleveland heart disease dataset, and data mining techniques such as regression and classification are applied. Three machine learning algorithms are used in the implementation: (1) Random Forest, (2) Decision Tree, and (3) a hybrid model combining Random Forest and Decision Tree. Experimental results show an accuracy of 88.7% for heart disease prediction with the hybrid model. An interface is designed to collect the user's input parameters and predict heart disease using the hybrid model of Decision Tree and Random Forest.

Optimization of Prediction Method of Chronic Kidney Disease Using Machine Learning Algorithm

Conference Paper

Oct 2020
