Conference Paper

Performance Evaluation of Random Forests and Artificial Neural Networks for the Classification of Liver Disorder

Md. Rezwanul Haque¹, Md. Milon Islam¹, Hasib Iqbal¹, Md. Sumon Reza², and Md. Kamrul Hasan¹
¹,²Department of Computer Science and Engineering
¹Khulna University of Engineering & Technology, Khulna-9203, Bangladesh
²Daffodil International University, Dhaka-1207, Bangladesh
r.haque.249.rh@gmail.com¹, milonislam@cse.kuet.ac.bd¹, pranto00250@gmail.com¹, sumon.info2015@gmail.com², and mhgolap11@gmail.com¹
Abstract— The liver is a major organ of the human body that supports digesting food, eliminating toxins, and storing energy. The number of patients with liver disorder is rising rapidly all over the world. However, the disorder is very hard to identify from its ambiguous symptoms, which increases the mortality rate of the disease. This paper presents an expert scheme for the classification of liver disorder using Random Forests (RFs) and Artificial Neural Networks (ANNs). The methods are trained on the input features using 10-fold cross-validation. The BUPA liver dataset, retrieved from the UCI machine learning repository, is used for this study. The performance of the proposed scheme is assessed in terms of accuracy, positive predictive value, negative predictive value, sensitivity, specificity, and F1 score. The scheme delivers better results in training than in testing. In the testing phase, it obtained accuracies of 80% and 85.29% with RFs and ANNs respectively, along with F1 scores of 75.86% and 82.76%.
Keywords- Liver Disorder; Random Forests; Artificial Neural
Networks; Performance Measure Indices.
I. INTRODUCTION
According to National Statistics [1] in the UK, liver disorder is the fifth most common cause of mortality. It is also recognized as the second leading cause of death among all digestive diseases in the US [2]. Recent statistics from the International Liver Congress [3] show that about 29 million people in the European region suffer from a chronic liver condition and 30 million have a liver disorder in America.
The liver is the largest solid organ in the human body; because it makes and secretes bile, it is also considered a gland. The liver is located in the upper right portion of the abdomen, surrounded by the rib cage. It performs many complex tasks inside the body, and when the liver is damaged these tasks cannot be completed, which causes illness. Liver disorder is any disturbance of liver function that causes illness [4]. Alcohol consumption plays a major role in liver disorder. Parasites and viruses may infect the liver, resulting in inflammation that reduces liver function. An abnormal gene inherited from the parents can also cause liver disorder.
Liver disorder is generally very difficult to detect because its symptoms are ambiguous and are confused with various other health problems [5]. Doctors generally use various types of blood tests to determine whether the liver is functioning properly. The indicators used in liver tests (LTs) to detect liver disorder are gamma-glutamyl transpeptidase (GGT), alanine aminotransferase (ALT), alkaline phosphatase (ALP), and aspartate aminotransferase (AST), enzymes that are found in liver cells. Abnormal levels of these enzymes in the blood generally indicate liver disorder. ALT and AST are used for the accurate diagnosis of liver disorder in the case of viral hepatitis. Anomalies in ALP and GGT also indicate liver disorder [6].
The above-described techniques are unable to deliver an exact and consistent result, and they depend on doctors, physicians, or other medical staff. A scheme that can operate without medical equipment and medical staff can therefore be a suitable solution. We introduce an expert scheme that classifies the input attributes according to the selector (class label) of liver disorder. We have used two supervised machine learning techniques, RFs and ANNs, which are learning algorithms that analyze data for classification and regression, to identify liver disorder.
The rest of the paper is organized as follows. Section II reviews the related works in this field. The working principles of RFs and ANNs are described in Section III. The proposed methodology with the necessary steps is explained in Section IV. The implementation and results analysis are presented in Section V. Section VI concludes the paper.
II. RELATED WORKS
Several techniques have recently been developed for the classification of liver disorder. The recent works in this field are described briefly as follows.
Ramana et al. [7] proposed a technique for the detection of liver disorder using selected classification algorithms. The approaches used are Support Vector Machines (SVM), C4.5, Naïve Bayes classifier (NBC), and Back Propagation Neural Network, and the performance is measured in terms of accuracy, precision, sensitivity, etc. They simulated their work in WEKA. The accuracy obtained is 56.62% with NBC, 68.69% with C4.5, 71.59% with Back Propagation, 62.89% with K-NN, and 58.26% with SVM. They reported the results obtained by tuning the features.
In [8], the authors demonstrated a scheme for the identification of liver disorder using KNN with Euclidean distance. The results of the KNN-based method are compared with those of other classifiers. The scheme obtained an accuracy of 92.53% in testing and 100% in training.
Bahramirad et al. [9] presented a comparative study for the detection of liver disorder. Rapid Miner and Weka are used for simulation. They used eleven classification algorithms in their study. The highest accuracy obtained is 73.91%, using Neural Net and Gaussian Processes. They also measured precision and recall for all the algorithms. The precision is 79.01% using Gaussian Processes and the recall is 79.52% using Neural Net.
The authors in [10] surveyed and compared various data mining algorithms for the prediction of liver disorder using Rapid Miner and IBM SPSS Modeler. C5.0, C4.5, Decision tree, and Neural Network are used as classification algorithms. The C4.5 and C5.0 algorithms obtained accuracies of 72.37% and 87.91% using the IBM SPSS Modeler and Rapid Miner tools respectively.
Dixon et al. [11] proposed a scheme based on artificial immune systems for liver disorder diagnosis that works on blood test outcomes. The scheme employed ANN and other classification algorithms. The detection rate obtained by the scheme is 81.18% using MVD.
III. THE THEORETICAL EXPLANATION
Learning procedures in machine learning can be divided into two principal categories: supervised and unsupervised learning. Supervised learning is used to predict a certain outcome from a given input, using input-output pairs. The machine learning model is built from these input-output pairs, which comprise the training set. The goal is to make accurate predictions for new, never-before-seen data.
A. Random Forests
Random Forests classification is a supervised classification technique. It is an ensemble method that can be thought of as a form of nearest neighbor predictor. Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. Random Forests are essentially a collection of decision trees; the model is built from a number of trees [12], [13].
The basic steps of the Random Forests technique are as follows.
Step 1: Pick K data points at random from the training set.
Step 2: Build the decision tree associated with these K data points.
Step 3: Choose the number of trees, N-tree, to build and repeat Steps 1 and 2.
Step 4: For a new data point, let each of the N-tree trees predict the category to which the data point belongs, and assign the new data point to the category that wins the majority vote.
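The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: each "tree" is reduced to a one-split decision stump, the K points are drawn with replacement, and the per-split feature subsampling of full Random Forests is omitted. The function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-split tree (a stump) that minimises training errors."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No valid split (all sampled rows identical): always predict the majority.
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(rows, labels, n_trees=10, k=None, seed=1234):
    rng = random.Random(seed)
    k = k or len(rows)
    forest = []
    for _ in range(n_trees):                                  # Step 3: repeat N-tree times
        idx = [rng.randrange(len(rows)) for _ in range(k)]    # Step 1: K random points
        forest.append(fit_stump([rows[i] for i in idx],       # Step 2: one tree per sample
                                [labels[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    votes = Counter(predict_stump(s, row) for s in forest)    # Step 4: majority vote
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features, class labels 1 and 2.
rows = [[1.0, 5.0], [2.0, 4.0], [3.0, 6.0], [8.0, 5.0], [9.0, 4.0], [10.0, 6.0]]
labels = [1, 1, 1, 2, 2, 2]
forest = fit_forest(rows, labels, n_trees=10)
low_prediction = predict_forest(forest, [0.5, 5.0])
high_prediction = predict_forest(forest, [9.5, 5.0])
print(low_prediction, high_prediction)
```

Fixing the seed makes the random draws repeatable, which mirrors the fixed random state listed among the tuning parameters in Table I.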
Fig. 1. Working procedure of Artificial Neural Networks [12].
B. Artificial Neural Networks
Artificial neural networks are based on a rather simple model of a neuron. Most neurons have three parts: a dendrite, which collects inputs from other neurons (or from an external stimulus); a soma, which performs an important nonlinear processing step; and an axon, a cable-like wire along which the output signal is transmitted to other neurons further down the processing chain.
In a neural network, the output value can be continuous, binary, or categorical. Here, we apply a multilayer perceptron (also known as a feed-forward network) for classification and regression, which can serve as a starting point for more involved deep learning methods. In the construction of this network, the number of neurons in the input layer equals the number of features available for decision making about each sample; here there are six input neurons, one per attribute of a sample. The working procedure of Artificial Neural Networks [12] is illustrated in Fig. 1. The remaining part of the network is the hidden layer. A single hidden layer is used because one layer is sufficient for most problems [12].
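A forward pass through such a network can be sketched as follows. The sketch assumes six inputs, one hidden layer with two neurons (the hidden layer size listed in Table I), and sigmoid activations; the activation function and all weight values are assumptions made for illustration, since the paper does not state them.

```python
import math


def sigmoid(z):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Return (hidden activations, output) for one sample x."""
    # Each hidden neuron takes a weighted sum of all inputs plus a bias.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    # The output neuron combines the hidden activations the same way.
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


# Hypothetical, already-scaled values for the six attributes x1..x6.
sample = [0.2, -0.5, 1.1, 0.7, -0.3, 0.9]
# Hypothetical weights: 2 hidden neurons x 6 inputs, then 1 output neuron.
w_hidden = [[0.1, -0.2, 0.4, 0.3, 0.0, 0.5],
            [-0.3, 0.2, 0.1, -0.1, 0.6, -0.4]]
b_hidden = [0.0, 0.1]
w_out = [0.8, -0.6]
b_out = 0.05

hidden, probability = forward(sample, w_hidden, b_hidden, w_out, b_out)
print(hidden, probability)
```

Training would adjust the weights and biases by backpropagation; only the prediction step is shown here.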
IV. THE PROPOSED METHODOLOGY
A. Data Collection and Preparation
The liver disorder dataset, named BUPA, is collected from the University of California, Irvine (UCI) machine learning repository [14]. This dataset contains 345 records of liver patients, where each case is labeled as either class label 1 or 2; 145 (42.03%) of the cases are of class label 1 and 200 (57.97%) are of class label 2. The dataset has 7 attributes: the class label and the following six features.
MCV (x1)
Alkphos (x2)
SGPT (x3)
SGOT (x4)
Gammagt (x5)
Drinks / Day (x6)
A positive diagnosis is assigned class label 1 and a negative diagnosis class label 2. Pearson correlation is a measure of the linear correlation between two attributes. The correlation among the six attributes is calculated separately for class label 1 and class label 2, as illustrated in Fig. 2 and Fig. 3. All pairwise correlations are positive, and the correlation patterns of the two classes are similar.
Fig. 2. Pearson correlation for class label 1.
Fig. 3. Pearson correlation for class label 2.
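The Pearson coefficient between two attributes is their covariance divided by the product of their standard deviations. A minimal sketch of that computation, and of a per-class correlation matrix built from it, is given below; the function names and the toy columns are hypothetical and are not values from the BUPA dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a list of attribute columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


# Hypothetical toy columns standing in for three attributes of one class.
columns = [
    [85.0, 90.0, 92.0, 88.0, 95.0],
    [20.0, 35.0, 40.0, 28.0, 60.0],
    [1.0, 4.0, 2.0, 6.0, 3.0],
]
matrix = correlation_matrix(columns)
for row in matrix:
    print([round(value, 2) for value in row])
```

Applying `correlation_matrix` to the records of class label 1 and class label 2 separately would give one matrix per class.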
B. Training and Testing Phase
The training phase fits the model to the features of the training set, and the testing phase measures how well the model makes predictions on the testing set. In k-fold cross-validation, the given dataset is split into k equal-size chunks. A single chunk is used for testing and the other k-1 chunks are used for training. The process is repeated k times, so that every record is used for training as well as for testing. k-fold cross-validation helps to avoid the overfitting scenario.
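The splitting scheme can be sketched as an index generator. This is a simplified illustration: the folds are contiguous and unshuffled, whereas practical implementations usually shuffle or stratify the records first. The function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i in range(k):
        test_idx = folds[i]
        # The other k-1 folds together form the training set.
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, test_idx


# 345 records and k = 10, matching the dataset size and fold count in the paper.
splits = list(k_fold_indices(345, 10))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

Because 345 is not divisible by 10, the test folds hold 35 or 34 records each, and every record appears in exactly one test fold.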
C. The Performance Measure Indices
The performance of machine learning techniques is measured in terms of performance measure indices. A confusion matrix of actual versus predicted classes, comprising TP, FP, TN, and FN, is formed to evaluate the indices. The meaning of the terms is given below.
TP = True Positive (Correctly Identified)
TN = True Negative (Correctly Rejected)
FP = False Positive (Incorrectly Identified)
FN = False Negative (Incorrectly Rejected)
The performance of the proposed system is measured by formulas (1) to (6).
V. IMPLEMENTATION AND RESULTS ANALYSIS
We have developed a model using Random Forests and Artificial Neural Networks, implemented on a computer with an Intel Core i5 processor and 8 GB of RAM. We have used Scikit-learn, an open-source machine learning library for Python. The Spyder integrated development environment is used to run the program.
We set some tuning parameters for both modalities, as shown in Table I.
We have used the 10-fold cross-validation technique, i.e., the dataset was split into 10 chunks, to validate the model. In this scenario, 9 folds are utilized for training and the remaining one for testing.
TABLE I. TUNING PARAMETERS
No.  RFs                          ANNs
1    Number of estimators = 10    Hidden layer size = 2
2    Criterion = 'entropy'        Input dimension = 6
3    Random state = 1234          Batch size = 12
4    Cross-validation = 10        Cross-validation = 10
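The settings in Table I map naturally onto scikit-learn estimators. The sketch below is one possible reading, not the authors' script: it uses synthetic data from `make_classification` as a stand-in for the BUPA records (345 samples, 6 features), and it assumes `MLPClassifier` with feature standardization for the ANN, with `max_iter=500` chosen arbitrarily. The scores it prints therefore say nothing about the results reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the dataset: 345 records, 6 input features, 2 classes.
X, y = make_classification(n_samples=345, n_features=6, n_informative=4,
                           n_redundant=0, random_state=0)

# Random Forests with the Table I settings.
rf = RandomForestClassifier(n_estimators=10, criterion="entropy",
                            random_state=1234)

# ANN stand-in: one hidden layer of 2 neurons, mini-batches of 12 samples.
# Scaling and max_iter are assumptions, not settings from the paper.
ann = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(2,), batch_size=12,
                  max_iter=500, random_state=1234),
)

# 10-fold cross-validation; the default score for classifiers is accuracy.
rf_scores = cross_val_score(rf, X, y, cv=10)
ann_scores = cross_val_score(ann, X, y, cv=10)

print("RFs  mean accuracy:", rf_scores.mean())
print("ANNs mean accuracy:", ann_scores.mean())
```

To try this on the real data, `X` and `y` would be replaced by the six BUPA features and the class label loaded from the UCI file.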
Pearson correlation for class label 1 (data plotted in Fig. 2)
      x1    x2    x3    x4    x5    x6
x1    1     0.08  0.21  0.27  0.3   0.43
x2    0.08  1     0.02  0.16  0.17  0.06
x3    0.21  0.02  1     0.68  0.6   0.42
x4    0.27  0.16  0.68  1     0.56  0.43
x5    0.3   0.17  0.6   0.56  1     0.49
x6    0.43  0.06  0.42  0.43  0.49  1
Pearson correlation for class label 2 (data plotted in Fig. 3)
      x1    x2    x3    x4    x5    x6
x1    1     0.01  0.12  0.18  0.21  0.24
x2    0.01  1     0.11  0.17  0.14  0.14
x3    0.12  0.11  1     0.78  0.48  0.07
x4    0.18  0.17  0.78  1     0.5   0.22
x5    0.21  0.14  0.48  0.5   1     0.26
x6    0.24  0.14  0.07  0.22  0.26  1
\begin{align}
\text{Accuracy (Acc)} &= \frac{TP + TN}{TP + TN + FP + FN} \tag{1}\\
\text{Sensitivity (Sen)} &= \frac{TP}{TP + FN} \tag{2}\\
\text{Specificity (Spec)} &= \frac{TN}{TN + FP} \tag{3}\\
\text{Positive Predictive Value (PPV)} &= \frac{TP}{TP + FP} \tag{4}\\
\text{Negative Predictive Value (NPV)} &= \frac{TN}{TN + FN} \tag{5}\\
\text{F1 Score} &= \frac{2\,TP}{2\,TP + FP + FN} \tag{6}
\end{align}
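Formulas (1) to (6) can be checked directly against the testing-phase confusion matrices reported with Fig. 4 (RFs: TP = 11, TN = 17, FP = 2, FN = 5; ANNs: TP = 12, TN = 17, FP = 2, FN = 3). The helper below is an illustrative sketch; the function and key names are hypothetical.

```python
def performance_indices(tp, tn, fp, fn):
    """Compute the six performance measure indices from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Eq. (1)
        "sensitivity": tp / (tp + fn),                  # Eq. (2)
        "specificity": tn / (tn + fp),                  # Eq. (3)
        "ppv": tp / (tp + fp),                          # Eq. (4)
        "npv": tn / (tn + fn),                          # Eq. (5)
        "f1": 2 * tp / (2 * tp + fp + fn),              # Eq. (6)
    }


# Testing-phase counts as reported in the paper.
rf_test = performance_indices(tp=11, tn=17, fp=2, fn=5)
ann_test = performance_indices(tp=12, tn=17, fp=2, fn=3)

for name, indices in (("RFs", rf_test), ("ANNs", ann_test)):
    print(name, {key: round(100 * value, 2) for key, value in indices.items()})
```

For the ANNs this gives an accuracy of 29/34 and an F1 score of 24/29, which round to the 85.29% and 82.76% listed in Table II.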
Fig. 4. Confusion matrix. (a) Training phase (b) Testing phase.
We have formed a confusion matrix from the model. We have utilized 90% of the instances for training the Random Forests (RFs) and Artificial Neural Networks (ANNs) models individually, and the remaining 10% for testing each of them. The confusion matrix for each modality is illustrated in Fig. 4.
The performance measure indices are calculated for both training and testing using the above-described equations. The calculated values are given in Table II, and a graphical view of the indices is shown in Fig. 5. Table II and Fig. 5 show that ANNs perform at least as well as RFs on every index in the testing phase, with a specificity, sensitivity, and accuracy of 89.47%, 80%, and 85.29% respectively.
A comparison with existing methods is drawn in Table III for the testing phase. In [9], the authors measured the performance of eleven classification algorithms, such as Gaussian Processes, Linear Logistic Regression, Multilayer Perceptron, Neural Network, and Support Vector Machine, on the BUPA liver dataset. The accuracy, sensitivity, and PPV obtained by Neural Network are 73.91%, 79.52%, and 77.65% respectively in the testing phase. Lin et al. [15] proposed a hybrid approach, and the accuracy obtained by the system is 78.18%. Polat et al. [16] introduced a technique for the diagnosis of liver disorder using the Artificial Immune Recognition System (AIRS) and obtained an accuracy of 83.38%. The accuracy achieved by our model is 80% with Random Forests and 85.29% with Artificial Neural Networks. The developed model also provides a greater positive predictive value, and the ANN-based model a greater sensitivity. We have used the 10-fold technique to build our model.
TABLE II. PERFORMANCE MEASURE INDICES
                                 Training Phase     Testing Phase
Parameters                       ANNs     RFs       ANNs     RFs
Accuracy (%)                     84.24    98.57     85.29    80
Positive Predictive Value (%)    74.81    100       85.71    84.61
Negative Predictive Value (%)    91.11    96.15     85       77.27
Sensitivity (%)                  85.96    97.78     80       68.75
Specificity (%)                  83.25    100       89.47    89.47
F1 Score (%)                     80       98.88     82.76    75.86
Fig. 5. Performance measure indices for the classification of liver disorder. (a) Training phase (b) Testing phase.
TABLE III. COMPARISON WITH EXISTING METHODS IN TESTING PHASE
Methods                        Acc (%)   PPV (%)   Sen (%)
Gaussian Processes             73.91     79.01     77.11
Linear Logistic Regression     69.57     74.70     74.70
Multilayer Perceptron          68.84     76.32     69.88
Neural Net                     73.91     77.65     79.52
Support Vector Machine         69.23     75        75
CBR-PSO                        78.18     -         -
AIRS                           83.38     -         -
Developed model using RFs      80        89.47     68.75
Developed model using ANNs     85.29     89.47     80
Confusion matrix in training phase (data plotted in Fig. 4(a))
        TP    TN    FP    FN
RFs     132   75    0     3
ANNs    98    164   33    16
Confusion matrix in testing phase (data plotted in Fig. 4(b))
        TP    TN    FP    FN
RFs     11    17    2     5
ANNs    12    17    2     3
[Fig. 5 panels: bar charts of the performance measure indices (Acc, PPV, NPV, Sen, Spec, F1 Score; scale 0 to 100) for RFs and ANNs.]
VI. CONCLUSION
Liver disorder classification is very significant in the medical and biomedical fields. In this paper, we focused on building a model that classifies liver disorder, a severe and remarkably risky disease that causes many deaths all over the world. Early detection of this disorder can therefore save many valuable lives. We developed a model based on Random Forests and Artificial Neural Networks. Both techniques have been implemented in Python to classify the diagnostic dataset into two classes. We obtained accuracies of 80% and 85.29% with RFs and ANNs respectively in the testing phase. The developed model will be very helpful for medical staff as well as the general public. Models obtained by supervised machine learning techniques will be very supportive in the diagnosis of medical disorders.
REFERENCES
[1] UK National Statistics, http://www.statistics.gov.uk, accessed on: Sep. 25, 2017.
[2] J. E. Everhart and C. E. Ruhl, "Burden of digestive diseases in the United States Part III: Liver, biliary tract, and pancreas," Gastroenterology, 136, pp. 1134-1144, 2009.
[3] American Liver Foundation, The Liver Lowdown - Liver Disease: the big picture, http://www.liverfoundation.org/education/liverlowdown/ll1013/bigpicture, accessed on: Sep. 25, 2017.
[4] Liver Disease, http://www.medicinenet.com/liver_disease/article.htm, accessed on: Sep. 25, 2017.
[5] F. Gonzalez, D. Dasgupta, and L. F. Nino, "A Randomized Real-Valued Negative Selection Algorithm," Second International Conference on Artificial Immune Systems, United Kingdom, September 2003.
[6] Diagnosing Liver Disease, http://www.liver.ca/liverdisease/diagnosing-liver-disease, accessed on: Sep. 25, 2017.
[7] B. Ramana, M. Babu, and N. Venkateswarlu, "A Critical Study of Selected Classification Algorithms for Liver Disease Diagnosis," International Journal of Database Management Systems (IJDMS), Vol. 3, No. 2, pp. 101-114, May 2011.
[8] A. Singh and B. Pandey, "An euclidean distance based KNN computational method for assessing degree of liver damage," 2016 International Conference on Inventive Computation Technologies (ICICT), Vol. 1, pp. 1-4, 2016.
[9] S. Bahramirad, A. Mustapha, and M. Eshraghi, "Classification of liver disease diagnosis: A comparative study," 2013 Second International Conference on Informatics & Applications (ICIA), Lodz, pp. 42-46, 2013.
[10] M. Abdar, "A Survey and Compare the Performance of IBM SPSS Modeler and Rapid Miner Software for Predicting Liver Disease by Using Various Data Mining Algorithms," J. Sci. (CSJ), Vol. 36, 2015.
[11] S. Dixon and X. H. Yu, "Liver disorder detection based on artificial immune systems," 2015 11th International Conference on Natural Computation (ICNC), Zhangjiajie, pp. 743-748, 2015.
[12] G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning, 2013.
[13] S. Guido and A. C. Müller, Introduction to machine learning with python, O'Reilly Media, Inc., 2016.
[14] BUPA Liver Disorders Dataset, UCI Repository of Machine Learning Databases, http://archive.ics.uci.edu/ml/datasets/Liver+Disorders, accessed on: Sep. 25, 2017.
[15] J. J. Lin and P. C. Chang, "A particle swarm optimization based classifier for liver disorders classification," International Conference on Computational Problem-Solving, Lijiang, pp. 63-65, 2010.
[16] K. Polat, S. Sahan, H. Kodaz, and S. Gunes, "A new classification method to diagnosis liver disorders: supervised artificial immune system (AIRS)," Proceedings of the IEEE 13th Signal Processing and Communications Applications Conference, pp. 169-174, 2005.