Performance Evaluation of Random Forests and
Artificial Neural Networks for the Classification of
Liver Disorder
Md. Rezwanul Haque¹, Md. Milon Islam¹, Hasib Iqbal¹, Md. Sumon Reza², and Md. Kamrul Hasan¹
¹,²Department of Computer Science and Engineering
¹Khulna University of Engineering & Technology, Khulna-9203, Bangladesh
²Daffodil International University, Dhaka-1207, Bangladesh
r.haque.249.rh@gmail.com¹, milonislam@cse.kuet.ac.bd¹, pranto00250@gmail.com¹, sumon.info2015@gmail.com², and mhgolap11@gmail.com¹
Abstract- The liver is a major organ of the human body that
supports digestion, removes toxins, and stores energy. The number
of liver disorder patients is rising rapidly all over the world,
yet the disorder is hard to identify from its ambiguous symptoms,
which increases the associated mortality rate. This paper presents
an expert scheme for the classification of liver disorder using
Random Forests (RFs) and Artificial Neural Networks (ANNs). Both
methods are trained on the input features using 10-fold
cross-validation. The BUPA liver dataset, retrieved from the UCI
machine learning repository, is used for the study. The
performance of the proposed scheme is assessed in terms of
accuracy, positive predictive value, negative predictive value,
sensitivity, specificity, and F1 score. The scheme delivers better
results in training but comparatively lower results in testing. In
the testing phase, it obtained an accuracy of 80% and 85.29% and
an F1 score of 75.86% and 82.76% with RFs and ANNs, respectively.
Keywords- Liver Disorder; Random Forests; Artificial Neural
Networks; Performance Measure Indices.
I. INTRODUCTION
According to UK National Statistics [1], liver disorder is the
fifth most common cause of death in the UK. It is also the second
leading cause of death among all digestive diseases in the US [2].
Recent statistics from the International Liver Congress [3] show
that about 29 million people in the European region suffer from a
chronic liver condition and about 30 million people in America
have a liver disorder.
The liver is the largest solid organ in the human body. It is
also considered a gland because it produces and secretes bile. The
liver is located in the upper right portion of the abdomen, where
it is protected by the rib cage. It performs many complex tasks
inside the body, and when it is damaged these tasks cannot be
completed, which causes illness. Liver disorder is any disturbance
of liver function that causes illness [4]. Alcohol consumption
plays a major role in liver disorder. Parasites and viruses may
infect the liver and cause inflammation, which reduces liver
function. An abnormal gene inherited from the parents can also
cause liver disorder.
Liver disorder is generally very difficult to detect because its
symptoms are ambiguous and easily confused with other health
problems [5]. Doctors generally use various types of blood tests
to determine whether the liver is functioning properly. The
indicators used in liver tests (LTs) to detect liver disorder are
gamma-glutamyl transpeptidase (GGT), alanine aminotransferase
(ALT), alkaline phosphatase (ALP), and aspartate aminotransferase
(AST), which are enzymes found in liver cells. Elevated levels of
these enzymes in the blood generally indicate liver disorder. ALT
and AST are used for the accurate diagnosis of liver disorder in
cases of viral hepatitis. Abnormal ALP and GGT levels also
indicate liver disorder [6].
The techniques described above do not always deliver exact and
consistent results, and they depend on doctors, physicians, or
other medical staff. A scheme that can operate without medical
equipment or medical staff can therefore be a suitable solution.
We introduce an expert scheme that classifies the input attributes
according to the selector (class label) of liver disorder. We use
two supervised machine learning techniques, RFs and ANNs, which
are learning algorithms that analyze data for classification and
regression, to identify liver disorder.
The rest of the paper is organized as follows. Section II
reviews the related works in this field. The working principles of
RFs and ANNs are described in Section III. The proposed
methodology and its steps are explained in Section IV. The
implementation and the analysis of results are presented in
Section V. Section VI concludes the paper.
II. RELATED WORKS
Several techniques have recently been developed for the
classification of liver disorder. The recent works in this field
are described briefly as follows.
Ramana et al. [7] proposed a technique for the detection of
liver disorder using selected classification algorithms. The
approaches used are Support Vector Machines (SVM), C4.5, the Naïve
Bayes classifier (NBC), K-NN, and Back Propagation Neural Network,
and the performance of the technique is measured in terms of
accuracy, precision, sensitivity, etc. They simulated their work
in WEKA. The accuracy obtained by the technique is 56.62% with
NBC, 68.69% with C4.5, 71.59% with Back Propagation, 62.89% with
K-NN, and 58.26% with SVM. They reported the results obtained by
tuning the features.
In [8], the authors demonstrated a scheme for the
identification of liver disorder using K-NN with Euclidean
distance. The results of the K-NN based method are compared with
those of other classifiers. The scheme obtained an accuracy of
92.53% in testing and 100% in training.
Bahramirad et al. [9] presented a comparative study for the
detection of liver disorder. Rapid Miner and WEKA are used for
simulation. They used eleven classification algorithms in their
study. The highest accuracy obtained is 73.91%, using Neural Net
and Gaussian Processes. They also measured precision and recall
for all the algorithms. The precision is 79.01% using Gaussian
Processes and the recall is 79.52% using Neural Net.
The authors in [10] surveyed and compared various data mining
algorithms for the prediction of liver disorder using Rapid Miner
and IBM SPSS Modeler. C5.0, C4.5, Decision Tree, and Neural
Network are used as classification algorithms. The C4.5 and C5.0
algorithms obtained accuracies of 72.37% and 87.91% using the IBM
SPSS Modeler and Rapid Miner tools respectively.
Dixon et al. [11] proposed a scheme based on artificial immune
systems for liver disorder diagnosis that works on blood test
outcomes. The scheme employed ANN and other classification
algorithms. The detection rate obtained by the scheme is 81.18%
using MVD.
III. THE THEORETICAL EXPLANATION
Learning procedures in machine learning can be divided into two
principal categories: supervised and unsupervised learning.
Supervised learning is used to predict a certain outcome from a
given input, using known input-output pairs. The machine learning
model is built from these input-output pairs, which comprise the
training set. The goal is to make accurate predictions for new,
never-before-seen data.
A. Random Forests
Random Forests classification is a supervised classification
technique. It is an ensemble method that can be thought of as a
form of nearest neighbor predictor. Ensemble learning is the
process by which multiple models, such as classifiers or experts,
are strategically generated and combined to solve a particular
computational intelligence problem. A Random Forest is essentially
a collection of decision trees, and the model is built from a
number of such trees [12], [13].
The basic steps of the Random Forests technique are as follows.
Step 1: Pick K data points at random from the training set.
Step 2: Build the decision tree associated with these K data points.
Step 3: Choose the number of trees (N-tree) to build and repeat
Steps 1 and 2.
Step 4: For a new data point, make each of the N-tree trees predict
the category to which the data point belongs, and assign the new
data point to the category that wins the majority vote.
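The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: to stay short it uses depth-1 trees (decision stumps) in place of full decision trees, and the names fit_stump, fit_forest, predict_forest and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a depth-1 tree: the (feature, threshold) split with the fewest errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No valid split (all rows identical): always predict the majority label.
        maj = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), maj, maj)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_forest(rows, labels, n_trees=10, k=None, seed=1234):
    """Steps 1-3: draw K random points (with replacement) and fit one tree, N times."""
    rng = random.Random(seed)
    k = k or len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(k)]        # Step 1
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx]))        # Step 2
    return forest


def predict_forest(forest, row):
    """Step 4: every tree votes and the majority category wins."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]


# Toy, made-up data with one feature and class labels 1 and 2.
toy_rows = [[float(v)] for v in range(1, 11)] + [[float(v)] for v in range(21, 31)]
toy_labels = [1] * 10 + [2] * 10
toy_forest = fit_forest(toy_rows, toy_labels, n_trees=10)
print(predict_forest(toy_forest, [0.0]), predict_forest(toy_forest, [40.0]))
```

A full Random Forest would grow deeper trees and usually also sample a random subset of features at each split; the sketch keeps only the bootstrap sampling and the majority vote described in the steps.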
Fig. 1. Working procedure of Artificial Neural Networks [12].
B. Artificial Neural Networks
Artificial neural networks are based on a rather simple model
of a neuron. Most neurons have three parts: a dendrite, which
collects inputs from other neurons (or from an external stimulus);
a soma, which performs an important nonlinear processing step; and
an axon, a cable-like wire along which the output signal is
transmitted to other neurons further down the processing chain.
In a neural network, the output value can be continuous, binary,
or categorical. Here, we apply a multilayer perceptron (also known
as a feed-forward network) for classification and regression,
which can serve as a starting point for more involved deep
learning methods. In the construction of this network, the number
of neurons in the input layer is equal to the number of features
available for making a decision about each sample; here there are
six input neurons, one for each attribute of a sample. The working
procedure of Artificial Neural Networks [12] is illustrated in
Fig. 1. The other part of the network is the hidden layer. A
single hidden layer is used because one layer is sufficient for
most problems [12].
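The forward pass of such a network, with six inputs and one hidden layer, can be illustrated as follows. This is a minimal sketch under stated assumptions: the hidden layer width of 2, the sigmoid activations, the random initial weights, and the names init_network and forward are illustrative choices, and no training step is shown.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def init_network(n_inputs=6, n_hidden=2, seed=0):
    """Random weights; the last entry of every weight list is the bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(network, x):
    """Input layer -> hidden layer -> single output in (0, 1)."""
    hidden, output = network
    h = [sigmoid(sum(w * xi for w, xi in zip(ws[:-1], x)) + ws[-1])
         for ws in hidden]
    return sigmoid(sum(w * hi for w, hi in zip(output[:-1], h)) + output[-1])


net = init_network()
sample = [0.5, 0.2, 0.1, 0.3, 0.4, 0.0]  # six scaled attribute values (made up)
print(forward(net, sample))
```

The output can be read as a score for one of the two classes; thresholding it at 0.5 would give a class decision once the weights have been learned.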
IV. THE PROPOSED METHODOLOGY
A. Data Collection and Preparation
The liver disorder dataset, named BUPA, is collected from the
University of California, Irvine (UCI) machine learning
repository [14]. The dataset contains 345 records of liver
patients, each labeled as either class label 1 or class label 2:
145 (42.03%) of the cases belong to class label 1 and 200 (57.97%)
to class label 2. Excluding the class label, the dataset has the
following six features.
• MCV (x1)
• Alkphos (x2)
• SGPT (x3)
• SGOT (x4)
• Gammagt (x5)
• Drinks / Day (x6)
If the diagnosis is positive, the record falls under class
label 1, and if the diagnosis is negative, it falls under class
label 2. Pearson correlation is a measure of the linear
correlation between two attributes. The correlation among the six
attributes is calculated separately for class label 1 and class
label 2, as illustrated in Fig. 2 and Fig. 3, which shows that the
positive and negative classes have closely related correlation
patterns. The correlation among the attributes is positive in both
classes.
Fig. 2. Pearson correlation for class label 1.
Fig. 3. Pearson correlation for class label 2.
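The per-class Pearson correlation can be computed as in the following sketch. The helper names pearson, correlation_matrix, and per_class_correlation and the toy records are hypothetical; the sketch assumes no attribute is constant within a class, since that would make the denominator zero.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(rows):
    """Pairwise correlation of all attribute columns of a list of records."""
    cols = list(zip(*rows))
    return [[pearson(a, b) for b in cols] for a in cols]


def per_class_correlation(rows, labels):
    """One correlation matrix per class label."""
    return {
        c: correlation_matrix([r for r, y in zip(rows, labels) if y == c])
        for c in sorted(set(labels))
    }


# Toy, made-up records with three attributes and class labels 1 and 2.
toy_rows = [[1, 2, 9], [2, 4, 7], [3, 6, 8], [1, 3, 2], [2, 2, 4], [4, 5, 5]]
toy_labels = [1, 1, 1, 2, 2, 2]
for label, matrix in per_class_correlation(toy_rows, toy_labels).items():
    print(label, [[round(v, 2) for v in row] for row in matrix])
```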
B. Training and Testing Phase
The training phase fits the model to the features of the
training set, and the testing phase measures how well the model
makes predictions on the testing set. In k-fold cross-validation,
the data set is split into k chunks of equal size. A single chunk
is used for testing and the remaining k-1 chunks are used for
training. The process is repeated k times, so that each chunk is
used once for testing. In this way, the whole dataset is used for
training as well as testing. K-fold cross-validation helps to
guard against the overfitting scenario.
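The splitting procedure described above can be sketched as follows. The function name k_fold_indices, the shuffling, and the fixed seed are illustrative assumptions; when the number of samples is not divisible by k, the chunk sizes differ by at most one.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k rounds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k chunks of (almost) equal size
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# 345 is the number of records in the BUPA dataset.
for round_no, (train, test) in enumerate(k_fold_indices(345, k=10), start=1):
    print(round_no, len(train), len(test))
```

Every record appears in exactly one testing chunk and in the training set of the other k-1 rounds.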
C. The Performance Measure Indices
The performance of machine learning techniques is measured in
terms of performance measure indices. A confusion matrix of actual
versus predicted classes, comprising TP, TN, FP, and FN, is formed
to evaluate these indices. The meaning of the terms is given
below.
TP = True Positive ( Correctly Identified )
TN = True Negative ( Correctly Rejected )
FP = False Positive ( Incorrectly Identified )
FN = False Negative ( Incorrectly Rejected )
The performance of the proposed system is measured by the
formulas in (1)-(6).
V. IMPLEMENTATION AND RESULTS ANALYSIS
We have developed a model using Random Forests and Artificial
Neural Networks. It is implemented on a computer with an Intel
Core i5 processor and 8 GB of RAM. We have used Scikit-learn, an
open-source machine learning library for Python. The Spyder
integrated development environment is used to run the program.
We set some tuning parameters for both models. The tuning
parameters are shown in Table I.
We have used the 10-fold cross-validation technique to validate
the model, i.e., the data set is split into 10 chunks. In each
round, 9 folds are used for training and the remaining one for
testing.
TABLE I. TUNING PARAMETERS
No.  RFs                         ANNs
1    Number of estimators = 10   Hidden layer size = 2
2    Criterion = 'entropy'       Input dimension = 6
3    Random state = 1234         Batch size = 12
4    Cross-validation = 10       Cross-validation = 10
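The settings in Table I could be expressed in Scikit-learn roughly as shown below. This is a hedged sketch rather than the original program: the data are a synthetic stand-in generated with make_classification (not the BUPA records), and mapping the ANN settings to MLPClassifier, the feature scaling step, and the max_iter and random_state values for the ANN are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the BUPA features (345 x 6).
X, y = make_classification(n_samples=345, n_features=6, n_informative=4,
                           n_redundant=0, random_state=0)

# RFs column of Table I.
rf = RandomForestClassifier(n_estimators=10, criterion="entropy",
                            random_state=1234)

# ANNs column of Table I; the input dimension (6) follows from X itself.
ann = make_pipeline(
    StandardScaler(),  # assumption: scale the inputs before the network
    MLPClassifier(hidden_layer_sizes=(2,), batch_size=12,
                  max_iter=500, random_state=1234),
)

# 10-fold cross-validation for both classifiers.
rf_scores = cross_val_score(rf, X, y, cv=10)
ann_scores = cross_val_score(ann, X, y, cv=10)
print("RFs  mean accuracy:", rf_scores.mean())
print("ANNs mean accuracy:", ann_scores.mean())
```

Because the data here are synthetic, the printed accuracies are not comparable with the results reported in this paper.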
Data plotted in Fig. 2 (Pearson correlation for class label 1):
      x1    x2    x3    x4    x5    x6
x1    1     0.08  0.21  0.27  0.3   0.43
x2    0.08  1     0.02  0.16  0.17  0.06
x3    0.21  0.02  1     0.68  0.6   0.42
x4    0.27  0.16  0.68  1     0.56  0.43
x5    0.3   0.17  0.6   0.56  1     0.49
x6    0.43  0.06  0.42  0.43  0.49  1
Data plotted in Fig. 3 (Pearson correlation for class label 2):
      x1    x2    x3    x4    x5    x6
x1    1     0.01  0.12  0.18  0.21  0.24
x2    0.01  1     0.11  0.17  0.14  0.14
x3    0.12  0.11  1     0.78  0.48  0.07
x4    0.18  0.17  0.78  1     0.5   0.22
x5    0.21  0.14  0.48  0.5   1     0.26
x6    0.24  0.14  0.07  0.22  0.26  1
\[ \text{Accuracy (Acc)} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]
\[ \text{Sensitivity (Sen)} = \frac{TP}{TP + FN} \tag{2} \]
\[ \text{Specificity (Spec)} = \frac{TN}{TN + FP} \tag{3} \]
\[ \text{Positive Predictive Value (PPV)} = \frac{TP}{TP + FP} \tag{4} \]
\[ \text{Negative Predictive Value (NPV)} = \frac{TN}{TN + FN} \tag{5} \]
\[ \text{F1 Score} = \frac{2\,TP}{2\,TP + FP + FN} \tag{6} \]
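Equations (1)-(6) translate directly into code. The sketch below uses a hypothetical helper named performance_indices and, as a worked example, the RFs confusion matrix counts reported for the testing phase (TP = 11, TN = 17, FP = 2, FN = 5).

```python
def performance_indices(tp, tn, fp, fn):
    """Return the six indices of Eqs. (1)-(6) as fractions in [0, 1]."""
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn),  # Eq. (1)
        "Sen": tp / (tp + fn),                   # Eq. (2)
        "Spec": tn / (tn + fp),                  # Eq. (3)
        "PPV": tp / (tp + fp),                   # Eq. (4)
        "NPV": tn / (tn + fn),                   # Eq. (5)
        "F1": 2 * tp / (2 * tp + fp + fn),       # Eq. (6)
    }


# RFs, testing phase: TP = 11, TN = 17, FP = 2, FN = 5.
rf_test = performance_indices(tp=11, tn=17, fp=2, fn=5)
for name, value in rf_test.items():
    print(f"{name}: {100 * value:.2f}%")
```

For these counts the accuracy is 28/35 = 80% and the sensitivity is 11/16 = 68.75%, which agrees with the RFs testing column of Table II.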
Fig. 4. Confusion matrix. (a) Training phase (b) Testing phase.
We have formed a confusion matrix from the model. We have used
90% of the instances for training the Random Forests (RFs) and
Artificial Neural Networks (ANNs) classifiers individually. The
remaining 10% of the instances are used for testing both RFs and
ANNs individually. The graphical representation of the confusion
matrix for each classifier is illustrated in Fig. 4.
The performance measure indices are calculated for both training
and testing using Eqs. (1)-(6). The calculated values are given in
Table II, and a graphical view of the performance measure indices
is shown in Fig. 5. The results in Table II and Fig. 5 show that,
in the testing phase, ANNs outperform RFs in sensitivity (80%
versus 68.75%) and accuracy (85.29% versus 80%), while both reach
the same specificity of 89.47%.
A comparison with existing methods for the testing phase is
given in Table III. In [9], the authors measured the performance
of eleven classification algorithms, such as Gaussian Processes,
Linear Logistic Regression, Multilayer Perceptron, Neural Net, and
Support Vector Machine, on the BUPA liver dataset. The accuracy,
sensitivity, and PPV obtained by Neural Net are 73.91%, 79.52%,
and 77.65%, respectively, in the testing phase. Lin et al. [15]
proposed a hybrid approach, and the accuracy obtained by the
system is 78.18%. Polat et al. [16] introduced a technique for the
diagnosis of liver disorder using the Artificial Immune
Recognition System (AIRS) and obtained an accuracy of 83.38%. The
accuracy achieved by our model is 80% with Random Forests and
85.29% with Artificial Neural Networks. The developed model using
ANNs also provides higher sensitivity and positive predictive
value than the compared methods. We have used the 10-fold
technique to build our model.
TABLE II. PERFORMANCE MEASURE INDICES
Parameters                       Training Phase     Testing Phase
                                 ANNs     RFs       ANNs     RFs
Accuracy (%)                     84.24    98.57     85.29    80
Positive Predictive Value (%)    74.81    100       85.71    84.61
Negative Predictive Value (%)    91.11    96.15     85       77.27
Sensitivity (%)                  85.96    97.78     80       68.75
Specificity (%)                  83.25    100       89.47    89.47
F1 Score (%)                     80       98.88     82.76    75.86
Fig. 5. Performance measure indices for the classification of liver disorder. (a)
Training phase (b) Testing phase.
TABLE III. COMPARISON WITH EXISTING METHODS IN TESTING PHASE
Methods                          Acc (%)  PPV (%)  Sen (%)
Gaussian Processes [9]           73.91    79.01    77.11
Linear Logistic Regression [9]   69.57    74.70    74.70
Multilayer Perceptron [9]        68.84    76.32    69.88
Neural Net [9]                   73.91    77.65    79.52
Support Vector Machine [9]       69.23    75       75
CBR-PSO [15]                     78.18    -        -
AIRS [16]                        83.38    -        -
Developed model using RFs        80       84.61    68.75
Developed model using ANNs       85.29    85.71    80
Data plotted in Fig. 4(a) (confusion matrix in training phase):
        TP    TN    FP    FN
RFs     132   75    0     3
ANNs    98    164   33    16
Data plotted in Fig. 4(b) (confusion matrix in testing phase):
        TP    TN    FP    FN
RFs     11    17    2     5
ANNs    12    17    2     3
[Fig. 5 bar charts, panels (a) and (b): performance measure indices (Acc, PPV, NPV, Sen, Spec, F1 Score) for RFs and ANNs on a 0-100 scale.]
VI. CONCLUSION
Liver disorder classification is very significant in the
medical and biomedical fields. In this paper, we focused on
building a model that classifies liver disorder, a severe disease.
Liver disorder is a highly dangerous disease that causes many
deaths all over the world, so early detection of this disorder can
save many valuable lives. We developed a model based on Random
Forests and Artificial Neural Networks. Both techniques have been
implemented in Python to classify the diagnostic data set
effectively into the two classes, in view of the seriousness of
the disease. We obtained an accuracy of 80% and 85.29% with RFs
and ANNs, respectively, in the testing phase. The developed model
will be very helpful for medical staff as well as the general
public. Models obtained by supervised machine learning techniques
will be very supportive in the field of medical disorders and
their proper diagnosis.
REFERENCES
[1] UK National Statistics, http://www.statistics.gov.uk, accessed on: Sep. 25, 2017.
[2] J. E. Everhart and C. E. Ruhl, "Burden of digestive diseases in the United States Part III: Liver, biliary tract, and pancreas," Gastroenterology, vol. 136, pp. 1134-1144, 2009.
[3] American Liver Foundation, The Liver Lowdown - Liver Disease: the big picture, http://www.liverfoundation.org/education/liverlowdown/ll1013/bigpicture, accessed on: Sep. 25, 2017.
[4] Liver Disease, http://www.medicinenet.com/liver_disease/article.htm, accessed on: Sep. 25, 2017.
[5] F. Gonzalez, D. Dasgupta, and L. F. Nino, "A Randomized Real-Valued Negative Selection Algorithm," Second International Conference on Artificial Immune Systems, United Kingdom, Sep. 2003.
[6] Diagnosing Liver Disease, http://www.liver.ca/liverdisease/diagnosing-liver-disease, accessed on: Sep. 25, 2017.
[7] B. Ramana, M. Babu, and N. Venkateswarlu, "A Critical Study of Selected Classification Algorithms for Liver Disease Diagnosis," International Journal of Database Management Systems (IJDMS), vol. 3, no. 2, pp. 101-114, May 2011.
[8] A. Singh and B. Pandey, "An euclidean distance based KNN computational method for assessing degree of liver damage," 2016 International Conference on Inventive Computation Technologies (ICICT), vol. 1, pp. 1-4, 2016.
[9] S. Bahramirad, A. Mustapha, and M. Eshraghi, "Classification of liver disease diagnosis: A comparative study," 2013 Second International Conference on Informatics & Applications (ICIA), Lodz, pp. 42-46, 2013.
[10] M. Abdar, "A Survey and Compare the Performance of IBM SPSS Modeler and Rapid Miner Software for Predicting Liver Disease by Using Various Data Mining Algorithms," J. Sci. (CSJ), vol. 36, 2015.
[11] S. Dixon and X. H. Yu, "Liver disorder detection based on artificial immune systems," 2015 11th International Conference on Natural Computation (ICNC), Zhangjiajie, pp. 743-748, 2015.
[12] G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning, 2013.
[13] S. Guido and A. C. Müller, Introduction to Machine Learning with Python, O'Reilly Media, Inc., 2016.
[14] BUPA Liver Disorders Dataset, UCI Repository of Machine Learning Databases, http://archive.ics.uci.edu/ml/datasets/Liver+Disorders, accessed on: Sep. 25, 2017.
[15] J. J. Lin and P. C. Chang, "A particle swarm optimization based classifier for liver disorders classification," International Conference on Computational Problem-Solving, Lijiang, pp. 63-65, 2010.
[16] K. Polat, S. Sahan, H. Kodaz, and S. Gunes, "A new classification method to diagnosis liver disorders: supervised artificial immune system (AIRS)," Proceedings of the IEEE 13th Signal Processing and Communications Applications Conference, pp. 169-174, 2005.