
An Efficient Diagnosis System for Detection of Liver Disease Using a Novel Integrated Method Based on Principal Component Analysis and K-Nearest Neighbor (PCA-KNN)

IGI Global
International Journal of Healthcare Information Systems and Informatics

Abstract

Talk of organ failure usually brings kidney disease to mind. By contrast, there is little comparable awareness of liver disease and liver failure, even though liver disease is one of the leading causes of mortality worldwide. Effective diagnosis and timely treatment of patients are therefore paramount. This study accordingly aims to construct an intelligent diagnosis system that integrates principal component analysis (PCA) and k-nearest neighbor (KNN) methods to examine the liver patient dataset. The model combines feature extraction, performed by PCA, with classification, performed by KNN. Prediction results of the proposed system are compared using statistical parameters that include accuracy, sensitivity, specificity, positive predictive value and negative predictive value. In addition to a higher accuracy rate, the model also attained high sensitivity and specificity, which is challenging given the uneven variance among attribute values in the dataset.
DOI: 10.4018/IJHISI.2016100103

Volume 11 • Issue 4 • October-December 2016





Aman Singh, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, India
Babita Pandey, Department of Computer Applications, Lovely Professional University, Phagwara, India

Classification, Feature Reduction, K-Nearest Neighbor, Liver Disease Diagnosis, Principal Component Analysis

Over the last few decades, liver disease has become highly prevalent worldwide. The liver is of vital
importance to the human body because it performs numerous key functions, including chemical
detoxification, protein production, drug metabolism, blood clotting, glucose storage, cholesterol
production and bilirubin clearance (Bucak and Baki, 2010). Impairment of any of these functions
leads to liver disease. Nausea, fatigue, energy loss, weight loss, poor appetite, and upper right quadrant
abdominal pain are some of the early symptoms of the disease. These symptoms may develop slowly
but can worsen over time, depending on an individual’s lifestyle. Severe symptoms
may include memory problems and confusion, abnormal bleeding, easy bruising, redness of the palms,
jaundice, edema and ascites. Common causes of the disease are hepatitis A, B, C, D and E, inherited
abnormal genes, Epstein-Barr virus, iron overload and alcohol abuse (Chuang, 2011; Lin and
Chuang, 2010; Lin, 2009). More than a hundred types of liver disease exist, of which the most prevalent
are autoimmune hepatitis, neonatal hepatitis, primary biliary cholangitis, liver fibrosis, liver cirrhosis,
liver cancer, alcoholic liver disease and nonalcoholic fatty liver disease (Singh and Pandey, 2014).
Individual and integrated computer-aided models have been widely used to evaluate liver disease
and its types. The literature shows considerable use of artificial neural networks (ANN),
fuzzy logic (FL), decision trees, ANN-CBR (case-based reasoning), ANN-FL, AIS (artificial immune
system)-FL, ANN-GA (genetic algorithm), FL-GA, AIS-ANN-FL and ANN-GA-RBR (rule-based
reasoning) to build liver diagnostic systems. ANN-based frameworks have shown high reliability,
robustness and accuracy, and generally need little learning time even for large and growing
problems (Autio et al., 2007; Azaid et al., 2006; Bucak and Baki, 2010; Elizondo et al., 2012;
Hashem et al., 2010; Içer et al., 2006; Lee et al., 2005; Ozyilmaz and Yildirim, 2003). ANN-based
models were developed to predict the early prognosis of hepatectomized patients (Hamamoto et al.,
1995), to classify hepatobiliary disorders (Hayashi et al., 2000), to detect hepatitis disease (Ozyilmaz
and Yildirim, 2003; Sartakhti et al., 2015) and to diagnose liver disease (Revett et al., 2006). FL-based
methodologies were used to perform semi-automatic liver tumour segmentation (Li et al., 2012),
to identify hepatitis disease (Obot and Udoh, 2011) and to classify hepatobiliary disorders
(Ming et al., 2011). A C5.0 decision tree with boosting was employed to categorize liver viral
infections as chronic hepatitis C or B (Floares, 2009), and a C4.5 decision tree was applied to
examine liver cirrhosis (Yan et al., 2008). Among integrated approaches, ANN-CBR was built to
detect the presence of liver disease and to determine its type (Chuang, 2011; Lin and Chuang, 2010).
ANN-FL hybridization was used to identify liver disorders (Celikyilmaz et al., 2009; Çomak et al.,
2007; Kulluk et al., 2013; Li and Liu, 2010; Neshat and Zadeh, 2010) and to enhance classification
accuracy for liver disease (Li et al., 2010). AIS-FL integration was used to categorize liver disorders
and to evaluate prediction accuracy for hepatitis disease (Mezyk and Unold, 2011; Polat et al., 2007).
ANN-GA was used to detect liver disorders and to stage liver fibrosis in chronic hepatitis C (Dehuri
and Cho, 2010; Gorunescu et al., 2012). FL-GA was used to detect liver disorders (Luukka, 2009;
Torun and Tohumoğlu, 2011). AIS-ANN-FL was used to classify hepatitis disease (Kahramanli and
Allahverdi, 2009), and ANN-GA-RBR was used to support decisions on liver transplantation
(Aldape-Perez et al., 2012).
Liver disease is assessed using liver function tests. The presence of severe symptoms
also indicates liver damage and makes the disease more recognizable. However, by then it is often
too late to start an appropriate treatment. A physician’s decision typically depends on current blood
test results or on previous assessments of similar cases. This study deployed a number of integrated
models to build an efficient diagnosis system for the detection of liver disease. These models are
principal component analysis-linear discriminant analysis (PCA-LDA), principal component
analysis-diagonal linear discriminant analysis (PCA-DLDA), principal component analysis-quadratic
discriminant analysis (PCA-QDA), principal component analysis-diagonal quadratic discriminant
analysis (PCA-DQDA), principal component analysis-least squares support vector machine
(PCA-LSSVM) and principal component analysis-k-nearest neighbor (PCA-KNN). All these systems
were compared in terms of accuracy, sensitivity, specificity, positive predictive value and negative
predictive value. Results showed that the proposed PCA-KNN model has better classification
performance than the other integrated approaches.
The rest of the paper is arranged as follows. Section 2 presents the proposed integrated intelligent
system and other classifiers implemented. Section 3 describes experimental results and comparison.
Finally, conclusions are drawn in Section 4.

The proposed PCA-KNN model has two key stages: dimensionality reduction, performed by PCA, and
classification, performed by a KNN approach based on the correlation distance metric. Figure 1
illustrates the block diagram of the presented model: the first step loads the dataset, the second
performs data preprocessing, the third executes feature selection and reduction, the fourth classifies
the samples, and the last evaluates performance using statistical parameters that include accuracy,
sensitivity, specificity, positive predictive value and negative predictive value. In data preprocessing,
each sample is represented as a vector of real numbers before being given as input to KNN. The dataset
contains one nominal and nine numeric features. For instance, in the selector field a sick class is
represented by 1 and a healthy class by 0. This section also describes PCA
and the classification algorithms employed to detect liver disease. The integrated frameworks
implemented for the study are PCA-LDA, PCA-DLDA, PCA-QDA, PCA-DQDA, PCA-LSSVM and
PCA-KNN. The methods and techniques used are introduced as follows.
R. A. Fisher developed LDA in 1936 for feature reduction and classification. It uses the covariance
matrix to distinguish between distinct classes and finds a linear combination of the given variables
that separates the classes. Inputs and targets are given to LDA in numerical form (Guo et al.,
2007; Ye and Li, 2005). Between-class variance and within-class variance are used to draw the
decision region for a given target. For example, suppose the dataset has N classes, $\mu_h$ is the mean
vector of class h, and $B_h$ is the number of samples within class h, where h = 1, 2, ..., N.
$$B = \sum_{h=1}^{N} B_h \tag{1}$$
Figure 1. Block diagram of PCA-KNN based proposed integrated system

$$X_k = \sum_{h=1}^{N} \sum_{q=1}^{B_h} (c_q - \mu_h)(c_q - \mu_h)^{T} \tag{2}$$
$$X_o = \sum_{h=1}^{N} (\mu_h - \mu)(\mu_h - \mu)^{T} \tag{3}$$
$$\mu = \frac{1}{N} \sum_{h=1}^{N} \mu_h \tag{4}$$
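where B is the total number of samples, $c_q$ is the q-th sample of class h, $X_k$ is the within-class
scatter matrix, $X_o$ is the between-class scatter matrix and $\mu$ is the overall mean, computed in
Equation (4) as the average of the class means. In addition, DLDA is a variant of linear discriminant
analysis in which the common covariance matrix is assumed to be diagonal, i.e. all of its off-diagonal
elements are set to zero.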
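The scatter matrices of Equations (2) to (4) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `scatter_matrices` and the four-sample toy data are assumptions introduced here, and the overall mean follows Equation (4) literally (the unweighted average of the class means).

```python
import numpy as np

def scatter_matrices(samples, labels):
    """Return (X_k, X_o, mu): within-class scatter, between-class scatter, overall mean."""
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n_features = samples.shape[1]

    # Mean vector mu_h of each class, one row per class.
    class_means = np.array([samples[labels == h].mean(axis=0) for h in classes])

    # Eq. (4): mu is the average of the N class means.
    mu = class_means.mean(axis=0)

    within = np.zeros((n_features, n_features))   # X_k, Eq. (2)
    between = np.zeros((n_features, n_features))  # X_o, Eq. (3)
    for h, mu_h in zip(classes, class_means):
        centered = samples[labels == h] - mu_h
        within += centered.T @ centered            # sum of (c_q - mu_h)(c_q - mu_h)^T
        diff = (mu_h - mu).reshape(-1, 1)
        between += diff @ diff.T                   # (mu_h - mu)(mu_h - mu)^T
    return within, between, mu

# Hypothetical toy data: two classes with two samples each.
toy_samples = [[1.0, 2.0], [3.0, 4.0], [6.0, 5.0], [8.0, 9.0]]
toy_labels = [0, 0, 1, 1]
X_k, X_o, mu = scatter_matrices(toy_samples, toy_labels)
```

LDA then looks for directions along which the between-class scatter is large relative to the within-class scatter; the sketch stops at the two matrices because that is all the equations above define.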
QDA works with heterogeneous variance-covariance matrices and is a generalized version of
LDA. It separates the target classes with a quadratic decision surface. Unlike LDA, it estimates
a covariance matrix for each class and does not presume the matrices to be identical. QDA
calculates a quadratic score function for each group in the dataset. Parameters are estimated by
maximizing the joint likelihood of the features and their classes. The score function depends on the
population mean vector and the variance-covariance matrix of each group. In addition, DQDA is a
variant of quadratic discriminant analysis in which all off-diagonal elements of the covariance
matrices are set to zero (Srivastava et al., 2007).
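The difference between QDA and DQDA can be made concrete with the standard Gaussian quadratic score, $-\tfrac{1}{2}\log|\Sigma_h| - \tfrac{1}{2}(x-\mu_h)^{T}\Sigma_h^{-1}(x-\mu_h) + \log \pi_h$. The paper does not spell out its score function, so treating it as this textbook form is an assumption; the names `quadratic_score` and `classify` and the class parameters below are likewise hypothetical.

```python
import numpy as np

def quadratic_score(x, mean, cov, prior, diagonal=False):
    """Gaussian quadratic discriminant score of sample x for one class.

    With diagonal=True the off-diagonal covariance entries are zeroed,
    which is the DQDA simplification described in the text.
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    if diagonal:
        cov = np.diag(np.diag(cov))
    diff = x - mean
    _, log_det = np.linalg.slogdet(cov)
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    return -0.5 * log_det - 0.5 * mahalanobis + np.log(prior)

def classify(x, class_params, diagonal=False):
    """Return the label whose score is largest; class_params maps label -> (mean, cov, prior)."""
    scores = {
        label: quadratic_score(x, mean, cov, prior, diagonal)
        for label, (mean, cov, prior) in class_params.items()
    }
    return max(scores, key=scores.get)

# Hypothetical class parameters: each class has its own covariance matrix.
params = {
    0: ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0.5),
    1: ([3.0, 3.0], [[2.0, 0.5], [0.5, 1.0]], 0.5),
}
label_qda = classify([3.1, 2.8], params)                  # full covariance (QDA)
label_dqda = classify([3.1, 2.8], params, diagonal=True)  # diagonal covariance (DQDA)
```

Because each class keeps its own covariance matrix, the boundary between two classes is quadratic in x; zeroing the off-diagonal entries reduces the number of parameters to estimate at the cost of ignoring correlations between features.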
Support vector machines (SVMs) are widely applied in pattern recognition and bioinformatics and
are a well-known approach to classifying diseases. SVMs were developed primarily by Vapnik and
Cortes for classification and regression (Cortes and Vapnik, 1995). An SVM is a supervised learning
algorithm that uses kernel functions to map training samples into a space in which a separating
hyperplane can be fitted; its kernel parameters must be tuned to perform well on a given problem.
The target classes are separated by the hyperplane selected by the SVM, which is chosen to maximize
the Euclidean distance to the nearest data points. In this study, values between 1 and 100000 were
used for the width of the Gaussian kernel and values between 0.1 and 10 were
used for the regularization factor. Assume that the dataset holds two separate classes labelled +1 and −1.
Suppose the training data
$$(k_1, l_1), \ldots, (k_d, l_d), \quad k \in \mathbb{R}^{d}, \quad l \in \{+1, -1\}$$
contain d samples. The hyperplane with the maximum possible margin is then given by
$$H(k) = (a \cdot k) + a_0 \tag{5}$$
and the inequality that holds for both l = +1 and l = −1 is
$$l_x \left[ (a \cdot k_x) + a_0 \right] \ge 1, \quad x = 1, \ldots, d \tag{6}$$

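The data points $(k_x, l_x)$ for which equality holds in Equation (6) are known as support vectors.
The constraint imposed by the hyperplane margin is as follows.
$$\frac{l_m \times H(k_m)}{\lVert a \rVert} \ge \Gamma, \quad m = 1, \ldots, d \tag{7}$$
To limit the number of solutions, Equation (8) is imposed:
$$\Gamma \times \lVert a \rVert = 1 \tag{8}$$
The quantity
$$\frac{1}{2} \lVert a \rVert^{2} \tag{9}$$
is minimized subject to Equation (6). When the classes are not separable, slack variables
$\beta_x \ge 0$ are introduced and Equations (6) and (9) are replaced by the following, where T is the
regularization factor.
$$l_x \left[ (a \cdot k_x) + a_0 \right] \ge 1 - \beta_x \tag{10}$$
$$T \sum_{x=1}^{d} \beta_x + \frac{1}{2} \lVert a \rVert^{2} \tag{11}$$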
This standard SVM works well for linear classification of data but cannot solve
nonlinear cases. To overcome this limitation, kernel functions are used that map the
available data into a kernel space. Kernel types include linear, quadratic,
radial basis function (RBF), polynomial and multilayer perceptron (MLP) kernels.
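The study attained its best results with the RBF kernel and a least squares hyperplane (Suykens and
Vandewalle, 1999). The RBF kernel is defined as
$K(k, k') = \exp\left(-\lVert k - k' \rVert^{2} / \sigma^{2}\right)$ (Tsujinishi and Abe, 2003).
LSSVM is defined as follows. For $x = 1, \ldots, d$, the inequality constraint of Equation (10) becomes
the equality
$$l_x \left[ (a \cdot k_x) + a_0 \right] = 1 - \beta_x \tag{12}$$
and the cost function to be minimized is
$$\frac{1}{2} \lVert a \rVert^{2} + \frac{T}{2} \sum_{x=1}^{d} \beta_x^{2} \tag{13}$$
which gives the Lagrangian
$$S(a, a_0, \alpha, \beta) = \frac{1}{2} \lVert a \rVert^{2} + \frac{T}{2} \sum_{x=1}^{d} \beta_x^{2} - \sum_{x=1}^{d} \alpha_x \left\{ l_x \left[ (a \cdot k_x) + a_0 \right] - 1 + \beta_x \right\} \tag{14}$$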

The $\alpha_x$ are known as Lagrange multipliers, which are used to find local maxima and minima
subject to constraints. In a generic support vector machine they are non-negative, whereas in a
least squares support vector machine they can be either positive or negative.
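Setting the derivatives of the Lagrangian in Equation (14) to zero and eliminating a and the slack variables leaves a single linear system in $a_0$ and $\alpha$, which is the usual way an LSSVM classifier is trained (Suykens and Vandewalle, 1999). The NumPy sketch below assumes that standard formulation with the RBF kernel given above; the function names, the six toy samples, and the values sigma = 2 and T = 10 are illustrative assumptions, not the settings used in the study.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """K(k, k') = exp(-||k - k'||^2 / sigma^2) for every pair of rows in A and B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma ** 2)

def lssvm_train(samples, l, sigma, T):
    """Return (a_0, alpha) by solving the LSSVM linear system.

    [ 0   l^T           ] [a_0  ]   [0]
    [ l   Omega + I / T ] [alpha] = [1],  Omega_xy = l_x * l_y * K(k_x, k_y)
    """
    l = np.asarray(l, dtype=float)
    d = len(l)
    omega = np.outer(l, l) * rbf_kernel(samples, samples, sigma)
    system = np.zeros((d + 1, d + 1))
    system[0, 1:] = l
    system[1:, 0] = l
    system[1:, 1:] = omega + np.eye(d) / T
    rhs = np.concatenate(([0.0], np.ones(d)))
    solution = np.linalg.solve(system, rhs)
    return solution[0], solution[1:]

def lssvm_predict(new_samples, samples, l, alpha, a_0, sigma):
    """Predicted class = sign(sum_x alpha_x * l_x * K(k, k_x) + a_0)."""
    l = np.asarray(l, dtype=float)
    scores = rbf_kernel(new_samples, samples, sigma) @ (alpha * l) + a_0
    return np.sign(scores)

# Hypothetical, well-separated toy data with labels in {-1, +1}.
train = [[0, 0], [0, 1], [1, 0], [4, 4], [4, 5], [5, 4]]
labels = [-1, -1, -1, 1, 1, 1]
a_0, alpha = lssvm_train(train, labels, sigma=2.0, T=10.0)
predicted = lssvm_predict([[0.5, 0.5], [4.5, 4.5]], train, labels, alpha, a_0, sigma=2.0)
```

Because the constraints are equalities, training reduces to one call to a linear solver instead of a quadratic program, and nothing in the system forces the multipliers in `alpha` to be non-negative.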
The KNN algorithm belongs to the group of instance-based algorithms and is a supervised learning
method. It stores the training data and predicts the class of a new sample by searching for its k most
similar cases. It uses a distance metric to measure similarity, and the choice of metric
depends on the composition of the data. Available metrics include Euclidean, city block,
cosine, correlation and Hamming, of which correlation was used for this study (Dong et al., 2011;
Samet, 2008; Sun and Huang, 2010). For instance, suppose a $k_z$-by-$l$ data matrix Z that can be
represented as $k_z$ (1-by-$l$) row vectors $z_1, z_2, \ldots, z_{k_z}$, and a $k_y$-by-$l$ data matrix Y
that can be represented as $k_y$ (1-by-$l$) row vectors $y_1, y_2, \ldots, y_{k_y}$. The correlation
distance between the vectors $z_u$ and $y_v$ is defined in Equation (15)
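$$d_{uv} = 1 - \frac{(z_u - \bar{z}_u)(y_v - \bar{y}_v)^{T}}{\sqrt{(z_u - \bar{z}_u)(z_u - \bar{z}_u)^{T}} \, \sqrt{(y_v - \bar{y}_v)(y_v - \bar{y}_v)^{T}}} \tag{15}$$
where
$$\bar{z}_u = \frac{1}{l} \sum_{j} z_{uj} \tag{16}$$
$$\bar{y}_v = \frac{1}{l} \sum_{j} y_{vj} \tag{17}$$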
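The correlation distance of Equation (15) and the majority vote over the k nearest training samples can be sketched in plain Python. The paper does not report its value of k, so k = 3 is an assumption, as are the function names and the toy samples; the sketch also assumes that no vector is constant, since a zero-variance vector would make the denominator zero.

```python
import math
from collections import Counter

def correlation_distance(z_u, y_v):
    """Eq. (15): one minus the sample correlation between two equal-length vectors."""
    l = len(z_u)
    z_bar = sum(z_u) / l                      # Eq. (16)
    y_bar = sum(y_v) / l                      # Eq. (17)
    z_centered = [z - z_bar for z in z_u]
    y_centered = [y - y_bar for y in y_v]
    numerator = sum(a * b for a, b in zip(z_centered, y_centered))
    denominator = (math.sqrt(sum(a * a for a in z_centered))
                   * math.sqrt(sum(b * b for b in y_centered)))
    return 1.0 - numerator / denominator

def knn_predict(query, train_samples, train_labels, k=3):
    """Label a query by majority vote among its k nearest training samples."""
    distances = sorted(
        (correlation_distance(query, sample), label)
        for sample, label in zip(train_samples, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: label 1 samples rise across features, label 0 samples fall.
train_samples = [[1, 2, 3, 4], [2, 4, 6, 9], [1, 3, 5, 8],
                 [4, 3, 2, 1], [9, 6, 4, 2], [8, 5, 3, 1]]
train_labels = [1, 1, 1, 0, 0, 0]
rising = knn_predict([10, 20, 30, 41], train_samples, train_labels)
falling = knn_predict([5, 4, 2, 1], train_samples, train_labels)
```

The distance ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated) and is unchanged when a vector is shifted or rescaled by a positive factor, so it compares the shape of two feature profiles rather than their magnitude.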
PCA is an appearance-based technique widely used in image recognition, image compression,
signal processing, and face recognition. It performs well in reducing the dimensionality
of datasets with a large number of features, which in turn improves classification results (Belhumeur
et al., 1997). We therefore integrated it with multiple classifiers and finally proposed a PCA-KNN
prediction model. PCA works as follows:
Assume a dataset D with v dimensions, and let $Q_1, Q_2, \ldots, Q_p$ be the p principal axes, where
$1 \le p \le v$. The covariance matrix is represented as:
$$B = \frac{1}{R} \sum_{i=1}^{R} (c_i - d)(c_i - d)^{V} \tag{18}$$
where $c_i \in D$, d is the sample mean, R is the number of samples and the superscript V denotes the
transpose.
$$B Q_i = w_i Q_i \tag{19}$$
where $i \in \{1, \ldots, p\}$ and $w_i$ is the ith largest eigenvalue of B.
Finally, the p principal components of a given $c \in D$ can be expressed as follows:
$$h = [h_1, h_2, \ldots, h_p] = [Q_1^{V} c, Q_2^{V} c, \ldots, Q_p^{V} c] = Q^{V} c \tag{20}$$
Here, h is the vector of the p principal components of c.
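Equations (18) to (20) translate into a covariance matrix, an eigendecomposition and a projection. The NumPy sketch below is illustrative only: the function names and the six-sample toy matrix are assumptions, and it projects mean-centred samples, which is the common convention, whereas Equation (20) writes the projection of c directly.

```python
import numpy as np

def pca_fit(D, p):
    """Return (d_mean, Q, w): sample mean, the p principal axes as columns, their eigenvalues."""
    D = np.asarray(D, dtype=float)
    R = D.shape[0]
    d_mean = D.mean(axis=0)
    centered = D - d_mean
    B = centered.T @ centered / R                   # Eq. (18): covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(B)   # Eq. (19); eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1][:p]       # keep the p largest eigenvalues
    return d_mean, eigenvectors[:, order], eigenvalues[order]

def pca_transform(D, d_mean, Q):
    """Eq. (20): project (mean-centred) samples onto the principal axes."""
    return (np.asarray(D, dtype=float) - d_mean) @ Q

# Hypothetical toy data: 6 samples, 3 features, reduced to p = 2 components.
toy = [[2.5, 2.4, 0.5], [0.5, 0.7, 1.9], [2.2, 2.9, 0.8],
       [1.9, 2.2, 1.1], [3.1, 3.0, 0.2], [2.3, 2.7, 0.9]]
d_mean, Q, w = pca_fit(toy, p=2)
h = pca_transform(toy, d_mean, Q)
```

Each column of `h` has variance equal to the corresponding eigenvalue in `w`, which is why keeping the axes with the largest eigenvalues retains the most variance; the reduced matrix `h` is what a downstream classifier such as KNN would receive.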

The liver patient dataset obtained from the University of California, Irvine (UCI) machine learning
repository was used for experimentation. It contains 583 instances and 10 features.
The features are age, gender, total bilirubin, direct bilirubin, albumin and globulin
ratio, alkaline phosphatase, albumin, alanine aminotransferase, aspartate aminotransferase and total
proteins. Data samples with attribute values are given in Table 1. The dataset contains two target
classes (sick and healthy): the sick class has 416 instances and the healthy class has 167. The data
comprise 441 male and 142 female cases. To reduce sampling bias and to estimate misclassification
probabilities, the dataset was divided into training and testing sets using the leave-m-out
cross-validation method. The results of the integrated classification algorithms were compared using
accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV),
which are defined in Equations (21), (22), (23), (24) and (25) respectively.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{21}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{22}$$
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{23}$$
Table 1. The features of liver patient dataset

$$\text{PPV} = \frac{TP}{TP + FP} \tag{24}$$
$$\text{NPV} = \frac{TN}{TN + FN} \tag{25}$$
where TN indicates true negative (normal people correctly recognized as normal), TP is true positive
(diseased people correctly recognized as diseased), FN is false negative (diseased people incorrectly
identified as normal), and FP expresses false positive (normal people incorrectly identified as diseased).
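The five rates of Equations (21) to (25) follow directly from the four confusion-matrix counts. The sketch below uses hypothetical counts: the class sizes match the dataset (416 sick, 167 healthy), but the assumption that exactly one healthy case is misclassified is made here for illustration and is not a confusion matrix reported by the authors.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Compute the rates defined in Equations (21) to (25) from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Eq. (21)
        "sensitivity": tp / (tp + fn),                 # Eq. (22)
        "specificity": tn / (tn + fp),                 # Eq. (23)
        "ppv": tp / (tp + fp),                         # Eq. (24)
        "npv": tn / (tn + fn),                         # Eq. (25)
    }

# Hypothetical counts: 416 sick cases all detected, 1 of 167 healthy cases flagged as sick.
metrics = diagnostic_metrics(tp=416, tn=166, fp=1, fn=0)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

With these counts the rates come out as 99.83% accuracy, 100.00% sensitivity, 99.40% specificity, 99.76% PPV and 100.00% NPV, which agrees with the testing rates quoted for PCA-KNN in this section and shows how strongly a single misclassified case moves each figure on a dataset of this size.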
The classification models implemented for the study were PCA-LDA, PCA-DLDA, PCA-QDA,
PCA-DQDA, PCA-LSSVM and PCA-KNN. Simulation results showed that the PCA-LDA, PCA-DLDA,
PCA-QDA, and PCA-DQDA based frameworks did not achieve significant diagnostic performance.
PCA-LSSVM showed higher accuracy than these methods, but PCA-KNN
achieved the highest accuracy of all. Figures 2, 3, 4, 5 and 6 present a comparison among the integrated
models using accuracy, sensitivity, specificity, PPV, and NPV rates respectively. Figure 2 illustrates that
PCA-LDA had 61.1% (training) and 60.89% (testing) accuracy, PCA-DLDA had 62.13% (training)
and 61.92% (testing) accuracy, PCA-QDA had 52.15% (training) and 52.14% (testing) accuracy, PCA-
DQDA had 52.67% (training) and 52.66% (testing) accuracy, PCA-LSSVM had 76.08% (training)
and 75.99% (testing) accuracy and PCA-KNN had 100% (training) and 99.83% (testing) accuracy.
Figure 3 depicts that PCA-LDA had 78.92% (training) and 78.44% (testing) sensitivity, PCA-
DLDA had 77.11% (training) and 76.65% (testing) sensitivity, PCA-QDA had 94.58% (training) and
94.61% (testing) sensitivity, PCA-DQDA had 96.39% (training) and 96.41% (testing) sensitivity,
PCA-LSSVM had 26.51% (training) and 26.35% (testing) sensitivity, and PCA-KNN had 100%
(training) and 100% (testing) sensitivity. Figure 4 shows that PCA-LDA had 100% (training) and
97.84% (testing) specificity, PCA-DLDA had 56.14% (training) and 56.01% (testing) specificity,
PCA-QDA had 35.18% (training) and 35.1% (testing) specificity, PCA-DQDA had 35.18% (training)
and 35.1% (testing) specificity, PCA-LSSVM had 95.9% (training) and 95.91% (testing) specificity,
and PCA-KNN had 100% (training) and 99.4% (testing) specificity.
Figure 5 illustrates that PCA-LDA had 40.68% (training) and 40.56% (testing) PPV, PCA-DLDA
had 41.29% (training) and 41.16% (testing) PPV, PCA-QDA had 36.85% (training) and 36.92% (testing)
PPV, PCA-DQDA had 37.3% (training) and 37.35% (testing) PPV, PCA-LSSVM had 72.13% (training)
and 72.13% (testing) PPV, and PCA-KNN had 100% (training) and 99.76% (testing) PPV. Figure
6 demonstrates that PCA-LDA had 86.49% (training) and 86.15% (testing) NPV, PCA-DLDA had
85.98% (training) and 85.66% (testing) NPV, PCA-QDA had 94.19% (training) and 94.19% (testing)
NPV, PCA-DQDA had 96.05% (training) and 96.05% (testing) NPV, PCA-LSSVM had 76.54%
(training) and 76.447% (testing) NPV, and PCA-KNN had 100% (training) and 100% (testing) NPV.
Figure 2. The comparative view of obtained accuracy
Figure 3. The comparative view of obtained sensitivity
Figure 4. The comparative view of obtained specificity
Figure 5. The comparative view of obtained positive predictive value
To select the most efficient diagnosis system, the results (accuracy, sensitivity, specificity,
PPV and NPV) of all integrated models were compared. Table 2 presents the simulation results for
comparison. The PCA-KNN based framework outperformed all the others and was selected as the
best predictive model for liver disease. This integrated approach combines the advantages of both
PCA and KNN, such as high classification rates, good generalization, a plain structure and efficient
problem-solving ability through feature reduction. The classification accuracy, sensitivity,
specificity, PPV and NPV achieved by the model were 99.83%, 100%, 99.4%, 99.76% and 100% respectively.
Clinicians usually have the prime role in the final judgment on a patient’s health condition, but performing
a sound diagnosis is an intricate job that requires extensive medical experience. These
computationally intelligent methods certainly cannot replace the physician’s role, but they may assist
physicians in examining medical records by acting as a second opinion. This study is an effort in that
direction, proposing a PCA-KNN based predictive model for the efficient and effective diagnosis of liver disease.
Figure 6. The comparative view of obtained negative predictive value
Table 2. The simulation results of integrated classification models

Development of computer-aided diagnostic systems is productive work in clinical research. Medicine
continues to advance, but diagnosing a disease is still challenging, and assessing liver disease
in its initial stages is particularly difficult. As part of ongoing efforts to make the
liver diagnosis process well organized and efficient, this study builds a PCA-KNN
based two-phase intelligent prediction model whose analytic structure boosts
classification performance. In the first phase, PCA is deployed for dimensionality reduction; in the second
phase, the KNN algorithm with the correlation distance metric is employed to distinguish between sick and
healthy individuals. In future work, more features can be added to the dataset for experimentation,
and the proposed methodology can also be applied to assess other diseases.


Aldape-Perez, M., Yanez-Marquez, C., Camacho-Nieto, O. J., & Arguelles-Cruz, A. (2012). An associative
memory approach to medical decision support systems. Computer Methods and Programs in Biomedicine, 106,
287–307. doi:10.1016/j.cmpb.2011.05.002 PMID:21703713
Autio, L., Juhola, M., & Laurikkala, J. (2007). On the neural network classification of medical data and an
endeavour to balance non-uniform data sets with artificial data extension. Computers in Biology and Medicine,
37(3), 388–397. doi:10.1016/j.compbiomed.2006.05.001 PMID:16780826
Azaid, S., Fakhr, M. W., & Mohamed, F. (2006). Automatic Diagnosis of Liver Diseases from Ultrasound Images.
Proceedings of the 2006 Int. Conf. Comput. Eng. Syst. (pp. 313–319).
Belhumeur, P. N., Hespanha, J. P., & Kriegman, D. J. (1997). Eigenfaces vs. fisherfaces: Recognition using class
specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 711–720.
doi:10.1109/34.598228
Bucak, İ. Ö., & Baki, S. (2010). Diagnosis of liver disease by using CMAC neural network approach. Expert
Systems with Applications, 37(9), 6157–6164. doi:10.1016/j.eswa.2010.02.112
Celikyilmaz, A., Türkşen, I. B., Aktaş, R., Doganay, M. M., & Ceylan, N. B. (2009). Increasing accuracy of two-
class pattern recognition with enhanced fuzzy functions. Expert Systems with Applications, 36(2), 1337–1354.
doi:10.1016/j.eswa.2007.11.039
Chuang, C.-L. (2011). Case-based reasoning support for liver disease diagnosis. Artificial Intelligence in Medicine,
53(1), 15–23. doi:10.1016/j.artmed.2011.06.002 PMID:21757326
Comak, E., Polat, K., Gunes, S., & Arslan, A. (2007). A new medical decision making system: Least square
support vector machine (LSSVM) with Fuzzy Weighting Pre-processing. Expert Systems with Applications,
32(2), 409–414. doi:10.1016/j.eswa.2005.12.001
Cortes, C., & Vapnik, V. (1995). Support-Vector Networks. Machine Learning, 20(3), 273–297. doi:10.1007/BF00994018
Dehuri, S., & Cho, S. B. (2010). Evolutionarily optimized features in functional link neural network for
classification. Expert Systems with Applications, 37(6), 4379–4391. doi:10.1016/j.eswa.2009.11.090
Dong, W., Moses, C., & Li, K. (2011). Efficient k-nearest neighbor graph construction for generic similarity
measures. In WWW (pp. 577–586).
Elizondo, D., Birkenhead, R., Gamez, M., Garcia, N., & Alfaro, E. (2012). Linear separability and classification
complexity. Expert Systems with Applications, 39(9), 7796–7807. doi:10.1016/j.eswa.2012.01.090
Floares, A. G. (2009). Intelligent clinical decision supports for interferon treatment in chronic hepatitis C and B
based on i-biopsy. Proceedings of the 2009 International Joint Conference on Neural Networks (pp. 855–860).
doi:10.1109/IJCNN.2009.5178905
Gorunescu, F., Belciug, S., Gorunescu, M., & Badea, R. (2012). Intelligent decision-making for liver fibrosis
stadialization based on tandem feature selection and evolutionary-driven neural network. Expert Systems with
Applications, 39(17), 12824–12832. doi:10.1016/j.eswa.2012.05.011
Guo, Y., Hastie, T., & Tibshirani, R. (2007). Regularized linear discriminant analysis and its application in
microarrays. Biostatistics (Oxford, England), 8(1), 86–100. doi:10.1093/biostatistics/kxj035 PMID:16603682
Hamamoto, I., Okada, S., Hashimoto, T., Wakabayashi, H., Maeba, T., & Maeta, H. (1995). Prediction of the
early prognosis of the hepatectomized patient with hepatocellular carcinoma with a neural network. Computers
in Biology and Medicine, 25(1), 49–59. doi:10.1016/0010-4825(95)98885-H PMID:7600761
Hashem, A. M., Rasmy, M. E. M., Wahba, K. M., & Shaker, O. G. (2010). Prediction of the degree of liver
fibrosis using different pattern recognition techniques. 2010 5th Cairo International Biomedical Engineering
Conference, CIBEC 2010. pp. 210–214. doi:10.1109/CIBEC.2010.5716043
Hayashi, Y., Setiono, R., & Yoshida, K. (2000). A comparison between two neural network rule extraction techniques
for the diagnosis of hepatobiliary disorders. Artificial Intelligence in Medicine, 20(3), 205–216.

Icer, S., Kara, S., & Güven, A. (2006). Comparison of multilayer perceptron training algorithms for portal
venous doppler signals in the cirrhosis disease. Expert Systems with Applications, 31(2), 406–413. doi:10.1016/j.eswa.2005.09.037
Kahramanli, H., & Allahverdi, N. (2009). Extracting rules for classification problems: AIS based approach.
Expert Systems with Applications, 36(7), 10494–10502. doi:10.1016/j.eswa.2009.01.029
Kulluk, S., Ozbakır, L., & Baykasoğlu, A. (2013). Fuzzy DIFACONN-miner: A novel approach for fuzzy
rule extraction from neural networks. Expert Systems with Applications, 40(3), 938–946. doi:10.1016/j.eswa.2012.05.050
Lee, C.-C., Chung, P.-C., & Chen, Y.-J. (2005). Classification of liver diseases from CT images using BP-CMAC
neural network. Proceedings of the 2005 9th International Workshop on Cellular Neural Networks and their
Applications.
Li, B. N., Chui, C. K., Chang, S., & Ong, S. H. (2012). A new unified level set method for semi-automatic liver
tumor segmentation on contrast-enhanced CT images. Expert Systems with Applications, 39(10), 9661–9668.
doi:10.1016/j.eswa.2012.02.095
Li, D.-C., & Liu, C.-W. (2010). A class possibility based kernel to increase classification accuracy for small
data sets using support vector machines. Expert Systems with Applications, 37(4), 3104–3110. doi:10.1016/j.eswa.2009.09.019
Li, D.-C., Liu, C.-W., & Hu, S. C. (2010). A learning method for the class imbalance problem with medical
data sets. Computers in Biology and Medicine, 40(5), 509–518. doi:10.1016/j.compbiomed.2010.03.005
PMID:20347072
Lin, R.-H. (2009). An intelligent model for liver disease diagnosis. Artificial Intelligence in Medicine, 47(1),
53–62. doi:10.1016/j.artmed.2009.05.005 PMID:19540738
Lin, R. H., & Chuang, C. L. (2010). A hybrid diagnosis model for determining the types of the liver disease.
Computers in Biology and Medicine, 40(7), 665–670. doi:10.1016/j.compbiomed.2010.06.002 PMID:20591425
Luukka, P. (2009). Classification based on fuzzy robust PCA algorithms and similarity classifier. Expert Systems
with Applications, 36(4), 7463–7468. doi:10.1016/j.eswa.2008.09.015
Mezyk, E., & Unold, O. (2011). Mining fuzzy rules using an Artificial Immune System with fuzzy partition
learning. Applied Soft Computing, 11(2), 1965–1974. doi:10.1016/j.asoc.2010.06.012
Ming, L. K., Kiong, L. C., & Soong, L. W. (2011). Autonomous and deterministic supervised fuzzy clustering
with data imputation capabilities. Applied Soft Computing, 11(1), 1117–1125. doi:10.1016/j.asoc.2010.02.011
Neshat, M., & Zadeh, A. E. (2010). Hopfield neural network and fuzzy Hopfield neural network for diagnosis of
liver disorders. Proceedings of the 2010 5th IEEE International Conference Intelligent Systems (pp. 162–167).
doi:10.1109/IS.2010.5548321
Obot, O. U., & Udoh, S. S. (2011). A framework for fuzzy diagnosis of hepatitis. Proceedings of the 2011 World
Congr. Inf. Commun. Technol. (pp. 439–443).
Ozyilmaz, L., & Yildirim, T. (2003). Artificial neural networks for diagnosis of hepatitis disease. Proc. Int. Jt.
Conf. Neural Networks. doi:10.1109/IJCNN.2003.1223422
Polat, K., Sahan, S., Kodaz, H., & Güneş, S. (2007). Breast cancer and liver disorders classification using artificial
immune recognition system (AIRS) with performance evaluation by fuzzy resource allocation mechanism. Expert
Systems with Applications, 32(1), 172–183. doi:10.1016/j.eswa.2005.11.024
Revett, K., Gorunescu, F., Gorunescu, M., & Ene, M. (2006). Mining A Primary Biliary Cirrhosis Dataset Using
Rough Sets and a Probabilistic Neural Network. Proceedings of the 2006 3rd International IEEE Conference
Intelligent Systems (pp. 284–289). doi:10.1109/IS.2006.348432
Samet, H. (2008). K-nearest neighbor finding using MaxNearestDist. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 30(2), 243–252. doi:10.1109/TPAMI.2007.1182 PMID:18084056

Aman Singh is working as an Assistant Professor in Department of Computer Science and Engineering at Lovely
Professional University, Punjab, India. He has about four years of teaching and research experience and his areas
of interest are biomedical engineering, information security, cyber-crime and computer forensics.
Babita Pandey is an Associate Professor at the School of Computer Engineering in Lovely Professional University,
Punjab, India. She has over seven years of teaching experience and has published around 40 research papers and
articles. Her main research interests are AI and multi-agent system and its application to medicine, e-commerce
and semantic web.
Sartakhti, J. S., Zangooei, M. H., & Mozafari, K. (2015). Hepatitis disease diagnosis using a novel hybrid method
based on support vector machine and simulated annealing (SVM-SA). Computer Methods and Programs in
Biomedicine, 108(2), 570–579. doi:10.1016/j.cmpb.2011.08.003 PMID:21968203
Singh, A., & Pandey, B. (2014). Intelligent techniques and applications in liver disorders: A survey. Int. J.
Biomed. Eng. Technol., 16(1), 27–70. doi:10.1504/IJBET.2014.065638
Srivastava, S., Gupta, M. R., & Frigyik, B. A. (2007). Bayesian Quadratic Discriminant Analysis. Journal of
Machine Learning Research, 8, 1277–1305.
Sun, S., & Huang, R. 2010. An adaptive k-nearest neighbor algorithm. Proceedings of the 2010 7th International
Conference on Fuzzy Systems and Knowledge Discovery FSKD ‘10 (pp. 91–94). doi:10.1109/FSKD.2010.5569740
Suykens, J. A. K., & Vandewalle, J. (1999). Least Squares Support Vector Machine Classifiers. Neural Processing
Letters, 9(3), 293–300. doi:10.1023/A:1018628609742
Torun, Y., & Tohumoglu, G. (2011). Designing simulated annealing and subtractive clustering based fuzzy
classifier. Applied Soft Computing, 11(2), 2193–2201. doi:10.1016/j.asoc.2010.07.020
Tsujinishi, D., & Abe, S. 2003. Fuzzy least squares support vector machines for multiclass problems, in: Neural
Networks. pp. 785–792. doi:10.1016/S0893-6080(03)00110-2
Yan, W., Lizhuang, M., Xiaowei, L., & Ping, L. 2008. Correlation between Child-Pugh Degree and the Four
Examinations of Traditional Chinese Medicine (TCM) with Liver Cirrhosis. Proceedings of the 2008 Int. Conf.
Biomed. Eng. Informatics (pp. 858–862).
Ye, J., & Li, Q. (2005). A two-stage linear discriminant analysis via QR-decomposition. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 27(6), 929–941. doi:10.1109/TPAMI.2005.110 PMID:15943424
... In [23], the authors constructed an intelligent diagnosis system that integrates principal component analysis (PCA) and kNN methods to evaluate a liver patient dataset. The method combines feature extraction and classification, performed by PCA and kNN respectively. ...
... Their model combined feature extraction and classification steps, performed by PCA and KNN, respectively. The results were compared using statistical metrics, including positive and negative predictive values [8]. Hashmi and Khan proposed a fuzzy control model for diagnosing a disease related to the human liver. ...
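The pipeline these excerpts describe (PCA for feature extraction, KNN for classification, then scoring with accuracy, sensitivity, specificity, and positive and negative predictive values) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the standardization step, Euclidean distance, majority voting, the values of k and the number of components, and the synthetic data are all hypothetical choices made here.

```python
import numpy as np


def pca_fit(X, n_components):
    """Standardize features, then return the top principal directions."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    Z = (X - mean) / std
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components].T


def pca_transform(X, mean, std, components):
    """Project data using statistics learned on the training set only."""
    return ((X - mean) / std) @ components


def knn_predict(train_X, train_y, test_X, k=3):
    """Binary KNN (labels 0/1), Euclidean distance, majority vote.

    Use an odd k; with an even k, ties resolve to class 0.
    """
    dists = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return (train_y[nearest].mean(axis=1) > 0.5).astype(int)


def _ratio(num, den):
    return num / den if den else float("nan")


def diagnostic_metrics(y_true, y_pred):
    """Confusion-matrix metrics with class 1 treated as 'disease present'."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def demo():
    """Synthetic stand-in data with very uneven feature scales (assumed)."""
    rng = np.random.default_rng(0)
    scales = np.array([1.0, 10.0, 100.0, 1.0, 10.0])
    healthy = rng.normal(0.0, 1.0, size=(100, 5)) * scales
    patient = rng.normal(1.5, 1.0, size=(100, 5)) * scales
    X = np.vstack([healthy, patient])
    y = np.array([0] * 100 + [1] * 100)
    order = rng.permutation(len(y))
    X, y = X[order], y[order]
    train, test = slice(0, 140), slice(140, None)
    mean, std, comps = pca_fit(X[train], n_components=2)
    train_Z = pca_transform(X[train], mean, std, comps)
    test_Z = pca_transform(X[test], mean, std, comps)
    pred = knn_predict(train_Z, y[train], test_Z, k=5)
    return diagnostic_metrics(y[test], pred)


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.3f}")
```

Standardizing before PCA is one common way to handle attributes with very different variances; whether the original study did this is not stated in the excerpts.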
Decision support systems improve medical diagnosis and minimize diagnostic errors. Existing diagnostic systems are often complex and exhibit limited performance on liver diseases, particularly liver cancer. This paper presents a fuzzy decision support system for helping students diagnose some human liver diseases in educational medical institutions. The proposed system aims to improve real medical diagnosis processes. The approach has three basic steps: 1) symptoms-based diagnosis, 2) liver function-based diagnosis, and 3) image processing-based diagnosis. The proposed system employs two artificial intelligence techniques: fuzzy logic and image processing. The first is used for diagnosing liver diseases based on liver function tests, while the second is used for diagnosing liver diseases such as liver cancer, hepatitis, liver cirrhosis, liver fibrosis, and fatty liver. The proposed system combines two methods: the Mamdani inference and simulation method used in the MATLAB17 fuzzy logic toolbox, and the gray level co-occurrence matrix, which extracts second-order statistical texture features from images acquired using computed tomography, magnetic resonance imaging, or ultrasound, for various liver diseases. Our results reveal very good agreement between expert-made and system-made diagnoses, suggesting high accuracy. © 2020. The Korean Institute of Intelligent Systems. All Rights Reserved.
... In more details, the architecture is comprised of five layers. The first layer receives the input values and determines the [31] particle swarm optimization Liver Disease 2013 Satarkar et al. [32] Fuzzy expert system Liver Disease 2015 Hashemi et al. [33] fuzzy logic Liver Disease 2015 Singh et al. [34] Principal Component Analysis and K-Nearest Neighbor (PCA-KNN) ...
A new Adaptive Neuro Particle Swarm Optimization (ANPSO) combined with a fuzzy inference system for diagnosing disorders is presented in this paper. The main contribution of the proposed method is a global search across the whole search space with a faster convergence rate. Moreover, it shows better exploration and exploitation by applying adaptive control parameters: automatic control of the inertia weight and of the coefficients of personal and social behaviour. Utilizing such attributes leads to a fast and smart diagnosis mechanism that is able to diagnose diseases with high accuracy. The ANPSO tunes the characteristics of the inference system to minimize the diagnosis error until the optimized model is obtained. As a case study, we use the liver disorders dataset called Bupa. According to the preliminary results, the suggested adaptive PSO can substantially outperform the traditional inference system, including when it is combined with other optimization methods. Keywords: Adaptive Fuzzy Inference Systems; Neuro Inference System Optimization; Adaptive Particle Swarm Optimization; Diagnosing disorders.
... The system was developed at the beginning of the 1990s [44]. [27] particle swarm optimization Liver Disease 2013 Satarkar et al. [28] Fuzzy expert system Liver Disease 2015 Hashemi et al. [29] fuzzy logic Liver Disease 2015 Singh et al. [30] Principal Component Analysis and K-Nearest Neighbor (PCA-KNN) ...
In this study, a hybrid method based on an Adaptive Neuro-Fuzzy Inference System (ANFIS) and Particle Swarm Optimization (PSO) for diagnosing liver disorders (ANFIS-PSO) is introduced. This smart diagnosis method combines the construction of an inference system with an optimization process that tunes the hyper-parameters of ANFIS based on the dataset. The liver disease characteristics are taken from the UCI Repository of Machine Learning Databases. The dataset has 7 attributes and 354 samples. The diagnostic performance of the ANFIS-PSO intelligent medical system for liver disease is evaluated using classification accuracy, sensitivity and specificity analysis. According to the experimental results, ANFIS-PSO can perform better than a traditional FIS and than ANFIS without the optimization phase.
The prevention of hepatocellular carcinoma (HCC), the third leading cause of cancer death worldwide, and the selection of more effective treatment have necessitated the development of HCC diagnosis and prediction systems using artificial intelligence. This paper examines the possibility of applying machine learning algorithms to predict liver cancer. Machine learning methods such as Logistic Regression (LR), Support Vector Machine (SVM) and Random Forest (RF) are used to predict HCC. The HCC dataset from Kaggle (Kaggle.com) is used for prediction. The experiments use the scikit-learn, Pandas and NumPy libraries, among others, in the Jupyter environment. The experimental results are compared, and the RF classifier achieves the best result. On this basis, the use of the RF method in building an initial HCC diagnosis and prognosis system is justified.
Amygdalin content in apricot kernels is an essential factor in the rapid and nondestructive identification of sweet or bitter apricot kernels through spectroscopy. Amygdalin content was determined by high-performance liquid chromatography and combined with a near-infrared spectral database to construct a model for identifying and classifying sweet and bitter apricot kernels. A principal component analysis-K-nearest neighbor classification algorithm, combined with multivariate scattering correction pretreatment, could distinguish sweet and bitter apricot kernels in the wavelength range of 1650-1740 nm with 98.3% accuracy, and apricot kernel species with a 96.3% recognition rate over the full wavelength spectrum. Furthermore, prediction of amygdalin content in bitter and sweet apricot kernels by a partial least squares model was superior to that by a back-propagation neural network model. This study provides a theoretical basis for quality identification of apricot kernels, as well as a method for nondestructive and rapid detection of sweet and bitter apricot kernels. Supplementary information: The online version contains supplementary material available at 10.1007/s10068-022-01095-y.
The article proposes principles for developing a fuzzy rule-based physician decision support system to determine the stages of hepatocellular carcinoma (HCC), the most common malignant tumor of the liver. The stages of HCC, i.e., critical situations, are expressed by different combinations of clinical signs in the input data and the resulting clinical conditions. These combinations produce a multiplicity of possible (critical) situations and are captured by linguistic rules that are in fuzzy relations with one another. The article presents the task of developing a fuzzy rule-based system for HCC staging by classifying the set of possible situations into given classes. To solve the problem, fuzzy rules for clinical situations, and for critical situations that deviate from them, are developed according to the possible clinical signs in the input data. The rules are developed in two phases, following the decision-making process. In the first phase, nine rules are developed from three inputs (the number, size, and vascular invasion of the tumor) to determine possible clinical conditions. In the second phase, seven rules are developed from possible combinations of input data on the presence of lymph nodes and metastases in these nine clinical conditions. At this stage, the rules representing the fuzzification of the results obtained are also described. These provide an interpretation of the results and a decision on the corresponding stage of HCC. The article also proposes a functional scheme of the fuzzy rule-based system for HCC staging and presents the working principle of its structural blocks. The fuzzy rule-based system for HCC staging can be used to support physicians in making diagnostic and treatment decisions.
The diagnosis of diseases is decisive for planning proper treatment and ensuring the well-being of patients. Human error hinders accurate diagnostics, as interpreting medical information is a complex and cognitively challenging task. The application of artificial intelligence (AI) can improve the level of diagnostic accuracy and efficiency. While the current literature has examined various approaches to diagnosing various diseases, an overview of fields in which AI has been applied, including their performance aiming to identify emergent digitalized healthcare services, has not yet been adequately realized in extant research. By conducting a critical review, we portray the AI landscape in diagnostics and provide a snapshot to guide future research. This paper extends academia by proposing a research agenda. Practitioners understand the extent to which AI improves diagnostics and how healthcare benefits from it. However, several issues need to be addressed before successful application of AI in disease diagnostics can be achieved.
The article proposes principles for building a system that supports physicians' decision-making in determining the stages of hepatocellular carcinoma (HCC), the most common malignant tumor of the liver. HCC is characterized by a set of clinical cases of critical situations, each of which is defined by a set of clinical signs. A specific clinical case of HCC is expressed by various combinations of signs, and these combinations give rise to the multiplicity of possible situations. Determining the stage of HCC requires classifying the set of possible situations into given classes and forms the basis for choosing a treatment scheme. To prevent the physician errors that can occur under such multi-variant situations, the task of developing an intelligent system for HCC staging is posed. In line with the methodology for developing intelligent systems, a conceptual model of the HCC staging problem is proposed, and the main concepts and the relationships between them are identified. To transfer the acquired expert knowledge into the system, the production model of knowledge representation is used, and the rules that form the knowledge base are developed. The structure of the intelligent HCC staging system is designed, and the working principle of its constituent blocks is shown. The interface and decision-making mechanism of the system, implemented on the Delphi2009 programming platform, are described. The intelligent system developed for HCC staging can be used to support physicians in making diagnosis and treatment decisions.
Liver disease is one of the leading causes of mortality in India, as it is in the rest of the world. This paper presents a survey of intelligent techniques (ITs) applied to liver disorders between January 1995 and January 2013. Individual ITs include artificial neural network (ANN), data mining (DM), fuzzy logic (FL) etc. Integrated ITs combine methods, such as artificial neural network-case-based reasoning (ANN-CBR), artificial immune system-artificial neural network-fuzzy logic (AIS-ANN-FL) etc. The different types of liver disorders covered in the study are hepatitis, liver fibrosis, liver cirrhosis, liver cancer, fatty liver, liver disorders data set, hepatitis data set and hepatobiliary disorders data set. The study identifies which ITs have been applied to which types of liver disorders and on which types of disorders the most work has been done. Another important finding of this survey is that a large part of the research work on liver disorders has been done from 2007 onwards.
Liver biopsy is considered mandatory for the management of patients infected with the hepatitis C virus (HCV), particularly for staging the degree of fibrosis. However, due to its invasive nature and the limitations of sampling error, the tendency is to replace liver biopsy with non-invasive methods. The objective of this study is to combine serum biomarkers and histopathological findings to develop a classification model that can predict the hepatic fibrosis stage. The best classification model developed was able to predict the different fibrosis grades with an accuracy of 93.7%. This accuracy represents a substantial improvement over previous works and would pave the way to using classification models as a clinically non-invasive and reliable method to assess the degree of liver fibrosis.
In Support Vector Machines (SVMs), the solution of the classification problem is characterized by a (convex) quadratic programming (QP) problem. In a modified version of SVMs, called Least Squares SVM classifiers (LS-SVMs), a least squares cost function is proposed so as to obtain a linear set of equations in the dual space. While the SVM classifier has a large margin interpretation, the LS-SVM formulation is related in this paper to a ridge regression approach for classification with binary targets and to Fisher's linear discriminant analysis in the feature space. Multiclass categorization problems are represented by a set of binary classifiers using different output coding schemes. While regularization is used to control the effective number of parameters of the LS-SVM classifier, the sparseness property of SVMs is lost due to the choice of the 2-norm. Sparseness can be imposed in a second stage by gradually pruning the support value spectrum and optimizing the hyperparameters during the sparse approximation procedure. In this paper, twenty public domain benchmark datasets are used to evaluate the test set performance of LS-SVM classifiers with linear, polynomial and radial basis function (RBF) kernels. Both the SVM and LS-SVM classifier with RBF kernel in combination with standard cross-validation procedures for hyperparameter selection achieve comparable test set performances. These SVM and LS-SVM performances are consistently very good when compared to a variety of methods described in the literature including decision tree based algorithms, statistical algorithms and instance based learning methods. We show on ten UCI datasets that the LS-SVM sparse approximation procedure can be successfully applied.
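For orientation, the "linear set of equations in the dual space" mentioned in this abstract is commonly written as below in the LS-SVM literature. The notation is assumed here and does not come from the abstract: training pairs $(x_k, y_k)$ with $y_k \in \{-1, +1\}$, feature map $\varphi$, kernel $K$, regularization constant $\gamma$, error variables $e_k$, multipliers $\alpha$ and bias $b$.

```latex
% Primal problem: least squares cost with equality constraints
\min_{w,\,b,\,e}\ \tfrac{1}{2}\, w^{\top} w + \tfrac{\gamma}{2} \sum_{k=1}^{N} e_k^{2}
\quad \text{s.t.} \quad
y_k \bigl[ w^{\top} \varphi(x_k) + b \bigr] = 1 - e_k, \qquad k = 1, \dots, N

% Eliminating w and e from the optimality conditions gives a linear system
\begin{bmatrix} 0 & y^{\top} \\ y & \Omega + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ \mathbf{1} \end{bmatrix},
\qquad
\Omega_{kl} = y_k\, y_l\, K(x_k, x_l)

% Resulting classifier
y(x) = \operatorname{sign}\Bigl( \sum_{k=1}^{N} \alpha_k\, y_k\, K(x, x_k) + b \Bigr)
```

Because the optimality conditions give $\alpha_k = \gamma e_k$, almost every training point receives a nonzero multiplier, which is one way to see why the abstract says the sparseness of standard SVMs is lost.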
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Artificial neural networks (ANNs) are mathematical models inspired by the biological nervous system. They can predict, learn from experience and generalize from previous examples. An important drawback of ANNs is their very limited explanation capability, mainly because the knowledge embedded within ANNs is distributed over the activations and the connection weights. Therefore, one of the main challenges in recent decades has been to extract classification rules from ANNs. This paper presents a novel approach to extracting fuzzy classification rules (FCR) from ANNs, because fuzzy rules are more interpretable than crisp rules and cope better with pervasive uncertainty and vagueness. A soft computing based algorithm is developed to generate fuzzy rules based on a data mining tool (DIFACONN-miner), which was recently developed by the authors. The fuzzy DIFACONN-miner algorithm can extract fuzzy classification rules from datasets containing both categorical and continuous attributes. Experimental research on the benchmark datasets and comparisons with other fuzzy rule based classification (FRBC) algorithms have shown that the proposed algorithm yields high classification accuracies and comprehensible rule sets.
We study the relationship between linear separability and the level of complexity of classification data sets. Linearly separable classification problems are generally easier to solve than non-linearly separable ones. This suggests a strong correlation between linear separability and classification complexity. We propose a novel and simple method for quantifying the complexity of the classification problem. The method, which is shown below, reduces any two-class classification problem to a sequence of linearly separable steps. The number of such reduction steps can be viewed as measuring the degree of non-separability and hence the complexity of the problem. This quantification can in turn be used as a measure of the complexity of classification data sets. Results obtained using several benchmarks are provided.
A study of the orthodox practice of diagnosing hepatitis revealed that inexactness in the diagnostic results has led several patients into abusing therapies. This prompted a further study into how this could be resolved. In this regard, medical doctors were asked to specify some linguistic labels while taking histories and performing medical examinations on patients. The effort yielded few responses, which necessitated a study of the application of fuzzy logic technology to medical diagnosis. The symptoms were fuzzified with membership functions, which aided the extraction of a fuzzy rule base. With data and rules, fuzzy inference using the max-min method was applied to the knowledge base, and the results were defuzzified to obtain crisp outputs that represent the diagnostic values with linguistic labels. The novelty of the result is that the degree or extent to which a patient suffers from hepatitis is reported to the patient, and based on this therapy can be administered without abuse.