electronics
Article
Leukemia Image Segmentation Using a Hybrid Histogram-Based Soft Covering Rough K-Means Clustering Algorithm
Hannah Inbarani H. 1, Ahmad Taher Azar 2,3,* and Jothi G 4
1 Department of Computer Science, Periyar University, Tamil Nadu, Salem 636 011, India; hhinba@gmail.com
2 Robotics and Internet-of-Things Lab (RIOTU), Prince Sultan University, Riyadh 11586, Saudi Arabia
3 Faculty of Computers and Artificial Intelligence, Benha University, Benha 13511, Egypt
4 Department of Computer Science and Engineering, Sona College of Technology (Autonomous), Salem 636005, India; jothiys@gmail.com
* Correspondence: aazar@psu.edu.sa or ahmad.azar@fci.bu.edu.eg
Received: 1 December 2019; Accepted: 1 January 2020; Published: 19 January 2020


Abstract:
Segmenting an image of a nucleus is one of the most essential tasks in a leukemia diagnostic system. Accurate and rapid segmentation methods help physicians identify diseases and provide better treatment at the appropriate time. Recently, hybrid clustering algorithms have come into wide use for image segmentation in medical image processing. In this article, a novel hybrid histogram-based soft covering rough k-means clustering (HSCRKM) algorithm for leukemia nucleus image segmentation is discussed. This algorithm combines the strengths of a soft covering rough set and rough k-means clustering. The histogram method was used to identify the number of clusters in order to avoid random initialization. Different types of features, such as gray level co-occurrence matrix (GLCM), color, and shape-based features, were extracted from the segmented image of the nucleus. Machine learning prediction algorithms were applied to classify cancerous and non-cancerous cells. The proposed strategy is compared with existing clustering algorithms, and its efficiency is evaluated based on the prediction metrics. The experimental results show that the HSCRKM method segments the nucleus efficiently, and that logistic regression and neural networks perform better than the other prediction algorithms.
Keywords: leukemia nucleus image; segmentation; soft covering rough set; clustering; machine learning algorithm; soft computing
1. Introduction
Due to the growth of advanced medical imaging modalities, it is very difficult to analyze medical images manually. For this reason, an advanced and efficient computer-aided system is needed to diagnose diseases. Such a system helps hematologists begin treatment at the right time and increases the patient's survival rate. Leukemia is a cancer of the blood-forming tissues that affects the bone marrow; it is caused by the proliferation of abnormal white blood cells in the body. Leukemia mostly affects people living in developed countries and children aged 14 or under. According to National Cancer Institute (NCI) statistics, 62,130 new cases requiring cancer treatment and 245,000 cases that are fatal or very serious are expected in the United States [1]. In India, leukemia ranks ninth among tumors in children [2,3]. Leukemia is classified into two broad categories: acute and chronic. Acute leukemia occurs when the number of immature blood cells increases, and it is the most common type of leukemia in children. Segmenting the image of a nucleus is one of the most challenging tasks in leukemia diagnosis.

Electronics 2020,9, 188; doi:10.3390/electronics9010188 www.mdpi.com/journal/electronics

Recently, soft computing has come to play an important role in many research areas, such as medical image processing, pattern recognition, big data analytics, Internet of Things (IoT) analysis, and bioinformatics.
The rough set theory [4] was proposed by Pawlak in 1982. It extends classical set theory for the study of intelligent systems characterized by insufficient and incomplete information. Classical rough set theory is based on equivalence relations, but it can also be extended to covering-based rough sets [5–7]. In 1999, Molodtsov [8] proposed the concept of a soft set, which can be seen as a new mathematical approach to vagueness. Because soft set theory places no restrictions on the approximate description, it is very versatile and easily applicable in practice. Maji et al. [9] extended Molodtsov's idea by introducing several operations on soft sets. In [10], a soft covering-based rough set was investigated as a new kind of soft rough set that combines a covering soft set with a rough set. In [11], a covering-based rough k-means clustering approach was applied to segment the leukemia nucleus. The advantage of covering-based subsets is that they generate the upper and lower approximations from the covering, which captures more roughness. Since different numbers of clusters give rise to different results, determining the number of clusters is a difficult task in clustering-based segmentation. To overcome this limitation, the hybrid histogram-based soft covering rough k-means clustering (HSCRKM) algorithm is introduced to segment the image of the leukemia nucleus. In this algorithm, the peak values of the image histogram are identified and used to initialize the number of clusters, which avoids random initialization of the number of clusters. The algorithm also incorporates the soft covering approximation space, since soft covering rough approximation is regarded as more accurate than soft rough approximation. HSCRKM thus combines the strengths of covering soft set theory and the rough k-means clustering algorithm to segment the image of the nucleus effectively, using soft covering rough approximation to find the lower and upper approximations. The performance of the HSCRKM algorithm is compared with that of existing algorithms, namely k-means clustering, fuzzy c-means (FCM) clustering, and particle swarm optimization (PSO)-based clustering. Different types of features, namely GLCM features at 0°, 45°, 90°, and 135° (GLCM-0, GLCM-45, GLCM-90, GLCM-135) and shape- and color-based features, are extracted from the segmented leukemia nucleus image. Machine learning algorithms are now widely applied to predict the severity of disease. State-of-the-art machine learning prediction algorithms, namely neural networks (NN) [12], logistic regression (LR) [13], support vector machine (SVM) [14], naive Bayes (NB) [15], k-nearest neighbors (KNN) [13], decision tree (DT) [13], and random forest (RF) [16], are applied to classify cancerous and non-cancerous leukemia cells. The empirical results show that logistic regression and neural networks predict blast and non-blast cells more effectively than the other prediction algorithms.
The main objective of this research work is to develop a diagnostic approach for identifying acute lymphoblastic leukemia blast cells using image processing and computational intelligence techniques. In the experimental analysis, relevant image processing and computational intelligence techniques are applied in order to select the most suitable approach for delineating acute lymphoblastic leukemia cells. Two objectives have been formulated in order to predict leukemia: to apply computational intelligence-based algorithms for segmenting acute lymphoblastic leukemia blast cells in images, and to apply machine learning algorithms to evaluate the performance of the proposed method.
The contributions of this study are summarized as follows. The number of clusters is determined from the peak values of the image histogram, and the lower and upper approximations are computed in the soft covering approximation space. Three clustering methods, namely k-means, FCM, and PSO-based clustering, are used for segmentation comparison. Different kinds of features are extracted from the segmented images, and the efficiency of the proposed algorithm is assessed using machine learning prediction algorithms. With HSCRKM segmentation, most classifiers achieve prediction accuracies above 80%, which is better than with the existing clustering algorithms. Therefore, it can be concluded that the HSCRKM clustering algorithm works more effectively than the other clustering algorithms.
In clustering algorithms, defining the number of clusters is a difficult task. To overcome this limitation, the proposed algorithm identifies the peak values of the image histogram and uses them to initialize the number of clusters. This is one of the advantages of the proposed method, as it avoids random initialization of the number of clusters. Another advantage of the HSCRKM algorithm is that it combines the strengths of covering soft set theory and the rough k-means clustering algorithm to segment the image of the nucleus effectively. According to the literature, the covering soft set approach is more accurate than the soft rough set, since it gives better results in several applications. In covering rough sets, the lower and upper approximations are computed in the soft covering approximation space.
Morphologically, a lymphoblast has a large nucleus of irregular shape and size. In blood sample images, it is difficult to identify the cytoplasm, because it is rarely visible and, when it is, it looks intensely colored. The nucleus and cytoplasm of lymphoblast cells reflect morphological and functional changes. Feature extraction plays a key role in the assessment of leukemia in blood samples. After the nucleus is segmented using the proposed HSCRKM algorithm, salient features are extracted, which reduces the amount of data and the processing time per image. In this research, different kinds of features are extracted, such as gray level co-occurrence matrix (GLCM), color, and shape-based features. These were measured from every channel of the segmented nucleus image. The efficiency of the proposed algorithm is assessed using machine learning prediction algorithms; that is, the performance of the segmentation algorithms is analyzed in the light of different machine learning (ML) prediction methods. With HSCRKM segmentation, most of the ML methods (except naive Bayes) achieved prediction accuracies greater than 80%, higher than those obtained with the existing clustering algorithms, viz., k-means clustering, fuzzy c-means clustering, and rough k-means clustering. It is inferred that the proposed clustering algorithm is more effective in segmenting the nucleus image, and that effective segmentation yields features that increase the prediction accuracy. For evaluating the experimental results, an accuracy greater than 80% was empirically set as the threshold for the best results. The outline of the proposed system is shown in Figure 1.
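To make the GLCM features concrete, the following is a minimal pure-Python sketch of a gray level co-occurrence matrix at the four angles used in this work (0°, 45°, 90°, 135°). It assumes a pixel distance of 1, integer gray levels already quantized to `levels` values, and one common choice of (row, column) offset per angle. The helper names `glcm` and `contrast` are illustrative, and contrast is shown only as one example statistic; the exact GLCM statistics used by the authors are not listed here.

```python
# Illustrative sketch: gray level co-occurrence matrices at four angles.
# Offsets are (row step, column step) for a pixel distance of 1; this is one
# common convention and an assumption, not taken from the paper.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, angle, levels, symmetric=True, normed=True):
    """Co-occurrence matrix of a 2-D list of ints in range(levels) at the given angle."""
    d_row, d_col = OFFSETS[angle]
    rows, cols = len(image), len(image[0])
    m = [[0.0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:   # skip neighbours outside the image
                i, j = image[r][c], image[r2][c2]
                m[i][j] += 1
                if symmetric:                        # count the pair in both directions
                    m[j][i] += 1
    if normed:
        total = sum(map(sum, m))
        if total:
            m = [[v / total for v in row] for row in m]
    return m


def contrast(p):
    """Haralick contrast: sum of p(i, j) * (i - j)^2 over all gray-level pairs."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Example on a tiny 3-level image: one contrast value per angle.
img = [[0, 0, 1],
       [0, 1, 1],
       [2, 2, 2]]
features = {a: contrast(glcm(img, a, levels=3)) for a in OFFSETS}
print(features)
```

For a segmented nucleus, each color channel would typically be quantized to a few gray levels before calling `glcm`, and further statistics (for example energy or homogeneity) can be computed from the same normalized matrix. Making the matrix symmetric counts each neighbour pair in both directions, so an angle and its opposite (for example 0° and 180°) give the same matrix.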
[Figure 1: Input image → Preprocessing → Nucleus segmentation (K-Means, FCM, PSO, HSCRKM) → Feature extraction (texture, shape, colour) → Classification (LR, NB, SVM, KNN, NN, RF)]
Figure 1. Outline of the proposed image segmentation process.
The rest of this article is organized as follows. Section 2 reviews the related literature on clustering-based segmentation algorithms. Section 3 describes the proposed method and its segmentation results. The empirical results are discussed in Section 4. Section 5 states the conclusions and indicates future directions of this research.
2. Related Literature
In recent years, many clustering algorithms have been developed for segmenting medical images. Patel [17] applied k-means clustering for segmentation and the Zack algorithm to categorize the clustered white blood cells (WBCs). Features such as the mean, standard deviation, area, elongation, perimeter, and color were extracted, and a support vector machine (SVM) was used to classify the cells. The method effectively segmented the WBCs and achieved 93.57% accuracy. For this experiment, 27 images from the Acute Lymphoblastic Leukemia Image Database (ALL-IDB) were used.
Srisukkham et al. [18] introduced two bare-bones particle swarm optimization (BBPSO) algorithms, with and without subswarms, in 2017 to diagnose leukemia cells. A stimulating discriminant measure (SDM)-based clustering algorithm combined with the genetic algorithm (GA) was employed to segment the nucleus, cytoplasm, and background regions. The relevant features were extracted, and various feature selection methods, such as particle swarm optimization (PSO), cuckoo search (CS), and the dragonfly algorithm (DA), were applied to select the optimal features and reduce the dimensionality. An average geometric mean was computed with different sizes of training and test samples to evaluate the performance of the proposed methods. The BBPSO and binary BBPSO algorithms achieved geometric mean values of 91% to 96%.
Su [19] developed a two-stage segmentation process using k-means clustering and a hidden Markov random field (HMRF) to group six different types of acute myeloid leukemia (AML) cells from bone marrow images. The segmentation algorithm achieved an average accuracy of 96% to 98%, outperforming other existing segmentation methods.
In [20], k-means and fuzzy c-means (FCM) clustering algorithms were applied to segment brain tumor images. Various feature reduction algorithms, namely probabilistic principal component analysis (PPCA), expectation maximization-based principal component analysis (EM-PCA), the generalized Hebbian algorithm (GHA), and adaptive principal component extraction (APEX), were employed to reduce the dimensionality of the feature set. The coefficient of variation (CV) values obtained for k-means and FCM are 0.4582 and 0.1224, respectively.
In [21], potential field segmentation was employed to segment MRI brain tumor images. This method achieved a standard deviation of 0.283, an average value of 0.517, and a median value of 0.644. The experimental results showed that ensemble methods generated better segmentations.
Küçükkülahlı [22] and Namburu [23] determined the number of clusters in the clustering algorithm using the peak values of the image histogram. In [22], an automatic segmentation method based on histogram-based k-means clustering was developed. In [23], the soft fuzzy rough c-means clustering algorithm (SFRCM) was used to segment MRI brain tumor images. SFRCM achieved a better Jaccard coefficient, 0.97 without noise and 0.79 with 9% Gaussian noise, than the existing clustering algorithms, namely k-means, rough k-means (RKM), rough fuzzy c-means (RFCM), and generalized rough c-means (GFCM).
Ali [24] introduced a new clustering algorithm based on neutrosophic orthogonal matrices (CANOM) to segment dental X-ray images. The experimental results show that the simplified silhouette width criterion (SSWC) index of CANOM is 0.941, whereas that of FCM is 0.02. CANOM also outperforms Otsu and eSFCM, whose values are 0.657 and 0.647, respectively. Thus, the index of CANOM is about 47 times that of FCM and about 1.43 times those of Otsu and eSFCM.
In [25], the unsupervised fuzzy c-means (FCM) clustering technique was employed for prostate cancer MRI images. The derived average Dice similarity, Jaccard index, sensitivity, specificity, mean absolute difference, and Hausdorff distance are 88.68, 81.26, 90.71, 88.09, 3.5, and 4.1, respectively.
In [26], the proposed multi-Otsu thresholding-based segmentation method successfully segmented CT image stacks. In addition, it shows the distribution characteristics of different components in three dimensions.
In [27], the enhanced adaptive fuzzy k-means (AFKM) algorithm was used to detect three regions in brain images, namely white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) spaces. AFKM performed better than FCM, producing a minimum mean square error (MSE) of 2.2441.
In [28], intuitionistic fuzzy c-means (IFCM) clustering was applied for medical image segmentation. The experimental results show that the method outperformed other algorithms, achieving an average quantitative index of 0.95. In [29], the chronic wound region was detected using fuzzy spectral clustering. The method produced 91.5% segmentation accuracy, an 86.7% Dice index, a Jaccard score of 79.0%, 87.3% sensitivity, and 95.7% specificity.
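Most of the figures quoted in this section (Dice, Jaccard, sensitivity, specificity, accuracy, and the geometric mean reported in [18]) are simple functions of the pixel-wise confusion counts between a predicted and a reference mask. The small sketch below assumes both masks are flattened into equal-length 0/1 sequences and takes the geometric mean as the square root of sensitivity times specificity (an assumption about its exact definition in [18]). `segmentation_metrics` is a hypothetical helper, not code from any of the cited works.

```python
import math


def segmentation_metrics(pred, truth):
    """Pixel-wise overlap metrics for two equal-length binary masks (1 = foreground)."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p and t)          # foreground in both masks
    tn = sum(1 for p, t in pairs if not p and not t)  # background in both masks
    fp = sum(1 for p, t in pairs if p and not t)      # predicted foreground, truly background
    fn = sum(1 for p, t in pairs if not p and t)      # missed foreground

    def ratio(num, den):
        return num / den if den else 0.0              # avoid division by zero on empty masks

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": ratio(tp + tn, len(pairs)),
        "g_mean": math.sqrt(sensitivity * specificity),
    }


pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
print(segmentation_metrics(pred, truth))
```

For a single pair of masks, Dice D and Jaccard J are related by J = D/(2 − D) (here 0.75 and 0.6), so they rank segmentations identically. Averages over many images need not satisfy this identity exactly.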
In [30], a convolutional neural network (CNN) approach is applied to identify the subtypes of leukemia. The experimental results show that the CNN model achieves 88.25% and 81.74% accuracy for leukemia and healthy cells, respectively. The literature review thus indicates that clustering-based algorithms have been widely applied to segment tumor regions. A brief review of the literature on various clustering methods in image segmentation and their performances appears in Table 1.
Table 1. Overview of the literature on clustering algorithms.
| Author | Methods Used | Objective | Type of Disease | Imaging Modality/Dataset Used | No. of Images | Performance Metrics and Accuracy (%) |
|---|---|---|---|---|---|---|
| Patel et al., 2015 [17] | K-means clustering, Zack algorithm, support vector machine (SVM) | The k-means clustering algorithm was used to detect the white blood cells, and the Zack algorithm was applied to categorize the cells. | Leukemia | Microscopic image (ALL-IDB) | 27 | Classification accuracy = 93.57% |
| Srisukkham et al., 2017 [18] | Spatial Data Mining (SDM)-based clustering, genetic algorithm (GA), particle swarm optimization (PSO), bare-bones PSO (BBPSO) | This optimization method was used to diagnose leukemia. | Acute lymphoblastic leukemia (ALL) | Microscopic image (ALL-IDB) | 180 | Geometric mean = 91% to 96% |
| Su et al., 2017 [19] | K-means, hidden Markov random field | This algorithm segmented the nucleus from the background, extracted the features, and then classified the blast cells. | Acute myeloid leukemia | Microscopic image (AML patients) | 61 | Segmentation accuracy = 96% to 98% (average) |
| Kaya et al., 2017 [20] | K-means, fuzzy c-means | Comparative analysis of various types of PCA algorithms on MRIs for two clustering methods. | Brain tumor | MRI (hospital) | - | Average reconstruction error rates, Euclidean distance error rate; CV of k-means = 0.4582, FCM = 0.1224 |
| Cabria et al., 2017 [21] | Potential field clustering | The algorithm is based on an analogy with the concept of a potential field in physics and views the intensity of a pixel in an MRI as a "mass" that creates a potential field. | Brain tumor | MRI (BRATS) | 30 | SD = 0.283, average = 0.517, median = 0.644 |
| Küçükkülahlı et al., 2016 [22] | Histogram-based k-means clustering | This method finds the optimum number of clusters based on the histogram of an image. | - | MATLAB media image dataset | 10-15 | Derived metrics |
| Ali et al., 2017 [23] | Fuzzy clustering based on neutrosophic orthogonal matrix | This algorithm transforms image data into a neutrosophic set and computes the inner products of the cutting matrix of the input. Then, pixels are segmented using the orthogonal principle to form clusters. | Dental | X-ray (hospital) | 22 | DB index, silhouette index = 0.941 |
| Rundo et al., 2018 [24] | Fuzzy c-means (FCM) | This approach automatically segments the prostate gland and computes its volume. | Prostate cancer | MRI (hospital) | 7 (patients) | Dice similarity = 88.68, Jaccard index = 81.26, sensitivity = 90.71, specificity = 88.09, mean absolute difference = 3.5, Hausdorff distance = 4.1 |
| Zhang et al., 2017 [25] | Multi-Otsu thresholding algorithm | This method can successfully segment CT image stacks and shows the distribution characteristics of different components in three dimensions. | Backscattered electron images | X-ray CT (hospital) | 1571 (slices) | Derived metrics |
| Namburu et al., 2017 [26] | Classical k-means (KM), rough k-means (RKM), rough fuzzy c-means (RFCM), and generalized rough c-means (GFCM) | In this method, soft fuzzy rough approximations are applied to obtain the rough regions of an image and to compute the similarity of the clusters using a soft set similarity coefficient. | Brain tumor | MRI (BRATS) | 20 | Jaccard's coefficient = 0.97, accuracy |
| Ganesh et al., 2017 [27] | Enhanced adaptive fuzzy k-means algorithm | This approach is used to classify three important regions in the brain, namely white matter, gray matter, and cerebrospinal fluid spaces. | Brain tumor | MRI (brain images) | 3 | MSE = 2.2441 |
| Kaur, 2017 [28] | Intuitionistic fuzzy sets-based credibilistic fuzzy c-means clustering | In this method, the hesitation factor and fuzzy entropy are used to reduce the noise sensitivity of fuzzy c-means. | Brain tumor | MRI (BrainWeb) | 3 | Quantitative index = 0.95 |
| Dhane et al., 2017 [29] | Fuzzy spectral clustering, gray-based fuzzy similarity measure | This approach is adopted for ulcer boundary demarcation and estimation. | Chronic wound | Digital camera | 70 | Sensitivity = 87.3%, specificity = 95.7%, accuracy = 91.5%, Dice index = 86.7%, Jaccard score = 79.0% |
| Ahmed et al., 2019 [30] | Convolutional neural network (CNN) | This approach identifies the subtypes of leukemia. | Leukemia | Microscopic image (ALL-IDB, ASH Image Bank) | 903 | Accuracy = 88.25% (leukemia), accuracy = 81.74% (healthy cells) |
3. Methods
3.1. Basics of Soft Covering Based Rough Set
This section describes the basic properties of soft covering-based rough approximation [11].
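Definition 1. Let $C_G=(F,A)$ be a covering soft set over $U$, i.e., $F(a)\neq\emptyset$ for all $a\in A$ and $\bigcup_{a\in A}F(a)=U$. The pair $S=(U,C_G)$ is known as a soft covering approximation space. For a set $X\subseteq U$, the soft covering lower and upper approximations are, respectively, defined as

$$\underline{S}(X)=\bigcup\{F(a): a\in A,\ F(a)\subseteq X\}, \tag{1}$$

$$\overline{S}(X)=\bigcup\{\bigcup Md_S(x): x\in X\}, \tag{2}$$

where $Md_S(x)=\{F(a): x\in F(a) \text{ and there is no } e\in A \text{ with } x\in F(e)\subsetneq F(a)\}$ is the soft minimal description of $x$. In addition,

$$S_{pos}(X)=\underline{S}(X), \tag{3}$$

$$S_{neg}(X)=U-\overline{S}(X), \tag{4}$$

$$S_{bon}(X)=\overline{S}(X)-\underline{S}(X) \tag{5}$$

are called the soft covering positive, negative, and boundary regions of $X$, respectively [11].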
Definition 2. Let $S=(U,C_G)$ be a soft covering approximation space. If $\underline{S}(X)=\overline{S}(X)$, then the subset $X\subseteq U$ is called soft covering definable. $X$ is said to be a soft covering based rough set if $\underline{S}(X)\neq\overline{S}(X)$.
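Definitions 1 and 2 can be checked mechanically on a small example. The sketch below uses the covering soft set from the pixel example given later in this section and reads Eq. (2) as the union of the minimal descriptions of the points of X, where Md_S(x) is taken (as an assumption) to be the members of the cover that contain x and have no strictly smaller member also containing x. The function names are illustrative.

```python
from itertools import chain


def minimal_description(x, cover):
    """Md_S(x): members of the cover that contain x and have no strictly smaller
    member that also contains x (assumed definition, see the lead-in)."""
    containing = [s for s in cover.values() if x in s]
    return [s for s in containing if not any(t < s for t in containing)]


def soft_covering_approximations(universe, cover, X):
    """Lower/upper approximations of X and the positive, negative, boundary regions."""
    X = set(X)
    # Lower: union of every F(a) that lies entirely inside X.
    lower = set().union(*(s for s in cover.values() if s <= X))
    # Upper: union of the minimal descriptions of the points of X.
    upper = set().union(*chain.from_iterable(minimal_description(x, cover) for x in X))
    return {
        "lower": lower,
        "upper": upper,
        "positive": lower,
        "negative": set(universe) - upper,
        "boundary": upper - lower,
        "definable": lower == upper,   # Definition 2: otherwise X is soft covering rough
    }


# Covering soft set from the pixel example in this section.
U = {"x1", "x2", "x3", "x4"}
cover = {
    "ClG1": frozenset({"x2", "x3", "x4"}),
    "ClG2": frozenset({"x1", "x4"}),
    "ClG3": frozenset({"x1", "x3"}),
}
result = soft_covering_approximations(U, cover, {"x1", "x3"})
print(sorted(result["lower"]), sorted(result["upper"]), sorted(result["boundary"]))
```

For X = {x1, x3}, only F(ClG3) fits inside X, so the lower approximation is {x1, x3}. The minimal descriptions of x1 and x3 together cover all four pixels, which leaves {x2, x4} in the boundary region, so this X is a soft covering based rough set.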
The soft covering based rough set can be applied to image segmentation with the following considerations.
• The set of pixels in the input image is denoted as $U=X=\{x_i \mid x_i \text{ is the value of the } i\text{th pixel in the image}\}$.
• Let $C_G=(F,A)$ be the covering soft set to be constructed, containing the pixels belonging to the clusters.
• The parameter set $A$ is the set of clusters $ClG_i$, $i=1,2,3,\ldots,k$, to which the pixels belong.
For example, let the set of pixels in an image be denoted as $U=\{x_1,x_2,x_3,x_4\}$ and let the parameter set $A$ be the set of clusters $\{ClG_1, ClG_2, ClG_3\}$ to which the pixels belong. The distance between each pixel and each centroid is calculated, and the pixels are grouped into clusters based on the minimum distance. Assume that the input pixels are grouped into one or more clusters as follows:

$$F(ClG_1)=\{x_2,x_3,x_4\},\quad F(ClG_2)=\{x_1,x_4\},\quad F(ClG_3)=\{x_1,x_3\}.$$

Let $(F,A)$ be represented as $(F,A)=\{F(ClG)\mid ClG\in A\}$. The soft covering based rough set representation of the above example is given by

$$(F,A)=\{ClG_1=\{x_2,x_3,x_4\},\ ClG_2=\{x_1,x_4\},\ ClG_3=\{x_1,x_3\}\}.$$
A tabular presentation of the soft set appears in Table 2. If $x_i\in F(ClG_j)$, then the entry is one; otherwise, it is zero.
Table 2. Soft covering-based rough set representation of an image.
| U | ClG1 | ClG2 | ClG3 |
|---|---|---|---|
| x1 | 0 | 1 | 1 |
| x2 | 1 | 0 | 0 |
| x3 | 1 | 0 | 1 |
| x4 | 1 | 1 | 0 |
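The 0/1 entries of Table 2 follow directly from the soft set (F, A). The short sketch below rebuilds the table from the example and also checks the covering condition (every F(ClG) non-empty, and together they cover U); `soft_set_table` and `is_covering` are hypothetical helper names.

```python
def soft_set_table(pixels, F):
    """Rows of Table 2: 1 if the pixel is in F(ClG), else 0, for each cluster ClG."""
    clusters = sorted(F)                      # column order ClG1, ClG2, ClG3
    table = {x: [1 if x in F[c] else 0 for c in clusters] for x in pixels}
    return clusters, table


def is_covering(pixels, F):
    """(F, A) is a covering soft set if every F(ClG) is non-empty and together they cover U."""
    return all(F.values()) and set().union(*F.values()) == set(pixels)


pixels = ["x1", "x2", "x3", "x4"]
F = {"ClG1": {"x2", "x3", "x4"}, "ClG2": {"x1", "x4"}, "ClG3": {"x1", "x3"}}
clusters, table = soft_set_table(pixels, F)
print("U   " + " ".join(clusters))
for x in pixels:
    print(x + "  " + "    ".join(str(v) for v in table[x]))
```

A row with more than one 1 (x1, x3, and x4 here) marks a pixel that belongs to several clusters, which is the situation in which rough k-means places a pixel in the upper approximations of more than one cluster.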
3.2. The Proposed Histogram-Based Soft Covering Rough K-Means Clustering
The proposed histogram-based soft covering rough k-means clustering is summarized in Algorithm 1. The combination of the covering soft set and the rough set gives rise to a new kind of soft rough set. Based on covering soft sets, Yüksel et al. proposed soft covering rough approximation in 2014 [11,31], which is more accurate than the soft rough set. Here, we establish a rough k-means clustering using soft covering-based rough approximation to segment the image of the leukemia nucleus. Let $\underline{S}(X)$ and $\overline{S}(X)$ denote the soft covering lower and upper approximations, with $\underline{S}(X)\subseteq\overline{S}(X)$; i.e., in soft covering-based rough k-means clustering, the lower approximation is a subset of the upper approximation. Pixel data $X_n$ in the lower approximation of a cluster surely belong to that cluster, so they cannot belong to any other cluster. Pixel data in the upper approximation may belong to the cluster; since their membership is uncertain, pixels that are not in the lower approximation must also belong to the upper approximation of at least one other cluster. The distance between the pixel data $X_n$ and the mean $sm_k$ is defined as [32]

$$d(X_n, sm_k)=\lVert X_n-sm_k\rVert. \tag{6}$$
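The cluster center $sm_k$, i.e., the mean, is computed using the following equation:

$$sm_k=\begin{cases} w_{low}\dfrac{\sum_{X_n\in\underline{S}_k}X_n}{|\underline{S}_k|}+w_{upp}\dfrac{\sum_{X_n\in\overline{S}_k}X_n}{|\overline{S}_k|}, & \text{for } \underline{S}_k\neq\emptyset,\\[2ex] \dfrac{\sum_{X_n\in\overline{S}_k}X_n}{|\overline{S}_k|}, & \text{otherwise,}\end{cases}\tag{7}$$

where $|\underline{S}_k|$ indicates the number of pixels in the lower approximation of cluster $k$ and $|\overline{S}_k|$ is the number of pixels in the upper approximation of cluster $k$. The weight parameters $w_{low}$ and $w_{upp}$ determine the relative importance of the lower and upper approximations of the cluster.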
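Equation (7) combines two means: when the lower approximation is non-empty, the centre is a weighted sum of the mean intensity of the lower approximation and the mean intensity of the upper approximation; otherwise, the upper-approximation mean is used alone. A minimal sketch for scalar gray levels, assuming that reading of Eq. (7) and the weights 0.7 and 0.3 used in Section 3.3; `cluster_center` is an illustrative name.

```python
def cluster_center(lower, upper, w_low=0.7, w_upp=0.3):
    """Centre of one rough cluster from pixel intensities, following Eq. (7).

    lower: intensities in the lower approximation of cluster k
    upper: intensities in the upper approximation of cluster k (a superset of lower)
    """
    if not upper:
        raise ValueError("cluster has an empty upper approximation")
    upper_mean = sum(upper) / len(upper)
    if lower:                                   # first case of Eq. (7)
        return w_low * sum(lower) / len(lower) + w_upp * upper_mean
    return upper_mean                           # "otherwise" case


print(cluster_center([10, 12], [10, 12, 20]))   # lower mean 11, upper mean 14
print(cluster_center([], [20, 30]))             # falls back to the upper mean
```

With lower = [10, 12] and upper = [10, 12, 20], the result is 0.7 × 11 + 0.3 × 14 = 11.9: the boundary pixel at 20 pulls the centre only slightly away from the lower-approximation mean.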
Explanation: The algorithm first identifies the peak values of the image histogram and uses them to define the number of clusters $k$. Initially, each pixel $X_n$ is assigned to exactly one lower approximation; here, soft covering-based rough approximation is applied instead of classical rough approximation. The new means $sm_k$ are then determined using Equation (7). The distance $d(X_n, sm_k)$ between each pixel $X_n$ and each centroid $sm_k$ is computed using Equation (6), and each pixel is assigned to its closest mean $sm_h$. For each pixel, the relative distance (RD) to every other cluster $k$ is computed. If it is within the threshold, the pixel is ambiguous and is put into the upper approximation of cluster $k$ as well as that of cluster $h$; otherwise, it is put into the lower approximation of cluster $h$. This process is repeated until the cluster assignments of all pixels remain unchanged. Finally, the clustered image is labeled by cluster index, and the segmented image of the nucleus is extracted.
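The paper does not spell out how histogram peaks are detected, so the following is only one plausible sketch of the first step: build a 256-bin gray-level histogram, optionally smooth it with a moving average, and count runs of equal bins that are strictly higher than both neighbouring bins and at least a given fraction of the tallest bin. The smoothing radius (2), the 5% height floor, and the function names are illustrative assumptions.

```python
def gray_histogram(pixels, levels=256):
    """Number of pixels at each gray level 0..levels-1."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    return hist


def smooth(hist, radius=2):
    """Moving average over 2*radius+1 bins, truncated at the ends of the range."""
    n = len(hist)
    out = []
    for i in range(n):
        window = hist[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def count_histogram_peaks(hist, min_fraction=0.05):
    """Count runs of equal bins that are strictly higher than both neighbouring
    bins and at least min_fraction of the tallest bin (each plateau counts once)."""
    if not hist or max(hist) <= 0:
        return 0
    floor = min_fraction * max(hist)
    peaks, i, n = 0, 0, len(hist)
    while i < n:
        j = i
        while j + 1 < n and hist[j + 1] == hist[i]:
            j += 1                                   # bins i..j form one run
        left = hist[i - 1] if i > 0 else float("-inf")
        right = hist[j + 1] if j + 1 < n else float("-inf")
        if hist[i] > left and hist[i] > right and hist[i] >= floor and hist[i] > 0:
            peaks += 1
        i = j + 1
    return peaks


# Synthetic gray levels with three intensity modes.
pixels = [29] * 40 + [30] * 50 + [31] * 40 + [120] * 60 + [121] * 30 + [200] * 80
hist = gray_histogram(pixels)
k = count_histogram_peaks(smooth(hist))
print("number of clusters K =", k)
```

Without smoothing, noisy histograms of real cell images tend to produce many tiny local maxima, so some form of smoothing or a prominence criterion is usually needed before the peak count is a sensible K.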
Algorithm 1: Histogram-Based Soft Covering Rough K-Means Clustering Algorithm
Input: $Img(X_n)$, $K$, $w_{low}$, $w_{upp}$, $\delta$
Output: Segmented nucleus image $Seg_{neu}$
Initialization:
    $X_n=(x_1,x_2,\ldots,x_n)$ // $n$ = number of pixels in the image
    $K=hist(Img(X_n))$ // number of clusters found from the peak values of the image histogram
    $w_{low}$ = lower approximation weight
    $w_{upp}$ = upper approximation weight
    $\delta$ = threshold value
Procedure:
    Step 1: Randomly assign each pixel to exactly one soft covering lower approximation.
    Step 2: Compute the cluster centers $sm_k$ using Equation (7).
    Step 3: Assign the pixels to the approximations. For each pixel $X_n$, determine its closest mean $sm_h$:
            $d^{min}_{n,h}=d(X_n,sm_h)=\min_{k=1,2,\ldots,K} d(X_n,sm_k)$,
            and assign $X_n$ to the upper approximation of cluster $h$: $X_n\in\overline{S}_h$.
    Step 4: Compute the relative distance $RD=d(X_n,sm_t)-d(X_n,sm_h)$ and the set $T=\{t: RD\le\delta,\ t\neq h\}$.
            If $T\neq\emptyset$, then $X_n\in\overline{S}_t$ for all $t\in T$;
            else, $X_n\in\underline{S}_h$.
    Step 5: Check the convergence of the algorithm; if it has not converged, continue with Step 2.
    Step 6: Label the image by cluster index and extract the leukemia nucleus ($Seg_{neu}$).
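Putting the steps together, the following is a one-dimensional sketch of Algorithm 1 on gray levels. It is not the authors' implementation and rests on stated assumptions: the distance of Eq. (6) is the absolute intensity difference; a pixel is treated as ambiguous when another centre is at most δ farther away than the closest one (Step 4 read as $d(X_n, sm_t) - d(X_n, sm_h)\le\delta$); centres follow Eq. (7) read as a weighted lower-mean plus upper-mean; and, to keep the sketch deterministic, the initial centres are spread evenly over the intensity range instead of using the random assignment of Step 1. The number of clusters k is passed in (for example, from a histogram-peak count); `hscrkm_1d` is a hypothetical name.

```python
def hscrkm_1d(pixels, k, w_low=0.7, w_upp=0.3, delta=0.5, max_iter=100):
    """Rough k-means on scalar gray levels (illustrative sketch of Algorithm 1).

    Returns (centers, lower, upper); lower[c] and upper[c] hold pixel indices.
    """
    lo, hi = min(pixels), max(pixels)
    # Assumption: evenly spaced initial centres instead of a random first
    # assignment, so that the sketch is deterministic.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    lower = upper = None
    for _ in range(max_iter):
        new_lower = [[] for _ in range(k)]
        new_upper = [[] for _ in range(k)]
        for n, x in enumerate(pixels):
            dist = [abs(x - c) for c in centers]          # Eq. (6) for scalars
            h = min(range(k), key=dist.__getitem__)       # Step 3: closest centre
            new_upper[h].append(n)
            # Step 4: other clusters at most delta farther away than the closest one.
            T = [t for t in range(k) if t != h and dist[t] - dist[h] <= delta]
            if T:                                         # ambiguous pixel
                for t in T:
                    new_upper[t].append(n)
            else:                                         # unambiguous pixel
                new_lower[h].append(n)
        if new_lower == lower and new_upper == upper:     # Step 5: assignments stable
            break
        lower, upper = new_lower, new_upper
        for c in range(k):                                # Step 2 / Eq. (7)
            up = [pixels[n] for n in upper[c]]
            lw = [pixels[n] for n in lower[c]]
            if not up:
                continue                                  # keep the centre of an empty cluster
            up_mean = sum(up) / len(up)
            centers[c] = (w_low * sum(lw) / len(lw) + w_upp * up_mean) if lw else up_mean
    return centers, lower, upper


# Toy gray levels: two dark pixels, two bright pixels, and one pixel midway.
pixels = [0, 0, 10, 10, 5]
centers, lower, upper = hscrkm_1d(pixels, k=2)
print(centers, lower, upper)
```

In the toy run, the pixel at intensity 5 is equally close to both centres, so it lands in both upper approximations and in neither lower approximation. With raw 0 to 255 intensities, a δ of 0.5 flags only near-ties; rescaling the intensities would change its effect. Choosing which cluster corresponds to the nucleus (for instance, the darkest centre) is a further step not shown here.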
3.3. Performance Assessment for Segmentation Algorithms
After preprocessing, the proposed HSCRKM algorithm is applied for leukemia nucleus image segmentation. The peak values of the histogram are identified, and their number is automatically assigned as the number of clusters (K); the value of K therefore varies from image to image. The weights of the lower approximation and the boundary region in rough k-means algorithms lie in the range $0.0 \le w_{low}, w_{bon} \le 1.0$, and the relative threshold in the HSCRKM algorithm satisfies $\delta \le 1.0$. The parameter values are set to $w_{low} = 0.7$, $w_{bon} = 0.3$, and $\delta = 0.5$; these values have been reported to give stable results in rough k-means [30]. Figure 2 illustrates the segmentation results produced by the proposed HSCRKM algorithm.
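The section does not specify how histogram peaks are detected, so the following is only a sketch under stated assumptions: the gray-level histogram is smoothed with a moving average, a plateau of equal values counts as one candidate peak, and a peak must reach a minimum fraction of the tallest one. The function name `histogram_peaks` and the parameters `window` and `min_frac` are hypothetical choices, not values from the paper.

```python
def histogram_peaks(pixels, levels=256, window=5, min_frac=0.05):
    """Return (K, peak_positions) for integer gray levels in [0, levels)."""
    if not pixels:
        return 0, []
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Moving-average smoothing suppresses spurious single-bin maxima.
    half = window // 2
    smooth = []
    for i in range(levels):
        win = hist[max(0, i - half): i + half + 1]
        smooth.append(sum(win) / len(win))
    threshold = min_frac * max(smooth)  # ignore peaks that are too small
    peaks = []
    i = 0
    while i < levels:
        # Treat a run of equal values (a plateau) as a single candidate peak.
        j = i
        while j + 1 < levels and smooth[j + 1] == smooth[i]:
            j += 1
        left = smooth[i - 1] if i > 0 else float("-inf")
        right = smooth[j + 1] if j + 1 < levels else float("-inf")
        if smooth[i] > left and smooth[i] > right and smooth[i] >= threshold:
            peaks.append((i + j) // 2)  # center of the plateau
        i = j + 1
    return len(peaks), peaks
```

With two dominant intensity modes and one minor mode below the height threshold, for example `histogram_peaks([50] * 1000 + [200] * 1000 + [120] * 20)`, the sketch returns `(2, [50, 200])`, that is, K = 2.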
Figure 2. Segmentation results produced by the proposed histogram-based soft covering rough k-means clustering (HSCRKM) algorithm.
In Figure 2, the first column displays the original image, the second column shows the histogram of the image, which is used to find the number of clusters (K), the third column displays the clustered image, and the last column displays the extracted nucleus. It is observed that a smaller value of K gives a better segmentation result and also reduces the processing time. The parameters used in the clustering algorithms are presented in Figure 3.
Figure 3. Parameters utilized in clustering algorithms.
K-Means Clustering: K = 3, Max Iteration = 500
FCM Clustering: K = 3, υ = 0.000001, m = 2
PSO-based Clustering: K = 3
HSCRKM Clustering: K = peak value of histogram image, $w_{low} = 0.7$, $w_{bon} = 0.3$, and $\delta = 0.5$
Figure 4 shows sample outputs of leukemia image segmentation using existing clustering algorithms, namely k-means clustering, FCM clustering, and PSO-based clustering. For these algorithms, the number of clusters K is set to three using the elbow method.
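The elbow method referred to above can be sketched as follows: run k-means for increasing K, record the within-cluster sum of squared errors (SSE), and choose the K after which the SSE stops dropping sharply. This sketch assumes one-dimensional intensities and a deterministic quantile-based initialization; `kmeans_1d` and `sse_curve` are hypothetical helper names, and reading the elbow off the curve is left to the user, as is usual for this method.

```python
def kmeans_1d(values, k, max_iter=100):
    """Plain k-means on scalar values; returns (centers, SSE)."""
    xs = sorted(values)
    # Assumption: deterministic quantile-based initialization.
    centers = [xs[(2 * c + 1) * len(xs) // (2 * k)] for c in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for x in xs:
            # Nearest center; ties go to the lower index.
            groups[min(range(k), key=lambda c: abs(x - centers[c]))].append(x)
        new = [sum(g) / len(g) if g else centers[c] for c, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    sse = sum(min((x - m) ** 2 for m in centers) for x in xs)
    return centers, sse


def sse_curve(values, k_max=6):
    """Within-cluster SSE for K = 1..k_max, the quantity plotted by the elbow method."""
    return {k: kmeans_1d(values, k)[1] for k in range(1, k_max + 1)}
```

For three well-separated intensity groups, `sse_curve([0] * 10 + [100] * 10 + [200] * 10, k_max=4)` gives an SSE of 200000 at K = 1, 50000 at K = 2, and 0 from K = 3 onward, so the curve bends at K = 3.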
Figure 4. Segmentation results by k-means, FCM, and particle swarm optimization (PSO) algorithms (columns: original image, k-means, FCM, PSO).
4. Results and Discussion
4.1. Dataset
The Acute Lymphoblastic Leukemia Image Database (ALL-IDB) datasets were used for this experiment. These data were downloaded from the website www.dti.unimi.it/fscotti/all/ [33–36]. In total, 368 images (175 benign and 193 malignant) were used for this experimental analysis. Digital microscopes usually acquire images in the RGB color space, which is not well suited to this analysis, so in the preprocessing step all RGB input images are converted into the LAB color space.
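The paper does not state which conversion routine or white point was used, so the following is a minimal per-pixel sketch assuming 8-bit sRGB input and the D65 reference white (2-degree observer). In practice a library routine such as scikit-image's `skimage.color.rgb2lab` performs the same conversion on whole images.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIELAB (D65 white point, 2-degree observer)."""

    def to_linear(c):
        c = c / 255.0
        # Undo the sRGB gamma curve.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)
    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)
```

White (255, 255, 255) maps to approximately L = 100, a = 0, b = 0, and pure red (255, 0, 0) to approximately (53.2, 80.1, 67.2), which matches published sRGB-to-Lab tables.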
4.2. Feature Extraction
The segmented image data are too large to process directly. Feature extraction is a technique for extracting the relevant, informative data from a segmented image, which reduces the processing time and the dimensionality of the data. In this research, 21 shape and color-based features were extracted [37]: the area, perimeter, roundness, elongation, form_factor, length_to_diameter_ratio, compactness, discrete_fourier_transform, mean_of_harra_coefficient, h_coefficient, v_coefficient, variance_of_harra_coefficient, h_coefficient, v_coefficient, mean_colour_intensity for red, green, and blue, hue, saturation, value component, and the class attribute.
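Several of the shape features listed above can be computed directly from a binary nucleus mask. The sketch below uses common definitions that are assumptions on our part (the exact formulas in [37] may differ): area as the foreground pixel count, perimeter as the number of pixel edges bordering the background, form factor as $4\pi A / P^2$, and compactness as $P^2 / A$.

```python
import math


def shape_features(mask):
    """Area, perimeter, form factor, and compactness of a binary mask (1 = nucleus)."""
    rows, cols = len(mask), len(mask[0])

    def is_fg(r, c):
        return 0 <= r < rows and 0 <= c < cols and mask[r][c] == 1

    area = perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1:
                continue
            area += 1
            # Perimeter = number of pixel edges shared with background or the border.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if not is_fg(r + dr, c + dc):
                    perimeter += 1
    if area == 0:
        raise ValueError("mask contains no foreground pixels")
    return {
        "area": area,
        "perimeter": perimeter,
        "form_factor": 4 * math.pi * area / perimeter ** 2,  # 1.0 for an ideal continuous disc
        "compactness": perimeter ** 2 / area,
    }
```

For a 4 × 4 square of foreground pixels, the sketch gives an area of 16, a perimeter of 16, a form factor of π/4, and a compactness of 16.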
Twenty-three texture-based features were also extracted: angular_second_moment, entropy, dissimilarity, contrast, inverse_difference, correlation, homogeneity, autocorrelation, cluster_shade, cluster_prominence, maximum_probability, sum_of_squares, sum_average, sum_variance, sum_entropy, difference_variance, difference_entropy, information_measures_correlation1, information_measures_correlation2, maximal_correlation_coefficient, inverse_difference_normalized, inverse_difference_moment_normalized, and the class attribute. These features are derived from the gray level co-occurrence matrix (GLCM) in the directions 0°, 45°, 90°, and 135° [38,39]. Our literature review found that these features are widely used for leukemia image analysis.
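To make the GLCM construction concrete, here is a minimal sketch that builds a symmetric, normalized co-occurrence matrix at pixel distance 1 for the four directions and computes four of the listed features. The direction-to-offset mapping, the distance of 1, the $1/(1+(i-j)^2)$ form of homogeneity, and the base-2 entropy are assumptions; libraries such as scikit-image (`skimage.feature.graycomatrix` and `graycoprops`) offer equivalent routines.

```python
import math

# (row, column) offsets at distance 1, with rows increasing downwards (assumed convention).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, angle, levels):
    """Symmetric, normalized gray level co-occurrence matrix for one direction."""
    dr, dc = OFFSETS[angle]
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts] if total else counts


def glcm_features(p):
    """Four of the texture features listed above, computed from a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "angular_second_moment": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }
```

For the image `[[0, 1], [0, 1]]`, the 0° matrix pairs only unequal neighbors (contrast 1.0, entropy 1.0), whereas the 90° matrix pairs only equal neighbors (contrast 0.0, homogeneity 1.0), which shows why the four directions yield different feature sets.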
4.3. Performance Assessments of Segmentation Algorithms
The empirical results are interpreted in two ways. First, we analyze the efficiency of various clustering-based segmentation algorithms through state-of-the-art machine learning algorithms. Second, we compare the machine learning methods using evaluation measures such as receiver operating characteristic (ROC) curve analysis and kappa statistics. The extracted feature sets were fed into machine learning (ML) prediction algorithms to classify the segmented cells as cancerous or non-cancerous. Seven ML algorithms, namely logistic regression (LR), naive Bayes (NB), support vector machine (SVM), k-nearest neighbor (KNN), neural network (NN), random forest (RF), and decision tree (DT), were used to evaluate the performance of the clustering algorithms.
The performance of the machine learning prediction algorithms was analyzed using evaluation metrics such as accuracy (A), precision (P), recall (R), F1 measure, area under the ROC curve (AUC), mean absolute error (MAE), and coefficient of determination (R²) [40,41]. An R² value of 1 indicates a perfect fit, and a value of 0 indicates no fit.
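The metrics above follow standard definitions, sketched below for binary labels (1 = malignant). The paper does not say whether MAE and R² were computed on predicted labels or on predicted probabilities, so this sketch assumes scores in [0, 1]; the function names are illustrative only.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"A": accuracy, "P": precision, "R": recall, "F1": f1}


def error_metrics(y_true, scores):
    """MAE and coefficient of determination (R^2) of scores against 0/1 labels."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mae = sum(abs(t - s) for t, s in zip(y_true, scores)) / n
    ss_res = sum((t - s) ** 2 for t, s in zip(y_true, scores))  # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    return {"MAE": mae, "R2": 1 - ss_res / ss_tot}
```

A predictor that always outputs the label mean (0.5 for balanced classes) has R² = 0 (no fit), while exact predictions give R² = 1 (perfect fit).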
The classification results of k-means clustering, FCM clustering, PSO-based clustering, and the proposed HSCRKM clustering algorithm are presented in Tables 3–6, respectively. The performance of the segmentation algorithms was analyzed through different machine learning prediction methods. The experimental results show that the proposed HSCRKM clustering algorithm performs better than the existing algorithms. For the proposed method, logistic regression and neural network perform better than the other prediction algorithms and produce the highest classification accuracy (93%), whereas the naive Bayes method produces the lowest classification accuracy (58%).
Table 3 presents the performance analysis of k-means clustering. The LR, NN, and RF algorithms produce the highest classification accuracy of 79%. The NB algorithm gives the lowest accuracy, 65%. KNN and DT produce 72% accuracy, and SVM produces 74% accuracy. The overall accuracy of k-means clustering is 69%, computed as the average accuracy over all datasets and all ML algorithms.
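The 69% overall figure can be reproduced from Table 3 as the plain mean of its 35 accuracy values (7 classifiers × 5 feature datasets). The short script below transcribes the A column and also computes the percentage-point gains of HSCRKM (82%) over the overall accuracies reported for the three baseline clustering methods; it is a worked check of the arithmetic, not part of the authors' pipeline.

```python
# Accuracy column (A, in %) of Table 3 (k-means clustering), in the order
# GLCM-0, GLCM-45, GLCM-90, GLCM-135, SC for each classifier.
table3_accuracy = {
    "LR": [76.74, 79.07, 65.17, 67.44, 74.42],
    "NB": [60.47, 60.60, 51.16, 55.81, 65.11],
    "SVM": [74.41, 72.09, 69.77, 65.11, 69.78],
    "KNN": [72.09, 67.44, 67.44, 62.79, 67.44],
    "NN": [79.06, 76.74, 66.66, 69.69, 65.11],
    "RF": [72.09, 69.77, 65.11, 79.06, 67.42],
    "DT": [69.76, 79.06, 69.79, 72.09, 69.69],
}

all_values = [a for row in table3_accuracy.values() for a in row]
overall = sum(all_values) / len(all_values)
print(f"Average overall accuracy (k-means): {overall:.2f}%")  # prints 69.01%

# Percentage-point gains of HSCRKM (82%) over the reported overall accuracies.
reported_overall = {"k-means": 69, "FCM": 77, "PSO": 78}
gains = {name: 82 - acc for name, acc in reported_overall.items()}
print(gains)  # {'k-means': 13, 'FCM': 5, 'PSO': 4}
```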
Table 4 presents the performance analysis of FCM clustering. The LR, DT, and RF algorithms achieve the maximum accuracy of 88%, and the DT result on GLCM-45 also has the lowest mean absolute error (MAE) in the table (0.050). As with k-means clustering, the NB algorithm gives the lowest accuracy (81%) among the algorithms. The SVM and NN give accuracies of 83% and 84%, respectively. The overall accuracy of FCM clustering is 77%.
Electronics 2020,9, 188 14 of 22
Table 3. Performance analysis of k-means clustering. A: accuracy, AUC: area under the receiver operating characteristic curve, DT: decision tree, KNN: k-nearest neighbor, LR: logistic regression, MAE: mean absolute error, NB: naive Bayes, NN: neural network, P: precision, R: recall, R²: coefficient of determination, RF: random forest, SC: shape and color-based features, SVM: support vector machine.
ML Algorithm Dataset P R F1 AUC MAE R² A
LR GLCM-0 81.00 77.00 78.00 0.821 0.134 0.195 76.74
LR GLCM-45 79.00 79.00 78.00 0.660 0.087 0.076 79.07
LR GLCM-90 60.00 66.00 68.00 0.706 0.112 0.097 65.17
LR GLCM-135 70.00 66.00 67.00 0.805 0.128 0.159 67.44
LR SC 76.00 74.00 75.00 0.805 0.115 0.152 74.42
NB GLCM-0 61.00 60.00 61.00 0.738 0.082 0.193 60.47
NB GLCM-45 62.00 61.00 58.00 0.880 0.093 0.068 60.60
NB GLCM-90 54.00 51.00 50.00 0.799 0.128 0.162 51.16
NB GLCM-135 60.00 56.00 57.00 0.710 0.088 0.132 55.81
NB SC 68.00 65.00 65.00 0.618 0.110 0.047 65.11
SVM GLCM-0 77.00 74.00 71.00 0.805 0.073 0.106 74.41
SVM GLCM-45 80.00 72.00 72.00 0.871 0.113 0.323 72.09
SVM GLCM-90 82.00 70.00 73.00 0.792 0.032 0.143 69.77
SVM GLCM-135 65.00 65.00 65.00 0.750 0.130 0.372 65.11
SVM SC 73.00 70.00 67.00 0.692 0.113 0.090 69.78
KNN GLCM-0 71.00 72.00 72.00 0.928 0.119 0.083 72.09
KNN GLCM-45 71.00 67.00 39.00 0.819 0.062 0.169 67.44
KNN GLCM-90 69.00 67.00 67.00 0.817 0.138 0.162 67.44
KNN GLCM-135 63.00 63.00 63.00 0.787 0.135 0.162 62.79
KNN SC 67.00 67.00 67.00 0.839 0.112 0.135 67.44
NN GLCM-0 79.00 79.00 76.00 0.821 0.077 0.274 79.06
NN GLCM-45 77.00 77.00 77.00 0.806 0.139 0.348 76.74
NN GLCM-90 66.00 67.00 69.00 0.859 0.094 0.079 66.66
NN GLCM-135 72.00 70.00 71.00 0.817 0.105 0.052 69.69
NN SC 65.00 65.00 64.00 0.853 0.116 0.090 65.11
RF GLCM-0 74.00 72.00 72.00 0.777 0.158 0.288 72.09
RF GLCM-45 70.00 70.00 69.00 0.761 0.127 0.275 69.77
RF GLCM-90 69.00 65.00 65.00 0.798 0.106 0.182 65.11
RF GLCM-135 79.00 79.00 79.00 0.805 0.126 0.186 79.06
RF SC 67.00 67.00 67.00 0.472 0.131 0.342 67.42
DT GLCM-0 66.00 70.00 66.00 0.865 0.073 0.192 69.76
DT GLCM-45 80.00 79.00 76.00 0.549 0.093 0.101 79.06
DT GLCM-90 71.00 70.00 67.00 0.664 0.118 0.250 69.79
DT GLCM-135 72.00 72.00 71.00 0.852 0.118 0.250 72.09
DT SC 72.00 70.00 71.00 0.817 0.105 0.052 69.69
Average Overall Accuracy 69%
Table 4. Performance analysis of FCM clustering.
ML Algorithm Dataset P R F1 AUC MAE R² A
LR GLCM-0 79.00 79.00 76.00 0.802 0.089 0.098 79.06
LR GLCM-45 89.00 88.00 89.00 0.950 0.080 0.401 88.37
LR GLCM-90 71.00 72.00 71.00 0.816 0.105 0.185 72.09
LR GLCM-135 88.00 86.00 82.00 0.792 0.058 0.156 86.04
LR SC 81.00 77.00 78.00 0.821 0.134 0.195 76.74
NB GLCM-0 75.00 60.00 62.00 0.767 0.076 0.135 60.64
NB GLCM-45 60.00 60.00 60.00 0.716 0.113 0.143 60.45
NB GLCM-90 63.00 63.00 63.00 0.787 0.135 0.162 62.79
NB GLCM-135 67.00 67.00 67.00 0.926 0.112 0.135 67.44
NB SC 84.00 81.00 81.00 0.825 0.145 0.132 81.39
SVM GLCM-0 77.00 74.00 71.00 0.805 0.073 0.106 74.41
SVM GLCM-45 73.00 72.00 72.00 0.655 0.195 0.269 72.09
SVM GLCM-90 75.00 74.00 73.00 0.822 0.176 0.265 74.41
SVM GLCM-135 72.00 72.00 72.00 0.766 0.128 0.161 72.09
SVM SC 83.00 84.00 83.00 0.849 0.053 0.167 83.72
KNN GLCM-0 79.00 79.00 79.00 0.821 0.077 0.274 79.06
KNN GLCM-45 76.00 74.00 74.00 0.812 0.151 0.048 74.41
KNN GLCM-90 83.00 83.00 84.00 0.893 0.122 0.368 83.72
KNN GLCM-135 85.00 84.00 85.00 0.866 0.098 0.312 85.31
KNN SC 79.00 80.00 79.00 0.910 0.079 0.171 79.07
NN GLCM-0 84.00 84.00 84.00 0.745 0.125 0.349 83.72
NN GLCM-45 87.00 85.00 82.00 0.773 0.148 0.238 84.84
NN GLCM-90 76.00 74.00 75.00 0.805 0.115 0.152 74.42
NN GLCM-135 79.00 79.00 79.00 0.805 0.126 0.186 79.07
NN SC 80.00 77.00 77.00 0.771 0.101 0.156 77.18
RF GLCM-0 77.00 77.00 75.00 0.785 0.080 0.186 76.74
RF GLCM-45 81.00 86.00 71.00 0.852 0.158 0.437 68.76
RF GLCM-90 80.00 77.00 77.00 0.839 0.155 0.248 76.74
RF GLCM-135 88.00 88.00 88.00 0.899 0.073 0.114 88.37
RF SC 86.00 79.00 78.00 0.795 0.086 0.090 79.06
DT GLCM-0 84.00 83.00 83.00 0.929 0.159 0.265 82.75
DT GLCM-45 88.00 88.00 88.00 0.953 0.050 0.138 88.37
DT GLCM-90 81.00 81.00 81.00 0.793 0.115 0.049 81.39
DT GLCM-135 87.00 86.00 86.00 0.938 0.064 0.053 86.04
DT SC 85.00 81.00 76.00 0.813 0.131 0.095 81.39
Average Overall Accuracy 77%
Table 5 shows the performance of the ML algorithms for PSO-based clustering. The NN method attains 90% accuracy, and the LR, SVM, KNN, and RF methods give classification accuracies above 80%. The NB algorithm again provides the lowest accuracy, 67%. The overall classification accuracy of PSO-based clustering is 78%.
Table 5. Performance analysis of PSO-based clustering.
ML Algorithm Dataset P R F1 AUC MAE R² A
LR GLCM-0 86.00 81.00 82.00 0.717 0.141 0.279 81.39
LR GLCM-45 88.00 86.00 85.00 0.741 0.150 0.060 86.04
LR GLCM-90 84.00 79.00 76.00 0.739 0.143 0.095 79.06
LR GLCM-135 90.00 86.00 86.00 0.963 0.065 0.093 86.04
LR SC 86.00 81.00 82.00 0.793 0.092 0.334 81.39
NB GLCM-0 69.00 67.00 68.00 0.833 0.082 0.098 67.44
NB GLCM-45 60.00 64.00 66.00 0.713 0.118 0.129 63.63
NB GLCM-90 56.00 61.00 58.00 0.880 0.093 0.068 60.61
NB GLCM-135 64.00 64.00 65.00 0.764 0.012 0.165 63.63
NB SC 61.00 62.00 62.00 0.876 0.118 0.148 62.69
SVM GLCM-0 84.00 79.00 76.00 0.739 0.143 0.095 79.07
SVM GLCM-45 79.00 79.00 78.00 0.827 0.085 0.142 79.06
SVM GLCM-90 71.00 72.00 71.00 0.816 0.105 0.185 72.09
SVM GLCM-135 76.00 77.00 72.00 0.807 0.086 0.192 76.74
SVM SC 81.00 81.00 79.00 0.801 0.120 0.167 81.39
KNN GLCM-0 82.00 79.00 80.00 0.811 0.123 0.017 79.06
KNN GLCM-45 71.00 73.00 71.00 0.864 0.082 0.108 72.72
KNN GLCM-90 70.00 67.00 68.00 0.816 0.141 0.526 67.44
KNN GLCM-135 75.00 74.00 73.00 0.822 0.176 0.265 74.41
KNN SC 82.00 81.00 81.00 0.788 0.118 0.147 80.66
NN GLCM-0 62.00 76.00 68.00 0.726 0.075 0.448 75.44
NN GLCM-45 61.00 73.00 66.00 0.848 0.123 0.261 72.12
NN GLCM-90 88.00 84.00 83.00 0.849 0.053 0.167 83.72
NN GLCM-135 91.00 91.00 91.00 0.929 0.062 0.210 90.67
NN SC 86.00 86.00 86.00 0.950 0.070 0.070 86.12
RF GLCM-0 84.00 81.00 81.00 0.825 0.145 0.135 81.39
RF GLCM-45 74.00 79.00 74.00 0.885 0.114 0.264 78.77
RF GLCM-90 79.00 70.00 79.00 0.747 0.139 0.102 79.65
RF GLCM-135 78.00 78.00 77.00 0.917 0.082 0.538 77.47
RF SC 81.00 81.00 81.00 0.841 0.097 0.056 81.39
DT GLCM-0 87.00 84.00 80.00 0.717 0.115 0.207 83.72
DT GLCM-45 89.00 88.00 89.00 0.929 0.081 0.229 88.37
DT GLCM-90 87.00 85.00 82.00 0.662 0.086 0.125 84.84
DT GLCM-135 82.00 82.00 82.00 0.950 0.102 0.214 81.82
DT SC 90.00 88.00 88.00 0.926 0.103 0.221 88.37
Average Overall Accuracy 78%
The performance analysis of the HSCRKM algorithm is shown in Table 6. The LR, NN, and DT algorithms achieve 93% classification accuracy. NB, KNN, and RF give accuracies of 84%, 85%, and 86%, respectively. It is also interesting to note that SVM gives the lowest best-case accuracy, 84%. The overall accuracy of the HSCRKM algorithm is 82%, which exceeds the overall accuracy of k-means clustering by 13 percentage points, FCM by 5 points, and PSO-based clustering by 4 points. This suggests that more accurate segmentation leads to better classification performance, and the experimental results show that the HSCRKM algorithm accurately segments the nucleus. Several studies in the literature report accuracies above 90%; however, they use a very small number of images in their experiments. In this research, 368 images are used to evaluate the performance of the proposed HSCRKM algorithm.
Table 6. Performance analysis of the HSCRKM algorithm.
ML Algorithm Dataset P R F1 AUC MAE R² A
LR GLCM-0 84.00 84.00 85.00 0.848 0.017 0.214 84.72
LR GLCM-45 93.00 93.00 93.00 0.944 0.072 0.584 93.02
LR GLCM-90 87.00 86.00 86.00 0.825 0.032 0.219 87.65
LR GLCM-135 89.00 88.00 88.00 0.899 0.112 0.427 88.37
LR SC 86.00 85.00 85.00 0.965 0.047 0.138 85.65
NB GLCM-0 70.00 70.00 70.00 0.848 0.190 0.133 69.76
NB GLCM-45 67.00 65.00 65.00 0.782 0.128 0.171 65.11
NB GLCM-90 67.00 65.00 65.00 0.782 0.128 0.171 65.11
NB GLCM-135 61.00 58.00 56.00 0.750 0.152 0.131 58.13
NB SC 84.00 84.00 85.00 0.848 0.017 0.214 84.72
SVM GLCM-0 84.00 81.00 81.00 0.760 0.140 0.206 81.39
SVM GLCM-45 84.00 81.00 81.00 0.760 0.140 0.206 81.36
SVM GLCM-90 79.00 79.00 79.00 0.768 0.321 0.341 79.06
SVM GLCM-135 80.00 74.00 73.00 0.780 0.132 0.122 74.41
SVM SC 86.00 84.00 84.00 0.967 0.089 0.312 83.92
KNN GLCM-0 86.00 84.00 84.00 0.967 0.072 0.309 83.92
KNN GLCM-45 82.00 81.00 81.00 0.952 0.097 0.291 81.39
KNN GLCM-90 75.00 72.00 71.00 0.727 0.127 0.102 72.09
KNN GLCM-135 77.00 77.00 76.00 0.911 0.101 0.151 76.74
KNN SC 86.00 85.00 85.00 0.965 0.047 0.138 85.65
NN GLCM-0 86.00 86.00 86.00 0.982 0.070 0.135 86.04
NN GLCM-45 91.00 91.00 90.00 0.950 0.054 0.274 90.69
NN GLCM-90 84.00 84.00 85.00 0.848 0.017 0.138 84.72
NN GLCM-135 93.00 93.00 93.00 0.939 0.074 0.526 93.02
NN SC 86.00 87.00 86.00 0.965 0.047 0.138 85.65
RF GLCM-0 82.00 81.00 81.00 0.860 0.174 0.331 81.39
RF GLCM-45 86.00 85.00 85.00 0.965 0.047 0.138 85.65
RF GLCM-90 82.00 81.00 81.00 0.890 0.441 0.321 81.39
RF GLCM-135 84.00 84.00 85.00 0.848 0.017 0.214 84.72
RF SC 86.00 87.00 86.00 0.913 0.144 0.225 86.05
DT GLCM-0 84.00 84.00 85.00 0.848 0.017 0.214 84.72
DT GLCM-45 93.00 93.00 93.00 0.944 0.072 0.584 93.02
DT GLCM-90 86.00 84.00 84.00 0.967 0.072 0.309 83.72
DT GLCM-135 89.00 88.00 88.00 0.899 0.112 0.427 88.72
DT SC 91.00 91.00 90.00 0.930 0.072 0.297 90.69
Average Overall Accuracy 82%
Figure 5 shows the overall prediction accuracy of the various machine learning algorithms. With k-means clustering, all the machine learning algorithms produce low prediction accuracy (below 80%). With PSO and FCM, some of the ML methods (namely logistic regression, random forest, and decision tree) attain above 80% prediction accuracy. With the HSCRKM clustering algorithm, most of the ML methods (except naive Bayes) achieve above 80% prediction accuracy. This suggests that the proposed HSCRKM clustering algorithm segments the nucleus efficiently and that the features extracted from these segments probably increase the prediction accuracy. For interpreting the experimental results, we manually set the best accuracy range as above 80%.
Figure 5. Overall prediction accuracy. Panels: K-Means, FCM, PSO, and HSCRKM; x-axis: feature datasets GLCM-0, GLCM-45, GLCM-90, GLCM-135, and SC; y-axis: prediction accuracy (0 to 100); series: LR, NB, SVM, KNN, NN, RF, and DT.
4.4. Performance Assessments of Machine Learning Algorithms
4.4.1. Kappa Statistics
Figure 6 compares the performance of various prediction algorithms for the proposed HSCRKM algorithm in terms of Cohen's kappa value [42], a statistical measure of inter-rater reliability that is used here to evaluate the agreement between the classifier's predictions and the actual class labels. A kappa value of 1 means perfect agreement, values below 1 mean less than perfect agreement, and 0 corresponds to the agreement expected by chance. With respect to the shape and color-based feature dataset, the proposed algorithm produces a substantial agreement range [43] (i.e., 0.61 to 0.80) across the prediction algorithms studied. Compared with other machine learning algorithms, neural networks can learn and model nonlinear and complex relationships, can capture complex interactions between predictor variables, and can be trained with multiple training algorithms. The figure shows that the neural network algorithm produces among the highest kappa values (0.67 to 0.85), which indicate substantial to almost perfect agreement. It also produces the highest classification accuracy compared with the other machine learning algorithms.
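Cohen's kappa corrects the observed agreement $p_o$ for the agreement $p_e$ expected by chance from the label frequencies: $\kappa = (p_o - p_e)/(1 - p_e)$. The sketch below computes it for label lists and maps the result to verbal bands. The bands follow the widely used Landis and Koch scale, whose "substantial" band (0.61 to 0.80) matches the range quoted above; treating that scale as the one in [43] is an assumption, and the function names are illustrative.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa between two label lists of equal length."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    # Expected chance agreement from the marginal label frequencies.
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if p_e == 1.0:
        return 1.0  # degenerate case: both lists use one identical label
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Verbal agreement band for a kappa value (Landis and Koch scale)."""
    if kappa < 0:
        return "poor"
    for upper, name in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return name
    return "almost perfect"
```

For example, a classifier that gets 4 of 6 balanced labels right has $p_o = 2/3$ and $p_e = 1/2$, so $\kappa = 1/3$ ("fair"), even though its accuracy is 67%. Using values from Figure 6, `landis_koch(0.7655)` returns "substantial" and `landis_koch(0.8525)` returns "almost perfect".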
Figure 6. Kappa value for HSCRKM clustering (Cohen's kappa per feature dataset and classifier, as plotted):
Dataset LR NB SVM KNN NN RF DT
GLCM-0 0.6738 0.3836 0.6211 0.6814 0.7139 0.6117 0.6738
GLCM-45 0.8525 0.3011 0.6211 0.6211 0.8058 0.8525 0.8525
GLCM-90 0.7492 0.3011 0.5452 0.4798 0.6738 0.6117 0.6472
GLCM-135 0.77 0.2053 0.4785 0.393 0.8525 0.6738 0.77
SC 0.7655 0.6738 0.6814 0.7655 0.7655 0.799 0.8058
4.4.2. ROC Curve Analysis
Receiver operating characteristic (ROC) curve analysis is a widely used validation method for evaluating the diagnostic ability of prediction algorithms [44]. The ROC curve is generated by plotting the true positive rate against the false positive rate at various classification thresholds. If the ROC curve of a prediction algorithm approaches the top left corner, the algorithm predicts the disease accurately; the closer the curve is to the diagonal line, the less accurate the prediction algorithm. Figure 7 depicts the ROC curve analysis for the proposed HSCRKM algorithm. ROC curves are generated for all the extracted datasets, namely GLCM_0, GLCM_45, GLCM_90, GLCM_135, and Shape_Colour. From Figure 6, we inferred that the shape and color-based feature dataset produces the highest accuracy values compared with the other datasets. Decision tree, random forest, and SVM attain similar prediction accuracy, so their curves have similar shapes. The neural network (NN) and logistic regression (LR) algorithms performed better than the other machine learning algorithms; their curves lie almost in the top left corner of the graph. The naive Bayes curve lies near the diagonal line, so this method probably attains the lowest accuracy among the ML algorithms.
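The ROC construction described above can be sketched in a few lines: sort the samples by predicted score, lower the threshold step by step, record a (false positive rate, true positive rate) point at each distinct score, and integrate with the trapezoidal rule to obtain the AUC. This is an illustrative sketch, not the tool used to draw Figure 7; scikit-learn's `sklearn.metrics.roc_curve` and `roc_auc_score` provide production implementations.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by sweeping the threshold over the scores."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative samples")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

A perfectly separating classifier produces the points (0, 0), (0, 1), (1, 1), that is, a curve through the top left corner with AUC = 1, whereas a classifier whose scores are unrelated to the labels stays near the diagonal with AUC close to 0.5.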
Figure 7. ROC curve analysis for HSCRKM clustering: (a) GLCM_0; (b) GLCM_45; (c) GLCM_90; (d) GLCM_135; (e) Shape.
5. Conclusions and Future Work
Clustering is an unsupervised classification method that is widely employed for image segmentation. In the present research, a hybrid histogram-based soft covering rough k-means clustering algorithm is proposed to segment the image of the leukemia nucleus. In this method, the histogram is used to initialize the number of clusters. The main advantage of this method is that it applies the soft covering rough approximation instead of the classical rough approximation; the soft covering rough set is a new kind of soft rough set that deals efficiently with uncertainty. The results are interpreted in two ways: (1) the efficiency of the proposed technique is compared with popular and frequently used clustering algorithms such as k-means clustering, FCM, and PSO-based clustering; and (2) state-of-the-art machine learning (ML) prediction techniques are compared using evaluation metrics.
The experimental results show that, with the HSCRKM clustering algorithm, all the ML methods except naive Bayes achieve above 80% prediction accuracy. Logistic regression and neural network reach the highest accuracy (up to 93%) and perform better than the other prediction methods. A limitation of this method is that for multicolor images such as satellite images, agricultural images, and photographs, the number of peaks in the histogram increases, and consequently the processing time also increases. The method is therefore best suited to segmenting medical images and extracting specific regions with high clarity for detailed study. In the future, bio-inspired algorithms could be used to optimize the number of clusters.
Author Contributions: Conceptualization, J.G., A.T.A., and H.I.H.; methodology, J.G., A.T.A.; software, J.G.; validation, J.G., A.T.A., and H.I.H.; formal analysis, A.T.A. and H.I.H.; investigation, H.I.H.; resources, H.I.H.; data curation, J.G.; writing—original draft preparation, J.G., A.T.A., and H.I.H.; writing—review and editing, A.T.A. and H.I.H.; visualization, J.G.; funding acquisition, A.T.A. All authors have read and agreed to the published version of the manuscript.
Funding: This research is funded by Prince Sultan University, Riyadh, Saudi Arabia.
Acknowledgments: The authors would like to thank Prince Sultan University, Riyadh, Saudi Arabia for supporting and funding this work. Special acknowledgment to Robotics and Internet-of-Things Lab (RIOTU) at Prince Sultan University, Riyadh, SA. In addition, the authors wish to acknowledge the editor and anonymous reviewers for their insightful comments, which have improved the quality of this publication.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Surveillance, Epidemiology, and End Results (SEER). Cancer Stat Facts: Leukemia. Available online: https://seer.cancer.gov/statfacts/html/leuks.html (accessed on 3 January 2020).
2. Arora, R.S.; Arora, B. Acute leukemia in children: A review of the current Indian data. South Asian J. Cancer 2016, 5, 155. [CrossRef] [PubMed]
3. National Centre for Disease Informatics and Research. Available online: http://ncdirindia.org/ (accessed on 3 January 2020).
4. Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. 1982, 11, 341–356. [CrossRef]
5. Zhu, W.; Wang, F. On three types of covering-based rough sets. IEEE Trans. Knowl. Data Eng. 2007, 19, 1131–1143. [CrossRef]
6. Zhu, W. Topological approaches to covering rough sets. Inf. Sci. 2007, 177, 1499–1508. [CrossRef]
7. Kumar, S.S.; Inbarani, H.H.; Azar, A.T.; Polat, K. Covering-based rough set classification system. Neural Comput. Appl. 2017, 28, 2879–2888. [CrossRef]
8. Molodtsov, D. Soft set theory—first results. Comput. Math. Appl. 1999, 37, 19–31. [CrossRef]
9. Maji, P.K.; Biswas, R.; Roy, A. Soft set theory. Comput. Math. Appl. 2003, 45, 555–562. [CrossRef]
10. Yüksel, Ş.; Güzel Ergül, Z.; Tozlu, N. Soft covering based rough sets and their application. Sci. World J. 2014. [CrossRef]
11. Jothi, G.; Hannah Inbarani, H. Leukemia Nucleus Image Segmentation Using Covering-Based Rough K-Means Clustering Algorithm. In Proceedings of the International Conference on Intelligent Computing Systems, Tamilnadu, India, 15–16 December 2017; pp. 373–385.
12. Zhang, G.P. Neural networks for classification: A survey. IEEE Trans. Syst. Man Cybern. Part C 2000, 30, 451–462. [CrossRef]
13. Mitchell, T.M. Machine Learning; McGraw Hill: Burr Ridge, IL, USA, 1997; Volume 45, pp. 870–877.
14. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [CrossRef]
15. Friedman, N.; Geiger, D.; Goldszmidt, M. Bayesian network classifiers. Mach. Learn. 1997, 29, 131–163. [CrossRef]
16. Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22.
17.
Patel, N.; Mishra, A. Automated Leukaemia Detection Using Microscopic Images. Procedia Comput. Sci.
2015
,
58, 635–642. [CrossRef]
18.
Srisukkham, W.; Zhang, L.; Neoh, S.C.; Todryk, S.; Lim, C.P. Intelligent leukaemia diagnosis with bare-bones
PSO based feature optimization. Appl. Soft Comput. 2017,56, 405–419. [CrossRef]
19.
Su, J.; Liu, S.; Song, J. A segmentation method based on HMRF for the aided diagnosis of acute myeloid
leukemia. Comput. Methods Programs Biomed. 2017,152, 115–123. [CrossRef] [PubMed]
20. Kaya, I.E.; Pehlivanlı, A.Ç.; Sekizkardeş, E.G.; Ibrikci, T. PCA based clustering for brain tumor segmentation of T1w MRI images. Comput. Methods Programs Biomed. 2017, 140, 19–28. [CrossRef]
21. Cabria, I.; Gondra, I. MRI segmentation fusion for brain tumor detection. Inf. Fusion 2017, 36, 1–9. [CrossRef]
22. Küçükkülahlı, E.; Erdoğmuş, P.; Polat, K. Histogram-based automatic segmentation of images. Neural Comput. Appl. 2016, 27, 1445–1450. [CrossRef]
23. Namburu, A.; kumar Samay, S.; Edara, S.R. Soft fuzzy rough set-based MR brain image segmentation. Appl. Soft Comput. 2017, 54, 456–466. [CrossRef]
24. Ali, M.; Khan, M.; Tung, N.T. Segmentation of Dental X-ray Images in Medical Imaging using Neutrosophic Orthogonal Matrices. Expert Syst. Appl. 2018, 91, 434–441. [CrossRef]
25. Rundo, L.; Militello, C.; Russo, G.; D’Urso, D.; Valastro, L.M.; Garufi, A.; Gilardi, M.C. Fully Automatic Multispectral MR Image Segmentation of Prostate Gland Based on the Fuzzy C-Means Clustering Algorithm. In Multidisciplinary Approaches to Neural Computing. Smart Innovation, Systems and Technologies; Esposito, A., Faundez-Zanuy, M., Eds.; Springer: Cham, Switzerland, 2017; Volume 69, pp. 23–37.
26. Zhang, P.; Lu, S.; Li, J.; Zhang, P.; Xie, L.; Xue, H.; Zhang, J. Multi-component segmentation of X-ray computed tomography (CT) image using multi-Otsu thresholding algorithm and scanning electron microscopy. Energy Explor. Exploit. 2017, 35, 281–294. [CrossRef]
27. Ganesh, M.; Naresh, M.; Arvind, C. MRI Brain Image Segmentation Using Enhanced Adaptive Fuzzy K-Means Algorithm. Intell. Autom. Soft Comput. 2017, 23, 325–330. [CrossRef]
28. Kaur, P. Intuitionistic fuzzy sets based credibilistic fuzzy C-means clustering for medical image segmentation. Int. J. Inf. Technol. 2017. [CrossRef]
29. Dhane, D.M.; Maity, M.; Mungle, T.; Bar, C.; Achar, A.; Kolekar, M.; Chakraborty, C. Fuzzy spectral clustering for automated delineation of chronic wound region using digital images. Comput. Biol. Med. 2017, 89, 551–560.
30. Ahmed, N.; Yigit, A.; Isik, Z.; Alpkocak, A. Identification of Leukemia Subtypes from Microscopic Images Using Convolutional Neural Network. Diagnostics 2019, 9, 104. [CrossRef]
31. Yüksel, Ş.; Tozlu, N.; Dizman, T.H. An application of multicriteria group decision making by soft covering based rough sets. Filomat 2015, 29, 209–219. [CrossRef]
32. Peters, G. Some refinements of rough k-means clustering. Pattern Recognit. 2006, 39, 1481–1491. [CrossRef]
33. Labati, R.D.; Piuri, V.; Scotti, F. ALL-IDB: The Acute Lymphoblastic Leukemia Image Database for Image Processing. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Brussels, Belgium, 11–14 September 2011.
34. Scotti, F. Robust Segmentation and Measurements Techniques of White Cells in Blood Microscope Images. In Proceedings of the IEEE Instrumentation and Measurement Technology Conference, Sorrento, Italy, 24–27 April 2006; pp. 43–48.
35. Scotti, F. Automatic morphological analysis for acute leukemia identification in peripheral blood microscope images. In Proceedings of the IEEE International Conference on Computational Intelligence for Measurement Systems and Applications, Giardini Naxos, Italy, 20–22 July 2005; pp. 96–101.
36. Piuri, V.; Scotti, F. Morphological classification of blood leucocytes by microscope images. In Proceedings of the IEEE International Conference on Computational Intelligence for Measurement Systems and Applications, Boston, MA, USA, 14–16 July 2004; pp. 103–108.
37. Jothi, G.; Inbarani, H.H. Hybrid Tolerance Rough Set–Firefly based supervised feature selection for MRI brain tumor image classification. Appl. Soft Comput. 2016, 46, 639–651.
38. Jothi, G.; Inbarani, H.H.; Azar, A.T. Hybrid Tolerance Rough Set: PSO Based Supervised Feature Selection for Digital Mammogram Images. Int. J. Fuzzy Syst. Appl. 2013, 3, 15–30. [CrossRef]
39. Jothi, G.; Inbarani, H.H. Soft set based feature selection approach for lung cancer images. arXiv 2012, arXiv:1212.5391.
40. Inbarani, H.H.; Azar, A.T.; Jothi, G. Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis. Comput. Methods Programs Biomed. 2014, 113, 175–185. [CrossRef] [PubMed]
41. Ganesan, J.; Inbarani, H.H.; Azar, A.T.; Polat, K. Tolerance rough set firefly-based quick reduct. Neural Comput. Appl. 2017, 28, 2995–3008. [CrossRef]
42. Landis, J.R.; Koch, G.G. The measurement of observer agreement for categorical data. Biometrics 1977, 33, 159–174. [CrossRef] [PubMed]
43. Viera, A.J.; Garrett, J.M. Understanding interobserver agreement: The kappa statistic. Fam. Med. 2005, 37, 360–363.
44. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [CrossRef]
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).