All content in this area was uploaded by Alagu .S on Jul 11, 2022
A Decision Support System for the Identification of
Acute Lymphoblastic Leukemia in Microscopic
Blood Smear Images
Cranaf R 1, Kavitha G 2 and Alagu S 3
1,2,3Department of Electronics Engineering, Madras Institute of Technology, Anna University, Chennai.
Abstract
Blood cancer is one of the most critical
diseases. Leukemia, in particular, is a common blood
cancer that causes leukocytes to be overproduced.
Various methodologies have been proposed for
detecting acute lymphoblastic leukemia (ALL) from
single-cell blood smear images. The goal of this
research is to develop a reliable strategy for
classifying leukemia. The input images are obtained
from the public database ALL-IDB2, which contains
198 images. Resizing and image augmentation are
carried out to obtain a larger number of images of
uniform size. To segment ALL nuclei, the proposed
work employs a selective kernel U-Net (SK U-Net)
convolutional neural network. The SK U-Net
outperformed the conventional U-Net, achieving a Dice
score of 0.916 compared with 0.320. ResNet and
CapsuleNet are then used to extract deep features
from the segmented images, and ResNet provides more
significant features than CapsuleNet. These features
are fed into the binary bat algorithm to remove
irrelevant features. ANOVA is used to statistically
analyse the consistency and robustness of the feature
sets; the obtained p value is 0.00001. The features
selected by the binary bat algorithm are given to
k-nearest neighbour (KNN) and support vector machine
(SVM) classifiers, and SVM gives the better accuracy,
98%.
Introduction
Medical image processing plays a vital role in
disease screening, and automated image evaluation is
common in health care. White blood cells come in a
variety of sizes and shapes in the human body.
Leukemia is a blood cancer that begins in the bone
marrow and results in an excessively large number
of white blood cells. Current laboratory tests for
diagnosing leukemia are time-consuming, complex,
repetitive and prone to human error.
Leukemia is divided into two categories, acute and
chronic, and is further subcategorised into four
types based on the rate of spread and the type of
white blood cell affected: acute lymphocytic leukemia
(ALL), acute myelogenous leukemia (AML),
chronic lymphocytic leukemia (CLL) and chronic
myelogenous leukemia (CML) [1].
According to a World Health Organization
analysis of cancer databases, leukemia incidence
varies significantly by region and subtype. As per
Globocan 2020, over 20,000 new cases of childhood
blood cancer are diagnosed every year in India, of
which nearly 15,000 are leukemia [2].
There were about 61,780 new cases of leukemia in the
U.S. in 2019, and around 9,900 new cases of leukemia
are detected in the U.K. every year. Globally, the
number of new leukemia cases increased from 345.5
thousand in 1990 to 518.5 thousand in 2018, while the
age-standardised incidence rate (ASIR) decreased by
0.43% per year [3].
Medical examiners usually identify leukemia from
microscopic blood smears.
In a healthy white blood cell, the nucleus lies at
the centre and is surrounded by cytoplasm. In recent
studies, various methods have been utilised to detect
ALL early from microscopic images [4]. Colour
space conversion and k-means clustering have been
used to segment WBC nuclei, together with CNN-based
classification [5].
Many recent research works address the detection
of ALL; some of them are discussed here.
Medical experts prefer cluster of differentiation
(CD) markers together with morphological features for
leukemia cell classification [6]. To capture the
differences between healthy and blast cells, many
geometrical and moment-based features are also
examined. To extract more relevant features, a
combination of maximal information coefficient and
ridge feature selection methods was used in
conjunction with CNN models [7]. Statistical analysis,
such as ANOVA tests, is essential for choosing features
with confidence [8].
A pre-trained AlexNet model can help classify
different forms of leukemia. For leukemia
identification, machine learning and simple CNN
models such as CaffeNet and VGG-f have also been
considered [9].
Wang et al. employed optimised deep features from a
CNN model and a graph convolutional network (GCN)
with deep feature fusion (DFF) to improve
classification accuracy [10]. The purpose of this
study is to identify acute lymphoblastic leukemia
from microscopic blood smear images. To obtain a
larger number of images of uniform size, resizing
and image augmentation are performed. The
proposed technique uses the SK U-Net
convolutional neural network to segment ALL
nuclei. The SK U-Net achieved a Dice score of 0.916,
outperforming the conventional U-Net (Dice score of
0.320).
Deep features are extracted from the
segmented images using ResNet and CapsuleNet, and
ResNet extracts more significant features than
CapsuleNet. The extracted features are fed into the
binary bat algorithm to remove irrelevant features.
Finally, ANOVA is used to statistically analyse the
consistency and robustness of the feature sets. Using
100 percent of the deep features is highly
significant, with p = 0.00001.
Material and methods
2.1 Dataset
There are 198 microscopic images of blood
smears in the ALL-IDB2 database: 99 of healthy
cells and 99 of unhealthy cells. All of the images
were captured with a laboratory optical microscope
and a Canon PowerShot G5 camera. The images have
a resolution of 2592 × 1944 pixels and a colour depth
of 24 bits [11].
2.2. Preprocessing
The proposed computer vision pipeline starts with
resizing: the collection contains images of different
sizes, whereas further processing requires images of
a fixed size. Using bicubic interpolation, all of the
acute lymphoblastic leukemia images in the database
were resized to 224 × 224 pixels and then processed
with a 3 × 3 median filter. The manually annotated
ROI masks were resized to 224 × 224 using
nearest-neighbour interpolation. To enlarge the
dataset, the resized images are then fed to an
augmentation unit.
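As a rough illustration of the last two preprocessing steps, the NumPy sketch below resizes a label mask with nearest-neighbour sampling (so no new label values are created) and applies a 3 × 3 median filter with replicated borders. It is not the authors' code: bicubic resizing of the colour images would normally be delegated to an image library, and the function names `resize_nearest` and `median_filter_3x3` are hypothetical. For an RGB image, the median filter would be applied to each channel separately.

```python
import numpy as np

def resize_nearest(mask, out_h=224, out_w=224):
    """Nearest-neighbour resize of a label mask (keeps the original label values)."""
    in_h, in_w = mask.shape[:2]
    # Map each output pixel centre back to the nearest input pixel index.
    rows = np.minimum((np.arange(out_h) + 0.5) * in_h / out_h, in_h - 1).astype(int)
    cols = np.minimum((np.arange(out_w) + 0.5) * in_w / out_w, in_w - 1).astype(int)
    return mask[rows[:, None], cols[None, :]]

def median_filter_3x3(channel):
    """3 x 3 median filter on a 2-D array, with edge pixels replicated."""
    padded = np.pad(channel, 1, mode="edge")
    h, w = channel.shape
    # Stack the 9 shifted views of the image and take the per-pixel median.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    return np.median(windows, axis=0).astype(channel.dtype)
```

Nearest-neighbour sampling is the usual choice for masks because interpolating between 0 and 1 would create fractional labels at the nucleus boundary.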
Figure 1: Block diagram of the proposed work
2.3. Selective Kernel U-Net based segmentation
The SK U-Net architecture is based on the
U-Net architecture, but it uses SK blocks in place of
the standard convolution blocks. Figure 2 depicts the
SK U-Net architecture and the SK block in general.
The purpose of each SK block is to adapt the
network's receptive field and to mix feature maps
obtained by different convolutions, so that the
nuclei of ALL cells can be distinguished successfully.
Each SK block has two branches: the first employs
3 × 3 convolutions with a dilation rate of 2, whereas
the second uses 3 × 3 convolutions without dilation.
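The split, fuse and select behaviour of an SK block can be sketched as a NumPy forward pass. The two branches follow the description above (3 × 3 with dilation 2, and 3 × 3 without dilation); the fuse and select steps follow the general selective kernel design (sum the branches, global average pooling, a small fully connected layer, then a softmax across the two branches for every channel). To keep the sketch short it assumes per-channel (depthwise) convolutions, omits batch normalization, and uses hypothetical weight arrays `w_dil`, `w_plain`, `W_fc`, `A` and `B` in place of trained parameters.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def depthwise_conv3x3(x, w, dilation=1):
    """Per-channel 3x3 cross-correlation with 'same' zero padding.
    x: (C, H, W) feature map, w: (C, 3, 3) kernels."""
    _, h, wd = x.shape
    p = dilation
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x, dtype=float)
    for ky in range(3):
        for kx in range(3):
            dy, dx = ky * dilation, kx * dilation
            out += w[:, ky, kx, None, None] * xp[:, dy:dy + h, dx:dx + wd]
    return out

def sk_block(x, w_dil, w_plain, W_fc, A, B):
    """Selective kernel block. W_fc: (d, C); A, B: (C, d)."""
    u1 = relu(depthwise_conv3x3(x, w_dil, dilation=2))    # branch 1: 3x3, dilation 2
    u2 = relu(depthwise_conv3x3(x, w_plain, dilation=1))  # branch 2: 3x3, no dilation
    s = (u1 + u2).mean(axis=(1, 2))       # fuse: global average pooling -> (C,)
    z = relu(W_fc @ s)                    # compact descriptor -> (d,)
    logits = np.stack([A @ z, B @ z])     # (2, C): one score per branch and channel
    att = np.exp(logits - logits.max(axis=0))
    att /= att.sum(axis=0)                # select: softmax across the two branches
    return att[0][:, None, None] * u1 + att[1][:, None, None] * u2
```

Because the softmax weights depend on the input, the block can lean towards the larger receptive field (the dilated branch) for some inputs and towards the smaller one for others.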
A conventional U-Net is used for comparison. The
U-Net architecture consists of a contracting path and
an expansive path, which give it its U-shaped form.
The contracting path is a CNN that repeatedly
applies convolutions, each followed by a rectified
linear unit (ReLU), together with max-pooling
operations.
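A single level of the contracting path can be illustrated for a one-channel input: two 3 × 3 convolutions without padding, each followed by ReLU, then 2 × 2 max pooling with stride 2, as in the original U-Net design. This is a minimal NumPy sketch with hypothetical function names and arbitrary kernels; a real U-Net uses many learned channels per level.

```python
import numpy as np

def conv3x3_valid(x, k):
    """Single-channel 3x3 cross-correlation without padding ('valid')."""
    h, w = x.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * x[dy:dy + h - 2, dx:dx + w - 2]
    return out

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 (an odd trailing row/column is dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def contracting_step(x, k1, k2):
    """Two conv + ReLU layers followed by max pooling: one U-Net level."""
    x = np.maximum(conv3x3_valid(x, k1), 0.0)
    x = np.maximum(conv3x3_valid(x, k2), 0.0)
    return max_pool_2x2(x)
```

Starting from a 572 × 572 input, as in the original U-Net paper, one level yields 568 × 568 maps before pooling and 284 × 284 after it; this repeated shrinkage forms the left arm of the U.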
2.4. Feature Extraction
Feature extraction describes the most relevant
information in an image. The nucleus is first
segmented using U-Net and SK U-Net, and features are
then extracted from the segmented nucleus using two
CNN models, ResNet and CapsuleNet.
A deep CNN can both extract features from its input
and classify the data [12]. In the proposed work,
ResNet and CapsuleNet are used to extract features.
A residual neural network (ResNet) is a type of
artificial neural network (ANN) that uses skip
connections to bypass some layers.
Typical ResNet models are implemented
with double- or triple-layer skips that contain
nonlinearities (ReLU) and batch normalization in
between. Models with several parallel skips are
referred to as DenseNets. In the context of residual
neural networks, a non-residual network may be
described as a plain network.
ResNet can extract robust features. Its architecture
is inspired by VGG-19 and consists of a 34-layer
plain network to which shortcut (skip) connections
are added [12].
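The identity shortcut can be shown with a simplified, fully connected stand-in for a two-layer ResNet block: the residual branch F(x) applies a weight layer, batch normalization and ReLU, then a second weight layer and batch normalization, and the input is added back before the final ReLU. The weight matrices `W1` and `W2` are hypothetical, the batch normalization here has no learned scale or shift, and real ResNet blocks use convolutions rather than matrix products.

```python
import numpy as np

def batch_norm(h, eps=1e-5):
    """Normalize each feature over the batch (no learned scale or shift)."""
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)

def residual_block(x, W1, W2):
    """Two-layer skip: y = ReLU(F(x) + x), with F(x) = BN(ReLU(BN(x W1)) W2).
    x: (batch, d); W1, W2: (d, d) so that F(x) and x have matching shapes."""
    f = np.maximum(batch_norm(x @ W1), 0.0)
    f = batch_norm(f @ W2)
    return np.maximum(f + x, 0.0)   # identity shortcut, then the final ReLU
```

If the residual branch outputs zero, the block simply passes ReLU(x) through; this fallback is the intuition behind adding shortcuts to a deep plain network.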
CapsuleNet is also used to extract features. Each
capsule is a group of neurons whose output vector
encodes both whether an entity, such as a face, is
present and its properties, such as pose. The capsule
network is built from several layers of capsule
nodes; the encoder of CapsuleNet (CapsNet) consists
of three layers.
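Two operations distinguish capsule layers from ordinary neuron layers: the squash non-linearity, which keeps a capsule vector's direction but maps its length into [0, 1) so that the length can act as a presence probability, and routing-by-agreement between capsule layers. The NumPy sketch below follows the commonly published CapsNet formulation with three routing iterations; the prediction vectors `u_hat` are taken as given (in a full network they come from learned transformation matrices), and the function names are illustrative.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink vector length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iter=3):
    """Routing-by-agreement.
    u_hat: (n_in, n_out, dim) predictions of each input capsule for each output.
    Returns output capsule vectors of shape (n_out, dim)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                      # routing logits
    for _ in range(n_iter):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)            # coupling coefficients
        s = np.einsum("ij,ijd->jd", c, u_hat)        # weighted sum of predictions
        v = squash(s)                                # output capsules
        b = b + np.einsum("ijd,jd->ij", u_hat, v)    # reward agreeing predictions
    return v
```

Input capsules whose predictions agree with an output capsule's vector send more of their signal to it on the next iteration, so consistent evidence produces a longer (more confident) output vector.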
2.5. Feature Selection
Feature selection is applied to the extracted deep
features, and the binary bat algorithm is used to
choose the most important ones. Feature selection not
only eliminates unimportant features but also
significantly reduces the cost of computation.
The bat algorithm is a meta-heuristic based on
micro bat echolocation. Its fundamental assumption is
that a micro bat can distinguish between an obstacle
and its prey and modifies its behaviour only when it
is near prey rather than an obstruction. The
algorithm mimics the behaviour of a colony of micro
bats pursuing their prey [13].
It is assumed that the bats fly in an
n-dimensional search space, where n is the number of
optimization variables of the problem (for feature
selection, one binary variable per candidate feature).
The current position of the ith bat, flying with
velocity Vi, is denoted Xi. The emitted waves are
assumed to lie within the frequency range
(fmin, fmax), with an initial loudness A0 and a pulse
emission rate r. The velocity and position of the
virtual micro bat at iteration t + 1 are computed from
its frequency, position and velocity using Equations
(1), (2) and (3).
$$f_i = f_{\min} + (f_{\max} - f_{\min})\,\beta \qquad (1)$$
$$V_i^{t+1} = V_i^{t} + \left(X_i^{t} - X_{\text{best}}\right) f_i \qquad (2)$$
$$X_i^{t+1} = X_i^{t} + V_i^{t+1} \qquad (3)$$
Here, β is a random number drawn uniformly from
[0, 1], and Xbest is the current global best solution
found by the algorithm.
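Equations (1) to (3) update continuous velocities, whereas feature selection needs a binary mask over the features. The sketch below shows one common way to build a binary bat algorithm on top of these equations; it is an assumption, since the text does not specify how velocities are mapped to bits. Here a sigmoid transfer function gives the probability that each bit is 1, a local walk flips one bit of the global best solution when a random draw exceeds the pulse rate, and loudness and pulse rate are updated as in the standard bat algorithm. The fitness function, the parameter defaults and the name `binary_bat_select` are hypothetical; in a pipeline like the proposed one, the fitness would typically reward the classification accuracy obtained with the selected features.

```python
import numpy as np

def binary_bat_select(fitness, n_features, n_bats=10, n_iter=30,
                      f_min=0.0, f_max=2.0, A0=1.0, r0=0.5,
                      alpha=0.9, gamma=0.9, v_max=6.0, seed=0):
    """Minimal binary bat algorithm for feature selection (maximizes fitness).
    fitness(mask) -> float, where mask is a boolean vector over the features.
    All default parameter values are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_bats, n_features)) < 0.5          # binary positions
    V = np.zeros((n_bats, n_features))                  # velocities
    A = np.full(n_bats, A0)                             # loudness
    r = np.zeros(n_bats)                                # pulse emission rate
    fit = np.array([fitness(x) for x in X], dtype=float)
    k = int(np.argmax(fit))
    X_best, fit_best = X[k].copy(), fit[k]

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            f_i = f_min + (f_max - f_min) * rng.random()                    # Eq. (1)
            V[i] = V[i] + (X[i].astype(float) - X_best.astype(float)) * f_i  # Eq. (2)
            V[i] = np.clip(V[i], -v_max, v_max)
            # Binary form of Eq. (3): a sigmoid transfer function turns each
            # velocity component into the probability that the bit is 1.
            cand = rng.random(n_features) < 1.0 / (1.0 + np.exp(-V[i]))
            if rng.random() > r[i]:
                # Local walk around the global best: flip one random bit.
                cand = X_best.copy()
                j = rng.integers(n_features)
                cand[j] = not cand[j]
            cand_fit = fitness(cand)
            # Accept if not worse than this bat's solution and loud enough
            # (a common variant; the original compares against the global best).
            if cand_fit >= fit[i] and rng.random() < A[i]:
                X[i], fit[i] = cand, cand_fit
                A[i] *= alpha                           # bat gets quieter
                r[i] = r0 * (1.0 - np.exp(-gamma * t))  # and emits pulses more often
                if cand_fit > fit_best:
                    X_best, fit_best = cand.copy(), cand_fit
    return X_best, fit_best
```

A toy fitness such as `lambda m: m[:3].sum() - 0.5 * m[3:].sum()` rewards selecting the first three features and penalizes the rest, which is enough to watch the search move towards a small useful subset.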
2.6. Classification
Classification is performed with the K-Nearest
Neighbour (KNN) and Support Vector Machine (SVM)
algorithms.
The KNN algorithm is a supervised learning
technique and one of the most basic machine learning
algorithms. It assumes that similar cases lie close
to each other in feature space and places a new case
in the category that is most common among its k most
similar existing cases.
The KNN method stores all available data
and classifies a new data point based on its similarity
to the existing data, so new data can be sorted into a
well-defined category. KNN can be used for both
regression and classification, but it is more
commonly utilised for classification tasks.
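A minimal NumPy version of the KNN rule described above: compute Euclidean distances from each new point to all stored training points, take the k closest, and vote. Euclidean distance, k = 3 and the tie-breaking rule are assumptions (the text does not state the distance or k), and labels must be non-negative integers, for example 0 for healthy and 1 for blast cells.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test row by majority vote of its k nearest training rows."""
    # Pairwise Euclidean distances, shape (n_test, n_train).
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]    # indices of the k closest points
    votes = y_train[nearest]                  # their labels, shape (n_test, k)
    # Majority vote; ties go to the smallest label (np.bincount + argmax).
    return np.array([np.bincount(v).argmax() for v in votes])
```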
SVM is a supervised machine learning
technique that can be used for both classification
and regression, although it is best suited to
classification. The goal of the SVM algorithm is to
find a hyperplane in an N-dimensional space (N being
the number of features) that separates the classes
with the largest possible margin. The dimension of
the hyperplane therefore depends on the number of
features.
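The margin-maximizing idea can be sketched as a linear soft-margin SVM trained by full-batch sub-gradient descent on the regularized hinge loss. The text does not say which kernel or solver was used, so the linear kernel, the optimizer and the hyperparameters `lam`, `lr` and `epochs` are all assumptions; labels are encoded as -1 and +1. In practice a library solver would be used instead of this loop.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM by full-batch sub-gradient descent.
    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))), y in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1               # points inside the margin or misclassified
        grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def svm_predict(X, w, b):
    """Side of the hyperplane w.x + b = 0 on which each row falls."""
    return np.where(X @ w + b >= 0, 1, -1)
```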
Results
The proposed research aims to classify acute
lymphoblastic leukemia. Figure 3 shows examples
of the two cell types in the dataset: healthy cells
and blast cells.
Figure 3: Typical microscopic images (a) Healthy
cells and (b) blast cells
As shown in Figure 1, the images from the
database are first passed through the preprocessing
unit, where they are resized to 224 × 224 pixels so
that SK U-Net can segment the ALL cells. After
resizing, the images are fed into the augmentation
unit, which applies rotation by 45 degrees, width
shift, height shift, zooming and horizontal shift.
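The geometric augmentations listed above can be sketched with a single inverse-mapped affine warp: for every output pixel, find the source pixel after undoing the shift, rotation and zoom, and copy it with nearest-neighbour sampling. The shift and zoom amounts are not given in the text, so they are plain parameters here; zero filling outside the image and the name `affine_augment` are assumptions. In a training pipeline the parameters would be drawn at random for each image.

```python
import numpy as np

def affine_augment(img, angle_deg=0.0, zoom=1.0, shift_x=0, shift_y=0):
    """Rotate about the centre, zoom and shift an image (nearest neighbour,
    zero fill). Positive shift_x moves content right, positive shift_y down."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    # Inverse mapping: undo the shift, then the rotation and zoom.
    y0, x0 = yy - cy - shift_y, xx - cx - shift_x
    t = np.deg2rad(angle_deg)
    src_x = (np.cos(t) * x0 + np.sin(t) * y0) / zoom + cx
    src_y = (-np.sin(t) * x0 + np.cos(t) * y0) / zoom + cy
    sx, sy = np.rint(src_x).astype(int), np.rint(src_y).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]   # pixels mapped from outside stay 0
    return out
```

For example, `affine_augment(img, angle_deg=45)` produces the 45 degree rotation mentioned above; the corners of the output fall outside the source image and are filled with zeros.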
Table 1 compares the standard U-Net and the
SK U-Net in terms of segmentation performance. In
terms of overall performance, the SK U-Net
outperformed the U-Net.
For healthy cells, the SK U-Net had a
median Dice score of 0.914, whereas for blast cells,
it had a median Dice score of 0.898. Figures 4 and 5
show several segmentation results for healthy cells
and blast cells with Dice coefficients around the
median.
Figure 4: Segmentation of healthy cell nuclei by
SK U-Net
Figure 5: Segmentation of unhealthy cell nuclei
by SK U-Net
Figure 5 displays the SK U-Net segmentation
results, with the resized input images in the first
column. The second and third columns show,
respectively, the ground-truth mask from Kaggle and
the nucleus mask predicted by SK U-Net. The
segmented images in the final column are used as
input to the various CNN configurations. The SK U-Net
results appear visually better than the scaled images.
Table 1: Dice coefficient and accuracy of SK U-Net
and U-Net
The predicted mask is compared with the ground-truth
mask using the Dice coefficient and accuracy. The
variation in both metrics is plotted in Figure 6.
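For binary masks, the Dice coefficient is 2|P ∩ T| / (|P| + |T|), where P is the predicted nucleus mask and T the ground-truth mask, and pixel accuracy is the fraction of pixels labelled correctly. A minimal NumPy sketch of both follows; the small `eps` term, added so that two empty masks score 1 rather than causing a division by zero, is an assumption.

```python
import numpy as np

def dice_coefficient(pred, truth, eps=1e-7):
    """Dice = 2 * |P and T| / (|P| + |T|) for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return (2.0 * inter + eps) / (pred.sum() + truth.sum() + eps)

def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return float(np.mean(pred.astype(bool) == truth.astype(bool)))
```

Accuracy is dominated by the many background pixels, which is why Dice, which ignores true negatives, is the more informative of the two for small nuclei.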
Figure 6: Performance comparison of SK U-Net and
U-Net.
(Figure 6 data, normalized values: Dice coefficient
0.916 for SK U-Net and 0.323 for U-Net; accuracy
0.967 for SK U-Net and 0.76 for U-Net.)
The proposed method outperforms the conventional
U-Net, with an accuracy of 0.97.
The feature extraction block receives the
segmented images from SK U-Net, and features are
extracted from them using CNN models such as ResNet
and CapsuleNet. A deep convolutional neural network
(CNN) is a multi-layered artificial neural network
designed to extract features from its input and to
classify high-dimensional data.
A deep CNN is made up of several
convolutional and max-pooling layers, as well as
fully connected output layers. Each convolutional
layer in the network consists of a number of feature
maps. Neurons within one feature map share the same
weights, which reduces the number of parameters and
maintains shift invariance. The model is trained
using backpropagation.
ResNet provided more than 4000 features. To find
the desired features, different percentages of these
features were evaluated using the p value of an
ANOVA test.
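The per-feature ANOVA screening can be sketched as follows. For each feature, the one-way F statistic compares the variance between the healthy and blast groups with the variance within them; since every feature has the same degrees of freedom, a larger F corresponds to a smaller p value, so ranking by F is equivalent to ranking by p. The helper `top_percent`, which keeps a given percentage of the highest-scoring features, is a hypothetical illustration of checking different feature percentages; computing the p values themselves needs the F distribution, which is available through `scipy.stats.f_oneway` or scikit-learn's `f_classif`.

```python
import numpy as np

def anova_f(X, y):
    """One-way ANOVA F statistic for every column of X, grouped by labels y."""
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand = X.mean(axis=0)
    ss_between = sum((y == c).sum() * (X[y == c].mean(axis=0) - grand) ** 2
                     for c in classes)
    ss_within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                    for c in classes)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def top_percent(X, y, percent):
    """Indices of the top `percent` % of features ranked by F statistic."""
    f = anova_f(X, y)
    n_keep = max(1, int(round(X.shape[1] * percent / 100.0)))
    return np.argsort(f)[::-1][:n_keep]
```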
Table 2: Significant results of ANOVA test (ResNet)
The p-values of the proposed method for
different percentages of the deep feature set show a
significant difference. Among them, using 100% of the
features is highly significant, with p = 0.00001, and
is therefore more suitable for differentiating blast
cells from normal cells.
CapsuleNet is used for comparison. Its p-values for
different percentages of the deep feature set also
change significantly; 50 percent of the features are
highly significant, with p = 0.00001, making them
better for distinguishing blast cells from normal
cells.
Table 3: Significant results of ANOVA test (CapsuleNet)
According to Tables 2 and 3, ResNet provides more
significant features than CapsuleNet. To eliminate
irrelevant features, the features extracted by ResNet
are passed into the binary bat algorithm. Table 4
displays the outcome of the binary bat algorithm.
Table 4: Significant results of ANOVA test (BAT)
The p-values of the proposed method for
different percentages of the feature set selected by
the binary bat algorithm show a significant
difference. All of the features are highly
significant, with p = 0.00001, which makes them more
suitable for differentiating blast cells from normal
cells.
The features selected by the binary bat algorithm
are fed into two classification models, KNN and
SVM. Figure 7 depicts the confusion matrix of the
KNN classifier.
Figure 7: Confusion matrix of KNN
KNN obtains an accuracy of 0.58. The goal of the
SVM algorithm is to find the best decision boundary
that separates the n-dimensional feature space into
classes, so that new data points can be placed in
the correct category. This best decision boundary is
called a hyperplane. Figure 8 depicts the confusion
matrix of the SVM classifier.
Figure 8: SVM confusion matrix.
With SVM, the classification accuracy is 0.98 for
healthy cells and 0.98 for unhealthy cells.
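Per-class accuracy of this kind can be read off a confusion matrix: for each true class, divide the correctly classified count (the diagonal entry) by the total number of samples of that class, which is the recall of that class. The sketch below assumes label 0 for healthy and 1 for unhealthy cells, rows as true classes and columns as predictions; the function names are hypothetical and the test data are made up, not the paper's results.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # count each (true, predicted) pair
    return cm

def per_class_accuracy(cm):
    """Correct predictions of each class divided by that class's total."""
    return np.diag(cm) / cm.sum(axis=1)

def overall_accuracy(cm):
    return np.trace(cm) / cm.sum()
```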
Conclusion
A novel approach for leukemia detection from
single-cell blood smear images is introduced. Input
images are obtained from a public database. The
images are first pre-processed and then segmented.
For proper segmentation of the nucleus, a selective
kernel U-Net (SK U-Net) segmentation model is used,
and the segmentation is compared with the U-Net
algorithm. Based on the Dice metrics of both models,
the SK U-Net gives better results than the
conventional U-Net. Deep features are extracted from
the segmented output using ResNet and CapsuleNet;
ResNet provides more significant features than
CapsNet. These extracted features are given to the
binary bat algorithm (BBA), which selects the optimal
features. KNN and SVM models are then used to
classify leukemia, and the SVM classifier performs
best, with an accuracy of 98%. In future work, a
hybrid optimization algorithm will be developed for
training the deep CNN, which is expected to further
enhance the classification results.
References
[1] Kadry, S., et al. 2021. Automated segmentation of leukocyte from hematological images: a study using various CNN schemes. Springer. doi:10.1007/s11227-021-04125-4.
[2] Cancer Research UK. 2020. http://www.cancerresearchuk.org/.
[3] Dong, Y., et al. 2020. Leukemia incidence trends at the global, regional and national level between 1990 and 2018. Experimental Hematology and Oncology.
[4] Byra, M., P. Jarosik, et al. 2020. Breast mass segmentation in ultrasound with selective kernel U-Net convolutional neural network. Biomedical Signal Processing and Control 61:102027. Elsevier. doi:10.1016/j.bspc.2020.102027.
[5] Das, P. K., et al. 2021. An efficient deep convolutional neural network based detection and classification of acute lymphoblastic leukemia. Expert Systems with Applications 183. Elsevier. doi:10.1016/j.eswa.2021.115311.
[6] Mishra, S., B. Majhi, and P. K. Sa. 2018. GLRLM-based feature extraction for acute lymphoblastic leukemia (ALL) detection. In Recent Findings in Intelligent Computing Techniques, Advances in Intelligent Systems and Computing 708. Springer, Singapore. doi:10.1007/978-981-10-8636-6_41.
[7] Toğaçar, M., B. Ergen, and C. Zafer. 2020. Classification of white blood cells using deep features obtained from Convolutional Neural Network models based on the combination of feature selection methods. Applied Soft Computing 97:1–10. doi:10.1016/j.asoc.2020.106810.
[8] Hassan Al-Yassin, I., A. Jaafar Mousa, M. Fadhel, O. Al-Shamma, and L. Alzubaidi. 2020. Statistical accuracy analysis of different detecting algorithms for surveillance system in smart city. Indonesian Journal of Electrical Engineering and Computer Science 18 (2):979–86. doi:10.11591/ijeecs.v18.i2.pp979-986.
[9] Wang, S. H., V. V. Govindaraj, J. M. Górriz, X. Zhang, and Y. D. Zhang. 2021. Covid-19 classification by FGCNet with deep feature fusion from graph convolutional network and convolutional neural network. Information Fusion 67:208–29. doi:10.1016/j.inffus.2020.10.004.
[10] Donida Labati, R., V. Piuri, and F. Scotti. 2011. ALL-IDB: The acute lymphoblastic leukemia image database for image processing. IEEE International Conference on Image Processing, Brussels, Belgium, 2045–48. doi:10.1109/ICIP.2011.6115881.
[11] Kumar, D., N. Jain, A. Khurana, S. Mittal, S. C. Satapathy, D. Roman Senkerik, and J. Hemanth. 2020. Automatic detection of white blood cancer from bone marrow microscopic images using convolutional neural networks. IEEE Access 8:142521–31. doi:10.1109/ACCESS.2020.3012292.
[12] Alzubaidi, L., et al. 2021. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data. doi:10.1186/s40537-021-00444-8.
[13] Sharma, P., and K. Sharma. A novel quantum-inspired binary bat algorithm for leukocytes classification in blood smear. Expert Systems (Wiley). doi:10.1111/exsy.12813.