All content in this area was uploaded by Esraa Abd Elraouf on Sep 22, 2023
A Deep Learning-Based Classification Framework for
Annotated Histopathology Lung Cancer Images
Esraa A.-R. Hamed1, Mohammed A.-M. Salem2, Nagwa L. Badr1, and Mohamed F. Tolba1
1Faculty of Computer and Information Sciences, Ain Shams University Cairo, Egypt
2Media Engineering and Technology, GUC, Cairo, Egypt
1{esraa.raoof,nagwabadr,fahmytolba}@cis.asu.edu.eg
2mohammed.salem@guc.edu.eg
Abstract. Cancer is the second leading cause of death globally, with one in six
people dying from it. It occurs when abnormal cells divide uncontrollably and
spread to other organs in the body. Lung cancer is one of the most common and
deadliest types of cancer. Several methods, such as X-rays, CT scans, PET-CT
scans, bronchoscopies, and biopsies, can be used to diagnose lung cancer. Studies
have shown that the type of histology in lung cancer is linked to the diagnosis
and treatment course, making early and accurate detection of lung cancer histol-
ogy crucial for improving survival rates. Artificial intelligence (AI) can aid in the
automation of cancer detection, allowing for the evaluation of more cases in less
time and at a lower cost. The main objective of this research is to evaluate the
effectiveness of a newly proposed CNN model in distinguishing between benign
and malignant lung cancer images obtained from digital pathology. To conduct
the experiment, the LC25000 dataset, containing 5000 images for each category,
was utilized, resulting in a total of 10,000 images. The findings of the proposed
CNN model were then compared to those of existing deep learning models,
demonstrating its ability to accurately identify cancerous tissues with a maximum
accuracy of 99.9% to 100%, while also reducing processing time. These out-
comes can play a crucial role in the development of a precise and automated sys-
tem for identifying various types of lung cancer.
Keywords: deep learning, Convolutional Neural Networks, lung cancer classi-
fication, histopathological image analysis, Squamous Cell Carcinomas.
1 Introduction
Cancer is a leading cause of mortality worldwide, according to the World Health
Organization (WHO). Lung cancer is the second most frequently diagnosed cancer (11.4%
of all cases) and the leading cause of cancer mortality (18.0% of all cancer deaths) [1].
Globally, the incidence of malignant tumors has been rising, which may be connected to
population growth. Malignancy can affect any age group depending on the histological
type; however, it is typically found in people between the ages of 50 and 60 [2].
The respiratory system, which includes the lungs, is important for distributing oxygen
throughout the body. In the lung, abnormal cell proliferation can develop and lead to
cancer or pulmonary carcinoma. Poor environmental conditions and an unhealthy
lifestyle are contributing factors [3]. Symptoms of lung cancer typically become
apparent only after the tumor is sufficiently large or has spread to other areas. The
success rate of therapy increases with earlier diagnosis [4], whereas the likelihood of
recovery decreases once the cancer has spread to other organs. Non-small cell lung
cancer (NSCLC) accounts for 81–85% of lung cancer cases. Squamous cell carcinoma,
adenocarcinoma, and large cell carcinoma are the major NSCLC subtypes; they arise
from different types of lung cells [5].
The clinicopathological and genetic features that distinguish squamous cell carcinomas
(SCC) of the lung have changed significantly over time. Formerly the most common
subtype of non-small-cell lung malignancies, these neoplasms were thought to be central
tumors with great molecular complexity and no targetable genetic alterations [6]. SCC
often starts in the cells lining the bronchi. Over time it can spread by infiltrating
neighboring lymph nodes and organs and by metastasizing (moving to other regions of
the body through the blood). It is closely associated with a history of smoking and
carries a considerable risk of death. Other risk factors for SCC include age, family
history, and exposure to secondhand smoke [7]. Because the histological type, molecular
profile, and stage of the disease all affect treatment, identifying lung cancer histology
is urgently needed, and analyzing histopathological images of the disease is crucial.
Manual analysis of histopathology slides, however, is time-consuming and subjective [8].
Artificial intelligence (AI) is the simulation of human intelligence in computer
software, enabling interaction with machines that resembles human communication. AI,
which is used in various computer vision domains, has recently emerged as one of the
most significant fields of the twenty-first century. In several disciplines, machine
learning [18,19,20] and deep learning [21,22] achieve the highest levels of accuracy.
Deep learning techniques, especially Convolutional Neural Networks (CNNs), are
increasingly used in the healthcare industry and have a significant influence on all
parts of primary care. In medical imaging, CNNs may be used to identify and classify
disease at an early stage, allowing for timely treatment and easier recovery for the
patient [7].
This paper evaluates a proposed Convolutional Neural Network (CNN) architecture for
the classification of lung cancer. The paper is structured as follows. Section 2
describes previous research on the identification and categorization of lung cancer.
Section 3 briefly introduces the dataset used. Section 4 details the proposed deep
learning approach and its CNN architecture. Section 5 presents the experimental
results. Section 6 concludes the paper and makes recommendations for further research.
2 Related Work
Of all machine learning techniques, deep neural networks have shown improved out-
comes in the identification of medical images. To increase the precision of detection
and classification, several CNN algorithms are applied to the classification of lung can-
cer images.
Mumtaz et al. [9] proposed a multi-input capsule network to build a diagnostic model
for abnormal-cell cancers of the lung and colon. The capsule network employed two
blocks: a convolutional layer block (CLB) and a separable CLB. Pathological images are
used as input by the CLB, whereas histopathological images are used by the separable
CLB. Based on histopathological scans, the model achieved 99.58% accuracy in detecting
anomalies in the colon and lungs.
Gessert et al. [10] classified microscopic images of colon cancer using transfer
learning-based CNN models, including Inception, VGG, and DenseNet. The DenseNet model
performed best, with a classification accuracy of 91.2%. Kwabena et al. [11] proposed
DHS-CapsNet to assess histological images of lung and colon cancer. The network
combines encoder features with DHSCaps. The encoder features consist of convolutional
layer features that carry rich information, and HSquash extracts information from
several backgrounds. The model reached 99.23% accuracy, outperforming the standard
CapsNet (85.55%).
Vuong et al. [12] proposed a multi-task learning strategy to evaluate digital pathology
images. They used a pathology image dataset divided into four classes, trained a
DenseNet-121 model on it, and set the model input to 800x800 pixels. They reported a
classification accuracy of 85.91%. Sanidhya et al. [13] proposed a pre-trained CNN
diagnostic network for lung and colon cancer. Histological slides were analyzed using a
shallow CNN architecture. The network obtained 96% and 97% accuracy for the diagnosis
of colon and lung malignancies, respectively.
Mesut et al. [14] trained the DarkNet-19 model from scratch on the lung and colon
cancer dataset. The Equilibrium method was used to select the inefficient features,
which were then separated from the efficient ones. The efficient features were passed
to an SVM for classification, giving an overall accuracy of 99.69%. Shahid et al. [15]
presented an approach for precisely identifying lung and colon cancer cells. By
altering the four fundamental layers of AlexNet and then training it on a dataset, they
attained an accuracy of 89%.
The authors of [16] classified histological images of colon cancer using a deep
learning approach. The dataset has four classes. A cell identification technique was
applied to each image to obtain cell patches, and segmentation was used to divide the
images into discrete sizes. The CNN model then classified the images based on these
cell patches. The reported accuracies ranged from 90% to 96.9%.
By comparison, the proposed model achieves an accuracy of 99.9% in distinguishing
malignant from benign lung tissue, the highest accuracy among the state-of-the-art
models reviewed above.
3 Used Dataset
The LC25000 Lung and Colon Histopathological Image collection contains 5000 images for
each of its five classes of lung and colon tissue. The dataset has been validated and
complies with HIPAA [17]. Only 750 original lung images were gathered, 250 for each
lung category, with a size of 1024x768 pixels. These images were reduced to 768x768
pixels using Python and then augmented using the Augmentor software package [17]. As a
result, the augmented dataset has 5000 images for each class.
By rotating left and right and flipping horizontally and vertically, augmentation is ac-
complished [17]. Table 1 displays the description of class names of the LC25000 da-
taset with a sample image for each category.
Table 1. Description of the LC25000 dataset.
Category                      Class name  Number of images  Sample image
Lung Benign                   lung_n      5000              [image]
Lung Adenocarcinoma           lung_aca    5000              [image]
Lung Squamous Cell Carcinoma  lung_scc    5000              [image]
Colon Benign                  col_n       5000              [image]
Colon Adenocarcinoma          col_aca     5000              [image]
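The flip and rotation operations used to enlarge the dataset can be illustrated with a minimal sketch. It is not the Augmentor pipeline used to build LC25000: the function names are hypothetical, the image is a plain nested list, and rotations are simplified to 90-degree turns, which is an assumption made only to keep the example exact and dependency-free.

```python
# Minimal augmentation sketch on an image stored as a list of rows.
# ASSUMPTION: 90-degree rotations stand in for the "rotate left/right"
# operations; the real dataset was augmented with the Augmentor package.

def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]

def rotate_left(image):
    """Rotate the image 90 degrees counter-clockwise."""
    return [list(col) for col in zip(*image)][::-1]

def rotate_right(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def augment(image):
    """Return the four augmented variants of one image."""
    return [
        flip_horizontal(image),
        flip_vertical(image),
        rotate_left(image),
        rotate_right(image),
    ]

tile = [[1, 2], [3, 4]]          # toy 2x2 "image"
variants = augment(tile)
for variant in variants:
    print(variant)
```

Each original image yields several label-preserving variants, which is how a small set of originals can be expanded to 5000 images per class.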
Lung squamous cell carcinoma (SCC), which progresses through keratinization, is
distinguished by the presence of polygonal cells. In its early stages the disease has
no symptoms, so it is frequently discovered only after it has spread to other organs.
Early identification is therefore crucial to improving treatment outcomes: if the
diagnosis is delayed, the patient's likelihood of surviving five years is less than
20%. Accordingly, 10,000 histopathological images representing two types of lung tissue
(lung squamous cell carcinoma and benign lung tissue) were chosen from the LC25000
collection.
4 Proposed Deep Learning Approach
This section discusses the proposed method for classifying images of lung cancer his-
tology from the LC25000 dataset. The dataset was randomly split into training and test-
ing sets, and the proposed CNN model achieved high accuracy in classifying the images
into benign or malignant (squamous cell carcinoma) categories. Fig. 1 illustrates the
proposed CNN architecture, which consists of two main steps: feature extraction and
classification. Image resizing to 224x224 pixels in the RGB color space is a crucial step
in image preprocessing for training the model. During the training phase, the proposed
CNN model was trained using the training data, and the model parameters obtained
from this phase were used for classification on the testing data. The expected output of
the model is the classification of lung cancer histology images into benign or malignant
(squamous cell carcinoma) categories.
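The resizing step can be sketched as follows. The paper states only the target size (224x224, RGB); the nearest-neighbour interpolation, the function name resize_nearest, and the synthetic test image are assumptions chosen for illustration, and a real pipeline would normally use an imaging library instead.

```python
# Nearest-neighbour resize of an RGB image stored as rows of (r, g, b) tuples.
# ASSUMPTION: the interpolation method is not given in the paper.

def resize_nearest(image, out_h=224, out_w=224):
    """Resize by picking, for each output pixel, the closest source pixel."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# Synthetic 768x768 RGB image standing in for an LC25000 tile.
slide = [[(x % 256, y % 256, 0) for x in range(768)] for y in range(768)]
resized = resize_nearest(slide)
print(len(resized), len(resized[0]), len(resized[0][0]))
```

The output keeps three channels per pixel, so the network input has shape 224x224x3.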
Fig. 1. Proposed Architecture System. [Block diagram: Training Images and Testing
Images, Image Pre-processing, Convolutional Neural Network (CNN), Trained Model,
Classification (Benign or Malignant)]
The structure of the Convolutional Neural Network (CNN) used in the study is depicted
in Fig. 2 and consists of several layers with different functions. The CNN model
includes four convolutional layers (CL) with 32, 64, 128, and 256 filters,
respectively. The first CL uses an 11x11 kernel, while the second, third, and fourth
use 3x3 kernels. Each convolutional layer is followed by a max-pooling layer with a
2x2 window.
After the final CL, a Flatten layer generates the feature vector for the Fully
Connected (FC) layers. There are three FC layers: the first two have 1024 and 512
neurons, respectively, and the last has two neurons because there are two classes.
ReLU is the activation function for each convolutional layer, and Softmax is applied
at the output layer. Additionally, a Dropout layer with a rate of 0.4 is used.
Fig. 2. The Proposed CNN Architecture.
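To make the convolutional stack concrete, the sketch below traces the feature-map size and the number of weights through the four convolution and pooling stages. The filter counts, kernel sizes, pooling window, and 224x224x3 input come from the text; the stride of 1 and the absence of padding are assumptions, since the paper does not state them, and the names ConvSpec and trace are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ConvSpec:
    filters: int   # number of output channels
    kernel: int    # side length of the square kernel

# Values from the architecture description.
CONV_LAYERS = [ConvSpec(32, 11), ConvSpec(64, 3), ConvSpec(128, 3), ConvSpec(256, 3)]
POOL = 2  # 2x2 max pooling after every convolution

def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (ASSUMED stride 1, no padding)."""
    return (size + 2 * padding - kernel) // stride + 1

def trace(input_size=224, in_channels=3):
    """Return (filters, spatial size after pooling, conv parameters) per stage."""
    rows = []
    size, channels = input_size, in_channels
    for spec in CONV_LAYERS:
        # weights = k * k * in_channels * filters, plus one bias per filter
        params = spec.kernel * spec.kernel * channels * spec.filters + spec.filters
        size = conv_out(size, spec.kernel) // POOL
        channels = spec.filters
        rows.append((spec.filters, size, params))
    return rows

stages = trace()
for filters, size, params in stages:
    print(f"{filters:>3} filters -> {size}x{size} map, {params} conv parameters")
print("conv parameters in total:", sum(p for _, _, p in stages))
```

The convolutional parameter counts do not depend on stride or padding, but the spatial sizes do. Under the assumptions above the flattened vector would have 11 x 11 x 256 = 30,976 entries; a different stride or padding would change that value and, with it, the size of the first FC layer.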
5 The Experimental Work and Results
This research employed the proposed CNN model to classify the LC25000 lung
histopathology images. Two experiments were conducted with different proportions of
data in the training and testing sets. Models were trained for a maximum of 50 epochs
on Google Colab.
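The two data splits used in the experiments (80/20 and 40/60) can be sketched as below. The paper says only that the split was random; splitting each class separately (stratification), the fixed seed, and the helper name stratified_split are assumptions made for illustration. The class names and image counts follow Table 1, and the batch size of 150 is the one used in both experiments.

```python
import math
import random

def stratified_split(labels, train_fraction, seed=0):
    """Randomly split sample indices into train/test, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test

BATCH_SIZE = 150
labels = ["lung_n"] * 5000 + ["lung_scc"] * 5000  # 10,000 images, two classes

for fraction in (0.8, 0.4):
    train_idx, test_idx = stratified_split(labels, fraction)
    steps = math.ceil(len(train_idx) / BATCH_SIZE)
    print(f"train={len(train_idx)} test={len(test_idx)} batches/epoch={steps}")
```

With 10,000 images this gives 8000/2000 images (54 batches per epoch) for the first split and 4000/6000 images (27 batches per epoch) for the second.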
In the first experiment, the dataset was split into a training set of 80% and a
testing set of 20%, with a batch size of 150. The proposed model was compared to other
deep learning models, including VGG16, VGG19, AlexNet, Inception ResNet v2, ResNet50,
Inception v3, GoogleNet, and MobileNet. The proposed model had the fewest trainable
parameters (1 million) and achieved a maximum accuracy of 100% with zero test loss, as
indicated in Table 2.
Table 2. The total training parameters (millions), accuracy (%), and test loss (%) in
the first experiment.
Model               Total parameters (M)  Accuracy (%)  Test loss (%)
VGG19               171                   100           0
VGG16               165                   100           0
AlexNet             58                    99.9          0.1
InceptionResNetV2   54                    100           0
ResNet50            24                    94.6          19.5
Inception_v3        22                    99.9          0.1
GoogleNet           5                     100           0
MobileNet           3                     100           0
Proposed CNN model  1                     100           0
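The two reported metrics, accuracy and test loss, can be related with a short sketch. The paper does not name its loss function; categorical cross-entropy on the Softmax outputs is assumed here, and the function names and probability values are invented for illustration only.

```python
import math

def accuracy(probabilities, labels):
    """Fraction of samples whose highest-probability class is the true class."""
    correct = sum(1 for p, y in zip(probabilities, labels) if p.index(max(p)) == y)
    return correct / len(labels)

def cross_entropy(probabilities, labels, eps=1e-12):
    """Mean negative log-probability of the true class (ASSUMED loss)."""
    total = -sum(math.log(max(p[y], eps)) for p, y in zip(probabilities, labels))
    return total / len(labels)

labels = [0, 1]  # 0 = benign, 1 = malignant

# Softmax outputs of a model that is right and confident on both samples.
confident = [[0.99, 0.01], [0.02, 0.98]]
# Softmax outputs of a model that predicts "benign" for everything.
always_benign = [[0.999, 0.001], [0.999, 0.001]]

for name, probs in (("confident", confident), ("always_benign", always_benign)):
    print(name, accuracy(probs, labels), round(cross_entropy(probs, labels), 3))
```

A model that assigns nearly all probability to one class reaches 50% accuracy on a balanced two-class test set while incurring a very large loss, because cross-entropy penalizes confident mistakes heavily. This is one possible reading of a result that combines 50% accuracy with a very high test loss.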
In the second experiment, the dataset was split into 40% for training and 60% for
testing, again with a batch size of 150. The proposed model was compared with the same
deep learning models: VGG16, VGG19, AlexNet, Inception ResNet v2, ResNet50, Inception
v3, GoogleNet, and MobileNet. The proposed model again had the fewest parameters
(1 million) and achieved a maximum accuracy of 99.9% with a low test loss of 0.3%, as
shown in Table 3.
Table 3. The total training parameters (millions), accuracy (%), and test loss (%) in
the second experiment.
Model               Total parameters (M)  Accuracy (%)  Test loss (%)
VGG19               171                   99.7          1.1
VGG16               165                   99.5          0.6
AlexNet             58                    99.7          1.2
InceptionResNetV2   54                    99.7          0.5
ResNet50            24                    89.8          34.6
Inception_v3        22                    99.7          0.6
GoogleNet           5                     99.8          0.5
MobileNet           3                     50            817.4
Proposed CNN model  1                     99.9          0.3
The comparison between the proposed CNN model and other deep learning models,
including VGG16, VGG19, AlexNet, Inception ResNet v2, ResNet50, Inception v3,
GoogleNet, and MobileNet, is illustrated in Fig. 3. The proposed model achieved a
maximum accuracy of 99.9% with a minimum total number of training parameters of
one million.
Fig. 3. The comparison between the proposed CNN model and other existing deep
learning models in second experiment.
6 Conclusions
This study introduces a proposed CNN model for the detection and classification of
lung cancer using the LC25000 lung histopathology image dataset. The proposed model
categorizes each image as either benign or malignant. Two experiments were conducted
to validate the proposed model using the lung histopathology images.
In the first experiment, 80% of the dataset was used for training and 20% for testing.
In the second experiment, the dataset was split into 40% for training and 60% for
testing. The models were trained for a maximum of 50 epochs on Google Colab. The
proposed model achieved an accuracy of 99.9% to 100% and matched or outperformed the
other deep learning models while using only four convolutional layers, four
max-pooling layers, three fully connected layers, and one million parameters overall.
The experimental results showed that the proposed model achieved maximum accuracy with
the fewest parameters and that reducing the number of training images did not
significantly affect accuracy. The proposed approach proved effective compared with
existing state-of-the-art deep learning models. In the future, the model can be
improved to reduce computation time and applied to other datasets to tune its
hyperparameters.
[Fig. 3 chart: CNN accuracy (%) and total parameters (millions) for each model]
References
1. Sung, Hyuna, et al. "Global cancer statistics 2020: GLOBOCAN estimates of incidence
and mortality worldwide for 36 cancers in 185 countries." CA: A Cancer Journal for
Clinicians 71.3 (2021): 209-249. Walser, Tonya, et al. Smoking and lung cancer: the
role of inflammation. Proceedings of the American Thoracic Society, 2008, 5.8: 811-815.
2. Araghi, Marzieh; Soerjomataram, Isabelle; Jenkins, Mark; Brierley, James; Morris,
Eva; Bray, Freddie; Arnold, Melina. Global trends in colorectal cancer mortality:
projections to the year 2035. International Journal of Cancer, 2019, 144.12: 2992–3000.
3. K. Inamura, Lung cancer: understanding its molecular pathology and the 2015 WHO
classification, Front. Oncol. 7 (2017), 193. https://doi.org/10.3389/fonc.2017.00193.
4. N. Aliyah, E. Pranggono, B. Andriyoko, Kanker Paru: Sebuah Kajian Singkat, Indones.
J. Chest Emerg. Med. 4 (2016), 28–32.
5. Molina, Julian R., et al. "Non-small cell lung cancer: epidemiology, risk factors,
treatment, and survivorship." In: Mayo Clinic Proceedings. Elsevier, 2008. p. 584-594.
6. Drilon et al., "Squamous-cell carcinomas of the lung: emerging biology,
controversies, and the promise of targeted therapy," Volume 13, Issue 10, October 2012,
Pages e418-e426.
7. Mishra, Swati; Agrawal, Utcarsh. Lung Cancer Detection (LCD) from Histopathological
Images using Fine-Tuned Deep Neural Network. Annals of Medical and Health Sciences
Research, 2022, 12.10: 2.
8. Baranwal, Neha; Doravari, Preethi; Kachhoria, Renu. Classification of
Histopathology Images of Lung Cancer Using Convolutional Neural Network (CNN). arXiv
preprint arXiv:2112.13553, 2021.
9. Ali, M.; Ali, R. Multi-Input Dual-Stream Capsule Network for Improved Lung and
Colon Cancer Classification. Diagnostics 2021, 11, 1485.
10. N. Gessert, M. Bengs, L. Wittig, D. Drömann, T. Keck, A. Schlaefer, D.B.
Ellebrecht, Deep transfer learning methods for colon cancer classification in confocal
laser microscopy images, Int. J. Comput. Assist. Radiol. Surg. 14 (2019) 1837–1845.
11. Adu, K.; Yu, Y.; Cai, J.; Owusu-Agyemang, K.; Twumasi, B.A.; Wang, X. DHS-CapsNet:
Dual horizontal squash capsule networks for lung and colon cancer classification from
whole slide histopathological images. Int. J. Imaging Syst. Technol. 2021, 31,
2075–2092.
12. T.L.T. Vuong, D. Lee, J.T. Kwak, K. Kim, Multi-task deep learning for colon cancer
grading, Int. Conf. Electron. Information, Commun. 2020 (2020) 1–2.
13. Mangal, S.; Chaurasia, A.; Khajanchi, A. Convolution neural networks for
diagnosing colon and lung cancer histopathological images. arXiv 2020,
arXiv:2009.03878.
14. Toğaçar, M. Disease type detection in lung and colon cancer images using the
complement approach of inefficient sets. Comput. Biol. Med. 2021, 137, 104827.
15. Mehmood, S.; Ghazal, T.M.; Khan, M.A.; Zubair, M.; Naseem, M.T.; Faiz, T.; Ahmad,
M. Malignancy detection in lung and colon histopathology images using transfer
learning with class selective image processing. IEEE Access 2022, 10, 25657–25668.
16. M. Shapcott, K.J. Hewitt, N. Rajpoot, Deep learning with sampling in colon cancer
histology, Front. Bioeng. Biotechnol. 7 (2019).
17. Borkowski, Andrew A., et al. Lung and colon cancer histopathological image dataset
(LC25000). arXiv preprint arXiv:1912.12142, 2019.
18. Krizhevsky, A., Sutskever, I., & Hinton, G. E., "ImageNet classification with deep
convolutional neural networks," In Advances in Neural Information Processing Systems,
pp. 1097-1105, 2012.
19. Wang, Z., "The applications of deep learning on traffic identification," BlackHat
USA, 2015.
20. Wang, D., Khosla, A., Gargeya, R., Irshad, H., & Beck, A. H., "Deep learning for
identifying metastatic breast cancer," arXiv preprint arXiv:1606.05718, 2016.
21. Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A. R., Jaitly, N., Senior, A.,
Vanhoucke, V., Nguyen, P., Sainath, T.N. & Kingsbury, B., "Deep neural networks for
acoustic modeling in speech recognition: The shared views of four research groups,"
IEEE Signal Processing Magazine, 29(6), pp. 82-97, 2012.
22. Shafaey M.A., Salem M.A.-M., Ebied H.M., Al-Berry M.N., Tolba M.F., "Deep Learning
for Satellite Image Classification," The International Conference on Advanced
Intelligent Systems and Informatics, Vol 845. Springer, 2018.