Detecting Lung Cancer from Histopathological
Images using Convolution Neural Network
Dewan Ziaul Karim
Department of Computer Science and Engineering
Brac University
Dhaka, Bangladesh
ziaul.karim@bracu.ac.bd
Tasfia Anika Bushra
Department of Computer Science and Engineering
Daffodil International University
Dhaka, Bangladesh
anika.cse@diu.edu.bd
Abstract: Lung cancer is one of the leading causes of mortality in both men and women throughout the world. That is why early identification and treatment of lung cancer patients bear a huge significance in the recovery of such patients. Much of the time, pathologists use histopathological images of tissue biopsies from possibly diseased regions of the lungs to detect the probability and type of cancer. However, this procedure is both tedious and sometimes fallible. Machine learning based solutions for medical image analysis can help a lot in this regard. The aim of this work is to provide a convolutional neural network (CNN) model that can accurately recognize and categorize lung cancer types, which is very important for treatment. We propose a CNN model trained on 15000 images split into three sets: training, validation, and testing. Three different types of lung tissue (benign tissue, adenocarcinoma, and squamous cell carcinoma) have been examined. 50 instances from every class were kept for the testing procedure, and the rest of the data was split as about 80% for training and 20% for validation. Eventually, our model obtained 98.15% training accuracy and 98.07% validation accuracy.
Keywords: Lung Cancer, Histopathological Images, Deep Learning, CNN, Classification.
I. INTRODUCTION
Lung cancer is regarded as one of the most prominent cancers worldwide. It accounts for around 25% of all cancer-related deaths [1]. The most common cause of lung cancer is smoking; however, in non-smokers, exposure to radon, second-hand smoke, air pollution, or certain other substances can also cause lung cancer [2]. Unfortunately, the mortality rate of lung cancer is on the rise, and the annual death toll is projected to reach about 17 million worldwide by 2030 [3]. In developing countries, at the current growth rate, people's odds of acquiring cancer during their lifespan may rise to 50%-60% by 2050 [4].
There are many medical tests (CT scan, X-rays, biopsy,
etc.) done to find out potential cancerous cells. In a biopsy,
histopathology slides are evaluated by pathologists to
establish the potential diagnosis [5,6,7] and determine the
type of lung cancer [8]. However, it is a time-consuming procedure, and there is always a chance that cancer types could be misdiagnosed, which eventually results in incorrect treatment and takes a toll on patients' lives.
For the reason mentioned above, it is essential to
implement an automated system for assisting doctors in the
diagnosis of lung cancers as early as possible with high
accuracy. Due to advancements in the technological sector, it
is now possible to build such an automated system using
artificial intelligence (AI) and machine learning (ML).
ML is a branch of AI that focuses on using algorithms and data to emulate the way people learn, improving accuracy over time [9]. In recent years, many researchers have combined different machine learning techniques with X-ray and CT images to provide workable systems for identifying types of lung cancer. These techniques involve Random Forest (RF), Support Vector Machine (SVM), Bayesian Networks (BN), and Convolutional Neural Networks (CNN) for detecting and recognizing lung cancers. Recently, some authors have used histopathological images to differentiate between carcinoma and non-carcinoma images using CNNs.
CNN is a deep learning approach that is widely used in image recognition and classification [10,11,12]. It takes an input image, assigns learnable weights and biases to it, and distinguishes one image from another. CNN is superior to other conventional approaches in the sense that it requires very little preprocessing: in traditional techniques, filters have to be designed manually, whereas the neural network learns this information itself.
CNN is frequently used for image-related tasks including classification, segmentation, medical image analysis, recognition, etc. because it has numerous benefits over other methods. After the input images are fed into a CNN, they pass through several layers such as convolution, pooling, flattening, and fully-connected (FC) layers. Activation functions are also applied in order to correctly identify an image.
The primary aim of our research is to provide a feasible,
efficient and accurate ML model to detect lung cancer from
histopathological images by classifying benign tissue,
adenocarcinoma, and squamous cell carcinomas using CNN
architecture.
II. RELATED WORK
The authors Bijaya Kumar Hatuwal and Himal Chand
Thapa [13] created a deep CNN model to identify benign
tissue, adenocarcinoma, and squamous cell carcinoma where
there were three hidden layers, one input layer, and one fully
connected layer. A dropout value of 0.1 and max-pooling were
used in their research. They used the Adam optimizer and eventually obtained 96.11% training accuracy and 97.2% validation accuracy.
Muayed S AL-Huseiny et al. [14] proposed the approach
of deploying a transfer learning based deep neural network
(DNN) to detect lung nodules that are malignant using CT
images. They performed a fast pre-processing technique to
find out the ROI (Region of Interest) from the images. In this
work, GoogLeNet DNN was used and modified for their
dataset. The code was run on a machine with a 2.5 GHz Core-i3 processor and 16 GB of RAM, and eventually achieved an accuracy of 94.38%.
Another paper [15] described a lung cancer detection
system using AlexNet CNN. This work only distinguished between malignant and benign lung tumors with the help of a convolutional neural network model based on AlexNet. It is to be mentioned that AlexNet is made up of 25 layers (with an input size of 227x227x3). The SGDM optimizer and an initial learning rate of 0.0003 were used. MATLAB 2021a
software was used to run the code and the proposed method
achieved 96% accuracy in the end.
Ying Su et al. [16] proposed an approach for detecting
lung nodules using Faster R-CNN. They experimented on the
LIDC-IDRI dataset [17]. They used 0.001 as learning rate and
70000 as step size. Their attenuation coefficient, dropout rate,
and batch size were 0.1, 0.5, and 64 respectively. The
researchers achieved an accuracy of 91.2% with their
optimized and improved Faster R-CNN method.
Mehedi Masud et al. [18] suggested a classification
framework that differentiates among 5 different types of
colon and lung tissues by analyzing their histopathological
images. Among those 5 classes, 2 are benign and 3 are
malignant. A total of 25000 pictures were included in the
dataset. The authors used DFT and DWT techniques for
feature extraction from images. Later they used a CNN based
technique to identify cancer tissues with an accuracy of
96.33%. Satvik Garg et al. [19] conducted another study
that demonstrated the results of various pre-trained CNN
models.
Another work [20] suggested an automated system for
detecting lung malignancies in WSI (Whole Slide Images) of
lung tissues using two CNN architectures - ResNet and
VGG16. The target was to identify image patches into normal
and tumor cells. The authors used SGD as the optimizer.
Binary crossentropy was assigned as the loss function and a
learning rate of 0.0001 was chosen. Finally, it was observed
that VGG16 (75.41%) outperformed ResNet (72.05%) in
terms of patch level accuracy.
Albert Chon et al. [21] presented a GoogLeNet-based 3D CNN model for lung cancer detection. The dataset contained labeled data for 2101 patients, which the authors divided into training, validation, and test sets of 1261, 420, and 420 patients respectively. A dropout with 0.3 probability was used after each convolution and inception layer during training. They used the Adam optimizer with a 0.0001 learning rate. It was seen that the
suggested model achieved an accuracy of 75.1% with an
AUC score of 0.757.
III. DATASET DESCRIPTION
The dataset used in this study contains 15000 lung
histopathology images. This dataset is obtained from
LC25000 Lung and colon histopathological image dataset
[22]. Those 15000 images are divided into 3 different
categories: benign tissue, adenocarcinoma, and squamous cell
carcinoma. Among those 15000 images, 11850 were put into
training, 3000 were used for validation and 150 were kept for
testing purposes. All the pictures are in RGB format, with a size of 256 x 256 pixels. Samples from the different classes are shown in Fig. 1.
Fig. 1. Dataset samples: adenocarcinoma, squamous cell carcinoma, and benign tissue
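As an illustration of how such a split can be produced from the raw LC25000 class folders, the sketch below holds out 50 images per class for testing and divides the remainder roughly 80/20 into training and validation. The folder names and paths are assumptions for illustration; they are not taken from the paper.

```python
import random
import shutil
from pathlib import Path

SRC = Path("lc25000/lung_image_sets")          # assumed location of the raw class folders
DST = Path("dataset")                          # assumed output root: dataset/{train,val,test}/<class>
CLASSES = ["lung_aca", "lung_n", "lung_scc"]   # adenocarcinoma, benign, squamous cell carcinoma

random.seed(42)
for cls in CLASSES:
    images = sorted((SRC / cls).glob("*.jpeg"))
    random.shuffle(images)
    test, rest = images[:50], images[50:]      # 50 images per class held out for testing
    cut = int(0.8 * len(rest))                 # ~80% of the remainder for training, ~20% for validation
    splits = {"test": test, "train": rest[:cut], "val": rest[cut:]}
    for split_name, files in splits.items():
        out_dir = DST / split_name / cls
        out_dir.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy(f, out_dir / f.name)
```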
IV. METHODOLOGY
In this study, a CNN model has been created to detect 3
classes of lung cancers. Fig. 2 indicates the complete
workflow of this research.
Fig. 2. Methodology of Detecting Lung Cancer
The procedure can be divided into two main steps: i) dataset preparation, and ii) implementation of the CNN model.
A. Dataset Preparation
To avoid getting a disappointing result, it is always better
to pre-process the dataset to increase efficiency [23]. In our
work, various steps were considered to prepare the training
dataset.
• Outliers Removal: The dataset was examined rigorously for any outliers, as outliers can affect the performance of our model.
• Resizing Images: All the images were scaled to a pixel size of 256 x 256, as CNN models take inputs of a fixed dimension.
• Dataset Normalization: Normalized data can help deep learning based models gain more stability and provide a better chance of convergence. The range of pixel
values in a picture is 0 to 255, so we used the min-max normalizer (1) to normalize the pixel values of our images:

x' = (x - x_min) / (x_max - x_min)   (1)

• Data Augmentation: CNN models usually perform better with more images. Hence, we applied some data augmentation methods to expand our training data. Techniques such as shearing, rotating, shifting, and flipping were applied to bring variety to the dataset and make the model more robust; a generator-based sketch of this preprocessing is given below.
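A minimal sketch of this preprocessing pipeline is given here, assuming a Keras/TensorFlow ImageDataGenerator and the dataset/train and dataset/val folders from the earlier split sketch; the augmentation parameter values are illustrative rather than the authors' reported settings.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Training generator: min-max scaling of pixel values to [0, 1] (equation (1) with
# x_min = 0 and x_max = 255) plus shear / rotation / shift / flip augmentation.
train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    shear_range=0.2,
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)
# Validation images are only rescaled, never augmented.
val_gen = ImageDataGenerator(rescale=1.0 / 255)

train_data = train_gen.flow_from_directory(
    "dataset/train", target_size=(256, 256), batch_size=20, class_mode="categorical")
val_data = val_gen.flow_from_directory(
    "dataset/val", target_size=(256, 256), batch_size=20, class_mode="categorical")
```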
B. Proposed Model’s Architecture
In this work, we suggest a multi-layered CNN model to
classify different types of lung cancers from histopathological
images. There are 6 convolution layers and 3 dense layers in
our CNN model. There are 32, 64, 128, 128, 128, and 64 filters respectively in those 6 convolution layers, each with a 3 x 3 kernel size. Every convolution operation is followed by a Batch Normalization [24] operation (2), which helps to make the learning procedure faster. Following that, a Max-pooling [25] operation with a pool size of 2 x 2 is performed.

x̂_i = (x_i - μ_B) / sqrt(σ_B² + ε),   y_i = γ·x̂_i + β   (2)
Since convolution networks work better with ReLU [26], all the convolution layers use ReLU as the activation function (3).

f(x) = max(0, x)   (3)
A flatten layer is placed just after those 6 convolution layers; it converts the data into a one-dimensional array for use in the next layer. After this, 3 consecutive dense layers are implemented with 512, 64, and 3 units respectively. There are 3 nodes in the last dense layer because we are classifying 3 different types of lung tissues. A softmax activation function (4) is applied in the last dense layer.

softmax(z)_i = exp(z_i) / Σ_j exp(z_j)   (4)
TABLE I. PROPOSED MODEL SUMMARY

Layers                   Shape of Output
conv2d_0                 (None, 254, 254, 32)
batch_normalization_0    (None, 254, 254, 32)
max_pooling2d_0          (None, 127, 127, 32)
conv2d_1                 (None, 125, 125, 64)
batch_normalization_1    (None, 125, 125, 64)
max_pooling2d_1          (None, 62, 62, 64)
conv2d_2                 (None, 60, 60, 128)
batch_normalization_2    (None, 60, 60, 128)
max_pooling2d_2          (None, 30, 30, 128)
conv2d_3                 (None, 28, 28, 128)
batch_normalization_3    (None, 28, 28, 128)
max_pooling2d_3          (None, 14, 14, 128)
conv2d_4                 (None, 12, 12, 128)
batch_normalization_4    (None, 12, 12, 128)
max_pooling2d_4          (None, 6, 6, 128)
conv2d_5                 (None, 4, 4, 64)
batch_normalization_5    (None, 4, 4, 64)
max_pooling2d_5          (None, 2, 2, 64)
flatten_1                (None, 256)
dense_0                  (None, 512)
batch_normalization_6    (None, 512)
dense_1                  (None, 64)
batch_normalization_7    (None, 64)
dense_2                  (None, 3)
activation               (None, 3)

Total params: 631,299
Trainable params: 629,059
Non-trainable params: 2,240
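For concreteness, the following Keras sketch builds a network consistent with the layer summary in Table I (filter counts 32, 64, 128, 128, 128, and 64 with 3 x 3 kernels, batch normalization and 2 x 2 max pooling after each convolution, then dense layers of 512, 64, and 3 units). It is an approximation of the described architecture, not the authors' released code: the ReLU activations on the dense layers, the softmax folded into the final Dense layer, and the omission of the regularizer are our assumptions.

```python
from tensorflow.keras import layers, models

def build_model(input_shape=(256, 256, 3), num_classes=3):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    # Six convolution blocks: Conv2D (3x3, ReLU) -> BatchNormalization -> MaxPooling2D (2x2)
    for filters in [32, 64, 128, 128, 128, 64]:
        model.add(layers.Conv2D(filters, (3, 3), activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    # Classifier head: flatten, two dense layers, then a 3-way softmax output
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation="relu"))
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model

model = build_model()
model.summary()   # output shapes should roughly match Table I (254x254x32, ..., 256, 512, 64, 3)
```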
C. Parameters used in Training
For our proposed model, we experimented with multiple training parameters, e.g., optimizer, learning rate, metrics, batch size, number of epochs, callbacks, etc. Table II lists the various training parameters used in our model:
TABLE II. TRAINING PARAMETERS USED IN THE MODEL

Name of Parameter          Value
Optimizer                  Adam
Learning Rate (Initial)    0.01
Learning Rate (Minimum)    0.000001
Regularizer                L1 (0.000001)
Batch Size                 20
Epochs                     60
Steps per Epoch            593
Loss Function              Categorical Crossentropy
Metrics                    Accuracy, Precision, Recall, Loss
Callbacks                  ReduceLROnPlateau
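A sketch of how the settings in Table II translate into a Keras training call is shown below, assuming the model and data generators from the earlier sketches. The ReduceLROnPlateau arguments other than the minimum learning rate are assumptions, and the L1 (0.000001) regularizer from Table II would typically be attached to layers via kernel_regularizer, which is omitted here.

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import Precision, Recall
from tensorflow.keras.callbacks import ReduceLROnPlateau

model.compile(
    optimizer=Adam(learning_rate=0.01),        # initial learning rate from Table II
    loss="categorical_crossentropy",
    metrics=["accuracy", Precision(), Recall()],
)

# Reduce the learning rate when validation loss plateaus, down to the minimum in Table II.
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.1,
                              patience=3, min_lr=1e-6)

history = model.fit(
    train_data,                 # batch size 20 is set on the generators
    validation_data=val_data,
    epochs=60,
    steps_per_epoch=593,
    callbacks=[reduce_lr],
)
```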
D. Evaluation Tools
Python version 3.X was used for the whole experiment
including dataset preparation, model implementation, and
evaluation.
V. RESULT ANALYSIS
From the whole dataset, 50 images from each class were
kept aside for testing purposes. The remaining images were
split in such a way that about 80% data went into training and
20% went into validation. The model finally achieved a
training and validation accuracy of 98.15% and 98.07%
respectively. Fig. 3 and 4 indicate accuracy and loss graphs
for both training and validation respectively.
Fig. 3. Accuracy for Both Training and Validation
Fig. 4. Loss for Both Training and Validation
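Curves such as those in Fig. 3 and Fig. 4 can be drawn directly from the History object returned by model.fit; a minimal matplotlib sketch, assuming the history variable from the training snippet above, is:

```python
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot training/validation accuracy and loss curves from a Keras History object."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(history.history["accuracy"], label="training")
    ax1.plot(history.history["val_accuracy"], label="validation")
    ax1.set_xlabel("Epoch")
    ax1.set_ylabel("Accuracy")
    ax1.legend()
    ax2.plot(history.history["loss"], label="training")
    ax2.plot(history.history["val_loss"], label="validation")
    ax2.set_xlabel("Epoch")
    ax2.set_ylabel("Loss")
    ax2.legend()
    fig.tight_layout()
    plt.show()

plot_history(history)
```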
Moreover, we also calculated the accuracy of different pre-trained CNN models on the same dataset with the same hyperparameters and compared the results with that of our proposed CNN model. The models that we tried are DenseNet201, ResNet152V2, MobileNetV2, InceptionV3, Xception, InceptionResNetV2, VGG16, VGG19, and ResNet50. It was seen that all of those models performed worse than our proposed model. Among the pre-trained
models, DenseNet201 and MobileNetV2 achieved the highest
training and validation accuracy of 95.41% and 95.03%
respectively. Nevertheless, both of these are lower than the
training and validation accuracy achieved by our proposed
model. As a result, we came to the conclusion that compared
to the different transfer learning approaches, our approach to
lung cancer diagnosis has demonstrated better results with
greater accuracy rates.
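As an illustration of how one of these pre-trained baselines can be set up under the same pipeline, the sketch below wraps DenseNet201 in a small classification head; the frozen-base, single-dense-head design is our assumption and not a detail reported in the paper.

```python
from tensorflow.keras.applications import DenseNet201
from tensorflow.keras import layers, models

base = DenseNet201(weights="imagenet", include_top=False,
                   input_shape=(256, 256, 3), pooling="avg")
base.trainable = False                        # use the pre-trained network as a fixed feature extractor

transfer_model = models.Sequential([
    base,
    layers.Dense(3, activation="softmax"),    # 3-way lung tissue classifier head
])
transfer_model.compile(optimizer="adam",
                       loss="categorical_crossentropy",
                       metrics=["accuracy"])
transfer_model.fit(train_data, validation_data=val_data, epochs=60)
```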
Table III shows the comparison between the accuracy rate
of pretrained models and our suggested CNN model against
the same dataset.
TABLE III. TRAINING AND VALIDATION ACCURACY COMPARISON OF PROPOSED AND PRE-TRAINED CNN MODELS

Model                 Validation Accuracy
Proposed Model        98.07%
DenseNet201           94.10%
ResNet152V2           93.53%
MobileNetV2           95.03%
InceptionV3           93.20%
Xception              92.30%
InceptionResNetV2     92.60%
VGG16                 91.77%
VGG19                 82.50%
ResNet50              51.50%
Fig. 5 and 6 indicate training and validation accuracy
graphs for different models respectively.
Fig. 5. Training Accuracy Comparison of Different Models
Fig. 6. Validation Accuracy Comparison of Different Models
To understand our results better, we examined the confusion matrix based on the test samples. It is important because it provides a clear overview of which samples were classified correctly or incorrectly [27].
Fig. 7. Confusion Matrix on Test Samples
Fig. 7 exhibits the model’s confusion matrix on our selected test (unseen) samples, which include lung adenocarcinoma (lung_aca), lung squamous cell carcinoma (lung_scc), and lung benign tissue (lung_n).
If we look at the confusion matrix, we notice that our
model identified all of the samples from lung adenocarcinoma
and lung benign tissues with an accuracy of 100%. However,
for the class squamous cell carcinoma, 48 instances were
classified correctly and 2 were wrongly classified.
Observing the Recall (R), Precision (P), and F1 score on test samples is another good way to check the reliability of a model [28]. Precision is calculated as P = TP / (TP + FP), Recall as R = TP / (TP + FN), and F1 as 2 × (R × P) / (R + P), where TP, FP, and FN denote True Positives, False Positives, and False Negatives respectively. Table IV illustrates precision, recall, and F1 score for every category in our test sample dataset.
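These quantities can be computed from the held-out test images with, for example, scikit-learn; the sketch below assumes the trained model from the earlier snippets and a non-shuffled generator over the dataset/test folder.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Non-shuffled generator over the 150 held-out test images (50 per class).
test_data = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "dataset/test", target_size=(256, 256), batch_size=20,
    class_mode="categorical", shuffle=False)

y_prob = model.predict(test_data)             # per-class probabilities from the softmax layer
y_pred = np.argmax(y_prob, axis=1)            # predicted class indices
y_true = test_data.classes                    # true class indices from the directory iterator

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred,
                            target_names=list(test_data.class_indices)))
```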
TABLE IV. CLASSIFICATION REPORT BASED ON TEST SAMPLES

Name of Classes                 Precision   Recall   F1 Score   Support
Lung Adenocarcinoma             0.96        1.00     0.98       50
Lung Benign                     1.00        1.00     1.00       50
Lung Squamous Cell Carcinoma    1.00        0.96     0.98       50
Accuracy                                             0.99       150 (50*3)
Macro Average                   0.99        0.99     0.99       150
Weighted Average                0.99        0.99     0.99       150
VI. FUTURE WORK
In the future, different CNN architectures with further hyperparameter tuning may result in better accuracy than the current model. This work may be extended to CT scan imaging problems too. It may also be possible to build a mobile application that provides real-time detection and eventually widens the use of our technique.
VII. CONCLUSION
This work presents a CNN model to detect lung cancer using histopathological images. The whole dataset consisted of 15000 images, and our experimental findings indicated training and validation accuracy of 98.15% and 98.07% respectively. It is expected that this model will help pathologists identify lung cancer (benign, adenocarcinoma, and squamous cell carcinoma lung tissues) with less time, effort, and cost.
REFERENCES
[1] American Cancer Society, 2020. Lung Cancer Statistics. [Online]. Available: https://www.cancer.org/cancer/lungcancer/about/key-statistics.html
[2] American Cancer Society, 2019. Lung Cancer Causes. [Online]. Available: https://www.cancer.org/cancer/lungcancer/causes-risks-prevention/what-causes.html
[3] Nie, L., Zhang, L., Yang, Y., Wang, M., Hong, R. and
Chua, T.S., 2015, October. Beyond doctors: Future
health prediction from multimedia and multimodal
observations. In Proceedings of the 23rd ACM
international conference on Multimedia (pp. 591-600).
[4] Kumar, P., Bhattacharyya, G.S., Dattatreya, S. and
Malhotra, H., 2009. Tackling the cancer Tsunami.
Indian journal of cancer, 46(1), p.1.
[5] Silvestri, G.A., Gould, M.K., Margolis, M.L., Tanoue,
L.T., McCrory, D., Toloza, E. and Detterbeck, F., 2007.
Noninvasive staging of non-small cell lung cancer:
ACCP evidenced-based clinical practice
guidelines. Chest, 132(3), pp.178S-201S.
[6] Travis, W.D., Brambilla, E., Noguchi, M., Nicholson,
A.G., Geisinger, K.R., Yatabe, Y., Beer, D.G., Powell,
C.A., Riely, G.J., Van Schil, P.E. and Garg, K., 2011.
International association for the study of lung
cancer/american thoracic society/european respiratory
society international multidisciplinary classification of
lung adenocarcinoma. Journal of thoracic
oncology, 6(2), pp.244-285.
[7] Collins, L.G., Haines, C., Perkel, R. and Enck, R.E.,
2007. Lung cancer: diagnosis and
management. American family physician, 75(1), pp.56-
63.
[8] Yu, K.H., Zhang, C., Berry, G.J., Altman, R.B., Ré, C.,
Rubin, D.L. and Snyder, M., 2016. Predicting non-
small cell lung cancer prognosis by fully automated
microscopic pathology image features. Nature
communications, 7(1), pp.1-10.
[9] Michie, D., Spiegelhalter, D.J. and Taylor, C.C., 1994.
Machine learning, neural and statistical classification.
[10] O'Shea, K. and Nash, R., 2015. An introduction to
convolutional neural networks. arXiv preprint
arXiv:1511.08458.
[11] Hijazi, S., Kumar, R. and Rowen, C., 2015. Using
convolutional neural networks for image recognition.
Cadence Design Systems Inc.: San Jose, CA, USA,
pp.1-12.
[12] Sultana, F., Sufian, A. and Dutta, P., 2018, November.
Advancements in image classification using
convolutional neural network. In 2018 Fourth
International Conference on Research in Computational
Intelligence and Communication Networks (ICRCICN)
(pp. 122-129). IEEE.
[13] Hatuwal, B.K. and Thapa, H.C., 2020. Lung Cancer
Detection Using Convolutional Neural Network on
Histopathological Images. Int. J. Comput. Trends
Technol, 68, pp.21-24.
[14] AL-Huseiny, M.S. and Sajit, A.S., 2021. Transfer
learning with GoogLeNet for detection of lung
cancer. Indonesian Journal of Electrical Engineering
and Computer Science, 22(2), pp.1078-1086.
[15] Agarwal, A., Patni, K. and Rajeswari, D., 2021, July.
Lung Cancer Detection and Classification Based on
Alexnet CNN. In 2021 6th International Conference on
Communication and Electronics Systems (ICCES) (pp.
1390-1397). IEEE.
[16] Su, Y., Li, D. and Chen, X., 2021. Lung nodule
detection based on faster R-CNN
framework. Computer Methods and Programs in
Biomedicine, 200, p.105866.
[17] Armato III, S.G., McLennan, G., Bidaut, L., McNitt‐
Gray, M.F., Meyer, C.R., Reeves, A.P., Zhao, B.,
Aberle, D.R., Henschke, C.I., Hoffman, E.A. and
Kazerooni, E.A., 2011. The lung image database
consortium (LIDC) and image database resource
initiative (IDRI): a completed reference database of
lung nodules on CT scans. Medical physics, 38(2),
pp.915-931.
[18] Masud, M., Sikder, N., Nahid, A.A., Bairagi, A.K. and
AlZain, M.A., 2021. A machine learning approach to
diagnosing lung and colon cancer using a deep learning-
based classification framework. Sensors, 21(3), p.748.
[19] Garg, S. and Garg, S., 2020, December. Prediction of
lung and colon cancer through analysis of
histopathological images by utilizing Pre-trained CNN
models with visualization of class activation and
saliency maps. In 2020 3rd Artificial Intelligence and
Cloud Computing Conference (pp. 38-45).
[20] Šarić, M., Russo, M., Stella, M. and Sikora, M., 2019,
June. CNN-based method for lung cancer detection in
whole slide histopathology images. In 2019 4th
International Conference on Smart and Sustainable
Technologies (SpliTech) (pp. 1-4). IEEE.
[21] Chon, A., Balachandar, N. and Lu, P., 2017. Deep
convolutional neural networks for lung cancer
detection. Stanford University.
[22] Borkowski, A.A., Bui, M.M., Thomas, L.B., Wilson,
C.P., DeLand, L.A. and Mastorides, S.M., 2019. Lung
and colon cancer histopathological image dataset
(lc25000). arXiv preprint arXiv:1912.12142.
[23] Pal, K.K. and Sudeep, K.S., 2016, May. Preprocessing
for image classification by convolutional neural
networks. In 2016 IEEE International Conference on
Recent Trends in Electronics, Information &
Communication Technology (RTEICT) (pp. 1778-
1781). IEEE.
[24] Ioffe, S. and Szegedy, C., 2015, June. Batch
normalization: Accelerating deep network training by
reducing internal covariate shift. In International
conference on machine learning (pp. 448-456). PMLR.
[25] Scherer, D., Müller, A. and Behnke, S., 2010,
September. Evaluation of pooling operations in
convolutional architectures for object recognition. In
International conference on artificial neural networks
(pp. 92-101). Springer, Berlin, Heidelberg.
[26] Agarap, A.F., 2018. Deep learning using rectified linear
units (relu). arXiv preprint arXiv:1803.08375.
[27] Ting, K.M., 2017. Confusion matrix. Encyclopedia of
Machine Learning and Data Mining, 260.
[28] Goutte, C. and Gaussier, E., 2005, March. A
probabilistic interpretation of precision, recall and F-
score, with implication for evaluation. In European
conference on information retrieval (pp. 345-359).
Springer, Berlin, Heidelberg.