
Detecting Lung Cancer from Histopathological Images using Convolution Neural Network

December 2021


DOI:10.1109/TENCON54134.2021.9707242

Conference: TENCON 2021 - 2021 IEEE Region 10 Conference (TENCON)



Dewan Ziaul Karim

Department of Computer Science and Engineering

Brac University

Dhaka, Bangladesh

ziaul.karim@bracu.ac.bd

Tasfia Anika Bushra

Department of Computer Science and Engineering

Daffodil International University

Dhaka, Bangladesh

anika.cse@diu.edu.bd

Abstract— Lung cancer is one of the leading causes of mortality in both men and women worldwide, so early identification and treatment are of great significance for patients' recovery. Pathologists often examine histopathological images of tissue biopsies taken from possibly diseased regions of the lungs to determine the likelihood and type of cancer. However, this procedure is tedious and sometimes error-prone. Machine learning based solutions for medical image analysis can help considerably in this regard. The aim of this work is to provide a convolutional neural network (CNN) model that accurately recognizes and categorizes lung tissue types, which is very important for treatment. We propose a CNN model trained and evaluated on 15000 images split into three subsets: training, validation, and testing. Three types of lung tissue (benign tissue, adenocarcinoma, and squamous cell carcinoma) were examined. Fifty images from each class were kept for testing, and the remaining data were split into approximately 80% for training and 20% for validation. Our model obtained 98.15% training accuracy and 98.07% validation accuracy.

Keywords—Lung Cancer, Histopathological Images, Deep

Learning, CNN, Classification.

I. INTRODUCTION

Lung cancer is one of the most common cancers worldwide and accounts for around 25% of all cancer-related deaths [1]. Smoking is its most common cause; in non-smokers, exposure to radon, second-hand smoke, air pollution, or certain other substances can also cause lung cancer [2]. Unfortunately, lung cancer mortality is rising and is expected to reach about 17 million deaths worldwide in 2030 [3]. In developing countries, at the current growth rate, people's lifetime odds of acquiring cancer may rise to 50%-60% by 2050 [4].

Several medical tests, such as CT scans, X-rays, and biopsies, are used to find potentially cancerous cells. In a biopsy, pathologists evaluate histopathology slides to establish the diagnosis [5,6,7] and determine the type of lung cancer [8]. However, this is a time-consuming procedure, and there is always a risk that the cancer type is misdiagnosed, which can result in incorrect treatment and take a toll on patients' lives.

For these reasons, it is essential to implement an automated system that assists doctors in diagnosing lung cancer as early as possible and with high accuracy. Advances in technology now make it possible to build such a system using artificial intelligence (AI) and machine learning (ML).

ML is a branch of AI that uses algorithms and data to emulate the way people learn, gradually improving its accuracy over time [9]. In recent years, many researchers have combined different machine learning techniques with X-ray and CT images to build workable systems for identifying types of lung cancer. These techniques include Random Forest (RF), Support Vector Machine (SVM), Bayesian Networks (BN), and Convolutional Neural Networks (CNN). More recently, some authors have applied CNNs to histopathological images to differentiate between carcinoma and non-carcinoma images.

A CNN is a deep learning approach that is widely used in image recognition and classification [10,11,12]. It takes an input image, assigns learnable weights and biases to features in the image, and uses them to distinguish one image from another. A CNN is superior to conventional approaches in that it needs very little preprocessing: in traditional techniques, filters have to be designed manually, whereas a CNN learns them from the data itself.

Because of these benefits, CNNs are frequently used for image-related tasks including classification, segmentation, recognition, and medical image analysis. An input image passes through a sequence of layers, such as convolution, pooling, flattening, and fully connected (FC) layers. Activation functions are applied between these layers to introduce non-linearity, which helps the network classify an image accurately.
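The convolution and pooling steps described above can be made concrete with a small NumPy sketch. It applies one hand-picked 3 x 3 filter (a Sobel-like edge detector chosen purely for illustration; a trained CNN learns its own filter weights) with stride 1 and no padding, then applies 2 x 2 max pooling. The random input is a stand-in for one 256 x 256 image channel, and the function names are hypothetical helpers, not part of any library.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray, bias: float = 0.0) -> np.ndarray:
    """Slide `kernel` over a 2D `image` with stride 1 and no padding ("valid").

    As in CNN libraries, this is cross-correlation (the kernel is not flipped).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out


def max_pool2d(feature_map: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))


rng = np.random.default_rng(0)
image = rng.random((256, 256))                 # stand-in for one 256 x 256 channel
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])         # hand-picked filter; a CNN learns its own
feature_map = conv2d_valid(image, sobel_x)
pooled = max_pool2d(feature_map)
print(feature_map.shape, pooled.shape)         # (254, 254) (127, 127)
```

These shapes match the first convolution and pooling rows of Table I. Pooling a 125 x 125 map the same way yields 62 x 62, because the leftover row and column are dropped.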

The primary aim of our research is to provide a feasible, efficient, and accurate ML model that detects lung cancer from histopathological images by classifying benign tissue, adenocarcinoma, and squamous cell carcinoma using a CNN architecture.

II. RELATED WORK

Bijaya Kumar Hatuwal and Himal Chand Thapa [13] created a deep CNN model to identify benign tissue, adenocarcinoma, and squamous cell carcinoma, consisting of one input layer, three hidden layers, and one fully connected layer. They used max-pooling, a dropout rate of 0.1, and the Adam optimizer, and obtained 96.11% training accuracy and 97.2% validation accuracy.

Muayed S. AL-Huseiny and A. S. Sajit [14] proposed a transfer learning based deep neural network (DNN) to detect malignant lung nodules in CT images. They applied a fast preprocessing technique to extract the region of interest (ROI) from the images and adapted the GoogLeNet DNN to their dataset. Running on a machine with a 2.5 GHz Core i3 processor and 16 GB of RAM, their method achieved an accuracy of 94.38%.

Another paper [15] described a lung cancer detection system based on the AlexNet CNN that only distinguishes between malignant and benign lung tumors. The AlexNet model used there consists of 25 layers and takes 227x227x3 input images. The authors used the SGDM (stochastic gradient descent with momentum) optimizer with an initial learning rate of 0.0003, ran their code in MATLAB 2021a, and achieved 96% accuracy.

Ying Su et al. [16] proposed a lung nodule detection approach using Faster R-CNN and evaluated it on the LIDC-IDRI dataset [17]. They used a learning rate of 0.001 and a step size of 70000; their attenuation coefficient, dropout rate, and batch size were 0.1, 0.5, and 64, respectively. Their optimized and improved Faster R-CNN method achieved an accuracy of 91.2%.

Mehedi Masud et al. [18] proposed a classification framework that differentiates among five types of colon and lung tissue (two benign and three malignant) by analyzing their histopathological images. The dataset contained a total of 25000 images. The authors used the DFT (discrete Fourier transform) and DWT (discrete wavelet transform) for feature extraction and then used a CNN-based technique to identify cancer tissues with an accuracy of 96.33%. Satvik Garg and S. Garg [19] conducted another study that reported the results of various pre-trained CNN models.

Another work [20] proposed an automated system for detecting lung malignancies in whole slide images (WSI) of lung tissue using two CNN architectures, ResNet and VGG16, with the goal of classifying image patches as normal or tumor. The authors used SGD as the optimizer, binary cross-entropy as the loss function, and a learning rate of 0.0001. VGG16 (75.41%) outperformed ResNet (72.05%) in patch-level accuracy.

Albert Chon et al. [21] presented a GoogLeNet-based 3D CNN model for lung cancer detection. The dataset contained labeled data for 2101 patients, which the authors divided into training, validation, and test sets of 1261, 420, and 420 patients. During training, dropout with probability 0.3 was applied after each convolution and inception layer, and the Adam optimizer was used with a learning rate of 0.0001. The model achieved an accuracy of 75.1% with an AUC of 0.757.

III. DATASET DESCRIPTION

The dataset used in this study contains 15000 lung histopathology images obtained from the LC25000 lung and colon histopathological image dataset [22]. The 15000 images are divided into three classes: benign tissue, adenocarcinoma, and squamous cell carcinoma. Of these, 11850 were used for training, 3000 for validation, and 150 for testing. All images are in RGB format with a size of 256 x 256 pixels. Samples from the different classes are shown below:

Fig. 1. Dataset Sample (panels: adenocarcinoma, squamous cell carcinoma, benign)
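The split described above can be reproduced with a simple stratified procedure. The sketch below is a hypothetical illustration, not the authors' code. It assumes the 15000 lung images are balanced at 5000 per class (15000 / 3) and that 1000 images per class went to validation, which matches the reported totals of 3000 validation and 11850 training images. The file names are placeholders, and the class labels follow those used in Fig. 7.

```python
import random


def stratified_split(files_by_class, n_test=50, n_val=1000, seed=42):
    """Hold out n_test and n_val files per class; the rest go to training."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, files in sorted(files_by_class.items()):
        shuffled = list(files)
        rng.shuffle(shuffled)
        splits["test"] += [(f, label) for f in shuffled[:n_test]]
        splits["val"] += [(f, label) for f in shuffled[n_test:n_test + n_val]]
        splits["train"] += [(f, label) for f in shuffled[n_test + n_val:]]
    return splits


# Placeholder file names; 5000 per class is an assumption (balanced classes).
classes = ["lung_aca", "lung_n", "lung_scc"]
files_by_class = {c: [f"{c}_{i}.jpeg" for i in range(5000)] for c in classes}
splits = stratified_split(files_by_class)
print({k: len(v) for k, v in splits.items()})  # {'train': 11850, 'val': 3000, 'test': 150}
```

Under these assumptions, validation takes 3000 of the 14850 non-test images (about 20%), consistent with the "about 80% / 20%" split reported in Section V.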

IV. METHODOLOGY

In this study, a CNN model was created to classify three classes of lung tissue. Fig. 2 shows the complete workflow of this research.

Fig. 2. Methodology of Detecting Lung Cancer

The procedure can be divided into two main steps: (i) dataset preparation and (ii) CNN model implementation.

A. Dataset Preparation

Preprocessing the dataset generally improves efficiency and helps avoid poor results [23]. We applied the following steps to prepare the training dataset:

• Outlier Removal: The dataset was examined rigorously for outliers, since outliers can degrade the performance of the model.

• Resizing Images: All images were scaled to 256 x 256 pixels, since CNN models take inputs of a fixed dimension.

• Dataset Normalization: Normalized data can make deep learning models more stable and improve their chance of convergence. Pixel values in an image range from 0 to 255, so we used a min-max normalizer, Eq. (1), to rescale the pixel values of our images:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (1)$$

• Data Augmentation: CNN models usually perform better with more images, so we applied data augmentation to expand the training data. Techniques such as shearing, rotation, shifting, and flipping were used to add variety to the dataset and make the model more robust.
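The normalization and augmentation steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' pipeline. It assumes per-image minimum and maximum in Eq. (1); using the fixed range 0 to 255 mentioned in the text (that is, dividing by 255) is an equally plausible reading. Only random flips and 90-degree rotations are shown as stand-ins; the shearing and shifting used in the paper, and the exact augmentation settings, are not reproduced here.

```python
import numpy as np


def min_max_normalize(image: np.ndarray) -> np.ndarray:
    """Rescale pixel values to [0, 1] using Eq. (1) with per-image min and max."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:                      # constant image: avoid division by zero
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


def random_augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly flip horizontally/vertically, then rotate by a multiple of 90 degrees."""
    if rng.random() < 0.5:
        image = np.fliplr(image)
    if rng.random() < 0.5:
        image = np.flipud(image)
    return np.rot90(image, k=int(rng.integers(0, 4)))


rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)  # stand-in RGB image
x = min_max_normalize(raw)
x_aug = random_augment(x, rng)
print(x.min(), x.max(), x_aug.shape)
```

Flips and 90-degree rotations only rearrange pixels, so they keep the normalized value range and the 256 x 256 x 3 shape intact.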

B. Proposed Model’s Architecture

In this work, we propose a multi-layered CNN model to classify different types of lung tissue from histopathological images. The model has 6 convolution layers and 3 dense layers. The 6 convolution layers have 32, 64, 128, 128, 128, and 64 filters, respectively, each with a 3 x 3 kernel. Every convolution is followed by a Batch Normalization [24] operation, Eq. (2), which speeds up learning, and then by a max-pooling [25] operation with a 2 x 2 pool size.

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \quad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2, \quad \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \quad y_i = \gamma \hat{x}_i + \beta \qquad (2)$$

where m is the mini-batch size, ε is a small constant for numerical stability, and γ and β are learned scale and shift parameters.

Since ReLU [26] generally works well in convolutional networks, all convolution layers use ReLU as the activation function, Eq. (3):

$$f(x) = \max(0, x) \qquad (3)$$
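For reference, the sketch below applies Eq. (2) in training mode, followed by the ReLU of Eq. (3), per channel to a batch of feature maps in (batch, height, width, channels) layout. The batch shape mirrors the conv2d_5 output in Table I. The values epsilon = 1e-3, gamma = 1, and beta = 0 are assumptions (they are the Keras defaults; the paper does not state them). At inference time, batch normalization uses running averages of the mean and variance instead of batch statistics, which this sketch omits.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Eq. (2) for NHWC input: statistics over batch, height, width, per channel."""
    mu = x.mean(axis=(0, 1, 2), keepdims=True)   # mini-batch mean per channel
    var = x.var(axis=(0, 1, 2), keepdims=True)   # biased mini-batch variance per channel
    x_hat = (x - mu) / np.sqrt(var + eps)        # normalize
    return gamma * x_hat + beta                  # learned scale (gamma) and shift (beta)


def relu(x):
    """Eq. (3): element-wise max(0, x)."""
    return np.maximum(0.0, x)


rng = np.random.default_rng(1)
feature_maps = rng.normal(loc=5.0, scale=3.0, size=(8, 4, 4, 64))  # 8 maps shaped like conv2d_5
gamma = np.ones(64)    # assumed initial values
beta = np.zeros(64)
z = batch_norm_train(feature_maps, gamma, beta)
y = relu(z)
print(z.mean(axis=(0, 1, 2))[:3], z.std(axis=(0, 1, 2))[:3])
```

After normalization each channel has a mean close to 0 and a standard deviation close to 1. The ReLU then zeroes the negative half.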

A flatten layer is placed right after the 6 convolution layers; it converts the feature maps into a one-dimensional array for the next layer. It is followed by 3 consecutive dense layers with 512, 64, and 3 units, respectively; as Table I shows, the first two are each followed by batch normalization. The last dense layer has 3 nodes because we classify three types of lung tissue, and a softmax activation function, Eq. (4), is applied to it:

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \dots, K \qquad (4)$$

where K = 3 is the number of classes.
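Below is a numerically stable NumPy sketch of the softmax in Eq. (4), paired with the categorical cross-entropy loss listed in Table II. Subtracting the largest logit before exponentiating leaves the result unchanged but prevents overflow. The logits and one-hot labels are made-up values, and the clipping constant eps is an assumption used only to avoid log(0).

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, Eq. (4), stabilized by subtracting each row's max."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def categorical_crossentropy(y_true, probs, eps=1e-7):
    """Batch mean of -sum(y_true * log(p)); eps guards against log(0)."""
    probs = np.clip(probs, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(probs), axis=-1)))


logits = np.array([[2.0, 0.5, -1.0],     # made-up outputs of the 3-unit dense layer
                   [0.1, 0.2, 3.0]])
y_true = np.array([[1, 0, 0],            # one-hot labels for the 3 classes
                   [0, 0, 1]])
probs = softmax(logits)
print(probs.sum(axis=1))                 # [1. 1.]
print(categorical_crossentropy(y_true, probs))
```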

TABLE I. PROPOSED MODEL SUMMARY

Layer | Shape of Output
conv2d_0 | (None, 254, 254, 32)
batch_normalization_0 | (None, 254, 254, 32)
max_pooling2d_0 | (None, 127, 127, 32)
conv2d_1 | (None, 125, 125, 64)
batch_normalization_1 | (None, 125, 125, 64)
max_pooling2d_1 | (None, 62, 62, 64)
conv2d_2 | (None, 60, 60, 128)
batch_normalization_2 | (None, 60, 60, 128)
max_pooling2d_2 | (None, 30, 30, 128)
conv2d_3 | (None, 28, 28, 128)
batch_normalization_3 | (None, 28, 28, 128)
max_pooling2d_3 | (None, 14, 14, 128)
conv2d_4 | (None, 12, 12, 128)
batch_normalization_4 | (None, 12, 12, 128)
max_pooling2d_4 | (None, 6, 6, 128)
conv2d_5 | (None, 4, 4, 64)
batch_normalization_5 | (None, 4, 4, 64)
max_pooling2d_5 | (None, 2, 2, 64)
flatten_1 | (None, 256)
dense_0 | (None, 512)
batch_normalization_6 | (None, 512)
dense_1 | (None, 64)
batch_normalization_7 | (None, 64)
dense_2 | (None, 3)
activation | (None, 3)

Total params: 631,299
Trainable params: 629,059
Non-trainable params: 2,240
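The output shapes and parameter counts in Table I can be checked by hand. The pure-Python sketch below assumes "valid" (no) padding for the 3 x 3 convolutions, which is what the shrinkage from 256 to 254 implies. It also assumes four parameters per batch-normalization channel: gamma and beta are trainable, and the moving mean and variance are non-trainable. Under these assumptions it reproduces the totals in Table I.

```python
def conv_out(size, kernel=3):
    return size - kernel + 1          # valid padding, stride 1


def pool_out(size, pool=2):
    return size // pool               # 2 x 2 max pooling, remainder dropped


size, channels = 256, 3               # 256 x 256 RGB input
trainable = non_trainable = 0
for filters in [32, 64, 128, 128, 128, 64]:
    trainable += 3 * 3 * channels * filters + filters   # conv kernel weights + biases
    trainable += 2 * filters                             # batch norm gamma, beta
    non_trainable += 2 * filters                         # batch norm moving mean, variance
    size = pool_out(conv_out(size))
    channels = filters

flat = size * size * channels
in_units = flat
for units, has_bn in [(512, True), (64, True), (3, False)]:
    trainable += in_units * units + units                # dense weights + biases
    if has_bn:
        trainable += 2 * units
        non_trainable += 2 * units
    in_units = units

print(size, flat)                                        # 2 256
print(trainable + non_trainable, trainable, non_trainable)  # 631299 629059 2240
```

The 2,240 non-trainable parameters are the moving statistics of the eight batch-normalization layers (1,120 channels in total, two values each).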

C. Parameters used in Training

For our proposed model, we configured several training parameters, such as the optimizer, learning rate, metrics, batch size, number of epochs, and callbacks. Table II lists the training parameters used in our model:

TABLE II. TRAINING PARAMETERS USED IN THE MODEL

Name of Parameter | Value
Optimizer | Adam
Learning Rate (Initial) | 0.01
Learning Rate (Minimum) | 0.000001
Regularizer | L1 (0.000001)
Batch Size |
Epochs |
Steps per Epoch | 593
Loss Function | Categorical Crossentropy
Metrics | Accuracy, Precision, Recall, Loss
Callbacks | ReduceLROnPlateau
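Table II lists the ReduceLROnPlateau callback with an initial learning rate of 0.01 and a floor of 0.000001, but it does not give the reduction factor or patience. The framework-free sketch below illustrates the logic of such a callback. The class name, the factor (0.5), the patience (2 epochs), and the validation-loss sequence are hypothetical values chosen for illustration. Library implementations such as the Keras callback add refinements like `cooldown` and `min_delta`, which are omitted here.

```python
class PlateauScheduler:
    """Minimal reduce-on-plateau logic: shrink the LR when a monitored loss stalls."""

    def __init__(self, lr=0.01, min_lr=1e-6, factor=0.5, patience=2):
        self.lr, self.min_lr = lr, min_lr
        self.factor, self.patience = factor, patience
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        if val_loss < self.best:            # improvement: remember it, reset the counter
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:  # stalled for `patience` epochs: reduce the LR
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


sched = PlateauScheduler()
history = [0.90, 0.60, 0.61, 0.62, 0.63, 0.64, 0.50]  # hypothetical validation losses
for epoch, loss in enumerate(history, start=1):
    print(epoch, sched.step(loss))
```

With this hypothetical sequence, the learning rate halves at epochs 4 and 6. The `max` call keeps it from dropping below the 0.000001 floor.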

D. Evaluation Tools

Python (version 3.X) was used for the whole experiment, including dataset preparation, model implementation, and evaluation.

V. RESULT ANALYSIS

From the whole dataset, 50 images from each class were set aside for testing. The remaining images were split so that about 80% of the data went into training and 20% into validation. The model achieved training and validation accuracies of 98.15% and 98.07%, respectively. Figs. 3 and 4 show the accuracy and loss curves, respectively, for both training and validation.

Fig. 3. Accuracy for Both Training and Validation

Fig. 4. Loss for Both Training and Validation

We also evaluated the accuracy of several pre-trained CNN models on the same dataset with the same hyperparameters and compared them with our proposed CNN model. The models we tried are DenseNet201, ResNet152V2, MobileNetV2, InceptionV3, Xception, InceptionResNetV2, VGG16, VGG19, and ResNet50. All of them performed worse than our proposed model. Among the pre-trained models, DenseNet201 achieved the highest training accuracy (95.41%) and MobileNetV2 the highest validation accuracy (95.03%); both are lower than the training and validation accuracy of our proposed model. We therefore conclude that, on this dataset, our approach to lung cancer diagnosis achieved higher accuracy than the transfer learning approaches we evaluated.

Table III compares the accuracy of the pre-trained models and our proposed CNN model on the same dataset.

TABLE III. TRAINING AND VALIDATION ACCURACY COMPARISON OF PROPOSED AND PRE-TRAINED CNN MODELS

Model | Training Accuracy | Validation Accuracy
Proposed Model | 98.15% | 98.07%
DenseNet201 | 95.41% | 94.10%
ResNet152V2 | 94.55% | 93.53%
MobileNetV2 | 94.23% | 95.03%
InceptionV3 | 93.79% | 93.20%
Xception | 93.72% | 92.30%
InceptionResNetV2 | 93.00% | 92.60%
VGG16 | 91.91% | 91.77%
VGG19 | 90.62% | 82.50%
ResNet50 | 74.68% | 51.50%

Figs. 5 and 6 show the training and validation accuracy curves, respectively, for the different models.

Fig. 5. Training Accuracy Comparison of Different Models

Fig. 6. Validation Accuracy Comparison of Different Models

To understand our results better, we examined the confusion matrix on the test samples, which gives a clear overview of which samples are classified correctly or incorrectly [27].

Fig. 7. Confusion Matrix on Test Samples

Fig. 7 shows the model's confusion matrix on the selected test (unseen) samples, which include lung adenocarcinoma (lung_aca), lung squamous cell carcinoma (lung_scc), and lung benign tissue (lung_n).

The confusion matrix shows that our model correctly identified all samples of lung adenocarcinoma and lung benign tissue (100% accuracy). For squamous cell carcinoma, 48 instances were classified correctly and 2 were misclassified.

The recall (R), precision (P), and F1 score on the test samples are another good way to check the reliability of a model [28]. With TP, FP, and FN denoting true positives, false positives, and false negatives, respectively, they are defined as

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}$$

Table IV lists the precision, recall, and F1 score for every class in our test set.
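These metrics can be computed directly from a confusion matrix. Since Fig. 7 is not reproduced here, the matrix below is reconstructed from the text: all 50 adenocarcinoma and all 50 benign samples are correct, and 48 of the 50 squamous cell carcinoma samples are correct. It also assumes that the two misclassified squamous cell carcinoma samples were predicted as adenocarcinoma. That placement is the one consistent with the .96 adenocarcinoma precision in Table IV. Rows are true classes and columns are predicted classes.

```python
import numpy as np

labels = ["lung_aca", "lung_n", "lung_scc"]
cm = np.array([[50, 0, 0],     # rows: true class, columns: predicted class
               [0, 50, 0],
               [2, 0, 48]])    # assumed: 2 lung_scc samples predicted as lung_aca

tp = np.diag(cm).astype(float)
fp = cm.sum(axis=0) - tp       # predicted as this class, but actually another class
fn = cm.sum(axis=1) - tp       # actually this class, but predicted as another class
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name:9s} P={p:.2f} R={r:.2f} F1={f:.2f}")
print(f"accuracy={accuracy:.2f}, macro F1={f1.mean():.2f}")
```

Rounded to two decimals, these values agree with the per-class figures in Table IV. The overall accuracy is 148/150, or about 0.99.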

TABLE IV. CLASSIFICATION REPORT BASED ON TEST SAMPLES

Name of Classes | Precision | Recall | F1 Score | Support
Lung Adenocarcinoma | .96 | 1.00 | .98 | 50
Lung Benign | 1.00 | 1.00 | 1.00 | 50
Lung Squamous Cell Carcinoma | 1.00 | .96 | .98 | 50
Accuracy | | | .99 | 50*3=150
Macro Average | .99 | .99 | .99 | 150
Weighted Average | .99 | .99 | .99 | 150

VI. FUTURE WORK

In the future, other CNN architectures combined with hyperparameter tuning may achieve better accuracy than the current model. This work may also be extended to CT imaging problems. It may further be possible to build a mobile application that provides real-time detection and broadens the use of our technique.

VII. CONCLUSION

This work presents a CNN model to detect lung cancer from histopathological images. The dataset consisted of 15000 images, and our experimental findings showed training and validation accuracies of 98.15% and 98.07%, respectively. We expect this model to help pathologists classify lung tissue (benign tissue, adenocarcinoma, and squamous cell carcinoma) with less time, effort, and cost.

REFERENCES

[1] American Cancer Society, 2020. Lung Cancer Statistics. [Online]. Available: https://www.cancer.org/cancer/lungcancer/about/key-statistics.html

[2] American Cancer Society, 2019. Lung Cancer Causes. [Online]. Available: https://www.cancer.org/cancer/lungcancer/causes-risks-prevention/what-causes.html

[3] Nie, L., Zhang, L., Yang, Y., Wang, M., Hong, R. and Chua, T.S., 2015, October. Beyond doctors: Future health prediction from multimedia and multimodal observations. In Proceedings of the 23rd ACM international conference on Multimedia (pp. 591-600).

[4] Kumar, P., Bhattacharyya, G.S., Dattatreya, S. and Malhotra, H., 2009. Tackling the cancer Tsunami. Indian journal of cancer, 46(1), p.1.

[5] Silvestri, G.A., Gould, M.K., Margolis, M.L., Tanoue, L.T., McCrory, D., Toloza, E. and Detterbeck, F., 2007. Noninvasive staging of non-small cell lung cancer: ACCP evidenced-based clinical practice guidelines. Chest, 132(3), pp.178S-201S.

[6] Travis, W.D., Brambilla, E., Noguchi, M., Nicholson, A.G., Geisinger, K.R., Yatabe, Y., Beer, D.G., Powell, C.A., Riely, G.J., Van Schil, P.E. and Garg, K., 2011. International association for the study of lung cancer/american thoracic society/european respiratory society international multidisciplinary classification of lung adenocarcinoma. Journal of thoracic oncology, 6(2), pp.244-285.

[7] Collins, L.G., Haines, C., Perkel, R. and Enck, R.E., 2007. Lung cancer: diagnosis and management. American family physician, 75(1), pp.56-63.

[8] Yu, K.H., Zhang, C., Berry, G.J., Altman, R.B., Ré, C., Rubin, D.L. and Snyder, M., 2016. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nature communications, 7(1), pp.1-10.

[9] Michie, D., Spiegelhalter, D.J. and Taylor, C.C., 1994. Machine learning, neural and statistical classification.

[10] O'Shea, K. and Nash, R., 2015. An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458.

[11] Hijazi, S., Kumar, R. and Rowen, C., 2015. Using convolutional neural networks for image recognition. Cadence Design Systems Inc.: San Jose, CA, USA, pp.1-12.

[12] Sultana, F., Sufian, A. and Dutta, P., 2018, November. Advancements in image classification using convolutional neural network. In 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN) (pp. 122-129). IEEE.

[13] Hatuwal, B.K. and Thapa, H.C., 2020. Lung Cancer Detection Using Convolutional Neural Network on Histopathological Images. Int. J. Comput. Trends Technol, 68, pp.21-24.

[14] AL-Huseiny, M.S. and Sajit, A.S., 2021. Transfer learning with GoogLeNet for detection of lung cancer. Indonesian Journal of Electrical Engineering and Computer Science, 22(2), pp.1078-1086.

[15] Agarwal, A., Patni, K. and Rajeswari, D., 2021, July. Lung Cancer Detection and Classification Based on Alexnet CNN. In 2021 6th International Conference on Communication and Electronics Systems (ICCES) (pp. 1390-1397). IEEE.

[16] Su, Y., Li, D. and Chen, X., 2021. Lung nodule detection based on faster R-CNN framework. Computer Methods and Programs in Biomedicine, 200, p.105866.

[17] Armato III, S.G., McLennan, G., Bidaut, L., McNitt-Gray, M.F., Meyer, C.R., Reeves, A.P., Zhao, B., Aberle, D.R., Henschke, C.I., Hoffman, E.A. and Kazerooni, E.A., 2011. The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans. Medical physics, 38(2), pp.915-931.

[18] Masud, M., Sikder, N., Nahid, A.A., Bairagi, A.K. and AlZain, M.A., 2021. A machine learning approach to diagnosing lung and colon cancer using a deep learning-based classification framework. Sensors, 21(3), p.748.

[19] Garg, S. and Garg, S., 2020, December. Prediction of lung and colon cancer through analysis of histopathological images by utilizing Pre-trained CNN models with visualization of class activation and saliency maps. In 2020 3rd Artificial Intelligence and Cloud Computing Conference (pp. 38-45).

[20] Šarić, M., Russo, M., Stella, M. and Sikora, M., 2019, June. CNN-based method for lung cancer detection in whole slide histopathology images. In 2019 4th International Conference on Smart and Sustainable Technologies (SpliTech) (pp. 1-4). IEEE.

[21] Chon, A., Balachandar, N. and Lu, P., 2017. Deep convolutional neural networks for lung cancer detection. Stanford University.

[22] Borkowski, A.A., Bui, M.M., Thomas, L.B., Wilson, C.P., DeLand, L.A. and Mastorides, S.M., 2019. Lung and colon cancer histopathological image dataset (lc25000). arXiv preprint arXiv:1912.12142.

[23] Pal, K.K. and Sudeep, K.S., 2016, May. Preprocessing for image classification by convolutional neural networks. In 2016 IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT) (pp. 1778-1781). IEEE.

[24] Ioffe, S. and Szegedy, C., 2015, June. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning (pp. 448-456). PMLR.

[25] Scherer, D., Müller, A. and Behnke, S., 2010, September. Evaluation of pooling operations in convolutional architectures for object recognition. In International conference on artificial neural networks (pp. 92-101). Springer, Berlin, Heidelberg.

[26] Agarap, A.F., 2018. Deep learning using rectified linear units (relu). arXiv preprint arXiv:1803.08375.

[27] Ting, K.M., 2017. Confusion matrix. Encyclopedia of Machine Learning and Data Mining, 260.

[28] Goutte, C. and Gaussier, E., 2005, March. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In European conference on information retrieval (pp. 345-359). Springer, Berlin, Heidelberg.

Pipelined Structure in the Classification of Skin Lesions Based on Alexnet CNN and SVM Model With Bi-Sectional Texture Features

Article

Full-text available

Jan 2024

The classification of skin lesions is crucial because it increases the likelihood that malignant skin lesions will be discovered early on, allowing for more effective treatment. Due to the abundance of lesion images and the possibility of human error, early detection can be difficult for dermatologists. This work aims to classify skin lesions using two pipelines that were designed using support vector machine (SVM) and AlexNet convolutional neural network (CNN) models. Pipeline-1 uses the AlexNet CNN, while pipeline-2 proposes a bisectional feature extraction approach with an SVM model. The skin lesion images are initially preprocessed and the lesion regions are segmented. The lesion regions are further subdivided into four regions based on the intensity mapping function. The bisectional features are then extracted from the subdivided regions and the extracted features are trained with the SVM model. The dataset used in the experiment is the HAM-10000 dataset and the PAD-UFES-20 dataset, which consists of dermatoscopic skin lesions images. Based on the models’ accuracy, sensitivity, DCI, specificity, and F1-score, the experiment’s findings will be assessed for five different skin lesion conditions. By accurately and effectively classifying skin lesions, the study’s findings will help in the diagnosis and treatment of skin disorders. The SVM pipeline performs better than the AlexNet CNN pipeline where the SVM pipeline and AlexNet CNN pipeline result in an accuracy of 98.66% and 97.68% respectively for the HAM-10000 dataset. The AlexNet CNN and SVM pipeline structure results in an accuracy of 96.87% and 98.10% respectively for the PAD-UFES-20 dataset.

Colon and lung cancer classification from multi-modal images using resilient and efficient neural network architectures

Article

Full-text available

May 2024

Automatic classification of colon and lung cancer images is crucial for early detection and accurate diagnostics. However, there is room for improvement to enhance accuracy, ensuring better diagnostic precision. This study introduces two novel dense architectures (D1 and D2) and emphasizes their effectiveness in classifying colon and lung cancer from diverse images. It also highlights their resilience, efficiency, and superior performance across multiple datasets. These architectures were tested on various types of datasets, including NCT-CRC-HE-100K (set of 100,000 non-overlapping image patches from hematoxylin and eosin (H&E) stained histological images of human colorectal cancer (CRC) and normal tissue), CRC-VAL-HE-7K (set of 7180 image patches from N = 50 patients with colorectal adenocarcinoma, no overlap with patients in NCT-CRC-HE-100K), LC25000 (Lung and Colon Cancer Histopathological Image), and IQ-OTHNCCD (Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases), showcasing their effectiveness in classifying colon and lung cancers from histopathological and Computed Tomography (CT) scan images. This underscores the multi-modal image classification capability of the proposed models. Moreover, the study addresses imbalanced datasets, particularly in CRC-VAL-HE-7K and IQ-OTHNCCD, with a specific focus on model resilience and robustness. To assess overall performance, the study conducted experiments in different scenarios. The D1 model achieved an impressive 99.80 % accuracy on the NCT-CRC-HE-100K dataset, with a Jaccard Index (J) of 0.8371, a Matthew's Correlation Coefficient (MCC) of 0.9073, a Cohen's Kappa (Kp) of 0.9057, and a Critical Success Index (CSI) of 0.8213. When subjected to 10-fold cross-validation on LC25000, the D1 model averaged (avg) 99.96 % accuracy (avg J, MCC, Kp, and CSI of 0.9993, 0.9987, 0.9853, and 0.9990), surpassing recent reported performances. 
Furthermore, the ensemble of D1 and D2 reached 93 % accuracy (J, MCC, Kp, and CSI of 0.7556, 0.8839, 0.8796, and 0.7140) on the IQ-OTHNCCD dataset, exceeding recent benchmarks and aligning with other reported results. Efficiency evaluations were conducted in various scenarios. For instance, training on only 10 % of LC25000 resulted in high accuracy rates of 99.19 % (J, MCC, Kp, and CSI of 0.9840, 0.9898, 0.9898, and 0.9837) (D1) and 99.30 % (J, MCC, Kp, and CSI of 0.9863, 0.9913, 0.9913, and 0.9861) (D2). In NCT-CRC-HE-100K, D2 achieved an impressive 99.53 % accuracy (J, MCC, Kp, and CSI of 0.9906, 0.9946, 0.9946, and 0.9906) with training on only 30 % of the dataset and testing on the remaining 70 %. When tested on CRC-VAL-HE-7K, D1 and D2 achieved 95 % accuracy (J, MCC, Kp, and CSI of 0.8845, 0.9455, 0.9452, and 0.8745) and 96 % accuracy (J, MCC, Kp, and CSI of 0.8926, 0.9504, 0.9503, and 0.8798), respectively, outperforming previously reported results and aligning closely with others. Lastly, training D2 on just 10 % of NCT-CRC-HE-100K and testing on CRC-VAL-HE-7K resulted in significant outperformance of InceptionV3, Xception, and DenseNet201 benchmarks, achieving an accuracy rate of 82.98 % (J, MCC, Kp, and CSI of 0.7227, 0.8095, 0.8081, and 0.6671). Finally, using explainable AI algorithms such as Grad-CAM, Grad-CAM++, Score-CAM, and Faster Score-CAM, along with their emphasized versions, we visualized the features from the last layer of DenseNet201 for histopathological as well as CT-scan image samples. The proposed dense models, with their multi-modality, robustness, and efficiency in cancer image classification, hold the promise of significant advancements in medical diagnostics. They have the potential to revolutionize early cancer detection and improve healthcare accessibility worldwide.

Improved Water Strider Algorithm With Convolutional Autoencoder for Lung and Colon Cancer Detection on Histopathological Images

Article

Full-text available

Jan 2023

Lung and colon cancers are deadly diseases that can develop concurrently in organs and undesirably affect human life in some special cases. The detection of these cancers from histopathological images poses a complex challenge in medical diagnostics. Advanced image processing techniques, including deep learning algorithms, offer a solution by analyzing intricate patterns and structures in histopathological slides. The integration of artificial intelligence in histopathological analysis not only improves the proficiency of cancer detection but also holds the potential to increase prognostic assessments, eventually contributing to effective treatment strategies for patients with lung and colon cancers. This manuscript presents an Improved Water Strider Algorithm with Convolutional Autoencoder for Lung and Colon Cancer Detection (IWSACAE-LCCD) on HIs. The major aim of the IWSACAE-LCCD technique aims to detect lung and colon cancer. For noise removal process, median filtering (MF) approach can be used. Besides, deep convolutional neural network based MobileNetv2 model can be applied as a feature extractor with IWSA based hyperparameter optimizer. Finally, convolutional autoencoder (CAE) model can be applied to detect the presence of lung and colon cancer. To enhance the detection results of the IWSACAE-LCCD technique, a series of simulations were performed. The obtained results highlighted that the IWSACAE-LCCD technique outperforms other approaches in terms of different measures.

Comparative Analysis of Federated Learning and Centralized Approach for Detecting Different Lung Diseases

Conference Paper

Aug 2023

Access to a large dataset is necessary to detect diseases with high accuracy. However, data confidentiality and privacy restrictions make collecting data from hospitals and other organizations a significant challenge in the healthcare sector. For this reason, Federated Learning (FL), a decentralized approach, was developed to replace the conventional centralized machine learning methodology in building improved screening methods. Because federated learning does not require data to be centralized, patient data remain with the institutions that hold them. In this paper, we compare two types of models, a sequential model and an ensemble model, under both the federated and the centralized approach. For each approach, the models were applied separately to X-ray images to detect two lung diseases: lung cancer and tuberculosis. We also analyze their accuracy and, through this comparison, show how FL can be the most effective strategy.

Early Diagnosis of Lung Tumors for Extending Patients’ Life Using Deep Neural Networks

Article

Jun 2023
CMC-COMPUT MATER CON

Lung cancer analysis is of growing concern to the medical community. Manual segmentation of lung cancers by medical experts is time-consuming and needs to be automated. The objective of this study is to diagnose lung tumors at an early stage using deep learning techniques in order to extend patients' lives. A Computer-Aided Diagnostic (CAD) system aids diagnosis and shortens the time needed to detect a tumor. Deep Neural Networks (DNN) have also proven to be an excellent and effective method for classification and segmentation tasks. This research aims to isolate lung cancers in Magnetic Resonance Imaging (MRI) images using threshold segmentation. The Honey hook process categorizes lung cancer based on characteristics retrieved using several classifiers. Building on this principle, the work presents an image compression solution using a Deep Wave Auto-Encoder (DWAE). The combination of the two approaches significantly reduces the overall size of the feature set required for any subsequent classification performed with the DNN. The proposed DWAE-DNN image classifier is applied to a lung imaging dataset with a Radial Basis Function (RBF) classifier, with which the study reported promising results and an accuracy of 97.34%, whereas the Decision Tree (DT) classifier reached an accuracy of 94.24%. The proposed DWAE-DNN approach classifies images as either malignant or normal with an accuracy of 98.67%. In addition to accuracy, the work uses benchmark measures such as specificity, sensitivity, and precision to evaluate the efficiency of the network. The investigation found that the DT classifier provides the maximum performance in the DWAE-DNN, based on the network's performance on image testing as shown by the data acquired by the classifiers themselves.

Comparative Analysis of Federated Learning and Centralized Approach for Detecting Different Lung Diseases

Preprint

May 2023

Investigating Deep Learning Methods for Detecting Lung Adenocarcinoma on the TCIA Dataset

Article

Dec 2023

Lung cancer, one of the deadliest diseases worldwide, is treatable, and survival rates increase with early detection and treatment. CT scans are the most advanced imaging modality in clinical practice, but interpreting them and identifying cancer in CT images can be difficult for doctors. Automated detection therefore helps doctors identify malignant cells. A variety of techniques, including deep learning and image processing, have been extensively examined and evaluated. The objective of this study is to evaluate different transfer learning models by optimizing variables including learning rate (LR), batch size (BS), and number of epochs. This study presents an enhanced model that achieves improved accuracy and faster processing times. Three models, VGG16, ResNet-50, and a CNN Sequential Model, were evaluated while varying the learning rate, batch size, and epochs. After extensive experiments, the CNN Sequential model was found to work best among the three, with an accuracy of 94.1% and a processing time of 1620 seconds, whereas VGG16 and ResNet50 reached accuracies of 95.0% and 93% with processing times of 5865 seconds and 9460 seconds, respectively.

Privacy Preserving Federated Learning for Lung Cancer Classification

Conference Paper

Dec 2023

Lung cancer is characterized by high mortality and incidence rates, making it one of the most prevalent cancers globally. Early detection significantly improves the chances of survival for individuals affected by this disease, and histopathological diagnosis is crucial for determining the specific type of cancer. In recent years, novel computer-aided diagnostic techniques that use deep learning for the early detection of lung cancer have increased significantly. However, regulations such as HIPAA and GDPR significantly restrict the sharing of sensitive patient data, primarily due to privacy concerns. Under these constraints, institutions find it difficult to exchange the information needed to improve the accuracy of lung cancer classification. To address privacy in lung cancer classification, we propose a federated learning approach in which local models with an Inception-v3 backbone classify histopathological images of lung cancer and the global model is updated from the local weights. The images were obtained from the LC25000 dataset, whose lung cancer subset consists of three distinct classes of 5000 images each. The applied model achieved a classification accuracy of 99.867% in categorizing lung cancer images into the three classes, outperforming other existing methodologies. Furthermore, this solution addresses the privacy concerns associated with sharing medical data among institutions.
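
The abstract says the global model is updated from the local weights but does not spell out the aggregation rule. One common choice is federated averaging (FedAvg, McMahan et al.), which weights each client's parameters by its share of the training samples. The sketch below assumes that rule and flattens each model into a plain list of floats, which a real Inception-v3 model would not be; the function name is hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: each global parameter is the average of the
    clients' parameters, weighting client k by n_k / n (its share of samples)."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one weight vector and one sample count per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    global_weights = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != n_params:
            raise ValueError("all clients must share the same model shape")
        for i, w in enumerate(weights):
            global_weights[i] += (size / total) * w
    return global_weights

# Two hypothetical hospitals: the second holds three times as many images.
new_global = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the weight vectors leave each site; the images themselves never do. In a full training loop, the server would send `new_global` back to the clients as the starting point for the next round of local training.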

A Novel Deep Learning Approach for Colon and Lung Cancer Classification Using Histopathological Images

Conference Paper

Oct 2023

CellIdentifier: Classification of Peripheral Blood Cell Images using Deep Learning

Conference Paper

Jul 2023

Lung Cancer Detection and Classification Based on Alexnet CNN

Conference Paper

Jul 2021

Transfer learning with GoogLeNet for detection of lung cancer

Article

May 2021

The use of computer algorithms to fill or assist the roles of specialists has gained momentum, especially in early diagnosis scenarios. This paper proposes employing deep neural networks (DNN) to detect lung computed tomography (CT) images that contain malignant nodules. The method first applies a simple, fast pre-processing step that isolates the region of interest (ROI), that is, the area dominated by the lungs, ridding the images of other surrounding tissues and artefacts. The centered, size-normalized images are then fed to a deep neural network for training and validation. In this work, transfer learning is used to readjust the GoogLeNet DNN to this medical data: the final layers of the DNN are allowed to evolve while the deep layers are restricted. In this setting, a rough, unprocessed dataset, the IQ-OTH/NCCD lung cancer dataset, was used to train and validate the proposed algorithm. Experimental results show that the algorithm scores 94.38% accuracy, outperforming the benchmark method previously used with this dataset.

Prediction of lung and colon cancer through analysis of histopathological images by utilizing Pre-trained CNN models with visualization of class activation and saliency maps

Conference Paper

Dec 2020

Colon and lung cancers are among the most perilous ailments individuals endure worldwide and have become a general medical problem. To lessen the risk of death, a proper and early diagnosis is particularly required. However, this is a truly difficult task that depends on the experience of histopathologists; an underprepared histologist may even endanger a patient's life. Recently, deep learning has gained momentum and is increasingly valued in medical image analysis. This paper uses and modifies existing pre-trained CNN-based models to identify lung and colon cancer from histopathological images with better augmentation techniques. Eight distinct pre-trained CNN models, VGG16, NASNetMobile, InceptionV3, InceptionResNetV2, ResNet50, Xception, MobileNet, and DenseNet169, are trained on the LC25000 dataset. Model performance is assessed using precision, recall, F1 score, accuracy, and AUROC. The results show that all eight models achieved noteworthy results, with accuracies ranging from 96% to 100%. GradCAM and SmoothGrad are also used to visualize the attention of the pre-trained CNN models when classifying malignant and benign images.

A Machine Learning Approach to Diagnosing Lung and Colon Cancer Using a Deep Learning-Based Classification Framework

Article

Jan 2021
SENSORS-BASEL

The field of medicine and healthcare has made revolutionary advances in the last forty years. Within this period, the underlying causes of numerous diseases were uncovered, novel diagnostic methods were designed, and new medicines were developed. Despite these achievements, diseases such as cancer continue to haunt us because we remain vulnerable to them. Cancer is the second leading cause of death globally, accounting for about one in every six deaths. Among the many types of cancer, the lung and colon variants are among the most common and deadliest; together, they account for more than 25% of all cancer cases. However, identifying the disease at an early stage significantly improves the chances of survival. Cancer diagnosis can be automated using Artificial Intelligence (AI), which allows more cases to be assessed in less time and at lower cost. Using modern Deep Learning (DL) and Digital Image Processing (DIP) techniques, this paper presents a classification framework that differentiates among five types of lung and colon tissues (two benign and three malignant) by analyzing their histopathological images. The results show that the proposed framework identifies cancer tissues with a maximum accuracy of 96.33%. This model can help medical professionals develop an automatic and reliable system capable of identifying various types of lung and colon cancers.

Lung Cancer Detection Using Convolutional Neural Network on Histopathological Images

Article

Oct 2020

Lung cancer is one of the leading causes of cancer death worldwide. Early detection and treatment are crucial for patient recovery. Medical professionals diagnose lung cancer from histopathological images of tissue biopsied from potentially affected areas of the lungs. Most of the time, however, diagnosing the type of lung cancer is error-prone and time-consuming. Convolutional neural networks can identify and classify lung cancer types with greater accuracy in a shorter period, which is crucial for determining the right treatment procedure for patients and their survival rate. Benign tissue, adenocarcinoma, and squamous cell carcinoma are considered in this work. The CNN model obtained training and validation accuracies of 96.11% and 97.2%, respectively.

Deep Learning using Rectified Linear Units (ReLU)

Article

Mar 2018

Abien Fred Agarap

We introduce the use of rectified linear units (ReLU) as the classification function in a deep neural network (DNN). Conventionally, ReLU is used as an activation function in DNNs, with the Softmax function as their classification function. However, several studies have used a classification function other than Softmax, and this study is an addition to those. We accomplish this by taking the activation of the penultimate layer $h_{n - 1}$ in a neural network and multiplying it by weight parameters $\theta$ to get the raw scores $o_{i}$. Afterwards, we threshold the raw scores $o_{i}$ at $0$, i.e. $f(o_{i}) = \max(0, o_{i})$, where $f$ is the ReLU function. We provide class predictions $\hat{y}$ through the argmax function, i.e. $\hat{y} = \arg\max_{i} f(o_{i})$.
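
The scoring rule in this abstract can be written out directly. The minimal pure-Python sketch below assumes θ is stored as a d × c matrix, so that the raw scores are o = θᵀh; bias terms and training are omitted, and the function name is hypothetical.

```python
def relu_classify(h, theta):
    """Score classes as o = theta^T h, apply ReLU, and predict by argmax.

    h     -- penultimate-layer activations, length d
    theta -- d x c weight matrix (a list of d rows, each with c entries)
    """
    n_classes = len(theta[0])
    # Raw scores o_i = sum_j h_j * theta[j][i]
    o = [sum(h[j] * theta[j][i] for j in range(len(h))) for i in range(n_classes)]
    # Threshold at zero: f(o_i) = max(0, o_i)
    scores = [max(0.0, o_i) for o_i in o]
    # argmax; ties (e.g. every score clipped to 0) resolve to the lowest index
    y_hat = max(range(n_classes), key=lambda i: scores[i])
    return scores, y_hat

scores, y_hat = relu_classify([1.0, 2.0], [[1.0, -1.0, 0.5],
                                           [0.5, -2.0, 0.0]])
```

Here the raw scores are [2.0, -5.0, 0.5]; ReLU clips the negative one to 0.0 and the argmax picks class 0. One consequence of this design is that when every raw score is negative, all classes tie at zero, a case Softmax never produces.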

Lung Nodule Detection based on Faster R-CNN Framework

Article

Nov 2020
COMPUT METH PROG BIO

Background: Lung cancer is a high-risk disease worldwide, and lung nodules are the main manifestation of early lung cancer. Automatic detection of lung nodules reduces radiologists' workload as well as the rates of misdiagnosis and missed diagnosis. For this purpose, we propose a Faster R-CNN algorithm for detecting lung nodules. Method: The Faster R-CNN algorithm is used to detect lung nodules, and the training set is used to demonstrate the feasibility of this technique. In theory, parameter optimization can improve the network structure as well as detection accuracy. Result: Experiments show that the best parameters are a base learning rate of 0.001, a step size of 70,000, an attenuation coefficient of 0.1, a dropout value of 0.5, and a batch size of 64. Compared with other traditional algorithms for detecting lung nodules, the optimized and improved algorithm proposed in this paper generally improves detection accuracy by more than 20%. Conclusion: Our experimental results show that the Faster R-CNN-based lung nodule detection method achieves good accuracy and therefore has potential clinical value in lung disease diagnosis. This method can further assist radiologists, as well as researchers designing and developing lung nodule detection systems.
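
The reported base learning rate, step size, and attenuation coefficient read like Caffe's "step" learning-rate policy, whose parameters are named base_lr, stepsize, and gamma. That mapping is an assumption, since the abstract does not name the framework. Under it, the rate is multiplied by the attenuation coefficient once every step-size iterations, lr(t) = 0.001 × 0.1^⌊t / 70000⌋, as in this sketch (the function name is hypothetical):

```python
def step_lr(iteration, base_lr=0.001, stepsize=70_000, gamma=0.1):
    """Step decay: lr = base_lr * gamma ** floor(iteration / stepsize).

    Defaults are the values reported in the abstract, assuming they map
    onto a Caffe-style 'step' policy.
    """
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    return base_lr * gamma ** (iteration // stepsize)

schedule = [step_lr(t) for t in (0, 69_999, 70_000, 140_000)]
```

The rate therefore stays at 0.001 for the first 70,000 iterations, drops to about 0.0001 at iteration 70,000, and to about 0.00001 at iteration 140,000.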

CNN-based Method for Lung Cancer Detection in Whole Slide Histopathology Images

Conference Paper

Jun 2019

Advancements in Image Classification using Convolutional Neural Network

Conference Paper

Nov 2018

Convolutional Neural Networks (CNNs) are the state of the art for image classification. This paper briefly discusses the different components of a CNN and explains different CNN architectures for image classification, showing the advancements in CNNs from LeNet-5 to the latest SENet model. We discuss the model description and training details of each model and draw a comparison among them.

Beyond Doctors: Future Health Prediction from Multimedia and Multimodal Observations

Conference Paper

Oct 2015

Although chronic diseases cannot be cured, they can be effectively controlled as long as we understand their progression from current observational health records, which are often in the form of multimedia data. A large and growing body of literature has investigated the disease progression problem. However, far too little attention has been paid to date to jointly considering the following three observations of chronic disease progression: 1) the health statuses at different time points are chronologically similar; 2) the future health statuses of each patient can be comprehensively revealed from the current multimedia and multimodal observations, such as visual scans, digital measurements, and textual medical histories; and 3) the discriminative capabilities of different modalities vary significantly according to the specific disease. In light of these observations, we propose an adaptive multimodal multi-task learning model that co-regularizes the modality agreement, temporal progression, and discriminative capabilities of different modalities. We show theoretically that our proposed model is a linear system. Before training our model, we address the missing-data problem via matrix factorization. Extensive evaluations on a real-world Alzheimer's disease dataset validate our proposed model. Our model is also applicable to other chronic diseases.

Recommended publications

Non-small cell lung cancer diagnosis aid with histopathological images using Explainable Deep Learni...

Efficient Lung Cancer Classification on Multi level Convolution Neural Network using Histopathologic...

PestDetector: A Deep Convolutional Neural Network to Detect Jute Pests

Lung Cancer Detection Using Convolutional Neural Network on Histopathological Images

Lung Cancer Classification and Prediction of Disease Severity Score Using Deep Learning