International Journal of Electrical and Computer Engineering (IJECE)
Vol. x, No. x, xx 201x, pp. xx∼xx
ISSN: 2088-8708, DOI: 10.11591/ijece.vxix.ppxx-xx
Performance Comparison for Several Deep Learning-Based
Object Detection Algorithms utilizing Thermal Images
Yousef O. Sharrab1, Sanaa AlShboul2, Ala’ Khalifah3, Mohammad Alsmirat2
1Wayne State Multimedia Systems and Deep Learning Research Laboratory, ECE Department, Detroit, USA,
Email:{yousef.sharrab}@wayne.edu
2{Department of Computer Engineering, Department of Computer Science}, Jordan University of Science and
Technology, Jordan, Email: {Smalshboul16, masmirat}@just.edu.jo
3Department of Electrical Engineering, German Jordan University, Email: ala.khalifeh@gju.edu.jo
Article Info
Article history:
Keywords: Object Detection, Detection Accuracy, Models Accuracy Comparison, Thermal Images, TensorFlow 2.0, SSD, ResNet, MobileNet, EfficientDet, Computer Vision, Neural Networks, Deep Learning, Machine Learning
ABSTRACT
Advances in the object detection models used in a variety of fields are the result of
advances in deep learning approaches to machine learning and computer vision.
Applying such models to thermal imaging is especially necessary.
Object detection algorithms are improving continuously, and many Application
Programming Interfaces (APIs) and libraries are available for them; among the most
common are those provided by the Google TensorFlow Object Detection API. Each
object detection model has its advantages and disadvantages. A direct comparison
between the most common state-of-the-art object detection methods helps in finding
the best solution for thermal image detection and recognition systems and applications.
This paper discusses the algorithms and compares them in terms of accuracy and
classification loss. We compare the performance on thermal images of the six most
suitable object detection models that are supported by TensorFlow 2.0. Since we have
only a small dataset (56 images), we use transfer learning to benefit from pre-trained
models.
Copyright © 201x Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
Ala’ Khalifah3,
Department of Electrical Engineering,
German Jordan University,
Email: ala.khalifeh@gju.edu.jo
Journal homepage: http://iaescore.com/journals/index.php/IJECE
1. INTRODUCTION
Recent advancements in Computer Vision (CV) and Object Detection (OD) are driven by the widespread
adoption of Artificial Intelligence (AI) and Deep Learning (DL) technology by industry. Object detection is used
in self-driving cars, medical diagnostic processes, threat detection in security systems, and crop disease
detection [1, 2]. TensorFlow is a deep learning framework that supports many of the latest models in video
communication, video streaming, speech recognition, Natural Language Processing (NLP), and speech synthesis
[3, 4].
Deep learning (DL) has consistently shown the best performance on diverse problems in
the domain of computer vision. Deep learning approaches have recently contributed to the advancement of
the object detection models used in several applications [5, 6]. This paper discusses several standard
state-of-the-art object detection algorithms and compares them in terms of accuracy and classification loss. We
compare the performance on thermal images of the six most suitable object detection models that are supported
by TensorFlow 2.0. Since we have only a small dataset (56 images), we use transfer learning to benefit from
pre-trained models. The TensorFlow Object Detection API is a framework for creating deep learning networks
that solve object detection problems. It contains pre-trained models, trained on several large datasets, that
can be used for inference. Additionally, the framework can be used to apply transfer learning to these
pre-trained models, which lets us customize them for a specific task. The intuition behind transfer learning
for image classification is that if a model is trained on a sufficiently large and general dataset, it will
function effectively as a generic model of the visual world. We can then leverage the learned feature maps
without having to start from scratch by training a large model on a large dataset. As object detection
performance metrics, we use the classification loss and the visual accuracy (True Positive, True Negative,
False Positive, and False Negative) [7, 8].
Advances in the object detection models used in a variety of fields are the result of advances in deep
learning algorithms, including the Convolutional Neural Network (CNN). We show that several object
detection models can be compared on thermal images. We also show that we can use a small number of thermal
images (56) by making use of pre-trained models through a process called transfer learning. We found
that the "EfficientDet D0 512x512" and "SSD ResNet50 V1 FPN 640x640" models perform best, with low
classification loss, more "True" detections, and fewer "False" detections. The paper is organized as follows.
Section 2 provides background information on the object detection models used in the comparison.
Section 3 describes the performance evaluation procedures, techniques, and methodology. Finally, Section 4
presents and analyzes the main results.
2. BACKGROUND INFORMATION
Computer vision is becoming ubiquitous in the world, with applications in image understanding,
smartphone apps, self-driving cars, drones, video communication, video streaming, automated video surveil-
lance, security, safety, health and medicine [9, 10, 11, 12]. The core of many of these applications is visual
recognition tasks such as image classification and object detection. Recent advancements in the neural network
approach have led to major advances in the performance of these state-of-the-art visual recognition systems
[13, 14].
Computer vision is the machine analogue of human vision. Computer vision tasks can be divided
into five different areas, as shown in Figure 1. The ability to sort images into several classes depending
on their content is called image classification. Classifying an object and identifying its location is called
classification + localization. The ability to identify multiple objects in an image and determine their locations
by placing rectangular bounding boxes around them is called object detection. Classifying every pixel in the
image is a process called segmentation. There are two main types of segmentation. Semantic segmentation
refers to the process of linking each pixel in a given image to a particular class label; for example, the pixels
may be labelled as car, tree, pedestrian, etc. These segments are then used to find the interactions and
relations between various objects. Instance segmentation, on the other hand, also associates a class label with
each pixel, but it treats multiple objects of the same class as separate entities [15, 16, 17].
Figure 1. Computer Vision Tasks
Object detection is a core computer vision task, and the TensorFlow Object Detection Application
Programming Interface (TF OD API) has improved greatly. Recently, Google released a new version of the TF OD
API that supports TensorFlow 2.x. The TensorFlow 2 Object Detection API allows us to train a collection of
state-of-the-art object detection models under a unified framework, including models such as
Single Shot Multibox Detector (SSD) (MobileNet/ResNet), Faster R-CNN (ResNet/Inception ResNet), Mask
R-CNN, CenterNet, ExtremeNet, and EfficientDet. More generally, object detection models allow us to train
the computer to identify objects in a scene with bounding boxes and class labels. There are many ways to
model this problem with deep learning techniques, and the TensorFlow 2 Object Detection API allows us
to deploy a wide variety of models and strategies to achieve this goal.
The TensorFlow 2 Object Detection API includes a selection of trainable detection models, including
Region-Based Fully Convolutional Networks (R-FCN) with ResNet 101, Faster R-CNN with ResNet 50, Faster
R-CNN with ResNet 101, Faster R-CNN with Inception ResNet v2, SSD with MobileNet, and SSD with Inception V2.
A list of frozen weights (trained on the COCO dataset) for each of these models, for use in out-of-the-box
inference, can be found in [18, 19, 20, 21]. The API also provides a Jupyter notebook for performing inference
with one of the released models, convenient local training scripts, and distributed training and evaluation
pipelines via Google Cloud.
Several research studies have tried to automatically detect people intruding into unauthorized areas
by utilizing infrared cameras. Study [22] proposes a precise and effective method for detecting intruders in
infrared images during the night-time. An infrared video surveillance system using deep learning is presented
in [23]. Terrorist threats have increased concerns about public security. Unlike RGB cameras, which
do not perform well at night or in adverse weather conditions, thermal cameras have become a necessary part of
advanced video surveillance systems. An analysis applying Faster R-CNN to thermal and RGB images shows
that thermal images perform better than RGB images [24]. In study [25], the authors investigate
automatic people detection in thermal camera images using convolutional neural network architectures
developed for detection in RGB images. They compare the performance of standard state-of-the-art
object detectors such as SSD, Faster R-CNN, and YOLO, retrained on a dataset of thermal images.
In study [26], the authors investigate the utilization of unmanned aerial systems in marine search and
rescue missions for the detection and classification of objects on the surface of the sea. The data consists
of experimental thermal images. The authors of [27] use deep learning with RGB and thermal cameras to
detect pedestrians in aerial images captured by multirotor Unmanned Aerial Vehicles (UAVs). Study [28] uses a
smartphone thermal camera to capture thermal textures, which a deep neural network classifies into
material types. Study [29] proposes using well-known image-to-image translation frameworks to generate
RGB equivalents of a given thermal image and then using an architecture for object detection in the thermal
image. Study [30] reports a visible and thermal drone monitoring system that integrates deep learning-based
detection and tracking.
3. PERFORMANCE EVALUATION METHODOLOGY
3.1. Used Object Detection Algorithms, Dataset, and Performance Metrics
For the experimental setup, we adopted six object detection models from the TensorFlow
2.0 API: "EfficientDet D0 512x512", "SSD MobileNet V1 FPN 640x640", "SSD MobileNet V2 FPNLite
320x320", "SSD MobileNet V2 FPNLite 640x640", "SSD ResNet50 V1 FPN 640x640", and "SSD ResNet101
V1 FPN 640x640". We use the Roboflow Thermal Dogs and People dataset, a collection of 203 thermal
infrared images captured at various distances from people and dogs in a park and near a home. Images were
captured in both portrait and landscape orientation, using the Seek Compact XR Extra Range
Thermal Imaging Camera for iPhone [31]. The metrics we use are the classification loss on the training
and validation sets and the visual accuracy (True Positive, True Negative, False Positive, and False Negative)
on thermal test images. We want to maximize True Positives (a head, a detection) and True Negatives (no head,
no detection), since they are correctly predicted observations; they are displayed in green in Figure 2.
We want to minimize False Positives (no head, a detection) and False Negatives (a head, no detection); they
are shown in red in Figure 2.
Figure 2. False positive, false negative, true positive, and true negative
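As a minimal sketch of how the four outcomes can be tallied for the test images, assume each image is summarized by two values: whether a head is actually present, and the confidence the model reported for a head detection. The function name `outcome`, the 0.5 threshold, and the sample records below are illustrative assumptions and are not taken from the experiments in this paper.

```python
from collections import Counter


def outcome(head_present: bool, confidence: float, threshold: float = 0.5) -> str:
    """Classify one image-level result as TP, FN, FP, or TN.

    The threshold is an assumed cut-off for counting a detection.
    """
    detected = confidence >= threshold
    if head_present and detected:
        return "TP"  # a head, a detection
    if head_present and not detected:
        return "FN"  # a head, no detection
    if not head_present and detected:
        return "FP"  # no head, a detection
    return "TN"      # no head, no detection


# Hypothetical (head_present, confidence) pairs, one per test image.
samples = [(True, 0.87), (True, 0.0), (False, 0.70), (False, 0.0)]
counts = Counter(outcome(present, conf) for present, conf in samples)
print(dict(counts))  # {'TP': 1, 'FN': 1, 'FP': 1, 'TN': 1}
```

In this framing, the green cells of Figure 2 correspond to the "TP" and "TN" counts, and the red cells to "FP" and "FN".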
3.2. Procedure
From the 203 images of the dataset in [31], we select those that show a human head, which gives 56
images, and we annotate them by specifying a mask on the head [32]. We use Google Colab [33] to train,
validate, and test the six object detection models, as shown in Figure 3.
Figure 3. Deep Learning-Based Object detection Training Block Diagram
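The transfer learning step of this procedure can be illustrated with a hedged sketch. With the TensorFlow Object Detection API, a pre-trained model is typically adapted by editing its text pipeline configuration so that training starts from the released checkpoint and predicts the new classes. The sketch below performs that edit with regular expressions. The field names (`fine_tune_checkpoint`, `fine_tune_checkpoint_type`, `num_classes`, `batch_size`, `num_steps`) follow the API's pipeline configuration format as we understand it; the helper name `customize_pipeline`, the trimmed sample configuration, the checkpoint path, and all numeric values are hypothetical placeholders, not the settings used in this paper.

```python
import re


def customize_pipeline(config_text, checkpoint, num_classes, batch_size, num_steps):
    """Return a copy of a pipeline config text adapted for fine-tuning."""
    edits = [
        # Start training from pre-trained weights (transfer learning).
        (r'fine_tune_checkpoint: ".*?"', f'fine_tune_checkpoint: "{checkpoint}"'),
        # Restore the full detector rather than only a classification backbone.
        (r'fine_tune_checkpoint_type: "\w+"', 'fine_tune_checkpoint_type: "detection"'),
        (r'num_classes: \d+', f'num_classes: {num_classes}'),
        (r'batch_size: \d+', f'batch_size: {batch_size}'),
        (r'num_steps: \d+', f'num_steps: {num_steps}'),
    ]
    for pattern, replacement in edits:
        # A function replacement keeps backslashes in paths literal;
        # count=1 changes only the first match (assumed to be the training value).
        config_text = re.sub(pattern, lambda _m, r=replacement: r, config_text, count=1)
    return config_text


# Trimmed, hypothetical stand-in for a model zoo pipeline config.
SAMPLE_CONFIG = '''model { ssd { num_classes: 90 } }
train_config {
  batch_size: 128
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
  fine_tune_checkpoint_type: "classification"
  num_steps: 300000
}
'''

adapted = customize_pipeline(
    SAMPLE_CONFIG,
    checkpoint="pretrained/checkpoint/ckpt-0",  # hypothetical path
    num_classes=1,   # a single "head" class
    batch_size=8,    # placeholder value
    num_steps=2000,  # placeholder value
)
print(adapted)
```

Only the configuration changes between the six models in such a workflow; the training, validation, and test steps of Figure 3 stay the same.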
4. RESULT PRESENTATION AND ANALYSIS
In this section, we analyze the performance of several deep learning-based object detection algorithms
and compare them to select the most suitable algorithm for head detection in thermal images, which can
then be used to estimate the temperature of a person entering a building or a campus.
For the "EfficientDet D0 512x512" model, as shown in Figure 4b and Row 1 of Table 1, the classification
loss is 0.134, the minimum among all the algorithms under test. In addition, this algorithm is able to detect
all the heads in the test images with high probabilities, as illustrated in Row 1 of Table 1 and Figure 4a.
Furthermore, it does not detect the false heads that some other algorithms detected, as illustrated in Row 1 of
Table 2 and Figure 4a. For these reasons, we select this algorithm for our other thermal object detection
applications.
(a) Thermal Images (b) Classification Loss
Figure 4. Object Detection Comparison in Performance, (EfficientDet D0 512x512) Model Test Images
(loss=0.134)
For the "SSD MobileNet V2 FPNLite 640x640" model, as shown in Figure 5b and Row 2 of Table 1, the
classification loss is 0.190, which is close to the minimum. In addition, this algorithm is able to detect all the
heads in the test images with high probabilities, as illustrated in Row 2 of Table 1 and Figure 5a. On the other
hand, it does detect the false heads in the test images, as illustrated in Row 2 of Table 2 and Figure 5a. For
this reason, we do not select this algorithm for our other thermal object detection applications.
(a) Thermal Image 1 (b) Classification Loss
Figure 5. Object Detection Comparison in Performance, (SSD MobileNet V2 FPNLite 640x640) Model Test
Images (loss=0.190)
For the "SSD MobileNet V2 FPNLite 320x320" model, as shown in Figure 6b and Row 3 of Table 1, the
classification loss is 0.212, which is still close to the minimum, but this algorithm is not able to detect all
the heads in the test images, as illustrated in Row 3 of Table 1 and Figure 6a. On the other hand, it does not
detect the false heads in the test images, as illustrated in Row 3 of Table 2 and Figure 6a. Because it misses
a head, we do not select this algorithm for our other thermal object detection applications.
(a) Thermal Image (b) Classification Loss
Figure 6. Object Detection Comparison in Performance, (SSD MobileNet V2 FPNLite 320x320) Model Test
Images (loss=0.212)
For the "SSD ResNet50 V1 FPN 640x640" model, as shown in Figure 7b and Row 4 of Table 1, the
classification loss is 0.213, which is relatively low. In addition, this algorithm is able to detect all the heads
in the test images with high probabilities, as illustrated in Row 4 of Table 1 and Figure 7a. Furthermore, it
does not detect the false heads that some other algorithms detected, as illustrated in Row 4 of Table 2 and
Figure 7a. For these reasons, we select this algorithm for our other thermal object detection applications.
(a) Thermal Image (b) Classification Loss
Figure 7. Object Detection Comparison in Performance, (SSD ResNet50 V1 FPN 640x640) Model Test
Images (loss=0.213)
For the "SSD ResNet101 V1 FPN 640x640" model, as shown in Figure 8b and Row 5 of Table 1, the
classification loss is 0.665, which is far from the minimum. For that reason, we do not examine the other parts
of the figure, and we do not select this algorithm for our other thermal object detection applications.
(a) Thermal Image (b) Classification Loss
Figure 8. Object Detection Comparison in Performance, (SSD ResNet101 V1 FPN 640x640) Model Test
Images (loss=0.665)
For the "SSD MobileNet V1 FPN 640x640" model, as shown in Figure 9b and Row 6 of Table 1, the
classification loss is 0.77, which is too far from the minimum. In addition, it does detect a false head in the
test images, as illustrated in Row 6 of Table 2 and Figure 9a. For these reasons, we do not select this
algorithm for our other thermal object detection applications.
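The selection reasoning in this section can be summarized as a simple filter, sketched below with the values transcribed from Tables 1 and 2. A model is kept when its classification loss is not far from the minimum, it detects a head in every true-positive test image, and it reports no false head. The cut-off `MAX_LOSS = 0.3` is an assumption made only for illustration, since the text says "far from the minimum" without giving a number; the function name `select_models` is likewise hypothetical.

```python
# Each entry: (model, classification loss,
#              confidences in % on Image1-Image6 (head present),
#              confidences in % on Image7-Image8 (no head present)).
# Values are transcribed from Tables 1 and 2.
RESULTS = [
    ("EfficientDet D0 512x512", 0.134, [87, 58, 100, 90, 100, 100], [0, 0]),
    ("SSD MobileNet V2 FPNLite 640x640", 0.190, [99, 96, 100, 96, 100, 100], [70, 63]),
    ("SSD MobileNet V2 FPNLite 320x320", 0.212, [0, 65, 100, 75, 100, 100], [0, 0]),
    ("SSD ResNet50 V1 FPN 640x640", 0.213, [95, 84, 96, 94, 96, 97], [0, 0]),
    ("SSD ResNet101 V1 FPN 640x640", 0.665, [74, 81, 86, 66, 57, 95], [0, 0]),
    ("SSD MobileNet V1 FPN 640x640", 0.77, [100, 94, 100, 97, 100, 100], [58, 0]),
]

MAX_LOSS = 0.3  # assumed cut-off for "far from the minimum"; not stated in the paper


def select_models(results, max_loss=MAX_LOSS):
    """Return the names of models that pass all three selection criteria."""
    selected = []
    for name, loss, true_pos, false_pos in results:
        low_loss = loss <= max_loss
        finds_all_heads = all(conf > 0 for conf in true_pos)
        no_false_heads = all(conf == 0 for conf in false_pos)
        if low_loss and finds_all_heads and no_false_heads:
            selected.append(name)
    return selected


print(select_models(RESULTS))
# ['EfficientDet D0 512x512', 'SSD ResNet50 V1 FPN 640x640']
```

Under these assumptions the filter keeps the same two models that the analysis selects.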
5. CONCLUSIONS
Progress in the object detection models used in a variety of areas is a result of advances
in deep learning algorithms, including the Convolutional Neural Network (CNN). We showed that several
object detection models can be compared on thermal images, and that a small number of images
(56) can be used by benefiting from pre-trained models through a process called transfer learning. We found
that the "EfficientDet D0 512x512" and "SSD ResNet50 V1 FPN 640x640" models perform best in terms of
classification loss and visually show more "True" detections and fewer "False" detections.
REFERENCES
[1] Ming-Ching Chang, Chen-Kuo Chiang, Chun-Ming Tsai, Yun-Kai Chang, Hsuan-Lun Chiang, Yu-An Wang, Shih-
Ya Chang, Yun-Lun Li, Ming-Shuin Tsai, and Hung-Yu Tseng. Ai city challenge 2020-computer vision for smart
transportation applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Workshops, pages 620–621, 2020.
[2] Yousef Sharrab Sharrab. Video stream adaptation in computer vision systems. 2017.
(a) Thermal Image (b) Classification Loss
Figure 9. Object Detection Comparison in Performance, (SSD MobileNet V1 FPN 640x640) Model Test
Images (loss=0.77)
Table 1. Test Images Probability of Confidence - True Positive
                                                    True Positive
No.  OD Model                          Class. Loss  Image1  Image2  Image3  Image4  Image5  Image6
1    EfficientDet D0 512x512           0.134        87%     58%     100%    90%     100%    100%
2    SSD MobileNet V2 FPNLite 640x640  0.190        99%     96%     100%    96%     100%    100%
3    SSD MobileNet V2 FPNLite 320x320  0.212        0%      65%     100%    75%     100%    100%
4    SSD ResNet50 V1 FPN 640x640       0.213        95%     84%     96%     94%     96%     97%
5    SSD ResNet101 V1 FPN 640x640      0.665        74%     81%     86%     66%     57%     95%
6    SSD MobileNet V1 FPN 640x640      0.77         100%    94%     100%    97%     100%    100%
[3] Weili Fang, Lieyun Ding, Peter ED Love, Hanbin Luo, Heng Li, Feniosky Pena-Mora, Botao Zhong, and Cheng Zhou.
Computer vision applications in construction safety assurance. Automation in Construction, 110:103013, 2020.
[4] Yousef O Sharrab and Nabil J Sarhan. Modeling and analysis of power consumption in live video streaming systems.
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 13(4):1–25, 2017.
[5] Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xinwang Liu, and Matti Pietikäinen. Deep learning
for generic object detection: A survey. International journal of computer vision, 128(2):261–318, 2020.
[6] Polina Timofeeva. Object detection in thermal imagery for crowd density estimation. 2020.
[7] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized
intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pages 658–666, 2019.
[8] R. Padilla, S. L. Netto, and E. A. B. da Silva. A survey on performance metrics for object-detection algorithms. In
2020 International Conference on Systems, Signals and Image Processing (IWSSIP), pages 237–242, 2020.
[9] Fredrik K Gustafsson, Martin Danelljan, and Thomas B Schon. Evaluating scalable bayesian deep learning methods
for robust computer vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Workshops, pages 318–319, 2020.
[10] Yousef O Sharrab and Nabil J Sarhan. Accuracy and power consumption tradeoffs in video rate adaptation for
computer vision applications. In 2012 IEEE International Conference on Multimedia and Expo, pages 410–415.
IEEE, 2012.
[11] Mohamed R Ibrahim, James Haworth, and Tao Cheng. Understanding cities with machine eyes: A review of deep
computer vision in urban analytics. Cities, 96:102481, 2020.
[12] Yousef O Sharrab and Nabil J Sarhan. Detailed comparative analysis of vp8 and h. 264. In 2012 IEEE International
Symposium on Multimedia, pages 133–140. IEEE, 2012.
Table 2. Test Images Probability of Confidence - False Positive
                                       False Positive  False Positive
No.  OD Model                          Image7          Image8
1    EfficientDet D0 512x512           0%              0%
2    SSD MobileNet V2 FPNLite 640x640  70%             63%
3    SSD MobileNet V2 FPNLite 320x320  0%              0%
4    SSD ResNet50 V1 FPN 640x640       0%              0%
5    SSD ResNet101 V1 FPN 640x640      0%              0%
6    SSD MobileNet V1 FPN 640x640      58%             0%
[13] Siddharth Singh Chouhan, Uday Pratap Singh, and Sanjeev Jain. Applications of computer vision in plant pathology:
a survey. Archives of computational methods in engineering, 27(2):611–632, 2020.
[14] Yousef O Sharrab and Nabil J Sarhan. Aggregate power consumption modeling of live video streaming systems. In
Proceedings of the 4th ACM Multimedia Systems Conference, pages 60–71, 2013.
[15] Markus Jangblad. Object detection in infrared images using deep convolutional neural networks, 2018.
[16] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection
and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages
580–587, 2014.
[17] Christian Szegedy, Alexander Toshev, and Dumitru Erhan. Deep neural networks for object detection. In Advances in
neural information processing systems, pages 2553–2561, 2013.
[18] https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md. Tensorflow
2 detection model zoo. 2017.
[19] Alexander Wong, Mohammad Javad Shafiee, Francis Li, and Brendan Chwyl. Tiny ssd: A tiny single-shot detection
deep convolutional neural network for real-time embedded object detection. In 2018 15th Conference on Computer
and Robot Vision (CRV), pages 95–101. IEEE, 2018.
[20] Bin Liu, Wencang Zhao, and Qiaoqiao Sun. Study of object detection based on faster r-cnn. In 2017 Chinese
Automation Congress (CAC), pages 6233–6236. IEEE, 2017.
[21] Nihad Karim Chowdhury, Md Rahman, Noortaz Rezoana, Muhammad Ashad Kabir, et al. Ecovnet: An ensemble
of deep convolutional neural networks based on efficientnet to detect covid-19 from chest x-rays. arXiv preprint
arXiv:2009.11850, 2020.
[22] Jisoo Park, Jingdao Chen, Yong K Cho, Dae Y Kang, and Byung J Son. Cnn-based person detection using infrared
images for night-time intrusion warning systems. Sensors, 20(1):34, 2020.
[23] Huaizhong Zhang, Chunbo Luo, Qi Wang, Matthew Kitchin, Andrew Parmley, Jesus Monge-Alvarez, and Pablo
Casaseca-De-La-Higuera. A novel infrared video surveillance system using deep learning based techniques. Multi-
media Tools and Applications, 77(20):26657–26676, 2018.
[24] Usha Mittal, Sonal Srivastava, and Priyanka Chawla. Object detection and classification from thermal images using
region based convolutional neural network. 2019.
[25] Mate Krišto, Marina Ivasic-Kos, and Miran Pobar. Thermal object detection in difficult weather conditions using
yolo. IEEE Access, 8:125459–125476, 2020.
[26] Christopher Dahlin Rodin, Luciano Netto de Lima, Fabio Augusto de Alcantara Andrade, Diego Barreto Haddad,
Tor Arne Johansen, and Rune Storvold. Object classification in thermal images using convolutional neural networks
for search and rescue missions with unmanned aerial systems. In 2018 International Joint Conference on Neural
Networks (IJCNN), pages 1–8. IEEE, 2018.
[27] Diulhio Candido De Oliveira and Marco Aurelio Wehrmeister. Using deep learning and low-cost rgb and thermal
cameras to detect pedestrians in aerial images captured by multirotor uav. Sensors, 18(7):2244, 2018.
[28] Youngjun Cho, Nadia Bianchi-Berthouze, Nicolai Marquardt, and Simon J Julier. Deep thermal imaging: Proximate
material type recognition in the wild through deep learning of spatial surface temperature patterns. In Proceedings of
the 2018 CHI Conference on Human Factors in Computing Systems, pages 1–13, 2018.
[29] D CHAITANYA, A NINAD, MS MANUJ, et al. Borrow from anywhere: Pseudo multi-modal object detection in
thermal imagery, 2020.
[30] Ye Wang, Yueru Chen, Jongmoo Choi, and C-C Jay Kuo. Towards visible and thermal drone monitoring with convo-
lutional neural networks. APSIPA Transactions on Signal and Information Processing, 8, 2019.
[31] https://public.roboflow.com/object-detection/thermal-dogs-and-people. Thermal dogs and people dataset. 07 2020.
[32] https://blog.roboflow.com/cvat/. Getting started with cvat - annotation for computer vision. 2020.
[33] https://colab.research.google.com/drive/1sLqFKVV94wm lglFq 0kGo2ciM0kecWD. Tensorflow 2.0 object detector
on google colab. 2020.