Performance Comparison of Several Deep Learning-Based Object Detection Algorithms Utilizing Thermal Images

November 2021

DOI:10.1109/IDSTA53674.2021.9660820

Conference: 2021 Second International Conference on Intelligent Data Science Technologies and Applications (IDSTA)

Authors:

Yousef Sharrab

General Motors Company

International Journal of Electrical and Computer Engineering (IJECE)

Vol. x, No. x, xx 201x, pp. xx∼xx

ISSN: 2088-8708, DOI: 10.11591/ijece.vxix.ppxx-xx

Performance Comparison for Several Deep Learning-Based Object Detection Algorithms utilizing Thermal Images

Yousef O. Sharrab1, Sanaa AlShboul2, Ala’ Khalifah3, Mohammad Alsmirat2

1 Wayne State Multimedia Systems and Deep Learning Research Laboratory, ECE Department, Detroit, USA, Email: {yousef.sharrab}@wayne.edu

2 {Department of Computer Engineering, Department of Computer Science}, Jordan University of Science and Technology, Jordan, Email: {Smalshboul16, masmirat}@just.edu.jo

3 Department of Electrical Engineering, German Jordan University, Email: ala.khalifeh@gju.edu.jo

Article Info

Article history:

Keywords: Object Detection, Detection Accuracy, Models Accuracy Comparison, Thermal Images, Tensorflow 2.0, SSD, ResNet, MobileNet, EfficientDet, Computer Vision, Neural Networks, Deep Learning, Machine Learning

ABSTRACT

Advances in the object detection models used in a variety of fields are the result of advances in deep learning approaches to machine learning and in computer vision technology. Applying such models to thermal imaging is particularly necessary. Object detection algorithms are improving continuously, and many common Application Programming Interfaces (APIs) and libraries are available. The most common techniques are available in Google's TensorFlow Object Detection API. Each object detection model has its advantages and disadvantages. A direct comparison between the most common state-of-the-art object detection methods helps in finding the best solution for thermal image detection and recognition systems and applications. This paper discusses the algorithms and compares them in terms of accuracy and classification loss. We compare the performance on thermal images of the six most suitable object detection models that are supported by TensorFlow 2.0. Since we have only a small dataset (56 images), we use transfer learning to benefit from pre-trained models.

201x Institute of Advanced Engineering and Science.

Corresponding Author:

Ala’ Khalifah,

Department of Electrical Engineering,

German Jordan University,

Email: ala.khalifeh@gju.edu.jo

Journal homepage: http://iaescore.com/journals/index.php/IJECE

1. INTRODUCTION

Recent advancements in Computer Vision (CV) and Object Detection (OD) are driven by the widespread industry adoption of Artificial Intelligence (AI) and Deep Learning (DL) technology. Object detection is used in self-driving cars, medical diagnostic processes, threat detection in security systems, and crop disease detection [1, 2]. TensorFlow is a deep learning framework that supports many of the latest models in video communication, video streaming, speech recognition, Natural Language Processing (NLP), and speech synthesis [3, 4].

Deep learning (DL) has consistently shown the best performance on diverse problems in the domain of computer vision. Deep learning approaches have recently contributed to the advancement of the object detection models used for several applications [5, 6]. This paper discusses several standard state-of-the-art object detection algorithms and compares them in terms of accuracy and classification loss. We compare the performance on thermal images of the six most suitable object detection models that are supported by TensorFlow 2.0. Since we have only a small dataset (56 images), we use transfer learning to benefit from pre-trained models. The TensorFlow Object Detection API is a framework for creating deep learning networks that solve object detection problems. It contains pre-trained models, trained on several large datasets, that can be used for inference. Additionally, we can use this framework to apply transfer learning to models that have been pre-trained on large datasets, which enables us to customize these models for a specific task. The intuition behind transfer learning for image tasks is that if a model is trained on a sufficiently large and general dataset, it will function effectively as a generic model of the visual world. We can then leverage the learned feature maps without having to train a large model on a large dataset from scratch. As object detection performance metrics, we use the classification loss and the visual accuracy (True Positive, True Negative, False Positive, and False Negative) [7, 8].

Advances in the object detection models used in a variety of fields are the result of advances in deep learning algorithms, including the Convolutional Neural Network (CNN). We show that several object detection models can be compared on thermal images. We also show that a small number of thermal images (56) can be used by adapting pre-trained models through a process called transfer learning. We found that the "EfficientDet D0 512x512" and "SSD ResNet50 V1 FPN 640x640" models achieve better classification loss and show more "True" detections and fewer "False" detections. The rest of the paper is organized as follows. Section 2 provides background information on the object detectors used in the comparison. Section 3 discusses the performance evaluation procedures, techniques, and methodology. Finally, Section 4 presents and analyzes the main results.

2. BACKGROUND INFORMATION

Computer vision is becoming ubiquitous in the world, with applications in image understanding, smartphone apps, self-driving cars, drones, video communication, video streaming, automated video surveillance, security, safety, health and medicine [9, 10, 11, 12]. The core of many of these applications is visual recognition tasks such as image classification and object detection. Recent advancements in the neural network approach have led to major advances in the performance of these state-of-the-art visual recognition systems [13, 14].

Computer vision is the machine counterpart of human vision. Computer vision tasks can be divided into five areas, as shown in Figure 1. Sorting images into several classes depending on their content is called image classification. Classifying an image and also locating the object in it is called classification + localization. Identifying multiple objects in an image and determining their locations by drawing rectangular bounding boxes around them is called object detection. Classifying every pixel in the image is called segmentation, of which there are two main types. Semantic segmentation links each pixel in the image to a particular class label; for example, the pixels of an image may be labelled as car, tree, pedestrian, etc. These segments are then used to find the interactions and relations between objects. Instance segmentation also associates a class label with each pixel but, unlike semantic segmentation, treats multiple objects of the same class as separate entities [15, 16, 17].

Figure 1. Computer Vision Tasks

Object detection is a core computer vision task, and the TensorFlow Object Detection Application Programming Interface (TF OD API) has improved greatly. Google recently released a new version of the TF OD API that supports TensorFlow 2.x. The TensorFlow 2 Object Detection API allows us to train a collection of state-of-the-art object detection models under a unified framework, including Google's state-of-the-art models such as Single Shot Multibox Detector (SSD) (MobileNet/ResNet), Faster R-CNN (ResNet/Inception ResNet), Mask R-CNN, CenterNet, ExtremeNet, and EfficientDet. More generally, object detection models allow us to train the computer to identify objects in a scene with bounding boxes and class labels. There are many ways to model this problem with deep learning techniques, and the TensorFlow 2 Object Detection API allows us to deploy a wide variety of models and strategies to achieve this goal.
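To make the "bounding boxes and class labels" output concrete, the following is a minimal sketch of how detections might be filtered by confidence. It assumes the key names commonly returned by TF OD API models (`detection_boxes`, `detection_classes`, `detection_scores`) with boxes given as normalized `[ymin, xmin, ymax, xmax]`; the example values and the 0.5 threshold are hypothetical and are not taken from the experiments in this paper.

```python
def keep_confident(detections, min_score=0.5):
    """Return (box, class_id, score) triples whose score is at least min_score."""
    kept = []
    for box, class_id, score in zip(
        detections["detection_boxes"],
        detections["detection_classes"],
        detections["detection_scores"],
    ):
        if score >= min_score:
            kept.append((box, class_id, score))
    return kept


# Hypothetical detector output for one image: two candidate boxes of class 1.
example = {
    "detection_boxes": [[0.10, 0.20, 0.40, 0.45], [0.55, 0.60, 0.70, 0.80]],
    "detection_classes": [1, 1],
    "detection_scores": [0.87, 0.12],
}

confident = keep_confident(example)
print(confident)
```

Only the first candidate survives the threshold here; lowering `min_score` trades fewer missed objects for more spurious ones.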

The TensorFlow 2 Object Detection API includes a selection of trainable detection models, including Region-Based Fully Convolutional Networks (R-FCN) with ResNet 101, Faster R-CNN with ResNet 50, Faster R-CNN with ResNet 101, Faster R-CNN with Inception ResNet v2, SSD with MobileNet, and SSD with Inception V2. A list of frozen weights (trained on the COCO dataset) for each of these models, to be used for out-of-the-box inference, can be found in [18, 19, 20, 21]. The API also provides a Jupyter notebook for performing inference with one of the released models, and convenient local training scripts as well as distributed training and evaluation pipelines via Google Cloud.

Several research studies have tried to automatically detect people intruding into unauthorized areas by utilizing infrared cameras. Study [22] proposes a precise and effective method for detecting intruders in infrared images at night-time. An infrared video surveillance system using deep learning is presented in [23]. Terrorist threats have increased concerns for people's security. Unlike RGB cameras, which do not perform well at night or in adverse weather conditions, thermal cameras have become a necessary part of advanced video surveillance systems. By applying Faster R-CNN to thermal and RGB images, the analysis in [24] shows that thermal images perform better than RGB images. In study [25], the authors investigate automatic people detection in thermal camera images using convolutional neural network architectures developed for detection in RGB images. They compare the performance of standard state-of-the-art object detectors such as SSD, Faster R-CNN, and YOLO, retrained on a dataset of thermal images.

In study [26], the authors investigate the use of unmanned aerial systems in marine search and rescue missions for the detection and classification of objects on the surface of the sea. The data consists of experimental thermal images. The authors of [27] use deep learning with RGB and thermal cameras to detect pedestrians in aerial images captured by multirotor Unmanned Aerial Vehicles (UAVs). Study [28] uses a smartphone thermal camera to capture thermal textures, which a deep neural network classifies by material type. Study [29] proposes using well-known image-to-image translation frameworks to generate RGB equivalents of a given thermal image and then using an architecture for object detection in the thermal image. Study [30] reports a visible and thermal drone monitoring system that integrates deep learning-based detection and tracking.

3. PERFORMANCE EVALUATION METHODOLOGY

3.1. Used Object Detection Algorithms, Dataset, and Performance Metrics

For the experimental setup, we adopted six object detection algorithms from the TensorFlow 2.0 API: "EfficientDet D0 512x512", "SSD MobileNet V1 FPN 640x640", "SSD MobileNet V2 FPNLite 320x320", "SSD MobileNet V2 FPNLite 640x640", "SSD ResNet50 V1 FPN 640x640", and "SSD ResNet101 V1 FPN 640x640". We use the Roboflow Thermal Dogs and People dataset, a collection of 203 thermal infrared images captured at various distances from people and dogs in a park and near a home. Images were captured in both portrait and landscape orientation, using the Seek Compact XR Extra Range Thermal Imaging Camera for iPhone [31]. The metrics we use are the classification loss on the training and validation sets and the visual accuracy (True Positive, True Negative, False Positive, and False Negative) on thermal test images. We want to maximize True Positives (a head, a detection) and True Negatives (no head, no detection), since they are correctly predicted observations; they are displayed in green in Figure 2. We want to minimize False Positives (no head, a detection) and False Negatives (a head, no detection); they are shown in red in Figure 2.

Figure 2. False positive, false negative, true positive, and true negative
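The four outcomes can be written as a small decision function. This is an illustrative sketch using the standard definitions (a false positive is a detection where no head exists; a false negative is a head that was not detected). The function name and the sample observations are hypothetical and do not come from the test set.

```python
from collections import Counter


def classify_outcome(head_present, head_detected):
    """Map one (ground truth, prediction) pair to TP, TN, FP, or FN."""
    if head_present and head_detected:
        return "TP"  # a head, a detection
    if not head_present and not head_detected:
        return "TN"  # no head, no detection
    if not head_present and head_detected:
        return "FP"  # no head, a detection
    return "FN"      # a head, no detection


# Hypothetical observations as (head_present, head_detected) pairs.
observations = [(True, True), (False, False), (False, True), (True, False)]
counts = Counter(classify_outcome(p, d) for p, d in observations)
print(counts)
```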

3.2. Procedure

From the 203 images of the dataset in [31], we select the 56 images that show a human head and annotate them by specifying a mask on the head [32]. We use Google Colab [33] to train, validate, and test the six object detection algorithms, as shown in Figure 3.

Figure 3. Deep Learning-Based Object detection Training Block Diagram
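In the usual TF OD API workflow, transfer learning is set up by editing a model's `pipeline.config` text file so that training starts from a pre-trained checkpoint and predicts the new classes. The sketch below shows one way this could be done with regular expressions. The field names (`num_classes`, `batch_size`, `fine_tune_checkpoint`, `fine_tune_checkpoint_type`, `num_steps`) are real pipeline fields, but the helper name, the checkpoint path, and all numeric values are assumptions for illustration, not the settings used in this paper.

```python
import re


def customize_pipeline_config(config_text, checkpoint_path, num_classes,
                              batch_size, num_steps):
    """Return a copy of a pipeline.config text adapted for fine-tuning."""
    edits = [
        (r'fine_tune_checkpoint: ".*?"',
         'fine_tune_checkpoint: "{}"'.format(checkpoint_path)),
        # Restore the whole detection model, not only a classification backbone.
        (r'fine_tune_checkpoint_type: ".*?"',
         'fine_tune_checkpoint_type: "detection"'),
        (r'num_classes: \d+', 'num_classes: {}'.format(num_classes)),
        # Note: this rewrites every batch_size entry found in the text.
        (r'batch_size: \d+', 'batch_size: {}'.format(batch_size)),
        (r'num_steps: \d+', 'num_steps: {}'.format(num_steps)),
    ]
    for pattern, replacement in edits:
        # A function replacement avoids backslash escapes being interpreted.
        config_text = re.sub(pattern, lambda m, r=replacement: r, config_text)
    return config_text


# Hypothetical excerpt of a downloaded config and hypothetical settings.
SAMPLE = '''
model { ssd { num_classes: 90 } }
train_config {
  batch_size: 128
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
  fine_tune_checkpoint_type: "classification"
  num_steps: 300000
}
'''

customized = customize_pipeline_config(
    SAMPLE, "pretrained/checkpoint/ckpt-0", num_classes=1,
    batch_size=8, num_steps=2000)
print(customized)
```

With a single "head" class, `num_classes` would be 1; the pre-trained weights supply the general visual features, and only a short training run on the small annotated set is needed.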

4. RESULT PRESENTATION AND ANALYSIS

In this section, we analyze the performance of several deep learning-based object detection algorithms and compare them in order to select the most suitable algorithm for head detection in thermal images, for use in estimating the temperature of a person entering a building or a campus.

For the "EfficientDet D0 512x512" model, as we see in Figure 4b and Row 1 of Table 1, the classification loss is 0.134, which is the minimum of all the algorithms under test. In addition, this algorithm is able to detect all the heads in the test images with high probabilities, as illustrated in Row 1 of Table 1 and Figure 4a. Furthermore, it does not detect the false heads that some other algorithms detected, as illustrated in Row 1 of Table 2 and Figure 4a. For these reasons, we select this algorithm for our other thermal object detection applications.

(a) Thermal Images (b) Classification Loss

Figure 4. Object Detection Comparison in Performance, (EfficientDet D0 512x512) Model Test Images (loss=0.134)

For the "SSD MobileNet V2 FPNLite 640x640" model, as we see in Figure 5b and Row 2 of Table 1, the classification loss is 0.190, which is close to the minimum. In addition, this algorithm is able to detect all the heads in the test images with high probabilities, as illustrated in Row 2 of Table 1 and Figure 5a. On the other hand, it does detect the false heads specified by the test, as illustrated in Row 2 of Table 2 and Figure 5a. For this reason, we do not select this algorithm for our other thermal object detection applications.

(a) Thermal Image 1 (b) Classification Loss

Figure 5. Object Detection Comparison in Performance, (SSD MobileNet V2 FPNLite 640x640) Model Test Images (loss=0.190)

For the "SSD MobileNet V2 FPNLite 320x320" model, as we see in Figure 6b and Row 3 of Table 1, the classification loss is 0.212, which is still close to the minimum, but this algorithm is not able to detect all the heads in the test images, as illustrated in Row 3 of Table 1 and Figure 6a. Although Row 3 of Table 2 shows no detection of the false heads specified by the test, the missed head is sufficient reason to reject it. We therefore do not select this algorithm for our other thermal object detection applications.

(a) Thermal Image (b) Classification Loss

Figure 6. Object Detection Comparison in Performance, (SSD MobileNet V2 FPNLite 320x320) Model Test Images (loss=0.212)

For the "SSD ResNet50 V1 FPN 640x640" model, as we see in Figure 7b and Row 4 of Table 1, the classification loss is 0.213, which is relatively low. In addition, this algorithm is able to detect all the heads in the test images with high probabilities, as illustrated in Row 4 of Table 1 and Figure 7a. Furthermore, it does not detect the false heads that some other algorithms detected, as illustrated in Row 4 of Table 2 and Figure 7a. For these reasons, we select this algorithm for our other thermal object detection applications.

(a) Thermal Image (b) Classification Loss

Figure 7. Object Detection Comparison in Performance, (SSD ResNet50 V1 FPN 640x640) Model Test Images (loss=0.213)

For the "SSD ResNet101 V1 FPN 640x640" model, as we see in Figure 8b and Row 5 of Table 1, the classification loss is 0.665, which is far from the minimum. For that reason, we do not examine the other parts of the figure, and we do not select this algorithm for our other thermal object detection applications.

(a) Thermal Image (b) Classification Loss

Figure 8. Object Detection Comparison in Performance, (SSD ResNet101 V1 FPN 640x640) Model Test Images (loss=0.665)

For the "SSD MobileNet V1 FPN 640x640" model, as we see in Figure 9b and Row 6 of Table 1, the classification loss is 0.77, which is too far from the minimum. In addition, it does detect a false head specified by the test, as illustrated in Row 6 of Table 2 and Figure 9a. For these reasons, we do not examine the other parts of the figure, and we do not select this algorithm for our other thermal object detection applications.
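The selection reasoning in this section can be summarized as a short filter over the values reported in Table 1 and Table 2. This is only a sketch: the paper does not state a numeric loss cut-off, so `MAX_LOSS = 0.3` is a hypothetical threshold chosen to separate losses "close to" the minimum from those "far from" it, and the function name is likewise illustrative.

```python
# Values transcribed from Table 1 (loss, head confidences in %) and
# Table 2 (confidences in % on the two false-head test images).
RESULTS = {
    "EfficientDet D0 512x512":
        {"loss": 0.134, "tp": [87, 58, 100, 90, 100, 100], "fp": [0, 0]},
    "SSD MobileNet V2 FPNLite 640x640":
        {"loss": 0.190, "tp": [99, 96, 100, 96, 100, 100], "fp": [70, 63]},
    "SSD MobileNet V2 FPNLite 320x320":
        {"loss": 0.212, "tp": [0, 65, 100, 75, 100, 100], "fp": [0, 0]},
    "SSD ResNet50 V1 FPN 640x640":
        {"loss": 0.213, "tp": [95, 84, 96, 94, 96, 97], "fp": [0, 0]},
    "SSD ResNet101 V1 FPN 640x640":
        {"loss": 0.665, "tp": [74, 81, 86, 66, 57, 95], "fp": [0, 0]},
    "SSD MobileNet V1 FPN 640x640":
        {"loss": 0.77, "tp": [100, 94, 100, 97, 100, 100], "fp": [58, 0]},
}

MAX_LOSS = 0.3  # hypothetical cut-off; not a value given in the paper


def select_models(results, max_loss=MAX_LOSS):
    """Keep models with low loss, no missed heads, and no false heads."""
    selected = []
    for name, r in results.items():
        low_loss = r["loss"] <= max_loss
        finds_all_heads = all(conf > 0 for conf in r["tp"])
        no_false_heads = all(conf == 0 for conf in r["fp"])
        if low_loss and finds_all_heads and no_false_heads:
            selected.append(name)
    # Report the survivors from lowest to highest classification loss.
    return sorted(selected, key=lambda n: results[n]["loss"])


selected = select_models(RESULTS)
print(selected)
```

Under these assumptions the filter keeps "EfficientDet D0 512x512" and "SSD ResNet50 V1 FPN 640x640", the two models selected above.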

5. CONCLUSIONS

Progress in the object detection models used in a variety of areas is a result of advances in deep learning algorithms, including the Convolutional Neural Network (CNN). We show that several object detection models can be compared on thermal images. We show that a small number of images (56) can be used by benefiting from pre-trained models through a process called transfer learning. We found that the "EfficientDet D0 512x512" and "SSD ResNet50 V1 FPN 640x640" models perform best in classification loss and visually show more "True" detections and fewer "False" detections.

REFERENCES

[1] Ming-Ching Chang, Chen-Kuo Chiang, Chun-Ming Tsai, Yun-Kai Chang, Hsuan-Lun Chiang, Yu-An Wang, Shih-Ya Chang, Yun-Lun Li, Ming-Shuin Tsai, and Hung-Yu Tseng. Ai city challenge 2020-computer vision for smart transportation applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 620–621, 2020.

[2] Yousef Sharrab Sharrab. Video stream adaptation in computer vision systems. 2017.

(a) Thermal Image (b) Classification Loss

Figure 9. Object Detection Comparison in Performance, (SSD MobileNet V1 FPN 640x640) Model Test Images (loss=0.77)

Table 1. Test Images Probability of Confidence - True Positive

No.  OD Model                           Class. Loss  Image1  Image2  Image3  Image4  Image5  Image6
1    EfficientDet D0 512x512            0.134        87%     58%     100%    90%     100%    100%
2    SSD MobileNet V2 FPNLite 640x640   0.190        99%     96%     100%    96%     100%    100%
3    SSD MobileNet V2 FPNLite 320x320   0.212        0%      65%     100%    75%     100%    100%
4    SSD ResNet50 V1 FPN 640x640        0.213        95%     84%     96%     94%     96%     97%
5    SSD ResNet101 V1 FPN 640x640       0.665        74%     81%     86%     66%     57%     95%
6    SSD MobileNet V1 FPN 640x640       0.77         100%    94%     100%    97%     100%    100%

[3] Weili Fang, Lieyun Ding, Peter ED Love, Hanbin Luo, Heng Li, Feniosky Pena-Mora, Botao Zhong, and Cheng Zhou. Computer vision applications in construction safety assurance. Automation in Construction, 110:103013, 2020.

[4] Yousef O Sharrab and Nabil J Sarhan. Modeling and analysis of power consumption in live video streaming systems. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 13(4):1–25, 2017.

[5] Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xinwang Liu, and Matti Pietikäinen. Deep learning for generic object detection: A survey. International journal of computer vision, 128(2):261–318, 2020.

[6] Polina Timofeeva. Object detection in thermal imagery for crowd density estimation. 2020.

[7] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 658–666, 2019.

[8] R. Padilla, S. L. Netto, and E. A. B. da Silva. A survey on performance metrics for object-detection algorithms. In 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), pages 237–242, 2020.

[9] Fredrik K Gustafsson, Martin Danelljan, and Thomas B Schon. Evaluating scalable bayesian deep learning methods for robust computer vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 318–319, 2020.

[10] Yousef O Sharrab and Nabil J Sarhan. Accuracy and power consumption tradeoffs in video rate adaptation for computer vision applications. In 2012 IEEE International Conference on Multimedia and Expo, pages 410–415. IEEE, 2012.

[11] Mohamed R Ibrahim, James Haworth, and Tao Cheng. Understanding cities with machine eyes: A review of deep computer vision in urban analytics. Cities, 96:102481, 2020.

[12] Yousef O Sharrab and Nabil J Sarhan. Detailed comparative analysis of vp8 and h. 264. In 2012 IEEE International Symposium on Multimedia, pages 133–140. IEEE, 2012.

Table 2. Test Images Probability of Confidence - False Positive

                                        False Positive  False Positive
No.  OD Model                           Image7          Image8
1    EfficientDet D0 512x512            0%              0%
2    SSD MobileNet V2 FPNLite 640x640   70%             63%
3    SSD MobileNet V2 FPNLite 320x320   0%              0%
4    SSD ResNet50 V1 FPN 640x640        0%              0%
5    SSD ResNet101 V1 FPN 640x640       0%              0%
6    SSD MobileNet V1 FPN 640x640       58%             0%

[13] Siddharth Singh Chouhan, Uday Pratap Singh, and Sanjeev Jain. Applications of computer vision in plant pathology: a survey. Archives of computational methods in engineering, 27(2):611–632, 2020.

[14] Yousef O Sharrab and Nabil J Sarhan. Aggregate power consumption modeling of live video streaming systems. In Proceedings of the 4th ACM Multimedia Systems Conference, pages 60–71, 2013.

[15] Markus Jangblad. Object detection in infrared images using deep convolutional neural networks, 2018.

[16] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 580–587, 2014.

[17] Christian Szegedy, Alexander Toshev, and Dumitru Erhan. Deep neural networks for object detection. In Advances in neural information processing systems, pages 2553–2561, 2013.

[18] https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md. Tensorflow 2 detection model zoo. 2017.

[19] Alexander Wong, Mohammad Javad Shafiee, Francis Li, and Brendan Chwyl. Tiny ssd: A tiny single-shot detection deep convolutional neural network for real-time embedded object detection. In 2018 15th Conference on Computer and Robot Vision (CRV), pages 95–101. IEEE, 2018.

[20] Bin Liu, Wencang Zhao, and Qiaoqiao Sun. Study of object detection based on faster r-cnn. In 2017 Chinese Automation Congress (CAC), pages 6233–6236. IEEE, 2017.

[21] Nihad Karim Chowdhury, Md Rahman, Noortaz Rezoana, Muhammad Ashad Kabir, et al. Ecovnet: An ensemble of deep convolutional neural networks based on efficientnet to detect covid-19 from chest x-rays. arXiv preprint arXiv:2009.11850, 2020.

[22] Jisoo Park, Jingdao Chen, Yong K Cho, Dae Y Kang, and Byung J Son. Cnn-based person detection using infrared images for night-time intrusion warning systems. Sensors, 20(1):34, 2020.

[23] Huaizhong Zhang, Chunbo Luo, Qi Wang, Matthew Kitchin, Andrew Parmley, Jesus Monge-Alvarez, and Pablo Casaseca-De-La-Higuera. A novel infrared video surveillance system using deep learning based techniques. Multimedia Tools and Applications, 77(20):26657–26676, 2018.

[24] Usha Mittal, Sonal Srivastava, and Priyanka Chawla. Object detection and classification from thermal images using region based convolutional neural network. 2019.

[25] Mate Krišto, Marina Ivasic-Kos, and Miran Pobar. Thermal object detection in difficult weather conditions using yolo. IEEE Access, 8:125459–125476, 2020.

[26] Christopher Dahlin Rodin, Luciano Netto de Lima, Fabio Augusto de Alcantara Andrade, Diego Barreto Haddad, Tor Arne Johansen, and Rune Storvold. Object classification in thermal images using convolutional neural networks for search and rescue missions with unmanned aerial systems. In 2018 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2018.

[27] Diulhio Candido De Oliveira and Marco Aurelio Wehrmeister. Using deep learning and low-cost rgb and thermal cameras to detect pedestrians in aerial images captured by multirotor uav. Sensors, 18(7):2244, 2018.

[28] Youngjun Cho, Nadia Bianchi-Berthouze, Nicolai Marquardt, and Simon J Julier. Deep thermal imaging: Proximate material type recognition in the wild through deep learning of spatial surface temperature patterns. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pages 1–13, 2018.

[29] D CHAITANYA, A NINAD, MS MANUJ, et al. Borrow from anywhere: Pseudo multi-modal object detection in thermal imagery, 2020.

[30] Ye Wang, Yueru Chen, Jongmoo Choi, and C-C Jay Kuo. Towards visible and thermal drone monitoring with convolutional neural networks. APSIPA Transactions on Signal and Information Processing, 8, 2019.

[31] https://public.roboflow.com/object-detection/thermal-dogs-and-people. Thermal dogs and people dataset. 07 2020.

[32] https://blog.roboflow.com/cvat/. Getting started with cvat - annotation for computer vision. 2020.

[33] https://colab.research.google.com/drive/1sLqFKVV94wm lglFq 0kGo2ciM0kecWD. Tensorflow 2.0 object detector on google colab. 2020.

Int J Elec & Comp Eng, Vol. x, No. x, xx 201x : xx – xx

iHELP: a model for instant learning of video coding in VR/AR real-time applications

Article

Full-text available

Mar 2024
MULTIMED TOOLS APPL

Virtual and augmented reality (VR/AR), teleoperation, and telepresence technologies heavily depend on video streaming and playback to enable immersive user experiences. However, the substantial bandwidth requirements and file sizes associated with VR/AR and 360-degree video content present significant challenges for efficient transmission and storage. Modern video coding standards, including HEVC, AV1, VP9, VVC, and EVC, have been designed to address these issues by enhancing coding efficiency while maintaining video quality on par with the H.264 standard. Nonetheless, the adaptive block structures inherent to these video coding standards introduce increased computational complexity, necessitating additional intra-prediction modes. The integration of AI in video coding has the potential to substantially improve video compression efficiency, reduce file sizes, and enhance video quality, making it a crucial area of research and development within the video coding domain. As AI systems can execute a wide array of tasks and adapt to new challenges, their incorporation into video coding may result in even more advanced compression techniques and innovative solutions to meet the ever-evolving demands of the industry. In this study, we introduce a state-of-the-art adaptive instant learning-based model, named iHELP, developed to address the computational complexity arising from encoders’ adaptive block structures. The iHELP model achieves outstanding coding efficiency and quality while considerably improving encoding speed. iHELP model has been tested on HEVC, but it applies to other encoders with similar adaptive block structures. iHELP model employs entropy-based block similarity to predict the splitting decision of the LCU, determining whether to divide the block based on the correlation between the block content and previously adjacent encoded blocks in both spatial and temporal dimensions. 
Our methodology has been rigorously evaluated using the HEVC standard’s common test conditions, and the results indicate that iHELP serves as an effective solution for efficient video coding in bandwidth-constrained situations, making it suitable for real-time video applications. The proposed method achieves an 80% reduction in encoding time while maintaining comparable PSNR performance relative to the RDO approach. The exceptional potential of the iHELP model calls for further exploration, as no other existing methods have demonstrated such a high level of performance.

Medicinal Plants Recognition Using Deep Learning

Conference Paper

Full-text available

Jun 2023

The use of medicinal plants has been a longstanding practice in traditional medicine worldwide. Accurately identifying medicinal plants is crucial for determining their medicinal properties and potential applications. However, it can be a challenging task due to the complexity of their appearance. Variations in growth stage, lighting, and imaging conditions can make classification challenging, which limits the application of traditional methods for plant identification. This paper proposes a deep learning-based approach that uses a convolutional neural network (CNN) based on the VGG-16 model. With a dataset of 25,686 images, the CNN is capable of learning and representing complex features in images, enabling it to recognize and classify medicinal plants with high accuracy. The proposed approach can efficiently classify plants with different growth stages, lighting conditions, and imaging settings, providing a reliable tool for plant identification. We achieved an impressive recognition rate of 98%, demonstrating the feasibility of using deep learning techniques for accurate plant classification. The proposed approach has enormous potential for providing healthcare professionals and herbal medicine researchers with a reliable tool for identifying herbal plants. the study represents an essential advancement in the use of deep learning techniques for medicinal plant recognition, overcoming the challenges posed by their complex appearance. The proposed approach has farreaching implications and can significantly impact the field of herbal medicine research, enabling researchers and healthcare professionals to identify and classify medicinal plants more accurately

Deep Neural Networks Hydrologic and Hydraulic Modeling in Flood Hazard Analysis

Preprint

Full-text available

Mar 2024

Natural disasters can be devastating to the environment and natural resources. Flood inundation mapping and hydraulic modeling are essential to forecast critical flood information, including flood depth and water surface height. In this research, several factors that influence floods were studied. These factors include the intensity of the rainstorm, the depth of precipitation, soil types, geologic settings, and topographic features. Furthermore, the research carried out hydraulic modeling of storm flows for 50- and 100-Year return periods and estimated that the water depth in Wadi Al Wala could reach 15m at 50 years of storm and 25m at 100 return years of storms. A DNN model is developed with good accuracy to predict flood flow based on historical records from 1980 to 2018 meteorological data. The goal of this research is to improve flood prediction, and risk assessment with the use of DNN integrated with hydrological and hydraulic models.

Detecting Distracted Drivers Using Convolutional Neural Networks

Conference Paper

Oct 2023

Medicinal Plants Recognition Using Deep Learning

Conference Paper

Jun 2023

Prediction and modeling of water quality using deep neural networks

Preprint

May 2023

Water pollution is one of the most challenging environmental issues. A powerful tool for measuring the suitability of water for drinking is required. The Water Quality Index (WQI) is a widely used parameter for the assessment of water quality through mathematical formulas. In this paper, a Deep Neural Network (DNN) model is developed to forecast WQI based on parameters selected for the dry and wet seasons throughout the year. Statistical modeling and unsupervised machine learning techniques are used. These modeling techniques include Principal Component Analysis/Factor Analysis (PCA/FA), which is used to interpret seasonal changes and the sources of the springs under study. The other modeling technique utilized in this study is Hierarchical Cluster Analysis (HCA). The results of this study reveal that the developed DNN model has achieved a high accuracy of 0.951. The goodness of fit of the developed model, measured by R-squared (R2), is 0.98, which is deemed high. The Mean Square Error metric is close to zero. Furthermore, the PCA/FA revealed five major parameters that impact water quality, which together account for 92% of the total variance of water quality in summer and 96% in winter. Moreover, results show that the average WQI for all springs indicates poor water quality at 46.75% during the dry season and medium water quality at 55.5% during the wet season.

Prediction and modeling of water quality using deep neural networks

Article

May 2023
Environ Dev Sustain

Why Do Consumers Adopt Smart Voice Assistants for Shopping Purposes? A Perspective from Complexity Theory

Article

May 2023

The widespread appeal of Smart Voice Assistants (SVAs) stems from their ability to enhance the everyday lives of consumers in a practical, enjoyable, and meaningful manner. Despite their popularity, the factors that shape consumer adoption of SVAs remain largely unexplored. To address this research gap, we utilized complexity theory to construct an integrated model that sheds light on the determinants of consumer decision-making in regard to SVA adoption. Furthermore, we applied fuzzy-set Qualitative Comparative Analysis (fsQCA) to examine the proposed model and uncover the causal recipes associated with SVA adoption. Our necessary condition analysis highlights that perceived ease of use, perceived usefulness, perceived humanness, and perceived social presence are necessary predictors for consumers' intentions to adopt SVA. This study constitutes a significant addition to the existing literature by providing a comprehensive and nuanced understanding of the drivers of SVA adoption. Moreover, it offers crucial implications for online service provider managers to improve the adoption of SVAs among their customers.

Toward Smart and Immersive Classroom based on AI, VR, and 6G

Article

Jan 2023

Technological revolutions greatly impact current and future classrooms. These advances include artificial intelligence, virtual reality, and super-high-speed internet. In the coming next generation (6G), the data rate will be sufficient for live scenes in virtual reality applications such as telepresence and teleoperation. This paper reviews and discusses next-generation technologies in AI, VR, and communication. Moreover, we examine the motivation for establishing an advanced-technology-based Smart and Immersive Classroom (SIC) and the advantages of its availability to the virtual society. Recent advances in computer and communications technology have delivered capabilities to tomorrow's SIC. Advances in virtual reality and real-time streaming on the internet have created a revolution in curricula and classrooms. Index Terms: Virtual Reality in education, 6G for education, Artificial Intelligence in education, Immersive Classroom

Assessing thermal imagery integration into object detection methods on air-based collection platforms

Article

May 2023

Object detection models commonly focus on utilizing the visible spectrum via Red–Green–Blue (RGB) imagery. Due to various limitations with this approach in low visibility settings, there is growing interest in fusing RGB with thermal Long Wave Infrared (LWIR) (7.5–13.5 µm) images to increase object detection performance. However, we still lack baseline performance metrics evaluating RGB, LWIR and RGB-LWIR fused object detection machine learning models, especially from air-based platforms. This study undertakes such an evaluation, finding that a blended RGB-LWIR model generally exhibits superior performance compared to independent RGB or LWIR approaches. For example, an RGB-LWIR blend only performs 1–5% behind the RGB approach in predictive power across various altitudes and periods of clear visibility. Yet, RGB fusion with a thermal signature overlay provides edge redundancy and edge emphasis, both of which are vital in supporting edge detection machine learning algorithms (especially in low visibility environments). This approach has the ability to improve object detection performance for a range of use cases in industrial, consumer, government, and military applications. This research contributes to the study of multispectral object detection by quantifying key factors affecting model performance from drone platforms (including distance, time of day, and sensor type). Finally, this research contributes a novel open labeled training dataset of 6300 images for RGB, LWIR, and RGB-LWIR fused imagery, collected from air-based platforms, enabling further multispectral machine-driven object detection research.

Video Stream Adaptation in Computer Vision Systems

Thesis

Jan 2017

Ph.D.

Machine learning-based energy consumption modeling and comparison of H.264/AVC and google VP8 encoders

Article

Apr 2021
IJECE

Advancement of the prediction models used in a variety of fields is a result of the contribution of machine learning approaches. Utilizing such modeling in feature engineering is exceptionally imperative and required. In this research, we show how to utilize machine learning to save time in research experiments, where we save more than five thousand hours of measuring the energy consumption of encoding recordings. Since measuring the energy consumption has to be done by humans, and since we require more than eleven thousand experiments to cover all the combinations of video sequences, video bit rates, and video encoding settings, we utilize machine learning to model the energy consumption using linear regression. The VP8 codec has been offered by Google as an open video encoder in an effort to replace the popular MPEG-4 Part 10 video encoder standard, known as H.264/AVC. This research models energy consumption and describes the major differences between the H.264/AVC and VP8 encoders in terms of energy consumption and performance through experiments that are based on machine learning modeling. Twenty-nine raw video sequences are used, offering a wide range of resolutions and contents, with frame sizes ranging from QCIF (176x144) to 2160p (3840x2160). For fairness in the comparison analysis, we use seven settings in the VP8 encoder and fifteen types of tuning in H.264/AVC. The settings cover various video qualities. The performance metrics include video quality, encoding time, and encoding energy consumption.

Overlapping Vehicles Detection Using Different Object Detection Methodologies

Presentation

Feb 2021

Computer Vision Vehicle detection

A Survey on Performance Metrics for Object-Detection Algorithms

Conference Paper

Jul 2020

This work explores and compares the plethora of metrics for the performance evaluation of object-detection algorithms. Average precision (AP), for instance, is a popular metric for evaluating the accuracy of object detectors by estimating the area under the curve (AUC) of the precision × recall relationship. Depending on the point interpolation used in the plot, two different AP variants can be defined and, therefore, different results are generated. AP has six additional variants increasing the possibilities of benchmarking. The lack of consensus in different works and AP implementations is a problem faced by the academic and scientific communities. Metric implementations written in different computational languages and platforms are usually distributed with corresponding datasets sharing a given bounding-box description. Such projects indeed help the community with evaluation tools, but demand extra work to be adapted for other datasets and bounding-box formats. This work reviews the most used metrics for object detection detaching their differences, applications, and main concepts. It also proposes a standard implementation that can be used as a benchmark among different datasets with minimum adaptation on the annotation files.

A Comprehensive Survey on Transfer Learning

Article

Jul 2020

Transfer learning aims at improving the performance of target learners on target domains by transferring the knowledge contained in different but related source domains. In this way, the dependence on a large number of target-domain data can be reduced for constructing target learners. Due to the wide application prospects, transfer learning has become a popular and promising area in machine learning. Although there are already some valuable and impressive surveys on transfer learning, these surveys introduce approaches in a relatively isolated way and lack the recent advances in transfer learning. Due to the rapid expansion of the transfer learning area, it is both necessary and challenging to comprehensively review the relevant studies. This survey attempts to connect and systematize the existing transfer learning research studies, as well as to summarize and interpret the mechanisms and the strategies of transfer learning in a comprehensive way, which may help readers have a better understanding of the current research status and ideas. Unlike previous surveys, this survey article reviews more than 40 representative transfer learning approaches, especially homogeneous transfer learning approaches, from the perspectives of data and model. The applications of transfer learning are also briefly introduced. In order to show the performance of different transfer learning models, over 20 representative transfer learning models are used for experiments. The models are performed on three different data sets, that is, Amazon Reviews, Reuters-21578, and Office-31, and the experimental results demonstrate the importance of selecting appropriate transfer learning models for different applications in practice.

Thermal Object Detection in Difficult Weather Conditions Using YOLO

Article

Jul 2020

Global terrorist threats and illegal migration have intensified concerns for the security of citizens, and every effort is made to exploit all available technological advances to prevent adverse events and protect people and their property. Due to the ability to use at night and in weather conditions where RGB cameras do not perform well, thermal cameras have become an important component of sophisticated video surveillance systems. In this paper, we investigate the task of automatic person detection in thermal images using convolutional neural network models originally intended for detection in RGB images. We compare the performance of the standard state-of-the-art object detectors such as Faster R-CNN, SSD, Cascade R-CNN, and YOLOv3, that were retrained on a dataset of thermal images extracted from videos that simulate illegal movements around the border and in protected areas. Videos are recorded at night in clear weather, rain, and in the fog, at different ranges, and with different movement types. YOLOv3 was significantly faster than other detectors while achieving performance comparable with the best, so it was used in further experiments. We experimented with different training dataset settings in order to determine the minimum number of images needed to achieve good detection results on test datasets. We achieved excellent detection results with respect to average accuracy for all test scenarios although a modest set of thermal images was used for training. We test our trained model on different well known and widely used thermal imaging datasets as well. In addition, we present the results of the recognition of humans and animals in thermal images, which is particularly important in the case of sneaking around objects and illegal border crossings. Also, we present our original thermal dataset used for experimentation that contains surveillance videos recorded at different weather and shooting conditions.

EfficientDet: Scalable and Efficient Object Detection

Conference Paper

Jun 2020

AI City Challenge 2020 – Computer Vision for Smart Transportation Applications

Conference Paper

Jun 2020

Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision

Conference Paper

Jun 2020
