Performance Comparison of Several Deep Learning-Based Object Detection Algorithms Utilizing Thermal Images

International Journal of Electrical and Computer Engineering (IJECE)
Vol. x, No. x, xx 201x, pp. xxxx
ISSN: 2088-8708, DOI: 10.11591/ijece.vxix.ppxx-xx
Performance Comparison for Several Deep Learning-Based
Object Detection Algorithms utilizing Thermal Images
Yousef O. Sharrab1, Sanaa AlShboul2, Ala’ Khalifah3, Mohammad Alsmirat2
1Wayne State Multimedia Systems and Deep Learning Research Laboratory, ECE Department, Detroit, USA,
Email:{yousef.sharrab}@wayne.edu
2{Department of Computer Engineering, Department of Computer Science}, Jordan University of Science and
Technology, Jordan, Email: {Smalshboul16, masmirat}@just.edu.jo
3Department of Electrical Engineering, German Jordan University, Email: ala.khalifeh@gju.edu.jo
Article Info
Article history:
Keywords: Object Detection, Detection Accuracy, Models Accuracy Comparison, Thermal Images, Tensorflow 2.0, SSD, ResNet, MobileNet, EfficientDet, Computer Vision, Neural Networks, Deep Learning, Machine Learning
ABSTRACT
Advances in the object detection models used in a variety of fields are the result of
advances in the deep learning approaches of machine learning and computer vision
technology. Utilizing such models in thermal imaging is highly necessary.
Object detection algorithms are improving continuously. Many common Application
Programming Interfaces (APIs) and libraries are available, and the most common
techniques are provided by the Google TensorFlow Object Detection API. Each object
detection model has its advantages and disadvantages. A direct comparison between the
most common state-of-the-art object detection methods helps in finding the best solution
for thermal image detection and recognition systems and applications. This paper
discusses the algorithms and compares them in terms of accuracy and classification loss.
We compare the performance on thermal images of the six most suitable object detection
models supported by TensorFlow 2.0. Since we have only a small dataset (56 images), we
use transfer learning to benefit from pre-trained models.
Copyright © 201x Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
Ala’ Khalifah3,
Department of Electrical Engineering,
German Jordan University,
Email: ala.khalifeh@gju.edu.jo
Journal homepage: http://iaescore.com/journals/index.php/IJECE
1. INTRODUCTION
Recent advancements in Computer Vision (CV) and Object Detection (OD) are driven by the widespread
adoption of Artificial Intelligence (AI) and Deep Learning (DL) technology by industry. These techniques are
used in self-driving cars, medical diagnostic processes, threat detection in security systems, and crop disease
detection [1, 2]. TensorFlow is a deep learning framework that supports many of the latest models in video
communication, video streaming, speech recognition, Natural Language Processing (NLP), and speech synthesis
[3, 4].
Deep learning (DL) has consistently shown the best performance on diverse problems in
the domain of computer vision, and deep learning approaches have recently contributed to the advancement of
the object detection models used in several applications [5, 6]. This paper discusses several standard
state-of-the-art object detection algorithms and compares them in terms of accuracy and classification loss. We
compare the performance on thermal images of the six most suitable object detection models supported
by TensorFlow 2.0. Since we have only a small dataset (56 images), we use transfer learning to benefit from
pre-trained models. The TensorFlow Object Detection API is a framework for creating deep
learning networks that solve object detection problems. It contains pre-trained models, trained on several
large datasets, that can be used for inference. Additionally, the framework can apply transfer learning
to these pre-trained models, which enables us to customize them for a specific task. The intuition
behind transfer learning for image classification is that a model trained on a sufficiently large and
general dataset will function effectively as a generic model of the visual world. We can then leverage
the learned feature maps without having to train a large model on a large dataset from scratch.
The object detection performance metrics we use are the classification loss and the
visual accuracy (True Positive, True Negative, False Positive, and False Negative) [7, 8].
Advances in the object detection models used in a variety of fields are the result of advances in deep
learning algorithms, including the Convolutional Neural Network (CNN). We show that several models of
object detection can be compared on thermal images. We show that we can use a small number of thermal
images (56) by making use of pre-trained models through a process called transfer learning. We found
that the "EfficientDet D0 512x512" and "SSD ResNet50 V1 FPN 640x640" models provide the best
classification loss and show more "True" detections and fewer "False" detections. The paper is organized
as follows. Section 2 provides background information on the object detection models used in the
comparison. Section 3 describes the performance evaluation methodology, including the algorithms, dataset,
metrics, and procedure. Finally, Section 4 presents and analyzes the main results.
2. BACKGROUND INFORMATION
Computer vision is becoming ubiquitous in the world, with applications in image understanding,
smartphone apps, self-driving cars, drones, video communication, video streaming, automated video surveil-
lance, security, safety, health and medicine [9, 10, 11, 12]. The core of many of these applications is visual
recognition tasks such as image classification and object detection. Recent advancements in the neural network
approach have led to major advances in the performance of these state-of-the-art visual recognition systems
[13, 14].
Computer vision plays a role for machines similar to that of vision for humans. Computer vision tasks
can be divided into five different areas, as shown in Figure 1. The ability to sort images into several classes
depending on their content is called image classification. Classifying an object and identifying its location is
called classification + localization. The ability to identify multiple objects in an image and determine their
locations by drawing rectangular bounding boxes around them is called object detection. Classifying every pixel
in the image is a process called segmentation. There are two main types of segmentation. Semantic segmentation
links each pixel in the given image to a particular class label; for example, the pixels of an image may be
labelled as car, tree, pedestrian, etc. These segments are then used to find the interactions or relations
between various objects. Instance segmentation also associates a class label with each pixel, as in semantic
segmentation, except that it treats multiple objects of the same class as separate entities [15, 16, 17].
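Because object detection reports each object as a rectangular bounding box, a predicted box is commonly compared with an annotated box using intersection over union (IoU). The sketch below is illustrative only: the paper does not state which overlap measure or threshold was used, and the `(x_min, y_min, x_max, y_max)` box format and the function name `iou` are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this format is an assumption.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) boxes.
    return intersection / union if union > 0 else 0.0
```

An IoU of 1.0 means the two boxes coincide, and 0.0 means they do not overlap at all.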
Object detection is a core computer vision task, and the TensorFlow Object Detection Application
Programming Interface (TF OD API) has been greatly improved. Recently, Google released a new version of the
TF OD API that supports TensorFlow 2.x. The TensorFlow 2 Object Detection API allows us to train a collection
of state-of-the-art object detection models under a unified framework, including Google's state-of-the-art
models such as Single Shot Multibox Detector (SSD) (MobileNet/ResNet), Faster R-CNN (ResNet/Inception ResNet),
Mask R-CNN, CenterNet, ExtremeNet, and EfficientDet. More generally, object detection models allow us to train
the computer to identify objects in a scene with bounding boxes and class labels. There are many ways to
model this problem with deep learning techniques, and the TensorFlow 2 Object Detection API allows us
to deploy a wide variety of models and strategies to achieve this goal.
Figure 1. Computer Vision Tasks
The TensorFlow 2 Object Detection API includes a selection of trainable detection models, including
Region-Based Fully Convolutional Networks (R-FCN) with ResNet 101, Faster R-CNN with ResNet 50, Faster R-CNN
with ResNet 101, Faster R-CNN with Inception ResNet v2, SSD with MobileNet, and SSD with Inception V2. A list
of all frozen weights (trained on the COCO dataset) for each of the above models, to be used for
out-of-the-box inference, can be found in [18, 19, 20, 21]. The API also provides a Jupyter notebook for
performing inference with one of these released models, as well as convenient local training scripts and
distributed training and evaluation pipelines via Google Cloud.
Several research studies have tried to automatically detect people intruding into unauthorized areas
by utilizing infrared cameras. Study [22] proposes a precise and effective method for detecting intruders in
infrared images during the night-time. An infrared video surveillance system using deep learning is presented
in [23]. Terrorist threats have increased concerns about the security of people. Unlike RGB cameras, which
do not perform well at night and in adverse weather conditions, thermal cameras have become a necessary part
of advanced video surveillance systems. By applying Faster R-CNN to thermal and RGB images, the analysis in
[24] shows that detection performs better on thermal images than on RGB images. In study [25], the authors
investigate the task of automatic people detection in thermal cameras utilizing convolutional neural network
architectures that were developed for detection in RGB images. They compare the performance of standard
state-of-the-art object detectors such as SSD, Faster R-CNN, and YOLO, retrained on a dataset of thermal images.
In study [26], the authors investigate the utilization of unmanned aerial systems in marine search and
rescue missions for the detection and classification of objects on the surface of the sea. The data consists
of experimental thermal images. The authors of [27] use deep learning with RGB and thermal cameras to detect
pedestrians in aerial images captured by multirotor Unmanned Aerial Vehicles (UAVs). Study [28] uses a
smartphone thermal camera to capture thermal textures, and a deep neural network classifies these textures
by material type. Study [29] proposes the use of well-known image-to-image translation frameworks to generate
RGB equivalents of a given thermal image and then uses an architecture for object detection in the thermal
image. Study [30] reports a visible and thermal drone monitoring system that integrates deep learning-based
detection and tracking.
3. PERFORMANCE EVALUATION METHODOLOGY
3.1. Used Object Detection Algorithms, Dataset, and Performance Metrics
For the experimental setup, we adopted six object detection algorithms from the TensorFlow
2.0 API: "EfficientDet D0 512x512", "SSD MobileNet V1 FPN 640x640", "SSD MobileNet V2 FPNLite
320x320", "SSD MobileNet V2 FPNLite 640x640", "SSD ResNet50 V1 FPN 640x640", and "SSD ResNet101
V1 FPN 640x640". We use the Roboflow Thermal Dogs and People dataset, a collection of 203 thermal
infrared images captured at various distances from people and dogs in a park and near a home. Images were
captured in both portrait and landscape orientation using the Seek Compact XR Extra Range
Thermal Imaging Camera for iPhone [31]. The metrics we use are the classification loss on the training
and validation sets and the visual accuracy (True Positive, True Negative, False Positive, and False Negative)
on thermal test images. We want to maximize True Positives (a head, a detection) and True Negatives (no head,
no detection), since they are correctly predicted observations; they are displayed in green in Figure 2.
We want to minimize False Positives (no head, a detection) and False Negatives (a head, no detection);
they are shown in red in Figure 2.
Figure 2. False positive, false negative, true positive, and true negative
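The four outcomes can be made concrete with a short sketch. It follows the usual confusion-matrix convention, in which a false positive is a detection where no head is present and a false negative is a head that is not detected. The function name `outcome` and the list of observations are hypothetical and are not data from the paper.

```python
from collections import Counter


def outcome(head_present: bool, detected: bool) -> str:
    """Label one observation as TP, TN, FP, or FN."""
    if head_present and detected:
        return "TP"  # a head, a detection
    if not head_present and not detected:
        return "TN"  # no head, no detection
    if not head_present and detected:
        return "FP"  # no head, a detection
    return "FN"      # a head, no detection


# Hypothetical (head_present, detected) pairs, for illustration only.
observations = [(True, True), (True, False), (False, True),
                (False, False), (True, True)]

counts = Counter(outcome(h, d) for h, d in observations)
print(dict(counts))
```

Counting the labels this way gives the quantities to maximize (TP, TN) and to minimize (FP, FN).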
3.2. Procedure
From the 203 images in the dataset [31], we select the 56 images that show a human with a visible head
and annotate them by specifying a mask on the head [32]. We use Google Colab [33] to train, validate,
and test the six object detection algorithms, as shown in Figure 3.
Figure 3. Deep Learning-Based Object detection Training Block Diagram
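With the TensorFlow 2 Object Detection API, transfer learning is typically set up by editing a model's `pipeline.config` file so that training starts from a pre-trained checkpoint. The sketch below shows one way to do this with plain string substitution. The field names (`num_classes`, `batch_size`, `num_steps`, `fine_tune_checkpoint`, `fine_tune_checkpoint_type`) are fields of the API's pipeline configuration; the sample configuration text, the helper name `customize_pipeline`, the checkpoint path, and all numeric values are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical, heavily shortened pipeline configuration used only for this sketch.
SAMPLE_CONFIG = """
model {
  ssd {
    num_classes: 90
  }
}
train_config {
  batch_size: 128
  num_steps: 300000
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
  fine_tune_checkpoint_type: "classification"
}
"""


def customize_pipeline(text, num_classes, batch_size, num_steps, checkpoint):
    """Return a copy of the config text adapted for fine-tuning on a custom dataset."""
    text = re.sub(r"num_classes: \d+", f"num_classes: {num_classes}", text)
    text = re.sub(r"batch_size: \d+", f"batch_size: {batch_size}", text)
    text = re.sub(r"num_steps: \d+", f"num_steps: {num_steps}", text)
    # A function replacement avoids backslash interpretation in the checkpoint path.
    text = re.sub(r'fine_tune_checkpoint: ".*?"',
                  lambda m: f'fine_tune_checkpoint: "{checkpoint}"', text)
    # Start from the detection checkpoint rather than a classification backbone.
    text = re.sub(r'fine_tune_checkpoint_type: ".*?"',
                  'fine_tune_checkpoint_type: "detection"', text)
    return text


# Assumed values: one class (head), small batch, short schedule.
new_config = customize_pipeline(
    SAMPLE_CONFIG, num_classes=1, batch_size=16, num_steps=2000,
    checkpoint="pretrained/checkpoint/ckpt-0",
)
print(new_config)
```

Only the listed fields change; everything else in the configuration is left as provided with the pre-trained model.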
4. RESULT PRESENTATION AND ANALYSIS
In this section, we analyze the performance of several deep learning-based object detection algorithms
and compare them in order to select the most suitable algorithm for head detection in thermal images.
The selected algorithm is to be used to estimate the temperature of people entering a building or a campus.
For the "EfficientDet D0 512x512" model, as shown in Figure 4b and Row 1 of Table 1, the classification
loss is 0.134, which is the minimum of all the algorithms under test. In addition, this algorithm is able to
detect all the heads in the test images with high probabilities, as illustrated in Row 1 of Table 1 and
Figure 4a. Furthermore, it does not detect the false heads that some other algorithms detected, as illustrated
in Row 1 of Table 2 and Figure 4a. For these reasons, we select this algorithm for our other thermal object
detection applications.
(a) Thermal Images (b) Classification Loss
Figure 4. Object Detection Comparison in Performance, (EfficientDet D0 512x512) Model Test Images
(loss=0.134)
For the "SSD MobileNet V2 FPNLite 640x640" model, as shown in Figure 5b and Row 2 of Table 1, the
classification loss is 0.190, which is close to the minimum. In addition, this algorithm is able to detect all
the heads in the test images with high probabilities, as illustrated in Row 2 of Table 1 and Figure 5a. On the
other hand, it does detect the false heads in the test images, as illustrated in Row 2 of Table 2 and
Figure 5a. For this reason, we do not select this algorithm for our other thermal object detection applications.
(a) Thermal Image 1 (b) Classification Loss
Figure 5. Object Detection Comparison in Performance, (SSD MobileNet V2 FPNLite 640x640) Model Test
Images (loss=0.190)
For the "SSD MobileNet V2 FPNLite 320x320" model, as shown in Figure 6b and Row 3 of Table 1, the
classification loss is 0.212, which is still close to the minimum, but this algorithm is not able to detect
all the heads in the test images, as illustrated in Row 3 of Table 1 and Figure 6a. On the other hand, it does
not detect the false heads in the test images, as illustrated in Row 3 of Table 2 and Figure 6a. Because of
the missed head, we do not select this algorithm for our other thermal object detection applications.
(a) Thermal Image (b) Classification Loss
Figure 6. Object Detection Comparison in Performance, (SSD MobileNet V2 FPNLite 320x320) Model Test
Images (loss=0.212)
For the "SSD ResNet50 V1 FPN 640x640" model, as shown in Figure 7b and Row 4 of Table 1, the
classification loss is 0.213, which is relatively low. In addition, this algorithm is able to detect all the
heads in the test images with high probabilities, as illustrated in Row 4 of Table 1 and Figure 7a.
Furthermore, it does not detect the false heads that some other algorithms detected, as illustrated in Row 4
of Table 2 and Figure 7a. For these reasons, we select this algorithm for our other thermal object detection
applications.
(a) Thermal Image (b) Classification Loss
Figure 7. Object Detection Comparison in Performance, (SSD ResNet50 V1 FPN 640x640) Model Test
Images (loss=0.213)
For the "SSD ResNet101 V1 FPN 640x640" model, as shown in Figure 8b and Row 5 of Table 1, the
classification loss is 0.665, which is far from the minimum. For that reason, we do not examine the other
parts of the figure, and we do not select this algorithm for our other thermal object detection applications.
(a) Thermal Image (b) Classification Loss
Figure 8. Object Detection Comparison in Performance, (SSD ResNet101 V1 FPN 640x640) Model Test
Images (loss=0.665)
For the "SSD MobileNet V1 FPN 640x640" model, as shown in Figure 9b and Row 6 of Table 1, the
classification loss is 0.77, which is too far from the minimum. In addition, it does detect a false head in
the test images, as illustrated in Row 6 of Table 2 and Figure 9a. For these reasons, we do not examine the
other parts of the figure, and we do not select this algorithm for our other thermal object detection
applications.
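The selection reasoning in this section can be summarized as a simple filter over the values in Table 1 and Table 2. The sketch below is an illustration, not the authors' procedure: it treats a 0% confidence in Table 1 as a missed head, treats any non-zero confidence on Image7 or Image8 as a false detection, and uses a hypothetical loss cut-off of 0.5 to stand in for "far from the minimum".

```python
# (model name, classification loss, Table 1 confidences in %, Table 2 confidences in %)
MODELS = [
    ("EfficientDet D0 512x512", 0.134, [87, 58, 100, 90, 100, 100], [0, 0]),
    ("SSD MobileNet V2 FPNLite 640x640", 0.190, [99, 96, 100, 96, 100, 100], [70, 63]),
    ("SSD MobileNet V2 FPNLite 320x320", 0.212, [0, 65, 100, 75, 100, 100], [0, 0]),
    ("SSD ResNet50 V1 FPN 640x640", 0.213, [95, 84, 96, 94, 96, 97], [0, 0]),
    ("SSD ResNet101 V1 FPN 640x640", 0.665, [74, 81, 86, 66, 57, 95], [0, 0]),
    ("SSD MobileNet V1 FPN 640x640", 0.77, [100, 94, 100, 97, 100, 100], [58, 0]),
]

MAX_LOSS = 0.5  # hypothetical cut-off, not a value stated in the paper


def select(models, max_loss=MAX_LOSS):
    """Keep models with acceptable loss, no missed heads, and no false detections."""
    kept = [
        (name, loss)
        for name, loss, true_conf, false_conf in models
        if loss <= max_loss
        and all(c > 0 for c in true_conf)    # every head was detected
        and all(c == 0 for c in false_conf)  # no false head was detected
    ]
    return sorted(kept, key=lambda item: item[1])  # lowest loss first


selected = select(MODELS)
for name, loss in selected:
    print(f"{name}: loss={loss}")
```

Under these assumptions the filter keeps "EfficientDet D0 512x512" and "SSD ResNet50 V1 FPN 640x640", the two models named in the conclusions.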
5. CONCLUSIONS
The progress of the object detection models used in a variety of areas is a result of the advancement
of deep learning algorithms, including the Convolutional Neural Network (CNN). We show that several object
detection models can be compared on thermal images. We show that a small number of images (56) can be used
by benefiting from pre-trained models through a process called transfer learning. We found that the
"EfficientDet D0 512x512" and "SSD ResNet50 V1 FPN 640x640" models perform the best in classification loss
and visually show more "True" detections and fewer "False" detections.
REFERENCES
[1] Ming-Ching Chang, Chen-Kuo Chiang, Chun-Ming Tsai, Yun-Kai Chang, Hsuan-Lun Chiang, Yu-An Wang, Shih-
Ya Chang, Yun-Lun Li, Ming-Shuin Tsai, and Hung-Yu Tseng. Ai city challenge 2020-computer vision for smart
transportation applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Workshops, pages 620–621, 2020.
[2] Yousef Sharrab. Video stream adaptation in computer vision systems. 2017.
(a) Thermal Image (b) Classification Loss
Figure 9. Object Detection Comparison in Performance, (SSD MobileNet V1 FPN 640x640) Model Test
Images (loss=0.77)
Table 1. Test Images Probability of Confidence - True Positive
| No. | OD Model | Class. Loss | Image1 | Image2 | Image3 | Image4 | Image5 | Image6 |
|---|---|---|---|---|---|---|---|---|
| 1 | EfficientDet D0 512x512 | 0.134 | 87% | 58% | 100% | 90% | 100% | 100% |
| 2 | SSD MobileNet V2 FPNLite 640x640 | 0.190 | 99% | 96% | 100% | 96% | 100% | 100% |
| 3 | SSD MobileNet V2 FPNLite 320x320 | 0.212 | 0% | 65% | 100% | 75% | 100% | 100% |
| 4 | SSD ResNet50 V1 FPN 640x640 | 0.213 | 95% | 84% | 96% | 94% | 96% | 97% |
| 5 | SSD ResNet101 V1 FPN 640x640 | 0.665 | 74% | 81% | 86% | 66% | 57% | 95% |
| 6 | SSD MobileNet V1 FPN 640x640 | 0.77 | 100% | 94% | 100% | 97% | 100% | 100% |
[3] Weili Fang, Lieyun Ding, Peter ED Love, Hanbin Luo, Heng Li, Feniosky Pena-Mora, Botao Zhong, and Cheng Zhou.
Computer vision applications in construction safety assurance. Automation in Construction, 110:103013, 2020.
[4] Yousef O Sharrab and Nabil J Sarhan. Modeling and analysis of power consumption in live video streaming systems.
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 13(4):1–25, 2017.
[5] Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xinwang Liu, and Matti Pietikäinen. Deep learning
for generic object detection: A survey. International journal of computer vision, 128(2):261–318, 2020.
[6] Polina Timofeeva. Object detection in thermal imagery for crowd density estimation. 2020.
[7] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized
intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pages 658–666, 2019.
[8] R. Padilla, S. L. Netto, and E. A. B. da Silva. A survey on performance metrics for object-detection algorithms. In
2020 International Conference on Systems, Signals and Image Processing (IWSSIP), pages 237–242, 2020.
[9] Fredrik K Gustafsson, Martin Danelljan, and Thomas B Schon. Evaluating scalable bayesian deep learning methods
for robust computer vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Workshops, pages 318–319, 2020.
[10] Yousef O Sharrab and Nabil J Sarhan. Accuracy and power consumption tradeoffs in video rate adaptation for
computer vision applications. In 2012 IEEE International Conference on Multimedia and Expo, pages 410–415.
IEEE, 2012.
[11] Mohamed R Ibrahim, James Haworth, and Tao Cheng. Understanding cities with machine eyes: A review of deep
computer vision in urban analytics. Cities, 96:102481, 2020.
[12] Yousef O Sharrab and Nabil J Sarhan. Detailed comparative analysis of vp8 and h. 264. In 2012 IEEE International
Symposium on Multimedia, pages 133–140. IEEE, 2012.
Table 2. Test Images Probability of Confidence - False Positive
| No. | OD Model | Image7 (False Positive) | Image8 (False Positive) |
|---|---|---|---|
| 1 | EfficientDet D0 512x512 | 0% | 0% |
| 2 | SSD MobileNet V2 FPNLite 640x640 | 70% | 63% |
| 3 | SSD MobileNet V2 FPNLite 320x320 | 0% | 0% |
| 4 | SSD ResNet50 V1 FPN 640x640 | 0% | 0% |
| 5 | SSD ResNet101 V1 FPN 640x640 | 0% | 0% |
| 6 | SSD MobileNet V1 FPN 640x640 | 58% | 0% |
[13] Siddharth Singh Chouhan, Uday Pratap Singh, and Sanjeev Jain. Applications of computer vision in plant pathology:
a survey. Archives of computational methods in engineering, 27(2):611–632, 2020.
[14] Yousef O Sharrab and Nabil J Sarhan. Aggregate power consumption modeling of live video streaming systems. In
Proceedings of the 4th ACM Multimedia Systems Conference, pages 60–71, 2013.
[15] Markus Jangblad. Object detection in infrared images using deep convolutional neural networks, 2018.
[16] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection
and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages
580–587, 2014.
[17] Christian Szegedy, Alexander Toshev, and Dumitru Erhan. Deep neural networks for object detection. In Advances in
neural information processing systems, pages 2553–2561, 2013.
[18] https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md. Tensorflow
2 detection model zoo. 2017.
[19] Alexander Wong, Mohammad Javad Shafiee, Francis Li, and Brendan Chwyl. Tiny ssd: A tiny single-shot detection
deep convolutional neural network for real-time embedded object detection. In 2018 15th Conference on Computer
and Robot Vision (CRV), pages 95–101. IEEE, 2018.
[20] Bin Liu, Wencang Zhao, and Qiaoqiao Sun. Study of object detection based on faster r-cnn. In 2017 Chinese
Automation Congress (CAC), pages 6233–6236. IEEE, 2017.
[21] Nihad Karim Chowdhury, Md Rahman, Noortaz Rezoana, Muhammad Ashad Kabir, et al. Ecovnet: An ensemble
of deep convolutional neural networks based on efficientnet to detect covid-19 from chest x-rays. arXiv preprint
arXiv:2009.11850, 2020.
[22] Jisoo Park, Jingdao Chen, Yong K Cho, Dae Y Kang, and Byung J Son. Cnn-based person detection using infrared
images for night-time intrusion warning systems. Sensors, 20(1):34, 2020.
[23] Huaizhong Zhang, Chunbo Luo, Qi Wang, Matthew Kitchin, Andrew Parmley, Jesus Monge-Alvarez, and Pablo
Casaseca-De-La-Higuera. A novel infrared video surveillance system using deep learning based techniques. Multi-
media Tools and Applications, 77(20):26657–26676, 2018.
[24] Usha Mittal, Sonal Srivastava, and Priyanka Chawla. Object detection and classification from thermal images using
region based convolutional neural network. 2019.
[25] Mate Krišto, Marina Ivasic-Kos, and Miran Pobar. Thermal object detection in difficult weather conditions using
yolo. IEEE Access, 8:125459–125476, 2020.
[26] Christopher Dahlin Rodin, Luciano Netto de Lima, Fabio Augusto de Alcantara Andrade, Diego Barreto Haddad,
Tor Arne Johansen, and Rune Storvold. Object classification in thermal images using convolutional neural networks
for search and rescue missions with unmanned aerial systems. In 2018 International Joint Conference on Neural
Networks (IJCNN), pages 1–8. IEEE, 2018.
[27] Diulhio Candido De Oliveira and Marco Aurelio Wehrmeister. Using deep learning and low-cost rgb and thermal
cameras to detect pedestrians in aerial images captured by multirotor uav. Sensors, 18(7):2244, 2018.
[28] Youngjun Cho, Nadia Bianchi-Berthouze, Nicolai Marquardt, and Simon J Julier. Deep thermal imaging: Proximate
material type recognition in the wild through deep learning of spatial surface temperature patterns. In Proceedings of
the 2018 CHI Conference on Human Factors in Computing Systems, pages 1–13, 2018.
[29] D CHAITANYA, A NINAD, MS MANUJ, et al. Borrow from anywhere: Pseudo multi-modal object detection in
thermal imagery, 2020.
[30] Ye Wang, Yueru Chen, Jongmoo Choi, and C-C Jay Kuo. Towards visible and thermal drone monitoring with convo-
lutional neural networks. APSIPA Transactions on Signal and Information Processing, 8, 2019.
[31] https://public.roboflow.com/object-detection/thermal-dogs-and-people. Thermal dogs and people dataset. 07 2020.
[32] https://blog.roboflow.com/cvat/. Getting started with cvat - annotation for computer vision. 2020.
[33] https://colab.research.google.com/drive/1sLqFKVV94wm lglFq 0kGo2ciM0kecWD. Tensorflow 2.0 object detector
on google colab. 2020.
Int J Elec & Comp Eng, Vol. x, No. x, xx 201x : xx xx
... Advanced video compression standards such as H.264, H.265, and VP9 are used to reduce the size of video files while maintaining the quality of the video. Video encoding is an important technology that has enabled the delivery of high-quality video content over the internet, and it will continue to be a crucial aspect of video technology as video resolutions continue to increase [16]. ...
... Significant contributions in 2023 include Al-Ghuwairi et al.'s work on intrusion detection in cloud computing using machine learning [34], Sharrab et al.'s deep neural networks in social media forensics [35], and Parikh et al.'s program on cooperative adaptive cruise control [36]. Sharrab et al. [15] compared deep learning-based object detection algorithms [16], while Sharrab et al. [1] discussed AI, VR, and 6G in smart, immersive classrooms [1]. Tarabin et al. [37] focused on detecting distracted drivers [37], and Al-Ghuwairi et al. [34] visualized software refactoring [38]. ...
Article
Full-text available
Virtual and augmented reality (VR/AR), teleoperation, and telepresence technologies heavily depend on video streaming and playback to enable immersive user experiences. However, the substantial bandwidth requirements and file sizes associated with VR/AR and 360-degree video content present significant challenges for efficient transmission and storage. Modern video coding standards, including HEVC, AV1, VP9, VVC, and EVC, have been designed to address these issues by enhancing coding efficiency while maintaining video quality on par with the H.264 standard. Nonetheless, the adaptive block structures inherent to these video coding standards introduce increased computational complexity, necessitating additional intra-prediction modes. The integration of AI in video coding has the potential to substantially improve video compression efficiency, reduce file sizes, and enhance video quality, making it a crucial area of research and development within the video coding domain. As AI systems can execute a wide array of tasks and adapt to new challenges, their incorporation into video coding may result in even more advanced compression techniques and innovative solutions to meet the ever-evolving demands of the industry. In this study, we introduce a state-of-the-art adaptive instant learning-based model, named iHELP, developed to address the computational complexity arising from encoders’ adaptive block structures. The iHELP model achieves outstanding coding efficiency and quality while considerably improving encoding speed. iHELP model has been tested on HEVC, but it applies to other encoders with similar adaptive block structures. iHELP model employs entropy-based block similarity to predict the splitting decision of the LCU, determining whether to divide the block based on the correlation between the block content and previously adjacent encoded blocks in both spatial and temporal dimensions. 
Our methodology has been rigorously evaluated using the HEVC standard’s common test conditions, and the results indicate that iHELP serves as an effective solution for efficient video coding in bandwidth-constrained situations, making it suitable for real-time video applications. The proposed method achieves an 80% reduction in encoding time while maintaining comparable PSNR performance relative to the RDO approach. The exceptional potential of the iHELP model calls for further exploration, as no other existing methods have demonstrated such a high level of performance.
... However, accurately identifying herbal plants can be a challenge, especially for individuals without specialized knowledge in botany and plant systematics [1]. In recent years, there has been growing interest in utilizing computer vision [2] and deep learning techniques to aid in the identification of herbal plants [3]- [6]. These techniques have the potential to provide a fast and accurate method for plant identification, making them valuable tools for healthcare professionals and researchers in the field of herbal medicine. ...
... The actual accuracy of a VGG-16 implementation on the ImageNet dataset can vary based on factors such as the training data, implementation, and procedure. The research questions and objectives include: (1) Can a deep learning-based system accurately recognize and identify herbal medicinal plants using a dataset of images?, (2) How does the proposed deep learning-based system compare to traditional machine learning methods for recognizing and identifying herbal medicinal plants?, (3) How can the performance of the deep learning-based system be improved?, and (4) How well can the system perform in natural environments? ...
Conference Paper
Full-text available
The use of medicinal plants has been a longstanding practice in traditional medicine worldwide. Accurately identifying medicinal plants is crucial for determining their medicinal properties and potential applications. However, it can be a challenging task due to the complexity of their appearance. Variations in growth stage, lighting, and imaging conditions can make classification challenging, which limits the application of traditional methods for plant identification. This paper proposes a deep learning-based approach that uses a convolutional neural network (CNN) based on the VGG-16 model. With a dataset of 25,686 images, the CNN is capable of learning and representing complex features in images, enabling it to recognize and classify medicinal plants with high accuracy. The proposed approach can efficiently classify plants with different growth stages, lighting conditions, and imaging settings, providing a reliable tool for plant identification. We achieved an impressive recognition rate of 98%, demonstrating the feasibility of using deep learning techniques for accurate plant classification. The proposed approach has enormous potential for providing healthcare professionals and herbal medicine researchers with a reliable tool for identifying herbal plants. the study represents an essential advancement in the use of deep learning techniques for medicinal plant recognition, overcoming the challenges posed by their complex appearance. The proposed approach has farreaching implications and can significantly impact the field of herbal medicine research, enabling researchers and healthcare professionals to identify and classify medicinal plants more accurately
... The weight affects the input to the next neuron's output and the final output layer. Initial weights are assigned randomly, but when the network is iteratively trained, the weights are optimized to ensure the network predicts correctly [32]. ...
Preprint
Natural disasters can be devastating to the environment and natural resources. Flood inundation mapping and hydraulic modeling are essential to forecast critical flood information, including flood depth and water surface height. In this research, several factors that influence floods were studied. These factors include the intensity of the rainstorm, the depth of precipitation, soil types, geologic settings, and topographic features. Furthermore, the research carried out hydraulic modeling of storm flows for 50- and 100-year return periods and estimated that the water depth in Wadi Al Wala could reach 15 m for the 50-year storm and 25 m for the 100-year storm. A DNN model is developed with good accuracy to predict flood flow based on historical meteorological records from 1980 to 2018. The goal of this research is to improve flood prediction and risk assessment with the use of a DNN integrated with hydrological and hydraulic models.
... Object detection and instance segmentation: CNNs have been extended to detect and localize multiple objects in images, such as with the R-CNN family of models [9], [10]. These models combine region proposal techniques with CNNs to perform object detection and instance segmentation, providing pixel-level object boundaries instead of just bounding boxes. ...
... However, accurately identifying herbal plants can be a challenge, especially for individuals without specialized knowledge in botany and plant systematics [1]. In recent years, there has been growing interest in utilizing computer vision [2] and Deep Neural Network (DNN) techniques to aid in the identification of herbal plants [3]- [6]. These techniques have the potential to provide a fast and accurate method for plant identification, making them valuable tools for healthcare professionals and researchers in the field of herbal medicine. ...
Conference Paper
The use of medicinal plants has been a long-standing practice in traditional medicine worldwide. Accurately identifying medicinal plants is crucial for determining their medicinal properties and potential applications. However, it can be a challenging task due to the complexity of their appearance. Variations in growth stage, lighting, and imaging conditions can make classification challenging, which limits the application of traditional methods for plant identification. This paper proposes a deep learning-based approach that uses a Convolutional Neural Network (CNN) based on the Visual Geometry Group (VGG-16) model. With a dataset of 25,686 images, the CNN is capable of learning and representing complex features in images, enabling it to recognize and classify medicinal plants with high accuracy. The proposed approach can efficiently classify plants with different growth stages, lighting conditions, and imaging settings, providing a reliable tool for plant identification. We achieved an impressive recognition rate of 98%, demonstrating the feasibility of using deep learning techniques for accurate plant classification. The proposed approach has enormous potential for providing healthcare professionals and herbal medicine researchers with a reliable tool for identifying herbal plants. The study represents an essential advancement in the use of deep learning techniques for medicinal plant recognition, overcoming the challenges posed by their complex appearance. The proposed approach has far-reaching implications and can significantly impact the field of herbal medicine research, enabling researchers and healthcare professionals to identify and classify medicinal plants more accurately.
... It is very useful for applications with large amounts of data and contains DNN algorithms (Goodfellow et al., 2016). A DNN is a network with multiple layers (input and output), and every layer contains nodes that connect with all nodes in the next layer (LeCun et al., 2015;Sharrab et al., 2021). The DNN was considered a solution to many problems by predicting the output variables from the input features (Dawson & Wilby, 2001). ...
Preprint
Water pollution is one of the most challenging environmental issues. A powerful tool for measuring the suitability of water for drinking is required. The Water Quality Index (WQI) is a widely used parameter for the assessment of water quality through mathematical formulas. In this paper, a Deep Neural Network (DNN) model is developed to forecast WQI based on parameters selected for the dry and wet seasons throughout the year. Statistical modeling and unsupervised machine learning techniques are used. These modeling techniques include Principal Component Analysis/Factor Analysis (PCA/FA), which is used to interpret seasonal changes and the sources of the springs under study. The other modeling technique utilized in this study is Hierarchical Cluster Analysis (HCA). The results of this study reveal that the developed DNN model has achieved a high accuracy of 0.951. The goodness of fit of the developed model using R-Squared (R2) is 0.98, which is deemed high. The Mean Square Error metric is close to zero. Furthermore, the PCA/FA revealed five major parameters that impact water quality, which together account for 92% of the total variance of water quality in summer and 96% in winter. Moreover, results show that the average WQI for all springs indicates poor water quality at 46.75% during the dry season and medium water quality at 55.5% during the wet season.
Article
Water pollution is one of the most challenging environmental issues. A powerful tool for measuring the suitability of water for drinking is required. The Water Quality Index (WQI) is a widely used parameter for the assessment of water quality through mathematical formulas. In this paper, a Deep Neural Network (DNN) model is developed to forecast WQI based on parameters selected for the dry and wet seasons throughout the year. Statistical modeling and unsupervised machine learning techniques are used. These modeling techniques include Principal Component Analysis/Factor Analysis (PCA/FA), which is used to interpret seasonal changes and the sources of the springs under study. The other modeling technique utilized in this study is Hierarchical Cluster Analysis (HCA). The results of this study reveal that the developed DNN model has achieved a high accuracy of ***. The goodness of fit of the developed model using R-Squared (R2) is 0.98, which is deemed high. The Mean Square Error metric is close to zero. Furthermore, the PCA/FA revealed five major parameters that impact water quality, which together account for 92% of the total variance of water quality in summer and 96% in winter. Moreover, results show that the average WQI for all springs indicates poor water quality at 46.75% during the dry season and medium water quality at 55.5% during the wet season.
Article
The widespread appeal of Smart Voice Assistants (SVAs) stems from their ability to enhance the everyday lives of consumers in a practical, enjoyable, and meaningful manner. Despite their popularity, the factors that shape consumer adoption of SVAs remain largely unexplored. To address this research gap, we utilized complexity theory to construct an integrated model that sheds light on the determinants of consumer decision-making in regard to SVA adoption. Furthermore, we applied fuzzy-set Qualitative Comparative Analysis (fsQCA) to examine the proposed model and uncover the causal recipes associated with SVA adoption. Our necessary condition analysis highlights that perceived ease of use, perceived usefulness, perceived humanness, and perceived social presence are necessary predictors for consumers' intentions to adopt SVA. This study constitutes a significant addition to the existing literature by providing a comprehensive and nuanced understanding of the drivers of SVA adoption. Moreover, it offers crucial implications for online service provider managers to improve the adoption of SVAs among their customers.
... Finally, Section VII presents the conclusion. [17], [18]. The Second Stage: Expert Systems. ...
Article
Technological revolutions greatly impact current and future classrooms. These advances in technology include the revolution in artificial intelligence, virtual reality, and super-high-speed internet. In the coming generation (6G), the data rate will be sufficient for live scenes in virtual reality applications such as telepresence and teleoperation. This paper reviews and discusses next-generation technologies in AI, VR, and communication. Moreover, we examine the motivation for establishing an Advanced Technology-based Smart and Immersive Classroom (SIC) and the advantages of its availability to the virtual society. Recent advances in computer and communications technology have delivered capabilities to tomorrow’s SIC. Advances in virtual reality and real-time streaming on the internet have created a revolution in curricula and classrooms. Index Terms—Virtual Reality in education; 6G for education; Artificial Intelligence in education; Immersive Classroom
Article
Object detection models commonly focus on utilizing the visible spectrum via Red–Green–Blue (RGB) imagery. Due to various limitations with this approach in low visibility settings, there is growing interest in fusing RGB with thermal Long Wave Infrared (LWIR) (7.5–13.5 µm) images to increase object detection performance. However, we still lack baseline performance metrics evaluating RGB, LWIR and RGB-LWIR fused object detection machine learning models, especially from air-based platforms. This study undertakes such an evaluation, finding that a blended RGB-LWIR model generally exhibits superior performance compared to independent RGB or LWIR approaches. For example, an RGB-LWIR blend only performs 1–5% behind the RGB approach in predictive power across various altitudes and periods of clear visibility. Yet, RGB fusion with a thermal signature overlay provides edge redundancy and edge emphasis, both of which are vital in supporting edge detection machine learning algorithms (especially in low visibility environments). This approach has the ability to improve object detection performance for a range of use cases in industrial, consumer, government, and military applications. This research greatly contributes to the study of multispectral object detection by quantifying key factors affecting model performance from drone platforms (including distance, time-of-day and sensor type). Finally, this research additionally contributes a novel open labeled training dataset of 6300 images for RGB, LWIR, and RGB-LWIR fused imagery, collected from air-based platforms, enabling further multispectral machine-driven object detection research.
Article
Advancement of the prediction models used in a variety of fields is a result of the contribution of machine learning approaches. Utilizing such modeling in feature engineering is imperative. In this research, we show how to utilize machine learning to save time in research experiments, where we save more than five thousand hours of measuring the energy consumption of encoding recordings. Because the energy consumption must be measured manually and more than eleven thousand experiments are required to cover all the combinations of video sequences, video bit rates, and video encoding settings, we utilize machine learning to model the energy consumption using linear regression. The VP8 codec has been offered by Google as an open video encoder in an effort to replace the popular MPEG-4 Part 10, known as the H.264/AVC video encoder standard. This research models energy consumption and describes the major differences between the H.264/AVC and VP8 encoders in terms of energy consumption and performance through experiments that are based on machine learning modeling. Twenty-nine raw video sequences are used, offering a wide range of resolutions and contents, with frame sizes ranging from QCIF (176x144) to 2160p (3840x2160). For fairness in the comparison analysis, we use seven settings in the VP8 encoder and fifteen types of tuning in H.264/AVC. The settings cover various video qualities. The performance metrics include video quality, encoding time, and encoding energy consumption.
Conference Paper
This work explores and compares the plethora of metrics for the performance evaluation of object-detection algorithms. Average precision (AP), for instance, is a popular metric for evaluating the accuracy of object detectors by estimating the area under the curve (AUC) of the precision × recall relationship. Depending on the point interpolation used in the plot, two different AP variants can be defined and, therefore, different results are generated. AP has six additional variants, increasing the possibilities of benchmarking. The lack of consensus in different works and AP implementations is a problem faced by the academic and scientific communities. Metric implementations written in different computational languages and platforms are usually distributed with corresponding datasets sharing a given bounding-box description. Such projects indeed help the community with evaluation tools, but demand extra work to be adapted for other datasets and bounding-box formats. This work reviews the most used metrics for object detection, detailing their differences, applications, and main concepts. It also proposes a standard implementation that can be used as a benchmark among different datasets with minimum adaptation on the annotation files.
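The two interpolation-based AP variants mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions (all-point interpolation and 11-point interpolation) and is not the implementation proposed by the cited work; the function names and the toy detections are hypothetical, and each detection is assumed to have already been labelled as a true or false positive.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(
    scores: Sequence[float],
    is_true_positive: Sequence[bool],
    num_ground_truth: int,
) -> Tuple[List[float], List[float]]:
    """Accumulate recall and precision over detections sorted by confidence."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    return recalls, precisions


def ap_all_point(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """Area under the precision envelope (all-point interpolation)."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i] - r[i - 1]) * p[i] for i in range(1, len(r)))


def ap_11_point(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """Mean of the interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    total = 0.0
    for k in range(11):
        level = k / 10
        reachable = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(reachable, default=0.0)
    return total / 11


# Toy example (hypothetical data): four detections, four ground-truth objects.
recalls, precisions = precision_recall_points(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_true_positive=[True, False, True, True],
    num_ground_truth=4,
)
print(ap_all_point(recalls, precisions))
print(ap_11_point(recalls, precisions))
```

On the same detections the two variants give different values, which illustrates why results reported under different AP definitions are not directly comparable.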
Article
Transfer learning aims at improving the performance of target learners on target domains by transferring the knowledge contained in different but related source domains. In this way, the dependence on a large number of target-domain data can be reduced for constructing target learners. Due to the wide application prospects, transfer learning has become a popular and promising area in machine learning. Although there are already some valuable and impressive surveys on transfer learning, these surveys introduce approaches in a relatively isolated way and lack the recent advances in transfer learning. Due to the rapid expansion of the transfer learning area, it is both necessary and challenging to comprehensively review the relevant studies. This survey attempts to connect and systematize the existing transfer learning research studies, as well as to summarize and interpret the mechanisms and the strategies of transfer learning in a comprehensive way, which may help readers have a better understanding of the current research status and ideas. Unlike previous surveys, this survey article reviews more than 40 representative transfer learning approaches, especially homogeneous transfer learning approaches, from the perspectives of data and model. The applications of transfer learning are also briefly introduced. In order to show the performance of different transfer learning models, over 20 representative transfer learning models are used for experiments. The models are performed on three different data sets, that is, Amazon Reviews, Reuters-21578, and Office-31, and the experimental results demonstrate the importance of selecting appropriate transfer learning models for different applications in practice.
Article
Global terrorist threats and illegal migration have intensified concerns for the security of citizens, and every effort is made to exploit all available technological advances to prevent adverse events and protect people and their property. Because they can be used at night and in weather conditions where RGB cameras do not perform well, thermal cameras have become an important component of sophisticated video surveillance systems. In this paper, we investigate the task of automatic person detection in thermal images using convolutional neural network models originally intended for detection in RGB images. We compare the performance of standard state-of-the-art object detectors such as Faster R-CNN, SSD, Cascade R-CNN, and YOLOv3, which were retrained on a dataset of thermal images extracted from videos that simulate illegal movements around the border and in protected areas. Videos are recorded at night in clear weather, rain, and fog, at different ranges, and with different movement types. YOLOv3 was significantly faster than the other detectors while achieving performance comparable with the best, so it was used in further experiments. We experimented with different training dataset settings in order to determine the minimum number of images needed to achieve good detection results on test datasets. We achieved excellent detection results with respect to average accuracy for all test scenarios, although a modest set of thermal images was used for training. We also test our trained model on different well-known and widely used thermal imaging datasets. In addition, we present the results of the recognition of humans and animals in thermal images, which is particularly important in the case of sneaking around objects and illegal border crossings. Also, we present our original thermal dataset used for experimentation, which contains surveillance videos recorded in different weather and shooting conditions.