Car Detection for Smart Parking Systems Based on Improved YOLOv5

Duy-Linh Nguyen*, Xuan-Thuy Vo, Adri Priadana and Kang-Hyun Jo§

Department of Electrical, Electronic and Computer Engineering
University of Ulsan, Ulsan 44610, South Korea
*ndlinh301@mail.ulsan.ac.kr
xthuy@islab.ulsan.ac.kr
priadana3202@mail.ulsan.ac.kr
§acejo@ulsan.ac.kr (corresponding author)

Received 4 September 2023
Revised 13 November 2023
Accepted 16 November 2023
Published 13 December 2023

Vietnam Journal of Computer Science (2023) 1–15
© The Author(s)
DOI: 10.1142/S2196888823500185

This is an Open Access article published by World Scientific Publishing Company. It is distributed under the terms of the Creative Commons Attribution 4.0 (CC BY) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Nowadays, YOLOv5 is one of the most popular object detection network architectures used in real-time and industrial systems, with traffic management and regulation as typical applications. To take advantage of the YOLOv5 network and develop a parking management tool, this paper proposes a car detection network based on a redesign of the YOLOv5 architecture. This research focuses on network parameter optimization using lightweight modules from the EfficientNet and PP-LCNet architectures. In addition, this work presents an aerial-view dataset for car detection in parking lots, named AVPL. The proposed network is trained and evaluated on two benchmark datasets, the Car Parking Lot Dataset and the Pontifical Catholic University of Parana+ Dataset, as well as the proposed dataset. The experiments are reported using the mAP@0.5 and mAP@0.5:0.95 metrics. As a result, the network achieves its best performances of 95.8%, 97.4%, and 97.0% mAP@0.5 on the Car Parking Lot Dataset, the Pontifical Catholic University of Parana+ Dataset, and the proposed AVPL dataset, respectively. A set of demonstration videos and the proposed dataset are available here: https://bit.ly/3YUoSwi.

Keywords: Convolutional neural network (CNN); EfficientNet; PP-LCNet; parking management; YOLOv5.
1. Introduction
Along with the rapid development of modern and smart cities, the number of vehicles in general, and cars in particular, has increased in both quantity and type. According to a report on the Statista website,1 there are currently about one and a half billion cars in the world, and it is predicted that the number of cars sold in 2023 will reach nearly 69.9 million. This number will increase further in the coming years. Therefore, developing tools to support parking lot management is essential.
To construct smart parking lots, researchers have proposed many methods based on geomagnetic,2 ultrasonic,3 infrared4 and wireless techniques.5 In the geomagnetic method, geomagnetic sensors installed around the parking lot are used to collect and transmit information about cars and the environment to the processing center. The ultrasonic method utilizes ultrasonic wave signals to predict the outline and boundaries of the cars through a grid map. Similarly, the infrared method applies infrared waves to estimate car distance and to communicate between devices through electronic circuits. The wireless technique designs wireless network nodes and develops them based on electronic chips to detect vehicle and environmental information in the parking lot. Generally, these approaches rely mainly on the operation of sensors designed and installed in the parking lot. Although these designs achieve high accuracy, they require large investment, labor, and maintenance costs, especially when deployed in large-scale parking lots. Exploiting the benefits of convolutional neural networks (CNNs) in the field of computer vision, several researchers have designed networks to detect empty or occupied parking spaces using conventional cameras with quite good accuracy.6-8 Following that trend, this paper proposes a car detector to support smart parking systems. This work explores lightweight network architectures and redesigns modules inside the YOLOv5 network9 to balance network parameters, detection accuracy, and computational complexity. It ensures deployment in real-time systems with the lowest deployment cost. The main contributions of this paper are shown below:
(1) Proposes an improved YOLOv5 architecture for car detection that can be applied to smart parking systems and other related fields of computer vision.
(2) Provides an Aerial View Parking Lot (AVPL) dataset for car detection tasks in parking lots.
(3) The proposed detector performs better than other detectors on the Car Parking Lot dataset, the Pontifical Catholic University of Parana+ dataset, and the proposed AVPL dataset.

The remainder of the paper is organized as follows: Section 2 presents car detection-based methods. Section 3 explains the proposed architecture in detail. Section 4 introduces the experimental setup and analyzes the experimental results. Section 5 summarizes the work and outlines future directions.
2. Related works
2.1. Traditional machine learning-based methods

The car detection process in traditional machine learning-based techniques is divided into two stages: manual feature extraction and classification. First, feature extractors generate feature vectors using classical methods such as the Scale-Invariant Feature Transform (SIFT),10 Histograms of Oriented Gradients (HOG),11 and Haar-like features.12 Then, the feature vectors go through classifiers such as the Support Vector Machine (SVM) and AdaBoost13,14 to obtain the target classification result. The traditional feature extraction methods rely heavily on prior knowledge. However, in practical applications there are many confounding factors, including weather, exposure, distortion, etc. Therefore, the applicability of these techniques in real-time systems is limited due to low accuracy.
2.2. CNN-based methods

Parking lot images obtained from drones or overhead cameras contain many small-sized cars. To detect these objects well, many studies have focused on small object detection using a combination of CNNs and traditional methods, or one-stage detectors. The authors in Refs. 15, 16 and 17 fuse modern CNNs and SVM networks to achieve high spatial resolution in vehicle detection and counting. Research in Ref. 18 develops a network based on the YOLOv3 architecture in which the backbone combines ResNet and DarkNet to address object detection in drone images. The work in Ref. 19 proposes a new feature-matching method and a spatial context analysis for pedestrian-vehicle discrimination. An improved YOLOv5 network architecture is designed by Ref. 20 for vehicle detection and classification in Unmanned Aerial Vehicle (UAV) imagery and by Ref. 21 for real-world imagery. Another study in Ref. 22 provides a one-stage detector (SF-SSD) with a new spatial cognition algorithm for car detection in UAV imagery. The advantage of modern machine learning methods is high detection and classification accuracy, especially for small-sized objects. However, they require the network to perform high-level feature extraction and fusion, with a certain complexity, to ensure operation in real-world conditions.
3. Methodology
The proposed car detection network is shown in Fig. 1. This network is an improved
YOLOv5 architecture including three main parts: backbone, neck, and detection
head.
3.1. Proposed network architecture
Basically, the structure of the proposed network follows the design of the YOLOv5 architecture with many changes inside the backbone and neck modules. Specifically, the Focus module is replaced by a simple block called Conv. This block is constructed from a standard convolution layer (Conv2D) with a kernel size of 1×1, followed by batch normalization (BN) and a ReLU activation function, as shown in Fig. 2(a). This replacement greatly reduces computational complexity but still ensures feature extraction at the initial stage.

Fig. 1. The architecture of the proposed car detector: backbone, neck, and detection head. The input is a 640×640×3 image, and the three detection outputs are 80×80×18 (small objects), 40×40×18 (medium objects), and 20×20×18 (large objects).

Fig. 2. The architecture of Conv (a), BottleNeck Cross Stage Partial (CSP) (b), and Spatial Pyramid Pooling (SPP) (c) blocks.
Subsequent blocks in the backbone module are also redesigned, inspired by lightweight network architectures such as PP-LCNet23 and EfficientNet.24 The design of the PP-LCNet layer is described in detail in Fig. 3(a). It consists of a depthwise convolution layer (3×3 DWConv), an attention block (SE block), and ends with a standard convolution layer (1×1 Conv2D). In between these layers, BN and the hard-swish activation function25 are used. The SE block is an attention mechanism based on a global average pooling (GAP) layer, a fully connected layer (FC1) followed by a rectified linear unit (ReLU) activation function, and a second fully connected layer (FC2) followed by a sigmoid activation function, as shown in Fig. 3(b). This design uses lightweight convolution layers, which saves a large number of network parameters. In addition, the attention mechanism helps the network focus on learning important information about the object at each feature map level. The next block is LiteEfficientNet. This block is very simple and is divided into two types corresponding to two stride levels (stride = 1 or stride = 2). In the first type, with stride = 2, the LiteEfficientNet block uses an expand convolution layer (1×1 Conv2D), a depthwise convolution layer (3×3 DWConv), and ends with a project convolution layer (1×1 Conv2D). In the second type, with stride = 1, the LiteEfficientNet block is designed exactly the same as the first type, with an added skip connection that merges the current and original feature maps through an addition operation. These blocks still apply the lightweight architectures and add one more skip connection to extract feature maps along the channel dimension. The combined use of PP-LCNet and LiteEfficientNet blocks ensures that features are extracted in both the spatial and channel dimensions at each feature map level. The detail of the LiteEfficientNet block is shown in Fig. 4. The last block in the backbone module is the SPP block. This work re-applies the architecture of SPP from YOLOv5, as shown in Fig. 2(c). However, to minimize the network parameters, the max pooling kernel sizes are reduced from 5×5, 9×9, and 13×13 to 3×3, 5×5, and 7×7, respectively.

Fig. 3. The architecture of PP-LCNet (a) and SE (b) blocks.
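To make these block descriptions concrete, the following is a minimal PyTorch sketch of the SE, PP-LCNet, LiteEfficientNet, and reduced-kernel SPP blocks. The channel widths, expansion ratio, and SE reduction factor are illustrative assumptions, not necessarily the exact configuration used in the paper.

import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation: GAP -> FC1 + ReLU -> FC2 + sigmoid -> channel re-weighting."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)
        w = torch.relu(self.fc1(w))
        w = torch.sigmoid(self.fc2(w)).view(b, c, 1, 1)
        return x * w


class PPLCNetLayer(nn.Module):
    """3x3 depthwise conv -> SE block -> 1x1 pointwise conv, with BN + hard-swish."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.se = SEBlock(in_ch)
        self.pw = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.Hardswish()

    def forward(self, x):
        x = self.act(self.bn1(self.dw(x)))
        x = self.se(x)
        return self.act(self.bn2(self.pw(x)))


class LiteEfficientNetBlock(nn.Module):
    """Expand (1x1) -> depthwise (3x3) -> project (1x1); skip connection only when stride = 1."""

    def __init__(self, in_ch, out_ch, stride, expand_ratio=2):  # expand_ratio is an assumption
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_skip = stride == 1 and in_ch == out_ch
        self.expand = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True))
        self.dw = nn.Sequential(
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True))
        self.project = nn.Sequential(
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch))  # no activation after the projection

    def forward(self, x):
        y = self.project(self.dw(self.expand(x)))
        return x + y if self.use_skip else y


class SPP(nn.Module):
    """Spatial Pyramid Pooling with the reduced 3/5/7 max-pooling kernels (instead of 5/9/13)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        hidden = in_ch // 2
        self.reduce = nn.Conv2d(in_ch, hidden, 1, bias=False)
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (3, 5, 7))
        self.fuse = nn.Conv2d(hidden * 4, out_ch, 1, bias=False)

    def forward(self, x):
        x = self.reduce(x)
        return self.fuse(torch.cat([x] + [p(x) for p in self.pools], dim=1))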
The neck module in the proposed network utilizes the Path Aggregation Network (PAN) architecture with Conv and BottleNeck CSP blocks following the original YOLOv5. Figure 2(b) shows details of the CSP block. The PAN module combines the current feature maps with previous feature maps through concatenation operations. It generates an output of three multi-scale feature maps with enriched information. These serve as the three inputs for the detection heads.
The detection head module also leverages the construction of the three detection heads from YOLOv5. The three feature map scales of the PAN neck go through three convolution operations to conduct prediction at three object scales: small, medium, and large. Each detection head uses three anchor sizes, as described in Table 1.
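Since the network detects a single class (car), the 18 output channels in Fig. 1 and Table 1 correspond to 3 anchors × (4 box coordinates + 1 objectness score + 1 class score). A minimal sketch of the three head convolutions, assuming the neck channel widths shown in Fig. 1:

import torch
import torch.nn as nn

num_anchors = 3
num_outputs = 4 + 1 + 1                 # (x, y, w, h), objectness, one class (car)
neck_channels = (192, 384, 768)         # P3, P4, P5 widths taken from Fig. 1 (assumption)

# Each detection head is a single 1x1 Conv2D mapping a neck feature map to 18 channels.
heads = nn.ModuleList(nn.Conv2d(c, num_anchors * num_outputs, kernel_size=1)
                      for c in neck_channels)

features = [torch.randn(1, c, s, s) for c, s in zip(neck_channels, (80, 40, 20))]
preds = [head(f) for head, f in zip(heads, features)]
print([tuple(p.shape) for p in preds])  # [(1, 18, 80, 80), (1, 18, 40, 40), (1, 18, 20, 20)]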
3.2. Loss function
The definition of the loss function is shown as follows:

    L = \lambda_{box} L_{box} + \lambda_{obj} L_{obj} + \lambda_{cls} L_{cls},    (1)

where L_{box} uses the CIoU loss26 to compute the bounding box regression, as shown in Eq. (2). The object confidence score loss L_{obj} and the classification loss L_{cls} are calculated using the Binary Cross Entropy loss,27 as presented in Eqs. (5) and (6), respectively. \lambda_{box}, \lambda_{obj} and \lambda_{cls} are balancing parameters.
Fig. 4. The two types of LiteEfficientNet architecture, stride = 2 (a) and stride = 1 (b).

Table 1. Detection heads and anchor sizes.

Head   Input          Anchor sizes                          Output       Object
1      80×80×192      (10, 13), (16, 30), (33, 23)          80×80×18     Small
2      40×40×384      (30, 61), (62, 45), (59, 119)         40×40×18     Medium
3      20×20×768      (116, 90), (156, 198), (373, 326)     20×20×18     Large
The bounding box regression loss:

    L_{box} = \frac{1}{N_{pos}} \sum_{(x,y)} \left[ 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha \nu \right],    (2)

in which:
- N_{pos} is the total number of cells containing an object,
- (x, y) is the coordinate of each cell,
- IoU is the Intersection over Union of the predicted and ground-truth boxes,
- b and b^{gt} are the central points of the predicted bounding box and the ground-truth bounding box, respectively,
- \rho is the Euclidean distance, and c is the diagonal length of the smallest enclosing bounding box covering the two boxes,
- \alpha is a positive trade-off parameter:

    \alpha = \frac{\nu}{(1 - IoU) + \nu},    (3)

- \nu measures the consistency of the aspect ratio:

    \nu = \frac{4}{\pi^2} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^2,    (4)

  where w^{gt}, h^{gt} and w, h are the dimensions (w: width, h: height) of the ground-truth box and the predicted box, respectively.
The object confidence score loss:

    L_{obj} = -\frac{1}{N_{pos}} \sum_{(x,y)} \sum_{c=1}^{classes} \left[ C_c \log(p(C_c)) + (1 - C_c) \log(1 - p(C_c)) \right],    (5)

in which:
- C_c is the confidence score of an object belonging to class c,
- p(C_c) is the predicted probability of an object belonging to class c.

The classification loss:

    L_{cls} = -\frac{1}{N_{pos}} \sum_{(x,y)} \sum_{c=1}^{classes} \left[ y_c \log(p(y_c)) + (1 - y_c) \log(1 - p(y_c)) \right],    (6)

in which:
- y_c is the ground-truth label of class c,
- p(y_c) is the predicted probability of class c.
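A compact PyTorch sketch of the CIoU term in Eqs. (2)-(4) may help make the notation concrete. The (x1, y1, x2, y2) corner box format and the small epsilon terms are implementation assumptions, not prescribed by the paper.

import math
import torch


def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss of Eqs. (2)-(4) for batched boxes in (x1, y1, x2, y2) format (assumed)."""
    # Intersection over Union
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # rho^2: squared distance between the box centers
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2

    # c^2: squared diagonal of the smallest box enclosing both boxes
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # nu (aspect-ratio consistency, Eq. (4)) and alpha (trade-off parameter, Eq. (3))
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    nu = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
    with torch.no_grad():
        alpha = nu / ((1 - iou) + nu + eps)

    # Eq. (2): the mean over positive samples corresponds to the 1/N_pos factor
    return (1 - iou + rho2 / c2 + alpha * nu).mean()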
4. Experiments
4.1. Datasets

The proposed network is trained and evaluated on two benchmark datasets, the Car Parking Lot Dataset (CarPK) and the Pontifical Catholic University of Parana+ Dataset (PUCPR+),28 and one proposed dataset. The CarPK dataset contains 89,777 cars collected with a Phantom 3 Professional drone. The images were taken over four parking lots from an approximate height of 40 m. The CarPK dataset is divided into 988 images for the training phase and 459 images for the validation phase. The PUCPR+ dataset is selected from a part of the PUCPR dataset and consists of 16,456 cars. The PUCPR+ dataset provides 100 images for training and 25 images for validation. These are image datasets for car counting in different parking lots. The cars in the images are annotated by bounding boxes given by their top-left and bottom-right corners and stored as text files (*.txt files). To accommodate the training and evaluation processes, this experiment converts the entire format of the annotation files to the YOLOv5 format. The proposed dataset is a combination of the Drone Car Counting Dataset YOLO29 and the Aerial View Car Detection for Yolov5,30 both from the Kaggle website, named the AVPL. This dataset contains 339 drone-view images, including 314 images for training and 25 images for evaluation. The annotation files follow the YOLO format.
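As a rough illustration of the annotation conversion mentioned above, the sketch below rewrites a box given by its top-left and bottom-right pixel corners into the YOLO format (class id followed by normalized center coordinates, width, and height). The exact layout of the source annotation files, the file names, and the single class id 0 are assumptions.

from pathlib import Path


def corners_to_yolo(line, img_w, img_h, class_id=0):
    """Convert one 'x1 y1 x2 y2 ...' annotation line to YOLO 'class cx cy w h' (normalized)."""
    x1, y1, x2, y2 = map(float, line.split()[:4])
    cx = (x1 + x2) / 2 / img_w   # normalized box center x
    cy = (y1 + y2) / 2 / img_h   # normalized box center y
    w = (x2 - x1) / img_w        # normalized box width
    h = (y2 - y1) / img_h        # normalized box height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def convert_file(src, dst, img_w, img_h):
    lines = [l for l in Path(src).read_text().splitlines() if l.strip()]
    Path(dst).write_text("\n".join(corners_to_yolo(l, img_w, img_h) for l in lines))


# Hypothetical usage: convert_file("annotations/0001.txt", "labels/0001.txt", 1280, 720)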
4.2. Experimental setup

The proposed network is implemented with the PyTorch framework (version 1.11.0) and the Python programming language (version 3.7.1). The network is trained on a Tesla V100 32 GB GPU and evaluated on a GeForce GTX 1080Ti 11 GB GPU. The optimizer is the Adam optimizer. The learning rate is initialized at 10^-5 and ends at 10^-3. The momentum is set at 0.8 and then increases to 0.937. The training process goes through 300 epochs with a batch size of 64. The balance parameters are set as follows: \lambda_{box} = 0.05, \lambda_{obj} = 1, and \lambda_{cls} = 0.5. To increase training scenarios and avoid the over-fitting issue, this experiment applies data augmentation methods such as mosaic, translate, scale, and flip. For the inference process, the other arguments are set as follows: an image size of 1024×1024, a batch size of 32, a confidence threshold of 0.5, and an IoU threshold of 0.5. The speed testing results are reported in milliseconds (ms).
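One possible way to gather the settings listed above into configuration dictionaries is sketched below; the values come from the text, while the key names are illustrative and do not correspond to the exact YOLOv5 hyperparameter file keys.

train_config = {
    "epochs": 300,
    "batch_size": 64,
    "optimizer": "Adam",
    "lr_start": 1e-5,                    # learning rate at the start of training
    "lr_end": 1e-3,                      # learning rate at the end of the schedule
    "momentum_start": 0.8,               # momentum during warm-up
    "momentum_end": 0.937,
    "loss_weights": {"box": 0.05, "obj": 1.0, "cls": 0.5},
    "augmentation": ["mosaic", "translate", "scale", "flip"],
}

infer_config = {
    "img_size": 1024,
    "batch_size": 32,
    "conf_thres": 0.5,
    "iou_thres": 0.5,
}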
4.3. Experimental results
The performance of the proposed network is evaluated through comparisons with the YOLOv5 networks retrained from scratch and with recent research on the two benchmark datasets above. Specifically, this work conducts the training and evaluation of the proposed network alongside four versions of the YOLOv5 architecture (l, m, s, n). It then compares the results obtained with the results in Refs. 20 and 22 on the CarPK dataset and with the results in Ref. 20 on the PUCPR+ dataset. As a result, the proposed network achieves 95.8% mean Average Precision at an IoU threshold of 0.5 (mAP@0.5) and 63.1% mAP averaged over 10 IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95) on the CarPK dataset. This result shows the superior ability of the proposed network compared to other networks. The speed (inference time) is only 1.7 ms higher than the retrained YOLOv5m network, nearly 1.5 times lower than the retrained YOLOv5l network, and from 2.3 (YOLOv5m) to 7.9 (YOLOv5x) times lower than the other experiments in Ref. 20. Besides, the weight of the network (22.7 MB) and the computational complexity (23.9 GFLOPs) are only half of those of the retrained YOLOv5m architecture. The comparison results on the CarPK validation set are presented in Table 2. For the PUCPR+ dataset, the proposed network achieves 97.4% mAP@0.5 and 58.0% mAP@0.5:0.95. This result is outstanding compared to the other competitors and is only 0.3% mAP@0.5 and 2.5% mAP@0.5:0.95 lower than the retrained YOLOv5m, respectively. However, the proposed network has a speed of 17.9 ms, which is only slightly higher than the retrained YOLOv5m network (2.3 ms higher) and lower than the retrained YOLOv5l network (4.5 ms lower). The comparison results on the PUCPR+ dataset are shown in Table 3. For the AVPL dataset, this work only compares the performance of the proposed network with the four versions of the YOLOv5 architecture. The results in Table 4 show that the proposed network reaches 97.0% mAP@0.5 and 76.2% mAP@0.5:0.95. This result leads the YOLOv5 series except for YOLOv5m (0.2% lower) at mAP@0.5:0.95, while the computational complexity and network weight are half of those of YOLOv5m. For speed evaluation, the proposed network is better than the large-size model YOLOv5l (36.2 ms lower) and the medium-size model YOLOv5m (3.2 ms lower) under the same testing conditions. Several qualitative results on each dataset are shown in Fig. 5.
Table 2. Comparison results of the proposed car detection network with other networks and retrained YOLOv5 on the CarPK validation set.

Models                      Parameters    Weight (MB)   GFLOPs   mAP@0.5   mAP@0.5:0.95   Inf. time (ms)
YOLOv5l*                    46,631,350    93.7          114.2    95.3      62.3           26.4
YOLOv5m*                    21,056,406    42.4          50.4     94.4      61.5           15.9
YOLOv5s*                    7,022,326     14.3          15.8     95.6      62.7           8.7
YOLOv5n*                    1,765,270     3.7           4.2      93.9      57.8           6.3
YOLOv5x (Ref. 20)           N/A           167.0         205.0    94.5      57.9           138.2
YOLOv5l (Ref. 20)           N/A           90.6          108.0    95.0      59.2           72.1
YOLOv5m (Ref. 20)           N/A           41.1          48.0     94.6      57.8           40.4
Modified YOLOv5 (Ref. 20)   N/A           44.0          57.7     94.9      61.1           50.5
SSD (Ref. 22)               N/A           N/A           N/A      68.7      N/A            N/A
YOLO9000 (Ref. 22)          N/A           N/A           N/A      20.9      N/A            N/A
YOLOv3 (Ref. 22)            N/A           N/A           N/A      85.3      N/A            N/A
YOLOv4 (Ref. 22)            N/A           N/A           N/A      87.81     N/A            N/A
SA+CF+CRT (Ref. 22)         N/A           N/A           N/A      89.8      N/A            N/A
SF-SSD (Ref. 22)            N/A           N/A           N/A      90.1      N/A            N/A
Ours                        11,188,534    22.7          23.9     95.8      63.1           17.6

Note: The symbol * denotes the networks retrained from scratch. N/A means the value is not available.
Besides, this experiment also compares the performance of the proposed method and YOLOv5m; the results are shown in Fig. 6. These results prove that the proposed method is better than YOLOv5m when detecting cars in dark colors, overlapping, and crowded conditions.

From the mentioned results, the proposed network achieves a balance of performance, speed, and network parameters. Therefore, it can be implemented in smart parking systems on low-computing and embedded devices. However, the process of testing this network also revealed some disadvantages. Since the car detection network relies mainly on the signal obtained from a drone-view or floor-view camera, it is influenced by a number of environmental factors, including illumination, weather, car density, occlusion, shadow, object similarity, and the distance from the camera to the cars.
4.4. Ablation studies

The experiment conducted several ablation studies to inspect the importance of each block in the proposed backbones. The blocks are replaced in turn, trained on the CarPK training set, and evaluated on the CarPK validation set, as shown in Table 5.
Table 4. Comparison results of the proposed car detection network with retrained YOLOv5 on the AVPL validation set.

Models     Parameters    Weight (MB)   GFLOPs   mAP@0.5   mAP@0.5:0.95   Inf. time (ms)
YOLOv5l    46,631,350    93.7          114.2    96.0      76.1           65.7
YOLOv5m    21,056,406    42.4          50.4     96.4      76.4           32.7
YOLOv5s    7,022,326     14.3          15.8     95.9      75.7           13.3
YOLOv5n    1,765,270     3.7           4.2      96.3      74.3           4.6
Ours       11,188,534    22.7          23.9     97.0      76.2           29.5
Table 3. Comparison results of the proposed car detection network with other networks and retrained YOLOv5 on the PUCPR+ validation set.

Models                Parameters    Weight (MB)   GFLOPs   mAP@0.5   mAP@0.5:0.95   Inf. time (ms)
YOLOv5l*              46,631,350    93.7          114.2    96.4      53.8           22.4
YOLOv5m*              21,056,406    42.4          50.4     97.7      60.5           15.6
YOLOv5s*              7,022,326     14.3          15.8     84.6      38.9           7.4
YOLOv5n*              1,765,270     3.7           4.2      89.7      41.6           5.9
SSD (Ref. 22)         N/A           N/A           N/A      32.6      N/A            N/A
YOLO9000 (Ref. 22)    N/A           N/A           N/A      12.3      N/A            N/A
YOLOv3 (Ref. 22)      N/A           N/A           N/A      95.0      N/A            N/A
YOLOv4 (Ref. 22)      N/A           N/A           N/A      94.1      N/A            N/A
SA+CF+CRT (Ref. 22)   N/A           N/A           N/A      92.9      N/A            N/A
SF-SSD (Ref. 22)      N/A           N/A           N/A      90.8      N/A            N/A
Ours                  11,188,534    22.7          23.9     97.4      58.0           17.9

Note: The symbol * denotes the networks retrained from scratch. N/A means the value is not available.
Fig. 5. The qualitative results of the proposed network on the validation sets of the CarPK, PUCPR+, and AVPL datasets with IoU threshold = 0.5 and confidence score = 0.5.
The results in Table 5 show that the PP-LCNet block increases the network performance at mAP@0.5 (1.1% higher) but decreases it at mAP@0.5:0.95 (0.8% lower) when compared to the LiteEfficientNet block. Combining these two blocks, together with the starting Conv block and the ending SPP block, gives the best result. Besides, Table 5 also shows the superiority of the SPP block (0.4% higher in both mAP@0.5 and mAP@0.5:0.95) over the SPPF block, while both generate the same GFLOPs and number of network parameters.
5. Conclusion
This paper introduces an improved YOLOv5 architecture for car detection in smart parking systems. The proposed network contains three main modules: backbone, neck, and detection head. The backbone module is redesigned using the lightweight architectures PP-LCNet and LiteEfficientNet.
Table 5. Ablation studies with different types of backbones on the CarPK validation set.

Blocks              Backbone 1    Backbone 2    Backbone 3    Backbone 4 (proposed)
Conv                ✓             ✓             ✓             ✓
PP-LCNet            ✓                           ✓             ✓
LiteEfficientNet                  ✓             ✓             ✓
SPPF                                            ✓
SPP                 ✓             ✓                           ✓
Parameters          10,728,766    9,780,850     11,188,534    11,188,534
Weight (MB)         21.9          19.9          22.7          22.7
GFLOPs              20.8          18.5          23.9          23.9
mAP@0.5             95.1          94.3          95.4          95.8
mAP@0.5:0.95        58.2          59.3          62.7          63.1
Fig. 6. The comparison results between the proposed method and YOLOv5m on the CarPK dataset with the IoU threshold and confidence set to 0.5.
The neck and detection head reuse the structure from the original YOLOv5m with several minor modifications. The network achieves the best results at 95.8%, 97.4%, and 97.0% mAP@0.5 and better performance when compared to recent works. The optimization of network parameters, speed, and detection accuracy provides the ability to deploy the model on real-time systems. In the future, the neck and detection head modules will be developed to detect smaller vehicles, and the model will be implemented on larger datasets. Moreover, the improved method will also be compared to the latest YOLOv8 version to inspect the efficiency and novelty of the proposed architecture.
Acknowledgment

This result was supported by the "Regional Innovation Strategy (RIS)" through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (MOE) (2021RIS-003).
ORCID
Duy-Linh Nguyen https://orcid.org/0000-0001-6184-4133
Xuan-Thuy Vo https://orcid.org/0000-0002-7411-0697
Adri Priadana https://orcid.org/0000-0002-1553-7631
Kang-Hyun Jo https://orcid.org/0000-0002-4937-7082
References

1. Scotiabank, Number of cars sold worldwide from 2010 to 2022, with a 2023 forecast (in million units), https://www.statista.com/statistics/200002/international-car-sales-since-1990. Accessed 1 January 2023.
2. F. Zhou and Q. Li, Parking guidance system based on zigbee and geomagnetic sensor technology, in 2014 13th Int. Symp. Distributed Computing and Applications to Business, Engineering and Science, 2014, Xianning, Hubei, China, pp. 268–271.
3. Y. Shao, P. Chen and T. Cao, A grid projection method based on ultrasonic sensor for parking space detection, in IGARSS 2018 - 2018 IEEE Int. Geoscience and Remote Sensing Symp., Valencia, Spain, 2018, pp. 3378–3381, doi: 10.1109/IGARSS.2018.8519022.
4. H.-C. Chen, C.-J. Huang and K.-H. Lu, Design of a non-processor OBU device for parking system based on infrared communication, in 2017 IEEE Int. Conf. Consumer Electronics - Taiwan (ICCE-TW), 2017, Taipei, Taiwan, pp. 297–298.
5. C. Yuan and L. Qian, Design of intelligent parking lot system based on wireless network, in 2017 29th Chinese Control and Decision Conf. (CCDC), 2017, Chongqing, China, pp. 3596–3601.
6. X. Ding and R. Yang, Vehicle and parking space detection based on improved YOLO network model, J. Phys.: Conf. Ser. 1325 (2019) 012084.
7. R. Martín Nieto, A. García-Martín, A. G. Hauptmann and J. M. Martinez, Automatic vacant parking places management system using multicamera vehicle detection, IEEE Trans. Intell. Transp. Syst. 20(3) (2019) 1069–1080.
8. S. N. R. Mettupally and V. Menon, A smart eco-system for parking detection using deep learning and big data analytics, in 2019 SoutheastCon, 2019, Huntsville, Alabama, USA, pp. 1–4.
9. G. Jocher et al., ultralytics/yolov5: v3.1 - Bug fixes and performance improvements (2020).
10. X. Zihao, H. Weiquan and W. Yin, Multi-class vehicle detection in surveillance video based on deep learning, J. Comput. Appl. 39(3) (2019) 700–705.
11. S. Zhang and X. Wang, Human detection and object tracking based on histograms of oriented gradients, in 2013 Ninth Int. Conf. Natural Computation (ICNC), 2013, Shenyang, China, pp. 1349–1353.
12. P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, in Proc. 2001 IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR 2001), Vol. 1, 2001, Kauai, HI, USA, pp. I–I.
13. Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, in Computational Learning Theory, ed. P. Vitányi (Springer, Berlin, 1995), pp. 23–37.
14. V. Mitra, C.-J. Wang and S. Banerjee, Text classification: A least square support vector machine approach, Appl. Soft Comput. 7(6) (2007) 908–914.
15. N. Ammour, H. Alhichri, Y. Bazi, B. Benjdira, N. Alajlan and M. Zuair, Deep learning approach for car detection in UAV imagery, Remote Sens. 9 (2017) 1–15.
16. F. Zhao, Q. Kong, Y. Zeng and B. Xu, A brain-inspired visual fear responses model for UAV emergent obstacle dodging, IEEE Trans. Cogn. Develop. Syst. 12(1) (2020) 124–132.
17. S. Chen, S. Zhang, J. Shang, B. Chen and N. Zheng, Brain-inspired cognitive model with attention for self-driving cars, IEEE Trans. Cogn. Develop. Syst. 11(1) (2019) 13–25.
18. M. Liu, X. Wang, A. Zhou, X. Fu, Y. Ma and C. Piao, UAV-YOLO: Small object detection on unmanned aerial vehicle perspective, Sensors 20(8) (2020) 2238.
19. X. Liang, J. Zhang, L. Zhuo, Y. Li and Q. Tian, Small object detection in unmanned aerial vehicle images using feature fusion and scaling-based single shot detector with spatial context analysis, IEEE Trans. Circuits Syst. Video Technol. 30(6) (2019) 1758–1770.
20. M. H. Hamzenejadi and H. Mohseni, Real-time vehicle detection and classification in UAV imagery using improved YOLOv5, in 2022 12th Int. Conf. Computer and Knowledge Engineering (ICCKE), 2022, Ferdowsi University of Mashhad, Iran, pp. 231–236.
21. Y. Zhang, Z. Guo, J. Wu, Y. Tian, H. Tang and X. Guo, Real-time vehicle detection based on improved YOLO v5, Sustainability 14(19) (2022) 12274.
22. J. Yu, H. Gao, J. Sun, D. Zhou and Z. Ju, Spatial cognition-driven deep learning for car detection in unmanned aerial vehicle imagery, IEEE Trans. Cogn. Develop. Syst. 14(4) (2022) 1574–1583.
23. C. Cui, T. Gao, S. Wei, Y. Du, R. Guo, S. Dong, B. Lu, Y. Zhou, X. Lv, Q. Liu, X. Hu, D. Yu and Y. Ma, PP-LCNet: A lightweight CPU convolutional neural network, arXiv:2109.15099.
24. M. Tan and Q. V. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, arXiv:1905.11946.
25. A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, Q. V. Le and H. Adam, Searching for MobileNetV3, in 2019 IEEE/CVF Int. Conf. Computer Vision (ICCV), Seoul, Korea (South), 2019, pp. 1314–1324, doi: 10.1109/ICCV.2019.00140.
26. Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye and D. Ren, Distance-IoU loss: Faster and better learning for bounding box regression, arXiv:1911.08287.
27. U. Ruby and V. Yendapalli, Binary cross entropy with deep learning technique for image classification, Int. J. Adv. Trends Comput. Sci. Eng. 9 (2020) 5393–5397.
28. M. Hsieh, Y. Lin and W. H. Hsu, Drone-based object counting by spatially regularized regional proposal network, arXiv:1707.05972.
29. Hemateja, Drone car counting dataset YOLO (2021), https://www.kaggle.com/datasets/ahemateja19bec1025/drone-car-counting-dataset-yolo. Accessed 21 August 2023.
30. Braunge, Aerial view car detection for Yolov5 (2023), https://www.kaggle.com/datasets/braunge/aerial-view-car-detection-for-yolov5. Accessed 21 August 2023.