Car Detection for Smart Parking Systems Based on Improved YOLOv5

Duy-Linh Nguyen*, Xuan-Thuy Vo, Adri Priadana and Kang-Hyun Jo§

Department of Electrical, Electronic and Computer Engineering
University of Ulsan, Ulsan 44610, South Korea
*ndlinh301@mail.ulsan.ac.kr
xthuy@islab.ulsan.ac.kr
priadana3202@mail.ulsan.ac.kr
§acejo@ulsan.ac.kr (corresponding author)

Received 4 September 2023
Revised 13 November 2023
Accepted 16 November 2023
Published 13 December 2023

Vietnam Journal of Computer Science (2023) 1–15
© The Author(s)
DOI: 10.1142/S2196888823500185

This is an Open Access article published by World Scientific Publishing Company. It is distributed under the terms of the Creative Commons Attribution 4.0 (CC BY) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Nowadays, YOLOv5 is one of the most popular object detection network architectures used in real-time and industrial systems, with traffic management and regulation as typical applications. To take advantage of the YOLOv5 network and develop a parking management tool, this paper proposes a car detection network based on a redesign of the YOLOv5 architecture. This research focuses on network parameter optimization using lightweight modules from the EfficientNet and PP-LCNet architectures. In addition, this work presents an aerial-view dataset for car detection in parking lots, named AVPL. The proposed network is trained and evaluated on two benchmark datasets, the Car Parking Lot Dataset and the Pontifical Catholic University of Parana+ Dataset, as well as the proposed dataset. The experiments are reported using the mAP@0.5 and mAP@0.5:0.95 metrics. As a result, the network achieves its best performances of 95.8%, 97.4%, and 97.0% mAP@0.5 on the Car Parking Lot Dataset, the Pontifical Catholic University of Parana+ Dataset, and the proposed AVPL dataset, respectively. A set of demonstration videos and the proposed dataset are available here: https://bit.ly/3YUoSwi.

Keywords: Convolutional neural network (CNN); EfficientNet; PP-LCNet; parking management; YOLOv5.
1. Introduction
Along with the rapid development of modern and smart cities, the number of vehicles in general, and cars in particular, has increased in both quantity and type. According to a report on the Statista website,1 there are currently about one and a half billion cars in the world, and it is predicted that the number of cars sold in 2023 will reach nearly 69.9 million. This number will increase further in the coming years. Therefore, developing tools to support parking lot management is essential.
To construct smart parking lots, researchers have proposed many methods based on geomagnetic,2 ultrasonic,3 infrared4 and wireless techniques.5 In the geomagnetic method, geomagnetic sensors installed around the parking lot are used to collect and transmit information about cars and the environment to the processing center. The ultrasonic method utilizes ultrasonic wave signals to predict the outline and boundaries of the cars through a grid map. Similarly, the infrared method applies infrared waves to estimate car distance and to communicate between devices through electronic circuits. The wireless technique designs wireless network nodes and develops them based on electronic chips to detect vehicle and environmental information in the parking lot. Generally, these approaches rely mainly on the operation of sensors designed and installed in the parking lot. Although these designs achieve high accuracy, they require large investment, labor, and maintenance costs, especially when deployed in large-scale parking lots. Exploiting the benefits of convolutional neural networks (CNNs) in the field of computer vision, several researchers have designed networks to detect empty or occupied parking spaces using conventional cameras with quite good accuracy.6-8 Following that trend, this paper proposes a car detector to support smart parking systems. This work explores lightweight network architectures and redesigns modules inside the YOLOv5 network9 to balance network parameters, detection accuracy, and computational complexity. It ensures deployment in real-time systems with the lowest deployment cost. The main contributions of this paper are shown below:
(1) Proposes an improved YOLOv5 architecture for car detection that can be applied to smart parking systems and other related fields of computer vision.
(2) Provides an Aerial View Parking Lot (AVPL) dataset for car detection tasks in parking lots.
(3) The proposed detector performs better than other detectors on the Car Parking Lot dataset, the Pontifical Catholic University of Parana+ dataset, and the proposed AVPL dataset.

The remainder of the paper is organized as follows: Section 2 presents car detection-based methods. Section 3 explains the proposed architecture in detail. Section 4 introduces the experimental setup and analyzes the experimental results. Section 5 summarizes the work and outlines future directions.
2. Related works
2.1. Traditional machine learning-based methods

The car detection process in traditional machine learning-based techniques is divided into two stages: manual feature extraction and classification. First, feature extractors generate feature vectors using classical methods such as the Scale-Invariant Feature Transform (SIFT),10 Histograms of Oriented Gradients (HOG),11 and Haar-like features.12 Then, the feature vectors go through classifiers such as the Support Vector Machine (SVM) and AdaBoost13,14 to obtain the target classification result. The traditional feature extraction methods rely heavily on prior knowledge. However, in practical applications there are many confounding factors, including weather, exposure, distortion, etc. Therefore, the applicability of these techniques in real-time systems is limited due to low accuracy.
2.2. CNN-based methods

Parking lot images obtained from drones or overhead cameras contain many small-sized cars. To detect these objects well, many studies have focused on small object detection using a combination of CNNs and traditional methods, or one-stage detectors. The authors in Refs. 15, 16 and 17 fuse modern CNNs and SVM networks to achieve high spatial resolution in vehicle detection and counting. Research in Ref. 18 develops a network based on the YOLOv3 architecture in which the backbone combines ResNet and DarkNet to address object detection in drone images. The work in Ref. 19 proposes a new feature-matching method and a spatial context analysis for pedestrian-vehicle discrimination. An improved YOLOv5 network architecture is designed by Ref. 20 for vehicle detection and classification in Unmanned Aerial Vehicle (UAV) imagery and by Ref. 21 for real-world imagery. Another study in Ref. 22 provides a one-stage detector (SF-SSD) with a new spatial cognition algorithm for car detection in UAV imagery. The advantage of modern machine learning methods is high detection and classification accuracy, especially for small-sized objects. However, they require the network to perform high-level feature extraction and fusion, with a certain complexity, to ensure operation in real-world conditions.
3. Methodology
The proposed car detection network is shown in Fig. 1. This network is an improved
YOLOv5 architecture including three main parts: backbone, neck, and detection
head.
3.1. Proposed network architecture
Basically, the structure of the proposed network follows the design of the YOLOv5 architecture with many changes inside the backbone and neck modules. Specifically, the Focus module is replaced by a simple block called Conv. This block is constructed from a standard convolution layer (Conv2D) with a kernel size of 1×1, followed by batch normalization (BN) and a ReLU activation function, as shown in Fig. 2(a). This replacement greatly reduces computational complexity but still ensures feature extraction at the initial stage.

Fig. 1. The architecture of the proposed car detector: backbone, neck, and detection head. The input is a 640×640×3 image, and the three detection outputs are 80×80×18 (small objects), 40×40×18 (medium objects), and 20×20×18 (large objects).

Fig. 2. The architecture of Conv (a), BottleNeck Cross Stage Partial (CSP) (b), and Spatial Pyramid Pooling (SPP) (c) blocks.
Subsequent blocks in the backbone module are also redesigned, inspired by lightweight network architectures such as PP-LCNet23 and EfficientNet.24 The design of the PP-LCNet layer is described in detail in Fig. 3(a). It consists of a depthwise convolution layer (3×3 DWConv), an attention block (SE block), and ends with a standard convolution layer (1×1 Conv2D). In between these layers, BN and the hard-swish activation function25 are used. The SE block is an attention mechanism based on a global average pooling (GAP) layer, a fully connected layer (FC1) followed by a rectified linear unit (ReLU) activation function, and a second fully connected layer (FC2) followed by a sigmoid activation function, as shown in Fig. 3(b). This design uses lightweight convolution layers, which saves a large number of network parameters. In addition, the attention mechanism helps the network focus on learning important information about the object at each feature map level. The next block is LiteEfficientNet. This block is very simple and is divided into two types corresponding to two stride levels (stride = 1 or stride = 2). In the first type, with stride = 2, the LiteEfficientNet block uses an expand convolution layer (1×1 Conv2D), a depthwise convolution layer (3×3 DWConv), and ends with a project convolution layer (1×1 Conv2D). In the second type, with stride = 1, the LiteEfficientNet block is designed exactly the same as the first type, with an added skip connection that merges the current and original feature maps through an addition operation. These blocks still apply the lightweight architectures and add one more skip connection to extract feature maps along the channel dimension. The combined use of PP-LCNet and LiteEfficientNet blocks ensures that features are extracted in both the spatial and channel dimensions at each feature map level. The detail of the LiteEfficientNet block is shown in Fig. 4. The last block in the backbone module is the SPP block. This work re-applies the architecture of SPP from YOLOv5, as shown in Fig. 2(c). However, to minimize the network parameters, the max pooling kernel sizes are reduced from 5×5, 9×9, and 13×13 to 3×3, 5×5, and 7×7, respectively.

Fig. 3. The architecture of PP-LCNet (a) and SE (b) blocks.
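To make these block descriptions concrete, the following is a minimal PyTorch sketch of the SE, PP-LCNet, LiteEfficientNet, and reduced-kernel SPP blocks. The channel widths, expansion ratio, and SE reduction factor are illustrative assumptions, not necessarily the exact configuration used in the paper.

import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation: GAP -> FC1 + ReLU -> FC2 + sigmoid -> channel re-weighting."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)
        w = torch.relu(self.fc1(w))
        w = torch.sigmoid(self.fc2(w)).view(b, c, 1, 1)
        return x * w


class PPLCNetLayer(nn.Module):
    """3x3 depthwise conv -> SE block -> 1x1 pointwise conv, with BN + hard-swish."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.se = SEBlock(in_ch)
        self.pw = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.Hardswish()

    def forward(self, x):
        x = self.act(self.bn1(self.dw(x)))
        x = self.se(x)
        return self.act(self.bn2(self.pw(x)))


class LiteEfficientNetBlock(nn.Module):
    """Expand (1x1) -> depthwise (3x3) -> project (1x1); skip connection only when stride = 1."""

    def __init__(self, in_ch, out_ch, stride, expand_ratio=2):  # expand_ratio is an assumption
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_skip = stride == 1 and in_ch == out_ch
        self.expand = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True))
        self.dw = nn.Sequential(
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True))
        self.project = nn.Sequential(
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch))  # no activation after the projection

    def forward(self, x):
        y = self.project(self.dw(self.expand(x)))
        return x + y if self.use_skip else y


class SPP(nn.Module):
    """Spatial Pyramid Pooling with the reduced 3/5/7 max-pooling kernels (instead of 5/9/13)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        hidden = in_ch // 2
        self.reduce = nn.Conv2d(in_ch, hidden, 1, bias=False)
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (3, 5, 7))
        self.fuse = nn.Conv2d(hidden * 4, out_ch, 1, bias=False)

    def forward(self, x):
        x = self.reduce(x)
        return self.fuse(torch.cat([x] + [p(x) for p in self.pools], dim=1))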
The neck module in the proposed network utilizes the Path Aggregation Network (PAN) architecture with Conv and BottleNeck CSP blocks following the original YOLOv5. Figure 2(b) shows details of the CSP block. The PAN module combines the current feature maps with previous feature maps through concatenation operations. It generates an output of three multi-scale feature maps with enriched information. These serve as the three inputs for the detection heads.
The detection head module also leverages the construction of the three detection heads from YOLOv5. The three feature map scales of the PAN neck go through three convolution operations to conduct prediction at three object scales: small, medium, and large. Each detection head uses three anchor sizes, as described in Table 1.
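Since the network detects a single class (car), the 18 output channels in Fig. 1 and Table 1 correspond to 3 anchors × (4 box coordinates + 1 objectness score + 1 class score). A minimal sketch of the three head convolutions, assuming the neck channel widths shown in Fig. 1:

import torch
import torch.nn as nn

num_anchors = 3
num_outputs = 4 + 1 + 1                 # (x, y, w, h), objectness, one class (car)
neck_channels = (192, 384, 768)         # P3, P4, P5 widths taken from Fig. 1 (assumption)

# Each detection head is a single 1x1 Conv2D mapping a neck feature map to 18 channels.
heads = nn.ModuleList(nn.Conv2d(c, num_anchors * num_outputs, kernel_size=1)
                      for c in neck_channels)

features = [torch.randn(1, c, s, s) for c, s in zip(neck_channels, (80, 40, 20))]
preds = [head(f) for head, f in zip(heads, features)]
print([tuple(p.shape) for p in preds])  # [(1, 18, 80, 80), (1, 18, 40, 40), (1, 18, 20, 20)]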
3.2. Loss function
The definition of the loss function is shown as follows:

    L = \lambda_{box} L_{box} + \lambda_{obj} L_{obj} + \lambda_{cls} L_{cls},    (1)

where L_{box} uses the CIoU loss26 to compute the bounding box regression, as shown in Eq. (2). The object confidence score loss L_{obj} and the classification loss L_{cls} are calculated using the Binary Cross Entropy loss,27 as presented in Eqs. (5) and (6), respectively. \lambda_{box}, \lambda_{obj} and \lambda_{cls} are balancing parameters.
Fig. 4. The two types of LiteEfficientNet architecture, stride = 2 (a) and stride = 1 (b).

Table 1. Detection heads and anchor sizes.

Head   Input          Anchor sizes                          Output       Object
1      80×80×192      (10, 13), (16, 30), (33, 23)          80×80×18     Small
2      40×40×384      (30, 61), (62, 45), (59, 119)         40×40×18     Medium
3      20×20×768      (116, 90), (156, 198), (373, 326)     20×20×18     Large
The bounding box regression loss:

    L_{box} = \frac{1}{N_{pos}} \sum_{(x,y)} \left[ 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha \nu \right],    (2)

in which:
- N_{pos} is the total number of cells containing an object,
- (x, y) is the coordinate of each cell,
- IoU is the Intersection over Union of the predicted and ground-truth boxes,
- b and b^{gt} are the central points of the predicted bounding box and the ground-truth bounding box, respectively,
- \rho is the Euclidean distance, and c is the diagonal length of the smallest enclosing bounding box covering the two boxes,
- \alpha is a positive trade-off parameter:

    \alpha = \frac{\nu}{(1 - IoU) + \nu},    (3)

- \nu measures the consistency of the aspect ratio:

    \nu = \frac{4}{\pi^2} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^2,    (4)

  where w^{gt}, h^{gt} and w, h are the dimensions (w: width, h: height) of the ground-truth box and the predicted box, respectively.
The object confidence score loss:

    L_{obj} = -\frac{1}{N_{pos}} \sum_{(x,y)} \sum_{c=1}^{classes} \left[ C_c \log(p(C_c)) + (1 - C_c) \log(1 - p(C_c)) \right],    (5)

in which:
- C_c is the confidence score of an object belonging to class c,
- p(C_c) is the predicted probability of an object belonging to class c.

The classification loss:

    L_{cls} = -\frac{1}{N_{pos}} \sum_{(x,y)} \sum_{c=1}^{classes} \left[ y_c \log(p(y_c)) + (1 - y_c) \log(1 - p(y_c)) \right],    (6)

in which:
- y_c is the ground-truth label of class c,
- p(y_c) is the predicted probability of class c.
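A compact PyTorch sketch of the CIoU term in Eqs. (2)-(4) may help make the notation concrete. The (x1, y1, x2, y2) corner box format and the small epsilon terms are implementation assumptions, not prescribed by the paper.

import math
import torch


def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss of Eqs. (2)-(4) for batched boxes in (x1, y1, x2, y2) format (assumed)."""
    # Intersection over Union
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # rho^2: squared distance between the box centers
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2

    # c^2: squared diagonal of the smallest box enclosing both boxes
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # nu (aspect-ratio consistency, Eq. (4)) and alpha (trade-off parameter, Eq. (3))
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    nu = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
    with torch.no_grad():
        alpha = nu / ((1 - iou) + nu + eps)

    # Eq. (2): the mean over positive samples corresponds to the 1/N_pos factor
    return (1 - iou + rho2 / c2 + alpha * nu).mean()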
4. Experiments
4.1. Datasets

The proposed network is trained and evaluated on two benchmark datasets, the Car Parking Lot Dataset (CarPK) and the Pontifical Catholic University of Parana+ Dataset (PUCPR+),28 and one proposed dataset. The CarPK dataset contains 89,777 cars collected with a Phantom 3 Professional drone. The images were taken over four parking lots from an approximate height of 40 m. The CarPK dataset is divided into 988 images for the training phase and 459 images for the validation phase. The PUCPR+ dataset is selected from a part of the PUCPR dataset and consists of 16,456 cars. The PUCPR+ dataset provides 100 images for training and 25 images for validation. These are image datasets for car counting in different parking lots. The cars in the images are annotated by bounding boxes given by their top-left and bottom-right corners and stored as text files (*.txt files). To accommodate the training and evaluation processes, this experiment converts the entire format of the annotation files to the YOLOv5 format. The proposed dataset is a combination of the Drone Car Counting Dataset YOLO29 and the Aerial View Car Detection for Yolov5,30 both from the Kaggle website, named the AVPL. This dataset contains 339 drone-view images, including 314 images for training and 25 images for evaluation. The annotation files follow the YOLO format.
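As a rough illustration of the annotation conversion mentioned above, the sketch below rewrites a box given by its top-left and bottom-right pixel corners into the YOLO format (class id followed by normalized center coordinates, width, and height). The exact layout of the source annotation files, the file names, and the single class id 0 are assumptions.

from pathlib import Path


def corners_to_yolo(line, img_w, img_h, class_id=0):
    """Convert one 'x1 y1 x2 y2 ...' annotation line to YOLO 'class cx cy w h' (normalized)."""
    x1, y1, x2, y2 = map(float, line.split()[:4])
    cx = (x1 + x2) / 2 / img_w   # normalized box center x
    cy = (y1 + y2) / 2 / img_h   # normalized box center y
    w = (x2 - x1) / img_w        # normalized box width
    h = (y2 - y1) / img_h        # normalized box height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def convert_file(src, dst, img_w, img_h):
    lines = [l for l in Path(src).read_text().splitlines() if l.strip()]
    Path(dst).write_text("\n".join(corners_to_yolo(l, img_w, img_h) for l in lines))


# Hypothetical usage: convert_file("annotations/0001.txt", "labels/0001.txt", 1280, 720)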
4.2. Experimental setup

The proposed network is implemented with the PyTorch framework (version 1.11.0) and the Python programming language (version 3.7.1). The network is trained on a Tesla V100 32 GB GPU and evaluated on a GeForce GTX 1080Ti 11 GB GPU. The optimizer is the Adam optimizer. The learning rate is initialized at 10^-5 and ends at 10^-3. The momentum is set at 0.8 and then increases to 0.937. The training process goes through 300 epochs with a batch size of 64. The balance parameters are set as follows: \lambda_{box} = 0.05, \lambda_{obj} = 1, and \lambda_{cls} = 0.5. To increase training scenarios and avoid the over-fitting issue, this experiment applies data augmentation methods such as mosaic, translate, scale, and flip. For the inference process, the other arguments are set as follows: an image size of 1024×1024, a batch size of 32, a confidence threshold of 0.5, and an IoU threshold of 0.5. The speed testing results are reported in milliseconds (ms).
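One possible way to gather the settings listed above into configuration dictionaries is sketched below; the values come from the text, while the key names are illustrative and do not correspond to the exact YOLOv5 hyperparameter file keys.

train_config = {
    "epochs": 300,
    "batch_size": 64,
    "optimizer": "Adam",
    "lr_start": 1e-5,                    # learning rate at the start of training
    "lr_end": 1e-3,                      # learning rate at the end of the schedule
    "momentum_start": 0.8,               # momentum during warm-up
    "momentum_end": 0.937,
    "loss_weights": {"box": 0.05, "obj": 1.0, "cls": 0.5},
    "augmentation": ["mosaic", "translate", "scale", "flip"],
}

infer_config = {
    "img_size": 1024,
    "batch_size": 32,
    "conf_thres": 0.5,
    "iou_thres": 0.5,
}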
4.3. Experimental results
The performance of the proposed network is evaluated through comparisons with the YOLOv5 networks retrained from scratch and with recent research on the two benchmark datasets above. Specifically, this work conducts the training and evaluation of the proposed network alongside four versions of the YOLOv5 architecture (l, m, s, n). It then compares the results obtained with the results in Refs. 20 and 22 on the CarPK dataset and with the results in Ref. 20 on the PUCPR+ dataset. As a result, the proposed network achieves 95.8% mean Average Precision at an IoU threshold of 0.5 (mAP@0.5) and 63.1% mAP averaged over 10 IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95) on the CarPK dataset. This result shows the superior ability of the proposed network compared to other networks. The speed (inference time) is only 1.7 ms higher than the retrained YOLOv5m network, nearly 1.5 times lower than the retrained YOLOv5l network, and from 2.3 (YOLOv5m) to 7.9 (YOLOv5x) times lower than the other experiments in Ref. 20. Besides, the weight of the network (22.7 MB) and the computational complexity (23.9 GFLOPs) are only half of those of the retrained YOLOv5m architecture. The comparison results on the CarPK validation set are presented in Table 2. For the PUCPR+ dataset, the proposed network achieves 97.4% mAP@0.5 and 58.0% mAP@0.5:0.95. This result is outstanding compared to the other competitors and is only 0.3% mAP@0.5 and 2.5% mAP@0.5:0.95 lower than the retrained YOLOv5m, respectively. However, the proposed network has a speed of 17.9 ms, which is only slightly higher than the retrained YOLOv5m network (2.3 ms higher) and lower than the retrained YOLOv5l network (4.5 ms lower). The comparison results on the PUCPR+ dataset are shown in Table 3. For the AVPL dataset, this work only compares the performance of the proposed network with the four versions of the YOLOv5 architecture. The results in Table 4 show that the proposed network reaches 97.0% mAP@0.5 and 76.2% mAP@0.5:0.95. This result leads the YOLOv5 series except for YOLOv5m (0.2% lower) at mAP@0.5:0.95, while the computational complexity and network weight are half of those of YOLOv5m. For speed evaluation, the proposed network is better than the large-size model YOLOv5l (36.2 ms lower) and the medium-size model YOLOv5m (3.2 ms lower) under the same testing conditions. Several qualitative results on each dataset are shown in Fig. 5.
Table 2. Comparison results of the proposed car detection network with other networks and retrained YOLOv5 on the CarPK validation set.

Models                      Parameters    Weight (MB)   GFLOPs   mAP@0.5   mAP@0.5:0.95   Inf. time (ms)
YOLOv5l*                    46,631,350    93.7          114.2    95.3      62.3           26.4
YOLOv5m*                    21,056,406    42.4          50.4     94.4      61.5           15.9
YOLOv5s*                    7,022,326     14.3          15.8     95.6      62.7           8.7
YOLOv5n*                    1,765,270     3.7           4.2      93.9      57.8           6.3
YOLOv5x (Ref. 20)           N/A           167.0         205.0    94.5      57.9           138.2
YOLOv5l (Ref. 20)           N/A           90.6          108.0    95.0      59.2           72.1
YOLOv5m (Ref. 20)           N/A           41.1          48.0     94.6      57.8           40.4
Modified YOLOv5 (Ref. 20)   N/A           44.0          57.7     94.9      61.1           50.5
SSD (Ref. 22)               N/A           N/A           N/A      68.7      N/A            N/A
YOLO9000 (Ref. 22)          N/A           N/A           N/A      20.9      N/A            N/A
YOLOv3 (Ref. 22)            N/A           N/A           N/A      85.3      N/A            N/A
YOLOv4 (Ref. 22)            N/A           N/A           N/A      87.81     N/A            N/A
SA+CF+CRT (Ref. 22)         N/A           N/A           N/A      89.8      N/A            N/A
SF-SSD (Ref. 22)            N/A           N/A           N/A      90.1      N/A            N/A
Ours                        11,188,534    22.7          23.9     95.8      63.1           17.6

Note: The symbol * denotes the networks retrained from scratch. N/A means the value is not available.
Besides, this experiment also compares the performance of the proposed method and YOLOv5m; the results are shown in Fig. 6. These results prove that the proposed method is better than YOLOv5m when detecting cars in dark colors, overlapping, and crowded conditions.

From the mentioned results, the proposed network achieves a balance of performance, speed, and network parameters. Therefore, it can be implemented in smart parking systems on low-computing and embedded devices. However, the process of testing this network also revealed some disadvantages. Since the car detection network relies mainly on the signal obtained from a drone-view or floor-view camera, it is influenced by a number of environmental factors, including illumination, weather, car density, occlusion, shadow, object similarity, and the distance from the camera to the cars.
4.4. Ablation studies

The experiment conducted several ablation studies to inspect the importance of each block in the proposed backbones. The blocks are replaced in turn, trained on the CarPK training set, and evaluated on the CarPK validation set, as shown in Table 5.
Table 4. Comparison results of the proposed car detection network with retrained YOLOv5 on the AVPL validation set.

Models     Parameters    Weight (MB)   GFLOPs   mAP@0.5   mAP@0.5:0.95   Inf. time (ms)
YOLOv5l    46,631,350    93.7          114.2    96.0      76.1           65.7
YOLOv5m    21,056,406    42.4          50.4     96.4      76.4           32.7
YOLOv5s    7,022,326     14.3          15.8     95.9      75.7           13.3
YOLOv5n    1,765,270     3.7           4.2      96.3      74.3           4.6
Ours       11,188,534    22.7          23.9     97.0      76.2           29.5
Table 3. Comparison results of the proposed car detection network with other networks and retrained YOLOv5 on the PUCPR+ validation set.

Models                Parameters    Weight (MB)   GFLOPs   mAP@0.5   mAP@0.5:0.95   Inf. time (ms)
YOLOv5l*              46,631,350    93.7          114.2    96.4      53.8           22.4
YOLOv5m*              21,056,406    42.4          50.4     97.7      60.5           15.6
YOLOv5s*              7,022,326     14.3          15.8     84.6      38.9           7.4
YOLOv5n*              1,765,270     3.7           4.2      89.7      41.6           5.9
SSD (Ref. 22)         N/A           N/A           N/A      32.6      N/A            N/A
YOLO9000 (Ref. 22)    N/A           N/A           N/A      12.3      N/A            N/A
YOLOv3 (Ref. 22)      N/A           N/A           N/A      95.0      N/A            N/A
YOLOv4 (Ref. 22)      N/A           N/A           N/A      94.1      N/A            N/A
SA+CF+CRT (Ref. 22)   N/A           N/A           N/A      92.9      N/A            N/A
SF-SSD (Ref. 22)      N/A           N/A           N/A      90.8      N/A            N/A
Ours                  11,188,534    22.7          23.9     97.4      58.0           17.9

Note: The symbol * denotes the networks retrained from scratch. N/A means the value is not available.
Fig. 5. The qualitative results of the proposed network on the validation sets of the CarPK, PUCPR+, and AVPL datasets with IoU threshold = 0.5 and confidence score = 0.5.
The results in Table 5 show that the PP-LCNet block increases the network performance at mAP@0.5 (1.1% higher) but decreases it at mAP@0.5:0.95 (0.8% lower) when compared to the LiteEfficientNet block. Combining these two blocks, together with the starting Conv block and the ending SPP block, gives the best result. Besides, Table 5 also shows the superiority of the SPP block (0.4% higher in both mAP@0.5 and mAP@0.5:0.95) over the SPPF block, while both generate the same GFLOPs and number of network parameters.
5. Conclusion
This paper introduces an improved YOLOv5 architecture for car detection in smart parking systems. The proposed network contains three main modules: backbone, neck, and detection head. The backbone module is redesigned using the lightweight architectures PP-LCNet and LiteEfficientNet.
Table 5. Ablation studies with different types of backbones on the CarPK validation set.

Blocks              Backbone 1    Backbone 2    Backbone 3    Backbone 4 (proposed)
Conv                ✓             ✓             ✓             ✓
PP-LCNet            ✓                           ✓             ✓
LiteEfficientNet                  ✓             ✓             ✓
SPPF                                            ✓
SPP                 ✓             ✓                           ✓
Parameters          10,728,766    9,780,850     11,188,534    11,188,534
Weight (MB)         21.9          19.9          22.7          22.7
GFLOPs              20.8          18.5          23.9          23.9
mAP@0.5             95.1          94.3          95.4          95.8
mAP@0.5:0.95        58.2          59.3          62.7          63.1
Fig. 6. The comparison results between the proposed method and YOLOv5m on the CarPK dataset with the IoU threshold and confidence set to 0.5.
The neck and detection head reuse the structure from the original YOLOv5m with several minor modifications. The network achieves the best results at 95.8%, 97.4%, and 97.0% mAP@0.5 and better performance when compared to recent works. The optimization of network parameters, speed, and detection accuracy provides the ability to deploy the model on real-time systems. In the future, the neck and detection head modules will be developed to detect smaller vehicles, and the model will be implemented on larger datasets. Moreover, the improved method will also be compared to the latest YOLOv8 version to inspect the efficiency and novelty of the proposed architecture.
Acknowledgment

This result was supported by the "Regional Innovation Strategy (RIS)" through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (MOE) (2021RIS-003).
ORCID
Duy-Linh Nguyen https://orcid.org/0000-0001-6184-4133
Xuan-Thuy Vo https://orcid.org/0000-0002-7411-0697
Adri Priadana https://orcid.org/0000-0002-1553-7631
Kang-Hyun Jo https://orcid.org/0000-0002-4937-7082
References

1. Scotiabank, Number of cars sold worldwide from 2010 to 2022, with a 2023 forecast (in million units), https://www.statista.com/statistics/200002/international-car-sales-since-1990. Accessed 1 January 2023.
2. F. Zhou and Q. Li, Parking guidance system based on zigbee and geomagnetic sensor technology, in 2014 13th Int. Symp. Distributed Computing and Applications to Business, Engineering and Science, 2014, Xianning, Hubei, China, pp. 268–271.
3. Y. Shao, P. Chen and T. Cao, A grid projection method based on ultrasonic sensor for parking space detection, in IGARSS 2018 - 2018 IEEE Int. Geoscience and Remote Sensing Symp., Valencia, Spain, 2018, pp. 3378–3381, doi: 10.1109/IGARSS.2018.8519022.
4. H.-C. Chen, C.-J. Huang and K.-H. Lu, Design of a non-processor OBU device for parking system based on infrared communication, in 2017 IEEE Int. Conf. Consumer Electronics - Taiwan (ICCE-TW), 2017, Taipei, Taiwan, pp. 297–298.
5. C. Yuan and L. Qian, Design of intelligent parking lot system based on wireless network, in 2017 29th Chinese Control and Decision Conf. (CCDC), 2017, Chongqing, China, pp. 3596–3601.
6. X. Ding and R. Yang, Vehicle and parking space detection based on improved YOLO network model, J. Phys.: Conf. Ser. 1325 (2019) 012084.
7. R. Martín Nieto, A. García-Martín, A. G. Hauptmann and J. M. Martinez, Automatic vacant parking places management system using multicamera vehicle detection, IEEE Trans. Intell. Transp. Syst. 20(3) (2019) 1069–1080.
8. S. N. R. Mettupally and V. Menon, A smart eco-system for parking detection using deep learning and big data analytics, in 2019 SoutheastCon, 2019, Huntsville, Alabama, USA, pp. 1–4.
9. G. Jocher et al., ultralytics/yolov5: v3.1 - Bug fixes and performance improvements (2020).
10. X. Zihao, H. Weiquan and W. Yin, Multi-class vehicle detection in surveillance video based on deep learning, J. Comput. Appl. 39(3) (2019) 700–705.
11. S. Zhang and X. Wang, Human detection and object tracking based on histograms of oriented gradients, in 2013 Ninth Int. Conf. Natural Computation (ICNC), 2013, Shenyang, China, pp. 1349–1353.
12. P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, in Proc. 2001 IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR 2001), Vol. 1, 2001, Kauai, HI, USA, pp. I–I.
13. Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, in Computational Learning Theory, ed. P. Vitányi (Springer, Berlin, 1995), pp. 23–37.
14. V. Mitra, C.-J. Wang and S. Banerjee, Text classification: A least square support vector machine approach, Appl. Soft Comput. 7(6) (2007) 908–914.
15. N. Ammour, H. Alhichri, Y. Bazi, B. Benjdira, N. Alajlan and M. Zuair, Deep learning approach for car detection in UAV imagery, Remote Sens. 9 (2017) 1–15.
16. F. Zhao, Q. Kong, Y. Zeng and B. Xu, A brain-inspired visual fear responses model for UAV emergent obstacle dodging, IEEE Trans. Cogn. Develop. Syst. 12(1) (2020) 124–132.
17. S. Chen, S. Zhang, J. Shang, B. Chen and N. Zheng, Brain-inspired cognitive model with attention for self-driving cars, IEEE Trans. Cogn. Develop. Syst. 11(1) (2019) 13–25.
18. M. Liu, X. Wang, A. Zhou, X. Fu, Y. Ma and C. Piao, UAV-YOLO: Small object detection on unmanned aerial vehicle perspective, Sensors 20(8) (2020) 2238.
19. X. Liang, J. Zhang, L. Zhuo, Y. Li and Q. Tian, Small object detection in unmanned aerial vehicle images using feature fusion and scaling-based single shot detector with spatial context analysis, IEEE Trans. Circuits Syst. Video Technol. 30(6) (2019) 1758–1770.
20. M. H. Hamzenejadi and H. Mohseni, Real-time vehicle detection and classification in UAV imagery using improved YOLOv5, in 2022 12th Int. Conf. Computer and Knowledge Engineering (ICCKE), 2022, Ferdowsi University of Mashhad, Iran, pp. 231–236.
21. Y. Zhang, Z. Guo, J. Wu, Y. Tian, H. Tang and X. Guo, Real-time vehicle detection based on improved YOLO v5, Sustainability 14(19) (2022) 12274.
22. J. Yu, H. Gao, J. Sun, D. Zhou and Z. Ju, Spatial cognition-driven deep learning for car detection in unmanned aerial vehicle imagery, IEEE Trans. Cogn. Develop. Syst. 14(4) (2022) 1574–1583.
23. C. Cui, T. Gao, S. Wei, Y. Du, R. Guo, S. Dong, B. Lu, Y. Zhou, X. Lv, Q. Liu, X. Hu, D. Yu and Y. Ma, PP-LCNet: A lightweight CPU convolutional neural network, arXiv:2109.15099.
24. M. Tan and Q. V. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, arXiv:1905.11946.
25. A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, Q. V. Le and H. Adam, Searching for MobileNetV3, in 2019 IEEE/CVF Int. Conf. Computer Vision (ICCV), Seoul, Korea (South), 2019, pp. 1314–1324, doi: 10.1109/ICCV.2019.00140.
26. Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye and D. Ren, Distance-IoU loss: Faster and better learning for bounding box regression, arXiv:1911.08287.
27. U. Ruby and V. Yendapalli, Binary cross entropy with deep learning technique for image classification, Int. J. Adv. Trends Comput. Sci. Eng. 9 (2020) 5393–5397.
28. M. Hsieh, Y. Lin and W. H. Hsu, Drone-based object counting by spatially regularized regional proposal network, arXiv:1707.05972.
29. Hemateja, Drone car counting dataset YOLO (2021), https://www.kaggle.com/datasets/ahemateja19bec1025/drone-car-counting-dataset-yolo. Accessed 21 August 2023.
30. Braunge, Aerial view car detection for Yolov5 (2023), https://www.kaggle.com/datasets/braunge/aerial-view-car-detection-for-yolov5. Accessed 21 August 2023.