Car Detector Based on YOLOv5 for Parking
Management
Duy-Linh Nguyen[0000-0001-6184-4133], Xuan-Thuy Vo[0000-0002-7411-0697],
Adri Priadana[0000-0002-1553-7631], and Kang-Hyun Jo[0000-0002-4937-7082]
Department of Electrical, Electronic and Computer Engineering, University of Ulsan,
Ulsan 44610, South Korea
ndlinh301@mail.ulsan.ac.kr, xthuy@islab.ulsan.ac.kr,
priadana@mail.ulsan.ac.kr, acejo@ulsan.ac.kr
Abstract. Nowadays, YOLOv5 is one of the most widely used object
detection network architectures in real-time systems for traffic manage-
ment and regulation. To develop a parking management tool, this paper
proposes a car detection network based on redesigning the YOLOv5 net-
work architecture. This research focuses on network parameter optimiza-
tion using lightweight modules from EfficientNet and PP-LCNet archi-
tectures. The proposed network is trained and evaluated on two benchmark
datasets, the Car Parking Lot Dataset and the Pontifical Catholic University
of Parana+ Dataset, and the results are reported using the mAP@0.5 and
mAP@0.5:0.95 metrics. As a result, this network achieves its best performance
at 95.8% and 97.4% mAP@0.5 on the Car Parking Lot Dataset and the Pontifical
Catholic University of Parana+ Dataset, respectively.
Keywords: Convolutional neural network (CNN) · EfficientNet · PP-LCNet · Parking management · YOLOv5.
1 Introduction
Along with the rapid development of modern and smart cities, the number of
vehicles in general and cars in particular has also increased in both quantity
and type. According to a report on the Statista website [15], there are currently
about one and a half billion cars in the world, and it is predicted that the
number of cars sold in 2023 will reach nearly 69.9 million. This number will increase
further in the coming years. Therefore, the management and development of
tools to support parking lots are essential. To construct smart parking lots,
researchers propose many methods based on geomagnetic [25], ultrasonic [16],
infrared [2], and wireless techniques [21]. These approaches mainly rely on the
operation of sensors designed and installed in the parking lot. Although these
designs achieve high accuracy, they require large investment, labor, and mainte-
nance costs, especially when deployed in large-scale parking lots. Exploiting the
benefits of convolutional neural networks (CNNs) in the field of computer vision,
several researchers have designed networks to detect empty or occupied parking
spaces using conventional cameras with quite good accuracy [5, 12, 13]. Following
that trend, this paper proposes a car detector to support smart parking management.
This work explores lightweight network architectures and redesigns the modules
inside the YOLOv5 network to balance network parameters, detection accuracy,
and computational complexity, ensuring deployment in real-time systems at the
lowest cost. The main contributions of this paper are shown below:
1 - Proposes an improved YOLOv5 architecture for car detection that can be
applied to parking management and other related fields of computer vision.
2 - The proposed detector performs better than other detectors on the Car Park-
ing Lot Dataset and the Pontifical Catholic University of Parana+ Dataset.
The distribution of the remaining parts in the paper is as follows: Section 2
presents the car detection-based methods. Section 3 explains the proposed ar-
chitecture in detail. Section 4 introduces the experimental setup and analyzes
the experimental results. Section 5 summarizes the issue and future work orien-
tation.
2 Related works
2.1 Traditional machine learning-based methods
The car detection process of traditional machine learning-based techniques is
divided into two stages, manual feature extraction and classification. First, fea-
ture extractors generate feature vectors using classical methods such as Scale-
invariant Feature Transform (SIFT), Histograms of Oriented Gradients (HOG),
and Haar-like features [18, 19, 22]. Then, the feature vectors go through classifiers
like the Support Vector Machine (SVM) and AdaBoost [6, 14] to obtain the target
classification result. Traditional feature extraction methods rely heavily
on prior knowledge. However, in practical applications there are many confounding
factors, including weather, exposure, and distortion. Therefore, the applicability
of these techniques in real-time systems is limited due to low accuracy.
2.2 CNN-based methods
Parking lot images obtained from drones or overhead cameras contain many
small-sized cars. In order to detect these objects well, many studies have focused
on the small object detection topic using a combination of CNN and traditional
methods or one-stage detectors. The authors in [1, 24, 3] fuse modern CNNs
with SVM networks to achieve high spatial resolution in vehicle detection
and counting. Research in [11] develops a network based on the YOLOv3
architecture, in which the backbone combines ResNet and DarkNet to address
small-object detection in drone images. The work in [10] proposes
a new feature-matching method and a spatial context analysis for pedestrian-
vehicle discrimination. An improved YOLOv5 network architecture is designed
Fig. 1. The architecture of the proposed car detector.
by [7] for vehicle detection and classification in Unmanned Aerial Vehicle (UAV)
imagery and [23] for real-world imagery. Another study in [20] provides a one-
stage detector (SF-SSD) with a new spatial cognition algorithm for car detection
in UAV imagery. The advantage of modern machine learning methods is high
detection and classification accuracy, especially for small-sized objects. However,
they require high-level feature extraction and fusion in the network, and
a certain complexity to ensure operation in real-world conditions.
3 Methodology
The proposed car detection network is shown in Fig. 1. This network is an
improved YOLOv5 architecture [9] including three main parts: backbone, neck,
and detection head.
3.1 Proposed network architecture
Basically, the structure of the proposed network follows the design of the YOLOv5
network architecture with many changes inside the backbone and neck modules.
Specifically, the Focus module is replaced by a simple block called Conv. This
block is constructed with a standard convolution layer (Con2D) with a kernel size
of 1×1, followed by a batch normalization (BN) and a ReLU activation function,
as shown in Fig. 2 (a). Subsequent blocks in the backbone module are also
Fig. 2. The architecture of the Conv (a), BottleNeck Cross Stage Partial (b), and Spatial Pyramid Pooling (c) blocks.
redesigned based on inspiration from lightweight network architectures such as
PP-LCNet [4] and EfficientNet [17]. The design of the PP-LCNet (PP-LC) layer
is described in detail in Fig. 3 (a). It consists of a depthwise convolution layer
Fig. 3. The architecture of the PP-LCNet (a) and SE (b) blocks.
(3×3 DWConv), an attention block (SE block), and ends with a standard convolution
layer (1×1 Con2D). In between these layers, the BN and the Hardswish
activation function are used. The SE block is an attention mechanism based on
a global average pooling (GAP) layer, a fully connected layer (FC1) followed by
a rectified linear unit (ReLU) activation function, and a second fully connected
layer (FC2) followed by a sigmoid activation function, as shown in Fig. 3 (b). This method
uses lightweight convolution layers that save a lot of network parameters. In
addition, the attention mechanism helps the network focus on learning impor-
tant information about the object on each feature map level. The next block
Fig. 4. The two types of LiteEfficientNet (LE) architecture: stride = 2 (a) and stride = 1 (b).
is LiteEfficientNet (LE). This block is very simple and is divided into two types
corresponding to two stride levels (stride = 1 or stride = 2). In the first type
with stride = 2, the LiteEfficientNet block uses an expand convolution layer
(1×1 Con2D), a depthwise convolution layer (3×3 DWConv), and ends with
a project convolution layer (1×1 Con2D). For the second type with stride =
1, the LiteEfficientNet block is designed exactly the same as the first type,
with an added skip connection that merges the current and original feature maps via
the addition operation. This block extracts feature maps along the channel
dimension. The combined use of the PP-LCNet and LiteEfficientNet blocks ensures
that feature extraction covers both the spatial and channel dimensions at each
feature map level. The detail of the LiteEfficientNet block is shown in Fig. 4. The last
block in the backbone module is the Spatial Pyramid Pooling (SPP) block. This
work re-applies the architecture of the SPP in YOLOv5, as shown in Fig. 2 (c).
However, to minimize the network parameters, the max-pooling kernel sizes are
reduced from 5×5, 9×9, and 13×13 to 3×3, 5×5, and 7×7, respectively.
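The backbone blocks described above can be written as a rough PyTorch sketch. The class names, channel widths, expansion ratio, and the SE reduction ratio below are illustrative assumptions, not the authors' code:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: GAP -> FC1 + ReLU -> FC2 + Sigmoid -> rescale."""
    def __init__(self, ch, r=4):
        super().__init__()
        self.fc1 = nn.Linear(ch, ch // r)
        self.fc2 = nn.Linear(ch // r, ch)

    def forward(self, x):
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))  # global average pooling over H, W
        w = torch.sigmoid(self.fc2(torch.relu(self.fc1(w))))
        return x * w.view(b, c, 1, 1)

class PPLCLayer(nn.Module):
    """PP-LCNet layer: 3x3 depthwise conv -> SE block -> 1x1 pointwise conv,
    each convolution followed by BN and Hardswish (Fig. 3 (a))."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.dw = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch), nn.Hardswish())
        self.se = SEBlock(in_ch)
        self.pw = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.Hardswish())

    def forward(self, x):
        return self.pw(self.se(self.dw(x)))

class LiteEfficientNet(nn.Module):
    """Inverted-residual block: 1x1 expand -> 3x3 depthwise -> 1x1 project;
    the skip connection is used only when stride = 1 (Fig. 4)."""
    def __init__(self, in_ch, out_ch, stride, expand=4):
        super().__init__()
        mid = in_ch * expand
        self.use_skip = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU6(),
            nn.Conv2d(mid, mid, 3, stride, 1, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU6(),
            nn.Conv2d(mid, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch))

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_skip else y

class SPP(nn.Module):
    """Spatial Pyramid Pooling with the reduced 3/5/7 max-pool kernels."""
    def forward(self, x):
        pools = [nn.functional.max_pool2d(x, k, 1, k // 2) for k in (3, 5, 7)]
        return torch.cat([x] + pools, dim=1)
```

With stride = 2, PPLCLayer and LiteEfficientNet halve the spatial size (downsampling), while SPP keeps the spatial size and multiplies the channel count by four through concatenation.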
The neck module in the proposed network utilizes the Path Aggregation Network
(PAN) architecture following the original YOLOv5. This module combines the
current feature maps with previous feature maps by concatenation operations.
It outputs three multi-scale feature maps enriched with information, which
serve as the three inputs for the detection heads.
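One fusion step in the PAN neck can be sketched as upsampling a deeper map and concatenating it with a shallower one; the channel sizes here are illustrative, taken from the shapes in Fig. 1:

```python
import torch
import torch.nn.functional as F

# Deeper (semantically rich) and shallower (spatially rich) feature maps.
deep = torch.randn(1, 384, 40, 40)
shallow = torch.randn(1, 192, 80, 80)

# Upsample the deeper map 2x, then concatenate along the channel dimension.
up = F.interpolate(deep, scale_factor=2, mode="nearest")
fused = torch.cat([up, shallow], dim=1)  # shape: (1, 576, 80, 80)
```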
The detection head module also leverages the construction of three detection
heads from the YOLOv5. Three feature map scales of the PAN neck go through
three convolution operations to conduct prediction on three object scales: small,
medium, and large. Each detection head uses three anchor sizes, described in
Table 1.
Table 1. Detection heads and anchor sizes.

Heads | Input     | Anchor sizes                      | Output   | Object
1     | 80×80×192 | (10, 13), (16, 30), (33, 23)      | 80×80×18 | Small
2     | 40×40×384 | (30, 61), (62, 45), (59, 119)     | 40×40×18 | Medium
3     | 20×20×768 | (116, 90), (156, 198), (373, 326) | 20×20×18 | Large
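The 18 output channels per head in Table 1 follow directly from the YOLOv5 head layout: each of the three anchors predicts four box offsets, one objectness score, and one class score (only the car class here):

```python
num_anchors = 3   # anchor sizes per detection head (Table 1)
num_classes = 1   # single "car" class
out_channels = num_anchors * (4 + 1 + num_classes)  # box(4) + obj(1) + cls(1)
assert out_channels == 18  # matches the 80x80x18, 40x40x18, 20x20x18 outputs
```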
3.2 Loss function
The definition of the loss function is as follows:

L = λ_box L_box + λ_obj L_obj + λ_cls L_cls,    (1)

where L_box uses the CIoU loss to compute the bounding box regression, and
the object confidence score loss L_obj and the classification loss L_cls are
calculated using Binary Cross-Entropy loss. λ_box, λ_obj, and λ_cls are
balancing parameters.
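A minimal sketch of Eq. (1) follows. Plain 1 − IoU stands in for the CIoU box term (CIoU adds center-distance and aspect-ratio penalties), and the function and argument names are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def iou_xyxy(a, b, eps=1e-7):
    """IoU for boxes given as (x1, y1, x2, y2), shape (N, 4)."""
    lt = torch.max(a[:, :2], b[:, :2])          # intersection top-left
    rb = torch.min(a[:, 2:], b[:, 2:])          # intersection bottom-right
    inter = (rb - lt).clamp(min=0).prod(dim=1)  # intersection area
    area_a = (a[:, 2:] - a[:, :2]).prod(dim=1)
    area_b = (b[:, 2:] - b[:, :2]).prod(dim=1)
    return inter / (area_a + area_b - inter + eps)

def detection_loss(pb, tb, po, to, pc, tc,
                   lam_box=0.05, lam_obj=1.0, lam_cls=0.5):
    # Box term: the paper uses CIoU loss; plain 1 - IoU stands in here.
    l_box = (1.0 - iou_xyxy(pb, tb)).mean()
    # Objectness and classification: Binary Cross-Entropy (on logits).
    l_obj = F.binary_cross_entropy_with_logits(po, to)
    l_cls = F.binary_cross_entropy_with_logits(pc, tc)
    return lam_box * l_box + lam_obj * l_obj + lam_cls * l_cls
```

The balancing weights default to the values used in Section 4.2 (λ_box = 0.05, λ_obj = 1, λ_cls = 0.5).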
4 Experiments
4.1 Datasets
The proposed network is trained and evaluated on two benchmark datasets,
the Car Parking Lot Dataset (CarPK) and the Pontifical Catholic University
of Parana+ Dataset (PUCPR+) [8]. The CarPK dataset contains 89,777 cars
collected from the Phantom 3 Professional drone. The images were taken from
four parking lots with an approximate height of 40 meters. The CarPK dataset
is divided into 988 images for training and 459 images for validation phases.
The PUCPR+ dataset is selected from a part of the PUCPR dataset consisting
of 16,456 cars. The PUCPR+ dataset provides 100 images for training and 25
images for validation. These are image datasets for car counting in different
parking lots. The cars in the images are annotated with bounding boxes given by
top-left and bottom-right corners and stored as text files (*.txt). To accommodate
the training and evaluation processes, this experiment converts the entire format
of the annotation files to the YOLOv5 format.
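The conversion from corner-format boxes to the YOLOv5 label format (class id plus normalized center coordinates, width, and height) can be sketched as follows; the helper name is hypothetical:

```python
def corners_to_yolo(x1, y1, x2, y2, img_w, img_h, cls_id=0):
    """Convert a top-left/bottom-right box to a YOLOv5 label line:
    'class x_center y_center width height', all values normalized to [0, 1]."""
    xc = (x1 + x2) / 2.0 / img_w
    yc = (y1 + y2) / 2.0 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# e.g. a 100x50 box with top-left (200, 100) in a 1280x720 image
line = corners_to_yolo(200, 100, 300, 150, 1280, 720)
```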
4.2 Experimental setup
The proposed network is implemented with the PyTorch framework and the Python
programming language. The network is trained on a Tesla V100 32GB GPU
and evaluated on a GeForce GTX 1080Ti 11GB GPU. The optimizer is Adam.
The learning rate is initialized at 10^-5 and ends at 10^-3. The momentum
is set at 0.8 and then increased to 0.937. The training process goes through
300 epochs with a batch size of 64. The balancing parameters are set as follows:
λ_box = 0.05, λ_obj = 1, and λ_cls = 0.5. To increase training scenarios and avoid the
over-fitting issue, this experiment applies data augmentation methods such as
mosaic, translation, scaling, and flipping. For the inference process, the other
arguments are set as follows: an image size of 1024×1024, a batch size of 32, a
confidence threshold of 0.5, and an IoU threshold of 0.5. The speed results are
reported in milliseconds (ms).
4.3 Experimental results
The performance of the proposed network is evaluated by comparison with networks
retrained from scratch and with recent research on the two benchmark datasets
above. Specifically, this work trains and evaluates the proposed network and
the four versions of the YOLOv5 architecture (l, m, s, n), then compares the
results with those reported in [7, 20] on the CarPK dataset and in [20] on the
PUCPR+ dataset. As a result, the proposed network achieves 95.8% mean Average
Precision at an IoU threshold of 0.5 (mAP@0.5) and 63.1% mAP averaged over ten
IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95). This result shows the superior
ability of the proposed network compared to other networks, while its inference
time is only 1.7 ms higher than the retrained YOLOv5m network, nearly 1.5 times
lower than the retrained YOLOv5l network, and between 2.3 (YOLOv5m) and 7.9
(YOLOv5x) times lower than the experiments in [7]. Besides, the weight of
the network (22.7 MB) and the computational complexity (23.9 GFLOPs) are
only half those of the retrained YOLOv5m architecture. The comparison results on
the CarPK validation set are presented in Table 2. For the PUCPR+ dataset,
the proposed network achieves 97.4% of mAP@0.5 and 58.0% of mAP@0.5:0.95.
Table 2. Comparison results of the proposed car detection network with other networks
and the retrained YOLOv5 on the CarPK validation set. The symbol * denotes the
retrained networks. N/A means not-available values.

Models              | Parameters | Weight (MB) | GFLOPs | mAP@0.5 | mAP@0.5:0.95 | Inf. time (ms)
YOLOv5l*            | 46,631,350 | 93.7        | 114.2  | 95.3    | 62.3         | 26.4
YOLOv5m*            | 21,056,406 | 42.4        | 50.4   | 94.4    | 61.5         | 15.9
YOLOv5s*            | 7,022,326  | 14.3        | 15.8   | 95.6    | 62.7         | 8.7
YOLOv5n*            | 1,765,270  | 3.7         | 4.2    | 93.9    | 57.8         | 6.3
YOLOv5x [7]         | N/A        | 167.0       | 205.0  | 94.5    | 57.9         | 138.2
YOLOv5l [7]         | N/A        | 90.6        | 108.0  | 95.0    | 59.2         | 72.1
YOLOv5m [7]         | N/A        | 41.1        | 48.0   | 94.6    | 57.8         | 40.4
Modified YOLOv5 [7] | N/A        | 44.0        | 57.7   | 94.9    | 61.1         | 50.5
SSD [20]            | N/A        | N/A         | N/A    | 68.7    | N/A          | N/A
YOLO9000 [20]       | N/A        | N/A         | N/A    | 20.9    | N/A          | N/A
YOLOv3 [20]         | N/A        | N/A         | N/A    | 85.3    | N/A          | N/A
YOLOv4 [20]         | N/A        | N/A         | N/A    | 87.81   | N/A          | N/A
SA+CF+CRT [20]      | N/A        | N/A         | N/A    | 89.8    | N/A          | N/A
SF-SSD [20]         | N/A        | N/A         | N/A    | 90.1    | N/A          | N/A
Ours                | 11,188,534 | 22.7        | 23.9   | 95.8    | 63.1         | 17.6
This result is outstanding compared to other competitors and is only 0.3%
mAP@0.5 and 2.5% mAP@0.5:0.95 lower than the retrained YOLOv5m, respectively.
However, the proposed network has an inference time of 17.9 ms, only
slightly higher than the retrained YOLOv5m network (by 2.3 ms) and lower than
the retrained YOLOv5l network (by 4.5 ms). The comparison results are shown
in Table 3 and several qualitative results are shown in Fig. 5.
Table 3. Comparison results of the proposed car detection network with other networks
and the retrained YOLOv5 on the PUCPR+ validation set. The symbol * denotes the
retrained networks. N/A means not-available values.

Models         | Parameters | Weight (MB) | GFLOPs | mAP@0.5 | mAP@0.5:0.95 | Inf. time (ms)
YOLOv5l*       | 46,631,350 | 93.7        | 114.2  | 96.4    | 53.8         | 22.4
YOLOv5m*       | 21,056,406 | 42.4        | 50.4   | 97.7    | 60.5         | 15.6
YOLOv5s*       | 7,022,326  | 14.3        | 15.8   | 84.6    | 38.9         | 7.4
YOLOv5n*       | 1,765,270  | 3.7         | 4.2    | 89.7    | 41.6         | 5.9
SSD [20]       | N/A        | N/A         | N/A    | 32.6    | N/A          | N/A
YOLO9000 [20]  | N/A        | N/A         | N/A    | 12.3    | N/A          | N/A
YOLOv3 [20]    | N/A        | N/A         | N/A    | 95.0    | N/A          | N/A
YOLOv4 [20]    | N/A        | N/A         | N/A    | 94.1    | N/A          | N/A
SA+CF+CRT [20] | N/A        | N/A         | N/A    | 92.9    | N/A          | N/A
SF-SSD [20]    | N/A        | N/A         | N/A    | 90.8    | N/A          | N/A
Ours           | 11,188,534 | 22.7        | 23.9   | 97.4    | 58.0         | 17.9
From the results above, the proposed network balances performance, speed,
and network parameters. Therefore, it can be implemented in parking
management systems on low-computing and embedded devices. However, the
process of testing this network also revealed some disadvantages. Since the car
detection network relies mainly on the signal obtained from a drone-view or
floor-view camera, it is influenced by a number of environmental factors,
including illumination, weather, car density, occlusion, shadow, object
similarity, and the distance from the camera to the cars. Several mistaken
cases are shown in Fig. 5 with yellow circles.

Fig. 5. The qualitative results and several mistakes of the proposed network on the
validation sets of the CarPK and PUCPR+ datasets with IoU threshold = 0.5 and
confidence score = 0.5. Yellow circles denote the wrong detection areas.
4.4 Ablation study
The experiment conducted several ablation studies to inspect the importance
of each block in the proposed backbone. The blocks are replaced in turn,
trained on the CarPK training set, and evaluated on the CarPK validation
set, as shown in Table 4. The results in this table show that the PP-LCNet
block increases the network performance at mAP@0.5 (1.1%) but decreases it
at mAP@0.5:0.95 (0.8%) when compared to the LiteEfficientNet block. Combining
these two blocks gives the best result, along with the starting Conv and the
ending SPP blocks. Besides, the results also show the superiority of the SPP
block (0.4% of mAP@0.5 and mAP@0.5:0.95) over the SPPF block, although they
generate the same GFLOPs and network parameters.
Table 4. Ablation studies with different types of backbones on the CarPK validation
set.

Blocks           | Proposed backbones
Conv             | ✓          | ✓         | ✓          | ✓
PP-LCNet         | ✓          |           | ✓          | ✓
LiteEfficientNet |            | ✓         | ✓          | ✓
SPPF             |            |           | ✓          |
SPP              | ✓          | ✓         |            | ✓
Parameters       | 10,728,766 | 9,780,850 | 11,188,534 | 11,188,534
Weight (MB)      | 21.9       | 19.9      | 22.7       | 22.7
GFLOPs           | 20.8       | 18.5      | 23.9       | 23.9
mAP@0.5          | 95.1       | 94.3      | 95.4       | 95.8
mAP@0.5:0.95     | 58.2       | 59.3      | 62.7       | 63.1
5 Conclusion
This paper introduces an improved YOLOv5 architecture for car detection in
parking management systems. The proposed network contains three main mod-
ules: backbone, neck, and detection head. The backbone module is redesigned
using lightweight architectures: PP-LCNet and LiteEfficientNet. The network
achieves 95.8% of mAP@0.5 and 63.1% of mAP@0.5:0.95, performing better than
recent works. The optimization of network parameters, speed, and detection
accuracy provides the ability to deploy on real-time systems. In the future,
the neck and detection head modules will be developed to detect smaller
vehicles and will be implemented on larger datasets.
Acknowledgement
This result was supported by the "Regional Innovation Strategy (RIS)" through
the National Research Foundation of Korea (NRF), funded by the Ministry of
Education (MOE) (2021RIS-003).
References
1. Ammour, N., Alhichri, H., Bazi, Y., Benjdira, B., Alajlan, N., Zuair, M.: Deep
learning approach for car detection in uav imagery. Remote Sensing 9, 1–15 (03
2017). https://doi.org/10.3390/rs9040312
2. Chen, H.C., Huang, C.J., Lu, K.H.: Design of a non-processor obu device for
parking system based on infrared communication. In: 2017 IEEE International
Conference on Consumer Electronics - Taiwan (ICCE-TW). pp. 297–298 (2017).
https://doi.org/10.1109/ICCE-China.2017.7991113
3. Chen, S., Zhang, S., Shang, J., Chen, B., Zheng, N.: Brain-inspired cognitive model
with attention for self-driving cars. IEEE Transactions on Cognitive and Develop-
mental Systems 11(1), 13–25 (2019). https://doi.org/10.1109/TCDS.2017.2717451
4. Cui, C., Gao, T., Wei, S., Du, Y., Guo, R., Dong, S., Lu, B., Zhou, Y., Lv, X.,
Liu, Q., Hu, X., Yu, D., Ma, Y.: Pp-lcnet: A lightweight CPU convolutional neural
network. CoRR abs/2109.15099 (2021), https://arxiv.org/abs/2109.15099
5. Ding, X., Yang, R.: Vehicle and parking space detection based on improved yolo
network model. Journal of Physics: Conference Series 1325, 012084 (10 2019).
https://doi.org/10.1088/1742-6596/1325/1/012084
6. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learn-
ing and an application to boosting. In: Vitányi, P. (ed.) Computational Learning
Theory. pp. 23–37. Springer Berlin Heidelberg, Berlin, Heidelberg (1995)
7. Hamzenejadi, M.H., Mohseni, H.: Real-time vehicle detection and classification
in uav imagery using improved yolov5. In: 2022 12th International Confer-
ence on Computer and Knowledge Engineering (ICCKE). pp. 231–236 (2022).
https://doi.org/10.1109/ICCKE57176.2022.9960099
8. Hsieh, M., Lin, Y., Hsu, W.H.: Drone-based object counting by spa-
tially regularized regional proposal network. CoRR abs/1707.05972 (2017),
http://arxiv.org/abs/1707.05972
9. Jocher, G., et al.: ultralytics/yolov5: v3.1 - Bug Fixes and Perfor-
mance Improvements (Oct 2020). https://doi.org/10.5281/zenodo.4154370,
https://doi.org/10.5281/zenodo.4154370
10. Liang, X., Zhang, J., Zhuo, L., Li, Y., Tian, Q.: Small object detection in unmanned
aerial vehicle images using feature fusion and scaling-based single shot detector
with spatial context analysis. IEEE Transactions on Circuits and Systems for Video
Technology pp. 1758–1770 (2019)
11. Liu, M., Wang, X., Zhou, A., Fu, X., Ma, Y., Piao, C.: Uav-yolo:
Small object detection on unmanned aerial vehicle perspective. Sensors
20(8) (2020). https://doi.org/10.3390/s20082238, https://www.mdpi.com/1424-
8220/20/8/2238
12. Martín Nieto, R., García-Martín, Á., Hauptmann, A.G., Martínez, J.M.: Automatic
vacant parking places management system using multicamera vehicle detection.
IEEE Transactions on Intelligent Transportation Systems 20(3), 1069–1080 (2019).
https://doi.org/10.1109/TITS.2018.2838128
13. Mettupally, S.N.R., Menon, V.: A smart eco-system for parking detection using
deep learning and big data analytics. In: 2019 SoutheastCon. pp. 1–4 (2019).
https://doi.org/10.1109/SoutheastCon42311.2019.9020502
14. Mitra, V., Wang, C.J., Banerjee, S.: Text classification: A least square sup-
port vector machine approach. Applied Soft Computing 7, 908–914 (06 2007).
https://doi.org/10.1016/j.asoc.2006.04.002
15. Scotiabank: Number of cars sold worldwide from 2010 to 2022, with a 2023 forecast
(in million units). https://www.statista.com/statistics/200002/international-
car-sales-since-1990/. Accessed: Jan. 01, 2023
16. Shao, Y., Chen, P., Tongtong, C.: A grid projection method based on
ultrasonic sensor for parking space detection. pp. 3378–3381 (07 2018).
https://doi.org/10.1109/IGARSS.2018.8519022
17. Tan, M., Le, Q.V.: Efficientnet: Rethinking model scaling for convolutional neural
networks. CoRR abs/1905.11946 (2019), http://arxiv.org/abs/1905.11946
18. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of sim-
ple features. In: Proceedings of the 2001 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition. CVPR 2001. vol. 1, pp. I–I (2001).
https://doi.org/10.1109/CVPR.2001.990517
19. XU Zihao, HUANG Weiquan, W.Y.: Multi-class vehicle detection in surveillance
video based on deep learning. Journal of Computer Applications 39(3), 700 (2019)
20. Yu, J., Gao, H., Sun, J., Zhou, D., Ju, Z.: Spatial cognition-driven deep
learning for car detection in unmanned aerial vehicle imagery. IEEE Trans-
actions on Cognitive and Developmental Systems 14(4), 1574–1583 (2022).
https://doi.org/10.1109/TCDS.2021.3124764
21. Yuan, C., Qian, L.: Design of intelligent parking lot system based on wireless
network. In: 2017 29th Chinese Control And Decision Conference (CCDC). pp.
3596–3601 (2017). https://doi.org/10.1109/CCDC.2017.7979129
22. Zhang, S., Wang, X.: Human detection and object tracking based on histograms of
oriented gradients. In: 2013 Ninth International Conference on Natural Computa-
tion (ICNC). pp. 1349–1353 (2013). https://doi.org/10.1109/ICNC.2013.6818189
23. Zhang, Y., Guo, Z., Wu, J., Tian, Y., Tang, H., Guo, X.: Real-time
vehicle detection based on improved yolo v5. Sustainability 14(19)
(2022). https://doi.org/10.3390/su141912274, https://www.mdpi.com/2071-
1050/14/19/12274
24. Zhao, F., Kong, Q., Zeng, Y., Xu, B.: A brain-inspired visual fear
responses model for uav emergent obstacle dodging. IEEE Transac-
tions on Cognitive and Developmental Systems 12(1), 124–132 (2020).
https://doi.org/10.1109/TCDS.2019.2939024
25. Zhou, F., Li, Q.: Parking guidance system based on zigbee and geomagnetic sen-
sor technology. In: 2014 13th International Symposium on Distributed Comput-
ing and Applications to Business, Engineering and Science. pp. 268–271 (2014).
https://doi.org/10.1109/DCABES.2014.58