SSD Real-Time Illegal Parking Detection Based on Contextual Information Transmission

Tech Science Press
Computers, Materials & Continua
Computers, Materials & Continua CMC, vol.62, no.1, pp.293-307, 2020
CMC. doi:10.32604/cmc.2020.06427 www.techscience.com/cmc
Huanrong Tang1, Aoming Peng1, Dongming Zhang2, Tianming Liu3 and
Jianquan Ouyang1, *
Abstract: With the improvement of the national economy, the number of vehicles continues to increase year by year. According to the National Bureau of Statistics, the number of vehicles in China reached approximately 327 million by the end of 2018, which keeps raising urban traffic pressure and worsens the negative impact on urban traffic order. Illegal parking, a common problem in the field of transportation security, urgently needs to be solved. Traditional methods to address it are mainly based on ground loops and manual supervision, which may miss detections and cost much manpower. With deep learning developing rapidly in recent years, object detection methods relying on background segmentation can no longer meet the speed and precision requirements of complex and varied scenes. Thus, an improved Single Shot MultiBox Detector (SSD) based on deep learning is proposed in our study. We introduce an attention mechanism through a spatial transformer module, which gives neural networks the ability to actively spatially transform feature maps, and add contextual information transmission in specified layers. Through repeated experiments, we found the best connection layer in the detection model, especially for small objects, and increased the precision by 1.5% over the baseline SSD without extra training cost. Meanwhile, we designed an illegal parking detection method based on the improved SSD. It reaches a precision of up to 97.3% at a speed of 40 FPS, superior to most vehicle detection methods, and will contribute to relieving the negative impact of illegal parking.
Keywords: Contextual information transmission, illegal parking detection, spatial
attention mechanism, deep learning.
1 Introduction
Illegal parking, a common problem in the field of transportation security, urgently needs to be solved. According to the National Bureau of Statistics, the number of vehicles in China reached 327 million by the end of 2018, which makes urban traffic pressure continue to rise, and the impact of illegal parking on urban traffic order is growing. It leads to many avoidable traffic accidents. In one case, a heavy-duty trailer that hit a crosswalk in a city made the non-motor vehicle lanes unusable, which unfortunately led to the death of a pedestrian. This shows that the public transportation safety problems caused by illegal parking urgently need to be solved [Zheng (2014)]. In light of the above, it is of great significance to propose a method that can automatically detect illegal parking in real time. Its value is embodied in two aspects: on the one hand, the method could reduce the public transportation safety accidents caused by illegal parking; on the other hand, it could provide law enforcement officers with more efficient monitoring and management tools.
1 Key Laboratory of Intelligent Computing and Information Processing, Ministry of Education, College of Information Engineering, Xiangtan University, Xiangtan, 411105, China.
2 National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing, 100029, China.
3 Department of Computer Science, the University of Georgia, Athens, Georgia, USA.
* Corresponding Author: Jianquan Ouyang. Email: oyjq@xtu.edu.cn.
Our study on the detection of illegal parking belongs to the field of object detection in computer vision, where performance is mainly evaluated by speed, precision and robustness. Traditional methods are mainly based on ground loops and manual supervision, which may miss detections and cost much manpower, so researchers turned to image analysis to supervise illegal parking. There are a limited number of related studies on this topic, and few methods have been proposed to establish such a system [Maddalena and Petrosino (2013); Mu, Ma and Zhang (2015); Wahyono, Filonenko and Jo (2015)]. Most of them are based on the separation of foreground and background. Specifically, vehicles are first extracted from the background and then tracked. An alarm is triggered if a vehicle is found to be stationary for longer than a preset time in the rectangular region of interest (ROI). Maddalena et al. [Maddalena and Petrosino (2013)] utilized sophisticated background modeling strategies to extract the foreground objects; the method does well in simple traffic environments but not in crowded scenes. Mu et al. [Mu, Ma and Zhang (2015)] proposed a method that subtracts a background constructed by a Gaussian Mixture Model to extract the foreground and then recognizes a vehicle by detecting its wheels. It effectively separates the foreground and background, but a vehicle cannot be detected when its wheels are occluded. Wahyono et al. [Wahyono, Filonenko and Jo (2015)] proposed using background subtraction to get candidate stationary regions and verifying a vehicle by extracting scalable histogram of oriented gradient (SHOG) features followed by support vector machine (SVM) classification. It performs well when lighting changes, but the SHOG features are hard to design and do not cope well with complex weather conditions. Overall, most of the above methods are easily affected by environmental factors such as illumination changes, occlusion and weather.
In order to enhance the robustness and precision of detection in complex environments, feature extraction plays an important role. The aim of object detection is to locate all the objects and specify each object category in a given image or video [Meng, Rice, Wang et al. (2018)]. With the rapid development of deep learning, region proposal based algorithms such as R-CNN [Girshick, Donahue, Darrell et al. (2014)], SPP-Net [He, Zhang, Ren et al. (2014)], Fast R-CNN [Girshick (2015)] and Faster R-CNN [Ren, He, Girshick et al. (2015)] gradually evolved and improved step by step. Faster R-CNN finally achieved an essentially end-to-end detection system with high precision, but its speed still needs improving. After that, end-to-end object detection algorithms such as You Only Look Once (YOLO) [Redmon, Divvala, Girshick et al. (2016)] and SSD [Liu, Anguelov, Erhan et al. (2016)] obtained higher speed and precision. SSD in particular achieves a balanced performance on precision and speed, but its shortcoming is the detection of small objects. To improve the detection of small objects, the Deconvolutional Single Shot Detector (DSSD) [Fu, Liu, Ranga et al. (2017)] introduces contextual information and obtains a better feature representation ability by replacing the basic VGG with ResNet [He, Zhang, Ren et al. (2016)], and it does much better in small object detection. Moreover, Google DeepMind proposed Spatial Transformer Networks (STNs), whose differentiable module does not require extra annotations and can adaptively learn the spatial transformation of different data. It can not only transform the input spatially but can also be inserted into an arbitrary layer of an existing network as a network module to achieve the spatial transformation of different feature maps. It is essentially a spatial domain attention learning mechanism. We therefore add an STN module to SSD to further improve the performance of the overall model.
Based on the above, the main contributions of this paper are as follows:
a. Object detection lacks deep semantic feature information when the feature information of the shallow network in SSD is used to predict objects. Inspired by DSSD, this paper uses a deconvolution layer to achieve contextual information fusion. The resulting improved SSD model can extract deeper and more focused details, and obtains a precision on PASCAL VOC2007 that is 1.5% higher than the baseline SSD.
b. The pooling layer in a Convolutional Neural Network gains a certain spatial robustness at the expense of very important location feature information. This paper introduces a spatial attention transformation to the SSD to relieve this problem.
c. The improved model, which combines contextual fusion and STNs, is applied to illegal parking detection in sophisticated scenes and trained with the public dataset BIT-Vehicle [Dong, Pei, He et al. (2014)] together with the vehicle images of PASCAL VOC2007. The improved model finally achieves a precision of up to 97.3% and real-time performance at a speed of 40 FPS.
2 Related works
2.1 The single shot detector (SSD)
SSD is a general detector published by Wei Liu et al. at ECCV 2016 [Liu, Anguelov, Erhan et al. (2016)], which takes advantage of ideas from Faster R-CNN, YOLO and multiscale pyramids. Concretely, it discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. Additionally, it combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes.
The framework of SSD is shown in Fig. 1.
[Figure 1 diagram: Conv1_x, Pool1, Conv2_x, Conv3_x, Conv4_x, Conv5_x, SSD Layers]
Figure 1: Framework of SSD
The loss function of the SSD is described below:
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right) \tag{1}$$
where $N$ is the number of default boxes matched to ground truth boxes, $L_{conf}$ is the confidence loss and $L_{loc}$ is the localization loss. $x_{ij}^{p} \in \{0, 1\}$ is an indicator for matching the $i$-th default box to the $j$-th ground truth box of category $p$. The localization loss is a Smooth L1 loss between the predicted box ($l$) and the ground truth box ($g$) parameters. The parameters of location and category are trained at the same time. The scale of the default boxes for each feature map is computed as:
$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m] \tag{2}$$
where $m$ is the number of feature maps used for prediction, $s_{min}$ is 0.2 and $s_{max}$ is 0.9, meaning the lowest layer has a scale of 0.2 and the highest layer has a scale of 0.9, and all layers in between are regularly spaced. In practice, one can also design a distribution of default boxes to best fit a specific dataset; in this paper, we use k-means on our dataset to find proper aspect ratio parameters.
By combining predictions for all default boxes with different scales and aspect ratios from all locations of many feature maps, SSD obtains a diverse set of predictions covering various input object sizes and shapes.
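As a quick check of Eq. (2), the following minimal Python sketch computes the scale of each prediction layer. The function name `default_box_scales` and the choice of m = 6 feature maps are assumptions made for illustration; the text above does not state the value of m.

```python
def default_box_scales(m, s_min=0.2, s_max=0.9):
    """Return [s_1, ..., s_m] following Eq. (2).

    m is the number of prediction feature maps (assumed value in the demo below);
    s_min and s_max default to the 0.2 and 0.9 quoted in the text.
    """
    if m < 2:
        raise ValueError("Eq. (2) divides by m - 1, so m must be at least 2")
    step = (s_max - s_min) / (m - 1)
    return [s_min + step * (k - 1) for k in range(1, m + 1)]


scales = default_box_scales(6)  # m = 6 is an illustrative assumption
print([round(s, 2) for s in scales])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```

The scales are evenly spaced, so the lowest layer handles the smallest boxes and the highest layer the largest ones.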
2.2 The deconvolutional single shot detector (DSSD)
Convolutional neural networks have an inherent structural problem. The receptive fields of high-level layers are relatively large and their semantic representation ability is strong, but the resolution of their feature maps is low and their representation of geometric information is weak. The receptive fields of low-level layers are relatively small, their resolution is high and their representation of geometric information is strong, but their semantic representation ability is weak. SSD uses multi-scale feature maps to predict objects: high-level feature information with large receptive fields predicts large objects, and low-level feature information with small receptive fields predicts small objects. This brings up a problem: the classification performance of SSD for small objects is poor, because the low-level features used to predict them lack high-level semantic information. The idea for solving this problem is to fuse high-level semantic information with low-level feature information, which enriches the multi-scale feature maps used for prediction and finally improves the accuracy.
Based on the above, DSSD proposed a deconvolution-based, asymmetric “wide-narrow-wide” hourglass structure, which inspired us to address this problem. DSSD is one of the improved SSD models. It replaces the basic VGG network with ResNet-101 and introduces a residual module before classification and regression. After the auxiliary convolution layers added by SSD, deconvolution layers are added to form the “wide-narrow-wide” hourglass structure. One of the biggest improvements of DSSD over SSD is its much better detection of small objects, which is also demonstrated in the final part of the DSSD paper. Even so, its detection speed is much slower than that of SSD because ResNet-101 is too deep.
2.3 Spatial transformer networks
Spatial Transformer Networks (STNs) were released by Google DeepMind. By executing an inverse spatial transformation on the data to eliminate the influence of the deformation of the target images, they make the classification network simpler and more efficient.
The framework is shown in Fig. 2. It can be divided into three parts, the localisation network, the grid generator and the sampler, and it can be inserted into an existing CNN model.
In Fig. 2, $U$ is the input feature map, which is passed to a localisation network that regresses the transformation parameters $\theta$. The regular spatial grid $G$ over $V$ is transformed to the sampling grid $\Gamma_\theta(G)$, which is applied to $U$ to produce the output feature map $V$. The combination of the localisation network and the sampling mechanism defines a spatial transformer. Moreover, STNs can be seen as a generalization of differentiable attention to any spatial transformation.
[Figure 2 diagram: input U, localisation net, grid generator, sampler, output V; together they form the spatial transformer]
Figure 2: Framework of spatial transformer networks
2.3.1 Localisation network
The localisation network takes the input feature map $U \in \mathbb{R}^{H \times W \times C}$ with width $W$, height $H$ and $C$ channels and outputs $\theta$, the parameters of the transformation $\Gamma_\theta$ to be applied to the feature map: $\theta = f_{loc}(U)$. The size of $\theta$ can vary depending on the transformation type that is parameterized.
The localisation network function $f_{loc}()$ can take any form, such as a fully-connected network or a convolutional network, but should include a final regression layer to produce the transformation parameters $\theta$.
2.3.2 Parameterized sampling grid
Following the STN paper, a pixel here refers to an element of a generic feature map, not necessarily an image. In general, the output pixels are defined to lie on a regular grid $G = \{G_i\}$ of pixels $G_i = (x_i^t, y_i^t)$, forming an output feature map $V \in \mathbb{R}^{H' \times W' \times C}$, where $H'$ and $W'$ are the height and width of the grid, and $C$ is the number of channels, which is the same in the input and output.
The next expression shows the case of a 2D affine transformation $A_\theta$:
$$\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = \Gamma_\theta(G_i) = A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix} = \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix} \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix} \tag{3}$$
where $(x_i^t, y_i^t)$ are the target coordinates of the regular grid in the output feature map, $(x_i^s, y_i^s)$ are the source coordinates in the input feature map that define the sample points, and $A_\theta$ is the affine transformation matrix.
In Eq. (3), the transform allows cropping, translation, rotation, scale, and skew to be applied to the input feature map, and requires only 6 parameters (the 6 elements of $A_\theta$) to be produced by the localisation network.
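To make the grid generator of Eq. (3) concrete, here is a minimal pure-Python sketch that maps every target pixel of an output grid to its source coordinates. The function name `affine_grid` and the use of coordinates normalised to [-1, 1] are assumptions for illustration; the text above does not fix a coordinate convention.

```python
def affine_grid(theta, out_h, out_w):
    """Apply Eq. (3) to every target pixel of an out_h x out_w grid.

    theta is the 2x3 affine matrix A_theta as a nested list.
    Returns grid[r][c] = (x_s, y_s), the source coordinates for target pixel (r, c).
    Assumption: target coordinates are normalised to the range [-1, 1].
    """
    def normalise(index, size):
        # Map pixel index 0..size-1 onto [-1, 1]; a single pixel sits at 0.
        return 0.0 if size == 1 else -1.0 + 2.0 * index / (size - 1)

    grid = []
    for r in range(out_h):
        y_t = normalise(r, out_h)
        row = []
        for c in range(out_w):
            x_t = normalise(c, out_w)
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
zoom_in = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]  # samples only the central half of the input

print(affine_grid(identity, 3, 3)[0][0])  # (-1.0, -1.0)
print(affine_grid(zoom_in, 3, 3)[0][0])   # (-0.5, -0.5)
```

With the identity matrix the source grid equals the target grid. Scaling the diagonal by 0.5 shrinks the sampled region toward the centre, which is the cropping (attention) effect mentioned above.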
2.3.3 Differentiable image sampling
Building on the above, each $(x_i^s, y_i^s)$ coordinate in $\Gamma_\theta(G)$ defines the spatial location in the input where a sampling kernel is applied to get the value at a particular pixel in the output $V$:
$$V_i^c = \sum_n^H \sum_m^W U_{nm}^c \, k(x_i^s - m; \Phi_x) \, k(y_i^s - n; \Phi_y) \quad \forall i \in [1 \ldots H'W'], \; \forall c \in [1 \ldots C] \tag{4}$$
where $\Phi_x$ and $\Phi_y$ are the parameters of a generic sampling kernel $k()$ which defines the image interpolation (e.g., bilinear), $U_{nm}^c$ is the value at location $(n, m)$ in channel $c$ of the input, and $V_i^c$ is the output value for pixel $i$ at location $(x_i^t, y_i^t)$ in channel $c$.
In theory, any sampling kernel can be used, as long as (sub-)gradients can be defined with respect to $x_i^s$ and $y_i^s$. For example, using the integer sampling kernel reduces (4) to
$$V_i^c = \sum_n^H \sum_m^W U_{nm}^c \, \delta(\mathrm{round}(x_i^s) - m) \, \delta(\mathrm{round}(y_i^s) - n) \tag{5}$$
where $\mathrm{round}(x) = \lfloor x + 0.5 \rfloor$ rounds $x$ to the nearest integer and $\delta()$ is the Kronecker delta function. This sampling kernel equates to just copying the value at the nearest pixel to $(x_i^s, y_i^s)$ to the output location $(x_i^t, y_i^t)$. Alternatively, a bilinear sampling kernel can be used, giving
$$V_i^c = \sum_n^H \sum_m^W U_{nm}^c \max(0, 1 - |x_i^s - m|) \max(0, 1 - |y_i^s - n|) \tag{6}$$
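The bilinear kernel of Eq. (6) can be evaluated directly as a double sum. The sketch below does this for a single channel in pure Python. The function name `bilinear_sample` is hypothetical, and the source coordinates are assumed to be expressed in pixel index units (column m, row n) rather than normalised units.

```python
def bilinear_sample(U, x_s, y_s):
    """Evaluate Eq. (6) for one channel.

    U is a list of rows (U[n][m]); (x_s, y_s) is the source point,
    assumed here to be given in pixel index units.
    """
    height, width = len(U), len(U[0])
    value = 0.0
    for n in range(height):
        weight_y = max(0.0, 1.0 - abs(y_s - n))
        if weight_y == 0.0:
            continue  # this row is more than one pixel away and contributes nothing
        for m in range(width):
            weight_x = max(0.0, 1.0 - abs(x_s - m))
            value += U[n][m] * weight_x * weight_y
    return value


U = [[0.0, 10.0],
     [20.0, 30.0]]
print(bilinear_sample(U, 0.5, 0.5))  # 15.0, the average of the four neighbours
print(bilinear_sample(U, 1.0, 0.0))  # 10.0, exactly on pixel (n=0, m=1)
```

Only the (at most) four pixels surrounding the source point receive a non-zero weight, so practical implementations visit just those neighbours instead of looping over the whole feature map.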
To allow backpropagation of the loss through this sampling mechanism, we can define the gradients with respect to $U$ and $G$. For bilinear sampling (6), the partial derivatives are
$$\frac{\partial V_i^c}{\partial U_{nm}^c} = \sum_n^H \sum_m^W \max(0, 1 - |x_i^s - m|) \max(0, 1 - |y_i^s - n|) \tag{7}$$
$$\frac{\partial V_i^c}{\partial x_i^s} = \sum_n^H \sum_m^W U_{nm}^c \max(0, 1 - |y_i^s - n|) \begin{cases} 0 & \text{if } |m - x_i^s| \ge 1 \\ 1 & \text{if } m \ge x_i^s \\ -1 & \text{if } m < x_i^s \end{cases} \tag{8}$$
and similarly to (8) for $\partial V_i^c / \partial y_i^s$.
3 Proposed method
As shown in Fig. 3 below, our work is mainly divided into two pipelines. The first is the acquisition and processing of monitoring video data, which includes data cleaning and labeling. It is not the focus of this paper, but it is included to show the overall work. The second pipeline is the illegal parking detection method based on the improved SSD, which is described in the following sections.
Figure 3: Overall workflow
3.1 Improved SSD based on contextual information and STNs
Since SSD already does well on large objects, we pay more attention to improving the detection of small objects. In our study, we use deconvolution to feed deeper layer information back to the shallow layer and add an STN module to our model. This introduces contextual information transmission and a spatial transformer network to improve the performance of our model.
In addition, the STN module is computationally very fast and does not impair the training speed, so there is little time overhead in normal use. Moreover, it can potentially accelerate the model through subsequent downsampling applied to the output of the transformer.
The improved model architecture is shown in Fig. 4.
[Figure 4 diagram: Conv1_x, Pool1, Conv2_x, Conv3_x, Conv4_x, Conv5_x, SSD Layers, with a Spatial Transformer module]
Figure 4: Improved SSD model architecture
As mentioned in [Cao, Xie, Yang et al. (2017)], contextual information fusion inevitably introduces unnecessary noise, so it is necessary to find suitable layers for connection. The connection module was inspired by DSSD and ResNet, and its specific structure is shown in Fig. 5.
Figure 5: Connection module
To ensure that the Conv5_3 layer and the Conv4_3 layer have the same size for connection, the Conv5_3 layer is followed by a 2*2*512 bilinear deconvolution layer for upsampling. Then, the Conv4_3 and Conv5_3 branches are each followed by a 3*3*512 convolution layer to strengthen the ability to learn features. A normalization layer is added to each branch to accelerate training, followed by a ReLU function. Finally, the feature information of the two branches is fused by adding up the corresponding feature values.
To find the best combination, we evaluated the precision of Conv4_3, Conv4_3+Conv5_3, Conv4_3+fc6 and Conv3_3+Conv4_3+Conv5_3 on the PASCAL VOC2007 dataset. The results, presented in Section 4.3, indicate that Conv4_3+Conv5_3 performs best.
3.2 Illegal parking detection
The illegal parking detection method proposed in this paper differs from methods that perform background modeling with background segmentation, which are very susceptible to complex environmental factors.
3.2.1 Definition of illegal parking
Illegal parking is defined as illustrated in Fig. 6:
Figure 6: Examples of illegal parking (Images come from training dataset)
Within the fixed monitoring range of the camera, an alarm is automatically triggered when a vehicle stays in the red alert area for longer than a preset time, and the color of its bounding box changes from green to yellow.
3.2.2 Process of illegal parking
The flowchart is shown in Fig. 7.
Inspired by Xie et al. [Xie, Wang, Chen et al. (2017)], we first optimize the flowchart to suit our method. Second, we set up the alert area and feed the video frame by frame to the improved SSD. Third, we calculate the IOU between every detected bounding box and the alert area. IOU (intersection over union) measures the proportion of overlap between two bounding boxes. If there is overlap, the method outputs all bounding boxes within the alert area. Finally, we determine whether illegal parking occurs according to the following key aspects:
a. Status Analysis
For each vehicle, we first take its bounding boxes in two adjacent frames and calculate their centroids. We then calculate the Euclidean distance between the two centroids to determine whether the vehicle is moving: if the distance is greater than a given threshold, the vehicle is judged to be moving; otherwise, it is judged to be stationary.
b. Timer Strategy
For each vehicle, the timer is started as soon as the vehicle becomes stationary, and the timer is cleared to zero once the vehicle moves.
c. New or Old Box
Since the SSD detects all the vehicles in every input image, we need to match the newly detected boxes with the bounding boxes in the last frame. We match two bounding boxes by calculating the IOU between them. If the IOU of two bounding boxes is greater than a given threshold, we consider that they contain the same vehicle, and the new box inherits the timing information of the old box.
An alarm is automatically triggered when a vehicle stays in the alert area for longer than a preset time.
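The three aspects above can be sketched as a per-frame bookkeeping routine. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the box format (x1, y1, x2, y2), the threshold values `match_iou`, `move_dist` and `max_stay`, and the frame interval `dt` are all hypothetical, and the sketch assumes that the timer accumulates while a vehicle stays stationary inside the alert area.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def centroid(box):
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def update_tracks(tracks, detections, alert_area, dt,
                  match_iou=0.5, move_dist=5.0, max_stay=60.0):
    """Process one frame. tracks is a list of {"box": ..., "timer": ...} dicts.

    Returns (new_tracks, alarms); alarms lists boxes that stayed too long.
    All thresholds are illustrative assumptions.
    """
    new_tracks, alarms = [], []
    for det in detections:
        if iou(det, alert_area) <= 0.0:
            continue  # no overlap with the alert area
        # New or old box: find the previous box that overlaps this detection most.
        best = max(tracks, key=lambda t: iou(t["box"], det), default=None)
        timer = 0.0  # a new box starts with a cleared timer
        if best is not None and iou(best["box"], det) > match_iou:
            # Status analysis: centroid displacement between adjacent frames.
            (cx0, cy0), (cx1, cy1) = centroid(best["box"]), centroid(det)
            moving = math.hypot(cx1 - cx0, cy1 - cy0) > move_dist
            # Timer strategy (assumed): accumulate while stationary, clear on motion.
            timer = 0.0 if moving else best["timer"] + dt
        new_tracks.append({"box": det, "timer": timer})
        if timer > max_stay:
            alarms.append(det)
    return new_tracks, alarms


area = (0, 0, 100, 100)
tracks, alarms = [], []
for _ in range(4):  # the same parked vehicle seen in four consecutive frames
    tracks, alarms = update_tracks(tracks, [(10, 10, 50, 50)], area, dt=1.0, max_stay=2.5)
print(len(alarms))  # 1
```

The sketch matches each detection greedily against the previous frame, so two detections could inherit the same old box in crowded scenes; a one-to-one assignment would be a natural refinement.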
[Figure 7 flowchart nodes: Start; Read video by frame; Detection by improved SSD; Vehicle in alert area? (Y/N); Previous vehicle? (Y/N); Update box position; New box; Status analysis; Illegal parking? (Y/N); Alerting; Next vehicle]
Figure 7: Flowchart of detection
3.2.3 Evaluation of illegal parking
We redefine precision and recall as the following formulas:
$$\text{Precision} = \frac{\text{True}_{\text{illegal-parking}}}{\text{True}_{\text{illegal-parking}} + \text{False}_{\text{illegal-parking}}} \tag{9}$$
$$\text{Recall} = \frac{\text{True}_{\text{illegal-parking}}}{\text{True}_{\text{illegal-parking}} + \text{False}_{\text{N-illegal-parking}}} \tag{10}$$
$\text{True}_{\text{illegal-parking}}$: the number of times illegal parking is correctly detected;
$\text{False}_{\text{illegal-parking}}$: the number of times illegal parking is incorrectly detected;
$\text{False}_{\text{N-illegal-parking}}$: the number of times illegal parking is missed.
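A small worked example may help to read Eqs. (9) and (10). The counts below are made up for illustration and are not taken from the experiments; the function name `precision_recall` is likewise hypothetical.

```python
def precision_recall(true_ip, false_ip, missed_ip):
    """Compute Eqs. (9) and (10) from raw event counts.

    true_ip   -- times illegal parking was correctly detected
    false_ip  -- times illegal parking was incorrectly detected
    missed_ip -- times illegal parking was missed
    """
    detected = true_ip + false_ip
    actual = true_ip + missed_ip
    precision = true_ip / detected if detected else 0.0
    recall = true_ip / actual if actual else 0.0
    return precision, recall


# Illustrative counts only: 90 correct alarms, 10 false alarms, 30 missed events.
p, r = precision_recall(true_ip=90, false_ip=10, missed_ip=30)
print(f"precision={p:.2%}, recall={r:.2%}")  # precision=90.00%, recall=75.00%
```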
4 Experiment results
4.1 Experiment configuration
OS: Ubuntu-Server 16.04 (64 bit), CPU: 2.2 GHz Dual-Core CPU, Memory: 16 GB, GPU: Matrox Electronics Systems Ltd. MGA G200e [Pilot] Server Engines, TensorFlow Version: 1.4.0, CUDA Version: 8.0, cuDNN Version: 8.0.61.
4.2 Experiment dataset
The dataset mainly comes from BIT-Vehicle and PASCAL VOC2007. The former contains a total of 9850 vehicle images from all over China, covering various colors, illumination conditions and car models. The latter is a classic dataset widely used in object detection.
Specifically, we use K-means on the dataset to get proper ratio parameters for the SSD model. The general flow of K-means is as follows:
(1) Initialize the sample dataset.
(2) Randomly select K initial clustering centers for the data.
(3) Calculate the distance between each point in the sample dataset and each center, and assign the point to the category whose center is closest.
(4) Calculate the average of all points in each cluster and set it as the new clustering center.
(5) Repeat Step 3.
(6) Repeat Step 4, and continue until the preset iteration condition is met.
Then, the operation on our dataset is as follows:
(1) Read the labelling files of the dataset.
(2) Get the coordinates of the ground truth bounding boxes, calculate their length and width, and generate 2D coordinates from them.
(3) Cluster with these coordinates as the features until the iteration condition is met (we select K=4 for our dataset).
Finally, the K-means result is shown in Fig. 8. We get the ratio parameters $a_{BIT} = \{1, 0.9, 1.2, 1.5, 2\}$, which made our experimental performance better.
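The clustering procedure described above can be sketched in a few lines of pure Python. This is a minimal illustration, not the code used in the experiments: the function names, the toy boxes, the fixed random seed, and the step of reading aspect ratios off the cluster centres as width divided by height are all assumptions.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain K-means on 2D points, following steps (1) to (6) above."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step (2): random initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # step (3): assign each point to its closest center
            nearest = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                                  + (p[1] - centers[c][1]) ** 2)
            clusters[nearest].append(p)
        # Step (4): new center = mean of the cluster (keep the old one if empty).
        new_centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # stop once the centers no longer change
            break
        centers = new_centers
    return centers


def aspect_ratios(boxes, k):
    """Cluster (width, height) of ground truth boxes; return sorted center ratios."""
    sizes = [(float(x2 - x1), float(y2 - y1)) for x1, y1, x2, y2 in boxes]
    return sorted(w / h for w, h in kmeans(sizes, k))


# Toy ground truth boxes (x1, y1, x2, y2): three square-ish and three wide ones.
boxes = [(0, 0, 10, 10), (0, 0, 12, 12), (0, 0, 11, 11),
         (0, 0, 40, 20), (0, 0, 42, 21), (0, 0, 44, 22)]
print(aspect_ratios(boxes, k=2))  # [1.0, 2.0]
```

On real annotations the box sizes would come from the labelling files, and K would be set to 4 as stated above.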
Figure 8: K-means result of our dataset
4.3 Experiment results of connection module
In this part, we present the test results for Section 3.1. We train on the PASCAL VOC2007 dataset and use the “xavier” method for weight initialization. The learning rate and loss curves of the experiment are shown in Fig. 9.
Figure 9: Learning rate and total loss
It shows that the learning rate and the loss are close to convergence after 15K training epochs. The experimental results for Section 3.1 are shown in Fig. 10.
Figure 10: Detection results of different connection layers
To find the most suitable combination for the connection module, we repeated the experiments on VOC 2007 and took the mean. As shown in Fig. 10, Conv4_3 + Conv5_3 performs best, with a precision 1.5% higher than Conv4_3 alone. Compared with Conv5_3, fc6 has a larger receptive field, which introduces more background noise for small objects and makes the performance slightly worse.
4.4 Result of detection of illegal parking
To verify the effect of the improvement, we swapped in different detection models for comparative experiments. The results are shown in Tab. 1 below.
Table 1: Detection results of different basic models
Basic Model     Precision (%)   Recall (%)   FPS
DPM             92.32           83.75        1/3
Faster R-CNN    93.45           87.72        5
Yolo-v2         94.22           95.42        38
Improved SSD    97.30           96.84        40
As Tab. 1 shows, our method performs better because of its strong ability to extract features. It reaches a precision of up to 97.3% and a speed of 40 FPS, superior to the other methods for illegal parking detection under the same conditions.
5 Conclusion
In our paper, we proposed a real-time illegal parking detection method to relieve the negative impact caused by illegal parking. We designed and verified an improved SSD inspired by DSSD and STNs, which takes advantage of contextual information fusion and spatial transformer networks (a spatial attention mechanism). This improvement enhances the robustness of detection in various environments and improves the detection accuracy on small objects. With the improved SSD, we achieved a precision of up to 97.3% and real-time detection at 40 FPS, superior to the other methods for illegal parking detection under the same conditions.
In our future work, we will further improve our detection method through workflow optimization and keep studying new methods such as that of Nai et al. [Nai, Li, Li et al. (2018)], which proposes a novel tracking framework based on local sparse representation for visual tracking.
Acknowledgement: This research has been supported by NSFC (61672495), Scientific
Research Fund of Hunan Provincial Education Department (16A208), Project of Hunan
Provincial Science and Technology Department (2017SK2405), and in part by the
construct program of the key discipline in Hunan Province and the CERNET Innovation
Project (NGII20170715).
References
Cao, G. M.; Xie, X. M.; Yang, W. Z.; Liao, Q.; Shi, G. M. (2017): Feature-fused SSD:
fast detection for small objects. International Conference on Graphic and Image
Processing.
Dong, Z.; Pei, M. T.; He, Y.; Liu, T.; Dong, Y. M. et al. (2014): Vehicle type
classification using unsupervised convolutional neural network. IEEE International
Conference on Pattern Recognition, pp. 172-177.
Fu, C.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A. (2017): DSSD: deconvolutional single
shot detector. Computer Vision and Pattern Recognition.
Girshick, R. (2015): Fast R-CNN. IEEE International Conference on Computer Vision,
pp. 1440-1448.
Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. (2014): Rich feature hierarchies for
accurate object detection and semantic segmentation. IEEE Conference on Computer
Vision and Pattern Recognition, pp. 580-587.
He, K. M.; Zhang, X. H.; Ren, S. Q.; Sun, J. (2016): Deep residual learning for image
recognition. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778.
He, K. M.; Zhang, X. Y.; Ren, S. Q.; Sun, J. (2014): Spatial pyramid pooling in deep
convolutional networks for visual recognition. European Conference on Computer Vision,
pp. 346-361.
Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S. et al. (2016): SSD: single
shot multibox detector. European Conference on Computer Vision, pp. 21-37.
Meng, R. H.; Rice, S. G.; Wang, J.; Sun, X. M. (2018): A fusion steganographic
Algorithm based on faster R-CNN. Computers, Materials & Continua, vol. 55, no. 1, pp.
1-16.
Maddalena, L.; Petrosino, A. (2013): Stopped object detection by learning foreground
model in videos. IEEE Transactions on Neural Networks and Learning Systems, vol. 24,
no. 5, pp. 723-735.
Mu, C. Y.; Ma, X.; Zhang, P. P. (2015): Smart detection of vehicle in Illegal parking
area by fusing of multi-features. International Conference on Next Generation Mobile
Applications, Services and Technologies, pp. 388-392.
Nai, K.; Li, Z. Y.; Li, G. J.; Wang, S. Q. (2018): Robust object tracking via local sparse
appearance model. IEEE Transactions on Image Processing, vol. 27, no. 10, pp. 4958-4970.
Ren, S. Q.; He, K. M.; Girshick, R.; Sun, J. (2015): Faster R-CNN: towards real-time
object detection with region proposal networks. Advances in Neural Information
Processing Systems, pp. 91-99.
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. (2016): You only look once: unified,
real time object detection. IEEE Conference on Computer Vision and Pattern
Recognition, pp. 779-788.
Wahyono, W.; Filonenko, A.; Kang-Hyun, J. (2018): Illegally parked vehicle detection
using adaptive dual background model. Industrial Electronics Society, pp. 25-28.
Xie, X. M.; Wang, C. Y.; Chen, S.; Shi, G. M.; Zhao, Z. F. (2017): Real-time illegal
parking detection system based on deep learning. International Conference on Deep
Learning Technologies.
Zheng, Y. (2014): Research on Key Technologies of Illegal Parking Forensics (Ph.D.
Thesis). Shanghai Jiao Tong University, China.