
SSD Real-Time Illegal Parking Detection Based on Contextual Information Transmission

Computers, Materials & Continua

January 2019
61(3):293-307

DOI:10.32604/cmc.2020.06427

Authors:

Huanrong Tang

Xiangtan University

Dongming Zhang

State Key Laboratory of Communication Content Cognition


Computers, Materials & Continua CMC, vol.62, no.1, pp.293-307, 2020

CMC. doi:10.32604/cmc.2020.06427 www.techscience.com/cmc

SSD Real-Time Illegal Parking Detection Based on Contextual Information Transmission

Huanrong Tang1, Aoming Peng1, Dongming Zhang2, Tianming Liu3 and Jianquan Ouyang1, *

Abstract: With the improvement of the national economic level, the number of vehicles is still increasing year by year. According to the National Bureau of Statistics, the number of vehicles in China reached approximately 327 million by the end of 2018, which keeps urban traffic pressure rising and increases the negative impact on urban traffic order. Illegal parking, a common problem in transportation security, urgently needs to be solved, and traditional methods of addressing it are mainly based on ground loops and manual supervision, which can miss detections and require substantial manpower. With deep learning developing rapidly in recent years, object detection methods that rely on background segmentation can no longer meet the speed and precision requirements of complex and varied scenes. Thus, this study proposes an improved Single Shot MultiBox Detector (SSD) based on deep learning: we introduce an attention mechanism through a spatial transformer module, which gives neural networks the ability to actively transform feature maps spatially, and add contextual information transmission in a specified layer. Through repeated experiments, we found the best connection layer in the detection model, especially for small objects, and increased the precision by 1.5% over the baseline SSD without extra training cost. Meanwhile, we designed an illegal parking vehicle detection method based on the improved SSD, which reaches a precision of up to 97.3% at a speed of 40 FPS, superior to most vehicle detection methods, and can help relieve the negative impact of illegal parking.

Keywords: Contextual information transmission, illegal parking detection, spatial attention mechanism, deep learning.

1 Key Laboratory of Intelligent Computing and Information Processing, Ministry of Education, College of Information Engineering, Xiangtan University, Xiangtan, 411105, China.

2 National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing, 100029, China.

3 Department of Computer Science, the University of Georgia, Athens, Georgia, USA.

* Corresponding Author: Jianquan Ouyang. Email: oyjq@xtu.edu.cn.

1 Introduction

Illegal parking, a common problem in transportation security, urgently needs to be solved. According to the National Bureau of Statistics, the number of vehicles in China had reached 327 million by the end of 2018, which keeps urban traffic pressure rising and increases the impact of illegal parking on urban traffic order. Illegal parking causes many avoidable traffic accidents; in one city, for example, a heavy-duty trailer blocked a crosswalk, making the non-motor vehicle lane unusable and ultimately leading to the death of a pedestrian. Such cases show that the public transportation safety problems caused by illegal parking urgently need to be solved [Zheng (2014)]. In light of the above, a method that can automatically detect illegal parking in real time is of great significance in two respects: on the one hand, it could relieve and prevent public transportation safety accidents caused by illegal parking; on the other hand, it could provide law enforcement officers with more efficient monitoring and management tools.

Our study of illegal parking detection belongs to the field of object detection in computer vision, where performance is mainly evaluated in terms of speed, precision and robustness. Traditional methods rely mainly on ground loops and manual supervision, which can miss detections and require substantial manpower, so researchers have turned to image analysis to monitor illegal parking. Related studies on this topic are limited: only a few methods have been proposed to build such a system [Maddalena and Petrosino (2013); Mu, Ma and Zhang (2015); Wahyono, Filonenko and Jo (2015)], and most of them are based on separating the foreground from the background. Specifically, vehicles are first extracted from the background and then tracked, and an alarm is triggered if a vehicle remains stationary in the rectangular region of interest (ROI) for longer than a preset time. Maddalena et al. [Maddalena and Petrosino (2013)] used sophisticated background modeling strategies to extract foreground objects; their method works well in simple traffic environments but not in crowded scenes. Mu et al. [Mu, Ma and Zhang (2015)] subtracted a background constructed by a Gaussian Mixture Model to extract the foreground and then recognized vehicles by detecting their wheels; this effectively separates foreground from background, but a vehicle cannot be detected when its wheels are occluded. Wahyono et al. [Wahyono, Filonenko and Jo (2015)] used background subtraction to obtain candidate stationary regions and verified vehicles by extracting scalable histogram of oriented gradients (SHOG) features followed by support vector machine (SVM) classification. Their method is robust to lighting changes, but the SHOG features are hard to design and cope poorly with complex weather conditions. Overall, most of the above methods are easily affected by environmental factors such as illumination changes, occlusion and weather.

Feature extraction plays an important role in enhancing the robustness and precision of detection in complex environments. The aim of object detection is to locate all objects in a given image or video and specify the category of each [Meng, Rice, Wang et al. (2018)]. With the rapid development of deep learning, region-proposal-based algorithms, namely R-CNN [Girshick, Donahue, Darrell et al. (2014)], SPP-Net [He, Zhang, Ren et al. (2014)], Fast R-CNN [Girshick (2015)] and Faster R-CNN [Ren, He, Girshick et al. (2015)], evolved step by step and achieved progressively better performance. Faster R-CNN eventually achieved an essentially end-to-end detection system with high precision, but its speed still needs improvement. After that, end-to-end object detection algorithms such as You Only Look Once (YOLO) [Redmon, Divvala, Girshick et al. (2016)] and SSD [Liu, Anguelov, Erhan et al. (2016)] obtained higher speed and precision; SSD in particular offers a good balance between precision and speed, but it is weak at detecting small objects. To improve small-object detection, the Deconvolutional Single Shot Detector (DSSD) [Fu, Liu, Ranga et al. (2017)] replaces the VGG backbone with ResNet [He, Zhang, Ren et al. (2016)] and introduces contextual information to obtain a stronger feature representation, which makes it much better at detecting small objects. In addition, Google DeepMind proposed Spatial Transformer Networks (STNs), whose differentiable modules require no extra annotations and can adaptively learn spatial transformations for different data. An STN can not only transform its input spatially but can also be inserted as a module into any layer of an existing network to spatially transform feature maps; it is essentially a spatial-domain attention mechanism. We therefore add an STN module to SSD to further improve the performance of the overall model.

Based on the above, the main contributions of this paper are as follows:

a. When SSD predicts objects from the features of shallow layers, those features lack deep semantic information. Inspired by DSSD, this paper uses a deconvolution layer to fuse contextual information. The resulting improved SSD model extracts deeper and more focused features and obtains a precision 1.5% higher than the baseline SSD on PASCAL VOC2007.

b. The pooling layers in convolutional neural networks gain a degree of spatial robustness at the expense of important location information. To relieve this problem, this paper introduces a spatial attention transformation into SSD.

c. We apply the improved model, which combines contextual fusion and STNs, to illegal parking detection in complex scenes, training it on the public BIT-Vehicle dataset [Dong, Pei, He et al. (2014)] together with the vehicle images of PASCAL VOC2007. The improved model achieves a precision of up to 97.3% at a real-time speed of 40 FPS.

2 Related works

2.1 The single shot detector (SSD)

SSD is a general-purpose detector published by Wei Liu et al. at ECCV 2016 that combines the advantages of Faster R-CNN, YOLO and multiscale feature pyramids. Concretely, it discretizes the output space of bounding boxes into a set of default boxes with different aspect ratios and scales for each feature map. Additionally, it combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes.

The framework of SSD is shown in Fig. 1.

Figure 1: Framework of SSD (backbone blocks Conv1_x, Pool1, Conv2_x, Conv3_x, Conv4_x and Conv5_x, followed by the SSD layers)

The loss function of SSD is

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + L_{loc}(x, l, g)\right) \qquad (1)$$

where $N$ is the number of default boxes matched to ground truth boxes, $L_{conf}$ is the confidence loss and $L_{loc}$ is the localization loss. $x_{ij}^p \in \{1, 0\}$ is an indicator for matching the $i$-th default box to the $j$-th ground truth box of category $p$. The localization loss is a Smooth L1 loss between the predicted box ($l$) and the ground truth box ($g$) parameters. The location and category parameters are trained simultaneously. The scale of the default boxes for each feature map is computed as:

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m] \qquad (2)$$

where $m$ is the number of feature maps used for prediction, $s_{min}$ is 0.2 and $s_{max}$ is 0.9, meaning that the lowest layer has a scale of 0.2, the highest layer has a scale of 0.9, and all layers in between are regularly spaced. In practice, one can also design a distribution of default boxes to best fit a specific dataset; in this paper, we therefore applied k-means to our dataset to find suitable aspect-ratio parameters.
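As a small illustration of Eq. (2), the sketch below computes the default box scales. `default_box_scales` is a hypothetical helper, and the choice of m = 6 prediction feature maps in the usage line is an assumption borrowed from the original SSD300 configuration, not a value stated in this paper.

```python
def default_box_scales(m, s_min=0.2, s_max=0.9):
    """Return the scales s_1..s_m of Eq. (2), regularly spaced from s_min to s_max."""
    if m < 1:
        raise ValueError("need at least one feature map")
    if m == 1:
        return [s_min]
    step = (s_max - s_min) / (m - 1)
    return [s_min + step * (k - 1) for k in range(1, m + 1)]


# Assumed: 6 prediction feature maps, as in the original SSD300.
scales = default_box_scales(6)
print([round(s, 2) for s in scales])
```

With m = 6, consecutive scales differ by (0.9 - 0.2) / 5 = 0.14, running from 0.2 for the lowest prediction layer to 0.9 for the highest.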

By combining the predictions of all default boxes with different scales and aspect ratios from all locations of many feature maps, SSD obtains a diverse set of predictions covering various input object sizes and shapes.
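A minimal pure-Python sketch of Eq. (1) follows. It assumes softmax cross-entropy for $L_{conf}$ (as in the original SSD) and the Smooth L1 loss for $L_{loc}$, summed over the four box offsets. The function names are hypothetical, and the choice of which background boxes enter the confidence loss (hard negative mining in the original SSD) is assumed to have been made beforehand.

```python
import math


def smooth_l1(x):
    """Smooth L1 loss of a single offset difference."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def softmax_cross_entropy(logits, label):
    """-log softmax(logits)[label], computed in a numerically stable way."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]


def ssd_loss(positives, negatives, background=0):
    """Eq. (1): L = (L_conf + L_loc) / N, with N the number of matched default boxes.

    positives: (logits, label, predicted_offsets, gt_offsets) for each matched box.
    negatives: logits of the background boxes selected for the confidence loss.
    """
    n = len(positives)
    if n == 0:
        return 0.0  # the original SSD sets the loss to 0 when nothing is matched
    l_conf = sum(softmax_cross_entropy(lg, lbl) for lg, lbl, _, _ in positives)
    l_conf += sum(softmax_cross_entropy(lg, background) for lg in negatives)
    l_loc = sum(smooth_l1(p - g)
                for _, _, pred, gt in positives
                for p, g in zip(pred, gt))
    return (l_conf + l_loc) / n
```

Normalizing by $N$ rather than by the total number of default boxes keeps the loss scale independent of how many boxes the network evaluates.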

2.2 The deconvolutional single shot detector (DSSD)

Convolutional neural networks have an inherent structural problem. High-level layers have relatively large receptive fields and strong semantic representation ability, but their feature maps have low resolution and weak geometric representation ability. Low-level layers have relatively small receptive fields and high-resolution feature maps with strong geometric representation ability, but weak semantic representation ability. SSD uses multi-scale feature maps to predict objects: high-level features with large receptive fields predict large objects, and low-level features with small receptive fields predict small objects. This raises a problem: SSD classifies small objects poorly, because the low-level features used to predict them lack high-level semantic information. One way to solve this problem is to fuse high-level semantic information with low-level information, which enriches the multi-scale feature maps used for prediction and ultimately improves accuracy.

Building on this idea, DSSD proposed deconvolution layers and an asymmetric “wide-narrow-wide” hourglass structure, which inspired our approach to the above problem. DSSD is an improved SSD model: it replaces the VGG backbone with ResNet-101 and introduces a residual module before classification and regression. After the auxiliary convolution layers added by SSD, deconvolution layers are added to form the “wide-narrow-wide” hourglass structure. One of DSSD's biggest improvements over SSD is its much better detection of small objects, which the experiments at the end of the DSSD paper demonstrate. However, DSSD is much slower than SSD because ResNet-101 is very deep.

2.3 Spatial transformer networks

Spatial Transformer Networks (STNs) were released by Google DeepMind. By applying an inverse spatial transformation to the data to eliminate the influence of target deformation, an STN makes the classification network simpler and more efficient.

Its framework, shown in Fig. 2, can be divided into three parts, namely a localisation network, a grid generator and a sampler, and the module can be inserted into an existing CNN model.

In Fig. 2, the input feature map $U$ is passed to a localisation network, which regresses the transformation parameters $\theta$. The regular spatial grid $G$ over $V$ is transformed into the sampling grid $\mathcal{T}_\theta(G)$, which is applied to $U$ to produce the output feature map $V$. The combination of the localisation network and the sampling mechanism defines a spatial transformer. Moreover, STNs can be seen as a generalization of differentiable attention to any spatial transformation.

Figure 2: Framework of spatial transformer networks (localisation net, grid generator and sampler, which together form the spatial transformer)

2.3.1 Localisation network

The localisation network takes the input feature map $U \in \mathbb{R}^{H \times W \times C}$ with width $W$, height $H$ and $C$ channels, and outputs $\theta$, the parameters of the transformation $\mathcal{T}_\theta$ to be applied to the feature map: $\theta = f_{loc}(U)$. The size of $\theta$ varies with the type of transformation that is parameterized.

The localisation network function $f_{loc}()$ can take any form, such as a fully-connected network or a convolutional network, but it should include a final regression layer to produce the transformation parameters $\theta$.
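The sketch below shows one way the final regression layer could look for the affine case, where it outputs the 6 elements of $A_\theta$ from a flattened feature vector produced by the earlier layers of $f_{loc}$. Initializing its weights to zero and its bias to the identity transform is a common practice in STN implementations and is an assumption here; the paper does not say how its localisation network is initialized. Both function names are hypothetical.

```python
def f_loc_regression(features, weights, bias):
    """Final regression layer of f_loc: theta = W @ features + b, reshaped to 2x3."""
    if len(weights) != 6 or len(bias) != 6:
        raise ValueError("an affine A_theta needs exactly 6 outputs")
    theta = [sum(w * f for w, f in zip(row, features)) + b
             for row, b in zip(weights, bias)]
    return [theta[0:3], theta[3:6]]


def identity_init(num_features):
    """Zero weights and identity bias, so the initial transform leaves U unchanged."""
    weights = [[0.0] * num_features for _ in range(6)]
    bias = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
    return weights, bias


features = [0.3, -1.2, 5.0, 2.0]  # hypothetical flattened features of U
print(f_loc_regression(features, *identity_init(len(features))))
```

With this initialization the module starts as a no-op, so inserting it into a pretrained detector does not disturb the features at the beginning of training; the regression weights then learn the transformation.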

2.3.2 Parameterized sampling grid

Following the STN paper, a “pixel” here refers to an element of a generic feature map, not necessarily an image. In general, the output pixels are defined to lie on a regular grid $G = \{G_i\}$ of pixels $G_i = (x_i^t, y_i^t)$, forming an output feature map $V \in \mathbb{R}^{H' \times W' \times C}$, where $H'$ and $W'$ are the height and width of the grid, and $C$ is the number of channels, which is the same in the input and output.

For a 2D affine transformation $A_\theta$, the pointwise transformation is

$$\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = \mathcal{T}_\theta(G_i) = A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix} = \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix} \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix} \qquad (3)$$

where $(x_i^t, y_i^t)$ are the target coordinates of the regular grid in the output feature map, $(x_i^s, y_i^s)$ are the source coordinates in the input feature map that define the sample points, and $A_\theta$ is the affine transformation matrix.

In Eq. (3), the transform allows cropping, translation, rotation, scale and skew to be applied to the input feature map, and requires only 6 parameters (the 6 elements of $A_\theta$) to be produced by the localisation network.
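As a minimal sketch of Eq. (3), the pure-Python function below (`affine_grid`, a hypothetical helper, not the authors' code) maps every target coordinate of a regular output grid to its source coordinate. It assumes the normalized coordinates of the STN paper, where $-1 \le x_i^t, y_i^t \le 1$ over the spatial extent of the output.

```python
def affine_grid(theta, out_h, out_w):
    """Apply Eq. (3) to every point of a regular out_h x out_w target grid.

    theta is the 2x3 matrix A_theta as nested lists. Returns (x_s, y_s) source
    coordinates in row-major order, using normalized coordinates in [-1, 1].
    """
    def linspace(n):
        # n evenly spaced values covering [-1, 1] (a single point sits at 0).
        return [-1.0 + 2.0 * i / (n - 1) if n > 1 else 0.0 for i in range(n)]

    grid = []
    for y_t in linspace(out_h):
        for x_t in linspace(out_w):
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            grid.append((x_s, y_s))
    return grid


identity = [[1, 0, 0], [0, 1, 0]]
zoom_in = [[0.5, 0, 0], [0, 0.5, 0]]  # samples only the central half of U
print(affine_grid(zoom_in, 3, 3)[0])
```

The identity matrix reproduces the target grid, while scaling the first two columns by 0.5 makes the output cover only the central region of the input, which is how the transform expresses cropping and zooming.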

2.3.3 Differentiable image sampling

Building on the above, each $(x_i^s, y_i^s)$ coordinate in $\mathcal{T}_\theta(G)$ defines the spatial location in the input where a sampling kernel is applied to obtain the value at a particular pixel in the output $V$:

$$V_i^c = \sum_n^H \sum_m^W U_{nm}^c\, k(x_i^s - m; \Phi_x)\, k(y_i^s - n; \Phi_y) \quad \forall i \in [1 \ldots H'W'],\ \forall c \in [1 \ldots C] \qquad (4)$$

where $\Phi_x$ and $\Phi_y$ are the parameters of a generic sampling kernel $k()$ that defines the image interpolation (e.g., bilinear), $U_{nm}^c$ is the value at location $(n, m)$ in channel $c$ of the input, and $V_i^c$ is the output value for pixel $i$ at location $(x_i^t, y_i^t)$ in channel $c$.

In theory, any sampling kernel can be used, as long as (sub-)gradients can be defined with respect to $x_i^s$ and $y_i^s$. For example, using the integer sampling kernel reduces (4) to

$$V_i^c = \sum_n^H \sum_m^W U_{nm}^c\, \delta(\lfloor x_i^s + 0.5 \rfloor - m)\, \delta(\lfloor y_i^s + 0.5 \rfloor - n) \qquad (5)$$

where $\lfloor x + 0.5 \rfloor$ rounds $x$ to the nearest integer and $\delta()$ is the Kronecker delta function. This sampling kernel amounts to copying the value at the nearest pixel to $(x_i^s, y_i^s)$ to the output location $(x_i^t, y_i^t)$. Alternatively, a bilinear sampling kernel can be used, giving

$$V_i^c = \sum_n^H \sum_m^W U_{nm}^c \max(0, 1 - |x_i^s - m|)\max(0, 1 - |y_i^s - n|) \qquad (6)$$
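A direct pure-Python reading of Eqs. (4) to (6) for a single channel and a single output pixel is sketched below; the function names are hypothetical. It assumes the source coordinates are already expressed in pixel units of $U$ (normalized coordinates such as those of Eq. (3) would first have to be rescaled to pixel indices).

```python
import math


def integer_kernel(d):
    """Eq. (5) written as a kernel of d = x_s - m: 1 only at the nearest pixel."""
    return 1.0 if math.floor(d + 0.5) == 0 else 0.0


def bilinear_kernel(d):
    """Eq. (6): max(0, 1 - |d|)."""
    return max(0.0, 1.0 - abs(d))


def sample(U, x_s, y_s, kernel):
    """Eq. (4) for one channel: sum_n sum_m U[n][m] k(x_s - m) k(y_s - n).

    U is indexed U[n][m], with row n along y and column m along x.
    """
    return sum(U[n][m] * kernel(x_s - m) * kernel(y_s - n)
               for n in range(len(U)) for m in range(len(U[0])))


U = [[0.0, 1.0],
     [2.0, 3.0]]
print(sample(U, 0.5, 0.5, bilinear_kernel), sample(U, 0.6, 0.2, integer_kernel))
```

The double sum mirrors Eq. (4) literally and costs $O(HW)$ per output pixel; practical implementations only visit the 2 × 2 neighbourhood where the bilinear kernel is non-zero.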

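To allow backpropagation of the loss through this sampling mechanism, we can define the gradients with respect to $U$ and $G$. For bilinear sampling (6), the partial derivatives are

$$\frac{\partial V_i^c}{\partial U_{nm}^c} = \sum_n^H \sum_m^W \max(0, 1 - |x_i^s - m|)\max(0, 1 - |y_i^s - n|) \qquad (7)$$

$$\frac{\partial V_i^c}{\partial x_i^s} = \sum_n^H \sum_m^W U_{nm}^c \max(0, 1 - |y_i^s - n|) \begin{cases} 0 & \text{if } |m - x_i^s| \ge 1 \\ 1 & \text{if } m \ge x_i^s \\ -1 & \text{if } m < x_i^s \end{cases} \qquad (8)$$

and $\partial V_i^c / \partial y_i^s$ is analogous to (8).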
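Eq. (8) can be checked numerically. The sketch below (hypothetical helper names, single channel, pixel-unit coordinates) implements the analytic derivative with respect to $x_i^s$ and compares it with a central finite difference of the bilinear sample in Eq. (6).

```python
def bilinear_sample(U, x_s, y_s):
    """Eq. (6) for one channel; U[n][m] with row n along y, column m along x."""
    return sum(U[n][m] * max(0.0, 1 - abs(x_s - m)) * max(0.0, 1 - abs(y_s - n))
               for n in range(len(U)) for m in range(len(U[0])))


def grad_x_s(U, x_s, y_s):
    """Eq. (8): analytic partial derivative of the bilinear sample w.r.t. x_s."""
    total = 0.0
    for n in range(len(U)):
        w_y = max(0.0, 1 - abs(y_s - n))
        for m in range(len(U[0])):
            if abs(m - x_s) >= 1:
                d = 0.0
            elif m >= x_s:
                d = 1.0
            else:
                d = -1.0
            total += U[n][m] * w_y * d
    return total


def numeric_grad_x_s(U, x_s, y_s, h=1e-6):
    """Central finite difference, used only to check Eq. (8)."""
    return (bilinear_sample(U, x_s + h, y_s) - bilinear_sample(U, x_s - h, y_s)) / (2 * h)


U = [[0.0, 1.0],
     [2.0, 3.0]]
print(grad_x_s(U, 0.3, 0.4), numeric_grad_x_s(U, 0.3, 0.4))
```

The two values agree away from integer coordinates; at integer $x_i^s$ the bilinear kernel has a kink, which is why the STN formulation speaks of (sub-)gradients.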

3 Proposed method

As shown in Fig. 3, our work is divided into two main pipelines. The first is the acquisition and processing of surveillance video data, including data cleaning and labeling; it is not the focus of this paper, but it shows the overall work. The second is the illegal parking detection method based on the improved SSD, which is described in the following sections.

Figure 3: Overall workflow


3.1 Improved SSD based on contextual information and STNs

Since SSD already performs well on large objects, we focus on improving the detection of small objects. In our study, we use deconvolution to feed information from a deeper layer back to a shallower layer and add STNs to our model; introducing contextual information transmission and a spatial transformer network in this way improves the performance of the model.

In addition, the STN module is computationally very fast and does not impair training speed, so it adds little time overhead in normal use; moreover, it can potentially speed up the model when subsequent downsampling is applied to the transformer's output.

The improved model architecture is shown in Fig. 4.

Figure 4: Improved SSD model architecture (backbone blocks Conv1_x to Conv5_x, a spatial transformer and the SSD layers)

As noted by Cao et al. [Cao, Xie, Yang et al. (2017)], contextual information fusion inevitably introduces unnecessary noise, so suitable layers must be found for the connection. The connection module was inspired by DSSD and ResNet; its structure is shown in Fig. 5.

Figure 5: Connection module

To give the Conv5_3 and Conv4_3 layers the same size for the connection, Conv5_3 is followed by a 2×2×512 bilinear deconvolution layer for upsampling. Then, Conv4_3 and the upsampled Conv5_3 are each followed by a 3×3×512 convolution layer to strengthen feature learning and by a normalization layer to accelerate training. Finally, after a ReLU activation on each branch, the two branches are fused by adding up the corresponding feature values (element-wise summation).
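The sketch below illustrates, for a single channel, the shape arithmetic and the fusion rule of the connection module. It is not the authors' TensorFlow implementation: the 3×3 convolutions and normalization layers are omitted, the deconvolution kernel values are arbitrary inputs rather than a bilinear initialization, and the 38×38 Conv4_3 and 19×19 Conv5_3 sizes assume a 300×300 input to a VGG16-based SSD, which the paper does not state. All function names are hypothetical.

```python
def deconv_output_size(n, kernel=2, stride=2, padding=0):
    """Spatial size after a transposed convolution (no output padding)."""
    return (n - 1) * stride - 2 * padding + kernel


def transposed_conv_2x2_s2(x, w):
    """Single-channel 2x2, stride-2 transposed convolution.

    Each input value x[i][j] is spread over the non-overlapping output block
    rows 2i..2i+1, columns 2j..2j+1, weighted by the 2x2 kernel w.
    """
    rows, cols = len(x), len(x[0])
    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] += x[i][j] * w[di][dj]
    return out


def relu(v):
    return max(0.0, v)


def fuse(shallow, upsampled):
    """ReLU on each branch, then element-wise sum of corresponding values."""
    return [[relu(a) + relu(b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(shallow, upsampled)]


# Assumed SSD300 sizes: a 19x19 Conv5_3 map upsamples to Conv4_3's 38x38.
print(deconv_output_size(19))
```

Because a 2×2 kernel with stride 2 writes to non-overlapping output blocks, the deconvolution exactly doubles the spatial size, which is what makes the element-wise sum with Conv4_3 possible.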

We searched for the best combination on PASCAL VOC2007 by comparing the precision of Conv4_3, Conv4_3+Conv5_3, Conv4_3+fc6 and Conv3_3+Conv4_3+Conv5_3. The results, presented in Section 4.3, show that Conv4_3+Conv5_3 performs best.

3.2 Illegal parking detection

The illegal parking detection method proposed in this paper differs from methods that perform background modeling and background segmentation, which are very susceptible to complex environmental factors.

3.2.1 Definition of illegal parking

The definition of illegal parking is illustrated in Fig. 6:

Figure 6: Examples of illegal parking (Images come from training dataset)

Within the fixed monitoring range of the camera, an alarm is automatically triggered when a vehicle stays in the red alert area for longer than a preset time, and the color of its bounding box changes from green to yellow.

3.2.2 Process of illegal parking

The detection flowchart is shown in Fig. 7.

Inspired by Xie et al. [Xie, Wang, Chen et al. (2017)], we first adapt the flowchart to our method. Second, we set up the alert area and feed the video to the improved SSD frame by frame. Third, we calculate the IoU between every detected bounding box and the alert area; IoU (intersection over union) measures the proportion of overlap between two bounding boxes. Boxes that overlap the alert area are output as boxes within the illegal area. Finally, we determine whether illegal parking occurs according to the following key aspects:

a. Status Analysis

For each vehicle, we first take its bounding boxes in two adjacent frames and compute their centroids, then calculate the Euclidean distance between the two centroids to determine whether the vehicle is moving: if the distance is greater than a given threshold, the vehicle is judged to be moving; otherwise, it is judged to be stationary.

b. Timer Strategy

For each vehicle, the timer starts as soon as the vehicle becomes stationary and is reset to zero once the vehicle moves.

c. New or Old Box

Since SSD detects all the vehicles in every input image, the newly detected boxes must be matched with the bounding boxes from the last frame. We match two bounding boxes by calculating their IoU: if the IoU of two bounding boxes is greater than a given threshold, we consider them to contain the same vehicle, and the new box inherits the timing information of the old box.

An alarm is automatically triggered when a vehicle stays in the alert area for longer than the preset time.

Figure 7: Flowchart of detection (start; read video by frame; detection by improved SSD; vehicle in alert area?; previous vehicle? (update box position / new box); status analysis; illegal parking?; alerting; next vehicle)
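The per-frame logic of Section 3.2.2 can be sketched as follows, assuming boxes are (x1, y1, x2, y2) pixel tuples and that the timer accumulates while a vehicle is stationary. The function and class names and all three threshold values are hypothetical; the paper does not report its thresholds. Matching is a simple greedy best-IoU lookup, so one old box could be inherited by two new detections; a stricter tracker would enforce one-to-one assignment.

```python
import math
from dataclasses import dataclass

# Hypothetical thresholds (not reported in the paper).
MOVE_THRESHOLD_PX = 5.0   # centroid displacement separating moving from stationary
MATCH_IOU = 0.5           # IoU above which two boxes are treated as the same vehicle
ALARM_SECONDS = 60.0      # preset parking time before an alarm


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def centroid(box):
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


@dataclass
class Track:
    box: tuple
    stationary_time: float = 0.0


def update(tracks, detections, alert_area, dt):
    """Process one frame; returns (tracks for this frame, tracks raising an alarm)."""
    new_tracks, alarms = [], []
    for det in detections:
        # Keep only detections that overlap the alert area.
        if iou(det, alert_area) <= 0.0:
            continue
        # c. New or old box: match against the best-overlapping box of the last frame.
        best = max(tracks, key=lambda t: iou(t.box, det), default=None)
        if best is not None and iou(best.box, det) > MATCH_IOU:
            # a. Status analysis: Euclidean distance between adjacent-frame centroids.
            (x0, y0), (x1, y1) = centroid(best.box), centroid(det)
            moving = math.hypot(x1 - x0, y1 - y0) > MOVE_THRESHOLD_PX
            # b. Timer strategy (assumed): accumulate while stationary, reset when moving.
            elapsed = 0.0 if moving else best.stationary_time + dt
        else:
            elapsed = 0.0  # a new vehicle starts with a cleared timer
        track = Track(det, elapsed)
        new_tracks.append(track)
        if elapsed >= ALARM_SECONDS:
            alarms.append(track)
    return new_tracks, alarms
```

In use, `update` would be called once per frame with the improved SSD's vehicle boxes and the elapsed time `dt` since the previous frame, feeding the returned tracks back in on the next call.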

3.2.3 Evaluation of illegal parking

We redefine precision and recall as follows:

$$\text{Precision} = \frac{\text{True}_{\text{illegal parking}}}{\text{True}_{\text{illegal parking}} + \text{False}_{\text{illegal parking}}} \qquad (9)$$

$$\text{Recall} = \frac{\text{True}_{\text{illegal parking}}}{\text{True}_{\text{illegal parking}} + \text{N-False}_{\text{illegal parking}}} \qquad (10)$$

$\text{True}_{\text{illegal parking}}$: the number of times illegal parking is correctly detected;

$\text{False}_{\text{illegal parking}}$: the number of times illegal parking is incorrectly detected;

$\text{N-False}_{\text{illegal parking}}$: the number of missed detections of illegal parking.
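Eqs. (9) and (10) translate directly into code. The sketch below uses hypothetical function names and returns 0 when no events have been counted, a convention that is an assumption here rather than something the paper specifies.

```python
def precision(true_ip, false_ip):
    """Eq. (9): correct illegal-parking alarms over all alarms raised."""
    total = true_ip + false_ip
    return true_ip / total if total else 0.0


def recall(true_ip, n_false_ip):
    """Eq. (10): correct alarms over all actual illegal-parking events."""
    total = true_ip + n_false_ip
    return true_ip / total if total else 0.0


# Hypothetical counts: 97 correct alarms, 3 false alarms, 5 missed events.
print(precision(97, 3), recall(97, 5))
```

Precision penalizes false alarms, which waste enforcement effort, while recall penalizes missed events; Tab. 1 reports both.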

4 Experiment results

4.1 Experiment configuration

OS: Ubuntu Server 16.04 (64-bit); CPU: 2.2 GHz dual-core CPU; Memory: 16 GB; GPU: Matrox Electronics Systems Ltd. MGA G200e [Pilot] Server Engines; TensorFlow version: 1.4.0; CUDA version: 8.0; cuDNN version: 8.0.61.

4.2 Experiment dataset

The dataset mainly comes from BIT-Vehicle and PASCAL VOC2007. The former contains a total of 9,850 vehicle images from across China, covering various colors, illumination conditions and car models; the latter is a classic dataset widely used in object detection.

Specifically, we apply K-means to the dataset to obtain suitable ratio parameters for the SSD model. The general K-means procedure is as follows:

(1) Initialize the sample dataset.

(2) Randomly select K initial cluster centers.

(3) Calculate the distance between each sample point and each cluster center, and assign the point to the cluster whose center is closest.

(4) Calculate the mean of all points in each cluster and set it as the new cluster center.

(5) Repeat Steps 3 and 4 until the preset iteration condition is met.

For our dataset, the procedure is:

(1) Read the label files of the dataset.

(2) Get the coordinates of the ground truth bounding boxes, calculate their length and width, and form a 2D coordinate for each box.

(3) Cluster these 2D coordinates until the iteration condition is met (we select K=4 for our dataset).
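The procedure above can be sketched in plain Python as follows, clustering (width, height) points with Euclidean distance as described in Steps (2) to (5). The function names, the fixed random seed and the iteration cap are assumptions for reproducibility; some detectors (e.g., YOLOv2) use 1 - IoU instead of Euclidean distance, and the paper does not specify which variant it uses beyond "distance".

```python
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster 2D (width, height) points with plain K-means and Euclidean distance."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # Step (2): random initial cluster centres
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # Step (3): assign each point to its nearest centre
            nearest = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                          + (p[1] - centers[c][1]) ** 2)
            clusters[nearest].append(p)
        # Step (4): move each centre to the mean of its cluster (keep it if empty).
        new_centers = [
            (sum(q[0] for q in cl) / len(cl), sum(q[1] for q in cl) / len(cl))
            if cl else centers[idx]
            for idx, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # Step (5): stop once the centres no longer move
            break
        centers = new_centers
    return centers


def aspect_ratios(centers):
    """Width/height ratio of each cluster centre, usable as an SSD aspect ratio."""
    return [w / h for w, h in centers]


# Hypothetical (width, height) pairs read from ground truth boxes.
boxes = [(10, 10), (11, 10), (10, 11), (50, 25), (51, 25), (50, 26)]
print(aspect_ratios(sorted(kmeans(boxes, 2))))
```

The cluster centres summarize typical box shapes, and their width/height ratios are the candidates for the default box aspect ratios.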

Finally, the K-means result is shown in Fig. 8. We obtain the ratio parameters $a_{BIT} = \{1, 0.9, 1.2, 1.5, 2\}$, which improved our experimental performance.


Figure 8: K-means result of our dataset
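To show how ratio parameters such as $a_{BIT}$ turn into default boxes, the sketch below applies the original SSD rule $w = s_k\sqrt{a}$, $h = s_k/\sqrt{a}$, where $s_k$ comes from Eq. (2). The function name is hypothetical, and whether the authors also add reciprocal ratios or the extra square box of scale $\sqrt{s_k s_{k+1}}$ used by the original SSD is not stated in the paper, so the extra box is optional here.

```python
import math

A_BIT = (1, 0.9, 1.2, 1.5, 2)  # ratio parameters reported above


def default_box_sizes(s_k, ratios=A_BIT, s_next=None):
    """(width, height) of the default boxes at scale s_k, relative to image size.

    Uses the original SSD rule w = s_k * sqrt(a), h = s_k / sqrt(a). If s_next
    (the scale of the next feature map) is given, the extra square box of scale
    sqrt(s_k * s_next) used by the original SSD is appended.
    """
    sizes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in ratios]
    if s_next is not None:
        extra = math.sqrt(s_k * s_next)
        sizes.append((extra, extra))
    return sizes


for w, h in default_box_sizes(0.2):
    print(f"w={w:.3f} h={h:.3f} w/h={w / h:.2f}")
```

Every box at a given scale keeps the same area $s_k^2$; only its shape changes with the aspect ratio, so the clustered ratios bias the default boxes towards the vehicle shapes found in the dataset.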

4.3 Experiment results of connection module

In this part, we present the results of the experiment described in Section 3.1. We train on the PASCAL VOC2007 dataset and use the “xavier” method for weight initialization. The learning rate and loss curves are shown in Fig. 9.

Figure 9: Learning rate and total loss

The curves show that the learning rate and the loss are close to convergence after 15K training epochs. The results of the experiment in Section 3.1 are shown in Fig. 10.


Figure 10: Detection results of different connection layers

To find the most suitable combination for the connection module, we repeated the experiments on VOC2007 and took the mean. As shown in Fig. 10, Conv4_3+Conv5_3 performs best, with a precision 1.5% higher than Conv4_3 alone. By contrast, fc6 has a larger receptive field than Conv5_3, which introduces more background noise for small objects and makes the performance slightly worse.

4.4 Result of detection of illegal parking

To verify the improvement, we compared our model with different detection models. The experiment results are shown in Tab. 1.

Table 1: Detection results of different basic models

| Basic Model | Precision (%) | Recall (%) | FPS |
|---|---|---|---|
| DPM | 92.32 | 83.75 | 1/3 |
| Faster R-CNN | 93.45 | 87.72 | |
| Yolo-v2 | 94.22 | 95.42 | |
| Improved SSD | 97.30 | 96.84 | 40 |

As shown in Tab. 1, our method performs better thanks to its strong feature extraction ability, reaching a precision of up to 97.3% at a speed of 40 FPS, superior to the other methods for illegal parking detection under the same conditions.

5 Conclusion

In this paper, we proposed a real-time illegal parking detection method to relieve the negative impact of illegal parking. We designed and verified an improved SSD inspired by DSSD and STNs, which takes advantage of contextual information fusion and spatial transformer networks (a spatial attention mechanism). These improvements enhance the robustness of detection in various environments and improve detection accuracy on small objects. With the improved SSD, we achieved a precision of up to 97.3% and real-time detection at 40 FPS, superior to the other methods for illegal parking detection under the same conditions.

In future work, we will further improve our detection method through workflow optimization and keep studying new methods such as that of Nai et al. [Nai, Li, Li et al. (2018)], who propose a visual tracking framework based on a novel local sparse representation.

Acknowledgement: This research has been supported by NSFC (61672495), Scientific Research Fund of Hunan Provincial Education Department (16A208), Project of Hunan Provincial Science and Technology Department (2017SK2405), and in part by the construct program of the key discipline in Hunan Province and the CERNET Innovation Project (NGII20170715).

References

Cao, G. M.; Xie, X. M.; Yang, W. Z.; Liao, Q.; Shi, G. M. (2017): Feature-fused SSD:

fast detection for small objects. International Conference on Graphic and Image

Processing.

Dong, Z.; Pei, M. T.; He, Y.; Liu, T.; Dong, Y. M. et al. (2014): Vehicle type

classification using unsupervised convolutional neural network. IEEE International

Conference on Pattern Recognition, pp. 172-177.

Fu, C.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A. (2017): DSSD: deconvolutional single

shot detector. Computer Vision and Pattern Recognition.

Girshick, R. (2015): Fast R-CNN. IEEE International Conference on Computer Vision,

pp. 1440-1448.

Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. (2014): Rich feature hierarchies for

accurate object detection and semantic segmentation. IEEE Conference on Computer

Vision and Pattern Recognition, pp. 580-587.

He, K. M.; Zhang, X. H.; Ren, S. Q.; Sun, J. (2016): Deep residual learning for image

recognition. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778.

He, K. M.; Zhang, X. Y.; Ren, S. Q.; Sun, J. (2014): Spatial pyramid pooling in deep

convolutional networks for visual recognition. European Conference on Computer Vision,

pp. 346-361.

Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S. et al. (2016): SSD: single

shot multibox detector. European Conference on Computer Vision, pp. 21-37.

Meng, R. H.; Rice, S. G.; Wang, J.; Sun, X. M. (2018): A fusion steganographic

Algorithm based on faster R-CNN. Computers, Materials & Continua, vol. 55, no. 1, pp.

1-16.

Maddalena, L.; Petrosino, A. (2013): Stopped object detection by learning foreground

SSD Real-Time Illegal Parking Detection Based on Contextual 307

model in videos. IEEE Transactions on Neural Networks and Learning Systems, vol. 24,

no. 5, pp. 723-735.

Mu, C. Y.; Ma, X.; Zhang, P. P. (2015): Smart detection of vehicle in Illegal parking

area by fusing of multi-features. International Conference on Next Generation Mobile

Applications, Services and Technologies, pp. 388-392.

Nai, K.; Li, Z. Y.; Li, G. J.; Wang, S. Q. (2018): Robust object tracking via local sparse

appearance model. IEEE Transactions on Image Processing, vol. 27, no. 10, pp. 4958-4970.

Ren, S. Q.; He, K. M.; Girshick, R.; Sun, J. (2015): Faster R-CNN: towards real-time

object detection with region proposal networks. Advances in Neural Information

Processing Systems, pp. 91-99.

Redmon, J.; Divvala, S.; Girshick, R; Farhadi, A. (2016): You only look once: unified,

real time object detection. IEEE Conference on Computer Vision and Pattern

Recognition, pp. 779-788.

Wahyono, W.; Filonenko, A.; Kang-Hyun, J. (2018): Illegally parked vehicle detection

using adaptive dual background model. Industrial Electronics Society, pp. 25-28.

Xie, X. M.; Wang, C. Y.; Chen, S.; Shi, G. M.; Zhao, Z. F. (2017): Real-time illegal parking detection system based on deep learning. International Conference on Deep Learning Technologies.

Zheng, Y. (2014): Research on key technologies of illegal parking forensics (Ph.D. Thesis). Shanghai Jiao Tong University, China.
