Conference Paper

Vehicle Count System based on Time Interval Image Capture Method and Deep Learning Mask R-CNN

Authors:
  • Nhat Tinh Co. Ltd

Abstract

Traffic congestion is an undesirable problem for big cities, especially in third world countries. Better policy planning and decision-making by the authorities come from well-conducted practical research. In this study, a Vehicle Count System (VCS) using deep learning Mask R-CNN is developed to classify and count vehicles passing through a target street. A novel time interval image capture (TIIC) system is used in the VCS instead of the typical real-time video streaming to avoid large data storage costs. To determine the effectiveness of the developed VCS, its output is compared with that of the conventional method of manually recording the passing vehicles. Four vehicle types (cars, motorbikes, trucks and buses) are present in the 1800 real traffic images gathered from an actual field site. As an initial stage, the developed tool performs satisfactorily in classifying and counting car-type vehicles, with 97.62% accuracy over 10 hours of testing. However, it fails to recognize motorbikes, probably because of their relatively smaller pixel size compared with other vehicle types. The presence of jeeps confused the VCS. The real image dataset can be used as a basis for further development. The newly developed TIIC system can also be used in future research as a promising tool to replace real-time video streaming.
Eduardo Jr Piedad
Department of Electrical Engineering
University of San Jose-Recoletos
Cebu City, Philippines
eduardojr.piedad@usjr.edu.ph
Fhenyl Kristel Pama
Department of Civil Engineering
University of San Jose-Recoletos
Cebu City, Philippines
fhenylkristel.pama@gmail.com
Tuan-Tang Le
Department of Mechanical Engineering
National Taiwan University of Science
and Technology
Taipei City, Taiwan
d10603809@mail.ntust.edu.tw
Ianny Tabale
Department of Civil Engineering
University of San Jose-Recoletos
Cebu City, Philippines
ianny.tabale@gmail.com
Kimberly Aying
Department of Civil Engineering
University of San Jose-Recoletos
Cebu City, Philippines
kimberlyaying@gmail.com
Keywords: vehicle counting system, time interval image capture, Mask R-CNN, traffic monitoring
I. INTRODUCTION
Traffic congestion is a growing concern in many developing countries, especially in the Philippines. Its capital city, Metro Manila, is considered to have the worst traffic in the world based on the 2015 Global Driver Satisfaction Index (GDSI) conducted by the Waze navigation app [1]. In time, the problem may also grow in other big cities of the country such as Cebu and Davao City. Policy-making bodies need sufficient information from various research agencies to address the problem without incurring further cost. In this initial study, a deep learning-based traffic monitoring system is developed.
Various studies address the big data problem that arises from integrating smart cameras into traffic-related applications. For example, [2] developed expert systems that deploy a network of smart cameras for traffic monitoring and handle several subsystems for wide-area traffic control. Instead of fixed cameras, [3] uses images from unmanned aerial vehicles (UAVs) for vehicle detection. A review by [4] surveys the rapidly growing research on vehicular ad hoc networks (VANETs), which address traffic safety and other applications such as traffic status monitoring and road traffic management. Another study [5] applied optimization-based and deep learning-based methods to understand traffic density from large-scale web camera data. Because of the economic cost of large-scale camera data, there is a need for a cheaper yet effective system. This study limits big data acquisition by developing a time interval-based image capture technique.
In recent vehicle recognition and counting work, deep learning methods are widely used because of benefits such as fast computation and high accuracy. The TraCount technique was developed by [6] to address the problem of counting overlapping vehicles. Another technique, FCN-rLSTM [7], uses deep spatio-temporal neural networks to count vehicles in low-quality videos captured by city cameras. A fine-grained vehicle classification model developed by [8] handles complicated transportation scenes. The vehicle counting system (VCS) developed by [9] counts by vehicle type instead of only deciding whether an object is a vehicle or not. It classifies each vehicle as a car, taxi or truck using a convolutional neural network with layer-skipping strategy (CNNLS) framework. The study of [10] generated a large contextual dataset for classification, detection and counting of cars using a deep learning network called ResCeption. This network offers a new way of counting cars in a single look instead of using conventional image processing techniques. A deep convolutional neural network developed by [11] gives a recent image-based learning technique to measure traffic density. Most of these studies develop deep learning tools that use large-scale datasets of video streaming image data. Some focus on improving the tool itself.
Fig. 1. Flowchart of the developed vehicle counting
978-1-7281-1895-6/19/$31.00 © 2019 IEEE
A practical implementation in an actual scenario using a
more recent deep learning tool, mask region-based
convolutional neural network (Mask R-CNN), is proposed in
this study. This study uses a pre-trained Mask R-CNN model
to classify and count vehicles in the generated traffic images
from the developed interval-based image capture.
II. VEHICLE COUNTING SYSTEM (VCS) SETUP
The setup of the VCS developed in this study is shown in Fig. 1. The VCS has two important parts: the time interval image capture (TIIC) system and the deep learning Mask R-CNN. The setup is simply a fixed camera that keeps the same angle of depression for every image in order to obtain the desired traffic parameter. A sample image from the VCS is shown in Fig. 2. Later, in the deep learning implementation, this original image is preprocessed so that vehicles are detected only in the desired area of the image.
Fig. 2. A sample image taken by the vehicle counting system
Fig. 3. The time interval image capture method
Fig. 4. Three captured images in 60-second green status of the target street
Fig. 5. Deep learning Mask R-CNN Implementation
2676 2019 IEEE Region 10 Conference (TENCON 2019)
A. Time interval image capture (TIIC) system
The time interval image capture (TIIC) method is illustrated in Fig. 3. There are three typical traffic statuses: green, red and yellow. The green status is when the traffic light signals vehicles to ‘Go’, while red signals ‘Stop’. Since the yellow status usually covers only around three seconds, it is ignored in this method. The duration of the go and stop statuses depends on the location, and each status normally lasts at least 16 seconds. In the TIIC method, at least three images are captured per status, as shown in Fig. 4. After detecting and counting the vehicles by type, the mean count per vehicle type over the three images is taken. A total of 1800 traffic images were collected from the overpass of Osmeña Boulevard, one of the most congested places in Cebu City, Philippines.
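The TIIC idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not say how the three captures are spaced within a status, so the even spacing used by the hypothetical `capture_offsets` helper is an assumption, and `mean_counts` is likewise a hypothetical name for the per-type averaging step.

```python
from statistics import mean


def capture_offsets(status_duration_s, n_images=3):
    """Capture times in seconds, measured from the start of a traffic status.

    Assumption: captures are evenly spaced inside the status window.
    """
    if status_duration_s <= 0 or n_images < 1:
        raise ValueError("duration must be positive and n_images at least 1")
    step = status_duration_s / (n_images + 1)
    return [round(step * (i + 1), 3) for i in range(n_images)]


def mean_counts(per_image_counts):
    """Average the per-type vehicle counts over the images of one status.

    per_image_counts: list of dicts such as {"car": 4, "bus": 1}.
    A type missing from an image is treated as a count of zero.
    """
    types = sorted({t for counts in per_image_counts for t in counts})
    return {t: mean(c.get(t, 0) for c in per_image_counts) for t in types}


if __name__ == "__main__":
    # A 60-second green status with three captures, as in Fig. 4.
    print(capture_offsets(60))
    print(mean_counts([{"car": 4, "bus": 1}, {"car": 5}, {"car": 6, "bus": 2}]))
```

Storing three stills per status instead of a continuous stream is what keeps the data volume small; the averaging step then gives one representative count per vehicle type for that status.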
B. Deep Learning Mask R-CNN
The second part of the VCS covers vehicle detection and counting. The typical deep learning implementation is shown in Fig. 5, where the proposed VCS is compared with the conventional classification by manual recording. The images captured by the TIIC system are fed into the deep learning model. In this study, we use the recent mask region-based convolutional neural network (Mask R-CNN). Readers are invited to check the literature on Mask R-CNN and its related variants in [12]-[15]. It is known for accurate object detection and simultaneous object classification while creating an output mask of each object. Four object classes are of interest, corresponding to four vehicle types: car, motorbike, truck and bus. The performance evaluation is done by comparing the classification accuracy of the proposed VCS against the conventional method using equation (1).

$$\text{Accuracy} = \frac{\text{Number of correctly classified vehicles}}{\text{Total number of vehicles}} \times 100\% \qquad (1)$$

Fig. 6. Mask R-CNN integrated learning model pre-trained by COCO dataset
Fig. 7. Image processing from the original image (a) to final output (d)
III. IMPLEMENTATION
The learning model based on Mask R-CNN is shown in Fig. 6. The process has four essential steps: data preprocessing, the Mask R-CNN architecture, filtering and output post-processing.
Step 1. In the data preprocessing stage, the original image in Fig. 7 (a) is transformed into the preprocessed image in Fig. 6. To avoid detecting vehicles that are not in the desired street, as happens in Fig. 7 (b), an image processing technique removes the information outside the desired street, as shown in Fig. 7 (c). Note that the vehicle detection process in Figs. 7 (b)-(d) is discussed in the next step and is shown here only as a reference.
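One way to remove the information outside the desired street is to black out every pixel that falls outside a polygon drawn around it. The paper does not name the technique it used, so the sketch below is an assumption: it builds a boolean mask with a ray-casting point-in-polygon test in NumPy, and the function names and the polygon coordinates are hypothetical.

```python
import numpy as np


def polygon_mask(height, width, polygon):
    """Boolean mask, True where a pixel centre lies inside the polygon.

    polygon: list of (x, y) vertices in pixel coordinates.
    Uses the even-odd (ray casting) rule, vectorised over all pixels.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5  # pixel centres
    inside = np.zeros((height, width), dtype=bool)
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if y1 == y2:
            continue  # a horizontal edge never crosses a horizontal ray
        # Does this edge straddle the horizontal line through the pixel centre?
        straddles = (y1 > py) != (y2 > py)
        # x position where the edge meets that horizontal line
        x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= straddles & (px < x_cross)
    return inside


def keep_street_only(image, polygon):
    """Return a copy of an H x W x C image with pixels outside the polygon zeroed."""
    mask = polygon_mask(image.shape[0], image.shape[1], polygon)
    out = image.copy()
    out[~mask] = 0
    return out


if __name__ == "__main__":
    frame = np.full((4, 4, 3), 255, dtype=np.uint8)  # stand-in for a traffic image
    street = [(1, 1), (3, 1), (3, 3), (1, 3)]         # hypothetical street outline
    print(keep_street_only(frame, street)[:, :, 0])
```

Because the camera is fixed, the polygon only needs to be defined once and can be reused for every captured image.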
Step 2. The new images serve as the input to the Mask R-CNN architecture. In this study, we employed a pre-trained model that detects 80 different classes. The Mask R-CNN model is fast and easy to train. This learning model is used to detect and classify multiple vehicles in an image. Training with ResNet-50-FPN on COCO trainval35k takes 32 hours in our synchronized 8-GPU implementation (0.72 s per 16-image mini-batch), and 44 hours with ResNet-101-FPN. In fact, fast prototyping can be completed in less than one day when training on the train set. Models are trained on all COCO trainval35k images that contain annotated keypoints. To avoid overfitting, as this training set is smaller, we train using image scales randomly sampled from [640, 800] pixels, while inference is on a single scale of 800 pixels. We train for 90,000 iterations, starting from a learning rate of 0.02 and reducing it by a factor of 10 at 60,000 and 80,000 iterations. We use bounding-box NMS with a threshold of 0.5. The COCO dataset is available online in [16]. The network’s output is a sequence of classes, boxes and masks in no particular order. The output information is collected per object by matching the entries that share the same index.
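The step-wise learning rate schedule quoted in this step can be written down directly. The sketch below only restates the numbers given in the text (0.02 at the start, divided by 10 at 60,000 and 80,000 iterations); the function name and its parameters are hypothetical and not part of any Mask R-CNN library.

```python
def learning_rate(iteration, base_lr=0.02, milestones=(60_000, 80_000), factor=10.0):
    """Step schedule: divide the base rate by `factor` at every milestone passed."""
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    drops = sum(1 for m in milestones if iteration >= m)
    return base_lr / (factor ** drops)


if __name__ == "__main__":
    for it in (0, 59_999, 60_000, 80_000, 89_999):
        print(it, learning_rate(it))
```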
Step 3. Since the pre-trained Mask R-CNN also produces a number of undesired outputs, a filter is created so that only the desired information, such as the boxes and masks of the desired vehicle types (cars, trucks, motorbikes and buses), is combined and retained.
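A minimal sketch of this filtering step follows. It assumes the detector returns three parallel lists (class names, boxes, masks) in which entries with the same index describe the same object; that layout, the function names and the sample values are assumptions for illustration. The label strings are the COCO category names, where the paper's motorbike corresponds to "motorcycle".

```python
# COCO category names for the four vehicle types of interest.
VEHICLE_CLASSES = frozenset({"car", "motorcycle", "bus", "truck"})


def collect_detections(class_names, boxes, masks):
    """Group the parallel output lists into one record per detected object."""
    if not (len(class_names) == len(boxes) == len(masks)):
        raise ValueError("classes, boxes and masks must have the same length")
    return [
        {"label": label, "box": box, "mask": mask}
        for label, box, mask in zip(class_names, boxes, masks)
    ]


def keep_vehicles(detections, wanted=VEHICLE_CLASSES):
    """Drop every detection whose class is not one of the wanted vehicle types."""
    return [d for d in detections if d["label"] in wanted]


if __name__ == "__main__":
    # Hypothetical detector output for one image (masks omitted for brevity).
    names = ["car", "person", "truck", "traffic light", "car"]
    boxes = [(10, 20, 60, 50), (5, 5, 15, 40), (70, 10, 140, 60), (0, 0, 8, 20), (80, 70, 130, 100)]
    masks = [None] * len(names)
    vehicles = keep_vehicles(collect_detections(names, boxes, masks))
    print([v["label"] for v in vehicles])
```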
Step 4. Finally, the retained information is sorted and a counting process is performed. Vehicles of the same type are grouped and counted together.
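The grouping and counting can be illustrated with the standard library. This is a sketch rather than the authors' implementation; `count_by_type` and the fixed reporting order are hypothetical choices, and labels outside the four vehicle types are simply ignored.

```python
from collections import Counter


def count_by_type(labels, order=("car", "motorcycle", "bus", "truck")):
    """Count detections per vehicle type and report them in a fixed order.

    Types that were not detected are reported with a count of zero.
    """
    counts = Counter(labels)
    return {vehicle_type: counts.get(vehicle_type, 0) for vehicle_type in order}


if __name__ == "__main__":
    print(count_by_type(["car", "bus", "car", "truck", "car"]))
```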
IV. RESULTS AND DISCUSSION
Based on the output images in Fig. 8 (a), motorbikes are difficult to detect because of their comparatively small pixel size, which makes them easily overlapped by bigger vehicle types. In addition, the numbers of trucks and buses detected by the proposed VCS exceed those of the manual method because jeeps are confused with them, as shown in Fig. 8. In this study, a jeep is categorized as a car type, and based on Fig. 8 (b), two overlapping jeeps look similar to trucks and buses. Fig. 8 (c) shows a successful detection of vehicles in an image, while Fig. 8 (d) shows some misdetections and misclassifications.
Fig. 8 (a)-(d). Output images of the proposed VCS with vehicle masks and labels
Table 1. Vehicle Classification Accuracy of Manual and Proposed VCS

         Car                        Motor                      Bus                        Truck
Hour     Manual  Proposed  % acc    Manual  Proposed  % acc    Manual  Proposed  % acc    Manual  Proposed  % acc
1        340     292       85.88    250     97        38.80    0       16        0.00     16      101       -15.84
2        464     506       91.70    397     157       49.20    4       15        -26.67   20      151       -13.25
3        403     384       95.29    313     154       49.20    1       7         -14.29   6       121       -4.96
4        514     777       66.15    588     105       17.86    1       23        -4.35    33      186       -17.74
5        341     352       96.88    287     138       48.08    5       25        -20.00   4       111       -3.60
6        832     744       89.42    455     132       29.01    6       33        18.18    16      124       -12.90
7        355     415       85.54    236     147       62.29    1       20        -5.00    14      156       -8.97
8        489     349       71.37    292     117       40.07    1       20        -5.00    31      98        -31.63
9        414     711       58.23    311     125       40.19    0       39        0.00     16      232       -6.90
10       882     384       43.54    413     139       33.66    9       22        -40.91   48      98        -48.98
Total    5034    4914               3542    1311               28      220                204     1378
Total weighted accuracy    97.62%                     37.01%                     -12.73%                    -14.80%
Table 1 presents the 10-hour vehicle counts of the conventional (manual) method and the proposed vehicle counting system (VCS), together with the accuracy of the proposed VCS relative to the manual method. Note that a negative accuracy only means that the proposed VCS detects more vehicles than the manual method. Accordingly, the proposed VCS successfully detects cars with a satisfactory total accuracy of 97.62%, while it performs poorly on the rest of the vehicle types. It can be observed that in the 10th hour it detects cars poorly, with only 43.54% accuracy according to Table 1, while the truck and bus counts increase significantly. This means that the proposed VCS has a tendency to confuse cars with either trucks or buses.
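The percentages in Table 1 appear to be the ratio of the smaller count to the larger count, for example 292/340 = 85.88% in hour 1 and 464/506 = 91.70% in hour 2 for cars. That reading is inferred from the table values and is not stated by the authors, so the sketch below should be taken as an assumption; `count_agreement` is a hypothetical name, and it returns only the magnitude, leaving out the minus sign that the table uses to flag over-detection.

```python
def count_agreement(manual, proposed):
    """Percent agreement between a manual count and a VCS count.

    Assumption inferred from Table 1: smaller count divided by larger count.
    """
    if manual < 0 or proposed < 0:
        raise ValueError("counts must be non-negative")
    largest = max(manual, proposed)
    if largest == 0:
        return 100.0  # assumption: two zero counts agree perfectly
    return 100.0 * min(manual, proposed) / largest


if __name__ == "__main__":
    # Car counts from Table 1: hour 1, hour 2 and the 10-hour totals.
    for manual, proposed in ((340, 292), (464, 506), (5034, 4914)):
        print(manual, proposed, round(count_agreement(manual, proposed), 2))
```

Note that this compares counts only. Two equal counts score 100% even if individual vehicles were misclassified, which is one reason the per-type results should be read together with the images in Fig. 8.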
V. CONCLUSION
A vehicle counting system (VCS) based on a time interval image capture (TIIC) system and deep learning Mask R-CNN is successfully developed and implemented in an actual field scenario. The developed VCS counts cars sufficiently well but fails to count motorbikes, trucks and buses accurately. The presence of jeeps, which are categorized as cars, confuses the VCS, which detects them as either trucks or buses. The VCS tends to miss most motorbikes, probably because of their small pixel size compared with other vehicle types. Future research can implement the novel TIIC system to minimize data storage cost.
REFERENCES
[1] B. Bongat, “How Much Money Are You Losing Because of Traffic?,” Yahoo News, 2015. [Online]. Available: https://sg.news.yahoo.com/much-money-losing-because-traffic-220012598.html. [Accessed: 15-Feb-2019].
[2] L. Calderoni, D. Maio, and S. Rovis, “Deploying a network of smart cameras for traffic monitoring on a ‘city kernel’,” Expert Systems with Applications, vol. 41, pp. 502-507, 2014.
[3] G. V. Konoplich, E. O. Putin, and A. A. Filchenkov, “Application of Deep Learning to the Problem of Vehicle Detection in UAV Images,” 2016 XIX IEEE Int. Conf. Soft Comput. Meas., pp. 4-6, 2016.
[4] T. Darwish and K. A. Bakar, “Traffic density estimation in vehicular ad hoc networks: A review,” Ad Hoc Networks, vol. 24, pp. 337-351, 2015.
[5] S. Zhang, G. Wu, and P. Costeira, “Understanding Traffic Density from Large-Scale Web Camera Data,” arXiv Prepr., 2017.
[6] S. Surya and V. Babu, “TraCount: A Deep Convolutional Neural Network for Highly Overlapping Vehicle Counting,” 2016.
[7] S. Zhang, G. Wu, J. Costeira, and J. Moura, “FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
[8] S. Yu, Y. Wu, W. Li, Z. Song, and W. Zeng, “A model for fine-grained vehicle classification based on deep learning,” Neurocomputing, vol. 257, pp. 97-103, 2017.
[9] S. Awang and N. M. A. N. Azmi, “Vehicle Counting System Based on Vehicle Type Classification Using Deep Learning Method,” in IT Convergence and Security 2017, 2018, pp. 52-59.
[10] T. N. Mundhenk, G. Konjevod, W. A. Sakla, and K. Boakye, “A Large Contextual Dataset for Classification, Detection and Counting of Cars with Deep Learning,” in Computer Vision - ECCV 2016, 2016, pp. 785-800.
[11] J. Chung and K. Sohn, “Image-Based Learning to Measure Traffic Density Using a Deep Convolutional Neural Network,” IEEE Trans. Intell. Transp. Syst., vol. 19, no. 5, pp. 1670-1675, 2018.
[12] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580-587.
[13] R. Girshick, “Fast R-CNN,” in The IEEE International Conference on Computer Vision (ICCV), 2015.
[14] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91-99.
[15] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Computer Vision (ICCV), 2017 IEEE International Conference on, 2017, pp. 2980-2988.
[16] C. Consortium, “Coco dataset.” [Online]. Available: http://cocodataset.org/#home. [Accessed: 03-Mar-2019].