Vehicle Count System based on Time Interval Image
Capture Method and Deep Learning Mask R-CNN
Eduardo Jr Piedad
Department of Electrical Engineering
University of San Jose-Recoletos
Cebu City, Philippines
eduardojr.piedad@usjr.edu.ph
Fhenyl Kristel Pama
Department of Civil Engineering
University of San Jose-Recoletos
Cebu City, Philippines
fhenylkristel.pama@gmail.com
Tuan-Tang Le
Department of Mechanical Engineering
National Taiwan University of Science
and Technology
Taipei City, Taiwan
d10603809@mail.ntust.edu.tw
Ianny Tabale
Department of Civil Engineering
University of San Jose-Recoletos
Cebu City, Philippines
ianny.tabale@gmail.com
Kimberly Aying
Department of Civil Engineering
University of San Jose-Recoletos
Cebu City, Philippines
kimberlyaying@gmail.com
Abstract— Traffic congestion is an undesirable problem for big cities, especially in developing countries. Better policy planning and decision-making by the authorities comes from well-conducted practical research. In this study, a Vehicle Count System (VCS) using the deep learning method Mask R-CNN is developed to classify and count vehicles passing through a target street. A novel time interval image capture (TIIC) system is employed in the VCS instead of the typical real-time video streaming in order to avoid large data storage costs. To determine the effectiveness of the developed VCS, its output is compared with that of the conventional method of manually recording the passing vehicles. Four vehicle types – cars, motorbikes, trucks and buses – are present in the 1800 real traffic images gathered from an actual field. As an initial stage, the developed tool performs satisfactorily in classifying and counting car-type vehicles, with 97.62% accuracy over a 10-hour test. However, it fails to recognize motorbikes, probably due to their relatively small pixel size compared to the other vehicle types, and the presence of jeeps confuses the VCS. The real image dataset can be used as a basis for further development, and the newly developed TIIC system is a promising replacement for real-time video streaming in future research.
Keywords—vehicle counting system, time interval image
capture, Mask R-CNN, traffic monitoring
I. INTRODUCTION
Traffic congestion is a growing concern in many developing countries, especially in the Philippines. Its capital region, Metro Manila, is considered to have the worst traffic in the world based on the 2015 Global Driver Satisfaction Index (GDSI) conducted by the Waze navigation app [1]. The problem may later grow in other big cities of the country such as Cebu and Davao. Policy-making bodies need sufficient information from various research agencies in the race to address the problem without incurring further cost. In this initial study, a deep learning based traffic monitoring system is developed.
Various studies address the big data problem arising from the integration of smart cameras into traffic-related applications. For example, expert systems deploying a network of smart cameras for traffic monitoring are developed in [2] to handle several subsystems for wide-area traffic control. Instead of a fixed camera, another study [3] uses images from unmanned aerial vehicles (UAVs) for vehicle detection. A review of ad hoc networks [4] surveys the growing body of research on vehicular ad hoc networks (VANETs) addressing traffic safety and other applications such as traffic status monitoring and road traffic management. Another study [5] applied optimization-based and deep learning based methods to understand traffic density from large-scale web camera data. Because of the economic cost generated by large-scale camera data, there is a need for a cheaper yet effective system. This study uses a strategy that limits big data acquisition by developing a time interval-based image capture technique.
In recent vehicle recognition and counting work, deep learning methods tend to be the most widely used due to promising benefits such as fast computation and high accuracy. A TraCount technique was developed in [6] to address the problem of counting overlapping vehicles. Another technique, FCN-rLSTM [7], develops deep spatio-temporal neural networks to count vehicles in low-quality videos captured by city cameras. A similar study uses a fine-grained vehicle classification model developed in [8] that handles complicated transportation scenes. A vehicle counting system (VCS) developed in [9] counts more accurately by distinguishing vehicle types instead of determining only whether an object is a vehicle or not; it classifies each vehicle as a car, taxi or truck using a convolutional neural network with a layer-skipping strategy (CNNLS) framework.
The study in [10] generated a large contextual dataset for classification, detection and counting of cars using a deep network called ResCeption, which offers a new way of counting cars in a single look instead of using conventional image processing techniques. A deep convolutional neural network developed in [11] gives a recent image-based learning technique to measure traffic density. Most of these studies develop deep learning tools that use large-scale datasets from video streaming, and some focus on improving the tool itself.
Fig 1. Flowchart of the developed vehicle counting system
978-1-7281-1895-6/19/$31.00 © 2019 IEEE
A practical implementation in an actual scenario using a more recent deep learning tool, the mask region-based convolutional neural network (Mask R-CNN), is proposed in this study. A pre-trained Mask R-CNN model is used to classify and count vehicles in the traffic images generated by the developed interval-based image capture.
II. VEHICLE COUNTING SYSTEM (VCS) SETUP
The developed VCS setup is shown in Fig. 1. There are two important parts of the VCS – the time interval image capture (TIIC) system and the deep learning Mask R-CNN. The setup is simply a fixed camera with the same angle of depression per image in order to capture the desired traffic parameters. A sample image taken by the VCS is shown in Fig. 2. Later, in the deep learning implementation, this original image is preprocessed so that vehicles are detected only in the desired area of the image.
Fig 2. A sample image taken by the vehicle counting system
Fig 3. The time interval image capture method
Fig 4. Three captured images in a 60-second green status of the target street
Fig 5. Deep learning Mask R-CNN implementation
2019 IEEE Region 10 Conference (TENCON 2019)
A. Time interval image capture (TIIC) system
The time interval image capture (TIIC) method is illustrated in Fig. 3. There are three typical traffic statuses – green, red and yellow. The green status is when the traffic light signals the vehicle to 'Go', while red signals 'Stop'. Since the yellow status usually covers only around a three-second interval, it is ignored in this method. The duration of the go and stop statuses depends on the location; normally, each status takes at least 16 seconds. In the TIIC method, at least three images are captured per status, as shown in Fig. 4. After detecting and counting each vehicle type, the mean number per vehicle type over the three images is taken. A total of 1800 traffic images were collected from the overpass of Osmeña Boulevard, one of the most congested places in Cebu City, Philippines.
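The per-status averaging described above can be sketched as follows. This is a minimal Python sketch: the per-image counts, the function name, and the dictionary layout are illustrative assumptions, not taken from the paper's implementation.

```python
from statistics import mean

# Hypothetical per-image vehicle counts for the three images captured
# during one green status (values are illustrative).
counts_per_image = [
    {"car": 5, "truck": 1},
    {"car": 6, "truck": 1},
    {"car": 4, "truck": 2},
]

def mean_counts(per_image):
    """Mean number of each vehicle type across the captured images."""
    types = sorted(set().union(*per_image))
    return {t: mean(img.get(t, 0) for img in per_image) for t in types}

status_counts = mean_counts(counts_per_image)
```

Averaging over a small burst of images, rather than processing a continuous stream, is what lets the TIIC method avoid the storage cost of real-time video.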
B. Deep Learning Mask R-CNN
The second part of the VCS includes vehicle detection and counting. The typical deep learning implementation is shown in Fig. 5, where the proposed VCS is compared with the conventional classification by manual recording. The images captured by the TIIC system are fed into the deep learning model.
In this study, we use the recent mask region-based convolutional neural network (Mask R-CNN); readers are invited to consult the literature on Mask R-CNN and its variants in [12]–[15]. It is known for accurate object detection and simultaneous object classification while creating an output mask of each object.
Fig 6. Mask R-CNN integrated learning model pre-trained on the COCO dataset
Fig 7. Image processing from the original image (a) to the final output (d)
There are four object classes of concern, corresponding to four vehicle types – cars, motorbikes, trucks and buses. The performance evaluation is done by comparing the classification accuracy of the proposed VCS against the conventional method using equation (1).
Accuracy = (Number of correctly classified vehicles / Total actual vehicles) × 100%   (1)
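As a quick sketch, equation (1) can be computed in Python as follows (the function name is illustrative; the sample values are the hour-1 car counts from Table 1):

```python
def accuracy(classified, actual):
    """Equation (1): correctly classified vehicles over total actual
    vehicles, expressed as a percentage."""
    return classified / actual * 100

# Hour 1, car column of Table 1: 292 detected versus 340 manually counted.
print(round(accuracy(292, 340), 2))  # 85.88
```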
III. IMPLEMENTATION
The learning model based on Mask R-CNN is shown in Fig. 6. There are four essential steps in this process – data preprocessing, the Mask R-CNN architecture, filtering and output post-processing.
Step 1. In the data preprocessing stage, the original image in Fig. 7 (a) is transformed into the preprocessed image in Fig. 7 (c). In order to prevent detecting vehicles outside the desired street, as shown in Fig. 7 (b), an image processing technique subtracts the information outside the desired street, as shown in Fig. 7 (c). Note that the vehicle detection shown in Figs. 7 (b)-(d) is discussed in the next step and appears here only as a reference.
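The subtraction of information outside the target street can be sketched with NumPy as below. This is a simplified assumption: a rectangular region of interest stands in for the paper's street outline, and the function name, ROI coordinates, and test frame are all illustrative.

```python
import numpy as np

def mask_outside_roi(image, roi):
    """Zero out all pixels outside a region of interest.

    `roi` is (top, bottom, left, right). A rectangle is used here for
    simplicity; masking an arbitrary street polygon works the same way,
    with a polygonal mask instead of a slice."""
    top, bottom, left, right = roi
    out = np.zeros_like(image)
    out[top:bottom, left:right] = image[top:bottom, left:right]
    return out

# A white 480x640 test frame stands in for a captured traffic image.
frame = np.full((480, 640, 3), 255, dtype=np.uint8)
masked = mask_outside_roi(frame, (100, 400, 50, 600))
```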
Step 2. The new images serve as the input to the Mask R-CNN architecture. In this study, we employed a pre-trained model that detects 80 different classes. The Mask R-CNN model is fast and easy to train, and it is used here to detect and classify multiple vehicles in an image. Training with ResNet-50-FPN on COCO trainval35k takes 32 hours in our synchronized 8-GPU implementation (0.72 s per 16-image mini-batch), and 44 hours with ResNet-101-FPN; in fact, fast prototyping can be completed in less than one day when training on the train set. Models are trained on all COCO trainval35k images that contain annotated keypoints. To avoid overfitting, as this training set is smaller, training uses image scales randomly sampled from [640, 800] pixels, while inference runs at a single scale of 800 pixels. Training runs for 90,000 iterations, starting from a learning rate of 0.02 and reducing it by a factor of 10 at 60,000 and 80,000 iterations, with bounding-box NMS at a threshold of 0.5. The COCO dataset is available online [16]. The network's output is an unordered sequence of classes, boxes and masks; the output information is collected by grouping entries that share the same object index.
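Collecting the network's parallel output lists by shared object index can be sketched as follows. The class ids, boxes, and scores here are illustrative placeholders, not real model output.

```python
# The network returns parallel lists; entries sharing an index describe
# the same detected object. Values below are illustrative placeholders.
class_ids = [3, 8, 3]  # e.g. COCO-style ids for car, truck, car
boxes = [(10, 20, 50, 60), (70, 15, 130, 80), (140, 22, 180, 58)]
scores = [0.98, 0.91, 0.95]

# Zip on the shared index to form one record per detected object.
detections = [
    {"class_id": c, "box": b, "score": s}
    for c, b, s in zip(class_ids, boxes, scores)
]
```

A per-object mask array would be carried along in exactly the same way, as a fourth parallel list.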
Step 3. Since the pre-trained Mask R-CNN produces a number of undesirable outputs, a filter is created so that only the desired information, such as the bounding boxes and masks of the desired vehicle types – cars, trucks, motorbikes and buses – is retained.
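A minimal sketch of such a filter is shown below, assuming the pre-trained model emits COCO-style class names (where the motorbike class is labelled "motorcycle"); the detection records are illustrative.

```python
# The four vehicle classes of interest, using COCO-style labels
# (an assumption; the paper does not list the exact label strings).
VEHICLE_CLASSES = {"car", "motorcycle", "truck", "bus"}

def filter_vehicles(detections):
    """Retain only detections whose label is one of the four vehicle types."""
    return [d for d in detections if d["label"] in VEHICLE_CLASSES]

raw = [{"label": "car"}, {"label": "person"}, {"label": "bus"}]
kept = filter_vehicles(raw)  # the "person" detection is dropped
```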
Step 4. Finally, the retained information is sorted and a counting process is performed; detections of the same vehicle type are grouped and counted together.
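The grouping-and-counting step can be sketched with Python's `collections.Counter` (the detection records are illustrative):

```python
from collections import Counter

def count_by_type(detections):
    """Count the retained detections per vehicle type."""
    return Counter(d["label"] for d in detections)

kept = [{"label": "car"}, {"label": "car"}, {"label": "truck"}]
print(count_by_type(kept))  # Counter({'car': 2, 'truck': 1})
```

These per-image counts are what the TIIC stage averages over the three images of each traffic-light status.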
IV. RESULTS AND DISCUSSION
Based on the output images in Fig. 8 (a), there is difficulty in detecting motorbikes due to their comparatively small pixel size, which allows them to be easily overlapped by bigger vehicle types. In addition, the numbers of trucks and buses detected by the proposed VCS exceed those of the manual method because of confusion with jeeps, as shown in Fig. 8. In this study, the jeep is categorized as a car type, and based on Fig. 8 (b), two overlapping jeeps look similar to trucks and buses. Fig. 8 (c) shows a successful detection of vehicles in an image, while Fig. 8 (d) shows some misdetections and misclassifications.
Fig 8 (a)-(d). Output images of the proposed VCS with vehicle masks and labels
Table 1. Vehicle Classification Accuracy of Manual and Proposed VCS (% acc on the Total row is the total weighted accuracy)

| Hour  | Car Manual | Car Proposed | Car % acc | Motor Manual | Motor Proposed | Motor % acc | Bus Manual | Bus Proposed | Bus % acc | Truck Manual | Truck Proposed | Truck % acc |
|-------|------------|--------------|-----------|--------------|----------------|-------------|------------|--------------|-----------|--------------|----------------|-------------|
| 1     | 340        | 292          | 85.88     | 250          | 97             | 38.80       | 0          | 16           | 0.00      | 16           | 101            | -15.84      |
| 2     | 464        | 506          | 91.70     | 397          | 157            | 49.20       | 4          | 15           | -26.67    | 20           | 151            | -13.25      |
| 3     | 403        | 384          | 95.29     | 313          | 154            | 49.20       | 1          | 7            | -14.29    | 6            | 121            | -4.96       |
| 4     | 514        | 777          | 66.15     | 588          | 105            | 17.86       | 1          | 23           | -4.35     | 33           | 186            | -17.74      |
| 5     | 341        | 352          | 96.88     | 287          | 138            | 48.08       | 5          | 25           | -20.00    | 4            | 111            | -3.60       |
| 6     | 832        | 744          | 89.42     | 455          | 132            | 29.01       | 6          | 33           | 18.18     | 16           | 124            | -12.90      |
| 7     | 355        | 415          | 85.54     | 236          | 147            | 62.29       | 1          | 20           | -5.00     | 14           | 156            | -8.97       |
| 8     | 489        | 349          | 71.37     | 292          | 117            | 40.07       | 1          | 20           | -5.00     | 31           | 98             | -31.63      |
| 9     | 414        | 711          | 58.23     | 311          | 125            | 40.19       | 0          | 39           | 0.00      | 16           | 232            | -6.90       |
| 10    | 882        | 384          | 43.54     | 413          | 139            | 33.66       | 9          | 22           | -40.91    | 48           | 98             | -48.98      |
| Total | 5034       | 4914         | 97.62     | 3542         | 1311           | 37.01       | 28         | 220          | -12.73    | 204          | 1378           | -14.80      |
Table 1 presents the 10-hour vehicle classification accuracy of the conventional and proposed vehicle counting systems (VCS). It also shows the accuracy of the proposed VCS in comparison with the manual method. Note that a negative accuracy only means that the proposed VCS detects more vehicles than the manual method. Accordingly, the proposed VCS successfully detects cars with a satisfactory total accuracy of 97.62% while performing poorly on the rest of the vehicle types. It can be observed that in the 10th hour it detects cars poorly, with only 43.54% accuracy, while the truck and bus counts increase significantly. This means that the proposed VCS tends to confuse cars with trucks or buses.
V. CONCLUSION
A vehicle counting system (VCS) based on a time interval image capture (TIIC) system and deep learning Mask R-CNN is successfully developed and implemented in an actual field scenario. The developed VCS failed to count the numbers of motorbikes, trucks and buses but performed sufficiently for cars. The presence of jeeps, which are categorized as cars, confused the VCS into detecting them as either trucks or buses. The VCS tends to miss most motorbikes, probably due to their small pixel size relative to the other vehicle types. Future research can implement the novel TIIC system to minimize data storage cost.
REFERENCES
[1] B. Bongat, “How Much Money Are You Losing Because of Traffic?,”
Yahoo News, 2015. [Online]. Available:
https://sg.news.yahoo.com/much-money-losing-because-traffic-
220012598.html. [Accessed: 15-Feb-2019].
[2] L. Calderoni, D. Maio, and S. Rovis, "Deploying a network of smart cameras for traffic monitoring on a 'city kernel'," Expert Systems with Applications, vol. 41, pp. 502–507, 2014.
[3] G. V. Konoplich, E. O. Putin, and A. A. Filchenkov, "Application of deep learning to the problem of vehicle detection in UAV images," in 2016 XIX IEEE Int. Conf. Soft Comput. Meas., pp. 4–6, 2016.
[4] T. Darwish and K. A. Bakar, "Traffic density estimation in vehicular ad hoc networks: A review," Ad Hoc Networks, vol. 24, pp. 337–351, 2015.
[5] S. Zhang, G. Wu, and P. Costeira, "Understanding Traffic Density from Large-Scale Web Camera Data," arXiv Prepr., 2017.
[6] S. Surya and V. Babu, "TraCount: A Deep Convolutional Neural Network for Highly Overlapping Vehicle Counting," 2016.
[7] S. Zhang, G. Wu, J. Costeira, and J. Moura, "FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras," in 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
[8] S. Yu, Y. Wu, W. Li, Z. Song, and W. Zeng, “A model for fine-grained
vehicle classification based on deep learning,” Neurocomputing, vol.
257, pp. 97–103, 2017.
[9] S. Awang and N. M. A. N. Azmi, “Vehicle Counting System Based on
Vehicle Type Classification Using Deep Learning Method,” in IT
Convergence and Security 2017, 2018, pp. 52–59.
[10] T. N. Mundhenk, G. Konjevod, W. A. Sakla, and K. Boakye, “A Large
Contextual Dataset for Classification, Detection and Counting of Cars
with Deep Learning,” in Computer Vision -- ECCV 2016, 2016, pp.
785–800.
[11] J. Chung and K. Sohn, “Image-Based Learning to Measure Traffic
Density Using a Deep Convolutional Neural Network,” IEEE Trans.
Intell. Transp. Syst., vol. 19, no. 5, pp. 1670–1675, 2018.
[12] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature
hierarchies for accurate object detection and semantic segmentation,”
in Proceedings of the IEEE conference on computer vision and pattern
recognition, 2014, pp. 580–587.
[13] R. Girshick, “Fast R-CNN,” in The IEEE International Conference on
Computer Vision (ICCV), 2015.
[14] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-
time object detection with region proposal networks,” in Advances in
neural information processing systems, 2015, pp. 91–99.
[15] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” in
Computer Vision (ICCV), 2017 IEEE International Conference on,
2017, pp. 2980–2988.
[16] COCO Consortium, "COCO dataset." [Online]. Available: http://cocodataset.org/#home. [Accessed: 03-Mar-2019].