Article title: Investigating the Influential Factors for Practical Application of Multiclass Vehicle Detection for Images from Unmanned Aerial Vehicle Using Deep Learning Models
Journal title: Transportation Research Record
Paper history: Submitted 1st August 2019
Revised 10th March 2020
Accepted 2nd May 2020
Published online 16th October 2020
Published 1st December 2020
Funding: Ministry of Science and ICT, Republic of Korea (NRF-2019R1H1A1080045)
DOI information: https://doi.org/10.1177/0361198120954187
---------------------------
Investigating the Influential Factors for Practical Application of Multiclass Vehicle Detection for Images from Unmanned Aerial Vehicle Using Deep Learning Models

Seung Woo Ham
Department of Civil and Environmental Engineering
Seoul National University, Gwanak-gu, Seoul, Republic of Korea, 08826
Email: seungwoo.ham@snu.ac.kr

Ho-Chul Park
Department of Transportation Engineering
Myongji University, Yongin, Kyunggi, Republic of Korea, 17058
Email: hcpark@mju.ac.kr

Eui-Jin Kim
Department of Civil and Environmental Engineering
Seoul National University, Gwanak-gu, Seoul, Republic of Korea, 08826
Email: kyjcwal@snu.ac.kr

Seung-Young Kho
Department of Civil and Environmental Engineering
Seoul National University, Gwanak-gu, Seoul, Republic of Korea, 08826
Email: sykho@snu.ac.kr

Dong-Kyu Kim, Corresponding Author
Department of Civil and Environmental Engineering and Institute of Construction and Environmental Engineering
Seoul National University, Gwanak-gu, Seoul, Republic of Korea, 08826
Email: dongkyukim@snu.ac.kr

Word Count: 6,436 words + 4 tables = 7,436 words

Call for Papers: Collection and Application of Quality Traffic Data (ABJ35)
ABSTRACT
Traffic density, which is a critical measure in traffic operations, should be collected precisely at various locations and times to reflect site-specific spatiotemporal characteristics. For detailed analysis, heavy vehicles have to be separated from ordinary vehicles, since they have a significant effect on traffic flow as well as traffic safety. With unmanned aerial vehicles (UAVs), we can easily acquire video for vehicle detection by collecting images from above the traffic without any disturbance. Despite previous studies on vehicle detection, there is still a lack of research on real-world applications for estimating traffic density. In this study, we investigate the effects of influential factors, namely the size of objects, the number of samples, and the combination of datasets, on detecting multi-class vehicles using deep learning models on various UAV images. We compare three detection models, Faster Region-based Convolutional Neural Networks (Faster R-CNN), Region-based Fully Convolutional Network (R-FCN), and Single-Shot Detector (SSD), to suggest guidelines for model selection. The results provided several findings: 1) vehicle detection from UAV images showed sufficient performance with a small number of samples and small objects; 2) deep learning-based multi-class vehicle detectors can have advantages compared with single-class detectors; 3) among all the models, SSD showed the best performance due to its algorithmic structure; 4) simply combining datasets from different environments cannot guarantee performance improvement. Based on our findings, we provide practical guidelines for estimating multi-class traffic density using UAVs.

Keywords: Vehicle Detection, Deep Learning, Unmanned Aerial Vehicle, Reproducibility in Practice, Single-Shot Detector
INTRODUCTION
Traffic phenomena in congestion, such as traffic oscillation, traffic breakdown, and capacity drop, result not just in traveler delays that reduce system-wide efficiency but also in increased crash potential. Many studies have unveiled the mechanisms that trigger those phenomena and found measures to capture them (1, 2). Their efforts demonstrated that traffic density (i.e., the inverse of average vehicle spacing) is the most critical measure. To diagnose and analyze congestion phenomena, traffic density should be collected precisely at various locations and times to reflect site-specific characteristics. Meanwhile, in traffic flow analysis, heavy vehicles have a significant effect on traffic congestion as well as traffic safety due to their physical characteristics, e.g., heavy weight, large size, and maneuvering limitations (3-5). For precise analysis, therefore, heavy vehicles should be considered, but it is costly to obtain the density of heavy vehicles separately from that of ordinary vehicles.
Based on vehicle detector systems, aerial images, or surveillance cameras, many studies have attempted to obtain vehicle densities (6-8). Although they showed the possibility of high performance for collecting traffic density, there is still a need for improvement in real-world congestion management (9). This is because these approaches are cost-ineffective: installing surveillance cameras at many points or collecting high-resolution aerial images at different times is expensive. Also, most studies focused on detecting only ordinary vehicles. Furthermore, in the case of a surveillance camera, images cannot be taken in the vertical direction from above the traffic as with a UAV, resulting in the problem of overlapping vehicle images in congested traffic.
Recently, unmanned aerial vehicles (UAVs) have been proposed to mitigate this inefficiency thanks to their mobility, cost-effectiveness, wide field-of-view, and ability to hover (stationary flight) (6, 7, 10). UAVs can easily obtain high-resolution images from above traffic at low altitude, and only simple camera calibration is required to acquire a clear image and correct the geometric distortion (10, 11). The major drawback of images from UAVs is that vehicle features are represented in a small number of pixels. This can be further exacerbated in congested traffic, where shadows partially occlude vehicles or adjacent vehicles are detected as one; further, it significantly reduces the accuracy of the collected traffic density (12). Conventional approaches for detecting vehicles, such as background subtraction, blob analysis, and optical flow, are vulnerable to these difficulties because they cannot robustly detect the exact bounding boxes surrounding the vehicles (13, 14).
With the development of computer vision and deep learning, supervised learning-based vehicle detection methods have been proposed to collect accurate traffic density even in congestion. Until recently, combining feature representations and learning algorithms was the main approach for detecting vehicles in UAV images (8, 11, 15). Because those studies used generic features for object detection instead of features customized for vehicles in UAV images (8, 16), efficiency and accuracy can be further enhanced.
As the convolutional neural network (CNN) (17) has had great success in image classification, many researchers have recently focused on vehicle detection using CNNs. These deep learning structures automatically create features from the images (18), and those features showed better performance for vehicle detection in UAV images than generic features (19). In particular, combining a CNN with bounding box regression (20), called "region-based CNN" (R-CNN) (21), allows the location of vehicles to be precisely specified by a bounding box, which drastically improves the performance of CNN-based object detection. Faster R-CNN (22), the enhanced model of R-CNN, performed real-time vehicle detection with high accuracy (19). In addition, a variety of advanced methodologies have been applied to measure vehicle density accurately, such as the Region-based Fully Convolutional Network (R-FCN) and the Single-Shot Detector (SSD).
Despite many methodological studies on vehicle detection, there is still a lack of experimental research for real-world applications. For example, multi-class vehicle detection, which separates vehicles and heavy vehicles, is essential for analyzing congestion due to their different impacts on traffic conditions (23, 24). Regarding performance, validation in various environments is required for evaluating the robustness of a detector, because detection performance can vary greatly depending on the characteristics of the image, e.g., image resolution, lighting conditions, and geometric features of the road. However, a detailed analysis in various environments has not yet appeared in previous studies (19, 25, 26).
In this paper, we investigate the effects of influential factors, i.e., small objects, the size of a vehicle in the image, the number of samples, and the combination of datasets, on detecting multi-class vehicles using deep learning models. In addition, we compare three models, i.e., Faster R-CNN, R-FCN, and SSD, which are modern deep learning object detection models (27, 28). Based on the results, we provide practical guidelines for multi-class vehicle detection.
The remainder of this paper is organized as follows. First, we present a literature review of vehicle detection. In the next section, we discuss the deep learning methodologies of this study. Then, we describe the three datasets and the measures of effectiveness used in the study. We then show the model estimation results and discuss our findings. Lastly, we conclude this study and provide guidelines for multi-class vehicle detection.
LITERATURE REVIEW
Vehicle detection using aerial and UAV images is becoming popular because of the maneuverability of UAVs and its promising results. Among the vast literature, we have categorized important recent studies by the methodologies they develop: edge and blob detection, machine learning, and deep learning.
Because unsupervised methods such as edge detection and blob detection require relatively little computational power compared with other methods, they have been widely used for real-time detection. Azevedo et al. and Khan et al. used a background subtraction approach and blob analysis to target vehicles in uncongested free-flow images (10, 29). Ke et al. detected vehicles from UAV traffic video using Shi-Tomasi features (7). These previous works showed that unsupervised methods can be applied in real time without training and perform appropriately in free flow and at urban intersections. However, the detection was not validated in congested situations, where the features used in those methods are reported to be susceptible to image conditions.
To improve the robustness of detection performance in complex environments, machine learning methods are widely used. Elmikaty and Stathaki trained support vector machines (SVMs) with human-made features such as gradient, color, and texture (30). Gleason et al. used the histogram of gradients (HoG) and the histogram of Gabor coefficients as features. They tested them with various detectors such as k-Nearest Neighbors (k-NN), Random Forests, and SVM (31). However, these detection models were built in a single environment, so their performance can drop in some situations, which limits practical use. Moreover, the fact that human-made features are targeted at general objects, not vehicles in UAV images, also limits their performance.
Deep learning methods have a significant advantage over other methods in that they automatically select features from the image (17). Xu et al. used Faster R-CNN and a VGG16 network to train a vehicle detector for UAV images (19). The authors also showed that the Faster R-CNN method is robust to image orientation, compared with a Viola-Jones object detection scheme and an HoG feature-infused linear SVM. In that study, however, it is difficult to judge how good the detection performance is, because no detailed information about the evaluation metric is given.
While the mainstream of vehicle detection has focused on binary vehicle detection, some research has focused on multi-class vehicle detection. Tang et al. trained on UAV images with a CNN and a cascade of boosted classifiers to detect two classes of vehicles, with images gathered from different roads in the daytime. The results showed that the deep learning method works better than conventional machine learning techniques (25). Liu and Mattyus applied a binary detector using a soft-cascade structure with integral channel features and classified the results into multiple classes with an aggregated classifier method (8). Li et al. trained an R-CNN network with high-resolution aerial images to detect vehicles in multiple classes. After being detected by a binary vehicle detector, each vehicle was classified into one of four classes, and the station wagon showed the highest detection performance with 2,302 training samples (32). However, these studies have the limitation of adding one more stage for classification after detecting vehicles, which induces not only a longer detection time but also a performance drop, because errors occur independently in each stage.
Multi-class object detection based solely on deep learning can set the number of classes from the beginning of the training stage. Although previous studies have developed and evaluated multi-class detection models for generic objects (33), there is no research that provides a reproducible guideline for detecting vehicles and heavy vehicles from UAV images. In other words, each machine learning model for object detection has different strengths and weaknesses depending on the type of object (19, 30, 31). Thus, a model that is suitable for vehicle detection from UAV images needs to be selected for practical usage. Several studies have attempted to apply vehicle detection under the specific environmental conditions of the training images, but there is a lack of research on detailed performance analysis across various conditions, such as lighting conditions, the ratio of heavy vehicles, and the resolution of the image (34). Therefore, the type of model architecture that shows decent performance for vehicle detection from UAV images, and the impact of influential factors on the performance of the methods, should be investigated.
As the deep learning method has strengths in image recognition for traffic images, as well as in various other fields (17), we focused on investigating an effective deep learning architecture for vehicle detection from UAV images, as well as the influential factors on detection performance. Even though deep learning methods need greater computational power for implementation than traditional methods, they can still be used for real-time detection (22). Notably, a one-stage deep learning algorithm such as SSD shows faster running times than other algorithms due to its lower complexity (27).
DEEP LEARNING METHODOLOGIES
In this paper, we used three state-of-the-art object detectors: Faster R-CNN, SSD, and R-FCN. We properly adjusted the hyperparameters of each detector for the best performance in vehicle detection. All three methods share common hyperparameters. The default sliding window (anchor box) was specified as 128×64, the scale of the sliding window was varied from 0.25 to 4.00, and the ratio was varied from 1.0 to 4.0. The following paragraphs introduce the core idea of each detector.
Faster R-CNN is the third version of the R-CNN architecture. Figure 1 shows the structure of Faster R-CNN. The R-CNN architecture extracts regions that are likely to contain an object, i.e., region proposals, from the image and classifies whether each proposal contains an object or not. Among the region proposals, the more plausible proposals are selected as regions of interest (RoIs). The early version of R-CNN extracted RoIs using an algorithm called "selective search." Each RoI is entered into a CNN, which transforms the RoI into a feature vector. The output feature vectors are then used for classification by an SVM, and the coordinates of the detection are adjusted by bounding box regression.
Faster R-CNN speeds up the region proposal process by extracting RoIs from the output feature vector. This stage is called the "region proposal network" (RPN). In the RPN, the sliding window method is used to find the RoIs. Twelve sliding windows (three different ratios and four different scales) are applied at each point. Each RoI is then evaluated using an integrated loss, which contains a classification loss and a bounding box regression loss, as described in Equation 1.

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)    (1)

Here, N_cls is the mini-batch size, and N_reg is the number of sliding windows. i is an index that identifies the individual sliding window. The classification loss (L_cls), which uses cross-entropy loss, and the bounding box regression loss (L_reg), which uses smooth L1 loss, are calculated for each sliding window. p_i represents the predicted probability that sliding window i contains an object, and p_i* is the binary indicator that represents the ground truth of p_i. t_i and t_i* are the adjusted coordinates of the predicted bounding box and the ground-truth bounding box, respectively. The weight between the classification loss and the bounding box regression loss is controlled by λ.
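The integrated loss of Equation 1 can be sketched in a few lines of NumPy. This is a minimal sketch under simplifying assumptions (sliding windows already matched to binary labels, box coordinates already parameterized); the names `rpn_loss` and `smooth_l1` are ours, not the paper's.

```python
import numpy as np

def smooth_l1(x):
    # smooth L1: 0.5*x^2 for |x| < 1, |x| - 0.5 otherwise
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def rpn_loss(p, p_star, t, t_star, n_cls, n_reg, lam=10.0):
    # Equation 1: (1/N_cls) * sum L_cls + lambda * (1/N_reg) * sum p*_i * L_reg
    eps = 1e-7
    l_cls = -(p_star * np.log(p + eps) + (1 - p_star) * np.log(1 - p + eps))
    l_reg = smooth_l1(t - t_star).sum(axis=1)  # summed over the 4 box coordinates
    return l_cls.sum() / n_cls + lam * (p_star * l_reg).sum() / n_reg

# two sliding windows: one positive (object), one negative (background)
p = np.array([0.9, 0.1])       # predicted object probabilities p_i
p_star = np.array([1.0, 0.0])  # ground-truth labels p_i*
t = np.array([[0.1, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]])
t_star = np.zeros((2, 4))
loss = rpn_loss(p, p_star, t, t_star, n_cls=2, n_reg=2)
```

Note that the regression term is gated by p_i*, so only positive windows contribute box-regression loss, matching the indicator in Equation 1.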
The Region-based Fully Convolutional Network (R-FCN) is an attempt to solve the detection problem like a classification problem. Classification is a translation-invariant problem, which means the result does not change when the image is translated. On the other hand, the detection problem is translation-variant, as its output varies if the image is shifted or enlarged. Because the classification problem is easier than the detection problem, there are many advantages if we can use the properties of classification in detection.
R-FCN suggests a position-sensitive score map that contains the relative location information of the components of an object. The position-sensitive score map learns the arrangement of components in the object at the training stage and uses it in the detection stage. For example, in the case of detecting a human face, a position-sensitive score map learns that the nose will be in the center and the mouth will be at the bottom of the face. Using this knowledge embedded in the position-sensitive score map, the detector searches for a face that has a nose in the center and a mouth at the bottom.
R-FCN splits the length and breadth of the object into k parts. Thus, R-FCN searches for k × k components of the object. When the detector is trained for C categories, it creates vectors that can categorize a total of C + 1 categories, including the background. Thus, the total number of channels becomes k²(C + 1). Figure 2(a) depicts the case k = 3.
The Single-Shot Detector (SSD), depicted in Figure 2(b), is a one-stage detection framework that has a simple structure compared with Faster R-CNN and R-FCN. While those two-stage detection frameworks contain a pre-processing stage such as a region proposal network, a one-stage detection framework performs the region proposal and the detection simultaneously. SSD uses feature maps from various convolutional layers. As an image goes through the convolutional layers, the output feature map represents more complex components. At the initial layers, the output feature map represents components with low complexity, and at the final layers, components with high complexity are represented. Also, the same area in each feature map corresponds to a different area of the original image: the later the feature map, the larger the corresponding area. This enables SSD to target multiple objects with a variety of complexities and sizes in one image. The later feature maps detect large, complex objects, and the initial feature maps detect small, simple objects. The scale of a default bounding box is set as Equation 2:

s_k = s_min + ((s_max − s_min) / (m − 1)) (k − 1),  k ∈ [1, m]    (2)

Here, m indicates the number of feature maps used, and k indicates the index of the current feature map. s_max and s_min are set as 0.9 and 0.2, respectively. The aspect ratio of each default bounding box is prescribed as a_r ∈ {1, 2, 3, 1/2, 1/3}.
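The scale schedule in Equation 2 is straightforward to compute directly. The sketch below assumes m = 6 feature maps (a common SSD configuration, not stated in this paper); the function name is illustrative.

```python
def default_box_scales(m, s_min=0.2, s_max=0.9):
    # Equation 2: scales spaced evenly between s_min and s_max over m feature maps
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

# earliest feature map gets the smallest boxes, the last one the largest
print([round(s, 2) for s in default_box_scales(6)])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```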
A bounding box is created for each feature map, and it contains four coordinate values and c prediction scores, one for each class. The bounding boxes are then evaluated by the same loss function that was used in Faster R-CNN. In the paper in which SSD was introduced, the authors defined the loss function as the weighted sum of a confidence loss and a localization loss; however, these losses are the same as the classification loss and the bounding box regression loss, respectively. If several bounding boxes indicate the same object, only one bounding box is selected by non-maximum suppression (NMS).
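The NMS step can be sketched as a greedy loop over score-sorted boxes; this is a generic sketch of the standard algorithm, not the exact implementation used in the study.

```python
def box_iou(a, b):
    # IoU of two axis-aligned boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def nms(boxes, scores, iou_thresh=0.5):
    # keep the highest-scoring box, discard boxes that overlap it, repeat
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best, order = order[0], order[1:]
        keep.append(best)
        order = [i for i in order if box_iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the near-duplicate of box 0 is suppressed
```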
DESCRIPTION OF THE DATASET AND RESEARCH FRAMEWORK
Figure 3 illustrates the overall framework of the research, including data collection, data labeling, and model evaluation. We compared the three advanced deep learning architectures for vehicle detection from UAV images, with a detailed performance analysis according to the number of training images, the resolution of the images, and the composition of the dataset.
Four types of video images taken at four different places were used in the study: Cheonho Bridge (CB), Gyeongbu Expressway (GE), Gyeongin Expressway 2 (GE2), and Seohaean Expressway (SE). The four videos can be characterized by environmental conditions affecting performance, such as lighting conditions, shadow, congestion, and surroundings. Here, a vehicle that is longer than 15 m in the image was classified as a heavy vehicle. Examples of the video images and important information about each image are shown in Figure 4 and Table 1. The ground-truth data were obtained by manually labeling bounding boxes around the vehicles in each frame. To reduce the labeling effort, we used the user-friendly image labeler provided by MATLAB.
The videos were taken in the vertical direction. The photography was done with a DJI Inspire Pro 1 equipped with a Zenmuse X5 camera, a quadcopter drone with 4K video and a 3-axis gimbal. The resolution of the video was 3840×2160 (25 fps), and a vehicle roughly consisted of 40×100 image pixels in this video. Although the hovering capability of our UAV with a 3-axis gimbal was enough to minimize UAV instability in all environments, an additional stabilization process may be required in harsh conditions. Details about the stabilization process for UAVs are presented in other work (35).
We constructed the training and evaluation datasets for each place, without mixing places. For example, the model that detects vehicles at 'Cheonho Bridge' is trained on images of 'Cheonho Bridge' and evaluated on images of 'Cheonho Bridge.' The impact of a mixed dataset on detection performance is investigated in a later section.
MEASURE OF EFFECTIVENESS
The measure of detection performance is based on the intersection over union (IoU). IoU, also known as the Jaccard index, is the overlap area divided by the union area of two boxes: the detection box and the ground-truth box. We set a threshold on IoU to determine whether a detection is true or false. If the IoU of a detection is larger than the threshold, we accept it as a true positive, and vice versa.
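The IoU itself can be computed directly from the box coordinates. A minimal sketch for axis-aligned boxes given as (x1, y1, x2, y2); the function name is ours:

```python
def iou(a, b):
    # intersection over union (Jaccard index) of two axis-aligned boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # 0 if the boxes do not overlap
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

# two 10x10 boxes overlapping in half: intersection 50, union 150, IoU = 1/3
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```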
Let us say that we detected four vehicles in one image, as in Figure 5 and Table 2. A good detection may have an IoU near 1.00, while a bad detection may have an IoU near 0. Our example represents four detections with IoUs ranging from 0.22 to 0.92.

Depending on the threshold, the set of true detections varies, because a higher threshold means a stricter evaluation. When the threshold equals 0.5, all detections except Detection 3, whose IoU is 0.22, are recognized as true detections. In this case, of the four detection results, three are true, so the precision becomes 0.75. The recall is 0.60, as three true detections were made out of five ground-truth objects. Precision and recall represent the exactness and the sensitivity of the model, respectively. The detailed equations with true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) are as follows:
Precision = TP / (TP + FP)    (3)

Recall = TP / (TP + FN)    (4)

F-score = (2 × Precision × Recall) / (Precision + Recall)    (5)
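The worked example above (threshold 0.5, four detections, five ground-truth vehicles) can be checked numerically. Only two of the four IoUs are stated in the text (0.92 and 0.22), so the other two values below are illustrative placeholders, not values from Table 2.

```python
def precision_recall_f(ious, n_ground_truth, threshold=0.5):
    tp = sum(1 for v in ious if v >= threshold)  # detections matching a ground truth
    fp = len(ious) - tp                          # detections without a match
    fn = n_ground_truth - tp                     # ground-truth objects missed
    precision = tp / (tp + fp)                   # Equation 3
    recall = tp / (tp + fn)                      # Equation 4
    f_score = 2 * precision * recall / (precision + recall)  # Equation 5
    return precision, recall, f_score

ious = [0.92, 0.55, 0.22, 0.60]  # 0.55 and 0.60 are assumed for illustration
p, r, f = precision_recall_f(ious, n_ground_truth=5)
print(p, round(r, 2), round(f, 2))  # 0.75 0.6 0.67
```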
However, when the threshold increases to 0.9, Detection 1 is the only detection regarded as true. The threshold can be determined by the intended application of the detection results. If the detections aim to count vehicles for estimating traffic density over road sections, a low threshold is acceptable. However, if the detections aim to count vehicles for each lane or to calculate the spacing between vehicles, a high threshold should be set.
Precision is the accuracy of predicting true positives among all of the detected samples, while recall is the proportion of true positives detected among all the ground truth. The F-score can reflect both precision and recall, which are in a trade-off relationship with each other. In the detection example above, if the detector had created a bounding box in every spot with even a small chance of containing a vehicle, it would record high recall, as it would detect most of the ground-truth data. However, at the same time, most bounding boxes would not have a vehicle inside, so the precision would drop. On the other hand, if only objects that are clearly classified as vehicles are recognized as real vehicles, the number of vehicles detected will decrease, and the recall will decrease at the same time, while the precision surges to near 1. This relationship can be described by the receiver operating characteristic (ROC) curve, which shows the capability of the model to classify objects. The F-score came to be used because both precision and recall are important in vehicle detection, where accurate detection without false positives and false negatives is essential.
In the field of computer vision, many types of evaluation methods exist for detection problems. For example, average precision 0.5 (AP 0.5) is a measure of effectiveness with the threshold set at IoU 0.5. It has been used in vision-based learning in the transportation field (36), but it is a relatively low standard for practical applications such as lane-level traffic density estimation and crash-likelihood estimation by vehicle spacing (37). Recently, the mean average precision [0.50:0.95] (mAP [0.50:0.95]) and the mean average recall [0.50:0.95] (mAR [0.50:0.95]) have been widely used in benchmark studies (33); they measure the mean of the area under the ROC curve for each threshold, from 0.50 to 0.95 with a step size of 0.05. In this study, we used a modified version of them, the mean F-score [0.50:0.95] (mF-score [0.50:0.95]), which calculates the F-score from mAP [0.50:0.95] and mAR [0.50:0.95]. This method can be used as a balanced evaluation measure, as it includes information about both precision and recall:

mF-score [0.50:0.95] = (2 × mAP [0.50:0.95] × mAR [0.50:0.95]) / (mAP [0.50:0.95] + mAR [0.50:0.95])    (6)
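Equation 6 can be sketched directly: per-threshold AP and AR lists are averaged first and then combined as a harmonic mean. The function and variable names are illustrative.

```python
def mf_score(ap_per_threshold, ar_per_threshold):
    # average AP and AR over the IoU thresholds 0.50, 0.55, ..., 0.95,
    # then combine them as in Equation 6 (a harmonic mean of mAP and mAR)
    m_ap = sum(ap_per_threshold) / len(ap_per_threshold)
    m_ar = sum(ar_per_threshold) / len(ar_per_threshold)
    return 2 * m_ap * m_ar / (m_ap + m_ar)

# when mAP equals mAR, the mF-score equals that common value
print(round(mf_score([0.8] * 10, [0.8] * 10), 4))  # 0.8
```

Because the harmonic mean is dominated by the smaller of its two inputs, a detector cannot score well on mF-score by trading recall away for precision, or vice versa.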
Figure 6 also shows the need for evaluation with the mF-score [0.50:0.95] rather than AP 0.5. The purpose of this evaluation is to reveal the changes in the performance of the SSD detector for each environment. However, Figure 6(a) shows consistent performance regardless of the environment when evaluated with AP 0.5. In contrast, when the mF-score [0.50:0.95] is adopted as the evaluation metric, the performance varies depending on the environment. In addition, Figure 6(b) shows that the number of training samples is highly relevant to performance, as the green and yellow lines show an increase in performance, while the rate of increase decreases once the performance is above a certain level. Using the mF-score [0.50:0.95], which is a strict evaluation method, enabled a lucid comparison of various aspects of detection performance. Borji et al. also suggested that the F-score is the most appropriate evaluation metric for object detection (38). The models that performed well on the F-score metric were also found to perform well on other evaluation metrics.
RESULTS

What is the difference in accuracy between single- and multi-class detection?
If the performance of multi-class detection (i.e., detection of vehicles with classification into vehicle and heavy vehicle) lags far behind that of single-class detection (i.e., detection of vehicles without classification), the separation of vehicles and heavy vehicles should be conducted by single-class detection followed by additional classification.
Table 3 shows the detection results evaluated by mF-score [0.50:0.95] for single-class and multi-class detection, respectively. For CB, SSD showed an mF-score [0.50:0.95] of 0.861 for the single-class detector and an mF-score [0.50:0.95] of 0.893 for multi-class detection. Similar results for the SSD detector were also observed for the GE2 dataset. Of all six cases, the single-class detector showed better performance than the multi-class detector in only two: GE2 detection using Faster R-CNN, with an mF-score [0.50:0.95] of 0.702, and CB detection using R-FCN, with an mF-score [0.50:0.95] of 0.803. However, even in these cases, the performance difference between the single-class detector and the multi-class detector was negligible. This means a multi-class detector properly extracts different feature representations for each vehicle type: vehicle and heavy vehicle. As a result, there was no significant difference in performance between single-class and multi-class detectors, even though the multi-class detector classified vehicles and heavy vehicles. Given the importance of distinguishing heavy vehicles, there is no reason to use single-class detection.
Which is the best among Faster R-CNN, R-FCN, and SSD?
The number of training samples was varied to confirm its relevance to detection performance. Figure 7 shows that performance increases as the number of training samples increases, as common sense dictates. However, convergence of performance was already achieved with a small number of training samples, near 3,000, in the cases of GE and GE2. This shows that, when vehicle detection is performed on UAV images, the deep learning model can be easily saturated with only a small number of training samples. These results suggest that, from a practical point of view, UAV-based vehicle detection can be utilized in various environments without excessive labeling work.
As a result of comparing the three models in Table 4, SSD showed the best performance in all cases. The mF-score [0.50:0.95] of SSD was 0.893 for CB, 0.875 for GE, 0.806 for GE2, and 0.694 for SE. SSD also has advantages over the other models in terms of execution speed: while the other models are two-stage, SSD, as a one-stage model, requires less computational power. The performance of SSD in Table 4 is not only the highest among the three models but also higher than that of the advanced object detection models from previous research (39). Bodla et al. recorded 0.647 in bus (heavy vehicle) detection and 0.615 in car (vehicle) detection with the mAP [0.50:0.95] evaluation metric, which is far behind our 0.893 and 0.870. Besides, the mF-score [0.50:0.95] of this study is a more rigorous metric than mAP [0.50:0.95], as it considers recall and precision together. This high performance of SSD is due to the properly adjusted hyperparameters and the target-environment-specific training datasets. Figure 8 illustrates the detection results with SSD for Gyeongin Expressway 2 and Cheonho Bridge, which are the most crowded environment and a free-flow environment, respectively. The green boxes are vehicles, and the blue boxes indicate heavy vehicles.
How tolerant is SSD to the small-object detection issue?
A wider view of the road enables the analysis of longer road sections. To obtain traffic information from a wider area, the size (in pixels) of an individual vehicle must shrink accordingly, which is generally considered a difficult problem for common deep learning models. To confirm the robustness of SSD in detecting small objects, we reduced the number of horizontal and vertical pixels of the image to 20% of their original values, leaving only 1/25 of the pixels of the entire image. The small-object test was conducted on CB and GE2, where the average vehicle size was largest. The average object size in each environment was reduced from 159.3×65.6 pixels to 31.9×13.1 pixels and from 144.1×62.2 pixels to 28.8×12.4 pixels, respectively. In computer vision, an object smaller than 32×32 pixels is generally classified as a small object (33), which means our problem is harder than the usual small-object detection task in the computer vision area.
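As an illustration (not the authors' exact preprocessing code, which the paper does not give), the reduction to 1/25 of the pixels can be sketched as a block-averaging downsample by a factor of 5 per axis; the `factor` name and the averaging scheme are our assumptions, since the resampling method is not stated:

```python
import numpy as np

def downscale(img: np.ndarray, factor: int = 5) -> np.ndarray:
    """Reduce each spatial dimension to 1/factor by block averaging,
    keeping 1/factor**2 of the pixels (1/25 for factor=5)."""
    h, w = img.shape[:2]
    h, w = h - h % factor, w - w % factor      # crop to a multiple of factor
    img = img[:h, :w]
    # group pixels into factor x factor blocks and average each block
    return img.reshape(h // factor, factor, w // factor, factor, -1).mean(axis=(1, 3))

# e.g., a 2160x3840 frame becomes 432x768, so a 159x66-px vehicle
# shrinks to roughly 32x13 px, matching the sizes reported above
```

Any standard resampling (bilinear, area interpolation) would serve equally; block averaging is used here only because it is self-contained.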
Figure 9(a) shows the recognition performance of SSD at various resolutions, where 100% denotes the original image and 20% denotes the image reduced to 1/25 of its pixels. Recognition performance shows no consistent trend across resolutions. These results can be interpreted in several ways. First, the characteristics of the road image may have had an influence. Comparing Figure 9(b), the original image, with Figure 9(c), the image reduced to 1/25, we can still confidently draw bounding boxes for the vehicles in Figure 9(c). The visible features of the two road images are not very different, even after the image size is reduced.
The structural characteristics of SSD are another possible reason for its high performance in small-object detection. SSD passes the image through the neural network and retains the feature map obtained at every stage. The early feature maps can recognize small objects with simple properties (e.g., a vehicle in a UAV image), and the later feature maps can recognize large objects with complex properties (e.g., human posture) (40). Ren et al. exploited the same characteristic to improve the small-object detection performance of Faster R-CNN with ResNet-50. ResNet-50 is deeper than VGG16, but its skip connections also carry features past some layers, making it suitable for both small and large objects (41). For these reasons, vehicle detection using SSD with a ResNet-50 backbone performs well even though the vehicles appear as small objects.
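The multi-scale idea can be sketched numerically. The feature-map grid sizes and the linear scale rule below follow the original SSD300 configuration (27), not necessarily the exact settings used in this study, so the numbers are illustrative only:

```python
def ssd_scales(m: int, s_min: float = 0.2, s_max: float = 0.9) -> list:
    """Default-box scale for each of m feature maps, linearly spaced
    between s_min and s_max (fractions of the input image size)."""
    return [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]

# Typical SSD300 feature-map grid sizes: the fine early maps catch small
# objects (e.g., vehicles in UAV images); the coarse later maps catch large ones.
feature_maps = [38, 19, 10, 5, 3, 1]
for fm, s in zip(feature_maps, ssd_scales(len(feature_maps))):
    print(f"{fm:2d}x{fm:2d} grid -> default-box scale {s:.2f}")
```

The 38×38 map alone contributes 1,444 cell positions with small default boxes, which is why small vehicles remain detectable even after heavy downscaling.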
Does dataset expansion always enhance performance?
In conventional deep learning practice, it is common to acquire and train on as much data as possible. However, since road images collected from UAVs are distinctive and simple, behavior different from the general deep learning context can be expected. The results in Figure 10 indeed show a different tendency. We fixed the test set to the CB dataset. We first trained with the CB dataset only and then expanded the training set by successively adding the GE and GE2 datasets; finally, the SE dataset was added. Contrary to what general deep learning experience suggests, performance does not increase as the amount of data increases; sometimes it even decreases. The final CB+GE+GE2+SE training set contained more than 15 times as many samples as the initial CB-only training set, which means that simply enlarging the dataset is not the way to achieve high performance. We infer that video from different environments (non-target environments) introduces feature interference that hinders performance.
Therefore, for practical vehicle detection using UAVs, the most useful strategy is to secure an adequate number of training samples for each of various environments and to use a set only when its environment is similar to that of the target video (target environment). This result also suggests that a UAV can be a more effective data collection system than a fixed video camera, since a UAV can quickly obtain small amounts of video data at various locations without installing extensive fixed infrastructure.
CONCLUSIONS
In this study, the performance of Faster R-CNN, R-FCN, and SSD, three modern deep learning-based vehicle detection models, was compared and analyzed. The models were evaluated with the strict mF-score [0.50:0.95] rather than the conventional AP 0.5. Each model was adjusted, e.g., in anchor size and aspect ratio, to suit vehicle detection in UAV images. As a result, SSD showed significantly higher performance than Faster R-CNN and R-FCN, a conclusion common to all environments. This result is noteworthy considering that SSD is a one-stage detector and is therefore also faster than the other two. SSD rescales the image within the algorithm and creates rectangular anchor boxes at various scales. The shape of these rectangular anchor boxes resembles the target, a vehicle, and SSD generates anchor boxes in more varied sizes than the other algorithms; these properties presumably explain its higher performance. In particular, SSD showed little degradation when the training dataset was small: even with about 500 samples in the training set, detection was sufficiently robust. Moreover, unlike in a general deep learning framework, randomly mixing all images did not guarantee high accuracy; rather, overall performance decreased when other environments were added. Therefore, UAV pre-flight planning should focus on obtaining more than 500 samples from a single environment.
In tests with small objects, traditionally treated as a weakness of general detectors, SSD showed robust performance despite these concerns. There was no significant change in performance even when objects were smaller than 32×32 pixels, the usual threshold for small objects, and performance remained sound even after 96% of the pixels of the original image were removed. This can be explained by the characteristics of road traffic images, the use of ResNet-50, and the use of early-stage feature maps. The simple characteristics of UAV road traffic images could be interpreted by ResNet-50, which provides features from various network depths, together with SSD's multi-scale feature map structure. It is therefore safe to capture a wide stretch of road by flying the UAV higher, as the performance of SSD in identifying traffic patterns is not significantly affected by object size. However, at greater distances the image naturally becomes more sensitive to the UAV's own vibration and to air convection, so in practice the accuracy is expected to decrease.
This study is significant in that deep learning-based vehicle detection, which has thus far remained at the trial level, was conducted in various environments and verified to a reproducible level. A further contribution is the optimal strategy we suggest for general vehicle detection from UAV images with deep learning. Practical application of emerging technology is difficult because of local contexts such as the objective, data collection, and environmental conditions. To address these difficulties, this study focused on the various aspects of the practical application of deep learning-based vehicle detection: not just model selection, the training process, and performance measures, but also evaluation across environments and image resolutions. This study also distinguished heavy vehicles from other vehicles, which is significant in terms of traffic flow. SSD, which showed the highest performance in this study, can operate in real time, so wherever a UAV can hover, the traffic state of the site can be analyzed in real time. Using the results of this research, microscale vehicle-level traffic data for a specific point can be obtained by repeated video recording. We hope our research contributes to bridging the gap between practice and emerging front-line technology by providing reproducible methods. Future research could address more detailed contexts not covered here, e.g., night and evening lighting conditions, complex congested intersections, and vehicle tracking.
DATA ACCESSIBILITY STATEMENT
Some or all data, models, or code generated or used during the study are available from the corresponding author by request.
ACKNOWLEDGMENT
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2019R1H1A1080045).
AUTHOR CONTRIBUTIONS
The authors confirm contribution to the paper as follows: study conception and design: Ham, Park, Kim, Kim, and Kho; data collection: Ham and Kim; analysis and interpretation of results: Ham; draft manuscript preparation: Ham, Park, and Kim. All authors reviewed the results and approved the final version of the manuscript.
REFERENCES
1. Chung, K., J. Rudjanakanoknad, and M. J. Cassidy. Relation between Traffic Density and Capacity Drop at Three Freeway Bottlenecks. Transportation Research Part B: Methodological, 2007. 41: 82–95.
2. Li, L., X. Chen, and L. Zhang. Multimodel Ensemble for Freeway Traffic State Estimations. IEEE Transactions on Intelligent Transportation Systems, 2014. 15: 1323–1336.
3. Hamdar, S. H., L. Qin, and A. Talebpour. Weather and Road Geometry Impact on Longitudinal Driving Behavior: Exploratory Analysis Using an Empirically Supported Acceleration Modeling Framework. Transportation Research Part C: Emerging Technologies, 2016. 67: 193–213.
4. Park, H.-C., Y.-J. Joo, S.-Y. Kho, D.-K. Kim, and B.-J. Park. Injury Severity of Bus–Pedestrian Crashes in South Korea Considering the Effects of Regional and Company Factors. Sustainability, 2019. 11: 3169.
5. Park, H.-C., D.-K. Kim, S.-Y. Kho, and P. Y. Park. Cross-Classified Multilevel Models for Severity of Commercial Motor Vehicle Crashes Considering Heterogeneity among Companies and Regions. Accident Analysis and Prevention, 2017. 106: 305–314.
6. Coifman, B., M. McCord, R. G. Mishalani, M. Iswalt, and Y. Ji. Roadway Traffic Monitoring from an Unmanned Aerial Vehicle. IEE Proceedings - Intelligent Transport Systems, 2006. 153: 11–20.
7. Ke, R., Z. Li, S. Kim, J. Ash, Z. Cui, and Y. Wang. Real-Time Bidirectional Traffic Flow Parameter Estimation from Aerial Videos. IEEE Transactions on Intelligent Transportation Systems, 2016. 18: 890–901.
8. Liu, K., and G. Mattyus. Fast Multiclass Vehicle Detection on Aerial Images. IEEE Geoscience and Remote Sensing Letters, 2015. 12: 1938–1942.
9. Ozkurt, C., and F. Camci. Automatic Traffic Density Estimation and Vehicle Classification for Traffic Surveillance Systems Using Neural Networks. Mathematical and Computational Applications, 2009. 14: 187–196.
10. Khan, M. A., W. Ectors, T. Bellemans, D. Janssens, and G. Wets. UAV-Based Traffic Analysis: A Universal Guiding Framework Based on Literature Survey. Transportation Research Procedia, 2017. 22: 541–550.
11. Kim, E. J., H. C. Park, S. W. Ham, S. Y. Kho, and D. K. Kim. Extracting Vehicle Trajectories Using Unmanned Aerial Vehicles in Congested Traffic Conditions. Journal of Advanced Transportation, 2019. 2019: https://doi.org/10.1155/2019/9060797.
12. Xu, Y., G. Yu, X. Wu, Y. Wang, and Y. Ma. An Enhanced Viola-Jones Vehicle Detection Method from Unmanned Aerial Vehicles Imagery. IEEE Transactions on Intelligent Transportation Systems, 2017. 18: 1845–1856.
13. Barmpounakis, E. N., E. I. Vlahogianni, and J. C. Golias. Unmanned Aerial Aircraft Systems for Transportation Engineering: Current Practice and Future Challenges. International Journal of Transportation Science and Technology, 2017. 5: 111–122.
14. Lyu, S., M. C. Chang, D. Du, L. Wen, H. Qi, Y. Li, Y. Wei, L. Ke, T. Hu, M. Del Coco, P. Carcagni, D. Anisimov, E. Bochinski, F. Galasso, F. Bunyak, G. Han, H. Ye, H. Wang, K. Palaniappan, K. Ozcan, L. Wang, L. Wang, M. Lauer, N. Watcharapinchai, N. Song, N. M. Al-Shakarji, S. Wang, S. Amin, S. Rujikietgumjorn, T. Khanova, T. Sikora, T. Kutschbach, V. Eiselein, W. Tian, X. Xue, X. Yu, Y. Lu, Y. Zheng, Y. Huang, and Y. Zhang. UA-DETRAC 2017: Report of AVSS2017 & IWT4S Challenge on Advanced Traffic Monitoring. Presented at 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance, 2017. https://doi.org/10.1109/AVSS.2017.8078560.
15. Gaszczak, A., T. P. Breckon, and J. Han. Real-Time People and Vehicle Detection from UAV Imagery. Intelligent Robots and Computer Vision XXVIII: Algorithms and Techniques, 2011. 7878: https://doi.org/10.1117/12.876663.
16. Tuermer, S., F. Kurz, P. Reinartz, and U. Stilla. Airborne Vehicle Detection in Dense Urban Areas Using HoG Features and Disparity Maps. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2013. 6: 2327–2337.
17. Krizhevsky, A., I. Sutskever, and G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. Presented at Advances in Neural Information Processing Systems, 2012.
18. Wang, R., L. Zhang, K. Xiao, R. Sun, and L. Cui. EasiSee: Real-Time Vehicle Classification and Counting via Low-Cost Collaborative Sensing. IEEE Transactions on Intelligent Transportation Systems, 2014. 15: 414–424.
19. Xu, Y., G. Yu, Y. Wang, X. Wu, and Y. Ma. Car Detection from Low-Altitude UAV Imagery with the Faster R-CNN. Journal of Advanced Transportation, 2017. 2017: https://doi.org/10.1155/2017/2823617.
20. Felzenszwalb, P. F., R. B. Girshick, and D. McAllester. Cascade Object Detection with Deformable Part Models. Presented at 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
21. Girshick, R., J. Donahue, T. Darrell, and J. Malik. Region-Based Convolutional Networks for Accurate Object Detection and Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016. 38: 142–158.
22. Ren, S., K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017. 39: 1137–1149.
23. Al-Kaisy, A., J. Bhatt, and H. Rakha. Modeling the Effect of Heavy Vehicles on Sign Occlusion at Multilane Highways. Journal of Transportation Engineering, 2005. 131: 219–228.
24. Van Lint, J. W. C., S. P. Hoogendoorn, and M. Schreuder. Fastlane: New Multiclass First-Order Traffic Flow Model. Transportation Research Record: Journal of the Transportation Research Board, 2008. 2088: 177–187.
25. Tang, T., S. Zhou, Z. Deng, H. Zou, and L. Lei. Vehicle Detection in Aerial Images Based on Region Convolutional Neural Networks and Hard Negative Example Mining. Sensors, 2017. 17: https://doi.org/10.3390/s17020336.
26. Maria, G., E. Baccaglini, D. Brevi, M. Gavelli, and R. Scopigno. A Drone-Based Image Processing System for Car Detection in a Smart Transport Infrastructure. Presented at 18th Mediterranean Electrotechnical Conference: Intelligent and Efficient Technologies and Services for the Citizen, 2016.
27. Liu, W., D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg. SSD: Single Shot MultiBox Detector. Presented at European Conference on Computer Vision, 2016.
28. Dai, J., Y. Li, K. He, and J. Sun. R-FCN: Object Detection via Region-Based Fully Convolutional Networks. Presented at Advances in Neural Information Processing Systems, 2016.
29. Azevedo, C. L., J. L. Cardoso, M. Ben-Akiva, J. P. Costeira, and M. Marques. Automatic Vehicle Trajectory Extraction by Aerial Remote Sensing. Procedia - Social and Behavioral Sciences, 2014. 111: 849–858.
30. Elmikaty, M., and T. Stathaki. Detection of Cars in High-Resolution Aerial Images of Complex Urban Environments. IEEE Transactions on Geoscience and Remote Sensing, 2017. 55: 5913–5924.
31. Gleason, J., A. V. Nefian, X. Bouyssounousse, T. Fong, and G. Bebis. Vehicle Detection from Aerial Imagery. Presented at 2011 IEEE International Conference on Robotics and Automation, 2011.
32. Li, F., S. Li, C. Zhu, X. Lan, and H. Chang. Cost-Effective Class-Imbalance Aware CNN for Vehicle Localization and Categorization in High Resolution Aerial Images. Remote Sensing, 2017. 9: 1–29.
33. Lin, T. Y., M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common Objects in Context. Lecture Notes in Computer Science, 2014. 8693: 740–755.
34. Zhao, X., D. Dawson, W. A. Sarasua, and S. T. Birchfield. Automated Traffic Surveillance System with Aerial Camera Arrays Imagery: Macroscopic Data Collection with Vehicle Tracking. Journal of Computing in Civil Engineering, 2016. 31: https://doi.org/10.1061/(asce)cp.1943-5487.0000646.
35. Khan, M. A., W. Ectors, T. Bellemans, D. Janssens, and G. Wets. Unmanned Aerial Vehicle-Based Traffic Analysis: A Methodological Framework for Automated Multi-Vehicle Trajectory Extraction. Transportation Research Record: Journal of the Transportation Research Board, 2017. 32: 1–15.
36. Yu, S. L., T. Westfechtel, R. Hamada, K. Ohno, and S. Tadokoro. Vehicle Detection and Localization on Bird's Eye View Elevation Images Using Convolutional Neural Network. Presented at 2017 IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR), 2017.
37. Li, Z., S. Ahn, K. Chung, D. R. Ragland, W. Wang, and J. W. Yu. Surrogate Safety Measure for Evaluating Rear-End Collision Risk Related to Kinematic Waves near Freeway Recurrent Bottlenecks. Accident Analysis & Prevention, 2014. 64: 52–61.
38. Borji, A., M. M. Cheng, H. Jiang, and J. Li. Salient Object Detection: A Benchmark. IEEE Transactions on Image Processing, 2015. 24: 5706–5722.
39. Bodla, N., B. Singh, R. Chellappa, and L. S. Davis. Soft-NMS: Improving Object Detection with One Line of Code. Presented at 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
40. Bau, D., B. Zhou, A. Khosla, A. Oliva, and A. Torralba. Network Dissection: Quantifying Interpretability of Deep Visual Representations. Presented at 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017.
41. Ren, Y., C. Zhu, and S. Xiao. Small Object Detection in Optical Remote Sensing Images via Modified Faster R-CNN. Applied Sciences, 2018. 8: https://doi.org/10.3390/app8050813.
TABLE 1 Environment Description of Sample Images

| Environment | Number of Vehicles | Number of Heavy Vehicles | Sum | Avg. Vehicle Length (px) | Avg. Vehicle Width (px) | Avg. Heavy Vehicle Length (px) | Avg. Heavy Vehicle Width (px) | Heavy Vehicle Ratio (%) | Avg. Length (px) | Avg. Width (px) | Number of Images |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cheonho Bridge | 1042 | 95 | 1137 | 146.08 | 63.70 | 304.55 | 86.59 | 8.36 | 159.32 | 65.62 | 125 |
| Gyeongbu Expressway | 4333 | 393 | 4726 | 91.80 | 40.66 | 189.20 | 52.78 | 8.32 | 99.90 | 41.67 | 98 |
| Gyeongin Expressway 2 | 4447 | 309 | 4756 | 134.14 | 60.30 | 286.76 | 89.23 | 6.50 | 144.06 | 62.18 | 130 |
| Seohaean Expressway | 1576 | 118 | 1694 | 132.52 | 57.34 | 267.83 | 93.29 | 6.97 | 141.95 | 59.84 | 118 |
| Total or Average | 11398 | 915 | 12313 | 126.13 | 55.50 | 262.08 | 80.47 | 7.43 | 136.24 | 57.36 | 471 |
TABLE 2 Evaluation of Sample Detection

| Detection Number | IoU | Threshold 0.50 | Threshold 0.80 | Threshold 0.90 |
|---|---|---|---|---|
| 1 | 0.92 | True | True | True |
| 2 | 0.55 | True | False | False |
| 3 | 0.22 | False | False | False |
| 4 | 0.82 | True | True | False |
| Precision | | 0.75 | 0.50 | 0.25 |
| Recall | | 0.60 | 0.40 | 0.20 |
| F-score | | 0.67 | 0.44 | 0.22 |
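The table's arithmetic can be reproduced with a short sketch. The four IoU values are taken directly from the table; the count of five ground-truth objects is inferred from its recall values (recall = TP/5), which the table does not state explicitly:

```python
def f_score(ious, n_gt, thr):
    """Precision, recall, and F-score when a detection counts as a
    true positive iff its IoU with the matched truth is >= thr."""
    tp = sum(iou >= thr for iou in ious)
    precision, recall = tp / len(ious), tp / n_gt
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f

ious = [0.92, 0.55, 0.22, 0.82]          # sample detections from Table 2
for thr in (0.50, 0.80, 0.90):
    p, r, f = f_score(ious, n_gt=5, thr=thr)
    print(f"IoU>={thr:.2f}: precision={p:.2f}, recall={r:.2f}, F={f:.2f}")

# The mF-score [0.50:0.95] averages F over thresholds 0.50, 0.55, ..., 0.95
mf = sum(f_score(ious, 5, 0.50 + 0.05 * i)[2] for i in range(10)) / 10
```

Running the loop reproduces the three F-score columns of the table (0.67, 0.44, and 0.22).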
TABLE 3 mF-score [0.50:0.95] Comparison by Number of Detection Classes

| Model | CB Single-Class (Overall) | CB Multi-Class: Vehicle | CB Multi-Class: Heavy Vehicle | GE2 Single-Class (Overall) | GE2 Multi-Class: Vehicle | GE2 Multi-Class: Heavy Vehicle |
|---|---|---|---|---|---|---|
| Faster R-CNN | 0.775 | 0.782 | 0.774 | 0.702 | 0.696 | 0.689 |
| R-FCN | 0.803 | 0.802 | 0.780 | 0.616 | 0.624 | 0.603 |
| SSD | 0.861 | 0.893 | 0.870 | 0.800 | 0.806 | 0.759 |

Note: CB = Cheonho Bridge; GE2 = Gyeongin Expressway 2.
TABLE 4 Performance Comparison by Detection Algorithm (mF-score [0.50:0.95])

| Environment | Class | Faster R-CNN | R-FCN | SSD |
|---|---|---|---|---|
| Cheonho Bridge | Vehicle | 0.782 | 0.802 | 0.893 |
| Cheonho Bridge | Heavy Vehicle | 0.774 | 0.780 | 0.870 |
| Gyeongbu Expressway | Vehicle | 0.563 | 0.423 | 0.875 |
| Gyeongbu Expressway | Heavy Vehicle | 0.563 | 0.466 | 0.876 |
| Gyeongin Expressway 2 | Vehicle | 0.696 | 0.624 | 0.806 |
| Gyeongin Expressway 2 | Heavy Vehicle | 0.689 | 0.603 | 0.759 |
| Seohaean Expressway | Vehicle | 0.575 | 0.556 | 0.694 |
| Seohaean Expressway | Heavy Vehicle | 0.589 | 0.472 | 0.614 |
Figure 1 Structure of Faster R-CNN

Figure 2 (a) Structure of R-FCN; (b) structure of SSD

Figure 3 Overall framework of the research

Figure 4 Sample image data: (a) Cheonho Bridge (clear morning, no shadow, free flow, and road marks); (b) Gyeongbu Expressway (cloudy morning, faded shadow, moderate traffic, and large spatial scope); (c) Gyeongin Expressway 2 (clear evening, full shadow, heavy traffic, and slight curve); and (d) Seohaean Expressway (clear afternoon, partial shadow, moderate traffic, and shadow of road sign)

Figure 5 Sample detection image

Figure 6 Result of the same detection under different evaluation: (a) AP 0.5 (left); (b) mF-score [0.50:0.95] (right)

Figure 7 Performance comparison of detection algorithms

Figure 8 Result of vehicle detection with SSD under the heaviest traffic (Gyeongin Expressway 2) (top) and the free-flow condition (Cheonho Bridge) (bottom)

Figure 9 (a) SSD performance by variation of image resolution (above); (b) original UAV image (bottom left); (c) UAV image reduced to 1/25 (bottom right)

Figure 10 Change in performance as training datasets are added