
Automatic Traffic Sign Board Detection from Camera Images Using Deep learning and Binarization Search Algorithm

Authors:
  • Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
  • Vel Tech Rangarajan Dr.Sagunthala R&D Institute of Science and Technology
Abstract- A street scene in a city can be split into several different
objects. This paper develops an autonomous recognition system that
detects and recognises traffic sign elements, with a variety of options
for setting parameters and constraints. The algorithms and methods used
by the system identify elements of traffic signs in camera-generated
images. Bitmap image algorithms and geometrical element techniques are
combined in the recognition process to increase recognition success and
to reduce processing time. The first step extracts the specified image
content from the camera frame using deep learning based edge detection.
The next stage is standardization, which is carried out by a
binarization image search that scans the image for continuous regions.
These regions are then evaluated periodically, and this evaluation is
the main decision criterion of the recognition system. The detected
regions are matched against predefined items so that they can be
identified accurately. Because camera images are scanned and processed
for vehicle control and safety applications, the system is structured
for real-time use. As a driving aid, the proposed technology reduces the
possibility of human error. The suggested approach improves overall
performance and segments even small objects significantly better.
Keywords- Autonomous Recognition System, Bit Mapping, Binarization,
Geometrical Recognition, Deep Learning based Edge Detection.
I. INTRODUCTION
Deep convolutional neural networks substantially speed up
semantic segmentation algorithms. Meta-learning techniques for
image classification still build in part on scalable, hand-designed
architectures. Scene parsing, person-part segmentation, and
semantic image segmentation were the main targets in developing
meta-learning approaches for dense image prediction [1]. Many
efforts have been made to design architectures automatically by
defining a search space and combining it with straightforward
learning techniques. Because visual information is represented at
multiple scales and high-quality images must be processed,
creating suitable search spaces in this area is difficult.
Autonomous driving is a challenging robotics task that calls for
sensing, planning, and execution under ever-changing conditions.
Since safety is of the utmost importance, this task also needs to
be carried out with absolute precision. Recursive search spaces
are built for dense image prediction to show the effectiveness of
random search. In semantic segmentation, each pixel is assigned
to a specific object class, which is why the task is also treated
as a classification problem. The images are first processed by
scene interpretation and semantic texturing modules, after which
a number of verifier models generate two confidence scores for
the present position and the background.
II. RELATED WORKS
One approach captures the colour image and then applies a
known-angle overlay, matched against the post-filter limit, to find
circular road signs from their perimeter or triangular traffic signs
from their three vertex points. Road sign recognition is accomplished
using Bayesian and AdaBoost classification training. The locations of
circular traffic signs were filtered using MSER after colour-space
conversion of the photographs. After the HOG features were extracted,
an SVM classifier was employed to identify the traffic sign [2].
In 2023, two traffic sign detection models, YOLOv5-TDHSA and
YOLOv5-DH, were proposed on the basis of the YOLOv5s model. They
replace the YOLOv5s coupled head with a decoupled head to increase
detection accuracy and speed up convergence [3]. In 2016, high-order
energy optimization was used to present a new segmentation technique
for stereo images [4]. To improve the high-order potential functions,
this method makes use of disparity maps and relevant statistics from
the stereo images.
Banu Priya Prathaban, Assistant Professor, Department of ECE,
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, Chennai, India, banupriyaprathaban@gmail.com
Ashwini A, Assistant Professor, Department of ECE,
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, Chennai, India, a.aswiniur@gmail.com
Purushothaman K. E, Assistant Professor, Department of ECE,
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, Chennai, India, Purushothaman1992@gmail.com
Jenath. M, Assistant Professor, Department of ECE,
Sri Sairam Engineering College, West Tambaram, Chennai, India, jenath.ece@sairam.edu.in
Prasanna R, Assistant Professor, Department of ECE,
Sri Sairam Institute of Technology, West Tambaram, Chennai, India, prasanna.ece@sairamit.edu.in
2023 International Conference on Recent Advances in Electrical, Electronics, Ubiquitous Communication, and Computational Intelligence (RAEEUCCI) | 979-8-3503-3742-6/23/$31.00 ©2023 IEEE | DOI: 10.1109/RAEEUCCI57140.2023.10134376
In 2015, a method restricted the candidate area by its aspect
ratio and represented the images in the RGB colour space [5]. The
normal feature of the region was then extracted, and an SVM
classifier was used to train on and categorise the traffic signs.
Also in 2015, a novel variational method was proposed for
segmenting images with intensity inhomogeneity while also
estimating the bias field [6]. It introduces a sliding window to
map the original image intensity onto a different domain, where
the intensity distribution of each object remains Gaussian and the
modelled intensities of inhomogeneous objects are Gaussian with
distinct means and variances. One study looked
into the effect of the Inception architecture [7]. The major goal
of Inception is to find and describe the dense components of the
local feature pattern in the convolutional network. The repetition
is assumed to extend spatially, and the convolutional blocks are
assumed to provide translation invariance. A method put forth in
2023 replaces the feature extraction module in YOLOv5 with a
compact module called C3Ghost. C3Ghost modules extract features
in a lightweight way, which significantly speeds up inference. A
multi-scale feature extraction stage, designed at the same time,
strengthens the focus on small objects [8]. All of these currently
used techniques use classification to find the threshold that
minimizes the intra-class variance, which is calculated as the
weighted average of the variances within the two classes. This
yields a single intensity threshold that divides pixels into
foreground and background classes. The drawback is that some
details, such as terminations and bifurcations, can be swapped
for one another.
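The threshold selection described above corresponds to the classical Otsu criterion. The following is a minimal sketch of that criterion, not the implementation used in the surveyed works; the function names `otsu_threshold` and `binarize` and the 256-bin histogram input are assumptions made for illustration.

```python
def otsu_threshold(histogram):
    """Return the threshold t that minimises the weighted within-class variance.

    histogram[i] is the number of pixels with intensity i. Pixels with
    intensity <= t form the background class, the rest the foreground class.
    """
    total = sum(histogram)
    best_t, best_score = 0, float("inf")
    for t in range(len(histogram) - 1):
        low = histogram[: t + 1]
        high = histogram[t + 1:]
        n_low, n_high = sum(low), sum(high)
        if n_low == 0 or n_high == 0:
            continue  # one class is empty, so this split is not meaningful
        mean_low = sum(i * c for i, c in enumerate(low)) / n_low
        mean_high = sum((t + 1 + i) * c for i, c in enumerate(high)) / n_high
        var_low = sum(c * (i - mean_low) ** 2 for i, c in enumerate(low)) / n_low
        var_high = sum(
            c * (t + 1 + i - mean_high) ** 2 for i, c in enumerate(high)
        ) / n_high
        # Weighted average of the two class variances (weights = class fractions).
        score = (n_low * var_low + n_high * var_high) / total
        if score < best_score:
            best_t, best_score = t, score
    return best_t


def binarize(pixels, threshold):
    """Map each intensity to 1 (foreground) if above the threshold, else 0."""
    return [[1 if p > threshold else 0 for p in row] for row in pixels]
```

Because a single global threshold is applied to every pixel, fine structures whose intensity lies close to the threshold can be assigned to the wrong class, which is consistent with the drawback noted above.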
Research Gap Identified
The key research gap is the lack of a precise definition of
segmentation. While some methods arrange pixels into super-pixels,
others group edges into closed shapes. It is resistant to noise and
occlusion. Another major issue with most segmentation algorithms is
a lack of repeatability in the presence of even modest changes in
visual content.
Only high-level feedback signals resulting from a
recognition process can be used to segment data using context
in this way. Segmentation is a by-product of recognition;
however, how it fits into the recognition pipeline is still
unknown. Image segmentation therefore still has no clear definition.
The pixel-level criterion, which attempts to take the spatial
positions of abnormalities into account, is the other
criterion [9]. It does not reward loose localization, and it does
not provide a reliable indicator of the number of false positive
regions a given algorithm is likely to produce in the real
world.
Contribution of the Proposed Research work
• The number of patches has little effect on either variation,
  and the adaptable realization is suitable for automotive and
  other applications. The baseline algorithms are most
  effective at identifying abnormal activity.
• The background model takes a few frames to absorb the
  immobile individual; therefore, the deep learning edge
  detection of the loitering anomaly is obvious.
• The background model takes a few frames to absorb the
  immobile individual; therefore, for these methods, the
  beginning of the loitering anomaly is obvious. The region-
  based results, where the suggested method outperforms the
  flow-based method, are explained by a similar effect.
• It finds a lower percentage of all anomalous locations and
  a higher percentage of all anomalous tracks with low false
  positive rates. This suggests that the track-based criterion
  might provide a more precise indication of how well an
  algorithm performs in actual situations. The signal
  indicators are part of the anomalous zone that was
  discovered nearby. There is also a false-positive region
  towards the bottom-left corner, which is caused by a
  person's shadow, but it is still counted as a correct detection
  because of the low threshold [10].
• The ground truth bounding box includes the signal signs.
  Even though the baseline method only examines a single-
  scale image, these accurate detections demonstrate the
  range of sizes that it can handle.
The rest of the paper is organised as follows: Section III
presents the proposed approach; Section IV presents the
research findings and evaluates the performance metrics
relevant to this research; and Section V presents the
conclusion of the study.
III. PROPOSED RESEARCH WORK
The algorithm of the proposed research work begins with image
acquisition, which is followed by the further stages. The flow is
shown in Figure 1.
[Figure: flow diagram with the blocks Host Image; Binarization Search Based Segmented Output; Ground Truth Representation; Bit Mapping Based Geometrical Element Algorithm]
Fig. 1. Proposed sign board detection
A. Host Image
Modern camera systems can be integrated into a
variety of devices, including autonomous vehicles, surveillance
systems, and mobile phones, to provide extremely high-quality
images at a low cost. The need for systems that can interpret and
understand these images grows as a result. However, the
method for identifying objects and determining their significance
in images remains the same in image recognition competitions.
Over time, particularly with the spread of the internet, an
enormous amount of data came to be created and stored in the
digital environment. Together with advances in GPU technology,
the parallel computing design of the graphics processor allows
computations to be carried out significantly faster. Thanks to
this increase in processing power, deeper neural networks are
now frequently used in practice. The image is used as input to
determine each pixel's class probability value.
B. Edge Detection based on Deep learning
The new DNN module uses holistically-nested edge
detection (HED), a trained neural contrast enhancement
technique. This learning-based end-to-end edge detection
system uses a trimmed VGG-like convolutional neural network
to perform image-to-image prediction. HED makes use of the
side outputs of the intermediate layers. To obtain the final
prediction, the information from all 5 convolutional layers,
known as side outputs, is fused with the data from prior
layers. Since the feature maps produced at each layer are of a
different size, the image is effectively being viewed at
several scales.
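The idea of fusing side outputs taken at different scales can be illustrated without a trained network. The sketch below is an assumption-laden stand-in: a hand-written gradient replaces the learned convolutional layers, and the fusion is a plain average, so it shows only the multi-scale fusion step and is not HED itself. All function names are hypothetical.

```python
def gradient_magnitude(img):
    """Simple edge strength: sum of absolute forward differences."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = img[y][min(x + 1, w - 1)] - img[y][x]
            dy = img[min(y + 1, h - 1)][x] - img[y][x]
            out[y][x] = abs(dx) + abs(dy)
    return out


def downsample(img):
    """Halve the resolution by averaging each 2x2 block."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
             + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def upsample(img, h, w):
    """Nearest-neighbour resize back to h x w."""
    sh, sw = len(img), len(img[0])
    return [
        [img[min(y * sh // h, sh - 1)][min(x * sw // w, sw - 1)] for x in range(w)]
        for y in range(h)
    ]


def fused_edges(img, scales=3):
    """Average the edge maps (stand-ins for side outputs) from several scales."""
    h, w = len(img), len(img[0])
    side_outputs = []
    current = img
    for _ in range(scales):
        side_outputs.append(upsample(gradient_magnitude(current), h, w))
        if len(current) < 2 or len(current[0]) < 2:
            break  # the image cannot be halved any further
        current = downsample(current)
    n = len(side_outputs)
    return [
        [sum(s[y][x] for s in side_outputs) / n for x in range(w)]
        for y in range(h)
    ]
```

Coarser scales respond over a wider neighbourhood of an edge, so the fused map is strongest where all scales agree.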
C. Ground truth representation
Ground truth is usually obtained through on-site
surface measurements and observations of the various
ground resolution cell characteristics being investigated on the
remotely sensed digital image. It also requires using GPS to
obtain the geographic coordinates of the ground resolution cell and
comparing them with the coordinates of the "pixel" under study
provided by the remote sensing software, in order to understand
and evaluate the location errors. Ground truth is essential for
the initial supervised classification of an image. The spectral
features of these regions train the remote sensing software,
which applies specific decision criteria to identify the remaining
areas of the image. At the ground truth locations, the remote
sensing analyst can produce an error matrix that confirms the
accuracy of the classification system. Different classification
techniques have different degrees of error for a given
classification task. The analyst must use a classification
strategy that minimizes error and is compatible with the number
of categories being used.
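The error matrix mentioned above can be sketched as follows. This is a minimal illustration that assumes class labels are integers starting at 0; the function names are hypothetical and are not taken from any remote sensing package.

```python
def error_matrix(truth, predicted, num_classes):
    """Rows index the ground-truth class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, predicted):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

Off-diagonal entries show which classes are confused with each other, which is the information needed to compare classification strategies.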
D. Bit Mapping based geometrical element Algorithm
A bitmap is a memory structure used to store digital
images. The word "bitmap", which originates from computer
programming, simply means a map of bits, that is, a spatially
mapped array of bits. The related term pixmap now commonly
refers to a spatially mapped array of pixels. Raster images,
whether produced digitally or by physical scanning, are commonly
referred to as pixmaps or bitmaps, both in memory and in files.
The number of bits per pixel used to describe an image's colours
is commonly used to express the image's colour depth. The bits
that represent the bitmap pixels may be packed or unpacked, that
is, aligned to byte or word boundaries, depending on the device
or on the format requirements.
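Packed storage can be sketched for the 1-bit-per-pixel case. The example assumes most-significant-bit-first ordering and zero padding to a byte boundary at the end of each row; both choices are illustrative assumptions, since real formats and devices differ.

```python
def pack_row(bits):
    """Pack a row of 0/1 pixels into bytes, most significant bit first.

    The last byte is zero-padded so that every row ends on a byte boundary.
    """
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


def unpack_row(packed, width):
    """Recover the first `width` pixels of a packed row."""
    return [(packed[i // 8] >> (7 - i % 8)) & 1 for i in range(width)]
```

A 10-pixel row therefore occupies two bytes when packed, compared with ten bytes if each pixel were stored unpacked in its own byte.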
E. Binarization Search Based Segmented Image
Binary search, also referred to as half-interval search,
logarithmic search, or binary chop, is a search method that
locates a target value within a sorted collection. The
binarization approach can fill in small gaps or holes in the
ridges and also filter away minor cuts. Binarization is the
process of separating the foreground text from its background.
The goal is to recover the object from damaged object images,
and the text belongs to the foreground. Binary search compares
the middle element of the array with the target value. If they
are not equal, the half in which the target cannot lie is
eliminated, and the search continues on the remaining half,
again comparing its middle element with the target value. This
procedure is repeated until the target value is found or the
remaining half is empty. Binary search can also be applied to a
broader range of problems, such as finding the next-smallest or
next-largest element of the array relative to the target, even
if the target is not present in the collection. Binary search
has many variations. In particular, fractional cascading speeds
up binary searches for the same value in multiple arrays.
• Set the sorted pixel block as the search space.
• Compare the target value with the middle element of the search
space. If the target is equal to the middle element, that element
is selected.
• Otherwise, discard the half that cannot contain the target and
repeat the comparison on the remaining half.
• Return without a match if the target is not in the array.
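The steps above can be sketched as a generic binary search over a sorted list of pixel values. How the paper builds the sorted pixel block is not specified, so the input here is simply assumed to be a sorted Python list; `neighbours` illustrates the next-smallest and next-largest lookup using the standard `bisect` module.

```python
import bisect


def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it is absent."""
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            low = mid + 1   # target can only lie in the upper half
        else:
            high = mid - 1  # target can only lie in the lower half
    return -1


def neighbours(sorted_values, target):
    """Return (next-smallest, next-largest) values around target.

    Works whether or not target is present; None marks a missing neighbour.
    """
    left = bisect.bisect_left(sorted_values, target)
    right = bisect.bisect_right(sorted_values, target)
    smaller = sorted_values[left - 1] if left > 0 else None
    larger = sorted_values[right] if right < len(sorted_values) else None
    return smaller, larger
```

Each comparison halves the search space, so a block of n sorted values needs on the order of log2(n) comparisons.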
IV. RESULT AND DISCUSSION
The host image in Figure 2(a) is first converted to a grayscale
image, at the sensor's resolution, for processing. The specific
object of interest is then isolated, and the distortion is
extracted from it.
Fig. 2. (a) Host Image (b) DL-Edge Detection
Fig. 3. (a) Ground-truth Representation (b) Bit Mapping based Geometrical Representation
The target axis is retrieved from the detected edges shown in
Figure 2(b). The edges consist of pixels with comparable
intensity levels. When the horizontal and vertical axes must be
extracted, these edges are recognized using the Sobel edge
detection technique. The ground-truth image is used as the
object mask. These steps help to identify distinct items when
object detection, here traffic sign board detection, is carried
out on a scene, as shown in Figures 3 and 4. The method works
efficiently to determine the letters on the traffic board, which
is of primary use in robotic systems.
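The Sobel step mentioned above can be sketched with the two standard 3x3 Sobel kernels. The sum of absolute responses is used here as a common approximation of the gradient magnitude; the threshold and the function names are illustrative assumptions, not values from the paper.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges


def apply_kernel(img, kernel):
    """Apply a 3x3 kernel to the interior pixels; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(
                kernel[j][i] * img[y + j - 1][x + i - 1]
                for j in range(3)
                for i in range(3)
            )
    return out


def sobel_edges(img, threshold):
    """Mark pixels whose approximate gradient magnitude reaches the threshold."""
    gx = apply_kernel(img, SOBEL_X)
    gy = apply_kernel(img, SOBEL_Y)
    h, w = len(img), len(img[0])
    return [
        [1 if abs(gx[y][x]) + abs(gy[y][x]) >= threshold else 0 for x in range(w)]
        for y in range(h)
    ]
```

Keeping the horizontal and vertical responses separate until the last step makes it possible to inspect either axis on its own.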
Fig. 3. Binarization Search Algorithm with sign board information
An alpha channel is used to build a separate bitmap that provides
a packed layer of image components. Textured images are computed
using the directional gradient in the texture feature maps, and
the texture edge flow is computed along with the performance flow
in Figure 5.
Fig. 4. Detected Text as Output Image
The algorithm produces a segmentation that separates the image
into its individual objects of interest. It tries to maximise
the between-class variance while minimising the within-class
variance. It assigns distinct label values according to the
active search region of the image.
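One common way to give each segmented region its own label value is connected-component labelling on the binarized image. The sketch below assumes 4-connectivity and a breadth-first traversal; the paper does not state which labelling scheme it uses, so this is an illustration only.

```python
from collections import deque


def label_regions(binary):
    """Give each 4-connected foreground region its own label (1, 2, ...).

    Returns the label image and the number of regions found.
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] == 0 or labels[sy][sx] != 0:
                continue  # background pixel or already labelled
            next_label += 1
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < h and 0 <= nx < w
                    if inside and binary[ny][nx] and labels[ny][nx] == 0:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label
```

Each label can then be treated as one candidate object, for example one character or one sign board region.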
Fig. 5. Performance Measure
Figure 5 shows the performance measures. The processing time of
the proposed system is considerably lower than that of the
existing system, and its PSNR value is higher. The PSNR block
computes the peak signal-to-noise ratio, in decibels, between the
input and the output images. This ratio is used to compare the
quality of the original and the compressed images. The higher the
PSNR, the better the quality of the compressed or reconstructed
image.
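The PSNR measure can be sketched from its usual definition, PSNR = 10 log10(MAX^2 / MSE), where MSE is the mean squared error between the two images. The example assumes 8-bit images (MAX = 255) given as equally sized lists of rows; the function name is hypothetical.

```python
import math


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_error = 0
    count = 0
    for row_r, row_t in zip(reference, test):
        for r, t in zip(row_r, row_t):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

Because the error appears in the denominator, a smaller difference between the two images gives a larger PSNR.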
TABLE I. COMPARATIVE STUDY OF EXISTING AND PROPOSED METHOD
S.No.    Method      PSNR (dB)    Time (sec)
1        Existing    23.45        36.91
2        Proposed    24.04        29.63
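The differences between the two rows of Table I can be worked out directly. The short calculation below uses only the four values in the table and assumes the units shown in the paper, decibels for PSNR and seconds for time.

```python
existing = {"psnr_db": 23.45, "time_s": 36.91}
proposed = {"psnr_db": 24.04, "time_s": 29.63}

psnr_gain_db = proposed["psnr_db"] - existing["psnr_db"]
time_saved_s = existing["time_s"] - proposed["time_s"]
time_saved_pct = 100 * time_saved_s / existing["time_s"]

print(f"PSNR gain: {psnr_gain_db:.2f} dB")                            # PSNR gain: 0.59 dB
print(f"Time saved: {time_saved_s:.2f} s ({time_saved_pct:.1f} %)")   # Time saved: 7.28 s (19.7 %)
```

In relative terms the reduction in time is much larger than the change in PSNR.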
The sign board is detected throughout the entire scene on the
basis of the object of interest. The existing and the proposed
methods are assessed using the PSNR and time values. Compared
with the existing method, the execution time is shorter. This
research is applicable to traffic sign detection for automated
vehicles that use robotic systems, and it also helps in traffic
signal detection applications.
V. CONCLUSION
In addition to suggesting a minimal weight for the traffic board
street scene, this work offered a metric to gauge the
effectiveness of the binarization search technique between each
pair of object types. According to the experimental findings, the
proposed method may significantly reduce the noise level in a
scene of small objects and enhance overall segmentation
performance. The proposed architecture can be coupled quickly and
cheaply with numerous state-of-the-art segmentation networks
during deployment. The small accuracy drop of the proposed method
can be addressed by employing other improvement techniques.
Future work can improve on this by utilising different label
values, and high-accuracy methods can be used in future studies
to overcome these drawbacks.
REFERENCES
[1] Zhang, Yongliang, Yang Lu, Wuqiang Zhu, Xing Wei, and Zhen Wei,
"Traffic sign detection based on multi-scale feature extraction and cascade
feature fusion," The Journal of Supercomputing, vol. 79, no. 2, pp. 2137-
2152, 2023.
[2] Liu. W, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C.
Berg, “SSD: Single shot multibox detector,” in European Conference on
Computer Vision, Springer, pp. 21-37, 2016.
[3] Bai, Wei, Jingyi Zhao, Chenxu Dai, Haiyang Zhang, Li Zhao, Zhanlin Ji,
and Ivan Ganchev, "Two Novel Models for Traffic Sign Detection Based
on YOLOv5s," Axioms, vol. 12, no. 2, p. 160, 2023.
[4] Pinggera. P, S. Ramos, S. Gehrig, U. Franke, C. Rother, and R. Mester, “Lost
and found: detecting small road hazards for self-driving vehicles,” in
Proceedings of the IEEE International Conference on Intelligent Robots
and Systems, IEEE, pp. 1099-1106, 2016.
[5] Sermanet. P and Y. LeCun, “Traffic sign recognition with multi-scale
convolutional networks,” International Joint Conference on Neural
Networks, IEEE, pp. 2809-2813, 2015.
[6] Zhang. S, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C.
Huang, and P. H. Torr, “Conditional random fields as recurrent neural
networks,” Proceedings of the IEEE International Conference on
Computer Vision, pp. 1529-1537, 2015.
[7] Li. H, Z. Lin, X. Shen, J. Brandt, and G. Hua, “A convolutional neural
network cascade for face detection,” in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 5325-5334,
2015.
[8] Zhang, Shuo, Shengbing Che, Zhen Liu, and Xu Zhang, "A real-time and
lightweight traffic sign detection method based on ghost-YOLO,"
Multimedia Tools and Applications, pp. 1-25, 2023.
[9] Huang, Kai, "Traditional methods and machine learning-based methods
for traffic sign detection," in Third International Conference on
Intelligent Computing and Human-Computer Interaction, SPIE, vol.
12509, pp. 539-544, 2023.
[10] Lu, Guanlin, Xiaohui He, Qiang Wang, Faming Shao, Jinkang Wang, and
Cong Hu. "A Traffic Sign Detection Network Based on PosNeg-Balanced
Anchors and Domain Adaptation." Arabian Journal for Science and
Engineering, vol. 48, no. 2, pp. 1333-1347, 2023.
[Fig. 5 chart: bars for the Proposed and Existing methods over the categories PSNR and Time (sec); vertical axis from 0 to 100]