Figure 2: 2D object detection problem in the Waymo Open Dataset

Source publication
Article
Object detection has been one of the most active topics in computer vision over the past few years. Recent works have mainly focused on pushing the state of the art on the general-purpose COCO benchmark. However, the use of such detection frameworks in specific applications such as autonomous driving remains an area to be addressed. This study presents a...

Contexts in source publication

Context 1
... different classes are considered for this problem: vehicles (which includes any wheeled motor object such as cars or motorbikes), pedestrians and cyclists. Figure 2a shows an example of the labeled data provided, which are tightly fitting bounding boxes around the objects. Furthermore, Waymo provides two different difficulty levels for the labels (Level 1 and 2), which are illustrated in Figure 2b. ...
Context 2
... 2a shows an example of the labeled data provided, which are tightly fitting bounding boxes around the objects. Furthermore, Waymo provides two different difficulty levels for the labels (Level 1 and 2), which are illustrated in Figure 2b. Level 2 instances are objects considered hard, and the criteria depend on both the human labelers and the object statistics. ...

Similar publications

Preprint
In the context of Shared Autonomous Vehicles, the need to monitor the environment inside the car will be crucial. This article focuses on the application of deep learning algorithms to detect objects, namely lost/forgotten items, to inform the passengers, and aggressive items, to monitor whether violent actions may arise between passengers. For object...
Preprint
Object detection for autonomous vehicles has received increasing attention in recent years, where labeled data are often expensive while unlabeled data can be collected readily, calling for research on semi-supervised learning for this area. Existing semi-supervised object detection (SSOD) methods usually assume that the labeled and unlabeled data...
Preprint
Object detection has been one of the most active topics in computer vision over the past few years. Recent works have mainly focused on pushing the state of the art on the general-purpose COCO benchmark. However, the use of such detection frameworks in specific applications such as autonomous driving remains an area to be addressed. This study presents a...
Article
Obstacle detection on the road is a challenging task in autonomous vehicle driving. Although obstacle detection is carried out with the help of sensors that are accurate and precise in real time, such sensors are not cost-effective and are computationally intensive. So, a computer vision and deep learning-based approach can be considered as a potential alternativ...
Article
In the systems of industrial robotics and autonomous vehicles, instance segmentation is widely employed. However, manually labelling an object outline is time-consuming. In order to reduce annotation costs, we present a weakly supervised instance segmentation method in this article. A deep convolutional network is first used to construct multi-sc...

Citations

... Furthermore, noise filtering during convolution operations can reduce image resolution, causing the loss of critical features essential for effectively learning from small targets [7]. Hence, addressing the challenge of small object detection, characterized by limited pixel proportions and complex feature extraction, is essential for advancing the capabilities of object detection algorithms in aerial imagery analysis [8,9]. ...
Article
Generalized target detection algorithms perform well for large- and medium-sized targets but struggle with small ones. However, with the growing importance of aerial images in urban transportation and environmental monitoring, detecting small targets in such imagery has become a promising research hotspot. The challenge in small object detection lies in the limited pixel proportion and the complexity of feature extraction. Moreover, current mainstream detection algorithms tend to be overly complex, leading to structural redundancy for small objects. To cope with these challenges, this paper proposes the PCSG model based on YOLOv5, which optimizes both the detection head and the backbone network. (1) An enhanced detection head is introduced, featuring a new structure that enhances the feature pyramid network and the path aggregation network. This enhancement bolsters the model's shallow feature reuse capability and introduces a dedicated detection layer for smaller objects. Additionally, redundant structures in the network are pruned, and the lightweight and versatile upsampling operator CARAFE is used to optimize the upsampling algorithm. (2) The paper proposes a module named SPD-Conv to replace the strided convolution operations and pooling structures in YOLOv5, thereby enhancing the backbone's feature extraction capability. Furthermore, Ghost convolution is utilized to optimize the parameter count, ensuring that the backbone meets the real-time needs of aerial image detection. The experimental results on the RSOD dataset show that the PCSG model exhibits superior detection performance: mAP increases from 97.1% to 97.8%, while the number of model parameters decreases by 22.3%, from 1,761,871 to 1,368,823. These findings highlight the effectiveness of this approach.
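For readers unfamiliar with the SPD-Conv idea this abstract relies on, the sketch below shows its usual formulation: downsample by rearranging each 2×2 pixel block into the channel dimension, then apply a non-strided convolution, so no pixels are discarded the way strided convolution or pooling discards them. This is a minimal PyTorch sketch under common assumptions; the class name and channel sizes are illustrative, not the PCSG authors' code.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth followed by a non-strided convolution.

    Downsamples by moving each 2x2 spatial block into the channel
    dimension, so no pixel information is thrown away the way a
    strided convolution or pooling layer would throw it away.
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # After space-to-depth with scale 2, channels grow 4x.
        self.conv = nn.Conv2d(4 * in_channels, out_channels,
                              kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # Split every 2x2 neighbourhood across the channel axis:
        # (B, C, H, W) -> (B, 4C, H/2, W/2)
        tl = x[..., ::2, ::2]    # top-left pixel of each block
        bl = x[..., 1::2, ::2]   # bottom-left pixel
        tr = x[..., ::2, 1::2]   # top-right pixel
        br = x[..., 1::2, 1::2]  # bottom-right pixel
        x = torch.cat([tl, bl, tr, br], dim=1)
        return self.conv(x)

# Example: halve resolution without a strided conv.
feat = torch.randn(1, 64, 80, 80)
print(SPDConv(64, 128)(feat).shape)  # torch.Size([1, 128, 40, 40])
```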
... ML is seen as a part of artificial intelligence, and one of its purposes is to extract knowledge from data. ML is also one of the most widely used techniques for knowledge discovery, and its use can be found in many areas of knowledge such as chemistry [11,12], computer vision [13][14][15] or data streaming [16][17][18]. Furthermore, we can find numerous studies that use ML to predict electricity consumption [19,20] or prices [21,22]. ...
Article
CO2 emissions play a crucial role in international politics. Countries enter into agreements to reduce the amount of pollution emitted into the atmosphere. Energy generation is one of the main contributors to pollution and is generally considered the main cause of climate change. Despite the interest in reducing emissions, few studies have focused on investigating energy pricing technologies. This article analyzes the technologies used to meet the demand for electricity from 2016 to 2021. The analysis is based on data provided by the Spanish Electricity System regulator, using statistical and clustering techniques. The objective is to establish the relationship between the level of pollution of electricity generation technologies and the hourly price and demand. Overall, the results suggest that there are two distinct periods with respect to the technologies used in the studied years, with a trend toward the use of cleaner technologies and a decrease in power generation using fossil fuels. It is also surprising that in the years 2016 to 2018, the most polluting technologies offered the cheapest prices.
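As a hedged illustration of the clustering step this abstract describes, the snippet below groups hours by their generation mix with k-means and compares mean prices per cluster. The data are synthetic and the schema is hypothetical, not the Spanish Electricity System regulator's actual format.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical data: per-hour shares of 4 generation technologies
# (rows sum to 1) and a synthetic hourly price in EUR/MWh.
hours = rng.dirichlet(np.ones(4), size=8760)
price = rng.normal(60, 15, size=8760)

# Cluster hours by generation mix, then inspect prices per cluster.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(hours)
for k in range(2):
    sel = kmeans.labels_ == k
    print(f"cluster {k}: {sel.sum()} hours, "
          f"mean price {price[sel].mean():.1f} EUR/MWh")
```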
... It is a sensor that examines the surroundings by sending out a laser beam, recording the reflection, and calculating the distance travelled by each pulse to determine the depth. These sensors are capable of recognising targets at a distance with accurate depth and have night vision capabilities (Carranza-García et al. 2021). In object detection for self-driving vehicles, the 3D LIDAR can obtain the orientation of detected objects, because the laser scans the spatial coordinates of objects. ...
Article
Autonomous vehicles require accurate and fast decision-making perception systems to know the driving environment. 2D object detection is critical in allowing the perception system to know the environment. However, 2D object detection lacks depth information, which is crucial for understanding the driving environment. Therefore, 3D object detection is essential for the perception system of autonomous vehicles to predict the location of objects and understand the driving environment. 3D object detection also faces challenges because of scale changes and occlusions. Therefore, in this study, a novel object detection method is presented that fuses the complementary information of 2D and 3D object detection to accurately detect objects in autonomous vehicles. Firstly, the 3D-LiDAR data are projected into image space. Secondly, a region proposal network (RPN) is utilised to produce regions of interest (ROI). An ROI pooling network is used to map the ROI into a ResNet50 feature extractor to get a feature map of fixed size. To accurately predict the dimensions of all the objects, we fuse the features of the 3D-LiDAR with the regional features obtained from camera images. The fused features from 3D-LiDAR and camera images are employed as input to the Faster Region-based Convolutional Neural Network (Faster R-CNN) for the detection of objects. The assessment results on the KITTI object detection dataset reveal that the method can accurately predict car, van, truck, pedestrian and cyclist with an average precision of 94.59%, 82.50%, 79.60%, 85.31% and 86.33%, respectively, which is better than most of the previous methods. Moreover, the average processing time of the proposed method is only 70 ms, which meets the real-time demand of autonomous vehicles. Additionally, the proposed model runs at 15.8 frames per second (FPS), which is faster than state-of-the-art fusion methods for 3D-LiDAR and camera data.
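The first step described here, projecting 3D LiDAR points into image space, typically reduces to a small matrix pipeline. The sketch below assumes KITTI-style calibration with the rectification matrix folded into the projection matrix for brevity; it is a generic reference, not the authors' implementation.

```python
import numpy as np

def project_lidar_to_image(points_xyz, P, Tr_velo_to_cam):
    """Project 3D LiDAR points into image pixel coordinates.

    points_xyz     : (N, 3) LiDAR points in the sensor frame.
    P              : (3, 4) camera projection matrix.
    Tr_velo_to_cam : (4, 4) rigid transform, LiDAR -> camera frame.
    Both matrices come from the dataset calibration files
    (KITTI-style layout assumed here).
    """
    # Homogeneous coordinates: (N, 4)
    pts_h = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])
    # Into the camera frame (z points forward by convention).
    cam = Tr_velo_to_cam @ pts_h.T          # (4, N)
    in_front = cam[2, :] > 0                # discard points behind camera
    uvw = P @ cam[:, in_front]              # (3, M) on the image plane
    uv = uvw[:2, :] / uvw[2, :]             # perspective divide
    # Caller indexes the original points with `in_front` to match `uv`.
    return uv.T, in_front
```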
... Deep learning-based object detection algorithms have been applied in many fields, such as autonomous driving [1], medical image detection [2], and industrial inspection [3]. However, these algorithms still face some crucial problems, such as the high computational workload and bandwidth requirement of the convolutional neural network (CNN) computation, which restricts the actual application. ...
Article
Object detection has been revolutionized by convolutional neural networks (CNNs), but their high computational complexity and heavy data access requirements make implementing these algorithms on edge devices challenging. To address this issue, we propose an efficient object detection accelerator for the YOLO series of algorithms. Our architecture exploits multiple dimensions of parallelism to accelerate the convolution computation. We employ line-buffer-based parallel data caches and dedicated data access units to minimize off-chip bandwidth pressure. Additionally, our proposed design accelerates not only the convolutional computation but also the control-intensive post-processing, achieving low detection latency. We evaluate the final design on a Xilinx V7-690t FPGA device, achieving a throughput of 525 GOP/s for a batch size of 1 and 914 GOP/s for a batch size of 2. Compared with state-of-the-art YOLOv2 and YOLOv3 implementations, our proposed accelerator offers up to 9× higher throughput and 5× shorter latency.
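The line-buffer caching mentioned above can be illustrated with a software reference model: only three image rows are held "on chip" at a time, so each pixel is fetched from external memory exactly once. The Python below is purely a behavioral sketch of that access pattern, not the RTL design.

```python
import numpy as np

def conv3x3_line_buffered(image, kernel):
    """Software reference model of a line-buffered 3x3 convolution.

    A hardware line buffer keeps only the last two image rows plus
    the incoming row on-chip; the rolling three-row window below
    mimics that memory access pattern.
    """
    H, W = image.shape
    out = np.zeros((H - 2, W - 2), dtype=image.dtype)
    rows = [image[0], image[1], None]        # the "line buffer"
    for r in range(2, H):
        rows[2] = image[r]                   # stream in one new row
        window = np.stack(rows)              # (3, W) of on-chip data
        for c in range(W - 2):
            out[r - 2, c] = np.sum(window[:, c:c + 3] * kernel)
        rows[0], rows[1] = rows[1], rows[2]  # shift the buffer
    return out

img = np.arange(36, dtype=np.float32).reshape(6, 6)
k = np.ones((3, 3), dtype=np.float32) / 9.0
print(conv3x3_line_buffered(img, k).shape)   # (4, 4)
```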
... The experiments were performed on the DAWN dataset and attained a mean average precision of 81%, which was more satisfactory than the other related techniques. Manuel Carranza-García et al. [22] introduced an ensemble framework for autonomous VDT in severe weather conditions. Anchor box optimization was performed in the presented system to enhance the object detection accuracy, and the precision of the second-stage network was enhanced using the candidate region's spatial information. ...
... However, collecting, processing, and analyzing training data for deep neural networks can be challenging and time-consuming, primarily when the training data must reflect all real-world scenarios [25]. Real-time speed is essential in autonomous driving, although few researchers have focused on it [22]. In urban settings, situational awareness is essential for autonomous driving. ...
... In the comparative analysis section, the performance of the proposed IYOLOV5-GMMPF is compared with previous models proposed for vehicle detection, namely YOLOV4 [21], the original Faster R-CNN [22], RetinaResNet50 [24], and GYOLOV3 [26]. The comparison is made in terms of mean average precision (mAP) and training time. ...
Preprint
Autonomous vehicles (AVs) rely on various sensory data to accurately understand their surroundings and guarantee a safe voyage. In AVs and intelligent transportation systems, vehicle detection and tracking (VDT) are crucial. A camera's performance is dangerously restricted by adverse or challenging weather conditions (CWC) like fog, rain, snow, sandstorms or dust, which all compromise driving safety by lowering visibility. These limitations affect how well the identification and tracking models used in traffic surveillance systems, as well as in applications for AVs, function. This paper proposes an autonomous VDT system using Improved You Only Look Once Version 5 (IYOLOV5) and a Particle Filter based on a Gaussian Mixture Model (GMMPF) for harsh weather conditions. The approach consists of four steps: image collection, image deweathering, vehicle detection, and vehicle tracking (VT). First, images of multiple roadside vehicles are collected from the datasets. Next, image deweathering is performed based on the Adaptive Automatic White Balance (AAWB) method, which improves the quality of the images and preserves edge details. Next, the IYOLOV5 algorithm is used to detect vehicles, and finally, the vehicles are tracked using the GMMPF concept. The suggested method is evaluated and contrasted with current methods on the DAWN and COCO datasets. The outcomes confirm the usefulness of the suggested solution, which outperforms cutting-edge vehicle recognition and tracking techniques in inclement weather.
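The deweathering stage uses the authors' AAWB method; as a rough, clearly-labeled stand-in, the sketch below shows the classic gray-world white balance on which adaptive variants typically build. It is not the AAWB algorithm itself.

```python
import numpy as np

def gray_world_white_balance(img):
    """Classic gray-world white balance (illustrative stand-in;
    the paper's AAWB method is more elaborate).

    Scales each colour channel so its mean matches the global
    mean, removing the colour cast that fog or dust introduces.
    img: (H, W, 3) float array with values in [0, 1].
    """
    channel_means = img.reshape(-1, 3).mean(axis=0)  # per-channel mean
    gain = channel_means.mean() / channel_means      # per-channel gain
    return np.clip(img * gain, 0.0, 1.0)
```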
... However, most of these studies have focused on modifying the deep learning models' underlying architecture to improve the detection performance (Cai et al., 2021; Dong et al., 2018; Huang et al., 2022; Liu et al., 2016). Only a few researchers have examined alternate methods based on two-stage models (Carranza-García et al., 2021). Based on a one-stage detection model, YOLOv5s (Jocher, 2020), this paper presents several factors that considerably improve the speed and accuracy of object detection in autonomous driving (AD) even under varying environmental conditions. ...
Conference Paper
Object detection is a critical factor in autonomous driving systems. Many inspiring methods for improving object detection have been proposed in various studies by altering the internal structure of deep learning models. This research reveals several aspects that improve the speed and accuracy of object detection in autonomous driving. Based on the YOLOv5 one-stage deep learning framework, two models with different settings are constructed to evaluate the effectiveness of the considered factors, using a minimum amount of resources. Compared with YOLOv5, the algorithms derived in this paper improve the average accuracy by 9.1% and inference speed by 6.7% on a comparatively selected validation block from the BDD100k dataset.
... Considering the difficulty of data collection and label annotation, it is inevitable that the gathered data shows an imbalanced distribution [5], [6]. In particular, as the difficulty of data compilation increases, the occurrence of imbalanced data situations becomes more frequent [7], [8]. Based on the previously mentioned properties, constructing a large-scale well-balanced dataset to accomplish successful learning requires significant time and financial cost [2], [9], [10]. ...
... Therefore, effective handling of imbalanced data in machine learning is crucial for reliable and robust recognition. Imbalanced recognition is actively studied in classification [4], [5], [14], regression tasks [13], [15], and other computer vision areas [7], [8]. Imbalanced classification aims to alleviate a classification bias problem [16] derived from an unequal distribution of classes in the training dataset. ...
Article
Regression with imbalanced data has been regarded as a more realistic scenario due to the difficulty of data acquisition and label annotation. However, it has not been extensively studied compared to imbalanced classification. In the imbalanced regression scenario, the classical regression approach may lead to regression bias toward high-frequency target regions. In this study, we present a novel framework for effectively handling imbalanced data in regression tasks. We introduce a density-based stochastic mask that perturbs the mini-batch distribution by assigning probabilities based on the data distribution statistics. The mask assigns a higher probability to more frequent samples under the Bernoulli distribution. Next, we employ consistency-based learning to encourage the encoder to produce similar representations for perturbed versions of the same input, drawing inspiration from modern consistency-based learning approaches. By jointly training with the two proposed learning objectives, we achieved state-of-the-art performance on AgeDB-DIR and IMDB-WIKI-DIR, which are representative imbalanced age estimation datasets. Furthermore, we evaluated the generalization performance using UTKFace. Through extensive experiments, we confirmed that our method is effective in dealing with imbalanced regression data. Future work involves extending the suggested approach to other uses, such as predicting the progression of diseases in medical diagnoses and estimating monocular depth in self-driving technology.
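A minimal sketch of the density-based stochastic mask described above, assuming a straightforward mapping from label-histogram density to Bernoulli masking probability (the paper's exact mapping may differ):

```python
import numpy as np

def density_based_mask(labels, bins=50, rng=None):
    """Draw a Bernoulli mask whose probability grows with the
    label density, so samples from frequent target regions are
    perturbed more often (illustrative sketch only).
    """
    rng = rng or np.random.default_rng()
    hist, edges = np.histogram(labels, bins=bins)
    density = hist / hist.max()                    # normalise to [0, 1]
    idx = np.clip(np.digitize(labels, edges[1:-1]), 0, bins - 1)
    p_mask = density[idx]                          # frequent -> high prob
    return rng.random(len(labels)) < p_mask        # Bernoulli draw

# Example on a skewed age distribution: the majority region
# (around 30) is masked far more often than the minority (around 70).
ages = np.concatenate([np.random.normal(30, 5, 900),
                       np.random.normal(70, 5, 100)])
mask = density_based_mask(ages)
print(mask.mean())
```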
... Still, the MFCN needs to handle challenging issues like occlusion, truncation, rare poses, motion blur, and defocus. Carranza-García et al. [12] developed a two-dimensional object detector based on Faster R-CNN for effective object recognition in the context of autonomous driving. Additionally, Nguyen [13] implemented a novel system based on Faster R-CNN for vehicle detection. ...
Article
In recent decades, vehicle recognition has played an essential role in intelligent transportation systems and traffic safety. Currently, deep learning approaches are making a significant impact on fast vehicle detection applications. In real-time traffic monitoring video sequences, it is difficult to recognize smaller vehicle targets and multi-scale vehicle targets in complex scenes. A new fully automated vehicle detection model is implemented in this manuscript to address this issue. After obtaining the videos from the KITTI dataset, masks are created for specific classes like car, pedestrian, and cyclist. Additionally, data augmentation is accomplished using techniques like zoom-out, zoom-in, shift, shear, flipping, and rotation. Data augmentation enhances the performance of deep learning models by creating new and varied examples for training; deep learning models perform accurately if the dataset is rich and sufficient. After data augmentation, an improved Faster Region-based Convolutional Neural Network (Faster R-CNN) model is developed for vehicle detection. The improved Faster R-CNN model first extracts discriminative feature values from the images utilizing U-Net and Visual Geometry Group (VGG) 19 pre-trained models. Then, it creates region proposals to improve the detection performance and narrow the search space. On the KITTI dataset, the improved Faster R-CNN model achieved 90.59% average precision and a processing time of 0.45 s, which are better than those of existing models.
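Of the augmentations listed (zoom, shift, shear, flipping, rotation), flipping is the one that most often breaks labels when bounding boxes are forgotten. The sketch below shows a box-consistent horizontal flip; the array layout and box format are assumptions, not the paper's pipeline.

```python
import numpy as np

def horizontal_flip(image, boxes):
    """Label-consistent horizontal flip for detection data.

    image : (H, W, 3) array.
    boxes : (N, 4) array of [x_min, y_min, x_max, y_max].
    Mirroring the image requires mirroring the box x-coordinates too.
    """
    W = image.shape[1]
    flipped = image[:, ::-1, :].copy()
    boxes = boxes.copy()
    # New x_min is the mirror of the old x_max, and vice versa.
    boxes[:, [0, 2]] = W - boxes[:, [2, 0]]
    return flipped, boxes
```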
... For example, under good visibility conditions (e.g., daylight), vehicles are detected based on feature descriptors like edge detectors and symmetry arguments with classifiers such as support vector machines on top (e.g., Sun et al., 2002, 2006; Teoh & Bräunl, 2011), or are detected by end-to-end trained deep NNs (e.g., Fan et al., 2016; Hassaballah et al., 2021; Carranza-García et al., 2021). Under this condition, detectors often assume that vehicles can be localized mainly by their contours, which is also supported by the fact that the most commonly used annotation method for objects is bounding boxes, which inherently require clearly visible object contours to be annotated reliably (see the survey of Liu et al., 2019). ...
Article
In recent years, computer vision algorithms have become more powerful, which has enabled technologies such as autonomous driving to evolve rapidly. However, current algorithms mainly share one limitation: they rely on directly visible objects. This is a significant drawback compared to human behavior, where visual cues caused by objects (e.g., shadows) are already used intuitively to retrieve information or anticipate occurring objects. While driving at night, this performance deficit becomes even more obvious: humans already process the light artifacts caused by the headlamps of oncoming vehicles to estimate where they will appear, whereas current object detection systems require that the oncoming vehicle be directly visible before it can be detected. Based on previous work on this subject, in this paper, we present a complete system that can detect light artifacts caused by the headlights of oncoming vehicles, so that it detects an approaching vehicle providently (denoted as provident vehicle detection). To this end, an entire algorithm architecture is investigated, including the detection in the image space, the three-dimensional localization, and the tracking of light artifacts. To demonstrate the usefulness of such an algorithm, the proposed algorithm is deployed in a test vehicle, where the detected light artifacts are used to control the glare-free high-beam system proactively (reacting before the oncoming vehicle is directly visible). Using this experimental setting, the provident vehicle detection system's time benefit compared to an in-production computer vision system is quantified. Additionally, the glare-free high-beam use case provides a real-time, real-world visualization interface for the detection results by using the adaptive headlamps as projectors. With this investigation of provident vehicle detection, we want to raise awareness of the unconventional sensing task of detecting objects providently (detection based on observable visual cues the objects cause before they are visible) and further close the performance gap between human behavior and computer vision algorithms to bring autonomous and automated driving a step forward.
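As a naive illustration of the first stage described here (detecting light artifacts in the image space), the OpenCV sketch below finds bright blobs in a night-time frame that could stem from oncoming headlights. The fixed thresholds are placeholders; the paper's detector is far more robust.

```python
import cv2

def detect_light_artifacts(frame_bgr, min_area=20, thresh=220):
    """Find bright blobs in a night frame that may be headlight
    artifacts (naive sketch, illustrative thresholds only).
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Keep only very bright pixels (headlight glow, reflections).
    _, bright = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(bright, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Return bounding boxes of blobs large enough to matter.
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]
```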