Figure 2: 2D object detection problem in the Waymo Open Dataset

Source publication
Article
Object detection has been one of the most active topics in computer vision over the past few years. Recent works have mainly focused on pushing the state of the art on the general-purpose COCO benchmark. However, the use of such detection frameworks in specific applications such as autonomous driving remains an area to be addressed. This study presents a...

Contexts in source publication

Context 1
... different classes are considered for this problem: vehicles (which includes any wheeled motor object such as cars or motorbikes), pedestrians and cyclists. Figure 2a shows an example of the labeled data provided, which are tightly fitting bounding boxes around the objects. Furthermore, Waymo provides two different difficulty levels for the labels (Level 1 and 2), which are illustrated in Figure 2b. ...
Context 2
... 2a shows an example of the labeled data provided, which are tightly fitting bounding boxes around the objects. Furthermore, Waymo provides two different difficulty levels for the labels (Level 1 and 2), which are illustrated in Figure 2b. Level 2 instances are objects considered hard, and the criteria depend on both the human labelers and the object statistics. ...

Similar publications

Preprint
In the context of Shared Autonomous Vehicles, the need to monitor the environment inside the car will be crucial. This article focuses on the application of deep learning algorithms to detect objects, namely lost/forgotten items, to inform the passengers, and aggressive items, to monitor whether violent actions may arise between passengers. For object...
Preprint
Object detection for autonomous vehicles has received increasing attention in recent years, where labeled data are often expensive while unlabeled data can be collected readily, calling for research on semi-supervised learning for this area. Existing semi-supervised object detection (SSOD) methods usually assume that the labeled and unlabeled data...
Preprint
Object detection has been one of the most active topics in computer vision over the past few years. Recent works have mainly focused on pushing the state of the art on the general-purpose COCO benchmark. However, the use of such detection frameworks in specific applications such as autonomous driving remains an area to be addressed. This study presents a...
Article
Obstacle detection on the road is a challenging task in autonomous vehicle driving. Although obstacle detection is carried out with the help of sensors that are accurate and precise in real time, such sensors are not cost-effective and are computationally intensive. So, a computer vision and deep learning-based approach can be considered as a potential alternativ...
Article
In the systems of industrial robotics and autonomous vehicles, instance segmentation is widely employed. However, manually labelling an object outline is time-consuming. In order to reduce annotation costs, we present a weakly supervised instance segmentation method in this article. A deep convolutional network is first used to construct multi-sc...

Citations

... Furthermore, noise filtering during convolution operations can reduce image resolution, causing the loss of critical features essential for effectively learning from small targets [7]. Hence, addressing the challenge of small object detection, characterized by limited pixel proportions and complex feature extraction, is essential for advancing the capabilities of object detection algorithms in aerial imagery analysis [8,9]. ...
Article
Generalized target detection algorithms perform well for large- and medium-sized targets but struggle with small ones. However, with the growing importance of aerial images in urban transportation and environmental monitoring, detecting small targets in such imagery has become a promising research hotspot. The challenge in small object detection lies in the limited pixel proportion and the complexity of feature extraction. Moreover, current mainstream detection algorithms tend to be overly complex, leading to structural redundancy for small objects. To cope with these challenges, this paper proposes the PCSG model based on YOLOv5, which optimizes both the detection head and the backbone network. (1) An enhanced detection head is introduced, featuring a new structure that enhances the feature pyramid network and the path aggregation network. This enhancement bolsters the model's shallow feature reuse capability and introduces a dedicated detection layer for smaller objects. Additionally, redundant structures in the network are pruned, and the lightweight and versatile upsampling operator CARAFE is used to optimize the upsampling algorithm. (2) The paper proposes a module named SPD-Conv to replace the strided convolution operations and pooling structures in YOLOv5, thereby enhancing the backbone's feature extraction capability. Furthermore, Ghost convolution is utilized to optimize the parameter count, ensuring that the backbone meets the real-time needs of aerial image detection. The experimental results on the RSOD dataset show that the PCSG model exhibits superior detection performance: mAP increases from 97.1% to 97.8%, while the number of model parameters decreases by 22.3%, from 1,761,871 to 1,368,823. These findings highlight the effectiveness of this approach.
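For readers unfamiliar with the SPD-Conv idea this abstract relies on, the sketch below shows its usual formulation: downsample by rearranging each 2×2 pixel block into the channel dimension, then apply a non-strided convolution, so no pixels are discarded the way strided convolution or pooling discards them. This is a minimal PyTorch sketch under common assumptions; the class name and channel sizes are illustrative, not the PCSG authors' code.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth followed by a non-strided convolution.

    Downsamples by moving each 2x2 spatial block into the channel
    dimension, so no pixel information is thrown away the way a
    strided convolution or pooling layer would throw it away.
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # After space-to-depth with scale 2, channels grow 4x.
        self.conv = nn.Conv2d(4 * in_channels, out_channels,
                              kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # Split every 2x2 neighbourhood across the channel axis:
        # (B, C, H, W) -> (B, 4C, H/2, W/2)
        tl = x[..., ::2, ::2]    # top-left pixel of each block
        bl = x[..., 1::2, ::2]   # bottom-left pixel
        tr = x[..., ::2, 1::2]   # top-right pixel
        br = x[..., 1::2, 1::2]  # bottom-right pixel
        x = torch.cat([tl, bl, tr, br], dim=1)
        return self.conv(x)

# Example: halve resolution without a strided conv.
feat = torch.randn(1, 64, 80, 80)
print(SPDConv(64, 128)(feat).shape)  # torch.Size([1, 128, 40, 40])
```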
... ML is seen as a part of artificial intelligence, and one of its purposes is to extract knowledge from data. ML is also one of the most widely used techniques for knowledge discovery, and its use can be found in many areas of knowledge such as chemistry [11,12], computer vision [13][14][15] or data streaming [16][17][18]. Furthermore, we can find numerous studies that use ML to predict electricity consumption [19,20] or prices [21,22]. ...
Article
CO2 emissions play a crucial role in international politics. Countries enter into agreements to reduce the amount of pollution emitted into the atmosphere. Energy generation is one of the main contributors to pollution and is generally considered the main cause of climate change. Despite the interest in reducing emissions, few studies have focused on investigating energy pricing technologies. This article analyzes the technologies used to meet the demand for electricity from 2016 to 2021. The analysis is based on data provided by the Spanish Electricity System regulator, using statistical and clustering techniques. The objective is to establish the relationship between the level of pollution of electricity generation technologies and the hourly price and demand. Overall, the results suggest that there are two distinct periods with respect to the technologies used in the studied years, with a trend toward the use of cleaner technologies and a decrease in power generation using fossil fuels. It is also surprising that in the years 2016 to 2018, the most polluting technologies offered the cheapest prices.
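As a hedged illustration of the clustering step this abstract describes, the snippet below groups hours by their generation mix with k-means and compares mean prices per cluster. The data are synthetic and the schema is hypothetical, not the Spanish Electricity System regulator's actual format.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical data: per-hour shares of 4 generation technologies
# (rows sum to 1) and a synthetic hourly price in EUR/MWh.
hours = rng.dirichlet(np.ones(4), size=8760)
price = rng.normal(60, 15, size=8760)

# Cluster hours by generation mix, then inspect prices per cluster.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(hours)
for k in range(2):
    sel = kmeans.labels_ == k
    print(f"cluster {k}: {sel.sum()} hours, "
          f"mean price {price[sel].mean():.1f} EUR/MWh")
```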
... It is a sensor that examines the surroundings by sending out a laser beam, recording the reflection, and calculating the distance travelled by each pulse to determine the depth. These sensors are capable of recognising targets at a distance with accurate depth and have night vision capabilities (Carranza-García et al. 2021). In object detection for self-driving vehicles, the 3D LIDAR can obtain the orientation of detected objects, because the laser scans the spatial coordinates of objects. ...
Article
Autonomous vehicles require accurate and fast decision-making perception systems to know the driving environment. 2D object detection is critical in allowing the perception system to know the environment. However, 2D object detection lacks depth information, which is crucial for understanding the driving environment. Therefore, 3D object detection is essential for the perception system of autonomous vehicles to predict the location of objects and understand the driving environment. 3D object detection also faces challenges because of scale changes and occlusions. Therefore, in this study, a novel object detection method is presented that fuses the complementary information of 2D and 3D object detection to accurately detect objects in autonomous vehicles. Firstly, the 3D-LiDAR data are projected into image space. Secondly, a region proposal network (RPN) is utilised to produce regions of interest (ROI). An ROI pooling network is used to map the ROI into a ResNet50 feature extractor to get a feature map of fixed size. To accurately predict the dimensions of all the objects, we fuse the features of the 3D-LiDAR with the regional features obtained from camera images. The fused features from 3D-LiDAR and camera images are employed as input to the Faster Region-based Convolutional Neural Network (Faster R-CNN) for the detection of objects. The assessment results on the KITTI object detection dataset reveal that the method can accurately predict car, van, truck, pedestrian and cyclist with an average precision of 94.59%, 82.50%, 79.60%, 85.31% and 86.33%, respectively, which is better than most of the previous methods. Moreover, the average processing time of the proposed method is only 70 ms, which meets the real-time demand of autonomous vehicles. Additionally, the proposed model runs at 15.8 frames per second (FPS), which is faster than state-of-the-art fusion methods for 3D-LiDAR and camera data.
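The first step described here, projecting 3D LiDAR points into image space, typically reduces to a small matrix pipeline. The sketch below assumes KITTI-style calibration with the rectification matrix folded into the projection matrix for brevity; it is a generic reference, not the authors' implementation.

```python
import numpy as np

def project_lidar_to_image(points_xyz, P, Tr_velo_to_cam):
    """Project 3D LiDAR points into image pixel coordinates.

    points_xyz     : (N, 3) LiDAR points in the sensor frame.
    P              : (3, 4) camera projection matrix.
    Tr_velo_to_cam : (4, 4) rigid transform, LiDAR -> camera frame.
    Both matrices come from the dataset calibration files
    (KITTI-style layout assumed here).
    """
    # Homogeneous coordinates: (N, 4)
    pts_h = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])
    # Into the camera frame (z points forward by convention).
    cam = Tr_velo_to_cam @ pts_h.T          # (4, N)
    in_front = cam[2, :] > 0                # discard points behind camera
    uvw = P @ cam[:, in_front]              # (3, M) on the image plane
    uv = uvw[:2, :] / uvw[2, :]             # perspective divide
    # Caller indexes the original points with `in_front` to match `uv`.
    return uv.T, in_front
```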
... Deep learning-based object detection algorithms have been applied in many fields, such as autonomous driving [1], medical image detection [2], and industrial inspection [3]. However, these algorithms still face some crucial problems, such as the high computational workload and bandwidth requirement of the convolutional neural network (CNN) computation, which restricts the actual application. ...
Article
Object detection has been revolutionized by convolutional neural networks (CNNs), but their high computational complexity and heavy data access requirements make implementing these algorithms on edge devices challenging. To address this issue, we propose an efficient object detection accelerator for the YOLO series of algorithms. Our architecture exploits multiple dimensions of parallelism to accelerate the convolution computation. We employ line-buffer-based parallel data caches and dedicated data access units to minimize off-chip bandwidth pressure. Additionally, our proposed design accelerates not only the convolutional computation but also the control-intensive post-processing, achieving low detection latency. We evaluate the final design on a Xilinx V7-690t FPGA device, achieving a throughput of 525 GOP/s for a batch size of 1 and 914 GOP/s for a batch size of 2. Compared with state-of-the-art YOLOv2 and YOLOv3 implementations, our proposed accelerator offers up to 9× higher throughput and 5× shorter latency.
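The line-buffer caching mentioned above can be illustrated with a software reference model: only three image rows are held "on chip" at a time, so each pixel is fetched from external memory exactly once. The Python below is purely a behavioral sketch of that access pattern, not the RTL design.

```python
import numpy as np

def conv3x3_line_buffered(image, kernel):
    """Software reference model of a line-buffered 3x3 convolution.

    A hardware line buffer keeps only the last two image rows plus
    the incoming row on-chip; the rolling three-row window below
    mimics that memory access pattern.
    """
    H, W = image.shape
    out = np.zeros((H - 2, W - 2), dtype=image.dtype)
    rows = [image[0], image[1], None]        # the "line buffer"
    for r in range(2, H):
        rows[2] = image[r]                   # stream in one new row
        window = np.stack(rows)              # (3, W) of on-chip data
        for c in range(W - 2):
            out[r - 2, c] = np.sum(window[:, c:c + 3] * kernel)
        rows[0], rows[1] = rows[1], rows[2]  # shift the buffer
    return out

img = np.arange(36, dtype=np.float32).reshape(6, 6)
k = np.ones((3, 3), dtype=np.float32) / 9.0
print(conv3x3_line_buffered(img, k).shape)   # (4, 4)
```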
... The experiments were performed on the DAWN dataset and attained a mean average precision of 81%, which was more satisfactory than the other related techniques. Manuel Carranza-García et al. [22] introduced an ensemble framework for autonomous VDT in severe weather conditions. Anchor box optimization was performed in the presented system to enhance the object detection accuracy, and the precision of the second-stage network was enhanced using the candidate region's spatial information. ...
... However, collecting, processing, and analyzing training data for deep neural networks can be challenging and time-consuming, primarily when the training data must reflect all real-world scenarios [25]. Real-time speed is essential in autonomous driving, although few researchers have focused on it [22]. In urban settings, situational awareness is essential for autonomous driving. ...
... In the comparative analysis section, the performance of the proposed IYOLOV5-GMMPF is compared with previous models proposed for vehicle detection, namely YOLOV4 [21], the original Faster R-CNN [22], RetinaResNet50 [24], and GYOLOV3 [26]. The comparison is made in terms of mean average precision (mAP) and training time. ...
Preprint
Autonomous vehicles (AVs) rely on various sensory data to accurately understand their surroundings and guarantee a safe voyage. In AVs and intelligent transportation systems, vehicle detection and tracking (VDT) are crucial. A camera's performance is dangerously restricted by adverse or challenging weather conditions (CWC) like fog, rain, snow, sandstorms or dust, which all compromise driving safety by lowering visibility. These limitations affect how well the identification and tracking models used in traffic surveillance systems, as well as in applications for AVs, function. This paper proposes an autonomous VDT system using Improved You Only Look Once Version 5 (IYOLOV5) and a Particle Filter based on a Gaussian Mixture Model (GMMPF) for harsh weather conditions. The approach consists of four steps: image collection, image deweathering, vehicle detection, and vehicle tracking (VT). First, images of multiple roadside vehicles are collected from the datasets. Next, image deweathering is performed based on the Adaptive Automatic White Balance (AAWB) method, which improves the quality of the images and preserves edge details. Next, the IYOLOV5 algorithm is used to detect vehicles, and finally, the vehicles are tracked using the GMMPF concept. The suggested method is evaluated and contrasted with current methods on the DAWN and COCO datasets. The outcomes confirm the usefulness of the suggested solution, which outperforms cutting-edge vehicle recognition and tracking techniques in inclement weather.
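The deweathering stage uses the authors' AAWB method; as a rough, clearly-labeled stand-in, the sketch below shows the classic gray-world white balance on which adaptive variants typically build. It is not the AAWB algorithm itself.

```python
import numpy as np

def gray_world_white_balance(img):
    """Classic gray-world white balance (illustrative stand-in;
    the paper's AAWB method is more elaborate).

    Scales each colour channel so its mean matches the global
    mean, removing the colour cast that fog or dust introduces.
    img: (H, W, 3) float array with values in [0, 1].
    """
    channel_means = img.reshape(-1, 3).mean(axis=0)  # per-channel mean
    gain = channel_means.mean() / channel_means      # per-channel gain
    return np.clip(img * gain, 0.0, 1.0)
```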
... However, most of these studies have focused on modifying the deep learning models' underlying architecture to improve the detection performance (Cai et al., 2021; Dong et al., 2018; Huang et al., 2022; Liu et al., 2016). Only a few researchers have examined alternate methods based on two-stage models (Carranza-García et al., 2021). Based on a one-stage detection model, YOLOv5s (Jocher, 2020), this paper presents several factors that considerably improve the speed and accuracy of object detection in autonomous driving (AD) even under varying environmental conditions. ...
Conference Paper
Object detection is a critical factor in autonomous driving systems. Many inspiring methods for improving object detection have been proposed in various studies by altering the internal structure of deep learning models. This research reveals several aspects that improve the speed and accuracy of object detection in autonomous driving. Based on the YOLOv5 one-stage deep learning framework, two models with different settings are constructed to evaluate the effectiveness of the considered factors, using a minimum amount of resources. Compared with YOLOv5, the algorithms derived in this paper improve the average accuracy by 9.1% and inference speed by 6.7% on a comparatively selected validation block from the BDD100k dataset.
... Considering the difficulty of data collection and label annotation, it is inevitable that the gathered data shows an imbalanced distribution [5], [6]. In particular, as the difficulty of data compilation increases, the occurrence of imbalanced data situations becomes more frequent [7], [8]. Based on the previously mentioned properties, constructing a large-scale well-balanced dataset to accomplish successful learning requires significant time and financial cost [2], [9], [10]. ...
... Therefore, effective handling of imbalanced data in machine learning is crucial for reliable and robust recognition. Imbalanced recognition is actively studied in classification [4], [5], [14], regression tasks [13], [15], and other computer vision areas [7], [8]. Imbalanced classification aims to alleviate a classification bias problem [16] derived from an unequal distribution of classes in the training dataset. ...
Article
Regression with imbalanced data has been regarded as a more realistic scenario due to the difficulty of data acquisition and label annotation. However, it has not been extensively studied compared to imbalanced classification. In the imbalanced regression scenario, the classical regression approach may lead to regression bias toward high-frequency target regions. In this study, we present a novel framework for effectively handling imbalanced data in regression tasks. We introduce a density-based stochastic mask that perturbs the mini-batch distribution by assigning probabilities based on the data distribution statistics. The mask assigns a higher probability to more frequent samples under the Bernoulli distribution. Next, we employ consistency-based learning to encourage the encoder to produce similar representations for perturbed versions of the same input, drawing inspiration from modern consistency-based learning approaches. By jointly training with the two proposed learning objectives, we achieved state-of-the-art performance on AgeDB-DIR and IMDB-WIKI-DIR, which are representative imbalanced age estimation datasets. Furthermore, we evaluated the generalization performance using UTKFace. Through extensive experiments, we confirmed that our method is effective in dealing with imbalanced regression data. Future work involves extending the suggested approach to other uses, such as predicting the progression of diseases in medical diagnoses and estimating monocular depth in self-driving technology.
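A minimal sketch of the density-based stochastic mask described above, assuming a straightforward mapping from label-histogram density to Bernoulli masking probability (the paper's exact mapping may differ):

```python
import numpy as np

def density_based_mask(labels, bins=50, rng=None):
    """Draw a Bernoulli mask whose probability grows with the
    label density, so samples from frequent target regions are
    perturbed more often (illustrative sketch only).
    """
    rng = rng or np.random.default_rng()
    hist, edges = np.histogram(labels, bins=bins)
    density = hist / hist.max()                    # normalise to [0, 1]
    idx = np.clip(np.digitize(labels, edges[1:-1]), 0, bins - 1)
    p_mask = density[idx]                          # frequent -> high prob
    return rng.random(len(labels)) < p_mask        # Bernoulli draw

# Example on a skewed age distribution: the majority region
# (around 30) is masked far more often than the minority (around 70).
ages = np.concatenate([np.random.normal(30, 5, 900),
                       np.random.normal(70, 5, 100)])
mask = density_based_mask(ages)
print(mask.mean())
```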
... Still, the MFCN needs to handle challenging issues like occlusion, truncation, rare poses, motion blur, and defocus. Carranza-García et al. [12] developed a two-dimensional object detector based on Faster R-CNN for effective object recognition in the context of autonomous driving. Additionally, Nguyen [13] implemented a novel system based on Faster R-CNN for vehicle detection. ...
Article
In recent decades, vehicle recognition has played an essential role in intelligent transportation systems and traffic safety. Currently, deep learning approaches are making a significant impact on fast vehicle detection applications. In real-time traffic monitoring video sequences, it is difficult to recognize smaller vehicle targets and multi-scale vehicle targets in complex scenes. A new fully automated vehicle detection model is implemented in this manuscript to address this issue. After obtaining the videos from the KITTI dataset, masks are created for specific classes like car, pedestrian, and cyclist. Additionally, data augmentation is accomplished using techniques like zoom-out, zoom-in, shift, shear, flipping, and rotation. Data augmentation enhances the performance of deep learning models by creating new and varied examples for training; deep learning models perform accurately if the dataset is rich and sufficient. After data augmentation, an improved Faster Region-based Convolutional Neural Network (Faster R-CNN) model is developed for vehicle detection. The improved Faster R-CNN model first extracts discriminative feature values from the images utilizing U-Net and Visual Geometry Group (VGG) 19 pre-trained models. Then, it creates region proposals to improve the detection performance and narrow the search space. On the KITTI dataset, the improved Faster R-CNN model achieved 90.59% average precision and a processing time of 0.45 s, which are better than those of existing models.
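Of the augmentations listed (zoom, shift, shear, flipping, rotation), flipping is the one that most often breaks labels when bounding boxes are forgotten. The sketch below shows a box-consistent horizontal flip; the array layout and box format are assumptions, not the paper's pipeline.

```python
import numpy as np

def horizontal_flip(image, boxes):
    """Label-consistent horizontal flip for detection data.

    image : (H, W, 3) array.
    boxes : (N, 4) array of [x_min, y_min, x_max, y_max].
    Mirroring the image requires mirroring the box x-coordinates too.
    """
    W = image.shape[1]
    flipped = image[:, ::-1, :].copy()
    boxes = boxes.copy()
    # New x_min is the mirror of the old x_max, and vice versa.
    boxes[:, [0, 2]] = W - boxes[:, [2, 0]]
    return flipped, boxes
```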
... For example, under good visibility conditions (e.g., daylight), vehicles are detected based on feature descriptors like edge detectors and symmetry arguments with classifiers such as support vector machines on top (e.g., Sun et al., 2002, 2006; Teoh & Bräunl, 2011), or are detected by end-to-end trained deep NNs (e.g., Fan et al., 2016; Hassaballah et al., 2021; Carranza-García et al., 2021). Under this condition, detectors often assume that vehicles can be localized mainly by their contours, which is also supported by the fact that the most commonly used annotation method for objects is bounding boxes, which inherently require clearly visible object contours to be annotated reliably (see the survey of Liu et al., 2019). ...
Article
In recent years, computer vision algorithms have become more powerful, which has enabled technologies such as autonomous driving to evolve rapidly. However, current algorithms mainly share one limitation: they rely on directly visible objects. This is a significant drawback compared to human behavior, where visual cues caused by objects (e.g., shadows) are already used intuitively to retrieve information or anticipate occurring objects. While driving at night, this performance deficit becomes even more obvious: humans already process the light artifacts caused by the headlamps of oncoming vehicles to estimate where they will appear, whereas current object detection systems require that the oncoming vehicle be directly visible before it can be detected. Based on previous work on this subject, in this paper, we present a complete system that can detect light artifacts caused by the headlights of oncoming vehicles, so that it detects an approaching vehicle providently (denoted as provident vehicle detection). To this end, an entire algorithm architecture is investigated, including the detection in the image space, the three-dimensional localization, and the tracking of light artifacts. To demonstrate the usefulness of such an algorithm, the proposed algorithm is deployed in a test vehicle, where the detected light artifacts are used to control the glare-free high-beam system proactively (reacting before the oncoming vehicle is directly visible). Using this experimental setting, the provident vehicle detection system's time benefit compared to an in-production computer vision system is quantified. Additionally, the glare-free high-beam use case provides a real-time, real-world visualization interface for the detection results by using the adaptive headlamps as projectors. With this investigation of provident vehicle detection, we want to raise awareness of the unconventional sensing task of detecting objects providently (detection based on observable visual cues the objects cause before they are visible) and further close the performance gap between human behavior and computer vision algorithms to bring autonomous and automated driving a step forward.
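As a naive illustration of the first stage described here (detecting light artifacts in the image space), the OpenCV sketch below finds bright blobs in a night-time frame that could stem from oncoming headlights. The fixed thresholds are placeholders; the paper's detector is far more robust.

```python
import cv2

def detect_light_artifacts(frame_bgr, min_area=20, thresh=220):
    """Find bright blobs in a night frame that may be headlight
    artifacts (naive sketch, illustrative thresholds only).
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Keep only very bright pixels (headlight glow, reflections).
    _, bright = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(bright, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Return bounding boxes of blobs large enough to matter.
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]
```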