Figure 5 - uploaded by Fabio Cuzzolin
Visual demonstration of the improved detection results of YOLO-Z S (bottom) compared with YOLOv5 S (top) over a region of a sample image containing distant, small-scale objects. Yellow and blue cone detections are shown as bounding boxes in the respective colours; detections missed by both models are shown as red boxes, and detections missed by YOLOv5 S but correctly identified by YOLO-Z S are shown as red circles. The improvement is most evident for more distant, smaller cones.

Source publication
Conference Paper
As autonomous vehicles and autonomous racing rise in popularity, so does the need for faster and more accurate detectors. While our naked eyes are able to extract contextual information almost instantly, even from far away, limitations in image resolution and computational resources make detecting smaller objects (that is, objects that occupy a small...

Contexts in source publication

Context 1
... images that compose the dataset are a mixture of images taken from racing vehicles, generic pictures of cones at different ranges and arrangements, images from simulations and images digitally augmented for different weather and lighting conditions, including purposely challenging conditions. There are a total of 4 classes (yellow, blue, orange and big orange cones) and close to 4,000 images (see Figure 5). ...
Context 2
... context in which we have applied the proposed techniques, that of autonomous racing, is one that can greatly benefit from such an improvement. As we can see in Figure 5, such changes do have a quantifiable impact on detection. In this work we have not only significantly improved the performance of the baseline model, but also identified a number of specific techniques that can be applied to any other application involving the detection of small or far away objects. ...

Similar publications

Article
Autonomous vehicles (AVs) have the potential to improve safety, traffic capacity, and energy efficiency, but these advantages can only be realized when the AV market penetration is sufficiently high. To promote the adoption of AVs, it would be crucial for the government to take policy measures. This paper develops a two-stage model to explore the e...
Preprint
Trajectory prediction plays a crucial role in the autonomous driving stack by enabling autonomous vehicles to anticipate the motion of surrounding agents. Goal-based prediction models have gained traction in recent years for addressing the multimodal nature of future trajectories. Goal-based prediction models simplify multimodal prediction by first...
Article
The societal integration of autonomous vehicles (AVs) relies on public acceptance, closely related to individual emotions and perceptions. This study explores the emotional factors affecting AV acceptance in Spain through lexical tasks, virtual AV simulations, and questionnaires, surpassing traditional attitude surveys by examining subtle emotional...
Article
Recent advancements in artificial intelligence (AI) have greatly improved the object detection capabilities of autonomous vehicles, especially using convolutional neural networks (CNNs). However, achieving high levels of accuracy and speed simultaneously in vehicular environments remains a challenge. Therefore, this paper proposes a hybrid approach...
Article
In recent years, there has been an increased interest in giving verbal commands to self-driving cars. Even though multiple companies have showcased progress towards fully autonomous vehicles, surveys have indicated that people are wary of relinquishing total control of the vehicle to the AI. Thus, a system allowing passengers to control the vehicle...

Citations

... YOLO, known for its superior performance in speed and detecting smaller targets, has been chosen in this study for conducting detection and recognition experiments on damaged steel cable surfaces. The YOLO object detection algorithm is a widely used single-stage detection algorithm with various versions such as YOLOv3 [1], YOLOv5 [2], YOLOv7 [3]. YOLOv5, considered the most outstanding due to its performance, builds on YOLOv4 [4] with some improvements. ...
Article
To address issues such as limited detection-device resources and long detection times in surface damage detection of steel cables installed in commercial, public, and industrial buildings, this study investigates advanced deep learning techniques and convolutional neural networks (CNNs) and designs a new network model. This work proposes a YOLO-based steel cable defect detection network that incorporates GhostNet into the backbone and introduces a novel feature extraction module (ShuffleNC3) based on ShuffleNet and attention mechanisms. The Head is then improved by pruning. Experimental results indicate that the improved network achieves an increase of approximately 1.149% in average precision over the baseline YOLOv5s. The modification reduces the network's computational cost while maintaining high recognition accuracy, better meeting the requirements of surface damage detection in steel cables: the parameters and computational cost are reduced by approximately 43% and 31.4%, respectively, and the model size decreases by 42%.
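The abstract attributes part of the parameter reduction to incorporating GhostNet into the backbone. As a rough illustration of why, the sketch below counts weights for a standard convolution and for a Ghost module as described in the GhostNet paper. It is not the cited authors' code; the layer sizes, the ratio s = 2 and the cheap-kernel size d = 3 are hypothetical, and bias and BatchNorm parameters are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (bias and BatchNorm ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    """Weights of a Ghost module that outputs c_out channels.

    A primary k x k convolution produces m = c_out // s intrinsic maps; a cheap
    depthwise d x d operation then derives the remaining (s - 1) * m ghost maps,
    each from a single intrinsic map.
    """
    if c_out % s != 0:
        raise ValueError("c_out must be divisible by s")
    m = c_out // s
    primary = c_in * m * k * k      # ordinary convolution, but only m outputs
    cheap = (s - 1) * m * d * d     # one d x d kernel per ghost map
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer: 128 -> 256 channels, 3 x 3 kernels, s = 2, d = 3.
    std = conv_params(128, 256, 3)
    ghost = ghost_params(128, 256, 3, s=2, d=3)
    print(f"standard: {std}, ghost: {ghost}, ratio: {std / ghost:.2f}")
```

For this hypothetical layer the Ghost module needs roughly half the weights of the plain convolution, in line with the approximately s-fold saving GhostNet describes when s is small relative to the channel count. Whole-network figures such as the 43% reported above also depend on which layers are replaced and on the pruning step.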
... Feature pyramid networks (FPNs) come in many varieties, each aggregating feature maps differently to strengthen a backbone. Such methods are rather effective [29]. ...
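To make "aggregating feature maps" concrete, here is a minimal sketch of the classic FPN top-down pathway on single-channel maps stored as nested Python lists. It is a simplification under stated assumptions: the 1 x 1 lateral and 3 x 3 smoothing convolutions are left out, upsampling is nearest-neighbour, and each level is exactly half the size of the one below it. Other FPN variants change how these maps are combined, for example with extra bottom-up paths or learned fusion weights.

```python
from typing import List

Grid = List[List[float]]  # single-channel feature map, row-major


def upsample2x(x: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: every value becomes a 2 x 2 block."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two maps with identical shapes."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature maps must have the same shape")
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(features: List[Grid]) -> List[Grid]:
    """Simplified FPN top-down pathway.

    features[0] is the finest backbone map and each following map is half the
    size of the previous one. Starting from the coarsest level, each finer level
    adds the 2x-upsampled output of the level above it.
    """
    outputs = [features[-1]]
    for c in reversed(features[:-1]):
        outputs.append(add(c, upsample2x(outputs[-1])))
    return list(reversed(outputs))  # finest level first, like the input


if __name__ == "__main__":
    c3 = [[1.0] * 4 for _ in range(4)]
    c4 = [[2.0] * 2 for _ in range(2)]
    c5 = [[3.0]]
    p3, p4, p5 = fpn_top_down([c3, c4, c5])
    print(p5, p4, p3[0])
```

Each output level now mixes its own backbone features with coarser, more semantic context from above, which is the property FPN variants build on.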
... These systems place a strong emphasis on inference time, at the expense of performance if necessary, but they can still be improved at low cost. Performance in this field is crucial, as even minor improvements can have a significant impact on the entire detection system [29]. ...
Preprint
Deep learning has improved steadily in recent years, and many researchers have devoted themselves to defect detection algorithms. Detecting and recognizing small and complex targets, however, remains an open problem. The authors present an improved model for detecting small and complex defect targets on steel surfaces. During steel strip production, mechanical forces and environmental factors cause surface defects in the steel strip, so detecting such defects is key to producing high-quality products. Moreover, surface defects of the steel strip cause great economic losses to the high-tech industry. So far, few studies have explored methods of identifying these defects, and most currently available algorithms are not sufficiently effective. This study therefore presents an improved real-time metallic surface defect detection model based on You Only Look Once (YOLOv5), specially designed for small networks. To capture the smaller features of the target, the conventional convolution module is replaced with a depth-wise convolution and a channel shuffle mechanism. Assigning weights to the Feature Pyramid Network (FPN) output features and fusing them then increases feature propagation and the network's representational ability. The experimental results show that the proposed model outperforms comparable models in terms of accuracy and detection time, achieving an mAP of 77.5% on the Northeastern University dataset (NEU-DET) and 70.18% on the GC10-DET dataset.
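The two operations named in this abstract can be sketched as follows. The channel shuffle follows ShuffleNet (reshape, transpose, flatten). The abstract does not say how the FPN output features are weighted, so the fusion function swaps in EfficientDet's BiFPN-style fast normalised fusion as an assumption, not as the authors' scheme. Feature maps are flattened Python lists to keep the example dependency-free.

```python
from typing import List, Sequence


def channel_shuffle(channels: Sequence[int], groups: int) -> List[int]:
    """ShuffleNet channel shuffle applied to a list of channel indices.

    Equivalent to reshaping C channels to (groups, C // groups), transposing,
    and flattening, so each group of the next grouped convolution sees
    channels from every group of the previous one.
    """
    c = len(channels)
    if c % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = c // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


def fast_normalized_fusion(inputs: Sequence[Sequence[float]],
                           raw_weights: Sequence[float],
                           eps: float = 1e-4) -> List[float]:
    """Weighted fusion of same-shape (flattened) feature maps.

    Weights are clipped at zero (ReLU) and divided by their sum plus eps,
    as in the fast normalised fusion of EfficientDet's BiFPN.
    """
    if len(inputs) != len(raw_weights):
        raise ValueError("one weight per input is required")
    if len({len(x) for x in inputs}) != 1:
        raise ValueError("inputs must have the same length")
    w = [max(0.0, r) for r in raw_weights]
    total = sum(w) + eps
    return [sum(wi * x[k] for wi, x in zip(w, inputs)) / total
            for k in range(len(inputs[0]))]


if __name__ == "__main__":
    print(channel_shuffle(list(range(6)), groups=2))   # [0, 3, 1, 4, 2, 5]
    print(fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], eps=0.0))  # [2.0, 3.0]
```

In a trained network the raw fusion weights are learnable scalars, so the model can decide how much each pyramid level contributes.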
... Each YOLO version has been pivotal in advancing the capabilities of autonomous vehicles by providing highly efficient and accurate real-time detection systems. Each iteration of YOLO has brought improvements that enhance the vehicle's ability to perceive its environment quickly and accurately, which is critical for safe navigation and decision-making [120]. Starting with YOLOv1 [26], the YOLO algorithm revolutionized the approach by performing detection tasks directly from full images in a single network pass, allowing for the detection of objects at a remarkable speed [121]. ...
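The single-pass design can be illustrated by how YOLOv1 encodes boxes: one forward pass yields an S x S grid in which each cell predicts B boxes as (x, y, w, h, confidence) plus C class probabilities. The sketch below decodes one such box into pixel coordinates using the YOLOv1 parameterisation (centre relative to its cell, size relative to the whole image); the grid cell and prediction values in the example are made up.

```python
from typing import Tuple


def decode_yolov1_box(row: int, col: int, x: float, y: float, w: float, h: float,
                      s: int, img_w: float, img_h: float) -> Tuple[float, float, float, float]:
    """Convert one YOLOv1 box prediction to pixel corners (x_min, y_min, x_max, y_max).

    (x, y) is the box centre relative to the bounds of grid cell (row, col), and
    (w, h) are the width and height relative to the whole image.
    """
    cx = (col + x) / s * img_w
    cy = (row + y) / s * img_h
    bw, bh = w * img_w, h * img_h
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


if __name__ == "__main__":
    # YOLOv1 uses a 7 x 7 grid on 448 x 448 inputs; the prediction values are made up.
    print(decode_yolov1_box(row=3, col=3, x=0.5, y=0.5, w=0.25, h=0.5,
                            s=7, img_w=448, img_h=448))
```

In YOLOv1 the class-specific score of a box is its confidence multiplied by the cell's conditional class probability, and overlapping boxes are then filtered with non-maximum suppression.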
... This is particularly evident in fields like medical imaging or satellite image analysis, where precision is crucial for identifying fine details. Techniques such as spatial pyramid pooling or enhanced up-sampling may be needed to increase the receptive field and improve the detection of smaller objects without compromising the model's efficiency [120,263,264]. • While YOLOv5 offers faster training and inference times compared to previous versions, its deployment on edge devices is limited by high memory and processing requirements [265,127]. ...
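As an illustration of how spatial pyramid pooling enlarges the receptive field, the sketch below mimics the SPP layer used in YOLOv4 and YOLOv5: stride-1 max pooling with several kernel sizes (5, 9 and 13 here), whose outputs are concatenated with the input along the channel axis. It works on a single-channel map and omits the convolutions that surround the block in those networks.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]  # single-channel feature map


def max_pool_same(x: Grid, k: int) -> Grid:
    """Stride-1, k x k max pooling with k // 2 padding: output size equals input size.

    Clipping the window at the borders is equivalent to padding with -inf.
    """
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    h, w, r = len(x), len(x[0]), k // 2
    out: Grid = []
    for i in range(h):
        row = []
        for j in range(w):
            best = -math.inf
            for a in range(max(0, i - r), min(h, i + r + 1)):
                for b in range(max(0, j - r), min(w, j + r + 1)):
                    best = max(best, x[a][b])
            row.append(best)
        out.append(row)
    return out


def spp_block(x: Grid, kernels: Sequence[int] = (5, 9, 13)) -> List[Grid]:
    """Return the input plus one max-pooled copy per kernel (channel-wise concat).

    Larger kernels let each output position summarise a wider neighbourhood,
    enlarging the receptive field without changing the spatial resolution.
    """
    return [x] + [max_pool_same(x, k) for k in kernels]


if __name__ == "__main__":
    fmap = [[0.0] * 8 for _ in range(8)]
    fmap[0][0] = 1.0                       # a single strong activation in a corner
    for k, pooled in zip((1, 5, 9, 13), spp_block(fmap)):
        print(k, pooled[4][4])             # 0.0 for k = 1 and 5, 1.0 for 9 and 13
```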
Preprint
This review systematically examines the progression of the You Only Look Once (YOLO) object detection algorithms from YOLOv1 to the recently unveiled YOLOv10. Employing a reverse chronological analysis, it examines the advancements introduced by YOLO algorithms, beginning with YOLOv10 and progressing through YOLOv9, YOLOv8, and earlier versions, to explore each version's contributions to speed, accuracy, and computational efficiency in real-time object detection. The study highlights the transformative impact of YOLO across five critical application areas: automotive safety, healthcare, industrial manufacturing, surveillance, and agriculture. By detailing the incremental technological advancements in successive YOLO versions, this review chronicles the evolution of YOLO and discusses the challenges and limitations of each earlier version. The evolution points towards integrating YOLO with multimodal, context-aware, and Artificial General Intelligence (AGI) systems over the next YOLO decade, with significant implications for future developments in AI-driven applications. Keywords: You Only Look Once, YOLOv10 to YOLOv1, CNN, Deep learning, Object detection, Real-time object detection, Artificial intelligence, Computer vision, Healthcare, Autonomous Vehicles, Industrial manufacturing, Surveillance, Agriculture, YOLOv10, YOLOv9, YOLOv8, YOLOv7, YOLOv6, YOLOv5, YOLOv4, YOLOv3, YOLOv2, YOLO
... Object detectors are usually divided into one-stage and two-stage detectors. One-stage detectors are faster and predict the location and class of objects without generating region proposals. Common one-stage detectors can be further divided into anchor-based methods (such as SSD [17] and YOLO [18][19][20][21][22][23]), anchor-free methods (such as FCOS [24]), and hybrid networks (such as RepPoints [25]). Much work has built on SSD to improve performance, and RetinaNet [26] achieves significant gains with a novel loss function that assigns more weight to hard samples. ...
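RetinaNet's loss is the focal loss, which scales cross-entropy by (1 - p_t)^gamma so that well-classified (easy) examples contribute little and training concentrates on hard ones. A minimal per-example sketch, using alpha = 0.25 and gamma = 2 as defaults (the values the RetinaNet paper reports working best); a real detector sums this over all anchors and normalises by the number of positive anchors.

```python
import math


def binary_focal_loss(p: float, target: int, alpha: float = 0.25,
                      gamma: float = 2.0, eps: float = 1e-12) -> float:
    """Focal loss for one binary prediction (Lin et al., RetinaNet).

    FL(p_t) = -alpha_t * (1 - p_t) ** gamma * log(p_t), where p_t is the
    predicted probability of the true class. gamma = 0 and alpha = 0.5 give
    half the ordinary binary cross-entropy.
    """
    if target not in (0, 1):
        raise ValueError("target must be 0 or 1")
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)          # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.95, 0.3):                   # easy vs hard positive example
        bce = -math.log(p)
        fl = binary_focal_loss(p, 1)
        print(f"p={p}: BCE={bce:.4f}  focal={fl:.4f}  ratio={fl / bce:.4f}")
```

The ratio to cross-entropy is alpha * (1 - p)^2, so the easy example keeps about 0.000625 of its cross-entropy while the hard one keeps about 0.12.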
Article
In order to better detect objects of different scales, detectors need different resolutions and inputs from different receptive fields. Advanced detectors currently incorporate a feature pyramid structure to fuse multi-scale object features, with top-down and bottom-up network structures as the basic strategy for multi-scale feature extraction. Although the feature pyramid network (FPN) can alleviate the conflict between resolution and receptive field to a certain extent, existing FPN-based models tend to ignore the contradictory information between layers during fusion, and fuzzy boundary information is easily lost during top-down propagation. This paper first introduces the detector, then analyzes the defects of the feature pyramid network, and finally proposes a feature pyramid network with an adaptive fusion strategy and enhanced semantic information (SG-FPN) to solve these problems. The validity of the model is verified on mainstream datasets, where it outperforms other state-of-the-art methods.
... In 2016, Redmon et al. [35] designed the YOLO network, which requires only a single forward pass to complete the target detection task. The YOLO network is significantly faster than the R-CNN network, but the first-generation version is not effective at detecting small targets. In 2017, Zhang et al. applied the YOLO algorithm to traffic sign detection; compared with other target detection algorithms, detection time was reduced by 0.017 seconds [36][37][38]. In the same year, the team released the CCTSDB traffic sign detection dataset [39]. Benjumea et al. [40] modified the Neck layer and network structure to improve the accuracy of small target detection. In 2018, Yu et al. [41] designed a fusion model based on the YOLOv3 and VGG19 networks that exploits the sequential relationship between images to detect traffic signs in autonomous driving, with an accuracy above 90%. Dewi et al. [42] augmented an existing training dataset by generating traffic sign images with a generative adversarial network, obtaining average accuracies of 84.9% on YOLOv3 and 89.33% on YOLOv4. Gao et al. [43] tested YOLOv4 on traffic signs, and the results show that YOLOv4 has better image recognition performance. Pan Huiping et al. [44] combined YOLOv4 with an SPP module to lift the original network's restriction on image size, and modified the size of the source images captured by the car recorder to increase the receptive field, greatly improving traffic sign detection with an average accuracy of 99.0%. ...
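The statement that SPP lifts the restriction on input image size refers to the mechanism introduced in SPP-net: pooling a feature map over a fixed pyramid of bins gives a vector of the same length for any input size. The sketch below shows that mechanism on a single-channel map with assumed pyramid levels (1, 2, 4); it is not the configuration used by Pan Huiping et al., which the excerpt does not give.

```python
from typing import List, Sequence

Grid = List[List[float]]  # single-channel feature map


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def spp_fixed_length(x: Grid, levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """SPP-net-style pooling: split the map into n x n bins for each level n and
    max-pool every bin. The output has sum(n * n) values whatever the map size.

    Bin edges use floor for the start and ceil for the end, so neighbouring bins
    may overlap by one row or column when the size is not divisible by n.
    """
    h, w = len(x), len(x[0])
    out: List[float] = []
    for n in levels:
        for bi in range(n):
            r0, r1 = (bi * h) // n, _ceil_div((bi + 1) * h, n)   # rows of this bin
            for bj in range(n):
                c0, c1 = (bj * w) // n, _ceil_div((bj + 1) * w, n)
                out.append(max(x[i][j] for i in range(r0, r1) for j in range(c0, c1)))
    return out


if __name__ == "__main__":
    small = [[float(i * 7 + j) for j in range(7)] for i in range(5)]    # 5 x 7 map
    large = [[float(i + j) for j in range(20)] for i in range(13)]      # 13 x 20 map
    print(len(spp_fixed_length(small)), len(spp_fixed_length(large)))  # 21 21
```

Because the output length depends only on the pyramid levels (1 + 4 + 16 = 21 here), a fixed-size layer can follow regardless of the input resolution.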
Article
YOLO (You Only Look Once), an efficient target detection algorithm, has significant advantages in image recognition and traffic sign detection. The continued development of autonomous driving technology requires an algorithm that can quickly and accurately identify traffic signs, vehicles, pedestrians and other important objects on the road. The YOLO algorithm enables fast and accurate recognition of traffic signs, which is of great significance for improving the safety of autonomous driving. This study first introduces the general framework of the YOLO series, including the network structure, the development history of the series and the characteristics of each generation, then discusses the application of YOLO algorithms to traffic sign recognition, and finally summarizes the existing problems and proposes several possible directions for future optimization.
... In order to improve the detection accuracy of small target objects, Bosquet et al. [30] optimized the model by improving the loss function and adding an attention mechanism, reaching a final accuracy of 37.66%. Benjumea et al. [31] used the ResNet50 and DenseNet networks to replace the original backbone network; the improved model detects faster, and its detection accuracy on small objects also improved by 6.9%. ...
Article
This paper proposes a small-target disease detection method based on the YOLOv5 framework for detecting small apparent diseases on intelligent bridges, aiming to address the problem of missed and false detections. To enhance the detection of small apparent diseases, a layer for detecting small objects is added to the YOLOv5 model. Additionally, an ECA attention mechanism module is embedded in the feature enhancement network to improve the extraction of disease features. To validate the effectiveness of the proposed algorithm, a dataset of 996 bridges with apparent diseases such as corrosion, rebar, speckle, hole and spall was established and used for training after manual annotation and data augmentation. The experiment showed that the proposed algorithm achieves an mAP of 87.91%, an improvement of 1.97% over the original YOLOv5 model on the bridge apparent disease dataset. This algorithm accurately detects small apparent diseases on bridges and effectively reduces missed detections.
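The ECA module mentioned above can be sketched in a few lines. This follows the design in the ECA-Net paper: global average pooling, a 1-D convolution across neighbouring channels with an adaptively chosen odd kernel size, and a sigmoid gate. The kernel weights are hypothetical inputs here rather than learned parameters, and the abstract does not specify where the module sits in the bridge-disease network.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # C x H x W


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size from ECA-Net: nearest odd value of |(log2(C) + b) / gamma|."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca(feature: Tensor3, kernel: List[float]) -> Tensor3:
    """Efficient Channel Attention on a C x H x W feature map.

    1. global average pooling gives one descriptor per channel;
    2. a 1-D convolution (odd length k, shared weights, zero padding) mixes each
       descriptor with its k - 1 neighbours, followed by a sigmoid;
    3. every channel is rescaled by its attention weight.
    """
    c, pad = len(feature), len(kernel) // 2
    gap = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]
    scales = []
    for i in range(c):
        z = sum(wt * gap[i + t - pad] for t, wt in enumerate(kernel) if 0 <= i + t - pad < c)
        scales.append(1.0 / (1.0 + math.exp(-z)))
    return [[[v * scales[i] for v in row] for row in ch] for i, ch in enumerate(feature)]


if __name__ == "__main__":
    print(eca_kernel_size(64), eca_kernel_size(256))    # 3 5
    fmap = [[[2.0, 4.0]], [[6.0, 8.0]]]                 # C = 2, H = 1, W = 2
    print(eca(fmap, kernel=[0.0, 0.0, 0.0]))            # zero kernel: every channel halved
```

Unlike squeeze-and-excitation, ECA avoids a fully connected bottleneck, which keeps the added parameter count to the k kernel weights.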
... YOLOv5 was used by Kasper et al. [14] to identify heavy freight trucks at rest areas in winter and estimate parking-spot occupancy. YOLOv5 was also applied by Benjumea et al. [3] to detecting small objects for autonomous vehicles; their improved YOLOv5 variant, called YOLO-Z, reaches a maximum accuracy of 96.05%. ...
Conference Paper
Automatic instrument reading has become a critical issue for intelligent sensors in smart cities, and several artificial intelligence techniques are being developed to address it. Image-based Automatic Meter Reading (AMR) techniques have been tested on images taken under controlled conditions, but they fail when dealing with fuzzy, hazy or blurred meter images. In this paper, we address AMR in unconstrained settings, such as fuzzy, hazy or blurry meter images. Automated meter reading consists of three major components: identifying the counter region, localising and cropping the counter region, and recognising the digits. In this article, the deep learning model YOLOv5 has been applied to the image dataset. YOLOv5 is a state-of-the-art single-stage deep learning detector that outperforms all other detectors, and the proposed technique and the trained YOLOv5-based model are observed to reliably detect and recognise meter readings from different meter types. For digit recognition, a custom-built YOLOv5-based digit optical character reader is used that can recognise the digits 0 to 9. Furthermore, the proposed AMR system achieves remarkable recognition rates of 99.74% for counters and 88.70% for digits, even while rejecting counters with lower confidence values.
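The abstract does not describe how the per-digit detections of the YOLOv5-based reader are combined into a meter value. One plausible post-processing step, shown here purely as a hypothetical sketch, is to discard low-confidence digit boxes and read the remaining digits from left to right. The `DigitDetection` type, the 0.5 threshold and the optional expected digit count are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DigitDetection:
    digit: int          # predicted class, 0-9
    confidence: float   # detector score in [0, 1]
    x_center: float     # horizontal centre of the bounding box, in pixels


def assemble_reading(detections: List[DigitDetection], min_conf: float = 0.5,
                     expected_len: Optional[int] = None) -> Optional[str]:
    """Turn per-digit detections inside a counter crop into a reading string.

    Low-confidence detections are dropped and the rest are ordered left to right.
    Returns None (reject) if nothing survives or the digit count is wrong.
    """
    kept = [d for d in detections if d.confidence >= min_conf]
    if not kept or (expected_len is not None and len(kept) != expected_len):
        return None
    kept.sort(key=lambda d: d.x_center)
    return "".join(str(d.digit) for d in kept)


if __name__ == "__main__":
    dets = [DigitDetection(7, 0.91, 120.0), DigitDetection(0, 0.97, 40.0),
            DigitDetection(3, 0.88, 80.0), DigitDetection(8, 0.21, 160.0)]
    print(assemble_reading(dets))                   # 037
    print(assemble_reading(dets, expected_len=4))   # None
```

Returning `None` reflects the idea of rejecting a low-confidence reading rather than reporting a possibly wrong value.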
... The demand for faster and more accurate detectors for small objects in target detection tasks is increasing. Although the human eye can capture contextual information almost instantly, even from a distance, machine detection of small objects that occupy only a few pixels in an image remains challenging because of limited image resolution and computing resources [17]. Pretraining results show that the target detection algorithm recognizes targets only when vehicles are relatively close, and this phenomenon is more pronounced when the input images are unclear. ...
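Why distant objects shrink to a few pixels follows from the pinhole camera model: an object of height H at distance Z projects to about f * H / Z pixels for a focal length of f pixels. The sketch below tabulates this for a hypothetical 0.5 m object and a hypothetical 1000-pixel focal length, then divides by a stride-32 detection layer to show how little of a coarse feature cell such an object covers.

```python
def projected_height_px(object_height_m: float, distance_m: float, focal_px: float) -> float:
    """Pinhole camera model: image height in pixels = f * H / Z."""
    return focal_px * object_height_m / distance_m


if __name__ == "__main__":
    FOCAL_PX = 1000.0   # hypothetical focal length in pixels
    HEIGHT_M = 0.5      # hypothetical object height in metres
    STRIDE = 32         # downsampling factor of a coarse detection layer
    for z in (10.0, 25.0, 50.0, 100.0):
        px = projected_height_px(HEIGHT_M, z, FOCAL_PX)
        print(f"{z:>5.0f} m -> {px:5.1f} px -> {px / STRIDE:.2f} cells at stride {STRIDE}")
```

At 100 m the object spans 5 pixels, well under one cell of a stride-32 layer, which is why small-object work often adds higher-resolution detection layers.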
... However, in nighttime or low-light environments, relying solely on visible-light images for pedestrian detection may lead to false positives or false negatives, posing potential safety risks. With the continuous improvement of infrared detection methods and the reduction in hardware costs, infrared cameras have become widely used in applications such as image capture and intelligent monitoring systems [9][10][11]. This is because infrared cameras are insensitive to changes in lighting conditions but highly sensitive to temperature changes, which makes them particularly suitable for pedestrian detection in nighttime traffic scenarios [12]. ...
Article
Multispectral pedestrian detection technology leverages infrared images to provide reliable information for visible-light images, demonstrating significant advantages in low-light conditions and background occlusion scenarios. However, while improving cross-modal feature extraction and fusion, maintaining the model's detection speed is also challenging. We have devised a deep learning network model for cross-modal pedestrian detection based on ResNet50 that focuses on more reliable features and improves detection efficiency. The model employs a spatial attention mechanism to reweight the input visible-light and infrared image data, strengthening its focus on different spatial positions, and shares the weighted features across modalities, thereby reducing interference between multimodal features. Lightweight modules with depthwise separable convolution are then incorporated to reduce the parameter count and computational load through channel-wise and point-wise convolutions. The proposed network was experimentally validated on the publicly available KAIST dataset and compared with other existing methods. The experimental results show that the approach performs favorably in various complex environments, confirming the effectiveness of the proposed multispectral pedestrian detection technique.
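The parameter saving from depthwise separable convolution follows directly from splitting a k x k convolution into a channel-wise (depthwise) k x k step and a point-wise 1 x 1 step. A small sketch of the count, with a hypothetical 256-channel, 3 x 3 layer and biases ignored; the actual layer sizes of the cited model are not given in the abstract.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in    # one k x k filter per input channel (channel-wise)
    pointwise = c_in * c_out    # 1 x 1 convolution mixing channels (point-wise)
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out, k = 256, 256, 3   # hypothetical layer size
    std = standard_conv_params(c_in, c_out, k)
    dws = depthwise_separable_params(c_in, c_out, k)
    # The ratio dws / std equals 1 / c_out + 1 / k**2.
    print(std, dws, f"{dws / std:.3f}")
```

Here the separable version needs about 11.5% of the weights, consistent with the 1/c_out + 1/k^2 ratio; computation shrinks by the same factor for a given output size.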
... In [9], the authors improve the ability of the YOLOv5 object detector to detect tiny objects, specifically in the context of autonomous racing. They propose architectural modifications to the YOLOv5 model, including changes to the backbone, neck, and other elements. ...
Article
Recent advances in artificial intelligence (AI), and in particular the application of convolutional neural networks (CNNs), have greatly enhanced the object detection capabilities of self-driving cars. However, balancing high precision and fast processing in vehicular settings remains a persistent challenge. Developing nations such as India, which has the second-largest population in the world, add unique complexity to road scenarios. Indian roads present numerous challenges, such as distinctive vehicle types and varied traffic patterns, including auto-rickshaws, which are only seen in India. This study presents the results of evaluating the YOLOv8 models, which have demonstrated superior performance in Indian traffic conditions compared with other existing YOLO models. The evaluation used a dataset compiled from data collected in the cities of Bangalore and Hyderabad and their surrounding areas. The findings show how well the YOLOv8 models address the unique problems presented by Indian road conditions. This study advances the development of autonomous vehicles designed for complex traffic situations such as those found on Indian roads.