Comparison of different YOLOv5 models.

Source publication
Article
Full-text available
The governance of rural living environments is one of the important tasks in implementing a rural revitalization strategy. At present, illegal behaviors such as unauthorized construction and haphazard storage in public spaces have seriously undermined the effectiveness of rural living environment governance. The current supervision of such pro...

Contexts in source publication

Context 1
... training set contained 4107 images, the validation set contained 458 images, and the test set contained 507 images. From the results (shown in Table 2), it can be seen that the detection precision of the YOLOv5s model was 91.2%, with 7,025,023 parameters and a computational cost of 15.9 GFLOPs. Except for precision, YOLOv5s outperformed YOLOv5n on all other performance indicators. ...
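For readers who want to reproduce parameter and GFLOPs figures like those quoted above, a minimal sketch follows. It assumes the public ultralytics/yolov5 hub model and the thop package are available; this is not necessarily how the source publication obtained its numbers, and the exact values depend on the model version and input size.

```python
# Hedged sketch: measure parameter count and GFLOPs of YOLOv5s,
# assuming torch, internet access to the ultralytics/yolov5 hub repo, and thop are available.
import torch
from thop import profile

# autoshape=False returns the raw detection model whose forward takes a BCHW tensor
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', autoshape=False)
dummy = torch.randn(1, 3, 640, 640)             # one 640x640 RGB image

macs, params = profile(model, inputs=(dummy,))  # multiply-accumulate ops and parameter count
print(f'params: {params:,.0f}  GFLOPs: {2 * macs / 1e9:.1f}')  # FLOPs approx. 2 x MACs
```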

Similar publications

Article
Full-text available
The paper deals with the aerodynamics of a small Unmanned Aerial Vehicle intended for reconnaissance, which usually carries payloads such as a camera. The two-wing setup is termed a tandem aircraft. Researchers have confirmed that a tandem wing setup provides better aerodynamic efficiency at low Reynolds numbers than a conventional one. The...
Preprint
Full-text available
Intersection safety often relies on the correct modelling of signal phasing and timing parameters. A slight increase in yellow time or red time can have a significant impact on rear-end crashes or conflicts. This paper aims to identify the relationship between surrogate safety measures and signal phasing. Unmanned Aerial Vehicle (UAV) video data...

Citations

... Recently, deep learning models, represented by convolutional neural networks, have developed rapidly. As highly effective classification and recognition models, they have attracted considerable attention worldwide, been widely used [11][12][13][14][15], and achieved good results in the agricultural field. Examples include fruit identification [16,17], crop disease and pest identification [18,19], animal behavior detection [20,21], etc. ...
Article
Full-text available
Identification of sugarcane stem nodes generally depends on high-performance recognition equipment in sugarcane seed pre-cutting machines and is inefficient. Accordingly, this study proposes a novel lightweight architecture for the detection of sugarcane stem nodes based on the YOLOv5 framework, named G-YOLOv5s-SS. Firstly, the study removes the CBS and C3 structures at the end of the backbone network to fully utilize shallow-level feature information, which enhances the detection performance of sugarcane stem nodes. Simultaneously, it eliminates the 32× down-sampling branch in the neck structure and the 20 × 20 detection heads at the prediction end, reducing model complexity. Secondly, a Ghost lightweight module is introduced to replace the conventional convolution module in the BottleNeck structure, further reducing the model's complexity. Finally, the study incorporates the SimAM attention mechanism to enhance the extraction of sugarcane stem node features without introducing additional parameters. This improvement aims to enhance recognition accuracy, compensating for any loss in precision due to the lightweight modifications. The experimental results showed that the average precision of the improved network for sugarcane stem node identification reached 97.6%, which was 0.6% higher than that of the YOLOv5 baseline network. Meanwhile, the model achieved a size of 2.6 MB, 1,129,340 parameters, and 7.2 GFLOPs, representing reductions of 82%, 84%, and 54.4%, respectively. Compared with mainstream one-stage target detection algorithms such as YOLOv4-tiny, YOLOv4, YOLOv5n, YOLOv6n, YOLOv6s, YOLOv7-tiny, and YOLOv7, G-YOLOv5s-SS achieved average precision improvements of 12.9%, 5.07%, 3.6%, 2.1%, 1.2%, 3%, and 0.4%, respectively, in sugarcane stem node recognition, while its model size was 88.9%, 98.9%, 33.3%, 72%, 92.9%, 78.8%, and 96.3% smaller, respectively. Compared with similar studies, G-YOLOv5s-SS not only enhanced recognition accuracy but also kept the model size small, demonstrating overall excellent performance that aligns with the requirements of sugarcane seed pre-cutting machines.
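To illustrate the parameter-free SimAM attention referenced in the abstract above, a minimal PyTorch sketch is given below. It follows the commonly published SimAM formulation (an energy-based per-activation weighting with a single smoothing constant, here named e_lambda), not the authors' exact code.

```python
# Hedged sketch of SimAM: parameter-free attention that re-weights each activation
# by an inverse-energy term computed from its deviation from the per-channel mean.
import torch
import torch.nn as nn

class SimAM(nn.Module):
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda  # smoothing constant in the energy function (assumed default)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)   # squared deviation at each position
        v = d.sum(dim=(2, 3), keepdim=True) / n             # per-channel variance estimate
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5         # inverse energy: higher = more salient
        return x * torch.sigmoid(e_inv)                     # re-weight features, no extra parameters

# usage: attn = SimAM(); y = attn(torch.randn(1, 64, 40, 40))
```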
... Therefore, extracting rural buildings is more challenging, and related research is relatively scarce. In addition, the high cloud cover in rural areas increases the extraction difficulty [6]. Accurately identifying rural building roof types is of great importance for rural revitalization, environmental planning, energy assessment and disaster management [7][8][9][10]. ...
... Difficulties exist in recognizing complex and diverse rural building types. Additionally, the high cloud cover in rural areas affects satellite image quality [6]. To solve this problem, low-altitude remote sensing technologies have advantages such as low cost and high-quality data acquisition, which can effectively improve the accuracy of rural building recognition. ...
... The convolutional feature maps are generated as the output from the Block5 section of the encoder, and these maps serve as the input for the atrous spatial pyramid pooling. The ASPP module contains 4 parallel branches using dilated convolutions with different dilation rates (1,6,12,18) to obtain feature information at different scales. The 4 groups of feature maps after 1 × 1 convolution are concatenated to obtain feature expressions sensitive to multiscale features. ...
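The ASPP arrangement described above can be sketched roughly as follows. This is a generic PyTorch illustration of four parallel dilated-convolution branches (rates 1, 6, 12, 18) whose concatenated outputs are fused by a 1 × 1 convolution; the channel sizes are assumptions for illustration, not the authors' settings.

```python
# Hedged sketch of an ASPP block: four parallel dilated convolutions with different
# dilation rates, concatenation, and a 1x1 fusion convolution.
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, c_in: int = 512, c_branch: int = 256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(c_in, c_branch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(c_branch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # 1x1 convolution fuses the concatenated multi-scale feature maps
        self.fuse = nn.Sequential(
            nn.Conv2d(c_branch * len(rates), c_branch, 1, bias=False),
            nn.BatchNorm2d(c_branch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]  # same spatial size, different receptive fields
        return self.fuse(torch.cat(feats, dim=1))

# usage: aspp = ASPP(); y = aspp(torch.randn(1, 512, 32, 32))  # -> (1, 256, 32, 32)
```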
Article
Full-text available
Rural building automatic extraction technology is of great significance for rural planning and disaster assessment; however, existing methods face the dilemma of scarce sample data and large regional differences in rural buildings. To solve this problem, this study constructed an image dataset of typical Chinese rural buildings, including nine typical geographical regions, such as the Northeast and North China Plains. Additionally, an improved remote sensing image rural building extraction network called AGSC-Net was designed. Based on an encoder–decoder structure, the model integrates multiple attention gate (AG) modules and a context collaboration network (CC-Net). The AG modules realize focused expression of building-related features through feature selection. The CC-Net module models the global dependency between different building instances, providing complementary localization and scale information to the decoder. By embedding AG and CC-Net modules between the encoder and decoder, the model can capture multiscale semantic information on building features. Experiments show that, compared with other models, AGSC-Net achieved the best quantitative metrics on two rural building datasets, verifying the accuracy of the extraction results. This study provides an effective example for automatic extraction in complex rural scenes and lays the foundation for related monitoring and planning applications.
... To ensure a more precise assessment of object detection performance, four essential metrics were introduced: precision (P), recall (R), F1, and mean average precision (mAP) [33,34]. The specific computational formulas for these metrics are provided as follows (Equations (7)-(10)): ...
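The cited Equations (7)-(10) are not reproduced in this excerpt; the conventional definitions of these metrics, which such studies typically follow, are:

```latex
% Conventional detection-metric definitions (TP, FP, FN: true/false positives and
% false negatives; N: number of object classes).
\begin{align}
P   &= \frac{TP}{TP + FP} \\
R   &= \frac{TP}{TP + FN} \\
F1  &= \frac{2\,P\,R}{P + R} \\
mAP &= \frac{1}{N} \sum_{i=1}^{N} AP_i, \qquad AP_i = \int_{0}^{1} P_i(R)\,\mathrm{d}R
\end{align}
```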
Article
Full-text available
The rapid detection of distracted driving behaviors is crucial for enhancing road safety and preventing traffic accidents. Compared with the traditional methods of distracted-driving-behavior detection, the YOLOv8 model has been proven to possess powerful capabilities, enabling it to perceive global information more swiftly. Currently, the successful application of GhostConv in edge computing and embedded systems further validates the advantages of lightweight design in real-time detection using large models. Effectively integrating lightweight strategies into YOLOv8 models while limiting their impact on model performance has become a focal point in the field of real-time distracted driving detection based on deep learning. Inspired by GhostConv, this paper presents an innovative GhostC2f design, aiming to integrate into YOLOv8 the idea of using linear transformations to generate more feature maps without additional computation for real-time distracted-driving-detection tasks. The goal is to reduce model parameters and computational load. Additionally, enhancements have been made to the path aggregation network (PAN) to amplify multi-level feature fusion and contextual information propagation. Furthermore, simple attention mechanisms (SimAMs) are introduced to perform self-normalization on each feature map, emphasizing feature maps with valuable information and suppressing redundant information interference in complex backgrounds. Lastly, the nine distinct distracted driving types in the publicly available SFDDD dataset were expanded to 14 categories, and nighttime scenarios were introduced. The results indicate a 5.1% improvement in model accuracy, with model weight size and computational load reduced by 36.7% and 34.6%, respectively. During 30 real vehicle tests, the distracted-driving-detection accuracy reached 91.9% during daylight and 90.3% at night, affirming the exceptional performance of the proposed model in detecting distracted driving during real driving and its contribution to reducing accident risk.
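As a rough illustration of the Ghost convolution idea this abstract builds on (half of the output channels come from an ordinary convolution, the other half from cheap depthwise "linear" operations applied to them), a minimal sketch is shown below. The layer sizes, kernel choices, and activation are assumptions, and this is not the paper's GhostC2f module.

```python
# Hedged sketch of a Ghost convolution: a primary convolution produces half of the
# output channels; a cheap depthwise operation generates the remaining "ghost" maps.
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 1):
        super().__init__()
        c_half = c_out // 2
        self.primary = nn.Sequential(                        # ordinary convolution
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )
        self.cheap = nn.Sequential(                          # cheap depthwise "linear" transform
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)          # concat primary and ghost feature maps

# usage: conv = GhostConv(64, 128); y = conv(torch.randn(1, 64, 80, 80))  # -> (1, 128, 80, 80)
```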
... Although YOLOv5 is one of the best algorithms in the YOLO family, its detection speed is relatively slow, and it suffers from false detections and missed detections of leaks on large objects. Therefore, researchers have begun to study improved YOLOv5 algorithms, with the main improvement approaches being Transformers [20][21][22][23], attention mechanisms [24][25][26][27], etc. ...
... Liu et al. [21] proposed a nested residual Transformer structure to obtain global information, and the model improved the accuracy of tiny object detection and reduced its complexity. Wang et al. [22] used a visual Transformer structure to make their model focus more on the global features of an object, compensating for CNNs' tendency to focus only on local features and substantially improving the model's accuracy. ...
Article
Full-text available
Leak monitoring is essential for the intelligent operation and maintenance of marine systems, and can effectively prevent catastrophic accidents on ships. In response to this challenge, a machine vision-based leak detection model is proposed in this study and applied to leak detection in different types of marine system in complex engine room environments. Firstly, an image-based leak database is established, and image enhancement and expansion methods are applied to the images. Then, Standard Convolution and Fast Spatial Pyramid Pooling modules are added to the YOLOv5 backbone network to reduce the floating-point operations involved in the leak feature channel fusion process, thereby improving the detection speed. Additionally, Bottleneck Transformer and Shuffle Attention modules are introduced to the backbone and neck networks, respectively, to enhance the feature representation performance, select critical information for the leak detection task, and suppress non-critical information to improve detection accuracy. Finally, the proposed model's effectiveness is verified using leak images collected by the ship's video system. The test results demonstrate that the proposed model exhibits excellent recognition performance for various types of leak, especially for drop-type leaks (for which the accuracy reaches 0.97).
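For orientation, the Bottleneck Transformer idea mentioned in the abstract (replacing the spatial convolution of a residual bottleneck with multi-head self-attention so the block sees global context) can be sketched as follows. The head count, channel reduction, and the omission of relative position encodings are simplifying assumptions; this is not the authors' exact module.

```python
# Hedged sketch of a Bottleneck-Transformer-style block: a residual bottleneck whose
# spatial convolution is replaced by multi-head self-attention over all positions.
import torch
import torch.nn as nn

class BoTBlock(nn.Module):
    def __init__(self, c: int, heads: int = 4):
        super().__init__()
        self.reduce = nn.Conv2d(c, c // 2, 1, bias=False)                  # 1x1 channel reduction
        self.attn = nn.MultiheadAttention(c // 2, heads, batch_first=True)
        self.expand = nn.Conv2d(c // 2, c, 1, bias=False)                  # 1x1 channel expansion

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        y = self.reduce(x)
        seq = y.flatten(2).transpose(1, 2)        # (B, H*W, C/2): one token per spatial position
        seq, _ = self.attn(seq, seq, seq)         # global self-attention (no position encoding here)
        y = seq.transpose(1, 2).reshape(b, -1, h, w)
        return x + self.expand(y)                 # residual connection

# usage: block = BoTBlock(256); y = block(torch.randn(1, 256, 20, 20))
```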
Article
Full-text available
This paper introduces a new visual dataset and framework to facilitate computer-vision-based traffic monitoring in high-density, mixed, and lane-free traffic (TRAMON). While there are advanced deep learning algorithms that can detect and track vehicles from traffic videos, none of the existing systems provides accurate traffic monitoring in mixed traffic. The mixed traffic flows in developing countries often include types of vehicles that are not well represented in existing visual datasets. Computer vision algorithms also face difficulties in detecting and tracking a high density of vehicles that do not follow lanes. This paper proposes a large-scale visual dataset of >282,000 labelled images of traffic vehicles, as well as a comprehensive framework and strategy to train common deep-learning-based computer vision algorithms to detect and track vehicles in high-density, heterogeneous, and lane-free traffic. A systematic evaluation of the results shows that TRAMON, the proposed visual dataset and framework, performs well and better than the common visual dataset at all traffic densities.