Figure - available from: Journal of Real-Time Image Processing
YOLO architecture (What’s New in YOLOv6? n.d.) [22]

Source publication
Article
Full-text available
Object detection and object recognition are the most important applications of computer vision. To pursue the task of object detection efficiently, a model with higher detection accuracy is required. Increasing the detection accuracy of the model increases the model’s size and computation cost. Therefore, it becomes a challenge to use deep learning...

Similar publications

Article
Full-text available
Iron deficiency in plants causes iron chlorosis, which frequently occurs in alkaline soils (pH greater than 7.0) that contain lime. This deficiency turns affected plant leaves yellow, or gives them brown edges in advanced stages. The goal of this research is to use a deep learning model to identify nutrient deficiency in plant leaves an...

Citations

... YOLOv6 [212], 2023 "Real-time Tool Detection in Smart Manufacturing Using YOLOv5" ...
Preprint
Full-text available
This review systematically examines the progression of the You Only Look Once (YOLO) object detection algorithms from YOLOv1 to the recently unveiled YOLOv10. Employing a reverse chronological analysis, this study traces the advancements introduced by the YOLO algorithms, beginning with YOLOv10 and progressing through YOLOv9, YOLOv8, and earlier versions to explore each version's contributions to enhancing speed, accuracy, and computational efficiency in real-time object detection. The study highlights the transformative impact of YOLO across five critical application areas: automotive safety, healthcare, industrial manufacturing, surveillance, and agriculture. By detailing the incremental technological advancements in subsequent YOLO versions, this review chronicles the evolution of YOLO and discusses the challenges and limitations of each earlier version. The evolution signifies a path towards integrating YOLO with multimodal, context-aware, and Artificial General Intelligence (AGI) systems for the next YOLO decade, promising significant implications for future developments in AI-driven applications. Keywords: You Only Look Once, YOLOv10 to YOLOv1, CNN, Deep learning, Object detection, Real-time object detection, Artificial intelligence, Computer vision, Healthcare, Autonomous Vehicles, Industrial manufacturing, Surveillance, Agriculture, YOLOv10, YOLOv9, YOLOv8, YOLOv7, YOLOv6, YOLOv5, YOLOv4, YOLOv3, YOLOv2, YOLO
... The two-stage architecture of Faster R-CNN allows it to assign objects to different regions in the image accurately, even when they are closely packed or overlapping. YOLOv6, being a single-shot detector, may struggle in such scenarios and may have difficulty distinguishing individual objects [18]. ...
Article
Full-text available
Transportation systems primarily depend on vehicular flow on roads. Developed countries have shifted towards automated signal control, which manages and updates signal synchronisation automatically. In contrast, traffic in underdeveloped countries is mainly governed by manual traffic light systems. These existing manual systems lead to numerous issues, wasting substantial resources such as time, energy, and fuel, as they cannot make real‐time decisions. In this work, we propose an algorithm to determine traffic signal durations based on real‐time vehicle density, obtained from live closed circuit television camera feeds adjacent to traffic signals. The algorithm automates the traffic light system, making decisions based on vehicle density and employing Faster R‐CNN for vehicle detection. Additionally, we have created a local dataset from live streams of Punjab Safe City cameras in collaboration with the local police authority. The proposed algorithm achieves a class accuracy of 96.6% and a vehicle detection accuracy of 95.7%. Across both day and night modes, our proposed method maintains an average precision, recall, F1 score, and vehicle detection accuracy of 0.94, 0.98, 0.96 and 0.95, respectively. Our proposed work surpasses all evaluation metrics compared to state‐of‐the‐art methodologies.
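The density-based timing rule described above might be sketched as follows. This is a hypothetical illustration of mapping a detected vehicle count to a signal duration; the base, per-vehicle, and maximum times are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of deriving a traffic-signal green duration from
# a real-time vehicle count (e.g. obtained from a Faster R-CNN detector
# on a CCTV feed). All timing constants are illustrative assumptions.

def green_duration(vehicle_count, base=15, per_vehicle=2, max_green=60):
    """Map a detected vehicle count to a green-light duration in seconds,
    clamped between a minimum base time and a maximum allowed time."""
    return min(max_green, base + per_vehicle * vehicle_count)
```

A controller could then call this once per cycle with the latest detection count, keeping a floor for empty roads and a ceiling so no approach starves the others.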
... Techniques based on deep learning (DL) have replaced conventional object recognition approaches, whose manually chosen features offer weak robustness. Today, DL-based detection methods can be observed everywhere, such as real-time vehicle monitoring in intelligent transportation systems [4], fall detection systems for older people with smart cameras [5], and real-time detection models for the visually impaired [6]. The YOLO algorithm [7], the Single Shot MultiBox Detector (SSD) algorithm [8], R-CNN [9], and RetinaNet [10], which are traditional DL-based OD methods, demonstrate the high detection accuracy of the DL approach. ...
... The proposed feature extractor replaces the model's original extractor and hence contributes to better performance. The model is also pruned with a pruning algorithm proposed by the authors in previous work with YOLOv6 [6]. Because pruning degrades the model's accuracy, a transfer learning algorithm is used to recover the lost accuracy. ...
... The framework uses the minimal colour loss and locally adaptive contrast enhancement (MLLE) technique to improve underwater photos. Gupta et al. [6] proposed a novel fine-tuned YOLOv6 framework for real-time OD. The study proposed pruning and transfer learning algorithms, which are utilized to fine-tune the baseline YOLOv6 framework. ...
Article
Full-text available
Underwater image recognition faces significant problems, including poor image quality and complicated backdrops; the main problem is the blurriness and invisibility of objects present in an image. This study presents a unique object identification design built on a YOLOv8 (You Only Look Once) framework upgraded to address these problems and further improve the model's accuracy. The study also helps in identifying underwater trash. The model is a two-phase detector. The first phase has an Underwater Image Enhancer (UIE) data augmentation technique that works with Laplacian pyramids and gamma correction methods to enhance the underwater images. The second phase, the proposed refined, innovative YOLOv8 model for classification purposes, takes the output from the first stage as its input. The YOLOv8 model's existing feature extractor is replaced in this study with a new feature extraction technique, HEFA, that yields superior results and better detection accuracy. The introduction of the UIE and the HEFA feature extractor represents the significant novelty of this paper. The proposed model is simultaneously pruned to eliminate unnecessary parameters and further condense the model. Pruning causes the model's accuracy to decline; thus, a transfer learning procedure is employed to raise it. The trials' findings show that the technique can detect objects with an accuracy of 98.5% and a mAP@50 of 98.1%, and that its real-time detection speed on the GPU is double the baseline performance of the YOLOv8m model.
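The gamma-correction step of an enhancement stage like the UIE above can be sketched in a few lines. This is a minimal pure-Python toy on a list-of-lists greyscale "image"; a real pipeline would use NumPy/OpenCV and would also build the Laplacian pyramids mentioned in the abstract.

```python
# Minimal sketch of gamma correction as used in underwater image
# enhancement: out = 255 * (in / 255) ** gamma. A gamma below 1 lifts
# dark regions (typical for murky underwater frames); above 1 darkens.

def gamma_correct(image, gamma=0.6):
    """Apply per-pixel gamma correction to a 0-255 greyscale image
    represented as a list of rows."""
    return [[round(255.0 * (px / 255.0) ** gamma) for px in row]
            for row in image]
```

Endpoints are preserved (0 stays 0, 255 stays 255) while mid-tones are remapped, which is why the transform brightens without clipping.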
... In contrast, the suggested image processing method produced a maximum inaccuracy of 0.05 mm for pitch computation and 100% accuracy for standard-size assignments. In addition, [12] introduces the use of YOLOv6 with transfer learning for a real-time object detection model. Another crucial component of this work is the suggested model's ability to recognize every object in a scene, indoor and outdoor, and to alert the user to near and distant objects via voice output. ...
Article
Garbage problems in urban areas are becoming more serious as the population increases, and community garbage in cities including Bangkok, the capital of Thailand, causes pollution from rotten waste. Therefore, this research applies deep learning to images from CCTV cameras in urban areas of Bangkok, using YOLO to detect 1,383 images of overflowing garbage bins classified into two classes: garbage and bin. Several YOLO versions were compared: YOLOv5n, YOLOv6n, YOLOv7, and YOLOv8n. The comparison showed that YOLOv5n classified the classes with an accuracy of 94.50%, followed by YOLOv8n at 93.80%, YOLOv6n at 71.60%, and YOLOv7 at 24.60%, respectively. The results of this research can be applied to develop a mobile or web application that reports overflowing garbage bins by integrating with CCTV cameras installed in communities, monitoring garbage that is overflowing or outside the bin and notifying relevant agencies or the locals. This will allow for faster and more efficient waste management.
... To promote convergence, YOLOv8 designs the C2f module to control the shortest longest gradient path [36]. Decoupled-head [37] and task-aligned assigners [38] have also been introduced to enhance its capabilities. However, directly applying the above methods to the detection of suspicious objects in MMW images does not fully exploit their advantages. ...
Article
Full-text available
Millimeter wave (MMW) imaging systems have been widely used for security screening in public places due to their advantages of being able to detect a variety of suspicious objects, non-contact operation, and harmlessness to the human body. In this study, we propose an innovative, multi-dimensional information fusion YOLO network that can aggregate and capture multimodal information to cope with the challenges of low resolution and susceptibility to noise in MMW images. In particular, an MMW data information aggregation module is developed to adaptively synthesize a novel type of MMW image, which simultaneously contains pixel, depth, phase, and diverse signal-to-noise information to overcome the limitations of current MMW images containing consistent pixel information in all three channels. Furthermore, this module is capable of differentiable data enhancements to take into account adverse noise conditions in real application scenarios. In order to fully acquire the augmented contextual information mentioned above, we propose an asymptotic path aggregation network and combine it with YOLOv8. The proposed method is able to adaptively and bidirectionally fuse deep and shallow features while avoiding semantic gaps. In addition, a multi-view, multi-parameter mapping technique is designed to enhance the detection ability. The experiments on the measured MMW datasets validate the improvement in object detection using the proposed model.
... The accuracy rate demonstrates how much more effective the suggested strategy is than the standard approach. The performance of the proposed method is compared with that of the prior YOLO v3 [30], YOLO v5 [31] and YOLO v6 [32] methods (Fig. 6: results of experiments using the proposed method on the IDD dataset; Fig. 7: examples of the object detection results of the existing and proposed methods). ...
... The performance of the proposed technique, evaluated using specificity, accuracy, recall, precision, and F-measure, is contrasted with that of the existing methodologies in Table 2. By obtaining an accuracy of 98.99%, the proposed approach exceeds previous models such as the YOLO v3 [30], YOLO v5 [31] and YOLO v6 [32] methodologies. We have evaluated the performance of the existing methods (i.e., Med Glasses [22], smart glasses application [23], and Lid Sonic [25]) using the same dataset as our research work. ...
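The five metrics named in this snippet all derive from confusion-matrix counts. A minimal sketch of their standard definitions (not code from the cited work):

```python
# Standard detection/classification metrics from confusion-matrix counts:
# tp = true positives, fp = false positives,
# tn = true negatives, fn = false negatives.

def detection_metrics(tp, fp, tn, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                    # a.k.a. sensitivity
    specificity = tn / (tn + fp)               # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "accuracy": accuracy,
            "f1": f_measure}
```

Reporting all five together, as the snippet does, guards against a model that scores well on accuracy alone when classes are imbalanced.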
Article
Full-text available
Visually impaired or blind people need guidance to avoid collision risks with outdoor obstacles. Recently, technology has been proving its presence in all aspects of human life, and new devices provide assistance to humans on a daily basis. However, due to real-time dynamics or a lack of specialized knowledge, object detection confronts a reliability difficulty. To overcome this challenge, YOLO Glass, a video-based smart object detection model, has been proposed for visually impaired persons to navigate effectively in indoor and outdoor environments. Initially, the captured video is converted into key frames and pre-processed using a correlation fusion-based disparity approach. The pre-processed images are augmented to prevent overfitting of the trained model. The proposed method uses an obstacle detection system based on a Squeeze and Attendant Block YOLO Network model (SAB-YOLO). The proposed system assists visually impaired users in detecting multiple objects and their locations relative to their line of sight, and alerts them by providing audio messages via headphones. The system assists blind and visually impaired people in managing their daily tasks and navigating their surroundings. The experimental results show that the proposed system improves accuracy to 98.99%, proving that it can accurately identify objects. The detection accuracy of the proposed method is 5.15%, 7.15% and 9.7% better than that of the existing YOLO v6, YOLO v5 and YOLO v3, respectively.
... The YOLO network is superior in inspection speed as a single-stage target detection network. The YOLO series has experienced remarkable growth, with notable advancements, including the release of YOLOv5 [34], followed by YOLOv6 [35] and YOLOv7 [36], and the latest iteration, YOLOv8 [37]. In contrast to other iterations and versions, the YOLOv5 series models offer the flexibility to adjust the model size, resulting in a wide array of compact models derived from the foundational model. ...
Article
Full-text available
Accurate and efficient sorting of diverse magnetic tiles during manufacturing is vital. However, challenges arise due to visual similarities among types, necessitating complex computer vision algorithms with large sizes and high computational needs. This impedes cost-effective deployment in the industry, resulting in the continued use of inefficient manual sorting. To address this issue, we propose an innovative lightweight magnetic tile detection approach that improves knowledge distillation for a compressed YOLOv5s model. Incorporating spatial attention modules into different feature extraction stages of YOLOv5s during the knowledge distillation process can enhance the ability of the compressed model to learn the knowledge of intermediate feature extraction layers from the original large model at different stages. Combining different outputs to form a multi-scale output, the multi-scale output feature in the knowledge refinement process enhances the capacity of the compressed model to grasp comprehensive target knowledge in outputs. Experimental results on our self-built magnetic tile dataset demonstrate significant achievements: 0.988 mean average precision, 0.5% discrepancy compared to the teacher’s network, and an 85% model size reduction. Moreover, a 36.70% boost in inference speed is observed for single image analysis. Our method’s effectiveness is also validated by the Pascal VOC dataset results, showing potential for broader target detection scenarios. This approach offers a solution to magnetic tile target detection challenges while being expected to expand to other applications.
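At the core of the knowledge distillation described above is a soft-target loss: the compressed student is trained to match the teacher's temperature-softened class probabilities. The pure-Python toy below illustrates only that core idea; the paper's actual method additionally distills intermediate features through spatial attention modules and multi-scale outputs, which this sketch omits.

```python
# Illustrative soft-target distillation loss (Hinton-style): KL divergence
# between the teacher's and student's temperature-softened distributions.
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the temperature."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on the softened distributions."""
    p = softmax(teacher_logits, temperature)   # soft targets
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

A higher temperature flattens the teacher's distribution, exposing the "dark knowledge" in the relative scores of wrong classes, which is what lets a much smaller student approach the teacher's accuracy.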
... However, limitations included smoke classification and accuracy in poorly illuminated environments. Another piece of research suggested a transfer learning-based model using YOLOv6 for real-time object detection in embedded environments [26]. Pruning and finetuning algorithms improve accuracy and speed. ...
Article
Full-text available
Nowadays, wireless sensor networks (WSNs) have a significant and long-lasting impact on numerous fields that affect all facets of our lives, including governmental, civil, and military applications. WSNs contain sensor nodes linked together via wireless communication links that need to relay data instantly or subsequently. In this paper, we focus on unmanned aerial vehicle (UAV)-aided data collection in wireless sensor networks (WSNs), where multiple UAVs collect data from a group of sensors. The UAVs may face some static or moving obstacles (e.g., buildings, trees, static or moving vehicles) in their traveling path while collecting the data. In the proposed system, the UAV starts and ends the data collection tour at the base station, and, while collecting data, it captures images and videos using the UAV aerial camera. After processing the captured aerial images and videos, UAVs are trained using a YOLOv8-based model to detect obstacles in their traveling path. The detection results show that the proposed YOLOv8 model performs better than other baseline algorithms in different scenarios—the F1 score of YOLOv8 is 96% in 200 epochs.
... Extracted features feed into the network's neck and head sections, shouldering the majority of the computational load. To reconcile the opposing requirements of speed and accuracy often encountered in traditional multi-branch networks like ResNets and linear networks like VGG, YOLOv6 introduces reparameterized backbones [21]. This technique adjusts the network structure during training and inference. ...
... The smaller YOLOv6 models (nano, tiny, and small) harness reparameterized VGG networks (RepBlock) with skip connections for training, which transition to simple 3 × 3 convolutional (RepConv) blocks during inference [22]. Meanwhile, the medium and large YOLOv6 models deploy reparameterized versions of the CSP backbone, termed CSPStackRep, culminating in the EfficientRep backbone [21], [23]. ...
... In a significant departure from its predecessors, YOLOv6 debuts the Efficient Decoupled Head [21]. This unique architecture ensures that the classification and detection branches no longer share parameters, branching off independently from the backbone, which effectively reduces computational requirements while amplifying accuracy [20], [21]. ...
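The training-time multi-branch block and its inference-time fusion described in the snippets above can be sketched as follows. This is a single-channel toy version of RepVGG-style reparameterization that ignores batch-norm folding (which the real RepConv/EfficientRep blocks also perform): because convolution is linear, the 3 × 3 branch, the 1 × 1 branch, and the identity skip collapse into one equivalent 3 × 3 kernel.

```python
# Toy single-channel sketch of structural reparameterization:
# fuse parallel 3x3, 1x1, and identity branches into one 3x3 kernel.

def pad_1x1_to_3x3(w):
    """Embed a 1x1 kernel weight at the centre of a zero 3x3 kernel."""
    k = [[0.0] * 3 for _ in range(3)]
    k[1][1] = w
    return k

def identity_as_3x3():
    """Express the identity skip connection as a 3x3 kernel
    (centre tap = 1, all other taps = 0)."""
    k = [[0.0] * 3 for _ in range(3)]
    k[1][1] = 1.0
    return k

def merge_branches(k3x3, w1x1):
    """Sum the three branch kernels element-wise into the single
    inference-time 3x3 kernel."""
    k1 = pad_1x1_to_3x3(w1x1)
    ki = identity_as_3x3()
    return [[k3x3[i][j] + k1[i][j] + ki[i][j] for j in range(3)]
            for i in range(3)]
```

The fused kernel produces the same outputs as the three branches summed, so the multi-branch structure pays off during training while inference runs a plain 3 × 3 convolution.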
Article
Full-text available
This paper presents a comprehensive evaluation of various YOLO architectures for smoke and wildfire detection, including YOLOv5, YOLOv6, YOLOv7, YOLOv8, and YOLO-NAS. The study aims to assess their effectiveness in early detection of wildfires using the Foggia dataset, comprising 8,974 images specifically designed for this purpose. Performance evaluation employs metrics such as Recall, Precision, F1-score, and mean Average Precision to provide a holistic assessment of the models’ performance. The study follows a rigorous methodology involving fixed epochs, continuous performance tracking, and unbiased testing. Results show that YOLOv5, YOLOv7, and YOLOv8 exhibit a balanced performance across all metrics in both validation and testing. YOLOv6 performs slightly lower in recall during validation but achieves a good balance on testing. YOLO-NAS variants excel in recall, making them suitable for critical applications. However, precision performance is lower for YOLO-NAS models. Visual results demonstrate that the top-performing models accurately identify most instances in the test set. However, they struggle with distant scenes and poor lighting conditions, occasionally detecting false positives. In favorable conditions, the models perform well in identifying relevant instances. We conclude that no single model excels in all aspects of smoke and wildfire detection. The choice of model depends on specific application requirements, considering accuracy, recall, and inference time. This research contributes to the field of computer vision in smoke and wildfire detection, providing a foundation for improving detection systems and mitigating the impact of wildfires. Researchers can build upon these findings to propose modifications and enhance the effectiveness of wildfire detection systems.