Conference Paper

CSE-YOLOv5: A Lightweight Attention Guided YOLOv5 Network based on EIoU Loss

Article
Full-text available
Human image segmentation has been a practical and active research topic due to its wide range of potential applications. Previous studies have investigated manual, semi-automatic, and automatic methods for the semantic segmentation of human parts in real-world human analysis scenarios, but further research is still needed. This paper presents a novel semantic segmentation network, named TRCA-Net, for human image segmentation tasks. With TransUNet as the backbone, TRCA-Net incorporates Res2Net and Coordinate Attention to improve performance. Res2Net blocks and the Transformer obtain better feature maps by encoding the input images. The Coordinate Attention modules in the decoder aggregate and upsample the encoded feature maps and connect them to the high-resolution CNN feature maps to gain accurate segmentation. TRCA-Net enhances finer details by recovering local spatial information. We compare TRCA-Net with state-of-the-art (SOTA) semantic segmentation networks: the original U-Net, DeepLabv3+, and TransUNet. The experimental results demonstrate that our proposed TRCA-Net outperforms these networks.
Article
Full-text available
In order to enable a picking robot to detect and locate apples quickly and accurately in the natural orchard environment, we propose an apple object detection method based on Shufflenetv2-YOLOX. This method takes YOLOX-Tiny as the baseline and uses the lightweight network Shufflenetv2, augmented with the convolutional block attention module (CBAM), as the backbone. An adaptive spatial feature fusion (ASFF) module is added to the PANet network to improve detection accuracy, and only two extraction layers are used to simplify the network structure. The average precision (AP), precision, recall, and F1 score of the trained network on the validation set are 96.76%, 95.62%, 93.75%, and 0.95, respectively, and the detection speed reaches 65 frames per second (FPS). The test results show that the AP of Shufflenetv2-YOLOX is 6.24% higher than that of YOLOX-Tiny, and the detection speed is 18% faster. It also achieves better detection accuracy and speed than the advanced lightweight networks YOLOv5-s, Efficientdet-d0, YOLOv4-Tiny, and Mobilenet-YOLOv4-Lite. Meanwhile, the half-precision floating-point (FP16) model on the embedded device Jetson Nano with TensorRT acceleration reaches 26.3 FPS. This method can provide an effective solution for the vision system of an apple-picking robot.
Article
Full-text available
After the revival of deep learning in computer vision in 2012, SAR ship detection entered the deep learning era as well. Deep learning-based computer vision algorithms work in an end-to-end pipeline, without the need to design features manually, and achieve excellent performance; as a result, they are also used to detect ships in SAR images. This direction began with the paper we published at 2017 BIGSARDATA, in which the first dataset, SSDD, was used and shared with peers. Since then, many researchers have focused their attention on this field. In this paper, we analyze the past, present, and future of deep learning-based ship detection algorithms in SAR images. In the past section, we analyze the difference between traditional CFAR (constant false alarm rate)-based and deep learning-based detectors through theory and experiment: the traditional method is unsupervised while deep learning is strongly supervised, and their performance differs several-fold. In the present section, we analyze the 177 published papers on SAR ship detection, highlighting the dataset, algorithm, performance, deep learning framework, country, timeline, etc. After that, we describe in detail the use of single-stage, two-stage, anchor-free, train-from-scratch, oriented-bounding-box, multi-scale, and real-time detectors in the 177 papers, and analyze the trade-offs between speed and accuracy. In the future section, we list the open problems and directions of this field. We find that, over the past five years, AP50 on SSDD has risen from 78.8% in 2017 to 97.8% in 2022. Additionally, we believe that researchers should design algorithms according to the specific characteristics of SAR images. What we should do next is to bridge the gap between SAR ship detection and computer vision by merging the small datasets into a large one and formulating corresponding standards and benchmarks. We expect that this survey of 177 papers can help people better understand these algorithms and stimulate more research in this field.
Article
Full-text available
Intelligent crack detection is an important guarantee for intelligent operation and maintenance, and it is of great significance to traffic safety. In recent years, the recognition of road pavement cracks based on computer vision has attracted increasing attention. With the recent breakthroughs of general deep learning algorithms, detection algorithms based on deep learning and convolutional neural networks have achieved strong results in the field of crack recognition. In this paper, deep learning is applied to intelligently detect road cracks, and Faster R-CNN and Mask R-CNN are compared and analyzed. The results show that the joint training strategy is very effective: both Faster R-CNN and Mask R-CNN complete the crack detection task when trained with only 130+ images, and both can outperform YOLOv3. However, the joint training strategy degrades the quality of the bounding boxes detected by Mask R-CNN.
Article
Full-text available
Due to advances in remote sensing satellite imaging and image processing technologies and their wide applications, intelligent remote sensing satellites are facing an opportunity for rapid development. The key technologies, standards, and laws of intelligent remote sensing satellites are also experiencing a series of new challenges. Novel concepts and key technologies in the intelligent hyperspectral remote sensing satellite system have been proposed since 2011. The aim of these intelligent remote sensing satellites is to provide real-time, accurate, and personalized remote sensing information services. This article reviews the current developments in new-generation intelligent remote sensing satellite systems, with a focus on intelligent remote sensing satellite platforms, imaging payloads, onboard processing systems, and other key technological chains. The technological breakthroughs and current defects of intelligence-oriented designs are also analyzed. Intelligent remote sensing satellites collect personalized remote sensing data and information, with real-time data features and information interaction between remote sensing satellites or between satellites and the ground. Such developments will expand the use of remote sensing applications beyond government departments and industrial users to a massive number of individual users. However, this extension faces challenges regarding privacy protection, societal values, and laws regarding the sharing and distribution of data and information.
Article
Full-text available
Due to its great application value in the military and civilian fields, ship detection in synthetic aperture radar (SAR) images has always attracted much attention. However, ship targets in high-resolution (HR) SAR images exhibit multi-scale, arbitrarily oriented, and densely arranged characteristics, posing enormous challenges for detecting ships quickly and accurately. To address these issues, a novel YOLO-based arbitrary-oriented SAR ship detector using bi-directional feature fusion and angular classification (BiFA-YOLO) is proposed in this article. First, a novel bi-directional feature fusion module (Bi-DFFM) tailored to SAR ship detection is applied to the YOLO framework. This module efficiently aggregates multi-scale features through bi-directional (top-down and bottom-up) information interaction, which helps detect multi-scale ships. Second, to effectively detect arbitrarily oriented and densely arranged ships in HR SAR images, we add an angular classification structure to the head network. This structure accurately obtains ships' angle information without the problems of boundary discontinuity and complicated parameter regression. Meanwhile, in BiFA-YOLO, a random rotation mosaic data augmentation method is employed to suppress the impact of angle imbalance; compared with conventional data augmentation methods, it better improves the detection performance on arbitrarily oriented ships. Finally, we conduct extensive experiments on the SAR ship detection dataset (SSDD) and large-scene HR SAR images from the GF-3 satellite to verify our method. The proposed method reaches precision = 94.85%, recall = 93.97%, average precision = 93.90%, and F1-score = 0.9441 on SSDD, with a detection speed of approximately 13.3 ms per 512 × 512 image. In addition, comparison experiments with other deep learning-based methods and verification experiments on large-scene HR SAR images demonstrate that our method shows strong robustness and adaptability.
Article
Full-text available
SAR Ship Detection Dataset (SSDD) is the first open dataset widely used to research state-of-the-art ship detection from synthetic aperture radar (SAR) imagery based on deep learning (DL). According to our investigation, up to 46.59% of the 161 public reports confidently select SSDD to study DL-based SAR ship detection. Undoubtedly, this reveals the popularity and great influence of SSDD in the SAR remote sensing community. Nevertheless, the coarse annotations and ambiguous usage standards of its initial version hinder fair methodological comparisons and effective academic exchanges. Additionally, its single-function horizontal-vertical rectangle bounding box (BBox) labels can no longer satisfy the current research needs of the rotatable bounding box (RBox) task and the pixel-level polygon segmentation task. Therefore, to address these two dilemmas, in this review, advocated by the publisher of SSDD, we make an official release of SSDD based on its initial version. SSDD's official release covers three types: (1) a bounding box SSDD (BBox-SSDD), (2) a rotatable bounding box SSDD (RBox-SSDD), and (3) a polygon segmentation SSDD (PSeg-SSDD). We relabel the ships in SSDD more carefully and finely, and then explicitly formulate strict usage standards, e.g., (1) the training-test division, (2) the inshore-offshore protocol, (3) a reasonable definition of ship size, (4) the determination of densely distributed small ship samples, and (5) the determination of ship samples densely berthed in parallel at ports. These usage standards are formulated objectively based on the usage differences among the existing 75 (161 × 46.59%) public reports, and they will be beneficial for fair method comparison and effective academic exchange in the future. Most notably, we conduct a comprehensive data analysis on BBox-SSDD, RBox-SSDD, and PSeg-SSDD. Our analysis results can provide valuable suggestions for future scholars to design DL-based SAR ship detectors with higher accuracy and stronger robustness when using SSDD.
Article
Full-text available
Deep learning-based object detection and instance segmentation have achieved unprecedented progress. In this article, we propose complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding-box regression and non-maximum suppression (NMS), leading to notable gains of average precision (AP) and average recall (AR), without the sacrifice of inference efficiency. In particular, we consider three geometric factors, that is: 1) overlap area; 2) normalized central-point distance; and 3) aspect ratio, which are crucial for measuring bounding-box regression in object detection and instance segmentation. The three geometric factors are then incorporated into CIoU loss for better distinguishing difficult regression cases. The training of deep models using CIoU loss results in consistent AP and AR improvements in comparison to widely adopted ℓn-norm loss and IoU-based loss. Furthermore, we propose Cluster-NMS, where NMS during inference is done by implicitly clustering detected boxes and usually requires fewer iterations. Cluster-NMS is very efficient due to its pure GPU implementation, and geometric factors can be incorporated to improve both AP and AR. In the experiments, CIoU loss and Cluster-NMS have been applied to state-of-the-art instance segmentation (e.g., YOLACT and BlendMask-RT) and object detection (e.g., YOLO v3, SSD, and Faster R-CNN) models. Taking YOLACT on MS COCO as an example, our method achieves performance gains of +1.7 AP and +6.2 AR100 for object detection, and +1.1 AP and +3.5 AR100 for instance segmentation, with 27.1 FPS on one NVIDIA GTX 1080Ti GPU. All the source code and trained models are available at https://github.com/Zzh-tju/CIoU .
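The three geometric factors described above (overlap area, normalized central-point distance, and aspect-ratio consistency) can be sketched in a few lines of plain Python. This is a minimal illustration of the CIoU formulation, not the authors' GPU implementation; the function name `ciou_loss` and the (x1, y1, x2, y2) box format are assumptions made for the sketch:

```python
import math

def ciou_loss(box1, box2, eps=1e-9):
    """CIoU loss between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Factor 1: overlap area (the IoU term)
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    union = w1 * h1 + w2 * h2 - inter
    iou = inter / (union + eps)

    # Factor 2: squared distance between centers, normalized by the
    # squared diagonal of the smallest enclosing box
    cx1, cy1 = (box1[0] + box1[2]) / 2, (box1[1] + box1[3]) / 2
    cx2, cy2 = (box2[0] + box2[2]) / 2, (box2[1] + box2[3]) / 2
    ex1, ey1 = min(box1[0], box2[0]), min(box1[1], box2[1])
    ex2, ey2 = max(box1[2], box2[2]), max(box1[3], box2[3])
    rho2 = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    # Factor 3: aspect-ratio consistency term with its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(w2 / (h2 + eps))
                              - math.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v

print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # ~0 for identical boxes
print(ciou_loss((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.97 for a half-shifted box
```

Identical boxes give a loss near zero; as the IoU drops or the centers drift apart, the distance and aspect-ratio penalties grow, which is what lets CIoU separate the difficult regression cases mentioned above.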
Article
Full-text available
Ship detection in synthetic aperture radar (SAR) images is becoming a research hotspot. In recent years, with the rise of artificial intelligence, deep learning has come to dominate the SAR ship detection community for its higher accuracy, faster speed, less human intervention, etc. However, there is still a lack of a reliable deep learning SAR ship detection dataset that can meet the practical needs of transferring ship detection to large-scene spaceborne SAR images. To solve this problem, this paper releases a Large-Scale SAR Ship Detection Dataset-v1.0 (LS-SSDD-v1.0) from Sentinel-1, for small ship detection under large-scale backgrounds. LS-SSDD-v1.0 contains 15 large-scale SAR images whose ground truths are correctly labeled by SAR experts with support from the Automatic Identification System (AIS) and Google Earth. To facilitate network training, the large-scale images are directly cut into 9000 sub-images without bells and whistles, providing convenience for presenting detection results on large-scale SAR images. Notably, LS-SSDD-v1.0 has five advantages: (1) large-scale backgrounds, (2) small ship detection, (3) abundant pure backgrounds, (4) a fully automatic detection flow, and (5) numerous standardized research baselines. Last but not least, exploiting the abundant pure backgrounds, we also propose a Pure Background Hybrid Training mechanism (PBHT-mechanism) to suppress false alarms from land in large-scale SAR images. Ablation experiments verify the effectiveness of the PBHT-mechanism. LS-SSDD-v1.0 can inspire related scholars to research SAR ship detection methods with engineering application value, which is conducive to the progress of SAR intelligent interpretation technology.
Article
Full-text available
Convolutional neural networks have shown a strong capability to improve performance in vehicle detection, which is one of the main research topics in intelligent transportation systems. Aiming to detect blocked (occluded) vehicles efficiently in actual traffic scenes, we propose a novel convolutional neural network based on multi-target corner pooling layers. The hourglass network, which can extract local and global information of the vehicles in images simultaneously, is chosen as the backbone to provide vehicle features. Instead of a max pooling layer, the proposed multi-target corner pooling (MTCP) layer is used to generate the vehicles' corners, and to complete the blocked corners that cannot be generated by MTCP, a novel corner-matching method is adopted in the network. The proposed network can therefore detect blocked vehicles accurately. Experiments demonstrate that the proposed network achieves an AP of 43.5% on the MS COCO dataset and a precision of 93.6% on traffic videos, outperforming several existing detectors.
Article
Full-text available
Ship detection by Unmanned Airborne Vehicles (UAVs) and satellites plays an important role in a spectrum of related military and civil applications. To improve detection efficiency, accuracy, and speed, a novel coarse-to-fine ship detection method is presented. Ship targets are viewed as uncommon regions in the sea background caused by differences in colors, textures, shapes, or other factors. Inspired by this fact, a global saliency model is constructed based on the high-frequency coefficients of a multi-scale, multi-direction wavelet decomposition, which can characterize feature information from edge to texture of the input image. To further reduce false alarms, a new and effective multi-level discrimination method is designed based on improved entropy and pixel distribution, which is robust against the interference introduced by islands, coastlines, clouds, and shadows. Experimental results on optical remote sensing images validate that the presented saliency model outperforms the comparative models in terms of the area under the receiver operating characteristic curve (AUC) score and accuracy on images of different sizes. After target identification, the locations and number of ships of various sizes and colors can be detected accurately and quickly with high robustness.
Article
Deep learning algorithms have been widely utilized for synthetic aperture radar (SAR) target detection. Nevertheless, traditional feature extraction methods and deep learning methods achieve improved ship detection accuracy at the cost of increased complexity and lower detection speed. As detection speed is also meaningful, especially in real-time maritime rescue and emergency military decision-making applications, we propose a new faster region-based convolutional neural network (Faster R-CNN) detection framework to handle this problem. A new lightweight base network with feature relay amplification and a multiscale feature jump-connection structure is designed to extract the features of targets at each scale in SAR images, improving the recognition and localization network. Moreover, the K-Means method is used to obtain the distribution of target scales, which enables selecting more appropriate preset anchor boxes and reduces the difficulty of network learning. Finally, RoIAlign is used instead of region of interest (RoI) pooling to reduce the quantization error during localization. Experimental results show that the proposed method achieves 0.898 average precision (AP), which is 2.78% better than the conventional Faster R-CNN, with 800% faster detection speed.
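The K-Means anchor-selection step described above can be illustrated with a small sketch that clusters ground-truth (width, height) pairs using 1 − IoU as the distance metric, as is common in YOLO-style pipelines. The function names and the toy box sizes below are invented for illustration and are not taken from the paper:

```python
import random

def iou_wh(box, cluster):
    """IoU between two (w, h) pairs, assuming boxes share a top-left corner."""
    inter = min(box[0], cluster[0]) * min(box[1], cluster[1])
    return inter / (box[0] * box[1] + cluster[0] * cluster[1] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs with 1 - IoU as the distance metric."""
    random.seed(seed)
    centers = random.sample(boxes, k)
    for _ in range(iters):
        # Assign each box to the nearest cluster (highest IoU)
        groups = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            groups[best].append(b)
        # Recompute each cluster center as the mean width/height
        new_centers = []
        for i, g in enumerate(groups):
            if g:
                new_centers.append((sum(b[0] for b in g) / len(g),
                                    sum(b[1] for b in g) / len(g)))
            else:
                new_centers.append(centers[i])
        if new_centers == centers:   # converged
            break
        centers = new_centers
    return sorted(centers)

# Toy ground-truth box sizes: three small ships and three large ones
boxes = [(8, 6), (10, 8), (9, 7), (40, 30), (44, 34), (38, 28)]
print(kmeans_anchors(boxes, k=2))
```

With two clusters, the small and large box sizes separate into two anchor presets, which is exactly the effect the paper relies on to ease network learning.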
Chapter
Although two-stage object detectors have continuously advanced the state-of-the-art performance in recent years, the training process itself is far from crystal clear. In this work, we first point out the inconsistency between the fixed network settings and the dynamic training procedure, which greatly affects performance. For example, a fixed label assignment strategy and regression loss function cannot fit the changing distribution of proposals and are thus harmful to training high-quality detectors. Consequently, we propose Dynamic R-CNN to adjust the label assignment criterion (IoU threshold) and the shape of the regression loss function (the parameters of SmoothL1 loss) automatically, based on the statistics of proposals during training. This dynamic design makes better use of the training samples and pushes the detector to fit more high-quality samples. Specifically, our method improves upon a ResNet-50-FPN baseline by 1.9% AP and 5.5% AP90 on the MS COCO dataset with no extra overhead. Codes and models are available at https://github.com/hkzhang95/DynamicRCNN.
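The core idea of deriving the label-assignment threshold from proposal statistics can be shown with a toy sketch. Note this is a deliberately simplified variant, not the paper's exact update rule: here the threshold is just the mean of the top-k proposal IoUs, clamped to a floor, so that the criterion tightens as proposal quality improves during training:

```python
def dynamic_iou_threshold(proposal_ious, k, floor=0.5):
    """Derive the positive-label IoU threshold from current proposal quality:
    the mean of the top-k IoUs, never allowed below a fixed floor."""
    top = sorted(proposal_ious, reverse=True)[:k]
    return max(floor, sum(top) / len(top))

# Early in training, proposals are poor, so the floor applies
print(dynamic_iou_threshold([0.3, 0.2, 0.1], k=2))        # 0.5
# Later, proposals improve and the threshold rises with them
print(dynamic_iou_threshold([0.9, 0.8, 0.3, 0.2], k=2))   # 0.85
```

A rising threshold forces the second stage to train on progressively higher-quality samples, which is the behavior the paper's dynamic label assignment is designed to achieve.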
Article
Small ship detection is an important topic in autonomous ship technology and plays an essential role in shipping safety. Since traditional object detection techniques based on the shipborne radar are not qualified for the task of near and small ship detection, deep learning-based image recognition methods based on video surveillance systems can be naturally utilized on autonomous vessels to effectively detect near and small ships. However, a limited number of real-world samples of small ships may fail to train a learning method that can accurately detect small ships in most cases. To address this, a novel hybrid deep learning method that combines a modified Generative Adversarial Network (GAN) and a Convolutional Neural Network (CNN)-based detection approach is proposed for small ship detection. Specifically, a Gaussian Mixture Wasserstein GAN with Gradient Penalty is utilized to first directly generate sufficient informative artificial samples of small ships based on the zero-sum game between a generator and a discriminator, and then an improved CNN-based real-time detection method is trained on both the original and the generated data for accurate small ship detection. Experimental results show that the proposed deep learning method (a) is competent to generate sufficient informative small ship samples and (b) can obtain significantly improved and robust results of small ship detection. The results also indicate that the proposed method can be effectively applied to ensuring autonomous ship safety.
Chapter
We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolutional neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves a 42.1% AP on MS COCO, outperforming all existing one-stage detectors.
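Corner pooling can be sketched for the top-left case: each position takes the maximum over all features to its right plus the maximum over all features below it, so evidence for a corner accumulates even when the object boundary is far from the corner location. This is a simplified pure-Python illustration on a single-channel map; the actual CornerNet layer operates on multi-channel feature tensors:

```python
def corner_pool_top_left(fmap):
    """Top-left corner pooling on a 2D feature map (list of lists):
    sum of a rightward max scan and a downward max scan per position."""
    h, w = len(fmap), len(fmap[0])
    # Horizontal pass: max over (i, j)..(i, w-1), computed right to left
    horiz = [row[:] for row in fmap]
    for i in range(h):
        for j in range(w - 2, -1, -1):
            horiz[i][j] = max(horiz[i][j], horiz[i][j + 1])
    # Vertical pass: max over (i, j)..(h-1, j), computed bottom to top
    vert = [row[:] for row in fmap]
    for i in range(h - 2, -1, -1):
        for j in range(w):
            vert[i][j] = max(vert[i][j], vert[i + 1][j])
    # Element-wise sum of the two directional max maps
    return [[horiz[i][j] + vert[i][j] for j in range(w)] for i in range(h)]

fmap = [[0, 1, 0],
        [2, 0, 0],
        [0, 0, 3]]
print(corner_pool_top_left(fmap))  # [[3, 2, 3], [4, 0, 3], [3, 3, 6]]
```

The bottom-right corner version is symmetric: it scans left to right and top to bottom instead.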
Article
We present YOLO, a unified pipeline for object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is also extremely fast; YOLO processes images in real time at 45 frames per second, hundreds to thousands of times faster than existing detection systems. Our system uses global image context to detect and localize objects, making it less prone to background errors than top detection systems like R-CNN. By itself, YOLO detects objects at unprecedented speeds with moderate accuracy. When combined with state-of-the-art detectors, YOLO boosts performance by 2-3 mAP points.
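Framing detection as regression means each grid cell directly emits box coordinates and a confidence that are decoded in a single pass over the network output. The sketch below uses an invented minimal layout (one 5-tuple per cell, no class probabilities, names like `decode_grid` chosen for the example) purely to illustrate the decoding step, not YOLO's exact output format:

```python
def decode_grid(pred, img_size, conf_thresh=0.5):
    """Decode a grid of per-cell predictions into image-space boxes.

    pred[i][j] = (cx, cy, w, h, conf), with cx, cy in [0, 1] relative to
    cell (i, j), and w, h relative to the whole image."""
    s = len(pred)              # grid is s x s cells
    cell = img_size / s
    boxes = []
    for i in range(s):
        for j in range(s):
            cx, cy, w, h, conf = pred[i][j]
            if conf < conf_thresh:
                continue       # suppress low-confidence cells
            # Cell-relative center -> absolute image coordinates
            x = (j + cx) * cell
            y = (i + cy) * cell
            bw, bh = w * img_size, h * img_size
            boxes.append((x - bw / 2, y - bh / 2, x + bw / 2, y + bh / 2, conf))
    return boxes

# 2x2 grid over a 100-px image; only the bottom-right cell is confident
empty = (0.0, 0.0, 0.0, 0.0, 0.0)
pred = [[empty, empty],
        [empty, (0.5, 0.5, 0.4, 0.2, 0.9)]]
print(decode_grid(pred, img_size=100))  # [(55.0, 65.0, 95.0, 85.0, 0.9)]
```

Because all cells are decoded in one sweep over a single network output, the whole pipeline stays end-to-end differentiable and fast, which is the property the abstract emphasizes.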
Distance-IoU loss: faster and better learning for bounding box regression
  Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, D. Ren