Figure - available from: Remote Sensing
Patches from the DOTA test set; (a) cropped, (b) very big, (c) very small, (d) complex background, (e) illuminance effect, and (f) panchromatic samples.

Source publication
Article
Full-text available
Object detection from satellite images has been a challenging problem for many years. With the development of effective deep learning algorithms and advancement in hardware systems, higher accuracies have been achieved in the detection of various objects from very high-resolution (VHR) satellite images. This article provides a comparative evaluatio...

Similar publications

Article
Full-text available
Deep neural networks (DNNs) require a large amount of manually labeled training data to make significant achievements. However, manual labeling is laborious and costly. In this study, we propose a method for automatically generating training data and effectively using the generated data to reduce the labeling cost. The generated data (called "machi...

Citations

... During the training process, the detector compares the detected objects (predicted bounding boxes) with the ground-truth bounding boxes (GTBs) using the intersection over union (IoU) at each iteration and updates its parameters accordingly. It is widely accepted that an IoU >= 0.5, meaning a predicted bounding box overlaps a GTB by at least 50%, indicates a correct prediction (Alganci et al., 2020). In this study, the threshold value for IoU was set to 0.5 for both datasets. ...
... While precision and recall are valuable metrics, they individually do not provide a comprehensive assessment of a detector's performance (Alganci et al., 2020). However, their harmonic mean, known as the F1 score, offers a more robust measure of the detector's effectiveness. ...
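The IoU matching criterion and the F1 score described in these excerpts can be sketched in a few lines; this is a minimal illustration, and the [x1, y1, x2, y2] box format is an assumption, not the cited paper's exact convention:

```python
def iou(box_a, box_b):
    # Boxes as [x1, y1, x2, y2]; returns intersection-over-union in [0, 1].
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# A prediction counts as correct only when it overlaps a ground-truth
# box with IoU >= 0.5; this pair falls short of that threshold.
print(iou([0, 0, 10, 10], [5, 0, 15, 10]))  # ≈ 0.333
```

Because the F1 score is a harmonic mean, it is dragged down by whichever of precision or recall is weaker, which is why it is a more robust single-number summary than either metric alone.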
Article
Full-text available
Most small rodent populations worldwide exhibit fascinating population dynamics, capturing the attention of numerous scholars due to their multiyear cyclic fluctuations in population size and the astonishing amplitude of these fluctuations. The Hulunbuir steppe stands as a crucial global hub for livestock production, yet in recent decades the area has faced recurring challenges from steppe rodent invasions, with Brandt’s vole (Lasiopodomys brandtii, BV) being particularly rampant among them. BVs not only reproduce seasonally but also display strong social behavior, and they are generally considered pests, especially during population outbreak years. Prior studies suggest that BV population outbreaks tend to occur across a wide geographic area, and a strong indicator for identifying rodent outbreaks is recognizing their burrow clusters (burrow systems). Hence, this paper conducts object detection of BV burrow clusters in the typical steppes of Hulunbuir using two GF-2 satellite images from 2021 (the year of the BV outbreak). This task is accomplished by combining the Faster R-CNN model with three detection approaches: object-based image classification (OBIC), vegetation-index-based classification (BVIC), and texture-based classification (BTC). The results indicate that OBIC demonstrated the highest robustness in BV burrow cluster detection, achieving an average AP of 63.80% and an F1 score of 0.722 across the two images. BTC exhibited the second-highest level of accuracy, achieving an average AP of 55.95% and an F1 score of 0.6660, and displayed strong performance in BV burrow cluster localization. In contrast, BVIC achieved the lowest level of accuracy among the three methods, with an average AP of only 29.45% and an F1 score of 0.4370. Overall, this study demonstrates the crucial role of high-resolution satellite imagery combined with DL-based object detection techniques in effectively monitoring and managing potential outbreaks of steppe rodent pests across larger spatial extents.
... In this study, a process of accuracy evaluation was carried out to authenticate the generated image classifications and minimize errors in digital imagery (Alganci et al., 2020). Two techniques were utilised to assess accuracy: the error matrix and the kappa coefficient (Feizizadeh et al., 2022; Thien and Phuong, 2023). ...
... Landsat satellite images (TM and OLI/TIRS) from 1992, 2010, and 2022 were classified using the supervised classification method with four LULC classes: vegetation, barren land, built-up area, and bodies of water (Figure 3). The area of each LULC class is also presented in Table 2. Accuracy assessment is important to confirm the correctness of the generated image classifications (Alganci et al., 2020; Hussain et al., 2020; Thien and Phuong, 2023). ...
Article
Full-text available
This study utilised remote sensing data and ArcGIS 10.8 software to evaluate changes in land use and land cover (LULC) and their effects on land surface temperature (LST) in Hai Duong Province, Vietnam, from 1992 to 2022. Landsat satellite data were pre-processed and classified using supervised methods for the years 1992, 2010, and 2022. In 1992, vegetation cover accounted for 57.89% of land cover, increasing to 84.49% in 2010, but then decreasing again to 66.67% in 2022. In contrast, the built-up area consistently increased, from 2.88% in 1992 to 29.35% in 2022, as most of the barren land present in 1992 became built-up area in 2022. The LST values were calculated from the thermal bands for the years 1992, 2010, and 2022 and ranged from 16.09°C to 34.27°C, 17.04°C to 36.74°C, and 11.03°C to 28.44°C, respectively. In addition, the Normalized Difference Vegetation Index (NDVI) values were calculated using the near-infrared band and the red band, with values ranging from -0.40 to 0.70 over the study period. A linear regression analysis indicated a shift in the correlation between NDVI and LST from positive to negative. This study highlights the significant transformation that occurred in Hai Duong Province due to rapid population density increases, urban growth and infrastructure development, leading to a decline in greenery. These LULC changes can cause severe environmental damage. These research findings will assist policymakers in formulating management strategies and sustainable land-use plans to minimize potential harm and promote sustainable development in the area.
... The number of frames processed increases with the video length. As a result, Faster RCNN takes longer to compute than YOLOv3 and YOLOv4 (Alganci et al., 2020). However, Faster RCNN achieves higher accuracy than YOLO (Dixit et al., 2019). ...
Article
Malaysia ranks third among ASEAN countries in terms of deaths due to accidents, with an alarming increase in the number of fatalities each year. Road conditions contribute significantly to near-miss incidents, while the inefficiency of installed CCTVs and the lack of monitoring system algorithms worsen the situation. The objective of this research is to address the issue of increasing accidents and fatalities on Malaysian roads. Specifically, the study aims to investigate the use of video technology and machine learning algorithms for car detection and the analysis of near-miss accidents. To achieve this goal, the researchers focused on Penang, where the MBPP has deployed 1841 CCTV cameras to monitor traffic and document near-miss accidents. The study utilised the YOLOv3, YOLOv4, and Faster RCNN algorithms for vehicle detection. Additionally, the study employed image processing techniques such as Bird’s Eye View and Social Distancing Monitoring to detect and analyse how near misses occur. Various video lengths (20s, 40s, 60s and 80s) were tested to compare the algorithms’ error detection percentage and test duration. The results indicate that Faster RCNN beats YOLOv3 and YOLOv4 in car detection with a low error-detection rate, whereas YOLOv3 and YOLOv4 perform better in near-miss detection, a task Faster RCNN does not support. Overall, this study demonstrates the potential of video technology and machine learning algorithms in near-miss accident detection and analysis. Transportation authorities can better understand the causes of accidents and take appropriate measures to improve road safety using these models. This research can be a foundation for further traffic safety and accident prevention studies.
... In this section, two main experiments are implemented using Faster RCNN with ResNet-50 as the foundation CNN. Faster RCNN has been shown to be a better framework for detecting aircraft and is well suited to real-world situations with little training data (Alganci et al., 2020; Azam et al., 2022). In the first experiment, the Faster RCNN model is trained on the aerial set, as shown in Figure 5. In the second experiment, Faster RCNN is trained on the GF-2 satellite set, as shown in Figure 6. ...
... Each image ranges in size from 800 × 800 to 20,000 × 20,000 pixels and contains objects exhibiting a wide variety of scales, orientations, and shapes (Alganci et al., 2020). Fine-tuning of the parameters and extending the training set with the DOTA dataset are also applied. ...
... The results indicate that the distribution of objects in the datasets, in addition to the proposed anchor sizes, could improve the accuracy of the GF-2 satellite-trained model. The aerial-trained model achieved accuracies of 0.898, 0.977, 0.506, and 0.656 on the aerial, GF-2, JL-1, and Pleiades satellite test sets, respectively, while the GF-2-trained model achieved 0.68, 0.96, 0.394, and 0.484 on the same test sets. Both experiments gave higher accuracy than the DOTA-trained model, which scored 0.717 and 0.364 on the DOTA and Pleiades satellite test sets, respectively (Alganci et al., 2020). Hence, the aerial- and GF-2-trained models are more likely to perform well in aerial and satellite tests than the DOTA-trained model. Finally, this section presents some sample images from the two experiments. ...
Conference Paper
Full-text available
Object detection in remote sensing imagery plays an important role in many applications, such as tracking and change detection. With the development of deep learning algorithms and advancement in hardware systems, improved accuracies have been achieved in the detection of various objects from remote sensing images. However, object detection across heterogeneous remote sensing imagery remains an important issue, particularly for satellite and aerial imagery. The colour variation for the same ground objects, variable resolutions, different platform heights, the parallax effect, and image distortion brought on by diverse shooting angles are the biggest hurdles in satellite-aerial detection applications. The research aims to obtain a successful model for detecting aircraft from satellite and aerial images and to reduce the cost and the revisit-time gap between sensors. The networks were tested using aerial, GF-2, Jilin-1 (JL-1) and Pleiades satellite test sets after being trained individually using the RGB high-resolution aerial set and the panchromatic low-resolution GF-2 satellite set to validate the efficiency of the trained models. The aerial-trained model and the GF-2 satellite-trained model, as dedicated models, were also compared with each other and with a model trained on the full Dataset for Object Detection in Aerial Images (DOTA). It is observed that the anchor sizes and augmentation methods can enhance the performance of detection models. The k-means algorithm was applied to produce better anchor box selection, and data augmentation was used to avoid overfitting and problems caused by atmospheric conditions. The accuracy assessment results demonstrate that the aerial-trained model outperforms the GF-2 satellite-trained model. In addition, the results of the two dedicated detection models show improved accuracy compared to the DOTA-trained model.
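The k-means anchor selection mentioned above clusters ground-truth box dimensions so that the detector starts from anchors representative of the objects in the data. A minimal sketch, assuming a YOLO-style similarity based on IoU between width/height pairs and toy box sizes (not the paper's actual data):

```python
import random

def kmeans_anchors(wh, k, iters=100, seed=0):
    # Cluster (width, height) pairs using IoU between boxes that are
    # imagined to share a corner, as in YOLO-style anchor selection.
    def iou_wh(a, b):
        inter = min(a[0], b[0]) * min(a[1], b[1])
        return inter / (a[0] * a[1] + b[0] * b[1] - inter)

    rng = random.Random(seed)
    centers = rng.sample(wh, k)
    for _ in range(iters):
        # Assign each box to the center it overlaps most.
        clusters = [[] for _ in range(k)]
        for box in wh:
            best = max(range(k), key=lambda i: iou_wh(box, centers[i]))
            clusters[best].append(box)
        # Recompute each center as the mean (w, h) of its cluster.
        centers = [
            (sum(b[0] for b in c) / len(c), sum(b[1] for b in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers)

# Toy ground-truth sizes: small vehicle-like boxes vs. large aircraft-like ones.
boxes = [(10, 12), (12, 10), (11, 11), (90, 100), (100, 95), (95, 105)]
print(kmeans_anchors(boxes, k=2))  # one small anchor, one large anchor
```

The resulting cluster centers become the anchor sizes fed to the region proposal network, so proposals start close to the true object shapes.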
... Recent advancements have seen the development of specialized deep learning architectures tailored for aerial imagery. These models incorporate modifications to handle the unique challenges posed by aerial data, such as multi-scale object detection mechanisms, robustness to varying lighting conditions, and the ability to handle large-scale datasets efficiently [5]. The integration of attention mechanisms and region proposal networks has further enhanced the precision and recall rates of these models. ...
Article
Full-text available
In the field of aerial remote sensing, detecting small objects in aerial images is challenging. Their subtle presence against broad backgrounds, combined with environmental complexities and low image resolution, complicates identification. While their detection is crucial for urban planning, traffic monitoring, and military reconnaissance, many deep learning approaches demand significant computational resources, hindering real-time applications. To improve the accuracy of small object detection in aerial imagery while meeting real-time requirements, we introduce SenseLite, a lightweight and efficient model tailored for aerial image object detection. First, we restructured the YOLOv5 model into a more streamlined architecture. In the backbone, we replaced the original structure with the lightweight Involution operator, enhancing contextual semantics and weight distribution. For the neck, we incorporated GSConv and slim-Neck, striking a balance between reduced computational complexity and performance, which is ideal for rapid predictions. Additionally, to enhance detection accuracy, we integrated a squeeze-and-excitation (SE) mechanism to amplify channel communication. Finally, the Soft-NMS strategy was employed to manage overlapping targets, ensuring precise concurrent detections. Performance-wise, SenseLite reduces parameters by 30.5%, from 7.05 M to 4.9 M, and computational demands, with GFLOPs decreasing from 15.9 to 11.2. It surpasses the original YOLOv5, showing a 5.5% mAP0.5 improvement, 0.9% higher precision, and 1.4% better recall on the DOTA dataset. Compared to other leading methods, SenseLite stands out in terms of performance.
... As image processing technology, sensors, and data storage capabilities continue to advance, the acquisition of high-resolution (HR) remote sensing images has become more common and feasible [1]. HR remote sensing images refer to image data with corresponding spatial resolutions acquired by remote sensing platforms, such as satellites, aviation, or unmanned aerial vehicles. ...
... In recent years, attention mechanisms have been widely adopted in the field of computer vision. There are two ways of modeling attention mechanisms: (1) one is to use global information to obtain attention weights that enhance key local areas or channels, without modeling the dependencies within the global information. SE-Block [19] represents a classical approach of this kind, aiming to explicitly establish interdependencies between feature channels. ...
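The squeeze-and-excitation idea referenced here (global pooling per channel, a small bottleneck, then sigmoid gating) can be sketched in plain Python; the random weights and tiny tensor sizes below are illustrative stand-ins, not trained SE-Block values:

```python
import math
import random

def se_block(x, w1, w2):
    # x: feature map as nested lists with shape (C, H, W).
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Squeeze: global average pooling per channel -> vector of length C.
    z = [sum(sum(row) for row in ch) / (H * W) for ch in x]
    # Excitation: bottleneck FC with ReLU, then expand FC with sigmoid,
    # producing one gate in (0, 1) per channel.
    s = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gate = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, s))))
            for row in w2]
    # Scale: reweight each channel by its gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gate)]

random.seed(0)
C, H, W, r = 4, 2, 2, 2  # channels, height, width, reduction ratio
x = [[[random.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.gauss(0, 0.1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.gauss(0, 0.1) for _ in range(C // r)] for _ in range(C)]
y = se_block(x, w1, w2)
print(len(y), len(y[0]), len(y[0][0]))  # 4 2 2
```

Because every gate lies strictly between 0 and 1, the block can only attenuate channels relative to one another; the network learns which channels to emphasize through the two small weight matrices.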
Article
Full-text available
Semantic segmentation of high-resolution remote sensing images holds paramount importance in the field of remote sensing. To better excavate and fully fuse the features in high-resolution remote sensing images, this paper introduces a novel Global and Local Feature Fusion Network, abbreviated as GLF-Net, by incorporating the extensive contextual information and refined fine-grained features. The proposed GLF-Net, devised as an encoder–decoder network, employs the powerful ResNet50 as its baseline model. It incorporates two pivotal components within the encoder phase: a Covariance Attention Module (CAM) and a Local Fine-Grained Extraction Module (LFM). And an additional wavelet self-attention module (WST) is integrated into the decoder stage. The CAM effectively extracts the features of different scales from various stages of the ResNet and then encodes them with graph convolutions. In this way, the proposed GLF-Net model can well capture the global contextual information with both universality and consistency. Additionally, the local feature extraction module refines the feature map by encoding the semantic and spatial information, thereby capturing the local fine-grained features in images. Furthermore, the WST maximizes the synergy between the high-frequency and the low-frequency information, facilitating the fusion of global and local features for better performance in semantic segmentation. The effectiveness of the proposed GLF-Net model is validated through experiments conducted on the ISPRS Potsdam and Vaihingen datasets. The results verify that it can greatly improve segmentation accuracy.
... While Faster R-CNN surpasses YOLOv3 in accuracy, YOLOv3 runs faster, making it preferable for real-time applications. A similar result was obtained in [34]. Roy et al. [35] used SSD, Faster R-CNN, YOLOv3, and YOLOv3Tiny to recognize masked individuals and tested them on the Moxa3K dataset. ...
Article
Full-text available
In 2019, the COVID-19 disease spread worldwide, and the World Health Organization recommended using masks for everyone. Using a mask is one of the ways to prevent the transmission of the coronavirus. Naturally, there was a need to distinguish people wearing masks from people without masks automatically. Artificial intelligence can be used to indicate masked from unmasked individuals if needed. In this regard, machine learning models and convolutional neural networks have been employed to design an effective model for mask recognition. This problem represents a supervised learning and binary classification problem, where one group is masked, and the other is unmasked. The proposed model has been implemented using the Python programming language, and PyTorch has been utilized for its development. The custom model includes four convolutional layers for extracting image features and four fully connected layers for the artificial neural network part, distinguishing a masked person from an unmasked person. A dataset of approximately 12,000 face mask recognition images was utilized for training the model, resulting in an accuracy of 99.95% during the training phase and 99.02% during the testing phase. In addition to the competitive accuracy achieved in training and testing, the proposed model has consistently demonstrated outstanding performance across multiple metrics. The average precision, recall, and F1 score, exceeding 99%, further highlight the exceptional capabilities of our model. These results are particularly notable compared to the findings reported in other articles, reaffirming the effectiveness and superiority of our proposed approach.
... Meanwhile, since the cross-boundary anchor boxes brought about a large number of error terms that were difficult to correct, anchor boxes with boundary-crossing outliers were ignored during training. Finally, based on the classification information of the generated proposal regions, a non-maximum suppression (NMS) approach was adopted to deal with the highly overlapping candidate boxes; the IoU threshold for NMS was fixed at 0.7 [60]. ...
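The NMS step this excerpt describes, greedily keeping the highest-scoring boxes and suppressing any candidate that overlaps a kept box beyond the IoU threshold, can be sketched generically (the 0.7 threshold matches the excerpt; the box format and toy data are assumptions):

```python
def nms(boxes, scores, iou_thresh=0.7):
    # boxes: list of [x1, y1, x2, y2]; returns indices of kept boxes,
    # highest score first, dropping boxes whose IoU with an already-kept
    # box exceeds iou_thresh.
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union

    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

# The second box overlaps the first with IoU ≈ 0.82 > 0.7, so it is
# suppressed; the distant third box survives.
boxes = [[0, 0, 10, 10], [0.5, 0.5, 10.5, 10.5], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```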
Article
Full-text available
Real-time and accurate awareness of the grain situation proves beneficial for making targeted and dynamic adjustments to cleaning parameters and strategies, leading to efficient and effective removal of impurities with minimal losses. In this study, harvested maize was employed as the raw material, and a specialized object detection network focused on impurity-containing maize images was developed to determine the types and distribution of impurities during the cleaning operations. On the basis of the classic contribution Faster Region Convolutional Neural Network, EfficientNetB7 was introduced as the backbone of the feature learning network and a cross-stage feature integration mechanism was embedded to obtain the global features that contained multi-scale mappings. The spatial information and semantic descriptions of feature matrices from different hierarchies could be fused through continuous convolution and upsampling operations. At the same time, taking into account the geometric properties of the objects to be detected and combining the images’ resolution, the adaptive region proposal network (ARPN) was designed and utilized to generate candidate boxes with appropriate sizes for the detectors, which was beneficial to the capture and localization of tiny objects. The effectiveness of the proposed tiny object detection model and each improved component were validated through ablation experiments on the constructed RGB impurity-containing image datasets.
... This sharing of layers enhances the overall efficiency of the network. The Faster RCNN model has emerged as one of the state-of-the-art object detectors, surpassing traditional models such as YOLO and SSD on several key metrics [26,27]. ...
Article
Full-text available
Solar photovoltaic (PV) deployment plays a crucial role in the transition to renewable energy. However, comprehensive models that can effectively explain the variations in solar PV deployment are lacking. This study aims to address this gap by introducing two innovative models: (i) a computer vision model that can estimate spatial distribution of solar PV deployment across neighborhoods using satellite images and (ii) a machine learning (ML) model predicting such distribution based on 43 factors. Our computer vision model using Faster Regions with Convolutional Neural Network (Faster RCNN) achieved a mean Average Precision (mAP) of 81% for identifying solar panels and 95% for identifying roofs. Using this model, we analyzed 652,795 satellite images from Colorado, USA, and found that approximately 7% of households in Colorado have rooftop PV systems, while solar panels cover around 2.5% of roof areas in the state as of early 2021. Of our 16 predictive models, the XGBoost models performed the best, explaining approximately 70% of the variance in rooftop solar deployment. We also found that the share of Democratic party votes, hail and strong wind risks, median home value, the percentage of renters, and solar PV permitting timelines are the key predictors of rooftop solar deployment in Colorado. This study provides insights for business and policy decision making to support more efficient and equitable grid infrastructure investment and distributed energy resource management.
... Convolutional Neural Networks (CNNs) are a special category of deep learning algorithms that accept sample images as input and perform convolution operations to extract features from them, allowing each object to be distinguished from the others (Alganci, Soydas, & Sertel, 2020). The structural architecture of a CNN (Figure 1) is similar to the structure of the neuronal connectivity of the human brain. ...
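The convolution operation the excerpt describes, sliding a small kernel over an image to produce a feature response at each position, can be shown with a toy single-channel example (the edge-detecting kernel and 4×4 image are illustrative):

```python
def conv2d(image, kernel):
    # Valid cross-correlation of a 2-D image (nested lists) with a small
    # kernel: the core operation a CNN layer uses to extract features.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A [-1, 1] kernel responds where intensity rises from left to right,
# so it fires exactly at the vertical edge in this image.
image = [[0, 0, 1, 1]] * 4
print(conv2d(image, [[-1, 1]]))  # [[0, 1, 0]] repeated for each row
```

In a trained CNN the kernel values are learned rather than hand-picked, and many such kernels run in parallel to produce multiple feature channels.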
Article
Full-text available
Papaya California (Carica papaya L.) is one of the agricultural commodities of the tropics and has great potential to develop in Indonesia as an agribusiness venture with promising prospects. The quality of papaya fruit is determined by its level of maturity, its firmness, and its appearance. Papaya fruit undergoes a marked change in color during the ripening process, which indicates chemical changes in the fruit. The change in papaya color from green to yellow is due to the loss of chlorophyll. During storage, the papaya fruit is initially green and then gradually turns yellow; the longer the storage, the more the color changes toward a mature yellow. Classifying the ripeness level of papaya fruit is usually done manually by business actors, simply by looking at the color of the papaya with the naked eye. To address this problem, in this research we create a system that classifies papaya fruit skin color using a digital image processing approach. The method used to classify the maturity level of papaya fruit is a Convolutional Neural Network (CNN) architecture that classifies the texture and color of the fruit. This study uses eight transfer learning architectures across 216 simulations with parameter variations such as optimizer, learning rate, batch size, number of layers, epochs, and dense units, and classifies the ripeness level of papaya fruit with a fairly high accuracy of 97%. Farmers can use these results to differentiate the maturity level of fruit to be harvested more accurately and to maintain the quality of the papaya fruit.