(A) Reference map at 30 m spatial resolution; (B) coarsened map at 900 m using the majority rule, argmax(H); and (C) area proportion of that class, max(H).

Source publication
Article
Land cover mapping for large regions often employs satellite images of medium to coarse spatial resolution, which complicates mapping of discrete classes. Class memberships, which estimate the proportion of each class for every pixel, have been suggested as an alternative. This paper compares different strategies of training data allocation for dis...

Contexts in source publication

Context 1
... classification accuracy of NLCD2006 with 16 original classes is 78% and class aggregation to level I with eight classes yields 84% overall accuracy [40]. In comparison, assessment of NLCD2006 for the southeastern United States with nine classes (Table 1) and coarsened to 900 m (Figure 3B) using the primary or the primary and alternative reference samples resulted in 59.3% and 72.9% overall accuracy, respectively. Major sources of error are Wetland, which was frequently confused with forests and also has the lowest accuracies in NLCD2006 [40], and Pasture versus Grassland, two land-use forms of herbaceous cover. Despite the coarse resolution developed areas were classified ...
Context 2
... estimation from discrete maps is a straightforward pixel count for class i multiplied by the area of each pixel. The area of each class from memberships is the total of all membership values times their pixel area [15]. The total absolute difference in area (AD) between reference R (NLCD2006) and classification or membership CM, with K being the total of all pixels, was calculated using Equation (6) and expressed in area and in percent against the total of the study area. Table 2 shows near-perfect spatial co-registration between NDVI from ten Landsat images and corresponding dates of MODIS composites. The offsets are negligible, with averages of x = −3 m, y = −3 m and extremes of at most ±30 m. The coefficient values themselves are all positive and indicate a sufficiently high correlation, i.e., the spatial patterns in Landsat and MODIS NDVI are closely related to each other. This finding is an important prerequisite for the following analysis as it permits a direct relation between Landsat-based NLCD2006 maps and MODIS. Table 2. Spatial offset between Landsat images (for their spatial location see Figure 1) and temporally corresponding composites of MODIS data using the NDVI. Figure 3A shows the NLCD2006 map recoded to nine classes (Table 1) at 30 m spatial resolution. The map shows spatial details, such as the road network in Kansas, that disappear in Figure 3B, which shows the spatial distribution of the dominant class at 900 m spatial resolution derived with the majority rule, argmax(H). Figure 3C indicates the corresponding area proportion of the dominating class, max(H). There are distinct regional patterns with homogeneous areas in the western portion (Shrubland, Grassland, Cultivated crops), the Mississippi valley (Cultivated crops), the southern Ozark and Appalachian Mountains (Deciduous forest), the Okefenokee Swamp in southern Georgia (Wetlands), and large metropolitan areas like Atlanta, Dallas-Fort Worth, and St. Louis (Developed).
In particular, the southeastern region is highly heterogeneous with area proportions of the dominating class below 50%; similar heterogeneous patterns exist in eastern Texas, Oklahoma, Louisiana, and Arkansas. Table 3 shows for each class the percentage of homogeneity in 12 bins. It is evident that there are more pixels with low homogeneity, but the magnitude is different for each class. For instance, class Water only exists in selected parts of the map and thus H = 0% makes up 76.7% of the study area. Class Deciduous forest is rather ubiquitous with a proportion of 37.6% for 10% ≤ H < 60%. Due to many roads that cause a homogeneity slightly above 0%, class Developed is an interesting example with only 21.3% for H = 0% but 61.5% for 0% < H < 10%. Table 3. Homogeneity (H) in 10-percent bins and bins for 0 and 100 percent derived from NLCD2006. For abbreviations of class names see Table 1 ...
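The coarsening and the two area estimates described in this context can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the list-of-rows map layout, and the tie-breaking rule (lowest class index wins) are assumptions made for illustration. Going from 30 m to 900 m would correspond to block = 30; the toy example uses block = 2.

```python
from collections import Counter


def coarsen(fine, block, n_classes):
    """Aggregate a fine class map (list of rows of int labels) into square blocks.

    Returns (H, dominant, proportion) where, for every coarse cell,
      H          holds the per-class area proportions,
      dominant   holds argmax(H), the majority class,
      proportion holds max(H), the area share of that class.
    Ties go to the lowest class index (an assumption, not from the source).
    """
    rows, cols = len(fine), len(fine[0])
    assert rows % block == 0 and cols % block == 0, "map must tile evenly"
    H, dominant, proportion = [], [], []
    for r0 in range(0, rows, block):
        h_row, d_row, p_row = [], [], []
        for c0 in range(0, cols, block):
            counts = Counter(
                fine[r][c]
                for r in range(r0, r0 + block)
                for c in range(c0, c0 + block)
            )
            h = [counts.get(k, 0) / (block * block) for k in range(n_classes)]
            best = max(range(n_classes), key=lambda k: h[k])  # first maximum wins
            h_row.append(h)
            d_row.append(best)
            p_row.append(h[best])
        H.append(h_row)
        dominant.append(d_row)
        proportion.append(p_row)
    return H, dominant, proportion


def area_from_labels(dominant, n_classes, pixel_area):
    """Discrete map: pixel count per class times the area of one pixel."""
    counts = Counter(label for row in dominant for label in row)
    return [counts.get(k, 0) * pixel_area for k in range(n_classes)]


def area_from_memberships(H, n_classes, pixel_area):
    """Membership map: sum of membership values per class times the pixel area."""
    return [
        sum(cell[k] for row in H for cell in row) * pixel_area
        for k in range(n_classes)
    ]


# Toy 4 x 4 reference map with three classes, coarsened with 2 x 2 blocks.
fine = [
    [0, 0, 1, 2],
    [0, 1, 1, 2],
    [2, 2, 1, 1],
    [2, 2, 1, 1],
]
H, dominant, proportion = coarsen(fine, block=2, n_classes=3)
print(dominant)    # [[0, 1], [2, 1]]
print(proportion)  # [[0.75, 0.5], [1.0, 1.0]]
print(area_from_labels(dominant, 3, pixel_area=1.0))  # [1.0, 2.0, 1.0]
print(area_from_memberships(H, 3, pixel_area=1.0))    # [0.75, 1.75, 1.5]
```

In the toy map the discrete estimate gives class 2 an area of 1.0, while the membership estimate gives 1.5, because the mixed upper-right cell is assigned entirely to class 1 under the majority rule. This is the kind of discrepancy that an area difference between a reference and a coarse discrete map would capture.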
Context 5
... MAD compared to other tested algorithms. Homogeneous training data for classification trees are clearly inferior to heterogeneous training pixels (∆r = 0.11 and ∆MAD = 1.94%). Equal allocation between classes results in slightly lower correlation and higher MAD than random or area-proportional allocation. For regression trees, random allocation shows 0.07 higher correlations and a notably (4.9%) lower MAD than uniform sampling. Normalization only marginally improves correlation coefficients but the MAD decreases by 1.5%. The best results of classification trees were obtained for RF-C with area-proportional between-class sample allocation and randomly allocated heterogeneous samples (r = 0.83, MAD = 6.67%), which is almost as good as the best results from regression trees with Cubist, random allocation of heterogeneous pixels with no minimum set, and normalization (r = 0.86 and MAD = 5.93%). Figure 5 shows the spatial distribution of the MAD, which was computed for each pixel individually. The figure only displays results for RF-C and Cubist; the spatial distribution of the error was similar for C5.0 and RF-R, respectively. For classification trees there are no spatial differences between among-class allocations (area-proportional allocation is shown), and there are no differences between allocations of heterogeneous pixels (uniform is displayed), which corresponds to the spatial patterns shown in Figure 3C. Allocating only homogeneous pixels for training shows notably higher errors in general and in particular for transitional zones from Deciduous forest to Evergreen forest in Mississippi, Alabama, and Georgia as well as transitions from Shrubland to Cultivated crops to Grassland in Texas and Oklahoma. Regression tree results with random allocation of heterogeneous pixels show no differences among each other (Random-0 is depicted), and normalization has no impact on the spatial distribution of errors.
There are isolated areas with high errors, e.g., the Okefenokee Swamp in southeastern Georgia, for which the membership values for class Wetland were underestimated. Uniform allocation shows high MAD throughout the entire image, which decreases when normalization is applied. Figure 5. Spatial mean absolute difference (MAD) of selected image sets of class memberships. Random forest classification (RF-C) with area-proportional sample allocation between classes and homogeneous and heterogeneous, uniformly allocated training pixels. Cubist with uniform and random allocation of heterogeneous training pixels and with and without ...
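The excerpt does not give the formulas behind r, MAD, or normalization, so the following Python sketch rests on stated assumptions: MAD for one pixel is taken as the mean over classes of the absolute difference between reference and estimated memberships (in percent), normalization is taken as rescaling a pixel's memberships to sum to 1, and r is the Pearson correlation. All function names are hypothetical.

```python
import math


def normalize(memberships):
    """Rescale one pixel's non-negative memberships to sum to 1.

    Assumed meaning of "normalization"; an all-zero pixel is returned unchanged.
    """
    total = sum(memberships)
    if total == 0:
        return list(memberships)
    return [m / total for m in memberships]


def pixel_mad(reference, estimated):
    """Mean absolute difference across classes for one pixel, in percent."""
    diffs = [abs(r - e) for r, e in zip(reference, estimated)]
    return 100.0 * sum(diffs) / len(diffs)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# One pixel with three classes: the raw estimate sums to 1.25 instead of 1.
reference = [0.50, 0.25, 0.25]
estimated = [0.75, 0.25, 0.25]

raw_mad = pixel_mad(reference, estimated)
norm_mad = pixel_mad(reference, normalize(estimated))
print(round(raw_mad, 2), round(norm_mad, 2))
```

In this toy pixel the error shrinks after rescaling because the estimate overshoots a total of 1. That is only an illustration of the mechanism and not a reproduction of the reported 1.5% decrease.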

Similar publications

Article
There are many disagreements and uncertainties among global land use/land cover (LULC) products, which make it unsuitable to apply these products directly to a specific region. In this study, Enhanced Vegetation Index (EVI) time-series data from the Moderate Resolution Imaging Spectroradiometer (MODIS) with 250 m spatial resolution, combining with...
Preprint
Abstract. The dynamic characteristics of seasonal snow cover are critical for hydrology management, climate system, and ecosystem function. Although optical satellite remote sensing has proved to be an effective tool for monitoring global and regional variations of snow cover, it is still problematic to accurately capture the snow dynami...
Article
Satellite-derived rugged land surface temperature (LST) is an important parameter indicating the status of the Earth’s surface energy budget and its seasonal/temporal dynamic change. However, existing LST products from rugged areas are more prone to error when supporting applications in mountainous areas and Earth surface processes that occur at hi...
Preprint
The long exposure to particulate matter (PM) with aerodynamic diameters 10µm (PM10) and 2.5µm (PM2.5) has negative effects on human health. Although station-based PM monitoring has been conducted around the world, it is still challenging to provide spatially continuous PM information for vast areas at high spatial resolution. Satellite-derived aero...
Article
The Qilian Mountains (QLM) are an important ecological barrier in western China. High-precision land cover data products are the basic data for accurately detecting and evaluating the ecological service functions of the QLM. In order to study the land cover in the QLM and performance of different remote sensing classification algorithms for land co...

Citations

... Sample independence should be guaranteed, as spatial correlation between training and evaluation samples may inflate the reported classification accuracy (Colditz, 2015; Kattenborn et al., 2022). Fig. 4 shows the spatial distribution of samples for each type. All the samples are well distributed in the study area, although grassland and barren land are more concentrated in the west of the study area because of the nature of the land cover distribution (see Fig. 14 in Section 4.4). The be ...
... Through our results for the typical semantic segmentation networks with different structures, we verified the generalizability and effectiveness of the multiclass complexity-based optimal sampling method. Previous studies [44,53,54,80,81] have shown that the stratified sampling method can obtain training samples from different strata (regions), potentially improving the level of classification accuracy. However, the performance improvement in these studies depended on correctly stratifying (partitioning) the data, as there is no quantified standard indicator to measure the contribution of stratification to performance, and many have overlooked the significant contribution of each individual sample to the model's generalization capability for prediction. ...
Article
Challenges in enhancing the multiclass segmentation of remotely sensed data include expensive and scarce labeled samples, complex geo-surface scenes, and resulting biases. The intricate nature of geographical surfaces, comprising varying elements and features, introduces significant complexity to the task of segmentation. The limited label data used to train segmentation models may exhibit biases due to imbalances or the inadequate representation of certain surface types or features. For applications like land use/cover monitoring, the assumption of evenly distributed simple random sampling may be not satisfied due to spatial stratified heterogeneity, introducing biases that can adversely impact the model’s ability to generalize effectively across diverse geographical areas. We introduced two statistical indicators to encode the complexity of geo-features under multiclass scenes and designed a corresponding optimal sampling scheme to select representative samples to reduce sampling bias during machine learning model training, especially that of deep learning models. The results of the complexity scores showed that the entropy-based and gray-based indicators effectively detected the complexity from geo-surface scenes: the entropy-based indicator was sensitive to the boundaries of different classes and the contours of geographical objects, while the Moran’s I indicator had a better performance in identifying the spatial structure information of geographical objects in remote sensing images. According to the complexity scores, the optimal sampling methods appropriately adapted the distribution of the training samples to the geo-context and enhanced their representativeness relative to the population. 
The single-score optimal sampling method achieved the highest improvement in DeepLab-V3 (increasing pixel accuracy by 0.3% and MIoU by 5.5%), and the multi-score optimal sampling method achieved the highest improvement in SegFormer (increasing ACC by 0.2% and MIoU by 2.4%). These findings carry significant implications for quantifying the complexity of geo-surface scenes and hence can enhance the semantic segmentation of high-resolution remote sensing images with less sampling bias.
... Compared to SVM, RF performed better in terms of both accuracy (61.1%) and kappa values (0.338). When performing multi-classification, RF obtained better classification results [62,63]. RF outperformed SVM in its ability to generalize and handle multidimensional data. ...
... RF gave better results when there was a large number of training samples available [15]. Our training samples share of 4% was much larger than 0.25% [62], although SVM had good performance on imbalanced training datasets [16]. However, due to variations in stem density and spacing and the range of tree heights and sizes, there is tremendous variability and overlap in the spectral signature of trees in old-growth P. orientalis stands [66,67], which make the spectrum unusual and low in accuracy. ...
Article
Assessing the health status of old trees is crucial for the effective protection and health management of old trees. In this study, we utilized an unmanned aerial vehicle (UAV) equipped with multispectral cameras to capture images for the rapid assessment of the health status of old trees. All trees were classified according to health status into three classes: healthy, declining, and severely declining trees, based on the above-ground parts of the trees. Two traditional machine learning algorithms, Support Vector Machines (SVM) and Random Forest (RF), were employed to assess their health status. Both algorithms incorporated selected variables, as well as additional variables (aspect and canopy area). The results indicated that the inclusion of these additional variables improved the overall accuracy of the models by 8.3% to 13.9%, with kappa values ranging from 0.166 to 0.233. Among the models tested, the A-RF model (RF with aspect and canopy area variables) demonstrated the highest overall accuracy (75%) and kappa (0.571), making it the optimal choice for assessing the health condition of old trees. Overall, this research presents a novel and cost-effective approach to assessing the health status of old trees.
... The tree interpreter of the CATE model was applied to visualize how depression influenced a subgroup of samples differently from others (Fig. 4). We trained a tree interpreter with a minimum of 100 samples per leaf (1% of all samples) to maximally capture important features without overfitting (Colditz, 2015). Finally, the results of four sensitivity tests based on DML were shown in Table 3. ...
Article
Depression has been identified as a risk factor for suicide, yet limited evidence has elucidated the underlying pathways linking depression to subsequent suicide risk. Therefore, we aimed to examine the psychological mechanisms that connect depression to suicide risk via linguistic characteristics on Weibo. We sampled 487,251 posts from 3196 users who belong to the depression super-topic community (DSTC) on Sina Weibo as the depression group, and 357,939 posts from 5167 active users as the control group. We employed the double machine learning method (DML) to estimate the impact of depression on suicide risk, and interpreted the pathways from depression to suicide risk using SHapley Additive exPlanations (SHAP) values and tree interpreters. The results indicated an 18% higher likelihood of suicide risk in the depression group compared to people without depression. The SHAP values further revealed that Exclusive (M = 0.029) was the most critical linguistic feature. Meanwhile, the three-depth tree interpreter illustrated that the high suicide risk subgroup of the depression group (N = 1196, CATE = 0.32 ± 0.04, 95%CI [0.20, 0.43]) was predicted by higher usage of Exclusive (>0.59) and Health (>-0.10). DML revealed pathways linking depression to suicide risk. The visualized tree interpreter showed cognitive complexity and physical distress might be positively associated with suicide risk in depressed populations. These findings have invigorated further investigation to elucidate the relationship between depression and suicide risk. Understanding the underlying mechanisms serves as a basis for future research on suicide prevention and treatment for individuals with depression.
... For example, some regions have training units that are geographically clumped (e.g., Ghana) or land cover classes that are overrepresented (e.g., herbaceous) (Figs. 3-5). Some users may need to sub-sample the dataset or enforce constraints on data density depending on their research question, application, or area of interest. ...
... For applications focused on land cover change (abrupt or gradual), for which our database includes proportionally less data, we recommend retaining all change training data (for guidance see refs. 3, 11, 13). For applications focused on agriculture, users can use the Level 2 category of 'Agriculture' as a starting point but should be aware that this label has not undergone rigorous quality control and is available for a limited subset of the global training dataset (286,284 units). ...
Article
State-of-the-art cloud computing platforms such as Google Earth Engine (GEE) enable regional-to-global land cover and land cover change mapping with machine learning algorithms. However, collection of high-quality training data, which is necessary for accurate land cover mapping, remains costly and labor-intensive. To address this need, we created a global database of nearly 2 million training units spanning the period from 1984 to 2020 for seven primary and nine secondary land cover classes. Our training data collection approach leveraged GEE and machine learning algorithms to ensure data quality and biogeographic representation. We sampled the spectral-temporal feature space from Landsat imagery to efficiently allocate training data across global ecoregions and incorporated publicly available and collaborator-provided datasets to our database. To reflect the underlying regional class distribution and post-disturbance landscapes, we strategically augmented the database. We used a machine learning-based cross-validation procedure to remove potentially mis-labeled training units. Our training database is relevant for a wide array of studies such as land cover change, agriculture, forestry, hydrology, urban development, among many others.
... Numerous techniques, including simple random sampling and stratified sampling, were employed to acquire the training samples to avoid misclassification issues. Three stratified sampling methods (stratified equal random sampling, stratified proportional random sampling, and stratified systematic sampling) and the binomial minimum fifty-sample rule were employed to collect optimum training samples [40,68,69]. In this study, the binomial minimum fifty-sample rule with stratified random sampling was employed to collect a minimum of 50 samples per LULC class, dividing them into a 65:35 ratio as training (65%) and test samples (35%) (Table 3). ...
Article
Land use and land cover (LULC) classification plays a significant role in the analysis of climate change, evidence-based policies, and urban and regional planning. For example, updated and detailed information on land use in urban areas is highly needed to monitor and evaluate urban development plans. Machine learning (ML) algorithms, and particularly ensemble ML models support transferability and efficiency in mapping land uses. Generalization, model consistency, and efficiency are essential requirements for implementing such algorithms. The transfer-ensemble learning approach is increasingly used due to its efficiency. However, it is rarely investigated for mapping complex urban LULC in Global South cities, such as India. The main objective of this study is to assess the performance of machine and ensemble-transfer learning algorithms to map the LULC of two metropolitan cities of India using Landsat 5 TM, 2011, and DMSP-OLS nightlight, 2013. This study used classical ML algorithms, such as Support Vector Machine-Radial Basis Function (SVM-RBF), SVM-Linear, and Random Forest (RF). A total of 480 samples were collected to classify six LULC types. The samples were split into training and validation sets with a 65:35 ratio for the training, parameter tuning, and validation of the ML algorithms. The result shows that RF has the highest accuracy (94.43%) of individual models, as compared to SVM-RBF (85.07%) and SVM-Linear (91.99%). Overall, the ensemble model-4 produces the highest accuracy (94.84%) compared to other ensemble models for the Kolkata metropolitan area. In transfer learning, the pre-trained ensemble model-4 achieved the highest accuracy (80.75%) compared to other pre-trained ensemble models for Delhi. This study provides innovative guidelines for selecting a robust ML algorithm to map urban LULC at the metropolitan scale to support urban sustainability.
... This is because (i) the LST products of some dates were not produced for various reasons, making it impossible to obtain them from the official website; (ii) although some dates have LST products available from the official website, the number of good-quality and other-quality pixels in the image is 0, which makes it impossible to build an XGBoost model to reconstruct LSTs; and (iii) in the images of other dates, the number of good-quality and other-quality pixels is too low (less than 0.25% [41]) to build a model, and the LST data for these dates have therefore not been reconstructed. Therefore, a weighted averaging algorithm based on the distance between dates is used to reconstruct the LST missing over time, with the specific formula as follows: ...
Article
Accurate, seamless, and long-term land surface temperature (LST) data sets are crucial for investigating climate change and agriculture production. However, factors like cloud contamination have led to invalid values in the LST product, which has restricted the application of the LST dataset. Therefore, the reconstruction of LST products is challenging, and it is attracting widespread attention. This study compared the performance of different algorithms (XGBoost, GBDT, RF, POLY, MLR) and different training sets (using only good-quality pixels or using both good-quality and other-quality pixels) in the estimation of missing pixels in the LST data, obtaining a seamless daily 1 km LST dataset of MODIS Terra-day, Aqua-day, Terra-night, and Aqua-night data for Zhejiang Province and its surrounding areas from 2000 to 2022. The results demonstrated that the performance of machine-learning models is significantly better than that of linear models, and among the five models, XGBoost performed the best, with an RMSE of less than 1 °C. The Wilcoxon test between the reconstructed LST and the true LST values revealed that including both good-quality and other-quality pixels for reconstruction resulted in a 33% increase in the number of days with non-significant differences compared with using only good-quality pixels. Moreover, the reconstructed nighttime LST has a lower RMSE compared with the reconstructed daytime LST, and the RMSE of the reconstructed LST on the Terra satellite is lower than the RMSE of the reconstructed LST on the Aqua satellite. The RMSEs for the reconstructed LSTs, corresponding to Terra-day, Aqua-day, Terra-night, and Aqua-night, respectively, are 0.50 °C, 0.61 °C, 0.36 °C, and 0.39 °C for images with coverage reaching 70%, and 0.65 °C, 0.83 °C, 0.49 °C, and 0.52 °C for images with coverage less than 70%. The accuracy of the reconstructed LSTs using our proposed framework outperforms the existing reconstruction methods.
The 1 km daily seamless LST products can be applied in various fields, such as air temperature estimation, climate change, urban heat island, and crop temperature stress monitoring.
... We employed random forest analyses to identify the most important microbial predictors of changes in SOC concentrations and ASI within bulk soil and all aggregate size classes. The generation of forest trees required three critical parameters: (i) the number of trees to be generated (ntree), which was set to 1000 based on previous studies (e.g., Colditz, 2015), as a higher ntree generates more stable evaluations of variable importance; (ii) the number of variables for the minimal splits in the tree (ntest), which was set as the square root of the number of input variables (Gislason et al., 2006); (iii) the minimum number of observations at the final node of the tree, which was set to 5 in our analysis. All random forest analyses were conducted using the "RandomForest" package (Breiman, 2001) in R software (ver. 3.4.2). ...
Article
Increasing soil carbon (C) stocks and improving soil structure are critical challenges in semi-arid agroecosystems. Conservation tillage has been widely applied to promote aggregate stability, enhance soil organic C (SOC) storage, and conjointly influence microbial community composition. However, the relation among soil microbial groups, aggregate stability and SOC stocks under management practices remains unclear. We conducted a 17-year field experiment in a spring maize cropping system on the Loess Plateau of northwest China, comparing three types of management: conventional tillage with residue removal (CT-RR), reduced tillage with residue incorporation (RT-RI) and no-tillage with residue mulching (NT-RM). We evaluated aggregate stability index (ASI), SOC stocks and microbial community composition at 0-10 and 10-25 cm. The results showed that RT-RI and NT-RM significantly increased ASI by 11% and 16% relative to CT-RR in the 0-10 cm layer; RT-RI significantly increased ASI by 13% relative to CT-RR in the 10-25 cm layer (p < 0.05). The RT-RI increased the SOC concentrations and SOC stocks of macroaggregates (>250 μm), which harbor most of the total SOC stocks in bulk soil. Both RT-RI and NT-RM increased total microbial biomass and biomass of six microbial groups (i.e., gram-negative bacteria (GN), gram-positive bacteria (GP), total bacteria (B), total fungi (F), arbuscular mycorrhizal fungi (AMF) and saprophytic fungi (SF)) at 0-10 cm, and RT-RI increased the above groups at 10-25 cm. Across a range of microbial community indicators, we found strong positive relationships between SOC and GN, ASI and AMF in bulk soil. A random forest analysis indicated that GN and F were the best microbial predictors of SOC concentrations and overall aggregate stability, whereas AMF/SF was the best predictor of SOC concentrations within aggregates and the stability of individual aggregate size classes. 
These results demonstrated a strong link between aggregate stability, SOC dynamics and microbial community composition, and suggest that conservation tillage increases both soil aggregation and SOC storage, thus providing sustainability and technical feasibility for the development of dryland agroecosystems.
... Object attributes were extracted for each of the digitized sample points. Digitized sample points, when intersected with objects, comprised an area of ~453 km2 (~5%) of the total landscape, surpassing the minimum area (0.25% of the total) suggested by Colditz (2015). Finally, the sample data set was split into training (80%) and testing (20%) subsets. ...
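The two checks in this excerpt, the share of the landscape covered by samples and the 80/20 split, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the fixed seed, and the 9000 km2 total area are hypothetical (the excerpt gives only the ~453 km2 sample area and the ~5% share), and the excerpt does not say whether the split was random or stratified.

```python
import random

MIN_SHARE_PERCENT = 0.25  # minimum training area share cited from Colditz (2015)


def sample_area_share(sample_area_km2: float, total_area_km2: float) -> float:
    """Percentage of the study area covered by the sampled objects."""
    return 100.0 * sample_area_km2 / total_area_km2


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and testing lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical total area chosen so that 453 km2 is roughly 5% of it.
share = sample_area_share(453.0, 9000.0)
print(f"sample share: {share:.2f}% (minimum {MIN_SHARE_PERCENT}%)")

train, test = split_samples(range(100))
print(len(train), len(test))
```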
Article
Large-area land use land cover (LULC) mapping using high-resolution imagery remains challenging due to radiometric differences between scenes, the low spectral depth of the imagery, landscape heterogeneity, and computational limitations. Using a random forest (RF) supervised machine-learning algorithm, we present a geographic object-based image analysis approach to classifying a large mosaic of 220 National Agriculture Imagery Program orthoimages into LULC categories. The approach was applied in central Texas, USA, covering over 6000 km2. We generated 36 variables for each object and accounted for spatial structures of sample data to determine the distance at which samples were spatially independent. The final RF model produced 94.8% accuracy on independent stratified random samples. In addition, vegetation and water indices, the mean and standard deviation of principal components, and texture features improved classification accuracy. This study demonstrates a cost-effective way of producing an accurate multi-class land use/land cover map using high-spatial/low-spectral resolution orthoimagery.
... The quantity and area of the sample points can significantly affect the classification results of the random forest. To ensure optimal results, the area of training samples should account for about 0.25% of the total study area [42]. For our study, we labeled 10,526 vector polygons from Google images and MSI RGB images, using the visual discrimination method, as samples, of which 8053 were photovoltaic sample points, and 2473 were non-PV (NOPV) sample points. ...
Article
Photovoltaic (PV) panels convert sunlight into electricity and play a crucial role in energy decarbonization and in promoting urban resource and environmental sustainability. The area of PV panels in China’s coastal regions is rapidly increasing due to the huge demand for renewable energy. However, a rapid, accurate, and robust PV panel mapping approach and a practical PV panel classification strategy for large-scale applications have not been established. Here, we developed a new approach that uses spectral and textural features to identify and map the PV panels in coastal China in 2021 using multispectral instrument (MSI) and synthetic aperture radar (SAR) images and the Google Earth Engine (GEE), and to differentiate PV panels according to their underlying surface properties. Our 10-m-spatial-resolution PV panel map had an overall accuracy of 94.31% in 2021. There were 510.78 km2 of PV panels in coastal China in 2021, which included 254.47 km2 of planar photovoltaic (PPV) panels, 170.70 km2 of slope photovoltaic (SPV) panels, and 85.61 km2 of water photovoltaic (WPV) panels. Our resultant PV panel map provides a detailed dataset for renewable layouts, ecological assessments, and the energy-related Sustainable Development Goals (SDGs).
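The area figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the three reported sub-areas and derives each type's share of the total; the variable and function names are hypothetical.

```python
# Areas in km2 as reported in the abstract.
pv_area_km2 = {"PPV": 254.47, "SPV": 170.70, "WPV": 85.61}
reported_total_km2 = 510.78


def area_shares(areas: dict) -> dict:
    """Percentage share of each PV type in the summed area."""
    total = sum(areas.values())
    return {name: 100.0 * value / total for name, value in areas.items()}


total_km2 = round(sum(pv_area_km2.values()), 2)  # round away float noise
print(total_km2 == reported_total_km2)
for name, share in area_shares(pv_area_km2).items():
    print(f"{name}: {share:.1f}%")
```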