Multicollinearity diagnosis using VIF, TOL and Information Gain Ratio.

Source publication
Article
Spatial modelling of gully erosion at the regional level is very relevant for local authorities to establish successful counter-measures and to change land-use planning. This work explores the potential of a genetic algorithm-extreme gradient boosting (GE-XGBoost) hybrid machine learning solution for spatial mapping of the suscept...

Contexts in source publication

Context 1
... results of the multicollinearity analysis on the relevance of gully erosion conditioning factors are reported in Table 5. As stated previously, collinearity among the predictors is investigated using the variance inflation factor (VIF) and tolerance (TOL). ...
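The VIF and TOL diagnostics mentioned in the context above can be illustrated with a minimal sketch. It assumes the usual definitions, VIF_j = 1 / (1 - R_j^2) and TOL_j = 1 / VIF_j, where R_j^2 comes from regressing predictor j on the remaining predictors. The function name `vif_and_tol` and the synthetic factors (`slope`, `elevation`, `roughness`) are hypothetical and are not taken from the source publication.

```python
import numpy as np

def vif_and_tol(X):
    """Return (vif, tol) arrays with one entry per column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vif = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vif[j] = 1.0 / (1.0 - r2)
    return vif, 1.0 / vif

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
slope = rng.normal(size=200)
elevation = rng.normal(size=200)
# Hypothetical factor built to be nearly collinear with slope.
roughness = slope + 0.05 * rng.normal(size=200)

vif, tol = vif_and_tol(np.column_stack([slope, elevation, roughness]))
for name, v, t in zip(["slope", "elevation", "roughness"], vif, tol):
    print(f"{name:10s} VIF={v:8.2f} TOL={t:.4f}")
```

In this toy data set the two collinear factors receive large VIF (small TOL) values while the independent factor stays near 1. Commonly cited rules of thumb flag VIF above 5 or 10, or TOL below 0.1 or 0.2; the thresholds used in the source publication are not given in the excerpt above.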

Citations

... To achieve the first goal, we will adopt a two-fold approach that uses GIS and DTM (digital terrain model) to determine the extent and depth of soil loss (Arabameri et al., 2021). ...
Article
This research used the Revised Universal Soil Loss Equation (RUSLE) with a Sediment Delivery Ratio (SDR) model and the analytic hierarchy process (AHP) method, together with a geographic information system (GIS) and remote sensing (RS), to predict the annual soil loss rate and spatialise the processes of water erosion at the scale of the Loukkos Watershed, Morocco. The RUSLE model and AHP parameters were estimated using RS data, and the erosion vulnerability zones were determined using GIS. We used five parameters in the RUSLE: precipitation erosivity, soil erodibility, slope length and steepness, vegetation cover, and soil erosion control practices. For the AHP technique, we used seven geo-environmental factors: annual average precipitation, drainage density, lineament density, slope, soil texture, land use/land cover and landform maps. The results of RUSLE indicated that the average annual soil loss varied from 0 to 2388.27 t·ha⁻¹·year⁻¹. The total estimated annual potential soil loss was approximately 40 790 220.11 t·ha⁻¹·year⁻¹, and the sediment yield estimated by RUSLE-SDR was 8 647 526.66 t·ha⁻¹·year⁻¹, equivalent to 6.65 Mm³. This value is very close to the measured value of 6.81 Mm³, a difference of 0.16 Mm³. Furthermore, the results of the AHP indicate that the soil erosion potential index varies from 0 to 0.205315 t·ha⁻¹·year⁻¹. Overall, nearly 13.7% of the area suffered from severe soil erosion exceeding 50 t·ha⁻¹·year⁻¹. Approximately 80% of the Loukkos Watershed area experienced only slight erosion, while the remaining 6% incurred moderate erosion. Integrating GIS and RS into the RUSLE model and AHP helped us robustly estimate the extent and degree of erosion risk. Territorial decision-makers should adopt our results to develop soil conservation strategies, water management plans and other necessary soil and water conservation measures for this region.
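The five RUSLE parameters listed in this abstract correspond to the factors of the standard product form A = R · K · LS · C · P. The short sketch below restates that form and checks the arithmetic of the reported totals. The per-cell factor values are hypothetical, and the "implied SDR" is simply the ratio of the two reported totals; it is an inference for illustration, not a figure stated by the authors.

```python
def rusle(R, K, LS, C, P):
    """Standard RUSLE product form: A = R * K * LS * C * P."""
    return R * K * LS * C * P

# Hypothetical factor values for a single cell (not from the study).
cell_loss = rusle(R=100.0, K=0.3, LS=1.5, C=0.2, P=1.0)

# Totals as reported in the abstract.
total_soil_loss = 40_790_220.11
sediment_yield = 8_647_526.66

# Ratio of sediment yield to potential soil loss (assumed to play the role of the SDR).
implied_sdr = sediment_yield / total_soil_loss
print(round(implied_sdr, 3))  # 0.212

# Gap between measured and estimated sediment volume, in Mm^3.
measured_mm3, estimated_mm3 = 6.81, 6.65
print(round(measured_mm3 - estimated_mm3, 2))  # 0.16
```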
... However, these techniques frequently fail to completely capture the intricate dynamics of erosion processes, particularly in changing landscapes and climatic conditions. Artificial intelligence, notably machine learning (ML) algorithms, may help improve erosion susceptibility mapping by considering many aspects and interactions (Arabameri et al., 2021;Cimusa Kulimushi et al., 2023). ...
Article
Gully erosion is a widespread environmental danger, threatening global socio-economic stability and sustainable development. This study comprehensively applied seven machine learning (ML) models (SVM, KNN, RF, XGBoost, ANN, DT, and LR) to evaluate gully erosion susceptibility in the Tensift catchment and predict it within the Haouz plain, Morocco. To ensure the reliability of the findings, the study employed a robust combination of gully erosion inventory, Sentinel images, and a Digital Surface Model. Eighteen predictors, encompassing topographical, geomorphological, environmental, and hydrological factors, were selected after multicollinearity analyses. The gully erosion susceptibility results revealed that approximately 28.18% of the Tensift catchment is at a very high risk of erosion. Furthermore, 15.13% and 31.28% of the catchment are categorized as low and very low risk, respectively. These findings extend to the Haouz plain, where 7.84% of the surface area is at very high risk of erosion, while 18.25% and 55.18% are characterized as low and very low risk areas. To gauge the performance of the ML models, an array of metrics including specificity, precision, sensitivity, and accuracy was employed. The study highlights XGBoost and KNN as the most promising models, achieving AUC ROC values of 0.96 and 0.93 in the test phase. The remaining models, namely RF (AUC ROC = 0.89), LR (AUC ROC = 0.80), SVM (AUC ROC = 0.81), DT (AUC ROC = 0.86), and ANN (AUC ROC = 0.78), also displayed commendable performance. The novelty of this research is its innovative approach to combat gully erosion through cutting edge ML models, offering practical solutions for watershed conservation, sustainable management, and the prevention of land degradation. These insights are invaluable for addressing the challenges posed by gully erosion within the region and beyond its geographical boundaries, and can be used for defining appropriate mitigation strategies at local to national scale.
... Their results showed the potential of using an ensemble model to predict gully erosion. In another study carried out by Arabameri et al. (2021a), a Genetic Algorithm-Extreme Gradient Boosting (GE-XGBoost) hybrid approach was developed to map the susceptibility of gully erosion. This method combined the efficiency of genetic algorithms with the accuracy of XGBoost and showed promising results for gully erosion mapping. ...
Article
Ephemeral gully erosion is one of the main sources of soil loss from agricultural landscapes. Various tools, including predictive models with machine learning (ML) algorithms, have shown promise for identifying susceptible areas. However, ML models have two limitations: (1) a model trained in one area may not be applicable in another area and (2) their application for susceptibility mapping of ephemeral gullies over large areas presents a challenge due to the small size of these features and the need to digitize all gullies for accurate susceptibility mapping. To overcome these limitations, a novel approach was introduced in the current study to validate ML models comprehensively and to prepare a susceptibility map of ephemeral gullies using an areal transfer of calibration–validation relations. Five ML models were evaluated in the Northern Lake Erie Basin as a large-scale region. First, the region was divided into three zones based on the most effective factors of gully formation, and a total of eight watersheds were selected in Zone 1 and Zone 2 (hereafter study area). Zone 3 was not considered, because no gullies were observed in this zone. All the ML models were compared using a new validation approach, including local (trained and validated in the same area) and transferred (trained in one area and tested in other areas). Results showed that random forest (RF) was the most accurate local model in both Zone 1 (accuracy = 0.8833, AUC = 0.8830, sensitivity = 0.9239, and specificity = 0.8537) and Zone 2 (accuracy = 0.8606, AUC = 0.8608, sensitivity = 0.8987, and specificity = 0.8381), while gradient boosting decision tree (GBDT) was the most accurate transferred model (accuracy = 0.7298, AUC = 0.7297, and sensitivity = 0.7826). From the results of the current study, it can be concluded that (1) the zonation technique supports the prediction of ephemeral gullies by dividing the study area into small zones that share similar topographical and morphological characteristics and (2) the local-transferred validation technique is a helpful method for finding the ML model that can be trained in a small watershed and scaled up to a larger area without further calibration.
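The accuracy, sensitivity and specificity values quoted in this abstract are standard confusion-matrix metrics. A minimal sketch of how they are derived follows; the function name `classification_metrics` and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # true positive rate (recall)
        "specificity": tn / (tn + fp),      # true negative rate
        "precision": tp / (tp + fp),        # share of predicted positives that are real
    }

# Hypothetical counts for a gully / non-gully test set.
metrics = classification_metrics(tp=85, fp=12, tn=88, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A model can score well on sensitivity while scoring lower on specificity, which is why the abstract reports the two alongside overall accuracy and AUC.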
... Although XGBoost requires additional hyperparameters that are designed to lower the risk of overfitting, lower prediction variability, and hence increase accuracy, it shares many parameters with other tree-based models. XGB is defined as the process of integrating the outputs of numerous "weak learners" to produce a "strong learner" [72]. ...
Article
Floods are among nature’s most destructive disasters because they cause extensive damage to structures, the environment, and people. Therefore, it is important to determine the causes of floods as well as the areas that are vulnerable to flooding, which can be done by building a flood susceptibility model. This research identified flood-prone locations in the Periyar River Basin using historical flood records from 2000 to 2020 and some of the conditioning features. The ten variables considered in the present study include elevation, slope, aspect, flow direction, drainage density, rainfall, Normalized Difference Water Index (NDWI), Stream Power Index (SPI), Sediment Transport Index (STI), and Topographic Position Index (TPI). In order to create a flood susceptibility map and examine the correlation between flood incidence and the conditioning features, the logistic regression (LR), support vector machine, naive Bayes, random forest, AdaBoost, gradient boost, and extreme gradient boost models were developed and validated. The model accuracies were measured using the receiver operating characteristic curve (ROC) and the area under the curve (AUC). In addition, other indices such as precision, recall, sensitivity, specificity, F1 score, and overall accuracy were used for model evaluation. The results demonstrated that every model can identify flood-prone locations with reasonable accuracy. However, compared to other models, the random forest model showed a better performance and prediction rate (AUC = 94). Furthermore, all models indicated that low-lying places near water bodies and in the western region of the study area had the largest probability of flooding. According to the study, machine learning techniques are a useful tool for mapping and predicting flood-prone areas and for creating flood mitigation strategies and plans.
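The citing passage describes boosting as integrating the outputs of numerous weak learners into a strong learner, and the abstract lists gradient boost and extreme gradient boost among the compared models. The sketch below illustrates that idea with plain gradient boosting on squared error, using one-split regression stumps as the weak learners. It is not XGBoost itself, which adds regularization and second-order gradient information; all function names and the synthetic data are assumptions for illustration.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the least-squares single-split stump on a 1-D feature."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the max so the right side is never empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_rounds=100, learning_rate=0.3):
    """Additively combine stumps, each fitted to the current residuals."""
    base = y.mean()
    prediction = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - prediction)
        prediction += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, prediction

# Synthetic 1-D regression problem (illustrative only).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, size=200))
y = np.sin(x) + 0.1 * rng.normal(size=200)

base, stumps, fitted = boost(x, y)
mse_single = np.mean((y - (base + predict_stump(stumps[0], x))) ** 2)
mse_boosted = np.mean((y - fitted) ** 2)
print(f"single stump MSE: {mse_single:.4f}, boosted MSE: {mse_boosted:.4f}")
```

On the training data, the boosted ensemble fits the curve far more closely than any single stump, which is the sense in which weak learners combine into a strong one. Held-out error would need a separate check, since more rounds can overfit.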
... Due to the complex interplay of various contributing factors, studies on gully susceptibility often entail the fusion of robust machine learning (ML) algorithms with data from remote sensing (RS) sources. Instances of this integration can be observed in the works of Arabameri et al. (2021), Hitouri et al. (2022), and Chuma et al. (2023). Additionally, Unmanned Aerial Vehicles (UAVs) have emerged as promising instruments in the realms of remote sensing and soil science, as highlighted by Niculițȃ et al. (2020) and Meinen and Robinson (2021). ...
Article
Gully erosion represents the most severe soil loss, with far-reaching consequences beyond the immediate site. Assessing the stability of gullies is particularly challenging in tropical regions with sandy soils and limited accurate data. Nonetheless, initiating gully inventories is a crucial first step in guiding public policies and conservation projects. In this study, we focus on the Upper Taquari River Basin (UTRB) situated on the fringes of the Brazilian Pantanal, where extensive erosion occurs in the upper regions and flooding occurs in the plains. We present the first qualitative and quantitative analysis of gullies in this region. Considering the historical context of the UTRB, it has long suffered from land mismanagement, particularly in livestock activities. Our objective was to evaluate the correspondence between gullies and land use classes in the MapBiomas Project, Brazil's most reliable non-governmental land use map, and the Rural Environmental Registry (CAR), the official information shared between landowners and public authorities. Thirteen remote-sensed indicators encompassing vegetation, water, soil, and terrain indices were assessed for 2022. Gullies were digitized through visual interpretation of a high-resolution Maxar Vivid Basic 2017 image. The classification was performed using the Random Forest (RF) algorithm, wherein pixels were classified into three classes: active, intermediate, and stable, based on the degree of vegetation cover and bare soil. The agreement of the gullies with the features of MapBiomas and CAR was also examined. The results revealed an overall accuracy of 96% and a Kappa index of 93% for the pixel classification. In the study area, 2,960 gullies were digitized, with 60% classified as active features and only 2% as stable. Furthermore, the MapBiomas algorithm misclassified many pixels with active gullies as pasture. Conversely, the CAR data failed to identify gullies as areas demanding restoration. 
To address these issues, we recommend revising both land use maps to accurately represent the presence of erosion and to improve decision-making in favor of efficient conservation efforts in the region. As a further result of our actions, the method described here may prove valuable in formulating restoration plans for other tropical savanna regions.
... In the context of classification problems, an ideal model outcome can be achieved with a training database encompassing 100 cases (Domingos 2012). An expansion in data volume can help alleviate issues of overfitting and data uncertainty (Arabameri et al. 2021;Pham et al. 2021). Our current database of dam instances comprises 153 landslide dam cases, some of which are historic and contain uncertain records of landslide dam characteristics. ...
Article
A rapid and accurate prediction of a landslide dam’s life span is of significant importance for emergency geological treatment. However, current prediction models for the state of a landslide dam are based solely on geomorphological indexes and do not take into consideration attribute properties such as landslide types, trigger factors, and dam types. This study investigates the relationships between a landslide dam’s geometry and the capacity of the barrier lake and proposes fitting models, which supplement the current landslide dam database. Subsequently, six predictive models for landslide dam life span are established using machine learning algorithms (logistic regression, k-nearest neighbors, support vector machine, Naïve Bayes, decision tree, and random forest), which consider five factors, including geometry parameters and attribute properties. The performances of these six models are analyzed and compared to a typical prediction model, the dimensionless blockage index (DBI). The results suggest that the models established in this study not only achieve an absolute accuracy consistent with that of the DBI model, but also overcome its disadvantage that a large number of cases cannot be judged. Among the formulated machine learning models, the random forest model exhibits the highest absolute accuracy (89%), lowest error rate (7%), lowest false alarm rate (15%), and no uncertainty rate. Additionally, three renowned landslide dams, namely the Costantino, Hsiaolin, and Baige landslide dams, are analyzed to illustrate the applicability of the established machine learning models. The study results provide essential guidance for the predictions and emergency geological treatments of landslide dam disasters.
... Machine and deep learning models have further provided significant leverage, not only in pattern extraction but also in predicting the spatial susceptibility of the studied phenomenon across a given area (Arabameri et al. 2020c;Conoscenti et al. 2018;Gayen et al. 2019;Pourghasemi et al. 2017;Roy et al. 2020). Some researchers went further and used ensemble machine learning models with various optimization algorithms to automatically tune the models' hyper-parameters over a large number of modeling iterations (e.g., Band et al. (2020) and Arabameri et al. (2021)). Metaheuristic algorithms such as simulated annealing (Kirkpatrick et al. 1983), ant colony optimization (Dorigo 1992), particle swarm optimization (Kennedy and Eberhart 1995), harmony search (Geem et al. 2001), artificial bee colony (Karaboga 2005), imperialist competitive algorithm (Atashpaz-Gargari and Lucas 2007), and gravitational search algorithm (Rashedi et al. 2009) are some examples of the advanced optimization algorithms that have found their way into spatial modeling of natural hazards. ...
Article
Developing a susceptibility map is a crucial primary step for dealing with undesirable natural phenomena, gully erosion included. At the same time, recent computational progress calls for employing new methodologies to keep the solutions up to date. In this work, the performance of a conventional artificial neural network (ANN) is improved by applying a metaheuristic algorithm (symbiotic organisms search, SOS) for generating the gully erosion susceptibility map of an area in Golestan Province, Northern Iran. A geo-database is created from the gully erosion inventory and twenty conditioning factors. After analyzing the interrelationships between the geo-database components, training and testing data sets are formed. The models are executed with proper configurations and, according to the results, the SOS algorithm could enhance the training accuracy of the ANN from 92.8% to 98.4%, and the testing accuracy from 89.8% to 91.4%. In addition, comparing the performance of the SOS with the shuffled complex evolution (SCE-NN) and electromagnetic field optimization (EFO-NN) algorithms revealed the greater accuracy of the SOS. However, the SCE-NN and EFO-NN performed more accurately than the conventional ANN. Therefore, it can be concluded that the use of metaheuristic techniques may improve the prediction ability of the ANN in gully erosion susceptibility mapping. Finally, a monolithic equation is extracted from the SOS–ANN model to be used as a predictive formula for similar purposes.
... We noticed that the delineation of the predictions was improved by using the combined predictions of Setting 1 and 2. This improved performance is also reflected by higher length-weighted overlap values for all blind-test areas, with increases of at least 8.2 %. Similar to the studies of Arabameri et al. (2021) and Roy & Saha (2022), which used ensemble models for forecasting areas vulnerable to gully erosion, our findings confirm that the combined products of the pre-trained models improve the overall delineation of the ravines. ...
Preprint
Gullies and ravines are common landforms in raised marine fine-grained deposits in Norway. Gullies in marine clay are significant landforms, indicative of soil erosion and natural hazards, and are of high conservation value. As a result of the substantial impact of human intervention over the past century, marine clay gullies are now red-listed. To monitor the condition of these landforms we need to improve our understanding of their spatial extent, complexity, and morphology. We explore the applicability of an automated approach that combines deep learning (DL), fully convolutional neural networks (FCNN), and a U-Net model with ArcPy libraries and ground truth data to derive a high-resolution map of gullies in raised marine fine-grained deposits. The predictors used comprise solely terrain derivatives, to broaden the usage of the pre-trained model to other regions. Our best model achieved a precision score of 0.82 and a recall of 0.75. We find that our pre-trained model can successfully predict gullies in blind-test areas. The model performs better in regions with similar geological settings, scoring a length-weighted overlap of >72% with reference datasets. We also find that the model’s applicability increases when we post-process the predictions by eliminating noise, especially by using the predictions derived from ensembled models. We, therefore, conclude that the pre-trained models can effectively be used to supplement the geomorphological mapping of marine clay gullies in Norway. The outcome of this research contributes towards mapping the spatial extent and condition of red-listed landforms in Norway, as well as the development of monitoring systems for future landscape change. Keywords: gullies, ravines, landforms, marine clay, deep learning, U-net
... The mapping of areas with potential risk of soil loss offers a representation of current knowledge about land use in relation to responses to erosive processes. From this, it is possible to identify areas susceptible to diverse aspects of environmental impacts caused by land use (ARABAMERI et al. 2021). ...
Article
The modeling of areas susceptible to soil loss due to hydro-erosive processes consists of methods that simplify reality to predict future behavior based on the observation and interaction of a set of geoenvironmental factors. Thus, the objective of the current analysis is to predict susceptibility to soil loss and map areas with the potential risk of erosion using the principles of Binary Logistic Regression (BLR) and Artificial Neural Networks (ANN). The hydrographic sub-basin of the Sete Voltas River (330 km2), Rondônia, Brazil, was defined as the experimental area. Models were obtained using 100 sample units and 14 predictor parameters. Susceptibility was mapped based on five reference classes: very low, low, moderate, high, and very high. ANN obtained an area under the curve (AUC) of 0.808 and global precision of 79.2%, and the BLR model showed an AUC of 0.888 and global precision of 77%. Potentially susceptible areas represent 57.71% and 54.80% of the area for BLR and ANN models, respectively. The greatest potential risks are verified in places with no vegetation cover associated with agricultural practices. The technique proved to be effective, with adequate precision and the advantage of being less time-consuming and expensive than other methods.
... Erosion is a natural process of soil transport by the forces of wind and/or water that occurs at a faster rate than the various soil formation processes (KOURGIALAS et al. 2016;ARABAMERI et al. 2021). Depending on its extent and magnitude, it can constitute one of the greatest current environmental problems, as it is responsible for the decline in soil fertility, reducing productivity and increasing costs, in addition to causing the siltation of rivers and a decrease in water quality (NHU et al. 2020;MAURYA et al. 2021). ...
... It should be noted that zoning is one of the essential tools for erosion control (DUBE et al. 2014;RAZAVI-TERMEH et al. 2020). Accordingly, the integration of satellite imagery and data modelling tools is used to map the areas most susceptible to erosion, especially in the context of watershed management (GHORBANZADEH et al. 2020;ARABAMERI et al. 2021). ...
Article
The spatialisation of physical-natural and anthropogenic information in watersheds provides a systemic understanding of the unit and supports the development of planning strategies to address erosive processes. Accordingly, the objective of this research is to propose a method for mapping vulnerability to erosion for the Santa Maria River Watershed (Bacia Hidrográfica do Rio Santa Maria, BHRSM), located in southwestern Rio Grande do Sul. The study area was chosen because it presents the development of erosive features documented by other authors. The methodology involved assigning weights to the variables of geology, soils, land use, rainfall erosivity and Geomorphons elements, followed by overlay in a GIS environment. The model showed that the BHRSM has a natural vulnerability to erosion owing to the presence of friable sedimentary rocks, although overgrazing in the portion occupied by cattle has been increasing soil degradation and promoting sediment transport. Moreover, the introduction of soybean leaves the soil exposed during certain periods of the year, causing erosion. The model indicated, for the first time, an association between the areas with the greatest presence of erosive features and the highest mean annual rainfall erosivity in the BHRSM. The model also had the advantage of allowing the vulnerability class to change as a result of variation in erosivity and land use, data that change annually in the BHRSM. Given the situation of vulnerability to erosion, it is necessary to develop management strategies, especially for grassland formations and temporary crops, in order to minimise the damage from erosive processes.