Fig 1 - uploaded by Miska Luoto
Location of study area (black box) in northern sub-arctic Finland. Two spatially independent data sets, “calibration” and “evaluation”, are indicated by B and A. A total of 2032 grid squares are also shown on the map.

Source publication
Currently, statistical models, which relate the spatial distribution of landforms and processes with environmental conditions, are widely used in geomorphological mapping. However, because models are the result of both simplification and imperfect representation of reality, model predictions may be tainted by errors. In this study, we evaluated the...

Contexts in source publication

Context 1
... of the spatial distributions of earth surface processes, landforms and the underlying environmental factors affecting them has an important role in geomorphological research (Allen, 1997). However, data on the distribution of different processes and landforms are often scarce and can be difficult to acquire. Predictive geomorphological modelling offers one potential means of complementing the insufficient information on the distribution of geomorphological phenomena and of the physical environments suitable for them (Vitek et al., 1996; Luoto and Hjort, 2005). Recently, spatial modelling has become one of the key issues in geomorphology, e.g. in assessing the stability of steep terrain (Dai and Lee, 2002; Guzzetti et al., 2006), mapping glaciated landscapes (Brown et al., 1998), mapping soil and bedrock properties (Kheir et al., 2008) and mapping periglacial processes (Mackay et al., 1992; Graff and Usery, 1993; Luoto and Seppälä, 2002; Hjort and Luoto, 2006). Previous studies have shown that modern spatial modelling techniques can provide useful forecasts of geomorphological phenomena in unsurveyed parts of landscapes (Luoto and Hjort, 2005), and can make valuable contributions to theoretical (Walsh et al., 1998) and applied research (Haeberli, 1992; Harris et al., 2001a, b; Gude and Barsch, 2005). The development of spatial modelling in geomorphology is based on three trends: growth in the availability of remotely sensed (RS) data, the development of GIS techniques, and the integration of both with novel statistical methods (Walsh et al., 1998). In a methodological study, Luoto and Hjort (2005) compared different modelling techniques in predictive geomorphological mapping. Most importantly, although predictive models perform relatively accurately, they do not always provide robust spatial predictions.
Such variability in modelling results is not surprising given that spatial models are correlative and therefore sensitive to the data and to the mathematical functions used to describe the distributions of geomorphological phenomena in relation to environmental parameters. Process-based models using theoretical and experimental knowledge provide an alternative that is less dependent on empirical relationships. However, their implementation at the landscape level is difficult because of the complex processes and interactions that must be represented, and variability in forecasts is also common (Araújo and New, 2006). To overcome the problem of variability in predictions, the use of multiple models within a consensus modelling framework has been presented in various fields of research, e.g. in ecology (Huang and Lees, 2004; Thuiller, 2004; Araújo et al., 2005b; Huang and Lees, 2005), economics (Gregory et al., 2001), biomedicine (Nilsson et al., 2000), meteorology (Sanders, 1963), climatology (Benestad, 2004) and hydrology (Goswami and O’Connor, 2007). In this study, eight state-of-the-art modelling techniques were used to predict the distribution of 12 geomorphological landform types in sub-arctic Finland. Next, the predictive performances of four consensus methods combining the model outputs (probability values) of the eight modelling techniques were evaluated. We put special emphasis on model testing, and therefore we assessed the accuracy of the predictive models with spatially independent evaluation data (Fig. 1). The use of spatially independent data is of particular value because alternative approaches, including re-substitution and one-time data splitting, have been shown to lead to over-optimistic estimates of model predictive capabilities in new areas and to biased signals of the importance of different predictors (Fielding and Haworth, 1995; Peterson and Vieglais, 2001; Araújo et al., 2005a; Randin et al., 2006).
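The idea of combining the probability outputs of several models can be illustrated with a minimal sketch. The excerpt does not name the four consensus methods that were evaluated, so the mean and median rules below are assumptions chosen only for illustration, and the model names and probability values are hypothetical.

```python
from statistics import mean, median


def consensus(probabilities, method="mean"):
    """Combine per-model probabilities for each grid square.

    probabilities: dict mapping a model name to a list of predicted
    probabilities, one per grid square (all lists the same length).
    method: "mean" or "median" (illustrative rules, not the study's).
    """
    combine = {"mean": mean, "median": median}[method]
    # zip(*...) regroups the values so that each tuple holds the
    # predictions of all models for one grid square.
    columns = zip(*probabilities.values())
    return [combine(values) for values in columns]


# Hypothetical outputs of three models for four grid squares.
model_outputs = {
    "GLM": [0.10, 0.80, 0.40, 0.90],
    "GAM": [0.20, 0.70, 0.60, 0.95],
    "RF": [0.00, 0.90, 0.10, 0.85],
}

mean_consensus = consensus(model_outputs, "mean")
median_consensus = consensus(model_outputs, "median")
```

A median rule is less affected by a single model that disagrees strongly with the others, whereas a mean rule uses every value; which behaviour is preferable depends on the data.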
The study area is located in sub-arctic Finland (Fig. 1). The topography of the area is characterized by eroded fells with elevations ranging from ca. 200 to 640 m above sea level (a.s.l.). Geologically, the area belongs to a Precambrian granulite complex about 1.9 billion years old (Meriläinen, 1976). Surface deposits consist of glacigenic till, peat, as well as sand and gravel deposits. The area lies within the zone of discontinuous permafrost (King and Seppälä, 1987). The mean annual air temperature was −2.0 °C and mean annual precipitation ca. 400 mm during the period 1962–1990 (Climatological Statistics in Finland 1961–1990, 1991). Botanically, the region lies to the north of the northern limit of the continuous Scots pine (Pinus sylvestris L.) forest in the Orohemiarctic Zone with mountain birch (Betula pubescens ssp. czerepanovii) as the prevailing tree species (Ahti et al., 1968). Mires belong to the palsa and sub-alpine mire types (Luoto and Seppälä, 2002). A more detailed description of the study region can be obtained from Hjort (2006). The 12 geomorphological landforms utilized in this study were peaty permafrost mounds (palsas), frost-formed fine-scale hummocks (convex non-sorted circles, earth hummocks and peat pounus), sorted patterned ground features (stone pits, sorted nets and sorted stripes), solifluction landforms (non-sorted solifluction terraces, sorted solifluction sheets and streams) as well as wind deflation sites (Hjort and Luoto, 2006). Despite the fact that landforms can be grouped, the features were treated as distinct types because different processes govern their formation (Washburn, 1979; French, 1996). The landforms were mapped and converted to grid-based modelling data in a four-step process (Hjort, ...
Context 4
... explanatory data utilized in the modelling were collected from three different information sources, namely a digital elevation model (DEM; Fig. 1), a biotope database and a digital soil map (Hjort and Luoto, 2006). Three topographical parameters, three soil-type variables and three vegetation variables were compiled using Arc/Info GRID at a cell size of 500 m (25 ha; Table 1). This rather coarse resolution was chosen based on the accuracy assessment of the GIS data used (Hjort and Luoto, 2006) and in an attempt to minimize the potential risks of spatial autocorrelation in statistical analyses (e.g. McCullagh and Nelder, 1989). The accuracies of the models and consensus methods (described in Sections 3.3 and 3.4) were calculated on spatially independent test data using the area under the curve (AUC) of a receiver-operating characteristic (ROC) plot (Fig. 1). AUC values range from 0.0 to 1.0. A model providing excellent prediction has an AUC higher than 0.9, a fair model has an AUC between 0.7 and 0.9, and a model is considered poor if it has an AUC lower than 0.7 (Swets, 1988). Based on AUC values, a “rank average” index indicates the average of the ranks of a modelling technique computed over the geomorphological landforms. In this study, eight modelling techniques and four consensus methods were tested. The rank values therefore vary between 1 and 12, with 12 indicating the highest model performance. A Wilcoxon signed-rank test was used to test the statistical difference between the models. All implemented modelling techniques were run in the R environment under the BIOMOD framework (Thuiller, 2003).
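The accuracy measures described above can be sketched as follows. This is a minimal illustration rather than the BIOMOD implementation: the AUC is computed through its rank interpretation (the proportion of presence/absence pairs in which the presence receives the higher predicted probability), the class labels follow the Swets (1988) thresholds quoted in the text, and ties in the ranking are broken arbitrarily, which is an assumption. All function names and example values are hypothetical.

```python
def auc(labels, scores):
    """AUC as the share of (presence, absence) pairs ranked correctly.

    labels: 1 for presence, 0 for absence; scores: predicted probabilities.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def swets_class(value):
    """Label an AUC value with the thresholds quoted from Swets (1988)."""
    if value > 0.9:
        return "excellent"
    if value >= 0.7:
        return "fair"
    return "poor"


def rank_average(auc_table):
    """Average rank of each technique over all landforms.

    auc_table: dict landform -> dict technique -> AUC.
    The technique with the highest AUC gets the highest rank;
    ties are broken arbitrarily (an assumption of this sketch).
    """
    ranks = {}
    for per_technique in auc_table.values():
        ordered = sorted(per_technique, key=per_technique.get)  # ascending AUC
        for rank, technique in enumerate(ordered, start=1):
            ranks.setdefault(technique, []).append(rank)
    return {t: sum(r) / len(r) for t, r in ranks.items()}


# Hypothetical evaluation data for one landform.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
example_auc = auc(labels, scores)
example_class = swets_class(example_auc)

# Hypothetical AUC values for three techniques and two landforms.
example_ranks = rank_average({
    "palsas": {"GLM": 0.85, "GAM": 0.90, "RF": 0.80},
    "sorted nets": {"GLM": 0.75, "GAM": 0.80, "RF": 0.70},
})
```

With 12 methods per landform, as in the study, the same ranking would yield values between 1 and 12.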
These techniques can be assigned to three main categories: (1) regression algorithms [generalized linear models (GLMs), generalized additive models (GAMs), multivariate adaptive regression splines (MARS)], (2) classification techniques [classification tree analysis (CTA) and mixture discriminant analysis (MDA)] and (3) machine-learning methods [generalized boosting methods (GBMs), artificial neural networks (ANNs) and random forest (RF)]. It is important to stress that all models were used with a predictive rather than inductive goal in this study. In such circumstances, accuracy of model predictions is more important than the significance of particular environmental variables. We did not further investigate autocorrelation aspects or the relative importance of different variables (Legendre, 1993). GLMs are mathematical extensions of linear models (McCullagh and Nelder, 1989). GLMs have become increasingly popular because they can handle nonlinear relationships and the different types of statistical distributions that characterize spatial data. We used an automatic stepwise procedure based on the Akaike information criterion (AIC) in model calibration. Examples of the use of GLMs in geomorphological studies can be found in Atkinson et al. (1998), Rowbotham and Dudycha (1998), Dai and Lee (2002), Luoto and Seppälä (2002) and Luoto and Hjort (2004). GAMs are non-parametric extensions of GLMs. They provide a flexible data-driven class of models that permit both linear and complex additive response shapes, as well as the combination of the two within the same model (Hastie and Tibshirani, 1990). GAMs have recently been used in geomorphological studies (Hjort and Luoto, 2006; Brenning et al., 2007).
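The criterion behind an AIC-based stepwise procedure can be sketched briefly. The code below does not fit a GLM and is not the procedure used in the study; it only shows how AIC = 2k − 2 ln L trades goodness of fit against the number of parameters k when two candidate presence/absence models are compared. The function names, fitted probabilities and parameter counts are hypothetical.

```python
import math


def bernoulli_log_likelihood(labels, probabilities):
    """Log-likelihood of presence/absence labels under fitted probabilities."""
    eps = 1e-12  # keeps log() finite if a probability is exactly 0 or 1
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: lower values are preferred."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical fitted probabilities from a small and a large candidate model.
labels = [1, 0, 1, 0]
small_fit = [0.70, 0.40, 0.60, 0.30]   # assumed 2 parameters
large_fit = [0.75, 0.35, 0.65, 0.30]   # assumed 5 parameters

aic_small = aic(bernoulli_log_likelihood(labels, small_fit), 2)
aic_large = aic(bernoulli_log_likelihood(labels, large_fit), 5)
preferred = "small" if aic_small < aic_large else "large"
```

In this made-up case the larger model fits slightly better, but not by enough to offset the penalty for its three extra parameters, so a stepwise search would keep the smaller model.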
Multivariate adaptive regression splines (MARS) represent a relatively new technique that combines classical linear regression, mathematical construction of splines and binary recursive partitioning to produce a local model in which relationships between response and predictors are either linear or nonlinear (Friedman, 1991). An important feature of MARS is its sensitivity to outliers and to collinearity between the variables (Deichmann et al., 2002). Examples of the use of MARS can be found in geomorphology in Luoto and Hjort (2005), in climatology in Corte-Real et al. (1995) and in geophysics in Deveaux et al. (1993). CTA is an alternative to regression techniques and uses a tree structure (Breiman et al., 1984). It is a rule-based method defined by binary decision splits on the values of predictors (Venables and Ripley, 2002). CTA is used rather frequently in geomorphological and environmental studies (Franklin, 2002; Luoto and Hjort, 2005). Discriminant analysis is used in statistics to identify the linear combination of features that best separates two or more classes of object. MDA is an extension of the well-known linear discriminant analysis (LDA) (Venables and Ripley, 2002), in which classes are modelled as a mixture of subclasses, with each subclass represented by a Gaussian distribution. An example of the use of MDA in geomorphology was presented by Merritt and Wohl (2003). GBM is a sequential method based on binary trees (Ridgeway, 1999). GBM is considered a machine-learning method that uses adaptive weighting of the outputs of numerous classification algorithms. The boosted classifier’s prediction is based on an accuracy-weighted vote across the estimated classifiers (Ridgeway, 1999). Boosting methods are novel statistical techniques, which have been used recently in ecological modelling (Elith et al., 2006). However, to the best of our knowledge GBM has not previously been used in geomorphological research.
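The accuracy-weighted vote mentioned for boosting can be illustrated with a small sketch. The weighting formula below is the AdaBoost-style weight 0.5 ln((1 − e)/e), used here as an assumption; the excerpt does not state which weighting the GBM implementation applies, and the votes and error rates are hypothetical.

```python
import math


def classifier_weight(error_rate):
    """AdaBoost-style weight: lower error gives a larger say in the vote."""
    eps = 1e-12  # avoids division by zero and log(0) at error rates of 0 or 1
    e = min(max(error_rate, eps), 1 - eps)
    return 0.5 * math.log((1 - e) / e)


def weighted_vote(votes, error_rates):
    """Combine +1 (presence) / -1 (absence) votes, weighted by accuracy."""
    score = sum(classifier_weight(e) * v for v, e in zip(votes, error_rates))
    return 1 if score >= 0 else -1


# Hypothetical case: one accurate classifier votes presence,
# two weak classifiers vote absence.
decision = weighted_vote([1, -1, -1], [0.10, 0.40, 0.45])

# With equal error rates the vote reduces to a simple majority.
majority = weighted_vote([1, -1, -1], [0.40, 0.40, 0.40])
```

In the first case the single accurate classifier outweighs the two weak ones, which is the behaviour an accuracy-weighted vote is meant to produce.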
ANNs are powerful rule-based modelling techniques, which are frequently used in spatial modelling. ANNs provide an alternative way to generalize linear regression functions (Venables and Ripley, 2002). Neural networks have received considerable attention as a means of building accurate models for prediction when the functional form of the underlying equations is unknown (Lek and Guegan, 1999). Luoto and Hjort (2005) evaluated the reliability of ANN for ...
