Table 1 - uploaded by Carla Osthoff
Description of variables generated by the modeling process.

Source publication
Chapter
Full-text available
Spatial analysis tools and synthesis of results are key to identifying the best solutions in biodiversity conservation. The importance of process automation is associated with increased efficiency and performance both in the data pre-processing phase and in the post-analysis of the results generated by the packages and modeling programs. The Model-...

Contexts in source publication

Context 1
... main objective of this work was to package modeling procedures as R functions and to create an application (Model-R) that allows users, either via the command line or through a web interface, to perform ecological niche modeling, overcoming the most common barriers and providing approaches for data entry steps, data cleaning, choice of predictor variables, parametrization of algorithms, and post-analysis as well as the retrieval of the results. A list of acronyms and variable definitions is presented in Table 1. ...
Context 2
... values obtained from the validation process are stored as a table, and their values are presented in Figure 2 (step 6). A brief description of each variable is presented in Table 1. ...

Similar publications

Article
Full-text available
Background: Lyme disease (LD) is a bacterial infection transmitted by the black-legged tick (Ixodes scapularis) in eastern North America. It is an emerging disease in Canada due to the expanding range of its tick vector. Environmental risk maps for LD, based on the distribution of the black-legged tick, have focused on coarse determinants such as...
Article
Full-text available
The pond turtle Emys trinacris is an endangered endemic species of Sicily showing a fragmented distribution throughout the main island. In this study, we applied "Ensemble Niche Modelling", combining more classical statistical techniques such as Generalized Linear Models and Multivariate Adaptive Regression Splines with machine-learning approaches such as Bo...
Article
Full-text available
The responses of animal communities to fire are not well understood. We used modelling techniques to analyse how fire altered the habitat suitability of amphibian and reptile species across a 37-year chronosequence (1975–2011). The study was conducted at a biogeographical crossroads between the Mediterranean and medio-European bioregions. Using 944...
Article
Full-text available
Objective: To identify the high-risk spatiotemporal clusters of dengue cases and explore the associated risk factors. Methods: Monthly indigenous dengue cases in 2005-2017 were aggregated at county level. Spatiotemporal cluster analysis was used to explore dengue distribution features using SaTScan 9.4.4 and ArcGIS 10.3.0. In addition, the influen...
Article
Camptothecin (CPT) is an anticancer drug that is widely used for treating various cancers. In India, the drug is primarily sourced from natural habitats of the red-listed species Nothapodytes nimmoniana. Ecological niche models are potential tools to define and predict the “ecological niche” of a species that exhibits ecological variations. The...

Citations

... After the compilation, the Eremanthus database was edited to remove unreliable records, keeping only data suitable for use in research, eliminating problems with misidentification, inaccuracy, records outside raster boundaries, more than one datapoint per pixel and duplicated records (Giannini et al. 2012). This process was carried out manually and with the clean functions ("clean_dupl", "clean_nas", "clean_uni") of the "modleR" 0.0.0.9000 (Sánchez-Tapia et al. 2018) package in RStudio 1.3.1056 (RStudio Team 2020) with R 3.6.3 ...
Article
Full-text available
Characterized as one of the largest biodiversity hot spots, the Cerrado ecoregion is home to a wide variety of endemic species. Several threats such as agricultural expansion and habitat fragmentation put the species of the Cerrado ecosystems and biodiversity at risk. Thus, this study analysed the genus Eremanthus, which has abundant species in the Cerrado and suffers from intense anthropogenic pressure due to overexploitation, mainly as a material utilized for the construction of fences and the extraction of essential oils. Environmental suitability was estimated for the genus for the present and future (2070), in order to characterize the importance of the climate in the species distribution and to analyse the conservation status of the genus. The Species Distribution Modelling and Gap Analysis showed that the areas of environmental suitability are limited and are found in a matrix composed of a high presence of anthropic activity, which can intensify the loss of species habitat and increase the vulnerability of the group. The studied species were classified as Endangered and Vulnerable according to IUCN criteria, presenting very reduced areas of environmental suitability projected in the future and a low percentage of species in protected areas, that may influence possible species extinctions in the genus. Thus, this study provides insights to assist in conservation planning and reinforces the importance of protecting the biodiversity of the Cerrado.
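The record-cleaning steps named in the excerpt above (duplicated records, missing values, and more than one data point per pixel) can be illustrated with a minimal Python sketch. This is not the modleR implementation: the function name `clean_records`, the `origin` and `cell_size` raster parameters, and the sample coordinates are all hypothetical and chosen only for illustration.

```python
import math


def clean_records(records, origin=(0.0, 0.0), cell_size=1.0):
    """Drop records with missing coordinates, exact duplicates,
    and extra records that fall in an already occupied grid cell.

    records: iterable of (lon, lat) tuples; None or NaN marks a missing value.
    origin, cell_size: hypothetical raster definition used to map a point
    to a pixel (assumed here, not taken from the source).
    """
    seen_points = set()
    seen_cells = set()
    kept = []
    for lon, lat in records:
        # Step 1: discard records with missing coordinates.
        if lon is None or lat is None or math.isnan(lon) or math.isnan(lat):
            continue
        # Step 2: discard exact duplicates.
        if (lon, lat) in seen_points:
            continue
        seen_points.add((lon, lat))
        # Step 3: keep only the first record in each raster cell.
        cell = (
            math.floor((lon - origin[0]) / cell_size),
            math.floor((lat - origin[1]) / cell_size),
        )
        if cell in seen_cells:
            continue
        seen_cells.add(cell)
        kept.append((lon, lat))
    return kept


# Illustrative coordinates only (not data from the cited study).
sample = [
    (-43.5, -19.2),
    (-43.5, -19.2),        # exact duplicate
    (None, -19.0),         # missing longitude
    (-43.4, -19.3),        # same 1-degree cell as the first record
    (-42.1, -18.7),
    (float("nan"), 1.0),   # missing value encoded as NaN
]
cleaned = clean_records(sample)
```

Keeping the first record per cell is one possible tie-breaking rule; a real workflow might instead keep the record with the best metadata.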
... After the compilation, the Eremanthus database was edited to remove unreliable records, keeping only data suitable for use in research, eliminating problems with misidentification, inaccuracy, records outside raster boundaries, more than one datapoint per pixel and duplicated records (Giannini et al. 2012). This process was carried out manually and with the clean functions ("clean_dupl", "clean_nas", "clean_uni") of the "modleR" 0.0.0.9000 (Sánchez-Tapia et al. 2018) package in RStudio 1.3.1056 (RStudio Team 2020) with R 3.6.3 ...
Preprint
Full-text available
Characterized as one of the largest biodiversity hotspots, the Cerrado ecoregion houses a wide variety of endemic species. Several threats, such as agricultural expansion and habitat fragmentation, put the species of the Cerrado ecosystems and biodiversity at risk. The genus Eremanthus is frequent in the Cerrado and suffers from intense anthropogenic pressure due to overexploitation, mainly for the construction of fences and the extraction of essential oil. Environmental suitability for the Mid-Holocene, the present, and the future (2070) was estimated for the genus in order to characterize the importance of the climate in the species distribution and to analyse the conservation status. The Species Distribution Modelling showed that most species of Eremanthus presented similarities between the environmental suitability of the present and the Mid-Holocene, enabling the identification of areas of environmental stability in OSL areas of campos rupestres. The species of the genus were classified as Endangered and Vulnerable according to IUCN criteria, presenting very reduced areas of environmental suitability projected in the future and a low percentage of species in Protected Areas, which may influence possible extinctions of species in the genus. The approaches in this study provide consistent support for conservation planning.
... BIOCLIM and Mahalanobis Distance, Busby 1991), some require presence and background data (Maxent, Phillips et al. 2006), and many require presence and (pseudo)absence data (e.g. GLM, GAM, and Random Forest, Breiman 2001;Sánchez-Tapia et al. 2018). ...
Article
Different models are available to estimate species’ niche and distribution. Mechanistic and correlative models have different underlying conceptual bases, thus generating different estimates of a species’ niche and geographic extent. Hybrid models, which combine correlative and mechanistic approaches, are considered a promising strategy; however, no synthesis in the literature has assessed their applicability for terrestrial vertebrates to support the choice of the best model given their strengths and trade-offs. Here, we provide a systematic review of studies that compared or integrated correlative and mechanistic models to estimate species’ niche for terrestrial vertebrates under climate change. Our goal was to understand their conceptual, methodological, and performance differences, and the applicability of each approach. The studies we reviewed directly compared mechanistic and correlative predictions in terms of accuracy or estimated suitable area, however, without any quantitative analysis to support comparisons. In contrast, many studies suggest that instead of comparing approaches, mechanistic and correlative methods should be integrated (hybrid models). However, we stress that the best approach is highly context-dependent. Indeed, the quality and effectiveness of the prediction depend on the study's objective, its methodological design, and which type of estimate of the species’ niche and geographic distribution is more appropriate to answer the study's question.
... Currently, there are several packages available in R that can help in producing useful data in the reproducibility processes of the ENM experiments (Cobos, Peterson, Barve, & Osorio-Olvera, 2019; de Andrade, Velazco, & De Marco Júnior, 2020; Golding et al., 2018; Kass et al., 2018; Qiao et al., 2016; Sánchez-Tapia et al., 2018). A framework for scalable and reproducible ENM, Model-R (Sánchez-Tapia et al., 2018), was developed with the objective of unifying and automating preprocessing, processing, and postprocessing steps, as well as maintaining all this information for reproducibility purposes. This tool includes packages related to retrieving and cleaning data, multi-projection tools that can be applied to different temporal and spatial datasets, and postprocessing tools linked to the generated models. ...
Article
Full-text available
The unprecedented size of the human population, along with its associated economic activities, has an ever‐increasing impact on global environments. Across the world, countries are concerned about the growing resource consumption and the capacity of ecosystems to provide resources. To effectively conserve biodiversity, it is essential to make indicators and knowledge openly available to decision‐makers in ways that they can effectively use them. The development and deployment of tools and techniques to generate these indicators require having access to trustworthy data from biological collections, field surveys and automated sensors, molecular data, and historic academic literature. The transformation of these raw data into synthesized information that is fit for use requires going through many refinement steps. The methodologies and techniques applied to manage and analyze these data constitute an area usually called biodiversity informatics. Biodiversity data follow a life cycle consisting of planning, collection, certification, description, preservation, discovery, integration, and analysis. Researchers, whether producers or consumers of biodiversity data, will likely perform activities related to at least one of these steps. This article explores each stage of the life cycle of biodiversity data, discussing its methodologies, tools, and challenges. Biodiversity data has a life‐cycle that involves planning, gathering, quality improvement, documentation through metadata, preservation through publication, discovery and integration, and analysis. This article describes the practices, tools, and challenges in each step of this life‐cycle.
... Here, we present modleR, a workflow designed to automatize some of the common steps when performing ecological niche models, using the R statistical environment (R Core Team, 2018). An early version (Sánchez-Tapia et al., 2018) had a similar structure but several improvements have been implemented to this day, so we refer the users to the current manuscript, with focus on the R package. We have used functions from well-known packages, such as dismo (Hijmans et al., 2017), maxnet, RandomForest (Liaw and Wiener, 2002), and some newer but promising implementations, such as kuenm (Cobos et al., 2019). ...
Preprint
Full-text available
Ecological niche models (ENM) use the environmental variables associated with the currently known distribution of a species to model its ecological niche and project it into the geographic space. Widely used and misused, ENM has become a common tool for ecologists and decision-makers. Many ENM platforms have been developed over the years, first as standalone programs, later as packages within script-based programming languages and environments. The democratization of these programming tools and the advent of Open Science brought a growing concern regarding the reproducibility, transparency, robustness, portability, and interoperability in ENM workflows. ENM workflows have some core components that are replicated between projects. However, they have a large internal variation due to the variety of research questions and applications. Any ecological niche modeling platform should take into account this trade-off between stability and reproducibility on one hand, and flexibility and decision-making on the other. Here, we present modleR, a four-step workflow that wraps some of the common phases executed during an ecological niche model procedure. We have divided the process into (1) data setup, (2) model fitting and projection, (3) partition joining and (4) ensemble modeling (consensus between algorithms). modleR is highly adaptable and replicable depending on the user’s needs and is open to deeper internal parametrization. It can be used as a testing platform due to its consistent folder structure and its capacity to control some sources of variation while changing others. It can be run in interactive local sessions and in high-performance or high-throughput computational (HPC/HTC) platforms and parallelized by species or algorithms. It can also communicate with other tools in the field, allowing the user to enter and exit the workflow at any phase, and execute complementary routines outside the package. Finally, it records metadata and session information at each step, ensuring reproducibility beyond the use of script-based applications.
... Using these inputs with the methodology provides a consistent modeling approach for multiple species that can be reviewed and compared. Similar frameworks for invasive species have been suggested; however, they are often conceptual, have limited human input, or suggest the development of multiple models across spatial scales [24, 58–61]. Sofaer et al. [12] identified the development of credible and repeatable models as an important issue limiting the use of species distribution models, and this framework provides a credible process as illustrated in the evaluation table (Table 4) and outputs such as response curves for expert review that are easily repeatable (Table B and Figure A in S1 File). ...
Article
Full-text available
Predictions of habitat suitability for invasive plant species can guide risk assessments at regional and national scales and inform early detection and rapid-response strategies at local scales. We present a general approach to invasive species modeling and mapping that meets objectives at multiple scales. Our methodology is designed to balance trade-offs between developing highly customized models for few species versus fitting non-specific and generic models for numerous species. We developed a national library of environmental variables known to physiologically limit plant distributions and relied on human input based on natural history knowledge to further narrow the variable set for each species before developing habitat suitability models. To ensure efficiency, we used largely automated modeling approaches and human input only at key junctures. We explore and present uncertainty by using two alternative sources of background samples, including five statistical algorithms, and constructing model ensembles. We demonstrate the use and efficiency of the Software for Assisted Habitat Modeling [SAHM 2.1.2], a package in VisTrails, which performs the majority of the modeling analyses. Our workflow includes solicitation of expert feedback on model outputs such as spatial prediction results and variable response curves, and iterative improvement based on new data availability and directed field validation of initial model results. We highlight the utility of the models for decision-making at regional and local scales with case studies of two plant species that invade natural areas: fountain grass (Pennisetum setaceum) and goutweed (Aegopodium podagraria). By balancing model automation with human intervention, we can efficiently provide land managers with mapped predicted distributions for multiple invasive species to inform decisions across spatial scales.
... As a result, studies that rely on ENMs usually do not have the same methodological rigor as studies that focus on developing ENMs; for example, several studies still apply the Area Under the Curve (AUC) as an evaluation metric, even though it has been demonstrated for over 10 years that the metric is deeply affected by prevalence (Lobo et al., 2008) or the extent of the accessible area (Peterson et al., 2008; Barve et al., 2011). On the other hand, there has been a great effort to develop alternatives for the AUC and for several other methodological aspects, which have been implemented in several R packages and ENM software (Guo & Liu, 2010; Naimi & Araújo, 2016; Hijmans et al., 2017; Golding et al., 2018; Kass et al., 2018; Sánchez-Tapia et al., 2018; Cobos et al., 2019). ...
Article
Ecological niche models (ENMs) are a popular method in ecology, mostly due to their broad applicability and the fact that the required data are simple and easily accessible from digital databases. Nevertheless, there is an underlying methodological complexity, often overlooked by many scientists who rely on ENMs to achieve other objectives. We present here ENMTML, an open-source R package. The main purpose of this package is to assemble all this methodological complexity spread over several papers and bring it into the spotlight in a simple way for people not used to the details of ENMs. The package contains several alternatives for different methodological steps, e.g., pseudo-absence allocation and accessible area delimitation, formulated within a single function, to make it accessible for people not used to the programming environment.
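For readers unfamiliar with the AUC metric mentioned in the excerpt above, the following Python sketch shows one standard way to compute it: as the fraction of presence/absence pairs in which the presence site receives the higher suitability score (ties count as half). The function name `auc` and the scores are illustrative assumptions and do not come from any of the packages cited; the sketch does not demonstrate the prevalence critique itself.

```python
def auc(presence_scores, absence_scores):
    """Rank-based AUC: probability that a randomly chosen presence site
    scores higher than a randomly chosen (pseudo)absence site."""
    if not presence_scores or not absence_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0      # presence ranked above absence
            elif p == a:
                wins += 0.5      # ties count as half a win
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical suitability scores, for illustration only.
value = auc([0.9, 0.8, 0.4], [0.5, 0.3])
```

Here five of the six presence/absence pairs are ranked correctly, so `value` is 5/6.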
... Along with other types of participants, GBIF gathers data from 1152 institutions, totaling approximately a billion records. SiBBr [16] is the Brazilian GBIF node, publishing species occurrence records and providing an ecological niche modeling portal [45]. BaMBa [27] is a biodiversity database that focuses on marine ecosystems that is also integrated with GBIF. ...
Article
Full-text available
The well-being of human and wildlife health involves many challenges, such as monitoring the movement of pathogens; expanding health surveillance; collecting data and extracting information to identify and predict risks; integrating specialists from different areas to handle data, species and distinct social and environmental contexts; and the commitment to bringing relevant information to society. In Brazil, there is still the difficulty of building a system that is not impaired by its large territorial extension and its poorly integrated sectoral policies. The Brazilian Wildlife Health Information System, SISS-Geo (SISS-Geo is the abbreviation of “Sistema de Informação em Saúde Silvestre Georreferenciado” (which translates to “Georeferenced Wildlife Health Information System”) and can be accessed at http://www.biodiversidade.ciss.fiocruz.br or http://sissgeo.lncc.br (in Portuguese)), is a platform for collaborative monitoring that intends to overcome the challenges in wildlife health. It aims at the integration and participation of various segments of society, encompassing the registration of animal occurrences by citizen scientists; the reliable diagnosis of pathogens from the laboratory and expert networks; and computational and mathematical challenges in analytical and predictive systems, model interpretation, data integration and visualization, and geographic information systems. It has been successfully applied to support decision-making on recent wildlife health events, such as a Yellow Fever epizootic. The online version of this article is available at: https://rdcu.be/bIxuD
... The end result is identification of a set of conditions associated with the occurrence of the species in question. Model-R [14] is a workflow, implemented in R, that automates some of the common steps for performing ENM. The following is an overview of each of these steps and the respective activities implemented by Model-R (Fig. 4): Pre-processing: comprises the data acquisition stage, which includes assembly of environmental layers and occurrence points, used as input to model algorithms. ...
Chapter
Full-text available
Reproducibility is a fundamental requirement of the scientific process since it enables outcomes to be replicated and verified. Computational scientific experiments can benefit from improved reproducibility for many reasons, including validation of results and reuse by other scientists. However, designing reproducible experiments remains a challenge and hence the need for developing methodologies and tools that can support this process. Here, we propose a conceptual model for reproducibility to specify its main attributes and properties, along with a framework that allows for computational experiments to be findable, accessible, interoperable, and reusable. We present a case study in ecological niche modeling to demonstrate and evaluate this framework.
... Tapia et al. 2018) with a three-fold cross validation procedure, meaning that two partitions were used for parameter estimation and algorithm training, and one to evaluate the model's accuracy. Random pseudo-absence points (nback = 1000) were sorted within a mean distance buffer, where the radius of the buffer was the mean geographic distance between the occurrence points. ...
Preprint
Full-text available
The increasing worldwide interest in the conservation of tropical forests reflects the conversion of over 50% of their area into agricultural lands and other uses. Understanding the distribution of remaining biodiversity across agricultural landscapes is an essential task to guide future conservation strategies. To understand the long-term effects of fragmentation on biodiversity, we investigated whether forest fragments in southeastern Brazil are under a taxonomic homogenization or heterogenization process. We estimated pre-deforestation species richness and composition based on a Species Distribution Modelling approach, and compared them to the observed patterns of α- and β-diversity. In particular, we asked (i) if changes in β-diversity reveal convergence or divergence in species composition; (ii) if these changes are similar between forest fragments in Strictly Protected Areas (SPAs) (n=20) and within private lands (n=367) and in different regions of the state (West, Center, and Southeast). We detected steep reductions in observed local species richness in relation to our modeled predictions, and this was particularly true among forest fragments in non-protected private lands. The higher observed β-diversity indicated an overall biotic heterogenization process, consistent with the idea that the originally diverse vegetation is now reduced to small and isolated patches, with unique disturbance histories and impoverished communities derived from a large regional species pool. Since conservation of biodiversity extends beyond the boundaries of strictly Protected Areas, we argue that forest fragments are valuable for conservation in agricultural landscapes, with particular relevance for private lands, which represent the most exposed and neglected share of what is left.
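The evaluation design quoted in the excerpt above (three-fold cross-validation, and pseudo-absence points placed within a buffer whose radius is the mean distance between occurrence points) can be sketched in Python as follows. Several things here are assumptions rather than the cited authors' method: "sorted" is read as "randomly drawn", distances are planar Euclidean instead of geographic, the buffer is taken as the union of circles around each occurrence, and all function names and coordinates are hypothetical.

```python
import math
import random
from itertools import combinations


def k_fold_partitions(n_records, k=3, seed=0):
    """Shuffle record indices and deal them into k folds of near-equal size."""
    rng = random.Random(seed)
    indices = list(range(n_records))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(folds):
    """Yield (train, test) index lists: each fold is the test set once,
    and the remaining folds are pooled for training."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all pairs of occurrence points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        raise ValueError("need at least two points")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def sample_pseudo_absences(points, n, seed=0):
    """Rejection-sample n points that lie within the mean-distance radius
    of at least one occurrence point (assumes the points are not all identical)."""
    rng = random.Random(seed)
    radius = mean_pairwise_distance(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, max_x = min(xs) - radius, max(xs) + radius
    min_y, max_y = min(ys) - radius, max(ys) + radius
    samples = []
    while len(samples) < n:
        candidate = (rng.uniform(min_x, max_x), rng.uniform(min_y, max_y))
        if any(math.dist(candidate, p) <= radius for p in points):
            samples.append(candidate)
    return samples


# Hypothetical occurrence coordinates in arbitrary planar units.
occurrences = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
radius = mean_pairwise_distance(occurrences)
background = sample_pseudo_absences(occurrences, n=50, seed=2)
folds = k_fold_partitions(9, k=3, seed=1)
splits = list(train_test_splits(folds))
```

With three folds, each split trains on two partitions and evaluates on the third, matching the description in the excerpt. For real longitude/latitude data, a great-circle distance would be a more appropriate choice than `math.dist`.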