blockCV: An R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models

2018, Vol 10 (2), pp. 225-232
Author(s): Roozbeh Valavi, Jane Elith, José J. Lahoz-Monfort, Gurutzeta Guillera-Arroita

Summary

When applied to structured data, conventional random cross-validation techniques can lead to underestimation of prediction error and may result in inappropriate model selection.

We present the R package blockCV, a new toolbox for cross-validation of species distribution models. The package can generate spatially or environmentally separated folds. It includes tools to measure spatial autocorrelation ranges in candidate covariates, giving the user insight into the spatial structure of these data. It also offers interactive graphical capabilities for creating spatial blocks and exploring data folds.

Package blockCV enables modellers to implement a range of evaluation approaches more easily. It will help the modelling community learn more about how evaluation approaches affect our understanding of the predictive performance of species distribution models.
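The core idea behind spatially separated folds can be sketched in a few lines: records are grouped into spatial blocks, and whole blocks (not individual records) are assigned to folds, so nearby, autocorrelated records never straddle the training/test split. The Python below is a minimal illustration assuming square blocks of a user-chosen edge length and random allocation of blocks to folds. `spatial_block_folds` and `split` are hypothetical helpers, not blockCV's R API; the package itself also offers environmentally separated folds and tools for choosing block size from autocorrelation ranges.

```python
import random


def spatial_block_folds(points, block_size, k, seed=0):
    """Assign each (x, y) point to one of k folds via square spatial blocks.

    All points falling in the same grid cell share a fold, so a held-out
    fold is spatially separated from the training data by block edges.
    (Hypothetical helper for illustration; not the blockCV API.)
    """
    # Map every point to the grid cell (block) that contains it.
    block_of = [(int(x // block_size), int(y // block_size)) for x, y in points]

    # Shuffle the distinct blocks, then deal them out to folds in turn.
    blocks = sorted(set(block_of))
    rng = random.Random(seed)
    rng.shuffle(blocks)
    fold_of_block = {b: i % k for i, b in enumerate(blocks)}

    return [fold_of_block[b] for b in block_of]


def split(folds, test_fold):
    """Return (train_indices, test_indices) for one held-out fold."""
    train = [i for i, f in enumerate(folds) if f != test_fold]
    test = [i for i, f in enumerate(folds) if f == test_fold]
    return train, test


points = [(0.5, 0.5), (0.7, 0.2), (5.1, 5.3), (5.9, 5.0), (12.0, 1.0)]
folds = spatial_block_folds(points, block_size=5.0, k=2)
for held_out in range(2):
    train_idx, test_idx = split(folds, held_out)
    print(held_out, train_idx, test_idx)
```

Because allocation happens at the block level, folds can be unbalanced when there are few blocks; that trade-off between spatial separation and fold size is one reason block size deserves care.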


2020
Author(s): Daijiang Li, Russell Dinnage, Lucas Nell, Matthew R. Helmus, Anthony Ives

Summary

Model-based approaches are increasingly popular in ecological studies. A good example of this trend is the use of joint species distribution models to ask questions about ecological communities. However, most current applications of model-based methods do not include phylogenies, despite the well-known importance of phylogenetic relationships in shaping species distributions and community composition. In part, this is due to a lack of accessible tools that allow ecologists to fit phylogenetic species distribution models easily.

To fill this gap, the R package phyr (pronounced "fire") implements a suite of metrics, comparative methods and mixed models that use phylogenies to understand and predict community composition and other ecological and evolutionary phenomena. The phyr workhorse functions are implemented in C++, making all calculations and model estimations fast.

phyr can fit a variety of models, such as phylogenetic joint species distribution models, spatiotemporal-phylogenetic autocorrelation models, and phylogenetic trait-based bipartite network models. phyr also estimates phylogenetically independent trait correlations with measurement error to test for adaptive syndromes, and performs fast calculations of common alpha and beta phylogenetic diversity metrics. All phyr methods are united under Brownian motion or Ornstein-Uhlenbeck models of evolution, and phylogenetic terms are modelled as phylogenetic covariance matrices.

The functions and model formula syntax we propose in phyr serve as a simple and unified framework that encourages the use of phylogenies to address a variety of ecological questions.
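Under a Brownian motion model of evolution, the phylogenetic covariance between two species is proportional to the branch length they share from the root down to their most recent common ancestor, and each species' variance is its total root-to-tip length. The sketch below builds that matrix from a tree stored as a hypothetical child -> (parent, branch length) dictionary. It illustrates the Brownian motion case only; it is not phyr's C++ implementation and it omits the Ornstein-Uhlenbeck alternative.

```python
def bm_covariance(parent, tips, sigma2=1.0):
    """Brownian-motion covariance matrix for the given tips.

    parent maps each non-root node to (parent_node, branch_length).
    Entry [i][j] is sigma2 times the branch length shared by the
    root-to-tip paths of tips i and j.
    """
    def edges_to_root(node):
        # Each edge is identified by its child node; value is its length.
        edges = {}
        while node in parent:
            up, length = parent[node]
            edges[node] = length
            node = up
        return edges

    paths = {t: edges_to_root(t) for t in tips}
    matrix = []
    for a in tips:
        row = []
        for b in tips:
            # Edges on both paths = path from root to the common ancestor.
            shared = paths[a].keys() & paths[b].keys()
            row.append(sigma2 * sum(paths[a][e] for e in shared))
        matrix.append(row)
    return matrix


# Tree ((A:1, B:1)n1:2, C:3)root
tree = {"A": ("n1", 1.0), "B": ("n1", 1.0), "n1": ("root", 2.0), "C": ("root", 3.0)}
print(bm_covariance(tree, ["A", "B", "C"]))
# [[3.0, 2.0, 0.0], [2.0, 3.0, 0.0], [0.0, 0.0, 3.0]]
```

Sister species A and B covary through their shared 2-unit stem, while C shares no branch length with either and so is independent under this model.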


2017, Vol 8 (12), pp. 1795-1803
Author(s): Sylvain Schmitt, Robin Pouteau, Dimitri Justeau, Florian Boissieu, Philippe Birnbaum

Author(s): Matutini Florence, Baudry Jacques, Pain Guillaume, Sineau Morgane, Pithon Joséphine

Abstract

Species distribution models (SDMs) have been increasingly developed in recent years, but their validity is questioned. Their assessment can be improved by the use of independent data, but such data can be difficult to obtain and costly to collect. Standardized data from citizen science may be used to establish external evaluation datasets and to improve SDM validation and applicability. We used opportunistic presence-only data along with presence-absence data from a standardized citizen science program to build and assess habitat suitability maps for 9 amphibian species in western France. We assessed the performance of Generalized Additive Models and Random Forest models by (1) cross-validation using 30% of the opportunistic dataset used to calibrate the model, or (2) external validation using different independent datasets derived from citizen science monitoring. We tested the effects of applying different combinations of filters to the citizen data and of complementing it with additional standardized fieldwork. Cross-validation with an internal evaluation dataset resulted in higher AUC (Area Under the receiver operating characteristic Curve) than external evaluation, overestimating model accuracy, and did not select the same models; models integrating sampling effort performed better under external validation. The AUC, specificity and sensitivity of models calculated with different filtered external datasets differed for some species. However, for most species, complementary fieldwork was not necessary to obtain coherent results, as long as the citizen science data were strongly filtered. Since external validation methods using independent data are considered more robust, filtered citizen science data may make a valuable contribution to the assessment of SDMs.
Limited complementary fieldwork with volunteer participation to complete ecological gradients may also enhance citizen involvement and lead to better use of SDMs in decision processes for nature conservation.
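AUC has a convenient rank interpretation: it is the probability that a randomly chosen presence site receives a higher predicted suitability than a randomly chosen absence site, with ties counting one half. The sketch below computes AUC that way, together with sensitivity and specificity at a fixed threshold, so the same functions could be applied to an internal hold-out set and to an external citizen-science set and the results compared. The scores and the 0.5 threshold are made-up illustrations, not values from the study.

```python
def auc(scores_presence, scores_absence):
    """AUC as P(random presence score > random absence score); ties count 1/2."""
    wins = 0.0
    for p in scores_presence:
        for a in scores_absence:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(scores_presence) * len(scores_absence))


def sensitivity_specificity(scores_presence, scores_absence, threshold):
    """Fraction of presences predicted present, fraction of absences predicted absent."""
    true_pos = sum(s >= threshold for s in scores_presence)
    true_neg = sum(s < threshold for s in scores_absence)
    return true_pos / len(scores_presence), true_neg / len(scores_absence)


presence = [0.9, 0.8, 0.4]   # hypothetical predicted suitabilities at presence sites
absence = [0.7, 0.3, 0.2]    # hypothetical predicted suitabilities at absence sites
print(auc(presence, absence))                           # 8/9, about 0.889
print(sensitivity_specificity(presence, absence, 0.5))  # (2/3, 2/3)
```

AUC is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off, which is why studies often report all three.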


2021
Author(s): Jaime Carrasco, Fugencio Lison, Andres Weintraub

Traditional species distribution models (SDMs) may not be appropriate when examples of one class (e.g. absences or pseudo-absences) greatly outnumber examples of the other class (e.g. presences or observations), because they tend to favor learning the more frequent class. We present an ensemble method called Random UnderSampling and Boosting (RUSBoost), which was designed to address the case where the numbers of presence and absence records are imbalanced, and we opened the "black box" of the algorithm to interpret its results and its applicability in ecology. We applied our methodology to a case study of twenty-five bat species from the Iberian Peninsula and built a RUSBoost model for each species. Furthermore, to build tighter models, we optimized their hyperparameters using Bayesian optimization. In particular, we implemented an objective function that represents the cross-validation loss, kFoldLoss(z), where z represents the hyperparameters Maximum Number of Splits, Number of Learners and Learning Rate. The models reached an average Area Under the ROC Curve (AUC) of 0.84 ± 0.05, and average specificity, sensitivity and overall accuracy of 79.5 ± 4.87%, 74.9 ± 6.05% and 78.8 ± 5.0%, respectively. We also obtained values of variable importance and analyzed the relationships between explanatory variables and bat presence probability. The results of our study showed that RUSBoost could be a useful tool for developing well-performing SDMs when presence/absence databases are imbalanced. The application of this algorithm could improve SDM predictions and support conservation biology and management.
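RUSBoost combines random undersampling of the majority class with boosting: in every round, all minority records are kept, an equal number of majority records is drawn at random, a weak learner is trained on that balanced subset, and its weighted error on the full training set drives the usual boosting reweighting. The Python below is a simplified sketch under stated assumptions: labels are +1 (presence) and -1 (absence), the weak learners are decision stumps, and the update is discrete AdaBoost rather than the AdaBoost.M2 variant of the original RUSBoost formulation. It is not the implementation used in the study and leaves out the Bayesian hyperparameter search.

```python
import math
import random


def fit_stump(X, y, w):
    """Weighted decision stump: (feature, threshold, polarity) with least weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted(set(row[j] for row in X)):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] >= thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best[1:]


def stump_predict(stump, row):
    j, thr, pol = stump
    return pol if row[j] >= thr else -pol


def rusboost(X, y, n_rounds=10, seed=0):
    """Simplified RUSBoost sketch (undersampling + discrete AdaBoost); labels are +1/-1."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    minority = 1 if y.count(1) <= y.count(-1) else -1
    mino = [i for i in range(n) if y[i] == minority]
    majo = [i for i in range(n) if y[i] != minority]
    ensemble = []
    for _ in range(n_rounds):
        # Random undersampling: all minority records plus an equal-sized majority draw.
        sample = mino + rng.sample(majo, min(len(mino), len(majo)))
        stump = fit_stump([X[i] for i in sample], [y[i] for i in sample],
                          [w[i] for i in sample])
        # Weighted error is measured on the full training set.
        preds = [stump_predict(stump, row) for row in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for perfect stumps
        if err >= 0.5:
            break  # weak learner no better than chance; stop boosting
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified records, then renormalise.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict_score(ensemble, row):
    """Positive score means predicted presence."""
    return sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
```

In this sketch the number of rounds plays the role of the "Number of Learners" hyperparameter; a k-fold loss over such fits is the kind of objective a Bayesian optimizer would minimize.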


2018
Author(s): Boyan Angelov

ABSTRACT

Species Distribution Models (SDMs) are used to generate maps of realised and potential ecological niches for a given species. Like other machine learning techniques, they can be seen as "black boxes" because they lack interpretability. Advances in other areas of applied machine learning can be used to remedy this problem. In this study we test a new tool based on Local Interpretable Model-agnostic Explanations (LIME) by comparing its results with those of other known methods and with ecological interpretations from domain experts. The findings confirm that LIME provides consistent and ecologically sound explanations of climate feature importance during the training of SDMs, and that the sdmexplain R package can be used with confidence.
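LIME explains a single prediction by sampling perturbed copies of an input, weighting them by proximity to the original, and fitting a weighted linear surrogate whose coefficients serve as local feature effects. The sketch below shows that loop in plain Python, assuming Gaussian perturbations, an exponential proximity kernel and a weighted ridge fit. `lime_explain` and its parameters are hypothetical, not the API of the sdmexplain package or of the original LIME library.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def lime_explain(model, instance, n_samples=500, scale=1.0,
                 kernel_width=0.75, ridge=1e-3, seed=0):
    """Local linear effects of each feature on model(x) around `instance` (hypothetical helper)."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # Perturb the instance with Gaussian noise and query the black-box model.
        z = [v + rng.gauss(0.0, scale) for v in instance]
        offsets = [a - b for a, b in zip(z, instance)]
        dist2 = sum(o * o for o in offsets)
        rows.append([1.0] + offsets)              # intercept + feature offsets
        targets.append(model(z))
        weights.append(math.exp(-dist2 / kernel_width ** 2))  # closer samples count more
    # Weighted ridge regression: (X^T W X + ridge*I) beta = X^T W y (intercept unpenalised).
    p = len(rows[0])
    A = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
          + (ridge if i == j and i > 0 else 0.0) for j in range(p)] for i in range(p)]
    b = [sum(w * r[i] * t for r, w, t in zip(rows, weights, targets)) for i in range(p)]
    return solve(A, b)[1:]  # drop intercept, keep per-feature effects


# Hypothetical "SDM": a linear score in two climate variables.
effects = lime_explain(lambda z: 3 * z[0] - 2 * z[1] + 1, [0.5, -1.0])
print(effects)
```

For a linear model the recovered effects should closely match the model's own coefficients (here about 3 and -2), which is a useful sanity check before applying the idea to a fitted, nonlinear SDM.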

