Do pseudo-absence selection strategies influence species distribution models and their predictions? An information-theoretic approach based on simulated data

doi:10.1186/1472-6785-9-8

BMC Ecol. 2009 Apr 24;9:8.

Mary S Wisz¹, Antoine Guisan

PMID: 19393082
PMCID: PMC2680809
DOI: 10.1186/1472-6785-9-8

Mary S Wisz et al. BMC Ecol. 2009.

Affiliation

¹ Department of Arctic Environment, National Environmental Research Institute, University of Aarhus, Frederiksborgvej 399, 4000 Roskilde, Denmark. msw@dmu.dk

Abstract

Background: Multiple logistic regression is precluded from many practical applications in ecology that aim to predict the geographic distributions of species because it requires absence data, which are rarely available or are unreliable. To use multiple logistic regression, many studies have simulated "pseudo-absences" through a number of strategies, but it is unknown how the choice of strategy influences models and their geographic predictions of species. In this paper we evaluate the effect of several prevailing pseudo-absence strategies on the predictions of the geographic distribution of a virtual species whose "true" distribution and relationship to three environmental predictors were predefined. We evaluated the effect of using (a) real absences, (b) pseudo-absences selected randomly from the background, and (c) two-step approaches, in which pseudo-absences are selected from low-suitability areas predicted by either Ecological Niche Factor Analysis (ENFA) or BIOCLIM. We compared how the choice of pseudo-absence strategy affected model fit, predictive power, and information-theoretic model selection results.
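
The two pseudo-absence strategies can be illustrated with a minimal Python sketch. This is not the authors' code: the environmental data are simulated, all function names are hypothetical, and the percentile envelope is only a crude stand-in for BIOCLIM (ENFA is not implemented here).

```python
import numpy as np

def random_pseudo_absences(n_cells, presence_idx, n_absences, rng):
    """Strategy (b): draw pseudo-absences at random from background
    cells that are not known presences."""
    background = np.setdiff1d(np.arange(n_cells), presence_idx)
    return rng.choice(background, size=n_absences, replace=False)

def envelope_suitability(env, presence_idx, lower=5, upper=95):
    """Crude BIOCLIM-style score (an assumption, not the real BIOCLIM):
    fraction of predictors for which a cell lies inside the
    [lower, upper] percentile range observed at the presences."""
    lo = np.percentile(env[presence_idx], lower, axis=0)
    hi = np.percentile(env[presence_idx], upper, axis=0)
    inside = (env >= lo) & (env <= hi)
    return inside.mean(axis=1)

def two_step_pseudo_absences(env, presence_idx, n_absences, threshold, rng):
    """Strategy (c): draw pseudo-absences only from cells whose
    envelope suitability is at or below `threshold`."""
    score = envelope_suitability(env, presence_idx)
    candidates = np.setdiff1d(np.flatnonzero(score <= threshold), presence_idx)
    n = min(n_absences, candidates.size)  # fewer if the pool is small
    return rng.choice(candidates, size=n, replace=False)

# Hypothetical landscape: 1000 cells described by 3 predictors.
rng = np.random.default_rng(0)
env = rng.normal(size=(1000, 3))
# Hypothetical presences: cells near the centre of environmental space.
presence_idx = np.flatnonzero(np.abs(env).max(axis=1) < 0.8)[:100]

random_pa = random_pseudo_absences(len(env), presence_idx, 200, rng)
two_step_pa = two_step_pseudo_absences(env, presence_idx, 200, 0.34, rng)
```

The threshold of 0.34 keeps only cells that fall inside the envelope for at most one of the three predictors; it is an arbitrary illustrative value.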

Results: Models built with true absences had the best predictive power and the best discriminatory power, and the "true" model (the one that contained the correct predictors) was supported by the data according to AIC, as expected. Models based on random pseudo-absences had among the lowest fit but yielded the second-highest AUC value (0.97), and the "true" model was also supported by the data. Models based on two-step approaches had intermediate fit and the lowest predictive power, and the "true" model was not supported by the data.
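
For readers unfamiliar with the AUC statistic reported above, the following Python sketch computes it from its rank-based definition: the probability that a randomly chosen presence receives a higher predicted score than a randomly chosen absence. The labels and scores are invented for illustration and are unrelated to the study's data.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney formulation:
    share of (presence, absence) pairs in which the presence scores
    higher, with ties counted as one half."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true]
    neg = scores[~y_true]
    # All pairwise comparisons; O(n_pos * n_neg), fine for small inputs.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Hypothetical observations (1 = presence) and predicted suitabilities.
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(y, p)  # 8 of the 9 presence/absence pairs are ordered correctly
```

Because AUC depends only on the ranking of scores, a model can discriminate well while still fitting the data poorly, which is consistent with the pattern reported for random pseudo-absences.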

Conclusion: If ecologists wish to build parsimonious GLMs that will allow them to make robust predictions, a reasonable approach is to use a large number of randomly selected pseudo-absences and to perform model selection based on an information-theoretic approach. However, the resulting models can be expected to have limited fit.
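
As a rough illustration of information-theoretic model selection, the sketch below computes AIC and Akaike weights for a set of candidate models. The log-likelihoods and parameter counts are hypothetical values chosen for the example, not results from the paper, and the function names are assumptions.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def akaike_weights(aic_values):
    """Relative support for each model, normalised to sum to 1."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical candidates: (maximised log-likelihood, number of parameters).
candidates = {
    "3 correct predictors + quadratics": (-120.0, 7),
    "6 predictors + quadratics": (-118.5, 13),
    "intercept only": (-200.0, 1),
}
names = list(candidates)
aics = [aic(ll, k) for ll, k in candidates.values()]
weights = akaike_weights(aics)
best_model = names[aics.index(min(aics))]
```

In this invented example the larger model fits slightly better but is penalised for its extra parameters, so most of the Akaike weight falls on the smaller model. The same weights can serve as coefficients when averaging predictions across models.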

Figures

**Figure 1**
**Chart summarizing methods**.

**Figure 2**
**Percent adjusted deviance explained by models developed using two-step and random pseudo-absence strategies (see bottom left corner of plot)**. Each model included the same three predictors used to define the virtual species distribution (tree cover, IAVNDVI, and minimum average temperature) along with their quadratic expressions.
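
The caption does not spell out how adjusted deviance was calculated. A commonly used definition for GLMs, assumed here rather than taken from the paper, penalises the explained deviance by the number of fitted parameters in the same way adjusted R² does. A minimal Python sketch with hypothetical inputs:

```python
def deviance_explained(null_deviance, residual_deviance):
    """D^2: proportion of the null deviance accounted for by the model."""
    return 1.0 - residual_deviance / null_deviance

def adjusted_deviance_explained(null_deviance, residual_deviance,
                                n_obs, n_params):
    """Adjusted D^2 = 1 - ((n - 1) / (n - p)) * (1 - D^2), where n is the
    number of observations and p the number of fitted parameters.
    Assumed definition; the source does not state its formula."""
    d2 = deviance_explained(null_deviance, residual_deviance)
    return 1.0 - (n_obs - 1) / (n_obs - n_params) * (1.0 - d2)

# Hypothetical fit: 400 observations, 7 parameters
# (intercept, 3 linear terms, 3 quadratic terms).
d2 = deviance_explained(500.0, 300.0)
adj_d2 = adjusted_deviance_explained(500.0, 300.0, n_obs=400, n_params=7)
percent_adjusted = 100.0 * adj_d2
```

The adjustment is small when observations greatly outnumber parameters, as in this example.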

**Figure 3**
ROC-AUC values assessing model discriminatory power for each pseudo-absence threshold, from 3-predictor models with the correct predictors (tree cover, IAVNDVI, and minimum average temperature) and their quadratic expressions (a-b), and from 6-predictor models that added 3 incorrect predictors (minimum NDVI, seasonality of precipitation, and elevational range) (c-d). Model selection was performed using model averaging based on AIC (c-d).

**Figure 4**
**ROC-AUC (discriminatory power) for models built only with the 3 correct predictors versus adjusted deviance explained (i.e. model fit)**. Model fit and discriminatory power are not always inversely correlated. The model built with "true" absences achieved high values for both. Thus ROC-AUC and adjusted deviance measure very different aspects of a model's performance and one should never be used as a surrogate for the other.

Cited by

Using pseudo-absence models to test for environmental selection in marine movement ecology: the importance of sample size and selection strength.
Pinti J, Shatley M, Carlisle A, Block BA, Oliver MJ. Mov Ecol. 2022 Dec 29;10(1):60. doi: 10.1186/s40462-022-00362-1. PMID: 36581885.
The potential impact of invasive woody oil plants on protected areas in China under future climate conditions.
Dai G, Yang J, Lu S, Huang C, Jin J, Jiang P, Yan P. Sci Rep. 2018 Jan 18;8(1):1041. doi: 10.1038/s41598-018-19477-w. PMID: 29348468.
Topographic models for predicting malaria vector breeding habitats: potential tools for vector control managers.
Nmor JC, Sunahara T, Goto K, Futami K, Sonye G, Akweywa P, Dida G, Minakawa N. Parasit Vectors. 2013 Jan 16;6:14. doi: 10.1186/1756-3305-6-14. PMID: 23324389.
The effects of sampling bias and model complexity on the predictive performance of MaxEnt species distribution models.
Syfert MM, Smith MJ, Coomes DA. PLoS One. 2013;8(2):e55158. doi: 10.1371/journal.pone.0055158. Epub 2013 Feb 14. PMID: 23457462.
Estimating hantavirus risk in southern Argentina: a GIS-based approach combining human cases and host distribution.
Andreo V, Neteler M, Rocchini D, Provensal C, Levis S, Porcasi X, Rizzoli A, Lanfri M, Scavuzzo M, Pini N, Enria D, Polop J. Viruses. 2014 Jan 14;6(1):201-22. doi: 10.3390/v6010201. PMID: 24424500.
