unable to remove spatial autocorrelation from a binomial gam
Hello Olga,

Thanks a lot for your response. It is very helpful.

Yes, my data are presence/absence, because I am recording the occurrence of bear damage to apiaries in a particular region. Since a compensation system has been running there for a long time, we can assume that almost all damage is included in the database. A few absences could in fact be presences (a beekeeper not claiming the damage), but I'm pretty sure that would be marginal.

I have also read what you say about spatial correlation not always being an issue that should be removed from a model. But some books and articles state that properly accounting for autocorrelation is necessary for obtaining reliable statistical inference ( http://highstat.com/index.php/mixed-effects-models-and-extensions-in-ecology-with-r see also https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.1674 ). Which should I follow? So far my approach is the more conservative one and I try to remove it, since I imagine reviewers asking me to do so.

I knew about the possibility of subsampling to avoid autocorrelation, but I've read that it's not the best solution. That's why I was trying to use correlation structures. I have been advised to use the function gamm, which allows such correlation structures, and to check whether the model fit is more or less similar to that of the gam model. I am in the middle of that now, waiting for the gamm to finish, as it is computationally costly (it may take a few days).

I didn't know about the package that you recommended, so I will take a look at it. Maybe the weightCases() function will be a good solution to my problem.

Thank you so much once again for your help.

All the best,
Carlos
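[Editor's sketch] The subsampling (thinning) idea discussed in this thread can be illustrated in a few lines of base R. This is only a greedy sketch under stated assumptions, not the algorithm used by spThin or any other package: the function name thin_points(), the argument min_dist, and the 10 x 10 example grid are all hypothetical, and coordinates are assumed to be projected (in meters).

```r
# Greedy spatial thinning in base R (illustrative only; NOT the spThin algorithm).
# coords:   two-column matrix or data frame of projected coordinates (meters)
# min_dist: minimum spacing required between any two retained points
thin_points <- function(coords, min_dist) {
  coords <- as.matrix(coords)
  keep <- integer(0)
  for (i in seq_len(nrow(coords))) {
    # Euclidean distances from candidate i to every point kept so far
    dx <- coords[keep, 1] - coords[i, 1]
    dy <- coords[keep, 2] - coords[i, 2]
    # all() of an empty vector is TRUE, so the first point is always kept
    if (all(sqrt(dx^2 + dy^2) >= min_dist)) keep <- c(keep, i)
  }
  keep
}

# Hypothetical example: centres of a 10 x 10 grid of 1 x 1 km cells
grid <- expand.grid(x = seq(500, 9500, by = 1000),
                    y = seq(500, 9500, by = 1000))
kept <- thin_points(grid, min_dist = 2000)
thinned <- grid[kept, ]
```

The result depends on the order of the rows, and thinning discards data, which matters when presences are scarce (151 in this data set); that is one reason it may not be the preferred solution here.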
On Fri, 10 Apr 2020 at 12:04, Olga Boet <formigareina at gmail.com> wrote:
Hi Carlos,

Excuse me, I'm not sure that I can help you; I know little about GAM. I don't understand your script and variogram, as I work differently. I hope someone else gives you a better answer than mine. But in case it helps, here are some considerations.

Spatial data are often correlated, but it must be evaluated whether that is a problem or not. For example, some species are distributed in patches, such as frogs, fishes or some plant species (this correlation should not be eliminated).

I think the smoothing function in GAM is there to smooth the curves, that is, it softens the effect of the environmental variables (makes it less abrupt), not of the coordinates, since the coordinates are not environmental variables in a spatial model.

However, in the Dimo package there are two interesting functions: a balancing weights function and a thinning function.

The balancing function is weightCases(), and it is used when the background is very large relative to the number of presences, so that the values of the variables at the presence points have more weight in the model despite their lower number.

The thinning function removes points that are too close to each other (or in a space where variable data are not available). It is used when points are too clustered as a result of sampling (and the clustering does not correspond to the actual distribution). In this function you can set the minimum distance between the points. thinning() is from package spThin (URL: https://cran.r-project.org/web/packages/spThin)

Finally, are your data really presence/absence data? Did you visit all 3355 cells and detect presence/absence of the species? Spatial models are different if we have absences, pseudo-absences or background. The type of absence data is important for choosing a model.

I'm sorry I couldn't answer your questions.

Kind regards,
Olga Boet
Documentalist of the chordate collection, CMCNB
*Myrmex*

Message from Carlos Bautista <carlosbautistaleon at gmail.com> on Thu, 9 Apr 2020 at 17:52:
Dear list members,

I am using gam (from the mgcv package in R) to model presence/absence data in 3355 cells of 1x1 km (151 presences and 3204 absences). Even though I include a smooth of the spatial locations in the model to address the spatial dependence in my data, a variogram shows spatial autocorrelation in the residuals of my gam (range = 6000 meters).

Since I am modelling a binary response, using a gamm with a correlation structure is not advisable because it "performs poorly with binary data". Nor is gamm4 an option because, although it is supposed to be appropriate for binary data, it has "no facility for nlme style correlation structures".

The alternative I have found is to fit my model using the function magic from the same mgcv package. Because I found no examples of how to use magic for spatially correlated data, I have adapted the ?magic example for temporally correlated data. The resulting fit changes the coefficients of the model but does not remove the spatial autocorrelation, and the smooth plots show the same effect.

You can find the output from my models, together with figures of the variograms and plots of the smooth effects, at the following link: https://stackoverflow.com/questions/61110762/gam-with-binomial-distribution-and-with-spatial-autocorrelation-in-r

Could someone tell me if there is something wrong in my script? Does anyone know another way to remove the residual spatial autocorrelation from a binomial gam?

Thank you very much.

Kind regards,
Carlos

-- 
Carlos Bautista
Institute of Nature Conservation
Polish Academy of Sciences
Mickiewicza 33
31-120 Krakow, Poland
www.carpathianbear.pl
www.iop.krakow.pl
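[Editor's sketch] The workflow described in the message above (a binomial gam with a smooth of the spatial locations, followed by a variogram of the residuals) might look roughly like this. It is a sketch on simulated stand-in data, not the script from the Stack Overflow post: the data frame d, the columns pres and cov1, the basis size k = 40, and the helper semivar() are all assumptions made for illustration. Only gam(), s() and residuals() are mgcv or base R calls; the semivariogram is computed by hand to avoid extra dependencies.

```r
library(mgcv)  # provides gam() and s()

set.seed(1)
# Simulated stand-in data (an assumption; NOT the bear-damage data):
# 400 locations, one covariate, spatially smooth presence probability.
n <- 400
d <- data.frame(x = runif(n, 0, 20000),
                y = runif(n, 0, 20000),
                cov1 = rnorm(n))
eta <- -1.5 + 0.8 * d$cov1 + sin(d$x / 3000) + cos(d$y / 3000)
d$pres <- rbinom(n, 1, plogis(eta))

# Binomial GAM with a covariate smooth and a 2-D smooth of the coordinates
m <- gam(pres ~ s(cov1) + s(x, y, k = 40),
         family = binomial, data = d, method = "REML")

# Empirical semivariogram of residuals, binned by separation distance
semivar <- function(res, coords, breaks) {
  h <- as.vector(dist(coords))       # pairwise distances between locations
  g <- as.vector(dist(res))^2 / 2    # half squared residual differences
  bin <- cut(h, breaks, include.lowest = TRUE)
  data.frame(dist    = as.vector(tapply(h, bin, mean)),
             gamma   = as.vector(tapply(g, bin, mean)),
             n_pairs = as.vector(table(bin)))
}

vg <- semivar(residuals(m, type = "pearson"), d[, c("x", "y")],
              breaks = seq(0, 10000, by = 1000))
print(vg)
```

If gamma is clearly lower in the shortest distance bins than in the longer ones, the residuals are still spatially autocorrelated at that scale; a roughly flat gamma suggests the spatial smooth has absorbed most of the structure.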
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo