Statistical consultation GLMM
On 7/22/21 7:45 PM, Estefania Isabel Muñoz Salas wrote:
Hi,

My name is Estefanía; I am doing a master's degree in Marine Ecology. As part of the project, we are working with shorebird count data collected along the coast of California and northwestern Mexico. The surveys follow a standardized monitoring protocol. Sampling units (polygons whose sizes vary from site to site) have been established at each site. The birds present in each unit were counted once each winter from 2011 to 2019.

Because these birds tend to congregate, many units have zero counts while some units have abundances of 1000 birds or more, so the data do not approximate a normal distribution. We therefore use Generalized Linear Mixed Models (GLMMs) to account for the variability in bird abundance from site to site and from sampling unit to sampling unit.

The objective of my work is to estimate the population trend of three species of shorebirds (analyzed separately), to test whether abundance is related to environmental variables (mean, minimum, and maximum temperature, and precipitation), and to test whether there is a difference between regions. Sites were grouped into three regions: California, the Baja California peninsula, and another region of northwestern Mexico that we call Continental.

Initially, I tested which distribution family fit the data by comparing Poisson, zero-inflated Poisson, negative binomial, and zero-inflated negative binomial distributions, which are the most common for count data. The zero-inflated negative binomial had the lowest AIC.
Knowing that the predictor variables could be correlated, I calculated their correlations. Since the correlation between year and the environmental variables was low (< 0.30), we decided for now to fit a single model that includes year. We also decided to include the size of each sampling unit (logarithm of the hectares), since it differs among units and we want to take that into account. Region would also be included as a factor with 3 levels. However, the temperature variables did show high correlations, yet they are the variables we are interested in. This is where I have several doubts, because my training is not in statistics:

1.- Should I avoid including the environmental variables in a single model because they are correlated, even though they are of interest?
2.- Is what I am doing right or not?
3.- How do I know whether the model fits the data well? How do I test it?
4.- How do I select the best model?
5.- What assumptions should I test?
7.- Am I missing something obvious?
These aren't really GLMM-specific questions.
Opinions differ about correlations; my personal opinion is that it is
rarely a good idea to exclude highly correlated predictors from a
regression (see refs below).
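
As a rough illustration of this point (simulated data, not the shorebird counts; the variables x1, x2, and y are made up for the sketch), fitting two strongly correlated predictors together widens the standard errors but still estimates each coefficient around its true value, whereas dropping one of them shifts part of its effect onto the predictor that remains:

```r
# Simulated illustration only: x1, x2, and y are made-up variables,
# not the shorebird data.
set.seed(101)
n  <- 2000
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)  # built so that cor(x1, x2) is about 0.9
y  <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(n)     # both predictors have a true effect of 0.5

full    <- lm(y ~ x1 + x2)  # keeps both correlated predictors
reduced <- lm(y ~ x1)       # drops x2 because it is correlated with x1

cor(x1, x2)
coef(summary(full))     # both slopes estimated near 0.5, with larger standard errors
coef(summary(reduced))  # x1 slope also absorbs part of the x2 effect
```

The same logic would apply to tmp, tmn, and tmx in a count model, although this linear-model sketch does not demonstrate that directly.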
I would recommend the DHARMa package (and its extensive,
high-quality vignettes) for assessing issues with the fits.
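
A minimal sketch of what such a check could look like. It uses the Salamanders example data that ship with glmmTMB as a stand-in, since the shorebird data are not available here; the object names fit, res, disp_test, and zi_test are placeholders. For the model in the original message, the same calls would be applied to m2znb.all.

```r
# Sketch only: Salamanders is a stand-in data set, not the shorebird data.
if (requireNamespace("glmmTMB", quietly = TRUE) &&
    requireNamespace("DHARMa", quietly = TRUE)) {
  library(glmmTMB)
  library(DHARMa)

  data("Salamanders", package = "glmmTMB")

  # Zero-inflated negative binomial GLMM, structured like the model in the question
  fit <- glmmTMB(count ~ mined + (1 | site),
                 ziformula = ~ 1,
                 family = nbinom2,
                 data = Salamanders)

  # Simulation-based scaled residuals; roughly uniform on [0, 1] if the model is adequate
  res <- simulateResiduals(fit, n = 500)

  if (interactive()) plot(res)                     # QQ plot and residuals vs. predicted
  disp_test <- testDispersion(res, plot = FALSE)   # over- or underdispersion
  zi_test   <- testZeroInflation(res, plot = FALSE) # more or fewer zeros than expected
  if (interactive()) plotResiduals(res, form = Salamanders$mined)  # residuals vs. a predictor

  print(disp_test)
  print(zi_test)
}
```

Plotting the residuals against each predictor (logha, YearCollected, the temperature variables, and so on) is probably the most useful step for spotting missed patterns.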
I would not recommend selecting a best model with a reduced set of
predictors - I would use the full model - but AIC is fine.
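
One way to read this, sketched under assumptions: keep the full set of predictors fixed and use AIC only to compare the distribution families listed in the original message. The example again uses the Salamanders data from glmmTMB as a stand-in, and the object names pois, zip, nb, zinb, and aic_tab are placeholders.

```r
# Sketch only: Salamanders is a stand-in data set, not the shorebird data.
if (requireNamespace("glmmTMB", quietly = TRUE)) {
  library(glmmTMB)
  data("Salamanders", package = "glmmTMB")

  # Same fixed and random effects in every fit; only the distribution changes.
  pois <- glmmTMB(count ~ mined + (1 | site), family = poisson, data = Salamanders)
  zip  <- update(pois, ziformula = ~ 1)                    # zero-inflated Poisson
  nb   <- update(pois, family = nbinom2)                   # negative binomial
  zinb <- update(pois, family = nbinom2, ziformula = ~ 1)  # zero-inflated negative binomial

  aic_tab <- AIC(pois, zip, nb, zinb)
  aic_tab$dAIC <- aic_tab$AIC - min(aic_tab$AIC, na.rm = TRUE)  # difference from the best fit
  print(aic_tab[order(aic_tab$AIC), ])
}
```

A fit that reports convergence problems may return a missing AIC, in which case that row should be investigated before it is compared with the others.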
Dormann, Carsten F., Jane Elith, Sven Bacher, Carsten Buchmann, Gudrun
Carl, Gabriel Carré, Jaime R. García Marquéz, et al. "Collinearity: A
Review of Methods to Deal with It and a Simulation Study Evaluating
Their Performance." Ecography, 2012, no-no.
https://doi.org/10.1111/j.1600-0587.2012.07348.x.
Graham, Michael H. "Confronting Multicollinearity in Ecological Multiple
Regression." Ecology 84, no. 11 (2003): 2809–15.
https://doi.org/10.1890/02-3114.
Morrissey, Michael B., and Graeme D. Ruxton. "Multiple
Regression Is Not Multiple Regressions: The Meaning of Multiple
Regression and the Non-Problem of Collinearity." Philosophy, Theory, and
Practice in Biology 10 (2018).
http://dx.doi.org/10.3998/ptpbio.16039257.0010.003.
Vanhove, Jan. "Collinearity Isn't a Disease That Needs Curing."
PsyArXiv, May 12, 2020. https://doi.org/10.31234/osf.io/mv2wx.
I have done all of the above with the glmmTMB package in RStudio. Thank you very much, and sorry in advance if these are very basic questions. The fit I have tried so far is this:

    m2znb.all <- glmmTMB(total ~ logha + YearCollected + Geopolitical + tmp + tmn + tmx + pre + (1|Site/Plot), ziformula = ~1, data = mc2, family = "nbinom2")

where:
total is the abundance of a species of shorebird
logha is the size of the unit (logarithm of the hectares)
YearCollected is the year
Geopolitical is the region
tmp is the mean temperature
tmn is the minimum temperature
tmx is the maximum temperature
pre is the precipitation

It would be possible to share the data.

Regards,
Estefanía.
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Dr. Benjamin Bolker Professor, Mathematics & Statistics and Biology, McMaster University Director, School of Computational Science and Engineering Graduate chair, Mathematics & Statistics