Hi, my name is Estefanía, and I am doing a master's degree in Marine Ecology. As part of my project, we are working with shorebird count data collected along the coast of California and northwestern Mexico. The surveys follow a standardized monitoring protocol. Sampling units (polygons whose sizes vary from site to site) have been established at each site, and the birds present in each unit were counted once every winter from 2011 to 2019. Because shorebirds tend to congregate, many units have zero counts while others hold 1,000 birds or more, so the data are far from normally distributed. To handle this, we use generalized linear mixed models (GLMMs), which let us account for the variability in bird abundance among sites and among sampling units.

The objective of my work is to estimate the population trend of three shorebird species (analyzed separately), to test whether abundance is related to environmental variables (mean, minimum, and maximum temperature, and precipitation), and to test whether there are differences between regions. We grouped the sites into three regions: California, the Baja California peninsula, and another region of northwestern Mexico that we called Continental.

First, I tested which distribution family best fit the data by comparing Poisson, zero-inflated Poisson, negative binomial, and zero-inflated negative binomial models, which are the most common choices for count data. The zero-inflated negative binomial had the lowest AIC.
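A minimal sketch of how such a family comparison can be set up in glmmTMB. It uses simulated data, not the real survey table; `total`, `Site`, `Plot`, and `YearCollected` are borrowed from the model further down, while the centred year `yr`, the effect sizes, and the 30% extra zeros are hypothetical choices for illustration only.

```r
library(glmmTMB)

set.seed(1)
# Simulated stand-in for the survey table: 10 sites x 5 plots x 9 winters
d <- expand.grid(Site = factor(1:10), Plot = factor(1:5), YearCollected = 2011:2019)
d$yr <- d$YearCollected - 2015                   # centred year (hypothetical helper column)
site_eff <- rnorm(10, sd = 0.8)                  # site-level variation on the log scale
mu <- exp(1 + 0.05 * d$yr + site_eff[as.integer(d$Site)])
# Negative binomial counts multiplied by a Bernoulli(0.7) mask -> about 30% extra zeros
d$total <- rnbinom(nrow(d), mu = mu, size = 0.7) * rbinom(nrow(d), 1, 0.7)

# Same fixed and random structure for every candidate; only the distribution changes
f <- total ~ yr + (1 | Site/Plot)
fits <- list(
  poisson = glmmTMB(f, data = d, family = poisson),
  zip     = glmmTMB(f, data = d, family = poisson, ziformula = ~1),
  nb      = glmmTMB(f, data = d, family = nbinom2),
  zinb    = glmmTMB(f, data = d, family = nbinom2, ziformula = ~1)
)
aic <- sapply(fits, AIC)
sort(aic)                                        # lowest AIC first
```

The AIC values are only comparable here because all four models share the same response, data, and linear predictor; the family and the zero-inflation term are the only things that differ.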
Knowing that the predictor variables could be correlated, I calculated their pairwise correlations. Because the correlation between year and the environmental variables was low (< 0.30), we decided to fit a single model that includes year. We also decided to include the size of each sampling unit (log of the area in hectares), since it differs from unit to unit and we want to take that into account. Region is included as a factor with three levels. However, the temperature variables are highly correlated with each other, yet they are the variables we are interested in. This is where I have several doubts, because my background is not in statistics:

1.- Should I avoid including the environmental variables in a single model because they are correlated, even though they are of interest?
2.- Is what I am doing right?
3.- How do I know whether the model fits the data well? How do I test it?
4.- How do I select the best model?
5.- Which assumptions should I test?
6.- Am I missing something obvious?

I have done all of the above with the glmmTMB package in RStudio. Thank you very much, and apologies in advance if these are very basic questions. The model I have fitted so far is:

    m2znb.all <- glmmTMB(total ~ logha + YearCollected + Geopolitical + tmp + tmn + tmx + pre + (1 | Site/Plot),
                         ziformula = ~1, data = mc2, family = "nbinom2")

where:
total is the abundance of one shorebird species
logha is the size of the sampling unit (log of the area in hectares)
YearCollected is the survey year
Geopolitical is the region
tmp is the mean temperature
tmn is the minimum temperature
tmx is the maximum temperature
pre is the precipitation

I can share the data if needed.

Regards,
Estefanía
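The correlation screening described above can be sketched in base R as follows. The column names come from the model formula, but the values are simulated, and tmn and tmx are deliberately built to track tmp so that the screen has something to flag; the 0.30 cut-off is the one mentioned in the message.

```r
set.seed(2)
n <- 200
# Simulated stand-in for the predictor columns of mc2 (values are NOT real data)
mc2_sim <- data.frame(YearCollected = sample(2011:2019, n, replace = TRUE))
mc2_sim$tmp <- rnorm(n, mean = 15, sd = 3)
mc2_sim$tmn <- mc2_sim$tmp - 5 + rnorm(n, sd = 0.5)   # built to follow tmp closely
mc2_sim$tmx <- mc2_sim$tmp + 6 + rnorm(n, sd = 0.5)   # built to follow tmp closely
mc2_sim$pre <- rgamma(n, shape = 2, scale = 20)

vars <- c("YearCollected", "tmp", "tmn", "tmx", "pre")
r <- cor(mc2_sim[, vars], method = "pearson")
round(r, 2)

# List each pair once (upper triangle) and flag |r| >= 0.30
idx <- which(upper.tri(r), arr.ind = TRUE)
pairs <- data.frame(var1 = vars[idx[, 1]], var2 = vars[idx[, 2]], r = r[idx])
pairs$high <- abs(pairs$r) >= 0.30
pairs[order(-abs(pairs$r)), ]
```

With five predictors there are ten pairs; in this simulated example the three temperature pairs are the ones flagged, which mirrors the situation described for tmp, tmn, and tmx.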
Statistical consultation GLMM
Estefania Isabel Muñoz Salas