help
<jersa at ...> writes:
Dear glmer experts, I would be very happy if someone could help with the following problem.
I have the following data: I planted seed bags in the vicinity of 10 mother plants, in three directions and at 4 distances (10, 31, 56, 100 cm). Germination success (0/1) was assessed by extracting one seed bag per microsite in each of the next three years. I am interested in the effect of distance on germination success and in possible differences in germination between years. The data have lots of 0 values.
I originally used the following syntax: glmer(germination~distance+(1|plant/direction/year), family=binomial,data=seed)
Logically, direction and distance seem more like fixed effects to me (see http://glmm.wikidot.com/faq#fixed_vs_random for more discussion), but treating them that way leads to some serious overparameterization problems, so you may actually be better off treating them as grouping factors, as you do here.

Since you have a randomized-block design (all levels of the fixed effects are replicated within every block), you could *in principle* fit a full model:

  glmer(germination~distance*direction*year+ (distance*direction*year|plant), ...)

that accounts for the variation in all effects among plants, but it certainly won't be practical -- there are 36 combinations of year/direction/distance (3 x 3 x 4), and the random effect here would try to estimate all of the variances and correlations among them, so you'd have 36 fixed-effect parameters and 36*37/2 = 666 random-effect parameters -- somewhat crazy.

You have a total of 360 observations, but if you have "lots of zeros" then the effective sample size is more appropriately considered as the number of successful germinations (see Harrell, _Regression Modeling Strategies_). If we suppose you have 10% germination overall, that is about 36 successes, so you shouldn't be trying to fit more than three or four parameters (approx. N/10, with N the number of successes) to this data set, and you're going to have some difficulty ... Even

  germination~(distance+direction+year)^2+ (1|plant)

which fits all the two-way interactions between distance/direction/year, is way too complex ...

The logical problem with your plant/direction/year specification is that it assumes that the effects of direction can only vary within plants, not consistently across plants (maybe reasonable if your directions differ for each plant and are not e.g. North/South/West), and, worse, that the effect of year can only vary within plant and direction and not overall.
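The parameter counts above can be checked with a few lines of base R. This is only a sketch: the direction labels "A", "B", "C" are placeholders, all three predictors are treated as factors, and the 10% germination rate is the assumption made in the text, not a value from the data.

```r
# Design described in the question: 10 plants, 3 directions,
# 4 distances, 3 years, one 0/1 observation per cell.
# Direction labels "A", "B", "C" are placeholders, not from the post.
design <- expand.grid(
  distance  = factor(c(10, 31, 56, 100)),
  direction = factor(c("A", "B", "C")),
  year      = factor(1:3)
)
n_plants <- 10
n_cells  <- nrow(design)        # distance x direction x year combinations
n_obs    <- n_plants * n_cells  # total observations

# Fixed-effect columns implied by each formula (all predictors as factors)
p_full   <- ncol(model.matrix(~ distance * direction * year, design))
p_twoway <- ncol(model.matrix(~ (distance + direction + year)^2, design))

# Unstructured covariance matrix for p_full random effects:
# p variances plus p * (p - 1) / 2 covariances
q_full <- p_full * (p_full + 1) / 2

# Rule-of-thumb budget, ASSUMING 10% germination overall
p_germ    <- 0.10
n_success <- n_obs * p_germ
budget    <- n_success / 10

print(c(cells = n_cells, obs = n_obs, fixed_full = p_full,
        random_full = q_full, fixed_twoway = p_twoway, budget = budget))
```

With the predictors coded as factors, the two-way-interaction model alone needs 24 fixed-effect parameters, against a budget of roughly 3.6 under the 10% assumption.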
It's tempting to use (1|plant/direction)+(1|year), but then you'll be in trouble because it's hard to estimate a variance from three points (you'll probably end up concluding, wrongly, that there's zero variance across years). Logically you could add year as a fixed effect, but that then costs another two parameters, which you can hardly afford to spend ...

To get back to your original question about interactions -- unless they're very large, I think you're going to have a hard time detecting them in any case with a data set of this size. What I might do is use something close to your original model, or perhaps

  ~distance+year+(1|plant/direction)

or

  ~distance+year+(distance|plant)

(since distance is your variable of primary interest, you really should be trying to allow for among-plant variation in it -- see Schielzeth and Forstmeier 2009) -- the latter is not going to be practical unless you treat distance as a continuous variable, though.

Bottom line: I would try to do something fairly simple and sensible, *LOOK AT YOUR DATA* to try to see what the main patterns are, and hope that large interactions etc. will emerge in the model diagnostic plots if they're there.

good luck
Ben Bolker
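As a rough sketch of the two simpler models suggested above, the following simulates a stand-in data set with the same layout as the question, tabulates it, and fits both formulas if lme4 is installed. Everything about the simulated data (the seed, the effect sizes, the direction labels, the rescaled variable distance_m, the model names m1 and m2) is invented for illustration and is not from the original post; distance is treated as continuous, as the random-slope model requires.

```r
set.seed(1)  # arbitrary seed for the simulated stand-in data

# SIMULATED data with the layout from the question; all effect sizes
# below are invented for illustration only.
seed <- expand.grid(
  plant     = factor(1:10),
  direction = factor(c("A", "B", "C")),  # placeholder labels
  distance  = c(10, 31, 56, 100),        # cm, kept numeric (continuous)
  year      = factor(1:3)
)
plant_eff <- rnorm(10, sd = 0.5)         # hypothetical among-plant variation
eta <- -1.5 - 0.01 * seed$distance + plant_eff[as.integer(seed$plant)]
seed$germination <- rbinom(nrow(seed), size = 1, prob = plogis(eta))

# Look at the data first: germination proportion by distance and year
print(aggregate(germination ~ distance + year, data = seed, FUN = mean))

# Rescale distance (cm -> m) to keep the slope on a convenient scale
seed$distance_m <- seed$distance / 100

if (requireNamespace("lme4", quietly = TRUE)) {
  # Random intercepts for plant and for direction within plant
  m1 <- lme4::glmer(germination ~ distance_m + year + (1 | plant/direction),
                    family = binomial, data = seed)
  # Random intercept and random distance slope among plants
  m2 <- lme4::glmer(germination ~ distance_m + year + (distance_m | plant),
                    family = binomial, data = seed)
  print(summary(m1)$coefficients)
  print(lme4::VarCorr(m2))
}
```

With this few successes, singular-fit or convergence messages from either model would not be surprising; they are a sign that the data cannot support the variance components, which is consistent with the advice to keep the model simple.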