Very large HPDintervals, mcmcsamp, lmer
Thierry, I have two quick follow up questions for my model that I hope you can can answer for me. model<-lmer(TN ~ wsh*rip + (1|stream), data=all24) First, the one I should know: how did you determine the number of parameters? Watershed (4) + reach (2) +reach x watershed (1) = 7? And how many parameters are in this model? 9? model<-glmer(Y ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs), data=ept, family=binomial(link="logit")) Second: is there a simple explanation for why 10 observations per parameter is the rule of thumb? Thank you Colin On Tue, Nov 22, 2011 at 12:59 AM, ONKELINX, Thierry
<Thierry.ONKELINX at inbo.be> wrote:
Dear Colin, Your sample size is too small for you model. You model has 7 fixed and 1 random parameter. You could use 10 observations per parameters as a rule of thumb. Hence your model would require about 80 obversations. This leaves you with two options: a) collect more data b) reduce the model. Best regards, Thierry
-----Oorspronkelijk bericht----- Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models- bounces at r-project.org] Namens Colin Wahl Verzonden: maandag 21 november 2011 23:54 Aan: r-sig-mixed-models at r-project.org Onderwerp: [R-sig-ME] Very large HPDintervals, mcmcsamp, lmer Hello all, I am using lmer to describe patterns in stream variables between streams draining watersheds with different land uses. I need to make some kind of determination of significance. I understand that mixed models cannot provide accurate p values because the denominator df is unknown. As per D. Bates recommendation in his post "lmer, p-values and all that" I used mcmcsamp and HPDinterval to get 95% "confidence intervals." These intervals were very large and suggest that there are no significant differences in variables where differences seem obvious. I'll use one of my variables as an example: total stream nitrogen. There are 12 streams and 24 sites (two riparian types per stream). model<-lmer(TN ~ wsh*rip + (1|stream), data=all24) ? ?Data: all24 ? ?AIC ? BIC logLik deviance REMLdev ?78.47 90.25 -29.24 ? ?65.15 ? 58.47 Random effects: ?Groups ? Name ? ? ? ?Variance Std.Dev. ?stream ? (Intercept) 4.10443 ?2.0259 ?Residual ? ? ? ? ? ? 0.20885 ?0.4570 Number of obs: 24, groups: stream, 12 Fixed effects: ? ? ? ? ? ? ? ? ?Estimate Std. Error t value (Intercept) ?6.04887 ? ?1.19903 ? 5.045 wshD ? ? ? ?-4.65847 ? ?1.69569 ?-2.747 wshF ? ? ? ?-4.85194 ? ?1.58617 ?-3.059 wshG ? ? ? ?-5.18847 ? ?1.89584 ?-2.737 ripN ? ? ? ? ? ?-0.18540 ? ?0.37314 ?-0.497 wshD:ripN ? ?0.15800 ? ?0.52770 ? 0.299 wshF:ripN ? ?0.09825 ? ?0.49362 ? 0.199 wshG:ripN ? ?0.00915 ? ?0.58998 ? 0.016 Estimates are accurate; intercept(cultivated) watersheds clearly have a much higher nitrogen concentration than the other watersheds. I use mcmcsamp and HPD interval to get 95% CIs: model.mcmc<-mcmcsamp(model, n=1000) model.HPDi<-HPDinterval(model.mcmc, prob=0.95) ? ? ? ? ? ? ? ? ? ? ? Estimate ? ? ? ? ?Std. Error ? ? ? ? ? ?t ? ?lower ? ? ? ? ? ? ? ?upper Cultivated ? ? ? 6.04886663 ? 1.19903192 ? 5.04479200 ? 3.75759240 ? 8.40959042 Developed ? ? ? ?1.39040003 ? 1.69568720 ?-2.74724406 ?-4.27036665 6.80232811 Forested ? ? ? ? 1.19692503 ? 1.58617014 ?-3.05890364 ?-4.53524163 ? 6.52712926 Grassland ? ? ? ?0.86039999 ? 1.89583593 ?-2.73676987 ?-4.96308190 ? 6.93964588 Riparia(Cult) ? ?5.86346654 ? 0.37313911 ?-0.49686587 ? 0.52284572 11.44574598 Riparia(Devel) ? 1.36300000 ? 0.52769839 ? 0.29941356 -12.15807916 14.22417096 Riparia(Fores) ? 1.10977501 ? 0.49361665 ? 0.19904123 -11.93267982 13.77391850 Riparia(Grass) ? 0.68414998 ? 0.58998474 ? 0.01550901 -13.67422625 14.50577622 Considering that the CI are substantially overlapped, I interpret these CIs to suggest there is no detectable differences between cultivated streams and other streams. This cant be right. I have a very elementary understanding of Bayesian statistics, and have not used mcmc or HPDinterval before. Is it apparent that I've done something wrong? Is my sample size too small? I hope this isnt just a problem with calculating CIs from the output, which I'm pretty sure is done the same way its done with lmer (see below). Please help; I'm desperate to wrap up this analysis. Thank you, Colin Wahl Graduate Student Western Wash. University Bellingham, WA USA CI recalculation: I would be confident that CIs for each fixed effect are relative to the intercept, but the CIs look wrong so I am questioning myself. Is it true that the HPDintervals for each fixed effect are relative to the intercept coefficient, just as they are in the initial lmer output? i.e... ? ? ? ? ? ? ? ? lower ? ? upper (Intercept) ?3.684270 ?8.626003 wshD ? ? ? ?-8.256167 -1.259411 wshF ? ? ? ?-8.469073 -1.852237 wshG ? ? ? ?-9.338573 -1.498553 ripN ? ? ? ?-3.309220 ?3.105756 wshD:ripN ? -4.412022 ?4.944243 wshF:ripN ? -4.452947 ?4.422371 wshG:ripN ? -5.510786 ?4.917613 becomes: ? ? ? ? ? ? ? ? ? lower ? ? upper (Intercept) ? 3.6842697 ?8.626003 wshD ? ? ? ? -4.5718974 ?7.366592 wshF ? ? ? ? -4.7848029 ?6.773766 wshG ? ? ? ? -5.6543034 ?7.127450 ripN ? ? ? ? ?0.3750498 11.731759 ripD ? ? ? ?-12.2931391 15.416590 ripF ? ? ? ?-12.5469694 14.301893 ripG ? ? ? ?-14.4743090 15.150819 Using the following code: ci<-as.matrix(model.HPDi$fixef) C=ci[1,] D=C+ci[2,] FF=C+ci[3,] G=C+ci[4,] CN=C+ci[5,] DN=(D)+(ci[5,]+ci[6,]) FN=(FF)+(ci[5,]+ci[7,]) GN=(G)+(ci[5,]+ci[8,]) confint<-c(C,D,FF,G,CN,DN,FN,GN) -- CW
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
CW