Back to formatted view
Raw Message

Message-ID: <CACu_zZnP-pjY9N733ZkqGz9k9OPbxtY5e7LfpNYeGg-JkQqnnw@mail.gmail.com>
Date: 2012-01-16T20:54:48Z
From: Colin Wahl
Subject: Comparing results from glmer and glht
In-Reply-To: <AA818EAD2576BC488B4F623941DA74275732B97D@inbomail.inbo.be>

Thierry,
Thank you for the suggestion. I think I understand, though creating
user defined contrasts will be difficult.

I create a combined variable:
wshrip <-c bind(wsh, rip)

Resulting in a new model
modelEPT<-glmer(EPT ~ wshrip + (1|stream) + (1|stream:rip) + (1|obs),
data=ept, family=binomial(link="logit"))

The user defined contrasts I tried previously were:
wsh <- rbind("C vs. D" = c(1,0,0,0,0,0,0,0),
            "C vs. F" = c(0,1,0,0,0,0,0,0),
            "C vs. G" = c(0,0,1,0,0,0,0,0),
            "D vs. F" = c(-1,1,0,0,0,0,0,0),
            "D vs. G" = c(-1,0,1,0,0,0,0,1),
            "F vs. G" = c(0,-1,1,0,0,0,0,0))

This did not give me reasonable results, prompting me to use built-in
contrasts:

summary(glht(modelEPT, linfct=mcp(wsh="Tukey")))

For riparian contrasts I used:
rip <- rbind("C: Forested vs. Non-Forested" = c(0,0,0,0,1,0,0,0),
            "D: Forested vs. Non-Forested" = c(0,0,0,0,1,1,0,0),
            "F: Forested vs. Non-Forested" = c(0,0,0,0,1,0,1,0),
            "G: Forested vs. Non-Forested" = c(0,0,0,0,1,0,0,1))

I dont understand how these contrasts are defined, but based them on
contrasts on an almost identical design found here:
http://thebiobucket.blogspot.com/2011/06/glmm-with-custom-multiple-comparisons.html#more

Could you please explain how to use user-defined for the model with a
wshrip combined variable? I cant find a clear example of how to do
this. The parameter length is different from the original model now
that there is a combined variable, correct?

Thank you very much, I'd be lost in the dark without this mailing list.

Colin






On Mon, Jan 16, 2012 at 12:58 AM, ONKELINX, Thierry
<Thierry.ONKELINX at inbo.be> wrote:
> Dear Colin,
>
> I don't think your Tukey tests are very informative. Because you are testing main effects how have an interaction with another variable. So you are not testing the overall effect of a variable but the effect when all other variable are 0 (continuous) or at the reference level (factor).
>
> A better way of doing this could bet o create a new variable which is the interaction between wsh and rip and use that in your model instead of wsh * rip. Then you can use glht() with user defined contrasts so that the contrasts test what you want to test.
>
> Best regards,
>
> Thierry
>
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
> + 32 2 525 02 51
> + 32 54 43 61 85
> Thierry.Onkelinx at inbo.be
> www.inbo.be
>
> To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
> ~ Sir Ronald Aylmer Fisher
>
> The plural of anecdote is not data.
> ~ Roger Brinner
>
> The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> -----Oorspronkelijk bericht-----
> Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Colin Wahl
> Verzonden: zondag 15 januari 2012 1:23
> Aan: r-sig-mixed-models at r-project.org
> Onderwerp: [R-sig-ME] Comparing results from glmer and glht
>
> I will try to make this concise.
>
> Background: I am testing the effects of land use and forested riparian buffers on stream invertebrates and in-stream variables. There are 4 watershed types (defined by 4 types of land use) and two riparian types (forested and non). Percent EPT (relative abundance) was my main response variable. I also measured a variety of in-stream variables like temperature, nutrients, and toxicity. There are 72 observations for invertebrates, and 24 for in-stream variables.
>
> I am curious of how acceptable p values are from pairwise comparisons using glht() from the multcomp package
>
> I used glmer with a binomial error structure and an observation-level random effect (to account for overdispersion), to model invertebrates:
>
> modelEPT<-glmer(EPT ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs), data=ept, family=binomial(link="logit"))
>
> ? AIC ? BIC logLik deviance
> ?284.4 309.5 -131.2 ? ?262.4
> Random effects:
> ?Groups ? ? Name ? ? ? ?Variance Std.Dev.
> ?obs ? ? ? ?(Intercept) 0.30186 ?0.54942
> ?stream:rip (Intercept) 0.40229 ?0.63427
> ?stream ? ? (Intercept) 0.12788 ?0.35760
> Number of obs: 72, groups: obs, 72; stream:rip, 24; stream, 12
>
> Fixed effects:
> ? ? ? ? ? ? ? ? Estimate Std. Error z value Pr(>|z|)
> (Intercept) ? -4.2906 ? ? 0.4935 ? -8.694 ?< 2e-16 ***
> wshd ? ? ? ? ? -2.0557 ? ? 0.7601 ?-2.705 0.00684 **
> wshf ? ? ? ? ? ?3.3575 ? ? 0.6339 ? 5.297 ?1.18e-07 ***
> wshg ? ? ? ? ? 3.3923 ? ? 0.7486 ? ?4.531 ?5.86e-06 ***
> ripN ? ? ? ? ? ? 0.1425 ? ? 0.6323 ? 0.225 ?0.82165
> wshd:ripN ? ? 0.3708 ? ? 0.9682 ? 0.383 ?0.70170
> wshf:ripN ? ?-0.8665 ? ? 0.8087 ? -1.071 ?0.28400
> wshg:ripN ? ?-3.1530 ? ? 0.9601 ?-3.284 ?0.00102 **
> ---
>
> Correlation of Fixed Effects:
> ? ? ? ? ? ? ? ? (Intr) ?wshd ? wshf ? wshg ? ripN ? wshd:N wshf:N
> wshd ? ? ? ?-0.649
> wshf ? ? ? ?-0.779 ?0.505
> wshg ? ? ? ?-0.659 ?0.428 ?0.513
> ripN ? ? ? ? -0.644 ?0.418 ?0.501 ?0.424
> wshd:ripN ?0.421 -0.672 -0.327 -0.277 -0.653 wshf:ripN ?0.503 -0.327 -0.638 -0.332 -0.782 ?0.511 wshg:ripN ?0.424 -0.275 -0.330 -0.632 -0.659 ?0.430 ?0.515
>
>
> I then used this model to do Tukey's HSD contrasts between watershed types:
>
> summary(glht(modelEPT, linfct=mcp(wsh="Tukey"))) Linear Hypotheses:
>
> ? ? ? ? ? ? ? ?Estimate Std. Error z value Pr(>|z|)
> d - c == 0 -2.05573 ? ?0.76010 ?-2.705 ? 0.0341 *
> f - c == 0 ?3.35753 ? ?0.63386 ? 5.297 ? <0.001 ***
> g - c == 0 ?3.39231 ? ?0.74862 ? 4.531 ? <0.001 ***
> f - d == 0 ?5.41326 ? ?0.70176 ? 7.714 ? <0.001 ***
> g - d == 0 ?5.44804 ? ?0.80692 ? 6.752 ? <0.001 ***
> g - f == 0 ?0.03479 ? ?0.68931 ? 0.050 ? 1.0000
>
> and riparian types:
>
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Estimate Std. Error z value Pr(>|z|)
> C: Forested vs. Non-Forested == 0 ? ? ? ? 0.1425 ? ? 0.6323 ? 0.225 ?0.99999
> D: Forested vs. Non-Forested == 0 ? ? ? ? 0.5134 ? ? 0.7332 ? 0.700 ?0.98659
> F: Forested vs. Non-Forested == 0 ? ? ? ?-0.7239 ? ? 0.5042 ?-1.436 ?0.69625
> G: Forested vs. Non-Forested == 0 ? ? ? ?-3.0105 ? ? 0.7225 ?-4.167 ?< 0.001 ***
>
> Are these p values accurate? Or is that a personal judgement I have to make based on the clarity of the patterns they reflect?
>
> I've shown these results in my figures and explained them in my results. I've basically explained that though these p values reasonably reflect patterns in my data, effects sizes, and variances, that they are inexact and potentially anti-conservative due to the issues with degrees of freedom in mixed models.
>
> >From what I understand from my research in the last year is that
> Douglas Bates and others advocate something of a paradigm shift away from the petagogically reinforced reliance on cryptic p values toward more in depth discussions of effects sizes and variances. The use of MCMC sampling and HPD intervals are suggested, but these are not available for generalized models.
>
> I am interested in publishing these results as an ecologist, not a statistician (pardon the somewhat artificial distinction), and, I am very interested in what kind of a discussion the statisticians and ecologists of the r-sig-mixed-models mailing list would like to see as potential reviewers.
>
> Thank you,
>
> Colin Wahl
>
> M.S. candidate,
> Dept. of Biology
> Western Washington University
> Bellingham, WA
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models