Data sheet notation and model structure for GLMM with 3 non-factorial factors
Thanks for the help!
On Sat, Sep 26, 2009 at 2:43 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
On Sat, Sep 26, 2009 at 3:11 AM, Raldo Kruger <raldo.kruger at gmail.com> wrote:
Hi Douglas,
Many thanks for the input. I've run two analyses on the same dataset, using 1) indicator columns and 2) a single 'factor / treatment' column, for the non-factorial design described in my previous e-mail, and the results were identical (great!).
However, I did the same for a dataset with a factorial design (N, G, N*G, i.e. there were plots with N, plots with G, and plots with both N and G). The results for the main effects are identical, but the estimates for the interaction effects (N*G) differ between the two analyses (see below). Could you help me make sense of that, please (i.e. which one is correct)?
Generally, when you have the possibility of having N and G combined, you would treat the design as a two-factor, two-level factorial. That is, one factor for presence or absence of G and another factor for presence or absence of N. You could treat it as a single factor with four levels (neither, G only, N only, and both N and G) but, as you have seen, you need to translate between the representations.
In the two-factor, two-level factorial design, let a be the estimate of the main effect for G, b be the estimate of the main effect for N, and c be the interaction estimate. In your example a = 0.14929, b = 0.03766 and c = -0.31633. Then the estimated effect for the NG cell relative to the control cell, on the linear-predictor scale (the TreatNG coefficient in the single-factor fit), is a + b + c =
> 0.03766 + 0.14929 + (-0.31633)
[1] -0.12938
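The same translation can be checked for every coefficient at once. The following is only a sketch in base R: the 2 x 2 x 3 grid of cells, the object names, and the two formulas (`~ N * G * Year` and `~ Treat * Year`) are assumptions inferred from the coefficient names in the quoted output, since the thread does not show the actual model calls. It takes the estimates reported for the expanded notation, computes the linear predictor in each cell, and solves for the coefficients that the single-factor coding would need to reproduce it.

```r
# Hypothetical 2 x 2 x 3 layout: N and G as 0/1 indicators, Year with the
# level names seen in the quoted output ("one" is the reference level).
cells <- expand.grid(N = c(0, 1), G = c(0, 1),
                     Year = factor(c("one", "three", "two")))
cells$Treat <- factor(
  ifelse(cells$N == 1 & cells$G == 1, "NG",
         ifelse(cells$N == 1, "N", ifelse(cells$G == 1, "G", "C"))),
  levels = c("C", "G", "N", "NG"))

# Assumed fixed-effects formulas (the thread does not show the actual calls).
X_fact   <- model.matrix(~ N * G * Year, cells)
X_single <- model.matrix(~ Treat * Year, cells)

# Estimates copied from the "expanded treatment notation" output in the thread.
beta_fact <- c("(Intercept)" = 2.92060, "N" = 0.03766, "G" = 0.14929,
               "Yearthree" = -2.85449, "Yeartwo" = -1.88175,
               "N:G" = -0.31633,
               "N:Yearthree" = 0.15710, "N:Yeartwo" = 0.14736,
               "G:Yearthree" = -0.25107, "G:Yeartwo" = 0.07550,
               "N:G:Yearthree" = 0.36353, "N:G:Yeartwo" = -0.01158)
stopifnot(setequal(names(beta_fact), colnames(X_fact)))

# Linear predictor in each of the 12 cells under the factorial coding ...
eta <- X_fact %*% beta_fact[colnames(X_fact)]

# ... re-expressed as coefficients of the single-factor coding. Both design
# matrices are square and of full rank here, so the solution is unique.
beta_single <- setNames(as.vector(solve(X_single, eta)), colnames(X_single))

round(beta_single[c("TreatNG", "TreatNG:Yearthree", "TreatNG:Yeartwo")], 5)
```

Under these assumptions, TreatNG comes out as N + G + N:G, and each TreatNG:Year term as the sum of the corresponding N:Year, G:Year and N:G:Year terms. The sums obtained this way differ from the values reported for the single-column fit by about 1e-4 or less, which is plausibly rounding and optimizer tolerance rather than a real discrepancy.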
Thanks,
Raldo

With expanded treatment notation:

Fixed effects:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     2.92060    0.23834  12.254  < 2e-16 ***
N               0.03766    0.03486   1.080   0.2801
G               0.14929    0.03395   4.397 1.10e-05 ***
Yearthree      -2.85449    0.10664 -26.768  < 2e-16 ***
Yeartwo        -1.88175    0.06844 -27.494  < 2e-16 ***
N:G            -0.31633    0.04953  -6.386 1.70e-10 ***
N:Yearthree     0.15710    0.14428   1.089   0.2762
N:Yeartwo       0.14736    0.09305   1.584   0.1133
G:Yearthree    -0.25107    0.15430  -1.627   0.1037
G:Yeartwo       0.07550    0.09200   0.821   0.4118
N:G:Yearthree   0.36353    0.20810   1.747   0.0807 .
N:G:Yeartwo    -0.01158    0.12996  -0.089   0.9290

With single column treatment notation:

Fixed effects:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)         2.92057    0.23836  12.253  < 2e-16 ***
TreatG              0.14928    0.03395   4.397 1.10e-05 ***
TreatN              0.03767    0.03486   1.080 0.279928
TreatNG            -0.12938    0.03639  -3.556 0.000377 ***
Yearthree          -2.85448    0.10664 -26.768  < 2e-16 ***
Yeartwo            -1.88175    0.06844 -27.494  < 2e-16 ***
TreatG:Yearthree   -0.25109    0.15430  -1.627 0.103693
TreatN:Yearthree    0.15711    0.14428   1.089 0.276199
TreatNG :Yearthree  0.26959    0.14636   1.842 0.065483 .
TreatG:Yeartwo      0.07549    0.09200   0.820 0.411941
TreatN:Yeartwo      0.14735    0.09305   1.583 0.113308
TreatNG :Yeartwo    0.21118    0.09558   2.210 0.027139 *

On Thu, Sep 24, 2009 at 2:10 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
On Thu, Sep 24, 2009 at 1:22 AM, Raldo Kruger <raldo.kruger at gmail.com> wrote:
Hi R users,
I have 3 factors in a non-factorial design (G, K and N), as well as
two time periods (Year) and a random factor (Site), with Plant numbers
as the response variable.
My 1st question relates to the notation of the treatments in the
data frame. Is it appropriate to use an expanded treatment notation,
such as this, when using glmer{lme4}:
Site    Year    Plant   G       K       N
A       1       5       0       0       0
A       1       4       1       0       0
A       1       7       0       1       0
A       1       10      0       0       1
A       2       3       0       0       0
A       2       4       1       0       0
A       2       8       0       1       0
A       2       12      0       0       1
B       1       7       0       0       0
B       1       3       1       0       0
B       1       7       0       1       0
B       1       12      0       0       1
B       2       4       0       0       0
B       2       5       1       0       0
B       2       6       0       1       0
B       2       11      0       0       1
With the model
m1 <- glmer(Plant ~ G + K + N + Year + (1 | Site), ...)
Or is it better to use a single column for the treatments, like this:
Site    Year    Plant   Treatment
A       1       5       C
A       1       4       G
A       1       7       K
A       1       10      N
A       2       3       C
A       2       4       G
A       2       8       K
A       2       12      N
B       1       7       C
B       1       3       G
B       1       7       K
B       1       12      N
B       2       4       C
B       2       5       G
B       2       6       K
B       2       11      N
With the following model:
m1 <- glmer(Plant ~ Treatment + Year + (1 | Site), ...)
The latter is preferred. R will generate the indicator columns for the levels of the Treatment factor (the 0/1 columns shown in the first form) and, when appropriate, reduce them to a set of 3 "contrasts" in the model (one fewer than the 4 levels C, G, K and N). (The reason for quoting the word "contrasts" is that there is a formal mathematical definition of a contrast but the linear combinations generated by R do not always satisfy this definition. The method and results are correct; it is just the name that is inaccurate.)
There are two reasons that the latter is preferred. The first is that it is easier to maintain the data in a consistent form: factors maintain consistency and are easy to check in the output from str() or summary(), whereas indicator columns have inter-column dependencies that must be checked separately. The second is the "when appropriate" clause above. Determining a useful parameterization of a linear model incorporating factors is subtle, and a lot of code in the R function model.matrix is devoted to a symbolic analysis designed to get this right. Also, you can, if you wish, change the parameterization (see ?contrasts).
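As an illustration of these points, here is a small base-R sketch built from the 16 rows posted earlier in this thread. The object names (`ind`, `X_trt`, `X_sum`) are made up for the example, and the fixed-effects formula is used without the random effect purely to inspect the design matrix; it is not a substitute for the glmer fit. It shows the consistency check that indicator columns require, the collapse to a single factor, the columns that model.matrix generates, and why "contrasts" deserves its quotation marks under the default coding.

```r
# The indicator layout as posted in the thread.
ind <- data.frame(
  Site  = factor(rep(c("A", "B"), each = 8)),
  Year  = factor(rep(rep(1:2, each = 4), times = 2)),
  Plant = c(5, 4, 7, 10, 3, 4, 8, 12, 7, 3, 7, 12, 4, 5, 6, 11),
  G = rep(c(0, 1, 0, 0), times = 4),
  K = rep(c(0, 0, 1, 0), times = 4),
  N = rep(c(0, 0, 0, 1), times = 4)
)

# Inter-column dependency that must be checked by hand with indicators:
# in this non-factorial design at most one treatment is applied per plot.
stopifnot(all(rowSums(ind[, c("G", "K", "N")]) <= 1))

# Collapse to a single factor with the control "C" as reference level.
ind$Treatment <- factor(
  ifelse(ind$G == 1, "G", ifelse(ind$K == 1, "K",
                                 ifelse(ind$N == 1, "N", "C"))),
  levels = c("C", "G", "K", "N"))
str(ind$Treatment)

# model.matrix generates the indicator columns from the factor:
# 4 levels give 3 columns next to the intercept.
X_trt <- model.matrix(~ Treatment + Year, ind)
colnames(X_trt)

# Those generated columns are the hand-made 0/1 columns.
same_cols <- isTRUE(all.equal(
  unname(X_trt[, c("TreatmentG", "TreatmentK", "TreatmentN")]),
  unname(as.matrix(ind[, c("G", "K", "N")]))))
same_cols

# A contrast in the formal sense has coefficients summing to zero.
# The default treatment coding does not; sum-to-zero coding does.
colSums(contr.treatment(4))
colSums(contr.sum(4))

# Changing the parameterization without touching the data (see ?contrasts).
X_sum <- model.matrix(~ Treatment + Year, ind,
                      contrasts.arg = list(Treatment = "contr.sum"))
colnames(X_sum)
```

Both design matrices span the same column space, so the fitted values would be the same under either coding; only the meaning of the individual coefficients changes.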
-- Raldo