Skip to content

GLMM for Combined experiments and overdispersed data

10 messages · Thierry Onkelinx, Peter Claussen, Juan Pablo Edwards Molina

#
I am analyzing data from 3 field experiments (farms=3) for a citrus flower
disease: response variable is binomial because the flower can only be
diseased or healthy.

I have particular interest in comparing 5 fungicide spraying systems
(trt=5).

Each farm had 4 blocks (bk=4) including 2 trees as subsamples (tree=2) in
which I assessed 100 flowers each one. This is a quick look of the data:

farm      trt      bk    tree   dis   tot     <fctr>   <fctr>  <fctr>
<fctr> <int> <int>
iaras      cal      1      1     0    100
iaras      cal      1      2     1    100
iaras      cal      2      1     1    100
iaras      cal      2      2     3    100
iaras      cal      3      1     0    100
iaras      cal      3      2     5    100...

The model I considered was:

resp <- with(df, cbind(dis, tot-dis))

m1 = glmer(resp ~ trt + (1|farm/bk) , family = binomial, data=df)

I tested the overdispersion with the overdisp_fun() from GLMM page
<http://glmm.wikidot.com/faq>

        chisq         ratio             p          logp
 4.191645e+02  3.742540e+00  4.804126e-37 -8.362617e+01

As ratio (residual dev/residual df) > 1, and the p-value < 0.05, I
considered to add the observation level random effect (link
<http://r.789695.n4.nabble.com/Question-on-overdispersion-td3049898.html>)
to deal with the overdispersion.

farm      trt      bk    tree   dis   tot tree_id    <fctr>   <fctr>
<fctr> <fctr> <int> <int> <fctr>
iaras      cal      1      1     0    100    1
iaras      cal      1      2     1    100    2
iaras      cal      2      1     1    100    3...

so now was added a random effect for each row (tree_id) to the model, but I
am not sure of how to include it. This is my approach:

m2 = glmer(resp ~ trt + (1|farm/bk) + (1|tree_id), family = binomial, data=df)

I also wonder if farm should be a fixed effect, since it has only 3
levels...

m3 = glmer(resp ~ trt * farm + (1|farm:bk) + (1|tree_id), family =
binomial, data=df)

I really appreciate your suggestions about my model specifications...



*Juan? Edwards- - - - - - - - - - - - - - - - - - - - - - - -# PhD student
- ESALQ-USP/Brazil*
2 days later
#
Dear Juan,

Use unique id's for random effects variables. So each bk should only be
present in one farm. And each tree_id should be present in only one bk. In
case each block has different treatments then each tree_id should be unique
to one combination of bk and trt.

Farm has too few levels to be a random effects. So either model is as a
fixed effect or drop it. In case you drop it, the information will be
picked up by bk. Note that trt + (1|farm) is less complex than trt * farm.

Assuming that you are not interested in the effect of a specific farm, you
could use sum, polynomial or helmert contrasts for the farms. Unlike the
default treatment contrast, these type of contrasts sum to zero. Thus the
effect of trt will be that for the average farm instead of the reference
farm.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2017-04-21 22:32 GMT+02:00 Juan Pablo Edwards Molina <
edwardsmolina at gmail.com>:

  
  
#
I?m sorry... I?m new in the list, and when I figured out that the question
would suit best in the mixed model list I had already post it in general
R-help. I don?t know if there?s a way to "cancel a question"... I will take
care of it from now on.

Dear Thierry, thanks for your answer.
Yes, I am not interested in the effect of a specific farm, they simply
represent the total of farms from the region where I want to suggest the
best treatments.

I Followed your suggestions, but still have a couple of doubts,

1- May "farm" be include as a simple fixed effect or interacting with the
treatment?

m3 = glmer(resp ~ trt * farm + (1|tree_id), family = binomial, data=df)
m4 = glmer(resp ~ trt + farm + (1|tree_id), family = binomial, data=df)

?2 - ?
In case of significant
?[ trt * farm ], should I report the results for each farm??

Thanks again Thierry,

Juan Edwards


*Juan*

On Mon, Apr 24, 2017 at 4:29 AM, Thierry Onkelinx <thierry.onkelinx at inbo.be>
wrote:

  
  
#
Juan,

I would model this as 

m3 = glmer(resp ~ trt * farm +  (1| bk/tree), family = binomial, data=df)
or
m3 = glmer(resp ~ trt * farm +  (1| bk) +  (1| tree_id), family = binomial, data=df)
(I can?t say off the top of my head if what the difference would be if you?re dealing with over-dispersion).

1. I?m assuming that block is a somewhat uniform grouping of trees, so that including block in the model gives you an estimate of spatial variability in the response, and if that is important relative to tree-to-tree variation.  

2. You will most certainly want to include trt*farm to test for treatment-by-environment interaction. If interaction is not significant, you may choose to exclude interaction from the model. If there is interaction, then you will want to examine each farm to determine if cross-over interaction present.

If your experiment is to determine the ?best? fungicide spraying system, and cross-over interaction is present, then you have no ?best? system. You might have cross-over arising because, say, system 1 ranks ?best? on farm 1, but system 2 ranks ?best? on farm 2.

There is extensive literature on the topic, mostly from the plant breeding genotype-by-environment interaction side. Some of the associated statistics implemented in the agricolae package, i.e. AMMI.

Peter
#
Dear Peter,

Both models will yield identical results in case tree_id uses unique codes
over the blocks.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2017-04-24 15:55 GMT+02:00 Peter Claussen <dakotajudo at mac.com>:

  
  
#
Yes, tree_id uses unique code over blocks and farms: ?each bk is
only present in one farm and each tree is present in only one bk. (tree_id
structure: farm_bk_tree)

By this way the model can handle the overdispersion (If I'm not mistaken)

Thank you all,

*Juan*

On Mon, Apr 24, 2017 at 11:46 AM, Thierry Onkelinx <thierry.onkelinx at inbo.be

  
  
#
Thierry, sorry to bother you again...
I tried to follow your suggestion and I did the herlmert contrasts with
lsmeans package.

dinc <- within(dinc, { tree_id <- as.factor(interaction(farm, trt, bk,
tree)) })

resp1 <- with(dinc, cbind(dis, tot-dis))

m0 = glmer(resp1 ~ trt + farm + (1|tree_id), family = binomial, data=dinc)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [
glmerMod]
 Family: binomial  ( logit )
Formula: resp1 ~ trt + farm + (1 | tree_id)
   Data: dinc

     AIC      BIC   logLik deviance df.resid
   521.5    543.8   -252.7    505.5      112

Scaled residuals:
     Min       1Q   Median       3Q      Max
-0.90445 -0.51114 -0.00572  0.31456  1.04667

Random effects:
 Groups  Name        Variance Std.Dev.
 tree_id (Intercept) 1.028    1.014
Number of obs: 120, groups:  tree_id, 120

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.87786    0.37604 -12.972  < 2e-16 ***
trtG10      -0.06738    0.49125  -0.137  0.89090
trtG15       0.90620    0.44435   2.039  0.04141 *
trtG20       1.13733    0.43920   2.590  0.00961 **
trtControl   5.10202    0.41215  12.379  < 2e-16 ***
farmstacruz -0.80155    0.30294  -2.646  0.00815 **
farmtaqua   -0.84738    0.30659  -2.764  0.00571 **
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

Correlation of Fixed Effects:
            (Intr) trtG10 trtG15 trtG20 trtCnt frmstc
trtG10      -0.635
trtG15      -0.710  0.538
trtG20      -0.720  0.546  0.604
trtControl  -0.763  0.571  0.650  0.651
farmstacruz -0.300  0.022 -0.015  0.010 -0.081
farmtaqua   -0.301  0.012  0.004  0.017 -0.083  0.441

### Setting up a custom contrast function

helmert.lsmc <- function(levs, ...) {
  M <- as.data.frame(contr.helmert(levs))
  names(M) <- paste(levs[-1],"vs calendar")
  attr(M, "desc") <- "Helmert contrasts"
  M
}
$lsmeans
 trt             prob          SE df   asymp.LCL   asymp.UCL
 Calendar 0.004374833 0.001541432 NA 0.002191201 0.008715549
 G10      0.004090935 0.001498922 NA 0.001993307 0.008377443
 G15      0.010757825 0.003091538 NA 0.006116214 0.018855141
 G20      0.013517346 0.003832360 NA 0.007740919 0.023502164
 Control  0.419339074 0.051564118 NA 0.322885163 0.522377507

Results are averaged over the levels of: farm
Confidence level used: 0.95
Intervals are back-transformed from the logit scale

$contrasts
 contrast              odds.ratio           SE df z.ratio p.value
 G10 vs calendar     9.348400e-01 4.592373e-01 NA  -0.137  0.8909
 G15 vs calendar     6.552019e+00 4.909975e+00 NA   2.508  0.0121
 G20 vs calendar     1.310737e+01 1.307704e+01 NA   2.579  0.0099
 Control vs calendar 1.011296e+08 1.124089e+08 NA  16.582  <.0001

Results are averaged over the levels of: farm
Tests are performed on the log odds ratio scale

## Do you think it's correct, if I consider trt calendar as the reference
to test my other treatments?

Thanks!

Juan

*Juan*

On Mon, Apr 24, 2017 at 11:46 AM, Thierry Onkelinx <thierry.onkelinx at inbo.be

  
  
#
The effects of the baseline and control are very large. Do you have quasi
complete separation? All failures at the baseline and all successes at the
control?

Op 25 apr. 2017 3:40 p.m. schreef "Juan Pablo Edwards Molina" <
edwardsmolina at gmail.com>:

Thierry, sorry to bother you again...
I tried to follow your suggestion and I did the herlmert contrasts with
lsmeans package.

dinc <- within(dinc, { tree_id <- as.factor(interaction(farm, trt, bk,
tree)) })

resp1 <- with(dinc, cbind(dis, tot-dis))

m0 = glmer(resp1 ~ trt + farm + (1|tree_id), family = binomial, data=dinc)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [
glmerMod]
 Family: binomial  ( logit )
Formula: resp1 ~ trt + farm + (1 | tree_id)
   Data: dinc

     AIC      BIC   logLik deviance df.resid
   521.5    543.8   -252.7    505.5      112

Scaled residuals:
     Min       1Q   Median       3Q      Max
-0.90445 -0.51114 -0.00572  0.31456  1.04667

Random effects:
 Groups  Name        Variance Std.Dev.
 tree_id (Intercept) 1.028    1.014
Number of obs: 120, groups:  tree_id, 120

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.87786    0.37604 -12.972  < 2e-16 ***
trtG10      -0.06738    0.49125  -0.137  0.89090
trtG15       0.90620    0.44435   2.039  0.04141 *
trtG20       1.13733    0.43920   2.590  0.00961 **
trtControl   5.10202    0.41215  12.379  < 2e-16 ***
farmstacruz -0.80155    0.30294  -2.646  0.00815 **
farmtaqua   -0.84738    0.30659  -2.764  0.00571 **
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

Correlation of Fixed Effects:
            (Intr) trtG10 trtG15 trtG20 trtCnt frmstc
trtG10      -0.635
trtG15      -0.710  0.538
trtG20      -0.720  0.546  0.604
trtControl  -0.763  0.571  0.650  0.651
farmstacruz -0.300  0.022 -0.015  0.010 -0.081
farmtaqua   -0.301  0.012  0.004  0.017 -0.083  0.441

### Setting up a custom contrast function

helmert.lsmc <- function(levs, ...) {
  M <- as.data.frame(contr.helmert(levs))
  names(M) <- paste(levs[-1],"vs calendar")
  attr(M, "desc") <- "Helmert contrasts"
  M
}
$lsmeans
 trt             prob          SE df   asymp.LCL   asymp.UCL
 Calendar 0.004374833 0.001541432 NA 0.002191201 0.008715549
 G10      0.004090935 0.001498922 NA 0.001993307 0.008377443
 G15      0.010757825 0.003091538 NA 0.006116214 0.018855141
 G20      0.013517346 0.003832360 NA 0.007740919 0.023502164
 Control  0.419339074 0.051564118 NA 0.322885163 0.522377507

Results are averaged over the levels of: farm
Confidence level used: 0.95
Intervals are back-transformed from the logit scale

$contrasts
 contrast              odds.ratio           SE df z.ratio p.value
 G10 vs calendar     9.348400e-01 4.592373e-01 NA  -0.137  0.8909
 G15 vs calendar     6.552019e+00 4.909975e+00 NA   2.508  0.0121
 G20 vs calendar     1.310737e+01 1.307704e+01 NA   2.579  0.0099
 Control vs calendar 1.011296e+08 1.124089e+08 NA  16.582  <.0001

Results are averaged over the levels of: farm
Tests are performed on the log odds ratio scale

## Do you think it's correct, if I consider trt calendar as the reference
to test my other treatments?

Thanks!

Juan

*Juan*

On Mon, Apr 24, 2017 at 11:46 AM, Thierry Onkelinx <thierry.onkelinx at inbo.be

  
  
#
Perhaps I?m missing something.

Are the individual trees the experimental units, and are you only taking one count per tree, or are there multiple counts? That makes me think that that tree_id variance is effectively residual variance and that there is no random effect other than residual.

You have 112 df.resid, and 120 groups : tree_id.

So what do you gain by using glmer as opposed to gem with a quasibinomial family?

Peter

  
  
#
Should read ?glm?, un-autocorrected.