Skip to content

zero variance query

8 messages · Christine Griffiths, Emmanuel Charpentier, Ben Bolker +1 more

#
Dear R users,

I am having a problem with getting zero variance in my lmer models which 
specify two random effects. Having scoured the help lists, I have read that 
this could be because my variables are strongly correlated. However, when I 
simplify my model I still encounter the same problem.

My response variable is abundance which ranges from 0-0.14.

Below is an example of my model:
Generalized linear mixed model fit by the Laplace approximation
Formula: Abundance ~ Treatment + (1 | Month) + (1 | Block)
   AIC   BIC logLik deviance
 17.55 36.00 -2.777    5.554
Random effects:
 Groups   Name        Variance   Std.Dev.
 Month    (Intercept) 5.1704e-17 7.1906e-09
 Block    (Intercept) 0.0000e+00 0.0000e+00
 Residual             1.0695e-03 3.2704e-02
Number of obs: 160, groups: Month, 10; Block, 6

Fixed effects:
                   Estimate Std. Error t value
(Intercept)        -3.73144    0.02728 -136.80
Treatment2.Radiata  0.58779    0.03521   16.69
Treatment3.Aldabra  0.47269    0.03606   13.11

Correlation of Fixed Effects:
            (Intr) Trt2.R
Trtmnt2.Rdt -0.775
Trtmnt3.Ald -0.756  0.586

1. Is it wrong to treat this as count data?
2. I would like to retain these as random factors given that I designed my 
experiment as a randomised block design and repeated measures, albeit 
non-orthogonal and unbalanced. Is it acceptable to retain these random 
factors, is all else is correct?
3. The above response variable was calculated per m2 by dividing the Count 
by the sample area. When I used the Count (range 0-9) as my response 
variable, I get a small but reasonable variation of random effects. Could 
anyone explain why this occurs and whether one response variable is better 
than another?
Generalized linear mixed model fit by the Laplace approximation
Formula: Count ~ Treatment + (1 | Month) + (1 | Block)
   AIC BIC logLik deviance
 312.5 331 -150.3    300.5
Random effects:
 Groups   Name        Variance Std.Dev.
 Month    (Intercept) 0.14591  0.38198
 Block    (Intercept) 0.58690  0.76609
 Residual             2.79816  1.67277
Number of obs: 160, groups: Month, 10; Block, 6

Fixed effects:
                   Estimate Std. Error t value
(Intercept)          0.3098     0.3799  0.8155
Treatment2.Radiata   0.5879     0.2299  2.5575
Treatment3.Aldabra   0.5745     0.2382  2.4117

Correlation of Fixed Effects:
            (Intr) Trt2.R
Trtmnt2.Rdt -0.347
Trtmnt3.Ald -0.348  0.536

Many thanks,
Christine
#
Le lundi 01 juin 2009 ? 18:00 +0100, Christine Griffiths a ?crit :
Hmmm... IST vaguely R that, when the world was young and I was (already)
silly, Poisson distribution used to be a *discrete* distribution. Of
course, this may or may not stand for "quasi"Poisson (for some value of
"quasi").

May I inquire if you tried to analyze log(Abundance) (or log(Count),
maybe including log(area) in the model) ?

HTH,

					Emmanuel Charpentier
#
Emmanuel Charpentier wrote:
I think so ...
To agree with what Emmanuel said above: you should use Count~...,
offset=log(area) for the correct analysis ...  that should solve
both your technical (zero random effects) and conceptual (even
quasiPoisson should be discrete data) issues.

  
    
#
Dear Emmanuel and Ben

Many thanks for your advice. Unfortunately, I don't think that I can offset 
with log(area), given that each area is the same. My rationale for 
converting to m2 was to standardise abundances to 1 m2 as I have other 
parameters which were measured to different areas. I had previously 
attempted to normalise my data by logging but felt that it did not improve 
the distribution. I just hadn't tried it in my modelling. Logging my count 
data dramatically improved the fit of the model (AIC 116.7 v 312.5), 
however the variance still remains low. Does this appear acceptable?
Furthermore, can I assess model fit of different transformations of the 
same dataset using AIC values, i.e. compare log(Count) and inverse 
transformed Count?

lncount<-log(Count+1)
m1<-m1<-lmer(lncount~Treatment+(1|Month)+(1|Block),family=quasipoisson)
summary(m1)
Generalized linear mixed model fit by the Laplace approximation
Formula: lncount ~ Treatment + (1 | Month) + (1 | Block)
   AIC   BIC logLik deviance
 116.7 135.1 -52.33    104.7
Random effects:
 Groups   Name        Variance   Std.Dev.
 Month    (Intercept) 1.8937e-14 1.3761e-07
 Block    (Intercept) 3.5018e-02 1.8713e-01
 Residual             3.9318e-01 6.2704e-01
Number of obs: 160, groups: Month, 10; Block, 6

Fixed effects:
                   Estimate Std. Error t value
(Intercept)         -0.4004     0.1239  -3.232
Treatment2.Radiata   0.4596     0.1305   3.522
Treatment3.Aldabra   0.4295     0.1334   3.220

Correlation of Fixed Effects:
            (Intr) Trt2.R
Trtmnt2.Rdt -0.581
Trtmnt3.Ald -0.577  0.530

I used quasipoisson as my data is overdispersed. It was further improved by 
an inverse transformation (AIC 43.54). Again I have small variances.

invcount<-1/(Count+1)
m3<-lmer(invcount~Treatment+(1|Month)+(1|Block),family=quasipoisson)
summary(m3)
Generalized linear mixed model fit by the Laplace approximation
Formula: invcount ~ Treatment + (1 | Month) + (1 | Block)
   AIC BIC logLik deviance
 43.54  62 -15.77    31.54
Random effects:
 Groups   Name        Variance  Std.Dev.
 Month    (Intercept) 0.0000000 0.000000
 Block    (Intercept) 0.0021038 0.045867
 Residual             0.0926225 0.304339
Number of obs: 160, groups: Month, 10; Block, 6

Fixed effects:
                   Estimate Std. Error t value
(Intercept)        -0.51644    0.05411  -9.545
Treatment2.Radiata -0.36246    0.08401  -4.314
Treatment3.Aldabra -0.29319    0.08197  -3.577

Correlation of Fixed Effects:
            (Intr) Trt2.R
Trtmnt2.Rdt -0.566
Trtmnt3.Ald -0.580  0.372

Log(Abundance) did not solve the problem of zero variance. If quasipoisson 
errors are not acceptable to use with abundance, i.e. non-integers, is 
there a family of errors that would be recommended? Or should I simply 
multiply abundance to obtain whole numbers?

Many thanks in advance,
Christine
--On 01 June 2009 23:17 -0400 Ben Bolker <bolker at ufl.edu> wrote:

            
----------------------
Christine Griffiths
School of Biological Sciences
University of Bristol
Woodland Road
Bristol BS8 1UG
Tel: 0117 9287593
Fax 0117 925 7374
Christine.Griffiths at bristol.ac.uk
http://www.bio.bris.ac.uk/research/mammal/tortoises.html
#
Christine Griffiths wrote:
Why not?  All the offset does is add a constant (i.e., fixed rather
than estimated -- could be the same or different for different
observations) to the regression model.
Don't quite understand this.  Parameters from other studies that you
want to compare in discussion?  If so, you can just rescale your
predictions/parameters *after* you estimate them ...

I had previously
No, not without a correction.  See
http://www.unc.edu/courses/2007spring/enst/562/001/docs/lectures/lecture22.htm

  Generalized linear modeling is not as flexible (in some ways) as
classical linear models -- you can't just transform the data any way you
want (in principle I suppose you could,  but it's basically not possible
to "transform to achieve a Poisson distribution" the way you would
transform continuous data to achieve normality etc.)

  
    
#
Le mardi 02 juin 2009 ? 10:45 +0100, Christine Griffiths a ?crit :
Aaargh ! I thought that "lmer(lncount~Treatment+(1|Month)+(1|
Block),family=gaussian)" (+/- log(area somewhere in the model or the
fit) might give good hints as a first (known bad) approximation, and
didn't made myself clear. That's what happens when you try to be
sarcastic while dog-tired...

And maybe your random effects *are* quite low. What happens with :
summary(glm(Count~Treatment+Month+Block, family=quasipoisson) (or
something to that effect) ? What says a simple crosstabulation ? or a
graph ("caterpillar", for example) ?

HTH,

					Emmanuel Charpentier
#
Dear All

Thank you for all your incredibly helpful advice. Having done as suggested 
by Emmanuel, I have come to the conclusion that there is basically very 
little variation among my data, which is perhaps not surprising given that 
I have a small count range 0-9. I was just concerned that such low variance 
meant that there was something wrong with my analysis.

Emmanuel, logging my data does not improve the distribution and so I think 
I cannot use Gaussian errors. I have a large number of zeros. These are off 
biological significance and so I am interested in retaining them in the 
model.

Since Anna mentioned that quasi models are unstable in lme4, would you 
recommend using poisson models and calculating QAIC instead?

Many thanks
Christine


--On 02 June 2009 20:07 +0200 Emmanuel Charpentier
<charpent at bacbuc.dyndns.org> wrote:

            
----------------------
Christine Griffiths
School of Biological Sciences
University of Bristol
Woodland Road
Bristol BS8 1UG
Tel: 0117 9287593
Fax 0117 925 7374
Christine.Griffiths at bristol.ac.uk
http://www.bio.bris.ac.uk/research/mammal/tortoises.html