
random as fixed effect

3 messages · Ben Bolker, John Maindonald, Andrew Robinson

[cc'ing back to r-sig-mixed]
On 12-10-11 09:08 AM, Andrew Koeser wrote:
I would argue this is not really a problem in transitioning from SAS
to R, but from classical method-of-moments ANOVA to modern mixed models;
you will have the same kinds of results with SAS PROC MIXED as you will
with nlme/lme4.  http://glmm.wikidot.com/faq#fixed_vs_random  goes into
more detail.  There is a distinction between _conceptual_ or
_philosophical_ random effects (we don't want to make inferences about
specific values, we want to make inferences about the population) and
_computational_ random effects (we want to estimate effects with
shrinkage, we have enough levels to estimate the variance reasonably
well). I would agree that in the best of all possible worlds you would
somehow be able to generalize from an experiment that was run in two
successive years to the performance of a crop variety across all
possible years (and estimate the variance among years accurately), but
that doesn't work particularly well on statistical grounds (the variance
is extremely poorly determined), and in the case of mixed models it
generally fails for computational reasons as well.
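[Editor's note: a minimal sketch of the two modelling choices discussed above, on made-up data. The trial layout and all numbers are hypothetical; nlme is used because the thread mentions it (it ships with R as a recommended package), and the lme4 equivalent is noted in a comment.]

```r
library(nlme)  # recommended package shipped with R

## hypothetical balanced two-year variety trial (simulated, not from the thread)
set.seed(1)
dat <- expand.grid(year = factor(1:2), variety = factor(1:4), rep = 1:5)
dat$yield <- 10 + as.numeric(dat$variety) +
  c(-0.5, 0.5)[dat$year] + rnorm(nrow(dat))

## year as a fixed effect
fit_fixed <- lm(yield ~ variety + year, data = dat)

## year as a (computational) random effect; the lme4 equivalent would be
## lmer(yield ~ variety + (1 | year), data = dat)
fit_rand <- lme(yield ~ variety, random = ~ 1 | year, data = dat)

## in this balanced design the variety estimates agree to numerical precision
fixef(fit_rand)
coef(fit_fixed)

## but the among-year standard deviation rests on only two levels,
## so it is essentially undetermined
VarCorr(fit_rand)
```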
1. "we want to make inferences about the population": 
Even making year a random effect is not really enough.  We are dealing with 
a time series, and modelling it as a random effect is a weak concession to that 
issue.  If one does nonetheless fit year as a fixed effect, one should at least 
examine the results for the separate years separately, and check on the extent 
to which they point in the same direction.  Published use of the analysis should 
acknowledge the consequent uncertainty.  

Note however that for certain types of balanced models, the estimates of treatment 
effects will be the same irrespective of whether one fits years as random or fixed.
The model is not allowing for a year by treatment interaction, just as the standard
form of analysis of block designs does not and cannot allow for a block x treatment
interaction.
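[Editor's note: the block x treatment point can be seen directly in base R with made-up data. With one plot per block x treatment cell, the interaction is completely confounded with error: fitting it leaves zero residual degrees of freedom.]

```r
## hypothetical randomized complete block design:
## one plot per block x treatment cell
set.seed(2)
d <- expand.grid(block = factor(1:4), trt = factor(1:3))
d$y <- rnorm(nrow(d))

## additive model: the standard analysis of a block design
fit_add <- lm(y ~ block + trt, data = d)
df.residual(fit_add)  # 6 df left to estimate error

## saturated model: the block x treatment interaction absorbs every
## remaining df, leaving nothing to estimate error from
fit_int <- lm(y ~ block * trt, data = d)
df.residual(fit_int)  # 0
```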

2. "statistical grounds (the variance is extremely poorly determined)": 
but of course ignoring this component of variance, if it does affect treatment or other 
estimates, does not cause it to go away.
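[Editor's note: a quick base-R simulation, with hypothetical numbers, of what "extremely poorly determined" means when only two years are observed. Each replicate draws two year effects with true among-year SD 1, then forms a method-of-moments estimate of the among-year variance from the two year means.]

```r
## simulate many two-year experiments, 10 plots per year,
## true among-year SD = 1, plot-level SD = 1
set.seed(3)
est <- replicate(1000, {
  year_eff <- rnorm(2, sd = 1)                   # true year effects
  y <- rep(year_eff, each = 10) + rnorm(20)      # plot-level noise
  ybar <- tapply(y, rep(1:2, each = 10), mean)   # the two year means
  max(var(ybar) - 1 / 10, 0)                     # subtract within-year part
})
quantile(est, c(0.05, 0.5, 0.95))  # enormous spread around the true value 1
```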

3. "computational reasons": 
The algorithms used in lme4 are general to the extent that they are able to handle
a huge variety of designs.  My experience using Genstat, which uses quite a
different algorithm, was that it rarely failed for the balanced or approximately
balanced designs that are usual in field and suchlike experimentation.  ASREML
would no doubt perform similarly.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 12/10/2012, at 12:20 AM, Ben Bolker <bbolker at gmail.com> wrote:

            
On Fri, Oct 12, 2012 at 09:46:58AM +1100, John Maindonald wrote:
I echo John's concern.  I would argue that this component of variance
will always affect interval estimates, and it should not be ignored.
I feel uneasy about converting random effects into fixed effects
simply because they have few levels; in so doing we risk
over-confidence in our estimates and tests, because we're assuming
that the contribution is really 0.

My opinion is that the structure of the model should honestly reflect
the structure of the design, at the very least.  In an ideal world we
should include the uncertainty around the random effects estimate,
but I do not see that being done.  Maybe two experimental units really
is too few for inference!
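[Editor's note: the over-confidence point can be made concrete with a small simulation on hypothetical numbers. An intercept-only model that ignores a real year component produces a nominal 95% interval that badly undercovers when we generalize to an unobserved year.]

```r
## two observed years with true among-year SD = 1, 10 plots per year;
## the fitted model ignores the year component entirely
set.seed(4)
covered <- replicate(2000, {
  yr  <- rnorm(2, sd = 1)                 # the two observed years
  y   <- rep(yr, each = 10) + rnorm(20)   # plot-level noise
  ci  <- confint(lm(y ~ 1))               # nominal 95% CI, year ignored
  new_year <- rnorm(1, sd = 1)            # mean of an unobserved year
  ci[1] <= new_year && new_year <= ci[2]
})
mean(covered)  # far below the nominal 0.95
```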

Best wishes

Andrew