nlme model specification

Sat, May 24, 2008 3:35 AM

Matthew,

Zuur et al (2007) recommands that you first select a good random structure. In a next step you can refine the model by adding correlation structures. I don't think that you may compare f.gls with f.lme because both the random effect and the correlation structure are different (gls is like a lme without random effect.

So in a first step I would compare the models below.

gls(fixed = growth ~ diameter*vines)
lme(fixed = growth ~ diameter*vines, random = ~ 1 | tree.id)
lme(fixed = growth ~ diameter*vines, random = ~ yr | tree.id)
lme(fixed = growth ~ diameter*vines, random = ~ factor(yr) | tree.id) 

The last random effect needs a lot of parameters (45). But depending on the structure of the variance-covariance matrix of the random effect you might be able to simplify it to a pdDiag, pdCompSymm or pdIdent structure. Have a look at Pinheiro and Bates for more details on those structures.

Suppose ~ year | tree.id was the best random effect. Then i could compare

lme(fixed = growth ~ diameter*vines, random = ~ yr | tree.id)
lme(fixed = growth ~ diameter*vines, random = ~ yr | tree.id, correlation = corAR1(form = ~ yr|tree.id))
lme(fixed = growth ~ diameter*vines, random = ~ yr | tree.id, correlation = corARMA(form = ~ yr|tree.id, p = 0, q = 1))
...

HTH,

Thierry


----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Oorspronkelijk bericht-----
Van: Landis, R Matthew [mailto:rlandis at middlebury.edu] 
Verzonden: vrijdag 23 mei 2008 20:40
Aan: ONKELINX, Thierry; Christian A. Parker; 'P?ter S?lymos'
CC: r-sig-ecology at r-project.org
Onderwerp: RE: [R-sig-eco] nlme model specification

Dear Thierry, Chris, Peter, and the list,

Based on these suggestions and some others, I realize now that tree id is clearly the grouping variable.  A couple people had mentioned in a previous letter that there is likely to be temporal autocorrelation within trees, and I am particularly interested in this aspect of the data, which prompted me to look into the autocorrelation structures described in Pinheiro and Bates.

I am still a little confused about the two different approaches for accounting for non-independence within trees among years (i.e. specifying tree.id as a random effect vs. specifying autocorrelation in the residuals.  I am currently considering these alternative models:

f.gls <- gls(growth ~ diameter*vines, corr = corAR1(form = ~ yr|tree.id)

f.lme <- lme(fixed = growth ~ diameter*vines, random = ~ 1 | tree.id)

f.lme.ar1 <- lme(fixed = growth ~ diameter*vines, random = ~ 1 | tree.id, correlation = corAR1(form = ~ yr | tree.id))

For f.gls, phi is 0.67, which pretty closely matches the value I determined empirically by correlating residuals from different years.
plot(ACF(f.gls)) also reveals highly signficant autocorrelation which decays smoothly with lag.  This approach is very appealing, because it closely matches what I see if I make a correlation matrix among years empirically from the residuals.

However, f.lme.ar1 also has significant correlation, and a lower AIC than f.gls (and f.lme).  But now phi is only 0.26, and plot(ACF(f.lme.ar1)) does not look like AR1, but is rather increasingly negative with longer lags.  I don't quite understand this change in phi, or the biological interpretation of the correlation structure.

So, even though f.gls has a higher AIC, I'd like to use that one because it seems more biologically interpretable.  Does this adequately account for the pseudoreplication across years, or do I really need that random effect of tree.id in there?

Again, many many thanks for all your help.  This makes a lot more sense to me now than it did.

Matt

-----Original Message-----
From: ONKELINX, Thierry [mailto:Thierry.ONKELINX at inbo.be]
Sent: Thursday, May 22, 2008 11:15 AM
To: Christian A. Parker; Landis, R Matthew
Cc: r-sig-ecology at r-project.org
Subject: RE: [R-sig-eco] nlme model specification

Matthew,

I support Chris' point of view. The random effects should model the
dependence structure between the measurements. The parameters of your
fixed effect will be better when you have specified the random effect
correctly. Define the most complex model that are willing to
accept, run
it with different random effects and compare those models (by LRT or
AIC, both under REML). Take the best one and refine your fixed effects
(under ML).

A few suggestions:

M0 <- gls(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, method = 'REML')
Ma <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random = ~1|year,
method = 'REML')
M1 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random = ~1|id,
method = 'REML')
M2 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random =
~year|id,
method = 'REML')
M3 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random =
~factor(year)|id, method = 'REML')
M4 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random =
list(id =
pdCompSymm(~factor(year))), method = 'REML')

anova(M0, Ma)
anova(M0, M1, M2)
anova(M0, M1, M3)
anova(M0, M1, M4)

Warning: M3 may take some time to calculate as it will have estimate 45
parameters for the random effect.

A random slope by year when you group by year makes no sense since you
have in each group only data from one year. Hence it is impossible to
have a slope by year in each group.

Thierry

---------------------------------------------------------------
---------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an
answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-----Oorspronkelijk bericht-----
Van: r-sig-ecology-bounces at r-project.org
[mailto:r-sig-ecology-bounces at r-project.org] Namens Christian A. Parker
Verzonden: donderdag 22 mei 2008 16:39
Aan: Landis, R Matthew
CC: 'r-sig-ecology at r-project.org'
Onderwerp: Re: [R-sig-eco] nlme model specification

Matthew
Please correct me if I am wrong (anyone) but because your observations
are not independent across your desired groups (years) your error terms
will be biased which will then influence your significant tests. So
regardless of the factor that you are interested you would
still want to

account for the fact that all measurements were taken on the same trees
each year by doing a repeated measures model of some sort.
Hope this helps,
-Chris

Landis, R Matthew wrote:

Dear R-sig-eco:

Many thanks to all of those who took the time to reply to my

question.
The diversity of replies has made me go back and try to clarify my
question.  Apologies for the length of the e-mail.  Thanks in
advance to
anyone willing to plow through this and understand it.  If you're ever
in Middlebury I'll buy you a beer.

To repeat, I have 300 trees, ranging in size from 10 - 150

cm diameter
(big trees).  To simplify my original question, let's say I want to
understand the relationship between growth and two variables, diameter
(continuous) and vine load (ordinal index from 1-4). I'd also like to
know the relative importance of diameter vs. vine load, e.g. by partial
R2.  If I had one year of data, this would be a simple regression.

However, I have 9 years of annual measurements on the trees.  It's as

if I have the above analysis repeated 9 times.  There was no initial
treatment, so I view these 9 years as a random sample of the years in
the life of the tree, and unlike most examples of repeated measures I
have read, the time effect is of no interest whatsoever. That is, I am
not interested in viewing xyplot(growth ~ time|id).  I don't expect to
see any consistent directional response to time.  In a way, it's as if
the 9 years represent blocks, (except that it's the same 300 trees in
each block) -- this is why I view the yr as a random effect, and as the
grouping variable.

If I were to graph the data, I would use xyplot(growth ~ diameter|yr)

to see what I am most interested in.  Grouping by individual doesn't
make sense to me here because each individual only represents a very
small slice of the full range of measurements - e.g. over the
ten years,
each tree only grows from 10 cm - 14 cm, so I can't really estimate the
growth vs. diameter relationship for each tree.  xyplot(growth ~
diameter|id) would not be useful. This is why I don't consider the
individual to be the grouping variable, but perhaps I am wrong on this.

So, now, as before, I am back to

fit <- lme(fixed = growth ~ diameter * vines, random = ~ 1|year)

I'm expecting that this will estimate separate intercepts for each

year.  Which is what I want (I would like to fit separate
slopes by year
too, but that model didn't converge).

I guess what I'm most concerned about is whether the significance

tests obtained for each term use the appropriate error term and the
appropriate degrees of freedom.  I'm currently using something like the
following command to test the effect of diameter

anova(fit.full.model, update(fit.full.model, . ~ vines))

But maybe I'm way off base there.

Thanks very much!

Matt Landis

nlme model specification

Thread (10 messages)