Rules of thumb for model complexity with small sample size in lme() - R-SIG-mixed-models

Wed, Apr 4, 2018 6:55 AM #

Hello,

I have an experiment with six streams in two groups (regulated and control).  At each stream there were five sites (Transect).  At each site there were unreplicated nutrient treatments (N, P, N+P, C).  Light was measured at each site.

Stream	Regulated	Transect	Nitrogen	Phosphorus	R	Light
Cranberry	Regulated	30	C	C	-0.102512563	2042.266667
Cranberry	Regulated	30	C	P	-0.08877551	2042.266667
Cranberry	Regulated	50	C	C	-0.107142857	1283.3
Cranberry	Regulated	50	N	C	-0.059375	1283.3
Cranberry	Regulated	70	C	C	-0.067346939	1336.6
Cranberry	Regulated	70	N	C	-0.063636364	1336.6
...

I would like to know if the response differs among groups (regulated vs control) or is related to light or nutrient treatment.  I have two separate analyses, N = 107 and N = 66 with different numbers of missing values (N = 120 before missing values). 

I think the appropriate model structure is:

lme(Response ~ Regulated + Light + Nitrogen + Phosphorus + Nitrogen:Phosphorus, random = ~1|Stream/Transect, data = data, method = "ML")

However, I'm concerned that the model is far too complex for my sample size.  Any advice would be appreciated!

Thanks!

John P. Ludlam, Ph.D. - Fitchburg State University

Thierry Onkelinx

Wed, Apr 4, 2018 8:10 AM #

Dear John,

Since you don't have replication at the transect level, you should
omit that from the random effects structure.
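
For illustration only, a minimal sketch of the model with Stream as the sole grouping factor. The data frame `d` is simulated as a hypothetical stand-in for the real data (6 streams x 5 transects x 4 treatments, no missing values); the effect sizes and the name `fit_stream` are assumptions, not values from the experiment.

```r
library(nlme)

set.seed(42)
# Hypothetical stand-in data: 6 streams x 5 transects x 4 nutrient treatments
d <- expand.grid(
  Nitrogen   = factor(c("C", "N")),
  Phosphorus = factor(c("C", "P")),
  Transect   = factor(seq(10, 90, by = 20)),
  Stream     = factor(paste0("S", 1:6))
)
# Regulated is a stream-level covariate; Light is constant within a transect
d$Regulated <- factor(ifelse(d$Stream %in% c("S1", "S2", "S3"),
                             "Regulated", "Control"))
d$Light <- ave(runif(nrow(d), 500, 2500), d$Stream, d$Transect)

# Simulated response: stream-level random intercept plus residual noise
stream_eff <- rnorm(6, sd = 0.05)[as.integer(d$Stream)]
d$Response <- -0.1 + 0.02 * (d$Nitrogen == "N") + stream_eff +
  rnorm(nrow(d), sd = 0.02)

# Same fixed effects as in the original post, random intercept for Stream only
fit_stream <- lme(
  Response ~ Regulated + Light + Nitrogen + Phosphorus + Nitrogen:Phosphorus,
  random = ~ 1 | Stream,
  data = d, method = "ML"
)
summary(fit_stream)
```

With real data containing missing values, lme() would also need an explicit na.action (for example na.omit), since its default is to stop on NAs.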

I tend to strive for at least 10 observations per parameter. More is
better of course. Assuming that Nitrogen and Phosphorus are factors
with two levels, then Regulated + Light + Nitrogen + Phosphorus +
Nitrogen:Phosphorus requires 5 parameters. Add 1 for the random effect
and you have 6 parameters or at least 60 observations. So this model
might work with N = 66. However you will need to carefully check the
model.
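
The tally above can be reproduced from the design matrix. This is only a sketch on a hypothetical balanced data frame `d` (all names below are assumptions). Note that model.matrix() also returns an intercept column, which the count of 5 above leaves out, and that the residual variance is not counted either; whether those belong in the rule of thumb is a matter of convention.

```r
# Hypothetical stand-in design: 6 streams x 5 transects x 4 nutrient treatments
set.seed(1)
d <- expand.grid(
  Nitrogen   = factor(c("C", "N")),
  Phosphorus = factor(c("C", "P")),
  Transect   = 1:5,
  Stream     = factor(paste0("S", 1:6))
)
d$Regulated <- factor(ifelse(d$Stream %in% c("S1", "S2", "S3"),
                             "Regulated", "Control"))
d$Light <- ave(runif(nrow(d), 500, 2500), d$Stream, d$Transect)

# Design matrix for the fixed effects of the proposed model
X <- model.matrix(~ Regulated + Light + Nitrogen + Phosphorus +
                    Nitrogen:Phosphorus, data = d)
colnames(X)

n_slopes <- ncol(X) - 1        # fixed-effect terms, intercept excluded
n_random <- 1                  # one random-intercept variance (Stream)
n_param  <- n_slopes + n_random
n_needed <- 10 * n_param       # 10 observations per parameter

c(slopes = n_slopes, parameters = n_param, observations_needed = n_needed)
```

Counting the intercept as well would raise the total to 7 parameters, so the margin at N = 66 is thin either way.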

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////


_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Ben Bolker

Wed, Apr 4, 2018 8:16 AM #

As I interpret the description, there is actually replication at the
transect level; ~ 2 samples per transect (30 transects, total N=66). In
principle one could even ask if there are variations in the effect of N
and P across transects (this is essentially a very unbalanced
randomized-block design), but I agree that would be unrealistically
optimistic.

  Otherwise I agree with Thierry.
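
A quick way to check the replication at the transect level is to count the observations per Stream/Transect combination. The sketch below uses only the six rows shown in the original post; the column name `n_obs` is made up for illustration, and the full data set would be substituted for `d`.

```r
# The six rows shown in the original post (Cranberry stream only)
d <- data.frame(
  Stream     = "Cranberry",
  Transect   = c(30, 30, 50, 50, 70, 70),
  Nitrogen   = c("C", "C", "C", "N", "C", "N"),
  Phosphorus = c("C", "P", "C", "C", "C", "C"),
  R          = c(-0.102512563, -0.08877551, -0.107142857,
                 -0.059375, -0.067346939, -0.063636364)
)

# Number of non-missing responses per transect within stream
n_per_transect <- aggregate(R ~ Stream + Transect, data = d, FUN = length)
names(n_per_transect)[3] <- "n_obs"
n_per_transect

# Average replication per transect
mean(n_per_transect$n_obs)
```

If most transects retain more than one observation after dropping missing values, a Transect-within-Stream variance can in principle be separated from the residual variance, although with so few observations per transect the estimate may be poorly determined.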
