Hello all,
This is my first post in r-sig-ME. I have benefited and learned a lot from
the threads here. In this email I would like to ask for comments on the
correct way to specify my random effect term (sampling site), which
'overlap' with the fixed effect of main interest (altitude).
I am now using nlme, lme4, and glmmADMB to analyze altitudinal survey data
of plant traits.
Below is the structure of the data I am now have trouble analyzing:
=======================================
a) Research design: In every Jan, Apr, Jul, Oct from Jul-2012 to Oct-2015,
we visited the 3 sites at low altitude and 3 sites at medium altitude,
haphazardly selected 6 individual plants of the focal species, and measured
traits and # herbivore for each individual.
b) Hypothesis to test: Does the trait/# herbivore vary across altitude and
month? Are there interactions?
c) Notations:
alt (altitude): L (low) vs. M (medium); a 2-level factor.
m (month): Jan, Apr, Jul, Oct; a 4-level factor.
site (sampling site): 3 at low- and 3 at medium-altitude Taiwan; a 6-level
factor with unique location names.
y (year of survey): 2012-2015; a 4-level factor.
--------
C_count (# chewing herbivore): discrete (count) response variable. Many
zeros, mostly 1-3, a few large values (10,17,22).
mT (physical leaf toughness): continuous response variable.
d) My model structure and outputs:
mod_T = lme(mT ~ m + alt + m:alt, random = ~ 1|site/y,
weights = varIdent(form = ~1|alt),
contrasts = c(list(m=contr.sum,alt=contr.sum)),
na.action = "na.omit", data = dat))
#### weights model form (1|alt) is AIC-selected to control
for heteroskedasticity.
Linear mixed-effects model fit by REML
Data: dat
AIC BIC logLik
3533.131 3591.853 -1752.565
Random effects:
Formula: ~1 | site
(Intercept)
StdDev: 5.603294
Formula: ~1 | y %in% site
(Intercept) Residual
StdDev: 4.142521 9.003203
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | m
Parameter estimates:
July Oct Jan Apr
1.0000000 0.7836473 0.7696658 1.0022467
Fixed effects: mT ~ m + alt + m:alt
Value Std.Error DF t-value p-value
(Intercept) 39.38305 3.590926 468 10.967380 0.0000
mApr 7.57426 1.548235 468 4.892190 0.0000
mJuly 8.44167 1.457696 468 5.791103 0.0000
mOct 4.19561 1.300168 468 3.226973 0.0013
altM -2.72082 5.093589 4 -0.534166 0.6215
mApr:altM -2.31793 2.224683 468 -1.041916 0.2980
mJuly:altM -3.14538 2.098787 468 -1.498666 0.1346
mOct:altM -3.09390 1.880431 468 -1.645314 0.1006
Correlation:
(Intr) mApr mJuly mOct altM mApr:M mJly:M
mApr -0.160
mJuly -0.191 0.394
mOct -0.214 0.442 0.527
altM -0.705 0.113 0.135 0.151
mApr:altM 0.111 -0.696 -0.274 -0.307 -0.171
mJuly:altM 0.133 -0.274 -0.695 -0.366 -0.201 0.414
mOct:altM 0.148 -0.305 -0.364 -0.691 -0.225 0.462 0.546
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.49626272 -0.70206429 -0.03138199 0.59483874 3.00467177
Number of Observations: 498
Number of Groups:
site y %in% site
6 24
------------------------------------------------------------------------------------------
mod_C = glmmadmb(C_count ~ m + alt + m:alt + (1|site/y),
family = "nbinom",
zeroInflation = TRUE, data= dat)
#### AIC(nbinom model) << AIC(poisson model), so use nbinom
AIC: 686.6
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.70537 0.47722 -1.48 0.1394
mApr 0.00304 0.63689 0.00 0.9962
mJuly -1.24819 0.63544 -1.96 0.0495 *
mOct -0.36723 0.60261 -0.61 0.5423
altM -1.63873 0.73215 -2.24 0.0252 *
mApr:altM 0.81522 0.94844 0.86 0.3900
mJuly:altM 3.21364 0.88936 3.61 0.0003 ***
MonthOct:AltM 1.21148 0.89989 1.35 0.1782
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Number of observations: total=499, Site=6, Site:Year=24
Random effect variance(s):
Group=Site
Variance StdDev
(Intercept) 0.03713 0.1927
Group=Site:Year
Variance StdDev
(Intercept) 0.2289 0.4785
Negative binomial dispersion parameter: 0.17887 (std. err.: 0.036541)
Zero-inflation: 1.0132e-06 (std. err.: 0.00013482 )
Log-likelihood: -331.319
Warning message:
In .local(x, sigma, ...) :
'sigma' and 'rdig' arguments are present for compatibility only: ignored
================================================
My Questions:
1) Is it reasonable to include (1|site/y)? Since site effect is mixed up
with altitude effect (my main focus), I am worried that adding 'site' as
random effect would disturb the estimation of altitude effect (which seems
strong based on plotted data), or even cause estimation problem? (for
example, in summary(mod_T), the DF of the 'altM' term is so small (4)
compared to models without site random effect (~480, attached below for
your reference)).
2) One step back, are lme() (for controlling heteroskedasticity) and
glmmadmb() (for accounting for zero-inflation) the right choices for this
analysis?
I would appreciate your comments and help.
Thank very much!
Chung-Huey Wu
####################################################
[WITHOUT site random effect]
mod_T = lme(mT ~ m + alt + m:alt, random = ~ 1|y,
weights = varIdent(form = ~1|alt),
contrasts = c(list(m=contr.sum,alt=contr.sum)),
na.action = "na.omit", data = dat))
Linear mixed-effects model fit by REML
Data: dat
AIC BIC logLik
3672.361 3726.888 -1823.18
Random effects:
Formula: ~1 | y
(Intercept) Residual
StdDev: 3.515026 9.344131
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | m
Parameter estimates:
July Oct Jan Apr
1.0000000 1.0896624 0.9214424 1.0804838
Fixed effects: mT ~ m + alt + m:alt
Value Std.Error DF t-value p-value
(Intercept) 39.65193 2.135613 487 18.567007 0.0000
mApr 7.57426 1.805682 487 4.194680 0.0000
mJuly 8.17279 1.638495 487 4.987985 0.0000
mOct 3.92672 1.706423 487 2.301144 0.0218
altM -3.48711 1.709890 487 -2.039375 0.0420
mApr:altM -1.32758 2.588246 487 -0.512928 0.6082
mJuly:altM -2.37910 2.312807 487 -1.028662 0.3041
mOct:altM -2.32762 2.409046 487 -0.966198 0.3344
Correlation:
(Intr) mApr mJuly mOct altM mApr:M mJly:M
mApr -0.356
mJuly -0.421 0.464
mOct -0.404 0.446 0.526
altM -0.376 0.445 0.490 0.471
mApr:altM 0.248 -0.698 -0.324 -0.311 -0.661
mJuly:altM 0.278 -0.329 -0.682 -0.348 -0.739 0.488
mOct:altM 0.267 -0.316 -0.348 -0.684 -0.710 0.469 0.525
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.2019475 -0.7909630 -0.0548838 0.7254892 3.0912138
Number of Observations: 498
Number of Groups: 4
####################################################
Random effect 'overlap' with fixed effect
2 messages · Chung-Huey Wu, Thierry Onkelinx
Dear Chung-Huey, You have 6 x 4 observations per site - year combination. That is sufficient to use 1|site/year as random effect. Site is an essential part of the design of your study. Therefore it should be in the model. The site effect will take up information which is not explained by the altitude. So if the altitude effect is smaller in comparison to the site effect, then the site-to-site variation is more important than the altitude effect. Note that lots of zero's does not imply zero inflation see https://rpubs.com/INBOstats/zeroinflation Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2017-01-18 5:59 GMT+01:00 Chung-Huey Wu <chung.huey.wu at gmail.com>:
Hello all,
This is my first post in r-sig-ME. I have benefited and learned a lot from
the threads here. In this email I would like to ask for comments on the
correct way to specify my random effect term (sampling site), which
'overlap' with the fixed effect of main interest (altitude).
I am now using nlme, lme4, and glmmADMB to analyze altitudinal survey data
of plant traits.
Below is the structure of the data I am now have trouble analyzing:
=======================================
a) Research design: In every Jan, Apr, Jul, Oct from Jul-2012 to Oct-2015,
we visited the 3 sites at low altitude and 3 sites at medium altitude,
haphazardly selected 6 individual plants of the focal species, and measured
traits and # herbivore for each individual.
b) Hypothesis to test: Does the trait/# herbivore vary across altitude and
month? Are there interactions?
c) Notations:
alt (altitude): L (low) vs. M (medium); a 2-level factor.
m (month): Jan, Apr, Jul, Oct; a 4-level factor.
site (sampling site): 3 at low- and 3 at medium-altitude Taiwan; a 6-level
factor with unique location names.
y (year of survey): 2012-2015; a 4-level factor.
--------
C_count (# chewing herbivore): discrete (count) response variable. Many
zeros, mostly 1-3, a few large values (10,17,22).
mT (physical leaf toughness): continuous response variable.
d) My model structure and outputs:
mod_T = lme(mT ~ m + alt + m:alt, random = ~ 1|site/y,
weights = varIdent(form = ~1|alt),
contrasts = c(list(m=contr.sum,alt=contr.sum)),
na.action = "na.omit", data = dat))
#### weights model form (1|alt) is AIC-selected to control
for heteroskedasticity.
Linear mixed-effects model fit by REML
Data: dat
AIC BIC logLik
3533.131 3591.853 -1752.565
Random effects:
Formula: ~1 | site
(Intercept)
StdDev: 5.603294
Formula: ~1 | y %in% site
(Intercept) Residual
StdDev: 4.142521 9.003203
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | m
Parameter estimates:
July Oct Jan Apr
1.0000000 0.7836473 0.7696658 1.0022467
Fixed effects: mT ~ m + alt + m:alt
Value Std.Error DF t-value p-value
(Intercept) 39.38305 3.590926 468 10.967380 0.0000
mApr 7.57426 1.548235 468 4.892190 0.0000
mJuly 8.44167 1.457696 468 5.791103 0.0000
mOct 4.19561 1.300168 468 3.226973 0.0013
altM -2.72082 5.093589 4 -0.534166 0.6215
mApr:altM -2.31793 2.224683 468 -1.041916 0.2980
mJuly:altM -3.14538 2.098787 468 -1.498666 0.1346
mOct:altM -3.09390 1.880431 468 -1.645314 0.1006
Correlation:
(Intr) mApr mJuly mOct altM mApr:M mJly:M
mApr -0.160
mJuly -0.191 0.394
mOct -0.214 0.442 0.527
altM -0.705 0.113 0.135 0.151
mApr:altM 0.111 -0.696 -0.274 -0.307 -0.171
mJuly:altM 0.133 -0.274 -0.695 -0.366 -0.201 0.414
mOct:altM 0.148 -0.305 -0.364 -0.691 -0.225 0.462 0.546
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.49626272 -0.70206429 -0.03138199 0.59483874 3.00467177
Number of Observations: 498
Number of Groups:
site y %in% site
6 24
------------------------------------------------------------
------------------------------
mod_C = glmmadmb(C_count ~ m + alt + m:alt + (1|site/y),
family = "nbinom",
zeroInflation = TRUE, data= dat)
#### AIC(nbinom model) << AIC(poisson model), so use nbinom
AIC: 686.6
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.70537 0.47722 -1.48 0.1394
mApr 0.00304 0.63689 0.00 0.9962
mJuly -1.24819 0.63544 -1.96 0.0495 *
mOct -0.36723 0.60261 -0.61 0.5423
altM -1.63873 0.73215 -2.24 0.0252 *
mApr:altM 0.81522 0.94844 0.86 0.3900
mJuly:altM 3.21364 0.88936 3.61 0.0003 ***
MonthOct:AltM 1.21148 0.89989 1.35 0.1782
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Number of observations: total=499, Site=6, Site:Year=24
Random effect variance(s):
Group=Site
Variance StdDev
(Intercept) 0.03713 0.1927
Group=Site:Year
Variance StdDev
(Intercept) 0.2289 0.4785
Negative binomial dispersion parameter: 0.17887 (std. err.: 0.036541)
Zero-inflation: 1.0132e-06 (std. err.: 0.00013482 )
Log-likelihood: -331.319
Warning message:
In .local(x, sigma, ...) :
'sigma' and 'rdig' arguments are present for compatibility only: ignored
================================================
My Questions:
1) Is it reasonable to include (1|site/y)? Since site effect is mixed up
with altitude effect (my main focus), I am worried that adding 'site' as
random effect would disturb the estimation of altitude effect (which seems
strong based on plotted data), or even cause estimation problem? (for
example, in summary(mod_T), the DF of the 'altM' term is so small (4)
compared to models without site random effect (~480, attached below for
your reference)).
2) One step back, are lme() (for controlling heteroskedasticity) and
glmmadmb() (for accounting for zero-inflation) the right choices for this
analysis?
I would appreciate your comments and help.
Thank very much!
Chung-Huey Wu
####################################################
[WITHOUT site random effect]
mod_T = lme(mT ~ m + alt + m:alt, random = ~ 1|y,
weights = varIdent(form = ~1|alt),
contrasts = c(list(m=contr.sum,alt=contr.sum)),
na.action = "na.omit", data = dat))
Linear mixed-effects model fit by REML
Data: dat
AIC BIC logLik
3672.361 3726.888 -1823.18
Random effects:
Formula: ~1 | y
(Intercept) Residual
StdDev: 3.515026 9.344131
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | m
Parameter estimates:
July Oct Jan Apr
1.0000000 1.0896624 0.9214424 1.0804838
Fixed effects: mT ~ m + alt + m:alt
Value Std.Error DF t-value p-value
(Intercept) 39.65193 2.135613 487 18.567007 0.0000
mApr 7.57426 1.805682 487 4.194680 0.0000
mJuly 8.17279 1.638495 487 4.987985 0.0000
mOct 3.92672 1.706423 487 2.301144 0.0218
altM -3.48711 1.709890 487 -2.039375 0.0420
mApr:altM -1.32758 2.588246 487 -0.512928 0.6082
mJuly:altM -2.37910 2.312807 487 -1.028662 0.3041
mOct:altM -2.32762 2.409046 487 -0.966198 0.3344
Correlation:
(Intr) mApr mJuly mOct altM mApr:M mJly:M
mApr -0.356
mJuly -0.421 0.464
mOct -0.404 0.446 0.526
altM -0.376 0.445 0.490 0.471
mApr:altM 0.248 -0.698 -0.324 -0.311 -0.661
mJuly:altM 0.278 -0.329 -0.682 -0.348 -0.739 0.488
mOct:altM 0.267 -0.316 -0.348 -0.684 -0.710 0.469 0.525
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.2019475 -0.7909630 -0.0548838 0.7254892 3.0912138
Number of Observations: 498
Number of Groups: 4
####################################################
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models