I am trying to help a friend towards an appropriate lmer() analysis of
her data and help myself understand how to use lmer syntax in the right way.
She says "...here is what I think is a very standard situation in
ecology....
A normally distributed measure (say bird density) is collected for:
5 years of data (fixed factor),
2 seasons within each year (fixed factor),
3 months within each season (fixed factor)..."
The response is measured at the same six sites at every visit. What
follows is my interpretation of what she might want, which may not be
what she really wants, but this may not matter for the purpose of
creating an example.
I will simulate some data from a known model, then use lmer() to fit the
model and compare the estimated parameters with those used in the
simulation.
I would be interested in comments about
a) whether I have simulated the data from my model correctly
and
b) whether my lmer syntax is appropriate for my model.
The systematic part of my model is:
\mu + \beta_{is} + \gamma_{sm}
where i varies over years, s over seasons, and m over months.
(I assume that months are nested within seasons but crossed with years.)
Now let me decide how these coefficients should vary with site. I am
going to assume that the constant term and the year-season term vary
with site but that month-within-season does not.
The random part of the model needs to be generated from a 1 + 5*2 =
11-dimensional multivariate normal. I will assume 1 correlation
parameter, that between two seasons in the same year. All other
correlations are zero.
Values for fixed effects:
Suppose \mu = 10
the beta_{i1} are 7, 8, 9, 6, 7.5
the beta_{i2} are 1, 3, 4, 2, 2.5
the gamma_{1m} are 0, 2, 1
the gamma_{2m} are 0, -2, -2
The s.d. parameters are sigma_mu = 3, sigma_beta = 2, sigma = 0.1
The correlation parameter is 0.4.
Instead of 6 sites I will assume nsite = 60.
I have written the following to generate data and fit the model:
library(mvtnorm)
library(lme4)
V = diag(11)
for ( i in 2:10) {
V[i,i+1] = 0.4
V[i+1,i] = 0.4
}
sig = diag(c(3,rep(2,10)))
V = sig%*%V%*%sig
nsite = 60 #Why not?
sigma = 0.1
theta = c(10,7, 1, 8, 3, 9, 4, 6,2, 7.5,2.5)
P = rmvnorm(nsite,mean=theta,sigma=V)
a = rep(1:10,rep(3,10))
TH = cbind(rep(1,30),model.matrix(~factor(a)-1))
gg = rep(c(0,2,1,0,-2,-2),5)
y = as.vector(TH%*%t(P)) + rep(gg,nsite) + rnorm(30*nsite,0,sigma)
year = gl(5,2*3,30*nsite)
season = gl(2,3,30*nsite)
month = gl(3,1,30*nsite)
site = gl(nsite,30,30*nsite)
mod = lmer(y ~ year/season + season/month + (1 + year/season | site) )
summary(mod)
The output I get is
library(mvtnorm)
library(lme4)
Loading required package: Matrix
Loading required package: lattice
Attaching package: 'Matrix'
The following object(s) are masked from package:stats :
xtabs
The following object(s) are masked from package:base :
colMeans,
colSums,
rcond,
rowMeans,
rowSums
V = diag(11)
for ( i in 2:10) {
+ V[i,i+1] = 0.4
+ V[i+1,i] = 0.4
+ }
sig = diag(c(3,rep(2,10)))
V = sig%*%V%*%sig
nsite = 60
sigma = 0.1
theta = c(10,7, 1, 8, 3, 9, 4, 6,2, 7.5,2.5)
P = rmvnorm(nsite,mean=theta,sigma=V)
a = rep(1:10,rep(3,10))
TH = cbind(rep(1,30),model.matrix(~factor(a)-1))
gg = rep(c(0,2,1,0,-2,-2),5)
y = as.vector(TH%*%t(P)) + rep(gg,nsite) + rnorm(30*nsite,0,sigma)
year = gl(5,2*3,30*nsite)
season = gl(2,3,30*nsite)
month = gl(3,1,30*nsite)
site = gl(nsite,30,30*nsite)
mod = lmer(y ~ year/season + season/month + (1 + year/season | site) )
summary(mod)
The variance estimates for the random effects look large, but maybe
because of the way I have parameterized the model I am estimating some
combination of sigma_mu and sigma_beta?
Cheers,
Murray Jorgensen
Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz Fax 7 838 4155
Phone +64 7 838 4773 wk Home +64 7 825 0441 Mobile 021 0200 8350