lmer - error message

Hi Sean,

Thanks for the data.  Your first problem is that Age is a factor with
14 levels.  Second, Year is a factor with 16 levels.  Now consider
what interactions of these will be---factors with a huge number of
levels, exceeding what is reasonable to fit to your size of data.  I
included inline code below.  First I convert these variables to
numeric.  I also do some plots.

While I get a model to run, the estimated variance of the random
intercept is 0, which is a bit troublesome.  I looked at the data, but
I do not have a particularly good reason, other than presumably there
is very little variability by fish after with age and lake in the
model.  Anyway, I go on to try a log transformation of Age because of
its relationship with Increment.  That seems to work pretty well, but
the residuals are clearly heteroscedastic, so I also show how you can
use the sandwich package to (at least partially) address this.

Cheers,

Josh

require(lme4)

## shows that there are 224 levels in the Age by Year interaction!!
with(odata1, interaction(Age, Year))
## create numeric variables instead
odata1 <- within(odata1, {
  numAge <- as.numeric(levels(Age))[Age]
  numYear <- as.numeric(levels(Year))[Year]
})

## look at a density plot of Increment
plot(density(odata1$Increment))
rug(odata1$Increment)

## look at Increment versus Age (numeric)
## looks like a straight line is likely a poor fit
xyplot(Increment ~ numAge, data = odata1)

## compare by lake (looks pretty similar)
xyplot(Increment ~ numAge | lake, data = odata1)
## also note that it looks like there is more variability when age is
log than hight
## we'll keep that in mind for later

## simple model
m1 <- lmer(Increment ~  0 + numAge*lake + (1 | FishID),
  data = odata1)

## check residuals against normal
qqnorm(resid(m1))
## residuals against age
## this looks pretty bad
plot(odata1$numAge, resid(m1))
## what about the random effects?
plot(ranef(m1))

## checking the model summary
## we see that the variance for the random intercepts is 0
## this is also rather concerning
summary(m1)

## lets look at some descriptives
descriptives <- with(odata1, do.call("rbind", tapply(Increment,
FishID, function(x) {
  c(M = mean(x, na.rm = TRUE), V = var(x, na.rm = TRUE), N = length(x))
})))
descriptives <- descriptives[order(descriptives[, "M"]), ]
descriptives # print

## looks like there are quite a few fish with only one observations
## also, those with only one tend to have a higher mean increment

## given the shape of the relationship, we might try a log transformation on Age
## also given that the random intercepts have 0 variance, we can drop it
malt <- lm(Increment ~ 1 + log(numAge)*lake, data = odata1)
qqnorm(resid(malt)) ## looks decent enough

## not too bad, but clearly not homoscedastic residuals
plot(odata1$numAge, resid(malt))

## the residuals appear heteroscedastic, so we could
## use a sandwhich estimator of the covariance matrix
## the general form is: inv(X'X) X' W X inv(X'X)
## the correction comes in terms of the weight matrix, W which
## can be defined in various ways

## load two required packages
require(lmtest)
require(sandwich)

## the default
coef(summary(malt))
## using sandwhich estimator
## (okay, the results come out pretty similar, not parameter estimates
will be unchaged
## only SEs change)
coeftest(malt, vcov. = vcovHC(malt, type = "HC", sandwich = TRUE))