Hi Sean,
Thanks for the data. Your first problem is that Age is a factor with
14 levels. Second, Year is a factor with 16 levels. Now consider
what interactions of these will be---factors with a huge number of
levels, exceeding what is reasonable to fit to your size of data. I
included inline code below. First I convert these variables to
numeric. I also do some plots.
While I get a model to run, the estimated variance of the random
intercept is 0, which is a bit troublesome. I looked at the data, but
I do not have a particularly good reason, other than presumably there
is very little variability by fish after with age and lake in the
model. Anyway, I go on to try a log transformation of Age because of
its relationship with Increment. That seems to work pretty well, but
the residuals are clearly heteroscedastic, so I also show how you can
use the sandwich package to (at least partially) address this.
Cheers,
Josh
require(lme4)
## shows that there are 224 levels in the Age by Year interaction!!
with(odata1, interaction(Age, Year))
## create numeric variables instead
odata1 <- within(odata1, {
numAge <- as.numeric(levels(Age))[Age]
numYear <- as.numeric(levels(Year))[Year]
})
## look at a density plot of Increment
plot(density(odata1$Increment))
rug(odata1$Increment)
## look at Increment versus Age (numeric)
## looks like a straight line is likely a poor fit
xyplot(Increment ~ numAge, data = odata1)
## compare by lake (looks pretty similar)
xyplot(Increment ~ numAge | lake, data = odata1)
## also note that it looks like there is more variability when age is
log than hight
## we'll keep that in mind for later
## simple model
m1 <- lmer(Increment ~ 0 + numAge*lake + (1 | FishID),
data = odata1)
## check residuals against normal
qqnorm(resid(m1))
## residuals against age
## this looks pretty bad
plot(odata1$numAge, resid(m1))
## what about the random effects?
plot(ranef(m1))
## checking the model summary
## we see that the variance for the random intercepts is 0
## this is also rather concerning
summary(m1)
## lets look at some descriptives
descriptives <- with(odata1, do.call("rbind", tapply(Increment,
FishID, function(x) {
c(M = mean(x, na.rm = TRUE), V = var(x, na.rm = TRUE), N = length(x))
})))
descriptives <- descriptives[order(descriptives[, "M"]), ]
descriptives # print
## looks like there are quite a few fish with only one observations
## also, those with only one tend to have a higher mean increment
## given the shape of the relationship, we might try a log transformation on Age
## also given that the random intercepts have 0 variance, we can drop it
malt <- lm(Increment ~ 1 + log(numAge)*lake, data = odata1)
qqnorm(resid(malt)) ## looks decent enough
## not too bad, but clearly not homoscedastic residuals
plot(odata1$numAge, resid(malt))
## the residuals appear heteroscedastic, so we could
## use a sandwhich estimator of the covariance matrix
## the general form is: inv(X'X) X' W X inv(X'X)
## the correction comes in terms of the weight matrix, W which
## can be defined in various ways
## load two required packages
require(lmtest)
require(sandwich)
## the default
coef(summary(malt))
## using sandwhich estimator
## (okay, the results come out pretty similar, not parameter estimates
will be unchaged
## only SEs change)
coeftest(malt, vcov. = vcovHC(malt, type = "HC", sandwich = TRUE))
On Thu, Feb 16, 2012 at 11:44 PM, Sean Godwin <sean.godwin at gmail.com> wrote:
Hi Josh, Thank you so much for your response. The point of the process was actually to find out whether there are different age effects for each lake, so an interaction term between age and lake is a necessary one. ?Taking out the other random effects to make the model simpler would work perfectly to make this easier to tackle though, and I can add the other terms on later as you say! So:? m1 <- lmer(Increment ~ 0 + Age + Age*lake + (1|FishID),lakedata) But apparently I don't understand the Age*lake syntax... all I'm trying to do with this is to have an interaction term between age and lake, but since neither are random effects, I can't just write (1|Age:lake) as I could with (1|Age:Year), right? ?It is the Age*lake term that is causing the error... when I take it out, the model runs fine. There are 117 fish observations, so 117 unique values of fishID. ?The number of observations per fish changes depending on how old the fish is ( a2 year old fish = 2 observations), with a maximum fish age of 13 years (so max 13 observations per fish and 13 years total in the dataset). I've attached my data (Josh_fulldata.csv) and the script used to rearrange it into a usable format (Josh_model.R). ?For some reason, I wasn't able to export the resulting dataframe without causing errors when re-inputting it, so hopefully this works for you. ?Hopefully the inclusion of the full dataset answers your other questions. ?Interestingly, when I used a 50 fish subset, the error message didn't occur (I've attached that just in case: Josh_data.csv). I can't thank you enough for your help. ?Already you've pointed me in a better direction. ?All I'm trying to do is to include a non-random interaction in here (age:lake), and it sure has me stumped! -Sean On 16 February 2012 23:06, Joshua Wiley <jwiley.psych at gmail.com> wrote:
Hi Sean, Can you fit a simpler model? ?Particularly I would start with just one random effect (say, (1|fishid)). ?I would not expect the age*lake interaction to be a problem per se. Also you may not be fitting the model you think you are. ?Writing age*lake in a formula expands to: Age + lake + age:lake That is the two main effects of age and lake plus their interaction. ?That makes your specification of the main effect of age redundant and means you will have a main effect of lake, which since you suppressed the intercept by adding the 0, will be the conditional expected values of the two lakes when age is 0 for the same random effects. In terms of getting around the error, some other information about your data would be helpful. ?How many observations do you have? ?How many unique values of fishid are there? ?How many observations per fish? ?How many years? ?Is it a numeric variable or factor? ?What about age? ?For factor variables what do the joint distributions look like? ?Do you really mean to have a random constant by year AND age:year? ?Within a fish, do age and fish have a perfect correlation? In terms of practical advice, I would start with simple models and make them more complex. ?Try to find the term that when you add it leads to the error. ?Start with an unconditional model only if you have to. ?As you estimate additional parameters in your model, what happens to the others? ?Are they stable? ?Do they change a lot? What about the standard errors? ?Does adding one term the model still fit but the SEs become huge? If you need more help, posting a sample of your data that reproduces the error so we can run it on our machines would help (a plaintext file or the output of dput() would work well...we do not need all of it, just enough that a model could work (say 50-100 observations) and that reproduces the same error. Cheers, Josh On Feb 16, 2012, at 21:39, Sean Godwin <sean.godwin at gmail.com> wrote:
Hi all, I am fairly new to mixed effects models and lmer, so bear with me. Here is a subset of my data, which includes a binary variable (lake (TOM or JAN)), one other fixed factor (Age) and a random factor (Year). ?lake FishID Age Increment Year 1 ?TOM ? ? ?1 ? 1 ? ? 0.304 ? ? 2007 2 ?TOM ? ? ?1 ? 2 ? ? 0.148 ? ? 2008 3 ?TOM ? ? ?1 ? 3 ? ? 0.119 ? ? 2009 4 ?TOM ? ? ?1 ? 4 ? ? 0.053 ? ? 2010 5 ?JAN ? ? ? 2 ? 1 ? ? 0.352 ? ? 2009 6 ?JAN ? ? ? 2 ? 2 ? ? 0.118 ? ? 2010 The model I'm trying to fit is: m1 <- lmer(Increment ~ 0 + Age + Age*lake + (1|Year) + (1|Year:Age) + (1|FishID),lakedata) The error message I get is: *"Error in mer_finalize(ans) : Downdated X'X is not positive definite, 27."* * *
From reading up on the subject, I think my problem is that I can't
incorporate the 'lake' variable in a fixed-effect interaction because it is only has one binary observation. ?But I don't know what to do to be able to fit this model. ?Any help would be greatly appreciated! -Sean ? ?[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/