An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120602/9960983a/attachment.pl>
multiple variance structure in lmer giving zero variances
4 messages · Sara Krause, Ben Bolker
Sara Krause <skkrause at ...> writes:
I'm hoping someone can help me out. Forgive me if I my mistake is something simple. I am new to mixed models, new to R, and new to lme4 and am struggling to figure everything out. I have two questions that I am hoping someone can answer. 1) Am I using the correct random structure for my model? 2) Can someone help me figure out what is wrong with my syntax to code for random effect variance by treatment group?
(I can understand your desperation, but please don't cross-post ... I almost answered this on r-help with a suggestion that you try on r-sig-mixed instead.)
These questions somewhat go together but let me tackle number one first. I was told by our statistician (who unfortunately doesn't know R well) that my model should include random effects for ID, ID*state, and ID*season. If my understanding of lme4 code is correct, my random structure would appear as it does in this model M1<-lmer (y~trt*season*state*site+(1|ID)+(1|state)+(1|season))
[snipped from below]:
There are two levels in each of these factors. Treatment is G or S (basically treated or control groups) and state is either Pre-treatment or Post-treatment. My other fixed factors are a site factor (two locations) and a season factor (breeding or non-breeding season). Is this correct? (I am starting with the fully crossed fixed effects and I will use iterative model selection to find the optimal model after I make sure that I have the correct random terms)
In general, categorical predictors should not appear in both the fixed and random parts of the model -- that constitutes overfitting. Depending on the structure of your data you should also strongly consider including interactions between treatment and your random factors (see Schielzeth, Holger, and Wolfgang Forstmeier. 2009. ?Conclusions Beyond Support: Overconfident Estimates in Mixed Models.? Behavioral Ecology 20 (2):416?420. doi:10.1093/beheco/arn145. http://www.ncbi.nlm.nih.gov/pubmed/19461866.). However, you can only include treatment interactions that make sense. It sounds like each individual is allocated to a single treatment, so (guessing that your statistician means that you want the *interactions* of ID with state and season) I think you should probably use y~trt*season*state*site+(state|ID)+(0+season|ID) which includes a random effect of state, a state:ID interaction (variation of the state effect among individuals), and a season:ID interaction (variation of seasonal effect among individuals). Why are you moving to lmer? is it so that you can handle the separate effects of state:ID and season:ID ?
The second question is a bit more complicated and changes my random structure as well. I had originally built my model in nlme and used the multiple variance (varIdent) function to allow different variance for two of my terms (trt and state) in my nlme model because different levels in each term had different variances. I found a very helpful post ( https://stat.ethz.ch/pipermail/r-sig-mixed-models/2007q3/000248.html) on how to also do this in lmer but it doesn't seem to be working correctly for me and I have not been able to figure out why.
I'm mostly going to postpone comment on your second question until your first question is sorted out, although I would also encourage you to see if your heteroscedasticity problem might be handled by transformation -- most typically, if treatment increases both the mean and variance a lot, then log transforming the data might take care of the problem with less hassle ...
I am also trying to do this with 2 factors instead of one. The overall structure of my data is Structure of my dataset (d) ..@ frame :'data.frame': 173 obs. of 10 variables: .. ..$ y : int [1:173] 209 382 448 353 224 112 198 273 495 622 ... .. ..$ trt : Factor w/ 2 levels "G","S": 1 1 1 1 1 1 1 1 1 1 ... .. ..$ season: Factor w/ 2 levels "Breeding","NonBreeding": 2 2 2 2 2 2 2 1 1 1 ... .. ..$ state : Factor w/ 2 levels "Post","Pre": 1 1 1 1 1 1 1 1 1 1 ... .. ..$ site : Factor w/ 2 levels "Mrak","Orchard": 1 1 1 1 1 2 2 1 1 1 ... .. ..$ G : num [1:173] 1 1 1 1 1 1 1 1 1 1 ... .. ..$ ID : Factor w/ 59 levels "100","103","105",..: 11 19 22 58 6 36 50 11 20 24 ... .. ..$ S : num [1:173] 0 0 0 0 0 0 0 0 0 0 ... .. ..$ Pre : num [1:173] 0 0 0 0 0 0 0 0 0 0 ... .. ..$ Post : num [1:173] 1 1 1 1 1 1 1 1 1 1 ...
m1<-lmer(y~trt*season*state*site+(0+G|ID)+(0+S|ID) +(0+Pre|ID) + (0+Post|ID)+(1|season), data=d)
Your basic problem here is that it doesn't make sense to test the interaction of treatment with ID (as you have effectively done here), because each individual is only in one treatment category ...
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120603/8b5df5ac/attachment.pl>
Sara Krause <skkrause at ...> writes:
I apologize for the cross-post. posted to the regular list serve first and received a suggestion from someone else that I post here instead. ... I should have been clear originally that I had posted on the regular listserve first.
Not a big deal, now you know.
Your response was very useful in helping me move along in this analysis. I have addressed several of your comments in more detail below but to summarize my email, these are my current questions: 1. Is it possible for me to still use the nlme package? If so, do I use the pdBlocked structure to incorporate my random effects? And if I do use this structure, could someone point me towards a resource to figure out how to create my groupedData? I'm sure my lack of understanding here is a statisitical, rather than R code issue.
I don't think groupedData is absolutely necessary. I'll leave the pdBlocked question for someone else ...
2. It is unclear to me how to deal with the problem of having my categorical variables occur in the fixed and random effects. Any insight on this would be helpful. 3. Based on the paper you sent and a look at my data, it does seem like I need to incorporate the trt interactions but this exacerbates the problem of an overfitted model.
Yes, but in this case you just can't -- because each individual only gets a single treatment. So the interaction might be there, but you can't estimate it from your data (which gets you off the hook).
4. I tried the model that you kindly provided and received an error. See below for details.
I was switching to lme4 because I didn't think that I could use the correct random terms in nlme. I would actually prefer to stick with nlme if it is possible to do so. I have a few simpler models that I have already run in nlme so it would be nice to not have to re-run all of them. I understand the p-values provided are a bit controversial but journals like them anyway so they are nice to have.
My best guess for how to do this based on what I could figure out and your
suggested lmer model is the following
m1<-lme(y=trt*state*season*site, data=group,
random=pdBlocked(list(pdIdent(~state|ID), pdIdent(~season|ID-1))),
method=("REML"))
However, I have not been able to find any documentation on how to correctly create the groupedData object. To make my model run, I used
group=groupedData(y~1|state, data=d, order.groups=FALSE)
If there is a way to run this model correctly in nlme, I would greatly appreciate your insight on this.
Anyone else want to chime in on this ... ?
You also raised a couple of other issues in your email that I am not sure how to address. You stated
In general, categorical predictors should not appear in both the fixed and random parts of the model -- that constitutes overfitting. My effect of interest in this model is actually the interaction between
treatment and state because I expect that the treated individuals but not the control will decline in (scrotal) size after (contraceptive) treatment. Thus, assuming I am not missing something, state must be in my fixed effects. I expect that there will be seasonal differences in size due to breeding season. This seasonal difference is interesting but ancillary to the main question so perhaps I could take season out of the fixed effects.
I think you're OK: (state|ID) and (season|ID) are putting *interactions* in the model. What would definitely break would be (1|state) or (1|season) (i.e. using fixed effects as *grouping variables* for the random effects). I think the model is fine as it is.
The other issue you raised was this one:
Depending on the structure of your data you should also strongly consider including interactions between treatment and your random factors (see Schielzeth, Holger, and Wolfgang Forstmeier. 2009. ?Conclusions Beyond Support: Overconfident Estimates in Mixed Models.? Behavioral Ecology 20 (2):416?420. doi:10.1093/beheco/arn145. http://www.ncbi.nlm.nih.gov/pubmed/19461866.). However, you can only include treatment interactions that make sense.
I asked about this and he said that the interactions with ID were most important but this paper does make a good argument for it.
Do you mean "not" above (and "he" is your statistician)? I can imagine him saying "not" because they are unidentifiable based on your design -- so it's not that they're not important, just that you can't do anything about it. [snip] I claim you're already including these interactions (as interactions of *fixed* effects). [snip]
It sounds like each
individual is allocated to a single treatment, so (guessing that your statistician means that you want the *interactions* of ID with state and season) I think you should probably use
y~trt*season*state*site+(state|ID)+(0+season|ID)
Yes, the interactions of ID with state and season is exactly what I am looking for. Also, your suggestion of a log transformation did take care of the problem of the unequal variances so thankfully, I should not have to deal with that in a more complicated way. Unfortunately, the model as above but on log transformed data resulted in the following error In mer_finalize(ans) : singular convergence (7)
It is unclear to me why this is occurring.
Probably still somewhat overfitted. Try simplifying the model a bit and see if you can get somewhere (e.g. take out the four-way interaction and use (trt+season+state+site)^3 instead?