Mixed model correlation structure for unbalanced, longitudinal data
------------------------------

Message: 2
Date: Tue, 03 Jul 2012 17:47:51 -0400
From: Andy Flies <andyflies at gmail.com>
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Mixed model correlation structure for unbalanced longitudinal data
Message-ID: <4FF36887.5050702 at gmail.com>
Content-Type: text/plain; charset=windows-1252; format=flowed

Dear R users,

I have data from a long-term study that has opportunistically collected samples over the past 10 years. My data set is highly unbalanced because of the opportunistic sample collection. I have a single sample from 19 individuals, 2 samples from 4 individuals, and 3 samples from 2 individuals. I know that lmer can accommodate unbalanced data sets, but I am unsure whether my data set is too unbalanced.

I am testing whether social rank, reproductive status, and age affect my response variables. I also need to determine whether sample collection parameters, such as sample date and the time from anesthetizing the animal to collecting the sample, affect the response variables.

Here are what I see as potential options:

1) Use a mixed model with subject as random intercept and sample date as random slope to account for potential temporal autocorrelation within the repeat samples.

   lmer(y ~ 1 + x1 + x2 + x3 + (1 + date | subject))

2) Use a mixed model with subject as random intercept. Initial data exploration does not show any obvious temporal autocorrelation.

   lmer(y ~ 1 + x1 + x2 + x3 + (1 | subject))

3) Use a GEE and specify an autoregressive correlation structure. I think this would be a good option, but from what I have found in the literature, my sample size is too small for this.

4) Use the mean for each individual and use a standard linear model. This option is not good because it does not allow me to include reproductive status as a predictor, since reproductive status changes between samples.

5) Use only a single sample from each individual in a standard linear model.
This option is not good because my already limited sample size would be further reduced.

Please let me know which of the above options would be best, or if you can suggest a better option. Any advice or literature references are sincerely appreciated.

Thanks,
Andy

Andy,

Do I understand correctly that you have 33 observations in total? If so, then I don't want to be the bogeyman, but seriously consider simplifying all these models. Option 4 with only 1 or 2 covariates would be my choice. Ask yourself whether it makes sense to analyze these data at all; perhaps making only some simple graphs would be enough.

Alain
Dr. Alain F. Zuur
First author of:

1. Analysing Ecological Data (2007). Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p.
   URL: www.springer.com/0-387-45967-7
2. Mixed effects models and extensions in ecology with R (2009). Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer.
   http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9
3. A Beginner's Guide to R (2009). Zuur, AF, Ieno, EN, Meesters, EHWG. Springer.
   http://www.springer.com/statistics/computational/book/978-0-387-93836-3
4. Zero Inflated Models and Generalized Linear Mixed Models with R (2012). Zuur, Saveliev, Ieno.
   http://www.highstat.com/book4.htm

Other books: http://www.highstat.com/books.htm

Statistical consultancy, courses, data analysis and software
Highland Statistics Ltd.
6 Laverock road
UK - AB41 6FN Newburgh
Tel: 0044 1358 788177
Email: highstat at highstat.com
URL: www.highstat.com
URL: www.brodgar.com
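
The arithmetic behind the 33 observations, and the shape of option 4, can be sketched in base R. This is only an illustration under stated assumptions: the variable names are invented here, and the covariate `x1` and response `y` are simulated placeholders, not values from the study. The counts suggest why option 1 is a stretch: a random intercept and slope per subject implies 50 subject-level random effects for 33 observations, and lme4's default checks (the `check.nobs.vs.nRE` setting in `lmerControl`) are expected to reject a model like that.

```r
# Sampling design as described in the question:
# 19 individuals with 1 sample, 4 with 2 samples, 2 with 3 samples.
samples_per_individual <- c(rep(1, 19), rep(2, 4), rep(3, 2))

n_subjects <- length(samples_per_individual)   # number of individuals
n_obs      <- sum(samples_per_individual)      # total number of samples
n_repeated <- sum(samples_per_individual > 1)  # individuals with repeats

# Subject-level random effects implied by each mixed-model option:
#   option 1, (1 + date | subject): intercept and slope per subject
#   option 2, (1 | subject):        intercept per subject
n_re_option1 <- 2 * n_subjects
n_re_option2 <- 1 * n_subjects

cat("subjects:", n_subjects, " observations:", n_obs,
    " subjects with repeats:", n_repeated, "\n")
cat("random effects, option 1:", n_re_option1,
    " option 2:", n_re_option2, "\n")

# Option 4 on SIMULATED data (hypothetical values, for illustration only).
set.seed(1)
dat <- data.frame(
  subject = factor(rep(seq_len(n_subjects), times = samples_per_individual)),
  x1      = rnorm(n_obs),  # hypothetical covariate, e.g. age
  y       = rnorm(n_obs)   # hypothetical response
)

# Collapse to one row per individual: mean response and mean covariate.
per_subject <- aggregate(cbind(y, x1) ~ subject, data = dat, FUN = mean)

# Standard linear model on the per-individual means, with one covariate.
fit <- lm(y ~ x1, data = per_subject)
```

Only 6 of the 25 individuals contribute any within-subject information, so averaging discards little here. It does leave individuals with more samples carrying the same weight as those with one, which is a simplification rather than a recommendation.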