MIXED MODEL WITH REPEATED MEASURES
So let's take this screening study as an example. Would it contain cross-sectional information on individuals of all ages/cohorts, gathered across 20 years (a)? Or would it consist of individual trajectories/histories spanning 20 years (b)? In the latter case (b), the dependent variable (i.e. having cancer) would be time-dependent and could be modeled as such. But I assume you mean the first set-up (a), where you just look at cross-sections of all sorts of individuals, and I presume that you do not have longitudinal information on the study subjects in such a set-up. So both possible designs seem very different from yours. However, to stay with the example, what you propose would be comparable to a design in which you observe single individuals for, say, 20 years; some of your measures vary over time and some don't, and you want to predict what does not change with something that does change. This just does not make much sense. Suppose you know whether your subjects ever had cancer or not, so throughout the entire 20-year period each of them is either a yes or a no. Now you want to predict an individual's chances of getting cancer, and one predictor is the number of cigarettes the person smokes per year, measured at each of the 20 yearly measurement points. Now consider an individual who did not smoke at the beginning of that period, smoked in the middle, and did not smoke at the end of the observation window. How would you relate this information to the person having cancer or not, when the individual essentially has cancer all the time, or does not have cancer all the time, i.e. throughout the entire observation period? In this case, the longitudinal information about smoking history does not contribute anything that would help say something about cancer risk.
If you want to predict cancer risk in such a set-up, you would need to reduce the longitudinal smoking information to cross-sectional information, for example by building an indicator of whether one ever smoked or not, or something like that. Then you would be back to set-up (a) and would be looking at cross-sectional correlations. This is of course not very desirable, as somebody could get cancer at 30 but only start smoking at 40, but these are the natural problems with cross-sectional data. In any case, if your dependent variable is cross-sectional in nature, there is not much you can do about it other than step back to a more correlational point of view.
Joerg
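A minimal R sketch of the reduction Joerg describes, using a small hypothetical long-format data set (the names long, subject, year, cigs and cancer are made up for illustration): each subject's yearly smoking record is collapsed into a single ever-smoked indicator, leaving one row per subject next to the time-constant outcome.

```r
# Hypothetical long-format data: one row per subject-year.
# 'cigs' (cigarettes per year) varies over time; 'cancer' is constant per subject.
long <- data.frame(
  subject = rep(1:4, each = 3),
  year    = rep(1:3, times = 4),
  cigs    = c(0, 200, 0,  0, 0, 0,  100, 150, 300,  0, 0, 50),
  cancer  = rep(c(1, 0, 1, 0), each = 3)
)

# Collapse the smoking history to one cross-sectional summary per subject
ever_smoked <- tapply(long$cigs, long$subject, function(x) as.integer(any(x > 0)))
# The outcome is constant within subject, so any row will do; take the first
cancer <- tapply(long$cancer, long$subject, function(x) x[1])

wide <- data.frame(subject     = as.integer(names(ever_smoked)),
                   ever_smoked = as.vector(ever_smoked),
                   cancer      = as.vector(cancer))
print(wide)
```

With a real sample, wide would then feed an ordinary cross-sectional model such as glm(cancer ~ ever_smoked, family = binomial, data = wide). The timing information (cancer at 30, smoking from 40) is exactly what this summary discards.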
On Sat, Dec 10, 2011 at 1:49 PM, Erin Ryan <erin at the-ryans.com> wrote:
Good insights, Joerg, thanks. Unfortunately, I wish to predict the value of the dependent variable for future subjects well before the last measurement (what I envision is an answer whose confidence interval steadily narrows over time). An apt analogy would be a cancer-screening study involving 500 patients over 20 years. In such a study, there would be a multitude of independent variables characterizing each subject, and the dependent variable would simply be a nominal-level measure of whether or not a given subject had contracted cancer at some point in the 20 years. The purpose of the study would be to identify future subjects who are at higher risk of cancer, but the conclusions would be based on empirical data in which the dependent variable (yes or no for having cancer) is the same across the entire time series. So what is the correct statistical approach for a dataset like this, in which the data are not i.i.d. but the dependent variable is constant for each subject?
Erin

-----Original Message-----
From: Joerg Luedicke [mailto:joerg.luedicke at gmail.com]
Sent: Saturday, December 10, 2011 10:36 AM
To: Erin Ryan
Subject: Re: [R-sig-ME] MIXED MODEL WITH REPEATED MEASURES

On Fri, Dec 9, 2011 at 9:33 PM, Erin Ryan <erin at the-ryans.com> wrote:
Good suggestions; however, there is inherent value in the temporal progression of the repeated measures, so I need to capture that in some way.

If your dependent variable is constant within units for which you observe "temporal progression", then this "progression" does not matter whatsoever. Imagine fitting a conventional regression in which your dependent variable is a constant: it would not matter at all how different the subjects are in any other regard.
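To see this numerically, here is a small R sketch on simulated, hypothetical data (5 subjects with 4 yearly measurements each; the names d, x and y are made up). Centring the outcome on each subject's own mean shows that an outcome which is constant within subject has no within-subject variation at all, so the year-to-year movement of a predictor cannot explain anything; only between-subject differences remain. This is consistent with Erin's own observation further down the thread that her singularity error disappears when the dependent variable varies within subject, though the sketch is an illustration, not a diagnosis of her data.

```r
set.seed(1)
# Hypothetical data: 5 subjects x 4 yearly measurements.
# x varies within subject; y is constant within each subject.
d <- data.frame(subject = rep(1:5, each = 4), x = rnorm(20))
d$y <- rep(c(2, 5, 3, 7, 1), each = 4)

# Within-subject (subject-mean-centred) versions of outcome and predictor
d$y_within <- d$y - ave(d$y, d$subject)
d$x_within <- d$x - ave(d$x, d$subject)

# The within-subject outcome is identically zero, so the within-subject
# slope is zero however x moves over time.
fit_within <- lm(y_within ~ x_within, data = d)
print(range(d$y_within))
print(coef(fit_within))
```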
For similar reasons, averaging the values of the independent variables is problematic, as they progress over time to a final, actual value, which presumably should be weighted more heavily. In other words, truth is known on the final repeated measure, but I wish to make accurate predictions much earlier than the final repeated measure.
I don't know what your field of research is, but if you believe that later measures are better measures of your object of interest, you could just take the last one instead of the average. Or, you could take a weighted average of some sort. HTH, Joerg
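Both of these suggestions are easy to compute in R. A minimal sketch, assuming long-format data with a Years column as in the original post; the data values, the predictor name x and the choice of weights proportional to Years are hypothetical:

```r
# Hypothetical long-format data with irregular measurement years per subject
d <- data.frame(subject = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                Years   = c(1, 2, 4, 1, 3, 2, 3, 5, 6),
                x       = c(10, 12, 15, 8, 9, 20, 22, 21, 25))

# Option 1: keep only each subject's last available measurement
d <- d[order(d$subject, d$Years), ]
last_x <- tapply(d$x, d$subject, function(v) v[length(v)])

# Option 2: weighted average with weights proportional to Years
# (a hypothetical choice; any increasing weight scheme could be used)
wavg_x <- sapply(split(d, d$subject), function(s) weighted.mean(s$x, w = s$Years))

summary_df <- data.frame(subject = as.numeric(names(last_x)),
                         last_x  = as.vector(last_x),
                         wavg_x  = as.vector(wavg_x))
print(summary_df)
```

Weighting by Years is only one option; any increasing weight scheme encodes the belief that later measures are more informative. Either per-subject value can then serve as a predictor in a model with one row per subject.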
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Bolker
Sent: Thursday, December 08, 2011 5:01 PM
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] MIXED MODEL WITH REPEATED MEASURES

Erin Ryan <erin at ...> writes:
I am trying to specify a mixed model for my research, but I can't quite get it to work. I've spent several weeks looking through various online sources to no avail. I can't find an example of someone trying to do precisely what I'm trying to do. I'm hoping some smart member of this mailing list may be able to help. First off, full disclosure: (1) I'm an engineer by trade, so the problem may be related to my ignorance of statistics, and/or (2) I'm fairly new to R, so the problem may be related to my ignorance of R syntax. Here is the basic structure of my data (in longitudinal form):
[snip]
The rows below each subject are repeated measures (in years), with the specific pattern of repeated measurements unique to each subject. The data contains fixed effects and random effects, and there is clearly correlation in the random effects within each subject. The DepVar column represents the dependent variable, which is a constant for each subject. All the data is empirical, but I wish to create a predictive model. Specifically, I wish to predict the value of DepVar for new subjects.
So I understand enough about statistics to know that I must employ a mixed model. I further understand that I must specify a covariance matrix structure. Given the relatively high degree of correlation in consecutive years, an AR(1) structure seems like a good starting point. I have been trying to build the model in SPSS, but without success, so I've recently turned to R. My first attempt was as follows:

  ModelFit <- lme(fixed = DepVar ~ FixedVar1 + FixedVar2,
                  random = ~ RandomVar1 + RandomVar2 | Subject,
                  na.action = na.omit, data = dataset,
                  corr = corAR1())

I assume this can't be the right specification since it neglects the repeated-measures aspect of the data, so I instead decided to employ the corCAR1 structure, i.e.:

  ModelFit <- lme(fixed = DepVar ~ FixedVar1 + FixedVar2,
                  random = ~ RandomVar1 + RandomVar2 | Subject,
                  na.action = na.omit, data = dataset,
                  corr = corCAR1(0.5, form = ~ Years | Subject))

Now perhaps neither correlation structure is the right one (probably a different discussion for another day), but the problem I'm experiencing seems to occur regardless of the structure I specify. In both cases, I get the following error:

  Error in solve.default(estimates[dimE[1] - (p:1), dimE[2] - (p:1), drop = FALSE]) :
    system is computationally singular: reciprocal condition number = 5.42597e-022

Does anybody know what is going wrong here? This error appears to be related to the fact that DepVar is constant for each subject, because when I select a different dependent variable that varies across the repeated measures within a subject, I do not get this error.
I think you're right that DepVar is constant per individual. Technical details aside, I'm having trouble seeing how you're going to estimate the effects of predictor variables that vary within subject when you've only got one response per subject. Furthermore, I think what you're terming "RandomVar1" and "RandomVar2" are probably *not* random variables, but rather variables that vary within subject.

For this response variable, I would suggest averaging the values of RandomVar1 and RandomVar2 per subject, collapsing the data set to a simple linear model on subjects, and getting rid of the correlation model at the same time.

For response variables that do vary within subject, I would suggest

  library(nlme)
  ModelFit <- lme(fixed = DepVar ~ FixedVar1 + FixedVar2 + RandomVar1 + RandomVar2,
                  random = ~ 1 | Subject,
                  na.action = na.omit, data = dataset,
                  correlation = corAR1())
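A minimal R sketch of the first suggestion, on simulated data that only borrows the column names from Erin's post (the 30 subjects, the 3 to 6 measurements per subject and the data-generating values are hypothetical): the within-subject variables are averaged per subject, and an ordinary lm() is fitted with one row per subject and no correlation structure.

```r
set.seed(42)
# Hypothetical long-format data: 30 subjects, 3-6 yearly measurements each
n_subj <- 30
n_obs  <- sample(3:6, n_subj, replace = TRUE)
dataset <- data.frame(Subject = rep(seq_len(n_subj), times = n_obs))
dataset$FixedVar1  <- rep(rnorm(n_subj), times = n_obs)  # constant within subject
dataset$RandomVar1 <- rnorm(nrow(dataset))               # varies within subject
dataset$RandomVar2 <- rnorm(nrow(dataset))               # varies within subject
dataset$DepVar     <- rep(rnorm(n_subj), times = n_obs)  # constant within subject

# Collapse to one row per subject: average the within-subject variables;
# subject-level variables are unchanged by averaging
collapsed <- aggregate(cbind(FixedVar1, RandomVar1, RandomVar2, DepVar) ~ Subject,
                       data = dataset, FUN = mean)

# Ordinary linear model on subjects; no correlation structure needed
fit <- lm(DepVar ~ FixedVar1 + RandomVar1 + RandomVar2, data = collapsed)
print(summary(fit))
```

Because each subject now contributes a single row, the within-subject correlation that motivated corAR1() or corCAR1() no longer arises.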
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models