model for clustered longitudinal binary data
5 messages · Ben Bolker, Adrien Combaz
Adrien Combaz <Adrien.Combaz at ...> writes:
Dear list members,
[snip]
I measure a longitudinal binary outcome (correctness of detection, 0: incorrect, 1: correct) with respect to 5 different experimental conditions (1 baseline and 4 treatments). The outcome is always measured at the same 10 time points. Each of the 9 subjects participated in all 5 conditions. Additionally, for each subject and condition, the experiment was replicated 36 times. I therefore end up with 9*5*36=1620 binary longitudinal series (= trials of 10 points each).
My aim is to assess the influence of the experimental condition on my binary outcome. I need to build a model that would take into consideration the correlation along time for a given trial and the correlation among trials for a given subject.
Correlation among trials for a given subject should be straightforward, correlation along time for a given trial may be difficult (see below).
I am considering a 3 levels logistic models where 10 consecutive binary measurements (level 1) are obtained on replicates (level 2) which are clustered into subjects (level 3). My only level 1 covariate would be the time of measurement (ordinal factor, T = 1, ..., 10) and as level 2 covariate, I consider the experimental condition. I don't consider any level 3 covariate per se, but still want the model to account for between-subject variability.
This all seems reasonable. If you really want time to be treated as ordinal, you'll want to look at the clmm function from the 'ordinal' package. In most R modeling packages you don't need to state explicitly which levels the covariates are measured at (but keeping track of it is of course useful for thinking about issues of identifiability, etc.).
A simple model would be something like
  response ~ time + expcond + (1|rep/sub)
As a more complete model you could consider
  response ~ time + expcond + (time|rep/sub) + (expcond|sub)
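As a rough illustration of the design described in the question, the long-format data could be laid out as below. This is only a sketch: the column names (sub, expcond, rep, time, trial) and the condition labels are assumptions, not the poster's actual variables, and it assumes that a trial is identified by the combination of subject, condition and replicate.

```r
# Hypothetical long-format layout: 9 subjects x 5 conditions x 36 replicates,
# each observed at 10 time points (names and labels are illustrative only)
d <- expand.grid(
  time    = 1:10,
  rep     = 1:36,
  expcond = factor(c("baseline", "trt1", "trt2", "trt3", "trt4")),
  sub     = factor(1:9)
)

# One identifier per binary series (subject x condition x replicate)
d$trial <- interaction(d$sub, d$expcond, d$rep, drop = TRUE)

nrow(d)           # 16200 rows = 1620 trials x 10 time points
nlevels(d$trial)  # 1620
```

If the replicate index restarts at 1 within each condition, as assumed here, a grouping term built from sub and rep alone would pool replicates across conditions, so an explicit trial identifier is the safer grouping variable.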
Thanks Ben for your reply,
Correlation among trials for a given subject should be straightforward, correlation along time for a given trial may be difficult (see below).
Yes, this is my main issue.
This all seems reasonable. If you really want time to be treated as ordinal, you'll want to look at the clmm function from the 'ordinal' package. In most R modeling packages you don't need to state explicitly which levels the covariates are measured at (but keeping track of it is of course useful for thinking about issues of identifiability, etc.)
I am not sure I understand how I can use the clmm function. I am not familiar with it, but from what I could read, it is used to fit cumulative link models for an ordinal response variable, while in my case time is not the response variable but a factor (and my response variable is binary). I preferred to treat time as a discrete factor rather than a continuous variable for 2 reasons: 1) it represents a number of cycles, which is discrete and ordered by nature; 2) on average, the correctness (on the logit scale) increases with time, but the relationship is nonlinear. This means that if I used time as a continuous variable, I would have to choose an adequate transformation to obtain a linear relationship, which can be very subjective. Since my main objective is to study the influence of the experimental condition, I didn't really want to go there.
A simple model would be something like response ~ time + expcond + (1|rep/sub)
I tried something like that with the lmer function; the only difference is that I had (1|sub/rep) as the random effect. I thought that this was the proper syntax for replicates nested within subjects, giving a random intercept for each subject and for each replicate within subject. Am I missing something?
As a more complete model you could consider response ~ time + expcond + (time|rep/sub) + (expcond|sub)
With such a model, where expcond is also used to define the random-effects structure, can I use the anova function to compare it to the following "null model": response ~ time + (time|rep/sub) + (expcond|sub) and make a statement on the significance of the effect of the experimental condition?
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Adrien Combaz <Adrien.Combaz at ...> writes: [snip]
Correlation among trials for a given subject should be straightforward, correlation along time for a given trial may be difficult (see below).
Yes, this is my main issue.
I forgot to say that unless you are explicitly interested in the estimated correlation structure, you could hope to get around this by fitting the model without correlation and then showing that the temporal autocorrelation in the residuals is negligible ....
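A minimal sketch of such a check, assuming the lme4 package and a small simulated data set (all names, sizes and effect values are hypothetical): fit the model without a within-trial correlation structure, then compute the pooled lag-1 correlation of the Pearson residuals between consecutive time points of the same trial.

```r
library(lme4)

set.seed(10)
# Small simulated data set with independent errors within trials
d <- expand.grid(time = factor(1:10), trial = 1:20, sub = factor(1:6))
d$trial_id <- interaction(d$sub, d$trial, drop = TRUE)
b_sub <- rnorm(6, sd = 0.8)[as.integer(d$sub)]
d$y <- rbinom(nrow(d), 1, plogis(-0.5 + 0.15 * as.integer(d$time) + b_sub))

# Model without any within-trial correlation structure
fit <- glmer(y ~ time + (1 | sub), data = d, family = binomial)

# Pooled lag-1 correlation of residuals within trials
lag1_cor <- function(r, trial, time) {
  o <- order(trial, time)
  r <- r[o]
  trial <- trial[o]
  n <- length(r)
  same_trial <- trial[-1] == trial[-n]  # consecutive rows from the same trial
  cor(r[-n][same_trial], r[-1][same_trial])
}

rho <- lag1_cor(residuals(fit, type = "pearson"), d$trial_id, as.integer(d$time))
rho  # values near 0 suggest little residual lag-1 autocorrelation
```

Pairs are pooled across trials rather than running acf() on each 10-point series, because demeaning such short series biases the per-trial estimate downward.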
This all seems reasonable. If you really want time to be treated as ordinal, you'll want to look at the clmm function from the 'ordinal' package.
[snip]
I am not sure to understand how I can use the clmm function, I am not familiar with it but from what I could read, it is used to fit cumulative link models for an ordinal response variable, while in my case time is not the response variable but a factor (and my response variable is binary).
You're right, my bad. The only difference between ordered and unordered factors in the standard R approach to model-fitting is that by default, treatment contrasts are used for unordered and orthogonal polynomial contrasts are used for ordered factors. Another perhaps underused option is to specify successive-differences contrasts, using the contr.sdif() function in the MASS package. None of these will make a difference in the overall complexity or fit of the model, just in the interpretation of the parameters.
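The point about contrasts can be checked on a toy example. The sketch below uses simulated data and an ordinary glm rather than a mixed model, purely to keep it short; the variable names are made up. It fits the same 10-level time factor under the three codings and compares the fits.

```r
library(MASS)  # provides contr.sdif()

set.seed(1)
time <- factor(rep(1:10, times = 50))
# Simulated binary outcome whose success probability rises nonlinearly with time
y <- rbinom(length(time), 1, plogis(-1 + 2 * sqrt(as.integer(time) / 10)))

fit_trt  <- glm(y ~ time, family = binomial,
                contrasts = list(time = "contr.treatment"))
fit_poly <- glm(y ~ time, family = binomial,
                contrasts = list(time = "contr.poly"))
fit_sdif <- glm(y ~ time, family = binomial,
                contrasts = list(time = "contr.sdif"))

fits <- list(treatment = fit_trt, poly = fit_poly, sdif = fit_sdif)
sapply(fits, function(m) length(coef(m)))  # same number of parameters
sapply(fits, deviance)                     # same fit

# Only the meaning of the coefficients changes: with contr.sdif each
# coefficient is the logit difference between successive time points
coef(fit_sdif)
```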
A simple model would be something like response ~ time + expcond + (1|rep/sub)
I tried something like that with the lmer function, only difference is that I had as random effect (1|sub/rep). I thought that it was the proper syntax for replicates nested within subjects, giving a random intercept for each subject and for each replicate within subject. Am I missing something?
No, my bad again. It should be sub/rep.
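To see what the sub/rep syntax does, the following sketch (assuming lme4, with simulated data and hypothetical names) fits the nested shorthand and its explicit expansion, (1 | sub) + (1 | sub:rep), and checks that they describe the same model.

```r
library(lme4)

set.seed(2)
d <- expand.grid(time = 1:10, rep = factor(1:12), sub = factor(1:8))
# Simulated subject-level and replicate-within-subject random intercepts
b_sub <- rnorm(8, sd = 1)[as.integer(d$sub)]
b_rep <- rnorm(96, sd = 0.7)[as.integer(interaction(d$sub, d$rep))]
d$y <- rbinom(nrow(d), 1, plogis(b_sub + b_rep))

m_nested   <- glmer(y ~ 1 + (1 | sub/rep), data = d, family = binomial)
m_explicit <- glmer(y ~ 1 + (1 | sub) + (1 | sub:rep), data = d, family = binomial)

ngrps(m_nested)    # 96 replicate-within-subject groups and 8 subjects
ngrps(m_explicit)  # same counts, only the term label differs
all.equal(as.numeric(logLik(m_nested)), as.numeric(logLik(m_explicit)),
          tolerance = 1e-4)
```

Writing rep/sub instead would ask for a random intercept per replicate label and per subject within replicate label, which is not the intended hierarchy.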
As a more complete model you could consider response ~ time + expcond + (time|rep/sub) + (expcond|sub)
With such a model where expcond is also used to define the random effect structure, can I use the anova function to compare it to the following "null model": response ~ time + (time|rep/sub) + (expcond|sub) and make a statement on the significance of the effect of the experiment condition?
Yes.
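The comparison asked about above is a likelihood ratio test between two nested fits. A minimal sketch, assuming lme4 and simulated data; to keep it fast it uses a random intercept for subject only, which is much simpler than the random-effects structure discussed in this thread, and all names and effect sizes are invented.

```r
library(lme4)

set.seed(3)
d <- expand.grid(time = factor(1:10), rep = 1:6,
                 expcond = factor(c("baseline", "trt1", "trt2", "trt3", "trt4")),
                 sub = factor(1:6))
eta <- -0.5 + 0.15 * as.integer(d$time) +
  c(0, 0.3, 0.6, 0.0, -0.3)[as.integer(d$expcond)] +
  rnorm(6, sd = 0.7)[as.integer(d$sub)]
d$y <- rbinom(nrow(d), 1, plogis(eta))

# Full model and "null model" differ only in the fixed effect of expcond
m_full <- glmer(y ~ time + expcond + (1 | sub), data = d, family = binomial)
m_null <- glmer(y ~ time + (1 | sub), data = d, family = binomial)

lrt <- anova(m_null, m_full)  # likelihood ratio test on 4 degrees of freedom
lrt
```

The two models must share the same random-effects specification, as in the question, so that the test isolates the fixed effect of the experimental condition.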
1 day later
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Bolker
Sent: Wednesday, October 09, 2013 11:46 PM
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] model for clustered longitudinal binary data
Adrien Combaz <Adrien.Combaz at ...> writes: [snip]
I forgot to say that unless you are explicitly interested in the estimated correlation structure, you could hope to get around this by fitting the model without correlation and then showing that the temporal autocorrelation in the residuals is negligible ....
That would indeed be nice. However, I was advised to avoid looking at residuals when fitting logistic mixed models to binary data, and I'm actually not sure what they represent. With a normal (Gaussian) mixed model, I can retrieve my observed data by adding up fitted values and residuals, but this is not the case with logistic regression. I'm therefore wondering what the residuals really represent and whether looking at their autocorrelation will give me the information I expect.
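On what the residuals represent: in R the default residuals of a binomial glm are deviance residuals, which is why fitted values plus residuals do not return the data. The sketch below uses a plain glm on simulated data (hypothetical names) to show the response and Pearson types; the residuals method for lme4 fits accepts the same type argument.

```r
set.seed(4)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 * x))
fit <- glm(y ~ x, family = binomial)

mu <- fitted(fit)  # fitted probabilities
r_resp    <- residuals(fit, type = "response")  # y - mu
r_pearson <- residuals(fit, type = "pearson")   # (y - mu) / sqrt(mu * (1 - mu))

# Response residuals add back to the observed 0/1 data ...
all.equal(unname(mu + r_resp), as.numeric(y))

# ... whereas the default (deviance) residuals do not
isTRUE(all.equal(unname(mu + residuals(fit)), as.numeric(y)))

# Pearson residuals are response residuals scaled by the binomial standard deviation
all.equal(unname(r_pearson), unname((y - mu) / sqrt(mu * (1 - mu))))
```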
As a more complete model you could consider response ~ time + expcond + (time|rep/sub) + (expcond|sub)
With such a model where expcond is also used to define the random effect structure, can I use the anova function to compare it to the following "null model": response ~ time + (time|rep/sub) + (expcond|sub) and make a statement on the significance of the effect of the experiment condition?
Yes.
Although this model seems nice, I'm reaching the maximum number of iterations without getting convergence, so I'll probably have to go for something a bit simpler.
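One possible way to go simpler, sketched under assumptions: keep random intercepts only (subject, and trial within subject) and give the optimizer a larger evaluation budget through glmerControl. The data are simulated, the names are hypothetical, and the trial is assumed to be identified by subject, condition and replicate; this is not the poster's actual model.

```r
library(lme4)

set.seed(5)
d <- expand.grid(time = factor(1:10), rep = factor(1:6),
                 expcond = factor(c("baseline", "trt1", "trt2", "trt3", "trt4")),
                 sub = factor(1:6))
trial <- interaction(d$sub, d$expcond, d$rep, drop = TRUE)
eta <- -0.5 + 0.15 * as.integer(d$time) +
  rnorm(6, sd = 0.7)[as.integer(d$sub)] +
  rnorm(nlevels(trial), sd = 0.5)[as.integer(trial)]
d$y <- rbinom(nrow(d), 1, plogis(eta))

# Random intercepts only, with more function evaluations allowed
m_simple <- glmer(
  y ~ time + expcond + (1 | sub) + (1 | sub:expcond:rep),
  data = d, family = binomial,
  control = glmerControl(optimizer = "bobyqa",
                         optCtrl = list(maxfun = 1e5))
)
VarCorr(m_simple)
```

Raising maxfun only helps when the optimizer is still making progress at the limit; if the random-effects structure is too rich for 9 subjects, simplifying it is the more reliable fix.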