Dear R-Users! Is there a way in R to analyze longitudinal survey data (repeated measures in a survey)? I know that for longitudinal data I can use lme() to incorporate the correlation structure within individuals, and I know that the survey package exists for analyzing survey data. How can I combine the two? I am trying to calculate design-based estimates. However, if I use svyglm() from the survey package, I would ignore the correlation structure of the repeated measures. Thanks! Dassy
longitudinal survey data
7 messages · Koen Pelleriaux, Hadassa Brunschwig, Thomas Lumley
On Thu, 26 May 2005 h.brunschwig at utoronto.ca wrote:
Dear R-Users! Is there a way in R to analyze longitudinal survey data (repeated measures in a survey)? I know that for longitudinal data I can use lme() to incorporate the correlation structure within individuals, and I know that the survey package exists for analyzing survey data. How can I combine the two? I am trying to calculate design-based estimates. However, if I use svyglm() from the survey package, I would ignore the correlation structure of the repeated measures.
You *can* fit regression models to these data with svyglm(). Remember that from a design-based point of view there is no such thing as a correlation structure of repeated measures -- only the sampling is random, not the population data.

If you *want* to fit mixed models (eg because you are interested in estimating variance components, or perhaps to gain efficiency) then it's quite a bit trickier. You can't just use the sampling weights in lme(). You can correct for the biased sampling if you put the variables that affect the weights in as predictors in the model. Cluster sampling could perhaps then be modelled as another level of random effect.

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
tlumley at u.washington.edu
University of Washington, Seattle
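A minimal sketch of the design-based route described above, assuming a simple random sample of people with several waves each. The data are simulated, and the names `long`, `person`, `wave`, `wt` and `y` are hypothetical; only `svydesign()` and `svyglm()` come from the survey package. Declaring the person as the sampling unit is what lets the standard errors reflect the repeated measurements, without modelling a correlation structure.

```r
library(survey)

set.seed(1)
n_people <- 200   # hypothetical number of sampled people
n_waves  <- 4     # hypothetical number of measurements per person

# Long format: one row per person per wave; the weight is constant within person
long <- data.frame(
  person = rep(seq_len(n_people), each = n_waves),
  wave   = rep(seq_len(n_waves), times = n_people),
  wt     = rep(runif(n_people, 50, 150), each = n_waves)
)

# Simulated outcome with a person-level effect, so waves are correlated within person
long$y <- 2 + 0.5 * long$wave +
  rep(rnorm(n_people), each = n_waves) + rnorm(nrow(long))

# The person is the sampling unit (cluster); each row carries the person's weight
des <- svydesign(ids = ~person, weights = ~wt, data = long)

# Design-based regression; standard errors account for clustering within person
fit <- svyglm(y ~ wave, design = des)
summary(fit)
```

With a real survey, the `ids`, `weights` and any `strata` arguments would follow the documented sampling design rather than these made-up values.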
On 5/26/05, Thomas Lumley <tlumley at u.washington.edu> wrote:
If you *want* to fit mixed models (eg because you are interested in estimating variance components, or perhaps to gain efficiency) then it's quite a bit trickier. You can't just use the sampling weights in lme(). You can correct for the biased sampling if you put the variables that affect the weights in as predictors in the model. Cluster sampling could perhaps then be modelled as another level of random effect.
I've been struggling with case weights (in the case of unequal selection probabilities) in mixed effects models. Those are not possible in lme(). Isn't it possible, however, to use case weights in glmmPQL() from MASS?

Koen Pelleriaux
Sociologist
University of Antwerp
Thank you for your reply. Does that mean that, in order to take the repeated measures into account, I denote them as another cluster in R?

Dassy
On Fri, 27 May 2005 h.brunschwig at utoronto.ca wrote:
Thank you for your reply. Does that mean that, in order to take the repeated measures into account, I denote them as another cluster in R?
Yes, but unless you have multistage finite population corrections to put in the design object, only the first stage of clustering affects the results, so you may not need to bother. -thomas
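One way to check the point about the first stage of clustering is to build the same design twice, once with only the first-stage cluster and once with a second stage added, and compare the standard errors. This is a sketch on simulated data. The variables `psu`, `person`, `wt` and `y` are hypothetical, and no `fpc` argument is supplied, which is the situation described above.

```r
library(survey)

set.seed(2005)
n_psu <- 30; n_per_psu <- 10; n_waves <- 4   # hypothetical two-stage sample

# One row per wave, per person, per primary sampling unit (PSU)
panel <- expand.grid(wave = seq_len(n_waves),
                     person_in_psu = seq_len(n_per_psu),
                     psu = seq_len(n_psu))
panel$person <- interaction(panel$psu, panel$person_in_psu, drop = TRUE)
panel$wt <- 500   # equal weights, for illustration only

# Outcome with PSU-level and person-level effects plus noise
panel$y <- rnorm(n_psu)[panel$psu] +
  rnorm(nlevels(panel$person))[as.integer(panel$person)] +
  rnorm(nrow(panel))

# Same data, clustering declared at one stage versus two stages, no fpc
des_one <- svydesign(ids = ~psu, weights = ~wt, data = panel)
des_two <- svydesign(ids = ~psu + person, weights = ~wt, data = panel)

se_one <- as.numeric(SE(svymean(~y, des_one)))
se_two <- as.numeric(SE(svymean(~y, des_two)))

# Without finite population corrections these are expected to agree
all.equal(se_one, se_two)
```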
Sorry, still confused. If I don't have fpcs ready in my dataset (should I calculate them myself?), that means that R will use the weight of an individual for each of his repeated observations. But is that still correct? The "cluster" individual is ignored and each observation of an individual has the same weight. Thanks a lot.

Dassy
On Fri, 27 May 2005 h.brunschwig at utoronto.ca wrote:
Sorry, still confused. If I don't have fpcs ready in my dataset (should I calculate them myself?), that means that R will use the weight of an individual for each of his repeated observations. But is that still correct? The "cluster" individual is ignored and each observation of an individual has the same weight.
Well, it depends to some extent on what inferences you are making, but yes, you probably do want each observation to have the same weight.

Suppose you have 4 measurements on each person, and you are working with a simple random sample of 1000 people from a population of 1,000,000. If you had done these 4 measurements on the whole population you would have 4,000,000 measurements, so the 4000 measurements you have are 1/1000 of the population. This is the same weighting as if you had a single measurement per person, giving 1000 measurements in the sample and 1,000,000 in the population.

If different individuals have different numbers of measurements then things get a bit trickier. It depends then on why there are different numbers of measurements. If they are the result of non-response you might want to rescale the weights at later time points to give the right population totals. If they are part of the sampling design then the design will specify what to do with them.

-thomas
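The arithmetic in this reply can be written out in a few lines of base R. The first part reproduces the numbers given above. The second part is only an illustrative assumption about what "rescale the weights" could look like: a simple ratio adjustment with a made-up 80% response rate, which implicitly treats non-response as unrelated to the outcome.

```r
pop_people    <- 1e6    # population size from the example above
sample_people <- 1000   # simple random sample of people
waves         <- 4      # measurements per person

# Weight per person with a single measurement each
person_weight <- pop_people / sample_people

# Weight per observation with all 4 measurements on everyone
obs_weight <- (pop_people * waves) / (sample_people * waves)

person_weight   # 1000
obs_weight      # 1000, the same weight for every observation

# Hypothetical non-response at a later wave: about 80% of people respond
set.seed(42)
wt        <- rep(person_weight, sample_people)
responded <- runif(sample_people) < 0.8

# Ratio adjustment (an assumption, not a method stated in the thread):
# scale respondents' weights so they again sum to the population total
wt_later <- wt[responded] * sum(wt) / sum(wt[responded])
sum(wt_later)   # matches pop_people up to floating-point error
```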