I'm reviewing a paper for a colleague, and I haven't seen this done before. Imagine that she has a sample of 100 houses, all of which include children who raise chickens. She includes a random term for household and finds that there is substantial household-level variance in chicken husbandry by kids. She then takes the household-level estimates (i.e., plus/minus relative to the model intercept) and uses them as an explanatory variable in an OLS model with households as the sampling unit. For example, she would predict something like household-level income while using the random-intercept estimates from the chicken analysis (and other covariates). At first glance, this might seem relatively straightforward, but I haven't encountered similar analyses, and I'm wondering about potential pitfalls . . . particularly given the variable number of kids in each house. Any thoughts? Thanks!
Is it kosher to use random-intercept estimates as explanatory variables in another model?
4 messages · Jeremy Koster, Reinhold Kliegl, Gebregziabher, Mulugeta +1 more
The random effects are not independent "observations"; the amount of shrinkage depends on the model parameters, which are estimated from all the data. So unless there is no shrinkage associated with the random effects, this is not a good idea. It may be better to think about including the other variables (plus suitable interaction terms) in the first model. Alternatively, a structural equation model may be a better path to pursue. Reinhold Kliegl
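To make the shrinkage point concrete, here is a minimal sketch in Python, assuming the simplest case: an intercept-only random-intercept model with known variance components. The function names and the values of `tau2` (between-household variance) and `sigma2` (within-household variance) are hypothetical and chosen only for illustration. In that case the predicted household effect is the household's raw mean deviation multiplied by tau2 / (tau2 + sigma2 / n_j), so households with fewer kids are pulled harder toward zero.

```python
# Shrinkage of random-intercept predictions in an intercept-only
# random-intercept model with KNOWN variance components (an assumption;
# in practice both variances are estimated from all the data).

def shrinkage_factor(n_j, tau2, sigma2):
    """Weight on the household's own mean deviation (0 = fully shrunk, 1 = none)."""
    return tau2 / (tau2 + sigma2 / n_j)

def blup(raw_deviation, n_j, tau2, sigma2):
    """Shrunken household effect, given the raw deviation from the grand mean."""
    return shrinkage_factor(n_j, tau2, sigma2) * raw_deviation

tau2 = 1.0           # hypothetical between-household variance
sigma2 = 4.0         # hypothetical within-household (residual) variance
raw_deviation = 2.0  # same raw household deviation in every row below

for n_kids in (1, 2, 4, 8):
    lam = shrinkage_factor(n_kids, tau2, sigma2)
    print(n_kids, round(lam, 3), round(blup(raw_deviation, n_kids, tau2, sigma2), 3))
```

The same raw deviation yields a different "estimate" depending on the number of kids in the house, which is why the variable household size raised in the question matters for any second-stage model.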
On Mon, Jun 6, 2011 at 7:55 PM, Jeremy Koster <helixed2 at yahoo.com> wrote:
I'm reviewing a paper for a colleague, and I haven't seen this done before. Imagine that she has a sample of 100 houses, all of which include children who raise chickens. She includes a random term for household and finds that there is substantial household-level variance in chicken husbandry by kids. She then takes the household-level estimates (i.e., plus/minus relative to the model intercept) and uses them as an explanatory variable in an OLS model with households as the sampling unit. For example, she would predict something like household-level income while using the random-intercept estimates from the chicken analysis (and other covariates). At first glance, this might seem relatively straightforward, but I haven't encountered similar analyses, and I'm wondering about potential pitfalls . . . particularly given the variable number of kids in each house. Any thoughts? Thanks!
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
This looks like the two-stage joint modeling approach used in Ye et al. (2008) or Gebregziabher et al. (2010). The key, I think, is to use some kind of robust variance for the coefficients of the first-stage estimated values that are used as covariates in the second stage.

References

Ye W, Lin X, Taylor JMG. Semiparametric modeling of longitudinal measurements and time-to-event data: a two-stage regression calibration approach. Biometrics 2008;64(4):1238-1246.

Gebregziabher M, Egede LE, et al. Effect of trajectories of glycemic control on mortality in type 2 diabetes: a semiparametric joint modeling approach. Am J Epidemiol 2010;171(10):1090-1098.

Hope this helps.

Mulugeta
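As one possible reading of the "robust variance" suggestion in Mulugeta's reply, the sketch below computes a heteroskedasticity-consistent (HC0, "sandwich") standard error for the slope in a second-stage simple regression. This is an assumption made for illustration: the thread does not say which robust estimator the cited papers use, and a sandwich estimator may not capture all of the uncertainty carried over from the first stage. The function name and the toy data are hypothetical.

```python
# Classical vs. HC0 (sandwich) standard error for the slope of a simple
# OLS regression y = a + b*x + e. Pure Python, no external libraries.

def ols_slope_with_se(x, y):
    """Return (slope, classical_se, hc0_se) for a simple regression with intercept."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical variance: one common residual variance for all households.
    classical_var = (sum(e * e for e in resid) / (n - 2)) / sxx
    # HC0 variance: each household's squared residual is weighted by its leverage in x.
    hc0_var = sum(((xi - xbar) ** 2) * e * e for xi, e in zip(x, resid)) / sxx ** 2
    return slope, classical_var ** 0.5, hc0_var ** 0.5

# Toy data: x stands in for first-stage household estimates, y for the outcome.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 1.0, 4.0]
slope, se_classical, se_hc0 = ols_slope_with_se(x, y)
print(slope, se_classical, se_hc0)
```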
She could do this in a single step using a multilevel model that includes group-level predictors to model part of the variation associated with the intercept. Gelman & Hill include a nice example and discussion of the effect that group-level predictors have on the estimates of observation-level parameters in Section 12.6 of their book.

-Christos
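The single-step model Christos describes can be written out as follows. This is a sketch under assumed notation, not a formula taken from the thread: i indexes kids, j[i] is the household of kid i, y_i is the chicken-husbandry outcome, x_i is a kid-level covariate, and u_j is a household-level predictor such as income.

```latex
\begin{align}
  y_i      &\sim \mathrm{N}\!\left(\alpha_{j[i]} + \beta x_i,\ \sigma_y^2\right),
           & i &= 1, \dots, n \\
  \alpha_j &\sim \mathrm{N}\!\left(\gamma_0 + \gamma_1 u_j,\ \sigma_\alpha^2\right),
           & j &= 1, \dots, J
\end{align}
```

Note that in this formulation the household-level variable explains the household intercept, so the direction of the regression is reversed relative to the colleague's second-stage OLS, where the intercept estimates were used to predict income.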