
Controlling for self-selection bias / endogeneity in mixed models

Thanks yet again, John. I actually began my "journey" with propensity score matching, using the MatchIt package. Then the authors of the package came out against propensity score matching (http://gking.harvard.edu/files/gking/files/psnot.pdf), so I turned to Coarsened Exact Matching (CEM). Further evaluation revealed what I think is a sound argument that matching is only effective if you match using the separating / omitted variable (see chrisblattman.com/2010/10/27/the-cardinal-sin-of-matching/ and projects.iq.harvard.edu/sss_blog/can_matching_so). But if you have the missing variable, you have no need to match! In short, the argument is that while matching provides a benefit over regression with respect to regression extrapolation (e.g., the control variable and treatment variables' related outcome values have little overlap), it is not a solution for addressing endogeneity. But I am quite open to returning to matching if I misunderstood the argument.

You wrote in your original reply, "a random coefficient will likely show up as mattering for model fit with something like an LR test." An anova test of models with/without a random slope did indicate a better fit with the random slope. Per a response of yours in the 2016 thread, "The typical response when this test shows that there is still a violation of the no correlation between a random effect and a level 1 variable assumption is to stop making that assumption and use a random coefficients model." So in my case, with random (subject) and level 1 (treatment, or perhaps the missing IQ): what remains to be solved?

Requoting Bell: "unchanging and/or unmeasured characteristics of an individual (such as intelligence, ability, etc.) will be controlled out of the estimate of the within effect." This seems to address my main concern: an omitted variable (e.g., IQ) not orthogonal with treatment and correlated with the outcome. Are you not convinced that the "within solution" in fact solves this? Or perhaps it addresses a different problem and I am not thinking about my problem correctly?
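
For anyone following along, here is a minimal simulation sketch of the "within" claim in the Bell quote. It is plain Python rather than lme4, and everything in it is an assumption made up for illustration: the variable names, the effect sizes, and the use of simple one-predictor least squares in place of a mixed model. An unmeasured, time-invariant subject trait (labelled iq) drives both a level 1 predictor x and the outcome y; the pooled slope absorbs the trait, while the slope on subject-mean-centred data should land near the true value.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Slope from a simple one-predictor least-squares fit."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n_subjects=500, n_obs=6, beta=1.0, seed=1):
    """Hypothetical panel: iq is unmeasured and constant within subject."""
    rng = random.Random(seed)
    rows = []
    for subject in range(n_subjects):
        iq = rng.gauss(0, 1)
        for _ in range(n_obs):
            x = 0.8 * iq + rng.gauss(0, 1)             # predictor correlated with iq
            y = beta * x + 1.5 * iq + rng.gauss(0, 1)  # outcome also depends on iq
            rows.append((subject, x, y))
    return rows


def pooled_and_within(rows):
    """Return (pooled slope, within-subject slope) for rows of (subject, x, y)."""
    pooled = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])
    by_subject = {}
    for subject, x, y in rows:
        by_subject.setdefault(subject, []).append((x, y))
    x_dev, y_dev = [], []
    for obs in by_subject.values():
        x_bar = mean(x for x, _ in obs)
        y_bar = mean(y for _, y in obs)
        for x, y in obs:
            # Subtracting subject means removes anything constant within subject.
            x_dev.append(x - x_bar)
            y_dev.append(y - y_bar)
    return pooled, ols_slope(x_dev, y_dev)


pooled, within = pooled_and_within(simulate())
print(f"pooled slope: {pooled:.2f}, within slope: {within:.2f} (true beta = 1.0)")
```

One caveat on the design: the within slope only exists because x varies inside each subject here. If the treatment is fixed per subject (John's earlier message in this thread describes the choice as one that "never changes"), the subject-mean-centred treatment is identically zero and this route cannot estimate its effect.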

Thanks again. I don't want to be lazy and ask you to think through issues I should be thinking through myself, but discussing this with someone who is more familiar with the issues and has a deeper understanding of the underlying statistics is a huge help!

FYI, for anyone following this thread, there is a helpful implementation of Bell et al. in R at https://strengejacke.github.io/mixed-models-snippets/random-effects-within-between-effects-model.html


From: John Poe <jdpoe223 at gmail.com>
Sent: Sunday, April 12, 2020 8:22 PM
To: Slaughter, Kelly <KELLY.SLAUGHTER at tcu.edu>
Cc: Ben Bolker <bbolker at gmail.com>; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Controlling for self-selection bias / endogeneity in mixed models

Ah, okay I see the problem now. This kind of multilevel causal inference problem is a bit hard for me to conceptualize. I usually think about them with DAGs.

I *think* you're going to end up trying to model the selection mechanism itself via something like propensity score weighting unless you can find a good natural IV. In this context the propensity score is an artificial instrumental variable (much like randomization is an instrument). You can find a good explanation of IPW in Hernan and Robins, https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/, which includes some detail on longitudinal models, though that is geared to time-varying treatments. I think you'll just be focusing on building a propensity score at the time of the choice since it never changes, which simplifies it down to the first cross-section of data. I'm familiar with 15 or 20ish papers on multilevel propensity score modeling, so they are easy to find. One that you might look at is Arpino, B. and Mealli, F., 2011. The specification of the propensity score in multilevel observational studies. Computational Statistics & Data Analysis, 55(4), pp.1770-1780. Arpino has several papers on the topic, including a Statistics in Medicine article that's also pretty good. Causal identification is going to be based on how good the propensity score is and there's no real way around that. Once you get the weighted (or matched, if you want to go that route) data you can put it in a regular multilevel model.
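
To make the weighting idea concrete, here is a minimal single-level sketch in plain Python. It is not the multilevel specification from Arpino and Mealli, and all names and numbers are assumptions invented for the illustration. A single observed binary confounder z stands in for whatever measured covariates go into the propensity model, the propensity score is estimated as the treated share within each level of z, and the weights are normalised (Hajek-style) inverse probabilities.

```python
import random
from statistics import mean


def simulate(n=20000, effect=-1.0, seed=2):
    """Hypothetical cross-section: z raises P(treated) and lowers the outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        p_treat = 0.8 if z else 0.2          # selection into treatment depends on z
        t = 1 if rng.random() < p_treat else 0
        y = effect * t - 2.0 * z + rng.gauss(0, 1)
        data.append((z, t, y))
    return data


def naive_difference(data):
    """Unadjusted treated-minus-control difference in mean outcome."""
    treated = [y for _, t, y in data if t == 1]
    control = [y for _, t, y in data if t == 0]
    return mean(treated) - mean(control)


def ipw_ate(data):
    """Inverse-probability-weighted treatment effect with normalised weights."""
    # Propensity score: share treated within each level of the observed confounder.
    e = {}
    for level in (0, 1):
        flags = [t for z, t, _ in data if z == level]
        e[level] = sum(flags) / len(flags)
    w1 = sum(t / e[z] for z, t, _ in data)
    w0 = sum((1 - t) / (1 - e[z]) for z, t, _ in data)
    m1 = sum(t * y / e[z] for z, t, y in data) / w1
    m0 = sum((1 - t) * y / (1 - e[z]) for z, t, y in data) / w0
    return m1 - m0


data = simulate()
naive = naive_difference(data)
ipw = ipw_ate(data)
print(f"naive: {naive:.2f}, IPW: {ipw:.2f} (true effect = -1.0)")
```

The weighting recovers the simulated effect only because z is observed and the propensity model is right by construction. A confounder that is left out of the propensity score stays a confounder after weighting, which is the sense in which identification rests on how good the propensity score is.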

It's possible that you could model this with cross-level interactions between ownership and all the level 1 stuff in the model but that would get messy. I think the propensity score route is at least more straightforward to interpret. If you had pre-treatment outcome data of some kind then you could do something like a synthetic control method but I don't know if that's feasible with what you've got.
On Sun, Apr 12, 2020 at 8:56 PM Slaughter, Kelly <KELLY.SLAUGHTER at tcu.edu> wrote:
Thanks for the extensive reply, John! Before I attempt to absorb it all, let me offer a couple of quick answers to your questions just to be sure the thread does not spiral in multiple directions :)

(1)     The beginning of the thread I reference can be found here: https://hypatia.math.ethz.ch/pipermail/r-sig-mixed-models/2016q4/025147.html

(2)     I am referring to omitted variable bias, sorry for the confusion. My treatment / control is ownership of multiple financial accounts / ownership of single accounts. So perhaps let's say IQ tends to make someone more likely to hold multiple accounts (treatment) AND allows them to expend less effort in researching financial trades (outcome variable), whereas I am theorizing that multiple accounts themselves reduce effort directly.
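
The concern in (2) is the standard omitted variable bias setup. As a rough sketch in a simplified single-level linear model (the symbols are shorthand introduced here, not from the thread: E for effort, T for multiple-account ownership, Q for IQ):

```latex
E_i = \beta T_i + \gamma Q_i + \varepsilon_i,
\qquad
\operatorname{plim}\ \hat{\beta}_{\text{omitting } Q}
  = \beta + \gamma \, \frac{\operatorname{Cov}(T_i, Q_i)}{\operatorname{Var}(T_i)}
```

Under the story above, Cov(T, Q) > 0 (higher IQ makes multiple accounts more likely) and gamma < 0 (higher IQ lowers effort), so the extra term is negative and a model that leaves out IQ would tend to overstate how much multiple accounts reduce effort.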

BTW, Ben, thank you for your extensive support across multiple sites in helping the general public with mixed models in R. I have relied upon an EXTENSIVE number of your answers to mixed model questions when developing my models.

-----Original Message-----
From: Ben Bolker <bbolker at gmail.com>
Sent: Sunday, April 12, 2020 7:46 PM
To: John Poe <jdpoe223 at gmail.com>
Cc: Slaughter, Kelly <KELLY.SLAUGHTER at tcu.edu>; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Controlling for self-selection bias / endogeneity in mixed models

  Wow, this is the kind of content I come here for.  (It will take me a while to digest this ...) Thank you!
On Sun, Apr 12, 2020 at 8:36 PM John Poe <jdpoe223 at gmail.com> wrote: