Cluster-robust SEs & random effects -- seeking some clarification
Thanks for this further clarification, James. By precision I meant accuracy of inference, which I believe is what more 'robust' SEs that account for unmodeled heteroskedasticity will allow for, correct? That is, "accurate statistical inference" in the language of Cameron & Miller (2015). When you note 'if you trust the specification of your random effects structure', can you elaborate on this? I imagine that, in the extreme, no random effects structure will ever truly be perfect, so I guess it comes down to some combination of theory, practicality, and model tractability? JD
On Mon, Aug 15, 2022 at 3:18 PM James Pustejovsky <jepusto at gmail.com> wrote:
Hi JD,
Below are a couple of further thoughts on the questions you posed.
James
On Sat, Aug 13, 2022 at 6:33 PM J.D. Haltigan <jhaltiga at gmail.com> wrote:
One further post, perhaps framing my question slightly differently (or altogether more generally): What, specifically, do cluster-robust/robust SEs allow one to do with more accuracy/precision *if* one is already using both random intercepts and slopes to model relevant cluster-specific effects?
Just to be clear, using cluster-robust SEs does not change anything about the accuracy or precision of the model's coefficient estimates. Using them or not using them is purely a matter of how to estimate standard errors (and thus build test statistics or confidence intervals) for those coefficient estimates. The advantage of using clustered SEs in a random effects model is that doing so captures unmodeled sources of dependence or heteroskedasticity in the errors within clusters. Thus, if you trust the specification of your random effects structure, then there is no need to use clustered SEs. On the other hand, if you (or your audience) are skeptical that you've got the right specification, then clustered SEs are helpful. Think of them as an insurance policy for your SEs/t-statistics/CIs, so that they remain valid even if your model is incorrectly specified in some respects.
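To make the "insurance policy" idea concrete, here is a minimal sketch in Python (standard library only). It is not the mixed-model setup discussed in this thread: as a stand-in for a misspecified random effects structure, it fits plain OLS to simulated data that actually contain a cluster random intercept, then compares the model-based SE of the slope with a basic CR0 sandwich SE. The function names, sample sizes, and simulated parameter values are all made up for illustration, and no small-sample correction is applied.

```python
import math
import random


def ols_with_cluster_robust_se(x, y, cluster):
    """Fit y = b0 + b1*x by OLS; return (b1, model_based_se, cluster_robust_se)."""
    n = len(y)
    # Bread: (X'X)^{-1} for the design matrix X = [1, x], inverted by hand (2x2).
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]

    # OLS coefficients: (X'X)^{-1} X'y.
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Model-based SE: assumes independent, homoskedastic errors.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    model_se = math.sqrt(sigma2 * inv[1][1])

    # Meat: sum over clusters of (X_j' e_j)(X_j' e_j)', the CR0 form.
    scores = {}
    for xi, ei, g in zip(x, resid, cluster):
        s = scores.setdefault(g, [0.0, 0.0])
        s[0] += ei
        s[1] += xi * ei
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for r in range(2):
            for c in range(2):
                meat[r][c] += s[r] * s[c]

    # Sandwich variance of the slope: slope row of bread * meat * slope column of bread.
    row = inv[1]
    var_b1 = sum(row[r] * meat[r][c] * row[c] for r in range(2) for c in range(2))
    return b1, model_se, math.sqrt(var_b1)


def simulate(n_clusters=50, n_per=20, seed=1):
    """Simulate clustered data with a cluster-level predictor (hypothetical values)."""
    rng = random.Random(seed)
    x, y, cluster = [], [], []
    for j in range(n_clusters):
        xj = rng.gauss(0, 1)  # cluster-level predictor
        uj = rng.gauss(0, 1)  # cluster random intercept, omitted from the working model
        for _ in range(n_per):
            x.append(xj)
            y.append(1.0 + 0.5 * xj + uj + rng.gauss(0, 1))
            cluster.append(j)
    return x, y, cluster


x, y, cluster = simulate()
slope, model_se, cluster_se = ols_with_cluster_robust_se(x, y, cluster)
print(f"slope={slope:.3f}  model-based SE={model_se:.3f}  cluster-robust SE={cluster_se:.3f}")
```

The slope estimate is identical under either SE; only the uncertainty attached to it changes. Because the working model ignores the within-cluster dependence, the model-based SE comes out too small here, while the clustered SE reflects that dependence without requiring it to be modeled.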
Is it the case that any number of sources could potentially give rise to heteroskedasticity (e.g., autoregressive structure in the case of repeated measurements/time variables), and that cluster-robust SEs would be of value for making more accurate inferences, assuming some misspecification of the random effects structure of the model?
Yes.
Relatedly, is there a 'seminal' or 'key' paper that provides a deep dive on the concept of heteroskedasticity? I have a few on hand, but wanted to see if there was something I might not be aware of.
Cameron and Miller (noted in your subsequent paper) is an excellent, thorough survey from the econometric perspective. McNeish and Kelley (2019; https://doi.org/10.1037/met0000182) is a great resource that addresses the fixed effects vs. mixed effects modeling contexts. To be a bit self-promotional, I have a working paper with Young Ri Lee that looks at these issues in the context of multi-way clustering (https://psyarxiv.com/f9mr2). The simulations in the paper illustrate the consequences of several different forms of model mis-specification (such as omission of random slopes).