Cluster-robust SEs & random effects -- seeking some clarification

Tue, Aug 16, 2022 2:47 PM
Thanks, James. The McNeish & Kelley (2019) paper is one I was not aware of,
despite having read several other Kelley-authored articles.

Indeed, that paper provides a point of departure for a question on my work
on the Bangladesh RCT mask-intervention study mentioned earlier.

In short: they used cluster-affiliated dummy variables (read: the pairID
variable) in a fixed-effects model. For their linear run with baseline
controls, their Stata code was:

reghdfe posXsymp treatment proper_mask_base prop_resp_ill_base_2, ///
    absorb(pairID) vce(cluster union)

In translating this to a random-effects model using lmer, does it make
sense to include the pairID variable in the model *if* I treat the cluster
variable as its own random effect, as in:

library(lme4)
lme4_1_B = lmer(posXsymp ~ treatment + proper_mask_base + prop_resp_ill_base_2 +
                  pairID + (1 | union), data = bdata.raw1)  # lme4 package

As I mentioned previously, the lmer code above is a random-intercepts-only
model. This is by design: the clusters differ at the mean level on several
background variables to begin with, and the random intercepts capture those
differences. I am also making a conceptual case that, for the mask study to
generalize appropriately, one must assume or treat the clusters as
*randomly* selected from a larger population of clusters. Otherwise, any
marginal effect of the mask intervention (while perhaps more accurately
estimated in a fixed-effects model) will not generalize to any population
of human interactions.

My focal question nonetheless concerns how to treat the pairID variable in
my translation of their fixed-effects model to a random-effects model in
lmer. If I include the pairID variable as above, what does it reflect,
given that cluster is treated as a random effect? I have a separate model
in which I drop the pairID variable:

lme4_1 = lmer(posXsymp ~ treatment + proper_mask_base + prop_resp_ill_base_2 +
                (1 | union), data = bdata.raw1)  # lme4 package
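
For reference, the two lmer specifications can be written out as equations.
This is only a sketch in notation that does not appear in the original
analysis: the indices i (individual), j (union), and p (pair), the symbols
T, x, gamma, u, and e, and the assumption that unions are nested within
pairs are all mine.

```latex
% Model with pairID as fixed dummies (lme4_1_B)
y_{ijp} = \beta_0 + \beta_1 T_{jp} + \mathbf{x}_{ijp}^{\top}\boldsymbol{\beta}_x
          + \gamma_p + u_{jp} + e_{ijp}

% Model without pairID (lme4_1)
y_{ijp} = \beta_0 + \beta_1 T_{jp} + \mathbf{x}_{ijp}^{\top}\boldsymbol{\beta}_x
          + u_{jp} + e_{ijp}

% In both models
u_{jp} \sim N(0, \tau^2), \qquad e_{ijp} \sim N(0, \sigma^2)
```

Here gamma_p stands for the fixed pair effects (one coefficient per pairID
level, minus a reference level) and x collects the two baseline controls.
Under this reading, the second model has no term for pair-level
differences, so whatever the covariates do not explain would have to be
carried by the union intercepts u.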

*What is the substantive difference between these two models?* My sense is
that this gets at the separation of between- and within-cluster effects,
and that the pairID variable in their original Stata fixed-effects model (a
cluster-affiliated variable, in the language of McNeish & Kelley) is
analogous to the cluster variable itself. BUT in their model, a) the
assumption is that clusters are interchangeable (not randomly drawn from a
larger population of clusters); and b) one cannot separate within-cluster
from between-cluster effects, because their parameterization has no random
effects (in my case, random intercepts) for the clusters.
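
One way to probe the difference between the two specifications is a small
simulation. The sketch below is illustrative only and does not use the
study data: the data frame sim, the continuous outcome y, the sample
sizes, and the variance values are all hypothetical, and it assumes each
pair contains exactly two unions, one treated and one control. pairID is
coded as a factor so that it enters as a set of dummies, which is how I
read absorb(pairID).

```r
library(lme4)

set.seed(2022)

n_pairs     <- 50   # hypothetical number of matched pairs
n_per_union <- 30   # hypothetical number of individuals per union

# One row per union: two unions per pair, one control and one treated
unions <- data.frame(
  union     = factor(seq_len(2 * n_pairs)),
  pairID    = factor(rep(seq_len(n_pairs), each = 2)),
  treatment = rep(c(0, 1), times = n_pairs)
)

# Union-level deviation = shared pair effect + union-specific effect
pair_effect <- rnorm(n_pairs, sd = 1)
unions$u <- pair_effect[as.integer(unions$pairID)] +
  rnorm(nrow(unions), sd = 0.5)

# Expand to individuals and add individual-level noise
sim <- unions[rep(seq_len(nrow(unions)), each = n_per_union), ]
sim$y <- 0.3 * sim$treatment + sim$u + rnorm(nrow(sim), sd = 1)

# Same structure as the two models in question, minus the baseline controls
m_with    <- lmer(y ~ treatment + pairID + (1 | union), data = sim)
m_without <- lmer(y ~ treatment + (1 | union), data = sim)

# Extract the estimated variance of the union random intercepts
union_var <- function(m) {
  vc <- as.data.frame(VarCorr(m))
  vc$vcov[vc$grp == "union"]
}

c(with_pairID = union_var(m_with), without_pairID = union_var(m_without))
c(with_pairID    = unname(fixef(m_with)["treatment"]),
  without_pairID = unname(fixef(m_without)["treatment"]))
```

In a setup like this, I would expect the union intercept variance to be
smaller when pairID is in the model, since the pair dummies soak up the
between-pair part of the union-level variation, leaving only the
within-pair differences between unions for the random intercepts. Whether
the same holds for the real data depends on how pairs and unions are
actually structured.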

I realize this is a bit of a mouthful, but I was inspired to post after
reading the McNeish & Kelley paper and needed to get this out for my own
thinking.

-JD

On Mon, Aug 15, 2022 at 10:00 PM James Pustejovsky <jepusto at gmail.com>
wrote: