Comparing weighted and unweighted estimation RE: Methodological and practical issues about survey weights using lme4

Dear list members

I am still trying to understand weighted estimation of mixed models. In previous messages I was very kindly told that the weights argument in lmer() is for precision weights (not sampling weights). However, I am still not convinced about that and I would appreciate more thoughts about weighting.

The vignette of the WeMix package says: "The package lme4 fits mixed models when there are no weights or weights only for first-level units (Bates, Maechler, Bolker, & Walker, 2015) and is recommended when both of two conditions hold: no weights are above the first level, and cluster-robust standard errors are not required. WeMix can fit models with weights at every level of the model and also calculates cluster-robust standard errors that account for covariance between units in the same groups". See https://cran.r-project.org/web/packages/WeMix/
Additionally, the help page of the mix() function of WeMix explains: "When all weights above the individual level are 1, this is similar to a lmer and you should use lme4 because it is much faster."

I have seen explanations of that use, such as in the following link: https://www.r-bloggers.com/2017/06/sampling-weights-and-multilevel-modeling-in-r/

That use is different from the use in meta-analysis, weighting by inverse variance or sample size: https://www.metafor-project.org/doku.php/tips:rma_vs_lm_lme_lmer

There are also debates on the internet claiming that lmer() cannot be used for survey weights. Some of them are old, so I do not summarize them here.

The 'weights' argument of the lmer() function in lme4 is documented as follows: "weights: an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector. Prior weights are not normalized or standardized in any way. In particular, the diagonal of the residual covariance matrix is the squared residual standard deviation parameter sigma times the vector of inverse weights. Therefore, if the weights have relatively large magnitudes, then in order to compensate, the sigma parameter will also need to have a relatively large magnitude."
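
If I read this description correctly, it amounts to the following residual variance structure. The notation is only my own (w_i for the weight of observation i, c for an arbitrary positive constant) and is not taken from the lme4 documentation:

```latex
\operatorname{Var}(\varepsilon_i) = \frac{\sigma^2}{w_i},
\qquad
\frac{c\,\sigma^2}{c\,w_i} = \frac{\sigma^2}{w_i} \quad \text{for any } c > 0 .
```

So multiplying all the weights by c would be absorbed by a residual variance sigma^2 that is c times larger, which I take to be the "compensate" remark at the end of the quoted text.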

I apologise for my ignorance, but I do not understand the difference between precision weights and survey weights in this function. The expression "prior weights" does not help me with that.

Using the European Social Survey, I have used the "analysis weights" () normalized to sum to the sample size after deletion of missing data.
https://www.europeansocialsurvey.org/methodology/ess_methodology/data_processing_archiving/weighting.html
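
To be explicit about the normalization, and assuming I describe it correctly, with w_i the original analysis weight of observation i and n the number of observations left after deleting missing data (both symbols are my own notation):

```latex
\tilde{w}_i = n \, \frac{w_i}{\sum_{j=1}^{n} w_j},
\qquad
\sum_{i=1}^{n} \tilde{w}_i = n .
```

With this scaling the average normalized weight is 1.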
Using lmer() with weights, and WeMix with those weights for level 1 and unit weights (all equal to 1) for level 2, produces very similar estimates for the level-1 and level-2 variables. I use unit weights for level 2 because the level-2 units are European countries, so they are not a sample in the way that schools sampled within a country would be. Example of results for my main variables:

  *   Using lmer() without weights
                       Estimate        Std. Error    t value
level-2 variable1       0.20795196     0.06229626     3.338113
level-2 variable2      -0.26445932     0.06232801    -4.243025
level-2 variable2       0.46072085     0.05212811     8.838241

  *   Using lmer() and weights
                       Estimate        Std. Error    t value
level-2 variable1       0.194559163    0.06695520     2.9058113
level-2 variable2      -0.258452710    0.06771138    -3.8169759
level-2 variable2       0.466058046    0.05746252     8.1106439

  *   Using WeMix
                       Estimate        Std. Error    t value
level-2 variable1       0.1954945      0.0512900      3.8116
level-2 variable2      -0.2593960      0.0585994     -4.4266
level-2 variable2       0.4667014      0.0548841      8.5034

I am surprised by the similarity of results between weighted and unweighted lmer(). The four most populated countries account for 22% of the observations in my sample but 60% of the sum of the normalized weights. Therefore, I was expecting weighting to have more impact. In any case, comparing weighted lmer() and WeMix's function, we find similar results.

Apart from the software issues, my more general question was methodological. Solon et al. (2015) do not suggest using weights for causal analysis. Indeed, their paper starts with a paragraph that is worth repeating here: "At the beginning of their textbook's section on weighted estimation of regression models, Angrist and Pischke (2009, p. 91) acknowledge, 'Few things are as confusing to applied researchers as the role of sample weights. Even now, 20 years post-Ph.D., we read the section of the Stata manual on weighting with some dismay.' After years of discussing weighting issues with fellow economic researchers, we know that Angrist and Pischke are in excellent company. In published research, top-notch empirical scholars make conflicting choices about whether and how to weight and often provide little or no rationale for their choices. And in private discussions, we have found that accomplished researchers sometimes own up to confusion or declare demonstrably faulty reasons for their weighting choices."
http://jhr.uwpress.org/content/50/2/301

Therefore, some of the discussions available on the internet are probably wrong. I would appreciate further comments on these issues: 1) the advisability of using weights for causal analysis; 2) using survey weights in the lme4 package; 3) the similarity of weighted and unweighted results despite such a difference in the importance of the level-2 units (countries here).

Thank you very much. All the best,

Fernando Bruna