
[R-meta] R-help: rma() in metafor drops levels in variables that correlates perfectly with levels in another variable

Dear Wey Wen,

I think your example doesn't quite show what you meant it to show. For the data you provided, all levels are estimable. I think you meant 'Study C' to have the values '1 2 1'. And then indeed, the coefficient for level 3 of variable 2 is not estimable. This is not something specific to metafor, but applies to linear models in general. For example, you will find lm() to behave in the same way (except that it shows the coefficient as NA, while metafor drops it from the output).

To illustrate:

dat <- read.table(header=TRUE, text = "
study x1 x2 x3
'Study A'   1    1     2
'Study B'   1    2     2
'Study C'   1    2     1
'Study D'   2    3     3
'Study E'   2    3     1
'Study F'   2    3     2
'Study G'   3    1     3
'Study H'   3    1     3
'Study I'   3    2     2")

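set.seed(1234)  # arbitrary seed, only so that the simulated outcome is reproducible
dat$y <- rnorm(9)
res <- lm(y ~ factor(x1) + factor(x2) + factor(x3), data=dat)
summary(res)    # the coefficient for factor(x2)3 is shown as NA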
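If you want to pull out the non-estimable coefficient programmatically rather than reading it off the summary, something along these lines should work. This is only a sketch: the data are rebuilt so the snippet runs on its own, the seed is arbitrary, and 'not_estimable' is just a name I made up.

```r
# Same design as above (with 'Study C' set to 1 2 1), rebuilt to be self-contained.
dat <- data.frame(
  x1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  x2 = c(1, 2, 2, 3, 3, 3, 1, 1, 2),
  x3 = c(2, 2, 1, 3, 1, 2, 3, 3, 2)
)
set.seed(1234)  # arbitrary seed
dat$y <- rnorm(9)

res <- lm(y ~ factor(x1) + factor(x2) + factor(x3), data = dat)

# Coefficients that lm() could not estimate are returned as NA.
b <- coef(res)
not_estimable <- names(b)[is.na(b)]
print(not_estimable)

# alias() reports how the aliased term depends on the other columns.
print(alias(res))
```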

This happens because the model matrix is not of full rank. Take a look at:

model.matrix(res)

You will see that the columns 'factor(x1)2' and 'factor(x2)3' are identical: studies D, E, and F are the only ones at level 2 of x1 and also the only ones at level 3 of x2.
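The rank deficiency can also be checked directly instead of by eye. A minimal sketch, assuming the same design as above (rebuilt here so it runs on its own; 'X', 'rank_X', and 'identical_cols' are just illustrative names):

```r
# Same design as above (with 'Study C' set to 1 2 1).
dat <- data.frame(
  x1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  x2 = c(1, 2, 2, 3, 3, 3, 1, 1, 2),
  x3 = c(2, 2, 1, 3, 1, 2, 3, 3, 2)
)

# Model matrix for the three factors; no outcome is needed for this check.
X <- model.matrix(~ factor(x1) + factor(x2) + factor(x3), data = dat)

# Number of columns versus the numerical rank from a QR decomposition.
n_cols <- ncol(X)
rank_X <- qr(X)$rank
print(c(columns = n_cols, rank = rank_X))

# The two dummy variables that cause the problem are element-wise equal.
identical_cols <- all(X[, "factor(x1)2"] == X[, "factor(x2)3"])
print(identical_cols)
```

Whenever the rank is smaller than the number of columns, at least one coefficient cannot be estimated uniquely.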

So, to answer your questions:

1) Yes, this is intentional.

2) It is indeed because of collinearity.

I don't know why you think that "If collinearity is an issue, it only applies to the specific level within a variable, not between variables." It may be worth reviewing: https://en.wikipedia.org/wiki/Multicollinearity

3) There are ways of estimating all coefficients, but (a) this requires taking a generalized inverse and (b) it leads to a non-unique solution, so one could argue that the results are arbitrary. I don't think this should be done (and apparently neither do the authors of lm(), which I think says a lot).
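To make the non-uniqueness in 3) concrete, here is a rough sketch. It does not compute a generalized inverse; it simply shows that any amount can be moved between the two identical columns without changing the fitted values, which is why a particular solution would be arbitrary. The seed, the value of 'shift', and the names 'b1', 'b2', 'fit1', and 'fit2' are my own choices for illustration.

```r
# Same design as above (with 'Study C' set to 1 2 1).
dat <- data.frame(
  x1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  x2 = c(1, 2, 2, 3, 3, 3, 1, 1, 2),
  x3 = c(2, 2, 1, 3, 1, 2, 3, 3, 2)
)
set.seed(1234)  # arbitrary seed
dat$y <- rnorm(9)

res <- lm(y ~ factor(x1) + factor(x2) + factor(x3), data = dat)
X   <- model.matrix(res)

# Solution 1: what lm() effectively uses, with the aliased coefficient set to 0.
b1 <- coef(res)
b1[is.na(b1)] <- 0

# Solution 2: move an arbitrary amount from one identical column to the other.
shift <- 5
b2 <- b1
b2["factor(x1)2"] <- b1["factor(x1)2"] - shift
b2["factor(x2)3"] <- b1["factor(x2)3"] + shift

# Different coefficient vectors, same fitted values (and hence same residuals).
fit1 <- drop(X %*% b1)
fit2 <- drop(X %*% b2)
print(cbind(b1, b2))
print(all.equal(fit1, fit2))
```

Only the sum of the two coefficients is determined by the data, so reporting separate values for them would suggest more than the data can support.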

Best,
Wolfgang