
[R-meta] Prediction Intervals for estimates of random slope rma.mv model with predict.rma()

7 messages · Wolfgang Viechtbauer, Zac Robinson, James Pustejovsky

#
Dear Zac,

Please respond to the mailing list, not just to me personally.

Just summing up all the variances isn't correct - the correct computation of the prediction interval for such a random-effects structure is unfortunately more complex.

Best,
Wolfgang
7 days later
#
Wolfgang,

Apologies for the incorrect way of replying - hopefully I got it right this
time. If I'm understanding things correctly, the code can (potentially)
give me what I'm after, but I'm just specifying the sigma / SD for
the prediction interval incorrectly (i.e., the variances can't just be
summed).

Thus, I guess my follow-up question is - how would you recommend going
about this calculation? The closest thing I can find in the literature with
a similar variance calculation is in the context of R2 from Johnson et al.,
where they calculate the "mean random effects variance" which seems like it
could be useful here:

https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12225

Let me know what you think and thank you for your time and expertise,

Zac

On Thu, Mar 14, 2024 at 9:23 AM Viechtbauer, Wolfgang (NP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
#
It looks like you are trying to generate a prediction interval for a new
effect size given a specific set of predictors, based on a model with
random slope terms. Such a model might look like the following:

Y_ij = b0 + b1 X_ij + u_0j + u_1j X_ij + v_ij + e_ij,

where (u_0j, u_1j) are the study-level random effects with
variance-covariance matrix T, v_ij is a within-study random effect with
variance omega^2, and e_ij is the sampling error with known variance V_ij.
Your goal might be to predict a new effect size for a study with X_ij = a,
i.e., to find the distribution of

delta_ij = b0 + b1 * (a) + u_0j + u_1j * (a) + v_ij

The tricky bit with these models (and probably the reason that PIs are not
currently implemented in metafor) is that the study-level random effects
terms can be correlated and you need to take the variances and covariances
into account to get a prediction interval.

My understanding is that metafor calculates prediction intervals by taking
the predicted value given the fixed effects terms, plus or minus a critical
value (call it z_p) times a quantity S, where S^2 is the sum of the
variance of the predicted value and the variance of the random effects at
the predicted value. For the example above:

PI = [bhat0 + bhat1 * (a)] +/- z_p * S,
where S = sqrt[ Var(bhat0 + bhat1 * (a)) + Var(u_0j + u_1j * (a) + v_ij) ]

You can calculate the predicted value directly from the beta estimates that
come out of metafor. You can calculate the first component of S^2 by using
the vcov matrix that comes out of metafor, and the second component from
the estimated variance components.

To make the calculations simpler, it would be convenient to re-center the
predictor at the value for which you're trying to get a prediction
interval. In other words, if you set X'_ij = X_ij - a, then you're
predicting at the value X'_ij = 0, and the prediction interval simplifies to

PI = bhat0 +/- z_p * sqrt[ Var(bhat0) + Var(u_0j + v_ij)]

Example code using Konstantopoulos2011 below.

James


# load packages and data
library(metafor)
library(tidyverse)

data("dat.konstantopoulos2011")
df <-
  dat.konstantopoulos2011 %>%
  mutate(
    factor=sample(c("A","B"),nrow(.),replace=TRUE)
  )

# RANDOM INTERCEPT --------------------------------------------------------
# fit random intercept model
m1 <- rma.mv(
  yi ~ year + factor, V = vi,
  random = list(~1|study, ~1|district, ~1|school),
  data = df, method = "REML"
)

# Prediction interval at year = 1989 (for Taylor) and factor = A
predict(m1, newmods = cbind(1989, 0), level = 80)

# Same thing, manual calculation
vec <- c(1, 1989, 0)
pred <- sum(m1$beta * vec)
Vb <- as.numeric(t(vec) %*% vcov(m1) %*% vec)
V_RE <- sum(m1$sigma2)
S <- sqrt(Vb + V_RE)
z_crit <- qnorm((1 + 0.8) / 2)
pred + c(-1, 1) * z_crit * S

# RANDOM SLOPE (x1) -------------------------------------------------------
# fit single random slope model
# Re-center the year predictor
df$year_c <- df$year - 1989
m2 <- rma.mv(
  yi ~ year_c + factor, vi,
  random = list(~ year_c | study, ~1|district, ~1|school),
  data = df, method = "REML", struct = "GEN"
)

# Does not return prediction interval
predict(m2, newmods = cbind(0, 0), level = 80)

# Calculate prediction interval for year_c = 0, factor = A
vec <- c(1, 0, 0)
pred <- sum(m2$beta * vec)
Vb <- as.numeric(t(vec) %*% vcov(m2) %*% vec)
V_RE <- sum(m2$sigma2) + m2$tau2[1] # using only the random intercept at the study level
S <- sqrt(Vb + V_RE)
z_crit <- qnorm((1 + 0.8) / 2)
pred + c(-1, 1) * z_crit * S


On Thu, Mar 21, 2024 at 11:19 AM Zac Robinson via R-sig-meta-analysis <
r-sig-meta-analysis at r-project.org> wrote:
#
James,

Thank you very much for your reply - it was extremely helpful!

I have taken your code, modified it a bit, and attempted to create a
function that works alongside emmprep / emmeans. From my tests it seems to
get the same results as your manual calculations, but let me know if I'm
missing something - the only thing I wondered is whether "rho" and "phi"
need to be incorporated directly, as well as the covariances you mentioned.

Thanks again,
Zac

# load packages and data
library(metafor)
library(tidyverse)
library(emmeans)

data("dat.konstantopoulos2011")
df <-
  dat.konstantopoulos2011 %>%
  mutate(
    factor=sample(c("A","B"),nrow(.),replace=TRUE)
  )

# create function
get.pi.emmeans.rma.mv <- function(model, mm){

  tmp <- summary(mm)
  tmp <- tmp[ , ]
  test.stat <- stats::qt(mm@misc$level+(1-mm@misc$level)/2, tmp$df[[1]])

  n_g <- colnames(model$mf.g)
  n_g2 <- ifelse(is.null(n_g),0,n_g[n_g!="intrcpt"&n_g!="outer"])
  l_g <- ifelse(is.null(n_g),0,mm@linfct[,n_g2])
  v_g <- ifelse(is.null(n_g),0,diag(model$G)%>%as.data.frame()%>%.[n_g2,])

  n_h <- colnames(model$mf.h)
  n_h2 <- ifelse(is.null(n_h),0,n_h[n_h!="intrcpt"&n_h!="outer"])
  l_h <- ifelse(is.null(n_h),0,mm@linfct[,n_h2])
  v_h <- ifelse(is.null(n_h),0,diag(model$H)%>%as.data.frame()%>%.[n_h2,])

  V_b <- tmp$SE^2
  V_ri <- sum(model$sigma2 + model$tau2[1] + model$gamma2[1])
  V_g <- l_g*v_g
  V_h <- l_h*v_h

  PI <- test.stat * sqrt(V_b + V_ri + V_g + V_h)

  tmp$lower.PI <- tmp$emmean - PI
  tmp$upper.PI <- tmp$emmean + PI

  return(tmp)
}


# RANDOM INTERCEPT --------------------------------------------------------
# fit random intercept model
m1 <- rma.mv(
  yi ~ year + factor, V = vi,
  random = list(~1|district, ~1|interaction(district,school)),
  data = df, method = "REML"
)

# Prediction interval at year = 1989 (for Taylor) and factor = A
predict(m1, newmods = cbind(1989, 0), level = 95)

# Using function
emmprep(m1,
        at=list(year=1989,
                factor="A"))%>%
  emmeans(~year+factor,
          weights = "prop")%>%
  get.pi.emmeans.rma.mv(model=m1) #same

# RANDOM SLOPE (x1) -------------------------------------------------------
# fit single random slope model
m2 <- rma.mv(
  yi ~ year + factor, vi,
  random = list(~year|district, ~1|interaction(district,school)),
  data = df, method = "REML", struct = "GEN",
  control = list(optimizer = "optim")
)

# Calculate prediction interval for year = 1989, factor = A
vec <- c(1, 1989, 0)
pred <- sum(m2$beta * vec)
Vb <- as.numeric(t(vec) %*% vcov(m2) %*% vec)
V_RE <- sum(m2$sigma2) + m2$tau2[1] + m2$tau2[2]*1989
S <- sqrt(Vb + V_RE)
z_crit <- qnorm((1 + 0.95) / 2)
pred + c(-1, 1) * z_crit * S

# Using function
emmprep(m2,
        at=list(year=1989,
                factor="A"))%>%
  emmeans(~year+factor,
          weights = "prop")%>%
  get.pi.emmeans.rma.mv(model=m2) # same
On Thu, Mar 21, 2024 at 1:00 PM James Pustejovsky <jepusto at gmail.com> wrote:
#
This doesn't look correct to me. The step of centering the predictor is
essential for the code that I posted to work. If you want to be able to
predict at arbitrary points, then you'll need to take the covariance
between the random effects into account, calculating

Var(u_0j + u_1j * (a) + v_ij)
  = Var(u_0j) + a^2 * Var(u_1j) + 2 * a * Cov(u_0j, u_1j) + Var(v_ij)
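As a numeric sketch of that expansion (all variance components below are
hypothetical placeholders, not estimates from the models in this thread), the
direct formula agrees with the equivalent quadratic form z T z', which is the
form that generalizes to arbitrary random-effects structures:

```r
# Var(u_0j + u_1j * a + v_ij) expanded at predictor value a;
# all variance components below are hypothetical, not fitted values
tau2_0 <- 0.05    # hypothetical Var(u_0j)
tau2_1 <- 0.002   # hypothetical Var(u_1j)
cov_01 <- -0.004  # hypothetical Cov(u_0j, u_1j)
omega2 <- 0.01    # hypothetical Var(v_ij)
a <- 2            # predictor value at which to predict

# direct expansion
V_re_direct <- tau2_0 + a^2 * tau2_1 + 2 * a * cov_01 + omega2

# equivalent quadratic form z T z' with z = (1, a); under struct = "GEN",
# metafor stores the corresponding T matrix in model$G
Tmat <- matrix(c(tau2_0, cov_01, cov_01, tau2_1), nrow = 2)
z <- c(1, a)
V_re_quad <- as.numeric(t(z) %*% Tmat %*% z) + omega2

all.equal(V_re_direct, V_re_quad)  # TRUE
```

Note that when a = 0 (i.e., predicting at the centered value), the slope and
covariance terms drop out, which is why the re-centering trick works.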

James


On Thu, Mar 21, 2024 at 6:04 PM Zac Robinson <zacrobinson2015 at gmail.com>
wrote:
#
Hey James,

Thanks again for the follow up - I was pretty sure I was missing something.
To take this one step further, let's say I had multiple random slopes
at the same level - would the covariance between them also need to be taken
into account? How would the equation look in that case?

Thank you very much for your time and expertise,
Zac

On Fri, Mar 22, 2024 at 10:03 AM James Pustejovsky <jepusto at gmail.com>
wrote:
3 days later
#
It would just be variance algebra. Say that the row-vector of fixed effect
predictors is x_ij and the row-vector of random effects terms is z_ij. The
model is

Y_ij = x_ij beta + z_ij u_j + e_ij

where the variance-covariance matrix of u_j is Tau and the sampling
variance of beta-hat is Vbeta.

The prediction interval at a given set of values would be
mu = x_ij beta-hat
sigma_sq = Var(x_ij beta-hat) + Var(z_ij u_j) = x_ij Vbeta x_ij' + z_ij Tau z_ij'
mu +/- crit * sqrt(sigma_sq)

If you have multiple sets of independent random effects terms, then each
set would need to be added. For example, say that you've got

Y_ij = x_ij beta + z0_ij u_j + z1_ij v_ij + e_ij

where Var(u_j) = Tau and Var(v_ij) = Omega and the u_j's are independent of
the v_ij's. Then sigma_sq would need to be calculated as

sigma_sq = Var(x_ij beta-hat) + Var(z0_ij u_j) + Var(z1_ij v_ij)
         = x_ij Vbeta x_ij' + z0_ij Tau z0_ij' + z1_ij Omega z1_ij'
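A minimal sketch of this general calculation, with hypothetical numbers
standing in for the fitted quantities (x_ij, beta-hat, Vbeta, Tau, and Omega
below are all made up for illustration, not taken from a fitted model):

```r
# General PI calculation with two independent sets of random effects;
# every number below is hypothetical, standing in for fitted quantities
x     <- c(1, 0.5)                          # fixed-effects design row x_ij
beta  <- c(0.30, 0.10)                      # hypothetical beta-hat
Vbeta <- matrix(c(0.004, 0.001,
                  0.001, 0.003), nrow = 2)  # hypothetical Var(beta-hat)

z0  <- c(1, 0.5)                            # study-level random-effects row z0_ij
Tau <- matrix(c(0.05, -0.004,
                -0.004, 0.002), nrow = 2)   # hypothetical Var(u_j)

z1    <- 1                                  # within-study random-effects row z1_ij
Omega <- matrix(0.01)                       # hypothetical Var(v_ij)

# mu = x_ij beta-hat; sigma_sq = sum of the three quadratic forms
mu <- sum(x * beta)
sigma_sq <- as.numeric(t(x) %*% Vbeta %*% x) +
  as.numeric(t(z0) %*% Tau %*% z0) +
  as.numeric(t(z1) %*% Omega %*% z1)

crit <- qnorm(0.975)                        # 95% interval
PI <- mu + c(-1, 1) * crit * sqrt(sigma_sq)
```

Each independent set of random effects simply contributes its own quadratic
form to sigma_sq, so the pattern extends to any number of levels.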

James