Skip to content

Best way to handle missing data?

4 messages · Bonnie Dixon, Malcolm Fairbrother

#
Thanks for this suggestion, Malcolm.  Here is an example in which I use
Amelia/Zelig with the "africa" data set that is available in Amelia.
I extracted the average standard deviation of the random effects from the
result produced by Zelig.  (In this example, I am using the version of the
summary.MI function found here:
http://stackoverflow.com/questions/16571580/multi-level-regression-model-on-multiply-imputed-data-set-in-r-amelia-zelig-l)
 Perhaps this approach will work for my purposes.

# Get packages
require(Amelia)
require(Zelig)
require(ZeligMultilevel)

# Look at the data
data(africa)
head(africa)
summary(africa)
help(africa)

# Impute the missing data
africa.am <-
  amelia(x = africa,
         m = 30,
         cs = "country",
         ts = "year",
         logs = "gdp_pc")
summary(africa.am)
plot(africa.am)
missmap(africa.am)
names(africa.am)

# Create a model:
africa.z <-
  zelig(formula = gdp_pc ~ infl + tag(infl | country),
        data = africa.am$imputations,
        model = "ls.mixed")

# The combined fixed effects:
summary(africa.z)

# The average standard deviation of the random intercepts and slopes:
ran.ints <-
  sapply(africa.z,
         function(x)
           attributes(VarCorr(x$result)$country)$stddev["(Intercept)"])
mean(ran.ints)

ran.slopes <-
  sapply(africa.z,
         function(x)
           attributes(VarCorr(x$result)$country)$stddev["infl"])
mean(ran.slopes)



On Fri, Feb 27, 2015 at 4:47 AM, Malcolm Fairbrother <
M.Fairbrother at bristol.ac.uk> wrote:

            

  
  
#
Hi Bonnie,

I was getting an error with that code, and finding Zelig cumbersome. So how
about just:

mods <- lapply(africa.am[[1]], function(x) lmer(gdp_pc ~ infl + (infl |
country), data=x))
rowMeans(sapply(mods, fixef)) # fixed effects
rowMeans(sapply(mods, function(x) as.data.frame(VarCorr(x))$vcov)) # random
effects variances
Reduce("+", sapply(mods, ranef))/length(mods) # random effects (intercepts
and slopes)
sqrt(rowMeans(sapply(mods, function(x) diag(vcov(x)))) +
diag(var(t(sapply(mods, fixef))))*(1+1/length(mods))) # SEs

I think that gets you everything you want?

The last row of code is my interpretation of: "The variance of the point
estimate is the average of the estimated variances from within each
completed data set, plus the sample variance in the point estimates across
the data sets (multiplied by a factor that corrects for the bias because m
< ?)." (from http://r.iq.harvard.edu/docs/amelia/amelia.pdf)

Does that correspond to what you were getting via Zelig? I'd be interested
to know that this worked, actually.

Cheers,
Malcolm
On 2 March 2015 at 20:37, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:

            

  
  
#
Thanks, Malcolm!  Yes, that extracts all the results I need from out of the
multiple imputation models.  This method also seems preferable over using
Zelig because it provides the flexibility to create the models using lme4,
or using a different package, such as nlme or MCMCglmm.  And, yes, the
results using this method are identical to the results from Zelig.

Bonnie

P.S.  By the way, that error that occurs when using summary() on the object
created by Zelig can be avoided by using the modified version of the
summary.MI function below (from
http://stackoverflow.com/questions/16571580/multi-level-regression-model-on-multiply-imputed-data-set-in-r-amelia-zelig-l
).

summary.MI <- function (object, subset = NULL, ...) {
  if (length(object) == 0) {
    stop('Invalid input for "subset"')
  } else {
    if (length(object) == 1) {
      return(summary(object[[1]]))
    }
  }

    getcoef <- function(obj) {
    # S4
    if (!isS4(obj)) {
      coef(obj)
    } else {
      if ("coef3" %in% slotNames(obj)) {
        obj at coef3
      } else {
        obj at coef
      }
    }
  }

  res <- list()

  # Get indices
  subset <- if (is.null(subset)) {
    1:length(object)
  } else {
    c(subset)
  }

  # Compute the summary of all objects
  for (k in subset) {
    res[[k]] <- summary(object[[k]])
  }


  # Answer
  ans <- list(
    zelig = object[[1]]$name,
    call = object[[1]]$result at call,
    all = res
  )

  coef1 <- se1 <- NULL

  for (k in subset) {
    tmp <- coef(res[[k]])
    coef1 <- cbind(coef1, tmp[, 1])
    se1 <- cbind(se1, tmp[, 2])
  }

  rows <- nrow(coef1)
  Q <- apply(coef1, 1, mean)
  U <- apply(se1^2, 1, mean)
  B <- apply((coef1-Q)^2, 1, sum)/(length(subset)-1)
  var <- U+(1+1/length(subset))*B
  nu <- (length(subset)-1)*(1+U/((1+1/length(subset))*B))^2

  coef.table <- matrix(NA, nrow = rows, ncol = 4)
  dimnames(coef.table) <- list(rownames(coef1),
                               c("Value", "Std. Error", "t-stat",
"p-value"))
  coef.table[,1] <- Q
  coef.table[,2] <- sqrt(var)
  coef.table[,3] <- Q/sqrt(var)
  coef.table[,4] <- pt(abs(Q/sqrt(var)), df=nu, lower.tail=F)*2
  ans$coefficients <- coef.table
  ans$cov.scaled <- ans$cov.unscaled <- NULL

  for (i in 1:length(ans)) {
    if (is.numeric(ans[[i]]) && !names(ans)[i] %in% c("coefficients")) {
      tmp <- NULL
      for (j in subset) {
        r <- res[[j]]
        tmp <- cbind(tmp, r[[pmatch(names(ans)[i], names(res[[j]]))]])
      }
      ans[[i]] <- apply(tmp, 1, mean)
    }
  }

  class(ans) <- "summaryMI"
  ans
}

On Tue, Mar 3, 2015 at 1:53 AM, Malcolm Fairbrother <
M.Fairbrother at bristol.ac.uk> wrote:

            

  
  
#
Great, glad that worked!

On the topic of MCMC estimation with multiply imputed data, my
understanding is that you can simply merge each (m=5, 30, or whatever) of
the separate chains, and then analyse those using whatever techniques you
would have applied to a single chain. I guess this could also be useful for
assessing convergence. Others are welcome to correct me on this, however.

- Malcolm
On 4 March 2015 at 06:42, Bonnie Dixon <bmdixon at ucdavis.edu> wrote: