Skip to content

Bootstrap the variance difference

4 messages · Kevin Spring, David Duffy, Joshua Wiley +1 more

#
On Mon, 6 Feb 2012, Kevin Spring wrote:

            
So you tried one combined dataset and subset?
#
Hi Kevin,

I worry a bit that your model is correctly specified and you are doing
what you really want to be doing, but that is a bit of a different
question (perhaps worth checking).  In any case, your models did not
run on my system with your example data.  Supposing all works well in
the full data and you are confident with what you have, then this is
one approach to dealing with the boot issue:

rather than pass two data sets, pass one wide data set in.  Use
slightly different names and pass in two formulae to use the correct
variables.

Hope this helps,

Josh

P.S. If real data is large, this may be somewhat slow and would easily
benefit from parallelizing if you have multiple cores and sufficient
memory----check out the parallel option to boot().

#####################################################

dat1 <- read.csv(file.choose()) ## final
dat2 <- read.csv(file.choose()) ## initial

colnames(dat1) <- paste("f", colnames(dat1), sep = '')
colnames(dat2) <- paste("i", colnames(dat2), sep = '')

dat <- cbind(dat2, dat1)

varcomp <- function (lformula, dat, indices) {
  d <- dat[indices, ]
  fit1 <- lmer(lformula[[1]], data=d) #linear model
  fit2 <- lmer(lformula[[2]], data=d) #linear model
  a <- (attr (VarCorr(fit1), "sc")^2) #output variance estimation
  b <- (attr (VarCorr(fit2), "sc")^2) #output variance estimation
  drv <- (a - b) #difference between the variance estimations
  return(drv)
}

require(lme4)
require(boot)

system.time(ip1.boot <- boot (data = dat, statistic = varcomp, R =
100, lformula = list(
  initial = iCNPC ~ (1 | iCell.line) + (1 | iDNA.extract) + iCell.line,
  final = fCNPC ~ (1 | fCell.line) + (1 | fDNA.extract) + fCell.line)))
On Mon, Feb 6, 2012 at 1:32 PM, Kevin Spring <kevinjspring at gmail.com> wrote:

  
    
#
Hi Kevin,

add an indicator to each dataset and then rbind them. The indicator
variable then distinguishes between the sources. Pass it to boot as
the strata= argument.  Then also use that indicator in the varcomp
function to distinguish between the two datasets, e.g. using the
subset= arguments in lmer..

Cheers

Andrew
On Mon, Feb 06, 2012 at 03:32:35PM -0600, Kevin Spring wrote: