Bootstrap the variance difference

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120206/559e7bb6/attachment-0001.pl>

I can't do it this way because the boot function only allows for one data
set to be inputted.
So you tried one combined dataset and subset?
Hi Kevin,

I worry a bit that your model is correctly specified and you are doing
what you really want to be doing, but that is a bit of a different
question (perhaps worth checking).  In any case, your models did not
run on my system with your example data.  Supposing all works well in
the full data and you are confident with what you have, then this is
one approach to dealing with the boot issue:

rather than pass two data sets, pass one wide data set in.  Use
slightly different names and pass in two formulae to use the correct
variables.

Hope this helps,

Josh

P.S. If real data is large, this may be somewhat slow and would easily
benefit from parallelizing if you have multiple cores and sufficient
memory----check out the parallel option to boot().

#####################################################

dat1 <- read.csv(file.choose()) ## final
dat2 <- read.csv(file.choose()) ## initial

colnames(dat1) <- paste("f", colnames(dat1), sep = '')
colnames(dat2) <- paste("i", colnames(dat2), sep = '')

dat <- cbind(dat2, dat1)

varcomp <- function (lformula, dat, indices) {
  d <- dat[indices, ]
  fit1 <- lmer(lformula[[1]], data=d) #linear model
  fit2 <- lmer(lformula[[2]], data=d) #linear model
  a <- (attr (VarCorr(fit1), "sc")^2) #output variance estimation
  b <- (attr (VarCorr(fit2), "sc")^2) #output variance estimation
  drv <- (a - b) #difference between the variance estimations
  return(drv)
}

require(lme4)
require(boot)

system.time(ip1.boot <- boot (data = dat, statistic = varcomp, R =
100, lformula = list(
  initial = iCNPC ~ (1 | iCell.line) + (1 | iDNA.extract) + iCell.line,
  final = fCNPC ~ (1 | fCell.line) + (1 | fDNA.extract) + fCell.line)))
?I asked this question on Stack Exchange, but I think it might be too
specialized. ?Hopefully someone in the mixed model group can help me out.

I want to be able to bootstrap the variance differences between two data
sets obtained at different times while taking out the error in a random
effect.

I have 2 sets of experimental data, where the data was measured at 2 time
points (initial and final). I also have a set of simulation data. I want to
compare the variance of the simulated date with the variance difference
between the experimental data (final - initial). The idea is to get
confidence intervals from the bootstrap to compare the experimental data
with the simulation.

I am having trouble making the statistic for the bootstrap function in the
boot package for R. So far I have.

varcomp <- function ( formula, data, indices ) {
? ?d <- data[indices,] #sample for boot
? ?fit <- lmer(formula, data=d) #linear model
? ?res.var = (attr (VarCorr(fit), "sc")^2) # variance estimation
? ?return(res.var)
? ?}

But this function only returns the variance of a single data set. I want to
be able to input 2 sets of data and have it return the difference between
the two data sets' variance.

When I try something like:

varcomp <- function ( formula, data1, data2, indices ) {
d1 <- data1[indices,] #sample for boot
d2 <- data2[indices,] #sample for boot
fit1 <- lmer(formula, data=d1) #linear model
fit2 <- lmer(formula, data=d2) #linear model
a = (attr (VarCorr(fit1), "sc")^2) #output variance estimation
b = (attr (VarCorr(fit2), "sc")^2) #output variance estimation
drv = a - b #difference between the variance estimations
return(drv)
}

I would then put it into boot such as:

ip1.boot <- boot ( data = ip1, statistic=varcomp, R=100,
formula=CNPC~(1|Cell.line:DNA.extract)+Cell.line)

I can't do it this way because the boot function only allows for one data
set to be inputted.

*Does anyone know how to create the correct statistic function for this?*

An example of the data can also be downloaded
here<http://www.mediafire.com/file/68a3ro1cfneiy2r/data.zip.zip>(2 csv
files zipped 1.22KB.)

My data looks something like the following:

Initial

? ? ? Cell.line ? ?Time DNA.extract ? Gene ? ? ?CNPC
1 ? ? ? ? ?9 initial ? ? ? ? ? 1 atubP1 1778.4589
2 ? ? ? ? ?9 initial ? ? ? ? ? 1 atubP1 2108.0552
3 ? ? ? ? ?9 initial ? ? ? ? ? 1 atubP1 2118.6725
4 ? ? ? ? ?9 initial ? ? ? ? ? 2 atubP1 2018.6593
5 ? ? ? ? ?9 initial ? ? ? ? ? 2 atubP1 1935.9008
6 ? ? ? ? ?9 initial ? ? ? ? ? 2 atubP1 1749.9158
7 ? ? ? ? ?9 initial ? ? ? ? ? 3 atubP1 1524.7475
8 ? ? ? ? ?9 initial ? ? ? ? ? 3 atubP1 1532.9781
9 ? ? ? ? ?9 initial ? ? ? ? ? 3 atubP1 1693.3098
10 ? ? ? ?17 initial ? ? ? ? ? 1 atubP1 1076.4720
11 ? ? ? ?17 initial ? ? ? ? ? 1 atubP1 1101.3315
12 ? ? ? ?17 initial ? ? ? ? ? 1 atubP1 1185.3606
13 ? ? ? ?17 initial ? ? ? ? ? 2 atubP1 1131.1118
14 ? ? ? ?17 initial ? ? ? ? ? 2 atubP1 ?892.7087
15 ? ? ? ?17 initial ? ? ? ? ? 2 atubP1 1028.5465
16 ? ? ? ?17 initial ? ? ? ? ? 3 atubP1 ?887.9972
17 ? ? ? ?17 initial ? ? ? ? ? 3 atubP1 ?732.9646
18 ? ? ? ?17 initial ? ? ? ? ? 3 atubP1 ?680.6724

Final

? Cell.line ?Time DNA.extract ? Gene ? ? ?CNPC
1 ? ? ? ? ?9 final ? ? ? ? ? 1 atubP1 1262.2378
2 ? ? ? ? ?9 final ? ? ? ? ? 1 atubP1 1261.9858
3 ? ? ? ? ?9 final ? ? ? ? ? 1 atubP1 1390.6873
4 ? ? ? ? ?9 final ? ? ? ? ? 2 atubP1 1539.7180
5 ? ? ? ? ?9 final ? ? ? ? ? 2 atubP1 1510.5405
6 ? ? ? ? ?9 final ? ? ? ? ? 2 atubP1 1443.1767
7 ? ? ? ? ?9 final ? ? ? ? ? 3 atubP1 1456.2050
8 ? ? ? ? ?9 final ? ? ? ? ? 3 atubP1 1578.6396
9 ? ? ? ? ?9 final ? ? ? ? ? 3 atubP1 1656.1822
10 ? ? ? ?17 final ? ? ? ? ? 1 atubP1 1462.5179
11 ? ? ? ?17 final ? ? ? ? ? 1 atubP1 1580.9956
12 ? ? ? ?17 final ? ? ? ? ? 1 atubP1 1255.9020
13 ? ? ? ?17 final ? ? ? ? ? 2 atubP1 ?886.7579
14 ? ? ? ?17 final ? ? ? ? ? 2 atubP1 ?581.8116
15 ? ? ? ?17 final ? ? ? ? ? 2 atubP1 ?722.0526
16 ? ? ? ?17 final ? ? ? ? ? 3 atubP1 4168.7895
17 ? ? ? ?17 final ? ? ? ? ? 3 atubP1 3266.2105
18 ? ? ? ?17 final ? ? ? ? ? 3 atubP1 4219.5645

? ? ? ?[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
Hi Kevin,

add an indicator to each dataset and then rbind them. The indicator
variable then distinguishes between the sources. Pass it to boot as
the strata= argument.  Then also use that indicator in the varcomp
function to distinguish between the two datasets, e.g. using the
subset= arguments in lmer..

Cheers

Andrew
 I asked this question on Stack Exchange, but I think it might be too
specialized.  Hopefully someone in the mixed model group can help me out.