An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120206/559e7bb6/attachment-0001.pl>
Bootstrap the variance difference
4 messages · Kevin Spring, David Duffy, Joshua Wiley +1 more
On Mon, 6 Feb 2012, Kevin Spring wrote:
I can't do it this way because the boot function only allows for one data set to be inputted.
So you tried one combined dataset and subset?
Hi Kevin,
I worry a bit that your model is correctly specified and you are doing
what you really want to be doing, but that is a bit of a different
question (perhaps worth checking). In any case, your models did not
run on my system with your example data. Supposing all works well in
the full data and you are confident with what you have, then this is
one approach to dealing with the boot issue:
rather than pass two data sets, pass one wide data set in. Use
slightly different names and pass in two formulae to use the correct
variables.
Hope this helps,
Josh
P.S. If real data is large, this may be somewhat slow and would easily
benefit from parallelizing if you have multiple cores and sufficient
memory----check out the parallel option to boot().
#####################################################
dat1 <- read.csv(file.choose()) ## final
dat2 <- read.csv(file.choose()) ## initial
colnames(dat1) <- paste("f", colnames(dat1), sep = '')
colnames(dat2) <- paste("i", colnames(dat2), sep = '')
dat <- cbind(dat2, dat1)
varcomp <- function (lformula, dat, indices) {
d <- dat[indices, ]
fit1 <- lmer(lformula[[1]], data=d) #linear model
fit2 <- lmer(lformula[[2]], data=d) #linear model
a <- (attr (VarCorr(fit1), "sc")^2) #output variance estimation
b <- (attr (VarCorr(fit2), "sc")^2) #output variance estimation
drv <- (a - b) #difference between the variance estimations
return(drv)
}
require(lme4)
require(boot)
system.time(ip1.boot <- boot (data = dat, statistic = varcomp, R =
100, lformula = list(
initial = iCNPC ~ (1 | iCell.line) + (1 | iDNA.extract) + iCell.line,
final = fCNPC ~ (1 | fCell.line) + (1 | fDNA.extract) + fCell.line)))
On Mon, Feb 6, 2012 at 1:32 PM, Kevin Spring <kevinjspring at gmail.com> wrote:
?I asked this question on Stack Exchange, but I think it might be too
specialized. ?Hopefully someone in the mixed model group can help me out.
I want to be able to bootstrap the variance differences between two data
sets obtained at different times while taking out the error in a random
effect.
I have 2 sets of experimental data, where the data was measured at 2 time
points (initial and final). I also have a set of simulation data. I want to
compare the variance of the simulated date with the variance difference
between the experimental data (final - initial). The idea is to get
confidence intervals from the bootstrap to compare the experimental data
with the simulation.
I am having trouble making the statistic for the bootstrap function in the
boot package for R. So far I have.
varcomp <- function ( formula, data, indices ) {
? ?d <- data[indices,] #sample for boot
? ?fit <- lmer(formula, data=d) #linear model
? ?res.var = (attr (VarCorr(fit), "sc")^2) # variance estimation
? ?return(res.var)
? ?}
But this function only returns the variance of a single data set. I want to
be able to input 2 sets of data and have it return the difference between
the two data sets' variance.
When I try something like:
varcomp <- function ( formula, data1, data2, indices ) {
d1 <- data1[indices,] #sample for boot
d2 <- data2[indices,] #sample for boot
fit1 <- lmer(formula, data=d1) #linear model
fit2 <- lmer(formula, data=d2) #linear model
a = (attr (VarCorr(fit1), "sc")^2) #output variance estimation
b = (attr (VarCorr(fit2), "sc")^2) #output variance estimation
drv = a - b #difference between the variance estimations
return(drv)
}
I would then put it into boot such as:
ip1.boot <- boot ( data = ip1, statistic=varcomp, R=100,
formula=CNPC~(1|Cell.line:DNA.extract)+Cell.line)
I can't do it this way because the boot function only allows for one data
set to be inputted.
*Does anyone know how to create the correct statistic function for this?*
An example of the data can also be downloaded
here<http://www.mediafire.com/file/68a3ro1cfneiy2r/data.zip.zip>(2 csv
files zipped 1.22KB.)
My data looks something like the following:
Initial
? ? ? Cell.line ? ?Time DNA.extract ? Gene ? ? ?CNPC
1 ? ? ? ? ?9 initial ? ? ? ? ? 1 atubP1 1778.4589
2 ? ? ? ? ?9 initial ? ? ? ? ? 1 atubP1 2108.0552
3 ? ? ? ? ?9 initial ? ? ? ? ? 1 atubP1 2118.6725
4 ? ? ? ? ?9 initial ? ? ? ? ? 2 atubP1 2018.6593
5 ? ? ? ? ?9 initial ? ? ? ? ? 2 atubP1 1935.9008
6 ? ? ? ? ?9 initial ? ? ? ? ? 2 atubP1 1749.9158
7 ? ? ? ? ?9 initial ? ? ? ? ? 3 atubP1 1524.7475
8 ? ? ? ? ?9 initial ? ? ? ? ? 3 atubP1 1532.9781
9 ? ? ? ? ?9 initial ? ? ? ? ? 3 atubP1 1693.3098
10 ? ? ? ?17 initial ? ? ? ? ? 1 atubP1 1076.4720
11 ? ? ? ?17 initial ? ? ? ? ? 1 atubP1 1101.3315
12 ? ? ? ?17 initial ? ? ? ? ? 1 atubP1 1185.3606
13 ? ? ? ?17 initial ? ? ? ? ? 2 atubP1 1131.1118
14 ? ? ? ?17 initial ? ? ? ? ? 2 atubP1 ?892.7087
15 ? ? ? ?17 initial ? ? ? ? ? 2 atubP1 1028.5465
16 ? ? ? ?17 initial ? ? ? ? ? 3 atubP1 ?887.9972
17 ? ? ? ?17 initial ? ? ? ? ? 3 atubP1 ?732.9646
18 ? ? ? ?17 initial ? ? ? ? ? 3 atubP1 ?680.6724
Final
? Cell.line ?Time DNA.extract ? Gene ? ? ?CNPC
1 ? ? ? ? ?9 final ? ? ? ? ? 1 atubP1 1262.2378
2 ? ? ? ? ?9 final ? ? ? ? ? 1 atubP1 1261.9858
3 ? ? ? ? ?9 final ? ? ? ? ? 1 atubP1 1390.6873
4 ? ? ? ? ?9 final ? ? ? ? ? 2 atubP1 1539.7180
5 ? ? ? ? ?9 final ? ? ? ? ? 2 atubP1 1510.5405
6 ? ? ? ? ?9 final ? ? ? ? ? 2 atubP1 1443.1767
7 ? ? ? ? ?9 final ? ? ? ? ? 3 atubP1 1456.2050
8 ? ? ? ? ?9 final ? ? ? ? ? 3 atubP1 1578.6396
9 ? ? ? ? ?9 final ? ? ? ? ? 3 atubP1 1656.1822
10 ? ? ? ?17 final ? ? ? ? ? 1 atubP1 1462.5179
11 ? ? ? ?17 final ? ? ? ? ? 1 atubP1 1580.9956
12 ? ? ? ?17 final ? ? ? ? ? 1 atubP1 1255.9020
13 ? ? ? ?17 final ? ? ? ? ? 2 atubP1 ?886.7579
14 ? ? ? ?17 final ? ? ? ? ? 2 atubP1 ?581.8116
15 ? ? ? ?17 final ? ? ? ? ? 2 atubP1 ?722.0526
16 ? ? ? ?17 final ? ? ? ? ? 3 atubP1 4168.7895
17 ? ? ? ?17 final ? ? ? ? ? 3 atubP1 3266.2105
18 ? ? ? ?17 final ? ? ? ? ? 3 atubP1 4219.5645
? ? ? ?[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/
Hi Kevin, add an indicator to each dataset and then rbind them. The indicator variable then distinguishes between the sources. Pass it to boot as the strata= argument. Then also use that indicator in the varcomp function to distinguish between the two datasets, e.g. using the subset= arguments in lmer.. Cheers Andrew
On Mon, Feb 06, 2012 at 03:32:35PM -0600, Kevin Spring wrote:
I asked this question on Stack Exchange, but I think it might be too
specialized. Hopefully someone in the mixed model group can help me out.
I want to be able to bootstrap the variance differences between two data
sets obtained at different times while taking out the error in a random
effect.
I have 2 sets of experimental data, where the data was measured at 2 time
points (initial and final). I also have a set of simulation data. I want to
compare the variance of the simulated date with the variance difference
between the experimental data (final - initial). The idea is to get
confidence intervals from the bootstrap to compare the experimental data
with the simulation.
I am having trouble making the statistic for the bootstrap function in the
boot package for R. So far I have.
varcomp <- function ( formula, data, indices ) {
d <- data[indices,] #sample for boot
fit <- lmer(formula, data=d) #linear model
res.var = (attr (VarCorr(fit), "sc")^2) # variance estimation
return(res.var)
}
But this function only returns the variance of a single data set. I want to
be able to input 2 sets of data and have it return the difference between
the two data sets' variance.
When I try something like:
varcomp <- function ( formula, data1, data2, indices ) {
d1 <- data1[indices,] #sample for boot
d2 <- data2[indices,] #sample for boot
fit1 <- lmer(formula, data=d1) #linear model
fit2 <- lmer(formula, data=d2) #linear model
a = (attr (VarCorr(fit1), "sc")^2) #output variance estimation
b = (attr (VarCorr(fit2), "sc")^2) #output variance estimation
drv = a - b #difference between the variance estimations
return(drv)
}
I would then put it into boot such as:
ip1.boot <- boot ( data = ip1, statistic=varcomp, R=100,
formula=CNPC~(1|Cell.line:DNA.extract)+Cell.line)
I can't do it this way because the boot function only allows for one data
set to be inputted.
*Does anyone know how to create the correct statistic function for this?*
An example of the data can also be downloaded
here<http://www.mediafire.com/file/68a3ro1cfneiy2r/data.zip.zip>(2 csv
files zipped 1.22KB.)
My data looks something like the following:
Initial
Cell.line Time DNA.extract Gene CNPC
1 9 initial 1 atubP1 1778.4589
2 9 initial 1 atubP1 2108.0552
3 9 initial 1 atubP1 2118.6725
4 9 initial 2 atubP1 2018.6593
5 9 initial 2 atubP1 1935.9008
6 9 initial 2 atubP1 1749.9158
7 9 initial 3 atubP1 1524.7475
8 9 initial 3 atubP1 1532.9781
9 9 initial 3 atubP1 1693.3098
10 17 initial 1 atubP1 1076.4720
11 17 initial 1 atubP1 1101.3315
12 17 initial 1 atubP1 1185.3606
13 17 initial 2 atubP1 1131.1118
14 17 initial 2 atubP1 892.7087
15 17 initial 2 atubP1 1028.5465
16 17 initial 3 atubP1 887.9972
17 17 initial 3 atubP1 732.9646
18 17 initial 3 atubP1 680.6724
Final
Cell.line Time DNA.extract Gene CNPC
1 9 final 1 atubP1 1262.2378
2 9 final 1 atubP1 1261.9858
3 9 final 1 atubP1 1390.6873
4 9 final 2 atubP1 1539.7180
5 9 final 2 atubP1 1510.5405
6 9 final 2 atubP1 1443.1767
7 9 final 3 atubP1 1456.2050
8 9 final 3 atubP1 1578.6396
9 9 final 3 atubP1 1656.1822
10 17 final 1 atubP1 1462.5179
11 17 final 1 atubP1 1580.9956
12 17 final 1 atubP1 1255.9020
13 17 final 2 atubP1 886.7579
14 17 final 2 atubP1 581.8116
15 17 final 2 atubP1 722.0526
16 17 final 3 atubP1 4168.7895
17 17 final 3 atubP1 3266.2105
18 17 final 3 atubP1 4219.5645
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Andrew Robinson Deputy Director, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/