bootstrapping respecting subject level information
I'd like to preface this answer by suggesting that if you have multiple measurements within subjects then you should possibly be thinking about using mixed-effects models. Here you have a balanced design and seem to be thinking about a constrained bootstrap, but I don't know whether the resulting distribution will have the right properties - others on the list may. That said, I think that this needs the 'strata' argument of the boot function. See the help. Something like data.boot <- boot(data, boot.huber, 1999, strata = Subject) (not tested) Cheers Andrew
On Thu, Jul 4, 2013 at 12:19 PM, Sol Lago <solcita.lago at gmail.com> wrote:
Hi there,
This is the first time I use this forum, and I want to say from the start I am not a skilled programmer. So please let me know if the question or code were unclear!
I am trying to bootstrap an interaction (that is my test statistic) using the package "boot". My problem is that for every resample, I would like the randomization to be done within subjects, so that observations from different subjects are not mixed. Here is the code to generate a dataframe similar to mine:
Subject = rep(c("S1","S2","S3","S4"),4)
Num = rep(c("singular","plural"),8)
Gram = rep(c("gram","gram","ungram","ungram"),4)
RT = c(657,775,678,895,887,235,645,916,930,768,890,1016,590,978,450,920)
data = data.frame(Subject,Num,Gram,RT)
This is the code I used to get the empirical interaction value:
summary(lm(RT ~ Num*Gram, data=data))
As you can see, the interaction between my two factors is -348. I want to get a bootstrap confidence interval for this statistic, which I can generate using the "boot" package:
#Function to create the statistic to be boostrapped
boot.huber <- function(data, indices) {
data <- data[indices, ] #select obs. in bootstrap sample
mod <- lm(RT ~ Num*Gram, data=data)
coefficients(mod) #return coefficient vector
}
#Generate bootstrap estimate
data.boot <- boot(data, boot.huber, 1999)
#Get confidence interval
boot.ci(data.boot, index=4, type=c("norm", "perc", "bca"),conf=0.95) #4 gets the CI for the interaction
My problem is that I think the resamples should be generated without mixing the individual subjects observations: that is, to generate the new resamples, the observations from subject 1 (S1) should be shuffled within subject 1, not mixing them with the observations from subjects 2, etc... I don't know how "boot" is doing the resampling (I read the documentation but don't understand how the function is doing it)
Does anyone know how I could make sure that the resampling procedure used by "boot" respects the subject level information?
Thanks a lot for your help/advice!
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Andrew Robinson Deputy Director, CEBRA Senior Lecturer in Applied Statistics Tel: +61-3-8344-6410 Department of Mathematics and Statistics Fax: +61-3-8344 4599 University of Melbourne, VIC 3010 Australia Email: a.robinson at ms.unimelb.edu.au Website: http://www.ms.unimelb.edu.au FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/ SPuR: http://www.ms.unimelb.edu.au/spuRs/