Hi, I'm doing some modelling (lm) for my 3rd year dissertation and I
want to do some resampling, especially as I'm working with microbes,
getting them to evolve resistance to antimicrobial compounds, and after
each exposure I'm measuring the minimum concentration required to kill
them (which I'm expecting to rise over time, or exposures), I have 5
lineages per cleaner, and I'm using 2 cleaners(of different chemical
origin, and it's these two different origins I'm interested in, or
rather, and differences in concentration results between them). So the
amount of data I get is small, hence my desire to resample. But thats
not so important.
I have used help from Kaplans Book: Statistical Modelling A Fresh
Approach, to get write the following code for my project:
samps = do(500)*
coef(lm(MIC. ~ 1 + Challenge + Cleaner + Replicate,
data=resample(ecoli)))
sd(samps)
But the "resample" and "do" operators are functions specific to a
workspace that comes with the book, not a normal R setup. So I was
thinking of ways I could achive the same result, or sort of result
because the resample should be different each time, I think the
following would work to the same effect:
resampled_ecoli = sample(ecoli, 500, replace=T)
coefs = (coef(lm(MIC. ~ 1 + Challenge + Cleaner + Replicate,
data=resampled_ecoli)))
sd(coefs)
And then I can work out confidence intervals by multiplying the standard
errors by 2.
Although I'm not used to doing this sort of operation in R so I don't
want to do the wrong thing.
If anyon could tell me if that would work or what I need to do instead
I'd be eternally greatful.
Thanks,
Ben Ward.
Resampling to find Confidence intervals
4 messages · Dieter Menne, Ben Ward
Axolotl9250 wrote:
... resampled_ecoli = sample(ecoli, 500, replace=T) coefs = (coef(lm(MIC. ~ 1 + Challenge + Cleaner + Replicate, data=resampled_ecoli))) sd(coefs) ...
Below a simplified and self-consistent version of your code, and some
changes
Dieter
# resample
d = data.frame(x=rnorm(10))
d$y = d$x*3+rnorm(10,0.01)
# if you do this, you only get ONE bootstrap sample
d1 = d[sample(1:nrow(d),10,TRUE),]
d1.coef = coef(lm(y~x,data=d1))
d1.coef
# No error below, because you compute the sd of (Intercept) and slope
# but result is wrong!
sd(d1.coef)
# We have to do this over and over
# Check ?replicate for a more R-ish approach....
nsamples = 1000
allboot = NULL
for (i in 1:1000) {
d1 = d[sample(1:nrow(d),10,TRUE),]
d1.coef = coef(lm(y~x,data=d1))
allboot = rbind(allboot,d1.coef) # Not very efficient, preallocate!
}
head(allboot) # display first of nsamples lines
apply(allboot,2,mean) # Compute mean
apply(allboot,2,sd) # compute sd
# After you are sure you understood the above, you might try package boot.
View this message in context: http://r.789695.n4.nabble.com/Resampling-to-find-Confidence-intervals-tp3172867p3173846.html Sent from the R help mailing list archive at Nabble.com.
Ok I'll check I understand: So it's using sample, to resample d once, 10 values, because the rnorm has 10 values, with replacement (I assume thats the TRUE part). Then a for loop has this to resample the data - in the loop's case its 1000 times. Then it does a lm to get the coefficients and add them to d1.coef. I'm guessing that the allboot bit with rbind, which is null at the start of the loop, is the collection of d1.coef values, as I think that without it, every cycle of the loop the d1.coef from the previous cycle round the loop would be gone?
On 04/01/2011 16:24, Dieter Menne wrote:
Axolotl9250 wrote:
... resampled_ecoli = sample(ecoli, 500, replace=T) coefs = (coef(lm(MIC. ~ 1 + Challenge + Cleaner + Replicate, data=resampled_ecoli))) sd(coefs) ...
Below a simplified and self-consistent version of your code, and some
changes
Dieter
# resample
d = data.frame(x=rnorm(10))
d$y = d$x*3+rnorm(10,0.01)
# if you do this, you only get ONE bootstrap sample
d1 = d[sample(1:nrow(d),10,TRUE),]
d1.coef = coef(lm(y~x,data=d1))
d1.coef
# No error below, because you compute the sd of (Intercept) and slope
# but result is wrong!
sd(d1.coef)
# We have to do this over and over
# Check ?replicate for a more R-ish approach....
nsamples = 1000
allboot = NULL
for (i in 1:1000) {
d1 = d[sample(1:nrow(d),10,TRUE),]
d1.coef = coef(lm(y~x,data=d1))
allboot = rbind(allboot,d1.coef) # Not very efficient, preallocate!
}
head(allboot) # display first of nsamples lines
apply(allboot,2,mean) # Compute mean
apply(allboot,2,sd) # compute sd
# After you are sure you understood the above, you might try package boot.
You mentioned the boot package, I've just stumbled across a package called simpleboot, with a function lm.boot. Would this be suitable - it says I can sample cases from the origional dataset, as well as from the residuals of a model. Not all the options I understand but I assume the defaults might be suitable for what I'm doing?
On 04/01/2011 17:56, Ben Ward wrote:
Ok I'll check I understand: So it's using sample, to resample d once, 10 values, because the rnorm has 10 values, with replacement (I assume thats the TRUE part). Then a for loop has this to resample the data - in the loop's case its 1000 times. Then it does a lm to get the coefficients and add them to d1.coef. I'm guessing that the allboot bit with rbind, which is null at the start of the loop, is the collection of d1.coef values, as I think that without it, every cycle of the loop the d1.coef from the previous cycle round the loop would be gone? On 04/01/2011 16:24, Dieter Menne wrote: Axolotl9250 wrote:
... resampled_ecoli = sample(ecoli, 500, replace=T) coefs = (coef(lm(MIC. ~ 1 + Challenge + Cleaner + Replicate, data=resampled_ecoli))) sd(coefs) ...
Below a simplified and self-consistent version of your code, and some
changes
Dieter
# resample
d = data.frame(x=rnorm(10))
d$y = d$x*3+rnorm(10,0.01)
# if you do this, you only get ONE bootstrap sample
d1 = d[sample(1:nrow(d),10,TRUE),]
d1.coef = coef(lm(y~x,data=d1))
d1.coef
# No error below, because you compute the sd of (Intercept) and slope
# but result is wrong!
sd(d1.coef)
# We have to do this over and over
# Check ?replicate for a more R-ish approach....
nsamples = 1000
allboot = NULL
for (i in 1:1000) {
d1 = d[sample(1:nrow(d),10,TRUE),]
d1.coef = coef(lm(y~x,data=d1))
allboot = rbind(allboot,d1.coef) # Not very efficient, preallocate!
}
head(allboot) # display first of nsamples lines
apply(allboot,2,mean) # Compute mean
apply(allboot,2,sd) # compute sd
# After you are sure you understood the above, you might try package
boot.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.