Skip to content

Resampling to find Confidence intervals

4 messages · Dieter Menne, Ben Ward

#
Hi, I'm doing some modelling (lm) for my 3rd year dissertation and I 
want to do some resampling, especially as I'm working with microbes, 
getting them to evolve resistance to antimicrobial compounds, and after 
each exposure I'm measuring the minimum concentration required to kill 
them (which I'm expecting to rise over time, or exposures), I have 5 
lineages per cleaner, and I'm using 2 cleaners(of different chemical 
origin, and it's these two different origins I'm interested in, or 
rather, and differences in concentration results between them). So the 
amount of data I get is small, hence my desire to resample. But thats 
not so important.

I have used help from Kaplans Book: Statistical Modelling A Fresh 
Approach, to get write the following code for my project:

samps = do(500)*
    coef(lm(MIC. ~ 1 + Challenge + Cleaner + Replicate, 
data=resample(ecoli)))
  sd(samps)

But the "resample" and "do" operators are functions specific to  a 
workspace that comes with the book, not a normal R setup. So I was 
thinking of ways I could achive the same result, or sort of result 
because the resample should be different each time, I think the 
following would work to the same effect:

resampled_ecoli = sample(ecoli, 500, replace=T)
coefs = (coef(lm(MIC. ~ 1 + Challenge + Cleaner + Replicate, 
data=resampled_ecoli)))
sd(coefs)

And then I can work out confidence intervals by multiplying the standard 
errors by 2.

Although I'm not used to doing this sort of operation in R so I don't 
want to do the wrong thing.
If anyon could tell me if that would work or what I need to do instead 
I'd be eternally greatful.

Thanks,
Ben Ward.
#
Axolotl9250 wrote:
Below a simplified and self-consistent version of your code, and some
changes

Dieter

# resample
d = data.frame(x=rnorm(10))
d$y = d$x*3+rnorm(10,0.01)

# if you do this, you only get ONE bootstrap sample
d1 = d[sample(1:nrow(d),10,TRUE),]
d1.coef = coef(lm(y~x,data=d1))
d1.coef
# No error below, because you compute the sd of (Intercept) and slope
# but result is wrong!
sd(d1.coef)

# We have to do this over and over
# Check ?replicate for a more R-ish approach....
nsamples = 1000
allboot = NULL
for (i in 1:1000) {
  d1 = d[sample(1:nrow(d),10,TRUE),]
  d1.coef = coef(lm(y~x,data=d1))
  allboot = rbind(allboot,d1.coef) # Not very efficient, preallocate!
}
head(allboot) # display first of nsamples lines
apply(allboot,2,mean) # Compute mean
apply(allboot,2,sd) # compute sd
# After you are sure you understood the above, you might try package boot.
#
Ok I'll check I understand:
So it's using sample, to resample d once, 10 values, because the rnorm 
has 10 values, with replacement (I assume thats the TRUE part).
Then a for loop has this to resample the data - in the loop's case its 
1000 times. Then it does a lm to get the coefficients and add them to 
d1.coef. I'm guessing that the allboot bit with rbind, which is null at 
the start of the loop, is the collection of d1.coef values, as I think 
that without it, every cycle of the loop the d1.coef from the previous 
cycle round the loop would be gone?
On 04/01/2011 16:24, Dieter Menne wrote:

        
Axolotl9250 wrote:

            
#
You mentioned the boot package, I've just stumbled across a package 
called simpleboot, with a function lm.boot. Would this be suitable - it 
says I can sample cases from the origional dataset, as well as from the 
residuals of a model. Not all the options I understand but I assume the 
defaults might be suitable for what I'm doing?
On 04/01/2011 17:56, Ben Ward wrote: