
How to compute p-Values

8 messages · Andreas Klein, gregor rolshausen, Yinghai Deng +1 more

#
Hello.


How can I compute the bootstrap p-value for a one- and a two-sided test when I have a bootstrap sample of a statistic of size 1000, for example?

My hypotheses are, for example:

1. Two-sided: H0: mean = 0 vs. H1: mean != 0
2. One-sided: H0: mean >= 0 vs. H1: mean < 0



I hope you can help me


Thanks in advance


Regards,
Andreas
#
Andreas Klein wrote:
hi,
do you want to test your original t.test against t.tests of bootstrapped 
samples from your data?

if so, you can just write a function creating a vector with the 
statistics (t) of the single t.tests (in your case 1000 t.tests, each 
with a bootstrapped sample of your original data -> 1000 simulated 
t-values).
you extract them with:

 > tvalue <- t.test(a ~ factor)$statistic

then just calculate the proportion of t-values from your bootstrapped 
tests that are bigger than your original t-value:

 > p <- sum(simulated_tvalue > original_tvalue)/1000
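As a sketch of that recipe end to end (the one-sample setting, sample size, and variable names below are assumptions for illustration, not details from the original post):

```r
# Sketch of the bootstrap-t recipe above; one-sample setting assumed.
set.seed(1)
x <- rnorm(100)                         # stand-in for the original data
original_tvalue <- t.test(x)$statistic  # t-value of the original test

B <- 1000
simulated_tvalue <- numeric(B)
for (b in 1:B) {
  xb <- sample(x, length(x), replace = TRUE)   # bootstrap resample
  simulated_tvalue[b] <- t.test(xb)$statistic  # its t-value
}

# proportion of bootstrap t-values bigger than the original one
p <- sum(simulated_tvalue > original_tvalue) / B
```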


(or did I get the question wrong?)

cheers,
gregor
#
Hello.

What I wanted was:

I have a sample of 100 realizations of a random variable, and I want a p-value for the hypothesis that the mean of the sample equals zero (H0) or not (H1). That is the two-sided test.
The same question holds for a one-sided version, where I want to know whether the mean is greater than zero (H0) or less than or equal to zero (H1).

Therefore I draw a bootstrap sample with replacement from the original sample and compute the mean of that bootstrap sample. I repeat this 1000 times and obtain 1000 means.

Now: How can I compute the p-value for a one-sided and a two-sided test as described above?



Regards,
Andreas


--- gregor rolshausen <gregor.rolshausen at biologie.uni-freiburg.de> schrieb am Mi, 14.1.2009:
#
I read the problem a bit differently than Andreas. I thought you were  
trying to create a *substitute* for the parametric t-test.

A p-value is not a statement about a group of tests. It is a statement  
about one sample of data in comparison with the theoretical distribution (in  
the case of the parametric test) or, in your case, with the bootstrap  
distribution. You want to construct a CDF of your distribution of  
means/s.d. values and package it up in a form that would allow you to  
return the *proportion* of values (the "p-value") above one particular  
new sample value.

?ecdf  # information on how to turn 1000 realizations into a function; it's really pretty simple.

If your sample of potentially (but not necessarily) t-like statistics  
is tt, then ttCDF <- ecdf(tt) will print nothing but will result in ttCDF  
becoming a function. Then, with a sample value mean_a to test, you get  
useful results with:

ttCDF(mean_a)

Turning this into a "test" requires a bit more packaging, but I think  
the road ahead is clear.
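A minimal illustration of that ecdf mechanism (tt and mean_a here are stand-ins, not values from the thread):

```r
# ecdf() turns a sample into its empirical CDF, returned as a function.
set.seed(1)
tt <- rnorm(1000)   # stand-in for 1000 bootstrap statistics
ttCDF <- ecdf(tt)   # ttCDF is now a step function

mean_a <- 0.5                 # a hypothetical value to test
below <- ttCDF(mean_a)        # proportion of tt values <= mean_a
above <- 1 - ttCDF(mean_a)    # proportion above mean_a (an upper-tail proportion)
```

Note that ecdf() gives the proportion at or below a value, so the proportion above it is 1 minus that.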
#
I think what you need is just a count. For example, if you want to know the
p-value for the mean being bigger than 0 and you have 5 such cases in your
draws, then the p-value is 5/1000 = 0.005, right?
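That counting rule, written out (the draws below are hypothetical, constructed so that exactly 5 of 1000 fall in the tail):

```r
# Counting rule: the one-sided p-value is the tail proportion of draws.
mean_boot <- c(rep(-0.01, 5), rep(0.02, 995))  # hypothetical: 5 of 1000 below 0
p_one <- sum(mean_boot < 0) / length(mean_boot)
p_one  # 5/1000 = 0.005
```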

HTH
YHDENG

#
I think we are at the stage where it is your responsibility to provide  
some code to set up the problem.
#
Ok... I have set up the problem with some code, as requested:

Example:

x <- rnorm(100)

mean_x <- mean(x)

mean_boot <- numeric(1000)

for (i in 1:1000) {
  mean_boot[i] <- mean(sample(x, 100, replace = TRUE))
}


How can I compute the p-value from mean_boot for the following tests:

1. H0: mean_x = 0 vs. H1: mean_x != 0

2. H0: mean_x >= 0 vs. H1: mean_x < 0



Is there a possibility to construct such p-values, or did I get something wrong?

Someone told me that the p-value for the first (two-sided) test is p = 2 * min(sum(mean_boot >= 0)/1000, sum(mean_boot < 0)/1000), but I didn't get the idea behind it. Maybe someone can explain it, if it is the solution to the problem.
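For what it's worth, that formula can be run directly against the setup above (a sketch; the intuition behind it is that it takes the smaller of the two tail proportions around 0 and doubles it, so departures from H0 in either direction count):

```r
# The suggested two-sided rule, applied to the mean_boot setup above.
set.seed(1)
x <- rnorm(100)
mean_boot <- replicate(1000, mean(sample(x, 100, replace = TRUE)))

# smaller tail proportion around 0, doubled to cover both directions
p_two <- 2 * min(sum(mean_boot >= 0) / 1000, sum(mean_boot < 0) / 1000)
```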

Regards,
Andreas.



--- David Winsemius <dwinsemius at comcast.net> schrieb am Mi, 14.1.2009:
#
On Jan 14, 2009, at 4:27 PM, Andreas Klein wrote:

You should realize that you are conflating p-values and the Neyman-Pearson  
hypothesis-testing formalism. It's perfectly possible that your  
textbook is doing the same sort of conflation.
That does not sound correct. For one thing, no mention is made of  
comparing the mean of a particular sample (of the same size as the  
bootstrap samples) against the distribution of bootstrap means. For  
another, you want to know whether the sample mean is either less than  
the 0.025 quantile of the mean_boot distribution or greater than  
the 0.975 quantile. Perhaps your informant meant to construct the test  
as 2 * min(sum(mean_boot < mean_x)/1000, sum(mean_boot > mean_x)/1000).

In the HA: mean_x < 0 directed one-sided case, you are only interested  
in whether the mean is below the 0.05 quantile. For HA: mean_x > 0,  
you want to know whether mean_x is above the 0.95 quantile.

In my realization of mean_boot I get:

quantile(mean_boot, probs = c(0.025, 0.05, 0.95, 0.975))
       2.5%         5%        95%      97.5%
 -0.2057778 -0.1643545  0.1562328  0.1825198

Those are going to be your critical points for alpha = 0.05  
Neyman-Pearson tests of sorts 1 and 2. The outer numbers are for the  
two-sided alternative.

For calculation of the p-values, I still think you need ecdf, and  
probably Hmisc::inverseFunction as well. For p-values you need to go  
from the observed value back to the proportion that exceeds that value.
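Putting the pieces together, one way to get both p-values via ecdf (a sketch under the same simulated setup as earlier in the thread, not a definitive recipe):

```r
# Bootstrap p-values via the empirical CDF of the bootstrap means.
set.seed(1)
x <- rnorm(100)
mean_boot <- replicate(1000, mean(sample(x, 100, replace = TRUE)))
Fhat <- ecdf(mean_boot)   # empirical CDF as a function

# two-sided H0: mean = 0 -- twice the smaller tail proportion at 0
p_two <- 2 * min(Fhat(0), 1 - Fhat(0))

# one-sided H0: mean >= 0 vs. H1: mean < 0 -- upper tail at 0
p_one <- 1 - Fhat(0)
```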