Skip to content

p-val issue for ranked two-group test

4 messages · Laurel Klein Serieys, Joshua Wiley, Peter Dalgaard

#
Hi-
I'm wondering if anyone can help me with my code.  I'm coming up dry  
when I try to get a p-value from the following code.  If I make a  
histogram of my resampled distribution, I find the difference between  
by groups to be significant.  I've ranked the data since I have  
outliers in one of my groups.

mange= c(35,  60,  81, 158, 89, 130,  90,  38, 119, 137,  52,  30,   
27, 115, 123,  31, 124,  91)

healthy= c(46, 50, 30, 58, 32, 42, 42, 33, 19, 42, 30, 26, 38, 23, 16,  
28, 42, 42, 33, 35, 51, 31, 39, 40 , 42, 38, 36, 39, 38)

l.mange<-length(mange)
l.healthy<-length(healthy)

exptdiff <- mean.mange - mean.healthy #the expected difference between  
between the mean of the ranked groups


both.chemistry<-c(mange, healthy) #concatenate two vectors into one in  
preparation for resampling the data


both.ranks<-rank(both.chemistry) #rank combined data in the case that  
there are outlying values in the data or the dataset is small

reps=1000


z<-rep(NA,reps) # z will the the simulated storage value for the  
resampling efforts

for(i in 1:reps){ #create the loop

x<- sample(both.ranks, length(both.ranks),replace=FALSE) #instructions  
for how to resample where sample the entire combined data without  
replacment

p.mange<-mean(x[(1:l.mange)])  #create a simulate mean value for the  
resampled mange values
p.healthy<-mean(x[(l.mange+1):(l.mange+l.healthy)])  #create a  
simulated mean value for the resampled healthy values

pdiff<- p.mange-p.healthy #the simulated difference between groups

z[i]<- pdiff  #the stored list of simulated differences
}
p=mean(z>=exptdiff)*2 #2-tailed test multiply by two
p

hist(z, xlab="Resample Values", main="Distribution for Two-Group BUN Test")
confints=quantile(z, c(0.025,0.975))
abline(v=confints, col="blue") #draw a line for each cutoffs
abline(v=exptdiff, col="red")

Thanks!
L.Serieys
#
Hi,

It looks like you are trying to manually bootstrap.  Take a look at:

require(boot)
?boot

as an added advantage of using boot instead of trying to do it
manually, you can easily parallelize.  In fact, if you are using one
of the pre-release versions of 2.14.0, the new parallel package is
included by default and you do not even have to go venturing out into
the wide world of CRAN to look.  That said, there are several aspects
of your code that could be readily vectorized.  More specific details
supplied if a less homework-like example is provided.

Cheers,

Josh

On Thu, Oct 20, 2011 at 10:17 AM, Laurel Klein Serieys
<laurelklein at ucla.edu> wrote:

  
    
#
On Oct 20, 2011, at 23:48 , Joshua Wiley wrote:

            
Nope. It's a manually performed approximate Wilcoxon test. Which is fair enough if the object is to learn something. (Notice, however, that the ExactRankTests package eats this sort of problem for breakfast.)

As for the actual code, the problem seems mainly to be unclear thinking. The most glaring problem is that "exptdiff" is never actually calculated, but even if it were, it wouldn't be  what the comment says that it is. If it was, it would logically be zero since the "expected mean rank" is the same in both groups (under the null, but what else could be meant?). More likely the intention is to calculate the _observed_ difference in the mean rank, as is done for pdiff (what does "p" stand for??) inside the loop, but based on both.ranks rather than on x. With clarified thinking, I'd expect things to fall into place rather easily.

  
    
#
On Thu, Oct 20, 2011 at 3:42 PM, peter dalgaard <pdalgd at gmail.com> wrote:
You are correct, Peter.  Apologies, Laurel, for a hastie read and
reply to your email.