Message-ID: <f93a14b5418efd99355a201557d9f8da@mail.nih.gov>
Date: 2005-03-03T22:22:29Z
From: Sean Davis
Subject: Rank-based p-value on large dataset

I have a fairly simple problem. I have about 80,000 values (call them 
y) that I am using as an empirical distribution, and I want to find the 
p-value (never mind the multiple-testing issues for the time being) of 
each of 130,000 points (call them x) against that empirical distribution. 
For a one-sided test, I typically do something like

p.val <- numeric(length(x))
for (i in seq_along(x))
  p.val[i] <- sum(y > x[i]) / length(y)

However, both length(x) and length(y) are large here, and each iteration 
scans all of y (roughly 10^10 comparisons in total), so this takes quite 
a long time.  Any suggestions?
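
One possible direction, offered as a minimal sketch rather than a 
benchmarked solution: sort y once, then find each x[i]'s position in 
the sorted vector by binary search instead of rescanning y every time. 
The helper name rank_pval is hypothetical, and the rnorm() data below 
are stand-ins for the real x and y.

```r
# Hypothetical helper: one-sided empirical p-value, i.e. the fraction of
# y strictly greater than each element of x.
rank_pval <- function(x, y) {
  ys <- sort(y)                 # sort the reference values once
  # findInterval(x, ys) gives, for each x[i], the number of ys <= x[i];
  # subtracting from length(ys) leaves the number of ys > x[i].
  (length(ys) - findInterval(x, ys)) / length(ys)
}

# Stand-in data at the sizes described above (assumption: the real
# values are numeric vectors without NAs).
set.seed(1)
y <- rnorm(80000)
x <- rnorm(130000)
p.val <- rank_pval(x, y)

# Spot-check a few points against the direct computation.
idx <- 1:50
direct <- sapply(x[idx], function(xi) sum(y > xi) / length(y))
stopifnot(isTRUE(all.equal(p.val[idx], direct)))
```

This replaces the full scan of y per point with a lookup in the sorted 
vector, so the cost should be dominated by the single sort.  
1 - ecdf(y)(x) expresses the same quantity in base R.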

Thanks,
Sean