Rank-based p-value on large dataset

The x's and y's are different sets, 210,000 values altogether. That is
really the issue: they can't just be sorted, at least not that I can
see...

Sean
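For what it's worth, x and y being different sets need not rule out a rank-based approach. The following is a minimal sketch, assuming the one-sided p-value is the proportion of y strictly greater than each x[i], as in the loop in the original question. It ranks the pooled values, then subtracts each x[i]'s rank within x alone. The helper name rank_pvals and the simulated rnorm data are hypothetical, for illustration only.

```r
# Hypothetical helper: one-sided empirical p-values, the proportion of y
# strictly greater than each x[i], computed from ranks instead of a loop.
rank_pvals <- function(x, y) {
  # Rank of each x[i] in the pooled data with ties given the maximum rank,
  # i.e. the number of pooled values (from x and y) that are <= x[i].
  r_all <- rank(c(x, y), ties.method = "max")[seq_along(x)]
  # Same count restricted to x: the number of x values that are <= x[i].
  r_x <- rank(x, ties.method = "max")
  # The difference counts the y values <= x[i]; the complement gives y > x[i].
  (length(y) - (r_all - r_x)) / length(y)
}

# Simulated stand-in data with the sizes from the question (illustration only).
set.seed(1)
y <- rnorm(80000)
x <- rnorm(130000)
p.val <- rank_pvals(x, y)
```

Both rank() calls are sort-based, so the cost grows roughly like (m + n) log(m + n) rather than like the m*n comparisons of the loop.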

When you say the 130,000 points are from the empirical distribution,
how did you get them? Is each one really one of the values of y? If you
sorted y first, would you know which one (i.e., which index) each x is?
(Sorting 80,000 elements took essentially no time at all on my
sub-gigahertz Pentium III.) But maybe that's not an option; more details
would help.

Reid Huntsinger
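Following the suggestion to sort y first: once y is sorted, the x values do not have to be members of y. Base R's findInterval() locates every x[i] in a sorted vector by binary search and returns how many sorted y values are <= x[i]. This is a sketch under the same one-sided assumption as the loop in the question; interval_pvals and the simulated data are hypothetical.

```r
# Hypothetical helper: sort y once, then find each x[i]'s position in the
# sorted vector by binary search.
interval_pvals <- function(x, y) {
  ys <- sort(y)
  # For a sorted vector, findInterval(x, ys) returns, for each x[i], the
  # number of elements of ys that are <= x[i] (0 if x[i] < min(ys)).
  n_le <- findInterval(x, ys)
  # Proportion of y strictly greater than x[i].
  (length(ys) - n_le) / length(ys)
}

# Simulated stand-in data (illustration only).
set.seed(1)
y <- rnorm(80000)
x <- rnorm(130000)
p.val <- interval_pvals(x, y)
```

Only y is sorted; x can stay in its original order, so the p-values line up with x directly.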

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Sean Davis
Sent: Thursday, March 03, 2005 5:22 PM
To: r-help
Subject: [R] Rank-based p-value on large dataset

I have a fairly simple problem: I have about 80,000 values (call them
y) that I am using as an empirical distribution, and I want to find the
p-values (never mind the multiple-testing issues here, for the time
being) of 130,000 points (call them x) under that empirical
distribution. For a one-sided test, I typically do something like

p.val <- numeric(length(x))
for (i in seq_along(x))
  p.val[i] <- sum(y > x[i]) / length(y)

However, both length(x) and length(y) are large here, so the loop
(roughly 130,000 x 80,000, about 10^10, comparisons) takes quite a long
time. Any suggestions?

Thanks,
Sean
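One vectorized alternative, offered as a sketch rather than the method used in this thread: base R's ecdf(y) returns a step function Fn with Fn(t) equal to the proportion of y that is <= t, so 1 - Fn(x) is the proportion of y greater than each x[i], the same quantity the loop computes. The simulated data below are placeholders.

```r
# Simulated stand-in data with the sizes from the question (illustration only).
set.seed(1)
y <- rnorm(80000)
x <- rnorm(130000)

# ecdf() builds the empirical CDF of y: Fn(t) is the proportion of y <= t.
Fn <- ecdf(y)
# Evaluate at all x at once; 1 - Fn(x) is the proportion of y > x[i].
p.val <- 1 - Fn(x)
```

Results may differ from the loop in the last floating-point digit, since 1 - k/n and (n - k)/n are rounded differently.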

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

