Message-ID: <9115BD0A-6559-494D-A160-86F2E481DC61@me.com>
Date: 2011-04-14T20:37:29Z
From: Marc Schwartz
Subject: Find number of elements less than some number: Elegant/fast solution needed
In-Reply-To: <5E6B0509-52C9-4BE5-9908-50FFBF12D109@gmail.com>

On Apr 14, 2011, at 2:34 PM, Kevin Ummel wrote:

> Take vector x and a subset y:
> 
> x=1:10
> 
> y=c(4,5,7,9)
> 
> For each value in 'x', I want to know how many elements in 'y' are less than 'x'.
> 
> An example would be:
> 
> sapply(x,FUN=function(i) {length(which(y<i))})
> [1] 0 0 0 0 1 2 2 3 3 4
> 
> But this solution is far too slow when x and y have lengths in the millions.
> 
> I'm certain an elegant (and computationally efficient) solution exists, but I'm in the weeds at this point.
> 
> Any help is much appreciated.
> 
> Kevin
> 
> University of Manchester
> 


I started working on a solution to your problem above and then noted the one below.

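Here is one approach to the above. Note that outer() builds a length(y) by length(x) logical matrix, so it is only practical for vectors of modest length: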

> colSums(outer(y, x, "<"))
 [1] 0 0 0 0 1 2 2 3 3 4
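
For vectors with lengths in the millions, a sorted lookup is likely a better fit than outer(). Below is a minimal sketch, assuming an R version whose findInterval() accepts the left.open argument; count_less is only an illustrative name, not an existing function:

```r
# Count, for each element of x, how many elements of y are strictly less.
# Sorting y once and doing a binary-search lookup per element of x avoids
# building a length(y) by length(x) matrix.
count_less <- function(x, y) {
  ys <- sort(y)
  # With left.open = TRUE the intervals are (ys[i], ys[i + 1]], so the
  # returned index equals the number of elements of ys strictly below x.
  findInterval(x, ys, left.open = TRUE)
}

x <- 1:10
y <- c(4, 5, 7, 9)
count_less(x, y)
# [1] 0 0 0 0 1 2 2 3 3 4
```

The cost is one sort of y plus a lookup per element of x, so memory use stays proportional to the inputs.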



> 
> Take two vectors x and y, where y is a subset of x:
> 
> x=1:10
> 
> y=c(2,5,6,9)
> 
> If y is removed from x, the original x values now have a new placement (index) in the resulting vector (new): 
> 
> new=x[-y]
> 
> index=1:length(new)
> 
> The challenge is: How can I *quickly* and *efficiently* deduce the new 'index' value directly from the original 'x' value -- using only 'y' as an input?
> 
> In practice, I have very large matrices containing the 'x' values, and I need to convert them to the corresponding 'index' if the 'y' values are removed.


Something like the following might work, if I understand the problem correctly. Values of x that were removed come back as NA:

> match(x, x[-y])
 [1]  1 NA  2  3 NA NA  4  5 NA  6
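
If the index has to come from 'y' alone, without building x[-y], the two questions connect: the new index of a surviving value is its old position minus the number of removed positions below it. Here is a rough sketch under the assumption that the 'x' values are themselves positions 1..n, as in the example; new_index is a made-up name, and left.open again assumes a findInterval() that supports it:

```r
# Map original positions x to their positions after dropping positions y.
# Assumes the values in x are the original indices (1..n), as in x = 1:10.
new_index <- function(x, y) {
  ys <- sort(unique(y))
  # Shift each position down by the number of removed positions below it.
  idx <- x - findInterval(x, ys, left.open = TRUE)
  # Positions that were themselves removed have no new index.
  idx[x %in% ys] <- NA
  idx
}

x <- 1:10
y <- c(2, 5, 6, 9)
new_index(x, y)
# [1]  1 NA  2  3 NA NA  4  5 NA  6
```

Because the subtraction is element-wise, this should also work when x is a matrix of positions, with the dimensions carried through.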


HTH,

Marc Schwartz