effective way to return only the first argument of "which()"

Thu, Sep 20, 2012 8:39 AM

Thank you very much, especially Milan and Bert!
I will do some speedtests and fit the function to my needs.

I think the best way would be a modified function in C...
But i am not familiar enough with C. Perhaps this would be a simple
but useful extension.
If someone has a solution, i would appreciate a post in this mailing list.

Cheers and thanks to all,
Nico


2012/9/19 Bert Gunter <gunter.berton at gene.com>:

Well, following up on this observation, which can be put under the
heading of "Sometimes vectorization can be much slower than explicit
loops" , I offer the following:

 firsti  <-function(x,k)
{
  i <- 1
  while(x[i]<=k){i <- i+1}
  i
}

system.time(for(i in 1:100)which(x>.99)[1])

   user  system elapsed
   19.1     2.4    22.2

system.time(for(i in 1:100)which.max(x>.99))

   user  system elapsed
  30.45    6.75   37.46

system.time(for(i in 1:100)firsti(x,.99))

   user  system elapsed
   0.03    0.00    0.03

## About a 500 - 1000 fold speedup !

firsti(x,.99)

[1] 122

It doesn't seem to scale too badly, either (whatever THAT means!):
(Of course, the which() versions are essentially identical in timing,
and so are omitted)

system.time(for(i in 1:100)firsti(x,.9999))

   user  system elapsed
   2.70    0.00    2.72

firsti(x,.9999)

[1] 18200

Of course, at some point, the explicit looping is awful -- with k =
.999999, the index was about 360000, and the timing test took 54
seconds.

So I guess the point is -- as always -- that the optimal approach
depends on the nature of the data. Prudence and robustness clearly
demands the vectorized which() approaches if you have no information.
But if you do know something about the data, then you can often write
much faster tailored solutions. Which is hardly revelatory, of course.

Cheers to all,
Bert

On Wed, Sep 19, 2012 at 8:55 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:

Le mercredi 19 septembre 2012 ? 15:23 +0000, William Dunlap a ?crit :

The original method is faster than which.max for longish numeric vectors
(in R-2.15.1), but you should check time and memory usage on your
own machine:

x <- runif(18e6)
system.time(for(i in 1:100)which(x>0.99)[1])

   user  system elapsed
  11.64    1.05   12.70

system.time(for(i in 1:100)which.max(x>0.99))

   user  system elapsed
  16.38    2.94   19.35

If you the probability that such an element appears at the beginning of
the vector, a custom hack might well be more efficient. The problem with
">", which() and which.max() is that they will go over all the elements
of the vector even if it's not needed at all. So you can start with a
small subset of the vector, and increase its size in a few steps until
you find the value you're looking for.

Proof of concept (the values of n obviously need to be adapted):
x <-runif(1e7)

find <- function(x, lim) {
    len <- length(x)

    for(n in 2^(14:0)) {
        val <- which(x[seq.int(1L, len/n)] > lim)

        if(length(val) > 0) return(val[1])
    }

    return(NULL)
}

system.time(for(i in 1:100)which(x>0.999)[1])

utilisateur     syst?me      ?coul?
      9.740       5.795      15.890

system.time(for(i in 1:100)which.max(x>0.999))

utilisateur     syst?me      ?coul?
     14.288       9.510      24.562

system.time(for(i in 1:100)find(x, .999))

utilisateur     syst?me      ?coul?
      0.017       0.002       0.019

find(x, .999)

[1] 1376

(Looks almost like cheating... ;-)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
Of Jeff Newmiller
Sent: Wednesday, September 19, 2012 8:06 AM
To: Mike Spam; r-help at r-project.org
Subject: Re: [R] effective way to return only the first argument of "which()"

?which.max
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

Mike Spam <ichmagspam at googlemail.com> wrote:

Hi,

I was looking for a function like "which()" but only returns the first
argument.
Compare:

x <- c(1,2,3,4,5,6)
y <- 4
which(x>y)

returns:
5,6

which(x>y)[1]
returns:
5

which(x>y)[1] is exactly what i need. I did use this but the dataset
is too big (~18 mio. Points).
That's why i need a more effective way to get the first element of a
vector which is bigger/smaller than a specific number.

I found "match()" but this function only works for equal numbers.



Thanks,
Nico

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

effective way to return only the first argument of "which()"

Thread (12 messages)