Speed of for loops

12 messages · Tom McCallum, Tamas K Papp, Ramon Diaz-Uriarte +3 more

#
Hi Everyone,

I have a question about for loops.  If you have something like:

f <- function(x) {
	y <- rep(NA,10);
	for( i in 1:10 ) {
		if ( i > 3 ) {
			if ( is.na(y[i-3]) == FALSE ) {
				# some calculation F which depends on one or more of the previously
				# generated values in the series
				y[i] = y[i-1]+x[i];
			} else {
				y[i] <- x[i];
			}
		}
	}
	y
}

e.g.
[1] NA NA NA  4  5  6 13 21 30 40

Is there a faster way to process this than with a 'for' loop?  I have
looked at lapply as well, but I have read that lapply is no faster than a
for loop, and for my particular application a for loop is easier to use.
I have also seen 'rle', which I think may help me, but I am not sure as I
have only just come across it.  Any ideas?

Many thanks

Tom
#
Tom,

The *apply functions generally speed up calculations dramatically, but
only if you perform a repetitive operation on a vector, list, or matrix
that does NOT require accessing elements of that variable other than the
one currently at the *apply index. In your case this means that none of
the *apply functions will speed up your calculation (unless you
significantly rethink the code). At the same time, you can speed up your
code by orders of magnitude by using C functions for "complex" vector
indexing operations. If you need instructions, I can send you a very nice
"Step-by-step guide for using C/C++ in R", which goes beyond the "Writing
R Extensions" document.

Otherwise, such questions should be posted to R-help, not Rd; please
post accordingly.

Best regards,
Oleg
Tom McCallum wrote:

  
    
#
On Tue, Jan 30, 2007 at 12:15:29PM +0000, Oleg Sklyar wrote:

            
Hi Oleg,

Can you please post this guide online?  I think that many people would
be interested in reading it, incl. me.

Tamas
#
I know this should not go to [Rd], but the original post was there and 
the replies as well.

Thank you to all who expressed interest in the "Step-by-step guide for
using C/C++ in R"! To answer some of you: yes, it is by me, and it was
written to help other group members start adding C/C++ code to their R
code.

You can now download it from:

http://www.ebi.ac.uk/~osklyar/kb/CtoRinterfacingPrimer.pdf

I would also appreciate your comments on whether you find it useful, or
on what could be added or modified. Please send them directly to my
email, not to the list.

Best wishes,
Oleg
Tamas K Papp wrote:

  
    
#
On Tuesday 30 January 2007 15:46, Tamas K Papp wrote:
Me too.

Thanks,

R.

  
    
#
Tom McCallum wrote:
Hi Tom,

In the general case, you need a loop in order to propagate calculations
and their results across a vector.

In _your_ particular case however, it seems that all you are doing is a
cumulative sum on x (at least this is what's happening for i >= 6).
So you could do:

f2 <- function(x)
{
    offset <- 3
    start_propagate_at <- 6
    y_length <- 10
    init_range <- (offset+1):start_propagate_at
    y <- rep(NA, offset)
    y[init_range] <- x[init_range]
    y[start_propagate_at:y_length] <- cumsum(x[start_propagate_at:y_length])
    y
}

and it will return the same thing as your function 'f' (at least when 'x' doesn't
contain NAs) but it's not faster :-/
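
The equivalence claim can be checked directly. A minimal sketch: it restates
'f' from the first message (comment shortened, otherwise the same logic) and
'f2' from this message, then compares them on x = 1:10, the input that
reproduces the example output at the top of the thread. The input with an NA
and the names 'x_na', 'same_without_na' and 'same_with_na' are assumptions
introduced here only to show where the two functions diverge.

```r
# 'f' as posted at the top of the thread (comment shortened).
f <- function(x) {
    y <- rep(NA, 10)
    for (i in 1:10) {
        if (i > 3) {
            if (!is.na(y[i - 3])) {
                # depends on the previously generated value
                y[i] <- y[i - 1] + x[i]
            } else {
                y[i] <- x[i]
            }
        }
    }
    y
}

# 'f2' as posted in this message.
f2 <- function(x) {
    offset <- 3
    start_propagate_at <- 6
    y_length <- 10
    init_range <- (offset + 1):start_propagate_at
    y <- rep(NA, offset)
    y[init_range] <- x[init_range]
    y[start_propagate_at:y_length] <- cumsum(x[start_propagate_at:y_length])
    y
}

x <- 1:10
same_without_na <- isTRUE(all.equal(f(x), f2(x)))      # TRUE

# Hypothetical input with one missing value (not from the thread).
x_na <- x
x_na[5] <- NA
same_with_na <- isTRUE(all.equal(f(x_na), f2(x_na)))   # FALSE
```

With the NA at position 5, 'f' falls back to y[8] <- x[8] because y[5] is NA,
while 'f2' keeps accumulating, so the results differ from element 8 onwards.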

IMO, using sapply for propagating calculations across a vector is not appropriate
because:

  (1) It requires special care. For example, this:

        > x <- 1:10
        > sapply(2:length(x), function(i) {x[i] <- x[i-1]+x[i]})

      doesn't work because the 'x' symbol on the left side of the <- in the
      anonymous function doesn't refer to the 'x' symbol defined in the global
      environment. So you need to use tricks like this:

        > sapply(2:length(x),
                 function(i) {x[i] <- x[i-1]+x[i]; assign("x", x, envir=.GlobalEnv); x[i]})

  (2) Because of this kind of trick, it is _very_ slow (about 10 times
      slower than a 'for' loop, or more).
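
The scoping point in (1) can be seen directly. A minimal sketch, using the
same 'x' as above; the names 'res' and 'x_loop' are introduced here for
illustration only:

```r
x <- 1:10

# The assignment inside the anonymous function modifies a local copy of 'x',
# so every call starts from the original values and only adjacent pairs
# get summed.
res <- sapply(2:length(x), function(i) { x[i] <- x[i - 1] + x[i]; x[i] })
res                  # pairwise sums x[i-1] + x[i], not a running total
identical(x, 1:10)   # TRUE: 'x' in the calling scope is untouched

# A 'for' loop runs in the calling scope, so each step sees the update made
# by the previous one.
x_loop <- 1:10
for (i in 2:length(x_loop)) x_loop[i] <- x_loop[i - 1] + x_loop[i]
all(x_loop == cumsum(1:10))   # TRUE
```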

Cheers,
H.
#
Actually, why not use a closure to store previous value(s)?

In the simple case, which depends on x_i and y_{i-1}

gen.iter = function(x) {
    y = NA
    function(i) {
       y <<- if(is.na(y)) x[i] else y+x[i]
    }
}

y = sapply(1:10,gen.iter(x))

Obviously you can modify the function for the bookkeeping required to
manage whatever lag you need. I use this sometimes when I'm
implementing MCMC samplers of various kinds.
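
As one possible reading of the bookkeeping remark, here is a sketch of a
closure that keeps the whole series generated so far and reproduces the
lag-3 rule of the original 'f'. The name 'gen.lag.iter' and its 'lag'
argument are made up for this illustration, and x = 1:10 is assumed as the
input.

```r
# Hypothetical generator: returns a function of the index i that stores
# every generated value in 'y', so earlier results can be looked up.
gen.lag.iter <- function(x, lag = 3) {
    y <- rep(NA_real_, length(x))   # history of generated values
    function(i) {
        if (i > lag) {
            # same rule as in 'f': propagate only once y[i - lag] exists
            y[i] <<- if (!is.na(y[i - lag])) y[i - 1] + x[i] else x[i]
        }
        y[i]
    }
}

x <- 1:10
y <- sapply(seq_along(x), gen.lag.iter(x))
y   # NA NA NA 4 5 6 13 21 30 40
```

This relies on sapply visiting the indices in order, since each call reads
values written by earlier calls.
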
On 1/30/07, Herve Pages <hpages at fhcrc.org> wrote:

  
    
#
Actually, better yet:

gen.iter = function(y=NA) {
  function(x) {
    y <<- if(is.na(y)) x else x+y
  }
}
sapply(x,gen.iter())
On 1/30/07, Byron Ellis <byron.ellis at gmail.com> wrote:

  
    
#
It is surely an elegant way of doing things (although far from being 
easy to parse visually) but is it really faster than a loop?

After all, the indexing problem is the same and sapply simply does the 
same job as for in this case, plus "<<-" will _search_ through the 
environment on every single step. Where is the gain?

Oleg

--
Dr Oleg Sklyar | EBI-EMBL, Cambridge CB10 1SD, UK | +44-1223-494466
Byron Ellis wrote:
#
IIRC a for loop has more per-iteration overhead than lapply, but the
real answer is "it depends on what you're doing exactly." I've seen it
be a faster, a slower, and an equally fast approach.
On 1/30/07, Oleg Sklyar <osklyar at ebi.ac.uk> wrote:

  
    
#
Hi,
Byron Ellis wrote:
gen.iter = function(y=NA) {
 function(x) {
   y <<- if(is.na(y)) x else x+y
 }
}

sapply + gen.iter is slightly faster on small vectors:

  > x <- rep(1, 5000)
  > system.time(tt <- sapply(x,gen.iter()))
     user  system elapsed
    0.012   0.000   0.012
  > x <- rep(1, 5000)
  > system.time(tt <- for(i in 2:length(x)) {x[i] <- x[i-1]+x[i]})
     user  system elapsed
    0.016   0.000   0.016

but much slower on big vectors:

  > x <- rep(1, 10000000)
  > system.time(tt <- sapply(x,gen.iter()))
     user  system elapsed
  138.589   0.964 139.633
  > x <- rep(1, 10000000)
  > system.time(tt <- for(i in 2:length(x)) {x[i] <- x[i-1]+x[i]})
     user  system elapsed
   29.978   0.480  30.454
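
To rerun this comparison on another machine, something like the sketch below
could be used. It assumes the 'gen.iter' definition quoted above; the size
'n', the result names and the 'timings' vector are choices made for this
illustration. It also times cumsum(), which applies only because this
particular recurrence is a running sum. No timings are shown, since they
depend on the machine and the R version.

```r
gen.iter <- function(y = NA) {
    function(x) {
        y <<- if (is.na(y)) x else x + y
    }
}

n <- 5000           # assumed size; increase it to look at the large-vector case
x <- rep(1, n)

# Closure-based version
t.sapply <- system.time(r.sapply <- sapply(x, gen.iter()))

# Plain 'for' loop, updating a copy of x in place
r.for <- x
t.for <- system.time(for (i in 2:length(r.for)) r.for[i] <- r.for[i - 1] + r.for[i])

# Built-in running sum, for reference
t.cumsum <- system.time(r.cumsum <- cumsum(x))

# All three variants should produce the same series before timings are compared
stopifnot(isTRUE(all.equal(r.sapply, r.for)), isTRUE(all.equal(r.for, r.cumsum)))

timings <- c(sapply   = t.sapply[["elapsed"]],
             forloop  = t.for[["elapsed"]],
             cumsum   = t.cumsum[["elapsed"]])
print(timings)
```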


Cheers,
H.
#
Thank you all for your advice and tips.  In the end, I think the for loop
is the easiest way forward due to other requirements, but it's good to know
that I haven't missed anything too obvious.

Tom
On Tue, 30 Jan 2007 23:42:27 -0000, Oleg Sklyar <osklyar at ebi.ac.uk> wrote: