Speed of for loops

12 messages · Tom McCallum, Tamas K Papp, Ramon Diaz-Uriarte +3 more

Tom McCallum

Tue, Jan 30, 2007 2:38 AM #

Hi Everyone,

I have a question about for loops.  If you have something like:

f <- function(x) {
	y <- rep(NA,10);
	for( i in 1:10 ) {
		if ( i > 3 ) {
			if ( is.na(y[i-3]) == FALSE ) {
				# some calculation F which depends on one or more of the
				# previously generated values in the series
				y[i] <- y[i-1]+x[i];
			} else {
				y[i] <- x[i];
			}
		}
	}
	y
}

e.g.

f(c(1,2,3,4,5,6,7,8,9,10,11,12));

[1] NA NA NA  4  5  6 13 21 30 40

is there a faster way to process this than with a 'for' loop?  I have  
looked at lapply as well but I have read that lapply is no faster than a  
for loop and for my particular application it is easier to use a for loop.  
Also I have seen 'rle' which I think may help me but am not sure as I have  
only just come across it, any ideas?

Many thanks

Tom

Dr. Thomas McCallum
Systems Architect,
Level E Limited
ETTC, The King's Buildings
Mayfield Road,
Edinburgh EH9 3JL, UK
Work  +44 (0) 131 472 4813
Fax:  +44 (0) 131 472 4719
http://www.levelelimited.com
Email: tom at levelelimited.com

Oleg Sklyar

Tue, Jan 30, 2007 4:15 AM #

Tom,

The *apply functions generally speed up calculations dramatically, but
only if you perform a repetitive operation on a vector, list, or matrix
that does NOT require accessing elements of that variable other than the
one currently at the *apply index. This means that in your case none of
the *apply functions will speed up your calculation (unless you
significantly rethink the code). At the same time, you can speed up your
code by orders of magnitude by using C functions for "complex" vector
indexing operations. If you need instructions, I can send you a very nice
"Step-by-step guide for using C/C++ in R", which goes beyond the
"Writing R Extensions" document.
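
The distinction drawn above can be illustrated with a minimal sketch. The data and variable names below are hypothetical and are not taken from the thread; the point is only that per-element work maps onto sapply directly, while a recurrence needs the previous result at every step.

```r
# Hypothetical data; the names below are illustrative only.
x <- c(1, 2, 3, 4, 5)

# Independent per-element work: each result depends only on its own
# element, so sapply (or plain vectorised arithmetic) applies directly.
squares <- sapply(x, function(v) v^2)

# A recurrence: y[i] depends on y[i - 1], so every step needs the
# previous result and the work is inherently sequential.
y <- numeric(length(x))
y[1] <- x[1]
for (i in 2:length(x)) {
  y[i] <- y[i - 1] + x[i]
}

# Reduce() with accumulate = TRUE expresses the same recurrence in a
# functional style, but it still walks the vector one step at a time.
y_reduce <- Reduce(function(prev, cur) prev + cur, x, accumulate = TRUE)
```

Whether any of these forms is faster than the others depends on the R version and the size of the input, so it is worth timing them on real data.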

Otherwise, such questions should be posted to R-help, not R-devel; please
post accordingly.

Best regards,
Oleg

Dr Oleg Sklyar * EBI/EMBL, Cambridge CB10 1SD, England * +44-1223-494466

Tamas K Papp

Tue, Jan 30, 2007 6:46 AM #

Hi Oleg,

Can you please post this guide online?  I think that many people would
be interested in reading it, incl. me.

Tamas

Oleg Sklyar

Tue, Jan 30, 2007 7:27 AM #

I know this should not go to [Rd], but the original post was there and 
the replies as well.

Thank you to all who expressed interest in the "Step-by-step guide for
using C/C++ in R"! To answer some of you: yes, it is by me, and it was
written to help other group members start adding C/C++ code to
their R coding.

You can now download it from:

http://www.ebi.ac.uk/~osklyar/kb/CtoRinterfacingPrimer.pdf

I would also appreciate your comments if you find it useful or not, or 
maybe what can be added or modified. But not on the list, directly to my 
email please.

Best wishes,
Oleg

Ramon Diaz-Uriarte

Tue, Jan 30, 2007 8:57 AM #

On Tuesday 30 January 2007 15:46, Tamas K Papp wrote:

Me too.

Thanks,

R.

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Ramón Díaz-Uriarte
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)

Herve Pages

Tue, Jan 30, 2007 2:29 PM #

Hi Tom,

In the general case, you need a loop in order to propagate calculations
and their results across a vector.

In _your_ particular case however, it seems that all you are doing is a
cumulative sum on x (at least this is what's happening for i >= 6).
So you could do:

f2 <- function(x)
{
    offset <- 3
    start_propagate_at <- 6
    y_length <- 10
    init_range <- (offset+1):start_propagate_at
    y <- rep(NA, offset)
    y[init_range] <- x[init_range]
    y[start_propagate_at:y_length] <- cumsum(x[start_propagate_at:y_length])
    y
}

and it will return the same thing as your function 'f' (at least when 'x' doesn't
contain NAs) but it's not faster :-/
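
As a rough check of the equivalence claimed above, the two functions can be compared directly. This is only a sketch: f is a lightly condensed copy of Tom's loop, f2 is copied from this message, and the variable names x_na, same_on_example and same_with_na are made up for the illustration.

```r
# Tom's loop, lightly condensed from the original post.
f <- function(x) {
  y <- rep(NA, 10)
  for (i in 1:10) {
    if (i > 3) {
      if (!is.na(y[i - 3])) y[i] <- y[i - 1] + x[i] else y[i] <- x[i]
    }
  }
  y
}

# The cumsum-based version from this message.
f2 <- function(x) {
  offset <- 3
  start_propagate_at <- 6
  y_length <- 10
  init_range <- (offset + 1):start_propagate_at
  y <- rep(NA, offset)
  y[init_range] <- x[init_range]
  y[start_propagate_at:y_length] <- cumsum(x[start_propagate_at:y_length])
  y
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
same_on_example <- isTRUE(all.equal(f(x), f2(x)))

# With an NA inside the initial range the loop restarts its sum later on,
# while cumsum keeps accumulating, so the two results differ.
x_na <- x
x_na[5] <- NA
same_with_na <- isTRUE(all.equal(f(x_na), f2(x_na)))
```

Here same_on_example comes out TRUE and same_with_na comes out FALSE, which matches the caveat about NAs in 'x'.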

IMO, using sapply for propagating calculations across a vector is not appropriate
because:

  (1) It requires special care. For example, this:

        > x <- 1:10
        > sapply(2:length(x), function(i) {x[i] <- x[i-1]+x[i]})

      doesn't work because the 'x' symbol on the left side of the <- in the
      anonymous function doesn't refer to the 'x' symbol defined in the global
      environment. So you need to use tricks like this:

        > sapply(2:length(x),
                 function(i) {x[i] <- x[i-1]+x[i]; assign("x", x, envir=.GlobalEnv); x[i]})

  (2) Because of tricks like this, it is _very_ slow (about 10 times
      slower than a 'for' loop, or more).
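
The scoping issue in point (1) can be seen in a small sketch. The wrapper name running_sum is hypothetical, and the use of <<- inside a wrapper function is one possible alternative to assign(), not something proposed in this message.

```r
x <- 1:10

# Each call to the anonymous function modifies a local copy of x,
# so the global x is untouched afterwards.
res <- sapply(2:length(x), function(i) { x[i] <- x[i - 1] + x[i] })
x_unchanged <- identical(x, 1:10)

# Wrapping the work in a function and using <<- updates the x that
# lives in the enclosing function's environment, not the workspace.
running_sum <- function(x) {
  sapply(seq_along(x), function(i) {
    if (i > 1) x[i] <<- x[i - 1] + x[i]
    x[i]
  })
}
out <- running_sum(1:10)
```

In this sketch res holds only pairwise sums of neighbouring elements, x_unchanged is TRUE, and out matches cumsum(1:10). It says nothing about speed, which would have to be measured separately.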

Cheers,
H.

Byron Ellis

Tue, Jan 30, 2007 3:23 PM #

Actually, why not use a closure to store previous value(s)?

In the simple case, which depends on x_i and y_{i-1}

gen.iter = function(x) {
    y = NA
    function(i) {
       y <<- if(is.na(y)) x[i] else y+x[i]
    }
}

y = sapply(1:10,gen.iter(x))

Obviously you can modify the function for the bookkeeping required to
manage whatever lag you need. I use this sometimes when I'm
implementing MCMC samplers of various kinds.
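
One way the lag bookkeeping mentioned above might look is sketched here. The name gen_lagged_iter and its lag argument are hypothetical; the closure keeps the whole history of generated values so that it can reproduce the lag-3 rule from Tom's original loop.

```r
# Hypothetical generator: the returned function fills in y[i] and
# remembers earlier values in the enclosing environment.
gen_lagged_iter <- function(x, lag = 3) {
  y <- rep(NA_real_, length(x))   # history of generated values
  function(i) {
    if (i > lag) {
      # Accumulate once the value 'lag' steps back exists, else restart.
      y[i] <<- if (!is.na(y[i - lag])) y[i - 1] + x[i] else x[i]
    }
    y[i]
  }
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
y <- sapply(1:10, gen_lagged_iter(x))
```

For this input the result is NA NA NA 4 5 6 13 21 30 40, the same as Tom's f. The sketch relies on sapply visiting the indices in increasing order.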

Byron Ellis (byron.ellis at gmail.com)
"Oook" -- The Librarian

Byron Ellis

Tue, Jan 30, 2007 3:25 PM #

Actually, better yet:

gen.iter = function(y=NA) {
  function(x) {
    y <<- if(is.na(y)) x else x+y
  }
}
sapply(x,gen.iter())

Oleg Sklyar

Tue, Jan 30, 2007 3:42 PM #

It is surely an elegant way of doing things (although far from being 
easy to parse visually) but is it really faster than a loop?

After all, the indexing problem is the same and sapply simply does the 
same job as for in this case, plus "<<-" will _search_ through the 
environment on every single step. Where is the gain?

Oleg

--
Dr Oleg Sklyar | EBI-EMBL, Cambridge CB10 1SD, UK | +44-1223-494466

Byron Ellis

Tue, Jan 30, 2007 4:01 PM #

IIRC a for loop has more per-iteration overhead than lapply, but the
real answer is "it depends on what you're doing exactly." I've seen it
be faster, slower, and about equal.

Herve Pages

Tue, Jan 30, 2007 4:44 PM #

Hi,

Byron Ellis wrote:

gen.iter = function(y=NA) {
 function(x) {
   y <<- if(is.na(y)) x else x+y
 }
}

sapply + gen.iter is slightly faster on small vectors:

  > x <- rep(1, 5000)
  > system.time(tt <- sapply(x,gen.iter()))
     user  system elapsed
    0.012   0.000   0.012
  > x <- rep(1, 5000)
  > system.time(tt <- for(i in 2:length(x)) {x[i] <- x[i-1]+x[i]})
     user  system elapsed
    0.016   0.000   0.016

but much slower on big vectors:

  > x <- rep(1, 10000000)
  > system.time(tt <- sapply(x,gen.iter()))
     user  system elapsed
  138.589   0.964 139.633
  > x <- rep(1, 10000000)
  > system.time(tt <- for(i in 2:length(x)) {x[i] <- x[i-1]+x[i]})
     user  system elapsed
   29.978   0.480  30.454
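
The comparison above can be rerun with a sketch along these lines. The size n and the result names are illustrative, the timings will vary by machine and R version, and cumsum is included only as a baseline for this special case where the recurrence is a plain running sum.

```r
# Closure-based iterator quoted earlier in this message.
gen.iter <- function(y = NA) {
  function(x) {
    y <<- if (is.na(y)) x else x + y
  }
}

n <- 5000                 # illustrative size; increase it to see the gap grow
x <- rep(1, n)

# Closure + sapply.
t_sapply <- system.time(y_sapply <- sapply(x, gen.iter()))

# Plain for loop, updating a copy of x in place.
y_loop <- x
t_loop <- system.time(
  for (i in 2:length(y_loop)) y_loop[i] <- y_loop[i - 1] + y_loop[i]
)

# Vectorised baseline.
t_cumsum <- system.time(y_cumsum <- cumsum(x))

# Elapsed seconds for each approach.
print(c(sapply = t_sapply[["elapsed"]],
        loop   = t_loop[["elapsed"]],
        cumsum = t_cumsum[["elapsed"]]))
```

All three approaches should produce the same vector here; only the elapsed times differ.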


Cheers,
H.

Tom McCallum

Wed, Jan 31, 2007 4:35 AM #

Thank you all for your advice and tips.  In the end, I think the for loop
is the easiest way forward due to other requirements, but it's good to know
that I haven't missed anything too obvious.

Tom
