
How can I avoid nested 'for' loops or quicken the process?

13 messages · Brigid Mooney, Daniel Nordlund, David Winsemius +4 more

#
On Dec 23, 2008, at 9:55 AM, Brigid Mooney wrote:

apply() is giving calcProfit a named numeric vector, and calcProfit
is then trying to parse it with "$", which is an operator for lists. Try
serial extractions of the form:

long <- IterParam["long"]

That seemed to let the interpreter move on to the next error ;-)

 > Results2 <- apply(CombParam, 1, calcProfit, X, Y)
Error in IterParam$short : $ operator is invalid for atomic vectors
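The problem can be reproduced in miniature. (The poster's calcProfit and CombParam are not shown in the thread, so the objects below are illustrative stand-ins.)

```r
# apply() coerces each data-frame row to a named numeric vector,
# so "$" (a list/data-frame operator) fails inside the function.
CombParam <- data.frame(long = c(10, 20), short = c(5, 15))

f_bad  <- function(IterParam) IterParam$long    # $ is invalid for atomic vectors
f_good <- function(IterParam) IterParam["long"] # "[" works on named vectors

# apply(CombParam, 1, f_bad)   # would stop with the error quoted above
apply(CombParam, 1, f_good)    # one value per row: 10 and 20
```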
#
On Dec 23, 2008, at 10:56 AM, Brigid Mooney wrote:

snip
Have you tried as.data.frame() on Results2? Each of its elements  
should have the proper structure.

You no longer have a reproducible example, but see this session clip:
 > lairq <- apply(airquality,1, function(x) x )
 > str(lairq)
  num [1:6, 1:153] 41 190 7.4 67 5 1 36 118 8 72 ...
  - attr(*, "dimnames")=List of 2
   ..$ : chr [1:6] "Ozone" "Solar.R" "Wind" "Temp" ...
   ..$ : NULL
 > is.data.frame(lairq)
[1] FALSE
 > is.data.frame(rbind(lairq))
[1] FALSE
 > is.data.frame( as.data.frame(lairq) )
[1] TRUE
#
Avoiding multiple nested for loops (as requested in the subject) is usually
a good idea, especially if you can take advantage of vectorized functions.
You were able to redesign your code to use a single for loop.  I presume there
was a substantial improvement in program speed.  How much additional time is
saved by using apply to eliminate the final for loop?  Is it worth the
additional programming time?  Enquiring minds want to know. :-)

Dan

Daniel Nordlund
Bothell, WA USA
#
FWIW:

Good advice below! -- after all, the first rule of optimizing code is:
Don't!

For the record (yet again), the apply() family of functions (and their
packaged derivatives, of course) are "merely" very carefully written for()
loops: their main advantage is in code readability, not in efficiency gains,
which may well be small or nonexistent. True efficiency gains require
"vectorization", which essentially moves the for() loops from interpreted
code to (underlying) C code (on the underlying data structures): e.g.
compare rowMeans() [vectorized] with ave() or apply(..,1,mean).
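That comparison is easy to run directly; absolute timings are machine-dependent, but the relative ordering is the point:

```r
M <- matrix(runif(1e6), nrow = 1e3)

system.time(r1 <- rowMeans(M))        # vectorized: the loop runs in C
system.time(r2 <- apply(M, 1, mean))  # interpreted loop over the rows

all.equal(r1, r2)                     # same answer; only the speed differs
```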

Cheers,
Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Daniel Nordlund
Sent: Tuesday, December 23, 2008 10:01 AM
To: r-help at r-project.org
Subject: Re: [R] How can I avoid nested 'for' loops or quicken the process?

#
I have to agree with Daniel Nordlund regarding not creating subsidiary  
problems when the main problem has been cracked. Nonetheless, ...   
might you be happier with the result of changing the last data.frame()  
call in calcProfit to c()?

I get a matrix:
 > str(Results2)
  num [1:14, 1:16] 3.00e+04 3.00 -4.50e+02 -1.50e-02 -1.54e-02  
7.50e-01 -5.00e-01 1.00e+04 -1.50e-02 2.00e-04 ...
  - attr(*, "dimnames")=List of 2
   ..$ : chr [1:14] "OutTotInvestment" "OutNumInvestments.investment"  
"OutDolProf" "OutPerProf" ...
   ..$ : NULL

... if you go along with that strategy, then I think it is possible  
that you really want as.data.frame( t( Results2)) since the rows and  
columns seem to be transposed from what I would have wanted.
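Since Results2 itself is not reproducible here, the built-in airquality data set can stand in to show the same transposition and the t()-then-coerce fix:

```r
res <- apply(airquality, 1, function(x) x)  # 6 x 153: one *column* per input row
df  <- as.data.frame(t(res))                # transpose, then coerce

dim(df)            # 153 rows, 6 columns again
is.data.frame(df)  # TRUE
```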

Now, ...  your next task is to set up your mail-client so it sends  
unformatted text to R-help.
2 days later
#
Bert Gunter <gunter.berton <at> gene.com> writes:
[...]

The apply functions do bring speed advantages.

This is not only something I have read about:
I have used the apply functions myself and really did
get results faster.

The reason is simple: an apply function does
in C what would otherwise be done at the level of R
with for loops.

Ciao,
   Oliver
#
On Thu, 25 Dec 2008, Oliver Bandel wrote:

Not true of apply(): true of lapply() and hence sapply().  I'll leave you 
to check eapply, mapply, rapply, tapply.

So the issue is what is meant by 'the apply() family of functions': people 
often mean *apply(), of which apply() is an unusual member, if one at all.

[Historical note: a decade ago lapply was internally a for() loop.  I 
rewrote it in C in 2000: I also moved apply to C at the same time but it 
proved too little an advantage and was reverted.  The speed of lapply 
comes mainly from reduced memory allocation: for() is also written in C.]
#
Prof Brian Ripley wrote:
....
Conceptually, I think it belongs there. apply(M,1,max) is similar to 
tapply(M,row(M),max), etc. The "apply-functions" share a general 
split-operate-reassemble set of semantics, and apply _could_ be 
implemented as splitting by indices in MARGINS, followed by lapply, 
followed by reassembly into a matrix, as in tapply().
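That split/operate/reassemble reading can be sketched directly. This mirrors the semantics only, not the real aperm()-based implementation, and for simplicity assumes FUN returns a scalar:

```r
# apply(M, 1, FUN) re-expressed as split by row, lapply, reassemble
apply_rows <- function(M, FUN) {
  pieces <- split(M, row(M))        # one numeric vector per row
  vals   <- lapply(pieces, FUN)     # operate on each piece
  unlist(vals, use.names = FALSE)   # reassemble into a vector
}

M <- matrix(1:12, nrow = 3)
identical(apply_rows(M, max), apply(M, 1, max))  # TRUE
```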

In reality, apply() is implemented differently, using aperm() and direct 
indexing. This is more efficient, but it shouldn't necessarily change 
the way in which we think about it. It is a bit unfortunate that the 
most complex member of the family has gotten the most basic name, though.

#
Thankyou for the clarification, Brian. This is very helpful (as usual).

However, I think the important point, which I misstated, is that whether it
be for() or, e.g. lapply(), the "loop" contents must be evaluated at the
interpreted R level, and this is where most time is typically spent. To get
the speedup that most people hope for, avoiding the loop altogether (i.e.
moving the loop **and** the evaluations to C level) via R programming -- e.g.
via use of matrix operations, indexing, or built-in .Internal functions, etc.
-- is the key.

Please correct me if I'm (even partially) wrong. As you know, the issue
arises frequently.

-- Bert Gunter
Genentech

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Prof Brian Ripley
Sent: Friday, December 26, 2008 12:44 AM
To: Oliver Bandel
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] How can I avoid nested 'for' loops or quicken the process?
#
On Fri, 26 Dec 2008, Bert Gunter wrote:

'Typically' is not the whole story.  In a loop like

Y <- double(length(X))
for(i in seq_along(X)) Y[i] <- fun(X[i])

quite a lot of time and memory may be spent in re-allocating Y at each
step of the loop, and lapply() is able to avoid that.  E.g.

X <- runif(1e6)
system.time({
Y <- double(length(X))
for(i in seq_along(X)) Y[i] <- sin(X[i])
})

takes 5.2 secs vs unlist(lapply(X, sin)) which takes 1.5.  Of course, 
using the vectorized function sin() takes 0.05 sec.  If you use sapply you 
will lose all the gain.

This is not a typical example, but it arises often enough to make it 
worthwhile having an optimized lapply().
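The three variants above can be re-run directly. Absolute timings (and even the gap between the for loop and lapply) depend on the R version -- modern R byte-compiles loops, so the figures quoted from 2008 will not reproduce exactly -- but the fully vectorized call remains fastest by far:

```r
X <- runif(1e6)

system.time({
  Y1 <- double(length(X))
  for (i in seq_along(X)) Y1[i] <- sin(X[i])   # element-wise interpreted loop
})

system.time(Y2 <- unlist(lapply(X, sin)))      # loop in C, one allocation
system.time(Y3 <- sin(X))                      # vectorized: fastest by far

all.equal(Y1, Y2)  # all three give the same answer
all.equal(Y1, Y3)
```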