Good morning,
I have discovered what I believe to be a performance regression
in rollapply between zoo 1.6-x and zoo 1.7-6.
On zoo 1.6-x, rollapply of my function over my data takes about 20
minutes. Using 1.7-6, the same code takes about 6 hours.
R --version
R version 2.13.1 (2011-07-08)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)
Two versions of zoo 1.6 run *fast*. On one machine I am running
less /usr/lib64/R/library/zoo/DESCRIPTION
Package: zoo
Version: 1.6-3
Date: 2010-04-23
Title: Z's ordered observations
...
Packaged: 2010-04-23 07:28:47 UTC; zeileis
Repository: CRAN
Date/Publication: 2010-04-23 07:43:54
Built: R 2.10.1; ; 2010-04-25 06:41:34 UTC; unix
(Thankfully I forgot to run update.packages() on this machine!)
On the other:
Package: zoo
Version: 1.6-5
Date: 2011-04-08
...
Packaged: 2011-04-08 17:13:47 UTC; zeileis
Repository: CRAN
Date/Publication: 2011-04-08 17:27:47
Built: R 2.13.1; ; 2011-11-04 15:49:54 UTC; unix
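The same DESCRIPTION fields can also be read from inside R, which may be easier than paging through the file on each machine. This is only a minimal sketch: pkg_info is a hypothetical helper name, and "stats" is used in place of "zoo" so that it runs on any installation.

```r
# Hypothetical helper: report version and build details for an
# installed package, read from its DESCRIPTION metadata.
pkg_info <- function(pkg) {
  d <- utils::packageDescription(pkg)
  c(Package = d$Package, Version = d$Version, Built = d$Built)
}

# "stats" ships with R, so this runs anywhere; use "zoo" on the
# machines being compared.
info <- pkg_info("stats")
print(info)
```

Running it with "zoo" on both machines should give the Version and Built values to compare side by side.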
I have stripped out zoo 1.7-6 from all my machines.
I tried to ensure all libraries were identical on the two machines
(using lsof), and after finally downgrading zoo I got the second
machine to be as fast as the first, so I am quite certain the
difference in speed is down to the zoo version used.
My code runs a fairly simple function over a time series using the
following call to process a year of 30-second data (9 columns, about a
million rows):
vals <- rollapply(data=ts.data[,c(n.3.cols, o.3.cols,volocc.cols)]
,width=40
,FUN=rolling.function.fn(n.cols=n.3.cols,o.cols=o.3.cols,vo.cols=volocc.cols)
,by.column=FALSE
,align='right')
(The rolling.function.fn call returns a function, a closure that keeps
the arguments passed in the call above; a trick I learned from JavaScript.)
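For readers unfamiliar with the pattern, here is a minimal base-R sketch of such a function factory. It is not the actual rolling.function.fn: make_window_fn, the column names, and the toy data are all made up for illustration, and the windowing loop only approximates what by.column=FALSE with align='right' asks rollapply to do.

```r
# Hypothetical factory: the column names are captured once, when the
# factory is called, and remain visible to the returned function.
make_window_fn <- function(n.cols, o.cols) {
  function(window) {
    # `window` is a matrix holding the rows of one window
    c(n.mean = mean(window[, n.cols]),
      o.max  = max(window[, o.cols]))
  }
}

# Toy data standing in for the real series (values are made up).
x <- cbind(n1 = 1:6, o1 = c(5, 3, 8, 1, 9, 2))
f <- make_window_fn(n.cols = "n1", o.cols = "o1")

# Right-aligned windows of width 3, with all columns passed together.
width <- 3
ends <- width:nrow(x)
res <- t(sapply(ends, function(i) f(x[(i - width + 1):i, , drop = FALSE])))
rownames(res) <- ends
print(res)
```

Each row of res is labelled with the index of the last observation in its window, which is the right-aligned convention.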
If this is a known issue with the new 1.7 generation of zoo, my
apologies and I'll go away. If my code could be turned into a useful
test, I'd be happy to help out as much as I'm able. Given the extreme
runtime difference though, I thought I should offer my help in this
case, since zoo is such a useful package in my work.
Regards,
James Marca
zoo performance regression noticed (1.6-5 is faster...)
4 messages · James Marca, Gabor Grothendieck
On Fri, Nov 4, 2011 at 12:34 PM, James Marca <jmarca at translab.its.uci.edu> wrote:
> [...]
This was a known problem and was fixed, but if it's still there then there must be some other condition under which it can occur as well. If you can provide a small, self-contained, reproducible example, it would help in tracking it down.
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
On Fri, Nov 4, 2011 at 12:56 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> [...]
Also, as a workaround, you can try this to use the old rollapply with a
new version of zoo:
library(zoo)
source("http://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/zoo/R/rollapply.R?revision=817&root=zoo")
rollapply(...whatever...)
On Fri, Nov 4, 2011 at 1:02 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> [...]
I have looked at it, and the development version of rollapply now includes a performance improvement that gives an order-of-magnitude speedup in the following case:
library(zoo)
n <- 10000
z <- zoo(cbind(a = 1:n, b = 1:n))
system.time(rollapplyr(z, 10, sum, by.column = FALSE))
   user  system elapsed
   8.80    0.02    8.97
# download rollapply rev 913 from svn repo and rerun
source("http://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/zoo/R/rollapply.R?revision=913&root=zoo")
system.time(rollapplyr(z, 10, sum, by.column = FALSE))
   user  system elapsed
   0.52    0.02    0.53