Message-ID: <CAP01uRm2=Lrtu_CWDBtVDh_4+1Oa+E8GJkk3sDnwKmv8z8xkBA@mail.gmail.com>
Date: 2011-11-05T14:32:45Z
From: Gabor Grothendieck
Subject: zoo performance regression noticed (1.6-5 is faster...)
In-Reply-To: <CAP01uRmMETAZ43W3P5LLG9ApwAADDDONbgYFDU6SQTR7ohr7_A@mail.gmail.com>
On Fri, Nov 4, 2011 at 1:02 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> On Fri, Nov 4, 2011 at 12:56 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> On Fri, Nov 4, 2011 at 12:34 PM, James Marca
>> <jmarca at translab.its.uci.edu> wrote:
>>> Good morning,
>>>
>>> I have discovered what I believe to be a performance regression
>>> between Zoo 1.6x and Zoo 1.7-6 in the application of rollapply.
>>> On zoo 1.6x, rollapply of my function over my data takes about 20
>>> minutes. Using 1.7-6, the same code takes about 6 hours.
>>>
>>> R --version
>>> R version 2.13.1 (2011-07-08)
>>> Copyright (C) 2011 The R Foundation for Statistical Computing
>>> ISBN 3-900051-07-0
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> Two versions of zoo 1.6 run *fast*. On one machine I am running
>>>
>>>  less /usr/lib64/R/library/zoo/DESCRIPTION
>>>  Package: zoo
>>>  Version: 1.6-3
>>>  Date: 2010-04-23
>>>  Title: Z's ordered observations
>>>  ...
>>>  Packaged: 2010-04-23 07:28:47 UTC; zeileis
>>>  Repository: CRAN
>>>  Date/Publication: 2010-04-23 07:43:54
>>>  Built: R 2.10.1; ; 2010-04-25 06:41:34 UTC; unix
>>>
>>> (Thankfully I forgot to upgrade.packages() on this machine!)
>>>
>>> On the other
>>>
>>>  Package: zoo
>>>  Version: 1.6-5
>>>  Date: 2011-04-08
>>>  ...
>>>  Packaged: 2011-04-08 17:13:47 UTC; zeileis
>>>  Repository: CRAN
>>>  Date/Publication: 2011-04-08 17:27:47
>>>  Built: R 2.13.1; ; 2011-11-04 15:49:54 UTC; unix
>>>
>>> I have stripped out zoo 1.7-6 from all my machines.
>>>
>>> I tried to ensure all libraries were identical on the two machines
>>> (using lsof), and after finally downgrading zoo I got the second
>>> machine to be as fast as the first, so I am quite certain the
>>> difference in speed is down to the Zoo version used.
>>>
>>> My code runs a fairly simple function over a time series using the
>>> following call to process a year of 30s data (9 columns, about a
>>> million rows):
>>>
>>>    vals <- rollapply(data=ts.data[,c(n.3.cols, o.3.cols,volocc.cols)]
>>>                      ,width=40
>>>                      ,FUN=rolling.function.fn(n.cols=n.3.cols,o.cols=o.3.cols,vo.cols=volocc.cols)
>>>                      ,by.column=FALSE
>>>                      ,align='right')
>>>
>>>
>>> (The rolling.function.fn call returns a function that is initialized
>>> with the initial call above (a trick I learned from Javascript))
>>>
>>> If this is a known situation with the new 1.7 generation Zoo, my
>>> apologies and I'll go away. If my code could be turned into a useful
>>> test, I'd be happy to help out as much as I'm able. Given the extreme
>>> runtime difference though, I thought I should offer my help in this
>>> case, since zoo is such a useful package in my work.
>>
>> This was a known problem and was fixed, but if it's still there then
>> there must be some other condition under which it can occur as well.
>> If you can provide a small, self-contained, reproducible example it
>> would help in tracking it down.
>>
>> --
>> Statistics & Software Consulting
>> GKX Group, GKX Associates Inc.
>> tel: 1-877-GKX-GROUP
>> email: ggrothendieck at gmail.com
>>
>
> Also, as a workaround you can try this to use an old rollapply in a
> new version of zoo:
>
> library(zoo)
> source("http://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/zoo/R/rollapply.R?revision=817&root=zoo")
> rollapply(...whatever...)
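
Before sourcing an older rollapply.R on top of an installed zoo, it may
help to confirm which zoo version the R library actually holds, as an
alternative to reading the DESCRIPTION file by hand. This is only a
minimal sketch: the variable names are illustrative, and the 1.7-6
comparison reflects nothing more than the version reported as slow in
this thread, not a statement about which releases are affected.

```r
# packageVersion() (from utils) returns a comparable "package_version" object.
installed <- packageVersion("zoo")
cat("zoo version in this library:", as.character(installed), "\n")

# Assumption for illustration only: the version mentioned as slow above.
reported_slow <- package_version("1.7-6")

if (installed == reported_slow) {
  message("This is the version reported as slow in this thread.")
} else {
  message("This is not the version reported as slow in this thread.")
}
```
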
>
I have looked at it, and the development version of rollapply now
contains a change that gives an order-of-magnitude speedup in the
following case:
> library(zoo)
> n <- 10000
> z <- zoo(cbind(a = 1:n, b = 1:n))
> system.time(rollapplyr(z, 10, sum, by.column = FALSE))
   user  system elapsed
   8.80    0.02    8.97
>
> # download rollapply rev 913 from svn repo and rerun
> source("http://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/zoo/R/rollapply.R?revision=913&root=zoo")
> system.time(rollapplyr(z, 10, sum, by.column = FALSE))
   user  system elapsed
   0.52    0.02    0.53
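
When swapping in a different rollapply revision like this, it seems
worth checking that the result is unchanged as well as faster. The
following is only a sketch under my own assumptions: n is reduced so it
runs quickly, and the reference value is computed independently in base
R from cumulative sums; the names row_tot, cs and expected are purely
illustrative.

```r
library(zoo)

# Same shape of data as in the timing above, but smaller so it runs quickly.
n <- 2000
width <- 10
z <- zoo(cbind(a = 1:n, b = 1:n))

# by.column = FALSE passes each whole width-by-2 window to FUN, so sum()
# returns one number per window; rollapplyr() aligns windows to the right.
timing <- system.time(
  r <- rollapplyr(z, width, sum, by.column = FALSE)
)

# Independent reference in base R: rolling sum of the row totals,
# obtained as differences of a cumulative sum.
row_tot <- rowSums(coredata(z))
cs <- c(0, cumsum(row_tot))
expected <- cs[(width + 1):(n + 1)] - cs[1:(n - width + 1)]

# One value per complete window, and the values must agree.
stopifnot(length(r) == n - width + 1)
stopifnot(isTRUE(all.equal(as.numeric(coredata(r)), expected)))

print(timing)
```

Running the same script before and after source()-ing a given revision
gives both a timing and a correctness check for that revision.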