Performance comparison xts v. zoo
6 messages · Sheftel, Ryan, Dirk Eddelbuettel, Shane Conway +2 more
On Wed, Jan 19, 2011 at 8:08 AM, Sheftel, Ryan
<ryan.sheftel at credit-suisse.com> wrote:
I am looking for a comparison of the performance speed between xts and zoo on time series. I remember once seeing this in a pdf document, perhaps a magazine article?, but after extensive google-ing I have come up blank. Any direction would be helpful before I reproduce the results myself. Thanks.
At this point in time xts is mostly written in C while zoo is mostly written in R so xts should be substantially faster. There is an objective of merging the backends of xts and zoo at which point they should run at about the same speed.
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
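The claim above is easy to check on your own machine. The following is only a rough sketch, assuming a current xts/zoo install; the series length, the one-second index and the variable names are made up for illustration, and the timings will vary by machine and package version (the gap may be smaller now than it was in 2011).

```r
library(xts)  # attaching xts also attaches zoo

set.seed(1)
n   <- 1e5
idx <- as.POSIXct("2011-01-19", tz = "UTC") + seq_len(n)  # hypothetical 1-second index

# The same data held in both classes
x1 <- xts(rnorm(n), order.by = idx)
x2 <- xts(rnorm(n), order.by = idx + 1)  # shifted one second so the indexes only partly overlap
z1 <- as.zoo(x1)
z2 <- as.zoo(x2)

# Time an outer join (the default for both merge methods)
t_xts <- system.time(mx <- merge(x1, x2))["elapsed"]
t_zoo <- system.time(mz <- merge(z1, z2))["elapsed"]

print(c(xts = unname(t_xts), zoo = unname(t_zoo)))
```

A single `system.time()` call is noisy; repeating the measurement several times gives a fairer picture.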
Hi Ryan,
On 19 January 2011 at 08:08, Sheftel, Ryan wrote:
| I am looking for a comparison of the performance speed between xts and
| zoo on time series. I remember once seeing this in a pdf document,
| perhaps a magazine article?, but after extensive google-ing I have come
| up blank.
|
| Any direction would be helpful before I reproduce the results myself.

xts is faster, period, as zoo started as more general (than just financial time series: "ordered objects") and is still R-code only. And as Gabor just restated, that is bound to change with some xts code expected to merge over to zoo at some point---though we have been told that for years.

Xts on the other hand has compiled C and Fortran code for key operations making it very fast (and generally faster than zoo), as well as powerful. One example is the ISO8601 date parsing which can subset based on human-readable strings such as "2011-01-18 10:00/2011-01-19 10:30" getting you just that half-hour interval yesterday on intra-day data in an xts object.

So in short, I usually start projects with xts.

As for blazing fast, Jeff also has the 'indexing' package (as well as 'mmap') which go even further, and both together are competitive in access time with commercial offerings. Maybe you recall a writeup Jeff did for that? I can't recall a xts-vs-zoo horse race but maybe I missed it.

Hope this helps, Dirk
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
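For readers who have not used the ISO8601-style subsetting mentioned above, here is a minimal sketch. The one-minute series and the object names are hypothetical; only the `x["from/to"]` string syntax comes from xts itself. Note that a range string whose two ends fall on different dates selects everything between them, so a half-hour window needs the same date on both sides.

```r
library(xts)

# Hypothetical intraday series: one observation per minute over two days (UTC)
idx <- seq(as.POSIXct("2011-01-18 00:00:00", tz = "UTC"),
           as.POSIXct("2011-01-19 23:59:00", tz = "UTC"),
           by = "1 min")
x <- xts(seq_along(idx), order.by = idx, tzone = "UTC")

# Half-hour window on a single day, written as a "from/to" string
half_hour <- x["2011-01-18 10:00/2011-01-18 10:30"]
range(index(half_hour))

# A range spanning two calendar dates: 10:00 on the 18th through 10:30 on the 19th
span <- x["2011-01-18 10:00/2011-01-19 10:30"]
range(index(span))

# A bare date selects that whole day
day <- x["2011-01-19"]
nrow(day)
```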
On Wed, Jan 19, 2011 at 8:32 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
Hi Ryan,

On 19 January 2011 at 08:08, Sheftel, Ryan wrote:
| I am looking for a comparison of the performance speed between xts and
| zoo on time series. I remember once seeing this in a pdf document,
| perhaps a magazine article?, but after extensive google-ing I have come
| up blank.
|
| Any direction would be helpful before I reproduce the results myself.

Xts on the other hand has compiled C and Fortran code for key operations making it very fast (and generally faster than zoo), as well as powerful.
One example is the ISO8601 date parsing which can subset based on human-readable strings such as "2011-01-18 10:00/2011-01-19 10:30" getting you just that half-hour interval yesterday on intra-day data in an xts object.
That is not truly an example. A true example would be that xts has C code for merge whereas zoo has R code. Thus merges, and functionality depending on merges, could be expected to be faster in xts.

On the other hand, comparing time operations is not a good example. Time parsing is not part of zoo, nor are time operations in general. zoo defines an API that allows it to use any time class that supports certain time/date methods (all popular ones and most lesser-known ones do, as do many classes that are not ordinarily thought of as time classes), whereas xts hard-codes these. So the "example" is really comparing particular time class methods, which are not part of zoo, with hard-coded functionality in xts. Relative speeds would depend on the particular time class and its implementation.

The one implication for speed is that if a new, faster time class comes along, zoo could likely use it without modification, whereas xts would have to be modified to handle it.
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
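The point about zoo's index API can be illustrated with a small sketch. The data and object names below are invented; the only thing being shown is that zoo accepts several index classes, including a plain numeric one, while `xts()` insists on a class it recognises as time-based.

```r
library(xts)  # attaching xts also attaches zoo

# zoo with three different index classes
z_date    <- zoo(c(10, 11, 12),    as.Date("2011-01-17") + 0:2)
z_yearmon <- zoo(c(1.5, 2.5, 3.5), as.yearmon("2011-01") + 0:2 / 12)
z_numeric <- zoo(c(7, 8, 9),       c(0.5, 1.0, 1.5))

class(index(z_date))
class(index(z_yearmon))
class(index(z_numeric))

# window() only needs the index class to support ordering and comparison
recent <- window(z_date, start = as.Date("2011-01-18"))
recent

# xts rejects an index that is not time-based; capture the error message
res <- tryCatch(xts(c(7, 8, 9), order.by = c(0.5, 1.0, 1.5)),
                error = function(e) conditionMessage(e))
print(res)
```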
6 days later
Hi Ryan,

Sorry for the late reply.

The slides you are thinking about are from a talk I gave at Columbia back in 2008, so they are a bit outdated.

http://www.quantmod.com/Columbia2008/ColumbiaDec4.pdf

In general, xts has gotten substantially faster since then. Some things in xts are now even faster than using raw matrices for data.

    x <- .xts(1:1e6L, 1:1e6L)
    system.time( x[,1] )
       user  system elapsed
      0.002   0.000   0.002

    m <- coredata(x) # a matrix, no 'time' index, so not really a time series
    system.time( m[,1] )
       user  system elapsed
      0.015   0.000   0.015

I haven't put together a comparison in a while, but repeating the benchmarks in the slides shows xts with a substantial edge, often an order of magnitude better (even 2 or 3 orders of magnitude).

Most of the differences in xts vs. zoo come from the C in xts of course. But as Gabor noted, the effort is under way to move many of the core C functions back up into zoo. The limits here are that while xts and zoo are very, very compatible - some xts functionality differs - and we can't realistically break anything in zoo in the process. The other limit is with respect to time. Some of the code is in zoo already, though not 'switched on' yet. Subsetting and basic Ops are really the primary target for the migration.

My 2c is that xts is as fast as it can be, as it is all highly optimized C --- 10x faster than all the other ts classes at a minimum --- but that zoo will be brought up to speed "soon". ;-)

The other part of xts is that it does make "development" speedier by way of the ISO8601 subsetting and related to.period aggregation code. Not sure if that counts for 'performance speed' as your original post requests though.

Best,
Jeff

P.S. The timeSeries values in the slides are _much_ improved as of current implementations - though still much slower than xts.

On Wed, Jan 19, 2011 at 7:08 AM, Sheftel, Ryan
<ryan.sheftel at credit-suisse.com> wrote:
I am looking for a comparison of the performance speed between xts and
zoo on time series. I remember once seeing this in a pdf document,
perhaps a magazine article?, but after extensive google-ing I have come
up blank.
Any direction would be helpful before I reproduce the results myself.
Thanks.
_______________________________________________
R-SIG-Finance at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Jeffrey Ryan jeffrey.ryan at lemnica.com www.lemnica.com
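As a rough illustration of the to.period aggregation mentioned above, the sketch below rolls a made-up one-minute price series into 30-minute bars. The series, the seed and the name `px` are assumptions for the example; `to.period()` and its `period`, `k` and `name` arguments are part of xts.

```r
library(xts)

set.seed(42)

# Hypothetical one-minute prices: 120 observations starting at 09:30 UTC
idx <- seq(as.POSIXct("2011-01-18 09:30:00", tz = "UTC"),
           by = "1 min", length.out = 120)
px  <- xts(100 + cumsum(rnorm(120, sd = 0.05)), order.by = idx, tzone = "UTC")

# Aggregate to 30-minute open/high/low/close bars
bars <- to.period(px, period = "minutes", k = 30, name = "px")
bars
```

Each row of `bars` is stamped with the last observation time in its 30-minute bucket, and the bars can be subset with the same range strings used on the original series.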