
Performance comparison xts v. zoo

6 messages · Sheftel, Ryan, Dirk Eddelbuettel, Shane Conway +2 more

#
On Wed, Jan 19, 2011 at 8:08 AM, Sheftel, Ryan
<ryan.sheftel at credit-suisse.com> wrote:
At this point in time xts is mostly written in C while zoo is mostly
written in R, so xts should be substantially faster.  There is an
objective of merging the backends of xts and zoo, at which point they
should run at about the same speed.
#
Hi Ryan,
On 19 January 2011 at 08:08, Sheftel, Ryan wrote:
| I am looking for a comparison of the performance of xts and zoo on
| time series. I remember once seeing this in a pdf document, perhaps
| a magazine article, but after extensive googling I have come up
| blank.
| 
| Any direction would be helpful before I reproduce the results myself.

xts is faster, period. zoo started out more general (handling arbitrary
"ordered objects", not just financial time series) and is still implemented
purely in R. And as Gabor just restated, that is bound to change, with some
xts code expected to merge over to zoo at some point, though we have been
told that for years.

xts, on the other hand, has compiled C and Fortran code for key operations,
making it very fast (and generally faster than zoo) as well as powerful.
One example is the ISO8601 date parsing, which lets you subset with
human-readable strings such as "2011-01-18 10:00/2011-01-19 10:30",
getting you exactly that window of intra-day data from an xts
object. So in short, I usually start projects with xts.
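For illustration, the ISO8601 range subsetting Dirk describes looks roughly like this. This is a minimal sketch, not from the thread; it assumes the xts package is installed, and the sample data and times are made up to match his example string.

```r
library(xts)

# Half-hourly intra-day series spanning 2011-01-18 to 2011-01-19
idx <- seq(as.POSIXct("2011-01-18 09:00", tz = "UTC"),
           as.POSIXct("2011-01-19 17:00", tz = "UTC"), by = "30 min")
x <- xts(seq_along(idx), order.by = idx)

# Human-readable range string: 10:00 on the 18th through 10:30 on the 19th
w <- x["2011-01-18 10:00/2011-01-19 10:30"]
range(index(w))
```

The range endpoints are parsed from the string itself; no explicit POSIXct conversion is needed on the caller's side.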

As for blazing fast: Jeff also has the 'indexing' package (as well as
'mmap'), which go even further, and both together are competitive in access
time with commercial offerings. Maybe you recall a writeup Jeff did for
that?  I can't recall an xts-vs-zoo horse race, but maybe I missed it.

Hope this helps,  Dirk
#
On Wed, Jan 19, 2011 at 8:32 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
That is not truly an example.

A true example would be that xts has C code for merge whereas zoo has
R code.  Thus merges and functionality depending on merges could be
expected to be faster in xts.
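A hedged sketch of the kind of merge comparison Gabor is pointing at; it assumes the xts and zoo packages are installed, and the sizes and random data are arbitrary.

```r
library(xts)
library(zoo)

set.seed(42)
n  <- 1e5
t1 <- as.POSIXct("2011-01-19", tz = "UTC") + sort(sample(1:1e6, n))
t2 <- as.POSIXct("2011-01-19", tz = "UTC") + sort(sample(1:1e6, n))

zx <- zoo(rnorm(n), t1); zy <- zoo(rnorm(n), t2)
xx <- as.xts(zx);        xy <- as.xts(zy)

system.time(merge(zx, zy))   # zoo: merge implemented in R
system.time(merge(xx, xy))   # xts: merge implemented in C
```

The default is an outer join in both packages, so the two calls do comparable work and only the implementation differs.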

On the other hand, comparing time operations is not a good example.
Parsing times is not part of zoo, nor are time operations in general.
zoo defines an API that allows it to use any time class supporting
certain time/date methods (all popular ones do, as do most lesser-known
ones and many classes not ordinarily thought of as time classes),
whereas xts hard-codes these. The "example" is therefore really
comparing particular time class methods, which are not part of zoo,
with hard-coded functionality in xts. Relative speeds would depend on
the particular time class and its implementation.

The one implication for speed is that if a new, faster time class comes
along, zoo could likely use it without modification, whereas xts would
have to be modified to handle it.
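To illustrate the index-class point (this example is not from the thread; it assumes the zoo package is installed), the same zoo code works unchanged across index classes because generics like window() and merge() dispatch on whatever index class the object carries:

```r
library(zoo)

z_date <- zoo(1:3, as.Date("2011-01-19") + 0:2)   # Date index
z_ym   <- zoo(1:3, as.yearmon(2011) + 0:2 / 12)   # yearmon index
z_num  <- zoo(1:3, c(1.5, 2.5, 3.5))              # plain numeric "time"

# The same window() call works for each; only the index class differs
window(z_date, start = as.Date("2011-01-20"))
window(z_num,  start = 2)
```

Swapping in a new index class requires no change to zoo itself, which is exactly the flexibility (and the speed caveat) Gabor describes.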
6 days later
#
Hi Ryan,

Sorry for the late reply.

The slides you are thinking about are from a talk I gave at Columbia
back in 2008, so they are a bit outdated.

http://www.quantmod.com/Columbia2008/ColumbiaDec4.pdf

In general, xts has gotten substantially faster since then.  Some
things in xts are now even faster than using raw matrices for data.

library(xts)
x <- .xts(1:1e6L, 1:1e6L)   # .xts(): fast constructor from data and numeric index
system.time( x[,1] )
   user  system elapsed
  0.002   0.000   0.002

m <- coredata(x)  # a plain matrix; no 'time' index, so not really a time series
system.time( m[,1] )
   user  system elapsed
  0.015   0.000   0.015

I haven't put together a comparison in a while, but repeating the
benchmarks in the slides shows xts with a substantial edge, often an
order of magnitude better (sometimes even 2 or 3 orders of magnitude).

Most of the difference between xts and zoo comes from the C in xts, of
course.  But as Gabor noted, an effort is under way to move many of the
core C functions back up into zoo.  One limit here is that while xts
and zoo are very, very compatible, some xts functionality differs, and
we can't realistically break anything in zoo in the process.
The other limit is simply time.  Some of the code is in zoo already,
though not 'switched on' yet. Subsetting and basic Ops are really the
primary targets for the migration.

My 2c is that xts is as fast as it can be, as it is all highly
optimized C --- 10x faster than all the other ts classes at a minimum
--- but that zoo will be brought up to speed "soon". ;-)

The other part of xts is that it makes "development" speedier, by way
of the ISO8601 subsetting and the related to.period aggregation code.
Not sure if that counts as 'performance speed' as your original post
requests, though.
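As a sketch of the to.period aggregation Jeff mentions (not from the thread; it assumes the xts package is installed, and the minute-bar data are made up):

```r
library(xts)

set.seed(1)
# One trading day of one-minute observations, 09:30 through 15:59
idx <- seq(as.POSIXct("2011-01-19 09:30", tz = "UTC"),
           by = "1 min", length.out = 390)
px  <- xts(100 + cumsum(rnorm(390, sd = 0.05)), order.by = idx)

# Collapse the minute series to hourly OHLC bars in one call
bars <- to.period(px, period = "hours")
bars
```

One line replaces the split-apply-combine loop you would otherwise write by hand, which is the development-speed point being made.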

Best,
Jeff

P.S. The timeSeries values in the slides are _much_ improved as of
current implementations - though still much slower than xts.
