
xts() speed on data with date index

11 messages · Gabor Grothendieck, Michael, Roger Bergande +3 more

#
I like xts because subsetting is easier, something like x["2009-08"].
But it seems that xts() is a little slower than zoo() when converting
data with a Date index instead of a time index.
user  system elapsed
      0       0       0
user  system elapsed
      0       0       0
user  system elapsed
      0       0       0
user  system elapsed
   0.34    0.00    0.35
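The commands behind these timings were not preserved. A minimal sketch of the kind of comparison being described, assuming a hypothetical 10,000-row, four-column matrix with a daily Date index (size and data are illustrative, not the original):

```r
library(zoo)
library(xts)

set.seed(1)
n <- 10000                                  # hypothetical series length
dates <- as.Date("1980-01-01") + 0:(n - 1)  # daily Date index
m <- matrix(rnorm(n * 4), ncol = 4)

# Time construction of each object from the same data and Date index
t_zoo <- system.time(z <- zoo(m, order.by = dates))
t_xts <- system.time(x <- xts(m, order.by = dates))

# user / system / elapsed for each constructor
print(rbind(zoo = t_zoo[1:3], xts = t_xts[1:3]))
```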

Regards,
Michael
#
The explanation for the time difference is likely that xts
converts all time indexes to POSIXct internally whereas
zoo works with them in their original form.   Thus the
time difference would be expected to be the conversion time.

Once it's converted, xts would likely be faster for those
xts operations where the xts code is in C. In contrast, zoo
is 100% R, although eventually it is anticipated that portions
of zoo will be in C too.
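To make the conversion concrete: a Date counts days since 1970-01-01 while POSIXct counts seconds, so representing a Date as POSIXct at midnight UTC is a multiplication by 86400, and converting back is a division. A minimal base-R sketch of that round trip, assuming midnight-UTC semantics; it illustrates the idea and is not xts's actual conversion code:

```r
# Hypothetical daily Date index
d <- as.Date("2009-07-26") + 0:5

# Date -> POSIXct at midnight UTC: days since epoch * seconds per day
p <- .POSIXct(as.numeric(d) * 86400, tz = "UTC")

# POSIXct -> Date: whole days since epoch
d_back <- as.Date(floor(as.numeric(p) / 86400), origin = "1970-01-01")

print(p)
print(identical(d, d_back))  # TRUE
```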

As xts is a subclass of zoo, some of its operations are done
in zoo, so in those cases there would not be much of a difference
in speed. Likewise, if xts handles the corresponding operation itself
but in R, then not much difference would be expected
either.
On Sat, Jul 25, 2009 at 8:59 AM, michael li<michaellibeijing at gmail.com> wrote:
#
Gabor is correct.

In addition, the operation you are looking at is the creation of an xts
object, which usually happens once in typical use (or at
least far less often than other operations).

xts stores all times internally as POSIXct, for internal
efficiency/consistency purposes. The original time class is preserved
in an internal attribute that is used in calls where it makes sense to
convert for the user:

(1) index():
where you are extracting the 'original' index. Transparently, the
POSIXct is converted back to the time class you expect. This carries
overhead, but again should only be used occasionally. Direct access
to the index is via .index().

(2) print(), or printing by default:
conversion cost is irrelevant here, as it is assumed you will spend
far more time looking at the output than any cost to convert.
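The difference between index() and .index() described in (1) can be seen on a small object. A minimal sketch against a current xts release, with hypothetical data; it assumes (as current xts does) that a Date index is stored as midnight-UTC seconds:

```r
library(xts)

d <- as.Date("2009-07-26") + 0:5
x <- xts(1:6, order.by = d)

# index(): converts the stored seconds back to the original time class
class(index(x))   # "Date"

# .index(): the raw internal representation, numeric seconds since the epoch
idx <- .index(x)
head(as.numeric(idx), 2)
```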

The logic is that everything is 'time', just with slightly different
internal representations. The user shouldn't have to care about the time
class chosen, just as the user shouldn't have to care about big-endian or
little-endian hardware storage.

There are some technical issues that are still not perfect, of course.
Timezone handling can get tricky if you do a lot of strange subsetting
with *different* index classes. This type of handling is entirely
impossible in most other time series classes, so the 'weakness' isn't
really a fair criticism: xts is still better than most, but it isn't perfect
... yet.
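One concrete source of that trickiness, shown in base R rather than in xts: a Date stored as POSIXct at midnight UTC lands on the previous calendar day when viewed in a timezone west of UTC. A minimal sketch, assuming a midnight-UTC convention and using America/New_York purely as an example zone:

```r
d <- as.Date("2009-07-26")

# Represent the Date as midnight UTC
p <- as.POSIXct(format(d), tz = "UTC")

# The same instant, formatted as a calendar day in two timezones
utc_day <- format(p, "%Y-%m-%d", tz = "UTC")               # "2009-07-26"
ny_day  <- format(p, "%Y-%m-%d", tz = "America/New_York")  # "2009-07-25"
```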

HTH
Jeff


On Sat, Jul 25, 2009 at 8:19 AM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
#
An example to clarify/confuse ;)
user  system elapsed
  0.001   0.001   0.002
user  system elapsed
  0.164   0.004   0.168
[1] "Date"
[1] "POSIXt"  "POSIXct"

##  in POSIXct time as expected
[,1]       [,2]       [,3]       [,4]
2009-07-26 09:39:32  0.6739000  0.6124116 -0.5820407  0.1022264
2009-07-27 09:39:32 -0.7806308 -0.4988508 -0.5448239 -0.5649892
2009-07-28 09:39:32  1.0275375 -0.8538655 -0.6098992  1.9661175
2009-07-29 09:39:32  0.6268532 -0.3144514  1.0142005 -0.2303953
2009-07-30 09:39:32 -3.8062200 -0.3875341 -0.9251609 -0.4677076
2009-07-31 09:39:32 -1.1192617 -0.6575085 -0.9918533 -0.8743504
[,1]       [,2]       [,3]       [,4]
2009-07-26  0.6739000  0.6124116 -0.5820407  0.1022264
2009-07-27 -0.7806308 -0.4988508 -0.5448239 -0.5649892
2009-07-28  1.0275375 -0.8538655 -0.6098992  1.9661175
2009-07-29  0.6268532 -0.3144514  1.0142005 -0.2303953
2009-07-30 -3.8062200 -0.3875341 -0.9251609 -0.4677076
2009-07-31 -1.1192617 -0.6575085 -0.9918533 -0.8743504
### simply convert the indexClass -- this is essentially costless,
### and now you have an xts object 'indexed' by Date
[,1]       [,2]       [,3]       [,4]
2009-07-26  0.6739000  0.6124116 -0.5820407  0.1022264
2009-07-27 -0.7806308 -0.4988508 -0.5448239 -0.5649892
2009-07-28  1.0275375 -0.8538655 -0.6098992  1.9661175
2009-07-29  0.6268532 -0.3144514  1.0142005 -0.2303953
2009-07-30 -3.8062200 -0.3875341 -0.9251609 -0.4677076
2009-07-31 -1.1192617 -0.6575085 -0.9918533 -0.8743504
user  system elapsed
  0.001   0.001   0.001

Now convert back, and see if something like merge cares about the
index class... (hint: it doesn't)
user  system elapsed
  0.001   0.000   0.000
[1] "POSIXt"  "POSIXct"
[,1]       [,2]       [,3]       [,4]
2009-07-26 09:39:32  0.6739000  0.6124116 -0.5820407  0.1022264
2009-07-27 09:39:32 -0.7806308 -0.4988508 -0.5448239 -0.5649892
2009-07-28 09:39:32  1.0275375 -0.8538655 -0.6098992  1.9661175
2009-07-29 09:39:32  0.6268532 -0.3144514  1.0142005 -0.2303953
2009-07-30 09:39:32 -3.8062200 -0.3875341 -0.9251609 -0.4677076
2009-07-31 09:39:32 -1.1192617 -0.6575085 -0.9918533 -0.8743504
[,1]       [,2]       [,3]       [,4]
2009-07-26  0.6739000  0.6124116 -0.5820407  0.1022264
2009-07-27 -0.7806308 -0.4988508 -0.5448239 -0.5649892
2009-07-28  1.0275375 -0.8538655 -0.6098992  1.9661175
2009-07-29  0.6268532 -0.3144514  1.0142005 -0.2303953
2009-07-30 -3.8062200 -0.3875341 -0.9251609 -0.4677076
2009-07-31 -1.1192617 -0.6575085 -0.9918533 -0.8743504
user  system elapsed
  0.001   0.000   0.002
user  system elapsed
  0.001   0.000   0.002

### Caveat: converting back and forth is a bit silly, and possibly
### perilous, but it is illustrative of what you can do with xts
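The commands behind the output above were not preserved. A minimal sketch of the same idea against a current xts release, where the 2009-era `indexClass<-` has been deprecated in favour of `tclass<-`; the data and timestamps are hypothetical:

```r
library(xts)

set.seed(42)
# Daily POSIXct timestamps at a fixed time of day (hypothetical data)
times <- as.POSIXct("2009-07-26 09:39:32", tz = "UTC") + 86400 * 0:5
x <- xts(matrix(rnorm(24), ncol = 4), order.by = times)

# Change only the class the index is presented in; stored seconds are untouched
y <- x
tclass(y) <- "Date"
class(index(y))   # Date

# Converting back recovers the original time of day
z <- y
tclass(z) <- "POSIXct"
format(index(z)[1], "%H:%M:%S", tz = "UTC")

# merge() aligns on the internal numeric index, not on the index class
m <- merge(x, y)
dim(m)
```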

HTH somewhat

Jeff
On Sat, Jul 25, 2009 at 7:59 AM, michael li<michaellibeijing at gmail.com> wrote:

#
The conversion to or from Date occurs when you store it (as you
noticed), print it out, or otherwise try to use it in a context that
requires a Date. Your example does not do any of those, so no
conversion was needed. Try this:
...
   user  system elapsed
   0.39    0.00    0.39
...
   user  system elapsed
   0.08    0.00    0.11
On Sat, Jul 25, 2009 at 9:38 PM, Michael<michaellibeijing at gmail.com> wrote:
3 days later
#
Dear all

I'm facing the problem of valuing a Tier 1 bond. The bond has an
embedded option which allows the coupon not to be paid if the issuer
doesn't pay dividends.

Does anybody have an idea how I could model this type of bond? Please
let me know if you have a model or a replicating method.

Many thanks in advance and best regards,
Roger
#
Hi,
  I noticed that the periodReturn function seems to take a non-trivial
amount of time to compute weekly returns.

  The calls below all compute the weekly log returns from 1/1/2007 to now,
and they take slightly different amounts of time.

  I'd like to be able to compute weekly returns as fast as possible.
Does anyone have suggestions for minimizing the processing time?
Different data type? Different function to call?

Thanks,
Ian
subset="2007::"))
   user  system elapsed 
   0.06    0.00    0.06
subset="2007::"))
   user  system elapsed 
   0.04    0.00    0.07
user  system elapsed 
   0.05    0.00    0.05
#
Ian Coe wrote:
Ian,

1/20th of a second for WEEKLY aggregated returns on 2+ years of data
doesn't seem outrageous. You might gain a tiny bit by calling log()
directly, but I doubt it. I could of course be wrong, but the tiny time
difference seems trivial, not "non-trivial". Perhaps you need to
investigate the use of foreach and doMC for your larger problem?

Regards,

   - Brian
#
I suspect we live in marginally different worlds - as 0.05 seems
'trivial' to me.

While all this software is "free" I'd like to reiterate that it
actually costs someone (usually) lots of time.

A better, more helpful post could include some effort to identify the
bottleneck, or show what you've already done to make it better.

The above said, a few suggestions:

In one CPU hour (10 cents on Amazon?) you could do periodReturn 60,000 times.

Assuming it takes an hour to improve the performance by 50%, at a coding
cost of, say, $100/hr, you'd need to plan on using the new function
120,000,000 times just to break even. These estimates are also quite
unrealistic; 300,000,000 times might be closer. That is one heck of a
universe of instruments.
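The break-even arithmetic can be checked in a few lines of R. All inputs are the post's own rough figures (0.06 s per call, $0.10 per CPU hour, one hour of coding at $100/hr, a 50% speedup); none are measured here:

```r
# Inputs taken from the post's rough estimates
secs_per_call  <- 0.06   # observed periodReturn time, seconds
cpu_hour_price <- 0.10   # dollars per CPU hour (a guess in the post)
coding_cost    <- 100    # dollars: one hour of developer time
speedup        <- 0.50   # fraction of run time saved by a faster version

calls_per_cpu_hour  <- 3600 / secs_per_call          # 60,000 calls
cpu_hours_bought    <- coding_cost / cpu_hour_price  # 1,000 CPU hours
secs_saved_per_call <- secs_per_call * speedup       # 0.03 s

# Calls needed before the saved CPU time pays for the coding hour
break_even_calls <- cpu_hours_bought * 3600 / secs_saved_per_call
format(break_even_calls, big.mark = ",", scientific = FALSE)  # "120,000,000"
```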

If you find yourself calling the function over and over, saving the
results might be useful too.
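Saving results can be as simple as memoising on a key that identifies the inputs (for example, a symbol plus a date range). A minimal base-R sketch; `cached_call` and `.return_cache` are hypothetical helpers, not part of quantmod or xts, and choosing a correct key is left to the caller:

```r
# Hypothetical cache: an environment keyed by a caller-supplied string
.return_cache <- new.env(parent = emptyenv())

cached_call <- function(key, fun, ...) {
  # Compute once per key; later calls with the same key reuse the stored result
  if (!exists(key, envir = .return_cache, inherits = FALSE)) {
    assign(key, fun(...), envir = .return_cache)
  }
  get(key, envir = .return_cache, inherits = FALSE)
}

# Usage with a stand-in for an expensive computation that counts evaluations
n_evals <- 0
slow_square <- function(v) { n_evals <<- n_evals + 1; v^2 }

a <- cached_call("sq:1:5", slow_square, 1:5)
b <- cached_call("sq:1:5", slow_square, 1:5)  # served from the cache
```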

Part of the above <rant> was to illustrate that there is no free lunch
--- even in the land of free beer.

That said, periodReturn was meant to be easy to use first, and fast a
distant second.

Try:
elapsed
20.71429
[1] TRUE

The direct route is 20x faster.
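The "direct route" code was not preserved. One plausible direct route, sketched here as an assumption rather than as the original code, takes the last observation of each week with xts's `endpoints()` and differences the logs. The price series is simulated. periodReturn() may treat the first, partial period differently (see its `leading` argument), so any comparison should be made on the rows after the first:

```r
library(xts)

set.seed(1)
# Simulated daily closing prices from 2007 onward (hypothetical data)
days <- seq(as.Date("2007-01-01"), as.Date("2009-07-29"), by = "day")
px <- xts(100 * exp(cumsum(rnorm(length(days), sd = 0.01))), order.by = days)

# Row positions of the last observation in each week; endpoints() starts with 0
ep <- endpoints(px, on = "weeks")
wk <- px[ep[ep > 0]]

# Weekly log returns; the first element is NA because there is no prior week
r <- diff(log(wk))
```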

As it turns out, I am refactoring the internal to.period code to be
much faster and more memory-efficient. The original (from 2007) was
Fortran, which, while very fast, requires a lot of copying in R, so the
code is changing. I'll probably also think about making the simple
cases use something more along the lines of the direct route.

HTH
Jeff
On Wed, Jul 29, 2009 at 5:06 PM, Ian Coe<ICoe at connectcap.com> wrote: