
R + HDF5 + Pytables

Yes, indexing is my answer to an out-of-core data solution. 

It is essentially a data.frame on disk, where OS level mmap is used to manage the process efficiently and transparently. 
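
To make the "data.frame on disk" idea concrete, here is a minimal sketch in Python. It is not the indexing package's code or API. It assumes a single column stored as raw little-endian float64 values in its own file, and the names `write_column`, `MappedColumn` and `demo` are made up for illustration. The point is only that once the file is mmap'ed, the OS pages data in on demand and element access looks like ordinary indexing.

```python
import mmap
import os
import struct
import tempfile


def write_column(path, values):
    """Store one column as raw little-endian float64 values (assumed layout)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))


class MappedColumn:
    """Read-only view of an on-disk float64 column, paged in by the OS via mmap."""

    ITEM = struct.calcsize("<d")  # 8 bytes per value

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; nothing is read until it is touched.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._map) // self.ITEM

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Decode one value straight out of the mapped region.
        return struct.unpack_from("<d", self._map, i * self.ITEM)[0]

    def close(self):
        self._map.close()
        self._file.close()


def demo():
    """Write a small column to a temp file, map it, and read it back."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "price.col")
    write_column(path, [101.5, 101.75, 102.0, 101.25])
    col = MappedColumn(path)
    try:
        return len(col), col[2]
    finally:
        col.close()
        os.remove(path)
        os.rmdir(tmpdir)
```

Because the mapping is backed by the file rather than by process memory, several processes can map the same column file and share the cached pages.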

It is under very active development and really is quite far from "stable", though it is functional and being used internally on data that is fairly large.

What makes it fast is that it is column-oriented (like data.frames) and that traditional indexing tools are available by default. Currently sorted indexing is implemented, but bitmap variants are in the works, as well as compression tools for bitmaps. The LZO algorithm is part of the development, as are higher-performance variants based on more advanced compression schemes that allow relational algebra on the compressed bitmaps.
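
For readers unfamiliar with the two index types mentioned above, here is a rough Python sketch of each. It is an illustration under stated assumptions, not the indexing package's implementation: the function names are hypothetical, the columns are small in-memory lists, and the bitmaps are plain uncompressed Python integers (the compressed variants described above are not shown).

```python
from bisect import bisect_left, bisect_right


def build_sorted_index(column):
    """Return (sorted_keys, row_ids) so that keys can be binary-searched."""
    order = sorted(range(len(column)), key=column.__getitem__)
    return [column[i] for i in order], order


def range_query(index, lo, hi):
    """Row ids whose value v satisfies lo <= v <= hi, in ascending row order."""
    keys, rows = index
    return sorted(rows[bisect_left(keys, lo):bisect_right(keys, hi)])


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i has it."""
    bitmaps = {}
    for i, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps


def bitmap_rows(bits):
    """Decode a bitmap back into a list of row ids."""
    rows, i = [], 0
    while bits:
        if bits & 1:
            rows.append(i)
        bits >>= 1
        i += 1
    return rows


# Made-up sample columns, for illustration only.
price = [101.5, 101.75, 102.0, 101.25, 101.75]
symbol = ["IBM", "MSFT", "IBM", "IBM", "MSFT"]
exchange = ["N", "Q", "N", "P", "Q"]

# Sorted index: a range lookup is two binary searches.
in_range = range_query(build_sorted_index(price), 101.5, 101.75)

# Bitmap index: a conjunction of two predicates is a single bitwise AND.
sym_idx = build_bitmap_index(symbol)
exch_idx = build_bitmap_index(exchange)
ibm_on_n = bitmap_rows(sym_idx["IBM"] & exch_idx["N"])
```

The sorted index suits range lookups on high-cardinality columns such as prices or timestamps, while bitmaps suit low-cardinality columns where predicates combine with AND/OR.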

As Daniel stated though this isn't really 'finance' per se, so I'll stop here. 

When the progress is further along, I will make announcements to the list(s). I am also presenting this at useR in DC this summer. 

Benchmarks against the Kdb's of the world would indeed be fun. I don't think they allow that... I wonder why? ;-)

Jeff

-----Original Message-----
From: Daniel Cegiełka <daniel.cegielka at gmail.com>
Date: Tue, 18 May 2010 16:23:36 
To: Manoj<manojsw at gmail.com>
Cc: <r-sig-finance at stat.math.ethz.ch>
Subject: Re: [R-SIG-Finance] R + HDF5 + Pytables

Manoj, this is not a financial subject; you should send this to the
r-sig-hpc list.
indexing is still under development, but the ability to work at high
performance with TBs of tick data was one of the primary design goals of
the indexing package. Inside the xts code you can find nicely optimized C
code for low latency and high performance. And when you combine xts with
the indexing package you can compare it even with kdb... (next point: you
can use indexing as shared memory for many R instances).

indexing will work nicely even with many TB of tick data, and you don't
have the latency of the TCP stack (kdb).

It only needs(?) a good compression solution...

regards,
daniel


On 18 May 2010 05:56, Manoj <manojsw at gmail.com> wrote:
_______________________________________________
R-SIG-Finance at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.