One way this could be handled, I think, is to leave the basic data
types that R presents to the user unchanged, but to offer different
implementation choices underneath those types. For example, an
"array of structures" is presently loaded into memory in its entirety
when accessed. For large data objects, this isn't ideal. An
alternative implementation would be to store such a big array on disk
and load into memory only the parts that are really needed. In other
words, the in-core representation would simply be a cache of the entire
data object. Of course, once this is done you could also vary the
external representation of objects. For example, instead of storing
the array elements next to each other, it could often be advantageous
to store the fields of the array next to each other (so that
operations like "compute the average of the .age field" could be
performed efficiently). Yet another variation might be to add
on-the-fly compression/decompression to minimize the size of the
external data file.
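To make the field-sequential case concrete, here is a rough sketch of
what the read path might look like (the file name, field offset, and
record count are invented for illustration, not part of any existing
interface):

    ## Hypothetical field-sequential read: the .age values sit
    ## contiguously on disk, so averaging them touches only those bytes.
    con <- file("people.dat", open = "rb")
    seek(con, where = age.offset)   # jump to the start of the .age field
    age <- readBin(con, what = "double", n = n.records)
    close(con)
    mean(age)                       # no other field was ever read

With the element-next-to-element layout, the same computation would
have to sweep over every record in the file.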
If this approach were taken, I'd imagine that R would continue to use
the "store entirely in memory" approach by default to maintain
backwards compatibility. At the same time, a few new functions could
be introduced that would allow precise control over how the object is
implemented. So when the user wants to deal with a large object, they
would create the object, set its implementation to something suitable
(e.g., cache-only, field-sequential layout, on-the-fly compression)
and then continue to use the object as usual.
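Purely as illustration, the user-level code might look something like
this (newBigArray and setImplementation are invented names, not
existing R functions):

    ## Hypothetical sketch of the proposed interface -- every name
    ## here is made up just to show the flavor of the idea.
    x <- newBigArray("people.dat",
                     fields = c(age = "double", income = "double"))
    x <- setImplementation(x, storage = "disk-cache",
                           layout = "field-sequential", compress = TRUE)
    mean(x$age)   # from here on, x is used like any ordinary R object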
Since I'm not familiar with the internals of R, I have no idea how
easy/hard this would be and I'd therefore appreciate hearing your
opinion on whether you think this would be a valuable and doable
extension.
In any case, thanks for working on R! I was excited to find that I
now have the option to use the S language on my Linux systems!
Cheers,
--david
--
David Mosberger, Ph.D; HP Labs; 1501 Page Mill Rd MS 1U17; Palo Alto, CA 94304
davidm@hpl.hp.com voice (650) 236-2575 fax 857-5100