
error loading huge .RData

3 messages · Liaw, Andy, Peter Dalgaard, Luke Tierney

#
Patrick,

I appreciate your comments, and practice everything that you preach.
However, that workspace image contains only two or three R objects: the input and
output of a single R command.  I knew there could be problems, so I've
stripped it down to the bare minimum.  Yes, I also kept the commands in a
script.  That single command (in case you want to know: a random forest run
with 4000 rows and nearly 7000 variables) took over 3 days to run.  There's
not a whole lot I can do here when the data is so large.

Andy

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
"Liaw, Andy" <andy_liaw at merck.com> writes:
Hmm. You could be running into some sort of situation where data
temporarily take up more space in memory than they need to. It does
sound like a bit of a bug if R can write images that are bigger than
it can read. Not sure how to proceed though. Does anyone on R-core
have a similarly big system and a spare gigabyte of disk? Is it
possible to create a mock-up of similarly organized data that displays
the same effect, but takes less than three days?
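A mock-up along these lines could be sketched as follows (a sketch, not from the thread: the dimensions come from Andy's description of 4000 rows and nearly 7000 variables; everything else, including the variable names, is invented):

```r
## Hedged sketch: build data of roughly the shape Andy describes,
## without the three-day model fit, then test whether a bare
## save()/load() round trip alone reproduces the failure.
n <- 4000; p <- 7000
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("v", seq_len(p))))
f <- tempfile(fileext = ".RData")
save(x, file = f)
rm(x); invisible(gc())
load(f)                 # does the reload succeed on this machine?
stopifnot(identical(dim(x), c(n, p)))
unlink(f)
```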

        -p

 BTW: Did we ever hear what system this is happening on?
#
On Wed, Apr 24, 2002 at 02:39:15PM +0200, Peter Dalgaard BSA wrote:
I guess we could make sure the write fails as well :-)

Actually that isn't entirely flippant. The serialization mechanism
only preserves sharing that is semantically meaningful (symbols,
environments, external references and weak references).  This has been
so since the first change in the save format in R 0.something.  As a
result, saving and loading a value may result in using more memory for
the restored version.  It would be possible to preserve all sharing
within a single save operation, but that would require keeping track
of all objects as they are written, which requires more memory, and
hence could make the write fail.
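As a minimal illustration of the round trip itself (a sketch, not from the thread; the object and file name are invented), the value survives save()/load() intact even where internal sharing does not:

```r
## Hedged sketch: a save()/load() round trip preserves the value of
## an object exactly, but any sharing inside it (beyond symbols,
## environments, external and weak references) is not preserved, so
## the restored copy can need more memory than the original.
x <- matrix(rnorm(100), nrow = 10,
            dimnames = list(paste0("r", 1:10), paste0("c", 1:10)))
f <- tempfile(fileext = ".RData")
save(x, file = f)
x_before <- x
rm(x)
load(f)                            # restores `x` into the workspace
stopifnot(identical(x_before, x))  # same value after the round trip
unlink(f)
```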

It is fairly hard to create values with shared data structure at the R
level (easy in C, though), so it hasn't been much of an issue.  One
place where we might be getting bitten though is in the way names are
attached to things; those are often shared when objects are created
but will be duplicated by our save/load strategy. Whether that is an
issue here is hard to tell.
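The point about names can be sketched like this (hedged: whether the two objects actually share `nm` internally depends on the R version, and that sharing cannot easily be observed from the R level; the names and sizes here are invented):

```r
## Hedged sketch: a names attribute is an ordinary character vector.
## Two objects built from the same vector `nm` may share it in memory
## when created, but save() writes a separate copy for each object,
## so after load() the restored names are distinct vectors.
nm <- paste0("v", 1:1000)
a <- rnorm(1000); names(a) <- nm
b <- rnorm(1000); names(b) <- nm   # may share `nm` with `a` internally
f <- tempfile(fileext = ".RData")
save(a, b, file = f)
load(f)      # restored names(a) and names(b) have separate storage
stopifnot(identical(names(a), names(b)))  # values are still equal
unlink(f)
```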

luke