
Re: [R] package for saving large datasets in ASCII

The sort of `large' here is 7500 x 1200.  That's 72Mb if stored as real
numbers, so let's assume you have at least 256Mb to use.  I ran the
following on Windows with a 256Mb limit (and I had to use R-devel to do
so).  I actually found it difficult to create a data frame of that size
in 256Mb, and resorted to

A1 <- vector("list", 1000)                # one slot per column
for(i in 1:1000) A1[[i]] <- rnorm(8000)   # fill each column, no extra copies
class(A1) <- "data.frame"                 # promote the list in place
row.names(A1) <- 1:8000

which took 15 secs and 140Mb as an underhand way to make a data frame.
(1.5.1 took too much memory here.)
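For comparison, the direct route is the one that ran out of memory: the copying done while assembling the columns is what pushes it over a 256Mb limit.  A sketch of that direct construction:

```r
## Direct construction -- makes several transient copies of the data
## along the way, which is what exhausted a 256Mb limit under 1.5.1:
A1 <- as.data.frame(matrix(rnorm(8000 * 1000), nrow = 8000))
```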

Then

A2 <- as.matrix(A1)

took 1.8secs (hardly slow) and an additional 64Mb to hold the object A2.
I then deleted A1.  Running

library(MASS)   # write.matrix() and its blocksize argument are in MASS
write.matrix(A2, "foo.dat", blocksize=1000)

used about 150Mb in about four minutes.  That is formatting 8 million
numbers, and 85% of the time was spent in the system calls, as one should
expect.  (I suspect I did not need to delete A1, but didn't want to wait
around to find out.)
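Those figures are consistent with simple size arithmetic (a sketch, assuming 8-byte doubles and the decimal "Mb" = 10^6 bytes used above):

```r
## Rough arithmetic for the write.matrix() run above:
8000 * 1000 * 8 / 1e6   # 64 Mb: raw storage for the numeric matrix A2
8000 / 1000             # 8 blocks of blocksize = 1000 rows each
1000 * 1000             # 1e6 numbers formatted per block
```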

So

1) you could have checked your claims by some simple experiments.

2) as claimed, write.matrix does indeed do the job.
On Sun, 11 Aug 2002, Ott Toomet wrote:

A few hundred, probably.

Why did you assume that blocksize=1 was best?  R is a vector language, and
it is normally best to use the largest blocks that you can fit in memory.

Not if it is a matrix: what's the function name?  For a general data frame
there really is no choice but to convert each column as a whole.

False: see above.

Yes (and so is the format call), but there is garbage collection.  That's
one reason why a blocksize of 1 is not at all sensible, forcing the loop
to be run thousands of times.  Just choose blocksize to keep this step in
your memory bounds.
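As a rough guide (my sketch, not part of the original exchange), a blocksize can be backed out from a memory budget; the per-cell cost of the temporary character matrix is an assumed figure, not a measurement:

```r
## Hypothetical helper: pick a blocksize from a memory budget.
## Assumes each formatted cell costs roughly 60 bytes (character
## vector overhead plus the digits) -- an assumption, not measured.
choose_blocksize <- function(ncol, budget_bytes, bytes_per_cell = 60)
  max(1, floor(budget_bytes / (ncol * bytes_per_cell)))

choose_blocksize(1000, 60e6)   # 1000 rows per block for a 60 Mb budget
```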
Your memory size?  I suggest buying another 512Mb/1Gb of RAM.