memory problem in handling large dataset

3 messages · Liaw, Andy, Weiwei Shi, Søren Højsgaard

#
If my calculation is correct (very doubtful, sometimes), that's
[1] 4116.446

or over 4 terabytes, just to store the data in memory.

To sample rows and read that into R, Bert's suggestion of using connections,
perhaps along with seek() for skipping ahead, would be what I'd try.  I had
tried to do such things in Python as a chance to learn that language, but I
found that operationally it's easier to maintain the project by doing
everything in one language, namely R, if possible.

Andy
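
A rough sketch of that connection-based sampling, for a plain-text file read in chunks (the file name "big.txt", the chunk size, and the sampling fraction are all made-up illustration values, not anything from the thread):

```r
## Open a read connection to the (hypothetical) large text file.
con <- file("big.txt", open = "r")
header <- readLines(con, n = 1)

## Read in chunks and keep a simple random subsample of the rows,
## so the whole file never has to fit in memory at once.
sample_frac <- 0.001
keep <- character(0)
repeat {
    chunk <- readLines(con, n = 100000)
    if (length(chunk) == 0) break
    take <- runif(length(chunk)) < sample_frac
    keep <- c(keep, chunk[take])
}
close(con)

## Parse the sampled lines as if they were a small file.
dat <- read.table(textConnection(c(header, keep)), header = TRUE, sep = "\t")
```

With fixed-width records, seek() on a binary-mode connection could jump straight to chosen rows instead of scanning every chunk; for variable-length lines, chunked reading as above is the simpler route.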
#
Dear Andy:
I think our emails crossed. But thanks as before.

Weiwei
On 10/27/05, Liaw, Andy <andy_liaw at merck.com> wrote:
#
An alternative could be to store the data in a MySQL database and then select a sample of the cases using the RODBC package.
Best
Søren
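
A minimal sketch of that RODBC route, assuming an ODBC data source has already been configured for the MySQL database (the DSN "mydsn", the credentials, and the table name "bigtable" are hypothetical):

```r
library(RODBC)

## Connect through a pre-configured ODBC data source name.
ch <- odbcConnect("mydsn", uid = "user", pwd = "pass")

## Let MySQL draw the sample server-side, so only the sampled
## rows ever cross the wire into R.  ORDER BY RAND() is simple
## but can be slow on very large tables.
samp <- sqlQuery(ch, "SELECT * FROM bigtable ORDER BY RAND() LIMIT 10000")

odbcClose(ch)
```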

________________________________

From: r-help-bounces at stat.math.ethz.ch on behalf of Liaw, Andy
Sent: Thu 27-10-2005 19:21
To: 'Berton Gunter'; 'Weiwei Shi'; 'r-help'
Subject: Re: [R] memory problem in handling large dataset

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html