Message-ID: <4020F8CE.2010101@unine.ch>
Date: 2004-02-04T13:51:10Z
From: Fabien Fivaz
Subject: Using huge datasets

Hi,

Here is what I want to do. I have a dataset containing 4.2 *million* 
rows and about 10 columns and want to do some statistics with it, mainly 
using it as a prediction set for GAM and GLM models. I tried to load it 
from a CSV file but, after filling up memory and part of the swap (1 GB 
each), I get a segmentation fault and R stops. I use R under Linux. Here 
are my questions:

1) Has anyone ever tried to use such a big dataset?
2) Do you think that it would be possible on a more powerful machine, such 
as a cluster of computers?
3) Finally, does R have some "memory limitation" or does it just depend on 
the machine I'm using?

Best wishes

Fabien Fivaz