
memory problem for R

6 messages · Yun-Fang Juan, Brian Ripley, Spencer Graves

#
Here is the exact error I got
----------------------
Read 73 items
Error: cannot allocate vector of size 1953 Kb
Execution halted
-----------------------
I am running R on FreeBSD 4.3
with dual CPUs and 2 GB of memory.
Is that sufficient?

hw.model: Pentium III/Pentium III Xeon/Celeron
hw.ncpu: 2
hw.byteorder: 1234
hw.physmem: 2144411648
hw.usermem: 2009980928

thanks for your advice in advance,


Yun-Fang
----- Original Message -----
From: "Yun-Fang Juan" <yunfang at yahoo-inc.com>
To: <r-help at stat.math.ethz.ch>
Sent: Thursday, January 29, 2004 7:03 PM
Subject: [R] memory problem for R
http://www.R-project.org/posting-guide.html
#
On Thu, 29 Jan 2004, Yun-Fang Juan wrote:

Clearly not.  What is the structure of your `attributes'?  As Andy Liaw
said, the design matrix may be bigger than that if there are factors
involved.  (And you need several copies of the design matrix.)

I would try a 10% sample of the rows to get a measure of what will fit
into your memory.  I have never seen a regression problem for which 600k
cases were needed, and would be interested to know the context.  (It is
hard to imagine that the cases are from a single homogeneous population
and that a linear model fits so well that the random error is not 
dominated by systematic error.)
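[A small sketch of the sampling suggestion above: draw a 10% random sample of row indices and fit on that first to gauge memory use. The 600k case count comes from the thread; the data object and formula in the comment are illustrative stand-ins, not from the original post.]

```r
set.seed(3)
n <- 600000                        # total number of cases, as discussed above
idx <- sample(n, size = n %/% 10)  # 10% random sample of row indices
length(idx)                        # 60000 cases to fit on first
# dat10 <- dat[idx, ]              # then e.g. fit <- lm(y ~ ., data = dat10)
```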

#
Hello, Yun-Fang:

      Prof. Ripley's comments will get you started.  Part of the key is 
finding informative ways to subset and summarize the data so you don't 
try to read it all into R at once.  You can read segments using 
arguments "skip" and "nrows" in "read.table".  You can then analyze a 
portion, save a summary, discard the bulk of the data and read another 
portion. 
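[The read-summarize-discard loop described above might look like the following sketch. The temporary demo file stands in for the real log file, whose name we don't know; the chunk size and the per-chunk summary (column means) are illustrative assumptions.]

```r
## Demo data standing in for the real file.
tf <- tempfile()
write.table(data.frame(x = rnorm(100), y = rnorm(100)), tf,
            row.names = FALSE, col.names = FALSE)

chunk_size <- 25
skip <- 0
summaries <- list()
repeat {
  ## read.table() errors once skip passes the end of the file
  chunk <- tryCatch(read.table(tf, skip = skip, nrows = chunk_size),
                    error = function(e) NULL)
  if (is.null(chunk) || nrow(chunk) == 0) break
  summaries[[length(summaries) + 1]] <- colMeans(chunk)  # keep a summary
  rm(chunk)                                              # discard the bulk
  skip <- skip + chunk_size
}
length(summaries)   # 4 segments of 25 rows each
```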
    
      Beyond this, you may know that Kalman filtering is essentially 
linear regression performed one observation or one group of observations 
at a time, downweighting "older" observations gracefully.  It 
essentially assumes that the regression parameters follow a random walk 
between observations or groups of observations.  I've done ordinary 
least squares with Kalman filtering software one observation at a time, 
just by setting the migration variance to zero.  R software for Kalman 
filtering was discussed recently in this list;  to find it, I would use 
the search facilities described in the posting guide at the end of every 
r-help email. 
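[A hedged sketch of that idea: recursive least squares, i.e. a Kalman filter on the regression coefficients with the migration (state) variance set to zero, processing one observation at a time so the full design matrix never needs to be in memory. The function name and the simulated data are illustrative, not from any package discussed in the thread.]

```r
rls <- function(X, y, delta = 1e6) {
  p <- ncol(X)
  beta <- numeric(p)
  P <- diag(delta, p)                 # large initial uncertainty on beta
  for (i in seq_len(nrow(X))) {
    x <- X[i, ]
    k <- (P %*% x) / drop(1 + crossprod(x, P %*% x))       # Kalman gain
    beta <- beta + drop(k) * drop(y[i] - crossprod(x, beta))
    P <- P - k %*% crossprod(x, P)    # posterior covariance update
  }
  beta
}

set.seed(1)
X <- cbind(1, rnorm(200))
y <- drop(X %*% c(2, -3)) + rnorm(200, sd = 0.1)
rls(X, y)   # essentially the ordinary least squares coefficients
```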

      hope this helps. 
      spencer graves
Prof Brian Ripley wrote:

#
Please see the comments below.
I tried a 10% sample and it turned out the matrix became singular after I did
that.
The reason is that some of the attributes only have zero values most of the
time.
The data I am using are web log data and after some transformation they are
all numeric.
Can we specify some parameters in read.table so that the program will treat
all the vars as numeric
(in this context, hopefully that will reduce the memory consumption)?
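[For the record, read.table's colClasses argument does exactly this: declaring every column numeric up front skips the type-guessing pass and reduces memory use; pre-specifying nrows helps too. A small runnable sketch, with a temporary demo file standing in for the real log file:]

```r
tf <- tempfile()
write.table(data.frame(a = 1:5, b = (1:5) / 2), tf,
            row.names = FALSE, col.names = FALSE)

## colClasses forces every column to numeric; nrows pre-sizes the result
dat <- read.table(tf, colClasses = "numeric", nrows = 5)
sapply(dat, class)   # every column is "numeric"
```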

thanks a lot,

Yun-Fang
#
      Was your 10% sample contiguous or randomly selected from the 
entire file?  If contiguous, you might get something from, say, 
processing the file in 100 contiguous blocks, computing something like 
the mean of each 1% block (or summarizing in some other way within 
blocks), then combining the summaries and doing regression on the block 
summaries. 
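[A rough sketch of that block-summary approach: split the rows into 100 contiguous blocks, take per-block column means, and regress on the block means. The data here are simulated stand-ins; the true coefficients (1, 2) are assumptions of the demo.]

```r
set.seed(2)
n <- 10000; blocks <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)                      # true coefficients: 1 and 2
g <- rep(seq_len(blocks), each = n / blocks)   # 100 contiguous 1% blocks
xm <- tapply(x, g, mean)                       # block means of the predictor
ym <- tapply(y, g, mean)                       # block means of the response
coef(lm(ym ~ xm))   # close to the full-data coefficients
```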

      If it was an honest random sample (e.g., selecting approximately 
10% from each 10%), then the block averaging won't work:  You have an 
inherent singularity in the structure of the data that will likely not 
permit you to estimate everything you want to estimate.  You need to 
understand that singularity / lack of estimability and decide what to do 
about it. 

      In either case, "lm(..., singular.ok=T)" will at least give you an 
answer even when the model is not fully estimable. 
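[A small illustration of that point: with a rank-deficient design (here one predictor is an exact multiple of another, a made-up stand-in for the mostly-zero attributes), lm with singular.ok = TRUE still returns a fit and reports NA for the aliased coefficient.]

```r
set.seed(4)
d <- data.frame(y = rnorm(10), x1 = 1:10)
d$x2 <- 2 * d$x1                      # x2 is an exact multiple of x1: singular
fit <- lm(y ~ x1 + x2, data = d, singular.ok = TRUE)
coef(fit)                             # coefficient for x2 is NA (aliased)
```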

      hope this helps. 
      spencer graves
Yun-Fang Juan wrote: