-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of mahesh r
Sent: Wednesday, July 19, 2006 4:23 PM
To: r-help at stat.math.ethz.ch
Subject: Re: [R] how to use large data set ?
Hi,

I would like to extend the query posted earlier on using large data sets. I am trying to use rgdal to mine remote sensing imagery. I have no problem bringing the images into the R environment, but when I try to convert an image to a data.frame I get the warning "1: Reached total allocation of 510Mb: see help(memory.size)" and the process terminates. Due to project constraints I have been given a very old 2.4 GHz computer with only 512 MB of RAM. I think R is trying to hold everything in RAM, and since the image is very big (some 9 million pixels), it runs out of memory; 9 million doubles is already about 72 MB per band, and the conversion apparently makes several temporary copies on top of that.
My questions are:
1. Is there any way to dump temporary variables into a temp folder on the hard disk (as many software packages do) instead of letting R keep them in RAM? (See the sketch after these questions for the kind of thing I have in mind.)
2. Could this be done without a connection to a back-end database such as Oracle?
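Here is a rough sketch of the windowed, database-free processing I am hoping for: read the image in horizontal strips so that only one strip sits in RAM at a time. I have not tried this on my imagery, and the file name, band, and strip height are made up, but readGDAL() does take offset and region.dim arguments for partial reads:

library(rgdal)

fname <- "scene.tif"                # placeholder file name
info  <- GDALinfo(fname)            # header only; no pixels are read
nrows <- info[["rows"]]
ncols <- info[["columns"]]
strip <- 500                        # rows per strip; tune to your RAM

for (r0 in seq(0, nrows - 1, by = strip)) {
    nr <- min(strip, nrows - r0)
    # read only rows r0 .. r0+nr-1 of band 1
    sg <- readGDAL(fname, offset = c(r0, 0),
                   region.dim = c(nr, ncols), band = 1, silent = TRUE)
    dat <- as.data.frame(sg)        # data.frame for this strip only
    # ... summarise dat here, or append it to a file on disk ...
    rm(sg, dat); gc()               # free the strip before the next one
}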
Thanks,
Mahesh
On 7/19/06, Greg Snow <Greg.Snow at intermountainmail.org> wrote:
You did not say what analysis you want to do, but many analyses can be done as special cases of regression models, and you can use the biglm package to fit regression models on data that arrives in chunks. Here is an example that worked for me to get the mean and standard deviation by day from an Oracle database with over 23 million rows (I had previously set up 'edw' as an ODBC connection to the data warehouse; this was on Windows, but any of the database connection packages should work for you):
library(RODBC)
library(biglm)

# 'pass' holds the database password, set earlier in the session
con <- odbcConnect('edw', uid='glsnow', pwd=pass)

# "SOME_TABLE" is a placeholder; use your own table name
odbcQuery(con, "select ADMSN_WEEKDAY_CD, LOS_DYS from SOME_TABLE")

t1 <- Sys.time()

# first chunk: fetch 100,000 rows, clean them, fit the initial model
tmp <- sqlGetResults(con, max=100000)
names(tmp) <- c("Day","LoS")
tmp$Day <- factor(tmp$Day, levels=as.character(1:7))
tmp <- na.omit(tmp)
tmp <- subset(tmp, LoS > 0)
ff <- log(LoS) ~ Day
fit <- biglm(ff, tmp)
i <- nrow(tmp)

# later chunks: each call to sqlGetResults() picks up where the last
# fetch stopped; when the result set is exhausted it stops returning a
# data frame, nrow() becomes NULL, and the loop ends
while( !is.null(nrow( tmp <- sqlGetResults(con, max=100000) )) ){
    names(tmp) <- c("Day","LoS")
    tmp$Day <- factor(tmp$Day, levels=as.character(1:7))
    tmp <- na.omit(tmp)
    tmp <- subset(tmp, LoS > 0)
    fit <- update(fit, tmp)   # fold the new chunk into the running fit
    i <- i + nrow(tmp)
    cat(format(i, big.mark=','), " rows processed\n")
}

summary(fit)
t2 <- Sys.time()
t2 - t1                       # total elapsed time
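Since your data is in MySQL rather than Oracle, the same chunked pattern should carry over to the RMySQL package; here is a sketch that I have not run, with placeholder host, credentials, query, and variable names:

library(RMySQL)
library(biglm)

# placeholder connection details for the remote MySQL server
con <- dbConnect(MySQL(), host="dbserver", user="me",
                 password="secret", dbname="mydb")

res <- dbSendQuery(con, "select x, y from bigtable")

# first chunk and initial fit
tmp <- fetch(res, n=100000)
fit <- biglm(y ~ x, tmp)

# keep fetching 100,000 rows at a time until the query is exhausted
while( !dbHasCompleted(res) ){
    tmp <- fetch(res, n=100000)
    if (nrow(tmp) > 0) fit <- update(fit, tmp)
}

dbClearResult(res)
dbDisconnect(con)
summary(fit)

The point in both cases is that update() only needs to see each chunk once, so memory use stays bounded by the chunk size rather than by the size of the table.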
Hope this helps,
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
Sent: Wednesday, July 19, 2006 9:42 AM
To: 'r-help at stat.math.ethz.ch'
Subject: [R] how to use large data set ?
Hello R users,

Sorry for my English, I'm French.

I want to use a large dataset (3 million rows and 70 variables) but I don't know how, because my computer crashes quickly (P4). I also have a dual-Xeon machine with 2 GB of RAM, so I want to run the computation on that machine and show the results on mine. Both of them are on the same network.

In short I have:
1 server with a MySQL database
1 computer
and I want to use them with a large dataset.

I'm trying to use RDCOM to connect to the database and (too hard for me..) Rpad. Is there another solution?

Thanks in advance,
Yohan C.