Reading large files in R
You can also use the RODBC package to hold the data in a database, say MySQL, and only import it when you do the modelling, e.g.:
library(RODBC)
library(sspir)

## Connect to the "MySQL Test" ODBC data source and store the example
## data set there
con <- odbcConnect("MySQL Test")
data(vandrivers)
sqlSave(con, dat = vandrivers, append = FALSE)

## Drop the in-memory copy and reclaim the memory
rm(vandrivers)
gc()

## Pull the data back from the database only when it is needed for the model
van.call <- sqlQuery(con, "select * from vandrivers;")
vd <- ssm(y ~ tvar(1) + seatbelt + sumseason(time, 12),
          time = time, family = poisson(link = "log"),
          data = van.call)
vd$ss$phi["(Intercept)"] <- exp(-2 * 3.703307)
vd$ss$C0 <- diag(13) * 1000
vd.res <- kfs(vd)
gc()
In this case I have first saved the vandrivers data in 'MySQL Test', but one can obviously write the data directly to the database. Since the data are not held in memory, I find that I can do much larger computations than would otherwise be possible. The downside, of course, is that the computations take a bit longer.

Best wishes,
Andreas

=====================
Andreas D Hary
Email: u08adh at hotmail.com
Mobile: 07906860987
Phone: 02076554940
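[Editor's note: a minimal sketch of the "write the data directly to the database" idea above, uploading the file in fixed-size chunks so the full data set never sits in R's memory. It reuses the open RODBC connection con from the example; the file name "bigfile.txt", the column names, the table name "bigdata", and the chunk size of 10000 rows are illustrative assumptions, not from the original post.]

## Sketch: stream a large tab-delimited file into the database in chunks.
## File name, column names, table name, and chunk size are assumptions.
chunk  <- 10000
infile <- file("bigfile.txt", open = "r")
first  <- TRUE
repeat {
  block <- tryCatch(read.table(infile, sep = "\t", nrows = chunk,
                               col.names = c("y", "seatbelt", "time")),
                    error = function(e) NULL)   # end of file raises an error
  if (is.null(block) || nrow(block) == 0) break
  sqlSave(con, dat = block, tablename = "bigdata", append = !first)
  first <- FALSE                                # later chunks append
}
close(infile)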
----- Original Message -----
From: "Berton Gunter" <gunter.berton at gene.com>
To: <ramasamy at cancer.org.uk>; "'Jean-Pierre Gattuso'" <gattuso at obs-vlfr.fr>
Cc: <r-help at stat.math.ethz.ch>
Sent: Monday, August 08, 2005 8:35 PM
Subject: Re: [R] Reading large files in R

... and it is likely that even if you did have enough memory (several times the size of the data is generally needed), it would take a very long time. If you do have enough memory and the data are all of one type -- numeric here -- you are better off treating it as a matrix rather than converting it to a data frame.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box
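[Editor's note: a minimal sketch of the matrix approach Bert suggests, reusing the file name and scan() arguments from Jean-Pierre's post below. Since every column is numeric, the values can be read into one flat vector and reshaped, with no data frame involved.]

## All three columns are numeric, so read them as a single numeric vector ...
tmp <- scan(file = "coastal_gebco_sandS_blend.txt", what = double(),
            sep = "\t", quote = "\"", dec = ".", skip = 1,
            na.strings = "-99")
## ... and reshape into a 3-column matrix, filling row by row
gebco <- matrix(tmp, ncol = 3, byrow = TRUE)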
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Adaikalavan Ramasamy
Sent: Monday, August 08, 2005 12:02 PM
To: Jean-Pierre Gattuso
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Reading large files in R
From the Note section of help("read.delim"):
'read.table' is not the right tool for reading large matrices,
especially those with many columns: it is designed to read _data
frames_ which may have columns of very different classes. Use
'scan' instead.
So I am not sure why you used 'scan', then converted it to a
data frame.
1) Can you provide a sample of the data that you are trying to read in?
2) How much memory does your machine have?
3) Try reading in the first few lines using the nmax argument in scan (see the sketch below).
Regards, Adai
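[Editor's note: a quick sketch of point 3, reusing the file name and scan() arguments from the original post; the object names are illustrative.]

## Peek at the first five records only; nmax caps how many records are read
type <- list(a = 0, b = 0, c = 0)
peek <- scan(file = "coastal_gebco_sandS_blend.txt", what = type,
             sep = "\t", quote = "\"", dec = ".", skip = 1,
             na.strings = "-99", nmax = 5)
str(peek)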
On Mon, 2005-08-08 at 12:50 -0600, Jean-Pierre Gattuso wrote:
Dear R-listers:

I am trying to work with a big (262 Mb) file but apparently reach a memory limit using R on Mac OS X as well as on a Unix machine. This is the script:
> type=list(a=0,b=0,c=0)
> tmp <- scan(file="coastal_gebco_sandS_blend.txt", what=type,
+             sep="\t", quote="\"", dec=".", skip=1, na.strings="-99",
+             nmax=13669628)
Read 13669627 records
> gebco <- data.frame(tmp)
Error: cannot allocate vector of size 106793 Kb

Even tmp does not seem right:
> summary(tmp)
Error: recursive default argument reference

Do you have any suggestions?

Thanks,
Jean-Pierre Gattuso