I know this is something R isn't meant to do well, but I tried it anyway :)
I have this SPSS data file (31 MB in size). When I converted it to an R
object with read.spss("datafile.sav") I ended up with a .RData file that
was 229 MB. Is this considered normal?
Then I tried to dump that object into a database with the RPgSQL package
function db.write.table(object). (Memory ran out the first time I tried to
convert the SPSS file into an R object, so I was quite prepared for the
database manoeuvre; I increased the size of swap (working with Linux) to
2500 MB.) The process kept going and going and getting bigger and bigger.
After 6 hours and 30 minutes I aborted it. At that point the process had
grown to 1400 MB. Again, is this considered normal? And furthermore, am I
likely to succeed if I'm patient enough?
-perttu-
Thread: Huge memory consumption with foreign and RPgSQL (3 messages: Perttu Muurimäki, Peter Dalgaard, Tim Keitt)
Perttu Muurimäki <Perttu.Muurimaki at Helsinki.Fi> writes:
> I know this is something R isn't meant to do well, but I tried it anyway :)
> I have this SPSS data file (31 MB in size). When I converted it to an R
> object with read.spss("datafile.sav") I ended up with a .RData file that
> was 229 MB. Is this considered normal?
Doesn't sound completely unreasonable: If all your fields fit in a single byte to begin with and get converted to double in the process, you'll have an inflation by a factor of 8.
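To see where that factor of 8 comes from, here is a quick illustration (my
own example, not from the thread): a value that fits in one byte on disk
takes eight bytes once it sits in an R numeric (double) vector.

    n <- 1e6
    object.size(raw(n))      # roughly 1 MB: one byte per element
    object.size(numeric(n))  # roughly 8 MB: eight bytes per element (double)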
> Then I tried to dump that object into a database with the RPgSQL package
> function db.write.table(object). (Memory ran out the first time I tried to
> convert the SPSS file into an R object, so I was quite prepared for the
> database manoeuvre; I increased the size of swap (working with Linux) to
> 2500 MB.) The process kept going and going and getting bigger and bigger.
> After 6 hours and 30 minutes I aborted it. At that point the process had
> grown to 1400 MB. Again, is this considered normal? And furthermore, am I
> likely to succeed if I'm patient enough?
This, however, sounds a bit excessive, although I wouldn't know exactly what goes on inside RPgSQL... If it is converting every field in the entire data frame to string form before sending it to the database, then I might understand. Might it be possible to send it in smaller blocks?
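A rough sketch of the "smaller blocks" idea: split the data frame into row
blocks and push each block separately, so any string conversion only ever
touches a slice at a time. The chunk size, the per-block table names and the
`name` argument to db.write.table() are my own assumptions, not from the
thread -- check ?db.write.table for the actual interface in your RPgSQL
version, and combine the pieces on the PostgreSQL side afterwards.

    ## assumes an open RPgSQL connection; `dat` is the big data frame
    ## (called `object` in the original post)
    chunk  <- 10000                            # rows per block (arbitrary)
    starts <- seq(1, nrow(dat), by = chunk)
    for (i in seq(along = starts)) {
      rows  <- starts[i]:min(starts[i] + chunk - 1, nrow(dat))
      block <- dat[rows, ]
      ## hypothetical: write each block to its own table (big_part_1, ...)
      db.write.table(block, name = paste("big_part", i, sep = "_"))
    }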
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Blegdamsvej 3, 2200 Cph. N, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
(p.dalgaard at biostat.ku.dk)
I usually import directly into PostgreSQL first and then read the data using
RPgSQL. In psql, create a table, e.g.,

    create table my_table (col1 int, col2 float, ...)

then format your data as a tab-separated ASCII file, one column per variable.
In psql,

    \copy my_table from 'filename'

or

    copy my_table from 'filename' using delimiters 'delim' with null as 'null string'
    \g

Once the data are in PostgreSQL, fire up R and read the tables with RPgSQL.

Tim
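One way to produce the tab-separated file Tim describes, if the data frame
does fit in R (exporting tab-delimited text straight from SPSS avoids R
entirely): a minimal sketch, where the file name and the \N null marker are
my own choices -- \N is PostgreSQL's default NULL marker for COPY.

    library(foreign)
    dat <- read.spss("datafile.sav", to.data.frame = TRUE)
    write.table(dat, file = "datafile.tab", sep = "\t",
                quote = FALSE, row.names = FALSE, col.names = FALSE,
                na = "\\N")   # writes \N for missing values

After the \copy, reading the table back into R goes through RPgSQL as Tim
says (e.g. db.read.table(), if your version provides it).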
Timothy H. Keitt
Department of Ecology and Evolution
State University of New York at Stony Brook
Phone: 631-632-1101, FAX: 631-632-7626
http://life.bio.sunysb.edu/ee/keitt/