Huge memory consumption with foreign and RPgSQL

3 messages · Perttu Muurimäki, Peter Dalgaard, Tim Keitt

#
I know this is something R isn't meant to do well but I tried it anyway :)

I have an SPSS data file (31 MB). When I converted it to an R object
with read.spss("datafile.sav"), I ended up with a .RData file of 229
MB. Is this considered normal?

Then I tried to dump that object into a database with the RPgSQL package's
db.write.table(object) function. (Memory ran out the first time I tried to
convert the SPSS file into an R object, so I was well prepared for the
database manoeuvre: I had increased the swap size (on Linux) to
2500 MB.) The process kept running and kept growing.
After 6 hours and 30 minutes I aborted it. By then the process had
grown to 1400 MB. Again, is this considered normal? And furthermore,
am I likely to succeed if I'm patient enough?

-perttu-

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
Perttu Muurimäki <Perttu.Muurimaki at Helsinki.Fi> writes:
Doesn't sound completely unreasonable: If all your fields fit in a
single byte to begin with and get converted to double in the process,
you'll have an inflation by a factor of 8.
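
A rough way to see the factor of 8 is to compare the same number of values
stored as single bytes and as doubles. This is only a sketch: it assumes the
worst case where every field fits in one byte, which may not hold for the
actual file, and the vector length is arbitrary.

```r
# Compare storage for one million values held as raw bytes vs. doubles.
n <- 1e6
as_bytes   <- raw(n)      # 1 byte per element
as_doubles <- numeric(n)  # 8 bytes per element

# Ratio of in-memory sizes; a small fixed header per vector keeps it
# from being exactly 8.
ratio <- as.numeric(object.size(as_doubles)) / as.numeric(object.size(as_bytes))
print(ratio)

# Upper bound for the 31 MB file from the question, in MB,
# assuming every field started out as a single byte:
upper_bound_mb <- 31 * 8
print(upper_bound_mb)
```

Under that assumption the upper bound is 248 MB, the same order of magnitude
as the 229 MB reported.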
This, however, sounds a bit excessive, although I wouldn't know
exactly what goes on inside RPgSQL... If it is converting every field
in the entire data frame to string form before sending it to the
database, then I might understand. Might it be possible to send it in
smaller blocks?
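
One way to try the smaller-blocks idea is sketched below. The helper
write_in_chunks and its write_chunk argument are hypothetical names, not part
of RPgSQL; write_chunk stands for whatever function appends one block of rows
to the database table, and the chunk size is an arbitrary choice.

```r
# Hypothetical helper: hand a data frame to a writer in blocks of
# `chunk_size` rows, so only one block needs converting at a time.
# `write_chunk` is supplied by the caller (an assumption, not an RPgSQL API).
write_in_chunks <- function(df, write_chunk, chunk_size = 10000) {
  n <- nrow(df)
  if (n == 0) return(invisible(0L))
  starts <- seq(1, n, by = chunk_size)
  for (s in starts) {
    e <- min(s + chunk_size - 1, n)
    write_chunk(df[s:e, , drop = FALSE])  # keep data frame structure
  }
  invisible(length(starts))  # number of blocks written
}

# Usage with a stand-in writer that only counts the rows it receives.
rows_seen <- 0
n_chunks <- write_in_chunks(
  data.frame(x = 1:25, y = rnorm(25)),
  function(block) rows_seen <<- rows_seen + nrow(block),
  chunk_size = 10
)
```

In real use the stand-in writer would be replaced by a call that appends the
block to an existing table; whether that keeps the process small depends on
how the database package handles each block internally.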
#
I usually import directly into PostgreSQL first and then read the data
using RPgSQL.  In psql, create a table, e.g.,

   create table my_table (col1 int, col2 float, ...)

then format your data as a tab-separated ascii file, one column per 
variable.  In psql,

   \copy my_table from 'filename'

or

   \g copy my_table from 'filename' using delimiters 'delim' with null as 'null string'

Once the data are in postgresql, fire up R and read the tables with rpgsql.
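
If the data already sit in an R data frame, the tab-separated file described
above could be produced with base R's write.table. This is a minimal sketch:
the toy data frame and the temporary file path are illustrative, and writing
missing values as \N assumes the default null marker of PostgreSQL's text
COPY format.

```r
# Toy data frame standing in for the imported SPSS data (illustrative only).
df <- data.frame(col1 = c(1L, 2L, NA), col2 = c(0.5, NA, 2.25))

# Hypothetical output path; replace with the file that \copy will read.
out_file <- tempfile(fileext = ".txt")

# One row per line, tab-separated, no header, no row names, no quoting.
# na = "\\N" writes the two characters \N for missing values.
write.table(df, file = out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE, na = "\\N")

lines <- readLines(out_file)
print(lines)
```

Character columns that contain tabs or backslashes would need escaping before
loading; this sketch does not handle that.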

Tim