Hello R-help,

I am trying to import a large dataset from SPSS into R. The SPSS file is in .SAV format and is about 1GB in size. I use read.spss to import the file and get an error saying that I have run out of memory. I am on a Mac OS X 10.5 system with 4GB of RAM. Monitoring the R process tells me that R runs out of memory when it reaches about 3GB of RAM, so I suppose the remaining 1GB is used by the OS.

Why would a 1GB SPSS file take up more than 3GB of memory in R? Is it perhaps because R is converting each SPSS column to a less memory-efficient data type? In general, what is the best strategy for loading large datasets in R?

Thanks!

P.S. I exported the SPSS .SAV file to .CSV and tried importing the comma-delimited file. Same result: the import was much slower, but eventually I ran out of memory again...
Running out of memory when importing SPSS files
5 messages · dobomode, Uwe Ligges, Thomas Lumley +1 more
dobomode wrote:
> Why would a 1GB SPSS file take up more than 3GB of memory in R?

Because SPSS stores data in a compressed way?

> Is it perhaps because R is converting each SPSS column to a less memory-efficient data type? In general, what is the best strategy to load large datasets in R?

Use a 64-bit version of R and have a sufficient amount of RAM in your system.

Uwe Ligges
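To act on the 64-bit advice, it helps to check whether the running R is a 64-bit build and to estimate roughly how much memory the data will need once loaded. A minimal base-R sketch, assuming the data are mostly numeric (R stores each double in 8 bytes); the 1 million rows by 100 columns figure is purely illustrative and not from the original post:

```r
# On a 64-bit build of R, pointers are 8 bytes; on a 32-bit build they are 4.
is_64bit <- .Machine$sizeof.pointer == 8
cat("64-bit R:", is_64bit, "\n")

# Rough lower bound for numeric data: 8 bytes per double,
# ignoring per-column overhead, row names and attributes.
est_numeric_bytes <- function(n_rows, n_cols) n_rows * n_cols * 8

# Hypothetical example: 1 million rows x 100 numeric columns.
est <- est_numeric_bytes(1e6, 100)
cat("Estimated size:", round(est / 2^20), "MiB\n")

# Cross-check the 8-bytes-per-value rule against R's own accounting.
measured <- as.numeric(object.size(numeric(1e6)))
cat("One numeric column of 1e6 values:", measured, "bytes\n")
```

The estimate is only a floor: character columns and the temporary copies made while reading add to it.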
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
I found the culprit. I had a number of variables in the SPSS file that were of a variable-length string data type (255 characters). This seemed to force R into creating 255-byte strings, which eventually choked my machine's memory...

In reply to Uwe Ligges <lig... at statistik.tu-dortmund.de> (Feb 18, 5:34 pm).
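The culprit described above can be reproduced in miniature with base R: a character vector whose values are padded with trailing blanks to 255 characters takes far more memory than the same values with the padding removed. This sketch uses made-up values and assumes, as reported in the thread, that the strings arrive padded to their declared SPSS width; `trimws()` requires R 3.2.0 or later.

```r
# 10,000 distinct short values (hypothetical), padded on the right to
# 255 characters, as a fixed-width string variable would be.
values <- paste0("id", 1:10000)
padded <- sprintf("%-255s", values)

# Strip the trailing blanks again.
trimmed <- trimws(padded, which = "right")

size_padded  <- object.size(padded)
size_trimmed <- object.size(trimmed)
print(size_padded,  units = "Kb")
print(size_trimmed, units = "Kb")
```

Trimming after loading only helps if the padded data fit in memory in the first place; otherwise, reading only the variables you need, as suggested later in the thread, avoids the problem.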
On Wed, 18 Feb 2009, Uwe Ligges wrote:
> dobomode wrote:
>> Why would a 1GB SPSS file take up more than 3GB of memory in R?
>
> Because SPSS stores data in a compressed way?
Or because R uses quite a lot more memory to read a data set than to store it. Either way, even if the data set eventually took up only 1Gb in R you still would probably not be able to work usefully with it on a 32-bit machine.
You need to either use a 64-bit system or avoid loading the whole data set. Unfortunately read.spss can't read the data selectively [something I'd like to fix, sometime], but if you had a .csv file you could read a subset of columns or rows using read.table.
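Reading a subset of a .csv file can be sketched with base R's `read.csv()` (a wrapper around `read.table()`): a `colClasses` entry of `"NULL"` skips that column entirely, `nrows` caps the number of rows read, and `skip` starts further into the file. The file and its column names below are invented for illustration, standing in for the exported .CSV mentioned in the original post.

```r
# Build a small stand-in for the exported .CSV file (hypothetical columns).
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10,
                     age = 21:30,
                     comment = letters[1:10],
                     score = seq(0.5, 5, by = 0.5)),
          path, row.names = FALSE)

# One class per column, in file order; "NULL" means the column is never loaded.
classes <- c("integer", "integer", "NULL", "numeric")

# First 5 data rows, without the 'comment' column.
first_rows <- read.csv(path, colClasses = classes, nrows = 5)

# Rows 6-10: skip the header line plus 5 data rows, so supply the names ourselves.
next_rows <- read.csv(path, header = FALSE, skip = 6, nrows = 5,
                      col.names = c("id", "age", "comment", "score"),
                      colClasses = classes)

str(first_rows)
str(next_rows)
```

Declaring `colClasses` also spares `read.table()` from guessing column types, which its documentation notes reduces memory use.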
A better bet is likely to be putting the data set into a database (SQLite is easiest) and reading subsets of the data that way. That's how I handle data sets of a few Gb (on a laptop with 1Gb memory).
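A minimal sketch of the database route, assuming the DBI and RSQLite packages are installed (neither is named in the thread). The CSV is copied into an SQLite file a chunk of lines at a time, so the whole data set never has to sit in R's memory; afterwards only the rows and columns needed are pulled back with SQL. The table, column and file names are hypothetical.

```r
library(DBI)   # assumes DBI and RSQLite are installed

# Hypothetical stand-in for the large exported CSV.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000,
                     grp = rep(c("a", "b"), 500),
                     value = (1:1000) / 10),
          csv_path, row.names = FALSE)

db_path <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db_path)

# Stream the file in chunks of lines instead of reading it all at once.
fh <- file(csv_path, open = "r")
col_names <- scan(text = readLines(fh, n = 1), what = "", sep = ",",
                  quiet = TRUE)
chunk_size <- 250
repeat {
  lines <- readLines(fh, n = chunk_size)
  if (length(lines) == 0) break
  chunk <- read.csv(text = lines, header = FALSE, col.names = col_names)
  # Create the table from the first chunk, append to it afterwards.
  dbWriteTable(con, "survey", chunk, append = dbExistsTable(con, "survey"))
}
close(fh)

# Pull back only what the analysis needs.
n_total <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM survey")$n
counts  <- dbGetQuery(con, "SELECT grp, COUNT(*) AS n FROM survey GROUP BY grp")
high    <- dbGetQuery(con, "SELECT id, value FROM survey WHERE value > 99")

dbDisconnect(con)
print(counts)
```

The chunk size trades speed for peak memory; for a real multi-gigabyte file a much larger chunk (tens of thousands of lines) is a reasonable starting point.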
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle
In reply to Thomas Lumley <tlumley at u.washington.edu> (2009/2/19):
You could try using package memisc and only bring in the variables you need to analyse. See spss.system.file() and the additional subset() methods in memisc.

Paul Bivand
Head of Analysis and Statistics
Inclusion
Inclusion has launched a new website, please visit: www.cesi.org.uk
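The memisc suggestion might look like the sketch below. `spss.system.file()` sets up an importer object describing the .sav file without loading its data, and `subset()` applied to that importer reads only the selected variables. The file name and the variable names `age` and `income` are hypothetical, and the code only runs when memisc is installed and the file exists; check the memisc documentation for the exact behaviour in your version.

```r
sav_path <- "survey.sav"   # hypothetical path to the large SPSS file

have_data <- requireNamespace("memisc", quietly = TRUE) && file.exists(sav_path)

if (have_data) {
  library(memisc)
  # Create an importer: it describes the file but does not load the data values.
  imp <- spss.system.file(sav_path)
  # Read only the (hypothetical) variables needed for the analysis.
  ds <- subset(imp, select = c(age, income))
  df <- as.data.frame(ds)
  str(df)
}
```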