Hi folks,

I've got a funny problem with R's foreign library when reading Stata files. One file consistently produces vector out-of-memory errors after gobbling up 2.7G of memory. I stepped through the read.dta function and figured out where the error occurs; the description is below.

I am running R-1.8.1 on a Debian stable system (glibc 2.2, kernel 2.4.24). R is compiled from source as a shared library. The file that I am reading is only 172M in size. The system I am using has 4G of free memory and 8G of swap, so this doesn't seem to be a problem of insufficient free memory. See below.

Thanks.

-----------------------------------------------------------------------
I stepped through the function and found that everything runs fine, but I get a bunch of warnings during the convert.factors section of the code, like:
warnings()
Warning messages:
1: Value labels (fafdstmp) for afdstmp are missing
2: Value labels (fafsmon) for afsmon are missing
3: Value labels (fafsnum) for afsnum are missing
4: Value labels (fafsval) for afsval are missing
5: Value labels (fahcmcar) for ahcmcare are missing
6: Value labels (fahengyv) for ahengyv are missing
7: Value labels (fahenrgy) for ahenrgy are missing
8: Value labels (fahflnch) for ahflnch are missing
9: Value labels (fahflnno) for ahflnno are missing
10: Value labels (fahhcvhi) for ahhcvhi are missing
11: Value labels (fahhhino) for ahhhino are missing
12: Value labels (fahhnum) for ahhnum are missing
13: Value labels (fahmcnum) for ahmcnum are missing
14: Value labels (fahncvhi) for ahncvhi are missing
etc.

Then I try to return rval as the last line of the function, and this is where R starts gobbling up a ton of memory and eventually dies with a vector memory exhausted error.

Do you have a sense of where this could be coming from? It must be something funny about the communication between the foreign library and the main R lib. I'll email the R folks.
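
The warnings above say that a variable points at a value-label set that is not stored in the file. A minimal sketch of how one might list those variables before any factor conversion happens is below. It assumes the "val.labels" and "label.table" attributes that current versions of read.dta attach to their result; the toy data frame, the temporary file, and the helper name missing_label_sets are made up for illustration and are not part of the foreign package or of the CPS extracts.

```r
library(foreign)

# Hypothetical stand-in for a real extract: a tiny file with one labelled
# variable, written to a temporary path so the sketch runs anywhere.
tmp <- tempfile(fileext = ".dta")
toy <- data.frame(id = 1:5, grp = factor(c("a", "b", "a", "c", "b")))
write.dta(toy, tmp)

# Read the raw codes only; skip the convert.factors step entirely.
raw <- read.dta(tmp, convert.factors = FALSE)

# Hypothetical helper: names of label sets that variables ask for but that
# are absent from the label table stored with the data.
missing_label_sets <- function(df) {
  wanted <- attr(df, "val.labels")          # one entry per column, "" if none
  if (is.null(wanted)) return(character(0))
  have <- names(attr(df, "label.table"))    # label sets actually in the file
  names(wanted) <- names(df)
  wanted[nzchar(wanted) & !(wanted %in% have)]
}

# For the toy file every label set is present, so nothing is reported.
print(missing_label_sets(raw))

# Simulate the situation in the warnings by dropping the label table.
stripped <- raw
attr(stripped, "label.table") <- list()
print(missing_label_sets(stripped))
```

If reading with convert.factors = FALSE succeeds on the problem file while the default call does not, that would point at the factor conversion step rather than at the read itself.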
On Wed, 4 Feb 2004, Mark S. Handcock wrote:
Date: Wed, 4 Feb 2004 14:38:12 -0800
From: Mark S. Handcock <handcock at stat.washington.edu>
To: 'Cere M. Davis' <cere at u.washington.edu>,
'R. Anderson' <anders10 at u.washington.edu>
Cc: morrism at u.washington.edu, 'Matthew B Weatherford' <mbw at u.washington.edu>,
Msh <handcock at stat.washington.edu>
Subject: RE: error
Cere,
This is useful information. How large is the original data file? If it is
small (<1Gb) then the 2.7Gb is excessive. Have you searched the R users
group on www.r-project.org?
Also, can you try:
rval <- .External("do_readStata", "file", PACKAGE = "foreign")
where "file" is the Stata file name, on both machines. This is the internal R
read using C, so if that works, the problem is elsewhere in the "read.dta"
function, which is easy to fix.
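
One way to separate the C-level read from the R post-processing without calling the internal entry point directly is sketched below. It is only an approximation of the test above: it uses the public read.dta arguments convert.factors and convert.dates to switch off the R-side conversions, and compares peak heap use as reported by gc(). The toy file and the helper name peak_mb are hypothetical.

```r
library(foreign)

# Hypothetical stand-in for the real extract.
tmp <- tempfile(fileext = ".dta")
toy <- data.frame(x = rnorm(1000),
                  g = factor(sample(letters[1:3], 1000, replace = TRUE)))
write.dta(toy, tmp)

# Hypothetical helper: evaluate an expression and return the peak heap use
# (in Mb, cons cells plus vector heap) that gc() recorded while it ran.
peak_mb <- function(expr) {
  invisible(gc(reset = TRUE))   # reset the "max used" counters
  force(expr)                   # the argument is evaluated here, after the reset
  g <- gc()
  sum(g[, ncol(g)])             # last column of gc() is "max used" in Mb
}

# Read with the R-side conversions off, then with the defaults.
raw_peak  <- peak_mb(read.dta(tmp, convert.factors = FALSE, convert.dates = FALSE))
full_peak <- peak_mb(read.dta(tmp))

cat("peak Mb, conversions off:", raw_peak, "\n")
cat("peak Mb, default read:   ", full_peak, "\n")
```

On a toy file the two figures will be nearly identical. The point is the method: a large gap on the real file would suggest that the memory goes into the conversions and not into the read. The figures cover the whole R session heap, not only the object being read.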
Mark
-----Original Message-----
From: Cere M. Davis [mailto:cere at u.washington.edu]
Sent: Monday, February 02, 2004 10:45 PM
To: R. Anderson
Cc: morrism at u.washington.edu; handcock at stat.washington.edu; Matthew B Weatherford
Subject: Re: error

More info on the R memory problem. Just reading one dta file in via the foreign library requires upwards of 2.7G of memory on any machine. 2.7G is the point at which the process runs out of memory, so I can't know the upper limit of this process. I am running the R read process on Libra now, but it's been 5 hours since I started the read request and the disk swap is so busy that I cannot tell when the process will finish. There does appear to be a problem with this R job using system swap space on Mosix, so a quick test and fix for this is to co-opt another machine and aggregate some RAM from it, if there is physical space in the machine, sometime tomorrow hopefully. Stay tuned.
Thanks Robin for this email. I am able to reproduce what you reported using the file that you gave me below, so thank you very much for that. From what I can see, this appears to be a memory allocation issue that affects all systems, but because the main node has such fast ethernet speeds one can see the results of the problem quickly. I am testing this problem on a system with more memory and may have a better sense of what is needed once I see the results. I'll let you know as I learn more, perhaps later today.

Thanks,
Cere

On Wed, 28 Jan 2004, R. Anderson wrote:
Date: Wed, 28 Jan 2004 22:25:11 -0800 (PST)
From: R. Anderson <anders10 at u.washington.edu>
To: Cere M. Davis <cere at u.washington.edu>
Cc: morrism at u.washington.edu
Subject: Re: error

Cere-

In the March files (which use the same .dta as the match files we were looking at on Friday), I was able to get 1979-1988 and 1996-2001 to run with marchdatameta.R and create Rdata files. However, when the meta file ran, for example, 1989, the vector error occurred again. So I tried running some of the files (marchdatacopy1989.R, marchdatacopy1990.R,...) individually. I was able to produce an RData set from the 1989 file. However, when I ran the 1990.R file, I got the following error:
______________________________________________________________________
##################################################
# marchdatacopy1990.R #
# 10 Jan 2004 -ra #
# #
# This is a template file that is used to read #
# SPSS data into R and should prepare the basic #
# variables needed for the analysis of income #
# for any year 1990 that is specified. It is #
# sourced by the shell script "marchmetacode" #
# for years that are specified in #
# "marchdatameta.R". #
# -RA, 10 Jan 2004 #
##################################################
library(foreign)
options(object.size = 10000000)
mar1990 <-
read.dta("/net/home/morrism/Data/CPS/March/Extracts.all/mar1990.dta")
Error: vector memory exhausted (limit reached?)
Process R segmentation fault at Wed Jan 28 21:14:41 2004
______________________________________________________________________
This was run on mos2, interactively in emacs, and the error differs from the other vector errors.
And then I ran marchdatacopy1990.R on klee and got the following warning:
______________________________________________________________________
run marchdatacopy1990.R
/usr/local/R-1.8.1/lib/R/bin/BATCH: line 55: 31545 Done ( echo "invisible(options(echo = TRUE))"; cat ${in}; echo "proc.time()" )
31546 Killed | ${R_HOME}/bin/R ${opts} >${out} 2>&1
______________________________________________________________________
When I opened the outfile, marchdatacopy1990.Rout, there was nothing but the R prompt. (This is the outfile after running the file on klee.)
I can stop by Friday morning or Thursday afternoon (I meet with Prof Morris at 3 and can stop by afterwards).
I think it is very odd that the marchdatameta file ran without error for some of the years while for others it produced an error. Also note that running the matchdatameta file continued to produce the same errors as before for all years.
The directories for the match and march are:
/net/home/morrism/Data/CPS/Comp/R/Code/MarchData ---For march
/net/home/morrism/Data/CPS/Comp/R/Code/MatchData ---For match
In each directory I am creating datasets from the same .dta files, which are in:
/net/home/morrism/Data/CPS/March/Extracts.all
So I do not understand why the marchdatameta file will work for some years and the matchdatameta produces the vector error for all years.
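
To see which years fail and which succeed in one pass, a loop along the following lines might help. It is only a sketch: the toy extracts, the directory, and the helper name try_year are hypothetical, and the file naming simply imitates mar1990.dta. A catchable R error such as "vector memory exhausted" would be recorded in the log, but a process that is killed by the system or segfaults cannot be trapped this way.

```r
library(foreign)

# Hypothetical stand-ins for the yearly extracts: two valid toy files, plus
# one year with no file at all so that the failure branch is exercised.
extract_dir <- tempfile("extracts")
dir.create(extract_dir)
for (yr in c(1989, 1990)) {
  write.dta(data.frame(year = rep(yr, 3), inc = c(1, 2, 3)),
            file.path(extract_dir, paste0("mar", yr, ".dta")))
}
years <- c(1989, 1990, 1991)

# Hypothetical helper: try to read one year and report what happened.
try_year <- function(yr) {
  path <- file.path(extract_dir, paste0("mar", yr, ".dta"))
  res <- tryCatch(read.dta(path), error = function(e) e)
  ok <- is.data.frame(res)
  out <- data.frame(year = yr,
                    ok = ok,
                    rows = if (ok) nrow(res) else NA_integer_,
                    message = if (ok) "" else conditionMessage(res),
                    stringsAsFactors = FALSE)
  rm(res)
  invisible(gc())   # release the data before moving on to the next year
  out
}

run_log <- do.call(rbind, lapply(years, try_year))
print(run_log)
```

Running every year through the same function in a fresh R session would also show whether a failure depends on the year itself or on what was read before it.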
Thanks,
Robin Anderson
On Fri, 23 Jan 2004, Cere M. Davis wrote:
If you are going to be around today, please come by and we'll work on this some more if you have time.
Cere-
By running ..1987.R through matchdatameta.R I do get the "vector" error.
I am running that file interactively through an emacs/R split window.
Here is the file path for the .Rout file:
/net/home/morrism/Data/CPS/Comp/R/Code/MatchData/matchdatacopy1987.Rout
This is the file path for the file that creates an R file for each year and runs it with R BATCH --no-save to get the .Rout file:
/net/home/morrism/Data/CPS/Comp/R/Code/MatchData/matchdatameta.R
Thanks Again
Robin
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Cere Davis
Unix Systems Administrator - CSDE
cere at u.washington.edu ph: 206.685.5346
https://staff.washington.edu/cere
GnuPG Key http://staff.washington.edu/cere/gpgkey.txt
Key fingerprint = B63C 2361 3B9B 8599 ECC9 D061 3E48 A832 F455 9E7FA