Trying to select a subset of cases (rows of data) I encountered several
problems:
Firstly, because I did not read the help for read.spss() thoroughly
enough, I treated the object it returns as a data frame. For example,
dr2000 <- read.spss('myfile.sav')
d <- subset(dr2000,RBINZ99 > 0)
and thus received an error message (Object "RBINZ99" not found), because
dr2000 is not a data.frame but a list (shown by class(dr2000)).
d <- subset(dr2000,dr2000$RBINZ99 > 0)
didn't help either, because now d is empty (dim = NULL).
Thus, I tried to use the option "to.data.frame=T" of read.spss():
dr2000 <- read.spss('myfile.sav',to.data.frame=T)
However, now R "crashes" ('R for Windows GUI front-end has found an
error and must be closed') (the error message is in German).
Finally, I tried again using read.spss() without the option
'to.data.frame=T' (as before) and tried to convert dr2000 to a data
frame by using
d <- as.data.frame(dr2000)
However, R crashes again (with the same error message).
Of course, I could use SPSS first and save only the cases with RBINZ99 >
0, but this is not always possible (all users of the data must have SPSS
available and we have to use different selection criteria). Is there
another possibility to solve the problem by using R? I want to select
certain rows (cases) based on the values of one "variable" of dr2000,
but keep all columns (variables) - although dr2000 is not a data frame?
And: R should not crash but rather give a warning.
------------------------
R version 2.1.1 Patched (2005-07-15)
Package Foreign Version 0.8-10
Operating system: Windows XP Professional (5.1 (Build 2600))
CPU: Pentium Model 2 Stepping 9
RAM: 512 MB
*************************************************
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany
phone: +49-040-42838.7498 (office)
+49-040-42838.4591 (Billon)
fax: +49-040-42838.2344
email: dirk.enzmann at jura.uni-hamburg.de
www:
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html
Problem with read.spss() and as.data.frame(), or: alternative to subset()?
5 messages · Dirk Enzmann, Martin Maechler, Thomas Lumley +1 more
The selection problem can be solved by
dr2000=read.spss('myfile')
d=lapply(dr2000,subset,dr2000$RBINZ99 > 0)
however, there is still the problem that R crashes when using
d = as.data.frame(dr2000)
or
dr2000=read.spss('myfile',to.data.frame=T)
Any suggestions why? I checked whether all components of dr2000 are of
the same length and the sort of object of each component. This is not
the problem: Each component has the same length (9232) and there are 66
components of the class 'character', 981 of the class 'factor', and 479
of the class 'numeric'.
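The lapply() selection and the length/class check described above can be sketched on a toy list. This is only an illustration under stated assumptions: the list below stands in for the result of read.spss(), and the component names SEX and ID are hypothetical (only RBINZ99 comes from the thread). It uses which() so that cases with a missing RBINZ99 are dropped rather than turned into NA rows.

```r
# Toy stand-in for the list returned by read.spss();
# SEX and ID are hypothetical names, only RBINZ99 is from the thread.
dr2000 <- list(
  RBINZ99 = c(0, 2, NA, 5),
  SEX     = factor(c("m", "f", "f", "m")),
  ID      = c("a", "b", "c", "d")
)

# Row-wise selection only makes sense if all components have the same length.
stopifnot(length(unique(sapply(dr2000, length))) == 1)

# Count components by (first) class, as in the check described above.
print(table(sapply(dr2000, function(x) class(x)[1])))

# which() drops NA, so cases with missing RBINZ99 are excluded.
keep <- which(dr2000$RBINZ99 > 0)

# Apply the same row index to every component; the result is still a list.
d <- lapply(dr2000, function(x) x[keep])
```

The result d keeps all components (variables) and only the selected cases, without ever converting the list to a data frame.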
"Dirk" == Dirk Enzmann <dirk.enzmann at jura.uni-hamburg.de>
on Wed, 21 Sep 2005 13:18:32 +0200 writes:
Dirk> The selection problem can be solved by
Dirk> dr2000=read.spss('myfile')
Dirk> d=lapply(dr2000,subset,dr2000$RBINZ99 > 0)
Dirk> however, there is still the problem that R crashes when using
Dirk> d = as.data.frame(dr2000)
which is a bug in R, or at least in your R installation.
However, we can't do anything about it at the moment, because we
can't even try to reproduce it...
So dr2000 is a list; what length() does it have? What names()?
What does str(dr2000) look like?
What happens for as.data.frame(dr2000[1:10])?
And with '100' or '1000' instead of '10'?
Maybe try to find a small version of 'dr2000' which still has
the problem, and show us that one,
e.g. by making it available via http://... if it is still large,
otherwise (if it's small), maybe even posting the result of
dump(..).
Regards,
Martin
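The diagnostic suggested above (converting progressively larger prefixes of the list) might be sketched as follows. This is a minimal illustration, not a reproduction of the reported crash: the list is a toy stand-in built with replicate(), and the component names V1, V2, ... are hypothetical. Note that tryCatch() only catches ordinary R errors; a hard crash of the R process cannot be caught, which is why the size is printed before each attempt.

```r
# Toy stand-in for the read.spss() result: 1500 components of equal length.
set.seed(1)
dr2000 <- replicate(1500, rnorm(20), simplify = FALSE)
names(dr2000) <- paste0("V", seq_along(dr2000))

# Basic facts about the list, as asked for above.
print(length(dr2000))
str(dr2000[1:3])

sizes <- c(10, 100, 1000, length(dr2000))
ok <- logical(0)
for (n in sizes) {
  # Report before converting: if R crashes outright, the last size
  # printed is the one that triggered the problem.
  cat("trying first", n, "components\n")
  res <- tryCatch(as.data.frame(dr2000[seq_len(n)]),
                  error = function(e) NULL)
  ok <- c(ok, is.data.frame(res) && ncol(res) == n)
}
print(data.frame(size = sizes, converted = ok))
```

If one size fails while a smaller one succeeds, the range in between can be narrowed further to find a small list that still shows the problem.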
2 days later
On Wed, 21 Sep 2005, Martin Maechler wrote:
which is a bug in R, or at least in your R installation.
However, we can't do anything about it at the moment, because we
can't even try to reproduce it...
I suspect this is the same stack overflow in coerce.c:substituteList that was reported in PR#8141 -thomas
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Thomas Lumley
Assoc. Professor, Biostatistics
University of Washington, Seattle
tlumley at u.washington.edu
On Fri, 23 Sep 2005, Thomas Lumley wrote:
On Wed, 21 Sep 2005, Martin Maechler wrote:
which is a bug in R, or at least in your R installation.
However, we can't do anything about it at the moment, because we
can't even try to reproduce it...
I suspect this is the same stack overflow in coerce.c:substituteList that was reported in PR#8141
Apparently not (it had only about 1500 columns rather than 198000). After taking it offline I was able to make it work on machines with 1 GB of RAM under Windows and Linux, and Dirk succeeded using --max-mem-size=640M on Windows. So it looks like it was a problem with total memory usage, though I have yet to find out what exactly.
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
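Since the thread concludes that total memory usage was the likely cause, a rough way to gauge the footprint of a list before and after conversion is object.size(). The following is only a sketch on toy data (the components a and b are hypothetical); it does not reproduce the original file or the crash, and the sizes it reports depend on the R version and platform.

```r
# Toy list standing in for a read.spss() result; a and b are hypothetical.
x <- list(a = rnorm(1000), b = as.character(seq_len(1000)))

# Size of the list as stored.
sz_list <- object.size(x)

# Size after conversion; stringsAsFactors = FALSE keeps b as character.
df <- as.data.frame(x, stringsAsFactors = FALSE)
sz_df <- object.size(df)

print(sz_list, units = "Kb")
print(sz_df, units = "Kb")

# gc() reports current memory use and triggers a garbage collection.
invisible(gc())
```

Comparing these figures with the memory available to R gives a first indication of whether a conversion is likely to run into a limit such as the one raised with --max-mem-size above.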