parts of data frames: subset vs. [-c()]
From: "Stefan Th. Gries" <stgries_lists at arcor.de> writes:
I have a problem splitting up a data frame called ReVerb: I would like to extract all cases where SYNTAX=="Ditrans" from ReVerb, store them in a file, and then regenerate ReVerb without these cases and without the now-unused factor level. My problem is probably obvious from the following lines of code:
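A minimal sketch of the task as described, run on a small made-up data frame standing in for ReVerb (the `verb` column, all values, and the output path are hypothetical). It matches with `%in%` rather than `==` because `%in%` returns FALSE rather than NA for missing values. It then removes the unused factor level with `droplevels()`, which is available in current R. For a single factor column, `factor(x)` also re-creates the factor with only the levels in use.

```r
# Toy stand-in for ReVerb; the verb column and all values are hypothetical
ReVerb <- data.frame(SYNTAX = factor(c("Ditrans", "Trans", "Intrans", "Ditrans")),
                     verb   = c("give", "take", "sleep", "send"))

# %in% returns FALSE (not NA) for missing SYNTAX values
is_ditrans <- ReVerb$SYNTAX %in% "Ditrans"

# Extract the Ditrans cases and store them in a file (path is hypothetical)
ditrans_cases <- ReVerb[is_ditrans, ]
out_file <- file.path(tempdir(), "ditrans.csv")
write.csv(ditrans_cases, out_file, row.names = FALSE)

# Rebuild ReVerb without those cases; droplevels() removes the unused "Ditrans" level
ReVerb <- droplevels(ReVerb[!is_ditrans, ])
levels(ReVerb$SYNTAX)   # "Intrans" "Trans"
```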
ditrans <- which(SYNTAX == "Ditrans")
ReVerb1 <- ReVerb[-c(ditrans), ]; dim(ReVerb1)
[1] 91532    16
# ok, so the 92713 - 91532 = 1181 cases where SYNTAX == "Ditrans" have been removed, but ...
ReVerb1 <- subset(ReVerb, SYNTAX != "Ditrans"); dim(ReVerb1)
[1] 91528    16
# ... so why don't I get 91532 again as the number of rows?
# Any ideas??
From: Peter Dalgaard <p.dalgaard at biostat.ku.dk>
The free-standing SYNTAX variable is not necessarily the same as ReVerb$SYNTAX. Could you retry the first case with ditrans <- which(ReVerb$SYNTAX == "Ditrans")?
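To illustrate the point about the free-standing SYNTAX, here is one way it can drift out of step with the data frame. The example assumes a copy taken before the rows of the data frame changed; the thread does not say how SYNTAX was actually created, and the `id` column and values are made up.

```r
# Toy data frame; the id column is hypothetical
ReVerb <- data.frame(SYNTAX = factor(c("Ditrans", "Trans", "Intrans")),
                     id     = 1:3)
SYNTAX <- ReVerb$SYNTAX              # free-standing copy, e.g. from an earlier step

ReVerb <- ReVerb[c(3, 1, 2), ]       # the data frame changes afterwards (rows reordered)

identical(SYNTAX, ReVerb$SYNTAX)     # FALSE: the copy no longer matches
which(SYNTAX == "Ditrans")           # 1: position in the stale copy
which(ReVerb$SYNTAX == "Ditrans")    # 2: position in the current data frame

# Referring to the column through the data frame avoids the ambiguity
ditrans <- with(ReVerb, which(SYNTAX == "Ditrans"))
ditrans                              # 2
```

`attach()` is another common source of such copies, since it copies the columns at the moment it is called; later changes to the data frame are not reflected in the attached SYNTAX.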
The results were the same as with 'ditrans<-which(SYNTAX=="Ditrans")'.
Otherwise, try a setdiff() on the row names of the two discrepant results to see which four cases differ.
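The setdiff() check suggested here can be sketched as follows. The data frame is a small made-up stand-in for ReVerb in which row 4 has a missing SYNTAX value, and all values are hypothetical. Both kinds of subsetting preserve the original row names, so the names identify which rows differ.

```r
ReVerb <- data.frame(SYNTAX = factor(c("Ditrans", "Trans", "Intrans", NA, "Ditrans")),
                     verb   = c("give", "take", "sleep", "go", "send"))

by_index  <- ReVerb[-which(ReVerb$SYNTAX == "Ditrans"), ]   # first approach
by_subset <- subset(ReVerb, SYNTAX != "Ditrans")             # second approach

# Row names that appear in the first result but not in the second
extra <- setdiff(rownames(by_index), rownames(by_subset))
extra              # "4"
ReVerb[extra, ]    # the row(s) kept only by the first approach
```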
This solved the issue: using setdiff(), I found that the cases the second approach (subset) fails to include are NAs. I was not aware of how subset treats NA, sorry.
Thanks a lot,
STG
--
Stefan Th. Gries
Max Planck Inst. for Evol. Anthropology
http://people.freenet.de/Stefan_Th_Gries
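The reason for the difference, sketched on a tiny made-up example (values hypothetical): `which()` ignores NA, so the negative index never removes rows whose SYNTAX is missing. `subset()`, by contrast, keeps only rows where the condition is TRUE and drops those where it is NA. If the NA rows belong in the remainder, the condition has to say so explicitly.

```r
# Toy stand-in for ReVerb; row 3 has a missing SYNTAX value
ReVerb <- data.frame(SYNTAX = factor(c("Ditrans", "Trans", NA, "Intrans")),
                     verb   = c("give", "take", "go", "sleep"))

cond <- ReVerb$SYNTAX != "Ditrans"
cond                                   # FALSE TRUE NA TRUE

# subset() treats the NA like FALSE, so row 3 is dropped
n_subset <- nrow(subset(ReVerb, SYNTAX != "Ditrans"))                    # 2
# which() skips the NA, so the negative index leaves row 3 in place
n_index  <- nrow(ReVerb[-which(ReVerb$SYNTAX == "Ditrans"), ])           # 3

# Keeping the NA rows with subset() requires saying so explicitly
n_keep_na <- nrow(subset(ReVerb, is.na(SYNTAX) | SYNTAX != "Ditrans"))   # 3

# Related pitfall: when nothing matches, which() returns integer(0),
# and x[-integer(0), ] selects no rows instead of all of them
n_none <- nrow(ReVerb[-which(ReVerb$SYNTAX == "NoSuchLevel"), ])         # 0
```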