help with subset(), still original dataframe in tapply
Frank Mattes <f.mattes at rfc.ucl.ac.uk> writes:
I'm creating now a subset missing the values 0 and "NA"
newex<-subset(ex,ex$REL>0)
newex

        UID   REL
5  R1.B8.38 0.010
6  R1.B8.38 0.060
7  R1.B8.38 0.006
8  R1.B8.38 0.010
9  R1.B8.48 0.080
11 R1.B8.48 0.006

and now would like to apply the mean to each group in (UID)
tapply(newex$REL,newex$UID,mean,rm.na=T)
R1.B8.31 R1.B8.38 R1.B8.48
      NA   0.0215   0.0430
to my surprise, I still have the mean for group R1.B8.31, which has
been removed by the subset function before.
A subset of a three-level factor is still a three-level factor. If you want it to become a factor with only those levels that are present in data, you need to say so, e.g. with

tapply(newex$REL,factor(newex$UID),mean)

but I would like to know why the tapply still uses the original dataframe.
It doesn't.
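
A small sketch may make the point concrete. The data frame below is hypothetical: it is modelled on the post, but the rows for R1.B8.31 and all REL values are assumptions, since the original ex is not shown. It illustrates that subset() really does drop the rows, while the UID factor keeps its unused level, and that factor() or droplevels() removes it. As an aside, the argument to mean() is spelled na.rm; a misspelling such as rm.na is passed through "..." and silently ignored.

```r
# Hypothetical data modelled on the post; the R1.B8.31 rows are assumed.
ex <- data.frame(
  UID = factor(c("R1.B8.31", "R1.B8.31", "R1.B8.38", "R1.B8.38", "R1.B8.48")),
  REL = c(0, NA, 0.010, 0.060, 0.080)
)

# subset() drops rows where the condition is FALSE or NA.
newex <- subset(ex, REL > 0)

nrow(newex)        # the R1.B8.31 rows are gone from the data
levels(newex$UID)  # but the factor still carries all three levels

# tapply() makes one cell per factor level, so the empty level shows up.
with_all <- tapply(newex$REL, newex$UID, mean)

# Re-creating the factor keeps only the levels that actually occur.
dropped <- tapply(newex$REL, factor(newex$UID), mean)

# Alternatively, drop the unused levels in the data frame itself.
newex$UID <- droplevels(newex$UID)
```

Wrapping the grouping variable in factor() affects only that one call; droplevels() changes the column, so later tapply() or table() calls no longer show the empty group.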
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907