Message-ID: <x2wugda5yo.fsf@biostat.ku.dk>
Date: 2003-05-26T18:53:44Z
From: Peter Dalgaard
Subject: help with subset(), still original dataframe in tapply
In-Reply-To: <p05200f00baf80a888472@[128.40.218.142]>
Frank Mattes <f.mattes at rfc.ucl.ac.uk> writes:
> I'm creating now a subset missing the values 0 and "NA"
> > newex<-subset(ex,ex$REL>0)
> > newex
> UID REL
> 5 R1.B8.38 0.010
> 6 R1.B8.38 0.060
> 7 R1.B8.38 0.006
> 8 R1.B8.38 0.010
> 9 R1.B8.48 0.080
> 11 R1.B8.48 0.006
>
> and now would like to apply the mean to each group in (UID)
>
> > tapply(newex$REL,newex$UID,mean,rm.na=T)
> R1.B8.31 R1.B8.38 R1.B8.48
> NA 0.0215 0.0430
>
> to my surprise, I still have the mean for group R1.B8.31, which has
> been removed by the subset function before.
A subset of a three-level factor is still a three-level factor. If you
want it to become a factor with only those levels that are present in
the data, you need to say so, e.g. with
    tapply(newex$REL, factor(newex$UID), mean)
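For anyone who wants to try this, here is a minimal sketch. The original
'ex' was not posted, so the rows removed by subset() (the zeros and the NA)
are guesses of mine; only the six surviving rows come from the output
quoted above.

```r
# Hypothetical reconstruction of 'ex': rows 1-4 and 10 are assumptions,
# chosen only so that subset() leaves the six rows shown in the post.
ex <- data.frame(
  UID = factor(c(rep("R1.B8.31", 4), rep("R1.B8.38", 4), rep("R1.B8.48", 3))),
  REL = c(0, NA, 0, 0, 0.010, 0.060, 0.006, 0.010, 0.080, 0, 0.006)
)

# NA > 0 evaluates to NA, which subset() treats as FALSE, so NA rows go too
newex <- subset(ex, REL > 0)

# The unused level R1.B8.31 is still present, so it gets an (empty) cell
with_unused <- tapply(newex$REL, newex$UID, mean)

# Re-creating the factor keeps only the levels that actually occur
dropped <- tapply(newex$REL, factor(newex$UID), mean)

print(with_unused)
print(dropped)
```

Since subset() has already removed the NA rows, no NA handling is needed in
the call to mean(). Note also that the argument of mean() is spelled na.rm,
not rm.na; as far as I can tell the misspelled one is passed along and
ignored. In later versions of R, droplevels(newex) should be another way to
discard the unused levels for the whole data frame at once.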
> but I would like to know why the tapply still uses the original dataframe.
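It doesn't. tapply() makes one cell per level of the grouping factor, and
R1.B8.31 is still a level of newex$UID even though no row carries it any
more, so that cell comes out empty (NA).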
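A quick way to convince yourself is to look at the subset directly. Again
this is only a sketch: 'ex' is rebuilt here with made-up values for the
rows that subset() removes, since the original data frame was not posted.

```r
# Hypothetical stand-in for 'ex' (rows 1-4 and 10 are assumptions)
ex <- data.frame(
  UID = factor(c(rep("R1.B8.31", 4), rep("R1.B8.38", 4), rep("R1.B8.48", 3))),
  REL = c(0, NA, 0, 0, 0.010, 0.060, 0.006, 0.010, 0.080, 0, 0.006)
)
newex <- subset(ex, REL > 0)

# The rows for R1.B8.31 really are gone from the subset ...
n_rows <- nrow(newex)

# ... but the factor still remembers all three levels
uid_levels <- levels(newex$UID)

# table() makes the mismatch visible: a count of 0 for the unused level
uid_counts <- table(newex$UID)

print(n_rows)
print(uid_levels)
print(uid_counts)
```

So the data passed to tapply() are the subsetted ones; it is only the level
set of the factor that carries over from the original.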
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907