help with subset(), still original dataframe in tapply

2 messages · Frank, Peter Dalgaard

Frank

Mon, May 26, 2003 11:38 AM

Dear R-help reader,

it would be great if someone knows what I'm doing wrong.
I have a (shortened) data frame, which consists of a group identification
and a number:

        UID   REL
1  R1.B8.31 0.000
2  R1.B8.31 0.000
3  R1.B8.31 0.000
4  R1.B8.31 0.000
5  R1.B8.38 0.010
6  R1.B8.38 0.060
7  R1.B8.38 0.006
8  R1.B8.38 0.010
9  R1.B8.48 0.080
10 R1.B8.48    NA
11 R1.B8.48 0.006

I then create a subset that excludes the values 0 and NA:

        UID   REL
5  R1.B8.38 0.010
6  R1.B8.38 0.060
7  R1.B8.38 0.006
8  R1.B8.38 0.010
9  R1.B8.48 0.080
11 R1.B8.48 0.006
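
A subset like the one above could be produced roughly as follows. This is only a sketch: the name `ex` for the original data frame is an assumption (only `newex` appears in this post), and the exact filter that was used is not shown, so the condition below is a guess that happens to give the same rows.

```r
# Rebuild the example data from the post; `ex` is a hypothetical name.
ex <- data.frame(
  UID = factor(rep(c("R1.B8.31", "R1.B8.38", "R1.B8.48"), times = c(4, 4, 3))),
  REL = c(0, 0, 0, 0, 0.010, 0.060, 0.006, 0.010, 0.080, NA, 0.006)
)

# Keep only the rows whose REL is neither NA nor 0.
newex <- subset(ex, !is.na(REL) & REL != 0)
print(newex)
```

The original row names (5, 6, 7, 8, 9, 11) are kept, which matches the listing above.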

and would now like to compute the mean for each group in UID:

R1.B8.31 R1.B8.38 R1.B8.48
      NA   0.0215   0.0430

To my surprise, I still get a mean (NA) for group R1.B8.31, even though
its rows were removed by the subset function beforehand.
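
The result above can be reproduced with a short, self-contained sketch. The data frame name `ex` and the plain `tapply()` call are assumptions, since the post does not show the exact call that produced the output.

```r
# Example data from the post; `ex` is a hypothetical name.
ex <- data.frame(
  UID = factor(rep(c("R1.B8.31", "R1.B8.38", "R1.B8.48"), times = c(4, 4, 3))),
  REL = c(0, 0, 0, 0, 0.010, 0.060, 0.006, 0.010, 0.080, NA, 0.006)
)

# Drop the rows with REL equal to 0 or NA.
newex <- ex[!is.na(ex$REL) & ex$REL != 0, ]

# Group means of REL by UID; the result still has three cells.
res <- tapply(newex$REL, newex$UID, mean)
print(res)
```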

I can remove the NA by

  tapply(newex$REL, interaction(newex$UID, drop=TRUE), mean, na.rm=TRUE)

but I would like to know why tapply still uses the original data frame.

Many thanks for your help

Frank

Frank Mattes, 				e-mail:	f.mattes at ucl.ac.uk
Department of Virology			fax	0044(0)207 8302854
Royal Free Hospital and 			tel	0044(0)207 8302997
University College Medical School
London

Peter Dalgaard

Mon, May 26, 2003 11:53 AM

Frank Mattes <f.mattes at rfc.ucl.ac.uk> writes:

A subset of a three-level factor is still a three-level factor. If you
want it to become a factor with only those levels that are present in
the data, you need to say so, e.g. with

tapply(newex$REL,factor(newex$UID),mean)
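
The point about levels can be checked directly. The sketch below rebuilds the subset by hand (the construction of `newex` here is an assumption made for illustration) and compares the levels before and after re-wrapping in `factor()`. `droplevels()` is shown as an alternative that was added to R after this thread was written.

```r
# Hand-built stand-in for the subset: six rows, but UID keeps three levels.
newex <- data.frame(
  UID = factor(c(rep("R1.B8.38", 4), rep("R1.B8.48", 2)),
               levels = c("R1.B8.31", "R1.B8.38", "R1.B8.48")),
  REL = c(0.010, 0.060, 0.006, 0.010, 0.080, 0.006)
)

levels(newex$UID)          # all three levels, including the unused one
levels(factor(newex$UID))  # only the levels that occur in the data

# Re-creating the factor removes the empty group from the result.
means <- tapply(newex$REL, factor(newex$UID), mean)
print(means)

# Alternative: drop unused levels from every factor column at once.
newex2 <- droplevels(newex)
means2 <- tapply(newex2$REL, newex2$UID, mean)
```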

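As for why tapply still uses the original data frame: it doesn't. The
empty group shows up only because the UID factor in the subset still
carries all three levels.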

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907