Message-ID: <1355638962.92830.YahooMailNeo@web142601.mail.bf1.yahoo.com>
Date: 2012-12-16T06:22:42Z
From: arun
Subject: how to handle NA values in aggregate()
In-Reply-To: <CAKK5bs-gHQd=9uT5DRJsSVN7weVPrZnqa5hp9di5Cg__1SE4Gg@mail.gmail.com>
Hi,
This should also work:
df1<-read.table(text="
FID  MID   IID      EW_INCU  EW_17.5  EMW    EEratio
1    4621  TWF2H5   45.26    NA       15.61  NA
1    4621  TWF2H6   48.02    44.09    13.41  0.3041506
2    4630  TWF2H19  51.44    47.81    NA     NA
2    4631  TWF2H21  NA       52.72    16.70  0.3167678
2    4632  TWF2H22  55.70    50.45    16.48  0.3266601
2    4633  TWF2H23  44.42    40.89    12.96  0.3169479
",sep="",header=TRUE,stringsAsFactors=FALSE)
aggregate(df1[,4:7],by=list(df1[,1]),mean,na.rm=TRUE)
#  Group.1 EW_INCU EW_17.5  EMW EEratio
#1       1    46.6    44.1 14.5   0.304
#2       2    50.5    48.0 15.4   0.320
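The call works because aggregate() forwards any arguments given after FUN to that function through `...`, so na.rm=TRUE reaches mean(). As a minimal sketch (the names res_dots and res_anon are only illustrative, and df1 is rebuilt so the block runs on its own), the same result can be written with an explicit anonymous function:

```r
# Rebuild the example data so this block is self-contained.
df1 <- read.table(text = "
FID MID IID EW_INCU EW_17.5 EMW EEratio
1 4621 TWF2H5 45.26 NA 15.61 NA
1 4621 TWF2H6 48.02 44.09 13.41 0.3041506
2 4630 TWF2H19 51.44 47.81 NA NA
2 4631 TWF2H21 NA 52.72 16.70 0.3167678
2 4632 TWF2H22 55.70 50.45 16.48 0.3266601
2 4633 TWF2H23 44.42 40.89 12.96 0.3169479
", header = TRUE, stringsAsFactors = FALSE)

# Variant 1: na.rm is handed to mean() through aggregate()'s '...'.
res_dots <- aggregate(df1[, 4:7], by = list(FID = df1$FID),
                      FUN = mean, na.rm = TRUE)

# Variant 2: the NA handling is spelled out inside the function itself.
res_anon <- aggregate(df1[, 4:7], by = list(FID = df1$FID),
                      FUN = function(x) mean(x, na.rm = TRUE))

# Both variants should give the same per-group means.
print(all.equal(res_dots, res_anon))
```

Naming the element in by=list(FID=...) also labels the grouping column FID instead of the default Group.1.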
#or
library(plyr)
ddply(df1,.(FID),colwise(mean,c("EW_INCU","EW_17.5","EMW","EEratio")),na.rm=TRUE)
#  FID EW_INCU EW_17.5  EMW EEratio
#1   1    46.6    44.1 14.5   0.304
#2   2    50.5    48.0 15.4   0.320
#or
library(data.table)
df2<-data.table(df1)
df3<-df2[,c(1,4:7),with=FALSE]
df3[,lapply(.SD,mean,na.rm=TRUE),by=FID]
#   FID EW_INCU EW_17.5  EMW EEratio
#1:   2    50.5    48.0 15.4   0.320
#2:   1    46.6    44.1 14.5   0.304
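One caveat if you use the formula interface of aggregate() instead: its default na.action=na.omit drops every row that has an NA in any of the columns used before mean() is ever called, so na.rm=TRUE on its own gives different group means. A sketch, assuming the same data as above (df1 is rebuilt here; res_omit and res_pass are illustrative names), that passes na.action=na.pass to keep those rows:

```r
# Rebuild the example data so this block is self-contained.
df1 <- read.table(text = "
FID MID IID EW_INCU EW_17.5 EMW EEratio
1 4621 TWF2H5 45.26 NA 15.61 NA
1 4621 TWF2H6 48.02 44.09 13.41 0.3041506
2 4630 TWF2H19 51.44 47.81 NA NA
2 4631 TWF2H21 NA 52.72 16.70 0.3167678
2 4632 TWF2H22 55.70 50.45 16.48 0.3266601
2 4633 TWF2H23 44.42 40.89 12.96 0.3169479
", header = TRUE, stringsAsFactors = FALSE)

# Default na.action = na.omit: rows with any NA are removed first,
# so each mean is computed over complete rows only.
res_omit <- aggregate(cbind(EW_INCU, EW_17.5, EMW, EEratio) ~ FID,
                      data = df1, FUN = mean, na.rm = TRUE)

# na.action = na.pass keeps all rows; na.rm = TRUE then drops NAs
# column by column inside mean().
res_pass <- aggregate(cbind(EW_INCU, EW_17.5, EMW, EEratio) ~ FID,
                      data = df1, FUN = mean, na.rm = TRUE,
                      na.action = na.pass)

print(res_omit)
print(res_pass)
```

With na.pass the formula version should agree with the by= version shown earlier; with the default it uses fewer observations per group.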
A.K.
----- Original Message -----
From: Yao He <yao.h.1988 at gmail.com>
To: r-help at r-project.org
Cc:
Sent: Saturday, December 15, 2012 10:44 PM
Subject: [R] how to handle NA values in aggregate()
Dear All:
I am trying to calculate four columns' means in a dataframe like this:
FID  MID   IID      EW_INCU  EW_17.5  EMW    EEratio
1    4621  TWF2H5   45.26    NA       15.61  NA
1    4621  TWF2H6   48.02    44.09    13.41  0.3041506
2    4630  TWF2H19  51.44    47.81    NA     NA
2    4631  TWF2H21  NA       52.72    16.70  0.3167678
2    4632  TWF2H22  55.70    50.45    16.48  0.3266601
2    4633  TWF2H23  44.42    40.89    12.96  0.3169479
I tried this code:
> aggregate(df[,4:7],df[,1],mean)
But I couldn't set the argument na.rm=T in the mean() function, so the
results are all NAs.
Please tell me how to handle NA values when using aggregate().
Thanks a lot
Yao He
Master's candidate, 2nd year
Department of Animal Genetics & Breeding
Room 436, College of Animal Science & Technology,
China Agriculture University, Beijing, 100193
E-mail: yao.h.1988 at gmail.com
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.