
how to handle NA values in aggregate()

4 messages · Yao He, arun, Anthony Damico

#
Dear All:

I am trying to calculate the means of four columns in a data frame like this:

FID  MID   IID      EW_INCU  EW_17.5  EMW    EEratio
1    4621  TWF2H5   45.26    NA       15.61  NA
1    4621  TWF2H6   48.02    44.09    13.41  0.3041506
2    4630  TWF2H19  51.44    47.81    NA     NA
2    4631  TWF2H21  NA       52.72    16.70  0.3167678
2    4632  TWF2H22  55.70    50.45    16.48  0.3266601
2    4633  TWF2H23  44.42    40.89    12.96  0.3169479

I tried this code, but I couldn't set the argument na.rm=T in the mean()
function, so the results are all NAs.

Please tell me how to handle NA values when using aggregate().
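For illustration, a minimal base-R sketch of the behaviour described above: mean() returns NA as soon as its input contains a single NA, unless na.rm = TRUE is supplied. The vector x is our own example, built from the EW_INCU values of FID 2 in the table.

```r
# EW_INCU values for FID 2 in the table above; one of them is missing
x <- c(51.44, NA, 55.70, 44.42)

mean(x)                # NA: a single missing value makes the whole mean NA
mean(x, na.rm = TRUE)  # mean of the three non-missing values, about 50.52
```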

Thanks a lot

Yao He
Master's candidate, 2nd year
Department of Animal Genetics & Breeding
Room 436, College of Animal Science & Technology,
China Agriculture University, Beijing, 100193
E-mail: yao.h.1988 at gmail.com
#
Hi,
Try this:
df1<-read.table(text="
FID  MID   IID      EW_INCU  EW_17.5  EMW    EEratio
1    4621  TWF2H5   45.26    NA       15.61  NA
1    4621  TWF2H6   48.02    44.09    13.41  0.3041506
2    4630  TWF2H19  51.44    47.81    NA     NA
2    4631  TWF2H21  NA       52.72    16.70  0.3167678
2    4632  TWF2H22  55.70    50.45    16.48  0.3266601
2    4633  TWF2H23  44.42    40.89    12.96  0.3169479
",sep="",header=TRUE,stringsAsFactors=FALSE)

aggregate(df1[,4:7],by=list(df1[,1]),function(x) mean(x,na.rm=T))
#  Group.1 EW_INCU EW_17.5  EMW EEratio
#1       1    46.6    44.1 14.5   0.304
#2       2    50.5    48.0 15.4   0.320
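A sketch of the same calculation through the formula interface of aggregate(), assuming the same df1 (the object names strict and kept are ours). One point worth knowing: the formula method has its own na.action argument, which defaults to na.omit, so any row with an NA in any of the listed columns is dropped before FUN is applied. Passing na.action = na.pass keeps those rows and lets na.rm = TRUE deal with the NAs column by column.

```r
df1 <- read.table(text = "
FID MID  IID     EW_INCU EW_17.5 EMW   EEratio
1   4621 TWF2H5  45.26   NA      15.61 NA
1   4621 TWF2H6  48.02   44.09   13.41 0.3041506
2   4630 TWF2H19 51.44   47.81   NA    NA
2   4631 TWF2H21 NA      52.72   16.70 0.3167678
2   4632 TWF2H22 55.70   50.45   16.48 0.3266601
2   4633 TWF2H23 44.42   40.89   12.96 0.3169479
", header = TRUE, stringsAsFactors = FALSE)

# Default na.action = na.omit: rows with an NA in ANY listed column are dropped
# first, so only rows 2, 5 and 6 contribute to the means.
strict <- aggregate(cbind(EW_INCU, EW_17.5, EMW, EEratio) ~ FID, data = df1,
                    FUN = mean)
print(strict)

# na.action = na.pass keeps every row; na.rm = TRUE is forwarded to mean()
# and drops NAs separately within each column and group.
kept <- aggregate(cbind(EW_INCU, EW_17.5, EMW, EEratio) ~ FID, data = df1,
                  FUN = mean, na.rm = TRUE, na.action = na.pass)
print(kept)
```

The second call reproduces the data-frame version above; the first silently uses fewer observations per group.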





----- Original Message -----
From: Yao He <yao.h.1988 at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Saturday, December 15, 2012 10:44 PM
Subject: [R] how to handle NA values in aggregate()


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Hi,

This should also work:
df1<-read.table(text="
FID  MID   IID      EW_INCU  EW_17.5  EMW    EEratio
1    4621  TWF2H5   45.26    NA       15.61  NA
1    4621  TWF2H6   48.02    44.09    13.41  0.3041506
2    4630  TWF2H19  51.44    47.81    NA     NA
2    4631  TWF2H21  NA       52.72    16.70  0.3167678
2    4632  TWF2H22  55.70    50.45    16.48  0.3266601
2    4633  TWF2H23  44.42    40.89    12.96  0.3169479
",sep="",header=TRUE,stringsAsFactors=FALSE)

aggregate(df1[,4:7],by=list(df1[,1]), mean,na.rm=T)
#  Group.1 EW_INCU EW_17.5  EMW EEratio
#1       1    46.6    44.1 14.5   0.304
#2       2    50.5    48.0 15.4   0.320
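The two aggregate() calls in this thread differ only in how na.rm reaches mean(): one wraps mean() in an anonymous function, the other relies on aggregate() forwarding extra arguments (its ... argument) to FUN. A minimal check that both give the same result, assuming the same df1; the names a1, a2, toy and toy_means are ours. It also shows one thing to watch for: with na.rm = TRUE, a group in which every value is missing yields NaN rather than NA, because mean() of an empty vector is NaN.

```r
df1 <- read.table(text = "
FID MID  IID     EW_INCU EW_17.5 EMW   EEratio
1   4621 TWF2H5  45.26   NA      15.61 NA
1   4621 TWF2H6  48.02   44.09   13.41 0.3041506
2   4630 TWF2H19 51.44   47.81   NA    NA
2   4631 TWF2H21 NA      52.72   16.70 0.3167678
2   4632 TWF2H22 55.70   50.45   16.48 0.3266601
2   4633 TWF2H23 44.42   40.89   12.96 0.3169479
", header = TRUE, stringsAsFactors = FALSE)

# na.rm = TRUE set inside an anonymous wrapper around mean()
a1 <- aggregate(df1[, 4:7], by = list(df1[, 1]),
                function(x) mean(x, na.rm = TRUE))

# na.rm = TRUE passed through aggregate()'s ... on to mean()
a2 <- aggregate(df1[, 4:7], by = list(df1[, 1]), mean, na.rm = TRUE)

identical(a1, a2)  # TRUE

# Hypothetical toy data: group 1 contains only missing values
toy <- data.frame(g = c(1, 1, 2), v = c(NA, NA, 3))
toy_means <- aggregate(toy["v"], by = list(g = toy$g), FUN = mean, na.rm = TRUE)
print(toy_means)  # group 1 gets NaN (mean of an empty vector), group 2 gets 3
```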

#or 
library(plyr)
ddply(df1,.(FID),colwise(mean,c("EW_INCU","EW_17.5","EMW","EEratio")),na.rm=TRUE)
#  FID EW_INCU EW_17.5  EMW EEratio
#1   1    46.6    44.1 14.5   0.304
#2   2    50.5    48.0 15.4   0.320
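ddply() follows the split-apply-combine pattern: split df1 by FID, apply the column means to each piece, then bind the results back together. For readers without plyr, a rough base-R sketch of the same pattern, assuming the same df1; here colMeans() takes na.rm directly, and the names pieces, means and res are ours. The result is a matrix with one row per FID rather than a data frame.

```r
df1 <- read.table(text = "
FID MID  IID     EW_INCU EW_17.5 EMW   EEratio
1   4621 TWF2H5  45.26   NA      15.61 NA
1   4621 TWF2H6  48.02   44.09   13.41 0.3041506
2   4630 TWF2H19 51.44   47.81   NA    NA
2   4631 TWF2H21 NA      52.72   16.70 0.3167678
2   4632 TWF2H22 55.70   50.45   16.48 0.3266601
2   4633 TWF2H23 44.42   40.89   12.96 0.3169479
", header = TRUE, stringsAsFactors = FALSE)

# Split: one data frame of the four measurement columns per FID
pieces <- split(df1[, c("EW_INCU", "EW_17.5", "EMW", "EEratio")], df1$FID)

# Apply: column means within each piece, ignoring NAs
means <- lapply(pieces, colMeans, na.rm = TRUE)

# Combine: stack the per-group vectors into one row per FID
res <- do.call(rbind, means)
print(res)
```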

#or
library(data.table)
df2<-data.table(df1)
df3<-df2[,c(1,4:7),with=FALSE]
df3[,lapply(.SD,mean,na.rm=TRUE),by=FID]
#   FID EW_INCU EW_17.5  EMW EEratio
#1:   2    50.5    48.0 15.4   0.320
#2:   1    46.6    44.1 14.5   0.304

A.K.


