how to substitute missing values (NAs) by the group means

Try this:

d$traits[is.na(d$traits)] <- ave(d$traits, d$group,
                                 FUN = function(x) mean(x, na.rm = TRUE))[is.na(d$traits)]
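
For anyone who wants to run the one-liner above as it stands, here is a minimal self-contained sketch. It rebuilds the dummy data from the original question by hand under the name d (the name used in this reply; the question calls it df). The intermediate names group_mean and is_missing, and the choice of stringsAsFactors = FALSE, are assumptions made for illustration.

```r
# Dummy data from the original question, rebuilt by hand.
d <- data.frame(
  group  = rep(c("BSPy01-10", "BSPy01-11", "BSPy01-12", "BSPy01-13"),
               times = c(3, 6, 3, 1)),
  traits = c(NA, 7.3, 7.3, 5.3, 5.4, 5.6, NA, NA, 4.8, 8.1, 6.0, 6.0, 6.1),
  stringsAsFactors = FALSE
)

# ave() returns a vector as long as d$traits that holds, for every row,
# the mean of that row's group (ignoring NAs within the group).
group_mean <- ave(d$traits, d$group, FUN = function(x) mean(x, na.rm = TRUE))

# Overwrite only the missing entries with their group mean.
is_missing <- is.na(d$traits)
d$traits[is_missing] <- group_mean[is_missing]

print(d$traits)
```

Splitting the one-liner into two steps is only for readability; the result should be the same as with the single assignment.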
Dear R users,

I am asking for help on how to replace missing values (NAs) with the mean of
the group they belong to.

My dummy data frame is:

df
       group traits
1  BSPy01-10     NA
2  BSPy01-10    7.3
3  BSPy01-10    7.3
4  BSPy01-11    5.3
5  BSPy01-11    5.4
6  BSPy01-11    5.6
7  BSPy01-11     NA
8  BSPy01-11     NA
9  BSPy01-11    4.8
10 BSPy01-12    8.1
11 BSPy01-12    6.0
12 BSPy01-12    6.0
13 BSPy01-13    6.1

I want to replace each NA with the mean of the group it belongs to. For
example, the NA in the first record of traits should be replaced by the mean
of "BSPy01-10".

I have tried to solve this problem with the doBy package, but I failed. I
would welcome a correct solution, with or without doBy.

The commands I used and the output I got are as follows:

library(doBy)
df<-orderBy(~group,data=df)   # succeeded
f1<-function(x){m<-mean(x, na.ram=TRUE); x[is.na(x)]<-m; x} # succeeded
datatraits<-lapplyBy(traits~group,data=df, FUN=f1(traits)) # failed
errors: mean(x, na.ram = TRUE), can not find 'traits'.

Thanks in advance.

Sincerely,

Mao J-F
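
Two things in the commands above seem to work against the attempt. First, na.ram is a misspelling of na.rm; mean() accepts extra arguments through ..., so the misspelled argument is silently ignored and NAs are not dropped. Second, FUN=f1(traits) is a call to f1 rather than f1 itself, and it is evaluated in the workspace, where no object called traits exists. The sketch below sidesteps doBy and shows the same idea in base R with split(), lapply() and unsplit(); the name pieces is an assumption made for illustration.

```r
# Dummy data from the question, rebuilt by hand.
df <- data.frame(
  group  = rep(c("BSPy01-10", "BSPy01-11", "BSPy01-12", "BSPy01-13"),
               times = c(3, 6, 3, 1)),
  traits = c(NA, 7.3, 7.3, 5.3, 5.4, 5.6, NA, NA, 4.8, 8.1, 6.0, 6.0, 6.1),
  stringsAsFactors = FALSE
)

# With the misspelled argument, mean() does not drop NAs.
typo_result <- mean(c(NA, 1), na.ram = TRUE)   # NA, not 1

# Same helper as in the question, with the argument spelled na.rm.
f1 <- function(x) {
  m <- mean(x, na.rm = TRUE)
  x[is.na(x)] <- m
  x
}

# Pass the function itself (f1), not the result of calling it (f1(traits)).
pieces <- lapply(split(df$traits, df$group), f1)

# Put the per-group vectors back in the original row order.
df$traits <- unsplit(pieces, df$group)

print(df)
```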

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

This should replace any NA with the mean of its group and leave the non-NA
values as they are:

as.numeric(apply(df, 1, function(x) ifelse(is.na(x[2]),
                                           tapply(df$traits, df$group, mean, na.rm = TRUE)[x[1]],
                                           x[2])))

  [1] 7.300 7.300 7.300 5.300 5.400 5.600 5.275 5.275 4.800 8.100 6.000 6.000 6.100

Whether that is the "right solution" depends on your artistic standards.

If you accept that solution, you would execute:

df$traits <-  <the above expression>

Another approach replaces only the NAs, rather than the whole column:

df[is.na(df$traits), "traits"] <-
    tapply(df$traits, df$group, mean, na.rm = TRUE)[df[is.na(df$traits), "group"]]
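
The lookup idea behind this second approach can be spelled out step by step. The sketch below is only an illustration: it rebuilds the dummy data by hand, and the names group_means and is_missing are assumptions. It indexes the table of means by group name via as.character(), which should behave the same whether group is stored as a factor or as a character column.

```r
# Dummy data from the question, rebuilt by hand.
df <- data.frame(
  group  = rep(c("BSPy01-10", "BSPy01-11", "BSPy01-12", "BSPy01-13"),
               times = c(3, 6, 3, 1)),
  traits = c(NA, 7.3, 7.3, 5.3, 5.4, 5.6, NA, NA, 4.8, 8.1, 6.0, 6.0, 6.1),
  stringsAsFactors = FALSE
)

# One mean per group, named by group label.
group_means <- tapply(df$traits, df$group, mean, na.rm = TRUE)
print(group_means)

# Look up the mean for each row that has a missing value, by group name.
is_missing <- is.na(df$traits)
df$traits[is_missing] <- group_means[as.character(df$group[is_missing])]

print(df$traits)
```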

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

Here's yet another way, using the plyr package, http://had.co.nz/

library(plyr)
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
ddply(df, ~ group, transform, traits = impute.mean(traits))

Or, if you want to make it a little more generic:

impute <- function(x, fun) {
  missing <- is.na(x)
  replace(x, missing, fun(x[!missing]))
}
ddply(df, ~ group, transform, traits = impute(traits, mean))
ddply(df, ~ group, transform, traits = impute(traits, median))
ddply(df, ~ group, transform, traits = impute(traits, min))
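
If plyr is not installed, the same generic helper can probably be combined with ave() from base R. This is a sketch under that assumption, not part of the plyr approach; the dummy data is rebuilt by hand and the result names by_mean, by_median and by_min are made up for illustration.

```r
# Dummy data from the question, rebuilt by hand.
df <- data.frame(
  group  = rep(c("BSPy01-10", "BSPy01-11", "BSPy01-12", "BSPy01-13"),
               times = c(3, 6, 3, 1)),
  traits = c(NA, 7.3, 7.3, 5.3, 5.4, 5.6, NA, NA, 4.8, 8.1, 6.0, 6.0, 6.1),
  stringsAsFactors = FALSE
)

# Generic helper, as defined above: fill NAs with fun() of the observed values.
impute <- function(x, fun) {
  missing <- is.na(x)
  replace(x, missing, fun(x[!missing]))
}

# ave() applies the helper within each group and keeps the original row order.
by_mean   <- ave(df$traits, df$group, FUN = function(x) impute(x, mean))
by_median <- ave(df$traits, df$group, FUN = function(x) impute(x, median))
by_min    <- ave(df$traits, df$group, FUN = function(x) impute(x, min))

print(cbind(df, by_mean, by_median, by_min))
```

Note that a group with no observed values at all would pass an empty vector to fun; mean() returns NaN in that case, so such groups need separate handling.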

Hadley
http://had.co.nz/