how to substitute missing values (NAs) by the group means

5 messages · Mao Jianfeng, Henrique Dallazuanna, David Winsemius +2 more

#
Try this:

d$traits[is.na(d$traits)] <- ave(d$traits, d$group,
                                 FUN = function(x) mean(x, na.rm = TRUE))[is.na(d$traits)]
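
To make the one-liner above easier to follow, here is a minimal sketch that splits it into steps. The data frame is hypothetical (it is not the original poster's data), and the names `group_means` and `missing` are introduced only for illustration.

```r
# Hypothetical example data, not the data from the original question
d <- data.frame(
  group  = c("a", "a", "a", "b", "b", "b"),
  traits = c(1, NA, 3, 10, 20, NA)
)

# ave() returns a vector as long as d$traits in which every element
# holds the mean of its own group, computed with the NAs dropped
group_means <- ave(d$traits, d$group,
                   FUN = function(x) mean(x, na.rm = TRUE))

# Overwrite only the missing entries with their group mean
missing <- is.na(d$traits)
d$traits[missing] <- group_means[missing]

d$traits
```

With this data the NA in group "a" becomes 2 and the NA in group "b" becomes 15; the observed values are left alone.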
On 6/8/09, Mao Jianfeng <jianfeng.mao at gmail.com> wrote:

  
    
#
On Jun 8, 2009, at 9:56 PM, Mao Jianfeng wrote:

            
This should replace any NA with the mean of its group and keep the non-NA
values as they are:

as.numeric(apply(df, 1, function(x)
    ifelse(is.na(x[2]),
           tapply(df$traits, df$group, mean, na.rm = TRUE)[x[1]],
           x[2])))

  [1] 7.300 7.300 7.300 5.300 5.400 5.600 5.275 5.275 4.800 8.100 6.000 6.000 6.100

Whether that is the "right solution" depends on your artistic standards.

If you accept that solution, you would execute:

df$traits <-  <the above expression>


Another approach replaces only the NAs, rather than the whole column:

df[is.na(df$traits), "traits"] <-
    tapply(df$traits, df$group, mean, na.rm = TRUE)[df[is.na(df$traits), "group"]]
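
The lookup in that last expression works because tapply() returns a vector named by group, which can then be indexed by each missing row's group label. A minimal sketch, assuming a hypothetical data frame with a character `group` column; `means` and `missing` are illustrative names, and the `as.character()` call is an added precaution so the lookup goes by name even if `group` is a factor:

```r
# Hypothetical example data, not the data from the original question
df <- data.frame(
  group  = c("a", "a", "b", "b", "b"),
  traits = c(2, NA, 4, NA, 6),
  stringsAsFactors = FALSE
)

# One mean per group, named by group label: a = 2, b = 5
means <- tapply(df$traits, df$group, mean, na.rm = TRUE)

# Look up the mean for each missing row by its group label
missing <- is.na(df$traits)
df$traits[missing] <- means[as.character(df$group[missing])]

df$traits
```

Note that a group with no observed values at all has a mean of NaN, so its entries would be filled with NaN rather than a usable number.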
#
On Mon, Jun 8, 2009 at 8:56 PM, Mao Jianfeng <jianfeng.mao at gmail.com> wrote:
Here's yet another way, using the plyr package, http://had.co.nz/

library(plyr)
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
ddply(df, ~ group, transform, traits = impute.mean(traits))

Or, if you wanted to make it a little more generic:

impute <- function(x, fun) {
  missing <- is.na(x)
  replace(x, missing, fun(x[!missing]))
}
ddply(df, ~ group, transform, traits = impute(traits, mean))
ddply(df, ~ group, transform, traits = impute(traits, median))
ddply(df, ~ group, transform, traits = impute(traits, min))

Hadley
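
For readers who do not have plyr installed, the generic impute() helper from the message above can also be applied per group with base R's ave(). This is only a sketch under stated assumptions: the data frame and the column names `traits_mean` and `traits_min` are hypothetical, and ave() is swapped in for ddply() here rather than being part of the plyr suggestion.

```r
# Same helper as in the message above: fill the NAs in x with fun()
# applied to the observed values of x
impute <- function(x, fun) {
  missing <- is.na(x)
  replace(x, missing, fun(x[!missing]))
}

# Hypothetical example data, not the data from the original question
df <- data.frame(
  group  = c("a", "a", "a", "b", "b", "b"),
  traits = c(1, NA, 5, 2, 6, NA)
)

# ave() calls the function once per group and puts the results back
# in the original row order
df$traits_mean <- ave(df$traits, df$group, FUN = function(x) impute(x, mean))
df$traits_min  <- ave(df$traits, df$group, FUN = function(x) impute(x, min))

df
```

Unlike ddply(), which returns the rows ordered by group, ave() leaves the row order of the data frame unchanged.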