
replacing missing values with row average

3 messages · Daniel M., Joshua Wiley, Bert Gunter

#
Hi Daniel,

If your data is stored in a matrix, the following should work (and be
fairly efficient):

#############
dat <- matrix(rnorm(100), nrow = 10)
dat[sample(1:10, 3), sample(1:10, 3)] <- NA
## create an index of missing values
index <- which(is.na(dat), arr.ind = TRUE)
## calculate the row means and "duplicate" them to assign to appropriate cells
dat[index] <- rowMeans(dat, na.rm = TRUE)[index[, "row"]]

## for documentation see
?which # particularly the arr.ind argument
?"[" # for extraction or selecting a subset to overwrite
#############
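To see what the index trick does, here is a tiny worked example on a 2 x 3 matrix with made-up values. This is a sketch, and `m`, `idx` and `rm_vals` are illustrative names that are not from the original post. `which(..., arr.ind = TRUE)` returns one (row, col) pair per missing cell. Indexing the vector of row means by its "row" column repeats each row's mean once for every missing cell in that row.

```r
## a small matrix with one missing value in each row
m <- matrix(c(1, NA, 3,
              4, 5, NA), nrow = 2, byrow = TRUE)

## one row per missing cell, with columns "row" and "col"
idx <- which(is.na(m), arr.ind = TRUE)

## row means ignoring NAs: c(2, 4.5)
rm_vals <- rowMeans(m, na.rm = TRUE)

## rm_vals[idx[, "row"]] picks the matching row mean for each missing cell
m[idx] <- rm_vals[idx[, "row"]]

## m is now rbind(c(1, 2, 3), c(4, 5, 4.5))
m
```

One caveat: if a row is entirely NA, rowMeans(..., na.rm = TRUE) returns NaN for that row, so those cells stay missing (as NaN).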

The only reason this does not work as is with data frames is the way
they are indexed/subset: dat[index] does not work.  The required
modification is probably fairly minimal, but if you are happy to use a
matrix, then it's a moot issue.
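For a data frame, one possible modification is to compute the row means once and then fill each column in turn with [[, which avoids matrix-style indexing of the data frame altogether. This is a sketch that assumes every column is numeric, and `df`, `was_na` and `row_avg` are illustrative names. The other obvious route is to convert with as.matrix(), apply the matrix code, and convert back with as.data.frame().

```r
set.seed(42)
df <- as.data.frame(matrix(rnorm(100), nrow = 10))
df[sample(1:10, 3), sample(1:10, 3)] <- NA

## logical matrix marking the cells that are missing before filling
was_na <- is.na(df)

## row means of the observed values (rowMeans accepts an all-numeric data frame)
row_avg <- rowMeans(df, na.rm = TRUE)

## fill column by column: each NA gets the mean of its own row
for (j in seq_along(df)) {
  miss <- is.na(df[[j]])
  df[[j]][miss] <- row_avg[miss]
}
```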

HTH,

Josh
On Sun, Feb 27, 2011 at 3:25 PM, Daniel M. <danielmessay at yahoo.com> wrote:

#
Warning: This is not a helpful answer. Actually, it's a question: why
do you want to do this? Replacing missing values with row or column
averages and then analyzing the data as if the missing values had never
been missing is a dangerous thing to do: it can produce biased estimates
and understate the true error, likely resulting in biased inference. Of
course, this depends on the specifics (how many values are missing and
where).
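A quick simulation makes the "understate the true error" point concrete. This is a simplified sketch that assumes one variable, values missing completely at random, and gaps filled with the observed mean (the column-average case). `x`, `x_miss` and `x_imp` are illustrative names. Every imputed value sits exactly at the mean, so it adds nothing to the sum of squared deviations while still increasing n. The standard deviation therefore shrinks, and a naive standard error that counts imputed values as real observations shrinks further.

```r
set.seed(1)
n <- 1000
x <- rnorm(n)

## make 300 of the 1000 values missing completely at random
x_miss <- x
x_miss[sample(n, 300)] <- NA

## fill every gap with the mean of the observed values
x_imp <- x_miss
x_imp[is.na(x_imp)] <- mean(x_miss, na.rm = TRUE)

## spread: observed values only vs. after imputation
sd_obs <- sd(x_miss, na.rm = TRUE)
sd_imp <- sd(x_imp)

## standard error of the mean: honest n (700) vs. pretending n = 1000
se_obs <- sd_obs / sqrt(sum(!is.na(x_miss)))
se_imp <- sd_imp / sqrt(length(x_imp))

c(sd_obs = sd_obs, sd_imp = sd_imp, se_obs = se_obs, se_imp = se_imp)
```

Because the imputed values contribute zero squared deviation, sd_imp / sd_obs equals sqrt(699/999) (about 0.84) for any seed, and the naive standard error comes out at roughly 0.70 times the honest one.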

R has a lot of built-in capabilities for handling missing values. I
agree: it's not easy stuff. Nor do you necessarily need to get that
complicated: maybe your scheme is perfectly adequate for your
situation. I just wanted to caution you to think about this carefully
if you aren't aware of the possible problems and haven't already done so.
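For reference, here are a few of the base R tools for missing values. This is a minimal, non-exhaustive sketch with made-up data, and `d`, `cc`, `fit` and `fit_ex` are illustrative names. It shows the na.rm argument of summary functions, complete.cases() for flagging fully observed rows, and the na.action argument of model-fitting functions such as lm(). None of these imputes anything. They only make explicit which observations are used.

```r
d <- data.frame(x = c(1, 2, NA, 4, 5),
                y = c(2.1, 3.9, 6.2, NA, 10.1))

## drop NAs inside a summary
mean(d$x, na.rm = TRUE)      # 3

## which rows are fully observed?
cc <- complete.cases(d)      # TRUE TRUE FALSE FALSE TRUE

## let the model drop incomplete rows explicitly
fit <- lm(y ~ x, data = d, na.action = na.omit)
nobs(fit)                    # 3

## na.exclude pads residuals with NA so they line up with the original rows
fit_ex <- lm(y ~ x, data = d, na.action = na.exclude)
length(residuals(fit_ex))    # 5
```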

-- Bert