
Replacing NAs in one variable with values of another variable

4 messages · Ista Zahn, Nordlund, Dan (DSHS/RDA), StellathePug

#
Hello everyone,
I am trying to figure out a way of replacing missing observations in one of
the variables of a data frame by values of another variable. For example,
assume my data is X

X <-as.data.frame(matrix(c(9, 6, 1, 3, 9, "NA", "NA","NA","NA","NA",
                    6, 4, 3,"NA", "NA", "NA", 5, 4, 1, 3), ncol=2))
names(X)<-c("X1","X2")

I want to change X1 so that its missing values are replaced by the values
in X2 (regardless of whether those are themselves missing). So my X1 should become
X$X1 <- c(9, 6, 1, 3, 9, "NA", 5, 4, 1, 3).

I have searched online for a while and looked at the manuals and the best
(unsuccessful) attempt I have come up with is

X$X1[X$X1=="NA"] <- X$X2

and that produces the following X1 

X$X1<-c(9, 6, 1, 3, 9, 6, "NA", 3, "NA", "NA")

and generates the following warning:

Warning messages:
1: In `[<-.factor`(`*tmp*`, X$X1 == "NA", value = c(5L, 3L, 2L, 6L,  :
  invalid factor level, NAs generated
2: In x[...] <- m :
  number of items to replace is not a multiple of replacement length

I think that my error is that it is ignoring the non-missing values of X1
and the dimensions don't match. But what I want my code to do is to look at
the rows of X1, see if it's a missing value; if it is, replace it with the
value that is in the row of X2; if it's not missing, leave it as is.

What am I doing wrong?

Thank you very much!
Rita


--
View this message in context: http://r.789695.n4.nabble.com/Replacing-NAs-in-one-variable-with-values-of-another-variable-tp3763269p3763269.html
Sent from the R help mailing list archive at Nabble.com.
#
Hi,
On Tue, Aug 23, 2011 at 12:29 PM, StellathePug <ritacarreira at hotmail.com> wrote:
Note that you don't have any missing values in X, because the string "NA" is
not the same as the missing value NA.

Here are two solutions: one that corrects your first attempt, and another
that uses ifelse:

X$X1[X$X1=="NA"] <- X$X2[X$X1=="NA"]

X$X1 <- ifelse(X$X1 == "NA", X$X2, X$X1)
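
A minimal sketch of the two approaches above, for anyone who wants to run them end to end. It assumes the columns are character vectors, so stringsAsFactors = FALSE is passed explicitly. That argument is an addition here and is not in the original post. On R versions before 4.0 the columns become factors by default (the warning in the question mentions `[<-.factor`), and the results may then differ because X1 and X2 do not share the same levels. The names mask, fixed1 and fixed2 are illustrative only.

```r
# Rebuild the example from the question. stringsAsFactors = FALSE is an
# assumption added here so both columns are character on any R version.
X <- as.data.frame(
  matrix(c(9, 6, 1, 3, 9, "NA", "NA", "NA", "NA", "NA",
           6, 4, 3, "NA", "NA", "NA", 5, 4, 1, 3), ncol = 2),
  stringsAsFactors = FALSE
)
names(X) <- c("X1", "X2")

# The string "NA" is ordinary text, so R sees no missing values here.
is.na("NA")
anyNA(X$X1)

# Solution 1: use the same logical mask on both sides of the assignment.
mask <- X$X1 == "NA"
fixed1 <- X$X1
fixed1[mask] <- X$X2[mask]

# Solution 2: ifelse() chooses element by element between the two columns.
fixed2 <- ifelse(X$X1 == "NA", X$X2, X$X1)

# Both approaches should agree when the columns are character vectors.
identical(fixed1, fixed2)
```

Computing the mask once before assigning keeps the left and right sides aligned row by row.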


Best,
Ista

  
    
#
Rita,

In addition to Ista's advice, I have a question.  Did you really want your columns X1 and X2 to be factors?  Your use of "NA" to represent missing values has caused the columns to become factors.  If you actually wanted a numeric matrix or data frame, then remove the quotes from around the NA.  Then you need to use is.na() to test for missing values.

X <-as.data.frame(matrix(c(9, 6, 1, 3, 9, NA, NA, NA, NA, NA,
                   6, 4, 3, NA, NA, NA, 5, 4, 1, 3), ncol=2))
names(X)<-c("X1","X2")

X$X1 <- ifelse(is.na(X$X1), X$X2, X$X1)
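
As a rough illustration of why is.na() is needed once the values are real NAs, the sketch below rebuilds the numeric data and checks the result. The name eq_test is made up for this example.

```r
# Numeric version of the example: unquoted NA keeps the matrix numeric.
X <- as.data.frame(matrix(c(9, 6, 1, 3, 9, NA, NA, NA, NA, NA,
                            6, 4, 3, NA, NA, NA, 5, 4, 1, 3), ncol = 2))
names(X) <- c("X1", "X2")

# Comparing with == cannot find missing values: the comparison itself
# evaluates to NA wherever X1 is missing, never to TRUE.
eq_test <- X$X1 == NA
all(is.na(eq_test))

# is.na() returns a proper TRUE/FALSE vector, so ifelse() can use it.
X$X1 <- ifelse(is.na(X$X1), X$X2, X$X1)
X$X1
```

Row 6 stays missing because X2 is also missing there, which matches the result requested in the question.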



Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
#
Thank you Dan and Ista!

Both of you are correct: I should have used NA rather than "NA" in my
example. So the correct code should be:

X <-as.data.frame(matrix(c(9, 6, 1, 3, 9, NA, NA,NA,NA,NA,
                           6, 4, 3,NA, NA, NA, 5, 4, 1, 3), ncol=2))
names(X)<-c("X1","X2")      

X$X1[is.na(X$X1)] <- X$X2[is.na(X$X1)] 

The last line replaces the missing observations of X1 with the corresponding
values of X2. The ifelse() approach also works.
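
For completeness, here is a small sketch of why the mask has to appear on both sides of the assignment. It uses plain numeric vectors rather than the data frame, and the names miss, bad and good are illustrative only. Without the mask on the right, R is offered all ten values of X2 for only five slots. It uses the first five and issues the "not a multiple of replacement length" warning seen in the first message.

```r
X1 <- c(9, 6, 1, 3, 9, NA, NA, NA, NA, NA)
X2 <- c(6, 4, 3, NA, NA, NA, 5, 4, 1, 3)

# Compute the mask once so both sides of the assignment use the same rows.
miss <- is.na(X1)
sum(miss)     # number of slots to fill
length(X2)    # number of values offered if the right side is not masked

# Unmasked right-hand side: the first five values of X2 fill the slots,
# so rows are misaligned. The warning is suppressed here for the demo only.
bad <- X1
suppressWarnings(bad[miss] <- X2)

# Masked right-hand side: each missing row of X1 gets the same row of X2.
good <- X1
good[miss] <- X2[miss]

bad
good
```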

Thank you very much, again!
Rita
