Hello everyone,
I am trying to figure out a way of replacing missing observations in one of
the variables of a data frame by values of another variable. For example,
assume my data is X:
X <-as.data.frame(matrix(c(9, 6, 1, 3, 9, "NA", "NA","NA","NA","NA",
6, 4, 3,"NA", "NA", "NA", 5, 4, 1, 3), ncol=2))
names(X)<-c("X1","X2")
I want to change X1 so that, instead of the missing values, it uses the values
in X2 (regardless of whether these are missing). So my X1 should become
X$X1 <- c(9, 6, 1, 3, 9, "NA", 5, 4, 1, 3)
I have searched online for a while and looked at the manuals and the best
(unsuccessful) attempt I have come up with is
X$X1[X$X1=="NA"] <- X$X2
and that produces the following X1
X$X1<-c(9, 6, 1, 3, 9, 6, "NA", 3, "NA", "NA")
and generates the following warning:
Warning messages:
1: In `[<-.factor`(`*tmp*`, X$X1 == "NA", value = c(5L, 3L, 2L, 6L, :
invalid factor level, NAs generated
2: In x[...] <- m :
number of items to replace is not a multiple of replacement length
I think that my error is that it is ignoring the non-missing values of X1
and the dimensions don't match. But what I want my code to do is to look at
the rows of X1, see if it's a missing value; if it is, replace it with the
value that is in the row of X2; if it's not missing, leave it as is.
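The length mismatch behind the second warning can be checked directly. Below is a minimal sketch in R. It assumes the example data is rebuilt with `stringsAsFactors = FALSE`, so the columns are plain character vectors. That argument is not in the original post, where the columns were factors, as the `[<-.factor` warning shows.

```r
# Rebuild the question's data. stringsAsFactors = FALSE (an assumption, not in
# the post) keeps the columns as character vectors instead of factors.
X <- as.data.frame(matrix(c(9, 6, 1, 3, 9, "NA", "NA", "NA", "NA", "NA",
                            6, 4, 3, "NA", "NA", "NA", 5, 4, 1, 3), ncol = 2),
                   stringsAsFactors = FALSE)
names(X) <- c("X1", "X2")

idx <- X$X1 == "NA"   # TRUE in rows 6 to 10 of X1
sum(idx)              # 5 slots to fill on the left-hand side
length(X$X2)          # but 10 values supplied on the right-hand side

# Assigning the whole of X2 fills the 5 slots with rows 1 to 5 of X2 and
# warns that the number of items to replace is not a multiple of the
# replacement length.
y <- X$X1
y[idx] <- X$X2
y[idx]                # "6" "4" "3" "NA" "NA": rows 1 to 5 of X2, not 6 to 10

# Indexing the right-hand side with the same logical vector makes the
# lengths agree, so row i of X1 receives row i of X2.
length(X$X2[idx])     # 5
```

This explains where the 6 and 3 in the attempted result came from. With factor columns, the 4 from X2 is not a level of X1, so it also turned into NA.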
What am I doing wrong?
Thank you very much!
Rita
--
View this message in context: http://r.789695.n4.nabble.com/Replacing-NAs-in-one-variable-with-values-of-another-variable-tp3763269p3763269.html
Sent from the R help mailing list archive at Nabble.com.
Replacing NAs in one variable with values of another variable
4 messages · Ista Zahn, Nordlund, Dan (DSHS/RDA), StellathePug
Hi,
On Tue, Aug 23, 2011 at 12:29 PM, StellathePug <ritacarreira at hotmail.com> wrote:
I want to change X1 so that instead of the missing values it uses the values
in X2 (regardless of whether these are missing).
Note that you don't have any missing values in X, as "NA" != NA.
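The distinction between the string "NA" and R's missing value can be seen with a few base-R calls. This is a short illustrative check, not code from the thread:

```r
# The string "NA" is an ordinary character value; NA is R's missing-value marker.
is.na("NA")        # FALSE: a two-letter string is not missing
is.na(NA)          # TRUE
"NA" == "NA"       # TRUE: string comparison works on the text

# Comparing against a real NA gives NA, so == cannot locate missing values;
# is.na() is the test to use.
x <- c(9, NA)
x == NA            # NA NA
is.na(x)           # FALSE TRUE
```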
Here are two solutions, one that is a correction to your first attempt, and another using ifelse:
X$X1[X$X1=="NA"] <- X$X2[X$X1=="NA"]
X$X1 <- ifelse(X$X1 == "NA", X$X2, X$X1)
Best,
Ista
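A sketch comparing the two solutions. It assumes character columns, created here with `stringsAsFactors = FALSE`. That argument is an addition and was not in the thread. The copies `s1` and `s2` are illustrative names.

```r
X <- as.data.frame(matrix(c(9, 6, 1, 3, 9, "NA", "NA", "NA", "NA", "NA",
                            6, 4, 3, "NA", "NA", "NA", 5, 4, 1, 3), ncol = 2),
                   stringsAsFactors = FALSE)  # assumption: character, not factor
names(X) <- c("X1", "X2")

# Solution 1: subset both sides with the same logical index.
s1 <- X
s1$X1[s1$X1 == "NA"] <- s1$X2[s1$X1 == "NA"]

# Solution 2: ifelse() picks element-wise from X2 (where TRUE) or X1.
s2 <- X
s2$X1 <- ifelse(s2$X1 == "NA", s2$X2, s2$X1)

identical(s1$X1, s2$X1)   # TRUE
s1$X1                     # "9" "6" "1" "3" "9" "NA" "5" "4" "1" "3"
```

With factor columns, as in the original post, the first line warns about an invalid factor level whenever an X2 value such as "5" is not a level of X1. `ifelse()` returns the factors' internal integer codes rather than their labels. Converting the columns with `as.character()` first avoids both problems.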
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ista Zahn
Sent: Tuesday, August 23, 2011 11:06 AM
To: StellathePug
Cc: r-help at r-project.org
Subject: Re: [R] Replacing NAs in one variable with values of another variable
Rita,
In addition to Ista's advice, I have a question. Did you really want your columns X1 and X2 to be factors? Using the string "NA" to represent missing values made the whole matrix character, and as.data.frame() then turned the columns into factors. If you actually wanted a numeric matrix or data frame, remove the quotes from around the NA. Then you need to use is.na() to test for missing values.
X <-as.data.frame(matrix(c(9, 6, 1, 3, 9, NA, NA, NA, NA, NA,
6, 4, 3, NA, NA, NA, 5, 4, 1, 3), ncol=2))
names(X)<-c("X1","X2")
X$X1 <- ifelse(is.na(X$X1), X$X2, X$X1)
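The effect of the quotes comes from how `c()` coerces its arguments to a single type. Below is a small sketch of that coercion and an end-to-end check of the numeric version above. Only base R is used.

```r
# One quoted "NA" is enough to make the whole vector (and the matrix) character.
typeof(c(9, 6, "NA"))   # "character": 9 and 6 become "9" and "6"
typeof(c(9, 6, NA))     # "double": the logical NA is coerced to a numeric NA

# The numeric version, checked end to end.
X <- as.data.frame(matrix(c(9, 6, 1, 3, 9, NA, NA, NA, NA, NA,
                            6, 4, 3, NA, NA, NA, 5, 4, 1, 3), ncol = 2))
names(X) <- c("X1", "X2")
X$X1 <- ifelse(is.na(X$X1), X$X2, X$X1)
X$X1                    # 9 6 1 3 9 NA 5 4 1 3
```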
Hope this is helpful,
Dan
Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
Thank you Dan and Ista!
Both of you are correct, I should have used NA rather than "NA" in my
example. So the correct code should be:
X <-as.data.frame(matrix(c(9, 6, 1, 3, 9, NA, NA,NA,NA,NA,
6, 4, 3,NA, NA, NA, 5, 4, 1, 3), ncol=2))
names(X)<-c("X1","X2")
X$X1[is.na(X$X1)] <- X$X2[is.na(X$X1)]
Where the last line replaces the missing observations of X1 by those of X2.
The ifelse() version also works.
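If the same fill-in is needed for several columns, the indexing line can be wrapped in a small function. `fill_missing` is a hypothetical name used only for this sketch; it is not part of base R or of the thread.

```r
# Hypothetical helper (not from the thread): fill the NA slots of x from y,
# row by row, using the same indexing idea as the final solution above.
fill_missing <- function(x, y) {
  stopifnot(length(x) == length(y))  # row i of x must line up with row i of y
  miss <- is.na(x)                   # compute the index once, reuse for both sides
  x[miss] <- y[miss]
  x
}

X <- as.data.frame(matrix(c(9, 6, 1, 3, 9, NA, NA, NA, NA, NA,
                            6, 4, 3, NA, NA, NA, 5, 4, 1, 3), ncol = 2))
names(X) <- c("X1", "X2")
X$X1 <- fill_missing(X$X1, X$X2)
X$X1   # 9 6 1 3 9 NA 5 4 1 3
```

The length check is a design choice. Plain vector assignment would otherwise recycle or truncate silently, apart from a warning, which is what happened in the first attempt.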
Thank you very much, again!
Rita