using ifelse to remove NA's from specific columns of a data frame containing strings and numbers
8 messages · Stendera, Sonja, Dr., Bert Gunter, David Romano +3 more
Hi everyone, please take me off this list! The unsubscribe function does not work. Thanks!
Best wishes,
Sonja

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Romano
Sent: 15 November 2012 12:19
To: r-help at r-project.org
Subject: [R] using ifelse to remove NA's from specific columns of a data frame containing strings and numbers

Hi everyone,

I have a data frame one of whose columns is a character vector and the rest are numeric, and in debugging a script, I noticed that an ifelse call seems to be coercing the character column to a numeric column, and producing unintended values as a result.

Roughly, here's what I tried to do:

df: a data frame with, say, the first column as a character column and the second and third columns numeric.
also: NA's occur only in the numeric columns, and if they occur in one, they occur in the other as well.

I wanted to replace the NA's in column 2 with 0's and the ones in column 3 with 1's, so first I did this:
na.replacements <-ifelse(col(df)==2,0,1).
Then I used a second ifelse call to try to remove the NA's as I wanted, first by doing this:
clean.df <- ifelse(is.na(df), na.replacements, df),
which produced a list of lists vaguely resembling df, with the NA's mostly intact, and so then I tried this:
clean.df <- ifelse(is.na(df), na.replacements, unlist(df)),
which seems to work if all the columns are numeric, but otherwise changes strings to numbers.

I can't make sense of the help documentation enough to clear this up, but my guess is that the "yes" and "no" values passed to ifelse need to be vectors, in which case it seems I'll have to use another approach entirely. But even if that is not the case and lists are acceptable, I'm not sure how to convert a mixed-mode data frame into a vector-like list of elements (which I would hope would work).

I'd be grateful for any suggestions!

Thanks,
David Romano

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David:
You seem to be getting lost in basic R tasks. Have you read the Intro
to R tutorial? If not, do so, as this should tell you how to do what
you need. If so, re-read the sections on indexing ("["), replacement,
and NA's. Also read about character vectors and factors.
-- Bert
Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Hi,

df1<-read.table(text="
col1 col2 col3
A 15.5 8.5
A 8.5 7.5
A NA NA
B 8.0 6.0
B NA NA
B 9.0 10.0
",sep="",header=TRUE,stringsAsFactors=FALSE)
str(df1)
#'data.frame': 6 obs. of 3 variables:
# $ col1: chr "A" "A" "A" "B" ...
# $ col2: num 15.5 8.5 NA 8 NA 9
# $ col3: num 8.5 7.5 NA 6 NA 10
df1$col2[is.na(df1$col2)]<-0
df1$col3[is.na(df1$col3)]<-1
df1
#  col1 col2 col3
#1    A 15.5  8.5
#2    A  8.5  7.5
#3    A  0.0  1.0
#4    B  8.0  6.0
#5    B  0.0  1.0
#6    B  9.0 10.0
# or, if you want to use ifelse() on the original df1
ifelse(is.na(df1$col2),0,df1$col2)
#[1] 15.5  8.5  0.0  8.0  0.0  9.0
ifelse(is.na(df1$col3),1,df1$col3)
#[1]  8.5  7.5  1.0  6.0  1.0 10.0
A.K.
Replace your NA's column by column, not all at once.
In your first example, of the form
ifelse(condition, numbers, data.frame)
the second and third arguments are replicated to the length
of the first. A data.frame's length is the number of columns
it has, so ifelse repeats its columns, which is not what you want.
Also, the 2nd and 3rd arguments to ifelse should be of the same
type, since the output will be a vector that accepts some values
from each. If they don't have the same type, the output will be
of some type that can accept values from both types. That type
is often character or list, which is not what you want.
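
The two points above can be checked on a small example. This is only a sketch on a made-up data frame (the names d, name, x and res are illustrative, not the poster's data):

```r
# Toy data frame (hypothetical, not the poster's data).
d <- data.frame(name = c("a", "b", "c"),
                x = c(1, NA, 3),
                stringsAsFactors = FALSE)

# A data frame is a list of columns, so its length is the column count,
# while is.na() on it gives a logical matrix with one cell per value.
length(d)        # 2
dim(is.na(d))    # 3 2

# A numeric 'yes' combined with a character 'no' gives a character result.
res <- ifelse(c(TRUE, FALSE, TRUE), 0, c("p", "q", "r"))
class(res)       # "character"
res              # "0" "q" "0"
```

Because the test has six elements but the data frame has length two, ifelse() has to recycle whole columns to fill the result, which is where the list output comes from.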
Your second example code used unlist(data.frame). data.frames
contain columns of various classes, and unlist(data.frame) creates
a vector with one class; the class is chosen to retain the information,
if not the format, of the columns in the data.frame. It is generally not
a useful thing to do unless all columns have the same class.
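
A short sketch of what unlist() does to a mixed data frame follows. The data are made up, and the factor case is only one plausible explanation for the "strings changed to numbers" symptom, assuming the original strings were stored as factors (the default for data.frame() and read.table() before R 4.0.0):

```r
# Character column plus numeric column: everything becomes character.
d_chr <- data.frame(name = c("a", "b"), x = c(1.5, NA),
                    stringsAsFactors = FALSE)
u_chr <- unlist(d_chr)
class(u_chr)     # "character"

# Factor column plus numeric column: the factor contributes its
# integer codes, so the strings turn into numbers.
d_fac <- data.frame(name = factor(c("a", "b")), x = c(1.5, NA))
u_fac <- unlist(d_fac)
class(u_fac)     # "numeric"
unname(u_fac)    # 1.0 2.0 1.5 NA
```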
You showed some code but not data, so I'll make up something like
you described:
df <- data.frame(stringsAsFactors=FALSE,
    Number1 = c(1, 2, 3, NA, 5, 6),
    Number2 = c(11, 12, 13, 14, 14, NA),
    String = c("one","two",NA,"four","five","six"),
    Factor = factor(c("Group A", NA, "Group A", "Group B", "Group B", "Group B")))
Look at its structure with
> str(df)
'data.frame': 6 obs. of 4 variables:
$ Number1: num 1 2 3 NA 5 6
$ Number2: num 11 12 13 14 14 NA
$ String : chr "one" "two" NA "four" ...
$ Factor : Factor w/ 2 levels "Group A","Group B": 1 NA 1 2 2 2
To do the sort of conversion you want, try something like
f <- function(d) {
    for(i in seq_along(d)) {
        di <- d[[i]]
        di[is.na(di)] <- if (is.numeric(di)) { # could use switch instead of if-then-else
            if (i==2) { 0 } else { 1 }
        } else if (is.factor(di)) {
            levels(di)[1] # I don't know what you want here
        } else if (is.character(di)) {
            "Unknown"
        }
        d[[i]] <- di
    }
    d
}
That would give you
> str(f(df))
'data.frame': 6 obs. of 4 variables:
$ Number1: num 1 2 3 1 5 6
$ Number2: num 11 12 13 14 14 0
$ String : chr "one" "two" "Unknown" "four" ...
$ Factor : Factor w/ 2 levels "Group A","Group B": 1 1 1 2 2 2
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
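
The column-by-column idea in the reply above can also be written with a lookup of replacement values keyed by column name, so the choice does not depend on column positions. This is a minimal sketch, not the method from the thread; the helper name replace_na_by_column and its replacements argument are hypothetical:

```r
# Hypothetical helper: 'replacements' is a named list mapping column names
# to the value that should stand in for NA in that column.
replace_na_by_column <- function(d, replacements) {
  for (nm in intersect(names(replacements), names(d))) {
    col <- d[[nm]]
    col[is.na(col)] <- replacements[[nm]]
    d[[nm]] <- col
  }
  d
}

df <- data.frame(stringsAsFactors = FALSE,
                 Number1 = c(1, 2, 3, NA, 5, 6),
                 Number2 = c(11, 12, 13, 14, 14, NA),
                 String  = c("one", "two", NA, "four", "five", "six"))

clean <- replace_na_by_column(df, list(Number1 = 1, Number2 = 0))
clean$Number1    # 1 2 3 1 5 6
clean$Number2    # 11 12 13 14 14 0
clean$String     # unchanged; the NA in position 3 is left alone
```

Columns that are not named in the list are never touched, so the character column keeps both its type and its NA.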
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Romano
Sent: Thursday, November 15, 2012 7:58 AM
To: Bert Gunter
Cc: r-help at r-project.org
Subject: Re: [R] using ifelse to remove NA's from specific columns of a data frame containing strings and numbers

Thanks for the suggestion, Bert; I just re-read the introduction with particular attention to the sections you mentioned, but I don't see how any of it bears on my question. Namely -- to rephrase: What constraints are there on the form of the "yes" and "no" values required by ifelse?

The introduction doesn't really speak to this, and the help documentation seems to suggest that as long as the shapes of the test, "yes" values, and "no" values agree, that would be sufficient -- I don't see anything that specifies that any of these should be of a particular data type. My example, however, seems to indicate that the "yes" and "no" values can't be a mixture of characters and numbers, and I'm trying to figure out what the underlying constraints are on ifelse.

Thanks again,
David
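
On the question of what ifelse() expects from "yes" and "no": the result takes its length and shape from the test, and "yes" and "no" are recycled to that length and then indexed element by element. The sketch below uses a made-up logical matrix (the names test and out are illustrative):

```r
# The result has the dimensions of 'test', not of 'yes' or 'no'.
test <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)
out <- ifelse(test, 0, c(10, 20, 30, 40))

dim(out)     # 2 2
out[, 1]     # 0 20
out[, 2]     # 30 0
```

The scalar 0 is recycled to four elements, and each FALSE cell picks the element of "no" at the same position. Anything that cannot be recycled and indexed this way, such as a data frame with columns of different types, will not give a cell-by-cell result.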
#Data
df<-data.frame(id=letters[1:10],var1=rnorm(10,10,5),var2=rnorm(10,5,2),var3=rnorm(10,1,1))
#Missing
df$var1[2]<-df$var2[c(2,6)]<-df$var3[c(2,5)]<-NA
na.replace<-seq(1:ncol(df))-1
df[,names(df)]<-sapply(1:dim(df)[2], function(ii)
{ifelse(is.na(df[,ii]),na.replace[ii],df[,ii])} )
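
One thing to watch for with an sapply() approach like the one above is that sapply() simplifies its result to a matrix, and a matrix holds a single type. The sketch below uses a small made-up data frame to show the effect, along with one possible way to restrict the replacement to numeric columns (the names d, m and num are illustrative):

```r
d <- data.frame(id = c("a", "b", "c"),
                v  = c(1.5, NA, 3),
                stringsAsFactors = FALSE)

# Applying ifelse() to every column and simplifying gives a character matrix.
m <- sapply(seq_along(d), function(i) ifelse(is.na(d[[i]]), 0, d[[i]]))
is.matrix(m)     # TRUE
typeof(m)        # "character": the numeric column was converted to text

# Restricting the replacement to numeric columns keeps every column's type.
num <- vapply(d, is.numeric, logical(1))
d[num] <- lapply(d[num], function(col) ifelse(is.na(col), 0, col))
str(d)
```

Using lapply() here returns a list of columns rather than a matrix, so assigning it back leaves the character column untouched.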
-- View this message in context: http://r.789695.n4.nabble.com/using-ifelse-to-remove-NA-s-from-specific-columns-of-a-data-frame-containing-strings-and-numbers-tp4649599p4649642.html Sent from the R help mailing list archive at Nabble.com.
Hi,
But this replaces the NAs in the second column with 1. Maybe na.replace should be applied to df1[,-1] instead:
df1<-read.table(text="
col1 col2 col3
A 15.5 8.5
A 8.5 7.5
A NA NA
B 8.0 6.0
B NA NA
B 9.0 10.0
",sep="",header=TRUE,stringsAsFactors=FALSE)
df2<-df1[,-1]
na.replace<-seq_len(ncol(df2))-1
df2[,names(df2)]<-sapply(seq_len(ncol(df2)),function(ii){ifelse(is.na(df2[,ii]),na.replace[ii],df2[,ii])})
df2$col1<-df1$col1
df2[order(names(df2))]
#  col1 col2 col3
#1    A 15.5  8.5
#2    A  8.5  7.5
#3    A  0.0  1.0
#4    B  8.0  6.0
#5    B  0.0  1.0
#6    B  9.0 10.0
A.K.
----- Original Message -----
From: soon yi <soon.yi at ymail.com>
To: r-help at r-project.org
Cc:
Sent: Thursday, November 15, 2012 2:29 PM
Subject: Re: [R] using ifelse to remove NA's from specific columns of a data frame containing strings and numbers
#Data
df<-data.frame(id=letters[1:10],var1=rnorm(10,10,5),var2=rnorm(10,5,2),var3=rnorm(10,1,1))
#Missing
df$var1[2]<-df$var2[c(2,6)]<-df$var3[c(2,5)]<-NA
na.replace<-seq(1:ncol(df))-1
df[,names(df)]<-sapply(1:dim(df)[2], function(ii)
{ifelse(is.na(df[,ii]),na.replace[ii],df[,ii])} )