using ifelse to remove NA's from specific columns of a data frame containing strings and numbers

Hi everyone, please take me off that list!!! The unsubscribe function does not work...
THANKS!!!
BW Sonja

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Romano
Sent: 15 November 2012 12:19
To: r-help at r-project.org
Subject: [R] using ifelse to remove NA's from specific columns of a data frame containing strings and numbers

Hi everyone,

I have a data frame one of whose columns is a character vector and the rest are numeric, and in debugging a script, I noticed that an ifelse call seems to be coercing the character column to a numeric column, and producing
unintended values as a result.   Roughly, here's what I tried to do:

df: a data frame with, say, the first column as a character column and the second and third columns numeric.

also: NA's occur only in the numeric columns, and if they occur in one, they occur in the other as well.

I wanted to replace the NA's in column 2 with 0's and the ones in column 3 with 1's, so first I did this:
na.replacements <-ifelse(col(df)==2,0,1).
Then I used a second ifelse call to try to remove the NA's as I wanted, first by doing this:
clean.df <- ifelse(is.na(df), na.replacements, df),
which produced a list of lists vaguely resembling df, with the NA's mostly intact, and so then I tried this:
clean.df <- ifelse(is.na(df), na.replacements, unlist(df)),
which seems to work if all the columns are numeric, but otherwise changes strings to numbers.

I can't make sense of the help documentation enough to clear this up, but my guess is that the "yes" and "no" values passed to ifelse need to be vectors, in which case it seems I'll have to use another approach entirely. But even if that is not the case and lists are acceptable, I'm not sure how to convert a mixed-mode data frame into a vector-like list of elements (which I would hope would work).

I'd be grateful for any suggestions!

Thanks,
David Romano

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David:

You seem to be getting lost in basic R tasks. Have you read the Intro
to R tutorial? If not, do so, as this should tell you how to do what
you need. If so, re-read the sections on indexing ("["), replacement,
and NA's. Also read about character vectors and factors.
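A minimal sketch of that indexing-and-replacement approach, using a small made-up data frame `df` laid out as in the question (character first column, numeric second and third columns, NAs in the same rows). Each numeric column is indexed with is.na() and only those cells are assigned, so the character column is never touched or coerced:

```r
# Made-up data shaped like the question: column 1 character, columns 2-3 numeric
df <- data.frame(id = c("a", "b", "c"),
                 x  = c(1.5, NA, 3),
                 y  = c(2.5, NA, 4),
                 stringsAsFactors = FALSE)

# Index the NA cells of one column with is.na() and replace only those;
# column 2 gets 0 and column 3 gets 1, column 1 keeps its character type.
df[[2]][is.na(df[[2]])] <- 0
df[[3]][is.na(df[[3]])] <- 1

str(df)
```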

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Hi,

df1<-read.table(text="
col1 col2 col3
A   15.5   8.5
A   8.5    7.5
A   NA     NA
B   8.0    6.0
B   NA     NA
B   9.0    10.0
",sep="",header=TRUE,stringsAsFactors=FALSE)
str(df1)
#'data.frame':   6 obs. of  3 variables:
# $ col1: chr  "A" "A" "A" "B" ...
# $ col2: num  15.5 8.5 NA 8 NA 9
# $ col3: num  8.5 7.5 NA 6 NA 10

df1$col2[is.na(df1$col2)]<-0
df1$col3[is.na(df1$col3)]<-1
df1
#  col1 col2 col3
#1    A 15.5  8.5
#2    A  8.5  7.5
#3    A  0.0  1.0
#4    B  8.0  6.0
#5    B  0.0  1.0
#6    B  9.0 10.0

#or, starting again from the original df1 (before the replacements above), use ifelse()

ifelse(is.na(df1$col2),0,df1$col2)
#[1] 15.5  8.5  0.0  8.0  0.0  9.0
ifelse(is.na(df1$col3),1,df1$col3)
#[1]  8.5  7.5  1.0  6.0  1.0 10.0
A.K.

----- Original Message -----
From: David Romano <dromano at stanford.edu>
To: r-help at r-project.org
Cc: 
Sent: Thursday, November 15, 2012 6:19 AM
Subject: [R] using ifelse to remove NA's from specific columns of a data frame containing strings and numbers

Replace your NA's column by column, not all at once. 

In your first example, of the form
    ifelse(condition, numbers, data.frame) 
the second and third arguments are replicated to the length
of the first.  A data.frame's length is the number of columns
it has, so ifelse repeats its columns, not what you want.
Also, the 2nd and 3rd arguments to ifelse should be of the same
type, since the output will be a vector that accepts some values
from each.  If they don't have the same type the output will be
of some type that can accept values from both types.  That type
is often character or list, not what you want.
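Both points can be checked directly. A small sketch with a made-up data frame `d`: mixing a number and a string in the `yes`/`no` arguments gives a character result, and passing a data frame as `no` makes ifelse() recycle whole columns, so the result is a list whose cells are entire columns (NA included):

```r
# yes = 1 (numeric), no = "b" (character): the result has to hold both,
# so the numeric 1 is coerced to the string "1".
x <- ifelse(c(TRUE, FALSE, TRUE), 1, "b")
str(x)

# Made-up data frame with an NA in a numeric column
d <- data.frame(a = c(NA, 2), b = c("x", "y"), stringsAsFactors = FALSE)

# The test is.na(d) is a 2 x 2 matrix (length 4), but the data frame used as
# `no` has length 2 (its columns), so ifelse() recycles the columns a, b, a, b.
# Each cell of the result is therefore a whole column, and the result is a list.
r <- ifelse(is.na(d), 0, d)
is.list(r)
r[[3]]  # cell [1, 2] received all of column a, NA included
```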

Your second example code used unlist(data.frame).  data.frames
contain columns of various classes and unlist(data.frame) creates
a vector with one class, the class is chosen to retain the information,
if not the format, of columns in the data.frame.  It is generally not
a useful thing, unless all columns have the same class.
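A small sketch of that behaviour with made-up data frames. When the string column is character, unlist() returns a character vector; when it is a factor (the data.frame() default for strings before R 4.0.0, and so likely in this 2012 thread), the factor contributes its integer codes, which is one way strings can turn into numbers:

```r
# Character column + numeric column: the common type is character,
# so the numbers become strings
d1 <- data.frame(name = c("A", "B"), x = c(1.5, NA), stringsAsFactors = FALSE)
u1 <- unlist(d1)
str(u1)

# Factor column + numeric column: unlist() uses the factor's integer codes,
# so the result is numeric and the labels "A" and "B" are lost
d2 <- data.frame(name = factor(c("A", "B")), x = c(1.5, NA))
u2 <- unlist(d2)
str(u2)
```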

You showed some code but not data, so I'll make up something like
you described:
   df <- data.frame(stringsAsFactors=FALSE,
         Number1 = c(1, 2, 3, NA, 5, 6),
         Number2 = c(11, 12, 13, 14, 14, NA),
         String = c("one","two",NA,"four","five","six"),
         Factor = factor(c("Group A", NA, "Group A", "Group B", "Group B", "Group B"))) 
Look at its structure with
   > str(df)
   'data.frame':   6 obs. of  4 variables:
    $ Number1: num  1 2 3 NA 5 6
    $ Number2: num  11 12 13 14 14 NA
    $ String : chr  "one" "two" NA "four" ...
    $ Factor : Factor w/ 2 levels "Group A","Group B": 1 NA 1 2 2 2
To do the sort of conversion you want, try something like:
    f <- function(d) {
        for(i in seq_along(d)) {
            di <- d[[i]]
            # could use switch instead of if-then-else
            di[is.na(di)] <- if (is.numeric(di)) {
                                 if (i == 2) { 0 } else { 1 }
                             } else if (is.factor(di)) {
                                 levels(di)[1] # I don't know what you want here
                             } else if (is.character(di)) {
                                 "Unknown"
                             }
            d[[i]] <- di
        }
        d
    }
That would give you
   > str(f(df))
  'data.frame':   6 obs. of  4 variables:
   $ Number1: num  1 2 3 1 5 6
   $ Number2: num  11 12 13 14 14 0
   $ String : chr  "one" "two" "Unknown" "four" ...
   $ Factor : Factor w/ 2 levels "Group A","Group B": 1 1 1 2 2 2
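The comment in f() mentions switch(); a sketch of that variant, under the hypothetical name f2() and using the same made-up df. It dispatches on the column's first class, treats integer and double columns alike (the switch() fall-through from `integer =` to `numeric =`), and leaves columns of any other class as they are:

```r
# Made-up data, as in the reply above
df <- data.frame(stringsAsFactors = FALSE,
                 Number1 = c(1, 2, 3, NA, 5, 6),
                 Number2 = c(11, 12, 13, 14, 14, NA),
                 String  = c("one", "two", NA, "four", "five", "six"),
                 Factor  = factor(c("Group A", NA, "Group A", "Group B", "Group B", "Group B")))

# Hypothetical switch()-based version of f()
f2 <- function(d) {
    for (i in seq_along(d)) {
        di <- d[[i]]
        # Pick the fill value by class; numeric columns keep the position rule
        # from f() (column 2 gets 0, other numeric columns get 1).
        fill <- switch(class(di)[1],
                       integer   = ,                 # falls through to numeric
                       numeric   = if (i == 2) 0 else 1,
                       factor    = levels(di)[1],
                       character = "Unknown",
                       NULL)                         # any other class: no fill
        if (!is.null(fill)) di[is.na(di)] <- fill
        d[[i]] <- di
    }
    d
}

str(f2(df))
```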

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Romano
Sent: Thursday, November 15, 2012 7:58 AM
To: Bert Gunter
Cc: r-help at r-project.org
Subject: Re: [R] using ifelse to remove NA's from specific columns of a data frame containing strings and numbers

Thanks for the suggestion, Bert; I just re-read the introduction with
particular attention to the sections you mentioned, but I don't see how any
of it bears on my question.  Namely -- to rephrase:  What constraints are
there on the form of the "yes" and "no" values required by ifelse?   The
introduction doesn't really speak to this, and the help documentation seems
to suggest that as long as the shapes of the test, "yes" values, and "no"
values agree, that would be sufficient -- I don't see anything that
specifies that any of these should be of a particular data type.   My
example, however, seems to indicate that the "yes" and "no" values can't be
a mixture of characters and numbers, and I'm trying to figure out what the
underlying constraints are on ifelse.

Thanks again,
David

#Data
df<-data.frame(id=letters[1:10],var1=rnorm(10,10,5),var2=rnorm(10,5,2),var3=rnorm(10,1,1))
#Missing
df$var1[2]<-df$var2[c(2,6)]<-df$var3[c(2,5)]<-NA

na.replace<-seq(1:ncol(df))-1

df[,names(df)]<-sapply(1:dim(df)[2], function(ii)
{ifelse(is.na(df[,ii]),na.replace[ii],df[,ii])} )

--
View this message in context: http://r.789695.n4.nabble.com/using-ifelse-to-remove-NA-s-from-specific-columns-of-a-data-frame-containing-strings-and-numbers-tp4649599p4649642.html
Sent from the R help mailing list archive at Nabble.com.
Hi,

But this replaces the NAs in the second column with 1. Maybe na.replace should be applied to df1[,-1] instead:

df1<-read.table(text="
col1 col2 col3
A   15.5   8.5
A   8.5    7.5
A   NA     NA
B   8.0    6.0
B   NA     NA
B   9.0    10.0
",sep="",header=TRUE,stringsAsFactors=FALSE)
df2<-df1[,-1]
na.replace<-seq(1:ncol(df2))-1
df2[,names(df2)]<-sapply(1:dim(df2)[2],function(ii){ifelse(is.na(df2[,ii]),na.replace[ii],df2[,ii])})
df2$col1<-df1$col1
df2[order(names(df2))]
#  col1 col2 col3
#1    A 15.5  8.5
#2    A  8.5  7.5
#3    A  0.0  1.0
#4    B  8.0  6.0
#5    B  0.0  1.0
#6    B  9.0 10.0
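sapply() simplifies its per-column results into a matrix, and a matrix holds a single type, which is why the character column is set aside above. A sketch that keeps each column's own type instead, using base R's Map() and replace() with a hypothetical named vector `fills` of replacement values (col2 gets 0, col3 gets 1):

```r
df1 <- read.table(text = "
col1 col2 col3
A 15.5 8.5
A 8.5 7.5
A NA NA
B 8.0 6.0
B NA NA
B 9.0 10.0
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical fill values, keyed by column name
fills <- c(col2 = 0, col3 = 1)

# Map() walks the selected columns and their fill values in parallel and
# returns a list; assigning that list back into df1[...] replaces whole
# columns, so col1 is never touched and keeps its character type.
df1[names(fills)] <- Map(function(x, v) replace(x, is.na(x), v),
                         df1[names(fills)], fills)
str(df1)
```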
A.K.

----- Original Message -----
From: soon yi <soon.yi at ymail.com>
To: r-help at r-project.org
Cc: 
Sent: Thursday, November 15, 2012 2:29 PM
Subject: Re: [R] using ifelse to remove NA's from specific columns of a data frame containing strings and numbers
