I have a small Excel data file with two columns of character variables, one column with a numeric variable and three rows. One of the character cells is blank and one of the numeric cells is blank.
I read the data file with the following code:
library(RODBC)
channel <- odbcConnectExcel('u:/test.xls')
sqlTables(channel)
s1 <- sqlFetch(channel, "Sheet1")
odbcClose(channel)
s1
The code prints the data file as:
Firstname Lastname Age
1 Bob Smith 20
2 John Clark NA
3 Andy <NA> 40
I can replace the NA with the number zero in the second row of the third (?Age?) column using the following line:
s1[is.na(s1[,3]), 3] <- 0
What code can I use to replace the <NA> in the third row of the second (?Lastname?) column with qqqqqqqq?
Thank you in advance for any advice.
Mark Miller
Gainesville, Florida
replacing <NA> in character column
5 messages · Mark Miller, David Winsemius
On Oct 28, 2009, at 5:19 PM, Mark Miller wrote:
I have a small Excel data file with two columns of character
variables, one column with a numeric variable and three rows. One
of the character cells is blank and one of the numeric cells is blank.
I read the data file with the following code:
library(RODBC)
channel <- odbcConnectExcel('u:/test.xls')
sqlTables(channel)
s1 <- sqlFetch(channel, "Sheet1")
odbcClose(channel)
s1
The code prints the data file as:
Firstname Lastname Age
1 Bob Smith 20
2 John Clark NA
3 Andy <NA> 40
I am guessing that your <NA> is not really an NA_character, but
rather a factor with a level of "<NA>".
See if str(s1) confirms my suspicions.
> df1 <- read.table(textConnection(" Firstname Lastname Age
+ 1 Bob Smith 20
+ 2 John Clark NA
+ 3 Andy <NA> 40"), header=T)
> levels(df1$Lastname)
[1] "<NA>" "Clark" "Smith"
> levels(df1$Lastname)[1] <- "qqqq"
> levels(df1$Lastname)
[1] "qqqq" "Clark" "Smith"
> df1
Firstname Lastname Age
1 Bob Smith 20
2 John Clark NA
3 Andy qqqq 40
I can replace the NA with the number zero in the second row of the third (?Age?) column using the following line: s1[is.na(s1[,3]), 3] <- 0 What code can I use to replace the <NA> in the third row of the second (?Lastname?) column with qqqqqqqq? Thank you in advance for any advice.
David Winsemius, MD Heritage Laboratories West Hartford, CT
I am guessing that your <NA> is not really an NA_character, but
rather a factor with a level of "<NA>".
See if str(s1) confirms my suspicions.
> df1 <- read.table(textConnection(" Firstname Lastname Age
+ 1 Bob Smith 20
+ 2 John Clark NA
+ 3 Andy <NA> 40"), header=T)
> levels(df1$Lastname)
[1] "<NA>" "Clark" "Smith"
> levels(df1$Lastname)[1] <- "qqqq"
> levels(df1$Lastname)
[1] "qqqq" "Clark" "Smith"
> df1
Firstname Lastname Age
1 Bob Smith 20
2 John Clark NA
3 Andy qqqq 40
--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Dr. Winsemius, I just tried your suggestion of typing str(s1). Below is the
code and the comments that R returned:
channel <- odbcConnectExcel('u:/test.xls')
sqlTables(channel)
TABLE_CAT TABLE_SCHEM TABLE_NAME TABLE_TYPE REMARKS 1 u:\\test <NA> Sheet1$ SYSTEM TABLE <NA> 2 u:\\test <NA> Sheet2$ SYSTEM TABLE <NA> 3 u:\\test <NA> Sheet3$ SYSTEM TABLE <NA>
s1 <- sqlFetch(channel, "Sheet1") odbcClose(channel) s1
Firstname Lastname Age 1 Bob Smith 20 2 John Clark NA 3 Andy <NA> 40
str(s1)
'data.frame': 3 obs. of 3 variables: $ Firstname: Factor w/ 3 levels "Andy","Bob","John": 2 3 1 $ Lastname : Factor w/ 2 levels "Clark","Smith": 2 1 NA $ Age : num 20 NA 40
levels(s1$Lastname)
[1] "Clark" "Smith"
When I used the code: levels(s1$Lastname)[1] <- "qqqq" R replaced Clark with qqqq and kept <NA> I guess because <NA> is not considered a level here:
levels(s1$Lastname)[1] <- "qqqq" s1
Firstname Lastname Age 1 Bob Smith 20 2 John qqqq NA 3 Andy <NA> 40
View this message in context: http://www.nabble.com/replacing-%3CNA%3E-in-character-column-tp26102360p26103240.html Sent from the R help mailing list archive at Nabble.com.
On Oct 28, 2009, at 6:22 PM, Mark W. Miller wrote:
I am guessing that your <NA> is not really an NA_character, but rather a factor with a level of "<NA>". See if str(s1) confirms my suspicions.
df1 <- read.table(textConnection(" Firstname Lastname Age
+ 1 Bob Smith 20 + 2 John Clark NA + 3 Andy <NA> 40"), header=T)
levels(df1$Lastname)
[1] "<NA>" "Clark" "Smith"
levels(df1$Lastname)[1] <- "qqqq" levels(df1$Lastname)
[1] "qqqq" "Clark" "Smith"
df1
Firstname Lastname Age 1 Bob Smith 20 2 John Clark NA 3 Andy qqqq 40 -- David Winsemius, MD Heritage Laboratories West Hartford, CT Dr. Winsemius, I just tried your suggestion of typing str(s1). Below is the code and the comments that R returned:
channel <- odbcConnectExcel('u:/test.xls')
sqlTables(channel)
TABLE_CAT TABLE_SCHEM TABLE_NAME TABLE_TYPE REMARKS 1 u:\\test <NA> Sheet1$ SYSTEM TABLE <NA> 2 u:\\test <NA> Sheet2$ SYSTEM TABLE <NA> 3 u:\\test <NA> Sheet3$ SYSTEM TABLE <NA>
s1 <- sqlFetch(channel, "Sheet1") odbcClose(channel) s1
Firstname Lastname Age 1 Bob Smith 20 2 John Clark NA 3 Andy <NA> 40
str(s1)
'data.frame': 3 obs. of 3 variables: $ Firstname: Factor w/ 3 levels "Andy","Bob","John": 2 3 1 $ Lastname : Factor w/ 2 levels "Clark","Smith": 2 1 NA
My guess was wrong.
$ Age : num 20 NA 40
levels(s1$Lastname)
[1] "Clark" "Smith"
When I used the code: levels(s1$Lastname)[1] <- "qqqq" R replaced Clark with qqqq and kept <NA> I guess because <NA> is not considered a level here:
levels(s1$Lastname)[1] <- "qqqq"
When I did that on the dataset I already had examined "df1". I knew that <NA> was a level. In your case is was not a named level. So you should not have expected the same outcome as with my efforts. I am unable to explain why your methods did not work with that structure: This seems to work: > df1$Lastname <- ifelse(is.na(df1$Lastname), "qqqq", df1$Lastname) > df1 Firstname Lastname Age 1 Bob 3 20 2 John 2 NA 3 Andy qqqq 40
s1
Firstname Lastname Age 1 Bob Smith 20 2 John qqqq NA 3 Andy <NA> 40
--
David Winsemius, MD Heritage Laboratories West Hartford, CT
On Oct 28, 2009, at 7:06 PM, David Winsemius wrote:
On Oct 28, 2009, at 6:22 PM, Mark W. Miller wrote:
I am guessing that your <NA> is not really an NA_character, but rather a factor with a level of "<NA>". See if str(s1) confirms my suspicions.
df1 <- read.table(textConnection(" Firstname Lastname Age
+ 1 Bob Smith 20 + 2 John Clark NA + 3 Andy <NA> 40"), header=T)
levels(df1$Lastname)
[1] "<NA>" "Clark" "Smith"
levels(df1$Lastname)[1] <- "qqqq" levels(df1$Lastname)
[1] "qqqq" "Clark" "Smith"
df1
Firstname Lastname Age 1 Bob Smith 20 2 John Clark NA 3 Andy qqqq 40 -- David Winsemius, MD Heritage Laboratories West Hartford, CT Dr. Winsemius, I just tried your suggestion of typing str(s1). Below is the code and the comments that R returned:
channel <- odbcConnectExcel('u:/test.xls')
sqlTables(channel)
TABLE_CAT TABLE_SCHEM TABLE_NAME TABLE_TYPE REMARKS 1 u:\\test <NA> Sheet1$ SYSTEM TABLE <NA> 2 u:\\test <NA> Sheet2$ SYSTEM TABLE <NA> 3 u:\\test <NA> Sheet3$ SYSTEM TABLE <NA>
s1 <- sqlFetch(channel, "Sheet1") odbcClose(channel) s1
Firstname Lastname Age 1 Bob Smith 20 2 John Clark NA 3 Andy <NA> 40
str(s1)
'data.frame': 3 obs. of 3 variables: $ Firstname: Factor w/ 3 levels "Andy","Bob","John": 2 3 1 $ Lastname : Factor w/ 2 levels "Clark","Smith": 2 1 NA
My guess was wrong.
$ Age : num 20 NA 40
levels(s1$Lastname)
[1] "Clark" "Smith"
When I used the code: levels(s1$Lastname)[1] <- "qqqq" R replaced Clark with qqqq and kept <NA> I guess because <NA> is not considered a level here:
levels(s1$Lastname)[1] <- "qqqq"
When I did that on the dataset I already had examined "df1". I knew that <NA> was a level. In your case is was not a named level. So you should not have expected the same outcome as with my efforts. I am unable to explain why your methods did not work with that structure: This seems to work:
df1$Lastname <- ifelse(is.na(df1$Lastname), "qqqq", df1$Lastname) df1
Firstname Lastname Age 1 Bob 3 20 2 John 2 NA 3 Andy qqqq 40
Oooops. Tripped up by factors and ifelse again. Would probably need to be: > df1$Lastname <- ifelse(is.na(df1$Lastname), "qqqq", as.character(df1$Lastname)) > df1 Firstname Lastname Age 1 Bob Smith 20 2 John Clark NA 3 Andy qqqq 40
s1
Firstname Lastname Age 1 Bob Smith 20 2 John qqqq NA 3 Andy <NA> 40
--
David Winsemius, MD Heritage Laboratories West Hartford, CT