Hello,
I use read.xls from the gdata package to read in xlsx files. Sometimes these
data.frames contain NA columns
and rows only. I know how to get rid of those ones but here is the R output
of a test data set read in with read.xls
t1
A B X D X.1 X.2
1 test 1 NA NA
2 <NA> asd NA asdasd NA
3 NA asdasd NA
4 NA NA NA
t1[1,2], t1[4,5] and t1[4,6] are NA in text form in the excel sheet. I don't
understand why in the first column it is <NA> while in the last two is not.
I basically want to get rid of column 5 and 6 and row 4 as they do not
contain any relevant information. If i do a is.na.data.frame(t1):
is.na.data.frame(t1)
A B X D X.1 X.2
[1,] FALSE FALSE TRUE FALSE TRUE FALSE
[2,] TRUE FALSE TRUE FALSE TRUE FALSE
[3,] FALSE FALSE TRUE FALSE TRUE FALSE
[4,] FALSE FALSE TRUE FALSE TRUE FALSE
does not give me the result I hoped to get.
It seems that <NA> and NA are treated as NA but in t1[4,6] it is treated as
FALSE because if I do
On Mon, Feb 13, 2012 at 07:48:11AM -0800, syrvn wrote:
Hello,
I use read.xls from the gdata package to read in xlsx files. Sometimes these
data.frames contain NA columns
and rows only. I know how to get rid of those ones but here is the R output
of a test data set read in with read.xls
t1
A B X D X.1 X.2
1 test 1 NA NA
2 <NA> asd NA asdasd NA
3 NA asdasd NA
4 NA NA NA
t1[1,2], t1[4,5] and t1[4,6] are NA in text form in the excel sheet. I don't
understand why in the first column it is <NA> while in the last two is not.
I basically want to get rid of column 5 and 6 and row 4 as they do not
contain any relevant information. If i do a is.na.data.frame(t1):
is.na.data.frame(t1)
A B X D X.1 X.2
[1,] FALSE FALSE TRUE FALSE TRUE FALSE
[2,] TRUE FALSE TRUE FALSE TRUE FALSE
[3,] FALSE FALSE TRUE FALSE TRUE FALSE
[4,] FALSE FALSE TRUE FALSE TRUE FALSE
does not give me the result I hoped to get.
It seems that <NA> and NA are treated as NA but in t1[4,6] it is treated as
FALSE because if I do
as.character(t1[4,6])
[1] "NA "
Hi.
I do not know, how "NA " appeared, however, it is possible
to change them to real NA as follows.
# some data frame
df <- structure(list(a = c(NA, 2L, 3L, 4L), b = c("a", NA, "c", "NA "),
c = structure(c(1L, 2L, NA, 4L), .Label = c("e", "f", "g", "h"),
class = "factor")), .Names = c("a", "b", "c"), row.names = c(NA, -4L),
class = "data.frame")
df
a b c
1 NA a e
2 2 <NA> f
3 3 c <NA>
4 4 NA h
df[4, 2] # this is not NA, but "NA "
[1] "NA "
# replace all "NA " by NA in column 2
df[which(df[,2] == "NA "), 2] <- NA
df
a b c
1 NA a e
2 2 <NA> f
3 3 c <NA>
4 4 <NA> h
Hope this helps.
Petr Savicky.
Hi,
thanks for you suggestion.
I finally solved it in a different way using apply and is.na for TRUE NA's
and if(as.character(x) == "NA") etc.
However, I just spotted that read.xls seems to have problems reading in
special characters such as < or >.
Is there any workaround for that?
--
View this message in context: http://r.789695.n4.nabble.com/Deleting-rows-and-columns-containing-NA-s-and-only-tp4384173p4384663.html
Sent from the R help mailing list archive at Nabble.com.
Hi,
thanks for you suggestion.
I finally solved it in a different way using apply and is.na for
TRUE NA's
and if(as.character(x) == "NA") etc.
However, I just spotted that read.xls seems to have problems reading
in
special characters such as < or >.
Or could it be that you have problems in not reading the help pages
carefully?
> read.table(text="<NA>")
V1
1 <NA>
> read.table(text="<NA>", na.strings=c("<NA>","NA"))
V1
1 NA