evaluating NAs in a dataframe
6 messages · Wade Wall, Peter Ehlers, Philipp Pagel +2 more
On 2010-12-08 12:10, Wade Wall wrote:
Hi all,
How can one evaluate NAs in a numeric dataframe column? For example, I have
a dataframe (demo) with a column of numbers and several NAs. If I write
demo.df >= 10, numerals will return TRUE or FALSE, but if the value is
"NA", "NA" is returned. But if I write demo.df == "NA", it returns as "NA"
also. I know that I can remove NAs, but would like to keep the dataframe as
is without creating a subset. I basically want to add a line that evaluates
the NA in the demo dataframe.
As an example, I want to assign rows to classes based on values in
demo$Area. Some of the values in demo$Area are "NA"
for (i in 1:nrow(demo)) {
  if (demo$Area[i] > 0 && demo$Area[i] < 10) {Class[i] <- "S01"} ## 1-10 cm2
  if (demo$Area[i] >= 10 && demo$Area[i] < 25) {Class[i] <- "S02"} ## 10-25 cm2
  if (demo$Area[i] >= 25 && demo$Area[i] < 50) {Class[i] <- "S03"} ## 25-50 cm2
  if (demo$Area[i] >= 50 && demo$Area[i] < 100) {Class[i] <- "S04"} ## 50-100 cm2
  if (demo$Area[i] >= 100 && demo$Area[i] < 200) {Class[i] <- "S05"} ## 100-200 cm2
  if (demo$Area[i] >= 200 && demo$Area[i] < 400) {Class[i] <- "S06"} ## 200-400 cm2
  if (demo$Area[i] >= 400 && demo$Area[i] < 800) {Class[i] <- "S07"} ## 400-800 cm2
  if (demo$Area[i] >= 800 && demo$Area[i] < 1600) {Class[i] <- "S08"} ## 800-1600 cm2
  if (demo$Area[i] >= 1600 && demo$Area[i] < 3200) {Class[i] <- "S09"} ## 1600-3200 cm2
  if (demo$Area[i] >= 3200) {Class[i] <- "S10"} ## >3200 cm2
}
What happens is that I get the message "Error in if (demo$Area[i] > 0 &&
demo$Area[i] < 10) { : missing value where TRUE/FALSE needed"
You don't say what you want to have occur when x is NA. (I don't know
what 'evaluate NA' means.)
But why not just use something like:
for(....){
  if(!is.na(x[i])){
    .... your stuff, preferably replacing '&&' with '&' ....
  } else {....}
}
Peter Ehlers
Thanks for any help Wade
Hi!
How can one evaluate NAs in a numeric dataframe column? For example, I have a dataframe (demo) with a column of numbers and several NAs. If I write demo.df >= 10, numerals will return TRUE or FALSE, but if the value is "NA", "NA" is returned. But if I write demo.df == "NA", it returns as "NA"
Sounds like you are looking for is.na:
is.na(c(1,NA,3))
[1] FALSE TRUE FALSE
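A minimal sketch of why the comparisons in the question come back as NA and why the if() stops with an error. The vector x and the names res and safe are made up for illustration and are not data from the thread.

```r
# Hypothetical example vector; not data from the thread.
x <- c(5, NA, 30)

x >= 10      # a comparison involving NA is itself NA
x == "NA"    # the string "NA" is not the missing value, so this is NA too
is.na(x)     # the dedicated test for missing values

# if() needs a single TRUE or FALSE; NA && NA is NA, so the call fails.
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(
  if (x[2] > 0 && x[2] < 10) "S01" else "other",
  error = function(e) conditionMessage(e)
)
res

# isTRUE() turns an NA condition into FALSE, so it is safe inside if().
safe <- isTRUE(x[2] > 0 && x[2] < 10)
safe
```

Testing is.na() first, or wrapping the condition in isTRUE(), are two common ways to keep the loop from stopping at a missing value; which one fits depends on what should happen to the NA rows.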
As an example, I want to assign rows to classes based on values in
demo$Area. Some of the values in demo$Area are "NA"
for (i in 1:nrow(demo)) {
  if (demo$Area[i] > 0 && demo$Area[i] < 10) {Class[i] <- "S01"} ## 1-10 cm2
  if (demo$Area[i] >= 10 && demo$Area[i] < 25) {Class[i] <- "S02"} ## 10-25 cm2
  [...]
  if (demo$Area[i] >= 3200) {Class[i] <- "S10"} ## >3200 cm2
}
What happens is that I get the message "Error in if (demo$Area[i] > 0 &&
demo$Area[i] < 10) { : missing value where TRUE/FALSE needed"
First of all, you don't need a loop here. Example:
# make up some data
foo <- data.frame(a=sample(1:20, 20, replace=TRUE))
# assign to classes
foo$class <- cut(foo$a, breaks=c(-1, 7, 13, 20), labels=c('small', 'medium', 'large'))
This also works in the presence of NAs - but of course the class will
be NA in those cases which, at least in my opinion, is the correct
value.
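Applied to the size classes from the question, the same idea might look like the sketch below. The areas are made up for illustration, and the break points are read off the comments in the original loop. One assumption to check: with right = FALSE an Area of exactly 0 lands in S01, whereas the original loop required Area > 0.

```r
# Made-up areas for illustration, including an NA and boundary values.
demo <- data.frame(Area = c(4, 10, 24.9, 150, 3200, NA))

# Class boundaries taken from the loop in the question.
breaks <- c(0, 10, 25, 50, 100, 200, 400, 800, 1600, 3200, Inf)
labels <- sprintf("S%02d", 1:10)   # "S01" ... "S10"

# right = FALSE makes each interval [lower, upper), matching the
# ">= lower && < upper" tests in the original loop.
demo$Class <- cut(demo$Area, breaks = breaks, labels = labels, right = FALSE)
demo
```

The row with a missing Area keeps NA as its class, and no row is dropped from the data frame.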
cu
Philipp
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/
Wade,
As you have discovered, you need to test for NA first, and to do that you need to use is.na(). Something like this should work:
for (i in 1:nrow(demo)) {
if (is.na(demo$Area[i])) Class[i] <- "Sna" else
if (demo$Area[i] < 10) Class[i] <- "S01" else
if (demo$Area[i] < 25) Class[i] <- "S02" else
if (demo$Area[i] < 50) Class[i] <- "S03" else
if (demo$Area[i] < 100) Class[i] <- "S04" else
if (demo$Area[i] < 200) Class[i] <- "S05" else
if (demo$Area[i] < 400) Class[i] <- "S06" else
if (demo$Area[i] < 800) Class[i] <- "S07" else
if (demo$Area[i] < 1600) Class[i] <- "S08" else
if (demo$Area[i] < 3200) Class[i] <- "S09" else
Class[i] <- "S10"
}
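Note that the loop above assumes Class already exists before the first assignment, for example Class <- character(nrow(demo)). A loop-free sketch that gives the same kind of result, including an explicit "Sna" label for missing values, could look like this. The data are hypothetical, and the lower break of -Inf is an assumption chosen to mirror the loop, where anything below 10 becomes "S01".

```r
# Hypothetical data; not from the thread.
demo <- data.frame(Area = c(4, NA, 60, 5000))

# -Inf as the lowest break mirrors "if (Area < 10) S01" in the loop above.
breaks <- c(-Inf, 10, 25, 50, 100, 200, 400, 800, 1600, 3200, Inf)

# cut() returns a factor; convert to character so a new label can be added.
Class <- as.character(cut(demo$Area, breaks = breaks,
                          labels = sprintf("S%02d", 1:10), right = FALSE))

# Give the missing values their own label instead of leaving them NA.
Class[is.na(demo$Area)] <- "Sna"
Class
```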
Hope this is helpful,
Dan
Daniel Nordlund
Bothell, WA USA
On Dec 8, 2010, at 3:10 PM, Wade Wall wrote:
Hi all, How can one evaluate NAs in a numeric dataframe column? For example, I have a dataframe (demo) with a column of numbers and several NAs. If I write demo.df >= 10, numerals will return TRUE or FALSE, but if the value is "NA", "NA" is returned. But if I write demo.df == "NA", it returns as "NA" also. I know that I can remove NAs, but would like to keep the dataframe as is without creating a subset. I basically want to add a line that evaluates the NA in the demo dataframe.
That looks really, really painful. Why not use the function
findInterval and then do a lookup in a character vector? Then you can
throw away that loopy construct completely.
> demo <- data.frame(Area = runif(10, 0, 100))
> demo$catarea <- findInterval(demo$Area, c(0,25,50,75,100))
> demo
        Area catarea
1  71.440401       3
2   8.438097       1
3  45.492178       2
4  50.669996       3
5  15.444114       1
6  33.954948       2
7  19.738747       1
8  56.485654       3
9  29.218921       2
10 74.204611       3
> demo$catname <- c("S01","S02", "S03","S04")[demo$catarea]
> demo
        Area catarea catname
1  71.440401       3     S03
2   8.438097       1     S01
3  45.492178       2     S02
4  50.669996       3     S03
5  15.444114       1     S01
6  33.954948       2     S02
7  19.738747       1     S01
8  56.485654       3     S03
9  29.218921       2     S02
10 74.204611       3     S03
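The example above has no missing values, so here is a small sketch of how the same lookup behaves when Area contains an NA. The values and the name idx are made up for illustration. The guard against index 0 is an added precaution, not part of the original suggestion: findInterval returns 0 for values below the first break, and a 0 index is silently dropped by "[", which would leave the result shorter than the data frame.

```r
# Hypothetical values; not from the thread.
demo <- data.frame(Area = c(8.4, NA, 45.5, 74.2))

# findInterval() propagates NA for a missing Area.
idx <- findInterval(demo$Area, c(0, 25, 50, 75, 100))

# Guard: turn any index 0 (value below the first break) into NA so the
# lookup result keeps one element per row.
idx[which(idx == 0L)] <- NA

# Indexing with NA yields NA, so the missing row stays missing.
demo$catname <- c("S01", "S02", "S03", "S04")[idx]
demo
```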
David.
David Winsemius, MD
West Hartford, CT