Problem with %IN% ==FALSE when data frames are equal
On Mon, 13 Jul 2009, Jim Burke wrote:
Hi everyone, I have an issue with %IN% using FALSE deep within my R code processing. It errors if the comparison data frames are equal. DESIRED WORKAROUND: I would like to examine my %IN% data frame fields before submitting them to the erroring block of code, then appropriately trap this error. Any suggestions?
Exactly. Just calculate the condition vector first, then test it with,
for example, if(any()), subsetting with "[" only if the condition is met.
You'll have to decide what to do if it isn't.
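A minimal sketch of that guard, using a plain data frame rather than the sp object from the question. The names drop_found, blocks and the returned NULL are made up for illustration; what to return when nothing is left is exactly the decision mentioned above.

```r
# Hypothetical helper: keep the rows of x whose ids are NOT in found_ids.
# Returns NULL when no rows are left, instead of attempting an empty subset.
drop_found <- function(x, ids, found_ids) {
  keep <- !(ids %in% found_ids)   # condition vector, computed first
  if (any(keep)) {
    x[keep, ]                     # subset only if at least one row remains
  } else {
    NULL                          # assumption: the caller tests for NULL
  }
}

# Made-up stand-in for hd_census_blk_sp
blocks <- data.frame(id = c("a", "b", "c"), value = 1:3,
                     stringsAsFactors = FALSE)

some_left <- drop_found(blocks, blocks$id, c("a"))            # two rows remain
none_left <- drop_found(blocks, blocks$id, c("a", "b", "c"))  # all ids found
```

In a loop like the one described in the question, is.null() on the result would signal that the last item has been reached and the remaining steps can be skipped.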
Always try small examples interactively before writing complicated and (in
this case) logically flawed scripts - you were assuming that at least one
logical vector value would be true.
Note that %in% and match() do the same thing, so you can also do match() -
check the ordering of arguments first! - and check for NAs in the returned
index vector - or use which() on the condition, checking on the length of
the output; this is what "[" does internally.
match(c("a", "c", "e"), letters)
match(c("A"), letters)
which(letters %in% c("a", "c", "e"))
which(letters %in% c("A"))
length(which(letters %in% c("A")))
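The same checks can be turned into explicit tests before any subsetting is attempted. This is only a sketch on the toy vectors used above; the variable names idx, all_present and hits are illustrative.

```r
# match(x, table): positions of x in table, NA where x is absent
idx <- match(c("a", "c", "e"), letters)
all_present <- !any(is.na(idx))     # TRUE only if every value was found

# "A" is not in letters (lower case only), so match() returns NA
missing_one <- is.na(match("A", letters))

# which() on the condition: a zero-length result means nothing matched
hits <- which(letters %in% c("A"))
if (length(hits) == 0L) {
  message("no matches; skip the subsetting step")
}
```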
Hope this helps,
Roger
OVERVIEW: seems [sp$block %IN% df$ID==FALSE] chokes when sp and df contain equal variables. My normal processing is to take the larger sp and get all larger sp blocks that are not in the smaller sp. As I process the list, it goes well until the last item when both sets of data frame variables are equal.
PROBLEM:
###############################################
## remove blocks that we just found in pct_blk_df
###############################################
tmp_hd_census_blk_sp <- hd_census_blk_sp [ (hd_census_blk_sp$BLKIDFP00 %in% pct_blk_df$ID==FALSE),]
Error in lst[[i]] : subscript out of bounds
traceback()
2: .bboxCalcR(x@polygons)
1: hd_census_blk_sp[(hd_census_blk_sp$BLKIDFP00 %in% pct_blk_df$ID ==
FALSE), ]
INPUTS:
hd_census_blk_sp$BLKIDFP00
 [1] 481130089001000 481130086032001 481130034002043 481130086032002
 [5] 481130034002046 481130086032004 481130034002047 481130086032000
 [9] 481130086032009 481130034002044 481130086032005 481130034002048
[13] 481130089001013 481130086032007 481130086032010 481130089001014
[17] 481130086032003
28480 Levels: 481130001001000 481130001001001 481130001001002 ... 481130199004013
pct_blk_df$ID
 [1] 481130089001000 481130086032001 481130034002043 481130086032002
 [5] 481130034002046 481130086032004 481130034002047 481130086032000
 [9] 481130086032009 481130034002044 481130086032005 481130034002048
[13] 481130089001013 481130086032007 481130086032010 481130089001014
[17] 481130086032003
28480 Levels: 481130001001000 481130001001001 481130001001002 ... 481130199004013
Thanks,
Jim Burke
PS My thanks to Gledson Luiz Picharski for his help initially with this "outside of %IN%" logic.
_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Economic Geography Section, Department of Economics,
Norwegian School of Economics and Business Administration,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no