Expected behaviour of is.unsorted?
On 12-05-24 7:39 AM, Matthew Dowle wrote:
Duncan Murdoch <murdoch.duncan <at> gmail.com> writes:
On 12-05-23 4:37 AM, Matthew Dowle wrote:
Hi, I've read ?is.unsorted and searched. Have found a few items but nothing close, yet. Is the following expected?
is.unsorted(data.frame(1:2))
[1] FALSE
is.unsorted(data.frame(2:1))
[1] FALSE
is.unsorted(data.frame(1:2,3:4))
[1] TRUE
is.unsorted(data.frame(2:1,4:3))
[1] TRUE
IIUC, is.unsorted is intended for atomic vectors only (description of x in ?is.unsorted). Indeed the C source (src/main/sort.c) contains an error message "only atomic vectors can be tested to be sorted". So that is the error message I expected to see in all cases above, since I know that data.frame is not an atomic vector. But there is also this in ?is.unsorted: "except for atomic vectors and objects with a class (where the >= or > method is used)" which I don't understand. Where >= or > is used by what, and where?
If you look at the source, you will see that the basic test for classed objects is all(x[-1L] >= x[-length(x)]) (in the function base:::.gtn). This comparison doesn't really make sense for dataframes, but it does seem to be backwards: it tests that x[2] >= x[1], x[3] >= x[2], etc., returning TRUE if all comparisons are TRUE. That sounds like it should be is.sorted(), not is.unsorted(). Or is it my brain that is backwards?
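For readers who want to try the comparison quoted above, here is a minimal sketch. The helper name adjacent_ge and the sample data are assumptions made for illustration; this is a standalone restatement of the quoted expression, not the actual base:::.gtn source.

```r
# Hypothetical helper: the comparison quoted in the message above,
# wrapped in a function so it can be tried on different inputs.
adjacent_ge <- function(x) all(x[-1L] >= x[-length(x)])

sorted_vec   <- c(1, 3, 5)
unsorted_vec <- c(3, 1, 5)

res_sorted   <- adjacent_ge(sorted_vec)    # TRUE: each element >= its predecessor
res_unsorted <- adjacent_ge(unsorted_vec)  # FALSE: 1 < 3 breaks the chain

# On a data frame, x[-1L] drops the first column and x[-length(x)] drops
# the last one, so neighbouring columns are compared element by element.
df <- data.frame(a = 1:2, b = 3:4)
res_df <- adjacent_ge(df)                  # TRUE: b >= a in every row
```

So the expression answers "is it sorted?", and on a two-column data frame it compares column b against column a within each row, which would be consistent with the TRUE results reported for the two-column cases.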
Thanks. Yes you're right. So is.unsorted() on a data.frame is trying to tell us if there exists any unsorted row, it seems.
I would guess that it was never intended to be used this way. It is intended to test x[1] < x[2] < x[3] ... for objects where this is a sensible calculation; it isn't really sensible for dataframes.
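For comparison, the intended use on an atomic vector can be sketched as follows. The sample vectors are made up, and the example assumes the strictly argument described in ?is.unsorted.

```r
x_ties <- c(1, 2, 2, 3)   # non-decreasing, with one tie
x_bad  <- c(3, 1, 2)      # out of order

u_default <- is.unsorted(x_ties)                   # FALSE: ties are allowed by default
u_strict  <- is.unsorted(x_ties, strictly = TRUE)  # TRUE: the tie 2, 2 is not strictly increasing
u_bad     <- is.unsorted(x_bad)                    # TRUE
```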
DF = data.frame(a=c(1,3,5),b=c(1,3,5))
DF
  a b
1 1 1   # this row is sorted
2 3 3   # this row is sorted
3 5 5   # this row is sorted
is.unsorted(DF) # going by row but should be !.gtn
[1] TRUE
with(DF,is.unsorted(order(a,b))) # most people's natural expectation I guess
[1] FALSE
DF[2,2]=2
DF
  a b
1 1 1   # this row is sorted
2 3 2   # this row isn't sorted
3 5 5   # this row is sorted
is.unsorted(DF) # going by row but should be !.gtn
[1] FALSE
with(DF,is.unsorted(order(a,b))) # most people's natural expectation I guess
[1] FALSE
Since it seems to have a bug anyway (and if so, can't be correct in anyone's use of it), could either is.unsorted on a data.frame return the error that's in the C code already: "only atomic vectors can be tested to be sorted", for safety and to lessen confusion, or be changed to return the natural expectation proposed above? The easiest quick fix would be to negate the result of the .gtn call of course, but then you could never go back.
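The "natural expectation" mentioned above, that a data frame counts as sorted when its rows are ordered by its columns, could be wrapped up roughly like this. The helper name df_rows_unsorted is hypothetical and is not part of base R; it only generalises the with(DF, is.unsorted(order(a, b))) idiom used earlier in this message.

```r
# Hypothetical helper: TRUE if the rows of df are not already in the order
# given by sorting on all of its columns, left to right.
df_rows_unsorted <- function(df) {
  # unname() so that column names cannot clash with order()'s own arguments
  ord <- do.call(order, unname(as.list(df)))
  is.unsorted(ord)
}

DF <- data.frame(a = c(1, 3, 5), b = c(1, 3, 5))
r1 <- df_rows_unsorted(DF)               # FALSE: rows already ordered by a, then b

DF[2, 2] <- 2
r2 <- df_rows_unsorted(DF)               # FALSE: still ordered by a, then b

r3 <- df_rows_unsorted(DF[c(2, 1, 3), ]) # TRUE: first two rows swapped
```

Because order() is stable, rows that tie on every column keep their original positions and are treated as sorted in this sketch.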
I don't follow the last sentence. If the .gtn call needs to be negated, why would you want to go back?
Duncan Murdoch
Matthew
Duncan Murdoch
I understand why the first two are FALSE (1 item of anything must be
sorted). I don't understand the 3rd and 4th cases where length is 2:
do_isunsorted seems to call lang3(install(".gtn"), x, CADR(args))). Does
that fall back to TRUE for some reason?
Matthew
sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.8.0

loaded via a namespace (and not attached):
[1] tools_2.15.0
______________________________________________ R-devel<at> r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel