custom sort?

I've moved this to R-devel...

On 5/28/2009 8:17 PM, Stavros Macrakis wrote:
I couldn't get your suggested method to work:

  `==.foo` <- function(a,b) unclass(a)==unclass(b)
  `>.foo` <- function(a,b) unclass(a) < unclass(b)     # invert comparison
  is.na.foo <- function(a)is.na(unclass(a))

  sort(structure(sample(5),class="foo"))  #-> 1:5  -- not reversed

What am I missing?

There are two problems.  First, I didn't mention that you need a method 
for indexing as well.  The code needs to evaluate things like x[i] > 
x[j], and by default x[i] will not be of class "foo", so the custom 
comparison methods won't be called.
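
To make the first point concrete, here is a small sketch of what default indexing does to the class attribute. The `[.foo` method is a hypothetical illustration, not something from the original messages, and the follow-up later in this thread notes that a custom indexing method is not actually required.

```r
x <- structure(c(3L, 1L, 2L), class = "foo")

# Default indexing drops the class attribute, so x[1] is a plain integer
class_default <- class(x[1])

# Hypothetical indexing method for "foo" that puts the class back
`[.foo` <- function(x, i) structure(unclass(x)[i], class = "foo")
class_with_method <- class(x[1])

print(c(class_default, class_with_method))
```

Without the method, comparisons such as x[i] > x[j] are made on plain integers, so `>.foo` is never dispatched.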

Second, I think there's a bug in the internal code, specifically in 
do_rank or orderVector1 in sort.c:  orderVector1 ignores the class of x.
do_rank does pay attention to the class when breaking ties, so I think 
this is an oversight.

So I'd say two things should be done:

  1.  the bug should be fixed.  Even if this isn't the most obvious 
approach, it should work.

  2.  we should look for ways to make all of this simpler, e.g. allowing 
a comparison function to be used.

I'll take on 1, but not 2.  It's hard to work out the right place for 
the comparison function to appear, and it would require a lot of work to 
implement, because all of this stuff (sort, rank, order, xtfrm, 
sort.int, etc.) is closely interrelated, some but not all of the 
functions are S3 generics, some implemented internally, etc.  In the 
end, I'd guess the results won't be very satisfactory from a performance 
point of view:  all those calls out to R to do the comparisons are going 
to be really slow.
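
For illustration only, this is roughly what sorting with a comparison function looks like when written in R itself. `sort_with` and its `less` argument are hypothetical names, not part of base R; this is a sketch under those assumptions, and every comparison is a call out to R, which is the performance concern raised above.

```r
# Hypothetical helper, not part of base R: a stable merge sort driven by a
# user-supplied function less(a, b) that returns TRUE when a sorts before b.
sort_with <- function(x, less) {
  n <- length(x)
  if (n < 2L) return(x)
  mid <- n %/% 2L
  left  <- sort_with(x[1:mid], less)
  right <- sort_with(x[(mid + 1L):n], less)
  out <- x                       # same type and length as the input
  i <- 1L
  j <- 1L
  for (k in seq_len(n)) {
    # Take from `right` only when `left` is exhausted or the head of
    # `right` is strictly less; ties go to `left`, keeping the sort stable.
    take_right <- i > length(left) ||
      (j <= length(right) && less(right[j], left[i]))
    if (take_right) {
      out[k] <- right[j]
      j <- j + 1L
    } else {
      out[k] <- left[i]
      i <- i + 1L
    }
  }
  out
}

# Descending order via an inverted comparison
desc <- sort_with(c(3, 1, 5, 2, 4), function(a, b) a > b)
print(desc)
```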

I think your advice to use order() with multiple keys is likely to be 
much faster in most instances.  It's just a better approach in R.
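
A minimal sketch of the order() approach, using made-up data (the data frame and its column names are assumptions for illustration). Each key is a whole vector, so the comparisons happen in compiled code rather than in R-level calls.

```r
# Illustrative data; the columns are hypothetical
df <- data.frame(name  = c("b", "a", "c", "a"),
                 score = c(2, 3, 1, 1),
                 stringsAsFactors = FALSE)

# Sort by name ascending, then by score descending.
# Negating a numeric key reverses its order.
idx <- order(df$name, -df$score)
print(df[idx, ])
```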

Duncan Murdoch
           -s

On Thu, May 28, 2009 at 5:48 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:

On 28/05/2009 5:34 PM, Steve Jaffe wrote:

It sounds simple, but I haven't been able to find it in the docs: is it possible to
sort a vector using a user-defined comparison function? It seems it must be,
but "sort" doesn't seem to provide that option, nor does "order", so far as I can see.

You put a class on the vector (e.g. using class(x) <- "myvector"), then
define a conversion to numeric (e.g. xtfrm.myvector) or actual comparison
methods (you'll need ==.myvector, >.myvector, and is.na.myvector).
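
As a sketch of the xtfrm route, assuming a hypothetical class "bylength" that should sort strings by their length: the method maps each element to a numeric key, and sort() orders by that key. This reflects how recent versions of R behave as far as I know; the class name is made up for the example.

```r
# Hypothetical class: character strings ordered by length
x <- structure(c("ccc", "a", "bb"), class = "bylength")

# Conversion to numeric: the sort key is the number of characters
xtfrm.bylength <- function(x) nchar(unclass(x))

sorted <- sort(x)
print(as.vector(unclass(sorted)))
```

Note that without a `[` method for the class, the sorted result may come back as a plain character vector.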

Duncan Murdoch

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

On 5/28/2009 8:17 PM, Stavros Macrakis wrote:
I couldn't get your suggested method to work:
[...]

Duncan Murdoch replied:
[...]
  1.  the bug should be fixed.  Even if this isn't the most obvious 
approach, it should work.

I've now fixed the bug, and clarified the documentation to say

   The default method will make use of == and > methods
   for the class of x[i] (for integers i), and the
   is.na method for the class of x, but might be rather
   slow when doing so.

You don't actually need a custom indexing method; you just need to be 
aware that it's the class of x[i] that is important for comparisons.

This will make it into R-patched and R-devel.

Duncan Murdoch