
custom sort?

3 messages · Duncan Murdoch, Stavros Macrakis

#
I've moved this to R-devel...
On 5/28/2009 8:17 PM, Stavros Macrakis wrote:
There are two problems.  First, I didn't mention that you need a method 
for indexing as well.  The code needs to evaluate things like 
x[i] > x[j], and by default x[i] will not be of class "foo", so the 
custom comparison methods won't be called.
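
As a rough illustration of the indexing point (a sketch only; the class 
"foo", its length-based comparison, and the helper keep() are made up 
for this example and are not from the original question):

```r
# Hypothetical class "foo": strings that should compare by length.
x <- structure(c("banana", "fig"), class = "foo")

# Custom comparison method for class "foo" (an assumption for illustration).
`>.foo` <- function(e1, e2) nchar(unclass(e1)) > nchar(unclass(e2))

# Default `[` keeps only names/dim/dimnames, so the class attribute is lost.
elem_class <- class(x[1])      # "character", not "foo"

# Both operands are plain character, so the default string comparison runs.
by_default <- x[1] > x[2]      # FALSE: "banana" sorts before "fig"

# Hypothetical helper that indexes while keeping the class.
keep <- function(x, i) structure(unclass(x)[i], class = "foo")

# Now both operands have class "foo", so `>.foo` is dispatched.
by_method <- keep(x, 1) > keep(x, 2)   # TRUE: 6 characters > 3 characters
```

Defining a `[.foo` method along the lines of keep() is one way to make 
x[i] carry the class automatically.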

Second, I think there's a bug in the internal code, specifically in 
do_rank or orderVector1 in sort.c:  orderVector1 ignores the class of x. 
do_rank pays attention when breaking ties, so I think this is an 
oversight.

So I'd say two things should be done:

  1.  the bug should be fixed.  Even if this isn't the most obvious 
approach, it should work.

  2.  we should look for ways to make all of this simpler, e.g. allowing 
a comparison function to be used.

I'll take on 1, but not 2.  It's hard to work out the right place for 
the comparison function to appear, and it would require a lot of work to 
implement, because all of this stuff (sort, rank, order, xtfrm, 
sort.int, etc.) is closely interrelated:  some but not all of the 
functions are S3 generics, some are implemented internally, etc.  In the 
end, I'd guess the results wouldn't be very satisfactory from a 
performance point of view:  all those calls out to R to do the 
comparisons are going to be really slow.

I think your advice to use order() with multiple keys is likely to be 
much faster in most instances.  It's just a better approach in R.
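
For example, a "sort by length, then alphabetically" rule can be written 
as two keys passed to order().  This is only a sketch with made-up data; 
the vector name `fruit` is an assumption for illustration:

```r
# Made-up data for illustration.
fruit <- c("fig", "banana", "pear", "kiwi")

# First key: string length.  Second key: the string itself (breaks ties).
ord <- order(nchar(fruit), fruit)

sorted <- fruit[ord]
# sorted is c("fig", "kiwi", "pear", "banana")
```

Each key is an ordinary vector, so the comparisons are done internally 
rather than by calling back into R for every pair of elements.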

Duncan Murdoch
#
On 5/29/2009 9:28 AM, Duncan Murdoch wrote:
I've now fixed the bug, and clarified the documentation to say

   The default method will make use of == and > methods
   for the class of x[i] (for integers i), and the
   is.na method for the class of x, but might be rather
   slow when doing so.

You don't actually need a custom indexing method, you just need to be 
aware that it's the class of x[i] that is important for comparisons.
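
A minimal sketch of what this looks like in practice, assuming a version 
of R that includes the fix.  The class "foo" and its length-based 
ordering are hypothetical.  A `[.foo` method is defined here simply as 
one way to make x[i] have class "foo".  The underlying vector is 
character on purpose:  for numeric x, xtfrm's default method just 
returns unclass(x), and the comparison methods are not consulted.

```r
# Hypothetical class "foo": character strings ordered by their length.
x <- structure(c("banana", "fig", "pear"), class = "foo")

# Make x[i] keep class "foo", so the methods below apply to single elements.
`[.foo` <- function(x, i) structure(unclass(x)[i], class = "foo")

# Comparison methods used by the default ranking code.
`==.foo` <- function(e1, e2) nchar(unclass(e1)) == nchar(unclass(e2))
`>.foo`  <- function(e1, e2) nchar(unclass(e1)) >  nchar(unclass(e2))

# order() on a classed object goes through xtfrm(), whose default method
# ranks the elements using the == and > methods for the class of x[i].
ord <- order(x)
sorted <- unclass(x[ord])
# sorted is c("fig", "pear", "banana"): shortest first, not alphabetical
```

As the documentation note says, this route evaluates the comparisons in 
R code and can be slow for long vectors.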

This will make it into R-patched and R-devel.

Duncan Murdoch