vector labels are not permuted properly in a call to sort() (R 2.1)

4 messages · Liaw, Andy, Martin Maechler, David James +1 more

#
The `problem' is that sort() does not do anything special when given
a matrix: it only treats it as a vector.  After sorting, it copies
attributes of the original input to the output.  Since dimnames are
attributes, they get copied as is.

Try:

> y <- matrix(8:1, 4, 2, dimnames=list(LETTERS[1:4], NULL))
> y
  [,1] [,2]
A    8    4
B    7    3
C    6    2
D    5    1
> sort(y)
  [,1] [,2]
A    1    5
B    2    6
C    3    7
D    4    8

Notice the row names stay the same.  I'd argue that this is the correct
behavior.
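
If the intent is to keep each label attached to its value, one option is
to permute the rows with order() instead of calling sort() on the matrix.
A minimal sketch, using the vector from Greg's report (the names `m`,
`ord` and `m_sorted` are illustrative only):

```r
# Named vector from the original report: A=8, B=7, ..., H=1
Y <- setNames(8:1, LETTERS[1:8])

# One-column matrix; the names become row names
m <- as.matrix(Y)

# order() returns the permutation that sorts the values;
# indexing rows with it moves each row name along with its value
ord <- order(m[, 1])
m_sorted <- m[ord, , drop = FALSE]

rownames(m_sorted)
```

Because the row index is applied to the whole matrix, values and dimnames
are permuted together, which does not depend on how sort() handles
attributes.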

Andy
#
    AndyL> The `problem' is that sort() does not do anything special when given
    AndyL> a matrix: it only treats it as a vector.  After sorting, it copies
    AndyL> attributes of the original input to the output.  Since dimnames are
    AndyL> attributes, they get copied as is.

exactly. Thanks Andy.

And I think users would want this (copying of attributes) in
many cases; in particular for user-created attributes

?sort  really talks about sorting of vectors and factors;
       and it doesn't mention attributes explicitly at all
       {which should probably be improved}.

One could wonder if R should keep the dim & dimnames
attributes for arrays and matrices.
S-plus (6.2) simply drops them {returning a bare unnamed vector}
and that seems pretty reasonable to me.

At least the user would never make the wrong assumptions that
Greg made about ``matrix sorting''.
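
To get the S-plus style result regardless of how sort() treats matrix
attributes in a given R version, the attributes can be dropped explicitly
before sorting.  A minimal sketch (the name `bare` is illustrative):

```r
y <- matrix(8:1, 4, 2, dimnames = list(LETTERS[1:4], NULL))

# as.vector() strips dim and dimnames, so sort() sees a plain vector
bare <- sort(as.vector(y))

# No attributes remain, so there are no labels left to misread
is.null(attributes(bare))
```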


    AndyL> Try:

    AndyL> > y <- matrix(8:1, 4, 2, dimnames=list(LETTERS[1:4], NULL))
    AndyL> > y
    AndyL>   [,1] [,2]
    AndyL> A    8    4
    AndyL> B    7    3
    AndyL> C    6    2
    AndyL> D    5    1
    AndyL> > sort(y)
    AndyL>   [,1] [,2]
    AndyL> A    1    5
    AndyL> B    2    6
    AndyL> C    3    7
    AndyL> D    4    8

    AndyL> Notice the row names stay the same.  I'd argue that this is the correct
    AndyL> behavior.

    AndyL> Andy


    >> From: Greg Finak
    >> 
    >> Not sure if this is the correct forum for this, 

yes, R-devel is the proper forum.
{also since this is really a proposal for a change in R ...}

    >> but I've found what I  
    >> would consider to be a potentially serious bug to the 
    >> unsuspecting user.
    >> Given a numeric vector V with class labels in R,  the following calls
    >> 
    >> 1.
    >> > sort(as.matrix(V))
    >> 
    >> and
    >> 
    >> 2.
    >> > as.matrix(sort(V))
    >> 
    >> produce different output. The vector is sorted properly in both cases,
    >> but only 2. produces the correct labeling of the vector. The call to
    >> 1. produces a vector with incorrect labels (not sorted).
    >> 
    >> Code:
    >> > X <- c("A","B","C","D","E","F","G","H")
    >> > Y <- rev(1:8)
    >> > names(Y) <- X
    >> > Y
    >> A B C D E F G H
    >> 8 7 6 5 4 3 2 1
    >> > sort(as.matrix(Y))
    >>   [,1]
    >> A    1
    >> B    2
    >> C    3
    >> D    4
    >> E    5
    >> F    6
    >> G    7
    >> H    8
    >> > as.matrix(sort(Y))
    >>   [,1]
    >> H    1
    >> G    2
    >> F    3
    >> E    4
    >> D    5
    >> C    6
    >> B    7
    >> A    8
    >>
#
Martin Maechler wrote:
This is as described in the Blue book, p.146, "Throwing Away Attributes".
--
David
#
The main problem is that R is inconsistent here.  There are lots of
branches through the sort() code.  Greg showed one.  Here are four more:

  [,1] [,2]
A    1    5
B    2    6
C    3    7
D    4    8

h g f e d c b a
1 2 3 4 5 6 7 8

  [,1] [,2]
A    1    5
B    2    6
C    3    7
D    4    8
attr(,"names")
[1] "h" "g" "f" "e" "d" "c" "b" "a"

  [,1] [,2]
A    1    5
B    2    6
C    3    7
D    4    8
attr(,"names")
[1] "a" "b" "c" "d" "e" "f" "g" "h"

I believe Svr4 does keep names but does not allow names on matrices.

There are other problems: should sorting a time series preserve the ts
properties (probably not, but it does)?  Should (S3 or S4) class
information be preserved (it seems inappropriate for a time series, for
example)?

The course of least resistance here is to always preserve attributes and 
to document that we do so.  Probably the most S-compliant solution is to 
preserve only names (and sort them as now).
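
The "preserve only names" option could be sketched as a small wrapper.
This is an illustration under stated assumptions, not the actual sort()
code: `sort_names_only` is a hypothetical helper, and it assumes an
atomic vector as input.

```r
# Hypothetical helper: sort the values, carry the names along,
# and drop every other attribute (dim, dimnames, tsp, class, ...).
sort_names_only <- function(x) {
  nms <- names(x)
  v <- as.vector(x)        # strips all attributes, including names
  ord <- order(v)          # permutation that sorts the values
  out <- v[ord]
  if (!is.null(nms)) names(out) <- nms[ord]  # names follow their values
  out
}

# Example input with names plus a user-created attribute
x <- structure(c(c = 3, a = 1, b = 2), note = "user attribute")
res <- sort_names_only(x)
res
```

With this approach a user-created attribute such as `note` is lost, which
is the trade-off against the "always preserve attributes" alternative.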

David James quotes the Blue Book, but note that S itself no longer follows 
the principle stated there.
On Wed, 5 Oct 2005, Martin Maechler wrote: