
Small change to plot.xy

13 messages · Peter Dalgaard, Duncan Murdoch, Jonathan Rougier +3 more

#
Hi everyone,

Is there any reason why we should not automatically coerce a factor
supplied as an argument to col in a plotting function?  The following
modification (to R-1.6.1) seems pretty harmless:
function (xy, type, pch = 1, lty = "solid", col = par("fg"), 
    bg = NA, cex = 1, ...) 
{
    if (is.factor(col)) 
        col <- codes(col)
    .Internal(plot.xy(xy, type, pch, lty, col, bg, cex, ...))
}
<environment: namespace:base>

and I think it is natural and not really wrong to want to type, say,
and get the colours.

Cheers, Jonathan.
#
Jonathan Rougier <J.C.Rougier@durham.ac.uk> writes:
It's not clear to me that you want codes() there. Consider

f <- factor(c("red", "blue", "green"))
plot(1:3, col = codes(f))

which get coloured green, black, and red. Arguably better than to drop
codes() and get black, black, and black, but not by much. Alternatives
could be as.numeric() or as.character(), but it all gets a bit
arbitrary. I think I prefer the explicit style in any case

clr <- sort(c("red","blue","green"))
plot(1:3,col=clr[f])

(beware the ordering of levels in f)
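
The ordering caveat can be checked without drawing anything. A minimal sketch in base R; the variable names `wrong`, `by_pos` and `by_name` are illustrative and not from the thread:

```r
f <- factor(c("red", "blue", "green"))

# Levels are stored sorted: "blue", "green", "red",
# so the integer codes behind f are 3, 1, 2.
levels(f)

# Indexing an unsorted colour vector by the factor uses those codes,
# which scrambles the colours.
wrong <- c("red", "blue", "green")[f]

# Sorting the colour vector first lines it up with the levels.
by_pos <- sort(c("red", "blue", "green"))[f]

# Going through the labels does not depend on level order at all.
by_name <- as.character(f)
```

Here `by_pos` and `by_name` agree, while `wrong` does not.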
#
*Not* codes! Unclass perhaps.

However, why should col=factor(c("red", "blue", "green"))
not as naturally coerce to character?

I think we do need to get the user to specify what is meant, as now.
There are lots of examples of this sort of thing in MASS, BTW.
On Fri, 22 Nov 2002, Jonathan Rougier wrote:

            

  
    
#
On 22 Nov 2002 14:01:10 +0100, Peter Dalgaard BSA
<p.dalgaard@biostat.ku.dk> wrote in message
<x2lm3lvjvt.fsf@biostat.ku.dk>:
I think you want it to act as though it is coercing everything to
character.  If something prints as "1", then it should act as though
it is colour 1.  If it prints as "red", it should be red.

I don't think you actually want to coerce everything to character;
that's too inefficient in the case of numeric colours, but it should
act as if you did.

I think this version gives that behaviour, but I might have the wrong
test...

plot.xy <- function (xy, type, pch = 1, lty = "solid", col = par("fg"),
    bg = NA, cex = 1, ...)
{
    if (!is.numeric(col))
        col <- as.character(col)
    .Internal(plot.xy(xy, type, pch, lty, col, bg, cex, ...))
}

This doesn't help with Jonathan's iris example; in that case, I think
the fix should be to abort with the declaration that "setosa" etc.
aren't colours.  Alternatively, they could be coerced to numeric
colours (preferably not all black), but at the very least a warning
should be given when that is done.

Duncan Murdoch
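
The "act as if coerced to character, abort on non-colours" rule can be tried outside plot.xy. This is only a sketch: `normalise_col` is a hypothetical helper, and it assumes that `grDevices::col2rgb()` signalling an error is an acceptable test for "not a colour".

```r
normalise_col <- function(col) {
  # Numeric colours pass through untouched, so there is no coercion cost.
  if (is.numeric(col)) return(col)
  # Everything else (including factors) is treated by its printed form.
  col <- as.character(col)
  # col2rgb() signals an error for a string it cannot interpret as a colour.
  ok <- tryCatch({
    grDevices::col2rgb(col)
    TRUE
  }, error = function(e) FALSE)
  if (!ok) stop("col contains values that are not colours")
  col
}

normalise_col(factor(c("red", "blue")))   # character colour names
normalise_col(1:3)                        # unchanged integers
```

With this rule, `normalise_col(iris$Species)` stops with an error rather than guessing.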
#
OK -- so there are two objections to the proposal:

1) The factor might be a colour specifier

2) Using codes is not a good idea because there are only 8 colours
available for an integer argument to col.

I can see that both of these have merit, but I think they are both easy
to work around, if necessary.  No-one has disagreed that it's natural to
want to pass a factor to col, and I believe that the vast majority of
times when this occurs the factor is not designed explicitly to paint
the points.

I think I would let the first objection pass.  Using codes to coerce the
factor we still get different colours for different factors, just not
the specified colours.  This is not right, but it's rarely a disaster
either and should be easy to spot for anyone who is expecting to see
"slateblue" and gets "red".

As for the second objection, I think this is valid and should be
addressed, but at the same time it is a different problem.  I often
wondered why col = 1 was black (not really a colour at all, and not very
good for boxplots) and, more pertinently, why col = 2 is red and col = 3
is green: isn't the most common form of colour-blindness red/green?  At
the very least, let's not have them next to each other in the list!  We
have 657 named colours to choose from: why not have an explicit "int2col"
function that provides a bigger and better table?

More-or-less the same arguments also apply to pch.

Jonathan.
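
One possible shape for such a function, as a sketch only. The name `int2col` comes from the message above, but the signature and the lookup table are assumptions made for illustration; the colours were picked arbitrarily from `colors()` and are not a vetted colour-blind-safe palette.

```r
int2col <- function(i, table = c("blue", "orange", "darkgreen", "purple",
                                  "brown", "gold", "gray40")) {
  i <- as.integer(i)
  # Recycle through the table so every positive integer maps to a colour.
  table[(i - 1L) %% length(table) + 1L]
}

int2col(1:3)   # first three entries of the table
int2col(8)     # wraps around to the first entry
```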
Jonathan Rougier wrote:

  
    
#
On Sat, 23 Nov 2002, Jonathan Rougier wrote:

            
You seem not to appreciate

0) You used codes when you should have used unclass!

which was made twice.  Please do make sure you understand the difference.

I did disagree.  I don't see why users should not explicitly map the
factor levels to colours, which takes only a few extra characters.
People have been doing this for a decade in S without objecting.
Why is it worth complicating R for?

Not all devices (by any means) can display those colours, or only a
limited number of colours in total.

  
    
#
ripley@stats.ox.ac.uk wrote:
You are right -- I can see from the help page of codes that I should
have used unclass.

That's the difference between us: you think they should have to type a
few extra characters to achieve a natural result, and I don't.  It's two
extra lines in the source and an extra line in the help file -- I don't
call this a complication and I think that the next generation of
statisticians will be that much more taken with R (as opposed to, say,
SPSS) if we take the trouble to make the default behaviour as intuitive
as possible.

That doesn't stop us coming up with a longer list in a better order,
does it?

Jonathan.
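
For readers unsure of the codes/unclass distinction conceded here, a small sketch. It assumes, from memory of the R 1.x behaviour rather than from anything stated in this thread, that codes() on an unordered factor numbered the levels in sorted order, whereas unclass() exposes the stored integer codes. `legacy_codes` is a hypothetical stand-in, since codes() is not available in current R.

```r
# Assumed emulation of the old codes() for an unordered factor.
legacy_codes <- function(x) rank(levels(x))[x]

# A factor whose levels are deliberately not in alphabetical order.
x <- factor(c("low", "high", "mid"), levels = c("low", "mid", "high"))

as.integer(unclass(x))   # positions in levels(x): 1 3 2
legacy_codes(x)          # positions in the sorted levels: 2 1 3
```

When the levels are already sorted, as they are by default, the two agree, which is why the difference is easy to miss.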
#
On Sat, 23 Nov 2002, Jonathan Rougier wrote:

            
No, the difference is that what you find `intuitive' other people find
perverse.  Giving a factor of colour names and getting a different set of
colours is perverse.  If the intention is not clear, it is more
`intuitive' to give an error than to guess incorrectly.

Does SPSS actually do this, that is arbitrarily assign colours to
categories that might be names of colours?
#
ripley@stats.ox.ac.uk wrote:
I think you know the answer to that!  That wasn't my point.  I give
statistical computing practicals using the same material to two similar
groups of students, some using SPSS and some using R.  Right now, the
SPSS students are getting a lot more experience of data analysis.  My
SPSS students put Species in the "colour points by" box.  You and I may
disagree on what is natural and what is perverse, but my R students want
to do "col = iris$Species" and I am sympathetic.  

If your objection is overwriting colours then let's go with Robert's
suggestion of checking for this explicitly, maybe something along the
lines of

if (is.factor(col)) {
  tmp <- as.character(col)
  if (all(tmp %in% colors()))
    col <- tmp
  else
    col <- unclass(col)
}

which has the desired effect:
Jonathan.
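
Wrapped as a standalone helper, the check can be tried without a plotting call. A sketch only: `col_from_factor` is a hypothetical name, and the body is the snippet above with `colors()` qualified by its current package.

```r
col_from_factor <- function(col) {
  if (is.factor(col)) {
    tmp <- as.character(col)
    if (all(tmp %in% grDevices::colors())) {
      col <- tmp            # levels are colour names: use them as given
    } else {
      col <- unclass(col)   # otherwise fall back to the integer codes
    }
  }
  col
}

# Colour-name levels come back as character strings.
col_from_factor(factor(c("slateblue", "red")))

# Species names are not colours, so the integer codes are returned.
table(as.integer(col_from_factor(datasets::iris$Species)))
```

A call such as `plot(x, y, col = col_from_factor(iris$Species))` would then colour the points by species.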
#
On Saturday 23 November 2002 04:53 am, Jonathan Rougier wrote:

            
This sort of thing is handled more systematically in Trellis. For example, try

library(lattice)
data(iris)
xyplot(Sepal.Length ~ Sepal.Width, data = iris, groups = Species)
splom(iris[, 1:4], groups = iris$Species)

(This doesn't exactly answer your original question, but I think it does 
address the example you have given here.)

Deepayan
#
On Sat, 23 Nov 2002, Jonathan Rougier wrote:

            
The function below might help.  It checks:
  - are the levels colour names
  - are they numbers
  - are they color specifiers like #a0f3d2
otherwise it returns the underlying codes with unclass().

On the issue of whether the current default palettes are ideal I can quote
Paul Murrell's talk at the JSM ("The only word for this is
`embarrassing'").  Improved palettes are planned, and one step in that
direction is the RColorBrewer package, which supplies color schemes
useful for images, maps and barplots.

	-thomas


factor2color <- function(color) {
   nms <- as.character(color)
   n <- length(nms)
   ## are they color names?
   is.color <- nms %in% colors()
   if (all(is.color)) return(nms)
   if (sum(is.color) > 2 && mean(is.color) > 2/3)
      warning("Not all factor names are colors")
   ## are they numbers?  ([^[:digit:]] matches any non-digit character)
   m <- length(grep("[^[:digit:]]", nms))
   if (m == 0 && all(!is.na(nums <- as.numeric(nms))))
      return(nums)
   ## are they hex specifiers like #a0bfe4?  (anchored to the whole string)
   m <- length(grep("^#[0-9A-Fa-f]{6}$", nms))
   if (m == n)
      return(nms)
   ## otherwise just return the underlying integer codes
   return(unclass(color))
}
1 day later
#
Hi Thomas,
Thomas Lumley wrote:
Is there a case, given that colours *can* be represented in several
different ways, for including the two functions "is.color" and
"as.color" in the base?  I imagine the latter would work like
"as.numeric", putting NA for elements that cannot be coerced to
colours.  The function "as.color" might return a specific colour format,
perhaps #hhhhhh if that is the most general, which would simplify other
parts of the code.

Jonathan.
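
A rough sketch of what the two functions might look like. The names `is.color` and `as.color` are from the message above; the implementations are guesses that lean on `grDevices::col2rgb()` to decide what counts as a colour and on `grDevices::rgb()` to produce the #hhhhhh form. Nothing here is an existing base R function.

```r
is.color <- function(x) {
  # Test each element separately: col2rgb() errors on non-colour strings.
  vapply(as.character(x), function(s) {
    tryCatch({
      grDevices::col2rgb(s)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1), USE.NAMES = FALSE)
}

as.color <- function(x) {
  x <- as.character(x)
  ok <- is.color(x)
  # Like as.numeric(), put NA where coercion is not possible.
  out <- rep(NA_character_, length(x))
  if (any(ok)) {
    m <- grDevices::col2rgb(x[ok])
    out[ok] <- grDevices::rgb(m[1, ], m[2, ], m[3, ], maxColorValue = 255)
  }
  out
}

as.color(c("red", "setosa"))   # "#FF0000" NA
```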
#
The problem is that col= is interpreted in C code for many different R
functions, not just those going through plot.xy.  The only consistent way I
see to handle this is to change the common C code to handle more cases, if
people really think it is worth complicating R for.

I was going to point out that your previous solution was less general than
what was there at present.
On Mon, 25 Nov 2002, Jonathan Rougier wrote: