Small change to plot.xy

Hi everyone,

Is there any reason why we should not automatically coerce a factor
supplied as an argument to col in a plotting function?  The following
modification (to R-1.6.1) seems pretty harmless:

plot.xy
function (xy, type, pch = 1, lty = "solid", col = par("fg"), 
    bg = NA, cex = 1, ...) 
{
    if (is.factor(col)) 
        col <- codes(col)
    .Internal(plot.xy(xy, type, pch, lty, col, bg, cex, ...))
}
<environment: namespace:base>

and I think it is natural and not really wrong to want to type, say,

data(iris)
pairs(iris[, 1:4], col = iris[, 5])

and get the colours.

Cheers, Jonathan.
Jonathan Rougier                       Science Laboratories
Department of Mathematical Sciences    South Road
University of Durham                   Durham DH1 3LE
tel: +44 (0)191 374 2361, fax: +44 (0)191 374 7388
http://www.maths.dur.ac.uk/stats/people/jcr/jcr.html
Jonathan Rougier <J.C.Rougier@durham.ac.uk> writes:

> Is there any reason why we should not automatically coerce a factor
> supplied as an argument to col in a plotting function?  The following
> modification (to R-1.6.1) seems pretty harmless
> [...]
>     if (is.factor(col))
>         col <- codes(col)
> [...]
> pairs(iris[, 1:4], col = iris[, 5])
> and get the colours.

It's not clear to me that you want codes() there. Consider

f <-factor(c("red","blue","green")) 
plot(1:3,col=codes(f))

in which the points are coloured green, black, and red. That is arguably
better than dropping codes() and getting black, black, and black, but not
by much. Alternatives could be as.numeric() or as.character(), but it all
gets a bit arbitrary. I think I prefer the explicit style in any case:

clr <- sort(c("red","blue","green"))
plot(1:3,col=clr[f])

(beware the ordering of levels in f)
O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
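
To see why the choice of coercion matters, here is a minimal sketch of
what each candidate produces for Peter's factor. It assumes a current R
installation, where codes() is, as far as I know, no longer available, so
as.integer() stands in for the integer codes. The variable names are only
illustrative.

```r
# A factor whose labels happen to be colour names.
f <- factor(c("red", "blue", "green"))

levels(f)        # levels are stored in sorted order: "blue" "green" "red"
as.integer(f)    # integer codes index into the levels: 3 1 2
as.character(f)  # the labels themselves: "red" "blue" "green"

# The explicit style: build a lookup vector in level order and index it
# by the factor, which uses the integer codes.
clr <- sort(c("red", "blue", "green"))
picked <- clr[f]
picked           # "red" "blue" "green"
```

The integer codes depend only on the ordering of the levels, which is why
passing them to col gives colours unrelated to the labels.
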
*Not* codes! Unclass perhaps.

However, why should  col=factor(c("red", "blue", "green"))
not just as naturally be coerced to character?

I think we do need to get the user to specify what is meant, as now.
There are lots of examples of this sort of thing in MASS, BTW.

______________________________________________
R-devel@stat.math.ethz.ch mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-devel

Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
On 22 Nov 2002 14:01:10 +0100, Peter Dalgaard BSA
<p.dalgaard@biostat.ku.dk> wrote in message
<x2lm3lvjvt.fsf@biostat.ku.dk>:
> It's not clear to me that you want codes() there. Consider
>
> f <-factor(c("red","blue","green"))
> plot(1:3,col=codes(f))
>
> which get coloured green, black, and red. Arguably better than to drop
> codes() and get black, black, and black, but not by much. Alternatives
> could be as.numeric() or as.character(), but it all gets a bit
> arbitrary.

I think you want it to act as though it is coercing everything to
character.  If something prints as "1", then it should act as though
it is colour 1.  If it prints as "red", it should be red.

I don't think you actually want to coerce everything to character;
that's too inefficient in the case of numeric colours, but it should
act as if you did.

I think this version gives that behaviour, but I might have the wrong
test...

plot.xy <- function (xy, type, pch = 1, lty = "solid", col = par("fg"),
    bg = NA, cex = 1, ...)
{
    if (!is.numeric(col))
        col <- as.character(col)
    .Internal(plot.xy(xy, type, pch, lty, col, bg, cex, ...))
}

This doesn't help with Jonathan's iris example; in that case, I think
the fix should be to abort with the declaration that "setosa" etc.
aren't colours.  Alternatively, they could be coerced to numeric
colours (preferably not all black), but at the very least a warning
should be given when that is done.

Duncan Murdoch
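
Duncan's rule (treat col as if it had been coerced to character, and
refuse labels that are not colours) can be sketched outside plot.xy. The
helper name check_col below is hypothetical, and using col2rgb() from
grDevices as the validity test is an assumption of this sketch, not
something proposed in the thread.

```r
# Hypothetical helper: behave as if `col` were coerced to character and
# stop when a label is not a recognisable colour.
check_col <- function(col) {
  if (is.numeric(col)) return(col)   # numeric colours pass through untouched
  labels <- as.character(col)
  for (lab in unique(labels)) {
    # col2rgb() signals an error for strings it cannot interpret
    ok <- !inherits(try(grDevices::col2rgb(lab), silent = TRUE), "try-error")
    if (!ok) stop(sprintf("'%s' is not a colour", lab))
  }
  labels
}

check_col(factor(c("red", "blue", "green")))   # "red" "blue" "green"
res <- try(check_col(iris$Species), silent = TRUE)
inherits(res, "try-error")                     # TRUE: "setosa" is rejected
```

Factors are not numeric in the sense of is.numeric(), so they always take
the character route here.
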
OK -- so there are two objections to the proposal:

1) The factor might be a colour specifier

2) Using codes is not a good idea because there are only 8 colours
available for an integer argument to col.

I can see that both of these have merit, but I think they are both easy
to work around, if necessary.  No-one has disagreed that it's natural to
want to pass a factor to col, and I believe that the vast majority of
times when this occurs the factor is not designed explicitly to paint
the points.

I think I would let the first objection pass.  Using codes to coerce the
factor we still get different colours for different factor levels, just not
the specified colours.  This is not right, but it's rarely a disaster
either and should be easy to spot for anyone who is expecting to see
"slateblue" and gets "red".

As for the second objection, I think this is valid and should be
addressed, but at the same time it is a different problem.  I often
wondered why col = 1 was black (not really a colour at all, and not very
good for boxplots) and, more pertinently, why col = 2 is red and col = 3
is green: isn't the most common form of colour-blindness red/green?  At
the very least, let's not have them next to each other in the list!  We
have 657 named colours to choose from: why not have an explicit "int2col"
function that provides a bigger and better table?

More-or-less the same arguments also apply to pch.

Jonathan.
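
A rough sketch of what such an int2col might look like follows. The
function name comes from the suggestion above, but the particular table of
twelve colours and the recycling rule are purely illustrative assumptions,
not a vetted palette.

```r
# Illustrative lookup table; every entry is one of R's named colours.
int2col_table <- c("black", "royalblue", "darkorange", "forestgreen",
                   "purple", "firebrick", "gold", "turquoise",
                   "sienna", "hotpink", "grey50", "navy")

# Hypothetical int2col(): map positive integers to colour names,
# recycling the table when an index runs past its end.
int2col <- function(i, table = int2col_table) {
  i <- as.integer(i)
  table[(i - 1L) %% length(table) + 1L]
}

int2col(1:3)              # "black" "royalblue" "darkorange"
int2col(13)               # wraps around to "black"
int2col(iris$Species[1])  # a factor is mapped through its integer codes
```
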
Jonathan Rougier                       Science Laboratories
Department of Mathematical Sciences    South Road
University of Durham                   Durham DH1 3LE
tel: +44 (0)191 374 2361, fax: +44 (0)191 374 7388
http://www.maths.dur.ac.uk/stats/people/jcr/jcr.html

> OK -- so there are two objections to the proposal:
>
> 1) The factor might be a colour specifier
>
> 2) Using codes is not a good idea because there are only 8 colours
> available for an integer argument to col.

You seem not to appreciate

0) You used codes when you should have used unclass!

which was made twice.  Please do make sure you understand the difference.

> I can see that both of these have merit, but I think they are both easy
> to work around, if necessary.  No-one has disagreed that it's natural to
> want to pass a factor to col, and I believe that the vast majority of
> times when this occurs the factor is not designed explicitly to paint
> the points.

I did disagree.  I don't see why users should not explicitly map the
factor levels to colours, which takes only a few extra characters.
People have been doing this for a decade in S without objecting.
Why is it worth complicating R for?

> I think I would let the first objection pass.  Using codes to coerce the
> factor we still get different colours for different factors, just not
> the specified colours.  This is not right, but it's rarely a disaster
> either and should be easy to spot for anyone who is expecting to see
> "slateblue" and gets "red".
>
> As for the second objection, I think this is valid and should be
> addressed, but at the same time it is a different problem.  I often
> wondered why col = 1 was black (not really a colour at all, and not very
> good for boxplots) and, more pertinently, why col = 2 is red and col = 3
> is green: isn't the most common form of colour-blindness red/green?  At
> the very least, let's not have them next to each other in the list!  We
> have 657 named colours to choose from: why not have a explicit "int2col"
> function that provides a bigger and better table?

Not all devices (by any means) can display those colours, or only a
limited number of colours in total.

> More-or-less the same arguments also apply to pch.
>
> Jonathan.


Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> On Sat, 23 Nov 2002, Jonathan Rougier wrote:
>
>> OK -- so there are two objections to the proposal:
>>
>> 1) The factor might be a colour specifier
>>
>> 2) Using codes is not a good idea because there are only 8 colours
>> available for an integer argument to col.
>
> You seem not to appreciate
>
> 0) You used codes when you should have used unclass!
>
> which was made twice.  Please do make sure you understand the difference.

You are right -- I can see from the help page of codes that I should
have used unclass.

>> I can see that both of these have merit, but I think they are both easy
>> to work around, if necessary.  No-one has disagreed that it's natural to
>> want to pass a factor to col, and I believe that the vast majority of
>> times when this occurs the factor is not designed explicitly to paint
>> the points.
>
> I did disagree.  I don't see why users should not explicitly map the
> factor levels to colours, which takes only a few extra characters.
> People have been doing this for a decade in S without objecting.
> Why is it worth complicating R for?

That's the difference between us: you think they should have to type a
few extra characters to achieve a natural result, and I don't.  It's two
extra lines in the source and an extra line in the help file -- I don't
call this a complication and I think that the next generation of
statisticians will be that much more taken with R (as opposed to, say,
SPSS) if we take the trouble to make the default behaviour as intuitive
as possible.

>> I think I would let the first objection pass.  Using codes to coerce the
>> factor we still get different colours for different factors, just not
>> the specified colours.  This is not right, but it's rarely a disaster
>> either and should be easy to spot for anyone who is expecting to see
>> "slateblue" and gets "red".
>>
>> As for the second objection, I think this is valid and should be
>> addressed, but at the same time it is a different problem.  I often
>> wondered why col = 1 was black (not really a colour at all, and not very
>> good for boxplots) and, more pertinently, why col = 2 is red and col = 3
>> is green: isn't the most common form of colour-blindness red/green?  At
>> the very least, let's not have them next to each other in the list!  We
>> have 657 named colours to choose from: why not have a explicit "int2col"
>> function that provides a bigger and better table?
>
> Not all devices (by any means) can display those colours, or only a
> limited number of colours in total.

That doesn't stop us coming up with a longer list in a better order,
does it?

Jonathan.
Jonathan Rougier                       Science Laboratories
Department of Mathematical Sciences    South Road
University of Durham                   Durham DH1 3LE
tel: +44 (0)191 374 2361, fax: +44 (0)191 374 7388
http://www.maths.dur.ac.uk/stats/people/jcr/jcr.html

No, the difference is that what you find `intuitive' other people find
perverse.  Giving a factor of colour names and getting a different set of
colours is perverse.  If the intention is not clear, it is more
`intuitive' to give an error than to guess incorrectly.

Does SPSS actually do this, that is arbitrarily assign colours to
categories that might be names of colours?
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
I think you know the answer to that!  That wasn't my point.  I give
statistical computing practicals using the same material to two similar
groups of students, some using SPSS and some using R.  Right now, the
SPSS students are getting a lot more experience of data analysis.  My
SPSS students put Species in the "colour points by" box.  You and I may
disagree on what is natural and what is perverse, but my R students want
to do "col = iris$Species" and I am sympathetic.

If your objection is overwriting colours then let's go with Robert's
suggestion of checking for this explicitly, maybe something along the
lines of

if (is.factor(col)) {
  tmp <- as.character(col)
  if (all(tmp %in% colors()))
    col <- tmp
  else
    col <- unclass(col)
}

which has the desired effect:

data(iris)
iris$color <- c("red", "lightgreen", "slateblue")[unclass(iris$Species)]
pairs(iris[, 1:4], col = iris$Species) # black, red, green
pairs(iris[, 1:4], col = iris$color)   # the colours assigned above

Jonathan.
Jonathan Rougier                       Science Laboratories
Department of Mathematical Sciences    South Road
University of Durham                   Durham DH1 3LE
tel: +44 (0)191 374 2361, fax: +44 (0)191 374 7388
http://www.maths.dur.ac.uk/stats/people/jcr/jcr.html
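
The check above can be tried without touching plot.xy by wrapping it in a
standalone function. The name coerce_col is hypothetical, and
as.integer() is used in place of unclass() only so that the result is a
plain integer vector without a levels attribute.

```r
# Hypothetical wrapper around the check proposed above.
coerce_col <- function(col) {
  if (is.factor(col)) {
    tmp <- as.character(col)
    if (all(tmp %in% colors())) {
      col <- tmp              # every label is a named colour: use the names
    } else {
      col <- as.integer(col)  # otherwise fall back to the integer codes
    }
  }
  col
}

coerce_col(factor(c("red", "slateblue")))  # "red" "slateblue"
unique(coerce_col(iris$Species))           # 1 2 3
```

A factor that mixes colour names with other labels falls back to the
integer codes without any warning.
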

This sort of thing is handled more systematically in Trellis. For example, try

library(lattice)
data(iris)
xyplot(Sepal.Length ~ Sepal.Width, data = iris, groups = Species)
splom(iris[, 1:4], groups = iris$Species)

(This doesn't exactly answer your original question, but I think it does 
address the example you have given here.)

Deepayan

> OK -- so there are two objections to the proposal:
>
> 1) The factor might be a colour specifier

The function below might help
It checks:
  - are the levels colour names
  - are they numbers
  - are they color specifiers like #a0f3d2
otherwise it returns the underlying codes with unclass().

On the issue of whether the current default palettes are ideal I can quote
Paul Murrell's talk at the JSM ("The only word for this is
`embarrassing'").  Improved palettes are planned, and one step in that
direction is the RColorBrewer package, which supplies color schemes
useful for images, maps and barplots.

	-thomas

factor2color <- function(color) {
   nms <- as.character(color)
   n <- length(nms)
   ## are they color names?
   is.color <- nms %in% colors()
   if (all(is.color)) return(nms)
   ## mostly, but not entirely, color names: probably a mistake
   if (sum(is.color) > 2 && mean(is.color) > 2/3)
      warning("Not all factor names are colors")
   ## are they numbers?  (no non-digit character anywhere)
   m <- length(grep("[^[:digit:]]", nms))
   if (m == 0 && all(!is.na(nums <- as.numeric(nms))))
      return(nums)
   ## are they hex specifiers like #a0bfe4?
   m <- length(grep("^#[0-9A-Fa-f]{6}$", nms))
   if (m == n)
      return(nms)
   ## otherwise just return the underlying integer codes
   return(unclass(color))
}
Hi Thomas,

Thomas Lumley wrote:
> On Sat, 23 Nov 2002, Jonathan Rougier wrote:
>
>> OK -- so there are two objections to the proposal:
>>
>> 1) The factor might be a colour specifier
>
> The function below might help
> It checks:
>   - are the levels colour names
>   - are they numbers
>   - are they color specifiers like #a0f3d2
> otherwise it returns the underlying codes with unclass().

Is there a case, given that colours *can* be represented in several
different ways, for including the two functions "is.color" and
"as.color" in the base?  I imagine the latter would work like
"as.numeric", putting NA for elements that cannot be coerced to
colours.  The function "as.color" might return a specific colour format,
perhaps #hhhhhh if that is the most general, which would simplify other
parts of the code.

Jonathan.
Jonathan Rougier                       Science Laboratories
Department of Mathematical Sciences    South Road
University of Durham                   Durham DH1 3LE
tel: +44 (0)191 374 2361, fax: +44 (0)191 374 7388
http://www.maths.dur.ac.uk/stats/people/jcr/jcr.html
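
One way to picture the two functions is the sketch below. Neither
is.color nor as.color exists in base R; the names come from the message
above, and building them on col2rgb() and rgb() from grDevices is an
assumption of this sketch. Elements that cannot be interpreted as colours
become NA, in the spirit of as.numeric.

```r
# Hypothetical is.color(): TRUE for each element that col2rgb() accepts.
is.color <- function(x) {
  vapply(as.character(x), function(s) {
    !is.na(s) &&
      !inherits(try(grDevices::col2rgb(s), silent = TRUE), "try-error")
  }, logical(1), USE.NAMES = FALSE)
}

# Hypothetical as.color(): return "#RRGGBB" strings, NA where coercion fails.
as.color <- function(x) {
  x <- as.character(x)
  ok <- is.color(x)
  out <- rep(NA_character_, length(x))
  if (any(ok)) {
    m <- grDevices::col2rgb(x[ok])   # 3-row matrix: red, green, blue
    out[ok] <- grDevices::rgb(m[1, ], m[2, ], m[3, ], maxColorValue = 255)
  }
  out
}

is.color(c("red", "setosa", "#a0f3d2"))  # TRUE FALSE TRUE
as.color(c("red", "setosa", "#a0f3d2"))  # "#FF0000" NA "#A0F3D2"
```
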
The problem is that col= is interpreted in C code for many different R
functions, not just those going through plot.xy.  The only consistent way I
see to handle this is to change the common C code to handle more cases, if
people really think it is worth complicating R for.

I was going to point out that your previous solution was less general than
what was there at present.


Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595