On the median

I have recently become aware of some curious behaviour of median() which I think could be usefully corrected.  I am sure this must have come up before, but I'm raising it again.

The phenomenon is best shown by a simple example.
          [,1]       [,2]       [,3]      [,4]
[1,] 0.1388592 0.08478220 0.02012404 0.7733054
[2,] 0.1718332 0.06370432 0.66167219 0.2521809
[3,] 0.3190116 0.08616569 0.23107320 0.6278422
[4,] 0.9185233 0.29218144 0.99193823 0.6306847
[1] 0.1118207 0.2120070 0.2750424 0.7746040

So far, so good. But what happens when you turn it into a data frame?
[1] 0.1118207 0.2120070 0.2750424 0.7746040
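The commands behind the two outputs above are not shown. A minimal sketch that reproduces the printed numbers follows, assuming the matrix is rebuilt from its printed (rounded) values and that the outputs are row medians. The names x and xd are hypothetical, chosen only for illustration.

```r
# Hypothetical names: x for the matrix, xd for the data frame.
# Values are copied from the printed matrix, so they are rounded.
x <- matrix(c(0.1388592,  0.1718332,  0.3190116,  0.9185233,
              0.08478220, 0.06370432, 0.08616569, 0.29218144,
              0.02012404, 0.66167219, 0.23107320, 0.99193823,
              0.7733054,  0.2521809,  0.6278422,  0.6306847),
            nrow = 4)   # matrix() fills column by column
xd <- data.frame(x)     # columns get the default names X1..X4

# Row medians of the matrix.
row_med_matrix <- unname(apply(x, 1, median))

# apply() coerces the data frame to a matrix first, so the result agrees.
row_med_frame <- unname(apply(xd, 1, median))

print(row_med_matrix)
print(row_med_frame)
```

Each row has four values, so each median is the average of the two middle ones; for row 1 that is (0.0847822 + 0.1388592) / 2 = 0.1118207, matching the first printed entry.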

No problem there, yet.  But if you just look at one row:
[1] 0.0847822 0.1388592

Without warning, you get a vector of length two as the result, viz. the two values that enclose the middle.  I thought this was simply because one row of a data frame is a list, but that can't be the whole story, e.g.
[1] 0.2454224
Error in sort.list(x, partial = half + 0L:1L) : 
  'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
(Well yes, Brian, I did...)  
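Whatever median() does with a list-like argument in a given R version, the ambiguity can be avoided by flattening the row to an atomic vector first. A minimal sketch, again assuming the hypothetical names x and xd and the rounded values from the printed matrix:

```r
# Hypothetical names; values copied from the printed matrix.
x <- matrix(c(0.1388592,  0.1718332,  0.3190116,  0.9185233,
              0.08478220, 0.06370432, 0.08616569, 0.29218144,
              0.02012404, 0.66167219, 0.23107320, 0.99193823,
              0.7733054,  0.2521809,  0.6278422,  0.6306847),
            nrow = 4)
xd <- data.frame(x)

row1 <- xd[1, ]           # one row of a data frame is still a data frame
print(is.list(row1))      # a data frame is a list of columns

row1_vec <- unlist(row1)  # flatten to a named numeric vector
row1_med <- median(row1_vec)
print(row1_med)           # a single number, not the two middle values
```

unlist() is only appropriate here because every column is numeric; with mixed column types it would coerce the values to a common type.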

The function mean() has a nice property when you call it on a data frame, e.g.
       X1        X2        X3        X4 
0.3870568 0.1317084 0.4762019 0.5710033 

and just to complicate the issue even further,
        X1         X2         X3         X4 
0.13885916 0.08478220 0.02012404 0.77330535 
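The column-wise result shown for mean() can also be obtained explicitly, without relying on a data.frame method. The sketch below uses sapply() to apply mean() to each column, so it does not depend on whether the installed R provides such a method. The names x and xd are assumptions, and the values are the rounded ones from the printed matrix.

```r
# Hypothetical names; values copied from the printed matrix.
x <- matrix(c(0.1388592,  0.1718332,  0.3190116,  0.9185233,
              0.08478220, 0.06370432, 0.08616569, 0.29218144,
              0.02012404, 0.66167219, 0.23107320, 0.99193823,
              0.7733054,  0.2521809,  0.6278422,  0.6306847),
            nrow = 4)
xd <- data.frame(x)

# One mean per column, returned as a named vector (X1..X4).
col_means <- sapply(xd, mean)
print(col_means)

# For a single row, each "column" holds one value, so the column-wise
# means are just the row's own entries.
row1_means <- sapply(xd[1, ], mean)
print(row1_means)
```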

On the other hand, median(), whose behaviour, I would suggest, should be similar, just fails when handed a data frame argument.
[1] NA NA
Warning messages:
1: In mean.default(X[[1L]], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(X[[2L]], ...) :
  argument is not numeric or logical: returning NA
_________________

I suggest that there should be some consistency here, and that median() be given a data.frame method that would allow it to respond much as mean() does.  The way it responds to data frame arguments now is quirky, at best.
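One way such a method might look is sketched below. This is an illustration of the proposal, not code from R itself: median.data.frame is a hypothetical user-defined S3 method, its signature assumes a median() generic of the form function(x, na.rm = FALSE, ...), and xd is an assumed name for the example data frame.

```r
# Hypothetical S3 method: column-wise medians for a data frame.
# Assumes the generic is median(x, na.rm = FALSE, ...).
median.data.frame <- function(x, na.rm = FALSE, ...) {
  # Apply the existing median() to each column in turn.
  sapply(x, median, na.rm = na.rm, ...)
}

# Example data: values copied (rounded) from the printed matrix.
xd <- data.frame(matrix(c(0.1388592,  0.1718332,  0.3190116,  0.9185233,
                          0.08478220, 0.06370432, 0.08616569, 0.29218144,
                          0.02012404, 0.66167219, 0.23107320, 0.99193823,
                          0.7733054,  0.2521809,  0.6278422,  0.6306847),
                        nrow = 4))

# S3 dispatch picks up the method defined above.
col_med <- median(xd)
print(col_med)
```

This mirrors the column-wise behaviour shown for mean(). Columns that are not numeric would still produce whatever median.default gives for them, so a real method would need to decide how to treat such columns.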

Currently median(), though generic, has only the default method.
[1] mean.data.frame mean.Date       mean.default    mean.difftime   mean.POSIXct   
[6] mean.POSIXlt
[1] median.default
Perhaps quantile() should also have a data.frame method for the same reason.  To me it seems curious, too, that quantile has a POSIXt method (in the stats package) whereas median currently does not.  (mean.POSIX*t are in the base package.)
[1] quantile.default quantile.POSIXt*

   Non-visible functions are asterisked
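Listings like those above come from methods(), presumably called on mean, median and quantile in turn. A short sketch for checking what is available in a given session follows; the output depends on the R version and on which packages are loaded, so none is shown here.

```r
# List the S3 methods known for each generic in the current session.
print(methods(mean))
print(methods(median))
print(methods(quantile))

# Check for particular methods programmatically.
has_default <- is.function(getS3method("median", "default"))

# optional = TRUE returns NULL instead of signalling an error
# when no such method is found.
df_method <- getS3method("median", "data.frame", optional = TRUE)
print(has_default)
print(is.null(df_method))
```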
How do people respond to this?

(I see there have been hints of this in the past, see http://tolstoy.newcastle.edu.au/R/e2/help/06/12/7692.html
but I could only find hints.)

Bill Venables
CSIRO/CMIS, Cleveland Labs.