[RsR] Case where options in mad() seemingly have no effect

4 messages · John Buggeln, Martin Maechler

#
Hello!

I have what seems to be a quick question. Below I have pasted the code for the mad() function, from 'package::stats'.

function (x, center = median(x), constant = 1.4826, na.rm = FALSE, 
  low = FALSE, high = FALSE) 
{
  if (na.rm) 
    x <- x[!is.na(x)]
  n <- length(x)
  constant * if ((low || high) && n%%2 == 0) {
    if (low && high) 
      stop("'low' and 'high' cannot be both TRUE")
    n2 <- n%/%2 + as.integer(high)
    sort(abs(x - center), partial = n2)[n2]
  }
  else median(abs(x - center))
}

The documentation states that the options 'low' and 'high' update the way the median is being calculated. However, in the code there is no update to 'center' from the value of low or high. What am I missing here? There must be something fundamental I don't understand about how R implements this function.

I also have a small example where the options seemingly have no effect on the MAD calculation.

###################VERSION INFO#############################

R 4.5.0 on MacOSX 14.5 (M2 Apple Silicon)
Sys.getlocale() is "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"

Information on package 'stats'

Description:

Package: stats
Version: 4.5.0
Priority: base
Title: The R Stats Package
Author: R Core Team and contributors worldwide
Maintainer: R Core Team <do-use-Contact-address at r-project.org>
Contact: R-help mailing list <r-help at r-project.org>
Description: R statistical functions.
License: Part of R 4.5.0
Imports: utils, grDevices, graphics
Suggests: MASS, Matrix, SuppDists, methods, stats4
NeedsCompilation: yes
Encoding: UTF-8
Enhances: Kendall, coin, multcomp, pcaPP, pspearman, robustbase
Built: R 4.5.0; aarch64-apple-darwin20; 2025-04-11 20:15:23 UTC; unix

###################EXAMPLE#############################
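> var <- scan(text="3.2 3.6 4.3 4.3 4.4 4.7 5.1 5.3 5.5 5.8 6.7 8.9")
Read 12 items
> mad(var)
[1] 0.88956
> mad(var, high=TRUE)
[1] 0.88956
> mad(var, low=TRUE)
[1] 0.88956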

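This is confusing because when calculated manually the MAD is different:
> # Manually
> median(abs(var-median(var)))*1.4826
[1] 0.88956
> median(abs(var-4.7))*1.4826 # LOW for median
[1] 1.03782
> median(abs(var-5.1))*1.4826 # HIGH for median
[1] 1.11195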


Best,

John Buggeln, MS
PhD Candidate
University of Delaware
#
> Hello!
    > I have what seems to be a quick question. Below I have pasted the code for the mad() function, from 'package::stats'.

    > function (x, center = median(x), constant = 1.4826, na.rm = FALSE, 
    > 	        low = FALSE, high = FALSE) 
    > {
    >   if (na.rm) 
    >      x <- x[!is.na(x)]
    >   n <- length(x)
    >   constant * if ((low || high) && n%%2 == 0) {
    >      if (low && high) 
    >        stop("'low' and 'high' cannot be both TRUE")
    >      n2 <- n%/%2 + as.integer(high)
    >      sort(abs(x - center), partial = n2)[n2]
    >   }
    >   else median(abs(x - center))
    > }

    > The documentation states that the options 'low' and 'high'
    > update the way the median is being calculated.

Yes, but *not* the "inner" median(x) which is the default for `center`, 
but rather the "outer" median( |x - center| ).
Yes.  I think you just need to read the help file more carefully.

If you think the help file was not clear, can you propose an
improvement there ?
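
To make the inner/outer distinction concrete, here is a minimal sketch that splits the computation into its two medians. The helper name mad_steps() is hypothetical (it is not part of 'stats'); its logic follows the mad() source quoted above, with the default center only.

```r
# Hypothetical helper (not part of 'stats'): follows the mad() source quoted
# above, but names the two medians separately.
mad_steps <- function(x, constant = 1.4826, low = FALSE, high = FALSE) {
  center <- median(x)            # "inner" median: never affected by low/high
  dev <- sort(abs(x - center))   # absolute deviations from the center, sorted
  n <- length(dev)
  outer <- if ((low || high) && n %% 2 == 0) {
    if (low && high)
      stop("'low' and 'high' cannot be both TRUE")
    dev[n %/% 2 + as.integer(high)]  # lo- or hi-median: one of the two middle values
  } else {
    median(dev)                      # ordinary median: averages the two middle values
  }
  constant * outer
}

x <- c(1, 2, 4, 8)          # median(x) is 3; sorted deviations are 1, 1, 2, 5
mad_steps(x, low = TRUE)    # 1.4826 * 1
mad_steps(x)                # 1.4826 * 1.5
mad_steps(x, high = TRUE)   # 1.4826 * 2
```

Note that for odd sample sizes the condition n %% 2 == 0 fails, so 'low' and 'high' have no effect at all in that case.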

    > I also have a small example where the options seemingly have no effect on the MAD calculation.

Yes, in that case there is no effect (of course, see below)

    > ###################VERSION INFO#############################

    > R 4.5.0 on MacOSX 14.5 (M2 Apple Silicon)
    > Sys.getlocale() is "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"

    > Information on package 'stats'

    > Description:

	..............
	..............


... the R version  4.5.0   is all that counts;  'stats' is part of R !

    > ###################EXAMPLE#############################

    >> var <-scan(text="3.2 3.6 4.3 4.3 4.4 4.7 5.1 5.3 5.5 5.8 6.7 8.9")
    > Read 12 items
    >> mad(var) 
    > [1] 0.88956
    >> mad(var, high=TRUE)
    > [1] 0.88956
    >> mad(var, low=TRUE)
    > [1] 0.88956

Indeed, in this case the outer lo-median, median, and hi-median
are identical (using 'x' instead of 'var', which is, after all,
the name of the var() function):


##--------------o<------------------o<------------------o<-----------
x <- c(3.2, 3.6, 4.3, 4.3, 4.4, 4.7, 5.1, 5.3, 5.5, 5.8, 6.7, 8.9)
mad(x)            # 0.88956
mad(x, high=TRUE) # 0.88956
mad(x, low =TRUE) # 0.88956

## MM:  the same, yes, because in this case,
sort(abs(x - median(x)))
## [1] 0.2 0.2 0.4 0.5 0.6 0.6 0.6 0.9 1.3 1.7 1.8 4.0
##      1   2   3   4   5   6   7   8   ..          12
## the median(), lo-median() and hi-median  {w/o the constant factor}
## are all == 0.6

## MM: slightly more interesting (after all, this is about robustness!)
x <- round(10*x)
x[1:3] <- 1000 + 1:3
dput(x) # c(1001, 1002, 1003, 43, 44, 47, 51, 53, 55, 58, 67, 89)
## and now,
cbind(mads <- c(lomad = mad(x, low=TRUE), mad = mad(x), himad = mad(x, high = TRUE))) / 1.4826
## give

## lomad 10.5
## mad   11.5
## himad 12.5
##--------------o<------------------o<------------------o<-----------


With best regards,
Martin




    > This is confusing because when calculated manually the MAD is different:

This is not "the MAD" as we have defined it; rather, you are using a
different *center*, i.e., the lo-median, median, or hi-median of the data as the __center__.

    >> # Manually
    >> median(abs(var-median(var)))*1.4826
    > [1] 0.88956
    >> median(abs(var-4.7))*1.4826 # LOW for median
    > [1] 1.03782
    >> median(abs(var-5.1))*1.4826 # HIGH for median
    > [1] 1.11195
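
The manual results quoted above can also be reproduced with mad() itself by passing the 'center' argument explicitly, which shows that they correspond to a change of center rather than to 'low' or 'high'. A short sketch, using 'x' rather than 'var' as the variable name:

```r
x <- c(3.2, 3.6, 4.3, 4.3, 4.4, 4.7, 5.1, 5.3, 5.5, 5.8, 6.7, 8.9)

# The two middle order statistics of x, i.e. the lo-median and hi-median
# of the data itself (not of the absolute deviations)
mid <- sort(x)[length(x) %/% 2 + 0:1]   # 4.7 and 5.1

mad(x)                    # center = median(x) = 4.9
mad(x, center = mid[1])   # same as median(abs(x - 4.7)) * 1.4826
mad(x, center = mid[2])   # same as median(abs(x - 5.1)) * 1.4826
```

In all three calls the outer median is the ordinary one; only the center differs.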


    > Best,

    > John Buggeln, MS
    > PhD Candidate
    > University of Delaware
    > _______________________________________________
    > R-SIG-Robust at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-sig-robust
#
This was clarifying, thank you for your response.

A suggestion for the help file would be to specify which median the low/high options affect. E.g.,

"if TRUE, compute the 'lo-median' of the absolute deviations, i.e., for even sample size, do not average the two middle values, but take the smaller one."

Best,

John Buggeln, MS
PhD Candidate
University of Delaware
#
> This was clarifying, thank you for your response.
    > A suggestion for the help file would be to specify which median the low/high options affect. E.g.,

    > "if TRUE, compute the 'lo-median' of the absolute deviations, i.e., for even sample size, do not average the two middle values, but take the smaller one."

    > Best,

    > John Buggeln, MS
    > PhD Candidate
    > University of Delaware


Ok, thank you.
I have now added something like your proposal to the help file for
mad() in R's sources [ <Rsrc>/src/library/stats/man/mad.Rd ;
svn rev 88376 ].

Best,
Martin


[......................]
[......................]