Once more: methods on missing data
On Thu, 7 Jun 2001 Maciej.Hoffman-Wecker at evotecoai.com wrote in part:
The result of the call

    x <- as.numeric(c(NA, NA, NA)); STATISTIC(x[!is.na(x)])

depends on the STATISTIC:
    STATISTIC   RESULT
    min         Inf and a warning message
    max         -Inf and a warning message
    mean        NaN and no warning message
    quantile    named vector containing NAs and no warning message
    sd          evaluation aborted with an error message
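The table can be checked on your own R installation with a short script like the one below. This is a sketch, not part of the original message: `probe()` is a hypothetical helper that records the value, any warning and any error for each statistic, and some results (notably for sd()) may differ in a current R release from the 2001 behavior reported above.

```r
# Hypothetical helper (not part of base R): apply a statistic f to x and
# record the value, the last warning and any error instead of stopping.
probe <- function(f, x) {
  warn <- NA_character_
  err <- NA_character_
  value <- tryCatch(
    withCallingHandlers(
      f(x),
      warning = function(w) {
        warn <<- conditionMessage(w)    # remember the warning text
        invokeRestart("muffleWarning")  # and keep evaluating
      }
    ),
    error = function(e) {
      err <<- conditionMessage(e)       # remember the error text
      NULL                              # no value was produced
    }
  )
  list(value = value, warning = warn, error = err)
}

x <- as.numeric(c(NA, NA, NA))
y <- x[!is.na(x)]   # every value removed: a zero-length numeric vector

results <- lapply(list(min = min, max = max, mean = mean,
                       quantile = quantile, sd = sd),
                  probe, x = y)
str(results)
```

Whatever sd() does on your version (error or a missing value), it shows up in the `error` or `value` field rather than aborting the script.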
<snip>
Shouldn't the statistics generally return NA and a warning message?
Ideally, they shouldn't. NA means missing data; that is, we don't know the value of the statistic because some data were not measured. That's why, for example, NA & FALSE is FALSE rather than NA: the value of the expression is known no matter what the first operand is.

The results for min() and max() have the rationale that, e.g., max(a, max(b)) should return the same as max(a, b) even when b is empty. For that to hold, the maximum of no data has to be -Inf (and the minimum Inf), since those values never change the result. There are even some examples where this is genuinely helpful.

If the others were to return a value, I think NaN (undefined numerical result) would be better than NA (missing data), as is already the case with mean(). This would argue for changing the return value of quantile() as well. However, I think it's reasonable for a function to refuse to calculate the variance of no data; we do have try() to handle errors if needed.

	-thomas

Thomas Lumley                   Asst. Professor, Biostatistics
tlumley at u.washington.edu     University of Washington, Seattle

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
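The three points in the reply can be tried directly in R. The sketch below is illustrative rather than taken from the thread: `strict_var()` is a hypothetical function standing in for a statistic that refuses empty input (whether sd() or var() themselves signal an error on empty input depends on the R version), and falling back to NaN is just one possible choice.

```r
# 1. Three-valued logic: the result is known whenever the missing
#    operand cannot change it.
NA & FALSE   # FALSE whatever the missing value is
NA & TRUE    # NA, because the result depends on the missing value

# 2. Why max() of nothing is -Inf: splitting the data must not change
#    the answer, even when one part is empty.
a <- c(3, 7, 2)
b <- numeric(0)
split_max <- suppressWarnings(max(a, max(b)))  # max(b) alone is -Inf, with a warning
stopifnot(identical(split_max, max(a, b)))     # both give the same maximum

# 3. A function that refuses to work on empty data, and try() to recover.
#    strict_var() is hypothetical, not a base R function.
strict_var <- function(x) {
  if (length(x) < 2) stop("need at least two values to compute a variance")
  var(x)
}
res <- try(strict_var(b), silent = TRUE)
if (inherits(res, "try-error")) res <- NaN  # treat the result as undefined
res
```

Falling back to NaN rather than NA follows the argument above that an undefined numerical result is different from missing data.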