
once more: methods on missing data

3 messages · Maciej.Hoffman-Wecker@evotecoai.com, Thomas Lumley, Achim Zeileis

#
Thanks for the replies, but I was not precise enough.

The problem is not evaluating statistics on data with NA values.
The problem is evaluation of statistics on data with length = 0.

To make the problem clearer, this is what I tried:

This works fine:

     tapply(as.numeric(c(NA,2)), as.factor(c("a","b")), summary)

But I need SDev as well, so I copied summary.default to my.summary and
changed only the lines

        qq <- signif(c(qq[1:3], mean(object), qq[4:5]), digits)
        names(qq) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.",
                       "Max.")
to

        qq <- signif(c(qq[1:3], mean(object), qq[4:5], sd(object),
                       mad(object)), digits)
        names(qq) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.",
                       "Max.", "SDev", "MAD")

and

     tapply(as.numeric(c(NA,2)), as.factor(c("a","b")), my.summary)

results in

     Error in var(x, na.rm = na.rm) : `x' is empty

I think this is a frequent problem. It results from the following.

The result of the call

     x <- as.numeric(c(NA,NA,NA)); STATISTIC(x[!is.na(x)])

depends on the STATISTIC.

     STATISTIC           RESULT
     min                 Inf and warning message
     max                 -Inf and warning message
     mean                NaN and no warning message
     quantile            named vector containing NAs and no warning message
     sd                  evaluation aborts with an error message
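
The table above can be reproduced with a small probe. This is only a sketch: `probe` is a hypothetical helper, not part of R, and the outcome for sd depends on the R version in use, so no result is claimed for it here.

```r
# Apply several statistics to a zero-length numeric vector and record
# whether each one returns a value, warns, or stops with an error.
x <- as.numeric(c(NA, NA, NA))
empty <- x[!is.na(x)]          # numeric(0)

# Hypothetical helper: classify what f(v) does.
probe <- function(f, v) {
  tryCatch(
    list(outcome = "value", result = f(v)),
    warning = function(w) list(outcome = "warning", result = conditionMessage(w)),
    error   = function(e) list(outcome = "error",   result = conditionMessage(e))
  )
}

stats <- list(min = min, max = max, mean = mean, quantile = quantile, sd = sd)
outcomes <- lapply(stats, probe, v = empty)

# Print one line per statistic: name and kind of outcome.
for (nm in names(outcomes)) {
  cat(sprintf("%-9s %s\n", nm, outcomes[[nm]]$outcome))
}
```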

The abort is more difficult to handle than the other results.

What I did was change the var function. I changed

     .Internal(cov(x, y, na.method))
to
     z <- try(.Internal(cov(x, y, na.method)))
     if (inherits(z, "try-error")) return(as.numeric(NA))
     else return(z)

This works fine, but I think a solution within cov.c would be better.
I try not to change standard source code myself, as I don't know whether
this has any consequences.
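
An alternative that leaves var untouched is to guard the call in user code. The following is a minimal sketch; `na_if_empty`, `safe_sd` and `safe_mad` are hypothetical names, not existing R functions.

```r
# Hypothetical helper: wrap a statistic so that it drops NAs first and
# returns NA for zero-length input instead of calling the statistic.
na_if_empty <- function(f) {
  function(x, ...) {
    x <- x[!is.na(x)]
    if (length(x) == 0L) return(NA_real_)
    f(x, ...)
  }
}

safe_sd  <- na_if_empty(sd)
safe_mad <- na_if_empty(mad)

all_missing <- safe_sd(as.numeric(c(NA, NA, NA)))  # no error, gives NA
some_data   <- safe_sd(c(1, 2, 3, NA))             # sd of 1, 2, 3
```

The wrapped functions could then replace sd(object) and mad(object) in my.summary.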

Should not the statistics generally return NA and a warning message?

I hope this is not too marginal a problem.

Maciej



-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Thu, 7 Jun 2001 Maciej.Hoffman-Wecker at evotecoai.com wrote in part:
<snip>
Ideally, they shouldn't.  NA is missing data -- that is, we don't know the
value of the statistic because some data were not measured. That's why,
for example, NA & FALSE is FALSE, not NA, because the value of the
expression is known, no matter what the first operand is.
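
A short illustration of this rule for logical operators (the variable names are only for the example):

```r
# The result is known whenever the other operand decides it.
and_false <- NA & FALSE   # FALSE: false regardless of the missing value
or_true   <- NA | TRUE    # TRUE:  true regardless of the missing value

# The result stays missing when it depends on the unknown value.
and_true  <- NA & TRUE    # NA
or_false  <- NA | FALSE   # NA
```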

The results for min() and max() have the rationale that e.g. max(a,max(b))
should return the same as max(a,b) even when b is empty. There are even some
examples where this is genuinely helpful.
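
A small sketch of that identity, with made-up values for a:

```r
a <- c(3, 7, 5)
b <- numeric(0)

# max(b) is -Inf (with a warning), the identity element for max,
# so nesting the call does not change the overall result.
nested <- max(a, suppressWarnings(max(b)))
flat   <- max(a, b)

same <- identical(nested, flat)
```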

If the others were to return a value I think NaN (undefined numerical
result) would be better than NA (missing data), as is the case with
mean(). This would argue for changing the return value of quantile() as
well.

However, I think it's reasonable for a function to refuse to calculate the
variance of no data. We do have try() to handle errors if needed.
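
For instance, try() can be used in user code rather than inside var. This is only a sketch: `try_stat` and `always_fails` are hypothetical names, and `always_fails` merely stands in for a statistic that refuses empty input.

```r
# Hypothetical wrapper: evaluate stat(x); if it signals an error,
# return NA instead of stopping the surrounding computation.
try_stat <- function(stat, x) {
  res <- try(stat(x), silent = TRUE)
  if (inherits(res, "try-error")) NA_real_ else res
}

# Stand-in for a statistic that stops on empty input.
always_fails <- function(x) stop("`x' is empty")

failed <- try_stat(always_fails, numeric(0))  # NA instead of an error
worked <- try_stat(mean, c(1, 2, 3))          # ordinary result
```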

	-thomas

Thomas Lumley			Asst. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

#
OK, now I see your problem. R 1.2.3 gives the following for your
example:
     Error in var(x, na.rm = na.rm) : `x' is empty

but this seems to be fixed in the current development version (the
forthcoming R 1.3.0):
[1] NA

I'm not sure what difference in the code is responsible for this, but I
hope this helps. My respective systems are given below.
Achim


---------------------------
Achim Zeileis
Institut für Statistik
Technische Universität Wien


platform sparc-sun-solaris2.7
arch     sparc               
os       solaris2.7          
system   sparc, solaris2.7   
status                       
major    1                   
minor    2.3                 
year     2001                
month    04                  
day      26                  
language R   

platform i686-pc-linux-gnu           
arch     i686                        
os       linux-gnu                   
system   i686, linux-gnu             
status   Under development (unstable)
major    1                           
minor    3.0                         
year     2001                        
month    03                          
day      20                          
language R