
Bug in tapply with factors containing NAs (PR#6672)

3 messages · george.leigh@dpi.qld.gov.au, Brian Ripley, Peter Dalgaard

#
Full_Name: George Leigh
Version: 1.8.1
OS: Windows 2000
Submission from: (NULL) (203.25.1.208)


The following example gives the correct answer when the first argument of tapply
is a numeric vector, but an incorrect answer when it is a factor.  If the
function used by tapply is "length", the type and contents of the first argument
should make no difference, provided it has the same length as the second
argument.
1 
1
1 
2
#
On Mon, 15 Mar 2004 george.leigh@dpi.qld.gov.au wrote:

            
Not so:
$"1"
[1] 1
$"1"
[1] <NA> 1
Levels: 1

Note that as there is only one level, NA must be 1 in y, whereas it does
not have to be in x.  So the answer for a factor in your problem is
definitely correct, if fortuitous.

R does the same as S in this example.

If there were more than one level in y, the issue would be less clear-cut.
Probably y[[k]] <- x[f == k] in split.default should be y[[k]] <- x[f %in% k].
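
To illustrate the difference between the two subscripts, here is a minimal sketch. The vectors x and f are made up for illustration and are not the objects from the report. Comparing a factor with == propagates NA, and an NA in a logical subscript selects a missing element, whereas %in% never returns NA.

```r
# Illustrative data (assumed, not taken from the original report).
x <- c(10, 20, 30)
f <- factor(c("a", NA, "b"))

eq  <- f == "a"     # TRUE NA FALSE: the comparison propagates NA
inn <- f %in% "a"   # TRUE FALSE FALSE: %in% never yields NA

x[eq]    # 10 NA: the NA subscript pulls in a missing element
x[inn]   # 10
```

With ==, the element whose group is unknown is carried into the result as NA; with %in% it is simply left out.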

Note too

z <- x; class(x) <- "foo"
$"1"
[1] NA  1

  
    
#
george.leigh@dpi.qld.gov.au writes:
The core of this is that
$"1"
[1] <NA> 1
Levels: 1
$"1"
[1] 1


which in turn comes from the innards of split.default:

...
    if (is.null(attr(x, "class")) && is.null(names(x)))
        return(.Internal(split(x, f)))
    lf <- levels(f)
    y <- vector("list", length(lf))
    names(y) <- lf
    for (k in lf) y[[k]] <- x[f == k]
    y

Factors have a class attribute, so the internal code is not used in
that case, and the subscript x[f == k] in the loop keeps the NA element:
[1] <NA> 1
Levels: 1 
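
The class test quoted above can be checked directly. This is a small sketch with assumed example data; it only evaluates the condition and says nothing about which branch a particular version of split.default actually takes.

```r
x <- c(NA, 1)     # assumed example data
y <- factor(x)

is.null(attr(x, "class"))   # TRUE: a plain numeric vector has no class attribute
is.null(attr(y, "class"))   # FALSE: a factor carries a class attribute
attr(y, "class")            # "factor"
```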

I think the line in split.default needs to read

    for (k in lf) y[[k]] <- x[!is.na(f) & f == k]
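
A rough way to compare the two versions of the loop is sketched below. The function split_by_loop and its fixed argument are hypothetical stand-ins written for this illustration, not the actual R source, and the data are assumed.

```r
# Hypothetical stand-in for the interpreted loop in split.default;
# fixed = FALSE uses x[f == k], fixed = TRUE uses the proposed guard.
split_by_loop <- function(x, f, fixed = TRUE) {
  f <- as.factor(f)
  lf <- levels(f)
  y <- vector("list", length(lf))
  names(y) <- lf
  for (k in lf) {
    keep <- if (fixed) !is.na(f) & f == k else f == k
    y[[k]] <- x[keep]
  }
  y
}

x <- c(NA, 1, 2, 2)   # assumed example data
f <- factor(x)

sapply(split_by_loop(f, f, fixed = FALSE), length)  # "1" = 2, "2" = 3: the NA lands in every group
sapply(split_by_loop(f, f, fixed = TRUE), length)   # "1" = 1, "2" = 2: the NA is dropped
```

With more than one level the unguarded subscript puts the NA element into every group, which is the less clear-cut case mentioned earlier in the thread.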