Dropping unused levels of a factor that has "NA" as a level
It is history:

r16144 | ripley | 2001-09-28 19:40:28 +0100 (Fri, 28 Sep 2001) | 2 lines

add is.na<-, distinguish NA level and NA codes in factors

so predates having NA character strings distinct from "NA".
On Tue, 11 Jul 2006, Brahm, David wrote:
I mentioned this in R-help on April 28:
<https://stat.ethz.ch/pipermail/r-help/2006-April/104595.html>

| as.character.factor contains this line (where cx=levels(x)[x]):
|   if ("NA" %in% levels(x)) cx[is.na(x)] <- "<NA>"
|
| Is it possible that this is no longer the desired behavior? These
| two results don't seem very consistent:
|
| > as.character(as.factor(c("AB", "CD", NA)))
| [1] "AB" "CD" NA
| > is.na(.Last.value)[3]
| [1] TRUE
|
| > as.character(as.factor(c("NA", "CD", NA)))
| [1] "NA" "CD" "<NA>"
| > is.na(.Last.value)[3]
| [1] FALSE
|
| I'm using R-2.3.0 on Redhat Linux, but I don't think the behavior
| is new (maybe since character NA's were introduced?).
|
| -- David Brahm (brahm at alum.mit.edu)

-----Original Message-----
From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Peter Dalgaard
Sent: Tuesday, July 11, 2006 5:59 PM
To: J. Hosking
Cc: r-devel at stat.math.ethz.ch
Subject: Re: [Rd] Dropping unused levels of a factor that has "NA" as a level

"J. Hosking" <jh910 at juno.com> writes:
Is this a bug?
> f1 <- factor(c("a", NA), levels = c("a", "NA") )
> f2 <- f1[, drop = TRUE]
> f2
[1] a <NA>
Levels: a <NA>
I would have expected f2 to have only one level, "a". It seems
to me that the code in [.factor does not follow the advice in
help("factor") on how to set factor codes to be missing when
"NA" is a level of the factor.
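
For readers following along, the advice in help("factor") referred to here is, as far as I can tell, the use of is.na on the left-hand side of an assignment. The sketch below is only an illustration of that idiom, not code from the original post; the factor f and the chosen index are made up for the example.

```r
# Illustrative factor in which the string "NA" is a genuine level.
f <- factor(c("a", "NA", "a"), levels = c("a", "NA"))

# Mark the second element as missing via is.na<-, which sets the
# underlying code to NA rather than pointing it at the "NA" level.
is.na(f)[2] <- TRUE

is.na(f)       # which codes are now missing
levels(f)      # the level set itself is left unchanged
as.integer(f)  # the underlying integer codes
```

The point of the idiom is that a missing code and the level named "NA" stay distinguishable, which is the distinction the drop = TRUE example above appears to lose.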
Something odd is going on, that's for sure... The problem is also there with factor(f1). And the logic in as.character.factor seems to be at the root of it:
as.character.factor
function (x, ...)
{
    cx <- levels(x)[x]
    if ("NA" %in% levels(x))
        cx[is.na(x)] <- "<NA>"
    cx
}
This looks like something from before we had character NA values. I
wonder if it is a mistake or there could actually be a reason to
keep it.
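
A minimal sketch of why that logic would produce the extra level, assuming the function body quoted above. The name old_as_character_factor is hypothetical and is used only so the example does not depend on what as.character.factor does in any particular R version.

```r
# Hypothetical stand-in reproducing the quoted as.character.factor body.
old_as_character_factor <- function(x) {
  cx <- levels(x)[x]            # look up level labels by integer code
  if ("NA" %in% levels(x))
    cx[is.na(x)] <- "<NA>"      # missing codes become the string "<NA>"
  cx
}

f1 <- factor(c("a", NA), levels = c("a", "NA"))
cx <- old_as_character_factor(f1)

cx                    # the missing value is now an ordinary string
is.na(cx)             # so nothing is reported as missing any more
levels(factor(cx))    # and re-factoring turns "<NA>" into a real level
```

Because the missing code has been turned into a non-missing string, any code path that rebuilds the factor from this character vector will treat "<NA>" as a level in its own right, which would match the two-level result reported for f2.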
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595