
factor(x, exclude=NULL) for factor x; names in as.factor(<integer>)

2 messages · Suharto Anggono, Martin Maechler

#
In R 3.3.0 (also in R 2.7.2), the "Details" section of the documentation for 'factor' contains this statement:
'factor(x, exclude = NULL)' applied to a factor is a no-operation unless there are unused levels: in that case, a factor with the reduced level set is returned.

This is not true for a factor 'x' that contains NA values. In that case, if the levels of 'x' don't include NA, factor(x, exclude = NULL) adds NA as a level.
If the levels of a factor 'x' don't include NA, factor(x) is a no-operation when all levels are used.
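
The point can be checked with a small sketch. The vectors 'x', 'y' and 'z' below are illustrative names, not taken from the documentation, and the commented results are what I would expect from the behavior described above.

```r
# A factor with an NA value but no NA level; both levels are in use.
x <- factor(c("a", "b", NA))
levels(x)                          # "a" "b"

# exclude = NULL turns the NA value into an extra level, so this is not a no-op.
y <- factor(x, exclude = NULL)
levels(y)                          # "a" "b" NA
identical(y, x)                    # FALSE

# With the default 'exclude', factor(x) leaves x unchanged.
identical(factor(x), x)            # TRUE

# Unused levels are dropped, as the documentation says.
z <- factor(c("a", "b"))[1]        # level "b" is unused
levels(factor(z, exclude = NULL))  # "a"
```

So the documented no-operation appears to describe factor(x) with the default 'exclude' rather than factor(x, exclude = NULL), at least when 'x' has NA values but no NA level.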


In R 3.3.0 (also in R 3.1.3), for a named integer 'x', factor(x) has names and as.factor(x) doesn't. It would be better if the behavior on names were matched.
> x <- integer(1)
> names(x) <- "a"
> names(factor(x))
[1] "a"
> names(as.factor(x))
NULL
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.3.0
#
    > In R 3.3.0 (also in R 2.7.2), the documentation on 'factor', in "Details" section, has this statement.
    > 'factor(x, exclude = NULL)' applied to a factor is a no-operation unless there are unused levels: in that case, a factor with the reduced level set is returned.

    > It is not true for a factor 'x' that has NA. In that case, if levels of 'x' doesn't contain NA, factor(x, exclude = NULL) adds NA as a level.
    > If levels of a factor 'x' doesn't contain NA, factor(x) is a no-operation if all levels are used.

So we should fix the documentation (only!), right?

-------- -------

    > In R 3.3.0 (also in R 3.1.3), for a named integer 'x', factor(x) has names and as.factor(x) doesn't. It would be better if the behavior on names were matched.

I agree, for consistency with the named "double" case (and
also for consistency with earlier versions of R):
this is indeed a bug, present only in R versions >= 3.1.0.

Another MRE is (note that '0' is "double"):
[1] 1 2
Levels: 1 2
one two 
  1   2 
Levels: 1 2
one two 
  1   2 
Levels: 1 2
one two 
  1   2 
Levels: 1 2
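
A sketch of commands that give output of this shape, assuming a named integer vector and its double counterpart. The names 'ix', 'x' and 'res' are hypothetical and not taken from the original message; only the comparison between the integer and the double case is the point.

```r
# Hypothetical inputs: a named integer vector and the same values as double.
ix <- c(one = 1L, two = 2L)
x  <- ix + 0   # adding the double 0 coerces to double and keeps the names

stopifnot(is.integer(ix), is.double(x), identical(names(x), names(ix)))

# Collect the names after each conversion; as.factor(ix) is the case in question.
res <- list(
  as.factor_integer = names(as.factor(ix)),
  as.factor_double  = names(as.factor(x)),
  factor_integer    = names(factor(ix)),
  factor_double     = names(factor(x))
)
str(res)
```

In the affected versions (3.1.0 up to 3.3.0, going by this thread) the first entry should be NULL while the other three carry the names; a version in which as.factor() has been changed to keep names should show the same names in all four.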
    >> x <- integer(1)
    >> names(x) <- "a"
    >> names(factor(x))
    > [1] "a"
    >> names(as.factor(x))
    > NULL
    >> sessionInfo()
    > R version 3.3.0 (2016-05-03)
    > Platform: i386-w64-mingw32/i386 (32-bit)
    > Running under: Windows XP (build 2600) Service Pack 2

    > locale:
    > [1] LC_COLLATE=English_United States.1252
    > [2] LC_CTYPE=English_United States.1252
    > [3] LC_MONETARY=English_United States.1252
    > [4] LC_NUMERIC=C
    > [5] LC_TIME=English_United States.1252

    > attached base packages:
    > [1] stats     graphics  grDevices utils     datasets  methods   base

    > loaded via a namespace (and not attached):
    > [1] tools_3.3.0

    > ______________________________________________
    > R-devel at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel