matrix subset problem with factors

3 messages · ঋষি ( ऋषि / rIsHi ), Marc Schwartz, Jeff Newmiller

#
Hi All,

I would like to report what looks like a bug in matrix subsetting by row
names when the index is passed as a factor. Factors may not be safe to use
as an index, but in that case R should at least generate a warning. We often
use values returned by packages as factors to subset a matrix, and this can
silently produce a wrong calculation.

If a factor is not expected as an index in matrix operations, I would like R
to throw an error or warning.

Below is the code to reproduce it.
c("A","B","C")))
could be overlooked
[1] X Z
Levels: X Z
A B C
X 1 4 7
Y 2 5 8
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.1

  
    
#
Hi,

I get the same behavior in R 3.5.2 on macOS.

Others may feel differently, but I am not sure that this is a bug. It may instead point to a need to clarify in ?Extract that the following passage, found under "Atomic vectors":

"The index object i can be numeric, logical, character or empty. Indexing by factors is allowed and is equivalent to indexing by the numeric codes (see factor) and not by the character values which are printed (for which use [as.character(i)])."

also applies to the indexing of matrices and arrays.

Since matrices and arrays in R are vectors with 'dim' attributes, the behavior is essentially consistent with the description above.

Thus, perhaps just add the second sentence above or similar wording to the section for Matrices and arrays.
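
For readers following along, here is a minimal sketch of the behavior quoted from ?Extract. The matrix `m` and factor `f` are an illustrative reconstruction chosen to match the output shown earlier in the thread; they are not the original poster's exact code.

```r
# Illustrative reconstruction only (assumed, not the original code).
m <- matrix(1:9, nrow = 3,
            dimnames = list(c("X", "Y", "Z"), c("A", "B", "C")))

f <- factor(c("X", "Z"))   # levels are X, Z, so the integer codes are 1, 2

# Indexing by the factor uses the integer codes 1 and 2, i.e. rows X and Y.
by_factor <- m[f, ]

# Converting to character first indexes by the printed labels, rows X and Z.
by_name <- m[as.character(f), ]

print(as.integer(f))
print(by_factor)
print(by_name)
```

The only difference between the two calls is the explicit `as.character()`, which is the remedy ?Extract itself suggests.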

Regards,

Marc Schwartz
#
With no official weight, I second the opinion that the existing behavior is appropriate and not a bug.

Functions should not "unexpectedly" return factors... a common example is the read.table family of functions, which by default return factors, but the behaviour is deterministic and controllable with the as.is or stringsAsFactors arguments. If you have functions that randomly return different types then the bug is in those functions.
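
As a small sketch of that control, the snippet below reads the same text three ways. The inline data `txt` and its column names are made up for illustration. Both arguments are set explicitly because the default differs by version: from R 4.0.0 onward, `stringsAsFactors` defaults to `FALSE`, whereas the versions discussed in this thread default to `TRUE`.

```r
# Hypothetical inline data, used only to show the arguments' effect.
txt <- "id value\nX 1\nZ 3"

# Explicitly request factors for string columns.
as_factor <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

# Explicitly keep string columns as character vectors.
as_char <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# as.is = TRUE also suppresses the conversion to factor.
as_is <- read.table(text = txt, header = TRUE, as.is = TRUE)

print(class(as_factor$id))
print(class(as_char$id))
print(class(as_is$id))
```

Setting the argument explicitly, rather than relying on the default, keeps the column type predictable across R versions.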

Don't confuse factors and character data types... they are distinct and used for different purposes.
On February 20, 2019 12:59:54 PM PST, Marc Schwartz via R-help <r-help at r-project.org> wrote: