partial matching of row names in [-indexing

Although it is implicit, what I don't think anyone has mentioned is
that the partial matching of row names only applies if the row name is
uniquely matched, as in:
X <- data.frame(a=1:3, b=letters[1:3], row.names=c("A1", "B", "C"))
X["A", ]
   a b
A1 1 a

If it matches two or more rows, you get:
X <- data.frame(a=1:3, b=letters[1:3], row.names=c("A1", "A2", "C"))
X["A", ]
    a    b
NA NA <NA>

just as you would get if there is no match:
X["A3", ]
    a    b
NA NA <NA>

So the current behavior also depends on which other, similar row names
exist. That is, code that works at one point might break when new data
are added to the data frame.
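A minimal sketch of this, reusing the first `X` from above and adding
a made-up fourth row "A2". It also calls `pmatch()` directly, assuming
(from my reading of the R sources) that `[.data.frame` resolves
character row indices with `pmatch()`:

```r
# "A" is a unique prefix of "A1", so the lookup silently succeeds
X <- data.frame(a = 1:3, b = letters[1:3], row.names = c("A1", "B", "C"))
r1 <- X["A", ]    # row "A1"

# Adding a row whose name also starts with "A" makes the same lookup ambiguous
X2 <- rbind(X, data.frame(a = 4L, b = "d", row.names = "A2"))
r2 <- X2["A", ]   # all-NA row, same as "no match"

# The underlying rule: a unique partial match wins, an ambiguous one is NA
pmatch("A", c("A1", "B", "C"))          # 1
pmatch("A", c("A1", "B", "C", "A2"))    # NA
```

Nothing in the code that does `X["A", ]` changed; only the data did.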

I think this behavior stems from someone finding it handy when working
with data frames interactively.  I think it's an error-prone property
when it comes to production code (scripts, packages, and dynamic
documents).  To me, this behavior should be phased out from R to avoid
silent errors and false scientific results.  It's not clear to me how
best to deprecate the partial matching, because of the default behavior
of returning NAs when there is no match.  This means it can't be just a
warning or an error.
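To make the deprecation question concrete, here is a minimal sketch of
a user-level wrapper that does the same lookup as `[.data.frame` but
warns when a name was found only by partial matching.
`rows_warn_partial()` is a hypothetical helper, not part of base R. It
assumes `[.data.frame` uses `pmatch(..., duplicates.ok = TRUE)` for
character row indices:

```r
# Hypothetical helper, not part of base R: same row lookup as `[.data.frame`,
# but warn whenever a requested name only matched partially.
rows_warn_partial <- function(df, names) {
  rn <- row.names(df)
  exact <- match(names, rn)                           # exact hits only
  partial <- pmatch(names, rn, duplicates.ok = TRUE)  # assumed rule used by `[`
  only_partial <- is.na(exact) & !is.na(partial)
  if (any(only_partial)) {
    warning("row name(s) matched only partially: ",
            paste(names[only_partial], collapse = ", "))
  }
  df[partial, , drop = FALSE]
}

X <- data.frame(a = 1:3, b = letters[1:3], row.names = c("A1", "B", "C"))
rows_warn_partial(X, "B")   # exact match, no warning
rows_warn_partial(X, "A")   # row "A1", with a warning
rows_warn_partial(X, "Z")   # all-NA row, and still no warning
```

The last call shows the difficulty: once partial matching is removed, a
formerly partial match falls into the silent "no match gives NA" case.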

My $.03

/Henrik
   Makes sense if you realize that ?"[" only applies to *vector*,
*list*, and *matrix* indexing and that data frames follow their own
rules that are documented elsewhere ...

   So yes, not a bug but I claim it's an infelicity. I might submit a
doc patch.

  FWIW

b["A1",]
as.matrix(b)["A1",]

  illustrates the difference.
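  A self-contained sketch of that difference, rebuilding `b` from the
original example further down and catching the matrix error with
`tryCatch()` so the script keeps running:

```r
b <- as.data.frame(matrix(4:5, ncol = 1, nrow = 2,
                          dimnames = list(c("A10", "B"), "V1")))

# Data frame method: "A1" partially matches row "A10"
df_result <- b["A1", ]   # 4

# Matrix method: exact matching only, so "A1" is an unknown row name
m_result <- tryCatch(as.matrix(b)["A1", ],
                     error = function(e) conditionMessage(e))
m_result                 # "subscript out of bounds"
```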

  thanks
    Ben

On 1/14/22 9:19 PM, Steve Martin wrote:
I don't think this is a bug in the documentation. The help page for
`?[.data.frame` has the following in the last paragraph of the
details:

Both [ and [[ extraction methods partially match row names. By default
neither partially match column names, but [[ will if exact = FALSE
(and with a warning if exact = NA). If you want to exact matching on
row names use match, as in the examples.

The example it refers to is

sw <- swiss[1:5, 1:4]  # select a manageable subset
sw["C", ] # partially matches
sw[match("C", row.names(sw)), ] # no exact match
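The `match()` idiom can be wrapped so that an unknown row name becomes
an error rather than a silent all-NA row. A minimal sketch;
`rows_exact()` is a hypothetical name, not a base R function:

```r
# Hypothetical helper, not part of base R: exact row-name lookup that
# fails loudly instead of returning an all-NA row.
rows_exact <- function(df, names) {
  idx <- match(names, row.names(df))   # exact matching only
  if (anyNA(idx)) {
    stop("unknown row name(s): ", paste(names[is.na(idx)], collapse = ", "))
  }
  df[idx, , drop = FALSE]
}

sw <- swiss[1:5, 1:4]
rows_exact(sw, "Courtelary")   # one-row data frame
# rows_exact(sw, "C")          # error: unknown row name(s): C
```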

Whether this is good behaviour or not is a different question, but the
documentation seems clear enough (to me, at least).

Best,
Steve

On Fri, 14 Jan 2022 at 20:40, Ben Bolker <bbolker at gmail.com> wrote:

    People are often surprised that row-indexing a data frame by [ +
character does partial matching (and annoyed that there is no way to
turn it off):

https://stackoverflow.com/questions/18033501/warning-when-partial-matching-rownames

https://stackoverflow.com/questions/34233235/r-returning-partial-matching-of-row-names

https://stackoverflow.com/questions/70716905/why-does-r-have-inconsistent-behaviors-when-a-non-existent-rowname-is-retrieved

?"[" says:

Character indices can in some circumstances be partially matched
       (see ‘pmatch’) to the names or dimnames of the object being
       subsetted (but never for subassignment).  UNLIKE S (Becker et al.
       p. 358), R NEVER USES PARTIAL MATCHING WHEN EXTRACTING BY ‘[’, and
       partial matching is not by default used by ‘[[’ (see argument
       ‘exact’).

(EMPHASIS ADDED).

Looking through the rest of that page, I don't see any other text that
modifies or supersedes that statement.
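The contrast between that statement and the data frame method is easy
to check. A minimal sketch with made-up objects `l`, `v` and `d`:

```r
# Named list: `[` never partially matches; `[[` does only with exact = FALSE
l <- list(abc = 1)
l["ab"]                    # list with one NULL element named <NA>
l[["ab"]]                  # NULL
l[["ab", exact = FALSE]]   # 1

# Named atomic vector: `[` uses exact matching as well
v <- c(abc = 1)
v["ab"]                    # NA

# Data frame rows: `[.data.frame` partially matches row names anyway
d <- data.frame(x = 1, row.names = "abc")
d["ab", ]                  # 1
```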

    Is this a documentation bug?

The example given in one of the links above:

b <- as.data.frame(matrix(4:5, ncol = 1, nrow = 2, dimnames =
list(c("A10", "B"), "V1")))

b["A1",]  ## 4 (partial matching)
b[rownames(b) == "A1",]  ## integer(0)
b["A1", , exact=TRUE]    ## unused argument error
b$V1[["A1"]] ## subscript out of bounds error
b$V1["A1"]   ## NA

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
