partial matching of row names in [-indexing
Although it is implicit, I don't think anyone has mentioned that partial matching of row names only applies if the row name is matched uniquely, as in:
X <- data.frame(a=1:3, b=letters[1:3], row.names=c("A1", "B", "C"))
X["A", ]
   a b
A1 1 a
If it matches two or more rows, you get:
X <- data.frame(a=1:3, b=letters[1:3], row.names=c("A1", "A2", "C"))
X["A", ]
    a    b
NA NA <NA>
just as you would get if there is no match:
X["A3", ]
    a    b
NA NA <NA>
So the current behavior also depends on which other, similar row names exist; that is, code that works at one point might break when new data are added to the data frame. I think this behavior stems from someone finding it handy when working with data.frames interactively, but it is an error-prone property when it comes to production code (scripts, packages, and dynamic documents). To me, this behavior should be phased out of R to avoid silent errors and false scientific results. It's not clear to me how best to deprecate the partial matching, because of the default behavior of returning NAs when there is no match. This means it can't be just a warning or an error. My $.03 /Henrik
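A minimal sketch of why the result depends on the other row names. It assumes (as the R sources of `[.data.frame` appear to do) that a character row index is resolved with `pmatch(i, row.names, duplicates.ok = TRUE)`: a unique prefix is a hit, while an ambiguous prefix gives NA, exactly like no match at all. The object names below are illustrative only.

```r
# Partial matching as applied to character row indices
# (assumed to be pmatch(..., duplicates.ok = TRUE) inside `[.data.frame`).
rows_unique    <- c("A1", "B", "C")
rows_ambiguous <- c("A1", "A2", "C")

pmatch("A", rows_unique, duplicates.ok = TRUE)      # 1: "A" is a unique prefix of "A1"
pmatch("A", rows_ambiguous, duplicates.ok = TRUE)   # NA: "A" prefixes both "A1" and "A2"
pmatch("A3", rows_ambiguous, duplicates.ok = TRUE)  # NA: no match at all

# The same lookup through data frames: adding a row named "A2"
# silently turns a hit into NA.
X1 <- data.frame(a = 1:3, b = letters[1:3], row.names = rows_unique)
X2 <- data.frame(a = 1:3, b = letters[1:3], row.names = rows_ambiguous)
X1["A", "a"]  # 1
X2["A", "a"]  # NA
```

Nothing in the calling code changed between `X1` and `X2`; only the set of row names did, which is the fragility described above.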
On Fri, Jan 14, 2022 at 6:55 PM Ben Bolker <bbolker at gmail.com> wrote:
Makes sense if you realize that ?"[" only applies to *vector*,
*list*, and *matrix* indexing and that data frames follow their own
rules that are documented elsewhere ...
So yes, not a bug, but I claim it's an infelicity. I might submit a
doc patch.
FWIW
b["A1",]
as.matrix(b)["A1",]
illustrates the difference.
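To make this comparison concrete without stopping at the first error, here is a small sketch that rebuilds `b` from the example quoted further down the thread and wraps the matrix lookup in `tryCatch()`. The data-frame method partially matches "A1" to "A10", while matrix indexing requires an exact match of the dimnames and fails. The variable names `df_result` and `mat_result` are just for illustration.

```r
# `b` as defined in the example quoted later in this thread.
b <- as.data.frame(matrix(4:5, ncol = 1, nrow = 2,
                          dimnames = list(c("A10", "B"), "V1")))

# Data-frame method: "A1" is a unique prefix of "A10", so it is partially matched.
df_result <- b["A1", ]

# Matrix method: character subscripts must match dimnames exactly,
# so this signals an error; capture its message instead of stopping.
mat_result <- tryCatch(as.matrix(b)["A1", ],
                       error = function(e) conditionMessage(e))

df_result   # 4
mat_result  # "subscript out of bounds"
```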
thanks
Ben
On 1/14/22 9:19 PM, Steve Martin wrote:
I don't think this is a bug in the documentation. The help page for
`?[.data.frame` has the following in the last paragraph of the
details:
Both [ and [[ extraction methods partially match row names. By default
neither partially match column names, but [[ will if exact = FALSE
(and with a warning if exact = NA). If you want to exact matching on
row names use match, as in the examples.
The example it refers to is
sw <- swiss[1:5, 1:4] # select a manageable subset
sw["C", ] # partially matches
sw[match("C", row.names(sw)), ] # no exact match
Whether this is good behaviour or not is a different question, but the
documentation seems clear enough (to me, at least).
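Following the help page's advice to use `match()`, one possible strict helper is sketched below. It is not part of base R: `rows_exact()` is a hypothetical name, and stopping with an error on unmatched names (rather than returning NA rows) is an assumed design choice aimed at the silent-NA concern raised earlier in the thread.

```r
# Hypothetical helper: select data-frame rows by exact row name only.
# Unlike df[i, ], it never partially matches and errors on unknown names.
rows_exact <- function(df, keys) {
  stopifnot(is.data.frame(df), is.character(keys))
  idx <- match(keys, row.names(df))  # exact matching only
  if (anyNA(idx)) {
    stop("no exact row-name match for: ",
         paste(keys[is.na(idx)], collapse = ", "))
  }
  df[idx, , drop = FALSE]  # integer indices, so no partial matching is involved
}

sw <- swiss[1:5, 1:4]
rows_exact(sw, "Courtelary")  # one row, exact hit
try(rows_exact(sw, "C"))      # error instead of a partial match
```

Indexing with the integer positions returned by `match()` sidesteps the character-index code path entirely, which is why the final `df[idx, ]` cannot partially match.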
Best,
Steve
On Fri, 14 Jan 2022 at 20:40, Ben Bolker <bbolker at gmail.com> wrote:
People are often surprised that row-indexing a data frame by [ +
character does partial matching (and annoyed that there is no way to
turn it off):
https://stackoverflow.com/questions/18033501/warning-when-partial-matching-rownames
https://stackoverflow.com/questions/34233235/r-returning-partial-matching-of-row-names
https://stackoverflow.com/questions/70716905/why-does-r-have-inconsistent-behaviors-when-a-non-existent-rowname-is-retrieved
?"[" says:
Character indices can in some circumstances be partially matched
(see ‘pmatch’) to the names or dimnames of the object being
subsetted (but never for subassignment). UNLIKE S (Becker et al.
p. 358), R NEVER USES PARTIAL MATCHING WHEN EXTRACTING BY ‘[’, and
partial matching is not by default used by ‘[[’ (see argument
‘exact’).
(EMPHASIS ADDED).
Looking through the rest of that page, I don't see any other text that
modifies or supersedes that statement.
Is this a documentation bug?
The example given in one of the links above:
b <- as.data.frame(matrix(4:5, ncol = 1, nrow = 2, dimnames =
list(c("A10", "B"), "V1")))
b["A1",] ## 4 (partial matching)
b[rownames(b) == "A1",] ## logical(0)
b["A1", , exact=TRUE] ## unused argument error
b$V1[["A1"]] ## subscript out of bounds error
b$V1["A1"] ## NA
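None of the calls above switches off partial matching while keeping `[` syntax. As a sketch of the "warning when partial matching" idea from the first Stack Overflow link, the hypothetical function `partial_row_hits()` compares exact and partial lookups and reports the character indices that `[.data.frame` would resolve only by partial matching. It relies on the same assumption that `[.data.frame` uses `pmatch(..., duplicates.ok = TRUE)`, and it is not part of base R.

```r
# Hypothetical helper: which character row indices would `[.data.frame`
# resolve only through partial matching?
partial_row_hits <- function(df, i) {
  rn <- row.names(df)
  exact   <- match(i, rn)                           # exact lookups
  partial <- pmatch(i, rn, duplicates.ok = TRUE)    # exact or unique-prefix lookups
  i[is.na(exact) & !is.na(partial)]
}

b <- as.data.frame(matrix(4:5, ncol = 1, nrow = 2,
                          dimnames = list(c("A10", "B"), "V1")))
partial_row_hits(b, c("A1", "A10", "B", "Z"))  # "A1"

# Warn before indexing, then index with exact positions instead.
idx <- c("A1", "B")
hits <- partial_row_hits(b, idx)
if (length(hits) > 0) {
  warning("partial row-name match for: ", paste(hits, collapse = ", "))
}
b[match(idx, row.names(b)), ]
```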
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
-- Dr. Benjamin Bolker Professor, Mathematics & Statistics and Biology, McMaster University Director, School of Computational Science and Engineering (Acting) Graduate chair, Mathematics & Statistics