Row subsetting of data frames (PR#425)
From: plummer@iarc.fr
Date: Wed, 9 Feb 2000 18:38:12 +0100 (MET)
To: r-devel@stat.math.ethz.ch
Subject: [Rd] Row subsetting of data frames (PR#425)
CC: R-bugs@biostat.ku.dk
If you want to use row names to take a row subset of a data.frame then
there is a bug when
- One row has a name which is a completion of another row name
- The shorter name comes after the longer one
- You want to retrieve the row with the shorter name
An example:
R> x <- matrix(1:4, 2, 2, dimnames=list(c("abc","ab"), c("cde","cd")))
R> x
    cde cd
abc   1  3
ab    2  4
R> x["ab",] #Works OK for matrices
cde  cd
  2   4
R> y <- as.data.frame(x)
R> y["ab",] #but not for a data frame
    cde cd
abc   1  3
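The report above can be turned into a small self-check. This is only a sketch: the variable names `res` and `got_exact_row` are made up for illustration, and the result depends on the R version being run (an installation with the bug described here would report FALSE).

```r
# Rebuild the example: the longer name "abc" comes before its prefix "ab".
x <- matrix(1:4, 2, 2, dimnames = list(c("abc", "ab"), c("cde", "cd")))
y <- as.data.frame(x)

# Request the row with the shorter name.
res <- y["ab", ]

# TRUE when the exactly named row came back, FALSE when the bug is present.
got_exact_row <- identical(rownames(res), "ab")
print(got_exact_row)
```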
The problem boils down to
pmatch("ab", c("abc", "ab"), duplicates.ok = T)
[1] 1
and the code expects 2, which is what S gives. R's description of
the argument,

  duplicates.ok: should duplicate matches be allowed?

  If there are multiple matches the result depends on the value of
  `duplicates.ok'. If this is false multiple matches will result in
  the value of `nomatch' being returned, and if it is true, the
  index of the first matching value will be returned.

is different from S: in S the argument refers to allowing
duplicates in x, so
pmatch(rep("ab",3), c("abc", "ab"), duplicates.ok = T)
[1] 2 2 2
pmatch(rep("ab",3), c("abc", "ab"), duplicates.ok = F)
[1] 2 1 NA
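The two calls above can be run side by side to see what a given R installation does. A minimal sketch, assuming a reasonably recent R in which pmatch() prefers exact matches over partial ones; the names `targets`, `query`, `with_dups` and `without_dups` are illustrative only.

```r
targets <- c("abc", "ab")
query   <- rep("ab", 3)

# Duplicates allowed: every "ab" may reuse the same target.
with_dups <- pmatch(query, targets, duplicates.ok = TRUE)

# Duplicates not allowed: a target that has been matched is no longer
# available to later elements of the query.
without_dups <- pmatch(query, targets, duplicates.ok = FALSE)

print(with_dups)
print(without_dups)
```

If the installation follows the S semantics described above, the first result should agree with the 2 2 2 quoted for S and the second with 2 1 NA.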
A quick fix is to change [.data.frame to give
if (is.character(i))
i <- sapply(i, function(x) match(x, rows))
but I think we should make pmatch S-compatible.
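The idea behind the quick fix, exact matching on row names instead of partial matching, can be tried outside [.data.frame. This is only a sketch under assumptions: `exact_rows` is a hypothetical helper, not part of R, and it uses rownames(df) where the fix above uses the internal variable `rows`.

```r
# Hypothetical helper: select rows by exact row-name match only.
exact_rows <- function(df, i) {
  idx <- match(i, rownames(df))   # NA where a name has no exact match
  df[idx, , drop = FALSE]
}

x <- matrix(1:4, 2, 2, dimnames = list(c("abc", "ab"), c("cde", "cd")))
y <- as.data.frame(x)

sel <- exact_rows(y, "ab")
print(sel)
```

Note that exact matching gives up partial matching of row names altogether, so a request such as "a" no longer finds anything; that trade-off is one reason to prefer fixing pmatch itself.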
Brian
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595