grep() and factors
On 6/5/06, Bill Dunlap <bill at insightful.com> wrote:
On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:
grep("[a-z]", factor(letters))
numeric(0)
I was recently surprised by this also. In addition, if
R's grep did support factors in this way, what sort of
object (factor or character) should it return when value=T?
I recently changed Splus's grep to return a character vector in
that case.
Splus> grep("[def]", letters[26:1])
[1] 21 22 23
Splus> grep("[def]", factor(letters[26:1], levels=letters[26:1]))
[1] 21 22 23
Splus> grep("[def]", letters[26:1], value=T)
[1] "f" "e" "d"
Splus> grep("[def]", factor(letters[26:1], levels=letters[26:1]), value=T)
[1] "f" "e" "d"
Splus> class(.Last.value)
[1] "character"
R does this when grepping an integer vector.
R> grep("1", 0:11, value=T)
[1] "1" "10" "11"
help(grep) says it returns "the matching elements themselves", but
doesn't say if "themselves" means before or after the conversion to
character.
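To make the "before or after" ambiguity concrete, here is a small sketch in base R. It compares value=TRUE with indexing the original vector by the positions grep() returns. The variable names are only for illustration.

```r
x <- 0:11

idx    <- grep("1", x)                # positions of the matching elements
after  <- grep("1", x, value = TRUE)  # elements after conversion to character
before <- x[idx]                      # elements before conversion (still integer)

print(idx)
print(after)
print(before)
```

The two sets of values print alike apart from the quotes, but one is a character vector and the other is an integer vector, which is exactly the distinction the help page leaves open.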
Bill, my first inclination for the return value when grep() is used on a factor with value=TRUE would be the indexed factor elements, at the positions grep() would otherwise simply return as indices. This would also preserve the factor levels of the source factor, since "[.factor" retains them when drop = FALSE.
That would be my first inclination also. I would have expected the output of grep(pattern, text, value=TRUE) to be identical to that of text[grep(pattern, text, value=FALSE)] no matter what class text has. No end users have seen this in Splus so we can change it to anything, but we want to keep it the same as R's.
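The class-preserving behavior described above could be sketched roughly as follows. The helper grep_keep_class() is hypothetical, not part of R or Splus; it coerces explicitly for matching so that the result does not depend on how a given grep() treats non-character input.

```r
# Hypothetical helper: return the matching elements by indexing,
# so that the class (and, for factors, the levels) of 'text' is kept.
grep_keep_class <- function(pattern, text) {
  idx <- grep(pattern, as.character(text))  # match on the character form
  text[idx]                                 # index the original object
}

f   <- factor(letters[26:1], levels = letters[26:1])
res <- grep_keep_class("[def]", f)

print(class(res))
print(as.character(res))
print(nlevels(res))
```

With a factor as input the result is still a factor carrying all of the original levels, because "[" on a factor does not drop unused levels by default.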
I could be convinced either way. The concern, of course, is that (given the off-list replies I have received today) even experienced users are being bitten by the current behavior, which conflicts with their intuitive expectations. Those expectations are at least loosely supported by the documentation.
If non-character text arguments are accepted, I would have expected them to be coerced to character, so that grep(pattern, text, ...) would return the same result as grep(pattern, as.character(text), ...).
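For comparison, the coercion-based expectation can be written out explicitly. This is only a sketch: the as.character() step is done by hand here rather than assumed to happen inside grep().

```r
f   <- factor(letters[26:1], levels = letters[26:1])
chr <- as.character(f)   # explicit coercion of the factor to character

idx  <- grep("[def]", chr)                # positions in the coerced vector
vals <- grep("[def]", chr, value = TRUE)  # matching elements, as character

print(idx)
print(vals)
```

Under this reading the returned values are always character, whatever the class of the original argument.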