Strplit code
Dear Wacek, I've thought a bit more about this problem, and recall that I originally wrote Strsplit() [and replacements for sub() and gsub(), which were not then in S-PLUS] for the version of the car package that I released for S-PLUS, because other functions in the package used these. The strings involved were small, so performance issues weren't that important, although of course it's better to have a more efficient solution. Although I no longer have an installed copy of S-PLUS to confirm this, I believe that gregexpr() is still not present in S-PLUS (though I think that strsplit() is in the latest version). If that's the case, then your function wouldn't work at all in the context of the original posting, which asked for a solution in S-PLUS. You could make your code work in S-PLUS, and probably still have it more efficient than mine, by writing a replacement for gregexpr().
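A replacement for gregexpr() along these lines might be sketched as follows. This is only an illustration: the name Gregexpr and its list return value are hypothetical, it handles a single string, it was written against R and is untested in S-PLUS, and it assumes that regexpr(), substring() and nchar() behave there as they do in R. Because each search runs on the remainder of the string, anchored patterns such as "^a" would match again at the start of every remainder, and the loop stops at a zero-length match rather than stepping past it.

```r
# Hypothetical helper, not part of R or S-PLUS: locate every match of
# `pattern` in one string by calling regexpr() on the unsearched remainder.
Gregexpr <- function(pattern, text) {
    starts <- numeric(0)
    lengths <- numeric(0)
    offset <- 0                      # number of characters already consumed
    rest <- text
    while (nchar(rest) > 0) {
        m <- regexpr(pattern, rest)
        posn <- as.vector(m)         # drop attributes, keep the position
        len <- attr(m, "match.length")
        # stop when nothing matches, or on a zero-length match (assumption:
        # bail out instead of advancing, to avoid looping forever)
        if (posn <= 0 || len <= 0) break
        starts <- c(starts, offset + posn)
        lengths <- c(lengths, len)
        consumed <- posn + len - 1   # characters up to the end of this match
        offset <- offset + consumed
        rest <- substring(rest, consumed + 1, nchar(rest))
    }
    if (length(starts) == 0) {       # mirror gregexpr()'s "no match" value
        starts <- -1
        lengths <- -1
    }
    list(start = starts, length = lengths)
}

res <- Gregexpr(", ", "a, b, c")
res$start    # 2 5
res$length   # 2 2
```

Growing the vectors with c() inside the loop is simple but not fast; for the short strings discussed here that should not matter much.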
-----Original Message-----
From: Wacek Kusnierczyk [mailto:Waclaw.Marcin.Kusnierczyk at idi.ntnu.no]
Sent: December-04-08 7:29 AM
To: John Fox
Cc: R help
Subject: Re: [R] Strplit code

John Fox wrote:
Dear Wacek, "Wrong" is a bit strong, I think -- limited to single-character patterns is more accurate.
nothing is ever wrong if seen from an appropriate perspective. for example, there is nothing wrong in that many core functions in r deparse some, but not all, of the argument expressions, without any obvious pattern -- when you get used to it and learn each single case by heart, it's perfectly correct.
Moreover, it isn't hard to make the function work with multiple-character matches as well:
which you probably should have done before posting the flawed version.
Indeed. Had I anticipated the possibility of multiple-character splits I would have done so.
John
Strsplit <- function(x, split){
    if (length(x) > 1) {
        return(lapply(x, Strsplit, split))  # vectorization
    }
    result <- character(0)
    if (nchar(x) == 0) return(result)
    posn <- regexpr(split, x)
    if (posn <= 0) return(x)
    c(result, substring(x, 1, posn - 1),
      Recall(substring(x, posn + attr(posn, "match.length"),
                       nchar(x)), split))  # recursion
}
On the other hand, your function is much more efficient.
just one order of magnitude in my tests. might not be completely foolproof, though.
vQ