Strplit code
R does not support tail recursion so not using it would not matter. On Thu, Dec 4, 2008 at 5:04 AM, Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
John Fox wrote:
By coincidence, I have a version of strsplit() that I've used to
illustrate recursion:
Strsplit <- function(x, split){
if (length(x) > 1) {
return(lapply(x, Strsplit, split)) # vectorization
}
result <- character(0)
if (nchar(x) == 0) return(result)
posn <- regexpr(split, x)
if (posn <= 0) return(x)
c(result, substring(x, 1, posn - 1),
Recall(substring(x, posn+1, nchar(x)), split)) # recursion
}
well, it is both inefficient and wrong.
inefficient because of the non-tail recursion and recursive
concatenation, which is justified for the sake the purpose of showing
recursion, but for practical purposes you'd rather use gregexepr.
wrong because of how you pick the remaining part of the string to be
split -- it works just under the assumption the pattern is a single
character:
Strsplit("hello-dolly,--sweet", "--")
# the pattern is *two* hyphens
# [1] "hello-dolly" "-sweet"
Strsplit("hello dolly", "")
# the pattern is the empty string
# [1] "" "" "" "" "" "" "" "" "" "" ""
here's a quick rewrite -- i haven't tested it on extreme cases, it may
not be perfect, and there's a hidden source of inefficiency here as well:
strsplit =
function(strings, split) {
positions = gregexpr(split, strings)
lapply(1:length(strings), function(i)
substring(strings[[i]], c(1, positions[[i]] +
attr(positions[[i]], "match.length")), c(positions[[i]]-1,
nchar(strings[[i]]))))
}
n = 1000; m = 100
strings = replicate(n, paste(sample(c(letters, " "), 100, replace=TRUE),
collapse=""))
system.time(replicate(m, strsplit(strings, " ")))
system.time(replicate(m, Strsplit(strings, " ")))
vQ
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.