I did the following to generate and search 40 million unique strings:
grams <- as.character(1:4e7) ## a long time passes...
system.time(grep("^900001", grams)) ## similar times to grepl
user system elapsed
10.384 0.168 10.543
Is that the basic task you're trying to accomplish? grep() and grepl()
hand the work off to C code almost immediately, so I don't think
data.table or other packages will be markedly faster if you're looking
for an arbitrary regular expression (use fixed=TRUE if you're looking
for a literal string rather than a regular expression).
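Here is a small illustration of the difference, using made-up strings. By default the pattern is a regular expression, so "+" acts as a quantifier. With fixed = TRUE the pattern is matched literally as a substring anywhere in the string. If you really want whole-string equality, plain == (or %in% for several targets) avoids regular expressions entirely.

```r
x <- c("1e+06", "1ee06", "21e+067")

grepl("1e+06", x)                # "+" is a quantifier:   FALSE  TRUE FALSE
grepl("1e+06", x, fixed = TRUE)  # literal substring:      TRUE FALSE  TRUE
x == "1e+06"                     # whole-string equality:  TRUE FALSE FALSE
```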
If you're looking for strings that start with a fixed prefix, R 3.3.0
introduced startsWith():
system.time(res0 <- startsWith(grams, "900001"))
user system elapsed
0.658 0.012 0.669
It returns the same result as grepl():
identical(res0, res1 <- grepl("^900001", grams))
[1] TRUE
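Note that startsWith() compares a literal prefix, so regular-expression metacharacters in the prefix are not special. The anchored grepl() pattern only gives the same answer when such characters are escaped. The sketch below uses made-up strings and also shows the companion function endsWith() for literal suffixes.

```r
x <- c("1.5kg", "125kg", "1.50", "x1.5")

startsWith(x, "1.5")   # literal prefix:           TRUE FALSE  TRUE FALSE
grepl("^1.5", x)       # "." is a regex wildcard:  TRUE  TRUE  TRUE FALSE
grepl("^1\\.5", x)     # escaped "." agrees with startsWith()
endsWith(x, "kg")      # literal suffix:           TRUE  TRUE FALSE FALSE
```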
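One can also parallelize the already-vectorized grepl() with
parallel::pvec(). This reduces elapsed time (about 4 s here versus
about 10.5 s for grepl()), although total CPU time goes up. It only
helps on non-Windows systems, because pvec() relies on forking and does
not support mc.cores > 1 on Windows.
system.time(res2 <- parallel::pvec(seq_along(grams), function(i)
grepl("^900001", grams[i]), mc.cores=8))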
user system elapsed
24.996 1.709 3.974
identical(res1, res2)
[1] TRUE
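Roughly speaking, pvec() splits the vector into parts, applies the function to each part in a forked process, and concatenates the results. The sketch below mimics that idea by hand, using splitIndices() and mclapply() from the same package. chunked_grepl() is a hypothetical helper, not pvec()'s actual implementation, and it falls back to a single core on Windows.

```r
library(parallel)

# Hypothetical helper (not part of 'parallel'): split x into contiguous
# chunks, run the already-vectorized grepl() on each chunk in a forked
# worker, then concatenate the pieces back in their original order.
chunked_grepl <- function(pattern, x, n_chunks = 4L,
                          mc.cores = if (.Platform$OS.type == "windows") 1L else 2L) {
  chunks <- splitIndices(length(x), n_chunks)   # list of integer index vectors
  pieces <- mclapply(chunks, function(i) grepl(pattern, x[i]),
                     mc.cores = mc.cores)
  unlist(pieces, use.names = FALSE)
}

x <- as.character(1:1e5)
res <- chunked_grepl("^9001", x)
identical(res, grepl("^9001", x))   # should be TRUE
```

Each chunk is still handled by a single vectorized call, so the per-element work stays in C. The parallel overhead is only worth paying when the vector is large.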
I think anything faster would require pre-processing of some kind, and
for that we'd need more detail about what your data look like.
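As one hypothetical example of such pre-processing, suppose every query is a prefix of the same fixed length k. You could group the indices by their first k characters once, and then answer each query with a single list lookup instead of scanning every string. build_prefix_index() and lookup_prefix() are illustrative names. Strings shorter than k are keyed by the whole string, so this only answers prefixes of exactly k characters.

```r
# Hypothetical pre-processing step: group element indices by their first
# k characters (strings shorter than k are keyed by the whole string).
build_prefix_index <- function(x, k) {
  split(seq_along(x), substr(x, 1L, k))
}

# Look up a prefix of exactly k characters; integer(0) if it is absent.
lookup_prefix <- function(index, prefix) {
  hits <- index[[prefix]]   # [[ uses exact name matching on lists
  if (is.null(hits)) integer(0) else hits
}

grams <- as.character(1:2e5)
index <- build_prefix_index(grams, k = 4L)   # paid once, up front
hits  <- lookup_prefix(index, "1000")        # cheap for each query
identical(hits, which(startsWith(grams, "1000")))
```

Whether this pays off depends on how many queries are run against the same data, since building the index costs at least as much as one full scan.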