
speed issue: gsub on large data frame

What is missing is any idea of what the 'patterns' you are searching for actually are. Regular expressions are very sensitive to how the pattern is specified. You indicated that you have up to 500 elements in the pattern, so what does it look like? Alternation and backtracking can be very expensive, so a lot more specificity is required. Whole books have been written on how pattern matching works and what is hard and what is easy, and this is true wherever regular expressions are used, not just in R. Some idea of the timing would also help: are you talking about 1, 10, or 100 seconds, minutes, or hours?
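
As a rough illustration of why the shape of the pattern matters and how to report timings, here is a minimal sketch in Python's `re` module (the point above is not specific to R). The 500 terms, the text lines, and the helper names `sub_loop`, `sub_alternation`, and `timed` are all hypothetical stand-ins, not the original poster's data or code. It compares one substitution pass per term against a single combined alternation (in R the combined pattern would typically be built with `paste(terms, collapse = "|")`), and times both.

```python
import re
import time

# Hypothetical data: 500 literal search terms and 1000 lines of text.
terms = [f"term{i}" for i in range(500)]
lines = [f"row {j} mentions term{j % 700} and term{(j * 7) % 500}" for j in range(1000)]

def sub_loop(texts, words, repl="<TERM>"):
    """One regex pass over every line per term (len(words) passes in total)."""
    out = list(texts)
    for w in words:
        pat = re.compile(r"\b" + re.escape(w) + r"\b")
        out = [pat.sub(repl, s) for s in out]
    return out

def sub_alternation(texts, words, repl="<TERM>"):
    """One combined pattern; longest terms first so 'term10' is tried before 'term1'."""
    alts = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    pat = re.compile(r"\b(?:" + alts + r")\b")
    return [pat.sub(repl, s) for s in texts]

def timed(fn, *args):
    """Return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

loop_out, loop_secs = timed(sub_loop, lines, terms)
alt_out, alt_secs = timed(sub_alternation, lines, terms)
print(f"loop: {loop_secs:.3f}s  alternation: {alt_secs:.3f}s  same: {loop_out == alt_out}")
```

The two approaches give the same output here because the word boundaries stop `term6` from matching inside `term600`. The combined pattern usually wins because each line is scanned once, but a backtracking engine still tries the alternatives one after another at every candidate position, so very large alternations are not free; measure on your own data.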

On Nov 5, 2013, at 3:13, Simon Pickert <simon.pickert at t-online.de> wrote: