Multiple String word replacements: Performance Issue

Here is a start.  I was wondering how long it would take to at least
substitute 800 different patterns across a character vector of 4M
strings.  Here is my test.  It took longer to create the test data
(99 sec) than to do the substitutions (52 sec).  Variations on this
should give you the other information you are probably after in less
than a day (I would guess less than an hour).
+         , sample(LETTERS, n, TRUE)
+         , sample(LETTERS, n, TRUE)
+         , sample(LETTERS, n, TRUE)
+         , sample(LETTERS, n, TRUE)
+ )
+ output <- replicate(n, paste(sample(x,2), collapse = ' '))
+ })
   user  system elapsed
  99.85    0.22  100.37
+ pattern <- paste0("\\", x, collapse = "|")
+ z <- gsub(pattern, "[ticker]", output, perl = TRUE)
+ })
   user  system elapsed
  52.05    0.00   52.21
chr [1:4000000] "$JHVN $VKOL" "$GTEU $CEGL" "$LOEY $ETQK" "$AFDO $SDLH" "$MOIN $WEVR" ...
chr [1:4000000] "[ticker] [ticker]" "[ticker] [ticker]" "[ticker] [ticker]" ...
chr "\\$MATF|\\$GFGC|\\$SRYC|\\$HLWS|\\$GHFB|\\$BGVU|\\$GFDW|\\$PSFN|\\$ONDY|\\$SXUH|\\$EBDJ|\\$YNQY|\\$NDBT|\\$TOQK|\\$IUBN|\\$VSMT"| __truncated__
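
For anyone who wants to try the idea without waiting on 4M strings, here is a scaled-down sketch of the same approach: join all the tickers into one alternation pattern and make a single gsub() call. The sizes, the seed, and the names n_tickers, n_strings and tickers are my own assumptions for illustration, not the values from the run above. It also builds the test data with vectorised sample() calls instead of replicate(), which is a swap on my part and may draw the same ticker twice in one string.

```r
# Scaled-down sketch; sizes and seed are illustrative assumptions.
set.seed(42)
n_tickers <- 800     # number of patterns, as described in the prose
n_strings <- 10000   # far fewer than the 4M strings in the original test

# Random four-letter tickers with a leading "$", e.g. "$JHVN".
# unique() drops accidental duplicates, so there may be slightly fewer than 800.
tickers <- unique(paste0("$",
                         sample(LETTERS, n_tickers, TRUE),
                         sample(LETTERS, n_tickers, TRUE),
                         sample(LETTERS, n_tickers, TRUE),
                         sample(LETTERS, n_tickers, TRUE)))

# Two tickers per string, drawn with vectorised sample() calls.
output <- paste(sample(tickers, n_strings, TRUE),
                sample(tickers, n_strings, TRUE))

# Prefix each ticker with a backslash so the leading "$" is matched literally,
# then join everything into one alternation: \$AAAA|\$BBBB|...
pattern <- paste0("\\", tickers, collapse = "|")

# One pass over the data with a single combined pattern.
timing <- system.time(
  z <- gsub(pattern, "[ticker]", output, perl = TRUE)
)
print(timing)

str(output)
str(z)
```

The backslash prefix is only enough here because "$" is the sole regex metacharacter in these tickers; real symbols containing characters such as "." or "^" would need fuller escaping.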
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Wed, Nov 6, 2013 at 8:11 AM, Simon Pickert <simon.pickert at t-online.de> wrote: