speed issue: gsub on large data frame
How's that not reproducible?

1. Data frame, one column with text strings
2. Size of data frame: 4 million observations
3. A bunch of gsubs in a row: gsub(patternvector, "[token]", dataframe$text_column)
4. General question: how to speed up string operations on 'large' data sets? (See the sketches after the quoted thread below.)

Please let me know what more information you need in order to reproduce this example. It's more a general type of question, but I think the description above gives you a specific picture of what I'm doing right now.

On 05.11.2013 at 06:59, Jeff Newmiller <jdnewmil at dcn.davis.CA.us> wrote:
Example not reproducible. Communication fail. Please refer to Posting Guide.
Simon Pickert <simon.pickert at t-online.de> wrote:
Hi R'lers,

I'm running into speed issues performing a bunch of gsub(patternvector, "[token]", dataframe$text_column) calls on a data frame containing >4 million entries. (The pattern vectors contain up to 500 elements each.)

Is there any better/faster way than performing something like 20 gsub commands in a row?

Thanks!
Simon
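One point worth noting: gsub() is not vectorized over its pattern argument; given a pattern vector it uses only the first element (with a warning). If all 500 patterns in a vector map to the same token, they can instead be collapsed into a single alternation, so each token costs one gsub call rather than 500. The sketch below is not from the thread itself; the object names (df, patternvector) are stand-ins for the poster's data, which was never shown, and it assumes the patterns are plain literal strings with no regex metacharacters:

    ## Toy stand-in for the 4-million-row data frame
    df <- data.frame(text_column = c("foo and bar", "baz only", "nothing here"),
                     stringsAsFactors = FALSE)

    ## Stand-in for one of the 500-element pattern vectors
    patternvector <- c("foo", "bar", "baz")

    ## Collapse the patterns into one alternation; they would need
    ## escaping first if any contained regex metacharacters
    big_pattern <- paste(patternvector, collapse = "|")

    ## One gsub call per token instead of one per pattern; perl = TRUE
    ## often speeds up matching on long character vectors
    df$text_column <- gsub(big_pattern, "[token]", df$text_column, perl = TRUE)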
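Another option sometimes suggested for this kind of workload (again an assumption, not something proposed in the thread) is the stringi package, whose replacement functions accept a whole vector of patterns in one call when vectorize_all = FALSE. Reusing df and patternvector from the sketch above:

    library(stringi)  # install.packages("stringi") if needed

    ## Apply every pattern in turn to every string in a single call;
    ## the _fixed variant skips the regex engine entirely for literal
    ## patterns, which is usually faster
    df$text_column <- stri_replace_all_fixed(df$text_column,
                                             pattern       = patternvector,
                                             replacement   = "[token]",
                                             vectorize_all = FALSE)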