Message-ID: <2925DAD9-CD46-4303-973A-A8C5A5F12B9A@t-online.de>
Date: 2013-11-05T08:13:12Z
From: Simon Pickert
Subject: speed issue: gsub on large data frame
In-Reply-To: <5c17fc2c-78f1-409e-9150-2b7379108d07@email.android.com>

How's that not reproducible?

1. Data frame, one column with text strings
2. Size of data frame = 4 million observations
3. A bunch of gsubs in a row (  gsub(patternvector, "[token]", dataframe$text_column)  )
4. General question: How to speed up string operations on 'large' data sets?
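
A minimal sketch of one way to cut down the number of passes, not a definitive answer. Note that gsub() uses only the first element of a pattern vector (with a warning), so a 500-element vector has to be collapsed into one regular expression first. The sample data, the `pattern`, `u` and `cleaned` names, and the assumption that the terms contain no regex metacharacters are all illustrative and not taken from the real data set.

```r
# Hypothetical stand-ins for the real objects; values are illustrative only.
patternvector <- c("foo", "bar", "baz")
dataframe <- data.frame(
  text_column = c("a foo here", "bar and baz", "nothing to match", "a foo here"),
  stringsAsFactors = FALSE
)

# Collapse the terms into a single alternation so each pattern vector needs
# one gsub() call. Assumes no regex metacharacters in the terms; they would
# need escaping otherwise.
pattern <- paste(patternvector, collapse = "|")

# Run the substitution on the distinct strings only, then map the results
# back to every row. This helps only when the column has many repeated values.
u <- unique(dataframe$text_column)
cleaned <- gsub(pattern, "[token]", u, perl = TRUE)  # perl = TRUE uses PCRE
dataframe$text_column <- cleaned[match(dataframe$text_column, u)]
```

Whether this is actually faster depends on the data (share of duplicates, length of the alternation), so it is worth timing with system.time() on a subset before running it on all 4 million rows.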


Please let me know what further information you need in order to reproduce this example.
It's more of a general question, but I think the description above gives you a specific picture of what I'm doing right now.






On 05.11.2013 at 06:59, Jeff Newmiller <jdnewmil at dcn.davis.CA.us> wrote:

> Example not reproducible. Communication fail. Please refer to Posting Guide.
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                      Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> --------------------------------------------------------------------------- 
> Sent from my phone. Please excuse my brevity.
> 
> Simon Pickert <simon.pickert at t-online.de> wrote:
>> Hi R'lers,
>> 
>> I'm running into speed issues when performing a bunch of
>> 
>> "gsub(patternvector, [token], dataframe$text_column)"
>> 
>> on a data frame containing >4 million entries.
>> 
>> (The "patternvectors" contain up to 500 elements)
>> 
>> Is there any better/faster way than performing like 20 gsub commands in
>> a row?
>> 
>> 
>> Thanks!
>> Simon
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>