On Sun, May 29, 2011 at 6:44 PM, Ian Gow<iandgow at gmail.com> wrote:
Not a new approach, but here is some benchmark data (perl=TRUE speeds up
Jim's suggestion):
x<- c('18x.6','12x.9','302x.3')
y<- rep(x,100000)
system.time(temp<- unlist(lapply(strsplit(y,".",fixed=TRUE),function(x) x[1])))
user system elapsed
1.203 0.018 1.222
system.time(temp2<- gsub("^(.*?)\\..*$","\\1",y, perl=TRUE))
user system elapsed
0.176 0.001 0.176
system.time(temp3<- gsub("^(.*)\\..*", '\\1', y))
user system elapsed
0.292 0.001 0.291
system.time(temp4<- gsub("^(.*)\\..*", '\\1', y, perl=TRUE))
user system elapsed
0.160 0.001 0.161
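The timings above compare speed only, so here is a small sketch that checks the approaches agree on the example data. The last line, sub() with the pattern "\\..*$", is an assumed alternative that does not appear in the thread: it deletes everything from the first dot onward without a capture group. It was not benchmarked here, so no speed claim is made for it.

```r
x <- c('18x.6', '12x.9', '302x.3')

# Approaches taken from the thread
a  <- unlist(lapply(strsplit(x, ".", fixed = TRUE), function(p) p[1]))
b  <- gsub("^(.*?)\\..*$", "\\1", x, perl = TRUE)
c3 <- gsub("^(.*)\\..*", "\\1", x, perl = TRUE)

# Assumed alternative (not from the thread): drop the first dot and the rest
d <- sub("\\..*$", "", x)

# All four should give the same character vector on this input
stopifnot(identical(a, b), identical(b, c3), identical(c3, d))
print(d)
```

The regex versions return a character vector directly, whereas the strsplit() version first builds a list with one element per input string, which is probably where much of the RAM goes on a very long vector.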
On 5/29/11 7:40 PM, "jim holtman"<jholtman at gmail.com> wrote:
x<- c('18x.6','12x.9','302x.3')
gsub("^(.*)\\..*", '\\1', x)
[1] "18x" "12x" "302x"
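One caveat with this pattern, sketched below on made-up input (the vector z is hypothetical, not data from the thread): the greedy (.*) runs up to the last dot, while the lazy (.*?) variant and the strsplit() approach both stop at the first dot. With exactly one dot per string, as in the example, they all agree. perl = TRUE is used for both patterns here so that the matching semantics are the same.

```r
# Hypothetical input containing a string with two dots
z <- c("18x.6.2", "12x.9")

# Greedy capture: keeps everything before the LAST dot
greedy <- gsub("^(.*)\\..*", "\\1", z, perl = TRUE)

# Lazy capture: keeps everything before the FIRST dot
lazy <- gsub("^(.*?)\\..*$", "\\1", z, perl = TRUE)

# The original strsplit() approach also keeps the part before the first dot
split1 <- unlist(lapply(strsplit(z, ".", fixed = TRUE), function(p) p[1]))

print(greedy)  # "18x.6" "12x"
print(lazy)    # "18x" "12x"
print(identical(lazy, split1))  # TRUE
```

So if some strings may contain more than one dot, the lazy pattern is the one that reproduces the strsplit() result.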
On Sun, May 29, 2011 at 8:10 PM, Matthew Keller<mckellercran at gmail.com> wrote:
hi all,
I'm full of questions today :). Thanks in advance for your help!
Here's the problem:
x<- c('18x.6','12x.9','302x.3')
I want to get a vector that is c('18x','12x','302x')
This is easily done using this code:
unlist(lapply(strsplit(x,".",fixed=TRUE),function(x) x[1]))
So far so good. The problem is that x is a vector of length 132e6.
When I run the above code, it runs for > 30 minutes and takes > 23
GB of RAM (no kidding!).
Does anyone have ideas about how to speed up the code above and (more
importantly) reduce the RAM footprint? I'd prefer not to change the
file on disk using, e.g., awk, but I will do that as a last resort.
Best
Matt
--
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com