Message-ID: <1359314200.90410.YahooMailNeo@web142602.mail.bf1.yahoo.com>
Date: 2013-01-27T19:16:40Z
From: arun
Subject: Removing values containing a specific character
In-Reply-To: <CANEe9Zrz8dd9Q4gvy70S83LpSxbFh2=VBG=hJmn0aFkBr4gg6Q@mail.gmail.com>

Hi,
I tried this with a bigger dataset.

set.seed(25)
names <- sample(c("bob", "joe", "craig@gmail.com", "emily", "jane@yahoo.com"), 5e6, replace=TRUE)
set.seed(1651)
emails <- sample(c("bobj@cup.com", "joesmith@gmail.com", "craig@gmail.com",
 "emily2@yahoo.com", "jane@yahoo.com"), 5e6, replace=TRUE)

df <- data.frame(names, emails)
dim(df)
#[1] 5000000       2
df[] <- lapply(df, as.character)
system.time(df[,1][grep("@",df$names)] <- "")
#   user  system elapsed
#  1.732   0.108   1.844
system.time(dfNew1 <- df[grep("\\w+",df$names),])
#   user  system elapsed
#  0.896   0.024   0.923
system.time(dfNew2 <- df[df$names!="",])
#   user  system elapsed
#  0.460   0.028   0.490
A.K.
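
For comparison, here is a minimal sketch of a single-pass variant that builds one logical mask with grepl() and reuses it for both results. It is not part of the timings above and has not been benchmarked on the 5e6-row data; the small data frame df_demo and the names has_at, blanked and kept are hypothetical, chosen only for illustration.

```r
# Hypothetical small data frame standing in for the 5e6-row example.
df_demo <- data.frame(
  names  = c("bob", "joe", "craig@gmail.com", "emily", "jane@yahoo.com"),
  emails = c("bobj@cup.com", "joesmith@gmail.com", "craig@gmail.com",
             "emily2@yahoo.com", "jane@yahoo.com"),
  stringsAsFactors = FALSE
)

# TRUE where the name contains a literal "@".
# fixed = TRUE treats the pattern as a plain string rather than a regex.
has_at <- grepl("@", df_demo$names, fixed = TRUE)

# First result: blank the offending names in a copy.
blanked <- df_demo
blanked$names[has_at] <- ""

# Second result: drop those rows entirely.
kept <- df_demo[!has_at, ]
```

One caveat with negative indexing as in df2[-grep("@",df2$names),]: if nothing matches, grep() returns integer(0) and the subset comes back with zero rows instead of all rows. A logical mask such as !has_at does not have that problem.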







________________________________
From: Yasha Podeswa <ypodeswa at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Cc: R help <r-help at r-project.org>; Uwe Ligges <ligges at statistik.tu-dortmund.de> 
Sent: Sunday, January 27, 2013 2:05 PM
Subject: Re: [R] Removing values containing a specific character


You two were 100% right, it was just a memory issue. This was part of a bigger project where I had a number of data frames loaded, all with 1-5 million rows. I cleaned up my code to have fewer data frames loaded at once, and everything is working great. Thanks for the help!
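
As a rough sketch of that kind of cleanup (the object name tmp_big and its size are hypothetical, not taken from the actual project), a large intermediate data frame can be measured, removed, and its memory released roughly like this:

```r
# Hypothetical intermediate object; name and size are for illustration only.
tmp_big <- data.frame(x = seq_len(1e5),
                      y = as.character(seq_len(1e5)),
                      stringsAsFactors = FALSE)

# Report roughly how much memory the object occupies.
size_label <- format(object.size(tmp_big), units = "Mb")
print(size_label)

# Remove the object once it is no longer needed ...
rm(tmp_big)

# ... and trigger garbage collection so the memory can be reused.
invisible(gc())
```

ls() lists what is still loaded in the workspace, which helps when deciding what else can be removed.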
On Jan 27, 2013 9:46 AM, "arun" <smartpink111 at yahoo.com> wrote:

Hi Yasha,
>
> I guess you got Uwe's response.
>
> I created `df2` with the intention of getting the two results from the original dataset.
>For example, after you get the first result
>df[,1][grep("@",df$names)]<- ""
>#you can get the second result by:
>df[df$names!="",]
>#  names             emails
>#1   bob       bobj@cup.com
>#2   joe joesmith@gmail.com
>#4 emily   emily2@yahoo.com
>
>#or
>df[grep("\\w+",df$names),]
>#  names             emails
>#1   bob       bobj@cup.com
>#2   joe joesmith@gmail.com
>#4 emily   emily2@yahoo.com
>
>But I am not sure how this will work over 5.5 million rows.
>A.K.
>
>
>
>
>----- Original Message -----
>From: ypodeswa <ypodeswa at gmail.com>
>To: r-help at r-project.org
>Cc:
>Sent: Sunday, January 27, 2013 1:11 AM
>Subject: Re: [R] Removing values containing a specific character
>
>Actually, it worked perfectly for my sample data, but my actual data has
>5.5 million rows, and grep doesn't seem to work with over a million rows.
>Any idea on a workaround?
>
>
>On Sat, Jan 26, 2013 at 9:37 PM, Yasha Podeswa <ypodeswa at gmail.com> wrote:
>
>> Awesome, thanks Arun, that's exactly what I was looking for!
>>
>>
>> On Sat, Jan 26, 2013 at 9:21 PM, arun kirshna [via R] <
>> ml-node+s789695n4656749h63 at n4.nabble.com> wrote:
>>
>>> Hi,
>>> Try this:
>>> df[]<-lapply(df,as.character)
>>> df2<-df
>>> df[,1][grep("@",df$names)]<- ""
>>> df
>>> #  names             emails
>>> #1   bob       bobj@cup.com
>>> #2   joe joesmith@gmail.com
>>> #3          craig@gmail.com
>>> #4 emily   emily2@yahoo.com
>>> #5           jane@yahoo.com
>>>
>>> #2nd part:
>>>
>>> df2[-grep("@",df2$names),]
>>> #  names             emails
>>> #1   bob       bobj@cup.com
>>> #2   joe joesmith@gmail.com
>>> #4 emily   emily2@yahoo.com
>>> A.K.
>>>
>>>
>>
>>
>
>
>
>
>--
>View this message in context: http://r.789695.n4.nabble.com/Removing-values-containing-a-specific-character-tp4656744p4656751.html
>Sent from the R help mailing list archive at Nabble.com.
>    [[alternative HTML version deleted]]
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
>
>