
Data Frame Manipulation using function

On Jul 8, 2010, at 10:33 PM, Erik Iverson wrote:

            
Yeah, it was kind of a pain, but ...

dta <- read.table(textConnection('     id      url                        urlType
1     1      "www.yahoo.com"            1
2     2      "www.google.com/?search="  2
3     3      "www.google.com"           1
4     4      "www.yahoo.com/?query="    2
5     5      "www.gmail.com"            1') )
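If typing the table through textConnection() is the painful part, a sketch of an alternative is to build the same five rows directly with data.frame(). This assumes the rows shown above. stringsAsFactors = FALSE keeps url as character; the 2010-era default turned it into a factor, which is why the output below reports "Levels:".

```r
# Build the example data directly instead of parsing pasted text.
dta <- data.frame(
  id      = 1:5,
  url     = c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
              "www.yahoo.com/?query=", "www.gmail.com"),
  urlType = c(1L, 2L, 1L, 2L, 1L),
  stringsAsFactors = FALSE  # keep url as a character column, not a factor
)
str(dta)
```

Running dput(dta) on the result prints R code that recreates the object exactly. That output is often easier to paste into a mail than a formatted table.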
Seems to ... after I fixed my incorrect Cmd-V paste of the function
name and guessed that trim() was the one in gdata:

 > require(gdata)
 > checkBaseLine <- function(s){
+ for (listItem in WHITELIST){
+ if(regexpr(as.character(listItem), s)[1] > -1){
+ return(TRUE)
+ }
+ }
+ return(FALSE)
+ }
 >
 > # Here is the definition of WHITELIST:
 >
 > WHITELIST = "[?]query=, [?]search="
 > WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
 > vcheck <- Vectorize(checkBaseLine)
 >
 > dta[ dta$urlType != 1 & vcheck(dta$url) , "url" ]
[1] www.google.com/?search= www.yahoo.com/?query=
5 Levels: www.gmail.com www.google.com ... www.yahoo.com/?query=
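For comparison, here is a minimal sketch of the same filter without the explicit loop or Vectorize(). It collapses the whitelist patterns into one alternation regex and lets grepl() test every URL at once. It swaps in base R's trimws() (available from R 3.2.0) for gdata's trim(). This is an alternative approach, not the poster's method; the variable name `pattern` is introduced here for illustration.

```r
# Example data (same five rows as above), with url kept as character.
dta <- data.frame(
  id      = 1:5,
  url     = c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
              "www.yahoo.com/?query=", "www.gmail.com"),
  urlType = c(1, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Split the comma-separated whitelist and strip surrounding whitespace.
WHITELIST <- "[?]query=, [?]search="
patterns <- trimws(strsplit(WHITELIST, ",")[[1]])

# Combine into a single regex that matches any whitelisted pattern:
# "[?]query=|[?]search="
pattern <- paste(patterns, collapse = "|")

# grepl() is already vectorized over the url column, so no loop is needed.
matches <- dta$urlType != 1 & grepl(pattern, dta$url)
dta[matches, "url"]
```

With these five rows the last line prints `[1] "www.google.com/?search=" "www.yahoo.com/?query="`. Because url is character here, no "Levels:" line appears.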