Skip to content
Prev 319621 / 398502 Next

Grep with wildcards across multiple columns

Hi,

You could try this for multiple intersect:

?dt[Reduce(function(...) intersect(...), list(grep(par.fund,fund),grep(par.func,func),grep(par.obj,obj))),sum(amount),by=c('code','year')]
#?? code year???? V1
#1: 1001 2011 123528
#2: 1001 2012? 97362
#3: 1002 2011 103811
#4: 1002 2012? 97179
?dt[intersect(intersect(grep(par.fund, fund), grep(par.func, func)),
???????????????????? grep(par.obj, obj)),
?????????? sum(amount), by=c('code', 'year')]
?#? code year???? V1
#1: 1001 2011 123528
#2: 1001 2012? 97362
#3: 1002 2011 103811
#4: 1002 2012? 97179
A.K.



----- Original Message -----
From: "Bush,  Daniel P.   DPI" <Daniel.Bush at dpi.wi.gov>
To: "'r-help at r-project.org'" <r-help at r-project.org>
Cc: 'William Dunlap' <wdunlap at tibco.com>; "'smartpink111 at yahoo.com'" <smartpink111 at yahoo.com>; "'djmuser at gmail.com'" <djmuser at gmail.com>
Sent: Friday, March 15, 2013 10:06 AM
Subject: RE: Grep with wildcards across multiple columns

I think the way I set up my sample data without any explanation confused things slightly. These data might make things clearer:

# Create fake data
df <- data.frame(code?  = c(rep(1001, 8), rep(1002, 8)),
? ? ? ? ? ? ? ?  year?  = rep(c(rep(2011, 4), rep(2012, 4)), 2),
? ? ? ? ? ? ? ?  fund?  = rep(c("10E", "27E", "27E", "29E"), 4),
? ? ? ? ? ? ? ?  func?  = rep(c("110000", "122000", "214000", "158000"), 4),
? ? ? ? ? ? ? ?  obj? ? = rep(c("100", "100", "210", "220"), 4),
? ? ? ? ? ? ? ?  amount = round(rnorm(16, 50000, 10000)))

These are financial data with a hierarchical account structure where a zero represents a summary account that rolls up all the accounts at subsequent digits (e.g. 100000 rolls up 110000, 122000, 158000, etc.). I was trying to do two things with the search parameters: turn zeroes into question marks, and duplicate the functionality of a SQL query using those question marks as wildcards:

# Set parameters
par.fund <- "20E"; par.func <- "100000"; par.obj <- "000"
par.fund <- glob2rx(gsub("0", "?", par.fund))
par.func <- glob2rx(gsub("0", "?", par.func))
par.obj <- glob2rx(gsub("0", "?", par.obj))

Fortunately, Bill's suggestion to use the intersect function worked just fine--since intersect accepts only two arguments, I had to nest a pair of statements:

# Solution: Use a pair of nested intersects
dt2 <- dt[intersect(intersect(grep(par.fund, fund), grep(par.func, func)),
? ? ? ? ? ? ? ? ? ? grep(par.obj, obj)),
? ? ? ? ? sum(amount), by=c('code', 'year')]
df2 <- ddply(df[intersect(intersect(grep(par.fund, df$fund),
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? grep(par.func, df$func)),
? ? ? ? ? ? ? ? ? ? ? ? ? grep(par.obj, df$obj)), ],
? ? ? ? ? ?  .(code, year), summarize, amount = sum(amount))

Thanks for your ideas!

DB

Daniel Bush | School Finance Consultant 
School Financial Services | Wis. Dept. of Public Instruction 
daniel.bush -at- dpi.wi.gov | 608-267-9212

-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com] 
Sent: Thursday, March 14, 2013 5:49 PM
To: Bush, Daniel P. DPI; 'r-help at r-project.org'
Subject: RE: Grep with wildcards across multiple columns

grep(pattern, textVector) returns of the integer indices of the elements of textVector that match the pattern.? E.g.,
? > grep("T", c("One","Two","Three","Four"))
? [1] 2 3

The '&' operator naturally operates on logical vectors of the same length (If you give it numbers it silently converts 0 to FALSE and? other numbers to TRUE.)

The two don't fit together.? You could use grepl(), which returns a logical vector the length of textVector, as in
?  grepl(p1,v1) & grepl(p2,v2)
to figure which entries in the table have v1 matching p1 and v2 matching p2.

Or, you could use
? intersect(grep(p1,v1), grep(p2,v2))
if you want to stick with integer indices.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com