
calculations on columns with partially matching names

2 messages · Jim Bouldin, David Winsemius

#
Is there a command for partial matching of character strings? Specifically,
I'd like to be able to calculate the mean of the values in any columns of a
data frame or matrix whose column names partially match. For example,
columns labeled "mpw06a" and "mpw06b" match on the first five characters;
their mean would be taken, whereas any columns beginning with something
other than "mpw06" would be excluded. I need to compare every pair of
columns in the frame and, in some cases, possibly three at a time.

Thanks in advance for any ideas.




Jim Bouldin
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740
#
On Jan 3, 2010, at 6:09 PM, Jim Bouldin wrote:

            
?grep
?"["

 > tdf <- data.frame(mpw06a=rnorm(10), mpw06b=rnorm(10), abc=rnorm(10))

 > # drop = FALSE keeps a data frame even if only one column matches
 > lapply(tdf[ , grep("mpw06", names(tdf)), drop = FALSE ], mean)
$mpw06a
[1] -0.1825447

$mpw06b
[1] -0.2386772

?combn

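The pieces above (grep-style name matching plus combn) can be combined into a rough sketch for the general case. It assumes that "matching" means agreement on the first five characters of the column name and that the wanted mean is taken row by row across the matching columns; both are assumptions about the question, and the extra columns mpw07a and mpw07b are made up for illustration.

```r
# Hypothetical example data; names follow the pattern in the question.
set.seed(1)
tdf <- data.frame(mpw06a = rnorm(10), mpw06b = rnorm(10),
                  mpw07a = rnorm(10), mpw07b = rnorm(10),
                  abc = rnorm(10))

# Group column names by their first five characters (assumed convention).
prefix <- substr(names(tdf), 1, 5)
groups <- split(names(tdf), prefix)

# Keep only prefixes shared by at least two columns.
groups <- groups[lengths(groups) >= 2]

# Row-wise mean across the columns in each group; one result column per prefix.
group_means <- sapply(groups, function(cols) {
  rowMeans(tdf[, cols, drop = FALSE])
})

# Alternatively, enumerate every pair of columns with combn() and keep
# the pairs whose names agree on the first five characters.
pairs <- combn(names(tdf), 2)
same <- substr(pairs[1, ], 1, 5) == substr(pairs[2, ], 1, 5)
matched_pairs <- pairs[, same, drop = FALSE]
```

Here group_means has one column per shared prefix, and matched_pairs holds one matching pair of names per column. For three columns at a time, combn(names(tdf), 3) gives the triples in the same layout.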
David Winsemius, MD
Heritage Laboratories
West Hartford, CT