regex challenge

5 messages · Frank E Harrell Jr, Guanrao Chen, William Dunlap +2 more

#
I would like to be able to use gsub or gsubfn to process a formula and 
to translate the variables but to ignore expressions in the formula. 
Supposing that the R formula has already been transformed into a 
character string and that the transformation is to convert variable 
names to upper case and to append z to the names, an example would be to 
convert y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i to 
Y1z + Y2z ~ Az*(Bz + Cz) + Dz + Fz * (h == 3) + (sex == 'male')*Iz.  Any 
expression that is not just a simple variable name would be left alone.

Does anyone want to try their hand at creating a regex that would 
accomplish this?

Thanks
Frank
#
I think substitute() or bquote() will do a better job here than gsub() because
they work on the parsed formula rather than on the raw string.  The
terms() function will interpret the formula-specific operators like "+"
and ":" to come up with a list of the 'variables' (or 'terms') in the formula.
E.g., applying the 'f' given below to the formula in your example, we get
Y1z + Y2z ~ Az * (Bz + Cz) + Dz + Fz * (h == 3) + (sex == "male") * Iz

Is that what you wanted?

If you only wanted to keep intact the expressions of the form
  var==value
(calls to `==`) but transform things like log(a) to log(Az) you
could extend this code to do that as well.

f <- function(formula) {
   trms <- terms(formula)
   variables <- as.list(attr(trms, "variables"))[-1]
   # the 'variables' attribute is stored as a call to list(),
   # so we change the call to a list and remove the first element
   # (the function name 'list') to get the variables themselves.
   if (attr(trms, "response") == 1) {
       # terms does not pull apart the left-hand side (response) of the
       # formula, so we assume each name in it is a variable to be renamed.
       responseVars <- lapply(all.vars(variables[[1]]), as.name)
       variables <- variables[-1]
   } else {
       responseVars <- list()
   }
   # omit non-name variables from list of ones to change.
   # This is where you could expand calls to certain functions.
   variables <- variables[vapply(variables, is.name, TRUE)]
   variables <- c(responseVars, variables) # all are names now
   names(variables) <- vapply(variables, as.character, "")
   newVars <- lapply(variables, function(v) as.name(paste0(toupper(v), "z")))
   formula(do.call("substitute", list(formula, newVars)), env=environment(formula))
}
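
The extension mentioned above (keep calls to `==` intact but rename variables inside other calls such as log(a)) could be sketched with a recursive walk over the parsed formula instead of terms(). This is only an illustration and not part of the function f: the helper names rename_vars and rename_formula are made up here, and the walk assumes every symbol outside an `==` call is a variable to rename.

```r
# Hypothetical helper (not from the thread): rename every symbol that is
# used as a variable, but leave calls to `==` exactly as written.
rename_vars <- function(expr,
                        rename = function(nm) paste0(toupper(nm), "z")) {
  if (is.name(expr)) {
    return(as.name(rename(as.character(expr))))
  }
  if (is.call(expr)) {
    if (identical(expr[[1]], as.name("=="))) {
      return(expr)  # keep comparisons such as sex == "male" untouched
    }
    # Element 1 is the function (`+`, `*`, `(`, log, ...): leave it alone
    # and recurse into the arguments only.
    args <- lapply(as.list(expr)[-1], rename_vars, rename = rename)
    return(as.call(c(list(expr[[1]]), args)))
  }
  expr  # constants such as 3 or "male"
}

# Hypothetical wrapper: rewrite each side of the formula in place so the
# class and environment of the original formula object are kept.
rename_formula <- function(fm) {
  for (i in seq_along(fm)[-1]) {
    fm[[i]] <- rename_vars(fm[[i]])
  }
  fm
}

rename_formula(y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i)
# Y1z + Y2z ~ Az * (Bz + Cz) + Dz + Fz * (h == 3) + (sex == "male") * Iz

rename_formula(y ~ log(a) + (b == 1))
# Yz ~ log(Az) + (b == 1)
```

Because the walk skips the first element of every call, function names such as log are never renamed. Only the arguments are.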

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
Slightly modified, also seems to work.

gsubfn( "([[:alpha:]][[:alnum:]]*)((?=\\s*[-+~*)])|$)",function(x,...) paste0(toupper(x),'z'), test, perl=TRUE )
#[1] "Y1z + Y2z ~ Az*(Bz + Cz) + Dz + Fz * (h == 3) + (sex == 'male')*Iz"
A.K.



----- Original Message -----
From: Greg Snow <538280 at gmail.com>
To: Frank Harrell <f.harrell at vanderbilt.edu>
Cc: RHELP <R-help at stat.math.ethz.ch>
Sent: Thursday, August 15, 2013 5:07 PM
Subject: Re: [R] regex challenge

Here is a first stab:

library(gsubfn)

test <- "y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i"

gsubfn( "([a-zA-Z][a-zA-Z0-9]*)((?=\\s*[-+~)*])|\\s*$)",
function(x,...) paste0(toupper(x),'z'), test, perl=TRUE )
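
For anyone without gsubfn installed, the same idea can be sketched with base gsub(). This is an illustrative variant, not the code posted in the thread: it relies on the \U and \E case-conversion escapes that gsub() accepts in the replacement when perl = TRUE, and the names pattern and result are chosen here only for the example.

```r
# Same test string as in the message above.
test <- "y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i"

# Group 1: an identifier (a letter followed by letters or digits).
# Lookahead: optional whitespace, then an operator or ")", or end of string.
# A name followed by "==" or by a quote fails the lookahead and is left alone.
pattern <- "([[:alpha:]][[:alnum:]]*)(?=\\s*[-+~*)]|\\s*$)"

# With perl = TRUE, \\U upper-cases what follows and \\E ends the conversion,
# so the appended "z" stays lower case.
result <- gsub(pattern, "\\U\\1\\Ez", test, perl = TRUE)
result
# [1] "Y1z + Y2z ~ Az*(Bz + Cz) + Dz + Fz * (h == 3) + (sex == 'male')*Iz"
```

The lookahead only inspects what follows a name, so a name that ends a comparison, as in (3 == h), would still be renamed. Working on the parsed formula avoids that case.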
On Wed, Aug 14, 2013 at 9:13 PM, Frank Harrell <f.harrell at vanderbilt.edu> wrote: