
trouble automating formula edits when log or * are present; update trouble

10 messages · Paul Johnson, Gabor Grothendieck, Charles C. Berry +2 more

#
Greetings

I want to take a fitted regression and replace all uses of a variable
in a formula. For example, I'd like to take

m1 <- lm(y ~ x1, data=dat)

and replace x1 with something else, say x1c, so the formula would become

m1 <- lm(y ~ x1c, data=dat)

I have working code to finish that part of the problem, but it fails
when the formula is more complicated. If the formula has log(x1) or
x1:x2, the update code I'm testing doesn't get it right.

Here's the test code:

##PJ
## 2012-05-29
dat <- data.frame(x1=rnorm(100,m=50), x2=rnorm(100,m=50),
x3=rnorm(100,m=50), y=rnorm(100))

m1 <- lm(y ~ log(x1) + x1 + sin(x2) + x2 + exp(x3), data=dat)
m2 <- lm(y ~ log(x1) + x2*x3, data=dat)

suffixX <- function(fmla, x, s){
    upform <- as.formula(paste0(". ~ .", "-", x, "+", paste0(x, s)))
    update.formula(fmla, upform)
}

newFmla <- formula(m2)
newFmla
suffixX(newFmla, "x2", "c")
suffixX(newFmla, "x1", "c")

Here are the last few lines of the output. See how the update misses x1 inside
log(x1) or in the interaction?
y ~ log(x1) + x2 * x3
y ~ log(x1) + x3 + x2c + x2:x3
y ~ log(x1) + x2 + x3 + x1c + x2:x3

It gets the target if the target is all by itself, but not otherwise.

After messing with this for quite a while, I conclude that update was
the wrong way to go because it is geared to replacement of individual
bits, not editing all instances of a thing.

So I started studying the structure of formula objects.  I noticed
this really interesting thing: the newFmla object can be probed
recursively to eventually reveal all of the individual pieces:
y ~ log(x1) + x2 * x3
log(x1) + x2 * x3
log(x1)
x1

So, if you could tell me of a general way to "walk" through a formula
object, couldn't I use "gsub" or something like that to recognize each
instance of "x1" and replace with "x1c"??

I just can't figure how to automate the checking of each possible
element in a formula, to get the right combination of [[]][[]][[]].
See what I mean? I need to avoid this:
Error in newFmla[[3]][[2]][[3]] : subscript out of bounds
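
A minimal sketch of such a walk, assuming a hypothetical helper named
replaceVar (not from this thread) and a formula with no empty arguments
such as x[, 1]. It recurses into every call, so no fixed chain of
[[ ]] indices is needed, and it compares whole symbols, so a variable
named x1new is left alone:

```r
## replaceVar is a hypothetical name, used here for illustration only.
replaceVar <- function(expr, old, new) {
    if (is.name(expr)) {
        ## A bare symbol: swap it only if the whole name matches.
        if (identical(as.character(expr), old)) as.name(new) else expr
    } else if (is.call(expr)) {
        ## A call such as log(x1) or x2 * x3: element 1 is the function,
        ## so recurse into the arguments only. A formula is itself a call
        ## to `~`, and [[<- keeps its class and environment.
        for (i in seq_along(expr)[-1]) {
            expr[[i]] <- replaceVar(expr[[i]], old, new)
        }
        expr
    } else {
        ## Constants (numbers, strings) are returned untouched.
        expr
    }
}

f <- y ~ log(x1) + x2 * x3 + x1new
g <- replaceVar(f, "x1", "x1c")
g
```

Because the recursion stops at symbols and constants, it never indexes
past the end of an element, which is what triggers the "subscript out of
bounds" error above.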

pj
#
Hi Paul,

I haven't quite thought through this yet, but might it not be easier
to convert your formula to a character and then use gsub et al on it
directly?

Something like this

# Using m2 as you set up below
m2 <- lm(y ~ log(x1) + x2*x3, data=dat)

f2 <- formula(m2)

as.formula(paste(f2[2], f2[1],gsub("x1", "x1c", as.character(f2[3]))))

It's admittedly unwieldy, but it seems pretty robust.

Something like:

changeFormula <- function(form, xIn, xOut){
    as.formula(paste(form[2], form[1], gsub(xIn, xOut, as.character(form[3]))))
}

changeFormula(formula(m2), "x1", "x1c")

I'm not sure if this will play nice with environments and what not so
you might need to change those manually.
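
One way the environment concern might be handled, sketched under the
assumption that copying the original formula's environment onto the
result is what is wanted. swapVar is a hypothetical name. deparse() can
return several strings for a long formula, hence the paste(collapse):

```r
## swapVar is a hypothetical helper, not part of any package.
swapVar <- function(form, xIn, xOut) {
    ## Collapse in case deparse() splits a long formula over several strings.
    txt <- paste(deparse(form), collapse = " ")
    ## fixed = TRUE treats xIn as literal text rather than a regex.
    out <- as.formula(gsub(xIn, xOut, txt, fixed = TRUE))
    ## Carry over the environment of the original formula.
    environment(out) <- environment(form)
    out
}

f <- y ~ log(x1) + x2 * x3
g <- swapVar(f, "x1", "x1c")
g
```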

Hope this gets you started,
Michael
On Tue, May 29, 2012 at 11:43 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:
#
Michael:

m2 is a model fit, not a formula. So I don't think what you suggested will work.

However, I think your idea is a good one. The trick is to protect the
model specification from evaluation via quote(). e.g.
[1] "lm(y ~ x1)"

Then you can apply your suggestion:
[1] "lm(y ~ log(x1))"
Call:
lm(formula = y ~ log(x1))

Coefficients:
(Intercept)      log(x1)
   -0.04894      0.36484


The gsub() would make the substitution wherever "x1" appeared in the
model formula, thus fulfilling the OP's request.
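
The two steps above could look roughly like this. The original commands
were not preserved in the archive, so this is a reconstruction under
assumptions: the data frame dat and the data= argument are illustrative,
and the replacement text is taken from the output shown.

```r
set.seed(1)  # illustrative data, not the poster's
dat <- data.frame(x1 = rnorm(100, mean = 50), y = rnorm(100))

## quote() keeps the lm() call unevaluated; deparse() turns it into text.
cl <- quote(lm(y ~ x1, data = dat))
txt <- deparse(cl)

## Edit the text, then parse and evaluate it to fit the modified model.
txt2 <- gsub("x1", "log(x1)", txt, fixed = TRUE)
fit <- eval(parse(text = txt2))
formula(fit)
```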

Two comments:

1. update() behaves as documented. It is a formula update method, not
a macro substitution procedure.

2. I believe this illustrates a legitimate violation of the "avoid the
eval(parse()) construction" precept. However, I may be wrong about this
and would welcome being corrected and shown a better alternative.

Cheers,
Bert





On Tue, May 29, 2012 at 9:31 AM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:

  
    
#
I should have added:

If the formula is just assigned to a name, quote() and
eval(parse(...)) are not needed:

fm1 <-  y ~ x1  ## a formula
w <- gsub( "x1","log(x1)", deparse(fm1))
fm2 <- formula(w)

This is probably the better way to do it.

-- Bert
On Tue, May 29, 2012 at 10:01 AM, Bert Gunter <bgunter at gene.com> wrote:

  
    
#
On Tue, May 29, 2012 at 11:43 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:
Try substitute:
y ~ log(x1c) + x2 * x3
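
The command that produced this output did not survive in the archive. A
guess at its shape, based on the mention of setNames later in the
thread, is a do.call() on substitute(), so that the formula object
itself, rather than the symbol newFmla, is what gets substituted into:

```r
newFmla <- y ~ log(x1) + x2 * x3

## do.call() places the formula object directly in the call, so
## substitute() rewrites its symbols. setNames() builds the
## list(x1 = as.name("x1c")) mapping from old name to new symbol.
res <- do.call("substitute",
               list(newFmla, setNames(list(as.name("x1c")), "x1")))
res
```

Substitution works on symbols rather than text, so a variable named
x1new would not be affected.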
#
Damn. That's pretty. I'd say "setNames" a magic bullet too.

Thanks very much.

The approach suggested by Michael and Bert has the little shortcoming
that grepping for "x1" also picks up similarly named variables like
"x1new" and "x1old".  I've not yet found a similar downside with
Gabor's method.

pj

  
    
#
Paul Johnson <pauljohn32 at gmail.com> writes:
Some incantation involving substitute(), perhaps??
y ~ log(z) * (w + u)


HTH,

Chuck

  
    
#
Deparse... that's it -- was disappointed with having to turn
as.character.formula inside out once and again. Merci!

But, as always, we all lose to Gabor ;-)

Michael
On Tue, May 29, 2012 at 1:16 PM, Bert Gunter <gunter.berton at gene.com> wrote:
#
Paul et al.:

I think Gabor's incantation qualifies as my desired alternative to
eval(parse()). It is, unfortunately, rather tricky, imo.

However, the objection you raise to the gsub(deparse()) solution is
easily overcome through the use of an appropriate regex: e.g.:

 gsub("\\<x\\>","log(x)","x+xc+cx")
[1] "log(x)+xc+cx"

See the ?regex documentation on \< and \> . I believe this allows this
simple approach to handle all cases -- am I wrong about this?
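
A quick check of the word-boundary pattern against the names raised
earlier in the thread. The dotted name x1.old is an added illustration,
not from the thread: since "." is not a word character, \> matches just
before it, so this may be one case the simple approach does not cover.

```r
txt <- "y ~ log(x1) + x1new + x1old + x1.old"

## \\< and \\> anchor the match at the start and end of a word.
out <- gsub("\\<x1\\>", "x1c", txt)
out
## x1new and x1old are left alone, but x1.old becomes x1c.old
```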

Incidentally, I need to note that I was **wrong** in my previous statement that
formula(lm(y~x1)) does not work. It works fine, as there is a formula
method for (g)lm . I should look before I leap.

-- Bert
On Tue, May 29, 2012 at 10:41 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:

  
    
#
or use quote()...

-- Bert
On Tue, May 29, 2012 at 10:48 AM, <cberry at tajo.ucsd.edu> wrote:
## or using quote()