Skip to content

robust updating methods

5 messages · Thierry Onkelinx, Ben Bolker

#
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

WARNING: this is long.  Sorry I couldn't find a way to compress it.

Is there a reasonable way to design an update method so that it's
robust to a variety of reasonable use cases of generating calls or
data inside or outside a function?  Is it even possible?  Should I
just tell users "don't do that"?

* `update.default()` uses `eval(call, parent.frame())`; this fails
  when the call depends on objects that were defined in a different
  environment (e.g., when the data are generated and the model
  initially fitted within a function scope)

* an alternative is to store the original environment in which the
  fitting is done in the environment of the formula and use
  `eval(call, env=environment(formula(object)))`; this fails if the
  user tries to update the model originally fitted outside a function
  with data modified within a function ...

* I think I've got a hack that works below, which first tries in the
  environment of the formula and falls back to the parent frame if
  that fails, but I wonder if I'm missing something much simpler ..

  Thoughts?  My understanding of environments and frames is still,
after all these years, not what it should be ...

I've thought of some other workarounds, none entirely satisfactory:

* force evaluation of all elements in the original call
    * printing components of the call can get ugly (can save the
original call before evaluating)
    * large objects in the call get duplicated
* don't use `eval(call)` for updates; instead try to store everything
internally
    * this works OK but has the same drawback of potentially storing
large extra copies
    * we could try to use the model frame (which is stored already),
but there are issues with this (the basis of a whole separate rant)
because the model frame stores something in between predictor
variables and input variables. For example

   d <- data.frame(y=1:10,x=runif(10))
   names(model.frame(lm(y~log(x),data=d)))
   ## "y" "log(x)"

So if we wanted to do something like update to "y ~ sqrt(x)",
it wouldn't work ...

==================
update.envformula <- function(object,...) {
    extras <- match.call(expand.dots = FALSE)$...
    call <- getCall(object)
    for (i in names(extras)) {
        existing <- !is.na(match(names(extras), names(call)))
        for (a in names(extras)[existing]) call[[a]] <- extras[[a]]
        if (any(!existing)) {
            call <- c(as.list(call), extras[!existing])
            call <- as.call(call)
        }
    }
    eval(call, env=environment(formula(object)))
    ## enclos=parent.frame() doesn't help
}

update.both <- function(object,...) {
    extras <- match.call(expand.dots = FALSE)$...
    call <- getCall(object)
    for (i in names(extras)) {
        existing <- !is.na(match(names(extras), names(call)))
        for (a in names(extras)[existing]) call[[a]] <- extras[[a]]
        if (any(!existing)) {
            call <- c(as.list(call), extras[!existing])
            call <- as.call(call)
        }
    }
    pf <- parent.frame()  ## save parent frame in case we need it
    tryCatch(eval(call, env=environment(formula(object))),
             error=function(e) {
                 eval(call, pf)
             })
}

### TEST CASES

set.seed(101)
d <- data.frame(x=1:10,y=rnorm(10))
m1 <- lm(y~x,data=d)

##' define data within function, return fitted model
f1 <- function() {
    d2 <- d
    lm(y~x,data=d2)
    return(lm(y~x,data=d2))
}
##' define (and modify) data within function, try to update
##' model fitted elsewhere
f2 <- function(m) {
    d2 <- d; d2[1] <- d2[1]+0 ## force copy
    update.default(m,data=d2)
}
##' define (and modify) data within function, try to update
##' model fitted elsewhere (use envformula)
f3 <- function(m) {
    d2 <- d; d2[1] <- d2[1]+0 ## force copy
    update.envformula(m,data=d2)
}

##' hack: first try the formula, then the parent frame
##' if that doesn't work for any reason
f4 <- function(m) {
    d2 <- d; d2[1] <- d2[1]+0 ## force copy
    update.both(m,data=d2)
}

## Case 1: fit within function
m2 <- f1()
try(update.default(m2))       ## default: object 'd2' not found
m3A <- update.envformula(m2)  ## envformula: works
m3B <- update.both(m2)        ## works

## Case 2: update within function
m4A <- f2(m1)  ## default: works
try(f3(m1))    ## envformula: object 'd2' not found
m4B <- f4(m1)  ## works

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBAgAGBQJVDvHBAAoJEOCV5YRblxUHtnwH/2452Fqmsm//LxNeXxhNrs/G
3ss8rPV2EVJQFjAyO34zzuYZVp1mWlgt7C1L7nXcH4lcf66gYfj75X/yVvMxxwmS
uXEP9HKr4TR5D6Ukf16qqtK41PhTkjS+RDaEpj6BVq9YxlYb53kSjKF+anrf08SL
K6i2L+c4nJIwk56CCp0Re/EIDCNeW5lXYX9jdPAalMoxSHTG8wmytvq0x89+KsRC
aUTpdylPka70vwBQle9W4OEyhBpADIHfEg8csYXZ/MeOKM0bqCRu2IU3OyuCsxyX
Pbo3w1Z7x0e+2WoX4PcltweYK4PJC9O10v+RKYaI5YRy1dFWMQWcPoAKRKf7jwY=
=S7zP
-----END PGP SIGNATURE-----
1 day later
#
Dear Ben,

Last week I was struggling with incorporating lme4 into a package. I traced
the problem and made a reproducible example (
https://github.com/ThierryO/testlme4).  It looks very simular to the
problem you describe.

The 'tests' directory contains the reproducible examples. confint() of a
model as returned by a function fails. It even fails when I try to
calculate the confint() inside the same function as the glmer() call (see
the fit_model_ci function).

Best regards,

Thierry


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-03-22 17:45 GMT+01:00 Ben Bolker <bbolker at gmail.com>:

  
  
1 day later
#
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 15-03-23 12:55 PM, Thierry Onkelinx wrote:
Ugh.  I can get this to work if I also try searching up the call
stack, as follows (within update.merMod).  This feels like "code
smell" to me though -- i.e., if I have to hack this hard I must be
doing something wrong/misunderstanding how the problem *should* be done.


    if (evaluate) {
        ff <- environment(formula(object))
        pf <- parent.frame()  ## save parent frame in case we need it
        sf <- sys.frames()[[1]]
        tryCatch(eval(call, env=ff),
                 error=function(e) {
                     tryCatch(eval(call, env=sf),
                              error=function(e) {
                                  eval(call, pf)
                              })
                 })
    } else call

  Here is an adapted even-more-minimal version of your code, which
seems to work with the version of update.merMod I just pushed to
github, but fails for glm():


## https://github.com/ThierryO/testlme4/blob/master/R/fit_model_ci.R
fit_model_ci <- function(formula, dataset, mfun=glmer){
    model <- mfun(
        formula = formula,
        data = dataset,
        family = "poisson"
    )
    ci <- confint(model)
    return(list(model = model, confint = ci))
}

library("lme4")
set.seed(101)
dd <- data.frame(f=factor(rep(1:10,each=100)),
                 y=rpois(2,1000))
fit_model_ci(y~(1|f),dataset=dd)
fit_model_ci(y~(1|f),dataset=dd,mfun=glm)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBAgAGBQJVEfl/AAoJEOCV5YRblxUHWdgH/AqLAhDqKV8aRg6jnX9rO96D
nwzqv0ClMIxVr2dzD4eSQTL2caWZnXVkws+lg9N7bc4BaWplcYxLNRBw5M8zHOPJ
E7JlhG3EecvmeAEt9OY0/q6I0D6vdoEjcH7wzzuyLLIqllu9OskxURi/azMs0XRo
tiN+oG5aOKsMYsEGjtiWySRDzhJh2TM40A1HHjAViqpxZcqilAZ6RiNEFe1t1JY0
IvDI8yesSuHnKtgAiqk9ivGw4BCCGoBSIHB3GrJIi11j06iYKw0ugVHIlKYO8cqf
AYTvEX2sSxsJgKWYTiG/1dr/kiFTntTDji03zRLVUdPKIZATJMczv+KB+0bpoVY=
=Z34K
-----END PGP SIGNATURE-----
2 days later
#
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

  [Sorry to those who don't like it for top-posting]

  Thierry, I'm curious whether this addresses your problem (although
we don't have a hard timetable for the next release [it has to avoid
conflicts with the 3.2.0 release in 2.5 weeks at the very least], so
this might be problematic if your package needs to depend on it).

  I'm still curious whether there are any ideas/opinions from other
readers.  Has anyone else struggled with this?  Is there a canonical
solution?

 Ben Bolker
On 15-03-24 07:55 PM, Ben Bolker wrote:
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBAgAGBQJVFc1CAAoJEOCV5YRblxUHF+MH/3Y9uFZFolhx5b5jWSyXwQgp
i9oawx4K6il0qiAiDiO5D7NSSdc0u9jlgj8NjH0G2O9u3ctpvcYNVwa7cP9288Xz
xRyInnnh2FIpT6W0XyzJDivw5EX3IkyYuv6eDNqVyGcYXkvzJMA+vwMMWdGWEZbL
jKtDc0trG+9yJnwIi6DW6IQWPovrDaNxEinS+V7+DmYACQvJ4P2kg2u/ZsxAx89q
mcA1pS5usJjkOiQwBVUvV7l2UKNhHPFNwbBK1QdHgpP7PTdB52EQr+IyERhpf56s
8tYyNbSSPWoG9vt6/1pKyUK4iNRBtGgxtuozAv5OUjF8VGWGwUXBLo5G2yrBbs4=
=o1PJ
-----END PGP SIGNATURE-----
#
Dear Ben,

Fitting the model and calculating the confidence intervals within the same
function works (
https://github.com/ThierryO/testlme4/blob/master/tests/testthat/test_fit_model_ci.R
passes).
Fitting the model inside a function and calculating the confidence
intervals on the output still fails (
https://github.com/ThierryO/testlme4/blob/master/tests/testthat/test_fit_model.R
fails).

Directly calculating the confidence intervals in the same function is an
acceptable solution for my work. Thank you for this solution.

Best regards,

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-03-27 22:36 GMT+01:00 Ben Bolker <bbolker at gmail.com>: