robust updating methods
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 15-03-23 12:55 PM, Thierry Onkelinx wrote:
Dear Ben, Last week I was struggling with incorporating lme4 into a package. I traced the problem and made a reproducible example ( https://github.com/ThierryO/testlme4). It looks very simular to the problem you describe. The 'tests' directory contains the reproducible examples. confint() of a model as returned by a function fails. It even fails when I try to calculate the confint() inside the same function as the glmer() call (see the fit_model_ci function). Best regards, Thierry
Ugh. I can get this to work if I also try searching up the call
stack, as follows (within update.merMod). This feels like "code
smell" to me though -- i.e., if I have to hack this hard I must be
doing something wrong/misunderstanding how the problem *should* be done.
if (evaluate) {
ff <- environment(formula(object))
pf <- parent.frame() ## save parent frame in case we need it
sf <- sys.frames()[[1]]
tryCatch(eval(call, env=ff),
error=function(e) {
tryCatch(eval(call, env=sf),
error=function(e) {
eval(call, pf)
})
})
} else call
Here is an adapted even-more-minimal version of your code, which
seems to work with the version of update.merMod I just pushed to
github, but fails for glm():
## https://github.com/ThierryO/testlme4/blob/master/R/fit_model_ci.R
fit_model_ci <- function(formula, dataset, mfun=glmer){
model <- mfun(
formula = formula,
data = dataset,
family = "poisson"
)
ci <- confint(model)
return(list(model = model, confint = ci))
}
library("lme4")
set.seed(101)
dd <- data.frame(f=factor(rep(1:10,each=100)),
y=rpois(2,1000))
fit_model_ci(y~(1|f),dataset=dd)
fit_model_ci(y~(1|f),dataset=dd,mfun=glm)
ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
Research Institute for Nature and Forest team Biometrie &
Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat
25 1070 Anderlecht Belgium
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may
be able to say what the experiment died of. ~ Sir Ronald Aylmer
Fisher The plural of anecdote is not data. ~ Roger Brinner The
combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given
body of data. ~ John Tukey
2015-03-22 17:45 GMT+01:00 Ben Bolker <bbolker at gmail.com>:
WARNING: this is long. Sorry I couldn't find a way to compress
it.
Is there a reasonable way to design an update method so that it's
robust to a variety of reasonable use cases of generating calls or
data inside or outside a function? Is it even possible? Should I
just tell users "don't do that"?
* `update.default()` uses `eval(call, parent.frame())`; this fails
when the call depends on objects that were defined in a different
environment (e.g., when the data are generated and the model
initially fitted within a function scope)
* an alternative is to store the original environment in which the
fitting is done in the environment of the formula and use
`eval(call, env=environment(formula(object)))`; this fails if the
user tries to update the model originally fitted outside a
function with data modified within a function ...
* I think I've got a hack that works below, which first tries in
the environment of the formula and falls back to the parent frame
if that fails, but I wonder if I'm missing something much simpler
..
Thoughts? My understanding of environments and frames is still,
after all these years, not what it should be ...
I've thought of some other workarounds, none entirely
satisfactory:
* force evaluation of all elements in the original call * printing
components of the call can get ugly (can save the original call
before evaluating) * large objects in the call get duplicated *
don't use `eval(call)` for updates; instead try to store
everything internally * this works OK but has the same drawback of
potentially storing large extra copies * we could try to use the
model frame (which is stored already), but there are issues with
this (the basis of a whole separate rant) because the model frame
stores something in between predictor variables and input
variables. For example
d <- data.frame(y=1:10,x=runif(10))
names(model.frame(lm(y~log(x),data=d))) ## "y" "log(x)"
So if we wanted to do something like update to "y ~ sqrt(x)", it
wouldn't work ...
================== update.envformula <- function(object,...) {
extras <- match.call(expand.dots = FALSE)$... call <-
getCall(object) for (i in names(extras)) { existing <-
!is.na(match(names(extras), names(call))) for (a in
names(extras)[existing]) call[[a]] <- extras[[a]] if
(any(!existing)) { call <- c(as.list(call), extras[!existing]) call
<- as.call(call) } } eval(call, env=environment(formula(object)))
## enclos=parent.frame() doesn't help }
update.both <- function(object,...) { extras <-
match.call(expand.dots = FALSE)$... call <- getCall(object) for (i
in names(extras)) { existing <- !is.na(match(names(extras),
names(call))) for (a in names(extras)[existing]) call[[a]] <-
extras[[a]] if (any(!existing)) { call <- c(as.list(call),
extras[!existing]) call <- as.call(call) } } pf <- parent.frame()
## save parent frame in case we need it tryCatch(eval(call,
env=environment(formula(object))), error=function(e) { eval(call,
pf) }) }
### TEST CASES
set.seed(101) d <- data.frame(x=1:10,y=rnorm(10)) m1 <-
lm(y~x,data=d)
##' define data within function, return fitted model f1 <-
function() { d2 <- d lm(y~x,data=d2) return(lm(y~x,data=d2)) } ##'
define (and modify) data within function, try to update ##' model
fitted elsewhere f2 <- function(m) { d2 <- d; d2[1] <- d2[1]+0 ##
force copy update.default(m,data=d2) } ##' define (and modify) data
within function, try to update ##' model fitted elsewhere (use
envformula) f3 <- function(m) { d2 <- d; d2[1] <- d2[1]+0 ## force
copy update.envformula(m,data=d2) }
##' hack: first try the formula, then the parent frame ##' if that
doesn't work for any reason f4 <- function(m) { d2 <- d; d2[1] <-
d2[1]+0 ## force copy update.both(m,data=d2) }
## Case 1: fit within function m2 <- f1() try(update.default(m2))
## default: object 'd2' not found m3A <- update.envformula(m2) ##
envformula: works m3B <- update.both(m2) ## works
## Case 2: update within function m4A <- f2(m1) ## default: works
try(f3(m1)) ## envformula: object 'd2' not found m4B <- f4(m1)
## works
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAEBAgAGBQJVEfl/AAoJEOCV5YRblxUHWdgH/AqLAhDqKV8aRg6jnX9rO96D nwzqv0ClMIxVr2dzD4eSQTL2caWZnXVkws+lg9N7bc4BaWplcYxLNRBw5M8zHOPJ E7JlhG3EecvmeAEt9OY0/q6I0D6vdoEjcH7wzzuyLLIqllu9OskxURi/azMs0XRo tiN+oG5aOKsMYsEGjtiWySRDzhJh2TM40A1HHjAViqpxZcqilAZ6RiNEFe1t1JY0 IvDI8yesSuHnKtgAiqk9ivGw4BCCGoBSIHB3GrJIi11j06iYKw0ugVHIlKYO8cqf AYTvEX2sSxsJgKWYTiG/1dr/kiFTntTDji03zRLVUdPKIZATJMczv+KB+0bpoVY= =Z34K -----END PGP SIGNATURE-----