-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
WARNING: this is long. Sorry I couldn't find a way to compress it.
Is there a reasonable way to design an update method so that it's
robust to a variety of reasonable use cases of generating calls or
data inside or outside a function? Is it even possible? Should I
just tell users "don't do that"?
* `update.default()` uses `eval(call, parent.frame())`; this fails
when the call depends on objects that were defined in a different
environment (e.g., when the data are generated and the model
initially fitted within a function scope)
* an alternative is to store the original environment in which the
fitting is done in the environment of the formula and use
`eval(call, env=environment(formula(object)))`; this fails if the
user tries to update the model originally fitted outside a function
with data modified within a function ...
* I think I've got a hack that works below, which first tries in the
environment of the formula and falls back to the parent frame if
that fails, but I wonder if I'm missing something much simpler ..
Thoughts? My understanding of environments and frames is still,
after all these years, not what it should be ...
I've thought of some other workarounds, none entirely satisfactory:
* force evaluation of all elements in the original call
* printing components of the call can get ugly (can save the
original call before evaluating)
* large objects in the call get duplicated
* don't use `eval(call)` for updates; instead try to store everything
internally
* this works OK but has the same drawback of potentially storing
large extra copies
* we could try to use the model frame (which is stored already),
but there are issues with this (the basis of a whole separate rant)
because the model frame stores something in between predictor
variables and input variables. For example
d <- data.frame(y=1:10,x=runif(10))
names(model.frame(lm(y~log(x),data=d)))
## "y" "log(x)"
So if we wanted to do something like update to "y ~ sqrt(x)",
it wouldn't work ...
==================
update.envformula <- function(object,...) {
extras <- match.call(expand.dots = FALSE)$...
call <- getCall(object)
for (i in names(extras)) {
existing <- !is.na(match(names(extras), names(call)))
for (a in names(extras)[existing]) call[[a]] <- extras[[a]]
if (any(!existing)) {
call <- c(as.list(call), extras[!existing])
call <- as.call(call)
}
}
eval(call, env=environment(formula(object)))
## enclos=parent.frame() doesn't help
}
update.both <- function(object,...) {
extras <- match.call(expand.dots = FALSE)$...
call <- getCall(object)
for (i in names(extras)) {
existing <- !is.na(match(names(extras), names(call)))
for (a in names(extras)[existing]) call[[a]] <- extras[[a]]
if (any(!existing)) {
call <- c(as.list(call), extras[!existing])
call <- as.call(call)
}
}
pf <- parent.frame() ## save parent frame in case we need it
tryCatch(eval(call, env=environment(formula(object))),
error=function(e) {
eval(call, pf)
})
}
### TEST CASES
set.seed(101)
d <- data.frame(x=1:10,y=rnorm(10))
m1 <- lm(y~x,data=d)
##' define data within function, return fitted model
f1 <- function() {
d2 <- d
lm(y~x,data=d2)
return(lm(y~x,data=d2))
}
##' define (and modify) data within function, try to update
##' model fitted elsewhere
f2 <- function(m) {
d2 <- d; d2[1] <- d2[1]+0 ## force copy
update.default(m,data=d2)
}
##' define (and modify) data within function, try to update
##' model fitted elsewhere (use envformula)
f3 <- function(m) {
d2 <- d; d2[1] <- d2[1]+0 ## force copy
update.envformula(m,data=d2)
}
##' hack: first try the formula, then the parent frame
##' if that doesn't work for any reason
f4 <- function(m) {
d2 <- d; d2[1] <- d2[1]+0 ## force copy
update.both(m,data=d2)
}
## Case 1: fit within function
m2 <- f1()
try(update.default(m2)) ## default: object 'd2' not found
m3A <- update.envformula(m2) ## envformula: works
m3B <- update.both(m2) ## works
## Case 2: update within function
m4A <- f2(m1) ## default: works
try(f3(m1)) ## envformula: object 'd2' not found
m4B <- f4(m1) ## works
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAEBAgAGBQJVDvHBAAoJEOCV5YRblxUHtnwH/2452Fqmsm//LxNeXxhNrs/G
3ss8rPV2EVJQFjAyO34zzuYZVp1mWlgt7C1L7nXcH4lcf66gYfj75X/yVvMxxwmS
uXEP9HKr4TR5D6Ukf16qqtK41PhTkjS+RDaEpj6BVq9YxlYb53kSjKF+anrf08SL
K6i2L+c4nJIwk56CCp0Re/EIDCNeW5lXYX9jdPAalMoxSHTG8wmytvq0x89+KsRC
aUTpdylPka70vwBQle9W4OEyhBpADIHfEg8csYXZ/MeOKM0bqCRu2IU3OyuCsxyX
Pbo3w1Z7x0e+2WoX4PcltweYK4PJC9O10v+RKYaI5YRy1dFWMQWcPoAKRKf7jwY=
=S7zP
-----END PGP SIGNATURE-----
robust updating methods
5 messages · Thierry Onkelinx, Ben Bolker
1 day later
Dear Ben, Last week I was struggling with incorporating lme4 into a package. I traced the problem and made a reproducible example ( https://github.com/ThierryO/testlme4). It looks very simular to the problem you describe. The 'tests' directory contains the reproducible examples. confint() of a model as returned by a function fails. It even fails when I try to calculate the confint() inside the same function as the glmer() call (see the fit_model_ci function). Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2015-03-22 17:45 GMT+01:00 Ben Bolker <bbolker at gmail.com>:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
WARNING: this is long. Sorry I couldn't find a way to compress it.
Is there a reasonable way to design an update method so that it's
robust to a variety of reasonable use cases of generating calls or
data inside or outside a function? Is it even possible? Should I
just tell users "don't do that"?
* `update.default()` uses `eval(call, parent.frame())`; this fails
when the call depends on objects that were defined in a different
environment (e.g., when the data are generated and the model
initially fitted within a function scope)
* an alternative is to store the original environment in which the
fitting is done in the environment of the formula and use
`eval(call, env=environment(formula(object)))`; this fails if the
user tries to update the model originally fitted outside a function
with data modified within a function ...
* I think I've got a hack that works below, which first tries in the
environment of the formula and falls back to the parent frame if
that fails, but I wonder if I'm missing something much simpler ..
Thoughts? My understanding of environments and frames is still,
after all these years, not what it should be ...
I've thought of some other workarounds, none entirely satisfactory:
* force evaluation of all elements in the original call
* printing components of the call can get ugly (can save the
original call before evaluating)
* large objects in the call get duplicated
* don't use `eval(call)` for updates; instead try to store everything
internally
* this works OK but has the same drawback of potentially storing
large extra copies
* we could try to use the model frame (which is stored already),
but there are issues with this (the basis of a whole separate rant)
because the model frame stores something in between predictor
variables and input variables. For example
d <- data.frame(y=1:10,x=runif(10))
names(model.frame(lm(y~log(x),data=d)))
## "y" "log(x)"
So if we wanted to do something like update to "y ~ sqrt(x)",
it wouldn't work ...
==================
update.envformula <- function(object,...) {
extras <- match.call(expand.dots = FALSE)$...
call <- getCall(object)
for (i in names(extras)) {
existing <- !is.na(match(names(extras), names(call)))
for (a in names(extras)[existing]) call[[a]] <- extras[[a]]
if (any(!existing)) {
call <- c(as.list(call), extras[!existing])
call <- as.call(call)
}
}
eval(call, env=environment(formula(object)))
## enclos=parent.frame() doesn't help
}
update.both <- function(object,...) {
extras <- match.call(expand.dots = FALSE)$...
call <- getCall(object)
for (i in names(extras)) {
existing <- !is.na(match(names(extras), names(call)))
for (a in names(extras)[existing]) call[[a]] <- extras[[a]]
if (any(!existing)) {
call <- c(as.list(call), extras[!existing])
call <- as.call(call)
}
}
pf <- parent.frame() ## save parent frame in case we need it
tryCatch(eval(call, env=environment(formula(object))),
error=function(e) {
eval(call, pf)
})
}
### TEST CASES
set.seed(101)
d <- data.frame(x=1:10,y=rnorm(10))
m1 <- lm(y~x,data=d)
##' define data within function, return fitted model
f1 <- function() {
d2 <- d
lm(y~x,data=d2)
return(lm(y~x,data=d2))
}
##' define (and modify) data within function, try to update
##' model fitted elsewhere
f2 <- function(m) {
d2 <- d; d2[1] <- d2[1]+0 ## force copy
update.default(m,data=d2)
}
##' define (and modify) data within function, try to update
##' model fitted elsewhere (use envformula)
f3 <- function(m) {
d2 <- d; d2[1] <- d2[1]+0 ## force copy
update.envformula(m,data=d2)
}
##' hack: first try the formula, then the parent frame
##' if that doesn't work for any reason
f4 <- function(m) {
d2 <- d; d2[1] <- d2[1]+0 ## force copy
update.both(m,data=d2)
}
## Case 1: fit within function
m2 <- f1()
try(update.default(m2)) ## default: object 'd2' not found
m3A <- update.envformula(m2) ## envformula: works
m3B <- update.both(m2) ## works
## Case 2: update within function
m4A <- f2(m1) ## default: works
try(f3(m1)) ## envformula: object 'd2' not found
m4B <- f4(m1) ## works
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAEBAgAGBQJVDvHBAAoJEOCV5YRblxUHtnwH/2452Fqmsm//LxNeXxhNrs/G
3ss8rPV2EVJQFjAyO34zzuYZVp1mWlgt7C1L7nXcH4lcf66gYfj75X/yVvMxxwmS
uXEP9HKr4TR5D6Ukf16qqtK41PhTkjS+RDaEpj6BVq9YxlYb53kSjKF+anrf08SL
K6i2L+c4nJIwk56CCp0Re/EIDCNeW5lXYX9jdPAalMoxSHTG8wmytvq0x89+KsRC
aUTpdylPka70vwBQle9W4OEyhBpADIHfEg8csYXZ/MeOKM0bqCRu2IU3OyuCsxyX
Pbo3w1Z7x0e+2WoX4PcltweYK4PJC9O10v+RKYaI5YRy1dFWMQWcPoAKRKf7jwY=
=S7zP
-----END PGP SIGNATURE-----
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
1 day later
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 15-03-23 12:55 PM, Thierry Onkelinx wrote:
Dear Ben, Last week I was struggling with incorporating lme4 into a package. I traced the problem and made a reproducible example ( https://github.com/ThierryO/testlme4). It looks very simular to the problem you describe. The 'tests' directory contains the reproducible examples. confint() of a model as returned by a function fails. It even fails when I try to calculate the confint() inside the same function as the glmer() call (see the fit_model_ci function). Best regards, Thierry
Ugh. I can get this to work if I also try searching up the call
stack, as follows (within update.merMod). This feels like "code
smell" to me though -- i.e., if I have to hack this hard I must be
doing something wrong/misunderstanding how the problem *should* be done.
if (evaluate) {
ff <- environment(formula(object))
pf <- parent.frame() ## save parent frame in case we need it
sf <- sys.frames()[[1]]
tryCatch(eval(call, env=ff),
error=function(e) {
tryCatch(eval(call, env=sf),
error=function(e) {
eval(call, pf)
})
})
} else call
Here is an adapted even-more-minimal version of your code, which
seems to work with the version of update.merMod I just pushed to
github, but fails for glm():
## https://github.com/ThierryO/testlme4/blob/master/R/fit_model_ci.R
fit_model_ci <- function(formula, dataset, mfun=glmer){
model <- mfun(
formula = formula,
data = dataset,
family = "poisson"
)
ci <- confint(model)
return(list(model = model, confint = ci))
}
library("lme4")
set.seed(101)
dd <- data.frame(f=factor(rep(1:10,each=100)),
y=rpois(2,1000))
fit_model_ci(y~(1|f),dataset=dd)
fit_model_ci(y~(1|f),dataset=dd,mfun=glm)
ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
Research Institute for Nature and Forest team Biometrie &
Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat
25 1070 Anderlecht Belgium
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may
be able to say what the experiment died of. ~ Sir Ronald Aylmer
Fisher The plural of anecdote is not data. ~ Roger Brinner The
combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given
body of data. ~ John Tukey
2015-03-22 17:45 GMT+01:00 Ben Bolker <bbolker at gmail.com>:
WARNING: this is long. Sorry I couldn't find a way to compress
it.
Is there a reasonable way to design an update method so that it's
robust to a variety of reasonable use cases of generating calls or
data inside or outside a function? Is it even possible? Should I
just tell users "don't do that"?
* `update.default()` uses `eval(call, parent.frame())`; this fails
when the call depends on objects that were defined in a different
environment (e.g., when the data are generated and the model
initially fitted within a function scope)
* an alternative is to store the original environment in which the
fitting is done in the environment of the formula and use
`eval(call, env=environment(formula(object)))`; this fails if the
user tries to update the model originally fitted outside a
function with data modified within a function ...
* I think I've got a hack that works below, which first tries in
the environment of the formula and falls back to the parent frame
if that fails, but I wonder if I'm missing something much simpler
..
Thoughts? My understanding of environments and frames is still,
after all these years, not what it should be ...
I've thought of some other workarounds, none entirely
satisfactory:
* force evaluation of all elements in the original call * printing
components of the call can get ugly (can save the original call
before evaluating) * large objects in the call get duplicated *
don't use `eval(call)` for updates; instead try to store
everything internally * this works OK but has the same drawback of
potentially storing large extra copies * we could try to use the
model frame (which is stored already), but there are issues with
this (the basis of a whole separate rant) because the model frame
stores something in between predictor variables and input
variables. For example
d <- data.frame(y=1:10,x=runif(10))
names(model.frame(lm(y~log(x),data=d))) ## "y" "log(x)"
So if we wanted to do something like update to "y ~ sqrt(x)", it
wouldn't work ...
================== update.envformula <- function(object,...) {
extras <- match.call(expand.dots = FALSE)$... call <-
getCall(object) for (i in names(extras)) { existing <-
!is.na(match(names(extras), names(call))) for (a in
names(extras)[existing]) call[[a]] <- extras[[a]] if
(any(!existing)) { call <- c(as.list(call), extras[!existing]) call
<- as.call(call) } } eval(call, env=environment(formula(object)))
## enclos=parent.frame() doesn't help }
update.both <- function(object,...) { extras <-
match.call(expand.dots = FALSE)$... call <- getCall(object) for (i
in names(extras)) { existing <- !is.na(match(names(extras),
names(call))) for (a in names(extras)[existing]) call[[a]] <-
extras[[a]] if (any(!existing)) { call <- c(as.list(call),
extras[!existing]) call <- as.call(call) } } pf <- parent.frame()
## save parent frame in case we need it tryCatch(eval(call,
env=environment(formula(object))), error=function(e) { eval(call,
pf) }) }
### TEST CASES
set.seed(101) d <- data.frame(x=1:10,y=rnorm(10)) m1 <-
lm(y~x,data=d)
##' define data within function, return fitted model f1 <-
function() { d2 <- d lm(y~x,data=d2) return(lm(y~x,data=d2)) } ##'
define (and modify) data within function, try to update ##' model
fitted elsewhere f2 <- function(m) { d2 <- d; d2[1] <- d2[1]+0 ##
force copy update.default(m,data=d2) } ##' define (and modify) data
within function, try to update ##' model fitted elsewhere (use
envformula) f3 <- function(m) { d2 <- d; d2[1] <- d2[1]+0 ## force
copy update.envformula(m,data=d2) }
##' hack: first try the formula, then the parent frame ##' if that
doesn't work for any reason f4 <- function(m) { d2 <- d; d2[1] <-
d2[1]+0 ## force copy update.both(m,data=d2) }
## Case 1: fit within function m2 <- f1() try(update.default(m2))
## default: object 'd2' not found m3A <- update.envformula(m2) ##
envformula: works m3B <- update.both(m2) ## works
## Case 2: update within function m4A <- f2(m1) ## default: works
try(f3(m1)) ## envformula: object 'd2' not found m4B <- f4(m1)
## works
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAEBAgAGBQJVEfl/AAoJEOCV5YRblxUHWdgH/AqLAhDqKV8aRg6jnX9rO96D nwzqv0ClMIxVr2dzD4eSQTL2caWZnXVkws+lg9N7bc4BaWplcYxLNRBw5M8zHOPJ E7JlhG3EecvmeAEt9OY0/q6I0D6vdoEjcH7wzzuyLLIqllu9OskxURi/azMs0XRo tiN+oG5aOKsMYsEGjtiWySRDzhJh2TM40A1HHjAViqpxZcqilAZ6RiNEFe1t1JY0 IvDI8yesSuHnKtgAiqk9ivGw4BCCGoBSIHB3GrJIi11j06iYKw0ugVHIlKYO8cqf AYTvEX2sSxsJgKWYTiG/1dr/kiFTntTDji03zRLVUdPKIZATJMczv+KB+0bpoVY= =Z34K -----END PGP SIGNATURE-----
2 days later
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 [Sorry to those who don't like it for top-posting] Thierry, I'm curious whether this addresses your problem (although we don't have a hard timetable for the next release [it has to avoid conflicts with the 3.2.0 release in 2.5 weeks at the very least], so this might be problematic if your package needs to depend on it). I'm still curious whether there are any ideas/opinions from other readers. Has anyone else struggled with this? Is there a canonical solution? Ben Bolker
On 15-03-24 07:55 PM, Ben Bolker wrote:
On 15-03-23 12:55 PM, Thierry Onkelinx wrote:
Dear Ben,
Last week I was struggling with incorporating lme4 into a package. I traced the problem and made a reproducible example ( https://github.com/ThierryO/testlme4). It looks very simular to the problem you describe.
The 'tests' directory contains the reproducible examples. confint() of a model as returned by a function fails. It even fails when I try to calculate the confint() inside the same function as the glmer() call (see the fit_model_ci function).
Best regards,
Thierry
Ugh. I can get this to work if I also try searching up the call
stack, as follows (within update.merMod). This feels like "code
smell" to me though -- i.e., if I have to hack this hard I must be
doing something wrong/misunderstanding how the problem *should* be
done.
if (evaluate) { ff <- environment(formula(object)) pf <-
parent.frame() ## save parent frame in case we need it sf <-
sys.frames()[[1]] tryCatch(eval(call, env=ff), error=function(e) {
tryCatch(eval(call, env=sf), error=function(e) { eval(call, pf) })
}) } else call
Here is an adapted even-more-minimal version of your code, which
seems to work with the version of update.merMod I just pushed to
github, but fails for glm():
##
https://github.com/ThierryO/testlme4/blob/master/R/fit_model_ci.R
fit_model_ci <- function(formula, dataset, mfun=glmer){ model <-
mfun( formula = formula, data = dataset, family = "poisson" ) ci <-
confint(model) return(list(model = model, confint = ci)) }
library("lme4") set.seed(101) dd <-
data.frame(f=factor(rep(1:10,each=100)), y=rpois(2,1000))
fit_model_ci(y~(1|f),dataset=dd)
fit_model_ci(y~(1|f),dataset=dd,mfun=glm)
ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
2015-03-22 17:45 GMT+01:00 Ben Bolker <bbolker at gmail.com>:
WARNING: this is long. Sorry I couldn't find a way to compress it.
Is there a reasonable way to design an update method so that it's robust to a variety of reasonable use cases of generating calls or data inside or outside a function? Is it even possible? Should I just tell users "don't do that"?
* `update.default()` uses `eval(call, parent.frame())`; this fails when the call depends on objects that were defined in a different environment (e.g., when the data are generated and the model initially fitted within a function scope)
* an alternative is to store the original environment in which the fitting is done in the environment of the formula and use `eval(call, env=environment(formula(object)))`; this fails if the user tries to update the model originally fitted outside a function with data modified within a function ...
* I think I've got a hack that works below, which first tries in the environment of the formula and falls back to the parent frame if that fails, but I wonder if I'm missing something much simpler ..
Thoughts? My understanding of environments and frames is still, after all these years, not what it should be ...
I've thought of some other workarounds, none entirely satisfactory:
* force evaluation of all elements in the original call * printing components of the call can get ugly (can save the original call before evaluating) * large objects in the call get duplicated * don't use `eval(call)` for updates; instead try to store everything internally * this works OK but has the same drawback of potentially storing large extra copies * we could try to use the model frame (which is stored already), but there are issues with this (the basis of a whole separate rant) because the model frame stores something in between predictor variables and input variables. For example
d <- data.frame(y=1:10,x=runif(10)) names(model.frame(lm(y~log(x),data=d))) ## "y" "log(x)"
So if we wanted to do something like update to "y ~ sqrt(x)", it wouldn't work ...
================== update.envformula <- function(object,...) {
extras <- match.call(expand.dots = FALSE)$... call <-
getCall(object) for (i in names(extras)) { existing <-
!is.na(match(names(extras), names(call))) for (a in
names(extras)[existing]) call[[a]] <- extras[[a]] if
(any(!existing)) { call <- c(as.list(call), extras[!existing])
call <- as.call(call) } } eval(call,
env=environment(formula(object))) ## enclos=parent.frame()
doesn't help }
update.both <- function(object,...) { extras <-
match.call(expand.dots = FALSE)$... call <- getCall(object) for
(i in names(extras)) { existing <- !is.na(match(names(extras),
names(call))) for (a in names(extras)[existing]) call[[a]] <-
extras[[a]] if (any(!existing)) { call <- c(as.list(call),
extras[!existing]) call <- as.call(call) } } pf <-
parent.frame() ## save parent frame in case we need it
tryCatch(eval(call, env=environment(formula(object))),
error=function(e) { eval(call, pf) }) }
### TEST CASES
set.seed(101) d <- data.frame(x=1:10,y=rnorm(10)) m1 <- lm(y~x,data=d)
##' define data within function, return fitted model f1 <-
function() { d2 <- d lm(y~x,data=d2) return(lm(y~x,data=d2)) }
##' define (and modify) data within function, try to update ##'
model fitted elsewhere f2 <- function(m) { d2 <- d; d2[1] <-
d2[1]+0 ## force copy update.default(m,data=d2) } ##' define (and
modify) data within function, try to update ##' model fitted
elsewhere (use envformula) f3 <- function(m) { d2 <- d; d2[1] <-
d2[1]+0 ## force copy update.envformula(m,data=d2) }
##' hack: first try the formula, then the parent frame ##' if
that doesn't work for any reason f4 <- function(m) { d2 <- d;
d2[1] <- d2[1]+0 ## force copy update.both(m,data=d2) }
## Case 1: fit within function m2 <- f1() try(update.default(m2)) ## default: object 'd2' not found m3A <- update.envformula(m2) ## envformula: works m3B <- update.both(m2) ## works
## Case 2: update within function m4A <- f2(m1) ## default: works try(f3(m1)) ## envformula: object 'd2' not found m4B <- f4(m1) ## works
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAEBAgAGBQJVFc1CAAoJEOCV5YRblxUHF+MH/3Y9uFZFolhx5b5jWSyXwQgp i9oawx4K6il0qiAiDiO5D7NSSdc0u9jlgj8NjH0G2O9u3ctpvcYNVwa7cP9288Xz xRyInnnh2FIpT6W0XyzJDivw5EX3IkyYuv6eDNqVyGcYXkvzJMA+vwMMWdGWEZbL jKtDc0trG+9yJnwIi6DW6IQWPovrDaNxEinS+V7+DmYACQvJ4P2kg2u/ZsxAx89q mcA1pS5usJjkOiQwBVUvV7l2UKNhHPFNwbBK1QdHgpP7PTdB52EQr+IyERhpf56s 8tYyNbSSPWoG9vt6/1pKyUK4iNRBtGgxtuozAv5OUjF8VGWGwUXBLo5G2yrBbs4= =o1PJ -----END PGP SIGNATURE-----
Dear Ben, Fitting the model and calculating the confidence intervals within the same function works ( https://github.com/ThierryO/testlme4/blob/master/tests/testthat/test_fit_model_ci.R passes). Fitting the model inside a function and calculating the confidence intervals on the output still fails ( https://github.com/ThierryO/testlme4/blob/master/tests/testthat/test_fit_model.R fails). Directly calculating the confidence intervals in the same function is an acceptable solution for my work. Thank you for this solution. Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2015-03-27 22:36 GMT+01:00 Ben Bolker <bbolker at gmail.com>:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 [Sorry to those who don't like it for top-posting] Thierry, I'm curious whether this addresses your problem (although we don't have a hard timetable for the next release [it has to avoid conflicts with the 3.2.0 release in 2.5 weeks at the very least], so this might be problematic if your package needs to depend on it). I'm still curious whether there are any ideas/opinions from other readers. Has anyone else struggled with this? Is there a canonical solution? Ben Bolker On 15-03-24 07:55 PM, Ben Bolker wrote:
On 15-03-23 12:55 PM, Thierry Onkelinx wrote:
Dear Ben,
Last week I was struggling with incorporating lme4 into a package. I traced the problem and made a reproducible example ( https://github.com/ThierryO/testlme4). It looks very simular to the problem you describe.
The 'tests' directory contains the reproducible examples. confint() of a model as returned by a function fails. It even fails when I try to calculate the confint() inside the same function as the glmer() call (see the fit_model_ci function).
Best regards,
Thierry
Ugh. I can get this to work if I also try searching up the call
stack, as follows (within update.merMod). This feels like "code
smell" to me though -- i.e., if I have to hack this hard I must be
doing something wrong/misunderstanding how the problem *should* be
done.
if (evaluate) { ff <- environment(formula(object)) pf <-
parent.frame() ## save parent frame in case we need it sf <-
sys.frames()[[1]] tryCatch(eval(call, env=ff), error=function(e) {
tryCatch(eval(call, env=sf), error=function(e) { eval(call, pf) })
}) } else call
Here is an adapted even-more-minimal version of your code, which
seems to work with the version of update.merMod I just pushed to
github, but fails for glm():
##
https://github.com/ThierryO/testlme4/blob/master/R/fit_model_ci.R
fit_model_ci <- function(formula, dataset, mfun=glmer){ model <-
mfun( formula = formula, data = dataset, family = "poisson" ) ci <-
confint(model) return(list(model = model, confint = ci)) }
library("lme4") set.seed(101) dd <-
data.frame(f=factor(rep(1:10,each=100)), y=rpois(2,1000))
fit_model_ci(y~(1|f),dataset=dd)
fit_model_ci(y~(1|f),dataset=dd,mfun=glm)
ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
2015-03-22 17:45 GMT+01:00 Ben Bolker <bbolker at gmail.com>:
WARNING: this is long. Sorry I couldn't find a way to compress it.
Is there a reasonable way to design an update method so that it's robust to a variety of reasonable use cases of generating calls or data inside or outside a function? Is it even possible? Should I just tell users "don't do that"?
* `update.default()` uses `eval(call, parent.frame())`; this fails when the call depends on objects that were defined in a different environment (e.g., when the data are generated and the model initially fitted within a function scope)
* an alternative is to store the original environment in which the fitting is done in the environment of the formula and use `eval(call, env=environment(formula(object)))`; this fails if the user tries to update the model originally fitted outside a function with data modified within a function ...
* I think I've got a hack that works below, which first tries in the environment of the formula and falls back to the parent frame if that fails, but I wonder if I'm missing something much simpler ..
Thoughts? My understanding of environments and frames is still, after all these years, not what it should be ...
I've thought of some other workarounds, none entirely satisfactory:
* force evaluation of all elements in the original call * printing components of the call can get ugly (can save the original call before evaluating) * large objects in the call get duplicated * don't use `eval(call)` for updates; instead try to store everything internally * this works OK but has the same drawback of potentially storing large extra copies * we could try to use the model frame (which is stored already), but there are issues with this (the basis of a whole separate rant) because the model frame stores something in between predictor variables and input variables. For example
d <- data.frame(y=1:10,x=runif(10)) names(model.frame(lm(y~log(x),data=d))) ## "y" "log(x)"
So if we wanted to do something like update to "y ~ sqrt(x)", it wouldn't work ...
================== update.envformula <- function(object,...) {
extras <- match.call(expand.dots = FALSE)$... call <-
getCall(object) for (i in names(extras)) { existing <-
!is.na(match(names(extras), names(call))) for (a in
names(extras)[existing]) call[[a]] <- extras[[a]] if
(any(!existing)) { call <- c(as.list(call), extras[!existing])
call <- as.call(call) } } eval(call,
env=environment(formula(object))) ## enclos=parent.frame()
doesn't help }
update.both <- function(object,...) { extras <-
match.call(expand.dots = FALSE)$... call <- getCall(object) for
(i in names(extras)) { existing <- !is.na(match(names(extras),
names(call))) for (a in names(extras)[existing]) call[[a]] <-
extras[[a]] if (any(!existing)) { call <- c(as.list(call),
extras[!existing]) call <- as.call(call) } } pf <-
parent.frame() ## save parent frame in case we need it
tryCatch(eval(call, env=environment(formula(object))),
error=function(e) { eval(call, pf) }) }
### TEST CASES
set.seed(101) d <- data.frame(x=1:10,y=rnorm(10)) m1 <- lm(y~x,data=d)
##' define data within function, return fitted model f1 <-
function() { d2 <- d lm(y~x,data=d2) return(lm(y~x,data=d2)) }
##' define (and modify) data within function, try to update ##'
model fitted elsewhere f2 <- function(m) { d2 <- d; d2[1] <-
d2[1]+0 ## force copy update.default(m,data=d2) } ##' define (and
modify) data within function, try to update ##' model fitted
elsewhere (use envformula) f3 <- function(m) { d2 <- d; d2[1] <-
d2[1]+0 ## force copy update.envformula(m,data=d2) }
##' hack: first try the formula, then the parent frame ##' if
that doesn't work for any reason f4 <- function(m) { d2 <- d;
d2[1] <- d2[1]+0 ## force copy update.both(m,data=d2) }
## Case 1: fit within function m2 <- f1() try(update.default(m2)) ## default: object 'd2' not found m3A <- update.envformula(m2) ## envformula: works m3B <- update.both(m2) ## works
## Case 2: update within function m4A <- f2(m1) ## default: works try(f3(m1)) ## envformula: object 'd2' not found m4B <- f4(m1) ## works
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAEBAgAGBQJVFc1CAAoJEOCV5YRblxUHF+MH/3Y9uFZFolhx5b5jWSyXwQgp i9oawx4K6il0qiAiDiO5D7NSSdc0u9jlgj8NjH0G2O9u3ctpvcYNVwa7cP9288Xz xRyInnnh2FIpT6W0XyzJDivw5EX3IkyYuv6eDNqVyGcYXkvzJMA+vwMMWdGWEZbL jKtDc0trG+9yJnwIi6DW6IQWPovrDaNxEinS+V7+DmYACQvJ4P2kg2u/ZsxAx89q mcA1pS5usJjkOiQwBVUvV7l2UKNhHPFNwbBK1QdHgpP7PTdB52EQr+IyERhpf56s 8tYyNbSSPWoG9vt6/1pKyUK4iNRBtGgxtuozAv5OUjF8VGWGwUXBLo5G2yrBbs4= =o1PJ -----END PGP SIGNATURE-----