So, I've embarked on my threatened modifications to the mle subset of the stats4 package. Most of what I've done so far has *not* been adding the slick formula interface, but rather making it work properly and reasonably robustly with real mle problems -- especially ones involving reasonably complex fixed and default parameter sets. Some of what I've done breaks backward compatibility (see below), but there are what I think are some reasonably important design issues -- it would be nice to get them fixed now, while the mle library is still in its infancy ...

I've appended quite a long list of changes and "to do" stuff, which is also available at

   http://www.zoo.ufl.edu/bolker/mle/mle-changes.txt

Also there are

   http://www.zoo.ufl.edu/bolker/mle/mle-diffs.txt (diffs against 1.9.1)
   http://www.zoo.ufl.edu/bolker/mle/mle.R (R code)

I'm sure some of the code could be improved, especially in issues relating to eval() -- I'm not very good at that stuff. I apologize for changing so much at once; I got kind of carried away. I look forward to comments on the various bits & pieces ...

  Ben Bolker

------------
*** = changes default behavior in a user-visible way that I consider better, but arguably not just a bug fix; I've tried to justify these.

*** added checking code at the beginning of mle() that converts named numeric vectors into lists [this seems harmless to me -- is there a reason not to allow the user to specify a named vector rather than a list? especially since start gets sapply'd back to a vector before it gets passed to optim() anyway?]

Added some (probably not quite right) code to mle to make sure that fullcoef gets evaluated properly when (1) the list contains expressions and (2) the list contains expressions that depend on other arguments, e.g. fixed=list(a=y,b=x[y]). I used

   fullcoef <- lapply(fullcoef,eval,envir=fullcoef,enclos=parent.frame(100))

but there may well be a better way to do it.
*** Changed type of fullcoef from "numeric" to "list", and return fullcoef rather than unlist(fullcoef) from mle [couldn't see a rationale for this -- it destroys a lot of the information in fullcoef *and* is a pain, say, when the fixed arguments include a data frame with lots of information in it]

*** Changed "coef" method to return object@coef, not object@fullcoef [this really seems to be the better choice to me -- I normally want to see the *fitted values* of the MLE, not all the other auxiliary stuff. Besides, object@fullcoef can be very long, and therefore a nuisance to see in the default show(object) method]

made a fullcoef accessor *function* to return object@fullcoef -- should this really be a method?

added a cor method for mle objects -- which just normalizes the variance-covariance matrix to a correlation matrix. Is this a bad idea/perversion of the cor method?

changed variable "pi" to "p.i" throughout mle -- less confusing!

changed call$fixed <- fix to call$fixed <- c(fix,eval(call$fixed)) for cases where there are non-trivial fixed arguments

added "follow" argument to profile: this makes profiling use a continuation method, where the starting point for each profile optimization is the previous best-fit solution rather than the overall MLEs of the parameters. Actually fairly easy to implement (I think: I haven't really tested that it works on anything hard, just that it doesn't seem to break profiling) -- requires pfit to be assigned globally within onestep() and a few lines of code further down.
added an AIC method for mle objects

collapsed the absVal/!absVal code cases slightly

added a "sqrVal" argument for those who want to see the value of the log-likelihood, not the square root or signed square root (could be collapsed into a "scale" argument for the profile plot = "sqrt", "abssqrt", "lik")

added code and options to plot labels of confidence levels (logical plot.confstr, character confstr)

added add= argument (to draw profiles on an existing plot)

added arguments for color of minimum values, confidence limits, profile (col.minval, col.conf, col.prof)

added options for confidence interval: when applied to an mle object, method "spline" does the previous behavior (profile and apply confint to the result). Method "quad" simply presents the quadratic approximation to the confidence intervals. Method "exact" uses uniroot() to find the precise point where the profile crosses the critical level in each direction.

Added mle.options() command, and .mle.options state variable, to keep global options (method for optim() and method for confint()): I'm not at all sure that this is the best way to implement options; this was just my first crack at it.

added a warning to show(mle) if optim() did not converge

Added code that allows (1) default arguments (evaluated in the frame of the full coefficient list, with fixed values and starting values substituted) and (2) arguments specified in the start list in arbitrary order (which seems like a reasonable expectation, since it *is* specified as a list). The fundamental problem is that optim() loses the names of the parameter vector somewhere.
Example:

   x = runif(200)
   y = 1+x+x^2+rnorm(200,sd=0.05)
   fn <- function(a,b,z=2,c,d) {
     -sum(dnorm(y,mean=a+c*x+d*x^2,sd=exp(b),log=TRUE))
   }
   m1 = mle(minuslogl=fn,start=list(a=1,b=1,c=1,d=1))
   ## fails with "missing argument" warning, about wrong argument
   m1 = mle(minuslogl=fn,start=list(a=1,b=1,c=1,d=1),fixed=list(z=2))
   ## works
   m2 = mle(minuslogl=fn,start=list(a=1,d=1,c=1,b=1),fixed=list(z=2))
   ## fails -- coefficients returned in wrong order

TO DO:

torture-test on some real problems!

better documentation? e.g. ?profile.mle-class doesn't give details on arguments -- have to look at profile.nls (allow "which" to be a character vector -- match names)?

HARDER:

fancy formula interface [cf. svymle in the survey package], e.g.

   mll <- mLL(type="independent",distrib="normal",resp=y,mean=~a+b*x,sd=~s,
              param=~a+b+s)

allow for fitting of transformed parameters (exp/log, tanh/atanh = logistic/logit)

2D profiles (quadratic or thin-plate spline???)

EASIER but breaking backward compatibility: merge absVal/sqrVal into a "scale" argument?

EASIER but ??worthwhile??: allow spline to be turned off when plotting profiles (method "spline"/"raw")? code for producing/plotting "slices" (non-optimized transects through parameter space); other diagnostic tools?

NOT SURE: change show, show(summary) methods to bring them more in line with other classes? add test to confint(profile) that warns if a method is supplied?
overhaul of mle
6 messages · Ben Bolker, Peter Dalgaard, Roger D. Peng
Ben Bolker <bolker@zoo.ufl.edu> writes:
*** Changed type of fullcoef from "numeric" to "list", and return fullcoef rather than unlist(fullcoef) from mle [couldn't see a rationale for this -- it destroys a lot of the information in fullcoef *and* is a pain, say, when the fixed arguments include a data frame with lots of information in it]
Wait a minute. How can a likelihood function have an argument that is
a data frame? I think you're abusing the fixed arguments if you use it
to pass in data. The natural paradigm for that would be to pass data
via a closure, i.e.
mll <- with(data,
function(lambda=1,theta=0)sum(dpois(y, lambda+theta*x, log=TRUE))
)
*** Changed "coef" method to return object@coef, not object@fullcoef [this really seems to be the better choice to me -- I normally want to see the *fitted values* of the MLE, not all the other auxiliary stuff. Besides, object@fullcoef can be very long, and therefore a nuisance to see in the default show(object) method]
See above. This was never intended to contain auxiliary stuff (and AFAIR has already been changed once in the opposite direction, by Brian)
made a fullcoef accessor *function* to return object@fullcoef -- should this really be a method? added a cor method for mle objects -- which just normalizes the variance-covariance matrix to a correlation matrix. Is this a bad idea/perversion of the cor method?
Yes, I think so. cov2cor(vcov(ml.obj)) is easy enough.
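[For what it's worth, a minimal sketch of what cov2cor() does, on a made-up 2x2 covariance matrix -- nothing here comes from the mle code:]

```r
## cov2cor() (stats package) rescales a covariance matrix to a
## correlation matrix: entry V[i,j] is divided by sqrt(V[i,i] * V[j,j]).
V <- matrix(c(4,   1.2,
              1.2, 1), 2, 2,
            dimnames = list(c("a", "b"), c("a", "b")))
cov2cor(V)
## diagonal is 1; off-diagonal is 1.2 / (sqrt(4) * sqrt(1)) = 0.6
```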
changed variable "pi" to "p.i" throughout mle -- less confusing!
OK.
changed call$fixed <- fix to call$fixed <- c(fix,eval(call$fixed)) for cases where there are non-trivial fixed arguments
Which there shouldn't be...
added "follow" argument to profile: this makes profiling use a continuation method where the starting point for each profile optimization is the previous best-fit solution, rather than the overall MLEs of the parameters. Actually fairly easy to implement (I think: I haven't really tested that it works on anything hard, just that it doesn't seem to break profiling) -- requires pfit to be assigned globally within onestep() and a few lines of code further down.
Sounds nice, but surely you don't need a global assignment there? A
superassign ("<<-") perhaps, but that doesn't need to go to
.GlobalEnv.
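[A toy illustration of the point, with hypothetical names rather than the actual profile()/onestep() code:]

```r
## "<<-" searches the enclosing environments for an existing binding and
## modifies the first one it finds, so a nested helper can update state
## in its parent's frame without touching .GlobalEnv.
profile_sketch <- function() {
  pfit <- 0                      # lives in profile_sketch()'s frame
  onestep <- function(start) {
    fit <- start + 1             # stand-in for the optimization step
    pfit <<- fit                 # updates the enclosing pfit, not a global
    fit
  }
  onestep(onestep(pfit))         # second step continues from the first
  pfit
}
profile_sketch()                 # 2
exists("pfit")                   # FALSE: nothing leaked to the global env
```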
added an AIC method for mle objects

collapsed the absVal/!absVal code cases slightly

added a "sqrVal" argument for those who want to see the value of the log-likelihood, not the square root or signed square root (could be collapsed into a "scale" argument for the profile plot = "sqrt", "abssqrt", "lik")

added code and options to plot labels of confidence levels (logical plot.confstr, character confstr)

added add= argument (to draw profiles on an existing plot)

added arguments for color of minimum values, confidence limits, profile (col.minval, col.conf, col.prof)

added options for confidence interval: when applied to an mle object, method "spline" does the previous behavior (profile and apply confint to the result). Method "quad" simply presents the quadratic approximation to the confidence intervals. Method "exact" uses uniroot() to find the precise point where the profile crosses the critical level in each direction.
All fine. The last one could be important as I had a case where the spline method went rather badly wrong (the data it happened with are still rather heavily embargoed I'm afraid).
Added mle.options() command, and .mle.options state variable, to keep
global options (method for optim() and method for confint()): I'm not
at all sure that this is the best way to implement options, this was
just my first crack at it
added a warning to show(mle) if optim() did not converge
Added code that allows (1) default arguments (evaluated in the frame of
the full coefficient list, with fixed values and starting values
substituted) and (2) arguments specified in the start list in arbitrary
order (which seems like a reasonable expectation, since it *is*
specified as a list). The fundamental problem is that optim() loses the
names of the parameter vector somewhere.
Example:
x = runif(200)
y = 1+x+x^2+rnorm(200,sd=0.05)
fn <- function(a,b,z=2,c,d) {
  -sum(dnorm(y,mean=a+c*x+d*x^2,sd=exp(b),log=TRUE))
}
m1 = mle(minuslogl=fn,start=list(a=1,b=1,c=1,d=1))
## fails with "missing argument" warning, about wrong argument
m1 = mle(minuslogl=fn,start=list(a=1,b=1,c=1,d=1),fixed=list(z=2))
## works
m2 = mle(minuslogl=fn,start=list(a=1,d=1,c=1,b=1),fixed=list(z=2))
## fails -- coeffs returned in wrong order
Hmm.. I see the effect with the current version too. Depending on temperament, it is the labels rather than the order that is wrong...
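[A minimal sketch of the mechanism, with an invented objective function rather than the stats4 internals: optim() evaluates the objective on a bare numeric vector, so a wrapper that dispatches to named arguments via do.call() has to restore the names itself -- otherwise the arguments match by position, and an out-of-order start list mislabels the coefficients.]

```r
start <- list(a = 1, d = 1, c = 1, b = 1)   # deliberately out of order
quad  <- function(a, b, c, d) (a - 1)^2 + (b - 2)^2 + (c - 3)^2 + (d - 4)^2
obj <- function(par) {
  names(par) <- names(start)    # crucial: par arrives without names
  do.call(quad, as.list(par))   # now a/b/c/d match by name, not position
}
fit <- optim(unlist(start), obj)
fit$par[c("a", "b", "c", "d")]  # near 1, 2, 3, 4 despite the scrambled start
```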
TO DO:
torture-test on some real problems!
better documentation? e.g. ?profile.mle-class doesn't give details on
arguments -- have to look at profile.nls
(allow "which" to be a character vector -- match names)?
HARDER:
fancy formula interface [cf. svymle in survey package] e.g.
mll <-
mLL(type="independent",distrib="normal",resp=y,mean=~a+b*x,sd=~s,
param=~a+b+s)
allow for fitting of transformed parameters (exp/log, tanh/atanh =
logistic/logit)
The last one should be trivial, no?

   mll2 <- function(a,b,c,d) mll1(log(a),atan(b),c,d)

Also: code for combination of likelihoods (i.e. summing likelihoods for independent subexperiments involving the same parameters; integrating out nuisance variables).

MUCH, MUCH HARDER: Figure out what it would take to include higher-order asymptotic inference (as per Brazzale/Bellio's hoa bundle) in a generic likelihood setting...
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)              FAX: (+45) 35327907
Peter Dalgaard wrote:
Ben Bolker <bolker@zoo.ufl.edu> writes:
*** Changed type of fullcoef from "numeric" to "list", and return fullcoef rather than unlist(fullcoef) from mle [couldn't see a rationale for this -- it destroys a lot of the information in fullcoef *and* is a pain, say, when the fixed arguments include a data frame with lots of information in it]
Wait a minute. How can a likelihood function have an argument that is
a data frame? I think you're abusing the fixed arguments if you use it
to pass in data. The natural paradigm for that would be to pass data
via a closure, i.e.
mll <- with(data,
function(lambda=1,theta=0)sum(dpois(y, lambda+theta*x, log=TRUE))
)
*** Changed "coef" method to return object@coef, not object@fullcoef [this really seems to be the better choice to me -- I normally want to see the *fitted values* of the MLE, not all the other auxiliary stuff. Besides, object@fullcoef can be very long, and therefore a nuisance to see in the default show(object) method]
See above. This was never intended to contain auxiliary stuff (and AFAIR has already been changed once in the opposite direction, by Brian)
OK, I want to hear about this. My normal approach to writing
likelihood functions that can be evaluated with more than one data
set is essentially
mll <- function(par1,par2,par3,X=Xdefault,Y=Ydefault,Z=Zdefault) { ... }
where X, Y, Z are the data values that may change from one fitting
exercise to the next. This seems straightforward to me, and I always
thought it was the reason why optim() had a ... in its argument list,
so that one could easily pass these arguments down. I have to confess
that I don't quite understand how your paradigm with with() works: if
mll() is defined as you have it above, "data" is a data frame containing
$x and $y, right? How do I run mle(minuslogl=mll,start=...) for
different values of "data" (data1, data2, data3) in this case? Does
it go in the call as mle(minuslogl=mll,start=...,data=...)? Once I've
found my mle, how do I view/access these values when I want to see
what data I used to fit mle1, mle2, etc.?
I'm willing to change the way I do things (and teach my students
differently), but I need to see how ... I don't see how what I've
written is an "abuse" of the fixed arguments [I'm willing to believe
that it is, but just don't know why]
added a cor method for mle objects -- which just normalizes the variance-covariance matrix to a correlation matrix. Is this a bad idea/perversion of the cor method?
Yes, I think so. cov2cor(vcov(ml.obj)) is easy enough.
OK. I wasn't aware of cov2cor().
changed call$fixed <- fix to call$fixed <- c(fix,eval(call$fixed)) for cases where there are non-trivial fixed arguments
Which there shouldn't be...
added "follow" argument to profile: this makes profiling use a continuation method where the starting point for each profile optimization is the previous best-fit solution, rather than the overall MLEs of the parameters. Actually fairly easy to implement (I think: I haven't really tested that it works on anything hard, just that it doesn't seem to break profiling) -- requires pfit to be assigned globally within onestep() and a few lines of code further down.
Sounds nice, but surely you don't need a global assignment there? A
superassign ("<<-") perhaps, but that doesn't need to go to
.GlobalEnv.
OK -- I think that's just my ignorance showing.
Added code that allows (1) default arguments (evaluated in the frame of
the full coefficient list, with fixed values and starting values
substituted) and (2) arguments specified in the start list in arbitrary
order (which seems like a reasonable expectation, since it *is*
specified as a list). The fundamental problem is that optim() loses the
names of the parameter vector somewhere.
Example:
x = runif(200)
y = 1+x+x^2+rnorm(200,sd=0.05)
fn <- function(a,b,z=2,c,d) {
  -sum(dnorm(y,mean=a+c*x+d*x^2,sd=exp(b),log=TRUE))
}
m1 = mle(minuslogl=fn,start=list(a=1,b=1,c=1,d=1))
## fails with "missing argument" warning, about wrong argument
m1 = mle(minuslogl=fn,start=list(a=1,b=1,c=1,d=1),fixed=list(z=2))
## works
m2 = mle(minuslogl=fn,start=list(a=1,d=1,c=1,b=1),fixed=list(z=2))
## fails -- coeffs returned in wrong order
Hmm.. I see the effect with the current version too. Depending on temperament, it is the labels rather than the order that is wrong...
Hmmm. By "current version" do you mean you can still find ways to get wrong answers with my modified version? Can you send me an example? Can you think of a better way to fix this?
allow for fitting of transformed parameters (exp/log, tanh/atanh = logistic/logit)
The last one should be trivial, no?

   mll2 <- function(a,b,c,d) mll1(log(a),atan(b),c,d)
Yes, but I was thinking of something convenient for my students --
syntactic sugar, if you like -- so that fitted values, confidence
intervals, etc., would automatically be displayed on the
original (non-transformed) scale
cheers,
Ben Bolker
Ben Bolker <bolker@zoo.ufl.edu> writes:
OK, I want to hear about this. My normal approach to writing
likelihood functions that can be evaluated with more than one data
set is essentially
mll <- function(par1,par2,par3,X=Xdefault,Y=Ydefault,Z=Zdefault) { ... }
where X, Y, Z are the data values that may change from one fitting
exercise to the next. This seems straightforward to me, and I always
thought it was the reason why optim() had a ... in its argument list,
so that one could easily pass these arguments down. I have to confess
that I don't quite understand how your paradigm with with() works: if
mll() is defined as you have it above, "data" is a data frame containing
$x and $y, right? How do I run mle(minuslogl=mll,start=...) for
different values of "data" (data1, data2, data3) in this case? Does
it go in the call as mle(minuslogl=mll,start=...,data=...)? Once I've
found my mle, how do I view/access these values when I want to see
what data I used to fit mle1, mle2, etc.?
You generate different likelihood functions. If you do it using lexical scoping, the data is available in the environment(mll). If you insist on having one function that can be modified by changing data, you can simply assign into that environment.
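[For concreteness, a small sketch of the closure approach -- the data and model are invented for illustration, not code from the package:]

```r
## each call to make_mll() captures its own copy of the data; the data
## remains retrievable (and even replaceable) via environment(mll).
make_mll <- function(x, y) {
  function(a = 0, b = 1)
    -sum(dnorm(y, mean = a + b * x, sd = 1, log = TRUE))
}
set.seed(1)
x1 <- runif(50); y1 <- 2 + 3 * x1 + rnorm(50)
mll1 <- make_mll(x1, y1)
mll1(2, 3)                        # evaluated with x1, y1 from the closure
head(environment(mll1)$y)         # recover the data used for this fit
environment(mll1)$y <- y1 + 1     # or swap in new data by assignment
```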
I'm willing to change the way I do things (and teach my students differently), but I need to see how ... I don't see how what I've written is an "abuse" of the fixed arguments [I'm willing to believe that it is, but just don't know why]
"Abuse" is of course a strong word, but... The point is that likelihood inference views the likelihood as a function of the model parameters *only* and it is useful to stick to that concept in the implementation. The fixed arguments were only ever introduced to allow profiling by keeping some parameters fixed during optimization. Another aspect is that one of my ultimate goals with this stuff was to allow generic operations to be defined on likelihoods (combining, integrating, mixing, conditioning...) and that becomes even more elusive if you have to deal with data arguments for the component likelihoods. Not that I've got very far thinking about the design, but I'm rather sure that the "likelihood object" needs to be well-defined if it is to have a chance at all.
Hmm.. I see the effect with the current version too. Depending on temperament, it is the labels rather than the order that is wrong...
Hmmm. By "current version" do you mean you can still find ways to get wrong answers with my modified version? Can you send me an example? Can you think of a better way to fix this?
Strike "too", haven't gotten around to checking your code yet. Possibly, this requires a bug fix for 1.9.1.
Taking advantage of lexical scoping is definitely the way to go
here. I usually write a function called `makeLogLik' or
something like that which basically returns a function that can
be passed into an optimizer. Something like:
makeLogLik <- function(X=Xdefault, Y=Ydefault, Z=Zdefault) {
    function(par1, par2, par3) {
        ...
    }
}
-roger
2 days later
Hmmm. I'm still struggling with this a little bit. I do now see, I
guess, how to use lexical scoping to get the same thing accomplished
as
mll <- function(p1,p2,p3,data1,data2,data3)
[although it still seems like the long way around].
Peter, I appreciate why you're trying to define things this narrowly,
and I agree that I've extended the use of the "fixed" arguments. Would
it make you any happier if there were a third argument in addition to
start and fixed (say "otherargs") that allowed for other arguments?
The other possibility would be to try to allow other arguments in a ...
argument that got passed through to optim(), but that would conflict
with the way the function is defined now since right now ... is supposed
to contain additional arguments to optim() [if you wanted to
allow the use of ... you could either (1) add
an "optimargs" argument list with arguments for optim or (2) try
to sort the ... list according to those that matched formals(optim)
and those that didn't -- either way is a bit of a nuisance, I guess].
The advantage of an "otherargs" argument is that it would provide
a way for you (Peter) to define a strict subset of the functionality
that only worked when people refrained from messing with the data --
if someone wanted to combine two likelihoods you could either check
that the "otherargs" components were identical, or disallow that
operation if "otherargs" was non-NULL.
Again, I don't want to be a pain -- we all have our own goals and
ways of working, and I certainly don't have any intention
of hijacking the mle code (although
in the end I guess I could always fork my own version -- this is GPL,
right?). It's just that it seems much more natural to me to do
mll <- function(p1,p2,p3,data1,data2,data3)
m1 = mle(mll,start=...,otherargs=list(data1=X1,data2=X2,data3=X3))
m2 = mle(mll,start=...,otherargs=list(data1=Y1,data2=Y2,data3=Y3))
rather than
mll <- function(p1,p2,p3) { f(p1,p2,p3,data1) }
assign("data1",X1,envir=environment(mll))
assign("data2",X2,envir=environment(mll))
assign("data3",X3,envir=environment(mll))
...
m1 = mle(mll,start=...)
assign("data1",Y1,envir=environment(mll))
assign("data2",Y2,envir=environment(mll))
assign("data3",Y3,envir=environment(mll))
m2 = mle(mll,start=...)
or
makeLogLik <- function(data1,data2,data3) {
    function(p1,p2,p3) { }
}
m1 <- mle(makeLogLik(X1,X2,X3),start=...)
m2 <- mle(makeLogLik(Y1,Y2,Y3),start=...)
[and yes, I do sometimes have three different
"data" objects, of different lengths, to use
in MLE analyses: right now I'm doing something
where I have population samples in one set
of years and fishery closures in another, different-length,
set of years -- it would be possible to combine
them all into a single data frame, but the point
is that it's not an unusual requirement to
have multiple auxiliary data sets]
Am I making any sense here? Do any other readers of R-devel (or at
least those that have hung on to this point) routinely run the same
MLE analyses for a variety of different data sets and find my
approach sensible, or am I alone?
Ben
PS lm(), glm(), nls(), nlme(), ... all have a "data" argument that
seems to play the role of "otherargs" presented above. Would this
be a sensible and precedented way to go?
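[To make the proposal concrete, a minimal sketch of how an "otherargs"/"data"-style argument could be spliced into the likelihood call. fit_with_args, mll, and the data are all invented for illustration -- this is not proposed mle code, just the mechanism:]

```r
fit_with_args <- function(minuslogl, start, otherargs = list()) {
  obj <- function(p) {
    names(p) <- names(start)                 # optim() drops parameter names
    do.call(minuslogl, c(as.list(p), otherargs))
  }
  optim(unlist(start), obj, method = "BFGS")
}

mll <- function(mu, x) -sum(dnorm(x, mean = mu, sd = 1, log = TRUE))
set.seed(42)
x1 <- rnorm(100, mean = 5)
fit <- fit_with_args(mll, start = list(mu = 0), otherargs = list(x = x1))
fit$par          # the MLE of mu -- numerically equal to mean(x1)
```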
[SNIPPED context: discussion of lexical scoping and closures
and why to use them instead of passing data as extra parameters
to a likelihood function]