suggested modification to the 'mle' documentation?
On Fri, 7 Dec 2007, Duncan Murdoch wrote:
On 12/7/2007 8:10 AM, Peter Dalgaard wrote:
Ben Bolker wrote:
At this point I'd just like to advertise the "bbmle" package (on CRAN) for those who respectfully disagree, as I do, with Peter over this issue. I have added a data= argument to my version of the function that allows other variables to be passed to the objective function. It seems to me that this is perfectly in line with the way that other modeling functions in R behave.
This is at least cleaner than abusing the "fixed" argument. As you know, I have reservations, one of which is that it is not a given that I want it to behave just like other modeling functions, e.g. a likelihood function might refer to more than one data set, and/or data that are not structured in the traditional data frame format. The design needs more thought than just adding arguments.
We should allow more general things to be passed as data arguments in cases where it makes sense. For example a list with names or an environment would be a reasonable way to pass data that doesn't fit into a data frame.
I still prefer a design based on a plain likelihood function. Then we can discuss how to construct such a function so that the data are incorporated in a flexible way. There are many ways to do this, I've shown one, here's another:
f <- function(lambda) -sum(dpois(x, lambda, log=T))
d <- data.frame(x=rpois(10000, 12.34))
environment(f) <- evalq(environment(), d)
We really need to expand as.environment, so that it can convert data frames into environments. You should be able to say:
environment(f) <- as.environment(d)
and get the same result as
environment(f) <- evalq(environment(), d)
But I'd prefer to avoid the necessity for users to manipulate the environment of a function. I think the pattern
model( f, data=d )
For working at the general likelihood I think it is better to
encourage the approach of defining likelihood constructor functions.
The problem with using f, data is that you need to match the names
used in f and in data, so either you have to explicitly write out f
with the names you have in data or you have to modify data to use the
names f likes -- in the running example think
f <- function(lambda) -sum(dpois(x, lambda, log=T))
d <- data.frame(y=rpois(10000, 12.34))
somebody has to connect up the x in f with the y in d. With a negative
log likelihood constructor defined, for example, as
makePoisonNegLogLikelihood <- function(x)
function(lambda) -sum(dpois(x, lambda, log=T))
this happens naturally with
makePoisonNegLogLikelihood(d$y)
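As a minimal sketch of how the constructor approach plays out end to end, the following assumes the stats4 package (which provides mle) and simulated data. The seed, sample size, and starting value are illustrative choices, not part of the original message.

```r
library(stats4)  # provides mle()

# Constructor as defined above: x is captured in the returned closure.
makePoisonNegLogLikelihood <- function(x)
    function(lambda) -sum(dpois(x, lambda, log = TRUE))

set.seed(1)                                # illustrative seed
d <- data.frame(y = rpois(1000, 12.34))    # the column is named y, not x

nll <- makePoisonNegLogLikelihood(d$y)     # no name matching between f and d
fit <- mle(nll, start = list(lambda = 10))

# For a Poisson sample the MLE of lambda is the sample mean,
# so the two values should agree closely.
c(estimate = unname(coef(fit)), sample_mean = mean(d$y))
```

The data enter only through the constructor's argument, so the likelihood code never needs to know what the column was called.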
being implemented internally as environment(f) <- as.environment(d, parent = environment(f)) is very nice and general. It makes things like cross-validation, bootstrapping, etc. conceptually cleaner: keep the same formula/function f, but manipulate the data and see what happens. It does have problems when d is an environment that already has a parent, but I think a reasonable meaning in that case would be to copy its contents into a new environment with the new parent set.
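In current R, something close to this can be sketched with list2env(), which accepts a parent argument. The helper name bindData below is hypothetical, and the resampling step is only an illustration of "same f, different data". Passing the data through as.list() means an environment argument is copied into a fresh environment with the new parent, which is the meaning suggested above for that case.

```r
# Hypothetical helper: returns a copy of f whose free variables are
# looked up in `data` first, then in f's original environment.
bindData <- function(f, data) {
    environment(f) <- list2env(as.list(data), parent = environment(f))
    f
}

f <- function(lambda) -sum(dpois(x, lambda, log = TRUE))

set.seed(2)                                # illustrative seed
d <- data.frame(x = rpois(200, 12.34))

# Same f, different data: the original sample and one bootstrap resample.
f_full <- bindData(f, d)
f_boot <- bindData(f, d[sample(nrow(d), replace = TRUE), , drop = FALSE])

# Each closure evaluates the negative log likelihood on its own data.
c(full = f_full(12), boot = f_boot(12))
```

Because bindData modifies only its local copy of f, the original function and its environment are left untouched.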
Both (simple) bootstrapping and (simple leave-one-out) cross-validation
require a data structure with a notion of cases, which is much more
restrictive than the context in which mle can be used. A more generic
approach to bootstrapping that might fit closer to the level of
generality of mle might be parameterized in terms of a negative log
likelihood constructor, a starting value constructor, and a resampling
function, with a single iteration implemented something like
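mleboot1 <- function(nllmaker, start, resample) {
    newdata <- resample()                   # named list of arguments
    newstart <- do.call(start, newdata)     # starting values for this resample
    nllfun <- do.call(nllmaker, newdata)    # negative log likelihood for this resample
    mle(nllfun, start = newstart)
}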
This would leave decisions on the resampling method and data structure
up to the user. Something similar could be done with K-fold CV.
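One way such a function might be called for the running Poisson example is sketched below. It assumes the stats4 package for mle, takes the third argument of mleboot1 to be the resampling function, and passes nllfun to mle. The names startmaker and resample, the seed, and the number of replicates are illustrative assumptions.

```r
library(stats4)  # provides mle()

# One bootstrap iteration, following the outline above.
mleboot1 <- function(nllmaker, start, resample) {
    newdata  <- resample()                  # a named list of arguments
    newstart <- do.call(start, newdata)
    nllfun   <- do.call(nllmaker, newdata)
    mle(nllfun, start = newstart)
}

# Illustrative pieces for the Poisson example (all names are assumptions).
set.seed(3)
y <- rpois(500, 12.34)
nllmaker   <- function(x) function(lambda) -sum(dpois(x, lambda, log = TRUE))
startmaker <- function(x) list(lambda = mean(x))
resample   <- function() list(x = sample(y, replace = TRUE))

# A handful of bootstrap replicates of the estimate of lambda.
boot <- replicate(20, unname(coef(mleboot1(nllmaker, startmaker, resample))))
summary(boot)
```

The resampling function here draws cases with replacement, but nothing in mleboot1 depends on that; a block or parametric resampler returning the same kind of named list would fit the same slot.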
luke
Duncan Murdoch
mle(f, start=list(lambda=10))
Call:
mle(minuslogl = f, start = list(lambda = 10))

Coefficients:
 lambda
12.3402

It is not at all an unlikely design to have mle() as a generic function which works on many kinds of objects, the default method being
function(object,...) mle(minuslogl(object))
and minuslogl is an extractor function returning (tada!) the negative log likelihood function.
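A rough sketch of that design using S3 dispatch follows. It is only an illustration under assumptions: the generic is called fitMLE here so that it does not mask stats4::mle, and the poisSample class and its minuslogl method are invented for the example.

```r
library(stats4)  # provides mle()

# Hypothetical extractor generic: returns a negative log likelihood function.
minuslogl <- function(object, ...) UseMethod("minuslogl")

# Hypothetical generic fitter; the default method delegates to stats4::mle.
fitMLE <- function(object, ...) UseMethod("fitMLE")
fitMLE.default <- function(object, ...) mle(minuslogl(object), ...)

# Illustrative class holding a Poisson sample, with its extractor method.
poisSample <- function(y) structure(list(y = y), class = "poisSample")
minuslogl.poisSample <- function(object, ...) {
    y <- object$y
    function(lambda) -sum(dpois(y, lambda, log = TRUE))
}

set.seed(4)                                # illustrative seed
s <- poisSample(rpois(1000, 12.34))
fit <- fitMLE(s, start = list(lambda = 10))
coef(fit)
```

Any class that supplies a minuslogl method can then be fitted through the same default method, without the fitter knowing how the data are stored.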
(My version also has a cool formula interface and other
bells and whistles, and I would love to get feedback from other
useRs about it.)
cheers
Ben Bolker
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke at stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu