suggested modification to the 'mle' documentation?

On 12/7/2007 8:10 AM, Peter Dalgaard wrote:
Ben Bolker wrote:
  At this point I'd just like to advertise the "bbmle" package
(on CRAN) for those who respectfully disagree, as I do, with Peter over
this issue.  I have added a data= argument to my version
of the function that allows other variables to be passed
to the objective function.  It seems to me that this is perfectly
in line with the way that other modeling functions in R
behave.

This is at least cleaner than abusing the "fixed" argument. As you know,
I have reservations, one of which is that it is not a given that I want
it to behave just like other modeling functions, e.g. a likelihood
function might refer to more than one data set, and/or data that are not
structured in the traditional data frame format. The design needs more
thought than just adding arguments.
We should allow more general things to be passed as data arguments in
cases where it makes sense.  For example a list with names or an
environment would be a reasonable way to pass data that doesn't fit into
a data frame.

I still prefer a design based on a plain likelihood function. Then we can
discuss how to construct such a function so that  the data are
incorporated in a flexible way.  There are many ways to do this, I've
shown one, here's another:

f <- function(lambda) -sum(dpois(x, lambda, log=T))
d <- data.frame(x=rpois(10000, 12.34))
environment(f)<-evalq(environment(),d)
We really need to expand as.environment, so that it can convert data
frames into environments.  You should be able to say:

environment(f) <- as.environment(d)

and get the same result as

environment(f)<-evalq(environment(),d)

But I'd prefer to avoid the necessity for users to manipulate the
environment of a function.  I think the pattern

model( f, data=d )
For working at the general likelihood level I think it is better to
encourage the approach of defining likelihood constructor functions.
The problem with using f, data is that you need to match the names
used in f and in data, so either you have to explicitly write out f
with the names you have in data or you have to modify data to use the
names f likes -- in the running example think

     f <- function(lambda) -sum(dpois(x, lambda, log=T))
     d <- data.frame(y=rpois(10000, 12.34))

somebody has to connect up the x in f with the y in d. With a negative
log likelihood constructor defined, for example, as

     makePoisonNegLogLikelihood <- function(x)
         function(lambda) -sum(dpois(x, lambda, log=T))

this happens naturally with

     makePoisonNegLogLikelihood(d$y)
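
A minimal sketch of the constructor approach carried through to a fit, assuming the stats4 version of mle and a simulated data frame d (the seed and sample are made up for illustration). The closure captures the data, so the column name in d never has to match the name used inside the likelihood:

```r
library(stats4)

# Simulated data; the column is deliberately called y, not x.
set.seed(1)
d <- data.frame(y = rpois(10000, 12.34))

# Constructor from the message above: returns a negative log
# likelihood that has the data bound in its enclosing environment.
makePoisonNegLogLikelihood <- function(x)
    function(lambda) -sum(dpois(x, lambda, log = TRUE))

nll <- makePoisonNegLogLikelihood(d$y)
fit <- mle(nll, start = list(lambda = 10))

# For a Poisson sample the estimate should agree with the sample mean.
coef(fit)
mean(d$y)
```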
being implemented internally as

environment(f) <- as.environment(d, parent = environment(f))

is very nice and general.  It makes things like cross-validation,
bootstrapping, etc. conceptually cleaner:  keep the same
formula/function f, but manipulate the data and see what happens.
It does have problems when d is an environment that already has a
parent, but I think a reasonable meaning in that case would be to copy
its contents into a new environment with the new parent set.
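
As a rough illustration of that copying rule, here is a sketch using list2env from base R. The helper name withData is hypothetical and is not part of mle or bbmle; it copies the contents of a data frame, named list, or environment into a fresh environment whose parent is the function's original environment:

```r
# Hypothetical helper: attach data to f without touching f's
# original environment or the data object itself.
withData <- function(f, data) {
    # as.list() works for data frames, lists and environments alike;
    # the copy gets environment(f) as its parent, whatever parent
    # 'data' may have had before.
    env <- list2env(as.list(data), parent = environment(f))
    environment(f) <- env
    f
}

f <- function(lambda) -sum(dpois(x, lambda, log = TRUE))

# Data supplied as a data frame ...
d <- data.frame(x = c(3, 5, 4))
g <- withData(f, d)

# ... or as an environment that already has a (different) parent.
e <- new.env(parent = emptyenv())
assign("x", c(3, 5, 4), envir = e)
h <- withData(f, e)

g(4)
h(4)
```

Because the contents are copied, later changes to d or e do not affect g or h, which may or may not be what a given design wants.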
Both (simple) bootstrapping and (simple leave-one-out) crossvalidation
require a data structure with a notion of cases, which is much more
restrictive than the context in which mle can be used.  A more generic
approach to bootstrapping that might fit closer to the level of
generality of mle might be parameterized in terms of a negative log
likelihood constructor, a starting value constructor, and a resampling
function, with a single iteration implemented something like

     mleboot1 <- function(nllmaker, start, resample)  {
         newdata <- resample()
         newstart <- do.call(start, newdata)
         nllfun <- do.call(nllmaker, newdata)
         mle(nllfun, start = newstart)
     }

This would leave decisions on the resampling method and data structure
up to the user. Something similar could be done with K-fold CV.
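
A sketch of how such an iteration might be driven for the Poisson example, assuming stats4's mle. The three user-supplied pieces (nllmaker, start, resample) and the simulated vector y are illustrative choices rather than a fixed interface; resample returns a named list so that do.call can pass it on by name:

```r
library(stats4)

# One bootstrap iteration, parameterized as described above.
mleboot1 <- function(nllmaker, start, resample) {
    newdata  <- resample()
    newstart <- do.call(start, newdata)
    nllfun   <- do.call(nllmaker, newdata)
    mle(nllfun, start = newstart)
}

set.seed(42)
y <- rpois(200, 12.34)   # simulated data for illustration

# User-supplied pieces: likelihood constructor, starting value
# constructor, and a case-resampling function.
nllmaker <- function(x) function(lambda) -sum(dpois(x, lambda, log = TRUE))
start    <- function(x) list(lambda = mean(x))
resample <- function() list(x = sample(y, replace = TRUE))

# Repeat the single iteration and look at the spread of the estimates.
boot <- replicate(50, coef(mleboot1(nllmaker, start, resample))[["lambda"]])
sd(boot)
```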

luke
Duncan Murdoch

mle(f, start=list(lambda=10))
Call:
mle(minuslogl = f, start = list(lambda = 10))

Coefficients:
 lambda
12.3402

It is not at all an unlikely design to have mle() as a generic function
which works on many kinds of objects, the default method being
function(object,...) mle(minuslogl(object)) and minuslogl is an extractor
function returning (tada!) the negative log likelihood function.
  (My version also has a cool formula interface and other
bells and whistles, and I would love to get feedback from other
useRs about it.)

   cheers
    Ben Bolker

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
