Skip to content

proposed simulate.glm method

9 messages · Ben Bolker, Alex D'Amour, Martin Maechler +2 more

#
I have found the "simulate" method (incorporated
in some packages) very handy. As far as I can tell the
only class for which simulate is actually implemented
in base R is lm ... this is actually a little dangerous
for a naive user who might be tempted to try
simulate(X) where X is a glm fit instead, because
it defaults to simulate.lm (since glm inherits from
the lm class), and the answers make no sense ...

Here is my simulate.glm(), which is modeled on
simulate.lm .  It implements simulation for poisson
and binomial (binary or non-binary) models, should
be easy to implement others if that seems necessary.

  I hereby request comments and suggest that it wouldn't
hurt to incorporate it into base R ...  (I will write
docs for it if necessary, perhaps by modifying ?simulate --
there is no specific documentation for simulate.lm)

  cheers
    Ben Bolker


simulate.glm <- function (object, nsim = 1, seed = NULL, ...)
{
  ## RNG stuff copied from simulate.lm
  if (!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
    runif(1)
  if (is.null(seed))
    RNGstate <- get(".Random.seed", envir = .GlobalEnv)
  else {
    R.seed <- get(".Random.seed", envir = .GlobalEnv)
    set.seed(seed)
    RNGstate <- structure(seed, kind = as.list(RNGkind()))
    on.exit(assign(".Random.seed", R.seed, envir = .GlobalEnv))
  }
  ## get probabilities/intensities
  pred <- matrix(rep(predict(object,type="response"),nsim),ncol=nsim)
  ntot <- length(pred)
  if (object$family$family=="binomial") {
    resp <- object$model[[1]]
    size <- if (is.matrix(resp)) rowSums(resp) else 1
  }
  val <- switch(object$family$family,
                poisson=rpois(ntot,pred),
                binomial=rbinom(ntot,prob=pred,size=size),
                stop("family ",object$family$family," not implemented"))
  ans <- as.data.frame(matrix(val,ncol=nsim))
  attr(ans, "seed") <- RNGstate
  ans
}

if (FALSE) {
  ## examples: modified from ?simulate
  x <- 1:10
  n <- 10
  y <- rbinom(length(x),prob=plogis((x-5)/2),size=n)
  y2 <- c("a","b")[1+rbinom(length(x),prob=plogis((x-5)/2),size=1)]
  mod1 <- glm(cbind(y,n-y) ~ x,family=binomial)
  mod2 <- glm(factor(y2) ~ x,family=binomial)
  S1 <- simulate(mod1, nsim = 4)
  S1B <- simulate(mod2, nsim = 4)
  ## repeat the simulation:
  .Random.seed <- attr(S1, "seed")
  identical(S1, simulate(mod1, nsim = 4))

  S2 <- simulate(mod1, nsim = 200, seed = 101)
  rowMeans(S2)/10 # after correcting for binomial sample size, should be
about
  fitted(mod1)

  plot(rowMeans(S2)/10)
  lines(fitted(mod1))

  ## repeat identically:
  (sseed <- attr(S2, "seed")) # seed; RNGkind as attribute
  stopifnot(identical(S2, simulate(mod1, nsim = 200, seed = sseed)))
}
#
There is functionality similar to this included in the Zelig package
with it's "sim" method. The "sim" method goes a step further and
replicates the fitted model's analysis on the generated datasets as
well. I would suggest taking a look -- Zelig supports most (if not
all) glm models and a wide range of others.

The Zelig maintainers' site can be found at: http://gking.harvard.edu/zelig/.

Full disclosure: I am an employee of the Institute for Quantitative
Social Science, which performs most of the development and support for
the Zelig package.

Best,
Alex D'Amour
Statistical Programmer
Harvard Institute for Quantitative Social Science


2009/2/12 Ben Bolker <bolker at ufl.edu>:
#
Elsewhere (at least in lme4), refit(sim(model)) does the
same thing [and so one would need something like
apply(sim(model,1000),2,refit)].

  sim() is quite interesting, as is Zelig, but I'm not
sure I am ready to leap to it yet -- this was basically
a suggestion that simulate.glm could be included in
"vanilla" R ...

  Also (for better or worse), it looks like sim() also
does parametric bootstrapping on the parameter values,
whereas simulate.[g]lm() just uses "plug-in" estimates.

  cheers
    Ben Bolker
Alex D'Amour wrote:

  
    
#
BB> I have found the "simulate" method (incorporated
    BB> in some packages) very handy. As far as I can tell the
    BB> only class for which simulate is actually implemented
    BB> in base R is lm ... this is actually a little dangerous
    BB> for a naive user who might be tempted to try
    BB> simulate(X) where X is a glm fit instead, because
    BB> it defaults to simulate.lm (since glm inherits from
    BB> the lm class), and the answers make no sense ...

    BB> Here is my simulate.glm(), which is modeled on
    BB> simulate.lm .  It implements simulation for poisson
    BB> and binomial (binary or non-binary) models, should
    BB> be easy to implement others if that seems necessary.

    BB> I hereby request comments and suggest that it wouldn't
    BB> hurt to incorporate it into base R ...  (I will write
    BB> docs for it if necessary, perhaps by modifying ?simulate --
    BB> there is no specific documentation for simulate.lm)

    BB> cheers
    BB> Ben Bolker

[...............]

Hi Ben,
thank you for your proposals.

I agree that  simulate.glm() has been in missing in some way,
till now, in particular, as, as you note, "glm" objects extend
"lm" ones and hence  simulate(<glm>, ...) currently dispatches to
calling simulate.lm(....) which is only correct in the case of
the gaussian family.

I have looked at your proposal a bit, already "improved" the
code slightly (e.g. re-include the comment you lost when you
``copied'' simulate.lm():  In such cases, please work from the
source, not from what you get by print()ing
stats:::simulate.lm --- the source is either a recent tarball,
or the SVN repository, in this case, file
https://svn.r-project.org/R/trunk/src/library/stats/R/lm.R ]
and am planning to look at your and some own examples; 
all with the goal to indeed include this in the R standard
'stats' package in R-devel [to become R 2.9.0 in the future].

About the help page:  At the moment, I think that only a few
words would need to be added to the simulate help page,
i.e., https://svn.r-project.org/R/trunk/src/library/stats/man/simulate.Rd
and will be happy to receive a patch against this file.

Thank you again, and best regards,
Martin Maechler, ETH Zurich
#
Dear Martin,

I think a simulate.glm method ought to be able to work for gnm objects
too. David Firth and I started to work on this a long time ago, but
stopped part-way through when simulate.lm was introduced, thinking that
simulate.glm was probably in the pipeline and we were duplicating
effort. Obviously we have let this slip when a contribution might have
been useful. We developed a prototype for poisson, binomial, gaussian,
gamma and inverse gaussian models which might be usefully merged with
Ben's proposed simulate.glm. What's the best way to go about this? I
would also like to test the proposed simulate.glm to check whether it
will work with gnm objects or whether a simulate.gnm will be necessary.

Thanks and best regards,

Heather
Martin Maechler wrote:
#
Thanks a lot, Heather,
HT> Dear Martin,
    HT> I think a simulate.glm method ought to be able to work for gnm objects
    HT> too. David Firth and I started to work on this a long time ago, but
    HT> stopped part-way through when simulate.lm was introduced, thinking that
    HT> simulate.glm was probably in the pipeline and we were duplicating
    HT> effort. Obviously we have let this slip when a contribution might have
    HT> been useful. We developed a prototype for poisson, binomial, gaussian,
    HT> gamma and inverse gaussian models which might be usefully merged with
    HT> Ben's proposed simulate.glm. What's the best way to go about this? I
    HT> would also like to test the proposed simulate.glm to check whether it
    HT> will work with gnm objects or whether a simulate.gnm will be necessary.

In the mean time, private e-mail communications have started on
the subject, and yes, we are very insterested in finding 
``the best'' possible way, probably making use of
Heather+David's code together with Ben's. 
One alternative (not mentioned yet on R-devel), we've been
considering is to use simulate.lm() to also deal with "glm" (and
possibly "gnm") objects ``in one place''.

Martin
HT> Martin Maechler wrote:
>>>>>>> "BB" == Ben Bolker <bolker at ufl.edu>
    >>>>>>> on Thu, 12 Feb 2009 11:29:14 -0500 writes:
    >> 
    BB> I have found the "simulate" method (incorporated
    BB> in some packages) very handy. As far as I can tell the
    BB> only class for which simulate is actually implemented
    BB> in base R is lm ... this is actually a little dangerous
    BB> for a naive user who might be tempted to try
    BB> simulate(X) where X is a glm fit instead, because
    BB> it defaults to simulate.lm (since glm inherits from
    BB> the lm class), and the answers make no sense ...
    >> 
    BB> Here is my simulate.glm(), which is modeled on
    BB> simulate.lm .  It implements simulation for poisson
    BB> and binomial (binary or non-binary) models, should
    BB> be easy to implement others if that seems necessary.
    >> 
    BB> I hereby request comments and suggest that it wouldn't
    BB> hurt to incorporate it into base R ...  (I will write
    BB> docs for it if necessary, perhaps by modifying ?simulate --
    BB> there is no specific documentation for simulate.lm)
    >> 
    BB> cheers
    BB> Ben Bolker
    >> 
    >> [...............]
    >> 
    >> Hi Ben,
    >> thank you for your proposals.
    >> 
    >> I agree that  simulate.glm() has been in missing in some way,
    >> till now, in particular, as, as you note, "glm" objects extend
    >> "lm" ones and hence  simulate(<glm>, ...) currently dispatches to
    >> calling simulate.lm(....) which is only correct in the case of
    >> the gaussian family.
    >> 
    >> I have looked at your proposal a bit, already "improved" the
    >> code slightly (e.g. re-include the comment you lost when you
    >> ``copied'' simulate.lm():  In such cases, please work from the
    >> source, not from what you get by print()ing
    >> stats:::simulate.lm --- the source is either a recent tarball,
    >> or the SVN repository, in this case, file
    >> https://svn.r-project.org/R/trunk/src/library/stats/R/lm.R ]
    >> and am planning to look at your and some own examples; 
    >> all with the goal to indeed include this in the R standard
    >> 'stats' package in R-devel [to become R 2.9.0 in the future].
    >> 
    >> About the help page:  At the moment, I think that only a few
    >> words would need to be added to the simulate help page,
    >> i.e., https://svn.r-project.org/R/trunk/src/library/stats/man/simulate.Rd
    >> and will be happy to receive a patch against this file.
    >> 
    >> Thank you again, and best regards,
    >> Martin Maechler, ETH Zurich
#
Yes, thanks to Ben for getting the ball rolling. His code was more
streamlined than mine, pointing to further simplifications which I've
included in the extended version below.

The code for the additional families uses functions from MASS and
SuppDists - I wasn't sure about the best way to do this, so have just
used :: for now.

It appears to be working happily for both glm and gnm objects (no
gnm-specific code used).

Best wishes,

Heather

simulate.glm <- function (object, nsim = 1, seed = NULL, ...)
{
    fam <- object$family$family
    if(fam == "gaussian")
	return(simulate.lm(object, nsim=nsim, seed=seed, ...))
    if(!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
	runif(1)		       # initialize the RNG if necessary
    if(is.null(seed))
	RNGstate <- get(".Random.seed", envir = .GlobalEnv)
    else {
	R.seed <- get(".Random.seed", envir = .GlobalEnv)
	set.seed(seed)
	RNGstate <- structure(seed, kind = as.list(RNGkind()))
	on.exit(assign(".Random.seed", R.seed, envir = .GlobalEnv))
    }
    ## get probabilities/intensities
    pred <- object$fitted
    ntot <- length(pred) * nsim
    val <- switch(fam,
		  "poisson" = rpois(ntot, pred),
		  "binomial" = {
                      wts <- object$prior.weights
                      if (any(wts %% 1 != 0))
                          stop("cannot simulate from non-integer
prior.weights")
		      rbinom(ntot, size = wts, prob = pred)/wts
		  },
                  "Gamma" = {
                      shape <- MASS::gamma.shape(object)$alpha
                      rgamma(ntot, shape = shape, rate = shape/pred)
                  },
                  "inverse.gaussian" = {
                      lambda <- 1/summary(object)$dispersion
                      SuppDists::rinvGauss(ntot, nu = pred, lambda = lambda)
                  },
		  stop("family '", fam, "' not yet implemented"))
    ans <- as.data.frame(matrix(val, ncol = nsim))
    attr(ans, "seed") <- RNGstate
    ans
}
Martin Maechler wrote:
#
If you are generalizing this, the saving of the RNG information to 
reproduce normally distribution random number also needs to save the 
normal generator information. (This looks like an omission in 
simulate.lm.) You might want to consider adding the simple functions 
setRNG and getRNG from my setRNG package. Roughly, the functions make 
the  first ten lines of simulate.lm or Ben's code into a function call, 
which may not be so important other than the missing normal information, 
but having a standardized object with all the information seems to be 
useful.

The package also has documentation and tests, which can be helpful for 
other package builders that define simulate methods. It has been around 
for a long time so has been pretty well tested.

Paul Gilbert
Martin Maechler wrote:
====================================================================================

La version fran?aise suit le texte anglais.

------------------------------------------------------------------------------------

This email may contain privileged and/or confidential information, and the Bank of
Canada does not waive any related rights. Any distribution, use, or copying of this
email or the information it contains by other than the intended recipient is
unauthorized. If you received this email in error please delete it immediately from
your system and notify the sender promptly by email that you have done so. 

------------------------------------------------------------------------------------

Le pr?sent courriel peut contenir de l'information privil?gi?e ou confidentielle.
La Banque du Canada ne renonce pas aux droits qui s'y rapportent. Toute diffusion,
utilisation ou copie de ce courriel ou des renseignements qu'il contient par une
personne autre que le ou les destinataires d?sign?s est interdite. Si vous recevez
ce courriel par erreur, veuillez le supprimer imm?diatement et envoyer sans d?lai ?
l'exp?diteur un message ?lectronique pour l'aviser que vous avez ?limin? de votre
ordinateur toute copie du courriel re?u.
#
Thank you, Heather and Ben,
HT> Yes, thanks to Ben for getting the ball rolling. His
    HT> code was more streamlined than mine, pointing to further
    HT> simplifications which I've included in the extended
    HT> version below.

    HT> The code for the additional families uses functions from
    HT> MASS and SuppDists - I wasn't sure about the best way to
    HT> do this, so have just used :: for now.

    HT> It appears to be working happily for both glm and gnm
    HT> objects (no gnm-specific code used).

    HT> Best wishes,

    HT> Heather

[....]

I have now followed Brian Ripley's suggetion to just extend
simulate.lm() to also deal with "glm" objects, but using
Heather's suggestions for the different families;
I've just commited src/library/stats/R/lm.R  with the new code.
(get it from  svn.r-project.org/R/trunk/ or this night's R-devel
 tarball).

One difference to your propsal: Instead of just
    object$fitted , the code is using
    fitted(object)  ... something which should properly to the na.action
used.

For the (MASS and) SuppDists package requirement, I'm using 
the following

      if(is.null(tryCatch(loadNamespace("SuppDists"),
			  error = function(e) NULL)))
	  stop("Need CRAN package 'SuppDists' for 'inverse.gaussian' family")


I've not yet updated the help page for simulate(),
and have only tested relatively few cases for binomial, poisson
and Gamma.
I've wanted to expose this to you, so you can provide more
feedback and possibly even a patch to
   svn.r-project.org/R/trunk/src/library/stats/man/simulate.Rd

Martin