Skip to content

gaussian family change suggestion

6 messages · Simon Wood, Thomas Lumley, Brian Ripley

#
Hi,

Currently the `gaussian' family's initialization code signals an error if
any response data are zero or negative and a log link is used. Given that
zero or negative response data are perfectly legitimate under the GLM
fitted using `gaussian("log")', this seems a bit unsatisfactory. Might
it be worth changing it?

The current offending code from `gaussian' is:

initialize = expression({
            n <- rep.int(1, nobs)
            if (is.null(etastart) && is.null(start) && is.null(mustart) &&
                ((family$link == "inverse" && any(y == 0)) ||
                  (family$link == "log" && any(y <= 0)))) stop(
              "cannot find valid starting values: please specify some")
            mustart <- y
        })

A possible replacement would be ...

initialize = expression({
            n <- rep.int(1, nobs)
            if (is.null(etastart) && is.null(start) && is.null(mustart) &&
                ((family$link == "inverse" && all(y == 0)) ||
                  (family$link == "log" && all(y <= 0)))) stop(
      "cannot find valid starting values: please specify some")
        mustart <- y
      if (family$link=="log") {
        iy <- y<=0
        if (sum(iy)) mustart[iy] <- min(y[!iy])*.5
      } else if (family$link=="inverse") {
        iy <- y==0
        if (sum(iy)) mustart[iy] <- min(abs(y[!iy]))*.5
      }
        })

best,
Simon
#
On Tue, 11 Apr 2006, Simon Wood wrote:

            
Well, that's not really true: it says it is unable to find starting 
values, and asks you to provide some.  I don't really see why your choices 
are satisfactory: what happens if all the data are negative for example? 
(You get +Inf in the log case, I believe.  The algorithm will probably get 
stuck from a finite starting point, but it will produce a mysterious error 
from an infinite one.)

We could try even harder, but code that is almost never used tends to get 
broken whilst no one is testing it.  So if you want to pursue it I think 
we need a comprehensive test suite.

  
    
#
- then the suggested code traps it.
- well, I guess, but I wouldn't have thought that zeros in data modelled
using `gaussian("log")' is such a rare occurance is it? [Or did you mean
that `gaussian("log")' is almost never used, and should hence be kept
simple].

I suppose there are arguments both ways...

best,
Simon
#
On Tue, 11 Apr 2006, Prof Brian Ripley wrote:

            
I have actually been working with log-link glms quite a bit recently and 
was planning to change the defaults.

 	-thomas
Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
#
On Tue, 11 Apr 2006, Simon Wood wrote:

            
Yes, but why (it can happen too)?
I meant initialization code for this case would rarely get used, as I 
suppose those people who do this are used to providing starting values
(or there are few or no such people).

  
    
#
- isn't this (i.e. all response <=0 ) a case where the MLE's of the
linear predictor must tend to minus infinity? So there's probably  an
argument for handling it differently [although my proposed error message
is not the most informative, since modifying the starting values won't help].

best,
Simon