Error using glm with poisson family and identity link

13 messages · Federico Gherardini, Brian Ripley, Spencer Graves +4 more

#
Hi all
I'm trying to use the function glm from the MASS package to do the 
following fit.

fit <- glm(FP ~ rand, data = tab, family = poisson(link = "identity"), 
subset = rand >= 1)
(FP is >= 0)

but I get the following error

Error: no valid set of coefficients has been found: please supply starting values
In addition: Warning message:
NaNs produced in: log(x)

In contrast, if I fit a model without an intercept

fit <- glm(FP ~ rand - 1, data = tab, family = poisson(link = 
"identity"), subset = rand >= 1)

everything goes fine.
Now my guess is that the points "naturally" have a negative intercept, so 
the error is produced because I'm using the Poisson distribution for y, 
and negative values are of course not admitted. Am I right?
Also, if this is the cause, shouldn't the function always try to do the 
best fit given the parameters? I mean, shouldn't it fit a model with 
intercept 0 anyway and report it as a bad fit?
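The warning is easy to reproduce outside glm: each Poisson observation contributes y*log(mu) - mu to the log-likelihood, so any candidate coefficients that push a fitted mean below zero leave the likelihood undefined. A minimal sketch in Python (rather than R), with invented numbers:

```python
import math

def poisson_loglik_terms(y, mu):
    """Per-observation Poisson log-likelihood contributions (y! constant dropped)."""
    return [yi * math.log(mi) - mi for yi, mi in zip(y, mu)]

y = [0, 1, 3]

# Every fitted mean positive: the likelihood is well defined.
ok = poisson_loglik_terms(y, [0.5, 1.0, 3.0])

# An identity-link fit with a negative intercept can propose mu < 0;
# log(mu) is then undefined -- this is the "NaNs produced in: log(x)".
try:
    poisson_loglik_terms(y, [-0.5, 1.0, 3.0])
    blew_up = False
except ValueError:
    blew_up = True

print(blew_up)  # True
```

This is exactly the constraint the identity link cannot enforce on its own: nothing stops the linear predictor a + b*x from going negative.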

Thanks

Federico
#
On Thu, 25 Nov 2004, Federico Gherardini wrote:

It's in the stats package.
And did you follow the advice in the error message?
We don't have your data, but it is plausible.
Well, I believe functions should do what they say on the box (and the help 
page), and not what some user hopes they might do by mind-reading.

You do have a suitable set of starting values from the second fit, so why 
not just follow the rather explicit advice?
#
Dear Federico: 

      Why do you use the "identity" link?  That can produce situations 
with an average of (-2) Poisson defects per unit, for example.  That's 
physical nonsense.  Also, it seems essentially to be generating your 
error message. 

      Also, have you considered the following: 

fit <- glm(FP ~ offset(log(rand)), data = tab, family = poisson, subset 
= rand >= 1)

      I haven't tried this, but it looks like this model is virtually
equivalent to the one you wrote:
"FP ~ rand-1" with link = "identity" says "estimate 'b' in 
PoissonMean = b*rand". 

      "FP ~ offset(log(rand))" with link = "log" (the default) says 
"estimate 'a' in log(PoissonMean)-log(rand) = a". 

      If I haven't made an error, then log(b) = a. 
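The claimed equivalence can be checked numerically. For the identity link with no intercept, the score equation for mean = b*rand solves in closed form as b = sum(FP)/sum(rand); for the log link with offset(log(rand)), mean = exp(a)*rand gives exp(a) = sum(FP)/sum(rand), so log(b) = a exactly. A quick check in Python, with invented stand-ins for the poster's FP and rand columns:

```python
import math

# Made-up stand-ins for the poster's FP and rand columns.
FP   = [2, 5, 9, 14, 20]
rand = [1, 2, 4,  6,  9]

# Identity link, no intercept: mean = b * rand.
# Score equation sum(FP_i / b - rand_i) = 0  =>  b = sum(FP) / sum(rand).
b = sum(FP) / sum(rand)

# Log link with offset(log(rand)): mean = exp(a) * rand.
# Score equation sum(FP_i - exp(a) * rand_i) = 0  =>  exp(a) = sum(FP) / sum(rand).
a = math.log(sum(FP) / sum(rand))

print(math.isclose(math.log(b), a))  # True: log(b) = a, as claimed
```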

      In more general situations, if you really need the "identity" 
link, have you considered searching for good starting values, as Prof. 
Ripley suggested?  You could build up to your final model by estimating 
simpler models and obtaining trial fits using the default "log" link.  
With those results and a little thought, you should be able to obtain 
reasonable starting values. 

      hope this helps. 
      spencer graves
Prof Brian Ripley wrote:

#
Spencer Graves <spencer.graves at pdf.com> writes:
So is _not_ using the identity link when the model is manifestly
additive on the identity scale. E.g. calibrating differential
spectrofluorometry with photon counters recording linear combinations
of intensities at different wavelengths.

I've bumped into similar situations before (binomial(link=identity), I
think it was then) and the glm.fit algorithm could use improvement in
dealing with the parameter constraints in these cases. With the
standard IRLS algorithm, if the maximum is on the boundary, you
basically hit a random point on the boundary and get stuck there with
a search direction pointing out of the valid region.
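The sticking behaviour is easy to reproduce. For the identity-link Poisson, IRLS reduces to iteratively reweighted least squares of y on the design matrix with weights 1/mu (the working response equals y, since d eta/d mu = 1). A pure-Python sketch, with invented counts chosen so the weighted least-squares update proposes a negative fitted mean on the first step:

```python
def wls_2param(x, y, w):
    """Weighted least squares for y ~ 1 + x (closed-form 2x2 normal equations)."""
    sw   = sum(w)
    swx  = sum(wi * xi for wi, xi in zip(w, x))
    swy  = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    a = (swy * swxx - swx * swxy) / det
    b = (sw * swxy - swx * swy) / det
    return a, b

# Invented counts whose weighted least-squares line has a negative intercept.
x = [1, 2, 3, 4]
y = [0, 0, 2, 6]

mu = [yi + 0.5 for yi in y]          # the usual glm-style starting fit
for _ in range(5):
    w = [1.0 / mi for mi in mu]      # identity link: weight = 1/mu
    a, b = wls_2param(x, y, w)       # working response z is just y here
    mu = [a + b * xi for xi in x]
    if min(mu) <= 0:
        break                        # stepped outside mu > 0: weights invalid

print(min(mu) <= 0)  # True: the update has left the valid region
```

At this point a naive IRLS has nowhere to go: the next round of weights 1/mu would be negative, which is one way such a fit fails without step-halving or better starting values.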
#
Hi, Peter: 

      What do you do in such situations? 

      Sundar Dorai-Raj and I have extended "glm" concepts to models 
driven by a sum of k independent Poissons, with a linear model for 
log(defectRate[i]) for each source (i = 1:k).  To handle convergence 
problems, etc., I think we need to use informative Bayes, but we're not 
there yet.  In any context where things are done more than once [which 
covers most human activities], informative Bayes seems sensible. 

      A related question comes with data representing the differences 
between Poisson counts, e.g., with d[i] = X[i]-X[i-1] = the number of 
new defects added between steps i-1 and i in a manufacturing process.  
Most of the time, d[i] is nonnegative.  However, in some cases, it can 
be negative, either because of metrology errors in X[i] or because of 
defect removal between steps i-1 and i. 

      Comments?
      Best Wishes,
      Spencer Graves
Peter Dalgaard wrote:

#
Spencer Graves <spencer.graves at pdf.com> writes:
I haven't got all that much experience with it, but obviously, the
various algorithms for constrained optimization (box- or otherwise) at
least allow you to find a proper maximum likelihood estimator.
#
Hi, Peter: 

      Thanks for the comment and reply. 

      I generally avoid constrained optimizers for three reasons: 

      1.  My experience with them has included many cases where the 
optimizer would stop with an error when testing parameter values that 
violate the constraints.  If I transform the parameter space to remove 
the constraints, that never happens.  The constrained optimizers in R 
2.0.1 may not exhibit this behavior, but I have not checked. 

      2.  In a few cases, I've plotted the log(likelihood) vs. parameter 
values using various transformations.  When I've done that, I typically 
found that the most nearly parabolic performance used unconstrained 
parameterizations.  This makes asymptotic normality more useful and 
increases the accuracy of simple, approximate sequential Bayesian 
procedures. 

      3.  When I think carefully about a particular application, I often 
find a rationale for claiming that a certain unconstrained 
parameterization provides a better description of the application.  For 
example, interest income on investments is essentially additive on the 
log scale.  Similarly, the concept of "materiality" in Accounting is 
closer to being constant in log space:  One might look for an error of a 
few Euros in the accounts of a very small business, but in auditing some 
major government accounts, errors on the order of a few Euros might not 
be investigated.  Also, measurement errors with microvolts are much 
smaller than with megavolts;  expressing the measurements in decibels 
(i.e., on the log scale) makes the measurement errors more nearly 
comparable. 

      Thanks again for your comments. 
      Best Wishes,
      Spencer Graves
Peter Dalgaard wrote:

#
On Thu, 25 Nov 2004, Spencer Graves wrote:

They do not, and OTOH if the MLE really does lie on the boundary (and here 
it may well, with one data point fitted with mean zero) transformation 
will often not find a good solution.

Constrained optimization is a hard problem, good methods are very 
complex and good code to implement them is usually expensive.
#
On 26-Nov-04 Prof Brian Ripley wrote:
I've had success with awkward constrained optimisation using the
Nelder-Mead procedure -- simply make the region beyond the constraint
hot enough for the simplex to recoil on touching it.

I once tested it in what I thought might be a tough case by
planting "trees" all over a shallow quadratic bowl. More
precisely, many circular regions which were not to be entered.
The objective was to locate the minimum of the bowl.

The function was defined so as to take a high value inside
each circular region.

The simplex took a long time to locate the minimum, spending
a lot of the time feeling its way slowly round the "trunks"
of the "trees", and some of the time trotting downhill over
open space, but it did get there. Nelder-Mead is pretty robust.
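Ted's recipe can be sketched in a few lines: wrap the objective so that any point inside a forbidden region returns a large value, then hand it to an ordinary Nelder-Mead. Below is a bare-bones 2-D Nelder-Mead in Python (a deliberately simple implementation, not anyone's production code), minimising a quadratic bowl whose direct downhill path is blocked by one circular "tree":

```python
import math

def nelder_mead(f, start, step=0.5, iters=1000):
    """Bare-bones 2-D Nelder-Mead: reflect, expand, contract, shrink."""
    pts = [list(start),
           [start[0] + step, start[1]],
           [start[0], start[1] + step]]
    for _ in range(iters):
        pts.sort(key=f)
        best, mid, worst = pts
        cx = [(best[0] + mid[0]) / 2, (best[1] + mid[1]) / 2]  # centroid
        refl = [2 * cx[0] - worst[0], 2 * cx[1] - worst[1]]
        if f(refl) < f(best):
            exp = [3 * cx[0] - 2 * worst[0], 3 * cx[1] - 2 * worst[1]]
            pts[2] = exp if f(exp) < f(refl) else refl
        elif f(refl) < f(mid):
            pts[2] = refl
        else:
            contr = [(cx[0] + worst[0]) / 2, (cx[1] + worst[1]) / 2]
            if f(contr) < f(worst):
                pts[2] = contr
            else:  # shrink the whole simplex toward the best point
                pts = [best,
                       [(best[0] + mid[0]) / 2, (best[1] + mid[1]) / 2],
                       [(best[0] + worst[0]) / 2, (best[1] + worst[1]) / 2]]
    return min(pts, key=f)

# Shallow quadratic bowl with its minimum at (1, 2).
def bowl(p):
    return (p[0] - 1) ** 2 + (p[1] - 2) ** 2

# One "tree": a forbidden disc of radius 0.5 at the origin, directly
# between the starting point and the minimum.
def penalised(p):
    if math.hypot(p[0], p[1]) < 0.5:   # inside the trunk
        return 1e6                     # "hot" wall: the simplex recoils
    return bowl(p)

best = nelder_mead(penalised, [-1.0, -1.0])
print([round(c, 3) for c in best])  # approximately [1.0, 2.0]
```

The simplex skirts the disc rather than entering it, then converges normally once it reaches open space, matching Ted's description.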

Best wishes to all,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 26-Nov-04                                       Time: 09:39:08
------------------------------ XFMail ------------------------------
3 days later
#
On Thu, 25 Nov 2004, Peter Dalgaard wrote:

It's harder than it looks (well, my experience is with the log-binomial 
model, but it should be similar).  The constraints are not box 
constraints, but a more general set of linear constraints, and you either 
have to find the convex hull of the data or use a constraint for every 
data point.

Also, the algorithm in glm.fit, while not perfect, is a little smarter 
than a simple IRLS. It uses step-halving to back away from the edge, and 
when the parameter space is convex it has a reasonable chance of creeping 
along the boundary to the true MLE.
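The step-halving Thomas mentions amounts to a simple guard: if a proposed update takes any fitted mean outside the valid region (mu > 0 here), move halfway back toward the previous iterate and try again. A sketch in Python with invented numbers (the exact control flow in glm.fit differs):

```python
def step_halve(beta_old, beta_new, x, max_halvings=30):
    """Shrink the update until every fitted mean a + b*x_i stays positive."""
    for _ in range(max_halvings):
        mu = [beta_new[0] + beta_new[1] * xi for xi in x]
        if min(mu) > 0:
            return beta_new, mu
        # back away from the edge: move halfway toward the previous iterate
        beta_new = [(bo + bn) / 2 for bo, bn in zip(beta_old, beta_new)]
    raise RuntimeError("could not find a valid step")

x = [1, 2, 3, 4]
old = (0.5, 1.0)        # previous iterate: all fitted means positive
new = (-1.65, 1.18)     # proposed update: fitted mean at x=1 is negative
beta, mu = step_halve(old, new, x)

print(min(mu) > 0)  # True: the halved step stays inside mu > 0
```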

I think better glm fitting is worth pursuing: computational difficulties 
with the log-binomial model have forced many epidemiologists to turn to 
estimators other than the MLE (or contrasts other than the relative risk), 
which is a pity.


 	-thomas
#
Thomas Lumley <tlumley at u.washington.edu> writes:
Hmm. That wasn't my experience. I had a situation where there was a
(virtual) maximum outside the boundary, and the algorithm would
basically stay on the path to that peak, banging its little head
into the same point of the wall repeatedly, so to speak.

(If you make a little drawing of an elliptical contour intersecting a
linear boundary, I'm sure you'll see that this process leads to a
point-of-no-progress that can be quite far from the maximum along the
edge.)
Yup...
#
On 29 Nov 2004 18:07:40 +0100, Peter Dalgaard
<p.dalgaard at biostat.ku.dk> wrote:

I love that image:  the little optimizer that couldn't.

I hope the fortunes maintainer is listening...

Duncan Murdoch