Error using glm with poisson family and identity link

Hi all,

I'm trying to use the function glm from the MASS package to do the following fit:

  fit <- glm(FP ~ rand, data = tab, family = poisson(link = "identity"),
             subset = rand >= 1)

(FP is >= 0) but I get the following error:

  Error: no valid set of coefficients has been found: please supply starting values
  In addition: Warning message: NaNs produced in: log(x)

In contrast, if I fit a model without intercept

  fit <- glm(FP ~ rand - 1, data = tab, family = poisson(link = "identity"),
             subset = rand >= 1)

everything goes fine. Now my guess is that the points "naturally" have a negative intercept, so the error is produced because I'm using the Poisson distribution for the y, and negative values are of course not admitted. Am I right?

Also, if this is the cause, shouldn't the function always try to do the best fit given the parameters? I mean, shouldn't it fit a model with intercept 0 anyway and report it as a bad fit?

Thanks,
Federico
On Thu, 25 Nov 2004, Federico Gherardini wrote:
Hi all I'm trying to use the function glm from the MASS package to do the following
It's in the stats package.
fit:

  fit <- glm(FP ~ rand, data = tab, family = poisson(link = "identity"),
             subset = rand >= 1)

(FP is >= 0) but I get the following error:

  Error: no valid set of coefficients has been found: please supply starting values
  In addition: Warning message: NaNs produced in: log(x)
And did you follow the advice in the error message?
In contrast, if I fit a model without intercept

  fit <- glm(FP ~ rand - 1, data = tab, family = poisson(link = "identity"),
             subset = rand >= 1)

everything goes fine. Now my guess is that the points "naturally" have a negative intercept, so the error is produced because I'm using the Poisson distribution for the y and negative values are of course not admitted. Am I right?
We don't have your data, but it is plausible.
Also if this is the cause, shouldn't the function always try to do the best fit given the parameters? I mean shouldn't it fit a model with intercept 0 anyway and report it as a bad fit?
Well, I believe functions should do what they say on the box (and the help page), and not what some user hopes they might do by mind-reading. You do have a suitable set of starting values from the second fit, so why not just follow the rather explicit advice?
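For concreteness, an untested sketch of one way to do that (using your 'tab', 'FP' and 'rand', which I don't have): take the coefficient from the intercept-free fit and pass it, together with a zero intercept, as starting values via glm's 'start' argument.

  fit0 <- glm(FP ~ rand - 1, data = tab,
              family = poisson(link = "identity"), subset = rand >= 1)
  ## assumes the fitted slope is positive, so every starting mean is > 0
  fit  <- glm(FP ~ rand, data = tab,
              family = poisson(link = "identity"), subset = rand >= 1,
              start = c(0, coef(fit0)))

If the unconstrained optimum really wants a negative intercept, the fit may still end up pressed against the boundary, but at least the iterations can start from a valid point.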
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Dear Federico:
Why do you use the "identity" link? That can produce situations with an average of (-2) Poisson defects per unit, for example. That's physical nonsense. Also, it seems essentially to be generating your error message.
Also, have you considered the following:

  fit <- glm(FP ~ offset(log(rand)), data = tab, family = poisson,
             subset = rand >= 1)
I haven't tried this, but it looks like this model is virtually equivalent to the one you wrote:

  "FP ~ rand - 1" with link = "identity" says "estimate 'b' in PoissonMean = b*rand".
  "FP ~ offset(log(rand))" with link = "log" (the default) says "estimate 'a' in log(PoissonMean) - log(rand) = a".

If I haven't made an error, then log(b) = a.
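An untested sketch of how one might check that on your data (again using your 'tab'): fit both parameterizations and compare the estimates.

  fit.id  <- glm(FP ~ rand - 1, data = tab,
                 family = poisson(link = "identity"), subset = rand >= 1)
  fit.off <- glm(FP ~ offset(log(rand)), data = tab,
                 family = poisson, subset = rand >= 1)
  coef(fit.id)        # b
  exp(coef(fit.off))  # exp(a); should agree with b if the two models really match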
In more general situations, if you really need the "identity" link, have you considered searching for good starting values, as Prof. Ripley suggested? You could build up to your final model by estimating simpler models and obtaining trial fits using the default "log" link. With those results and a little thought, you should be able to obtain reasonable starting values.
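As an untested sketch of that idea (your 'tab' again): the fitted means from a default log-link fit can be handed to the identity-link fit through glm's 'mustart' argument; if that is not enough, coefficient starting values can be supplied via 'start', as in the earlier sketch.

  fit.log <- glm(FP ~ rand, data = tab, family = poisson, subset = rand >= 1)
  fit.id  <- glm(FP ~ rand, data = tab,
                 family = poisson(link = "identity"), subset = rand >= 1,
                 mustart = fitted(fit.log))   # start the means where the log-link fit ended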
hope this helps.
spencer graves
Spencer Graves, PhD, Senior Development Engineer O: (408)938-4420; mobile: (408)655-4567
Spencer Graves <spencer.graves at pdf.com> writes:
Dear Federico: Why do you use the "identity" link? That can produce situations with an average of (-2) Poisson defects per unit, for example. That's physical nonsense.
So is _not_ using the identity link when the model is manifestly additive on the identity scale. E.g. calibrating differential spectrofluorometry with photon counters recording linear combinations of intensities at different wavelengths. I've bumped into similar situations before (binomial(link=identity), I think it was then) and the glm.fit algorithm could use improvement in dealing with the parameter constraints in these cases. With the standard IRLS algorithm, if the maximum is on the boundary, you basically hit a random point on the boundary and get stuck there with a search direction pointing out of the valid region.
O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
Hi, Peter:
What do you do in such situations?
Sundar Dorai-Raj and I have extended "glm" concepts to models driven by a sum of k independent Poissons, with a linear model for log(defectRate[i]) for each source (i = 1:k). To handle convergence problems, etc., I think we need to use informative Bayes, but we're not there yet. In any context where things are done more than once [which covers most human activities], informative Bayes seems sensible.

A related question comes with data representing the differences between Poisson counts, e.g., with d[i] = X[i] - X[i-1] = the number of new defects added between steps i-1 and i in a manufacturing process. Most of the time, d[i] is nonnegative. However, in some cases it can be negative, either because of metrology errors in X[i] or because of defect removal between steps i-1 and i.
Comments?
Best Wishes,
Spencer Graves
Spencer Graves <spencer.graves at pdf.com> writes:
Hi, Peter: What do you do in such situations? Sundar Dorai-Raj and I have extended "glm" concepts to models driven by a sum of k independent Poissons, with the a linear model for log(defectRate[i]) for each source (i = 1:k). To handle convergence problems, etc., I think we need to use informative Bayes, but we're not there yet. In any context where things are done more than once [which covers most human activities], informative Bayes seems sensible. A related question comes with data representing the differences between Poisson counts, e.g., with d[i] = X[i]-X[i-1] = the number of new defects added between steps i-1 and i in a manufacturing process. Most of the time, d[i] is nonnegative. However, in some cases, it can be negative, either because of metrology errors in X[i] or because of defect removal between steps i-1 and i. Comments?
I haven't got all that much experience with it, but obviously, the various algorithms for constrained optimization (box- or otherwise) at least allow you to find a proper maximum likelihood estimator.
O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
Hi, Peter:
Thanks for the comment and reply.
I generally avoid constrained optimizers for three reasons:
1. My experience with them has included many cases where the optimizer would stop with an error when testing parameter values that violate the constraints. If I transform the parameter space to remove the constraints, that never happens (a minimal sketch of this follows the list). The constrained optimizers in R 2.0.1 may not exhibit this behavior, but I have not checked.

2. In a few cases, I've plotted the log(likelihood) vs. parameter values using various transformations. When I've done that, I typically found that the most nearly parabolic behavior came from unconstrained parameterizations. This makes asymptotic normality more useful and increases the accuracy of simple, approximate sequential Bayesian procedures.

3. When I think carefully about a particular application, I often find a rationale for claiming that a certain unconstrained parameterization provides a better description of the application. For example, interest income on investments is essentially additive on the log scale. Similarly, the concept of "materiality" in Accounting is closer to being constant in log space: one might look for an error of a few Euros in the accounts of a very small business, but in auditing some major government accounts, errors on the order of a few Euros might not be investigated. Also, measurement errors with microvolts are much smaller than with megavolts; expressing the measurements in decibels (i.e., on the log scale) makes the measurement errors more nearly comparable.
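To make point 1 concrete, a generic untested sketch (not the poster's model; 'y' stands in for a vector of Poisson counts): remove a positivity constraint by optimizing over the log of the parameter instead of the parameter itself.

  negll <- function(loglambda) -sum(dpois(y, exp(loglambda), log = TRUE))
  opt <- optim(log(mean(y) + 0.5), negll, method = "BFGS")  # unconstrained search
  exp(opt$par)   # back-transform to the original, constrained scale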
Thanks again for your comments.
Best Wishes,
Spencer Graves
On Thu, 25 Nov 2004, Spencer Graves wrote:
I generally avoid constrained optimizers for three reasons: 1. My experience with them has included many cases where the optimizer would stop with an error when testing parameter values that violate the constraints. If I transform the parameter space to remove the constraints, that never happens. The constrained optimizers in R 2.0.1 may not exhibit this behavior, but I have not checked.
They do not, and OTOH if the MLE really does lie on the boundary (and here it may well, with one data point fitted with mean zero) transformation will often not find a good solution. Constrained optimization is a hard problem, good methods are very complex and good code to implement them is usually expensive.
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
On 26-Nov-04 Prof Brian Ripley wrote:
On Thu, 25 Nov 2004, Spencer Graves wrote:
I generally avoid constrained optimizers for three reasons: 1. My experience with them has included many cases where the optimizer would stop with an error when testing parameter values that violate the constraints. If I transform the parameter space to remove the constraints, that never happens. The constrained optimizers in R 2.0.1 may not exhibit this behavior, but I have not checked.
They do not, and OTOH if the MLE really does lie on the boundary (and here it may well, with one data point fitted with mean zero) transformation will often not find a good solution. Constrained optimization is a hard problem, good methods are very complex and good code to implement them is usually expensive.
I've had success with awkward constrained optimisation using the Nelder-Mead procedure -- simply make the region beyond the constraint hot enough for the simplex to recoil on touching it.

I once tested it in what I thought might be a tough case by planting "trees" all over a shallow quadratic bowl. More precisely, many circular regions which were not to be entered. The objective was to locate the minimum of the bowl. The function was defined so as to take a high value inside each circular region. The simplex took a long time to locate the minimum, spending a lot of the time feeling its way slowly round the "trunks" of the "trees", and some of the time trotting downhill over open space, but it did get there. Nelder-Mead is pretty robust.

Best wishes to all,
Ted.

E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861
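An untested sketch of that "hot wall" idea applied to the identity-link Poisson model discussed above ('x' and 'y' stand in for the data): return a huge value whenever any fitted mean would be non-positive, so the simplex recoils from the infeasible region.

  negll.pen <- function(b) {
      mu <- b[1] + b[2] * x
      if (any(mu <= 0)) return(1e10)        # the "hot" region beyond the constraint
      -sum(dpois(y, mu, log = TRUE))
  }
  ## start where all means are positive, e.g. a flat fit at the sample mean
  opt <- optim(c(mean(y), 0), negll.pen, method = "Nelder-Mead")
  opt$par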
Thanks very much to everybody for your replies and comments!

Cheers,
Federico
3 days later
On Thu, 25 Nov 2004, Peter Dalgaard wrote:
I haven't got all that much experience with it, but obviously, the various algorithms for constrained optimization (box- or otherwise) at least allow you to find a proper maximum likelihood estimator.
It's harder than it looks (well, my experience is with the log-binomial model, but it should be similar). The constraints are not box constraints, but a more general set of linear constraints, and you either have to find the convex hull of the data or use a constraint for every data point.

Also, the algorithm in glm.fit, while not perfect, is a little smarter than a simple IRLS. It uses step-halving to back away from the edge, and when the parameter space is convex it has a reasonable chance of creeping along the boundary to the true MLE.

I think better glm fitting is worth pursuing: computational difficulties with the log-binomial model have forced many epidemiologists to turn to estimators other than the MLE (or contrasts other than the relative risk), which is a pity.

-thomas
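An untested sketch of what "a constraint for every data point" looks like for the identity-link Poisson model mu[i] = b0 + b1*x[i] ('x' and 'y' stand in for the data), using constrOptim's linear constraints ui %*% b >= ci:

  negll <- function(b) -sum(dpois(y, b[1] + b[2] * x, log = TRUE))
  ui <- cbind(1, x)          # one row per observation: b0 + b1*x[i] >= 0
  ci <- rep(0, length(x))
  start <- c(mean(y), 0)     # strictly feasible: every starting mean is positive
  opt <- constrOptim(start, negll, grad = NULL, ui = ui, ci = ci)
  opt$par
  ## the log-barrier keeps iterates strictly inside the region, so a boundary
  ## MLE is only approached, never hit exactly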
Thomas Lumley <tlumley at u.washington.edu> writes:
Also, the algorithm in glm.fit, while not perfect, is a little smarter than a simple IRLS. It uses step-halving to back away from the edge, and when the parameter space is convex it has a reasonable chance of creeping along the boundary to the true MLE.
Hmm. That wasn't my experience. I had a situation where there was like a (virtual) maximum outside the boundary, and the algorithm would basically stay on the path to that peak, banging its little head into the same point of the wall repeatedly, so to speak. (If you make a little drawing of an elliptical contour intersecting a linear boundary, I'm sure you'll see that this process leads to a point-of-no-progress that can be quite far from the maximum along the edge.)
I think better glm fitting is worth pursuing: computational difficulties with the log-binomial model have forced many epidemiologists turn to estimators other than the MLE (or contrasts other than the relative risk), which is a pity.
Yup...
O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
On 29 Nov 2004 18:07:40 +0100, Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
I had a situation where there was like a (virtual) maximum outside the boundary, and the algorithm would basically stay on the path to that peak, banging its little head into the same point of the wall repeatedly, so to speak.
I love that image: the little optimizer that couldn't. I hope the fortunes maintainer is listening...

Duncan Murdoch