
glm gives incorrect results for zero-weight cases (PR#780)

4 messages · Brian Ripley, Peter Dalgaard, Thomas Lumley

#
Zero-weight cases in glm get incorrect fitted values and linear
predictors: see the ninth value in each of the outputs below.
data=d.AD, weights=c(rep(1,8), 0))
       1        2        3        4        5        6        7        8 
2.989646 2.535391 2.862201 2.989646 2.535391 2.862201 3.145992 2.691737 
       9 
2.493205
       1        2        3        4        5        6        7        8 
2.989646 2.535391 2.862201 2.989646 2.535391 2.862201 3.145992 2.691737 
       9 
3.018547
       1        2        3        4        5        6        7        8 
19.87864 12.62136 17.50000 19.87864 12.62136 17.50000 23.24272 14.75728 
       9 
12.10000
[1] 19.87864 12.62136 17.50000 19.87864 12.62136 17.50000 23.24272 14.75728
[9] 20.46154

The reason is obvious: glm.fit only ever updates eta[good], and 
zero-weight values are not `good'.  So eta[weights == 0] is stuck at the
initial values.
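
The mechanism can be illustrated with a stripped-down IRLS loop. This is
only a sketch and not the actual glm.fit code. It assumes the d.AD data
from the ?glm help page, the model counts ~ outcome + treatment (the call
above is truncated, so the formula is a guess), a Poisson log link, and
starting values mu = y + 0.1. The name irls_sketch is made up for the
illustration.

```r
# Dobson data as in the ?glm example; the model formula is an assumption.
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
X <- model.matrix(~ outcome + treatment)
w <- c(rep(1, 8), 0)                 # ninth case has zero weight

# Hypothetical minimal IRLS for a Poisson log-link model.
# Like the behaviour described above, it only ever updates eta[good].
irls_sketch <- function(X, y, w, iters = 25) {
  good <- w > 0
  mu   <- y + 0.1                    # assumed starting values
  eta  <- log(mu)
  for (i in seq_len(iters)) {
    z  <- eta[good] + (y[good] - mu[good]) / mu[good]  # working response
    ww <- w[good] * mu[good]                           # working weights
    beta <- lm.wfit(X[good, , drop = FALSE], z, ww)$coefficients
    eta[good] <- drop(X[good, , drop = FALSE] %*% beta)
    mu[good]  <- exp(eta[good])
  }
  list(coefficients = beta, eta = eta, mu = mu)
}

fit <- irls_sketch(X, counts, w)
stuck_eta <- fit$eta[9]                         # never updated
fixed_eta <- drop(X %*% fit$coefficients)[9]    # recomputed after the fit
```

If the assumptions hold, stuck_eta is just log(12.1), the 2.493205 in
the first output, while fixed_eta should come out close to the 3.018547
in the second output. Recomputing eta from the final coefficients is the
idea behind fix 1) below.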

There are two possible fixes:

1) Update eta after the final fit, and then mu.  Out of range values
could then be NA (although it looks like predict.glm does not check).

2) Update all eta and hence mu values during the iterations.  This will
apply the constraints on eta/mu at zero-weight points too, and so might
be different.

I am inclined to think that 2) is right, and that adding points with zero 
weight to the fit is not the same as omitting them.

Opinions?


--please do not edit the information below--

Version:
 platform = sparc-sun-solaris2.6
 arch = sparc
 os = solaris2.6
 system = sparc, solaris2.6
 status = 
 major = 1
 minor = 2.0
 year = 2000
 month = 12
 day = 15
 language = R

Search Path:
 .GlobalEnv, package:ctest, Autoloads, package:base
#
On 20 Dec 2000, Peter Dalgaard BSA wrote:

            
Constraints can be added by the user, of course, but in the standard cases
(canonical links) they never bite.  Poisson with linear link is one case
where they might.  This checking is something that R has but S and GLIM
(AFAIR) do not.
#
ripley@stats.ox.ac.uk writes:
Just for clarification: This applies only to cases where the
parametrization is non-canonical, e.g. additive models with Poisson
response, right? And essentially the issue is that if you have a model
like lambda = a + b x and you put in a zero-weight observation with x
= 0, then that should effectively constrain a to be positive. That
does make quite good sense, yes.
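
A small sketch of that situation, using the validmu check that the
family object carries. The values of a, b and x are invented for
illustration, and this only shows the check itself, not how glm.fit
applies it during the iterations.

```r
# Poisson with identity link: lambda = a + b * x
fam <- poisson(link = "identity")

a <- -0.5                      # made-up coefficients
b <- 1
x_fit  <- c(1, 2, 3, 4)        # points with positive weight
x_zero <- 0                    # zero-weight point at x = 0

mu_fit <- fam$linkinv(a + b * x_fit)
mu_all <- fam$linkinv(a + b * c(x_fit, x_zero))

ok_fit <- fam$validmu(mu_fit)  # only the weighted points are checked
ok_all <- fam$validmu(mu_all)  # zero-weight point is checked as well
```

Here ok_fit is TRUE but ok_all is FALSE, because lambda = a = -0.5 at
x = 0. Checking all points therefore rules out a negative a, which is
the constraint described above.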
#
On 20 Dec 2000, Peter Dalgaard BSA wrote:

            
Not just non-canonical. There are boundary problems with gamma/reciprocal
glms.  I would also go for the second solution.
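
A sketch of the gamma/reciprocal boundary case, again with invented
coefficients and x values. The inverse link is the canonical one for the
Gamma family, yet mu = 1/eta is only a valid mean while eta stays
positive.

```r
fam <- Gamma()                 # default link is "inverse"

b0 <- 1                        # made-up coefficients
b1 <- -0.5
x_fit  <- c(0, 0.5, 1, 1.5)    # points with positive weight
x_zero <- 3                    # zero-weight point

eta_fit <- b0 + b1 * x_fit                 # 1, 0.75, 0.5, 0.25
eta_all <- b0 + b1 * c(x_fit, x_zero)      # last value is -0.5

gamma_ok_fit <- fam$validmu(fam$linkinv(eta_fit))
gamma_ok_all <- fam$validmu(fam$linkinv(eta_all))
```

gamma_ok_fit is TRUE and gamma_ok_all is FALSE, since the zero-weight
point would have a negative mean.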


	-thomas


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._