
bigglm() results different from glm()

This is a surprisingly interesting problem that took a while to debug, because the computations all seemed correct.

Your model hasn't converged yet.  You can get the right answer either by running longer:
Large data regression model: bigglm(y ~ ttment, data = dat, family = poisson(link = "log"),
     chunksize = 100000, maxit = 20)
Sample size =  100000
              Coef  (95%   CI)    SE p
(Intercept) 2.304 2.301 2.307 0.001 0
ttment2     0.405 0.401 0.408 0.002 0

or supplying starting values:
Large data regression model: bigglm(y ~ ttment, data = dat, family = poisson(link = "log"),
     chunksize = 100000, start = c(2, 0))
Sample size =  100000
              Coef  (95%   CI)    SE p
(Intercept) 2.304 2.301 2.307 0.001 0
ttment2     0.405 0.401 0.408 0.002 0
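Here is a minimal Python sketch of why the same model can stop short from a poor start but converge quickly from a good one. It fits a Poisson log-link model with one two-level treatment indicator by iteratively reweighted least squares (Fisher scoring). It is not bigglm's code. The all-zero "default" start, the tolerance, the `maxit` values and the toy counts are all assumptions chosen for illustration. From a start of zero, the first step overshoots badly: the linear predictor jumps to about 9 and 14 for the two arms. Each later step then pulls it back by only about 1 on the log scale, so a small iteration limit runs out long before the answer. A start near log(10) converges in a handful of steps.

```python
import math

def poisson_irls(X, y, start, maxit, tol=1e-8):
    """Poisson log-link GLM with two coefficients, fitted by IRLS.

    X is a list of [1, x] rows (intercept plus one 0/1 treatment indicator).
    Returns (beta, converged, iterations); `converged` is always set.
    """
    beta = list(start)
    for it in range(1, maxit + 1):
        # Accumulate the weighted normal equations X'WX beta = X'Wz.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (x0, x1), yi in zip(X, y):
            eta = beta[0] * x0 + beta[1] * x1
            mu = math.exp(eta)          # inverse of the log link
            z = eta + (yi - mu) / mu    # working response
            w = mu                      # working weight for Poisson/log
            a11 += w * x0 * x0
            a12 += w * x0 * x1
            a22 += w * x1 * x1
            b1 += w * x0 * z
            b2 += w * x1 * z
        # Solve the 2x2 system by Cramer's rule.
        det = a11 * a22 - a12 * a12
        new = [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]
        change = max(abs(n - o) for n, o in zip(new, beta))
        beta = new
        if change < tol:
            return beta, True, it
    return beta, False, maxit

# Hypothetical toy data: three counts per arm, arm means 10 and 15.
X = [[1, 0]] * 3 + [[1, 1]] * 3
y = [8, 10, 12, 13, 15, 17]

too_short = poisson_irls(X, y, start=[0, 0], maxit=10)   # runs out of iterations
longer = poisson_irls(X, y, start=[0, 0], maxit=50)      # same start, more iterations
good_start = poisson_irls(X, y, start=[2, 0], maxit=10)  # same limit, better start
```

With these toy counts, the exact answer is log(10) ≈ 2.303 for the intercept and log(1.5) ≈ 0.405 for the treatment effect. Both remedies reach it, while the first call returns `converged` as False.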


The bug is that you weren't told about the lack of convergence.  There is a flag in the object, but it is only set once the model has converged, so it is not there at all when convergence fails.
NULL
[1] TRUE
[1] TRUE

For the next version I will make sure there is a clear warning when the model hasn't converged.  The default maximum number of iterations is fairly small by design: if it isn't working, you want to find out and specify starting values rather than wait for dozens of potentially slow iterations.  This strategy obviously breaks down when you don't notice that failure. :(
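The fix described above amounts to two changes in the fitting loop. First, create the convergence flag as False before iterating, so that it exists (and is False rather than missing) when the limit is hit. Second, warn whenever the loop exits without converging. Here is a minimal Python sketch of that pattern. It is not bigglm's code, and `fit_with_flag`, its tolerance and the single-group Poisson update used to exercise it are hypothetical. For a single sample of counts with mean 15, each Newton/IRLS step for eta = log(mean) is eta - 1 + 15*exp(-eta).

```python
import math
import warnings

def fit_with_flag(update, start, maxit, tol=1e-8):
    """Run a fixed-point iteration and always report whether it converged."""
    # The flag is created up front, so it is present even when maxit is hit.
    result = {"estimate": start, "converged": False, "iterations": 0}
    x = start
    for it in range(1, maxit + 1):
        new = update(x)
        result["estimate"], result["iterations"] = new, it
        if abs(new - x) < tol:
            result["converged"] = True
            break
        x = new
    if not result["converged"]:
        # Make the failure visible instead of silently returning the last iterate.
        warnings.warn(f"failed to converge in {maxit} iterations; "
                      "increase maxit or supply starting values")
    return result

# Hypothetical update: one IRLS step for log(mean) of Poisson counts with mean 15.
def step(eta):
    return eta - 1 + 15 * math.exp(-eta)
```

Calling `fit_with_flag(step, 0.0, maxit=5)` warns and returns `converged` as False. With `maxit=50`, the same start reaches log(15).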

      -thomas
On Mon, 16 Mar 2009, Francisco J. Zagmutt wrote:

            
Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle