
Wrongly converging glm()

Dear Mark,

I agree that convergence is a problem that applies to optimization in general, where the function you're trying to optimize may have more than one local minimum. In your case, you would probably have to try different starting points for the MLE procedure. This should not be necessary for logistic regression, however (unless, as in my data, something defies your model assumptions; see Simon Bonner's response).
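As an aside, the multi-start idea is easy to sketch. This is a purely hypothetical Python illustration (nothing to do with glm() itself; the objective function, step size, and starting points are all made up) of why different starting points matter when the objective has more than one local minimum:

```python
def f(x):
    # (x^2 - 1)^2 has minima near x = +/-1; the 0.3*x tilt makes the one
    # near x = -1 the global minimum and the one near x = +1 only local
    return (x**2 - 1.0)**2 + 0.3 * x

def grad(x):
    # analytic derivative of f
    return 4.0 * x * (x**2 - 1.0) + 0.3

def descend(x, lr=0.01, steps=2000):
    # plain gradient descent; it converges to whichever basin x starts in
    for _ in range(steps):
        x -= lr * grad(x)
    return x

starts = [-2.0, 0.5, 2.0]
fits = [(s, descend(s)) for s in starts]
for s, x in fits:
    print(f"start {s:5.1f} -> x* = {x:6.3f}, f(x*) = {f(x):7.4f}")
best = min(fits, key=lambda t: f(t[1]))[1]   # keep the lowest of the runs
```

Starting at 2.0 (or 0.5) gets stuck in the shallow basin near +1; only comparing the runs recovers the global minimum near -1.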

Still, I would find it a bit odd if the deviance didn't change after the next MLE iteration, but one of the model parameters did. It would tell me that such a parameter isn't adding to the model fit, which in my opinion would be useful debugging information, even when I'm hitting a local minimum (it could even hint that there is another, more optimal, solution?). I should probably try to figure out whether this observation also holds for other models/link functions (I honestly don't know).

However, thanks to your response, I can see that my suggestion is probably not applicable to all glm link functions, and I see how the implementation of my proposed "warning system" could be confusing to the user. Thanks a lot!

With kind regards,

Harm-Jan

From: Mark Leeds<mailto:markleeds2 at gmail.com>
Sent: Thursday, July 20, 2017 14:54
To: Harm-Jan Westra<mailto:westra.harmjan at outlook.com>
Cc: Joris Meys<mailto:jorismeys at gmail.com>; r-devel at r-project.org<mailto:r-devel at r-project.org>
Subject: Re: [Rd] Wrongly converging glm()

Hi Harm-Jan. I've been following this thread to some degree and just want to add that this issue is not specific to the GLM. It's a problem with optimization of functions in general. I was using Rvmmin with constraints, an extremely solid optimization package written by John Nash (it uses a modified BFGS algorithm), and it took me two years to realize that, although my optimization generally converged, there was an identifiability issue with my model that basically meant the results meant nothing. I only eventually found this out because, in the econometrics literature, the type of economic model I was estimating (rational expectations) is known to have an identifiability issue. I guess if I were an economics expert, I might have been able to know this but, in general, I think what you are asking optimization code to do is EXTREMELY DIFFICULT.

John Nash can say more because he's THE optimization masteR, but it's much more difficult to write optimization algorithms with convergence rules that can identify when mathematical convergence (a norm near zero, say) is not necessarily model convergence. That I can tell you from experience!
On Thu, Jul 20, 2017 at 2:32 PM, Harm-Jan Westra <westra.harmjan at outlook.com> wrote:
My apologies if I seemed to "blame R". This was in no way my intention. I get the feeling that you're missing my point as well.

I observed something that I thought was confusing when comparing two more or less identical methods (while validating the C code), and wanted to make a suggestion as to how to help future R users. Note that I already acknowledged that my data was bad. Note that I also mentioned that the way R determines convergence is a valid approach.

What strikes me as odd is that R warns you when your data is faulty for a function such as cor(), but not for glm(). I don't see why you wouldn't want to check both convergence criteria if you know multiple such criteria exist. It would make the software more user friendly in the end.

It may be true that there are millions of edge cases causing issues with glm(), as you say, but here I am presenting an edge case that can easily be detected, by checking whether the difference in beta estimates between the current and previous iteration exceeds a certain epsilon value.
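To make the proposed check concrete, here is a minimal sketch in Python (not R's actual glm.fit code; the toy data, tolerances, and single-predictor Newton/IRLS loop are all made up for illustration). With perfectly separated data the deviance-based stopping rule fires while the coefficient is still growing by roughly one unit per iteration, which is exactly what the extra epsilon check on the betas would flag:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Perfectly separated toy data: every x < 0 has y = 0, every x > 0 has y = 1,
# so the MLE for beta is +infinity and the loop can never truly converge.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

def deviance(beta):
    p = np.clip(sigmoid(beta * x), 1e-12, 1 - 1e-12)  # avoid log(0)
    return -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta, dev = 0.0, deviance(0.0)
for it in range(50):
    p = sigmoid(beta * x)
    grad = np.sum(x * (y - p))          # score
    info = np.sum(x**2 * p * (1 - p))   # Fisher information
    step = grad / info                  # one Newton/IRLS update
    beta += step
    new_dev = deviance(beta)
    if abs(new_dev - dev) < 1e-8:       # deviance-style criterion fires...
        break
    dev = new_dev

print(f"iterations: {it}, beta: {beta:.2f}, |last step|: {abs(step):.3f}")
if abs(step) > 1e-4:                    # ...but the coefficient was still moving
    print("warning: deviance converged while a coefficient was still changing")
```

The deviance change shrinks like exp(-beta) while the Newton step approaches a constant of about one, so the deviance rule declares convergence around beta of roughly 20 even though the estimate is still drifting, and the proposed beta-difference check catches it.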

I agree "that everybody using R should first do the effort of learning what they're doing", but it is a bit of a non-argument, because we all know the world just doesn't work that way; plus, this is one of the arguments that held, for example, the Linux community back for quite a while (i.e. let's not make the software more user friendly because the user should be more knowledgeable).

Harm-Jan


From: Joris Meys<mailto:jorismeys at gmail.com>
Sent: Thursday, July 20, 2017 13:16
To: Harm-Jan Westra<mailto:westra.harmjan at outlook.com>
Cc: r-devel at r-project.org<mailto:r-devel at r-project.org>
Subject: Re: [Rd] Wrongly converging glm()
On Thu, Jul 20, 2017 at 6:21 PM, Harm-Jan Westra <westra.harmjan at outlook.com> wrote:
Dear Joris,


I agree that such a covariate should not be used in the analysis, and fully agree with your assessment. However, your response assumes that everybody who uses R knows what they're doing, which is a dangerous assumption to make. I bet there are a lot of people who blindly trust the output from R, even when there's clearly something wrong with the estimates.

You missed my point then. I don't assume that everybody who uses R knows what they're doing. Actually, I know for a fact quite a few people using R have absolutely no clue about what they are doing. My point is that everybody using R should first do the effort of learning what they're doing. And if they don't, one shouldn't blame R. There's a million different cases where both algorithms would converge and the resulting estimates are totally meaningless regardless. R cannot be held responsible for that.



In terms of your conclusion that the C++ estimate corresponds to a value within the R-estimated confidence interval: if I allow the C++ code to run for 1000 iterations, its estimate would be around -1000. It simply never converges.

I didn't test that far, and you're right in the sense that -100 is indeed not the final estimate. After looking at the C code, it appears as if the author of that code combines a Newton-Raphson approach with a different convergence rule, and then it's quite understandable that it doesn't converge. You can wildly vary that estimate; the effect it has on the Jacobian, log likelihood or deviance will be insignificant. So the model won't improve, it will just wander all over the parameter space.



I think there's nothing wrong with letting the user know there might be something wrong with one of the estimates, especially if your code can easily figure it out for you by adding an additional rule. Not everyone is always paying attention (even if they know what they're doing).

If R did that, it wouldn't start the fitting procedure but would just return an error: "Your analysis died due to a lack of usable data." Because that's the problem here.



With kind regards,


Harm-Jan