Identifying link functions for gamma glmm
4 messages · Rolf Turner, Ben Bolker, Tahsin Ferdous
On Fri, 19 Nov 2021 13:38:14 -0700
Tahsin Ferdous <tahsinferdousuofc at gmail.com> wrote:
I am running a generalized linear mixed model with a gamma family. How can I decide which link function to use (log link, identity link, or inverse link)? I plotted observed vs. fitted values for each, but the plots look similar. Should I look at AIC? If I fit a gamma GLM, should I also use AIC to decide which link function to use in my model? If I fit a gamma GEE, should I use QIC to choose the appropriate model and link function?
I used a cross-validation approach, in a somewhat similar context. I tried both leave-one-out and k-fold cross-validation, with moderate success. The complete story is complicated, and has an unsatisfactory ending (I wound up getting answers that the client did not like!) so I shall not go into any more detail.

OTOH if your plots "look similar", perhaps it doesn't really matter. Can you think of a criterion for deciding whether it *does* matter?

cheers,

Rolf Turner
Honorary Research Fellow Department of Statistics University of Auckland Phone: +64-9-373-7599 ext. 88276
Most of the time I would suggest choosing link functions on scientific grounds, i.e. what scale makes sense for the expected effects? Link functions change the expected relationship with continuous predictors (do I expect the effects of predictors to be linear (identity), exponential (log), or hyperbolic (inverse)?) and change the meaning of interactions (does the value of one variable change the expected effect of the other additively (identity), proportionally (log), or ?? (inverse)).

I generally find that log links are more numerically stable (both identity and inverse links can sometimes lead to negative predictions). Logs are also nice because they essentially split the difference between the identity and inverse links. If I have (say) responses that are time intervals, then analyzing on the identity scale describes additive effects on the time scale; analyzing on the inverse scale describes additive effects on the rate or speed (1/time) of the response; analyzing on the log scale describes proportional changes in *either* time or rate (because log(time) = -1*log(1/time)).

My general procedure would be to use a log link and see if the diagnostics detected any problems.

That said, you could use AIC or cross-validation if you are primarily interested in prediction (and aren't worried about snooping). Cross-validation will be slower but more reliable, *if* you are careful to maintain independence structure when you specify your training and testing sets (i.e., you should sample by levels of your grouping variable, not by individual observations).
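The three scales described above can be written out side by side. This is only a sketch: the symbols are assumptions not used in the message itself, with \mu standing for the expected response (e.g. expected time) and \eta for the linear predictor (fixed effects plus random effects).

```latex
\begin{align*}
\text{identity:}\quad & \mu = \eta
  && \text{(predictors act additively on } \mu\text{)} \\
\text{log:}\quad & \log \mu = \eta \iff \mu = e^{\eta}
  && \text{(predictors act multiplicatively on } \mu\text{)} \\
\text{inverse:}\quad & 1/\mu = \eta \iff \mu = 1/\eta
  && \text{(predictors act additively on } 1/\mu\text{)} \\[4pt]
\text{symmetry of the log link:}\quad & \log(1/\mu) = -\log \mu = -\eta
\end{align*}
```

The last line is the sense in which the log link treats time and rate symmetrically: moving from \mu to 1/\mu only flips the sign of every coefficient. It also shows why the identity and inverse links can give invalid fits, since nothing in \mu = \eta or \mu = 1/\eta keeps \mu positive, whereas e^{\eta} is always positive.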
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Dr. Benjamin Bolker Professor, Mathematics & Statistics and Biology, McMaster University Director, School of Computational Science and Engineering (Acting) Graduate chair, Mathematics & Statistics
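The advice to split cross-validation folds by levels of the grouping variable, not by individual observations, can be sketched roughly as follows. This is a minimal illustration in plain Python, not code from the thread: the function name `grouped_k_fold` and the `subject_*` labels are hypothetical, and fitting and scoring the gamma models for each link is left out.

```python
import random


def grouped_k_fold(groups, k, seed=0):
    """Yield (train_idx, test_idx) pairs that hold out whole groups together.

    `groups` holds one grouping-variable label per observation
    (hypothetical input; e.g. the subject ID behind a random effect).
    """
    levels = sorted(set(groups))
    if k > len(levels):
        raise ValueError("k cannot exceed the number of grouping levels")
    rng = random.Random(seed)
    rng.shuffle(levels)
    # Assign each level (not each observation) to exactly one fold.
    fold_of = {level: i % k for i, level in enumerate(levels)}
    for fold in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


# Made-up layout: 6 subjects with 3 observations each.
groups = [f"subject_{s}" for s in range(6) for _ in range(3)]
folds = list(grouped_k_fold(groups, k=3))

for train, test in folds:
    held_out = sorted({groups[i] for i in test})
    print(f"train n={len(train)}, test n={len(test)}, held out: {held_out}")
```

For each fold one would fit the candidate models (log, identity, inverse link) on the training rows and compare their predictions on the held-out groups. Because no group contributes rows to both sides of a split, the held-out error is not flattered by within-group correlation.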
Dear Dr Ben,

Thanks a lot for the valuable information. At first, I tried to fit a glm with a log link. I have attached the residual vs fitted plot of this model. I see an outlier in the plot (I am not sure what the label 300 in this plot means). Should we always look at the diagnostic plots (residual vs fitted) for glm, glmm or gee? If I run the model with the outlier included, does it give valid results?

Kindest regards,

Tahsin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rplot 1.png
Type: image/png
Size: 31426 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20211119/13710482/attachment-0001.png>