Identifying link functions for gamma glmm

4 messages · Rolf Turner, Ben Bolker, Tahsin Ferdous

#
I am running a generalized linear mixed model with a gamma family. How can
I decide which link function to use (log, identity, or inverse)? I plotted
observed vs. fitted values, but the plots look similar. Should I compare
AIC? If I fit a gamma GLM instead, should I also use AIC to choose the link
function for my model?

If I fit a gamma GEE, should I use QIC to choose the appropriate model and
link function?
#
On Fri, 19 Nov 2021 13:38:14 -0700
Tahsin Ferdous <tahsinferdousuofc at gmail.com> wrote:

            
I used a cross-validation approach, in a somewhat similar context.  I
tried both leave-one-out and k-fold cross-validation, with moderate
success.  The complete story is complicated, and has an unsatisfactory
ending (I wound up getting answers that the client did not like!) so
I shall not go into any more detail.

OTOH if your plots "look similar", perhaps it doesn't really matter.
Can you think of a criterion for deciding whether it *does* matter?

cheers,

Rolf Turner
#
Most of the time I would suggest choosing link functions on 
scientific grounds, i.e. what scale makes sense for the expected 
effects? Link functions change the expected relationship with continuous 
predictors (do I expect the effects of predictors to be linear 
(identity), exponential (log), or hyperbolic (inverse)?) and change the 
meaning of interactions (does the value of one variable change the 
expected effect of the other additively (identity), proportionally 
(log), or ?? (inverse)).

   I generally find that log links are more numerically stable (both 
identity and inverse links can sometimes lead to negative predictions). 
Logs are also nice because they essentially split the difference between 
the identity and inverse links.  If I have (say) responses that are time 
intervals, then analyzing on the identity scale describes additive 
effects on the time scale; analyzing on the inverse scale describes 
additive effects on the rate or speed (1/time) of the response; 
analyzing on the log scale describes proportional changes in *either* 
time or rate (because log(time) = -1*log(1/time)).
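The time-versus-rate argument above can be checked numerically. The following is a minimal sketch in plain Python (not R, and not code from anyone in this thread); the intercepts, slopes, and function names are hypothetical values chosen only for illustration, and no model is actually fitted.

```python
import math

def mean_from_linear_predictor(eta, link):
    """Map a linear predictor eta back to the mean of the response."""
    if link == "identity":
        return eta
    if link == "log":
        return math.exp(eta)
    if link == "inverse":
        return 1.0 / eta
    raise ValueError(f"unknown link: {link}")

def predicted_time(link, intercept, slope, x):
    """Predicted mean time for a single predictor x (hypothetical coefficients)."""
    return mean_from_linear_predictor(intercept + slope * x, link)

# Log link: one unit of x multiplies time by exp(slope) and rate by exp(-slope),
# so the effect is proportional on both scales.
t0 = predicted_time("log", 1.0, 0.3, 0.0)
t1 = predicted_time("log", 1.0, 0.3, 1.0)
time_ratio = t1 / t0
rate_ratio = (1.0 / t1) / (1.0 / t0)

# Identity link: one unit of x adds `slope` to the time itself.
identity_step = (predicted_time("identity", 5.0, -1.0, 1.0)
                 - predicted_time("identity", 5.0, -1.0, 0.0))

# Inverse link: one unit of x adds `slope` to the rate (1/time).
inverse_step = (1.0 / predicted_time("inverse", 0.5, 0.2, 1.0)
                - 1.0 / predicted_time("inverse", 0.5, 0.2, 0.0))

# With the same coefficients, the identity link can predict a negative mean
# where the log link cannot.
identity_at_6 = predicted_time("identity", 5.0, -1.0, 6.0)
log_at_6 = predicted_time("log", 5.0, -1.0, 6.0)

print(time_ratio, rate_ratio, identity_step, inverse_step)
print(identity_at_6, log_at_6)
```

The product of `time_ratio` and `rate_ratio` is 1, which is the log(time) = -1*log(1/time) identity in numeric form; `identity_at_6` comes out negative, which a gamma mean cannot be.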

   My general procedure would be to use a log link and see if the 
diagnostics detected any problems.

   That said, you could use AIC or cross-validation if you are primarily 
interested in prediction (and aren't worried about data snooping). 
Cross-validation will be slower but more reliable, *if* you are careful 
to maintain independence structure when you specify your training and 
testing sets (i.e., you should sample by levels of your grouping 
variable, not by individual observations).
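Sampling by levels of the grouping variable can be sketched as follows. This is a minimal illustration in plain Python, assuming one group label per observation; the function name `grouped_kfold` and the example labels are hypothetical, and the model fitting and scoring steps are left out.

```python
import random

def grouped_kfold(groups, k, seed=0):
    """Yield (train_idx, test_idx) pairs in which each level of the grouping
    variable falls entirely in the training set or entirely in the test set."""
    levels = sorted(set(groups))
    if k > len(levels):
        raise ValueError("k cannot exceed the number of groups")
    rng = random.Random(seed)
    rng.shuffle(levels)
    # Deal the shuffled group levels into k folds, round-robin.
    folds = [set(levels[i::k]) for i in range(k)]
    for held_out in folds:
        test_idx = [i for i, g in enumerate(groups) if g in held_out]
        train_idx = [i for i, g in enumerate(groups) if g not in held_out]
        yield train_idx, test_idx

# Hypothetical grouping variable: one label per observation.
groups = ["a", "a", "b", "b", "b", "c", "d", "d", "e", "e"]
splits = list(grouped_kfold(groups, k=3))

for train_idx, test_idx in splits:
    held_out_levels = sorted({groups[i] for i in test_idx})
    print("held-out groups:", held_out_levels, "test rows:", test_idx)
```

Each candidate link would then be fitted on the training rows and scored on the held-out groups; splitting individual rows instead would let observations from the same group appear on both sides of the split.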
On 11/19/21 3:38 PM, Tahsin Ferdous wrote:

  
    
#
Dear Dr Ben,

Thanks a lot for the valuable information. I first fitted a GLM with a log
link and have attached the residuals vs. fitted plot for this model. There
is an outlier in the plot (I am not sure what the label 300 in the plot
means). Should we always look at diagnostic plots (residuals vs. fitted)
for a GLM, GLMM, or GEE? If I fit the model with the outlier included, are
the results still valid?

Kindest regards,

Tahsin
On Fri, Nov 19, 2021 at 5:40 PM Ben Bolker <bbolker at gmail.com> wrote:

            
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rplot 1.png
Type: image/png
Size: 31426 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20211119/13710482/attachment-0001.png>