Hi Jo,
OooooK, I see your point, sorry for the slowness. Yes, the residuals look a
bit wonky on your first glm1path plot (ft2). I've dug into the reason and
found a bug in the code for residuals.glm1path with negative binomial fits,
where it was using the wrong parameterisation of overdispersion in computing
residuals.
Now fixed on GitHub :)
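A note on notation, assuming mvabund follows the usual negative binomial mean-variance relationship (this is the standard textbook form, not a description of the code that was fixed). The two quantities that appear in this thread, phi and theta, describe the same overdispersion in reciprocal ways:

```latex
\operatorname{Var}(y) \;=\; \mu + \phi\,\mu^{2} \;=\; \mu + \frac{\mu^{2}}{\theta},
\qquad \phi = \frac{1}{\theta}
```

Under this form a larger phi (equivalently, a smaller theta) means more overdispersion, which is why 1/mean(ft3$phis) in the example code further down can be compared with a theta value.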
All the best
David
*From:* Joanne Potts <joanne at theanalyticaledge.com>
*Sent:* Wednesday, 5 December 2018 9:49 AM
*To:* David Warton <david.warton at unsw.edu.au>
*Cc:* r-sig-ecology at r-project.org
*Subject:* Re: mvabund: difference between 'glm1path' and 'manyglm'
Thanks David for getting back to me. I think I have followed your answer,
thank you, and I get that when one specifies the theta value, all the
ft3$phis are now constant for each lambda.
Now I wonder if there is any value in ever specifying
"negative.binomial(theta)" as I did below with ft3 (cf. the ?glm1path help file)
to improve the residuals when using the LASSO? I guess I always thought
the LASSO was a more robust way to select models, but the residuals
of ft2 seem to suggest otherwise.
These questions are motivated by some overdispersed seal-fish data for a
student in Sydney (as we've discussed off list), but I guess these questions
are more of a theoretical nature. I overcame my social phobia of posting
on a list instead of hassling you privately(!); maybe someone else can
get value from this discussion too :)
Thanks once again,
Jo
On Mon, Dec 3, 2018 at 12:10 AM David Warton <david.warton at unsw.edu.au>
wrote:
Hi Jo,
Thanks for the e-mail, always good to see statistical modelling questions
on this list!
In the mvabund package, you can fit trait models using different methods
of estimation: method="manyglm" will fit a GLM, and method="glm1path" will
fit a GLM with a LASSO penalty (chosen using BIC by default, but there are
other options). The way we coded LASSO negative binomial regression was to
update estimates of the overdispersion parameter as the slope parameters
update. Because the LASSO fit gives different slope parameters, it will
also have a different overdispersion parameter. It probably has a larger
overdispersion parameter, because the LASSO pushes slope parameters away
from the best (in-sample) fit, hence there is more unexplained variation in
the LASSO model.
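The argument above can be illustrated with a small simulation. This is a minimal sketch only: it uses MASS::glm.nb on made-up data rather than mvabund's glm1path code, and it mimics LASSO shrinkage crudely by holding the slope at half its maximum likelihood estimate through an offset (b_shrunk, fit_half and the simulation settings are hypothetical choices for illustration).

```r
# Illustrative sketch: simulated data and MASS::glm.nb, not mvabund's LASSO code
library(MASS)

set.seed(1)
n     <- 500
x     <- rnorm(n)
theta <- 2                          # true theta, so true phi = 1/theta = 0.5
mu    <- exp(0.5 + 1 * x)           # true slope is 1
y     <- rnbinom(n, size = theta, mu = mu)

# Unpenalised fit: slope estimated freely by maximum likelihood
fit_full <- glm.nb(y ~ x)

# Slope held at half its ML estimate (a crude stand-in for LASSO shrinkage);
# only the intercept and theta are re-estimated
b_shrunk <- 0.5 * unname(coef(fit_full)["x"])
fit_half <- glm.nb(y ~ 1 + offset(b_shrunk * x))

# Slope shrunk all the way to zero (intercept-only model)
fit_null <- glm.nb(y ~ 1)

# Overdispersion phi = 1/theta for each fit
phi <- c(full = 1 / fit_full$theta,
         half = 1 / fit_half$theta,
         null = 1 / fit_null$theta)
print(phi)
```

The further the slope is pulled from its in-sample best fit, the more variation is left for the overdispersion parameter to absorb, so phi would typically be expected to grow from the full fit to the intercept-only fit.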
All the best
David
Professor David Warton
School of Mathematics and Statistics, Evolution & Ecology Research Centre,
Centre for Ecosystem Science
UNSW Sydney
NSW 2052 AUSTRALIA
phone +61(2) 9385 7031
fax +61(2) 9385 7123
http://www.eco-stats.unsw.edu.au
*From:* Joanne Potts <joanne at theanalyticaledge.com>
*Sent:* Friday, 30 November 2018 1:51 AM
*To:* r-sig-ecology at r-project.org
*Cc:* David Warton <david.warton at unsw.edu.au>
*Subject:* mvabund: difference between 'glm1path' and 'manyglm'
Hi David and list,
Can someone please help me understand why, when changing the
'method=manyglm' argument to 'method=glm1path' under default settings
(negative binomial), the estimates of theta change in the 'traitglm'
function?
I have provided example code below using the antTraits data set. You
should see that the plots for ft and ft3 are similar, yet ft2 is quite
different, so I think I am missing something (no doubt something very
obvious!).
Advice appreciated, thank you.
Jo
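library(mvabund)  # provides traitglm() and the antTraits data
data(antTraits)
# ft: plain GLM fit (manyglm)
ft=traitglm(antTraits$abund,antTraits$env,antTraits$traits,method="manyglm")
ft$phi
ft$theta
qqnorm(residuals(ft)); abline(c(0,1),col="red")
# ft2: LASSO fit (glm1path) with overdispersion estimated along the lambda path
ft2=traitglm(antTraits$abund,antTraits$env,antTraits$traits,method="glm1path")
mean(ft2$phis)
qqnorm(residuals(ft2)); abline(c(0,1),col="red")
# ft3: LASSO fit with theta fixed in advance through the family argument
ft3=traitglm(antTraits$abund,antTraits$env,antTraits$traits,method="glm1path",
family=negative.binomial(theta=1.641763))
1/mean(ft3$phis)
qqnorm(residuals(ft3)); abline(c(0,1),col="red")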
--
Kind regards,
Joanne Potts