
bam model selection with 3 million data

Hi David,

I wrote that package, so yes, I am very familiar with it :) :)

buildbam is based on the explained deviance. A term is included if the explained deviance with the term is higher than the explained deviance without it. This means that buildbam will favor models that contain (too?) many effects: an effect that models only noise, but happens to explain perhaps 0.0001% more deviance by chance, will still be included.

To be honest with you, I have come to believe that the buildbam function was a mistake in the first place. I had coded buildgam already and buildbam was an easy extension, and it was only later that I realized that bam using PQL meant that only the explained deviance would be a valid model-comparison criterion. But note that the explained deviance is not actually a _formal_ criterion, like the likelihood-ratio test is (which uses the well-known result that differences in log-likelihood are asymptotically chi-square-distributed) or like AIC or BIC (which are based on information theory and on Bayesian inference, respectively). I have wanted to remove buildbam for a long time, but that felt wrong: now that it's out there, people may be using it, and given the documented caveat that it can only use the explained deviance, they may actually have a use case for it. Your message makes me think very seriously about officially deprecating this function in buildmer's next release, or at least issuing a warning that explained deviance is not a formal criterion and will favor overfitted models. In fact, I should deprecate the explained-deviance criterion itself as well, since I had only put it in for buildbam's use.
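To make the overfitting point concrete, here is a minimal sketch in base R with simulated data. It is not buildmer's actual code: it uses ordinary unpenalized glm fits as a stand-in, and the variable names (x, noise, dev_expl) are made up for illustration. The predictor `noise` is unrelated to the response by construction, yet adding it can never lower the explained deviance of the nested fit, so a rule of the form "include the term if explained deviance goes up" will essentially always accept it.

```r
# Simulated data: y depends on x only; `noise` is unrelated to y by construction.
set.seed(1)
n <- 1000
d <- data.frame(x = rnorm(n), noise = rnorm(n))
d$y <- 1 + 0.5 * d$x + rnorm(n)

# Proportion of the null deviance explained by a fitted glm.
dev_expl <- function(m) 1 - deviance(m) / m$null.deviance

m0 <- glm(y ~ x, data = d)          # without the noise term
m1 <- glm(y ~ x + noise, data = d)  # with the noise term

# For nested unpenalized fits the gain is never negative,
# even though `noise` carries no real information about y.
gain <- dev_expl(m1) - dev_expl(m0)
print(c(without = dev_expl(m0), with = dev_expl(m1), gain = gain))
```

A formal criterion such as the likelihood-ratio test, AIC or BIC would weigh that tiny gain against the extra parameter; the raw explained deviance does not.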

If buildbam and bam(select=TRUE) conflict, I would definitely trust bam(select=TRUE) over buildbam. mgcv's automatic smoothness selection is derived from first principles: it is a very straightforward extension of the approach that is used to fit the smooth terms anyway. The explained deviance, by contrast, is completely ad hoc and has no formal justification. I'll just bite the bullet and deprecate buildbam, advising people to use buildgam or select=TRUE instead.
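For reference, a minimal sketch of the select=TRUE route on simulated data. The data and variable names are invented for illustration, and the sample size is kept small so that it runs quickly; it assumes only that mgcv is installed.

```r
library(mgcv)

# Simulated data: y is a smooth function of x; `noise` is unrelated to y.
set.seed(1)
n <- 2000
d <- data.frame(x = runif(n), noise = runif(n))
d$y <- sin(2 * pi * d$x) + rnorm(n, sd = 0.3)

# select = TRUE adds an extra penalty to each smooth term,
# so that a term can be shrunk all the way out of the model.
m <- bam(y ~ s(x) + s(noise), data = d, select = TRUE)

# Effective degrees of freedom per smooth term.
edf <- summary(m)$s.table[, "edf"]
print(edf)
```

A smooth whose effective degrees of freedom end up close to zero has effectively been removed from the model, which is what one would hope to see for s(noise) here.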

Thanks for helping me bite that bullet,

Cesko

On 4-2-2020 at 10:51, David Villegas Ríos wrote: