
lme vs. lmer

On Wed, Oct 7, 2009 at 6:03 AM, Raldo Kruger <raldo.kruger at gmail.com> wrote:
No, that is backwards.  If the anova says the two models are
different, keep the full, unrestricted model. The significance of the
difference indicates you lost "explanatory power" when you removed a
variable or recoded to remove a parameter.
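For concreteness, here is a sketch of that comparison using nlme::lme on the built-in Orthodont data (my example, not the original poster's). Fit both models with method = "ML", since a likelihood-ratio comparison of different fixed effects is not valid under the default REML:

```r
## Illustrative only: compare a full and a restricted mixed model with a
## likelihood-ratio test via anova().
library(nlme)

full       <- lme(distance ~ age + Sex, random = ~ 1 | Subject,
                  data = Orthodont, method = "ML")
restricted <- lme(distance ~ age, random = ~ 1 | Subject,
                  data = Orthodont, method = "ML")

## A significant difference means the restricted model lost explanatory
## power -- keep the full model.
anova(restricted, full)
```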


It's not a problem that "the program does not give you a p-value."
The problem is that the statistical theory is tenuous enough that
nobody is sure what the p-value ought to be.  The quasi model is based
on the idea that nobody knows the sampling distribution of the random
component very well, so the standard errors are, well, sorta
not-really-standard.  I think you'd probably want to calculate some
kind of robust standard error, but I've not done it with a
quasi-poisson model.

Instead of a quasi-poisson, you could look around for a mixed model
program that has a negative binomial option.  For negative binomial,
we have a more exact model of randomness, and the standard errors are
better understood.  lmer does not have nb, but I'm pretty sure I've
seen one somewhere.
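(An aside for later readers: lme4 did eventually gain a glmer.nb function, which was not there when this was written. A sketch under that assumption, with simulated count data so the example is self-contained:)

```r
## Hedged sketch: a negative-binomial mixed model via lme4's glmer.nb.
## The data are simulated -- ten groups, a random intercept per group.
library(lme4)

set.seed(42)
d <- data.frame(grp = factor(rep(1:10, each = 20)),
                x   = rnorm(200))
ranef_true <- rep(rnorm(10, sd = 0.4), each = 20)   # true group intercepts
d$y <- rnbinom(200, mu = exp(0.5 + 0.3 * d$x + ranef_true), size = 1.5)

fit <- glmer.nb(y ~ x + (1 | grp), data = d)
summary(fit)
```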

If you really believe you want a p-value, you can calculate one
yourself.  From the "summary" output, inspect it and you'll see there
are columns of b's and standard errors.  Divide away for yourself.
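A minimal illustration of that arithmetic, shown with glm() on a built-in dataset so it runs anywhere; the same division applies to a coefficient table pulled from a mixed-model summary:

```r
## Compute Wald z statistics and p-values by hand from a model's
## coefficient table: z = estimate / standard error.
fit <- glm(count ~ spray, data = InsectSprays, family = poisson)
tab <- coef(summary(fit))          # columns: Estimate, Std. Error, ...

z <- tab[, "Estimate"] / tab[, "Std. Error"]
p <- 2 * pnorm(-abs(z))            # two-sided normal (Wald) p-value
cbind(z = z, p = p)
```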

The problem of deciding which variable to drop is a hard one; it is
the same in ordinary regression.

I'm trained in a tradition that says you should choose variables by
theory, and not commit the sin of dropping variables just because they
are "not significant".  If there is any
multicollinearity, the dropping process may lead to mistaken
conclusions.  This is the flaw in so-called "stepwise" regression. You
are a "bonehead" if you let the model tell you which parameters to
include.  Models will lie to you.  Model pruning of that sort--the
search for "significant" estimates--produces bad t-tests and a lot of
silly articles getting published.  I've seen economists and political
scientists come out with survey articles arguing that just about
everything we publish is misleading or wrong because of the
model-pruning approach.

I've wondered if we could not work out a "regression tree" or "forest"
framework to choose which variables are needed in your glm.  I read a
lot about it, but concluded it did not exactly fit my need.  If you
are looking for some rigorous justification to include/exclude
variables, I think you have to look in that more exotic direction.  I
saw a beautiful presentation about the LASSO, which selects and
estimates at the same time and accounts for shrinkage as well.
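A sketch of LASSO selection with the glmnet package (my choice of tool; it is not named above). Coefficients shrunk exactly to zero are the excluded variables, and cv.glmnet chooses the penalty by cross-validation:

```r
## Illustration only: lasso (alpha = 1) on a built-in dataset.
library(glmnet)

x <- model.matrix(mpg ~ ., data = mtcars)[, -1]   # drop intercept column
y <- mtcars$mpg

cvfit <- cv.glmnet(x, y, alpha = 1)
coef(cvfit, s = "lambda.1se")   # sparse vector: zeros = dropped variables
```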

There's an article by Ed Leamer from AER with a title like "Let's take
the con out of econometrics."  It deals with the variable-selection
problem.  I *think* his suggested approach would be the one we call
Bayesian Model Averaging today.  If you fit a lot of models, dropping
variables in and out, then the final result should somehow summarize
the variety of estimates you observed.
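For later readers, one concrete implementation of that idea is the BMA package's bic.glm, which averages over subsets of regressors and reports a posterior inclusion probability for each variable. The package choice is my assumption, not something named above:

```r
## Hedged sketch: Bayesian Model Averaging over regressor subsets with
## the BMA package.
library(BMA)

out <- bic.glm(mpg ~ cyl + disp + hp + wt, data = mtcars,
               glm.family = gaussian())
summary(out)   # posterior inclusion probabilities per variable
```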

Sorry, this is preaching in the wrong context.  You don't really have
an r-sig-mixed problem here; you have a more general (somewhat religious)
question about regression modeling.

pj