Efficient mixed logistic regression with 500k individuals
The problem with random forests is that they don't respect the hierarchical structure of the data, which may or may not matter depending on the OP's goals. That's on top of the differences between random forests and logistic regression even in a non-hierarchical/multilevel context.

Also, I think the spurious/unstable-relationships bit requires some qualification. Yes, if you're looking at p-values, then with that much data you'll typically be able to detect trivially small effects. But the solution is then not to focus on p-values. (I'm not saying random forests and the like aren't useful -- quite the contrary. But the motivations given here are a bit of a red herring.)

Phillip
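[To make the p-value point concrete, here is a toy simulation; all numbers are invented for illustration. With N in the millions, an effect that is practically negligible still gets a vanishingly small p-value, so the estimated effect size, not significance, is what should guide interpretation.]

    # With very large N, even a trivial effect is "statistically significant".
    set.seed(1)
    n <- 2e6
    x <- rnorm(n)
    p <- plogis(-1 + 0.01 * x)   # true log-odds effect of 0.01: negligible
    y <- rbinom(n, 1, p)
    fit <- glm(y ~ x, family = binomial)
    summary(fit)$coefficients    # tiny p-value, but odds ratio exp(0.01) ~ 1.01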
On 26/12/20 7:14 am, sree datta wrote:
With such a large dataset, I would recommend exploring interactions among variables using ensemble methods such as Random Forests or Extreme Gradient Boosting (since you have a binary dependent variable). These models also help protect against the bias of finding a lot of spurious and unstable relationships (in both main effects and interaction effects) that can come with such a large N.

In terms of processing efficiency, have you tried the *parallel* package in R? I would also suggest the *foreach* and *doParallel* packages to improve processing speed. For a more detailed description of parallelism implemented in R, see this article (a good summary of packages): https://www.jigsawacademy.com/handling-big-data-using-r/
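[For what it's worth, the basic foreach/doParallel pattern looks like this: a minimal sketch with toy data, where the bootstrap task inside the loop is just a placeholder for whatever you want to parallelise.]

    library(doParallel)                # also attaches foreach and parallel

    # toy data standing in for the real data set
    dat <- data.frame(x = rnorm(1e4))
    dat$y <- rbinom(1e4, 1, plogis(dat$x))

    cl <- makeCluster(parallel::detectCores() - 1)
    registerDoParallel(cl)

    # e.g. refit a model on bootstrap resamples, one resample per task
    boot_coefs <- foreach(i = 1:100, .combine = rbind) %dopar% {
      idx <- sample(nrow(dat), replace = TRUE)
      coef(glm(y ~ x, family = binomial, data = dat[idx, ]))
    }

    stopCluster(cl)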
On Wed, Dec 23, 2020 at 8:20 PM Mitchell Maltenfort <mmalten at gmail.com> wrote:

Here's a fun one for you (I hope). I'm mucking about with a logistic regression that may have about 30 million records for half a million individuals. Yes, I have a large-RAM machine: 64 GB. And I've used nAGQ = 0 and the other recommendations from http://angrystatistician.blogspot.com/2015/10/mixed-models-in-r-bigger-faster-stronger.html?m=1 which should be reasonable for data this large. It works, but I'd still be interested in tweaks to improve speed or accuracy. Any ideas?

--
Sent from Gmail Mobile
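[For reference, the kind of glmer() call those recommendations amount to looks roughly like this. A sketch only: the formula and data are placeholders, and the nloptwrap/calc.derivs settings are common lme4 speed tweaks rather than anything from the OP's actual code.]

    library(lme4)

    # nAGQ = 0 uses a cheaper approximation (fixed effects are estimated
    # inside the penalized least-squares step): faster, though slightly
    # less accurate than the default nAGQ = 1.
    fit <- glmer(
      outcome ~ predictor + (1 | individual),    # placeholder formula
      data = dat, family = binomial, nAGQ = 0,   # 'dat': your 30M-row data
      control = glmerControl(optimizer = "nloptwrap",  # lme4's nloptr wrapper
                             calc.derivs = FALSE)      # skip slow derivative check
    )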
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models