
Predictions from zero-inflated or hurdle models

Hi Jarrod,

Thanks for the extensive reply! This helps a lot, though it sounds like it
was hubristic of me to attempt this myself.
I tried the approach you mapped out in the function gist
<https://gist.github.com/rubenarslan/aeacdd306b3d061819a6> I posted. I
simply put the pred function in a loop so that I wouldn't make any
mistakes while vectorising, since I don't care about performance at this
point.

Of course, I have some follow-up questions. I'm sorry if I'm being a
pain; I really appreciate the free advice, and of course I understand if
other things take precedence.

1. You're not "down with" developing publicly, are you? Because I sure would
like to test-drive the newdata prediction and simulation functions.

2. Could you make sure that I got this right: "When predictions are to be
taken after marginalising the random effects (including the `residual'
over-dispersion) it is not possible to obtain closed form expressions."
That is basically my scenario, right? In the example I included, I also had
a group-level random effect (family). Or are you talking about "trait"
as the random effect (as in your example), so that my scenario is different
and I cannot apply the numerical double-integration procedure you posted?
To be clear about my prediction goal without using language that I might be
using incorrectly: I want to show what the average effect in the response
unit, number of children, is in my population(s). I have data on whole
populations and am using all of it (except individuals that don't have
completed fertility yet, because I have yet to find a way to model both
zero-inflation and right censoring).
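To make sure I understand the integration you described, here is a sketch of
what I think the independent case looks like: with idh() the two parts don't
covary, so the marginal prediction should factorise into two univariate
integrals. Function and argument names here are my own invention (not
MCMCglmm's), and eta_pois / eta_zi stand for the linear predictors of the
Poisson and zero-inflation parts, with v_pois / v_zi the total variances
(random effect plus residual over-dispersion) being marginalised over:

```r
## My own sketch, not Jarrod's code: marginal E[y] for a zero-inflated
## Poisson model when the two parts are independent (idh).
marg_zipred <- function(eta_pois, eta_zi, v_pois, v_zi) {
  ## E[lambda], integrating the normal effect on the log scale
  e_lambda <- integrate(function(u)
    exp(eta_pois + u) * dnorm(u, 0, sqrt(v_pois)), -Inf, Inf)$value
  ## E[1 - p], integrating the normal effect on the logit scale
  e_nonzero <- integrate(function(u)
    (1 - plogis(eta_zi + u)) * dnorm(u, 0, sqrt(v_zi)), -Inf, Inf)$value
  e_lambda * e_nonzero
}
```

Is that factorisation the right picture, or am I still missing something?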

3. "Numerical integration could be extended to double integration in which
case covariance between the Poisson part and the binary part could be
handled." That is what you posted an example of and it applies to my
scenario, because I specified a prior R=list(V=diag(2), nu=1.002, fix=2)
and rcov=~idh(trait):units, random=~idh(trait):idParents?
But this double integration approach is something you just wrote
off-the-cuff and I probably shouldn't use it in a publication? Or is this
in the forthcoming MCMCglmm release and I might actually be able to refer
to it once I get to submitting?
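In case it helps pin down what I mean: here is how I currently picture the
double integration, done by simple Monte Carlo rather than quadrature (just
because it's easier for me to write down). V would be the 2x2 covariance
matrix of the Poisson-part and zero-inflation-part effects; the mvtnorm
package is assumed, and all names are illustrative, not from your code:

```r
## My own sketch of double integration when the two parts covary,
## approximated by Monte Carlo draws from the bivariate normal.
library(mvtnorm)
marg_zipred2 <- function(eta_pois, eta_zi, V, n = 1e5) {
  u <- rmvnorm(n, sigma = V)         # joint draws of the two effects
  lambda <- exp(eta_pois + u[, 1])   # Poisson mean per draw
  p_zero <- plogis(eta_zi + u[, 2])  # zero-inflation probability per draw
  mean((1 - p_zero) * lambda)        # marginal E[y] over both effects
}
```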

4. Could I change my model specification to forbid covariance between the
two parts and not shoot myself in the foot? Would this allow for a more
valid/tested approach to prediction?
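For concreteness, here is roughly the specification I have in mind for the
no-covariance version; as I understand it, idh() already fixes the
between-part covariance to zero, whereas us() would estimate it. This is an
untested sketch with my own placeholder variable names (children, predictor,
idParents, mydata), so please correct me if the constraint doesn't do what I
think it does:

```r
## My own sketch of a specification forbidding covariance between the
## Poisson and zero-inflation parts via idh() (us() would estimate it).
m_indep <- MCMCglmm(children ~ trait - 1 + trait:predictor,
                    random = ~idh(trait):idParents,
                    rcov   = ~idh(trait):units,
                    family = "zipoisson",
                    prior  = list(R = list(V = diag(2), nu = 1.002, fix = 2),
                                  G = list(G1 = list(V = diag(2), nu = 1.002))),
                    data   = mydata)
```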

5. When I use your method on my real data, I get less variation around the
prediction for the reference level than for all other factor levels.
My reference level actually has fewer cases than the others, so this isn't
"right" in a way.
Is this because I'm not doing newdata prediction? I get the "right"-looking
uncertainty if I bootstrap newdata predictions in lme4.
Sorry if this is children's logic :-)
Here's an image of the prediction
<http://rpubs.com/rubenarslan/mcmcglmm_pred> and the raw data
<http://rpubs.com/rubenarslan/raw>.

Many thanks for any answers that you feel inclined to give.

Best wishes,

Ruben
On Tue, Mar 17, 2015 at 5:31 PM Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote: