Dear list,
I've made a reproducible example of the zero-altered prediction,
in the hope that someone will have a look and reassure me that I'm going
about this the right way.
I was a bit confused by the point about p_i and n_i being correlated (they
are in my case), but I think this was a red herring for me
since I'm not deriving the variance analytically.
The script is here: https://gist.github.com/rubenarslan/aeacdd306b3d061819a6
and if you don't want to run the simulation fit yourself, I've put an RDS
file of the fit here:
https://dl.dropboxusercontent.com/u/1078620/m1.rds
Best regards,
Ruben Arslan
On Tue, Mar 10, 2015 at 1:22 PM Ruben Arslan <rubenarslan at gmail.com> wrote:
Dear Dr Bolker,
I'd thought about something like this, one point of asking was to
see whether
a) it's implemented already, because I'll probably make dumb mistakes while
trying b) it's not implemented because it's a bad idea.
Your response and the MCMCglmm course notes make me hope that it's c) not
implemented because nobody did yet or d) it's so simple that everybody does
it on-the-fly.
So I tried my hand and would appreciate corrections. I am sure there is
some screw-up or an inelegant approach in there.
I included code for dealing with mcmc.lists because that's what I have and
I'm not entirely sure how I deal with them is correct either.
I started with a zero-altered model, because those fit fastest and
according to the course notes have the least complex likelihood.
Because I know not what I do, I'm not dealing with my random effects at
all.
I pasted a model summary below to show what I've applied the below
function to. The function gives the following quantiles when applied to 19
chains of that model.
5% 50% 95%
5.431684178 5.561211207 5.690655200
5% 50% 95%
5.003974382 5.178192327 5.348246558
Warm regards,
Ruben Arslan
HPDpredict_za = function(object, predictor) {
if(class(object) != "MCMCglmm") {
if(length( object[[1]]$Residual$nrt )>1) {
object = lapply(object,FUN=function(x) { x$Residual$nrt<-2;x })
}
Sol = mcmc.list(lapply(object,FUN=function(x) { x$Sol}))
vars = colnames(Sol[[1]])
} else {
Sol = as.data.frame(object$Sol)
vars = names(Sol)
}
za_predictor = vars[ vars %ends_with% predictor & vars %begins_with%
"traitza_"]
za_intercept_name = vars[ ! vars %contains% ":" & vars %begins_with%
"traitza_"]
intercept = Sol[,"(Intercept)"]
za_intercept = Sol[, za_intercept_name]
l1 = Sol[, predictor ]
l2 = Sol[, za_predictor ]
if(is.list(object)) {
intercept = unlist(intercept)
za_intercept = unlist(za_intercept)
l1 = unlist(l1)
l2 = unlist(l2)
}
py_0 = dpois(0, exp(intercept + za_intercept))
y_ygt0 = exp(intercept)
at_intercept = (1-py_0) * y_ygt0
py_0 = dpois(0, exp(intercept + za_intercept + l2))
y_ygt0 = exp(intercept + l1)
at_predictor_1 = (1-py_0) * y_ygt0
print(qplot(at_intercept))
print(qplot(at_predictor_1))
df = data.frame("intercept" = at_intercept)
df[, predictor] = at_predictor_1
print(qplot(x=variable,
y=value,data=suppressMessages(melt(df)),fill=variable,alpha=I(0.40), geom =
'violin'))
print(quantile(at_intercept, probs = c(0.05,0.5,0.95)))
print(quantile(at_predictor_1, probs = c(0.05,0.5,0.95)))
invisible(df)
}
Iterations = 100001:299901
Thinning interval = 100
Sample size = 2000
DIC: 349094
G-structure: ~idh(trait):idParents
post.mean l-95% CI u-95% CI eff.samp
children.idParents 0.0189 0.0164 0.0214 1729
za_children.idParents 0.2392 0.2171 0.2622 1647
R-structure: ~idh(trait):units
post.mean l-95% CI u-95% CI eff.samp
children.units 0.144 0.139 0.148 1715
za_children.units 1.000 1.000 1.000 0
Location effects: children ~ trait * (maternalage.factor + paternalloss +
maternalloss + center(nr.siblings) + birth.cohort + urban + male +
paternalage.mean + paternalage.diff)
post.mean l-95% CI u-95% CI
eff.samp pMCMC
(Intercept) 2.088717 2.073009 2.103357
2000 <0.0005 ***
traitza_children -1.933491 -1.981945 -1.887863
2000 <0.0005 ***
maternalage.factor(14,20] 0.007709 -0.014238 0.027883
1500 0.460
maternalage.factor(35,50] 0.006350 -0.009634 0.024107
2000 0.462
paternallossTRUE 0.000797 -0.022716 0.025015
2000 0.925
maternallossTRUE -0.015542 -0.040240 0.009549
2000 0.226
center(nr.siblings) 0.005869 0.004302 0.007510
2000 <0.0005 ***
birth.cohort(1703,1722] -0.045487 -0.062240 -0.028965
2000 <0.0005 ***
birth.cohort(1722,1734] -0.055872 -0.072856 -0.036452
2000 <0.0005 ***
birth.cohort(1734,1743] -0.039770 -0.056580 -0.020907
2000 <0.0005 ***
birth.cohort(1743,1750] -0.030713 -0.048301 -0.012214
2000 0.002 **
urban -0.076748 -0.093240 -0.063002
2567 <0.0005 ***
male 0.106074 0.095705 0.115742
2000 <0.0005 ***
paternalage.mean -0.024119 -0.033133 -0.014444
2000 <0.0005 ***
paternalage.diff -0.018367 -0.032083 -0.005721
2000 0.007 **
traitza_children:maternalage.factor(14,20] -0.116510 -0.182432 -0.051978
1876 0.001 ***
traitza_children:maternalage.factor(35,50] -0.045196 -0.094485 0.002640
2000 0.075 .
traitza_children:paternallossTRUE -0.171957 -0.238218 -0.104820
2000 <0.0005 ***
traitza_children:maternallossTRUE -0.499539 -0.566825 -0.430637
2000 <0.0005 ***
traitza_children:center(nr.siblings) -0.023723 -0.028676 -0.018746
1848 <0.0005 ***
traitza_children:birth.cohort(1703,1722] -0.026012 -0.074250 0.026024
2000 0.319
traitza_children:birth.cohort(1722,1734] -0.279418 -0.329462 -0.227187
2000 <0.0005 ***
traitza_children:birth.cohort(1734,1743] -0.260165 -0.312659 -0.204462
2130 <0.0005 ***
traitza_children:birth.cohort(1743,1750] -0.481457 -0.534568 -0.426648
2000 <0.0005 ***
traitza_children:urban -0.604108 -0.645169 -0.562554
1702 <0.0005 ***
traitza_children:male -0.414988 -0.444589 -0.387005
2000 <0.0005 ***
traitza_children:paternalage.mean 0.006545 -0.018570 0.036227
2000 0.651
traitza_children:paternalage.diff -0.097982 -0.136302 -0.060677
2000 <0.0005 ***
On Mon, Mar 9, 2015 at 10:12 PM Ben Bolker <bbolker at gmail.com> wrote:
Ruben Arslan <rubenarslan at ...> writes:
Dear list,
I wanted to ask: Is there any (maybe just back of the envelope) way to
obtain a response prediction for zero-inflated or hurdle type models?
I've fit such models in MCMCglmm, but I don't work in ecology and my
previous experience with explaining such models to "my audience" did not
bode well. When it comes to humans, the researchers I presented to are
used to offspring count being zero-inflated (or acquainted with that
concept), but in my historical data with high infant mortality, it is
modern data it's actually slightly underdispersed).
Currently I'm using lme4 and simply splitting my models into two stages
(finding a mate and having offspring).
That's okay too, but in one population the effect of interest is not
clearly visible in either stage, only when both are taken together (but
then the outcome is zero-inflated).
I expect to be given a hard time for this and hence thought I'd use a
binomial model with the outcome offspring>0 as my main model, but that
turns out to be hard to explain too and doesn't
really do the data justice.
Basically I don't want to be forced to discuss my smallest population
non-replication of the effect because I was insufficiently able to
the statistics behind my reasoning that the effect shows.
I think the back-of-the envelope answer would be that for a two-stage
model with a prediction of p_i for the probability of having a non-zero
response (or in the case of zero-inflated models, the probability of
_not_ having a structural zero) and a prediction of n_i for the
conditional
part of the model, the mean predicted value is p_i*n_i and the
variance is _approximately_ (p_i*n_i)^2*(var(p_i)/p_i^2 + var(n_i)/n_i^2)
(this is assuming
that you haven't built in any correlation between p_i and n_i, which
would be hard in lme4 but _might_ be possible under certain circumstances
via a multitype model in MCMCglmm).
Does that help?