I agree with Juho that there's a typo -- would be something like.
sum((y==1)*log(pred_prob) + (y==0)*log(1-pred_prob))
As far as what predict() does: depending on the value of re.form, it
either gives a population-level prediction (R=0) or a conditional
prediction (R set to its conditional mode). You might be looking for
*marginal* predictions (which Rizopoulous's GLMMadaptive package can do
...)
On 2019-04-20 3:42 a.m., Juho Kristian Ruohonen wrote:
Rolf: Forgive my ignorance, but isn't the relevant log-likelihood here
the log-likelihood of the observed responses in the validation set given
the model-predicted probabilities? I.e. sum(dbinom(VS$y, size = VS$n,
prob = predict(fit, newdata = VS, type = "response"), log = TRUE))? Even
this would be somewhat off because dbinom() isn't aware of the
random-effects integral business. But it looks to me like your current
call is calculating the log-sum of the predicted probabilities of y = 1
in the validation set, not the loglikelihood of the observed responses
in the validation set.
la 20. huhtik. 2019 klo 8.51 Rolf Turner (r.turner at auckland.ac.nz
<mailto:r.turner at auckland.ac.nz>) kirjoitti:
On 20/04/19 12:44 PM, Ben Bolker wrote:
> The GLMM log-likelihood includes an integral over
> the distribution of the random effects.
I was aware of this. I guess what I was na?vely expecting was that
predict.merMod() would handle this. I.e. that this predict method
(with type = "response") would return, for each observed y_i in the
(new) data set
Pr(Y = y_i) = \int_0 Pr(Y = y_i | R = r) f(r) dr
where R is the vector of random effects and f(r) is its probability
density function (multivariate normal, with mean 0 and some
matrix, which has been estimated in the fitting process.
I guess that this is *not* what predict.merMod() returns --- but I
don't
understand why not. It seems to me that this is what it "should"
return.
I'm probably misunderstanding something, possibly simple, possibly
subtle.
Apropos of nothing much, what *does* predict.merMod() return?
Maybe Pr(Y = y_i | R = 0) ???
<SNIP>
> Here is an **inefficient** method for computing the likelihood
>
> coefs <- unlist(getME(fit,c("theta","beta"))
> newdev <- update(fit, data=VS, devFunOnly=TRUE)
> newdev(coefs)
>
> This is slow because it has to reconstruct all of the
> matrices, do permutations to reorder the relevant matrices to be as
> sparse as possible, etc. etc.
Thanks for this. I'll give it a go. I think that the slowness may
be an overwhelming drawback. Anyhow I shall try to test it out.
Thanks again.
cheers,
Rolf
--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276