Questions on Penalization and Prior Variance in bglmer for data with Complete Separation

I couldn?t determine which method blme uses for penalization in bglmer.
Is it based on maximum likelihood, or does it use Markov-Chain Monte Carlo
(MCMC) methods?
From the package description: "Maximum a posteriori estimation for linear
and generalized linear mixed-effects models in a Bayesian setting,
implementing the methods of Chung, et al. (2013)
<doi:10.1007/s11336-013-9328-2>".

So it applies the priors you specify and then maximizes the posterior.
Additionally, is it acceptable to analyze the results of this model using
a frequentist approach?

Sort of? If your sample size is large then then the contributions of the
priors vanish and the algorithm will converge to the same place as maximum
likelihood (unless that fails to converge because the parameters are truly
at the boundary of the space). However, if you already have a well-defined
problem where the maximum likelihood estimate exists and is stable, you're
probably not going to be using blme. In that case, there really isn't a
coherent interpretation to the standard errors blme returns beyond a
measure of the curvature at the posterior mode, which corresponds to a
normal approximation. That generally isn't considered to be very
meaningful, because the posterior can be far from normal. For models where
lmer fails to converge, blme is intended to allow the researcher to rapidly
prototype solutions before fitting in an MCMC Bayesian setting where
posterior credible intervals can be obtained.
what information should I consider to choose an appropriate variance
value?

I suppose that depends on your goal. If you're just trying to
penalize/regularize a model, then using a weakly informative prior based on
the built-in scale of the logistic function is sufficient (the default). If
you actually have substantive information you want to incorporate, then
treat your posterior uncertainty from that as your prior uncertainty in a
new model. However, in my mind that would largely apply if there is a
meta-analysis you can leverage or if the data you are analyzing are part of
a larger dataset that has been partially analyzed.
What might be the cause of this issue?
Ind contains no repeated values, so the random effects can cause perfect
prediction. e.g.

random_intercepts <- ranef(m1)$Ind
cor(
  random_intercepts[match(Data$Ind,
as.integer(row.names(random_intercepts))),],
  Data$Response
)
[1] 0.9999599

You probably want to fit a glm instead for this example, which I believe
gives you the results you are expecting to see.

Cheers,
Vince

On Thu, Feb 6, 2025 at 8:50?PM Tom?s Manuel Chialina <tmanuelch at gmail.com>
wrote:
Hi everyone,
I'm writing because I have a couple of questions about analyzing data with
complete separation when fitting a Binomial GLMM. I've been following the
suggestions made by Ben Bolker in this StackOverflow post<
https://stats.stackexchange.com/questions/132677/binomial-glmm-with-a-categorical-variable-with-full-successes/132678#132678>
and in the glmmTMB FAQ<
https://bbolker.github.io/mixedmodels-misc/ecostats_chap.html>.
(You can generate data with the lines below)
set.seed(123)
# Number of observations per Trial
n_29 <- 50
n_30 <- 50
Data <- data.frame(
  Ind = 1:(n_29 + n_30),
  Trial = rep(c(29, 30), times = c(n_29, n_30)),
  Response = c(rep(1, n_29 * 0.76), rep(0, n_29 * 0.24),
                   rep(1, n_30 * 1), rep(0, n_30 * 0))
)
I've been using the bglmer alternative (blme package) mentioned there,
which involves including a prior on the fixed effects to penalize the
parameter estimates. The model looks like this:
m1 <- bglmer(Response ~ as.factor(Trial) + (1|Ind),
family = binomial,
fixef.prior = normal(cov = diag(8,2)),
data = Data)
(where 8 is the variance parameter and 2 the number of levels of the
factor Trial).
I have two main questions:
Penalization Method in bglmer:
I couldn?t determine which method blme uses for penalization in bglmer. Is
it based on maximum likelihood, or does it use Markov-Chain Monte Carlo
(MCMC) methods? I?m relatively new to Bayesian methods and have been
interpreting the bglmer results similarly to how I interpret GLMMs in a
frequentist framework (e.g., calculating p-values). However, I?m still
uncertain about the specific method applied by bglmer. The package
documentation doesn?t provide much detail on this aspect. Additionally, is
it acceptable to analyze the results of this model using a frequentist
approach?
Choosing the Prior Variance:
Regarding the specification of the prior in bglmer function (e.g.,
fixef.prior = normal(cov = diag(variance, n_fixed_effects)) ), what
information should I consider to choose an appropriate variance value?
Would it be reasonable to use as a reference the standard error of a
parameter estimate not affected by complete separation in the non-penalized
GLMM (e.g., using glmmTMB function)? For example, given the summary of the
following non-penalized model:
m1NonPenalized <- glmmTMB(Response ~ as.factor(Trial) +(1|Ind),
                family = binomial,
                data = DataM1E2)

summary(m1NonPenalized)
Conditional model:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)        1.153e+00  4.958e-01   2.325   0.0201 *
as.factor(Trial)30 3.338e+01  4.458e+06   0.000   1.0000
Here, we observe that Trial 30 is affected by separation as its standard
error is extremely large, whereas the intercept (Trial 29) is not. Would
the standard error of Trial 29  serve as a reference for choosing an
appropriate variance value (i.e., as the std.error is 0.49, a variance
value of 1 or higher would be reasonable)?
Additionally, I encountered an issue with the estimation of probabilities
in the model fitted with bglmer (m1). You can generate an histogram to
inspect the observed frequencies of data in order to follow the discussion
below:
hist <- Data %>%  ggplot(aes(x = as.factor(Response))) +
geom_bar(aes(y = after_stat(prop) * 100,group = Trial, fill = Trial),
colour = "black") +
  scale_y_continuous(limits = c(0,100), n.breaks = 10) +
  ylab("% of responses") +
  xlab("Response") +
  facet_grid(. ~ Trial)
While the observed probabilities of response were around 0.76  for Trial
29 and 1 for Trial 30, the model estimated probabilities of approximately
0.999 for both, with incredibly small standard errors (much smaller than
expected given the data variability):

Trial    prob         SE  df asymp.LCL asymp.UCL

 29    0.99985 2.1918e-04 Inf   0.99747   0.99999

 30    0.99998 3.3708e-05 Inf   0.99921   1.00000

I noticed that reducing the variance value from 8 to 1 inside fixef.prior
produced estimates closer to the observed values, as you can see below:

Trial  prob     SE  df asymp.LCL asymp.UCL

 29    0.828 0.0769 Inf     0.626     0.933

 30    0.973 0.0195 Inf     0.893     0.994

What might be the cause of this issue?
Thank you very much in advance for your time and help!
Cheers,
Tom?s

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Questions on Penalization and Prior Variance in bglmer for data with Complete Separation

Thread (2 messages)