CORRECTION: Re: Multicollinearity with brglm?

4 messages · woodbomb, Ioannis Kosmidis

#
I'm running brglm to do binomial logistic regression.

The features that may be related to multicollinearity are:

(1) the k IVs are all binary categorical, coded as 0 or 1; 
(2) each row of the IVs contains exactly C (< k) 1's; (I think this is the
source of the problem)
(3) there are n * k unique rows, where n is as much as 10; 
(4) when brglm is run, at least 1 IV is reported as involving a singularity
and this occurs for nearly every choice of k, n.

How should I go about computing estimates for the offending IVs? I'm
interested primarily in the reliability of the parameter estimates.
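
A minimal sketch of the row-sum condition in (2), assuming k = 4 and C = 2
(illustrative values, not necessarily those of the original data). The IVs
are treated as numeric 0/1 columns rather than factors, and the names X, M
and idx are made up for the illustration. If every row contains exactly C
ones, the columns sum to C times the intercept column, so the design matrix
with an intercept cannot have full column rank:

```r
k <- 4   # number of binary IVs (assumed for illustration)
C <- 2   # number of 1's in every row (assumed for illustration)

# All rows with exactly C ones among k columns.
idx <- combn(k, C)
X <- t(apply(idx, 2, function(j) { r <- integer(k); r[j] <- 1L; r }))
colnames(X) <- paste0("X", seq_len(k))

# Design matrix with an intercept column.
M <- cbind(Intercept = 1, X)

rowSums(X)    # constant and equal to C
ncol(M)       # number of columns in the design matrix
qr(M)$rank    # numerical rank; smaller than ncol(M) signals aliasing
```

The gap between ncol(M) and the rank is the number of coefficients that a
fitting routine would have to report as not defined.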
#
Could you please include some code that demonstrates your problem?

Best wishes,

Ioannis
On Wednesday 01 April 2009 15:26:33 woodbomb wrote:
#
Ioannis,

Here's an illustrative example. Note that: glm also objects to X4; X1,..,X4
are defined as factors.

I've looked (albeit in a crude way) at various examples using the perturb
package and it seems to confirm that X4 is the source of multicollinearity.
As I say, I think the constant row-sum condition is the source of the
problem, but I'm not sure why or how to deal with it. 

Thanks for your interest (and for the finite parameter estimates brglm
provides)!
$names
[1] "X1" "X2" "X3" "X4"

$row.names
[1] "2" "3" "4" "5"

$class
[1] "data.frame"
X1 X2 X3 X4
2  0  1  0  1
3  0  1  1  0
4  1  0  0  1
5  1  0  1  0
$dim
[1] 4 2

$dimnames
$dimnames[[1]]
NULL


$dimnames[[2]]
[1] "s" "f"
s f
[1,] 3 7
[2,] 2 8
[3,] 5 5
[4,] 3 7
Call:
brglm(formula = cbind(s, f) ~ X1 + X2 + X3 + X4, family = binomial, 
    data = data)


Coefficients: (1 not defined because of singularities)

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4.5797  on 5  degrees of freedom
Residual deviance: 3.6469  on 2  degrees of freedom
Penalized deviance: -1.79616 
AIC:  26.793
Call:
glm(formula = cbind(s, f) ~ X1 + X2 + X3 + X4, family = binomial, 
    data = data)

Deviance Residuals: 
      1        2        3        4        5        6  
 0.7103  -1.0256   0.3445   0.3760  -1.1876   0.6072  

Coefficients: (1 not defined because of singularities)
              Estimate Std. Error  z value Pr(>|z|)
(Intercept) -1.356e+00  9.219e-01   -1.471    0.141
X11          2.445e-01  7.003e-01    0.349    0.727
X21          7.264e-01  7.048e-01    1.031    0.303
X31          6.316e-14  6.959e-01 9.08e-14    1.000
X41                 NA         NA       NA       NA

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5.0363  on 5  degrees of freedom
Residual deviance: 3.5957  on 2  degrees of freedom
AIC: 26.742

Number of Fisher Scoring iterations: 4
#
Thanks for your mail.  I guess that the constant row sum on X would create 
problems in a simulation framework because you might end up with linearly 
dependent columns or even with columns of zeros (which I believe do not make 
much sense).

First of all, I think there is a problem with your example below. In the X 
shown, X1 + X2 = 1 and X3 + X4 = 1 in every row, so two columns should be 
eliminated if a constant is included in the model, yet in 
summary(mod.simple.brglm) only one appears to be eliminated.

The reason for eliminating columns is merely to report a parameterization that 
is identifiable. 
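
As a rough check of the column count, here is a sketch that uses only the
four rows of X printed earlier in the thread, entered as numeric 0/1
columns (an assumption; the original post defines them as factors). The
names X and M are made up for the illustration:

```r
# The four rows of X as printed in the thread.
X <- data.frame(X1 = c(0, 0, 1, 1),
                X2 = c(1, 1, 0, 0),
                X3 = c(0, 1, 0, 1),
                X4 = c(1, 0, 1, 0))

# Design matrix including the constant.
M <- model.matrix(~ X1 + X2 + X3 + X4, data = X)

qr(M)$rank             # numerical rank of the design matrix
ncol(M) - qr(M)$rank   # number of columns that cannot be estimated
```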

For example, consider a single binomial variable with 
observed value 2 and total number of trials 10. Also, let's suppose that we 
are interested in the log-odds of success, beta1.  The 
estimated log-odds for this sample is

hat{beta1} = -1.386

so that the fitted probability is 0.2.

If another constant, say beta2, is introduced in the model, then there are 
infinitely many values that the vector (beta1, beta2) can take while still 
giving fitted probability 0.2 (for example (-1, -0.386) or 
(-10^8, 10^8 - 1.386)), and no choice is better than another.  So glm chooses 
to eliminate one of the two constants in order to get an identifiable 
parameterization, in which each fitted probability corresponds to one and 
only one value of beta1.
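
The example above can be sketched in R as follows. The data frame d and its
constant columns c1 and c2 are hypothetical names introduced only for this
illustration:

```r
# One binomial observation: 2 successes out of 10 trials.
s <- 2
f <- 8
beta1_hat <- qlogis(s / (s + f))   # log-odds, log(0.2 / 0.8)
beta1_hat

# Any pair (beta1, beta2) with the same sum gives the same fitted probability.
candidates <- rbind(c(-1,   beta1_hat + 1),
                    c(-1e8, 1e8 + beta1_hat))
fitted_prob <- plogis(rowSums(candidates))
fitted_prob

# Two identical constant columns: only their sum is identifiable,
# so glm reports one of the two coefficients as NA.
d <- data.frame(s = s, f = f, c1 = 1, c2 = 1)
fit <- glm(cbind(s, f) ~ 0 + c1 + c2, family = binomial, data = d)
coef(fit)
```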

I hope this helps.

Best wishes,

Ioannis
On Thursday 02 April 2009 12:43:37 woodbomb wrote: