Coefficients: (20 not defined because of singularities)

3 messages · Ott Toomet, Thomas Fischer, Brian Ripley

#
Hi,

"Singularity" in this case means that your X'X matrix is singular,
i.e. you have multicollinearity in your data.  A common reason is
selecting observations with a particular binary feature (e.g. only
women) and then including a control variable for the same feature
(e.g. a child*women cross effect).  You seem to be working with
continuous variables, so this may not be the case here.
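
As a minimal sketch of that situation (the data frame and the variable
names female, child and wage are made up for illustration, they are not
from the original data), selecting only the women makes the cross effect
identical to the child column, and lm() reports its coefficient as NA:

```r
# Hypothetical data: 'female' is a 0/1 indicator, 'child' a count,
# 'wage' the response. None of this comes from the poster's data set.
set.seed(1)
n <- 200
d <- data.frame(female = rbinom(n, 1, 0.5), child = rpois(n, 1))
d$wage <- 10 + 2 * d$child - d$female * d$child + rnorm(n)

# Keep only the women, then include the child*female cross effect anyway.
women <- d[d$female == 1, ]
fit <- lm(wage ~ child + I(child * female), data = women)

# Within the subset female is always 1, so child * female equals child.
# The model matrix is rank deficient and the redundant term is left undefined.
print(coef(fit))
```

summary(fit) on such a model prints the same "not defined because of
singularities" note as in the subject line.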

One way to check for collinearity is with condition numbers (see
kappa() in R).  First, build the model matrix (you can use
model.matrix(), but if you have only plain variables and no special
effects, cbind() will do).  Then take the first column of the matrix
on its own and calculate its condition number (this is always 1).
Now add the second column and calculate again, and so on.  Print the
condition numbers against the number of columns used.  You should see
where the number explodes: the column added at that step is collinear
with some of the previous ones.
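
The procedure might be sketched as follows.  The predictors x1 to x4 are
hypothetical, with x3 deliberately built as a linear combination of x1
and x2, and the 1e8 cut-off used to flag an "explosion" is an arbitrary
assumption, not a rule:

```r
# Hypothetical predictors; x3 is an exact linear combination of x1 and x2.
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + 2 * x2
x4 <- rnorm(n)
X  <- cbind(x1, x2, x3, x4)

# Condition number of the first k columns, for k = 1, ..., ncol(X).
# exact = TRUE asks kappa() for the 2-norm value instead of its cheaper estimate.
cond <- sapply(seq_len(ncol(X)),
               function(k) kappa(X[, 1:k, drop = FALSE], exact = TRUE))
names(cond) <- colnames(X)
print(cond)

# First column at which the condition number passes the (assumed) cut-off.
suspect <- names(cond)[which(cond > 1e8)[1]]
print(suspect)
```

With real data, replace X by model.matrix() of your fit.  The jump only
tells you that the flagged column depends on earlier ones, not which
ones, and the answer depends on the column order.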

Perhaps this helps.

Ott

 | From: Thomas Fischer <th.fischer at gmx.net>
 | Date: Fri, 30 May 2003 10:44:55 +0200
 | 
 | Hello,
 | 
 | I am trying to run a linear regression analysis on my data set. For some 
 | reason most variables are removed due to singularities.
 | 
 | My linear regression looks this way (I am using only partial data, which 
 | is selected by flags):
 | 
 | fm<-lm(log(cplex6.time..sec..[flags]) ~ cplex6.cities[flags] + 
 | log(1/features.meanOver.frust[flags]) + 
 | log(1/features.meanOver.minDist[flags]) +

 | The summary of one of the removed coefficients looks like this:
 | 
 | > summary(features.spanOver.quart1SpanDist[flags])
 |    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 | 0.05584 0.05797 0.06366 0.06311 0.06674 0.07290
 | > summary(log(1/features.spanOver.quart1SpanDist[flags]))
 |    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 |   2.619   2.707   2.754   2.767   2.848   2.885
 | 
 | The summary of a coefficient that was kept looks this way:
 | 
 | > summary(features.quant25Over.minDist[flags])
 |     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 | 0.001030 0.001030 0.001030 0.001032 0.001030 0.001040
 | > summary(log(1/features.quant25Over.minDist[flags]))
 |    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 |   6.869   6.878   6.878   6.877   6.878   6.878
 | 
 | So, I don't see the difference. Why has the first coefficient been 
 | removed and the second one kept?
 | Please help me.
 | 
 | I'm using R 1.6.2 on a Linux x86 machine.
 | 
 | Greetings,
 | Thomas Fischer
#
Hello,

I am trying to run a linear regression analysis on my data set. For some 
reason most variables are removed due to singularities.

My linear regression looks this way (I am using only partial data, which 
is selected by flags):

fm<-lm(log(cplex6.time..sec..[flags]) ~ cplex6.cities[flags] + 
log(1/features.meanOver.frust[flags]) + 
log(1/features.meanOver.minDist[flags]) +
[...]
avg..steps.to.loc..Opt..norm..[flags] + NN.List.opt..tour.max.[flags])

Since I am using inversion and logarithms, I set all data to positive values 
before running lm():

cplex6.time..sec..[cplex6.time..sec..<=0.00001]=0.00001
features.meanOver.frust[features.meanOver.frust<=0.00001]=0.00001
features.meanOver.minDist[features.meanOver.minDist<=0.00001]=0.00001
[...]
features.varOver.varDist[features.varOver.varDist<=0.00001]=0.00001

Retrieving the summary of fm, I get the message that some coefficients 
have been removed.

[...]
Coefficients: (20 not defined because of singularities)
                                                Estimate Std. Error t value Pr(>|t|)
(Intercept)                                      87.2162    44.1148   1.977   0.0511 .
log(1/features.meanOver.frust[flags])            -2.5298     0.1515 -16.702   <2e-16 ***
log(1/features.meanOver.minDist[flags])         154.7170    11.3917  13.582   <2e-16 ***
log(1/features.meanOver.quant25Dist[flags])    -943.4625    71.3505 -13.223   <2e-16 ***
log(1/features.meanOver.quart1SpanDist[flags])  776.1049    60.0571  12.923   <2e-16 ***
log(1/features.meanOver.spanDist[flags])         -9.8069     0.1400 -70.038   <2e-16 ***
log(1/features.meanOver.varDist[flags])         -11.3211     0.6715 -16.859   <2e-16 ***
log(1/features.quant25Over.minDist[flags])      -46.9655     3.1438 -14.939   <2e-16 ***
avg..steps.to.loc..Opt..norm..[flags]             0.8324     1.0919   0.762   0.4478
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
[...]


The summary of one of the removed coefficients looks like this:

> summary(features.spanOver.quart1SpanDist[flags])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.05584 0.05797 0.06366 0.06311 0.06674 0.07290
> summary(log(1/features.spanOver.quart1SpanDist[flags]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.619   2.707   2.754   2.767   2.848   2.885

The summary of a coefficient that was kept looks this way:

> summary(features.quant25Over.minDist[flags])
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.001030 0.001030 0.001030 0.001032 0.001030 0.001040
> summary(log(1/features.quant25Over.minDist[flags]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  6.869   6.878   6.878   6.877   6.878   6.878

So, I don't see the difference. Why has the first coefficient been 
removed and the second one kept?
Please help me.

I'm using R 1.6.2 on a Linux x86 machine.

Greetings,
Thomas Fischer
#
It is the model matrix which is singular, *not* the variable.  You are 
trying to fit a collinear model.

Use alias() to see what is going on.
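
A minimal sketch of what alias() reports, using made-up data rather than
the poster's (x1, x2, x3 and y are hypothetical, and x3 is constructed to
be collinear with x1 and x2):

```r
# Hypothetical data: x3 is an exact linear combination of x1 and x2.
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$x3 <- d$x1 + 2 * d$x2
d$y  <- 1 + d$x1 - d$x2 + rnorm(50)

fit <- lm(y ~ x1 + x2 + x3, data = d)

# The coefficient of x3 is NA: it is "not defined because of singularities".
print(coef(fit))

# alias() shows how each undefined term depends on the terms that were kept.
a <- alias(fit)
print(a$Complete)
```

Each row of the Complete component is one undefined term, and the
entries in that row express it as a linear combination of the terms that
were kept.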
On Fri, 30 May 2003, Thomas Fischer wrote:

            
No, that they are not defined, as it says.
That's the summary of the variable, not the coefficient.