Which columns give rise to linear dependency?
Dear Michael,

There are several ways of finding near dependencies. For example, Belsley, Kuh, and Welsch, in Regression Diagnostics (1980), use the singular-value decomposition. Here are a couple of simple approaches:

(1) Use a principal-component analysis of the standardized X-matrix. Very small component variances correspond to near collinearities, and the corresponding principal-component coefficients give you linear combinations of the standardized x's that are nearly equal to 0.

(2) Look at the variance-inflation factors. Very large VIFs correspond to variables that are nearly linearly dependent on others; regress each such variable on the others to see what the dependencies are. (Some of these regressions will be redundant.)

I hope that this helps,
John
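A minimal sketch of approaches (1) and (2) in R, on simulated data rather than the datasets discussed in this thread. The matrix X, the variables x1 to x4, and the planted near dependency x4 ~ x1 + x2 are all assumptions made for illustration. The VIFs are taken as the diagonal of the inverse correlation matrix, which requires that the dependency be near, not exact (otherwise solve() fails).

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
x4 <- x1 + x2 + rnorm(n, sd = 0.01)   # planted near dependency (illustrative)
X  <- cbind(x1, x2, x3, x4)

# (1) Principal components of the standardized columns.
pc <- prcomp(X, center = TRUE, scale. = TRUE)
comp_var <- pc$sdev^2                 # component variances, largest first
print(comp_var)
# Coefficients of the smallest-variance component: the linear combination
# of the standardized x's that is nearly 0.
print(round(pc$rotation[, ncol(X)], 3))

# (2) Variance-inflation factors: diagonal of the inverse correlation matrix.
vif <- diag(solve(cor(X)))
names(vif) <- colnames(X)
print(round(vif, 1))

# Regress a high-VIF variable on the others to see the dependency.
print(round(coef(lm(x4 ~ x1 + x2 + x3)), 3))
```

In this sketch the last component variance is close to zero, its coefficients are large for x1, x2, and x4 and near zero for x3, and the VIFs for x1, x2, and x4 are very large while the VIF for x3 stays near 1.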
At 12:24 PM 11/5/2002 +0000, Michael Dewey wrote:
Short version
If I have a data frame X and I suspect
that there is a dependency between
the columns, how do I confirm that,
and how do I tell which subset of columns
is involved?
==================================
Long version
A colleague had been trying to use
the SPSS RELIABILITY procedure.
It told her that the determinant of the
matrix was small. She asked me what that meant
and I told her that one of her variables was a
linear combination of others.
I agreed to investigate further and imported
the datasets into R. The rows of each X represent
people, and the columns items. The x_{ij} are binary (coded
0/1). Three of the datasets gave the
error message from SPSS. I confirmed that
the matrix involved was indeed var(X)
and that det(var(X)) agreed with SPSS.
What I thought was that I would find
that the smallest eigenvalues would
be zero, but in two of the datasets that was not true.
In the third dataset I traced the problem quickly
to a pair of items which were
perfectly correlated.
1 I suspect that det(var(X)) is a poor test of
whether X is of reduced rank. I have also looked at kappa(X)
which gives values of 10 and 17 for the two offending scales,
but I have no feel for whether that is high (bad?).
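A small sketch bearing on point 1, using simulated independent 0/1 items (not the datasets from this thread; n, p, and X are assumed values). The determinant is the product of the eigenvalues, so it shrinks with the number of columns and with their scale: each 0/1 item has variance of at most about 0.25, so with 20 items det(var(X)) is tiny even when var(X) has full rank. The rank from qr() and the condition number are not affected in this way.

```r
set.seed(2)
n <- 500
p <- 20
X <- matrix(rbinom(n * p, 1, 0.5), n, p)   # independent binary items
S <- var(X)

d  <- det(S)                   # very small, although nothing is dependent
rk <- qr(S)$rank               # numerical rank of the covariance matrix
k  <- kappa(S, exact = TRUE)   # 2-norm condition number

print(d)
print(rk)
print(k)
```

Here the determinant is many orders of magnitude below 1 while the rank equals the number of columns and the condition number is small, which suggests comparing eigenvalues (or the rank) rather than relying on the determinant alone.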
2 I thought that by doing svd(X) and then
examining V I could answer my problem.
However the elements of V, specifically
the last column, did not show what I
hoped: most values effectively
zero and the rest adding to zero.
This did work for the third dataset though.
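One possible explanation for point 2, offered as an assumption and not as a diagnosis of these particular datasets: svd(X) on the raw matrix finds combinations of columns that equal zero, whereas var(X) is singular whenever a combination of columns is constant. Centering the columns first makes the two agree. The sketch below uses simulated 0/1 items with a hypothetical complementary pair (d = 1 - a), so that a + d is constant but not zero.

```r
set.seed(3)
n  <- 100
a  <- rbinom(n, 1, 0.5)
b  <- rbinom(n, 1, 0.5)
c3 <- rbinom(n, 1, 0.5)
d  <- 1 - a                      # a + d = 1 for every row (illustrative)
X  <- cbind(a, b, c3, d)

raw_min <- min(svd(X)$d)         # raw X: smallest singular value is not near 0

Xc <- scale(X, center = TRUE, scale = FALSE)   # subtract column means
s  <- svd(Xc)
cen_min <- min(s$d)              # centered X: smallest singular value is ~0

# Right singular vector for the smallest singular value, rescaled so that
# its largest entry is 1; nonzero entries mark the columns involved.
v  <- s$v[, which.min(s$d)]
vn <- v / v[which.max(abs(v))]
names(vn) <- colnames(X)

print(raw_min)
print(cen_min)
print(round(vn, 3))
```

In this sketch the rescaled vector has equal nonzero entries for a and d and entries of essentially zero for b and c3, identifying the pair involved.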
3 I think that SPSS was trying to invert
var(X) in order to compute the multiple
correlation of each item with the others.
Is there any neat way of doing that in R?
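On point 3, one way to get the squared multiple correlation of each column with all the others in R is from the inverse of the correlation matrix: R^2 for column j is 1 - 1/r^jj, where r^jj is the j-th diagonal element of the inverse. The sketch below uses simulated data with hypothetical item names and checks one value against lm(); it assumes the correlation matrix is invertible, and uses functions from current versions of R.

```r
set.seed(4)
n <- 300
X <- matrix(rnorm(n * 4), n, 4)
X[, 4] <- X[, 1] + 0.5 * X[, 2] + rnorm(n)   # item4 depends on item1, item2
colnames(X) <- paste0("item", 1:4)

# Squared multiple correlation of each item with the remaining items.
smc <- 1 - 1 / diag(solve(cor(X)))
names(smc) <- colnames(X)
print(round(smc, 3))

# Check for the last item: R-squared from regressing it on the others.
r2 <- summary(lm(X[, 4] ~ X[, -4]))$r.squared
print(r2)
```

If solve() reports that the matrix is singular, at least one exact dependency is present and needs to be removed before these values can be computed.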
I am using 1.5.1 on Windows 98 if that makes
a difference.
If anyone wants to look at one of the datasets
I have her permission to make it available.
Point your browser at http://www.aghmed.fsnet.co.uk/r.html
Michael Dewey
michael.dewey at nottingham.ac.uk
http://www.nottingham.ac.uk/~mhzmd/home.html
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------