Hi, Okay, I have an n x p matrix X, which I know is not full rank. In particular, there may be linear dependencies among the columns (but not that many). What is a fast way, in R, of finding a linearly independent subset of the columns of X that spans the column space of X? If it helps, I have the QR decomposition of the original X 'for free'. I know that it's possible to do this directly by looping over the columns and adding them one at a time, but at the very least, a solution without horribly slow loops would be nice. Any ideas welcome. Zhou Fang
Finding a basis in a set of vectors
4 messages · Zhou Fang, Peter Dalgaard
Zhou Fang wrote:
Hi, Okay, I have a n x p matrix X, which I know is not full rank. In particular, there may be linear dependencies amongst the columns (but not that many). What is a fast way of finding a linearly independent subset of the columns of X that will span the column space of X, in R? If it helps, I have the QR decomposition of the original X 'for free'. I know that it's possible to do this directly by looping over the columns and adding them, but at the very least, a solution without horrible slow loops would be nice.
Have a look at stats:::Thin.col(), but beware that it isn't terribly robust.
Any ideas welcome. Zhou Fang
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark. Ph: (+45) 35327918, FAX: (+45) 35327907 (p.dalgaard at biostat.ku.dk)
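For readers who would rather not call an unexported function, the same idea can be sketched with the documented qr() interface alone. This is only an illustration, not necessarily what stats:::Thin.col() does internally. It assumes the QR object was produced by qr() with the default LAPACK = FALSE (the LAPACK variant does not report a reduced rank), and the helper name independent_cols is made up for this example.

```r
# Hypothetical helper: indices of a linearly independent subset of columns.
# Accepts either a matrix or an existing "qr" object from qr(X), so a
# decomposition that is already available can be reused.
independent_cols <- function(X, tol = 1e-7) {
  qrX <- if (inherits(X, "qr")) X else qr(X, tol = tol)
  # qr() moves columns it judges dependent to the end of the pivot vector;
  # the first 'rank' pivots index the columns that were kept.
  sort(qrX$pivot[seq_len(qrX$rank)])
}

# Small made-up example: columns 5 and 6 are exact combinations of others.
set.seed(1)
X <- matrix(rnorm(20 * 4), 20, 4)
X <- cbind(X, X[, 1] + X[, 2], 2 * X[, 3])

keep <- independent_cols(qr(X))       # reuse the QR, no loop over columns
B <- X[, keep, drop = FALSE]          # basis for the column space of X
```

A quick sanity check is that qr(B)$rank equals ncol(B), and that qr(cbind(B, X))$rank does not grow beyond it, which means the remaining columns add nothing to the span.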
Ah ha, that does work. What do you mean it isn't robust, though? I mean, obviously linear dependency structures in general are not stable under small perturbations...? Or is it that it's platform dependent? Zhou
On Fri, Feb 6, 2009 at 2:28 PM, Peter Dalgaard <P.Dalgaard at biostat.ku.dk> wrote:
Have a look at stats:::Thin.col(), but beware that it isn't terribly robust.
Zhou Fang wrote:
Ah ha, that does work. What do you mean it isn't robust, though? I mean, obviously linear dependency structures in general are not stable under small perturbations...? Or is it that it's platform dependent?
The former. In particular there is an issue with columns that have all entries near-zero. There were a couple of gotchas in its main application of thinning projection matrices within anova.mlm (the 2.8.1 version has a zapsmall() kludge for that reason). (This can of course also make things platform dependent, since roundoff accumulates differently depending on compilers and such.)
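The near-zero column problem can be reproduced with a small made-up matrix. The sketch below is an illustration of the effect under the default qr() tolerance, not the actual anova.mlm code; the data and the 1e-15 scale are assumptions chosen to mimic roundoff noise. Because the rank test in qr() is relative to each column's own norm, a column of pure noise is not flagged as dependent, whereas zapping it to exact zeros first lets qr() drop it.

```r
set.seed(2)
# Two genuine columns plus one column that is "zero up to roundoff".
X <- cbind(rnorm(10), rnorm(10), 1e-15 * rnorm(10))

rank_raw    <- qr(X)$rank             # noise column counted as independent
rank_zapped <- qr(zapsmall(X))$rank   # noise column rounded to 0, then dropped

# Columns kept after zapping small entries
qz <- qr(zapsmall(X))
keep <- sort(qz$pivot[seq_len(qz$rank)])
```

Whether zapsmall() is the right fix depends on the scale of the data: it rounds relative to the largest entry of the whole matrix, so a legitimately small column would be wiped out as well.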