Hi folks,

I know this is a fairly common question, and I am really disappointed that I could not find a solution. I am trying to calculate Mahalanobis distances in a data frame where I have several hundred groups and several hundred variables.

Whatever I do, and however I subset the data, I get the "system is computationally singular: reciprocal condition number" error. I know what it means and I know what the problem should be, but there is no way this is a singular matrix.

I have uploaded the input file to my ftp:
http://mkk.szie.hu/dep/talt/lv/CentInpDuplNoHeader.txt
It is a tab-delimited txt file with no headers.

I tried the StatMatch Mahalanobis function and also this function:

    mahal_dist <- function (data, nclass, nvariable) {
      dist <- matrix(0, nclass, nclass)
      n = 0
      w <- cov(data)
      print(w)
      for (i in 1:nclass) {
        for (c in 1:nclass) {
          diffl <- vector(length = nvariable)
          for (l in 1:nvariable) {
            diffl[l] = abs(data[i,l] - data[c,l])
          }
          ### matrixes
          print(diffl)
          dist[i,c] = (t(diffl)) %*% (solve(w)) %*% (diffl)
        }
        n = n + 1
        print(n)
      }
      return(dist)
      sqrt_dist <- sqrt(dist)
      print(sqrt_dist)
    }

I have a deadline for this project (not homework :)), and I have always been able to use this code, so I thought I would be able to finish the calculations quickly, but now I am just lost.

I would really appreciate any help.

Thanks for any help

--
View this message in context: http://r.789695.n4.nabble.com/system-is-computationally-singular-reciprocal-condition-number-tp4647472.html
Sent from the R help mailing list archive at Nabble.com.
system is computationally singular: reciprocal condition number
8 messages · langvince, Bert Gunter, David Winsemius +1 more
1. I don't know what StatMatch is. Try using stats::mahalanobis.

2. It's the covariance matrix that is **numerically** singular and can't be inverted. Why do you claim that there's "no way" this could be true when there are hundreds of variables (= dimensions)?

3. Try calculating the svd of your matrix and see what you get, if you haven't already done so.

Cheers,
Bert
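For readers following along, here is a minimal sketch of the checks suggested above. It runs on a small synthetic matrix, not on the poster's file: the matrix `x` and the linear dependency built into it are assumptions made purely for illustration.

```r
# Synthetic stand-in for the real data: 20 rows, 5 columns, where
# column 5 is an exact linear combination of columns 1 and 2.
set.seed(1)
x <- matrix(rnorm(20 * 4), nrow = 20)
x <- cbind(x, x[, 1] + 2 * x[, 2])

w <- cov(x)

# Singular values of the covariance matrix, scaled by the largest one.
# A value near machine precision signals numerical singularity.
sv <- svd(w)$d
print(sv / sv[1])

# Numerical rank as judged by the QR decomposition.
print(qr(w)$rank)

# Estimated reciprocal condition number, the quantity that solve()
# mentions in its error message.
print(rcond(w))

# solve() is expected to fail on a matrix like this; capture the
# message instead of stopping the script.
res <- tryCatch(solve(w), error = function(e) conditionMessage(e))
print(res)
```

If the smallest scaled singular value is many orders of magnitude below the others, the covariance matrix cannot be inverted reliably, whichever Mahalanobis function is used.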
On Thu, Oct 25, 2012 at 4:14 PM, langvince <langv at purdue.edu> wrote:
[...]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Hey Bert, thanks for your fast reply. Yes, based on svd it is singular. The "no way" statement was because of the source of the dataset; I would not expect that. I had never used the stats Mahalanobis distance calculation, but after giving it a shot, no surprise, it is still singular. Any idea how to manipulate the data so that it runs, or any other idea to solve the problem? Thanks
On Oct 25, 2012, at 4:41 PM, Bert Gunter wrote:
> 3. Try calculating the svd of your matrix and see what you get if you haven't already done so.
This was crossposted to StackOverflow where Josh O'Brien has responded that his code using svd() shows the matrix to be highly collinear. This is the upper left corner of the correlation matrix:
           V1         V2         V3          V4         V5
V1 1.00000000 0.97250825 0.93390424 0.918813118 0.89705917
V2 0.97250825 1.00000000 0.97118079 0.954020724 0.93992361
V3 0.93390424 0.97118079 1.00000000 0.991508026 0.97602188
V4 0.91881312 0.95402072 0.99150803 1.000000000 0.98837387
V5 0.89705917 0.93992361 0.97602188 0.988373865 1.00000000
    length( which(cor(mat)==1) )
    [1] 374

Just looking at it should give a good idea why. I can see bands of columns that are identically zero.
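A rough sketch of how one might locate such columns programmatically. The data here are synthetic (a made-up `mat` with one all-zero column and one duplicated column), and the 0.99 cutoff is an arbitrary assumption, not something taken from the thread.

```r
# Made-up data: 4 random columns, plus an all-zero column (V5)
# and an exact copy of V1 (V6).
set.seed(2)
mat <- matrix(round(rnorm(30 * 4), 2), nrow = 30)
mat <- cbind(mat, 0, mat[, 1])
colnames(mat) <- paste0("V", seq_len(ncol(mat)))

# Columns with zero variance, e.g. identically zero. cor() returns NA
# for these, so they are set aside before computing correlations.
zero_var <- which(apply(mat, 2, var) == 0)
print(zero_var)

# Correlations among the remaining columns. Only the lower triangle is
# kept so that each pair of columns is reported once.
keep <- setdiff(seq_len(ncol(mat)), zero_var)
r <- cor(mat[, keep])
r[upper.tri(r, diag = TRUE)] <- NA

# Pairs whose absolute correlation exceeds the (arbitrary) cutoff.
hits <- which(abs(r) > 0.99, arr.ind = TRUE)
print(data.frame(a = rownames(r)[hits[, "row"]],
                 b = colnames(r)[hits[, "col"]],
                 r = r[hits]))
```

On the real data the same two steps would list the zero bands and the near-duplicate columns that make the covariance matrix rank deficient.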
david.

--
David Winsemius, MD
Alameda, CA, USA
Hey David, my answers are delayed here, although I am not using my gmail email address :) Yep, that's right: those bands of zeros are among the most important values for defining one group, and they give it a nice distance from the rest of the groups :). I cannot really get rid of them, and I bet it would not help if I changed all of them to a really small (but identical) value. Thanks
On Fri, Oct 26, 2012 at 12:14 PM, langvince <langv at purdue.edu> wrote:
> Whatever I do, however I subset it I get the "system is computationally singular: reciprocal condition number" error.
> I know what it means and I know what should be the problem, but there is no way this is a singular matrix.
>
> I have uploaded the input file to my ftp:
> http://mkk.szie.hu/dep/talt/lv/CentInpDuplNoHeader.txt
> It is a tab delimited txt file with no headers.
It's a singular matrix. The data matrix has rank 300 according to
either qr() or svd(). The 301st singular value is about ten orders of
magnitude smaller than the 300th one.
The problem is the rounding of the values -- if you take 372 vectors
in 380-dimensional space they should be linearly independent, but if
you force them to lie on a relatively coarse grid there are quite
likely to be linear dependencies. When I add random noise in the
fourth decimal place, the matrix stops being singular.
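The same diagnosis can be sketched on synthetic data. Note the assumptions: the matrix below is made rank deficient by an exact, built-in dependency rather than by rounding, and the noise range is only a stand-in for "the fourth decimal place".

```r
# Made-up data: 40 rows, 10 columns, but only 6 independent directions.
set.seed(3)
indep <- matrix(rnorm(40 * 6), nrow = 40)
x <- cbind(indep, indep[, 1:4] + indep[, 2:5])

# Numerical rank from the QR decomposition.
rank_before <- qr(x)$rank
print(rank_before)

# Singular values: the drop between the 6th and 7th marks the rank.
d <- svd(x)$d
gap <- d[7] / d[6]
print(gap)

# Add uniform noise of roughly fourth-decimal size to every entry
# and recompute the rank.
noisy <- x + matrix(runif(length(x), -5e-4, 5e-4), nrow = nrow(x))
rank_after <- qr(noisy)$rank
print(rank_after)
```

The noise removes the exact dependency, but the perturbed matrix is still badly conditioned, so anything computed from its inverse depends strongly on the noise that was added.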
-thomas
Thomas Lumley Professor of Biostatistics University of Auckland
Hi Thomas, thanks for the comment. I had a similar idea, so I got rid of the rounding (these are data based on laboratory measurements, which is why I had rounded to only 2 decimal places), but I also tried with 4 and got the same result. I will try to get rid of the many 0s with random noise; hopefully it will help. Thanks
Adding a small random value (0.0001-0.0009) to all values solved the problem. Thank you to everyone who helped.
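A minimal sketch of what that fix might look like, on synthetic data. The helper `pairwise_mahal` is a hypothetical name, not a function from stats or StatMatch; it relies on stats::mahalanobis with `inverted = TRUE` so that the covariance matrix is inverted only once.

```r
# Hypothetical helper: matrix of Mahalanobis distances between all
# pairs of rows of x, given an already inverted covariance matrix.
pairwise_mahal <- function(x, w_inv) {
  n <- nrow(x)
  d2 <- matrix(0, n, n)
  for (i in seq_len(n)) {
    # mahalanobis() returns SQUARED distances of every row from row i.
    d2[i, ] <- mahalanobis(x, center = x[i, ], cov = w_inv, inverted = TRUE)
  }
  # Guard against tiny negative values from rounding before the root.
  sqrt(pmax(d2, 0))
}

# Made-up data with an exact dependency, so cov(x) is singular.
set.seed(4)
x <- matrix(round(rnorm(25 * 3), 2), nrow = 25)
x <- cbind(x, x[, 1] + x[, 2])

# Add a small uniform random value in the range quoted above.
jittered <- x + matrix(runif(length(x), 1e-4, 9e-4), nrow = nrow(x))

# The covariance of the jittered data can now be inverted.
w_inv <- solve(cov(jittered))
d <- pairwise_mahal(jittered, w_inv)
print(dim(d))
```

The inverse exists after jittering, but it is very large along the formerly dependent direction, so the resulting distances can be dominated by the added noise and may change noticeably with the random seed.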