
Another question: how to delete one of two columns with high correlation (0.95)

3 messages · Nikhil Kaza, bbslover

#
My program is below:
a=c(1,2,1,1,1); b=c(1,2,3,4,1); c=c(3,4,3,3,3); d=c(1,2,3,5,1);
e=c(1,5,3,5,1) 
data.f=data.frame(a,b,c,d,e)
origin.data<-data.f
cor.matrix<-cor(origin.data)
origin.cor<-cor.matrix
m<-0
for(i in 1:(cor.matrix[1]-1))
{
  for(j in (i+1):(cor.matrix[2]))
   {
      if (cor.matrix[i,j]>=0.95)
      {
          data.f<-data.f[,-i];
           i<-i+1
      } 
   }
}
origin.cor
data.f

The result does not seem to be right.
 origin.cor
           a          b          c          d         e
a  1.0000000 -0.0857493  1.0000000 -0.1336306 0.5590170
b -0.0857493  1.0000000 -0.0857493  0.9854509 0.7669650
c  1.0000000 -0.0857493  1.0000000 -0.1336306 0.5590170
d -0.1336306  0.9854509 -0.1336306  1.0000000 0.7470179
e  0.5590170  0.7669650  0.5590170  0.7470179 1.0000000
b c d e
1 1 3 1 1
2 2 4 2 5
3 3 3 3 3
4 4 3 5 5
5 1 3 1 1

Either column b or column d should be deleted, because of their high
correlation (0.9854509), but that is not what happens. Why?
#
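You need dim(cor.matrix)[1] (the number of rows) as the loop bound.
cor.matrix[1] is just the first element of the matrix, which is 1.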
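
A minimal sketch of the loop with that bound fixed, using the data from the question. It also makes one change that goes beyond the advice above: instead of deleting columns inside the loop (which shifts the column positions while cor.matrix keeps the old ones), it only marks them and deletes once at the end. The name to.drop and the choice to keep the earlier column of each pair are assumptions for illustration.

```r
a <- c(1, 2, 1, 1, 1); b <- c(1, 2, 3, 4, 1); c <- c(3, 4, 3, 3, 3)
d <- c(1, 2, 3, 5, 1); e <- c(1, 5, 3, 5, 1)
data.f <- data.frame(a, b, c, d, e)
cor.matrix <- cor(data.f)

n <- dim(cor.matrix)[1]          # number of variables (5 here)
to.drop <- rep(FALSE, n)         # marks columns to delete; nothing is removed inside the loop
for (i in seq_len(n - 1)) {
  if (to.drop[i]) next           # column i is already marked, skip it
  for (j in (i + 1):n) {
    # assumption: keep the earlier column i and mark the later column j
    if (!to.drop[j] && cor.matrix[i, j] >= 0.95) to.drop[j] <- TRUE
  }
}
reduced <- data.f[, !to.drop, drop = FALSE]
names(reduced)                   # "a" "b" "e"
```

Here c is marked because of its correlation with a, and d because of its correlation with b.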

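The following might be better than a loop. which() returns positions
counted down the columns, so to get the column ids of the matching
entries, subtract 1 before the integer division:

((which(cor.matrix >= 0.95) - 1) %/% dim(cor.matrix)[1]) + 1

For row ids use modulus instead of integer division:

((which(cor.matrix >= 0.95) - 1) %% dim(cor.matrix)[1]) + 1

Both results include the diagonal entries, which are always 1.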

There are probably better ways than this.
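
One such way, offered as a sketch rather than as what was meant here, is to let which() return row and column ids directly through its arr.ind argument and to look only above the diagonal, so that each pair appears once. The names high, to.drop and keep are assumptions for illustration, as is the choice to drop the later column of each pair.

```r
a <- c(1, 2, 1, 1, 1); b <- c(1, 2, 3, 4, 1); c <- c(3, 4, 3, 3, 3)
d <- c(1, 2, 3, 5, 1); e <- c(1, 5, 3, 5, 1)
data.f <- data.frame(a, b, c, d, e)
cor.matrix <- cor(data.f)

# upper.tri() masks the diagonal and the lower triangle,
# so each highly correlated pair is reported exactly once
high <- which(cor.matrix >= 0.95 & upper.tri(cor.matrix), arr.ind = TRUE)
high                                   # two columns: "row" and "col"

to.drop <- unique(high[, "col"])       # assumption: drop the later column of each pair
keep <- setdiff(seq_len(ncol(data.f)), to.drop)   # still correct when to.drop is empty
reduced <- data.f[, keep, drop = FALSE]
names(reduced)                         # "a" "b" "e"
```

Using setdiff() rather than a negative index avoids the case where no pair passes the threshold and data.f[, -integer(0)] would return no columns at all.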

Nikhil

but probably a better way to do this would be
On 6 Nov 2009, at 3:16AM, bbslover wrote:

            
#
Thank you. I need to study it; after that, maybe I can understand it well.

Thanks, Nikhil.
Nikhil Kaza-2 wrote: