How to remove one of any two indices with a correlation greater than 0.90 in a matrix (or data.frame)

1 message · bbslover

My code below does not give the right result:
rm(list=ls())
#define data.frame
a <- c(1,2,3,5,6)
b <- c(1,2,3,4,7)
c <- c(1,2,3,4,8)
d <- c(1,2,3,5,1)
e <- c(1,2,3,5,7)
data.f <- data.frame(a,b,c,d,e)
#backup data.f
origin.data<-data.f 
#get correlation matrix
cor.matrix<-cor(origin.data) 
#backup correlation matrix
origin.cor<-cor.matrix
#get dim
dim.cor<-dim(origin.cor) 
#perform Loop
n<-0
for(i in 1:(dim.cor[1]-1))
{
  for(j in (i+1):(dim.cor[2]))
  {
    if (cor.matrix[i,j]>=0.95)
    {
      data.f<-data.f[,-(i-n)]
      n<-1
      break
    }
  }
}
origin.cor 
origin.data
data.f 
cor(data.f)

How should I write the code to do this? And is there a simpler way?
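
One possible approach, offered only as a minimal sketch in base R: the loop above appears to go wrong because n is set to 1 instead of being incremented, so once more than one column has been deleted the offset (i-n) no longer points at the intended column of data.f. The sketch avoids this by marking columns in a logical vector and subsetting once at the end. The function name drop_correlated and its cutoff argument are hypothetical, not from the original post. The sketch assumes that the absolute correlation is what matters and that the later column of a highly correlated pair is the one to drop.

```r
# Hypothetical helper (not from the original post): drop one column from each
# pair whose absolute correlation exceeds `cutoff`.
drop_correlated <- function(df, cutoff = 0.90) {
  cm <- abs(cor(df))
  # Zero the lower triangle and diagonal so each pair is examined only once
  cm[lower.tri(cm, diag = TRUE)] <- 0
  keep <- rep(TRUE, ncol(df))
  for (j in seq_len(ncol(df))) {
    # Drop column j if it is highly correlated with an earlier column
    # that is still being kept
    if (any(cm[keep, j] > cutoff)) keep[j] <- FALSE
  }
  # Subset once at the end, so column positions never shift during the loop
  df[, keep, drop = FALSE]
}

# Usage with the data from the post
a <- c(1,2,3,5,6)
b <- c(1,2,3,4,7)
c <- c(1,2,3,4,8)
d <- c(1,2,3,5,1)
e <- c(1,2,3,5,7)
data.f <- data.frame(a,b,c,d,e)

reduced <- drop_correlated(data.f, cutoff = 0.90)
names(reduced)
cor(reduced)
```

Because columns are compared only against earlier columns that are still kept, the result depends on column order. If installing a package is acceptable, findCorrelation() in the caret package does a similar job: it takes a correlation matrix and a cutoff and returns the indices of the columns suggested for removal.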