
How to build combinations of columns of a data frame

6 messages · Juergen Rose, Baptiste Auguie, David Winsemius

#
Hi,

as an R newcomer I would like to create some new data frames from a given
data frame. The first new data frame should contain all pairs of the
columns of the original data frame. The second new data frame should
contain all triples of the columns of the original data frame, and the
last one the quadruple of columns. The values in the new data frames should
be the product of two, three, or four original single field values. For
pairs and triples I could accomplish that task with the following R
script:

Lines <- "a    b    c    d
    13     0    15   16
    23    24    25    0   
    33    34     0   36
     0    44    45   46
    53    54     0   55"

DF <- read.table(textConnection(Lines), header = TRUE)

nrow <-length(rownames(DF))
cnames <- colnames(DF)
nc <-length(DF)

nc.pairs <- nc*(nc-1)/2
#  initialize vector
cnames.new <- c(rep("",nc.pairs))
ind <- 1
print(sprintf("nc=%d",nc))
for (i in 1:(nc-1)) {
  if (i+1 <= nc ) {
    for (j in (i+1):nc) {
      cnames.new[ind] <- paste(cnames[i],cnames[j],sep="")
      ind <- ind+1
    }
  }
}

ind <- 1
#  initialize data.frame
pairs <- data.frame(matrix(c(rep(0,nc.pairs*nrow)),ncol=nc.pairs))
for (i in 1:nc) {
  if (i+1 <= nc ) {
    for (j in (i+1):nc) {
      t <- DF[,i] * DF[,j]
      pairs[[ind]] <- t
      ind <- ind+1
    }
  }
}
colnames(pairs) <- cnames.new
print("pairs=");   print(pairs)

nc.tripels <- nc*(nc-1)*(nc-2)/6
#  initialize vector
cnames.new <- c(rep("",nc.tripels))
ind <- 1
print(sprintf("nc=%d",nc))
for (i in 1:nc) {
  if (i+1 <= nc ) {
    for (j in (i+1):nc) {
      if (j+1 <= nc ) {
        for (k in (j+1):nc) {
          cnames.new[ind] <- paste(cnames[i],cnames[j],cnames[k],sep="")
          ind <- ind+1
        }
      }
    }
  }
}

ind <- 1
#  initialize data.frame
tripels <- data.frame(matrix(c(rep(0,nc.tripels*nrow)),ncol=nc.tripels))
for (i in 1:(nc-1)) {
  if (i+1 <= nc ) {
    for (j in (i+1):nc) {
      if (j+1 <= nc ) {
        for (k in (j+1):nc) {
          t <- DF[,i] * DF[,j] * DF[,k]
          tripels[[ind]] <- t
          ind <- ind+1
        }
      }
    }
  }
}
colnames(tripels) <-  cnames.new
print("tripels=");   print(tripels)

I suppose that there is a much shorter way to get the same results. Any
hint is very much appreciated.

Regards
#
On Apr 16, 2009, at 10:14 AM, Juergen Rose wrote:

            
> apply(combn(colnames(DF),2), 2, function(x) DF[,x[1]]*DF[,x[2]] )
      [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0  195  208    0    0  240
[2,]  552  575    0  600    0    0
[3,] 1122    0 1188    0 1224    0
[4,]    0    0    0 1980 2024 2070
[5,] 2862    0 2915    0 2970    0
> apply(combn(colnames(DF),3), 2, function(x) DF[,x[1]]*DF[,x[2]]*DF[,x[3]])
       [,1]   [,2] [,3]  [,4]
[1,]     0      0 3120     0
[2,] 13800      0    0     0
[3,]     0  40392    0     0
[4,]     0      0    0 91080
[5,]     0 157410    0     0
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
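
The original question also asked for the quadruple of columns, which the two calls above do not cover. A minimal sketch of one way to generalize the same combn()/apply() idea to any number of columns follows. The helper name col_products is hypothetical and not from the thread, and Reduce() is swapped in for the hand-written product so that the number of factors does not have to be spelled out.

```r
# Same example data as in the original post, built directly.
DF <- data.frame(a = c(13, 23, 33, 0, 53),
                 b = c(0, 24, 34, 44, 54),
                 c = c(15, 25, 0, 45, 0),
                 d = c(16, 0, 36, 46, 55))

# Hypothetical helper (an assumption, not part of the thread):
# row-wise products for every combination of k columns.
col_products <- function(df, k) {
  combos <- combn(colnames(df), k)   # one combination per column of 'combos'
  # Reduce() multiplies the selected columns element by element.
  apply(combos, 2, function(x) Reduce(`*`, df[x]))
}

pairs      <- col_products(DF, 2)   # 5 x 6 matrix
tripels    <- col_products(DF, 3)   # 5 x 4 matrix
quadrupels <- col_products(DF, 4)   # 5 x 1 matrix

print(quadrupels)
```

With this particular data every row contains a zero, so the single quadruple column consists of zeros only.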
#
On Thursday, 16.04.2009, at 10:59 -0400, David Winsemius wrote:

Thanks David,

is there also a shorter way to get the column names of the new data
frames?

Juergen
#
Perhaps,

apply(combn(letters[1:4],2), 2, paste,collapse="")

Hope this helps,

baptiste
On 16 Apr 2009, at 17:33, Juergen Rose wrote:

            
_____________________________

Baptiste Auguié

School of Physics
University of Exeter
Stocker Road,
Exeter, Devon,
EX4 4QL, UK

Phone: +44 1392 264187

http://newton.ex.ac.uk/research/emag
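
As a possible variation on the naming one-liner in this message, combn() also accepts a function through its FUN argument and applies it to each combination, so the outer apply() call can be dropped. This is only a sketch; the variable names pair_names and triple_names are assumptions made for illustration.

```r
cnames <- c("a", "b", "c", "d")   # column names from the original post

# combn() calls FUN on each combination; extra arguments such as
# 'collapse' are passed on to FUN (here: paste).
pair_names   <- combn(cnames, 2, FUN = paste, collapse = "")
triple_names <- combn(cnames, 3, FUN = paste, collapse = "")

print(pair_names)
print(triple_names)
```

The results can be assigned with colnames(), just like the output of the apply() version.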
#
On Thursday, 16.04.2009, at 17:41 +0100, baptiste auguie wrote:
Thanks Baptiste,

I now use:

Lines <- "a    b    c    d
    13     0    15   16
    23    24    25    0   
    33    34     0   36
     0    44    45   46
    53    54     0   55"

DF <- read.table(textConnection(Lines), header = TRUE)
cnames <- colnames(DF)

cnames.new  <- apply(combn(cnames,2), 2, paste,collapse="")
pairs <- apply(combn(colnames(DF),2), 2, function(x)
DF[,x[1]]*DF[,x[2]] )
colnames(pairs) <- cnames.new
print("pairs=");   print(pairs)

cnames.new  <- apply(combn(cnames,3), 2, paste,collapse="")
tripels <- apply(combn(colnames(DF),3), 2, function(x)
DF[,x[1]]*DF[,x[2]]*DF[,x[3]])
colnames(tripels) <-  cnames.new
print("tripels=");   print(tripels)

and I am very satisfied.

Juergen
#
Those are not actually data frames; they are matrices. If you want to
make them into data frames, use a coercion function such as
as.data.frame(). The names can be generated from the original column
names using the same construction as for the column creation:

 > apply(combn(colnames(DF),2), 2, paste, collapse="*")
[1] "a*b" "a*c" "a*d" "b*c" "b*d" "c*d"
 > apply(combn(colnames(DF),3), 2, paste, collapse="*")
[1] "a*b*c" "a*b*d" "a*c*d" "b*c*d"
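
Putting the two points of this message together, a sketch of the pairs case could look like the following. It assumes as.data.frame() as the coercion function, and the name pairs_df is made up for illustration.

```r
# Same example data as in the original post, built directly.
DF <- data.frame(a = c(13, 23, 33, 0, 53),
                 b = c(0, 24, 34, 44, 54),
                 c = c(15, 25, 0, 45, 0),
                 d = c(16, 0, 36, 46, 55))

combos <- combn(colnames(DF), 2)

# apply() returns a matrix, not a data frame.
pairs <- apply(combos, 2, function(x) DF[, x[1]] * DF[, x[2]])
colnames(pairs) <- apply(combos, 2, paste, collapse = "*")

# Coerce the matrix to a data frame; the column names are carried over.
pairs_df <- as.data.frame(pairs)

print(is.matrix(pairs))
print(is.data.frame(pairs_df))

# Names such as "a*b" are not syntactic, so access them with [[ ]]
# (or with backticks after $).
print(pairs_df[["a*b"]])
```

If plain names such as "ab" are preferred, collapse = "" gives names that can be used after $ without backticks.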