non-isomorphic sequences - R-help

Mon, Feb 13, 2012 10:05 AM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120213/aa8c0626/attachment.pl>

Petr Savicky

Mon, Feb 13, 2012 11:53 AM #

On Mon, Feb 13, 2012 at 10:05:02AM -0800, zheng wei wrote:

Eliminating isomorphic sequences may be done differently,
if we select different representatives of each equivalence
class. The following also eliminates isomorphic 1,2 sequences

  1 1 1
  1 1 2
  1 2 1
  1 2 2

Is this solution OK?

If p and t are not too large, try the following

  check.row <- function(x)
  {
      y <- unique(x)
      all(y == seq.int(along=y))
  }

  p <- 4
  tt <- 4 # do not rewrite t() for transpose
  elem <- lapply(as.list(pmin(1:p, tt)), function(x) seq.int(length=x))
  a <- as.matrix(rev(expand.grid(rev(elem))))
  ok <- apply(a, 1, check.row)
  out <- a[ok, ]
  out

        Var4 Var3 Var2 Var1
   [1,]    1    1    1    1
   [2,]    1    1    1    2
   [3,]    1    1    2    1
   [4,]    1    1    2    2
   [5,]    1    1    2    3
   [6,]    1    2    1    1
   [7,]    1    2    1    2
   [8,]    1    2    1    3
   [9,]    1    2    2    1
  [10,]    1    2    2    2
  [11,]    1    2    2    3
  [12,]    1    2    3    1
  [13,]    1    2    3    2
  [14,]    1    2    3    3
  [15,]    1    2    3    4

This solution differs from yours, for example, in
the row c(1, 2, 3, 3), which is in your solution
represented by c(2, 3, 1, 1). This a different choice
of the representatives. Is the choice important?

A related thread started at

  https://stat.ethz.ch/pipermail/r-help/2012-January/301489.html

There was an additional requirement that each of t symbols
has at least one occurrence.

Hope this helps.

Petr Savicky.

Zheng Wei

Mon, Feb 13, 2012 2:04 PM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120213/27231875/attachment.pl>

Petr Savicky

Mon, Feb 13, 2012 2:38 PM #

On Mon, Feb 13, 2012 at 02:04:51PM -0800, zheng wei wrote:

Dear Wei:

Try the following.

  getEquivalent <- function(a, tt)
  {
      b <- as.matrix(rev(expand.grid(rep(list(1:tt), times=max(a)))))
      ok <- apply(b, 1, function(x) length(unique(x))) == ncol(b)
      b <- b[ok, , drop=FALSE]
      dimnames(b) <- NULL
      t(apply(b, 1, function(x) x[a]))
  }

  getEquivalent(c(1, 1, 1, 1), 4)

       [,1] [,2] [,3] [,4]
  [1,]    1    1    1    1
  [2,]    2    2    2    2
  [3,]    3    3    3    3
  [4,]    4    4    4    4

  getEquivalent(c(1, 1, 1, 2), 4)

        [,1] [,2] [,3] [,4]
   [1,]    1    1    1    2
   [2,]    1    1    1    3
   [3,]    1    1    1    4
   [4,]    2    2    2    1
   [5,]    2    2    2    3
   [6,]    2    2    2    4
   [7,]    3    3    3    1
   [8,]    3    3    3    2
   [9,]    3    3    3    4
  [10,]    4    4    4    1
  [11,]    4    4    4    2
  [12,]    4    4    4    3

Hope this helps.

Petr.

Zheng Wei

Mon, Feb 13, 2012 3:32 PM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120213/ee170483/attachment.pl>

Petr Savicky

Tue, Feb 14, 2012 12:22 AM #

Dear Wei:

It is not clear, which error you get. After adapting the code
to generate sequences in columns instead of rows, i get for
a single column s the error

  Error in do.call(cbind, apply(s, 2, function(x) getEquivalent(x, tt))) : 
  second argument must be a list

This error is caused by the fact that apply() produces preferably
an array and produces a list only if an array canot be used.
For "do.call", we always need a list. A possible solution is
as follows.

  check.row <- function(x)
  {
      y <- unique(x)
      all(y == seq.int(along=y))
  }

  p <- 3
  tt <- 3
  elem <- lapply(as.list(pmin(1:p, tt)), function(x) seq.int(length=x))
  s <- as.matrix(rev(expand.grid(rev(elem))))
  ok <- apply(s, 1, check.row)
  s <- t(s[ok, ]) # sequences are in columns

  getEquivalent <- function(a, tt)
  {
      b <- as.matrix(rev(expand.grid(rep(list(1:tt), times=max(a)))))
      ok <- apply(b, 1, function(x) length(unique(x))) == ncol(b)
      b <- b[ok, , drop=FALSE]
      dimnames(b) <- NULL
      apply(b, 1, function(x) x[a]) # sequences are in columns
  }

  reduced <- s[, 1, drop=FALSE]
  seqList <- lapply(apply(reduced, 2, FUN=list), unlist)
  do.call(cbind, lapply(seqList, function(x) getEquivalent(x, tt)))

       [,1] [,2] [,3]
  [1,]    1    2    3
  [2,]    1    2    3
  [3,]    1    2    3

Hope this helps.

All the best, Petr.