An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120213/aa8c0626/attachment.pl>
non-isomorphic sequences
6 messages · Zheng Wei, Petr Savicky
On Mon, Feb 13, 2012 at 10:05:02AM -0800, zheng wei wrote:
Dear All,
Sorry for the typoes earlier, let me repost the question.
Suppose I want to generate sequences of length 3 from two symbols {1,2}, we get the following 8 sequences
1 1 1
1 1 2
1 2 1
1 2 2
2 1 1
2 1 2
2 2 1
2 2 2
However, I do not want all these 8 sequences. I call two sequencs to be isomorphic if one sequence could be obtained from the other by relabelling the symbols. For example, 111 is isomorphic to 222, 112 is isomorphic to 221.?By eliminating all these isomorphic ones, what I want is the following
1 1 1
1 1 2
1 2 1
2 1 1
Eliminating isomorphic sequences may be done differently, if we select different representatives of each equivalence class. The following also eliminates isomorphic 1,2 sequences 1 1 1 1 1 2 1 2 1 1 2 2 Is this solution OK?
In general, I need to generate non-isomorphic sequences of length p from t distinct symbols. For example, when p=3, t=3 we have matrix(c(1,2,3,1,1,2,2,1,1,1,2,1,1,1,1),3,5) [1,]??? 1??? 1??? 2??? 1??? 1 [2,]??? 2??? 1??? 1??? 2??? 1 [3,]??? 3??? 2??? 1??? 1??? 1 When p=4, t=4 we have matrix(c(1,2,3,4,1,1,2,3,1,2,1,3,1,2,3,1,2,1,1,3,2,3,1,1,2,1,3,1,1,1,2,2,1,2,1,2,1,2,2,1,1,1,1,2,1,1,2,1,1,2,1,1,2,1,1,1,1,1,1,1),4,15) [1,]??? 1??? 1??? 1??? 1??? 2??? 2??? 2??? 1??? 1???? 1???? 1???? 1???? 1???? 2???? 1 [2,]??? 2??? 1??? 2??? 2??? 1??? 3??? 1??? 1??? 2???? 2???? 1???? 1???? 2???? 1???? 1 [3,]??? 3??? 2??? 1??? 3??? 1??? 1??? 3??? 2??? 1???? 2???? 1???? 2???? 1???? 1???? 1 [4,]??? 4??? 3??? 3??? 1??? 3??? 1??? 1??? 2??? 2???? 1???? 2???? 1???? 1???? 1???? 1 In general, I need to do this for arbitrary choices of p and t.
If p and t are not too large, try the following
check.row <- function(x)
{
y <- unique(x)
all(y == seq.int(along=y))
}
p <- 4
tt <- 4 # do not rewrite t() for transpose
elem <- lapply(as.list(pmin(1:p, tt)), function(x) seq.int(length=x))
a <- as.matrix(rev(expand.grid(rev(elem))))
ok <- apply(a, 1, check.row)
out <- a[ok, ]
out
Var4 Var3 Var2 Var1
[1,] 1 1 1 1
[2,] 1 1 1 2
[3,] 1 1 2 1
[4,] 1 1 2 2
[5,] 1 1 2 3
[6,] 1 2 1 1
[7,] 1 2 1 2
[8,] 1 2 1 3
[9,] 1 2 2 1
[10,] 1 2 2 2
[11,] 1 2 2 3
[12,] 1 2 3 1
[13,] 1 2 3 2
[14,] 1 2 3 3
[15,] 1 2 3 4
This solution differs from yours, for example, in
the row c(1, 2, 3, 3), which is in your solution
represented by c(2, 3, 1, 1). This a different choice
of the representatives. Is the choice important?
A related thread started at
https://stat.ethz.ch/pipermail/r-help/2012-January/301489.html
There was an additional requirement that each of t symbols
has at least one occurrence.
Hope this helps.
Petr Savicky.
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120213/27231875/attachment.pl>
On Mon, Feb 13, 2012 at 02:04:51PM -0800, zheng wei wrote:
Dear Petr, ? This is fantastic! ? I have one more question, when p=4, tt=4. We have 15 non-isomorphic sequences as you have generated. Among these 15, I selected?2 sequences. How do I recover all the members of the equivalent classes corresponding to these?2 sequences? For example, corresponding to the sequence of?1111, I would like to recover 1111,2222,3333,4444 from this sequence.
Dear Wei:
Try the following.
getEquivalent <- function(a, tt)
{
b <- as.matrix(rev(expand.grid(rep(list(1:tt), times=max(a)))))
ok <- apply(b, 1, function(x) length(unique(x))) == ncol(b)
b <- b[ok, , drop=FALSE]
dimnames(b) <- NULL
t(apply(b, 1, function(x) x[a]))
}
getEquivalent(c(1, 1, 1, 1), 4)
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 2 2 2 2
[3,] 3 3 3 3
[4,] 4 4 4 4
getEquivalent(c(1, 1, 1, 2), 4)
[,1] [,2] [,3] [,4]
[1,] 1 1 1 2
[2,] 1 1 1 3
[3,] 1 1 1 4
[4,] 2 2 2 1
[5,] 2 2 2 3
[6,] 2 2 2 4
[7,] 3 3 3 1
[8,] 3 3 3 2
[9,] 3 3 3 4
[10,] 4 4 4 1
[11,] 4 4 4 2
[12,] 4 4 4 3
Hope this helps.
Petr.
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120213/ee170483/attachment.pl>
Dear Wei:
When I have a matrix s, with each column represents?a sequence. I want to recover all equivalent sequences from the sequences/columns in s. I used this command ? do.call(cbind,apply(s,2,function(x) getEquivalent(x,tt))) ? This did a good job when ncol(s) >1, but when ncol(s)=1, there is an error. ? How to getter a better coding which could deal with either the case of ncol(s) >1 or ncol(s)=1 by itself?
It is not clear, which error you get. After adapting the code
to generate sequences in columns instead of rows, i get for
a single column s the error
Error in do.call(cbind, apply(s, 2, function(x) getEquivalent(x, tt))) :
second argument must be a list
This error is caused by the fact that apply() produces preferably
an array and produces a list only if an array canot be used.
For "do.call", we always need a list. A possible solution is
as follows.
check.row <- function(x)
{
y <- unique(x)
all(y == seq.int(along=y))
}
p <- 3
tt <- 3
elem <- lapply(as.list(pmin(1:p, tt)), function(x) seq.int(length=x))
s <- as.matrix(rev(expand.grid(rev(elem))))
ok <- apply(s, 1, check.row)
s <- t(s[ok, ]) # sequences are in columns
getEquivalent <- function(a, tt)
{
b <- as.matrix(rev(expand.grid(rep(list(1:tt), times=max(a)))))
ok <- apply(b, 1, function(x) length(unique(x))) == ncol(b)
b <- b[ok, , drop=FALSE]
dimnames(b) <- NULL
apply(b, 1, function(x) x[a]) # sequences are in columns
}
reduced <- s[, 1, drop=FALSE]
seqList <- lapply(apply(reduced, 2, FUN=list), unlist)
do.call(cbind, lapply(seqList, function(x) getEquivalent(x, tt)))
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 3
[3,] 1 2 3
Hope this helps.
All the best, Petr.