
help with apply, please

13 messages · Duncan Murdoch, Gabor Grothendieck, Patrick Burns +2 more

#
Dear list,

I have a problem with a toy example:
mtrx <- matrix(c(1,1,0,1,1,1,0,1,1,0,0,1), nrow=3)
rownames(mtrx) <- letters[1:3]

I would like to determine the minimum combination of rows that 
"covers" all columns with at least a 1.
No single row covers all columns; all three rows together clearly cover all 
columns, but there are smaller combinations (the 1st and the 3rd, or the 2nd 
and the 3rd) which also cover all columns.

I solved this problem by creating a second logical matrix which contains all 
possible combinations of rows:
tt <- matrix(as.logical(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,0,1,1,1,1,1)), nrow=3)

and then subset the first matrix and check if all columns are covered.
This solution, though, is highly inefficient and I am certain that some 
combination of apply or similar will do.

###########################

possibles <- NULL
length.possibles <- NULL
## I guess the minimum solution has about half the number of rows
guesstimate <- floor(nrow(tt)/2) + nrow(tt) %% 2
checked <- logical(nrow(tt))
repeat {
    if (checked[guesstimate]) break
    checked[guesstimate] <- TRUE
    partials <- as.matrix(tt[, colSums(tt) == guesstimate])
    layer.solution <- logical(ncol(partials))
    
    for (j in 1:ncol(partials)) {
        if (length(which(colSums(mtrx[partials[, j], ]) > 0)) == ncol(mtrx)) {
            layer.solution[j] <- TRUE
        }
    }
    if (sum(layer.solution) == 0) {
        if (!is.null(possibles)) break
        guesstimate <- guesstimate + 1
    } else {
        for (j in which(layer.solution)) {
            possible.solution <- rownames(mtrx)[partials[, j]]
            possibles[[length(possibles) + 1]] <- possible.solution
            length.possibles <- c(length.possibles, length(possible.solution))
        }
        guesstimate <- guesstimate - 1
    }
}
final.solution <- possibles[which(length.possibles == min(length.possibles))]

###########################

More explicitly (if useful): it is about reducing a prime implicant chart in 
a Quine-McCluskey boolean minimisation algorithm. I tried following the 
original algorithm, applying row dominance and column dominance, but (as I am 
not a computer scientist) I am unable to apply it.

If you have a better solution for this, I would be grateful if you'd share 
it.
Thank you in advance,
Adrian
#
On Saturday 19 November 2005 17:24, Gabor Grothendieck wrote:
Thank you Gabor, this solution is superb (you never stop amazing me :)
Now... it only finds _one_ of the multiple minimum solutions. In the toy 
example there are two minimum solutions, so I reckon the output should 
have been a list with:
[[1]]
[1] 1 0 1

[[2]]
[1] 0 1 1

Also, thanks to Duncan, and yes, I do very much care about finding the 
smallest possible solutions (if I correctly understand your question).

It seems that the lp function is very promising, but can I use it to find 
_all_ minimum solutions?

Adrian
#
On 11/19/2005 8:00 AM, Adrian DUSA wrote:
First of all, I imagine there isn't a unique solution, i.e. there are 
probably several subsets that can't be reduced but which are not equal. 
  Do you care if you find the smallest one of those?  If so, it looks 
like a reasonably hard problem.  If not, it's a lot easier:  total all 
of the columns, then see if there is any row whose entries all 
correspond to columns with counts bigger than 1.  Remove it, and continue.

This will find a local minimum in one pass through the rows.
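That greedy pass can be sketched in base R (greedy_reduce is my name for it, not from the thread; mtrx as in Adrian's post):

```r
mtrx <- matrix(c(1,1,0,1,1,1,0,1,1,0,0,1), nrow = 3)
rownames(mtrx) <- letters[1:3]

greedy_reduce <- function(m) {
  repeat {
    counts <- colSums(m)
    ## a row is removable if every column it covers is also covered elsewhere
    removable <- apply(m, 1, function(r) all(counts[r > 0] > 1))
    if (!any(removable)) return(m)
    m <- m[-which(removable)[1], , drop = FALSE]
  }
}
rownames(greedy_reduce(mtrx))   # "b" "c" for the toy example
```

As Duncan says, this finds one local minimum per pass; which one depends on which removable row is dropped first.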

You could make it better by sorting the rows into an order so that rows 
that dominate other rows come later, but I think you still wouldn't be 
guaranteed to find the global min.  (By the way, I'm not sure if we have 
a function that can do this:  i.e., given a partial ordering on the 
rows, sort the matrix so that the resulting order is consistent with it.)

Duncan Murdoch
#
Try minimizing 1'x subject to 1 >= x >= 0 and m'x >= 1 where m is your mtrx
and ' means transpose.  It seems to give an integer solution, 1 0 1,
with linear programming even in the absence of explicit integer
constraints:

library(lpSolve)
lp("min", rep(1,3), rbind(t(mtrx), diag(3)), rep(c(">=", "<="), 4:3),
rep(1,7))$solution
#
On 11/19/05, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
Just one afterthought.  The constraint x <= 1 will not be active so,
eliminating it, the above can be reduced to:

lp("min", rep(1,3), rbind(t(mtrx)), rep(">=", 4), rep(1,4))$solution
#
On 11/19/05, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
Although the above is not wrong, I should have removed the rbind,
which is no longer needed; simplifying further (it seems that lp
will do the rep for you itself for certain arguments) gives:

lp("min", rep(1,3), t(mtrx), ">=", 1)$solution  # 1 0 1
#
I suspect that the answer is that finding all solutions
will be hard.  L1 regression is a special case of LP.
I learned how to move around the corners of the
solution space, and could easily find all of the solutions
in the special case of a two-way table.  However,
sometimes there were a lot of solutions.

I would guess that your problem has a lot of solutions
as well.     One cheat would be to do the LP problem
multiple times with the rows of your matrix randomly
permuted.  Assuming you keep track of the real rows,
you could then get a sense of how many solutions there
might be.
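A sketch of this cheat (assuming, as Gabor observed, that lp returns integer solutions for this problem; the seed, the 50 trials, and the deduplication by name are my own bookkeeping, not from the thread):

```r
library(lpSolve)  # as in Gabor's messages
mtrx <- matrix(c(1,1,0,1,1,1,0,1,1,0,0,1), nrow = 3)

found <- list()
set.seed(42)                       # number of trials is arbitrary
for (i in 1:50) {
  p <- sample(nrow(mtrx))          # randomly permute the rows
  s <- lp("min", rep(1, nrow(mtrx)), t(mtrx[p, ]), ">=", 1)$solution
  rows <- sort(p[s > 0.5])         # map the solution back to the real rows
  found[[paste(rows, collapse = ",")]] <- rows
}
found                              # the distinct minimal covers seen so far
```

Storing each cover under a character key makes the comparison against existing solutions a hash lookup rather than a scan, which addresses Adrian's complaint about the comparisons being slow.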

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
#
On Saturday 19 November 2005 19:17, Patrick Burns wrote:
Thanks for the answer. The trick does work (i.e. it finds all minimum 
solutions) provided that I permute the rows a sufficient number of times. And 
I have to compare each solution to the existing (unique) ones, which takes a 
lot of time...
In your experience, what would be the definition of "multiple times" for large 
matrices?

My (dumb) solution is guaranteed to find all possible minima, because it 
checks every possible combination. For large matrices, though, this would be 
really slow. I wonder if it could be vectorized in some way; before the lp 
function, I was thinking there might be a more efficient way to loop over all 
possible columns (using perhaps the apply family).

Thanks again,
Adrian
#
Dear Ted,
On Saturday 19 November 2005 20:51, Ted Harding wrote:
My case is probably a subset of your general algorithm.
Peeking at the computer science webpages for the Quine-McCluskey algorithm, I 
learned that there are ways to simplify a matrix (prime implicant chart) 
before trying to find the minimum solutions.
For example:
1. Row dominance
   0 0 1 1 0 0
   0 1 1 1 0 0
The second row contains all the elements that the first row contains, 
therefore the first (dominated) row can be dropped.

2. Column dominance
  0 1
  0 1
  1 1
  1 1
  0 0
The second column dominates the first column, therefore we can drop the second 
(dominating) column: any choice of rows that covers the first column 
automatically covers the second, so that constraint is redundant.

In a Quine-McCluskey algorithm the number of rows will always be much lower 
than the number of columns, and applying the two principles above makes the 
matrix even simpler.
There are algorithms written in other languages (like Java) freely available 
on the Internet, but I have no idea how to adapt them to R.
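For what it's worth, the two dominance rules can be sketched directly in base R (reduce_chart is my name, not adapted from any of those Java implementations); rows are prime implicants, columns are minterms:

```r
reduce_chart <- function(m) {
  repeat {                                  # row dominance: drop dominated rows
    drop <- NA
    for (i in seq_len(nrow(m))) {
      others <- setdiff(seq_len(nrow(m)), i)
      if (any(vapply(others, function(j) all(m[j, ] >= m[i, ]), logical(1)))) {
        drop <- i; break
      }
    }
    if (is.na(drop)) break
    m <- m[-drop, , drop = FALSE]
  }
  repeat {                                  # column dominance: drop dominATING cols
    drop <- NA
    for (k in seq_len(ncol(m))) {
      others <- setdiff(seq_len(ncol(m)), k)
      if (any(vapply(others, function(l) all(m[, k] >= m[, l]), logical(1)))) {
        drop <- k; break
      }
    }
    if (is.na(drop)) break
    m <- m[, -drop, drop = FALSE]
  }
  m
}
```

Identical rows (or columns) dominate each other, so the loops drop one at a time and re-check, which keeps exactly one representative of each tie.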

Best,
Adrian
#
On Saturday 19 November 2005 20:51, Ted Harding wrote:
I found this presentation very explicit:
http://www.cs.ualberta.ca/~amaral/courses/329/webslides/Topic5-QuineMcCluskey/sld079.htm

Best wishes,
Adrian
#
On 19-Nov-05 Adrian Dusa wrote:
Thinking about it, I'm not sure that finding the complete set
of solutions (in general, not just to Adrian's toy example)
can be done without enumeration, complete up to the stage
where it is known that all minimal solutions have been found
(and, having said that, I shall probably provoke an expert
into refuting me; in which case, all the better!).

For example, consider a 9-column matrix with 84 rows, each
with 3 1s and 6 0s (nCm(9,3)=84).

Every minimal solution is the union of 3 rows, e.g.

  1 1 1 0 0 0 0 0 0
  0 0 0 1 1 1 0 0 0
  0 0 0 0 0 0 1 1 1

and there is a 1:1 correspondence between the choices of
3 out of 9 and the 84 rows.

So there are 84 = nCm(9,3) choices of 3 1s for the first row,
leaving 6 0s. There are then nCm(6,3) = 20 choices of 3 1s
for the second row. Having done that, the choice of the 1s
for the third row is determined. But this must be divided by
3! = 6 to give "unordered" rows, so there are 84*20/6 = 280
different minimal solutions.
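Ted's arithmetic can be confirmed directly in R:

```r
## choices for row 1, times choices for row 2, divided by the 3! orderings
choose(9, 3) * choose(6, 3) / factorial(3)   # 84 * 20 / 6 = 280
```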


Without yet having taken this down to the level of R code,
I would envisage an algorithm for an N*k matrix M on the
following lines.

1. Check that it's possible: sum(colSums(matr) == 0) == 0, i.e. no all-zero column.

2. For m = 1:N work through all choices of m rows out of
   the N; if, for the current value of m, any one of these
   covers, complete the choices for this value of m,
   noting which choices cover, and then exit.

This will give you all the choices of m out of N rows
which cover the columns, for the smallest value of m
for which this is possible, and you will not have
searched in larger values of m.

The function 'combn' in package combinat may be useful
here: combn(N,m) returns an array with nCm columns and
m rows, where the values in a column are a choice of m
out of (1:N).
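Steps 1 and 2 can be sketched in base R (min_covers is my name; in current R, combn is available in package utils, so combinat is no longer needed):

```r
mtrx <- matrix(c(1,1,0,1,1,1,0,1,1,0,0,1), nrow = 3)

min_covers <- function(m) {
  stopifnot(all(colSums(m) > 0))            # step 1: covering is possible
  for (k in seq_len(nrow(m))) {             # step 2: smallest k first
    ch <- combn(nrow(m), k)                 # each column is a choice of k rows
    ok <- apply(ch, 2, function(idx)
      all(colSums(m[idx, , drop = FALSE]) > 0))
    if (any(ok)) return(ch[, ok, drop = FALSE])
  }
}
min_covers(mtrx)   # each column is one minimal covering set of rows
```

For the toy example this returns a 2x2 matrix whose columns are the row sets {1,3} and {2,3}, and the search never visits subset sizes larger than the minimum.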

There is an opposite version of this algorithm: If there
are not many 1s in matr, then you might guess that a lot
of rows may be needed. So it may be more efficient to
do it the other way round: Starting with all rows (m=N),
leave them out 1 at a time, then 2 at a time, and so on,
until you get a value of m such that no choice of m out
of N covers the columns. Then (m+1) is the minimal order
of a covering set of rows, and you've just found all these.

But this may not be sound either, since if one row has
many 1s then possibly some few others can make up the
covering, so it would be better to do it the original
way round. I can't get a clear view of what sort of
criterion to apply to determine this issue programmatically.

There is bound to be a good algorithm out there somewhere
for finding a "minimal covering set" but I don't know it!

Comments?

Best wishes to all,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 19-Nov-05                                       Time: 18:51:00
------------------------------ XFMail ------------------------------
#
On Saturday 19 November 2005 22:09, Gabor Grothendieck wrote:
I'm speechless.
It is exactly what I needed.
A billion thanks!
Adrian
#
On 11/19/05, Adrian DUSA <dusa.adrian at gmail.com> wrote:
Getting back to your original question of using apply, solving the LP
gives us the number of components in any minimal solution and
exhaustive search of all solutions with that many components can
be done using combinations from gtools and apply like this:

library(gtools) # needed for combinations
soln <- lp("min", rep(1,3), rbind(t(mtrx)), rep(">=", 4), rep(1,4))$solution
k <- sum(soln)
m <- nrow(mtrx)
combos <- combinations(m,k)
combos[apply(combos, 1, function(idx) all(colSums(mtrx[idx,]))),]

In the example we get:

     [,1] [,2]
[1,]    1    3
[2,]    2    3

which says that rows 1 and 3 of mtrx form one solution
and rows 2 and 3 of mtrx form another solution.