
Autofilling a large matrix in R

5 messages · wwreith, Rui Barradas, Mark Lamias +2 more

#
I wish to create a matrix of all possible percentages with two decimal place
precision. I then want each row to sum to 100%. I started with the code
below with the intent to then subset the data based on the row sums. This
works great for 2 or 3 columns, but if I try 4 or more columns the number of
rows becomes too large. I would like to find a way to break it down into some
kind of for loop, so that I can remove the rows that don't sum to 100%
inside the for loop rather than outside it. My first thought was to take
lists from 1:10, 11:20, etc., but that does not cover all of the points.

g <- as.matrix(expand.grid(rep(list(1:100), times = 3)))

Any thoughts how to split this into pieces?
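The chunked for loop described above can be sketched like this for the
3-column, whole-percentage case (my own sketch, not code from the thread;
the names `pieces` and `g100` are made up). Fixing the first column's value
k means only one 100^(p-1)-row chunk is ever held in full at a time:

```r
# Enumerate the last two columns once; reuse the chunk for every k.
chunk <- as.matrix(expand.grid(rep(list(1:100), times = 2)))  # 1e4 rows
s <- rowSums(chunk)

pieces <- vector("list", 100)
for (k in 1:100) {
  # keep only rows where the three percentages sum to exactly 100
  ok <- chunk[k + s == 100, , drop = FALSE]
  if (nrow(ok)) pieces[[k]] <- cbind(k, ok)
}
g100 <- do.call(rbind, pieces)

nrow(g100)   # compositions of 100 into 3 positive parts: choose(99, 2) = 4851
```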



--
View this message in context: http://r.789695.n4.nabble.com/Autofilling-a-large-matrix-in-R-tp4645991.html
Sent from the R help mailing list archive at Nabble.com.
#
Hello,

Something like this?

g[rowSums(g) == 100, ]
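Putting that together with the grid from the original post (a sketch; the
name `g100` is mine), the three-column case fits comfortably in memory:

```r
# All triples of whole percentages 1..100, then keep rows summing to 100.
g <- as.matrix(expand.grid(rep(list(1:100), times = 3)))  # 1e6 rows
g100 <- g[rowSums(g) == 100, ]

nrow(g100)   # compositions of 100 into 3 positive parts: choose(99, 2) = 4851
```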

Hope this helps,

Rui Barradas
On 12-10-2012 15:30, wwreith wrote:
#
To avoid FAQ 7.31, you probably should use:

seq(0, 10000) / 10000
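A quick illustration of the floating-point trap that FAQ 7.31 warns about
(the values here are IEEE-754 double behaviour, not R-specific):

```r
# Decimal fractions such as 0.01 have no exact binary representation,
# so sums of them can silently fail an exact equality test.
0.01 + 0.02 == 0.03                     # FALSE
isTRUE(all.equal(0.01 + 0.02, 0.03))    # TRUE: tolerance-based comparison

# Hence the advice to enumerate on an integer grid and rescale only at
# the end, which keeps a rowSums(...) == 100 style test exact.
```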
On Fri, Oct 12, 2012 at 11:12 AM, Mark Lamias <mlamias at yahoo.com> wrote:

#
I think the issue is that with expand.grid and times >= 4 you are likely to run out of memory before subscripting (at least on my machine).

A simplification is to realize that you are looking for points in a lattice in the interior of a (p - 1)-dimensional simplex for p columns/factors/groups. 

As a start, the xsimplex() function in the combinat package generates all the points in such a simplex that sum to a specific value (and nsimplex() calculates the number).

If you then still want to remove the instances on the edges of the simplex (where one of the percentages is 0), at least you have a more memory efficient base within which to search.

For p = 4 you will then start with 176851 candidate points instead of the 1e+08 points from the full grid.

As an example, generating all combinations for 4 factors and excluding any 0's takes you from the 176851 simplex points (about 5.4 Mb) down to 156849 points once rows containing a zero are removed.
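The code behind those counts did not survive in the archive (it presumably
used combinat::xsimplex and nsimplex). An equivalent base-R construction,
my own sketch with made-up names, reproduces the same numbers for p = 4
without materialising the full 100^4 grid:

```r
# Choose the first three columns freely; the fourth is then determined,
# so every retained row sums to exactly 100.
g <- as.matrix(expand.grid(a = 0:100, b = 0:100, c = 0:100))  # 101^3 rows
g <- g[rowSums(g) <= 100, , drop = FALSE]
g <- cbind(g, d = 100 - rowSums(g))

nrow(g)        # 176851 candidate points in the simplex lattice

# Drop the edges of the simplex, i.e. rows where some percentage is 0:
g_pos <- g[rowSums(g == 0) == 0, , drop = FALSE]
nrow(g_pos)    # 156849
```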

Of course the curse of dimensionality will still get you as the number of factors increases: for p = 5 the lattice already has 4598125 points (about 175.4 Mb), which is still manageable, but for p = 6 it has nearly 100 million points.
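The growth can be checked without building anything: by the stars-and-bars
count, the number of ordered p-tuples of nonnegative integers summing to
100 is choose(100 + p - 1, p - 1). A quick sketch (the helper name
`lattice_size` is mine):

```r
# Stars-and-bars: nonnegative integer p-tuples summing to 100.
lattice_size <- function(p) choose(100 + p - 1, p - 1)
sapply(3:6, lattice_size)
# p = 3: 5151; p = 4: 176851; p = 5: about 4.6e6;
# p = 6: about 9.66e7, the "nearly 100 million" quoted above
```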

Perhaps you can modify the code of xsimplex to automatically discard zeros.