I wish to create a matrix of all possible percentages with two-decimal-place precision. I then want each row to sum to 100%. I started with the code below, intending to subset the data based on the row sum. This works great for 2 or 3 columns, but with 4 or more columns the number of rows becomes too large. I would like to find a way to break it down into some kind of for loop, so that I can remove the rows that don't sum to 100% inside the for loop rather than outside it. My first thought was to take the lists 1:10, 11:20, etc., but that does not get all of the points.

g<-as.matrix(expand.grid(rep(list(1:100), times=3)))

Any thoughts on how to split this into pieces?

--
View this message in context: http://r.789695.n4.nabble.com/Autofilling-a-large-matrix-in-R-tp4645991.html
Sent from the R help mailing list archive at Nabble.com.
Autofilling a large matrix in R
5 messages · wwreith, Rui Barradas, Mark Lamias +2 more
Hello,

Something like this?

g[rowSums(g) == 100, ]

Hope this helps,

Rui Barradas
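A minimal sketch of doing the filtering inside a loop, as the question asks, rather than on the full grid. It assumes the same 1:100 values per column as the original code. The helper name rows_summing_to is hypothetical and not from the thread. Each pass fixes the first column at one value and keeps only the matching rows from a grid over the remaining p - 1 columns, so the full 100^p grid is never built.

```r
# Hypothetical helper: collect the rows that sum to `total`, one chunk at a time.
# Only a grid over p - 1 columns (length(vals)^(p - 1) rows) is held in memory.
rows_summing_to <- function(p, total = 100, vals = 1:100) {
  stopifnot(p >= 2)
  rest <- as.matrix(expand.grid(rep(list(vals), times = p - 1)))
  rest_sums <- rowSums(rest)
  chunks <- vector("list", length(vals))
  for (i in seq_along(vals)) {
    # Rows of `rest` that complete a valid row when the first column is vals[i]
    keep <- rest_sums == total - vals[i]
    if (!any(keep)) next  # nothing to keep for this first-column value
    chunks[[i]] <- cbind(rep(vals[i], sum(keep)), rest[keep, , drop = FALSE])
  }
  g <- do.call(rbind, chunks)  # skipped (NULL) chunks are ignored by rbind
  dimnames(g) <- NULL
  g
}

g3 <- rows_summing_to(3)
nrow(g3)                  # 4851 rows
all(rowSums(g3) == 100)   # TRUE
```

This lowers the peak memory by a factor of 100, but the grid over p - 1 columns still grows quickly, so it only postpones the problem by one column.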
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
To avoid FAQ 7.31, you probably should use: seq(0, 10000) / 10000
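As a small illustration of the FAQ 7.31 point (decimal fractions are generally not exact in binary floating point), the sketch below shows why an exact comparison can fail and why starting from integers is safer. The variable names pct and parts are only for illustration.

```r
# Exact comparison of decimal fractions can fail
0.1 + 0.2 == 0.3                   # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: comparison with a tolerance

# Integer arithmetic is exact, so build the grid from integers
# and divide once at the end.
pct <- seq(0, 10000) / 10000       # 0, 0.0001, ..., 1
length(pct)                        # 10001

# For a row-sum filter, compare integer counts rather than fractions.
parts <- c(10L, 20L, 70L)
sum(parts) == 100L                 # TRUE
```

Keeping the matrix in integer units and converting to fractions only for display means a test such as rowSums(g) == 100 stays exact.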
On Fri, Oct 12, 2012 at 11:12 AM, Mark Lamias <mlamias at yahoo.com> wrote:
If you are after all the possible percentages with two decimal places, why don't you use this: seq(from=0, to=100, by=.01)/100 I'm not really sure what you are trying to do in terms of rows and columns, however. Can you be a bit more specific on what each row/column is? Are you trying to group the numbers so that all the entries in a row add up to 100% and then, once it does, split the following entries onto the next row until they add up to 100%, etc.? Thanks.
________________________________
From: wwreith <reith_william at bah.com>
To: r-help at r-project.org
Sent: Friday, October 12, 2012 10:30 AM
Subject: [R] Autofilling a large matrix in R
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
I think the issue is that with expand.grid and times >= 4 you are likely to run out of memory before subscripting (at least on my machine). A simplification is to realize that you are looking for points in a lattice in the interior of a (p - 1)-dimensional simplex for p columns/factors/groups. As a start, the xsimplex() function in the combinat package generates all the points in such a simplex whose coordinates sum to a specific value (and nsimplex() calculates the number of points). If you then still want to remove the instances on the edges of the simplex (where one of the percentages is 0), at least you have a more memory-efficient base within which to search. For p = 4 you will start with
require(combinat)
nsimplex(4,100)
[1] 176851

candidate points instead of

100^4
[1] 1e+08

points. As an example, to generate all combinations for 4 factors excluding any 0's, you could do
mat <- xsimplex(4,100)
ncol(mat)
[1] 176851
print(object.size(mat),unit="Mb")
5.4 Mb
mat <- mat[,apply(mat,2,function(x)!any(x==0))]
ncol(mat)
[1] 156849

Of course the curse of dimensionality will still get you as the number of factors increases. E.g.
mat <- xsimplex(5,100)
ncol(mat)
[1] 4598125
print(object.size(mat),unit="Mb")
175.4 Mb

which is still manageable (but for p = 6 your lattice has nearly 100 million points). Perhaps you can modify the code of xsimplex to automatically discard zeros.
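One possible way to get only the zero-free points without generating and then discarding the edge cases is sketched below in base R. This is not a modification of xsimplex. The helper name positive_compositions is hypothetical, and the approach is the standard "stars and bars" construction swapped in for illustration: pick p - 1 cut points among 1..(n - 1) and take the gaps between them as the p parts. Like xsimplex, it returns one point per column.

```r
# Hypothetical helper (not from combinat): all ways to write n as an
# ordered sum of p strictly positive integers, one composition per column.
positive_compositions <- function(p, n = 100) {
  stopifnot(p >= 2, n >= p)
  # Choose p - 1 increasing cut points among 1..(n - 1);
  # the gaps between consecutive cuts (and the ends 0 and n) are the parts.
  cuts <- combn(n - 1, p - 1)
  mat <- rbind(cuts, n) - rbind(0, cuts)
  dimnames(mat) <- NULL
  mat
}

# Counts without generating anything
choose(100 + 4 - 1, 4 - 1)  # 176851 points when zeros are allowed
choose(100 - 1, 4 - 1)      # 156849 points with no zeros

mat <- positive_compositions(4, 100)
ncol(mat)                   # 156849
all(colSums(mat) == 100)    # TRUE
```

The two choose() calls reproduce the counts reported above for p = 4, which is a quick check that the construction covers the same lattice. The number of points still grows rapidly with p, so this saves the filtering step rather than curing the dimensionality problem.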
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Rui Barradas
Sent: Friday, October 12, 2012 18:04
To: wwreith
Cc: r-help at r-project.org
Subject: Re: [R] Autofilling a large matrix in R