how to subsample all possible combinations of n species taken 1:n at a time?
Are you just trying to obtain a combination from 25 possible terms? If so, then just sample the number you want and convert the number to binary:
sample(33554432,100)
 [1]  6911360  5924262 23052661 12888381 25831589 16700013 24079278 33282839 12751862 26086726 31363494  7118320 21866536  4212929
[15]  8966435 12955834   449305 12830805 29181967 11420211 16175915 20118079 16560488  6248422 27762022 22430005 26650247  3621985
[29] 24283690 13800068 27546362 21711718 26270840 18556802 17774422 26486373   782865 16013167 24572344 23244187 16026237 28897360
[43] 14700082  8214024  2371593  3337527 10612303 17402454 22213173 13650936 30630988  9851680 15403666 11153297 21839554  8657593
[57] 16057288 25713076  2826853 29370859 11377380 28166893 11632747 11199608 15983665 29937151 29002363 13085852 26082502 32232925
[71] 14584722 23907975 13421556 10916983 25403574  6801209 23861215  4083294  8237209  4808486  8040610  1977505 21551566 29402643
[85] 26135975 26753178 15276437 13760103 27208220 20298140 21968831 11851302  9068401 33308858 21256448  7154058  4341004 16042933
[99] 31006704 20091025

That is 100 samples. You can convert each of the numbers to binary, and the bits will tell you which elements to combine.
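For what it's worth, here is a minimal sketch of the binary conversion step. The helper name int_to_presence and the fixed seed are my own assumptions, not part of the original suggestion. It also samples from 1 to 2^n - 1 rather than 1 to 2^n, so every sampled number maps to a non-empty community. Because it relies on intToBits(), which works on 32-bit integers, it should only be trusted for n up to about 30.

```r
n <- 25                       # number of taxa, as in the question
set.seed(1)                   # assumption: fixed seed so the run is repeatable
ids <- sample(2^n - 1, 100)   # 100 distinct integers in 1..(2^n - 1)

# Hypothetical helper: turn one integer into a 0/1 vector of length n,
# where bit j (least significant first) marks presence of taxon j.
int_to_presence <- function(id, n) as.integer(intToBits(id))[seq_len(n)]

# Rows are samples, columns are taxa.
comm <- t(sapply(ids, int_to_presence, n = n))
dim(comm)
```

Each row of comm is one sampled community, and no table of all 2^n combinations is ever built.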
On Mon, Apr 6, 2009 at 11:39 AM, jasper slingsby <jslingsby at gmail.com> wrote:
Hello
I apologise for the length of this entry but please bear with me.
In short:
I need a way of subsampling communities from all possible communities of n
taxa taken 1:n at a time without having to calculate all possible
combinations (because this gives me a memory error - using
combn() or expand.grid() at least). Does anyone know of a function? Or can
you help me edit the combn or expand.grid functions to generate subsamples?
In long:
I have been creating all possible communities of n taxa taken 1:n at a time
to get a presence/absence matrix of species occurrence in communities as
below...
Rows are samples, columns are species:
    A  B  C  D  .  .  .  .
    1  0  1  1  1  0  0  0  1  1  1  1  0  0  0  0
    0  1  1  1  1  0  0  0  1  1  1  1  0  0  0  0
    1  1  1  1  1  0  0  0  1  1  1  1  0  0  0  0
    0  0  0  0  0  1  0  0  1  1  1  1  0  0  0  0
    1  0  0  0  0  1  0  0  1  1  1  1  0  0  0  0
    0  1  0  0  0  1  0  0  1  1  1  1  0  0  0  0
    1  1  0  0  0  1  0  0  1  1  1  1  0  0  0  0
    0  0  1  0  0  1  0  0  1  1  1  1  0  0  0  0
...but the number of possible communities increases exponentially with each
added taxon.
n <- 11    # number of taxa
sum(choose(n, 0:n))    # number of combos, including the empty community (equals 2^n)
So all possible combinations of 11 taxa taken 1:11 at a time is 2048, all
combos of 12 taken 1:12 is 4096, 13 taken 1:13 = 8192...etc etc such that
when I reach about 25 taken 1:25 the number of combos is 33554432 and I get
a memory error.
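As a quick check on these totals, here is a small sketch (the variable names are only illustrative). The figures quoted above equal 2^n, which means they also count the empty community. The number of non-empty communities is one less.

```r
n_taxa <- c(11, 12, 13, 25)

# Sum of choose(m, k) over k = 0..m for each pool size m.
total <- sapply(n_taxa, function(m) sum(choose(m, 0:m)))

counts <- data.frame(n = n_taxa, total = total, non_empty = total - 1)
print(counts)
```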
I have found that the number of combos of x taxa taken from a pool of n
creates a very kurtotic unimodal distribution,...
x<-vector("integer",20)
for (i in 1:20) {x[i]<-choose(20,i)}
plot(x)
...but have found that limiting the number of samples for any community size
to 1000 is good enough for the further analyses I wish to do.
My problem lies in sampling all possible combos without having to calculate
all possible combos. I have tried two methods but both give memory errors at
about 25 taxa.
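One possible way around this, offered only as a sketch: draw communities of each size directly with sample.int() instead of enumerating them first. The function name sample_communities and its arguments are hypothetical. I assume sizes 2 to n - 1 are wanted and that duplicates within a size should be rejected. Sizes with no more than max_per_size combinations are still enumerated with combn(), which is cheap in that case.

```r
# Hypothetical helper: up to max_per_size distinct communities per size.
sample_communities <- function(n, max_per_size = 1000, sizes = 2:(n - 1)) {
  blocks <- lapply(sizes, function(k) {
    total <- choose(n, k)
    if (total <= max_per_size) {
      # Few enough combinations: enumerate them all (k x total index matrix).
      idx <- combn(n, k)
      return(t(apply(idx, 2, tabulate, nbins = n)))
    }
    # Otherwise draw random size-k communities and keep the first
    # max_per_size distinct ones; the full table is never built.
    keep <- matrix(0L, 0, n)
    while (nrow(keep) < max_per_size) {
      draws <- t(replicate(max_per_size,
                           tabulate(sample.int(n, k), nbins = n)))
      keep <- unique(rbind(keep, draws))
    }
    keep[seq_len(max_per_size), , drop = FALSE]
  })
  do.call(rbind, blocks)   # rows = communities, columns = taxa
}

set.seed(42)               # assumption: fixed seed for repeatability
comm <- sample_communities(12, max_per_size = 50)
table(rowSums(comm))       # number of communities kept per size
```

With this approach the result has at most max_per_size rows per size, so memory grows with the number of samples kept rather than with 2^n.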
The expand.grid() method:
n <- 11
toto <- vector("list",n)
titi <- lapply(toto,function(x) c(0,1))
tutu <- expand.grid(titi)
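A rough size estimate shows why this runs out of memory at about 25 taxa. This is a back-of-the-envelope sketch that assumes each 0/1 entry is stored as an 8-byte double (c(0,1) is numeric) and ignores data frame overhead and any intermediate copies.

```r
n <- 25
rows  <- 2^n               # one row per combination
bytes <- rows * n * 8      # n columns of doubles, 8 bytes per value
gib   <- bytes / 2^30      # convert bytes to GiB
gib
```

Under those assumptions the finished table alone needs several gigabytes, and each added taxon roughly doubles that.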
The combn() method (a slightly lengthier function):
samplecommunityD <- function(n, numsamples)
{
  # seed row of zeros; the filter at the end removes it
  super <- matrix(0, 1, n)
  for (numspploop in 1:n)
  {
    # all combinations of numspploop taxa, one combination per row
    minor <- t(combn(n, numspploop))
    if (dim(minor)[1] < numsamples)
    {
      # fewer combinations than requested: keep them all
      minot <- mat.or.vec(dim(minor)[1], n)
      for (loopi in 1:dim(minor)[1])
      {
        for (loopbi in 1:dim(minor)[2])
        {
          minot[loopi, minor[loopi, loopbi]] <- 1
        }
      }
      super <- rbind(super, minot)
      rm(minot)
    }
    else
    {
      # otherwise keep numsamples distinct rows; draw the row indices once,
      # outside the loop, so the same combination is not picked twice
      minot <- mat.or.vec(numsamples, n)
      thousand <- sample(dim(minor)[1], numsamples)
      for (loopii in 1:numsamples)
      {
        for (loopbii in 1:dim(minor)[2])
        {
          minot[loopii, minor[thousand[loopii], loopbii]] <- 1
        }
      }
      super <- rbind(super, minot)
      rm(minot)
    }
  }
  # drop communities with fewer than 2 taxa or with all n taxa
  super <- super[!rowSums(super) > n - 1 & !rowSums(super) < 2, ]
  return(super)
}
samplecommunityD(11,1000)
So unless anyone knows of another function I could try my next step would be
to modify the combn or expand.grid functions to generate subsamples, but
their code is beyond me at this stage (I'm a 3.5 month newbie). Can anyone
identify where in the code I would need to introduce a sampling term or
skipping sequence?
Thanks for your time
Jasper
--
View this message in context: http://www.nabble.com/how-to-subsample-all-possible-combinations-of-n-species-taken-1%3An-at-a-time--tp22911399p22911399.html
Sent from the R help mailing list archive at Nabble.com.
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?