Random assignment
Hello again John,

I was going to suggest that you just use qbinom to generate the expected number of extinctions. For example, for the family with 80 spp the central 95% expectation is:

qbinom(c(0.025, 0.975), 80, 0.0748)

which gives 2 - 11 spp. If you wanted to look across a large number of families you'd need to deal with multiple comparison error, but as a quick first look it might be helpful.

However, I've just got a copy of the paper and it seems that the authors are calculating something different to a simple binomial expectation: they are differentiating between high-risk (red listed) and low-risk species within a family. They state that this equation (expressed here in R-ese)...

choose(N, R) * p^R * b^(N - R)

...gives the probability of an entire family becoming extinct, where N is the number of spp in the family; R is the number of those that are red listed; p is the extinction probability for red list spp (presumably over some period, but I haven't read the paper properly yet); b is the extinction probability for other spp. Then, in their simulations they hold b constant but play around with a range of values for p.

So this sounds a bit different to what you originally posted as your objective (?)

Michael
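A minimal sketch of the qbinom suggestion above, extended to several families at once. Only the 80-species family (Bromeliaceae) and p = 0.0748 come from this thread; the other family names, all sizes other than 80, and every observed count are hypothetical, and the Holm adjustment via p.adjust is just one possible way to handle the multiple comparison issue, not something the thread prescribes.

```r
# Hypothetical example data: only Bromeliaceae (80 spp) appears in the thread;
# the other rows and all 'observed' counts are invented for illustration.
families <- data.frame(
  family   = c("FamilyA", "Bromeliaceae", "FamilyC"),
  n_spp    = c(1, 80, 150),
  observed = c(0, 14, 9)
)
p_risk <- 0.0748  # per-species probability of being at risk (from the thread)

# Central 95% range of the at-risk count under binomial(n_spp, p_risk)
families$lower <- qbinom(0.025, families$n_spp, p_risk)
families$upper <- qbinom(0.975, families$n_spp, p_risk)

# Upper-tail probability of seeing at least the observed count:
# P(X >= observed) = P(X > observed - 1)
families$p_upper <- pbinom(families$observed - 1, families$n_spp, p_risk,
                           lower.tail = FALSE)

# Adjust for testing many families at once (Holm is an assumption here)
families$p_adj <- p.adjust(families$p_upper, method = "holm")

print(families)
```

Families whose observed count falls above `upper`, or whose adjusted p-value is small, would be the first candidates for a closer look.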
On 15 October 2010 22:49, Michael Bedward <michael.bedward at gmail.com> wrote:
Hi John,

The word "species" attracted my attention :)

Like Dennis, I'm not sure I understand your idea properly. In particular, I don't see what you need the simulation for. If family F has Fn species, your random expectation is that p * Fn of them will be at risk (p = 0.0748). The variance on that expectation will be p * (1-p) * Fn. If you do your simulation that's the result you'll get.

Perhaps to initially identify families with disproportionate observed extinction rates all you need is the dbinom function?

Michael

On 15 October 2010 22:29, John Haart <another83 at me.com> wrote:
Hi Dennis and list,

Thanks for this, and sorry for not providing enough information.

First let me put the study into a bit more context: I know the number of species at risk in each family. What I am asking is "Is risk random according to family, or do certain families have a disproportionate number of at-risk species?"

My idea was to randomly allocate risk to the families based on the criteria below (binomial(nspecies, 0.0748)) and then compare this to the "true data" and see if there was a significant difference.

So in answer to your questions (assuming my method is correct!):
Is this over all families, or within a particular family? If the former, why does a distinction of family matter?
Within a particular family - this is because I am looking to see if risk in the "observed" data set is random with respect to family, so this will provide the baseline to compare against.
I guess you've stated the p, but what's the n? The number of species in each family?
This varies widely; for instance, I have some families that are monotypic (with 1 species) and other families with 100+ species.
Assuming you have multiple families, do you want separate simulations per family, or do you want to do some sort of weighting (perhaps proportional to size) over all families?
I am assuming I want some sort of weighting. This is because I want to calculate the number of species expected to be at risk in EACH family under the random binomial distribution (assuming every species has a 7.48% chance of being at risk).

Thanks
John

On 15 Oct 2010, at 11:19, Dennis Murphy wrote:

Hi: I don't believe you've provided quite enough information just yet...

On Fri, Oct 15, 2010 at 2:22 AM, John Haart <another83 at me.com> wrote:
Dear List, I am doing some simulation in R and need basic help! I have a list of animal families for which I know the number of species in each family. I am working under the assumption that a species has a 7.48% chance of being at risk.
Is this over all families, or within a particular family? If the former, why does a distinction of family matter?
I want to simulate the number of species expected to be at risk under a random binomial distribution with 10,000 randomizations.
I guess you've stated the p, but what's the n? The number of species in each family? If you're simulating on a family by family basis, then it would seem that a binomial(nspecies, 0.0748) distribution would be the reference. Assuming you have multiple families, do you want separate simulations per family, or do you want to do some sort of weighting (perhaps proportional to size) over all families? The latter is doable, but it would require a two-stage simulation: one to randomly select a family and then to randomly select a species. Dennis
I am relatively new to this field and would greatly appreciate an "idiot-proof" response, i.e. how should the data be entered into R? I was thinking of using read.table, header = T, where the table has F = Family Name, and SP = Number of species in that family?

John
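A minimal sketch of the data-entry step asked about above, assuming a whitespace-separated table with the two columns described (F and SP). The file contents are hypothetical apart from Bromeliaceae with 80 species, and the `text =` argument is used only so the example is self-contained; with a real file the first argument would be its path.

```r
# Hypothetical table contents; with a real file this would be something like
# read.table("families.txt", header = TRUE), where the file name is made up.
txt <- "F SP
Bromeliaceae 80
FamilyB 1
FamilyC 120"

# header = TRUE is spelled out because T is an ordinary variable in R and
# can be reassigned, whereas TRUE cannot.
fam <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

str(fam)      # F is character, SP is integer
sum(fam$SP)   # total number of species across families
```

A column named F works, but since F is also R's shorthand for FALSE, a longer name such as `family` may be less confusing in later code.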
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
On 15 October 2010 23:34, Michael Bedward <michael.bedward at gmail.com> wrote:
Hi John, I haven't read that particular paper but in answer to your question...
So if I do this for all the families it will be the same as doing the simulation experiment outlined in the method above?

Yes :)

Michael

On 15 October 2010 23:18, John Haart <another83 at me.com> wrote:
Hi Michael,

Thanks for this - the reason I am following this approach is that it appeared in a paper I was reading, and I thought it was an interesting angle to take.

The paper is Vamosi & Wilson, 2008. Nonrandom extinction leads to elevated loss of angiosperm evolutionary history. Ecology Letters, (2008) 11: 1047-1053.

and the specific method I am following states:
We calculated the number of species expected to be at risk in each family under a random binomial distribution in 10 000 randomizations [generated using R version 2.6.0 (R Development Team 2007)] assuming every species has a 7.48% chance of being at risk.
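One possible reading of the quoted method, sketched with rbinom: 10,000 binomial draws per family, each with the family's species count as the number of trials. This is an assumption about what the paper's randomization amounts to, not a reproduction of the authors' code; the family names and sizes other than Bromeliaceae (80) are hypothetical.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Species counts per family; only Bromeliaceae = 80 comes from the thread
n_spp  <- c(Bromeliaceae = 80, FamilyB = 1, FamilyC = 120)
p_risk <- 0.0748   # per-species chance of being at risk
n_rand <- 10000    # number of randomizations

# Matrix with one row per randomization and one column per family
sims <- sapply(n_spp, function(n) rbinom(n_rand, size = n, prob = p_risk))

# Simulated expectation and central 95% range for each family
expected <- colMeans(sims)
range95  <- apply(sims, 2, quantile, probs = c(0.025, 0.975))

expected
range95
```

The simulated means should sit close to p * n for each family, which is why the closed-form binomial functions can stand in for the simulation.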
I guess the reason I am doing the simulation is because I am not hugely statistically minded and the paper was asking the same question I am interested in answering :).

So following your approach -
if family F has Fn species, your random expectation is that p * Fn of them will be at risk (p = 0.0748). The variance on that expectation will be p * (1-p) * Fn.
Family F = Bromeliaceae, with Fn = 80, p = 0.0748

random expectation = p * Fn = 0.0748 * 80 = 5.984
variance = p * (1-p) * Fn = (0.0748 * 0.9252) * 80 = 5.5363968

So the random expectation is that the Bromeliaceae will have 6 species at risk, if risk is assigned randomly?

So if I do this for all the families it will be the same as doing the simulation experiment outlined in the method above?

Thanks
John
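The Bromeliaceae arithmetic above can be checked without any simulation by summing over the exact binomial probabilities from dbinom. This is a sketch using only the values given in the thread (Fn = 80, p = 0.0748); the variable names are arbitrary.

```r
Fn <- 80       # species in Bromeliaceae (from the thread)
p  <- 0.0748   # per-species chance of being at risk (from the thread)

k   <- 0:Fn               # every possible number of at-risk species
pmf <- dbinom(k, Fn, p)   # exact probability of each count

# Mean and variance computed from the probability mass function;
# these should agree with p * Fn and p * (1 - p) * Fn
mean_pmf <- sum(k * pmf)
var_pmf  <- sum((k - mean_pmf)^2 * pmf)

c(mean_pmf = mean_pmf, closed_form = p * Fn)
c(var_pmf = var_pmf, closed_form = p * (1 - p) * Fn)

# Single most probable number of at-risk species
k[which.max(pmf)]
```

The expectation of 5.984 is a long-run average rather than a count that must occur; dbinom gives the probability of each individual count around it.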