Random assignment

Fri, Oct 15, 2010 6:18 AM

Hello again John,

I was going to suggest that you just use qbinom to get the range of the
expected number of extinctions. For example, for the family with 80
spp the central 95% range is:

qbinom(c(0.025, 0.975), 80, 0.0748)

which gives 2 - 11 spp.
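qbinom recycles its size argument, so the same call can cover every family at once. This is a minimal sketch: the family names and sizes below are hypothetical placeholders. Only the 80-species family and p = 0.0748 come from this thread.

```r
# Per-species probability of being at risk, as assumed in the thread
p <- 0.0748

# Hypothetical family sizes (only the 80-species family comes from the thread)
n_species <- c(FamA = 1, FamB = 12, FamC = 80, FamD = 150)

# Central 95% range of the number of at-risk species expected by chance;
# qbinom recycles the size argument across families
lower <- qbinom(0.025, size = n_species, prob = p)
upper <- qbinom(0.975, size = n_species, prob = p)
print(data.frame(n_species, lower, upper))

# The single-family call from the message above
qbinom(c(0.025, 0.975), 80, p)   # 2 11
```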

If you wanted to look across a large number of families you'd need
to deal with multiple comparison error, but as a quick first look it
might be helpful.
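Here is a rough sketch of the multiple-comparison step. It rests on assumptions that the thread does not state. Each family gets a one-sided binomial tail probability, P(X >= observed), from pbinom. Those probabilities are then adjusted with p.adjust using the Holm method; method = "BH" would control the false discovery rate instead. The family sizes and observed counts are hypothetical.

```r
p <- 0.0748

# Hypothetical data: family sizes and observed numbers of at-risk species
n_species <- c(FamA = 1, FamB = 12, FamC = 80, FamD = 150)
observed  <- c(FamA = 1, FamB = 3,  FamC = 14, FamD = 9)

# Upper-tail probability P(X >= observed) under Binomial(n, p):
# P(X > observed - 1) is the same thing for integer counts
p_upper <- pbinom(observed - 1, size = n_species, prob = p, lower.tail = FALSE)

# Adjust for testing many families at once (Holm controls family-wise error)
p_adj <- p.adjust(p_upper, method = "holm")
print(round(cbind(n_species, observed, p_upper, p_adj), 4))
```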

However, I've just got a copy of the paper and it seems that the
authors are calculating something different to a simple binomial
expectation: they are differentiating between high-risk (red listed)
and low-risk species within a family. They state that this equation
(expressed here in R-ese)...

choose(N, R) * p^R * b^(N - R)

...gives the probability of an entire family becoming extinct, where
N is number of spp in family; R is number of those that are red
listed; p is extinction probability for red list spp (presumably over
some period but I haven't read the paper properly yet); b is
extinction probability for other spp.

Then, in their simulations they hold b constant but play around with a
range of values for p.
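The sketch below transcribes the quoted expression directly into a small function. This lets it be evaluated over a range of p with b held fixed, roughly as described for the paper's simulations. The function name family_extinction_prob and all numeric values are hypothetical. The code only reproduces the formula as reported here and has not been checked against the paper.

```r
# Expression as quoted from the paper: choose(N, R) * p^R * b^(N - R)
#   N = species in the family, R = red-listed species in it,
#   p = extinction probability for red-listed species,
#   b = extinction probability for the other species
family_extinction_prob <- function(N, R, p, b) {
  choose(N, R) * p^R * b^(N - R)
}

# Hypothetical family: 10 species, 3 of them red listed
b_fixed <- 0.01                      # held constant (hypothetical value)
p_range <- seq(0.1, 0.9, by = 0.2)   # values of p to explore (hypothetical)

print(data.frame(p = p_range,
                 prob = family_extinction_prob(N = 10, R = 3,
                                               p = p_range, b = b_fixed)))
```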

So this sounds a bit different to what you originally posted as your
objective (?)


Michael

On 15 October 2010 22:49, Michael Bedward <michael.bedward at gmail.com> wrote:

Hi John,

The word "species" attracted my attention :)

Like Dennis, I'm not sure I understand your idea properly. In
particular, I don't see what you need the simulation for.

If family F has Fn species, your random expectation is that p * Fn of
them will be at risk (p = 0.0748). The variance on that expectation
will be p * (1-p) * Fn.

If you do your simulation, that's the result you'll get. Perhaps, to
initially identify families with disproportionate observed extinction
rates, all you need is the dbinom function?
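This small sketch illustrates the point that no simulation is needed. The mean and variance formulas above can be checked against the full distribution returned by dbinom. The 80-species family matches the Bromeliaceae example later in the thread. The observed count of 14 is hypothetical.

```r
p  <- 0.0748
Fn <- 80   # family size from the Bromeliaceae example in the thread

# Expected number at risk and its variance under Binomial(Fn, p)
expected <- p * Fn               # 5.984
variance <- p * (1 - p) * Fn     # 5.5363968

# dbinom gives the probability of each possible count k = 0..Fn
k     <- 0:Fn
probs <- dbinom(k, size = Fn, prob = p)

# The same mean and variance recovered from the full distribution
print(c(mean = sum(k * probs), var = sum((k - expected)^2 * probs)))

# Probability of observing exactly 14 at-risk species (hypothetical count)
print(dbinom(14, size = Fn, prob = p))
```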

Michael


On 15 October 2010 22:29, John Haart <another83 at me.com> wrote:

Hi Dennis and list

Thanks for this, and sorry for not providing enough information.

First, let me put the study into a bit more context:

I know the number of species at risk in each family. What I am asking is: "Is risk random with respect to family, or do certain families have a disproportionate number of at-risk species?"

My idea was to randomly allocate risk to the families based on the criteria below (binomial(nspecies, 0.0748)) and then compare this to the "true data" to see if there was a significant difference.
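John's idea can be sketched for a single family. Draw 10,000 binomial counts with rbinom, then see where an observed count falls among them. The seed and the observed count are hypothetical. Reading mean(sims >= obs) as a one-sided Monte Carlo p-value is an assumption about how the comparison would be made.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
p      <- 0.0748
n_fam  <- 80                # species in the family (the thread's example size)
obs    <- 14                # hypothetical observed number of at-risk species
n_sims <- 10000             # number of randomizations, as in the thread

# Each randomization: how many of the n_fam species are at risk by chance
sims <- rbinom(n_sims, size = n_fam, prob = p)

# Central 95% range of the simulated counts
print(quantile(sims, c(0.025, 0.975)))

# Share of randomizations with at least as many at-risk species as observed
print(mean(sims >= obs))
```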

So, in answer to your questions (assuming my method is correct!):

Is this over all families, or within a particular family? If the former, why
does a distinction of family matter?

Within a particular family: this is because I am looking to see if risk in the "observed" data set is random with respect to family, so this will provide the baseline to compare against.

I guess you've stated the p, but what's the n? The number of species in each
family?

This varies widely: for instance, I have some families that are monotypic (with 1 species) and other families with 100+ species.

Assuming you have multiple families, do you want separate simulations per
family, or do you want to do some sort of weighting (perhaps proportional to
size) over all families?

I am assuming I want some sort of weighting. This is because I want to calculate the number of species expected to be at risk in EACH family under the random binomial distribution (assuming every species has a 7.48% chance of being at risk).

Thanks

John




On 15 Oct 2010, at 11:19, Dennis Murphy wrote:

Hi:

I don't believe you've provided quite enough information just yet...

On Fri, Oct 15, 2010 at 2:22 AM, John Haart <another83 at me.com> wrote:

Dear List,

I am doing some simulation in R and need basic help!

I have a list of animal families for which I know the number of species in
each family.

I am working under the assumption that a species has a 7.48% chance of
being at risk.

Is this over all families, or within a particular family? If the former, why
does a distinction of family matter?

I want to simulate the number of species expected to be at risk under a
random binomial distribution with 10,000 randomizations.

I guess you've stated the p, but what's the n? The number of species in each
family? If you're simulating on a family by family basis, then it would seem
that a binomial(nspecies, 0.0748) distribution would be the reference.
Assuming you have multiple families, do you want separate simulations per
family, or do you want to do some sort of weighting (perhaps proportional to
size) over all families? The latter is doable, but it would require a
two-stage simulation: one to randomly select a family and then to randomly
select a species.
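Dennis's size weighting can be sketched with a swapped-in technique. For a single pick, choosing a family in proportion to its size and then a species within it gives every species the same chance. This sketch therefore samples species uniformly from the pooled list of all species. It samples without replacement, so no species is picked twice, and it fixes the total number at risk at a hypothetical value. Both choices are assumptions, as are the family sizes.

```r
set.seed(42)   # arbitrary seed

# Hypothetical family sizes
n_species <- c(FamA = 1, FamB = 12, FamC = 80, FamD = 150)
total_at_risk <- 18   # hypothetical total number of at-risk species

# One entry per species, labelled with its family
species_family <- rep(names(n_species), times = n_species)

one_draw <- function() {
  # Pick at-risk species uniformly from the pooled list, without replacement
  picked <- sample(species_family, total_at_risk)
  # factor() keeps families with zero picks in the table
  table(factor(picked, levels = names(n_species)))
}

# 10,000 randomizations: rows = families, columns = randomizations
draws <- replicate(10000, one_draw())

# Mean picks per family, close to total_at_risk * n_species / sum(n_species)
print(rowMeans(draws))
```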

Dennis

I am relatively new to this field and would greatly appreciate an
"idiot-proof" response, i.e. how should the data be entered into R? I was
thinking of using read.table, header = T, where the table has F = Family
Name, and SP = Number of species in that family?
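This minimal sketch shows the read.table layout described above, with columns F and SP. The text = argument stands in for a file so the example runs on its own. The file name families.txt and every family other than Bromeliaceae are hypothetical. The example uses header = TRUE rather than T, because T is an ordinary variable that can be reassigned.

```r
# Hypothetical whitespace-delimited data with a header row;
# text = is used so the example runs without an external file
families <- read.table(text = "
F            SP
Bromeliaceae 80
FamB         12
FamC         1
", header = TRUE, stringsAsFactors = FALSE)

# With a real file this would be, e.g. (hypothetical file name):
# families <- read.table("families.txt", header = TRUE)

str(families)

# Expected number of at-risk species per family, assuming p = 0.0748
families$expected <- families$SP * 0.0748
print(families)
```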

John

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

On 15 October 2010 23:34, Michael Bedward <michael.bedward at gmail.com> wrote:

Hi John,

I haven't read that particular paper but in answer to your question...

So if I do this for all the families, it will be the same as doing the simulation experiment outlined in the method above?

Yes :)

Michael


On 15 October 2010 23:18, John Haart <another83 at me.com> wrote:

Hi Michael,

Thanks for this. The reason I am following this approach is that it appeared in a paper I was reading, and I thought it was an interesting angle to take.

The paper is

Vamosi & Wilson (2008). Nonrandom extinction leads to elevated loss of angiosperm evolutionary history. Ecology Letters 11: 1047–1053.

and the specific method I am following states:

We calculated the number of species expected to be at risk in each family under a random binomial distribution in 10 000 randomizations [generated using R version 2.6.0 (R Development Team 2007)] assuming every species has a 7.48% chance of being at risk.

I guess the reason I am doing the simulation is because I am not hugely statistically minded and the paper was asking the same question I am interested in answering :).

So, following your approach:

if family F has Fn species, your random expectation is that p * Fn of
them will be at risk (p = 0.0748). The variance on that expectation
will be p * (1-p) * Fn.


Family F = Bromeliaceae, with Fn = 80, p = 0.0748
random expectation = p*Fn = (0.0748*80) = 5.984
variance = p * (1-p) * Fn = (0.0748*0.9252) *80 = 5.5363968

So the random expectation is that the Bromeliaceae will have 6 species at risk, if risk is assigned randomly?

So if I do this for all the families, it will be the same as doing the simulation experiment outlined in the method above?
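This quick check tests that equivalence for several families at once. It runs 10,000 rbinom randomizations per family and sets the simulated mean and variance beside p * Fn and p * (1 - p) * Fn. The seed and every family size other than Bromeliaceae = 80 are hypothetical. The simulated and exact columns should agree up to Monte Carlo noise.

```r
set.seed(2010)   # arbitrary seed
p      <- 0.0748
n_sims <- 10000

# Hypothetical family sizes; Bromeliaceae = 80 is the thread's example
n_species <- c(Bromeliaceae = 80, FamB = 12, FamC = 1, FamD = 150)

# 10,000 randomizations per family: rows = randomizations, columns = families
sims <- sapply(n_species, function(n) rbinom(n_sims, size = n, prob = p))

# Simulated vs. exact expectation and variance per family
print(data.frame(sim_mean  = colMeans(sims),
                 exact_mean = n_species * p,
                 sim_var   = apply(sims, 2, var),
                 exact_var = n_species * p * (1 - p)))
```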

Thanks

John



