Error distribution for fractional response
Dear Adhara,

I just saw that Bob O'Hara has already answered this query. Here are some alternatives. Note that Bob's suggestion and option 2 below are similar, because the Poisson and the binomial are closely related.

There are a number of ways to answer this question.

First, if you really must work with the regeneration variable itself, then you could use a beta distribution, which is quite flexible and is defined on the interval between zero and one (strictly the open interval, so the exact zeros in your data would need special handling). This is not what I would recommend, though.

Second, if you want to know about the number of saplings per adult (and not about the number of saplings irrespective of the number of adults), then you should use a binomial model. Formally, you would assume that the number of saplings is drawn from a binomial distribution in which the probability of success is the parameter of interest and the number of trials is given by the number of adults. Without covariates, you are assuming that the success probability is the same for every observation. There might be overdispersion, which would give you some heartache. A beta-binomial might help there.

Third, if you are interested in the number of saplings (irrespective of the number of adults), then you will need to take into account the variation in the number of adults _and_ in the number of saplings per adult. Note that the binomial solution above accounts for variation in the number of saplings only _conditional_ on the number of adults. One way to do this is to assume that the number of adults is Poisson, and that this count supplies the number of trials for the binomial number of saplings. Let lambda be the parameter of the Poisson and pi be the success probability of the binomial. Then the marginal distribution of the number of saplings is, surprisingly(?), also Poisson, with parameter lambda*pi. I don't know of an existing R function that will do this. Others on the list might. It wouldn't be that hard to build one if you wanted to go down that route.
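The Poisson result in the third option can be checked numerically. The following is only a minimal sketch, written in Python with the standard library rather than in R. The function names and the values lam = 6 and pi = 0.3 are made up for illustration and are not estimates from the posted data. It sums the Poisson-times-binomial probabilities over the number of adults and prints them next to a Poisson with rate lambda*pi.

```python
import math

def poisson_pmf(k, rate):
    """P(K = k) for a Poisson variable with the given rate (computed on the log scale)."""
    return math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))

def binomial_pmf(k, n, prob):
    """P(K = k) for a binomial variable with n trials and success probability prob."""
    return math.comb(n, k) * prob**k * (1.0 - prob) ** (n - k)

def thinned_pmf(k, lam, pi, n_max=200):
    """Marginal P(saplings = k) when adults ~ Poisson(lam) and
    saplings | adults = n ~ Binomial(n, pi).
    The infinite sum over n is truncated at n_max, which is ample for small lam."""
    return sum(
        poisson_pmf(n, lam) * binomial_pmf(k, n, pi)
        for n in range(k, n_max + 1)
    )

lam, pi = 6.0, 0.3  # illustrative values only, not fitted to any data
for k in range(6):
    # Columns: k, marginal probability from the mixture, Poisson(lam * pi) probability
    print(k, round(thinned_pmf(k, lam, pi), 6), round(poisson_pmf(k, lam * pi), 6))
```

The two printed columns should agree up to rounding, which is the thinning property described above. A check like this says nothing about whether a Poisson number of adults is a reasonable assumption for the plots.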
Just had a thought: does the number of saplings ever exceed the number of adults? That would give regeneration > 1, and then none of the above would make sense as it stands, because each option treats regeneration as a proportion between zero and one.

Have fun with it.

Scott
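The question about saplings exceeding adults is easy to check before choosing a model. The sketch below is in Python with no dependencies. The list holds only a handful of (saplings, adults) rows copied from the posted data, and the helper name exceeds_trials is hypothetical. It returns the plots that a binomial model could not accommodate.

```python
# A handful of (saplings, adults) rows copied from the posted data;
# the full data set would be read from file in practice.
plots = [(0, 1), (450, 4399), (61, 95), (216, 19), (6, 1), (223, 100), (13, 13)]

def exceeds_trials(rows):
    """Return the rows where saplings > adults, i.e. regeneration > 1."""
    return [(saplings, adults) for saplings, adults in rows if saplings > adults]

bad = exceeds_trials(plots)
print(len(bad), "of", len(plots), "plots have more saplings than adults:", bad)
```

Any plot returned here has more "successes" than "trials", so the binomial and beta options would need rethinking for those data.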
On 30/01/14 20:58, Adhara Pardo wrote:
Dear R users,

I would like to fit a GLM to some plant regeneration data (see bottom of this e-mail). The dependent variable, an index of regeneration, was obtained by dividing the number of saplings by the number of adult plants present in each plot. The result is a highly skewed variable, so specifying, for instance, a Gaussian distribution does not seem appropriate. Data transformation does not help either.

Do you have any suggestions on the best distribution to choose? Any help would be greatly appreciated!

Best wishes,
Adara

"saplings","adults","regeneration"
0,1,0
0,2,0
450,4399,0.1
2416,25340,0.1
6,72,0.08
0,6,0
0,8,0
61,95,0.64
6,98,0.06
5,59,0.08
55,88,0.63
216,19,11.37
6,1,6
72,178,0.4
26,42,0.62
6,4,1.5
0,2,0
0,1,0
229,533,0.43
0,43,0
5,27,0.19
28,86,0.33
0,102,0
0,2,0
1,5,0.2
0,1,0
2,26,0.08
0,4,0
11,13,0.85
0,59,0
0,73,0
223,100,2.23
0,2,0
6,5,1.2
0,16,0
104,170,0.61
0,1,0
2,69,0.03
4,88,0.05
51,180,0.28
3,1,3
12,30,0.4
78,807,0.1
1,65,0.02
2,29,0.07
87,1102,0.08
19,2,9.5
18,20,0.9
22,23,0.96
0,1,0
20,417,0.05
29,64,0.45
0,9,0
0,3,0
0,11,0
51,42,1.21
22,17,1.29
15,25,0.6
0,32,0
0,13,0
0,7,0
59,710,0.08
0,20,0
0,25,0
2,77,0.03
0,37,0
174,882,0.2
50,1069,0.05
1,5,0.2
17,10,1.7
0,3,0
0,16,0
3,967,0
8,150,0.05
0,1,0
6,18,0.33
53,122,0.43
0,1,0
42,74,0.57
128,1607,0.08
18,114,0.16
0,1,0
13,31,0.42
50,123,0.41
11,79,0.14
0,28,0
25,106,0.24
106,1197,0.09
4,6,0.67
11,22,0.5
394,213,1.85
4,16,0.25
222,776,0.29
4,468,0.01
0,76,0
3,549,0.01
17,199,0.09
70,2880,0.02
8,396,0.02
0,15,0
14,332,0.04
51,318,0.16
2,515,0
14,1519,0.01
0,78,0
9,326,0.03
11,481,0.02
0,266,0
6,768,0.01
0,8,0
6,519,0.01
2,38,0.05
1,51,0.02
0,7,0
235,2310,0.1
7,521,0.01
0,94,0
3,174,0.02
0,8,0
11,205,0.05
0,4,0
0,15,0
4,40,0.1
0,28,0
75,208,0.36
7,166,0.04
0,15,0
12,143,0.08
0,974,0
160,614,0.26
76,85,0.89
0,39,0
0,121,0
304,699,0.43
50,48,1.04
11,17,0.65
16,211,0.08
2,2,1
140,2138,0.07
0,1,0
6,11,0.55
0,6,0
0,2,0
0,2,0
1,44,0.02
0,65,0
42,2,21
67,198,0.34
98,89,1.1
13,44,0.3
0,1,0
0,2,0
0,6,0
0,1,0
46,231,0.2
22,130,0.17
0,3,0
13,47,0.28
0,1,0
0,2,0
60,304,0.2
543,294,1.85
7,15,0.47
206,475,0.43
1,30,0.03
91,86,1.06
0,15,0
49,98,0.5
9,7,1.29
23,35,0.66
27,449,0.06
5,53,0.09
5,9,0.56
40,134,0.3
0,10,0
0,1,0
13,13,1
150,165,0.91
14,4,3.5
0,7,0
67,48,1.4
0,2,0
2,18,0.11
1,14,0.07
0,6,0
8,765,0.01
20,2860,0.01
1,182,0.01
65,146,0.45
1,86,0.01
0,4,0
0,1,0
0,17,0
0,8,0
3,38,0.08
188,412,0.46
13,1899,0.01
9,855,0.01
0,27,0
1,163,0.01
0,15,0
10,43,0.23
4,22,0.18
17,306,0.06
2,62,0.03
0,3,0
0,106,0
0,26,0
0,53,0
40,15,2.67
2,18,0.11
0,1,0
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Scott Foster
Computational Informatics
CSIRO
E scott.foster at csiro.au  T +61 3 6232 5178
Postal address: CSIRO Computational Informatics, GPO Box 1538, Hobart TAS 7001
Street Address: CSIRO Computational Informatics, Castray Esplanade, Hobart Tas 7001, Australia
www.csiro.au