
Modelling proportion data in lme4

Dear Ramon and Thierry,

Thank you very much for your suggestions. In answer to your questions:

 - I do have 0s in my data. 
 - I don't think we can treat the denominator as a number of independent trials in this case. As it is the total abundance of a set of species, each species is 'block voting'.
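A simplified way to see why 'block voting' matters. This assumes, hypothetically, that the N individuals in the denominator fall into N/m blocks of equal size m, and that every individual in a block shares its block's outcome (success probability \pi):

$$
\begin{aligned}
Y &= m \sum_{j=1}^{N/m} B_j, \qquad B_j \overset{\text{iid}}{\sim} \mathrm{Bernoulli}(\pi),\\
\mathbb{E}(Y/N) &= \pi,\\
\operatorname{Var}(Y/N) &= \frac{m^2}{N^2}\cdot\frac{N}{m}\,\pi(1-\pi) = m\,\frac{\pi(1-\pi)}{N}.
\end{aligned}
$$

The mean matches a binomial with N trials, but the variance is m times larger, so a plain binomial fit would understate the uncertainty (overdispersion). Real species abundances are unequal, which this sketch ignores.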

RE: Ramon's suggestion of a Tweedie model for modelling the numerator as the response variable: the proportion really is the measure that needs to be modelled, as this is the compositional similarity calculation, so I'm not sure that a Tweedie model will be suitable.

RE: Thierry's suggestion: would it still be suitable to use the total abundance of species as the weights in a binomial model, even if the trials aren't strictly independent?
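A minimal sketch of the weighted binomial fit, together with a rough check of whether the binomial variance is adequate. The data are simulated under an assumed 'block voting' mechanism (block size `m`, covariate `x` and all other names are hypothetical, not from the thread). In base R, the proportion as response with the denominator as prior weights gives the same fit as the two-column (successes, failures) form.

```r
# Sketch only: simulated data under a hypothetical "block voting" mechanism.
set.seed(1)
n_sites  <- 200
m        <- 5                               # assumed block size (hypothetical)
n_blocks <- rpois(n_sites, 8) + 1           # blocks per site, at least 1
d <- data.frame(x = rnorm(n_sites))         # hypothetical site-level covariate
d$total  <- m * n_blocks                    # denominator, e.g. total abundance
p_true   <- plogis(-0.5 + 0.8 * d$x)
d$shared <- m * rbinom(n_sites, n_blocks, p_true)  # each block enters the numerator whole or not at all
d$prop   <- d$shared / d$total

# Proportion as the response, with the denominator as prior weights
fit_w  <- glm(prop ~ x, family = binomial, data = d, weights = total)
# Equivalent two-column (successes, failures) form
fit_cb <- glm(cbind(shared, total - shared) ~ x, family = binomial, data = d)

# Pearson dispersion ratio: about 1 if the binomial variance is right,
# about m under this block-voting simulation
dispersion <- sum(residuals(fit_w, type = "pearson")^2) / df.residual(fit_w)
print(dispersion)

# A mixed-model analogue with lme4 (hypothetical grouping factor `region`):
# lme4::glmer(prop ~ x + (1 | region), family = binomial, data = d, weights = total)
```

A ratio well above 1 suggests the binomial variance is too small for these data. In glmer, one commonly used remedy is an observation-level random effect, i.e. a term like (1 | obs) where obs has one level per row.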

Thanks both for all your help! Any further advice would be very gratefully received!

Many thanks,

Adriana




-----Original Message-----
From: Ramon Diaz-Uriarte [mailto:rdiaz02 at gmail.com] 
Sent: 01 April 2017 09:20
To: Adriana De Palma
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Modelling proportion data in lme4

Dear Adriana,
On Thu, 30-03-2017, at 09:41, Adriana De Palma <A.De-Palma at nhm.ac.uk> wrote:
Do you actually have some 0s? Most of the rest of my answer assumes you do.
You might want to take a look at:


http://stats.stackexchange.com/questions/81343/response-variable-percentage-and-too-many-zeros-zero-inflated-poisson

http://stats.stackexchange.com/questions/142038/two-part-models-in-r-continuous-outcome-with-too-many-zeros

http://stats.stackexchange.com/questions/142013/correct-glmer-distribution-family-and-link-for-a-continuous-zero-inflated-data-s/

and this R-help question (linked from the above questions, e.g. http://stats.stackexchange.com/a/81347):

https://stat.ethz.ch/pipermail/r-help/2005-January/065070.html

where using a Tweedie model is suggested.


The cplm CRAN package, by W. Zhang:
https://cran.r-project.org/web/packages/cplm/index.html

will fit mixed-effects Tweedies.


I'd suggest checking the vignettes of the cplm package, as well as Zhang's paper

http://link.springer.com/10.1007/s11222-012-9343-7


and Dunn and Smyth's 2005 paper, which contains examples that use the Tweedie distribution, as well as several references to the literature where these models have been used:

https://link.springer.com/article/10.1007/s11222-005-4070-y



Take all of this advice with a grain (or two) of salt, but in somewhat similar cases, when I had a structure of replicates that allowed me to examine the relationship between the mean and the variance of the response, I have used that relationship to help me decide whether a Tweedie was a reasonable choice compared with other options. For instance, with a Tweedie model we'd expect a linear relationship between log(variance) and log(mean), with the slope, p, being the exponent in the relationship V(mu) = mu^p (see, e.g., Figure 3 in the paper by Dunn and Smyth).
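A minimal sketch of that mean-variance check, on simulated replicate groups. Assumptions: data come from a compound Poisson-gamma (a Tweedie with 1 < p < 2) with p = 1.5 and dispersion phi = 1; `rtweedie_cpg` is a hypothetical helper written here, not a function from cplm or any other package. Since Var(Y) = phi * mu^p, the slope of log(variance) on log(mean) estimates p.

```r
# Sketch only: simulated compound Poisson-gamma (Tweedie, 1 < p < 2) data.
set.seed(42)
p_true <- 1.5
phi    <- 1

# Hypothetical helper: a sum of N gamma variables with N ~ Poisson(lambda);
# gives exact zeros whenever N == 0.
rtweedie_cpg <- function(n, mu, p, phi) {
  lambda    <- mu^(2 - p) / (phi * (2 - p))  # Poisson rate of the number of summands
  alpha     <- (2 - p) / (p - 1)             # gamma shape of each summand
  gam_scale <- phi * (p - 1) * mu^(p - 1)    # gamma scale of each summand
  N   <- rpois(n, lambda)
  y   <- numeric(n)
  pos <- N > 0
  y[pos] <- rgamma(sum(pos), shape = N[pos] * alpha, scale = gam_scale)
  y
}

# 20 replicate groups with means from 1 to 100, 500 replicates each
mus <- exp(seq(log(1), log(100), length.out = 20))
grp <- lapply(mus, function(mu) rtweedie_cpg(500, mu, p_true, phi))
mv  <- data.frame(m = sapply(grp, mean), v = sapply(grp, var))

# log(Var) = log(phi) + p * log(mean): the slope estimates p
fit   <- lm(log(v) ~ log(m), data = mv)
p_hat <- unname(coef(fit)[2])
print(p_hat)
```

With real data the groups would be the replicate structure mentioned above; a clearly non-linear log-log plot, or a slope far outside (1, 2), would argue against a Tweedie with point mass at zero.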
A couple of comments here:

1. I am not sure those proportion data can always be modelled as binomial.
Is the numerator a quantity we can think of as arising from a number of independent trials, where the denominator is that number of independent trials?


2. You might consider modeling the numerator, using the denominator not as a denominator but as a covariate. This has the advantage of allowing you to examine different possible relationships such as

Numerator ~  Denominator + other stuff

but also

Numerator ~ poly(Denominator, 2) + other stuff

or

Numerator ~ bs(Denominator) + other stuff


and just generally things like


Numerator ~ some_function_of(Denominator, some_other_covariates)

such as

Numerator ~ poly(Denominator, 2) * some_covariate


etc.


When you do

Numerator/Denominator ~ other stuff

you are committing yourself to one particular form of that relationship (which might not be easy to reason about).
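A minimal sketch of the covariate approach versus the ratio form. Assumptions not from the thread: the numerator is treated as a Poisson count with a log link (a Tweedie via cplm could replace it for continuous abundances), the data and the true exponent of 0.7 on the denominator are invented, and the denominator enters on the log scale, which is natural with a log link. In such a model, modelling the expected ratio is equivalent to using log(denom) as an offset, i.e. fixing its coefficient at exactly 1.

```r
# Sketch only: simulated data; true exponent 0.7 on the denominator is assumed.
library(splines)   # for bs(); ships with base R
set.seed(7)
n     <- 300
denom <- round(exp(runif(n, log(5), log(500))))  # hypothetical denominator
x     <- rnorm(n)                                # hypothetical "other stuff"
num   <- rpois(n, exp(-1 + 0.7 * log(denom) + 0.3 * x))
d     <- data.frame(num, denom, x)

# Ratio form (log link): coefficient of log(denom) fixed at 1 via an offset
m_ratio <- glm(num ~ offset(log(denom)) + x, family = poisson, data = d)
# Covariate forms: the data estimate the relationship with the denominator
m_lin   <- glm(num ~ log(denom) + x,          family = poisson, data = d)
m_poly  <- glm(num ~ poly(log(denom), 2) + x, family = poisson, data = d)
m_bs    <- glm(num ~ bs(log(denom)) + x,      family = poisson, data = d)

print(AIC(m_ratio, m_lin, m_poly, m_bs))
# Compare with the value of 1 implied by the ratio form
print(coef(m_lin)["log(denom)"])
```

If the estimated coefficient on log(denom) is clearly different from 1, or the poly()/bs() versions fit noticeably better, the ratio form is imposing a relationship the data do not support. Note that a Poisson ignores the bound num <= denom; it is used here only to keep the sketch simple.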



Best,


R.
--
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain

Phone: +34-91-497-2412

Email: rdiaz02 at gmail.com
       ramon.diaz at iib.uam.es

http://ligarto.org/rdiaz