Proper treatment of Proportion Response Data with Two Categorical Predictors

4 messages · Aitor Gastón, Everett

#
Hello,

I believe I have exhausted my online resources (and eyes) in trying to
determine the appropriate method of analysis for the following
investigation.

I wish to determine if the efficiencies (% recovery) of two sampling units
are significantly different. I sampled in three different fields. I
attempted to collect 12 samples per unit per field (2 x 12 x 3 = 72);
however, some sample sites had no seeds and resulting data were excluded (so
as to not confuse with 'true' zero data; i.e., 0 seeds of x recovered).
Working sample sizes were 24 and 27 per unit (51 in total).

My dataset sets up like this:

1) 51 observations 
2) Response variable = proportion of seeds recovered (0 to 1)
3) Predictor variable 1 = unit (K or L); fixed categorical
4) Predictor variable 2 = field (1, 2, or 3); random categorical

More than 50% of my data are zeros, so the distribution is far from
normal.

Can someone provide guidance RE how best to proceed? Thank you kindly in
advance.

-Everett




--
View this message in context: http://r-sig-ecology.471788.n2.nabble.com/Proper-treatment-of-Proportion-Response-Data-with-Two-Categorical-Predictors-tp7577742.html
Sent from the r-sig-ecology mailing list archive at Nabble.com.
#
Everett,

If you have the original binary data that were used to calculate proportions 
you can use generalized linear models with logit link (i.e. logistic 
regression). You can find a simple explanation of this approach and some 
examples with R code in 
http://www.bio.ic.ac.uk/research/crawley/statistics/exercises/R10Proportiondata.pdf

Aitor


--------------------------------------------------
From: "Everett" <ehanna23 at uwo.ca>
Sent: Tuesday, December 11, 2012 12:19 AM
To: <r-sig-ecology at r-project.org>
Subject: [R-sig-eco] Proper treatment of Proportion Response Data with Two 
Categorical Predictors
#
Aitor,

Perhaps I am missing something, but I do not think that my original data can
take binary form. Each sampling point had a unique number of seeds (0 to
+infinity). I sampled at each site and collected a proportion of the seeds
that were available. For example, with 10 seeds available of which 2 were
collected, recovery = 0.200 (or 20%). I do not think that logistic (binary)
regression applies here, but I am a relative novice with certain aspects of
these topics.

-Everett



#
Following your example, you have 2 positive cases and 8 negative cases, i.e. 
a binary response as you can code the data as 0 (not recovered) and 1 
(recovered).
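
To make the coding concrete, here is a minimal sketch using the 2-of-10 example above. The object names (available, recovered, seed_outcome, fit_seeds, fit_counts) are made up for illustration. It expands the counts into one 0/1 value per seed and checks that an intercept-only logistic regression gives the same estimate whether it is fed the per-seed 0/1 values or the (recovered, not recovered) counts:

```r
# Everett's example: 10 seeds available, 2 recovered
available <- 10
recovered <- 2

# One value per seed: 1 = recovered, 0 = not recovered
seed_outcome <- c(rep(1, recovered), rep(0, available - recovered))
mean(seed_outcome)  # proportion recovered

# Intercept-only logistic regressions on the two codings of the same data
fit_seeds  <- glm(seed_outcome ~ 1, family = binomial)
fit_counts <- glm(cbind(recovered, available - recovered) ~ 1,
                  family = binomial)

# Both intercepts estimate the log-odds of recovery, i.e. qlogis(0.2)
c(per_seed = unname(coef(fit_seeds)),
  counts   = unname(coef(fit_counts)),
  expected = qlogis(recovered / available))
```

So the proportion does not need to be modelled directly; the pair of counts behind it carries the binary information.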

An example of the GLM approach using simulated data:

set.seed(100)  # set the random number generator seed for reproducible results
N <- round(runif(51, 1, 10))  # simulate number of available seeds
rp <- runif(51, 0, 1)  # simulate proportion of recovered seeds
r <- round(N * rp)  # simulate number of recovered seeds
u <- factor(sample(c("K", "L"), 51, replace = TRUE))  # simulate units
f <- factor(sample(c("f1", "f2", "f3"), 51, replace = TRUE))  # simulate fields
mod <- glm(cbind(r, N - r) ~ u + f, family = binomial)  # fit a binomial GLM (logit link)
anova(mod, test = "Chisq")  # analysis of deviance with chi-squared tests
summary(mod)  # summary of the model with "treatment contrasts"
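
The coefficients in summary(mod) are on the logit scale. As a rough sketch of one way to read them on the recovery scale, the same simulated data can be refitted and passed to predict() with type = "response". The names new_data and recovery are illustrative only, and because the simulation has no real unit effect the numbers themselves mean nothing:

```r
# Same simulated data and model as in the example above
set.seed(100)
N <- round(runif(51, 1, 10))
rp <- runif(51, 0, 1)
r <- round(N * rp)
u <- factor(sample(c("K", "L"), 51, replace = TRUE))
f <- factor(sample(c("f1", "f2", "f3"), 51, replace = TRUE))
mod <- glm(cbind(r, N - r) ~ u + f, family = binomial)

# Predicted proportion recovered for every unit x field combination
new_data <- expand.grid(u = levels(u), f = levels(f))
new_data$recovery <- predict(mod, newdata = new_data, type = "response")
new_data

# Odds ratio of recovery for unit L relative to unit K
exp(coef(mod)["uL"])
```

With real data, the row for u in anova(mod, test = "Chisq") is the test of whether the two units differ after accounting for field.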

This is a fixed-effects model, but it can be adapted to a mixed model using 
the glmer function of the lme4 package. An example is available in ?glmer:

  library(lme4)
  ## generalized linear mixed model
  (gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                family = binomial, data = cbpp))
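
Applied to the design in this thread (unit as a fixed effect, field as a random effect), the same pattern might look like the sketch below. It assumes lme4 is installed, and uses simulated data in which the field shifts (field_shift) and the unit effect are made-up values chosen only for illustration:

```r
library(lme4)  # assumed to be installed

set.seed(100)
n_obs <- 51
dat <- data.frame(
  N = round(runif(n_obs, 1, 10)),                                  # seeds available
  u = factor(sample(c("K", "L"), n_obs, replace = TRUE)),          # unit
  f = factor(sample(c("f1", "f2", "f3"), n_obs, replace = TRUE))   # field
)

# Hypothetical effects on the logit scale, used only to simulate data
field_shift <- c(f1 = -0.5, f2 = 0, f3 = 0.5)
p <- plogis(-1 + 0.5 * (dat$u == "L") + field_shift[as.character(dat$f)])
dat$r <- rbinom(n_obs, size = dat$N, prob = p)                     # seeds recovered

# Unit as fixed effect, field as random intercept
mod_mixed <- glmer(cbind(r, N - r) ~ u + (1 | f),
                   family = binomial, data = dat)
summary(mod_mixed)
```

With only three fields the between-field variance is estimated from very little information, so lme4 may report a singular fit; keeping field as a fixed effect, as in the glm example, is a reasonable alternative in that case.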

Hope this helps

Aitor


--------------------------------------------------
From: "Everett" <ehanna23 at uwo.ca>
Sent: Tuesday, December 11, 2012 8:46 PM
To: <r-sig-ecology at r-project.org>
Subject: Re: [R-sig-eco] Proper treatment of Proportion Response Data with 
Two Categorical Predictors