Unbalanced data and random effects

On Wed, Oct 16, 2013 at 6:20 PM, Chris Howden wrote:
Hi Krzysztof,

Did you have a specific section of Zuur et al.'s book in mind? I've pulled
it off my shelf and tried looking up shrinkage, unbalanced design, design,
etc. in the index but couldn't find anything relevant. I'm sure it's in
there, but it's a rather large book to read in one go!
I was thinking of "Mixed Effects Models and Extensions in Ecology with R",
but now that I search through Zuur on Google Books there appears to be no
mention of either "partial pooling" or "shrinkage" (I don't have the book
on hand). It's not mentioned in the index either. I recommended Zuur
because I know a lot of ecologists use it, and shrinkage is such a basic
and useful topic that I expected it to be covered.

It's page 477 in Gelman and Hill. Since I stuck my foot in it by
recommending Zuur without checking: the basic idea is that if you have a
data set with unbalanced group sizes and you just call everything one
group, you get a single estimated overall mean, MU. If you use fixed
effects, you estimate one mean per group (mu_1, mu_2, ..., mu_k), and the
means for the small groups will be poorly estimated (large standard
errors). If you use a random effects model, you still estimate one mean
per group, but you also constrain the group means (mu*_1, mu*_2, ...,
mu*_k) to come from a normal distribution (with an estimated mean, MU*,
and variance). This has two effects important for interpretation:
1) groups with fewer observations will mostly be represented by the
overall mean (mu*_1 is closer to MU* than mu_1 is to MU, and the effect is
more extreme for groups with small sample size); and 2) for a given group
size, the adjustment is larger in absolute terms for groups whose raw mean
lies far from MU*.
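
To make the first two cases concrete, here is a minimal sketch in Python
(standard library only, not R). The site names and sizes are made up for
illustration and are not from anyone's data. It computes the
complete-pooling mean MU and the per-site means with their standard
errors, using each site's own standard deviation for simplicity (a
fixed-effects model would typically use a pooled residual sd instead).

```python
import math
import statistics

# Hypothetical body sizes (mm); site names and values are invented.
sizes_by_site = {
    "small_site": [13.0, 17.0],                                      # n = 2
    "large_site": [11.0, 12.0, 13.0, 12.0, 11.0, 13.0, 12.0, 12.0],  # n = 8
}

# Complete pooling: ignore sites and estimate one overall mean (MU).
all_sizes = [y for ys in sizes_by_site.values() for y in ys]
MU = statistics.mean(all_sizes)

# No pooling (one mean per site): each site mean gets its own
# standard error, sd / sqrt(n), computed from that site's data alone.
no_pooling = {}
for site, ys in sizes_by_site.items():
    mean_j = statistics.mean(ys)
    se_j = statistics.stdev(ys) / math.sqrt(len(ys))
    no_pooling[site] = (mean_j, se_j)

print(f"complete pooling: MU = {MU:.2f}")
for site, (mean_j, se_j) in no_pooling.items():
    print(f"{site}: mean = {mean_j:.2f}, SE = {se_j:.2f}")
```

With these toy numbers the two-individual site has a standard error of
2.0, several times that of the eight-individual site, which is the
"poorly estimated" small-group mean described above.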

You can get a feel for how much this matters by simulating and fitting
some data similar to your data in R (Kéry's "Introduction to WinBUGS for
Ecologists" does a lot of this kind of simulation). The terms used to
describe these effects are "shrinkage" and "partial pooling", since
complete pooling is what you get when you disregard the divisions. You can
also calculate how much pooling is being done directly (Gelman and Hill,
pg. 477):

mu*_j = w_j * MU* + (1 - w_j) * mean(observations in group j)

w_j = 1 - (estimated variance of random effect) / (estimated variance of random effect + within-group variance / group size)

Where w_j tells you how much that group's estimate is pooled towards the
overall mean MU* (w_j = 1 is complete pooling, w_j = 0 is none).
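
As a rough illustration of the weight formula above, here is a small
Python sketch (standard library only, not R) that simulates unbalanced
sites and applies the formula. All names and numbers in it (MU_TRUE,
SD_SITE, SD_WITHIN, site_sizes) are assumptions chosen for the example.
It also treats the two variance components as known and plugs in the
simulation values, whereas a real analysis would estimate them by fitting
a mixed model.

```python
import random
import statistics

random.seed(1)

# Assumed "true" values for the simulation (hypothetical).
MU_TRUE = 12.0                 # overall mean size
SD_SITE = 1.0                  # between-site sd (sd of the random effect)
SD_WITHIN = 3.0                # within-site sd
site_sizes = [2, 5, 20, 100]   # unbalanced numbers of individuals per site

# Simulate one site-level mean per site, then individuals around it.
data = []
for n_j in site_sizes:
    site_mean = random.gauss(MU_TRUE, SD_SITE)
    data.append([random.gauss(site_mean, SD_WITHIN) for _ in range(n_j)])

var_site, var_within = SD_SITE ** 2, SD_WITHIN ** 2


def pooling_weight(n_j):
    """w_j from the formula above: 1 is complete pooling, 0 is none."""
    return 1 - var_site / (var_site + var_within / n_j)


# With the variance components treated as known, MU* is the
# precision-weighted average of the raw site means.
raw_means = [statistics.mean(ys) for ys in data]
precisions = [1 / (var_site + var_within / len(ys)) for ys in data]
MU_star = sum(p * m for p, m in zip(precisions, raw_means)) / sum(precisions)

for ys, raw in zip(data, raw_means):
    w_j = pooling_weight(len(ys))
    shrunk = w_j * MU_star + (1 - w_j) * raw
    print(f"n = {len(ys):3d}  w_j = {w_j:.2f}  raw mean = {raw:6.2f}  "
          f"partially pooled = {shrunk:6.2f}")
```

With these assumed variances, w_j is about 0.82 for the site with 2
individuals and about 0.08 for the site with 100, so the small site's
estimate sits mostly at MU* while the large site's estimate stays close
to its own raw mean.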

That's the short and sloppy version, but the discussion in Gelman and Hill
is good. Sorry for the confusion; maybe somebody else knows for sure
where/if Zuur discusses this?

Krzysztof
Chris Howden B.Sc. (Hons) GStat.
Founding Partner
Evidence Based Strategic Development, IP Commercialisation and Innovation,
Data Analysis, Modelling and Training
(mobile) 0410 689 945
(skype) chris.howden
chris at trickysolutions.com.au


-----Original Message-----
From: r-sig-ecology-bounces at r-project.org
[mailto:r-sig-ecology-bounces at r-project.org] On Behalf Of Krzysztof
Sakrejda
Sent: Thursday, 17 October 2013 12:29 AM
Cc: r-sig-ecology at r-project.org
Subject: Re: [R-sig-eco] Unbalanced data and random effects

On Wed, Oct 16, 2013 at 6:41 AM,  <v_coudrain at voila.fr> wrote:
Dear all,

I performed a census of insects at different sites and measured their
size. I would like to know if size is related to an environmental
factor. I modelled size as a function of the factor, with site as a
random factor to account for within-site variability. However, my data
are strongly unbalanced, with some sites having only two individuals and
others up to 100. Is having site as a random factor sufficient to deal
with this strong imbalance?

I'm not sure what you mean by "deal with", but reading about shrinkage in
random effects models in any decent source would probably be a fine start
for you, either here:

http://www.amazon.com/Effects-Extensions-Ecology-Statistics-Biology/dp/0387874577/ref=la_B001JRWU88_1_2/192-3027843-3405263?s=books&ie=UTF8&qid=1381929893&sr=1-2

or here:

http://www.amazon.com/Analysis-Regression-Multilevel-Hierarchical-Models/dp/052168689X/ref=sr_1_3?s=books&ie=UTF8&qid=1381929942&sr=1-3&keywords=gelman+bayesian

The short answer is that the site effect will shrink toward the average
site effect for sites with few individuals.

Krzysztof

The residual fit of the data is quite bad, certainly because of the
strong difference in variance among sites.

Would anybody have some advice? Thank you!

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

--

Krzysztof Sakrejda

Organismic and Evolutionary Biology
University of Massachusetts, Amherst
319 Morrill Science Center South
611 N. Pleasant Street
Amherst, MA 01003

work #: 413-325-6555
email: sakrejda at cns.umass.edu
