Unbalanced design in GLMM

2 messages · Gabriela Agostini, Ben Bolker

Sat, Feb 2, 2013 4:33 PM

Hello!

I am fitting a GLMM with the binomial family to test for
differences in amphibian malformations across several ponds
located in two different areas.
The random effects are sampling day (samplday) and pond identity
(pondident). The fixed effects are area (studyarea) and species (sp).
Ymat is the response variable.

[1] "factor"

[1] "integer"

[1] "A"     "arro"  "B"     "C"     "campo" "D"     "E"     "F"     "G"
[10] "hum"

NULL

[1] FALSE

[1] FALSE

As you can see, it is an unbalanced design. When I run the model, I get:

Error: length(f1) == length(f2) is not TRUE
In addition: Lost warning messages
1: In pondident:samplday :
  numerical expression has 400 elements: only the first used
2: In pondident:samplday :
  numerical expression has 400 elements: only the first used

Can you help me? I could not find a solution for unbalanced designs
applied to generalized models.

Thanks!
Gabriela

Lic. María Gabriela Agostini


CIMA. Centro de Investigaciones del Medio Ambiente.

Facultad de Ciencias Exactas. UNLP

47 y 115 s/n (1900) La Plata. Argentina


Conservación de Anfibios en Agroecosistemas

Sapos y Ranas del Fondo de tu Casa

http://www.facebook.com/saposyranasdelfondodetucasa

Ben Bolker

Sat, Feb 2, 2013 9:04 PM

Gabriela Agostini <gabrielaagostini18 at ...> writes:

Try making samplday a factor ... In fact, your error is the
second one listed under http://glmm.wikidot.com/faq#errors , and
making the grouping variables factors is the suggested remedy.
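
Why the conversion helps can be sketched in base R. When one operand of `:` is not a factor, R treats `:` as the sequence operator, uses only the first element of each side, and warns about the rest, which matches the warning quoted above. The toy data below are made up for illustration (only the column names come from the post), and the commented-out model call is an assumption, since the original call was not shown.

```r
# Toy data: column names follow the post, values are invented.
dat <- data.frame(
  pondident = factor(rep(c("A", "B", "C", "D"), each = 5)),
  samplday  = rep(1:5, times = 4)   # stored as integer, as in the post
)

# factor:integer is NOT an interaction. R falls back to the sequence
# operator, keeps only the first element of each operand, and warns.
seq_result <- suppressWarnings(with(dat, pondident:samplday))
length(seq_result)   # a single value, not one per row

# After conversion, factor:factor builds the interaction factor,
# with one entry per row of the data.
dat$samplday <- factor(dat$samplday)
inter <- with(dat, pondident:samplday)
length(inter) == nrow(dat)
nlevels(inter)       # all pond-by-day combinations

# Hypothetical model call (assumed, not taken from the post):
# library(lme4)
# fit <- glmer(Ymat ~ studyarea * sp + (1 | pondident) +
#                (1 | pondident:samplday),
#              family = binomial, data = dat)
```

The length mismatch between the single value and the per-row grouping vector is consistent with the `length(f1) == length(f2)` error in the post.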

[snip]

Lack of balance should not be a problem for GLMMs, unless it's
extreme (e.g. some completely missing combinations of fixed effects,
or all zeros or ones in some random-effect levels, i.e.
complete separation). In fact, unbalanced designs are one
reason that people use 'modern' mixed models rather than
classical method-of-moments ANOVA (which has a hard time
with lack of balance).
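
The two extreme cases mentioned above can be checked with a few lines of base R before fitting. This is only a rough sketch on invented data: it assumes one row per individual with a 0/1 `malformed` column (a hypothetical name; the post's Ymat may be structured differently).

```r
# Toy data: invented values, one row per individual.
dat <- data.frame(
  studyarea = factor(c("a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2")),
  sp        = factor(c("x",  "x",  "y",  "x",  "x",  "x",  "x",  "x")),
  pondident = factor(c("A",  "A",  "B",  "C",  "C",  "D",  "D",  "D")),
  malformed = c(0, 1, 0, 0, 0, 1, 1, 1)   # hypothetical 0/1 response
)

# 1. Completely missing combinations of the fixed effects:
#    any zero in the cross-tabulation is an empty cell.
cells <- with(dat, table(studyarea, sp))
n_empty <- sum(cells == 0)

# 2. Random-effect levels whose responses are all zeros or all ones:
#    a per-pond mean of exactly 0 or 1 flags them.
p <- tapply(dat$malformed, dat$pondident, mean)
degenerate_ponds <- names(p)[p == 0 | p == 1]

print(cells)
print(degenerate_ponds)
```

A few flagged ponds are not necessarily fatal, particularly when they contain few observations; the check simply shows where the data carry little information.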

By the way, studyarea+sp+studyarea*sp is redundant (although
harmless). Either

studyarea+sp+studyarea:sp  (main effects + interaction) or
studyarea*sp               (ditto, shorthand)

should be sufficient.
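
The equivalence of the three spellings can be verified in base R by comparing the terms each formula expands to. The factor levels in the small data frame are invented for illustration.

```r
# Three ways of writing the same fixed-effects specification.
f_redundant <- ~ studyarea + sp + studyarea * sp
f_explicit  <- ~ studyarea + sp + studyarea:sp
f_short     <- ~ studyarea * sp

# terms() expands `*` and drops duplicated terms.
labels_redundant <- attr(terms(f_redundant), "term.labels")
labels_explicit  <- attr(terms(f_explicit),  "term.labels")
labels_short     <- attr(terms(f_short),     "term.labels")

# The design-matrix columns agree as well (toy factor levels).
d <- expand.grid(studyarea = factor(c("a1", "a2")),
                 sp        = factor(c("x", "y", "z")))
cols <- lapply(list(f_redundant, f_explicit, f_short),
               function(f) colnames(model.matrix(f, d)))

all_same <- identical(labels_redundant, labels_explicit) &&
  identical(labels_explicit, labels_short) &&
  identical(cols[[1]], cols[[2]]) &&
  identical(cols[[2]], cols[[3]])
print(labels_redundant)
print(all_same)
```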