Structural zeros in lme4 - R-SIG-mixed-models

Mon, Oct 4, 2010 11:06 PM #

Dear R users,
I am working with a binary data set of categorical variables with repeated
measurements on 100 individuals over 24 trials with follwing variables

success =0 or 1
age = j, s, y or a
ra = l, m, h
sex= m or f

With the data layed out as below

success   age    ra    sex   ind trial
1         a      l     m     1     1
1         s      m     f     2     1
0         y      h     f     3     1
....................................
....................................
....................................
0         a      l     m     1     24
0         s      m     f     2     24
1         y      h     f     3     24
When I try to run the following model lme4

success~(age + ra + sex)^2 + (1|ind), binomial, verbose=T)
I get the following error message

0:           nan: 0.411061 -2.99573 -0.587787  14.0386  14.3205 -12.5703
-11.9282  11.3544  12.2767  1.07485  1.33312  13.4198      nan      nan
-11.0981 -14.4590 -13.2578  1.18634      nan
Error in asMethod(object) : matrix is not symmetric [1,2]
In addition: Warning message:
In mer_finalize(ans) : gr cannot be computed at initial par (65)

that are produced by the interactions in the data set.

However these empty cells are inherent in the data. Whilst all
combinations of sex and age are possible not all combinations or ra and
age, or ra and sex, can exist.

Specifically ra = l and sex = f
             ra = l and age = s
             ra = l and age = j cannot occur.

If the interactions with ra are removed the model runs. However
interactions between ra, age and sex are theoretically possible. From
looking at the raw data and running a glm without individual as a random
effect it seems like there might be a significant interaction between ra
and age.

Does anyone know if there is a way to test this without collapsing my data?


------------------------------------------------------------------
This message and attachments are subject to a disclaimer. Please refer to http://www.it.up.ac.za/documentation/governance/disclaimer/ for full details. / Hierdie boodskap en aanhangsels is aan 'n vrywaringsklousule onderhewig. Volledige besonderhede is by http://www.it.up.ac.za/documentation/governance/disclaimer/ beskikbaar.

Douglas Bates

Tue, Oct 5, 2010 7:01 AM #

On Tue, Oct 5, 2010 at 1:06 AM, Lydia Belton <lbelton at zoology.up.ac.za> wrote:

Dear R users,
I am working with a binary data set of categorical variables with repeated
measurements on 100 individuals over 24 trials with follwing variables

success =0 or 1
age = j, s, y or a
ra = l, m, h
sex= m or f

With the data layed out as below

success ? age ? ?ra ? ?sex ? ind trial
1 ? ? ? ? a ? ? ?l ? ? m ? ? 1 ? ? 1
1 ? ? ? ? s ? ? ?m ? ? f ? ? 2 ? ? 1
0 ? ? ? ? y ? ? ?h ? ? f ? ? 3 ? ? 1
....................................
....................................
....................................
0 ? ? ? ? a ? ? ?l ? ? m ? ? 1 ? ? 24
0 ? ? ? ? s ? ? ?m ? ? f ? ? 2 ? ? 24
1 ? ? ? ? y ? ? ?h ? ? f ? ? 3 ? ? 24
When I try to run the following model lme4

success~(age + ra + sex)^2 + (1|ind), binomial, verbose=T)
I get the following error message

0: ? ? ? ? ? nan: 0.411061 -2.99573 -0.587787 ?14.0386 ?14.3205 -12.5703
-11.9282 ?11.3544 ?12.2767 ?1.07485 ?1.33312 ?13.4198 ? ? ?nan ? ? ?nan
-11.0981 -14.4590 -13.2578 ?1.18634 ? ? ?nan
Error in asMethod(object) : matrix is not symmetric [1,2]
In addition: Warning message:
In mer_finalize(ans) : gr cannot be computed at initial par (65)

From reading previous posts I'm assuming this is caused by empty cells

that are produced by the interactions in the data set.

However these empty cells are inherent in the data. Whilst all
combinations of sex and age are possible not all combinations or ra and
age, or ra and sex, can exist.

Specifically ra = l and sex = f
? ? ? ? ? ? ra = l and age = s
? ? ? ? ? ? ra = l and age = j cannot occur.

If the interactions with ra are removed the model runs. However
interactions between ra, age and sex are theoretically possible. From
looking at the raw data and running a glm without individual as a random
effect it seems like there might be a significant interaction between ra
and age.

Does anyone know if there is a way to test this without collapsing my data?

I don't know of an easy way of doing that but, as you are the second
person in as many weeks to pose such a question, it is obviously a
situation that we should accommodate.

The glm function uses a modified version of some classical linear
algebra software (Linpack, which predated Lapack) specifically to
handle rank deficient cases.  These arise most commonly from missing
cells, as above.  The way the calculations are done in lmer and glmer
we don't have the option of using that decomposition that detects rank
deficiency.  We may be able to modify the code to do a preliminary
decomposition using the modified Linpack routine and take corrective
action if it is found to be rank deficient.  I regret to say that I
won't be able to do so soon as I am swamped by my teaching obligations
at present.

Hadley Wickham

Wed, Oct 6, 2010 6:27 AM #

Well, it's not easy, but you can do:

  ra_age_sex <- interaction(ra, age, sex, drop = T)

and then model with

  success ~ ra_age_sex + (1|ind)

The hard part is creating the contrasts that take you back to the
questions you're interested in.

Hadley

Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Douglas Bates

Wed, Oct 6, 2010 8:07 AM #

On Wed, Oct 6, 2010 at 8:27 AM, Hadley Wickham <hadley at rice.edu> wrote:

As you say, that's the hard part.

The fix I have in mind is to do a preliminary QR decomposition of the
model matrix for the fixed effects

qr(model.matrix(~ ra + age + sex))

and examine the rank.  If this matrix is rank-deficient there will be
a non-trivial permutation of the coefficients in the qr structure and
you simply extract the first r columns from the model matrix after
permutation.  The remaining p - r columns are the coefficients whose
estimates are NA in the coefficients table.