Error in lme4 rank of X = 28 < ncol(X) = 29

Ben Bolker · 2014-01-12T23:59:34Z

On 14-01-12 06:47 PM, Michael Williamson wrote:
> Good Morning,
>
> I spoke to you recently concerning a problem with a rank deficient error
> when running a glmer model (see below)
>
> My boss wants me to try and run a model working out rates using a
> glmmADMB model and I'm getting the same rank deficient problem -
> SABmod zeroInflation=TRUE,

> DistanceSingerCat+Pods15Cat+SingersCat+Windspeed.km.h.+BoatPhs,random=~1|FocalID,family="nbinom1",

No, there isn't -- glmmADMB hasn't implemented this feature (yet).
You're welcome to look at the chkRank.drop.cols() function in
https://github.com/lme4/lme4/blob/master/R/modular.R to see how lme4
does it ... one way or the other, though, you're going to have to drop a
column.  You can use qr() or svd() to figure out which columns are
multicollinear ...
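
As a rough illustration of the qr() route (a minimal sketch on made-up data; the variables x1, x2 and x3 are hypothetical and have nothing to do with the whale data set), one can build the fixed-effect model matrix with model.matrix() and use the pivoting in base R's qr() to see which columns it treats as linearly dependent:

```r
# Hypothetical toy data: x3 is an exact linear combination of x1 and x2.
set.seed(1)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$x3 <- d$x1 + 2 * d$x2

# Fixed-effect model matrix, as a fitting function would construct it.
X <- model.matrix(~ x1 + x2 + x3, data = d)

qrX    <- qr(X)
rank_X <- qrX$rank   # numerical rank of X
n_col  <- ncol(X)    # number of fixed-effect parameters requested

# qr() moves columns it finds (numerically) dependent on earlier ones to
# the end of the pivot order, so pivot entries beyond the rank are the
# candidates for dropping.
drop_cols <- colnames(X)[qrX$pivot[seq_len(n_col) > rank_X]]

# A full-rank version of the model matrix.
X_reduced <- X[, qrX$pivot[seq_len(rank_X)], drop = FALSE]

print(c(rank = rank_X, ncol = n_col))
print(drop_cols)
```

Which column gets flagged depends on the column order: any one member of a collinear set could be dropped instead, giving an equivalent fit with a different parameterization.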

  I'm taking the liberty of cc'ing this back to r-sig-mixed-models.

 cheers
   Ben Bolker

15+

SingersCat+mial")

Error in lme4::glFormula(formula = SABinom ~ Socialbeh + Compcat + DistancePodCat +  :
  rank of X = 29 < ncol(X) = 31

Sadly she is completely out of contact on field work for the next
2-3 weeks and I'm pretty stuck on this, so apologies for the email,
but I was wondering if you might be able to help.

I'm working on whether boat phase (before, approach, attempt, after)
affects the rates of surface active behaviours of whales, recorded as
binomial data (SABinom).  The predictor variables are something she
has been working on and told me to include in my model.

From the bit of research I've done it looks like the error says my
data is rank deficient. Now I think this is most likely because of
my low sample sizes for approach and attempt, but I can't find
anywhere online or in papers (such as Zuur 2009) to confirm this or
find an option for low sample sizes. I've attached the sample sizes
for presence/absence of surface active behaviours for boat phase
below. As you can see, approach and attempt only have 5 and 15
presences respectively. This is due to the approach and attempt
phases being shorter periods than before and after. Is low sample
size the problem here? And if so, do you have any suggestions for how
I could go about this, or any books to read? I've been trawling the
libraries online and don't seem to be able to find anything.

Sorry to take so long to get back to you.  Your data are indeed
rank-deficient; the error is telling you that you are trying to
estimate 31 fixed-effect parameters (columns of the X matrix), but
that there are only 29 linearly independent combinations of predictor
variables in your data set.  Small sample size is not an inherent
problem, but it does make it more likely (especially with an
unbalanced data set) that you will end up with a rank-deficient problem.

I was hoping I had written more somewhere else in the past about how
to use model.matrix() and svd() to diagnose multicollinearity
problems --
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2012q4/019499.html
isn't very complete (I should add it to the FAQ).
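
A minimal sketch of the model.matrix() plus svd() diagnosis, on made-up data (the factors phase and dist and their levels are hypothetical, chosen so that "Approach" and "near" always occur together; the tolerance is an arbitrary choice): singular values that are numerically zero indicate rank deficiency, and the matching right singular vectors show which columns are tied together.

```r
# Hypothetical toy data: phase "Approach" occurs only with dist "near",
# and vice versa, so their dummy columns are identical.
d <- data.frame(
  phase = factor(c("Before", "Before", "After", "After",
                   "Approach", "Approach", "Before", "After")),
  dist  = factor(c("far", "mid", "far", "mid",
                   "near", "near", "mid", "far"))
)

X <- model.matrix(~ phase + dist, data = d)
s <- svd(X)

# Treat singular values below an (arbitrary) relative tolerance as zero.
tol     <- 1e-8 * max(s$d)
is_null <- s$d < tol
n_null  <- sum(is_null)   # number of redundant directions

# Right singular vectors for the zero singular values span the null space;
# columns of X with non-negligible weights are the collinear ones.
null_vecs <- s$v[, is_null, drop = FALSE]
rownames(null_vecs) <- colnames(X)
involved <- rownames(null_vecs)[apply(abs(null_vecs) > 1e-8, 1, any)]

print(round(s$d, 6))
print(involved)
```

Here the weights on the two flagged columns have equal size and opposite sign, which says that the two columns are copies of each other; with more complicated collinearity the pattern of weights gives the linear combination that sums to zero.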

The fact that this table is badly unbalanced doesn't automatically
mean you have a problem, but in combination with some of your other
variables it is more likely to make the problem unidentifiable.

            0    1
After    7445 1015
Approach  351    5
Attempt   991   15
Before   2079  344

If you can install a recent development version of lme4, e.g.

  install.packages("lme4", repos="http://lme4.r-forge.r-project.org/repos")

that may help -- the current development version automatically tries to
adjust the fixed-effect model matrix to get rid of rank deficiency.