Error in lme4 rank of X = 28 < ncol(X) = 29
On 14-01-12 06:47 PM, Michael Williamson wrote:
Good morning,

I spoke to you recently about a rank-deficiency error when running a glmer model (see below). My boss wants me to try modelling rates with a glmmADMB model, and I'm getting the same rank-deficiency problem:

SABmod <- glmmadmb(SABRate ~ Socialbeh + Compcat + DistancePodCat +
    DistanceSingerCat + Pods15Cat + SingersCat + Windspeed.km.h. + BoatPhs,
    random = ~1|FocalID, family = "nbinom1",
    zeroInflation = TRUE, data = Behav)

Error in glmmadmb(SABRate ~ Socialbeh + Compcat + DistancePodCat + DistanceSingerCat + :
  rank of X = 28 < ncol(X) = 29

An updated version of lme4 seemed to fix the previous model, since it now takes rank deficiencies into account, and I was wondering whether there is a version of the glmmADMB package that does this too. I've tried reinstalling the newest version, but that doesn't appear to work.
No, there isn't: glmmADMB hasn't implemented this feature (yet). You're welcome to look at the chkRank.drop.cols() function in https://github.com/lme4/lme4/blob/master/R/modular.R to see how lme4 does it. One way or the other, though, you're going to have to drop a column. You can use qr() or svd() to figure out which columns are multicollinear. I'm taking the liberty of cc'ing this back to r-sig-mixed-models.

cheers
Ben Bolker
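A minimal sketch of the qr() approach mentioned above, on a small made-up data set (the variables phase, x1, x2 and y are hypothetical, not the whale data). With column pivoting, qr() pushes linearly dependent columns to the end, so the columns pivoted past the computed rank are the candidates to drop. lme4's chkRank.drop.cols() works along broadly similar lines, but this is an independent illustration, not lme4's code.

```r
# Hypothetical toy data: x2 is an exact multiple of x1, so X is rank deficient
set.seed(101)
dat <- data.frame(
  phase = factor(rep(c("Before", "Approach", "After"), each = 10)),
  x1 = rnorm(30)
)
dat$x2 <- 2 * dat$x1
dat$y <- rbinom(30, size = 1, prob = 0.4)

# Fixed-effect model matrix for the right-hand side of the model formula
X <- model.matrix(~ phase + x1 + x2, data = dat)

# QR decomposition with (limited) column pivoting; R's default LINPACK routine
qrX <- qr(X)
c(rank = qrX$rank, ncol = ncol(X))   # rank 4 < ncol 5

# Columns pivoted beyond the rank are linear combinations of earlier columns
dropped <- colnames(X)[qrX$pivot[-seq_len(qrX$rank)]]
dropped                              # "x2"

# Keep only the linearly independent columns and refit (glm() for illustration)
X_keep <- X[, qrX$pivot[seq_len(qrX$rank)], drop = FALSE]
fit <- glm(dat$y ~ X_keep - 1, family = binomial)  # X_keep already has an intercept
any(is.na(coef(fit)))                # FALSE: no aliased coefficients remain
```

The names in dropped point at the predictor (or factor level) that is redundant given the others; since glmmADMB does not drop columns itself, one would remove or recode that predictor in the glmmadmb() formula before refitting.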
Thanks for your time
Mike Williamson

-----Original Message-----
From: lme4 maintainer [mailto:bbolker at gmail.com]
Sent: Sunday, 5 January 2014 3:50 PM
To: Michael Williamson; lme4-authors at lists.r-forge.r-project.org
Subject: Re: Error in lme4 rank of X = 28 < ncol(X) = 29

On 14-01-02 05:32 PM, Michael Williamson wrote:
Good morning,

My name is Mike Williamson and I am a research assistant at the University of Queensland. I'm running some analyses in R on a dataset I have been working on in collaboration with a PhD student. Just before Christmas she sent me some code to use on my dataset. I started on it yesterday but consistently got the error below:
SABMod <- glmer(SABinom ~ Socialbeh + Compcat + DistancePodCat + DistanceSingerCat +
    Pods15 + SingersCat + Windspeed.km.h. + BoatPhs + (1|FocalID),
    data = BehavTr, family = "binomial")
Error in lme4::glFormula(formula = SABinom ~ Socialbeh + Compcat + DistancePodCat + :
  rank of X = 29 < ncol(X) = 31

Sadly she is completely out of contact on fieldwork for the next 2-3 weeks, and I'm pretty stuck on this, so apologies for the email, but I was wondering if you might be able to help.

I'm working on whether boat phase (before, approach, attempt, after) affects surface active behaviours of whales, recorded as binomial presence/absence data (SABinom). The predictor variables are ones she has been working on and told me to include in my model. From the bit of research I've done, it looks like the error says my data are rank deficient. I think this is most likely because of my low sample sizes for approach and attempt, but I can't find anything online or in papers (such as Zuur et al. 2009) to confirm this or to suggest an option for low sample sizes.

I've attached the sample sizes for presence/absence of surface active behaviours by boat phase below. As you can see, approach and attempt have only 5 and 15 presences, respectively, because the approach and attempt phases are shorter periods than before and after. Is low sample size the problem here? And if so, do you have any suggestions for how I could go about this, or any books to read? I've been trawling the libraries online and can't seem to find anything.
Sorry to take so long to get back to you. Your data are indeed rank-deficient: the error is telling you that you are trying to estimate 31 fixed-effect parameters (columns of the X matrix), but that there are only 29 linearly independent combinations of predictor variables in your data set. Small sample size is not an inherent problem, but with an unbalanced data set you are generally more likely to end up with a rank-deficient problem. I was hoping I had written more somewhere in the past about how to use model.matrix() and svd() to diagnose multicollinearity problems; https://stat.ethz.ch/pipermail/r-sig-mixed-models/2012q4/019499.html isn't very complete (I should add it to the FAQ). The fact that the boat-phase table below is badly unbalanced doesn't automatically mean you have a problem, but in combination with some of your other variables it makes an unidentifiable model more likely.
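Sample sizes by boat phase (BoatPhs), for absence (0) and presence (1) of surface active behaviour (SABinom):

            0     1
After    7445  1015
Approach  351     5
Attempt   991    15
Before   2079   344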
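A minimal sketch of the model.matrix()/svd() diagnosis referred to above, on a made-up pair of factors (phase and cat are hypothetical, not the predictors in this model) in which cat is "B" exactly when phase is "Approach". Singular values that are zero up to rounding error count the missing dimensions, and the matching columns of the right-singular-vector matrix v show which model-matrix columns are linearly dependent. The relative cut-off used here is a common choice, not the only one.

```r
# Hypothetical toy data: cat == "B" exactly when phase == "Approach",
# so the phaseApproach and catB indicator columns are identical
dat <- data.frame(
  phase = factor(c("After", "After", "Approach", "Approach", "Before", "Before")),
  cat   = factor(c("A", "A", "B", "B", "A", "A"))
)
X <- model.matrix(~ phase + cat, data = dat)

sv <- svd(X)
tol <- sqrt(.Machine$double.eps) * sv$d[1]   # cut-off relative to largest singular value
rank_X <- sum(sv$d > tol)
c(rank = rank_X, ncol = ncol(X))             # rank 3 < ncol 4

# Right singular vectors for the ~zero singular values span the null space of X;
# their nonzero entries mark the columns involved in each exact linear dependence
null_vecs <- sv$v[, sv$d <= tol, drop = FALSE]
rownames(null_vecs) <- colnames(X)
round(null_vecs, 3)  # phaseApproach and catB: equal size, opposite sign
```

With many categorical predictors, the same null-space vectors point to which factor levels are confounded, which is where sparse levels such as Approach and Attempt can cause trouble.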
If you can install a recent development version of lme4 (e.g. install.packages("lme4", repos = "http://lme4.r-forge.r-project.org/repos")), that may help: the current development version automatically tries to adjust the fixed-effect model matrix to get rid of rank deficiency.
Ben Bolker