Skip to content

MCMCglmm Poisson with an offset term and splines

15 messages · Jarrod Hadfield, dani, David Winsemius

#
Hi,

The example is not reproducible: l_lfvcspn does not exist.

The error is telling you that you don't have 11 fixed effects in the 
model. Change k to the number of fixed effects in the model.

Jarrod
On 25/09/2017 16:39, dani wrote:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20170925/cb9b8f07/attachment.ksh>
#
Hello again,


Thank you so much for your prompt response. I apologize for the silly questions, I am a true beginner and I am ashamed of my ignorance. I guess I should explain what I did:


I used the spl2 function and obtained the fixed and random factors corresponding to the variable I needed the smoother for (named f_lfv_c).


newdatab$l_lfvcspo<-spl2(~s(f_lfv_c,k=10), data=newdatab, p=F)
newdatab$l_lfvcspn<-spl2(~s(f_lfv_c,k=10), data=newdatab)

I am not sure how to attach the random effects corresponding to the l_lfvcspn. I get this array of 8 variables and I am really not sure how to include them in the model. Should I get forget about spl2 and simply add the variable f_lfv_c as a fixed term and spl(f_lfv) in the idv random term?

Also, it seems to me that I have 11 fixed effects, I am not sure what to do.

I am really sorry about all these silly questions, I really do not understand how these things work, but I would like to know more about this.

Best regards!


Sent from Outlook<http://aka.ms/weboutlook>
1 day later
#
Hi Dani,


It is still not possible for us to diagnose the problem. You need to 
provide code+data that reproduces the error. f_lfv_c does not appear in 
newdatab.


Cheers,


Jarrod
On 25/09/2017 18:08, dani wrote:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20170926/df6ac8c9/attachment-0001.ksh>
#
Hi Jarrod,


Thank you so much for your kind answer! I have been struggling with it and went back to read more about MCMC. I will be posting the complete code and the data as soon as I get a chance this afternoon.


Best regards!

Dani N-M


Sent from Outlook<http://aka.ms/weboutlook>
#
Hello Jarrod,

I have attached my code and my file.

Variable age has 3 levels, variables x2, x8, and x9 have two levels each, and the rest of the variables are continuous.

I re-ran the spl2 function and this time around I obtained one single fixed smoother (l_lfvcspo) and one single random smoother (l_lfvcspn). Last time around I got 8 variables with suffixes from 1-8 for l_lfvcspn and I did not know what to do with those.

Also, as I had 11 fixed effects (corresponding to 11 variables), I thought it was appropriate to choose k=11. I was not sure whether the intercept needed to be counted as well as the levels of the categorical fixed predictors (except for their reference categories). Because I kept on getting the "mu" error for k=11, I tried k=13 (which includes the intercept and the 2 levels for the 3-level age variable, which I did not consider before). I am not sure this is the way to go, I guess I need to read more to be able to model properly fixed effects in R, but for now I was wondering whether this consideration of fixed effects sounds ok in this particular example.

The MCMC model worked properly based on the prior using k=13.

Please see the code below:

#spl2 function

library(mgcv)

spl2<-function(formula, data, p=TRUE, dataX=data){

  aug<-nrow(data)-nrow(dataX)

  if(aug!=0){
    if(aug<0){
      stop("sorry nrow(dataX) must be less than or equal to nrow(data)")
    }else{
      augX<-matrix(0, aug, ncol(dataX))
      colnames(augX)<-colnames(dataX)
      dataX<-rbind(dataX, augX)
    }
  }
  smooth.spec.object<-interpret.gam(formula)$smooth.spec[[1]]
  sm<-smoothCon(smooth.spec.object, data=data, knots=NULL,absorb.cons=TRUE, dataX=dataX)[[1]]

  Sed<-eigen(sm$S[[1]])
  Su<-Sed$vectors
  Sd<-Sed$values
  nonzeros <- which(Sd > sqrt(.Machine$double.eps))

  if(p){
    Zn<-sm$X%*%Su[,nonzeros, drop=FALSE]%*%diag(1/sqrt(Sd[nonzeros]))
  }else{
    Zn<-sm$X[,-nonzeros, drop=FALSE]
  }
  return(Zn[1:(nrow(data)-aug),,drop=FALSE])
}

$spline terms

newdatab1$l_lfvcspo<-spl2(~s(f_lfv_c,k=10), data=newdatab1, p=F)
newdatab1$l_lfvcspn<-spl2(~s(f_lfv_c,k=10), data=newdatab1)
summary(newdatab1$l_lfvcspo)
summary(newdatab1$l_lfvcspn)

summary(newdatab1)
dim(newdatab1)
str(newdatab1)

#PRIOR


k<-13 # number of fixed effects

prior<-list(B=list(V=diag(k)*1e4, mu=rep(0,k)),
            R=list(V=1, nu=0),
            G=list(G1=list(V=1, nu=0),
                   G2=list(V=1, nu=0),
                   G3=list(V=1, nu=0)))

prior$B$mu[k]<-1 # assuming the offset term is last
prior$B$V[k,k]<-1e-4

# MCMC model

mcmc <- MCMCglmm(y ~ age+x2+x8+x9+x3+l_lfvcspo+x4+x5+x6+x7+offset,
                 random =~ STUDYID+class+idv(l_lfvcspn),
                 data   = newdatab1,
                 family = "poisson", prior=prior)
summary(mcmc)



This model has worked, so I would like to thank you so much for all your help so far. I guess I just want to make sure I understand how to model the fixed effects in MCMCglmm and I also would like to make sure that my model is correct.

I truly appreciate all your invaluable help!
Best regards,
Dani NM
<http://aka.ms/weboutlook>
#
Hi,


There are 12 terms in the model not 13 so k=12. The k in the prior 
specification is completely unrelated to the number of knot points in 
the spline.


Cheers,


Jarrod
On 27/09/2017 15:16, dani wrote:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20170927/803e3d48/attachment-0001.ksh>
#
Hello again,


Thanks! I was indeed referring to the k in the prior formula. I guess I am still confused whether I should add the intercept to the number of fixed effects, as I have 12 variables as I count as fixed effects without the intercept. Would it be possible to confirm that for me, please?


Also, for k=12 in the prior formula, I get this error:

Error in MCMCglmm(y ~ age + x2 + x8 + x9 + x3 + l_lfvcspo + x4 + x5 +  :
  fixed effect mu prior is the wrong dimension


I am puzzled about this error.


I do not seem to understand how to select the fixed effects. Could someone point me to a source of information where I can learn more about this, please?  Also, is there a book where I can learn more about prior specification? I think I need to see more examples.


Thank you so much. I apologize for this, but I am very confused and I would like to learn more about this!

Best,

DNM


Sent from Outlook<http://aka.ms/weboutlook>
#
Hi,


Yes - the intercept is a parameter so should be included. If you get 13 
effects then the data/code you sent us are different from the data/code 
you are using. The issue is not really anything to do with your 
(mis)understanding about priors it is simply that you have set priors 
for k parameters whereas there are in fact not k parameters.


Cheers,


Jarrod
On 27/09/2017 15:38, dani wrote:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20170927/8eff2170/attachment-0001.ksh>
#
Hello Jarrod,


Thank you so much for your patience!


This is the model that I used:


mcmc <- MCMCglmm(y ~ age+x2+x8+x9+x3+l_lfvcspo+x4+x5+x6+x7+offset,
                 random =~ STUDYID+class+idv(l_lfvcspn),
                 data   = newdatab1,
                 family = "poisson", prior=prior)

I simply do not understand what am I doing wrong. These are my fixed terms:


1) age

2) x2

3) x8

4) x9

5) x3

6) l_lfvcspo

7) x4

8) x5

9) x6

10) x7

11) offset


There are thus 11 terms. If I add the intercept I obtain 12 fixed effects.


If I put k=12 per the number of fixed effects that I have, I get this error:


Error in MCMCglmm(y ~ age + x2 + x8 + x9 + x3 + l_lfvcspo + x4 + x5 +  :
  fixed effect mu prior is the wrong dimension


Do you think it might be an issue with the structure of the newdatab1 dataset? It seems Ok when I look at it. I do not know what else to do, I seem to have exhausted all options.


I apologize again, I really do not understand what am I doing wrong and how can I improve.


Best,

DNM


Sent from Outlook<http://aka.ms/weboutlook>
#
Hi,


If I use the code you sent me with the data you sent me there are 12 
effects and using k=12 works fine. You must have changed the data and/or 
the code.


Jarrod
On 27/09/2017 16:13, dani wrote:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20170927/22d1115d/attachment-0001.ksh>
#
Hi,


This is very puzzling. I simply copied and pasted the code that I sent you and I used the same table, I did not do anything else. I will re-try on a different computer. The error persists when I use k=12 and I attempt to run my model.


I will try again and report back soon.

Thanks!

Best,

DNM


Sent from Outlook<http://aka.ms/weboutlook>
#
Hello again,


I re-run everything and it worked! Thank you so much!


I elucidated what the problem was: after I ran the spl2 function and I created the two terms for the splines, I saved the table and I guess something happened when I did that. I have now commented that line and I simply ran the code without exporting the file and it finally worked.


Thank you so much, Jarrod! I truly appreciate all your help!

To the other members of the list, I would like to apologize for sending so many messages.

Best regards,

DNM
#
This is just a heads up for dani and Jarrod.

You are having a nice conversation, but the data was never distributed to the list because as far as I can see dani only uses HTML in his emails and probably sent a csv or .Rdata file which would have been scrubbed by the mailserver. So no one but you two can see what is really happening, even though some of the rest of us might have an interest in seeing how well splines do (or don't) work in MCMCglmm.models.

Best;
David.
David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law
#
Hello David,


My file is very large indeed.


I would just like to emphasize that there was nothing wrong with the way the spl2 function worked.


Everything worked properly when I followed Jarrod's advice. The code was ok.


The problem is that I am a beginner and I keep on saving my files and exporting my file messed with my table.


Therefore, everything worked properly.


I apologize again for taking so much time from everyone with this problem that turned out not to be a problem, after all.


Thanks again everyone!

Best

DNM


Sent from Outlook<http://aka.ms/weboutlook>
#
Hi David,

Dani's original issue wasn't with the spline part of the model per se 
but fixing the parameter associated with the offset variable at 1. Below 
is an example of fitting a spline to made up wiggly data if that helps.

Cheers,

Jarrod


library(mgcv)
library(MCMCglmm)

spl2<-function(formula, data, p=TRUE, dataX=data){

   aug<-nrow(data)-nrow(dataX)

   if(aug!=0){
     if(aug<0){
       stop("sorry nrow(dataX) must be less than or equal to nrow(data)")
     }else{
       augX<-matrix(0, aug, ncol(dataX))
       colnames(augX)<-colnames(dataX)
       dataX<-rbind(dataX, augX)
     }
   }
   smooth.spec.object<-interpret.gam(formula)$smooth.spec[[1]]
   sm<-smoothCon(smooth.spec.object, data=data, 
knots=NULL,absorb.cons=TRUE, dataX=dataX)[[1]]

   Sed<-eigen(sm$S[[1]])
   Su<-Sed$vectors
   Sd<-Sed$values
   nonzeros <- which(Sd > sqrt(.Machine$double.eps))

   if(p){
     Zn<-sm$X%*%Su[,nonzeros, drop=FALSE]%*%diag(1/sqrt(Sd[nonzeros]))
   }else{
     Zn<-sm$X[,-nonzeros, drop=FALSE]
   }
   return(Zn[1:(nrow(data)-aug),,drop=FALSE])
}

x<-rnorm(200)
y<-3*cos(x*3)+0.6*x^2+x+rnorm(200)
# made up data that `wiggles' with respect to x

dat<-data.frame(y=y,x=x)

dat$xpo<-spl2(~s(x,k=10), data=dat, p=F)

# random effect predictor for non-smoothed spline term

dat$xpn<-spl2(~s(x,k=10), data=dat)
# random effect design matrix for smoothed spline terms

m1<-MCMCglmm(y~xpo, random=~idv(xpn), data=dat, pr=TRUE)
# fit model

p1<-predict(m1, marginal=NULL)
# get spline predictions

plot(y~x)
# plot data

lines(p1[order(x)]~x[order(x)])
# plot spline predictions

lines(3*cos(x[order(x)]*3)+0.6*x[order(x)]^2+x[order(x)]~x[order(x)], lty=2)
# plot actual function
On 27/09/2017 18:58, David Winsemius wrote: