Mixed-model polytomous ordered logistic regression?
Hi Emmanuel,

You can use the clmm() function in package ordinal to fit cumulative link mixed models, which include the proportional odds mixed model you mention. The non-mixed equivalent, clm(), gives the same answers as polr() and glm() on your example data. clmm() currently only allows for one random intercept, but Ken Knoblauch proposed a polmer() function (http://finzi.psych.upenn.edu/R-sig-mixed-models/2010q2/003778.html) that seems to be an extension of glmer() similar to your glm() hack, but for mixed models. This allows for several random effects in a glmer() fashion. On the other hand, with clmm() you get a variety of link functions, plus extensions to scale and nominal effects for modelling non-proportional effects, structured thresholds, etc.

The idea of reformulating the cumulative link model as a binomial generalized linear model is a good one, but I haven't seen it described before -- may I ask which exercise, in which edition of Gelman & Hill (2007), mentions this?

Cheers,
Rune
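For concreteness, a minimal sketch of the clm() call on the example data (assuming the ordinal package is installed, and Data as defined at the end of the quoted message):

```r
library(ordinal)  # provides clm() and clmm()

## Cumulative link (proportional odds) model with the default logit link;
## this should reproduce the polr() thresholds and X coefficient quoted below.
fm <- clm(Cat ~ X, data = Data)
summary(fm)
```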
On 30 December 2010 23:27, Emmanuel Charpentier <emm.charpentier at free.fr> wrote:
Dear list,

Following a hint from an exercise in Gelman & Hill's (2007) regression textbook, I tried to work out how to build a function generalizing to mixed models what polr() (MASS) does for fixed-effects models. I started with a very simple artificial dataset (see at end). Using the simplest polr() call:
summary(polr(Cat~X, data=Data))
Re-fitting to get Hessian
Call:
polr(formula = Cat ~ X, data = Data)
Coefficients:
      Value Std. Error t value
X     9.557      1.821   5.247
Intercepts:
      Value   Std. Error t value
C1|C2 14.8668  3.0447     4.8828
C2|C3 24.2772  4.7119     5.1523
C3|C4 34.2367  6.5861     5.1984
C4|C5 43.3543  8.2390     5.2621
C5|C6 53.8174 10.3240     5.2128
C6|C7 63.5247 12.2352     5.1920
C7|C8 72.5850 13.7899     5.2636
C8|C9 82.2256 15.8144     5.1994
Residual Deviance: 55.7395
AIC: 73.7395
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
the following snippet:
Glm.polr <- local({
  ll <- length(lev <- levels(Data$Cat))
  thr <- paste(lev[-ll], "|", lev[-1], sep="")
  Dataset <- do.call(rbind,
                     lapply(2:ll,
                            function(i) {
                              data.frame(Thr=thr[i-1],
                                         Cat=Data$Cat >= lev[i],
                                         Data[, names(Data)[!(names(Data) %in%
                                                              c("Thr", "Cat"))]])}))
  Dataset$Thr <- factor(Dataset$Thr)
  glm(Cat ~ 0 + Thr + X, data=Dataset, family=binomial(link=logit))
})
allowed me to reproduce the standard results (up to numerical precision, which I did not tinker with, and a couple of cosmetic details: the names and signs of the intercepts), both for the central values and their variability:
summary(Glm.polr)
Call:
glm(formula = Cat ~ 0 + Thr + X, family = binomial(link = logit),
    data = Dataset)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.09822  -0.00001   0.00000   0.00000   1.92014

Coefficients:
         Estimate Std. Error z value Pr(>|z|)
ThrC1|C2  -14.858      3.050  -4.872 1.11e-06 ***
ThrC2|C3  -24.263      4.722  -5.138 2.78e-07 ***
ThrC3|C4  -34.217      6.600  -5.184 2.17e-07 ***
ThrC4|C5  -43.329      8.257  -5.247 1.54e-07 ***
ThrC5|C6  -53.786     10.346  -5.199 2.01e-07 ***
ThrC6|C7  -63.489     12.260  -5.179 2.24e-07 ***
ThrC7|C8  -72.542     13.820  -5.249 1.53e-07 ***
ThrC8|C9  -82.174     15.852  -5.184 2.18e-07 ***
X           9.551      1.825   5.233 1.67e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1109.035  on 800  degrees of freedom
Residual deviance:   55.737  on 791  degrees of freedom
AIC: 73.737

Number of Fisher Scoring iterations: 12

One should note that "large" thresholds (in absolute value) have larger standard errors, which is probably an artifact I don't know how to get rid of. The glm claim of "800 degrees of freedom" is ridiculous: I have only 100 observations (each expanded into 8 indicator rows). More on this below...

Now, extending this to mixed models (using lme4's notation as a model) is not entirely trivial:

- The "fixed effect" part of the formula must *not* have an intercept (that is what the thresholds become): how can this be ensured?
- In the "random effect" part, the intercepts can be replaced by the thresholds (one can think of legitimate reasons why thresholds should depend on a random factor...). Should that also be done for implicit thresholds? E.g. (X | Id) implicitly means, in a GLM, "variation of the slope of X AND of the intercept according to Id"; should that become "variation of the slope AND of the thresholds according to Id"?
- How should the tinkering with the original data be handled?
Systematically copying the original data can be heavy, but it is safe. Would R's built-in "copy-on-write" mechanisms be sufficient?

What "other" models would be useful in such a function? Cloglog? "Continuation"? Others?

More generally, shouldn't such a function (polrer?) be part of some hypothetical (Platonic?) lme4? Ordered data are important in many cases where numerical expression cannot be trusted, and the ability to fit mixed models to such data would be important. But doing this on a case-by-case basis is somewhat error-prone (especially when done by me :).

For now, I cope with BUGS and a Bayesian interpretation. And herein lies a possibly important remark. My BUGS code (JAGS dialect, inspired by the "Bones" example of Classic BUGS vol. I) and the R call (through rjags) are at the end. The results call for some comments.

Both graphical examination and some formal inspection show good convergence (the predictions are somewhat harder to stabilize...):
gelman.diag(Mod$Mod.coda[,-(1:100)]) ## First 100 columns are
                                     ## predicted values for the
                                     ## original data.

Potential scale reduction factors:

         Point est. 97.5% quantile
beta.1         1.00           1.00
deviance       1.00           1.00
gamma[1]       1.00           1.00
gamma[2]       1.00           1.00
gamma[3]       1.00           1.01
gamma[4]       1.00           1.00
gamma[5]       1.00           1.00
gamma[6]       1.00           1.00
gamma[7]       1.00           1.00
gamma[8]       1.00           1.00

Multivariate psrf

1.00

The fitted parameters and their variability are cosmetically quite different (the Bayesian model's gamma is the equivalent of the Threshold/X-coefficient ratio of the polr/glm frequentist model), but their central values are quite consistent with the previous ones:
summary(Mod$Mod.coda[,-(1:100)])
Iterations = 11010:21000
Thinning interval = 10
Number of chains = 3
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

           Mean      SD Naive SE Time-series SE
beta.1    8.927 1.68312 0.030729       0.027433
deviance 64.674 4.26684 0.077902       0.087455
gamma[1]  1.544 0.08815 0.001609       0.001649
gamma[2]  2.544 0.10957 0.002000       0.002065
gamma[3]  3.580 0.09930 0.001813       0.001708
gamma[4]  4.544 0.08445 0.001542       0.001607
gamma[5]  5.630 0.11998 0.002191       0.002137
gamma[6]  6.627 0.12406 0.002265       0.002463
gamma[7]  7.606 0.11175 0.002040       0.002019
gamma[8]  8.608 0.18558 0.003388       0.003343
2. Quantiles for each variable:
           2.5%    25%    50%    75%  97.5%
beta.1    6.180  7.741  8.799  9.878 12.582
deviance 58.271 61.458 64.116 67.112 74.836
gamma[1]  1.363  1.489  1.548  1.604  1.706
gamma[2]  2.333  2.469  2.541  2.616  2.765
gamma[3]  3.386  3.512  3.582  3.647  3.779
gamma[4]  4.385  4.487  4.543  4.600  4.713
gamma[5]  5.406  5.545  5.630  5.713  5.870
gamma[6]  6.358  6.550  6.633  6.715  6.851
gamma[7]  7.392  7.529  7.603  7.682  7.821
gamma[8]  8.266  8.470  8.608  8.738  8.967
One should note, however, that while the central value and sampling SD of beta are consistent with the polr/glm results, the SD/mean ratios of the thresholds are much smaller than with polr/glm, and this is reflected in the credible intervals. Furthermore, the SDs do not seem to increase as much with the absolute value of the central values. (Yes, I am aware that centering my X values would reduce this phenomenon, but I kept them uncentered to illustrate my point.)
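As an aside, the effect of centering just mentioned is easy to check; a minimal sketch, assuming the Data and polr() setup above:

```r
## Refit with X centered; the thresholds shift by beta * mean(X) toward zero,
## and their standard errors should no longer inflate with their magnitude.
Data$Xc <- Data$X - mean(Data$X)
summary(polr(Cat ~ Xc, data = Data))
```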
For example, the 0.95 confidence interval of the C1|C2 threshold is about 6.1 wide (about 41% of the mean), whereas the 0.95 credible interval of gamma[1] is (1.36, 1.70), 0.34 wide, which is about 25% of its mean. For the C8|C9 threshold, the CI95-width/mean ratio is about 37%, while the CrI95-width/mean ratio of gamma[8] is 8.1%.
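One caveat in this comparison: the JAGS model parameterizes logit(p[i,j]) = beta.1*(X[i] - gamma[j]), so the quantities comparable to polr's thresholds are beta.1*gamma[j], not the gamma[j] themselves, and SD/mean ratios need not be preserved under that rescaling. A sketch of putting both fits on the same scale (assuming Mod$Mod.coda as above):

```r
## Posterior samples of beta.1*gamma[j], directly comparable to the
## polr/glm thresholds (14.87, ..., 82.23).
sims <- as.matrix(Mod$Mod.coda[, -(1:100)])
thr  <- sims[, "beta.1"] * sims[, paste("gamma[", 1:8, "]", sep = "")]
colMeans(thr)      # central values on the polr scale
apply(thr, 2, sd)  # posterior SDs on the polr scale
```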
Given that the two models are, respectively, a frequentist and a Bayesian fit of the very same probability model, this is perplexing.
Are there reasons to suspect that the SDs reported by glm and/or polr are inflated (for example, these models do not seem to account for the re-use of the same data in the estimation of the various thresholds)? Or is there a (maybe not so subtle) error in the Bayesian interpretation of the model in BUGS?
Or am I barking up the wrong tree ?
Any idea would be welcome ...
Sincerely yours,
                                        Emmanuel Charpentier
The artificial data :
Data <-
structure(list(X.real = c(3.49764437862872, 6.90659633160501,
2.49335390384994, 7.24293843863257, 0.803964727576124, 4.04213878877333,
1.06214939381180, 0.554054259416088, 4.08729099177371, 2.52629057633986,
3.24127380430347, 1.11891797458651, 0.530158326675324, 9.13663197362062,
4.50887509479721, 8.28526961997477, 4.88346597634514, 7.1687905770683,
8.30770739436844, 4.10970688126136, 0.926142793485848, 1.60153659527237,
7.9980129347903, 7.09517767824342, 3.13154316634710, 3.73947230927693,
2.80221854118421, 6.59984440492743, 2.61928305396566, 7.01599863310761,
2.24968126253923, 1.58829945203618, 1.90902984485251, 5.12920046748922,
9.2118070532461, 9.08208307216233, 4.43550132482185, 8.05729871351512,
1.18297174176127, 8.46910981252808, 4.43676303030525, 6.4418454794069,
4.53446029962874, 0.699769024165557, 8.46097165332483, 4.53799448354545,
5.27431575051651, 7.54335240307173, 2.17309956012568, 5.88721264347949,
9.4071993099453, 1.88089249201845, 5.3721556593878, 0.771888320064126,
8.1072438324685, 6.31264339382069, 2.00734973388480, 1.55494366180340,
7.02181982951623, 2.30599238507652, 0.527109325928194, 3.66541005952039,
5.5545317236498, 1.09950173374848, 0.589535171670769, 3.53540683843517,
4.45889129379049, 6.80170522713363, 7.37081649369139, 9.4889221652781,
7.19282466718394, 6.13643707306052, 1.35088176999453, 3.16405426710045,
3.32833977017296, 6.08074276163767, 1.63873091713528, 1.46557732912316,
4.91770214161358, 0.938513984685997, 1.48740958479596, 2.01597792723018,
4.21291073575364, 5.43887605123115, 4.70221174192061, 1.67668472528311,
5.24846045233372, 9.0111473276549, 7.23671006856042, 3.72080675801418,
4.39171014756293, 4.81283564404103, 2.41342165250038, 3.82321065255632,
7.669782577978, 9.2234752201292, 0.641411897606775, 1.13470612170815,
1.02966988829139, 1.24689684186557), Cat = structure(c(3L, 7L,
2L, 7L, 1L, 4L, 1L, 1L, 4L, 3L, 3L, 1L, 1L, 9L, 5L, 8L, 5L, 7L,
8L, 4L, 1L, 2L, 8L, 7L, 3L, 4L, 3L, 7L, 3L, 7L, 2L, 2L, 2L, 5L,
9L, 9L, 4L, 8L, 1L, 8L, 4L, 6L, 5L, 1L, 8L, 5L, 5L, 8L, 2L, 6L,
9L, 2L, 5L, 1L, 8L, 6L, 2L, 2L, 7L, 2L, 1L, 4L, 6L, 1L, 1L, 4L,
4L, 7L, 7L, 9L, 7L, 6L, 1L, 3L, 3L, 6L, 2L, 1L, 5L, 1L, 1L, 2L,
4L, 5L, 5L, 2L, 5L, 9L, 7L, 4L, 4L, 5L, 2L, 4L, 8L, 9L, 1L, 1L,
1L, 1L), .Label = c("C1", "C2", "C3", "C4", "C5", "C6", "C7",
"C8", "C9"), class = c("ordered", "factor")), X = c(3.78512255820666,
7.03710649735623, 2.37579547880536, 7.01687069036889, 0.572774665466921,
4.53333128338774, 0.92564311992818, 0.682074690286214, 4.17197303371935,
2.62951212025962, 3.44249595134830, 1.02688214798057, 0.440959503725193,
9.05785834195666, 4.52616407157106, 8.07448237191856, 4.95371782896727,
7.35730414103575, 8.07638469674326, 4.26407495806574, 1.06385684713162,
1.63333280184271, 7.9246178454581, 7.41541196525404, 3.12459209709868,
3.64101167776984, 2.51917629411974, 6.8118629850261, 2.77786400733997,
7.36566898280035, 2.03009770691198, 1.61723070290973, 1.96657910701987,
5.22600151472831, 9.21386030948403, 9.18902347324618, 4.39493834305473,
8.27266371630105, 1.15477068640133, 8.15280513569419, 4.6734050583675,
6.51664088906166, 4.40955269905174, 0.54646886424704, 8.03605092676131,
4.36779957410513, 5.44860695752003, 7.83429549417553, 1.84916430642593,
5.81109382387844, 9.32133297147632, 1.73276033539495, 5.3490390105937,
0.695781216499173, 8.05603266764859, 6.67741469458807, 2.03017543225991,
1.60754713124494, 6.63481861913185, 2.20004584719421, 0.675994410762504,
3.98953568167253, 5.45631429591027, 0.675175153754915, 0.305522085250977,
3.58440905189674, 4.58557708778191, 6.99911313460102, 7.21102458968007,
9.44650050070282, 7.33311596756764, 5.87485729872331, 1.11824133536435,
3.40701638829918, 3.36818213997361, 5.92742193489156, 2.08701427718401,
1.73709762230381, 4.71589786549998, 0.980248052739725, 1.58957599464140,
2.17701413093404, 4.22999685414555, 5.84947271423299, 4.94396458250251,
1.65133703600731, 5.17336780283038, 8.8993318875765, 7.29238547003421,
3.62020801752199, 4.28550112286315, 4.95543088867053, 2.61555021108721,
3.85711807608966, 7.67091954093986, 9.24609859359625, 0.436967673925767,
1.32063426377087, 0.794911931890139, 1.16453130413352)), .Names = c("X.real",
"Cat", "X"), row.names = c(NA, -100L), class = "data.frame")
The R/BUGS Bayesian model :
system.time(Mod <- local({
  Mod <- function() { ## BUGS model as an R function. Easy to manage.
    ## Thresholds.
    for (j in 1:ncut) {
      gamma.raw[j] ~ dnorm(0, 1.0E-4)
    }
    gamma <- sort(gamma.raw)
    ## p[i,j] = Pr(Cat[i]>j)
    ## q[i,j] = Pr(Cat[i]=j)
    ##        = Pr(Cat[i]>j-1) - Pr(Cat[i]>j)
    ##        = p[i,j-1] - p[i,j]
    for (i in 1:nobs) {
      Cat[i] ~ dcat(q[i,1:(ncut+1)])
      ## Predictions
      Cat.new[i] ~ dcat(q[i,1:(ncut+1)])
      q[i,1] <- 1 - p[i,1]
      for (j in 2:ncut) {
        q[i,j] <- p[i,j-1] - p[i,j]
      }
      q[i,ncut+1] <- p[i,ncut]
      for (j in 1:ncut) {
        logit(p[i,j]) <- beta.1*(X[i] - gamma[j])
      }
    }
    beta.1 ~ dnorm(0, 1.0E-4)
  }
  tmpf <- tempfile()
  write.model(Mod, tmpf) ## Auxiliary function writing a function body
  Mod.jags <- jags.model(tmpf,
                         data=with(Data, {
                           Y <- outer(Cat, levels(Cat)[-1], ">=")
                           list(X=X,
                                Cat=unclass(Cat),
                                nobs=nrow(Y),
                                ncut=ncol(Y))}),
                         ## "Reasonable" (not too overdispersed) initial
                         ## values are *CRUCIAL* to model initialisation
                         inits=function() {
                           list(gamma.raw=sort(runif(8, min=-1, max=1)),
                                beta.1=runif(1, min=0, max=1))
                         },
                         n.chains=3)
  update(Mod.jags, n.iter=10000)
  unlink(tmpf)
  Mod.coda <- coda.samples(Mod.jags,
                           variable.names=c("gamma",
                             ## "beta.0",
                             "beta.1",
                             ## "gamma.adj",
                             ## "lzeta",
                             ## "pi.sgn",
                             "deviance",
                             "Cat.new"),
                           thin=10,
                           n.iter=10000)
  list(Mod.jags=Mod.jags, Mod.coda=Mod.coda)
}))
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Rune Haubo Bojesen Christensen
PhD Student, M.Sc. Eng.
Phone: (+45) 45 25 33 63
Mobile: (+45) 30 26 45 54
DTU Informatics, Section for Statistics
Technical University of Denmark, Build. 305, Room 122,
DK-2800 Kgs. Lyngby, Denmark