Mixed-model polytomous ordered logistic regression?
On Monday, 03 January 2011 at 19:54 +0100, Rune Haubo wrote:
Hi Emmanuel, You can use the clmm() function in package ordinal to fit cumulative link mixed models, which include the proportional odds mixed model that you mention. The non-mixed equivalent, clm(), gives the same answers as polr() and glm() on your example data. clmm() currently only allows for one random intercept, but Ken Knoblauch proposed a polmer() function (http://finzi.psych.upenn.edu/R-sig-mixed-models/2010q2/003778.html) that seems to be an extension of glmer() similar to your glm() hack, but for mixed models.
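(For reference, a minimal sketch of the corresponding calls; clm() and clmm() are from the ordinal package as described above, but the grouping factor Id in the commented-out clmm() call is purely hypothetical, since the example data below contain no grouping variable:)

library(MASS)     ## polr()
library(ordinal)  ## clm(), clmm()

fit.polr <- polr(Cat ~ X, data = Data)  ## proportional odds, fixed effects only
fit.clm  <- clm(Cat ~ X, data = Data)   ## should give the same estimates as polr()

## Cumulative link mixed model with one random intercept for a hypothetical
## grouping factor Id (lme4-style random-effects syntax):
## fit.clmm <- clmm(Cat ~ X + (1 | Id), data = Data)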
That was my point, and Ken Knoblauch has been kind enough to send me his code, which seems quite good. I have not yet worked on his code (and probably won't for the next month at least), but it's probably a good start. The "800 degrees of freedom" problem still bothers me, but that's another problem (see below).
This allows for several random effects in a glmer() fashion. On the other hand, with clmm() you get a variety of link functions and extensions to scale and nominal effects for modelling non-proportional effects, structured thresholds etc.
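(Again just a sketch of what those extensions look like as calls, assuming a reasonably recent version of the ordinal package:)

clm(Cat ~ X, data = Data, link = "cloglog")           ## alternative link function
clm(Cat ~ X, scale = ~ X, data = Data)                ## scale (dispersion) effects
clm(Cat ~ 1, nominal = ~ X, data = Data)              ## non-proportional (nominal) effects
clm(Cat ~ X, data = Data, threshold = "equidistant")  ## structured thresholds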
Thank you for the hint. I'll have a look at that. I have trouble remembering which of the 2500+ R packages available on CRAN will solve my problem-of-the-moment. Furthermore, I tend to trust more the packages doing something I've personally tinkered with. Hence my current infatuation with BUGS (i.e. JAGS for the time being)...

But my point wasn't to solve a specific problem, but to point out that ordinal data are a sufficiently frequent case to deserve "first-class treatment". I am old enough to have been taught what was then called "the general linear model", which was about the only case where a multivariate model could be realistically fitted. Things tended to expand a bit with censored data and the introduction of the Cox model, then with Poisson and logistic regression. I tend to think of ordered data as another kind of data that should be analysable the same way. That's what polr() partially does (with a limited but wisely chosen set of options). Similarly, mixed models became *practically* fittable first for the dreaded Gaussian case, then for the Poisson, binomial and negative-binomial cases. Ordered data should be analysable the same way. Another case (not directly in my current interests, but one never knows...) is categorical data. Log-linear models allow for a similar treatment, and should also be analysable in the mixed-model case.

A lot is to be learned from the Bayesian models for this kind of problem. I'm currently exploring the (now somewhat outdated) Congdon textbooks on Bayesian modelling, and they raise interesting ideas about what should be doable from a frequentist point of view... On the other hand, I'm more and more inclined to think that the Bayesian point of view is extremely valid. The first difficulty is to convince journals' editors that "p-value is crack" and that a posterior joint distribution says all that can be learned from an experiment/observation, provided the priors are correctly chosen. The second (and harder) is to find ways to express Bayesian probability models more concisely than what is currently allowed by BUGS: one should not have to type dozens of almost-identical snippets of code for each and every intermediate parameter appearing in such a model. My current idea is that some sort of macro facility should help, but it is hard to design *correctly*. The third (and hardest) difficulty is to implement that efficiently. I'm a bit struck to see all "public" BUGS implementations using only one CPU of a 4- or 8-CPU machine, even for parts (model updating of parallel chains) that are "embarrassingly parallel" ((C) D. Spiegelhalter & al.).

In our present case, the Bayesian answer "feels" more reasonable than the answer given by polr/glm, in particular because the variability estimation of the thresholds seems to avoid the artefact induced by a non-centered independent variable...
The idea of reformulating the cumulative link model as a binomial generalized linear model is good, but I haven't seen it described before - can I ask you which exercise, in what edition of Gelman & Hill (2007), mentions this?
Ex 15.1, section 15.5 of Gelman & Hill (2007), p. 342. The printer's info at the beginning of the book (just after the full title page, but I don't know what this page is called in American usage) says it's "First published 2007//Reprinted with corrections 2007//10th printing 2009". I am not aware of any other edition. HTH, Emmanuel Charpentier
Cheers, Rune
On 30 December 2010 23:27, Emmanuel Charpentier <emm.charpentier at free.fr> wrote:
Dear list, Following a hint from an exercise in Gelman & Hill's (2007) regression textbook, I tried to understand how to build a function generalizing to mixed models what polr() (MASS) does for fixed-effects models. I started with a very simple artificial dataset (see at end). Using the simplest polr() call:
summary(polr(Cat~X, data=Data))
Re-fitting to get Hessian
Call:
polr(formula = Cat ~ X, data = Data)
Coefficients:
Value Std. Error t value
X 9.557 1.821 5.247
Intercepts:
Value Std. Error t value
C1|C2 14.8668 3.0447 4.8828
C2|C3 24.2772 4.7119 5.1523
C3|C4 34.2367 6.5861 5.1984
C4|C5 43.3543 8.2390 5.2621
C5|C6 53.8174 10.3240 5.2128
C6|C7 63.5247 12.2352 5.1920
C7|C8 72.5850 13.7899 5.2636
C8|C9 82.2256 15.8144 5.1994
Residual Deviance: 55.7395
AIC: 73.7395
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
The following snippet:

Glm.polr <- local({
    ## Expand the data: one binary row per (observation, threshold) pair,
    ## with Cat recoded as the indicator of being above the threshold.
    ll <- length(lev <- levels(Data$Cat))
    thr <- paste(lev[-ll], "|", lev[-1], sep = "")
    Dataset <- do.call(rbind,
                       lapply(2:ll,
                              function(i) {
                                  data.frame(Thr = thr[i - 1],
                                             Cat = Data$Cat >= lev[i],
                                             Data[, names(Data)[!(names(Data) %in% c("Thr", "Cat"))]])
                              }))
    Dataset$Thr <- factor(Dataset$Thr)
    ## One intercept per threshold, a common slope for X (proportional odds):
    glm(Cat ~ 0 + Thr + X, data = Dataset, family = binomial(link = logit))
})
allowed me to reproduce my reference polr() results (up to numerical precision, with which I didn't tinker, and a couple of cosmetic details: names and sign of the intercepts), both for central values and variability:
summary(Glm.polr)
Call:
glm(formula = Cat ~ 0 + Thr + X, family = binomial(link = logit),
data = Dataset)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.09822 -0.00001 0.00000 0.00000 1.92014
Coefficients:
Estimate Std. Error z value Pr(>|z|)
ThrC1|C2 -14.858 3.050 -4.872 1.11e-06 ***
ThrC2|C3 -24.263 4.722 -5.138 2.78e-07 ***
ThrC3|C4 -34.217 6.600 -5.184 2.17e-07 ***
ThrC4|C5 -43.329 8.257 -5.247 1.54e-07 ***
ThrC5|C6 -53.786 10.346 -5.199 2.01e-07 ***
ThrC6|C7 -63.489 12.260 -5.179 2.24e-07 ***
ThrC7|C8 -72.542 13.820 -5.249 1.53e-07 ***
ThrC8|C9 -82.174 15.852 -5.184 2.18e-07 ***
X 9.551 1.825 5.233 1.67e-07 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1109.035 on 800 degrees of freedom
Residual deviance: 55.737 on 791 degrees of freedom
AIC: 73.737
Number of Fisher Scoring iterations: 12
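(A quick way to check the correspondence between the two parameterisations, as a sketch: polr() reports thresholds zeta_j such that logit P(Y <= j) = zeta_j - beta*X, while the glm() reformulation estimates logit P(Y > j) = Thr_j + beta*X, so each Thr_j should be approximately -zeta_j:)

fit.polr <- polr(Cat ~ X, data = Data)
cbind(polr = c(-fit.polr$zeta, coef(fit.polr)),  ## minus the thresholds, then the slope
      glm  = coef(Glm.polr))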
One should note that "large" thresholds (in absolute value) have larger standard errors, which is probably an artifact that I don't know how to get rid of. The glm claim of "800 degrees of freedom" is ridiculous: I have only 100 observations (the expanded dataset has 8 binary rows per original observation, hence 800 rows, but these rows are obviously not independent). More on this below...
Now, extending this to mixed models (using lme4's notation as a model) is not absolutely trivial:

The "fixed effect" part of the formula must *not* have an intercept (that's what the thresholds become): how to ensure this?

In the "random effect" part, the intercepts can be replaced by the thresholds (one can think of legitimate reasons for which thresholds should depend on a random factor...). Should that also be done for implicit intercepts? E.g. (X | Id) implicitly means, in a glmer() formula, "variation of the slope of X AND of the intercept according to Id"; should that become "variation of the slope AND of the thresholds according to Id"?

How should the tinkering with the original data be handled? Systematically copying the original data can be heavy, but is safe. Can the built-in "copy-on-write" mechanisms of R be sufficient?

What "other" models would be useful in such a function? Cloglog? "Continuation"? Others?

More generally, shouldn't such a function (polrer?) be part of some hypothetical (Platonic?) lme4? Ordered data are important in many cases where numerical expressions cannot be trusted, and the ability to fit mixed models to these data would be important. But doing this on a case-by-case basis is somewhat error-prone (especially when done by me :).
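For what it's worth, here is a hedged sketch of what the glmer()-based extension could look like, assuming the lme4 package, a purely hypothetical grouping factor Id (the example data below have none), and a data frame expanded as in the Glm.polr snippet above (here called Dataset, with an Id column added):

library(lme4)
## Random intercept per Id, common thresholds and slope:
glmer(Cat ~ 0 + Thr + X + (1 | Id),
      data = Dataset, family = binomial(link = "logit"))
## Letting the thresholds themselves vary between levels of Id would use
## (0 + Thr | Id) instead of (1 | Id), as discussed above.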
For now, I cope with BUGS and the Bayesian interpretation. And herein lies a possibly important remark.
My BUGS code (JAGS dialect, inspired by the "Bones" example of Classic BUGS vol. I) and the R call (through rjags) are at the end.
The results call for some comments. Both graphical examination and some
formal inspection show good convergence (the predictions are somewhat
harder to stabilize...):
gelman.diag(Mod$Mod.coda[,-(1:100)]) ## First 100 columns are
## predicted values for the
## original data.
Potential scale reduction factors:
Point est. 97.5% quantile
beta.1 1.00 1.00
deviance 1.00 1.00
gamma[1] 1.00 1.00
gamma[2] 1.00 1.00
gamma[3] 1.00 1.01
gamma[4] 1.00 1.00
gamma[5] 1.00 1.00
gamma[6] 1.00 1.00
gamma[7] 1.00 1.00
gamma[8] 1.00 1.00
Multivariate psrf
1.00
The fitted values and their variability are cosmetically quite different (the Bayesian model's gamma[j] is the equivalent of the ratio threshold/(X coefficient) of the polr/glm frequentist model), but their central values are quite consistent with the previous ones:
summary(Mod$Mod.coda[,-(1:100)])
Iterations = 11010:21000
Thinning interval = 10
Number of chains = 3
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
beta.1 8.927 1.68312 0.030729 0.027433
deviance 64.674 4.26684 0.077902 0.087455
gamma[1] 1.544 0.08815 0.001609 0.001649
gamma[2] 2.544 0.10957 0.002000 0.002065
gamma[3] 3.580 0.09930 0.001813 0.001708
gamma[4] 4.544 0.08445 0.001542 0.001607
gamma[5] 5.630 0.11998 0.002191 0.002137
gamma[6] 6.627 0.12406 0.002265 0.002463
gamma[7] 7.606 0.11175 0.002040 0.002019
gamma[8] 8.608 0.18558 0.003388 0.003343
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
beta.1 6.180 7.741 8.799 9.878 12.582
deviance 58.271 61.458 64.116 67.112 74.836
gamma[1] 1.363 1.489 1.548 1.604 1.706
gamma[2] 2.333 2.469 2.541 2.616 2.765
gamma[3] 3.386 3.512 3.582 3.647 3.779
gamma[4] 4.385 4.487 4.543 4.600 4.713
gamma[5] 5.406 5.545 5.630 5.713 5.870
gamma[6] 6.358 6.550 6.633 6.715 6.851
gamma[7] 7.392 7.529 7.603 7.682 7.821
gamma[8] 8.266 8.470 8.608 8.738 8.967
One should note, however, that, while the central value and sampling SD of beta are consistent with the polr/glm results, the SD/mean ratios of the thresholds are much smaller than with polr/glm, and this is reflected in the credible intervals. Furthermore, the SDs do not seem to increase as much with the absolute value of the central values. (Yes, I am aware that centering my X values would reduce this phenomenon, but I kept them uncentered to illustrate my point.)
For example, the 0.95 confidence interval of the C1|C2 threshold is about 11.9 wide (about 80% of its point estimate), whereas the 0.95 credible interval of gamma[1] is (1.36, 1.71), 0.34 wide, which is about 22% of its mean. For the C8|C9 threshold, the CI95/mean ratio is about 75%, while the CrI95/mean ratio of gamma[8] is about 8%.
Given that the two models are respectively a frequentist and a Bayesian fit of the very same probability model, this is perplexing.
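(A quick way to tabulate these relative widths directly; a sketch that assumes the Glm.polr and Mod objects created by the snippets in this message:)

## 95% Wald CIs from the glm reformulation, relative to the point estimates:
ci <- confint.default(Glm.polr)
rel.ci <- (ci[, 2] - ci[, 1]) / abs(coef(Glm.polr))
## 95% credible intervals from the JAGS fit, relative to the posterior medians:
q <- summary(Mod$Mod.coda[, -(1:100)])$quantiles
g <- grep("gamma", rownames(q))
rel.cri <- (q[g, "97.5%"] - q[g, "2.5%"]) / abs(q[g, "50%"])
round(cbind(rel.ci = rel.ci[1:8], rel.cri = rel.cri), 2)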
Are there reasons to suspect that the SDs reported by glm and/or polr are inflated (for example, these models do not seem to account for the re-use of the same data for the estimation of the various thresholds)? Or is there a (maybe not so subtle) error in the Bayesian interpretation of the model in BUGS?
Or am I barking up the wrong tree ?
Any idea would be welcome ...
Sincerely yours,
Emmanuel Charpentier
The artificial data:
Data <-
structure(list(X.real = c(3.49764437862872, 6.90659633160501,
2.49335390384994, 7.24293843863257, 0.803964727576124, 4.04213878877333,
1.06214939381180, 0.554054259416088, 4.08729099177371, 2.52629057633986,
3.24127380430347, 1.11891797458651, 0.530158326675324, 9.13663197362062,
4.50887509479721, 8.28526961997477, 4.88346597634514, 7.1687905770683,
8.30770739436844, 4.10970688126136, 0.926142793485848, 1.60153659527237,
7.9980129347903, 7.09517767824342, 3.13154316634710, 3.73947230927693,
2.80221854118421, 6.59984440492743, 2.61928305396566, 7.01599863310761,
2.24968126253923, 1.58829945203618, 1.90902984485251, 5.12920046748922,
9.2118070532461, 9.08208307216233, 4.43550132482185, 8.05729871351512,
1.18297174176127, 8.46910981252808, 4.43676303030525, 6.4418454794069,
4.53446029962874, 0.699769024165557, 8.46097165332483, 4.53799448354545,
5.27431575051651, 7.54335240307173, 2.17309956012568, 5.88721264347949,
9.4071993099453, 1.88089249201845, 5.3721556593878, 0.771888320064126,
8.1072438324685, 6.31264339382069, 2.00734973388480, 1.55494366180340,
7.02181982951623, 2.30599238507652, 0.527109325928194, 3.66541005952039,
5.5545317236498, 1.09950173374848, 0.589535171670769, 3.53540683843517,
4.45889129379049, 6.80170522713363, 7.37081649369139, 9.4889221652781,
7.19282466718394, 6.13643707306052, 1.35088176999453, 3.16405426710045,
3.32833977017296, 6.08074276163767, 1.63873091713528, 1.46557732912316,
4.91770214161358, 0.938513984685997, 1.48740958479596, 2.01597792723018,
4.21291073575364, 5.43887605123115, 4.70221174192061, 1.67668472528311,
5.24846045233372, 9.0111473276549, 7.23671006856042, 3.72080675801418,
4.39171014756293, 4.81283564404103, 2.41342165250038, 3.82321065255632,
7.669782577978, 9.2234752201292, 0.641411897606775, 1.13470612170815,
1.02966988829139, 1.24689684186557), Cat = structure(c(3L, 7L,
2L, 7L, 1L, 4L, 1L, 1L, 4L, 3L, 3L, 1L, 1L, 9L, 5L, 8L, 5L, 7L,
8L, 4L, 1L, 2L, 8L, 7L, 3L, 4L, 3L, 7L, 3L, 7L, 2L, 2L, 2L, 5L,
9L, 9L, 4L, 8L, 1L, 8L, 4L, 6L, 5L, 1L, 8L, 5L, 5L, 8L, 2L, 6L,
9L, 2L, 5L, 1L, 8L, 6L, 2L, 2L, 7L, 2L, 1L, 4L, 6L, 1L, 1L, 4L,
4L, 7L, 7L, 9L, 7L, 6L, 1L, 3L, 3L, 6L, 2L, 1L, 5L, 1L, 1L, 2L,
4L, 5L, 5L, 2L, 5L, 9L, 7L, 4L, 4L, 5L, 2L, 4L, 8L, 9L, 1L, 1L,
1L, 1L), .Label = c("C1", "C2", "C3", "C4", "C5", "C6", "C7",
"C8", "C9"), class = c("ordered", "factor")), X = c(3.78512255820666,
7.03710649735623, 2.37579547880536, 7.01687069036889, 0.572774665466921,
4.53333128338774, 0.92564311992818, 0.682074690286214, 4.17197303371935,
2.62951212025962, 3.44249595134830, 1.02688214798057, 0.440959503725193,
9.05785834195666, 4.52616407157106, 8.07448237191856, 4.95371782896727,
7.35730414103575, 8.07638469674326, 4.26407495806574, 1.06385684713162,
1.63333280184271, 7.9246178454581, 7.41541196525404, 3.12459209709868,
3.64101167776984, 2.51917629411974, 6.8118629850261, 2.77786400733997,
7.36566898280035, 2.03009770691198, 1.61723070290973, 1.96657910701987,
5.22600151472831, 9.21386030948403, 9.18902347324618, 4.39493834305473,
8.27266371630105, 1.15477068640133, 8.15280513569419, 4.6734050583675,
6.51664088906166, 4.40955269905174, 0.54646886424704, 8.03605092676131,
4.36779957410513, 5.44860695752003, 7.83429549417553, 1.84916430642593,
5.81109382387844, 9.32133297147632, 1.73276033539495, 5.3490390105937,
0.695781216499173, 8.05603266764859, 6.67741469458807, 2.03017543225991,
1.60754713124494, 6.63481861913185, 2.20004584719421, 0.675994410762504,
3.98953568167253, 5.45631429591027, 0.675175153754915, 0.305522085250977,
3.58440905189674, 4.58557708778191, 6.99911313460102, 7.21102458968007,
9.44650050070282, 7.33311596756764, 5.87485729872331, 1.11824133536435,
3.40701638829918, 3.36818213997361, 5.92742193489156, 2.08701427718401,
1.73709762230381, 4.71589786549998, 0.980248052739725, 1.58957599464140,
2.17701413093404, 4.22999685414555, 5.84947271423299, 4.94396458250251,
1.65133703600731, 5.17336780283038, 8.8993318875765, 7.29238547003421,
3.62020801752199, 4.28550112286315, 4.95543088867053, 2.61555021108721,
3.85711807608966, 7.67091954093986, 9.24609859359625, 0.436967673925767,
1.32063426377087, 0.794911931890139, 1.16453130413352)), .Names = c
("X.real",
"Cat", "X"), row.names = c(NA, -100L), class = "data.frame")
The R/BUGS Bayesian model:

library(rjags)  ## jags.model(), coda.samples()

system.time(Mod <- local({
    Mod <- function() { ## BUGS model written as an R function. Easy to manage.
        ## Thresholds: vague priors, sorted to enforce the ordering constraint.
        for (j in 1:ncut) {
            gamma.raw[j] ~ dnorm(0, 1.0E-4)
        }
        gamma <- sort(gamma.raw)
        ## p[i,j] = Pr(Cat[i] > j)
        ## q[i,j] = Pr(Cat[i] = j)
        ##        = Pr(Cat[i] > j-1) - Pr(Cat[i] > j)
        ##        = p[i,j-1] - p[i,j]
        for (i in 1:nobs) {
            Cat[i] ~ dcat(q[i, 1:(ncut+1)])
            ## Predictions
            Cat.new[i] ~ dcat(q[i, 1:(ncut+1)])
            q[i, 1] <- 1 - p[i, 1]
            for (j in 2:ncut) {
                q[i, j] <- p[i, j-1] - p[i, j]
            }
            q[i, ncut+1] <- p[i, ncut]
            for (j in 1:ncut) {
                logit(p[i, j]) <- beta.1 * (X[i] - gamma[j])
            }
        }
        beta.1 ~ dnorm(0, 1.0E-4)
    }
    tmpf <- tempfile()
    write.model(Mod, tmpf) ## Auxiliary function (e.g. from R2WinBUGS) writing the function body to a file
    Mod.jags <- jags.model(tmpf,
                           data = with(Data, {
                               Y <- outer(Cat, levels(Cat)[-1], ">=")
                               list(X = X,
                                    Cat = unclass(Cat),
                                    nobs = nrow(Y),
                                    ncut = ncol(Y))
                           }),
                           ## "Reasonable" (not too overdispersed) initial
                           ## values are *CRUCIAL* to model initialisation
                           inits = function() {
                               list(gamma.raw = sort(runif(8, min = -1, max = 1)),
                                    beta.1 = runif(1, min = 0, max = 1))
                           },
                           n.chains = 3)
    update(Mod.jags, n.iter = 10000)
    unlink(tmpf)
    Mod.coda <- coda.samples(Mod.jags,
                             variable.names = c("gamma",
                                                ## "beta.0",
                                                "beta.1",
                                                ## "gamma.adj",
                                                ## "lzeta",
                                                ## "pi.sgn",
                                                "deviance",
                                                "Cat.new"),
                             thin = 10,
                             n.iter = 10000)
    list(Mod.jags = Mod.jags, Mod.coda = Mod.coda)
}))
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models