hypothetical prediction after polr
Hi:
I think the problem is that you're trying to append the predicted
probabilities as a new variable in the (one-line) data frame, when in
fact a vector of probabilities is output ( = number of ordered levels
of the response) for each new observation. Here's a reproducible
example hacked from the faraway package that shows a few ways to deal
with the problem.
library("MASS")
library('faraway')
# Some messing around with the nes96 data to coarsen the factor levels
# of the response and to change income from an ordered factor to
# numeric by replacing factor levels with the midpoint incomes in
# each class. The result is still ordered.
nes96$sPID <- nes96$PID
levels(nes96$sPID) <- c('Dem', 'Dem', 'Ind', 'Ind', 'Ind', 'Rep', 'Rep')
incavg <- c(1.5,4,6,8,9.5,10.5,11.5,12.5,13.5,14.5,16,18.5,21,23.5,27.5,
32.5,37.5,42.5,47.5,55,67.5,82.5,97.5,115)
nes96$nincome <- incavg[unclass(nes96$income)]
# Fit the model:
pomod <- polr(sPID ~ age + educ + nincome, data = nes96)
# Create a new data frame for prediction
ndat <- data.frame(age = 40, educ = 'BAdeg', nincome = mean(nincome))
# Predict: result is a vector rather than a scalar
predict(pomod, newdata = ndat, type = 'probs')
Dem Ind Rep
0.3787940 0.2663598 0.3548461
# Fails because you're trying to append a new column of length three
# to a data frame with one row
ndat$predprob <- predict(pomod, newdata = ndat, type = 'probs')
Error in `$<-.data.frame`(`*tmp*`, "predprob", value = c(0.378794008534862, :
replacement has 3 rows, data has 1
# Better: cbind the predictions to the prediction data frame
# Method 1: for one new observation, cbind() creates a three-line data frame
cbind(ndat, predprob = predict(pomod, newdata = ndat, type = 'probs'))
age educ nincome predprob
Dem 40 BAdeg 46.57574 0.3787940
Ind 40 BAdeg 46.57574 0.2663598
Rep 40 BAdeg 46.57574 0.3548461
# Method 2: One-line data frame output for one-line newdata input
# Since predict() outputs a numeric vector, first coerce it into a list
# and then to a one-line data frame. Then cbind() returns a one line
# data frame.
cbind(ndat,
predprob = as.data.frame(as.list(predict(pomod, newdata = ndat,
type = 'probs'))))
# age educ nincome predprob.Dem predprob.Ind predprob.Rep
#1 40 BAdeg 46.57574 0.378794 0.2663598 0.3548461
# Now try with multiple new observations:
ndat2 <- data.frame(age = c(35, 40), educ = c('HS', 'BAdeg'), nincome
= mean(nincome))
cbind(ndat2, predprob = predict(pomod, newdata = ndat2, type = 'probs'))
age educ nincome predprob.Dem predprob.Ind predprob.Rep
1 35 HS 46.57574 0.4261833 0.2627280 0.3110888
2 40 BAdeg 46.57574 0.3787940 0.2663598 0.3548461
So method 2 is consistent with what you would get from predicting
multiple new observations.
HTH,
Dennis
On Tue, Oct 18, 2011 at 6:49 PM, Xu Jun <junxu.r at gmail.com> wrote:
Dear R-Help listers,
I am trying to estimate an proportional odds logistic regression model
(or ordered logistic regression) and then make predictions by
supplying a hypothetical x vector. However, somehow this does not
work. I guess I must have missed something here. I first used the polr
function in the MASS package, and I create a data frame and supply it
to the predict function (see below):
###############################################################
myologit <- polr(factor(warm) ~ yr89 + male + white + age + ed + prst,
? ? ? ? ? ? ?data=ordwarm2, method=c("logistic"))
yr89 ?<- c(1)
male ?<- c(1)
white <- c(1)
?age <- c(mean(ordwarm2$age))
?ed ?<- c(mean(ordwarm2$ed))
prst ?<- c(mean(ordwarm2$prst))
prdata <- data.frame(yr89, male, white, age, ed, prst)
prdata$rankR <-predict(myologit,newdata=prdata,type="probs")
################################################################
I do not have any problem estimating the model, but when it comes to
the last time (predict), I got the following message:
Error in `$<-.data.frame`(`*tmp*`, "rankR", value = c(0.124294963178868, ?:
?replacement has 4 rows, data has 1
Can anyone help me out with this error here? Thanks a lot!
Jun Xu, Phd
Assistant Professor
Department of Sociology
Ball State University
P.S.: below is the detailed output from R
################################################################
Call:
polr(formula = factor(warm) ~ yr89 + male + white + age + ed +
? ?prst, data = ordwarm2, method = c("logistic"))
Coefficients:
? ? ? ? ?Value Std. Error t value
yr89 ? 0.523912 ? 0.079899 ? 6.557
male ?-0.733309 ? 0.078483 ?-9.344
white -0.391140 ? 0.118381 ?-3.304
age ? -0.021666 ? 0.002469 ?-8.777
ed ? ? 0.067176 ? 0.015975 ? 4.205
prst ? 0.006072 ? 0.003293 ? 1.844
Intercepts:
? ?Value ? ?Std. Error t value
1|2 ?-2.4654 ? 0.2389 ? -10.3188
2|3 ?-0.6309 ? 0.2333 ? ?-2.7042
3|4 ? 1.2618 ? 0.2340 ? ? 5.3919
Residual Deviance: 5689.825
AIC: 5707.825
# predictions for hypothetical data points # white male in year 89 yr89 ?<- c(1) male ?<- c(1) white <- c(1) ? age <- c(mean(ordwarm2$age)) ? ed ?<- c(mean(ordwarm2$ed)) prst ?<- c(mean(ordwarm2$prst)) prdata <- data.frame(yr89, male, white, age, ed, prst) prdata$rankR <-predict(myologit,newdata=prdata,type="probs")
Error in `$<-.data.frame`(`*tmp*`, "rankR", value = c(0.124294963178868, ?: ?replacement has 4 rows, data has 1
prdata
?yr89 male white ? ? ?age ? ? ? ed ? ? prst 1 ? ?1 ? ?1 ? ? 1 44.93546 12.21805 39.58526 ###############################################################3
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.