need help with predicted values from lmer with newdata
Hi Paul,
I am not sure how to use C-c C-d with namespaces and unexported
objects. I might suggest though that for the assignment, you do
something like:
assignInNamespace("predict.merMod", mypredict,
                  envir = as.environment("package:lme4"))
It is not an issue in this case, but it can be if mypredict calls
functions that are available from lme4 but not to functions in the
global environment. I am not a big leaguer though, so cannot give you
really definitive advice.
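For what it's worth, the pattern I have in mind looks roughly like the sketch below. Here 'mypredict' is hypothetical, standing in for your edited copy of the method; the ns= form names the namespace directly rather than going through the attached package environment:

```r
## Sketch of patching an unexported S3 method in place.  'mypredict'
## is a stand-in for your edited copy of lme4:::predict.merMod.
if (requireNamespace("lme4", quietly = TRUE)) {
    mypredict <- getFromNamespace("predict.merMod", "lme4")
    ## ... edit mypredict (e.g. via C-c C-d in ESS), then put it back:
    assignInNamespace("predict.merMod", mypredict, ns = "lme4")
    ## predict(m4, ...) now dispatches to the patched copy, and the
    ## copy still sees lme4's internal (unexported) helpers because
    ## it lives inside the lme4 namespace.
}
```

(Here the re-assigned function is unchanged, so the call is harmless; with a genuinely edited copy, plain predict() picks up your revision.)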
Regarding the shirt, give some free help to a local student (or
charity), and I will consider us even :)
Cheers,
Josh
On Fri, Jun 22, 2012 at 7:44 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:
Confirmed. Follow-up below about how you edit predict.merMod.

On Fri, Jun 22, 2012 at 4:19 AM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
Hi Paul,
I think this may be a bug in predict.merMod. It is not terribly
elegant, but I suggest changing line 102 from:
re_new <- unlist(re_List[m])
to
re_new <- as.vector(do.call("rbind", re_List[m]))
Changing this resolves the anomaly I experienced. If you mail me your
shirt size & mailing address, I will follow through on the shirt.
Official logo and all!

Now a more practical question. This is OT for mixed models, but is on
topic for Emacs and R, and for how to revise lme4 objects to test
variations like the one you propose. I'm using Emacs with ESS. I
learned on this email list that C-c C-d can open a function, let me
revise it, and then load it. This is the Emacs-ESS alternative to R's
fix() function. I have more and more trouble revising objects because
more code is written with non-exported functions. In order to follow
your advice and revise predict.merMod, the only thing I could think of
was the following:
mypredict <- lme4:::predict.merMod
Then I C-c C-d mypredict, re-load that, and change my usage from
predict(m4, ...) to mypredict(m4, ...). This seemed like a childish
way to get at it; I wondered how they do it in the big leagues. PJ
I believe the issue is that the random effects are unlisted, which
essentially "stacks" the vector, so you have the random effects one at
a time by effect, not by level (i.e., all intercepts, then all
slopes). Later this is post-multiplied by a matrix that is sorted by
level; the dimensions are correct, so the matrices multiply fine, but
the results are not what I believe was intended. I cannot see a way to
add a patch on R-forge or submit a bug report, so I cc'ed Doug on
this.

Cheers,
Josh

On Thu, Jun 21, 2012 at 8:28 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
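To make the stacking problem concrete, here is a toy illustration (not lme4's actual code) of how flattening a ranef-style data frame orders values by effect, while the model-matrix side expects them interleaved by level:

```r
## Toy ranef-like data frame: 3 levels, 2 effects per level
re <- data.frame("(Intercept)" = c(10, 20, 30),
                 x1 = c(1, 2, 3), check.names = FALSE)
## unlist() flattens column by column: all intercepts, then all slopes
by.effect <- unname(unlist(re))          # by effect
## Zt-style ordering interleaves by level: (int, slope) for level 1, 2, ...
by.level <- as.vector(t(as.matrix(re)))  # by level
## Same numbers, different order: multiplying a level-ordered matrix by
## the effect-ordered vector is conformable but gives the wrong answer.
```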
Greetings.
I'm getting some crazy results out of predict on an lmer model. I've
gone through this lots of times, and feel certain I'm missing some
obvious mistake. If you browse through and get to the bottom, there's
some output that shows my predict output is just wrong, but I can't
see why.

I will send the first person with the right answer a new KU t-shirt in
the size of your choice. That's better than "thanks in advance",
isn't it?
## Paul Johnson <pauljohn at ku.edu>
## 2012-06-21
## I want to compare the "separate ols regressions on clusters"
## results with the mixed model estimates. I've run into
## trouble because the output of predict with a newdata object
## seems to be completely wrong. I demonstrate "manual" calculations
## to try to convince you I understand what's going on (but
## peculiarity of predict output may make things appear otherwise).
## I pasted in some output at the bottom in case you just want to
## see where I think there is a problem.
## My system:
## > sessionInfo()
## R version 2.15.0 (2012-03-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=C                 LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
## other attached packages:
## [1] lme4_0.999902344-0 Matrix_1.0-6       lattice_0.20-6     MASS_7.3-18
## loaded via a namespace (and not attached):
## [1] compiler_2.15.0 grid_2.15.0     minqa_1.2.1     nlme_3.1-104
## [5] splines_2.15.0  tools_2.15.0
## Topic: create multi-level data and use regression and lmer to
## estimate it. This creates 3 different dependent variables,
## y1: ordinary "homoskedastic" regression (no grouping effect)
## y2: clustered "random intercepts"
## y3: clustered "random intercepts" and "random slope" for 1 predictor (x1)
library(MASS)
set.seed(1234)
## Step 1. Create a Data Frame.
## M respondents, N observations for each one.
## In a repeated measures context, this is called "longitudinal data".
## In a cross-sectional approach, this is M groups (classrooms) with
## several people in each.
M <- 10
N <- 30
## get M unique gray colors, will use later for plotting
grays <- gray.colors(M)
## Standard deviation of error term at individual level
STDE <- 48
## STDEM: standard deviation of clustered intercepts.
## In a longitudinal "repeated measures" exercise, this is an
## individual-level effect. In a cross section, this is a random
## classroom effect.
STDEM <- 30
## STDEx1: standard deviation of slopes across cluster units
STDEx1 <- 5
## The true effects of b0, b1, b2, b3 in
## y = b0 + b1*x1 + b2*x2 + b3*x3
bslopes <- c(0.2, 15.4, -0.2, 3)
## Now generate the data frame with x1, x2 and x3.
## Let's suppose the predictors are Multivariate Normal, with
## means (100,200,150), standard deviations (10, 20, 30), and
## intercorrelations of 0.4. These can, of course, be adjusted
## for interesting variations.
## Mind: an "indicator" that records to which cluster each observation belongs.
Mind <- 1:M %x% rep(1,N)
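As an aside, the Kronecker product here is just a compact repeat-each. A tiny check (toy sizes m and n, so as not to clobber the M and N above):

```r
## 1:m %x% rep(1, n) multiplies each cluster id by a block of n ones,
## i.e. repeats each id n times in order.
m <- 3; n <- 2
ind  <- 1:m %x% rep(1, n)    # 1 1 2 2 3 3
ind2 <- rep(1:m, each = n)   # the same thing, arguably more transparent
```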
means <- c(100, 200, 150)
sds <- c(10, 20, 30)
rho <- 0.4
corr.mat <- matrix(c(1, rho, rho, rho, 1, rho, rho, rho, 1), nrow = 3)
sigma <- diag(sds) %*% corr.mat %*% diag(sds)
x.mat <- mvrnorm(n = N * M, mu = means, Sigma = sigma)
dimnames(x.mat)[[2]] <- c("x1", "x2", "x3")
## Add an intercept on the front of x.mat
x.mat <- cbind("(Intercept)"= 1, x.mat)
## Create a dependent variable that has no clustering effects. This
## is an "ordinary" regression with the slopes designated above plus
## random noise.
y1 <- x.mat %*% bslopes + rnorm(M*N, m=0, s= STDE)
dat <- data.frame(id=1:(N*M), Mind, y1, as.data.frame(x.mat))
rm(Mind, y1, x.mat) ## cleanup workspace
## Layer on additive group level error in y2
## Add a group-level intercept.
## A vector of M disturbances is created, and each is
## added to all N cases within its cluster.
dat$y2 <- dat$y1 + rnorm(M, 0, STDEM) %x% rep(1, N)
## In y3, add a random slope effect. We change the slope
## for x1 from b1 to N(b1, STDEx1^2).
## Same as newb1 ~ b1 + N(0, STDEx1), multiplied by x1.
dat$y3 <- dat$y2 + (rnorm(M, 0, STDEx1) %x% rep(1, N)) * dat$x1
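A self-contained sanity check of that random-slope construction (toy sizes, separate from the data above): each cluster gets a single slope deviation, applied to every row of that cluster, so dividing the added term by x1 recovers one constant per cluster:

```r
## One N(0, sd) slope deviation per cluster, expanded over rows.
set.seed(1)
m <- 3; n <- 4
x1   <- rnorm(m * n, 100, 10)
dev  <- rnorm(m, 0, 5)             # one draw per cluster
bump <- (dev %x% rep(1, n)) * x1   # the term added to y2 to form y3
## within each cluster, bump/x1 is that cluster's single deviation
recovered <- bump / x1             # approx. rep(dev, each = n)
```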
## In the lme4 package, there is an "easy" tool to run M separate lm regressions
library(lme4)
## M = 10 separate regressions (one per cluster), with 3 predictors
m3list <- lmList(y3 ~ x1 + x2 + x3 | Mind, data=dat, pool = FALSE)
## Now I see a pattern. I'll use the unique range on x1 for each cluster
## I think that makes the lines look more "in" the data.
plot(y3 ~ x1, data=dat, col=grays[Mind], main = "lm on clusters")
for (i in seq_along(m3list)) {
    m3mf <- model.frame(m3list[[i]]) # data set for group i
    x1range <- range(m3mf$x1) ## use group-specific ranges this time
    pgroup <- predict(m3list[[i]],
                      newdata = data.frame(x1 = x1range,
                                           x2 = mean(m3mf$x2),
                                           x3 = mean(m3mf$x3)))
    lines(x1range, pgroup, col = grays[i])
}
## Hmm. That data looks like it might be handy for comparison later on.
## Build a data frame of it. I've never thought of doing this work
## cluster-by-cluster before.
m3newdat <- lapply(m3list, function(x) {
    m3mf <- model.frame(x)
    ndf <- data.frame(x1 = range(m3mf$x1), x2 = mean(m3mf$x2),
                      x3 = mean(m3mf$x3))
    ndf$m3pred <- predict(x, newdata = ndf)
    ndf})
## Smash the list of data frames together
m3newdat <- do.call("rbind", m3newdat)
## Interesting!
m3newdat[1:20, ]
## Better add a variable Mind. This looks stupid, but works.
m3newdat$Mind <- as.integer(do.call("rbind",
    strsplit(row.names(m3newdat), split = "\\.", perl = TRUE))[ , 1])
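A slightly less stupid-looking alternative (same result, assuming the row names keep their "cluster.row" form) is to strip everything from the dot onward:

```r
## Hypothetical row names of the rbind-ed frame: "1.1", "1.2", "2.1", ...
rn  <- c("1.1", "1.2", "2.1", "2.2", "10.1", "10.2")
ind <- as.integer(sub("\\..*$", "", rn))   # 1 1 2 2 10 10
```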
## Draw new graphs on a new device, so we can compare
dev.new()
## The "right" model allows the random effects to be correlated.
mm4 <- lmer( y3 ~ x1 + x2 + x3 + (x1 | Mind), data=dat)
summary(mm4)
mm4ranef <- ranef(mm4, postVar = TRUE) ## a ranef.mer object, a list
## that includes one data frame
## What do the predicted values mean from lmer?
## Here they are. But what do they include?
head(mm4pred <- predict(mm4))
tail(mm4pred <- predict(mm4))
## I can't tell what those numbers are. Can't be sure from ?predict.merMod,
## so I'm going to go through the step-by-step process of calculating
## predicted values. These should follow a formula like this
## for model 4, where j is the "cluster" index:
##
##  j'th intercept                 j'th slope
##  (b0 + raneff(intercept, j)) + (b1 + raneff(x1,j))*x1 + b2*x2 + b3*x3
mm4b <- coef(summary(mm4))[ ,1]
mm4vnames <- names(mm4b)
mm4mm <- model.matrix(mm4) ## predictors
## Doing this bit by bit. First
## b0 + raneff(intercept, j)
mm4inteffect <- mm4b["(Intercept)"] + mm4ranef[[1]][dat$Mind, 1]
mm4x1effect <- mm4mm[ , c("x1")] * (mm4b["x1"] + mm4ranef[[1]][dat$Mind, 2])
mm4pred2 <- mm4inteffect + mm4x1effect +
    mm4mm[ , c("x2", "x3")] %*% mm4b[c("x2", "x3")]
head(mm4pred2)
tail(mm4pred2)
## Aha! Those exactly match predict for mm4. So I understand what "predict" does
## Now, I want to run predict for some particular values of x1, x2, x3.
## Let's try running m3newdat through the predict function for mm4.
m3newdat$mm4.3 <- predict(mm4, newdata = m3newdat)
m3newdat
## Disaster. Predicted values completely out of whack. Lots of
## lines fall off the bottom of the graph.
plot(y3 ~ x1, data=dat, col=grays[Mind], main="lmer mixed model predictions")
by(m3newdat, m3newdat$Mind, function(x) {
    lines(x$x1, x$mm4.3, col = grays[x$Mind])})
## They don't match the manual calculations I perform here
mm4b <- coef(summary(mm4))[ ,1]
mm4vnames <- names(mm4b)
mm4mm <- as.matrix(m3newdat)
##  b0 + raneff(intercept, j)
mm4inteffect <- mm4b["(Intercept)"] + rep(mm4ranef[[1]][, 1], each = 2)
mm4x1effect <- mm4mm[ , c("x1")] * rep(mm4b["x1"] + mm4ranef[[1]][, 2], each = 2)
mm4manualpred2 <- mm4inteffect + mm4x1effect +
    mm4mm[ , c("x2", "x3")] %*% mm4b[c("x2", "x3")]
mm4mm <- cbind(as.data.frame(mm4mm), "mm4manualpred" = mm4manualpred2)
mm4mm
## Here's what I see when I do that. m3pred: the lmList predictions.
## mm4manualpred: my "manual" calculations of predicted values for lmer mm4.
## mm4.3: output from predict for the same "newdat" used with mm4manualpred.
## > mm4mm
##             x1       x2       x3   m3pred Mind      mm4.3 mm4manualpred
## 1.1   88.51074 196.0615 141.3859 1221.477    1  -434.7602      1239.152
## 1.2  110.88258 196.0615 141.3859 1483.957    1  -630.0463      1466.961
## 2.1   80.26167 194.1252 133.7265 1405.672    2  -614.3256      1388.751
## 2.2  117.08533 194.1252 133.7265 1845.499    2 -1051.4105      1866.505
## 3.1   80.84091 202.4517 154.3976 1302.205    3  3527.8154      1284.706
## 3.2  120.33775 202.4517 154.3976 1692.819    3  5037.6392      1717.210
## 4.1   68.96320 204.5145 151.6662 1321.927    4  5097.5261      1379.251
## 4.2  120.17029 204.5145 151.6662 2137.589    4  8559.2362      2108.628
## 5.1   84.85202 200.7799 153.1595 1821.135    5  4736.2240      1813.415
## 5.2  122.58730 200.7799 153.1595 2418.381    5  6641.2026      2426.855
## 6.1   84.60214 202.3324 153.2583 2015.183    6  1521.6831      2001.700
## 6.2  128.13418 202.3324 153.2583 2778.684    6  2086.4726      2799.053
## 7.1   74.73548 198.5761 155.8984 1780.196    7  1497.6761      1802.381
## 7.2  118.97636 198.5761 155.8984 2622.083    7  2127.8282      2603.626
## 8.1   83.40882 200.4991 152.0515 2342.043    8  1954.4592      2323.477
## 8.2  118.73997 200.4991 152.0515 3088.686    8  2601.6007      3105.182
## 9.1   82.48090 205.4736 152.2774 1979.852    9  2253.5650      1950.388
## 9.2  117.48279 205.4736 152.2774 2554.581    9  3027.9842      2587.319
## 10.1  84.77764 205.9689 152.3325 2256.434   10  2236.3346      2268.620
## 10.2 119.66417 205.9689 152.3325 3022.053   10  2980.1202      3012.406
--
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504       Center for Research Methods
University of Kansas            University of Kansas
http://pj.freefaculty.org       http://quant.ku.edu
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/