Conflicting p-values from pvals.fnc
Hi all, This example has gotten me pretty confused about how the "weights" argument works for lmer. I'd always assumed that setting a weight of k for an observation would cause lmer to act as if it had seen k replicates of that observation (i.e., the contribution of the observation to the likelihood would be taken to the k-th power). After reading this query I found the following post that they are "precision weights not sampling weights": http://tolstoy.newcastle.edu.au/R/e17/help/12/01/2099.html I'm not sure what that means -- was my interpretation one of "sampling weights"? Regardless, I'm noticing what seems to me to be inconsistent behavior in how setting the weights argument affects the t statistic in lmer() output and how the output of pvals.fnc() is affected. I'm including an example below with a fixed effect of "x" and a random intercept of "a": the higher one sets the weights, the larger the t statistic for x becomes (which is what I'd originally expected given my assumptions about the weights semantics), but the broader the posterior on x becomes in the MCMC output. This doesn't seem right, does it? (I also don't understand why the estimate of the random-intercept variance but not the residual variance reported in the lmer output changes.)
set.seed(1)
dat <- data.frame(x=rep(c(0,1),each=4),a=factor(rep(c("a","b"),4)),w=1)
dat$y <- with(dat,10*x+10*(a=="a") + rnorm(8))
print(m1 <- lmer(y~x+(1|a),dat,weights=1*w))
Linear mixed model fit by REML
Formula: y ~ x + (1 | a)
Data: dat
AIC BIC logLik deviance REMLdev
32.32 32.64 -12.16 29.97 24.32
Random effects:
Groups Name Variance Std.Dev.
a (Intercept) 44.09718 6.64057
Residual 0.87763 0.93682
Number of obs: 8, groups: a, 2
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.0792 4.7186 1.076
x 10.1045 0.6624 15.254
Correlation of Fixed Effects:
(Intr)
x -0.070
pvals.fnc(m1)
$fixed
Estimate MCMCmean HPD95lower HPD95upper pMCMC Pr(>|t|)
(Intercept) 5.079 5.037 -1.553 11.32 0.1004 0.3231
x 10.104 10.110 4.917 15.28 0.0038 0.0000
$random
Groups Name Std.Dev. MCMCmedian MCMCmean HPD95lower HPD95upper
1 a (Intercept) 6.6406 2.6325 3.3750 0.0000 7.6330
2 Residual 0.9368 2.8649 3.2042 0.8809 6.0746
print(m2 <- lmer(y~x+(1|a),dat,weights=100*w))
Linear mixed model fit by REML
Formula: y ~ x + (1 | a)
Data: dat
AIC BIC logLik deviance REMLdev
69.17 69.48 -30.58 66.81 61.17
Random effects:
Groups Name Variance Std.Dev.
a (Intercept) 0.44097 0.66406
Residual 0.87763 0.93682
Number of obs: 8, groups: a, 2
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.07921 0.47185 10.76
x 10.10449 0.06624 152.54
Correlation of Fixed Effects:
(Intr)
x -0.070
pvals.fnc(m2)
$fixed
Estimate MCMCmean HPD95lower HPD95upper pMCMC Pr(>|t|)
(Intercept) 5.079 5.38 -23.816 38.45 0.3434 0
x 10.104 10.11 4.115 16.06 0.0068 0
$random
Groups Name Std.Dev. MCMCmedian MCMCmean HPD95lower HPD95upper
1 a (Intercept) 0.6641 5.8216 15.7201 0.0000 61.7422
2 Residual 0.9368 23.8808 30.8847 5.2868 71.7694
print(m3 <- lmer(y~x+(1|a),dat,weights=w/100))
Linear mixed model fit by REML
Formula: y ~ x + (1 | a)
Data: dat
AIC BIC logLik deviance REMLdev
-4.516 -4.199 6.258 -6.868 -12.52
Random effects:
Groups Name Variance Std.Dev.
a (Intercept) 4409.71731 66.40570
Residual 0.87763 0.93682
Number of obs: 8, groups: a, 2
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.079 47.187 0.108
x 10.104 6.624 1.525
Correlation of Fixed Effects:
(Intr)
x -0.070
pvals.fnc(m3)
$fixed
Estimate MCMCmean HPD95lower HPD95upper pMCMC Pr(>|t|)
(Intercept) 5.079 5.053 -0.8904 11.28 0.085 0.9178
x 10.104 10.107 5.3060 15.21 0.002 0.1780
$random
Groups Name Std.Dev. MCMCmedian MCMCmean HPD95lower HPD95upper
1 a (Intercept) 66.4057 2.6379 3.1098 1.1580 6.5406
2 Residual 0.9368 0.2828 0.3121 0.1214 0.5694
I'd be glad to be enlightened...!
Best
Roger
On Mar 26, 2012, at 10:35 AM PDT, Geoff Brookshire wrote:
Hi listers,
If you get p-vals using Wald chi-square tests with lme4::anova, they look
pretty close to the pMCMC output.
fm.1 <- lmer(y ~ block * condition + (1 | as.factor(subject)),
weights = weights, REML = FALSE)
fm.2 <- lmer(y ~ block + condition + (1 | as.factor(subject)),
weights = weights, REML = FALSE)
anova(fm.1, fm.2)
This gives p = .35 for the block*condition interaction. For comparison,
pvals.fnc gave pMCMC = .3 and p(<|t|) < .001. So it looks like p-vals
derived from t-tests are just way off.
It seems to me like we should just totally ignore the P(<|t|) output. Does
anyone who knows more about how these work think otherwise?
-- geoff
On Mon, Mar 19, 2012 at 3:25 PM, Tom Gijssels <tom.gijssels at gmail.com>wrote:
Dear R-listers,
I'm trying to run a mixed effect model using the lmer() function and have
run into some issues in interpreting the p-values generated by
pvals.fnc(). The design is a between-subjects design, with two fixed
effects (condition & block; each with two levels), and one random effect
(subject). Additionally, I have a set of weights that I want to include.
When looking at the pvals.fnc() output,there appears to be a large
discrepancy between the pMCMC values and the t-statistic p-values. Whereas
one of the main effects and the interaction are far from significant
judging by the pMCMC values, they are highly significant when looking at
the t-statistic p-values (e.g. Condition: pMCMC = 0.2294; Pr(>|t|) = 0.0000
& Condition*Block: pMCMC = 0.3296; Pr(>|t|) = 0.0000) . I have read that
the t-statistic based p-values are less conservative, but the difference
between these two values seems really extreme.
Below some code that simulates the model and the data. The original data
set has two precise characteristics that might influence the results, so I
tried to simulate those characteristics in the mock data. That is: 1)
there's fewer observations in block A than in block B; and 2) the weights
for observations in block A generally are lower than those for block B.
Running this code reproduces the original observation of conflicting pMCMC
and p-T-test values. However, when excluding the weights argument from the
lmer model, these values seem to converge, suggesting that the weights
specification might be underlying these problems.
In short, my question is whether anyone knows why these values diverge and
what I could do to address this issue.
Many thanks in advance!
Tom
block <- as.factor(c(rep('a', times = 20), rep('b', times = 200)))
condition <- as.factor(c(rep(c('x', 'y'), each = 10), rep(c('x','y'), each
= 100)))
contrasts(block) <- c(-0.5, 0.5)
contrasts(condition) <- c(-0.5, 0.5)
subject <- c(rep(1:4, each = 5), rep(1:4, each = 50))
intercept <- 100
block.me <- 20
condition.me <- 30
err <- rnorm(length(block), sd = 20)
weights <- c(rep(1, times = 20), rep(10, times = 200))
y <- intercept + ifelse(block == 'a', block.me, 0) + ifelse(condition ==
'x', condition.me, 0) +
ifelse(block == 'a' & condition == 'x', 30, 0) + (subject * 10) + err
fm.1 <- lmer(y ~ block * condition + (1 | as.factor(subject)),
weights = weights, REML = FALSE)
fm.1.mcmc <- pvals.fnc(fm.1, addPlot=F)
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Roger Levy Email: rlevy at ucsd.edu Assistant Professor Phone: 858-534-7219 Department of Linguistics Fax: 858-534-4789 UC San Diego Web: http://idiom.ucsd.edu/~rlevy