
Teaching Mixed Effects

4 messages · Ben Bolker, Greg Snow, Rafael Maia +1 more

Wow.
  Two very small points:
Even if we are not p-value obsessed, we would still presumably
like to be able to make some kind of (even informal) inference from
the difference in fits, perhaps at the level of "model 1 fits
(much better|a little better|about the same|a little worse|much worse)
than model 2", or "the range of plausible estimates for this
parameter is (tiny|small|moderate|large|absurdly large)". To
do that we need some kind of metric (if we have not yet fled
to Bayesian or quasi-Bayesian methods) for the range of
the deviance under some kind of null case -- for example,
where should we set cutoff levels on the likelihood profile
to determine confidence regions for parameters? Parametric
bootstrap makes sense, although it is a little scary to think
e.g. of doing a power analysis for such a procedure ...
I agree that there is a non-zero probability that the _estimate_
will be exactly zero, but my point is that there is really no chance
in reality that species, blocks, or other random effects will
not vary at all ... (sorry for the convolution of that last sentence)
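For reference, the standard asymptotic answer to the cutoff question sets the confidence-region boundary at half a chi-square quantile on the log-likelihood scale (a textbook result, not from the thread; it can be conservative for variance parameters on the boundary, which is part of why the parametric bootstrap is attractive):

```r
# Asymptotic likelihood-profile cutoff for a single parameter:
# a 95% confidence region contains all values whose deviance
# (2 * log-likelihood drop from the maximum) is within qchisq(0.95, 1).
cutoff_deviance <- qchisq(0.95, df = 1)   # ~3.84 on the deviance scale
cutoff_loglik   <- cutoff_deviance / 2    # ~1.92 on the log-likelihood scale
round(c(deviance = cutoff_deviance, loglik = cutoff_loglik), 3)
```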

  Ben Bolker
3 days later
I took this as a bit of a challenge: to look at the idea of finding the preferred model to use in a mixed-effects case (I stuck with a simple either-or choice between two models, but this could be expanded to your five levels above, or other information).  For this I am using the sample dataset, sleepstudy, from the lme4 package.

Let's start with the example from the help page that basically tests if there is a correlation between the random effects for the intercept and the random effects for the slope:
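Presumably this is the comparison in question; a sketch matching the two model fits used in the simulation function below, assuming lme4 is loaded:

```r
library(lme4)

# Full model: correlated random intercept and slope per Subject
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
# Reduced model: independent random intercept and slope (no correlation)
fm2 <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject), sleepstudy)

# Likelihood ratio test for the correlation (anova refits both with ML)
anova(fm2, fm1)
```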

The following function will simulate data of the same structure with given values for the betas and variances:

sleepsimfun1 <- function(b0, b1, Vb0, Vb1, V01, Serror) {
	# Simulate 18 subjects x 10 days with correlated random intercepts/slopes
	mydata <- expand.grid(Days = 0:9, Subject = 1:18)
	RE <- MASS::mvrnorm(18, c(0, 0), matrix(c(Vb0, V01, V01, Vb1), 2))
	mydata$Reaction <- with(mydata,
		(b0 + RE[Subject, 1]) + (b1 + RE[Subject, 2]) * Days +
			rnorm(180, 0, Serror))

	# Full vs. no-correlation random-effects structures
	fit1 <- lmer(Reaction ~ Days + (Days | Subject), mydata)
	fit2 <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject), mydata)
	anova(fit2, fit1)
}

We can use this to simulate and analyze data under the null condition (no correlation):

# winProgressBar/setWinProgressBar are Windows-only (use txtProgressBar elsewhere)
pb <- winProgressBar(max = 1000)
pbinc <- function() setWinProgressBar(pb, getWinProgressBar(pb) + 1)

setWinProgressBar(pb, 0)
out1 <- replicate(1000, {pbinc();
		sleepsimfun1(251, 10.5, 627, 36, 0, sqrt(653))}, FALSE)

p1 <- sapply(out1, function(x) x[2, "Pr(>Chisq)"])
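The summary that follows is not accompanied by its code in the post; it presumably came from something like `prop.test(sum(p1 <= 0.05), length(p1))`. With the 60 of 1000 rejections reported, that is equivalent to:

```r
# Size of the test: proportion of null-simulation p-values below alpha = 0.05.
# The post reports 60 rejections out of 1000 replicates.
alpha.hat <- prop.test(60, 1000)   # default null p = 0.5, matching the printout
alpha.hat
```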

This gives the results:
[1] 0.06
1-sample proportions test with continuity correction

data:  sum(p1 <= 0.05) out of 1000, null probability 0.5 
X-squared = 772.641, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5 
95 percent confidence interval:
 0.04645524 0.07702574 
sample estimates:
   p 
0.06 

The histogram is fairly uniform and the size of the test is pretty close to the designated alpha of 0.05, so the regular use of anova is probably appropriate here.

Let's look at a different model comparison: we want to know whether the number of days has any effect.  A reduced model without Days at all simultaneously tests the fixed slope, the random effect on the slope, and the correlation between the random effects (3 fewer parameters).

This function simulates the data and compares the 2 models:

sleepsimfun2 <- function(b0, b1, Vb0, Vb1, V01, Serror) {
	# Same data-generating process as sleepsimfun1
	mydata <- expand.grid(Days = 0:9, Subject = 1:18)
	RE <- MASS::mvrnorm(18, c(0, 0), matrix(c(Vb0, V01, V01, Vb1), 2))
	mydata$Reaction <- with(mydata,
		(b0 + RE[Subject, 1]) + (b1 + RE[Subject, 2]) * Days +
			rnorm(180, 0, Serror))

	# Reduced model drops Days entirely (fixed slope, random slope, correlation)
	fit1 <- lmer(Reaction ~ Days + (Days | Subject), mydata)
	fit2 <- lmer(Reaction ~ 1 + (1 | Subject), mydata)
	anova(fit2, fit1)
}

We can test it under the null hypothesis with the following code:

setWinProgressBar(pb,0)
out2 <- replicate(1000, {pbinc(); 
		sleepsimfun2(298.5, 0, 1278.3, 0, 0, sqrt(1959))}, FALSE )

p2 <- sapply(out2, function(x) x[2, "Pr(>Chisq)"])

The results are:
[1] 0.033
1-sample proportions test with continuity correction

data:  sum(p2 <= 0.05) out of 1000, null probability 0.5 
X-squared = 870.489, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5 
95 percent confidence interval:
 0.02317469 0.04655855 
sample estimates:
    p 
0.033

The histogram shows p-values biased towards 1 rather than uniform, and the true size of the test is smaller than the nominal alpha = 0.05.  This could be a good candidate for a likelihood comparison based on simulation (parametric bootstrap).
95% quantile of the simulated null statistics:
     95%
 6.95113
Observed 2 * log-likelihood difference for the real data:
[1] 158.5811
Proportion of null statistics at least that large:
[1] 0

So the 2 * log-likelihood difference for the real data is greater than all of the 2 * log-likelihood differences from the simulation (I should probably redo it with exact values instead of my quick rounding off, but the results should be similar).  This is roughly equivalent to a p-value close to 0.
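The code behind those three numbers is not shown in the post; a minimal sketch of the idea using lme4's simulate/refit machinery (nsim kept small here for speed; the post used 1000 null replicates):

```r
library(lme4)

# Fit full and reduced models to the real sleepstudy data (ML for comparison)
fit1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)
fit2 <- lmer(Reaction ~ 1 + (1 | Subject), sleepstudy, REML = FALSE)
ts.obs <- as.numeric(2 * (logLik(fit1) - logLik(fit2)))  # observed statistic

# Parametric bootstrap: simulate from the reduced (null) model, refit both
nsim <- 50   # small for illustration; use ~1000 in practice
null.stats <- replicate(nsim, {
	y  <- simulate(fit2)[[1]]
	f1 <- refit(fit1, y)
	f2 <- refit(fit2, y)
	as.numeric(2 * (logLik(f1) - logLik(f2)))
})

quantile(null.stats, 0.95)    # null 95% cutoff
mean(null.stats >= ts.obs)    # bootstrap p-value
```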

But we may be interested in whether this approach really behaves properly, and in what our power would be for this type of study with different parameter values.

This function simulates the above procedure by generating 1000 values under the null hypothesis to get a reference distribution, then simulates the data and tests it by comparing to the reference distribution.  (This version creates the reference distribution from the hypothesized parameters; an alternative would be to generate the data first, fit the reduced model, then simulate the reference distribution based on that fit.  Which is better is another discussion.)

sleepsimfun3 <- function(b0, b1, Vb0, Vb1, V01, Serror) {
	# Reference distribution: LR statistics simulated under the null (b1 = 0)
	setWinProgressBar(pb, 0)
	out.null <- replicate(1000, {pbinc();
		sleepsimfun2(b0, 0, Vb0, 0, 0, Serror)[2, "Chisq"]})

	# One simulated data set under the hypothesized (possibly non-null) values
	mydata <- expand.grid(Days = 0:9, Subject = 1:18)
	RE <- MASS::mvrnorm(18, c(0, 0), matrix(c(Vb0, V01, V01, Vb1), 2))
	mydata$Reaction <- with(mydata,
		(b0 + RE[Subject, 1]) + (b1 + RE[Subject, 2]) * Days +
			rnorm(180, 0, Serror))

	fit1 <- lmer(Reaction ~ Days + (Days | Subject), mydata)
	fit2 <- lmer(Reaction ~ 1 + (1 | Subject), mydata)
	ts <- anova(fit2, fit1)[2, "Chisq"]

	# Parametric-bootstrap p-value
	mean(out.null >= ts, na.rm = TRUE)
}

We can run it under the null hypothesis (with arbitrarily chosen values for the non-zero parameters) to test its behavior.  (Note for anyone running the code themselves: I started the next two batches of code running before leaving work Friday afternoon; the first batch finished late Saturday morning and the second batch sometime after I went to bed on Saturday.)

pb2 <- winProgressBar('Progress Bar 2',max=1000)
pbinc2 <- function() setWinProgressBar(pb2, getWinProgressBar(pb2)+1)

# check under the null

setWinProgressBar(pb2, 0)
out3 <- replicate(1000, {pbinc2(); 
		sleepsimfun3(100, 0, 1000, 0, 0, 45)} )

The output:
[1] 0.056
1-sample proportions test with continuity correction

data:  sum(out3 <= 0.05) out of 1000, null probability 0.5 
X-squared = 786.769, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5 
95 percent confidence interval:
 0.04293596 0.07258036 
sample estimates:
    p 
0.056 

The histogram looks close enough to uniform (I know I could do specific tests or additional plots, but I am just going for the eyeball test here), and the size of the test is close enough to the target alpha = 0.05.

Now let's try a power analysis for a given slope (arbitrarily chosen in this case):

# find power for b1=5, Vb1=0, V01=0

setWinProgressBar(pb2, 0)
out4 <- replicate(1000, {pbinc2(); 
		sleepsimfun3(100, 5, 1000, 0, 0, 45)} )

with results:
[1] 0.972
1-sample proportions test with continuity correction

data:  sum(out4 <= 0.05) out of 1000, null probability 0.5 
X-squared = 889.249, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5 
95 percent confidence interval:
 0.9592453 0.9809686 
sample estimates:
    p 
0.972

So we have over 95% power to detect a true slope of 5 (conditional on the other parameter values as set).

So while a power study may take a bit of time (parallelization will help), it seems feasible (whether it is scary still depends on the person).
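A sketch of the parallelization idea with the parallel package (illustrated here with a trivial stand-in function; for the real study the worker function would call sleepsimfun3, after loading lme4 on the workers via clusterEvalQ, exporting the simulation functions via clusterExport, and removing the progress-bar calls):

```r
library(parallel)

# A PSOCK cluster works on all platforms; size it to your machine
cl <- makeCluster(2)

# Stand-in for the expensive per-replicate work (e.g. one sleepsimfun3 call)
res <- parSapply(cl, 1:8, function(i) i^2)

stopCluster(cl)
res   # 1 4 9 16 25 36 49 64
```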

And to think that a few years ago, when I was trying to get the flawed Pentium processor in my computer replaced, I got the runaround, with each person trying to explain that I did not need it replaced because only large corporations and research scientists doing tens of thousands of floating-point operations would be affected by the flaw.
1 day later
Dear list members,

I am analyzing a data set comprised of two variables measured over
time (once a month, over a year, for a total of 12 observations per
individual). I am interested in modeling the relationships between
these two variables, but specifically I'd like to assess whether
individual variation over time is associated between them. There is
great individual variation in slopes for both variables, and I'd like
to measure the degree of intra-individual association between the
slopes for the two.

My first intuition was to fit random-slopes models for the two
variables and compare the random effects (the conditional modes for
the slopes?). However, I recall reading in some slides from Douglas
Bates that these are not actually "estimates" but something more
similar to predictions. I also think I remember someone mentioning on
this list specifically that these values should not be extracted and
used as parameters for other analyses.

If this were not a mixed-model "situation", I guess it would be
reasonable to compare regression slopes through an analysis of
covariance, for example. Does anyone have any suggestions on how to
treat this case? Any help would be greatly appreciated.

Many thanks in advance!

Abraços,
Rafael Maia
---
"A little learning is a dangerous thing; drink deep, or taste not the  
Pierian spring." (A. Pope)
M.Sc., Animal Behavior Lab - Dept. of Zoology
Universidade de Brasilia, Brazil
http://www.unb.br/ib/zoo/rhmacedo/