gamm4 model formulation clarification
Oops. It seems that my logic in computing the CI for the difference scores in the "solution" I posted was seriously flawed (see: http://stats.stackexchange.com/questions/1169/ci-for-a-difference-based-on-independent-cis). So, predict.gam(..., se.fit=T) produces predicted values and SEs for those values; any suggestions on how to compute the proper CI for a difference between such predicted values, given the SEs?
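For the record, one route I'm considering: the SE of a difference between two predictions needs the covariance between them, not just the two individual SEs. Below is a base-R sketch using a plain lm() stand-in; I believe the same algebra carries over to the gamm4 fit via predict(AC_fit$gam, newdata, type = "lpmatrix") and vcov(AC_fit$gam), but I haven't verified that, so treat the gamm4 part as an assumption.

```r
# Sketch: SE of a difference between two fitted values, using the full
# coefficient covariance matrix rather than the two SEs alone.
set.seed(1)
d <- data.frame(A = gl(2, 50), C = runif(100))
d$D <- as.numeric(d$A) + d$C + rnorm(100)
fit <- lm(D ~ A * C, data = d)

# Two prediction points: the two levels of A at the same value of C
nd <- data.frame(A = factor(c("1", "2")), C = 0.5)
X  <- model.matrix(~ A * C, nd)  # analogue of predict(..., type = "lpmatrix")
V  <- vcov(fit)                  # coefficient covariance matrix

con      <- X[1, ] - X[2, ]              # contrast: level 1 minus level 2
est_diff <- drop(con %*% coef(fit))      # estimated difference
se_diff  <- sqrt(drop(con %*% V %*% con))  # SE of the difference
ci       <- est_diff + c(-1.96, 1.96) * se_diff
```

Note that sqrt(se1^2 + se2^2) would only be right if the two predictions were independent, which predictions from a single fitted model generally are not.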
On Mon, Aug 2, 2010 at 2:44 PM, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:
On Sat, Jul 31, 2010 at 9:42 PM, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:
Also, I'm just now thinking that neither approach really gets at what I want to test; what I want to test is whether the model is improved by (1) letting the smooth vary as a function of design, (2) letting the smooth vary as a function of ddB, and (3) letting the smooth vary as a function of both design & ddB. That is, is there a design*soa interaction, a ddB*soa interaction, and/or a design*ddB*soa interaction? If any of these interactions is supported, I'd also be interested in pinpointing the timeframe over which it takes place.
After playing a bit, I wonder if anyone has any input on the following
"solution" I'm currently working with. First, a change in variable
names to simplify things: A & B are both 2-level factor variables, C
is a continuous numeric variable that needs smoothing, D is the
dependent variable, and E is the random effect (in my case, human
participants in an experiment).
To assess the A*C interaction:
1) turn B into a numeric variable centered on zero (-0.5, +0.5):
my_data$B2 = as.numeric(my_data$B) - 1.5
2) fit a gamm, including the A*B2 interaction and smoothing C
independently per level of A:
AC_fit = gamm4(
    formula = D ~ A*B2 + s(C, by = A)
    , random = ~ (1|E)
    , data = my_data
)
3) Obtain predictions & 95% CIs from the model for the 2 levels of A
across the range of C (note that gamm4 returns a list, so predict needs
the $gam component):
AC_pred = expand.grid(
    A = factor(levels(my_data$A), levels = levels(my_data$A))
    , B2 = 0
    , C = seq(min(my_data$C), max(my_data$C), length.out = 1e3)
)
from_predict = predict(AC_fit$gam, newdata = AC_pred, se.fit = TRUE)
AC_pred$D = from_predict$fit
AC_pred$se = from_predict$se.fit
AC_pred$lo_95ci = AC_pred$D - AC_pred$se*1.96
AC_pred$hi_95ci = AC_pred$D + AC_pred$se*1.96
4) You could plot the "pred" data now, plotting D as a function of C
and split by levels of A, but this will include the main effects of
both A and C, obscuring visualization of the interaction.
5) Compute the A*C interaction by computing the difference between the
levels of A at each value of C (ddply is from the plyr package; note that
this min/max construction is the flawed CI logic flagged at the top of
this thread):
library(plyr)
AC_effect = ddply(
    .data = AC_pred
    , .variables = .(C)
    , .fun = function(x){
        data.frame(
            A_effect = x$D[x$A==levels(x$A)[1]] - x$D[x$A==levels(x$A)[2]]
            , lo_95ci = min(
                x$hi_95ci[x$A==levels(x$A)[1]] - x$lo_95ci[x$A==levels(x$A)[2]]
                , x$lo_95ci[x$A==levels(x$A)[1]] - x$hi_95ci[x$A==levels(x$A)[2]]
            )
            , hi_95ci = max(
                x$hi_95ci[x$A==levels(x$A)[1]] - x$lo_95ci[x$A==levels(x$A)[2]]
                , x$lo_95ci[x$A==levels(x$A)[1]] - x$hi_95ci[x$A==levels(x$A)[2]]
            )
        )
    }
)
6) Now you can visualize the A*C interaction; any area where the CI
ribbon does not include zero may be considered a region where there is
a significant A*C interaction.
library(ggplot2)
ggplot(data = AC_effect) +
    geom_hline(yintercept = 0, linetype = 2) +
    geom_ribbon(
        mapping = aes(x = C, ymin = lo_95ci, ymax = hi_95ci)
        , alpha = .5
    ) +
    geom_line(
        mapping = aes(x = C, y = A_effect)
    )
Similar steps can be applied to the inspection of a B*C interaction.
The A*B*C interaction would be assessed by:
1) create a dummy variable, F, representing the A*B combinations:
my_data$F = factor(paste(my_data$A,my_data$B))
2) fit a gam that smooths C in each combination of A*B:
ABC_fit = gamm4(
    formula = D ~ A*B + s(C, by = F)
    , random = ~ (1|E)
    , data = my_data
)
3) obtain the prediction & CI for D in each combination of A*B and
across the range of C:
ABC_pred = expand.grid(
    A = factor(levels(my_data$A), levels = levels(my_data$A))
    , B = factor(levels(my_data$B), levels = levels(my_data$B))
    , C = seq(min(my_data$C), max(my_data$C), length.out = 1e3)
)
ABC_pred$F = factor(paste(ABC_pred$A, ABC_pred$B), levels = levels(my_data$F))
from_predict = predict(ABC_fit$gam, newdata = ABC_pred, se.fit = TRUE)
ABC_pred$D = from_predict$fit
ABC_pred$se = from_predict$se.fit
ABC_pred$lo_95ci = ABC_pred$D - ABC_pred$se*1.96
ABC_pred$hi_95ci = ABC_pred$D + ABC_pred$se*1.96
4) Similar to step 5 above, compute the difference between levels of A
at each value of C *and* each level of B.
5) Using the differences computed in step 4, now compute the
difference between levels of B at each value of C.
6) Now we have another single function and confidence ribbon that can
be plotted; any region where the ribbon excludes zero may be
considered a region with a significant A*B*C interaction.
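To make steps 4 and 5 concrete, here is a base-R sketch of the difference-of-differences computation on a toy prediction frame. The column names mirror ABC_pred above, but the data frame and its values are made up for illustration:

```r
# Toy stand-in for a prediction frame with columns A, B, C, and predicted D
toy <- expand.grid(A = factor(c("a1", "a2")),
                   B = factor(c("b1", "b2")),
                   C = 1:3)
toy$D <- seq_len(nrow(toy))  # stand-in for predicted values

# Step 4: A-effect (a1 minus a2) at each B x C combination
wide_A <- reshape(toy, idvar = c("B", "C"), timevar = "A",
                  direction = "wide")
wide_A$A_effect <- wide_A$D.a1 - wide_A$D.a2

# Step 5: difference of the A-effects across the levels of B, at each C
wide_AB <- reshape(wide_A[c("B", "C", "A_effect")],
                   idvar = "C", timevar = "B", direction = "wide")
wide_AB$ABC_effect <- wide_AB$A_effect.b1 - wide_AB$A_effect.b2
```

The same ddply approach as step 5 of the A*C case would also work; this just shows the arithmetic explicitly. The CI for the difference-of-differences has the same covariance problem noted above.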
Does anyone have any thoughts on the validity of this approach? Also,
this is clearly only applicable to cases where A and B each have only 2
levels, so any thoughts on an alternative, more generalizable approach
would be appreciated.
Mike
--
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University
Looking to arrange a meeting? Check my public calendar:
http://tr.im/mikes_public_calendar
~ Certainty is folly... I think. ~