Skip to content
Prev 11625 / 20628 Next

Curved residuals vs fitted plot

Hi Victoria,

I get quite a different residual plot from your model - code pasted below - which looks much better. There still appears to be a weak residual trend but these plots can be deceptive especially if there's a mass of points in one part of the graph. The apparent trend might be an artefact of skew at either end of the probability scale. I believe there's no requirement for these residuals to be normal, or even symmetrical, and they're more likely to be skewed close to predicted probabilities of 0 or 1.  E.g. you have several rows with zero responses, which inevitably produce a lump of negative residuals. (This residual plot should also be viewed with caution for the reason I mentioned earlier about standardising the residuals using the binomial variance function rather than the logitnormal-binomial variance function (does it exist, anyone?).)

Probably a better way to assess the model is to plot on the same graph the data and the model predictions, with prediction intervals. E.g. plot prediction intervals on boxplot(Aphid.rm.prop ~ Treatment*Distance.m, data=dframe1).
If you expand the data from binomial to binary, and still fit the same random effects, you are fitting the same model, with the same power, except that the observation-level overdispersion random effect becomes a more conventional "group" random effect. The only difference will be to make the model fit more slowly. E.g. "+ (1|Site)" implies the the same model in both of these data sets:

Collapsed:
1 2 Site1
2 1 Site2

Expanded:
0 Site1
1 Site1
0 Site1
1 Site2
1 Site2
0 Site2

Best wishes,
Paul



sapply(list(Field,Treatment,Field.section,Transect,Distance.m,Aphid.rm.success,Aphid.remain.fail,fffPlot,Bait.ID),function(x)length(unique(x)))
### Bait.ID has a duplicated level so has n-1 levels - remake with n levels:
Bait.ID<-paste("ID",1:length(Bait.ID),sep="")
dframe1<-data.frame(Field,Treatment,Field.section,Transect,Distance.m,Aphid.rm.success,Aphid.remain.fail,fffPlot,Bait.ID)
rm(Field,Treatment,Field.section,Transect,Distance.m,Aphid.rm.success,Aphid.remain.fail,fffPlot,Bait.ID) # removes ambiguity about source of objects
head(dframe1)
dframe1$n <- dframe1$Aphid.rm.success + dframe1$Aphid.remain.fail
dframe1$Aphid.rm.prop <- dframe1$Aphid.rm.success / dframe1$n

plot(Aphid.rm.prop ~ jitter(Distance.m,factor=0.2), data=dframe1,col=as.numeric(Treatment),pch=as.numeric(Treatment))
boxplot(Aphid.rm.prop ~ Treatment, data=dframe1)
boxplot(Aphid.rm.prop ~ Treatment*Distance.m, data=dframe1)

### I changed Field from random to fixed, because although it's conceptually a random effect, two levels are too few to get a reasonable variance estimate
model3<-glmer(cbind(Aphid.rm.success,Aphid.remain.fail) ~ Treatment*Distance.m+Field+(1|Field.section/Transect/fffPlot/Bait.ID), # Bait.ID to remove overdispersion
binomial(link=logit),data=dframe1) # note I kept fffPlot smallest scale in (better residuals)
summary(model3)
plot(model3) # issue? Or just messy ecological data?

### look at random effect residuals
dotplot(ranef(model3,condVar=TRUE))$Bait.ID
dotplot(ranef(model3,condVar=TRUE))$fffPlot
dotplot(ranef(model3,condVar=TRUE))$Transect  # close to zero variance at this level
dotplot(ranef(model3,condVar=TRUE))$Field.section  # close to zero variance at this level
### these look sufficiently close to normal, but should do a QQ plot, at least on the factors that vary
qqmath(ranef(model3,condVar=TRUE))$Bait.ID
qqmath(ranef(model3,condVar=TRUE))$fffPlot

# shift overdispersion random effect from fitted values to residuals
Fitted <- plogis(qlogis(fitted(model3)) - ranef(model3)$Bait.ID[[1]])
Resid <- (dframe1$Aphid.rm.prop - Fitted) / sqrt(Fitted * (1 - Fitted)/dframe1$n)
# plot residuals differentiating between treatments and distances
plot(Fitted, Resid, col=dframe1$Treatment, pch=21, bg=grey(pnorm(scale(dframe1$Distance.m))))
abline(h=0)
On 11 Mar 2014, at 15:57, Victoria Wickens <V.J.Wickens at pgr.reading.ac.uk> wrote: