Hi Phil,
This is just a suggestion, and details will have to be thought
through carefully. Well actually it is two suggestions (I would use
the second). Please note that I have not delved deeply into your
code and therefore I assume that you are doing something sensible at
the modelling stage.
Suggestion number 1: Use the delta method (don't get confused with
delta model). Basically, you take a linear approximation to the
non-linear model and follow the parameter estimates' uncertainty
through. The calculus shouldn't be too onerous for these two part
models.
Suggestion number 2: Parametric bootstrap. You know the
(asymptotic) distribution of the estimates, through MLE theory. If
you are willing to assume that you are close enough to asymptotic
properties, then you can draw a large number of the of
possible/likely parameter values. For each set of parameter values,
do the prediction. You will end up with a set of predictions that
reflect the uncertainty in the parameters.
I noticed that you are also using a glmm in at least one part of
your model. Another choice needs to be made then: do you predict
conditionally or marginally to the population of random effects. If
conditional, you specify the set of random effects to some value
(usually zero or the observed levels). If marginal you have to
integrate out the stochasticity of the random effect's population.
The approach taken will have an effect on both the mean response and
the uncertainty around that mean. However, in linear models, it
will only affect the uncertainty (but your model isn't linear).
I hope that this helps,
Scott
On 04/02/15 08:21, Philip Harrison wrote:
Update,
So it seems predictions for the delta method ARE as simple as the
probabily*estimates, which for the example can pretty easily
computed using the predict function (ive added my coding below)
However does anyone know how to compute confidence intervals with
this delta method? Im guessing its not as straightforward as the
product of the confidence intervals for the two models which are
easily computed?
Any help would be greatly appreciated?
Cheers
Phil
Code
#### generate a grid for prediction which is object "predition grid"
library(AICmodavg)
#generate probability predictions form the binomial model
predoBin<-predictSE.lme(BinaryMod,predictiongrid,level=0,print.matrix=TRUE,se.fit=TRUE)
predsBinG<-cbind(predictiongrid,predoBin)
predsBinG$Probality<-plogis(predsBinG$fit)
predsBinG$CIH.bin<-plogis(predsBinG$fit+(1.96*predsBinG$fit$se.fit))
predsBinG$CIL.bin<-plogis(predsBinG$fit-(1.96*predsBinG$fit$se.fit))
###give the binomial model predictions an identifier
####generate (and backtransform from the cuberoot) predictions from
the positive model
predoPos<-predictSE.lme(PositveMod,predictiongrid,level=0,print.matrix=TRUE,se.fit=TRUE)
predsPosG<-cbind(PositiveModel,predoPos)
predsPosG$PosWij<-(predsPosG$fit)^3
predsBinG$CIH<-(predsPosG$PosWij+(1.96*predsPosG$se.fit))^3
predsBinG$CIL<-(predsPosG$PosWij-(1.96*predsPosG$se.fit))^3
DeltaMaker<-cbind(predBinG,predsPosG)
#########get the conditional estimates
DeltaMaker$ConditionalWij<-DeltaMaker$Probability*DeltaMaker$PosWij
####theorectical CI estimation (this is likely wrong???need to
bootstrap or something?)
DeltaMaker$ConditionalWijCIH<-DeltaMaker$CIH.bin*DeltaMaker$CIH
DeltaMaker$ConditionalWijCIL<-DeltaMaker$CIL.bin*DeltaMaker$CIL
####im thinking maybe only using the CIs from the positive model
might make sense?
DeltaMaker$ConditionalWijCIH<-DeltaMaker$Probability*DeltaMaker$CIH
DeltaMaker$ConditionalWijCIL<-DeltaMaker$Probability*DeltaMaker$CIL
Quoting Philip Harrison <pharriso at uwaterloo.ca>:
Dear R-sig-ecology group,
I have some telemetry data with a zero-inflated (semi-continuous)
response, which I modelled using a two stage process using
glmmpql/nmle. I chose nlme/glmmpql because a temporal correlation
structure is very important for my telemetry data and I can live
with pql estimation and no AICs if it means I can avoid winBUGS!.
I am pretty happy with my models and I don't want to use
ZIPglmms,ZINBglmms or ZAPglmms as the response is a pretty normal
continuous variable and not a count.
So now, if possible I would like to combine the results of the two
models to make predictions about resource selection (with
Confidence intervals) for each of the levels of my fixed effects,
using the 'delta' approach, mentioned a few times on r-sig-eco
including here
https://stat.ethz.ch/pipermail/r-sig-ecology/2011-May/002159.html
I found quite a few papers about the delta approach (typically
using abundance estimates as the response), but I am having
trouble translating the method into code (I've added refs at the
end of the message). I have seen papers where they appear to just
multiply the predictions from the positive model by the
probability from the first model (although the papers in question
use Bayesian techniques multiplying the posterior modes) Can I do
the same with the predictions from my frequentist approach,
somehow that seems over simplistic?
Also the response for the positive part of my model normalizes
nicely with a cube root transformation and many of the papers use
log-normal positive responses- I'm assuming this doesn't make any
difference as long as I leave the back transformation until the
very end?
Any help would be much appreciated. See below for model details
###############################################################################
Models- First I model the probability of a non zero value, (i.e.
presence/absence) then model the continuous part i.e. the strength
of selection in the event of a positive selection event. Sorry I
don't have a real example dataset; however in this case I just
need to find out how to extract the relevant stuff from
lme/glmmpql output.
Wij - this is my resource selection response variable (Wij=
proportion of time spent in habitat/proportion of habitat
available). This response varies from 0 (i.e. habitat was
available but not used) continuously through 1 (neither selected
nor avoided) and continuously >1 (positive selection) up to a
theoretical infinity.
Seas; a categorical 4 level variable
Sequence; a variable created based on detection date used for the
correlation structure.
Habitat: a categorical habitat variable
Xdata$WijBin<-ifelse(Xdata$Wij>0,1,0)
XdataUsed<- Xdata[Xdata$WijBIN0==1,]
BinaryMod<-glmmpql(BinaryVariable~Seas+Habitat+Seas*Habitat,random=~1|FishID,cor=corCAR1(form=~Sequence),data=Xdata,family="binomial")
PositiveModel <-lme((Wij)^(1/3)~
Seas+Habitat+Seas*Habitat,random=~1|FishID/TemperatureCategory,cor=corCAR1(form=~Sequence),data=XdataUsed)
#########################
Refs
Pennington, Michael. "Efficient estimators of abundance, for fish
and plankton surveys." Biometrics (1983): 281-286.
Stef?nsson, Gunnar. "Analysis of groundfish survey abundance data:
combining the GLM and delta approaches." ICES Journal of Marine
Science: Journal du Conseil 53.3 (1996): 577-588.
Anlauf-Dunn, Kara J., et al. "Habitat connectivity, complexity,
and quality: predicting adult coho salmon occupancy and
abundance." Canadian Journal of Fisheries and Aquatic Sciences
71.12 (2014): 1864-1876.
Phil Harrison
PhD student (Fisheries Ecology)
Department of Biology
University of Waterloo
200 University Avenue West
Waterloo, Ontario, Canada
N2L 3G1