modelling zero inflated continuous data using a two stage model in glmmpql - question about implementing the delta method (Philip Harrison)

Wed, Feb 4, 2015 6:12 AM

Hi Phil,
Not having looked in great detail at your models, but just the general
question - have you checked out the delta method functions in Ben
Bolker's emdbook package? I have found it very useful
(http://cran.r-project.org/web/packages/emdbook/emdbook.pdf).
Cheers,
Lee

Date: Tue, 03 Feb 2015 16:21:18 -0500
From: Philip Harrison <pharriso at uwaterloo.ca>
To: r-sig-ecology at r-project.org
Subject: Re: [R-sig-eco] modelling zero inflated continuous data using
        a two stage model in glmmpql - question about implementing the delta
        method
Message-ID:
        <20150203162118.20141bzmo53jwjwu at www.nexusmail.uwaterloo.ca>
Content-Type: text/plain; charset=ISO-8859-1; DelSp="Yes";
        format="flowed"

Update,

So it seems predictions for the delta method ARE as simple as the
probabily*estimates, which for the example can pretty easily computed
using the predict function (ive added my coding below)

However does anyone know how to compute confidence intervals with this
delta method? Im guessing its not as straightforward as the product of
the confidence intervals for the two models which are easily computed?

Any help would be greatly appreciated?

Cheers

Phil

Code

#### generate a grid for prediction which is object "predition grid"


library(AICmodavg)

#generate probability predictions form the binomial model

predoBin<-predictSE.lme(BinaryMod,predictiongrid,level=0,print.matrix=TRUE,se.fit=TRUE)

predsBinG<-cbind(predictiongrid,predoBin)

predsBinG$Probality<-plogis(predsBinG$fit)
predsBinG$CIH.bin<-plogis(predsBinG$fit+(1.96*predsBinG$fit$se.fit))
predsBinG$CIL.bin<-plogis(predsBinG$fit-(1.96*predsBinG$fit$se.fit))


###give the binomial model predictions an identifier


####generate (and backtransform from the cuberoot) predictions from
the  positive model

predoPos<-predictSE.lme(PositveMod,predictiongrid,level=0,print.matrix=TRUE,se.fit=TRUE)
predsPosG<-cbind(PositiveModel,predoPos)
predsPosG$PosWij<-(predsPosG$fit)^3
predsBinG$CIH<-(predsPosG$PosWij+(1.96*predsPosG$se.fit))^3
predsBinG$CIL<-(predsPosG$PosWij-(1.96*predsPosG$se.fit))^3


DeltaMaker<-cbind(predBinG,predsPosG)

#########get the conditional estimates

DeltaMaker$ConditionalWij<-DeltaMaker$Probability*DeltaMaker$PosWij


####theorectical CI estimation (this is likely wrong???need to
bootstrap or something?)


DeltaMaker$ConditionalWijCIH<-DeltaMaker$CIH.bin*DeltaMaker$CIH

DeltaMaker$ConditionalWijCIL<-DeltaMaker$CIL.bin*DeltaMaker$CIL

####im thinking maybe only using the CIs from the positive model might
make sense?

DeltaMaker$ConditionalWijCIH<-DeltaMaker$Probability*DeltaMaker$CIH

DeltaMaker$ConditionalWijCIL<-DeltaMaker$Probability*DeltaMaker$CIL


Quoting Philip Harrison <pharriso at uwaterloo.ca>:

Dear R-sig-ecology group,

I have some telemetry data with a zero-inflated (semi-continuous)
response, which I modelled using a two stage process using
glmmpql/nmle. I chose nlme/glmmpql because a temporal correlation
structure is very important for my telemetry data and I can live
with pql estimation and no AICs if it means I can avoid winBUGS!.

I am pretty happy with my models and I don?t want to use
ZIPglmms,ZINBglmms or ZAPglmms as the response is a pretty normal
continuous variable and not a count.

So now, if possible I would like to combine the results of the two
models to make predictions about resource selection (with Confidence
intervals) for each of the levels of my fixed effects, using the
?delta? approach, mentioned a few times on r-sig-eco including here

https://stat.ethz.ch/pipermail/r-sig-ecology/2011-May/002159.html

I found quite a few papers about the delta approach (typically using
abundance estimates as the response), but I am having trouble
translating the method into code (I?ve added refs at the end of the
message). I have seen papers where they appear to just multiply the
predictions from the positive model by the probability from the
first model (although the papers in question use Bayesian techniques
multiplying the posterior modes) Can I do the same with the
predictions from my frequentist approach, somehow that seems over
simplistic?

Also the response for the positive part of my model normalizes
nicely with a cube root transformation and many of the papers use
log-normal positive responses- I?m assuming this doesn?t make any
difference as long as I leave the back transformation until the very
end?

Any help would be much appreciated. See below for model details

###############################################################################
Models- First I model the probability of a non zero value, (i.e.
presence/absence) then model the continuous part i.e. the strength
of selection in the event of a positive selection event.  Sorry I
don?t have a real example dataset; however in this case I just need
to find out how to extract the relevant stuff from lme/glmmpql output.

Wij ? this is my resource selection response variable  (Wij=
proportion of time spent in habitat/proportion of habitat
available). This response varies from 0 (i.e. habitat was available
but not used) continuously through 1 (neither selected nor avoided)
and continuously >1 (positive selection) up to a theoretical infinity.

Seas; a categorical 4 level variable
Sequence; a variable created based on detection date used for the
correlation structure.
Habitat: a categorical habitat variable

Xdata$WijBin<-ifelse(Xdata$Wij>0,1,0)
XdataUsed<- Xdata[Xdata$WijBIN0==1,]

BinaryMod<-glmmpql(BinaryVariable~Seas+Habitat+Seas*Habitat,random=~1|FishID,cor=corCAR1(form=~Sequence),data=Xdata,family=?binomial?)

PositiveModel <-lme((Wij)^(1/3)~
Seas+Habitat+Seas*Habitat,random=~1|FishID/TemperatureCategory,cor=corCAR1(form=~Sequence),data=XdataUsed)