Datum: Sun, 07 Oct 2012 00:07:49 +0100
Von: Jarrod Hadfield <j.hadfield at ed.ac.uk>
An: Katrin Ronnenberg <ronnenberg at gmx.com>
CC: r-sig-mixed-models at r-project.org
Betreff: Re: [R-sig-ME] ZIP MCMCglmm
Hi,
A) The residual variance for the zero-inflation is not identified nor
is the residual covariation between the zero-inflation and the Poisson
part. However, you are attempting to estimate both. Use:
rcov=idh(trait):units
and fix the [2,2] element of the covariance matrix at something (e.g
1) in the prior (i.e. add fix=2 to your R prior).
B) you have no unique intercepts for the zero inflation and the
Poisson part. You should have trait as a main effect in the fixed
formula (or trait-1 might be easier to interpret).
C)
random= ~us(1+sqkm):EFF_Track
assumes that the regression parameters (intercept and slope) of sqkm
on log Poisson rate and logit zero-inflation are identical within
EFF_Tracks, and that the slope=0 on average (because its not in the
fixed formula). I would have one of the following in the fixed effects:
a) trait:sqkm
b) at.level(trait,1):sqkm
c) at.level(trait,2):sqkm
and one of the following in the random effects.
i) random= ~us(trait):EFF_Track
ii) random= ~idh(trait):EFF_Track
iii) random= ~us(at.level(trait,1)):EFF_Track
iv) random= ~us(at.level(trait,2)):EFF_Track
and make sure that if i) or ii) to have a), if iii) to have a) or b)
and if iv) to have a) or c).
If you use ZAP rather than ZIP then fixed=~sqkm and random= ~EFF_Track
is OK too.
D) DIC should not be trusted.
E) Not sure, but it sounds like g(0) is correcting for missed
sightings of individuals present (?) and therefore refers to the
Poisson part of the model. The zero-inflation part may also be due to
missed sightings of individuals present (e.g. on some transects the
observers had their eyes closed!) but it is possibly more likely to be
due to absence of porpoises? I don't know.
Cheers,
Jarrod
Quoting Katrin Ronnenberg <ronnenberg at gmx.com> on Fri, 05 Oct 2012
12:31:12 +0200:
Dear all,
I have several questions on a zero-inflated MCMCglmm model.
Data consist of 764 transects with porpoise counts ("sum_ppho"). On
304 transects zero porpoises were seen. Data come from several
studies from the North Sea, they were all sampled in roughly the
same area and from altogether 12 years. We are interested in a trend
("year"), seasonal effects ("dayofyear" quadratic (centered) Term +
the simple term. Moreover we have a simple geographical variable
"we"; a factor with two levels "West" and "east". The transect ID
(EFF_Track) I integrated as random factor, assuming that transects
are not independant of each other I used the us structure. Most
transects were revisited over several years, but not all of them.
Since Transects had differing length and visibility conditions,
which affected the effective area sampled, I used the effective area
("sqkm") as random slope term.
in MCMCglmm I tried the following:
prior<-list(R=list(V=diag(2),nu=0.002),
G=list(G1=list(V=diag(2),nu=0.002)))
Mz<-MCMCglmm(sum_ppho~at.level(trait,1):year*at.level(trait,1):we+
+ at.level(trait,1):(I(cdayofyear^2)+
at.level(trait,1):dayofyear), random= ~us(1+sqkm):EFF_Track,
+ family="zipoisson",
+ data=encounter, rcov=~us(trait):units, prior=prior,
verbose=FALSE,
+ nitt=100000,thin =1000, burnin = 5000)
Iterations = 5001:99001
Thinning interval = 1000
Sample size = 95
DIC: 2843.102
G-structure: ~us(1 + sqkm):EFF_Track
post.mean l-95% CI u-95% CI eff.samp
(Intercept):(Intercept).EFF_Track 3.62368 1.21791 6.03721 6.215
sqkm:(Intercept).EFF_Track -0.38983 -0.61860 -0.15801 14.857
(Intercept):sqkm.EFF_Track -0.38983 -0.61860 -0.15801 14.857
sqkm:sqkm.EFF_Track 0.05025 0.02699 0.07369 42.211
R-structure: ~us(trait):units
post.mean l-95% CI u-95% CI eff.samp
sum_ppho:sum_ppho.units 0.8150 0.673652 1.0178 69.615
zi_sum_ppho:sum_ppho.units 0.1894 -0.472509 0.6914 7.978
sum_ppho:zi_sum_ppho.units 0.1894 -0.472509 0.6914 7.978
zi_sum_ppho:zi_sum_ppho.units 0.1561 0.001532 0.6305 8.428
Location effects: sum_ppho ~ at.level(trait, 1):year *
at.level(trait, 1):we + at.level(trait, 1):(I(cdayofyear^2) +
at.level(trait, 1):dayofyear)
post.mean l-95% CI u-95% CI
eff.samp pMCMC
(Intercept) -3.315e+00 -4.486e+00 -2.045e+00
1.993 <0.01 *
at.level(trait, 1):year 1.673e-03 1.111e-03 2.169e-03
3.976 <0.01 *
at.level(trait, 1):wew -5.459e+02 -6.696e+02 -4.612e+02
40.273 <0.01 *
at.level(trait, 1):I(cdayofyear^2) -3.117e-05 -5.063e-05 -7.107e-06
95.000 <0.01 *
at.level(trait, 1):dayofyear -1.726e-03 -3.851e-03 2.335e-05
59.665 0.0632 .
at.level(trait, 1):year:wew 2.727e-01 2.305e-01 3.343e-01
40.388 <0.01 *
---
In general I feel insecure if my assumptions are right and whether
prior specifications make sense. But I have two main questions:
1. Question: Am I correct in the assumption, that, if the output for
autocorr(M1$Sol) and autocorr(M1$VCV) are low, that spatial
autocorrelation is no issue? If I specify a relatively simple model
(only main effects), autocorrelation is no issue, however the more
interaction terms I include the stronger the autocorrelation
becomes. Increasing itterations by a factor of ten to:
nitt=100000,thin =5000, burnin = 10000, hardly changed anything
about that. Is there anything else I could try? I know that I can
increase nitt... further and ask Malcolm about the sir() function.
But I'm especially interested, if experimenting with the random term
has a chance of improving things?
2. Question: Following methods of distance sampling, a G(0) factor
was estimated compensating for the diving porpoises, which escaped
the surveyors. Originally we corrected the observations by dividing
the counts by the G(0) factor. So, basically only the transects with
counts were corrected. Am I right in the assumption that the latent
variable in a ZIP model compensates for all values with false zeros
i.e. also the false zero for the third animal in a transect with 2
counted animals? If that's the case, I would just not include any
correction for g(0). But if the latent variable only compensates for
false zeros of all zero values, but not for other values from the
poisson distribution, then I would assume that the correction factor
is needed. Is that the difference between a hurdle and a ZIP model?
ZIP was in my case better than hurdle according to DIC.
I'm sorry, I hope you can understand my questions. I'm neither a
statistician, nor a native speaker, so I would like to appologize
for bad choice of words. And please let me know, if a representive
sample of my data would help.
Thank you for reading this, ideas and suggestions are very much