naive question about output from mcmcsamp v. lmer estimates - R-SIG-mixed-models

Wed, Feb 7, 2007 7:29 AM #

Dear List,

I am  attempting to estimate the quantitative trait statistic Qst for a 
colleague. Essentially I am fitting the model

X[ijk] = mu + alpha[i] + beta[ij] + gamma[k] + epsilon[ijk]

to data from measured from trees planted in an unbalanced, incomplete 
block experiment, where X[ijk] = individual observation in the ith 
provenance, ijth family and kth block; mu, alpha[i], beta[ij], and 
gamma[k] are the effects of population mean, provenance, family within 
provenance and block respectively. Qst is estimated using variance 
components for provenance and family within provenance. Because of the 
problems inherent in using point estimates to estimate Qst I would like 
to use a Bayesian approach as proposed by Waldmann et al. (Heredity 
(2005) 94, 623?629). I have implemented this model in winBugs and get 
similar summary output to fitting the following lmer model (from lme4, 
using R 2.4.1 on windows):

 >summary(lme.HT02)

Linear mixed-effects model fit by REML
Formula: HT02 ~ (1 | Block) + (1 | Provenance/Family)
    Data: quant
    AIC   BIC logLik MLdeviance REMLdeviance
  12615 12636  -6303      12611        12607
Random effects:
  Groups            Name        Variance Std.Dev.
  Family:Provenance (Intercept)   3.0350  1.7421
  Provenance        (Intercept) 191.7605 13.8478
  Block             (Intercept)  16.1534  4.0191
  Residual                      593.3983 24.3598
number of obs: 1359, groups: Family:Provenance, 335; Provenance, 18; 
Block, 14

Fixed effects:
             Estimate Std. Error t value
(Intercept)  123.497      3.549    34.8

As winBugs is very slow for this problem I was hoping to use mcmcsamp 
instead. I am being naive in thinking that I can use mcmcsamp to 
estimate posterior variance components densities? E.g.

 >colMeans(mcmcsamp(lme.HT02,10000,trans=FALSE))
(Intercept)     sigma^2   Fm:P.(In)   Prvn.(In)   Blck.(In)
   122.57740   571.59628    60.77002    61.59250    17.56870

I was expecting these numbers to be similar, although not identical, to 
the lmer output. This is all quite new ground for me so, I readily 
accept I have probably gone wrong somewhere. But where?

Thanks,

James

-- 
Dr James Gibbons
Research Lecturer in Ecological Modelling
School of the Environment & Natural Resources
University of Wales, Bangor
phone: 01248 382461
email: j.gibbons at bangor.ac.uk

Gall y neges e-bost hon, ac unrhyw atodiadau a anfonwyd gyda hi,
gynnwys deunydd cyfrinachol ac wedi eu bwriadu i'w defnyddio'n unig
gan y sawl y cawsant eu cyfeirio ato (atynt). Os ydych wedi derbyn y
neges e-bost hon trwy gamgymeriad, rhowch wybod i'r anfonwr ar
unwaith a dil?wch y neges. Os na fwriadwyd anfon y neges atoch chi,
rhaid i chi beidio ? defnyddio, cadw neu ddatgelu unrhyw wybodaeth a
gynhwysir ynddi. Mae unrhyw farn neu safbwynt yn eiddo i'r sawl a'i
hanfonodd yn unig  ac nid yw o anghenraid yn cynrychioli barn
Prifysgol Cymru, Bangor. Nid yw Prifysgol Cymru, Bangor yn gwarantu
bod y neges e-bost hon neu unrhyw atodiadau yn rhydd rhag firysau neu
100% yn ddiogel. Oni bai fod hyn wedi ei ddatgan yn uniongyrchol yn
nhestun yr e-bost, nid bwriad y neges e-bost hon yw ffurfio contract
rhwymol - mae rhestr o lofnodwyr awdurdodedig ar gael o Swyddfa
Cyllid Prifysgol Cymru, Bangor.  www.bangor.ac.uk

This email and any attachments may contain confidential mate...{{dropped}}

Douglas Bates

Wed, Feb 7, 2007 8:26 AM #

On 2/7/07, Gibbons,James <afs417 at bangor.ac.uk> wrote:

Dear List,

I am  attempting to estimate the quantitative trait statistic Qst for a
colleague. Essentially I am fitting the model

X[ijk] = mu + alpha[i] + beta[ij] + gamma[k] + epsilon[ijk]

to data from measured from trees planted in an unbalanced, incomplete
block experiment, where X[ijk] = individual observation in the ith
provenance, ijth family and kth block; mu, alpha[i], beta[ij], and
gamma[k] are the effects of population mean, provenance, family within
provenance and block respectively. Qst is estimated using variance
components for provenance and family within provenance. Because of the
problems inherent in using point estimates to estimate Qst I would like
to use a Bayesian approach as proposed by Waldmann et al. (Heredity
(2005) 94, 623?629). I have implemented this model in winBugs and get
similar summary output to fitting the following lmer model (from lme4,
using R 2.4.1 on windows):

 >summary(lme.HT02)

Linear mixed-effects model fit by REML
Formula: HT02 ~ (1 | Block) + (1 | Provenance/Family)
    Data: quant
    AIC   BIC logLik MLdeviance REMLdeviance
  12615 12636  -6303      12611        12607
Random effects:
  Groups            Name        Variance Std.Dev.
  Family:Provenance (Intercept)   3.0350  1.7421
  Provenance        (Intercept) 191.7605 13.8478
  Block             (Intercept)  16.1534  4.0191
  Residual                      593.3983 24.3598
number of obs: 1359, groups: Family:Provenance, 335; Provenance, 18;
Block, 14

Fixed effects:
             Estimate Std. Error t value
(Intercept)  123.497      3.549    34.8

As winBugs is very slow for this problem I was hoping to use mcmcsamp
instead. I am being naive in thinking that I can use mcmcsamp to
estimate posterior variance components densities? E.g.

 >colMeans(mcmcsamp(lme.HT02,10000,trans=FALSE))

(Intercept)     sigma^2   Fm:P.(In)   Prvn.(In)   Blck.(In)
   122.57740   571.59628    60.77002    61.59250    17.56870

I was expecting these numbers to be similar, although not identical, to
the lmer output. This is all quite new ground for me so, I readily
accept I have probably gone wrong somewhere. But where?

As a first step I would suggest that you examine the mcmcsamp result
to see if the chains are stable.  Save the output from mcmcsamp as,
say, samp.HT02 and plot the chains as parallel time series

library(coda)
xyplot(samp.HT02)

You might also want to examine empirical density estimates derived
from the chains

densityplot(samp.HT02, plot.points = FALSE)

or normal probability plots

qqmath(samp.HT02, type = c("g", "p"))

If the distribution of the MCMC sample for a parameter is badly skewed
or otherwise unstable then  the sample mean will not be near to the
estimate.

Gibbons,James

Wed, Feb 7, 2007 9:20 AM #

Douglas Bates wrote:

Thanks for the quick reply. I did as you suggested, the xyplots don't 
look unusual (at least to my eyes) and

 > summary(HT02.mcmc)

Iterations = 1:10000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
    plus standard error of the mean:

               Mean    SD Naive SE Time-series SE
(Intercept) 122.55  2.42   0.0242        0.02434
sigma^2     570.51 25.13   0.2513        0.45181
Fm:P.(In)    61.39 12.36   0.1236        0.45281
Prvn.(In)    61.97 34.94   0.3494        0.58404
Blck.(In)    17.65 11.28   0.1128        0.27032

2. Quantiles for each variable:

                2.5%    25%    50%    75%  97.5%
(Intercept) 117.953 120.92 122.48 124.10 127.54
sigma^2     523.706 553.22 569.79 586.78 622.33
Fm:P.(In)    40.139  52.95  60.40  68.71  88.91
Prvn.(In)    19.185  38.20  53.86  76.79 150.31
Blck.(In)     3.441  10.09  15.22  22.30  46.02

doesn't suggest much skew. Running 100,000 samples also makes little 
difference.

By way of comparison some output from my winBugs implementation (100,000 
iterations, 10,000 burnin):

node	 mean	 sd	 MC error	2.5%	median	97.5%
intercept	126.6	7.819	0.4338	110.9	128.4	139.8
v.blocks	18.92	12.02	0.08299	4.772	16.16	49.36	
v.fam	3.056	6.032	0.2949	9.606E-4	0.1622	21.77
v.prov	174.9	74.91	1.05	78.63	159.1	361.7

the family posterior does look very skewed. These are with gamma priors 
for the variances and without a centering parameterization (i.e. very 
naive),

James
-- 
Dr James Gibbons
Research Lecturer in Ecological Modelling
School of the Environment & Natural Resources
University of Wales, Bangor
phone: 01248 382461
email: j.gibbons at bangor.ac.uk

Gall y neges e-bost hon, ac unrhyw atodiadau a anfonwyd gyda hi,
gynnwys deunydd cyfrinachol ac wedi eu bwriadu i'w defnyddio'n unig
gan y sawl y cawsant eu cyfeirio ato (atynt). Os ydych wedi derbyn y
neges e-bost hon trwy gamgymeriad, rhowch wybod i'r anfonwr ar
unwaith a dil?wch y neges. Os na fwriadwyd anfon y neges atoch chi,
rhaid i chi beidio ? defnyddio, cadw neu ddatgelu unrhyw wybodaeth a
gynhwysir ynddi. Mae unrhyw farn neu safbwynt yn eiddo i'r sawl a'i
hanfonodd yn unig  ac nid yw o anghenraid yn cynrychioli barn
Prifysgol Cymru, Bangor. Nid yw Prifysgol Cymru, Bangor yn gwarantu
bod y neges e-bost hon neu unrhyw atodiadau yn rhydd rhag firysau neu
100% yn ddiogel. Oni bai fod hyn wedi ei ddatgan yn uniongyrchol yn
nhestun yr e-bost, nid bwriad y neges e-bost hon yw ffurfio contract
rhwymol - mae rhestr o lofnodwyr awdurdodedig ar gael o Swyddfa
Cyllid Prifysgol Cymru, Bangor.  www.bangor.ac.uk

This email and any attachments may contain confidential mate...{{dropped}}