Skip to content

glmmPQL: Temporal correlation structure for grouped data with observation gaps

2 messages · Jorge Garcia Molinos, Thierry Onkelinx

#
Dear list members,

I have a question regarding the implementation of a temporal correlation
structure in a GLMM context in R. This is my first time posting here so I
hope the question is suitable for this forum.

I have a data set that comprises summer (June-September) paired daily air
and water temperatures measured at 83 monitoring sites over a period of 3
years. The purpose is to predict water temperatures based on air
temperatures and a suite of environmental variables related to the
characteristics of the rivers and contributing catchment (land use, slope,
elevation and so on). The number of observations among sites differ somehow
(the series at some sites started at slightly different dates and some
sites have missing values within the series).

My approach so far has been as follows:

1. Fit a GLMM on water temperatures (WT) with air temperature (AT) and the
different environmental variables as fixed factors and site as random
factor (random intercept).

model <- glmmPQL(WT ~ AT + H2OArea + avEleE + RipB60_fdec + RipB60_fev,
random = ~1|Site, data = t_d, family = Gamma(link = "log"))

The reason for using glmmPQL is because of the need to incorporate a
temporal correlation structure (see point 2)

2. Given the nature of the data, temporal correlation is to be expected. A
check of acf and pacf functions on the model residuals by site shows
consistently a gradually decaying acf, and a sharp drop of the pacf below
the significance threshold after lag 1 (for which the pacf equals approx..
0.8), suggesting an AR1 process is suitable. Considering the different
observation gaps in the temperature series by station and the seasonal gaps
between years (only summer data), I decided to try with a corCAR1 structure
of the form:

modelAR <- update(m5, correlation = corCAR1(0.8, form = ~days|Site))

Where days correspond to the day number since the start of the series
across all sites (the first day an observation was made across all series).
That is:

t_d$days <- difftime(t_d$date, min(t_d$date), units = "days")

3. However, the acf and pacf plots for the residuals in the updated model
look nearly identical from those of the original model, which makes me
think that I have not specified the correlation structure or the time
variable correctly or that I?m missing something else?

An extract of the data set for a few stations can be downloaded here:

https://drive.google.com/file/d/1iMaXSSYvCJ1Xo09ZoJjbLdzmc0gjfJKy/view?usp=sharing

Any help/suggestions will be greatly appreciated!

Many thanks,
Jorge
#
Dear Jorge,

glmmPQL is not the only option if you need a correlation structure. glmmTMB
and INLA allow for correlated random effects. That is IMHO superior to
correlated residuals within the most detailed levels of the random effect
(what nlme does).

Furthermore you need the normalised residuals, as they take the correlation
structure into account.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>


Op wo 10 mrt. 2021 om 05:58 schreef Jorge Garcia Molinos <garciamj at tcd.ie>: