Correct specification of ar1 structure in glmmTMB - R-SIG-mixed-models

Fri, Mar 6, 2020 12:06 PM
Hi All,

I'm wondering if someone would be able to clarify how to correctly specify
the ar1 structure in glmmTMB. I have read the help page and the vignette (
https://cran.r-project.org/web/packages/glmmTMB/vignettes/covstruct.html),
and understand that the general formula is ar1(time + 0 | grouping variable),
where time is a factor, but how does this work when time is split across two
separate variables (i.e., date and hour)?

For instance, the dataset I'm working with consists of several ids, and
each id contains multiple days of data, and multiple measurements per day
(i.e., a measurement each hour). The structure in its basic form (no
additional variables) would be something like:

df <- data.frame(id = rep(seq(1,5,1), 3),
                 date = rep(seq(lubridate::ymd("2020-03-01"),
                                lubridate::ymd("2020-03-05"), 1),3),
                 hour = rep(seq(5, 9, 1), 3),
                 value = rnorm(15, 5, 2))

Now change 'hour' and 'date' to class factor and create a unique grouping
variable called 'id_date', consisting of each id associated with each date.

library(dplyr)  # mutate() and the %>% pipe
library(tidyr)  # unite()

df <- df %>%
  mutate(hour_factor = as.factor(hour), date_factor = as.factor(date)) %>%
  unite(id_date, id, date_factor, remove = FALSE)

And now the model:

glmmTMB(value ~ ar1(hour_factor + 0 | id_date) + (1 | id), data = df)

Is this the correct specification for the ar1 structure, when 'hour' is
nested within 'date', and when 'id' is a random effect? Or should 'date'
and 'hour' be combined into a single variable (e.g. 2020-03-01 05:00:00)
and then converted to a factor, with the grouping variable being 'id'?

glmmTMB(value ~ ar1(date_hour + 0 | id) + (1 | id), data = df)
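
For concreteness, here is a minimal base-R sketch of how such a combined
'date_hour' factor might be built. The toy data frame 'df2', the UTC time
zone and the timestamp format are assumptions made only for illustration.
The levels are sorted chronologically because, as far as I understand the
vignette, ar1() takes the order of the factor levels as the time order and
treats successive levels as equally spaced.

```r
# Hypothetical toy data: 2 ids x 3 days x 5 hourly readings (30 rows).
df2 <- expand.grid(hour = 5:9,
                   date = c("2020-03-01", "2020-03-02", "2020-03-03"),
                   id = 1:2,
                   stringsAsFactors = FALSE)

# Combine date and hour into a single timestamp (UTC chosen arbitrarily).
stamp <- as.POSIXct(paste(df2$date, sprintf("%02d:00:00", df2$hour)),
                    tz = "UTC")

# One factor level per distinct timestamp, in chronological order.
fmt <- "%Y-%m-%d %H:%M:%S"
df2$date_hour <- factor(format(stamp, fmt),
                        levels = format(sort(unique(stamp)), fmt))
```

With this coding, the step from 09:00 on one day to 05:00 on the next counts
as a single time step, the same as one hour within a day.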

In my own dataset, both models work, but produce different estimates,
p-values and residuals (note that the example here won't work because of
too few observations).
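
One way to see why the two specifications need not agree is to write out
the correlation each one implies for the AR1 random effects of a single id.
This is only a sketch: 'rho' is a made-up value, 'ar1_corr' is a helper
defined here (not a glmmTMB function), and 3 days of 5 hourly readings are
assumed.

```r
# AR1 correlation matrix for n equally spaced time steps: rho^|i - j|.
ar1_corr <- function(n, rho) rho^abs(outer(seq_len(n), seq_len(n), "-"))

rho <- 0.6  # hypothetical autocorrelation between adjacent time steps

# Grouping by id_date: each day is its own 5-step series, so the hours of
# different days are uncorrelated (block-diagonal, 3 blocks of 5 x 5).
per_day <- kronecker(diag(3), ar1_corr(5, rho))

# Grouping by id with date_hour: one 15-step series per id, so correlation
# carries over from the last hour of one day to the first of the next.
per_id <- ar1_corr(15, rho)

per_day[5, 6]  # last hour of day 1 vs first hour of day 2
per_id[5, 6]   # same pair under the single-series coding
```

Under the first coding that pair has correlation zero; under the second it
has correlation rho, so the two models describe different dependence.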

Thanks,
Simon