Skip to content
Prev 4430 / 7420 Next

Cosinor with data that trend over time

I would probably attack this using a GAM modified to model the
residuals as a stochastic time series process.

For example

require("mgcv")
mod <- gamm(y ~ s(DoY, bs = "cc") + s(time), data = foo,
                         correlation = corCAR1(form = ~ time))

where `foo` is your data frame, `DoY` is a variable in the data frame
computed as `as.numeric(strftime(RDate, format = "%j"))` and `time` is
a variable for the passage of time - you could do `as.numeric(RDate)`
but the number of days is probably large as we might encounter more
problems fitting the model. Instead you might do `as.numeric(RDate) /
1000` say to produce values on a more manageable scale. The `bs =
"cc"` bit specifies a cyclic spline applicable to data measured
throughout a year. You may want to fix the start and end knots to be
days 1 and days 366 respectively, say via `knots = list(DoY =
c(0,366))` as an argument to `gam()` [I think I have this right,
specifying the boundary knots, but let me know if you get an error
about the number of knots]. The residuals are said to follow a
continuois time AR(1), the irregular-spaced counter part to the AR(1),
plus random noise.

There may be identifiability issues as the `s(time)` and `corCAR1()`
compete to explain the fine-scale variation. If you hit such a case,
you can make an educated guess as to the wiggliness (degrees of
freedom) for the smooth terms based on a plot of the data and fix the
splines at those values via argument `k = x` and `fx = TRUE`, where
`x` in `k = x` is some integer value. Both these go in as arguments to
the `s()` functions. If the trend is not very non-linear you can use a
low value 1-3 here for x and for the DoY term say 3-4 might be
applicable.

There are other ways to approach this problem of identifiability, but
that would require more time/space here, which I can go into via a
follow-up if needed.

You can interrogate the fitted splines to see when the peak value of
the `DoY` term is in the year.

You can also allow the seasonal signal to vary in time with the trend
by allowing the splines to "interact" in a 2d-tensor product spline.
Using `te(DoY, time, bs = c("cc","cr"))` instead of the two `s()`
terms (or using `ti()` terms for the two "marginal" splines and the
2-d spline). Again you can add in the `k` = c(x,y), fx = TRUE)` to the
`te()` term where `x` and `y` are the dfs for each dimension in the
`te()` term. It is a bit more complex to do this for `ti()` terms.

Part of the reason to prefer a spline for DoY for the seasonal term is
that one might not expect the seasonal cycle to be a symmetric cycle
as a cos/sin terms would imply.

A recent ecological paper describing a similar approach (though using
different package in R) is that of Claire Ferguson and colleagues in J
Applied Ecology (2008) http://doi.org/10.1111/j.1365-2664.2007.01428.x
(freely available).

HTH

G
On 25 March 2014 19:14, Jacob Cram <cramjaco at gmail.com> wrote: