Skip to content

Cosinor with data that trend over time

7 messages · Jacob Cram, Gavin Simpson

#
I would probably attack this using a GAM modified to model the
residuals as a stochastic time series process.

For example

require("mgcv")
mod <- gamm(y ~ s(DoY, bs = "cc") + s(time), data = foo,
                         correlation = corCAR1(form = ~ time))

where `foo` is your data frame, `DoY` is a variable in the data frame
computed as `as.numeric(strftime(RDate, format = "%j"))` and `time` is
a variable for the passage of time - you could do `as.numeric(RDate)`
but the number of days is probably large as we might encounter more
problems fitting the model. Instead you might do `as.numeric(RDate) /
1000` say to produce values on a more manageable scale. The `bs =
"cc"` bit specifies a cyclic spline applicable to data measured
throughout a year. You may want to fix the start and end knots to be
days 1 and days 366 respectively, say via `knots = list(DoY =
c(0,366))` as an argument to `gam()` [I think I have this right,
specifying the boundary knots, but let me know if you get an error
about the number of knots]. The residuals are said to follow a
continuois time AR(1), the irregular-spaced counter part to the AR(1),
plus random noise.

There may be identifiability issues as the `s(time)` and `corCAR1()`
compete to explain the fine-scale variation. If you hit such a case,
you can make an educated guess as to the wiggliness (degrees of
freedom) for the smooth terms based on a plot of the data and fix the
splines at those values via argument `k = x` and `fx = TRUE`, where
`x` in `k = x` is some integer value. Both these go in as arguments to
the `s()` functions. If the trend is not very non-linear you can use a
low value 1-3 here for x and for the DoY term say 3-4 might be
applicable.

There are other ways to approach this problem of identifiability, but
that would require more time/space here, which I can go into via a
follow-up if needed.

You can interrogate the fitted splines to see when the peak value of
the `DoY` term is in the year.

You can also allow the seasonal signal to vary in time with the trend
by allowing the splines to "interact" in a 2d-tensor product spline.
Using `te(DoY, time, bs = c("cc","cr"))` instead of the two `s()`
terms (or using `ti()` terms for the two "marginal" splines and the
2-d spline). Again you can add in the `k` = c(x,y), fx = TRUE)` to the
`te()` term where `x` and `y` are the dfs for each dimension in the
`te()` term. It is a bit more complex to do this for `ti()` terms.

Part of the reason to prefer a spline for DoY for the seasonal term is
that one might not expect the seasonal cycle to be a symmetric cycle
as a cos/sin terms would imply.

A recent ecological paper describing a similar approach (though using
different package in R) is that of Claire Ferguson and colleagues in J
Applied Ecology (2008) http://doi.org/10.1111/j.1365-2664.2007.01428.x
(freely available).

HTH

G
On 25 March 2014 19:14, Jacob Cram <cramjaco at gmail.com> wrote:

  
    
#
1) Visually - unless it actually matters exactly on which day in the
year the peak is observed? If visually is OK, just do `plot(mod, pages
= 1)` to see the fitted splines on a single page. See `?plot.gam` for
more details on the plot method.

2) You could generate some new data to predict upon as follows:

    newdat <- data.frame(DoY = seq_len(366), time = mean(foo$time))

Then predict for those new data but collect the individual
contributions of the spline terms to the predicted value rather than
just the final prediction

    pred <- predict(mod, newdata = newdat, type = "terms")

Then find the maximal value of the DoY contribution

    take <- which(pred$DoY == max(pred$DoY))
    newdat[take, , drop = FALSE]

You could use

    take <- which.max(pred$DoY)

instead of the `which()` I used, but only if there is a single maximal value.

This works because the spline terms in the additive model are just
that; additive. Hence because you haven't let the DoY and time splines
interact (in the simple model I mentioned, it is more complex if you
allow these to interact as you then need to predict DoY for each years
worth of time points), you can separate DoY from the other terms.

None of the above code has been tested, but was written off top of my
head, but should work or at least get you pretty close to something
that works.

HTH

G
On 26 March 2014 10:02, Jacob Cram <cramjaco at gmail.com> wrote:

  
    
#
Sorry about the errors (typos, not syntax errors) - I was forgetting
that you'd need to use `gamm()` and hence access the `$gam` component

I don't follow the point about a factor trending up or down. You
shouldn't try to use the `$lme` part of the model for this.
`summary(mod$gam)` should be sufficient, but as it relates to a
spline, this is more a test of whether the spline is different from a
horizontal, flat, null line. The problem with splines is that the
trend need not just be "trending up or down". In the past, to convey
where change in the trend occurs I have used the first derivative of
the fitted spline and looked for where in time the 95% confidence
interval on the first derivative of the spline doesn't include zero;
that shows the regions in time where the trend is significantly
increasing or decreasing. I cover how to do this in a blog post I
wrote:

http://www.fromthebottomoftheheap.net/2011/06/12/additive-modelling-and-the-hadcrut3v-global-mean-temperature-series/

the post contains links to the R code used for the derivatives etc,
though it is a little more complex in the case of a model with a trend
spline and seasonal spline.

I'm supposed to have updated those codes and the post because several
people have asked me how I do the analysis for models with multiple
spline terms. If you can't get the code to work for your models, ping
me back and I'll try to move that to the top of my TO DO list.

Note the `Xs(time)Fx1` entries in the `summary(mod$lme)` table refer
to the basis functions that represent the spline or at least to some
part of those basis functions. You can't really make much practical
use out of those values are they relate specifically to way the
penalised regression spline model has been converted into an
equivalent linear mixed effect form.

HTH

G
On 26 March 2014 12:10, Jacob Cram <cramjaco at gmail.com> wrote:

  
    
3 days later