Skip to content

time series in R

13 messages · Paul Gilbert, Ross Ihaka, Brian Ripley +3 more

#
Time Series functions in R
==========================

I think a good basic S-like functionality for library(ts) in base R
would include

ts class, tsp, is.ts, as.ts
plot methods
start end window frequency cycle deltat
lag diff aggregate
filter

spectrum, spec.pgram, spec.taper, cumulative periodogram, spec.ar?

ar -- at least univariate by Yule-Walker
arima -- sim, filter, mle, diag, forecast

stl


Of the first group we lack lag, deltat, cycle, filter. These are simple.
We have plot.ts, but might want lines.ts, for example.

Base R has nothing else, so I've taken a look at bats, tseries, dse
and timeslab.

bats
====

This is close to S(-PLUS) but lacks examples.
Its methods are often simple ones: it might benefit a lot from some
re-coding in C / Fortran.

It would give us 

ar.yw

acf and spectrum (spec.pgram, plot.spec).


tseries
=======

This has S-like function names, but does not, for example, accept
multiple time series in acf.  It has nice plots and nice examples.
(It would be better if the LIBNAME were tseries, when no Makefile
would be needed: as it is $(LD) should be $(SHLIBLD) and CFLAGS +=
-Wall is not a good idea being make and CC-specific.)

It would give us

acf/ccf/pacf (for acf in S)
spectrum/cross spectra/cumulative.periodogram


dse
===

As far as I can see (it is very complex) this handles state-space models,
and can handle ARMA models by converting them (approximately)
to state-space form.

dse is rather un-R-like, and did not compile for me without fixing 20
or so Fortran errors, and did not load for me even after an hour's
work fixing many more in the R code. I think the version in devel is
currently unusable, and has far, far too many conflicts. Given Paul's
claims of stability, I have spent a deeply frustrating afternoon.


timeslab
========

Rather self-contained, copyright position unclear (to me).



Suggestions:
===========

If Martyn and Adrian are agreeable, I will start a library(ts) in the
0.65 version. I will put in this:

the ts class and methods (maybe in due course these should be removed
from the base package?)

datasets from MASS, tseries

lag, deltat, cycle
filter

acf from bats, augmented by ideas from tseries.

spectrum, spec.pgram from bats  (after looking at tseries hard)
spec.taper
cpgram from MASS

ar.yw from bats  (although that probably needs to be moved to
C/Fortran, and I may special-case the univariate case to use code I
have for that).


That leaves

stl -- I believe the code for this is on netlib.

arima modelling.  I do have Fortran code for this, and will work on
  a simple R interface for it.  An elegant interface can come later.


One thing we do need to be very careful about is the odd constants
and signs that occur in time-series work. No two books I consulted
agree exactly on the definitions of the periodogram and spectrum!
I think we should follow S-PLUS unless there are compelling reasons
not to. I am bit worried that some of the fft usage actually
only computes approximations, and we need to assess how serious that is.


As the feature-freeze for 0.65 is probably about 2-3 weeks away, we
ought to concentrate on getting the basic stuff (bats-like) in first.

How does that sound as a plan?

Brian
#
I think this is a good idea.
I'd really like the multivariate one if possible. I'm quite sure ar.yw
in bats is multivariate.
Following is not important for the immediate goal, but I am surprize and
would also like to correct a couple of mis-conceptions.
models,
No. It handles both ARMA models and state-space models without any
conversion. It can also convert between the representations. In theory
this is algebraic and not approximate, but it is of course subject to
the usual numerical approximations of computers. The tolerances I get
are typically as good as or better than the differences between Solaris
and Linux calculation. (for example, 1e-15 for differences in residuals,
with data having order of magnitude around 1.) Model roots also compare
to very tight tolerances.
This surprizes me very much as it has compiled for me in Solaris and
Linux since R 0.16 (although I had to compile directly into R in those
days). At one point there was a bug in the g77 compiler which caused a
problem, but that seems to be fixed now. In earlier versions of R I used
f2c, and at  one point there were some unresolved references which I
fixed (by something simple like changing ^2 to ^2.0). When I used the
Sun Fortran compiler it always worked, but I haven't used it in a long
time now.
The version on CRAN is fairly old (by Internet time). Although the code
has not changed much, R's installation procedures have. I am surprized
that there are conflicts, but am not surprized that there are install
problems with recent versions of R, given all the changes in the way R
installs packages.  I intend to update the CRAN version as soon as R's
installation procedures stabalize. In most respects I believe that has
happened now, but there is a pending change in the building of help
files which I have been waiting to use. I have tried to advertize that
the most recent version at <www.bank-banque-canada/pgilbert> should
install with the most recent versions of R. I guess it would be useful
to have some indication of this on CRAN. Perhaps it would be better not
to leave an old version on CRAN and instead just point to were the most
recent versions are located?

If others are having problems compiling this code please let me know.

Paul Gilbert




-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Mon, 19 Jul 1999, Prof Brian D Ripley wrote:

            
I agree with basically everything Brian says, and would add the
following thoughts:

    1.	Think carefully about defining interfaces before leaping
	into code.  By defining an interface independent of the
	particular code at hand we will make it easier to switch to
	new code.

    2.	Think multivariate even when implementing a function for
	univariate series.  If the interface is defined well, it
	should later be able to generalize to the multivariate case.
	I'd hate to see a second multivariate time-series package
	come along later.

    3.	On the definition question: The existing FFT implementation
	uses a particular definition for the discrete transform which
	is pretty standard.  (Edwards "Fourier Series", Brillinger
	"Time Series" etc.) Using another definition may complicate
	documentation.

    4.	The notation for ARMA models may well be the stuff that
	holy wars are made of, but my personal preference is
	    X[i] + phi[1] * X[i - 1] + ... + phi[p] * X[i - p]
	    = theta[0] + epsilon[i] + ... + theta[q] * epsilon[i - q]
	[ Man to judge: ``It was self defense - I thought he was
	  going to hit me, so I hit him first.'' ]

I'd also add a request for some flexible structural modelling code.
I've pretty much abandonded teaching ARIMA models in favour of
structural models.  (I've written some general code for this, but
there is probably better code about.  There are a couple of TOMS
algorithms for Kalman Filtering which should be checked out.)

Finally: Does anyone know Kitagawa?  He has some very nice time series
and smoothing code.  Could he be persuaded to make it available?

	Ross

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
...
For this episode of the war perhaps we should stick with univariate
arima as in S, and use the same notation. For the multivariate episode I
like

A(L) y(t) = B(L) e(t) + C(L) u(t)

for ARMA models and

z(t) = F z(t-1) + G u(t) +K e(t-1)
y(t) = H z(t) + e(t)

for state-space models. I think it is important to think about
state-space and ARMA together, in order to try and have a consistent
approach.

Paul Gilbert

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Tue, 20 Jul 1999, Ross Ihaka wrote:

            
Yes, but we also want (to some extent) back-compatibility with S(-PLUS).
I would like to get something in there, even if we discover it
is not right.  For example, I have been trying what is already there on
bivariate series, with a lot of surprises.
That's the simple part.  But what is the divisor in the periodogram? Which
way do lags go in acfs of bivariate series, and which sign is the phase for
bivariate spectra?
Well, that is not mine, and it is not S-PLUS's either.

I see the virtue of a grander plan, but would like to get the 
simpler parts in now. And having spent an hour writing ts.union and
ts.intersect, I realize the simpler parts are not so simple.  Bill and I
had planned a new ts package to go with V&R3, but the work seemed to
be too much (and seems to be too much before R 1.0, too).

Brian
#
On Mon, 19 Jul 1999, Prof Brian D Ripley wrote:

            
Once you settle the forward discrete transform much of this is settled.
No constant in the dft implies
	Periodogram = |dft|^2/(2*pi*T)
Phases in spectra also fall out if you take a +ve exponent in the dft.

I don't much mind about these choices, but its probably a good idea to
be consistent.
	
	Ross

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Tue, 20 Jul 1999, Ross Ihaka wrote:

            
Isn't that the tail wagging the dog?
Why is the 2*pi there? It is in Bloomfield, but not Brockwell & Davis, for
example. And S-PLUS divides by the fequency to make the periodogram
estimate the spectral density.
But you have to decide which of series i and j gets the complex conjugate.
(Same issue with acfs: second relative to first or first relative to
second?  It's like defining `lags'.)
Consistent with whom?
#
Prof Brian D Ripley wrote:

            
I agree with almost everything.
Concerning the normalization, definition of fft and so on, I just want to say how
it is done in tseries:

Periodogram: Normalizing by (2*pi*n) where n is the length of the original series,
i.e. it differs from Brockwell&Davies by the factor 2*pi. Furthermore, normalizing
by n (and not npad) preserves the shape and not the sum under/over the
periodogram. I also remove the mean before starting the main computation which
makes the correction (10.4.7) from B&D unnecessary(?).
What about simple OLS estimator ar.ols?
I will wait until this first version is finished and then update the tseries
library to fit with the R ts.base library so we can use, e.g., the tests etc. What
about the name tseries. Should I change this name, so that it does not conflict
with the time series base?

Adrian

--
Adrian Trapletti, Vienna University of Economics and Business
Administration, Augasse 2-6, A-1090 Vienna, Austria
Phone: ++43 1 31336 4561, Fax: ++43 1 31336 708,
Email: adrian.trapletti@wu-wien.ac.at



-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
Prof Brian D Ripley wrote:

            
I forgot to suggest to add the "embed" routine from tseries to the above list? I
think it's a base routine. It works also for multivariate series and is useful for
many higher level routines.

Adrian

--
Adrian Trapletti, Vienna University of Economics and Business
Administration, Augasse 2-6, A-1090 Vienna, Austria
Phone: ++43 1 31336 4561, Fax: ++43 1 31336 708,
Email: adrian.trapletti@wu-wien.ac.at



-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On 19-Jul-99 Prof Brian D Ripley wrote:
Some of the ts class stuff is implemented in the C code of the base
library. I'm not sure you can separate this.

I also suggested an "na.omit" method for time series - which would be
used by many of the time series functions. Martin has made na.omit()
and na.fail() generic so this is possible now.
I think there is some functional overlap between filter() and convolve(),
which now has a type="filter" option. I'm not sure convolve() needs this
if there is a filter() function.
I updated acf() in bats_0.1-3.tar.gz (now on CRAN) to use FFT instead,
after seeing its rather embarrassing performance.
Adrian's spectrum() function allows a wider range of kernel smoothers
and it would be nice to keep this.  Since spectrum() is just a wrapper
function anyway, it should be possible to do this, while keeping the
S-PLUS compatible interface.
I haven't tried very hard to make my code efficient, as you can see,
concentrating instead on getting something that works like the S-PLUS
version. Much of the work involved trying to get the same answer as
S-PLUS, and when that wasn't possible, trying to work out what S-PLUS
was doing wrong (see the COMPAT file).

I think the multivariate version would be harder to implement in C.
I will see if I can speed the R code up.

I have been looking at Burg's algorithm and I think I can implement
this now, if you want.
Sounds good.

Martyn
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On 19-Jul-99 Prof Brian D Ripley wrote:
Some of the ts class stuff is implemented in the C code of the base
library. I'm not sure you can separate this.

I also suggested an "na.omit" method for time series, which would be
used by many of the time series functions. Martin has made na.omit()
and na.fail() generic so this is possible now.
I think there is some functional overlap between filter() and convolve(),
which now has a type="filter" option. I'm not sure convolve() needs this
if there is a filter() function.
I updated acf() in bats_0.1-3.tar.gz (now on CRAN) to use FFT instead,
after seeing its rather embarrassing performance.
Adrian's spectrum() function allows a wider range of kernel smoothers
and it would be nice to keep this.  Since spectrum() is just a wrapper
function anyway, it should be possible to do this, while keeping the
S-PLUS compatible interface.
I haven't tried very hard to make my code efficient, as you can see,
concentrating instead on getting something that works like the S-PLUS
version. Much of the work involved trying to get the same answer as
S-PLUS, and when that wasn't possible, trying to work out what S-PLUS
was doing wrong (see the COMPAT file).

I think the multivariate version would be harder to implement in C.
I will see if I can speed the R code up.

I have been looking at Burg's algorithm and I think I can implement
this now, if you want.
Sounds good.

Martyn
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
BDR> On Tue, 20 Jul 1999, Ross Ihaka wrote:
>> On Mon, 19 Jul 1999, Prof Brian D Ripley wrote:
>> 
    >> > > 3. On the definition question: The existing FFT implementation >
    >> > uses a particular definition for the discrete transform which > >
    >> is pretty standard.  (Edwards "Fourier Series", Brillinger > > "Time
    >> Series" etc.) Using another definition may complicate > >
    >> documentation.
    >> > 
    >> > That's the simple part.  But what is the divisor in the
    >> periodogram? Which > way do lags go in acfs of bivariate series, and
    >> which sign is the phase for > bivariate spectra?
    >> 
    >> Once you settle the forward discrete transform much of this is
    >> settled.

    BDR> Isn't that the tail wagging the dog?

I think Ross meant to say that  fft()'s definition can't (shouldn't) be changed
anymore, and I agree on that [and that *is* S compatible anyway].

    >> No constant in the dft implies Periodogram = |dft|^2/(2*pi*T)

    BDR> Why is the 2*pi there? It is in Bloomfield, but not Brockwell &
    BDR> Davis, for example. And S-PLUS divides by the fequency to make the
    BDR> periodogram estimate the spectral density.

We have to decide if we go for S-plus compatibility here, see below.

    >> Phases in spectra also fall out if you take a +ve exponent in the
    >> dft.
    >> 
    BDR> But you have to decide which of series i and j gets the complex
    BDR> conjugate.  (Same issue with acfs: second relative to first or
    BDR> first relative to second?  It's like defining `lags'.)

    >> I don't much mind about these choices, but its probably a good idea
    >> to be consistent.

    BDR> Consistent with whom?

Mostly "with your own notation and definitions",
maybe even ``consistent with one book's author''.

I'd prefer Brockwell & Davis,
however I agree with Brian that S compatibility -- to some extent -- is a
goal here; on the other hand, S-plus uses definitions that are not widely
used in standard texts (my limited experience).

This really must must be settled now.
If nobody can show why S-plus definitions are `wrong by design' here,
I'd vote for adopting them, inspite of all...
[if only to keep V&R's chapter on time-series coherent :-)]

Something which should be discussed however is spectrum(0);
Several of us think that S-plus does the wrong thing, at least in some
cases. If  demean=T (mean removed),  should have periodogram(0) = 0,
and maybe even spectrum(0) = 0 [and hence dB-spec. = -Inf ..]
Another possibility would be to leave it NA
and maybe provide methods for estimating it specifically, if desired.

Martin
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
2 days later
#
On 20-Jul-99 Martin Maechler wrote:
[I'm repeating some private discussions with Martin here]

In my version of spec.pgram(), I chose to ignore frequency zero. This
seemed like the right thing to do since the periodogram at zero does
not estimate the spectral density at zero, but is a function of the
mean. It does not make much sense to use this value when smoothing the
periodogram to estimate frequencies near zero. This remains true if you
decide to set periodogram(0)=0.
I'm going to do this in the next version of the coda package.

Martyn
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._