Interpolating / smoothing missing time series data
(see inline)
Sean Davis wrote:
On 9/7/05 10:19 PM, "Gabor Grothendieck" <ggrothendieck at gmail.com> wrote:
On 9/7/05, David James <djames at frontierassoc.com> wrote:
The purpose of this email is to ask for pre-built procedures or techniques for smoothing and interpolating missing time series data.

I've made some headway on my problem in my spare time. I started with an irregular time series with lots of missing data. It even had duplicated data. Thanks to zoo, I've cleaned that up -- now I have a regular time series with lots of NAs. I want to use a regression model (e.g. ARIMA) to fill in the gaps. I am certainly open to other suggestions, especially if they are easy to implement.

My specific questions:

1. Presumably, once I get ARIMA working, I still have the problem of predicting the past missing values -- I've only seen examples of predicting into the future.

2. When predicting the past (backcasting), I also want to take reasonable steps to make the data look smooth.

I guess I'm looking for a really good example in a textbook or white paper (or just an R guru with some experience in this area) that can offer some guidance. Venables and Ripley (Modern Applied Statistics with S) was a great start. I really had hoped that the "Seasonal ARIMA Models" section on page 405 would help. It was helpful, but only to a point. I have a hunch that hourly data just does not mesh well with the seasonal arima code. That hunch is based on crashing arima numerous times, though maybe I'm just new to this and doing things that are unreasonable.
Have you looked at Durbin, J. and Koopman, S. J. (2001) _Time Series Analysis by State Space Methods._ Oxford University Press, which is cited in the "?arima" help page? They explain that Kalman filtering predicts the future, while Kalman smoothing uses all the data to fill the gaps, which seems to match your question. I was able to reproduce Figure 2.1 in that book but got bogged down with Figure 2.2 before I dropped the project. I can send you the script file I developed when working on that if it would help you. I'm still interested in learning how to reproduce in R all the examples in that book, and I'd happily receive suggestions from others on how to do that.

Spencer Graves
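As a minimal sketch of the Kalman smoothing idea, assuming a local level model is adequate for the series: the code below blanks out two stretches of the built-in Nile data, fits the model with StructTS (which accepts missing values), and takes the smoothed level from tsSmooth as the fill-in. The choice of the Nile data, the gap positions, and the names gap, level and filled are illustrative assumptions, not something from the original posts.

```r
# Sketch: fill gaps with a Kalman smoother under a local level model.
# Nile ships with R; the gaps below are artificial and chosen for illustration.
x <- Nile
gap <- c(21:40, 61:80)
x[gap] <- NA

# StructTS allows missing values in the series it is given
fit <- StructTS(x, type = "level")

# Smoothed level: uses observations on both sides of each gap
level <- as.numeric(tsSmooth(fit))

# Keep observed values, replace only the missing ones
filled <- x
filled[gap] <- level[gap]
```

Because the smoother uses data before and after each gap, the filled values do not depend on forecasting direction, which is the difference from plain forward prediction. A seasonal or ARIMA-type model may be more appropriate for hourly data; this sketch does not address that.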
Not sure if this answers your question but if you are looking for something simple then na.approx in the zoo package will linearly interpolate for you.
library(zoo)
z <- zoo(c(1,2,NA,4,5))
na.approx(z)

1 2 3 4 5
1 2 3 4 5
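A few variations on the same idea, as a hedged sketch that assumes a reasonably recent zoo: na.approx interpolates with respect to the index rather than the position, na.spline gives a smoother curve through the observed points, and the maxgap argument leaves long runs of NAs untouched. The example values and the names lin, spl and cap are made up for illustration.

```r
library(zoo)

# Irregular index: the missing value at time 3 is closer to time 2 than to time 7
z <- zoo(c(1, 2, NA, 4, 5), order.by = c(1, 2, 3, 7, 8))

# Linear interpolation along the index (not by position)
lin <- na.approx(z)

# Cubic spline interpolation through the observed points
spl <- na.spline(z)

# Only fill runs of at most 2 consecutive NAs; the run of 3 here stays NA
cap <- na.approx(zoo(c(1, NA, NA, NA, 5)), maxgap = 2)
```

Limiting the gap length can be a sensible guard when long outages should not be papered over with a straight line.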
Alternatively, if you are looking for "more smoothing", you could look at using a moving average or median applied at points of interest with an "appropriate" window size -- see wapply in the gplots package (gregmisc bundle). There are a number of other functions that can accomplish the same task. A search for "moving window" or "moving average" in the archives may produce some other ideas.

Sean
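The moving-window suggestion can be sketched as follows. This uses rollapply from zoo rather than wapply, purely because zoo is already in use in this thread; the window width of 5, the sample values, and the names sm, idx and filled are assumptions for illustration.

```r
library(zoo)

z <- zoo(c(1, 2, NA, 4, 5, NA, NA, 8, 9, 10))

# Centered moving median over 5 points, ignoring NAs inside each window;
# partial = TRUE shortens the window at both ends instead of dropping them
sm <- rollapply(z, width = 5, FUN = median, na.rm = TRUE, partial = TRUE)

# Replace only the missing observations with the windowed median
idx <- is.na(coredata(z))
filled <- z
coredata(filled)[idx] <- coredata(sm)[idx]
```

If a window contains no observed values at all, the median is still NA, so the window has to be wider than the longest gap for this approach to fill everything.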