[RsR] Seasonal adjustment; What's a good `blackbox' robust lm() function?

There is no support for x12arima or Tramo/Seats in R. I am doing a
Check out:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/40254.html

Also more below.
poor man's seasonal adjustment procedure: For monthly data, I do a
regression with no intercept and 12 dummy variables (i.e. a month
factor), and then calculating the residual.

If I did this using lm() the procedure is always feasible as long as
there are atleast 12 data points.

When I started using this, I noticed that quite a few time series have
extreme values and these were making the function generate wonky
values. So I thought I should switch to a robust lm(). I was surprised
to find that MASS:rlm() does not always converge. For quite a few
series I find that it breaks. I wonder what the gurus on this list
think. I can come back with concrete numerical examples of time-series
where MASS:rlm() does not work, if that's interesting.

More generally, if you have any suggestions on how to do seasonal
adjustment, they would be most interesting.
Noting that months(chron.object) and quarters(chron.object)
gives an ordered factor, here are a few approaches to the coefs:

library(chron)
library(quantreg)
rq(cpi ~ months(as.chron(date)) -1, data = n, method = "fn")

or what is basically the same, the median

tapply(n$cpi, list(month = months(as.chron(n$date))), median, na.rm = TRUE)

or a trimmed mean:

tapply(n$cpi, list(month = months(as.chron(n$date))), mean, trim =
0.1, na.rm = TRUE)

Note that we can simplify the adjustment function using ave. The
following subtracts out
monthly trimmed means from each data point:

ave(n$cpi, months(as.chron(n$date)), FUN = function(x) x - mean(x,
trim = 0.1, na.rm = TRUE))
My code is:

# A vector of data "x" and a vector of dates "dates" is supplied.
# You must choose whether you want to focus on monthly or quarterly
# seasonality. The function will then do the work of mapping from
# "dates" into the relevant vector of dummy variables.
#
# x is assumed to be in levels.
# A dummy variable regression is done.
# NA values for x are correctly handled, but NA values for the
# dates are not tolerated.
# It is typical for you to send in c(NA,diff(log(levels))) in which case
#  the dummy variables regression is additive seasonality in log
#  differences.
# You get back a vector which has the same length as what came in.
# In addition, you also get back the results of the dummy variable
# regression.
library(MASS)

date2month <- function(dates) {
 factor(as.POSIXlt(dates)$mon, levels=0:11,
        labels=c("Jan", "Feb", "Mar", "Apr",
          "May", "Jun", "Jul", "Aug",
          "Sep", "Oct", "Nov", "Dec"))
}
date2quarter <- function(dates) {
 factor(c(1,1,1,2,2,2,3,3,3,4,4,4)[as.numeric(date2month(dates))],
        levels=1:4, labels=c("Q1","Q2","Q3","Q4"))
}

sa.dummyvariableregression <- function(x, dates, seasonality="monthly") {
 d <- data.frame(x=x, date=dates)
 if (seasonality == "monthly") {
   d$pfactor <- date2month(d$date)
 } else {
   d$pfactor <- date2quarter(d$date)
 }
 non.na.locations <- !is.na(d$x)
 d <- d[non.na.locations,]
 # At this point, we're holding the clean non-NA data.
 # Dummy variable regression: do rlm() if you can but use lm() otherwise.
 m <- rlm(x ~ -1 + pfactor, data=d)
 if (! m$converged) {
    cat("rlm failed to converge; switching to lm().\n")
    m <- lm(x ~ -1 + pfactor, data=d)
 }

 # Extract residuals
 sa <- m$residuals               # m$residuals is one short
 sa <- sa + mean(coef(m))        # Intercepts have soaked up the mean
 # Place back into the original layout with NAs in the right places
 sa.full <- rep(NA, length(non.na.locations))
 sa.full[non.na.locations] <- sa

 # Send back the goodies.
 list(sa=sa.full, m=m)
}

Here's some data to experiment with this. It's a data frame "n"
containing n$date and n$cpi.

n <- structure(list(date = structure(c(7395, 7425, 7456, 7486, 7517,
   7548, 7578, 7609, 7639, 7670, 7701, 7729, 7760, 7790, 7821, 7851,
   7882, 7913, 7943, 7974, 8004, 8035, 8066, 8095, 8126, 8156, 8187,
   8217, 8248, 8279, 8309, 8340, 8370, 8401, 8432, 8460, 8491, 8521,
   8552, 8582, 8613, 8644, 8674, 8705, 8735, 8766, 8797, 8825, 8856,
   8886, 8917, 8947, 8978, 9009, 9039, 9070, 9100, 9131, 9162, 9190,
   9221, 9251, 9282, 9312, 9343, 9374, 9404, 9435, 9465, 9496, 9527,
   9556, 9587, 9617, 9648, 9678, 9709, 9740, 9770, 9801, 9831, 9862,
   9893, 9921, 9952, 9982, 10013, 10043, 10074, 10105, 10135, 10166,
   10196, 10227, 10258, 10286, 10317, 10347, 10378, 10408, 10439,
   10470, 10500, 10531, 10561, 10592, 10623, 10651, 10682, 10712,
   10743, 10773, 10804, 10835, 10865, 10896, 10926, 10957, 10988,
   11017, 11048, 11078, 11109, 11139, 11170, 11201, 11231, 11262,
   11292, 11323, 11354, 11382, 11413, 11443, 11474, 11504, 11535,
   11566, 11596, 11627, 11657, 11688, 11719, 11747, 11778, 11808,
   11839, 11869, 11900, 11931, 11961, 11992, 12022, 12053, 12084,
   12112, 12143, 12173, 12204, 12234, 12265, 12296, 12326, 12357,
   12387, 12418, 12449, 12478, 12509, 12539, 12570, 12600, 12631,
   12662, 12692, 12723, 12753, 12784, 12815, 12843, 12874, 12904,
   12935, 12965, 12996, 13027, 13057, 13088, 13118, 13149, 13180,
   13208, 13239, 13269, 13300, 13330, 13361, 13392, 13422, 13453,
   13483, 13514, 13545, 13573, 13604, 13634, 13665, 13695, 13726,
   13757, 13787, 13818, 13848, 13879, 13910, 13939, 13970), class = "Date"),
       cpi = c(38.88, 39.31, 39.96, 40.82, 41.04, 41.25, 42.12,
       42.76, 42.98, 43.63, 43.63, 43.41, 43.63, 44.06, 45.14, 46.22,
       46.87, 47.73, 48.16, 48.6, 48.6, 49.24, 49.46, 49.46, 49.89,
       50.54, 50.97, 52.27, 52.27, 52.48, 52.7, 52.7, 52.48, 52.05,
       52.27, 52.48, 52.92, 53.13, 54, 54.64, 55.29, 55.94, 56.59,
       57.24, 57.02, 56.8, 57.24, 57.67, 58.1, 58.75, 59.83, 60.69,
       61.34, 62.2, 62.42, 62.85, 62.42, 62.42, 62.85, 63.28, 63.71,
       64.79, 66.09, 67.6, 68.03, 68.47, 68.9, 69.33, 68.47, 68.03,
       68.25, 68.9, 69.98, 70.84, 71.92, 73.22, 74.08, 74.3, 74.73,
       75.38, 75.59, 75.59, 75.59, 75.81, 76.46, 76.03, 76.67, 77.32,
       77.54, 77.97, 78.83, 79.05, 80.35, 82.94, 82.51, 82.07, 82.72,
       84.02, 86.18, 88.77, 89.2, 90.71, 93.52, 94.6, 92.66, 90.71,
       89.63, 89.42, 89.63, 90.5, 90.71, 91.58, 92.01, 92.66, 94.38,
       94.6, 93.09, 93.09, 92.87, 93.74, 94.6, 95.03, 95.46, 96.11,
       95.68, 95.9, 96.98, 97.19, 96.33, 96.11, 95.68, 96.11, 96.76,
       97.41, 98.7, 100, 100.65, 100.43, 101.08, 101.94, 101.3,
       100.86, 100.65, 101.08, 101.3, 101.94, 102.81, 103.89, 104.54,
       104.75, 105.18, 105.62, 104.54, 104.32, 104.54, 105.18, 106.48,
       106.7, 107.34, 108.21, 107.78, 107.78, 108.64, 108.86, 108.42,
       108.86, 108.86, 108.86, 108.86, 109.72, 110.58, 111.66, 112.74,
       112.96, 113.61, 113.39, 112.53, 113.61, 113.39, 113.39, 114.25,
       113.82, 114.25, 116.2, 116.63, 117.06, 118.36, 119.44, 118.79,
       119, 119, 119, 120, 121, 123, 124, 124, 125, 127, 127, 127,
       127, 128, 127, 128, 129, 130, 132, 133, 133, 134, 134, 134,
       134, 135, 137, 138)), .Names = c("date", "cpi"), row.names = 4:220, class = "data.frame", na.action = structure(c(1L,
   2L, 3L, 221L), .Names = c("1", "2", "3", "221"), class = "omit"))

--
Ajay Shah                                      http://www.mayin.org/ajayshah
ajayshah at mayin.org                             http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.

_______________________________________________
R-SIG-Robust at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-robust

[RsR] Seasonal adjustment; What's a good `blackbox' robust lm() function?

Thread (3 messages)