| -----Original Message-----
| From: r-sig-finance-bounces at stat.math.ethz.ch [mailto:r-sig-finance-bounces at stat.math.ethz.ch] On Behalf Of Mark Breman
|
| The computation I need to do for element x is:
| - calculate the percentage of the value x within the range of values from
|   the last y months, i.e. determine the min() and max() of the last y months
|   of data (including x), and determine what percentage of this range the
|   value x is. For example: min(last 1 months) == 10, max(last 1 months) == 50,
|   x == 20 would yield: 25%
| - elements for which y months of previous data (including x itself) is not
|   available should become NaN or some other "special value".
|
| I tried the following "vectorized" solution (example with y = 1 month):
| > ((data - min(last(data, "1 months"))) / (max(last(data, "1 months")) - min(last(data, "1 months")))) * 100
|
| This does not satisfy my constraints because:
| 1) the first month of data should have become NaN or some other special
|    value as there is not a full month of previous data available. I think
|    this is caused by the last() function which simply returns the available
|    data if the requested amount of data is greater than the available amount
|    of data.

As you said, your data has frequent gaps, so you will never have a full month of previous data. Does that mean you want a time series full of NaNs? Be careful how you define a 'full month'.

| 2) the results for the second month of data are wrong.
| From analyzing the results I get the impression that the last() function
| is not suited for a "vectorized" solution but I'm not really sure...

In your code you are not applying a 'vectorized' last. You are taking the last month of the whole time series once, so every element is scaled against the same min and max, which is the reason for the strange results.
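To make that point concrete, here is a small sketch in R. The series is made up (the dates, the values and the name `data` standing in for your object are all assumptions); it only shows that last() is evaluated once, so min() and max() come back as single numbers.

```r
library(xts)  # also attaches zoo

# Hypothetical daily series: 90 calendar days of made-up values
set.seed(1)
dates <- seq(as.Date("2009-01-01"), by = "day", length.out = 90)
data  <- xts(runif(90, 10, 50), order.by = dates)

# last(data, "1 months") is evaluated once, on the whole series,
# so min() and max() each return a single number.
lo <- min(last(data, "1 months"))
hi <- max(last(data, "1 months"))

# Every observation is therefore scaled against the same final-month range;
# values from earlier months can fall below 0 or above 100.
pct <- (data - lo) / (hi - lo) * 100

# The formula itself, with the numbers from the question
(20 - 10) / (50 - 10) * 100   # 25
```

What is needed instead is a min and max that move with each observation, i.e. a rolling window.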
A couple of ideas:

Build an empty (0-column) xts with all dates (including those not in your series), merge it with your series, and then apply a rolling max and min over a constant 30- or 31-day lookback window (zoo's rollmax, and for the minimum rollapply with min, since zoo ships no rollmin). (Both are fast; you may want to look up na.locf in zoo, or the na.rm argument of max/min, to deal with the extra dates.)

Even simpler: assume 21 business days per month and do a rolling min/max on a window of 21. (Is it that big a problem if you are 1-2 days off?)

If neither works for you and speed is important, this might be a good candidate for C code.

HTH,
Sandor
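A rough sketch of both ideas, under stated assumptions: the series `x`, its dates and the helper names `roll_stat` and `pct_in_range` are made up for illustration, the calendar window is fixed at 30 days, and the extra dates are handled with na.rm rather than na.locf. It uses zoo's rollapplyr() on the plain values for both min and max, which is slower than rollmax but copes with the NAs introduced by the merge.

```r
library(xts)  # also attaches zoo, which provides rollapplyr()

# Hypothetical irregular series: made-up values on 60 of 120 calendar days
set.seed(42)
all_days <- seq(as.Date("2009-01-01"), as.Date("2009-04-30"), by = "day")
obs_days <- sort(sample(all_days, 60))
x <- xts(runif(60, 10, 50), order.by = obs_days)

# Rolling statistic over the trailing `width` points, ignoring NAs;
# NA where the window is incomplete or holds no data at all.
roll_stat <- function(v, width, f) {
  rollapplyr(v, width,
             function(w) if (all(is.na(w))) NA_real_ else f(w, na.rm = TRUE),
             fill = NA)
}

# Position of each value within the trailing min/max range, in percent.
# A window with max == min gives NaN (0/0).
pct_in_range <- function(series, width) {
  v  <- as.numeric(coredata(series))
  lo <- roll_stat(v, width, min)
  hi <- roll_stat(v, width, max)
  xts((v - lo) / (hi - lo) * 100, order.by = index(series))
}

# Idea 1: pad to a full calendar grid, then use a fixed 30-day window
grid   <- xts(order.by = all_days)   # empty (0-column) xts
padded <- merge(x, grid)             # NA on the added dates
pct_calendar <- pct_in_range(padded, 30)[index(x)]

# Idea 2: treat 21 observations as roughly one month
pct_obs <- pct_in_range(x, 21)
```

In both variants the leading observations come out as NA because no full window exists yet, which matches the "special value" requirement. Which definition of a month is appropriate depends on how the gaps in the data should be treated.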
Vectorized rolling computation on xts series
1 message · Sandor Benczik