Message-ID: <5aebc8960910070211l4451fd66rd69c0b081f357c3f@mail.gmail.com>
Date: 2009-10-07T09:11:17Z
From: Aleks Clark
Subject: Vectorized rolling computation on xts series
In-Reply-To: <5e6a2e670910070130p6d987317s31d56eb5bb3899be@mail.gmail.com>
Another approach would be to use the lag function from zoo or xts to
generate a data frame or matrix with the current day's data and the N
previous periods in a table. If your data is univariate, this
shouldn't be a problem: just do a little math and you can easily
specify how many days of data to "go back". You'd do something like
this.
Starting with:
d1
d2
d3
d4
d5
use Lag (or lag; they behave differently, and even zoo's and xts's lag disagree on the sign of k), then t() and apply(), and end up with:
d1 na na na
d2 d1 na na
d3 d2 d1 na
d4 d3 d2 d1
d5 d4 d3 d2
You can then easily run your computations in vectorized form using the
apply family of functions.
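A minimal base-R sketch of this lagged-matrix idea follows. It assumes a plain numeric vector instead of an xts object and a fixed window of n observations. The names `lag_matrix` and `range_pct_rows` are hypothetical helpers, not functions from zoo or xts. Because the window is a fixed observation count, this illustrates the technique rather than solving the gappy calendar-window case discussed below.

```r
# Hypothetical helper: row t holds x[t], x[t-1], ..., x[t-n+1];
# positions before the start of the series are padded with NA.
lag_matrix <- function(x, n) {
  stopifnot(n >= 1, n <= length(x))
  sapply(0:(n - 1), function(k) c(rep(NA, k), head(x, length(x) - k)))
}

# Percentage position of the current value (column 1) within its row's range.
# min()/max() return NA when a row contains NA, so incomplete windows give NA.
range_pct_rows <- function(m) {
  apply(m, 1, function(r) (r[1] - min(r)) / (max(r) - min(r)) * 100)
}

lag_matrix(1:5, 4)  # same layout as the d1..d5 table above
range_pct_rows(lag_matrix(c(10, 50, 30, 20, 40), 3))  # NA NA 50 0 100
```

Base R's embed() builds the same column layout but drops the incomplete leading rows instead of padding them with NA.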
On Wed, Oct 7, 2009 at 3:30 AM, Mark Breman <breman.mark at gmail.com> wrote:
> Hi Shane,
> I had a look at these functions but they do not satisfy my constraints:
>
> - apply.monthly works with 'calendar months', but I need a function that
> allows me to specify, for instance, 1995-01-06 until 1995-02-06 (i.e. a
> 'duration' of one month) for the computation of element x = 1995-02-06
>
> - rollapply (and also rollmax, rollmin) needs the number of previous
> elements from the series to be specified, if I understand it correctly. As
> you can see in the example, it is daily data but with lots of gaps, so this
> would be very difficult to do, if it is possible at all.
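The calendar-duration constraint can be met without a fixed observation count by working from dates: for each row, compute the date one month earlier and locate the first row on or after it with findInterval(). This is a minimal base-R sketch under stated assumptions: the input is a sorted Date vector plus a numeric vector (for an xts object these would come from index() and coredata()), and `rolling_range_pct` is a hypothetical helper, not a function in xts, zoo or TTR. The window search is vectorized, but the per-row min()/max() still runs inside vapply(), so very long series may still call for a dedicated rolling min/max.

```r
# Hypothetical helper (not from xts, zoo or TTR). For each row i it uses the
# calendar window [dates[i] - months, dates[i]], e.g. 1995-01-06..1995-02-06,
# and returns the position of x[i] within that window's min-max range in %.
# A flat window (max == min) gives NaN.
rolling_range_pct <- function(dates, x, months = 1) {
  stopifnot(inherits(dates, "Date"), length(dates) == length(x),
            !is.unsorted(dates))
  n <- length(x)
  # Calendar start of each window: same day of month, `months` months earlier.
  start <- as.Date(vapply(seq_len(n), function(i) {
    as.numeric(seq(dates[i], by = paste(-months, "month"), length.out = 2)[2])
  }, numeric(1)), origin = "1970-01-01")
  # First row dated on or after each start (dates are whole days, so
  # "before start" is the same as "on or before start - 1").
  lo <- findInterval(as.numeric(start) - 1, as.numeric(dates)) + 1
  vapply(seq_len(n), function(i) {
    if (start[i] < dates[1]) return(NA_real_)  # no full window of history yet
    w <- x[lo[i]:i]
    (x[i] - min(w)) / (max(w) - min(w)) * 100
  }, numeric(1))
}
```

With the series from the original post, this gives NA up to 1995-02-02 (no full month of history yet) and 0 for 1995-02-06, as expected. Month-end dates follow seq()'s own rules for stepping back one month, which may or may not match your definition of "one month".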
>
> Thanks for your quick response though,
>
> Kind regards,
>
> -Mark-
>
> 2009/10/7 Shane <shane.conway at gmail.com>
>
>> I think you want the apply.monthly function in xts. It also has versions
>> for other time periods (e.g. apply.daily).
>>
>> You may also want to look at rollapply in zoo.
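For contrast with the rolling window that is actually needed: apply.monthly() evaluates a function once per calendar month. A rough base-R analog groups observations by a year-month key with tapply(); this is an illustration only, not how xts implements it, and `monthly_max` is a hypothetical name.

```r
# Rough base-R analog of calendar-month aggregation (illustration only):
# every observation is assigned to its calendar month and FUN runs once per month.
monthly_max <- function(dates, x) {
  tapply(x, format(dates, "%Y-%m"), max)
}

d <- as.Date(c("1995-01-30", "1995-01-31", "1995-02-01", "1995-02-06"))
monthly_max(d, c(19150, 15245, 15245, 15245))  # one value per calendar month
```

Each calendar month yields a single value, so 1995-02-06 is summarised together with 1995-02-01 rather than with the trailing month that starts on 1995-01-06.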
>>
>> Sent from my iPhone
>>
>>
>> On Oct 7, 2009, at 4:05 AM, Mark Breman <breman.mark at gmail.com> wrote:
>>
>> Hi,
>>> I have a univariate xts timeseries (daily data) for which I need to apply a
>>> computation to each element. The computation for element x needs the last y
>>> months of data from the timeseries. What's more, I need a "vectorized"
>>> computation because looping over all elements is too slow (it's a large
>>> timeseries).
>>>
>>> I think this is what is called a "rolling" or "running" computation in R.
>>>
>>> The computation I need to do for element x is:
>>> - calculate the percentage position of the value x within the range of
>>> values from the last y months, i.e. determine the min() and max() of the
>>> last y months of data (including x), and determine how far x lies into this
>>> range as a percentage of (max - min). For example: min(last 1 month) == 10,
>>> max(last 1 month) == 50, x == 20 would yield 25%.
>>> - elements for which y months of previous data (including x itself) are not
>>> available should become NaN or some other "special value".
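The percentage described above is (x - min) / (max - min) * 100. A one-line R version (`range_pct` is a hypothetical name) reproduces the 25% from the example:

```r
# Percentage position of x within [lo, hi]; range_pct is a hypothetical helper.
range_pct <- function(x, lo, hi) (x - lo) / (hi - lo) * 100
range_pct(20, 10, 50)  # 25, matching the example above
```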
>>>
>>> An example
>>> So let's say I have a timeseries called "data":
>>>
>>> > data
>>>            NonCommNet
>>> 1995-01-03      44580
>>> 1995-01-04      44580
>>> 1995-01-05      44580
>>> 1995-01-06      44580
>>> 1995-01-09      44580
>>> 1995-01-10      32835
>>> 1995-01-11      32835
>>> 1995-01-12      32835
>>> 1995-01-13      32835
>>> 1995-01-16      32835
>>> 1995-01-17      38385
>>> 1995-01-18      38385
>>> 1995-01-19      38385
>>> 1995-01-20      38385
>>> 1995-01-23      38385
>>> 1995-01-24      19150
>>> 1995-01-25      19150
>>> 1995-01-26      19150
>>> 1995-01-27      19150
>>> 1995-01-30      19150
>>> 1995-01-31      15245
>>> 1995-02-01      15245
>>> 1995-02-02      15245
>>> 1995-02-03      15245
>>> 1995-02-06      15245
>>> 1995-02-07      24110
>>> 1995-02-08      24110
>>> 1995-02-09      24110
>>> 1995-02-10      24110
>>> 1995-02-13      24110
>>> 1995-02-14      17615
>>> 1995-02-15      17615
>>> 1995-02-16      17615
>>> 1995-02-17      17615
>>> 1995-02-21     -23080
>>> 1995-02-22     -23080
>>> 1995-02-23     -23080
>>> 1995-02-24     -23080
>>> 1995-02-27     -23080
>>> 1995-02-28     -17445
>>>
>>> I tried the following "vectorized" solution (example with y = 1 month):
>>>
>>> > ((data - min(last(data, "1 months"))) / (max(last(data, "1 months")) -
>>> +   min(last(data, "1 months")))) * 100
>>>            NonCommNet
>>> 1995-01-03  143.37783
>>> 1995-01-04  143.37783
>>> 1995-01-05  143.37783
>>> 1995-01-06  143.37783
>>> 1995-01-09  143.37783
>>> 1995-01-10  118.48909
>>> 1995-01-11  118.48909
>>> 1995-01-12  118.48909
>>> 1995-01-13  118.48909
>>> 1995-01-16  118.48909
>>> 1995-01-17  130.25005
>>> 1995-01-18  130.25005
>>> 1995-01-19  130.25005
>>> 1995-01-20  130.25005
>>> 1995-01-23  130.25005
>>> 1995-01-24   89.48930
>>> 1995-01-25   89.48930
>>> 1995-01-26   89.48930
>>> 1995-01-27   89.48930
>>> 1995-01-30   89.48930
>>> 1995-01-31   81.21424
>>> 1995-02-01   81.21424
>>> 1995-02-02   81.21424
>>> 1995-02-03   81.21424
>>> 1995-02-06   81.21424
>>> 1995-02-07  100.00000
>>> 1995-02-08  100.00000
>>> 1995-02-09  100.00000
>>> 1995-02-10  100.00000
>>> 1995-02-13  100.00000
>>> 1995-02-14   86.23649
>>> 1995-02-15   86.23649
>>> 1995-02-16   86.23649
>>> 1995-02-17   86.23649
>>> 1995-02-21    0.00000
>>> 1995-02-22    0.00000
>>> 1995-02-23    0.00000
>>> 1995-02-24    0.00000
>>> 1995-02-27    0.00000
>>> 1995-02-28   11.94109
>>>
>>> This does not satisfy my constraints because:
>>> 1) the first month of data should have become NaN or some other special
>>> value, as there is not a full month of previous data available. I think this
>>> is caused by the last() function, which simply returns the available data if
>>> the requested amount of data is greater than the available amount of data.
>>> 2) the results for the second month of data are wrong. For instance, look at
>>> the result for 1995-02-06, which is 81.21424%. This should have been 0%: the
>>> last month's min() is 15245 (from 1995-02-06) and its max() is 44580 (from
>>> element 1995-01-06), so it should yield 0%.
>>>
>>> From analyzing the results, I get the impression that the last() function
>>> is not suited for a "vectorized" solution, but I'm not really sure...
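The numbers in the output above can be reproduced by hand, which supports this impression: last(data, "1 months") is evaluated once for the whole series and returns its final calendar month (1995-02-01 to 1995-02-28), so min() and max() are two scalars (-23080 and 24110) shared by every row rather than per-row windows. A base-R check, with the February 1995 values copied from the table above:

```r
# February 1995 values from the table above (what last(data, "1 months") returns).
feb <- c(rep(15245, 4), rep(24110, 5), rep(17615, 4), rep(-23080, 5), -17445)
lo <- min(feb)  # -23080
hi <- max(feb)  #  24110
# The same two scalars are applied to every row of the series:
(44580 - lo) / (hi - lo) * 100  # 1995-01-03: about 143.3778, as in the output
(15245 - lo) / (hi - lo) * 100  # 1995-02-06: about 81.21424, as in the output
```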
>>>
>>> I also had a look at runMin() and runMax() from the TTR package, but you
>>> can't specify a calendar range with these functions as you can with last()
>>> and first() from the xts package.
>>>
>>> Now my question is: am I doing something wrong here or do you know another
>>> vectorized function that satisfies my constraints?
>>>
>>> Kind regards,
>>>
>>> -Mark-
>>>
>>> _______________________________________________
>>> R-SIG-Finance at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>> -- Subscriber-posting only.
>>> -- If you want to post, subscribe first.
>>>
>>
>
--
Aleks Clark