
need help with data management

6 messages · David Winsemius, Gabor Grothendieck, analyst41 at hotmail.com

#
I have a data frame that reads

client ID date transactions

323232   11/1/2010 22
323232   11/2/2010 0
323232   11/3/2010 missing
121212   11/10/2010 32
121212    11/11/2010 15
.................................


I want to order the rows by client ID and date and, using a black-box
forecasting method, create the data set fcst(client, date of forecast,
date for which forecast applies).

 Assume that I have a function that given a time series
x(1),x(2),....x(k) will generate f(i,j) where f(i,j) = forecast j days
ahead, given data till date i.

How can the forecast data best be stored, and how would I go about the
task of processing all the clients and dates?

Thanks.
#
On Dec 25, 2010, at 8:08 AM, analyst41 at hotmail.com wrote:

            
http://lmgtfy.com/?q=forecast+r-project
#
On Dec 25, 10:17 am, David Winsemius <dwinsem... at comcast.net> wrote:
Thanks.  I am planning to write my own univariate forecasting routine.

My question is mostly concerned with separating out the time series by
client, generating the forecasts and then putting everything back
together into something like

ClientID | forecast date| date forecast is for |forecast| actual
#
On Dec 25, 2010, at 10:45 AM, analyst41 at hotmail.com wrote:

            
See the various manipulation functions: split, aggregate, tapply, and  
the plyr package. Specifics will depend on the data structures that  
constitute input.
Well, there is the forecast package ...  but you said you had methods  
in mind, so you can offer code.
Will depend on the choices made in the first step.
The answer is going to depend on the data structures used. Show us  
some data _and_ your code.
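
A minimal base-R sketch of the split / apply / recombine pattern mentioned above, producing the long layout the poster asked for (client, forecast date, date the forecast is for, forecast, actual). Everything named here is an assumption made for illustration: my.forecast is a hypothetical stand-in for the poster's black-box routine (it just repeats the last value), the horizon h = 2 is arbitrary, and the column names are invented.

```r
# Example data in long form, as in the original post ("missing" read as NA).
txt <- "323232 11/1/2010 22
323232 11/2/2010 0
323232 11/3/2010 missing
121212 11/10/2010 32
121212 11/11/2010 15"
dat <- read.table(text = txt, col.names = c("client", "date", "transactions"),
                  na.strings = "missing", stringsAsFactors = FALSE)
dat$date <- as.Date(dat$date, format = "%m/%d/%Y")
dat <- dat[order(dat$client, dat$date), ]

# Hypothetical stand-in for the black-box method: repeat the last
# value h times (gives NA when the last value is missing).
my.forecast <- function(x, h) rep(tail(x, 1), h)

# For one client: every forecast origin i, every horizon j = 1..h.
forecast.client <- function(df, h = 2) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    data.frame(client        = df$client[i],
               forecast.date = df$date[i],
               target.date   = df$date[i] + seq_len(h),
               forecast      = my.forecast(df$transactions[1:i], h))
  })
  do.call(rbind, rows)
}

# Split by client, forecast each piece, stack the pieces again.
fcst <- do.call(rbind, lapply(split(dat, dat$client), forecast.client))
rownames(fcst) <- NULL

# Attach actuals wherever the target date was observed.
actuals <- setNames(dat, c("client", "date", "actual"))
fcst <- merge(fcst, actuals, by.x = c("client", "target.date"),
              by.y = c("client", "date"), all.x = TRUE)
fcst <- fcst[order(fcst$client, fcst$forecast.date, fcst$target.date),
             c("client", "forecast.date", "target.date", "forecast", "actual")]
print(fcst)
```

One row per (client, origin, horizon) keeps the result easy to filter and to merge with later actuals; whether that suits the real data depends on how many clients and dates there are.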
#
On Sat, Dec 25, 2010 at 8:08 AM, analyst41 at hotmail.com
<analyst41 at hotmail.com> wrote:
This isn't quite what you asked but it seems more suitable to what you
need.  Instead of using long form data we transform it to wide form
with one client per column.  Try copying this from this post and
pasting it into your R session:

Lines <- "323232   11/1/2010 22
323232   11/2/2010 0
323232   11/3/2010 missing
121212   11/10/2010 32
121212    11/11/2010 15"

library(zoo)
library(chron)

# read in; split = 1 spreads column 1 (client ID) into wide form,
# index.column = 2 takes the dates from column 2
# can use "myfile.dat" in place of textConnection(Lines) for real data
z <- read.zoo(textConnection(Lines), split = 1, index.column = 2, FUN = chron,
      na.strings = "missing")
# d is matrix with one row per date and one col per client
d <- coredata(z)

# just use last point as our forecast for next 3 dates
naive.forecast <- function(x) rep(tail(x, 1), 3)
pred <- apply(d, 2, naive.forecast)

# put predictions together with the data
rbind(d, pred)


For the data you showed this gives:
     121212 323232
[1,]     NA     22
[2,]     NA      0
[3,]     NA     NA
[4,]     32     NA
[5,]     15     NA
[6,]     15     NA
[7,]     15     NA
[8,]     15     NA
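
Note that in the output above the three forecasts for client 323232 are NA: in wide form that client has no observations on the last two dates, so tail(x, 1) picks up an NA. A minimal variant, assuming the forecast should instead start from each client's last non-missing value (that choice, and the name naive.forecast.obs, are assumptions for illustration and not part of the posted code):

```r
# Wide-form matrix retyped from the output above (one column per client).
d <- cbind("121212" = c(NA, NA, NA, 32, 15),
           "323232" = c(22, 0, NA, NA, NA))

# Variant of naive.forecast: drop NAs before taking the last point.
# Returns NAs if a client has no observations at all.
naive.forecast.obs <- function(x, h = 3) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(rep(NA_real_, h))
  rep(tail(x, 1), h)
}

pred <- apply(d, 2, naive.forecast.obs)
print(pred)
```

With this variant client 323232 is forecast from its 11/2/2010 value, since 11/3/2010 is missing.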
1 day later
#
On Dec 25, 1:36 pm, Gabor Grothendieck <ggrothendi... at gmail.com>
wrote:
Thank you.

Everything works on my system (Windows), except that the final output
I get is

     X121212 X323232
[1,]      NA      22
[2,]      NA       0
[3,]      NA      NA
[4,]      32      NA
[5,]      15      NA
[6,]      15      NA
[7,]      15      NA
[8,]      15      NA

i.e., an "X" gets attached to the client name.

I'd also like to retain the dates in each row.  I'll try to follow up
along these lines.
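
A minimal base-R sketch of those two follow-ups, assuming the "X" prefix is simply stripped from the column names after the fact and the dates are carried as row names. The matrix d is retyped here from the output above, and the three forecast dates are assumed to be the three calendar days after the last observed date.

```r
# Observed dates and wide-form data, retyped from the thread.
dates <- as.Date(c("11/1/2010", "11/2/2010", "11/3/2010",
                   "11/10/2010", "11/11/2010"), format = "%m/%d/%Y")
d <- cbind(X121212 = c(NA, NA, NA, 32, 15),
           X323232 = c(22, 0, NA, NA, NA))

# Drop the leading "X" from the client names.
colnames(d) <- sub("^X", "", colnames(d))

# Same naive forecast as in the earlier post: last point, three times.
naive.forecast <- function(x) rep(tail(x, 1), 3)
pred <- apply(d, 2, naive.forecast)

# Assumed forecast dates: the three days after the last observation.
future <- max(dates) + 1:3

# Stack data and forecasts, labelling each row with its date.
out <- rbind(d, pred)
rownames(out) <- format(c(dates, future))
print(out)
```

Row names are only a label here; for date arithmetic later on, keeping the dates in a zoo index or a data frame column is likely more convenient.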