Query about creating time sequences
10 messages · Shivam, R. Michael Weylandt, Jeff Newmiller +2 more
One (somewhat kludgy) way would be to use seq() to make one day's worth of times, then pass those to outer() to add in the needed days, and then coerce the whole thing back to a sorted vector. I'm not at a computer right now so this won't be quite right, but something like

x <- seq(x.start.first.day, x.end.first.day, by = "sec")
y <- 24*60*60 * (1:n.days)
sort(as.vector(outer(x, y, "+")))

Changing the order of x and y might make the sort unnecessary.

M
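A minimal runnable sketch of the outer() idea, offered only as an illustration and not as Michael's exact code. It assumes the trading dates are already known (three dates taken from the list posted in this thread), builds the per-day second offsets as plain numbers, and converts back to POSIXct at the end because as.vector() drops the date-time class. The names dates, secs, day_start and tseq are made up for this example.

```r
tz <- "Asia/Kolkata"

# Three trading dates from the posted list; 2011-01-26 is deliberately absent.
dates <- as.Date(c("2011-01-24", "2011-01-25", "2011-01-27"))

# Market open on each date, as seconds since the epoch.
day_start <- as.numeric(as.POSIXct(paste(dates, "09:15:00"), tz = tz))

# Offsets covering 09:15:00 to 15:30:00 inclusive (22501 values).
secs <- 0:(6 * 3600 + 15 * 60)

# Rows are seconds within a day, columns are days; flattening column by
# column already yields chronological order, so no sort() is needed.
grid <- outer(secs, day_start, "+")
tseq <- as.POSIXct(as.vector(grid), origin = "1970-01-01", tz = tz)
```

Because the day offsets come from the actual dates rather than from 1:n.days, gaps such as holidays are handled without extra filtering.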
On May 25, 2012, at 1:14 PM, Shivam <shivamsingh at gmail.com> wrote:
Hi All,
I have a query about time-based sequences. I know such questions have been
asked a lot on forums, but I couldn't find the exact thing that I was
looking for.
I want to create a time-based sequence which will mimic the trading window
AND would span multiple days. Something like below:
"2011-01-03 09:15:00 IST"
"2011-01-03 09:15:01 IST"
....
....
....
"2011-01-03 15:29:59 IST"
"2011-01-03 15:30:00 IST"
"2011-01-04 09:15:00 IST"
"2011-01-04 09:15:01 IST"
....
....
....
"2011-01-04 15:29:59 IST"
"2011-01-04 15:30:00 IST"
Kindly notice the change of date in the sequence.
The Indian Equity markets open at 09:15:00 and close at 15:30:00. I have
equity data that spans 124 days, and I need to create a corresponding
sequence which I will later use to regularize the irregular dataset to make
a regular time-series.
I was able to accomplish this task for a single day (i.e. creating a
sequence, then merging my dataset with it and using na.locf to make my
dataset regular) but am unable to create a sequence for 'n' days. Can
anyone help me with this?
If it is of any help, I have a file which contains all the dates for which
I need the sequence. The dput of the file is placed at the end of the
email.
One option is to create sequences for the entire days and then later remove
all the out-of-hours records after merging. Although I haven't checked the
feasibility of this method, it would be complex and, moreover, it would
increase the data fourfold (I already have 2 million records in the
dataframe which I have to make regular).
Another approach that I could think of was to make a time-based sequence
based on the date from the file and then use a loop to append one sequence
after another. But I am not having much success there either.
Any kind of help would be greatly appreciated.
Thanks and regards,
Shivam
structure(list("20110103", "20110104", "20110105", "20110106",
"20110107", "20110110", "20110111", "20110112", "20110113",
"20110114", "20110117", "20110118", "20110119", "20110120",
"20110121", "20110124", "20110125", "20110127", "20110128",
"20110131", "20110201", "20110202", "20110203", "20110204",
"20110207", "20110208", "20110209", "20110210", "20110211",
"20110214", "20110215", "20110216", "20110217", "20110218",
"20110221", "20110222", "20110223", "20110224", "20110225",
"20110228", "20110301", "20110303", "20110304", "20110307",
"20110308", "20110309", "20110310", "20110311", "20110314",
"20110315", "20110316", "20110317", "20110318", "20110321",
"20110322", "20110323", "20110324", "20110325", "20110328",
"20110329", "20110330", "20110331", "20110401", "20110404",
"20110405", "20110406", "20110407", "20110408", "20110411",
"20110413", "20110415", "20110418", "20110419", "20110420",
"20110421", "20110425", "20110426", "20110427", "20110428",
"20110429", "20110502", "20110503", "20110504", "20110505",
"20110506", "20110509", "20110510", "20110511", "20110512",
"20110513", "20110516", "20110517", "20110518", "20110519",
"20110520", "20110523", "20110524", "20110525", "20110526",
"20110527", "20110530", "20110531", "20110601", "20110602",
"20110603", "20110606", "20110607", "20110608", "20110609",
"20110610", "20110613", "20110614", "20110615", "20110616",
"20110617", "20110620", "20110621", "20110622", "20110623",
"20110624", "20110627", "20110628", "20110629", "20110630"), .Dim =
c(124L,
1L), .Dimnames = list(c("X1", "X2", "X3", "X4", "X5", "X6", "X7",
"X8", "X9", "X10", "X11", "X12", "X13", "X14", "X15", "X16",
"X17", "X18", "X19", "X20", "X21", "X22", "X23", "X24", "X25",
"X26", "X27", "X28", "X29", "X30", "X31", "X32", "X33", "X34",
"X35", "X36", "X37", "X38", "X39", "X40", "X41", "X42", "X43",
"X44", "X45", "X46", "X47", "X48", "X49", "X50", "X51", "X52",
"X53", "X54", "X55", "X56", "X57", "X58", "X59", "X60", "X61",
"X62", "X63", "X64", "X65", "X66", "X67", "X68", "X69", "X70",
"X71", "X72", "X73", "X74", "X75", "X76", "X77", "X78", "X79",
"X80", "X81", "X82", "X83", "X84", "X85", "X86", "X87", "X88",
"X89", "X90", "X91", "X92", "X93", "X94", "X95", "X96", "X97",
"X98", "X99", "X100", "X101", "X102", "X103", "X104", "X105",
"X106", "X107", "X108", "X109", "X110", "X111", "X112", "X113",
"X114", "X115", "X116", "X117", "X118", "X119", "X120", "X121",
"X122", "X123", "X124"), NULL))
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
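The regularization step described in the question (merge onto a regular grid, then carry the last observation forward) can be sketched in base R as follows. This is only an illustration under stated assumptions: the ticks data frame and its prices are invented, the ticks are assumed to be sorted by time, and findInterval() stands in for the merge plus na.locf combination mentioned in the post.

```r
tz <- "Asia/Kolkata"

# Hypothetical irregular ticks (made-up prices), sorted by time.
ticks <- data.frame(
  time  = as.POSIXct(c("2011-01-03 09:15:00",
                       "2011-01-03 09:15:03",
                       "2011-01-03 09:15:04"), tz = tz),
  price = c(100, 100.5, 100.25)
)

# Regular one-second grid over a short window.
grid <- seq(as.POSIXct("2011-01-03 09:15:00", tz = tz),
            as.POSIXct("2011-01-03 09:15:06", tz = tz), by = 1)

# For each grid point, index of the most recent tick at or before it
# (0 means no tick has occurred yet, which is mapped to NA).
idx <- findInterval(as.numeric(grid), as.numeric(ticks$time))
idx[idx == 0] <- NA

regular <- data.frame(time = grid, price = ticks$price[idx])
```

The same lookup works unchanged when grid is a multi-day trading-hours sequence, although it would then carry the previous day's last price into the next morning unless that is handled separately.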
1 day later
Create a minute by minute sequence of datetimes (tseq) from the first
datetime to the last datetime and then extract those datetimes whose
times (tt) lie between the desired times of day:
from <- as.POSIXct("2011-01-03 09:15:00")
to <- as.POSIXct("2011-01-04 15:30:00")
tseq <- seq(from, to, "1 min")
tt <- format(tseq, "%H:%M")
tseq[tt >= "09:15" & tt <= "15:30"]
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Try this:

# Setting TZ is optional, but I find it helps me to be more aware of
# timezone effects
Sys.setenv( TZ="Asia/Kolkata" )
library(lubridate)
pdates <- as.POSIXct( fdates )
hstart <- new_period( hour=9, minute=15 )
hend <- new_period( hour=15, minute=30 )
mperiod <- new_period( minute=15 )
numperday <- (hend-hstart)/mperiod
dtms <- expand.grid( dt=pdates, tm=hstart + mperiod * seq( from=0, to=numperday ) )
dtms$dtm <- with( dtms, dt + tm )
dtms <- dtms[ order( dtms$dtm ), ]

You can discard everything but dtms$dtm once it has been created.
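For readers without lubridate, the same date by time-of-day grid can be sketched in base R. This is an assumption-laden illustration rather than a translation of the code above: it uses three dates from the posted list, plain second offsets instead of period objects, and the 15-minute step from the example. The names midnight, offsets, grid and dtm are made up here.

```r
tz <- "Asia/Kolkata"
fdates <- c("2011-01-24", "2011-01-25", "2011-01-27")

# Midnight at the start of each trading date.
midnight <- as.POSIXct(fdates, tz = tz)

# Seconds after midnight for 09:15, 09:30, ..., 15:30 (26 values per day).
offsets <- seq(from = 9 * 3600 + 15 * 60, to = 15 * 3600 + 30 * 60, by = 15 * 60)

# Every combination of day index and time-of-day offset.
grid <- expand.grid(day = seq_along(midnight), offset = offsets)

# Add the offsets to the matching midnights and put them in time order.
dtm <- sort(midnight[grid$day] + grid$offset)
```

Changing by = 15 * 60 to by = 1 gives the one-second resolution asked for in the original question.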
On Mon, 28 May 2012, Shivam wrote:
Thanks for the effort Michael, but the problem here is that the dates for
which the sequences need to be created have gaps in between. Basically I
need the sequence for only those days on which the security market is open
(I have the dates in a file which is present at the end of THIS mail).
What I have been able to do is to create a list, where each element of the
list is a sequence for a single day. It was done like below:
seqtimes <- list()
for (i in 1:124) {
  seqtimes[[i]] <- xts(, seq(as.POSIXct(paste(fdates[i], "09:15:00")),
                             as.POSIXct(paste(fdates[i], "15:30:00")), by = 1))
}
where 'fdates' is the file which contains the dates for which the sequences
need to be created.
But now I am stuck. I need a way to get all these sequences in a
(vector/dataframe/xts object) where all the list items are sequentially
present.
I tried merge.xts, but to no avail.
seqtimes[[1]]
Data:
numeric(0)
Index:
POSIXct[1:22501], format: "2011-01-03 09:15:00" "2011-01-03 09:15:01" "2011-01-03 09:15:02" "2011-01-03 09:15:03" "2011-01-03 09:15:04" "2011-01-03 09:15:05" ...

seqtimes[[2]]
Data:
numeric(0)
Index:
POSIXct[1:22501], format: "2011-01-04 09:15:00" "2011-01-04 09:15:01" "2011-01-04 09:15:02" "2011-01-04 09:15:03" "2011-01-04 09:15:04" "2011-01-04 09:15:05" ...

tseq = merge.xts(seqtimes[[1]],seqtimes[[2]], all = TRUE)
tseq
Data:
numeric(0)
Index:
integer(0)

Any help would be greatly appreciated.

Thanks in advance,
Regards,
Shivam

P.S. - The dput of the fdates file:
dput(fdates)
structure(c("2011-01-03", "2011-01-04", "2011-01-05", "2011-01-06",
"2011-01-07", "2011-01-10", "2011-01-11", "2011-01-12", "2011-01-13",
"2011-01-14", "2011-01-17", "2011-01-18", "2011-01-19", "2011-01-20",
"2011-01-21", "2011-01-24", "2011-01-25", "2011-01-27", "2011-01-28",
"2011-01-31", "2011-02-01", "2011-02-02", "2011-02-03", "2011-02-04",
"2011-02-07", "2011-02-08", "2011-02-09", "2011-02-10", "2011-02-11",
"2011-02-14", "2011-02-15", "2011-02-16", "2011-02-17", "2011-02-18",
"2011-02-21", "2011-02-22", "2011-02-23", "2011-02-24", "2011-02-25",
"2011-02-28", "2011-03-01", "2011-03-03", "2011-03-04", "2011-03-07",
"2011-03-08", "2011-03-09", "2011-03-10", "2011-03-11", "2011-03-14",
"2011-03-15", "2011-03-16", "2011-03-17", "2011-03-18", "2011-03-21",
"2011-03-22", "2011-03-23", "2011-03-24", "2011-03-25", "2011-03-28",
"2011-03-29", "2011-03-30", "2011-03-31", "2011-04-01", "2011-04-04",
"2011-04-05", "2011-04-06", "2011-04-07", "2011-04-08", "2011-04-11",
"2011-04-13", "2011-04-15", "2011-04-18", "2011-04-19", "2011-04-20",
"2011-04-21", "2011-04-25", "2011-04-26", "2011-04-27", "2011-04-28",
"2011-04-29", "2011-05-02", "2011-05-03", "2011-05-04", "2011-05-05",
"2011-05-06", "2011-05-09", "2011-05-10", "2011-05-11", "2011-05-12",
"2011-05-13", "2011-05-16", "2011-05-17", "2011-05-18", "2011-05-19",
"2011-05-20", "2011-05-23", "2011-05-24", "2011-05-25", "2011-05-26",
"2011-05-27", "2011-05-30", "2011-05-31", "2011-06-01", "2011-06-02",
"2011-06-03", "2011-06-06", "2011-06-07", "2011-06-08", "2011-06-09",
"2011-06-10", "2011-06-13", "2011-06-14", "2011-06-15", "2011-06-16",
"2011-06-17", "2011-06-20", "2011-06-21", "2011-06-22", "2011-06-23",
"2011-06-24", "2011-06-27", "2011-06-28", "2011-06-29", "2011-06-30"
), .Dim = c(124L, 1L))
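One possible way to finish the list-of-sequences approach from the message above is to keep the per-day sequences as plain POSIXct vectors and concatenate them, instead of merging empty xts objects. The sketch below is a suggestion under assumptions, not something proposed in the thread: it uses three dates from the posted list, and one_day is a helper name invented for the example.

```r
tz <- "Asia/Kolkata"
fdates <- c("2011-01-24", "2011-01-25", "2011-01-27")

# One-second sequence covering the trading window of a single date.
one_day <- function(d) {
  seq(as.POSIXct(paste(d, "09:15:00"), tz = tz),
      as.POSIXct(paste(d, "15:30:00"), tz = tz), by = 1)
}

# A list with one POSIXct vector per trading date.
seqtimes <- lapply(fdates, one_day)

# Concatenate the list elements in order into a single POSIXct vector.
tseq <- do.call(c, seqtimes)

# Older R versions drop the time zone attribute in c(), so restore it.
attr(tseq, "tzone") <- tz
```

The combined vector could then serve as the index of a single empty xts object, in the same way xts(, ...) is used per day in the loop above.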
---------------------------------------------------------------------------
Jeff Newmiller
DCN: <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
On Sun, May 27, 2012 at 7:03 PM, Shivam <shivamsingh at gmail.com> wrote:
Thanks for the responses, people. @Gabor - The issue with your approach was that I had to select the time window for many days (124), which would be very difficult to achieve. I really appreciate your time though.
Why does the number of days "make it difficult to achieve"? The number of days does not affect the code at all. Is there some aspect of the problem you haven't mentioned?
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
On Sun, May 27, 2012 at 8:01 PM, Shivam <shivamsingh at gmail.com> wrote:
It's not the number of days per se, it is the random gaps between the dates (corresponding to the dates on which the security market was closed) which will be difficult to accommodate in the solution you proposed. I would have to remove the sequences corresponding to those days from the entire sequence. This was the part which I deemed difficult to achieve. I had mentioned this issue in my previous mails but you might have missed it.
If dd is a vector of the dates you want then just change the last line
to choose only those using as.Date(tseq, tz = "") %in% dd as below:
dd <- as.Date(c("2011-01-03", "2011-01-04")) ##
from <- as.POSIXct(paste(dd[1], "09:15:00")) ##
to <- as.POSIXct(paste(tail(dd, 1), "15:30:00")) ##
tseq <- seq(from, to, "1 min")
tt <- format(tseq, "%H:%M:%S")
tresult <- tseq[tt >= "09:15:00" & tt <= "15:30:00" &
  as.Date(tseq, tz = "") %in% dd] ##
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
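A self-contained sketch of the filtering approach above, with an explicit time zone and a date list that contains a gap. It is meant as an illustration under assumptions, not as the definitive version of the code in the message: the three dates come from the list posted in this thread (2011-01-26 is absent there), and the time zone is passed explicitly so the result does not depend on the session's local zone.

```r
tz <- "Asia/Kolkata"

# Wanted trading dates; 2011-01-26 is intentionally missing.
dd <- as.Date(c("2011-01-24", "2011-01-25", "2011-01-27"))

from <- as.POSIXct(paste(dd[1], "09:15:00"), tz = tz)
to   <- as.POSIXct(paste(tail(dd, 1), "15:30:00"), tz = tz)

# Every minute from the first open to the last close, nights and gaps included.
tseq <- seq(from, to, by = "1 min")

# Keep only times inside the trading window that fall on a wanted date.
tt   <- format(tseq, "%H:%M:%S")
keep <- tt >= "09:15:00" & tt <= "15:30:00" & as.Date(tseq, tz = tz) %in% dd
tresult <- tseq[keep]
```

With by = "1 sec" the intermediate tseq grows to roughly 86400 elements per calendar day, which is the main cost of this approach compared with building each trading day directly.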
Depending on your exchange of interest, you might also find some of the functions of the timeDate package helpful, e.g., holidayNYSE(). It will miss days on which the market was closed for extraordinary circumstances, but it seems to do a very good job. [Disclaimer: I haven't used it myself extensively]

Michael
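When no ready-made holiday calendar is available for the exchange, a trading-date vector can be sketched in base R from weekdays plus a hand-maintained closure list. This is only an illustration under assumptions: the closures vector is hypothetical and holds a single date, 2011-01-26, chosen because it is the weekday missing from the dates posted in this thread.

```r
# Calendar days for one week of the sample period.
all_days <- seq(as.Date("2011-01-24"), as.Date("2011-01-31"), by = "day")

# Hypothetical closure list, maintained by hand.
closures <- as.Date("2011-01-26")

# "%u" gives the ISO weekday number as text: "1" = Monday ... "7" = Sunday.
is_weekday <- !(format(all_days, "%u") %in% c("6", "7"))

# Weekdays that are not in the closure list.
trading_days <- all_days[is_weekday & !(all_days %in% closures)]
```

The resulting trading_days vector can play the role of dd or fdates in the earlier examples.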