Split xts data set into weeks

Hi

I am trying to use split() to split an xts data set into weeks, but the
result does not look right.
The original data is as follows:
head(xec)
             Open   High    Low  Close     mean
2011-02-28 112.34 113.34 111.96 112.87 112.6275
2011-03-01 112.89 113.71 112.75 112.80 113.0375
2011-03-02 112.75 113.56 112.50 113.54 113.0875
2011-03-03 113.50 115.08 113.10 115.05 114.1825
2011-03-04 115.16 115.97 114.85 115.06 115.2600
2011-03-07 115.21 115.26 114.55 114.85 114.9675
...

I then called split() as follows:
head(split(x=xec,f="weeks"))
[[1]]
             Open   High    Low  Close     mean
2011-02-28 112.34 113.34 111.96 112.87 112.6275

[[2]]
             Open   High    Low  Close     mean
2011-03-01 112.89 113.71 112.75 112.80 113.0375
2011-03-02 112.75 113.56 112.50 113.54 113.0875
2011-03-03 113.50 115.08 113.10 115.05 114.1825
2011-03-04 115.16 115.97 114.85 115.06 115.2600
2011-03-07 115.21 115.26 114.55 114.85 114.9675
...

"2011-02-28" is a Monday, but it is not grouped with the other
days of that week,
so the second group begins on Tuesday ("2011-03-01").

I want the result to look like this:

[[1]]
             Open   High    Low  Close     mean
2011-02-28 112.34 113.34 111.96 112.87 112.6275
2011-03-01 112.89 113.71 112.75 112.80 113.0375
2011-03-02 112.75 113.56 112.50 113.54 113.0875
2011-03-03 113.50 115.08 113.10 115.05 114.1825
2011-03-04 115.16 115.97 114.85 115.06 115.2600

[[2]]
             Open   High    Low  Close     mean
2011-03-07 115.21 115.26 114.55 114.85 114.9675
...

Could anyone please give some advice?
Thanks in advance.

Seimizu Joukan
Your example of the problem is not reproducible [1]. This behavior could arise from small discrepancies in the index values, or from specifying "frequency" instead of "f" as the second argument, or perhaps you have found a bug that only your data triggers. Verifying what your problem is will require a reproducible example.

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
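
As a rough illustration of what a reproducible example can look like, the sketch below uses base R's dput() and deparse() to turn a small object into code that can be pasted into a post. The data frame `small` and its values are made up for illustration; they are not the poster's data.

```r
# Hypothetical toy data, not the data from the original post
small <- data.frame(
  date  = as.Date("2011-02-28") + 0:2,
  close = c(112.87, 112.80, 113.54)
)

# dput() prints R code that recreates the object; paste that into a post
dput(small)

# Round trip: deparse to text, parse it back, and compare with the original
txt     <- paste(deparse(small), collapse = "\n")
rebuilt <- eval(parse(text = txt))
same    <- isTRUE(all.equal(small, rebuilt))
```

Anyone who pastes the printed code gets the same object, including its classes and attributes, which is what makes an index or timezone problem visible to others.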
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Hi, Jeff

Thank you for your advice.
I have tried to put together a reproducible example.
Could you please paste the following code into the R console and confirm the behavior?

# Code starts here

library("quantmod")
tmp<-structure(c(112.34, 112.89, 112.75, 113.5, 115.16, 115.21, 114.84,
114.93, 115.05, 114.46, 113.34, 113.71, 113.56, 115.08, 115.97,
115.26, 115.22, 115.24, 115.24, 114.98, 111.96, 112.75, 112.5,
113.1, 114.85, 114.55, 114.55, 114.75, 114.2, 112.92, 112.87,
112.8, 113.54, 115.05, 115.06, 114.85, 114.93, 115.09, 114.28,
113.92), class = c("xts", "zoo"), .indexCLASS = "Date", tclass =
"Date", .indexTZ = "", tzone = "", index = structure(c(1298818800,
1298905200, 1298991600, 1299078000, 1299164400, 1299423600, 1299510000,
1299596400, 1299682800, 1299769200), tzone = "", tclass = "Date"),
.Dim = c(10L,
4L), .Dimnames = list(NULL, c("Open", "High", "Low", "Close")))
class(tmp)
(res1<-split(tmp,f="weeks"))
(res2<-split(tmp,frequency="weeks"))

# Code ends here

The original data is stored in "tmp", and the split() results are stored
in res1 and res2.
res1 is the result of using "f" and res2 is the result of using "frequency". Both break up the
week starting on "2011-02-28", and the "frequency" result is even worse.

BTW, my R version is as follows:
version               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status
major          2
minor          15.2
year           2012
month          10
day            26
svn rev        61015
language       R
version.string R version 2.15.2 (2012-10-26)
nickname       Trick or Treat

Thank you.

Seimizu Joukan
Would you please paste the following codes to R console and make a confirmation?

Indeed, well done and much appreciated.
Looking at args(split.xts), I think you actually do want split(..., f =
) here, not split(..., frequency = ). The latter would be ignored, leaving f
at its default of "months".
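
The argument matching can be illustrated without xts. The function `toy_split` below is hypothetical: it only mimics the relevant shape of the signature (an `f` argument defaulting to "months", followed by `...`) to show how a misnamed `frequency =` argument ends up in `...` instead of in `f`.

```r
# Hypothetical stand-in that mimics only the shape of the signature
# (f defaults to "months", extra arguments land in ...); this is not xts code
toy_split <- function(x, f = "months", ...) {
  extras <- names(list(...))
  list(f = f, ignored = extras)
}

with_f    <- toy_split(1:10, f = "weeks")
with_freq <- toy_split(1:10, frequency = "weeks")

with_f$f           # "weeks"
with_freq$f        # "months": the default, because "frequency" does not match "f"
with_freq$ignored  # "frequency"
```

R only matches a supplied name to a formal argument when the supplied name is the formal's full name or a prefix of it, so "frequency" can never match "f".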

I get the following for res1, running R-Devel on OS X 10.6.8:
res1
[[1]]
             Open   High    Low  Close
2011-02-27 112.34 113.34 111.96 112.87

[[2]]
             Open   High    Low  Close
2011-02-28 112.89 113.71 112.75 112.80
2011-03-01 112.75 113.56 112.50 113.54
2011-03-02 113.50 115.08 113.10 115.05
2011-03-03 115.16 115.97 114.85 115.06
2011-03-06 115.21 115.26 114.55 114.85

[[3]]
             Open   High    Low  Close
2011-03-07 114.84 115.22 114.55 114.93
2011-03-08 114.93 115.24 114.75 115.09
2011-03-09 115.05 115.24 114.20 114.28
2011-03-10 114.46 114.98 112.92 113.92

so I think it's likely a timezone issue. Try setting

indexTZ(tmp) <- "GMT"

or something similar and giving it another shot.

You might also want to move to the R-SIG-Finance list, where the
authors of xts are more frequently seen.

It might also help to report Sys.timezone() in addition to your
specific Linux distro.
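
The timezone explanation can be checked with base R alone. The index values in the dput() output are seconds since the epoch, and the sketch below renders them in UTC and in Asia/Tokyo. The weekly bucketing formula is a simplified model of Monday-based weeks (an assumption for illustration, not necessarily how xts computes them), and "Asia/Tokyo" is assumed from the poster's mention of JST.

```r
# Index values copied from the dput() output earlier in the thread
idx <- c(1298818800, 1298905200, 1298991600, 1299078000, 1299164400,
         1299423600, 1299510000, 1299596400, 1299682800, 1299769200)
stamps <- as.POSIXct(idx, origin = "1970-01-01", tz = "UTC")

# The same instants rendered in two timezones
utc_view   <- format(stamps, "%Y-%m-%d %H:%M", tz = "UTC")
tokyo_view <- format(stamps, "%Y-%m-%d %H:%M", tz = "Asia/Tokyo")
utc_view[1]    # "2011-02-27 15:00" (a Sunday afternoon)
tokyo_view[1]  # "2011-02-28 00:00" (Monday midnight)

# Simplified weekly bucketing: the epoch (1970-01-01) was a Thursday, so
# adding 3 days makes each 7-day bucket start on a Monday.
week_id <- function(secs) (secs + 3 * 86400) %/% (7 * 86400)

# Bucket sizes when the raw UTC seconds are used, and when the seconds are
# first shifted by the +9 hour Tokyo offset (Japan has no DST)
sizes_utc   <- as.vector(table(week_id(idx)))             # 1 5 4
sizes_tokyo <- as.vector(table(week_id(idx + 9 * 3600)))  # 5 5
```

Under this model, the one, five, and four row groups shown above correspond to bucketing in UTC, while the grouping the poster wanted corresponds to bucketing in Tokyo local time.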

Cheers,

MW
Hi, Michael

Thank you very much!
Looking at args(split.xts) I think you actually do want split(..., f =
) here, not split(..., frequency = ), which would ignore and default
to months.
Yes, split(..., f =) is what I want.
so I think it's likely a timezone issue. Try setting
indexTZ(tmp) <- "GMT"
Yes, I think you are right.
When I set indexTZ(tmp) to "GMT" or "JST" (Japan),
I got the same result as yours.

When I run Sys.timezone(), I get "". I am afraid that R failed to read my
Ubuntu system's timezone
environment variable, perhaps because I am running Ubuntu on VMware and
there is some problem with the timezone. Not sure! :(

I know little about timezones, so I will keep investigating and
learn more about them. Thank you for your help!
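
One way to experiment is to set the TZ environment variable for the R session with base R's Sys.setenv() and look at how the first index value from the example is then rendered. This is only a sketch: the zone name "Asia/Tokyo" is an assumption based on the mention of JST, and whether this resolves the split() behavior has not been verified here.

```r
# Remember the current setting so it can be restored afterwards
old_tz <- Sys.getenv("TZ", unset = NA)

# Assumed zone name for Japan; adjust to the actual location
Sys.setenv(TZ = "Asia/Tokyo")
tz_now <- Sys.getenv("TZ")

# First index value from the example, rendered in the session timezone
first_stamp <- as.POSIXct(1298818800, origin = "1970-01-01")
print(format(first_stamp, "%Y-%m-%d %H:%M"))

# Restore the previous state
if (is.na(old_tz)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old_tz)
```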

Best regards!

Seimizu Joukan