Hi,
I am trying to use split() to split an xts data set into weeks, but the
result does not look right.
The original data is as follows:
head(xec)
             Open   High    Low  Close     mean
2011-02-28 112.34 113.34 111.96 112.87 112.6275
2011-03-01 112.89 113.71 112.75 112.80 113.0375
2011-03-02 112.75 113.56 112.50 113.54 113.0875
2011-03-03 113.50 115.08 113.10 115.05 114.1825
2011-03-04 115.16 115.97 114.85 115.06 115.2600
2011-03-07 115.21 115.26 114.55 114.85 114.9675
...
and I called split() as follows:
head(split(x=xec,f="weeks"))
[[1]]
             Open   High    Low  Close     mean
2011-02-28 112.34 113.34 111.96 112.87 112.6275
[[2]]
             Open   High    Low  Close     mean
2011-03-01 112.89 113.71 112.75 112.80 113.0375
2011-03-02 112.75 113.56 112.50 113.54 113.0875
2011-03-03 113.50 115.08 113.10 115.05 114.1825
2011-03-04 115.16 115.97 114.85 115.06 115.2600
2011-03-07 115.21 115.26 114.55 114.85 114.9675
...
"2011-02-28" is a Monday, but it is not grouped with the other
days of that week,
so the second group starts on Tuesday ("2011-03-01").
I want the result to look like this:
[[1]]
             Open   High    Low  Close     mean
2011-02-28 112.34 113.34 111.96 112.87 112.6275
2011-03-01 112.89 113.71 112.75 112.80 113.0375
2011-03-02 112.75 113.56 112.50 113.54 113.0875
2011-03-03 113.50 115.08 113.10 115.05 114.1825
2011-03-04 115.16 115.97 114.85 115.06 115.2600
[[2]]
             Open   High    Low  Close     mean
2011-03-07 115.21 115.26 114.55 114.85 114.9675
...
Could anyone please give some advice?
Thanks in advance.
Seimizu Joukan
Your example of the problem is not reproducible [1]. This behavior could arise from small discrepancies in the index values, or from specifying "frequency" instead of "f" as the second argument, or perhaps you have found a bug that only your data triggers. Verifying what your problem is will require a reproducible example.
[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
---------------------------------------------------------------------------
Jeff Newmiller
DCN: <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
I have put together a piece of reproducible code.
Could you please paste the following code into the R console and confirm the result?
# Code starts here
library("quantmod")
tmp<-structure(c(112.34, 112.89, 112.75, 113.5, 115.16, 115.21, 114.84,
114.93, 115.05, 114.46, 113.34, 113.71, 113.56, 115.08, 115.97,
115.26, 115.22, 115.24, 115.24, 114.98, 111.96, 112.75, 112.5,
113.1, 114.85, 114.55, 114.55, 114.75, 114.2, 112.92, 112.87,
112.8, 113.54, 115.05, 115.06, 114.85, 114.93, 115.09, 114.28,
113.92), class = c("xts", "zoo"), .indexCLASS = "Date", tclass =
"Date", .indexTZ = "", tzone = "", index = structure(c(1298818800,
1298905200, 1298991600, 1299078000, 1299164400, 1299423600, 1299510000,
1299596400, 1299682800, 1299769200), tzone = "", tclass = "Date"),
.Dim = c(10L,
4L), .Dimnames = list(NULL, c("Open", "High", "Low", "Close")))
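class(tmp)
# split by week using the "f" argument
(res1 <- split(tmp, f = "weeks"))
# same call, but with the argument named "frequency" instead of "f"
(res2 <- split(tmp, frequency = "weeks"))
# Code ends here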
The original data is saved in "tmp", and the split() results are saved
in res1 and res2.
res1 is the result with "f" and res2 is the result with "frequency". Both break up the
week starting on "2011-02-28", and the result with "frequency" is even worse.
BTW, my R version is as follows:
version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status
major          2
minor          15.2
year           2012
month          10
day            26
svn rev        61015
language       R
version.string R version 2.15.2 (2012-10-26)
nickname       Trick or Treat
Thank you.
Seimizu Joukan
Looking at args(split.xts), I think you actually do want split(..., f =
) here, not split(..., frequency = ): the latter would be ignored, and the
split would default to months.
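The effect of the misnamed argument can be sketched in base R. split_like() below is a hypothetical stand-in that assumes only what the remark above implies: a signature with f = "months" followed by "...". It is not the xts code, just an illustration of how R swallows an unknown named argument.

```r
# Hypothetical stand-in for a function whose signature is (x, f = "months", ...).
# It reports which value of f was used and which named arguments fell into "...".
split_like <- function(x, f = "months", ...) {
  extras <- names(list(...))
  list(f = f, ignored = extras)
}

by_f         <- split_like(1:3, f = "weeks")
by_frequency <- split_like(1:3, frequency = "weeks")

by_f$f               # the value supplied for f
by_frequency$f       # the default, because "frequency" does not match "f"
by_frequency$ignored # the name that ended up in "..."
```

"frequency" is not a prefix of "f", so R's partial matching does not apply and the argument is silently collected by "...".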
I get the following for res1, running R-Devel on OS X 10.6.8:
res1
[[1]]
             Open   High    Low  Close
2011-02-27 112.34 113.34 111.96 112.87
[[2]]
             Open   High    Low  Close
2011-02-28 112.89 113.71 112.75 112.80
2011-03-01 112.75 113.56 112.50 113.54
2011-03-02 113.50 115.08 113.10 115.05
2011-03-03 115.16 115.97 114.85 115.06
2011-03-06 115.21 115.26 114.55 114.85
[[3]]
             Open   High    Low  Close
2011-03-07 114.84 115.22 114.55 114.93
2011-03-08 114.93 115.24 114.75 115.09
2011-03-09 115.05 115.24 114.20 114.28
2011-03-10 114.46 114.98 112.92 113.92
So I think it's likely a timezone issue. Try setting
indexTZ(tmp) <- "GMT"
or something similar and give it another shot.
You might also want to move to the R-SIG-Finance list, where the
authors of xts are seen more frequently.
It might also help to report Sys.timezone() in addition to your
specific Linux distro.
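A base-R sketch of why the time zone matters, using the index values from the dput() output earlier in the thread. The same instants are printed in UTC and in "Asia/Tokyo" (an assumed zone name for the poster's location). The week-bucket formula is an assumption about how weekly boundaries might be computed (Monday-aligned weeks counted from the epoch in UTC); it is not quoted from the xts source.

```r
# Index values (seconds since 1970-01-01 UTC) copied from the dput() output.
idx <- c(1298818800, 1298905200, 1298991600, 1299078000, 1299164400,
         1299423600, 1299510000, 1299596400, 1299682800, 1299769200)

as_time <- function(secs, tz) as.POSIXct(secs, origin = "1970-01-01", tz = tz)

# The same instants shown on two different calendars.
utc_dates <- format(as_time(idx, "UTC"), "%Y-%m-%d %H:%M")
jst_dates <- format(as_time(idx, "Asia/Tokyo"), "%Y-%m-%d %H:%M")

# Day of week for each instant: 0 = Sunday, 1 = Monday, ...
utc_wday <- as.POSIXlt(as_time(idx, "UTC"))$wday
jst_wday <- as.POSIXlt(as_time(idx, "Asia/Tokyo"))$wday

# ASSUMPTION: weeks are buckets of 7 * 86400 seconds counted from the epoch.
# 1970-01-01 was a Thursday, so shifting by 3 days puts the bucket
# boundaries at Monday 00:00 UTC.
week_id <- (idx + 3 * 86400) %/% (7 * 86400)
group_sizes <- as.integer(table(week_id))

data.frame(utc = utc_dates, jst = jst_dates, week_id = week_id)
group_sizes
```

Under these assumptions the first observation is a Monday at midnight in Tokyo but still a Sunday afternoon in UTC, so it lands in the previous week's bucket on its own.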
Cheers,
MW
> Looking at args(split.xts) I think you actually do want split(..., f =
> ) here, not split(..., frequency = ), which would ignore and default
> to months.
Yes, split(..., f=) is what I want.
> so I think it's likely a timezone issue. Try setting
> indexTZ(tmp) <- "GMT"
Yes, I think you are right.
When I set indexTZ(tmp) to "GMT" or "JST" (Japan),
I got the same result as yours.
When I call Sys.timezone(), I get "". I am afraid that R failed to read
my Ubuntu system's
environment variable. Perhaps there are some problems with the timezone
because I am running Ubuntu on VMware. Not sure! :(
I know little about timezones, so I will keep investigating and
learn more about them. Thank you for your help!
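One way to check this from inside R, as a minimal sketch: the TZ environment variable is one of the places R looks for the session time zone, and setting it explicitly takes the system configuration out of the picture. "Asia/Tokyo" is an assumed zone name for JST, and the timestamp is the first index value from the example.

```r
# What the session currently has; "" means TZ is not set in the environment.
Sys.getenv("TZ")

# Remember the old value so it can be restored afterwards.
old_tz <- Sys.getenv("TZ", unset = NA)

# Set the session time zone explicitly (assumed zone name for Japan).
Sys.setenv(TZ = "Asia/Tokyo")

# First index value from the example, formatted in the session time zone.
first_obs <- format(as.POSIXct(1298818800, origin = "1970-01-01"),
                    "%Y-%m-%d %H:%M")
first_obs

# Restore the previous setting.
if (is.na(old_tz)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old_tz)
```

If first_obs shows midnight on 2011-02-28, the session is interpreting the index in Japan time, which matches the dates in the original data.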
Best regards!
Seimizu Joukan