calculate within-day correlations
On Thu, Sep 13, 2012 at 7:35 PM, emorway <emorway at usgs.gov> wrote:
useRs,
Here is some R-ready data for my question to follow. Of course, this data is a small snippet from a much larger dataset that is about a decade long.
<snip data>
Q_use<-data.frame(date=as.POSIXct(paste(Q[,1],"-",Q[,2],"-",Q[,3]," ",floor(Q[,4]/60),":",Q[,4]-(floor(Q[,4]/60)*60),":00",sep=''),format="%Y-%m-%d %H:%M:%S",tz=""),Q=Q$Q)
SC_use<-data.frame(date=as.POSIXct(paste(SC[,1],"-",SC[,2],"-",SC[,3]," ",floor(SC[,4]/60),":",SC[,4]-(floor(SC[,4]/60)*60),":00",sep=''),format="%Y-%m-%d %H:%M:%S",tz=""),SC=SC$SC)

Using the data provided, I'm trying to calculate each day's correlation between Q_use$Q and SC_use$SC and store the values in a data.frame. An example of the result I'd like is:

#Day 1
cor(Q_use$Q[1:95],SC_use$SC[1:95])
#[1] -0.4916499
#Day 2
cor(Q_use$Q[96:191],SC_use$SC[96:191])
#[1] -0.6085098
edm<-data.frame(Correl=t(t(c(cor(Q_use$Q[1:95],SC_use$SC[1:95]), cor(Q_use$Q[96:191],SC_use$SC[96:191])))))

But of course I want R to figure out the appropriate indexes (i.e., 1:95, 96:191, and so on in the larger dataset) for me. In other words, I'm seeking help with R code that will "pass" through the two datasets calculating each day's correlation, without relying on the user to supply the ranges of indexes where each day's values reside.

There are, as there always are, a couple of wrinkles. On day 3, for example,

cor(Q_use$Q[192:287],SC_use$SC[192:287])
[1] NA

This is because SC_use$SC[275] = NA. Is there a way to direct R to continue calculating that day's correlation using the data that is available for that day? It is also necessary to check that Q_use[i,1]==SC_use[i,1] for each i in that day, because in the larger dataset the row indices don't necessarily match up (I have made sure that they do for this simple example). It would also be handy to know how many values were missing on incomplete days, perhaps in a column appended to the resulting data frame.

I'd appreciate any R code that could help get me started toward this end; I'm stuck. I tried looking at ?aggregate, had a look at the reshape package, and ?rollapply in the zoo package, but I wasn't seeing a way to do the error checking I just described.

Thanks,
Eric
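As an aside on building the timestamps: base R's ISOdatetime() accepts the year, month, day, hour, minute and second as separate numeric vectors, so the paste()/format-string round trip can be skipped. The sketch below assumes the layout implied by the question (columns 1 to 3 are year, month, day; column 4 is minutes since midnight; a value column named Q). The toy data frame is hypothetical, since the real data were snipped.

```r
# Hypothetical toy input mimicking the layout described in the question:
# columns 1-3 are year, month, day; column 4 is minutes since midnight.
Q <- data.frame(yr  = c(2002, 2002, 2002),
                mo  = c(3, 3, 3),
                dy  = c(28, 28, 29),
                min = c(0, 15, 1425),
                Q   = c(10.2, 10.4, 9.8))

# ISOdatetime() builds POSIXct values directly from the numeric parts.
Q_use <- data.frame(
  date = ISOdatetime(Q[, 1], Q[, 2], Q[, 3],
                     Q[, 4] %/% 60,  # whole hours
                     Q[, 4] %% 60,   # remaining minutes
                     0, tz = ""),
  Q = Q$Q
)
print(Q_use)
```

The same call works for SC by swapping in the SC columns; integer division (%/%) and modulo (%%) replace the floor() arithmetic.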
Thanks for the reproducible example. This is pretty simple with xts:
library(xts)
xQ <- xts(Q_use["Q"], Q_use$date)
xSC <- xts(SC_use["SC"], SC_use$date)
x <- merge(xQ,xSC)
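merge() on xts objects performs an outer join on the time index by default (join = "outer"), which is what takes care of the Q_use[i,1]==SC_use[i,1] check: timestamps present in only one series get NA in the other column. To illustrate that behaviour without xts, here is a base-R analogue using merge(..., all = TRUE) on plain data frames. The toy series and the name x_df are hypothetical.

```r
# Hypothetical toy series: SC lacks the 00:30 reading that Q has,
# and has an extra 00:45 reading that Q lacks.
t0 <- as.POSIXct("2002-03-28 00:00:00", tz = "UTC")
Q_use  <- data.frame(date = t0 + 60 * c(0, 15, 30), Q  = c(10.1, 10.3, 10.2))
SC_use <- data.frame(date = t0 + 60 * c(0, 15, 45), SC = c(500, 498, 505))

# all = TRUE gives an outer join on 'date': every timestamp from either
# series appears once, sorted, with NA filling the side that lacks it.
x_df <- merge(Q_use, SC_use, by = "date", all = TRUE)
print(x_df)
```

Rows that come out with an NA in either column are exactly the timestamps where the two original series did not line up.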
Now all the dates for both data sets are aligned in 'x', so you can
use apply.daily() to run a function over each day:
apply.daily(x, function(y) cor(y[,1],y[,2],use="pairwise.complete.obs"))
[,1]
2002-03-28 23:45:00 -0.4916499
2002-03-29 23:45:00 -0.6085098
2002-03-30 23:45:00 -0.1489898
2002-03-31 00:00:00 NA
Note that I had to create a small anonymous wrapper function so I
could pass two objects to the cor() function.
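The question also asked for the number of missing values on each day. The following is a base-R sketch (a swapped-in approach, not the xts code above) that groups already-aligned rows by calendar day and returns the correlation over complete pairs alongside the row and missing-value counts. The helper daily_stats(), the column names n_obs and n_missing, the minimum of 3 complete pairs, and the random toy data are all hypothetical choices.

```r
# Hypothetical, already-aligned data: two days of 15-minute readings
# (96 per day) with one missing SC value on the second day.
set.seed(1)
t0 <- as.POSIXct("2002-03-28 00:00:00", tz = "UTC")
x <- data.frame(date = t0 + 900 * (0:191))
x$Q  <- rnorm(192)
x$SC <- -x$Q + rnorm(192)
x$SC[150] <- NA

# Summarise one day's rows: correlation over complete pairs only
# (require at least 3 pairs, an arbitrary threshold), plus counts.
daily_stats <- function(d) {
  ok <- complete.cases(d$Q, d$SC)
  c(Correl    = if (sum(ok) >= 3) cor(d$Q[ok], d$SC[ok]) else NA_real_,
    n_obs     = nrow(d),
    n_missing = sum(!ok))
}

# Group rows by calendar day (in the series' own time zone), apply
# daily_stats() to each group, and stack the results into a data frame.
day <- format(x$date, "%Y-%m-%d")
res <- t(sapply(split(x, day), daily_stats))
res <- data.frame(date = as.Date(rownames(res)), res, row.names = NULL)
print(res)
```

For two vectors, dropping incomplete pairs this way gives the same value as cor(..., use="pairwise.complete.obs"); the threshold simply returns NA for days with too few readings, such as a day holding a single timestamp.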
Hope that helps.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Best,
--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com