Multiple time series with zoo
On Mon, Oct 31, 2011 at 11:27 PM, Vinny Moriarty <vwmoriarty at gmail.com> wrote:
Thanks for everyone's input so far; it is greatly appreciated. But I've got
one last task I could use some advice on.
Here are the first few lines of my data set:
site,time_local,time_utc,reef_type_code,sensor_type,sensor_depth_m,temperature_c
06,2006-04-09 10:20:00,2006-04-09 20:20:00,BAK,sb39, 2, 29.63
06,2006-04-09 10:40:00,2006-04-09 20:40:00,BAK,sb39, 2, 29.56
06,2006-04-09 11:00:00,2006-04-09 21:00:00,BAK,sb39, 2, 29.51
06,2006-04-09 11:20:00,2006-04-09 21:20:00,BAK,sb39, 10, 29.53
06,2006-04-09 11:40:00,2006-04-09 21:40:00,BAK,sb39, 2, 29.57
06,2006-04-09 12:00:00,2006-04-09 22:00:00,BAK,sb39, 2, 29.60
06,2006-04-09 12:20:00,2006-04-09 22:20:00,BAK,sb39, 2, 29.66
06,2006-04-09 12:40:00,2006-04-09 22:40:00,BAK,sb39, 2, 29.68
06,2006-04-09 13:00:00,2006-04-09 23:00:00,BAK,sb39, 10, 29.68
06,2006-04-09 13:20:00,2006-04-09 23:20:00,BAK,sb39, 2, 29.71
06,2006-04-09 13:40:00,2006-04-09 23:40:00,BAK,sb39, 2, 29.68
06,2006-04-09 14:00:00,2006-04-10 00:00:00,BAK,sb39, 10, 29.49
06,2006-04-09 14:20:00,2006-04-10 00:20:00,BAK,sb39, 2, 29.31
06,2006-04-09 14:40:00,2006-04-10 00:40:00,BAK,sb39, 10, 29.27
My goal was to extract all of the 10m data (all of the "10"s in the
"sensor_depth_m" column) and then calculate daily averages.
With the help from this forum I came up with this:
library(zoo)
Data = read.table("06_BottomMountThermistors.csv", sep = ",", header = TRUE,
    as.is = TRUE)
Ten = subset(Data, sensor_depth_m == 10L)
z = zoo(Ten$temperature_c, Ten$time_local)
Warning message:
In zoo(Ten$temperature_c, Ten$time_local) :
  some methods for "zoo" objects do not work if the index entries in
  "order.by" are not unique
ag = aggregate(z, as.Date, mean)
write.csv(ag, file = "LTER_6_10m.csv")
Which works fine. I'm not sure why I get the warning about unique
entries, as all of the 10m "time_local" data is sequential and thus unique.
Certainly some of the "temperature_c" data is repeated, but my
understanding of a zoo object is that I have the "time_local" column set up
as the order.by index.
So any thoughts on the warning message and my understanding of zoo objects
would be appreciated.
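(One way to see what triggers the warning: zoo() warns whenever two entries
of order.by coincide, and anyDuplicated() will reveal whether that is the
case. A minimal sketch — the timestamps below are made up, not taken from
the data set:)

```r
library(zoo)

# Made-up timestamps (not from the data set): the first two coincide
times <- as.POSIXct(c("2006-04-09 10:20:00", "2006-04-09 10:20:00",
                      "2006-04-09 10:40:00"), tz = "UTC")
anyDuplicated(times)   # > 0 means at least one duplicated index entry

# zoo() accepts the duplicated index but issues the uniqueness warning
z <- suppressWarnings(zoo(c(29.63, 29.56, 29.51), times))

# With the real data the analogous check would be:
# anyDuplicated(Ten$time_local)
```

If that check returns 0 on the real data, the warning would not fire.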
But the last task I have for this data is to average several of these data
sets together. My thoughts were to run the code as above for 6 different
sites (column 1 is the "site" index). I still think in excel, so I was
planning on lining up all 6 sites in a spreadsheet so that the dates (daily
means from the above code) line up, and then just averaging the data like
so:
Date,Site1,Site2,Site3,Site4,Site5,Site6,Average
2006-04-09,20,19,20,19,14,12,average(Site1-6)
2006-04-10,12,13,14,15,16,12,average(Site1-6)
2006-04-11,12,12,12,13,12,12,average(Site1-6)
2006-04-12,12,13,13,12,12,12,average(Site1-6)
But I figure R can do this for me, so why go back to Excel when R is
turning out to be a much better way to work with this kind of data? I
tried using merge, but I'm not sure it is the right command.
So is there any way I can have R take 6 different data sets, line them up
by date, and pull a grand average by day for all 6 sites?
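(As an aside, merge is in fact workable here: merge.zoo aligns multiple
series on their shared index, padding missing dates with NA. A minimal
sketch with two made-up daily series — the site values are illustrative,
not real data; with six sites there would simply be six arguments:)

```r
library(zoo)

# Illustrative daily series for two sites (with six sites: s1, ..., s6)
s1 <- zoo(c(20, 12, 12), as.Date(c("2006-04-09", "2006-04-10", "2006-04-11")))
s2 <- zoo(c(19, 13, 12), as.Date(c("2006-04-09", "2006-04-10", "2006-04-11")))

# merge.zoo lines the series up by date, filling missing dates with NA
m <- merge(s1, s2)

# grand daily average across sites; na.rm = TRUE tolerates missing days
avg <- rowMeans(m, na.rm = TRUE)
```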
The solution is basically only three lines: read.zoo, grep out the
required columns and append the average. A fourth statement to clean up
the column names would be optional.
1. The read.zoo command below reads all the files indicated by the glob,
ignoring the "NULL" columns (colClasses = ...), splits each by
sensor_depth_m (split = 2), takes only the Date part of the date/time
(FUN = as.Date) and aggregates (aggregate = mean) over the rows with the
same Date. It then arranges the result, z, with daily rows that look like
this, so that there is one column for each sensor depth/file combination:
z
           2.data1.txt 10.data1.txt 2.data2.txt 10.data2.txt
2006-04-09      29.591      29.4925      29.591      29.4925
2. Then we reduce that to just the 10 columns using grep, yielding z10.
3. Append an overall average, which looks like this:
z10
           10.data1.txt 10.data2.txt Average
2006-04-09      29.4925      29.4925 29.4925
4. Optionally, a fourth statement can be added after creating z10 to get
prettier column names. The statement will vary depending on the precise
form of your file names: the line below works with the sample filenames
data1.txt and data2.txt used here, but you may need to adjust it if your
file names are substantially different. Here we look for column names
that start with 10 followed by a dot [.], then some characters .*, then
another dot [.] and more characters .*, and replace the whole name with
just the parenthesized portion \\1:
colnames(z10) <- sub("10[.](.*)[.].*", "\\1", colnames(z10))
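(Applied to the sample column names used here, the substitution behaves
like this — a quick illustrative check, not part of the original reply:)

```r
# the greedy (.*) backtracks to the last dot, so only the extension is dropped
sub("10[.](.*)[.].*", "\\1", c("10.data1.txt", "10.data2.txt"))
# -> "data1" "data2"
```

Names that do not match the pattern (such as Average) are left unchanged.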
# generate test data
Lines <- "site,time_local,time_utc,reef_type_code,sensor_type,sensor_depth_m,temperature_c
06,2006-04-09 10:20:00,2006-04-09 20:20:00,BAK,sb39, 2, 29.63
06,2006-04-09 10:40:00,2006-04-09 20:40:00,BAK,sb39, 2, 29.56
06,2006-04-09 11:00:00,2006-04-09 21:00:00,BAK,sb39, 2, 29.51
06,2006-04-09 11:20:00,2006-04-09 21:20:00,BAK,sb39, 10, 29.53
06,2006-04-09 11:40:00,2006-04-09 21:40:00,BAK,sb39, 2, 29.57
06,2006-04-09 12:00:00,2006-04-09 22:00:00,BAK,sb39, 2, 29.60
06,2006-04-09 12:20:00,2006-04-09 22:20:00,BAK,sb39, 2, 29.66
06,2006-04-09 12:40:00,2006-04-09 22:40:00,BAK,sb39, 2, 29.68
06,2006-04-09 13:00:00,2006-04-09 23:00:00,BAK,sb39, 10, 29.68
06,2006-04-09 13:20:00,2006-04-09 23:20:00,BAK,sb39, 2, 29.71
06,2006-04-09 13:40:00,2006-04-09 23:40:00,BAK,sb39, 2, 29.68
06,2006-04-09 14:00:00,2006-04-10 00:00:00,BAK,sb39, 10, 29.49
06,2006-04-09 14:20:00,2006-04-10 00:20:00,BAK,sb39, 2, 29.31
06,2006-04-09 14:40:00,2006-04-10 00:40:00,BAK,sb39, 10, 29.27"
cat(Lines, "\n", file = "data1.txt")
cat(Lines, "\n", file = "data2.txt")
library(zoo)
z <- read.zoo(Sys.glob("data*.txt"), header = TRUE, sep = ",", split = 2,
colClasses = c("NULL", NA, "NULL", "NULL", "NULL", NA, NA),
FUN = as.Date, aggregate = mean)
z10 <- z[, grep("^10[.]", colnames(z))]
z10$Average <- rowMeans(z10)
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com