
maintaining row connections during aggregate

2 messages · Kara Przeczek, Dennis Murphy

#
Dear All,
I have several sets of data such as this:

  year jday  avg_m3s
1 1960    1 4.262307
2 1960    2 4.242308
3 1960    3 4.216923
4 1960    4 4.185385
5 1960    5 4.151538
6 1960    6 4.133846
 ...

There is a value for each day of multiple years. This particular data set runs through 1974. I am looking to obtain the minimum and maximum values for each year, and also to know on which Julian day ("jday") they occurred.
I can get the maximum value for each year with:
   year max_daily
1  1960  60.24615
2  1961  73.90000
3  1962  56.40000
...


But I want to output the max with the corresponding day on which it occurred, such as:
  year jday  avg_m3s
1 1960  136 60.24615
2 1961  129 73.90000
3 1962  111 56.40000


I haven't been able to determine how to keep those connections between value and day without aggregating by both year *and* day, which is what happened with:
aggregate(ddat$avg_m3s, list(Year=ddat$year, Day = ddat$jday), max, na.rm=T),
resulting in a value output for every single day of each year.

Other attempts to get both columns to output failed.

Any help would be greatly appreciated!
Kara
#
Hi:

Here are two ways to do it - one with ddply() in the plyr package and
another with package data.table.

# Toy data frame:
tsdf <- data.frame(year = rep(c(1960:1963), c(366, rep(365, 3))),
                   jday = c(1:366, rep(1:365, 3)),
                   y = rnorm(4*365 + 1))

# A function to output maximum response and the day on which it occurs
# For use in ddply(), f() needs to input a data frame df and output a data frame
f <- function(df) data.frame(max_day = df$jday[which.max(df$y)],
                             ymax = max(df$y))
library(plyr)  # provides ddply() and .()
ddply(tsdf, .(year), f)

# In data.table, one can pass the core of f() in as a list instead:
library(data.table)
tsdt <- data.table(tsdf, key = 'year')
tsdt[, list(max_day = jday[which.max(y)], ymax = max(y)), by = 'year']
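
The same idea also works in base R without extra packages, and it extends to the minimum that the original question asked about. What follows is only a minimal sketch: the data frame ddat below is a made-up stand-in that borrows the column names from the original post, and extremes() is a hypothetical helper name, not something from plyr or data.table.

```r
# Made-up stand-in for the poster's data (same column names, invented values).
ddat <- data.frame(year    = rep(1960:1961, each = 4),
                   jday    = rep(1:4, times = 2),
                   avg_m3s = c(4.2, 60.2, 3.9, 10.0,
                               5.1, 2.2, 73.9, 8.0))

# Hypothetical helper: for one year's rows, return the full rows that hold
# the minimum and the maximum flow, so jday stays attached to the value.
# which.min()/which.max() skip NAs and return the first matching index,
# so tied values resolve to the earliest day.
extremes <- function(df) {
  i <- c(which.min(df$avg_m3s), which.max(df$avg_m3s))
  data.frame(type = c("min", "max"), df[i, ])
}

# Split by year, apply the helper to each piece, and stack the results.
res <- do.call(rbind, lapply(split(ddat, ddat$year), extremes))
rownames(res) <- NULL
res
```

Because whole rows are selected by index rather than summarized, any other columns in the data frame come along as well. A year whose values are all NA would need separate handling, since which.min() and which.max() return a zero-length result in that case.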

If you intend to do a lot of data summarization, these two packages,
along with reshape2 and doBy, are worth being familiar with.

HTH,
Dennis
On Mon, Jun 13, 2011 at 1:30 PM, Kara Przeczek <przeczek at unbc.ca> wrote: