
tidyverse: grouped summaries (with summArize)

I think we wandered away into a package rather than base R, but the request seems easy enough.

Just FYI, Rich: you do not seem to have incorporated the advice we gave about the first argument yet, and your use of group_by() is a tad odd.

disc %>%
     group_by(hour) %>%
     group_by(day) %>%
     group_by(year, month) %>%
     summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))

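Not sure why you use disc once and disc_by_month a second, superfluous time: the pipe already supplies disc as the first argument to summarize(). If you read the manual page for group_by() https://dplyr.tidyverse.org/reference/group_by.html you may note it tends to be called ONCE, with multiple arguments in sequence that specify which columns of the data.frame to group by. By default each new group_by() call replaces the existing grouping, so in your chain only the last one, group_by(year, month), takes effect.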

disc %>%
     group_by(hour, day, year, month) %>%
     summarize(vol = mean(cfs, na.rm = TRUE))

Not sure most people would group that way, as the above sorts by hour first. Many might reverse that sequence and go from coarsest to finest, e.g. group_by(year, month, day, hour).
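
To make the point about repeated group_by() calls concrete, here is a minimal sketch. The toy tibble and its values are made up for illustration and merely stand in for disc; it assumes dplyr 1.0.0 or later, where group_by() has the .add argument and summarize() has .groups.

```r
library(dplyr)

# Hypothetical stand-in for `disc`; the values are invented for illustration.
toy <- tibble(
  year  = c(2016L, 2016L, 2016L, 2016L),
  month = c(3L, 3L, 4L, 4L),
  day   = c(3L, 4L, 1L, 1L),
  cfs   = c(100, 200, 300, 500)
)

# By default a second group_by() replaces the earlier grouping.
replaced <- toy %>%
  group_by(day) %>%
  group_by(year, month)
group_vars(replaced)   # "year" "month"

# .add = TRUE appends to the existing grouping instead.
added <- toy %>%
  group_by(year, month) %>%
  group_by(day, .add = TRUE)
group_vars(added)      # "year" "month" "day"

# One group_by() call, then summarize() without repeating the data frame.
by_month <- toy %>%
  group_by(year, month) %>%
  summarize(vol = mean(cfs, na.rm = TRUE), .groups = "drop")
by_month
```

If monthly means are what is wanted, grouping by year and month alone gives one row per month; adding day and hour to the grouping would instead give one row per hour of each day.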

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Rich Shepard
Sent: Monday, September 13, 2021 6:32 PM
To: R mailing list <r-help at r-project.org>
Subject: Re: [R] tidyverse: grouped summaries (with summerize)
On Tue, 14 Sep 2021, Eric Berger wrote:

            
Eric/Avi:

That makes no difference:
# A tibble: 590,940 × 6
# Groups:   year, month [66]
     year month   day  hour   min    cfs
    <int> <int> <int> <int> <int>  <dbl>
  1  2016     3     3    12     0 149000
  2  2016     3     3    12    10 150000
  3  2016     3     3    12    20 151000
  4  2016     3     3    12    30 156000
  5  2016     3     3    12    40 154000
  6  2016     3     3    12    50 150000
  7  2016     3     3    13     0 153000
  8  2016     3     3    13    10 156000
  9  2016     3     3    13    20 154000
 10  2016     3     3    13    30 155000
# … with 590,930 more rows

I wondered if I need to group first by hour, then day, then year-month.
This, too, produces the same output:

disc %>%
     group_by(hour) %>%
     group_by(day) %>%
     group_by(year, month) %>%
     summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))

And disc shows the read dataframe.

I don't understand why the columns are not grouping.

Thanks,

Rich

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.