Avoid duplication in dplyr::summarise

4 messages · Lars Bishop, Edjabou Vincent, Eric Berger

#
Dear group,

Is there a way to avoid the sort of duplication illustrated below? I apply
the same dplyr::summarise call after different group_by arguments, so I'd
like to define a single summarise function that can be applied to both.
My attempt below fails.

library(dplyr)

df <- data.frame(matrix(rnorm(40), 10, 4),
                 f1 = gl(3, 10, labels = letters[1:3]),
                 f2 = gl(3, 10, labels = letters[4:6]))


df %>%
  group_by(f1, f2) %>%
  summarise(x1m = mean(X1),
            x2m = mean(X2),
            x3m = mean(X3),
            x4m = mean(X4))

df %>%
  group_by(f1) %>%
  summarise(x1m = mean(X1),
            x2m = mean(X2),
            x3m = mean(X3),
            x4m = mean(X4))

# My failed attempt

s <- function() {
  dplyr::summarise(x1m = mean(X1),
                   x2m = mean(X2),
                   x3m = mean(X3),
                   x4m = mean(X4))
}

df %>%
  group_by(f1) %>% s
Error in s(.) : unused argument (.)

Regards,
Lars.
#
Hi Lars

I am not entirely sure what you want. However, the following code lets you
(1) obtain the full summary of your data and (2) retrieve only the means of
the X values as a function of the factors f1 and f2.

library(tidyverse)
library(psych)
df <- data.frame(matrix(rnorm(40), 10, 4),
                 f1 = gl(3, 10, labels = letters[1:3]),
                 f2 = gl(3, 10, labels = letters[4:6]))

## To get the full summary of your data
df %>%
  gather(X_name, X_value, X1:X4) %>%
  group_by(f1, f2, X_name) %>%
  do(describe(.$X_value))

## To obtain only the means of your data
df %>%
  gather(X_name, X_value, X1:X4) %>%
  group_by(f1, f2, X_name) %>%
  do(describe(.$X_value)) %>%
  select(mean) %>%  # keep only the mean column
  spread(X_name, mean)
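
If only the group means are needed, a shorter route may be to skip the
reshaping and psych::describe altogether. The sketch below assumes
dplyr >= 1.0.0 (for across(), which postdates this thread). The 30-row
matrix, the fixed seed, and the name `means` are illustrative assumptions
and not part of the original posts.

```r
library(dplyr)

set.seed(1)  # hypothetical seed, only to make the sketch reproducible

# 30 rows so that the matrix matches the length of the factors
df <- data.frame(matrix(rnorm(120), 30, 4),
                 f1 = gl(3, 10, labels = letters[1:3]),
                 f2 = gl(3, 10, labels = letters[4:6]))

# Apply mean() to every column from X1 to X4 within each (f1, f2) group
means <- df %>%
  group_by(f1, f2) %>%
  summarise(across(X1:X4, mean), .groups = "drop")

print(means)
```

The result keeps the original column names (X1 to X4) rather than x1m to
x4m; across() accepts a .names argument if different names are wanted.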

Vincent

Best regards

Edjabou Maklawe Essonanawe Vincent
On Sat, Sep 9, 2017 at 12:30 PM, Lars Bishop <lars52r at gmail.com> wrote:
#
Hi Lars,
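Two comments:
1. You can achieve what you want with a slight modification of your
definition of s(). As the error message hints, the pipe passes the grouped
data frame to s() as its first argument, so s() needs an argument (here
named '.') that it forwards to summarise():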
s <- function(.) {
  dplyr::summarise(., x1m = mean(X1),
                   x2m = mean(X2),
                   x3m = mean(X3),
                   x4m = mean(X4))
}

2. Your test case is not ideal: f1 and f2 are both built with gl(3, 10),
so the two group_by() calls give identical groupings. An alternative that
confirms the function s() does what you want might be:

df <- data.frame(matrix(rnorm(40), 10, 4),
                 f1 = base::sample(letters[1:3],30,replace=TRUE),
                 f2 = base::sample(letters[4:6],30,replace=TRUE))
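
Putting the two points together, a minimal sketch might look like the
following. The function name summarise_means, the argument name data, the
seed, and the use of rnorm(120) with 30 rows (so that no recycling is
involved) are illustrative assumptions, not part of the original posts.

```r
library(dplyr)

set.seed(42)  # hypothetical seed, only to make the sketch reproducible

df <- data.frame(matrix(rnorm(120), 30, 4),
                 f1 = sample(letters[1:3], 30, replace = TRUE),
                 f2 = sample(letters[4:6], 30, replace = TRUE))

# The piped (grouped) data frame arrives as the first argument and is
# forwarded to summarise(), so the grouping set upstream is respected.
summarise_means <- function(data) {
  dplyr::summarise(data,
                   x1m = mean(X1),
                   x2m = mean(X2),
                   x3m = mean(X3),
                   x4m = mean(X4))
}

# The same function is reused for both groupings
by_f1    <- df %>% group_by(f1) %>% summarise_means()
by_f1_f2 <- df %>% group_by(f1, f2) %>% summarise_means()

print(by_f1)
print(by_f1_f2)
```

Because f1 and f2 are now sampled independently, the two results should
generally have different numbers of rows, which makes it easy to see that
the grouping is taken from the pipeline rather than from the function.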

HTH,

Eric
On Sat, Sep 9, 2017 at 1:52 PM, Edjabou Vincent <maklawe at gmail.com> wrote:
#
Exactly what I was looking for, Eric, thanks!

I agree on your second point.

Best,
Lars.
On Sat, Sep 9, 2017 at 9:02 AM, Eric Berger <ericjberger at gmail.com> wrote: