Message-ID: <1391555008.11688982.1593950448099.JavaMail.zimbra@psyctc.org>
Date: 2020-07-05T12:00:48Z
From: Chris Evans
Subject: Can I pass the grouped portions of a dataframe/tibble to a function in dplyr
In-Reply-To: <5e822ed0-2285-306e-7ee5-c90e444812ef@sapo.pt>
Ouch. I should have known all those points, Rui: my bad. Casual behaviour while rushing out a little example. Good to be reminded.
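For the record, a minimal sketch of the coercion point, using base R only. The character column g is a made-up addition for illustration and is not part of the original example:

```r
# Why cbind() is risky when the columns have different types.
x <- 1:3
g <- c("a", "b", "c")   # hypothetical character column, for illustration only

m <- cbind(x, g)        # cbind() on vectors builds a matrix, which holds one type only
class(m[, "x"])         # the integers have been coerced to character

df <- data.frame(x, g)  # data.frame() keeps each column's own class
class(df$x)             # still integer
```

tibble() keeps the column classes in the same way as data.frame() does.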
group_modify() is clearly exactly what I wanted and I will experiment with it and make sure I understand it properly. I see from the help that it evolves from, or supersedes aspects of, do(), which I think must have been the function I had forgotten. Even more interestingly, it seems to lead into options and experimental developments in the tidyverse that I didn't know about.
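As a note to self, a minimal sketch of how I understand it, assuming dplyr is installed. It reuses Rui's seeded data; the argument name d and the split()/lapply() comparison are my own additions:

```r
library(dplyr)

set.seed(2020)
n <- 50
tib <- tibble(x = 1:n,
              y = sample(1:3, n, replace = TRUE),
              z = rnorm(n))

# A function that expects a whole data frame and returns a one-row data frame
sillyFun <- function(d) {
  data.frame(nrow = nrow(d), ncol = ncol(d))
}

# Inside the formula, .x is the subset of rows for the current group
# (without the grouping column) and .y is a one-row tibble holding the group key.
res <- tib %>%
  group_by(y) %>%
  group_modify(~ sillyFun(.x))
res

# Base R comparison: split() keeps the grouping column in each piece,
# so ncol is one larger here than in the group_modify() result.
sizes <- lapply(split(tib, tib$y), sillyFun)
```

The function given to group_modify() has to return a data frame, which is why sillyFun returns a data.frame here and not a list.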
Excellent. Perfect help ... many thanks!
Chris
----- Original Message -----
> From: "Rui Barradas" <ruipbarradas at sapo.pt>
> To: "Chris Evans" <chrishold at psyctc.org>, "R-help" <r-help at r-project.org>
> Sent: Sunday, 5 July, 2020 13:16:19
> Subject: Re: [R] Can I pass the grouped portions of a dataframe/tibble to a function in dplyr
> Hello,
>
> I forgot to say I redid the data set setting the RNG seed first.
>
>
>
> set.seed(2020)
> n <- 50
> x <- 1:n
> y <- sample(1:3, n, replace = TRUE)
> z <- rnorm(n)
> tib <- tibble(x,y,z)
>
>
> Also, don't do
>
> as_tibble(cbind(...))
> as.data.frame(cbind(...))
>
>
> If one of the variables is of a different class (for example, "character"),
> cbind() builds a matrix and all variables are coerced to one common type. It's
> much better to call tibble() or data.frame() directly.
>
> Hope this helps,
>
> Rui Barradas
>
>
> At 12:04 on 05/07/2020, Rui Barradas wrote:
>> Hello,
>>
>> You can pass a grouped tibble to a function with group_modify() but the
>> function must return a data.frame (or similar).
>>
>> ## this will also do it
>> #sillyFun <- function(tib){
>> #  tibble(nrow = nrow(tib), ncol = ncol(tib))
>> #}
>>
>>
>> sillyFun <- function(tib){
>>   data.frame(nrow = nrow(tib), ncol = ncol(tib))
>> }
>>
>> tib %>%
>>   group_by(y) %>%
>>   group_modify(~ sillyFun(.))
>> ## A tibble: 3 x 3
>> ## Groups:   y [3]
>> #      y  nrow  ncol
>> #  <dbl> <int> <int>
>> #1     1    17     2
>> #2     2    21     2
>> #3     3    12     2
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> At 09:43 on 05/07/2020, Chris Evans wrote:
>>> Apologies if this is a stupid question but searching keeps getting
>>> things I know and don't need.
>>>
>>> What I want to do is to use the group_by() power of dplyr to run
>>> functions that expect a dataframe/tibble per group but I can't see how
>>> to do it. Here is a reproducible example.
>>>
>>> ### create trivial tibble
>>> n <- 50
>>> x <- 1:n
>>> y <- sample(1:3, n, replace = TRUE)
>>> z <- rnorm(n)
>>> tib <- as_tibble(cbind(x,y,z))
>>>
>>> ### create trivial function that expects a tibble/data frame
>>> sillyFun <- function(tib){
>>> return(list(nrow = nrow(tib),
>>> ncol = ncol(tib)))
>>> }
>>>
>>> ### works fine on the whole tibble
>>> tib %>%
>>> summarise(dim = list(sillyFun(.))) %>%
>>> unnest_wider(dim)
>>>
>>> That gives me:
>>> # A tibble: 1 x 2
>>>    nrow  ncol
>>>   <int> <int>
>>> 1    50     3
>>>
>>>
>>> ### So I try the following hoping to apply the function to the grouped
>>> ### tibble
>>> tib %>%
>>> group_by(y) %>%
>>> summarise(dim = list(sillyFun(.))) %>%
>>> unnest_wider(dim)
>>>
>>> ### But that gives me:
>>> # A tibble: 3 x 3
>>>       y  nrow  ncol
>>>   <dbl> <int> <int>
>>> 1     1    50     3
>>> 2     2    50     3
>>> 3     3    50     3
>>>
>>> Clearly "." is still passing the whole tibble, not the grouped
>>> subsets. What I can't find is whether there is an alternative to "."
>>> that would pass just the grouped subset of the tibble.
>>>
>>> I have bodged my way around this by writing a function that takes
>>> individual columns and reassembles them into a data frame that the
>>> actual functions I need to use require but that takes me back to a lot
>>> of clumsiness both selecting the variables to pass in the dplyr call
>>> to the function and putting the reassemble-to-data-frame bit in the
>>> function I call. (The functions I really need are reliability
>>> explorations and can be called on whole dataframes.)
>>>
>>> I know I can do this using base R split and lapply but I feel sure it
>>> must be possible to do this within dplyr/tidyverse. I'm slowly
>>> transferring most of my code to the tidyverse and hitting frustrations
>>> but also finding that it does really help me program more sensibly,
>>> handle relational data structures more easily, and write code that I
>>> seem better at reading when I come back to it after months on other
>>> things so I am slowly trying to move all my coding to tidyverse. If I
>>> could see how to do this, it would help.
>>>
>>> Very sorry if the answer should be blindingly obvious to me. I'd also
>>> love to have pointers to guidance to the tidyverse written for people
>>> who aren't professional coders or statisticians and that go a bit
>>> beyond the obvious basics of tidyverse into issues like this.
>>>
>>> TIA,
>>>
>>> Chris
>>>
>>
>
--
Small contribution in our coronavirus rigours:
https://www.coresystemtrust.org.uk/home/free-options-to-replace-paper-core-forms-during-the-coronavirus-pandemic/
Chris Evans <chris at psyctc.org> Visiting Professor, University of Sheffield <chris.evans at sheffield.ac.uk>
I do some consultation work for the University of Roehampton <chris.evans at roehampton.ac.uk> and other places
but <chris at psyctc.org> remains my main Email address. I have a work web site at:
https://www.psyctc.org/psyctc/
and a site I manage for CORE and CORE system trust at:
http://www.coresystemtrust.org.uk/
I have "semigrated" to France, see:
https://www.psyctc.org/pelerinage2016/semigrating-to-france/
https://www.psyctc.org/pelerinage2016/register-to-get-updates-from-pelerinage2016/
If you want an Emeeting, I am trying to keep them to Thursdays and my diary is at:
https://www.psyctc.org/pelerinage2016/ceworkdiary/
Beware: French time, generally an hour ahead of UK.