Dénes, thank you for the guidance, which is well taken. Your side note raises an interesting question: I find the piping %>% operator readable. Is there any downside to it? Or is the side note meant to tell me to drop the last: "%>% `[`"?

Thank you,
==
Michael Lachanski
PhD Student in Demography and Sociology
MA Candidate in Statistics
University of Pennsylvania
mikelach at sas.upenn.edu

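For reference, the "pure wrapper" pattern described in the message quoted below (copy once at the top, then modify by reference) might look roughly like the following. This is only a sketch: the function name `lag_A` and the toy table `original` are made up for illustration and do not appear anywhere in the thread.

```r
library(data.table)

# Hypothetical wrapper (name invented for this sketch): copy once at the
# top, then use by-reference operations on the copy only.
lag_A <- function(DT) {
  DT <- copy(DT)  # the caller's table is left untouched
  DT[, A := shift(A, n = 1, fill = NA, type = "lag")]  # modifies the copy in place
  DT[]            # trailing [] so the returned table auto-prints
}

# Hypothetical toy data, for illustration only.
original <- data.table(A = c(1, 2, 3))
lagged <- lag_A(original)
```

Because `:=` already updates the copy in place, no trailing pipe is needed inside the wrapper; the final `DT[]` only affects printing.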
On Sat, Dec 31, 2022 at 9:22 AM Dénes Tóth <toth.denes at kogentum.hu> wrote:
Hi Michael,

Note that you have to be very careful when using by-reference operations in data.table (see `?data.table::set`), especially in a functional programming approach. In your function, you avoid this problem by calling `data.table(A)`, which makes a copy of A even if it is already a data.table. However, for large data.tables, copying can be a very expensive operation (especially in terms of RAM usage), which can be avoided entirely by using data.tables the data.table way (e.g., joining, grouping, and aggregating in the same step by performing these operations within `[`; see `?data.table`).

So instead of blindly functionalizing all your code, try to be pragmatic. Functional programming is not about using pure functions in *every* part of your code base, because that is infeasible in 99.9% of real-world problems. Even Haskell has `IO` and `do`; the point is that the imperative and functional parts of the code are clearly separated and the imperative components are kept as close to the top level as possible.

So when using data.table, a good strategy is to use pure functions for performing within-data.table operations, e.g., `DT[, lapply(.SD, mean), .SDcols = is.numeric]`, and when these operations alter `DT` by reference, to invoke the chains of these operations in "pure" wrappers, e.g., calling `A <- copy(A)` at the top and then modifying `A` directly.

Cheers,
Denes

Side note: You do not need to use `DT[ , A:= shift(A, fill = NA, type = "lag", n = 1)] %>% `[`(return(DT))`. `[.data.table` returns the result (the modified DT) invisibly. If you want to let auto-print work, you can just use `DT[ , A:= shift(A, fill = NA, type = "lag", n = 1)][]`.

Note that this also means you usually do not need to use magrittr's or base R's pipe when transforming data.tables. You can do this instead:

```
DT[
  ## filter rows where 'x' column equals "a"
  x == "a"
][
  ## calculate the mean of `z` for each gender and assign it to `y`
  , y := mean(z), by = "gender"
][
  ## do whatever you want
  ...
]
```

On 12/31/22 13:39, Rui Barradas wrote:
At 06:50 on 31/12/2022, Michael Lachanski wrote:
Hello,
I am trying to make a habit of "functionalizing" all of my code as
recommended by Hadley Wickham. I have found it surprisingly difficult
to do so because several intermediate features from data.table break or give
unexpected results using purrr and its data.table adaptation, tidytable.
Here is a minimal working example of what has stumped me most recently:
===
library(data.table); library(tidytable)
minimal_failing_function <- function(A){
  DT <- data.table(A)
  DT[ , A:= shift(A, fill = NA, type = "lag", n = 1)] %>% `[`
  return(DT)}
# works
minimal_failing_function(c(1,2))
# fails
tidytable::pmap_dfr(.l = list(c(1,2)),
                    .f = minimal_failing_function)
===
These should ideally give the same output, but do not. This also fails
using purrr::pmap_dfr rather than tidytable. I am using R 4.2.2 and I
am on Mac OS Ventura 13.1.
Thank you for any help you can provide or general guidance.
==
Michael Lachanski
PhD Student in Demography and Sociology
MA Candidate in Statistics
University of Pennsylvania
mikelach at sas.upenn.edu
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.
Hello,
Use map_dfr instead of pmap_dfr.
library(data.table)
library(tidytable)
minimal_failing_function <- function(A) {
  DT <- data.table(A)
  DT[ , A:= shift(A, fill = NA, type = "lag", n = 1)] %>% `[`
  return(DT)
}
# works
tidytable::map_dfr(.x = list(c(1,2)),
                   .f = minimal_failing_function)
#> # A tidytable: 2 × 1
#> A
#> <dbl>
#> 1 NA
#> 2 1
Hope this helps,
Rui Barradas
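
To see why switching from pmap to map changes what the function receives, a small sketch along these lines may help. It assumes purrr is installed and uses a made-up helper, `count_values`, that is not part of the thread; it only illustrates how each function splits up `list(c(1,2))`, not the exact error reported above.

```r
library(purrr)

# Hypothetical helper: reports how many values a single call receives.
count_values <- function(A) length(A)

# pmap() treats .l as a list of argument vectors and walks their elements
# in parallel, so the function is called once per element: f(1), then f(2).
pmap_calls <- pmap(list(c(1, 2)), count_values)

# map() walks the elements of .x, so the function is called once with the
# whole vector: f(c(1, 2)).
map_calls <- map(list(c(1, 2)), count_values)

str(pmap_calls)
str(map_calls)
```

In other words, with pmap each call sees a single value, whereas with map the function sees the full vector `c(1, 2)`, which matches the direct call `minimal_failing_function(c(1,2))`.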