Dear All,
I am taking my first steps with the tidyverse purrr package and I am
stuck on some probably trivial tasks.
Consider the following data set:
zz<-list(structure(list(year = c(2000, 2001, 2002, 2003, 2000, 2001,
2002, 2003, 2000, 2001, 2002, 2003), tot_i = c(22393349.081,
23000574.372, 21682040.898, 21671102.853, 34361300.338, 35297814.942,
34745691.204, 35878883.117, 11967951.257, 12297240.57, 13063650.306,
14207780.264), relation = c("EU28-Algeria", "EU28-Algeria", "EU28-Algeria",
"EU28-Algeria", "World-Algeria", "World-Algeria", "World-Algeria",
"World-Algeria", "Extra EU28-Algeria", "Extra EU28-Algeria",
"Extra EU28-Algeria", "Extra EU28-Algeria"), g_rate = c(0.736046372770467,
0.0271163231905857, -0.0573261107603093, -0.000504474880914325,
0.614846575418334, 0.0272549232650638, -0.0156418673197543, 0.0326138831530727,
0.428272657063707, 0.0275142592018328, 0.0623237165799383, 0.0875811837579971
)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
)), structure(list(year = c(2000, 2001, 2002, 2003, 2000, 2001,
2002, 2003, 2000, 2001, 2002, 2003), tot_i = c(9233346.648, 7869288.171,
7271485.687, 6395999.102, 21393949.287, 19851236.26, 19449339.887,
16055014.309, 12160602.639, 11981948.089, 12177854.2, 9659015.207
), relation = c("EU28-Egypt", "EU28-Egypt", "EU28-Egypt", "EU28-Egypt",
"World-Egypt", "World-Egypt", "World-Egypt", "World-Egypt", "Extra EU28-Egypt",
"Extra EU28-Egypt", "Extra EU28-Egypt", "Extra EU28-Egypt"),
g_rate = c(0.0970653722744164, -0.147731751985664, -0.0759665259436081,
-0.120399959882366, 0.124744629514854, -0.0721097823643728,
-0.0202454077789513, -0.174521376957825, 0.146712116047648,
-0.0146912579338002, 0.0163501051368976, -0.206837670383671
)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
)))
I can do very simple things with map(), for instance iteratively taking the mean of a certain column
map(zz, function(x) mean(x$tot_i))
or filtering on the values of the years
map(zz, function(x) filter(x, year==2000))
However, I bang my head against the wall as soon as I want to add a bit of complexity. For instance:
1) I want to iteratively group the data in zz by relation and summarise them by taking the average of tot_i, and
2) Given a list of years
ll<-list(c(2000, 2001), c(2001, 2003))
I would like to filter the two elements of the zz list according to the years listed in ll.
I would then have plenty of other operations to carry out on the data, but already understanding 1 and 2 would take me a long way from where I am stuck now.
Any suggestion is welcome.
Cheers
Lorenzo
Purr and Basic Functional Programming Tasks
4 messages · Lorenzo Isella, jim holtman
Does this answer the first question?
library(dplyr)
library(purrr)
rel <- map(zz, function(x){
  group_by(x, relation) %>% summarise(tot = mean(tot_i))
})
rel
[[1]]
# A tibble: 3 x 2
  relation                 tot
  <chr>                  <dbl>
1 EU28-Algeria       22186767.
2 Extra EU28-Algeria 12884156.
3 World-Algeria      35070922.

[[2]]
# A tibble: 3 x 2
  relation               tot
  <chr>                <dbl>
1 EU28-Egypt        7692530.
2 Extra EU28-Egypt 11494855.
3 World-Egypt      19187385.
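If a single table is more convenient than a list of summaries, the same idea can be sketched as below. This is only an illustration under assumptions: zz_small is a made-up, cut-down stand-in for zz (same column names, invented numbers), and rel_small and rel_all are hypothetical names. The formula shorthand (~ with .x) should behave like the anonymous function used above.

```r
library(dplyr)
library(purrr)
library(tibble)

# Cut-down stand-in for zz (assumption: invented numbers, same column names)
zz_small <- list(
  tibble(year = c(2000, 2001, 2000, 2001),
         tot_i = c(10, 20, 30, 50),
         relation = c("EU28-Algeria", "EU28-Algeria",
                      "World-Algeria", "World-Algeria")),
  tibble(year = c(2000, 2001, 2000, 2001),
         tot_i = c(1, 3, 5, 9),
         relation = c("EU28-Egypt", "EU28-Egypt",
                      "World-Egypt", "World-Egypt"))
)

# Same grouping and averaging, written with purrr's formula shorthand
rel_small <- map(zz_small, ~ .x %>%
                   group_by(relation) %>%
                   summarise(tot = mean(tot_i)))

# Stack the per-element summaries into one tibble
rel_all <- bind_rows(rel_small)
```

Since the relation labels already carry the country name, the stacked rows stay distinguishable without an extra id column.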
Jim Holtman
*Data Munger Guru*
*What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.*

On Fri, Jan 25, 2019 at 5:45 AM Lorenzo Isella <lorenzo.isella at gmail.com> wrote:
1) I want to iteratively group the data in zz by relation and summarise
them by taking the average of tot_i
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Try this for the second question:
years <- map2(zz,
              list(c(2000, 2001), c(2001, 2003)),
              ~ filter(.x, year %in% .y)
)
years
[[1]]
# A tibble: 6 x 4
   year     tot_i relation            g_rate
  <dbl>     <dbl> <chr>                <dbl>
1  2000 22393349. EU28-Algeria        0.736
2  2001 23000574. EU28-Algeria        0.0271
3  2000 34361300. World-Algeria       0.615
4  2001 35297815. World-Algeria       0.0273
5  2000 11967951. Extra EU28-Algeria  0.428
6  2001 12297241. Extra EU28-Algeria  0.0275

[[2]]
# A tibble: 6 x 4
   year     tot_i relation          g_rate
  <dbl>     <dbl> <chr>              <dbl>
1  2001  7869288. EU28-Egypt       -0.148
2  2003  6395999. EU28-Egypt       -0.120
3  2001 19851236. World-Egypt      -0.0721
4  2003 16055014. World-Egypt      -0.175
5  2001 11981948. Extra EU28-Egypt -0.0147
6  2003  9659015. Extra EU28-Egypt -0.207
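The years can also be passed through the ll object from the original question rather than an inline list. A minimal sketch, assuming a made-up stand-in zz_small (invented numbers) and a hypothetical result name years_small; the length check is my own addition, on the assumption that the two lists are meant to match one to one.

```r
library(dplyr)
library(purrr)
library(tibble)

# Cut-down stand-in for zz (assumption: invented numbers)
zz_small <- list(
  tibble(year = c(2000, 2001, 2002, 2003), tot_i = c(1, 2, 3, 4)),
  tibble(year = c(2000, 2001, 2002, 2003), tot_i = c(5, 6, 7, 8))
)

# The list of years from the original question
ll <- list(c(2000, 2001), c(2001, 2003))

# One vector of years per tibble is expected, so check this explicitly
stopifnot(length(zz_small) == length(ll))

# map2() walks both lists in parallel:
# .x is the current tibble, .y the matching vector of years
years_small <- map2(zz_small, ll, ~ filter(.x, year %in% .y))
```

The first tibble keeps only 2000 and 2001, the second only 2001 and 2003, mirroring the output shown above.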
Jim Holtman
*Data Munger Guru*
*What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.*

On Fri, Jan 25, 2019 at 5:45 AM Lorenzo Isella <lorenzo.isella at gmail.com> wrote:
2) Given a list of years
ll<-list(c(2000, 2001), c(2001, 2003))
I would like to filter the two elements of the zz list according to the
years listed in ll.
2 days later
Dear Jim,
Thanks a lot for your stellar replies! They address my questions perfectly.
Cheers
Lorenzo
On Fri, Jan 25, 2019 at 07:46:50AM -0800, jim holtman wrote:
Try this for the second question:
years <- map2(zz,
              list(c(2000, 2001), c(2001, 2003)),
              ~ filter(.x, year %in% .y)
)