I want to split my dataframe according to a list of factors. Then, in
the resulting list, I want to reference the factors used in split. Is
it possible?
Thanks,
Naresh
library(lattice)
mydf <- data.frame(
  date = rep(seq.Date(from = as.Date("2022-06-01"), by = 1,
                      length.out = 10), 4),
  account = c(rep("ABC", 20), rep("XYZ", 20)),
  client = c(rep("P", 10), rep("Q", 10), rep("R", 10), rep("S", 10)),
  profit = round(runif(40, 2, 5), 2), sale = round(runif(40, 10, 20), 2))
account.names <- data.frame(account = c("ABC", "DEF", "XYZ"),
  corp = c("ABC Corporation", "DEF LLC", "XYZ Incorporated"))
mydf.split <- split(mydf, mydf$account)
# This does not work: account is not a variable inside the function
myplots <- lapply(mydf.split, function(df) {
  myts <- aggregate(sale ~ date, FUN = sum, data = df)
  xyplot(sale ~ date, data = myts, main = account)})
# This works, but may have a large overhead
mydf <- merge(mydf, account.names, by = "account", all.x = TRUE)
mydf.split <- split(mydf, mydf$account)
myplots <- lapply(mydf.split, function(df) {
  myts <- aggregate(sale ~ date, FUN = sum, data = df)
  # corp is a column of df; aggregate() does not carry it into myts
  xyplot(sale ~ date, data = myts, main = unique(df$corp))})
# Now I can print one plot at a time
myplots[["ABC"]]
myplots[["XYZ"]]
Reference factors inside split
3 messages · Naresh Gurbuxani, Ben Tupper
Hi,
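split() keeps the grouping column in each subgroup, but inside your
function `account` is not a variable: it is a column of df, and
aggregate() does not carry it into myts.
Instead of iterating over the elements of the split list, you can
iterate over the **names** of the elements. In your case the account
name is the grouping variable, so each list name is an account.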
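As a minimal sketch of the idea, assuming a made-up toy data frame (`toy` and `toy.split` are illustrative names, not from the original post): the names of a split list are the levels of the grouping variable, so iterating over the names gives access to both the label and the matching subgroup.

```r
# Toy data frame; the column names mirror the example in this thread.
toy <- data.frame(account = c("ABC", "ABC", "XYZ"),
                  sale = c(10, 12, 15))
toy.split <- split(toy, toy$account)

# The list names are the levels of the grouping variable.
names(toy.split)

# Iterating over the names: each name selects its own subgroup.
totals <- sapply(names(toy.split), function(name) {
  sum(toy.split[[name]]$sale)
})
totals
```

The result `totals` is itself named by account, because sapply() uses a character vector argument as names by default.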
##start
library(lattice)
mydf <- data.frame(
  date = rep(seq.Date(from = as.Date("2022-06-01"), by = 1,
                      length.out = 10), 4),
  account = c(rep("ABC", 20), rep("XYZ", 20)),
  client = c(rep("P", 10), rep("Q", 10), rep("R", 10), rep("S", 10)),
  profit = round(runif(40, 2, 5), 2), sale = round(runif(40, 10, 20), 2))
account.names <- data.frame(account = c("ABC", "DEF", "XYZ"),
                            corp = c("ABC Corporation", "DEF LLC",
                                     "XYZ Incorporated"))
mydf.split <- split(mydf, mydf$account)
myplots <- sapply(names(mydf.split),
  function(name, x = NULL) {
    df <- x[[name]]
    myts <- aggregate(sale ~ date, FUN = sum, data = df)
    xyplot(sale ~ date, data = myts, main = name)
  }, x = mydf.split, USE.NAMES = TRUE, simplify = FALSE)
myplots[["ABC"]]
myplots[["XYZ"]]
## end
Does that help?
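A possible variation, offered only as a sketch: Map() can walk the subgroups and their names in parallel, and a named character vector built from account.names can supply the full corporation name for the title without a merge. The name `corp.lookup` is hypothetical (it does not appear in the original posts), and set.seed() is added only so the random example data is reproducible.

```r
library(lattice)

set.seed(1)  # assumption: only here to make the random data reproducible
mydf <- data.frame(
  date = rep(seq.Date(from = as.Date("2022-06-01"), by = 1,
                      length.out = 10), 4),
  account = c(rep("ABC", 20), rep("XYZ", 20)),
  sale = round(runif(40, 10, 20), 2))
account.names <- data.frame(
  account = c("ABC", "DEF", "XYZ"),
  corp = c("ABC Corporation", "DEF LLC", "XYZ Incorporated"))

# Hypothetical helper: named vector mapping account code to corporation name
corp.lookup <- setNames(as.character(account.names$corp),
                        as.character(account.names$account))

mydf.split <- split(mydf, mydf$account)

# Map() passes each subgroup together with its name; the result keeps
# the names of the first list, so it can be indexed by account.
myplots <- Map(function(df, name) {
  myts <- aggregate(sale ~ date, FUN = sum, data = df)
  xyplot(sale ~ date, data = myts, main = corp.lookup[[name]])
}, mydf.split, names(mydf.split))

myplots[["ABC"]]
```

The lookup vector stays small (one entry per account), so this avoids carrying a repeated corp column through the full data frame.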
On Mon, Jul 11, 2022 at 9:14 AM Naresh Gurbuxani
<naresh_gurbuxani at hotmail.com> wrote:
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Ben Tupper (he/him)
Bigelow Laboratory for Ocean Science
East Boothbay, Maine
http://www.bigelow.org/
https://eco.bigelow.org
This is what I was looking for. Thanks for your quick response and
elegant solution.
Naresh
On Jul 11, 2022, at 10:00 AM, Ben Tupper <btupper at bigelow.org> wrote: