
Reference factors inside split

3 messages · Naresh Gurbuxani, Ben Tupper

#
I want to split my dataframe according to a list of factors.  Then, in
the resulting list, I want to reference the factors used in split.  Is
it possible?

Thanks,
Naresh

library(lattice)  # provides xyplot()

mydf <- data.frame(
  date = rep(seq.Date(from = as.Date("2022-06-01"), by = 1, length.out = 10), 4),
  account = c(rep("ABC", 20), rep("XYZ", 20)),
  client = c(rep("P", 10), rep("Q", 10), rep("R", 10), rep("S", 10)),
  profit = round(runif(40, 2, 5), 2),
  sale = round(runif(40, 10, 20), 2))

account.names <- data.frame(account = c("ABC", "DEF", "XYZ"),
corp = c("ABC Corporation", "DEF LLC", "XYZ Incorporated"))

mydf.split <- split(mydf, mydf$account)

# This does not work: account is not a variable inside the function
myplots <- lapply(mydf.split, function(df) {
myts <- aggregate(sale ~ date, FUN = sum, data = df)
xyplot(sale ~ date, data = myts, main = account)})

# This works, but may have a large overhead
mydf <- merge(mydf, account.names, by = "account", all.x = TRUE)
mydf.split <- split(mydf, mydf$account)
myplots <- lapply(mydf.split, function(df) {
myts <- aggregate(sale ~ date, FUN = sum, data = df)
xyplot(sale ~ date, data = myts, main = unique(df$corp))})

# Now I can print one plot at a time
myplots[["ABC"]]
myplots[["XYZ"]]
#
Hi,

split() keeps the grouping column in each subgroup, but the bare name
`account` is not a variable inside your function, so main = account
cannot find it.  Instead of iterating over the elements of the split
list, you can iterate over the **names** of the elements.  In your case
the account name is the grouping variable, so each name is an account.
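
To see why the names are enough, here is a minimal sketch on a toy data
frame (`toy`, `toy.split`, `group.names` and `totals` are names made up
for this illustration, not part of your data).  split() labels each
element of its result with the level it belongs to, so names() gives you
the keys back.

```r
# Toy data, assumed only for illustration
toy <- data.frame(account = c("ABC", "ABC", "XYZ"),
                  sale = c(10, 12, 15))
toy.split <- split(toy, toy$account)

# The element names are the levels of the grouping variable
group.names <- names(toy.split)

# Iterating over the names gives access to both the key and the subgroup
totals <- sapply(group.names, function(name) sum(toy.split[[name]]$sale))
```

Inside the function, `name` is the key and `toy.split[[name]]` is the
matching subgroup.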


##start

library(lattice)
mydf <- data.frame(
  date = rep(seq.Date(from = as.Date("2022-06-01"), by = 1, length.out =
                        10), 4),
  account = c(rep("ABC", 20), rep("XYZ", 20)),
  client = c(rep("P", 10), rep("Q", 10), rep("R", 10), rep("S", 10)),
  profit = round(runif(40, 2, 5), 2), sale = round(runif(40, 10, 20), 2))

account.names <- data.frame(account = c("ABC", "DEF", "XYZ"),
                            corp = c("ABC Corporation", "DEF LLC",
"XYZ Incorporated"))

mydf.split <- split(mydf, mydf$account)

myplots <- sapply(names(mydf.split),
  function(name, x = NULL) {
    df <- x[[name]]
    myts <- aggregate(sale ~ date, FUN = sum, data = df)
    xyplot(sale ~ date, data = myts, main = name)
  }, x = mydf.split, USE.NAMES = TRUE, simplify = FALSE)

myplots[["ABC"]]
myplots[["XYZ"]]

## end
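
If you would rather have the full corporation name as the title, one
option that avoids the merge is a named lookup vector built from
account.names.  This is only a sketch: `corp.lookup` is a name I made up,
and Map() is swapped in for sapply() so the subgroup and its name are
passed together.

```r
library(lattice)

mydf <- data.frame(
  date = rep(seq.Date(from = as.Date("2022-06-01"), by = 1, length.out = 10), 4),
  account = c(rep("ABC", 20), rep("XYZ", 20)),
  sale = round(runif(40, 10, 20), 2))

account.names <- data.frame(account = c("ABC", "DEF", "XYZ"),
                            corp = c("ABC Corporation", "DEF LLC",
                                     "XYZ Incorporated"))

# Named lookup vector (hypothetical helper): account code -> corporation name
corp.lookup <- setNames(account.names$corp, account.names$account)

mydf.split <- split(mydf, mydf$account)

# Map() walks the subgroups and their names in parallel and
# keeps the names of the first list on the result
myplots <- Map(function(df, name) {
  myts <- aggregate(sale ~ date, FUN = sum, data = df)
  xyplot(sale ~ date, data = myts, main = corp.lookup[[name]])
}, mydf.split, names(mydf.split))
```

myplots[["ABC"]] and myplots[["XYZ"]] then print as before.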

Does that help?

On Mon, Jul 11, 2022 at 9:14 AM Naresh Gurbuxani
<naresh_gurbuxani at hotmail.com> wrote:

  
    
#
This is what I was looking for.  Thanks for your quick response and elegant solution.

Naresh
