aggregate formula - differing results

At 10:44 on 04/09/2023, Ivan Calandra wrote:
Hello,

You can define a vector of the columns of interest and subset the data 
with it. The default na.action = na.omit then only sees those columns, 
so it no longer removes rows that have NA values in the other columns, 
and the results are the same.

However, this will not give the mean values of the other numeric 
columns, just of those two.



# define a vector of columns of interest
cols <- c("Length", "Width", "RAWMAT")

# 1) Simple aggregation with 2 variables, select cols:
aggregate(cbind(Length, Width) ~ RAWMAT, data = my_data[cols],
          FUN = mean, na.rm = TRUE)

# 2) Using the dot notation - if cols are selected, equal results:
aggregate(. ~ RAWMAT, data = my_data[cols], FUN = mean, na.rm = TRUE)

# 3) Using dplyr, the results now match #1 and #2:
library(dplyr)
my_data %>%
   select(all_of(cols)) %>%
   group_by(RAWMAT) %>%
   summarise(across(c("Length", "Width"), ~ mean(.x, na.rm = TRUE)))
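
To see why the dot notation drops rows on the full data, here is a minimal sketch on a toy data frame. The object name toy and the column Thickness are invented for illustration and are not part of the original data; the only assumption is that some unselected column contains an NA.

```r
# Toy data: "toy" and the "Thickness" column are hypothetical,
# used only to place an NA in a column we are not interested in
toy <- data.frame(
  RAWMAT    = c("flint", "flint", "quartz", "quartz"),
  Length    = c(10, 20, 30, 40),
  Width     = c(1, 2, 3, 4),
  Thickness = c(5, NA, 7, 8)
)

# Dot notation on the full data: the default na.action = na.omit
# drops row 2 because Thickness is NA there
full <- aggregate(. ~ RAWMAT, data = toy, FUN = mean)

# Dot notation on the selected columns only: row 2 is kept
cols <- c("Length", "Width", "RAWMAT")
sub <- aggregate(. ~ RAWMAT, data = toy[cols], FUN = mean)

full$Length  # 10 35 (flint mean uses row 1 only)
sub$Length   # 15 35 (flint mean uses rows 1 and 2)
```

The two calls differ only in the data they are given, so the difference in the flint mean comes entirely from the row removed by na.omit.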


Hope this helps,

Rui Barradas