Hello,
First of all, 48 * 181 * 28 = 243,264, not 234,576.
And 243264 * 70 = 17,028,480.
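The arithmetic is easy to confirm in R. This is only a quick check; the variable names are mine, not from your post.

```r
# readings per day * days * seasons, for a single sensor
rows_per_sensor <- 48 * 181 * 28
# the same for all 70 sensors stacked in one long table
rows_total <- rows_per_sensor * 70

rows_per_sensor  # 243264
rows_total       # 17028480
```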
As for the question, why don't you try it with smaller data sets?
In the test below I used the sizes you posted, and the many-columns
(wide) format is fastest, followed by the list of data frames and then
the 4-column (long) format.
The long format has 4 columns: sensor, day, season and data.
The wide format df is only 72 columns wide: one for day, one for
season and one for each of the 70 sensors.
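To make the two layouts concrete, here is a small sketch with made-up sizes (2 readings per day, 2 days, 2 seasons, 3 sensors). The names wide, long and sensor_cols are only for illustration, and stacking with stack() is just one way to go from wide to long.

```r
set.seed(1)
# tiny wide table: day, season and one column per sensor
wide <- expand.grid(day = rep(1:2, each = 2), season = 1:2,
                    KEEP.OUT.ATTRS = FALSE)
for (s in sprintf("sensor_%02d", 1:3)) wide[[s]] <- rnorm(nrow(wide))

# stack the sensor columns on top of each other to get the long layout
sensor_cols <- setdiff(names(wide), c("day", "season"))
stacked <- stack(wide[sensor_cols])
long <- data.frame(
  sensor = stacked$ind,
  day    = rep(wide$day, times = length(sensor_cols)),
  season = rep(wide$season, times = length(sensor_cols)),
  data   = stacked$values
)

dim(wide)  # 8 rows, 2 + 3 = 5 columns
dim(long)  # 8 * 3 = 24 rows, 4 columns
```

The wide table grows by one column per sensor, the long table by one block of rows per sensor.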
The test computes mean values aggregated by day and season. When the
data is in the long format it must also include the sensor, so there is
an extra aggregation column.
The test is very simple; real results will probably depend on the
functions you want to apply to the data.
# create the test data
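# long format: add a 'data' column and prepend the sensor id
makeDataLong <- function(sensor, x) {
  # use nrow(x), not the global df1
  x[["data"]] <- rnorm(nrow(x))
  cbind.data.frame(sensor, x)
}
# wide format: add one column of values named after the sensor
makeDataWide <- function(sensor, x) {
  x[[sensor]] <- rnorm(nrow(x))
  x
}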
set.seed(2025)
n_per_day <- 48
n_days <- 181
n_seasons <- 28
n_sensors <- 70
day <- rep(1:n_days, each = n_per_day)
season <- 1:n_seasons
sensor_names <- sprintf("sensor_%02d", 1:n_sensors)
df1 <- expand.grid(day = day, season = season, KEEP.OUT.ATTRS = FALSE)
df_list <- lapply(1:n_sensors, makeDataLong, x = df1)
names(df_list) <- sensor_names
# the `_` placeholder needs R >= 4.2
df_long <- lapply(1:n_sensors, makeDataLong, x = df1) |>
  do.call(rbind, args = _)
df_wide <- df1
for(s in sensor_names) {
  df_wide <- makeDataWide(s, df_wide)
}
# test functions
f <- function(x) aggregate(data ~ season + day, data = x, mean)
g <- function(x) aggregate(data ~ sensor + season + day, data = x, mean)
h <- function(x) aggregate(. ~ season + day, x, mean)
# timings
bench::mark(
  list_base = lapply(df_list, f),
  long_base = g(df_long),
  wide_base = h(df_wide),
  check = FALSE
)
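Since the benchmark runs with check = FALSE, the three results are never compared. Below is a sketch of such a comparison on a scaled-down data set. The sizes and the small_* and res_* names are my assumptions, and unlike the test above all three layouts are built from the same random values so that the means can be compared. Timings at this size would say nothing; it only checks that the three aggregations agree.

```r
set.seed(42)
# scaled-down sizes, chosen only to make the check quick
n_per_day <- 4; n_days <- 5; n_seasons <- 3; n_sensors <- 2
sensor_names <- sprintf("sensor_%02d", 1:n_sensors)
base <- expand.grid(day = rep(1:n_days, each = n_per_day),
                    season = 1:n_seasons, KEEP.OUT.ATTRS = FALSE)

# wide layout: one column per sensor
small_wide <- base
for (s in sensor_names) small_wide[[s]] <- rnorm(nrow(base))

# the same values as a list of per-sensor data frames and as one long table
small_list <- lapply(sensor_names, function(s)
  data.frame(sensor = s, base, data = small_wide[[s]]))
names(small_list) <- sensor_names
small_long <- do.call(rbind, small_list)

res_wide <- aggregate(. ~ season + day, small_wide, mean)
res_long <- aggregate(data ~ sensor + season + day, small_long, mean)
res_list <- lapply(small_list,
                   function(x) aggregate(data ~ season + day, x, mean))

# compare sensor by sensor; the long result is reordered by day, then season
for (s in sensor_names) {
  from_long <- res_long[res_long$sensor == s, ]
  from_long <- from_long[order(from_long$day, from_long$season), "data"]
  stopifnot(
    isTRUE(all.equal(res_wide[[s]], res_list[[s]]$data)),
    isTRUE(all.equal(res_wide[[s]], from_long))
  )
}
```

If the loop finishes without an error, the three layouts give the same means for every sensor.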
Hope this helps,
Rui Barradas
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.