About size of data frames
On 2025-08-14 7:27 a.m., Stefano Sofia via R-help wrote:
Dear R-list users,

let me ask you a very general question about the performance of big data frames. I deal with half-hourly meteorological data from about 70 sensors over 28 winter seasons. For each sensor that means 48 observations per day and 181 days per winter season (182 in a leap year):

48 * 181 * 28 = 243,264 rows per sensor
243,264 * 70 = 17,028,480 rows in total

From a computational point of view, is it better to deal with

1) a single data frame of approximately 17 M rows and 3 columns (one for the date, one for the sensor code and one for the value),
2) a single data frame of approximately 243,000 rows and one column per sensor (plus a date column), or
3) 70 separate data frames of approximately 243,000 rows and 3 columns each?

Or does it make no difference? I would personally prefer the first option, because a single data frame with few columns would be easier for me to work with.
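[For concreteness, the "long" 3-column layout of option 1 could be built in base R roughly as follows. The sensor codes, start date and random values are invented purely for illustration, and the single continuous time sequence ignores the summer gaps between seasons; at full size this is about 17 M rows, so shrink n_per_sensor to experiment.]

n_per_sensor <- 48 * 181 * 28              # half-hourly obs per sensor
sensors <- sprintf("S%02d", 1:70)          # hypothetical sensor codes

long <- data.frame(
  datetime = rep(seq(as.POSIXct("1997-12-01 00:00", tz = "UTC"),
                     by = "30 min", length.out = n_per_sensor),
                 times = length(sensors)),
  sensor   = rep(sensors, each = n_per_sensor),
  value    = rnorm(n_per_sensor * length(sensors)))

## Per-sensor summaries stay simple in this layout, e.g.
##   tapply(long$value, long$sensor, mean)
## and the wide layout of option 2 is the same data reshaped:
##   reshape(long, idvar = "datetime", timevar = "sensor",
##           direction = "wide")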
It really depends on what computations you're doing. As a general rule, column operations are faster than row operations. (Also as a general rule, arrays are faster than data frames, but they are much more limited in what they can hold: all entries must be of the same type, which probably won't work for your data.) So I'd guess your 3-column solution would likely be best.

Duncan Murdoch
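[To make the column-vs-row point concrete, here is a small timing sketch; the sizes are arbitrary and the timings will vary by machine, so treat it as an illustration rather than a benchmark from the thread.]

d <- data.frame(matrix(rnorm(1e6), ncol = 50))   # 20,000 x 50, all numeric
m <- as.matrix(d)                                # same data as a matrix

system.time(colMeans(d))               # column operation: vectorised, fast
system.time(apply(d, 1, mean))         # row operation: much slower
system.time(for (i in 1:2000) d[i, ])  # extracting data-frame rows is costly
system.time(for (i in 1:2000) m[i, ])  # matrix row access is far cheaper

The last two lines also illustrate why a numeric matrix can beat a data frame when all columns share one type.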