
Message-ID: <c1db4555-8f34-47f7-92d8-17cf24aa15ba@gmail.com>
Date: 2025-08-14T12:01:41Z
From: Duncan Murdoch
Subject: About size of data frames
In-Reply-To: <86f79e4852da4aa68c5d57d3dc4e47c1@regione.marche.it>

On 2025-08-14 7:27 a.m., Stefano Sofia via R-help wrote:
> Dear R-list users,
> 
> let me ask you a very general question about performance of big data frames.
> 
> I deal with semi-hourly meteorological data from about 70 sensors over 28 winter seasons.
> 
> 
> That means for each sensor I have 48 observations per day and 181 days per winter season (182 in a leap year): 48 * 181 * 28 = 243,264
> 
> 243,264 * 70 = 17,028,480
> 
> 
> From the computational point of view, is it better to deal with a single data frame of approximately 17 M rows and 3 columns (one for the date, one for the sensor code and one for the value), with a single data frame of approximately 243,000 rows and 141 columns, or with 70 different data frames of approximately 243,000 rows and 3 columns each? Or does it make no difference?
> 
> I personally would prefer the first choice, because it would be easier for me to deal with a single data frame with only a few columns.
> 

It really depends on what computations you're doing.  As a general rule, 
column operations are faster than row operations.  (Also as a general 
rule, arrays are faster than data frames, but they are much more limited 
in what they can hold:  all entries must be the same type, which 
probably won't work for your data.)
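
The column-versus-row point can be sketched like this (the sizes here are made up for illustration, not taken from the thread):

```r
# A wide toy data frame: 10,000 rows, 70 numeric "sensor" columns.
set.seed(1)
wide <- as.data.frame(matrix(rnorm(10000 * 70), ncol = 70))

# Column operation: one vectorised pass per column -- fast.
col_means <- colMeans(wide)

# Row operation: apply() coerces the data frame to a matrix and
# loops over rows -- typically much slower on large data.
row_means <- apply(wide, 1, mean)

# system.time() can be wrapped around each line to compare them
# at your real sizes.
```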

So I'd guess your 3-column solution would likely be best.
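
For what it's worth, a minimal sketch of that 3-column "long" layout (the column names `date`, `sensor`, `value` and the sensor codes are placeholders, not from the original data):

```r
# Long layout: one row per (timestamp, sensor) observation.
set.seed(1)
times <- seq(as.POSIXct("2020-11-01 00:00", tz = "UTC"),
             by = "30 min", length.out = 48)   # one day, semi-hourly
long <- data.frame(
  date   = rep(times, times = 3),
  sensor = rep(c("S01", "S02", "S03"), each = 48),
  value  = rnorm(3 * 48)
)

# A per-sensor summary is still a column operation on 'value',
# split by the 'sensor' column:
aggregate(value ~ sensor, data = long, FUN = mean)
```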

Duncan Murdoch