48K csv files, 1000 lines each. How to redesign? (big picture)

Well, having lived to tell the tale, I would like to mention one option
that never seems as obvious as it should. With simulation exercises you
can save the seed and regenerate only the portions of the data you need
for a specific analysis. This can be especially fast if the analysis can
be done without saving the data to file at all. It is a trade-off between
compute speed and storage access/query speed, and it depends, of course,
on the complexity of the model computation. The trade-off does not always
seem to work the way one is inclined to think it should. (BTW, it is
important to be aware of the details needed for regenerating simulations
on clusters.)
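A minimal sketch of the "save the seed, regenerate on demand" idea, assuming
each of the 48K files corresponds to one replication of 1000 draws. The names
MASTER_SEED, replication_rng, simulate and summarize are hypothetical, and the
Gaussian draws stand in for the real model. Deriving each replication's seed by
hashing (master seed, replication index) is one common choice, not something
the post prescribes. Because a replication's stream depends only on that pair
and not on which worker or in what order it ran, this also sidesteps one of
the cluster reproducibility pitfalls mentioned above.

```python
import hashlib
import random
from statistics import mean

# Hypothetical master seed: store this with the results instead of the data.
MASTER_SEED = 20170301


def replication_rng(master_seed: int, rep: int) -> random.Random:
    """Return a generator whose stream depends only on (master_seed, rep)."""
    digest = hashlib.sha256(f"{master_seed}:{rep}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def simulate(rep: int, n: int = 1000) -> list:
    """Stand-in model: regenerate the n values of one replication ('file')."""
    rng = replication_rng(MASTER_SEED, rep)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def summarize(reps) -> dict:
    """Regenerate only the requested replications and summarise in memory,
    without writing anything to disk."""
    return {rep: mean(simulate(rep)) for rep in reps}


if __name__ == "__main__":
    # Analyse 5 of the (hypothetical) 48000 replications without storing any.
    for rep, m in summarize(range(100, 105)).items():
        print(rep, round(m, 4))
```

Whether this beats reading stored files depends on how expensive `simulate`
is for the real model, which is exactly the trade-off described above.
Reproducibility holds for the same seed within the same Python version.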

Paul Gilbert
On 03/01/2017 05:50 PM, Paul Johnson wrote: