
ff: "aggregate" function for ff matrix?

Dear Clem,
Reading complete columns at once gives the fastest throughput in ff. If aggregate itself is not the bottleneck, you will probably need faster disks, or more disks in a RAID0 array, to speed this up.
Note that an ffdf can have its columns spread over multiple disks, but so far [.ffdf does not read them in parallel. However, you can extract columns in parallel using snowfall.
There are snowfall examples on http://ff.r-forge.r-project.org/.
Check the useR! 2009 and 2010 presentations.
Keep in mind that this will speed up your calculation if the CPU is the bottleneck. If I/O is the bottleneck, parallel execution only helps if the parallel workers actually read from different disks.
Also keep in mind that more processes in parallel need more RAM.
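ff and snowfall are R packages, so what follows is not their API. It is a minimal, language-neutral sketch in Python (standard library only) of the column-wise parallel extraction described above. It assumes each column is stored in its own flat binary file of doubles, much as an ffdf keeps each column in a separate file. Each column is read completely in one sequential pass and aggregated in its own worker thread. The helper names (write_column, column_sum, column_sums_parallel) and the file layout are hypothetical.

```python
import array
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_column(directory, name, values):
    """Store one column as a flat binary file of doubles (hypothetical layout)."""
    path = os.path.join(directory, name + ".bin")
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)
    return path


def column_sum(path):
    """Read a complete column in one sequential pass, then aggregate it in RAM."""
    col = array.array("d")
    with open(path, "rb") as f:
        col.frombytes(f.read())
    return sum(col)


def column_sums_parallel(paths, workers=4):
    """Read and aggregate each column in its own worker thread.

    The reads only overlap usefully if the column files sit on different
    disks, and each worker holds one whole column in memory while it works.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(column_sum, paths)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = [write_column(d, "col%d" % i, [float(i)] * 1000) for i in range(4)]
        sums = column_sums_parallel(paths)
        print([sums[p] for p in paths])  # [0.0, 1000.0, 2000.0, 3000.0]
```

Threads are used here because the work is dominated by file reads. If the per-column aggregation is what limits you, swapping ThreadPoolExecutor for ProcessPoolExecutor gives separate processes, closer to what snowfall does, at the cost of more RAM per worker.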

Kind regards
Jens