
how to parallelize 'apply' across multiple cores on a Mac

2 messages · David Romano, Charles C. Berry

David Romano <dromano <at> stanford.edu> writes:
[description of simple calc's deleted]

David,

If you insist on explicitly parallelizing this:

The functions in the recommended package 'parallel' work on a Mac. 

I would not try to work on each tiny column as a separate function call - 
too much overhead if you parallelize. Instead, bundle up 100-1000 columns
to operate on per call.
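A rough sketch of that chunking idea with 'parallel' (the chunk size of
500 and the colMeans() stand-in are placeholders for your actual calc):

```r
library(parallel)

X <- matrix(rnorm(1e4 * 100), nrow = 100)   # toy data: 10000 columns

## split the column indices into blocks of ~500 so each worker gets a
## substantial chunk rather than one tiny column per call
chunks <- split(seq_len(ncol(X)), ceiling(seq_len(ncol(X)) / 500))

## each mclapply call processes a whole block of columns at once
res <- mclapply(chunks,
                function(j) colMeans(X[, j, drop = FALSE]),
                mc.cores = 2L)
ans <- unlist(res, use.names = FALSE)
```

The block-per-call structure keeps the per-task overhead small relative
to the work each task does.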

The calc's you describe sound simple enough that I would just write
them in C and use the .Call interface to invoke them. You only need enough
working memory in C to operate on one column, plus space to save the result. 

So a MacBook with 8GB of memory will handle it with room to breathe.

This is a good use case for the 'inline' package, especially if you are
unfamiliar with the use of .Call.
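For instance, a hypothetical column-means routine via inline::cfunction
(which compiles the C body and wires up .Call for you - the function
name and the calc itself are placeholders, not your steps 2-6):

```r
library(inline)

## C body: loop over columns of a numeric matrix, accumulating each
## column's sum and storing its mean in the result vector
col_means_c <- cfunction(
  signature(x = "matrix"),
  body = "
    SEXP dim = getAttrib(x, R_DimSymbol);
    int nr = INTEGER(dim)[0], nc = INTEGER(dim)[1];
    SEXP ans = PROTECT(allocVector(REALSXP, nc));
    double *px = REAL(x), *pa = REAL(ans);
    for (int j = 0; j < nc; j++) {
      double s = 0;
      for (int i = 0; i < nr; i++) s += px[i + j * nr];
      pa[j] = s / nr;
    }
    UNPROTECT(1);
    return ans;
  ")
```

The working memory on the C side is just one result slot per column,
as described above.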


===

But it might be as fast to forget about parallelizing this (explicitly).

If !any(is.na(column.values)), then what you are doing can be achieved by

  desired.means[ , column.subset] <- 
       crossprod( suitable.matrix, matrix.values )

or better still

  desired.means[, column.subset] <- 
      crossprod(minimal.matrix, matrix.values)[fill.rows,]

where suitable.matrix implements your steps 2-6. 

minimal.matrix is unique(suitable.matrix, MARGIN = 2)

fill.rows is such that minimal.matrix[, fill.rows] == suitable.matrix 

matrix.values is a subset of columns from your original matrix

and column.subset is where the result should be placed in desired means.
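A toy illustration of the deduplication step, under the assumption that
each column of suitable.matrix holds averaging weights (the weights and
the key-matching trick for fill.rows are mine, not from the thread):

```r
set.seed(1)
vals <- matrix(rnorm(6 * 4), nrow = 6)            # stand-in for matrix.values

## column j of suitable.matrix holds the weights whose dot product with
## a data column gives desired mean j; column 'c' duplicates column 'a'
suitable.matrix <- cbind(a = rep(c(1/3, 0), each = 3),
                         b = rep(c(0, 1/3), each = 3),
                         c = rep(c(1/3, 0), each = 3))

full <- crossprod(suitable.matrix, vals)          # the direct version

## drop duplicate columns, then map each original column to its
## unique representative so the smaller crossprod can be expanded back
minimal.matrix <- unique(suitable.matrix, MARGIN = 2)
key  <- apply(suitable.matrix, 2, paste, collapse = "\r")
ukey <- apply(minimal.matrix,  2, paste, collapse = "\r")
fill.rows <- match(key, ukey)

small <- crossprod(minimal.matrix, vals)[fill.rows, ]
```

Here 'small' reproduces 'full' while multiplying only the unique
weight columns.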

On a Mac, the vecLib BLAS will do crossprod using multiple 
cores without your needing to do anything special. So you can forget about 
'parallel', 'multicore', etc.


So your remaining problem is to reread steps 2-6 and figure out what
'minimal.matrix' and 'fill.rows' have to be.

===

You can also approach this problem using 'filter', but that can get 
'convoluted' (pun intended - see ?filter).
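To show what I mean - filter() runs a (convolution) filter down each
column of a matrix, so a centered running mean is one line (the 3-point
window here is just an example):

```r
x <- matrix(as.numeric(1:20), nrow = 10)

## centered 3-point moving average of each column; the first and last
## rows come back NA because the window hangs off the ends
ma3 <- stats::filter(x, rep(1/3, 3), sides = 2)
```

Whether this beats the crossprod formulation depends on how your
steps 2-6 map onto a fixed filter window.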

HTH,