Vectorizing a for-loop for cross-validation in R

Charles C. Berry · 2019-01-23T18:34:14Z

See inline.

> On Jan 23, 2019, at 2:17 AM, Aleksandre Gavashelishvili wrote:
>
> I'm trying to speed up a script that otherwise takes days to handle larger
> data sets. So, is there a way to completely vectorize or parallelize the
> following script:
>
> # k-fold cross validation
>
> df df k <- 10 # Number of folds. Note k=nrow(df

Rprof()
replicate(100, {
    ## the code to be profiled (your script) goes here
})
Rprof(NULL)
summaryRprof()

## read ?Rprof to get a sense of what it does
## read the summary to determine where time is being spent.
## the result was surprising to me. YMMV.

## there may be redundancies that you can eliminate by
##  - doing the setup within gam() one time and saving it
##  - calling the worker functions by modifying the setup
##    in a loop or function and saving the results
This is something you should learn to do. It is pretty standard practice. Use the body of your for loop as the body of a function, add arguments, and create a suitable return value. Then something like

	lapply( 1:k, your.loop.body.function, other.arg1, other.arg2, ...)

should work.  If it does, then parallel::mclapply(...) should also work.
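As a minimal sketch of that pattern, assuming lm() and the built-in mtcars data as stand-ins for the original model and data (cv_fold, fold_id and n_cores are hypothetical names, not taken from the original script), it might look something like this:

```r
## Sketch only: lm() and mtcars are assumed stand-ins for the real model/data.
set.seed(1)
df <- mtcars
k  <- 10  # number of folds

## randomly assign every row to one of the k folds
fold_id <- sample(rep(seq_len(k), length.out = nrow(df)))

## hypothetical loop-body function: fit on all folds except i, predict fold i
cv_fold <- function(i, data, folds, formula) {
  train <- data[folds != i, , drop = FALSE]
  test  <- data[folds == i, , drop = FALSE]
  fit   <- lm(formula, data = train)
  data.frame(fold = i,
             row  = rownames(test),
             obs  = test[[all.vars(formula)[1]]],  # response column
             pred = unname(predict(fit, newdata = test)),
             row.names = NULL)
}

## serial version
res_serial <- lapply(seq_len(k), cv_fold,
                     data = df, folds = fold_id, formula = mpg ~ wt + hp)

## same call with forked workers; on Windows mclapply() cannot fork,
## so mc.cores must be 1 there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_par <- parallel::mclapply(seq_len(k), cv_fold,
                              data = df, folds = fold_id,
                              formula = mpg ~ wt + hp, mc.cores = n_cores)

## combine the per-fold results and compute the cross-validated RMSE
cv   <- do.call(rbind, res_serial)
rmse <- sqrt(mean((cv$obs - cv$pred)^2))
```

Because each fold is handled by an independent function call, the serial and parallel results should agree. Whether the parallel version is actually faster depends on how expensive a single fit is relative to the cost of forking.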

HTH,

Chuck
