Recommendation on B for validate.lrm()? - R-help

Sun, May 1, 2011 12:36 PM

For this case B=200 should work well if using the bootstrap. For cross-validation
you can use 10-fold cross-validation (B=10) and repeat the process 100 times for
adequate precision, averaging over the 100 repetitions as done in
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/logistic.val.pdf (note
that this used the Design package, and there may be subtle changes with the
rms package).
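To make "repeat 10-fold cross-validation 100 times and average" concrete, here is a minimal sketch in plain Python. It is not how rms implements validation. The names `kfold_indices`, `fit_intercept_only`, `brier` and `repeated_kfold_estimate` are hypothetical, an intercept-only model stands in for an `lrm` fit, and the Brier score stands in for whichever accuracy index you are validating. Each repetition draws a fresh random partition, so the per-repetition estimates differ; averaging them reduces that noise.

```python
import random
import statistics

def kfold_indices(n, k, rng):
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_intercept_only(y_train):
    # Hypothetical stand-in for a fitted model: predict the training event rate.
    return sum(y_train) / len(y_train)

def brier(p, y_test):
    # Mean squared difference between predicted probability and 0/1 outcome.
    return sum((p - yi) ** 2 for yi in y_test) / len(y_test)

def repeated_kfold_estimate(y, k=10, repeats=100, seed=1):
    """Mean and SD of the k-fold CV Brier score over `repeats` random partitions."""
    rng = random.Random(seed)
    per_repeat = []
    for _ in range(repeats):
        fold_scores = []
        for fold in kfold_indices(len(y), k, rng):  # new partition each repeat
            held_out = set(fold)
            y_train = [y[i] for i in range(len(y)) if i not in held_out]
            y_test = [y[i] for i in fold]
            fold_scores.append(brier(fit_intercept_only(y_train), y_test))
        per_repeat.append(statistics.mean(fold_scores))
    return statistics.mean(per_repeat), statistics.stdev(per_repeat)

# Usage with simulated outcomes (about 30% events, n = 200).
data_rng = random.Random(0)
y = [1 if data_rng.random() < 0.3 else 0 for _ in range(200)]
mean_brier, sd_brier = repeated_kfold_estimate(y, k=10, repeats=100)
print(f"CV Brier {mean_brier:.4f} (SD across repeats {sd_brier:.4f})")
```

The SD across repeats shows how much a single 10-fold run can vary; the reported mean is the averaged estimate.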

Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/recommendation-on-B-for-validate-lrm-tp3486200p3488384.html
Sent from the R help mailing list archive at Nabble.com.

viostorm

Sun, May 8, 2011 5:03 PM

Thanks so much for the reply; it was exceptionally helpful! A couple of
questions:

1. I was under the impression that k-fold with B=10 would train on 9/10,
validate on the remaining 1/10, and repeat this 10 times, once for each
different tenth. Is this how the procedure works in R?

2. Is the reason you recommend repeating k-fold 100 times that the
partitioning is random, i.e., not the 1st tenth, 2nd tenth, et cetera, so you
might obtain slightly different results each time?
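The procedure described in question 1 can be sketched as follows. This is an illustrative Python sketch, not the rms code; `kfold_splits` is a hypothetical helper. The indices are shuffled before being cut into tenths, so each call with a different seed gives a different partition, which is the source of the run-to-run randomness raised in question 2. Within one partition, every observation is held out exactly once, and each fit uses the other 9/10.

```python
import random

def kfold_splits(n, k=10, seed=None):
    """Yield (train, test) index lists for one random k-fold partition."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)  # random assignment, not "first tenth, second tenth, ..."
    folds = [idx[i::k] for i in range(k)]
    for j in range(k):
        test = sorted(folds[j])
        # Train on every fold except the j-th one.
        train = sorted(i for m, fold in enumerate(folds) if m != j for i in fold)
        yield train, test

splits = list(kfold_splits(100, k=10, seed=42))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # prints: 10 90 10
```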

Frank E Harrell Jr

Sun, May 8, 2011 5:40 PM

Yes, that's how it works, but a single run does not provide sufficient
precision unless your sample size is enormous. When you partition into
tenths again, the partitions will be different, so yes, there is some
randomness. Averaging over 100 repetitions averages out the randomness. Or just
use the bootstrap with B=300 (depending on sample size).
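For the bootstrap alternative, here is a minimal sketch of Efron's optimism bootstrap, which we understand to be what `validate` does with its default bootstrap method. This is an assumption-laden illustration, not the rms implementation: the helper names are hypothetical, a model with one binary predictor (whose fitted probabilities are the group event rates, matching the logistic MLE for that case) stands in for `lrm`, and the Brier score stands in for the indexes `validate` reports. For each of the B resamples, the model is refit on the bootstrap sample; the optimism is how much worse it does on the original data than on the sample it was fit to. The average optimism is added to the apparent (in-sample) Brier score.

```python
import random
import statistics

def fit_group_rates(x, y):
    """Fitted probabilities for y ~ x with binary x (the group event rates)."""
    overall = sum(y) / len(y)
    rates = {}
    for g in (0, 1):
        ys = [yi for xi, yi in zip(x, y) if xi == g]
        rates[g] = sum(ys) / len(ys) if ys else overall  # empty group: use overall rate
    return rates

def brier(rates, x, y):
    # Mean squared difference between fitted probability and 0/1 outcome.
    return statistics.mean((rates[xi] - yi) ** 2 for xi, yi in zip(x, y))

def optimism_bootstrap(x, y, B=300, seed=1):
    """Return (apparent Brier, mean optimism, optimism-corrected Brier)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = brier(fit_group_rates(x, y), x, y)
    optimism = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        rates_b = fit_group_rates(xb, yb)
        # Loss on the original data minus loss on the bootstrap sample it was fit to.
        optimism.append(brier(rates_b, x, y) - brier(rates_b, xb, yb))
    mean_opt = statistics.mean(optimism)
    return apparent, mean_opt, apparent + mean_opt

# Usage with simulated data: event probability 0.2 when x = 0, 0.5 when x = 1.
data_rng = random.Random(0)
x = [data_rng.randrange(2) for _ in range(200)]
y = [1 if data_rng.random() < (0.2 + 0.3 * xi) else 0 for xi in x]
apparent, opt, corrected = optimism_bootstrap(x, y, B=300)
print(f"apparent {apparent:.4f}, optimism {opt:.4f}, corrected {corrected:.4f}")
```

Larger B reduces the Monte Carlo noise in the optimism estimate, which is why the useful B depends on sample size and on how precise you need the corrected index to be.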

Frank
