Estimating time to run an analysis?
3 messages · Erika Barthelmess, Ben Bolker, Bert Gunter
Erika Barthelmess <barthelmess <at> stlawu.edu> writes:
Hi everyone, I'm new to this list and searched R-help for an answer to this question before posting, without luck. If I'm posting in error, please forgive me. I'm thinking about using the MuMIn package to do multimodel inference with logistic regression. I have many (25) possible predictors and am curious whether there is a way to estimate how long the dredge command might take to run. Any suggestions most welcome. Thanks, erika
This is likely to be a bad idea. With 25 predictors you have 2^25 ≈ 33.5 million candidate models (you can think of an array of models: each predictor is either present or absent in each model, which makes the set of models the set of 25-digit binary strings ...). (If this doesn't make sense, convince yourself by writing out the number of possible models for a 1-parameter (2), 2-parameter (4), and 3-parameter (8) model, and extrapolate.) So pick a model of intermediate complexity, run it, see how long it takes, and multiply that by 33.5 million ... (if each model takes about one second to fit, the analysis will take about a year to run). You might want to look into penalized regression approaches (e.g. see the glmnet package), which are a much more efficient approach to this type of problem.
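The counting argument and the "time one fit, then multiply" estimate above can be sketched in a few lines. This is a minimal Python sketch, not MuMIn's dredge itself; it assumes the search fits every main-effects subset (no interactions, no subset constraints) and that each fit takes roughly the same time. The function names and the `fit_once` callable are hypothetical stand-ins for whatever single model fit you time.

```python
import time
from itertools import product


def candidate_models(predictors):
    """Enumerate all-subsets models: each predictor is either out (0) or in (1)."""
    for bits in product((0, 1), repeat=len(predictors)):
        yield [p for p, b in zip(predictors, bits) if b]


def n_models(k):
    # k independent in/out choices -> 2**k models (including the empty model)
    return 2 ** k


def estimated_runtime_days(k, seconds_per_fit):
    """Back-of-envelope estimate: (number of models) x (time to fit one model)."""
    return n_models(k) * seconds_per_fit / 86400  # 86400 seconds per day


def extrapolate_runtime_seconds(fit_once, k):
    """Time one representative fit (hypothetical callable), then scale by 2**k."""
    start = time.perf_counter()
    fit_once()
    elapsed = time.perf_counter() - start
    return elapsed * n_models(k)


# Small cases: 1, 2 and 3 predictors give 2, 4 and 8 candidate models
for k in (1, 2, 3):
    names = [f"x{i}" for i in range(1, k + 1)]
    print(k, n_models(k), len(list(candidate_models(names))))

print(n_models(25))                               # 33554432
print(round(estimated_runtime_days(25, 1.0), 1))  # 388.4 (days at 1 s per fit)
```

At one second per fit the estimate comes out at roughly 388 days, consistent with "about a year"; the estimate scales linearly with whatever per-fit time you measure.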
I would say that if the OP even contemplated this, it strongly suggests that she needs to consult a local statistician for help. Cheers, Bert
On Wed, Nov 27, 2013 at 1:14 PM, Ben Bolker <bbolker at gmail.com> wrote:
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374