I have a gene expression matrix of n genes (rows) by p cases (columns), where n = 8000 and p = 100. I want to fit each gene as a univariate coxph model, i.e., fit 8000 models. I do something like this: res <- apply(data, 1, coxph.func), which takes about 4 minutes, not bad. But I need to run a large number of permutations of the data (permuting the columns), for example 2000, which would take 5 days. I would like to know if there is a way to speed this up. Any help appreciated. Xiao-Jun
speeding up 1000s of coxph regression?
3 messages · Xiao-Jun Ma, Thomas Lumley, A.J. Rossini
On Tue, 10 Jun 2003, Xiao-Jun Ma wrote:
I have a gene expression matrix of n genes (rows) by p cases (columns), where n = 8000 and p = 100. I want to fit each gene as a univariate coxph model, i.e., fit 8000 models. I do something like this: res <- apply(data, 1, coxph.func), which takes about 4 minutes, not bad. But I need to run a large number of permutations of the data (permuting the columns), for example 2000, which would take 5 days. I would like to know if there is a way to speed this up.
Calling coxph.fit directly would likely be faster. Also, you probably don't need to do 2000 permutations on all 8000 genes: a few hundred permutations is probably enough to decide that most of the genes aren't interesting. If you are going to be doing a lot of this sort of thing it might be worth looking at the parallel processing facilities in the `snow' package. There's a description of their use in another gene expression problem in the new R Newsletter. -thomas
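A minimal sketch of the first two suggestions, for illustration only. It assumes the survival package is installed and that coxph.fit accepts the arguments shown (the argument list has changed between survival versions, so check ?coxph.fit locally). The simulated data, the helper name wald_z, the 200-permutation first pass, and the 0.1 screening cutoff are all assumptions made for the example, not anything from the thread.

```r
library(survival)

set.seed(1)
n_genes <- 50                      # small stand-in for the 8000 genes
n_cases <- 100
expr <- matrix(rnorm(n_genes * n_cases), nrow = n_genes)   # simulated expression
y    <- Surv(rexp(n_cases), rbinom(n_cases, 1, 0.7))       # simulated survival

# Hypothetical helper: fit one univariate Cox model through the low-level
# fitter, skipping the formula and model-frame work that coxph() does.
wald_z <- function(x, y) {
  fit <- coxph.fit(x = matrix(x, ncol = 1), y = y, strata = NULL,
                   offset = NULL, init = NULL, control = coxph.control(),
                   weights = NULL, method = "efron", rownames = NULL)
  unname(fit$coefficients / sqrt(fit$var[1, 1]))           # Wald z statistic
}

obs <- apply(expr, 1, wald_z, y = y)                       # observed statistics

# First pass: a few hundred permutations of the columns for every gene.
n_perm <- 200
exceed <- numeric(n_genes)
for (b in seq_len(n_perm)) {
  perm   <- sample(n_cases)
  z      <- apply(expr[, perm, drop = FALSE], 1, wald_z, y = y)
  exceed <- exceed + (abs(z) >= abs(obs))
}
p_perm <- (exceed + 1) / (n_perm + 1)                      # permutation p-values

# Only genes that still look interesting need the remaining permutations.
# The 0.1 cutoff is an arbitrary illustration.
keep <- which(p_perm < 0.1)
```

The later, more expensive rounds would then loop over expr[keep, , drop = FALSE] only, which is where most of the saving comes from if most genes drop out early.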
Thomas Lumley <tlumley at u.washington.edu> writes:
If you are going to be doing a lot of this sort of thing it might be worth looking at the parallel processing facilities in the `snow' package. There's a description of their use in another gene expression problem in the new R Newsletter.
Actually, they used the RPVM package directly; however, Thomas is still correct, it probably would be simple to recast using SNOW. Some hints and details can be found in a tech report by Luke Tierney, Michael Li, and myself in the UW Biostat tech report series (can't recall which #, but it's on http://www.bepress.com/uwbiostat/). best, -tony
A.J. Rossini / rossini at u.washington.edu / rossini at scharp.org
Biomedical/Health Informatics and Biostatistics, University of Washington.
Biostatistics, HVTN/SCHARP, Fred Hutchinson Cancer Research Center.
FHCRC: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email
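As a rough illustration of the snow suggestion discussed in this thread, the permutations can be spread over worker processes, since each permutation is independent of the others. This is a sketch under assumptions: it uses a two-worker socket cluster on the local machine, simulated data, and hypothetical helpers wald_z and one_perm. It is not the setup from the newsletter article or the tech report.

```r
library(survival)
library(snow)

set.seed(1)
expr <- matrix(rnorm(20 * 60), nrow = 20)          # 20 simulated genes, 60 cases
y    <- Surv(rexp(60), rbinom(60, 1, 0.7))         # simulated survival data

# Hypothetical helper: Wald z statistic from one univariate Cox fit.
wald_z <- function(x, y) {
  fit <- coxph.fit(x = matrix(x, ncol = 1), y = y, strata = NULL,
                   offset = NULL, init = NULL, control = coxph.control(),
                   weights = NULL, method = "efron", rownames = NULL)
  unname(fit$coefficients / sqrt(fit$var[1, 1]))
}

# Hypothetical helper: one permutation of the columns, all genes refitted.
# Everything it needs is passed as an argument so the workers receive it.
one_perm <- function(b, expr, y, wald_z) {
  perm <- sample(ncol(expr))
  apply(expr[, perm, drop = FALSE], 1, wald_z, y = y)
}

cl <- makeCluster(2, type = "SOCK")                # two local worker processes
clusterEvalQ(cl, library(survival))                # load survival on each worker
# For reproducible, independent random streams see ?clusterSetupRNG;
# it is left out here because it needs an extra RNG package.
perm_z <- parSapply(cl, 1:100, one_perm, expr = expr, y = y, wald_z = wald_z)
stopCluster(cl)

dim(perm_z)                                        # genes by permutations
```

With fits this cheap, communication overhead can dominate, so it may pay to have each worker task run a batch of permutations rather than a single one.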