speeding up 1000s of coxph regression? - R-help

Tue, Jun 10, 2003 3:21 PM #

I have a gene expression matrix n (genes) X p (cases), where n = 8000 and p
= 100. I want to fit each gene as univariate in a coxph model, i.e., fitting
8000 models. I do something like this:

res <- apply(data, 1, coxph.func)

which takes about 4 min, not bad. But I need to do a large number of
permutations of the data (permuting the columns), for example, 2000, which
would take more than 5 days. Is there a way to speed this up?

Any help appreciated.

Xiao-Jun

Thomas Lumley

Tue, Jun 10, 2003 4:25 PM #

On Tue, 10 Jun 2003, Xiao-Jun Ma wrote:

Calling coxph.fit directly would likely be faster.
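
A minimal sketch of what that could look like, assuming the survival
package is installed and that its coxph.fit takes the arguments named
below (check args(coxph.fit) for your version). The simulated data and
the helper name fit_gene are made up for illustration; the original
coxph.func was not posted.

```r
library(survival)

# Simulated stand-in for the poster's data: rows are genes, columns are cases.
set.seed(42)
n_genes <- 20
n_cases <- 100
expr   <- matrix(rnorm(n_genes * n_cases), nrow = n_genes)
time   <- rexp(n_cases)
status <- rbinom(n_cases, 1, 0.7)
y      <- Surv(time, status)

# Hypothetical helper: fit one univariate Cox model through coxph.fit,
# which skips the formula and model-frame handling that coxph() repeats
# on every call. The covariate must be passed as a one-column matrix.
fit_gene <- function(g, y) {
  fit <- coxph.fit(x = matrix(g, ncol = 1), y = y,
                   strata = NULL, offset = NULL, init = NULL,
                   control = coxph.control(), weights = NULL,
                   method = "efron", rownames = NULL)
  c(coef = unname(fit$coefficients), se = sqrt(fit$var[1, 1]))
}

# One row of results per gene: coefficient and its standard error.
res <- t(apply(expr, 1, fit_gene, y = y))
```

Since the response y is the same for every gene, it is built once
outside the loop instead of once per model.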

Also, you probably don't need to do 2000 permutations on all 8000 genes: a
few hundred permutations is probably enough to decide that most of the
genes aren't interesting.
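
One way to read this is as a two-stage scheme: a short first pass over
all genes, then more permutations only for the genes that still look
promising. The sketch below is an illustration under that assumption,
not a prescribed procedure. The data are simulated, the helper names,
the 0.2 cutoff and the permutation counts are arbitrary choices, and
the counts are kept tiny so the example runs quickly.

```r
library(survival)

set.seed(1)
n_genes <- 30
n_cases <- 60
expr   <- matrix(rnorm(n_genes * n_cases), nrow = n_genes)
time   <- rexp(n_cases)
status <- rbinom(n_cases, 1, 0.7)

# Hypothetical helper: absolute Wald statistic for one gene.
abs_z <- function(g, y) {
  fit <- coxph.fit(x = matrix(g, ncol = 1), y = y,
                   strata = NULL, offset = NULL, init = NULL,
                   control = coxph.control(), weights = NULL,
                   method = "efron", rownames = NULL)
  abs(fit$coefficients) / sqrt(fit$var[1, 1])
}

# Observed statistics on the unpermuted data.
obs <- apply(expr, 1, abs_z, y = Surv(time, status))

# For the genes in `rows`, count how often a permuted statistic reaches
# the observed one. Shuffling the (time, status) pairs across cases is
# equivalent to permuting the columns of the expression matrix.
count_exceed <- function(rows, B) {
  hits <- numeric(length(rows))
  for (b in seq_len(B)) {
    idx    <- sample(n_cases)
    y_perm <- Surv(time[idx], status[idx])
    z_perm <- apply(expr[rows, , drop = FALSE], 1, abs_z, y = y_perm)
    hits   <- hits + (z_perm >= obs[rows])
  }
  hits
}

B1 <- 20   # first pass on every gene
B2 <- 80   # extra permutations for the survivors only

hits   <- count_exceed(seq_len(n_genes), B1)
n_perm <- rep(B1, n_genes)
p1     <- (hits + 1) / (B1 + 1)

# Genes that already look uninteresting are not permuted any further.
keep <- which(p1 <= 0.2)
if (length(keep) > 0) {
  hits[keep]   <- hits[keep] + count_exceed(keep, B2)
  n_perm[keep] <- B1 + B2
}
p_perm <- (hits + 1) / (n_perm + 1)
```

If most genes drop out after the first pass, the second stage touches
only a small fraction of the rows, which is where the saving comes from.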

If you are going to be doing a lot of this sort of thing it might be worth
looking at the parallel processing facilities in the `snow' package.
There's a description of their use in another gene expression problem in
the new R Newsletter.
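
A rough sketch of how the per-gene fits might be spread over workers
with snow, assuming snow and survival are installed and that a local
two-worker socket cluster can be started. The data and the helper
fit_gene are made up for illustration.

```r
library(snow)
library(survival)

set.seed(7)
n_genes <- 40
n_cases <- 80
expr <- matrix(rnorm(n_genes * n_cases), nrow = n_genes)
y    <- Surv(rexp(n_cases), rbinom(n_cases, 1, 0.7))

# Hypothetical helper: Cox coefficient for one gene via coxph.fit.
fit_gene <- function(g, y) {
  fit <- coxph.fit(x = matrix(g, ncol = 1), y = y,
                   strata = NULL, offset = NULL, init = NULL,
                   control = coxph.control(), weights = NULL,
                   method = "efron", rownames = NULL)
  unname(fit$coefficients)
}

# Start two local workers and load survival on each of them.
cl <- makeCluster(2, type = "SOCK")
invisible(clusterEvalQ(cl, library(survival)))

# Parallel counterpart of apply(expr, 1, fit_gene, y = y).
coef_par <- parApply(cl, expr, 1, fit_gene, y = y)

stopCluster(cl)

# Serial result for comparison.
coef_ser <- apply(expr, 1, fit_gene, y = y)
```

For the permutation runs, each worker would also need its own random
number stream, which this sketch does not set up.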


	-thomas

A.J. Rossini

Tue, Jun 10, 2003 8:01 PM #

Thomas Lumley <tlumley at u.washington.edu> writes:

Actually, they used the RPVM package directly; however, Thomas is
still correct, it probably would be simple to recast using SNOW.

Some hints and details can be found in a tech report by Luke Tierney,
Michael Li, and myself in the UW Biostat tech report series (can't
recall which #, but it's on http://www.bepress.com/uwbiostat/).

best,
-tony

A.J. Rossini  /  rossini at u.washington.edu  /  rossini at scharp.org
Biomedical/Health Informatics and Biostatistics, University of Washington.
Biostatistics, HVTN/SCHARP, Fred Hutchinson Cancer Research Center.
FHCRC: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email 
