
Resources for utilizing multiple processors

4 messages · Robin Jeffries, Brian Ripley, Mike Marchywka +1 more

On Wed, 8 Jun 2011, Robin Jeffries wrote:

By restricting yourself to one of the least capable OSes R runs on, you 
are making this harder for yourself.
By using virtual memory (R does not in fact use RAM; it always uses 
virtual memory).  With a 64-bit R you can use up to terabytes of VM. 
Because Windows' disc access is so slow, you will need to set a 
max-memory-size larger than your RAM size to enable this.
As stated, that is pointless.  The core running FUN2 would be waiting 
for the results of FUN1.  However, at time t FUN1 could generate 
a(t+1) from a(t-1) whilst FUN2 generates b(t) and c(t).
Look into package snow (with socket clusters).  The overhead of what 
you ask may be too high (POSIX OSes can use package multicore, which 
has a much lower overhead), but if the calculations are slow enough it 
may be worthwhile.  There are Windows-oriented examples in package 
RSiena.
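As a minimal sketch of the socket-cluster approach: package snow introduced this interface, and the same functions later shipped in base R's 'parallel' package, which is used below so the sketch runs without extra installs. The slow_square function is a made-up stand-in for an expensive FUN.

```r
## Socket cluster sketch -- two worker R processes over sockets.
## (snow::makeCluster(2, type = "SOCK") offers the same interface.)
library(parallel)
cl <- makeCluster(2)

## Hypothetical costly function standing in for FUN1/FUN2.
slow_square <- function(x) { Sys.sleep(0.1); x^2 }

## Distribute the inputs across the workers, then shut them down.
res <- parLapply(cl, 1:4, slow_square)
stopCluster(cl)
unlist(res)   # 1 4 9 16
```

Whether this pays off depends on the ratio of computation time to the serialization and socket overhead per call, which is exactly the caveat raised above.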

----------------------------------------
Well, for the situation below you seem to want a function
server. You could consider Rapache and just write this like a big
web application. A web server, like a DB, is not the first thing
you think of for high-performance computing, but if your computationally
intensive tasks are in native code this could be a reasonable
overhead that requires little learning. 

If you literally mean cores instead of machines, keep in mind
that cores can end up fighting over resources such as memory
(the link below cites an IEEE article showing cores making things
worse in a non-contrived case):

http://lists.boost.org/boost-users/2008/11/42263.php


I think people have mentioned packages like bigmemory (I forget
the exact names) that let you handle larger objects. Launching a bunch
of threads and letting VM thrash can easily make things slower.

I guess a better approach would be to get an implementation that is
block oriented, so you can do the memory/file staging in R, until
R gains a data frame that uses disk transparently and with hints on
expected access patterns (prefetch etc.).
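Until such a transparently disk-backed structure exists, block-oriented access can be hand-rolled in base R. The sketch below (file name and block size are arbitrary choices for illustration) sums a binary file of doubles one chunk at a time rather than loading it whole:

```r
## Write some test data to a binary file of doubles.
vals <- runif(1e5)
f <- tempfile()
writeBin(vals, f)

## Read it back in fixed-size blocks, accumulating a running sum,
## so memory use stays bounded regardless of file size.
con <- file(f, "rb")
total <- 0
block <- 1e4                               # 10,000 doubles per read
repeat {
  chunk <- readBin(con, "double", n = block)
  if (length(chunk) == 0) break            # end of file
  total <- total + sum(chunk)
}
close(con)

all.equal(total, sum(vals))                # TRUE
```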
[[elided Hotmail spam]]
On 06/08/2011 08:54 PM, Robin Jeffries wrote:
If FUN1 is independent of b() and c(), perhaps the example at the bottom 
of ?socketConnection points in a useful direction -- start one R to 
calculate a(t) and send the result to a socket connection, then move on 
to a(t+1). Start a second R to read from the socket connection and do 
FUN2(t), and so on. You'll be able to overlap the computations and 
roughly double throughput; the 'pipeline' could be extended with pre- and 
post-processing workers, too, though one would want to watch out for the 
complexity of managing this.
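A rough sketch of that two-process pipeline, in the spirit of the ?socketConnection example; the port number and the FUN1/FUN2 bodies are placeholders:

```r
## --- Producer: run in the first R session ------------------------------
## Computes a(t) and serializes it over a socket as soon as it is ready,
## then moves straight on to a(t+1). Port 6011 is an arbitrary choice.
FUN1 <- function(t) t^2                    # placeholder for the real work
con <- socketConnection(port = 6011, server = TRUE,
                        blocking = TRUE, open = "a+b")
for (t in 1:10) serialize(FUN1(t), con)
close(con)

## --- Consumer: run in a second R session -------------------------------
## Reads each a(t) and applies FUN2 while the producer is already
## computing a(t+1), overlapping the two stages.
FUN2 <- function(a) sqrt(a)                # placeholder
con <- socketConnection("localhost", port = 6011,
                        blocking = TRUE, open = "a+b")
for (t in 1:10) print(FUN2(unserialize(con)))
close(con)
```

With one stage per process this only helps when FUN1 and FUN2 cost roughly the same; otherwise the faster stage sits blocked on the socket.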

Martin