Advice on multi-user server for R
On Thu, Jul 26, 2012 at 7:59 AM, Carrie Wager <cgwager at gmail.com> wrote:
I'm currently developing several tools in R that I'd like to deploy for use by multiple analysts in my research group. Many of the analysts have no background in using R (but have plenty of experience with SAS), so part of my effort will go into training them to use the new tools.

Some of the analyses will be too computationally intensive for our standard-issue 32-bit Windows desktops, but they will run on a 64-bit machine. I would like to give our IT department advice on a suitable R server that can handle multiple users. RStudio offers a client/server model that would allow us to run the server on a Linux platform without requiring users to learn Linux. An alternative (less preferable) solution would be to run a Windows server (as we currently do for SAS), which would require users to log on to the server via a Windows session in order to run R.

While I've already searched the R newsgroups and gathered some ideas, I'm wondering if anyone out there has more recent advice, particularly regarding suitable hardware choices and server setup. I would like a solution that lets me rapidly deploy R tools to collaborators (all within my computer network) who do not necessarily have much background in R or Linux. All data files would be accessed via our networked filesystem (unless, of course, they are so huge that moving them locally to the server would improve processing time). I'm trying to minimize per-user setup hassle and the perceived inconvenience of running R. The system should be able to handle about 5 intensive jobs and up to 20 users simultaneously. Any advice would be appreciated!
Just a few short points: I think the RStudio Server on Linux backend makes a lot of sense. If your coworkers eventually do start using R themselves (even just to prototype), they can use desktop RStudio, so everyone has a single unified interface.

R is generally very memory hungry, so keep that in mind when picking hardware specs. Also, if you get a multi-core server, note that you'll need explicit parallelization in your scripts if only a single R process is running at a time. If you already have as many R processes as cores, it might be better to avoid parallelizing within each one [someone with more HPC knowledge than me should comment definitively].

Also, for scientific / heavy matrix work, go to the effort of building R locally and linking against a tuned BLAS. It really does make a noticeable difference for work on big matrices (which linear models are, internally). Your IT folks should not be too unfamiliar with this.

For a slightly different take, you might also look at Simon's work on FastRWeb et al., which takes much more of a client-server approach, so you can hide as much as you want (everything?) behind a webpage. I'm not sure where the best documentation on that is, but I've seen a live demo and it's awesome.

Best,
Michael
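[Editor's note: to illustrate the explicit-parallelization point above, here is a minimal sketch using the parallel package that ships with R. The function name fit_one and the task sizes are made up for illustration; mclapply forks workers and so is Linux/Unix-only, which fits the proposed server.]

```r
library(parallel)

# Stand-in for an intensive per-dataset analysis; replace with real work.
fit_one <- function(i) {
  d <- data.frame(x = rnorm(1000), y = rnorm(1000))
  coef(lm(y ~ x, data = d))
}

# Explicit parallelization: one R process spreads 20 tasks over 4 cores.
results <- mclapply(1:20, fit_one, mc.cores = 4)

# If 20 analysts each run their own R process on the same box, plain
# lapply(1:20, fit_one) in each session may be the better choice, since
# the processes themselves already occupy the cores.
```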
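[Editor's note: a quick, hedged way to see whether a tuned BLAS is worth the trouble on the candidate server is to time a large matrix cross-product, which is the kind of operation linear-model fitting performs internally. The matrix size here is arbitrary; a reference BLAS and a tuned one (OpenBLAS, ATLAS, MKL) can differ substantially on this benchmark.]

```r
n <- 2000
m <- matrix(rnorm(n * n), n, n)

# Time t(m) %*% m; compare this number before and after switching BLAS.
system.time(crossprod(m))
```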
Thanks,
Carrie Greene Wager, PhD
New England Research Institutes
Watertown, MA