What is the most cost-effective hardware for R?
Perhaps I have confused the issue. When I initially said "data points" I meant one standalone analysis, not one piece of data. Each analysis point takes 1.5 seconds. I have not yet implemented running this over the whole dataset, but I would expect it to take about 5 to 10 hours. That is just about acceptable, but it would be better if it were quicker. As I said, the exact analysis method has not yet been determined, and if it turned out to be significantly more computationally intensive, that could be an issue.

It is not actually a simulation; it is a pre-analysis of the dataset before public display. I do have a simulation of the analysis to run, and that could be some orders of magnitude larger than the real dataset. I can of course wait for that.

Thanks for the input.
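As a rough check on the figures above, here is a back-of-envelope sketch in Python (the arithmetic is language-independent). It takes the 1.5 seconds per analysis from this message, and the 200,000 count and tenfold growth quoted further down the thread. The core counts (1, 8, 16) and the perfectly linear scaling across cores are illustrative assumptions, not anything stated in the thread, and the function names are hypothetical.

```python
# Back-of-envelope runtime estimate.
# From the thread: 1.5 s per analysis, 200,000 analyses now, about 10x in 5 years.
# ASSUMPTIONS (not from the thread): the core counts below, and ideal linear
# scaling, i.e. the analyses are independent and parallelise without overhead.

SECONDS_PER_ANALYSIS = 1.5
SECONDS_PER_HOUR = 3600


def wall_clock_hours(n_analyses, n_cores=1):
    """Hours needed to run n_analyses independent analyses on n_cores."""
    return n_analyses * SECONDS_PER_ANALYSIS / (n_cores * SECONDS_PER_HOUR)


def analyses_in(hours, n_cores=1):
    """Number of analyses that fit into a wall-clock budget of `hours`."""
    return hours * SECONDS_PER_HOUR * n_cores / SECONDS_PER_ANALYSIS


if __name__ == "__main__":
    for n in (200_000, 2_000_000):
        for cores in (1, 8, 16):
            hours = wall_clock_hours(n, cores)
            print(f"{n:>9,} analyses on {cores:>2} cores: {hours:7.1f} h")
    print(f"Serial analyses in 5 h: {analyses_in(5):,.0f}; "
          f"in 10 h: {analyses_in(10):,.0f}")
```

Under these assumptions, 200,000 analyses run serially come to roughly 83 hours. A 5 to 10 hour estimate would therefore correspond either to roughly 8 to 16 cores or to a smaller number of analyses (12,000 to 24,000 at 1.5 s each); the thread does not say which applies.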
On 05/08/2012 05:24 PM, Bert Gunter wrote:
Probably just pointing out the obvious, but: 200,000 data points may not be that many these days, depending on the dimensionality of the data. Nor is 10 times that number, neither now nor in 5 years, again depending on data dimensionality. So my question is, have you actually tried running your simulations -- or a reasonable approximation thereof -- on a single "cheap" machine? It might be that your concerns are overblown, especially with multicore and parallelization. Obviously, ignore if you've already done this and know it's nonsense.

Cheers,
Bert

On Tue, May 8, 2012 at 8:50 AM, Hugh Morgan <h.morgan at har.mrc.ac.uk> wrote:
On 05/08/2012 12:14 PM, Zhou Fang wrote:
How many data points do you have?
Currently 200,000. We are likely to have 10 times that in 5 years.
Why buy when you can rent? Unless your hardware is going to be running 24/7 doing these analyses, you are paying for it to sit idle. You might be better off purchasing computing time from Amazon or another cloud computing provider. If you need to run more analyses quickly, just buy some more virtual hosts.
Because of the nature of the funding we are likely to be better off buying. We are likely to be running most of the time: most of the analysis must be rerun as more data becomes available, and that is likely to happen a few times every week. Thank you for all the pointers, we shall consider them all.

This email may have a PROTECTIVE MARKING, for an explanation please see: http://www.mrc.ac.uk/About/Informationandstandards/Documentmarking/index.htm
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.