Large data files
3 messages · cstrato@EUnet.at, Thomas Lumley, Andy Elvey
On Wed, 29 Dec 1999 cstrato at EUnet.at wrote:
Dear R and S-Plus users:

Currently I am using:
at work: "S-Plus 2000 Pro" on a PC (Pentium II/350 MHz, 256 MB RAM), running Win NT
at home: "R" on my Mac PowerBook G3 (292 MHz, 128 MB RAM), running LinuxPPC

At home I am currently trying to import a table (nrow=302500, ncol=6), which I have to do one column at a time because of memory problems. Some of the columns I use as they are; others I have to convert into matrices (550 x 550) for calculations. Ultimately, I have to import many (ca. 20-100) of these tables, which will be impossible on my current machines due to memory limitations.

My question now is the following: At work I have access to the following multiprocessor machines:
a, Compaq ProLiant server: 4 x Pentium II/450 MHz, 2 GB RAM, Win NT
b, Sun Enterprise 450 server: 4 x SPARC/?? MHz, 2 GB RAM, Solaris 2.6

For testing purposes I would like to install "R":
1, Can R take advantage of multiprocessor machines?
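[The column-at-a-time import described above can be sketched in R. In recent versions of R, read.table() accepts a colClasses argument in which "NULL" drops a column entirely, so only one numeric column is ever held in memory; "table.txt" below is a synthetic stand-in generated so the sketch is self-contained.]

```r
## Generate a stand-in for the real file: 302500 rows, 6 columns,
## whitespace-separated, no header.
demo <- matrix(as.numeric(seq_len(302500 * 6)), ncol = 6)
write.table(demo, "table.txt", row.names = FALSE, col.names = FALSE)

## Read only column 3: "NULL" in colClasses skips a column, so the other
## five columns are never stored in memory.
keep <- rep("NULL", 6)
keep[3] <- "numeric"
x <- read.table("table.txt", colClasses = keep)[[1]]

## 302500 = 550 * 550, so the column reshapes directly (column-major,
## as usual in S):
m <- matrix(x, nrow = 550, ncol = 550)
```

[Repeating the read with a different index in `keep` pulls in each further column without ever holding the whole table at once.]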
Not really. You can run multiple copies of R, which lets you get four things done at once, but R is not multithreaded.
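[The "multiple copies of R" approach above amounts to launching independent batch jobs and letting the OS spread them across CPUs. A minimal shell sketch: the real invocation would be something like `R CMD BATCH process.R`, but a placeholder command (`wc`) stands in here so the sketch runs without R installed, and the input files are tiny generated stand-ins.]

```shell
# Stand-in for one R batch run on a single input file; replace the body
# with e.g.:  R CMD BATCH process.R
process_one() {
    wc -l < "$1" > "$1.out"
}

for f in table1.txt table2.txt table3.txt table4.txt; do
    printf 'a b c\n1 2 3\n' > "$f"   # tiny demo input
    process_one "$f" &               # run in the background: one job per CPU
done
wait                                 # block until all four jobs finish
```

[With four CPUs and four background jobs, the kernel schedules one job per processor, which is as much parallelism as a non-threaded R can exploit.]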
2, Which machine would be better suited to run R on?
Either would work. We have done some very limited speed comparisons on machines here: the various test suites for the survival5 package run at about the same speed on a new Sun Enterprise server and on a Pentium II/400 under Linux; they run faster on a Pentium III/500 under Win NT, and slower on an eighteen-month-old Sun Enterprise 450 server. The speeds are close enough that other factors are probably more important (which system you prefer, how many other people you will annoy by taking over the machine).

If you are doing a lot of simple linear algebra, the Sun Workshop compilers might be expected to have some advantages over gcc; I haven't found any examples where it matters, but I don't work with very large matrices much.
Finally, the question is: Is R or S-Plus better suited for handling such large data? Would "S-Plus 2000" for Win NT or "S-Plus 5" for Unix be better suited? Can S-Plus take advantage of multiprocessor machines?
Neither R nor S-PLUS is particularly suited to handling large data. I believe S-PLUS has some multithreading, but that its main computations are still done by a single processor. However, this is perhaps not the best list to get information about S-PLUS.

You would be better off splitting the data into pieces using some other program. Either S-PLUS or R will handle 550 x 550 matrices perfectly happily if you have that much memory.

Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe" (in the "body", not the subject!)
To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
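[The pre-splitting suggested above can be done outside R with a standard tool such as awk: write each column of the table to its own file, then read the files one at a time. A minimal sketch; "bigtable.txt" is a tiny generated stand-in for the real 302500 x 6 table.]

```shell
# Tiny stand-in for the real whitespace-separated table
printf '1 2 3 4 5 6\n7 8 9 10 11 12\n' > bigtable.txt

# One output file per column: awk prints field $c for c = 1..6
for i in 1 2 3 4 5 6; do
    awk -v c="$i" '{ print $c }' bigtable.txt > "col$i.txt"
done

# Each colN.txt can then be read separately, e.g. in R:
#   x <- scan("col3.txt")
```

[This keeps the per-read memory footprint to a single column, which is what the original poster was doing by hand inside R.]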
Thomas Lumley wrote:
Neither R nor S-PLUS is particularly suited to handling large data. I believe S-PLUS has some multithreading, but that its main computations are still done by a single processor. However, this is perhaps not the best list to get information about S-PLUS.

You would be better off splitting the data into pieces using some other program. Either S-PLUS or R will handle 550 x 550 matrices perfectly happily if you have that much memory.

Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle

************
One other possibility that may be worth a try is the language "Yorick", which is specifically designed with array/matrix processing in mind. Try the following URL: ftp://ftp-icf.llnl.gov/pub/Yorick/yorick-ad.html

(Hope this suggestion doesn't offend on an R-help mailing list... I am a keen (although new) R user, but am also aware of a few other ways of solving problems. :-)