Hi, I'm working in R 2.11.1 x64 on Windows (x86_64-pc-mingw32). I'm experiencing a strange problem in R that I'm not even sure how to begin to fix.

I've got a huge simulation written in R (forty pages when printed) that I'd like to run multiple times. When I open up R and run it on its own, it works fine. At the beginning of the program, there's a variable X that I set to 1, 5, 10, or 20, depending on how sensitive I want the simulation to be to a certain parameter. When I run just one instance of R, X stays the same throughout the program.

I have a quad-core machine, so I'd like to take advantage of all four processors. If I open up four sessions, set X to 1, 5, 10, and 20 in the respective sessions, and run all four simulations all the way through at the same time (about eighteen hours of processing time), X ends up being 20 at the end of all four sessions. It's as if R mixed up the variable between the four concurrent sessions. I can't figure out why else X would ever change to 20 in the three simulations where I set it to 1, 5, and 10, respectively (it isn't updated anywhere during the simulation).

When all four simulations run concurrently, I am absolutely maxing out my computer: all four processors are at 100%, and Windows Task Manager says I'm using almost 100% of my 16 GB of RAM. Is it possible that intense resource use would cause a variable conflict like this?

I have no idea where to start troubleshooting this error, so any advice would be appreciated.

Thanks!
Anthony Damico
Kaiser Family Foundation
Could concurrent R sessions mix up variables?
4 messages · Anthony Damico, Duncan Murdoch, Phil Spector +1 more
Duncan Murdoch, replying to Anthony Damico's message of 10/12/2010 1:13 PM:
If you are running something that takes 18 hours to complete, a common practice is to save intermediate results to disk occasionally. Have you (or whoever wrote the simulation) done this and forgotten about it? If all 4 processes are saving to the same place and then reading results back, you'd see something like what you describe. If all calculations are held in memory, you shouldn't.

A simple approach that might help debug this is to create a new variable initX, set equal to X, and then sprinkle the statement stopifnot(X == initX) through your simulation code. That should stop execution as soon as the change happens, and you can then try to figure out why it happened.

Duncan Murdoch
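A minimal sketch of the checkpoint collision described above, under stated assumptions: the real simulation is not shown in the thread, so two concurrent sessions are emulated as two environments inside a single R process, and the helpers new_session(), checkpoint(), resume(), and checkpoint_path(), as well as the sim_checkpoint file names, are hypothetical. Session A (X = 1) and session B (X = 20) save to the same file; when A reloads "its" state it picks up B's X, and the stopifnot(X == initX) guard catches the change. Keying the file name on X (or on Sys.getpid()) keeps the sessions apart.

```r
# Hypothetical emulation: each "session" is an environment in one R process.
checkpoint <- function(env, file) save(list = "X", envir = env, file = file)
resume     <- function(env, file) load(file, envir = env)

new_session <- function(X) {
  env <- new.env()
  env$X <- X
  env$initX <- X   # Duncan's suggestion: remember the starting value
  env
}

# --- The bug: every session writes to the same (hypothetical) path -------
shared_file <- file.path(tempdir(), "sim_checkpoint.RData")

a <- new_session(1)
b <- new_session(20)
checkpoint(a, shared_file)   # session A saves X = 1
checkpoint(b, shared_file)   # session B overwrites the file with X = 20
resume(a, shared_file)       # session A reloads "its" state ...
print(a$X)                   # ... and now has X = 20

# The guard fails as soon as X has drifted from its starting value.
guard_ok <- tryCatch({
  with(a, stopifnot(X == initX))
  TRUE
}, error = function(e) {
  message("Caught: ", conditionMessage(e))
  FALSE
})

# --- A fix: one checkpoint file per value of X ---------------------------
checkpoint_path <- function(X) {
  file.path(tempdir(), sprintf("sim_checkpoint_X%s.RData", X))
}

a2 <- new_session(1)
b2 <- new_session(20)
checkpoint(a2, checkpoint_path(a2$X))
checkpoint(b2, checkpoint_path(b2$X))
resume(a2, checkpoint_path(a2$initX))
with(a2, stopifnot(X == initX))   # passes: A's checkpoint was not overwritten
```

If the simulation does write intermediate results to a fixed file name, the equivalent of checkpoint_path(X) would go wherever it calls save(), write.csv(), or similar.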
Anthony -
I would advise you to use the multicore or snowfall packages
to utilize multiple CPUs. As an example using multicore:
library(multicore)   # forks R processes, so it does not run on Windows
sim = function(mu) max(replicate(100000, max(rnorm(100, mu))))
unlist(mclapply(c(1, 5, 10, 20), sim))
[1] 6.569332 10.268091 15.335847 25.291502

Using snowfall:
library(snowfall)
sim = function(mu) max(replicate(100000, max(rnorm(100, mu))))
sfInit(cpus = 4, type = 'SOCK', parallel = TRUE)   # start 4 socket workers
sfSapply(c(1, 5, 10, 20), sim)
[1] 6.200161 10.307807 15.271581 25.055950
sfStop()   # shut the workers down when finished

Hope this helps.

- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
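Since multicore relies on forking and the original poster is on Windows, here is a minimal sketch of the same toy computation using the parallel package, swapped in for illustration rather than taken from the thread. parallel ships with R from version 2.14.0 onward (so it would require upgrading from R 2.11.1), and makeCluster() starts separate worker R processes, which works on Windows. Each value of mu reaches sim() as an argument, so every worker holds its own copy and nothing is shared through a global X. The n_rep argument is a hypothetical addition, set lower than the 100000 used above so the sketch finishes quickly, and the seed 2010 is arbitrary.

```r
library(parallel)  # included with R >= 2.14.0

# Toy simulation from the thread; n_rep is a hypothetical extra argument,
# reduced from 100000 so the example runs quickly.
sim <- function(mu, n_rep = 10000) max(replicate(n_rep, max(rnorm(100, mu))))

cl <- makeCluster(4)                   # 4 separate R worker processes (PSOCK)
clusterSetRNGStream(cl, iseed = 2010)  # independent, reproducible RNG streams
res <- unlist(parLapply(cl, c(1, 5, 10, 20), sim))
stopCluster(cl)                        # shut the workers down

print(res)
```

Passing the parameter explicitly, instead of setting a global variable by hand in four interactive sessions, also makes it impossible for one run's value to be picked up by another through the workspace.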
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.