Hi,
I am facing a very weird problem with parallel::mclapply. I have a script which does some data wrangling on an input dataset in parallel and then writes the results to disk. I have been using this script daily for more than one year always on an EC2 instance launched from the same AMI (no updates installed after launch) and processed thousands of different input data sets successfully. I now have an input dataset for which I face the following bug:
The basic outline of the problematic section of the script:
# parts is a data.table with 88 rows
mc_ret <- parallel::mclapply(sample.int(nrow(parts)), function(i) {
# do some data wrangling and write the result to a file
# ...
print(paste0("part ", i, " written successfully."))
return(TRUE)
}, mc.preschedule = FALSE, mc.cores = 2L)
str(mc_ret)
Expected output: "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed once. All outputs are created successfully.
Actual output (see end of the message): "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed. All outputs are created successfully. So far so good.
But then "part i written successfully." it is printed another X times, for values of i for which it was already printed. This output is intermingled with X-1 times the following error message:
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
and Y times the message "Execution halted". mc_ret is printed again, now being a list of length 85, with the first element being TRUE and all other elements being NULL. X and Y vary from run to run.
Now to the main problem: I tried very hard to create a reproducible example, but I failed. What I observed:
- The output is (and has always been) written to path which is on an NFS share. If I instead write to a path on a local disk it will work.
- The script is invoked using Rscript. If I instead source it from an interactive R session it works. There are at least two more people who have observed this: https://stackoverflow.com/questions/51986674/mclapply-sendmaster-error-only-with-rscript
- Before the call to mclapply the code acquires an exclusive file lock on a dedicated lock file, not written to but also on the NFS share. If I remove the code acquiring the lock, the whole script will also work if called using Rscript.
- The problem also occurs for mc.preschedule = TRUE.
- There is no error if I set mc.cores to 1.
- And stressing again: the code works without any changes from Rscript for thousands of other data sets.
Rscript -e "sessionInfo()":
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.2
I know this is a fairly old R version. I have not been able to reproduce the bug with a more recent version, but since it is so difficult to trigger, this does not mean much, I guess. I have looked through the changes made to the code of mclapply since that version and could not find something directly related. I am not even sure if it is a problem in the parallel package or some other (memory) bug. What strikes me is that others have observed a very similar error when using Rscript but not when using an interactive R session, just like I do.
I am not expecting a fix based on the information I provide, but maybe someone has some thoughts on this!?
Regards,
Andreas
Actual output:
[1] "part 51 written successfully."
[1] "part 30 written successfully."
[1] "part 32 written successfully."
[1] "part 48 written successfully."
[1] "part 63 written successfully."
[1] "part 5 written successfully."
[1][1] "part 14 written successfully." "part 18 written successfully."
[1] "part 38 written successfully."
[1] "part 11 written successfully."
[1] "part 68 written successfully."
[1] "part 45 written successfully."
[1] "part 88 written successfully."
[1] "part 36 written successfully."
[1] "part 44 written successfully."
[1] "part 55 written successfully."
[1] "part 26 written successfully."
[1] "part 37 written successfully."
[1] "part 22 written successfully."
[1] "part 13 written successfully."
[1] "part 67 written successfully."
[1] "part 10 written successfully."
[1] "part 24 written successfully."
[1] "part 20 written successfully."
[1] "part 74 written successfully."
[1] "part 50 written successfully."
[1] "part 2 written successfully."
[1] "part 17 written successfully."
[1] "part 66 written successfully."
[1] "part 61 written successfully."
[1][1] "part 7 written successfully." "part 31 written successfully."
[1] "part 70 written successfully."
[1] "part 71 written successfully."
[1][1] "part 76 written successfully."
"part 42 written successfully."
[1][1] "part 72 written successfully." "part 12 written successfully."
[1] "part 39 written successfully."
[1] "part 58 written successfully."
[1] "part 87 written successfully."
[1] "part 81 written successfully."
[1] "part 3 written successfully."
[1] "part 29 written successfully."
[1] "part 82 written successfully."
[1] "part 54 written successfully."
[1] "part 23 written successfully."
[1] "part 56 written successfully."
[1] "part 73 written successfully."
[1] "part 27 written successfully."
[1] "part 64 written successfully."
[1] "part 46 written successfully."
[1] "part 80 written successfully."
[1] "part 53 written successfully."
[1] "part 43 written successfully."
[1] "part 47 written successfully."
[1] "part 57 written successfully."
[1] "part 75 written successfully."
[1] "part 84 written successfully."
[1] "part 85 written successfully."
[1][1] "part 86 written successfully."
"part 19 written successfully."
[1][1] "part 33 written successfully." "part 34 written successfully."
[1] "part 21 written successfully."
[1] "part 40 written successfully."
[1][1] "part 4 written successfully."
"part 28 written successfully."
[1] "part 62 written successfully."
[1] "part 16 written successfully."
[1] "part 79 written successfully."
[1] "part 6 written successfully."
[1] "part 77 written successfully."
[1] "part 65 written successfully."
[1] "part 60 written successfully."
[1] "part 78 written successfully."
[1] "part 1 written successfully."
[1] "part 83 written successfully."
[1] "part 9 written successfully."
[1] "part 15 written successfully."
[1][1] "part 41 written successfully."
"part 35 written successfully."
[1][1] "part 8 written successfully." "part 69 written successfully."
[1] "part 59 written successfully."
[1] "part 25 written successfully."
[1] "part 49 written successfully."
[1] "part 52 written successfully."
List of 88
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
[1] "part 31 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> lapply -> FUN -> mcparallel -> sendMaster
[1] "part 71 written successfully."
[1] "part 35 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 42 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 4 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1][1] "part 30 written successfully." "part 40 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 41 written successfully."
[1] "part 80 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 45 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 82 written successfully."
[1] "part 54 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 59 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 27 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 36 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 44 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 83 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 61 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 70 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 66 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 10 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 53 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 55 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 15 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 50 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 5 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 48 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 32 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 69 written successfully."
Execution halted
[1] "part 52 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 77 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 24 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 56 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 6 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 62 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 33 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 16 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 34 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 74 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 2 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 19 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 9 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 11 written successfully."
[1] "part 28 written successfully."
[1] "part 29 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 17 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 65 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 68 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 43 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 38 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 1 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 57 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 8 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 25 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1]List of 85
$ : logi TRUE
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
error in parallel:::sendMaster
6 messages · Tomas Kalibera, Andreas Kersting
Hi again, One important correction of my first message: I misinterpreted the output. Actually in that R session 2 input files were processed one after the other in a loop. The first (with 88 parts went fine). The second (with 85 parts) produced the sendMaster errors and failed. If (in a new session via Rscript) I only process the second input file it will work. The other observations on R vs Rscript, NFS share etc. still hold. Sorry for this! Regards, Andreas 2019-11-27 12:10 GMT+01:00 Andreas Kersting<r-devel at akersting.de>:
Hi,
I am facing a very weird problem with parallel::mclapply. I have a script which does some data wrangling on an input dataset in parallel and then writes the results to disk. I have been using this script daily for more than one year always on an EC2 instance launched from the same AMI (no updates installed after launch) and processed thousands of different input data sets successfully. I now have an input dataset for which I face the following bug:
The basic outline of the problematic section of the script:
# parts is a data.table with 88 rows
mc_ret <- parallel::mclapply(sample.int(nrow(parts)), function(i) {
# do some data wrangling and write the result to a file
# ...
print(paste0("part ", i, " written successfully."))
return(TRUE)
}, mc.preschedule = FALSE, mc.cores = 2L)
str(mc_ret)
Expected output: "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed once. All outputs are created successfully.
Actual output (see end of the message): "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed. All outputs are created successfully. So far so good.
But then "part i written successfully." it is printed another X times, for values of i for which it was already printed. This output is intermingled with X-1 times the following error message:
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
and Y times the message "Execution halted". mc_ret is printed again, now being a list of length 85, with the first element being TRUE and all other elements being NULL. X and Y vary from run to run.
Now to the main problem: I tried very hard to create a reproducible example, but I failed. What I observed:
- The output is (and has always been) written to path which is on an NFS share. If I instead write to a path on a local disk it will work.
- The script is invoked using Rscript. If I instead source it from an interactive R session it works. There are at least two more people who have observed this: https://stackoverflow.com/questions/51986674/mclapply-sendmaster-error-only-with-rscript
- Before the call to mclapply the code acquires an exclusive file lock on a dedicated lock file, not written to but also on the NFS share. If I remove the code acquiring the lock, the whole script will also work if called using Rscript.
- The problem also occurs for mc.preschedule = TRUE.
- There is no error if I set mc.cores to 1.
- And stressing again: the code works without any changes from Rscript for thousands of other data sets.
Rscript -e "sessionInfo()":
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.2
I know this is a fairly old R version. I have not been able to reproduce the bug with a more recent version, but since it is so difficult to trigger, this does not mean much, I guess. I have looked through the changes made to the code of mclapply since that version and could not find something directly related. I am not even sure if it is a problem in the parallel package or some other (memory) bug. What strikes me is that others have observed a very similar error when using Rscript but not when using an interactive R session, just like I do.
I am not expecting a fix based on the information I provide, but maybe someone has some thoughts on this!?
Regards,
Andreas
Actual output:
[1] "part 51 written successfully."
[1] "part 30 written successfully."
[1] "part 32 written successfully."
[1] "part 48 written successfully."
[1] "part 63 written successfully."
[1] "part 5 written successfully."
[1][1] "part 14 written successfully." "part 18 written successfully."
[1] "part 38 written successfully."
[1] "part 11 written successfully."
[1] "part 68 written successfully."
[1] "part 45 written successfully."
[1] "part 88 written successfully."
[1] "part 36 written successfully."
[1] "part 44 written successfully."
[1] "part 55 written successfully."
[1] "part 26 written successfully."
[1] "part 37 written successfully."
[1] "part 22 written successfully."
[1] "part 13 written successfully."
[1] "part 67 written successfully."
[1] "part 10 written successfully."
[1] "part 24 written successfully."
[1] "part 20 written successfully."
[1] "part 74 written successfully."
[1] "part 50 written successfully."
[1] "part 2 written successfully."
[1] "part 17 written successfully."
[1] "part 66 written successfully."
[1] "part 61 written successfully."
[1][1] "part 7 written successfully." "part 31 written successfully."
[1] "part 70 written successfully."
[1] "part 71 written successfully."
[1][1] "part 76 written successfully."
"part 42 written successfully."
[1][1] "part 72 written successfully." "part 12 written successfully."
[1] "part 39 written successfully."
[1] "part 58 written successfully."
[1] "part 87 written successfully."
[1] "part 81 written successfully."
[1] "part 3 written successfully."
[1] "part 29 written successfully."
[1] "part 82 written successfully."
[1] "part 54 written successfully."
[1] "part 23 written successfully."
[1] "part 56 written successfully."
[1] "part 73 written successfully."
[1] "part 27 written successfully."
[1] "part 64 written successfully."
[1] "part 46 written successfully."
[1] "part 80 written successfully."
[1] "part 53 written successfully."
[1] "part 43 written successfully."
[1] "part 47 written successfully."
[1] "part 57 written successfully."
[1] "part 75 written successfully."
[1] "part 84 written successfully."
[1] "part 85 written successfully."
[1][1] "part 86 written successfully."
"part 19 written successfully."
[1][1] "part 33 written successfully." "part 34 written successfully."
[1] "part 21 written successfully."
[1] "part 40 written successfully."
[1][1] "part 4 written successfully."
"part 28 written successfully."
[1] "part 62 written successfully."
[1] "part 16 written successfully."
[1] "part 79 written successfully."
[1] "part 6 written successfully."
[1] "part 77 written successfully."
[1] "part 65 written successfully."
[1] "part 60 written successfully."
[1] "part 78 written successfully."
[1] "part 1 written successfully."
[1] "part 83 written successfully."
[1] "part 9 written successfully."
[1] "part 15 written successfully."
[1][1] "part 41 written successfully."
"part 35 written successfully."
[1][1] "part 8 written successfully." "part 69 written successfully."
[1] "part 59 written successfully."
[1] "part 25 written successfully."
[1] "part 49 written successfully."
[1] "part 52 written successfully."
List of 88
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
[1] "part 31 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> lapply -> FUN -> mcparallel -> sendMaster
[1] "part 71 written successfully."
[1] "part 35 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 42 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 4 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1][1] "part 30 written successfully." "part 40 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 41 written successfully."
[1] "part 80 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 45 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 82 written successfully."
[1] "part 54 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 59 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 27 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 36 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 44 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 83 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 61 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 70 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 66 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 10 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 53 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 55 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 15 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 50 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 5 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 48 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 32 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 69 written successfully."
Execution halted
[1] "part 52 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 77 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 24 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 56 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 6 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 62 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 33 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 16 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 34 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 74 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 2 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 19 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 9 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 11 written successfully."
[1] "part 28 written successfully."
[1] "part 29 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 17 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 65 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 68 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 43 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 38 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 1 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 57 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 8 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 25 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1]List of 85
$ : logi TRUE
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Hi Andreas, the error is reported when some child process cannot send results to the master process, which originates from an error returned by write() - when write() returns -1 or 0. The logic around the writing has not changed since R 3.5.2. It should not be related to the printing in the child, only to returning the value. The problem may be originating from the execution environment, virtualization, and/or possibly from a lack of robustness in R. To resolve this we need to find out which error was returned and why. Either you can try to create a reproducible example (something I could use to trigger an error on my system and then debug) or to debug on your system (build R from source, ensure the bug is still triggered, then instrument to print the exact error from the OS and where it was detected, etc). In principle you could also try without code instrumentation just using strace. Just from looking at the code in R around the writing I am not seeing any bug there. If you choose to debug on your system I can help with the instrumentation. Best Tomas
On 11/27/19 12:40 PM, Andreas Kersting wrote:
Hi again, One important correction of my first message: I misinterpreted the output. Actually in that R session 2 input files were processed one after the other in a loop. The first (with 88 parts went fine). The second (with 85 parts) produced the sendMaster errors and failed. If (in a new session via Rscript) I only process the second input file it will work. The other observations on R vs Rscript, NFS share etc. still hold. Sorry for this! Regards, Andreas 2019-11-27 12:10 GMT+01:00 Andreas Kersting<r-devel at akersting.de>:
Hi,
I am facing a very weird problem with parallel::mclapply. I have a script which does some data wrangling on an input dataset in parallel and then writes the results to disk. I have been using this script daily for more than one year always on an EC2 instance launched from the same AMI (no updates installed after launch) and processed thousands of different input data sets successfully. I now have an input dataset for which I face the following bug:
The basic outline of the problematic section of the script:
# parts is a data.table with 88 rows
mc_ret <- parallel::mclapply(sample.int(nrow(parts)), function(i) {
# do some data wrangling and write the result to a file
# ...
print(paste0("part ", i, " written successfully."))
return(TRUE)
}, mc.preschedule = FALSE, mc.cores = 2L)
str(mc_ret)
Expected output: "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed once. All outputs are created successfully.
Actual output (see end of the message): "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed. All outputs are created successfully. So far so good.
But then "part i written successfully." it is printed another X times, for values of i for which it was already printed. This output is intermingled with X-1 times the following error message:
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
and Y times the message "Execution halted". mc_ret is printed again, now being a list of length 85, with the first element being TRUE and all other elements being NULL. X and Y vary from run to run.
Now to the main problem: I tried very hard to create a reproducible example, but I failed. What I observed:
- The output is (and has always been) written to path which is on an NFS share. If I instead write to a path on a local disk it will work.
- The script is invoked using Rscript. If I instead source it from an interactive R session it works. There are at least two more people who have observed this: https://stackoverflow.com/questions/51986674/mclapply-sendmaster-error-only-with-rscript
- Before the call to mclapply the code acquires an exclusive file lock on a dedicated lock file, not written to but also on the NFS share. If I remove the code acquiring the lock, the whole script will also work if called using Rscript.
- The problem also occurs for mc.preschedule = TRUE.
- There is no error if I set mc.cores to 1.
- And stressing again: the code works without any changes from Rscript for thousands of other data sets.
Rscript -e "sessionInfo()":
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.2
I know this is a fairly old R version. I have not been able to reproduce the bug with a more recent version, but since it is so difficult to trigger, this does not mean much, I guess. I have looked through the changes made to the code of mclapply since that version and could not find something directly related. I am not even sure if it is a problem in the parallel package or some other (memory) bug. What strikes me is that others have observed a very similar error when using Rscript but not when using an interactive R session, just like I do.
I am not expecting a fix based on the information I provide, but maybe someone has some thoughts on this!?
Regards,
Andreas
Actual output:
[1] "part 51 written successfully."
[1] "part 30 written successfully."
[1] "part 32 written successfully."
[1] "part 48 written successfully."
[1] "part 63 written successfully."
[1] "part 5 written successfully."
[1][1] "part 14 written successfully." "part 18 written successfully."
[1] "part 38 written successfully."
[1] "part 11 written successfully."
[1] "part 68 written successfully."
[1] "part 45 written successfully."
[1] "part 88 written successfully."
[1] "part 36 written successfully."
[1] "part 44 written successfully."
[1] "part 55 written successfully."
[1] "part 26 written successfully."
[1] "part 37 written successfully."
[1] "part 22 written successfully."
[1] "part 13 written successfully."
[1] "part 67 written successfully."
[1] "part 10 written successfully."
[1] "part 24 written successfully."
[1] "part 20 written successfully."
[1] "part 74 written successfully."
[1] "part 50 written successfully."
[1] "part 2 written successfully."
[1] "part 17 written successfully."
[1] "part 66 written successfully."
[1] "part 61 written successfully."
[1][1] "part 7 written successfully." "part 31 written successfully."
[1] "part 70 written successfully."
[1] "part 71 written successfully."
[1][1] "part 76 written successfully."
"part 42 written successfully."
[1][1] "part 72 written successfully." "part 12 written successfully."
[1] "part 39 written successfully."
[1] "part 58 written successfully."
[1] "part 87 written successfully."
[1] "part 81 written successfully."
[1] "part 3 written successfully."
[1] "part 29 written successfully."
[1] "part 82 written successfully."
[1] "part 54 written successfully."
[1] "part 23 written successfully."
[1] "part 56 written successfully."
[1] "part 73 written successfully."
[1] "part 27 written successfully."
[1] "part 64 written successfully."
[1] "part 46 written successfully."
[1] "part 80 written successfully."
[1] "part 53 written successfully."
[1] "part 43 written successfully."
[1] "part 47 written successfully."
[1] "part 57 written successfully."
[1] "part 75 written successfully."
[1] "part 84 written successfully."
[1] "part 85 written successfully."
[1][1] "part 86 written successfully."
"part 19 written successfully."
[1][1] "part 33 written successfully." "part 34 written successfully."
[1] "part 21 written successfully."
[1] "part 40 written successfully."
[1][1] "part 4 written successfully."
"part 28 written successfully."
[1] "part 62 written successfully."
[1] "part 16 written successfully."
[1] "part 79 written successfully."
[1] "part 6 written successfully."
[1] "part 77 written successfully."
[1] "part 65 written successfully."
[1] "part 60 written successfully."
[1] "part 78 written successfully."
[1] "part 1 written successfully."
[1] "part 83 written successfully."
[1] "part 9 written successfully."
[1] "part 15 written successfully."
[1][1] "part 41 written successfully."
"part 35 written successfully."
[1][1] "part 8 written successfully." "part 69 written successfully."
[1] "part 59 written successfully."
[1] "part 25 written successfully."
[1] "part 49 written successfully."
[1] "part 52 written successfully."
List of 88
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
[1] "part 31 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> lapply -> FUN -> mcparallel -> sendMaster
[1] "part 71 written successfully."
[1] "part 35 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 42 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 4 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1][1] "part 30 written successfully." "part 40 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 41 written successfully."
[1] "part 80 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 45 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 82 written successfully."
[1] "part 54 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 59 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 27 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 36 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 44 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 83 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 61 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 70 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 66 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 10 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 53 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 55 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 15 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 50 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 5 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 48 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 32 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 69 written successfully."
Execution halted
[1] "part 52 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 77 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 24 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 56 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 6 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 62 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 33 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 16 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 34 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 74 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 2 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 19 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 9 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 11 written successfully."
[1] "part 28 written successfully."
[1] "part 29 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 17 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 65 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 68 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 43 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 38 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 1 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 57 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 8 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 25 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1]List of 85
$ : logi TRUE
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Hi Tomas, Thanks for your prompt reply and your offer to help. I might need to get back to this since I am not too experienced in debugging these kinds of issues. Anyway, I gave it a try and I think I have found the immediate cause: I installed the debug symbols (r-base-core-dbg), placed https://github.com/wch/r-source/blob/tags/R-3-5-2/src/library/parallel/src/fork.c in cwd and changed the wrapper code to: mc_ret <- parallel::mclapply(seq_len(nrow(parts)), function(i) { # we fail for the input resulting in parts having 85 rows if (nrow(parts) == 85L && !file.exists(as.character(Sys.getpid()))) { file.create(as.character(Sys.getpid())) print(Sys.getpid()) Sys.sleep(30) } # ... return(TRUE) }, mc.preschedule = TRUE, mc.cores = 2L) This way I ended up with only two child processes to which I each attached a debugger. In total I ran about 10 debugging sessions and it was always the second child process failing. The errno after write returned -1 was 9 (EBADF).
From what I can see, the reason for this is that the second child tries to write to fd 7, but already during the very beginning of the first invocation of the anonymous function to parallelize, i.e. during Sys.sleep(30), there is no such file descriptor. From this observation I would conclude that it is NOT the code run from that function, i.e. # ..., causing the issue. Let me point out again, that this is NOT the very first invocation of mclapply in this R session. There is at least one previous call to it, which works fine.
File descriptors directly after attaching gdb to both child processes during Sys.sleep(30):
### master
root at ip-10-0-48-30:~/latest_test# ls -l /proc/22119/fd
total 0
lrwx------ 1 root root 64 Nov 28 04:49 0 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 28 04:49 1 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 28 04:49 2 -> /dev/pts/0
lr-x------ 1 root root 64 Nov 28 04:49 3 -> /path/to/script.R
lrwx------ 1 root root 64 Nov 28 04:49 4 -> /path/on/nfs/write.lock
lr-x------ 1 root root 64 Nov 28 04:49 5 -> 'pipe:[266120]'
l-wx------ 1 root root 64 Nov 28 04:49 8 -> 'pipe:[266121]'
### first child (writes to fd 6)
(gdb) shell ls -l /proc/22134/fd
total 0
lrwx------ 1 root root 64 Nov 28 04:42 0 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0
lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R
lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock
l-wx------ 1 root root 64 Nov 28 04:42 6 -> 'pipe:[266120]'
l-wx------ 1 root root 64 Nov 28 04:42 8 -> 'pipe:[266121]'
### second child (tries writing to fd 7)
(gdb) shell ls -l /proc/22135/fd
total 0
lr-x------ 1 root root 64 Nov 28 04:42 0 -> 'pipe:[266123]'
lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0
l-wx------ 1 root root 64 Nov 28 04:42 10 -> 'pipe:[266123]'
lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0
lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R
lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock
# debugging mc_send_master in both child processes:
### first child (all OK):
(gdb) break mc_send_master
Breakpoint 1 at 0x7f5b2687b0a0: file fork.c, line 681.
(gdb) c
Continuing.
Breakpoint 1, mc_send_master (what=0x555c6dd2cd00) at fork.c:681
warning: Source file is more recent than executable.
681 {
(gdb) info args
what = 0x555c6dd2cd00
(gdb) n
682 if (is_master)
(gdb) n
684 if (master_fd == -1)
(gdb) n
686 if (TYPEOF(what) != RAWSXP)
(gdb) n
688 R_xlen_t len = XLENGTH(what);
(gdb) n
689 unsigned char *b = RAW(what);
(gdb) n
693 if (writerep(master_fd, &len, sizeof(len)) != sizeof(len)) {
(gdb) info locals
len = 538
b = <optimized out>
n = <optimized out>
(gdb) s
writerep (fildes=6, buf=buf at entry=0x7ffdd6da7db0, nbyte=nbyte at entry=8)
at fork.c:653
653 {
(gdb) info args
fildes = 6
buf = 0x7ffdd6da7db0
nbyte = 8
(gdb) n
654 size_t wbyte = 0;
(gdb) n
653 {
(gdb) n
657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte);
(gdb) n
658 if (w == -1) {
(gdb) n
657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte);
(gdb) n
658 if (w == -1) {
(gdb) n
664 if (w == 0) {
(gdb) n
670 wbyte += w;
(gdb) n
671 if (wbyte == nbyte)
(gdb) n
672 return wbyte;
(gdb) n
674 }
(gdb) n
mc_send_master (what=<optimized out>) at fork.c:699
699 for (R_xlen_t i = 0; i < len; i += n) {
(gdb) n
700 n = writerep(master_fd, b + i, len - i);
(gdb) n
701 if (n < 1) {
(gdb) n
699 for (R_xlen_t i = 0; i < len; i += n) {
(gdb) n
707 return ScalarLogical(1);
(gdb) n
708 }
### second child (write fails with EBADF):
(gdb) break mc_send_master
Breakpoint 1 at 0x7f5016f3e0a0: file fork.c, line 681.
(gdb) c
Continuing.
Breakpoint 1, mc_send_master (what=0x563382a71910) at fork.c:681
warning: Source file is more recent than executable.
681 {
(gdb) info args
what = 0x563382a71910
(gdb) n
682 if (is_master)
(gdb) n
684 if (master_fd == -1)
(gdb) n
686 if (TYPEOF(what) != RAWSXP)
(gdb) n
688 R_xlen_t len = XLENGTH(what);
(gdb) n
689 unsigned char *b = RAW(what);
(gdb) n
693 if (writerep(master_fd, &len, sizeof(len)) != sizeof(len)) {
(gdb) info locals
len = 526
b = <optimized out>
n = <optimized out>
(gdb) s
writerep (fildes=7, buf=buf at entry=0x7fff4027ad60, nbyte=nbyte at entry=8)
at fork.c:653
653 {
(gdb) info args
fildes = 7
buf = 0x7fff4027ad60
nbyte = 8
(gdb) n
654 size_t wbyte = 0;
(gdb) n
653 {
(gdb) n
657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte);
(gdb) n
658 if (w == -1) {
(gdb) info locals
w = <optimized out>
wbyte = 0
ptr = 0x7fff4027ad60 "\016\002"
(gdb) n
657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte);
(gdb) n
658 if (w == -1) {
(gdb) n
659 if (errno == EINTR)
(gdb) n
674 }
(gdb) p __errno_location()
$1 = (int *) 0x7f50322cb540
(gdb) x/x $1
0x7f50322cb540: 0x00000009
(gdb) python import errno
(gdb) python print(errno.errorcode[9])
EBADF
(gdb) n
mc_send_master (what=<optimized out>) at fork.c:702
702 close(master_fd);
(gdb) n
704 error(_("write error, closing pipe to the master"));
(gdb) n
703 master_fd = -1;
(gdb) n
704 error(_("write error, closing pipe to the master"));
(gdb) n
685 error(_("there is no pipe to the master process"));
Does this help in any way?
Is there something else I can/should look at?
Regards,
Andreas
2019-11-27 15:04 GMT+01:00 Tomas Kalibera<tomas.kalibera at gmail.com>:
Hi Andreas, the error is reported when some child process cannot send results to the master process, which originates from an error returned by write() - when write() returns -1 or 0. The logic around the writing has not changed since R 3.5.2. It should not be related to the printing in the child, only to returning the value. The problem may be originating from the execution environment, virtualization, and/or possibly from a lack of robustness in R. To resolve this we need to find out which error was returned and why. Either you can try to create a reproducible example (something I could use to trigger an error on my system and then debug) or to debug on your system (build R from source, ensure the bug is still triggered, then instrument to print the exact error from the OS and where it was detected, etc). In principle you could also try without code instrumentation just using strace. Just from looking at the code in R around the writing I am not seeing any bug there. If you choose to debug on your system I can help with the instrumentation. Best Tomas On 11/27/19 12:40 PM, Andreas Kersting wrote:
Hi again, One important correction of my first message: I misinterpreted the output. Actually in that R session 2 input files were processed one after the other in a loop. The first (with 88 parts went fine). The second (with 85 parts) produced the sendMaster errors and failed. If (in a new session via Rscript) I only process the second input file it will work. The other observations on R vs Rscript, NFS share etc. still hold. Sorry for this! Regards, Andreas 2019-11-27 12:10 GMT+01:00 Andreas Kersting<r-devel at akersting.de>:
Hi,
I am facing a very weird problem with parallel::mclapply. I have a script which does some data wrangling on an input dataset in parallel and then writes the results to disk. I have been using this script daily for more than one year always on an EC2 instance launched from the same AMI (no updates installed after launch) and processed thousands of different input data sets successfully. I now have an input dataset for which I face the following bug:
The basic outline of the problematic section of the script:
# parts is a data.table with 88 rows
mc_ret <- parallel::mclapply(sample.int(nrow(parts)), function(i) {
# do some data wrangling and write the result to a file
# ...
print(paste0("part ", i, " written successfully."))
return(TRUE)
}, mc.preschedule = FALSE, mc.cores = 2L)
str(mc_ret)
Expected output: "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed once. All outputs are created successfully.
Actual output (see end of the message): "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed. All outputs are created successfully. So far so good.
But then "part i written successfully." it is printed another X times, for values of i for which it was already printed. This output is intermingled with X-1 times the following error message:
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
and Y times the message "Execution halted". mc_ret is printed again, now being a list of length 85, with the first element being TRUE and all other elements being NULL. X and Y vary from run to run.
Now to the main problem: I tried very hard to create a reproducible example, but I failed. What I observed:
- The output is (and has always been) written to path which is on an NFS share. If I instead write to a path on a local disk it will work.
- The script is invoked using Rscript. If I instead source it from an interactive R session it works. There are at least two more people who have observed this: https://stackoverflow.com/questions/51986674/mclapply-sendmaster-error-only-with-rscript
- Before the call to mclapply the code acquires an exclusive file lock on a dedicated lock file, not written to but also on the NFS share. If I remove the code acquiring the lock, the whole script will also work if called using Rscript.
- The problem also occurs for mc.preschedule = TRUE.
- There is no error if I set mc.cores to 1.
- And stressing again: the code works without any changes from Rscript for thousands of other data sets.
Rscript -e "sessionInfo()":
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.2
I know this is a fairly old R version. I have not been able to reproduce the bug with a more recent version, but since it is so difficult to trigger, this does not mean much, I guess. I have looked through the changes made to the code of mclapply since that version and could not find something directly related. I am not even sure if it is a problem in the parallel package or some other (memory) bug. What strikes me is that others have observed a very similar error when using Rscript but not when using an interactive R session, just like I do.
I am not expecting a fix based on the information I provide, but maybe someone has some thoughts on this!?
Regards,
Andreas
Actual output:
[1] "part 51 written successfully."
[1] "part 30 written successfully."
[1] "part 32 written successfully."
[1] "part 48 written successfully."
[1] "part 63 written successfully."
[1] "part 5 written successfully."
[1][1] "part 14 written successfully." "part 18 written successfully."
[1] "part 38 written successfully."
[1] "part 11 written successfully."
[1] "part 68 written successfully."
[1] "part 45 written successfully."
[1] "part 88 written successfully."
[1] "part 36 written successfully."
[1] "part 44 written successfully."
[1] "part 55 written successfully."
[1] "part 26 written successfully."
[1] "part 37 written successfully."
[1] "part 22 written successfully."
[1] "part 13 written successfully."
[1] "part 67 written successfully."
[1] "part 10 written successfully."
[1] "part 24 written successfully."
[1] "part 20 written successfully."
[1] "part 74 written successfully."
[1] "part 50 written successfully."
[1] "part 2 written successfully."
[1] "part 17 written successfully."
[1] "part 66 written successfully."
[1] "part 61 written successfully."
[1][1] "part 7 written successfully." "part 31 written successfully."
[1] "part 70 written successfully."
[1] "part 71 written successfully."
[1][1] "part 76 written successfully."
"part 42 written successfully."
[1][1] "part 72 written successfully." "part 12 written successfully."
[1] "part 39 written successfully."
[1] "part 58 written successfully."
[1] "part 87 written successfully."
[1] "part 81 written successfully."
[1] "part 3 written successfully."
[1] "part 29 written successfully."
[1] "part 82 written successfully."
[1] "part 54 written successfully."
[1] "part 23 written successfully."
[1] "part 56 written successfully."
[1] "part 73 written successfully."
[1] "part 27 written successfully."
[1] "part 64 written successfully."
[1] "part 46 written successfully."
[1] "part 80 written successfully."
[1] "part 53 written successfully."
[1] "part 43 written successfully."
[1] "part 47 written successfully."
[1] "part 57 written successfully."
[1] "part 75 written successfully."
[1] "part 84 written successfully."
[1] "part 85 written successfully."
[1][1] "part 86 written successfully."
"part 19 written successfully."
[1][1] "part 33 written successfully." "part 34 written successfully."
[1] "part 21 written successfully."
[1] "part 40 written successfully."
[1][1] "part 4 written successfully."
"part 28 written successfully."
[1] "part 62 written successfully."
[1] "part 16 written successfully."
[1] "part 79 written successfully."
[1] "part 6 written successfully."
[1] "part 77 written successfully."
[1] "part 65 written successfully."
[1] "part 60 written successfully."
[1] "part 78 written successfully."
[1] "part 1 written successfully."
[1] "part 83 written successfully."
[1] "part 9 written successfully."
[1] "part 15 written successfully."
[1][1] "part 41 written successfully."
"part 35 written successfully."
[1][1] "part 8 written successfully." "part 69 written successfully."
[1] "part 59 written successfully."
[1] "part 25 written successfully."
[1] "part 49 written successfully."
[1] "part 52 written successfully."
List of 88
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
[1] "part 31 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> lapply -> FUN -> mcparallel -> sendMaster
[1] "part 71 written successfully."
[1] "part 35 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 42 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 4 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1][1] "part 30 written successfully." "part 40 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 41 written successfully."
[1] "part 80 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 45 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 82 written successfully."
[1] "part 54 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 59 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 27 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 36 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 44 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 83 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 61 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 70 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 66 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 10 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 53 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 55 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 15 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 50 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 5 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 48 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 32 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 69 written successfully."
Execution halted
[1] "part 52 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 77 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 24 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 56 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 6 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 62 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 33 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 16 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 34 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 74 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 2 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 19 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 9 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 11 written successfully."
[1] "part 28 written successfully."
[1] "part 29 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 17 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 65 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 68 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 43 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 38 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 1 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 57 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 8 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 25 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1]List of 85
$ : logi TRUE
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Hi Andreas, thank you very much, good job finding it was EBADF. Now the question is why the pipe has been closed prematurely; it could be accidentally by R (a race condition in the cleanup code in fork.c) or possibly by some other code running in the same process (maybe the R program itself or some other code it runs). Maybe we can take this off the list and come back when we know the cause or have it fixed. It would help a lot if you could try with R built from source, with optimizations disabled to get more accurate debug symbols (e.g. env CFLAGS="-Wall -O0 -gdwarf-2 -g3" CXXFLAGS="-Wall -O0 -gdwarf-2 -g3" ./configure), and with MC_DEBUG defined in fork.c - line 26. Ideally in R-devel, so that we are sure the problem still exists. The debug messages should give a hint whether it was R (fork.c) that closed the pipe and why. Maybe you could also add a debug message to close_fds_child_ci() to see if it was closed there. Maybe you could find this out even in your current debugging setup via breakpoints and backtraces, but I think it may be easier to build from source with these debugging messages. Also if you could send me a complete example I could run that causes this on your system, that would be nice (even if it didn't cause the problem on my system). Thanks Tomas
On 11/28/19 6:35 AM, Andreas Kersting wrote:
Hi Tomas, Thanks for your prompt reply and your offer to help. I might need to get back to this since I am not too experienced in debugging these kinds of issues. Anyway, I gave it a try and I think I have found the immediate cause: I installed the debug symbols (r-base-core-dbg), placed https://github.com/wch/r-source/blob/tags/R-3-5-2/src/library/parallel/src/fork.c in cwd and changed the wrapper code to: mc_ret <- parallel::mclapply(seq_len(nrow(parts)), function(i) { # we fail for the input resulting in parts having 85 rows if (nrow(parts) == 85L && !file.exists(as.character(Sys.getpid()))) { file.create(as.character(Sys.getpid())) print(Sys.getpid()) Sys.sleep(30) } # ... return(TRUE) }, mc.preschedule = TRUE, mc.cores = 2L) This way I ended up with only two child processes to which I each attached a debugger. In total I ran about 10 debugging sessions and it was always the second child process failing. The errno after write returned -1 was 9 (EBADF). From what I can see, the reason for this is that the second child tries to write to fd 7, but already during the very beginning of the first invocation of the anonymous function to parallelize, i.e. during Sys.sleep(30), there is no such file descriptor. From this observation I would conclude that it is NOT the code run from that function, i.e. # ..., causing the issue. Let me point out again, that this is NOT the very first invocation of mclapply in this R session. There is at least one previous call to it, which works fine. File descriptors directly after attaching gdb to both child processes during Sys.sleep(30): ### master root at ip-10-0-48-30:~/latest_test# ls -l /proc/22119/fd total 0 lrwx------ 1 root root 64 Nov 28 04:49 0 -> /dev/pts/0 lrwx------ 1 root root 64 Nov 28 04:49 1 -> /dev/pts/0 lrwx------ 1 root root 64 Nov 28 04:49 2 -> /dev/pts/0 lr-x------ 1 root root 64 Nov 28 04:49 3 -> /path/to/script.R lrwx------ 1 root root 64 Nov 28 04:49 4 -> /path/on/nfs/write.lock lr-x------ 1 root root 64 Nov 28 04:49 5 -> 'pipe:[266120]' l-wx------ 1 root root 64 Nov 28 04:49 8 -> 'pipe:[266121]' ### first child (writes to fd 6) (gdb) shell ls -l /proc/22134/fd total 0 lrwx------ 1 root root 64 Nov 28 04:42 0 -> /dev/pts/0 lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0 lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0 lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock l-wx------ 1 root root 64 Nov 28 04:42 6 -> 'pipe:[266120]' l-wx------ 1 root root 64 Nov 28 04:42 8 -> 'pipe:[266121]' ### second child (tries writing to fd 7) (gdb) shell ls -l /proc/22135/fd total 0 lr-x------ 1 root root 64 Nov 28 04:42 0 -> 'pipe:[266123]' lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0 l-wx------ 1 root root 64 Nov 28 04:42 10 -> 'pipe:[266123]' lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0 lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock # debugging mc_send_master in both child processes: ### first child (all OK): (gdb) break mc_send_master Breakpoint 1 at 0x7f5b2687b0a0: file fork.c, line 681. (gdb) c Continuing. Breakpoint 1, mc_send_master (what=0x555c6dd2cd00) at fork.c:681 warning: Source file is more recent than executable. 681 { (gdb) info args what = 0x555c6dd2cd00 (gdb) n 682 if (is_master) (gdb) n 684 if (master_fd == -1) (gdb) n 686 if (TYPEOF(what) != RAWSXP) (gdb) n 688 R_xlen_t len = XLENGTH(what); (gdb) n 689 unsigned char *b = RAW(what); (gdb) n 693 if (writerep(master_fd, &len, sizeof(len)) != sizeof(len)) { (gdb) info locals len = 538 b = <optimized out> n = <optimized out> (gdb) s writerep (fildes=6, buf=buf at entry=0x7ffdd6da7db0, nbyte=nbyte at entry=8) at fork.c:653 653 { (gdb) info args fildes = 6 buf = 0x7ffdd6da7db0 nbyte = 8 (gdb) n 654 size_t wbyte = 0; (gdb) n 653 { (gdb) n 657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte); (gdb) n 658 if (w == -1) { (gdb) n 657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte); (gdb) n 658 if (w == -1) { (gdb) n 664 if (w == 0) { (gdb) n 670 wbyte += w; (gdb) n 671 if (wbyte == nbyte) (gdb) n 672 return wbyte; (gdb) n 674 } (gdb) n mc_send_master (what=<optimized out>) at fork.c:699 699 for (R_xlen_t i = 0; i < len; i += n) { (gdb) n 700 n = writerep(master_fd, b + i, len - i); (gdb) n 701 if (n < 1) { (gdb) n 699 for (R_xlen_t i = 0; i < len; i += n) { (gdb) n 707 return ScalarLogical(1); (gdb) n 708 } ### second child (write fails with EBADF): (gdb) break mc_send_master Breakpoint 1 at 0x7f5016f3e0a0: file fork.c, line 681. (gdb) c Continuing. Breakpoint 1, mc_send_master (what=0x563382a71910) at fork.c:681 warning: Source file is more recent than executable. 681 { (gdb) info args what = 0x563382a71910 (gdb) n 682 if (is_master) (gdb) n 684 if (master_fd == -1) (gdb) n 686 if (TYPEOF(what) != RAWSXP) (gdb) n 688 R_xlen_t len = XLENGTH(what); (gdb) n 689 unsigned char *b = RAW(what); (gdb) n 693 if (writerep(master_fd, &len, sizeof(len)) != sizeof(len)) { (gdb) info locals len = 526 b = <optimized out> n = <optimized out> (gdb) s writerep (fildes=7, buf=buf at entry=0x7fff4027ad60, nbyte=nbyte at entry=8) at fork.c:653 653 { (gdb) info args fildes = 7 buf = 0x7fff4027ad60 nbyte = 8 (gdb) n 654 size_t wbyte = 0; (gdb) n 653 { (gdb) n 657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte); (gdb) n 658 if (w == -1) { (gdb) info locals w = <optimized out> wbyte = 0 ptr = 0x7fff4027ad60 "\016\002" (gdb) n 657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte); (gdb) n 658 if (w == -1) { (gdb) n 659 if (errno == EINTR) (gdb) n 674 } (gdb) p __errno_location() $1 = (int *) 0x7f50322cb540 (gdb) x/x $1 0x7f50322cb540: 0x00000009 (gdb) python import errno (gdb) python print(errno.errorcode[9]) EBADF (gdb) n mc_send_master (what=<optimized out>) at fork.c:702 702 close(master_fd); (gdb) n 704 error(_("write error, closing pipe to the master")); (gdb) n 703 master_fd = -1; (gdb) n 704 error(_("write error, closing pipe to the master")); (gdb) n 685 error(_("there is no pipe to the master process")); Does this help in any way? Is there something else I can/should look at? Regards, Andreas 2019-11-27 15:04 GMT+01:00 Tomas Kalibera<tomas.kalibera at gmail.com>:
Hi Andreas, the error is reported when some child process cannot send results to the master process, which originates from an error returned by write() - when write() returns -1 or 0. The logic around the writing has not changed since R 3.5.2. It should not be related to the printing in the child, only to returning the value. The problem may be originating from the execution environment, virtualization, and/or possibly from a lack of robustness in R. To resolve this we need to find out which error was returned and why. Either you can try to create a reproducible example (something I could use to trigger an error on my system and then debug) or to debug on your system (build R from source, ensure the bug is still triggered, then instrument to print the exact error from the OS and where it was detected, etc). In principle you could also try without code instrumentation just using strace. Just from looking at the code in R around the writing I am not seeing any bug there. If you choose to debug on your system I can help with the instrumentation. Best Tomas On 11/27/19 12:40 PM, Andreas Kersting wrote:
Hi again, One important correction of my first message: I misinterpreted the output. Actually in that R session 2 input files were processed one after the other in a loop. The first (with 88 parts went fine). The second (with 85 parts) produced the sendMaster errors and failed. If (in a new session via Rscript) I only process the second input file it will work. The other observations on R vs Rscript, NFS share etc. still hold. Sorry for this! Regards, Andreas 2019-11-27 12:10 GMT+01:00 Andreas Kersting<r-devel at akersting.de>:
Hi,
I am facing a very weird problem with parallel::mclapply. I have a script which does some data wrangling on an input dataset in parallel and then writes the results to disk. I have been using this script daily for more than one year always on an EC2 instance launched from the same AMI (no updates installed after launch) and processed thousands of different input data sets successfully. I now have an input dataset for which I face the following bug:
The basic outline of the problematic section of the script:
# parts is a data.table with 88 rows
mc_ret <- parallel::mclapply(sample.int(nrow(parts)), function(i) {
# do some data wrangling and write the result to a file
# ...
print(paste0("part ", i, " written successfully."))
return(TRUE)
}, mc.preschedule = FALSE, mc.cores = 2L)
str(mc_ret)
Expected output: "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed once. All outputs are created successfully.
Actual output (see end of the message): "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed. All outputs are created successfully. So far so good.
But then "part i written successfully." it is printed another X times, for values of i for which it was already printed. This output is intermingled with X-1 times the following error message:
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
and Y times the message "Execution halted". mc_ret is printed again, now being a list of length 85, with the first element being TRUE and all other elements being NULL. X and Y vary from run to run.
Now to the main problem: I tried very hard to create a reproducible example, but I failed. What I observed:
- The output is (and has always been) written to path which is on an NFS share. If I instead write to a path on a local disk it will work.
- The script is invoked using Rscript. If I instead source it from an interactive R session it works. There are at least two more people who have observed this: https://stackoverflow.com/questions/51986674/mclapply-sendmaster-error-only-with-rscript
- Before the call to mclapply the code acquires an exclusive file lock on a dedicated lock file, not written to but also on the NFS share. If I remove the code acquiring the lock, the whole script will also work if called using Rscript.
- The problem also occurs for mc.preschedule = TRUE.
- There is no error if I set mc.cores to 1.
- And stressing again: the code works without any changes from Rscript for thousands of other data sets.
Rscript -e "sessionInfo()":
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.2
I know this is a fairly old R version. I have not been able to reproduce the bug with a more recent version, but since it is so difficult to trigger, this does not mean much, I guess. I have looked through the changes made to the code of mclapply since that version and could not find something directly related. I am not even sure if it is a problem in the parallel package or some other (memory) bug. What strikes me is that others have observed a very similar error when using Rscript but not when using an interactive R session, just like I do.
I am not expecting a fix based on the information I provide, but maybe someone has some thoughts on this!?
Regards,
Andreas
Actual output:
[1] "part 51 written successfully."
[1] "part 30 written successfully."
[1] "part 32 written successfully."
[1] "part 48 written successfully."
[1] "part 63 written successfully."
[1] "part 5 written successfully."
[1][1] "part 14 written successfully." "part 18 written successfully."
[1] "part 38 written successfully."
[1] "part 11 written successfully."
[1] "part 68 written successfully."
[1] "part 45 written successfully."
[1] "part 88 written successfully."
[1] "part 36 written successfully."
[1] "part 44 written successfully."
[1] "part 55 written successfully."
[1] "part 26 written successfully."
[1] "part 37 written successfully."
[1] "part 22 written successfully."
[1] "part 13 written successfully."
[1] "part 67 written successfully."
[1] "part 10 written successfully."
[1] "part 24 written successfully."
[1] "part 20 written successfully."
[1] "part 74 written successfully."
[1] "part 50 written successfully."
[1] "part 2 written successfully."
[1] "part 17 written successfully."
[1] "part 66 written successfully."
[1] "part 61 written successfully."
[1][1] "part 7 written successfully." "part 31 written successfully."
[1] "part 70 written successfully."
[1] "part 71 written successfully."
[1][1] "part 76 written successfully."
"part 42 written successfully."
[1][1] "part 72 written successfully." "part 12 written successfully."
[1] "part 39 written successfully."
[1] "part 58 written successfully."
[1] "part 87 written successfully."
[1] "part 81 written successfully."
[1] "part 3 written successfully."
[1] "part 29 written successfully."
[1] "part 82 written successfully."
[1] "part 54 written successfully."
[1] "part 23 written successfully."
[1] "part 56 written successfully."
[1] "part 73 written successfully."
[1] "part 27 written successfully."
[1] "part 64 written successfully."
[1] "part 46 written successfully."
[1] "part 80 written successfully."
[1] "part 53 written successfully."
[1] "part 43 written successfully."
[1] "part 47 written successfully."
[1] "part 57 written successfully."
[1] "part 75 written successfully."
[1] "part 84 written successfully."
[1] "part 85 written successfully."
[1][1] "part 86 written successfully."
"part 19 written successfully."
[1][1] "part 33 written successfully." "part 34 written successfully."
[1] "part 21 written successfully."
[1] "part 40 written successfully."
[1][1] "part 4 written successfully."
"part 28 written successfully."
[1] "part 62 written successfully."
[1] "part 16 written successfully."
[1] "part 79 written successfully."
[1] "part 6 written successfully."
[1] "part 77 written successfully."
[1] "part 65 written successfully."
[1] "part 60 written successfully."
[1] "part 78 written successfully."
[1] "part 1 written successfully."
[1] "part 83 written successfully."
[1] "part 9 written successfully."
[1] "part 15 written successfully."
[1][1] "part 41 written successfully."
"part 35 written successfully."
[1][1] "part 8 written successfully." "part 69 written successfully."
[1] "part 59 written successfully."
[1] "part 25 written successfully."
[1] "part 49 written successfully."
[1] "part 52 written successfully."
List of 88
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
[1] "part 31 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> lapply -> FUN -> mcparallel -> sendMaster
[1] "part 71 written successfully."
[1] "part 35 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 42 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 4 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1][1] "part 30 written successfully." "part 40 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 41 written successfully."
[1] "part 80 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 45 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 82 written successfully."
[1] "part 54 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 59 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 27 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 36 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 44 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 83 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 61 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 70 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 66 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 10 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 53 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 55 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 15 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 50 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 5 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 48 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 32 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 69 written successfully."
Execution halted
[1] "part 52 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 77 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 24 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 56 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 6 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 62 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 33 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 16 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 34 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 74 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 2 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 19 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 9 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 11 written successfully."
[1] "part 28 written successfully."
[1] "part 29 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 17 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 65 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 68 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 43 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 38 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 1 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 57 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 8 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 25 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1]List of 85
$ : logi TRUE
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
$ : NULL
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Hi Tomas,
I rebuild R (v3.5.2 for now, R-devel to follow) from the Debian package with MC_DEBUG defined and hopefully also with "-Wall -O0 -gdwarf-2 -g3", though I still have to verify this.
Below is the output. I think it is a total of two mclapply invocations in this R session, the failing one starting around the lines "[1] 15381" and "[1] 15382". The "Error in partDF::write_partDF ..." is because the script/package checks the return value of mclapply and detects that it is not a list of length 85 with only the elements "TRUE".
Regarding sending you the complete example: I first have to figure out if this is possible at all, because it would involve data of a client.
Regards,
Andreas
parent[15366] created pipes: comm (6->5), sir (8->7)
parent registers new child 15379
child process 15379 started
parent[15366] created pipes: comm (7->6), sir (10->9)
parent registers new child 15380
child process 15380 started
select_children: added child 15380 (6)
select_children: added child 15379 (5)
select_children: maxfd=6, wlen=2, wcount=2, timeout=-1.000000
child 15380: send_master (550 bytes)
sr = 1
- read select 1 children: 15380
child 15380: 'mcexit' called
child 15380 is waiting for permission to exit
read_child_ci(15380) - read length returned 8
read_child_ci(15380) - read 550 at 0 returned 550
select_children: added child 15380 (6)
select_children: added child 15379 (5)
select_children: maxfd=6, wlen=2, wcount=2, timeout=-1.000000
sr = 1
- read select 1 children: 15380
read_child_ci(15380) - read length returned 8
detached child 15380 (signal 10)
child process 15380 got SIGUSR1; child_exit_status=-1
child 15380: exiting
select_children: added child 15379 (5)
select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
child 15380 terminated with exit status 0
child 15379: send_master (550 bytes)
sr = 1
- read select 1 children: 15379
read_child_ci(15379) - read length returned 8
read_child_ci(15379) - read 550 at 0 returned 550
select_children: added child 15379 (5)
select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
child 15379: 'mcexit' called
child 15379 is waiting for permission to exit
sr = 1
- read select 1 children: 15379
read_child_ci(15379) - read length returned 8
detached child 15379 (signal 10)
child process 15379 got SIGUSR1; child_exit_status=-1
child 15379: exiting
removing waited-for child 15380 from the list
killed detached child 15379 (signal 15)
removing waited-for child 2147483647 from the list
child 15379 terminated with exit status 0
removing waited-for child 15379 from the list
parent[15366] created pipes: comm (6->5), sir (8->7)
parent registers new child 15381
child process 15381 started
parent[15366] created pipes: comm (7->6), sir (10->9)
[1] 15381
parent registers new child 15382
child process 15382 started
select_children: added child 15382 (6)
select_children: added child 15381 (5)
select_children: maxfd=6, wlen=2, wcount=2, timeout=-1.000000
sr = 1
- read select 1 children: 15382
read_child_ci(15382) - read length returned 0
detached child 15382 (signal 10)
child process 15382 got SIGUSR1; child_exit_status=-1
select_children: added child 15381 (5)
select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
[1] 15382
child 15382: send_master (526 bytes)
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> lapply -> FUN -> sendMaster
child 15382: 'mcexit' called
child 15382: exiting
child 15382 terminated with exit status 1
child 15381: send_master (538 bytes)
sr = 1
- read select 1 children: 15381
read_child_ci(15381) - read length returned 8
read_child_ci(15381) - read 538 at 0 returned 538
child 15381: 'mcexit' called
child 15381 is waiting for permission to exit
select_children: added child 15381 (5)
select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
sr = 1
- read select 1 children: 15381
read_child_ci(15381) - read length returned 8
detached child 15381 (signal 10)
child process 15381 got SIGUSR1; child_exit_status=-1
child 15381: exiting
removing waited-for child 15382 from the list
killed detached child 15381 (signal 15)
removing waited-for child 2147483647 from the list
Error in partDF::write_partDF(X, out_path, c("Date", "Panel", "Type", :
at least one core did not return from parllel write; maybe it was killed (be the Linux Out of Memory Killer ?) or there was a fatal error in the forked process(es)
Calls: lapply -> FUN -> write_partDF_repo -> <Anonymous>
Execution halted
killed detached child 15381 (signal 9)
child 15381 terminated with exit status 0
removing waited-for child 15381 from the list
process 15366 parallel shutdown ok
2019-11-28 10:40 GMT+01:00 Tomas Kalibera<tomas.kalibera at gmail.com>:
Hi Andreas, thank you very much, good job finding it was EBADF. Now the question is=20 why the pipe has been closed prematurely; it could be accidentally by R=20 (a race condition in the cleanup code in fork.c) or possibly by some=20 other code running in the same process (maybe the R program itself or=20 some other code it runs). Maybe we can take this off the list and come=20 back when we know the cause or have it fixed. It would help a lot if you could try with R built from source, with=20 optimizations disabled to get more accurate debug symbols (e.g. env=20 CFLAGS=3D"-Wall -O0 -gdwarf-2 -g3" CXXFLAGS=3D"-Wall -O0 -gdwarf-2 -g3"=20 =2E/configure), and with MC_DEBUG defined in fork.c - line 26. Ideally in= =20 R-devel, so that we are sure the problem still exists. The debug=20 messages should give a hint whether it was R (fork.c) that closed the=20 pipe and why. Maybe you could also add a debug message to=20 close_fds_child_ci() to see if it was closed there. Maybe you could find = this out even in your current debugging setup via breakpoints and=20 backtraces, but I think it may be easier to build from source with these = debugging messages. Also if you could send me a complete example I could run that causes=20 this on your system, that would be nice (even if it didn't cause the=20 problem on my system). Thanks Tomas On 11/28/19 6:35 AM, Andreas Kersting wrote:
Hi Tomas, Thanks for your prompt reply and your offer to help. I might need to ge=
t back to this since I am not too experienced in debugging these kinds of= issues. Anyway, I gave it a try and I think I have found the immediate c= ause:
I installed the debug symbols (r-base-core-dbg), placed https://github.=
com/wch/r-source/blob/tags/R-3-5-2/src/library/parallel/src/fork.c in cwd= and changed the wrapper code to:
mc_ret <- parallel::mclapply(seq_len(nrow(parts)), function(i) {
# we fail for the input resulting in parts having 85 rows
if (nrow(parts) =3D=3D 85L && !file.exists(as.character(Sys.getpid=
()))) {
file.create(as.character(Sys.getpid()))
print(Sys.getpid())
Sys.sleep(30)
}
# ...
return(TRUE)
}, mc.preschedule =3D TRUE, mc.cores =3D 2L)
This way I ended up with only two child processes to which I each attac=
hed a debugger. In total I ran about 10 debugging sessions and it was alw= ays the second child process failing. The errno after write returned -1 w= as 9 (EBADF).
From what I can see, the reason for this is that the second child trie=
s to write to fd 7, but already during the very beginning of the first in= vocation of the anonymous function to parallelize, i.e. during Sys.sleep(= 30), there is no such file descriptor. From this observation I would conc= lude that it is NOT the code run from that function, i.e. # ..., causing= the issue. Let me point out again, that this is NOT the very first invoc= ation of mclapply in this R session. There is at least one previous call = to it, which works fine.
File descriptors directly after attaching gdb to both child processes d=
uring Sys.sleep(30):
### master
root at ip-10-0-48-30:~/latest_test# ls -l /proc/22119/fd
total 0
lrwx------ 1 root root 64 Nov 28 04:49 0 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 28 04:49 1 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 28 04:49 2 -> /dev/pts/0
lr-x------ 1 root root 64 Nov 28 04:49 3 -> /path/to/script.R
lrwx------ 1 root root 64 Nov 28 04:49 4 -> /path/on/nfs/write.lock
lr-x------ 1 root root 64 Nov 28 04:49 5 -> 'pipe:[266120]'
l-wx------ 1 root root 64 Nov 28 04:49 8 -> 'pipe:[266121]'
### first child (writes to fd 6)
(gdb) shell ls -l /proc/22134/fd
total 0
lrwx------ 1 root root 64 Nov 28 04:42 0 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0
lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R
lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock
l-wx------ 1 root root 64 Nov 28 04:42 6 -> 'pipe:[266120]'
l-wx------ 1 root root 64 Nov 28 04:42 8 -> 'pipe:[266121]'
### second child (tries writing to fd 7)
(gdb) shell ls -l /proc/22135/fd
total 0
lr-x------ 1 root root 64 Nov 28 04:42 0 -> 'pipe:[266123]'
lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0
l-wx------ 1 root root 64 Nov 28 04:42 10 -> 'pipe:[266123]'
lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0
lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R
lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock
# debugging mc_send_master in both child processes:
### first child (all OK):
(gdb) break mc_send_master
Breakpoint 1 at 0x7f5b2687b0a0: file fork.c, line 681.
(gdb) c
Continuing.
Breakpoint 1, mc_send_master (what=3D0x555c6dd2cd00) at fork.c:681
warning: Source file is more recent than executable.
681 {
(gdb) info args
what =3D 0x555c6dd2cd00
(gdb) n
682 if (is_master)
(gdb) n
684 if (master_fd =3D=3D -1)
(gdb) n
686 if (TYPEOF(what) !=3D RAWSXP)
(gdb) n
688 R_xlen_t len =3D XLENGTH(what);
(gdb) n
689 unsigned char *b =3D RAW(what);
(gdb) n
693 if (writerep(master_fd, &len, sizeof(len)) !=3D sizeof(len)) {
(gdb) info locals
len =3D 538
b =3D <optimized out>
n =3D <optimized out>
(gdb) s
writerep (fildes=3D6, buf=3Dbuf at entry=3D0x7ffdd6da7db0, nbyte=3Dnbyte at e=
ntry=3D8)
at fork.c:653
653 {
(gdb) info args
fildes =3D 6
buf =3D 0x7ffdd6da7db0
nbyte =3D 8
(gdb) n
654 size_t wbyte =3D 0;
(gdb) n
653 {
(gdb) n
657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
(gdb) n
658 if (w =3D=3D -1) {
(gdb) n
657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
(gdb) n
658 if (w =3D=3D -1) {
(gdb) n
664 if (w =3D=3D 0) {
(gdb) n
670 wbyte +=3D w;
(gdb) n
671 if (wbyte =3D=3D nbyte)
(gdb) n
672 return wbyte;
(gdb) n
674 }
(gdb) n
mc_send_master (what=3D<optimized out>) at fork.c:699
699 for (R_xlen_t i =3D 0; i < len; i +=3D n) {
(gdb) n
700 n =3D writerep(master_fd, b + i, len - i);
(gdb) n
701 if (n < 1) {
(gdb) n
699 for (R_xlen_t i =3D 0; i < len; i +=3D n) {
(gdb) n
707 return ScalarLogical(1);
(gdb) n
708 }
### second child (write fails with EBADF):
(gdb) break mc_send_master
Breakpoint 1 at 0x7f5016f3e0a0: file fork.c, line 681.
(gdb) c
Continuing.
Breakpoint 1, mc_send_master (what=3D0x563382a71910) at fork.c:681
warning: Source file is more recent than executable.
681 {
(gdb) info args
what =3D 0x563382a71910
(gdb) n
682 if (is_master)
(gdb) n
684 if (master_fd =3D=3D -1)
(gdb) n
686 if (TYPEOF(what) !=3D RAWSXP)
(gdb) n
688 R_xlen_t len =3D XLENGTH(what);
(gdb) n
689 unsigned char *b =3D RAW(what);
(gdb) n
693 if (writerep(master_fd, &len, sizeof(len)) !=3D sizeof(len)) {
(gdb) info locals
len =3D 526
b =3D <optimized out>
n =3D <optimized out>
(gdb) s
writerep (fildes=3D7, buf=3Dbuf at entry=3D0x7fff4027ad60, nbyte=3Dnbyte at e=
ntry=3D8)
at fork.c:653
653 {
(gdb) info args
fildes =3D 7
buf =3D 0x7fff4027ad60
nbyte =3D 8
(gdb) n
654 size_t wbyte =3D 0;
(gdb) n
653 {
(gdb) n
657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
(gdb) n
658 if (w =3D=3D -1) {
(gdb) info locals
w =3D <optimized out>
wbyte =3D 0
ptr =3D 0x7fff4027ad60 "\016\002"
(gdb) n
657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
(gdb) n
658 if (w =3D=3D -1) {
(gdb) n
659 if (errno =3D=3D EINTR)
(gdb) n
674 }
(gdb) p __errno_location()
$1 =3D (int *) 0x7f50322cb540
(gdb) x/x $1
0x7f50322cb540: 0x00000009
(gdb) python import errno
(gdb) python print(errno.errorcode[9])
EBADF
(gdb) n
mc_send_master (what=3D<optimized out>) at fork.c:702
702 close(master_fd);
(gdb) n
704 error(_("write error, closing pipe to the master"));
(gdb) n
703 master_fd =3D -1;
(gdb) n
704 error(_("write error, closing pipe to the master"));
(gdb) n
685 error(_("there is no pipe to the master process"));
Does this help in any way?
Is there something else I can/should look at?
Regards,
Andreas
2019-11-27 15:04 GMT+01:00 Tomas Kalibera<tomas.kalibera at gmail.com>:
Hi Andreas, the error is reported when some child process cannot send results to t=
he master process, which originates from an error returned by write() - w= hen write() returns -1 or 0. The logic around the writing has not changed= since R 3.5.2. It should not be related to the printing in the child, on= ly to returning the value. The problem may be originating from the execut= ion environment, virtualization, and/or possibly from a lack of robustnes= s in R. To resolve this we need to find out which error was returned and = why. Either you can try to create a reproducible example (something I cou= ld use to trigger an error on my system and then debug) or to debug on yo= ur system (build R from source, ensure the bug is still triggered, then i= nstrument to print the exact error from the OS and where it was detected,= etc). In principle you could also try without code instrumentation just = using strace. Just from looking at the code in R around the writing I am = not seeing any bug there. If you choose to debug on your system I can hel= p with the instrumentation.
Best Tomas On 11/27/19 12:40 PM, Andreas Kersting wrote:
Hi again, One important correction of my first message: I misinterpreted the ou=
tput. Actually in that R session 2 input files were processed one after t= he other in a loop. The first (with 88 parts went fine). The second (with= 85 parts) produced the sendMaster errors and failed. If (in a new sessio= n via Rscript) I only process the second input file it will work. The oth= er observations on R vs Rscript, NFS share etc. still hold.
Sorry for this! Regards, Andreas 2019-11-27 12:10 GMT+01:00 Andreas Kersting<r-devel at akersting.de>:
Hi, I am facing a very weird problem with parallel::mclapply. I have a s=
cript which does some data wrangling on an input dataset in parallel and = then writes the results to disk. I have been using this script daily for = more than one year always on an EC2 instance launched from the same AMI (= no updates installed after launch) and processed thousands of different i= nput data sets successfully. I now have an input dataset for which I face= the following bug:
The basic outline of the problematic section of the script:
# parts is a data.table with 88 rows
mc_ret <- parallel::mclapply(sample.int(nrow(parts)), function(i) {
# do some data wrangling and write the result to a file
# ...
print(paste0("part ", i, " written successfully."))
return(TRUE)
}, mc.preschedule =3D FALSE, mc.cores =3D 2L)
str(mc_ret)
Expected output: "part i written successfully." is printed 88 times,=
once for each value of i. mc_ret is a list of length 88, each element be= ing TRUE. Its structure is printed once. All outputs are created successf= ully.
Actual output (see end of the message): "part i written successfully=
=2E" is printed 88 times, once for each value of i. mc_ret is a list of l= ength 88, each element being TRUE. Its structure is printed. All outputs = are created successfully. So far so good.
But then "part i written successfully." it is printed another X time=
s, for values of i for which it was already printed. This output is inter= mingled with X-1 times the following error message:
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
and Y times the message "Execution halted". mc_ret is printed again,=
now being a list of length 85, with the first element being TRUE and all= other elements being NULL. X and Y vary from run to run.
Now to the main problem: I tried very hard to create a reproducible =
example, but I failed. What I observed:
- The output is (and has always been) written to path which is on an=
NFS share. If I instead write to a path on a local disk it will work.
- The script is invoked using Rscript. If I instead source it from a=
n interactive R session it works. There are at least two more people who = have observed this: https://stackoverflow.com/questions/51986674/mclapply= -sendmaster-error-only-with-rscript
- Before the call to mclapply the code acquires an exclusive file lo=
ck on a dedicated lock file, not written to but also on the NFS share. If= I remove the code acquiring the lock, the whole script will also work if= called using Rscript.
- The problem also occurs for mc.preschedule =3D TRUE. - There is no error if I set mc.cores to 1. - And stressing again: the code works without any changes from Rscri=
pt for thousands of other data sets.
Rscript -e "sessionInfo()": R version 3.5.2 (2018-12-20) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.2 LTS Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so locale: [1] LC_CTYPE=3DC.UTF-8 LC_NUMERIC=3DC LC_TIME=3DC=
=2EUTF-8
[4] LC_COLLATE=3DC.UTF-8 LC_MONETARY=3DC.UTF-8 LC_MESSAGES=
=3DC.UTF-8
[7] LC_PAPER=3DC.UTF-8 LC_NAME=3DC LC_ADDRESS=3D=
C
[10] LC_TELEPHONE=3DC LC_MEASUREMENT=3DC.UTF-8 LC_IDENTIFICA=
TION=3DC
attached base packages: [1] stats graphics grDevices utils datasets methods base=
loaded via a namespace (and not attached): [1] compiler_3.5.2 I know this is a fairly old R version. I have not been able to repro=
duce the bug with a more recent version, but since it is so difficult to = trigger, this does not mean much, I guess. I have looked through the chan= ges made to the code of mclapply since that version and could not find so= mething directly related. I am not even sure if it is a problem in the pa= rallel package or some other (memory) bug. What strikes me is that others= have observed a very similar error when using Rscript but not when using= an interactive R session, just like I do.
I am not expecting a fix based on the information I provide, but may=
be someone has some thoughts on this!?
Regards, Andreas Actual output: [1] "part 51 written successfully." [1] "part 30 written successfully." [1] "part 32 written successfully." [1] "part 48 written successfully." [1] "part 63 written successfully." [1] "part 5 written successfully." [1][1] "part 14 written successfully." "part 18 written successfully=
=2E"
[1] "part 38 written successfully." [1] "part 11 written successfully." [1] "part 68 written successfully." [1] "part 45 written successfully." [1] "part 88 written successfully." [1] "part 36 written successfully." [1] "part 44 written successfully." [1] "part 55 written successfully." [1] "part 26 written successfully." [1] "part 37 written successfully." [1] "part 22 written successfully." [1] "part 13 written successfully." [1] "part 67 written successfully." [1] "part 10 written successfully." [1] "part 24 written successfully." [1] "part 20 written successfully." [1] "part 74 written successfully." [1] "part 50 written successfully." [1] "part 2 written successfully." [1] "part 17 written successfully." [1] "part 66 written successfully." [1] "part 61 written successfully." [1][1] "part 7 written successfully." "part 31 written successfully.=
"
[1] "part 70 written successfully." [1] "part 71 written successfully." [1][1] "part 76 written successfully." "part 42 written successfully." [1][1] "part 72 written successfully." "part 12 written successfully=
=2E"
[1] "part 39 written successfully." [1] "part 58 written successfully." [1] "part 87 written successfully." [1] "part 81 written successfully." [1] "part 3 written successfully." [1] "part 29 written successfully." [1] "part 82 written successfully." [1] "part 54 written successfully." [1] "part 23 written successfully." [1] "part 56 written successfully." [1] "part 73 written successfully." [1] "part 27 written successfully." [1] "part 64 written successfully." [1] "part 46 written successfully." [1] "part 80 written successfully." [1] "part 53 written successfully." [1] "part 43 written successfully." [1] "part 47 written successfully." [1] "part 57 written successfully." [1] "part 75 written successfully." [1] "part 84 written successfully." [1] "part 85 written successfully." [1][1] "part 86 written successfully." "part 19 written successfully." [1][1] "part 33 written successfully." "part 34 written successfully=
=2E"
[1] "part 21 written successfully." [1] "part 40 written successfully." [1][1] "part 4 written successfully." "part 28 written successfully." [1] "part 62 written successfully." [1] "part 16 written successfully." [1] "part 79 written successfully." [1] "part 6 written successfully." [1] "part 77 written successfully." [1] "part 65 written successfully." [1] "part 60 written successfully." [1] "part 78 written successfully." [1] "part 1 written successfully." [1] "part 83 written successfully." [1] "part 9 written successfully." [1] "part 15 written successfully." [1][1] "part 41 written successfully." "part 35 written successfully." [1][1] "part 8 written successfully." "part 69 written successfully.=
"
[1] "part 59 written successfully."
[1] "part 25 written successfully."
[1] "part 49 written successfully."
[1] "part 52 written successfully."
List of 88
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
$ : logi TRUE
[1] "part 31 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> lapply -> FUN -> mcparallel -> send=
Master
[1] "part 71 written successfully."
[1] "part 35 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 42 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Execution halted
[1] "part 4 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1][1] "part 30 written successfully." "part 40 written successfully=
=2E"
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Execution halted
[1] "part 41 written successfully."
[1] "part 80 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 45 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Execution halted
[1] "part 82 written successfully."
[1] "part 54 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 59 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 27 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Execution halted
[1] "part 36 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Execution halted
[1] "part 44 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Execution halted
Execution halted
[1] "part 83 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 61 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 70 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 66 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 10 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Execution halted
Execution halted
[1] "part 53 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 55 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 15 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 50 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 5 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Execution halted
Execution halted
[1] "part 48 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 32 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 69 written successfully."
Execution halted
[1] "part 52 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 77 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Execution halted
Execution halted
[1] "part 24 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 56 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 6 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 62 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 33 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Execution halted
[1] "part 16 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 34 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 74 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Execution halted
[1] "part 2 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 19 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Execution halted
[1] "part 9 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 11 written successfully."
[1] "part 28 written successfully."
[1] "part 29 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 17 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 65 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
Execution halted
Execution halted
[1] "part 68 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 43 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 38 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 1 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 57 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 8 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1] "part 25 written successfully."
Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
ster
[1]List of 85 $ : logi TRUE $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL $ : NULL
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel