Hello
I'm trying to set up a cluster with three nodes, using cluster type = "SOCK".
I have a problem specifying the path to Rscript, which differs between the two
machines. I tried giving the full information (filling in list(host,
rscript) as in the makeCluster example):
- When I use only the local host (host 212) and give the complete Rscript path,
it does not work:
ssh: Could not resolve hostname /usr/bin/Rscript: Name or service not known
Workaround: don't give the full info, only the IP.
- When I add a second host (host 210):
  - giving the full info: same problem as above: ssh: Could not resolve
    hostname /usr/bin/Rscript: Name or service not known
  - giving only the IP: does not work, because it looks for the same Rscript
    path on both machines, and the path differs between them.
I would be really thankful if you could give me advice and tell me whether I am
doing something wrong. Thanks a lot!
See:
library(snow)
# local host only
host212Full <- list(host = "dsge@192.100.100.212",
                    rscript = "/usr/lib64/R/bin/Rscript")
makeCluster(c(rep(host212Full, 2)), type = "SOCK")
dsge@192.100.100.212's password:
ssh: Could not resolve hostname /usr/lib64/R/bin/Rscript: Name or service not known
# but it exists!
system("ls /usr/lib64/R/bin/Rs*")
/usr/lib64/R/bin/Rscript
host212 <- "dsge@192.100.100.212"
cl <- makeCluster(c(rep(host212, 2)), type = "SOCK")
stopCluster(cl)
# this works
### adding an external machine
# just by IP:
cl2 <- makeCluster(c("dsge@192.100.100.212", "dsge@192.100.100.212",
                     "mat@192.100.100.210"), type = "SOCK")
bash: /usr/lib64/R/bin/Rscript: Aucun fichier ou dossier de ce type
(English: No such file or directory)
# with full info
host212Full <- list(host = "dsge@192.100.100.212",
                    rscript = "/usr/lib64/R/bin/Rscript")
host210 <- list(host = "mat@192.100.100.210", rscript = "/usr/bin/Rscript")
cl2 <- makeCluster(c(host210, rep(host212Full, 2)), type = "SOCK")
mat@192.100.100.210's password:
bash: /usr/lib64/R/bin/Rscript: Aucun fichier ou dossier de ce type
# it did not take into account the specific path for 210!
# note that inverting the order:
cl2 <- makeCluster(c(rep(host212Full, 2), host210), type = "SOCK")
ssh: Could not resolve hostname /usr/lib64/R/bin/Rscript: Name or service not known
# shows the same problem as above
Version infos:
local host (212)
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 2
minor 7.1
year 2008
month 06
day 23
svn rev 45970
language R
version.string R version 2.7.1 (2008-06-23)
External host (210)
platform i486-pc-linux-gnu
arch i486
os linux-gnu
system i486, linux-gnu
status
major 2
minor 8.1
year 2008
month 12
day 22
svn rev 47281
language R
version.string R version 2.8.1 (2008-12-22)
snow, socket cluster: problem with path to rscript
18 messages · Steve Weston, Matthieu Stigler, Dirk Eddelbuettel +1 more
Hi Matthieu,
On Mon, Apr 13, 2009 at 4:46 AM, Matthieu Stigler
<matthieu.stigler at gmail.com> wrote:
> [snip]
> host212Full <- list(host = "dsge@192.100.100.212", rscript = "/usr/lib64/R/bin/Rscript")
> makeCluster(c(rep(host212Full, 2)), type = "SOCK")
I think the immediate problem is the way that you're creating the first argument to makeCluster. Your code concatenates host212Full with itself, creating a list of length 4 whose elements are not lists. I think you want to create a list of two lists, each of length 2. One way to create the appropriate list (which worked for me on R 2.8.1) is:
makeCluster(rep(list(host212Full), 2), type = "SOCK")
I hope that helps.
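The difference between the two constructions can be checked without starting a cluster. The following is only a sketch in base R; it does not call snow, and the host and path values are simply the ones from the original post. It suggests why ssh may have been asked to resolve the Rscript path as a hostname: in the flattened version the path is just another top-level element.

```r
# Host specification from the original post (values are illustrative only).
host212Full <- list(host = "dsge@192.100.100.212",
                    rscript = "/usr/lib64/R/bin/Rscript")

# rep() on the list itself repeats its elements, so the result is one flat
# list of four character strings; the outer c() does not change that.
flat <- c(rep(host212Full, 2))

# Wrapping the specification in list() first keeps each host spec intact.
nested <- rep(list(host212Full), 2)

length(flat)                  # number of top-level elements in the flat version
names(flat)                   # element names alternate between host and rscript
length(nested)                # number of host specifications
all(sapply(nested, is.list))  # whether every element is itself a list
```

The same reasoning applies to c(host210, rep(host212Full, 2)); list(host210, host212Full, host212Full) keeps the three specifications separate.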
Steve Weston REvolution Computing One Century Tower | 265 Church Street, Suite 1006 New Haven, CT 06510 P: 203-777-7442 x266 | www.revolution-computing.com
On Mon, 13 Apr 2009, Steve Weston wrote:
> [snip]
> One way to create the appropriate list (which worked for me on R 2.8.1) is:
> makeCluster(rep(list(host212Full), 2), type = "SOCK")
Thanks. The help page gets this wrong (in an example that is not run). I'll fix that for the next release.
luke
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke at stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
luke at stat.uiowa.edu wrote:
[snip]
OK! You pointed out the right error, Steve, thanks! It is now working for the local computer. However, when I try to use the external computer, it seems to be working, but nothing happens after it asks for the last password:
> library(snow)
> host210 <- list(host = "mat@192.100.100.210", rscript = "/usr/bin/Rscript")
> host212 <- list(host = "dsge@192.100.100.212", rscript = "/usr/lib64/R/bin/Rscript")
>
> cl2 <- makeCluster(list(host212, host212), type = "SOCK")
dsge@192.100.100.212's password:
dsge@192.100.100.212's password:
> stopCluster(cl2)
> host210 <- list(host = "mat@192.100.100.210", rscript = "/usr/bin/Rscript")
> host212 <- list(host = "dsge@192.100.100.212", rscript = "/usr/lib64/R/bin/Rscript")
>
> cl2 <- makeCluster(list(host212, host212, host210), type = "SOCK")
dsge@192.100.100.212's password:
dsge@192.100.100.212's password:
mat@192.100.100.210's password:
# and then nothing...
(I did an ssh mat@192.100.100.210 beforehand.)
What should I check to understand this problem? On the local machine (212), or on the external one (210)?
Thanks a lot!!
On Tue, Apr 14, 2009 at 5:29 AM, Matthieu Stigler
<matthieu.stigler at gmail.com> wrote:
> However, when trying to use the external computer, it seems to be working
> but nothing happens after he asked for the last password...
This tells you that "something went wrong". The basic strategy in this case is to use the "outfile" option to hopefully capture an error message. You might need to set outfile differently for different slaves, particularly if you're starting more than one on the same machine, but I suggest just starting one slave on 210 to avoid the issue. So do something like:
host210 <- list(host = "mat@192.100.100.210", rscript = "/usr/bin/Rscript",
                outfile = "/tmp/log.txt")
cl2 <- makeCluster(list(host210), type = "SOCK")
If it hangs, go to another terminal, ssh to 192.100.100.210, and look at the contents of /tmp/log.txt; hopefully that will provide a clue to the problem.
Another approach is to use the "manual" option. That will print the command that you should use to manually start each of the slaves. You just ssh to that machine from another terminal, and cut and paste the printed command to start the slave. If you set "outfile" to an empty string, then output messages will go right to that terminal.
--
Steve Weston
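The point about setting outfile differently for each slave could be sketched as follows. This is only an illustration in base R: make_host_spec is a hypothetical helper (it is not part of snow), and the log file naming scheme is an assumption. The host, rscript and outfile fields are the ones used in the message above.

```r
# Hypothetical helper (not part of snow): build one host specification with
# its own log file, so that workers on the same machine do not share a file.
make_host_spec <- function(host, rscript, id) {
  list(host = host,
       rscript = rscript,
       outfile = sprintf("/tmp/snow-worker-%02d.log", id))  # assumed naming scheme
}

hosts <- c("dsge@192.100.100.212", "dsge@192.100.100.212", "mat@192.100.100.210")
rscripts <- c("/usr/lib64/R/bin/Rscript", "/usr/lib64/R/bin/Rscript",
              "/usr/bin/Rscript")

# One specification per worker; Map() names the result after 'hosts',
# so the names are dropped to leave a plain list of lists.
specs <- unname(Map(make_host_spec, hosts, rscripts, seq_along(hosts)))

# The log file paths, one per worker.
vapply(specs, function(s) s$outfile, character(1))

# With snow installed and ssh access working, the list could then be used as:
# cl <- makeCluster(specs, type = "SOCK")
```

Each worker then writes to its own file on its own machine, so the log of the worker on 210 can be inspected separately from the two workers on 212.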
Steve Weston wrote:
> The basic strategy in this case is to use the "outfile" option to hopefully
> capture an error message.
[snip]
OK, thanks for pointing out this method.
I tried it and got the following error message. It does not seem to be computer specific (I also tried it towards another host, 213, and from host 213 to 212, always with the same error message):
starting worker for ubuntu:10187
Error in socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
  unable to open connection
Calls: local ... slaveLoop -> recvData -> makeSOCKmaster -> socketConnection
In addition: Warning message:
In socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
  ubuntu:10187 cannot be opened
Execution halted
Is it related to ssh or snow? I did not find any reference to this problem when googling for it...
Thanks a lot for your help!!
On 15 April 2009 at 17:29, Matthieu Stigler wrote:
| starting worker for ubuntu:10187
| Error in socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
|   unable to open connection
| [snip]
| Is it related to ssh or snow? I did not find any reference to that prob
| googling for it...
Test for it: can you actually issue an ssh command for the _script_ on the _machine_ you have specified? I.e. run
  $ ssh mat@192.100.100.210 /usr/bin/Rscript --version
where I added --version to make sure Rscript has something to do. Unless this works 'by itself', you cannot expect to use it.
Dirk
Three out of two people have difficulties with fractions.
On Wed, 15 Apr 2009, Matthieu Stigler wrote:
> [snip]
> In socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
>   ubuntu:10187 cannot be opened
> Execution halted
> Is it related to ssh or snow?
It is an issue with your ability to make a socket connection to the
master. Most likely the master computer has a firewall that is
blocking connections to the port snow uses. Try turning the firewall
off, or at least enabling the port in the error message.
A simple test is to do
    socketConnection(port = 10187, server = TRUE)
in an R session on the master and
    telnet ubuntu 10187
in a shell on your worker machine (assuming your master is called
ubuntu), or you can use R and
    socketConnection("ubuntu", port = 10187)
in an R session on the worker.
luke
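The worker-side check can also be wrapped in a small function that reports TRUE or FALSE instead of stopping with an error. This is only a sketch: can_connect is a hypothetical helper, not something snow provides, and the 5 second default timeout is an arbitrary choice. Note that a FALSE result does not by itself prove a firewall problem; the connection is also refused when nothing is listening on the port at that moment, so the master-side socketConnection(port = 10187, server = TRUE) call has to be waiting while the check runs.

```r
# Hypothetical helper: TRUE if a TCP connection to host:port can be opened
# from this machine, FALSE otherwise.
can_connect <- function(host, port, timeout = 5) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host, port = port, blocking = TRUE,
                       open = "r+", timeout = timeout)
    ),
    error = function(e) NULL   # connection refused, timed out, unknown host, ...
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

# On a worker one would test the master's name and the port from the error
# message, for example:
#   can_connect("ubuntu", 10187)
# Here only the local machine is probed, so that the example runs anywhere.
can_connect("127.0.0.1", 10187, timeout = 2)
```

Testing with the master's host name exactly as it appears in the worker's error message (here "ubuntu") also checks that the worker can resolve that name.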
On 15 April 2009 at 07:16, Dirk Eddelbuettel wrote:
| [snip]
|   $ ssh mat@192.100.100.210 /usr/bin/Rscript --version
|
| where I added --version to make sure Rscript has something to do. Unless
| this works 'by itself', you cannot expect to use it.
That's what I get for posting before the coffee settles in.
Thanks to Luke for correcting the test to check socket rather than ssh connections. Yet when I specify "SOCK" as type, I still get noise [1] from ssh:
cl <- makeSOCKcluster(c("l1", "l2"))
Control socket connect(/home/deddelbuettel/.ssh/master-deddelbuettel@l1:22): Connection refused
ControlSocket /home/deddelbuettel/.ssh/master-deddelbuettel@l1:22 already exists, disabling multiplexing
Control socket connect(/home/deddelbuettel/.ssh/master-deddelbuettel@l2:22): Connection refused
ControlSocket /home/deddelbuettel/.ssh/master-deddelbuettel@l2:22 already exists, disabling multiplexing
str(cl)
List of 2
 $ :List of 3
  ..$ con :Classes 'sockconn', 'connection' atomic [1:1] 5
  .. .. ..- attr(*, "conn_id")=<externalptr>
  ..$ host: chr "l1"
  ..$ rank: int 1
  ..- attr(*, "class")= chr "SOCKnode"
 $ :List of 3
  ..$ con :Classes 'sockconn', 'connection' atomic [1:1] 6
  .. .. ..- attr(*, "conn_id")=<externalptr>
  ..$ host: chr "l2"
  ..$ rank: int 2
  ..- attr(*, "class")= chr "SOCKnode"
 - attr(*, "class")= chr [1:2] "SOCKcluster" "cluster"
implying that you'd still want to sort out ssh access to the hosts even when
you use socket connections.
Dirk
[1] I am using the following in ~/.ssh/config which can speed up multiple
connections to the same machines which is my use case at work:
-----------------------------------------------------------------------------
Host *
ControlPath ~/.ssh/master-%r@%h:%p
ControlMaster auto
ForwardAgent yes
ForwardX11 yes
ForwardX11Trusted yes
-----------------------------------------------------------------------------
luke at stat.uiowa.edu wrote:
> [snip]
> A simple test is to do
>    socketConnection(port = 10187, server = TRUE)
> in an R session on the master and
>    telnet ubuntu 10187
> in a shell on your worker machine [snip]
Thanks Luke and Dirk for your help!
I don't think it is a firewall problem, as both machines have all ports
open (the default with iptables, as I understand it), and the network
admin even opened port 10187 explicitly.
I first tried the three tests suggested; none of them seems to give
good results:
$ telnet 192.100.100.212 10187
Trying 192.100.100.212...
telnet: Unable to connect to remote host: Connection refused
R> socketConnection(port = 10187, server = TRUE)
# nothing happens... is that right?
R> socketConnection("192.100.100.212", port = 10187)
Error in socketConnection("192.100.100.212", port = 10187) :
  unable to open connection
In addition: Warning message:
In socketConnection("192.100.100.212", port = 10187) :
  192.100.100.212:10187 cannot be opened
I get the same error message when using "ubuntu", dsge@192.100.100.212, etc.
On an Ubuntu forum, someone told me that one has to start a server
listening on the port (sorry, my explanations are not good, as I don't
understand the subject that well :-( ).
So after launching on the master (212):
$ nc -l -p 10187
one is then able to connect from 210:
$ telnet 192.100.100.212 10187
Trying 192.100.100.212...
Connected to 192.100.100.212.
Escape character is '^]'.
So that seems to work, but it has no effect on the previous
commands: socketConnection and makeCluster still claim that port 10187
cannot be opened.
With these elements, do you see more clearly, or is it even darker?
Thanks a lot for your help!
Matthieu
I just noticed that you're running R 2.7.1 on your 192.100.100.212 machine. I believe there are known socketConnection issues with that version of R that Luke fixed as of R 2.7.2. So I strongly suggest that you upgrade your version of R.
--
Steve Weston
Steve Weston wrote:
> I just noticed that you're running R 2.7.1 on your 192.100.100.212 machine.
> [snip] So I strongly suggest that you upgrade your version of R.
I upgraded to R 2.8, but unfortunately this changes nothing: port
10187 is still reported as closed...
I obviously have a problem opening the port; should I perhaps post on
the Debian list or on other forums instead? When I run nc -l -p 10187,
telnet xxx.212 10187 works (I did this on both machines), but I still
have the issue when running makeCluster, and also when running from the
worker:
socketConnection("ubuntu", port = 10187)
192.100.100.212:10187 cannot be opened
and with:
socketConnection(port = 10187, server = TRUE)
nothing happens. What is the expected output, actually?
Thanks a lot for your help and advice!!!
Mat
-- Steve Weston REvolution Computing One Century Tower | 265 Church Street, Suite 1006 New Haven, CT 06510 P: 203-777-7442 x266 | www.revolution-computing.com On Thu, Apr 16, 2009 at 4:52 AM, Matthieu Stigler <matthieu.stigler at gmail.com> wrote:
luke at stat.uiowa.edu a ?crit :
On Wed, 15 Apr 2009, Matthieu Stigler wrote:
Steve Weston a ?crit :
On Tue, Apr 14, 2009 at 5:29 AM, Matthieu Stigler
<matthieu.stigler at gmail.com> wrote:
So it is now working for the local computer with. However, when trying
to
use the external computer, it seems to be working but nothing happens
after
he asked for the last password...
This tells you is that "something went wrong". The basic strategy in
this case
is to use the "outfile" option to hopefully capture an error message.
You might
need to set outfile differently for different slaves, particularly if
you're starting
more than one on the same machine, but I suggest just starting one slave
on 210 to avoid the issue. So do something like:
host210 <- list(host = "mat at 192.100.100.210", rscript =
"/usr/bin/Rscript",
+ outfile="/tmp/log.txt")
cl2 <- makeCluster(list(host210), type = "SOCK")
Ok, thanks for pointing out this method.
I tried it and got the following error message. It does not seem to be computer-specific (I tried it towards another host, 213, and from host 213 to 212, and always got the same error message):
starting worker for ubuntu:10187
Error in socketConnection(master, port = port, blocking = TRUE, open = "a+b") : unable to open connection
Calls: local ... slaveLoop -> recvData -> makeSOCKmaster -> socketConnection
In addition: Warning message:
In socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
ubuntu:10187 cannot be opened
Execution halted
Is it related to ssh or snow? I did not find any reference to that problem when googling for it...
It is an issue with your ability to make a socket connection to the master. Most likely the master computer has a firewall that is blocking connections to the port snow uses. Try turning the firewall off or at least enabling the port in the error message.
A simple test is to do
socketConnection(port = 10187, server = TRUE)
in an R session on the master and
telnet ubuntu 10187
in a shell on your worker machine (assuming your master is called ubuntu), or you can use R and
socketConnection("ubuntu", port = 10187)
in an R session on the worker.
luke
Thanks Luke and Dirk for your help!
I don't think it is a firewall error, as both machines have all ports open (the default with iptables, as I understand it), and the network admin even opened port 10187.
I first tried the three tests suggested; none of them seems to give good results:
$ telnet 192.100.100.212 10187
Trying 192.100.100.212...
telnet: Unable to connect to remote host: Connection refused
R> socketConnection(port = 10187, server = TRUE)
#nothing happens... is that right?
R> socketConnection("192.100.100.212", port = 10187)
Error in socketConnection("192.100.100.212", port = 10187) :
cannot open the connection
In addition: Warning message:
In socketConnection("192.100.100.212", port = 10187) :
192.100.100.212:10187 cannot be opened
I get the same error message when using "ubuntu", dsge at 192.100.100.212, etc.
On an Ubuntu forum, someone told me that one has to open a server on the port (sorry, my explanations are not good, as I don't understand the subject that well :-( ).
So after launching on the master (212):
$ nc -l -p 10187
one is able to connect from 210:
$ telnet 192.100.100.212 10187
Trying 192.100.100.212...
Connected to 192.100.100.212.
Escape character is '^]'.
So that seems to work, but it has no effect on the previous commands: socketConnection and makeCluster still claim that 10187 can't be opened.
With those elements, do you guys see more clearly, or is it even darker? Thanks a lot for your help!
Matthieu
Thanks a lot for your help!!
If it hangs, go to another terminal, ssh to 192.100.100.210, and look at the contents of /tmp/log.txt, and hopefully that will provide a clue to the problem.
Another approach is to use the "manual" option. That will print the command that you should use to manually start each of the slaves. You just ssh to that machine from another terminal, and cut and paste the printed command to start the slave. If you set "outfile" to an empty string, then output messages will go right to that terminal.
--
Steve Weston
REvolution Computing
One Century Tower | 265 Church Street, Suite 1006
New Haven, CT 06510
P: 203-777-7442 x266 | www.revolution-computing.com
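The "manual" approach described above might look roughly like the following sketch. It assumes that manual and outfile can be passed to makeCluster as named options, as the message suggests; the variable name debug_args is made up for illustration, and the host and Rscript path are the ones quoted in the thread.

```r
# Worker on 210, with the Rscript path quoted in the thread.
host210 <- list(host = "mat@192.100.100.210", rscript = "/usr/bin/Rscript")

# Arguments for a debugging run (debug_args is a hypothetical name):
#   manual = TRUE  asks snow to print the worker start command instead of running it
#   outfile = ""   sends the worker's messages to the terminal where it is started
debug_args <- list(list(host210), type = "SOCK", manual = TRUE, outfile = "")
str(debug_args)

# Only attempted in an interactive session with snow installed: the call waits
# until the printed command has been pasted into a shell on the worker machine.
if (interactive() && requireNamespace("snow", quietly = TRUE)) {
  cl <- do.call(snow::makeCluster, debug_args)
  snow::stopCluster(cl)
}
```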
On Fri, 17 Apr 2009, Matthieu Stigler wrote:
Steve Weston wrote:
I just noticed that you're running R 2.7.1 on your 192.100.100.212 machine. I believe there are known socketConnection issues with that version of R that Luke fixed as of R 2.7.2. So I strongly suggest that you upgrade your version of R.
I upgraded to R 2.8 but unfortunately this doesn't change anything: port 10187 is still said to be closed... I obviously have a problem opening the port; should I rather post on the Debian list or on other forums? I use nc -l -p 10187, so that telnet
According to my man page that argument combination is not legal, so I don't know what you actually did.
xxx.212 10187 is working. I did it on both machines, but I still have that issue when running makeCluster, and also when running from the worker:
socketConnection("ubuntu", port = 10187)
192.100.100.212:10187 cannot be opened
and with:
socketConnection(port = 10187, server = TRUE)
nothing happens. What is the expected output?
The server call waits until a connection occurs and then returns an R connection object. The client socketConnection call returns a socket connection if successful and gives an error message if not. So on the master do
s <- socketConnection(port = 10187, server = TRUE)
and this will wait for a connection and return to the prompt when a connection occurs. On the worker machine
telnet master 10187
will either succeed and wait until the server socket is closed, or fail with an error message about not being able to open the port. If I use
nc master 10187
then on a successful connection nc waits (for input) until the server closes the socket with close(s) and then returns to the shell prompt. Failure for me is an immediate return to the shell prompt, with no error message (and the server side continues to wait).
luke
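To see the behaviour described above without a second machine, both ends can be tried on one host. The following is only a sketch under several assumptions: a Unix-like system, port 10187 free on localhost, and Rscript available under R.home("bin"). The client is started as a background R process that waits briefly and then connects; the message text and the variable names are made up for illustration.

```r
port <- 10187L  # port number from the thread; any free port would do

# Client side: a second R process that sleeps, connects, sends one line and exits.
client_code <- sprintf(
  paste0('Sys.sleep(2); ',
         'con <- socketConnection("localhost", port = %d, blocking = TRUE, open = "a+b"); ',
         'writeLines("hello from worker", con); ',
         'close(con)'),
  port)
rscript <- file.path(R.home("bin"), "Rscript")
system2(rscript, c("-e", shQuote(client_code)), wait = FALSE)

# Server side: this call blocks ("nothing happens") until the client connects,
# then returns a connection object.
s <- socketConnection(port = port, server = TRUE, blocking = TRUE, open = "a+b")
msg <- readLines(s, n = 1)
close(s)
print(msg)
```

If the server call returns and the message is read, socket connections work locally; running the client line on the worker with the master's name in place of "localhost" then tests the path that snow needs.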
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke at stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
On 17 April 2009 at 20:10, Matthieu Stigler wrote:
| Steve Weston wrote:
| > I just noticed that you're running R 2.7.1 on your 192.100.100.212
| > machine. I believe there are known socketConnection issues
| > with that version of R that Luke fixed as of R 2.7.2. So I strongly
| > suggest that you upgrade your version of R.
| >
| I upgraded to R 2.8 but unfortunately this doesn't change, the port
| 10187 is still said to be close...
For what it is worth, I cannot do that either on Ubuntu at work, yet snow
works just fine:
edd@l1:~$ telnet l2 10187
Trying xxx.xx.50.99...
telnet: Unable to connect to remote host: Connection refused
edd@l1:~$ telnet l1 10187
Trying xxx.xx.50.97...
telnet: Unable to connect to remote host: Connection refused
edd@l1:~$ r -lsnow -e'cl <- makeCluster(c("l1","l2"), "SOCK"); print(str(cl)); stopCluster(cl)'
List of 2
$ :List of 3
..$ con :Classes 'sockconn', 'connection' atomic [1:1] 3
.. .. ..- attr(*, "conn_id")=<externalptr>
..$ host: chr "l1"
..$ rank: int 1
..- attr(*, "class")= chr "SOCKnode"
$ :List of 3
..$ con :Classes 'sockconn', 'connection' atomic [1:1] 4
.. .. ..- attr(*, "conn_id")=<externalptr>
..$ host: chr "l2"
..$ rank: int 2
..- attr(*, "class")= chr "SOCKnode"
- attr(*, "class")= chr [1:2] "SOCKcluster" "cluster"
NULL
edd@l1:~$
Maybe that socket-to-port-10187 thing is not really relevant...
Dirk
Three out of two people have difficulties with fractions.
On Fri, Apr 17, 2009 at 11:28 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
For what it is worth, I cannot do that either on Ubuntu at work, yet snow works just fine:
edd@l1:~$ telnet l2 10187
Trying xxx.xx.50.99...
telnet: Unable to connect to remote host: Connection refused
edd@l1:~$ telnet l1 10187
Trying xxx.xx.50.97...
telnet: Unable to connect to remote host: Connection refused
Perhaps I'm confused, but it looks like your telnets are going the wrong way. The telnets are supposed to be acting like the slave processes connecting back to the master. So they would be running on l1 and l2, but connecting back to the master, where "socketConnection(port=10187, server=TRUE)" is executing.
The master process in snow is running socketConnection with almost the same arguments in order to create the cluster, so it seems like an important experiment to me. The only reason that it should fail that I can think of is due to a firewall, as Luke pointed out.
At this point, I would be running R under gdb, and maybe doing some packet sniffing. I'm not sure what else to suggest.
--
Steve Weston
REvolution Computing
One Century Tower | 265 Church Street, Suite 1006
New Haven, CT 06510
P: 203-777-7442 x266 | www.revolution-computing.com
On 17 April 2009 at 12:53, Steve Weston wrote:
| On Fri, Apr 17, 2009 at 11:28 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
| > For what it is worth, I cannot do that either on Ubuntu at work, yet snow
| > works just fine:
| >
| > edd@l1:~$ telnet l2 10187
| > Trying xxx.xx.50.99...
| > telnet: Unable to connect to remote host: Connection refused
| > edd@l1:~$ telnet l1 10187
| > Trying xxx.xx.50.97...
| > telnet: Unable to connect to remote host: Connection refused
|
| Perhaps I'm confused, but it looks like your telnets are going the
| wrong way. The telnets are supposed to be acting like the slave
| processes connecting back to the master. So they would be
| running on l1 and l2, but connecting back to the master, where
| "socketConnection(port=10187, server=TRUE)" is executing.
Correct, Luke pointed that out in his post that arrived at about the same time. When there is no server to connect to, it should be little wonder that the telnet process fails... Sloppy thinking on my part.
| At this point, I would be running R under gdb, and maybe doing
| some packet sniffing. I'm not sure what else to suggest.
Yes, Matthieu should probably try some 'manual' R socket connection example to prove that these work in his setup before trying to employ them with snow.
Dirk
2 days later
I solved the problem ;-)
I think the issue comes from snow expecting the name of the master to be listed in /etc/hosts on the workers (typically as a line "IP master_name"). In my case it wasn't working before, and since I added that entry it is working!
The mistake I made was to check with
#on the master (212):
socketConnection(port = 10187, server = TRUE)
#and on the slave:
socketConnection("192.100.100.212", port = 10187) #from 210
#Even if this works, it is not a sufficient condition for snow to work. Indeed,
socketConnection("master_name", port = 10187) #from 210
has to work as well.
Thanks a lot to Dirk, Luke and Steve, who helped me a lot in finding this!!
Matthieu
Glad it is working now. By default snow uses Sys.info()["nodename"] on the master to determine the name of the master that is used for the back connection. If you supply an alternative as master="123...." with the IP address, then that should work around not having the master name known on the worker.
luke
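The two points above can be checked with a short sketch. It assumes a Unix-like system, since utils::nsl() is only available there; the helper name resolves is made up for illustration, and the IP address passed as master is the master's address as quoted in the thread, used here as an assumption.

```r
# Name the master would advertise to the workers by default.
master_name <- Sys.info()[["nodename"]]
print(master_name)

# Hypothetical helper: TRUE if a host name can be resolved to an IP address.
# utils::nsl() returns NULL when the lookup fails (Unix-alikes only).
resolves <- function(name) !is.null(utils::nsl(name))

# On a worker, call this with the master's name: FALSE would suggest that the
# name is missing from the worker's /etc/hosts (or DNS).
print(resolves(master_name))

# Workaround: give the master's IP address explicitly.
# Only attempted in an interactive session with snow installed.
if (interactive() && requireNamespace("snow", quietly = TRUE)) {
  host210 <- list(host = "mat@192.100.100.210", rscript = "/usr/bin/Rscript")
  cl <- snow::makeCluster(list(host210), type = "SOCK", master = "192.100.100.212")
  snow::stopCluster(cl)
}
```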
On Mon, 20 Apr 2009, Matthieu Stigler wrote:
I solved the problem ;-)
I think that the issue come from that snow is expecting that the name of the
master has been exported in /etc/hosts (typically by IP master_name) on the
workers. In my case, it wasn't working before and since I've exported it it
is working!
The error I did was to check with
#open from the master (212):
socketConnection(port = 10187, server = TRUE)
#and from the slave
socketConnection("192.100.100.212", port = 10187) #from 210
#Even if this works it is not a sufficient condition, for snow to work,
indeed:
socketConnection("master_name", port = 10187) #from 210
has to be working also
Thanks a lot for the help of Dirk, Luke and Steve, who helped me a lot in
finding this!!
Matthieu
luke at stat.uiowa.edu a ?crit :
On Fri, 17 Apr 2009, Matthieu Stigler wrote:
Steve Weston a ?crit :
I just noticed that you're running R 2.7.1 on your 192.100.100.212 machine. I believe there are known socketConnection issues with that version of R that Luke fixed as of R 2.7.2. So I strongly suggest that you upgrade your version of R.
I upgraded to R 2.8 but unfortunately this doesn't change, the port 10187 is still said to be close... I obviously have a problem in opening the port, maybe should I rather post on the debian list or on other forums? I use nc -l -p 10187, so that telnet
According to my man page that argument combination is not legal so I don't know what you actually did.
xxx.212 10187 is working, did it on both machines, but still when running
with makeCluster have that issue, also when running from worker:
socketConnection("ubuntu", port = 10187)
192.100.100.212:10187 cannot be opened
and with:
socketConnection(port = 10187, server = TRUE)
nothing happens, what is actually the expected output?
the server call waits until a connection occurs and then returns an R connection object. The clinet socketConnection call returns a socket connection if curresful and gives an error message if not. So on the master do s <- socketConnection(port = 10187, server = TRUE) and this will wait for a connection and return to the prompt when a connectin occurs. On the wroker machine telnet master 10187 will either succeed and wait until the server socket is closed or fail with an error message about not being able to open the port. If I use nc master 10187 then no an successful connection nc waits (for input) until the server closes the socket with close(s) and then returns to the shell prompt. Failure for me is an immediate resurn to the shell prompt, no error message (and the server side continues to wait). luke
Thanks a lot for your help and advices!!! Mat
-- Steve Weston REvolution Computing One Century Tower | 265 Church Street, Suite 1006 New Haven, CT 06510 P: 203-777-7442 x266 | www.revolution-computing.com On Thu, Apr 16, 2009 at 4:52 AM, Matthieu Stigler <matthieu.stigler at gmail.com> wrote:
luke at stat.uiowa.edu a ?crit :
On Wed, 15 Apr 2009, Matthieu Stigler wrote:
Steve Weston a ?crit :
On Tue, Apr 14, 2009 at 5:29 AM, Matthieu Stigler <matthieu.stigler at gmail.com> wrote:
So it is now working for the local computer with. However, when trying to use the external computer, it seems to be working but nothing happens after he asked for the last password...
This tells you is that "something went wrong". The basic strategy in this case is to use the "outfile" option to hopefully capture an error message. You might need to set outfile differently for different slaves, particularly if you're starting more than one on the same machine, but I suggest just starting one slave on 210 to avoid the issue. So do something like:
host210 <- list(host = "mat at 192.100.100.210", rscript = "/usr/bin/Rscript",
+ outfile="/tmp/log.txt")
cl2 <- makeCluster(list(host210), type = "SOCK")
Ok, thanks for pointing out this methid. I tried it and got following error message. This does not seem not be computer specific (tried to do it to other host 213, and from other host 213 to 212, always same error message): starting worker for ubuntu:10187 Error in socketConnection(master, port = port, blocking = TRUE, open = "a+b") : unable to open connection Calls: local ... slaveLoop -> recvData -> makeSOCKmaster -> socketConnection In addition: Warning message: In socketConnection(master, port = port, blocking = TRUE, open = "a+b") : ubuntu:10187 cannot be opened Execution halted Is it related to ssh or snow? I did not find any reference to that prob googling for it...
It is an issue with your ability to make a socket connection to the
master. Most likely the master computer has a firewall that is
blocking connections to the port snow uses. Try turning the firewall
off or at least enabling the port in the error message.
A simple test is to do
socketConnection(port = 10187, server = TRUE)
in an R session on the master and
telnet ubuntu 10187
in a shell on your worker machine (assuming your master is called
ubuntu) (or you can use R and
socketConnection("ubuntu", port = 10187)
in an R session on the worker).
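
The worker-side half of this test can be wrapped in a small function so that it returns TRUE or FALSE instead of stopping with an error. This is only a sketch: can_connect is a hypothetical helper written for illustration, it is not part of snow, and the 5-second default timeout is an arbitrary choice. Note also that, as far as I know, the server-side call socketConnection(port = 10187, server = TRUE) simply waits until a client connects, so an R prompt that appears to hang on the master is expected while the worker-side check is run.

```r
# Hypothetical helper (not part of snow): returns TRUE if a TCP connection
# to host:port can be opened within `timeout` seconds, FALSE otherwise.
can_connect <- function(host, port, timeout = 5) {
  con <- tryCatch(
    # socketConnection() raises a warning and an error on failure;
    # silence the warning and turn the error into NULL.
    suppressWarnings(
      socketConnection(host, port = port, blocking = TRUE,
                       open = "a+b", timeout = timeout)
    ),
    error = function(e) NULL
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

# On the worker, while the master is listening on port 10187
# (host name "ubuntu" and port taken from the error message above):
# can_connect("ubuntu", 10187)
```

A FALSE result while the master is listening would point to a name-resolution or firewall problem between the two machines rather than to snow itself.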
luke
Thanks Luke and Dirk for your help!
I don't think it is a firewall problem, as both machines have all ports
open (the default with iptables, as I understood), and the admin of the
network even opened port 10187.
I first tried the three tests suggested; none of them seems to give good
results:
$telnet 192.100.100.212 10187
Trying 192.100.100.212...
telnet: Unable to connect to remote host: Connection refused
R>socketConnection(port = 10187, server=TRUE)
#nothing happens... is it right?
R > socketConnection("192.100.100.212", port = 10187)
Error in socketConnection("192.100.100.212", port = 10187) :
cannot open the connection
In addition: Warning message:
In socketConnection("192.100.100.212", port = 10187) :
192.100.100.212:10187 cannot be opened
I get the same error message when using "ubuntu", dsge at 192.100.100.212, etc.
On an Ubuntu forum, someone told me that one has to open a server on the
port (sorry, my explanations are not good, as I don't understand the
subject that well :-( ).
So, launching on the master (212):
$nc -l -p 10187
then one is able to have in 210:
$telnet 192.100.100.212 10187
Trying 192.100.100.212...
Connected to 192.100.100.212.
Escape character is '^]'.
So it seems that this is working, but it has no effect on the previous
commands: socketConnection and makeCluster still claim that 10187 can't
be opened.
With those elements, do you guys see more clearly, or is it even darker?
Thanks a lot for your help!
Matthieu
Thanks a lot for your help!!
If it hangs, go to another terminal, ssh to 192.100.100.210, and look at the contents of /tmp/log.txt, and hopefully that will provide a clue to the problem. Another approach is to use the "manual" option. That will print the command that you should use to manually start each of the slaves. You just ssh to that machine from another terminal, and cut and paste the printed command to start the slave. If you set "outfile" to an empty string, then output messages will go right to that terminal.
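
The "manual" approach described above might look roughly like the following. This is a sketch under assumptions: it reuses the host entry from earlier in the thread (the host string is copied as it appears in this archive), passes outfile inside the host list as in the earlier example, and assumes that manual is given to makeCluster as manual = TRUE. The call is wrapped in interactive() because the master waits until the worker has been started by hand.

```r
# Host entry as used earlier in the thread, with outfile = "" so that the
# worker's messages go to the terminal in which it is started by hand.
host210 <- list(host = "mat at 192.100.100.210",
                rscript = "/usr/bin/Rscript",
                outfile = "")

# Assumption: snow is installed and `manual = TRUE` is accepted as a
# cluster option. snow then prints the command to run on the worker and
# waits for that worker to connect back to the master.
if (interactive() && requireNamespace("snow", quietly = TRUE)) {
  cl2 <- snow::makeCluster(list(host210), type = "SOCK", manual = TRUE)
  # ... ssh to 210 from another terminal and paste the printed command ...
  snow::stopCluster(cl2)
}
```

Any error raised by the worker, such as the socketConnection failure reported in this thread, should then appear directly in the ssh session rather than in a log file.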
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke at stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu