Hi, on December 27, 2021, I started a thread asking for help troubleshooting non-blocking sockets. While developing the RBaseX client, I had issues with the authentication process. It eventually turned out that a short break had to be inserted in this process between sending the credentials to the server and requesting the status. Tomas Kalibera put me on the right track by drawing my attention to the 'socketSelect' function. I don't know exactly what the purpose of this function is (the function itself is documented, but I can't find any information on the situations in which it should be called), but it sufficed to call this function once between sending and requesting. I have two questions. The first is where I can find R documentation on proper use of non-blocking sockets and on the proper use of the socketSelect function. The second question is more focused on using non-blocking sockets in general. Is it allowed to execute a read and a receive command immediately after each other, or must a short waiting loop be built in? I'm asking this because I'm running into the same problems in a C++ project as I did with RBaseX. Ben Engbers
Question on non-blocking socket
7 messages · Ben Engbers, Tomas Kalibera, Ivan Krylov +1 more
On 2/15/23 01:24, Ben Engbers wrote:
Hi, on December 27, 2021, I started a thread asking for help troubleshooting non-blocking sockets. While developing the RBaseX client, I had issues with the authentication process. It eventually turned out that a short break had to be inserted in this process between sending the credentials to the server and requesting the status. Tomas Kalibera put me on the right track by drawing my attention to the 'socketSelect' function. I don't know exactly what the purpose of this function is (the function itself is documented, but I can't find any information on the situations in which it should be called), but it sufficed to call this function once between sending and requesting. I have two questions. The first is where I can find R documentation on proper use of non-blocking sockets and on the proper use of the socketSelect function?
In addition to the demos I sent to you in that 2021 thread on R-pkg-devel, you could also have a look at how it is used in R itself, in the parallel package, in snowSOCK.R, to set up the snow cluster in parallel. Some hints may also be found in the blog post https://blog.r-project.org/2020/03/17/socket-connections-update/. But, in principle, the R API is just a thin layer on top of what the OS provides, so general literature and tutorials on sockets should help; there should even be textbooks used in networking classes at CS universities. Basically, select() can tell you when data is ready (on input), when the socket interface is able to accept more data (on output), or when there is an incoming connection. In practice, you should not need to insert any delays in your program to make it work. If a delay is needed, it means there is an error in the program (a race condition). If the program is polling (checking in a loop whether something has already happened, and then sleeping for a short while), the duration of the sleep may indeed influence latency, but it should not affect correctness. If it does, there is an error.
The second question is more focused on using non-blocking sockets in general. Is it allowed to execute a read and a receive command immediately after each other or must a short waiting loop be built in. I'm asking this because I'm running into the same problems in a C++ project as I did with RBaseX.
No, in general there is no need to insert any delays between reads and writes, they can actually happen concurrently. But these are general networking questions, not the topic of this list. Best Tomas
Ben Engbers
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
On Wed, 15 Feb 2023 01:24:26 +0100, Ben Engbers <Ben.Engbers at Be-Logical.nl> wrote:
where I can find R documentation on proper use of non-blocking sockets and on the proper use of the socketSelect function?
A useful guide to the Berkeley sockets API can be found at <https://beej.us/guide/bgnet/>. You'll have to translate between the C idioms and the R idioms, but it's better than having no guide at all. In particular, R spares you from having to figure out differently-sized struct sockaddr objects by converting them to string representations of the addresses (currently limited to IPv4).
Best regards, Ivan
Hi,
On 15-02-2023 at 14:38, Tomas Kalibera wrote:
On 2/15/23 01:24, Ben Engbers wrote:
Hi, December 27, 2021 I started a thread asking for help troubleshooting non-blocking sockets.
..
I have two questions. The first is where I can find R documentation on proper use of non-blocking sockets and on the proper use of the socketSelect function?
In addition to the demos I sent to you in that 2021 thread on R-pkg-devel, you could also have a look at how it is used in R itself, in the parallel package, in snowSOCK.R, to set up the snow cluster in parallel. Some hints may be also found in the blog post https://blog.r-project.org/2020/03/17/socket-connections-update/. But, in principle, R API is just a thin layer on top of what the OS provides, so general literature and tutorials on sockets should help, there should be even textbooks used at CS universities in networking classes.
Thanks for the suggestions!
Basically, select() can tell you when data is ready (on input), when the socket interface is able to accept more data (on output), or when there is an incoming connection. In practice, you should not need to insert any delays in your program to make it work. If a delay is needed, it means there is an error in the program (a race condition). If the program is polling (checking in a loop whether something has already happened, and then sleeping for a short while), the duration of the sleep may indeed influence latency, but it should not affect correctness. If it does, there is an error.
In RBaseX I first calculate an MD5 hash that is sent to the server and then I check the status byte that is returned by the server.
    writeBin(auth, private$conn)
    socketSelect(list(conn))
    Accepted <- readBin(conn, what = "raw", n = 1) == 0
Without the second line, 'Accepted' is always FALSE. With this line it is TRUE. BaseX provides example APIs in several languages. I've looked at several, but indeed none uses any form of delay. All APIs follow the same pattern: calculate an MD5 hash, send it to the server and check the status byte. So the server is not likely to enforce a delay. So there is nothing left but to look for that race condition ;-( Ben
On 2/15/23 16:44, Ben Engbers wrote:
In RBaseX I first calculate an MD5 hash that is sent to the server and then I check the status byte that is returned by the server.
    writeBin(auth, private$conn)
    socketSelect(list(conn))
    Accepted <- readBin(conn, what = "raw", n = 1) == 0
Without the second line, 'Accepted' is always FALSE. With this line it is TRUE. BaseX provides example APIs in several languages. I've looked at several, but indeed none uses any form of delay. All APIs follow the same pattern: calculate an MD5 hash, send it to the server and check the status byte. So the server is not likely to enforce a delay. So there is nothing left but to look for that race condition ;-(
Without knowing more details, this looks ok. If you have a non-blocking connection, and the server produces a response based on the client request, the client has to take into account that it takes the server some time to produce the response. Right, the sockets are full duplex and so could be the communication protocol, but in this case it apparently isn't, it is request/response. Without the second line, there would be a race condition between the server sending a response and the client receiving it. With the second line, the client waits for the server before it starts receiving. In theory, one could be waiting for the response actively in a loop (polling), but socketSelect() is better. Both ways would resolve the race condition. Adding a single fixed-time wait, instead, would not remove the race condition, because one can never be sure that the server wouldn't take longer (apart from waiting too long most of the time). In the example you are waiting only for a single byte. But if the response may be longer, one needs to take into account in the client that not all bytes of the response may be available right away. One would keep receiving the data in a loop, as they become available (e.g. socketSelect() would tell), keep appending them to a buffer, and keep looking for when they are complete. Tomas
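The receive loop described above could be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: the function name `read_exactly` is hypothetical, the expected response length is assumed to be known in advance (a delimiter-based protocol would instead scan the buffer for its terminator), and a POSIX system is assumed.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: collect exactly `want` bytes from socket `fd` in `out`.
// Returns true on success, false on a socket error or if the peer closes early.
// Assumes fd is a valid descriptor smaller than FD_SETSIZE.
bool read_exactly(int fd, std::size_t want, std::string& out) {
    out.clear();
    char chunk[4096];
    while (out.size() < want) {
        fd_set rset;
        FD_ZERO(&rset);
        FD_SET(fd, &rset);
        // Sleep inside select() (no timeout) until the socket is readable.
        int rc = select(fd + 1, &rset, nullptr, nullptr, nullptr);
        if (rc < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;
        }
        std::size_t room = want - out.size();
        if (room > sizeof chunk) room = sizeof chunk;
        ssize_t n = recv(fd, chunk, room, 0);
        if (n > 0) {
            // Only part of the response may have arrived; keep what we got.
            out.append(chunk, static_cast<std::size_t>(n));
        } else if (n == 0) {
            return false;  // peer closed before the response was complete
        } else if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR) {
            return false;  // a real error, not just "nothing available yet"
        }
    }
    return true;
}
```

The loop never sleeps for a fixed time: it either waits inside select() or makes progress, so its correctness does not depend on how quickly the server answers.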
Ben
1 day later
Hi Tomas,
Apparently, inserting some kind of socketSelect() is essential when
using non-blocking sockets and a client/server architecture. That is at
least one thing that I have learned ;-).
In C++, between sending and requesting, I inserted a call to this function:
    bool wait(int s) {
      fd_set read_set;
      struct timeval timeout {};
      memset(&timeout, 0, sizeof(timeout));
      bool done{};
      while (!done ) {
        FD_ZERO(&read_set);
        FD_SET(s, &read_set);
        int rc = select(s + 1, &read_set, NULL, NULL, &timeout);
        done = (rc == 1) && FD_ISSET(s, &read_set);
      };
      return done;
    };
Inserting this call was essential in solving my problem.
Ben
Ben, yes, by definition: non-blocking means that reads won't block and always return immediately (that is the point of non-blocking). The loop below is terrible, as it will cause 100% CPU usage while it's spinning. It seems that you want to block, so why are you using non-blocking mode? select() effectively gets you back to blocking mode, because it does the "block" that read() would normally do in blocking mode. Moreover, select() allows you to block for a specified time (the point of the timeout argument), so if you want to wait, you should set the timeout. You should never use a spin loop without timeouts. Also, there are many other conditions you should be handling: there may be an error on the socket, or EINTR (you should call R's interrupt handler), or EAGAIN (which you handle implicitly, but you can't tell it from an actual error). Sockets and I/O are quite a complex matter. It's easy to get it wrong and create hard-to-detect bugs in your code unless you are an expert in it. It's one of the wheels you don't want to be reinventing. Cheers, Simon
On Feb 18, 2023, at 3:00 AM, Ben Engbers <ben.engbers at gmail.com> wrote:
Hi Tomas,
Apparently, inserting some kind of socketSelect() is essential when using non-blocking sockets and a client/server architecture. That is at least one thing that I have learned ;-).
In C++, between sending and requesting, I inserted a call to this function:
    bool wait(int s) {
      fd_set read_set;
      struct timeval timeout {};
      memset(&timeout, 0, sizeof(timeout));
      bool done{};
      while (!done ) {
        FD_ZERO(&read_set);
        FD_SET(s, &read_set);
        int rc = select(s + 1, &read_set, NULL, NULL, &timeout);
        done = (rc == 1) && FD_ISSET(s, &read_set);
      };
      return done;
    };
Inserting this call was essential in solving my problem.
Ben