As early as 2015, a GitHub user, tobigithub, submitted an [issue](https://github.com/sneumann/xcms/issues/20) about the error "Error in file(con, "w") : all connections are in use". Nowadays, since AMD has really cool CPUs that raise the thread count to 128 or even 256 on a single server, we found that the NCONNECTIONS constant can prevent us from using all 128 threads. It might be a good choice to increase its value. The constant is defined in `R-4.1.1/src/main/connections.c: 17`. I have tested increasing it to 1024: this generates no error, and all the clusters (I tried 256 clusters on my 16-thread laptop) work fine. Is it possible to increase NCONNECTIONS?
Is it a good choice to increase the NCONNECTION value?
8 messages · qweytr1 at mail.ustc.edu.cn, Martin Maechler, GILLIBERT, Andre +3 more
qweytr1--- via R-devel
on Tue, 24 Aug 2021 00:51:31 +0800 (GMT+08:00) writes:
> As early as 2015, a GitHub user, tobigithub, submitted an
> [issue](https://github.com/sneumann/xcms/issues/20) about
> the error "Error in file(con, "w") : all connections are
> in use". Nowadays, since AMD has really cool CPUs that
> raise the thread count to 128 or even 256 on a
> single server, we found that the NCONNECTIONS constant
> can prevent us from using all 128 threads. It
> might be a good choice to increase its value.
> The constant is defined in
> `R-4.1.1/src/main/connections.c: 17`. I have tested
> increasing it to 1024: this generates no error, and all the
> clusters (I tried 256 clusters on my 16-thread
> laptop) work fine.
> Is it possible to increase NCONNECTIONS?
Yes, of course, it is possible.
The question is how much it costs and to which number it should
be increased.
A quick look at the source connections.c --> src/include/R_ext/Connections.h
reveals that the Rconnection* <--> Rconn is a struct with about
200 chars and ca. 30 int-like members plus another 20 pointers, which
would amount to roughly 400 bytes per connection.
Adding 1024-128 = 896 new ones would then increase
the R executable by about 360 kB; all of the above being rough.
So personally, I guess that's "about ok" --
are there other things to consider?
Ideally, of course, the number of possible connections could be
increased dynamically only when needed.
Martin
Rconnection is a pointer to an Rconn structure. The Rconn structure must be allocated independently (e.g. by malloc() in R_new_custom_connection). Therefore, increasing NCONNECTIONS to 1024 should only use 8 kilobytes on 64-bit platforms and 4 kilobytes on 32-bit platforms.
Ideally, it should be dynamically allocated: either as a linked list or as a dynamic array (malloc/realloc). However, a simple change of NCONNECTIONS to 1024 should be enough for most uses.
--
Sincerely
André GILLIBERT
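The size argument above can be checked with a few lines of C. This is only a sketch: `struct conn_sketch` and `conn_table_bytes` are hypothetical names standing in for R's connection struct and its table, and the only assumption is that the table stores pointers rather than the structs themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for R's Rconn struct. Only pointers to it are stored
 * in the table, so its size never enters the calculation. */
struct conn_sketch;

/* Bytes occupied by a static table holding n connection pointers. */
size_t conn_table_bytes(size_t n)
{
    assert(n <= SIZE_MAX / sizeof(struct conn_sketch *)); /* no overflow */
    return n * sizeof(struct conn_sketch *);
}
```

With 8-byte pointers `conn_table_bytes(1024)` is 8192 bytes, and with 4-byte pointers it is 4096 bytes, which matches the 8 kB and 4 kB figures given for 64-bit and 32-bit platforms.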
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
GILLIBERT, Andre
on Tue, 24 Aug 2021 09:49:52 +0000 writes:
> Rconnection is a pointer to an Rconn structure. The Rconn
> structure must be allocated independently (e.g. by
> malloc() in R_new_custom_connection). Therefore,
> increasing NCONNECTIONS to 1024 should only use 8
> kilobytes on 64-bit platforms and 4 kilobytes on 32-bit
> platforms.
You are right indeed, and I was wrong.
> Ideally, it should be dynamically allocated: either as
> a linked list or as a dynamic array
> (malloc/realloc). However, a simple change of
> NCONNECTIONS to 1024 should be enough for most uses.
There is one other important problem I've been made aware of
(similar to the number of open DLL libraries, an issue 1-2
years ago):
The OS itself has limits on the number of open files
(yes, I know that there are other connections than files) and
these limits may quite differ from platform to platform.
On my Linux laptop, in a shell, I see
$ ulimit -n
1024
which is barely compatible with your proposed NCONNECTIONS of 1024.
Now if NCONNECTIONS is larger than the max allowed number of
open files and if R opens more files than the OS allows, the
user may get quite unpleasant behavior, e.g. R being terminated brutally
(or behaving crazily) without good R-level warning / error messages.
It's also not at all sufficient to check for the open files
limit at compile time; it has to be checked at R process startup time.
So this may need considerably more work than you / we have
hoped, and it's probably hard to find a safe number that is
considerably larger than 128 and less than the smallest of all
non-crazy platforms' {number of open files limit}.
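A startup-time check of the kind described above could look roughly like the following on POSIX systems. This is a sketch, not code from R: `query_fd_soft_limit` and `fits_under_fd_limit` are hypothetical names, and the `reserve` parameter (descriptors held back for DLLs, stdio and the like) is an assumption made for illustration. `getrlimit()` with `RLIMIT_NOFILE` is the standard POSIX call behind `ulimit -n`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Returns 1 if `wanted` connections plus `reserve` descriptors kept back for
 * other uses fit under the soft limit `soft`, 0 otherwise. */
int fits_under_fd_limit(unsigned long wanted, unsigned long reserve,
                        unsigned long soft)
{
    return wanted <= soft && reserve <= soft - wanted;
}

/* Query the soft open-files limit of the running process (the value that
 * `ulimit -n` reports). Returns 0 on success, -1 on failure.
 * An unlimited setting shows up as a very large number. */
int query_fd_soft_limit(unsigned long *soft)
{
    struct rlimit rl;
    assert(soft != NULL);
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *soft = (unsigned long) rl.rlim_cur;
    return 0;
}
```

Because the limit is read from the running process, the answer reflects the environment R was actually started in, which a compile-time constant cannot do. With a soft limit of 1024, a request for 1024 connections leaves no headroom at all, so `fits_under_fd_limit(1024, 64, 1024)` is 0.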
> Sincerely
> André GILLIBERT
[............]
Martin,
I don't think a static connection limit is sensible. Recall that connections can be anything, not necessarily sockets or file descriptors, so they are not tied to the system fd limit. For example, if you use a codec then you will need twice as many connections as fds. To be honest, the connection limit is one of the main reasons why in our big data applications we have always avoided R connections and used C-level sockets instead (another was the lack of control over the socket flags, but that has been addressed in the last release).
So I'd vote for at the very least increasing the limit significantly (at least 1k if not more) and, ideally, making it dynamic if memory footprint is an issue.
Cheers,
Simon
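The dynamic-array variant mentioned in the thread (malloc/realloc instead of a fixed table) might be sketched as follows. All names here (`conn_table`, `conn_table_alloc_slot`, `struct conn_sketch`) are hypothetical, and the doubling strategy and initial size of 16 are assumptions chosen for illustration, not taken from R.

```c
#include <assert.h>
#include <stdlib.h>

struct conn_sketch;                 /* hypothetical opaque connection type */

typedef struct {
    struct conn_sketch **slots;     /* a NULL entry marks a free slot */
    size_t capacity;                /* number of entries in `slots` */
} conn_table;

/* Return the index of a free slot, doubling the table when it is full.
 * The caller stores its connection pointer in the returned slot.
 * Returns -1 if memory cannot be obtained. */
long conn_table_alloc_slot(conn_table *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->capacity; i++)
        if (t->slots[i] == NULL)
            return (long) i;

    size_t new_cap = t->capacity ? 2 * t->capacity : 16;
    struct conn_sketch **grown = realloc(t->slots, new_cap * sizeof *grown);
    if (grown == NULL)
        return -1;                  /* the old table is still valid */
    for (size_t i = t->capacity; i < new_cap; i++)
        grown[i] = NULL;            /* new entries start out free */

    long first_new = (long) t->capacity;
    t->slots = grown;
    t->capacity = new_cap;
    return first_new;
}
```

Starting from `conn_table t = { NULL, 0 };`, the table only grows when every existing slot is taken, so a session that never opens more than a handful of connections pays for 16 pointers rather than 1024.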
On Aug 25, 2021, at 8:53 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
[............]
We do need to be careful about using too many file descriptors. The standard soft limit on Linux is fairly low (1024; the hard limit is usually quite a bit higher). Hitting that limit, e.g. with runaway code allocating lots of connections, can cause other things, like loading packages, to fail with hard-to-diagnose error messages. A static connection limit is a crude way to guard against that.
Doing anything substantially better is probably a lot of work. A simple option that may be worth pursuing is to allow the limit to be adjusted at runtime. Users who want to go higher would do so at their own risk and may need to know how to adjust the soft limit on the process.
Best,
luke
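The runtime-adjustable limit suggested here could be as small as the sketch below. The type and function names (`conn_limit`, `conn_limit_set`, `conn_limit_can_open`) are hypothetical, and the rule that the limit may not drop below the number of connections already open is an assumption added for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t limit;    /* current maximum number of connections */
    size_t in_use;   /* connections currently open */
} conn_limit;

/* Change the limit at runtime. Returns 0 on success, or -1 (leaving the
 * limit unchanged) if `wanted` is below the number already in use. */
int conn_limit_set(conn_limit *cl, size_t wanted)
{
    assert(cl != NULL);
    if (wanted < cl->in_use)
        return -1;
    cl->limit = wanted;
    return 0;
}

/* Returns 1 if one more connection may be opened, 0 otherwise. */
int conn_limit_can_open(const conn_limit *cl)
{
    assert(cl != NULL);
    return cl->in_use < cl->limit;
}
```

The default stays conservative, and a user who knows the process's fd limit has been raised can opt in to a higher value without rebuilding anything.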
On Wed, 25 Aug 2021, Simon Urbanek wrote:
[............]
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney at uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
Luke,
sure, adjustment at run-time works just fine. The issue currently is that the limit is baked in at compile time, so there is no way to adjust it (re-building R is not an option in the production environments where this usually happens).
That said, I'm still not sure that a connection limit is a good way to guard against the fd limit, since there are so many other ways to use up descriptors (DLLs, sockets, pipes, etc., in packages and 3rd-party libraries). Apparently we are actually already fiddling with the soft limit: we have R_EnsureFDLimit() and R_GetFDLimit(), which are used at startup to raise it to 1024 by default regardless of the ulimit -n setting (comments say this is for DLLs). I guess based on that we know at least what to expect, so we could trivially warn if the new setting is larger than the user limit.
Cheers,
Simon
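For readers unfamiliar with how a process raises its own soft limit, the idea can be sketched with POSIX `getrlimit()`/`setrlimit()`. This is not R's R_EnsureFDLimit(); the helper names `ensure_fd_limit_sketch` and `warn_if_over_fd_limit` and the wording of the warning are hypothetical, and the code assumes a POSIX platform.

```c
#include <limits.h>
#include <stdio.h>
#include <sys/resource.h>

/* Convert an rlim_t to long, mapping "unlimited" or oversized values
 * to LONG_MAX. */
static long rlim_to_long(rlim_t v)
{
    return (v == RLIM_INFINITY || v > (rlim_t) LONG_MAX) ? LONG_MAX : (long) v;
}

/* Try to raise the soft open-files limit to at least `desired`, capped at the
 * hard limit. Returns the soft limit in effect afterwards, or -1 on error.
 * Hypothetical helper; it is not R's R_EnsureFDLimit(). */
long ensure_fd_limit_sketch(long desired)
{
    struct rlimit rl;
    if (desired < 0 || getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    rlim_t want = (rlim_t) desired;
    if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur < want) {
        struct rlimit raised = rl;
        raised.rlim_cur = want;
        if (rl.rlim_max != RLIM_INFINITY && rl.rlim_max < want)
            raised.rlim_cur = rl.rlim_max;   /* cannot exceed the hard limit */
        if (setrlimit(RLIMIT_NOFILE, &raised) == 0)
            rl = raised;                     /* on failure keep the old value */
    }
    return rlim_to_long(rl.rlim_cur);
}

/* Warn on stderr when a requested connection limit exceeds the fd limit.
 * Returns 1 if a warning was printed, 0 otherwise. */
int warn_if_over_fd_limit(long requested_conns, long fd_limit)
{
    if (requested_conns <= fd_limit)
        return 0;
    fprintf(stderr, "warning: %ld connections requested, but only %ld "
                    "file descriptors are available\n",
            requested_conns, fd_limit);
    return 1;
}
```

A caller would first ask for the descriptor budget it needs, then compare the requested connection limit against whatever the call actually returned, since the hard limit may prevent the full raise.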
On Aug 25, 2021, at 1:45 PM, luke-tierney at uiowa.edu wrote:
[............]
On 8/25/21 6:05 AM, Simon Urbanek wrote:
[............]
Hi Simon,
I think the handling of the OS connections limit (querying, increasing, basing the real DLL limit on that and on a user request), which takes into account the problems described by Martin and Luke, could be extended to cover the connections limit in question now. The DLL limit heuristics were chosen based on our R hard limit on the number of connections. Some background is in https://developer.r-project.org/Blog/public/2018/03/23/maximum-number-of-dlls/index.html
If it turns out to be too much work for the near future/next release (it will be a lot of work to get it right, including the heuristics and the interactions between the connections limit and the DLL limit), we could at least (perhaps temporarily) allow users who explicitly want to override the limit and take the risk to do so, perhaps with some warnings when the overridden value seems too large given the OS limit and the DLL limit.
Cheers,
Tomas