Hi,
while investigating a Sweave hang inside an odfWeave call (OOo XML
files have very long lines), I discovered that cat() cannot write more
than 10000 characters to a text file. With more than that, the internal
C code hangs, and the only way out is a quit signal that terminates
the R session. Is this behavior normal?
Code to reproduce this:
testChunk <- paste(rep("a", 10000 + 1), ## delete "+ 1" to be successful
                   collapse = "")
output <- tempfile()
cat(testChunk, sep = "\n", file = output, append = TRUE)
My sessionInfo:
R version 2.8.1 (2008-12-22)
i686-pc-linux-gnu (actually the latest openSuse 11.1)
locale:
LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;LC_COLLATE=de_DE.UTF-8;LC_MONETARY=C;LC_MESSAGES=de_DE.UTF-8;LC_PAPER=de_DE.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE.UTF-8;LC_IDENTIFICATION=C
Thanks in advance,
Daniel
cat cannot write more than 10000 characters? [R 2.8.1]
7 messages · Daniel Sabanés Bové, Brian Ripley
On Sun, 4 Jan 2009, Daniel Sabanés Bové wrote:
Hi, during the examination of a Sweave hang-up inside an odfWeave call (OOo XMLs have looong lines) I have discovered that my cat function cannot write more than 10000 characters to a text file. Otherwise, the internal
You mean on a single line?
C code causes a hang-up, which can only be stopped with a quit signal that terminates the R session. Is this behavior normal?
No, works for me on Mac OS X and x86_64 Fedora 8 (as does 10x larger). Can you run this under a debugger and find where it is going wrong for you?
Code to reproduce this:
testChunk <- paste(rep("a", 10000 + 1), ## delete "+ 1" to be successful
collapse="")
output <- tempfile()
cat(testChunk, sep = "\n", file = output, append = TRUE)
We have writeLines() for that and it is more efficient, especially if you keep a connection open.
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
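The writeLines() suggestion above could look roughly like the following. This is only a sketch: the line lengths and the variable names (chunks, con) are made up for illustration, and it assumes a session where no global "encoding" option has been set.

```r
# Three lines of different lengths, one of them longer than 10000 bytes.
# The lengths are arbitrary illustration values.
chunks <- vapply(c(10001, 20000, 5),
                 function(n) paste(rep("a", n), collapse = ""),
                 character(1))

output <- tempfile()

# Open the connection once and keep it open for all writes,
# instead of letting cat() reopen the file on every call.
con <- file(output, open = "wt")
for (chunk in chunks) {
  writeLines(chunk, con)  # writes the string plus a line terminator
}
close(con)

# Read the file back to confirm that each line arrived complete.
nchar(readLines(output))
```

Keeping one connection open avoids reopening the file in append mode for every chunk, which is presumably the efficiency gain meant above.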
Dear Prof. Ripley,
I have discovered that my cat function cannot write more than 10000 characters to a text file.
You mean on a single line?
Yes. OOo tries to save space...
No, works for me on Mac OS X and x86_64 Fedora 8 (as does 10x larger). Can you run this under a debugger and find where it is going wrong for you?
Oh, then this might be distribution- or gcc-version-specific:
gcc --version
gcc (SUSE Linux) 4.3.2 [gcc-4_3-branch revision 141291]
glibc is version 2.9-2.3.
Using ddd I found the (relevant part of the) backtrace when interrupting
the infinite loop:
(gdb) backtrace
#0 __gconv (cd=0x846cde0, inbuf=0xbfff7738, inbufend=0x84ca589 "",
outbuf=0xbfff773c, outbufend=0xbfff9e57 "", irreversible=0xbfff76a8) at
gconv.c:80
The program comes here more than 100 000 times... with outbuf and inbuf
always being "\0".
#1 0xb7b581e7 in iconv (cd=0x846cde0, inbuf=0xbfff7738,
inbytesleft=0xbfff7734, outbuf=0xbfff773c, outbytesleft=0xbfff7730) at
iconv.c:53
[this is result = __gconv (gcd, (const unsigned char **) inbuf,
(const unsigned char *) (*inbuf + *inbytesleft),
(unsigned char **) outbuf,
(unsigned char *) (*outbuf + *outbytesleft),
&irreversible);]
#2 0xb7e44d29 in Riconv (cd=0x846cde0, inbuf=0xbfff7738,
inbytesleft=0xbfff7734, outbuf=0xbfff773c, outbytesleft=0xbfff7730) at
sysutils.c:692
[ this is the only line of Riconv, return iconv((iconv_t) cd,
(ICONV_CONST char **) inbuf, inbytesleft, outbuf, outbytesleft);]
#3 0xb7d2c337 in dummy_vfprintf (con=0x8400bb0, format=0xb7ee0c48 "%s",
ap=0xbfffc604 "\230?L\b??\005\b?h\a\b?h\a\b??\005\b??\005\b\001") at
connections.c:316
[this is ires = Riconv(con->outconv, &ib, &inb, &ob, &onb);]
The infinite loop seems to be inside dummy_vfprintf, as this position is
the "highest" inside the backtrace which is reached again and again. And
at line 249 appears the magic number 10000 as BUFSIZE, which is indeed
selected by the preprocessor in my environment!
#4 0xb7d2c4fa in file_vfprintf (con=0x8400bb0, format=0xb7ee0c48 "%s",
ap=0xbfffc604 "\230?L\b??\005\b?h\a\b?h\a\b??\005\b??\005\b\001") at
connections.c:579
[this is if(con->outconv) return dummy_vfprintf(con, format, ap);]
This and everything above is only reached once, so this might be OK.
#5 0xb7dfe069 in Rvprintf (format=0xb7ee0c48 "%s", arg=0xbfffc604
"\230?L\b??\005\b?h\a\b?h\a\b??\005\b??\005\b\001") at printutils.c:785
[this is (con->vfprintf)(con, format, argcopy);]
#6 0xb7dfe244 in Rprintf (format=0xb7ee0c48 "%s") at printutils.c:679
[this is Rvprintf(format, ap);]
#7 0xb7d0446c in do_cat (call=0x83032a8, op=0x806b7d4, args=<value
optimized out>, rho=0x830359c) at builtin.c:597
[this is Rprintf("%s", p);]
Unfortunately, I'm not experienced in R/C code internals, but if you
have detailed instructions for me (like "show me the value of this
variable after 10000 stops") I can provide more debugging info.
cat(testChunk, sep = "\n", file = output, append = TRUE)
We have writeLines() for that and it is more efficient, especially if you keep a connection open.
OK, maybe Prof. Leisch wants to improve the Sweave code...?
Thank you very much for your help,
best regards,
Daniel Sabanes
6 days later
Looks like a bug in your iconv. However, that section of code is
conditionalized by
if(con->outconv) { /* translate the buffer */
and I don't see that as non-NULL on my systems. It should only be
called when you specify an encoding on the output connection, so have
you set an option (e.g. "encoding") without telling us?
I was able to reproduce a similar problem by
cat(testChunk, sep = "\n", file = file("output", encoding="latin1"),
append = TRUE)
in a UTF-8 locale, and I'll add a workaround to the R sources.
Please do run your tests with R --vanilla and make sure they are
complete -- see the posting guide.
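A slightly tidier variant of the reproduction above, as a sketch only: it names the connection so that it can be closed explicitly, and it writes to a temporary file rather than to a file called "output". The names path, con and written are illustrative. On an affected R 2.8.1 / iconv combination this would presumably hang in the same way; on a fixed build it should simply write the line.

```r
# One line of 10001 ASCII characters, as in the thread.
testChunk <- paste(rep("a", 10000 + 1), collapse = "")

# Temporary file used here instead of the literal "output" from the thread.
path <- tempfile()

# Request re-encoding on this one connection only; this is what makes
# the output conversion (con->outconv) non-NULL for the write.
con <- file(path, open = "wt", encoding = "latin1")
cat(testChunk, sep = "\n", file = con)
close(con)

# cat() does not add a trailing newline for a single element, so
# suppress the "incomplete final line" warning when reading back.
written <- readLines(path, warn = FALSE)
nchar(written)
```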
On Mon, 5 Jan 2009, Daniel Sabanés Bové wrote:
Dear Prof. Ripley,
I have discovered that my cat function cannot write more than 10000 characters to a text file.
I think you meant *bytes*, BTW.
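To illustrate the bytes/characters distinction: for the all-ASCII test string the two counts coincide, but for text with non-ASCII characters in UTF-8 they differ, so a 10000-byte buffer can fill up after fewer than 10000 characters. A small sketch (the example string is chosen here, it is not from the test case):

```r
# Use \u escapes so the string is UTF-8 regardless of the script's encoding.
x <- "Saban\u00e9s Bov\u00e9"

nchar(x, type = "chars")  # number of characters
nchar(x, type = "bytes")  # number of bytes; each "é" takes two in UTF-8

# The ASCII test string from the thread: both counts are the same.
testChunk <- paste(rep("a", 10000 + 1), collapse = "")
nchar(testChunk, type = "chars") == nchar(testChunk, type = "bytes")
```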
Yes, I set the encoding to UTF-8 in my .Rprofile. Sorry that I didn't
mention it already. So the complete stand-alone test code which fails in
R --vanilla is the following:
### code begin
options(encoding = "utf-8")
testChunk <- paste(rep("a", 10000 + 1), ## delete "+ 1" to be successful
                   collapse = "")
output <- tempfile()
cat(testChunk, sep = "\n", file = output, append = TRUE)
### code end
And the version and locale of my system are
R version 2.8.1 (2008-12-22)
i686-pc-linux-gnu
locale:
LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;LC_COLLATE=de_DE.UTF-8;LC_MONETARY=C;LC_MESSAGES=de_DE.UTF-8;LC_PAPER=de_DE.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE.UTF-8;LC_IDENTIFICATION=C
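For anyone checking their own setup, a sketch of how one might detect a changed "encoding" option (for example from .Rprofile) and switch it back to the default "native.enc" for a single write. The variable names enc and old are illustrative, and the sketch assumes that restoring the default is acceptable for the output in question.

```r
# The default value of this option is "native.enc", i.e. no re-encoding.
enc <- getOption("encoding")
if (!identical(enc, "native.enc")) {
  message("Non-default connection encoding in effect: ", enc)
}

testChunk <- paste(rep("a", 10000 + 1), collapse = "")
output <- tempfile()

# Temporarily restore the default so the connection opened by cat()
# is created without an output conversion, then put the old value back.
old <- options(encoding = "native.enc")
cat(testChunk, sep = "\n", file = output, append = TRUE)
options(old)
```

options() returns the previous values, so options(old) restores whatever was set before, including a value coming from .Rprofile.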
On Sun, 11 Jan 2009, Daniel Sabanés Bové wrote:
Yes, I set the encoding to UTF-8 in my .Rprofile.
You really don't want to do that: it adds a considerable overhead and relies on a bug-free iconv .... The latest R-patched should work around this.
Thank you very much for your help and advice!