Martin Maechler wrote:
"vQ" == Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no>
on Thu, 23 Apr 2009 11:49:54 +0200 writes:
vQ> maechler at stat.math.ethz.ch wrote:
>>
vQ> sprintf has a documented limit on strings included in the output using the
vQ> format '%s'. It appears that there is a limit on the length of strings included
vQ> with, e.g., the format '%d' beyond which surprising things happen (output
vQ> modified for conciseness):
>>
vQ> ... and this limit is *not* documented.
well, it is basically (+ a few bytes ?)
the same 8192 limit that *is* documented.
martin, ?sprintf says:
" There is a limit of 8192 bytes on elements of 'fmt' and also on
strings included by a '%s' conversion specification."
for me, it's clear that the *elements of fmt* cannot be longer than 8192
bytes, and that each single bit included in the output in place of a %s
cannot be longer than 8192. nowhere does it say that strings included
in the output in place of a %d, for example, cannot be longer than
8192. the fact that %s is particularized makes me infer that there is
something specific to %s that does not apply to %d, for example,
otherwise the help would have been formulated differently. (though
given how r help pages are written, nothing seems unlikely.)
and in fact, the limit does not seem to apply in an obvious way in cases
such as sprintf('%*d', 10000, 1), where the output is correct. at the
very least, the documentation leaves the user ignorant as to what will
happen if the limit is exceeded.
>> my version of 'man 3 sprintf' contains
>>
>>
>>>> BUGS
>>>> Because sprintf() and vsprintf() assume an arbitrarily
>>>> long string, callers must be careful not to overflow the
>>>> actual space; this is often impossible to assure. Note
>>>> that the length of the strings produced is
>>>> locale-dependent and difficult to predict. Use
>>>> snprintf() and vsnprintf() instead (or asprintf() and vasprintf).
vQ> yes, but this is c documentation, not r documentation.
Of course! ... and I *do* apply it to R's C code [sprintf.c]
and hence am even concurring with you ..
vQ> while snprintf would help avoid buffer overflow, it may not be a
vQ> solution to the issue of confused output.
I think it would / will. We would be able to give warnings and
errors, by checking the snprintf() return codes.
maybe, i can't judge without carefully examining the code for sprintf.c (which i am rather unwilling to do, having had a look at a sample).
>> More precisely, I see that some windows-only code relies on
>> snprintf() being available whereas in at least on non-Windows
>> section, I read /* we cannot assume snprintf here */
>>
>> Now such platform dependency issues and corresponding configure
>> settings I do typically leave to other R-corers with a much
>> wider overview about platforms and their compilers and C libraries.
>>
vQ> it looks like src/main/sprintf.c is just buggy, and it's plausible that
vQ> the bug could be repaired in a platform-independent manner.
definitely.
In the mean time, I've actually found that what I first said on
the usability of snprintf() in R's code base was only partly correct.
There are other parts of R code where we use snprintf() for all
platforms, hence we rely on its presence (and correct
implementation!) and so we can and I think should use it in
place of sprintf() in quite a few places inside R's sprintf.c
would be interesting to see how this improves sprintf.
>> BTW,
>> 1) sprintf("%n %g", 1,1) also seg.faults
>>
vQ> as do
vQ> sprintf('%n%g', 1, 1)
vQ> sprintf('%n%')
vQ> etc., while
vQ> sprintf('%q%g', 1, 1)
vQ> sprintf('%q%')
vQ> work just fine. strange, because per ?sprintf 'n' is not recognized as
vQ> a format specifier, so the output from the first two above should be as
vQ> from the last two above, respectively. (and likewise in the %S case,
vQ> discussed and bug-reported earlier.)
I have now fixed these bugs at least;
great, i'm going to torture the fix soon ;)
the more subtle "%<too_large_n>d" ones are different, and as I said, I'm convinced that a nice & clean fix for those will start using snprintf().
>> 2) Did you have a true use case where the 8192 limit was an
>> undesirable limit?
vQ> how does it matter?
well, we could increase it, if it did matter.
{{ you *could* have been more polite here, no?
i don't see how i could be more polite here, i had absolutely no intention to be impolite and didn't think i were. i gave a serious answer by means of a serious question. increasing an arbitrary, poorly documented limit of obscure effect is hardly any solution. suggesting that a bug is not a bug because some limit is not likely to be exceeded in practice is not a particularly good idea.
it *was* after all a serious question that I asked! }}
vQ> if you set a limit, be sure to consistently enforce
vQ> it and warn the user on attempts to exceed it. or write clearly in the
vQ> docs that such attempts will cause the output to be silently truncated.
Sure, I'm not at all disagreeing on that, and if you read this into my
posting, you misunderstand.
no, i didn't read that into your posting, i'm just referring to the state of the 'art' in r. cheers, vQ