From: r-devel-bounces at r-project.org=20 [mailto:r-devel-bounces at r-project.org] On Behalf Of Prof Brian Ripley Sent: Friday, November 14, 2008 2:25 AM To: Duncan Murdoch Cc: R-bugs at r-project.org; ocheyett at bonddesk.com;=20 r-devel at stat.math.ethz.ch Subject: Re: [Rd] (PR#13283) R crashes on sprintf with bad=20 format specification =20 As R's sprintf is a wrapper for the OS's sprintf, misuse does=20 run the risk of crashing from OS, and when it does the error=20 will come from the implementation of sprintf (which for R for=20 Windows is the Trio library). One could argue that the OS service should not segfault on=20 incorrect input, but this one often does. =20 The place to add checks is in the R wrapper code (presumably=20 in the C component). We do have checks there, but it is hard=20 to imagine just what errors users will make, and this one has=20 slipped through the checks.
The following change recognizes 'S' as a format type (but doesn't
do anything with it) and thus avoids the crash. =20
--- sprintf.c (revision 46944)
+++ sprintf.c (working copy)
@@ -103,7 +103,7 @@
else {
/* recognise selected types from Table B-1 of K&R */
/* This is MBCS-OK, as we are in a format spec */
- chunk =3D strcspn(formatString + cur + 1,
"aAdisfeEgGxX%") + 2;
+ chunk =3D strcspn(formatString + cur + 1,
"aAdisSfeEgGxX%") + 2;
if (cur + chunk > n)
error(_("unrecognised format at end of
string"));
The resulting error messages are not terribly direct, but nicer
than the crash
> sprintf("A %S %S %S XYZ", 1, 1, 1)
Error in sprintf("A %S %S %S XYZ", 1, 1, 1) :
use format %f, %e, %g or %a for numeric objects
> sprintf("A %S %S %S XYZ", "foo", "bar", "foo")
Error in sprintf("A %S %S %S XYZ", "foo", "bar", "foo") :
use format %s for character objects
One could add 'S' to some switch cases below there to have
it convert the corresponding argument to a wide character string
or say directly that %S is not allowed.
This piece of code finds the conversion type by looking for this
first of a bunch of known conversion characters after the %, which
doesn't find lots of illegal formats and causes some surprises. E.g.,
because 'p' is not in that list we get:
> sprintf("%p", 3)
Error in sprintf("%p", 3) : unrecognised format at end of string
> sprintf("%p is a pointer!", 3)
[1] "0x3 is a pointer!"
> sprintf("%p %o", 17L, 17L)
[1] "0x4 o"
Here the issue is the use of %S, an unknown format for=20 numbers, followed by 'X'. There is a check for a suitable=20 format, but it is not strict enough. And it is non-trivial=20 to write a printf format parser just to check for user error=20 (and potentially it will slow down correct uses:=20 speed in sprintf is important in some applications). =20 So I am not sure this is something that should be R's=20 responsibility to fix: maybe just add a warning to the help page? =20 =20 On Thu, 13 Nov 2008, Duncan Murdoch wrote: =20
On 12/11/2008 8:30 PM, ocheyett at bonddesk.com wrote:
Full_Name: Oren Cheyette Version: 2.7.2 OS: Win XP Submission from: (NULL) (64.161.123.194) =20 =20 Enter the following at the R command prompt:
sprintf("A %S %S %S XYZ", 1, 1, 1);
=20 R is not C, and the empty command at the end of that line (after the=20 semicolon) is not relevant. =20
Note the erroneous capitalized %S instead of %s and the=20
numeric inputs=20
instead of strings. With strings there's no crash - R reports bad format specifications.
2.7.2 is obsolete, but I can confirm a crash on Windows=20
with a recent=20
R-devel. The last few entries in the stack dump at the time of the=20
crash are shown=20
below; these make it look as though the problem is in the=20
Trio library, so it=20
may be hard to fix. Duncan Murdoch Rgui.exe caused an Access Violation at location 6c913fb3 in=20
module R.dll=20
Reading from location 00000001. Registers: eax=3D7fffffff ebx=3D00000000 ecx=3D00e17b21 edx=3D00000001=20
esi=3D00e1c83b edi=3D0000000a
eip=3D6c913fb3 esp=3D00e17944 ebp=3D00e17b3c iopl=3D0 nv up=20
ei pl nz na po nc
cs=3D001b ss=3D0023 ds=3D0023 es=3D0023 fs=3D003b gs=3D0000 =
efl=3D00000206
Call stack:
6C913FB3 R.dll:6C913FB3 TrioFormatProcess trio.c:2724
...
while (length > 0)
{
size =3D TrioWriteWideStringCharacter(self,=20
*wstring++, flags,=20
length); if (size =3D=3D 0) break; /* while */ ... 6C916592 R.dll:6C916592 trio_vsprintf trio.c:3771 ... return status;
status =3D TrioFormatProcess(&data, format, parameters);
if (data.error !=3D 0)
{
...
6C911F62 R.dll:6C911F62 sprintf compat.c:46
...
va_end(ap);
return res;
}
... 6C889F1E R.dll:6C889F1E do_sprintf sprintf.c:297 ... sprintf(bit, fmtp, " NaN"); else
sprintf(bit, fmtp, "NaN");
} else if (x =3D=3D R_PosInf) {
if (strcspn(fmtp, "+") < strlen(fmtp))
...
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
=20 --=20 Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 =20
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel =20