
R on Windows crashes when using certain characters in strings in data frames (PR#14125)

4 messages · Karl Ove Hufthammer, Duncan Murdoch

#
Full_Name: Karl Ove Hufthammer
Version: 2.10.0
OS: Windows XP
Submission from: (NULL) (93.124.134.66)


I have found a rather strange bug in R 2.10.0 on Windows, where the choice of
characters used in a string makes R crash (i.e., Windows shows a dialogue saying
that the application has a problem and must be closed).

I can reproduce the bug on two separate systems running Windows XP, and with
both R 2.10.0 and the latest R 2.10.1 RC.

The following commands trigger the crash for me:

n=1e5
k=10
x=sample(k,n,replace=TRUE)
y=sample(k,n,replace=TRUE)
xy=paste(x,y,sep=" × ")
z=sample(n)
d=data.frame(xy,z)

The last step takes a very long time, and R crashes before it has finished. Note
that if I reduce n, the problem disappears. Also, if I change the × (a
multiplication symbol) to an x (a letter), the problem also disappears (and the
last command takes almost no time to run).
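
The comparison described above can be sketched as a small timing helper. This
is only an illustration, not part of the original report: the helper name
time_df and the much smaller default n are assumptions, chosen so that the
code runs quickly, and "\u00D7" is used to write the multiplication symbol
independently of the file's encoding.

```r
# Hypothetical helper (not from the original report): time data.frame()
# construction for a given separator, following the same steps as above.
time_df <- function(sep, n = 1e3, k = 10) {
  x  <- sample(k, n, replace = TRUE)
  y  <- sample(k, n, replace = TRUE)
  xy <- paste(x, y, sep = sep)
  z  <- sample(n)
  # system.time() evaluates the assignment in this function's environment
  elapsed <- system.time(d <- data.frame(xy, z))[["elapsed"]]
  list(data = d, elapsed = elapsed)
}

ascii_run <- time_df(" x ")        # plain letter 'x' as separator
mult_run  <- time_df(" \u00D7 ")   # multiplication symbol, U+00D7

# Elapsed seconds for each separator
c(ascii = ascii_run$elapsed, mult = mult_run$elapsed)
```

On an affected build, increasing n gradually towards the value in the report
should show the difference between the two separators before the crash point
is reached.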

I originally discovered this (or a related?) bug while using 'unique' on a data
frame similar to the 'd' data frame defined above, where R would often, but not
always, crash.
R version 2.10.0 (2009-10-26) 
i386-pc-mingw32 

locale:
[1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252 
[2] LC_CTYPE=Norwegian-Nynorsk_Norway.1252   
[3] LC_MONETARY=Norwegian-Nynorsk_Norway.1252
[4] LC_NUMERIC=C                             
[5] LC_TIME=Norwegian-Nynorsk_Norway.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
1 day later
#
On Thu, 10 Dec 2009 10:20:09 +0100 (CET) karl at huftis.org
<karl at huftis.org> wrote:

            
Note: On the R Bug Tracking System Web site, the character causing the 
problem seems to be incorrectly displayed as a '.', though on the 
mailing list the correct character is used. The character should be the 
multiplication symbol, U+00D7, which looks similar to an 'x'. The 
character does exist in both ISO 8859-1 and Windows-1252.
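
Since the character is easily mangled in transit, one way to pin it down is to
write it by code point and inspect its bytes. The following is a minimal
sketch using base R only; the variable names are illustrative, and the
encoding name "CP1252" is assumed to be recognised by the platform's iconv.

```r
# Write the multiplication symbol by code point, so the encoding of the
# source file or mail client does not matter.
mult <- "\u00D7"

Encoding(mult)    # how R marks the string
utf8ToInt(mult)   # Unicode code point as an integer
charToRaw(mult)   # bytes of the UTF-8 representation

# Convert to the two single-byte encodings mentioned above and
# look at the resulting byte in each.
as_latin1 <- iconv(mult, from = "UTF-8", to = "latin1")
as_cp1252 <- iconv(mult, from = "UTF-8", to = "CP1252")
charToRaw(as_latin1)
charToRaw(as_cp1252)
```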
#
On 11/12/2009 6:36 AM, Karl Ove Hufthammer wrote:
Yes, I can reproduce this, and I know a likely cause.  It will be fixed 
in R-devel soon, but I think it will probably be too late to make it 
into 2.10.1.  It will go into 2.10.1 patched after release if that's the 
case.

Duncan Murdoch
2 days later
#
On 10/12/2009 4:20 AM, karl at huftis.org wrote:
This was related to encoding changes.  It likely appeared 
Windows-specific because Windows uses a different default encoding than 
most Linux systems.  I believe it is fixed now in R-devel, and it will 
soon make it into 2.10.1-patched, but it came too late to make it into 
today's release.
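
For anyone checking whether their own session is in the affected situation, a
rough sketch of how to inspect the session's encoding and the encoding marks on
a string follows. It uses only base R functions; the variable names are
illustrative, and it does not reproduce or diagnose the bug itself.

```r
# Is the current session running in a UTF-8 or a Latin-1 locale?
info <- l10n_info()
info[["UTF-8"]]
info[["Latin-1"]]
Sys.getlocale("LC_CTYPE")

# The same character can carry different encoding marks depending on
# how the string was created.
s_utf8   <- "\u00D7"
s_latin1 <- iconv(s_utf8, from = "UTF-8", to = "latin1")
Encoding(c(s_utf8, s_latin1))

# Converting back to UTF-8 recovers the same string.
identical(enc2utf8(s_latin1), s_utf8)
```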

I believe PR#14114 was the same issue and is also fixed, but I did less 
testing of it.  I'd appreciate it if those who saw either bug in real 
code test the patches.  They should be in today's tarball of R-devel, 
and did make it into the Windows binary build of R-devel this morning.

Duncan