
Reply: Re: Crash with Unicode and sub (PR#14114)

I don't know about the technicalities, but Peter Dalgaard said the 
offending code also causes R to come to a stop using SUSE + WINE. Is it 
possible to run that lot on top of valgrind? Of course, it will probably 
take all day ...

If not, I have a clue which might help. The problem seems to lie in the 
"sub" routine. In the original report I used
-- cut here --
gctorture()
u <- intToUtf8(c(rep(1e3,1e2),32,c(rep(1e3,1e2))))
v <- rep(u,1e2)
v <- sub(" ","",v)
v %in% ""
-- cut here --

I've tried reducing this a bit more. Replacing intToUtf8 with a direct 
assignment writing out the string with Unicode escapes seems to make no 
difference. The %in% can be replaced with "match", leaving the following:
-- cut here --
gctorture()
u <- intToUtf8(c(rep(1e3,1e2),32,c(rep(1e3,1e2))))
v <- rep(u,1e2)
v <- sub(" ","",v)
match(v,"")
-- cut here --
This also crashes R-2.10.0 and R-2.10.1 RC (2009-12-06 r50684).
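
For comparison, here is a sketch of what a correctly working R should 
give for the reduced example. gctorture() is left out so that it finishes 
quickly, and the names "expected" and "res" are only illustrative, not 
part of the original report.

```r
# Same construction as above, without gctorture().
u <- intToUtf8(c(rep(1e3, 1e2), 32, rep(1e3, 1e2)))
v <- rep(u, 1e2)
v <- sub(" ", "", v)

# After the single space is removed, each element should be
# 200 copies of the code point 1000 (U+03E8).
expected <- intToUtf8(rep(1e3, 2e2))  # illustrative name
res <- match(v, "")                    # illustrative name

all(v == expected)  # should be TRUE
all(is.na(res))     # should be TRUE: no element equals ""
any(v %in% "")      # should be FALSE
```

Anything other than these three answers (or a crash) points at the value 
returned by sub rather than at match or %in% themselves.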

The sub line is essential so far as I can see: without it we don't get 
the crash. Adding "perl = TRUE" makes no difference (there is still a 
crash). If instead we use "fixed = TRUE", the result is strange and 
differs between R-2.10.0 and R-2.10.1 RC. This is especially strange 
because, in a bug-free R, the value of v returned from sub should be the 
same with either fixed = TRUE or perl = TRUE.
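
The expectation that all three matching modes agree can be checked 
directly. This is a minimal sketch, again without gctorture(); the names 
r_default, r_fixed and r_perl are made up for the illustration. Since the 
pattern " " contains no regular-expression metacharacters, the three 
calls should return identical vectors.

```r
u <- intToUtf8(c(rep(1e3, 1e2), 32, rep(1e3, 1e2)))
v <- rep(u, 1e2)

# Illustrative names: one result per matching mode.
r_default <- sub(" ", "", v)
r_fixed   <- sub(" ", "", v, fixed = TRUE)
r_perl    <- sub(" ", "", v, perl = TRUE)

identical(r_default, r_fixed)  # should be TRUE
identical(r_fixed, r_perl)     # should be TRUE
```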

R-2.10.0 pauses several seconds, then produces the enigmatic output:
  [1] 00 00 06 9d 78 9c cd 54 5d 4f 83 30 14 2d ec 9b a9 33 99 2f fe 89 65 1a e3
 [26] c3 de 8c 26 be 38 7d d5 c7 4a af 0c 57 ca 42 cb 8c bf dc 18 93 61 29 1d 83
 [51] 8e 7d c4 18 23 49 a1 f4 de 9e 7b 4e ef 81 47 07 21 64 23 5b 3e ec 1a 42 56
 [76] 4b de ea b6 5c b3 e4 e8 c8 d1 f2 80 41 e4 bb 32 7e 5c 58 ae f3 49 f8 66 a6
[101] ce b0 3b c5 1e c8 69 31 b5 15 80 98 84 84 cb e9 22 db 61 27 1b 53 4a 80 0d
[126] 2f 0a e3 99 9c f4 91 ba 4a 41 67 8e 69 0c d7 14 73 ae d1 cc 8c 0e f7 3d 86
[151] 45 1c 81 41 be 19 3e bf 82 2b 4c 40 ee 07 33 0a 0f 8c be a7 6f 3a 62 ad 68
[176] af e8 12 78 c1 31 15 c8 aa 9d 9a a1 5c 89 dd 57 cb eb 27 da 14 38 f2 20 dd
[201] 5c 45 ba c1 70 00 9b 14 35 dc 4c 6e 49 4d 51 e6 8e d3 4d 95 54 aa f1 19 90
[226] 32 21 27 29 73 e8 26 bf 53 d6 32 71 0a 4e da 01 51 49 e3 84 48 77 ce 81 dc
[251] 64 2d 19 ab 0d 7b 33 42 9f 99 42 64 af a7 a8 5e 9d 0f ce 86 83 a1 d9 c1 a5
[276] 7f d0 97 86 69 16 a2 dd 54 90 a6 93 21 1f 25 ba c2 d2 54 68 25 c8 31 49 d6
[301] ae ee 9f 2a ba d4 96 a6 89 03 60 22 83 2b 3b 17 53 3a ce 79 17 3e 96 b5 d3
[326] ea ea b4 3b 9f 8b 9f b9 a5 cd a7 40 41 84 4c a9 ce dd dd 49 fe c6 3e 07 7f
[351] 54 e7 7f d9 f4 50 77 5c 19 a9 e0 b9 5e b2 dd 60 79 6c 13 ad 1e 17 98 11 1c
[376] 91 db e5 5f 7e df 0f e7 43 57 c9 7d 01 6c 3e 1a 5d 0c 2f ab 99 6e a9 a8 88
[401] 57 1c 35 5a 7c 03 73 22 e4 b1

R-2.10.1 RC produces the following equally enigmatic output:
NULL
Fehler: 'getEncChar' muss für CHARSXP aufgerufen werden
(in English: Error: 'getEncChar' must be called on a CHARSXP)

So my provisional guess is the bug is somewhere in the part of the 
internal code for sub which is invoked whatever the value of fixed or 
perl. It is strange though that it makes a difference whether you specify 
fixed = TRUE or not.

George Russell


Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote on 10.12.2009 08:00:36: