random output with sub(fixed = TRUE)

5 messages · Roger D. Peng, Peter Dalgaard, Brian Ripley +1 more

#
I've noticed what I think is curious behavior in using 'sub(fixed = TRUE)' and 
was wondering if my expectation is incorrect.  Here is one example:

v <- paste(0:10, "asdf", sep = ".")
sub(".asdf", "", v, fixed = TRUE)

The results I get are

 > sub(".asdf", "", v, fixed = TRUE)
  [1] "0"               "1\0st\0\0"       "2\0<af>\001\0\0" "3\0<af>\001\0\0"
  [5] "4\0mes\0"        "5\0<ba>\001\0\0" "6\0\0\0\0\0"     "7\0\0\0m\0"
  [9] "8\0\0\0t\0"      "9\0<fe>\0\0\0"   "10\0\0\0\0\0"
 >

I expected "0" in the first entry and everything else would be unchanged.  Your 
results may vary since every time I run 'sub()' in this way, I get a slightly 
different answer in entries 2 through 11.

As it turns out, 'gsub(fixed = TRUE)' gives me the answer I *actually* wanted, 
which was to replace the string in every entry.  But I still think the behavior 
of 'sub(fixed = TRUE)' is a bit odd.

 > version
          _
platform x86_64-unknown-linux-gnu
arch     x86_64
os       linux-gnu
system   x86_64, linux-gnu
status
major    2
minor    2.1
year     2005
month    12
day      20
svn rev  36812
language R
 >

-roger
#
"Roger D. Peng" <rpeng at jhsph.edu> writes:
Argh... 

year     2005
month    12
day      21

and something like this gets discovered. It's a ritual, I tell ya, a ritual!

If you look at the output and terminate all strings at the embedded
\0, it looks much more sensible, so it should be fairly easy to spot
the cause of this bug...
#
Well, who am I to break this long-standing ritual? :)

Interestingly, while the printed output looks wrong, I get

 > v <- paste(0:10, "asdf", sep = ".")
 > a <- sub(".asdf", "", v, fixed = TRUE)
 > b <- as.character(0:10)
 > identical(a, b)
[1] TRUE
 >

-roger
#
On Wed, 21 Dec 2005, Roger D. Peng wrote:

identical is wrong!  R character strings have a true length and a C-style
length: print() prints all the characters, even those after embedded 
nuls.  identical uses

 	    if(strcmp(CHAR(STRING_ELT(x, i)),
 		      CHAR(STRING_ELT(y, i))) != 0)

which is C-style.

The issue is character.c:1015 whose nr gets trashed: note the first answer 
in the vector is correct.  So easy to fix.

This code has been the way it is now for years, so I don't think this is at all 
related to the release of 2.2.1.

#
On 12/21/2005 5:13 PM, Roger D. Peng wrote:
I think finding two separate bugs on the day after the release goes a 
bit beyond what is necessary to satisfy the ritual.

Duncan Murdoch