
subRaw?

3 messages · Hervé Pagès, Spencer Graves

#
Hello, All:


       Do you know of any capability to substitute more than one byte in 
an object of class "raw"?


       Consider the following:


 > let4 <- paste(letters[1:4], collapse='')
 > (let4Raw <- charToRaw(let4))
[1] 61 62 63 64
 > (let. <- sub('bc', '--', let4Raw))
[1] "61" "62" "63" "64"
 > # no substitution
 > (bc <- charToRaw('bc'))
[1] 62 63
 > (ef <- charToRaw('ef'))
[1] 65 66
 > (let. <- sub(bc, ef, let4Raw))
[1] "61" "65" "63" "64"
Warning messages:
1: In sub(bc, ef, let4Raw) :
   argument 'pattern' has length > 1 and only the first element will be used
2: In sub(bc, ef, let4Raw) :
   argument 'replacement' has length > 1 and only the first element will 
be used


       In this example, "b" was replaced by "e", but "bc" was not 
replaced by "ef".  Do you know of any function to do this?


       I ask because I need it.  I've written such a function, subRaw, 
for my own use.  If I don't hear that another exists, I plan to add the 
one I've written to the oro.dicom package.


       Thanks,
       Spencer


 > sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
#
Hi Spencer,
On 07/19/2012 08:29 PM, Spencer Graves wrote:
It makes no sense to use sub(), grep(), and family (i.e. all the stuff
based on the regex code) *directly* on a raw vector because all these
functions will start by coercing their 'x', 'text', 'pattern',
'replacement' args to character with as.character (if they are not
already character).
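
A small sketch of that coercion, reusing let4Raw from the original post (the name asChar is introduced here only for illustration). It shows why the first call found nothing to replace and why the second call replaced a single byte: each raw element becomes its own two-digit hex string, and only the first element of the coerced pattern and replacement is used.

```r
let4Raw <- charToRaw("abcd")

# as.character() on a raw vector yields one hex string per byte,
# not a single string "abcd".
asChar <- as.character(let4Raw)
print(asChar)

# No element contains the text "bc", so nothing is substituted.
noMatch <- sub("bc", "--", asChar)

# sub(bc, ef, let4Raw) effectively reduces to this call:
# the pattern is cut down to "62" and the replacement to "65".
oneByte <- sub("62", "65", asChar)
print(oneByte)
```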

But the way as.character() operates on a raw vector won't give good
results in that context. It's better to do the coercion yourself first
with rawToChar(), and coerce the result back with charToRaw():

   > charToRaw(sub("bc", "--", rawToChar(let4Raw)))
   [1] 61 2d 2d 64

IMO it would make much more sense for sub(), grep(), and family to
raise an error rather than blindly try to coerce to character, but these
functions (like many functions in R) are too polite to tell the
user s/he's doing something wrong.

Cheers,
H.

  
    
#
Hi, Hervé:
On 7/19/2012 10:19 PM, Hervé Pagès wrote:
Thanks for the reply.


       It sounds like you agree that a function "subRaw" to facilitate 
this would be useful.  In my testing, charToRaw(sub(pattern, 
replacement, rawToChar(x))) did NOT preserve binary codes that did not 
match legitimate characters.  I tried several things before finding one 
that seemed to work.
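
For illustration only, here is a minimal sketch of a byte-level substitution that never converts to character. The name subRawSketch and its argument order are hypothetical; this is not the subRaw function mentioned above, whose code does not appear in this thread. It replaces only the first match, mirroring sub(). One case where the rawToChar() round trip is known to fail is an embedded nul byte, which rawToChar() rejects; whether that is the failure seen in the testing described above is an assumption.

```r
# Hypothetical helper: replace the first occurrence of the raw vector
# 'pattern' in the raw vector 'x' with the raw vector 'replacement'.
subRawSketch <- function(pattern, replacement, x) {
  stopifnot(is.raw(pattern), is.raw(replacement), is.raw(x))
  np <- length(pattern)
  nx <- length(x)
  if (np == 0L || np > nx) return(x)
  # Candidate starts: positions of the first pattern byte that leave
  # enough room for the whole pattern.
  starts <- which(x[seq_len(nx - np + 1L)] == pattern[1L])
  for (s in starts) {
    if (all(x[s:(s + np - 1L)] == pattern)) {
      head <- x[seq_len(s - 1L)]
      tail <- x[seq_len(nx - s - np + 1L) + (s + np - 1L)]
      # Splice: bytes before the match, replacement, bytes after it.
      return(c(head, replacement, tail))
    }
  }
  x  # no match: return the input unchanged
}

# Same example as in the original post.
res1 <- subRawSketch(charToRaw("bc"), charToRaw("ef"), charToRaw("abcd"))

# Bytes that are not valid text (a nul and 0xff) pass through untouched.
bin  <- as.raw(c(0x61, 0x00, 0x62, 0x63, 0xff))
res2 <- subRawSketch(charToRaw("bc"), charToRaw("--"), bin)

# The character round trip cannot even start on this input.
roundTrip <- try(rawToChar(bin), silent = TRUE)
```

Because the comparison and splicing operate on raw vectors throughout, the replacement may be shorter or longer than the pattern, and no locale or encoding is involved.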


       Best Wishes,
       Spencer