On 07/08/2007 5:06 PM, Herve Pages wrote:
Hi,
?rawToChar
'rawToChar' converts raw bytes either to a single character string
or a character vector of single bytes. (Note that a single
character string could contain embedded nuls.)
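A small sketch of the second mode the help page mentions, assuming a current R installation. With multiple = TRUE each byte becomes its own element, so the nul byte needs no special treatment and comes back as an empty string. The variable names are only for illustration.

```r
# Six bytes: "A" "B" "C" "D", a nul, then "F" (same values as in this thread)
raw0 <- as.raw(c(65:68, 0, 70))

# One element per byte; the nul byte becomes the empty string ""
bytes <- rawToChar(raw0, multiple = TRUE)
print(bytes)
```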
Allowing embedded nuls in a string might be an interesting experiment, but
it seems to cause trouble for most of the string manipulation functions.
A string with an embedded 0:
raw0 <- as.raw(c(65:68, 0, 70))
string0 <- rawToChar(raw0)
string0
[1] "ABCD\0F"
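This behaviour is version dependent. Later R releases reject embedded nuls in character strings, and rawToChar() then signals an error instead of returning "ABCD\0F". A minimal sketch, assuming such a release, that checks for a nul byte and traps the failed conversion (has_nul and conversion_failed are hypothetical names):

```r
raw0 <- as.raw(c(65:68, 0, 70))

# Does the raw vector contain a nul byte anywhere?
has_nul <- any(as.integer(raw0) == 0L)

# On releases that reject embedded nuls the conversion raises an error;
# capture the condition instead of letting it abort the script
res <- tryCatch(rawToChar(raw0), error = function(e) e)
conversion_failed <- inherits(res, "error")

cat("contains nul:", has_nul, " conversion failed:", conversion_failed, "\n")
```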
nchar() should return 6:
You don't state your R version. The default type of counting in nchar()
has recently changed from "bytes" (where 6 is correct) to "chars" (where
4 is correct).
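One way to see where the two numbers can come from is to count on the raw vector directly. This sketch assumes (the thread does not say so) that the "chars" count of 4 reflects counting that stops at the first nul, as C's strlen() does; n_bytes and n_before_nul are illustrative names.

```r
raw0 <- as.raw(c(65:68, 0, 70))

# Every byte counts, including the nul: analogous to type = "bytes"
n_bytes <- length(raw0)

# Position of the first nul byte (NA if there is none)
nul_pos <- which(as.integer(raw0) == 0L)[1]

# Bytes before the first nul, like strlen() on the same buffer
n_before_nul <- if (is.na(nul_pos)) n_bytes else nul_pos - 1L

cat("bytes:", n_bytes, " before first nul:", n_before_nul, "\n")
```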
R version 2.6.0 Under development (unstable) (2007-07-02 r42107)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=en_US;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] rcompgen_0.1-15
And indeed:
raw0 <- as.raw(c(65:68, 0, 70))
string0 <- rawToChar(raw0)
nchar(string0, type="chars")
nchar(string0, type="bytes")
[1] 6
In addition to the string functions already mentioned, it's worth noting
that 'paste' doesn't seem to be "embedded nul aware" either:
paste(string0, "G", sep="")
[1] "ABCDG"
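If the bytes themselves matter, one workaround is to avoid character strings altogether and concatenate at the raw level with charToRaw() and c(). This is a minimal sketch of that alternative, not a fix for paste() itself; joined is an illustrative name.

```r
raw0 <- as.raw(c(65:68, 0, 70))

# Append the byte for "G" after all six original bytes, nul included
joined <- c(raw0, charToRaw("G"))
print(joined)
```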
The same goes for serialization:
save(string0, file="string0.rda")
load("string0.rda")
string0
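A raw vector, by contrast, should survive save()/load() byte for byte, since no string conversion is involved. A sketch assuming a writable temporary directory; the file name comes from tempfile() rather than the thread's "string0.rda", and round_trip_ok is an illustrative name.

```r
raw0 <- as.raw(c(65:68, 0, 70))

path <- tempfile(fileext = ".rda")
save(raw0, file = path)

# Load into a fresh environment so the original object is not overwritten
env <- new.env()
load(path, envir = env)

round_trip_ok <- identical(env$raw0, raw0)
print(round_trip_ok)
unlink(path)
```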