i wonder about the following examples showing incoherence in how type
conversions are done in r:
x = TRUE
x[2] = as.raw(1)
# Error in x[2] = as.raw(1) :
# incompatible types (from raw to logical) in subassignment type fix
it seems that there is an attempt to coerce the raw value to logical
here, which fails, even though
as.logical(as.raw(1))
# TRUE
likewise,
x[2] = 1L
# the vector is silently coerced upwards to integer
x[2] = as.raw(1)
# Error in x[2] = as.raw(1) :
# incompatible types (from raw to integer) in subassignment type fix
even though
as.integer(as.raw(1))
# 1
and likewise for double and complex.
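the comparison above can be run over all four target types at once. this is only a minimal sketch: the helper name try_subassign is made up for illustration, and the outcome of the subassignment is reported rather than assumed, since it may differ between r versions. the explicit coercion is expected to succeed for each type.

```r
# hypothetical helper: try x[2] = value and report the resulting type,
# or the error message, instead of aborting on the first failure
try_subassign = function(x, value)
   tryCatch({ x[2] = value; typeof(x) },
            error = function(e) paste('error:', conditionMessage(e)))

# one length-one vector per target type
targets = list(logical = TRUE, integer = 1L, double = 1, complex = 1i)

# explicit coercion of a raw value to the type of each target
explicit = lapply(targets, function(x) as.vector(as.raw(1), mode = typeof(x)))

# subassignment of the same raw value into each target
assigned = sapply(targets, try_subassign, value = as.raw(1))

str(explicit)
print(assigned)
```

comparing the two printouts side by side shows, for each type, whether subassignment agrees with the explicit as.* conversion.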
there's another incoherence:
x = 1
x[2] = 1i
x
# 1+0i 0+1i
x = 1i
x[2] = 1
x
# 0+1i 1+0i
in both cases, the higher type is used for the result; in the former
case, the vector is coerced upwards, in the latter, the assigned value
is coerced upwards. however:
x = 1
x[2] = as.raw(1)
# error: incompatible types (from raw to double)
x = as.raw(1)
x[2] = 1
# error: incompatible types (from double to raw)
leaving aside that
as.double(as.raw(1))
# 1
as.raw(as.double(1))
# 01
work just fine, in both cases there is an attempt to coerce the assigned
value to the vector type, and not to the higher type (which would
presumably be double, as in ?c), as in the previous example.
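to see the asymmetry in one place, one could tabulate the result type for every (vector type, assigned type) pair. the following is a sketch only; result_type and the vals list are names invented here, and the raw rows and columns are left to whatever the running r version does.

```r
# hypothetical helper: assign value into position 2 of x and report
# the type of the result, or the error message if the assignment fails
result_type = function(x, value)
   tryCatch({ x[2] = value; typeof(x) },
            error = function(e) paste('error:', conditionMessage(e)))

# one representative value per type under discussion
vals = list(raw = as.raw(1), double = 1, complex = 1i)

# all combinations of vector type and assigned value type
pairs = expand.grid(vector = names(vals), value = names(vals),
                    stringsAsFactors = FALSE)
pairs$result = mapply(function(v, a) result_type(vals[[v]], vals[[a]]),
                      pairs$vector, pairs$value, USE.NAMES = FALSE)
print(pairs)
```

the double/complex entries come out as complex in both directions, which is the symmetric behaviour one might also expect for raw.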
interestingly,
c(1, as.raw(1))
# error: type 'raw' is unimplemented in 'RealAnswer'
(note the 'real', not 'double'), whereas
1 == as.raw(1)
# TRUE
works just fine. furthermore,
c('1', as.raw(1))
# "1" "01"
whereas
x = '1'
x[2] = as.raw(1)
# error: incompatible types (from raw to character)
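until the behaviour is made consistent, a workaround is to do the conversion explicitly before assigning, so that the result matches what c produces. a small sketch, using only as.character and as.double:

```r
x = '1'
# convert explicitly, mirroring what c('1', as.raw(1)) produced above
x[2] = as.character(as.raw(1))
x
# "1" "01"

# the same idea for a double vector: choose the conversion yourself
y = 1
y[2] = as.double(as.raw(1))
y
# 1 1
```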
yet another issue is that of indexing a raw vector with an out-of-bounds
index. the r language definition, sec. 3.4.1 [1] says:
"
We shall discuss indexing of simple vectors first. For simplicity,
assume that the expression is x[i]. (...)
If i is positive and exceeds length(x) then the corresponding selection
is NA.
"
it's probably correct to assume that 'simple vector' means 'atomic
vector', though not all r core members seem to be quite sure [2]:
"
So what is a simple vector? That is not explicitly defined, and it
probably should be. I think it is "atomic vectors, except those with a
class that has a method for [".
"
it appears that raw vectors are atomic vectors:
is.atomic(as.raw(1))
# TRUE
so an index out of bounds (at least, a positive integer index exceeding
the length of the vector) should (?) produce an NA; however,
as.raw(1)[2]
# 00
this is presumably because there is no raw NA, and an NA of whatever
type is converted to 0 by as.raw:
as.raw(NA)
# 00
# warning: out-of-range values treated as 0 in coercion to raw
but in this case there's a warning. why does out-of-bounds indexing
of a raw vector not produce a warning? following the language
definition, the fact that a raw vector is atomic, and the above informal
statement on simple and atomic vectors, the selection should first
produce an NA, which only subsequently is coerced to the raw 0 -- with a
warning. there's an analogous issue with out-of-bounds assignment:
x = as.raw(1)
x[3] = as.raw(3)
x
# 01 00 03
but
x = 1
x[3] = 3
x
# 1 NA 3
as.raw(x)
# 01 00 03
# warning: out-of-range values treated as 0 in coercion to raw
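for illustration, the warning one might expect from out-of-bounds indexing can be emulated with a wrapper. this is a sketch under the assumption that a warning is the desired behaviour; raw_at is a made-up name, not an existing r function.

```r
# hypothetical helper: index a raw vector, but signal a warning when a
# positive index exceeds the length, where other atomic types give NA
raw_at = function(x, i) {
   stopifnot(is.raw(x))
   if (any(i > length(x), na.rm = TRUE))
      warning('index out of bounds; raw has no NA, result contains 00')
   x[i]
}

r = as.raw(1)
r[2]
# 00

# capture the warning message instead of the silently produced 00
res = tryCatch(raw_at(r, 2), warning = function(w) conditionMessage(w))
res

# after out-of-bounds assignment, the gap at position 2 cannot be
# told apart from a genuine zero byte
x = as.raw(1)
x[3] = as.raw(3)
identical(x[2], as.raw(0))
# TRUE
```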
is all this an intended feature?
vQ
[1] http://cran.r-project.org/doc/manuals/R-lang.html#Indexing-by-vectors
[2] http://tolstoy.newcastle.edu.au/R/e6/devel/09/03/0954.html
incoherent conversions from/to raw
2 messages · Wacek Kusnierczyk
Wacek Kusnierczyk wrote:
interestingly,
c(1, as.raw(1))
# error: type 'raw' is unimplemented in 'RealAnswer'
three more comments.
(1)
the above is interesting in the light of what ?c says:
"
The output type is determined from the highest type of the
components in the hierarchy NULL < raw < logical < integer < real
< complex < character < list < expression.
"
which seems to suggest that raw components should be coerced to whatever
the highest type among all arguments to c is, which clearly doesn't happen:
test = function(type)
   c(as.raw(1), get(sprintf('as.%s', type))(1))
for (type in c('null', 'logical', 'integer', 'real', 'complex',
               'character', 'list', 'expression'))
   tryCatch(test(type),
            error = function(e) cat(sprintf("raw won't coerce to %s type\n", type)))
which shows that raw won't coerce to the first four types in the
'hierarchy' (excluding NULL), but it will to character, list, and
expression.
suggestion: improve the documentation, or adapt the implementation to
a more coherent design.
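a variant of the test above that records the result type (or the error message) for each type, rather than only printing failures, might look as follows. it is a sketch: the names coercers and combine_with_raw are invented here, and mapping 'real' to as.double is an assumption of this example.

```r
# coercion functions keyed by the type names used in ?c; 'real' is
# mapped to as.double here, which is an assumption of this sketch
coercers = list(logical = as.logical, integer = as.integer,
                real = as.double, complex = as.complex,
                character = as.character, list = as.list)

# combine a raw value with a value produced by f and record the type
# of the result, or the error message if c() fails
combine_with_raw = function(f)
   tryCatch(typeof(c(as.raw(1), f(1))),
            error = function(e) paste('error:', conditionMessage(e)))

results = sapply(coercers, combine_with_raw)
print(results)
```

the entries for the first four types show directly whether c follows the documented hierarchy in the r version at hand.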
(2)
incidentally, there's a bug somewhere there related to the condition
system and printing:
tryCatch(stop(), error=function(e) print(e))
# works just fine
tryCatch(stop(), error=function(e) sprintf('%s', e))
# *** caught segfault ***
# address (nil), cause 'memory not mapped'
# Traceback:
# 1: sprintf("%s", e)
# 2: value[[3]](cond)
# 3: tryCatchOne(expr, names, parentenv, handlers[[1]])
# 4: tryCatchList(expr, classes, parentenv, handlers)
# 5: tryCatch(stop(), error = function(e) sprintf("%s", e))
# Possible actions:
# 1: abort (with core dump, if enabled)
# 2: normal R exit
# 3: exit R without saving workspace
# 4: exit R saving workspace
# Selection:
interestingly, it is possible to stay in the session by typing ^C. the
session seems to work, but if the tryCatch above is tried once again, a
segfault causes r to crash immediately:
# ^C
tryCatch(stop(), error=function(e) sprintf('%s', e))
# [whoever at wherever] $
however, this doesn't happen if some other code is evaluated first:
# ^C
x = 1:10^8
tryCatch(stop(), error=function(e) sprintf('%s', e))
# Error in sprintf("%s", e) : 'getEncChar' must be called on a CHARSXP
this can't be a feature. (tried in both 2.8.0 and r-devel; version
info at the bottom.)
suggestion: trace down and fix the bug.
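independently of the bug, passing the condition object itself to sprintf is probably not what one wants; formatting its message avoids handing a list to '%s'. a minimal sketch using conditionMessage and conditionCall (the function f is just an example):

```r
# format the message of the condition, not the condition object
msg = tryCatch(stop('something went wrong'),
               error = function(e) sprintf('%s', conditionMessage(e)))
msg
# "something went wrong"

# the call, if needed, can be deparsed separately
f = function() stop('failed inside f')
info = tryCatch(f(), error = function(e)
   sprintf('%s: %s', deparse(conditionCall(e))[1], conditionMessage(e)))
info
# "f(): failed inside f"
```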
(3)
the error argument to tryCatch is used in two examples in ?tryCatch, but
it is not explained anywhere in the help page. one can guess that the
argument name corresponds to the class of conditions the handler will
handle, but it would be helpful to have this stated explicitly. the
help page simply says:
"
If a condition is signaled while evaluating 'expr' then
established handlers are checked, starting with the most recently
established ones, for one matching the class of the condition.
When several handlers are supplied in a single 'tryCatch' then the
first one is considered more recent than the second.
"
which is uninformative in this respect -- what does 'one matching the
class' mean?
suggestion: improve the documentation.
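as far as i can tell, 'matching the class' means that the handler's argument name is looked up among class(cond). the following sketch illustrates this reading; the class name myCondition is made up for the example.

```r
# a condition object is a list with a class attribute; the class names
# are what handler argument names are compared against
e = simpleError('boom')
class(e)
# "simpleError" "error" "condition"

# a made-up condition class, for illustration only
cond = structure(class = c('myCondition', 'condition'),
                 list(message = 'custom signal', call = NULL))

# the handler named myCondition is chosen because 'myCondition' is
# among class(cond); the error handler is skipped, as 'error' is absent
res = tryCatch(stop(cond),
               error = function(e) 'error handler',
               myCondition = function(c) 'myCondition handler')
res
# "myCondition handler"
```

on this reading, error = function(e) ... handles anything whose class vector contains 'error', which is what stop() with a character message produces.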
vQ
version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status
major          2
minor          8.0
year           2008
month          10
day            20
svn rev        46754
language       R
version.string R version 2.8.0 (2008-10-20)
version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status         Under development (unstable)
major          2
minor          9.0
year           2009
month          03
day            19
svn rev        48152
language       R
version.string R version 2.9.0 Under development (unstable) (2009-03-19 r48152)