I would implement it differently from the way you did. I'd call it
a rawConnection, taking a raw variable (or converting something else
using as.raw) as the input, and providing both text and binary
read/write modes (using the same conventions for text mode as a file
connection would). It *should* support seek, at least in binary
mode.
I was trying to reuse as much of the textConnection semantics and
underlying code as possible...
Having a rawConnection() entry point is simple enough. Seeking also
seems straightforward. I'm not so sure about using as.raw(). I
wondered about that, but also thought that rather than coercing to
raw, it might make more sense to cast atomic vector types to raw,
byte-for-byte.
Can you give an example of where a text-mode raw connection would be
a useful thing?
-- Dave
I would implement it differently from the way you did. I'd call it
a rawConnection, taking a raw variable (or converting something else
using as.raw) as the input, and providing both text and binary
read/write modes (using the same conventions for text mode as a file
connection would). It *should* support seek, at least in binary
mode.
I was trying to reuse as much of the textConnection semantics and
underlying code as possible...
Having a rawConnection() entry point is simple enough. Seeking also
seems straightforward. I'm not so sure about using as.raw(). I
wondered about that, but also thought that rather than coercing to
raw, it might make more sense to cast atomic vector types to raw,
byte-for-byte.
I'd prefer as.raw, so that we don't end up with two incompatible ways to
convert other objects to raw objects.
Can you give an example of where a text-mode raw connection would be
a useful thing?
No, but someone else might. Why unnecessarily let the source of the
bytes determine the mode of the connection? In the case of
textConnection, there are natural line breaks, so a text mode connection
makes sense. A raw object can contain anything, so why wouldn't someone
want to put text in it some day?
Duncan Murdoch
Having a rawConnection() entry point is simple enough. Seeking also
seems straightforward. I'm not so sure about using as.raw(). I
wondered about that, but also thought that rather than coercing to
raw, it might make more sense to cast atomic vector types to raw,
byte-for-byte.
I'd prefer as.raw, so that we don't end up with two incompatible ways to
convert other objects to raw objects.
An advantage of not using as.raw() would be that you could create a raw
connection on an object without making an extra copy, which was
another of your requests. But there would be a lack of symmetry,
because you could "r" from an arbitrary R object, but only "w" to raw,
unless there was also a way of specifying a type for the result
vector.
Having the backing store be an R object with no copy does seem tricky,
however. Currently, textConnection() makes a copy for "r" connections
but writes directly to an R object for "w" connections. The "w" case
is buggy; you can crash R by removing the target object while the
connection is being used. I'm not familiar enough with R internals to
know how to fix that. Maybe the object has to be searched for every
time the connection is used, to avoid potentially stale pointers?
Can you give an example of where a text-mode raw connection would be
a useful thing?
No, but someone else might. Why unnecessarily let the source of the
bytes determine the mode of the connection? In the case of
textConnection, there are natural line breaks, so a text mode connection
makes sense. A raw object can contain anything, so why wouldn't someone
want to put text in it some day?
It seems that a text-mode raw connection would be equivalent to a
textConnection on the result of rawToChar(), no?
While some of these possibilities seem like they might be useful, I'm
not sure that all need to be implemented immediately. If we can agree
on the basic interface and semantics, then we could implement a basic
version now, and relax restrictions on the arguments later as needed?
-- Dave
Having a rawConnection() entry point is simple enough. Seeking also
seems straightforward. I'm not so sure about using as.raw(). I
wondered about that, but also thought that rather than coercing to
raw, it might make more sense to cast atomic vector types to raw,
byte-for-byte.
I'd prefer as.raw, so that we don't end up with two incompatible ways to
convert other objects to raw objects.
An advantage of not using as.raw() would be that you could create a raw
connection on an object without making an extra copy, which was
another of your requests. But there would be a lack of symmetry,
because you could "r" from an arbitrary R object, but only "w" to raw,
unless there was also a way of specifying a type for the result
vector.
I think the cost of duplicating as.raw is worse than the cost of using
extra memory. If the lack of symmetry bothers you, a solution is to
require a raw object as input.
Having the backing store be an R object with no copy does seem tricky,
however.
In that case I wouldn't bother. It's important to get it right; being
maximally efficient is a second priority.
Currently, textConnection() makes a copy for "r" connections
but writes directly to an R object for "w" connections. The "w" case
is buggy; you can crash R by removing the target object while the
connection is being used. I'm not familiar enough with R internals to
know how to fix that. Maybe the object has to be searched for every
time the connection is used, to avoid potentially stale pointers?
I've been having an argument with some other people about something
related to this. I think they would say that the language doesn't
support writing to a variable.
I don't know the right way to fix this.
Can you give an example of where a text-mode raw connection would be
a useful thing?
No, but someone else might. Why unnecessarily let the source of the
bytes determine the mode of the connection? In the case of
textConnection, there are natural line breaks, so a text mode connection
makes sense. A raw object can contain anything, so why wouldn't someone
want to put text in it some day?
It seems that a text-mode raw connection would be equivalent to a
textConnection on the result of rawToChar(), no?
If so, then a binary mode rawConnection (with mention of the way to
convert in the Rd file) would be good enough for me.
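
As a rough illustration of the conversion mentioned here, the sketch below reads text out of a raw vector by going through rawToChar() and an ordinary textConnection(). The variable names and sample text are made up for the example, and it assumes the raw vector holds no embedded nul bytes, since rawToChar() rejects those.

```r
# Hypothetical sample data: a raw vector that happens to contain text.
bytes <- charToRaw("alpha\nbeta\ngamma")

# Convert the bytes to a single string, then read it line by line
# through a regular text connection.
con <- textConnection(rawToChar(bytes))
lines <- readLines(con)
close(con)

print(lines)
```

The newlines embedded in the string act as line breaks, so readLines() returns one element per line of text.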
While some of these possibilities seem like they might be useful, I'm
not sure that all need to be implemented immediately. If we can agree
on the basic interface and semantics, then we could implement a basic
version now, and relax restrictions on the arguments later as needed?
I'd rather get it right now, but that doesn't have to mean including
every bell and whistle someone (even me!) has suggested.
Duncan Murdoch
I think the cost of duplicating as.raw is worse than the cost of using
extra memory. If the lack of symmetry bothers you, a solution is to
require a raw object as input.
It wouldn't exactly be duplicating as.raw since this way of converting
to raw is actually to do nothing at all, just to treat the object as
if it is already raw. But, I don't have a strong opinion.
Currently, textConnection() makes a copy for "r" connections
but writes directly to an R object for "w" connections. The "w" case
is buggy; you can crash R by removing the target object while the
connection is being used. I'm not familiar enough with R internals to
know how to fix that. Maybe the object has to be searched for every
time the connection is used, to avoid potentially stale pointers?
I've been having an argument with some other people about something
related to this. I think they would say that the language doesn't
support writing to a variable.
I tried changing textConnection output connections to look up the
destination object on every access and that seems to solve the problem
without being terribly expensive.
If so, then a binary mode rawConnection (with mention of the way to
convert in the Rd file) would be good enough for me.
It seems we are coming back to something close to what I had
originally implemented?
-- Dave
I think the cost of duplicating as.raw is worse than the cost of using
extra memory. If the lack of symmetry bothers you, a solution is to
require a raw object as input.
It wouldn't exactly be duplicating as.raw since this way of converting
to raw is actually to do nothing at all, just to treat the object as
if it is already raw. But, I don't have a strong opinion.
I haven't looked at as.raw, but I think it does something other than
that. For example,
rawToChar(as.raw(1:10)) gives
"\001\002\003\004\005\006\a\b\t\n"
I don't know if there's a way to do exactly what you're proposing. One
argument against it is that the bytes for an object may vary from
platform to platform (big versus little endian, maybe 32 vs 64 bit),
whereas we try to make R code platform independent when we can.
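
To make the difference concrete, here is a small sketch comparing as.raw() with a byte-for-byte view of the same integers. It uses writeBin() to a raw vector as a stand-in for the proposed cast (an assumption made for illustration, not the mechanism under discussion) and sets the byte order explicitly to show how the bytes depend on endianness.

```r
x <- 1:10

# as.raw() coerces each element to one byte: ten integers, ten bytes.
coerced <- as.raw(x)

# writeBin() into a raw vector exposes the stored representation instead:
# four bytes per integer, in the byte order requested.
little <- writeBin(x, raw(), endian = "little")
big    <- writeBin(x, raw(), endian = "big")

print(length(coerced))  # one byte per element
print(length(little))   # four bytes per element

# The same integer 1 is laid out differently under each byte order.
print(little[1:4])
print(big[1:4])

# Reading the bytes back only works if the byte order matches.
roundtrip <- readBin(little, "integer", n = length(x), endian = "little")
```

With as.raw() the result is the same on every platform; the byte-for-byte view is not unless the byte order is pinned down as above.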
Currently, textConnection() makes a copy for "r" connections
but writes directly to an R object for "w" connections. The "w" case
is buggy; you can crash R by removing the target object while the
connection is being used. I'm not familiar enough with R internals to
know how to fix that. Maybe the object has to be searched for every
time the connection is used, to avoid potentially stale pointers?
I've been having an argument with some other people about something
related to this. I think they would say that the language doesn't
support writing to a variable.
I tried changing textConnection output connections to look up the
destination object on every access and that seems to solve the problem
without being terribly expensive.
If so, then a binary mode rawConnection (with mention of the way to
convert in the Rd file) would be good enough for me.
It seems we are coming back to something close to what I had
originally implemented?
Probably! The differences I still know about are:
- I'd like the name to reflect the data source, so rawConnection or
something similar rather than overloading textConnection.
- It needs a man page, or to be included on the textConnection man page.
Duncan
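
For readers coming to this thread later: current versions of R provide rawConnection() and rawConnectionValue() in base. The sketch below shows the binary-mode usage the discussion was converging on, including a seek on the read side. It illustrates today's interface under that assumption and is not the patch being reviewed in this thread.

```r
# Write three integers into an in-memory raw connection.
out <- rawConnection(raw(0), open = "w")
writeBin(1:3, out, endian = "little")
bytes <- rawConnectionValue(out)   # retrieve the accumulated raw vector
close(out)

# Read them back from a raw vector, in binary mode.
inp <- rawConnection(bytes, open = "r")
values <- readBin(inp, "integer", n = 3, endian = "little")

# Reposition to byte offset 4 and re-read the second integer.
invisible(seek(inp, where = 4))
second <- readBin(inp, "integer", n = 1, endian = "little")
close(inp)

print(values)
print(second)
```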