RFC: "loop connections"

12 messages · David Hinds, Brian Ripley, Martin Maechler +2 more

#
I've just implemented a generalization of R's text connections, to
also support reading/writing raw binary data.  There is very little
new code to speak of.  For input connections, I wrote code to populate
the old text connection buffer from a raw vector, and provided a new
raw_read() method.  For output connections, I wrote a raw_write() to
append to a raw vector.  On input, the mode (text or binary) is
determined by the data type of the input object; on output, I use the
requested output mode (i.e. "w" / "wb").  For example:

 > con <- loopConnection("r", "wb")
 > a <- c(10,100,1000)
 > writeBin(a, con, size=4)
 > r
  [1] 00 00 20 41 00 00 c8 42 00 00 7a 44
 > close(con)
 > con <- loopConnection(r)
 > readBin(con, "double", n=3, size=4)
 [1]   10  100 1000
 > close(con)
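
The raw bytes in the session above can be checked by hand. What follows is a minimal sketch in C, not part of the patch: the helper names put_float32_le and pack_float32_le are made up for illustration, and the code assumes that float is a 32-bit IEEE 754 type and that the bytes were written least significant byte first, as on the machine that produced the output shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store one value as a 4-byte IEEE 754 single, least significant byte
   first.  Assumes float is a 32-bit IEEE 754 type. */
static void put_float32_le(float value, unsigned char out[4])
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);  /* copy the bit pattern safely */
    out[0] = (unsigned char)(bits & 0xffu);
    out[1] = (unsigned char)((bits >> 8) & 0xffu);
    out[2] = (unsigned char)((bits >> 16) & 0xffu);
    out[3] = (unsigned char)((bits >> 24) & 0xffu);
}

/* Pack n doubles as consecutive 4-byte singles, narrowing each value
   to single precision first.  out must have room for 4*n bytes. */
static void pack_float32_le(const double *values, size_t n,
                            unsigned char *out)
{
    for (size_t i = 0; i < n; i++)
        put_float32_le((float)values[i], out + 4 * i);
}
```

Packing the values 10, 100 and 1000 this way gives the twelve bytes 00 00 20 41 00 00 c8 42 00 00 7a 44, since the three singles have the bit patterns 0x41200000, 0x42C80000 and 0x447A0000.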

I think "loop connection" is a better name for this sort of connection
than "text connection" was, even for the old version; the old name
confuses the mode of the connection (text vs binary) with the
mechanism (file, socket, etc).

I've appended a patch to the end of this message.  As implemented
here, textConnection is replaced by loopConnection but functionally
this is a superset of the old textConnection.  For compatibility, one
could add:

  textConnection <- function(...) loopConnection(...)

The patch is against R-2.1.1.  I can investigate whether any changes
are required for the current development tree.  I can also update the
documentation files as required.  I thought I'd first check whether
anyone else thought this was worth inclusion before spending more time
on it.

The raw_write() code could be improved with smarter memory allocation
(grabbing bigger chunks rather than reallocating the raw vector for
every write), but this is at least a proof of principle.
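
One common way to get the "bigger chunks" behaviour is geometric growth: keep a capacity separate from the length in use, and double the capacity whenever an append does not fit. The sketch below shows the idea on a plain malloc'd buffer. It is only an illustration: the bytebuf type and its functions are hypothetical, and it does not use R's allocator or say how an R raw vector should be handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical append-only byte buffer; not part of the R sources. */
typedef struct {
    unsigned char *data;
    size_t len;  /* bytes in use */
    size_t cap;  /* bytes allocated */
} bytebuf;

/* Append n bytes, doubling the capacity when the data does not fit.
   Returns 0 on success, -1 on overflow or allocation failure. */
static int bytebuf_append(bytebuf *b, const void *src, size_t n)
{
    if (n > (size_t)-1 - b->len) return -1;  /* len + n would overflow */
    size_t need = b->len + n;
    if (need > b->cap) {
        size_t newcap = b->cap ? b->cap : 64;
        while (newcap < need) {
            if (newcap > (size_t)-1 / 2) { newcap = need; break; }
            newcap *= 2;
        }
        unsigned char *p = (unsigned char *) realloc(b->data, newcap);
        if (!p) return -1;  /* the old buffer is still valid */
        b->data = p;
        b->cap = newcap;
    }
    if (n) memcpy(b->data + b->len, src, n);
    b->len = need;
    return 0;
}

/* Release the storage and reset the buffer to its empty state. */
static void bytebuf_free(bytebuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```

With doubling, a long run of small writes triggers only a logarithmic number of reallocations, at the cost of up to half the allocated space being unused at any time.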

-- David Hinds



--- src/main/connections.c.orig	2005-06-17 19:05:02.000000000 -0700
+++ src/main/connections.c	2005-08-22 15:54:03.156038200 -0700
@@ -1644,13 +1644,13 @@
     return ans;
 }
 
-/* ------------------- text connections --------------------- */
+/* ------------------- loop connections --------------------- */
 
 /* read a R character vector into a buffer */
 static void text_init(Rconnection con, SEXP text)
 {
     int i, nlines = length(text), nchars = 0;
-    Rtextconn this = (Rtextconn)con->private;
+    Rloopconn this = (Rloopconn)con->private;
 
     for(i = 0; i < nlines; i++)
 	nchars += strlen(CHAR(STRING_ELT(text, i))) + 1;
@@ -1668,19 +1668,35 @@
     this->cur = this->save = 0;
 }
 
-static Rboolean text_open(Rconnection con)
+/* read a R raw vector into a buffer */
+static void raw_init(Rconnection con, SEXP raw)
+{
+    int nbytes = length(raw);
+    Rloopconn this = (Rloopconn)con->private;
+
+    this->data = (char *) malloc(nbytes);
+    if(!this->data) {
+	free(this); free(con->description); free(con->class); free(con);
+	error(_("cannot allocate memory for raw connection"));
+    }
+    memcpy(this->data, RAW(raw), nbytes);
+    this->nchars = nbytes;
+    this->cur = this->save = 0;
+}
+
+static Rboolean loop_open(Rconnection con)
 {
     con->save = -1000;
     return TRUE;
 }
 
-static void text_close(Rconnection con)
+static void loop_close(Rconnection con)
 {
 }
 
-static void text_destroy(Rconnection con)
+static void loop_destroy(Rconnection con)
 {
-    Rtextconn this = (Rtextconn)con->private;
+    Rloopconn this = (Rloopconn)con->private;
 
     free(this->data);
     /* this->cur = this->nchars = 0; */
@@ -1689,7 +1705,7 @@
 
 static int text_fgetc(Rconnection con)
 {
-    Rtextconn this = (Rtextconn)con->private;
+    Rloopconn this = (Rloopconn)con->private;
     if(this->save) {
 	int c;
 	c = this->save;
@@ -1700,48 +1716,69 @@
     else return (int) (this->data[this->cur++]);
 }
 
-static double text_seek(Rconnection con, double where, int origin, int rw)
+static double loop_seek(Rconnection con, double where, int origin, int rw)
 {
-    if(where >= 0) error(_("seek is not relevant for text connection"));
+    if(where >= 0) error(_("seek is not relevant for loop connection"));
     return 0; /* if just asking, always at the beginning */
 }
 
-static Rconnection newtext(char *description, SEXP text)
+static size_t raw_read(void *ptr, size_t size, size_t nitems,
+		       Rconnection con)
+{
+    Rloopconn this = (Rloopconn)con->private;
+    if (this->cur + size*nitems > this->nchars) {
+	nitems = (this->nchars - this->cur)/size;
+	memcpy(ptr, this->data+this->cur, size*nitems);
+	this->cur = this->nchars;
+    } else {
+	memcpy(ptr, this->data+this->cur, size*nitems);
+	this->cur += size*nitems;
+    }
+    return nitems;
+}
+
+static Rconnection newloop(char *description, SEXP data)
 {
     Rconnection new;
     new = (Rconnection) malloc(sizeof(struct Rconn));
-    if(!new) error(_("allocation of text connection failed"));
-    new->class = (char *) malloc(strlen("textConnection") + 1);
+    if(!new) error(_("allocation of loop connection failed"));
+    new->class = (char *) malloc(strlen("loopConnection") + 1);
     if(!new->class) {
 	free(new);
-	error(_("allocation of text connection failed"));
+	error(_("allocation of loop connection failed"));
     }
-    strcpy(new->class, "textConnection");
+    strcpy(new->class, "loopConnection");
     new->description = (char *) malloc(strlen(description) + 1);
     if(!new->description) {
 	free(new->class); free(new);
-	error(_("allocation of text connection failed"));
+	error(_("allocation of loop connection failed"));
     }
     init_con(new, description, "r");
     new->isopen = TRUE;
     new->canwrite = FALSE;
-    new->open = &text_open;
-    new->close = &text_close;
-    new->destroy = &text_destroy;
-    new->fgetc = &text_fgetc;
-    new->seek = &text_seek;
-    new->private = (void*) malloc(sizeof(struct textconn));
+    new->open = &loop_open;
+    new->close = &loop_close;
+    new->destroy = &loop_destroy;
+    new->seek = &loop_seek;
+    new->private = (void*) malloc(sizeof(struct loopconn));
     if(!new->private) {
 	free(new->description); free(new->class); free(new);
-	error(_("allocation of text connection failed"));
+	error(_("allocation of loop connection failed"));
+    }
+    new->text = isString(data);
+    if (new->text) {
+	new->fgetc = &text_fgetc;
+	text_init(new, data);
+    } else {
+	new->read = &raw_read;
+	raw_init(new, data);
     }
-    text_init(new, text);
     return new;
 }
 
-static void outtext_close(Rconnection con)
+static void outloop_close(Rconnection con)
 {
-    Routtextconn this = (Routtextconn)con->private;
+    Routloopconn this = (Routloopconn)con->private;
     SEXP tmp;
     int idx = ConnIndex(con);
 
@@ -1755,9 +1792,9 @@
     SET_VECTOR_ELT(OutTextData, idx, R_NilValue);
 }
 
-static void outtext_destroy(Rconnection con)
+static void outloop_destroy(Rconnection con)
 {
-    Routtextconn this = (Routtextconn)con->private;
+    Routloopconn this = (Routloopconn)con->private;
     free(this->lastline); free(this);
 }
 
@@ -1765,7 +1802,7 @@
 
 static int text_vfprintf(Rconnection con, const char *format, va_list ap)
 {
-    Routtextconn this = (Routtextconn)con->private;
+    Routloopconn this = (Routloopconn)con->private;
     char buf[BUFSIZE], *b = buf, *p, *q, *vmax = vmaxget();
     int res = 0, usedRalloc = FALSE, buffree,
 	already = strlen(this->lastline);
@@ -1830,24 +1867,41 @@
     return res;
 }
 
-static void outtext_init(Rconnection con, char *mode, int idx)
+static size_t raw_write(const void *ptr, size_t size, size_t nitems,
+			Rconnection con)
+{
+    Routloopconn this = (Routloopconn)con->private;
+    SEXP tmp;
+    int idx = ConnIndex(con);
+
+    PROTECT(tmp = lengthgets(this->data, this->len + size*nitems));
+    memcpy(RAW(tmp)+this->len, ptr, size*nitems);
+    this->len += size*nitems;
+    defineVar(this->namesymbol, tmp, VECTOR_ELT(OutTextData, idx));
+    this->data = tmp;
+    UNPROTECT(1);
+    return nitems;
+}
+
+static void outloop_init(Rconnection con, char *mode, int idx)
 {
-    Routtextconn this = (Routtextconn)con->private;
+    Routloopconn this = (Routloopconn)con->private;
+    int st = (con->text ? STRSXP : RAWSXP);
     SEXP val;
 
     this->namesymbol = install(con->description);
-    if(strcmp(mode, "w") == 0) {
+    if(strncmp(mode, "w", 1) == 0) {
 	/* create variable pointed to by con->description */
-	PROTECT(val = allocVector(STRSXP, 0));
+	PROTECT(val = allocVector(st, 0));
 	defineVar(this->namesymbol, val, VECTOR_ELT(OutTextData, idx));
 	UNPROTECT(1);
     } else {
 	/* take over existing variable */
 	val = findVar1(this->namesymbol, VECTOR_ELT(OutTextData, idx),
-		       STRSXP, FALSE);
+		       st, FALSE);
 	if(val == R_UnboundValue) {
-	    warning(_("text connection: appending to a non-existent char vector"));
-	    PROTECT(val = allocVector(STRSXP, 0));
+	    warning(_("loop connection: appending to a non-existent vector"));
+	    PROTECT(val = allocVector(st, 0));
 	    defineVar(this->namesymbol, val, VECTOR_ELT(OutTextData, idx));
 	    UNPROTECT(1);
 	}
@@ -1859,49 +1913,55 @@
 }
 
 
-static Rconnection newouttext(char *description, SEXP sfile, char *mode,
+static Rconnection newoutloop(char *description, SEXP sfile, char *mode,
 			      int idx)
 {
+    int isText = (mode[1] != 'b');
     Rconnection new;
     void *tmp;
 
     new = (Rconnection) malloc(sizeof(struct Rconn));
-    if(!new) error(_("allocation of text connection failed"));
-    new->class = (char *) malloc(strlen("textConnection") + 1);
+    if(!new) error(_("allocation of loop connection failed"));
+    new->class = (char *) malloc(strlen("loopConnection") + 1);
     if(!new->class) {
 	free(new);
-	error(_("allocation of text connection failed"));
+	error(_("allocation of loop connection failed"));
     }
-    strcpy(new->class, "textConnection");
+    strcpy(new->class, "loopConnection");
     new->description = (char *) malloc(strlen(description) + 1);
     if(!new->description) {
 	free(new->class); free(new);
-	error(_("allocation of text connection failed"));
+	error(_("allocation of loop connection failed"));
     }
     init_con(new, description, mode);
+    new->text = isText;
     new->isopen = TRUE;
     new->canread = FALSE;
-    new->open = &text_open;
-    new->close = &outtext_close;
-    new->destroy = &outtext_destroy;
-    new->vfprintf = &text_vfprintf;
-    new->seek = &text_seek;
-    new->private = (void*) malloc(sizeof(struct outtextconn));
+    new->open = &loop_open;
+    new->close = &outloop_close;
+    new->destroy = &outloop_destroy;
+    new->seek = &loop_seek;
+    new->private = (void*) malloc(sizeof(struct outloopconn));
     if(!new->private) {
 	free(new->description); free(new->class); free(new);
-	error(_("allocation of text connection failed"));
+	error(_("allocation of loop connection failed"));
     }
-    ((Routtextconn)new->private)->lastline = tmp = malloc(LAST_LINE_LEN);
+    ((Routloopconn)new->private)->lastline = tmp = malloc(LAST_LINE_LEN);
     if(!tmp) {
 	free(new->private);
 	free(new->description); free(new->class); free(new);
-	error(_("allocation of text connection failed"));
+	error(_("allocation of loop connection failed"));
     }
-    outtext_init(new, mode, idx);
+    if (isText) {
+	new->vfprintf = &text_vfprintf;
+    } else {
+	new->write = &raw_write;
+    }
+    outloop_init(new, mode, idx);
     return new;
 }
 
-SEXP do_textconnection(SEXP call, SEXP op, SEXP args, SEXP env)
+SEXP do_loopconnection(SEXP call, SEXP op, SEXP args, SEXP env)
 {
     SEXP sfile, stext, sopen, ans, class, venv;
     char *desc, *open;
@@ -1914,8 +1974,6 @@
 	error(_("invalid 'description' argument"));
     desc = CHAR(STRING_ELT(sfile, 0));
     stext = CADR(args);
-    if(!isString(stext))
-	error(_("invalid 'text' argument"));
     sopen = CADDR(args);
     if(!isString(sopen) || length(sopen) != 1)
     error(_("invalid 'open' argument"));
@@ -1924,16 +1982,20 @@
     if (!isEnvironment(venv) && venv != R_NilValue)
 	error(_("invalid 'environment' argument"));
     ncon = NextConnection();
-    if(!strlen(open) || strncmp(open, "r", 1) == 0)
-	con = Connections[ncon] = newtext(desc, stext);
-    else if (strncmp(open, "w", 1) == 0 || strncmp(open, "a", 1) == 0) {
+    if(!strlen(open) || strncmp(open, "r", 1) == 0) {
+	if(!isString(stext) && (TYPEOF(stext) != RAWSXP))
+	    error(_("invalid 'object' argument"));
+	con = Connections[ncon] = newloop(desc, stext);
+    } else if (strncmp(open, "w", 1) == 0 || strncmp(open, "a", 1) == 0) {
+	if(!isString(stext))
+	    error(_("invalid 'object' argument"));
 	if (OutTextData == NULL) {
 	    OutTextData = allocVector(VECSXP, NCONNECTIONS);
 	    R_PreserveObject(OutTextData);
 	}
 	SET_VECTOR_ELT(OutTextData, ncon, venv);
 	con = Connections[ncon] =
-	    newouttext(CHAR(STRING_ELT(stext, 0)), sfile, open, ncon);
+	    newoutloop(CHAR(STRING_ELT(stext, 0)), sfile, open, ncon);
     }
     else
 	errorcall(call, _("unsupported mode"));
@@ -1942,7 +2004,7 @@
     PROTECT(ans = allocVector(INTSXP, 1));
     INTEGER(ans)[0] = ncon;
     PROTECT(class = allocVector(STRSXP, 2));
-    SET_STRING_ELT(class, 0, mkChar("textConnection"));
+    SET_STRING_ELT(class, 0, mkChar("loopConnection"));
     SET_STRING_ELT(class, 1, mkChar("connection"));
     classgets(ans, class);
     UNPROTECT(2);
--- src/main/names.c.orig	2005-05-20 05:51:46.000000000 -0700
+++ src/main/names.c	2005-08-22 15:59:47.968828400 -0700
@@ -866,7 +866,7 @@
 {"pushBack", 	do_pushback,	0,      11,     3,      {PP_FUNCALL, PREC_FN,	0}},
 {"clearPushBackLength",do_clearpushback,0,  11,     1,      {PP_FUNCALL, PREC_FN,	0}},
 {"pushBackLength",do_pushbacklength,0,  11,     1,      {PP_FUNCALL, PREC_FN,	0}},
-{"textConnection",do_textconnection,0,	11,     4,      {PP_FUNCALL, PREC_FN,	0}},
+{"loopConnection",do_loopconnection,0,	11,     4,      {PP_FUNCALL, PREC_FN,	0}},
 {"socketConnection",do_sockconn,0,	11,     6,      {PP_FUNCALL, PREC_FN,	0}},
 {"sockSelect",do_sockselect,0,	11,     3,      {PP_FUNCALL, PREC_FN,	0}},
 {"getAllConnections",do_getallconnections,0,11, 0,      {PP_FUNCALL, PREC_FN,	0}},
--- src/include/Rconnections.h.orig	2005-04-18 04:34:02.000000000 -0700
+++ src/include/Rconnections.h	2005-08-22 15:40:02.582767400 -0700
@@ -82,19 +82,19 @@
     int cp;
 } *Rgzfileconn;
 
-typedef struct textconn {
+typedef struct loopconn {
     char *data;  /* all the data */
     int cur, nchars; /* current pos and number of chars */
     char save; /* pushback */
-} *Rtextconn;
+} *Rloopconn;
 
-typedef struct outtextconn {
+typedef struct outloopconn {
     int len;  /* number of lines */
     SEXP namesymbol;
     SEXP data;
     char *lastline;
     int lastlinelength; /* buffer size */
-} *Routtextconn;
+} *Routloopconn;
 
 typedef enum {HTTPsh, FTPsh} UrlScheme;
 
--- src/library/base/R/connections.R.orig	2005-04-18 04:34:17.000000000 -0700
+++ src/library/base/R/connections.R	2005-08-22 16:18:22.095231400 -0700
@@ -83,10 +83,10 @@
                              encoding = getOption("encoding"))
     .Internal(socketConnection(host, port, server, blocking, open, encoding))
 
-textConnection <- function(object, open = "r", local = FALSE) {
+loopConnection <- function(object, open = "r", local = FALSE) {
     if (local) env <- parent.frame()
     else env <- .GlobalEnv
-    .Internal(textConnection(deparse(substitute(object)), object, open, env))
+    .Internal(loopConnection(deparse(substitute(object)), object, open, env))
 }
 
 seek <- function(con, ...)
3 days later
#
I accidentally left one small change out of my previous patch.

So... no response to my request for comments.  Does that mean no one
has an opinion about whether this is a good idea or not?  I'd
appreciate a response from an R core member one way or the other; if
this is not the right way to get a response, should I email people
instead?

-- David Hinds


--- src/include/Internal.h.orig	2005-05-20 05:51:37.000000000 -0700
+++ src/include/Internal.h	2005-08-22 15:46:48.968190600 -0700
@@ -518,7 +518,7 @@
 SEXP do_pushback(SEXP, SEXP, SEXP, SEXP);
 SEXP do_pushbacklength(SEXP, SEXP, SEXP, SEXP);
 SEXP do_clearpushback(SEXP, SEXP, SEXP, SEXP);
-SEXP do_textconnection(SEXP, SEXP, SEXP, SEXP);
+SEXP do_loopconnection(SEXP, SEXP, SEXP, SEXP);
 SEXP do_getallconnections(SEXP, SEXP, SEXP, SEXP);
 SEXP do_sumconnection(SEXP, SEXP, SEXP, SEXP);
 SEXP do_download(SEXP, SEXP, SEXP, SEXP);
#
OK.  I guess you want one of the core people to respond but in the
interim can you explain the terminology "loop"?   
Also, do you have any prototypical applications in mind?
On 8/26/05, dhinds at sonic.net <dhinds at sonic.net> wrote:
#
Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
"loop" is short for "loopback".  A loop or loopback device is one that
just returns the data sent to it.

The prototypical applications are the same sort of applications text
connections are used for: data transformation, in this case of raw
binary data, rather than formatted text data.  In my case, I needed to
interpret a "long raw" column from an Oracle table, that consisted of
packed single precision floating point numbers.
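
As a rough illustration of that kind of transformation, here is a small C sketch that decodes a buffer of packed single precision values into doubles. The function name unpack_float32_le is hypothetical, and the sketch assumes little-endian byte order in the buffer and a 32-bit IEEE 754 float; the byte order of any particular database column would have to be checked separately.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode up to max_out 4-byte little-endian IEEE 754 singles from buf.
   Returns the number of values decoded; trailing bytes that do not
   form a whole value are ignored.  Assumes float is 32-bit IEEE 754. */
static size_t unpack_float32_le(const unsigned char *buf, size_t nbytes,
                                double *out, size_t max_out)
{
    size_t n = nbytes / 4;
    if (n > max_out) n = max_out;
    for (size_t i = 0; i < n; i++) {
        const unsigned char *p = buf + 4 * i;
        /* assemble the bit pattern independently of host byte order */
        uint32_t bits = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        float f;
        memcpy(&f, &bits, sizeof f);
        out[i] = (double)f;
    }
    return n;
}
```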

The caTools package on CRAN includes less capable raw2bin and bin2raw
functions, used to implement Base64 encoders and decoders.

-- Dave
#
On Sat, 27 Aug 2005 dhinds at sonic.net wrote:

            
One has.
That is definitely not what text connections do, and not what I read the 
proposal as being given the analogies to text connections.
That I think is where Hinds' confusion arises.  As the posting guide asks, 
please do your homework before posting.

Text connections are from the Green Book, where they are described as

- a 'text connection' using S character strings as lines of text input

- Text connections are a convenience to make it easy to use an object
   containing character strings in a computation that expects to read
   from a connection.

They are read-only (or in R but not S, write-only).  Think of them as the 
analogues of the C functions sscanf and sprintf.  Output text connections 
are a particularly convenient way to create text labels -- see for example 
capture.output().
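
For readers less familiar with the C side of the sscanf/sprintf analogy, a minimal sketch: sscanf parses from a string in memory rather than from a stream, and the sprintf family formats into a string rather than to a stream. The helper names parse_record and format_label are made up for illustration, and snprintf is used in place of sprintf so that the output buffer cannot overflow.

```c
#include <stddef.h>
#include <stdio.h>

/* Parse "name value" from a string in memory.  name must have room
   for 32 bytes.  Returns 1 if both fields were read, 0 otherwise. */
static int parse_record(const char *line, char name[32], double *value)
{
    return sscanf(line, "%31s %lf", name, value) == 2;
}

/* Format a label into a caller-supplied buffer.
   Returns 1 if the whole label fit, 0 if it was truncated or failed. */
static int format_label(char *out, size_t size, const char *name,
                        double value)
{
    int n = snprintf(out, size, "%s = %.2f", name, value);
    return n >= 0 && (size_t)n < size;
}
```

Parsing the line "height 1.5" and formatting the result yields the label "height = 1.50".
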
That is exactly what read-write anonymous file() connections are designed 
for.  They too come from the Green Book.  (If efficiency were an issue, a 
short purpose-designed C routine would be the answer.  But file 
connections are already much 'smarter' than the proposed implementation of 
loop connections and do allow seeking.)

Another piece of homework might be my article in the very first R 
newsletter.
#
On Sat, Aug 27, 2005 at 09:23:37AM +0100, Prof Brian Ripley wrote:
Hmmm.  The distinction you are making seems to be a narrow one.  Or
maybe my analogy is lacking.  To be honest, I couldn't care less about
the name; I wanted something short and more general than "text" but
I've apparently failed.
You're right, this isn't exactly the sense I was thinking of.  But a
binary analog of text connections also seems useful here, doesn't it?
Why should we be able to use objects for computations that expect text
mode connections, but not binary mode connections?  The computation I
want to use is readBin, and it expects a connection, and I want to
feed it an object instead.
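
The idea of handing an in-memory object to code that expects a stream can be sketched as an fread-style function over a byte buffer. This is an illustration only: the memreader type and mem_read function are hypothetical and do not use R's connection structures. Unlike the raw_read in the patch, this version guards against a zero item size and leaves any trailing partial item unread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory reader; the names are illustrative only. */
typedef struct {
    const unsigned char *data;
    size_t len;  /* total bytes available */
    size_t pos;  /* current read position, always <= len */
} memreader;

/* Copy up to nitems items of `size` bytes each into ptr.
   Returns the number of whole items copied; a count smaller than
   nitems means the end of the data was reached. */
static size_t mem_read(void *ptr, size_t size, size_t nitems,
                       memreader *r)
{
    if (size == 0 || nitems == 0) return 0;
    size_t avail = (r->len - r->pos) / size;  /* whole items left */
    if (nitems > avail) nitems = avail;
    if (nitems) memcpy(ptr, r->data + r->pos, size * nitems);
    r->pos += size * nitems;
    return nitems;
}
```
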
Anonymous file() connections seem to be hidden temporary files and I
was trying to avoid going through temporary files just to cast between
data types.  But then I admit to not testing the relative overhead of
the two mechanisms.  I'd rather not write dedicated C routines when
readBin and writeBin already do exactly what I need.

I'm afraid you've stumped me: I thought this seemed like a clean and
orthogonal extension of the existing facility requiring minimal new
code.  If the answer is that it isn't in the Green Book, well, then
shoot, I guess you've got me there!

-- David Hinds
#
    David> I've just implemented a generalization of R's text connections, to
    David> also support reading/writing raw binary data.  There is very little
    David> new code to speak of.  For input connections, I wrote code to populate
    David> the old text connection buffer from a raw vector, and provided a new
    David> raw_read() method.  For output connections, I wrote a raw_write() to
    David> append to a raw vector.  On input, the mode (text or binary) is
    David> determined by the data type of the input object; on output, I use the
    David> requested output mode (i.e. "w" / "wb").  For example:

    > con <- loopConnection("r", "wb")
    > a <- c(10,100,1000)
    > writeBin(a, con, size=4)
    > r
     [1] 00 00 20 41 00 00 c8 42 00 00 7a 44
    > close(con)
    > con <- loopConnection(r)
    > readBin(con, "double", n=3, size=4)
     [1]   10  100 1000
    > close(con)

    David> I think "loop connection" is a better name for this
    David> sort of connection than "text connection" was even
    David> for the old version; that confuses the mode of the
    David> connection (text vs binary) with the mechanism (file,
    David> socket, etc).

    ..........

In the meantime, I think it has become clear that
"loopconnection" isn't necessarily a better name, and that
textConnection() has been there in "the S literature" for a
good reason and for quite a while.
Let's forget about the naming and the exact UI for the moment.

I think the main point of David's proposal is still worth
consideration:  One way to see text connections is as a way to
treat some kind of R objects as "generalized files" i.e., connections.
And AFAICS David proposes to enlarge the kind of R objects that
can be dealt with as connections 
  from  {"character"} 
  to    {"character", "raw"} 
something which has some appeal to me.
IIUC, Brian Ripley is doubting the potential use for the
proposed generalization, whereas David makes a point of someone
else (the 'caTools' author) having written raw2bin / bin2raw functions
for a related use case.

Maybe you can elaborate on the above a bit, David?
In any case, as you might have guessed by now, R-core would have
been more positive to a proposal to generalize current
textConnection() - fully back-compatibly - rather than renaming
it first.

Best regards,
Martin
#
Martin Maechler <maechler at stat.math.ethz.ch> wrote:

            
That is entirely fine with me.
I'm not sure what more can be said on the subject.  Most connection
types support both text-mode and binary-mode, so this is partly a
proposal for symmetry and consistency.  Prof. Ripley is correct that
binary anonymous connections provide overlapping functionality, but
the semantics are slightly different, and performance is different.  I
don't see an advantage for having the "text-like" connection only
support text access.

I ran some quick benchmarks on three implementations, where the task
was conversion back and forth between a numeric vector of length 1000,
and a packed raw vector of single precision floats, repeated 1000
times.  The first method uses a new anonymous connection for each
transformation.  The second reuses a single anonymous connection.  The
third uses a new raw textConnection for each transformation.

  usr  sys  elapsed
  1.5  9.5   14.6    anonymous
  1.1  0.1    1.2    persistent
  0.9  0.0    0.9    raw

Setting up and tearing down anonymous connections is very slow (at
least on Windows) because it requires substantial OS intervention.  If
a program can be easily organized so that a single connection can be
used, performance is much better.

I would appreciate feedback on how to improve raw_write() for the case
of appending to an existing vector.  Is it possible to reserve free
space at the end of a vector for appending?  I see that there is a
distinction between LENGTH() and TRUELENGTH() but I'm not sure if this
is the intended use.
I have no interest in sacrificing back compatibility; I did intend
that there would always be a textConnection() entry point, if only as
a wrapper for the new constructor.  The only reason for a new name
(and I'm certainly open to suggestions) is because the notion of a
binary or raw textConnection seemed wrong.

-- David Hinds
#
This may not be entirely on the mark in terms of relevancy but
just in case there is some relevancy I wanted to bring it up.

Just to be concrete, suppose one wants to run the following as 
a concurrent process to R.  (It implicitly initializes x to zero
and then, for each line of stdin, adds the first field of the input
to x and prints the result to stdout, unless the first field is "exit",
in which case it exits.  gawk has an implicit read/process loop
so one does not have to specify the read step.  The fflush()
command just makes sure that output is emitted, rather than
buffered, as it is produced.)

   gawk -f myexample.awk

where myexample.awk contains the single line:

   { if ($1 == "exit") exit else { x += $1; print x; fflush() } }

This has nothing to do with raw data but is prototypical of many
possible situations where one is controlling a remote program
from R and is sending input to it and getting back output with
memory/persistence.

This example is actually the same as
   system("gawk -f myexample.awk", intern = TRUE)
except that it also has memory/persistence whereas the system
call starts up a new instance of gawk each time it's called and
so would always start out with x=0 each time rather than
the accumulated sum of past values.

I have not used fifos, which I assume could handle this problem,
since they are not yet provided in the Windows version of R, which
is what I use, but I was wondering if the application overlaps in any 
way with what is being discussed here.  In particular it would be nice 
to have a read/write "connection" that one writes to in order to provide 
the next line to the gawk process and reads from to get the answer.
2 days later
#
Gabor Grothendieck <ggrothendieck at gmail.com> wrote:

            
It seems you're just trying to reinvent fifo and/or pipe connections
for interprocess communication.  That is not directly related to the
problem I wanted to address.

-- Dave
1 day later
#
Martin Maechler <maechler at stat.math.ethz.ch> wrote:

            
To summarize the motivation for the proposal, again:

- There are two modes of connections: text and binary.  The operations
  supported on text and binary connections are mostly disjoint.  Most
  connection classes (socket, file, etc) support both modes.

- textConnection() binds a character vector to a text connection.
  There is no equivalent for a binary connection.  There are
  workarounds (i.e. anonymous connections, equivalent to temporary
  files), but these have substantial performance penalties.

- Both connection modes have useful applications.  textConnection() is
  useful, or it would not exist.  Orthogonality is good, special cases
  are bad.

- Only about 50 lines of code are required to implement a binary form
  of textConnection() in the R core.  Implementing this functionality
  in a separate package requires substantially more code.

- I need it, and in at least one case, another R package developer has
  implemented it using temporary files (caTools).  I also just noticed
  that Duncan Murdoch recently proposed the EXACT SAME feature on
  r-help:

  https://stat.ethz.ch/pipermail/r-help/2005-April/067651.html

I think that just about sums it up.  I've attached a smaller patch
that makes fewer changes to R source, doesn't change any existing
function names, etc.  The feature adds 400 bytes to the size of R.dll.

-- Dave



--- src/main/connections.c.orig	2005-06-17 19:05:02.000000000 -0700
+++ src/main/connections.c	2005-08-31 15:26:19.947195100 -0700
@@ -1644,7 +1644,7 @@
     return ans;
 }
 
-/* ------------------- text connections --------------------- */
+/* ------------------- text and raw connections --------------------- */
 
 /* read a R character vector into a buffer */
 static void text_init(Rconnection con, SEXP text)
@@ -1668,6 +1668,22 @@
     this->cur = this->save = 0;
 }
 
+/* read a R raw vector into a buffer */
+static void raw_init(Rconnection con, SEXP raw)
+{
+    int nbytes = length(raw);
+    Rtextconn this = (Rtextconn)con->private;
+
+    this->data = (char *) malloc(nbytes);
+    if(!this->data) {
+	free(this); free(con->description); free(con->class); free(con);
+	error(_("cannot allocate memory for raw connection"));
+    }
+    memcpy(this->data, RAW(raw), nbytes);
+    this->nchars = nbytes;
+    this->cur = this->save = 0;
+}
+
 static Rboolean text_open(Rconnection con)
 {
     con->save = -1000;
@@ -1702,41 +1718,60 @@
 
 static double text_seek(Rconnection con, double where, int origin, int rw)
 {
-    if(where >= 0) error(_("seek is not relevant for text connection"));
+    if(where >= 0) error(_("seek is not relevant for this connection"));
     return 0; /* if just asking, always at the beginning */
 }
 
-static Rconnection newtext(char *description, SEXP text)
+static size_t raw_read(void *ptr, size_t size, size_t nitems,
+		       Rconnection con)
+{
+    Rtextconn this = (Rtextconn)con->private;
+    if (this->cur + size*nitems > this->nchars) {
+	nitems = (this->nchars - this->cur)/size;
+	memcpy(ptr, this->data+this->cur, size*nitems);
+	this->cur = this->nchars;
+    } else {
+	memcpy(ptr, this->data+this->cur, size*nitems);
+	this->cur += size*nitems;
+    }
+    return nitems;
+}
+
+static Rconnection newtext(char *description, SEXP data)
 {
     Rconnection new;
+    int isText = isString(data);
     new = (Rconnection) malloc(sizeof(struct Rconn));
-    if(!new) error(_("allocation of text connection failed"));
-    new->class = (char *) malloc(strlen("textConnection") + 1);
-    if(!new->class) {
-	free(new);
-	error(_("allocation of text connection failed"));
-    }
-    strcpy(new->class, "textConnection");
+    if(!new) goto f1;
+    new->class = (char *) malloc(strlen("xxxxConnection") + 1);
+    if(!new->class) goto f2;
+    sprintf(new->class, "%sConnection", isText ? "text" : "raw");
     new->description = (char *) malloc(strlen(description) + 1);
-    if(!new->description) {
-	free(new->class); free(new);
-	error(_("allocation of text connection failed"));
-    }
+    if(!new->description) goto f3;
     init_con(new, description, "r");
     new->isopen = TRUE;
     new->canwrite = FALSE;
     new->open = &text_open;
     new->close = &text_close;
     new->destroy = &text_destroy;
-    new->fgetc = &text_fgetc;
     new->seek = &text_seek;
     new->private = (void*) malloc(sizeof(struct textconn));
-    if(!new->private) {
-	free(new->description); free(new->class); free(new);
-	error(_("allocation of text connection failed"));
+    if(!new->private) goto f4;
+    new->text = isText;
+    if (new->text) {
+	new->fgetc = &text_fgetc;
+	text_init(new, data);
+    } else {
+	new->read = &raw_read;
+	raw_init(new, data);
     }
-    text_init(new, text);
     return new;
+
+f4: free(new->description);
+f3: free(new->class);
+f2: free(new);
+f1: error(_("allocation of %s connection failed"),
+	  isText ? "text" : "raw");
 }
 
 static void outtext_close(Rconnection con)
@@ -1830,24 +1865,42 @@
     return res;
 }
 
+static size_t raw_write(const void *ptr, size_t size, size_t nitems,
+			Rconnection con)
+{
+    Routtextconn this = (Routtextconn)con->private;
+    SEXP tmp;
+    int idx = ConnIndex(con);
+
+    PROTECT(tmp = lengthgets(this->data, this->len + size*nitems));
+    memcpy(RAW(tmp)+this->len, ptr, size*nitems);
+    this->len += size*nitems;
+    defineVar(this->namesymbol, tmp, VECTOR_ELT(OutTextData, idx));
+    this->data = tmp;
+    UNPROTECT(1);
+    return nitems;
+}
+
 static void outtext_init(Rconnection con, char *mode, int idx)
 {
     Routtextconn this = (Routtextconn)con->private;
+    int st = (con->text ? STRSXP : RAWSXP);
     SEXP val;
 
     this->namesymbol = install(con->description);
-    if(strcmp(mode, "w") == 0) {
+    if(strncmp(mode, "w", 1) == 0) {
 	/* create variable pointed to by con->description */
-	PROTECT(val = allocVector(STRSXP, 0));
+	PROTECT(val = allocVector(st, 0));
 	defineVar(this->namesymbol, val, VECTOR_ELT(OutTextData, idx));
 	UNPROTECT(1);
     } else {
 	/* take over existing variable */
 	val = findVar1(this->namesymbol, VECTOR_ELT(OutTextData, idx),
-		       STRSXP, FALSE);
+		       st, FALSE);
 	if(val == R_UnboundValue) {
-	    warning(_("text connection: appending to a non-existent char vector"));
-	    PROTECT(val = allocVector(STRSXP, 0));
+	    warning(_("%s connection: appending to a non-existent vector"),
+		    con->text ? "text" : "raw");
+	    PROTECT(val = allocVector(st, 0));
 	    defineVar(this->namesymbol, val, VECTOR_ELT(OutTextData, idx));
 	    UNPROTECT(1);
 	}
@@ -1862,43 +1915,43 @@
 static Rconnection newouttext(char *description, SEXP sfile, char *mode,
 			      int idx)
 {
+    int isText = (mode[1] != 'b');
     Rconnection new;
     void *tmp;
 
     new = (Rconnection) malloc(sizeof(struct Rconn));
-    if(!new) error(_("allocation of text connection failed"));
-    new->class = (char *) malloc(strlen("textConnection") + 1);
-    if(!new->class) {
-	free(new);
-	error(_("allocation of text connection failed"));
-    }
-    strcpy(new->class, "textConnection");
+    if(!new) goto f1;
+    new->class = (char *) malloc(strlen("xxxxConnection") + 1);
+    if(!new->class) goto f2;
+    sprintf(new->class, "%sConnection", isText ? "text" : "raw");
     new->description = (char *) malloc(strlen(description) + 1);
-    if(!new->description) {
-	free(new->class); free(new);
-	error(_("allocation of text connection failed"));
-    }
+    if(!new->description) goto f3;
     init_con(new, description, mode);
+    new->text = isText;
     new->isopen = TRUE;
     new->canread = FALSE;
     new->open = &text_open;
     new->close = &outtext_close;
     new->destroy = &outtext_destroy;
-    new->vfprintf = &text_vfprintf;
     new->seek = &text_seek;
     new->private = (void*) malloc(sizeof(struct outtextconn));
-    if(!new->private) {
-	free(new->description); free(new->class); free(new);
-	error(_("allocation of text connection failed"));
-    }
+    if(!new->private) goto f4;
     ((Routtextconn)new->private)->lastline = tmp = malloc(LAST_LINE_LEN);
-    if(!tmp) {
-	free(new->private);
-	free(new->description); free(new->class); free(new);
-	error(_("allocation of text connection failed"));
+    if(!tmp) goto f5;
+    if (isText) {
+	new->vfprintf = &text_vfprintf;
+    } else {
+	new->write = &raw_write;
     }
     outtext_init(new, mode, idx);
     return new;
+
+f5: free(new->private);
+f4: free(new->description);
+f3: free(new->class);
+f2: free(new);
+f1: error(_("allocation of %s connection failed"),
+	  isText ? "text" : "raw");
 }
 
 SEXP do_textconnection(SEXP call, SEXP op, SEXP args, SEXP env)
@@ -1914,8 +1967,6 @@
 	error(_("invalid 'description' argument"));
     desc = CHAR(STRING_ELT(sfile, 0));
     stext = CADR(args);
-    if(!isString(stext))
-	error(_("invalid 'text' argument"));
     sopen = CADDR(args);
     if(!isString(sopen) || length(sopen) != 1)
 	error(_("invalid 'open' argument"));
@@ -1924,9 +1975,13 @@
     if (!isEnvironment(venv) && venv != R_NilValue)
 	error(_("invalid 'environment' argument"));
     ncon = NextConnection();
-    if(!strlen(open) || strncmp(open, "r", 1) == 0)
+    if(!strlen(open) || (open[0] == 'r')) {
+	if(!isString(stext) && (TYPEOF(stext) != RAWSXP))
+	    error(_("invalid 'object' argument"));
 	con = Connections[ncon] = newtext(desc, stext);
-    else if (strncmp(open, "w", 1) == 0 || strncmp(open, "a", 1) == 0) {
+    } else if ((open[0] == 'w') || (open[0] == 'a')) {
+	if(!isString(stext))
+	    error(_("invalid 'object' argument"));
 	if (OutTextData == NULL) {
 	    OutTextData = allocVector(VECSXP, NCONNECTIONS);
 	    R_PreserveObject(OutTextData);
@@ -1942,7 +1997,7 @@
     PROTECT(ans = allocVector(INTSXP, 1));
     INTEGER(ans)[0] = ncon;
     PROTECT(class = allocVector(STRSXP, 2));
-    SET_STRING_ELT(class, 0, mkChar("textConnection"));
+    SET_STRING_ELT(class, 0, mkChar(con->class));
     SET_STRING_ELT(class, 1, mkChar("connection"));
     classgets(ans, class);
     UNPROTECT(2);
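
A note on raw_write() in the patch above: it calls lengthgets() on every
write, so the raw vector is reallocated and copied each time.  One way to
amortize that cost is to grow the capacity geometrically and track the used
length separately.  What follows is only a minimal sketch in plain C, using
malloc/realloc rather than R's allocator.  The names growbuf, growbuf_init,
growbuf_write and growbuf_free are hypothetical, and the initial capacity of
64 bytes is an arbitrary choice.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer. It stands in for the raw vector
   that raw_write() extends; it is not part of R. */
typedef struct {
    unsigned char *data;
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
} growbuf;

static void growbuf_init(growbuf *b)
{
    assert(b != NULL);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}

static void growbuf_free(growbuf *b)
{
    free(b->data);
    growbuf_init(b);
}

/* fwrite-style append: returns nitems on success, 0 on failure. */
static size_t growbuf_write(growbuf *b, const void *ptr,
                            size_t size, size_t nitems)
{
    if (size == 0 || nitems == 0) return 0;
    /* refuse requests whose total length would overflow size_t */
    if (nitems > (SIZE_MAX - b->len) / size) return 0;

    size_t need = b->len + size * nitems;
    if (need > b->cap) {
        /* double the capacity until the request fits */
        size_t cap = b->cap ? b->cap : 64;
        while (cap < need) {
            if (cap > SIZE_MAX / 2) { cap = need; break; }
            cap *= 2;
        }
        unsigned char *p = realloc(b->data, cap);
        if (!p) return 0;   /* old buffer is still valid */
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, ptr, size * nitems);
    b->len = need;
    return nitems;
}
```

With doubling, a long run of small writes triggers only a logarithmic number
of reallocations.  How an over-allocated vector would be exposed to the user
at the right length inside R is left open here; the patch does not do this.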
#
dhinds at sonic.net wrote:
Since you quote me:

I would implement it differently from the way you did.  I'd call it a 
rawConnection, taking a raw variable (or converting something else using 
as.raw) as the input, and providing both text and binary read/write 
modes (using the same conventions for text mode as a file connection 
would).  It *should* support seek, at least in binary mode.

I would like an implementation that didn't necessarily duplicate the 
whole raw vector into a buffer (it might be big, and people who deal 
with big objects are always short of memory), but this isn't essential, 
it would just be a nice feature.
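
The design described above (a seekable raw connection that reads the data in
place instead of copying it into a buffer) can be sketched in plain C as
follows.  This is an illustration under stated assumptions, not R's
connection API: memconn and its functions are hypothetical names, and the
caller is assumed to keep the underlying bytes alive and unmodified while
the connection is open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical read-only connection over borrowed bytes (no copy). */
typedef struct {
    const unsigned char *data;  /* owned by the caller */
    size_t len;
    size_t pos;                 /* current read position */
} memconn;

enum { MEMCONN_START, MEMCONN_CURRENT, MEMCONN_END };

static void memconn_init(memconn *c, const unsigned char *data, size_t len)
{
    assert(c != NULL && (data != NULL || len == 0));
    c->data = data;
    c->len = len;
    c->pos = 0;
}

/* Binary mode, fread-style: returns the number of complete items read. */
static size_t memconn_read(memconn *c, void *dst, size_t size, size_t nitems)
{
    if (size == 0 || nitems == 0) return 0;
    size_t avail = (c->len - c->pos) / size;
    if (nitems > avail) nitems = avail;
    if (nitems == 0) return 0;
    memcpy(dst, c->data + c->pos, size * nitems);
    c->pos += size * nitems;
    return nitems;
}

/* Text mode, fgetc-style: returns the next byte, or -1 at end of data. */
static int memconn_getc(memconn *c)
{
    return c->pos < c->len ? (int)c->data[c->pos++] : -1;
}

/* Returns 0 on success, -1 if the target lies outside the data
   (the position is left unchanged in that case). */
static int memconn_seek(memconn *c, long offset, int origin)
{
    size_t base;
    switch (origin) {
    case MEMCONN_START:   base = 0;      break;
    case MEMCONN_CURRENT: base = c->pos; break;
    case MEMCONN_END:     base = c->len; break;
    default: return -1;
    }
    if (offset < 0) {
        /* magnitude computed this way to avoid negating LONG_MIN */
        size_t back = (size_t)(-(offset + 1)) + 1;
        if (back > base) return -1;
        c->pos = base - back;
    } else {
        if ((size_t)offset > c->len - base) return -1;
        c->pos = base + (size_t)offset;
    }
    return 0;
}
```

Because only a pointer and two offsets are stored, opening a connection on a
large raw vector costs no extra memory.  Inside R this would presumably also
require protecting the vector from garbage collection for the lifetime of
the connection, for example with R_PreserveObject as the patch already does
for OutTextData.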

Now, it would be nice to have something like this, but I'm not likely to
have time to do it in the near future.  If you are interested in doing
this (and documenting it), I'd be willing to take a look at your code
and commit it once it looks okay.

For this to make it into 2.2.0, I'd want to commit it by Sept 6, so
there's not a lot of time left.

Duncan Murdoch