Skip to content

RFC: rawConnection (was "loop connections")

7 messages · Duncan Murdoch, David Hinds

#
Duncan Murdoch <murdoch at stats.uwo.ca> wrote:

            
I was trying to reuse as much of the textConnection semantics and
underlying code as possible...

Having a rawConnection() entry point is simple enough.  Seeking also
seems straightforward.  I'm not so sure about using as.raw().  I
wondered about that, but also thought that rather than coercing to
raw, it might make more sense to cast atomic vector types to raw,
byte-for-byte.

Can you given an example of where a text-mode raw connection would be
a useful thing?

-- Dave
#
dhinds at sonic.net wrote:
I'd prefer as.raw, so that we don't end up with two incompatible ways to 
convert other objects to raw objects.
No, but someone else might.  Why unnecessarily let the source of the 
bytes determine the mode of the connection?  In the case of 
textConnection, there are natural line breaks, so a text mode connection 
makes sense.  A raw object can contain anything, so why wouldn't someone 
want to put text in it some day?

Duncan Murdoch
#
Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
An advantage of no as.raw() would be that you could create a raw
connection on an object without making an extra copy, which was
another of your requests.  But there would be a lack of symmetry,
because you could "r" from an arbitrary R object, but only "w" to raw,
unless there was also a way of specifying a type for the result
vector.

Having the backing store be an R object with no copy does seem tricky,
however.  Currently, textConnection() makes a copy for "r" connections
but writes directly to an R object for "w" connections.  The "w" case
is buggy; you can crash R by removing the target object while the
connection is being used.  I'm not familiar enough with R internals to
know how to fix that.  Maybe the object has to be searched for every
time the connection is used, to avoid potentially stale pointers?
It seems that that a text-mode raw connection would be equivalent to a
textConnection on the result of rawToChar(), no?

While some of these possibilities seem like they might be useful, I'm
not sure that all need to be implemented immediately.  If we can agree
on the basic interface and semantics, then we could implement a basic
version now, and relax restrictions on the arguments later as needed?

-- Dave
#
dhinds at sonic.net wrote:
I think the cost of duplicating as.raw is worse than the cost of using 
extra memory.  If the lack of symmetry bothers you, a solution is to 
require a raw object as input.
In that case I wouldn't bother.  It's important to get it right; being 
maximally efficient is a second priority.

 > Currently, textConnection() makes a copy for "r" connections
I've been having an argument with some other people about something 
related to this.  I think they would say that the language doesn't 
support writing to a variable.

I don't know the right way to fix this.
If so, then a binary mode rawConnection (with mention of the way to 
convert in the Rd file) would be good enough for me.
I'd rather get it right now, but that doesn't have to mean including 
every bell and whistle someone (even me!) has suggested.

Duncan Murdoch
#
Duncan Murdoch <murdoch at stats.uwo.ca> wrote:

            
It wouldn't exactly be duplicating as.raw since this way of converting
to raw is actually to do nothing at all, just to treat the object as
if it is already raw.  But, I don't have a strong opinion.
I tried changing textConnection output connections to look up the
destination object on every access and that seems to solve the problem
without being terribly expensive.
It seems we are coming back to something close to what I had
originally implemented?

-- Dave
#
dhinds at sonic.net wrote:
I haven't looked at as.raw, but I think it does something other than 
that.  For example,

rawToChar(as.raw(1:10)) gives

"\001\002\003\004\005\006\a\b\t\n"

I don't know if there's a way to do exactly what you're proposing.  One 
argument against it is that the bytes for an object may vary from 
platform to platform (big versus little endian, maybe 32 vs 64 bit), 
whereas we try to make R code platform independent when we can.
Probably!  The differences I still know about are:

  - I'd like the name to reflect the data source, so rawConnection or 
something similar rather than overloading textConnection.

  - It needs a man page, or to be included on the textConnection man page.

Duncan
1 day later
#
Duncan Murdoch <murdoch at stats.uwo.ca> wrote:

            
Here is an updated patch, with the rawConnection() entry point, and a
man page, against today's R-devel snapshot.  This also fixes (text or
raw) output connections to verify that the target object still exists
before writing to that object.

-- Dave



--- src/main/connections.c.orig	2005-08-29 17:47:35.000000000 -0700
+++ src/main/connections.c	2005-09-03 13:34:25.098514900 -0700
@@ -1678,7 +1678,7 @@
     return ans;
 }
 
-/* ------------------- text connections --------------------- */
+/* ------------------- text and raw connections --------------------- */
 
 /* read a R character vector into a buffer */
 static void text_init(Rconnection con, SEXP text)
@@ -1702,6 +1702,22 @@
     this->cur = this->save = 0;
 }
 
+/* read a R raw vector into a buffer */
+static void raw_init(Rconnection con, SEXP raw)
+{
+    int nbytes = length(raw);
+    Rtextconn this = (Rtextconn)con->private;
+
+    this->data = (char *) malloc(nbytes);
+    if(!this->data) {
+	free(this); free(con->description); free(con->class); free(con);
+	error(_("cannot allocate memory for raw connection"));
+    }
+    memcpy(this->data, RAW(raw), nbytes);
+    this->nchars = nbytes;
+    this->cur = this->save = 0;
+}
+
 static Rboolean text_open(Rconnection con)
 {
     con->save = -1000;
@@ -1736,41 +1752,60 @@
 
 static double text_seek(Rconnection con, double where, int origin, int rw)
 {
-    if(where >= 0) error(_("seek is not relevant for text connection"));
+    if(where >= 0) error(_("seek is not relevant for this connection"));
     return 0; /* if just asking, always at the beginning */
 }
 
-static Rconnection newtext(char *description, SEXP text)
+static size_t raw_read(void *ptr, size_t size, size_t nitems,
+		       Rconnection con)
+{
+    Rtextconn this = (Rtextconn)con->private;
+    if (this->cur + size*nitems > this->nchars) {
+	nitems = (this->nchars - this->cur)/size;
+	memcpy(ptr, this->data+this->cur, size*nitems);
+	this->cur = this->nchars;
+    } else {
+	memcpy(ptr, this->data+this->cur, size*nitems);
+	this->cur += size*nitems;
+    }
+    return nitems;
+}
+
+static Rconnection newtext(char *description, SEXP data)
 {
     Rconnection new;
+    int isText = isString(data);
     new = (Rconnection) malloc(sizeof(struct Rconn));
-    if(!new) error(_("allocation of text connection failed"));
-    new->class = (char *) malloc(strlen("textConnection") + 1);
-    if(!new->class) {
-	free(new);
-	error(_("allocation of text connection failed"));
-    }
-    strcpy(new->class, "textConnection");
+    if(!new) goto f1;
+    new->class = (char *) malloc(strlen("xxxxConnection") + 1);
+    if(!new->class) goto f2;
+    sprintf(new->class, "%sConnection", isText ? "text" : "raw");
     new->description = (char *) malloc(strlen(description) + 1);
-    if(!new->description) {
-	free(new->class); free(new);
-	error(_("allocation of text connection failed"));
-    }
+    if(!new->description) goto f3;
     init_con(new, description, "r");
     new->isopen = TRUE;
     new->canwrite = FALSE;
     new->open = &text_open;
     new->close = &text_close;
     new->destroy = &text_destroy;
-    new->fgetc = &text_fgetc;
     new->seek = &text_seek;
     new->private = (void*) malloc(sizeof(struct textconn));
-    if(!new->private) {
-	free(new->description); free(new->class); free(new);
-	error(_("allocation of text connection failed"));
+    if(!new->private) goto f4;
+    new->text = isText;
+    if (new->text) {
+	new->fgetc = &text_fgetc;
+	text_init(new, data);
+    } else {
+	new->read = &raw_read;
+	raw_init(new, data);
     }
-    text_init(new, text);
     return new;
+
+f4: free(new->description);
+f3: free(new->class);
+f2: free(new);
+f1: error(_("allocation of %s connection failed"),
+	  isText ? "text" : "raw");
 }
 
 static void outtext_close(Rconnection con)
@@ -1780,10 +1815,13 @@
     int idx = ConnIndex(con);
 
     if(strlen(this->lastline) > 0) {
-	PROTECT(tmp = lengthgets(this->data, ++this->len));
+	tmp = findVar1(this->namesymbol, VECTOR_ELT(OutTextData, idx),
+		       STRSXP, FALSE);
+	if (tmp == R_UnboundValue)
+	    error(_("connection endpoint unbound"));
+	PROTECT(tmp = lengthgets(tmp, ++this->len));
 	SET_STRING_ELT(tmp, this->len - 1, mkChar(this->lastline));
 	defineVar(this->namesymbol, tmp, VECTOR_ELT(OutTextData, idx));
-	this->data = tmp;
 	UNPROTECT(1);
     }
     SET_VECTOR_ELT(OutTextData, idx, R_NilValue);
@@ -1843,10 +1881,13 @@
 	if(q) {
 	    int idx = ConnIndex(con);
 	    *q = '\0';
-	    PROTECT(tmp = lengthgets(this->data, ++this->len));
+	    tmp = findVar1(this->namesymbol, VECTOR_ELT(OutTextData, idx),
+			   STRSXP, FALSE);
+	    if (tmp == R_UnboundValue)
+		error(_("connection endpoint unbound"));
+	    PROTECT(tmp = lengthgets(tmp, ++this->len));
 	    SET_STRING_ELT(tmp, this->len - 1, mkChar(p));
 	    defineVar(this->namesymbol, tmp, VECTOR_ELT(OutTextData, idx));
-	    this->data = tmp;
 	    UNPROTECT(1);
 	} else {
 	    /* retain the last line */
@@ -1864,30 +1905,50 @@
     return res;
 }
 
+static size_t raw_write(const void *ptr, size_t size, size_t nitems,
+			Rconnection con)
+{
+    Routtextconn this = (Routtextconn)con->private;
+    SEXP tmp;
+    int idx = ConnIndex(con);
+
+    tmp = findVar1(this->namesymbol, VECTOR_ELT(OutTextData, idx),
+		   RAWSXP, FALSE);
+    if (tmp == R_UnboundValue)
+	error(_("connection endpoint unbound"));
+    PROTECT(tmp = lengthgets(tmp, this->len + size*nitems));
+    memcpy(RAW(tmp)+this->len, ptr, size*nitems);
+    this->len += size*nitems;
+    defineVar(this->namesymbol, tmp, VECTOR_ELT(OutTextData, idx));
+    UNPROTECT(1);
+    return nitems;
+}
+
 static void outtext_init(Rconnection con, char *mode, int idx)
 {
     Routtextconn this = (Routtextconn)con->private;
+    int st = (con->text ? STRSXP : RAWSXP);
     SEXP val;
 
     this->namesymbol = install(con->description);
-    if(strcmp(mode, "w") == 0) {
+    if(strncmp(mode, "w", 1) == 0) {
 	/* create variable pointed to by con->description */
-	PROTECT(val = allocVector(STRSXP, 0));
+	PROTECT(val = allocVector(st, 0));
 	defineVar(this->namesymbol, val, VECTOR_ELT(OutTextData, idx));
 	UNPROTECT(1);
     } else {
 	/* take over existing variable */
 	val = findVar1(this->namesymbol, VECTOR_ELT(OutTextData, idx),
-		       STRSXP, FALSE);
+		       st, FALSE);
 	if(val == R_UnboundValue) {
-	    warning(_("text connection: appending to a non-existent char vector"));
-	    PROTECT(val = allocVector(STRSXP, 0));
+	    warning(_("%s connection: appending to a non-existent vector"),
+		    con->text ? "text" : "raw");
+	    PROTECT(val = allocVector(st, 0));
 	    defineVar(this->namesymbol, val, VECTOR_ELT(OutTextData, idx));
 	    UNPROTECT(1);
 	}
     }
     this->len = LENGTH(val);
-    this->data = val;
     this->lastline[0] = '\0';
     this->lastlinelength = LAST_LINE_LEN;
 }
@@ -1896,43 +1957,43 @@
 static Rconnection newouttext(char *description, SEXP sfile, char *mode,
 			      int idx)
 {
+    int isText = (mode[1] != 'b');
     Rconnection new;
     void *tmp;
 
     new = (Rconnection) malloc(sizeof(struct Rconn));
-    if(!new) error(_("allocation of text connection failed"));
-    new->class = (char *) malloc(strlen("textConnection") + 1);
-    if(!new->class) {
-	free(new);
-	error(_("allocation of text connection failed"));
-    }
-    strcpy(new->class, "textConnection");
+    if(!new) goto f1;
+    new->class = (char *) malloc(strlen("xxxxConnection") + 1);
+    if(!new->class) goto f2;
+    sprintf(new->class, "%sConnection", isText ? "text" : "raw");
     new->description = (char *) malloc(strlen(description) + 1);
-    if(!new->description) {
-	free(new->class); free(new);
-	error(_("allocation of text connection failed"));
-    }
+    if(!new->description) goto f3;
     init_con(new, description, mode);
+    new->text = isText;
     new->isopen = TRUE;
     new->canread = FALSE;
     new->open = &text_open;
     new->close = &outtext_close;
     new->destroy = &outtext_destroy;
-    new->vfprintf = &text_vfprintf;
     new->seek = &text_seek;
     new->private = (void*) malloc(sizeof(struct outtextconn));
-    if(!new->private) {
-	free(new->description); free(new->class); free(new);
-	error(_("allocation of text connection failed"));
-    }
+    if(!new->private) goto f4;
     ((Routtextconn)new->private)->lastline = tmp = malloc(LAST_LINE_LEN);
-    if(!tmp) {
-	free(new->private);
-	free(new->description); free(new->class); free(new);
-	error(_("allocation of text connection failed"));
+    if(!tmp) goto f5;
+    if (isText) {
+	new->vfprintf = &text_vfprintf;
+    } else {
+	new->write = &raw_write;
     }
     outtext_init(new, mode, idx);
     return new;
+
+f5: free(new->private);
+f4: free(new->description);
+f3: free(new->class);
+f2: free(new);
+f1: error(_("allocation of %s connection failed"),
+	  isText ? "text" : "raw");
 }
 
 SEXP do_textconnection(SEXP call, SEXP op, SEXP args, SEXP env)
@@ -1948,19 +2009,22 @@
 	error(_("invalid '%s' argument"), "description");
     desc = CHAR(STRING_ELT(sfile, 0));
     stext = CADR(args);
-    if(!isString(stext))
-	error(_("invalid '%s' argument"), "text");
     sopen = CADDR(args);
     if(!isString(sopen) || length(sopen) != 1)
-    error(_("invalid '%s' argument"), "open");
+	error(_("invalid '%s' argument"), "open");
     open = CHAR(STRING_ELT(sopen, 0));
     venv = CADDDR(args);
     if (!isEnvironment(venv) && venv != R_BaseEnv)
 	error(_("invalid '%s' argument"), "environment");
     ncon = NextConnection();
-    if(!strlen(open) || strncmp(open, "r", 1) == 0)
+    if(!strlen(open) || (open[0] == 'r')) {
+ 	int isText = (!strlen(open) || (open[1] != 'b'));
+ 	if (TYPEOF(stext) != (isText ? STRSXP : RAWSXP))
+ 	    error(_("invalid '%s' argument"), "object");
 	con = Connections[ncon] = newtext(desc, stext);
-    else if (strncmp(open, "w", 1) == 0 || strncmp(open, "a", 1) == 0) {
+    } else if ((open[0] == 'w') || (open[0] == 'a')) {
+	if (!isString(stext))
+	    error(_("invalid '%s' argument"), "object");
 	if (OutTextData == NULL) {
 	    OutTextData = allocVector(VECSXP, NCONNECTIONS);
 	    R_PreserveObject(OutTextData);
@@ -1976,7 +2040,7 @@
     PROTECT(ans = allocVector(INTSXP, 1));
     INTEGER(ans)[0] = ncon;
     PROTECT(class = allocVector(STRSXP, 2));
-    SET_STRING_ELT(class, 0, mkChar("textConnection"));
+    SET_STRING_ELT(class, 0, mkChar(con->class));
     SET_STRING_ELT(class, 1, mkChar("connection"));
     classgets(ans, class);
     UNPROTECT(2);
--- src/include/Rconnections.h.orig	2005-08-03 08:50:36.000000000 -0700
+++ src/include/Rconnections.h	2005-09-03 12:52:22.790700000 -0700
@@ -95,7 +95,6 @@
 typedef struct outtextconn {
     int len;  /* number of lines */
     SEXP namesymbol;
-    SEXP data;
     char *lastline;
     int lastlinelength; /* buffer size */
 } *Routtextconn;
--- src/library/base/R/connections.R.orig	2005-04-18 04:34:17.000000000 -0700
+++ src/library/base/R/connections.R	2005-09-03 11:27:07.128227400 -0700
@@ -84,6 +84,17 @@
     .Internal(socketConnection(host, port, server, blocking, open, encoding))
 
 textConnection <- function(object, open = "r", local = FALSE) {
+    if (!(open %in% c("","r","a","w")))
+        stop('unsupported mode')
+    if (local) env <- parent.frame()
+    else env <- .GlobalEnv
+    .Internal(textConnection(deparse(substitute(object)), object, open, env))
+}
+
+rawConnection <- function(object, open = "rb", local = FALSE) {
+    if (open == "") open <- "rb"
+    if (!(open %in% c("rb","ab","wb")))
+        stop("unsupported mode")
     if (local) env <- parent.frame()
     else env <- .GlobalEnv
     .Internal(textConnection(deparse(substitute(object)), object, open, env))
--- src/library/base/man/textconnections.Rd.orig	2005-09-03 13:55:48.274305900 -0700
+++ src/library/base/man/textconnections.Rd	2005-09-03 13:55:48.821177400 -0700
@@ -45,8 +45,8 @@
 }
 
 \value{
-  A connection object of class \code{"textConnection"} which inherits
-  from class \code{"connection"}.
+  A text-mode connection object of class \code{"textConnection"} which
+  inherits from class \code{"connection"}.
 }
 
 \note{
@@ -69,7 +69,8 @@
 
 \seealso{
   \code{\link{connections}}, \code{\link{showConnections}},
-  \code{\link{pushBack}}, \code{\link{capture.output}}.
+  \code{\link{pushBack}}, \code{\link{capture.output}},
+  \code{\link{rawConnection}}.
 }
 
 \examples{
--- src/library/base/man/rawconnections.Rd.orig	1969-12-31 16:00:00.000000000 -0800
+++ src/library/base/man/rawconnections.Rd	2005-09-03 13:50:13.620197700 -0700
@@ -0,0 +1,78 @@
+\name{rawConnection}
+\alias{rawConnection}
+\title{Raw Connections}
+\description{
+  Input and output raw connections.
+}
+\usage{
+rawConnection(object, open = "r", local = FALSE)
+}
+\arguments{
+  \item{object}{raw or character.  A description of the connection. 
+    For an input this is an \R raw vector object, and for an output
+    connection the name for the \R raw vector to receive the
+    output.
+  }
+  \item{open}{character.  Either \code{"rb"} (or equivalently \code{""})
+    for an input connection or \code{"wb"} or \code{"ab"} for an output
+    connection.}
+  \item{local}{logical.  Used only for output connections.  If \code{TRUE},
+    output is assigned to a variable in the calling environment.  Otherwise
+    the global environment is used.}
+}
+\details{
+  An input raw connection is opened and the raw vector is copied
+  at time the connection object is created, and \code{close}
+  destroys the copy.
+
+  An output raw connection is opened and creates an \R raw vector of
+  the given name in the user's workspace or in the calling
+  environment, depending on the value of the \code{local} argument.
+  This object will at all times hold the accumulated output to the
+  connection.
+
+  Opening a raw connection with \code{mode = "ab"} will attempt to
+  append to an existing raw vector with the given name in the user's
+  workspace or the calling environment.  If none is found (even if an
+  object exists of the right name but the wrong type) a new raw vector
+  wil be created, with a warning.
+
+  You cannot \code{seek} on a raw connection, and \code{seek} will
+  always return zero as the position.
+}
+
+\value{
+  A binary-mode connection object of class \code{"rawConnection"}
+  which inherits from class \code{"connection"}.
+}
+
+\note{
+  As output raw connections update the result vector after every
+  operation, they can be relatively expensive to use, and it may be
+  better to use an anonymous \code{\link{file}()} connection to collect
+  output when a large amount of binary data needs to be assembled.
+}
+ 
+\seealso{
+  \code{\link{connections}}, \code{\link{showConnections}},
+  \code{\link{readBin}}, \code{\link{writeBin}},
+  \code{\link{textConnection}}.
+}
+
+\examples{
+zz <- rawConnection("foo", "wb")
+writeBin(1:2, zz)
+writeBin(1:8, zz, size=1)
+writeBin(pi, zz, size=4)
+close(zz)
+foo
+
+zz <- rawConnection(foo)
+readBin(zz, "integer", n=2)
+sprintf("\%04x", readBin(zz, "integer", n=2, size=2))
+sprintf("\%08x", readBin(zz, "integer", endian="swap"))
+readBin(zz, "numeric", n=1, size=4)
+close(zz)
+}
+\keyword{file}
+\keyword{connection}