Process to both write to and read from (pipe/fork)? - R-devel

Prof Brian Ripley

Tue, Feb 1, 2005 6:44 AM

[Moved from R-help.]

One reason you cannot easily do this with basic R functions is that it is 
not portable.  E.g. pipes on Windows 98 do not work like that, and 
system() on that OS has to work hard to do something similar.

If we only had to consider standard Unices, pipe() would allow read-write 
modes.  As it is, it is easy for you to write an OS-specific extension.

BTW, please re-read the distinction between R-devel and R-help in the 
posting guide: this (and most of your other recent postings) seems to me to 
fall unambiguously within the specification for R-devel.

On Tue, 1 Feb 2005, Jan T. Kim wrote:

Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Jan T. Kim

Tue, Feb 1, 2005 11:52 AM

On Tue, Feb 01, 2005 at 01:44:37PM +0000, Prof Brian Ripley wrote:

Ok, thanks -- I really thought (and hoped) that I was just overlooking
something obvious.

I'm not fully convinced that the portability issues necessarily
preclude providing the fork / pipe facilities, as there already is some
differentiation present now, reflected by capabilities(). In fact,
when I saw that availability of fifos is reflected by capabilities(),
I thought that then, more basic pipes must also be accessible somehow
(and spent some effort investigating this idea).

Well, that is probably reasonably easy, but (not least because of that)
I'm still surprised that it has not been done already. I can hardly
imagine that I'm the first one to want to use some external utility from
an R program in this way.

So, what do you R-devel folks do in this case, and what would you
recommend?

At the time of writing, I thought I was asking for help in using R...

Best regards, Jan

+- Jan T. Kim -------------------------------------------------------+
 |    *NEW*    email: jtk@cmp.uea.ac.uk                               |
 |    *NEW*    WWW:   http://www.cmp.uea.ac.uk/people/jtk             |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*

Jan T. Kim

Fri, Feb 11, 2005 4:30 AM

Dear All,

On Tue, Feb 01, 2005 at 07:50:17PM +0000, Jan T. Kim wrote:

I've looked into this and tried to write a function that would start
an external process and return two connections, one for writing to the
external process and one for reading from it. Unfortunately, I haven't
found a way to implement this in a package, without altering the R
source code itself (details below). As an alternative / workaround,
I coded up a function

   xpipe(cmd, input)

that takes a command to start the external process (cmd) and a character
vector containing the lines to be written (input), and returns a character
vector containing the output produced by the external process. The
xpipe package is available at

    http://www2.cmp.uea.ac.uk/~jtk/software/xpipe_0.0-1.tar.gz

To an extent, this provides the functionality I was looking for, but
it is not satisfactory because the output cannot be processed by R
on line -- xpipe accumulates the entire output and returns it only
after the external process has terminated.

It is also technically cumbersome to use: to obtain anything other than
a character value, it seems one has to write the output to an anonymous
file and then use scan, read.table or whatever to read from that file.

Therefore, I am still looking for a way to implement the design where
the pipe ends are returned as R connections. The problem in doing so is
that connections are stored in a

    static Rconnection Connections[NCONNECTIONS];

(file src/main/connections.c), and I cannot find any function that
provides an interface for allocating a slot in the Connections array
and storing a connection set up by the code in my package there.
There is a non-static (i.e. externally visible) NextConnection function
(which is not declared in any header, though), and nothing like

    Rboolean setConnection(int connNumber, Rconnection *conn);
    Rconnection *getConnection(int connNumber);

I haven't found any relevant documentation on these issues (R-exts
doesn't have any info on handling connections in C code at all). Can any
of you direct me to such docs, or point out how I can instantiate and
return connections from within a package?

I'm still curious about this one. If there really is no way of running
stuff through external filter processes in R, I'd volunteer to add
that.

Best regards & thanks in advance, Jan

Peter Dalgaard

Fri, Feb 11, 2005 6:37 AM

"Jan T. Kim" <jtk@cmp.uea.ac.uk> writes:

If you know how, please do. I have a suspicion it might not be as easy
as it sounds because of the producer/consumer aspects. Notice, though,
that in most cases you can get by with system() or pipe() and a
temporary file for either the input or the output.

I remember speculating about these matters when I was first introduced
to pipes in C: They'd show you how to open a pipe for reading and how
to do it for writing, but not how to do both with the same process.
Took me a while to realize that there is a nontrivial deadlock issue
if you try to write to a process that itself is blocked trying to
write its output. Now that is of course not to say that it cannot be
done with clever multiplexing and buffering techniques -- or
multithreading, except that R isn't threaded.
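
[Editorial illustration, not part of the original message.] The deadlock
described above, and the multiplexing way around it, can be sketched at
the C level. This is only a minimal sketch, assuming a POSIX system with
/bin/sh; filter_through is a hypothetical helper name and is not part of
R or of any package mentioned in this thread. A single process feeds the
filter and drains its output in the same poll() loop, so neither side
can stay blocked on a full pipe.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not an R API): run `cmd` through /bin/sh, feed it
   `in_len` bytes from `in`, and collect its stdout in a malloc'ed,
   NUL-terminated buffer. Returns 0 on success, -1 on error. */
int filter_through(const char *cmd, const char *in, size_t in_len,
                   char **out, size_t *out_len)
{
    int to_cmd[2], from_cmd[2];
    assert(cmd != NULL && out != NULL && out_len != NULL);
    if (pipe(to_cmd) == -1) return -1;
    if (pipe(from_cmd) == -1) { close(to_cmd[0]); close(to_cmd[1]); return -1; }

    pid_t pid = fork();
    if (pid == -1) {
        close(to_cmd[0]); close(to_cmd[1]);
        close(from_cmd[0]); close(from_cmd[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: becomes the filter */
        dup2(to_cmd[0], STDIN_FILENO);
        dup2(from_cmd[1], STDOUT_FILENO);
        close(to_cmd[0]); close(to_cmd[1]);
        close(from_cmd[0]); close(from_cmd[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                      /* only reached if exec failed */
    }
    close(to_cmd[0]);
    close(from_cmd[1]);
    int wfd = to_cmd[1], rfd = from_cmd[0];

    /* Non-blocking writes: a full pipe yields EAGAIN instead of a stall. */
    fcntl(wfd, F_SETFL, fcntl(wfd, F_GETFL) | O_NONBLOCK);
    /* If the filter exits early, write() should fail with EPIPE, not kill us. */
    void (*old_sigpipe)(int) = signal(SIGPIPE, SIG_IGN);

    size_t written = 0, len = 0, cap = 4096;
    char *buf = malloc(cap);
    int ok = (buf != NULL);
    if (in_len == 0) { close(wfd); wfd = -1; }

    while (ok && rfd != -1) {
        /* poll() ignores entries whose fd is negative. */
        struct pollfd fds[2] = {
            { .fd = rfd, .events = POLLIN },
            { .fd = wfd, .events = POLLOUT }
        };
        if (poll(fds, 2, -1) == -1) {
            if (errno == EINTR) continue;
            ok = 0;
            break;
        }
        if (wfd != -1 && fds[1].revents != 0) {
            ssize_t n = write(wfd, in + written, in_len - written);
            if (n > 0) written += (size_t)n;
            else if (n == -1 && errno != EAGAIN && errno != EINTR)
                written = in_len;        /* filter stopped reading: give up */
            if (written == in_len) { close(wfd); wfd = -1; }  /* send EOF */
        }
        if (fds[0].revents != 0) {
            if (len + 1 >= cap) {        /* keep room for the final NUL */
                char *tmp = realloc(buf, cap * 2);
                if (tmp == NULL) { ok = 0; break; }
                buf = tmp;
                cap *= 2;
            }
            ssize_t n = read(rfd, buf + len, cap - len - 1);
            if (n > 0) len += (size_t)n;
            else if (n == 0) { close(rfd); rfd = -1; }  /* filter closed stdout */
            else if (errno != EINTR) ok = 0;
        }
    }
    if (wfd != -1) close(wfd);
    if (rfd != -1) close(rfd);
    signal(SIGPIPE, old_sigpipe);
    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR) { }
    if (!ok) { free(buf); return -1; }
    buf[len] = '\0';
    *out = buf;
    *out_len = len;
    return 0;
}
```

A call such as filter_through("tr a-z A-Z", "hello\n", 6, &out, &n)
would be expected to leave the upper-cased text in out; the caller frees
the buffer. Like xpipe, this sketch still accumulates the whole output
before returning, so it addresses the deadlock but not on-line
processing.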

BTW, we met in Heidelberg at the ECMBM ages ago, didn't we?

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907

Jan T. Kim

Fri, Feb 11, 2005 12:45 PM

On Fri, Feb 11, 2005 at 02:32:20PM +0100, Peter Dalgaard wrote:

Personally, I see filtering as a process, and the sequence of collecting
input in a file, then filtering that into an output file, then reading
that and carrying on with it as a more complex process that involves
filtering as a part of it. Additional complexity means that there's more
that can go wrong, which is why I dislike temporary files.

Specifically, I've seen it happen too often (including to myself) that
things went wrong because other processes were interfering with the
temporary files (in most cases, other processes running the same program).

It's clear to me that for real dynamic filtering, you need two processes
(or threads). This requires that the operating system supports forking,
i.e. that the fork package works. Without that, filtering is not
possible, at least not in any way I'm aware of.

So, my plan would be to add some function to src/main/connections.c for
setting up a pipe running through an external command and returning the
write and read connections for use in the R program. Then, one could do
something like (modelled after the pipe example in the base docs):

    library(fork);
    data2 <- c(
      "450, 390, 467, 654,  30, 542, 334, 432, 421,",
      "357, 497, 493, 550, 549, 467, 575, 578, 342,",
      "446, 547, 534, 495, 979, 479");
    # filterpipe() is the proposed function; it does not exist yet
    fp <- filterpipe("sed -e s/,$//");
    {
      pid <- fork(slave = NULL)
      if (pid == 0)
      {
        # child: feed the data to the filter, then exit
        close(fp$read);
        write(data2, file = fp$write);
        close(fp$write);
        exit();
      }
      else
      {
        # parent: read the filtered output, then reap the child
        close(fp$write);
        x <- scan(fp$read, sep = ",");
        close(fp$read);
        wait(pid);
      }
    }
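
[Editorial illustration, not part of the original message.] At the
operating-system level, the plan above could look roughly like the
following. This is a sketch under stated assumptions: a POSIX system
with /bin/sh, and two hypothetical helper names, filterpipe_open and
filter_with_writer, that do not exist in R or in the fork package. The
first starts the command between two pipes; the second mirrors the R
example by forking a writer process while the parent reads.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical C analogue of the proposed filterpipe(): start `cmd` with
   its stdin and stdout attached to two fresh pipes. Returns the pid of
   the filter process, or -1 on error. */
pid_t filterpipe_open(const char *cmd, int *write_fd, int *read_fd)
{
    int to_cmd[2], from_cmd[2];
    assert(cmd != NULL && write_fd != NULL && read_fd != NULL);
    if (pipe(to_cmd) == -1) return -1;
    if (pipe(from_cmd) == -1) { close(to_cmd[0]); close(to_cmd[1]); return -1; }

    pid_t pid = fork();
    if (pid == -1) {
        close(to_cmd[0]); close(to_cmd[1]);
        close(from_cmd[0]); close(from_cmd[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: becomes the filter */
        dup2(to_cmd[0], STDIN_FILENO);
        dup2(from_cmd[1], STDOUT_FILENO);
        close(to_cmd[0]); close(to_cmd[1]);
        close(from_cmd[0]); close(from_cmd[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                      /* only reached if exec failed */
    }
    close(to_cmd[0]);
    close(from_cmd[1]);
    *write_fd = to_cmd[1];
    *read_fd = from_cmd[0];
    return pid;
}

/* Mirrors the R sketch: a forked writer feeds the filter while the parent
   reads. Stores at most cap - 1 bytes in `out` (NUL-terminated) and
   returns the number of bytes stored, or -1 on error. */
ssize_t filter_with_writer(const char *cmd, const char *in, size_t in_len,
                           char *out, size_t cap)
{
    int wfd, rfd;
    assert(in != NULL && out != NULL && cap > 0);
    pid_t filter = filterpipe_open(cmd, &wfd, &rfd);
    if (filter == -1) return -1;

    pid_t writer = fork();
    if (writer == -1) {
        close(wfd); close(rfd);
        waitpid(filter, NULL, 0);
        return -1;
    }
    if (writer == 0) {                   /* writer child */
        close(rfd);
        size_t off = 0;
        while (off < in_len) {
            ssize_t n = write(wfd, in + off, in_len - off);
            if (n <= 0) _exit(1);
            off += (size_t)n;
        }
        close(wfd);                      /* filter sees EOF on its stdin */
        _exit(0);
    }

    close(wfd);      /* the parent must close its copy too, or no EOF arrives */
    char scratch[4096];
    size_t len = 0;
    ssize_t n;
    while ((n = read(rfd, scratch, sizeof scratch)) > 0) {
        size_t keep = (size_t)n;
        if (keep > cap - 1 - len) keep = cap - 1 - len;  /* drop overflow, keep draining */
        memcpy(out + len, scratch, keep);
        len += keep;
    }
    out[len] = '\0';
    close(rfd);
    waitpid(writer, NULL, 0);
    waitpid(filter, NULL, 0);
    return n == -1 ? -1 : (ssize_t)len;
}
```

Because the writer runs in its own process, the parent could just as
well handle each chunk as it arrives instead of storing it, which is the
on-line behaviour that xpipe lacks. The sketch does not retry
interrupted system calls, which real code would probably want to do.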

Thinking about your buffering suggestion, it occurs to me that it *may*
be possible to create two anonymous files (of the file("") type) and
to connect these to the stdin and the stdout of an external process.
In fact, a couple of days ago I checked whether pipe() would perhaps
accept optional file arguments for specifying the external process'
stdin and stdout, so I could e.g.

    f <- file("");
    p <- pipe("sed -e s/,$//", stdin = f);
    write(data2, file = f);
    scan(p, sep = ",");

but that turned out to be another detour on the way that took me here...

Best regards, Jan