RFC: System and time support functions in R

10 messages · Kurt Hornik, Duncan Murdoch, Martin Maechler +3 more

#
I've been looking over system utility functions that we might want to
add to R.  A few come out of specific needs, others from looking at
other systems and what people are using system() for.  I've taken
account of Paul Gilbert's comments posted here a while ago (and I
think covered all except the use of mailers).

We currently have

date
*.socket
file.create
file.exists
file.remove
file.append
dir.create
basename
dirname
list.files/dir
unlink         -- it is none too clear what this should do for dirs.
file.show
getenv


S-PLUS 5.x has

access
files.in.dir	-- we have list.files.
is.dir
mkdir		-- we have dir.create
rmdir		-- recursive or not (unlink only removes empty directories)


I have added today for R-devel (Unix and Windows)

file.access()	-- an access() work-alike.
file.info()	-- subsumes is.dir(), and gives the information from
		   stat(2) calls.
file.copy	-- via file.create and file.append.
Sys.info()	-- gives the information from uname(2) (including machine
		   name) and getlogin(2) (the login name).
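
As a rough illustration of the C calls that file.info() and file.access() would wrap, here is a minimal sketch. The helper names is_dir and can_read are hypothetical and are not taken from the R sources; only stat(2) and access(2) come from the description above.

```c
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: 1 if path names a directory,
   0 if it does not or if stat() fails. */
int is_dir(const char *path)
{
    struct stat sb;
    assert(path != NULL);
    if (stat(path, &sb) != 0)
        return 0;
    return S_ISDIR(sb.st_mode) ? 1 : 0;
}

/* Hypothetical helper: 1 if the calling process may read path. */
int can_read(const char *path)
{
    assert(path != NULL);
    return access(path, R_OK) == 0;
}
```

The same struct stat also carries the size and the modification/access times, which is presumably why one stat-based function can subsume is.dir().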

Things which I think we still may need:

putenv() -- or is `setenv' a better name?  (putenv is the POSIX name).
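
For comparison of the two C interfaces: putenv() takes a single "NAME=value" string and may keep a pointer to it, while POSIX setenv() takes name and value separately and copies them. A minimal sketch of the setenv() form follows; the wrapper names set_env_var and env_equals are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: set name to value, overwriting any old value.
   Returns 0 on success, -1 on failure (as setenv does). */
int set_env_var(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    return setenv(name, value, 1);
}

/* Hypothetical helper: 1 if name is set and equals expected. */
int env_equals(const char *name, const char *expected)
{
    const char *v = getenv(name);
    return v != NULL && strcmp(v, expected) == 0;
}
```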

sleep() -- called Sys.sleep()?, and with sub-second accuracy.
	   Tricky to do with event loops running, but looks possible.
	   Package xgobi under Unix has system("sleep 3"), which is
	   not a good idea in an event-driven system.  I have this
	   running on Windows for xgobi there.
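
A minimal sketch of a sub-second sleep using POSIX nanosleep() is given here for reference. The name sleep_seconds is hypothetical, and this version simply blocks: it does nothing about keeping an event loop serviced, which is the tricky part mentioned above.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for secs seconds (fractions allowed).
   Returns 0 on success, -1 on an error other than interruption. */
int sleep_seconds(double secs)
{
    struct timespec req, rem;
    assert(secs >= 0.0);
    req.tv_sec = (time_t)secs;
    req.tv_nsec = (long)((secs - (double)req.tv_sec) * 1e9);
    if (req.tv_nsec > 999999999L)
        req.tv_nsec = 999999999L;
    /* If a signal interrupts the sleep, resume with the time left. */
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
}
```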

unlink() -- I suggest we add a recursive argument, defaulting to
	    FALSE?  (It is currently TRUE on most platforms.)
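
To make the recursive/non-recursive distinction concrete, here is a sketch using the XSI function nftw() to walk a tree depth-first. The name remove_path is hypothetical, and this is only one possible way to do it on Unix, not a description of how unlink() is implemented in R.

```c
#define _XOPEN_SOURCE 500
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* nftw callback: remove one entry (file or, by then empty, directory). */
static int remove_entry(const char *path, const struct stat *sb,
                        int flag, struct FTW *ftw)
{
    (void)sb; (void)flag; (void)ftw;
    return remove(path);
}

/* Hypothetical helper. With recursive == 0 this only removes files and
   empty directories; with recursive != 0 it removes a whole tree.
   Returns 0 on success, -1 on failure. */
int remove_path(const char *path, int recursive)
{
    assert(path != NULL);
    if (!recursive)
        return remove(path) == 0 ? 0 : -1;
    /* FTW_DEPTH visits children before their directory;
       FTW_PHYS avoids following symbolic links. */
    return nftw(path, remove_entry, 16, FTW_DEPTH | FTW_PHYS) == 0 ? 0 : -1;
}
```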


The other main area that needs something more is date/times.  For the
moment file.info returns times as days/fractional days since 1 Jan
1970, which chron() can interpret.  But that is not *quite* correct,
as not all days are the same length due to the (rare) use of
leap-seconds.  And chron does not know about timezones.

My suggestion here is to implement a time class called POSIXtime which is
just POSIX's time_t. (Number of seconds since 1 Jan 1970.) And another time
class POSIXtm which is an R list giving a struct tm (secs, mins, hours, day
of month, month, year, day of week, day of year).  (I think it also needs
to record the timezone used.)  Then we can have R functions as vectorized
wrappers for the POSIX functions (not necessarily with these names)

time (say Sys.date):    date() as a POSIXtime variable.
localtime / gmtime:     convert POSIXtime to POSIXtm (local TZ/UTC)
mktime:                 convert POSIXtm to POSIXtime
strftime:               convert POSIXtm to character string, flexibly.
difftime:		difference between times in secs.
			(The wrappers for the last two could handle
			POSIXtime and POSIXtm objects.)

(Perhaps if these do not exist on a platform (unlikely) we can have
less accurate alternatives in our code.  They exist on Windows.)
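
For reference, the C calls listed above fit together roughly as follows. This is a sketch only: the function names format_utc, round_trip_local and seconds_between are hypothetical, and gmtime()/localtime() return pointers to static storage, so a vectorized wrapper would have to copy each result before the next call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* time_t -> struct tm (UTC) -> "YYYY-MM-DD HH:MM:SS".
   Returns the number of characters written, 0 on failure. */
size_t format_utc(time_t t, char *buf, size_t size)
{
    const struct tm *tm;
    assert(buf != NULL);
    tm = gmtime(&t);
    if (tm == NULL)
        return 0;
    return strftime(buf, size, "%Y-%m-%d %H:%M:%S", tm);
}

/* time_t -> struct tm (local TZ) -> time_t again via mktime(). */
time_t round_trip_local(time_t t)
{
    const struct tm *p = localtime(&t);
    struct tm tm;
    if (p == NULL)
        return (time_t)-1;
    tm = *p;                 /* copy out of the static buffer */
    return mktime(&tm);
}

/* Difference between two times, in seconds. */
double seconds_between(time_t later, time_t earlier)
{
    return difftime(later, earlier);
}
```

For instance, format_utc((time_t)0, buf, sizeof buf) should fill buf with "1970-01-01 00:00:00".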

Possibly we might want to allow

tzset:		        set a time zone, for the above functions

or perhaps better just have tz as an argument to the conversion functions.
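
One reason to prefer a tz argument: in C the time zone is process-wide state, set through the TZ environment variable and tzset(). A conversion function that takes tz as an argument can hide that by setting and restoring TZ around the call, roughly as in this sketch. The name format_in_tz is hypothetical, and the approach is not thread-safe.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: format t as local time in time zone tz
   (a POSIX TZ string such as "UTC0" or "EST5").
   Returns the number of characters written, 0 on failure. */
size_t format_in_tz(time_t t, const char *tz, char *buf, size_t size)
{
    const char *old = getenv("TZ");
    char *saved = NULL;
    const struct tm *tm;
    size_t n = 0;

    assert(tz != NULL && buf != NULL);
    if (old != NULL) {               /* remember the caller's TZ */
        saved = malloc(strlen(old) + 1);
        if (saved == NULL)
            return 0;
        strcpy(saved, old);
    }
    setenv("TZ", tz, 1);
    tzset();
    tm = localtime(&t);
    if (tm != NULL)
        n = strftime(buf, size, "%Y-%m-%d %H:%M:%S", tm);

    if (saved != NULL) {             /* restore the previous setting */
        setenv("TZ", saved, 1);
        free(saved);
    } else {
        unsetenv("TZ");
    }
    tzset();
    return n;
}
```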

Is this a sensible design strategy?  I am reluctant to add another set
of date functions after packages date and chron, but cannot see how to
easily leverage those to do what I need, and in any case POSIX has thought
this through.
#
David James and I are currently discussing re-implementing chron, and
among the issues is interfacing ANSI C time & date functions as you
suggested above.  We should also provide strptime() which is not ANSI
but simple to implement without locale support, and maybe possible to
take from glibc otherwise (assuming it is not in the system's libs).
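
Where the system library does provide strptime() (it is an XSI/POSIX function, not ANSI C), a wrapper could look roughly like this sketch. The name parse_datetime and the fixed format string are assumptions made for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: parse "YYYY-MM-DD HH:MM:SS" into *out.
   Returns 0 if the whole string matched, -1 otherwise. */
int parse_datetime(const char *text, struct tm *out)
{
    const char *end;
    assert(text != NULL && out != NULL);
    memset(out, 0, sizeof *out);     /* strptime sets only parsed fields */
    end = strptime(text, "%Y-%m-%d %H:%M:%S", out);
    return (end != NULL && *end == '\0') ? 0 : -1;
}
```

Note that struct tm counts years from 1900 and months from 0, so July 2000 comes back as tm_year 100 and tm_mon 6.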

(Btw, why POSIX?  K&R gave me the feeling that the above is all ANSI.)

I will cc David on this.  The R list (corresponding to struct tm) could
as well be the basic representation of a chron object ...

-k
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
4 days later
#
On Thu, 20 Jul 2000 08:33:01 +0100 (BST), you wrote in message
<Pine.GSO.4.05.10007200826030.9485-100000@auk.stats>:
Is there any interest in adding binary file access to the base?  I
think it would be really useful, and have put together a prototype
(still for Windows only) that's on my web site at

<http://www.stats.uwo.ca/faculty/murdoch/software/Rstreams.zip>

Duncan Murdoch
#
Duncan> On Thu, 20 Jul 2000 08:33:01 +0100 (BST), you wrote in message
    Duncan> <Pine.GSO.4.05.10007200826030.9485-100000@auk.stats>:

    >> I've been looking over system utility functions that we might want
    >> to add to R.  A few come out of specific needs, others from looking
    >> at other systems and what people are using system() for.  I've taken
    >> account of Paul Gilbert's comments posted here a while ago (and I
    >> think covered all except the use of mailers).

    >> We currently have
    >> 
    >> date *.socket file.create file.exists file.remove file.append

    Duncan> Is there any interest in adding binary file access to the base?

yes, quite a bit !  (e.g., someone here doing image analysis, would have
liked to be able to do  this)

    Duncan> I think it would be really useful, and have put together a
    Duncan> prototype (still for Windows only) that's on my web site at
	                        ==============
    Duncan> <http://www.stats.uwo.ca/faculty/murdoch/software/Rstreams.zip>

I first thought "great! .." when you announced this a while ago, but
"Windows only" & relying on Delphi, i.e. proprietary software,
stopped me from even having a look, sorry.
We are committed primarily to the POSIX "clarification" of ANSI C and freely
available tools.

An aside :

  Your binary files are read into/from "character", right?

  I think (and others have talked similarly, here) that byte wise
  reading and writing of files should go together with a "raw" atomic data
  type in R -- and then we probably would want to do it "S version 4" (Sv4)
  compatibly (not that I have looked at what this would mean exactly),
  which needs even more extensions in "base R" than we have now.


Coming back to your package:
Is it worth/fast enough to port this to POSIX C?
Have you ever compared it to the (very general) approach taken by Sv4 ?
That would be something worth following at least in parts, I think.

Martin Maechler <maechler@stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO D10	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><
#
I agree it would be useful. Additionally I would vote for
disentangling/extending some functionality in R.

e.g. save() does not only create an external representation, it also writes
it to a file. So we cannot send the representation to a socket or just to a
character variable. Same with load().

e.g. scan() does combined physical read and parsing, i.e. it reads from file
and keyboard, but not from a socket or character variable. Similarly write()
and read.table(). The logic of how to convert character data to a data frame
should not be bound to reading a file. It would help to allow
reading/writing from/to a character variable instead of file access. This
would also allow filtering the data e.g. through a compression algorithm.

Regards


NEW EMAIL
jens.oehlschlaegel@bbdo-interone.de

--
Dr. Jens Oehlschlägel
Analyse
BBDO InterOne
Grünstr. 15
40212 Düsseldorf

Tel.: +49 (0)211 1379-187
Fax.: +49 (0)211 1379-461
http://www.bbdo-interone.de
#
On Tue, 25 Jul 2000, Martin Maechler wrote:

            
I don't think that translating/re-writing it is a problem, but I thought
Duncan was planning to do this.  If not I will have a go.
[Doesn't look like it to me!]
Probably, but I find byte-wise reading not at all useful (and none of
our image-analysis files are that shallow).
Introducing a new type in R looks to me like a fairly major undertaking.
Duncan's package does a different task at a higher level than `raw' (which
is just an unstructured stream of bytes). As ?readint says

  Signed integers of sizes 1, 2, 4, and 8 bytes can be read.  Unsigned
  integers of any size up to 8 bytes can be read. (Integers larger than are
  supported in R will be returned in a vector of doubles.)  Floats of sizes
  4, 8, and 10 bytes can be read.  Complex values using any of the float
  sizes for the real and complex parts can be read.  Any size of character
  string that you can create can be read.

although some of that is Windows-specific (10 bytes = 80 bits = extended
format, I presume).
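
To illustrate what reading signed integers of several sizes involves, here is a sketch that decodes a little-endian two's-complement integer of 1, 2, 4 or 8 bytes from a byte buffer, with sign extension. The name decode_int_le is hypothetical and the code is not taken from Duncan's package; the final conversion to int64_t assumes the usual two's-complement behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a little-endian signed integer. */
int64_t decode_int_le(const unsigned char *bytes, size_t size)
{
    uint64_t u = 0;
    size_t i;
    assert(bytes != NULL);
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    for (i = 0; i < size; i++)
        u |= (uint64_t)bytes[i] << (8 * i);
    /* If the top bit of the stored value is set, fill the upper bits. */
    if (size < 8 && (u & ((uint64_t)1 << (8 * size - 1))))
        u |= ~(uint64_t)0 << (8 * size);
    return (int64_t)u;
}
```

A result that does not fit R's 32-bit integer type could then be converted to double, as the help text describes.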

I suspect the best way forward is to get a general (non-Delphi, both Unix
and Windows) contributed package working and on CRAN, and then think about
merging it into base if it looks worthwhile.  (There is a lot of very
useful stuff not in base, and the point of my original posting was that
those are things which need to be internal and OS-specific.)

BTW, I think something like inttostr (but not that name) and its converse
would be useful in base.  
\description{
  Converts an integer to a string representation in base 2 to 36.
}
My memory says S had a function called something like oddometer, but
I can't find it.
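
For concreteness, the conversion described there can be sketched in C as below. The name int_to_str and its signature are hypothetical; this is not the code from the package.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write value in the given base (2 to 36) into buf.
   Returns buf, or NULL if the base is invalid or buf is too small. */
char *int_to_str(long value, int base, char *buf, size_t size)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char tmp[sizeof(long) * 8 + 2];   /* binary digits plus sign */
    size_t n = 0, i;
    unsigned long mag;

    assert(buf != NULL);
    if (base < 2 || base > 36)
        return NULL;
    /* Work on the magnitude so that LONG_MIN is handled too. */
    mag = value < 0 ? 0UL - (unsigned long)value : (unsigned long)value;
    do {                              /* digits come out least significant first */
        tmp[n++] = digits[mag % (unsigned long)base];
        mag /= (unsigned long)base;
    } while (mag != 0);
    if (value < 0)
        tmp[n++] = '-';
    if (n + 1 > size)
        return NULL;
    for (i = 0; i < n; i++)           /* reverse into the output buffer */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return buf;
}
```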

Another comment: The R code uses _, F and T and is seriously lacking in
spaces. One way to get standard formatting is to set
options(keep.source=FALSE) and then read in and dump the code.


How much support is there for adding a `raw' (byte-stream) type?

Brian
#
On Tue, 25 Jul 2000, [iso-8859-1] Jens Oehlschlägel wrote:

            
That's a different topic (from Duncan's and from my subject line).
The plan is to use the Sv4 idea of `connections' to replace files (and
scan does only read from files: stdin is a file to C).
#
Prof Brian D Ripley <ripley@stats.ox.ac.uk> writes:
It has only one 'd'...
#
On Tue, 25 Jul 2000 10:33:17 +0100 (BST), you wrote:

            
I am, but it won't be quick.  I now have a C compiler installed, but I
have very little experience writing in C.  I'll probably ask for help
later in cleaning up whatever I write.
Right, as you quoted I try to read and write the native R types.  It
seemed to me that doing type conversions was a lot easier externally
than it would be internally.
I imagine it would be really easy for someone fluent in both Delphi &
C to port it.  I'm fluent in Delphi but not C (which is why I wrote it
in Delphi in the first place), so it'll take a while longer for me.
No, I'm not familiar with that.
Yes, but that's Intel-specific, not Windows-specific.  I think it
would be useful to have on any platform: the idea is that this code
will allow you to read binary files produced by someone else. For
example, I do include code to handle byte-order switching, and use it
in the demo routine (readsfile) to be able to read binary S objects,
whether produced on big or little endian machines.
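
A sketch of what byte-order switching amounts to in C: detect the host order, then reverse the bytes of a value when the file's order differs. The names host_is_little_endian, swap_bytes and read_u32_be are hypothetical and are not taken from the streams code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Reverse the size bytes stored at value, in place. */
void swap_bytes(void *value, size_t size)
{
    unsigned char *p = value;
    size_t i;
    assert(value != NULL);
    for (i = 0; i < size / 2; i++) {
        unsigned char t = p[i];
        p[i] = p[size - 1 - i];
        p[size - 1 - i] = t;
    }
}

/* Interpret 4 bytes written in big-endian order, on any host. */
uint32_t read_u32_be(const unsigned char *bytes)
{
    uint32_t v;
    assert(bytes != NULL);
    memcpy(&v, bytes, sizeof v);
    if (host_is_little_endian())
        swap_bytes(&v, sizeof v);
    return v;
}
```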

If the data you're reading were produced on an Intel platform, they
might include the extended types. Are there conversion routines
available in the libraries that R already uses for machines that don't
support extended as a native type?
Sounds good to me.
I don't object to a name change, but I think "odometer" is a pretty
bad choice.
Thanks, I'll do that.
I haven't had any need for such a thing, but the streams code was
written with the intention that it could be extended to handle streams
of bytes from other sources than just files.  I think that will likely
be hidden when I translate to C:  not being an OO language, it doesn't
really support the concept of an abstract stream, as far as I know.
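
For what it is worth, one common C idiom for an abstract stream is a struct holding a function pointer plus opaque state. The sketch below, with the hypothetical names byte_stream, mem_state, mem_stream and stream_read, shows a memory-backed reader; a file- or socket-backed one would supply a different read function behind the same interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Abstract byte source: callers only ever use the read pointer. */
typedef struct byte_stream {
    size_t (*read)(struct byte_stream *self, void *dst, size_t n);
    void *state;
} byte_stream;

/* State for a stream backed by a block of memory. */
typedef struct mem_state {
    const unsigned char *data;
    size_t len, pos;
} mem_state;

static size_t mem_read(byte_stream *self, void *dst, size_t n)
{
    mem_state *m = self->state;
    size_t left = m->len - m->pos;
    if (n > left)
        n = left;                    /* short read at end of data */
    memcpy(dst, m->data + m->pos, n);
    m->pos += n;
    return n;
}

/* Build a stream over data; m must outlive the returned stream. */
byte_stream mem_stream(mem_state *m, const void *data, size_t len)
{
    byte_stream s;
    assert(m != NULL && data != NULL);
    m->data = data;
    m->len = len;
    m->pos = 0;
    s.read = mem_read;
    s.state = m;
    return s;
}

/* Read up to n bytes from any kind of stream. */
size_t stream_read(byte_stream *s, void *dst, size_t n)
{
    assert(s != NULL && dst != NULL);
    return s->read(s, dst, n);
}
```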

Duncan
#
On Tue, 25 Jul 2000, Duncan Murdoch wrote:

            
I will take a quick look. It might be faster to do it than try to help you.
Delphi is an extended Pascal, and I used to be fluent in Pascal.
Correct: I just don't know how to get them in a binary file on
any other Intel-based OS (which may be my ignorance).
Yes, there are. Both in src/main/arithmetic.c and I think in packages 
stataread and foreign.
It's not the function I wanted, anyway, and not being a word in UK usage I
was unfamiliar with its meaning.  (The UK says `trip counter'.)  What we
really want is a radix argument to as.character applied to integers, I
think.