Wish list

8 messages · Gabor Grothendieck, Duncan Murdoch, Duncan Temple Lang +4 more

#
This is my New Year wishlist for R features.  One
common thread is that I find I sometimes use languages
other than R including javascript, Windows batch and
gawk.  Others have mentioned other languages too.  It
would be nice if, in those cases I could use R
simplifying development into a single environment
(viz. R).

The following are not in any order.

1. Self Contained Executables

Make it possible to create self contained R
executables.  Something like tcl starkits
	  http://www.equi4.com/starkit.html
or Python py2exe
	  http://starship.python.net/crew/theller/py2exe
is what I am thinking of.  It's OK if they
are interpreted as long as it's all transparent.

2. R as a Filter

Support using R as a filter analogously to
awk/gawk.  e.g.

    echo a x 3 | R -f myprog.R | findstr /i answer
    echo a x 3 | R -e "chartr('x', 'X', readLines(STDIN()))" | findstr /i X

This would allow replacement of certain awk/gawk
filters with R.  In the above, STDIN or some such
would refer to the echo output, not to further
input from the script.  I think /dev/stdin
can already be used in UNIX but not in Windows.
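
A minimal sketch of what such a filter could look like in current R, assuming `file("stdin")` is used to reach the process's standard input (`stdin()` refers to the console or script, not to the piped input). The helper name `filter_x` is hypothetical, and the demonstration reads from an in-memory connection so that it runs without a pipe:

```r
# Hypothetical helper: read every line from a connection and
# translate lower-case "x" to upper-case "X".
filter_x <- function(con) {
  lines <- readLines(con)
  chartr("x", "X", lines)
}

# In a script run as a filter, the connection would be the process's
# standard input, e.g.:
#   con <- file("stdin"); writeLines(filter_x(con)); close(con)

# Demonstration on an in-memory connection instead of a real pipe.
demo <- textConnection("a x 3")
result <- filter_x(demo)
close(demo)
print(result)
```

With the commented lines enabled in a script file, this could be invoked along the lines of `echo a x 3 | Rscript myprog.R`, using the Rscript front end found in current R.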

3. Microsoft Active Scripting Language

Make R into a Microsoft Active Scripting
language.  Nearly every other major scripting
language including perl, python, ruby, tcl,
oorexx, vbscript, jscript and others have
Microsoft Active Scripting support.  This would
allow R to be used like javascript in HTML files
in Microsoft environments and also in any other
software that supports Microsoft's active
scripting interface.

4. Extend Clipboard Support to Non-Text Objects on
Windows

If one selects and copies a table in Internet
Explorer (IE) one can then paste it into Excel and
it comes out as expected with one Excel cell per
IE table cell.  However, R does not currently
support this level of integration. (Current
workaround is to paste it into Excel and then copy
it back out of Excel.  Excel will add tabs to the
text that is so copied.)

I understand that this feature may be in R 2.3.0
but am mentioning it for completeness.
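
The Excel workaround can be sketched as follows, assuming the copied cells arrive as tab-separated text. On Windows the text could be read with `read.table("clipboard", sep = "\t")`; here a literal string stands in for the clipboard so that the example runs anywhere, and the table contents are made up for illustration:

```r
# Stand-in for what Excel places on the clipboard: tab-separated cells,
# one table row per line (contents are illustrative only).
clip_text <- "name\tvalue\nalpha\t1\nbeta\t2"

# Parse the tab-separated text into a data frame.
tab <- read.table(text = clip_text, sep = "\t", header = TRUE,
                  stringsAsFactors = FALSE)
print(tab)
```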

5. Handhelds

Version(s) of R for handheld computers such as
Palm, Windows Mobile, Symbian, Blackberry, etc.
UNIX-based handhelds would likely be simplest
but the others would likely be useful to a wider
audience.

6. Issue Tracking in Packages

Standard method of tracking issues in CRAN
packages.  Provide svn and Trac support or
equivalent to CRAN package authors or at least
have a common change log mechanism.  There is
currently no uniform way of finding out what has
changed in a package.

7. system

The arguments of "system(...)" should be extended
in various operating systems so that a consistent
set is available across them.  Right now it works
differently under Windows and UNIX.
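
Current R offers `system2()`, which takes the command and its arguments separately and is intended to behave more uniformly across platforms than `system()`. A hedged sketch, assuming an `Rscript` executable sits in R's own `bin` directory (as in standard installations), which avoids depending on any OS-specific shell command:

```r
# Locate the Rscript executable that ships alongside the running R.
rscript <- file.path(R.home("bin"), "Rscript")

# Run a child R process; stdout = TRUE captures its output as a
# character vector instead of printing it.
out <- system2(rscript, c("-e", shQuote("cat(1 + 1)")), stdout = TRUE)
print(out)
```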

8. Extend Grid to Base Graphics

Rework base graphics so that they use grid
graphics underneath to the extent possible or else
leave them as is but have a version or package
that emulate them using grid graphics.

9. Eliminate Perl

Get rid of all use of perl within R.  The parts
of R that use perl have not changed much, probably
because it's too onerous to have to deal with a
complex multilanguage setup.  Eliminating perl might
speed up improvements in those areas.  This mostly
affects the package building process, which could
then be rehosted within R as a package building
package.

10. Event Loop

Add an event loop mechanism to facilitate GUI
programming in R and also to facilitate the development
of facilities to allow higher levels of interaction
within grid graphics.
#
On 1/1/2006 8:47 AM, Gabor Grothendieck wrote:
How self-contained?  It would be relatively easy to create small
executables that could make use of the DLLs in R_HOME.  But that's
probably not useful if better support for your next suggestion was
there. If you want something that would run on a machine with no R
installed, that probably involves changing the linking scheme (at least
in Windows), and would not be so simple.

R is not designed to be secure.  I think this would be a very risky
thing to do.

Yes, I wrote some functions to do some of this, but I haven't committed
them to the trunk.  I didn't like the user interface much:  using
clipboard support depends heavily on being able to handle the various
clipboard format constants (CF_BITMAP, etc.), and R doesn't handle
constants well.  I'll take another look at them and see if I get any
inspiration on how to make them palatable.

All this needs is someone who wants to gather/create the tools and start
building the binaries.

This sounds like the gridBase package.

There is such a thing now:  see the Writing R Extensions manual.  It
still needs work, but people using it and suggesting improvements will help.

Duncan Murdoch
#
On 1/1/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
Similar to tcl starkits and py2exe.  You can just send someone
an .exe and they can run it.  They don't have to have any software
or files other than that.  Or suppose I want to write an R configuration
facility and don't know which version of R is on the machine or even
if R is on the machine.  Currently I use batch files or javascript to
do this (see http://cran.r-project.org/contrib/extra/batchfiles/) since
one does not want to have to configure the configurer.  I would actually
prefer to leverage my knowledge of R and not have to go to the
lengths of using a different language.

Some alternatives are:

- restrict execution to .hta's (which are html files that are restricted to
run locally -- they can't be run from a browser).

- do the work to separate out the OS dependent items into libraries
so that access can be restricted similarly to what vbscript and jscript
have done with wsh.

- perhaps there is some way for the user to accept or reject such
applications as trusted or not?

Correct me if I am mistaken here but
my understanding is that this makes it possible to use
grid and standard graphics together but does not produce
grobs for all the elements of a graphic like doing it from
the bottom up would.
#
Gabor Grothendieck wrote:
We have been thinking of this, for me in the context
of putting R on small sensor network nodes.

This has been in the works for a long time and I still dislike two
small pieces of the solution that make it limited.

Yes not that hard to do given the work in other interfaces,
but I wish the Windows users would contribute it.

There is some work in progress on an extensible, R-based
package mechanism.

Well, there has been work on this that people couldn't agree on.


And while we are on the topic of wishlists...
Generally (i.e. not directed specifically to Gabor),
the suggestions are very welcome, but so are contributions.
And for issues such as making the existing R available on handhelds,
that is a programming task. And I draw a large distinction between
programming and creative research which is based on new concepts and
paradigms.  The pool of people working in statistical computing research
is very small. And to a large extent, their time is consumed with
programming - making the same thing work on multiple platforms,
correcting documentation, etc. which are good things, but
not obviously the best use of available research ability and time.
There are many more topics that are in progress that represent
changes to what we can do  rather than just to how we do the same thing.

One of the reasons S (R and S-Plus) is where it is now
is because in Bell Labs, the idea was to be thinking
5 years ahead and both meeting and directing the needs for the future.
Because of R's popularity (somewhat related to it being free), there is
an aspect of development that focuses more on software for statisticians
to use "right now".
Obviously, the development is a mixture of both the current and the
future, but there is less of the future, and certainly less of the
longer-term direction, which is sacrificed to the need to maintain an
existing system and be backward-compatible.
If statistics is to fulfill its potential in this modern IT, we need new
ideas and research into those new ideas. If we focus on basic
programming tasks (however complex) and demand usability above concepts,
we risk losing those whose primary focus is in statistical computing
research from the field.

While R provides statisticians and stat. comp. researchers with a
terrific vehicle for doing their respective work, it also acts as
a constraint for doing anything even moderately new. But much (not all)
of R is based on innovations from the 1970's, 80's and 90's.   And
as IT evolves at a terrific pace, to keep up with it, we need to be
forward looking.


I'll leave it there - for the moment - and go fight off the ants
that are invading my desk!  While I wrote this down relatively
rapidly, the ideas have been brewing for a long time. If anyone
wishes to comment on the theme, I hope they will take a few minutes
to think about the broad set of issues and tradeoffs.


  D.
--
Duncan Temple Lang                    duncan at wald.ucdavis.edu
Department of Statistics              work:  (530) 752-4782
4210 Mathematical Sciences Building   fax:   (530) 752-7099
One Shields Ave.
University of California at Davis
Davis,
CA 95616,
USA
#
Duncan Temple Lang wrote:
Hasn't someone ported R to the Sharp Zaurus, for which both the Linux 
kernel and a more or less complete GNU toolchain exist, plus at least 
two GUI builders? I've forgotten what the compiler version is -- it 
might be back around 2.95.

In any event, one of the Lisps and Maxima have been ported to the 
Zaurus. I'm not sure how well a number crunching application like R 
would run on the Zaurus processor, though -- IIRC the floating point is 
emulated in software. Isn't the same true for Palms and Windows CE PDAs?

I'd much rather have changes to what we can do rather than how we do the
same thing! As the Perl folks say, "There's more than one way to do it!" 
So keep R and its contributed packages focused on making the first few 
ways to do something new!

Amen! Please don't turn R into Perl! The Perl community has statistical
libraries for the basics. If that's all you want to do, just learn how 
to do it in Perl. The same goes for Python and Ruby. All the scripting 
languages can be used for basic statistical and numeric processing, and 
their communities are adding libraries for more advanced functionality 
all the time.

But no other language/community has the breadth of advanced statistical 
processing that R and its contributed packages have, and no other 
language has the right core semantics to make this kind of computing 
easy, with the possible exception of the newest dialects of Fortran. I 
*could* write a web ecommerce site in R if I wanted to, but why would I? 
I'd do that in PHP or the new Ruby on Rails, because that's what those 
languages were designed to do well!

Could you elaborate on the nature of the constraints R imposes?
Obviously there are *time* constraints made necessary by the programming 
tasks and finite number of community members, but are there limits to 
the kinds of scientific/statistical computing thoughts one can think if 
one only uses R and its contributed packages?

I've been thinking about related issues over the holiday break, mostly
triggered by Paul Graham's essay on a programming language that would 
last 100 years. The essay will appear on my blog in the near future. 
Meanwhile, I'll add my wish list (and list of things I'd work on in my 
spare time if I had any :) ) for R.

1. An integrated symbolic math capability. I think packaging GiNaC 
(http://freshmeat.net/projects/ginac/) is the logical way to do this. 
GiNaC is a C++ library, and I suspect it could be easily packaged, but I 
haven't tried it yet. If someone is ahead of me on this, I'd like to 
know about it before I attempt it.

2. A good solid discrete time and continuous time Markov chain analyzer 
for use in computer performance analysis. There are quite a few good 
toolsets out there, some with GUIs and some without, but nearly all of 
them have licenses that are not free as in speech. They're freely 
obtainable in the academic community, but not for "commercial use". 
There is one exception, and if I followed the path of integrating an 
existing package, I'd go with Prism (http://www.cs.bham.ac.uk/~dxp/prism/).

3. Along the lines of 2, more "out-of-core" solver capabilities. I don't 
think it's going to be much longer before a "typical scientific 
researcher" in a domain like bioinformatics or computer performance 
analysis will have available a two (physical 64-bit) processor 4GB 
workstation with a terabyte of local disk, plus, of course, access to a 
grid for the "big problems." :) At the moment, I don't have any computer 
performance analysis problems with enough states to require an efficient 
out-of-core solver, but it's bound to happen.
#
On Sun, Jan 01, 2006 at 11:08:42AM -0800, M. Edward (Ed) Borasky wrote:
Yes, Simon Pickering did this.  I ran his version of R on a Zaurus two
years ago.  His site implies that development is still underway.  

http://people.bath.ac.uk/enpsgp/Zaurus/index.html
It was slow :).  Also I never got it to work with X, but in fairness I
didn't really try hard.  Simon's site implies that improvements are
being implemented.

Cheers

Andrew
#
On Jan 1, 2006, at 7:36 AM, Duncan Temple Lang wrote:
This was recently done with CLISP by essentially cat'ing the image
onto the end of the CLISP executable. You could probably do the same  
with an Rda and a startup script (since R isn't truly image based) on  
a small custom front end that links the DLL. You'd need to patch save  
and load, but you'd be able to deliver executable apps (assuming base  
and everything is installed). For a completely self-contained  
executable (i.e. not requiring an installer) you'd probably want to  
build a "minimal" R that you can just source in or load as an image  
(or we could just go to images in general.... R's been sort of  
meandering in that direction for the last couple of years ;-) )
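
The "Rda plus startup script" idea can be illustrated within R itself. This is only a sketch of the save/load half; the custom front end that links the DLL is not shown, and the names `main`, `image_file` and `app_env` are hypothetical:

```r
# The "application": a function that would be bundled into the image.
main <- function() cat("hello from the bundled app\n")

# Build step: serialize the application into an .rda file.
image_file <- tempfile(fileext = ".rda")
save(main, file = image_file)

# Startup step: load the image into a clean environment and run it.
app_env <- new.env()
load(image_file, envir = app_env)
app_env$main()
```

Loading into a separate environment keeps the bundled objects apart from whatever the startup script itself defines.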
No, with the DCOM stuff it should be pretty straightforward these
days. We actually did a bit of this with, I think it was, R 1.1 or so
before the Bioconductor project got started. Security was an issue, as
Duncan M. pointed out, as I recall. You could basically remotely trash
someone's drive, no good. In the end, I think you can do what you
need to do with the DCOM interface and everything else you can do
with a front-end. Being able to hook into .Net/Mono would be useful I
think. On the Very Cool list would be a "managed" version of R, but
that falls into the Things That Don't Result In Degrees so it's on
indefinite hold (it does give Naras and me something to talk about at
coffee though).

Many handhelds are simply too limited. Those that aren't are usually
Linux based and it's just a matter of cross-compiling. Unfortunately a
lot of the chips are not really suitable for floating point work.
Additionally, a lot of the very limited PDAs (e.g. Coldfire-based
PDAs) don't have an MMU.

Yes, Winter (such as it is around here) has arrived hasn't it? :-)
---
Byron Ellis (ellis at stat.harvard.edu)
"Oook" -- The Librarian