Minimal build of R ...

13 messages · Jony Hudson, Gabriel Becker, R. Michael Weylandt +2 more

#
Hi,

 I'm trying to cross-compile R to javascript so that it can run in a web-browser. Take as long as you need to stop laughing. So, as I was saying - I want to try and get a build of R running in the browser. [If you're not familiar with it already, you might enjoy looking at emscripten.org. It's a remarkably capable tool for translating LLVM bitcode to javascript. Check out some of the demos!]

I'm trying to start out with the most minimal build of R possible. I can turn off various options in the configure script, but I'm wondering about the bundled R packages (base, stats etc). I'm guessing that the native code portions of these packages are dynamically loaded at runtime, which will probably need patching. To start off, I'd like to not build these packages if possible.

So, is there a way to configure which packages in the library get built or is it just a case of editing the makefile? And is there a minimal set of them that would still allow R to run (not be useful - that can come later - just run)?

Thanks in advance for any help anyone can provide :-)


Jony

--
Centre for Cold Matter, The Blackett Laboratory,
Imperial College London, London SW7 2BW
T: +44 (0)207 5947741
http://www.imperial.ac.uk/people/jony.hudson
http://www.imperial.ac.uk/ccm/research/edm
http://www.monkeycruncher.org
http://j-star.org/
--
#
On Thu, May 2, 2013 at 5:12 PM, Jony Hudson <jony.hudson at imperial.ac.uk> wrote:
You can run just "base:"

for(i in seq_len(length(search()) - 2)){detach(2)}

search()

Not sure if you need "compiler" around for the build process, but we
survived without it once, so I'd assume you can get by without it if
you're willing to tweak.

Godspeed,

MW
#
On May 2, 2013, at 6:18 PM, Gabriel Becker wrote:

            
Minor detail: it requires you to have R *and* a special plugin which makes it pretty much non-deployable. It's completely unrelated to what Jony is proposing - which doesn't require any dependencies and is actually pretty cool and would be useful if feasible. FWIW: There are many ways to run R from a browser that already exist - without the need for plugins or other client-side hacks - that's the beauty of modern browsers :).


To get this back on the actual topic: I toyed with cross-compiling R when I was porting it to the iPhone, and it's possible; however, you can't use the build process as-is. It does build core R properly, but the problem is that you need to bootstrap R to build any packages. I worked around the problem at the time by building packages on another platform and re-using those files (things like lazy-loaded DBs, compiled Rd files etc.).

I can imagine that you'll need some equivalent to dynamic linking, but conceptually it's nothing more than calling functions, so I think you should be able to compile each package separately and just replace the dynload code with code that loads another JavaScript file. The nice thing is that packages will simply be additional JS libraries. That's all in theory; I didn't actually try that part. I suspect you'll have a lot of work, e.g. you'll need to map all the I/O operations that load compiled/stored R code, documentation, data from somewhere etc. Good luck!
If all else fails, you can always compile R for JS/Linux ;).
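
As a rough illustration of "replace the dynload code by code that calls functions", here is a minimal C sketch of a static symbol table that resolves routine names to function pointers without dlopen()/dlsym(). This is not R's dynload implementation and nobody in this thread tried it; the names symbol_entry, symbol_table, lookup_symbol, call_int2, pkg_add and pkg_mul are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function-pointer type; callers cast back to the real signature. */
typedef void (*generic_fn)(void);

/* One registry entry: a symbol name and the address it resolves to. */
typedef struct {
    const char *name;
    generic_fn  fn;
} symbol_entry;

/* Hypothetical "package" routines standing in for compiled package code. */
static int pkg_add(int a, int b) { return a + b; }
static int pkg_mul(int a, int b) { return a * b; }

/* Static table filled in at build time instead of using dlopen()/dlsym(). */
static const symbol_entry symbol_table[] = {
    { "pkg_add", (generic_fn) pkg_add },
    { "pkg_mul", (generic_fn) pkg_mul },
};

/* Look a symbol up by name; returns NULL if it is not registered. */
static generic_fn lookup_symbol(const char *name)
{
    assert(name != NULL);
    size_t n = sizeof symbol_table / sizeof symbol_table[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(symbol_table[i].name, name) == 0)
            return symbol_table[i].fn;
    }
    return NULL;
}

/* Resolve a routine taking two ints and call it.
   Returns 0 on success, -1 if the symbol is unknown. */
static int call_int2(const char *name, int a, int b, int *result)
{
    generic_fn f = lookup_symbol(name);
    if (f == NULL)
        return -1;
    /* Cast back to the routine's true type before calling it. */
    *result = ((int (*)(int, int)) f)(a, b);
    return 0;
}
```

Calling call_int2("pkg_add", 2, 3, &r) stores 5 in r, while an unregistered name returns -1. The cast back to the true signature matters: calling a function through a pointer of the wrong type is undefined behaviour in C, and a JavaScript target may well be stricter about it than a native one.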

Cheers,
Simon
#
Hi All,

 thanks for the replies. Very helpful to know that it will run with just base. Looks like the best bet, at least to get started, is to not use the usual build-process, but to come up with a simple build-script for just the core. Ultimately, the build script has to be different anyway, as compiling the Fortran code to JS requires a few more steps than the native compile.

For a bit of context, the reason I'm toying with this is I've been experimenting recently with analysis-in-the-browser. The kernel of the idea is that if you could do real analysis, without installing anything, and share it on the web then it would be a Good Thing, and could make it easier to engage people with data. I've got a proof-of-concept version running here http://www.monkeycruncher.org that lets you write javascript analysis code in notebook-style documents. It's neat, but it's a bit hamstrung by the lack of javascript libraries to actually do any useful analysis! If you could have R running in there though, that would be a much better proposition ...

I'll let you know if I make any progress!


Jony


On 3 May 2013, at 01:31, Simon Urbanek <simon.urbanek at r-project.org> wrote:

            
#
Hi Gabriel,

 yes, packages obviously contain all the good stuff, but I need to start somewhere!

The ipython notebook project is very impressive, and I've been keeping a close eye on it, although I started out on monkeycruncher long before I was aware of it (I make slow progress). I guess I think of my thing as an experiment in just how much can be done purely in the web client. There are some advantages to pure client-side (rich interactivity, no need for a server, ubiquity) which make it interesting, but it might be a bit "too soon" to be useful!


Jony


On 3 May 2013, at 16:46, Gabriel Becker <gmbecker at ucdavis.edu> wrote:

            
#
On May 3, 2013, at 11:21 AM, Jony Hudson wrote:

            
It seems that you want something not unlike RCloud
http://stats.research.att.com/RCloud/
It uses WebSockets to talk to R either locally or on a server. The nice thing about using WS is that you can leverage large clusters and are not tied to the local machine. Also it allows you to get the benefits of both worlds: R for computation + static graphics while allowing you to do cool interactive graphics in JavaScript. RCloud is something like iPython notebook but based on R with extra interactive graphics. But this is getting OT ;).

Cheers,
Simon
#
On May 3, 2013, at 12:52 PM, Gabriel Becker wrote:

            
This has been attempted quite a few times - literally by the canvas package (it generates JavaScript) and in more general terms by using SVG (this was way back when it was en vogue). The problem is that by design R plots lack the link between data and the objects drawn, so you can only add a small amount of interactivity to very specific plots by hand-crafting the links or by trying to apply some heuristics, but it doesn't work in general. That's why all the web-based interactive graphics typically do it the other way around - define JS-based primitives with interactions and build plots from this. You actually get nice interactive graphics, but you can't re-use R-based graphics (other than re-drawing it interactively, but that's another story).

Cheers,
Simon
20 days later
#
Hi,

 I'm making some progress with this, but have hit a sticking point and am looking for a hint. Most of the compilation seems to be going ok, after some liberal use of f2c, but I'm getting compile errors in src/main/connections.c :

connections.c:926:43: error: use of undeclared identifier 'SSIZE_MAX'
    if ((double) size * (double) nitems > SSIZE_MAX)
                                          ^
connections.c:937:43: error: use of undeclared identifier 'SSIZE_MAX'
    if ((double) size * (double) nitems > SSIZE_MAX)
                                          ^
connections.c:3354:21: warning: implicit conversion from 'long long' to
      'R_xlen_t' (aka 'int') changes value from 4503599627370496 to 0
      [-Wconstant-conversion]
    nnn = (n < 0) ? R_XLEN_T_MAX : n;
        ~           ^~~~~~~~~~~~
../../src/include/Rinternals.h:65:23: note: expanded from macro 'R_XLEN_T_MAX'
# define R_XLEN_T_MAX 4503599627370496
                      ^~~~~~~~~~~~~~~~
connections.c:3662:11: error: duplicate case value '4'
            case sizeof(long):
                 ^
connections.c:3660:11: note: previous case defined here
            case sizeof(int):
                 ^
connections.c:3680:11: error: duplicate case value '4'
            case sizeof(long):
                 ^
connections.c:3678:11: note: previous case defined here
            case sizeof(int):
                 ^
connections.c:3912:11: error: duplicate case value '4'
            case sizeof(long):
                 ^
connections.c:3910:11: note: previous case defined here
            case sizeof(int):
                 ^
connections.c:3956:11: error: duplicate case value '4'
            case sizeof(long):
                 ^
connections.c:3952:11: note: previous case defined here
            case sizeof(int):

Recall that I'm compiling with emscripten, which uses clang to generate LLVM bitcode, which is then converted to javascript. I'm currently using the existing autotools build scripts, which emscripten tries to twist into doing something sensible. It's quite possible that it's ending up mis-"./configure"d though.

I appreciate this is fairly off-topic, but if anyone has any pointers where to start looking, they would be greatly appreciated :-)

Thanks,


Jony

On 2 May 2013, at 17:12, Jony Hudson <jony.hudson at imperial.ac.uk> wrote:

            
#
On May 23, 2013, at 23:07 , Jony Hudson wrote:

            
Looks like SSIZE_MAX is usually in <*/limits.h>:

pd$ grep -r SSIZE_MAX /usr/include/
/usr/include/i386/limits.h:#define	SSIZE_MAX	LONG_MAX	/* max value for a ssize_t */
/usr/include/limits.h:#define	_POSIX_SSIZE_MAX	32767
/usr/include/ppc/limits.h:#define	SSIZE_MAX	LONG_MAX	/* max value for a ssize_t */
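
If the target's <limits.h> does not provide SSIZE_MAX, one possible stopgap is a guarded fallback definition. The sketch below assumes ssize_t is as wide as size_t, which may not hold on every toolchain, and the helper fits_in_ssize is a hypothetical name that only mirrors the shape of the check quoted from connections.c.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Fallback for toolchains whose <limits.h> does not define SSIZE_MAX.
   Assumption: ssize_t has the same width as size_t, so its maximum
   is SIZE_MAX / 2. */
#ifndef SSIZE_MAX
# define SSIZE_MAX (SIZE_MAX / 2)
#endif

/* Hypothetical helper mirroring the check quoted from connections.c:
   returns 1 if size * nitems does not exceed SSIZE_MAX, 0 otherwise.
   The product is formed in double so it cannot overflow an integer type. */
static int fits_in_ssize(size_t size, size_t nitems)
{
    return !((double) size * (double) nitems > (double) SSIZE_MAX);
}
```

Because the definition is wrapped in #ifndef, it has no effect on platforms where the system header already supplies the macro.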

If R_xlen_t is int, you need to adjust R_XLEN_T_MAX to INT_MAX or so.
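
One way to keep the maximum consistent with the type is to choose it from the type's actual width on the target. The sketch below uses hypothetical stand-ins (demo_xlen_t, DEMO_XLEN_T_MAX, clamp_length) rather than R's own definitions, and assumes the length type is a signed pointer-sized integer.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for R_xlen_t: a signed, pointer-sized length. */
typedef ptrdiff_t demo_xlen_t;

/* Choose the maximum from the width of the type on the target. */
#if PTRDIFF_MAX > INT_MAX
# define DEMO_XLEN_T_MAX 4503599627370496   /* 2^52, the value in the quoted log */
#else
# define DEMO_XLEN_T_MAX INT_MAX            /* 32-bit target: stay within int */
#endif

/* Same shape as the quoted line: a negative n means "no limit". */
static demo_xlen_t clamp_length(demo_xlen_t n)
{
    return (n < 0) ? DEMO_XLEN_T_MAX : n;
}
```

With this arrangement the constant always fits in the type, so the conversion that clang warned about (4503599627370496 becoming 0) cannot occur.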

The case warnings look like they are bound to happen on systems where int and long have the same size, and should presumably be harmless.
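
Note that clang reports the duplicate case values as errors in the log above, so the build stops until they are dealt with. A minimal sketch of one workaround is to compile the sizeof(long) case only when long is wider than int. The function read_signed is hypothetical and is not the code in connections.c; it also assumes the integer types have no padding bits, so comparing LONG_MAX with INT_MAX is a fair proxy for comparing their sizes, and that char, short and int have distinct sizes.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Read a native-endian signed integer of `size` bytes from buf and widen
   it to long long. Returns 0 on success, -1 for an unsupported size. */
static int read_signed(const void *buf, size_t size, long long *out)
{
    switch (size) {
    case sizeof(signed char): {
        signed char v;
        memcpy(&v, buf, sizeof v);
        *out = v;
        return 0;
    }
    case sizeof(short): {
        short v;
        memcpy(&v, buf, sizeof v);
        *out = v;
        return 0;
    }
    case sizeof(int): {
        int v;
        memcpy(&v, buf, sizeof v);
        *out = v;
        return 0;
    }
#if LONG_MAX != INT_MAX
    /* Only present when long is wider than int, avoiding a duplicate label. */
    case sizeof(long): {
        long v;
        memcpy(&v, buf, sizeof v);
        *out = v;
        return 0;
    }
#endif
#if LLONG_MAX != LONG_MAX
    /* Only present when long long is wider than long. */
    case sizeof(long long): {
        long long v;
        memcpy(&v, buf, sizeof v);
        *out = v;
        return 0;
    }
#endif
    default:
        return -1;
    }
}
```

On a target where int and long are both 4 bytes, the long case is compiled out and the int case handles every 4-byte value, which is the behaviour the duplicate label would have had anyway.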