
Advice on parsing / overriding function calls

8 messages · Hadley Wickham, Michael Cassin, Hin-Tak Leung +3 more

#
What are you trying to defend against?  A serious attacker could still
use rm/assign/get/eval/... to circumvent your replaced functions.  I
think it would be very difficult (if not impossible) to prevent this
from happening, especially if the user can load packages.
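A minimal R sketch of the rm/get route, assuming a host that "disables" `system()` only by masking it with a stub in the user's workspace. The stub and its error message are hypothetical; the point is that the original stays reachable through ordinary base R lookups.

```r
# Hypothetical host setup: mask system() in the user's workspace with a stub
system <- function(...) stop("system() has been disabled")

# The original is still one lookup away (base::system would work too)
orig <- get("system", envir = baseenv())
stopifnot(identical(orig, base::system))

# Or the user simply deletes the stub, and normal lookup finds base::system again
rm(system)
stopifnot(identical(system, base::system))
```

Nothing here calls the real `system()`; it only shows that masking a name does not remove the underlying function.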

Hadley
On 8/16/07, Michael Cassin <michael at cassin.name> wrote:

#
Well, I think there are some serious uses, e.g. offering a web service
where scripts are uploaded and the Rout results downloaded back...

The issue is more about whether he wants to limit *all* file system 
access or just limit it to certain areas. For the former,
I would set up a chroot jail and run R from within; for the latter,
I would probably use LD_PRELOAD to override
all the file-system-accessing functions in libc directly, really.
That would fix the problem with system(rm) and the like, I think,
because if your entire R process and any sub-process R launches have no 
access to the genuine libc fwrite/fread/etc. functions, you cannot do
any damage, right?
Both are tricky and take time to do (the chroot jail a bit easier, 
actually...), but quite doable.

It depends on (1) how paranoid you are, (2) how much trouble you want to 
have for yourself to achieve those restrictions...
hadley wickham wrote:
#
a sneaky trick:

for each compute session, automate setting up a zone ("solaris 
containers") on a solaris 10+ box.  if you have a 
preinstalled/preconfigured zone template, snapshotted with zfs, you can 
roll out a new compute zone in literally seconds.  you can quota it, limit 
the amount of CPU it gets, etc.  really not very difficult at all to set 
up.  sun's tools are *great* for this nowadays.

this is substantially safer than chroot() or LD_PRELOAD tricks, and lets 
you do this stuff without having to reinvent the wheel.

it also reduces overhead to the point where you really *can* set up a 
naked compute (well, with R in it...) environment for every compute 
session getting instantiated.  in way, way, way less time than it takes 
for the computations to actually run.

if someone does system(rm) in a container... who cares?  they just trashed 
their own session, and nothing else.  just blow the trashed ones away 
periodically.

--e
#
Thinking along these lines, we actually have a mechanism for  
replacing the system call (it's used by the Mac GUI to allow root  
calls) and one could think of expanding this to all critical  
operations. Clearly, there are issues (speed for example), but it  
would be nice to have a 'fortified' version of R that allows turning  
on restrictions. I don't think it's easy, but given the rising demand  
(at least in my perception), it would be interesting to see how far  
we can get.

Re filtering strings in commands - I don't think this will work,  
because you can compute on the language, so you can construct  
arbitrary calls without using the names verbatim, so it is  
possible to circumvent such filters fairly easily.
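A hedged illustration of why string filters fail when you can compute on the language. `blocked` and `passes_filter()` are hypothetical names for a naive text filter, not anything provided by R; the script builds the name `file` at run time, so the blocked word never appears in its source.

```r
# Hypothetical naive filter: reject any script whose text mentions a blocked name
blocked <- c("system", "file", "unlink")
passes_filter <- function(code) {
  !any(vapply(blocked, function(w) grepl(w, code, fixed = TRUE), logical(1)))
}

# The script never spells out the blocked name, so the filter lets it through...
script <- 'eval(as.name(paste0("fi", "le")))'
stopifnot(passes_filter(script))

# ...yet evaluating it yields base::file, i.e. access to file connections
f <- eval(parse(text = script))
stopifnot(identical(f, base::file))
```

The same trick works with `get()`, `match.fun()`, `do.call()` or `as.call()`, which is why filtering on names in the script text is easy to sidestep.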

Cheers,
Simon
On Aug 16, 2007, at 9:23 AM, Hin-Tak Leung wrote:

#
On Thu, 16 Aug 2007, Simon Urbanek wrote:

Exactly.  All I would need is access to a file() connection, and I could 
easily do that in such a way that 'file' never appeared in the script.
And I've thought of half a dozen backdoors that have not been mentioned in 
this thread.

I am not sure there is really much point in trying to fortify R, when 
that's the OS's job and it may well be better to run R in a suitable 
sandbox.  Certainly I think that is the solution for web services.

One area where it may be necessary is embedded applications.  Certainly if 
R is embedded into the same process (which is how R as an shlib or DLL is 
usually used) then you may want the main application to have privileges 
you do not give to the embedded R.  But using a separate process (e.g. via 
Rserve) may be more secure.