
Julia

34 messages · Douglas Bates, Jeff Ryan, Kjetil Halvorsen +8 more

Messages 26–34 of 34

#
I don't think that using in-place modification as a general property would make
sense.

In-place modification brings in side-effects and that would mean that
the order of evaluation can change the result.

To get reliable results, the order of evaluation should not be
the reason for different results, and that's the reason why
the functional approach is much better for reliable programs.
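
A small sketch of the point about evaluation order, written in Python only for illustration (the function names are hypothetical and nothing here is R or Julia code). The same two steps are run in both orders, once with a side-effect-free function and once with one that overwrites its argument:

```python
def scale_pure(v, k):
    """Return a new list; the argument is left untouched."""
    return [x * k for x in v]


def scale_in_place(v, k):
    """Overwrite v with the scaled values (a side effect) and return it."""
    for i in range(len(v)):
        v[i] *= k
    return v


def total_then_scaled(scale):
    """Sum the data first, then scale it."""
    data = [1, 2, 3]
    total = sum(data)
    scaled = scale(data, 10)
    return total + sum(scaled)


def scaled_then_total(scale):
    """Scale the data first, then sum the (possibly modified) data."""
    data = [1, 2, 3]
    scaled = scale(data, 10)
    total = sum(data)
    return total + sum(scaled)


# The pure version gives the same answer in either order.
print(total_then_scaled(scale_pure), scaled_then_total(scale_pure))
# The in-place version does not: the second order sums the overwritten data.
print(total_then_scaled(scale_in_place), scaled_then_total(scale_in_place))
```

With the pure function the order of the two steps is irrelevant; with the in-place function the result depends on which step runs first.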

So, in general I would say, this feature is a no-no.
In general I would rather discourage in-place modification.

For certain cases it might help...
but for such cases either a boolean flag
or programming a separate module in C would make sense.

There could also be a global in-place-flag that might be used (via options
maybe) but if such a thing would be implemented, the default value should be
FALSE.



Ciao,
   Oliver
On Thu, Mar 08, 2012 at 04:21:42PM +0000, William Dunlap wrote:
#
I guess my point is not getting across.  The user should see
the functional programming style but under the hood the
evaluator should be able to use whatever memory and time
saving tricks it can.  Julia seems to want to be a nonfunctional
language, which I think makes it harder to write the sort of
easily reusable functions that S allows.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
Aha, ok.

So you are not looking especially at that one feature (like the anonymous
evaluation tricks), but in general want to ask for better internal optimization.

Especially with your example of the anonymous (unnamed) values given to
a function, I would ask: do you want to write programs all without
using names/variables?
I think this would be much harder than just to add a boolean flag
with inplace=TRUE.
So to your objection that the flag proposal is bad for usability,
I need to reply: it's even worse to write code without
variable names and put everything into anonymous data structures
that are created inside function applications, where each of the arguments
contains more unnamed calculations.
You will end up not only with a mess, but also with slower calculations,
because unnamed resources must be calculated more than once if they are used
more than once.

So I think that you are just asking for more internal optimizations.
Fine.

But I think internal intermediate code (that can be optimized)
would be better than that one "enhancement" of reusing anonymous
data for the output.


Ciao,
   Oliver
On Thu, Mar 08, 2012 at 10:27:22PM +0000, William Dunlap wrote:
8 days later
#
Hello,

regarding the copying issue,
I would like to point to the

"Writing R Extensions" documentation.

There it is mentioned that functions of extensions
that use the .C interface normally do get their arguments
pre-copied...


In section 5.2:

  "There can be up to 65 further arguments giving R objects to be
  passed to compiled code. Normally these are copied before being
  passed in, and copied again to an R list object when the compiled
  code returns."

But for the .Call and .External interfaces this is NOT the case.



In section 5.9:
  "The .Call and .External interfaces allow much more control, but
  they also impose much greater responsibilities so need to be used
  with care. Neither .Call nor .External copy their arguments. You
  should treat arguments you receive through these interfaces as
  read-only."
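
The difference between the two conventions quoted above can be modelled roughly as follows. This is only a Python sketch of the copying behaviour, not R's actual implementation: `dot_c_like`, `dot_call_like` and `negate` are hypothetical names, and Python lists stand in for R vectors.

```python
import copy


def dot_c_like(native, *args):
    """Model of the .C convention: copy arguments in, copy results out."""
    copies_in = [copy.deepcopy(a) for a in args]    # first copy
    native(*copies_in)                              # may overwrite the copies
    return [copy.deepcopy(a) for a in copies_in]    # second copy, into a list


def dot_call_like(native, *args):
    """Model of the .Call convention: arguments are passed without copying."""
    return native(*args)


def negate(v):
    """Stand-in for compiled code that writes into its argument."""
    for i in range(len(v)):
        v[i] = -v[i]
    return v


x = [1.0, 2.0]
out = dot_c_like(negate, x)
print(x, out)    # x is untouched; the result comes back in a new list

y = [1.0, 2.0]
res = dot_call_like(negate, y)
print(y, res is y)    # y itself was overwritten, hence "treat as read-only"
```

In the second case the caller's object is changed behind its back, which is why the manual asks that such arguments be treated as read-only.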


Why is read-only preferred?

Please, see the discussion in section 5.9.10.

It's mentioned there that copying an object in the R language
does not necessarily make a real copy of that object; instead,
just a "reference" to the real data is created (two names
referring to one bulk of data). That's typical functional
programming: not a variable, but a name (and possibly more than one
name) bound to an object.


Of course, if you change the original named value without
making a copy of it first, then both names
would refer to the changed data.
Of course that is not what is wanted.

But what you can also see in section 5.9.10 is that
there already is a mechanism (reference counting) that allows
distinguishing between unnamed and named objects.

So, this directly addresses the points you have mentioned in your
examples.

So, at least in principle, R allows in-place modifications
of objects with the .Call interface.

You seem to refer to the .C interface, and I had explored the .Call
interface. That's the reason why you may insist on "it's always
copied" and I wondered what you were talking about, because the
.Call interface allowed me a rather C-like raw style of programming
(and lets its user decide whether copying will be done or not).

The mechanism to decide whether copying should be done or not
is also mentioned in section 5.9.10: the NAMED and SET_NAMED macros.
With NAMED you can get the number of references.

But later in that section it is mentioned, that - at least for now -
NAMED always returns the value 2.


  "Currently all arguments to a .Call call will have NAMED set to 2,
  and so users must assume that they need to be duplicated before
  alteration."
               (section 5.9.10, last sentence)
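
To make the idea concrete, here is a toy model of a NAMED-style check, written in Python. It is a simplified sketch under my own assumptions, not R's C API: `Value`, `duplicate` and `add_one` are hypothetical names, and the rule used is the conservative one (modify in place only when the counter is 0, copy otherwise).

```python
class Value:
    """Toy stand-in for an R object carrying a NAMED-like counter."""

    def __init__(self, data, named=0):
        self.data = list(data)
        self.named = named    # 0: unnamed temporary, 2: possibly shared


def duplicate(v):
    """Return a fresh, unnamed copy of v."""
    return Value(v.data, named=0)


def add_one(v):
    """Add 1 to every element, copying first unless v is an unnamed temporary."""
    target = v if v.named == 0 else duplicate(v)
    for i in range(len(target.data)):
        target.data[i] += 1
    return target


temp = Value([1, 2], named=0)
print(add_one(temp) is temp)    # the temporary is reused, no copy made

bound = Value([1, 2], named=2)
result = add_one(bound)
print(bound.data, result.data)  # the bound value is untouched, the copy changed
```

As long as every argument arrives with the counter at 2, as the quoted sentence says is currently the case for .Call, the second branch is always taken and the copy is never avoided.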


So, in-place modification can already be done with the .Call
interface, for example. But deciding whether it is safe or not
is not supported at the moment.

So the situation is somewhere between "it is possible" and
"R does not support a safe decision on whether what is possible
can also be recommended".
At the moment R rather discourages in-place modification by default
(the safe way, and I agree with this default).

But it's not true, that R in general copies arguments.

But this seems to be true for the .C interface.

Maybe a lot of performance-/memory-problems can be solved
by rewriting already existing packages, by providing them
via .Call instead of .C.


Ciao,
   Oliver
On Tue, Mar 06, 2012 at 04:44:49PM +0000, William Dunlap wrote:
3 days later
#
Hi Oliver,
On 03/17/2012 08:35 AM, oliver wrote:
My understanding is that most packages use the .C interface
because it's simpler to deal with and because they don't need
to pass complicated objects at the C level, just atomic vectors.
My guess is that it's probably rarely the case that the cost
of copying the arguments passed to .C is significant, but,
if that was the case, then they could always call .C() with
DUP=FALSE. However, using DUP=FALSE is dangerous (see Warning
section in the man page).

No need to switch to .Call

Cheers,
H.

  
    
#
On Tue, Mar 20, 2012 at 12:08:12PM -0700, Hervé Pagès wrote:
[...]
[...]

Yes. I have seen that (DUP=FALSE) in the docs, but while I was
writing the answer like a maniac, I forgot it. ;-)

Thanks for mentioning it.

The manual also mentioned that .Call allows more control.
I did not look at .C and used .Call from the beginning.
It did not look very complicated. But maybe .C would be much easier.
Don't know.
OK, at least not for the point of the DUP argument.
But it seems to me that once the NAMED result is
correctly set to 0, 1 and 2, such optimisations,
which were asked for, could be done "automagically".
And safely, too.

The .C interface with the DUP-arg seems not to allow this.


Ciao,
   Oliver
1 day later
#
On Mar 20, 2012, at 3:08 PM, Hervé Pagès wrote:

            
I strongly disagree. I'm appalled to see that sentence here. The overhead is significant for any large vector and it is in particular unnecessary since in .C you have to allocate *and copy* space even for results (twice!). Also it is very error-prone, because you have no information about the length of vectors so it's easy to run out of bounds and there is no way to check. IMHO .C should not be used for any code written in this century (the only exception may be if you are passing no data, e.g. if all you do is to pass a flag and expect no result, you can get away with it even if it is more dangerous). It is a legacy interface that dates way back and is essentially just a renamed .Fortran interface. Again, I would strongly recommend the use of .Call in any recent code because it is safer and more efficient (if you don't care about either attribute, well, feel free ;)).

Cheers,
Simon
#
On 03/21/2012 06:23 PM, Simon Urbanek wrote:
Come on!
So aleph will not support the .C interface? ;-)

H.

  
    
#
On Mar 21, 2012, at 9:31 PM, Hervé Pagès wrote:

            
It will look at the timestamp of the source file and delete the package if it is not before 1980 ;). Otherwise it will send a request for punch cards with ".C is deprecated, please upgrade to .Call" stamped out :P At that point I'll be flaming about using the native Aleph interface and not the R compatibility layer ;)

Cheers,
S