Julia

34 messages · Jeff Ryan, Douglas Bates, Kjetil Halvorsen +8 more

#
My purpose in mentioning the Julia language (julialang.org) here is
not to start a flame war.  I find it to be a very interesting
development and others who read this list may want to read about it
too.

It is still very much early days for this language - about the same
stage as R was in 1995 or 1996 when only a few people knew about it -
but Julia holds much potential.  There is a thread about "R and
statistical programming" on groups.google.com/group/julia-dev.  As
always happens, there is a certain amount of grumbling of the "R IS
SOOOO SLOOOOW" flavor but there is also some good discussion regarding
features of R (well, S actually) that are central to the language.
(Disclaimer: I am one of the participants discussing the importance of
data frames and formulas in R.)

If you want to know why Julia has attracted a lot of interest very
recently (like in the last 10 days), as a language it uses multiple
dispatch (like S4 methods) with methods being compiled on the fly
using the LLVM (http://llvm.org) infrastructure.  In some ways it
achieves the Holy Grail of languages like R, Matlab, NumPy, ... in
that it combines the speed of compiled languages with the flexibility
of the high-level interpreted language.

One of the developers, Jeff Bezanson, gave a seminar about the design
of the language at Stanford yesterday, and the video is archived at
http://www.stanford.edu/class/ee380/.  You don't see John Chambers on
camera but I am reasonably certain that a couple of the questions and
comments came from him.
#
Doug,

Agreed on the interesting point - looks like it has some real promise.
I think the spike in interest could be attributable to a tweet on
Feb 20 by Mike Loukides, an editor at O'Reilly:

https://twitter.com/#!/mikeloukides/status/171773229407551488

That is exactly the moment I stumbled upon it.

Jeff
On Thu, Mar 1, 2012 at 11:06 AM, Douglas Bates <bates at stat.wisc.edu> wrote:

  
    
#
On Thu, Mar 1, 2012 at 11:20 AM, Jeffrey Ryan <jeffrey.ryan at lemnica.com> wrote:
I think Jeff Bezanson attributes the interest to a blog posting by
Viral Shah, another member of the development team, that hit Reddit.
He said that, with Viral now in India, it all happened overnight for
those in North America and he awoke the next day to find a firestorm
of interest.  I ran across Julia in the Release Notes of LLVM and
mentioned it to Dirk Eddelbuettel who posted about it on Google+ in
January.  (Dirk, being much younger than I, knows about these
new-fangled social media things and I don't.)
#
Can somebody post a link to the video? I can't find it; searching for
"Julia" on the Stanford YouTube channel gives nothing.

Kjetil
On Thu, Mar 1, 2012 at 11:37 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
#
http://julialang.org/blog

Then click on "Stanford Talk Video".
Then click on "available here".

Ted.
On 01-Mar-2012 Kjetil Halvorsen wrote:
-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 01-Mar-2012  Time: 20:47:42
This message was sent by XFMail
1 day later
#
On Thu, Mar 01, 2012 at 11:06:51AM -0600, Douglas Bates wrote:
[...]


Very interesting language.
Thank you for mentioning it here.

Compiling from the github-sources was easy.

Will explore it during the next days.

Seems not to be very specific to statistics,
but good for math in general.

I am not sure whether it might make sense to combine
R and Julia in the long run (I mean: combining by
providing interfaces between them, calling one from the
other, merging code, or using libs of one from the
other).

Ciao,
   Oliver
2 days later
#
I haven't used Julia yet, but from my quick reading
of the docs it looks like arguments to functions are
passed by reference and not by value, so functions
can change their arguments.  My recollection from when
I first started using S (in the course of a job helping
profs and grad students do statistical programming, c. 1983)
is that not having to worry about in-place algorithms changing
your data gave S a big advantage over Fortran or C.
While this feature could slow things down and increase
memory use, I felt that it made it easier to write correct
code and to use functions that others had written.
Does Julia have a const declaration or other
means of controlling or documenting that a given function
will or will not change the data passed into it?

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
On Mon, Mar 05, 2012 at 03:53:28PM +0000, William Dunlap wrote:
[...]


C also uses call-by-value.
I don't know Fortran in detail.
Yes, I also think that call-by-value reduces
errors in code.

From what I have read, Julia is like MATLAB plus more features for programming.
Does MATLAB also use only call-by-reference?
I have not explored it in detail so far.
Maybe the original poster has already looked at this in more depth?


Ciao,
   Oliver
#
Hi Oliver,
On 03/05/2012 09:08 AM, oliver wrote:
C *only* uses Call-by-Value.

Cheers,
H.

  
    
#
On Mon, Mar 05, 2012 at 03:58:59PM -0800, Hervé Pagès wrote:
[...]


Yes, that's what I meant.

By "also" I meant that it uses call-by-value, as some
other languages do too.


Ciao,
   Oliver
#
On 12-03-05 6:58 PM, Hervé Pagès wrote:
While literally true, the fact that you can't send an array by value, 
and must send the value of a pointer to it, kind of supports Bill's 
point:  in C, you mostly end up sending arrays by reference.

Duncan Murdoch
#
Yes, C does use call by value, always.  However, data arrays
are almost always passed via pointers to malloc'ed space,
so, effectively, data arrays are passed by reference.
(One can put a 'const type*' in the prototype of a function to declare
that the data pointed to will not be changed, but it is
up to documentation or coding standards to let someone know that
data pointed to will likely be changed.)
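
A minimal C sketch of that point, for illustration only. The function names scale_in_place and sum_readonly are made up here and do not come from any library. Both functions receive the pointer by value, yet the first one changes the caller's data, and only the second prototype says anything about whether the data may change.

```c
#include <assert.h>
#include <stddef.h>

/* The pointer x is passed by value, but the copy still points at the
   caller's storage, so the caller sees every write made through it.
   Nothing in the prototype warns the caller about that. */
void scale_in_place(double *x, size_t n, double factor)
{
    assert(x != NULL);
    for (size_t i = 0; i < n; i++)
        x[i] *= factor;
}

/* 'const double *' states that this function will not write through x;
   the compiler rejects an accidental x[i] = ... in the body. */
double sum_readonly(const double *x, size_t n)
{
    assert(x != NULL);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += x[i];
    return total;
}
```

Calling scale_in_place(v, 3, 2.0) on an array v doubles the caller's own elements, while sum_readonly(v, 3) leaves them as they were.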

I find R's (& S+'s & S's) copy-on-write-if-not-copying-would-be-discoverable-
by-the-user mechanism for giving the illusion of pass-by-value a good way
to structure the contract between the function writer and the function user.
Does Julia have the tools to let a function writer or user decide whether
he really needs to copy its arguments or not?

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
There are many experts on this topic.  I'll keep this short.

Newer versions of Fortran allow for call by value, but call by reference
is the typical and, historically, the only approach (there was a time
when you could change the value of 1 to 2!).

C "only" calls by value except that the value can be a pointer! So,
havoc is just a * away.

I'm very pleased to be on this list and read the discussion. Thank you
Douglas Bates for sending the first message.

I like R and will continue to use it. However, I also think that
strict "call by value" can get you into trouble, just trouble of a
different kind. I'm not sure we will ever yearn for "Julia ouR-Julia",
but it is sure fun to think about what might be possible with this
language. And having fun is one key objective.

Nick Crookston

2012/3/5 oliver <oliver at first.in-berlin.de>:
#
On Mon, Mar 05, 2012 at 04:54:05PM -0800, Nicholas Crookston wrote:
Oh, strange.
[...]

For me there was no "havoc" at this point, but for others maybe.

There are also other languages that only use call-by-value...
...functional languages are that way in principle.

  Nevertheless, internally they may use pointers heavily, and
  even if you have values that are large arrays, for example,
  they internally just pass a pointer to that data structure.
  (That's why functional languages are not necessarily slow
  just because you act on large data and have no references
  in the language. It is a common misunderstanding that functional
  languages must be slow because they have no references.)
  The pointer stuff is just hidden.

Even where (non-purely) functional languages do have references,
their concept of references is different. (See OCaml for example.)
There you can use references to change values in place, but the
reference itself is a functional value, and you never have
direct access to the pointer stuff. Hence no problems with
pointer arithmetic, dangling pointers, or null pointers.



[...]
Can you elaborate more on this?
What problems do you have in mind?
And what kind of references do you have in mind?
The C-like pointers or something like OCaml's refs?
I have fun if things work.
And if the tools do, what I want to achieve...
...and the fun is better, if they do it elegantly.

Do you ask for references in R?
And what kind of references do you have in mind,
and why does it hurt you not to have them?

Can you give examples, so that it's easier to see
where you miss something?


Ciao,
   Oliver

P.S.: The speed issue of R has come up more than once;
      it was mentioned in some blog posts. Would it make
      sense to start a separate thread about it?
      One of the blog articles I read complained about
      how NA / missing values are handled, and suggested that NA
      should maybe be thrown out, just to get higher speed.
      I would not like that. Handling NA as a special
      case is IMHO a very good approach. I don't remember whether the
      article I have in mind argued only about HOW NA is
      handled, or whether it should be thrown out completely.
      Making the handling better and more performant is, I
      think, a good idea; ignoring NA is IMHO a bad idea.

      But maybe that really would be worth a separate thread?
#
On Mon, Mar 05, 2012 at 07:33:10PM -0500, Duncan Murdoch wrote:
[...]

It's a problem of how the term "reference" is used.
If you want to limit the possible confusion, better say:
giving the pointer-by-value.

Or: giving the address-value of the array/struct/...
by value.

Saying that you pass the array reference is shorthand,
which may create confusion.

Just avoiding the word "reference" here would make it clearer.
AFAIK, references in C++ are different from pointers. (Others
who know C++ well might explain this in detail.)

So, using the same terms for many different concepts can create
a mess in understanding.


Ciao,
   Oliver
#
On Tue, Mar 06, 2012 at 12:35:32AM +0000, William Dunlap wrote:
[...]
[...]


Can you elaborate more on this,
especially on the ...-...-...-if-not-copying-would-be-discoverable-by-the-user
stuff?

What do you mean by discoverability of not-copying?

Ciao,
   Oliver
#
S (and its derivatives and successors) promises that functions
will not change their arguments, so in an expression like
   val <- func(arg)
you know that arg will not be changed.  You can
do that by having func copy arg before doing anything,
but that uses space and time that you want to conserve.
If arg is not a named item in any environment then it
should be fine to write over the original because there
is no way the caller can detect that shortcut.  E.g., in
    cx <- cos(runif(n))
the cos function does not need to allocate new space for
its output, it can just write over its input because, without
a name attached to it, the caller has no way of looking
at what runif(n) returned.  If you did
    x <- runif(n)
    cx <- cos(x)
then cos would have to allocate new space for its output
because overwriting its input would affect a subsequent
    sum(x)
I suppose that end-users and function-writers could learn
to live with having to decide when to copy, but not having
to make that decision makes S more pleasant (and safer) to use.
I think that is a major reason that people are able to
share S code so easily.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
On Tue, Mar 6, 2012 at 11:44 AM, William Dunlap <wdunlap at tibco.com> wrote:
But don't forget the "Holy Grail" that Doug mentioned at the
start of this thread: finding a flexible language that is also
fast. Currently many R packages employ C/C++ components
to compensate for the fact that the R interpreter can be slow,
and the pass-by-value semantics of S provides no protection
here.

In 2008 Ross Ihaka and Duncan Temple Lang published the
paper "Back to the Future: Lisp as a base for a statistical
computing system" where they propose Common
Lisp as a new foundation for R. They suggest that
this could be done while maintaining the same
familiar R syntax.

A key requirement of any strategy is to maintain
easy access to the huge universe of existing
C/C++/Fortran numerical and graphics libraries,
as these libraries are not likely to be rewritten.

Thus there will always be a need for a foreign
function interface, and the problem is to provide
a flexible and type-safe language that does not
force developers to use another unfamiliar,
less flexible, and error-prone language to
optimize the hot spots.

Dominick
#
On Tue, Mar 6, 2012 at 3:56 AM, oliver <oliver at first.in-berlin.de> wrote:
OCaml refs are an "escape hatch" from the pure
functional programming paradigm where nothing can
be changed once given a value, an extreme form of
pass-by-value. Similarly, most languages that are
advertised as pass-by-value include some kind of
escape hatch that permits you to work with pointers
(or mutable vectors) for improved runtime performance.

The speed issues arise for two main reasons: interpreting
code is much slower than running machine code, and
copying large data structures can be expensive.
Pass-by-value semantics forces this to happen in
many situations where the compiler/interpreter cannot
safely optimize it away.

Based on the video, Julia manages the speed issue by
viewing everything like a template, thus generating new
methods based on type inference. This means there isn't
a lot of runtime type checking for dispatch, because
customized methods were already generated, but this
can lead to another problem: code bloat. There are
no free lunches.
#
On Wed, Mar 07, 2012 at 10:31:14AM -0500, Dominick Samperi wrote:
OCaml is not a purely functional language and does
not claim to be one; hence references are not an "escape hatch"
(a term that seems to have a negative connotation to me).

Arrays and strings in OCaml are also imperative.
And with the "mutable" attribute in records, you can also
create imperative record entries.

So, it's just a different design / approach than
Haskell for example. OCaml is coming from ML-languages.

Purely Functional on the one hand is beautiful, and 
therefore nice; but it also is dogmatic on the other hand.
References in OCaml are NOT pointers.
You do have access in an imperative / in-place way, but you
have NO POINTER STUFF in that language.

====================================================
# let a = ref 5;;
val a : int ref = {contents = 5}
# a := 7;;
- : unit = ()
# a;;
- : int ref = {contents = 7}
# 
====================================================

This is in-place modification of the contents of the ref,
without any pointer arithmetic.
"a" is a functional value which hosts an imperative one
on the inside.
The functional approach often saves time and space.
This is just not well known.
And the distinction of imperative vs. functional has
nothing to do with interpreted vs. directly executed.


====================================================
# let mylist_1 = [ 3;5;323 ];;
val mylist_1 : int list = [3; 5; 323]
# let mylist_2 = 12 :: mylist_1;;
val mylist_2 : int list = [12; 3; 5; 323]
# mylist_1;;
- : int list = [3; 5; 323]
# mylist_2;;
- : int list = [12; 3; 5; 323]
# 
====================================================

Both lists share the common elements here.
No copy is done.
In this case the functional approach is very nice.

Just a counter-example to "functional is eating up space".

Thinking about the questions here, I believe
the design of OCaml addressed all this, and that this was
the reason for the design decision that arrays can be changed
imperatively.

====================================================
# let my_array = [| 1; 3; 54; 99 |];;
val my_array : int array = [|1; 3; 54; 99|]
# my_array;;
- : int array = [|1; 3; 54; 99|]
# my_array.(2) <- 99999;;
- : unit = ()
# my_array;;
- : int array = [|1; 3; 99999; 99|]
# 
====================================================

If R is rather purely functional here,
then the problem discussed here is that
a purely functional approach without any "escape hatches"
creates the problem.

If in-place modification is also not possible on arrays,
then this is the root of the problem.

But changing this behaviour in newer versions of R
would break a lot of already existing R code.


Ciao,
   Oliver
#
On Tue, Mar 06, 2012 at 12:49:32PM -0500, Dominick Samperi wrote:
You have two names here, x and cx, hence
your example does not fit into what you want to explain.

A better example would be:
x <- runif(n)
x <- cos(x)
[...]

The distinction imperative vs. functional has nothing to do
with the distinction interpreted vs. directly executed.




Thinking again on the problem that was mentioned here,
I think it might be circumvented.

Looking again at R's properties, looking again into U.Ligges "Programmieren in
R", I saw there was mentioned that in R anything (?!) is an object... so then it's
OOP; but also it was mentioned, R is a functional language. But this does not
mean it's purely functional or has no imperative data structures.

As R relies heavily on vectors, here we have an imperative datastructure.

So it rather looks to me as if "<-" works in place
on the vectors, even though "<-" itself is a function (which does not matter for
the problem).

If that's true (I assume here that it is; correct me if it's wrong),
then I think assigning with "<<-" and assign() would also do an imperative
(in-place) change of the contents.

Then the copying-of-big-objects-when-passed-as-args problem can be circumvented
by working either on a variable in the GlobalEnv (using "<<-"), or by using a
certain environment for the big data and passing its name (and the variable)
as a value to the function, which then uses assign() and get() to work on that
data.
Then in-place modification should be possible.
If I hear "type safe" I would rather think of OCaml
or maybe Ada, but not LISP.

Also, LISP has so many "("'s and ")"'s
that it drives people crazy ;-)

Ciao,
   Oliver
#
No, my examples are what I meant.  My point was that a function, say cos(),
can act like it does call-by-value but conserve memory when it can, if it can
distinguish between the case
    cx <- cos(x=runif(n)) # no allocation needed, use the input space for the return value
and the case
   x <- runif(n)
   cx <- cos(x=x) # return value cannot reuse the argument's memory, so allocate space for return value
   sum(x)              # Otherwise sum(x) would return sum(cx)
The function needs to know if a memory block is referred to by a name in any environment
in order to do that.
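
A rough C sketch of that decision, under stated assumptions: the vec type, its named flag, and the functions vec_map and halve are hypothetical names invented for this illustration and are not R's actual internals. The flag stands in for "some name in some environment still refers to this memory block".

```c
#include <assert.h>
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical vector type: 'named' is nonzero when some variable in the
   caller's scope still refers to this storage. */
typedef struct {
    double *data;
    size_t  n;
    int     named;
} vec;

/* Example elementwise function; cos() from <math.h> would fit as well. */
double halve(double v) { return v / 2.0; }

/* Apply f elementwise with value semantics.  An anonymous input
   (named == 0) has its storage reused for the result; a named input is
   left untouched and a fresh buffer is allocated instead.
   Returns 0 on success, -1 if the allocation fails. */
int vec_map(const vec *in, double (*f)(double), vec *out)
{
    assert(in != NULL && f != NULL && out != NULL);
    double *dst = in->data;
    if (in->named) {
        dst = malloc(in->n * sizeof *dst);
        if (dst == NULL && in->n > 0)
            return -1;
    }
    for (size_t i = 0; i < in->n; i++)
        dst[i] = f(in->data[i]);
    out->data  = dst;
    out->n     = in->n;
    out->named = 0;   /* the result is not bound to a name yet */
    return 0;
}
```

In this sketch the writer of vec_map never has to choose between copying and overwriting; the choice follows from how the function was called, which is the property being argued for here.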

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
Hi,

OK, thank you for clarifying what you meant.
You only referred to the reuse of the args,
not of an already existing vector.
So I overgeneralized your example.

But looking at your example,
and at how I would implement cos(),
I doubt I would copy the args
before calculating the result.

Just allocate a result vector, and then place the cos()
of the input vector into the result vector.

I haven't looked at how it is done in R,
but I would guess it's like that.


  In pseudo-Code something like that:
    cos_val[idx] = cos( input_val[idx] );

But R also handles complex data with cos(),
so it will look a bit more laborious.

What I have seen so far from implementing C extensions
for R is rather C-ish, so you have control
over many details. Copying the input just to read it
would not make sense here.

I doubt that R is doing that internally.
Or did you find that in the R code?

The other problem someone mentioned was *changing* the contents
of a matrix... and that this is NOT done in place when using
a function for it.
But using the namespace name / variable name as a "reference" to the matrix
might solve that problem.


Ciao,
  Oliver
On Wed, Mar 07, 2012 at 07:10:43PM +0000, William Dunlap wrote:
#
Ah, and you mean that if it's an anonymous array,
it could be reused directly from the args.

OK, now I see why you insist on the anonymous data thing.
I hadn't grasped it even in my last mail.



But that somehow also relates to what I wrote about reusing an already
existing, named vector.

Only the moment of in-place modification is different.

From
  x  <- runif(n)
  cx <- cos(x)

to something like

  cx  <- runif(n)
  cos( cx, inplace=TRUE)

or

  cos( runif(n), inplace=TRUE)




This way it would be possible to specify the reuse
of the input *explicitly* (without implicit rules
like anonymous vs. named values).



In Pseudo-Code something like that:

   if (in_place == TRUE )
   {
     input_val[idx] = cos( input_val[idx] );
     return input_val;
   }
   else
   {
     result_val = alloc_vec( LENGTH(input_val), ... );
     result_val[idx] = cos( input_val[idx] );
     return result_val;
   }
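
The pseudo-code above could be made concrete in C roughly as follows. This is only a sketch: map_values and twice are names invented for the example, and the elementwise function is passed in as a parameter so that cos() from <math.h> or anything else can be used.

```c
#include <assert.h>
#include <stdlib.h>
#include <stddef.h>

/* Example elementwise function; cos() from <math.h> could be passed too. */
double twice(double v) { return 2.0 * v; }

/* Apply f to every element of input.  With in_place != 0 the input array
   is overwritten and returned; otherwise the result goes into a newly
   malloc'ed array that the caller must free.  Returns NULL if the
   allocation fails. */
double *map_values(double *input, size_t n, double (*f)(double), int in_place)
{
    assert(input != NULL && f != NULL);
    double *result = input;
    if (!in_place) {
        result = malloc(n * sizeof *result);
        if (result == NULL)
            return NULL;
    }
    for (size_t idx = 0; idx < n; idx++)
        result[idx] = f(input[idx]);
    return result;
}
```

With in_place set, the returned pointer is the input itself; without it, the input keeps its values and the caller owns the new array.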



Does this match what you were looking for?


Ciao,
   Oliver
On Thu, Mar 08, 2012 at 02:56:24PM +0100, oliver wrote:
#
So you propose an inplace=TRUE/FALSE entry for each
argument to each function which may want to avoid
allocating memory?  The major problem is that the function
writer has no idea what the value of inplace should be,
as it depends on how the function gets called.  This makes
writing reusable functions (hence packages) difficult.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com