My purpose in mentioning the Julia language (julialang.org) here is not to start a flame war. I find it to be a very interesting development and others who read this list may want to read about it too. It is still very much early days for this language - about the same stage as R was in 1995 or 1996 when only a few people knew about it - but Julia holds much potential. There is a thread about "R and statistical programming" on groups.google.com/group/julia-dev. As always happens, there is a certain amount of grumbling of the "R IS SOOOO SLOOOOW" flavor but there is also some good discussion regarding features of R (well, S actually) that are central to the language. (Disclaimer: I am one of the participants discussing the importance of data frames and formulas in R.) As for why Julia has attracted so much interest very recently (in the last 10 days or so): as a language it uses multiple dispatch (like S4 methods), with methods compiled on the fly using the LLVM (http://llvm.org) infrastructure. In some ways it achieves the Holy Grail of languages like R, Matlab, NumPy, ... in that it combines the speed of compiled languages with the flexibility of high-level interpreted languages. One of the developers, Jeff Bezanson, gave a seminar about the design of the language at Stanford yesterday, and the video is archived at http://www.stanford.edu/class/ee380/. You don't see John Chambers on camera but I am reasonably certain that a couple of the questions and comments came from him.
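Multiple dispatch, as with S4 methods, picks the method from the run-time types of all arguments rather than only the first. A minimal sketch in C, assuming a hypothetical tagged `Value` type, a hypothetical generic `add()`, and a two-dimensional method table; this only illustrates the idea and is not Julia's actual dispatch mechanism (Julia also compiles specialized code for each argument signature via LLVM).

```c
#include <assert.h>

/* Hypothetical run-time type tags for a tiny dynamically typed language. */
typedef enum { T_INT, T_DBL, T_NTYPES } Tag;

/* A tagged value: the tag records the run-time type of the payload. */
typedef struct {
    Tag tag;
    union { long i; double d; } u;
} Value;

Value make_int(long i) { Value v; v.tag = T_INT; v.u.i = i; return v; }
Value make_dbl(double d) { Value v; v.tag = T_DBL; v.u.d = d; return v; }

double as_dbl(Value v) { return v.tag == T_INT ? (double)v.u.i : v.u.d; }

/* Methods of the generic function add(): int + int stays an integer,
   any combination involving a double is promoted to double. */
Value add_int_int(Value a, Value b) { return make_int(a.u.i + b.u.i); }
Value add_promote(Value a, Value b) { return make_dbl(as_dbl(a) + as_dbl(b)); }

typedef Value (*AddMethod)(Value, Value);

/* Method table indexed by the types of BOTH arguments. */
const AddMethod add_methods[T_NTYPES][T_NTYPES] = {
    [T_INT][T_INT] = add_int_int,
    [T_INT][T_DBL] = add_promote,
    [T_DBL][T_INT] = add_promote,
    [T_DBL][T_DBL] = add_promote,
};

/* The generic function: look up the method from every argument's type. */
Value add(Value a, Value b) {
    assert(a.tag < T_NTYPES && b.tag < T_NTYPES);
    return add_methods[a.tag][b.tag](a, b);
}
```

With single dispatch the table would have one index; here `add(make_int(2), make_dbl(0.5))` and `add(make_dbl(0.5), make_int(2))` are both routed to the promoting method because the choice depends on both tags.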
Julia
34 messages · Jeff Ryan, Douglas Bates, Kjetil Halvorsen +8 more
Doug, Agreed on the interesting point - looks like it has some real promise. I think the spike in interest could be attributable to the Feb 20 tweet by Mike Loukides (an editor at O'Reilly): https://twitter.com/#!/mikeloukides/status/171773229407551488 That is exactly the moment I stumbled upon it. Jeff
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Jeffrey Ryan jeffrey.ryan at lemnica.com www.lemnica.com www.esotericR.com R/Finance 2012: Applied Finance with R www.RinFinance.com See you in Chicago!!!!
On Thu, Mar 1, 2012 at 11:20 AM, Jeffrey Ryan <jeffrey.ryan at lemnica.com> wrote:
Doug, Agreed on the interesting point - looks like it has some real promise. I think the spike in interest could be attributable to Mike Loukides's tweet on Feb 20. (editor at O'Reilly) https://twitter.com/#!/mikeloukides/status/171773229407551488 That is exactly the moment I stumbled upon it.
I think Jeff Bezanson attributes the interest to a blog posting by Viral Shah, another member of the development team, that hit Reddit. He said that, with Viral now in India, it all happened overnight for those in North America and he awoke the next day to find a firestorm of interest. I ran across Julia in the Release Notes of LLVM and mentioned it to Dirk Eddelbuettel who posted about it on Google+ in January. (Dirk, being much younger than I, knows about these new-fangled social media things and I don't.)
Can somebody post a link to the video? I can't find it; searching for "Julia" on the Stanford YouTube channel gives nothing. Kjetil
http://julialang.org/blog Then click on "Stanford Talk Video". Then click on "available here". Ted.
On 01-Mar-2012 Kjetil Halvorsen wrote:
Can somebody post a link to the video? I can't find it; searching for "Julia" on the Stanford YouTube channel gives nothing. Kjetil On Thu, Mar 1, 2012 at 11:37 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at wlandres.net> Date: 01-Mar-2012 Time: 20:47:42 This message was sent by XFMail
On Thu, Mar 01, 2012 at 11:06:51AM -0600, Douglas Bates wrote:
My purpose in mentioning the Julia language (julialang.org) here is not to start a flame war. I find it to be a very interesting development and others who read this list may want to read about it too.
[...] Very interesting language. Thank you for mentioning it here. Compiling from the GitHub sources was easy. I will explore it over the next few days. It seems not to be specific to statistics, but good for math in general. I'm not sure if it would make sense to combine R and Julia in the long run (I mean combining them by providing interfaces between them, calling one from the other, merging code, or using libraries from either side). Ciao, Oliver
I haven't used Julia yet, but from my quick reading of the docs it looks like arguments to functions are passed by reference and not by value, so functions can change their arguments. My recollection from when I first started using S (in the course of a job helping profs and grad students do statistical programming, c. 1983) is that not having to worry about in-place algorithms changing your data gave S a big advantage over Fortran or C. While this feature could slow things down and increase memory use, I felt that it made it easier to write correct code and to use functions that others had written. Does Julia have a const declaration or other means of controlling or documenting that a given function will or will not change the data passed into it? Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com
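The trade-off Bill describes can be made concrete in C, where a function that receives a pointer to the caller's data is free to overwrite it. A minimal sketch with hypothetical function names: an in-place routine next to a wrapper that copies first, which is roughly the guarantee S gives every function, paid for with extra memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* In place: no extra memory, but the caller's data is overwritten. */
void center_in_place(double *x, size_t n) {
    assert(n > 0);
    double mean = 0.0;
    for (size_t i = 0; i < n; i++) mean += x[i];
    mean /= (double)n;
    for (size_t i = 0; i < n; i++) x[i] -= mean;
}

/* Copy first: the caller's data is untouched, at the cost of an extra
   n * sizeof(double) bytes. The caller must free() the result. */
double *center_copy(const double *x, size_t n) {
    double *y = malloc(n * sizeof *y);
    if (y == NULL) return NULL;
    memcpy(y, x, n * sizeof *y);
    center_in_place(y, n);
    return y;
}
```

A caller of `center_in_place` has to know that its vector will change; a caller of `center_copy` does not, which is the "easier to write correct code" side of the argument.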
-----Original Message----- From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of oliver Sent: Friday, March 02, 2012 5:14 PM To: Douglas Bates Cc: R-devel Subject: Re: [Rd] Julia
On Mon, Mar 05, 2012 at 03:53:28PM +0000, William Dunlap wrote:
I haven't used Julia yet, but from my quick reading of the docs it looks like arguments to functions are passed by reference and not by value, so functions can change their arguments. My recollection from when I first started using S (in the course of a job helping profs and grad students do statistical programming, c. 1983) is that not having to worry about in-place algorithms changing your data gave S a big advantage over Fortran or C.
[...] C also uses Call-by-Value. Fortran I don't know in detail.
While this feature could slow things down and increase memory use, I felt that it made it easier to write correct code and to use functions that others had written.
Yes, I also think that call-by-value leads to fewer errors in code. From what I read, Julia is like MATLAB plus more features for programming. Does MATLAB also use only call-by-reference?
Does Julia have a const declaration or other means of controlling or documenting that a given function will or will not change the data passed into it?
I have not explored it in detail so far. Maybe the original poster already did this in more depth? Ciao, Oliver
Hi Oliver,
On 03/05/2012 09:08 AM, oliver wrote:
On Mon, Mar 05, 2012 at 03:53:28PM +0000, William Dunlap wrote:
I haven't used Julia yet, but from my quick reading of the docs it looks like arguments to functions are passed by reference and not by value, so functions can change their arguments. My recollection from when I first started using S (in the course of a job helping profs and grad students do statistical programming, c. 1983) is that not having to worry about in-place algorithms changing your data gave S a big advantage over Fortran or C.
[...] C also uses Call-by-Value.
C *only* uses Call-by-Value. Cheers, H.
Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
On Mon, Mar 05, 2012 at 03:58:59PM -0800, Hervé Pagès wrote:
Hi Oliver, On 03/05/2012 09:08 AM, oliver wrote:
On Mon, Mar 05, 2012 at 03:53:28PM +0000, William Dunlap wrote:
I haven't used Julia yet, but from my quick reading of the docs it looks like arguments to functions are passed by reference and not by value, so functions can change their arguments. My recollection from when I first started using S (in the course of a job helping profs and grad students do statistical programming, c. 1983) is that not having to worry about in-place algorithms changing your data gave S a big advantage over Fortran or C.
[...] C also uses Call-by-Value.
C *only* uses Call-by-Value.
[...] Yes, that's what I meant. By "also" I meant that it uses call-by-value, as some other languages do. Ciao, Oliver
On 12-03-05 6:58 PM, Hervé Pagès wrote:
Hi Oliver, On 03/05/2012 09:08 AM, oliver wrote:
On Mon, Mar 05, 2012 at 03:53:28PM +0000, William Dunlap wrote:
I haven't used Julia yet, but from my quick reading of the docs it looks like arguments to functions are passed by reference and not by value, so functions can change their arguments. My recollection from when I first started using S (in the course of a job helping profs and grad students do statistical programming, c. 1983) is that not having to worry about in-place algorithms changing your data gave S a big advantage over Fortran or C.
[...] C also uses Call-by-Value.
C *only* uses Call-by-Value.
While literally true, the fact that you can't send an array by value, and must send the value of a pointer to it, kind of supports Bill's point: in C, you mostly end up sending arrays by reference. Duncan Murdoch
Yes, C does use call by value, always. However, data arrays are almost always passed via pointers to malloc'ed space, so, effectively, data arrays are passed by reference. (One can put a 'const type*' in the prototype of a function to declare that the data pointed to will not be changed, but it is up to documentation or coding standards to let someone know that data pointed to will likely be changed.) I find R's (& S+'s & S's) copy-on-write-if-not-copying-would-be-discoverable-by-the-user mechanism for giving the illusion of pass-by-value a good way to structure the contract between the function writer and the function user. Does Julia have the tools to let a function writer or user decide whether he really needs to copy its arguments or not? Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com
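Both halves of this exchange fit in a few lines of C: `const` in the prototype declares that the function will not write through the pointer, and the pointer itself is passed by value, so re-pointing the parameter inside the function is invisible to the caller while writing through it is not. A minimal sketch with hypothetical function names.

```c
#include <assert.h>
#include <stddef.h>

/* 'const double *' declares, and the compiler checks, that sum() does
   not write through x: adding  x[0] = 0.0;  here would not compile. */
double sum(const double *x, size_t n) {
    assert(x != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += x[i];
    return s;
}

/* The pointer itself is passed by value. Re-pointing the parameter is
   invisible to the caller; writing through it is not. */
void repoint_then_write(double *x, double *other) {
    x = other;     /* changes only the local copy of the pointer */
    x[0] = -1.0;   /* so this writes into 'other', not the caller's x */
}
```

As Bill notes, the absence of `const` does not tell you a function *will* modify its data, and `const` can be cast away, so it documents intent more than it guarantees behaviour.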
There are many experts on this topic. I'll keep this short. Newer Fortran Languages allow for call by value, but call by reference is the typical and historically, the only approach (there was a time when you could change the value of 1 to 2!). C "only" calls by value except that the value can be a pointer! So, havoc is just a * away. I'm very pleased to be on this list and read the discussion. Thank you Douglas Bates for sending the first message. I like R and will continue to use it. However, I also think that strict "call by value" can get you into trouble, just trouble of a different kind. I'm not sure we will ever yearn for "Julia ouR-Julia", but it is sure fun to think about what might be possible with this language. And having fun is one key objective. Nick Crookston 2012/3/5 oliver <oliver at first.in-berlin.de>:
On Mon, Mar 05, 2012 at 04:54:05PM -0800, Nicholas Crookston wrote:
There are many experts on this topic. I'll keep this short. Newer Fortran Languages allow for call by value, but call by reference is the typical and historically, the only approach (there was a time when you could change the value of 1 to 2!).
Oh, strange.
C "only" calls by value except that the value can be a pointer! So, havoc is just a * away.
[...] For me there was no "havoc" at this point, but for others maybe. There are also other languages that only use call-by-value... ...functional languages are that way in principle. Nevertheless, internally they may heavily use pointers, and even if you have values that are large arrays, for example, they internally just pass a pointer to that data structure. (That's why functional languages are not necessarily slow just because you act on large data and have no references in the language. A common misconception is that functional languages must be slow because they have no references.) The pointer stuff is just hidden. Even where (non-purely) functional languages do have references, their concept of references is different (see OCaml, for example). There you can use references to change values in place, but the reference itself is a functional value, and you never get access to the pointer directly. Hence no problems with pointer arithmetic, dangling pointers, or null pointers. [...]
I like R and will continue to use it. However, I also think that strict "call by value" can get you into trouble, just trouble of a different kind.
Can you elaborate more on this? What problems do you have in mind? And what kind of references do you have in mind? The C-like pointers or something like OCaml's refs?
I'm not sure we will ever yearn for "Julia ouR-Julia", but it is sure fun to think about what might be possible with this language. And having fun is one key objective.
I have fun if things work.
And if the tools do what I want to achieve...
...and the fun is better, if they do it elegantly.
Do you ask for references in R?
And what kind of references do you have in mind,
and why does it hurt you not to have them?
Can you give examples, so that it's easier to see,
where you miss something?
Ciao,
Oliver
P.S.: The speed issue of R has come up more than once;
in some blog posts it was mentioned. Would it make
sense to start a separate thread on it?
In one of the blog articles I read, someone complained about
how NA / missing values were handled, and suggested that NA should
maybe be thrown out, just to get higher speed.
I would not like that. Handling NA as a special
case is IMHO a very good approach. I don't remember if the
article I have in mind just argued about HOW this was
handled, or whether it should be thrown out completely.
Making its handling better and more performant I
think is a good idea; ignoring NA IMHO is a bad idea.
But maybe that really would be worth a separate thread?
On Mon, Mar 05, 2012 at 07:33:10PM -0500, Duncan Murdoch wrote:
On 12-03-05 6:58 PM, Hervé Pagès wrote:
Hi Oliver, On 03/05/2012 09:08 AM, oliver wrote:
On Mon, Mar 05, 2012 at 03:53:28PM +0000, William Dunlap wrote:
I haven't used Julia yet, but from my quick reading of the docs it looks like arguments to functions are passed by reference and not by value, so functions can change their arguments. My recollection from when I first started using S (in the course of a job helping profs and grad students do statistical programming, c. 1983) is that not having to worry about in-place algorithms changing your data gave S a big advantage over Fortran or C.
[...] C also uses Call-by-Value.
C *only* uses Call-by-Value.
While literally true, the fact that you can't send an array by value, and must send the value of a pointer to it, kind of supports Bill's point: in C, you mostly end up sending arrays by reference.
[...] It's a problem of how the term "reference" is used. If you want to limit the possible confusion, it is better to say: passing the pointer by value. Or: passing the address of the array/struct/... by value. Saying that you pass the array by reference is a shorthand that may create confusion; just avoiding the word "reference" here would make it clearer. AFAIK in C++ references are different from pointers. (Others who know C++ well might explain this in detail.) So, using the same term for many different concepts can create a mess in understanding. Ciao, Oliver
On Tue, Mar 06, 2012 at 12:35:32AM +0000, William Dunlap wrote:
[...]
I find R's (& S+'s & S's) copy-on-write-if-not-copying-would-be-discoverable-by-the-user mechanism for giving the illusion of pass-by-value a good way to structure the contract between the function writer and the function user.
[...] Can you elaborate more on this, especially on the ...-...-...-if-not-copying-would-be-discoverable-by-the-user part? What do you mean by discoverability of not-copying? Ciao, Oliver
S (and its derivatives and successors) promises that functions
will not change their arguments, so in an expression like
val <- func(arg)
you know that arg will not be changed. You can
do that by having func copy arg before doing anything,
but that uses space and time that you want to conserve.
If arg is not a named item in any environment then it
should be fine to write over the original because there
is no way the caller can detect that shortcut. E.g., in
cx <- cos(runif(n))
the cos function does not need to allocate new space for
its output, it can just write over its input because, without
a name attached to it, the caller has no way of looking
at what runif(n) returned. If you did
x <- runif(n)
cx <- cos(x)
then cos would have to allocate new space for its output
because overwriting its input would affect a subsequent
sum(x)
I suppose that end-users and function-writers could learn
to live with having to decide when to copy, but not having
to make that decision makes S more pleasant (and safer) to use.
I think that is a major reason that people are able to
share S code so easily.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
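Bill's rule ("overwrite the input only if no name refers to it") can be sketched with a reference count. Everything here is hypothetical and only loosely modelled on the idea; R's real bookkeeping lives in its C internals and differs in detail. An element-wise square stands in for `cos()` so the sketch needs no math library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted numeric vector; 'refs' counts how many
   names (variable bindings) can currently see this object. */
typedef struct {
    int refs;
    size_t n;
    double *data;
} Vec;

Vec *vec_new(size_t n) {
    Vec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->data = malloc(n * sizeof *v->data);
    if (v->data == NULL && n > 0) { free(v); return NULL; }
    v->refs = 0;
    v->n = n;
    return v;
}

void vec_free(Vec *v) {
    if (v != NULL) { free(v->data); free(v); }
}

/* Element-wise square, standing in for cos(). Copy-on-write: if no name
   refers to x, no caller can observe its old contents, so its storage
   is reused; otherwise a new vector is allocated for the result. */
Vec *vec_square(Vec *x) {
    assert(x != NULL);
    Vec *out = x;
    if (x->refs > 0) {
        out = vec_new(x->n);
        if (out == NULL) return NULL;
    }
    for (size_t i = 0; i < x->n; i++)
        out->data[i] = x->data[i] * x->data[i];
    return out;
}
```

An unnamed temporary, as in `cos(runif(n))`, has `refs == 0` and is reused; once it is bound to a name, as in `x <- runif(n); cx <- cos(x)`, `refs > 0` forces a fresh allocation, so a later `sum(x)` still sees the original values.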
-----Original Message----- From: oliver [mailto:oliver at first.in-berlin.de] Sent: Tuesday, March 06, 2012 1:12 AM To: William Dunlap Cc: Hervé Pagès; R-devel Subject: Re: [Rd] Julia
On Tue, Mar 6, 2012 at 11:44 AM, William Dunlap <wdunlap at tibco.com> wrote:
S (and its derivatives and successors) promises that functions will not change their arguments, so in an expression like val <- func(arg) you know that arg will not be changed. You can do that by having func copy arg before doing anything, but that uses space and time that you want to conserve. If arg is not a named item in any environment then it should be fine to write over the original because there is no way the caller can detect that shortcut. E.g., in cx <- cos(runif(n)) the cos function does not need to allocate new space for its output, it can just write over its input because, without a name attached to it, the caller has no way of looking at what runif(n) returned. If you did x <- runif(n) cx <- cos(x) then cos would have to allocate new space for its output because overwriting its input would affect a subsequent sum(x) I suppose that end-users and function-writers could learn to live with having to decide when to copy, but not having to make that decision makes S more pleasant (and safer) to use. I think that is a major reason that people are able to share S code so easily.
But don't forget the "Holy Grail" that Doug mentioned at the start of this thread: finding a flexible language that is also fast. Currently many R packages employ C/C++ components to compensate for the fact that the R interpreter can be slow, and the pass-by-value semantics of S provides no protection here. In 2008 Ross Ihaka and Duncan Temple Lang published the paper "Back to the Future: Lisp as a base for a statistical computing system" where they propose Common Lisp as a new foundation for R. They suggest that this could be done while maintaining the same familiar R syntax. A key requirement of any strategy is to maintain easy access to the huge universe of existing C/C++/Fortran numerical and graphics libraries, as these libraries are not likely to be rewritten. Thus there will always be a need for a foreign function interface, and the problem is to provide a flexible and type-safe language that does not force developers to use another unfamiliar, less flexible, and error-prone language to optimize the hot spots. Dominick
On Tue, Mar 6, 2012 at 3:56 AM, oliver <oliver at first.in-berlin.de> wrote:
On Mon, Mar 05, 2012 at 04:54:05PM -0800, Nicholas Crookston wrote:
There are many experts on this topic. I'll keep this short. Newer Fortran Languages allow for call by value, but call by reference is the typical and historically, the only approach (there was a time when you could change the value of 1 to 2!).
Oh, strange.
C "only" calls by value except that the value can be a pointer! So, havoc is just a * away.
[...] For me there was no "havoc" at this point, but for others maybe. There are also other languages that only use call-by-value... ...functional languages are that way in principle. Nevertheless, internally they may heavily use pointers, and even if you have values that are large arrays, for example, they internally just pass a pointer to that data structure. (That's why functional languages are not necessarily slow just because you act on large data and have no references in the language. A common misconception is that functional languages must be slow because they have no references.) The pointer stuff is just hidden. Even where (non-purely) functional languages do have references, their concept of references is different (see OCaml, for example). There you can use references to change values in place, but the reference itself is a functional value, and you never get access to the pointer directly. Hence no problems with pointer arithmetic, dangling pointers, or null pointers. [...]
I like R and will continue to use it. However, I also think that strict "call by value" can get you into trouble, just trouble of a different kind.
Can you elaborate more on this? What problems do you have in mind? And what kind of references do you have in mind? The C-like pointers or something like OCaml's ref's?
OCaml refs are an "escape hatch" from the pure functional programming paradigm where nothing can be changed once given a value, an extreme form of pass-by-value. Similarly, most languages that are advertised as pass-by-value include some kind of escape hatch that permits you to work with pointers (or mutable vectors) for improved runtime performance. The speed issues arise for two main reasons: interpreting code is much slower than running machine code, and copying large data structures can be expensive. Pass-by-value semantics forces such copies in many situations where the compiler/interpreter cannot safely optimize them away. Based on the video, Julia handles the speed issue by treating everything like a template, generating new specialized methods based on type inference. This means there isn't much runtime type checking for dispatch, because customized methods have already been generated, but this can lead to another problem: code bloat. There are no free lunches.
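The trade-off Dominick describes (specialize per type for speed, pay in code size) has a familiar C analogue: stamping out one function per element type with a macro, much like a C++ template instantiation. A minimal sketch with hypothetical names; it illustrates specialization in general, not how Julia's compiler generates methods.

```c
#include <assert.h>
#include <stddef.h>

/* Stamp out one specialized function per element type. Each expansion is
   separate machine code with no type checks inside the loop, but the
   program grows with every type the routine is instantiated for. */
#define DEFINE_SUM(T, NAME)                       \
    T NAME(const T *x, size_t n) {                \
        T s = 0;                                  \
        for (size_t i = 0; i < n; i++) s += x[i]; \
        return s;                                 \
    }

DEFINE_SUM(int, sum_int)
DEFINE_SUM(long, sum_long)
DEFINE_SUM(double, sum_double)

/* The alternative: one shared copy of the code that must consult the
   element type at run time. */
typedef enum { ELT_INT, ELT_DOUBLE } EltType;

double sum_generic(const void *x, EltType t, size_t n) {
    assert(x != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += (t == ELT_INT) ? ((const int *)x)[i] : ((const double *)x)[i];
    return s;
}
```

Three instantiations mean three copies of the loop in the binary; `sum_generic` has one copy but tests the element type as it goes. A JIT that specializes on argument types makes the first choice automatically, for every type combination it actually encounters.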
I'm not sure we will ever yearn for "Julia ouR-Julia", but it is sure fun to think about what might be possible with this language. And having fun is one key objective.
I have fun if things work, and if the tools do what I want to achieve... and it's even more fun if they do it elegantly. Are you asking for references in R? What kind of references do you have in mind, and why does it hurt you not to have them? Can you give examples, so that it's easier to see where you miss something?
Ciao,
Oliver
P.S.: The speed issue of R has come up more than once; it was mentioned in some blog posts. Would it make sense to start a separate thread about it?
In one of the blog articles I read, the author lamented how NA / missing values are handled, and suggested that NA should maybe be thrown out, just to get higher speed. I would not like that. Handling NA as a special case is, IMHO, a very good approach. I don't remember if the article I have in mind just argued about HOW this was handled, or whether NA should be thrown out completely. Making the handling of NA better and more performant is, I think, a good idea; ignoring NA is, IMHO, a bad idea. But maybe that really would be worth a separate thread?
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
On Wed, Mar 07, 2012 at 10:31:14AM -0500, Dominick Samperi wrote:
On Tue, Mar 6, 2012 at 3:56 AM, oliver <oliver at first.in-berlin.de> wrote:
On Mon, Mar 05, 2012 at 04:54:05PM -0800, Nicholas Crookston wrote:
There are many experts on this topic. I'll keep this short. Newer versions of Fortran allow for call by value, but call by reference is the typical and, historically, the only approach (there was a time when you could change the value of 1 to 2!).
Oh, strange.
C "only" calls by value except that the value can be a pointer! So, havoc is just a * away.
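Here is a minimal C sketch of the "havoc is just a * away" point. The function names are made up for illustration. Every argument is copied. When the copied value is a pointer, though, the callee can still write to the caller's data. An array argument decays to a pointer to its first element, so it behaves the same way.

```c
#include <stddef.h>

/* x is a private copy of the caller's int; changing it is invisible
   to the caller. */
void bump_copy(int x) {
    x += 1;
    (void)x; /* the new value is simply discarded */
}

/* The pointer is copied too, but both copies point at the same
   object, so writing through it changes the caller's variable. */
void bump_through(int *p) {
    *p += 1;
}

/* An array argument is really a pointer to its first element, so the
   callee can overwrite the caller's storage in place. */
void zero_first(double *xs, size_t n) {
    if (n > 0) xs[0] = 0.0;
}
```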
[...] For me there was no "havoc" at this point, but maybe there was for others. There are also other languages that only use call-by-value... functional languages are that way in principle. Nevertheless, internally they may use pointers heavily: even if you have values that are large arrays, for example, internally they just pass a pointer to that data structure. (That's why functional languages are not necessarily slow just because you act on large data and have no references in the language. A common misconception is that functional languages must be slow because they have no references.) The pointer stuff is just hidden. Even where (non-purely) functional languages do have references, their concept of references is different (see OCaml, for example). There you can use references to change values in place, but the reference itself is a functional value, and you never get direct access to the pointer stuff. Hence there are no problems with pointer arithmetic, dangling pointers or null pointers. [...]
I like R and will continue to use it. However, I also think that strict "call by value" can get you into trouble, just trouble of a different kind.
Can you elaborate more on this? What problems do you have in mind? And what kind of references do you have in mind? C-like pointers or something like OCaml's refs?
OCaml refs are an "escape hatch" from the pure functional programming paradigm where nothing can be changed once given a value, an extreme form of pass-by-value.
OCaml is not a purely functional language and does not claim to be one; hence it's not an "escape hatch" (a phrase that sounds negative to me). Arrays and strings in OCaml are also imperative, and with the "mutable" attribute in records you can also create imperative record fields. So it's just a different design / approach than, for example, Haskell's. OCaml comes from the ML family of languages. Being purely functional is beautiful, and therefore nice, on the one hand; but it is also dogmatic on the other.
Similarly, most languages that are advertised as pass-by-value include some kind of escape hatch that permits you to work with pointers (or mutable vectors) for improved runtime performance.
References in OCaml are NOT pointers.
You do have access in an imperative / in-place way, but you
have NO POINTER STUFF in that language.
====================================================
# let a = ref 5;;
val a : int ref = {contents = 5}
# a := 7;;
- : unit = ()
# a;;
- : int ref = {contents = 7}
#
====================================================
This is in-place modification of the contents of the ref,
without any pointer arithmetic.
"a" is a functional value which hosts an imperative one
on the inside.
The speed issues arise for two main reasons: interpreting code is much slower than running machine code, and copying large data structures can be expensive.
The functional approach often saves time and space; this is just not well known. And the distinction imperative vs. functional has nothing to do with interpreted vs. directly executed.
====================================================
# let mylist_1 = [ 3;5;323 ];;
val mylist_1 : int list = [3; 5; 323]
# let mylist_2 = 12 :: mylist_1;;
val mylist_2 : int list = [12; 3; 5; 323]
# mylist_1;;
- : int list = [3; 5; 323]
# mylist_2;;
- : int list = [12; 3; 5; 323]
#
====================================================
Both lists share the common elements here; no copy is made. In this case the functional approach is very nice. It's just a counter-example to "functional eats up space".
Thinking about the questions here, I think the design of OCaml addressed all this, and that this is why the design allows arrays to be changed imperatively.
====================================================
# let my_array = [| 1; 3; 54; 99 |];;
val my_array : int array = [|1; 3; 54; 99|]
# my_array;;
- : int array = [|1; 3; 54; 99|]
# my_array.(2) <- 99999;;
- : unit = ()
# my_array;;
- : int array = [|1; 3; 99999; 99|]
#
====================================================
If R is rather purely functional here, then the problem being discussed is that a purely functional approach without any "escape hatches" creates it. If in-place modification is not possible even on arrays, then that is the root of the problem. But changing this behaviour in newer versions of R would break a lot of existing R code.
Ciao,
Oliver
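The sharing in the mylist_2 example can be sketched in C with immutable cons cells. The type `Cell` and the functions `cons` and `list_length` are hypothetical, and cells are never freed, to keep the sketch short. Prepending allocates exactly one new cell whose tail points at the existing list, so nothing in the old list is copied.

```c
#include <stdlib.h>

/* An immutable cons cell: once built, neither field is changed. */
typedef struct Cell {
    int head;
    const struct Cell *tail;
} Cell;

/* Analogue of OCaml's `head :: tail`: allocate ONE new cell that
   points at the existing list; the old cells are shared, not copied. */
const Cell *cons(int head, const Cell *tail) {
    Cell *c = malloc(sizeof *c);
    if (c == NULL) abort();
    c->head = head;
    c->tail = tail;
    return c;
}

/* Walk the list to count its cells (NULL is the empty list). */
size_t list_length(const Cell *xs) {
    size_t n = 0;
    for (; xs != NULL; xs = xs->tail) n++;
    return n;
}
```

With `l1 = [3; 5; 323]` and `l2 = cons(12, l1)`, the tail of `l2` is the very same `l1`. Both lists stay valid because neither is ever mutated.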
On Tue, Mar 06, 2012 at 12:49:32PM -0500, Dominick Samperi wrote:
On Tue, Mar 6, 2012 at 11:44 AM, William Dunlap <wdunlap at tibco.com> wrote:
S (and its derivatives and successors) promises that functions will not change their arguments, so in an expression like
val <- func(arg)
you know that arg will not be changed. You can do that by having func copy arg before doing anything, but that uses space and time that you want to conserve. If arg is not a named item in any environment then it should be fine to write over the original because there is no way the caller can detect that shortcut. E.g., in
cx <- cos(runif(n))
the cos function does not need to allocate new space for its output, it can just write over its input because, without a name attached to it, the caller has no way of looking at what runif(n) returned. If you did
x <- runif(n)
cx <- cos(x)
You have two names here, x and cx, hence your example does not fit what you want to explain. A better example would be:
x <- runif(n)
x <- cos(x)
then cos would have to allocate new space for its output because overwriting its input would affect a subsequent
sum(x)
I suppose that end-users and function-writers could learn to live with having to decide when to copy, but not having to make that decision makes S more pleasant (and safer) to use. I think that is a major reason that people are able to share S code so easily.
But don't forget the "Holy Grail" that Doug mentioned at the start of this thread: finding a flexible language that is also fast. Currently many R packages employ C/C++ components to compensate for the fact that the R interpreter can be slow, and the pass-by-value semantics of S provides no protection here.
[...] The distinction imperative vs. functional has nothing to do with the distinction interpreted vs. directly executed. Thinking again about the problem that was mentioned here, I think it might be circumvented. Looking again at R's properties, and looking again into U. Ligges' "Programmieren in R", I saw it mentioned that in R everything (?!) is an object... so then it's OOP; but it was also mentioned that R is a functional language. That does not mean it's purely functional or has no imperative data structures. As R relies heavily on vectors, here we have an imperative data structure. So it rather looks to me as if "<-" does work in place on the vectors, even though "<-" itself is a function (which does not matter for the problem). If that's true (I assume here it is; correct me if it's wrong), then I think assigning with "<<-" and assign() would also make an imperative (in-place) change of the contents. Then the copying-of-big-objects-when-passed-as-args problem can be circumvented either by working on a variable in the GlobalEnv (and using "<<-"), or by using a certain environment for the big data and passing its name (and the variable's name) as values to the function, which then uses assign() and get() to work on that data. Then in-place modification should be possible.
In 2008 Ross Ihaka and Duncan Temple Lang published the paper "Back to the Future: Lisp as a base for a statistical computing system" where they propose Common Lisp as a new foundation for R. They suggest that this could be done while maintaining the same familiar R syntax. A key requirement of any strategy is to maintain easy access to the huge universe of existing C/C++/Fortran numerical and graphics libraries, as these libraries are not likely to be rewritten. Thus there will always be a need for a foreign function interface, and the problem is to provide a flexible and type-safe language that does not force developers to use another unfamiliar, less flexible, and error-prone language to optimize the hot spots.
If I hear "type safe", I would rather think of OCaml
or maybe Ada, but not LISP.
Also, LISP has so many "("'s and ")"'s
that it drives people crazy ;-)
Ciao,
Oliver
No, my examples are what I meant. My point was that a function, say cos(),
can act as if it uses call-by-value but still conserve memory, if it can
distinguish between the case
cx <- cos(x=runif(n)) # no allocation needed, use the input space for the return value
and the case
x <- runif(n)
cx <- cos(x=x) # return value cannot reuse the argument's memory, so allocate space for return value
sum(x) # Otherwise sum(x) would return sum(cx)
The function needs to know if a memory block is referred to by a name in any environment
in order to do that.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
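Here is a minimal C sketch of the rule Bill describes, under stated assumptions. The `nnames` field is a hypothetical stand-in for however an interpreter records whether any variable is bound to a memory block; it is not R's internal API. `Vec`, `vec_new` and `vec_map` are likewise illustrative names. `square` is only a placeholder element function so the sketch needs no math library; passing cos from <math.h> (and linking with -lm) would work the same way.

```c
#include <stdlib.h>

/* Hypothetical vector object. `nnames` counts how many variables (in
   any environment) are bound to this block. */
typedef struct {
    size_t len;
    int nnames;
    double *data;
} Vec;

Vec *vec_new(size_t len) {
    Vec *v = malloc(sizeof *v);
    if (v == NULL) abort();
    v->len = len;
    v->nnames = 0;
    v->data = malloc(len * sizeof *v->data);
    if (v->data == NULL && len > 0) abort();
    return v;
}

/* Element-wise f() that still looks like call-by-value to the caller:
   an unnamed input cannot be observed after the call, so its storage
   is reused; a named input is left untouched and a fresh result is
   allocated. */
Vec *vec_map(Vec *x, double (*f)(double)) {
    Vec *out = (x->nnames == 0) ? x : vec_new(x->len);
    for (size_t i = 0; i < x->len; i++)
        out->data[i] = f(x->data[i]);
    return out;
}

/* Placeholder for cos(). */
double square(double v) { return v * v; }
```

With `nnames == 0` (the cos(runif(n)) case), the input block comes back overwritten. With `nnames > 0` (the x <- runif(n); cx <- cos(x) case), a new block is allocated and x keeps its values, so a later sum(x) is unaffected.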
-----Original Message----- From: oliver [mailto:oliver at first.in-berlin.de] Sent: Wednesday, March 07, 2012 10:22 AM To: Dominick Samperi Cc: William Dunlap; R-devel Subject: Re: [Rd] Julia On Tue, Mar 06, 2012 at 12:49:32PM -0500, Dominick Samperi wrote:
On Tue, Mar 6, 2012 at 11:44 AM, William Dunlap <wdunlap at tibco.com> wrote:
S (and its derivatives and successors) promises that functions will not change their arguments, so in an expression like
val <- func(arg)
you know that arg will not be changed. You can do that by having func copy arg before doing anything, but that uses space and time that you want to conserve. If arg is not a named item in any environment then it should be fine to write over the original because there is no way the caller can detect that shortcut. E.g., in
cx <- cos(runif(n))
the cos function does not need to allocate new space for its output, it can just write over its input because, without a name attached to it, the caller has no way of looking at what runif(n) returned. If you did
x <- runif(n)
cx <- cos(x)
You have two names here, x and cx, hence your example does not fit what you want to explain. A better example would be:
x <- runif(n)
x <- cos(x)
then cos would have to allocate new space for its output because overwriting its input would affect a subsequent
sum(x)
I suppose that end-users and function-writers could learn to live with having to decide when to copy, but not having to make that decision makes S more pleasant (and safer) to use. I think that is a major reason that people are able to share S code so easily.
But don't forget the "Holy Grail" that Doug mentioned at the start of this thread: finding a flexible language that is also fast. Currently many R packages employ C/C++ components to compensate for the fact that the R interpreter can be slow, and the pass-by-value semantics of S provides no protection here.
[...] The distinction imperative vs. functional has nothing to do with the distinction interpreted vs. directly executed. Thinking again about the problem that was mentioned here, I think it might be circumvented. Looking again at R's properties, and looking again into U. Ligges' "Programmieren in R", I saw it mentioned that in R everything (?!) is an object... so then it's OOP; but it was also mentioned that R is a functional language. That does not mean it's purely functional or has no imperative data structures. As R relies heavily on vectors, here we have an imperative data structure. So it rather looks to me as if "<-" does work in place on the vectors, even though "<-" itself is a function (which does not matter for the problem). If that's true (I assume here it is; correct me if it's wrong), then I think assigning with "<<-" and assign() would also make an imperative (in-place) change of the contents. Then the copying-of-big-objects-when-passed-as-args problem can be circumvented either by working on a variable in the GlobalEnv (and using "<<-"), or by using a certain environment for the big data and passing its name (and the variable's name) as values to the function, which then uses assign() and get() to work on that data. Then in-place modification should be possible.
In 2008 Ross Ihaka and Duncan Temple Lang published the paper "Back to the Future: Lisp as a base for a statistical computing system" where they propose Common Lisp as a new foundation for R. They suggest that this could be done while maintaining the same familiar R syntax. A key requirement of any strategy is to maintain easy access to the huge universe of existing C/C++/Fortran numerical and graphics libraries, as these libraries are not likely to be rewritten. Thus there will always be a need for a foreign function interface, and the problem is to provide a flexible and type-safe language that does not force developers to use another unfamiliar, less flexible, and error-prone language to optimize the hot spots.
If I hear "type safe", I would rather think of OCaml or maybe Ada, but not
LISP.
Also, LISP has so many "("'s and ")"'s
that it drives people crazy ;-)
Ciao,
Oliver
Hi,
ok, thank you for clarifying what you meant.
You only referred to the reuse of the args,
not of an already existing vector.
So I overgeneralized your example.
But looking at your example,
and at how I would implement cos(),
I doubt I would copy the args
before calculating the result.
Just allocate a result vector, and then place the cos()
of the input vector into the result vector.
I haven't looked at how it is done in R,
but I would guess it's like that.
In pseudo-code, something like this:
cos_val[idx] = cos( input_val[idx] );
But R also handles complex data with cos(),
so it will look a bit more laborious.
What I have seen so far from implementing C extensions
for R is rather C-ish, and so you have control
over many details. Copying the input just to read it
would not make sense here.
I doubt that R internally is doing that.
Or did you find that in the R code?
The other problem someone mentioned was *changing* the contents
of a matrix... and that this is NOT done in place when using
a function for it.
But the namespace-name / variable-name as "references" to the matrix
might solve that problem.
Ciao,
Oliver
On Wed, Mar 07, 2012 at 07:10:43PM +0000, William Dunlap wrote:
No, my examples are what I meant. My point was that a function, say cos(),
can act as if it uses call-by-value but still conserve memory, if it can
distinguish between the case
cx <- cos(x=runif(n)) # no allocation needed, use the input space for the return value
and the case
x <- runif(n)
cx <- cos(x=x) # return value cannot reuse the argument's memory, so allocate space for return value
sum(x) # Otherwise sum(x) would return sum(cx)
The function needs to know if a memory block is referred to by a name in any environment
in order to do that.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
Ah, so you mean that if it's an anonymous array, it could be reused directly from the args. OK, now I see why you insist on the anonymous-data thing; I hadn't grasped it even in my last mail. But that also relates to what I wrote about reusing an already existing, named vector; only the moment of the in-place modification is different. So, from
x <- runif(n)
cx <- cos(x)
instead of
cx <- cos(x=runif(n)) # no allocation needed, use the input space for the return value
to something like
cx <- runif(n)
cos( cx, inplace=TRUE)
or
cos( runif(n), inplace=TRUE)
This way it would be possible to specify the reuse
of the input *explicitly* (without implicit rules
like anonymous vs. named values).
In pseudo-code, something like this:
if (in_place == TRUE )
{
input_val[idx] = cos( input_val[idx] );
return input_val;
}
else
{
result_val = alloc_vec( LENGTH(input_val), ... );
result_val[idx] = cos( input_val[idx] );
return result_val;
}
Is this what you were looking for?
Ciao,
Oliver
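The following is a runnable C sketch of the explicit-flag variant in the pseudo-code above. It works on a plain array rather than an R vector. `map_vec` and its `in_place` flag are hypothetical, and so is the placeholder `square`, which stands in for cos() so that no math library is needed. The sketch also adds the element loop that the pseudo-code leaves implicit in `idx`.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Apply f element-wise. The CALLER decides whether the input may be
   overwritten: with in_place the input array itself receives the
   results; otherwise a new array is allocated (caller must free it). */
double *map_vec(double *input, size_t n, bool in_place,
                double (*f)(double)) {
    double *result = in_place ? input : malloc(n * sizeof *result);
    if (result == NULL && n > 0) abort();
    for (size_t i = 0; i < n; i++)
        result[i] = f(input[i]);
    return result;
}

/* Placeholder for cos(). */
double square(double v) { return v * v; }
```

With `in_place` false the caller's array is untouched and a fresh one is returned. With `in_place` true the same pointer comes back with its contents overwritten.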
On Thu, Mar 08, 2012 at 02:56:24PM +0100, oliver wrote:
Hi,
ok, thank you for clarifying what you meant.
You only referred to the reuse of the args,
not of an already existing vector.
So I overgeneralized your example.
But looking at your example,
and at how I would implement cos(),
I doubt I would copy the args
before calculating the result.
Just allocate a result vector, and then place the cos()
of the input vector into the result vector.
I haven't looked at how it is done in R,
but I would guess it's like that.
In pseudo-code, something like this:
cos_val[idx] = cos( input_val[idx] );
But R also handles complex data with cos(),
so it will look a bit more laborious.
What I have seen so far from implementing C extensions
for R is rather C-ish, and so you have control
over many details. Copying the input just to read it
would not make sense here.
I doubt that R internally is doing that.
Or did you find that in the R code?
The other problem someone mentioned was *changing* the contents
of a matrix... and that this is NOT done in place when using
a function for it.
But the namespace-name / variable-name as "references" to the matrix
might solve that problem.
Ciao,
Oliver
So you propose an inplace=TRUE/FALSE entry for each argument to each function which may want to avoid allocating memory? The major problem is that the function writer has no idea what the value of inplace should be, as it depends on how the function gets called. This makes writing reusable functions (hence packages) difficult.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message----- From: oliver [mailto:oliver at first.in-berlin.de] Sent: Thursday, March 08, 2012 7:40 AM To: William Dunlap Cc: R-devel Subject: Re: [Rd] Julia
Ah, so you mean that if it's an anonymous array, it could be reused directly from the args. OK, now I see why you insist on the anonymous-data thing; I hadn't grasped it even in my last mail. But that also relates to what I wrote about reusing an already existing, named vector; only the moment of the in-place modification is different. So, from
x <- runif(n)
cx <- cos(x)
instead of
cx <- cos(x=runif(n)) # no allocation needed, use the input space for the return value
to something like
cx <- runif(n)
cos( cx, inplace=TRUE)
or
cos( runif(n), inplace=TRUE)
This way it would be possible to specify the reuse of the input *explicitly*
(without implicit rules like anonymous vs. named values).
In pseudo-code, something like this:
if (in_place == TRUE )
{
input_val[idx] = cos( input_val[idx] );
return input_val;
}
else
{
result_val = alloc_vec( LENGTH(input_val), ... );
result_val[idx] = cos( input_val[idx] );
return result_val;
}
Is this what you were looking for?
Ciao,
Oliver
On Thu, Mar 08, 2012 at 02:56:24PM +0100, oliver wrote:
Hi,
ok, thank you for clarifiying what you meant.
You only referred to the reusage of the args, not of an already
existing vector.
So I overgenerealized your example.
But when looking at your example,
and how I would implement the cos()
I doubt I would use copying the args
before calculating the result.
Just allocate a result-vector, and then place the cos() of the
input-vector into the result vector.
I didn't looked at how it is done in R, but I would guess it's like
that.
In pseudo-Code something like that:
cos_val[idx] = cos( input_val[idx] );
But R also handles complex data with cos() so it will look a bit more
laborious.
What I have seen so far from implementing C-extensions for R is rather
C-ish, and so you have the control on many details. Copying the input
just to read it would not make sense here.
I doubt that R internally is doing that.
Or did you found that in the R-code?
The other problem, someone mentioned, was *changing* the contents of a
matrix... and that this is NO>T done in-place, when using a function
for it.
But the namespace-name / variable-name as "references" to the matrix
might solve that problem.
Ciao,
Oliver
On Wed, Mar 07, 2012 at 07:10:43PM +0000, William Dunlap wrote:
No my examples are what I meant. My point was that a function, say
cos(), can act like it does call-by-value but conserve memory when
it can if it can distinguish between the case
cx <- cos(x=runif(n)) # no allocation needed, use the input
space for the return value and and the case
x <- runif(n)
cx <- cos(x=x) # return value cannot reuse the argument's memory, so
allocate space for return value
sum(x) # Otherwise sum(x) would return sum(cx) The function needs to know if a memory block is referred to by a name in any environment in order to do that. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com
-----Original Message----- From: oliver [mailto:oliver at first.in-berlin.de] Sent: Wednesday, March 07, 2012 10:22 AM To: Dominick Samperi Cc: William Dunlap; R-devel Subject: Re: [Rd] Julia On Tue, Mar 06, 2012 at 12:49:32PM -0500, Dominick Samperi wrote:
On Tue, Mar 6, 2012 at 11:44 AM, William Dunlap <wdunlap at tibco.com>
wrote:
S (and its derivatives and successors) promises that functions will not change their arguments, so in an expression like ? val <- func(arg) you know that arg will not be changed. ?You can do that by having func copy arg before doing anything, but that uses space and time that you want to conserve. If arg is not a named item in any environment then it should be fine to write over the original because there is no way the caller can detect that shortcut. ?E.g., in ? ?cx <- cos(runif(n)) the cos function does not need to allocate new space for its output, it can just write over its input because, without a name attached to it, the caller has no way of looking at what runif(n) returned. ?If you did ? ?x <- runif(n) ? ?cx <- cos(x)
You have two names here, x and cx, hence your example does not fit what you want to explain. A better example would be:

  x <- runif(n)
  x <- cos(x)
then cos would have to allocate new space for its output, because overwriting its input would affect a subsequent

  sum(x)

I suppose that end-users and function-writers could learn to live with having to decide when to copy, but not having to make that decision makes S more pleasant (and safer) to use. I think that is a major reason that people are able to share S code so easily.
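As a minimal sketch of the call-by-value guarantee described above (assuming a standard R session; the function name `scale_first` is hypothetical, not from the thread), a function that modifies its argument only ever changes its own copy, so the caller's vector is untouched:

```r
# Hypothetical helper: modifies its argument inside the function body
scale_first <- function(v) {
  # The caller's x still refers to this vector, so R duplicates it
  # before writing, and the change stays local to the function
  v[1] <- v[1] * 100
  v
}

x <- c(1, 2, 3)
y <- scale_first(x)
print(x)  # the caller's vector is unchanged
print(y)  # only the returned copy carries the modification
```

This is the "not having to decide when to copy" property: the duplication happens behind the scenes, and only when the vector is still reachable through another name.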
But don't forget the "Holy Grail" that Doug mentioned at the start of this thread: finding a flexible language that is also fast. Currently many R packages employ C/C++ components to compensate for the fact that the R interpreter can be slow, and the pass-by-value semantics of S provides no protection here.
[...] The distinction imperative vs. functional has nothing to do with the distinction interpreted vs. directly executed. Thinking again about the problem that was mentioned here, I think it might be circumvented. Looking again at R's properties, and looking again into U. Ligges' "Programmieren in R", I saw it mentioned that in R everything (?!) is an object... so then it's OOP; but it was also mentioned that R is a functional language. But this does not mean it is purely functional or has no imperative data structures.
As R relies heavily on vectors, here we have an imperative data structure. So it rather looks to me as if "<-" does work in-place on the vectors, even though "<-" itself is a function (which does not matter for the problem). If that's true (I assume it is; correct me if I'm wrong), then I think assigning with "<<-" and assign() would also make an imperative (in-place) change to the contents. Then the copying-of-big-objects-when-passed-as-args problem can be circumvented either by working on a variable in the GlobalEnv (and using "<<-"), or by using a dedicated environment for the big data and passing its name (and the variable name) as values to the function, which then uses assign() and get() to work on that data. Then in-place modification should be possible.
In 2008 Ross Ihaka and Duncan Temple Lang published the paper "Back to the Future: Lisp as a base for a statistical computing system", where they propose Common Lisp as a new foundation for R. They suggest that this could be done while maintaining the same familiar R syntax.
A key requirement of any strategy is to maintain easy access to the huge universe of existing C/C++/Fortran numerical and graphics libraries, as these libraries are not likely to be rewritten. Thus there will always be a need for a foreign function interface, and the problem is to provide a flexible and type-safe language that does not force developers to use another unfamiliar, less flexible, and error-prone language to optimize the hot spots.
If I hear "type safe", I would rather think of OCaml or maybe
Ada, but not LISP.
Also, LISP has so many "("s and ")"s that it drives people
crazy ;-)
Ciao,
Oliver
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel