R-devel Digest, Vol 109, Issue 22
12 messages · Peter Dalgaard, Ramon Diaz-Uriarte, Simon Urbanek +2 more
On Thu, 22 Mar 2012 10:38:55 -0400,Simon Urbanek <simon.urbanek at r-project.org> wrote:
On Mar 22, 2012, at 9:45 AM, Terry Therneau <therneau at mayo.edu> wrote:
strongly disagree. I'm appalled to see that sentence here. Come on!
The overhead is significant for any large vector, and it is in particular unnecessary since in .C you have to allocate *and copy* space even for results (twice!). It is also very error-prone: you have no information about the length of the vectors, so it is easy to run out of bounds and there is no way to check. IMHO .C should not be used for any code written in this century (the only exception may be if you are passing no data, e.g. if all you do is pass a flag and expect no result, you can get away with it even if it is more dangerous). It is a legacy interface that dates way back and is essentially just a renamed .Fortran interface. Again, I would strongly recommend the use of .Call in any recent code because it is safer and more efficient (if you don't care about either attribute, well, feel free ;)).
So aleph will not support the .C interface? ;-)
It will look at the timestamp of the source file and delete the package if it is not before 1980 ;). Otherwise it will send a request for punch cards with ".C is deprecated, please upgrade to .Call" stamped out :P At that point I'll be flaming about using the native Aleph interface and not the R compatibility layer ;) Cheers, S
I'll dissent -- I don't think .C is inherently any more dangerous than .Call and prefer its simplicity in many cases. Calling C at all is what is inherently dangerous -- I can reference beyond the end of a vector, write over objects that should be read only, and branch to random places using either interface.
You can always do so deliberately, but with .C you have no way of preventing it since you don't even know what the length is! That is certainly far more dangerous than .Call, where you can simply loop over the length, check that the lengths are compatible etc. Also, for types like strings .C is a minefield that is hard not to blow up, whereas with .Call they are even safer than scalar arrays. You can do none of that with .C, which relies entirely on conventions with no recorded semantics.
If you are dealing with large objects and worry about memory efficiency then .Call puts more tools at your disposal and is worth the effort. However, I did not find the .Call interface at all easy to use at first
I guess this depends on the developer and is certainly a factor. Personally, I find the subset of the R API needed for .Call fairly small and intuitive (in particular when you are just writing a safer replacement for .C), but I'm obviously biased. Maybe in a separate thread we could discuss this - I'd be happy to write a ref card or cheat sheet if I find out what people find challenging on .Call. Nonetheless, my point is that it is more than worth investing the effort both in safety and performance.
After your previous email I made a mental note "try to finally learn to use .Call since I often deal with large objects". So, yes, I'd love to see a ref card and cheat sheet: I have tried learning to use .Call a few times, but have always gone back to .C since (it seems that) all I needed to know was just a couple of conventions, and the rest is "C as usual". You say "if I find out what people find challenging on .Call". Hummm... can I answer "basically everything"? I think Terry Therneau says, "the things I needed to know are scattered about in multiple places". When I see the convolve example (5.2 in Writing R extensions) I understand the C code; when I see the convolve2 example in 5.10.1 I think I can guess what lines "PROTECT(a ..." to "xab = NUMERIC_POINTER ..." might be doing, but I would not know how to do that on my own. Yes, I can go to 5.9.1 to read about PROTECT, then search for ... But, at that point, I've gone back to .C. Of course, this might just be my laziness/incompetence/whatever. Best, R.
and we should keep that in mind before getting too pompous in our lectures to the "sinners of .C". (Mostly because the things I needed to know are scattered about in multiple places.) I might have to ask for an exemption on that timestamp -- the first bits of the survival package only reach back to 1986. And I've had to change source code systems multiple times, which plays hob with the file times, though I did try to preserve the changelog history to forestall some future litigious soul who claims they wrote it first (sccs -> rcs -> cvs -> svn -> mercurial). :-)
;) Maybe the rule should be based on the date of the first appearance of the package, fair enough :)
Cheers, Simon
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain
Phone: +34-91-497-2412
Email: rdiaz02 at gmail.com
ramon.diaz at iib.uam.es
http://ligarto.org/rdiaz
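For readers following the convolve discussion, here is a minimal sketch, in plain C rather than the R API, of what knowing the lengths buys you. The name convolve_core and its return convention are made up for illustration. In a real .Call routine the lengths would be read from the R objects themselves (e.g. with LENGTH()) and the result allocated with allocVector() under PROTECT, whereas a .C routine has to trust whatever lengths the R caller chooses to pass alongside the pointers.

```c
#include <stddef.h>

/* Hypothetical plain-C core of a convolution (not part of the R API).
 * Every buffer comes with its length, so the function can refuse to
 * write past the end of the output instead of silently overrunning it.
 * Returns 0 on success, -1 if an input is empty or the output is too small. */
int convolve_core(const double *a, size_t na,
                  const double *b, size_t nb,
                  double *ab, size_t nab)
{
    /* The full convolution has na + nb - 1 elements. */
    if (na == 0 || nb == 0 || nab < na + nb - 1)
        return -1;

    for (size_t i = 0; i < nab; i++)
        ab[i] = 0.0;

    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            ab[i + j] += a[i] * b[j];

    return 0;
}
```

Because the output length is checked against na + nb - 1 before any write, a mismatched buffer is reported to the caller rather than overrun.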
Don't know how useful it is any more, but back in the days, I gave this talk in Vienna http://www.ci.tuwien.ac.at/Conferences/useR-2004/Keynotes/Dalgaard.pdf Looking at it now, perhaps it moves a little too quickly into the hairy stuff. On the other hand, those were the things that I had found important to figure out at the time. At a quick glance, I didn't spot anything obviously outdated.
Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Peter, I just looked at this, and I'd say that it moved into the hairy stuff way too quickly. Much of what it covered I would never expect to use. Some I didn't understand. Part of this of course is that slides for a talk are rarely very useful without the talker. Something simpler for the layman would be good. Terry T.
Peter, thanks for the slides. However, I felt like Terry did, and I think it is because I am missing the "big picture" that I was somewhat surprised by some of the content and organization (e.g., the detail about character vectors, the usage of the tcltk package as example code). Best, R.
This is my shot at a cheat sheet. Comments are welcome. Simon
[Attachment: R API cheat sheet.pdf (application/pdf, 51661 bytes) <https://stat.ethz.ch/pipermail/r-devel/attachments/20120323/a13f948a/attachment.pdf>]
Awesome. I love the reference card. This will be useful.
But I couldn't resist recasting your final "silly" example into
a) inline use which I find generally easier than having to do R CMD SHLIB
followed by dyn.load()
b) a comparison with Rcpp which looks just about the same minus some
UPPERCASE terms stemming from the R API.
We turn error() into Rf_error() as that is a toggle Rcpp sets (given how
error(), length(), etc conflict at times with things of the same name
coming from somewhere else). Alternatively, we could also use 'throw
std::range_error("invalid n")' which then calls Rf_error for us.
It uses one templated cast to int, but the intent is as readable as
asInteger. You could of course use asInteger too, as Doug, e.g., prefers.
It then declares one list type of the right size. Alloc and all that is
done behind the scenes -- abstraction. And the assignment of the
unaltered SEXP type x that is replicated may just be simpler.
Code is below for both variants, and the same outputs.
Dirk
R> library(inline)
R>
R> replicateToListC <- cfunction(signature(x="any", N="integer"), body='
+ int n = asInteger(N);
+ if (n < 0) error("N must be non-negative");
+ SEXP res = allocVector(VECSXP, n);
+ for (int i = 0; i < n; i++) SET_VECTOR_ELT(res, i, x);
+ return res;
+ ')
R>
R> replicateToListC(1:2, -2)
Error in replicateToListC(1:2, -2) : N must be non-negative
R> replicateToListC(1:2, 2)
[[1]]
[1] 1 2
[[2]]
[1] 1 2
R>
R> replicateToListCpp <- cxxfunction(signature(x="any", N="integer"), plugin="Rcpp", body='
+ int n = as<int>(N);
+ if (n < 0) Rf_error("N must be non-negative");
+ List res(n);
+ for (int i = 0; i < n; i++) res[i] = x;
+ return res;
+ ')
R>
R> replicateToListCpp(1:2, -2)
Error in replicateToListCpp(1:2, -2) : N must be non-negative
R> replicateToListCpp(1:2, 2)
[[1]]
[1] 1 2
[[2]]
[1] 1 2
R>
R>
"Outside of a dog, a book is a man's best friend. Inside of a dog, it is too dark to read." -- Groucho Marx
3 days later
On 03/23/2012 10:58 AM, Simon Urbanek wrote:
This is my shot at a cheat sheet. comments are welcome. Simon
I was looking through the cheat sheet. It's nice. There are a few things in it that I can't find in the documentation though. Where would one find a description? (I can guess, but that may be dangerous).
  mkNamed
  R_NaInt (I don't see quite how this differs from using NA_INTEGER to set a result)
  R_PreserveObject, R_ReleaseObject (Advantages/disadvantages wrt PRESERVE?)
The last is the most interesting to me.
Terry T.
On Mar 27, 2012, at 12:03 PM, Terry Therneau wrote:
On 03/23/2012 10:58 AM, Simon Urbanek wrote:
This is my shot at a cheat sheet. comments are welcome. Simon
I was looking through the cheat sheet. It's nice. There are a few things in it that I can't find in the documentation though. Where would one find a description? (I can guess, but that may be dangerous). mkNamed
It is a shorthand for using allocVector and then setting names (which can be tedious). It's a simple way to create a result list/object (a very common thing to do):
SEXP res = PROTECT(mkNamed(VECSXP, (const char*[]) { "foo", "bar", ""}));
// fill res with SET_VECTOR_ELT(res, ..)
setAttrib(res, R_ClassSymbol, mkString("myClass"));
UNPROTECT(1);
return res;
Note that the sentinel is "" (not NULL as commonly used in other APIs). Also you don't specify the length because it is determined from the names.
R_NaInt (I don't see quite how this differs from using NA_INTEGER to set a result)
It doesn't really -- NA_INTEGER is defined to be R_NaInt. In theory NA_INTEGER being a macro could be a constant instead -- maybe for efficiency -- but currently it's not.
R_PreserveObject, R_ReleaseObject (Advantages/disadvantages wrt PRESERVE?)
I guess you mean wrt PROTECT? Preserve/Release is used for objects that you want to be globally preserved, i.e. they will survive exit from the function. In contrast, the protection stack is popped when you exit the function (whether by error or success). Cheers, Simon
FWIW: I have put the (slightly updated) sheet at http://r.research.att.com/man/R-API-cheat-sheet.pdf Note that it is certainly incomplete, but that is intentional: a) to fit the space constraints and b) to show only the most basic things, since we are talking about starting with .Call -- advanced users may need a different sheet but then they just go straight to the headers anyway ... Cheers, Simon
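On the R_NaInt point above, the practical consequence for C code is that an integer NA is just an ordinary int value that C arithmetic will not treat specially, so the check has to be explicit. A minimal plain-C sketch, assuming the sentinel is INT_MIN (which is how the R headers describe the current value of R_NaInt); MY_NA_INT and sum_int_na are hypothetical names for illustration, and real code would compare against NA_INTEGER from the R headers instead.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for NA_INTEGER; assumes the sentinel is INT_MIN. */
#define MY_NA_INT INT_MIN

/* Sum n ints, returning the NA sentinel if any element is NA.
 * A wider accumulator is used so that a total outside the int range
 * is detected and also reported as NA rather than wrapping around. */
int sum_int_na(const int *x, size_t n)
{
    long long acc = 0;

    for (size_t i = 0; i < n; i++) {
        if (x[i] == MY_NA_INT)
            return MY_NA_INT;   /* NA in, NA out */
        acc += x[i];
    }

    if (acc > INT_MAX || acc <= INT_MIN)
        return MY_NA_INT;       /* total not representable as a non-NA int */

    return (int) acc;
}
```

Without the explicit comparison the sentinel would simply be added in as a very large negative number.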