R vs. C

21 messages · Patrick Leyshock, Duncan Murdoch, Dirk Eddelbuettel +7 more

#
Everyone has their own utility
function.  Mine is if the boredom
of waiting for the pure R function
to finish is going to outweigh the
boredom of writing the C code.

Another issue is that adding C code
increases the hassle of users who might
want the code to run on different
architectures.
On 17/01/2011 17:13, Patrick Leyshock wrote:

  
    
#
On 17/01/2011 12:41 PM, Patrick Burns wrote:
... and also makes it harder for you and your users to tweak your code 
for different uses.

It is not uncommon for C code to run 100 times faster than R code (but 
it is also not uncommon to see very little speedup, if the R code is 
well vectorized).  So if you have something that's really slow, think 
about the fundamental operations, and write those in C, then use R code 
to glue them together.  But if it is fast enough without doing that, 
then leave it all in R.
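
As a rough illustration of the vectorization point, the sketch below computes the same quantity with an explicit loop and with a vectorized expression, then times both. The function names `ssd_loop` and `ssd_vec` are made up for this example, and the timings will vary by machine and R version.

```r
# Hypothetical example: sum of squared differences, written two ways.
ssd_loop <- function(x, y) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + (x[i] - y[i])^2
  }
  total
}

# Vectorized version: the loop happens inside R's compiled internals.
ssd_vec <- function(x, y) sum((x - y)^2)

set.seed(1)
x <- rnorm(1e5)
y <- rnorm(1e5)

# Both versions should agree up to floating-point rounding.
stopifnot(isTRUE(all.equal(ssd_loop(x, y), ssd_vec(x, y))))

# Compare elapsed times; the numbers depend on the machine and R version.
print(system.time(for (k in 1:20) ssd_loop(x, y)))
print(system.time(for (k in 1:20) ssd_vec(x, y)))
```

If the vectorized version is already fast enough, there may be little left for a C rewrite to gain.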

Duncan Murdoch
#
On 17 January 2011 at 09:13, Patrick Leyshock wrote:
| A question, please about development of R packages:
| 
| Are there any guidelines or best practices for deciding when and why to
| implement an operation in R, vs. implementing it in C?  The "Writing R
| Extensions" recommends "working in interpreted R code . . . this is normally
| the best option."  But we do write C-functions and access them in R - the
| question is, when/why is this justified, and when/why is it NOT justified?
| 
| While I have identified helpful documents on R coding standards, I have not
| seen notes/discussions on when/why to implement in R, vs. when to implement
| in C.

The (still fairly recent) book 'Software for Data Analysis: Programming with
R' by John Chambers (Springer, 2008) has a lot to say about this.  John also
gave a talk in November which stressed 'multilanguage' approaches; see e.g.
http://blog.revolutionanalytics.com/2010/11/john-chambers-on-r-and-multilingualism.html

In short, it all depends, and it is unlikely that you will get a coherent
answer that is valid for all circumstances.  We all love R for how expressive
and powerful it is, yet there are times when something else is called for.
Exactly when that time is depends on a great many things and you have not
mentioned a single metric in your question.  So I'd start with John's book.

Hope this helps, Dirk
#
I think we're also forgetting something, namely testing.  If you write your 
routine in C, you have placed additional burden upon yourself to test your C 
code through unit tests, etc.  If you write your code in R, you still need the 
unit tests, but you can rely on the well tested nature of R to allow you to 
reduce the number of tests of your algorithm.  I routinely tell people at Sage 
Bionetworks where I am working now that your new C code needs to experience at 
least one order of magnitude increase in performance to warrant the effort of 
moving from R to C.

But, then again, I am working with scientists who are not primarily, or even 
secondarily, coders...

Dave H



#
Another point I have not yet seen mentioned:  If your code is 
painfully slow, that can often be fixed without leaving R by 
experimenting with different ways of doing the same thing -- often after 
profiling your code to find the slowest part, as described in 
chapter 3 of "Writing R Extensions".
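
A minimal profiling sketch, assuming an R build with profiling support (the default on most platforms), might look like the following. The function `slow_fun` is invented for illustration; `Rprof` and `summaryRprof` are the standard tools from the utils package.

```r
# Hypothetical slow function to profile.
slow_fun <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, sqrt(i))  # grows the vector each step
  out
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)   # start the sampling profiler
res <- slow_fun(30000)
Rprof(NULL)        # stop profiling

# Summarise time by function. On a very fast machine the run may be too
# short to collect samples, so the summary is guarded against that case.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))
```

Here the summary would be expected to point at the repeated `c()` calls, which suggests preallocating or vectorizing before considering C.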


       If I'm given code already written in C (or some other language), 
unless it's really simple, I may link to it rather than recode it in R.  
However, the problems with portability, maintainability, transparency to 
others who may not be very facile with C, etc., all suggest that it's 
well worth some effort experimenting with alternate ways of doing the 
same thing in R before jumping to C or something else.


       Hope this helps.
       Spencer
On 1/17/2011 10:57 AM, David Henderson wrote:
#
On Mon, Jan 17, 2011 at 6:57 PM, David Henderson <dnadavewa at yahoo.com> wrote:
If you write your code in C but interface to it in R, you can use the
same R test harness system. I recently coded something up in R, tested
it on small data, discovered it was waaay too slow on the real data,
rewrote the likelihood calculation in C, and then used the same test
set to make sure it was giving the same answers as the R code. It
wasn't. So I fixed that bug until it was. If I'd written the thing in
C to start with I might not have spotted it.
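
The workflow described above can be sketched roughly as follows. The function names and test cases are invented, and `loglik_fast` is only a plain-R stand-in for what would be a wrapper around compiled code in a real package, so that the sketch runs without a compiler.

```r
# Reference implementation: slow but easy to read (Gaussian log-likelihood).
loglik_ref <- function(x, mu, sigma) {
  total <- 0
  for (xi in x) total <- total + dnorm(xi, mean = mu, sd = sigma, log = TRUE)
  total
}

# Stand-in for the fast version. In a real package this would be a thin R
# wrapper around compiled code (e.g. via .Call); here it is plain R.
loglik_fast <- function(x, mu, sigma) {
  n <- length(x)
  -n * log(sigma) - 0.5 * n * log(2 * pi) - sum((x - mu)^2) / (2 * sigma^2)
}

# One shared set of test cases, run against both implementations.
set.seed(42)
cases <- list(
  list(x = rnorm(10), mu = 0, sigma = 1),
  list(x = rnorm(50, 3, 2), mu = 3, sigma = 2),
  list(x = c(-1, 0, 1), mu = 0.5, sigma = 0.1)
)

for (case in cases) {
  ref  <- do.call(loglik_ref, case)
  fast <- do.call(loglik_fast, case)
  stopifnot(isTRUE(all.equal(ref, fast)))  # stops at the first disagreement
}
```

Keeping the slow reference around means the same cases can be rerun whenever the fast version changes.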

 Sometimes writing a prototype in R is a useful testing tool even when
you know it'll be too slow - as an interpreted language R gives you a
rapid development cycle and handy interactive debugging possibilities.
Things that do exist in C but require compilation....

Barry
#
For me, a major strength of R is the package development 
process.  I've found this so valuable that I created a Wikipedia entry 
by that name and made additions to a Wikipedia entry on "software 
repository", noting that this process encourages good software 
development practices that I have not seen standardized for other 
languages.  I encourage people to review this material and make 
additions or corrections as they like (or send me suggestions for me to 
make appropriate changes).


       While R has other capabilities for unit and regression testing, I 
often include unit tests in the "examples" section of documentation 
files.  To keep from cluttering the examples with unnecessary material, 
I often include something like the following:


A1 <- myfunc() # to test myfunc

A0 <- ("manual generation of the correct  answer for A1")

\dontshow{stopifnot(} # so the user doesn't see "stopifnot("
all.equal(A1, A0) # compare myfunc output with the correct answer
\dontshow{)} # close paren on "stopifnot(".


       This may not be as good in some ways as a full suite of unit 
tests, which could be provided separately.  However, this has the 
distinct advantage of including unit tests with the documentation in a 
way that should help users understand "myfunc".  (Unit tests too 
detailed to show users could be completely enclosed in "\dontshow".)


       Spencer
On 1/17/2011 11:38 AM, Dominick Samperi wrote:
#
Spencer

Would it not be easier to include this kind of test in a small file in the tests/ directory?
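
A small file of that kind might look like the sketch below. The file name is hypothetical, and `myfunc` is given a stand-in definition here only so the snippet runs on its own; in a package the script would load the package instead.

```r
# tests/myfunc-test.R  (hypothetical file name)
# In a real package this would start with library(mypkg); here a stand-in
# definition of myfunc() keeps the sketch self-contained.
myfunc <- function(x = 1:10) cumsum(x)

A1 <- myfunc()
A0 <- c(1, 3, 6, 10, 15, 21, 28, 36, 45, 55)  # worked out by hand

# "R CMD check" reports a failure if this script stops with an error.
stopifnot(isTRUE(all.equal(A1, A0)))
```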

Paul

#
Hi, Paul:


       The "Writing R Extensions" manual says that *.R code in a "tests" 
directory is run during "R CMD check".  I suspect that many R 
programmers do this routinely.  I probably should do that also.  
However, for me, it's simpler to have everything in the "examples" 
section of *.Rd files.  I think examples with independently 
developed answers provide useful documentation.


       Spencer
On 1/17/2011 1:52 PM, Paul Gilbert wrote:
#
Hi, Dominick, et al.:


       Demanding complete unit test suites with all software contributed 
to CRAN would likely cut contributions by a factor of 10 or 100.  For 
me, the R package creation process is close to perfection in providing a 
standard process for documentation with places for examples and test 
suites of various kinds.  I mention "perfection", because it makes 
developing "trustworthy software" (Chambers's "prime directive") 
relatively easy without forcing people to do things they don't feel 
comfortable doing.


       If you need more confidence in the software you use, you can 
build your own test suites -- maybe in packages you write yourself -- or 
pay someone else to develop test suites to your specifications.  For 
example, Revolution Analytics offers "Package validation, development 
and support".


        Spencer
On 1/17/2011 3:27 PM, Dominick Samperi wrote:
#
On 01/18/2011 01:13 AM, Dominick Samperi wrote:
Regarding access to all platforms: But there's r-forge where building and checks 
are done nightly for Linux, Win, and Mac (though for some months now the check 
protocols are not available for 32 bit Linux and Windows - but I hope they'll be 
back soon).
I found it extremely easy to get an account and project space and to start building.
Many thanks to r-forge!

complete unit test suites:
To me, it seems nicer and better to favour packages that do it than mechanical 
enforcement. E.g. show icons that announce if a package comes with a vignette, 
test suite (code coverage), etc.

My 2 ct,

Claudia

  
    
#
I'm not at all a fan of thinking
of the examples as being tests.

Examples should clarify the thinking
of potential users.  Tests should
clarify the space in which the code
is correct.  These two goals are
generally at odds.
On 17/01/2011 22:15, Spencer Graves wrote:

  
    
#
On 01/18/2011 10:53 AM, Patrick Burns wrote:
Patrick, I completely agree with you that
- Tests should not clutter the documentation and go to their proper place.
- Examples are there for the user's benefit - and must be written accordingly.
- Often, tests should cover far more situations than good examples.

Yet it seems to me that (part of the) examples are justly considered a (small) 
subset of the tests:
As a potential user, I request two things from good examples that have an 
implicit testing message/side effect:
- I like the examples to roughly outline the space in which the code works: they 
should tell me what I'm supposed to do.
- Depending on the function's purpose, I like to see a demonstration of the 
correctness for some example calculation.
(I don't want to see all further tests - I can look them up if I feel the need)

The fact that the very same line of example code serves a testing (side) purpose 
doesn't mean that it should be copied into the tests, does it?

Thus, I think of the "public" part (the "preface") of the tests living in the 
examples.

My 2 ct,
Best regards,

Claudia

  
    
#
Claudia,

I think we agree.

Having the examples run in the
tests is a good thing, I think.
They might strengthen the tests
some (especially if there are
no other tests).  But mainly if
examples don't work, then it's
hard to have much faith in the
code.
On 18/01/2011 11:36, Claudia Beleites wrote:

  
    
#
On 1/18/2011 8:44 AM, Dominick Samperi wrote:
CRAN also runs "R CMD check" on its contributed packages.  By reviewing 
the repeated checks on both R-Forge and CRAN, I've found (and fixed) 
problems that I couldn't replicate myself.


Spencer