On Reproducible Code

19 messages · David L Carlson, Gabor Grothendieck, David Winsemius +9 more

#
We often refer requesters to the Posting Guide and chide them for not
reading it. Recently I had occasion to re-read the Posting Guide, which is
for all R lists, not just R-help. The word "reproducible" does not appear
anywhere in the guide. The closest it comes is the following suggestion:

"Sometimes it helps to provide a small example that someone can actually
run."

Recommendations to use the function dput() to provide sample data do not
appear in the guide. 

The bottom of messages to R-help does contain the statement you've all seen,
but I had assumed it summarized advice found elsewhere, since first-time
posters may not see the message until after they have posted.

"PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal,
self-contained, reproducible code."

The Mailing Lists page (http://www.r-project.org/mail.html) describes R-help
but refers only to the posting guide and does not include this advisory
statement.

The R-help Info Page (https://stat.ethz.ch/mailman/listinfo/r-help) also
refers only to the posting guide and does not include this advisory
statement.

I hesitate to sound too optimistic, but there might be some advantage in
making the statement more prominent and adding a reproducible example using
dput().

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
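
As an illustration of the kind of dput() example suggested above, here is a
minimal sketch. The data frame `df` and its column names are hypothetical,
made up only for this illustration; they do not come from the Posting Guide.

```r
# Hypothetical sample data (names and values are illustrative only).
df <- data.frame(id = 1:3,
                 group = c("a", "b", "a"),
                 stringsAsFactors = FALSE)

# dput() writes an R expression that recreates the object;
# a poster can paste that text into a message.
dput(df)

# A reader can rebuild the object from the pasted text.
# Here the text is captured and re-evaluated to mimic that round trip.
txt <- paste(capture.output(dput(df)), collapse = "\n")
df2 <- eval(parse(text = txt))
all.equal(df, df2)
```

For a large data set, something like dput(head(df, 20)) keeps the pasted text
short while still giving readers the real structure of the data.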
#
On Wed, Jul 25, 2012 at 11:50 AM, David L Carlson <dcarlson at tamu.edu> wrote:
I agree that the posting guide is not ideal.  On the other hand, the
last line of every message to r-help does concisely list what is
required.
#
********************
 PLEASE provide commented, minimal, self-contained, reproducible code.
Whenever possible, provide a small example that can be easily loaded
and run to illustrate your problem. The R function dput() should
generally be used to do this.

For a more complete discussion of how to post queries that will yield
accurate, helpful responses, refer to  the posting guide at
http://www.R-project.org/posting-guide.html.
*********************

I agree. Perhaps slightly modifying the message and moving it to the
top as I have done here (please feel free to edit as appropriate)
would be useful. This might have the psychological advantage of making
it painfully obvious to OPs and readers when they have not followed
the recommendations.

Might be worth a try and seems like something that would be easy to do.

Cheers,
Bert
On Wed, Jul 25, 2012 at 8:50 AM, David L Carlson <dcarlson at tamu.edu> wrote:

  
    
#
On Wed, Jul 25, 2012 at 12:25 PM, Bert Gunter <gunter.berton at gene.com> wrote:
The one line summary at the end is better than the posting guide.  It
tells you right off what to do in just one line.

On the other hand, realistically, few are going to wade through the
posting guide. It's better to keep the one line summary at the end,
since that is the best bet that someone will actually read some
guidance on how to post.
#
Hello,

This does not mean that the posting guide is useless. Nor that it 
couldn't or shouldn't be changed.

I would say "shouldn't" because there's a clear call to reproducible 
code in another part of R, the man files created by package.skeleton:

\examples{
##---- Should be DIRECTLY executable !! ----
##-- ==>  Define data, use random,
##--    or do  help(data=index)  for the standard data sets.

So maybe some of the posting guide should be more clear and concise. 
Simple rules at the beginning, itemized or enumerated.

(And keep that end line.)

Rui Barradas

On 25-07-2012 17:44, Gabor Grothendieck wrote:
#
On Jul 25, 2012, at 8:50 AM, David L Carlson wrote:

            
The absence of dput from the PG is a bit surprising, but an equivalent
bit of advice does appear:

"When providing examples, it is best to give an R command that  
constructs the data, as in the matrix() expression above. For more  
complicated data structures, dump("x", file=stdout()) will print an  
expression that will recreate the object x. "

--
David.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
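
The dump() advice quoted from the Posting Guide can be sketched as follows.
The object `x` is a hypothetical stand-in. One difference worth noting:
dump() writes a full assignment (`x <- ...`), while dput() writes only the
expression on the right-hand side.

```r
# Hypothetical object standing in for a "more complicated data structure".
x <- list(a = 1:3, b = c("u", "v"))

# Print an assignment that recreates x, as the Posting Guide suggests.
dump("x", file = stdout())

# Round trip through a temporary file: remove x, then source() the dump.
tmp <- tempfile(fileext = ".R")
x_original <- x
dump("x", file = tmp)
rm(x)
source(tmp, local = TRUE)
identical(x, x_original)
```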
1 day later
#
On 07/26/2012 01:50 AM, David L Carlson wrote:
The responses to some requests for help do seem to get a volley of the 
"reproducible code" answers. Some, such as:

I can't get the answer. PLEASE HELP!!!

probably deserve it, but others appear to emerge from the overheated 
brain of the frustrated noob. With a wonderfully informative name like 
"dput", it is rather challenging to guess that this function is the way 
to calm the affronted guru with an example of your problem. I am 
particularly amused by the phrase "reproducible code", which sounds 
perilously close to the definition of a virus. Perhaps the neglected 
little message at the bottom of each email (which seems to reproduce 
itself) might be easier for the uninitiated to understand if it read:

Please include the R code that is causing the problem _and_ enough data 
(see the "dput" function) for someone else to run the code and get the 
same problem.

I can remember when I didn't know that there was a "dput" function.

Jim
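
A post that follows the wording proposed above would carry both the data and
the code. The sketch below assumes a made-up data frame `mydata` and uses an
lm() call as a stand-in for whatever code the poster is actually asking
about; neither comes from the thread.

```r
# Data, as it would appear in a post after running dput(mydata)
# on the poster's own machine (values here are made up).
mydata <- structure(list(x = c(1, 2, 3, 4, 5),
                         y = c(2.1, 3.9, 6.2, 7.8, 10.1)),
                    class = "data.frame",
                    row.names = c(NA, -5L))

# The code the poster is asking about (a hypothetical stand-in).
fit <- lm(y ~ x, data = mydata)
coef(fit)
```

Anyone can paste these lines into a fresh R session and see the same result
the poster sees, which is the point of the request.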
#
I can remember spending a lot of time constructing a data set to post before someone mentioned ?dput.  Ah, yes, I still have a couple of generic ones archived.

I think your wording above makes a lot of sense.

#
I agree and would like to see it placed at the **TOP** of every post.

-- Bert
On Fri, Jul 27, 2012 at 7:11 AM, John Kane <jrkrideau at inbox.com> wrote:

  
    
#
I'd vote for that!  
It would probably bug the blazes out of experienced users but the time savings in getting a newbie to actually supply enough information so that someone can, at least, try to answer the question would be well worth it.

John Kane
Kingston ON Canada
#
That assumes:

* Everyone reads the mailing list before making the first posting

* Everyone reads every part of every email.

I'd argue that both assumptions are false. People are particularly well
trained to skip over boilerplate text at the bottom of emails.

I'd suggest an alternative approach is for experts to remember what
it's like to be a novice, and cultivate an attitude of patience and
tolerance.  That's about as likely to happen as a mass change in
behaviour in new users.

Hadley
On Fri, Jul 27, 2012 at 9:48 AM, John Kane <jrkrideau at inbox.com> wrote:

  
    
#
Hello,

I agree with you. That's why I've proposed an itemized text.
I want my VCR manual to give me point by point instructions, not to give 
me a clear and brief discourse on exactly what to do. And, though I have
no access to VCR manufacturers' data, I am under the impression that
their way works. Users, taken as a whole, do adopt the habit of reading
the literature.

Rui Barradas

On 27-07-2012 18:47, Hadley Wickham wrote:
#
I would like to be able to refer briefly to longer explanations such as the stackoverflow article on reproducible examples rather than patiently rewrite such explanations. A posting guide with more specific recommendations would make it easier to "be patient".
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
Hadley Wickham <hadley at rice.edu> wrote:

            
#
That's definitely a good idea! But you can do it in essentially two ways:

Hey moron, why don't you know what everyone else already knows about
reproducible examples? http://bit.ly/N8Qml6

OR

It's hard to know exactly what's going wrong without a
reproducible example.  If you haven't created one before, you might
want to read all about it on stackoverflow:
http://stackoverflow.com/questions/5963269. It's a small investment
that's likely to pay off big.

Both are equally easy to copy and paste (or otherwise reproduce). But
unfortunately it sometimes seems like there is much more of the former
than the latter on R-help.

Hadley
#
....
On Fri, Jul 27, 2012 at 10:47 AM, Hadley Wickham <hadley at rice.edu> wrote:
-- which is why I suggested that Jim Lemon's brief version go at the top.

There's obviously no magic bullet. We're in the realm of social
psychology, I guess, here, so I certainly don't have much insight. But
I think the experiment is easy and worth trying.

-- Bert

  
    
2 days later
#
On Jul 30, 2012, at 13:05 , Thomas Adams wrote:

            
On the contrary, everything can be lost by allowing abusers to persevere!

(And yes, there are people who no longer attempt to help, because of ungrateful and downright arrogant behavior they have experienced on the lists.)
#
How about sending an email to the OP with a message like:

"Hi,

Thanks for submitting a question to the R-help list.
We hope you did read the Posting Guide and submitted a reproducible example
of your code (by the use of dput, structure, ...)."


Then there is no need to append the notice to every message, where most
people automatically skip over it anyway (at least I do....)


Just my 2 cents

Bart





