
Source references from the parser

13 messages · Deepayan Sarkar, Duncan Murdoch, Seth Falcon

#
I have just committed some changes to R-devel (which will become R 2.5.0 
next spring) to add source references to parsed R code.  Here's a 
description of the scheme:

The design is done through 2 old-style classes.

"srcfile" corresponds to a source file: it contains a filename, the
working directory in which that filename is to be interpreted, the last
modified timestamp of the file at the time the object is created, plus
some internal components.  It is implemented as an environment so that
there can be multiple references to it.

"srcref" is a reference to a particular range of characters (as the
parser sees them; I think that really means bytes, but I haven't tested
with MBCSs) in a source file.  It is implemented as a vector of 4
integers (first line, first column, last line, last column), with the
srcfile as an attribute.
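
A minimal sketch of these two classes, assuming a recent R release in which parse(text = ...) accepts keep.source = TRUE (per the replies below, parse(text = ...) did not yet attach references when this was written). The sample code string is purely illustrative. Newer R versions store more than four integers in a srcref, so only the first four are inspected here.

```r
# Parse two statements from a string, asking the parser to keep source references.
code <- "x <- 1\ny <- c(2,\n       3)"
exprs <- parse(text = code, keep.source = TRUE)

# One srcref is attached per complete top-level statement.
refs <- attr(exprs, "srcref")
length(refs)

# The first four integers are first line, first column (byte),
# last line, last column (byte).
second <- refs[[2]]
as.integer(second)[1:4]

# The srcfile is carried as an attribute and is an environment,
# so several srcrefs can share it.
sf <- attr(second, "srcfile")
is.environment(sf)
inherits(sf, "srcfile")

# Recover the original text of the statement, formatting included.
as.character(second)
```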

The parser attaches a srcref attribute to each complete statement as it
gets parsed, if options("keep.source") is TRUE.  (I've left the old source
attribute in place as well for functions; I think it won't be needed in 
the long run, but it is needed now.)

When printing an object with a srcref attribute, print.default tries to
read the srcfile to obtain the text.  If it fails, it falls back to an
ugly display of the reference.  Using a new argument useSource=FALSE in
printing will stop this attempt:  when printing language, it will
deparse; when printing a srcref, it will print the ugly fallback.
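
A small sketch of the difference for a function, assuming a recent R release where parse(text = ..., keep.source = TRUE) retains references. The function and its comment are made up for illustration.

```r
# Build a function from text so that its source reference is retained.
f <- eval(parse(text = "function(x) {\n  # add one\n  x + 1\n}",
                keep.source = TRUE))

# Default printing uses the srcref, so the comment is shown.
with_source <- capture.output(print(f))

# useSource = FALSE deparses the function instead, so the comment is lost.
deparsed <- capture.output(print(f, useSource = FALSE))

any(grepl("add one", with_source))
any(grepl("add one", deparsed))
```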

source(echo=TRUE) will echo all the lines of the file including comments
and formatting.  demo() does the same, and I would guess Sweave will do
this too, but I haven't tested that yet.  I think this will improve
Sweave output, but will need changes to the input file:  people may have
comments there that they don't want shown.  Some sort of
"useSource=FALSE" option will need to be added.
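
A rough illustration of the echo behaviour, assuming a recent R release in which source() has a keep.source argument. The script contents are hypothetical, and the file is written to a temporary location.

```r
# Write a small script, including a comment, to a temporary file.
script <- tempfile(fileext = ".R")
writeLines(c("# compute a total", "total <- sum(1:10)", "total"), script)

# keep.source is passed explicitly because non-interactive sessions
# default to keep.source = FALSE.
echoed <- capture.output(source(script, echo = TRUE, keep.source = TRUE))

# The echoed lines include the comment as it was written in the file.
cat(echoed, sep = "\n")
```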

The browser used with debug() etc. will display statements as they were
formatted in the original source.  It will not display leading or
following comments, but will display embedded comments.

Parsing errors display the name of the source file that was parsed, and
display verbose error messages describing what's wrong.  This display
could still be improved, e.g. by displaying the whole source line with a
pointer to the error, instead of just the text up to the location of the
error.

I plan to add some sort of equivalent of C "#line" directives, so that
preprocessed source files (e.g. the concatenated source that is
installed) can include references back to the original source files, for
syntax error reporting, and/or debugging.  This will require
modification of the INSTALL process, but I haven't started on this yet.

It would probably be a good idea to have some utility functions to play
with the srcref records for debugging and other purposes, but I haven't
written those yet.  For example, the current source record on a function
could be replaced with a srcref, but only by expanding the srcref to
include some of the surrounding comments.

Comments and problem reports are welcome.

Duncan Murdoch
#
On 11/25/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
I haven't tested this, but the idea seems useful. Will this have any
effect on code parsed using parse(text = "...")? Can it be extended to
have some such effect? I ask because this is relevant in the context
of Sweave, where I have always wanted the ability to retain the
original formatting. I'm currently testing a patch that allows me to
do this specifically for Sweave, but a more general solution is
obviously preferable.

-Deepayan
#
On 11/25/2006 3:12 PM, Deepayan Sarkar wrote:
It won't currently, but that's on the todo list.

When it will arrive depends on how many other things land on my desk in 
the next few weeks:  if I don't get it done before January, it probably 
won't make 2.5.0.

Duncan Murdoch
#
On 11/25/2006 3:12 PM, Deepayan Sarkar wrote:
I've just added the capability to Sweave.  I haven't committed yet, 
because I think it's important that authors can choose whether or not to 
turn this on.  Could you let me know your typical workflow with Sweave, 
whether you'd like this to default to on or off, and where you'd expect 
to change the default?

Duncan Murdoch
#
On 11/25/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
I would like it as an option to the RweaveLatex driver (and perhaps
others). In terms of changing the API, this is as simple as adding an
argument to the 'RweaveLatexSetup' function.

In the case of my patch, the default is off, and is turned on by

<<...,src=TRUE>>
...
@

To make this the global default, one can do

\SweaveOpts{src=TRUE}

etc. (the name 'src' is not necessarily the best, some variant of
'keep.source' might be more intuitive.)

Thanks for getting to this so quickly.

-Deepayan
#
On 11/25/2006 11:00 PM, Deepayan Sarkar wrote:
This is now committed.

I used keep.source, exactly the same as the option() that controls this 
behaviour in other places.

I decided to set the default to TRUE.  This means vignettes will all 
look different in R-devel.  The simplest way to get the previous 
appearance is to put in

\SweaveOpts{keep.source=FALSE}

but in most cases I think people will want the new behaviour.  It's only 
bad if the code was badly formatted or contained comments you don't want 
to show up in the final document.  I looked through the grid package 
vignettes, and only saw about half a dozen places where I thought the 
formatting needed tweaking.
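
A sketch of the effect on woven output, assuming a recent R release in which driver options such as keep.source can also be passed through the Sweave() call. The document contents and the helper weave() are hypothetical.

```r
# A tiny Sweave document with a commented code chunk (contents are illustrative).
rnw <- tempfile(fileext = ".Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<>>=",
  "# note to self",
  "x <- 1 + 1",
  "@",
  "\\end{document}"
), rnw)

# Hypothetical helper: weave the document with the given keep.source
# setting and return the lines of the generated .tex file.
weave <- function(keep) {
  tex <- tempfile(fileext = ".tex")
  utils::Sweave(rnw, output = tex, quiet = TRUE, keep.source = keep)
  readLines(tex)
}

kept <- weave(TRUE)      # original formatting and comments retained
dropped <- weave(FALSE)  # code is deparsed, so the comment disappears

any(grepl("note to self", kept))
any(grepl("note to self", dropped))
```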

Duncan Murdoch
#
On 11/25/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
[snipped]
Working great, thanks. The behaviour of comments is interesting (and
fortuitous for me), in that top-level comments get associated with the
expression that follows them, and comments at the end of a chunk are
ignored. Is this intended, and can I expect it to remain unchanged? (My
chunks currently tend to end with a blank comment to work around a bug
in ESS, and I would rather not fix them all.)

Fine with me (although I suspect R's deparse will often produce more
readable code if the original wasn't written in an editor with smart
indentation).

-Deepayan
#
On 11/26/2006 3:02 AM, Deepayan Sarkar wrote:
I'd say it's intentional but subject to change in the next few weeks, 
depending on the reaction to it.  It behaves the same when running 
examples in man pages.  As far as I know it suits the comment style used 
in the base packages well.  Since comments haven't been displayed in the 
past, I can't see how it would inconvenience anyone too much, but you 
never know...

browser() behaves differently during debugging:  there neither preceding 
nor following comments are shown.  My hope is that a source browser will 
be made easier by these source references, so I don't intend to change 
that.

Duncan Murdoch
1 day later
#
Hi Duncan, all,

Duncan Murdoch <murdoch at stats.uwo.ca> writes:
I'd really like the default to be FALSE, at least for the upcoming
release.  As you note, the change will cause vignettes to look
different.  There are over 260 vignettes in the BioC source tree and
given that most developers have little extra time, I would rather they
not have to review these docs because of this change.

I'm glad the feature is there, I think it is desirable.  I hope to
turn this option on when I write new documents ;-)

+ seth
#
On 11/27/2006 7:56 PM, Seth Falcon wrote:
The next release hasn't been scheduled, but it's likely to be 2.4.1, 
sometime in December.  These changes won't be in that release.  They're 
in R-devel, which will become 2.5.0 sometime in April.

I'd rather leave the default as TRUE for now, and make a final decision 
on it sometime near when the alpha test period starts in February or 
early March.  If the default is set to FALSE now, then it won't get 
tested, and I'd much rather hear about bugs in the implementation before 
release, rather than after.

I'd also like to know if the \SweaveOpts{keep.source=FALSE} line fails on 
any systems, and whether there's an easier way to turn the option off. 
(I've tested it in R 2.4.0 patched, but not in older systems.)  If it is 
compatible with all old versions, then a sed script to insert it into 
all 260 of your vignettes would be pretty easy to write; the main cost 
would then be to replace the current package versions with new versions 
containing this change.  How many packages are we talking about, that 
wouldn't otherwise be updated before April?

My inclination would be to default to TRUE in the long term, on the 
basis that doing nothing to the user's code should be the default, 
rather than the option.  The fact that this changes the look of 
documents from existing packages is obviously an argument in favour of a 
FALSE default.  So the final decision hasn't been made yet.

Duncan Murdoch
#
There have been a couple of requests to set the keep.source default to 
FALSE (i.e. not to enable the new behaviour), so I've done that.

Hopefully the code will still be exercised enough that we can have 
confidence in it by April.

Duncan Murdoch
#
Duncan Murdoch <murdoch at stats.uwo.ca> writes:
Thank you.  That is much appreciated.

FWIW, I intend to set keep.source=TRUE for the packages that I work on
directly and will speak up if I encounter problems :-)

+ seth
#
On 11/29/2006 11:26 AM, Seth Falcon wrote:
Thanks, that will really help.

Duncan Murdoch