[Bioc-devel] All Package Maintainers Please README!

17 messages · Oleg Sklyar, Marc Carlson, Martin Morgan +6 more

#
Robert recently suggested that I make a stab at a blog-based changelog 
rather than the current monthly postings, sort of similar to what Duncan 
Murdoch has done with the R NEWS and windows CHANGELOG.

The biggest difference between what is done for R and what I will be 
doing for BioC is this: R-core does a really good job of writing 
explanatory notes describing what the change was, and what it means for 
the end user.

On the other hand, the commit messages that people use range from the 
ridiculous to the sublime. Since I will no longer be parsing the commit 
messages by hand, I will not be able to remove the more useless messages 
that people tend to use, and these things will go straight to the 
changelog for all to see.

So, first thing: if you don't want your section of the changelog to be 
populated with things like 'WTF was I thinking?!@!?@!?' or 'Oops', or 
the venerable 'commit' or better yet, the ever popular ' ', you will 
want to actually use a commit message that means something with respect 
to the commit you just made.

Now I know some of the commit messages are not intended for public 
consumption, so there is a way out. If you prepend your commit message 
with INTERNAL, then it will be scrubbed. Or at least I think it will 
;-D. I'm using Python for the first time to do the parsing, so I am sure 
there are bugs aplenty. Note that this INTERNAL thing is _by line_, so 
if you do something like:

INTERNAL This is a commit message nobody should ever see.

But they can see this one.

Then the second part of the message _should_ get through. Note that you 
need to use INTERNAL exactly, as it is always possible that someone 
might use Internal at the beginning of a commit message that they want 
published, so I am not doing any case-changing on the test for this string.
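The line-by-line scrubbing described above can be sketched roughly as follows. This is not Jim's actual parser, which is not shown in this thread; the function name `scrub_internal` and the last line of the sample message are hypothetical, and the only behaviour taken from the message is that lines beginning with exactly INTERNAL are dropped and that the test is case-sensitive.

```python
def scrub_internal(message: str, marker: str = "INTERNAL") -> str:
    """Drop every newline-delimited line that begins with the marker."""
    kept = [
        line
        for line in message.split("\n")
        # Case-sensitive on purpose: a line starting with "Internal"
        # is treated as ordinary public text.
        if not line.startswith(marker)
    ]
    return "\n".join(kept).strip()


message = (
    "INTERNAL This is a commit message nobody should ever see.\n"
    "\n"
    "But they can see this one.\n"
    "Internal refactoring of the parser."  # hypothetical extra line
)
print(scrub_internal(message))
```

Under these assumptions only the first line is removed; the line starting with "Internal" passes through because no case-changing is done.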

The changelog as it currently exists (with just one day of changes so 
far) can be viewed here:

http://fgc.lsi.umich.edu/cgi-bin/blosxom.cgi

Please take a look and send me any suggestions.

Best,

Jim
#
this is interesting.  it should provoke better changelog comments.
but the comments are less interesting in the absence of data on what
files were actually changed.  and of course the filenames are not
necessarily that informative either.

would it be possible to put in links to some data on the actual code
changes?  you would not want them on the front page, but associated with
each comment.

my sense is that the changelog comments are not a great vehicle
for conveying what is going on.  after all, one can be prompted for just
one comment related to a large number of changes, and details may be
missed.  but the actual physical events on code could -- if appropriately
accessible, and i do not know if the blosxom can do this  without substantial
effort -- be much more informative.
On Fri, 26 Oct 2007, James W. MacDonald wrote:

            
#
Just a suggestion until this is finalised. Wouldn't it be more natural to
tag a message with a commonly used comment sign, like # used in R, or %
or @? It would also be less likely that someone puts such a tag at the
beginning of a public message.

Oleg
-  
Dr Oleg Sklyar * EMBL-EBI, Cambridge CB10 1SD, UK * +441223494466
On Fri, 2007-10-26 at 10:48 -0400, James W. MacDonald wrote:
#
James W. MacDonald wrote:
Hmmm, 

Here in Seattle, Seth had most of us making commit messages where the
1st line was a brief title describing the major contents of the change
and then there would be a line break followed by any of the gory details
that might be needed to carefully describe what the title meant.  I like
this format because part of a commit message is to say briefly what
changes have taken place, and partly it's also a place to make personal
notes so that later on you can remember what you were thinking at the
time.  So my 1st point is that by habit some of us already separate
these two with a linebreak.

Partly because I adopted this habit already and partly because I don't
want to live in constant fear of what might slip into my commit
messages, it might be nice if you just captured the 1st section and then
allowed us to tag any lines that fall below that linebreak with a
character if we want them to also be in the public eye (with the rest
remaining private by default)?  Of course I could also tag the lower
stuff, but then I am typing lots of extra characters with each commit.

    Marc
#
Jim --

I was actually writing identical sentiment (from across the hall), so
there's a second vote for just the first line (and filtering empty
first lines). Martin
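A minimal sketch of the "first line only" alternative, with empty first lines filtered out, might look like the following. The function name `first_line_entry` is hypothetical, and the first sample message is taken from a commit message quoted later in this thread; none of this is code that was actually used for the changelog.

```python
from typing import Optional


def first_line_entry(message: str) -> Optional[str]:
    """Return the first line of a commit message, or None if it is empty."""
    first_line = message.split("\n", 1)[0].strip()
    return first_line or None


messages = [
    "Remove bad constraint from probes table on chip packages\n\n"
    "This change only affects packages for rodents and humans.",
    " ",  # the "ever popular" blank commit message
    "\nnotes that follow an empty first line",
]
# Keep only the commits whose first line has some content.
entries = [e for e in map(first_line_entry, messages) if e is not None]
print(entries)
```

Everything after the first newline stays private by default, which is the passive filtering this proposal implies.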

Marc Carlson <mcarlson at fhcrc.org> writes:

  
    
#
That is an interesting thought, although I'm not sure if blosxom can be 
coerced into doing such a thing.

Assuming for a moment that it might be possible, can you give me an 
example of what sort of information you are referring to? I agree that a 
list of files is uninformative unless the reader knows what is in each 
file. OTOH, something like a diff, especially for lots of little 
changes, is IMO very difficult to digest.

As an aside, for someone who is maintaining a single package I think the 
convention is that they just have commit privileges for their package 
alone. However, are they unable to check out other packages and do diffs 
themselves?

I see the changelog as being a simple way to tell others that some 
changes of a certain type have been made, and the interested parties 
could then delve deeper into the subject themselves in whatever manner 
works best for them.
Vincent Carey 525-2265 wrote:
#
i was thinking along the lines of a diff.  these are probably
obtainable through svn.
i don't know
yes, if we have good changelog comment discipline (and it seems they
do at fhcrc) that should be good enough.  and the INTERNAL (or #?)
prefix can be used for those comments we don't care to publicize.
i think that can work.  i wonder if we could make svn put a little
reminder in the editor text at commit time?  would that be a local
configuration or something at the server side?  something like
'comments will be blogged, use # to hide from the external world...'


#
I'm not sure I like this idea, mainly because it is a passive filtering 
on the part of the package maintainer, and a pretty big assumption on my 
part.

Something active like adding an INTERNAL or as Oleg mentioned, a # at 
the beginning of the line signals to me that the maintainer really wants 
this stuff filtered out, whereas only taking the first line requires the 
assumption that the maintainer really only wanted the first line to be 
public.

Also note that a line in this instance is demarcated by a newline 
character, so as long as you don't hit the return key, any number of 
sentences will be filtered by a single INTERNAL or #, so it shouldn't be 
burdensome to filter things out.

Anyway, I'm not sure we should be filtering this stuff out regardless. 
As an example here are some recent commit messages:

First from Biobase:

esApply for ExpressionSet

* made esApply(X, MARGIN, FUN, ...) a generic

* method for exprSet same as previous function esApply
   (inappropriately modifies FUN environment, breaking lexical scope)

* method for ExpressionSet equivalent to

   with(pData(X), apply(exprs(X), MARGIN, FUN, ...))

* Documentation to follow shortly

Or from AnnotationDBI:

Remove bad constraint from probes table on chip packages

This change only affects packages for rodents and humans.  It drops
the not null constraint from the accessions column which is
problematic since lousy platforms will sometimes have probes that are
measuring "who-knows what".  Users ought to have a right to know when
they are using a probe like this...


Do you really want to argue that the first line in these messages is all 
that should show up in the changelog? In both cases the first line is 
pretty cryptic, but the second line is actually quite useful.

Jim
Martin Morgan wrote:

  
    
#
"James W. MacDonald" <jmacdon at med.umich.edu> writes:
At this level it sounds like the interested party is another
developer. svn log -v PKG gives file names. svn diff -r123:127 PKG
gives a diff between stated revisions.
All developers have read access to the entire repository.  Anonymous
has read access to the Rpacks directory of all devel and release
branches. Maybe the wiki could embed the appropriate code for the diff
/ log.

Martin
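For anyone who wants to script the two svn calls mentioned above, one possible sketch is below. The helper names are hypothetical, PKG is assumed to be a working-copy path or repository URL, and `run` needs the svn client on the PATH; only the command lines themselves come from the message.

```python
import subprocess
from typing import List


def svn_log_command(package: str) -> List[str]:
    """Command that lists log entries together with the files changed."""
    return ["svn", "log", "-v", package]


def svn_diff_command(package: str, old_rev: int, new_rev: int) -> List[str]:
    """Command that diffs a package between two stated revisions."""
    return ["svn", "diff", f"-r{old_rev}:{new_rev}", package]


def run(command: List[str]) -> str:
    """Run a command and return its standard output (requires svn)."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


# Only the command is printed here; call run(...) to execute it.
print(" ".join(svn_diff_command("PKG", 123, 127)))
```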

  
    
#
James W. MacDonald wrote:
You have a point Jim, but I think that we also have to consider how this
blog will change what we put into our commit messages.  Prior to this, I
have primarily used the 1st line as a "title" and then followed up with
a more detailed description.  But the existence of this blog means that
I will lean towards putting more information in the 1st section (rather
than just a title), and just shift any private information into the
second part.

I am really just suggesting that a standardized format separated by a
clean line break would be the least amount of typing by everyone
involved.  I like this because at least at the fhcrc we are already
using this format, so it's very similar and the format seems to work
pretty well.

This won't change the fact that most people will still just type one
cryptic line for most commits.  But at least those of us who want to
put more personal data in there (purely for the sake of our personal
recollection) will not be penalized by having to type lots of comment
characters.

Of course, I would also like to still have a title for my commits, so
perhaps we should really have 3 sections?  A title, a public
description, and then a private section which could all be separated by
two line breaks?

    Marc
#
Marc Carlson schrieb:
This would make the commit messages less flexible compared to using a 
comment character. I often check in changes for several different things 
at once and it is way easier to read (and type) something like

  some change on foo
  # need to fix the docs for this guy
  bar can now handle....
  # but still can't...

This way I can comment the individual parts directly whereas if we 
separate internal and public stuff into two sections I have to reference 
again in the internal part what I'm talking about.
Florian
#
Marc Carlson wrote:
Yes, there are two of you at the Hutch who do this (now that Seth is 
gone), and I think you both do a wonderful job of describing your commits.

However, none of the other developers do this, so your argument boils 
down to expecting all of the other developers to take up this paradigm 
rather than asking you to add a _single_ # to the beginning of a 
paragraph in your commit message that you don't want published.

If BioC had a BDFL maybe he or she could decree such a thing, but as 
things stand I highly doubt we could coerce all the developers to follow 
your paradigm.

Best,

Jim

  
    
#
On 10/26/07, Martin Morgan <mtmorgan at fhcrc.org> wrote:
A bit off-topic but there is also a program called svn2cl for
producing changelog entries from the svn logs.  I find it very
convenient because it combines information on the revision number, the
files that were modified and the commit message.
#
James W. MacDonald wrote:
Not a problem Jim, 

I was really just trying to get out of hitting shift-3 a few thousand
more times.  We obviously have to go with what most people want.  I just
wanted to put my two cents in.

    Marc
#
Marc Carlson wrote:
Maybe I haven't been very clear. Consider the following as a commit message:

This part of the commit message will be posted. Regardless of the number 
of sentences or apparent number of lines.

#This part of the commit message will not be posted. Even though there 
appears to be several lines here, the Python code only sees one. 
Therefore a single # at the beginning of the paragraph will cause this 
whole paragraph to be scrubbed, regardless of the length. Thus, until 
you have made a few thousand commits you will really not be hitting 
shift-3 that often...

But if you add a newline, the next bit will be posted.

#Unless you comment that part out as well.

Jim
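To make the point about paragraphs concrete, here is a rough sketch of how the example message above would be split and labelled. It assumes each paragraph was typed without hitting return, so it arrives as a single newline-delimited line; the function name `classify_lines` is hypothetical and this is not the actual parsing code.

```python
from typing import List, Tuple


def classify_lines(message: str, marker: str = "#") -> List[Tuple[str, str]]:
    """Label each non-blank newline-delimited line 'posted' or 'scrubbed'."""
    decisions = []
    for line in message.split("\n"):
        if not line.strip():
            continue  # blank separator lines carry no content
        status = "scrubbed" if line.startswith(marker) else "posted"
        decisions.append((status, line))
    return decisions


# Each paragraph is one long line, as it would be without hard returns.
message = "\n".join([
    "This part of the commit message will be posted. Regardless of the "
    "number of sentences or apparent number of lines.",
    "",
    "#This part of the commit message will not be posted. Even though "
    "there appears to be several lines here, the Python code only sees one.",
    "",
    "But if you add a newline, the next bit will be posted.",
    "",
    "#Unless you comment that part out as well.",
])

for status, line in classify_lines(message):
    print(f"{status:8} {len(line):3d} chars  {line[:40]}")
```

A single leading # therefore covers the whole paragraph regardless of its length, because the split happens only at real newline characters.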

  
    
2 days later
#
Hello All,

I just came to this thread and I would like to share my opinion. I
usually don't log any changes to the svn log for several reasons,
mainly having to do with bad practices (I've been logging them in a
separate file in inst/doc). I was planning to change to the svn way
for the next devel/release cycle though. I would like to have some
flexibility for the log, so I prefer adding a specific character at
the beginning of each line in order to identify them. I don't much
like typing something as long as INTERNAL, but if a suitable
character like # can be safely used I'll definitely adhere to this
policy. As a side comment, it may be worth adding the character
to whichever kind of line is usually less common. Which do you
think usually takes more space/lines in your commit logs: user-focused
comments or detailed developer-focused changes? If developer-focused
changes tend to be longer, it might make more sense to
add the character just to the blogged lines. That would have the
additional effect of not blogging by default, which may be a good thing
(as Marc pointed out).

Best,

Diego.
On Oct 27, 2007, at 5:10 AM, James W. MacDonald wrote:

            
--
Dr. Diego Diez
Bioinformatics center,
Institute for Chemical Research,
Kyoto University.
Gokasho, Uji, Kyoto 611-0011 JAPAN
diez at kuicr.kyoto-u.ac.jp
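The inverted scheme suggested in the message above (tag only the lines that should be blogged, so nothing is published by default) could be sketched like this. The "+" marker, the function name `blogged_lines` and the sample commit message are all hypothetical; no particular character was proposed for this variant.

```python
def blogged_lines(message: str, marker: str = "+") -> str:
    """Keep only lines tagged with the marker, with the marker removed."""
    kept = [
        line[len(marker):].strip()
        for line in message.split("\n")
        if line.startswith(marker)
    ]
    return "\n".join(kept)


# Hypothetical commit message: untagged lines stay private by default.
message = (
    "+Fix normalization of single-channel arrays\n"
    "reminder to self: revisit the edge case for empty inputs\n"
    "+Update the vignette accordingly"
)
print(blogged_lines(message))
```

The trade-off is the one raised in the message: whichever kind of line is rarer is the cheaper one to tag.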
#
On Oct 28, 2007, at 10:02 PM, Diego Diez wrote:

            
I think this post raises the interesting question of who this blog- 
style news listing is intended for. If it is developers (which I  
assume), I would vote for keeping it super easy and not worry too  
much about formatting etc. If the intention is more user-oriented I  
think it would be better to try to start a changeLog style thing  
which should be written separately from the svn log. Such an approach  
(which I would like anyway) is already supported by some packages and  
we could make a standard function like changeLog() that displays it.  
That way a user can keep track of changes, something which I  
sometimes have found very useful indeed, especially if algorithms are  
changed. The svn log should - in my opinion - be a way of  
communicating with your co-developers and should not necessarily be  
understandable for anyone who does not have a decent insight into the  
package.

Kasper