SVN vs DVCS

17 messages · Martin Maechler, Gabor Grothendieck, Felix Andrews +6 more

#
Hi,

Just wondering whether anyone had thought about moving the R sources
to a "distributed" version control system such as Bazaar, Git or
Mercurial. These new generation systems make it easier to work on
feature branches, allow working offline, are very fast, etc.

Some projects that have moved to Git are
Linux Kernel
Perl
Ruby on Rails
...
http://en.wikipedia.org/wiki/Git_(software)

Some projects that have moved to Bazaar (bzr) are
Ubuntu
MySQL
Inkscape
...
http://en.wikipedia.org/wiki/Bazaar_(software)

Some projects that have moved to Mercurial (hg) are
Mozilla
Octave
Python
...
http://en.wikipedia.org/wiki/Mercurial_(software)

Joel Spolsky's take on it:
http://www.joelonsoftware.com/items/2010/03/17.html

Regards
-Felix

-- 
Felix Andrews
Postdoctoral Fellow
Integrated Catchment Assessment and Management (iCAM) Centre
Fenner School of Environment and Society [Bldg 48a]
The Australian National University
Canberra ACT 0200 Australia
M: +61 410 400 963
T: + 61 2 6125 4670
E: felix.andrews at anu.edu.au
CRICOS Provider No. 00120C
#
On second thoughts it is really none of my business how the R sources
are managed.
But I would encourage package developers and/or r-forge maintainers to
consider these systems.
Regards
-Felix
On 26 May 2010 10:29, Felix Andrews <felix at nfrac.org> wrote:
#
    > On second thoughts it is really none of my business how the R sources
    > are managed.
    > But I would encourage package developers and/or r-forge maintainers to
    > consider these systems.

Thank you, Felix, for the compilation of these alternatives.

One very relevant piece of information that you've not added,
is, how easily one could *move* from svn to such a system
(including the full history of every file with revision numbers,
 log messages, etc),
and .. for R-forge, e.g., which of these provide nice and
flexible tools (as svn does) for an automatic web interface to
inspect file histories, differences, etc.

Regards,
Martin  ( maintainer of svn.r-project.org )

    > ______________________________________________
    > R-devel at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel
#
Note that one can also use any of the dvcs systems without actually
moving from svn by using the dvcs (or associated extension/addon) as
an svn client or by using it on an svn checkout.

On Wed, May 26, 2010 at 5:44 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
#
I'm not necessarily advocating a migration; it would probably be an
administrative nightmare, and everyone involved would be forced to learn
new stuff... I was just enthusing because I recently started using a DVCS
for the first time.

On 26 May 2010 21:16, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:

Yes, that's a very good point (although in my experience it takes a
very long time to do the initial download of the SVN repository). I'm
not an expert on these systems, but I imagine the main downside (other
than speed) of having SVN upstream is that you have to keep the
history linear, and so e.g. can't collaborate on feature branches this
way. But yeah, worth a go.

Indeed... here is the basic process for migrating to Git:
http://www.jonmaddox.com/2008/03/05/cleanly-migrate-your-subversion-repository-to-a-git-repository/
This will keep the branches, tags and full history, with SVN revision
numbers added to the log messages (if you leave off the --no-metadata
argument). However, the actual commit ids you would use in git log /
git diff / etc. will not be the same as the old SVN ids. In fact git
uses hash strings rather than numbers, and bazaar uses sequential
numbering in each branch (rather than sequential numbers globally as
SVN does). Not sure about Mercurial.

All have web interfaces. In fact FusionForge, which is the new name
for G-Forge, apparently supports Git, Bzr and Hg:
http://fusionforge.org/
https://alioth.debian.org/scm/?group_id=30261
http://wiki.debian.org/Alioth/Hg

Other examples of web interfaces can be seen on the hosting services
GitHub.com, e.g. http://github.com/hadley/ggplot2
Canonical's launchpad.net (bzr), e.g. https://launchpad.net/igraph
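
To illustrate the point above that git uses hash strings rather than
numbers: a git object id is derived from the content itself. The sketch
below (Python; the helper name git_blob_id is hypothetical) reproduces the
id git assigns to a file's content, assuming the default SHA-1 object
format. Commit ids are computed the same way over a commit object, which
records the tree, the parent ids, the author and the log message, so they
cannot be sequential the way SVN revision numbers are.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the id git gives a blob (file content), assuming SHA-1 objects.

    Git hashes a small header ("blob <size>" followed by a NUL byte)
    together with the raw content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Same content always yields the same id, on any machine, with no
# central server handing out numbers.
print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The printed value should match what `git hash-object` reports for a file
containing the single line "hello world".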
#
On May 26, 2010, at 10:01 AM, Felix Andrews wrote:

            
That (non-linear history) is IMHO the biggest drawback of DVCS because that means there is no way to link a particular build to the source status and you cannot use globally valid build numbers.
But feature branches are as easily (IMHO even more easily since you can closely monitor what others are contributing) worked on with SVN (routinely used with R) so I'm not sure what DVCS would buy you.

AFAICS the only benefit of DVCS is that if you are on a remote island without any internet connection you can accumulate multiple commits before merging them back. I can't say that I desperately need that functionality ;).

Cheers,
Simon
#
Git (and I'm sure the others) provides a globally unique id for each
revision.  Isn't that sufficient?

Feature branches are _much_ easier with git - to the point where some
people suggest using a separate feature branch for every feature you
develop.

You have never worked on an airplane or other location without
internet access?  You must have lived a very privileged life ;)

Hadley
#
Every svn alternative provides tools that are as good as or better
than R-forge, with the exception of package building.  It's a real
shame that this unique component of R-forge is so closely connected to
the tools that many other sites provide.

See http://github.com/hadley/plyr for an example of the development
experience that other sites provide.  Note the absence of broken
images and broken https.

Hadley
#
2010/5/26 Hadley Wickham <hadley at rice.edu>:
Some people just have decent web access only at work, and if you work
on your R project like at home or on the train, you're already having
some difficulties. But please, not the airplane argument! (just
joking...).

Moreover, 'local' commits are way faster than network-based commits. I
can testify: 1 microsecond vs 1 second delay (or more, depending on how
crappy your net access is) *is* a big difference. On your local
machine, you end up committing much more often, with smaller and
self-contained commits, generally producing a cleaner history.

fabio.

  
    
#
On May 26, 2010, at 11:35 AM, Hadley Wickham wrote:

            
No, in that you cannot follow revisions. What you get are those horrible UUIDs that you can't seriously use other than in some autogenerated form (that's one of the main reasons I abandoned Git after giving it a try).

Ok, what's different? It's trivial to create branches in SVN and trivial to merge - how is it easier in git? That may be the part I don't understand..

Oh, you don't have internet in your airplane? ;) But seriously, yes, I have hacked stuff on airplanes but in general I'm able to get access soon enough to have reasonable commit granularity. But yes, I do agree that it can be useful a very limited number of times (maybe once or twice so far for me), but that doesn't convince me to give up revisions and central conflict resolution which I use daily. [Note: this is my personal preference]

Cheers,
Simon
#
On May 26, 2010, at 12:26 PM, Antonio, Fabio Di Narzo wrote:

            
I disagree - I don't find commit time having any impact on what I commit. It's always a logical chunk (which is why SVN was such a great step forward from CVS). My RForge does a check on commit so I don't even bother waiting for the commit to finish (waiting is just useful if I want the check result - the actual commit is pretty much instantaneous). However, with SVN you'll know immediately if someone else was working on the same issue in the meantime - with DVCS you won't (this happens in R more often than you would think). [Note: again, this is rather about personal preferences I suspect]

Cheers,
Simon
#
On Wed, May 26, 2010 at 11:38 AM, Hadley Wickham <hadley at rice.edu> wrote:
R-Forge does have the capability of mirroring an external subversion
repository according to section 4.2 of the R-Forge manual:
http://r-forge.r-project.org/R-Forge_Manual.pdf
#
On 5/26/10 4:16 AM, Gabor Grothendieck wrote:
FWIW, I have been using git for several years now as my VCS of choice
and use it for all svn-backed projects (R included) via git-svn.

Some of the things I like:

- Being able to organize changes in local commits that can be revised, 
reordered, rebased prior to publishing.  Once I got in the habit of 
working this way, I simply can't imagine going back.

- Having quick access to full repository history without network 
access/delay.  Features for searching change history are more powerful 
(or easier for me to use) and I have found that useful as well.

- This may not be true any longer with more recent svn servers/clients,
but aside from the initial repo clone, working via git-svn was
noticeably faster than the straight svn client (!) -- I think related to how
the tools organize the working copy and how many fstat calls they make.

- I find the log reviewing functionality much better suited to reviewing 
changes.


+ seth
#
2010/5/26 Simon Urbanek <simon.urbanek at r-project.org>:
You'll know immediately as long as you're connected, and that holds
for DVCS too.
Besides, people working simultaneously on the same files and needing
svn to tell them of that? And that happening often? I would hope for
better human interaction and work division, rather than svn conflict
checks. But...

indeed.

cheers,
fabio.

  
    
#
Antonio, Fabio Di Narzo wrote:

            
It actually happens quite often. Sometimes because two people in, say,
Canada and Oxford fix the same bug report (usually in nearly the same
way), but more typically because ALL our changes are recorded in the
same NEWS file.

Besides, SVN conflicts are almost always trivial to resolve: Either a
matter of selecting one of two changes or keeping both.
#
On Wednesday, May 26, 2010, Peter Dalgaard <pdalgd at gmail.com> wrote:
And that's no more difficult in git or any other dvcs.

Hadley

  
    
1 day later
#
I think the main advantage of a DVCS is that it allows many many
people to make changes to a project and to integrate those changes in
a non-insane way. Given that R has a very restricted list of people who
actually make changes to the source, it doesn't seem that something
like git or Hg would provide a major advantage. If the people on that
list are happy with SVN then there's not much else to say. However, if
it were thought that maybe we want more people submitting
patches/making changes, then perhaps it might make more sense to move
to a DVCS.

I use git for everything mainly because it's *fast* and it has much
better tools for viewing changes/patches and revision history. For
example, 'git bisect' has allowed me to track down bugs that would have
been very painful for me to find because I'm not intimately familiar
with the entire R source code.
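
For readers who have not used it, the idea behind 'git bisect' mentioned
above is a binary search over the history between a known-good and a
known-bad revision. The sketch below is only an illustration of that idea
on a simplified linear history; the function name first_bad, the revision
labels and the is_bad test are all hypothetical, and real git bisect also
has to cope with branching and merged history.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Find the first bad revision in a list ordered oldest to newest.

    Assumes revisions[0] is known good and revisions[-1] is known bad,
    and that every revision after the first bad one is also bad.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):
            bad = mid   # the bug was introduced at mid or earlier
        else:
            good = mid  # the bug was introduced after mid
    return revisions[bad]


# Hypothetical history of 100 revisions where the bug appeared in r73.
history = [f"r{i}" for i in range(1, 101)]
tested = []


def build_and_test(rev: str) -> bool:
    tested.append(rev)  # record which revisions had to be checked
    return int(rev[1:]) >= 73


print(first_bad(history, build_and_test), "found after", len(tested), "tests")
```

The point of the design is that the number of revisions to build and test
grows with the logarithm of the history length, which is why it helps
someone who does not know the code base well.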

-roger
On Wed, May 26, 2010 at 1:14 PM, Seth Falcon <seth at userprimary.net> wrote: