
git-vs-mercurial-vs-bazaar

D.C.T.W.Y.C.D.T: dancing between github and subversion repository

  • Once they send patches to you or push their branches, you can merge them into your git repository and push the result to github. Thanks to the cron job mentioned earlier, the subversion repository (e.g. on Google Code) will get the changes as well. Every now and then, when other contributors commit changes to the subversion repository, those changes are propagated to github too. Both parties are happy.
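
    The cron job referred to above is not part of the excerpt, so here is a minimal sketch of what such a job might look like. It assumes a clone created with git svn clone, a GitHub remote named "origin", and a "master" branch bound to the Subversion trunk; every path and name is illustrative rather than taken from the article.

      #!/bin/sh
      # run periodically from cron, e.g.: */30 * * * * /path/to/sync.sh
      cd /path/to/sync-clone || exit 1
      git checkout master
      git fetch origin                # contributor work already merged and pushed to GitHub
      git merge origin/master         # bring it onto the local svn-tracking branch
      git svn rebase                  # pull in commits made directly in Subversion
      git svn dcommit                 # replay the new git commits into Subversion
      git push origin master          # publish the result back to GitHub
      # note: dcommit rewrites the commits it sends, so if Subversion had parallel
      # changes this push may not fast-forward and will need coordination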

Mark Shuttleworth » Blog Archive » Merging is the key to software developer c...

  • We must keep the cost of merging as low as possible if we want to encourage people to collaborate as much as possible. If a merge is awkward, or slow, or results in lots of conflicts, or breaks when people have renamed files and directories, then I’m likely to avoid merging early and merging often. And that just makes it even harder to merge later.
  • The beauty of distributed version control comes in the form of spontaneous team formation, as people with a common interest in a bug or feature start to work on it, bouncing that work between them by publishing branches and merging from one another. These teams form more easily when the cost of branching and merging is lowered, and taking this to the extreme suggests that it’s very worthwhile investing in the merge experience for developers.
  • In CVS and SVN, the “time to branch” is low, but merging itself is almost always a painful process. Worse, merging a second time from another branch is WORSE, so the incentives for developers to merge regularly are exactly the wrong way around. For merge to be a smooth experience, the tools need to keep track of what has been merged before, so that you never end up redoing work that you’ve already solved. Bzr and Git both handle this pretty well, remembering which revisions in someone else’s branch you have already integrated into yours, and making sure that you don’t need to bother to do it again.
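
    As a small illustration of the merge tracking described above (branch names are hypothetical): merging the same branch twice in git is a no-op the second time, whereas Subversion before 1.5 required the user to track and pass the merged revision range by hand.

      git checkout master
      git merge feature     # first merge: resolve any conflicts once
      git merge feature     # nothing new on 'feature': git answers "Already up-to-date."
      # contrast with pre-1.5 Subversion, where the range had to be remembered manually:
      #   svn merge -r100:140 http://svn.example.org/repo/branches/feature .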

Mark Shuttleworth » Blog Archive » Renaming is the killer app of distributed ...

  • You don’t want to have to dump a whole lot of rules to new contributors like “never rename directories a, b and c because you will break other people and we will be upset”. You want those new contributors to have complete freedom, and then you want to be able to merge, review changes, and commit if you like them. If merging from someone might drop you into a nightmare of renaming fixups, you will be resistant to it, and your community will not be as widely empowered.
  • Keep playing with this. Sooner or later, if you are not using a system like Bzr which treats renames as a first class operation… Oops.
  • If I look at the biggest free software projects, the thing they all have in common is crufty tree structures (directory layouts) and build systems. This is partly a result of never having had tools which really supported renaming, in a way which Would Not Break.
  • The exact details of what it takes to break the renaming support of many DVCS’s vary from implementation to implementation. But by far the most robust of them is Bzr at the moment, which is why we make such heavy use of it at Ubuntu.
  • I'll gladly accept the extra 0.3 seconds it takes Bzr to give me a tree status in my 5,100-file project, for the security of knowing I never ever have to spend long periods of time sorting out a merge by hand when stuff got renamed.
  • Mercurial does a remove and an add, but it stores the pointer to the last version of the previous file during the add, so no information is lost.
  • Instead of taking real projects, I've written a simple test case in the form of a bash script. It simulates two developers, one renaming a directory and the second adding a file to it. I've tested it with Bazaar, Git and Subversion. You were right, only Bazaar handled it in the way we would expect an SCM to behave. Git didn't move the new file to the new directory, so the merge ended up with two separate directories. Subversion was worse, discarding the added file altogether. [A rough reconstruction of the git part of this test appears after this list.]
  • The Hg team, like the Git team, have just decided that renaming is not important enough that it needs to be tracked explicitly. So when you rename a file using Hg, it is internally represented as a delete and an add. Later, when you merge across branches where this happens, Hg does a (very good) job of guessing what to do. But that guessing process, while it handles obvious cases well, is likely to break down as directories, and subdirectories, get moved around. It’s easy to show it break, but it’s not my aim here to actually demonstrate a failing in any other project. And any individual use case can be fixed with better guessing - the problem is that on big projects, over time, you will see the use cases get increasingly baroque.
  • So, to be clear, I don’t think there’s anything wrong with Hg, it’s a great tool that is best for certain use cases, but I also don’t think it handles renames in a way that is healthy for projects which want to do real surgery on the shape of their tree. And on this import of the Linux kernel tree, on my laptop, trunk-Bzr does “status” in 1.4 seconds, while Hg 0.9.3 (not the latest trunk which is probably faster) does it in 1.3 seconds. Granted, commit is much faster with Hg currently, but the Bzr team have not done any optimisation on commit, they are focused on status because that’s what everyone does all the time. And 0.1 seconds is not worth the lossiness of guesstimated renames. This is 23,000 files.
  • GIT is an excellent system that does a very specific job. It does not track renames as a first class operation, but does a reasonable job of guessing in simple cases. However, if you want to do real reshaping of your tree, that guessing can hit limits quickly. Git is very good in projects like the kernel which don’t rename stuff and need the speed, and also are UNIX-specific (Git is quite difficult to get to work on Windows, as I understand it).
  • GIT even tells you where parts of a file came from, which nobody else can do.
  • This is a very strong point for renaming, but it is not necessarily a universal one. Here is one example of the issue: one developer renames a directory in his branch, and another adds a file to the original directory in his branch. What happens at the merge?
    - Bazaar renames the directory and puts the new file in the _renamed_ directory.
    - Git renames the directory with its files, but keeps the old directory too and adds the new file there.
    Bazaar's behavior certainly is better for C. However, it is not universally better. For example, in Java you cannot rename a file without changing its contents. So, moving a file to a directory different from where its author put it will almost certainly break the build. The bottom line is, both behaviors can seem valid or broken, depending on the case. Neither is perfect.
    At the very abstract level, file renames are _not_ a first-class operation. This is especially apparent in a language like Java. Content movement is the first-class operation - things like moving functions, etc. The question is how one can handle that and whether the current strategy has a path for improvement. It could be argued that once you commit yourself to explicitly tracking file renames, you are giving up a slew of opportunities for handling the more general cases. One thing is for certain: a 100% ideal solution is impossible. It would have to be aware of the target programming language _and_ the build environment.
  • Now now, Mark, no misleading assertions if you please. Mercurial tracks renaming information perfectly. It implements it in a different way than Bazaar, but in fact the technique that it uses is more general than Bazaar's approach. As you know, Bazaar requires a file to have a unique identity. If I rename A to B, and you rename A to C, the only possible outcome when we merge is a conflict that results in either B or C, because Bazaar requires there to be a single file afterwards. However, Mercurial reports the possible conflict to you *and* is perfectly okay if you decide that the appropriate response is to keep either one, *or* to preserve *both* B and C. In the book, I mention a bug (455) with this handling which has subsequently been fixed. I've not had a chance to update the book with the current behaviour, but I want to note the fact that it's fixed here.
    Mark Shuttleworth says: Now now Bos, don't cry FUD when all you're getting is constructive criticism! I like Hg, but I think it's important to version directories like files, because I think this gives the result most people actually expect. I wouldn't call renaming support "perfect" unless it clearly included support for renaming directories. The principle of least surprise is important, and I think Bazaar best reflects that. Giving everyone an A, B, and C option when most people really expect A is cute but ultimately makes the tool harder to use. As for your example, in neither case did either developer ADD a file; in both cases they renamed the same file, so it seems odd to think that ending up with TWO files is an expected result. I don't think a limitation of a tool should be sold as a feature.
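
    The bash test case mentioned above is not included in the excerpt, so here is a rough reconstruction of its git half (directory, file, and branch names are invented, and it assumes the default branch is called master):

      #!/bin/sh
      set -e
      mkdir rename-test && cd rename-test && git init
      mkdir lib && echo 'int x;' > lib/a.c
      git add . && git commit -m 'initial tree'

      git checkout -b rename-dir             # developer A renames the directory
      git mv lib core
      git commit -m 'rename lib/ to core/'

      git checkout master                    # developer B adds a file under the old name
      echo 'int y;' > lib/b.c
      git add lib/b.c && git commit -m 'add lib/b.c'

      git merge rename-dir                   # the interesting part: where does b.c end up?
      ls                                     # per the annotation, git of that era kept both core/
                                             # and a leftover lib/ holding b.c; Bazaar moved b.c into core/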

Git's future on windows looks better

  • Source changes needed for porting to MinGW environment are now all in the main git.git codebase.
  • An ancient merge strategy "stupid" has been removed.
  • git-gui learned to stage changes per-line

git-svn clone and rebase fails in 1.5.6-preview20080622 (was Re: [msysGit] Re: [ANNOUNC...

  • Therefore, Dscho and I decided to remove git-svn from the end-user installer. Personally, I am only interested in the core commands that are needed for a native git workflow, so I will *not* fix git-svn. In fact, I never ran git-svn. Apparently, the other core msysgit developers are also not interested in working on git-svn. git-svn will still be available if you check out the msysgit source as described on the msysgit homepage (see "If you want to hack ..."). After git-svn has matured, we can include it again in the end-user installer.
  • msysgit doesn't support integration with svn (2008-07). This means it only works with a git infrastructure.

An introduction to git-svn for Subversion/SVK users and deserters

  • This article is aimed at people who want to contribute to projects which are using Subversion as their code-wiki
  • Subversion users can skip SVK and move straight onto git-svn with this tutorial.
  • People who are responsible for Subversion servers and are converting them to git in order to lay them down to die are advised to consider the one-off git-svnimport, which is useful for bespoke conversions where you don't necessarily want to leave SVN/CVS/etc breadcrumbs behind. I'll mention bespoke conversions at the end of the tutorial, and the sort of thing that you end up doing with them.
  • A lot of this tutorial is dedicated to advocacy, sadly necessary. Those who would rather just cut to the chase will probably want to skip straight to
  • Another way of looking at it is to say that it's really a content-addressable filesystem, used to track directory trees.
  • we've got a simple and efficient filesystem which competes with RevML but is XML free
  • Subversion added nothing to CVS' development model.
  • Yes, it's a bunch of small programs that do one thing and do it well, get over it, they're being unified
  • There's also a pure Java implementation.
  • I used to push strongly for SVK, but got brow-beaten by people who were getting far more out of their version control system than I knew possible until I saw what they were talking about.
  • SVK could easily use git as a backing filesystem and drop the dependency on Subversion altogether. So could bzr or hg.
  • Writing a tool to do something that you want is often quite a simple matter of plugging together a few core commands. It's simple enough that once a few basic concepts are there, you begin to feel comfortable knowing that the repository just can't wedge, changes can be discarded yet not lost unless you request them to be cleaned up, etc.
  • git is first and foremost a toolkit for writing VCS systems
  • The repository model (see right) is also simple enough that there are complete git re-implementations you can draw upon, in a variety of languages.
  • I really haven't seen a nicer tool than gitk for browsing a repository.
  • gitk does some really cool things but is most useful when looking at projects that have cottoned onto feature branches (see feature branches, below). If you're looking at a project where everyone commits largely unrelated changes to one branch it just ends up a straight line, and not very interesting.
  • You can easily publish your changes for others who are switched on to git to pull. At a stretch, you can just throw the .git directory on an HTTP server somewhere and publish the path.
  • There's the git-daemon for more efficient serving of repositories (at least, in terms of network use), and gitweb.cgi to provide a visualisation of a git repository.
  • With Subversion, everyone has to commit their changes back to the central wiki, I mean repository, to share them.
  • With Git (actually this is completely true for other distributed systems), it's trivial to push and pull changes between each other. If what you're pulling has common history then git will just pull the differences.
  • If the person publishes their repository as described above, using the git-daemon(1), http or anything else that you can get your kernel to map to its VFS, then you can set it up as a "remote" and pull from it
  • There is an awful lot less to keep in your head, and you don't have to do things like plan branching in advance.
  • Because you can easily separate your repositories into stable branches, temporary branches, etc, then you can easily set up programs that only let commits through if they meet criteria of your choosing.
  • Because you can readily work on branches without affecting the stable branch, it is perfectly acceptable for a stable branch to be updated by a single maintainer only
  • Some repositories, for instance the Linux kernel, run a policy of "no commit may break the build". What this means is that if you have a problem, you can use bisection to work out which patch introduced the bug.
  • You might use a continual integration server that is responsible for promoting branches to trunk should they pass the strictures that you set.
  • Most people say "but I don't want branches". But users of darcs report that they never knew how much they really did want branches until darcs made it so easy. In essence every change can behave as a branch, and this isn't painful.
  • Good feature branches mean you end up prototyping well-developed changes; the emphasis shifts away from making atomic commits. If you forgot to add a file, or made some other little mistake, it's easy to go back and change it. If you haven't even pushed your changes anywhere, that's not only fine, but appreciated by everyone involved. Review and revise before you push is the counter-balance to frequent commits.
  • Not only is the implementation fast locally, it's very network efficient, and the protocol for exchanging revisions is also very good at figuring out what needs to be transferred quickly. This is a huge difference - one repository hosted on Debian's Alioth SVN server took 2 days to synchronise because the protocol is so chatty. Now it fits in 3 megs and would not take that long to synchronise over a 150 baud modem.
  • Disk might be cheap, but my /home is always full - git has a separate step for compacting repositories, which means that delta compression can be far more effective. If you're a compression buff, think of it as having an arbitrarily sized window, because when delta compressing git is able to match strings anywhere else in the repository - not just the file which is the notional ancestor of the new revision. This space efficiency affects everything - the virtual memory footprint in your buffercache while mining information from the repository, how much data needs to be transferred during "push" and "pull" operations, and so on. Compare that to Subversion, which even when merging between branches is incapable of using the same space for the changes hitting the target branch. The results speak for themselves - I have observed an average of 10 to 1 space savings going from Subversion FSFS to git.
  • Perhaps somebody has already made a conversion of the project and put it somewhere
  • Each of those copies is fully independent, and can diverge freely. You can easily push and pull changes between them without tearing your hair out.
  • But people who use git are used to treating their repositories as a revision data warehouse which they use to mine useful information when they are trying to understand a codebase.
  • importing the whole repository from Subversion [a condensed command sketch of this workflow appears after this list]
  • git svn init
  • git svn fetch
  • If you like, you can skip early revisions using the -r option to git-svn fetch.
  • make a local branch for development
  • The name "foo" is completely private; it's just a local name you're assigning to the piece of work you're doing. Eventually you will learn to group related commits onto branches, called "topic branches", as described in the introduction.
  • Say you want to take a project, and work on it somewhere else in a different direction, you can just make a copy using cp or your favourite file manager. Contrast this with Subversion, where you have to fiddle around with branches/ paths, svn cp, svn switch, etc
  • The -l option to git-clone told git to hardlink the objects together, so not only are these two sharing the same repository but they can still be moved around independently. Cool. I now have two checkouts I can work with, build software in, etc.
  • Each time you have a new idea, make a new branch and work in that.
  • But anyway, that copying was too slow and heavy. We don't want to copy 70MB each time we want to work on a new idea. We want to create new branches at the drop of a hat. Maybe you don't want to copy the actual repository, just make another checkout. We can use git-clone again
  • git-svn fetch
  • But all that's a lot of work and most of the time I don't care to create lots of different directories for all my branches. I can just make a new branch and switch to it immediately with git-checkout:
  • Once you have some edits you want to commit, you can use git-commit to commit them. Nothing (not even file changes) gets committed by default; you'll probably find yourself using git-commit -a to get similar semantics to svn commit.
  • There is also a GUI for preparing commits in early (but entirely functional) stages of development.
  • People used to darcs or SVK's interactive commit will like to try git add -i
  • correcting changes in your local branch
  • If it's the top commit, you can just add --amend to your regular git-commit command to, well, amend the last commit. If you explored the git-gui interface, you might have noticed the "Amend Last Commit" switch as well.
  • You can also uncommit. The command for this is git-reset
  • HEAD~1 is a special syntax that means "one commit before the reference called HEAD". HEAD^ is a slightly shorter shorthand for the same thing. I could have also put a complete revision number, a partial (non-ambiguous) revision number, or something like remotes/trunk. See git-rev-parse(1) for the full list of ways in which you can specify revisions.
  • I sometimes write commands like gitk --all $(git-fsck | awk '/dangling commit/ {print $3}') to see all the commits in the repository, not just the ones with "post-it notes" (aka references) stuck to them.
  • keep your local branch up to date with Subversion
  • I've introduced a new command there - git-cherry-pick. This takes a commit and tries to copy its changes to the branch you've currently got checked out. This technique is called rebasing commits. There is also a git-rebase command which probably would have been fewer commands than the above. But that's my way.
  • Using Git opens the door to a bazaar of VCS tools rather than sacrificing your projects at the altar of one.
  • "Another" way to revise commits is to make a branch from the point a few commits ago, then make a new series of commits that is revised in the way that you want. This is the same scenario as before.
  • The recommended way to do this for people familiar with Subversion is to use git-svn rebase.
  • Note: before you do this, you should have a "clean" working tree - no local uncommitted changes. You can use git-stash (git 1.5.3+) to hide away local uncommitted changes for later.
  • This command is doing something similar to the above commands that used git-cherry-pick; it's copying the changes from one point on the revision tree to another
  • Better still is to bunch up your in-progress working copy changes into a set of unfinished commits, using git add -i (or git-gui / git-citool). Then try the rebase. You'll end up this time with more commits on top of the SVN tree than just one, so using Stacked Git you can "stg uncommit -n 4" (if you broke your changes into 4 commits), then use "stg pop" / "stg push" to wind around the stack (as well as "stg refresh" when finished making changes) to finish them - see
  • Once you grok that, you'll only need to use stg and git-svn fetch.
  • in my experience stg is the best tool for rebasing
  • Ok, so you've already gone and made the commits locally that you wanted to publish back to the Subversion server. Perhaps you've even made a collection of changes, revising each change to be clearly understandable, making a single small change well such that the entire series of changes can be easily reviewed by your fellow project contributors. It is now time to publish your changes back to Subversion. The command to use is git svn dcommit. The d stands for delta. [A sketch of this update-and-publish loop appears after this list.]
  • git-svn won't let the server merge revisions on the fly; if there were updates since you fetched / rebased, you'll have to do that again.
  • People are not used to this, thinking somehow that if somebody commits something to file A, then somebody else commits something to file B, the server should make a merged version with both changes, despite neither of the people committing actually having a working tree with both changes. This suffers from the same fundamental problem that darcs' patch calculus does - that just because patches apply 'cleanly' does not imply that they make sense - such a decision can only be automatically made with a dedicated continual integration (smoke) server.
  • This is normally what I use in preference to rebase.
  • This will merge all the commits that aren't in your ancestry, but are in the ancestry of the branch trunk (try setting rightmost drop-down in gitk to 'ancestor' and clicking around to get a feel for what this means), and make a new commit which has two parents - your old HEAD, and whatever commit trunk is up to.
  • there are many shortcomings in git.
  • Sadly, this model is in use by virtually every Subversion hosted project out there. And that is going to be hard to undo.
  • It is possible to use git in this way (see the figure to the right) - but it's not trivial, and not default. In fact git itself is developed in this way, using feature branches, aka topic branches.
  • (Figure captions from the article) Left: what darcs thinks when you start committing without marking tag points. Right: Subversion has a somewhat smaller brain...
  • bzr comes with some great utilities like the Patch Queue Manager which helps show you your feature branches. With PQM, you just create a branch with a description of what you're trying to do, make it work against the version that you branched off, and then you're done. The branch can be updated to reflect changes in trunk, and eventually merged and closed.
  • Windows support is good. Consistent implementation. Experience with the distributed development model. Friendly and approachable author and core team.
  • Actually the models of git and bzr are similar enough that bzr could be fitted atop of the git repository model
  • Mercurial is missing the lightweight branches that make git so powerful, and there is no content hashing, so it doesn't really do the whole "revision protocol" thing like git.
  • If you're on Windows it's probably a lot easier to get going.
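
    Pulling the scattered commands above together, here is a condensed sketch of the import-and-branch workflow; the URL, directory and branch names are placeholders, not taken from the article.

      mkdir project && cd project
      git svn init http://svn.example.org/project/trunk
      git svn fetch                    # replay the Subversion history (add -r <rev> to skip early revisions)

      git checkout -b foo              # a private topic branch; the name is purely local
      $EDITOR some-file.c
      git commit -a -m 'first cut'     # -a gives roughly the semantics of svn commit

      git commit -a --amend            # fold a follow-up fix into the last commit
      git reset HEAD~1                 # or uncommit it entirely, keeping the edits in the working tree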
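
    And a sketch of the update-and-publish loop (rebase, then dcommit) that several of the annotations describe. It assumes the Subversion tracking ref is called remotes/trunk, which depends on how git svn init was run.

      git stash                        # park uncommitted edits (git 1.5.3+); restore them later with git stash apply
      git svn fetch                    # bring down new Subversion revisions
      git svn rebase                   # replay local commits on top of the updated trunk
      git stash apply

      # ...polish the series with git add -i, git commit --amend, or stg, as described above...

      git svn dcommit                  # replay each local commit into Subversion
      # if the server gained revisions since the last rebase, rebase again and re-run dcommit;
      # git-svn will not ask the server to merge on the fly

      git merge remotes/trunk          # the merge-based alternative the author prefers to rebasing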

'Re: clarification on git, central repositories and commit access lists' - MARC

  • Another option is to look at git-svnserver which would allow a git repository backbone, but could talk svn over the wire which these tools could use...

InfoQ: Distributed Version Control Systems: A Not-So-Quick Guide Through

  • A major reason is that branching is easy but merging is a pain
  • Subversion has no history-aware merge capability, forcing its users to manually track exactly which revisions have been merged between branches, which is error-prone.
  • No way to push changes to another user (without submitting to the Central Server).
  • Subversion fails to merge changes when files or directories are renamed.
  • Each working copy is effectively a remote backup of the codebase and change history, providing natural security against data loss.
  • Experimental branches – creating and destroying branches are simple and fast operations. Collaboration between peers is made easy.
  • have a look at

A tour of git: the basics

  • it's useful to use "git clone" even when just making a local copy of a repository. Using "git clone" will be much faster and will use much less space than a normal copy.
  • A good commit message will generally have a single line that summarizes the commit, a blank line, and then one or more paragraphs with supporting detail. Since many tools only print the first line of a commit message by default, it’s important that the first line stands alone.
  • Note that we didn't use "commit -a" this time. This means that "git commit --amend" will amend only the commit message and not any of the actual files being tracked, (even if some of them had been modified between the commits). It's also possible to use "git commit -a --amend" to similarly fix up mistakes noticed in code. That will replace the most recent commit with a different commit based on any new changes to files.
  • Sometimes you may not know if you want to pull in the changes from the remote repository or not. It's useful to be able to examine them before accepting them into our branch. The "git pull" command shown in the previous section is conceptually the combination of two commands, "git fetch" and "git merge". We can use these commands separately to examine the change before accepting it. [A consolidated sketch of these steps appears after this list.]
  • The most convenient way to examine the fetched changes is with the "master..origin" range notation
  • You'll notice that we've been seeing the phrase "fast forward" several times. This is a special-case operation performed by "git merge" where a branch can be advanced along a linear sequence. This happens whenever you pull changes that build directly on top of the same commit you have as your most recent commit. In other words, there was never any divergence or simultaneous commits created in parallel in multiple repositories. If there had been parallel commits, then "git merge" would actually introduce a new merge commit to tie the two commits together.
  • If you have a situation where you want to pull a single time from some repository, then you can simply give the path or URL of the repository on the "git pull" command line. However, it's often the case that if you want to pull changes from a repository once, you'll want to pull changes from that same repository again in the future. This is where the "git remote" notion is extremely useful---it allows you to associate simple names (and behaviors) with remote repository URLs. We've already seen one instance of "git remote", which is the creation of the "origin" remote that happens automatically during "git clone". Let's now create another. Let's assume you are going to be working in the hello-remote repository and you'd like to pull changes from the hello-pull repository, where your friend "fred" has been making changes. Here's how to set up the new remote:
  • So that's a "git remote add" command line followed by an arbitrary name you'd like for the new remote (fred) and the URL of the remote (../hello-pull). Obviously, the URL could be a git:// URL or any other git-supported URL in addition to a local path.
  • The "git remote" command is really just a helper for adding some entries to the .git/config file. You might find it more convenient to edit that file directly once you get comfortable with things.
  • At this point the name "fred" will work much like the name "origin" has worked in previous examples. For example, we can fetch the changes fred has made with "git fetch fred":
  • We can also list all known remote-tracking branches with "git branch -r":
  • These remote-tracking branches make it very easy to collaborate with people as they are working on experimental features not yet ready for upstream inclusion. For example, if fred's latest code is still trashing filesystems then he might not want to push it out to the project's primary repository. But he may still want my help with it. So he can push it to a branch in his own repository for which I've got a remote. Then on my next "git fetch fred" I might notice a new branch called fred/trashes-filesystems and I can examine his code with a command such as "git log ..fred/trashes-filesystems".
  • Now, generally the purpose of pushing to a repository is to have some "collaboration point" where potentially multiple people might be pushing or pulling. Because there might be multiple people pushing into the repository at any point, it wouldn't make sense to have a working-directory associated with this repository. For this, git has the notion of a "bare" repository, which is simply a repository with no working directory. Let's create a new bare repository and push some changes into it:
  • git clone hello hello-clone
  • git remote add fred ../hello-pull
  • git branch -r
  • git fetch fred
  • git --bare init --shared
  • The --shared option sets up the necessary group file permissions so that other users in my group will be able to push into this repository as well.
  • Now let's return to our hello repository and push some changes to this new repository. Since this is our very first push into this repository, we need to tell git which branches to push. The easiest way to do this is to use --all to indicate all branches:
  • git push ../hello-bare --all
  • For subsequent pushes we don't need to specify --all as "git push" by default pushes all branches that exist in both the local and remote repositories.
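
    A consolidated run of the examine-then-merge and publish steps above, using the tutorial's repository and remote names (hello, hello-pull, hello-remote, hello-bare, fred); the fred/master ref assumes the usual remote-tracking layout.

      # in hello-remote, where the remote "fred" points at ../hello-pull
      git fetch fred                   # update the fred/* remote-tracking branches
      git log master..fred/master      # examine what fred has that we don't
      git merge fred/master            # accept it; a fast-forward if history never diverged

      # publishing: a bare, group-writable repository as the shared collaboration point
      cd ..
      mkdir hello-bare && cd hello-bare && git --bare init --shared
      cd ../hello
      git push ../hello-bare --all     # the very first push must name the branches
      git push ../hello-bare           # later pushes send the branches common to both ends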