Monthly Archives: December 2007

Planet of GIT vs other SCMs

While searching for blogs that feature articles about GIT, I stumbled over a few nice ones, like Ted Ts’o’s article about GIT vs HG as well as John Goerzen’s article. Furthermore, Mika wrote a nice git and svn tutorial. The people from dopefreshtightblog have written a small PDF about git (in German). Meanwhile, Jeremy keeps you up to date with the ongoing effort of Johannes Schindelin to get GIT working on Windows without Cygwin.

Why (not) GIT

People reading my blog (I don’t know if there are any. If you read my blog, just drop me a comment!) must think that I’m something of a GIT zealot. Well, sometimes I like git and sometimes I hate it with a passion. Why I hate it:

/> git
zsh: do you wish to see all 141 possibilities (47 lines)?

Git offers 141 commands, low-level as well as high-level ones (did you ever need git-get-tar-commit-id?). I often know that some workflow is possible in GIT, but I don’t know the exact commands. So I end up asking in #git (at least they are friendly and you get your answer fast). That’s pretty annoying. It’s not straightforward even if you have used it for a while. I heard that Mercurial is better in that respect, but at the moment I’m used to git. Well, at least I’m giving hg a try for managing my Debian package for gc-utils.

Use GIT to help you deal with CVS

A lot of people have to deal with CVS in their companies or in Open Source projects, so they all know one annoying problem:
You are working on a huge change, introducing a completely new authentication mechanism, and you have to touch a lot of classes. In the meantime, other developers also have to change parts of the code you are touching, because they have to fix bugs or introduce other features. You cannot commit your stuff to CVS every day until it’s finished, as it would break the codebase and nobody would be able to compile the code anymore. So you just wait until you are finished. In those 7 days of coding you don’t do any commits, but a lot of code was changed by others! So when you try to commit, you get a huge list of conflicts. It’s really a mess! Let’s drop that and do it nicely using GIT.

Import the CVS HEAD into the master branch of a new git repository using gc-utils.

gc-import ext:foo@example.com:/repository myProj
cd myProj

Now, create your own branch. We will call it exp. In that branch you will introduce your new groundbreaking feature.

git checkout -b exp master

Let’s start coding your new authentication mechanism. Just use your standard workflow: change, commit, change, commit. Meanwhile, track the changes from the CVS repository by switching back to the master branch and running gc-update. Merging master into your personal branch then keeps it up to date with the CVS HEAD, and you can adapt your feature to API changes made there.

# edit …myfanynewauth
# commit
# edit
# commit
git checkout master
gc-update
git checkout exp
git merge master
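The loop above can be tried out without a CVS server. The following is only a rough sketch using plain git: the commit made directly on master stands in for whatever gc-update would bring in from CVS, and the file names (api.txt, auth.txt) and commit messages are made up for illustration.

```shell
#!/bin/sh
# Sketch: simulate the exp/master workflow with plain git only.
# Assumption: the commit made directly on master plays the role of gc-update.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo"
git config user.email "demo@example.com"

# State as it would look right after the import
echo "api v1" > api.txt
git add api.txt
git commit -q -m "initial import (stands in for gc-import)"

# Your personal branch with the new feature
git checkout -q -b exp master
echo "new auth" > auth.txt
git add auth.txt
git commit -q -m "exp: add new auth mechanism"

# Meanwhile somebody else changes the code upstream
git checkout -q master
echo "api v2" > api.txt
git commit -q -a -m "upstream: API change (stands in for gc-update)"

# Bring the upstream change into the personal branch
git checkout -q exp
git merge -q -m "merge master into exp" master

ls   # exp now contains both the upstream change and the new feature
```

Because the two sides touched different files, the merge goes through without conflicts. With overlapping edits you would resolve the conflicts once here, in your own branch, instead of at commit time in CVS.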

After you have finished your superb new feature, just merge your personal branch (exp) back into master, the branch that tracks CVS. You don’t have to do this step. You can also pick your changes from exp directly and commit them if you want, because you are already up to date with CVS due to your merge from master (see code above).

git checkout master
git merge exp
git log   # pick out your changes
gc-commit -c a2334bc fe2346ab ….
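Picking your own commits out of git log by eye gets tedious. One possible shortcut, sketched below with plain git, is to list them before merging exp into master: the range master..exp selects everything reachable from exp but not from master, and --no-merges hides the merge commits. The empty commits and their messages are made up just to build a small history; how the resulting hashes are handed to gc-commit stays as shown above.

```shell
#!/bin/sh
# Sketch: list only your own commits on exp (hypothetical demo history).
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "cvs: imported state"
git checkout -q -b exp
git commit -q --allow-empty -m "mine: auth step 1"
git commit -q --allow-empty -m "mine: auth step 2"
git checkout -q master
git commit -q --allow-empty -m "cvs: fix by someone else"
git checkout -q exp
git merge -q -m "merge master into exp" master

# Commits on exp that master does not have, oldest first, merge commits hidden
git log --no-merges --reverse --format='%h %s' master..exp

# The same selection as a plain count
count=$(git rev-list --no-merges --count master..exp)
echo "$count commits to hand over"
```

The abbreviated hashes in the first column are the ones you would otherwise search for manually.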

GIT makes things a little bit easier. You can commit, revert and diff every change during development and nobody notices, as it happens only in GIT, not in CVS.

Another example
Assuming your CVS HEAD is broken, just import it into git. Go back to the version that worked, then branch and work in your personal branch.

gc-import ext:foo@example.com:/repository myProj

Then check out the version that worked:

git log
# perfect, version a42bba seems to work
git checkout -b working a42bba

Start working on your branch. After you have finished your work, just merge it back into the master branch, which tracks your CVS.

git checkout master
gc-update
git merge working
git log
# search for your personal commits
gc-commit -c faab346 ab56dd2e ….

GIT in companies

CVS is probably the most widely used version control system. A lot of companies and open source projects use it daily. In fact, most companies prefer to switch over to Subversion instead of a decentralized VCS like GIT or Mercurial.
Why? They often argue like this:

  • I don’t want to have all the logs/revisions on every PC of every developer, it’s not safe enough. Well, this is obviously not a good reason. Anyone with access to the CVS can already just run git-cvsimport once and have everything from the CVS in their own repository.
  • They want to use CVS/SVN as a backup service too. Same here: usually companies work with a central repository. You can do this in GIT and Mercurial too, by using a shared repository that everybody pushes into. Even better, you can allow only the project manager or the security officer to push changes. He is then responsible for broken commits in the shared repository, so he’ll make sure that every commit is good enough, which helps you to improve the quality of your software.
  • DVCS are different from centralized VCS. It’s too hard to migrate and to train your employees. That’s probably a good argument. Git is not easy to deal with. Mercurial seems to be much easier, but its workflow is not 100% like that of a centralized VCS. So it’s easier to switch over to Subversion.
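To make the shared-repository point a bit more concrete, here is a small sketch with plain git. It assumes everything lives in one temporary directory on a single machine. The names shared.git, alice and bob are hypothetical, and restricting who may push (the project manager idea) is a matter of server permissions or hooks that is not shown here.

```shell
#!/bin/sh
# Sketch: a central repository that everybody pushes into (all names are made up).
set -e
work=$(mktemp -d)
cd "$work"

# The shared repository has no working tree, it only receives pushes
git init -q --bare shared.git
git --git-dir=shared.git symbolic-ref HEAD refs/heads/master

# First developer clones the (still empty) repository, commits and pushes
git clone -q shared.git alice 2>/dev/null   # hides the "empty repository" warning
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "hello" > README
git add README
git commit -q -m "first commit"
git push -q origin HEAD:refs/heads/master
cd ..

# Second developer clones and gets the full history, just like a checkout
git clone -q shared.git bob
cat bob/README
```

The shared repository doubles as the backup: every clone carries the complete history as well, so losing the central machine does not lose the project.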

Interesting gzip behaviour

Sometimes strange things happen that turn out to be kind of obvious once you look deeply into the problem.

Just take a look at the following code:

$ cd /tmp
$ mkdir -p foo/bar/
$ touch foo/bar/test
$ touch foo/test
$ for ((i=0;$i<5;i=$i+1)); do tar czf foo.tar.gz foo/ && md5sum foo.tar.gz; sleep 1; done
ba29a144a1f1e61aa3e581ef850cb1ec foo.tar.gz
7790fb2d6b1b9b77c101ce5d4fe63b94 foo.tar.gz
5d8f854f4725022b12d058ff7468e38b foo.tar.gz
b7292d286edcedd7d7f02fce16c6098e foo.tar.gz
7945ea06cd8085454898652db4579506 foo.tar.gz

gzip produces a different output file on every run, even though the input has not changed. Therefore an automatic tarballing script that creates a .tar.gz from a directory every hour always ends up with different md5sums for the tarballs. This might break some software (like dpkg) which checks md5sums of upstream sources.

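Pretty annoying behaviour, but I figured out what the problem is. The gzip header contains an MTIME field. Since tar pipes the archive into gzip, the data does not come from a file, and in that case MTIME is set to the time at which compression started.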
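If you want to see the field yourself: according to the gzip file format (RFC 1952), MTIME sits in bytes 4 to 7 of the stream as a little-endian Unix timestamp. A quick way to look at it is od, as in this small sketch. It compresses the dummy string hello from a pipe, and it also uses gzip’s -n option, which tells gzip not to store name and timestamp.

```shell
#!/bin/sh
# Sketch: dump the 4 MTIME bytes (offset 4..7) of a gzip stream as decimal numbers.

# Compressed from a pipe without -n: the bytes encode a timestamp
printf 'hello' | gzip -c | od -An -tu1 -j4 -N4

# With -n no timestamp is stored, so all four bytes are zero
mtime_n=$(printf 'hello' | gzip -n -c | od -An -tu1 -j4 -N4 | xargs)
echo "$mtime_n"
```

The xargs at the end only squeezes the whitespace that od puts around the numbers.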

bzip2 doesn’t have such a time field, so it creates archives as expected, always with the same checksum.

$ for ((i=0;$i<5;i=$i+1)); do tar cjf foo.tar.bz2 foo/ && md5sum foo.tar.bz2; sleep 1; done
4a71c3031a58650ac694e95d207af779 foo.tar.bz2
4a71c3031a58650ac694e95d207af779 foo.tar.bz2
4a71c3031a58650ac694e95d207af779 foo.tar.bz2
4a71c3031a58650ac694e95d207af779 foo.tar.bz2
4a71c3031a58650ac694e95d207af779 foo.tar.bz2
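If you have to stay with .tar.gz, one possible workaround (not something I checked against dpkg, just a sketch) is to run gzip yourself with -n, so that no timestamp ends up in the header. This assumes that the files in foo/ are not modified between the runs, because the tar archive itself still records their modification times.

```shell
#!/bin/sh
# Sketch: build the same .tar.gz twice with gzip -n and compare the checksums.
set -e
cd "$(mktemp -d)"
mkdir -p foo/bar
touch foo/bar/test foo/test

tar cf - foo | gzip -n > a.tar.gz
sleep 1   # make sure the two runs happen at different times
tar cf - foo | gzip -n > b.tar.gz

md5sum a.tar.gz b.tar.gz

# Keep only the checksums for an automatic comparison
sum_a=$(md5sum < a.tar.gz)
sum_b=$(md5sum < b.tar.gz)
[ "$sum_a" = "$sum_b" ] && echo "identical"
```

Both tarballs should now get the same md5sum, just like in the bzip2 case.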