Category Archives: Open Source

Mercurial vs. Git vs. Bazaar: The aftermath

Over the last years, the version control system community has fought, what some people would call the “VCS war”. People were arguing on IRC, conferences, mailinglists, they wrote blog posts and upvoted HN articles about which was the best version control system out there. The term “VCS war” is borrowed from the “Editor wars”. A constant fight which people argue which of the major editors VIM, or Emacs and later TextMate or Sublime, or again Vim and Emacs is the best editor. It is similar to programming language discussions, shell environments, window manager and so on and so forth. What they all have in common is that they are tools that are used daily by software engineers, and therefore a lot of people have an opinion on it.

When in 2005 both Git and Mercurial were released and Bazaar followed shortly after, the fight who is the best system of the three started. While early Mercurial versions seemed to be much easier to use than Git, Git was already used in the Linux kernel and built up a strong opinion. Things were even till 2008 Github launched and changed the OpenSource world and is what people would consider Git’s “killer app”. Mercurial’s equivalent Bitbucket never reached the popularity of Github. But people were still writing articles, arguing about merging and rebasing, arguing about performance and abilities to rewrite history, wrote long blog posts about confusing branching strategies. Those were complicated enough that they had to write helper tools, about which they could write articles again….and so on and so forth.

Recently things have become quiet. But why is that? What happend to Git, Mercurial and Bazaar?

Bazaar

I haven’t followed bazaars history much. It’s most notable users were MySQL and Ubuntu. In the early development bazaar lacked performance and couldn’t keep up with Git and Mercurial. It tried to solve this by changing the on-disk format a few times, requiring their users to upgrade their servers and clients. The development was mostly driven by Canonical and they had a hard time reaching out for more active developers. In the end there isn’t much to say about Bazaar. It development slowly deceased and it’s been widely considered the big looser of the VCS wars. Bazaar is dead.

Mercurial

Mercurial started out for the very same reason Git was created and was developed at the same time Linux wrote Git. They both had a fast growing active development group and were equally used in the first years. While Git was the “faster” decentralized version control system, Mercurial was widely considered the more user-friendly system. Nevertheless with the rise of Github, Mercurial lost traction. However the development continued and while more and more people used Git and Github, the Mercurial community worked on some new ideas. Python picked it as it’s version control system in 2012 and Facebook made moved to Mercurial in 2013. So what’s so interesting about Mercurial?

  1. Mercurial is extensible: It’s written mostly in Python and has a powerful extension API. Writing a proof of concept of a new backend or adding additional data that is transferred on cloned is fairly easy. This is a big win for the Python or the Mozilla community that makes it easy for them to adapt Mercurial to their needs.
  2. Mercurial caught up on Git features and performance: Mercurial added “bookmarks”, “rebase” and various other commands to it’s core functionality and constantly improved performance.
  3. Mercurial has new ideas: Mercurial came up with three brilliant ideas in the last 3 years. They first introduced a query language called “revsets” which helps you to easily query the commit graph. Second, they introduced “phases”. A barrier that prevents user from accidentally changing or rebsaing already published changesets – a common mistake for Git users. And last, but not least Evolution Changeset, a experimental feature that helps you to safely modify history and keep track of the history of a changing commit.

So while Mercurial is certainly not the winner, it found a niche with a healthy and enthusiastic community. It’s worth a shot trying it if you are not 100% happy with Git.

Git

The big winner obviously is Git. With the introduction of Github pushed Git usage. Github’s easy to approach fork&merge mechanism revolutionized OpenSource development to a point where most younger projects don’t use mailinglists anymore but rather rely on pull-requests and discussion on Github issues. Github’s feature and community is attractive enough for people to learn git. In addition, Git had a healthy and vocal community creating blog posts, introduction videos and detailed technical explanations. Noways Git market share is big enough that companies move from Subversion to Git because a new hire will more likely know Git than any other version control system (maybe SVN). As an open source developer, there is no way around Git anymore. Moreover the development is going on in rapid pace and the community constantly improves performance and is slowly reaching the v2.0 milestone. It’s yet to be seen if they are going to port some of the ideas from Git. A major challenge for Git however, still, is to deal with large repositories, something that at least the Mercurial community has partly solved. If you haven’t learned it, learn it, there isn’t going to be a way around it anyway – deal with it.

A conclusion

The war is over, and we are all back on working on interesting features with our favorite Version Control System. Nobody needs to write blog posts anymore which system is better and you certainly won’t be able to circumvent Git entirely.

Play around: BGP and the DN42

As far as I am concerned, networking is one of the most fascinating aspects of computing. Connecting people and systems sounds like an easy problem to solve, but looking into the details and the scale of something like the internet, shows that networking is far from easy. While most developers and administrators understand the basics of local IP routing and maybe even OSPF, not many understand how global scale, carrier focused networking works. To understand how the internet works one has to understand routing. To understanding routing on a global, internet scale level, one has to understand the exterior Border Gateway Protocol (eBGP).

Now with our local setup’s, BGP isn’t really something that we use on a daily base (unless you work on the DE-CIX, ASM-IX or at Level3). We need to a bigger network to learn about the details. While we are obviously not being able to learn about BGP on the real internet, we want to build something similiar to the internet to help us learn and hack on this stuff.

welcome to the dn42

The dn42 is a darknet build by members of the German Chaos Computer Club. It connects people on a ISP level and somewhat replicate common internet services like a registry, anycast DNS, whois, etc. The project aims to facilitate understanding how the internet works and build up a darknet at the same time.

The dn42 uses the 172.22.0.0/15 address range out of IANAs reserved, non-publicly routed subnets. A participant of the dn42 usually allocates a /24 from this 172.22.0.0/15 pool by entering the necessary information into the registry. The registry is build similar to the RIPE/ANIC registry. Once a subnet is allocated, a participant has to start peering and announcing the network. The participant has a role similar to a e.g. an ISP in the internet, who announces his allocated subnet to the internet. Just like ISPs and Tier3 up to Tier1 carriers, participants in the DN42 talk eBGP with each other over a secured line.

how the internet works (for dummies and simple..blabla)

So how does your computer know how to get to youtube.com? Your PC got a gateway address from your ISP, so your computer is simply sending everything he doesn’t know how to route to the gateway. But how does the ISP know where the youtube.com computer is? The internet folks thought about this for a very long time before they came up with “BGP”. In BGP, an internet router advertises which routes he can handle himself to his neighbors. For example the German Telekom asks the European Internet Registry for a subnet. Once it got his subnet allocated, their routers start to announce to their peering partners (e.g. French Telecom, Austrian Telecom, Google, etc) that they are now capable of routing the subnet. The neighbors itself announce this new route to their neighbors and so on. Routes are handled by autonomous systems (AS). The German Telekom owns an autonomous system and within the autonomous system, they do whatever routing necessary to get to the right computer. For other autonomous systems like French Telecoms all that matters is that subnet W.X.Y.Z/NN is handled by the German Telekom AS, so they know to sent a package designated for that subnet to the German Telecom AS and don’t have to deal with it any further.. So if the French Telecom sees a package in their network that they know the German Telekom can handle, they are just forwarding it to it, and be done with it.

In DN42 this the network works just like that, just in small scale. You are a big ISP. You have an AS number, a number of peering AS and you announce the subnet that you can route. All autonomous systems will deliver packages for that subnet to you. It’s your responsibility to route them properly. In addition if you can reach an AS faster (in less hops) than another, your peers will start sending your packages to forward to the appropriate AS. So as an ISP you handle traffic designated for you, but also forward packages to others AS if necessary.

your first subnet

So you allocated your new /24 subnet. As an owner of the fresh subnet, one has to start announcing the subnet within the dn42 by telling all other participants in the network that your router is responsible for routing the whole subnet, therefore “announcing” the particular route to the network. Like in the internet, this is done using the exterior Border Gateway Protocol (eBGP). Common software implementations for small scale purposes like the DN42 are Bird and Quagga. To establish a first connection the network, a participant will need a trusted peer that is willing to start peering with him. The connection to the peer is established using a VPN tunnel, usually OpenVPN or IPsec. Once the tunnel is established the participant configures his BIRD to start announcing the his subnet (172.23.217.0/24):

# Configure logging
log syslog { debug, trace, info, remote, warning, error, auth, fatal, bug };

# Override router ID
router id 172.23.217.1;

# This pseudo-protocol performs synchronization between BIRD's routing
# tables and the kernel. If your kernel supports multiple routing tables
# (as Linux 2.2.x does), you can run multiple instances of the kernel
# protocol and synchronize different kernel tables with different BIRD tables.
protocol kernel {
	scan time 20;		# Scan kernel routing table every 20 seconds
	import all;
	export where source != RTS_STATIC;
}

# This pseudo-protocol watches all interface up/down events.
protocol device {
	scan time 10;		# Scan interfaces every 10 seconds
}

Moving again

It’s been a long time since we moved to a new server, but now it’s finally moving time again. This means, no longer we are running Solaris but now we are back on Linux with KVM virtualisation.

There isn’t much to say about it, we are running standard KVM hosts with virtio for both network and block device access. Disks are mapped to LVM partitions on the host and we use Kernel Samepage Merging to overcommit memory. That being said, Solaris served us very well over the years and we are going to miss some of the Solaris capabilities. Zones + ZFS + Crossbow really worked well for us and it was really fun to work with a system that create bootenviroments and snapshots before upading.

It was easy to manage, easy get reallocate memory, space and create complex virtual networks. In the end we moved away not because we think Linux is better, but because there is much more software available for Linux (particularly some experimental stuff I want to get my hands on) and because Solaris went closed source again. It was too hard to get updates and up-to-date software in the end.

Anyway..this is just a short heads up that I am still blogging. More stuff coming up soon.

Probing PHP with Systemtap on Linux

DTrace is a dynamic tracing tool build by Sun Microsystems and is available for Solaris, MacOS and FreeBSD. It features a tracing language which can be used to probe certain “probing” points in kernel or userland. This can be very useful to gather statistics, etc. Linux comes with a separate solution called systemtap. It also features a tracing language and can probe both userland and kernel space. A few Linux distributions such as Fedora enable systemtap in their default kernel.

PHP introduced DTrace support with PHP 5.3, enabling probing points in the PHP executable that can be used to simplify probing of PHP applications without having to the PHP implementation details. We enabled probes on function calls, file compilation, exceptions and errors. But this has always been limited to the operating systems that support DTrace. With the popularity of DTrace, Systemap programmers decided to add a DTrace compatibility layer that allows to use DTrace probes as Systemtap probing points as well.

With my recent commit to the PHP 5.5 branch, we allow DTrace probes to be build on Linux, so people can use Systemtap to probe those userland probes.

To compile PHP with userland probes you need to obtain the PHP 5.5 from git:

$ git clone git://github.com/php/php-src php-src
$ cd php-src
$ git checkout PHP-5.5
Now build PHP with DTrace support. First we have to rebuild configure as we build directly from the repository. Make sure your Linux distribution comes with systemtap and uprobes support.

$ ./buildconf --force
$ ./configure --disable-all --enable-dtrace
$ make
After being done with building we can see if we found any probes:

$ stap -l 'process.provider("php").mark("*")' -c 'sapi/cli/php -i'
process("sapi/cli/php").provider("php").mark("compile__file__entry")
process("sapi/cli/php").provider("php").mark("compile__file__return")
process("sapi/cli/php").provider("php").mark("error")
process("sapi/cli/php").provider("php").mark("exception__caught")
process("sapi/cli/php").provider("php").mark("exception__thrown")
process("sapi/cli/php").provider("php").mark("execute__entry")
process("sapi/cli/php").provider("php").mark("execute__return")
process("sapi/cli/php").provider("php").mark("function__entry")
process("sapi/cli/php").provider("php").mark("function__return")
process("sapi/cli/php").provider("php").mark("request__shutdown")
process("sapi/cli/php").provider("php").mark("request__startup")
Let’s build us a short Systemtap script that counts the function calls of a specific function. we use the function-return and function-entry probes for that:

$ cat request.stp
global callcount;
probe process.provider("php").mark("function-entry") {
    callcount[user_string($arg1)] += 1;
}
probe end {
    printf("count : function\n");
    foreach (name in callcount) {
        printf("%5d : %s\n", callcount[name], name);
    }
}
$ sudo stap -c 'sapi/cli/php test.php' request.stp
count : function
  100 : foo
  101 : bar

So that’s all. You can use systemtap now to probe your PHP. Hope you come up with some useful scripts. Share them!

Bookmarks Revisited Part II: Daily Bookmarking

It’s been a long time since I’ve written part I of the bookmarks revisited series. In the last two years, bookmarks changed a lot. They became part Mercurial’s core functionality and a lot of of tools became bookmark aware.

The current state of bookmarks

As of Mercurial 1.8 bookmarks are part of the Mercurials core. You don’t have to activate the extension anymore. Bookmarks are supported by every major Mercurial hosting platform. Commands like hg summary or hd id will display bookmark information. In addition, the push and pull mechanism changed. I will go into details about his Part III of the series.It’s safe to say, due to it’s exposure, bookmarks became much more mature of the years. It’s time to take a look at how to use them cheap water slides.

Bookmark semantics

Bookmarks are pointers to commits. Think of it as a name for a specific commit. Unlike branches in Mercurial, bookmarks are not recorded in the changeset. They don’t have a history. If you delete them, they will be gone forever.Bookmarks were initially designed for short living branches. I use them as such. It’s indeed possible to use them in different contexts, but I don’t do that. Please be aware, although they were initially intended to be similar to git branches, they often aren’t. They are not branches, they are bookmarks and they should be used like you would use a bookmark in a book. If you advance to the next site, you move the bookmark (or it gets moved).

A bookmark can be active. Only one bookmark can be active at any time, but it’s okay that no bookmark is active. If you have an active bookmark and you commit a new changeset, the bookmark will be moved to the commit. To set a bookmark active you have to update to the bookmark with hg update <name>. To unset, just update to the current revision with hg update ..

A bookmark can have a diverged markers. Bookmarks that are diverged will have a @NAME suffix. For example test@default. Diverged bookmarks are created during push and pull and will be described in Part III.

Continue reading

Xorg: Different options for different keyboards

So I have this esoteric problem that I have 2 totally different keyboards. The Happy Hacking Pro 2 (HHK) and the Realforce 103U. The happy hacking has a special, SUN inspired layout with the control key where standard keyboards usually have their caps. My Realforce 103U has a standard US layout. I am big fan of the old SUN layout and cannot type on keyboards that have the CTRL key on the usual position.

The problem: If I plug in my Realforce, I want to have the CAPS remapped to CTRL. If I plug in my Happy Hacking, it should stay the way it is!
Solution: Xorg Udev Matching

So in recent Xorg versions you can use udev matchings to select the options for a particular keyboard. On my Fedora 16, I added the following file:

$ vim /etc/xorg.conf.d/01-realforce.conf
Section "InputClass"                                                                                             
  Identifier	"Realforce"                                          
  MatchProduct 	"Realforce 103U"
  Option	"XkbLayout"	"us,de"                                              
  Option	"XkbOptions"	"grp:menu_toggle,ctrl:swapcaps"
EndSection

Done. If I plug in my Realforce I have the ctrl and caps key swapped!

Bonus: As the win key on my HHK is right of the alt key I better switch ALT and WIN on my Realforce, too:

Option "XkbOptions" "grp:menu_toggle,ctrl:swapcaps,altwin:swap_lalt_lwin"

Canonical Way to Build PHP 5.4 on Solaris 11


You need gnu-coreutils installed.
$ wget -O php.tar.bz2 http://us.php.net/get/php-5.4.3.tar.bz2/from/this/mirror
$ tar xvjf php.tar.bz2
$ cd php-5.4.3
$ ./configure \
--with-apxs2=/usr/apache2/2.2/bin/apxs \
--prefix=/usr/php/5.4 \
[other options]
$ gsed -ibak 's,\-mt,,' Makefile
$ gsed -i.bak 's,\-i \-a \-n php5 libphp5\.la,-i -n php5 libphp5.la,' Makefile
$ make -j4
$ sudo make install
$ vim /etc/apache2/2.2/conf.d/php5.2.conf
..change stuff to libphp5.la..
$ svcadm restart apache22

Worked for me so far.

Removing a directory from a git repository the fast way

Note to myself:
To remove a directory from an existing git repository there are various ways to do it. The obvious way is

$ git filter-branch --tree-filter 'rm -rf directory/'

Which is just fine for smaller repositories but can take a long time on large repositories with a lot of large files in that directory.
The faster way is to manipulate only the index:

git filter-branch --index-filter 'git ls-files -- DIRECTORY | xargs git update-index --remove' --tag-name-filter cat --prune-empty -f -- --all;

  • git ls-files will give you a list of files in the DIRECTORY
  • git update-index will remove those files from the index
  • And the tag filter is there to include tags
  • –prune-empty tells filter-branch to ignore empty commits.

Done.