Probing PHP with Systemtap on Linux

DTrace is a dynamic tracing tool built by Sun Microsystems and available for Solaris, MacOS and FreeBSD. It features a tracing language that can be used to instrument certain probe points in the kernel or in userland, which is very useful for gathering statistics. Linux comes with a separate solution called Systemtap. It also features a tracing language and can probe both userland and kernel space. A few Linux distributions, such as Fedora, enable Systemtap support in their default kernel.

PHP introduced DTrace support with PHP 5.3, adding probe points to the PHP executable that simplify probing of PHP applications without having to know the PHP implementation details. We enabled probes on function calls, file compilation, exceptions and errors. But this has always been limited to the operating systems that support DTrace. With the popularity of DTrace, the Systemtap developers decided to add a DTrace compatibility layer that allows DTrace probes to be used as Systemtap probe points as well.

With my recent commit to the PHP 5.5 branch, we allow DTrace probes to be built on Linux, so people can use Systemtap to probe those userland probes.

To compile PHP with userland probes you need to obtain the PHP 5.5 branch from git:

$ git clone git://github.com/php/php-src php-src
$ cd php-src
$ git checkout PHP-5.5
Now build PHP with DTrace support. First we have to rebuild configure as we build directly from the repository. Make sure your Linux distribution comes with systemtap and uprobes support.

$ ./buildconf --force
$ ./configure --disable-all --enable-dtrace
$ make
Once the build has finished, we can check whether Systemtap finds any probes:

$ stap -l 'process.provider("php").mark("*")' -c 'sapi/cli/php -i'
process("sapi/cli/php").provider("php").mark("compile__file__entry")
process("sapi/cli/php").provider("php").mark("compile__file__return")
process("sapi/cli/php").provider("php").mark("error")
process("sapi/cli/php").provider("php").mark("exception__caught")
process("sapi/cli/php").provider("php").mark("exception__thrown")
process("sapi/cli/php").provider("php").mark("execute__entry")
process("sapi/cli/php").provider("php").mark("execute__return")
process("sapi/cli/php").provider("php").mark("function__entry")
process("sapi/cli/php").provider("php").mark("function__return")
process("sapi/cli/php").provider("php").mark("request__shutdown")
process("sapi/cli/php").provider("php").mark("request__startup")
Let’s write a short Systemtap script that counts the calls to each function. We use the function-entry probe for that:

$ cat request.stp
global callcount;
probe process.provider("php").mark("function-entry") {
    callcount[user_string($arg1)] += 1;
}
probe end {
    printf("count : function\n");
    foreach (name in callcount) {
        printf("%5d : %s\n", callcount[name], name);
    }
}
$ sudo stap -c 'sapi/cli/php test.php' request.stp
count : function
  100 : foo
  101 : bar

So that’s all. You can use systemtap now to probe your PHP. Hope you come up with some useful scripts. Share them!

Bookmarks Revisited Part II: Daily Bookmarking

It’s been a long time since I wrote Part I of the bookmarks revisited series. In the last two years, bookmarks changed a lot. They became part of Mercurial’s core functionality and a lot of tools became bookmark aware.

The current state of bookmarks

As of Mercurial 1.8 bookmarks are part of Mercurial’s core. You don’t have to activate the extension anymore. Bookmarks are supported by every major Mercurial hosting platform. Commands like hg summary or hg id will display bookmark information. In addition, the push and pull mechanism changed. I will go into detail about this in Part III of the series.

It’s safe to say that, due to their exposure, bookmarks have matured a lot over the years. It’s time to take a look at how to use them.

Bookmark semantics

Bookmarks are pointers to commits. Think of a bookmark as a name for a specific commit. Unlike branches in Mercurial, bookmarks are not recorded in the changeset. They don’t have a history. If you delete them, they will be gone forever.

Bookmarks were initially designed for short-lived branches. I use them as such. It’s indeed possible to use them in different contexts, but I don’t do that. Please be aware that, although they were initially intended to be similar to git branches, they often aren’t. They are not branches, they are bookmarks, and they should be used like you would use a bookmark in a book. If you advance to the next page, you move the bookmark (or it gets moved).

A bookmark can be active. Only one bookmark can be active at any time, but it’s okay if no bookmark is active. If you have an active bookmark and you commit a new changeset, the bookmark will be moved to the new commit. To make a bookmark active, update to it with hg update <name>. To deactivate it, just update to the current revision with hg update . (a single dot).
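The rules above can be sketched with a toy model. This is not how Mercurial implements bookmarks; the class and method names (Repo, commit, bookmark, update) are made up for illustration and only follow the behaviour described in this post.

```python
class Repo:
    """Toy model of the bookmark rules described above (illustration only)."""

    def __init__(self):
        self.commits = []      # commit ids in creation order
        self.bookmarks = {}    # bookmark name -> commit id
        self.active = None     # name of the active bookmark, if any
        self.current = None    # commit the working directory is based on

    def commit(self):
        """Create a new commit on top of the current one."""
        new_id = len(self.commits)
        self.commits.append(new_id)
        self.current = new_id
        # Only the active bookmark follows the new commit.
        if self.active is not None:
            self.bookmarks[self.active] = new_id
        return new_id

    def bookmark(self, name):
        """Point a bookmark at the current commit (toy: does not activate it)."""
        self.bookmarks[name] = self.current

    def update(self, target):
        """Update to a bookmark (activates it) or to "." (deactivates)."""
        if target == ".":
            self.active = None
        else:
            self.current = self.bookmarks[target]
            self.active = target


repo = Repo()
repo.commit()              # commit 0, no bookmark active
repo.bookmark("feature")   # feature points at commit 0
repo.update("feature")     # feature becomes the active bookmark
repo.commit()              # commit 1, feature moves along
repo.update(".")           # no bookmark is active anymore
repo.commit()              # commit 2, feature stays on commit 1
print(repo.bookmarks)      # {'feature': 1}
```

The bookmark is nothing more than an entry in a name-to-commit table, which is also why deleting it leaves no trace in the history.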

A bookmark can have diverged markers. Bookmarks that have diverged get a @NAME suffix, for example test@default. Diverged bookmarks are created during push and pull and will be described in Part III.


I should…

blog more. Open topics: DTrace Part II, Mercurial Bookmarks Part II.

Language runtimes and backwards compatibility (or why you shouldn’t write a version control system in Python)

Software projects choose languages based on the idioms of the languages. Languages can provide mechanisms and structures to support object orientation or functional programming. Less time is spent thinking about the backwards compatibility of programming language runtimes. While this is usually a non-issue for short-lived software like websites or software in tightly controlled environments, it becomes an issue for software projects that need to guarantee backwards compatibility for years. For example: a version control system.

The Mercurial project aims to support Python 2.4 to Python 2.7. It does not support Python 3. Why? Python 3 is a drastic change: Unicode is the default string type, old-style classes were removed, etc. The impact of the changes is similar to the change from PHP 4 to PHP 5. Most software projects have adapted to these language changes, but for projects that need to support LTS operating systems like RHEL or Solaris 9/10, it can become an issue. You could drop Python 2.X support and tell existing users of your software to look for something else, which is a no-go for a version control system. You could simply postpone Python 3 support, but Python 2.7 has already reached its EOL. It’s just a matter of time until distributions stop shipping Python 2.X. LTS operating systems might still not have Python 3 and rely on Python 2. Writing software that needs to be backwards-compatible for 8 years can be a problem.

The source of the problem

Why is this not an issue for Java or C, but for Python, PHP and Ruby? Java and C compile to a target that is guaranteed to be stable. C compiles to machine code, and a processor architecture won’t change anymore. If it’s an x86 processor, it will support x86 machine code. It won’t change with the next software update. If you need to support old C code that modern compilers don’t understand anymore, use an old compiler. Java is similar in that regard. The JVM has a defined set of instructions, which won’t be changed anymore. It doesn’t matter which Java compiler you use, in the end it will produce bytecode that will run on any JVM. Sure, you might still have problems supporting multiple versions of a library, but at least the JVM will always run your compiled code.

Python and PHP compile to bytecode as well, similar to Java. There is, however, one difference: they do it in memory, and the VM that interprets the bytecode is bundled with the compiler. This is where the backwards compatibility problem comes into play. You cannot run Python bytecode compiled on Python 3 with a Python 2 interpreter. You cannot compile with PHP 5 and run the result on PHP 4. Either the interpreter simply fails to run your old code, or your VM implementation is not guaranteed to be stable. That means that in Python and PHP the underlying machine you compile for might change with the next update. Let’s compare this to the x86 world: imagine your next software update changed the x86 instruction set. You would have to recompile all your C code, some of the old C code might not compile with modern C compilers, and old C compilers might not even build on the new instruction set. Sounds painful, particularly if you really care about backwards compatibility.
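The coupling between interpreter version and bytecode is visible in CPython itself: a compiled .pyc file starts with a magic number that identifies the bytecode format of the interpreter that wrote it. A minimal sketch, assuming CPython 3.4 or newer (for importlib.util.MAGIC_NUMBER); the file name demo.py is made up:

```python
import importlib.util
import os
import py_compile
import tempfile

# Write a tiny module and compile it to bytecode on disk.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "demo.py")
with open(source_path, "w") as handle:
    handle.write("answer = 6 * 7\n")

pyc_path = py_compile.compile(source_path, cfile=source_path + "c", doraise=True)

# The first four bytes of a .pyc file are the magic number of the
# interpreter that produced it.
with open(pyc_path, "rb") as handle:
    header_magic = handle.read(4)

print("magic in .pyc file:  ", header_magic)
print("magic of interpreter:", importlib.util.MAGIC_NUMBER)

# An interpreter with a different magic number will not execute this
# bytecode, which is why it is not portable between Python versions.
print("matches this interpreter:", header_magic == importlib.util.MAGIC_NUMBER)
```

Running the same script under two different Python releases typically prints two different magic numbers, which is the instability described above in its smallest form.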

Sidenote

I think that Python, PHP and others made an architectural mistake. They bundled the VM and runtime with the compiler. Thus your language version defines your runtime and the underlying machine code. If you write a new language, write down a minimum instruction set that you will always support and separate your VM from your compiler. Always support that instruction set. This can lead to interesting problems. The implementation of Java Generics is a good example. Nobody thought about generics when defining the instruction set. Therefore the bytecode was not designed to retain information about the generic type. That’s why the Java compiler needs to check the generic type information and then transform it, so that the resulting bytecode is compatible with old JVM versions. This is known as type erasure. Python and PHP developers would probably just introduce new bytecodes, not caring about BC. (Well, PHP devs would just pretend that PHP is a web language and web projects shouldn’t care about BC at all ;)).

Conclusion

If you seriously care about backwards compatibility for LTS systems that are 8 years old, choose a language which separates the VM from the compiler. Languages like Java (and probably C#) do this. Java developers won’t define behavior that requires a new opcode. PHP and Python are wonderful programming languages, but personally I am not sure if it is wise to write something like a VCS in such a language.

Long story short: Language choice matters for BC. If you write your own language, please separate your VM from your compiler. Better yet (as johannes pointed out), compile to an existing VM like JVM, CLR or LLVM.

Xorg: Different options for different keyboards

So I have this esoteric problem that I have 2 totally different keyboards: the Happy Hacking Pro 2 (HHK) and the Realforce 103U. The Happy Hacking has a special, SUN-inspired layout with the control key where standard keyboards usually have their Caps Lock key. My Realforce 103U has a standard US layout. I am a big fan of the old SUN layout and cannot type on keyboards that have the CTRL key in the usual position.

The problem: If I plug in my Realforce, I want to have the CAPS remapped to CTRL. If I plug in my Happy Hacking, it should stay the way it is!
Solution: Xorg Udev Matching

In recent Xorg versions you can use udev-based matching to select the options for a particular keyboard. On my Fedora 16, I added the following file:

$ vim /etc/xorg.conf.d/01-realforce.conf
Section "InputClass"
  Identifier    "Realforce"
  MatchProduct  "Realforce 103U"
  Option        "XkbLayout"   "us,de"
  Option        "XkbOptions"  "grp:menu_toggle,ctrl:swapcaps"
EndSection

Done. If I plug in my Realforce I have the ctrl and caps key swapped!

Bonus: As the Win key on my HHK is to the right of the Alt key, I’d better swap Alt and Win on my Realforce, too:

Option "XkbOptions" "grp:menu_toggle,ctrl:swapcaps,altwin:swap_lalt_lwin"

Canonical Way to Build PHP 5.4 on Solaris 11


You need gnu-coreutils installed.
$ wget -O php.tar.bz2 http://us.php.net/get/php-5.4.3.tar.bz2/from/this/mirror
$ tar xvjf php.tar.bz2
$ cd php-5.4.3
$ ./configure \
--with-apxs2=/usr/apache2/2.2/bin/apxs \
--prefix=/usr/php/5.4 \
[other options]
$ gsed -ibak 's,\-mt,,' Makefile
$ gsed -i.bak 's,\-i \-a \-n php5 libphp5\.la,-i -n php5 libphp5.la,' Makefile
$ make -j4
$ sudo make install
$ vim /etc/apache2/2.2/conf.d/php5.2.conf
..change stuff to libphp5.la..
$ svcadm restart apache22

Worked for me so far.

Removing a directory from a git repository the fast way

Note to myself:
There are various ways to remove a directory from the history of an existing git repository. The obvious way is

$ git filter-branch --tree-filter 'rm -rf directory/'

This is just fine for smaller repositories but can take a long time on large repositories with a lot of large files in that directory.
The faster way is to manipulate only the index:

$ git filter-branch --index-filter 'git ls-files -- DIRECTORY | xargs git update-index --remove' --tag-name-filter cat --prune-empty -f -- --all

  • git ls-files lists the files in DIRECTORY
  • git update-index --remove removes those files from the index
  • --tag-name-filter cat is there so that tags are rewritten to point at the new commits
  • --prune-empty tells filter-branch to drop commits that become empty.

Done.

Random thoughts about contributions

The PHP community announced that they will be switching to Git. This led to some discussion on Twitter about whether it is good to go directly to Github or to use git.php.net as the gateway to ensure control over ACLs. People were arguing that Github encourages people to contribute and that the PHP community is stuck in the 90s if they don’t switch completely over to Git.

So my 2 cents:

What really makes people contribute:

  1. A nice and encouraging community
  2. Respect for the work of others
  3. Not taking everything for granted

What doesn’t encourage people:

  1. Continuous rants about how to do things or not
  2. Telling people what they do is totally wrong
  3. Not contributing yourself

Open Source Projects are community driven. There is a place for discussion, but note that Open Source Communities are open, so people will come in and start ranting about things. If you are serious about a certain problem and want to solve it, contribute! If you want things to change, contribute! If you want to have your opinion heard, contribute! But do not try to squeeze arguments into 140 characters and think everyone will follow you, just because it’s you.

Personally I’m getting tired of this. It makes me want to either stop contributing (and you guys are stuck with SVN :)) or just ignore people.

How to run clojure.test in Slime and Swank

$ lein swank
In emacs use M-x slime-connect to connect to swank.
user> (use 'clojure.test)
nil
user> (use :reload 'geocommit.test.services) (run-tests 'geocommit.test.services)
{:type :summary, :test 3, :pass 9, :fail 0, :error 0}

Locate your commits or how to use geocommit.

We recently launched geocommit.com. Geocommit is a service to add geolocation data to your commits. You only need a working WiFi connection. No GPS module is required.

This blog post gives you an example of how to use geocommit and the geocommit.com services. I’ll show how to use geocommits in your Git or Mercurial project, how to make github and bitbucket more beautiful with our Chrome and Firefox extensions, and how to get a fancy map of your geocommits.

What is geocommit

First of all, geocommit is a text format to attach geolocation data to version control system commits. The geocommit website has detailed information about the geocommit format.
Second, geocommit is a service to store and analyse your geocommit data. We offer a set of tools and a webservice to make geocommit cool. The Git implementation git geo runs on Mac OS X and Linux. The Mercurial implementation hg-geo runs only under Linux. Mac OS support is under way.

Git & Geocommit

To start with geocommit, install git geo:

$ pip install geocommit

Go to a project directory and enable geocommit support:

$ cd awesomeproject
$ git geo setup
geocommit setup
Installing geocommit hook in /home/dsp/awesomeproject/.git/hooks/post-rewrite
Installing geocommit hook in /home/dsp/awesomeproject/.git/hooks/post-merge
Installing geocommit hook in /home/dsp/awesomeproject/.git/hooks/post-commit

This will enable geocommit support in your project. If you commit something with git commit, git geo will try to get your current location and add a geocommit. If no WiFi connection is enabled, no geocommit will be created.

Check your geocommits:

$ git log --show-notes='geocommit'
commit 5a34e6ebc8cb5c2a394ca26505c1d375095161c4
Merge: 25cf72d 828af6e
Author: David Soria Parra 
Date:   Tue Jan 4 14:00:55 2011 +0100

    Merge branch 'master' of https://github.com/jezdez/geocommit

Notes (geocommit):
    geocommit (1.0)
    lat: 48.1211828
    long: 11.4853565
    hacc: 39.0
    src: nmg

Let’s push our geocommits to github:

$ git geo push

git geo push accepts the same options as git push. It pulls geocommits first, merges them and then pushes geocommits and the given branch to the remote repository.
That’s everything you need. Easy, isn’t it? So let’s see how to enable geocommits on Mercurial and then talk about the Chrome and Firefox extensions.


Deep dive
git geo stores geocommits in git notes. We use the namespace geocommit for that. Git notes have some cool properties. They are metadata and don’t change the commit hash. Therefore they can be added to a commit at any time. They are displayed on github and can be deleted without any problem. You can also decide for yourself when to push geocommits. You can delete already pushed geocommits without breaking the repository or changing any commit sha1. The drawback is that git notes can be hard to deal with at times. git notes is a new feature in git and not yet fully supported. We had to write a script to merge git notes, as git notes merge is not available before git 1.7.7.


Mercurial & geocommit

You can add support for geocommits to Mercurial by installing the hg-geo extension. Clone the extension and enable it in your hgrc:

$ hg clone http://bitbucket.org/segv/hg-geo
$ printf '[extensions]\ngeo=/path/to/hg-geo/geo.py\n' >> ~/.hgrc
$ hg help geo

The extension will add an additional line to every commit that you do.

$ hg commit
$ hg log -v
changeset:   9:236a0f4c3d2e
tag:         tip
user:        David Soria Parra 
date:        Sun Jan 02 03:01:04 2011 +0100
files:       .hgtags
description:
Added tag v1.0.0 for changeset 3079e3ff3083

geocommit(1.0): lat 48.1211306, long 11.4853251, hacc 30.0, src nmg;

Now push your geocommits to bitbucket.

$ hg push

Deep dive
As Mercurial doesn’t have a way to store metadata, we are adding the geocommit data to the commit message itself. The obvious advantage is that you can use hg-geo with plain Mercurial. You do not need to enable hg-geo on the remote side to push geocommits (like Mercurial bookmarks). The disadvantage is that we modify the commit message and therefore the commit hash. There is no easy way to delete geocommits once they are created.
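Because the data lives in the commit message, any tool can read it back with a little text processing. A rough sketch, assuming the line always has the shape shown in the hg log output above; the function name parse_geocommit is made up, and the pattern is inferred from that single example, not taken from the geocommit format specification:

```python
import re

# Shape inferred from the example: "geocommit(VERSION): key value, key value;"
GEOCOMMIT_LINE = re.compile(
    r"^geocommit\((?P<version>[^)]+)\):\s*(?P<body>.*?);\s*$"
)


def parse_geocommit(message):
    """Return the geocommit fields found in a commit message, or None."""
    for line in message.splitlines():
        match = GEOCOMMIT_LINE.match(line.strip())
        if match is None:
            continue
        fields = {"version": match.group("version")}
        # Each comma-separated item is a key followed by its value.
        for pair in match.group("body").split(","):
            key, _, value = pair.strip().partition(" ")
            fields[key] = value.strip()
        return fields
    return None


message = """Added tag v1.0.0 for changeset 3079e3ff3083

geocommit(1.0): lat 48.1211306, long 11.4853251, hacc 30.0, src nmg;"""

geo = parse_geocommit(message)
print(geo["lat"], geo["long"], geo["src"])
```

All values are kept as strings here; converting lat, long and hacc to floats is left to the caller.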


bitbucket.org and github.com

We can push geocommits easily now. But how to use them? We can install the Firefox or Chrome extension. This will display a map next to your commit!

Firefox
To install the geocommit extension for Firefox you need Greasemonkey. Greasemonkey is a well-known and well-supported extension that enables user scripts to safely modify the displayed website.

Install Greasemonkey from userscripts.org. You can then browse bitbucket.org or github.com and see a map of your geocommit:

bitbucket with geocommit support

github with geocommit support

Chrome
On Chrome, install the plugin from chrome.google.com.

Post Hook

We offer a post hook that you can use with github.com and bitbucket.org. Your commits will be tracked by geocommit.com and we will create a global and a project-specific map as well as provide further analytics as soon as possible.

github.com
To install the hook, go to the admin section of your repository and select Service Hooks.

Add http://hook.geocommit.com/api/github as a POST service hook.

bitbucket.org
Go to the admin section of your repository and select Services.

Add http://hook.geocommit.com/api/bitbucket as a POST service hook.

That’s about it. Browse www.geocommit.com/full.html to check out your commits on our map.

Questions?!
Enjoy!