DTracing PHP

Dtrace is an extremely flexible and powerful tool to trace and debug applications. I recently dtraced PHP a little bit, so here is a tutorial how to start dtracing PHP.
Prequisites
You need a dtrace capable operating system like Solaris 10, OpenSolaris, Mac OS X 10.5 or FreeBSD 7. I use dtrace on OpenSolaris 2008.11. You also need at least dtrace privileges or need to be root to be able to trace a program. I used PHP 5.3.0alpha3 in my tests, but it actually doesn’t matter. As PHP doesn’t provide DTrace probes itself we have to trace the underlying C-functions, therefore we need to know the Zend Engine and PHP’s internals.

An Introduction into DTrace
Dtrace is really powerful and trying to do an introduction to all it’s features is just not possible. Therefore I will just focus on the basics, that are needed to get our stuff working. The basic idea behind Dtrace is that the kernel and userland programs fire probes on a specific location in the kernel or the userland program. As the probe is just fired if DTrace is running and trying to catch those markers, the probes itself don’t cost CPU power at all (in fact they are NOP’s). But if we trace them, we are able to see those probes and they are able to pass some data. To get way more complex information out of those probes, DTrace has an builtin scripting language that catches those probes and process them.

Dtrace program structure
Every probe in DTrace is describe by a 4 tuple. The first one is the so-called provider. The second, the probemodule, the third is the probefunc and the last one is the probename. The parts are separated by a semicolon, therefore every probe is defined by:

<probeprovider>:<providermodule>:<probefunc>:<probename>

In our example, the provider will usually the pid provider that gives us probes for a given PID (the PHP controller we want to trace). The probemodule will be php and the probefunc will be the function name of the underlying C-function. The name can actually be everything but it is best-practice to have a probe at the entry of a function, called entry and returnjust before the function returns. A DTrace program have an probe identifier that matches a probe. Every identifier has a body that can contain initialization of variables or accumulation of data using a table. Also, every identifier can have a condition that is evaluated before the body is run. So we have the following structure:

<probeprovider>:<providermodule>:<probefunc>:<probename>
/ <condition> /
{
<body>
}

As an example, we trace gc_collect_cylce function:

pid$target:php:gc_collect_cycle_function:entry
{
trace(probename);
}

Please notice that DTrace features a lot predefined variables, such as probename. Please refer to Sun’s Dtrace Guide to get a list of predefined variables.

PHP probes
As PHP doesn’t provide own probes we have to use the underlying C-functions. PHP itself doesn’t provide own probes. There is ext/dtrace written by Wez Furlong which gives PHP some probes, but as we want to get a little bit deeper, we have to use the underlying C-functions. Userland programs are usually traced by the so-called pid-provider. It’s used to get all probes by a program with a given PID. Unlike userland programs, the kernel doesn’t have a pid, therefore kernel probes do not have a PID and can be identified easily be an unique name. As we sometimes do not know the PID of the php process that we want to trace, there is a variable called $target. If yo use it, $target will be automatically filled with the PID that was created when dtrace started the program. If I start a program with dtrace -s test.d -c ‘php test.php’ the programm php test.php will be executed and the created PID will be assigned to the variable $target. DTrace also have an option just to display the available probes: Try dtrace -ln ‘pid$target:php::entry’ -c ‘php test.php’ to get all function entries called by PHP. You might from time to time a get a message that not enough memory was available to trace. In those cases you just tried to capture too much probes. Therefore, I used the specific :::entry notation to just get function entries, which worked fine on my machine. So let’s take look at some interesting probes. If we just try to display all probes related to PHP with dtrace -ln ‘pid$target:php::entry’ -c ‘php test.php’.
The output might start with something like:

ID PROVIDER MODULE FUNCTION NAME
69608 pid2696 php _start entry
69609 pid2696 php __fsr entry
69610 pid2696 php _free_ereg_cache entry
69611 pid2696 php zm_startup_ereg entry
69612 pid2696 php zm_shutdown_ereg entry
69613 pid2696 php zm_info_ereg entry
69614 pid2696 php zif_ereg entry
69615 pid2696 php zif_eregi entry
69616 pid2696 php php_ereg_replace entry
69617 pid2696 php zif_ereg_replace entry
69618 pid2696 php zif_eregi_replace entry
69619 pid2696 php zif_split entry
69620 pid2696 php zif_spliti entry
69621 pid2696 php zif_sql_regcase entry
69622 pid2696 php php_regcomp entry
69623 pid2696 php p_ere entry
69624 pid2696 php p_ere_exp entry
69625 pid2696 php p_str entry
69626 pid2696 php p_bre entry
69627 pid2696 php p_simp_re entry

First example
So our first example will be that we want to trace the compilation time. The compile function is called compile_file in PHP. Therefore our probe identifier would be pid$target:php:compile_file. The PID provider will just get the probes for a specific probe, while the php probemodule will focus on all PHP related functions. We then select the compile_file function. Provided by the kernel, there are two probenames, calle entry and . Entry is called when the function is entered and return, when the function returns. So we need to trace pid$target:php:compile_time:entry and pid$target:php:compile_time:return. We know just have to get information how long the compilation takes. For that purpose DTrace defines a variable called timestamp that is accessible within the body of an DTrace identifier which holds the milliseconds from the beginning of a program. So know we catch our entry probe and set the time to a local value:

pid$target:php:compile_file:entry
{
self->compile_start = timestamp;
}

Note that the self-> notation is used to get a per-thread variable. Therefore every thread would have it’s own value. It’s not particularly necessary for our PHP based example, but it is best-practice to use thread-local variables whenever possible.

We just need our return probe that calculates the real offset:

pid$target:php:compile_file:return
{
printf(“Compile time: %dn”, timestamp – self->compile_start);
}

This will output our actual compile time. Just put both identifiers and their bodies into ‘compiletime.d’ and start dtrace with dtrace -q -s compiletime.d -c ‘php test.php’ and it will display you the compilation time of test.php. Please notice that both self->compile_start and timestamp are integers. Dtrace in fact doesn’t have the notion of a float primitive, so you would not be able to output a float using printf.

The output might be similar to:

$ dtrace -q -s compiletime.d -c ‘php test.php’
Compile Time: 99604

Advcanced Example
Now let’s get a little bit deeper into the engine. We now want to trace when the garbage collector hit’s in and want to see the amount of freed refs. The garbage collector uses the function gc_collect_cycles.

pid$target:php:gc_collect_cycles:return
{
printf(“%d refs freed”, arg1);
}

Please notice the special arg1 variable. This always holds the return value in “return” probenames. In “entry”, arg0…argN will hold the function parameters.

For more detailed examples, see my next blog entry, which will contain more sophisticated DTrace scripts.

2 Comments

  1. Wez Furlong says:

    “As PHP doesn’t provide own probes”

    This is only partially true; there is a pecl extension that gives you more probes; if you google for “dtrace php” you’ll find work from others on this topic.

  2. dsp says:

    Yeah, I’m aware of the extension and I use them quite often myself, but in general, PHP doesn’t provide out-of-the box probes.

Leave a Reply