Unofficial PHP Git repositories – Importing large trees

A few months ago Johannes Schlüter and I started discussing Git and other decentralized version control systems. While exploring Git we thought about importing the PHP CVS tree into it. A few weeks and a lot of wasted CPU time later, we finally managed to provide an unofficial Git mirror of the PHP CVS repository. It is provided by Johannes Schlüter and mirrored by me.
Back in late October 2007 Johannes and I started discussing Git. We both like Git and its decentralized approach, and we started using it for our private projects. But the major features of Git only really show in bigger projects with more than one or two contributors. As Johannes is a well-known PHP core developer and I contribute to PHP myself from time to time, we soon came up with the idea of providing a Git repository for PHP. Since then we have both tried, from time to time, to import the PHP CVS repository into a Git repository.

Our first approach was git-cvsimport, which ships with the Git distribution. git-cvsimport is based on cvsps, which parses the rlog output for a current working copy, retrieves all revisions of all files in that working copy and imports them into a Git repository. It turns out that git-cvsimport works quite well for small projects, but it fails for large and complex CVS repositories like the PHP one. It imports most of the files, but from time to time it messes up the import of individual revisions or even of complete branches. That left us with a broken repository whose sources no longer compiled. This behaviour seems to be a result of the rlog-based import mechanism of cvsps, which does not have enough information to map every file to the right branches.
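For reference, a git-cvsimport run of the kind described here can be sketched as follows. This is only a sketch under stated assumptions: the CVSROOT string, the module name and the target directory are placeholders rather than the real PHP settings, and the helper names are made up for illustration. The options used (-v, -d, -C) are standard git-cvsimport options.

```python
import shlex
import subprocess


def cvsimport_command(cvsroot, module, target_dir):
    """Build the argument list for a git-cvsimport run.

    -v  print progress while importing
    -d  the CVSROOT to read from
    -C  the directory of the Git repository to import into
    """
    return ["git", "cvsimport", "-v", "-d", cvsroot, "-C", target_dir, module]


def run_cvsimport(cvsroot, module, target_dir):
    """Run the import; needs git, git-cvsimport and cvsps to be installed."""
    subprocess.run(cvsimport_command(cvsroot, module, target_dir), check=True)


if __name__ == "__main__":
    # Placeholder values (assumptions); the command is only printed, not run.
    cmd = cvsimport_command(
        ":pserver:anonymous@cvs.example.org:/repository", "php-src", "php-src.git"
    )
    print(shlex.join(cmd))
```

Building the command as a list and only printing it keeps the sketch safe to run on a machine without CVS access; calling run_cvsimport would start the actual import.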

Therefore we tried parsecvs. Unlike git-cvsimport, parsecvs works directly on the ,v RCS files from the CVSROOT. It parses them and imports the revisions into Git. As php.net provides an rsync mirror of the CVSROOT sources (thanks to Derick for adding ZendEngine2 to the rsync mirror), we were able to import the PHP repository into Git. But we still ran into trouble. The PHP CVS apparently has a branch named dev/. parsecvs passes that branch name to git-update-ref. This Git command operates on Git refs, which use slashes to separate name components. As the slash in dev/ was not escaped properly, we had to patch parsecvs ourselves to escape that character (in fact, to use ~ in place of /). Besides that fix, which I made, Johannes fixed some other issues in our own parsecvs branch.
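The idea behind the branch-name fix can be illustrated with a few lines of Python. This is not the actual patch (parsecvs is written in C); the function names and the refs/heads/ prefix are assumptions made for the illustration. The default replacement character ~ follows the description in this post. Note that the current git-check-ref-format documentation lists ~ among the characters not allowed in ref names, and also forbids names that end with a slash, so for an import with a recent Git a different replacement would probably be needed, which is why it is a parameter here.

```python
def escape_branch_name(name, replacement="~"):
    """Replace every slash in a CVS branch name with `replacement`."""
    return name.replace("/", replacement)


def branch_ref(name, replacement="~"):
    """Return the full ref name that would be handed to git-update-ref.

    The "refs/heads/" prefix is an assumption for this illustration.
    """
    return "refs/heads/" + escape_branch_name(name, replacement)


if __name__ == "__main__":
    # "dev/" is the problematic branch name from the post.
    for branch in ["PHP_5_3", "dev/"]:
        print(branch, "->", branch_ref(branch))
```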

Thanks to our patched parsecvs we finally got a working Git repository imported from CVS. It was about 2.2 GB in size, though. A mail on the Git mailing list concerning the GCC Git repository led us to the right solution: shrinking the repository by recalculating the revision deltas with git-repack, which took about 3 hours. That attempt resulted in a perfectly working and small 94 MB repository.
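A repack of this kind can be sketched as below. The post does not state which options were used, so the flags and the window and depth values are assumptions: -a and -d repack everything into a single pack and drop the old packs, -f recomputes the deltas instead of reusing existing ones, and --window and --depth control how hard Git searches for good deltas. The helper names are hypothetical.

```python
import subprocess


def repack_command(window=250, depth=250):
    """Build an aggressive git-repack command line.

    The default window/depth values are illustrative assumptions,
    not the values used for the PHP repository.
    """
    return [
        "git", "repack",
        "-a",  # pack all objects into a single pack
        "-d",  # remove packs made redundant by the new one
        "-f",  # recompute deltas instead of reusing existing ones
        f"--window={window}",
        f"--depth={depth}",
    ]


def repack(repo_dir, window=250, depth=250):
    """Run the repack inside repo_dir; this can take hours on a large import."""
    subprocess.run(repack_command(window, depth), cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Only show the command; call repack("/path/to/repo") to execute it.
    print(" ".join(repack_command()))
```

Larger window and depth values trade CPU time and memory during the repack for a smaller pack. Going from about 2.2 GB to 94 MB, as reported here, is a reduction by a factor of roughly 23.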

This unofficial Git repository is updated from the official PHP CVS twice a day. It is served by Johannes Schlüter, who also provides imports of ZendEngine2 and TSRM, which are needed to compile PHP. You can find a mirror of these repositories on my Git site.

[LINK] The unofficial PHP Git repository.
[LINK] The unofficial TSRM Git repository.
[LINK] The unofficial ZendEngine2 Git repository.

Feel free to pull the repositories and send us patches using Git. Feel free to drop me a comment, too.

5 thoughts on “Unofficial PHP Git repositories – Importing large trees”

  1. Johannes Schlüter

    David has written a nice blog article about our experiences while importing the PHP sources, including the full CVS history, into Git. Unfortunately the blog is, for unknown reasons, not listed on Planet PHP anymore. If you read about git this might be a

  2. René Leonhardt

    Nice work!
    Have all tags been imported, too, or only a predefined subset?

    I wonder whether anyone has tried to import PHP into Bazaar or Mercurial (like Mozilla did). The Python 4G SCMs have the advantage of being platform-independent…

  3. dsp

    Hi Rene,

    We imported all tags. You might want to clone the complete tree and remove the tags you don’t need.

    In fact I tried to import it into Mercurial, but the Mercurial import of the CVS repository was not as good as the Git one. I don’t know if there is something like parsecvs for Mercurial, but the usual git-cvsimport, even though it fails, did a better job than the HG import.
    I suggest cloning the Git repository of PHP and then importing that into Mercurial, which will result in a good working copy in HG.

  4. shire

    This currently seems to be a little broken, as ext/phar/php_phar.h and other files are missing from PHP_5_3 and HEAD.

    I have a parsecvs patch that fixes the missing files for my imports. The problem seems related to the errors “Warning: %s too late date through branch %s\n”, as the dates are off somehow by a good amount. (The patch just continues rather than setting the array item to null, and fixes a subsequent segfault when prev->file is NULL.) Unfortunately Zend/zend_language_scanner.c is still out of sync (.l seems fine).
