Language runtimes and backwards compatibility (or why you shouldn’t write a version control system in Python)

Software projects usually choose a language for its idioms: the mechanisms and structures it provides to support object orientation or functional programming, for example. Less time is spent thinking about the backwards compatibility of the language runtime. While this is usually a non-issue for short-lived software like websites or software in tightly controlled environments, it becomes an issue for software projects that need to guarantee backwards compatibility for years. For example: a version control system.

The Mercurial project aims to support Python 2.4 to Python 2.7. It does not support Python 3. Why? Python 3 is a drastic change. Unicode is the default string type, old-style classes were removed, etc. The impact of the changes is similar to the change from PHP 4 to PHP 5. Most software projects have adopted these language changes, but for projects that need to support LTS operating systems like RHEL or Solaris 9/10, it can become an issue. You could drop Python 2.X support and tell existing users of your software to look for something else, which is a no-go for a version control system. You could simply never support Python 3, but Python 2.7 is already the last release of the 2.X line. It’s just a matter of time until distributions stop shipping Python 2.X. Meanwhile, LTS operating systems might still not have Python 3 and rely on Python 2. Writing software that needs to be backwards-compatible for 8 years can be a problem.
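To make the string change concrete, here is a minimal sketch for Python 3 of why a tool that handles raw bytes is affected by it. The file name and the `repo/` prefix are made up for illustration; this is not Mercurial code.

```python
# A file name (or file content) is raw bytes to a VCS and may not be valid text.
raw = b"caf\xe9.txt"  # hypothetical Latin-1 encoded name, not valid UTF-8

# In Python 3 a plain string literal is Unicode text; bytes are a separate type.
text = "caf\u00e9.txt"
assert isinstance(text, str) and isinstance(raw, bytes)

# Mixing the two raises an error instead of converting implicitly.
try:
    joined = raw + text
except TypeError:
    joined = None

# Decoding needs the right encoding; assuming UTF-8 fails for this name.
try:
    raw.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Staying in bytes everywhere keeps the data intact, byte for byte.
path = b"repo/" + raw
```

Code that was written with byte strings as the default therefore has to mark every such literal and operation explicitly when it moves to Python 3.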

The source of the problem

Why is this not an issue for Java or C, but for Python, PHP and Ruby? Java and C compile to a target that is guaranteed to be stable. C compiles to machine code. An established processor architecture won’t remove instructions anymore. If it’s an x86 processor, it will support x86 machine code. It won’t change with the next software update. If you need to support old C code that modern compilers don’t understand anymore, use an old compiler. Java is similar in that regard. The JVM has a defined set of instructions, and existing instructions won’t be changed anymore. It doesn’t matter which Java compiler you use, in the end it will produce bytecode that runs on any JVM of the targeted version or newer. Sure, you still might have problems supporting multiple versions of a library, but at least the JVM will always run your compiled code.

Python and PHP compile to bytecode as well, similar to Java. There is, however, one difference: they do it in memory, and the VM that interprets the bytecode is bundled with the compiler. This is where the backwards compatibility problem comes into play. You cannot run Python bytecode compiled on Python 3 with a Python 2 interpreter. You cannot compile with PHP 5 and run it on PHP 4. Either the interpreter simply fails to run your old code, or your VM implementation is not guaranteed to be stable. That means in Python and PHP the underlying machine that you compile to might change with the next update. Let’s compare this to the x86 world. Imagine your next software update changed the x86 instruction set. You would have to recompile all your C code, some of the old C code might not compile with modern C compilers, and the old C compilers might not build for the new instruction set. Sounds painful, particularly if you really care about backwards compatibility.
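The in-memory compilation described above can be observed from within Python itself. The following is a small sketch, assuming CPython 3.4 or newer (where `importlib.util.MAGIC_NUMBER` is available); the source string is made up for illustration.

```python
import importlib.util
import sys

source = "answer = 6 * 7"

# compile() turns source text into a code object in memory. No separate
# compiler binary is involved; the running interpreter does the work.
code = compile(source, "<example>", "exec")

# The same interpreter then executes the bytecode it just produced.
namespace = {}
exec(code, namespace)

# The raw bytecode is a byte string whose meaning depends on the opcode
# table of the interpreter version that produced it.
raw_bytecode = code.co_code

# Cached .pyc files start with a magic number identifying the bytecode
# format of the interpreter that wrote them.
magic = importlib.util.MAGIC_NUMBER

print("interpreter:", sys.version_info[:2])
print("result:", namespace["answer"])
print("bytecode length:", len(raw_bytecode))
print("pyc magic:", magic)
```

Running this under two different interpreter versions typically prints different magic numbers, which is how an interpreter recognizes cached bytecode written for a different bytecode format and recompiles from source instead of loading it.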


I think that Python, PHP and others made an architectural mistake. They bundled the VM and runtime with the compiler. Thus your language version defines your runtime and the underlying machine code. If you write a new language, write down a minimum instruction set that you will always support and separate your VM from your compiler. Always support that instruction set. This can lead to interesting problems. The implementation of Java generics is a good example. Nobody thought about generics when defining the instruction set. Therefore the bytecode was not designed to retain information about the generic type. That’s why the Java compiler needs to check the generic type information and then transform it, so that the resulting bytecode is compatible with old JVM versions. This is known as type erasure. Python and PHP developers would probably just introduce new bytecodes, not caring about BC. (Well, PHP devs would just pretend that PHP is a web language and web projects shouldn’t care about BC at all ;)).
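As a toy illustration of that separation, here is a minimal sketch of a compiler and a VM that share nothing but a fixed instruction set. All names (`PUSH`, `ADD`, `MUL`, `compile_expr`, `run`) are hypothetical, and the postfix input language is only there to keep the compiler short.

```python
# The fixed instruction set: later versions may add opcodes, but the numbers
# and meaning of the existing ones never change.
PUSH, ADD, MUL = 1, 2, 3


def compile_expr(tokens):
    """Hypothetical compiler: postfix tokens -> list of (opcode, operand)."""
    program = []
    for tok in tokens:
        if tok == "+":
            program.append((ADD, None))
        elif tok == "*":
            program.append((MUL, None))
        else:
            program.append((PUSH, int(tok)))
    return program


def run(program):
    """Hypothetical VM: knows the instruction set, not the source language."""
    stack = []
    for op, arg in program:
        if op == PUSH:
            stack.append(arg)
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return stack.pop()


program = compile_expr("2 3 + 7 *".split())
result = run(program)
print(program, result)
```

Because `run` only depends on the opcode list, the compiler can change its input syntax freely, and any program compiled by an older compiler keeps running as long as the three opcodes keep their meaning.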

If you seriously care about backwards compatibility for LTS systems that are 8 years old, choose a language which separates the VM from the compiler. Languages like Java (and probably C#) do this. The Java designers rarely define behavior that requires a new opcode. PHP and Python are wonderful programming languages, but personally I am not sure if it is wise to write something like a VCS in such a language.

Long story short: Language choice matters for BC. If you write your own language, please separate your VM from your compiler. Better (as Johannes pointed out): compile to an existing VM like the JVM, the CLR or LLVM.

Posted June 4th, 2012 in PHP, Programming, Version Control.


  1. Johannes:

    If you really need a new language, it can also be beneficial not to write your own engine but to use an existing one, like the JVM, the CLR or even LLVM. That saves lots of work and allows access to a larger environment of libraries and tools.

  2. Anonymous:

    Um, IMHO, you are not taking into account a lot of important arguments right here.

    Java/C# are not precisely more stable and widely supported languages than Python or even PHP. The raw popularity of a language looks more relevant to me than architectural design when you try to evaluate the future support for a language. In fact, C# is closed… .NET is not supposed to be, but Microsoft does not show much investment in keeping it spread outside their OS.

    On the other side, there is a lot of support for making Python 2.x and 3.x versions coexist everywhere, because this is something that everyone is doing right now (every modern distribution today should provide isolated 2 and 3 environments). And of course, the ability to choose higher-level languages has allowed VCS programmers to better confront other specific hard problems.

    I think there are many more things at play here than only VM support and compatibility issues… And of course, lots of modern VCSs are designed with import/export capabilities in mind to prevent in-out migration problems…

    So I do not see how Python can be a problem for a VCS… Your argument seems weak to me. More like a fear.

    Side note:

    I will not ever imply that PHP is a good language to program a vcs, but I just do not see the problem with Python.

  3. dsp:

    It’s actually an issue for Mercurial. The community cannot support Python 3 and Python 2 at the same time as it’s too much effort. None of the 2to3 efforts work for Mercurial, as Mercurial relies on certain assumptions about byte-safety that Python 3 breaks. Look up the endless Python 3 discussions on the Mercurial mailing list.

    Sure, there are a lot of other decisions to make, and my argument is just one of many and probably not the most important one. Nevertheless, it’s something most people don’t take into consideration.

  4. qznc:

    Could you provide a link to one or more specific mails? I tried searching the mailing list archives, but could not find the actual technical reasons.

    From your description it sounds more like Mercurial depends on implementation-defined behavior of Python. That would be equivalent to relying on some implementation-defined behavior of GCC 3.something, which changed with GCC 4. Such problems are not Python-specific, they just look different elsewhere.

  5. dsp:

    It’s hard to find: So far I found

  6. Martin Geisler:

    The only problem is that Mercurial insists that “support” means letting users on these platforms run new Mercurial versions. That is not what support normally means: support means that you send out patch releases as needed for grave bugs. It doesn’t mean that you get to both run a stable OS *and* get new features in Mercurial.

    If we stopped that in Mercurial, then we could quickly move the code to Python 2.6 or 2.7 and make the final jump to Python 3.x when we feel that this platform is widespread enough.
