Monthly Archives: June 2012

Language runtimes and backwards compatbility (or why you shouldn’t write a version control system in Python)

Software projects choose languages based on idioms of the languages. Languages can provide mechanisms and structures to support object orientation or functional programming. Less time is spent thinking about backwards compatibility of programming language runtimes. While this is usually a non-issue for short living software like websites or software in tightly controlled environment, it becomes an issue for software projects that need to guarantee backwards-compatibility for years. For example: a version control system.

The Mercurial project aims to support Python 2.4 to Python 2.7. It does not support Python 3. Why? Python 3 is a drastic change. Unicode is the default string type, classes removed, etc. The impact of the changes are similar to the change from PHP 4 to PHP 5. Most software projects have adopted these language changes, but for projects that need to support LTS operating systems like RHEL or Solaris 9/10, it can be become an issue. You could drop Python 2.X support and tell existing users of your software to look for something else – a no-go for a version control system. You could simply not support Python 3 at someday, but Python 2.7 already reached it’s EOL. It’s just a matter time until distribution stop shipping Python 2.X. LTS operating systems might still not have Python 3 and rely on Python 2. Writing software that needs to be backwards-compatbile for 8 years can be a problem sumo suits for sale.

The source of the problem

Why is this a not an issue for Java or C, but for Python, PHP and Ruby? Java and C compile to bytecode that is guaranteed to be stable. C compiles to machinecode. A processor architecture won’t change anymore. If it’s a x86 processor, it will support x86 machinecode. It won’t change with the next software update. If your code needs to support old C code that modern compilers don’t understand anymore, use an old one. Java is similar in that regard. The JVM runtime has a defined set of instructions, which won’t be changed anymore. It doesnt matter which Java compiler you use, in the end it will produce bytecode that will run on any JVM. Sure you still might have problems supporting multiple versions of a library, but at least the JVM will always run your compiled code.

Python and PHP compile to bytecode as well, similar Java. There is, however, one exception: They do it in memory and the VM to interprete the bytecode is bundled with the compiler. This is were the backwards compatibility problem comes in play. You cannot run Python bytecode compiled on Python 3 with a Python 2 interpreter. You cannot compile with PHP 5 and run it on PHP 4. Either the interpreter simply fails to your old code, or your VM implementation is not guaranteed to be stable. That means in Python and PHP the underlying machine that you compile might change with the next update. Let’s compare this to the x86 world. Your next software update might change the x86 instruction set? You would have to recompile all your C code and maybe some of the old C code cannot be compiled with modern C compilers and old C compilers might not be able to get compiled on the new instruction set. Sounds painful, particularly if you really care about backwards-compatibility.

Sidenote

I think that Python, PHP and others did an architectual mistake. They bundled the VM and runtime with the compiler. Thus your language version defines your runtime and the underlying machinecode. If you write a new language, write down a minimum instruction set that you will always support and separate your VM from your compiler. Always support that instruction set. This can lead to interesting problems. The implementation of Java Generics is a good example. Nobody thought about generics when defining the insturctions set. Therefore the bytecode was not designed to retain information about the generic type. Thats why the Java compiler needs to check the generic type information and than transform it, so that the resulting bytecode is compatible with old JVM versions. This is known as type erasure. Python and PHP developer would probably just introduce new bytecodes, not caring about BC. (Well PHP devs would just pretend that PHP is a web language and web projects shouldn’t care about BC at all ;)).

Conclusion
If you seriously care about backward-compatibility for LTS systems that are 8 years old, choose a language which separates the VM from the compiler. Languages like Java (probably C#) do this. Java developer won’t define behavior that requires a new opcode. PHP and Python are wonderful programming languages, but personally I am not sure if it is wise to write something like a VCS in such a language.

Long story short: Language choice matters for BC. If you write your own language, please separate your VM from your compiler. Better (as johannes pointed out) compile to an existing VM like JVM, CLR or LLVM