interesting gzip behaviour
Sometimes strange things happen that are kind of obvious once you look deeply into the problem.
Just take a look at the following code:
$ cd /tmp
$ mkdir -p foo/bar/
$ touch foo/bar/test
$ touch foo/test
$ for ((i=0;$i<5;i=$i+1)); do tar czf foo.tar.gz foo/ && md5sum foo.tar.gz; sleep 1; done
ba29a144a1f1e61aa3e581ef850cb1ec foo.tar.gz
7790fb2d6b1b9b77c101ce5d4fe63b94 foo.tar.gz
5d8f854f4725022b12d058ff7468e38b foo.tar.gz
b7292d286edcedd7d7f02fce16c6098e foo.tar.gz
7945ea06cd8085454898652db4579506 foo.tar.gz
gzip produces a different output file every time. An automatic tarballing script that creates a .tar.gz from a directory every hour will therefore produce tarballs with different md5sums, even when nothing in the directory has changed. This might break software (like dpkg) that checks the md5sums of upstream sources.
Pretty annoying behaviour, but I figured out what the problem is: the gzip header contains an MTIME field. When the data to compress comes from a pipe rather than a file, as it does when tar calls gzip, that field is set to the time at which compression started.
bzip2 doesn't have such a time field, so it creates archives as expected, always with the same checksum.
$ for ((i=0;$i<5;i=$i+1)); do tar cjf foo.tar.bz2 foo/ && md5sum foo.tar.bz2; sleep 1; done
4a71c3031a58650ac694e95d207af779 foo.tar.bz2
4a71c3031a58650ac694e95d207af779 foo.tar.bz2
4a71c3031a58650ac694e95d207af779 foo.tar.bz2
4a71c3031a58650ac694e95d207af779 foo.tar.bz2
4a71c3031a58650ac694e95d207af779 foo.tar.bz2
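If the tarball has to stay a .tar.gz, one possible workaround is to run gzip yourself with -n (--no-name), which tells it not to store the original name and timestamp in the header. This is only a sketch: it uses a scratch directory instead of /tmp/foo, and it assumes that tar itself emits identical bytes on every run, which the bzip2 result above suggests is the case here.

```shell
# Work in a scratch directory so nothing existing gets overwritten.
cd "$(mktemp -d)"
mkdir -p foo/bar && touch foo/bar/test foo/test

for i in 1 2 3; do
    # Let tar write to stdout and compress separately;
    # gzip -n leaves the timestamp out of the header.
    tar cf - foo/ | gzip -n > foo.tar.gz && md5sum foo.tar.gz
    sleep 1
done
```

The checksums should then stay the same from run to run, as long as the files in foo/ (including their modification times) are not touched.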