I just learned about pbzip2, a parallel implementation of bzip2 that spreads the compression work across all the cores of a multicore machine.

On my Mac Pro at work, I installed it with MacPorts (`sudo port install pbzip2`).  It is this kind of awesome:


    $ ls -lh original.tar
    -rw-r--r-- 1 jmcmurry staff 2.4G Feb 4 13:47 original.tar
    $ time bzip2 -k -v original.tar
    original.tar: 36.215:1, 0.221 bits/byte, 97.24% saved, 2604288000 in, 71911733 out.

    real 13m3.313s
    user 12m50.536s
    sys 0m3.773s
    $ mv original.tar.bz2 bzip2.tar.bz2
    $ time pbzip2 -k -v original.tar
    Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
    [Jan. 08, 2009] (uses libbzip2 by Julian Seward)

    # CPUs: 8
    BWT Block Size: 900k
    File Block Size: 900k
    -------------------------------------------
    File #: 1 of 1
    Input Name: original.tar
    Output Name: original.tar.bz2

    Input Size: 2604288000 bytes
    Compressing data...
    -------------------------------------------

    Wall Clock: 119.369207 seconds

    real 1m59.612s
    user 14m39.090s
    sys 0m44.840s

Sweet. That's about 6.55x faster (783.3 s versus 119.6 s of wall-clock time) by adding a “p” to my command line.
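
If you want to check the arithmetic yourself, here is a minimal Python sketch that works from the `real` and `user` lines of the two `time` reports above. The helper name `to_seconds` is my own, and "average busy cores" is only a rough estimate (user CPU time divided by wall-clock time, ignoring `sys`). The exact speedup shifts in the second decimal place depending on which timing you use and how you round.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '13m3.313s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Values copied from the transcript above.
bzip2_real = to_seconds("13m3.313s")
pbzip2_real = to_seconds("1m59.612s")
pbzip2_user = to_seconds("14m39.090s")

# Wall-clock speedup of pbzip2 over bzip2.
speedup = bzip2_real / pbzip2_real

# Rough estimate of how many of the 8 cores were kept busy on average.
busy_cores = pbzip2_user / pbzip2_real

print(f"speedup: {speedup:.2f}x")
print(f"average busy cores: {busy_cores:.2f} of 8")
```

The second number is a handy sanity check: if it comes out well below the core count, something other than the CPU (disk, most likely) is the bottleneck.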

The resulting compressed .bz2 files aren’t byte-for-byte identical according to md5 (the pbzip2 output is a little larger, which makes sense because the input is split into chunks that are compressed independently), but when they decompress, they’re both identical to the original .tar file.
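
To see why the two files can differ and still decompress to the same thing, here is a small Python sketch of the general idea: compress fixed-size chunks independently, then concatenate the resulting bzip2 streams. This is an illustration using Python's standard `bz2` module, not pbzip2's actual code. The names `compress_serial` and `compress_parallel` and the `CHUNK_SIZE` value are my own assumptions, and threads are used only to keep the example short.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 900_000  # hypothetical chunk size chosen for this sketch


def compress_serial(data: bytes) -> bytes:
    """Compress everything as a single bzip2 stream."""
    return bz2.compress(data, compresslevel=9)


def compress_parallel(data: bytes, workers: int = 4) -> bytes:
    """Compress fixed-size chunks independently and concatenate the streams."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the streams line up with the chunks.
        streams = pool.map(lambda chunk: bz2.compress(chunk, 9), chunks)
        return b"".join(streams)


if __name__ == "__main__":
    # A few megabytes of repetitive sample text, enough for several chunks.
    data = b"".join(b"line %d of sample text\n" % i for i in range(100_000))

    serial = compress_serial(data)
    parallel = compress_parallel(data)

    print("serial bytes:  ", len(serial))
    print("parallel bytes:", len(parallel))
    print("same compressed bytes:", serial == parallel)
    # bz2.decompress() accepts several concatenated streams (Python 3.3+).
    print("both round-trip:", bz2.decompress(serial) == data
          and bz2.decompress(parallel) == data)
```

Each chunk carries its own stream header and trailer and cannot reuse anything learned from its neighbours, which is where the extra bytes come from. A decompressor that handles concatenated streams reads them back to back and produces the original data.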

See also: mgzip.