I Did Not Know: pbzip2

I just learned about pbzip2, a parallel implementation of the bzip2 compression algorithm that spreads the work across all the cores of a multicore machine.

On my Mac Pro at work, I installed it with MacPorts (sudo port install pbzip2). It is this kind of awesome:

$ ls -lh original.tar
-rw-r--r--  1 jmcmurry  staff   2.4G Feb  4 13:47 original.tar
$ time bzip2 -k -v original.tar
original.tar: 36.215:1,  0.221 bits/byte, 97.24% saved, 2604288000 in, 71911733 out.

real	13m3.313s
user	12m50.536s
sys	0m3.773s
$ mv original.tar.bz2 bzip2.tar.bz2
$ time pbzip2 -k -v original.tar
Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009]             (uses libbzip2 by Julian Seward)

# CPUs: 8
BWT Block Size: 900k
File Block Size: 900k
-------------------------------------------
File #: 1 of 1
Input Name: original.tar
Output Name: original.tar.bz2

Input Size: 2604288000 bytes
Compressing data...
-------------------------------------------

Wall Clock: 119.369207 seconds

real	1m59.612s 
user	14m39.090s
sys	0m44.840s

Sweet. About 6.5x faster (13m3s down to 2m0s of wall-clock time) by adding a “p” to my command line.
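
For anyone who wants to check the arithmetic, here is a small sketch of the calculation. The speedup helper is a name I made up for illustration; the two numbers are the "real" times from the runs above, converted to seconds.

```shell
# Speedup = serial wall-clock time / parallel wall-clock time.
# speedup is a hypothetical helper, not part of bzip2 or pbzip2.
speedup() {
    awk -v serial="$1" -v parallel="$2" 'BEGIN { printf "%.2f\n", serial / parallel }'
}

# bzip2:  13m3.313s = 783.313 s
# pbzip2: 1m59.612s = 119.612 s
speedup 783.313 119.612
```

With these inputs it prints 6.55. Dividing that by the 8 CPUs pbzip2 reported gives roughly 82% parallel efficiency.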

The two compressed .bz2 files aren’t byte-for-byte identical: their md5 checksums differ, and the pbzip2 output is a little larger, which makes sense given that the work is split into pieces that are compressed separately. Both decompress to a file identical to the original .tar, though.
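
Here is a rough sketch of that kind of check which runs with plain bzip2 alone. It assumes, as I understand it, that pbzip2 compresses chunks independently and concatenates the resulting streams, and it imitates that by compressing two halves of a file separately. The same_as_original helper and the demo file names are made up for illustration.

```shell
# Succeeds if the .bz2 file decompresses to exactly the bytes of the original.
# same_as_original is a hypothetical helper, not part of bzip2 or pbzip2.
same_as_original() {
    bzip2 -dc "$1" | cmp -s - "$2"
}

# Small stand-in for original.tar (demo data only).
tmp=$(mktemp -d)
seq 1 200000 > "$tmp/original"

# One stream, as plain bzip2 produces.
bzip2 -c "$tmp/original" > "$tmp/single.bz2"

# Two halves compressed independently and concatenated,
# a rough imitation of chunked parallel output.
head -n 100000 "$tmp/original" | bzip2 -c >  "$tmp/chunked.bz2"
tail -n 100000 "$tmp/original" | bzip2 -c >> "$tmp/chunked.bz2"

same_as_original "$tmp/single.bz2"  "$tmp/original" && echo "single stream: identical"
same_as_original "$tmp/chunked.bz2" "$tmp/original" && echo "chunked streams: identical"
cmp -s "$tmp/single.bz2" "$tmp/chunked.bz2" || echo "compressed files differ"
```

On the real files, the equivalent check would be same_as_original bzip2.tar.bz2 original.tar followed by same_as_original original.tar.bz2 original.tar.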

See also: mgzip.