How to speed up gzip

Making gzip go faster

Gzip. It’s one of those tools you’ll find everywhere. You probably use tar -xzf tarball.tar.gz on a regular basis. But did you know gzip is single-threaded?

That’s fine if you’re just compressing some small files. But what if you have to compress something large? Say, a 12 GB MySQL export?

Well, I’ll tell you. On my system the single-threaded gzip took its sweet time: 1 hour 12 minutes, to be exact.

But pigz - the multithreaded replacement for gzip - was done in a mere 7 minutes.

Speeeeed
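If you want to check the difference on your own machine, here is a rough timing sketch. The generated sample file, its size and the time_compressor helper are made up for illustration, and the timing only has one-second resolution, so use a much larger file for a meaningful comparison.

```shell
#!/bin/sh
# Rough timing sketch: compress the same file with gzip and, if installed, pigz.
# time_compressor is a hypothetical helper, not part of either tool.
time_compressor() {
    start=$(date +%s)
    "$1" -c "$2" > /dev/null      # -c writes to stdout, so the input file is left untouched
    end=$(date +%s)
    echo "$1: $((end - start))s"
}

# Stand-in for a real database dump: 20 MB of repetitive SQL-like text.
sample=$(mktemp)
yes "INSERT INTO demo VALUES (1, 'example row');" | head -c 20000000 > "$sample"

time_compressor gzip "$sample"
# pigz is only timed when it is actually installed
if command -v pigz >/dev/null 2>&1; then
    time_compressor pigz "$sample"
fi
```

Your numbers will depend on core count, disk speed and how compressible the data is.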

How do I tarball with pigz?

One of the most useful things about gzip is that it integrates so well with
tar. Luckily, tar allows us to specify which program to use for compression.
So, after you’ve installed pigz through your package manager of choice, try this on for size:

tar -I pigz -cf filename.tar.gz directory_name/

Or better, if you’re in complete control and compile your own tools, try building tar with the --with-gzip=pigz configure flag.
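If your tar doesn’t support -I, or you want to set the thread count yourself, piping tar into the compressor works too. Here is a minimal sketch: the compress_stream helper, the demo files and the thread count of 4 are assumptions for illustration, and it falls back to plain gzip when pigz isn’t installed.

```shell
#!/bin/sh
# compress_stream is a hypothetical helper: it compresses stdin to stdout,
# using pigz when available and plain gzip otherwise.
compress_stream() {
    if command -v pigz >/dev/null 2>&1; then
        pigz -p 4      # -p sets the number of threads; 4 is an arbitrary example
    else
        gzip
    fi
}

# Demo input: a throwaway directory with two small files.
workdir=$(mktemp -d)
mkdir "$workdir/directory_name"
echo "hello" > "$workdir/directory_name/a.txt"
echo "world" > "$workdir/directory_name/b.txt"

# tar writes the archive to stdout (-f -) and the compressor reads it from stdin.
tar -C "$workdir" -cf - directory_name | compress_stream > "$workdir/filename.tar.gz"

# pigz output is ordinary gzip data, so gzip and tar -z can read it back.
gzip -t "$workdir/filename.tar.gz" && tar -tzf "$workdir/filename.tar.gz"
```

The same idea works in reverse for extraction: pigz -dc filename.tar.gz | tar -xf -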

Other compression methods

There are also other multithreaded alternatives to the classic compression programs available. Here are some of them:

  • bzip2 -> pbzip2
  • lzip -> plzip
  • xz -> pxz

(Do you see the pattern here?)
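Since not every machine has the parallel tools installed, a script can pick one at run time. The following is only a sketch: parallel_variant is a made-up helper name, and the tar line in the comment assumes GNU tar’s -I option.

```shell
#!/bin/sh
# parallel_variant (hypothetical helper) maps a classic compressor to its
# parallel counterpart and prints the classic name if that one is not installed.
parallel_variant() {
    case "$1" in
        gzip)  candidate=pigz ;;
        bzip2) candidate=pbzip2 ;;
        lzip)  candidate=plzip ;;
        xz)    candidate=pxz ;;
        *)     candidate=$1 ;;
    esac
    if command -v "$candidate" >/dev/null 2>&1; then
        echo "$candidate"
    else
        echo "$1"
    fi
}

# Example use with tar (directory_name is a placeholder):
#   tar -I "$(parallel_variant bzip2)" -cf filename.tar.bz2 directory_name/

# Prints pigz if it is installed, gzip otherwise.
parallel_variant gzip
```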

Bonus: parallel compression with gzip, without pigz

You’re on Linux, and you probably have xargs in your distro, right? Did you know you can use it to parallelize a lot of things?

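Try this, which compresses every file under a directory separately, one gzip process per file (it does not build a single tarball):

find directory_to_compress/ -type f -print0 | xargs -0 -n 1 -P "$(grep -c '^processor' /proc/cpuinfo)" gzip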
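To see the xargs approach end to end, here is a self-contained sketch that you can run in a scratch directory. It assumes find and xargs support -print0, -0 and -P (the GNU and BSD versions do); the demo file names are made up, and nproc is used for the CPU count with a fallback to 1. Keep in mind that xargs reads the file names from standard input, and that each file ends up as its own .gz rather than in one archive.

```shell
#!/bin/sh
# Demo input: four small files in a throwaway directory.
workdir=$(mktemp -d)
for i in 1 2 3 4; do
    echo "file number $i" > "$workdir/file$i.txt"
done

# Number of CPUs: nproc where available, otherwise assume 1.
cpus=$(nproc 2>/dev/null || echo 1)

# find emits NUL-separated file names; xargs runs up to $cpus gzip
# processes at once, passing one file name (-n 1) to each.
find "$workdir" -type f -print0 | xargs -0 -n 1 -P "$cpus" gzip

# Every file has been replaced by a .gz version of itself.
ls "$workdir"
```

This only pays off when there are many files to compress; a single huge file still goes through one gzip process.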

Conclusion

I hope this was as useful for you as it was for me when I first learnt this. I’ve been a command-line jockey for quite a while, and this was one of the tricks that blew my mind.