How to speed up gzip
Making gzip go faster
Gzip. It’s one of those tools you’ll find everywhere. You’ll probably use tar -xzf tarball.tar.gz on a regular basis. But did you know gzip is single-threaded by default?
That’s fine if you’re just compressing some small files. But what if you have to compress something large? Say, a 12 GB MySQL export?
Well, I’ll tell you. On my system the single-threaded gzip took its sweet time: 1 hour 12 minutes, to be exact. But pigz, the multithreaded replacement for gzip, was done in a mere 7 minutes.
How do I tarball with pigz?
One of the most useful things about gzip is that it integrates so well with tar. Luckily, tar allows us to specify which program to use for compression.
So, after you’ve installed pigz through your package manager of choice, try this on for size:
tar -I pigz -cf filename.tar.gz directory_name/
Or better, if you’re in complete control and compile your own tools, try building tar with the --with-gzip=pigz configure flag, so that a plain tar -czf uses pigz.
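If your tar doesn’t support -I, or you’d rather keep the compressor as a separate step, a pipe gets you much the same result. The following is only a sketch under a few assumptions: it creates a throwaway demo directory with mktemp (the file names are made up for illustration), and it falls back to plain gzip when pigz isn’t installed.

```shell
#!/bin/sh
# Sketch: create and extract a tarball through a (parallel) compressor via a pipe.
# Assumption: use pigz when available, otherwise fall back to plain gzip.
set -eu

COMPRESSOR=$(command -v pigz || command -v gzip)

# Hypothetical demo data so the example is self-contained.
WORKDIR=$(mktemp -d)
mkdir -p "$WORKDIR/directory_name"
echo "hello" > "$WORKDIR/directory_name/a.txt"
echo "world" > "$WORKDIR/directory_name/b.txt"

# Create: tar writes the archive to stdout, the compressor reads it from stdin.
tar -C "$WORKDIR" -cf - directory_name | "$COMPRESSOR" > "$WORKDIR/filename.tar.gz"

# Extract: decompress to stdout and let tar read the archive from stdin.
mkdir -p "$WORKDIR/restored"
"$COMPRESSOR" -dc "$WORKDIR/filename.tar.gz" | tar -C "$WORKDIR/restored" -xf -
```

Because pigz writes ordinary gzip output, the resulting file can be unpacked with a regular tar -xzf on machines that don’t have pigz.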
Other compression methods
There are also other multithreaded alternatives to the classic compression programs available. Here are some of them:
- bzip2 -> pbzip2
- lzip -> plzip
- xz -> pxz
(Do you see the pattern here?)
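Since these parallel tools aren’t always installed, a script can pick one when it’s present and fall back to the classic program otherwise. Here’s a minimal sketch of that idea; pick_compressor is a hypothetical helper name of my own, not part of any of these tools.

```shell
#!/bin/sh
# Sketch: prefer a parallel compressor, fall back to the classic one.
# pick_compressor is a hypothetical helper, not part of tar or any compressor.
set -eu

# Usage: pick_compressor PREFERRED FALLBACK
# Prints PREFERRED if it is found on PATH, otherwise prints FALLBACK.
pick_compressor() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1"
    else
        echo "$2"
    fi
}

COMPRESSOR=$(pick_compressor pigz gzip)
echo "Using: $COMPRESSOR"
```

You could then write something like tar -I "$(pick_compressor pbzip2 bzip2)" -cf archive.tar.bz2 directory_name/ and get the parallel version wherever it happens to be available.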
Bonus: parallel compression with gzip, without pigz
You’re on Linux, and you probably have xargs in your distro, right? Did you know you can use it to parallelize a lot of things?
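Try this, which compresses every file under a directory individually, running one gzip process per core (note that you get one .gz per file, not a single tarball):
find directory_to_compress/ -type f -print0 | xargs -0 -n 1 -P $(grep -c '^processor' /proc/cpuinfo) gzip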
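The same xargs trick can be stretched to a single large file, because the gzip format allows several compressed members to be concatenated into one valid file. What follows is a sketch under assumptions rather than a tuned recipe: the export.sql file is a small stand-in generated with seq, and the 256k chunk size is an arbitrary choice for the demo.

```shell
#!/bin/sh
# Sketch: compress one large file in parallel with plain gzip and xargs.
# Relies on the gzip format allowing concatenated members.
set -eu

WORKDIR=$(mktemp -d)
cd "$WORKDIR"

# Hypothetical stand-in for a large export; replace with your real file.
seq 1 200000 > export.sql

# Count CPUs via /proc/cpuinfo; fall back to 1 job if that fails.
JOBS=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null || true)
[ "${JOBS:-0}" -ge 1 ] || JOBS=1

# Split into fixed-size pieces named chunk.aa, chunk.ab, ...
split -b 256k export.sql chunk.

# Compress the pieces in parallel, one gzip process per piece.
printf '%s\n' chunk.* | xargs -n 1 -P "$JOBS" gzip

# Concatenating the gzip members in order yields one valid gzip file.
cat chunk.*.gz > export.sql.gz
rm chunk.*.gz
```

The trade-off is a slightly larger result, since each chunk is compressed without knowledge of the others. A plain gzip -dc export.sql.gz restores the original contents.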
Conclusion
I hope this was as useful for you as it was for me when I first learnt this. I’ve been a command-line jockey for quite a while, and this was one of the tricks that blew my mind.