Tag: gzip

  • Ubuntu: How to use multiple cores with tar gzip compression

     

    One thing you may have noticed when using the -z switch with tar is that compression can take some time! If you look at your CPU usage, though, you'll notice that only one core is being utilised to compress the files. Modern systems commonly have 4 or 8 cores, so there is plenty of potential to speed up the process if more of them could be put to work. As gzip itself is single-threaded, we need to look elsewhere.

     

    Fortunately, there is a parallel implementation of gzip that can use multiple cores: pigz. To install it, type:

     

    sudo apt-get install pigz


     

    Once that is installed we can tell the tar command to use it like so:

     

    tar -c --use-compress-program=pigz -f [archive file] [directory or files]


     

    e.g.:

     

    tar -c --use-compress-program=pigz -f backupOfMovies.tar.gz /opt/movies


     

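    Note the double hyphen before use-compress-program. Check your CPU usage while the command is running: you should see all available cores being utilised. Because pigz writes standard gzip output, the resulting archive can still be listed or extracted with an ordinary tar -xzf.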
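    If you script your backups, a small wrapper can use pigz when it is installed and fall back to plain gzip otherwise. Both produce the same .tar.gz format. This is a sketch that assumes GNU tar and coreutils (for nproc). make_tarball and the THREADS variable are hypothetical names. It pipes tar into pigz instead of using --use-compress-program so that pigz's -p option can cap the number of threads.

```shell
#!/usr/bin/env bash
# Sketch: build a gzip-compatible tarball, using pigz when it is installed.
# make_tarball and THREADS are hypothetical names chosen for this example.
set -euo pipefail

make_tarball() {
  local archive="$1"
  shift
  # Use every available core unless THREADS is set explicitly.
  local threads="${THREADS:-$(nproc)}"
  if command -v pigz >/dev/null 2>&1; then
    # Stream the tar archive to stdout and let pigz compress it on $threads cores.
    tar -cf - "$@" | pigz -p "$threads" > "$archive"
  else
    # Fallback: single-threaded gzip writes the same file format.
    tar -cf - "$@" | gzip > "$archive"
  fi
}
```

    For example, make_tarball backupOfMovies.tar.gz /opt/movies uses all cores. THREADS=4 make_tarball backupOfMovies.tar.gz /opt/movies leaves the remaining cores free for other work. gzip -t backupOfMovies.tar.gz checks the result afterwards.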

  • ZFS: How to change the compression level

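    When compression is switched on with compression=on, ZFS uses its default algorithm. That is lzjb, or lz4 on pools where the lz4_compress feature is enabled. You can select other algorithms when setting compression on a ZFS dataset. To try another one, do the following: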

     

    sudo zfs set compression=gzip [zfs dataset]

     

    This changes the compression algorithm to gzip. Plain gzip is equivalent to gzip-6 compression; we can specify the level we want explicitly with:

     

    sudo zfs set compression=gzip-[1-9] [zfs dataset]

     

    e.g.

     

    sudo zfs set compression=gzip-8 kepler/data
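    Since ZFS only accepts gzip-1 to gzip-9, a small wrapper can check the level before applying it and then read the property back to confirm the change. This is a sketch: set_gzip_level, run and the DRY_RUN switch are hypothetical. It assumes the zfs command is on the PATH and that you run it from a root shell.

```shell
#!/usr/bin/env bash
# Sketch: validate a gzip level, apply it to a dataset, then read it back.
# set_gzip_level, run and DRY_RUN are hypothetical names; DRY_RUN=1 prints
# the zfs commands instead of running them. Otherwise run as root.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

set_gzip_level() {
  local level="$1" dataset="$2"
  # Only the single digits 1-9 are valid gzip levels in ZFS.
  case "$level" in
    [1-9]) ;;
    *)
      echo "invalid gzip level '$level' (expected 1-9)" >&2
      return 1
      ;;
  esac
  run zfs set "compression=gzip-$level" "$dataset"
  # -H omits the header line; -o value prints just the property value.
  run zfs get -H -o value compression "$dataset"
}
```

    Once the functions are loaded, set_gzip_level 8 kepler/data applies gzip-8 and then prints the property value that ZFS reports back.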

     

    Note that the pool name takes no leading /, and that you can set this at the pool level, not just on child datasets. gzip-1 is the lowest level of compression (less CPU-intensive, less compressed), whereas gzip-9 is the opposite: often quite CPU-intensive, but offering the most compression. This isn't necessarily a linear scale, mind, and the type of data you are compressing will have a huge impact on the returns you'll see.

    Try various levels on your own data, checking CPU usage as you go and the compression efficiency afterwards. You may find that gzip-9 is too CPU-intensive, or that you get little extra benefit beyond a certain level. Note that changing the compression level only affects data written to the dataset afterwards; existing data stays compressed as it was. An easy way to test this is to create several datasets, set a different compression level on each, and copy some typical data to them one by one while observing. We discussed checking your compression efficiency in a previous post.
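    The testing approach described above can be scripted roughly as follows. This sketch rests on several assumptions. compare_gzip_levels, run and the gztest-N dataset names are hypothetical. The new datasets are assumed to mount at the default /<pool>/<name> location. The timing is only a crude wall-clock measurement of each copy. Setting DRY_RUN=1 prints the commands without touching any pool; otherwise run it as root.

```shell
#!/usr/bin/env bash
# Sketch: one scratch dataset per gzip level, the same sample data copied into
# each, then the elapsed copy time and resulting compressratio are reported.
# compare_gzip_levels, run and the gztest-N names are hypothetical.
set -euo pipefail

run() {
  # DRY_RUN=1 prints the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

compare_gzip_levels() {
  local parent="$1" sample="$2"
  shift 2
  local level ds start end
  for level in "$@"; do
    ds="$parent/gztest-$level"
    # Set compression at creation time so all copied data uses this level.
    run zfs create -o "compression=gzip-$level" "$ds"
    start=$(date +%s)
    # Assumes the default mountpoint /<parent>/gztest-<level>.
    run cp -r "$sample" "/$ds/"
    run sync # flush pending writes so the timing includes them
    end=$(date +%s)
    echo "gzip-$level: copy took $((end - start))s"
    run zfs get -H -o value compressratio "$ds"
  done
}
```

    For example, compare_gzip_levels kepler /opt/movies 1 3 6 9 tries four levels on the same data. Afterwards the scratch datasets can be removed with zfs destroy kepler/gztest-1 and so on.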

     

    Compression doesn't just save space, however: it can also improve disk performance at the cost of CPU usage, because less data has to be read from and written to the disks. Try some benchmarks on compression-enabled datasets and see whether you notice any improvement. Depending on your setup, it can be anywhere from slight to significant.