Fwd: lrzip: extreme compression (but beware its slow decompression speed)

Joel Sherrill joel.sherrill at OARcorp.com
Fri Mar 30 17:05:08 UTC 2012


Interesting study of compression programs on gcc's tarball.

The size-versus-time tradeoff is the interesting part.

-------- Original Message --------
Subject: 	lrzip: extreme compression (but beware its slow decompression speed)
Date: 	Fri, 30 Mar 2012 12:01:45 -0500
From: 	Jim Meyering <jim at meyering.net>
To: 	gcc at gcc.gnu.org <gcc at gcc.gnu.org>



In case you're evaluating what compression programs to use...

This started off as a comparison of xz and lzip,
but then I added lrzip to the mix.

Sometimes it's useful to have an idea of how far from "ideal"
a compression program is.  I'm not claiming to have the answer,
but merely sharing my surprise at how far off xz and lzip are
when it comes to the size of the compressed result.

I started off by downloading the gcc-4.7.0.tar.bz2 release tarball
and decompressing it, then recompressing using bzip2, lzip, xz and lrzip:
(on a 6/12-core Fedora 17 x86_64 system with plenty of RAM)

  size   compression
 (KiB)   time (m:ss)   file name (options)
------   -----------   ---------------------------------
514400       NA        gcc-4.7.0.tar
 80588     0:58.12     gcc-4.7.0.tar.bz2 (-9)
 59556     6:16.61     gcc-4.7.0.tar.lz (-9)
 58640     5:55.78     gcc-4.7.0.tar.xz (-9e)
 48876     2:46[*]     gcc-4.7.0.tar.lrz (-z -L8 -w2000)

[*] multi-threaded; I think it had at least 6 or 7 cores busy at one point.
This is using the latest, v0.47-590-ga9ba55f, from the upstream repo,
git://github.com/ckolivas/lrzip.git

The above shows that xz compresses both faster than lzip (by about 5%)
and better (by 916 KiB, or ~1.5%).

It also shows that lrzip compresses extremely well, saving over 9 MiB
(more than 16%) compared with xz at -9e.
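
The percentages quoted above follow directly from the table. As a rough
check, here is a small Python sketch of the arithmetic; the helper names
(to_seconds, percent_less) are made up for illustration and are not part
of any of the tools discussed.

```python
# Sizes (KiB) and compression times copied from the table above.
SIZE_KIB = {"bz2": 80588, "lz": 59556, "xz": 58640, "lrz": 48876}
COMPRESS_TIME = {"bz2": "0:58.12", "lz": "6:16.61", "xz": "5:55.78"}


def to_seconds(m_ss: str) -> float:
    """Convert an 'm:ss.ss' string to seconds."""
    minutes, seconds = m_ss.split(":")
    return int(minutes) * 60 + float(seconds)


def percent_less(new: float, old: float) -> float:
    """How much smaller `new` is than `old`, as a percentage of `old`."""
    return (old - new) / old * 100


# xz versus lzip: size saved, relative size saving, relative time saving.
print(SIZE_KIB["lz"] - SIZE_KIB["xz"])                         # 916 (KiB)
print(round(percent_less(SIZE_KIB["xz"], SIZE_KIB["lz"]), 1))  # 1.5 (%)
xz_s = to_seconds(COMPRESS_TIME["xz"])
lz_s = to_seconds(COMPRESS_TIME["lz"])
print(round(percent_less(xz_s, lz_s), 1))                      # 5.5 (%)

# lrzip versus xz: size saved in MiB and as a percentage.
saved_kib = SIZE_KIB["xz"] - SIZE_KIB["lrz"]
print(round(saved_kib / 1024, 1))                                # 9.5 (MiB)
print(round(percent_less(SIZE_KIB["lrz"], SIZE_KIB["xz"]), 1))  # 16.7 (%)
```

The lrzip compression time is left out of the time comparison because
its run was multi-threaded, so its wall-clock figure is not directly
comparable with the single-threaded ones.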

----------------------------------------------------
More importantly, what about decompression speed?
The compression happens relatively rarely, by the person who prepares
a release, but then many people download and decompress the result.

(the following xz and lzip times are each best-of-3)

     $ env time --f=%e xz -dc gcc-4.7.0.tar.xz > /dev/null
     4.35

     $ env time --f=%e lzip -dc gcc-4.7.0.tar.lz > /dev/null
     6.06

     $ env time --f=%e bzip2 -dc gcc-4.7.0.tar.bz2 > /dev/null
     13.96

     $ ./lrzip -d -o - gcc-4.7.0.tar.lrz > /dev/null
     3:36.12 (note, that's 3.5 *minutes* to decompress on a 12-core system)
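
For anyone wanting to repeat the best-of-3 measurement, one possible way
to script it is sketched below in Python. This is not how the numbers
above were produced (those came from running `env time` by hand); the
function name best_of is hypothetical, and the sketch measures wall-clock
time only.

```python
import subprocess
import time
from typing import Sequence


def best_of(cmd: Sequence[str], runs: int = 3) -> float:
    """Run `cmd` `runs` times and return the fastest wall-clock time (seconds)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the decompressed output, like "> /dev/null" in the shell.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Example call, assuming xz is installed and the tarball is in the
# current directory:
#   best_of(["xz", "-dc", "gcc-4.7.0.tar.xz"])
```

Taking the minimum rather than the mean is a common choice here, since
the fastest run is the one least disturbed by other activity on the
machine.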

That shows another reason to prefer xz over lzip.
xz decompresses this tarball in 28% less time than lzip.
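
The 28% figure, and the size of the gap to lrzip, can be checked from
the timings above. A short Python sketch (the dictionary and variable
names are illustrative only):

```python
# Wall-clock decompression times in seconds, taken from the runs above.
DECOMPRESS_S = {
    "xz": 4.35,
    "lzip": 6.06,
    "bzip2": 13.96,
    "lrzip": 3 * 60 + 36.12,  # 3:36.12
}

xz = DECOMPRESS_S["xz"]
lzip = DECOMPRESS_S["lzip"]

# Percentage of lzip's time that xz saves.
print(round((lzip - xz) / lzip * 100, 1))  # 28.2

# How many times slower each tool is than xz on this tarball.
for tool in ("lzip", "bzip2", "lrzip"):
    print(tool, round(DECOMPRESS_S[tool] / xz, 1))
# lzip 1.4
# bzip2 3.2
# lrzip 49.7
```

So on this machine and this tarball, lrzip's roughly 16% size advantage
over xz comes at the price of decompression that is about 50 times
slower.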
