Re: Optimizing pglz compressor

From: Daniel Farina <daniel(at)heroku(dot)com>
To:
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimizing pglz compressor
Date: 2013-03-19 05:27:23
Message-ID: CAAZKuFZCOCHsswQM60ioDO_hk12tA7OG3YcJA8v=4YebMOA-wA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 6, 2013 at 6:32 AM, Joachim Wieland <joe(at)mcknight(dot)de> wrote:
> On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> With these tweaks, I was able to make pglz-based delta encoding perform
>> roughly as well as Amit's patch.
>
> Out of curiosity, do we know how pglz compares with other algorithms, e.g. lz4 ?

This one is for the archives, as I found it surprising: the relative
performance of these algorithms can differ by a huge margin depending on
the architecture. Here's a table reproduced from:
http://www.reddit.com/r/programming/comments/1aim6s/lz4_extremely_fast_compression_algorithm/c8y0ew9

"""
testdata/alice29.txt :
ZLIB: [b 1M] bytes 152089 -> 54404 35.8% comp 0.8 MB/s uncomp 8.1 MB/s
LZO: [b 1M] bytes 152089 -> 82721 54.4% comp 14.5 MB/s uncomp 43.0 MB/s
CSNAPPY: [b 1M] bytes 152089 -> 90965 59.8% comp 2.1 MB/s uncomp 4.4 MB/s
SNAPPY: [b 4M] bytes 152089 -> 90965 59.8% comp 1.8 MB/s uncomp 2.8 MB/s
testdata/asyoulik.txt :
ZLIB: [b 1M] bytes 125179 -> 48897 39.1% comp 0.8 MB/s uncomp 7.7 MB/s
LZO: [b 1M] bytes 125179 -> 73224 58.5% comp 15.3 MB/s uncomp 42.4 MB/s
CSNAPPY: [b 1M] bytes 125179 -> 80207 64.1% comp 2.0 MB/s uncomp 4.2 MB/s
SNAPPY: [b 4M] bytes 125179 -> 80207 64.1% comp 1.7 MB/s uncomp 2.7 MB/s

LZO was ~8x faster compressing and ~16x faster decompressing. Only on
incompressible data was Snappy faster:

testdata/house.jpg :
ZLIB: [b 1M] bytes 126958 -> 126513 99.6% comp 1.2 MB/s uncomp 9.6 MB/s
LZO: [b 1M] bytes 126958 -> 127173 100.2% comp 4.2 MB/s uncomp 74.9 MB/s
CSNAPPY: [b 1M] bytes 126958 -> 126803 99.9% comp 24.6 MB/s uncomp 381.2 MB/s
SNAPPY: [b 4M] bytes 126958 -> 126803 99.9% comp 22.8 MB/s uncomp 354.4 MB/s
"""

So that's one more gotcha to worry about, since I suspect most numbers
are being taken on x86. Apparently this has something to do with the
alignment of memory accesses. Some of it may be fixable by tweaking the
implementation rather than the compression encoding, although I am no
expert in the matter.

--
fdr
