Re: pglz performance

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vladimir Leskov <vladimirlesk(at)yandex-team(dot)ru>
Subject: Re: pglz performance
Date: 2019-08-02 14:43:45
Message-ID: 20190802144345.d62jtiyyx6r2y73f@development
Lists: pgsql-hackers

On Fri, Aug 02, 2019 at 04:45:43PM +0300, Konstantin Knizhnik wrote:
>
>
>On 27.06.2019 21:33, Andrey Borodin wrote:
>>
>>>On 13 May 2019, at 12:14, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>>>
>>>Decompression can matter a lot for mostly-read workloads and
>>>compression can become a bottleneck for heavy-insert loads, so
>>>improving compression or decompression should be two separate
>>>problems, not two problems linked. Any improvement in one or the
>>>other, or even both, is nice to have.
>>Here's a patch hacked together by Vladimir for compression.
>>
>>Key differences (as far as I can see; maybe Vladimir will post a more complete list of optimizations):
>>1. Use functions instead of function-like macros: not surprisingly, they are easier to optimize and impose fewer constraints on the compiler.
>>2. More compact hash table: use indexes instead of pointers.
>>3. More robust segment comparison: like memcmp, but returns the index of the first differing byte.
>>
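As an illustration of item 3 in the quoted list, a memcmp-like comparison that reports where two segments first differ could be sketched as below. This is only a minimal sketch: the name common_prefix_len is hypothetical and is not taken from the patch, and the actual implementation may compare several bytes at a time.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): compare up to len bytes of
 * a and b and return the index of the first byte that differs. If all
 * len bytes are equal, len is returned. For an LZ-style compressor this
 * value is the length of the match between the two segments.
 */
static size_t
common_prefix_len(const unsigned char *a, const unsigned char *b, size_t len)
{
    size_t i = 0;

    /* Advance while we are inside the limit and the bytes are equal. */
    while (i < len && a[i] == b[i])
        i++;

    return i;
}
```

Unlike memcmp, which only says whether the segments differ and in which direction, the return value here can be used directly as the match length.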
>>In a weighted mix of different data (same as for compression), the overall speedup is 1.43x on my machine.
>>
>>The current implementation is integrated into the test_pglz suite for benchmarking purposes [0].
>>
>>Best regards, Andrey Borodin.
>>
>>[0] https://github.com/x4m/test_pglz
>
>It took me some time to understand that your memcpy optimization is
>correct ;)
>I have tested different ways of optimizing this fragment of code, but
>failed to outperform your implementation!
>Results on my computer are similar to yours:
>
>Decompressor score (summ of all times):
>NOTICE:  Decompressor pglz_decompress_hacked result 6.627355
>NOTICE:  Decompressor pglz_decompress_hacked_unrolled result 7.497114
>NOTICE:  Decompressor pglz_decompress_hacked8 result 7.412944
>NOTICE:  Decompressor pglz_decompress_hacked16 result 7.792978
>NOTICE:  Decompressor pglz_decompress_vanilla result 10.652603
>
>Compressor score (summ of all times):
>NOTICE:  Compressor pglz_compress_vanilla result 116.970005
>NOTICE:  Compressor pglz_compress_hacked result 89.706105
>
>
>But ...  below are results for lz4:
>
>Decompressor score (summ of all times):
>NOTICE:  Decompressor lz4_decompress result 3.660066
>Compressor score (summ of all times):
>NOTICE:  Compressor lz4_compress result 10.288594
>
>That is about a 2x advantage in decompression speed and about a 10x
>advantage in compression speed.
>So maybe instead of "hacking" the pglz algorithm we should rather switch
>to lz4?
>
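The "2 times" and "10 times" figures quoted above can be checked from the NOTICE lines. The sketch below only divides the quoted totals; the constant and function names are made up for illustration, and the unit of the timings is not stated in the quoted output.

```c
#include <stdio.h>

/* Totals copied from the quoted NOTICE lines (sum of all times). */
static const double PGLZ_DECOMPRESS_VANILLA = 10.652603;
static const double PGLZ_DECOMPRESS_HACKED  = 6.627355;
static const double LZ4_DECOMPRESS          = 3.660066;
static const double PGLZ_COMPRESS_VANILLA   = 116.970005;
static const double PGLZ_COMPRESS_HACKED    = 89.706105;
static const double LZ4_COMPRESS            = 10.288594;

/* How many times faster the candidate is than the baseline. */
static double
speedup(double baseline_time, double candidate_time)
{
    return baseline_time / candidate_time;
}

/* Hypothetical helper: print lz4's advantage over each pglz variant. */
static void
print_lz4_speedups(void)
{
    printf("decompress: lz4 vs hacked pglz  %.2fx\n",
           speedup(PGLZ_DECOMPRESS_HACKED, LZ4_DECOMPRESS));
    printf("decompress: lz4 vs vanilla pglz %.2fx\n",
           speedup(PGLZ_DECOMPRESS_VANILLA, LZ4_DECOMPRESS));
    printf("compress:   lz4 vs hacked pglz  %.2fx\n",
           speedup(PGLZ_COMPRESS_HACKED, LZ4_COMPRESS));
    printf("compress:   lz4 vs vanilla pglz %.2fx\n",
           speedup(PGLZ_COMPRESS_VANILLA, LZ4_COMPRESS));
}
```

With these numbers lz4 decompression comes out roughly 1.8x faster than the hacked pglz and 2.9x faster than vanilla pglz, while lz4 compression is roughly 8.7x faster than the hacked pglz and 11.4x faster than vanilla pglz.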

I think we should just bite the bullet and add an initdb option to pick
the compression algorithm. That's been discussed repeatedly, but we never
ended up actually doing that. See for example [1].

If there's anyone willing to put some effort into getting this feature
over the line, I'm willing to do reviews & commit. It's a seemingly
small change with rather insane potential impact.

But even if we end up doing that, it still makes sense to optimize the
hell out of pglz, because existing systems will still use that
(pg_upgrade can't switch from one compression algorithm to another).

regards

[1] https://www.postgresql.org/message-id/flat/55341569.1090107%402ndquadrant.com

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
