Re: pglz performance

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vladimir Leskov <vladimirlesk(at)yandex-team(dot)ru>
Subject: Re: pglz performance
Date: 2019-08-02 13:45:43
Message-ID: ea57b49a-ecf0-481a-a77b-631833354f7d@postgrespro.ru
Lists: pgsql-hackers

On 27.06.2019 21:33, Andrey Borodin wrote:
>
>> On 13 May 2019, at 12:14, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>>
>> Decompression can matter a lot for mostly-read workloads and
>> compression can become a bottleneck for heavy-insert loads, so
>> improving compression or decompression should be two separate
>> problems, not two problems linked. Any improvement in one or the
>> other, or even both, is nice to have.
> Here's a patch hacked by Vladimir for compression.
>
> Key differences (as far as I see, maybe Vladimir will post more complete list of optimizations):
> 1. Use functions instead of function-like macros: not surprisingly, they are easier to optimize and put fewer constraints on the compiler.
> 2. More compact hash table: use indexes instead of pointers.
> 3. More robust segment comparison: like memcmp, but return index of first different byte
>
> In a weighted mix of different data (same as for compression), the overall speedup is x1.43 on my machine.
>
> The current implementation is integrated into the test_pglz suite for benchmarking purposes [0].
>
> Best regards, Andrey Borodin.
>
> [0] https://github.com/x4m/test_pglz
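
To make point 3 of the quoted list concrete: a memcmp-like comparison that
returns the index of the first differing byte could look roughly like the
sketch below. The name match_length and its signature are hypothetical, and
this is only an illustration of the idea, not the code from Vladimir's patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: compare two byte segments like memcmp, but report
 * how far they agree. Returns the index of the first differing byte, or
 * max_len if the first max_len bytes are identical.
 */
static size_t
match_length(const unsigned char *a, const unsigned char *b, size_t max_len)
{
    size_t i = 0;

    assert(a != NULL && b != NULL);

    /* Advance while the segments agree and we are within bounds. */
    while (i < max_len && a[i] == b[i])
        i++;

    return i;
}
```

A compressor can use the returned value directly as the match length for a
history entry, instead of calling memcmp and then rescanning to find where
the mismatch happened.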

It took me some time to understand that your memcpy optimization is
correct ;)
I have tested different ways of optimizing this fragment of code, but
failed to outperform your implementation!
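
For anyone else puzzling over why memcpy is safe for an overlapping
back-reference: as far as I understand it, the idea is to copy only the
non-overlapping prefix and then double the offset, since the repeating
pattern in the output is then twice as long. Below is a minimal sketch of
that idea. The helper name copy_match is hypothetical and this is not
necessarily the exact code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: expand an LZ back-reference. Copies len bytes
 * starting at dp - off to dp, where the source may overlap the
 * destination (off < len). Returns the new end of the output.
 */
static unsigned char *
copy_match(unsigned char *dp, size_t off, size_t len)
{
    assert(off > 0);

    while (off < len)
    {
        /*
         * Source [dp - off, dp) and destination [dp, dp + off) do not
         * overlap, so memcpy is allowed here.
         */
        memcpy(dp, dp - off, off);
        len -= off;
        dp += off;
        /* The repeated pattern behind dp is now twice as long. */
        off += off;
    }

    /* Now len <= off, so the remaining copy cannot overlap either. */
    memcpy(dp, dp - off, len);
    return dp + len;
}
```

For example, with "abc" already in the output buffer, off = 3 and len = 10,
the loop copies 3 bytes, then 6 bytes, and the final memcpy copies the last
byte, leaving "abcabcabcabca" in the buffer.
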
Results on my computer are similar to yours:

Decompressor score (summ of all times):
NOTICE:  Decompressor pglz_decompress_hacked result 6.627355
NOTICE:  Decompressor pglz_decompress_hacked_unrolled result 7.497114
NOTICE:  Decompressor pglz_decompress_hacked8 result 7.412944
NOTICE:  Decompressor pglz_decompress_hacked16 result 7.792978
NOTICE:  Decompressor pglz_decompress_vanilla result 10.652603

Compressor score (summ of all times):
NOTICE:  Compressor pglz_compress_vanilla result 116.970005
NOTICE:  Compressor pglz_compress_hacked result 89.706105

But ...  below are results for lz4:

Decompressor score (summ of all times):
NOTICE:  Decompressor lz4_decompress result 3.660066
Compressor score (summ of all times):
NOTICE:  Compressor lz4_compress result 10.288594

That is roughly a 2x advantage in decompression speed and a 10x advantage
in compression speed (about 1.8x and 8.7x against the hacked pglz versions).
So maybe instead of "hacking" the pglz algorithm we should just switch to
lz4?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
