Re: pglz performance

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vladimir Leskov <vladimirlesk(at)yandex-team(dot)ru>
Subject: Re: pglz performance
Date: 2019-08-04 15:52:36
Message-ID: 7f52464f-5058-1186-ab49-3ac0931c3413@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 04/08/2019 11:57, Andrey Borodin wrote:
>
>
>> On 2 Aug 2019, at 21:39, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>
>> On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
>>> We have some kind of "roadmap" of "extensible pglz". We plan to provide an implementation at November's CF.
>>
>> I don't understand why it's a good idea to improve the compression side
>> of pglz. There are plenty of other people who have spent a lot of time
>> developing better compression algorithms.
> Improving the compression side of pglz covers two different projects:
> 1. Faster compression with less code and the same compression ratio (patch in this thread).
> 2. Better compression ratio with at least the same compression speed of uncompressed values.
> Why do I want to do a patch for 2? Because it's interesting.
> Will 1 or 2 be reviewed or committed? I have no idea.
> Will many users benefit from 1 or 2? Yes, clearly. Unless we force everyone to stop compressing with pglz.
>

FWIW I agree.

>> Just so that we don't idly talk, what do you think about the attached?
>> It:
>> - adds new GUC compression_algorithm with possible values of pglz (default) and lz4 (if lz4 is compiled in), requires SIGHUP
>> - adds --with-lz4 configure option (default yes, so the configure option is actually --without-lz4) that enables the lz4, it's using system library
>> - uses the compression_algorithm for both TOAST and WAL compression (if on)
>> - supports slicing for lz4 as well (pglz was already supported)
>> - supports reading old TOAST values
>> - adds 1 byte header to the compressed data where we currently store the algorithm kind, that leaves us with 254 more to add :) (that's an extra overhead compared to the current state)
>> - changes the rawsize in TOAST header to 31 bits via bit packing
>> - uses the extra bit to differentiate between old and new format
>> - supports reading from table which has different rows stored with different algorithm (so that the GUC itself can be freely changed)
> That's cool. I suggest defaulting to lz4 if it is available. You cannot start a cluster that once used lz4 on binaries built without lz4.
> Do we plan for the possibility of a compression algorithm as an extension? Or will all algorithms be packed into that byte in core?

What I wrote does not expect extensions to provide new compression
algorithms. We'd have to somehow reserve compression ids for specific
extensions, and that seems like a lot of extra complexity for little
benefit. I don't see much benefit in having more than, say, 3 generic
compressors (I could imagine adding zstd). If you are thinking about
data type specific compression, then I think this is the wrong layer.
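
To make the header layout from the list quoted above more concrete, here is a rough sketch of how a 31-bit rawsize, a one-bit format flag, and a one-byte algorithm id could fit together. This is not code from the attached patch: every name is hypothetical, and the choice of the top bit for the flag and of 0/1 as the ids for pglz/lz4 are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical; none are taken from the patch. */
#define SKETCH_NEWFORMAT_FLAG  UINT32_C(0x80000000)  /* assumed: top bit marks new format */
#define SKETCH_RAWSIZE_MASK    UINT32_C(0x7FFFFFFF)  /* low 31 bits hold the raw size */

/* One byte of id space: 256 values, two used here, 254 left. */
typedef enum SketchCompressionId
{
    SKETCH_COMPRESSION_PGLZ = 0,    /* assumed id */
    SKETCH_COMPRESSION_LZ4 = 1      /* assumed id */
} SketchCompressionId;

/* Pack a raw size and the format flag into one 32-bit header word. */
static inline uint32_t
sketch_pack_rawsize(uint32_t rawsize, bool new_format)
{
    assert(rawsize <= SKETCH_RAWSIZE_MASK);
    return rawsize | (new_format ? SKETCH_NEWFORMAT_FLAG : 0);
}

/* Extract the 31-bit raw size, ignoring the format flag. */
static inline uint32_t
sketch_rawsize(uint32_t word)
{
    return word & SKETCH_RAWSIZE_MASK;
}

/* True if the header word carries the new-format flag. */
static inline bool
sketch_is_new_format(uint32_t word)
{
    return (word & SKETCH_NEWFORMAT_FLAG) != 0;
}

/*
 * Decide which algorithm compressed a datum.  Old-format data has no
 * algorithm byte and can only be pglz; new-format data carries the id
 * in the first byte of the payload.
 */
static inline SketchCompressionId
sketch_algorithm(uint32_t word, const uint8_t *payload)
{
    if (!sketch_is_new_format(word))
        return SKETCH_COMPRESSION_PGLZ;
    return (SketchCompressionId) payload[0];
}
```

Because the algorithm is looked up per datum, rows written under different settings can coexist in one table, which is what lets the GUC be changed freely.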

> What about lz4 "common prefix"? System or user-defined. If lz4 is compiled in we can even offer in-system training, just make sure that trained prefixes will make their way to standbys.
>

I definitely don't plan to work on common prefix. But I don't see why
that could not be added later.

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
