Re: wal_compression=zstd

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: wal_compression=zstd
Date: 2022-03-05 10:26:39
Message-ID: YiM63/0LybPYqSUN@paquier.xyz
Lists: pgsql-hackers

On Fri, Mar 04, 2022 at 08:08:03AM -0500, Robert Haas wrote:
> On Fri, Mar 4, 2022 at 6:44 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>> In my 1-off test, it gets 610/633 = 96% of the benefit at 209/273 = 77% of the
>> cost.

Hmm, it may be good to start afresh and compile numbers in a single
chart. I did that here with some numbers on the user and system CPU:
https://www.postgresql.org/message-id/YMmlvyVyAFlxZ+/H(at)paquier(dot)xyz

In that test, the lowest ZSTD level did not differ much from the
default level, and at the highest level the user CPU spiked for
little extra compression. All the ZSTD levels compressed better than
LZ4, using more CPU in each case, but my impression is that choosing
between the default level and a level lower than the default would
not matter much in terms of compression gains and CPU usage.

> I agree with Michael. Your 1-off test is exactly that, and the results
> will have depended on the data you used for the test. I'm not saying
> we could never decide to default to a compression level other than the
> library's default, but I do not think we should do it casually or as
> the result of any small number of tests. There should be a strong
> presumption that the authors of the library have a good idea what is
> sensible in general unless we can explain compellingly why our use
> case is different from typical ones.
>
> There's an ease-of-use concern here too. It's not going to make things
> any easier for users to grok if zstd is available in different parts
> of the system but has different defaults in each place. It wouldn't be
> the end of the world if that happened, but neither would it be ideal.

I'd like to believe that anybody who writes their own compression
algorithm has a good idea of the default behavior they want it to
show, so we could keep things simple and trust them. Now, I would
not object to seeing some fresh numbers, and assuming that all FPIs
have the same page size, we could go as far as designing a couple of
test cases that produce a fixed number of FPIs and measure the
compressibility in a single session.
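
Here is a minimal sketch of what I have in mind for such a
single-session measurement (names are made up, and this assumes a
build with zstd support, the pg_stat_wal view from v14, and an
otherwise idle cluster):

    -- Force a fixed number of FPIs and measure the WAL volume
    -- generated with one compression method.
    CREATE TABLE fpi_test (id int, pad text);
    INSERT INTO fpi_test
      SELECT g, repeat('x', 100) FROM generate_series(1, 100000) g;
    SET wal_compression = 'zstd';  -- with this patch applied
    SELECT pg_stat_reset_shared('wal');
    CHECKPOINT;                    -- next write to each page emits an FPI
    UPDATE fpi_test SET id = id;   -- touch every page once
    SELECT wal_fpi, wal_bytes FROM pg_stat_wal;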

Repeatability and randomness of the data both count. We could have,
for example, one case with a set of 5~7 int attributes, a second with
text values made of random data of up to 10~12 bytes each, counting
on the tuple headers to provide some compressible data, and a third
with more repetitive data, like one int column populated with
generate_series(). Just to give an idea.
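
Spelled out, the three data sets could look like this (an
illustrative sketch only, names made up):

    -- Case 1: a set of 5~7 int attributes.
    CREATE TABLE t_ints (a int, b int, c int, d int, e int, f int);

    -- Case 2: random text values of up to 10~12 bytes, where most of
    -- the compressible content comes from the tuple headers.
    CREATE TABLE t_random (v text);
    INSERT INTO t_random
      SELECT substr(md5(random()::text), 1, 12)
      FROM generate_series(1, 100000);

    -- Case 3: repetitive data, one int column populated with
    -- generate_series().
    CREATE TABLE t_series (v int);
    INSERT INTO t_series SELECT g FROM generate_series(1, 100000) g;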
--
Michael
