Re: wal_compression=zstd

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: wal_compression=zstd
Date: 2022-03-04 11:44:06
Message-ID: 20220304114405.GK27651@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 04, 2022 at 04:19:32PM +0900, Michael Paquier wrote:
> On Tue, Feb 22, 2022 at 05:19:48PM -0600, Justin Pryzby wrote:
>
> > As writen, this patch uses zstd level=1 (whereas the ZSTD's default compress
> > level is 6).
>
> Why? ZSTD using this default has its reasons, no? And it would be
> consistent to do the same for ZSTD as for the other two methods.

In my 1-off test, it gets 610/633 = 96% of the benefit at 209/273 = 77% of the
cost.

postgres=# SET wal_compression= 'zstd-6';
postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal'); begin; CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal;
Duración: 2729,017 ms (00:02,729)
wal_bytes | 6102403

postgres=# SET wal_compression= 'zstd-1';
postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal'); begin; CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal;
Duración: 2090,459 ms (00:02,090)
wal_bytes | 6330269

It may be relevant that we're only compressing 8k [0].
It would probably pay off to preserve a compression "dictionary" to be re-used
across FPIs.

> - The supported methods are <literal>pglz</literal> and
> - <literal>lz4</literal> (if <productname>PostgreSQL</productname> was
> - compiled with <option>--with-lz4</option>). The default value is
> - <literal>off</literal>. Only superusers can change this setting.
> + The supported methods are <literal>pglz</literal>, and
> + (if configured when <productname>PostgreSQL</productname>was built)
> + <literal>lz4</literal> and zstd.
> + The default value is <literal>off</literal>.
> + Only superusers can change this setting.
>
> This is making the docs less verbose. I think that you should keep
> the mention to respectively --with-lz4 and --with-zstd after each
> value.

I changed this relative to your latest zstd patch in July since it reads better
to me. YMMV.

On Tue, Feb 22, 2022 at 05:19:48PM -0600, Justin Pryzby wrote:
>> 0001 adds support for zstd
>> 0002 makes more efficient our use of bits in the WAL header
>> 0003 makes it the default compression, to exercise on CI (windows will fail).
>> 0004 adds support for zstd-level
>> 0005-6 are needed to allow recovery tests to pass when wal compression is enabled;
>> 0007 (not included) also adds support for zlib. I'm of the impression nobody
>> cares about this, otherwise it would've been included 5-10 years ago.

> 0003, defaulting to zstd, would require a lot of benchmarking to be
> justified.

Maybe you didn't see that I wrote that it was for CI ?
(In any case, it's impossible to do that without first taking care of 005-6).

> and 0004 to extend compression to support a level
> I have argued against making the code more complicated for such things in the
> past, with reloptions.

I suppose that you're referring to reloptions for toast compression, which was
about controlling LZ4 compression level.

https://www.postgresql.org/message-id/flat/CAFiTN-vMHU_yakMgNo90SUih_6LtvmqZ5hpQb2iTgQtVyOjyFA(at)mail(dot)gmail(dot)com

I think you're right that it's not interesting to control the compression level
of LZ4 - but there's no reason to say the same thing of zstd. We're already
debating which is the "right" level to use (1 or 6). I don't think there is a
"right" level - some people will want to trade more CPU for disk writes and
some people won't.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2022-03-04 12:20:47 Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Previous Message Masahiko Sawada 2022-03-04 11:43:50 Re: Add the replication origin name and commit-LSN to logical replication worker errcontext