Re: Different compression methods for FPI

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Different compression methods for FPI
Date: 2021-06-15 01:42:08
Message-ID: 20210615014208.GK31772@telsasoft.com
Lists: pgsql-hackers

On Tue, Jun 15, 2021 at 09:50:41AM +0900, Michael Paquier wrote:
> On Sun, Jun 13, 2021 at 08:24:12PM -0500, Justin Pryzby wrote:
> > I think it's more nuanced than just finding the algorithm with the least CPU
> > use. The GUC is PGC_USERSET, and it's possible that a data-loading process
> > might want to use zlib for better compression ratio, but an interactive OLTP
> > process might want to use lz4 or no compression for better responsiveness.
>
> It seems to me that this should be a PGC_SUSET, at least? We've had
> our share of problems with assumptions behind data leaks depending on
> data compressibility (see ssl_compression and the kind).

It's USERSET following your own suggestion (which is a good suggestion):

On Mon, May 17, 2021 at 04:44:11PM +0900, Michael Paquier wrote:
> + {"wal_compression_method", PGC_SIGHUP, WAL_SETTINGS,
> + gettext_noop("Set the method used to compress full page images in the WAL."),
> + NULL
> + },
> + &wal_compression_method,
> + WAL_COMPRESSION_PGLZ, wal_compression_options,
> + NULL, NULL, NULL
> Any reason to not make that user-settable? If you merge that with
> wal_compression, that's not an issue.

I don't see how restricting it to superusers would mitigate the hazard at all:
if the local admin enables WAL compression, then every user's data will be
compressed, and the degree of compression indicates a bit about their data,
no matter whether it's pglz or lz4.

It's probably true without compression, too - the fraction of FPW might reveal
their usage patterns.

> > In this patch series, I added compression information to the errcontext from
> > xlog_block_info(), and allow specifying compression levels like zlib-2. I'll
> > rearrange that commit earlier if we decide that's desirable to include.
>
> The compression level may be better if specified with a different
> GUC. That's less parsing to have within the GUC machinery.

I'm not sure about that - then there's an interdependency between GUCs.
If zlib range is 1..9, and zstd is -50..10, then you may have to set the
compression level first, to avoid an error. I believe there's a previous
discussion about inter-dependent GUCs, and maybe a commit fixing a problem they
caused. But I cannot find it offhand.
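
To illustrate the single-GUC alternative, here is a minimal standalone sketch
of parsing a combined "method-level" value such as zlib-2 and checking the
level against a per-method range in one step, so there is no ordering problem
between two settings. This is not code from the patch series: the names
(method_info, parse_compression_spec, LEVEL_DEFAULT) are hypothetical, the
zlib and zstd bounds are just the illustrative figures above, and treating
pglz and lz4 as taking no level is an assumption.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Sentinel meaning "no level given, use the method's default" (hypothetical). */
#define LEVEL_DEFAULT INT_MIN

/* Hypothetical per-method table; bounds are illustrative, not from any patch. */
typedef struct
{
	const char *name;
	int			has_level;	/* does the method accept a level at all? */
	int			min_level;
	int			max_level;
} method_info;

static const method_info methods[] = {
	{"pglz", 0, 0, 0},
	{"lz4", 0, 0, 0},
	{"zlib", 1, 1, 9},
	{"zstd", 1, -50, 10},
};

/*
 * Parse "method" or "method-level".  Returns 0 on success and fills in
 * *method_idx (index into methods[]) and *level; returns -1 for an unknown
 * method, a malformed level, or a level outside the method's range.
 * A negative level is written after the separator, e.g. "zstd--5".
 */
static int
parse_compression_spec(const char *spec, int *method_idx, int *level)
{
	size_t		nmethods = sizeof(methods) / sizeof(methods[0]);

	for (size_t i = 0; i < nmethods; i++)
	{
		size_t		len = strlen(methods[i].name);
		const char *rest;
		char	   *end;
		long		val;

		if (strncmp(spec, methods[i].name, len) != 0)
			continue;
		rest = spec + len;

		if (*rest == '\0')
		{
			/* bare method name: default level */
			*method_idx = (int) i;
			*level = LEVEL_DEFAULT;
			return 0;
		}
		if (*rest != '-' || !methods[i].has_level)
			return -1;

		/* the level must start with a digit or a minus sign */
		rest++;
		if (*rest != '-' && !isdigit((unsigned char) *rest))
			return -1;

		val = strtol(rest, &end, 10);
		if (end == rest || *end != '\0')
			return -1;
		if (val < methods[i].min_level || val > methods[i].max_level)
			return -1;

		*method_idx = (int) i;
		*level = (int) val;
		return 0;
	}
	return -1;
}
```

With this shape, "zlib-2" and "zstd--5" are accepted, while "zlib-10" and
"lz4-3" are rejected at the time the value is set, whatever was set before it.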

> seems to me that if we can get the same amount of compression and CPU
> usage just by tweaking the compression level, there is no need to
> support more than one extra compression algorithm, easing the life of
> packagers and users.

I don't think it eases it for packagers, since I anticipate the initial patch
would support {none/pglz/lz4/zlib}. I anticipate that zstd may not be in pg15.

The goal of the patch is to give options, and the overhead of adding both zlib
and lz4 is low. zlib gives good compression at some CPU cost and may be
preferable for (some) DWs, and lz4 is almost certainly better (than pglz) for
OLTPs.

--
Justin
