Re: Different compression methods for FPI

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Different compression methods for FPI
Date: 2021-06-15 03:42:06
Message-ID: 20210615034206.GL31772@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 15, 2021 at 11:39:24AM +0900, Michael Paquier wrote:
> On Mon, Jun 14, 2021 at 08:42:08PM -0500, Justin Pryzby wrote:
>>>> It's USERSET following your own suggestion (which is a good suggestion):
> >> + {"wal_compression_method", PGC_SIGHUP, WAL_SETTINGS,
> >> + gettext_noop("Set the method used to compress full page images in the WAL."),
> >> + NULL
> >> + },
> >> + &wal_compression_method,
> >> + WAL_COMPRESSION_PGLZ, wal_compression_options,
> >> + NULL, NULL, NULL
> >> Any reason to not make that user-settable? If you merge that with
> >> wal_compression, that's not an issue.
>
> Hmm, yeah. This can be read as using PGC_USERSET. With the second
> part of my sentence, I think that I imply to use PGC_SUSET and be
> consistent with wal_compression, but I don't recall my mood from one
> month ago :) Sorry for the confusion.

Hold on - we're all confused (and I'm to blame). The patch is changing the
existing wal_compression GUC, rather than adding wal_compression_method.
It's still SUSET, but in earlier messages, I called it USERSET, twice.

> You cannot do cross-checks for GUCs in their assign hooks or even rely
> in the order of those parameters, but you can do that in some backend
> code paths. A recent discussion on the matter is for example what led
> to 79dfa8a for the GUCs controlling the min/max SSL protocols
> allowed.

Thank you - this is what I was remembering.

> > The goal of the patch is to give options, and the overhead of adding both zlib
> > and lz4 is low. zlib gives good compression at some CPUcost and may be
> > preferable for (some) DWs, and lz4 is almost certainly better (than pglz) for
> > OLTPs.
>
> Anything will be better than pglz. I am rather confident in that.
>
> What I am wondering is if we need to eat more bits than necessary for
> the WAL record format, because we will end up supporting it until the
> end of times.

Why ? This is WAL, not table data. WAL depends on the major version, so
I think wal_compression could provide a different set of compression methods at
every major release?

Actually, I was just thinking that default yes/no/on/off stuff maybe should be
defined to mean "lz4" rather than meaning pglz for "backwards compatibility".

> hear here, there are many cases that we may care about depending on
> how much CPU one is ready to pay in order to get more compression,
> knowing that there are no magic solutions for something that's cheap
> in CPU with a very good compression ratio or we could just go with
> that. So it seems to me that there is still an argument for adding
> only one new compression method with a good range of levels, able to
> support the range of cases we'd care about:
> - High compression ratio but high CPU cost.
> - Low compression ratio but low CPU cost.

I think zlib is too expensive and lz4 doesn't get enough compression,
so neither supports both cases. In a sample of 1, zlib-1 is ~35% slower than
lz4 and writes half as much.

I think zstd could support both cases; however, I still see it as this patch's
job to provide options, rather to definitively conclude which compression
algorithm is going to work best for everyone's use data and application.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-06-15 04:27:02 Re: RFC: Logging plan of the running query
Previous Message Michael Paquier 2021-06-15 03:01:00 Re: PG 14 release notes, first draft