Re: Add LZ4 compression in pg_dump

From: gkokolatos(at)pm(dot)me
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2022-11-29 12:10:46
Message-ID: iKYD3Ryqsn88_UgYBiw3V803ISB5BJokmR2zsFtcedNVVeivIn_fqVv-oRRMqOpMiBFXsi72UYNMmuqMnEjjcQluTTpO-oxQWAVPeAlACnA=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

------- Original Message -------
On Tuesday, November 29th, 2022 at 7:19 AM, Michael Paquier <michael(at)paquier(dot)xyz> wrote:

>
>
> On Mon, Nov 28, 2022 at 04:32:43PM +0000, gkokolatos(at)pm(dot)me wrote:
>
> > The focus of this version of this series is 0001 and 0002.
> >
> > Admittedly 0001 could be presented in a separate thread though given its size and
> > proximity to the topic, I present it here.
>
>
> I don't mind. This was a hole in meson.build, so nice catch! I have
> noticed a second defect with pg_verifybackup for all the commands, and
> applied both at the same time.

Thank you.

>
> > In an earlier review you spotted the similarity between pg_dump's and pg_receivewal's
> > parsing of compression options. However there exists a substantial difference in the
> > behaviour of the two programs; one treats the lack of support for the requested
> > algorithm as a fatal error, whereas the other does not. The existing functions in
> > common/compression.c do not account for the later. 0002 proposes an implementation
> > for this. It's usefulness is shown in 0003.
>
>
> In what does it matter? The logic in compression.c provides an error
> when looking at a spec or validating it, but the caller is free to
> consume it as it wants because this is shared between the frontend and
> the backend, and that includes consuming it as a warning rather than a
> ahrd failure. If we don't want to issue an error and force
> non-compression if attempting to use a compression method not
> supported in pg_dump, that's fine by me as a historical behavior, but
> I don't see why these routines have any need to be split more as
> proposed in 0002.

I understand. The reason for the change in the routines was because it was
impossible to distinguish a genuine parse error from a missing library in
parse_compress_specification(). If the zlib library is missing, then both
'--compress=gzip:garbage' and '--compress=gzip:7' would populate the
parse_error member of the struct and subsequent calls to
validate_compress_specification() would error out, although only one of
the two options is truly an error. Historically the code would fail on
invalid input regardless of whether the library was present or not.

> Saying that, I do agree that it would be nice to remove the
> duplication between the option parsing of pg_basebackup and
> pg_receivewal. Your patch is very close to that, actually, and it
> occured to me that if we move the check on "server-" and "client-" in
> pg_basebackup to be just before the integer-only check then we can
> consolidate the whole thing.

Great. I did notice the possible benefit but chose to not tread too far
off the necessary in my patch.

> Attached is an alternative that does not sacrifice the pluggability of
> the existing routines while allowing 0003~ to still use them (I don't
> really want to move around the checks on the supported build options
> now in parse_compress_specification(), that was hard enough to settle
> on this location).

Yeah, I thought that it would be a hard sell, hence an "earlier"
version.

The attached version 10, contains verbatim your proposed v9 as 0001.
Then 0002 is switching a bit the parsing order in pg_dump and will not
fail as described above on missing libraries. Now, it will first parse
the algorithm, discard it when unsupported, and only parse the rest of
the option if the algorithm is supported. Granted it is a bit 'uglier'
with the preprocessing blocks, yet it maintains most of the historic
behaviour without altering the common compression interfaces. Now, as
shown in 001_basic.pl, invalid detail will fail only if the algorithm
is supported.

> On top of that, pg_basebackup is able to cope with
> the case of --compress=0 already, enforcing "none" (BaseBackup could
> be simplified a bit more before StartLogStreamer). This refactoring
> shaves a little bit of code.
>
> > Please consider 0003-0005 as work in progress. They are differences from v7 yet they
> > may contain unaddressed comments for now.
>
>
> Okay.

Thank you. Please advice if is preferable to split 0002 in two parts.
I think not but I will happily do so if you think otherwise.

Cheers,
//Georgios

> --
> Michael

Attachment Content-Type Size
v10-0001-Make-the-pg_receivewal-compression-parsing-funct.patch text/x-patch 7.7 KB
v10-0004-Add-LZ4-compression-in-pg_-dump-restore.patch text/x-patch 29.3 KB
v10-0002-Prepare-pg_dump-for-additional-compression-metho.patch text/x-patch 50.6 KB
v10-0003-Introduce-Compressor-API-in-pg_dump.patch text/x-patch 53.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message li jie 2022-11-29 12:21:07 Re: Support logical replication of DDLs
Previous Message Simon Riggs 2022-11-29 12:02:44 Re: O(n) tasks cause lengthy startups and checkpoints