Re: Add LZ4 compression in pg_dump

From: gkokolatos(at)pm(dot)me
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-01-25 15:37:12
Message-ID: -myGeDiZ-ePDRyi18026Iu2HTTfmwfjWaJn4OWJ2F3oglZUfJTOQ7FARPW7gYxE4vK3b0HpwYoiHR45nfOWReRvvJZfCBfJzPP5iJyn4muk=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

------- Original Message -------
On Wednesday, January 25th, 2023 at 2:42 AM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

>
>
> On Tue, Jan 24, 2023 at 03:56:20PM +0000, gkokolatos(at)pm(dot)me wrote:
>
> > On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby pryzby(at)telsasoft(dot)com wrote:
> >
> > > On Mon, Jan 23, 2023 at 05:31:55PM +0000, gkokolatos(at)pm(dot)me wrote:
> > >
> > > > Please find attached v23 which reintroduces the split.
> > > >
> > > > 0001 is reworked to have a reduced footprint than before. Also in an attempt
> > > > to facilitate the readability, 0002 splits the API's and the uncompressed
> > > > implementation in separate files.
> > >
> > > Thanks for updating the patch. Could you address the review comments I
> > > sent here ?
> > > https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com
> >
> > Please find v24 attached.
>
>
> Thanks for updating the patch.
>
> In 001, RestoreArchive() does:
>
> > -#ifndef HAVE_LIBZ
> > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP &&
> > - AH->PrintTocDataPtr != NULL)
> > + supports_compression = false;
> > + if (AH->compression_spec.algorithm == PG_COMPRESSION_NONE ||
> > + AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> > + supports_compression = true;
> > +
> > + if (AH->PrintTocDataPtr != NULL)
> > {
> > for (te = AH->toc->next; te != AH->toc; te = te->next)
> > {
> > if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
> > - pg_fatal("cannot restore from compressed archive (compression not supported in this installation)");
> > + {
> > +#ifndef HAVE_LIBZ
> > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> > + supports_compression = false;
> > +#endif
> > + if (supports_compression == false)
> > + pg_fatal("cannot restore from compressed archive (compression not supported in this installation)");
> > + }
> > }
> > }
> > -#endif
>
>
> This first checks if the algorithm is implemented, and then checks if
> the algorithm is supported by the current build - that confused me for a
> bit. It seems unnecessary to check for unimplemented algorithms before
> looping. That also requires referencing both GZIP and LZ4 in two
> places.

I am not certain that it is unnecessary, at least not in the way that is
described. The idea is that new compression methods can be added, without
changing the archive's version number. It is very possible that it is
requested to restore an archive compressed with a method not implemented
in the current binary. The first check takes care of that and sets
supports_compression only for the supported versions. It is possible to
enter the loop with supports_compression already set to false, for example
because the archive was compressed with ZSTD, triggering the fatal error.

Of course, one can throw the error before entering the loop, yet I think
that it does not help the readability of the code. IMHO it is easier to
follow if the error is thrown once during that check.

>
> I think it could be written to avoid the need to change for added
> compression algorithms:
>
> + if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
>
> + {
> + /* Check if the compression algorithm is supported */
> + pg_compress_specification spec;
> + parse_compress_specification(AH->compression_spec.algorithm, NULL, &spec);
>
> + if (spec->parse_error != NULL)
>
> + pg_fatal(spec->parse_error);
>
> + }

I am not certain how that would work in the example with ZSTD above.
If I am not wrong, parse_compress_specification() will not throw an error
if the codebase supports ZSTD, yet this specific pg_dump binary will not
support it because ZSTD is not implemented. parse_compress_specification()
is not aware of that and should not be aware of it, should it?

>
> Or maybe add a new function to compression.c to indicate whether a given
> algorithm is supported.

I am not certain how this would help, as compression.c is supposed to be
used by multiple binaries while this is a pg_dump specific detail.

> That would also indicate which compression library isn't supported.

If anything, I can suggest to throw an error much earlier, i.e. in ReadHead(),
and remove altogether this check. On the other hand, I like the belts
and suspenders approach because there are no more checks after this point.

> Other than that, I think 001 is ready.

Thank you.

> 002/003 use these names, which I think are too similar - initially I
> didn't even realize there were two separate functions (each with a
> second stub function to handle the case of unsupported compression):
>
> +extern void InitCompressorGzip(CompressorState *cs, const pg_compress_specification compression_spec);
> +extern void InitCompressGzip(CompressFileHandle *CFH, const pg_compress_specification compression_spec);
>
> +extern void InitCompressorLZ4(CompressorState *cs, const pg_compress_specification compression_spec);
> +extern void InitCompressLZ4(CompressFileHandle *CFH, const pg_compress_specification compression_spec);

Fair enough. Names are now updated.

>
> typo:
> s/not build with/not built with/

Thank you.

>
> Should AllocateCompressor() set cs->compression_spec, rather than doing
> it in each compressor ?

I think that compression_spec should be owned by each compressor. With that
in mind, it makes more sense to set it within each compressor. This is not
a hill I am willing to die on though.

Please find v25 attached.

>
> Thanks for considering.
>
> --
> Justin

Attachment Content-Type Size
v25-0001-Prepare-pg_dump-internals-for-additional-compres.patch text/x-patch 13.1 KB
v25-0002-Introduce-Compress-and-Compressor-API-in-pg_dump.patch text/x-patch 69.0 KB
v25-0003-Add-LZ4-compression-to-pg_dump.patch text/x-patch 30.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message songjinzhou 2023-01-25 15:39:00 Re: Re: Support plpgsql multi-range in conditional control
Previous Message Robert Haas 2023-01-25 15:28:22 Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation