Re: Add LZ4 compression in pg_dump

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: gkokolatos(at)pm(dot)me
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-01-14 21:43:09
Message-ID: 20230114214308.GC9837@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jan 08, 2023 at 01:45:25PM -0600, Justin Pryzby wrote:
> On Thu, Dec 22, 2022 at 11:08:59AM -0600, Justin Pryzby wrote:
> > There's a couple of lz4 bits which shouldn't be present in 002: file
> > extension and comments.

> BTW I noticed that cfdopen() was accidentally committed to compress_io.h
> in master without being defined anywhere.

This was resolved in 69fb29d1a (so now needs to be re-added for this
patch series).

> pg_compress_specification is being passed by value, but I think it
> should be passed as a pointer, as is done everywhere else.

ISTM that was an issue with 5e73a6048, affecting a few public and
private functions. I wrote a pre-preparatory patch which changes to
pass by reference.

And addressed a handful of other issues I reported as separate fixup
commits. And changed to use LZ4 by default for CI.

I also rebased my 2 year old patch to support zstd in pg_dump. I hope
it can finally added for v16. I'll send it for the next CF if these
patches progress.

One more thing: some comments still refer to the cfopen API, which this
patch removes.

> There were "LZ4" comments and file extension stuff in the preparatory
> commit. But now it seems like you *removed* them in the LZ4 commit
> (where it actually belongs) rather than *moving* it from the
> prior/parent commit *to* the lz4 commit. I recommend to run something
> like "git diff @{1}" whenever doing this kind of patch surgery.

TODO

> Maybe other places that check if (compression==PG_COMPRESSION_GZIP)
> should maybe change to say compression!=NONE?
>
> _PrepParallelRestore() references ".gz", so I think it needs to be
> retrofitted to handle .lz4. Ideally, that's built into a struct or list
> of file extensions to try. Maybe compression.h should have a function
> to return the file extension of a given algorithm. I'm planning to send
> a patch for zstd, and hoping its changes will be minimized by these
> preparatory commits.

TODO

> I think it's confusing to have two functions, one named
> InitCompressLZ4() and InitCompressorLZ4().

TODO

> pg_compress_algorithm is being writen directly into the pg_dump header.
> Currently, I think that's not an externally-visible value (it could be
> renumbered, theoretically even in a minor release). Maybe there should
> be a "private" enum for encoding the pg_dump header, similar to
> WAL_COMPRESSION_LZ4 vs BKPIMAGE_COMPRESS_LZ4 ? Or else a comment there
> should warn that the values are encoded in pg_dump, and must never be
> changed.

Michael, WDYT ?

--
Justin

Attachment Content-Type Size
0001-pg_dump-pass-pg_compress_specification-as-a-pointer.patch text/x-diff 11.9 KB
0002-Prepare-pg_dump-internals-for-additional-compression.patch text/x-diff 23.5 KB
0003-Introduce-Compressor-API-in-pg_dump.patch text/x-diff 65.5 KB
0004-f.patch text/x-diff 1.3 KB
0005-Add-LZ4-compression-in-pg_-dump-restore.patch text/x-diff 28.8 KB
0006-f.patch text/x-diff 6.8 KB
0007-TMP-pg_dump-use-lz4-by-default-for-CI-only.patch text/x-diff 2.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mikhail Gribkov 2023-01-14 21:56:06 Re: On login trigger: take three
Previous Message Tom Lane 2023-01-14 21:23:47 Re: Removing redundant grouping columns