Re: Add LZ4 compression in pg_dump

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: gkokolatos(at)pm(dot)me
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2022-07-21 05:04:18
Message-ID: YtjeUvZxIN+yIEMw@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 05, 2022 at 01:22:47PM +0000, gkokolatos(at)pm(dot)me wrote:
> I have updated for "some" of the comments. This is not an unwillingness to
> incorporate those specific comments. Simply this patchset had started to divert
> heavily already based on comments from Mr. Paquier who had already requested for
> the APIs to be refactored to use function pointers. This is happening in 0002 of
> the patchset. 0001 of the patchset is using the new compression.h under common.
>
> This patchset should be considered a late draft, as commentary, documentation,
> and some finer details are not yet finalized; because I am expecting the proposed
> refactor to receive a wealth of comments. It would be helpful to understand if
> the proposed direction is something worth to be worked upon, before moving to the
> finer details.

I have read through the patch set, and I like a lot the separation you
are doing here with CompressFileHandle where a compression method has
to specify a full set of callbacks depending on the actions that need
to be taken. One advantage, as you patch shows, is that you reduce
the dependency of each code path depending on the compression method,
with #ifdefs and such located mostly into their own file structure, so
as adding a new compression method becomes really easier. These
callbacks are going to require much more documentation to describe
what anybody using them should expect from them, and perhaps they
could be renamed in a more generic way as the currect names come from
POSIX (say read_char(), read_string()?), even if this patch has just
inherited the names coming from pg_dump itself, but this can be tuned
over and over.

The split into three parts as of 0001 to plug into pg_dump the new
compression option set, 0002 to introduce the callbacks and 0003 to
add LZ4, building on the two first parts, makes sense to me. 0001 and
0002 could be done in a reversed order as they are mostly independent,
this order is fine as-is.

In short, I am fine with the proposed approach.

+#define K_VERS_1_15 MAKE_ARCHIVE_VERSION(1, 15, 0) /* add compressionMethod
+ * in header */
Indeed, the dump format needs a version bump for this information.

+static bool
+parse_compression_option(const char *opt,
+ pg_compress_specification *compress_spec)
This parsing logic in pg_dump.c looks a lot like what pg_receivewal.c
does with its parse_compress_options() where, for compatibility:
- If only a number is given:
-- Assume no compression if level is 0.
-- Assume gzip with given compression if level > 0.
- If a string is found, assume a full spec, with optionally a level.
So some consolidation could be done between both.

By the way, I can see that GZCLOSE(), etc. are still defined in
compress_io.h but they are not used.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2022-07-21 05:30:04 Re: i.e. and e.g.
Previous Message Michael Paquier 2022-07-21 04:33:19 Re: Refactor to make use of a common function for GetSubscriptionRelations and GetSubscriptionNotReadyRelations.