Re: Add LZ4 compression in pg_dump

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Georgios <gkokolatos(at)protonmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2022-03-01 08:04:35
Message-ID: Yh3Tky8vyhMwb4yG@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 25, 2022 at 12:05:31PM +0000, Georgios wrote:
> The first commit does the heavy lifting required for additional compression methods.
> It expands testing coverage for the already supported gzip compression. Commit
> bf9aa490db introduced cfp in compress_io.{c,h} with the intent of unifying
> compression related code and allow for the introduction of additional archive
> formats. However, pg_backup_archiver.c was not using that API. This commit
> teaches pg_backup_archiver.c about cfp and is using it through out.

Thanks for the patch. I have a few high-level comments.

+ # Do not use --no-sync to give test coverage for data sync.
+ compression_gzip_directory_format => {
+ test_key => 'compression',

The tests for GZIP had better be split into their own commit, as
that's a coverage improvement for the existing code.

I was assuming that this was going to be much larger :)

+/* Routines that support LZ4 compressed data I/O */
+#ifdef HAVE_LIBLZ4
+static void InitCompressorLZ4(CompressorState *cs);
+static void ReadDataFromArchiveLZ4(ArchiveHandle *AH, ReadFunc
readF);
+static void WriteDataToArchiveLZ4(ArchiveHandle *AH, CompressorState *cs,
+ const char *data, size_t dLen);
+static void EndCompressorLZ4(ArchiveHandle *AH, CompressorState *cs);
+#endif

Hmm. This is the same set of APIs as ZLIB and NONE to init, read,
write and end, but for the LZ4 compressor (NONE has no init/end).
Wouldn't it be better to refactor the existing pg_dump code to have a
central structure holding all the function definitions in a common
structure so as all those function signatures are set in stone in the
shape of a catalog of callbacks, making the addition of more
compression formats easier? I would imagine that we'd split the code
of each compression method into their own file with their own context
data. This would lead to a removal of compress_io.c, with its entry
points ReadDataFromArchive(), WriteDataToArchive() & co replaced by
pointers to each per-compression callback.

> Furthermore, compression was chosen based on the value of the level passed
> as an argument during the invocation of pg_dump or some hardcoded defaults. This
> does not scale for more than one compression methods. Now the method used for
> compression can be explicitly requested during command invocation, or set during
> hardcoded defaults. Then it is stored in the relevant structs and passed in the
> relevant functions, along side compression level which has lost it's special
> meaning. The method for compression is not yet stored in the actual archive.
> This is done in the next commit which does introduce a new method.

That's one thing Robert was arguing about with pg_basebackup, so that
would be consistent, and the option set is backward-compatible as far
as I get it by reading the code.
--
Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-03-01 08:44:29 Re: psql: Make SSL info display more compact
Previous Message Michael Paquier 2022-03-01 07:45:48 Re: Allow file inclusion in pg_hba and pg_ident files