Re: Add LZ4 compression in pg_dump

From: gkokolatos(at)pm(dot)me
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Rachel Heaton <rachelmheaton(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add LZ4 compression in pg_dump
Date: 2022-04-01 15:06:40
Message-ID: syL-7HopUnzsy_Trvi6DON6tGwrcpujr-LhXo26AJ8AoP1fiGW66H33bMkPYU7tlu5HoqaSTt5FBeI6Ew1RLxnvTKy4-YZtBUJPgFMw3bQM=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, March 31st, 2022 at 4:34 AM, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Wed, Mar 30, 2022 at 03:32:55PM +0000, gkokolatos(at)pm(dot)me wrote:
> > On Wednesday, March 30th, 2022 at 7:54 AM, Michael Paquier michael(at)paquier(dot)xyz wrote:
>
> Okay. 0002 looks fine as-is, and I don't mind the extra fatal()
> calls. These could be asserts but that's not a big deal one way or
> the other. And the cleanup is now applied.

Thank you very much.

> > > + my $compress_program = $ENV{GZIP_PROGRAM};
> > > It seems to me that it is enough to rely on {compress_cmd}, hence
> > > there should be no need for $compress_program, no?
> >
> > Maybe not. We don't want to the tests to fail if the utility is not
> > installed. That becomes even more evident as more methods are added.
> > However I realized that the presence of the environmental variable does
> > not guarrantee that the utility is actually installed. In the attached,
> > the existance of the utility is based on the return value of system_log().
>
> Hmm. [.. thinks ..] The thing that's itching me here is that you
> align the concept of compression with gzip, but that's not going to be
> true once more compression options are added to pg_dump, and that
> would make $supports_compression and $compress_program_exists
> incorrect. Perhaps the right answer would be to rename all that with
> a suffix like "_gzip" to make a difference? Or would there be enough
> control with a value of "compression_gzip" instead of "compression" in
> test_key?

I understand the itch. Indeed when LZ4 is added as compression method, this
block changes slightly. I went with the minimum amount changed. Please find
in 0001 of the attached this variable renamed as $gzip_program_exist. I thought
that as prefix it will match better the already used $ENV{GZIP_PROGRAM}.

> +my $compress_program_exists = (system_log("$ENV{GZIP_PROGRAM}", '-h',
> + '>', '/dev/null') == 0);
>
> Do we need this command execution at all? In all the other tests, we
> rely on a simple "if (!defined $gzip || $gzip eq '');", so we could do
> the same here.

You are very correct that we are using the simple version, and that is what
it was included in the previous versions of the current patch. However, I
did notice that the variable is hard-coded in Makefile.global.in and it does
not go through configure. By now, gzip is considered an essential package
in most installations, and this hard-code makes sense. Though I did remove
the utility from my system, (apt remove gzip) and tried the test with the
simple "if (!defined $gzip || $gzip eq '');", which predictably failed. For
this, I went with the system call, it is not too expensive and is rather
reliable.

It is true that the rest of the TAP tests that use this, e.g. in pg_basebackup,
also failed. There is an argument to go simple and I will be happy to revert
to the previous version.

> A last thing is that we should perhaps make a clear difference between
> the check that looks at if the code has been built with zlib and the
> check for the presence of GZIP_PROGRAM, as it can be useful in some
> environments to be able to run pg_dump built with zlib, even if the
> GZIP_PROGRAM command does not exist (I don't think this can be the
> case, but other tests are flexible).

You are very correct. We do that already in the current patch. Note that we skip
the test only when we specifically have to execute a compression command. Not
all compression tests define such command, exactly so that we can test those
cases as well. The point of using an external utility program is in order to
extend the coverage in previously untested yet supported scenarios, e.g. manual
compression of the *.toc files.

Also in the case where it will actually skip the compression command because the
gzip program is not present, it will execute the pg_dump command first.

> As of now, the patch relies on
> pg_dump enforcing uncompression if building under --without-zlib even
> if --compress/-Z is used, but that also means that those compression
> tests overlap with the existing tests in this case. Wouldn't it be
> more consistent to check after $supports_compression when executing
> the dump command for test_key = "compression[_gzip]"? This would mean
> keeping GZIP_PROGRAM as sole check when executing the compression
> command.

I can see the overlap case. Yet, I understand the test_key as serving different
purpose, as it is a key of %tests and %full_runs. I do not expect the database
content of the generated dump to change based on which compression method is used.

In the next round, I can see one explitcly requesting --compress=none to override
defaults. There is a benefit to group the tests for this scenario under the same
test_key, i.e. compression.

Also there will be cases where if the program exists, yet the codebase is compiled
without support for the method. Then compress_cmd or the restore_cmd that follows
will fail. For example, in the plain output, if we try to uncompress the generated
the test will fail with 'gzip: <filename> not in gzip format'. In the directory
format the compress_cmd will compress the *.toc files, but the restore_cmd will
fail because it does not build with support for them.

In the attached version, I propose that the compression_cmd is converted into
a hash. It contains two keys, the program and the arguments. Maybe it is easier
to read than before or than simply grabbing the first element of the array.

Cheers,
//Georgios

Attachment Content-Type Size
v5-0001-Extend-compression-coverage-for-pg_dump-pg_restor.patch text/x-patch 5.9 KB
v5-0003-Add-LZ4-compression-in-pg_-dump-restore.patch text/x-patch 43.1 KB
v5-0002-Prepare-pg_dump-for-additional-compression-method.patch text/x-patch 52.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-04-01 15:06:53 Re: head fails to build on SLES 12 (wal_compression=zstd)
Previous Message Tomas Vondra 2022-04-01 15:02:14 Re: logical decoding and replication of sequences