Re: Add LZ4 compression in pg_dump

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: gkokolatos(at)pm(dot)me
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Rachel Heaton <rachelmheaton(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add LZ4 compression in pg_dump
Date: 2022-03-31 02:34:52
Message-ID: YkUTTK7dS6Zyeg9r@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 30, 2022 at 03:32:55PM +0000, gkokolatos(at)pm(dot)me wrote:
> On Wednesday, March 30th, 2022 at 7:54 AM, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> While moving on with 0002, I have noticed the following in
>>
>> _StartBlob():
>> if (AH->compression != 0)
>> sfx = ".gz";
>> else
>> sfx = "";
>>
>> Shouldn't this bit also be simplified, adding a fatal() like the other
>> code paths, for safety?
>
> Agreed. Fixed.

Okay. 0002 looks fine as-is, and I don't mind the extra fatal()
calls. These could be asserts but that's not a big deal one way or
the other. And the cleanup is now applied.

>> + my $compress_program = $ENV{GZIP_PROGRAM};
>>
>> It seems to me that it is enough to rely on {compress_cmd}, hence
>> there should be no need for $compress_program, no?
>
> Maybe not. We don't want to the tests to fail if the utility is not
> installed. That becomes even more evident as more methods are added.
> However I realized that the presence of the environmental variable does
> not guarrantee that the utility is actually installed. In the attached,
> the existance of the utility is based on the return value of system_log().

Hmm. [.. thinks ..] The thing that's itching me here is that you
align the concept of compression with gzip, but that's not going to be
true once more compression options are added to pg_dump, and that
would make $supports_compression and $compress_program_exists
incorrect. Perhaps the right answer would be to rename all that with
a suffix like "_gzip" to make a difference? Or would there be enough
control with a value of "compression_gzip" instead of "compression" in
test_key?

+my $compress_program_exists = (system_log("$ENV{GZIP_PROGRAM}", '-h',
+ '>', '/dev/null') == 0);
Do we need this command execution at all? In all the other tests, we
rely on a simple "if (!defined $gzip || $gzip eq '');", so we could do
the same here.

A last thing is that we should perhaps make a clear difference between
the check that looks at if the code has been built with zlib and the
check for the presence of GZIP_PROGRAM, as it can be useful in some
environments to be able to run pg_dump built with zlib, even if the
GZIP_PROGRAM command does not exist (I don't think this can be the
case, but other tests are flexible). As of now, the patch relies on
pg_dump enforcing uncompression if building under --without-zlib even
if --compress/-Z is used, but that also means that those compression
tests overlap with the existing tests in this case. Wouldn't it be
more consistent to check after $supports_compression when executing
the dump command for test_key = "compression[_gzip]"? This would mean
keeping GZIP_PROGRAM as sole check when executing the compression
command.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2022-03-31 02:37:58 Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Previous Message shiy.fnst@fujitsu.com 2022-03-31 02:26:06 RE: Logical replication timeout problem