Re: Add LZ4 compression in pg_dump

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: gkokolatos(at)pm(dot)me
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Rachel Heaton <rachelmheaton(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add LZ4 compression in pg_dump
Date: 2022-04-05 01:34:49
Message-ID: YkucudXyfOwPOY5n@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 01, 2022 at 03:06:40PM +0000, gkokolatos(at)pm(dot)me wrote:
> I understand the itch. Indeed when LZ4 is added as compression method, this
> block changes slightly. I went with the minimum amount changed. Please find
> in 0001 of the attached this variable renamed as $gzip_program_exist. I thought
> that as prefix it will match better the already used $ENV{GZIP_PROGRAM}.

Hmm. I have spent some time on that, and upon review I really think
that we should skip the tests marked as dedicated to the gzip
compression entirely if the build is not compiled with this option,
rather than letting the code run a dump for nothing in some cases,
relying on the default to uncompress the contents in others. In the
latter case, it happens that we have already some checks like
defaults_custom_format, but you already mentioned that.

We should also skip the later parts of the tests if the compression
program does not exist as we rely on it, but only if the command does
not exist. This will count for LZ4.

> I can see the overlap case. Yet, I understand the test_key as serving different
> purpose, as it is a key of %tests and %full_runs. I do not expect the database
> content of the generated dump to change based on which compression method is used.

Contrary to the current LZ4 tests in pg_dump, what we have here is a
check for a command-level run and not a data-level check. So what's
introduced is a new concept, and we need a new way to control if the
tests should be entirely skipped or not, particularly if we finish by
not using test_key to make the difference. Perhaps the best way to
address that is to have a new keyword in the $runs structure. The
attached defines a new compile_option, that can be completed later for
new compression methods introduced in the tests. So the idea is to
mark all the tests related to compression with the same test_key, and
the tests can be skipped depending on what compile_option requires.

> In the attached version, I propose that the compression_cmd is converted into
> a hash. It contains two keys, the program and the arguments. Maybe it is easier
> to read than before or than simply grabbing the first element of the array.

Splitting the program and its arguments makes sense.

At the end I am finishing with the attached. I also saw an overlap
with the addition of --jobs for the directory format vs not using the
option, so I have removed the case where --jobs was not used in the
directory format.
--
Michael

Attachment Content-Type Size
v6-0001-Extend-compression-coverage-for-pg_dump-pg_restor.patch text/x-diff 6.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2022-04-05 01:40:04 Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN
Previous Message Masahiko Sawada 2022-04-05 01:13:06 Re: Skipping logical replication transactions on subscriber side