Re: Add LZ4 compression in pg_dump

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: gkokolatos(at)pm(dot)me
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-01-25 18:00:20
Message-ID: 20230125180020.GF22427@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 25, 2023 at 03:37:12PM +0000, gkokolatos(at)pm(dot)me wrote:
> Of course, one can throw the error before entering the loop, yet I think
> that it does not help the readability of the code. IMHO it is easier to
> follow if the error is thrown once during that check.

> If anything, I can suggest to throw an error much earlier, i.e. in ReadHead(),
> and remove altogether this check. On the other hand, I like the belts
> and suspenders approach because there are no more checks after this point.

While looking at this, I realized that commit 5e73a6048 introduced a
regression:

@@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH)

- if (AH->compression != 0)
- pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available");
+ if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
+ pg_fatal("archive is compressed, but this installation does not support compression");

Before, it was possible to restore non-data chunks of a dump file, even
if the current build didn't support its compression. But that's now
impossible - and it makes the code we're discussing in RestoreArchive()
unreachable.

I don't think we can currently test for that, since it requires creating a dump
using a build --with compression and then trying to restore using a build
--without compression. The coverage report disagrees with me, though...
https://coverage.postgresql.org/src/bin/pg_dump/pg_backup_archiver.c.gcov.html#3901

> > I think it could be written to avoid the need to change for added
> > compression algorithms:
...
>
> I am not certain how that would work in the example with ZSTD above.
> If I am not wrong, parse_compress_specification() will not throw an error
> if the codebase supports ZSTD, yet this specific pg_dump binary will not
> support it because ZSTD is not implemented. parse_compress_specification()
> is not aware of that and should not be aware of it, should it?

You're right.

I think the 001 patch should try to remove hardcoded references to
LIBZ/GZIP, such that the later patches don't need to update those same
places for LZ4. For example in ReadHead() and RestoreArchive(), and
maybe other places dealing with file extensions. Maybe that could be
done by adding a function specific to pg_dump indicating whether or not
an algorithm is implemented and supported.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-01-25 18:17:50 Re: pgsql: Rename contrib module basic_archive to basic_wal_module
Previous Message Nathan Bossart 2023-01-25 17:57:46 Re: wake up logical workers after ALTER SUBSCRIPTION