Re: Add LZ4 compression in pg_dump

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: gkokolatos(at)pm(dot)me
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-01-27 17:23:20
Message-ID: 20230127172320.GZ22427@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 26, 2023 at 12:22:45PM -0600, Justin Pryzby wrote:
> That commit also added this to pg-dump.c:
>
> + case PG_COMPRESSION_ZSTD:
> + pg_fatal("compression with %s is not yet supported", "ZSTD");
> + break;
> + case PG_COMPRESSION_LZ4:
> + pg_fatal("compression with %s is not yet supported", "LZ4");
> + break;
>
> In 002, that could be simplified by re-using the supports_compression()
> function. (And maybe the same in WriteDataToArchive()?)

The first patch aims to minimize references to ".gz" and "GZIP" and
ZLIB. pg_backup_directory.c comments still refers to ".gz". I think
the patch should ideally change to refer to "the compressed file
extension" (similar to compress_io.c), avoiding the need to update it
later.

I think the file extension stuff could be generalized, so it doesn't
need to be updated in multiple places (pg_backup_directory.c and
compress_io.c). Maybe it's useful to add a function to return the
extension of a given compression method. It could go in compression.c,
and be useful in basebackup.

For the 2nd patch:

I might be in the minority, but I still think some references to "gzip"
should say "zlib":

+} GzipCompressorState;
+
+/* Private routines that support gzip compressed data I/O */
+static void
+DeflateCompressorGzip(ArchiveHandle *AH, CompressorState *cs, bool flush)

In my mind, three things here are misleading, because it doesn't use
gzip headers:

| GzipCompressorState, DeflateCompressorGzip, "gzip compressed".

This comment is about exactly that:

* underlying stream. The second API is a wrapper around fopen/gzopen and
* friends, providing an interface similar to those, but abstracts away
* the possible compression. Both APIs use libz for the compression, but
* the second API uses gzip headers, so the resulting files can be easily
* manipulated with the gzip utility.

AIUI, Michael says that it's fine that the user-facing command-line
options use "-Z gzip" (even though the "custom" format doesn't use gzip
headers). I'm okay with that, as long as that's discussed/understood.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2023-01-27 17:24:07 Re: Show various offset arrays for heap WAL records
Previous Message Tom Lane 2023-01-27 17:14:22 Re: Improve GetConfigOptionValues function