Re: Add LZ4 compression in pg_dump

From: gkokolatos(at)pm(dot)me
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2022-12-01 14:58:35
Message-ID: jBiqIGoBbxyqLL2syvEDLm1W_TPukmBtAOkFyICkAssd8qupR7zjpA066Cj_syxQOVFsQORr3fc-g_WeATSXvdoybB-e7Xj26Y-C8U7rTKA=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

------- Original Message -------
On Thursday, December 1st, 2022 at 3:05 AM, Michael Paquier <michael(at)paquier(dot)xyz> wrote:

>
>
> On Wed, Nov 30, 2022 at 05:11:44PM +0000, gkokolatos(at)pm(dot)me wrote:
>
> > Fair enough. The atteched v11 does that. 0001 introduces compression
> > specification and is using it throughout. 0002 paves the way to the
> > new interface by homogenizing the use of cfp. 0003 introduces the new
> > API and stores the compression algorithm in the custom format header
> > instead of the compression level integer. Finally 0004 adds support for
> > LZ4.
>
>
> I have been looking at 0001, and.. Hmm. I am really wondering
> whether it would not be better to just nuke this warning into orbit.
> This stuff enforces non-compression even if -Z has been used to a
> non-default value. This has been moved to its current location by
> cae2bb1 as of this thread:
> https://www.postgresql.org/message-id/20160526.185551.242041780.horiguchi.kyotaro%40lab.ntt.co.jp
>
> However, this is only active if -Z is used when not building with
> zlib. At the end, it comes down to whether we want to prioritize the
> portability of pg_dump commands specifying a -Z/--compress across
> environments knowing that these may or may not be built with zlib,
> vs the amount of simplification/uniformity we would get across the
> binaries in the tree once we switch everything to use the compression
> specifications. Now that pg_basebackup and pg_receivewal are managed
> by compression specifications, and that we'd want more compression
> options for pg_dump, I would tend to do the latter and from now on
> complain if attempting to do a pg_dump -Z under --without-zlib with a
> compression level > 0. zlib is also widely available, and we don't
> document the fact that non-compression is enforced in this case,
> either. (Two TAP tests with the custom format had to be tweaked.)

Fair enough. Thank you for looking. However I have a small comment
on your new patch.

- /* Custom and directory formats are compressed by default, others not */
- if (compressLevel == -1)
- {
-#ifdef HAVE_LIBZ
- if (archiveFormat == archCustom || archiveFormat == archDirectory)
- compressLevel = Z_DEFAULT_COMPRESSION;
- else
-#endif
- compressLevel = 0;
- }

Nuking the warning from orbit and changing the behaviour around disabling
the requested compression when the libraries are not present, should not
mean that we need to change the behaviour of default values for different
formats. Please find v13 attached which reinstates it.

Which in itself it got me looking and wondering why the tests succeeded.
The only existing test covering that path is `defaults_dir_format` in
`002_pg_dump.pl`. However as the test is currently written it does not
check whether the output was compressed. The restore command would succeed
in either case. A simple `gzip -t -r` against the directory will not
suffice to test it, because there exist files which are never compressed
in this format (.toc). A little bit more involved test case would need
to be written, yet before I embark to this journey, I would like to know
if you would agree to reinstate the defaults for those formats.

>
> As per the patch, it is true that we do not need to bump the format of
> the dump archives, as we can still store only the compression level
> and guess the method from it. I have added some notes about that in
> ReadHead and WriteHead to not forget.

Agreed. A minor suggestion if you may.

#ifndef HAVE_LIBZ
- if (AH->compression != 0)
+ if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available");
#endif

It would seem a more consistent to error out in this case. We do error
in all other cases where the compression is not available.

>
> Most of the changes are really-straight forward, and it has resisted
> my tests, so I think that this is in a rather-commitable shape as-is.

Thank you.

Cheers,
//Georgios

> --
> Michael

Attachment Content-Type Size
v13-0001-Teach-pg_dump-about-compress_spec-and-use-it-thr.patch text/x-patch 38.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-12-01 14:58:39 Re: Documentation for building with meson
Previous Message Tom Lane 2022-12-01 14:39:53 Re: Allow round() function to accept float and double precision