Re: Refactoring of compression options in pg_basebackup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Georgios Kokolatos <gkokolatos(at)pm(dot)me>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>
Subject: Re: Refactoring of compression options in pg_basebackup
Date: 2022-01-06 14:27:19
Message-ID: CA+TgmoaUW0v5oxaDTu05n_r--iy6DJFhsxQ2Ldbtf20ZkHm1HA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 6, 2022 at 12:04 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> Yeah. There are cases for both. I just got to wonder whether it
> makes sense to allow both server-side and client-side compression to
> be used at the same time. That would be a rather strange case, but
> well, with the correct set of options that could be possible.

I don't think it makes sense to support that. On the one hand, it
doesn't seem useful: compressing already-compressed data doesn't
usually work very well. Alternatively, I suppose the intent could be
to compress one way for transfer and then decompress and recompress
for storage, but that seems too inefficient to take seriously. On the
other hand, it requires a more complex user interface, and it's
already fairly complicated anyway.

> My view of things is slightly different, aka I'd rather keep
> --compress to mean a compression level with an integer option, but
> introduce a --compression-method={lz4,gzip,none}, with -Z being a
> synonym of --compression-method=gzip. That's at least the path we
> chose for pg_receivewal. I don't mind sticking with one way or
> another, as what you are proposing is basically the same thing I have
> in mind, but both tools ought to use the same set of options.

Did you mean that -z would be a synonym for --compression-method=gzip?
It doesn't really make sense for -Z to be that, unless it's also
setting the compression level.

My objection to --compress=$LEVEL is that the compression level seems
like it ought to rightfully be subordinate to the choice of algorithm.
In general, there's no reason why a compression algorithm has to offer
a choice of compression levels at all, or why they have to be numbered
0 through 9. For example, lz4 on my system claims to offer compression
levels from 1 through 12, plus a separate set of "fast" compression
levels starting with 1 and going up to an unspecified number. And then
it also has options to favor decompression speed, change the block
size, and a few other parameters. We don't necessarily want to expose
all of those options, but we should structure things so that we could
if it became important. The way to do that is to make the compression
algorithm the primary setting, and then anything else you can set for
that compressor is somehow a subordinate setting.

Put another way, we don't decide first that we want to compress with
level 7, and then afterward decide whether that's gzip, lz4, or bzip2.
We pick the compressor first, and then MAYBE think about changing the
compression level.

> > In the proposed patch, you end up with pg_basebackup
> > --compression-method=lz4 -Z2 meaning compression with lz4 level 2. I
> > find that quite odd, though as with all such things, opinions may
> > vary. In my proposal, that would be an error, because it would be
> > equivalent to --compress=lz4 --compress=gzip --compression-level=2,
> > and would thus involve conflicting compression method specifications.
>
> It seems to me that you did not read the patch closely enough. The
> attached patch does not add support for LZ4 in pg_basebackup on the
> client-side yet. Once it gets added, though, the idea is that using
> --compress with LZ4 would result in an error. That's what happens
> with pg_receivewal on HEAD, for one. The patch just shapes things to
> plug LZ4 more easily in the existing code of pg_basebackup.c, and
> walmethods.c.

Well what I was looking at was this:

- printf(_(" -Z, --compress=0-9 compress tar output with given
compression level\n"));
+ printf(_(" -Z, --compress=1-9 compress tar output with given
compression level\n"));
+ printf(_(" --compression-method=METHOD\n"
+ " method to compress data\n"));

That seems to show that, post-patch, the argument to -Z would be a
compression level, even if --compression-method were something other
than gzip.

It's possible that I haven't read something carefully enough, but to
me, what I said seems to be a straightforward conclusion based on
looking at the usage help in the patch. So if I came to the wrong
conclusion, perhaps that usage help isn't reflecting the situation you
intend to create, or not as clearly as it ought.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-01-06 14:37:54 Re: sqlsmith: ERROR: XX000: bogus varno: 2
Previous Message Simon Riggs 2022-01-06 14:09:35 Re: Add 64-bit XIDs into PostgreSQL 15