Re: Refactoring of compression options in pg_basebackup

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Georgios Kokolatos <gkokolatos(at)pm(dot)me>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>
Subject: Re: Refactoring of compression options in pg_basebackup
Date: 2022-01-06 05:04:07
Message-ID: YdZ4R7sNek759eZX@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 05, 2022 at 10:33:38AM -0500, Robert Haas wrote:
> I think we're going to want to offer both options. We can't know
> whether the user prefers to consume CPU cycles on the server or on the
> client. Compressing on the server has the advantage of potentially
> saving transfer bandwidth, but the server is also often the busiest
> part of the whole system, and users are often keen to offload as much
> work as possible.

Yeah. There are cases for both. I just got to wonder whether it
makes sense to allow both server-side and client-side compression to
be used at the same time. That would be a rather strange case, but
well, with the correct set of options that could be possible.

> Given that, I'd like us to be thinking about what the full set of
> options looks like once we have (1) compression on either the server
> or the client and (2) multiple compression algorithms and (3) multiple
> compression levels. Personally, I don't really like the decision made
> by this proposed patch. In this patch's view of the world, -Z is a way
> of providing the compression level for whatever compression algorithm
> you happen to have selected, but I think of -Z as being the upper-case
> version of -z which I think of as selecting specifically gzip. It's
> not particularly intuitive to me that in a command like pg_basebackup
> --compress=<something>, <something> is a compression level rather than
> an algorithm. So what I would propose is probably something like:
>
> pg_basebackup --compress=ALGORITHM [--compression-level=NUMBER]
> pg_basebackup --server-compress=ALGORITHM [--compression-level=NUMBER]
>
> And then make -z short for --compress=gzip and -Z <n> short for
> --compress=gzip --compression-level=<n>. That would be a
> backward-incompatible change to the definition of --compress, but as
> long as -Z <n> works the same as today, I don't think many people will
> notice. If we like, we can notice if the argument to --compress is an
> integer and suggest using either -Z or --compress=gzip
> --compression-level=<n> instead.

My view of things is slightly different, aka I'd rather keep
--compress to mean a compression level with an integer option, but
introduce a --compression-method={lz4,gzip,none}, with -Z being a
synonym of --compression-method=gzip. That's at least the path we
chose for pg_receivewal. I don't mind sticking with one way or
another, as what you are proposing is basically the same thing I have
in mind, but both tools ought to use the same set of options.

Hmm. Perhaps at the end the problem is with --compress, where we
don't know if it means a compression level or a compression method?
For me, --compress means the former, and for you the latter. So a
third way of seeing things is to drop completely --compress, but have
one --compression-method and one --compression-level. That would
bring a clear split. Or just one --compression-method for the
client-side compression as you are proposing for the server-side
compression, however I'd like to think that a split between the method
and level is more intuitive.

> In the proposed patch, you end up with pg_basebackup
> --compression-method=lz4 -Z2 meaning compression with lz4 level 2. I
> find that quite odd, though as with all such things, opinions may
> vary. In my proposal, that would be an error, because it would be
> equivalent to --compress=lz4 --compress=gzip --compression-level=2,
> and would thus involve conflicting compression method specifications.

It seems to me that you did not read the patch closely enough. The
attached patch does not add support for LZ4 in pg_basebackup on the
client-side yet. Once it gets added, though, the idea is that using
--compress with LZ4 would result in an error. That's what happens
with pg_receivewal on HEAD, for one. The patch just shapes things to
plug LZ4 more easily in the existing code of pg_basebackup.c, and
walmethods.c.

So.. As of now, it is actually possible to cut the pie in three
parts. There are no real objections to the cleanup of walmethods.c
and the addition of some conditional TAP tests with pg_basebackup and
client-side compression, as far as I can see, only to the option
renaming part. Attached are two patches, then. 0001 is the cleanup
of walmethods.c to rely the compression method, with more tests (tests
that could be moved into their own patch, as well). 0002 is the
addition of the options I suggested upthread, but we may change that
depending on what gets used for the server-side compression for
consistency so I am not suggesting to merge that until we agree on the
full picture. The point of this thread was mostly about 0001, so I am
fine to discard 0002. Thoughts?
--
Michael

Attachment Content-Type Size
v3-0001-Refactor-walmethods.c-s-tar-method-to-use-compres.patch text/x-diff 8.6 KB
v3-0002-Rework-the-option-set-of-pg_basebackup.patch text/x-diff 8.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2022-01-06 05:16:25 Re: Delay the variable initialization in get_rel_sync_entry
Previous Message Amit Kapila 2022-01-06 04:57:27 Re: Skipping logical replication transactions on subscriber side