Re: Refactoring of compression options in pg_basebackup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Georgios Kokolatos <gkokolatos(at)pm(dot)me>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>
Subject: Re: Refactoring of compression options in pg_basebackup
Date: 2022-01-05 15:33:38
Message-ID: CA+TgmoYb4jnOU+-Xipf2V+twF9MR1P9bbLQqjiAmi3yc3r1tOA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 5, 2022 at 4:17 AM <gkokolatos(at)pm(dot)me> wrote:
> When the backend-side compression is completed, will there really be a need for
> client-side compression? If yes, then it seems logical to have distinct options
> for them and this cleanup makes sense. If not, then it seems logical to maintain
> the current options list and 'simply' change the internals of the code, and this
> cleanup makes sense.

I think we're going to want to offer both options. We can't know
whether the user prefers to consume CPU cycles on the server or on the
client. Compressing on the server has the advantage of potentially
saving transfer bandwidth, but the server is also often the busiest
part of the whole system, and users are often keen to offload as much
work as possible.

Given that, I'd like us to be thinking about what the full set of
options looks like once we have (1) compression on either the server
or the client and (2) multiple compression algorithms and (3) multiple
compression levels. Personally, I don't really like the decision made
by this proposed patch. In this patch's view of the world, -Z is a way
of providing the compression level for whatever compression algorithm
you happen to have selected, but I think of -Z as being the upper-case
version of -z which I think of as selecting specifically gzip. It's
not particularly intuitive to me that in a command like pg_basebackup
--compress=<something>, <something> is a compression level rather than
an algorithm. So what I would propose is probably something like:

pg_basebackup --compress=ALGORITHM [--compression-level=NUMBER]
pg_basebackup --server-compress=ALGORITHM [--compression-level=NUMBER]

And then make -z short for --compress=gzip and -Z <n> short for
--compress=gzip --compression-level=<n>. That would be a
backward-incompatible change to the definition of --compress, but as
long as -Z <n> works the same as today, I don't think many people will
notice. If we like, we can detect when the argument to --compress is an
integer and suggest using either -Z or --compress=gzip
--compression-level=<n> instead.
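
To make the mapping concrete, here is a minimal sketch in Python of how the short options could be rewritten into the long forms, plus the optional hint for an integer passed to --compress. It is illustrative only and is not pg_basebackup's actual option parsing (which is written in C); the function names expand_short_options and check_compress_argument are hypothetical, and the hint wording is an assumption.

```python
def expand_short_options(argv):
    """Rewrite -z and -Z <n> into the proposed long options.

    Hypothetical helper: -z becomes --compress=gzip, and -Z <n> (or -Z<n>)
    becomes --compress=gzip --compression-level=<n>. Everything else is
    passed through unchanged.
    """
    out = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-z":
            out.append("--compress=gzip")
        elif arg == "-Z":
            # Level given as a separate argument: -Z 5
            if i + 1 >= len(argv):
                raise ValueError("-Z requires a compression level")
            out += ["--compress=gzip", f"--compression-level={argv[i + 1]}"]
            i += 1
        elif arg.startswith("-Z") and len(arg) > 2:
            # Level attached to the option: -Z5
            out += ["--compress=gzip", f"--compression-level={arg[2:]}"]
        else:
            out.append(arg)
        i += 1
    return out


def check_compress_argument(value):
    """Return a hint if --compress was given an integer, else None.

    Under the proposal, --compress takes an algorithm name, so a bare
    integer is most likely an old-style compression level.
    """
    if value.isdigit():
        return (
            f'"{value}" is not a compression algorithm; use -Z {value} '
            f"or --compress=gzip --compression-level={value}"
        )
    return None
```

With this sketch, expand_short_options(["-Z", "5"]) yields ["--compress=gzip", "--compression-level=5"], so existing -Z <n> invocations keep their meaning.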

In the proposed patch, you end up with pg_basebackup
--compression-method=lz4 -Z2 meaning compression with lz4 level 2. I
find that quite odd, though as with all such things, opinions may
vary. In my proposal, that would be an error, because it would be
equivalent to --compress=lz4 --compress=gzip --compression-level=2,
and would thus involve conflicting compression method specifications.
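
As a rough sketch of that conflict check, again in Python and again only illustrative: the function resolve_compression below is hypothetical and takes options that have already been expanded to their long forms. One assumption not settled above is that repeating the same algorithm twice is tolerated; only two different algorithms are treated as an error.

```python
def resolve_compression(long_opts):
    """Return (algorithm, level) from already-expanded long options.

    Raises ValueError when two different algorithms are requested.
    Assumption: repeating the same algorithm is accepted silently.
    """
    algorithm = None
    level = None
    for opt in long_opts:
        name, _, value = opt.partition("=")
        if name == "--compress":
            if algorithm is not None and algorithm != value:
                raise ValueError(
                    f"conflicting compression methods: {algorithm} and {value}"
                )
            algorithm = value
        elif name == "--compression-level":
            level = int(value)
    return algorithm, level
```

Here resolve_compression(["--compress=lz4", "--compress=gzip", "--compression-level=2"]) raises ValueError, which is the outcome described for combining lz4 with -Z2, while ["--compress=gzip", "--compression-level=2"] resolves to ("gzip", 2).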

--
Robert Haas
EDB: http://www.enterprisedb.com
