Re: libpq compression

From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Denis Smirnov <sd(at)arenadata(dot)io>, Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq compression
Date: 2021-01-12 14:05:22
Message-ID: a6dbaab3-a9fd-7f57-7fbe-d0c088cf2e4a@garret.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12.01.2021 4:20, Justin Pryzby wrote:
> On Mon, Jan 11, 2021 at 04:53:51PM +0300, Konstantin Knizhnik wrote:
>> On 09.01.2021 23:31, Justin Pryzby wrote:
>>> I suggest that there should be an enum of algorithms, which is constant across
>>> all servers. They would be unconditionally included and not #ifdef depending
>>> on compilation options.
>> I do not think that it is possible (even right now, it is possible to build
>> Postgres without zlib support).
>> Also if new compression algorithms  are added, then in any case we have to
>> somehow handle situation when
>> old client is connected to new server and visa versa.
> I mean an enum of all compression supported in the master branch, starting with
> ZLIB = 1. I think this applies to libpq compression (which requires client and
> server to handle a common compression algorithm), and pg_dump (output of which
> needs to be read by pg_restore), but maybe not TOAST, the patch for which
> supports extensions, with dynamically allocated OIDs.

Sorry, I do not understand the goal of introducing this enum.
Algorithms are in any case specified by name. And internally there is no
need in such enum,
at least in libpq compression.
>
>>> In config.sgml, it says that libpq_compression defaults to on (on the server
>>> side), but in libpq.sgml it says that it defaults to off (on the client side).
>>> Is that what's intended ? I would've thought the defaults would match, or that
>>> the server would enforce a default more conservative than the client's (the DBA
>>> should probably have to explicitly enable compression, and need to "opt-in"
>>> rather than "opt-out").
>> Yes, it is intended behavior:  libpq_compression GUC allows to prohibit
>> compression requests fro clients if due to some reasons (security, CPU
>> consumption is not desired).
>> But by default server should support compression if it is requested by
>> client.
> It's not clear to me if that's true.. It may be what's convenient for you,
> especially during development, but that doesn't mean it's safe or efficient or
> what's generally desirable to everyone.

Definitely it is only my point of view, may be DBA will have different
opinions.
I think that compression is not dangerousness or resource consuming
feature which should be disabled by default.
At least there is on GUC prohibiting SSL connections and them are even
more expensive.
In any case, I will be glad to collect votes whether compression
requests should be be default rejected by server or not.

>> But client should not request compression by default: it makes sense only for
>> queries returning large result sets or transferring a lot of data (liek
>> COPY).
> I think you're making assumptions about everyone's use of the tools, and it's
> better if the DBA makes that determination. The clients aren't generally under
> the admin's control, and if they don't request compression, then it's not used.
> If they request compression, then the DBA still has control over whether it's
> allowed.
>
> We agree that it should be disabled by default, but I suggest that it's most
> flexible if client's makes the request and allow the server to decide. By
> default the server should deny/ignore the request, with the client gracefully
> falling back to no compression.

I think that compression will be most efficient for internal connections
(i.e. replication, bulk data loading, ...)
which are mostly controlled by DBA.
> Compression would have little effect on most queries, especially at default
> level=1.
I mostly agree with you here, except that compression level 1 gives
little effect.
All my experiments both bith page level compression and libpq
compression shows that default compression level
provides optimal balance between compression ratio and compression speed.

In Daniil's benchmark the difference between compression ration 1 and 19
in zstd is only 30%.

> Right now "compression" parameter accepts not only boolean values but
> also
>> list of suggested algorithms with optional compression level, like
>> "zstd:1,zlib"
> You're talking about the client compression param. I'm suggesting that the
> server should allow fine-grained control of what compression algs are permitted
> at *runtime*. This would allow distributions to compile with all compression
> libraries enabled at configure time, and still allow an DBA to disable one
> without recompiling.

Frankly speaking I do not see much sense in it.
My point of view is the following: there are several well known and
widely used compression algorithms now:
zlib, zstd, lz4. There are few others but results of various my
benchmarks (mostly for page level compression)
shows that zstd provides best compression ratio/speed  balance and lz4 -
the best speed.

zlib is available almost everywhere and postgres by default is
configured with zlib.
So it can be considered as default compression algorithm available
everywhere.
If Postgres was built with zstd or lz4 (not currently supported for
libpq compression) then it should be used instead of zlib
because it is faster and provides better compression quality.

So I do not see any other arguments for enabling or disabling some
particular compression algorithms by DBA or suggesting it by client.
>
> That's possible with environment variable or connection string.
>
> I'm proposing to simplify change of its default when recompiling, just like this one:
> src/interfaces/libpq/fe-connect.c:#define DefaultHost "localhost"
>
> Think this would be a 2 line change, and makes the default less hardcoded.

Sorry, my intention was the following: there is list of supported
algorithms in order of decreasing their efficiency
(yes, I realize that in general case it may be not always possible to
provide partial order on the set of supported compression algorithms,
but right now with zstd and zlib it is trivial).
Client sends to the server its list of supported algorithms (or
explicitly specified by user) and server choose most efficient and
supported among them.
So there is no need to specify some default in this schema. We can
change order of algorithms in the array to affect choice of default
algorithm.

>>> Your patch warns if *none* of the specified algorithms are supported, but I
>>> wonder if it should warn if *any* of them are unsupported (like if someone
>>> writes libz instead of zlib, or zst/libzstd instead of zstd, which I guess
>>> about half of us will do).
>> Ugh... Handling mistyping in connection string seems to be not so good idea
>> (from my point of view).
>> And situation when some of algorithms is not supported by server seems to
>> normal (if new client connects to old server).
>> So I do not want to produce warning in this case.
> I'm not talking about warning if an alg is "not supported by server", but
> rather if it's "not supported by client". I think there should be a warning if
> a connection string specifies two libraries, but one is misspelled.
Makes sense. I  add this check.

>
> I think it's good if it *can* be automatic. But it's also good if the DBA can
> implement a simple "policy" about what's (not) supported. If the user requests
> a compression and it's not known on the *client* side I think it should warn.
>
> BTW I think the compression should be shown in psql \conninfo

Good point: done.

>
>
> + if ((unsigned)index >= conn->n_compressors)
> + {
> + appendPQExpBuffer(&conn->errorMessage,
> + libpq_gettext(
> + "server returns incorrect compression aslogirhm index: %d\n"),
>
> Now, you're checking that the index is not too large, but not that it's not too
> small.

Sorry, it was my"fad" to write most optimal code which is completely not
relevant in this place.
I specially use unsigned comparison to perform check using just one
comparison instead of two.
I have added comment here.
>>> Right now, when I try to connect to an unpatched server, I get:
>>> psql: error: expected authentication request from server, but received v
>> Thank you for pointing it: fixed.
> I tested that I can connect to unpatced server, thanks.
>
> I have to confess that I don't know how this works, or what it has to do with
> what the commit message claims it does ?
>
> Are you saying I can use zlib in one direction and zstd in the other ? How
> would I do that ?

Sorry, this feature was suggested by Andres and partly implemented by
Daniil.
I always think that it is useless and overkill feature.

New version of the patch with the suggested changes is attached.

Attachment Content-Type Size
libpq-compression-31.patch text/x-patch 81.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2021-01-12 14:53:11 Re: Added schema level support for publication.
Previous Message Andrew Dunstan 2021-01-12 14:04:51 Re: Cirrus CI (Windows help wanted)