Re: libpq compression

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Denis Smirnov <sd(at)arenadata(dot)io>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq compression
Date: 2021-01-12 01:20:19
Message-ID: 20210112012018.GW1849@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 11, 2021 at 04:53:51PM +0300, Konstantin Knizhnik wrote:
> On 09.01.2021 23:31, Justin Pryzby wrote:
> > I suggest that there should be an enum of algorithms, which is constant across
> > all servers. They would be unconditionally included and not #ifdef depending
> > on compilation options.
>
> I do not think that it is possible (even right now, it is possible to build
> Postgres without zlib support).
> Also if new compression algorithms  are added, then in any case we have to
> somehow handle situation when
> old client is connected to new server and visa versa.

I mean an enum of all compression supported in the master branch, starting with
ZLIB = 1. I think this applies to libpq compression (which requires client and
server to handle a common compression algorithm), and pg_dump (output of which
needs to be read by pg_restore), but maybe not TOAST, the patch for which
supports extensions, with dynamically allocated OIDs.

> > In config.sgml, it says that libpq_compression defaults to on (on the server
> > side), but in libpq.sgml it says that it defaults to off (on the client side).
> > Is that what's intended ? I would've thought the defaults would match, or that
> > the server would enforce a default more conservative than the client's (the DBA
> > should probably have to explicitly enable compression, and need to "opt-in"
> > rather than "opt-out").
>
> Yes, it is intended behavior:  libpq_compression GUC allows to prohibit
> compression requests fro clients if due to some reasons (security, CPU
> consumption is not desired).
> But by default server should support compression if it is requested by
> client.

It's not clear to me if that's true.. It may be what's convenient for you,
especially during development, but that doesn't mean it's safe or efficient or
what's generally desirable to everyone.

> But client should not request compression by default: it makes sense only for
> queries returning large result sets or transferring a lot of data (liek
> COPY).

I think you're making assumptions about everyone's use of the tools, and it's
better if the DBA makes that determination. The clients aren't generally under
the admin's control, and if they don't request compression, then it's not used.
If they request compression, then the DBA still has control over whether it's
allowed.

We agree that it should be disabled by default, but I suggest that it's most
flexible if client's makes the request and allow the server to decide. By
default the server should deny/ignore the request, with the client gracefully
falling back to no compression.

Compression would have little effect on most queries, especially at default
level=1.

> > Maybe instead of a boolean, this should be a list of permitted compression
> > algorithms. This allows the admin to set a "policy" rather than using the
> > server's hard-coded preferences. This could be important to disable an
> > algorithm at run-time if there's a vulnerability found, or performance problem,
> > or buggy client, or for diagnostics, or performance testing. Actually, I think
> > it may be important to allow the admin to disable not just an algorithm, but
> > also its options. It should be possible for the server to allow "zstd"
> > compression but not "zstd --long", or zstd:99 or arbitrarily large window
> > sizes. This seems similar to SSL cipher strings. I think we'd want to be able
> > to allow (or prohibit) at least alg:* (with options) or alg (with default
> > options).
>
> Sorry, may be you are looking not at the latest version of the patch?
> Right now "compression" parameter accepts not only boolean values but also
> list of suggested algorithms with optional compression level, like
> "zstd:1,zlib"

You're talking about the client compression param. I'm suggesting that the
server should allow fine-grained control of what compression algs are permitted
at *runtime*. This would allow distributions to compile with all compression
libraries enabled at configure time, and still allow an DBA to disable one
without recompiling.

> > Your patch documents "libpq_compression=auto" but that doesn't work:
> > WARNING: none of specified algirthms auto is supported by client
> > I guess you mean "any" which is what's implemented.
> > I suggest to just remove that.
> It is some inconsistency with documentation.
> It seems to me that I have already fixed it. "auto" was renamed to "any",
> > I think maybe your patch should include a way to trivially change the client's
> > compile-time default:
> > "if (!conn->compression) conn->compression = DefaultCompression"
>
> It can be done using PG_COMPRESSION environment variable.
> Do we need some other mechanism for it?

That's possible with environment variable or connection string.

I'm proposing to simplify change of its default when recompiling, just like this one:
src/interfaces/libpq/fe-connect.c:#define DefaultHost "localhost"

Think this would be a 2 line change, and makes the default less hardcoded.

> > Your patch warns if *none* of the specified algorithms are supported, but I
> > wonder if it should warn if *any* of them are unsupported (like if someone
> > writes libz instead of zlib, or zst/libzstd instead of zstd, which I guess
> > about half of us will do).
> Ugh... Handling mistyping in connection string seems to be not so good idea
> (from my point of view).
> And situation when some of algorithms is not supported by server seems to
> normal (if new client connects to old server).
> So I do not want to produce warning in this case.

I'm not talking about warning if an alg is "not supported by server", but
rather if it's "not supported by client". I think there should be a warning if
a connection string specifies two libraries, but one is misspelled.

> I once again want to repeat my opinion: choosing of compression algorithms
> should be done automatically.
> Client should just make an intention to use compression (using
> compression=on or compression=any)
> and server should choose most efficient algorithm which is supported by both
> of them.

I think it's good if it *can* be automatic. But it's also good if the DBA can
implement a simple "policy" about what's (not) supported. If the user requests
a compression and it's not known on the *client* side I think it should warn.

BTW I think the compression should be shown in psql \conninfo

> > src/interfaces/libpq/fe-connect.c
> >
> > + pqGetc(&resp, conn);
> > + index = resp;
> > + if (index == (char)-1)
> > + {
> > + appendPQExpBuffer(&conn->errorMessage,
> > + libpq_gettext(
> > + "server is not supported requested compression algorithms %s\n"),
> > + conn->compression);
> > + goto error_return;
> > + }
> > + Assert(!conn->zstream);
> > + conn->zstream = zpq_create(conn->compressors[index].impl,
> > + conn->compressors[index].level,
> > + (zpq_tx_func)pqsecure_write, (zpq_rx_func)pqsecure_read, conn,
> > + &conn->inBuffer[conn->inCursor], conn->inEnd-conn->inCursor);
> >
> > This takes the "index" returned by the server and then accesses
> > conn->compressors[index] without first checking if the index is out of range,
> > so a malicious server could (at least) crash the client by returning index=666.
>
> Thank you for pointed this problem. I will add the check, although I think
> that problem of malicious server is less critical than malicious client. Also
> such "byzantine" server may return wrong data, which in many cases is more
> fatal than crash of a client.

+ if ((unsigned)index >= conn->n_compressors)
+ {
+ appendPQExpBuffer(&conn->errorMessage,
+ libpq_gettext(
+ "server returns incorrect compression aslogirhm index: %d\n"),

Now, you're checking that the index is not too large, but not that it's not too
small.

> > Right now, when I try to connect to an unpatched server, I get:
> > psql: error: expected authentication request from server, but received v
>
> Thank you for pointing it: fixed.

I tested that I can connect to unpatced server, thanks.

I have to confess that I don't know how this works, or what it has to do with
what the commit message claims it does ?

Are you saying I can use zlib in one direction and zstd in the other ? How
would I do that ?

> Author: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
> Date: Mon Jan 11 16:41:52 2021 +0300
> Making it possible to specify different compresion algorithms in both directions

+ else if (conn->n_compressors != 0 && beresp == 'v') /* negotiate protocol version */
+ {
+ appendPQExpBuffer(&conn->errorMessage,
+ libpq_gettext(
+ "server is not supporting libpq compression\n"));
+ goto error_return;

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-01-12 01:47:21 Re: A failure of standby to follow timeline switch
Previous Message Masahiko Sawada 2021-01-12 00:59:49 Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion