Re: libpq compression

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Denis Smirnov <sd(at)arenadata(dot)io>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq compression
Date: 2021-01-11 13:53:51
Message-ID: f02d6f64-fe5e-f507-4870-f433cf253441@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09.01.2021 23:31, Justin Pryzby wrote:
> On Thu, Dec 17, 2020 at 05:54:28PM +0300, Konstantin Knizhnik wrote:
>> I am maintaining this code in
>> git(at)github(dot)com:postgrespro/libpq_compression.git repository.
>> I will be pleased if anybody, who wants to suggest any bug
>> fixes/improvements of libpq compression, create pull requests: it will be
>> much easier for me to merge them.
> Thanks for working on this.
> I have a patch for zstd compression in pg_dump so I looked at your patch.
> I'm attaching some language fixes.

Thank you very much.
I applied your patch on top of pull request of Daniil Zakhlystov who has
implemented support of using different compressors in different direction.

Frankly speaking I still very skeptical concerning too much flexibility
in compression configuration:
- toggle compression on the fly
- using different compression algorithms in both directions
- toggle compression on the fly

According to Daniil's results there is only 30% differences in
compression ration between zstd:1 and zstd:19.
But making it possible to specify arbitrary compression level we give
user of a simple tool to attack server (cause CPU/memory exhaustion).

>
>> +zstd_create(int level, zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg, char* rx_data, size_t rx_data_size)
>> +zlib_create(int level, zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg, char* rx_data, size_t rx_data_size)
>> +build_compressors_list(PGconn *conn, char** client_compressors, bool build_descriptors)
> Are you able to run pg_indent to fix all the places where "*" is before the
> space ? (And similar style issues).

Also done by Daniil.

>
> There are several compression patches in the commitfest, I'm not sure how much
> they need to be coordinated, but for sure we should coordinate the list of
> compressions available at compile time.
>
> Maybe there should be a central structure for this, or maybe just the ID
> numbers of compressors should be a common enum/define. In my patch, I have:
>
> +struct compressLibs {
> + const CompressionAlgorithm alg;
> + const char *name; /* Name in -Z alg= */
> + const char *suffix; /* file extension */
> + const int defaultlevel; /* Default compression level */
> +};
>
> Maybe we'd also want to store the "magic number" of each compression library.
> Maybe there'd also be a common parsing of compression options.
>
> You're supporting a syntax like zlib,zstd:5, but zstd also supports long-range,
> checksum, and rsyncable modes (rsyncable is relevant to pg_dump, but not to
> libpq).

There are at least three places in Postgres where compression is used
right :
1. TOAST and extened attributes compression.
2. pg_basebackup compression
3. pg_dump compression

And there are also patches for
4. page compression
5. protocol compression

It is awful that all this five places are using compression in their own
way.
It seems to me that compression (as well as all other system dependent
stuff as socket IO, file IO, synchronization primitives,...) should be
extracted to SAL (system-abstract layer)
where then can be used both by backend, frontend and utilities.
Including external utilities, like pg_probackup, pg_bouncer, Odyssey, ...
Unfortunately such refactoring requires so much efforts, that it can
have any chance to be committed if this work will be coordinated by one
of the core committers.

> I think your patch has an issue here. You have this:
>
> src/interfaces/libpq/fe-connect.c
>
> + pqGetc(&resp, conn);
> + index = resp;
> + if (index == (char)-1)
> + {
> + appendPQExpBuffer(&conn->errorMessage,
> + libpq_gettext(
> + "server is not supported requested compression algorithms %s\n"),
> + conn->compression);
> + goto error_return;
> + }
> + Assert(!conn->zstream);
> + conn->zstream = zpq_create(conn->compressors[index].impl,
> + conn->compressors[index].level,
> + (zpq_tx_func)pqsecure_write, (zpq_rx_func)pqsecure_read, conn,
> + &conn->inBuffer[conn->inCursor], conn->inEnd-conn->inCursor);
>
> This takes the "index" returned by the server and then accesses
> conn->compressors[index] without first checking if the index is out of range,
> so a malicious server could (at least) crash the client by returning index=666.

Thank you for pointed this problem. I will add the check, although I
think that problem of malicious server
is less critical than malicious client. Also such "byzantine" server may
return wrong data, which in many cases
is more fatal than crash of a client.
> I suggest that there should be an enum of algorithms, which is constant across
> all servers. They would be unconditionally included and not #ifdef depending
> on compilation options.

I do not think that it is possible (even right now, it is possible to
build Postgres without zlib support).
Also if new compression algorithms  are added, then in any case we have
to somehow handle situation when
old client is connected to new server and visa versa.

>
> That would affect the ZpqAlgorithm data structure, which would include an ID
> number similar to
> src/bin/pg_dump/compress_io.h:typedef enum...CompressionAlgorithm;
>
> The CompressionAck would send the ID rather than the "index".
> A protocol analyzer like wireshark could show "Compression: Zstd".
> You'd have to verify that the ID is supported (and not bogus).
>
> Right now, when I try to connect to an unpatched server, I get:
> psql: error: expected authentication request from server, but received v

Thank you for pointing it: fixed.
> +/*
> + * Array with all supported compression algorithms.
> + */
> +static ZpqAlgorithm const zpq_algorithms[] =
> +{
> +#if HAVE_LIBZSTD
> + {zstd_name, zstd_create, zstd_read, zstd_write, zstd_free, zstd_error, zstd_buffered_tx, zstd_buffered_rx},
> +#endif
> +#if HAVE_LIBZ
> + {zlib_name, zlib_create, zlib_read, zlib_write, zlib_free, zlib_error, zlib_buffered_tx, zlib_buffered_rx},
> +#endif
> + {no_compression_name}
> +};
>
> In config.sgml, it says that libpq_compression defaults to on (on the server
> side), but in libpq.sgml it says that it defaults to off (on the client side).
> Is that what's intended ? I would've thought the defaults would match, or that
> the server would enforce a default more conservative than the client's (the DBA
> should probably have to explicitly enable compression, and need to "opt-in"
> rather than "opt-out").

Yes, it is intended behavior:  libpq_compression GUC allows to prohibit
compression requests fro clients
if due to some reasons (security, CPU consumption is not desired).
But by default server should support compression if it is requested by
client.
But client should not request compression by default: it makes sense
only for queries returning large result sets
or transferring a lot of data (liek COPY).

>
> Maybe instead of a boolean, this should be a list of permitted compression
> algorithms. This allows the admin to set a "policy" rather than using the
> server's hard-coded preferences. This could be important to disable an
> algorithm at run-time if there's a vulnerability found, or performance problem,
> or buggy client, or for diagnostics, or performance testing. Actually, I think
> it may be important to allow the admin to disable not just an algorithm, but
> also its options. It should be possible for the server to allow "zstd"
> compression but not "zstd --long", or zstd:99 or arbitrarily large window
> sizes. This seems similar to SSL cipher strings. I think we'd want to be able
> to allow (or prohibit) at least alg:* (with options) or alg (with default
> options).

Sorry, may be you are looking not at the latest version of the patch?
Right now "compression" parameter accepts not only boolean values but
also list of suggested algorithms with optional compression level, like
"zstd:1,zlib"

> Your patch documents "libpq_compression=auto" but that doesn't work:
> WARNING: none of specified algirthms auto is supported by client
> I guess you mean "any" which is what's implemented.
> I suggest to just remove that.
It is some inconsistency with documentation.
It seems to me that I have already fixed it. "auto" was renamed to "any",
> I think maybe your patch should include a way to trivially change the client's
> compile-time default:
> "if (!conn->compression) conn->compression = DefaultCompression"

It can be done using PG_COMPRESSION environment variable.
Do we need some other mechanism for it?

>
> Your patch warns if *none* of the specified algorithms are supported, but I
> wonder if it should warn if *any* of them are unsupported (like if someone
> writes libz instead of zlib, or zst/libzstd instead of zstd, which I guess
> about half of us will do).
Ugh... Handling mistyping in connection string seems to be not so good
idea (from my point of view).
And situation when some of algorithms is not supported by server seems
to normal (if new client connects to old server).
So I do not want to produce warning in this case.

I once again want to repeat my opinion: choosing of compression
algorithms should be done automatically.
Client should just make an intention to use compression (using
compression=on or compression=any)
and server should choose most efficient algorithm which is supported by
both of them.

> $ LD_LIBRARY_PATH=./src/interfaces/libpq PGCOMPRESSION=abc,def src/bin/psql/psql 'host=localhost port=1111'
> WARNING: none of the specified algorithms are supported by client: abc,def
> $ LD_LIBRARY_PATH=./src/interfaces/libpq PGCOMPRESSION=abc,zlib src/bin/psql/psql 'host=localhost port=1111'
> (no warning)
>
> The libpq_compression GUC can be set for a user, like ALTER ROLE .. SET ...
> Is there any utility in making it configurable by client address? Or by
> encryption? I'm thinking of a policy like "do not allow compression from LOCAL
> connection" or "do not allow compression on encrypted connection". Maybe this
> would be somehow integrated into pg_hba. But maybe it's not needed (a separate
> user could be used to allow/disallow compression).

There is definitely no sense to use compression together with
encryption: if compression is desired in this case, it cane be done at
SSL level.
But I do not think that we need more sophisticated mechanism to prohibit
compression requests at server level.
Yes, it is possible to grant privileges to use compression to some
particular role. Or to prohibit it for SSL connections.
But I can't imagine some natural arguments for it.
May be I am wrong. I will be pleased of somebody can describe some
realistic scenarios when any of this features may be needed (taken in
account
that compression of libpq traffic is not something very bad, insecure or
expensive, which can harm server). Misuse of libpq compression
may just consume that extra CPU (but not so much to overload server). In
any case: we do not have such mechanism to restrict of use SSL
connections for some particular roles
or disable it for local connection. Why do we need it for compression,
which is very similar with encryption?

New version of libpq compression patch is attached.
It can be also be found at git(at)github(dot)com:postgrespro/libpq_compression.git

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
libpq-compression-29.patch text/x-patch 78.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2021-01-11 14:00:36 Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion
Previous Message Bharath Rupireddy 2021-01-11 13:22:50 Re: Added schema level support for publication.