Re: libpq compression

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Denis Smirnov <sd(at)arenadata(dot)io>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq compression
Date: 2021-01-09 20:31:19
Message-ID: 20210109203119.GL1849@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 17, 2020 at 05:54:28PM +0300, Konstantin Knizhnik wrote:
> I am maintaining this code in
> git(at)github(dot)com:postgrespro/libpq_compression.git repository.
> I will be pleased if anybody, who wants to suggest any bug
> fixes/improvements of libpq compression, create pull requests: it will be
> much easier for me to merge them.

Thanks for working on this.
I have a patch for zstd compression in pg_dump so I looked at your patch.
I'm attaching some language fixes.

> +zstd_create(int level, zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg, char* rx_data, size_t rx_data_size)
> +zlib_create(int level, zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg, char* rx_data, size_t rx_data_size)
> +build_compressors_list(PGconn *conn, char** client_compressors, bool build_descriptors)

Are you able to run pg_indent to fix all the places where "*" is before the
space ? (And similar style issues).

There are several compression patches in the commitfest, I'm not sure how much
they need to be coordinated, but for sure we should coordinate the list of
compressions available at compile time.

Maybe there should be a central structure for this, or maybe just the ID
numbers of compressors should be a common enum/define. In my patch, I have:

+struct compressLibs {
+ const CompressionAlgorithm alg;
+ const char *name; /* Name in -Z alg= */
+ const char *suffix; /* file extension */
+ const int defaultlevel; /* Default compression level */
+};

Maybe we'd also want to store the "magic number" of each compression library.
Maybe there'd also be a common parsing of compression options.

You're supporting a syntax like zlib,zstd:5, but zstd also supports long-range,
checksum, and rsyncable modes (rsyncable is relevant to pg_dump, but not to
libpq).

I think your patch has an issue here. You have this:

src/interfaces/libpq/fe-connect.c

+ pqGetc(&resp, conn);
+ index = resp;
+ if (index == (char)-1)
+ {
+ appendPQExpBuffer(&conn->errorMessage,
+ libpq_gettext(
+ "server is not supported requested compression algorithms %s\n"),
+ conn->compression);
+ goto error_return;
+ }
+ Assert(!conn->zstream);
+ conn->zstream = zpq_create(conn->compressors[index].impl,
+ conn->compressors[index].level,
+ (zpq_tx_func)pqsecure_write, (zpq_rx_func)pqsecure_read, conn,
+ &conn->inBuffer[conn->inCursor], conn->inEnd-conn->inCursor);

This takes the "index" returned by the server and then accesses
conn->compressors[index] without first checking if the index is out of range,
so a malicious server could (at least) crash the client by returning index=666.

I suggest that there should be an enum of algorithms, which is constant across
all servers. They would be unconditionally included and not #ifdef depending
on compilation options.

That would affect the ZpqAlgorithm data structure, which would include an ID
number similar to
src/bin/pg_dump/compress_io.h:typedef enum...CompressionAlgorithm;

The CompressionAck would send the ID rather than the "index".
A protocol analyzer like wireshark could show "Compression: Zstd".
You'd have to verify that the ID is supported (and not bogus).

Right now, when I try to connect to an unpatched server, I get:
psql: error: expected authentication request from server, but received v

+/*
+ * Array with all supported compression algorithms.
+ */
+static ZpqAlgorithm const zpq_algorithms[] =
+{
+#if HAVE_LIBZSTD
+ {zstd_name, zstd_create, zstd_read, zstd_write, zstd_free, zstd_error, zstd_buffered_tx, zstd_buffered_rx},
+#endif
+#if HAVE_LIBZ
+ {zlib_name, zlib_create, zlib_read, zlib_write, zlib_free, zlib_error, zlib_buffered_tx, zlib_buffered_rx},
+#endif
+ {no_compression_name}
+};

In config.sgml, it says that libpq_compression defaults to on (on the server
side), but in libpq.sgml it says that it defaults to off (on the client side).
Is that what's intended ? I would've thought the defaults would match, or that
the server would enforce a default more conservative than the client's (the DBA
should probably have to explicitly enable compression, and need to "opt-in"
rather than "opt-out").

Maybe instead of a boolean, this should be a list of permitted compression
algorithms. This allows the admin to set a "policy" rather than using the
server's hard-coded preferences. This could be important to disable an
algorithm at run-time if there's a vulnerability found, or performance problem,
or buggy client, or for diagnostics, or performance testing. Actually, I think
it may be important to allow the admin to disable not just an algorithm, but
also its options. It should be possible for the server to allow "zstd"
compression but not "zstd --long", or zstd:99 or arbitrarily large window
sizes. This seems similar to SSL cipher strings. I think we'd want to be able
to allow (or prohibit) at least alg:* (with options) or alg (with default
options).

Your patch documents "libpq_compression=auto" but that doesn't work:
WARNING: none of specified algirthms auto is supported by client
I guess you mean "any" which is what's implemented.
I suggest to just remove that.

I think maybe your patch should include a way to trivially change the client's
compile-time default:
"if (!conn->compression) conn->compression = DefaultCompression"

Your patch warns if *none* of the specified algorithms are supported, but I
wonder if it should warn if *any* of them are unsupported (like if someone
writes libz instead of zlib, or zst/libzstd instead of zstd, which I guess
about half of us will do).

$ LD_LIBRARY_PATH=./src/interfaces/libpq PGCOMPRESSION=abc,def src/bin/psql/psql 'host=localhost port=1111'
WARNING: none of the specified algorithms are supported by client: abc,def
$ LD_LIBRARY_PATH=./src/interfaces/libpq PGCOMPRESSION=abc,zlib src/bin/psql/psql 'host=localhost port=1111'
(no warning)

The libpq_compression GUC can be set for a user, like ALTER ROLE .. SET ...
Is there any utility in making it configurable by client address? Or by
encryption? I'm thinking of a policy like "do not allow compression from LOCAL
connection" or "do not allow compression on encrypted connection". Maybe this
would be somehow integrated into pg_hba. But maybe it's not needed (a separate
user could be used to allow/disallow compression).

--
Justin

Attachment Content-Type Size
0001-fixes-to-docs-and-comments.notapxtch text/x-diff 18.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2021-01-09 20:34:30 Re: [HACKERS] [PATCH] Generic type subscripting
Previous Message Thomas Munro 2021-01-09 20:21:59 Use pg_pwrite() in pg_test_fsync