Re: libpq compression

From: Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, ianzag(at)gmail(dot)com
Subject: Re: libpq compression
Date: 2021-07-29 14:51:22
Message-ID: 66AB9CC3-4EDA-4443-8F82-F96D8919F942@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

I made some noticeable changes to the patch and fixed the previously mentioned issues.

On Fri, Mar 19, 2021 at 16:28 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> Previously, I suggested that the server should have a "policy" GUC defining
> which compression methods are allowed. Possibly including compression "level".
> For example, the default might be to allow zstd, but only up to level 9.

Now libpq_compression GUC server setting controls the available client-server traffic compression methods.
It allows specifying the exact list of the allowed compression algorithms.

For example, to allow only and zlib, set the setting to "zstd,zlib". Also, maximal allowed compression level can be specified for each
method, e.g. "zstd:1,zlib:2" setting will set the maximal compression level for zstd to 1 and zlib to 2.
If a client requests the compression with a higher compression level, it will be set to the maximal allowed one.
The default (and recommended) maximal compression level for each algorithm is 1.

On Fri, Mar 19, 2021 at 16:28 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> There's commit messages claiming that the compression can be "asymmetric"
> between client-server vs server-client, but the commit seems to be unrelated,
> and the protocol documentation doesn't describe how this works.

On Dec 10, 2020, at 1:39 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Another idea is that you could have a new message type that says "hey,
> the payload of this is 1 or more compressed messages." It uses the
> most-recently set compression method. This would make switching
> compression methods easier since the SetCompressionMethod message
> itself could always be sent uncompressed and/or not take effect until
> the next compressed message

I rewrote the compression initialization logic. Now it is asymmetric and the compression method is changeable on the fly.

Now compression initialization consists only of two phases:
1. Client sends startup packet with _pq_.compression containing the list of compression algorithms specified by the client with an optional
specification of compression level (e.g. “zstd:2,zlib:1”)
2. The server intersects the requested compression algorithms with the allowed ones (controlled via the libpq_compression server config setting).
If the intersection is not empty, the server responds with CompressionAck containing the final list of the compression algorithms that can be used for the compression of libpq messages between the client and server.
If the intersection is empty (server does not accept any of the requested algorithms), then it replies with CompressionAck containing the empty list.

After sending the CompressionAck message, the server can send the SetCompressionMethod message to set the current compression algorithm for server-to-client traffic compression.
After receiving the CompressionAck message, the client can send the SetCompressionMethod message to set the current compression algorithm for client-to-server traffic compression.

SetCompressionMethod message contains the index of the compression algorithm in the final list of the compression algorithms which is generated during compression initialization (described above).

Compressed data is wrapped into CompressedData messages.

Rules for compressing the protocol messages are defined in zpq_stream.c. For each protocol message, the preferred compression algorithm can be chosen.

On Wed, Apr 21, 2021 at 15:35 AM Ian Zagorskikh <izagorskikh(at)cloudlinux(dot)com> wrote:
> Let me drop my humble 2 cents in this thread. At this moment I checked only 0001-Add-zlib-and-zstd-streaming-compression patch. With no order:
>
> * No checks for failures. For example, value from malloc() is not checked for NULL and used as-is right after the call. Also as a result possibly NULL values passed into ZSTD functions which are explicitly not NULL-tolerant and so dereference pointers without checks.
>
> * Used memory management does not follow the schema used in the common module. Direct calls to std C malloc/free are hard coded. AFAIU this is not backend friendly. Looking at the code around I believe they should be wrapped to ALLOC/FREE local macroses that depend on FRONTEND define. As it's done for example. in src/common/hmac.c.
>
> * If we're going to fix our code to be palloc/pfree friendly there's also a way to pass our custom allocator callbacks inside ZSTD functions. By default ZSTD uses malloc/free but it can be overwritten by the caller in e.g. ZSTD_createDStream_advanced(ZSTD_customMem customMem) versions of API . IMHO that would be good. If a 3rd party component allows us to inject a custom memory allocator and we do have it why not use this feature?
>
> * Most zs_foo() functions do not check ptr arguments, though some like zs_free() do. If we're speaking about a "common API" that can be used by a wide range of different modules around the project, such a relaxed way to treat input arguments IMHO can be an evil. Or at least add Assert(ptr) assertions so they can be catched up in debug mode.
>
> * For zs_create() function which must be called first to create context for further operations, a compression method is passed as integer. This method is used inside zs_create() as index inside an array of compress implementation descriptors. There are several problems:
> 1) Method ID value is not checked for validity. By passing an invalid method ID we can easily get out of array bounds.
> 2) Content of array depends on on configuration options e.g. HAVE_LIBZSTD, HAVE_LIBZ etc. So index inside this array is not persistent. For some config options combination method ID with value 0 may refer to ZSTD and for others to zlib.
> 3) There's no defines/enums/etc in public z_stream.h header that would define which value should be passed to create a specific compressor. Users have to guess it or check the code.

Fixed almost all of these issues, except the malloc/free-related stuff (will fix this later).

On Fri, Mar 19, 2021 at 11:02 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> Also, I'm not sure, but I think you may find that the zstd configure.ac should
> use pkgconfig. This allowed the CIs to compile these patches. Without
> pkg-config, the macos CI couldn't find (at least) LZ4.k
> https://commitfest.postgresql.org/32/2813/
> https://commitfest.postgresql.org/32/3015/

Now --with-zstd uses pkg-config to link the ZSTD library and works correctly on macos.

I would appreciate hearing your thoughts on the new version of the patch,

Daniil Zakhlystov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2021-07-29 14:56:40 Re: pg_upgrade does not upgrade pg_stat_statements properly
Previous Message Daniel Verite 2021-07-29 14:45:37 Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR