Re: libpq compression

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>
Cc: "Iwata, Aya" <iwata(dot)aya(at)jp(dot)fujitsu(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "rharwood(at)redhat(dot)com" <rharwood(at)redhat(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "g(dot)smolkin(at)postgrespro(dot)ru" <g(dot)smolkin(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, 'Dmitry Dolgov' <9erthalion6(at)gmail(dot)com>
Subject: Re: libpq compression
Date: 2019-02-10 00:25:37
Message-ID: ccb0b694-cd60-133c-fdc6-62293ad91d79@2ndquadrant.com
Lists: pgsql-hackers

On 2/9/19 3:02 PM, Konstantin Knizhnik wrote:
>
>
> On 09.02.2019 1:38, Tomas Vondra wrote:
>> On 2/8/19 11:10 PM, Konstantin Knizhnik wrote:
>>>
>>> On 08.02.2019 21:57, Andres Freund wrote:
>>>> On 2019-02-08 12:15:58 +0300, Konstantin Knizhnik wrote:
>>>>> Frankly speaking, I do not think that such flexibility in choosing
>>>>> compression algorithms is really needed.
>>>>> I do not expect that there will be many situations where an old client
>>>>> has to communicate with a new server or vice versa.
>>>>> In most cases both client and server belong to the same Postgres
>>>>> distribution and so implement the same compression algorithm.
>>>>> Since we are compressing only temporary data (traffic), the problem of
>>>>> providing backward compatibility does not seem so important.
>>>> I think we should outright reject any patch without compression type
>>>> negotiation.
>>> Does it mean that it is necessary to support multiple compression
>>> algorithms and make it possible to switch between them at runtime?
>> IMHO the negotiation should happen at connection time, i.e. the server
>> should support connections compressed by different algorithms. Not sure
>> if that's what you mean by runtime.
>>
>> AFAICS this is quite close to how negotiation of encryption algorithms
>> works, in TLS and so on. Client specifies supported algorithms, server
>> compares that to its own list of supported algorithms, deduces the
>> encryption algorithm and notifies the client.
>>
>> To allow fall-back to uncompressed connection, use "none" as algorithm.
>> If there's no common algorithm, fail.
>
> It is a good analogy with SSL.
> Yes, the SSL protocol provides several ways of authentication, encryption,...
> And there are several different libraries implementing SSL.
> But Postgres is using only one of them: OpenSSL.
> If I want to use some other library (for example to make it possible to
> serialize and pass SSL session state to another process), then there is
> no way to achieve it.
>

That's rather misleading. Firstly, it's true we only support OpenSSL at
the moment, but I do remember we've been working on adding support for a
bunch of other TLS libraries.

But more importantly, it's not the TLS library that's negotiated. It's
the encryption algorithms that are negotiated. The server is oblivious
to which TLS library is used by the client (and vice versa), because the
messages are the same - what matters is that they agree on keys,
ciphers, etc. And those can differ/change between libraries or even
versions of the same library.

For us, the situation is the same - we have the messages specified by
the FE/BE protocol, and it's the algorithms that are negotiated.
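
Just to illustrate what I mean by negotiation (a rough sketch only, not
what the patch does - the parameter format and the helper below are made
up): the client might send a comma-separated list of the algorithms it
supports, and the server picks the first entry from its own preference
list that the client also offers, with "none" as the explicit fall-back:

/* Rough sketch of server-side negotiation, not actual patch code.
 * Assumes the client sent a comma-separated list such as "zstd,zlib,none". */
#include <stdlib.h>
#include <string.h>

/* algorithms supported by this server build, in order of preference */
static const char *server_algorithms[] = {"zstd", "zlib", "none"};

/* Return the first server-preferred algorithm also offered by the client,
 * or NULL if there is no common algorithm (in which case we should fail). */
static const char *
negotiate_compression(const char *client_list)
{
    int         nalgo = sizeof(server_algorithms) / sizeof(server_algorithms[0]);

    for (int i = 0; i < nalgo; i++)
    {
        char       *copy = strdup(client_list);
        char       *token;

        for (token = strtok(copy, ","); token; token = strtok(NULL, ","))
        {
            if (strcmp(token, server_algorithms[i]) == 0)
            {
                free(copy);
                return server_algorithms[i];
            }
        }
        free(copy);
    }
    return NULL;
}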

> Actually zstd also includes implementations of several compression
> algorithms and it chooses the one best fitting a particular data stream.
> As in the case of SSL, the choice of algorithm is performed internally
> inside zstd - not at the libpq level.
>

Really? I always thought zstd is a separate compression algorithm.
There's an adaptive compression feature, but AFAIK that essentially
tweaks the compression level based on the network connection. Can you
point me to the sources or docs explaining this?

Anyway, this does not really change anything - it's internal zstd stuff.

> Sorry if my explanation about static and dynamic (at runtime) choice was
> not correct.
> This is how compression is toggled now:
>
> #if HAVE_LIBZSTD
> ZpqStream*
> zpq_create(zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg)
> {
> ...
> }
> #endif
>
> So if Postgres was configured with zstd, then this implementation is
> included in the client and server Postgres libraries.
> If Postgres is configured with zlib, then the zlib implementation will be
> used.
> This is similar to how compression and most other configurable features
> are handled in Postgres.
>
> If we want to provide a dynamic choice at runtime, then we need to have an
> array with the available compression algorithms:
>
> #if HAVE_LIBZSTD
> static ZpqStream*
> zstd_create(zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg)
> {
> ...
> }
> #endif
>
> ZpqCompressorImpl compressorImpl[] =
> {
> #if HAVE_LIBZSTD
> {zstd_create, zstd_read, zstd_write, ...},
> #endif
> #if HAVE_ZLIB
> {zlib_create, zlib_read, zlib_write, ...},
> #endif
> ...
> }
>

Yes, that's mostly what I've been imagining, except that you also need
some sort of identifier for the algorithm - a cstring at the beginning
of the struct should be enough, I guess.
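
Something along these lines, perhaps (just a sketch - the field names and
the function-pointer signatures are made up, loosely following your
snippet above):

/* Sketch: compressor descriptor with a name used during negotiation.
 * The signatures are illustrative, not the patch's actual API. */
typedef struct ZpqCompressorImpl
{
    const char *name;           /* identifier exchanged during negotiation */
    ZpqStream  *(*create) (zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg);
    ssize_t     (*read) (ZpqStream *zs, void *buf, size_t size);
    ssize_t     (*write) (ZpqStream *zs, void const *buf, size_t size);
} ZpqCompressorImpl;

static const ZpqCompressorImpl compressorImpl[] =
{
#if HAVE_LIBZSTD
    {"zstd", zstd_create, zstd_read, zstd_write},
#endif
#if HAVE_ZLIB
    {"zlib", zlib_create, zlib_read, zlib_write},
#endif
};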

> And the most interesting case is if we load the library dynamically.
> Each implementation is built as a separate library (for example
> libpztd.so).
> In this case we need to somehow specify the available libraries, for
> example by placing them in a separate directory, or by specifying a list
> of libraries in postgresql.conf.
> Then we try to load this library using dlopen. Such a library has
> external dependencies on the corresponding compressor library (for example
> -lz). The library can be successfully loaded only if the corresponding
> compressor implementation is installed on the system.
> This is the most flexible approach, allowing custom compressor
> implementations to be provided.
> A compression implementation can be organized as a Postgres extension
> whose _PG_init function registers the implementation in some list.
>

How could you make them extensions? Extensions are database-specific, and
authentication happens before you have access to a database.

As I said before, I think adding them using shared_preload_libraries and
registering them in _PG_init should be sufficient.
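
On the library side it could be as simple as this (a sketch only -
RegisterCompressionAlgorithm() is a hypothetical hook, not an existing
API, and the lz4_* callbacks would live in the module itself):

/* Sketch of a compression module loaded via shared_preload_libraries.
 * RegisterCompressionAlgorithm() and the lz4_* callbacks are hypothetical. */
#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* hypothetical hook provided by the compression infrastructure */
extern void RegisterCompressionAlgorithm(const ZpqCompressorImpl *impl);

/* lz4_create/lz4_read/lz4_write would be implemented in this module */
static const ZpqCompressorImpl lz4_impl = {"lz4", lz4_create, lz4_read, lz4_write};

void
_PG_init(void)
{
    /* add this compressor to the in-memory list used during negotiation */
    RegisterCompressionAlgorithm(&lz4_impl);
}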

> This is what I am asking about.
> Right now approach 1) is implemented: the compression algorithm is defined
> at configure time.
> It is not so difficult to extend it to support multiple algorithms.
> And the most flexible but most sophisticated approach is to load libraries
> dynamically.
>

Well, there's nothing stopping you from implementing the dynamic
loading, but IMHO it makes v1 unnecessarily complex.

>>
>>> Right now the compression algorithm is linked statically.
>>> Negotiation of the compression type is currently performed, but it only
>>> checks that the server and client implement the same algorithm and
>>> disables compression if they do not.
>>>
>> I don't think we should automatically fall back to disabled compression
>> when a client specifies a compression algorithm.
>
> Compression is disabled only when the client and server were configured
> with different compression algorithms (e.g. zstd and zlib).
>

Yes, and I'm of the opinion we shouldn't do that, unless both sides
explicitly enable that in some way.

>>
>>> If we are going to support multiple compression algorithms, do we need
>>> dynamic loading of the corresponding compression libraries, or is static
>>> linking ok? In the case of dynamic linking we need to somehow specify
>>> information about the available compression algorithms.
>>> Some special subdirectory for them, so that I can traverse this directory
>>> and try to load the corresponding libraries?
>>>
>>> Am I the only one who finds this too complicated for the addressed problem?
>>>
>> I don't think we need dynamic algorithms in v1, but IMHO it'd be pretty
>> simple to do - just add a shared_preload_libraries entry which registers
>> it in a list in memory.
>
> I do not think that it is necessary to include such libraries in the
> shared_preload_libraries list.
> It can be done lazily, only if compression is requested by the client.
> Also please notice that we need to load the compression library on both
> the server and client sides.
> shared_preload_libraries works only for the postmaster.
>

How would you know which libraries to load for a given compression
algorithm? Surely, loading all available libraries just because they
might happen to implement the requested algorithm seems bad? IMHO
shared_preload_libraries is a much safer (and working) approach.

But I'd just leave this aside, because trying to pack all of this into
v1 just increases the likelihood of it not getting committed in time.
And the fact that we don't have any such infrastructure in the client
just increases the risk.

+1 to go with a hard-coded list of supported algorithms in v1

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
