Re: libpq compression

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: "Iwata, Aya" <iwata(dot)aya(at)jp(dot)fujitsu(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "rharwood(at)redhat(dot)com" <rharwood(at)redhat(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "g(dot)smolkin(at)postgrespro(dot)ru" <g(dot)smolkin(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, 'Dmitry Dolgov' <9erthalion6(at)gmail(dot)com>
Subject: Re: libpq compression
Date: 2019-02-10 06:12:13
Message-ID: ce75b684-3fd1-648a-5b86-90b7a203e71c@postgrespro.ru
Lists: pgsql-hackers

On 10.02.2019 3:25, Tomas Vondra wrote:
>
> On 2/9/19 3:02 PM, Konstantin Knizhnik wrote:
>>
>> On 09.02.2019 1:38, Tomas Vondra wrote:
>>> On 2/8/19 11:10 PM, Konstantin Knizhnik wrote:
>>>> On 08.02.2019 21:57, Andres Freund wrote:
>>>>> On 2019-02-08 12:15:58 +0300, Konstantin Knizhnik wrote:
>>>>>> Frankly speaking, I do not think that such flexibility in choosing
>>>>>> compression algorithms is really needed.
>>>>>> I do not expect that there will be many situations where an old
>>>>>> client has to communicate with a new server, or vice versa.
>>>>>> In most cases both client and server belong to the same Postgres
>>>>>> distribution and so implement the same compression algorithm.
>>>>>> Since we are compressing only transient data (traffic), the
>>>>>> problem of providing backward compatibility seems not so
>>>>>> important.
>>>>> I think we should outright reject any patch without compression type
>>>>> negotiation.
>>>> Does it mean that it is necessary to support multiple compression
>>>> algorithms and make it possible to perform switch between them at
>>>> runtime?
>>> IMHO the negotiation should happen at connection time, i.e. the server
>>> should support connections compressed by different algorithms. Not sure
>>> if that's what you mean by runtime.
>>>
>>> AFAICS this is quite close to how negotiation of encryption algorithms
>>> works, in TLS and so on. Client specifies supported algorithms, server
>>> compares that to its own list of supported algorithms, picks the
>>> encryption algorithm and notifies the client.
>>>
>>> To allow fall-back to uncompressed connection, use "none" as algorithm.
>>> If there's no common algorithm, fail.
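(To make this concrete: such connect-time negotiation could be sketched as below. The function name, the hard-coded server list and the signature are purely illustrative assumptions, not the real patch API.)

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of connect-time negotiation: walk the client's
 * offer list in preference order and return the first algorithm the
 * server also supports.  "none" is always accepted and means an
 * uncompressed connection; NULL means no common algorithm, so the
 * connection should fail. */
static const char *server_algorithms[] = {"zstd", "zlib", "none", NULL};

const char *
negotiate_compression(const char **client_algorithms)
{
    for (int i = 0; client_algorithms[i] != NULL; i++)
        for (int j = 0; server_algorithms[j] != NULL; j++)
            if (strcmp(client_algorithms[i], server_algorithms[j]) == 0)
                return client_algorithms[i];
    return NULL;                /* no common algorithm: fail */
}
```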
>> The analogy with SSL is a good one.
>> Yes, the SSL protocol provides several ways of authentication,
>> encryption, ...
>> And there are several different libraries implementing SSL.
>> But Postgres is using only one of them: OpenSSL.
>> If I want to use some other library (for example, to make it possible
>> to serialize and pass SSL session state to another process), then
>> there is no way to achieve it.
>>
> That's rather misleading. Firstly, it's true we only support OpenSSL at
> the moment, but I do remember we've been working on adding support for
> a bunch of other TLS libraries.
>
> But more importantly, it's not the TLS library that's negotiated. It's
> the encryption algorithms that are negotiated. The server is oblivious
> to which TLS library is used by the client (and vice versa), because the
> messages are the same - what matters is that they agree on keys,
> ciphers, etc. And those can differ/change between libraries or even
> versions of the same library.
>
> For us, the situation is the same - we have the messages specified by
> the FE/BE protocol, and it's the algorithms that are negotiated.
>
>> Actually zstd also includes implementations of several compression
>> algorithms and chooses the one best fitting the particular data
>> stream. As in the case of SSL, the choice of algorithm is performed
>> internally inside zstd - not at the libpq level.
>>
> Really? I always thought zstd is a separate compression algorithm.
> There's adaptive compression feature, but AFAIK that essentially tweaks
> compression level based on network connection. Can you point me to the
> sources or docs explaining this?
>
> Anyway, this does not really change anything - it's internal zstd stuff.
>
>> Sorry if my explanation of the static and dynamic (at runtime) choice
>> was not clear.
>> This is how compression is toggled now:
>>
>> #if HAVE_LIBZSTD
>> ZpqStream*
>> zpq_create(zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg)
>> {
>> ...
>> }
>> #endif
>>
>> So if Postgres was configured with zstd, then this implementation is
>> included in the client and server Postgres libraries.
>> If Postgres is configured with zlib, then the zlib implementation will
>> be used.
>> This is similar to how compression and most other configurable
>> features are handled in Postgres.
>>
>> If we want to provide dynamic choice at runtime, then we need to have
>> array with available compression algorithms:
>>
>> #if HAVE_LIBZSTD
>> static ZpqStream*
>> zstd_create(zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg)
>> {
>> ...
>> }
>> #endif
>>
>> ZpqCompressorImpl compressorImpl[] =
>> {
>> #if HAVE_LIBZSTD
>>     {zstd_create, zstd_read, zstd_write, ...},
>> #endif
>> #if HAVE_ZLIB
>>     {zlib_create, zlib_read, zlib_write, ...},
>> #endif
>>     ...
>> };
>>
> Yes, that's mostly what I've been imagining, except that you also need
> some sort of identifier for the algorithm - a cstring at the beginning
> of the struct should be enough, I guess.
>
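For illustration, the struct with a name identifier suggested above might look roughly like this. The field set and the lookup helper are my assumptions, with the full create/read/write/... callback set reduced to a single placeholder for brevity:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: each entry carries a cstring identifier so the
 * negotiated algorithm name can be mapped to an implementation.  The
 * single create callback stands in for the full create/read/write/...
 * set; the NULL callbacks here are placeholders. */
typedef struct ZpqCompressorImpl
{
    const char *name;           /* algorithm id used in negotiation */
    void       *(*create) (void);
} ZpqCompressorImpl;

static ZpqCompressorImpl compressorImpl[] =
{
    {"zstd", NULL},
    {"zlib", NULL},
    {NULL, NULL}                /* terminator */
};

/* Map a negotiated algorithm name to its implementation, if any. */
ZpqCompressorImpl *
zpq_find_compressor(const char *name)
{
    for (int i = 0; compressorImpl[i].name != NULL; i++)
        if (strcmp(compressorImpl[i].name, name) == 0)
            return &compressorImpl[i];
    return NULL;
}
```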
>> And the most interesting case is when we load the library dynamically.
>> Each implementation is built as a separate library (for example
>> libpztd.so).
>> In this case we need to somehow specify the available libraries,
>> for example by placing them in a separate directory, or by specifying
>> a list of libraries in postgresql.conf.
>> Then we try to load the library using dlopen. Such a library has an
>> external dependency on the corresponding compressor library (for
>> example -lz), so it can be successfully loaded only if the
>> corresponding compressor implementation is installed on the system.
>> This is the most flexible approach, allowing custom compressor
>> implementations to be provided.
>> A compression implementation can be organized as a Postgres extension
>> whose _PG_init function registers the implementation in some list.
>>
> How could you make them extensions? Those are database-specific and
> the authentication happens before you have access to the database.
>
> As I said before, I think adding them using shared_preload_libraries and
> registering them in _PG_init should be sufficient.
>
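(As an aside, the shared_preload_libraries route could be sketched like this. The registry function and the entry struct are purely assumed; only _PG_init itself is the real extension hook.)

```c
#include <stddef.h>
#include <string.h>

#define MAX_COMPRESSORS 8

/* Hypothetical registry filled at library-load time. */
typedef struct
{
    const char *name;
    void       *(*create) (void);
} CompressorEntry;

static CompressorEntry registry[MAX_COMPRESSORS];
static int  n_registered = 0;

/* Assumed core API a compression module would call to register itself;
 * not part of the actual patch. */
void
zpq_register_compressor(const char *name, void *(*create) (void))
{
    if (n_registered < MAX_COMPRESSORS)
    {
        registry[n_registered].name = name;
        registry[n_registered].create = create;
        n_registered++;
    }
}

/* What a hypothetical zstd module's _PG_init might do when the library
 * is loaded via shared_preload_libraries. */
void
_PG_init(void)
{
    zpq_register_compressor("zstd", NULL);
}
```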
>> This is what I am asking about.
>> Right now approach 1) is implemented: the compression algorithm is
>> defined by configure.
>> It is not so difficult to extend it to support multiple algorithms.
>> The most flexible, but more sophisticated, option is to load the
>> libraries dynamically.
>>
> Well, there's nothing stopping you from implementing the dynamic
> loading, but IMHO it makes v1 unnecessarily complex.
>
>>>> Right now the compression algorithm is linked statically.
>>>> Negotiation of the compression type is currently performed, but it
>>>> only checks that server and client implement the same algorithm and
>>>> disables compression otherwise.
>>>>
>>> I don't think we should automatically fall back to disabled
>>> compression when a client specifies a compression algorithm.
>> Compression is disabled only when client and server were configured
>> with different compression algorithms (e.g. zstd and zlib).
>>
> Yes, and I'm of the opinion we shouldn't do that, unless both sides
> explicitly enable it in some way.
>
>>>> If we are going to support multiple compression algorithms, do we
>>>> need dynamic loading of the corresponding compression libraries, or
>>>> is static linking ok? In case of dynamic linking we need to somehow
>>>> specify information about the available compression algorithms.
>>>> Some special subdirectory for them, so that I can traverse this
>>>> directory and try to load the corresponding libraries?
>>>>
>>>> Or is it only me who finds this too complicated for the problem
>>>> being addressed?
>>>>
>>>>
>>> I don't think we need dynamic algorithms in v1, but IMHO it'd be
>>> pretty simple to do - just add a shared_preload_libraries entry that
>>> registers the algorithm in an in-memory list.
>> I do not think that it is necessary to include such libraries in the
>> shared_preload_libraries list.
>> It can be done lazily, only if compression is requested by the client.
>> Also please notice that we need to load the compression library on
>> both the server and client sides, and shared_preload_libraries works
>> only for the postmaster.
>>
> How would you know which libraries to load for a given compression
> algorithm? Surely, loading all available libraries just because they
> might happen to implement the requested algorithm seems bad? IMHO
> shared_preload_libraries is a much safer (and working) approach.
>
> But I'd just leave this aside, because trying to pack all of this into
> v1 just increases the likelihood of it not getting committed in time.
> And the fact that we don't have any such infrastructure in the client
> just increases the risk.
>
> +1 to go with hard-coded list of supported algorithms in v1
>
> regards
>
Ok, I will implement support for multiple configured compression algorithms.
Concerning the usage of several different compression algorithms in zstd - I
was not correct.
It combines LZ77 with an entropy encoding stage and can adaptively adjust
the compression ratio according to the load.
