Re: libpq compression

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>, Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>
Cc: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, pryzby(at)telsasoft(dot)com, x4mmm(at)yandex-team(dot)ru, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq compression
Date: 2021-02-23 16:29:12
Message-ID: a364948a-d93b-fe17-f62d-334f34360daa@postgrespro.ru
Lists: pgsql-hackers

On 22.02.2021 08:38, Craig Ringer wrote:
>
>
> On Thu, 11 Feb 2021, 21:09 Daniil Zakhlystov,
> <usernamedt(at)yandex-team(dot)ru> wrote:
>
>
> 3. Chunked compression allows compressing only well-compressible
> messages and saves CPU cycles by not compressing the others
> 4. Chunked compression introduces some traffic overhead compared
> to the permanent (1.2810G vs 1.2761G TX data on pg_restore of IMDB
> database dump, according to results in my previous message)
> 5. From the protocol point of view, chunked compression seems a
> little bit more flexible:
>  - we can inject some uncompressed messages at any time without
> the need to decompress/compress the compressed data
>  - we can potentially switch the compression algorithm at any time
> (but I think that this might be over-engineering)
>
>
> Chunked compression also potentially makes it easier to handle non
> blocking sockets, because you aren't worrying about yet another layer
> of buffering within the compression stream. This is a real pain with
> SSL, for example.
>
> It simplifies protocol analysis.
>
> It permits compression to be decided on the fly, heuristically, based
> on message size and potential compressibility.
>
> It could relatively easily be extended to compress a group of pending
> small messages, e.g. by PQflush. That'd help mitigate the downsides
> with small messages.
>
> So while stream compression offers better compression ratios, I'm
> inclined to suspect we'll want message level compression.

From my point of view, there are several use cases where protocol
compression can be useful:
1. Replication
2. Backup/dump
3. Bulk load (COPY)
4. Queries returning large objects (json, blobs,...)

All these cases are controlled by the user or DBA, so they can decide
whether to use compression or not.
Switching compression on the fly, or using different algorithms in
different directions, is not needed.
Yes, in all these scenarios data is mostly transferred in one direction,
so compression of small messages going in the opposite direction is not
strictly needed.
But benchmarks show that it has almost no influence on performance and
CPU usage.
So I suggest not complicating the protocol and implementation, and
implementing the functionality that is present in most other DBMSes.

There is no sense in trying compression on workloads like pgbench and
drawing conclusions from the results. From my point of view it is an
obvious misuse.
Compressing each message individually, or chunked compression,
significantly decreases the compression ratio because the typical
message size is not large enough, and resetting the compression state
after processing each message (clearing the compression dictionary)
adds too much overhead.
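
To illustrate the last point, here is a minimal sketch in Python that
uses zlib from the standard library as a stand-in compressor. It is not
the code from the patch: the message contents and the helper names
(make_messages, per_message_size, stream_size) are invented for the
illustration, and the sizes it prints will differ from real protocol
traffic. It compares compressing every small message with a fresh
compressor against one long-lived stream that is flushed after each
message.

```python
import zlib


def make_messages(n=1000):
    # Hypothetical small, similar messages, loosely resembling row data.
    return [
        f"id={i};name=user_{i % 50};status=active;balance={i * 3 % 1000}\n".encode()
        for i in range(n)
    ]


def per_message_size(messages, level=6):
    # A fresh compressor for every message: the history (dictionary) is
    # empty each time, so earlier messages cannot be referenced.
    return sum(len(zlib.compress(m, level)) for m in messages)


def stream_size(messages, level=6):
    # One compressor for the whole session: history is kept between messages.
    comp = zlib.compressobj(level)
    total = 0
    for m in messages:
        total += len(comp.compress(m))
        # Sync flush so the receiver could decode this message immediately.
        total += len(comp.flush(zlib.Z_SYNC_FLUSH))
    total += len(comp.flush())
    return total


messages = make_messages()
raw = sum(len(m) for m in messages)
print("raw bytes:          ", raw)
print("per-message bytes:  ", per_message_size(messages))
print("single stream bytes:", stream_size(messages))
```

With messages this small, the per-message variant has almost nothing to
match against, while the stream variant can refer back to earlier
messages even though it pays for a flush after each one.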
