Re: libpq compression

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Robbie Harwood <rharwood(at)redhat(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq compression
Date: 2019-02-15 15:40:37
Message-ID: d58ebe47-5662-a50f-117d-4c1049acbfef@postgrespro.ru

On 15.02.2019 18:26, Tomas Vondra wrote:
>
>
> On 2/15/19 3:03 PM, Konstantin Knizhnik wrote:
>>
>> On 15.02.2019 15:42, Peter Eisentraut wrote:
>>> On 2018-06-19 09:54, Konstantin Knizhnik wrote:
>>>> The main drawback of streaming compression is that you cannot
>>>> decompress a particular message without decompressing all of the
>>>> previous messages.
>>> It seems this would have an adverse effect on protocol-aware connection
>>> proxies: They would have to uncompress everything coming in and
>>> recompress everything going out.
>>>
>>> The alternative of compressing each packet individually would work much
>>> better: A connection proxy could peek into the packet header and only
>>> uncompress the (few, small) packets that it needs for state and routing.
>>>
>> Individual compression of each message would defeat the whole idea of
>> libpq compression: messages are too small to be compressed efficiently
>> one by one, so using a streaming compression algorithm is absolutely
>> necessary here.
>>
> Hmmm, I see Peter was talking about "packets" while you're talking about
> "messages". Are you talking about the same thing?
Sorry, but there are no "packets" in the libpq protocol, so I assumed
that packet = message.
In any case, a protocol-aware proxy has to process each message.
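
For reference, the framing such a proxy deals with is just a one-byte
message type followed by a four-byte big-endian length. A rough sketch
of the header parsing (illustration only; the struct and function names
are invented, this is not actual proxy code):

/* v3 protocol framing: one type byte, then a 4-byte big-endian length
 * that counts itself but not the type byte. With streaming compression
 * a proxy cannot read even this header without first decompressing
 * everything that precedes it. */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

typedef struct
{
    char        type;   /* e.g. 'D' = DataRow, 'd' = CopyData */
    uint32_t    len;    /* payload length + 4 (the length field itself) */
} PqMsgHeader;

static PqMsgHeader
parse_header(const unsigned char *buf)
{
    PqMsgHeader h;
    uint32_t    netlen;

    h.type = (char) buf[0];
    memcpy(&netlen, buf + 1, 4);
    h.len = ntohl(netlen);
    return h;
}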

>
> Anyway, I was going to write about the same thing - that per-message
> compression would likely eliminate most of the benefits - but I'm
> wondering if it's actually true. That is, how much will the compression
> ratio drop if we compress individual messages?
Compression of small messages without a shared dictionary gives awful
results.
Assume that the average record, and hence message, size is 100 bytes.
Just perform a very simple experiment: create a file with 100 identical
characters and try to compress it.
With zlib the result is 173 bytes, so after "compression" the size of
the file has increased 1.7 times.
This is why there is no way to compress libpq traffic efficiently other
than with streaming compression
(where the dictionary is shared and updated across all messages).
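
To make this concrete, here is a rough sketch of that comparison using
zlib's deflate API directly (exact byte counts will differ from the
173-byte figure above, which includes file format overhead; the message
contents, size and count here are arbitrary):

/* Per-message deflate (a fresh stream per message) vs. one streaming
 * deflate flushed after every message so the dictionary is shared.
 * Build with: cc demo.c -lz */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define MSG_SIZE 100
#define N_MSGS   1000

static void
fill_row(unsigned char *buf, int i)
{
    /* A fake ~100-byte record: rows share a lot of structure, as real
     * result sets tend to, but each row differs from the previous one. */
    memset(buf, ' ', MSG_SIZE);
    snprintf((char *) buf, MSG_SIZE, "id=%07d name=user%07d", i, i);
}

static size_t
deflate_msg(z_stream *zs, unsigned char *msg, int flush)
{
    unsigned char out[2 * MSG_SIZE];

    zs->next_in = msg;
    zs->avail_in = MSG_SIZE;
    zs->next_out = out;
    zs->avail_out = sizeof(out);
    deflate(zs, flush);
    return sizeof(out) - zs->avail_out;
}

int
main(void)
{
    unsigned char msg[MSG_SIZE];
    size_t      per_msg_total = 0, stream_total = 0;
    z_stream    zs;
    int         i;

    /* Per-message: a brand new deflate stream for every message. */
    for (i = 0; i < N_MSGS; i++)
    {
        fill_row(msg, i);
        memset(&zs, 0, sizeof(zs));
        deflateInit(&zs, Z_DEFAULT_COMPRESSION);
        per_msg_total += deflate_msg(&zs, msg, Z_FINISH);
        deflateEnd(&zs);
    }

    /* Streaming: one deflate stream for the whole connection, flushed
     * after each message so the receiver can decode it immediately. */
    memset(&zs, 0, sizeof(zs));
    deflateInit(&zs, Z_DEFAULT_COMPRESSION);
    for (i = 0; i < N_MSGS; i++)
    {
        fill_row(msg, i);
        stream_total += deflate_msg(&zs, msg, Z_SYNC_FLUSH);
    }
    deflateEnd(&zs);

    printf("raw: %d, per-message: %zu, streaming: %zu bytes\n",
           MSG_SIZE * N_MSGS, per_msg_total, stream_total);
    return 0;
}
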
>
> Obviously, if there are just tiny messages, it might easily eliminate
> any benefits (and in fact it would add overhead). But I'd say we're way
> more interested in transferring large data sets (result sets, data for
> copy, etc.) and presumably those messages are much larger. So maybe we
> could compress just those, somehow?
Please notice that a COPY stream consists of an individual message for
each record.
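
Schematically, each row goes out as its own CopyData ('d') message, so
a per-message compressor sees only about one row's worth of bytes at a
time (a toy sketch; build_copy_data is an invented name):

/* Each row travels as its own CopyData ('d') message: a type byte and
 * a 4-byte big-endian length (counting itself), then the row payload. */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

static size_t
build_copy_data(unsigned char *dst, const char *row, uint32_t rowlen)
{
    uint32_t    netlen = htonl(rowlen + 4);    /* length counts itself */

    dst[0] = 'd';                       /* CopyData message type */
    memcpy(dst + 1, &netlen, 4);
    memcpy(dst + 5, row, rowlen);
    return 5 + rowlen;                  /* type + length + payload */
}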

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
