Re: libpq compression

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Robbie Harwood <rharwood(at)redhat(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq compression
Date: 2018-06-22 07:11:58
Message-ID: 5b673a96-b16e-9a79-c0ec-f7f03f207c62@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21.06.2018 20:14, Robbie Harwood wrote:
> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>
>> On 21.06.2018 17:56, Robbie Harwood wrote:
>>> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>>>> On 20.06.2018 23:34, Robbie Harwood wrote:
>>>>> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>>>>>
>>>>> Well, that's a design decision you've made. You could put lengths
>>>>> on chunks that are sent out - then you'd know exactly how much is
>>>>> needed. (For instance, 4 bytes of network-order length followed by
>>>>> a complete payload.) Then you'd absolutely know whether you have
>>>>> enough to decompress or not.
>>>> Do you really suggest to send extra header for each chunk of data?
>>>> Please notice that chunk can be as small as one message: dozen of
>>>> bytes because libpq is used for client-server communication with
>>>> request-reply pattern.
>>> I want you to think critically about your design. I *really* don't
>>> want to design it for you - I have enough stuff to be doing. But
>>> again, the design I gave you doesn't necessarily need that - you just
>>> need to properly buffer incomplete data.
>> Right now secure_read may return any number of available bytes. But in
>> case of using streaming compression, it can happen that available
>> number of bytes is not enough to perform decompression. This is why we
>> may need to try to fetch additional portion of data. This is how
>> zpq_stream is working now.
> No, you need to buffer and wait until you're called again. Which is to
> say: decompress() shouldn't call secure_read(). secure_read() should
> call decompress().
>

I this case I will have to implement this code twice: both for backend
and frontend, i.e. for secure_read/secure_write and
pqsecure_read/pqsecure_write.
Frankly speaking i was very upset by design of libpq communication layer
in Postgres: there are two different implementations of almost the same
stuff for cbackend and frontend.
I do not see any meaningful argument for it except "historical reasons".
The better decision was to encapsulate socket communication layer (and
some other system dependent stuff) in SAL (system abstraction layer) and
use it both in backend and frontend.

By passing secure_read/pqsecure_read functions to zpq_stream I managed
to avoid such code duplication at least for compression.

>> I do not understand how it is possible to implement in different way
>> and what is wrong with current implementation.
> The most succinct thing I can say is: absolutely don't modify
> pq_recvbuf(). I gave you pseudocode for how to do that. All of your
> logic should be *below* the secure_read/secure_write functions.
>
> I cannot approve code that modifies pq_recvbuf() in the manner you
> currently do.

Well. I understand you arguments.
But please also consider my argument above (about avoiding code
duplication).

In any case, secure_read function is called only from pq_recvbuf() as
well as pqsecure_read is called only from pqReadData.
So I do not think principle difference in handling compression in
secure_read or pq_recvbuf functions and do not understand why it is
"destroying the existing model".

Frankly speaking, I will really like to destroy existed model, moving
all system dependent stuff in Postgres to SAL and avoid this awful mix
of code
sharing and duplication between backend and frontend. But it is a
another story and I do not want to discuss it here.

If we are speaking about the "right design", then neither your
suggestion, neither my implementation are good and I do not see
principle differences between them.
The right approach is using "decorator pattern": this is how streams are
designed in .Net/Java. You can construct pipe of "secure", "compressed"
and whatever else streams.
Yes, it is first of all pattern for object-oriented approach and
Postgres is implemented in C. But it is actually possible to use OO
approach in pure C (X-Windows!).
But once again, this discussion may lead other too far away from the
topic of libpq compression.

As far as I already wrote, the main  points of my design were:
1. Minimize changes in Postgres code
2. Avoid code duplication
3. Provide abstract (compression stream) which can be used somewhere
else except libpq itself.

>
>>>>> My idea was the following: client want to use compression. But
>>>>> server may reject this attempt (for any reasons: it doesn't support
>>>>> it, has no proper compression library, do not want to spend CPU for
>>>>> decompression,...) Right now compression algorithm is
>>>>> hardcoded. But in future client and server may negotiate to choose
>>>>> proper compression protocol. This is why I prefer to perform
>>>>> negotiation between client and server to enable compression. Well,
>>>>> for negotiation you could put the name of the algorithm you want in
>>>>> the startup. It doesn't have to be a boolean for compression, and
>>>>> then you don't need an additional round-trip.
>>>> Sorry, I can only repeat arguments I already mentioned:
>>>> - in future it may be possible to specify compression algorithm
>>>> - even with boolean compression option server may have some reasons to
>>>> reject client's request to use compression
>>>>
>>>> Extra flexibility is always good thing if it doesn't cost too
>>>> much. And extra round of negotiation in case of enabling compression
>>>> seems to me not to be a high price for it.
>>> You already have this flexibility even without negotiation. I don't
>>> want you to lose your flexibility. Protocol looks like this:
>>>
>>> - Client sends connection option "compression" with list of
>>> algorithms it wants to use (comma-separated, or something).
>>>
>>> - First packet that the server can compress one of those algorithms
>>> (or none, if it doesn't want to turn on compression).
>>>
>>> No additional round-trips needed.
>> This is exactly how it works now... Client includes compression
>> option in connection string and server replies with special message
>> ('Z') if it accepts request to compress traffic between this client
>> and server.
> No, it's not. You don't need this message. If the server receives a
> compression request, it should just turn compression on (or not), and
> then have the client figure out whether it got compression back.
How it will managed to do it. It receives some reply and first of all it
should know whether it has to be decompressed or not.

> This
> is of course made harder by your refusal to use packet framing, but
> still shouldn't be particularly difficult.
But how?

>
> Thanks,
> --Robbie

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2018-06-22 07:16:51 Re: Push down Aggregates below joins
Previous Message Craig Ringer 2018-06-22 07:07:28 Re: PATCH: backtraces for error messages