Re: libpq compression

From: Robbie Harwood <rharwood(at)redhat(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, "Iwata\, Aya" <iwata(dot)aya(at)jp(dot)fujitsu(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "g(dot)smolkin\(at)postgrespro(dot)ru" <g(dot)smolkin(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, 'Dmitry Dolgov' <9erthalion6(at)gmail(dot)com>
Subject: Re: libpq compression
Date: 2019-02-08 16:26:44
Message-ID: jlg7eeap7p7.fsf@redhat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:

> On 08.02.2019 10:01, Iwata, Aya wrote:
>
>>> I fixed all issues you have reported except using list of supported
>>> compression algorithms.
>>
>> Sure. I confirmed that.
>>
>>> It will require extra round of communication between client and
>>> server to make a decision about used compression algorithm.
>>
>> In beginning of this thread, Robbie Harwood said that no extra
>> communication needed. I think so, too.
>
> Well, I think that this problem is more complex and requires more
> discussion.
> There are three places determining choice of compression algorithm:
> 1. Specification of compression algorithm by client. Right now it is
> just boolean "compression" parameter in connection string,
> but it is obviously possible to specify concrete algorithm here.
> 2. List of compression algorithms supported by client.
> 3. List of compression algorithms supported by server.
>
> Concerning first option I have very serious doubt that it is good idea
> to let client choose compression protocol.
> Without extra round-trip it can be only implemented in this way:
> if client toggles compression option in connection string, then libpq
> includes in startup packet list of supported compression algorithms.
> Then server intersects this list with its own set of supported
> compression algorithms and if result is not empty, then
> somehow choose  one of the commonly supported algorithms and sends it to
> the client with 'z' command.

The easiest way, which I laid out earlier in
id:jlgfu1gqjbk(dot)fsf(at)redhat(dot)com, is to have the server perform selection.
The client sends a list of supported algorithms in startup. Startup has
a reply, so if the server wishes to use compression, then its startup
reply contains which algorithm to use. Compression then begins after
startup.

If you really wanted to compress half the startup for some reason, you
can even have the server send a packet which consists of the choice of
compression algorithm and everything else in it subsequently
compressed. I don't see this being useful. However, you can use a
similar approach to let the client choose the algorithm if there were
some compelling reason for that (there is none I'm aware of to prefer
one to the other) - startup from client requests compression, reply from
server lists supported algorithms, next packet from client indicates
which one is in use along with compressed payload.

It may help to keep in mind that you are defining your own message type
here.

> Frankly speaking, I do not think that such flexibility in choosing
> compression algorithms is really needed.
> I do not expect that there will be many situations where old client has
> to communicate with new server or visa versa.
> In most cases both client and server belongs to the same postgres
> distributive and so implements the same compression algorithm.
> As far as we are compressing only temporary data (traffic), the problem
> of providing backward compatibility seems to be not so important.

Your comments have been heard, but this is the model that numerous folks
from project has told you we have. Your code will not pass review
without algorithm agility.

>> src/backend/libpq/pqcomm.c :
>> In current Postgres source code, pq_recvbuf() calls secure_read()
>> and pq_getbyte_if_available() also calls secure_read().
>> It means these functions are on the same level.
>> However in your change, pq_getbyte_if_available() calls pq_recvbuf(),
>> and pq_recvbuf() calls secure_read(). The level of these functions is different.
>>
>> I think the purpose of pq_getbyte_if_available() is to get a
>> character if it exists and the purpose of pq_recvbuf() is to acquire
>> data up to the expected length. In your change,
>> pq_getbyte_if_available() may have to do unnecessary process waiting
>> or something.
>
> Sorry, but this change is essential. We can have some available data
> in compression buffer and we need to try to fetch it in
> pq_getbyte_if_available() instead of just returning EOF.

Aya is correct about the purposes of these functions. Take a look at
how the buffering in TLS or GSSAPI works for an example of how to do
this correctly.

As with agility, this is what multiple folks from the project have told
you is a hard requirement. None of us will be okaying your code without
proper transport layering.

Thanks,
--Robbie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2019-02-08 16:27:14 Re: ON SELECT rule on a table without columns
Previous Message Tom Lane 2019-02-08 16:15:39 Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)