Re: libpq compression

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Robbie Harwood <rharwood(at)redhat(dot)com>, "Iwata, Aya" <iwata(dot)aya(at)jp(dot)fujitsu(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "g(dot)smolkin(at)postgrespro(dot)ru" <g(dot)smolkin(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, 'Dmitry Dolgov' <9erthalion6(at)gmail(dot)com>
Subject: Re: libpq compression
Date: 2019-02-08 17:18:19
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08.02.2019 19:26, Robbie Harwood wrote:
> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>> On 08.02.2019 10:01, Iwata, Aya wrote:
>>>> I fixed all issues you have reported except using list of supported
>>>> compression algorithms.
>>> Sure. I confirmed that.
>>>> It will require extra round of communication between client and
>>>> server to make a decision about used compression algorithm.
>>> In beginning of this thread, Robbie Harwood said that no extra
>>> communication needed. I think so, too.
>> Well, I think that this problem is more complex and requires more
>> discussion.
>> There are three places determining choice of compression algorithm:
>> 1. Specification of compression algorithm by client. Right now it is
>> just boolean "compression" parameter in connection string,
>> but it is obviously possible to specify concrete algorithm here.
>> 2. List of compression algorithms supported by client.
>> 3. List of compression algorithms supported by server.
>> Concerning first option I have very serious doubt that it is good idea
>> to let client choose compression protocol.
>> Without extra round-trip it can be only implemented in this way:
>> if client toggles compression option in connection string, then libpq
>> includes in startup packet list of supported compression algorithms.
>> Then server intersects this list with its own set of supported
>> compression algorithms and if result is not empty, then
>> somehow choose  one of the commonly supported algorithms and sends it to
>> the client with 'z' command.
> The easiest way, which I laid out earlier in
> id:jlgfu1gqjbk(dot)fsf(at)redhat(dot)com, is to have the server perform selection.
> The client sends a list of supported algorithms in startup. Startup has
> a reply, so if the server wishes to use compression, then its startup
> reply contains which algorithm to use. Compression then begins after
> startup.
> If you really wanted to compress half the startup for some reason, you
> can even have the server send a packet which consists of the choice of
> compression algorithm and everything else in it subsequently
> compressed. I don't see this being useful. However, you can use a
> similar approach to let the client choose the algorithm if there were
> some compelling reason for that (there is none I'm aware of to prefer
> one to the other) - startup from client requests compression, reply from
> server lists supported algorithms, next packet from client indicates
> which one is in use along with compressed payload.
I already replied you that that next package cannot indicate which
algorithm the client has choosen.
Using magics or some other tricks are not reliable and acceptable solution/
It can be done only by introducing extra message type.

Actually it is not really needed, because if client sends to the server
list of supported algorithms in startup packet,
then server can made a decision and inform client about it (using
special message type as it is done now).
Then client doesn't need to make a choice. I can do it if everybody
think that choice of compression algorithm is so important feature.
Just wonder: Postgres now is living good with hardcoded zlib and
built-in LZ compression algorithm implementation.

> It may help to keep in mind that you are defining your own message type
> here.
>> Frankly speaking, I do not think that such flexibility in choosing
>> compression algorithms is really needed.
>> I do not expect that there will be many situations where old client has
>> to communicate with new server or visa versa.
>> In most cases both client and server belongs to the same postgres
>> distributive and so implements the same compression algorithm.
>> As far as we are compressing only temporary data (traffic), the problem
>> of providing backward compatibility seems to be not so important.
> Your comments have been heard, but this is the model that numerous folks
> from project has told you we have. Your code will not pass review
> without algorithm agility.
>>> src/backend/libpq/pqcomm.c :
>>> In current Postgres source code, pq_recvbuf() calls secure_read()
>>> and pq_getbyte_if_available() also calls secure_read().
>>> It means these functions are on the same level.
>>> However in your change, pq_getbyte_if_available() calls pq_recvbuf(),
>>> and pq_recvbuf() calls secure_read(). The level of these functions is different.
>>> I think the purpose of pq_getbyte_if_available() is to get a
>>> character if it exists and the purpose of pq_recvbuf() is to acquire
>>> data up to the expected length. In your change,
>>> pq_getbyte_if_available() may have to do unnecessary process waiting
>>> or something.
>> Sorry, but this change is essential. We can have some available data
>> in compression buffer and we need to try to fetch it in
>> pq_getbyte_if_available() instead of just returning EOF.
> Aya is correct about the purposes of these functions. Take a look at
> how the buffering in TLS or GSSAPI works for an example of how to do
> this correctly.
> As with agility, this is what multiple folks from the project have told
> you is a hard requirement. None of us will be okaying your code without
> proper transport layering.
Guys, I wondering which layering violation you are talking about?
Right now there are two cut&pasted peace of almost the same code in

static int
    /* Ensure that we're in blocking mode */

    /* Can fill buffer from PqRecvLength and upwards */
    for (;;)
        int            r;

        r = secure_read(MyProcPort, PqRecvBuffer + PqRecvLength,
                        PQ_RECV_BUFFER_SIZE - PqRecvLength);

        if (r < 0)
            if (errno == EINTR)
                continue;        /* Ok if interrupted */

             * Careful: an ereport() that tries to write to the client
             * cause recursion to here, leading to stack overflow and core
             * dump!  This message must go *only* to the postmaster log.
                     errmsg("could not receive data from client: %m")));
            return EOF;
        if (r == 0)
             * EOF detected.  We used to write a log message here, but it's
             * better to expect the ultimate caller to do that.
            return EOF;
        /* r contains number of bytes read, so just incr length */
        PqRecvLength += r;
        return 0;

pq_getbyte_if_available(unsigned char *c)
    /* Put the socket into non-blocking mode */

    r = secure_read(MyProcPort, c, 1);
    if (r < 0)
         * Ok if no data available without blocking or interrupted (though
         * EINTR really shouldn't happen with a non-blocking socket).
         * other errors.
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
            r = 0;
             * Careful: an ereport() that tries to write to the client
             * cause recursion to here, leading to stack overflow and core
             * dump!  This message must go *only* to the postmaster log.
                     errmsg("could not receive data from client: %m")));
            r = EOF;
    else if (r == 0)
        /* EOF detected */
        r = EOF;

    return r;

The only difference between them is that in first case we are using
block mode and in the second case non-blocking mode.
Also there is loop in first case handling ENITR.

I have added nowait parameter to  pq_recvbuf(bool nowait)
and remove second fragment of code so it is changed just to one line:

if (PqRecvPointer < PqRecvLength || (r = pq_recvbuf(true)) > 0)

Both functions (pq_recvbuf and pq_getbyte_if_available) are defined in
the same file.
So there is not any layering violation here. It is just elimination of
duplicated code.

If the only objection is that pq_getbyte_if_available is calling
pq_recvbuf, then I can add some other function compress_read (as analog
of secure read) and call it from both functions. But frankly speaking I
do not see any advantages of such approach.
It just introduce extra function call and gives no extra
encapsulation.modularity, flexibility or whatever else.

Konstantin Knizhnik
Postgres Professional:
The Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2019-02-08 17:29:17 Re: pgsql: Remove references to Majordomo
Previous Message s.cherkashin 2019-02-08 17:16:15 Re: Add client connection check during the execution of the query