Re: libpq compression

From: Robbie Harwood <rharwood(at)redhat(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq compression
Date: 2018-06-21 14:56:47
Message-ID: jlgfu1gqjbk.fsf@redhat.com
Lists: pgsql-hackers

Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:

> On 20.06.2018 23:34, Robbie Harwood wrote:
>> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>>
>>
>>> My idea was the following: the client wants to use compression, but
>>> the server may reject this attempt (for any reason: it doesn't
>>> support it, has no proper compression library, does not want to
>>> spend CPU on decompression, ...). Right now the compression
>>> algorithm is hardcoded, but in the future client and server may
>>> negotiate to choose a proper compression protocol. This is why I
>>> prefer to perform negotiation between client and server to enable
>>> compression.
>>
>> Well, for negotiation you could put the name of the algorithm you want
>> in the startup. It doesn't have to be a boolean for compression, and
>> then you don't need an additional round-trip.
>
> Sorry, I can only repeat the arguments I already mentioned:
> - in the future it may be possible to specify the compression algorithm
> - even with a boolean compression option, the server may have some
> reasons to reject the client's request to use compression
>
> Extra flexibility is always a good thing if it doesn't cost too much,
> and an extra round of negotiation when enabling compression seems to me
> not too high a price for it.

You already have this flexibility even without negotiation. I don't
want you to lose your flexibility. Protocol looks like this:

- Client sends connection option "compression" with the list of
  algorithms it wants to use (comma-separated, or something).

- The first packet the server sends back is compressed with one of
  those algorithms (or not compressed, if it doesn't want to turn on
  compression).

No additional round-trips needed.
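To make the idea concrete, here is a minimal sketch of the server-side
pick. All names here are hypothetical illustrations, not the patch's
actual API: the server scans the client's comma-separated list and
takes the first algorithm it also supports, or NULL (no compression)
if there is no overlap.

```c
#include <string.h>

/* Hypothetical helper: pick the first algorithm from the client's
 * comma-separated "compression" startup option that the server also
 * supports.  Returns NULL if there is no overlap, in which case the
 * server simply replies uncompressed. */
static const char *
choose_compression(const char *client_list,
                   const char *const *server_algs, int n_server)
{
    static char buf[64];
    const char *p = client_list;

    while (*p)
    {
        const char *comma = strchr(p, ',');
        size_t      len = comma ? (size_t) (comma - p) : strlen(p);

        if (len < sizeof(buf))
        {
            int i;

            memcpy(buf, p, len);
            buf[len] = '\0';
            for (i = 0; i < n_server; i++)
                if (strcmp(buf, server_algs[i]) == 0)
                    return server_algs[i];
        }
        p = comma ? comma + 1 : p + len;
    }
    return NULL;
}
```

The client learns the outcome from the first server packet itself, so
no extra message exchange is introduced.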

>> Well, that's a design decision you've made. You could put lengths on
>> chunks that are sent out - then you'd know exactly how much is needed.
>> (For instance, 4 bytes of network-order length followed by a complete
>> payload.) Then you'd absolutely know whether you have enough to
>> decompress or not.
>
> Do you really suggest sending an extra header for each chunk of data?
> Please notice that a chunk can be as small as one message: a dozen
> bytes, because libpq is used for client-server communication with a
> request-reply pattern.

I want you to think critically about your design. I *really* don't want
to design it for you - I have enough stuff to be doing. But again, the
design I gave you doesn't necessarily need that - you just need to
properly buffer incomplete data.
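For reference, the length-prefixed framing mentioned above could be
sketched like this (hypothetical helper names; a 4-byte network-order
length followed by a complete compressed payload, so the receiver
always knows whether it has a full chunk to decompress):

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Prepend a 4-byte network-order length header to a payload.
 * Returns the total number of framed bytes written to dst. */
static size_t
frame_chunk(char *dst, const char *payload, uint32_t len)
{
    uint32_t    nlen = htonl(len);

    memcpy(dst, &nlen, 4);          /* header: payload length */
    memcpy(dst + 4, payload, len);  /* then the payload itself */
    return 4 + len;
}

/* Returns the payload length if the buffer holds a complete chunk,
 * or -1 if more bytes must be read before decompressing. */
static int32_t
chunk_ready(const char *buf, size_t buffered)
{
    uint32_t    nlen;

    if (buffered < 4)
        return -1;
    memcpy(&nlen, buf, 4);
    if (buffered < 4 + ntohl(nlen))
        return -1;
    return (int32_t) ntohl(nlen);
}
```

With framing like this, a partial read simply leaves `chunk_ready`
returning -1 until the rest of the chunk arrives; no spinning on the
socket is needed.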

> Frankly speaking, I do not completely understand the source of your
> concern. My primary idea was to preserve the behavior of the libpq
> functions as much as possible, so there is no need to rewrite all the
> places in the Postgres code where they are used. It seems to me that I
> succeeded in reaching this goal. In case of enabled compression, the
> zpq_stream functions (zpq_read/write) are used instead of
> (pq)secure_read/write and in turn use them to fetch/send more data. I
> do not see any bad flaws, encapsulation violations or other problems
> in such a solution.
>
> So before discussing some alternative ways of embedding compression in
> libpq, I want to understand what's wrong with this approach.

You're destroying the existing model for no reason. If you needed to, I
could understand some argument for the way you've done it, but what I've
tried to tell you is that you don't need to do so. It's longer this
way, and it *significantly* complicates the (already difficult to reason
about) connection state machine.

I get that rewriting code can be obnoxious, and it feels like a waste of
time when we have to do so. (I've been there; I'm on version 19 of my
postgres patchset.)

>>> So the loop should be something like this:
>>>
>>> decompress(ptr, n)
>>>     ZSTD_inBuffer in = {0}
>>>     ZSTD_outBuffer out = {0}
>>>
>>>     in.src = ptr
>>>     in.size = n
>>>
>>>     while true
>>>         ret = ZSTD_decompressStream(out, in)
>>>         if ZSTD_isError(ret):
>>>             give_up()
>>>         if out.pos != 0
>>>             // if we decompressed something
>>>             return out.pos
>>>         read_input(in)
>>
>> The last line is what I'm taking issue with. The interface we have
>> already in postgres's network code has a notion of partial reads, or
>> that reads might not return data. (Same with writing and partial
>> writes.) So you'd buffer what compressed data you have and return -
>> this is the preferred idiom so that we don't block or spin on a
>> nonblocking socket.
>
> If the socket is in non-blocking mode and no data is available, then
> the secure_read function will also immediately return 0.
> The pseudocode above is not quite correct. Let me show the real
> implementation of zpq_read:
>
> ssize_t
> zpq_read(ZpqStream *zs, void *buf, size_t size, size_t *processed)
> {
>     ssize_t rc;
>     ZSTD_outBuffer out;
>     out.dst = buf;
>     out.pos = 0;
>     out.size = size;
>
>     while (1)
>     {
>         rc = ZSTD_decompressStream(zs->rx_stream, &out, &zs->rx);
>         if (ZSTD_isError(rc))
>         {
>             zs->rx_error = ZSTD_getErrorName(rc);
>             return ZPQ_DECOMPRESS_ERROR;
>         }
>         /* Return the result if we filled the requested number of
>          * bytes or a read operation was performed */
>         if (out.pos != 0)
>         {
>             zs->rx_total_raw += out.pos;
>             return out.pos;
>         }
>         if (zs->rx.pos == zs->rx.size)
>         {
>             zs->rx.pos = zs->rx.size = 0; /* Reset rx buffer */
>         }
>         rc = zs->rx_func(zs->arg, (char *) zs->rx.src + zs->rx.size,
>                          ZPQ_BUFFER_SIZE - zs->rx.size);
>         if (rc > 0) /* read fetches some data */
>         {
>             zs->rx.size += rc;
>             zs->rx_total += rc;
>         }
>         else /* read failed */
>         {
>             *processed = out.pos;
>             zs->rx_total_raw += out.pos;
>             return rc;
>         }
>     }
> }
>
> Sorry, but I have spent quite enough time trying to make zpq_read/write
> behave the same as secure_read/write in both blocking and non-blocking
> mode, and I hope that this behavior is now preserved. Frankly speaking,
> I do not see much difference between this approach and the way TLS is
> supported.
>
> The current implementation allows combining compression with TLS, and
> in some cases that may be really useful.

The bottom line, though, is that I cannot recommend the code to a
committer as long as you have it plumbed like this. -1.
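For what it's worth, the buffer-and-return idiom under discussion can
be sketched as follows. All names are hypothetical, and a trivial
identity copy stands in for the zstd decompressor; the point is only
the control flow: try one read, consume what is already buffered, and
return immediately instead of looping on a non-blocking socket.

```c
#include <stddef.h>
#include <string.h>

typedef struct
{
    char        buf[256];       /* compressed bytes received so far */
    size_t      buffered;
} Stream;

/* rx_func convention: returns >0 bytes read, 0 when the socket
 * would block. */
typedef int (*rx_fn) (void *arg, char *dst, size_t size);

/* Try one read, "decompress" (here: just copy) whatever is buffered,
 * and return at once; 0 means "no data yet, poll and retry". */
static int
stream_read(Stream *zs, rx_fn rx, void *arg, char *out, size_t size)
{
    int         rc;
    size_t      n;

    rc = rx(arg, zs->buf + zs->buffered, sizeof(zs->buf) - zs->buffered);
    if (rc > 0)
        zs->buffered += rc;
    if (zs->buffered == 0)
        return 0;               /* would block: do NOT loop on rx here */

    n = zs->buffered < size ? zs->buffered : size;
    memcpy(out, zs->buf, n);    /* identity stand-in for decompression */
    memmove(zs->buf, zs->buf + n, zs->buffered - n);
    zs->buffered -= n;
    return (int) n;
}

/* Stand-in non-blocking source: blocks once, then yields three bytes. */
static int  demo_calls;
static int
demo_rx(void *arg, char *dst, size_t size)
{
    (void) arg;
    (void) size;
    if (demo_calls++ == 0)
        return 0;
    memcpy(dst, "abc", 3);
    return 3;
}
```

The caller treats a 0 return exactly like a would-block from
secure_read: it goes back to poll/select rather than spinning inside
the decompression layer.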

Thanks,
--Robbie
