Re: libpq compression

From: Robbie Harwood <rharwood(at)redhat(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq compression
Date: 2018-06-20 20:34:10
Message-ID: jlgo9g5qjst.fsf@redhat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:

> On 20.06.2018 00:04, Robbie Harwood wrote:
>> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>>> On 18.06.2018 23:34, Robbie Harwood wrote:
>>>
>>>> I also don't like that you've injected into the *startup* path -
>>>> before authentication takes place. Fundamentally, authentication
>>>> (if it happens) consists of exchanging some combination of short
>>>> strings (e.g., usernames) and binary blobs (e.g., keys). None of
>>>> this will compress well, so I don't see it as worth performing this
>>>> negotiation there - it can wait. It's also another message in
>>>> every startup. I'd leave it to connection parameters, personally,
>>>> but up to you.
>>>
>>> From my point of view compression of libpq traffic is similar with
>>> SSL and should be toggled at the same stage.
>>
>> But that's not what you're doing. This isn't where TLS gets toggled.
>>
>> TLS negotiation happens as the very first packet: after completing
>> the TCP handshake, the client will send a TLS negotiation request.
>> If it doesn't happen there, it doesn't happen at all.
>>
>> (You *could* configure it where TLS is toggled. This is, I think,
>> not a good idea. TLS encryption is a probe: the server can reject
>> it, at which point the client tears everything down and connects
>> without TLS. So if you do the same with compression, that's another
>> point of tearing down an starting over. The scaling on it isn't good
>> either: if we add another encryption method into the mix, you've
>> doubled the number of teardowns.)
>
> Yes, you are right. There is special message for enabling TLS
> procotol. But I do not think that the same think is needed for
> compression. This is why I prefer to specify compression in
> connectoin options. So compression may be enabled straight after
> processing of startup package. Frankly speaking I still do no see
> reasons to postpone enabling compression till some later moment.

I'm arguing for connection option only (with no additional negotiation
round-trip). See below.

>>> Definitely authentication parameter are not so large to be
>>> efficiently compressed, by compression (may be in future password
>>> protected) can somehow protect this data. In any case I do not
>>> think that compression of authentication data may have any influence
>>> on negotiation speed. So I am not 100% sure that toggling
>>> compression just after receiving startup package is the only right
>>> solution. But I am not also convinced in that there is some better
>>> place where compressor should be configured. Do you have some
>>> concrete suggestions for it? In InitPostgres just after
>>> PerformAuthentication ?
>>
>> Hmm, let me try to explain this differently.
>>
>> pq_configure() (as you've called it) shouldn't send a packet. At its
>> callsite, we *already know* whether we want to use compression -
>> that's what the port->use_compression option says. So there's no
>> point in having a negotiation there - it's already happened.
>
> My idea was the following: client want to use compression. But server
> may reject this attempt (for any reasons: it doesn't support it, has
> no proper compression library, do not want to spend CPU for
> decompression,...) Right now compression algorithm is hardcoded. But
> in future client and server may negotiate to choose proper compression
> protocol. This is why I prefer to perform negotiation between client
> and server to enable compression.

Well, for negotiation you could put the name of the algorithm you want
in the startup. It doesn't have to be a boolean for compression, and
then you don't need an additional round-trip.

>>>> I don't like where you've put the entry points to the compression
>>>> logic: it's a layering violation. A couple people have expressed
>>>> similar reservations I think, so let me see if I can explain using
>>>> `pqsecure_read()` as an example. In pseudocode, `pqsecure_read()`
>>>> looks like this:
>>>>
>>>> if conn->is_tls:
>>>> n = tls_read(conn, ptr, len)
>>>> else:
>>>> n = pqsecure_raw_read(conn, ptr, len)
>>>> return n
>>>>
>>>> I want to see this extended by your code to something like:
>>>>
>>>> if conn->is_tls:
>>>> n = tls_read(conn, ptr, len)
>>>> else:
>>>> n = pqsecure_raw_read(conn, ptr, len)
>>>>
>>>> if conn->is_compressed:
>>>> n = decompress(ptr, n)
>>>>
>>>> return n
>>>>
>>>> In conjunction with the above change, this should also
>>>> significantly reduce the size of the patch (I think).
>>>
>>> Yes, it will simplify patch. But make libpq compression completely
>>> useless (see my explanation above). We need to use streaming
>>> compression, and to be able to use streaming compression I have to
>>> pass function for fetching more data to compression library.
>>
>> I don't think you need that, even with the streaming API.
>>
>> To make this very concrete, let's talk about ZSTD_decompressStream (I'm
>> pulling information from
>> https://facebook.github.io/zstd/zstd_manual.html#Chapter7 ).
>>
>> Using the pseudocode I'm asking for above, the decompress() function
>> would look vaguely like this:
>>
>> decompress(ptr, n)
>> ZSTD_inBuffer in = {0}
>> ZpSTD_outBuffer out = {0}
>>
>> in.src = ptr
>> in.size = n
>>
>> while in.pos < in.size:
>> ret = ZSTD_decompressStream(out, in)
>> if ZSTD_isError(ret):
>> give_up()
>>
>> memcpy(ptr, out.dst, out.pos)
>> return out.pos
>>
>> (and compress() would follow a similar pattern, if we were to talk
>> about it).
>
> It will not work in this way. We do not know how much input data we
> need to be able to decompress message.

Well, that's a design decision you've made. You could put lengths on
chunks that are sent out - then you'd know exactly how much is needed.
(For instance, 4 bytes of network-order length followed by a complete
payload.) Then you'd absolutely know whether you have enough to
decompress or not.

> So loop should be something like this:
>
> decompress(ptr, n)
> ZSTD_inBuffer in = {0}
> ZSTD_outBuffer out = {0}
>
> in.src = ptr
> in.size = n
>
> while true
> ret = ZSTD_decompressStream(out, in)
> if ZSTD_isError(ret):
> give_up()
> if out.pos != 0
> // if we deomcpress soemthing
> return out.pos;
> read_input(in);

The last line is what I'm taking issue with. The interface we have
already in postgres's network code has a notion of partial reads, or
that reads might not return data. (Same with writing and partial
writes.) So you'd buffer what compressed data you have and return -
this is the preferred idiom so that we don't block or spin on a
nonblocking socket.

This is how the TLS code works already. Look at, for instance,
pgtls_read(). If we get back SSL_ERROR_WANT_READ (i.e., not enough data
to decrypt), we return no data and wait until the socket becomes
readable again.

Thanks,
--Robbie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2018-06-20 20:58:02 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Previous Message Tom Lane 2018-06-20 20:30:42 Re: Avoiding Tablespace path collision for primary and standby