Re: libpq compression

From: Robbie Harwood <rharwood(at)redhat(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq compression
Date: 2018-06-19 21:04:59
Message-ID: jlgzhzqbi84.fsf@redhat.com
Lists: pgsql-hackers

Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:

> On 18.06.2018 23:34, Robbie Harwood wrote:
>
>> I also don't like that you've injected into the *startup* path -
>> before authentication takes place. Fundamentally, authentication (if
>> it happens) consists of exchanging some combination of short strings
>> (e.g., usernames) and binary blobs (e.g., keys). None of this will
>> compress well, so I don't see it as worth performing this negotiation
>> there - it can wait. It's also another message in every startup.
>> I'd leave it to connection parameters, personally, but up to you.
>
> From my point of view, compression of libpq traffic is similar to SSL
> and should be toggled at the same stage.

But that's not what you're doing. This isn't where TLS gets toggled.

TLS negotiation happens as the very first packet: after completing the
TCP handshake, the client will send a TLS negotiation request. If it
doesn't happen there, it doesn't happen at all.

(You *could* configure it where TLS is toggled. This is, I think, not a
good idea. TLS encryption is a probe: the server can reject it, at
which point the client tears everything down and connects without TLS.
So if you do the same with compression, that's another point of tearing
down and starting over. The scaling on it isn't good either: if we add
another encryption method into the mix, you've doubled the number of
teardowns.)

> Definitely authentication parameters are not so large to be efficiently
> compressed, but compression (maybe in the future password protected) can
> somehow protect this data.
> In any case I do not think that compression of authentication data may
> have any influence on negotiation speed.
> So I am not 100% sure that toggling compression just after receiving the
> startup packet is the only right solution.
> But I am also not convinced that there is some better place where the
> compressor should be configured.
> Do you have some concrete suggestions for it? In InitPostgres just after
> PerformAuthentication?

Hmm, let me try to explain this differently.

pq_configure() (as you've called it) shouldn't send a packet. At its
callsite, we *already know* whether we want to use compression - that's
what the port->use_compression option says. So there's no point in
having a negotiation there - it's already happened.

The other thing you do in pq_configure() is call zpq_create(), which
does a bunch of initialization for you. I am pretty sure that all of
this can be deferred until the first time you want to send a compressed
message - i.e., when compress()/decompress() is called for the first
time from *secure_read() or *secure_write().

> Also please notice that compression is useful not only for client-server
> communication, but also for replication channels.
> Right now it is definitely used in both cases, but if we move
> pq_configure somewhere else, we should check that this code is invoked
> both for normal backends and for walsender.

"We" meaning you, at the moment, since I don't think any of the rest of
us have set up tests with this code :)

If there's common code to be shared around, that's great. But it's not
imperative; in a lot of ways, the network stacks are very different from
each other, as I'm sure you've seen. Let's not have the desire for code
reuse get in the way of good, maintainable design.

>> Using terminology from https://facebook.github.io/zstd/zstd_manual.html :
>>
>> Right now you use streaming (ZSTD_{compress,decompress}Stream()) as the
>> basis for your API. I think this is probably a mismatch for what we're
>> doing here - we're doing one-shot compression/decompression of packets,
>> not sending video or something.
>>
>> I think our use case is better served by the non-streaming interface, or
>> what they call the "Simple API" (ZSTD_{decompress,compress}()).
>> Documentation suggests it may be worth it to keep an explicit context
>> around and use that interface instead (i.e.,
>> ZSTD_{compressCCtx,decompressDCtx}()), but that's something you'll have
>> to figure out.
>>
>> You may find making this change helpful for addressing the next issue.
>
> Sorry, but here I completely disagree with you.
> What we are doing is really streaming compression, not compression of
> individual messages/packets.
> Yes, it is not a video, but actually COPY data has the same nature as
> video data.
> The main benefit of streaming compression is that we can use the same
> dictionary for compressing all messages (and adjust this dictionary
> based on new data).
> We do not need to write dictionary and separate header for each record.
> Otherwise compression of libpq messages will be completely useless: the
> typical size of a message is too short to be efficiently compressed. The
> main drawback of streaming compression is that you cannot decompress
> some particular message without decompressing all previous messages.
> This is why streaming compression cannot be used to compress database
> pages (as it is done in CFS, provided in PostgresPro EE). But for libpq
> it is not needed.

That makes sense, thanks. The zstd documentation doesn't articulate
that at all.

>> I don't like where you've put the entry points to the compression logic:
>> it's a layering violation. A couple people have expressed similar
>> reservations I think, so let me see if I can explain using
>> `pqsecure_read()` as an example. In pseudocode, `pqsecure_read()` looks
>> like this:
>>
>> if conn->is_tls:
>>     n = tls_read(conn, ptr, len)
>> else:
>>     n = pqsecure_raw_read(conn, ptr, len)
>> return n
>>
>> I want to see this extended by your code to something like:
>>
>> if conn->is_tls:
>>     n = tls_read(conn, ptr, len)
>> else:
>>     n = pqsecure_raw_read(conn, ptr, len)
>>
>> if conn->is_compressed:
>>     n = decompress(ptr, n)
>>
>> return n
>>
>> In conjunction with the above change, this should also significantly
>> reduce the size of the patch (I think).
>
> Yes, it will simplify the patch. But it will make libpq compression
> completely useless (see my explanation above).
> We need to use streaming compression, and to be able to use streaming
> compression I have to pass function for fetching more data to
> compression library.

I don't think you need that, even with the streaming API.

To make this very concrete, let's talk about ZSTD_decompressStream (I'm
pulling information from
https://facebook.github.io/zstd/zstd_manual.html#Chapter7 ).

Using the pseudocode I'm asking for above, the decompress() function
would look vaguely like this:

decompress(ptr, n)
    ZSTD_inBuffer in = {0}
    ZSTD_outBuffer out = {0}

    in.src = ptr
    in.size = n
    out.dst = buf        # some sufficiently large scratch buffer
    out.size = buf_size

    while in.pos < in.size:
        # zds is a long-lived ZSTD_DStream kept on the connection
        ret = ZSTD_decompressStream(zds, out, in)
        if ZSTD_isError(ret):
            give_up()

    memcpy(ptr, out.dst, out.pos)
    return out.pos

(and compress() would follow a similar pattern, if we were to talk about
it).

Thanks,
--Robbie
