Re: libpq compression

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: libpq compression
Date: 2020-11-24 16:02:02
Message-ID: CA+TgmoZ07Gqcu7i=+F15+syJO5cMrhB=rVeWbyf=CR5BQz_QWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 24, 2020 at 7:33 AM Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> New version of the patch is attached.

I read over the comments from Andres (and Peter) suggesting that this
ought to be on-the-fly configurable. Here are some thoughts on making
that work with the wire protocol:

If the client potentially wants to use compression at some point it
should include _pq_.compression in the startup message. The value
associated with _pq_.compression should be a comma-separated list of
compression methods which the client understands. If the server
responds with a NegotiateProtocolVersion message, then it either
includes _pq_.compression (in which case the server does not support
compression) or it does not (in which case the server does support
compression). If no NegotiateProtocolVersion message is returned, then
the server is from before November 2017
(ae65f6066dc3d19a55f4fdcd3b30003c5ad8dbed) and compression is not
supported.

If the client requests compression and the server supports it, it
should return a new SupportedCompressionTypes message following
NegotiateProtocolMessage response. That should be a list of
compression methods which the server understands. At this point, the
clent and the server each know what methods the other understands.
Each should now feel free to select a compression method the other
side understands, and to switch methods whenever desired, as long as
they only select from methods the other side has said that they
understand. The patch seems to think that the compression method has
to be the same in both directions and that it can never change, but
there's no real reason for that. Let each side start out uncompressed
and then let it issue a new SetCompressionMethod protocol message to
switch the compression method whenever it wants. After sending that
message it begins using the new compression type. The other side
doesn't have to agree. That way, you don't have to worry about
synchronizing the two directions. Each side is just telling the other
what is choosing to do, from among the options the other side said it
could understand.

It's an interesting question whether it's best to "wrap" the
compressed messages in some way. For example, imagine that instead of
just compressing the bytes and sending them out, you send a message of
some new type CompressedMessage whose payload is another protocol
message. Then a piece of middleware could decide to decompress each
message just enough to see whether it wants to do anything with the
message and if not just forward it. It could also choose to inject its
own messages into the message stream which wouldn't necessarily need
to be compressed, because the wrapper allows mixing of compressed and
uncompressed messages. The big disadvantage of this approach is that
in many cases it will be advantageous to compress consecutive protocol
messages as a unit. For example, when the extend query protocol is in
use, the client will commonly send P-B-D-E-S maybe even in one network
packet. It will compress better if all of those messages are
compressed as one rather than separately. That consideration argues
for the approach the patch actually takes (though the documentation
isn't really very clear about what the patch is actually doing here so
I might be misunderstanding). However, note that when the client or
server does a "flush" this requires "flushing" at the compression
layer also, so that all pending data can be sent. zlib can certainly
do that; I assume other algorithms can too, but I don't really know.
If there are algorithms that don't have that built in, this approach
might be an issue.

Another thing to think about is whether it's really beneficial to
compress stuff like P-B-D-E-S. I would guess that the benefits here
come mostly from compressing DataRow and CopyData messages, and that
compressing messages that don't contain much payload data may just be
a waste of CPU cycles. On the other hand, it's not impossible for it
to win: the query might be long, and query text is probably highly
compressible. Designing something that is specific to certain message
types is probably a bridge too far, but at least I think this is a
strong argument that the compression method shouldn't have to be the
same in both directions.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-11-24 16:23:20 Re: PoC/WIP: Extended statistics on expressions
Previous Message Tom Lane 2020-11-24 15:55:16 Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path