Re: libpq compression

From: Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq compression
Date: 2021-02-25 21:18:23
Lists: pgsql-hackers

Hi, thanks for your review,

> On Feb 22, 2021, at 10:38 AM, Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com> wrote:
> On Thu, 11 Feb 2021, 21:09 Daniil Zakhlystov, <usernamedt(at)yandex-team(dot)ru> wrote:
>> 3. Chunked compression allows compressing only well-compressible messages and saves CPU cycles by not compressing the others.
>> 4. Chunked compression introduces some traffic overhead compared to permanent compression (1.2810G vs 1.2761G TX data on pg_restore of the IMDB database dump, according to the results in my previous message).
>> 5. From the protocol point of view, chunked compression seems a bit more flexible:
>> - we can inject uncompressed messages at any time without having to decompress/recompress the compressed data
>> - we can potentially switch the compression algorithm at any time (but I think this might be over-engineering)
> Chunked compression also potentially makes it easier to handle non-blocking sockets, because you aren't worrying about yet another layer of buffering within the compression stream. This is a real pain with SSL, for example.
> It simplifies protocol analysis.
> It permits compression to be decided on the fly, heuristically, based on message size and potential compressibility.
> It could relatively easily be extended to compress a group of pending small messages, e.g. by PQflush. That'd help mitigate the downsides with small messages.
> So while stream compression offers better compression ratios, I'm inclined to suspect we'll want message level compression.

By chunked compression, I actually meant another variant of streaming compression,
in which all CompressedData messages share the same compression context.
Frankly, I don’t think starting a new compression context for each compressed
message makes sense, because it would significantly hurt the compression ratio.
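
The ratio cost of a fresh context per message can be checked with a small experiment. This is only a sketch: Python's `zlib` stands in for zstd (which is not in the standard library), and the repeated message is a made-up byte string standing in for something like a RowDescription that is identical on every query.

```python
import zlib

def per_message_size(messages):
    """Total bytes when every message is compressed with a fresh context."""
    return sum(len(zlib.compress(m)) for m in messages)

def shared_context_size(messages):
    """Total bytes when all messages go through one long-lived context."""
    comp = zlib.compressobj()
    total = 0
    for m in messages:
        # Z_SYNC_FLUSH makes this message decodable right away while keeping
        # the history window, so later messages can refer back to it.
        total += len(comp.compress(m)) + len(comp.flush(zlib.Z_SYNC_FLUSH))
    return total

# Hypothetical stand-in for a RowDescription repeated by every query.
row_desc = b"T" + b"aid|int4|bid|int4|abalance|int4|filler|bpchar"
messages = [row_desc] * 200

print("fresh context per message:", per_message_size(messages))
print("shared context:           ", shared_context_size(messages))
```

With the shared context, each repeat after the first shrinks to a back-reference plus the flush marker; exact sizes depend on the library and level.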

Also, in its current state, chunked compression requires buffering.
I’ll look into it, but it seems that avoiding buffering would increase
the number of socket read/write system calls.
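
Compressing everything pending at flush time, with a shared context, could look roughly like the hypothetical writer below. The class and method names are made up for illustration, the `writes` list stands in for socket `send()` calls, and the framing of the compressed chunk itself is omitted.

```python
import zlib

class BufferedCompressingWriter:
    """Hypothetical sketch: queue outgoing messages, compress them on flush().

    `writes` stands in for the socket; each appended chunk models one send().
    """

    def __init__(self):
        self._comp = zlib.compressobj()  # one context for the whole connection
        self._pending = []
        self.writes = []

    def send_message(self, msg: bytes) -> None:
        # No I/O here: small messages accumulate until the caller flushes.
        self._pending.append(msg)

    def flush(self) -> None:
        if not self._pending:
            return
        batch = b"".join(self._pending)
        self._pending.clear()
        # Compress the whole batch at once and sync-flush so the peer can
        # decode it immediately; the history carries over to the next batch.
        chunk = self._comp.compress(batch) + self._comp.flush(zlib.Z_SYNC_FLUSH)
        self.writes.append(chunk)  # one write for the whole batch


writer = BufferedCompressingWriter()
rows = [b"D" + f"row {i}".encode() for i in range(50)]
for row in rows:
    writer.send_message(row)
writer.flush()
```

The trade-off is visible here: one write per batch requires holding messages back until the flush, while writing each message immediately means one system call per message.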

> On Feb 23, 2021, at 12:48 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> 1. As I mentioned above, we need to fall back to sending the
> uncompressed message if compression fails to reduce the size, or if it
> doesn't reduce the size by enough to compensate for the header we have
> to add to the packet (I assume this is 5 bytes, perhaps 6 if you allow
> a byte to mention the compression type).
> 2. Refining this further, if we notice that we are failing to compress
> messages regularly, maybe we should adaptively give up. The simplest
> idea would be something like: keep track of what percentage of the
> time compression succeeds in reducing the message size. If in the last
> 100 attempts we got a benefit fewer than 75 times, then conclude the
> data isn't very compressible and switch to only attempting to compress
> every twentieth packet or so. If the data changes and becomes more
> compressible again the statistics will eventually tilt back in favor
> of compressing every packet again; if not, we'll only be paying 5% of
> the overhead.
> 3. There should be some minimum size before we attempt compression.
> pglz gives up right away if the input is less than 32 bytes; I don't
> know if that's the right limit, but presumably it'd be very difficult
> to save 5 or 6 bytes out of a message smaller than that, and maybe
> it's not worth trying even for slightly larger messages.
> 4. It might be important to compress multiple packets at a time. I can
> even imagine having two different compressed protocol messages, one
> saying 'here is a compressed message' and the other saying 'here are
> a bunch of compressed messages rolled up into one packet'.

I’ll look into 1) and 2). As for 3) and 4), these are already implemented.
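
Points 1) to 3) could fit together roughly as below. This is a minimal sketch, not code from the patch: `zlib` stands in for zstd, `HEADER_OVERHEAD` uses the 5-byte guess from point 1, `MIN_COMPRESS_LEN` borrows pglz's 32-byte cutoff as a placeholder, and 100/75/20 are the example numbers from point 2. Because the context is shared, each attempt runs on a copy of the compressor, so a rejected attempt leaves the stream state (and therefore the receiver's decompressor) untouched; that copy has a real cost.

```python
import zlib
from collections import deque

HEADER_OVERHEAD = 5    # assumed wrapper cost per compressed message
MIN_COMPRESS_LEN = 32  # pglz's cutoff, used here only as a placeholder


class AdaptiveCompressor:
    """Hypothetical sketch of points 1-3; not code from the patch."""

    def __init__(self, window=100, min_wins=75, probe_every=20):
        self._comp = zlib.compressobj()        # shared stream context
        self._history = deque(maxlen=window)   # True = the attempt paid off
        self._min_wins = min_wins
        self._probe_every = probe_every
        self._since_probe = 0

    def _should_attempt(self) -> bool:
        window_full = len(self._history) == self._history.maxlen
        if not window_full or sum(self._history) >= self._min_wins:
            return True                        # compression is paying off
        self._since_probe += 1                 # backing off: probe now and then
        if self._since_probe >= self._probe_every:
            self._since_probe = 0
            return True
        return False

    def encode(self, msg: bytes):
        """Return ("compressed", payload) or ("raw", msg)."""
        if len(msg) < MIN_COMPRESS_LEN or not self._should_attempt():
            return ("raw", msg)                # point 3: too small to bother
        # Trial-compress on a copy so a rejected attempt does not advance
        # the shared context.
        trial = self._comp.copy()
        payload = trial.compress(msg) + trial.flush(zlib.Z_SYNC_FLUSH)
        won = len(payload) + HEADER_OVERHEAD < len(msg)
        self._history.append(won)              # point 2: track the hit rate
        if not won:
            return ("raw", msg)                # point 1: fall back to raw
        self._commit = None
        self._comp = trial                     # keep the advanced state
        return ("compressed", payload)
```

Once fewer than 75 of the last 100 attempts pay off, only every 20th eligible message is tried; successful probes gradually refill the window and restore compression of every message.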

> On Feb 23, 2021, at 1:40 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2021-02-22 14:48:25 -0500, Robert Haas wrote:
>> So, if I read these results correctly, on the "pg_restore of IMDB
>> database" test, we get 88% of the RX bytes reduction and 99.8% of the
>> TX bytes reduction for 90% of the CPU cost. On the "pgbench" test,
>> which probably has much smaller packets, chunked compression gives us
>> no bandwidth reduction and in fact consumes slightly more network
>> bandwidth -- which seems like it has to be an implementation defect,
>> since we should always be able to fall back to sending the
>> uncompressed packet if the compressed one is larger, or will be after
>> adding the wrapper overhead. But with the current code, at least, we
>> pay about a 30% CPU tax, and there's no improvement. The permanent
>> compression imposes a whopping 90% CPU tax, but we save about 33% on
>> TX bytes and about 14% on RX bytes.
> It'd be good to fix the bandwidth increase issue, of course. But other
> than that I'm not really bothered by transactional workloads like
> pgbench not saving much / increasing overhead (within reason) compared
> to bulkier operations. With packets as small as the default pgbench
> workloads use, it's hard to use generic compression methods and save
> space. While we could improve upon that even in the packet oriented
> case, it doesn't seem like an important use case to me.

The CPU/bandwidth usage increase was caused by incorrect compression criteria.
Indeed, pgbench packets are too short, so compressing all CopyData
and DataRow messages resulted in CPU and bandwidth overhead without any noticeable
improvement (and even increased bandwidth consumption).

So I changed the compression criteria to filter out messages that are too short:
“Compress CopyData and DataRow messages longer than 60 bytes”
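
The criterion can be expressed as a small predicate over a framed protocol message. This sketch assumes "length" means the message's int32 length field, which counts itself but not the leading type byte; the patch may measure it differently. Type bytes are direction-dependent ('D' sent by a client means Describe), so the DataRow check only makes sense for server-to-client traffic.

```python
import struct

COPY_DATA = b"d"   # CopyData message type byte (protocol v3)
DATA_ROW = b"D"    # DataRow type byte (server to client only)
MIN_LENGTH = 60    # threshold from the criterion above


def should_compress(raw: bytes) -> bool:
    """Apply the quoted criterion to one framed message.

    Assumes `raw` starts with the type byte followed by the int32 length
    field (big-endian, counting itself but not the type byte).
    """
    msg_type = raw[:1]
    (length,) = struct.unpack("!I", raw[1:5])
    return msg_type in (COPY_DATA, DATA_ROW) and length > MIN_LENGTH


def frame(msg_type: bytes, payload: bytes) -> bytes:
    """Build a v3-style message: type byte + int32 length (incl. itself)."""
    return msg_type + struct.pack("!I", len(payload) + 4) + payload
```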

> On Feb 23, 2021, at 12:48 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> But there's a subtler way in which the permanent compression approach
> could be winning, which is that the compressor can retain state over
> long time periods. In a single pgbench response, there's doubtless
> some opportunity for the compressor to find savings, but an individual
> response doesn't likely include all that much duplication. But just
> think about how much duplication there is from one response to the
> next. The entire RowDescription message is going to be exactly the
> same for every query. If you can represent that in just a couple of
> bytes, I think that figures to be a pretty big win. If I had to
> guess, that's likely why the permanent compression approach seems to
> deliver a significant bandwidth savings even on the pgbench test,
> while the chunked approach doesn't. Likewise in the other direction:
> the query doesn't necessarily contain a lot of internal duplication,
> but it duplicates the previous query to a very large extent. It would
> be interesting to know whether this theory is correct, and whether
> anyone can spot a flaw in my reasoning.

> If it is, that doesn't necessarily mean we can't use the chunked
> approach, but it certainly makes it less appealing. I can see two ways
> to go. One would be to just accept that it won't get much benefit in
> cases like the pgbench example, and mitigate the downsides as well as
> we can. A version of this patch that caused a 3% CPU overhead in cases
> where it can't compress would be far more appealing than one that
> causes a 30% overhead in such cases (which seems to be where we are
> now).

The CPU overhead of chunked compression in cases where it can’t compress
is not 30%; it is approximately 6-8%. To demonstrate that, I ran new pgbench benchmarks
with the newly chosen compression criteria (mentioned above).

No compression:
pgbench --builtin tpcb-like -t 5000 --jobs=256 --client=256
latency average = 114.857 ms
tps = 2228.858196 (including connections establishing)
tps = 2229.178913 (excluding connections establishing)
real 9m35.708s
user 1m35.282s
sys 4m15.310s
RX bytes diff, human: 867.7064M
TX bytes diff, human: 1.0690G

Chunked compression (only CopyData and DataRow, length > 60 bytes):
pgbench "compression=zstd:1" --builtin tpcb-like -t 5000 --jobs=256 --client=256
latency average = 115.503 ms
tps = 2216.389584 (including connections establishing)
tps = 2216.728190 (excluding connections establishing)
real 9m38.708s
user 1m41.268s
sys 4m1.561s
RX bytes diff, human: 867.3309M
TX bytes diff, human: 1.0690G

Permanent compression:
pgbench "compression=zstd:1" --builtin tpcb-like -t 5000 --jobs=256 --client=256
latency average = 117.050 ms
tps = 2187.105345 (including connections establishing)
tps = 2187.417007 (excluding connections establishing)
real 9m46.726s
user 2m25.600s
sys 3m23.074s
RX bytes diff, human: 734.9666M
TX bytes diff, human: 725.5781M
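
The relative differences between these runs can be recomputed from the reported figures. The sketch below only does arithmetic on the `user` times and the tps lines excluding connection establishment; the thread does not say which processes the real/user/sys figures cover, so they are compared as-is.

```python
def seconds(minutes: int, secs: float) -> float:
    """Convert an 'XmY.YYYs' time reading into seconds."""
    return minutes * 60 + secs

def pct_change(new: float, old: float) -> float:
    """Relative change of `new` against `old`, in percent."""
    return 100.0 * (new / old - 1.0)

# "user" times and tps (excluding connections establishing) from the runs above
runs = {
    "none":      {"user": seconds(1, 35.282), "tps": 2229.178913},
    "chunked":   {"user": seconds(1, 41.268), "tps": 2216.728190},
    "permanent": {"user": seconds(2, 25.600), "tps": 2187.417007},
}

base = runs["none"]
for name in ("chunked", "permanent"):
    r = runs[name]
    print(f"{name:>9}: user CPU {pct_change(r['user'], base['user']):+.1f}%, "
          f"tps {pct_change(r['tps'], base['tps']):+.1f}%")
```

This gives roughly +6.3% user CPU and -0.6% tps for chunked compression, versus +52.8% user CPU and -1.9% tps for permanent compression. The sys times, by contrast, are lower in both compressed runs (4m1.561s and 3m23.074s vs 4m15.310s).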

> Alternatively, we could imagine the compressed-message packets as
> carrying a single continuous compressed stream of bytes, so that the
> compressor state is retained from one compressed message to the next.
> Any number of uncompressed messages could be sent in between,
> without doing anything to the compression state, but when you send the
> next compression message, both the sender and receiver feel like the
> bytes they're now being given are appended onto whatever bytes they
> saw last. This would presumably recoup a lot of the compression
> benefit that the permanent compression approach sees on the pgbench
> test, but it has some notable downsides. In particular, now you have
> to wonder what exactly you're gaining by not just compressing
> everything. Nobody snooping on the stream can snoop on an individual
> packet without having seen the whole history of compressed packets
> from the beginning of time, nor can some kind of middleware like
> pgbouncer decompress each payload packet just enough to see what the
> first byte may be. It's either got to decompress all of every packet
> to keep its compression state current, or just give up on knowing
> anything about what's going on inside those packets. And you've got to
> worry about all the same details about flushing the compressor state
> that we were worrying about with the compress-everything approach.
> Blech.

This is exactly how the currently implemented chunked compression works.
From the protocol point of view, this approach allows:
- injecting uncompressed data, e.g. by middleware;
- switching the compression method on the fly (this is not implemented now,
but can be implemented in the future without protocol changes).
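
A minimal model of this layout: CompressedData payloads form one continuous stream, while uncompressed messages pass through without touching either side's compression state. The type byte `COMPRESSED_DATA = b"z"` is hypothetical, `zlib` stands in for zstd, and the 'N' body is a placeholder rather than a real NoticeResponse.

```python
import struct
import zlib

COMPRESSED_DATA = b"z"  # hypothetical type byte for CompressedData


def frame(msg_type: bytes, payload: bytes) -> bytes:
    """v3-style framing: type byte, then an int32 length that counts itself."""
    return msg_type + struct.pack("!I", len(payload) + 4) + payload


class Sender:
    def __init__(self):
        self._comp = zlib.compressobj()  # one stream for the whole connection

    def compressed(self, inner: bytes) -> bytes:
        """Wrap already-framed protocol bytes in a CompressedData message."""
        data = self._comp.compress(inner) + self._comp.flush(zlib.Z_SYNC_FLUSH)
        return frame(COMPRESSED_DATA, data)

    @staticmethod
    def raw(msg_type: bytes, payload: bytes) -> bytes:
        """Send uncompressed; the compression stream is left untouched."""
        return frame(msg_type, payload)


class Receiver:
    def __init__(self):
        self._decomp = zlib.decompressobj()

    def feed(self, wire: bytes) -> bytes:
        """Return the protocol bytes carried by one framed message."""
        msg_type = wire[:1]
        (length,) = struct.unpack("!I", wire[1:5])
        if msg_type == COMPRESSED_DATA:
            # Decompressed bytes continue from whatever came before.
            return self._decomp.decompress(wire[5:1 + length])
        return wire[:1 + length]  # injected message passes straight through


sender, receiver = Sender(), Receiver()
row = frame(b"D", b"\x00\x01\x00\x00\x00\x05hello")
notice = Sender.raw(b"N", b"placeholder body")  # e.g. injected by middleware
wire = [sender.compressed(row), notice, sender.compressed(row)]
delivered = b"".join(receiver.feed(m) for m in wire)
```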

Yes, there is still no way for poolers to pass compressed traffic through while looking inside individual messages,
but do we actually need that?
I don’t see any way to make it work except starting a new compression context for
each message.
Do we really want to hurt the compression ratio for this specific use case?

There is an ugly hack, though: we could force-reset the compression context by sending
SetCompressionMethod (which resets the compression algorithm and context) after each CompressedData message.

This would allow each CompressedData message to be interpreted on its own, but it would add overhead and hurt the compression ratio.
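
The effect of such a reset on a middlebox can be modeled by giving each message a fresh context and asking whether a decoder with no history can read a single chunk. This is a sketch under assumptions: `zlib` stands in for zstd, a fresh `zlib.compress` call per message models the reset, and the message bytes are made up.

```python
import zlib


def encode_shared(messages):
    """One context for the connection (current chunked compression)."""
    comp = zlib.compressobj()
    return [comp.compress(m) + comp.flush(zlib.Z_SYNC_FLUSH) for m in messages]


def encode_with_reset(messages):
    """Fresh context per message, modeling a reset after every message."""
    return [zlib.compress(m) for m in messages]


def decodes_alone(chunk: bytes, expected: bytes) -> bool:
    """Could a pooler with no prior history decode just this chunk?"""
    try:
        return zlib.decompressobj().decompress(chunk) == expected
    except zlib.error:
        return False


msgs = [b"T" + b"aid|bid|abalance|filler|" * 2] * 3
shared, reset = encode_shared(msgs), encode_with_reset(msgs)
for i, (s, r) in enumerate(zip(shared, reset)):
    print(i, len(s), len(r), decodes_alone(s, msgs[i]), decodes_alone(r, msgs[i]))
```

With resets, every chunk decodes on its own; with the shared context only the first one does, but the total is smaller.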

Daniil Zakhlystov
