From: | Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru> |
---|---|
To: | Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | pryzby(at)telsasoft(dot)com, x4mmm(at)yandex-team(dot)ru, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: libpq compression |
Date: | 2021-02-11 13:09:20 |
Message-ID: | 3676BA4F-ED2B-470C-9892-007C943BB1D2@yandex-team.ru |
Lists: | pgsql-hackers |
Hi!
> On 09.02.2021 09:06, Konstantin Knizhnik wrote:
>
> Sorry, but my interpretation of your results is completely different:
> permanent compression is faster than chunked compression (2m15 vs. 2m27)
> and consumes less CPU (44 vs 48 sec).
> Size of RX data is slightly larger - 0.5Mb but TX size is smaller - 5Mb.
> So permanent compression is better from all points of view: it is
> faster, consumes less CPU and reduces network traffic!
>
> From my point of view your results just prove my original opinion that
> possibility to control compression on the fly and use different
> compression algorithm for TX/RX data
> just complicates implementation and given no significant advantages.
When I mentioned the lower CPU usage, I was referring to the pgbench test results in the attached
Google doc, where chunked compression showed lower CPU usage than permanent compression.
I ran another, slightly larger pgbench test to demonstrate this:
Pgbench test parameters:
Data load
pgbench -i -s 100
Run configuration
pgbench --builtin tpcb-like -t 1500 --jobs=64 --client=600
Pgbench test results:
No compression
latency average = 247.793 ms
tps = 2421.380067 (including connections establishing)
tps = 2421.660942 (excluding connections establishing)
real 6m11.818s
user 1m0.620s
sys 2m41.087s
RX bytes diff, human: 703.9221M
TX bytes diff, human: 772.2580M
Chunked compression (compress only CopyData and DataRow messages)
latency average = 249.123 ms
tps = 2408.451724 (including connections establishing)
tps = 2408.719578 (excluding connections establishing)
real 6m13.819s
user 1m18.800s
sys 2m39.941s
RX bytes diff, human: 707.3872M
TX bytes diff, human: 772.1594M
Permanent compression
latency average = 250.312 ms
tps = 2397.005945 (including connections establishing)
tps = 2397.279338 (excluding connections establishing)
real 6m15.657s
user 1m54.281s
sys 2m37.187s
RX bytes diff, human: 610.6932M
TX bytes diff, human: 513.2225M
As the results above show, user CPU time is significantly lower with chunked compression than with
permanent compression (1m18.800s vs 1m54.281s), because chunked compression doesn’t try to compress every packet.
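To put the numbers above in relative terms, here is a small sketch that compares each compressed run against the uncompressed baseline. The helper names (`cpu_seconds`, `summarize`) are just for illustration and are not part of any tooling used in the tests; the values are copied from the results above.

```python
import re

def cpu_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '1m18.800s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# (user CPU time, TX megabytes) copied from the pgbench results above
runs = {
    "none": ("1m0.620s", 772.2580),
    "chunked": ("1m18.800s", 772.1594),
    "permanent": ("1m54.281s", 513.2225),
}

base_cpu = cpu_seconds(runs["none"][0])
base_tx = runs["none"][1]

def summarize(name: str):
    """Return (extra user CPU in %, TX traffic saved in %) vs. no compression."""
    stamp, tx = runs[name]
    extra_cpu = (cpu_seconds(stamp) / base_cpu - 1) * 100
    tx_saved = (1 - tx / base_tx) * 100
    return extra_cpu, tx_saved

for name in ("chunked", "permanent"):
    extra_cpu, tx_saved = summarize(name)
    print(f"{name}: user CPU +{extra_cpu:.1f}%, TX -{tx_saved:.1f}%")
```

On this workload chunked compression costs roughly 30% more user CPU than no compression and leaves TX traffic essentially unchanged, while permanent compression costs roughly 88% more user CPU and cuts TX traffic by about a third.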
Here is my summary, based on these and the previous test results:
1. Permanent compression always gives the highest compression ratio
2. Permanent compression might not be worthwhile for workloads other than COPY data / replication / BLOB / JSON queries
3. Chunked compression makes it possible to compress only well-compressible messages and saves CPU cycles by not compressing the others
4. Chunked compression introduces some traffic overhead compared to permanent compression (1.2810G vs 1.2761G TX data on pg_restore of the IMDB database dump, according to the results in my previous message)
5. From the protocol point of view, chunked compression seems a little more flexible:
- we can inject uncompressed messages at any time without needing to decompress/compress the compressed data
- we can potentially switch the compression algorithm at any time (but I think that this might be over-engineering)
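To make points 3 and 5 concrete, here is a minimal sketch of the difference between the two approaches. It is not the wire format of the patch: the `(is_compressed, body)` framing, the function names, the use of zlib, and compressing each run as an independent stream are all assumptions made for the illustration. Only the message type bytes ('D' for DataRow, 'd' for CopyData) and the type + length + payload layout come from the PostgreSQL protocol.

```python
import zlib
from typing import Iterable, List, Tuple

Message = Tuple[bytes, bytes]  # (one-byte type tag, payload)
Frame = Tuple[bool, bytes]     # hypothetical framing: (is_compressed, body)

# Protocol type bytes for DataRow ('D') and CopyData ('d').
COMPRESSIBLE = {b"D", b"d"}

def serialize(msg: Message) -> bytes:
    """Type byte + 4-byte length (counting itself) + payload."""
    tag, payload = msg
    return tag + (len(payload) + 4).to_bytes(4, "big") + payload

def chunked_encode(messages: Iterable[Message]) -> List[Frame]:
    """Compress runs of compressible messages; pass everything else through raw."""
    frames: List[Frame] = []
    pending: List[bytes] = []

    def flush() -> None:
        # Emit the accumulated compressible run as one compressed frame.
        if pending:
            frames.append((True, zlib.compress(b"".join(pending))))
            pending.clear()

    for msg in messages:
        raw = serialize(msg)
        if msg[0] in COMPRESSIBLE:
            pending.append(raw)
        else:
            flush()
            frames.append((False, raw))  # no CPU spent on this message
    flush()
    return frames

def chunked_decode(frames: Iterable[Frame]) -> bytes:
    """Rebuild the original byte stream from the frames."""
    return b"".join(
        zlib.decompress(body) if compressed else body
        for compressed, body in frames
    )

def permanent_encode(messages: Iterable[Message]) -> bytes:
    """Compress the whole stream, whatever the message types are."""
    return zlib.compress(b"".join(serialize(m) for m in messages))
```

In this sketch an uncompressed message simply becomes its own raw frame between two compressed runs, which is the flexibility described in point 5, and the extra framing around each run is where the traffic overhead from point 4 would come from.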
Given the summary above, I think it’s time to decide which path we should take and to finalize the list of goals that this patch needs to meet to be committable.
Thanks,
Daniil Zakhlystov