Re: libpq compression (part 2)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: libpq compression (part 2)
Date: 2022-01-01 17:25:05
Message-ID: 20220101172504.GW24477@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for working on this. The patch looks to be in good shape - I hope more
people will help to review and test it. I took the liberty of creating a new
CF entry. http://cfbot.cputube.org/daniil-zakhlystov.html

+zpq_should_compress(ZpqStream * zpq, char msg_type, uint32 msg_len)
+{
+ return zpq_choose_compressor(zpq, msg_type, msg_len) == -1;

I think this is backwards , and should say != -1 ?

As written, the server GUC libpq_compression defaults to "on", and the client
doesn't request compression. I think the server GUC should default to off.
I failed to convince Kontantin about this last year. The reason is that 1)
it's a new feature; 2) with security implications. An admin should need to
"opt in" to this. I still wonder if this should be controlled by a new "TYPE"
in pg_hba (rather than a GUC); that would make it exclusive of SSL.

However, I also think that in the development patches both client and server
should enable compression by default, to allow it to be exercized by cfbot.
For the other compression patches I worked on, I had an 0001 commit where the
feature default was off, plus following patches which enabled the feature by
default - only for cfbot and not intended for commit.

+/* GUC variable containing the allowed compression algorithms list (separated by comma) */
+char *libpq_compress_algorithms;

=> The global variable is conventionally initialized to the same default as
guc.c (even though the GUC default value will be applied at startup).

+ * - NegotiateProtocolVersion in cases when server does not support protocol compression
+ * Anything else probably means it's not Postgres on the other end at all.
*/
- if (!(beresp == 'R' || beresp == 'E'))
+ if (!(beresp == 'R' || beresp == 'E' || beresp == 'z' || beresp == 'v'))

=> I think NegotiateProtocolVersion and 'v' are from an old version of the
patch and no longer used ?

There's a handful of compiler warnings:
| z_stream.c:311:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]

+ ereport(LOG, (errmsg("failed to parse configured compression setting: %s", libpq_compress_algorithms)));
+ return 0;

=> Since March, errmsg doesn't need extra parenthesis around it (e3a87b4).

+ * Report current transaction start timestamp as the specified value.
+ * Zero means there is no active transaction.

=> The comment doesn't correspond to the function (copy+paste).

+ return id >= 0 && id < (sizeof(zs_algorithms) / sizeof(*zs_algorithms));
+ size_t n_algorithms = sizeof(zs_algorithms) / sizeof(*zs_algorithms);

=> You can use lengthof() for these.

Maybe pg_stat_network_traffic should show the compression?
It'd have to show the current client-server and server-client values.
Or maybe that's not useful since it can change dynamically (unless it were
reset when the compression method was changed).

I think the psql conninfo Compressor/Decompressor line may be confusing.
Maybe it should talk about Client->Server and Server->Client.
Maybe it should be displayed on a single line.

Actually, right now the \conninfo part is hardly useful, since the compressor
is allocated lazily/"on demand", so it shows the compression of some previous
command, but maybe not the last command, and maybe not the next command...

I'm not sure it's needed to allow the compression to be changed dynamically,
and the generality to support a different compression mthod for each libpq
message type seems excessive. Maybe it's enough to check that the message type
is one of VALID_LONG_MESSAGE_TYPE and its length is long enough.

I wonder whether the asymmetric compression idea is useful. The only
application I can see for this is that a server might allow to *decompress*
data from a client, but may not want to *compress* data to a client.

What happenes if the compression doesn't decrease the message size ? I don't
see anything that allows sending the original, raw data. The advantage would
be that the remote side doesn't incur the overhead of decompression.

I hit this assertion with PGCOMPRESSION=lz4 (but not zlib), and haven't yet
found the cause.

TRAP: FailedAssertion("decBytes > 0", File: "z_stream.c", Line: 315, PID: 21180)
postgres: pryzbyj postgres 127.0.0.1(46154) idle(ExceptionalCondition+0x99)[0x5642137806d9]
postgres: pryzbyj postgres 127.0.0.1(46154) idle(+0x632bfe)[0x5642137d4bfe]
postgres: pryzbyj postgres 127.0.0.1(46154) idle(zs_read+0x2b)[0x5642137d4ebb]
postgres: pryzbyj postgres 127.0.0.1(46154) idle(zpq_read+0x192)[0x5642137d1172]
postgres: pryzbyj postgres 127.0.0.1(46154) idle(+0x378b8a)[0x56421351ab8a]
postgres: pryzbyj postgres 127.0.0.1(46154) idle(pq_getbyte+0x1f)[0x56421351c12f]
postgres: pryzbyj postgres 127.0.0.1(46154) idle(PostgresMain+0xf96)[0x56421365fcc6]
postgres: pryzbyj postgres 127.0.0.1(46154) idle(+0x428b69)[0x5642135cab69]
postgres: pryzbyj postgres 127.0.0.1(46154) idle(PostmasterMain+0xd1c)[0x5642135cbbac]
postgres: pryzbyj postgres 127.0.0.1(46154) idle(main+0x220)[0x5642132f9a40]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fb70ef45b97]

Maybe you should reset the streams between each compression message (even if
it's using the same compression algorithm). This might allow better
compression. You could either unconditionally call zs_compressor_free()/
zs_create_compressor(). Or, add a new, optional interface for resetting the
stream (and if that isn't set for a given compression method, then free+create
the stream). For LZ4, that'd be LZ4_resetStream_fast(), but only for
v1.9.0+...

Some minor formatting issues:
- There are spaces rather than tabs in a few files; you can maybe fix it by
piping the file through unexpand -t4.
- pointers in function declarations should have no space after the stars:
+zs_buffered(ZStream * zs)
- There's a few extraneous whitespace changes:
ConfigureNamesEnum, pq_buffer_has_data, libpq-int.h.
- Some lines are too long:
git log -2 -U1 '*.[ch]' |grep -E '.{80}|^diff' |less
- Some 1-line "if" statements would be easier to read without braces {}.
- Some multi-line comments should have "*/" on a separate line:
git log -3 -p |grep -E '^\+ \*.*\*\/'
git log -3 -p '*.c' |grep -E '^\+[^/]*\*.*\*\/'

--
Justin

Attachment Content-Type Size
0001-Implement-libpq-compression.patch text/x-diff 114.5 KB
0002-Add-ZSTD-support.patch text/x-diff 17.8 KB
0003-Avoid-buffer-overflow-found-by-valgrind.patch text/x-diff 1.2 KB
0004-fix-compiler-warnings.patch text/x-diff 2.2 KB
0005-respect-LZ4-compression-level.patch text/x-diff 933 bytes
0006-zpq_choose_compressor.patch text/x-diff 771 bytes
0007-conninfo-on-one-line.patch text/x-diff 923 bytes
0008-Enable-zlib-compression-by-default.patch text/x-diff 866 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2022-01-01 21:16:47 Re: Collecting statistics about contents of JSONB columns
Previous Message Tomas Vondra 2022-01-01 17:21:06 Re: using extended statistics to improve join estimates