Re: libpq compression

From: Denis Smirnov <sd(at)arenadata(dot)io>
To: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Cc: Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq compression
Date: 2020-12-17 13:39:50
Message-ID: CCAB1F57-A71D-4DEC-9A9C-F325B44BAC29@arenadata.io
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello all,

I’ve finally read the whole thread (it was huge). It is extremely sad that this patch hang without progress for such a long time. It seems that the main problem in discussion is that everyone has its own view what problems should be solve with this patch. Here are some of positions (not all of them):

1. Add a compression for networks with a bad bandwidth (and make a patch as simple and maintainable as possible) - author’s position.
2. Don’t change current network protocol and related code much.
3. Refactor compression API (and network compression as well)
4. Solve cloud provider’s problems: on demand buy network bandwidth with CPU utilisation and vice versa.

All of these requirements have a different nature and sometimes conflict with each other. Without clearly formed requirements this patch would never be released.

Anyway, I have rebased it to the current master branch, applied pgindent, tested on MacOS and fixed a MacOS specific problem with strcpy in build_compressors_list(): it has an undefined behaviour when source and destination strings overlap.
- *client_compressors = src = dst = strdup(value);
+ *client_compressors = src = strdup(value);
+ dst = strdup(value);

According to my very simple tests with randomly generated data, zstd gives about 3x compression (zlib has a little worse compression ratio and a little bigger CPU utilisation). It seems to be a normal ratio for any streaming data - Greenplum also uses zstd/zlib to compress append optimised tables and compression ratio is usually about 3-5x. Also according to my Greenplum experience, the most commonly used zstd ratio is 1, while for zlib it is usually in a range of 1-5. CPU and execution time were not affected much according to uncompressed data (but my tests were very simple and they should not be treated as reliable).

Attachment Content-Type Size
0001-Rebase-patch-27-to-actual-master-and-fix-strcpy.patch.txt text/plain 78.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Seino Yuki 2020-12-17 13:59:20 Re: Feature improvement for pg_stat_statements
Previous Message Konstantin Knizhnik 2020-12-17 13:05:18 Re: On login trigger: take three