Re: libpq compression

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Denis Smirnov <sd(at)arenadata(dot)io>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Cc: Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq compression
Date: 2020-12-17 14:54:28
Message-ID: 93027ee4-fefa-a8ca-2cc7-245c912bf824@postgrespro.ru
Lists: pgsql-hackers

On 17.12.2020 16:39, Denis Smirnov wrote:
> Hello all,
>
> I’ve finally read the whole thread (it was huge). It is extremely sad that this patch has hung without progress for such a long time. It seems that the main problem in the discussion is that everyone has their own view of which problems should be solved with this patch. Here are some of the positions (not all of them):
>
> 1. Add compression for networks with bad bandwidth (and keep the patch as simple and maintainable as possible) - the author’s position.
> 2. Don’t change the current network protocol and related code much.
> 3. Refactor the compression API (and network compression as well).
> 4. Solve cloud providers’ problems: trade network bandwidth for CPU utilisation on demand, and vice versa.
>
> All of these requirements are different in nature and sometimes conflict with each other. Without clearly formulated requirements this patch will never be released.
>
> Anyway, I have rebased it onto the current master branch, applied pgindent, tested it on macOS and fixed a macOS-specific problem with strcpy in build_compressors_list(): it has undefined behaviour when the source and destination strings overlap.
> - *client_compressors = src = dst = strdup(value);
> + *client_compressors = src = strdup(value);
> + dst = strdup(value);
>
> According to my very simple tests with randomly generated data, zstd gives about 3x compression (zlib has a slightly worse compression ratio and slightly higher CPU utilisation). This seems to be a normal ratio for any streaming data - Greenplum also uses zstd/zlib to compress append-optimized tables, and the compression ratio there is usually about 3-5x. Also, in my Greenplum experience, the most commonly used zstd compression level is 1, while for zlib it is usually in the range of 1-5. CPU usage and execution time were not much affected compared to uncompressed data (but my tests were very simple and should not be treated as reliable).
>
>
>
> Best regards,
> Denis Smirnov | Developer
> sd(at)arenadata(dot)io
> Arenadata | Godovikova 9-17, Moscow 129085 Russia
>

Thank you very much for reporting the problem.
Sorry, but your fix is not entirely correct: it is necessary to assign
dst to *client_compressors, not src. Also, the extra strdup requires a
corresponding memory deallocation.
I have prepared a new version of the patch with this fix plus some other
code refactoring requested by reviewers (such as sending the ack message
in a more standard way).

I am maintaining this code in the
git(at)github(dot)com:postgrespro/libpq_compression.git repository.
I would be pleased if anybody who wants to suggest bug fixes or
improvements to libpq compression created pull requests: it would be
much easier for me to merge them.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
libpq-compression-28.patch text/x-patch 73.9 KB
