Re: libpq compression

From: Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>
To: Ian Zagorskikh <izagorskikh(at)cloudlinux(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject: Re: libpq compression
Date: 2021-04-22 13:04:05
Message-ID: BF36ABF2-DFD2-4981-9B28-50B90D866A4E@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 21 Apr 2021, at 20:35, Ian Zagorskikh <izagorskikh(at)cloudlinux(dot)com> wrote:
>
> Let me drop my humble 2 cents in this thread. At this moment I checked only 0001-Add-zlib-and-zstd-streaming-compression patch. With no order:
>
> * No checks for failures. For example, value from malloc() is not checked for NULL and used as-is right after the call. Also as a result possibly NULL values passed into ZSTD functions which are explicitly not NULL-tolerant and so dereference pointers without checks.
>
> * Used memory management does not follow the schema used in the common module. Direct calls to std C malloc/free are hard coded. AFAIU this is not backend friendly. Looking at the code around I believe they should be wrapped to ALLOC/FREE local macroses that depend on FRONTEND define. As it's done for example. in src/common/hmac.c.
>
> * If we're going to fix our code to be palloc/pfree friendly there's also a way to pass our custom allocator callbacks inside ZSTD functions. By default ZSTD uses malloc/free but it can be overwritten by the caller in e.g. ZSTD_createDStream_advanced(ZSTD_customMem customMem) versions of API . IMHO that would be good. If a 3rd party component allows us to inject a custom memory allocator and we do have it why not use this feature?
>
> * Most zs_foo() functions do not check ptr arguments, though some like zs_free() do. If we're speaking about a "common API" that can be used by a wide range of different modules around the project, such a relaxed way to treat input arguments IMHO can be an evil. Or at least add Assert(ptr) assertions so they can be catched up in debug mode.
>
> * For zs_create() function which must be called first to create context for further operations, a compression method is passed as integer. This method is used inside zs_create() as index inside an array of compress implementation descriptors. There are several problems:
> 1) Method ID value is not checked for validity. By passing an invalid method ID we can easily get out of array bounds.
> 2) Content of array depends on on configuration options e.g. HAVE_LIBZSTD, HAVE_LIBZ etc. So index inside this array is not persistent. For some config options combination method ID with value 0 may refer to ZSTD and for others to zlib.
> 3) There's no defines/enums/etc in public z_stream.h header that would define which value should be passed to create a specific compressor. Users have to guess it or check the code.
>
> * I have a feeling after reading comments for ZSTD compress/decompress functions that their return value is treated in a wrong way handling only success path. Though need more checks/POCs on that.
>
> In general, IMHO the idea of a generic compress/decompress API is very good and useful. But specific implementation needs some code cleanup/refactoring.

Hi,

thank you for the detailed review. Previously in the thread, Justin Pryzby mentioned the preliminary patch which will be common between the different compression patches. Basically, this is a quick prototype of that patch and it definitely needs some additional effort. It would be also great to have a top-level overview of the 0001-Add-zlib-and-zstd-streaming-compression patch from Justin to make sure it can be potentially be used for the other compression patches.

Currently, I’m in the process of implementing the negotiation of asymmetric compression, but I can take a look at these issues after I finish with it.

> One little fix to 0001-Add-zlib-and-zstd-streaming-compression patch for configure.ac
>
> ================
> @@ -1455,6 +1456,7 @@ fi
>
> if test "$with_lz4" = yes; then
> AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is required for LZ4])])
> +fi
> ================
>
> Otherwise autoconf generates a broken configure script.

Added your fix to the patch and rebased it onto the current master.

Attachment Content-Type Size
0001-Add-zlib-and-zstd-streaming-compression.patch application/octet-stream 24.7 KB
0002-Implement-libpq-compression.patch application/octet-stream 81.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-04-22 13:29:46 Re: INT64_FORMAT in translatable strings
Previous Message Andrew Dunstan 2021-04-22 12:53:52 Re: multi-install PostgresNode fails with older postgres versions