Re: libpq compression

From: Ian Zagorskikh <izagorskikh(at)cloudlinux(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject: Re: libpq compression
Date: 2021-04-21 15:35:18
Message-ID: CAByvzpb4JX_F1BuOpHxc1XxBXmCeXBeaE-TOvLOE+=5Y7MGXeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

All, thanks!

On Wed, Apr 21, 2021 at 10:47 AM Michael Paquier <michael(at)paquier(dot)xyz>
wrote:

> Patch reviews had better be posted on the community lists. This way,
> if the patch is left dead by the authors (things happen in life), then
> somebody else could move on with the patch without having to worry
> about this kind of issues twice.
> --
> Michael
>

Let me drop my humble 2 cents in this thread. At this moment I checked only
0001-Add-zlib-and-zstd-streaming-compression patch. With no order:

* No checks for failures. For example, value from malloc() is not checked
for NULL and used as-is right after the call. Also as a result possibly
NULL values passed into ZSTD functions which are explicitly not
NULL-tolerant and so dereference pointers without checks.

* Used memory management does not follow the schema used in the common
module. Direct calls to std C malloc/free are hard coded. AFAIU this is not
backend friendly. Looking at the code around I believe they should be
wrapped to ALLOC/FREE local macroses that depend on FRONTEND define. As
it's done for example. in src/common/hmac.c.

* If we're going to fix our code to be palloc/pfree friendly there's also a
way to pass our custom allocator callbacks inside ZSTD functions. By
default ZSTD uses malloc/free but it can be overwritten by the caller in
e.g. ZSTD_createDStream_advanced(ZSTD_customMem customMem) versions of API
. IMHO that would be good. If a 3rd party component allows us to inject a
custom memory allocator and we do have it why not use this feature?

* Most zs_foo() functions do not check ptr arguments, though some like
zs_free() do. If we're speaking about a "common API" that can be used by a
wide range of different modules around the project, such a relaxed way to
treat input arguments IMHO can be an evil. Or at least add Assert(ptr)
assertions so they can be catched up in debug mode.

* For zs_create() function which must be called first to create context for
further operations, a compression method is passed as integer. This method
is used inside zs_create() as index inside an array of compress
implementation descriptors. There are several problems:
1) Method ID value is not checked for validity. By passing an invalid
method ID we can easily get out of array bounds.
2) Content of array depends on on configuration options e.g. HAVE_LIBZSTD,
HAVE_LIBZ etc. So index inside this array is not persistent. For some
config options combination method ID with value 0 may refer to ZSTD and for
others to zlib.
3) There's no defines/enums/etc in public z_stream.h header that would
define which value should be passed to create a specific compressor. Users
have to guess it or check the code.

* I have a feeling after reading comments for ZSTD compress/decompress
functions that their return value is treated in a wrong way handling only
success path. Though need more checks/POCs on that.

In general, IMHO the idea of a generic compress/decompress API is very good
and useful. But specific implementation needs some code cleanup/refactoring.

--
Best Regards,
Ian Zagorskikh
CloudLinux: https://www.cloudlinux.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-04-21 15:40:07 Re: track_planning causing performance regression
Previous Message Fujii Masao 2021-04-21 15:28:11 Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?