Re: pglz performance

From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vladimir Leskov <vladimirlesk(at)yandex-team(dot)ru>
Subject: Re: pglz performance
Date: 2019-08-02 16:39:48
Message-ID: 20190802163948.i6mjypdgujeorrbi@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
> We have a rough "roadmap" for "extensible pglz". We plan to provide an implementation at the November CF.

I don't understand why it's a good idea to improve the compression side
of pglz. Plenty of other people have spent a lot of time developing
better compression algorithms.

> Currently, pglz starts with an empty cache map: there are no prior 4KB of data before the start. We can add an imaginary prefix of common substrings to any data: this will improve the compression ratio.
> It is hard to decide on a training data set for this "common prefix", so we want to produce an extension with an aggregate function that derives an "adapted common prefix" from the user's data.
> Then we can "reserve" a few negative bytes for "decompression commands". Such a command can instruct the database on which common prefix to use.
> But a system command can also say "invoke decompression from the extension".
>
> Thus, users will be able to train database compression on their data and substitute pglz compression with a custom compression method seamlessly.
>
> This will make a hard-coded compression choice unnecessary, though it seems overly hacky. On the other hand, there would be no need to have lz4, zstd, brotli, lzma and others in core. Why not provide e.g. "time series compression"? Or "DNA compression"? Whatever gun the user wants for their foot.

I think this is way too complicated, and will not provide much benefit
for the majority of users.

In fact, I'll argue that we should flat out reject any such patch until
we have at least one decent default compression algorithm in
core. You're trying to work around a poor compression algorithm with a
complicated dictionary improvement that requires user interaction, will
only work in a relatively small subset of cases, and will very often
increase compression times.

Greetings,

Andres Freund
