Re: [HACKERS] Custom compression methods

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, Ildar Musin <i(dot)musin(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Евгений Шишкин <itparanoia(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Chapman Flack <chap(at)anastigmatix(dot)net>
Subject: Re: [HACKERS] Custom compression methods
Date: 2018-04-23 16:34:38
Message-ID: 03c376ed-839f-35f4-5f03-35b21b47e9a2@postgrespro.ru
Lists: pgsql-hackers

On 23.04.2018 18:32, Alexander Korotkov wrote:
> But that's the main goal of this patch: to let somebody implement their
> own compression
> algorithm which best fits a particular dataset.

Hmmm... Frankly speaking, I don't believe in this "somebody".

>
> From my point of view the main value of this patch is that it
> allows replacing the pglz algorithm with a more efficient one, for
> example zstd.
> On some data sets zstd provides a more than 10 times better
> compression ratio and at the same time is faster than pglz.
>
>
> Not exactly.  If we want to replace pglz with a more efficient one, then
> we should
> just replace pglz with the better algorithm.  Pluggable compression
> methods are
> definitely not worth it for just replacing pglz with zstd.

As far as I understand, it is not possible for many reasons (portability,
patents, ...) to replace pglz with zstd.
I think that even replacing pglz with zlib (which is much worse than
zstd) would not be accepted by the community.
So from my point of view the main advantage of custom compression methods
is to replace the built-in pglz compression with a more advanced one.

>  Some blob-like datatypes might not be long enough to let generic
> compression algorithms like zlib or zstd train a dictionary.  For example,
> MySQL successfully utilizes column-level dictionaries for JSON [1].  Also
> JSON(B) might utilize some compression which lets the user extract
> particular attributes without decompression of the whole document.

Well, I am not an expert in compression.
But I will be very surprised if somebody shows me a real example
with a large enough compressed data buffer (>2kb) where some specialized
algorithm provides a significantly better compression ratio than an
advanced universal compression algorithm.

Also, maybe I missed something, but the current compression API doesn't
support partial extraction (extracting some particular attribute or range).
If we really need it, then it should be expressed in the custom compressor
API. But I am not sure how frequently it will be needed.
Large values are split into 2kb TOAST chunks. With compression that can
be about 4-8kb of raw data. IMHO storing larger JSON objects is a database
design flaw.
And taking into account that for JSONB we also need to extract the header
(so at least two chunks), the advantages of partial JSONB decompression
become even more doubtful.
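To make the chunk arithmetic concrete, here is a toy calculation (the ~2000-byte chunk payload is an approximation; the real TOAST chunk size depends on the block size):

```python
CHUNK = 2000  # rough TOAST chunk payload size (assumption; varies with page size)

def chunks_needed(offset, length, chunk=CHUNK):
    """Sequential chunks that must be fetched to read bytes
    [offset, offset + length) of the stored datum."""
    first = offset // chunk
    last = (offset + length - 1) // chunk
    return last - first + 1

# Reading even a small JSONB header at the start costs one chunk, and any
# attribute crossing the first chunk boundary costs two or more fetches.
header_chunks = chunks_needed(0, 200)
attr_chunks = chunks_needed(1900, 300)
print(header_chunks, attr_chunks)
```

So partial decompression only pays off when it lets the reader skip whole chunks, which for values of just a few chunks leaves little room for savings.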

>>
> I do not think that assigning the default compression method through
> a GUC is such a bad idea.
>
>
> It's probably not so bad, but it's a different story. Unrelated to
> this patch, I think.

Maybe. But in any case, there are several directions where compression
can be used:
- custom compression algorithms
- libpq compression
- page level compression
...

and they should all somehow finally be "married" with each other.

>
> I think streaming compression seems like a completely different story.
> Client-server traffic compression is not just a server feature.  It must
> also be supported at the client side.  And I really doubt it should be
> pluggable.
>
> In my opinion, you propose good things like compression of WAL
> with a better algorithm and compression of client-server traffic.
> But I think those features are unrelated to this patch and should
> be considered separately.  They are not features which should be
> added to this patch.  Regarding this patch, the points you provided
> seem more like criticism of the general idea.
>
> I think the problem of this patch is that it lacks a good example.
> It would be nice if Ildus implemented simple compression with a
> column-defined dictionary (like [1] does), and showed its efficiency
> on real-life examples, which can't be achieved with generic
> compression methods (like zlib or zstd).  That would be a good
> answer to the criticism you provide.
>
> *Links*
>
> 1.
> https://www.percona.com/doc/percona-server/LATEST/flexibility/compressed_columns.html
>
> ------
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> <http://www.postgrespro.com/>
> The Russian Postgres Company
>
Sorry, I am really looking at this patch from a different angle,
and this is why I have some doubts about the general idea.
Postgres allows one to define custom types, access methods, ...
But do you know any production system using special data types or
custom indexes which are not included in the standard Postgres distribution
or in popular extensions (like PostGIS)?

IMHO end users do not have the skills and time to create their own
compression algorithms. And without knowledge of the specifics of a
particular data set,
it is very hard to implement something more efficient than a universal
compression library.
But if you think that this is not the right place and time to discuss it,
I do not insist.

But in any case, I think it would be useful to provide some more
examples of custom compression API usage.
From my point of view the most useful would be an integration with zstd.
But if it is possible to find some example of a data-specific compression
algorithm which shows better results than universal compression,
that would be even more impressive.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
