Re: [HACKERS] Custom compression methods

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, Ildar Musin <i(dot)musin(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Евгений Шишкин <itparanoia(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Chapman Flack <chap(at)anastigmatix(dot)net>
Subject: Re: [HACKERS] Custom compression methods
Date: 2018-04-22 13:21:31
Message-ID: CAPpHfdtmy_hN_-D9OJ-BwHb_PqUcKWq97kZPtOv6o1g065L8jw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 20, 2018 at 7:45 PM, Konstantin Knizhnik <
k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

> On 30.03.2018 19:50, Ildus Kurbangaliev wrote:
>
>> On Mon, 26 Mar 2018 20:38:25 +0300
>> Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru> wrote:
>>
>>> Attached rebased version of the patch. Fixed conflicts in pg_class.h.
>>
>> New rebased version due to conflicts in master. Also fixed a few errors
>> and removed the cmdrop method since it couldn't be tested.
>
> It seems to be useful (and not so difficult) to use custom compression
> methods also for WAL compression: replace direct calls of pglz_compress in
> xloginsert.c

I'm going to object to this point, and I have the following arguments:

1) WAL compression is much more critical for durability than datatype
compression. Imagine that a compression algorithm contains a bug which
causes the decompress method to segfault. In the case of datatype
compression, that would cause a crash on access to the affected value,
but the rest of the database would keep working, giving you
a chance to localize the issue and investigate it. In the case of
WAL compression, recovery would cause a server crash. That seems
to be a much more serious disaster. You wouldn't be able to get
your database up and running, and the same would happen on the standby.

2) The idea of custom compression methods is that some columns may
have a specific data distribution, which could be handled better with
a particular compression method and particular parameters. In
WAL compression you're dealing with the whole WAL stream, containing
all the values from the database cluster. Moreover, if custom compression
methods are defined for columns, then the WAL stream contains values
already compressed in the most efficient way. However, it might
turn out that some compression method is better for WAL in the general
case (there are benchmarks showing our pglz is not very good in
comparison to the alternatives). But in that case I would prefer to just
switch our WAL to a different compression method one day. Thankfully,
we don't preserve WAL compatibility between major releases.

3) This patch provides custom compression methods recorded in
the catalog. During recovery you don't have access to the system
catalog, because it's not recovered yet, so you can't fetch compression
method metadata from there. One possibility is to have a GUC
which stores the shared module and function names for WAL compression.
But that seems like quite a different mechanism from the one present
in this patch.
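
To illustrate how different the GUC-based mechanism in point 3 would be,
here is a rough standalone sketch of resolving a WAL compression routine
from a plain "library:function" setting string, with no catalog lookup.
Everything in it is hypothetical and not part of the patch or of
PostgreSQL: the setting format, the wal_compress_fn signature, the
resolve_wal_compression name, and the static table, which only stands in
for loading a shared module.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for a WAL compression routine (an assumption). */
typedef int (*wal_compress_fn) (const char *src, size_t srclen,
                                char *dst, size_t dstcap);

/* Stand-in "compressor" that copies the input unchanged. */
int
copy_compress(const char *src, size_t srclen, char *dst, size_t dstcap)
{
    if (srclen > dstcap)
        return -1;              /* destination buffer too small */
    memcpy(dst, src, srclen);
    return (int) srclen;
}

/* One known (library, function) pair; replaces real module loading here. */
typedef struct WalCompressionEntry
{
    const char *library;
    const char *function;
    wal_compress_fn fn;
} WalCompressionEntry;

static const WalCompressionEntry wal_methods[] = {
    {"builtin", "copy", copy_compress},
};

/*
 * Resolve a "library:function" setting to a routine using only the
 * setting string, which is all that is available before the catalog is
 * recovered. Returns NULL if the setting is malformed or unknown.
 */
wal_compress_fn
resolve_wal_compression(const char *setting)
{
    const char *sep;
    size_t      liblen;
    size_t      i;

    assert(setting != NULL);

    sep = strchr(setting, ':');
    if (sep == NULL || sep == setting || sep[1] == '\0')
        return NULL;            /* missing separator or empty part */
    liblen = (size_t) (sep - setting);

    for (i = 0; i < sizeof(wal_methods) / sizeof(wal_methods[0]); i++)
    {
        const WalCompressionEntry *e = &wal_methods[i];

        if (strlen(e->library) == liblen &&
            strncmp(e->library, setting, liblen) == 0 &&
            strcmp(e->function, sep + 1) == 0)
            return e->fn;
    }
    return NULL;
}
```

Nothing here consults per-column metadata, which is why this looks like a
separate mechanism rather than an extension of the catalog-based one.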

Taking into account all of the above, I think we should give up on custom
WAL compression methods. Or, at least, consider them unrelated to this
patch.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
