Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-06 15:29:16
Message-ID: CAFiTN-uxeRxCROr50e62eqog0nAi+FFi1m25f_D5D0-QsdDp1Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 4, 2021 at 4:03 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Mar 4, 2021 at 2:49 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > Does this patch need to do something about ExtractReplicaIdentity()?
> > If there are compressed fields in the tuple being built, can we rely
> > on the decompression engine being available at the time we need to do
> > something with the tuple?
>
> We log the replica identity tuple in the WAL so that the walsender can
> later stream it to the subscriber, and before sending it to the
> subscriber we have to detoast all the data anyway. That said, I think
> the problem you are worried about is not only with the 'replica
> identity tuple' but with any tuple. I mean, we copy the compressed
> field as it is into the WAL, so suppose we copy some fields that are
> compressed with lz4 and then restart the server with another binary
> that is compiled without lz4. Now the problem is that the walsender
> cannot decompress those fields.

Based on the off-list discussion with Robert, there are a couple of
problems that might be very difficult to handle if we support custom
compression methods using access methods. The major problems are
1) compressed data inside composite types and 2) the access method
might get dropped before the walsender decodes the compressed data. I
think we still have some solution for the first problem, although it
may impact performance in some cases, i.e. extended records. But the
problem of compressed data inside the WAL is a bigger one. So as of
now we are planning to go ahead only with the built-in methods, and if
we are only continuing with the built-in methods, it doesn't make
sense to keep the access method infrastructure. I have rewritten the
patches without using access methods for compression. Changes in the
patches:

- Removed the dependency on access methods for compression completely.
- When moving a tuple from one table to another table with a
different compression method, there is no need to compare the
compression methods and decompress.
- ALTER TABLE ... SET COMPRESSION will not rewrite the old data, so
only new tuples will be compressed with the new compression method.
- No preserve.
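
The behaviour in the list above can be illustrated with a toy model. It is a hypothetical sketch, not code from the patches, and it assumes that each stored compressed value records the method used to compress it; under that assumption, neither changing a column's compression method nor copying a value into another table needs to touch existing compressed data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical method tags, for illustration only. */
typedef enum { METHOD_PGLZ, METHOD_LZ4 } CompressionMethod;

/* Every stored compressed value remembers which method produced it. */
typedef struct {
    CompressionMethod method;
} StoredValue;

#define MAX_ROWS 16

/* A toy single-column table; the column setting only affects new values. */
typedef struct {
    CompressionMethod column_method;
    StoredValue rows[MAX_ROWS];
    size_t nrows;
} ToyTable;

/* Newly inserted values are compressed with the column's current method. */
static void insert_new_value(ToyTable *t)
{
    assert(t->nrows < MAX_ROWS);
    t->rows[t->nrows].method = t->column_method;
    t->nrows++;
}

/* Models ALTER TABLE ... SET COMPRESSION: no rewrite, old rows untouched. */
static void set_compression(ToyTable *t, CompressionMethod m)
{
    t->column_method = m;
}

/* Models moving an already-compressed value into another table: it is
 * copied as is, without comparing methods or decompressing. */
static void copy_value(ToyTable *dst, const StoredValue *v)
{
    assert(dst->nrows < MAX_ROWS);
    dst->rows[dst->nrows] = *v;
    dst->nrows++;
}
```

One consequence in this model is that a single column can end up holding a mix of pglz- and lz4-compressed values, so reading old data still depends on every method it was written with being available in the build.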

I feel the built-in method patch now looks cleaner and smaller than it
was before.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v31-0001-Built-in-compression-method.patch text/x-patch 91.0 KB
v31-0002-Add-default_toast_compression-GUC.patch text/x-patch 7.8 KB
v31-0004-default-to-with-lz4.patch text/x-patch 1.7 KB
v31-0003-Alter-table-set-compression.patch text/x-patch 19.9 KB
