Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2020-10-22 12:26:55
Message-ID: CAFiTN-tn_TLqxTuP8P0pQEt9Zth+DKQTxd-uMGCTpdZXDovKxg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 22, 2020 at 10:41 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Oct 22, 2020 at 2:11 AM Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >
> > On Wed, Oct 21, 2020 at 01:59:50PM +0530, Dilip Kumar wrote:
> > >On Sat, Oct 17, 2020 at 11:34 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >>
> > >> On Tue, Oct 13, 2020 at 10:30 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >> >
> > >> > On Mon, Oct 12, 2020 at 7:32 PM Tomas Vondra
> > >> > <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > >> > >
> > >> > > On Mon, Oct 12, 2020 at 02:28:43PM +0530, Dilip Kumar wrote:
> > >> > > >
> > >> > > >> ...
> > >> > > >
> > >> > > >I have worked on this patch; as discussed, I am now maintaining the
> > >> > > >preserved compression methods using dependencies. The PRESERVE ALL
> > >> > > >syntax is still not supported; I will work on that part.
> > >> > > >
> > >> > >
> > >> > > Cool, I'll take a look. What's your opinion on doing it this way? Do you
> > >> > > think it's cleaner / more elegant, or is it something contrary to what
> > >> > > the dependencies are meant to do?
> > >> >
> > >> > I think this looks much cleaner. Moreover, once we start
> > >> > supporting custom compression methods we will have to maintain
> > >> > the dependency anyway, so using it to find the preserved
> > >> > compression methods is a good option.
> > >>
> > >> I have also implemented the next set of patches.
> > >> 0004 -> Provide a way to create custom compression methods
> > >> 0005 -> Extension to implement lz4 as a custom compression method.
> > >
> > >In the updated version I have worked on some of the listed items
> > >> A pending list of items:
> > >> 1. Provide support for handling the compression option
> > >> - As discussed upthread, I will store the compression option of the
> > >> latest compression method in a new field in the pg_attribute table
> > >> 2. As of now I have kept zlib as the second built-in option and lz4 as
> > >> a custom compression extension. In an off-list discussion with Robert, he
> > >> suggested that we should keep lz4 as the built-in method and we can
> > >> move zlib as an extension because lz4 is faster than zlib so better to
> > >> keep that as the built-in method. So in the next version, I will
> > >> change that. Any different opinion on this?
> > >
> > >Done
> > >
> > >> 3. Improve the documentation, especially for create_compression_method.
> > >> 4. By default support table compression method for the index.
> > >
> > >Done
> > >
> > >> 5. Support the PRESERVE ALL option so that we can preserve all
> > >> existing lists of compression methods without providing the whole
> > >> list.
> > >
> > >Points 1, 3, and 5 are still pending.
> > >
> >
> > Thanks. I took a quick look at the patches and I think it seems fine. I
> > have one question, though - toast_compress_datum contains this code:
> >
> >
> >     /* Call the actual compression function */
> >     tmp = cmroutine->cmcompress((const struct varlena *) value);
> >     if (!tmp)
> >         return PointerGetDatum(NULL);
> >
> >
> > Shouldn't this really throw an error instead? I mean, if the compression
> > library returns NULL, isn't that an error?
>
> I don't think we can throw an error here, because pglz_compress
> may return -1 when it finds that it cannot reduce the size of the
> data; we treat such data as "incompressible data" and return
> NULL. In that case the caller will try to compress another
> attribute of the tuple. I think we can handle the error cases in the
> specific handler functions.

I have added the compression failure error in lz4.c; please refer to
lz4_cmcompress in the v9-0001 patch. Apart from that, I have also
added support for the PRESERVE ALL syntax to preserve all the existing
compression methods. I have also rebased the patch on the current
head.
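
To illustrate the convention discussed above (a NULL result means "incompressible, store as-is", while genuine failures are reported inside the handler), here is a minimal standalone sketch. It is not code from the patch: toy_datum, toy_rle_compress, toy_cmcompress and toy_store_size are hypothetical names, a toy run-length encoder stands in for pglz_compress, and abort() stands in for whatever error reporting a real handler would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a compressed datum; not PostgreSQL's varlena. */
typedef struct
{
    size_t        len;
    unsigned char data[];
} toy_datum;

/*
 * Toy run-length encoder standing in for a real compressor. Writes
 * (count, byte) pairs and returns the compressed size, or -1 when the
 * result would not be smaller than the input ("incompressible").
 */
static long
toy_rle_compress(const unsigned char *src, size_t srclen,
                 unsigned char *dst, size_t dstcap)
{
    size_t in = 0;
    size_t out = 0;

    while (in < srclen)
    {
        unsigned char byte = src[in];
        size_t        run = 1;

        while (in + run < srclen && src[in + run] == byte && run < 255)
            run++;
        if (out + 2 > dstcap)
            return -1;          /* output already as large as the input */
        dst[out++] = (unsigned char) run;
        dst[out++] = byte;
        in += run;
    }
    return (out < srclen) ? (long) out : -1;
}

/*
 * Hypothetical handler: returns NULL for incompressible input, which is
 * not an error. A genuine failure is handled here, inside the handler
 * (abort() is only a placeholder for real error reporting).
 */
static toy_datum *
toy_cmcompress(const unsigned char *src, size_t srclen)
{
    toy_datum *result;
    long       clen;

    assert(src != NULL);
    if (srclen == 0)
        return NULL;

    result = malloc(sizeof(toy_datum) + srclen);
    if (result == NULL)
        abort();                /* genuine failure, not "incompressible" */

    clen = toy_rle_compress(src, srclen, result->data, srclen);
    if (clen < 0)
    {
        free(result);
        return NULL;            /* caller keeps the value uncompressed */
    }
    result->len = (size_t) clen;
    return result;
}

/* Caller side: a NULL result simply means "store the original bytes". */
static size_t
toy_store_size(const unsigned char *src, size_t srclen)
{
    toy_datum *tmp = toy_cmcompress(src, srclen);
    size_t     stored;

    if (tmp == NULL)
        return srclen;
    stored = tmp->len;
    free(tmp);
    return stored;
}
```

With this split the caller never has to guess why it got NULL: a buffer of 64 identical bytes is stored in compressed form, while a short buffer of distinct bytes comes back as NULL and is stored unchanged.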

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment                                                Content-Type              Size
v9-0002-alter-table-set-compression.patch                 application/octet-stream  12.8 KB
v9-0001-Built-in-compression-method.patch                 application/octet-stream  201.1 KB
v9-0004-Create-custom-compression-methods.patch           application/octet-stream  39.3 KB
v9-0003-Add-support-for-PRESERVE.patch                    application/octet-stream  40.3 KB
v9-0005-new-compression-method-extension-for-zlib.patch   application/octet-stream  9.7 KB
