Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2020-10-28 07:46:31
Message-ID: CAFiTN-sySfmVMiiNEGo8QOirmzXw3P4n8+DUXnt85=eNNfpD4w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 27, 2020 at 10:54 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Oct 22, 2020 at 5:56 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Thu, Oct 22, 2020 at 10:41 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Oct 22, 2020 at 2:11 AM Tomas Vondra
> > > <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > > >
> > > > On Wed, Oct 21, 2020 at 01:59:50PM +0530, Dilip Kumar wrote:
> > > > >On Sat, Oct 17, 2020 at 11:34 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > >>
> > > > >> On Tue, Oct 13, 2020 at 10:30 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > >> >
> > > > >> > On Mon, Oct 12, 2020 at 7:32 PM Tomas Vondra
> > > > >> > <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > > > >> > >
> > > > >> > > On Mon, Oct 12, 2020 at 02:28:43PM +0530, Dilip Kumar wrote:
> > > > >> > > >
> > > > >> > > >> ...
> > > > >> > > >
> > > > >> > > >I have worked on this patch, so as discussed now I am maintaining the
> > > > >> > > >preserved compression methods using dependency. Still PRESERVE ALL
> > > > >> > > >syntax is not supported, I will work on that part.
> > > > >> > > >
> > > > >> > >
> > > > >> > > Cool, I'll take a look. What's your opinion on doing it this way? Do you
> > > > >> > > think it's cleaner / more elegant, or is it something contrary to what
> > > > >> > > the dependencies are meant to do?
> > > > >> >
> > > > >> > I think this looks much cleaner. Moreover, I feel that once we start
> > > > >> > supporting the custom compression methods then we anyway have to
> > > > >> > maintain the dependency so using that for finding the preserved
> > > > >> > compression method is good option.
> > > > >>
> > > > >> I have also implemented the next set of patches.
> > > > >> 0004 -> Provide a way to create custom compression methods
> > > > >> 0005 -> Extention to implement lz4 as a custom compression method.
> > > > >
> > > > >In the updated version I have worked on some of the listed items
> > > > >> A pending list of items:
> > > > >> 1. Provide support for handling the compression option
> > > > >> - As discussed up thread I will store the compression option of the
> > > > >> latest compression method in a new field in pg_atrribute table
> > > > >> 2. As of now I have kept zlib as the second built-in option and lz4 as
> > > > >> a custom compression extension. In Offlist discussion with Robert, he
> > > > >> suggested that we should keep lz4 as the built-in method and we can
> > > > >> move zlib as an extension because lz4 is faster than zlib so better to
> > > > >> keep that as the built-in method. So in the next version, I will
> > > > >> change that. Any different opinion on this?
> > > > >
> > > > >Done
> > > > >
> > > > >> 3. Improve the documentation, especially for create_compression_method.
> > > > >> 4. By default support table compression method for the index.
> > > > >
> > > > >Done
> > > > >
> > > > >> 5. Support the PRESERVE ALL option so that we can preserve all
> > > > >> existing lists of compression methods without providing the whole
> > > > >> list.
> > > > >
> > > > >1,3,5 points are still pending.
> > > > >
> > > >
> > > > Thanks. I took a quick look at the patches and I think it seems fine. I
> > > > have one question, though - toast_compress_datum contains this code:
> > > >
> > > >
> > > > /* Call the actual compression function */
> > > > tmp = cmroutine->cmcompress((const struct varlena *) value);
> > > > if (!tmp)
> > > > return PointerGetDatum(NULL);
> > > >
> > > >
> > > > Shouldn't this really throw an error instead? I mean, if the compression
> > > > library returns NULL, isn't that an error?
> > >
> > > I don't think that we can throw an error here because pglz_compress
> > > might return -1 if it finds that it can not reduce the size of the
> > > data and we consider such data as "incompressible data" and return
> > > NULL. In such a case the caller will try to compress another
> > > attribute of the tuple. I think we can handle such cases in the
> > > specific handler functions.
> >
> > I have added the compression failure error in lz4.c, please refer
> > lz4_cmcompress in v9-0001 patch. Apart from that, I have also
> > supported the PRESERVE ALL syntax to preserve all the existing
> > compression methods. I have also rebased the patch on the current
> > head.
>
> I have added the next patch to support the compression options. I am
> storing the compression options only for the latest compression
> method. Basically, based on this design we would be able to support
> the options which are used only for compressions. As of now, the
> compression option infrastructure is added and the compression options
> for inbuilt method pglz and the external method zlib are added. Next,
> I will work on adding the options for the lz4 method.

In the attached patch set I have also included the compression option
support for lz4. As of now, I have only supported the acceleration
for LZ4_compress_fast. There is also support for the dictionary-based
compression but if we try to support that then we will need the
dictionary for decompression also. Since we are only keeping the
options for the current compression methods, we can not support
dictionary-based options as of now.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v11-0004-Create-custom-compression-methods.patch application/octet-stream 39.3 KB
v11-0005-new-compression-method-extension-for-zlib.patch application/octet-stream 9.8 KB
v11-0001-Built-in-compression-method.patch application/octet-stream 200.8 KB
v11-0003-Add-support-for-PRESERVE.patch application/octet-stream 40.4 KB
v11-0002-alter-table-set-compression.patch application/octet-stream 12.8 KB
v11-0006-Support-compression-methods-options.patch application/octet-stream 43.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo NAGATA 2020-10-28 08:26:40 Re: Implementing Incremental View Maintenance
Previous Message Amit Langote 2020-10-28 07:46:01 Re: partition routing layering in nodeModifyTable.c