Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2020-10-05 07:48:14
Message-ID: CAFiTN-vp=or=1x=DjttV=WfLPsSWnKUVecL2PFGJHOqrrXTsrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 5, 2020 at 11:17 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> Thanks, Tomas for your feedback.
>
> > 9) attcompression ...
> >
> > The main issue I see is what the patch does with attcompression. Instead
> > of just using it to store a the compression method, it's also used to
> > store the preserved compression methods. And using NameData to store
> > this seems wrong too - if we really want to store this info, the correct
> > way is either using text[] or inventing charvector or similar.
>
> The reason for using the NameData is the get it in the fixed part of
> the data structure.
>
> > But to me this seems very much like a misuse of attcompression to track
> > dependencies on compression methods, necessary because we don't have a
> > separate catalog listing compression methods. If we had that, I think we
> > could simply add dependencies between attributes and that catalog.
>
> Basically, up to this patch, we are having only built-in compression
> methods and those can not be dropped so we don't need any dependency
> at all. We just want to know what is the current compression method
> and what is the preserve compression methods supported for this
> attribute. Maybe we can do it better instead of using the NameData
> but I don't think it makes sense to add a separate catalog?
>
> > Moreover, having the catalog would allow adding compression methods
> > (from extensions etc) instead of just having a list of hard-coded
> > compression methods. Which seems like a strange limitation, considering
> > this thread is called "custom compression methods".
>
> I think I forgot to mention while submitting the previous patch that
> the next patch I am planning to submit is, Support creating the custom
> compression methods wherein we can use pg_am catalog to insert the new
> compression method. And for dependency handling, we can create an
> attribute dependency on the pg_am row. Basically, we will create the
> attribute dependency on the current compression method AM as well as
> on the preserved compression methods AM. As part of this, we will
> add two build-in AMs for zlib and pglz, and the attcompression field
> will be converted to the oid_vector (first OID will be of the current
> compression method, followed by the preserved compression method's
> oids).
>
> > 10) compression parameters?
> >
> > I wonder if we could/should allow parameters, like compression level
> > (and maybe other stuff, depending on the compression method). PG13
> > allowed that for opclasses, so perhaps we should allow it here too.
>
> Yes, that is also in the plan. For doing this we are planning to add
> an extra column in the pg_attribute which will store the compression
> options for the current compression method. The original patch was
> creating an extra catalog pg_column_compression, therein it maintains
> the oid of the compression method as well as the compression options.
> The advantage of creating an extra catalog is that we can keep the
> compression options for the preserved compression methods also so that
> we can support the options which can be used for decompressing the
> data as well. Whereas if we want to avoid this extra catalog then we
> can not use that compression option for decompressing. But most of
> the options e.g. compression level are just for the compressing so it
> is enough to store for the current compression method only. What's
> your thoughts?
>
> Other comments look fine to me so I will work on them and post the
> updated patch set.

I have fixed the other comments except this,

> 2) I see index_form_tuple does this:
>
> Datum cvalue = toast_compress_datum(untoasted_values[i],
> DefaultCompressionMethod);

> which seems wrong - why shouldn't the indexes use the same compression
> method as the underlying table?

I will fix this in the next version of the patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v4-0001-Built-in-compression-method.patch application/octet-stream 187.2 KB
v4-0003-Add-support-for-PRESERVE.patch application/octet-stream 33.7 KB
v4-0002-alter-table-set-compression.patch application/octet-stream 11.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Borisov 2020-10-05 07:53:27 Re: [PATCH] Automatic HASH and LIST partition creation
Previous Message Rahila Syed 2020-10-05 06:36:49 Re: [PATCH] Automatic HASH and LIST partition creation