Re: [HACKERS] Custom compression methods

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2020-10-04 22:07:13
Message-ID: 20201004220713.6vlmm2e3amlz2dil@development
Lists: pgsql-hackers

Hi,

I took a look at this patch after a long time, and did a bit of review
and testing. I haven't re-read the whole thread since 2017, so some of
the following comments might be mistaken - sorry about that :-(

1) The "cmapi.h" naming seems unnecessarily short. I'd suggest simply
using "compression" or something like that. I see little reason to
shorten "compression" to "cm", or to prefix files with "cm_". For
example compression/cm_zlib.c might just be compression/zlib.c.

2) I see index_form_tuple does this:

    Datum cvalue = toast_compress_datum(untoasted_values[i],
                                        DefaultCompressionMethod);

which seems wrong - why shouldn't the indexes use the same compression
method as the underlying table?

3) dumpTableSchema in pg_dump.c does this:

    switch (tbinfo->attcompression[j])
    {
        case 'p':
            cmname = "pglz";
        case 'z':
            cmname = "zlib";
    }

which is broken as it's missing break statements, so 'p' falls through
and produces 'zlib'.
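
For illustration, here is a minimal sketch of a mapping without the fall-through. The helper name compression_method_name is hypothetical and not part of the patch; in pg_dump itself the smallest fix would presumably be a break after each assignment. Only the two codes shown above ('p' and 'z') are assumed.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the patch: map an attcompression
 * code to the compression method name. Every case returns, so control
 * can never fall through from 'p' into 'z'.
 */
static const char *
compression_method_name(char attcompression)
{
    switch (attcompression)
    {
        case 'p':
            return "pglz";
        case 'z':
            return "zlib";
        default:
            return NULL;    /* unknown code, the caller decides what to do */
    }
}
```

Returning from each case also forces a decision about unknown codes, which the original switch silently ignores.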

4) The name ExecCompareCompressionMethod is somewhat misleading, as the
function does not merely compare compression methods - it also
recompresses the data.

5) CheckCompressionMethodsPreserved should document what the return
value is (true when new list contains all old values, thus not requiring
a rewrite). Maybe "Compare" would be a better name?
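
To make the return value concrete, here is a rough sketch of the semantics I have in mind. It is not the patch's code: the function name is made up, and it assumes each method is a one-character code and each list is a NUL-terminated string of such codes.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for the check, not the patch's implementation.
 *
 * Returns true when every method in old_methods also appears in
 * new_methods, i.e. nothing was dropped and no table rewrite is needed.
 * Returns false as soon as one old method is missing from the new list.
 */
static bool
compression_methods_preserved(const char *old_methods,
                              const char *new_methods)
{
    for (const char *p = old_methods; *p != '\0'; p++)
    {
        if (strchr(new_methods, *p) == NULL)
            return false;   /* an old method was dropped: rewrite required */
    }

    return true;
}
```

With these assumptions, old "pz" against new "zp" gives true (order does not matter), while old "pz" against new "p" gives false because zlib was dropped.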

6) The new field in ColumnDef is missing a comment.

7) It's not clear to me what "partial list" in the PRESERVE docs means.

+ which of them should be kept on the column. Without PRESERVE or partial
+ list of compression methods the table will be rewritten.

8) The initial synopsis in alter_table.sgml includes the PRESERVE
syntax, but then later in the page it's omitted (yet the section talks
about the keyword).

9) attcompression ...

The main issue I see is what the patch does with attcompression. Instead
of just using it to store the compression method, it's also used to
store the preserved compression methods. And using NameData to store
this seems wrong too - if we really want to store this info, the correct
way is either using text[] or inventing charvector or similar.

But to me this seems very much like a misuse of attcompression to track
dependencies on compression methods, necessary because we don't have a
separate catalog listing compression methods. If we had that, I think we
could simply add dependencies between attributes and that catalog.

Moreover, having the catalog would allow adding compression methods
(from extensions etc) instead of just having a list of hard-coded
compression methods. Which seems like a strange limitation, considering
this thread is called "custom compression methods".

10) compression parameters?

I wonder if we could/should allow parameters, like compression level
(and maybe other stuff, depending on the compression method). PG13
allowed that for opclasses, so perhaps we should allow it here too.

11) pg_column_compression

When specifying a compression method not present in attcompression, we get
this error message and hint:

test=# alter table t alter COLUMN a set compression "pglz" preserve (zlib);
ERROR: "zlib" compression access method cannot be preserved
HINT: use "pg_column_compression" function for list of compression methods

but there is no pg_column_compression function, so the hint is wrong.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
