Re: [HACKERS] Custom compression methods

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-02-10 20:06:56
Message-ID: CA+Tgmoa3qzewoh9JvXVm5BEA_qKewK9hfJJhyAZFXYhsx9P-1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 10, 2021 at 9:52 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> [ new patches ]

I think that in both varattrib_4b and toast_internals.h it would be
better to pick a less generic field name. In toast_internals.h it's
just info; in postgres.h it's va_info. But:

[rhaas pgsql]$ git grep info | wc -l
24552

There are no references in the current source tree to va_info, so at
least that one is greppable, but it's still not very descriptive. I
suggest info -> tcinfo and va_info -> va_tcinfo, where "tc" stands for
"TOAST compression". Looking through 24552 references to info to find
the ones that pertain to this feature might take longer than searching
the somewhat shorter list of references to tcinfo, which prepatch is
just:

[rhaas pgsql]$ git grep tcinfo | wc -l
0

I don't see why we should allow for datum_decompress to be optional,
as toast_decompress_datum_slice does. Likely every serious compression
method will support that anyway. If not, the compression AM can deal
with the problem, rather than having the core code do it. That will
save some tiny amount of performance, too.

src/backend/access/compression/Makefile is missing a copyright header.

It's really sad that lz4_cmdecompress_slice allocates
VARRAWSIZE_4B_C(value) + VARHDRSZ rather than slicelength + VARHDRSZ
as pglz_cmdecompress_slice() does. Is that a mistake, or is that
necessary for some reason? If it's a mistake, let's fix it. If it's
necessary, let's add a comment about why, probably starting with
"Unfortunately, ....".

I think you have a fairly big problem with row types. Consider this example:

create table t1 (a int, b text compression pglz);
create table t2 (a int, b text compression lz4);
create table t3 (x t1);
insert into t1 values (1, repeat('foo', 1000));
insert into t2 values (1, repeat('foo', 1000));
insert into t3 select t1 from t1;
insert into t3 select row(a, b)::t1 from t2;

rhaas=# select pg_column_compression((t3.x).b) from t3;
pg_column_compression
-----------------------
pglz
lz4
(2 rows)

That's not good, because now

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Wieck 2021-02-10 21:32:21 Re: Extensibility of the PostgreSQL wire protocol
Previous Message Justin Pryzby 2021-02-10 20:04:58 Re: CLUSTER on partitioned index