Re: [HACKERS] Custom compression methods (mac+lz4.h)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods (mac+lz4.h)
Date: 2021-03-22 15:29:10
Message-ID: CA+TgmoaG_p5VkYzAGQ0Ndo8PWPOm5Y55UoqBvkqSfoQmG=W1TA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 22, 2021 at 10:41 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Okay, the fix makes sense. In fact, IMHO, in general also this fix
> > looks like an optimization, I mean when slicelength >=
> > VARRAWSIZE_4B_C(value), then why do we need to allocate extra memory
> > even in the case of pglz. So shall we put this check directly in
> > toast_decompress_datum_slice instead of handling it at the lz4 level?
>
> Yeah, I thought about that too, but do we want to assume that
> VARRAWSIZE_4B_C is the correct way to get the decompressed size
> for all compression methods?

I think it's OK to assume this. If and when we add a third compression
method, it seems certain to just grab one of the two remaining bit
patterns. Now, things get a bit more complicated if and when we want
to add a fourth method, because at that point you've got to ask
yourself how comfortable you feel about stealing the last bit pattern
for your feature. But, if the solution to that problem were to decide
that whenever that last bit pattern is used, we will add an extra byte
(or word) after va_tcinfo indicating the real compression method, then
using VARRAWSIZE_4B_C here would still be correct. To imagine this
decision being wrong, you have to posit a world in which one of the
two remaining bit patterns for the high 2 bits causes the low 30 bits
to be interpreted as something other than the size, which I guess is
not totally impossible, but my first reaction is to think that such a
design would be (1) hard to make work and (2) unnecessarily painful.
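
To make the layout concrete, here is a minimal sketch of the scheme described above: the high 2 bits of a 32-bit info word select the compression method and the low 30 bits hold the raw size. All SKETCH_/sketch_ names are hypothetical and are not the actual PostgreSQL macros; the assignment of pattern 3 to an "extended" method is only the possibility floated above, not a decision.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; they only mirror the layout discussed in this mail. */
#define SKETCH_SIZE_BITS 30
#define SKETCH_SIZE_MASK ((UINT32_C(1) << SKETCH_SIZE_BITS) - 1)

/* The four possible values of the high 2 bits. */
enum sketch_method_kind
{
    SKETCH_PGLZ = 0,     /* first existing method */
    SKETCH_LZ4 = 1,      /* second existing method */
    SKETCH_UNUSED = 2,   /* free for a third method */
    SKETCH_EXTENDED = 3  /* assumed: real method stored in an extra byte */
};

/* Pack a method id and a raw (uncompressed) size into one info word. */
static inline uint32_t
sketch_pack(uint32_t method, uint32_t rawsize)
{
    return (method << SKETCH_SIZE_BITS) | (rawsize & SKETCH_SIZE_MASK);
}

/* The low 30 bits are the raw size, whatever the high 2 bits say. */
static inline uint32_t
sketch_rawsize(uint32_t tcinfo)
{
    return tcinfo & SKETCH_SIZE_MASK;
}

/* The high 2 bits are the method id. */
static inline uint32_t
sketch_method_id(uint32_t tcinfo)
{
    return tcinfo >> SKETCH_SIZE_BITS;
}

/*
 * Method-independent check: if the requested slice is at least as long
 * as the whole datum, a caller could skip slice handling and decompress
 * the full datum instead.
 */
static inline bool
sketch_slice_covers_datum(uint32_t tcinfo, uint32_t slicelength)
{
    return slicelength >= sketch_rawsize(tcinfo);
}
```

Because sketch_rawsize() never looks at the high bits, the slice check gives the same answer for pglz, lz4, and the hypothetical extended pattern, which is why the check could live above the per-method code.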

> (If so, I think it would be better style to have a less opaque macro
> name for the purpose.)

Complaining about the name of one particular TOAST-related macro
seems a bit like complaining about the greenhouse gases emitted by
one particular car. They're pretty uniformly terrible. Does anyone
really know when to use VARATT_IS_1B_E or VARATT_IS_4B_U or any of
that cruft? Like, who decided that "is this varatt 1B E?" would be a
perfectly reasonable way of asking "is this varlena a TOAST
pointer?" While I'm complaining, it's hard to say enough bad things
about the fact that we have 12 consecutive completely obscure macro
definitions for which the only comments are (a) that they are
endian-dependent - which isn't even true for all of them - and (b)
that they are "considered internal." Apparently, they're SO internal
that they don't even need to be understandable to other developers.

Anyway, this particular macro name was chosen, it seems, for symmetry
with VARDATA_4B_C, but if you want to change it to something else, I'm
OK with that, too.

--
Robert Haas
EDB: http://www.enterprisedb.com
