Re: [HACKERS] Custom compression methods

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-10 20:50:48
Message-ID: CA+TgmoY5gTwurzHM3Ofv-jVTXGkmJqYib_+B9ev_7JKop9NmjQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 10, 2021 at 6:52 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> The pending comment is providing a way to rewrite a table and
> re-compress the data with the current compression method.

I spent some time poking at this yesterday and couldn't figure out
what was going on here. There are two places where we rewrite tables.
One is the stuff in cluster.c, which handles VACUUM FULL and CLUSTER.
That eventually calls reform_and_rewrite_tuple(), which deforms the
old tuple and creates a new one, but it doesn't seem like there's
anything in there that would expand toasted values, whether external
or inline compressed. But I think that can't be right, because it
seems like then you'd end up with toast pointers into the old TOAST
relation, not the new one, which would cause failures later. So I must
be missing something here. The other place where we rewrite tables is
in ATRewriteTable() as part of the ALTER TABLE machinery. I don't see
anything there to force detoasting either.

That said, I think that using the word REWRITE may not really capture
what we're on about. Leaving aside the question of exactly what the
CLUSTER code does today, you could in theory rewrite the main table by
just taking all the tuples and putting them into a new relfilenode.
And then you could do the same thing with the TOAST table. And despite
having fully rewritten both tables, you wouldn't have done anything
that helps with this problem because you haven't deformed the tuples
at any point. Now as it happens we do have code -- in
reform_and_rewrite_tuple() -- that does deform and reform the tuples,
but it doesn't take care of this problem either. We might need to
distinguish between rewriting the table, which is mostly about getting
a new relfilenode, and some other word that means doing this.

But, I am not really convinced that we need to solve this problem by
adding new ALTER TABLE syntax. I'd be happy enough if CLUSTER, VACUUM
FULL, and versions of ALTER TABLE that already force a rewrite would
cause the compression to be redone also. Honestly, even if the user
had to fall back on creating a new table and doing INSERT INTO newtab
SELECT * FROM oldtab I would consider that to be not a total
showstopper for this, assuming of course that it actually works. If
it doesn't, we have big problems. Even without the pg_am stuff, we
still need to make sure that we don't just blindly let compressed
values wander around everywhere. When we insert into a table column
with a compression method, we should recompress any data that is
compressed using some other method.
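
To make that last rule concrete, here is a minimal sketch in C. It is a
toy model, not PostgreSQL code: the CompressionMethod enum, the ToyDatum
struct, and both function names are hypothetical stand-ins invented for
illustration, and the "recompression" is simulated by relabeling the
value rather than by running a real compressor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from PostgreSQL. */
typedef enum
{
    CM_NONE = 0,                /* value is not compressed */
    CM_PGLZ,
    CM_LZ4
} CompressionMethod;

typedef struct
{
    CompressionMethod method;   /* how the payload is currently compressed */
} ToyDatum;

/*
 * Decide whether a value arriving at a column has to be decompressed and
 * compressed again. Only values that are already compressed with a method
 * other than the column's method qualify; uncompressed values are left to
 * whatever the normal storage path would do with them.
 */
static bool
needs_recompression(const ToyDatum *value, CompressionMethod column_method)
{
    assert(value != NULL);

    if (value->method == CM_NONE)
        return false;

    return value->method != column_method;
}

/*
 * Simulate storing a value into a column. The relabeling below stands in
 * for "decompress with the old method, compress with the column's method".
 * Returns the method the stored value ends up with.
 */
static CompressionMethod
store_value(ToyDatum *value, CompressionMethod column_method)
{
    if (needs_recompression(value, column_method))
        value->method = column_method;

    return value->method;
}
```

In this model, a value compressed with CM_PGLZ that is inserted into a
CM_LZ4 column comes out as CM_LZ4, while a value that is already CM_LZ4
is stored untouched. The point is only that the check happens at insert
time, so a compressed value never lands in a column under a method other
than that column's own.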

--
Robert Haas
EDB: http://www.enterprisedb.com
