Re: [HACKERS] Custom compression methods

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-11 03:20:57
Message-ID: 20210311032057.GZ2021@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 11, 2021 at 08:17:46AM +0530, Dilip Kumar wrote:
> On Thu, Mar 11, 2021 at 2:21 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 10, 2021 at 6:52 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > The pending comment is providing a way to rewrite a table and
> > > re-compress the data with the current compression method.
> >
> > I spent some time poking at this yesterday and ran couldn't figure out
> > what was going on here. There are two places where we rewrite tables.
> > One is the stuff in cluter.c, which handles VACUUM FULL and CLUSTER.
> > That eventually calls reform_and_rewrite_tuple(), which deforms the
> > old tuple and creates a new one, but it doesn't seem like there's
> > anything in there that would expand toasted values, whether external
> > or inline compressed. But I think that can't be right, because it
> > seems like then you'd end up with toast pointers into the old TOAST
> > relation, not the new one, which would cause failures later. So I must
> > be missing something here. The other place where we rewrite tables is
> > in ATRewriteTable() as part of the ALTER TABLE machinery. I don't see
> > anything there to force detoasting either.
>
> We do expand the external values, see below call stack. See below call stack.
>
> #0 detoast_external_attr (attr=0x2d61a80) at detoast.c:50
> #1 0x000000000055bd53 in toast_tuple_init
> #2 0x000000000050554d in heap_toast_insert_or_update
> #3 0x000000000050ad5b in raw_heap_insert
> #4 0x000000000050a9a1 in rewrite_heap_tuple
> #5 0x0000000000502325 in reform_and_rewrite_tuple
> #6 0x00000000004ff47c in heapam_relation_copy_for_cluster
>
> But that is only if there are external attributes, we do nothing for
> the inline compressed values. In raw_heap_insert only if
> HeapTupleHasExternal(tup) is true then we call raw_heap_insert. So if
> we want to do something about inline compressed data then we will have
> to do something in reform_and_rewrite_tuple because there we deform
> and form the tuple again so we have an opportunity to decompress any
> compressed data.
>
> > But, I am not really convinced that we need to solve this problem by
> > adding new ALTER TABLE syntax. I'd be happy enough if CLUSTER, VACUUM
> > FULL, and versions of ALTER TABLE that already force a rewrite would
> > cause the compression to be redone also.
>
> Okay, Maybe for directly compressed data we can do that without
> affecting the performance of unrelated paths but I am again worried
> about the composite type. Basically, if we have some row type then we
> have to deform the tuple stored in the row type and process it fully.
> Said that I think in the older version we already had a pacthes at
> [1], basically the first 3 patches will ensure that we will never have
> any compressed data in the composite type so we will not have to worry
> about those as well.
>
> Honestly, even if the user
> > had to fall back on creating a new table and doing INSERT INTO newtab
> > SELECT * FROM oldtab I would consider that to be not a total
> > showstopper for this .. assuming of course that it actually works. If
> > it doesn't, we have big problems. Even without the pg_am stuff, we
> > still need to make sure that we don't just blindly let compressed
> > values wander around everywhere. When we insert into a table column
> > with a compression method, we should recompress any data that is
> > compressed using some other method.
>
> Well, it used to work in the older version[1] so we can make it work
> here as well without much effort.

Looking at v23-0002-alter-table-set-compression, ATRewriteTable() was calling
CompareCompressionMethodAndDecompress().

I think what's wanted here is that decompression should only happen when the
tuple uses a different compression than the column's currently set compression.
So there's no overhead in the usual case. I guess CLUSTER and INSERT SELECT
should do the same.

This is important to allow someone to get rid of LZ4 compression, if they want
to get rid of that dependency.

But it's also really desirable for admins to be able to "migrate" existing
data. People will want to test this, and I guess INSERT SELECT and CLUSTER are
the obvious ways.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-03-11 03:21:00 Re: shared-memory based stats collector
Previous Message Kyotaro Horiguchi 2021-03-11 03:11:19 Re: libpq debug log