Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-11 02:47:46
Message-ID: CAFiTN-sK1TbCpbFRDF+-TaG4G_hw4S4vUKujGepwg_A7On9UNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 11, 2021 at 2:21 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Mar 10, 2021 at 6:52 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > The pending comment is providing a way to rewrite a table and
> > re-compress the data with the current compression method.
>
> I spent some time poking at this yesterday and ran couldn't figure out
> what was going on here. There are two places where we rewrite tables.
> One is the stuff in cluter.c, which handles VACUUM FULL and CLUSTER.
> That eventually calls reform_and_rewrite_tuple(), which deforms the
> old tuple and creates a new one, but it doesn't seem like there's
> anything in there that would expand toasted values, whether external
> or inline compressed. But I think that can't be right, because it
> seems like then you'd end up with toast pointers into the old TOAST
> relation, not the new one, which would cause failures later. So I must
> be missing something here. The other place where we rewrite tables is
> in ATRewriteTable() as part of the ALTER TABLE machinery. I don't see
> anything there to force detoasting either.

We do expand the external values, see below call stack. See below call stack.

#0 detoast_external_attr (attr=0x2d61a80) at detoast.c:50
#1 0x000000000055bd53 in toast_tuple_init
#2 0x000000000050554d in heap_toast_insert_or_update
#3 0x000000000050ad5b in raw_heap_insert
#4 0x000000000050a9a1 in rewrite_heap_tuple
#5 0x0000000000502325 in reform_and_rewrite_tuple
#6 0x00000000004ff47c in heapam_relation_copy_for_cluster

But that is only if there are external attributes, we do nothing for
the inline compressed values. In raw_heap_insert only if
HeapTupleHasExternal(tup) is true then we call raw_heap_insert. So if
we want to do something about inline compressed data then we will have
to do something in reform_and_rewrite_tuple because there we deform
and form the tuple again so we have an opportunity to decompress any
compressed data.

> But, I am not really convinced that we need to solve this problem by
> adding new ALTER TABLE syntax. I'd be happy enough if CLUSTER, VACUUM
> FULL, and versions of ALTER TABLE that already force a rewrite would
> cause the compression to be redone also.

Okay, Maybe for directly compressed data we can do that without
affecting the performance of unrelated paths but I am again worried
about the composite type. Basically, if we have some row type then we
have to deform the tuple stored in the row type and process it fully.
Said that I think in the older version we already had a pacthes at
[1], basically the first 3 patches will ensure that we will never have
any compressed data in the composite type so we will not have to worry
about those as well.

Honestly, even if the user
> had to fall back on creating a new table and doing INSERT INTO newtab
> SELECT * FROM oldtab I would consider that to be not a total
> showstopper for this .. assuming of course that it actually works. If
> it doesn't, we have big problems. Even without the pg_am stuff, we
> still need to make sure that we don't just blindly let compressed
> values wander around everywhere. When we insert into a table column
> with a compression method, we should recompress any data that is
> compressed using some other method.

Well, it used to work in the older version[1] so we can make it work
here as well without much effort.

[1] https://www.postgresql.org/message-id/CAFiTN-u%3D2-qaLTod3isQmXuSU0s0_bR%2BRcUQL-vSvH%3DMJbEd7Q%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-03-11 02:52:07 Re: About to add WAL write/fsync statistics to pg_stat_wal view
Previous Message houzj.fnst@fujitsu.com 2021-03-11 02:44:36 RE: Parallel INSERT (INTO ... SELECT ...)