Re: In-placre persistance change of a relation

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: Jakub(dot)Wartak(at)tomtom(dot)com
Cc: tsunakawa(dot)takay(at)fujitsu(dot)com, osumi(dot)takamichi(at)fujitsu(dot)com, sfrost(at)snowman(dot)net, masao(dot)fujii(at)oss(dot)nttdata(dot)com, ashutosh(dot)bapat(dot)oss(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: In-placre persistance change of a relation
Date: 2021-12-22 06:13:27
Message-ID: 20211222.151327.439673660364783186.horikyota.ntt@gmail.com
Lists: pgsql-hackers

Hello, Jakub.

At Tue, 21 Dec 2021 13:07:28 +0000, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com> wrote in
> So what's suspicious is that 122880 -> 0 file size truncation. I've investigated WAL and it seems to contain TRUNCATE records
> after logged FPI images, so when the crash recovery would kick in it probably clears this table (while it shouldn't).

Darn. That was a silly mistake: I issued the truncate records for the
function's target relation (rel) instead of the relation being
processed at that point (r).

> However if I perform CHECKPOINT just before crash the WAL stream contains just RUNNING_XACTS and CHECKPOINT_ONLINE
> redo records, this probably prevents truncating. I'm newbie here so please take this theory with grain of salt, it can be
> something completely different.

That is because the WAL records are inconsistent with the on-disk state.
If a crash happens after the SET LOGGED but before the next checkpoint,
recovery replays the broken WAL records. Once a checkpoint has
completed, the on-disk state is persisted and the broken WAL records
are no longer replayed.
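
To illustrate why the checkpoint hides the problem, here is a toy model
of crash recovery. It is only a sketch: DemoWalRec, demo_recover and the
LSN values are made up for illustration and are not PostgreSQL code. The
one thing it takes from the real system is that recovery replays only
records at or after the redo point of the last checkpoint.

```c
#include <stddef.h>

#define DEMO_BLCKSZ 8192u		/* PostgreSQL's default block size */

/* Hypothetical record kinds; real WAL records carry far more detail. */
typedef enum { DEMO_FPI, DEMO_TRUNCATE } DemoKind;

typedef struct DemoWalRec
{
	unsigned	lsn;			/* position of the record in the toy WAL */
	DemoKind	kind;
	unsigned	blkno;			/* used by DEMO_FPI only */
} DemoWalRec;

/*
 * Replay every record at or after redo_lsn onto a file that currently
 * has "size" bytes on disk, and return the resulting file size.
 */
unsigned
demo_recover(const DemoWalRec *wal, size_t n, unsigned redo_lsn, unsigned size)
{
	for (size_t i = 0; i < n; i++)
	{
		if (wal[i].lsn < redo_lsn)
			continue;			/* already covered by the checkpoint */

		if (wal[i].kind == DEMO_TRUNCATE)
			size = 0;			/* the broken record empties the file */
		else if ((wal[i].blkno + 1) * DEMO_BLCKSZ > size)
			size = (wal[i].blkno + 1) * DEMO_BLCKSZ;
	}
	return size;
}
```

With an FPI for block 14 followed by a stray truncate record, replaying
from a redo point before both records leaves a 0-byte file, while a redo
point after them leaves the 122880 bytes from your report untouched.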

The following fix works.

--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5478,7 +5478,7 @@ RelationChangePersistence(AlteredTableInfo *tab, char persistence,
     xl_smgr_truncate xlrec;

     xlrec.blkno = 0;
-    xlrec.rnode = rel->rd_node;
+    xlrec.rnode = r->rd_node;
     xlrec.flags = SMGR_TRUNCATE_ALL;
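
The shape of the mistake can be shown in a standalone sketch. Everything
here is a hypothetical stand-in (DemoRel, DemoTruncRec and
demo_build_truncate_recs are not the PostgreSQL structs or functions);
only the names rel, r, rd_node, rnode and blkno mirror the diff above.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a relation and a truncate record. */
typedef struct DemoRel
{
	unsigned	rd_node;
} DemoRel;

typedef struct DemoTruncRec
{
	unsigned	rnode;
	unsigned	blkno;
} DemoTruncRec;

/*
 * Build one truncate record per relation in rels[].  "rel" is the
 * function's target relation; it is passed in only to mirror the
 * situation where both rel and r are in scope inside the loop.
 */
void
demo_build_truncate_recs(const DemoRel *rel, const DemoRel *rels,
						 size_t n, DemoTruncRec *out)
{
	(void) rel;					/* deliberately unused, see below */

	for (size_t i = 0; i < n; i++)
	{
		const DemoRel *r = &rels[i];

		out[i].blkno = 0;
		/* The buggy version used rel->rd_node here. */
		out[i].rnode = r->rd_node;
	}
}
```

With rel->rd_node in place of r->rd_node, every record would point at
the target relation, which would explain truncate records for a
relation that should have been left alone.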

I made another change in this version. Previously, btree was the only
index AM processed in the in-place manner. In this version we do that
for all AMs except GiST. Maybe if gistGetFakeLSN behaved the same way
for permanent and unlogged indexes, we could skip the index rebuild in
exchange for some extra WAL records emitted while the index is unlogged.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v11-0001-In-place-table-persistence-change.patch text/x-patch 75.3 KB
v11-0002-New-command-ALTER-TABLE-ALL-IN-TABLESPACE-SET-LO.patch text/x-patch 11.2 KB