Re: In-placre persistance change of a relation

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: stark(dot)cfm(at)gmail(dot)com
Cc: hlinnaka(at)iki(dot)fi, barwick(at)gmail(dot)com, jchampion(at)timescale(dot)com, pryzby(at)telsasoft(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, rjuju123(at)gmail(dot)com, jakub(dot)wartak(at)tomtom(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: In-placre persistance change of a relation
Date: 2023-03-17 06:16:34
Message-ID: 20230317.151634.1038632016265639446.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Fri, 03 Mar 2023 18:03:53 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> Correctly they are three parts. The attached patch is the first part -
> the storage mark files, which are used to identify storage files that
> have not been committed and should be removed during the next
> startup. This feature resolves the issue of orphaned storage files
> that may result from a crash occurring during the execution of a
> transaction involving the creation of a new table.
>
> I'll post all of the three parts shortly.

Mmm. It took longer than I said, but this is the patch set that
includes all three parts.

1. "Mark files" to prevent orphan storage files for in-transaction
created relations after a crash.

2. In-place persistence change: For ALTER TABLE SET LOGGED/UNLOGGED
with wal_level = minimal, and for ALTER TABLE SET UNLOGGED with other
wal_levels, the commands don't require a file copy of the relation
storage. ALTER TABLE SET LOGGED with a non-minimal wal_level emits
bulk FPIs instead of a bunch of individual INSERTs.

3. An extension to ALTER TABLE SET (UN)LOGGED that can handle all
tables in a tablespace at once.
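
The four cases in item 2 can be written down as a small decision
table. This is only an illustrative sketch in Python, not code from
the patch: the function name and the Strategy labels are made up
here, and only the wal_level values (minimal, replica, logical) are
real PostgreSQL settings.

```python
from enum import Enum


class Strategy(Enum):
    # Hypothetical labels for the two outcomes described in item 2.
    NO_FILE_COPY = "persistence flipped in place, no file copy"
    BULK_FPI = "relation contents WAL-logged as bulk full-page images"


def persistence_change_strategy(target, wal_level):
    """Return how ALTER TABLE SET <target> behaves under the given wal_level.

    target is "LOGGED" or "UNLOGGED"; wal_level is a PostgreSQL
    wal_level value ("minimal", "replica" or "logical").
    """
    if target not in ("LOGGED", "UNLOGGED"):
        raise ValueError("unknown target persistence: %r" % (target,))
    if wal_level not in ("minimal", "replica", "logical"):
        raise ValueError("unknown wal_level: %r" % (wal_level,))
    # Only SET LOGGED with a non-minimal wal_level has to ship the
    # relation contents through WAL; it does so as bulk FPIs.
    if target == "LOGGED" and wal_level != "minimal":
        return Strategy.BULK_FPI
    return Strategy.NO_FILE_COPY
```

For example, persistence_change_strategy("UNLOGGED", "replica") gives
Strategy.NO_FILE_COPY, while the same call with "LOGGED" gives
Strategy.BULK_FPI.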

As a side note, let me quickly go over the behavior of the mark files
introduced by the first patch, particularly what happens when deletion
fails.

(1) The mark file for MAIN fork ("<oid>.u") corresponds to all forks,
while the mark file for INIT fork ("<oid>_init.u") corresponds to
INIT fork alone.

(2) The mark file is created just before the corresponding storage
file is created. This is always logged in the WAL.

(3) The mark file is deleted after the corresponding storage file is
removed at commit or rollback. This action is logged in the
WAL, too. If the deletion fails, an ERROR is raised and the
transaction aborts.

(4) If a crash leaves a mark file behind, the server tries to delete it
after successfully removing the corresponding storage file during
the subsequent startup that runs recovery. If the deletion fails,
the server leaves the mark file in place and emits a WARNING. (This
is the same behavior as for non-mark files.)

(5) If the deletion of the mark file fails, the leftover mark file
prevents the creation of the corresponding storage file (causing
an ERROR). Because of that, a leftover mark file never leads to
the removal of the wrong files.

(6) The mark file for an INIT fork is created only when ALTER TABLE
SET UNLOGGED is executed (not for CREATE UNLOGGED TABLE) to signal
the crash-cleanup code to remove the INIT fork. (Otherwise the
cleanup code removes the main fork instead. This is the main
objective of introducing the mark files.)
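
To make rules (1), (2), (4), (5) and (6) concrete, here is a toy
model in Python that treats a directory as a set of file names. It is
a sketch under assumptions, not the patch's C code: the function
names are invented, segment files and WAL logging are ignored, and
the fork suffixes are the usual PostgreSQL ones (_fsm, _vm, _init).

```python
# Fork name -> suffix appended to "<oid>" in the storage file name.
FORK_SUFFIXES = {"main": "", "fsm": "_fsm", "vm": "_vm", "init": "_init"}


def mark_file_name(oid, fork):
    """Rule (1): only the MAIN and INIT forks have mark files."""
    if fork not in ("main", "init"):
        raise ValueError("no mark file for fork %r" % (fork,))
    return "%d%s.u" % (oid, FORK_SUFFIXES[fork])


def files_covered_by_mark(mark):
    """Rule (1): "<oid>.u" covers all forks, "<oid>_init.u" only INIT."""
    base = mark[:-2]  # strip the trailing ".u"
    if base.endswith("_init"):
        return [base]
    return [base + suffix for suffix in FORK_SUFFIXES.values()]


def create_storage(directory, oid, fork):
    """Rules (2) and (5): the mark file goes first; a leftover one blocks."""
    mark = mark_file_name(oid, fork)
    if mark in directory:
        raise FileExistsError("leftover mark file %s" % mark)
    directory.add(mark)
    directory.add("%d%s" % (oid, FORK_SUFFIXES[fork]))


def crash_cleanup(directory):
    """Rule (4): remove what each leftover mark covers, then the mark."""
    for mark in sorted(name for name in directory if name.endswith(".u")):
        for name in files_covered_by_mark(mark):
            directory.discard(name)
        directory.discard(mark)


# A crash during CREATE TABLE: everything belonging to 16384 goes away.
d = set()
create_storage(d, 16384, "main")
d.add("16384_fsm")
crash_cleanup(d)
assert d == set()

# Rule (6): a crash during ALTER TABLE SET UNLOGGED leaves
# "<oid>_init.u", so only the INIT fork is removed and MAIN survives.
d = {"16385", "16385_init", "16385_init.u"}
crash_cleanup(d)
assert d == {"16385"}
```

In this model the only way to remove a main fork at startup is through
its own "<oid>.u" file, which is the property rule (6) relies on.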

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment                                                         Content-Type  Size
v27-0001-Storage-mark-files.patch                                  text/x-patch  55.7 KB
v27-0002-In-place-table-persistence-change.patch                   text/x-patch  33.3 KB
v27-0003-New-command-ALTER-TABLE-ALL-IN-TABLESPACE-SET-LO.patch    text/x-patch  19.6 KB
