Re: In-placre persistance change of a relation

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: nathandbossart(at)gmail(dot)com
Cc: jakub(dot)wartak(at)enterprisedb(dot)com, stark(dot)cfm(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, barwick(at)gmail(dot)com, jchampion(at)timescale(dot)com, pryzby(at)telsasoft(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, rjuju123(at)gmail(dot)com, jakub(dot)wartak(at)tomtom(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: In-placre persistance change of a relation
Date: 2023-08-24 02:22:32
Message-ID: 20230824.112232.1895676924995181515.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for looking this!

At Mon, 14 Aug 2023 12:38:48 -0700, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote in
> I think there are some good ideas here. I started to take a look at the
> patches, and I've attached a rebased version of the patch set. Apologies
> if I am repeating any discussions from upthread.
>
> First, I tested the time difference in ALTER TABLE SET UNLOGGED/LOGGED with
> the patch applied, and the results looked pretty impressive.
>
> before:
> postgres=# alter table test set unlogged;
> ALTER TABLE
> Time: 5108.071 ms (00:05.108)
> postgres=# alter table test set logged;
> ALTER TABLE
> Time: 6747.648 ms (00:06.748)
>
> after:
> postgres=# alter table test set unlogged;
> ALTER TABLE
> Time: 25.609 ms
> postgres=# alter table test set logged;
> ALTER TABLE
> Time: 1241.800 ms (00:01.242)

Thanks for confirmation. The difference between the both directions is
that making a table logged requires to emit WAL records for the entire
content.

> My first question is whether 0001 is a prerequisite to 0002. I'm assuming
> it is, but the reason wasn't immediately obvious to me. If it's just

In 0002, if a backend crashes after creating an init fork file but
before the associated commit, a lingering fork file could result in
data loss on the next startup. Thus, an utterly reliable file cleanup
mechanism is essential. 0001 also addresses the orphan storage files
issue arising from ALTER TABLE and similar commands.

> nice-to-have, perhaps we could simplify the patch set a bit. I see that
> Heikki had some general concerns with the marker file approach [0], so
> perhaps it is at least worth brainstorming some alternatives if we _do_
> need it.

The rationale behind the file-based implementation is that any
leftover init fork file from a crash needs to be deleted before the
reinit(INIT) process kicks in, which happens irrelevantly to WAL,
before the start of crash recovery. I could implement it separately
from the reinit module, but I didn't since that results in almost a
duplication.

As commented in xlog.c, the purpose of the pre-recovery reinit CLEANUP
phase is to ensure hot standbys don't encounter erroneous unlogged
relations. Based on that requirement, we need a mechanism to
guarantee that additional crucial operations are executed reliably at
the next startup post-crash, right before recovery kicks in (or reinit
CLEANUP). 0001 persists this data on a per-operation basis tightly
bonded to their target objects.

I could turn this into something like undo longs in a simple form, but
I'd rather not craft a general-purpose undo log system for this unelss
it's absolutely necessary.

> [0] https://postgr.es/m/9827ebd3-de2e-fd52-4091-a568387b1fc2%40iki.fi

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2023-08-24 02:24:48 Re: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Ian Lawrence Barwick 2023-08-24 01:22:49 pg_stat_get_backend_subxact() and backend IDs?