Re: using an end-of-recovery record in all cases

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: nathandbossart(at)gmail(dot)com
Cc: robertmhaas(at)gmail(dot)com, rjuju123(at)gmail(dot)com, sulamul(at)gmail(dot)com, andres(at)anarazel(dot)de, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: using an end-of-recovery record in all cases
Date: 2022-04-20 01:41:07
Message-ID: 20220420.104107.1281200824149550704.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Tue, 19 Apr 2022 13:37:59 -0700, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote in
> On Mon, Apr 18, 2022 at 04:44:03PM -0400, Robert Haas wrote:
> > Here is a new version of the patch which, unlike v1, I think is
> > something we could seriously consider applying (not before v16, of
> > course). It now removes CHECKPOINT_END_OF_RECOVERY completely, and I
> > attach a second patch as well which nukes checkPoint.PrevTimeLineID as
> > well.
>
> I'd like to add a big +1 for this change. IIUC this should help with some
> of the problems I've noted elsewhere [0].

Agreed.

> > I mentioned two problems with $SUBJECT in the first email with this
> > subject line. One was a bug, which Noah has since fixed (thanks,
> > Noah). The other problem is that LogStandbySnapshot() and a bunch of
> > its friends expect latestCompletedXid to always be a normal XID, which
> > is a problem because (1) we currently set nextXid to 3 and (2) at
> > startup, we compute latestCompletedXid = nextXid - 1. The current code
> > dodges this kind of accidentally: the checkpoint that happens at
> > startup is a "shutdown checkpoint" and thus skips logging a standby
> > snapshot, since a shutdown checkpoint is a sure indicator that there
> > are no running transactions. With the changes, the checkpoint at
> > startup happens after we've started allowing write transactions, and
> > thus a standby snapshot needs to be logged also. In the cases where
> > the end-of-recovery record was already being used, the problem could
> > have happened already, except for the fact that those cases involve a
> > standby promotion, which doesn't happen during initdb. I explored a
> > few possible ways of solving this problem.
>
> Shouldn't latestCompletedXid be set to MaxTransactionId in this case? Or
> is this related to the logic in FullTransactionIdRetreat() that avoids
> skipping over the "actual" special transaction IDs?

As the result FullTransactionIdRetreat(FirstNormalFullTransactionId)
results in FrozenTransactionId, which looks odd. It seems to me
rather should be InvalidFullTransactionId, or simply should assert-out
that case. But incrmenting the very first xid avoid all that
complexity. It is somewhat hacky but very smiple and understandable.

> > So ... I decided that the best approach here seems to be to just skip
> > FirstNormalTransactionId and use FirstNormalTransactionId + 1 for the
> > first write transaction that the cluster ever processes. That's very
> > simple and doesn't seem likely to break anything else. On the downside
> > it seems a bit grotty, but I don't see anything better, and on the
> > whole, I think with this approach we come out substantially ahead.
> > 0001 removes 3 times as many lines as it inserts, and 0002 saves a few
> > more lines of code.
>
> This doesn't seem all that bad to me. It's a little hacky, but it's very
> easy to understand and only happens once per initdb. I don't think it's
> worth any extra complexity.

+1.

> > Now, I still don't really know that there isn't some theoretical
> > difficulty here that makes this whole approach a non-starter, but I
> > also can't think of what it might be. If the promotion case, which has
> > used the end-of-recovery record for many years, is basically safe,
> > despite the fact that it switches TLIs, then it seems to me that the
> > crash recovery case, which doesn't have that complication, ought to be
> > safe too. But I might well be missing something, so if you see a
> > problem, please speak up!
>
> Your reasoning seems sound to me.
>
> [0] https://postgr.es/m/C1EE64B0-D4DB-40F3-98C8-0CED324D34CB%40amazon.com

FWIW, I don't find a flaw in the reasoning, too.

By the way do we need to leave CheckPoint.PrevTimeLineID? It is always
the same value with ThisTimeLineID. The most dubious part is
ApplyWalRecord but XLOG_CHECKPOINT_SHUTDOWN no longer induces timeline
switch.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2022-04-20 01:56:08 Re: Handle infinite recursion in logical replication setup
Previous Message Michael Paquier 2022-04-20 01:29:12 Re: minor MERGE cleanups