Re: Possible corruption by CreateRestartPoint at promotion

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: michael(at)paquier(dot)xyz
Cc: nathandbossart(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, masao(dot)fujii(at)oss(dot)nttdata(dot)com
Subject: Re: Possible corruption by CreateRestartPoint at promotion
Date: 2022-04-28 02:39:38
Message-ID: 20220428.113938.619949375918884373.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Wed, 27 Apr 2022 14:16:01 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> On Tue, Apr 26, 2022 at 08:26:09PM -0700, Nathan Bossart wrote:
> > On Wed, Apr 27, 2022 at 10:43:53AM +0900, Kyotaro Horiguchi wrote:
> >> At Tue, 26 Apr 2022 11:33:49 -0700, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote in
> >>> I suspect we'll start seeing this problem more often once end-of-recovery
> >>> checkpoints are removed [0]. Would you mind creating a commitfest entry
> >>> for this thread? I didn't see one.
> >>
> >> I'm not sure the patch makes any change here, because restart points
> >> don't run while crash recovery, since no checkpoint records seen
> >> during a crash recovery. Anyway the patch doesn't apply anymore so
> >> rebased, but only the one for master for the lack of time for now.
> >
> > Thanks for the new patch! Yeah, it wouldn't affect crash recovery, but
> > IIUC Robert's patch also applies to archive recovery.
> >
> >> + /* Update pg_control */
> >> + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> >> +
> >> /*
> >> * Remember the prior checkpoint's redo ptr for
> >> * UpdateCheckPointDistanceEstimate()
> >> */
> >> PriorRedoPtr = ControlFile->checkPointCopy.redo;
> >
> > nitpick: Why move the LWLockAcquire() all the way up here?
>
> Yeah, that should not be necessary. InitWalRecovery() is the only
> place outside the checkpointer that would touch this field, but that
> happens far too early in the startup sequence to matter with the
> checkpointer.

Yes it is not necessary. I just wanted to apparently ensure not to
access ControlFile outside ControlFileLoc. So I won't be opposed to
reverting it since, as you say, it is *actuall* safe..

> >> + Assert (PriorRedoPtr < RedoRecPtr);
> >
> > I think this could use a short explanation.
>
> That's just to make sure that the current redo LSN is always older
> than the one prior that. It does not seem really necessary to me to
> add that.

Just after we call UpdateCheckPointDistanceEstimate(RedoRecPtr -
PriorRedoPtr). Don't we really need any safeguard against giving a
wrap-arounded (in other words, treamendously large) value to the
function? Actually it doesn't seem to happen now, but I don't
confident that that never ever happens.

That being said, I'm a minority here, so removed it.

> >> + /*
> >> + * Aarchive recovery has ended. Crash recovery ever after should
> >> + * always recover to the end of WAL
> >> + */
>
> s/Aarchive/Archive/.

Oops! Fixed.

> >> + ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
> >> + ControlFile->minRecoveryPointTLI = 0;
> >> +
> >> + /* also update local copy */
> >> + LocalMinRecoveryPoint = InvalidXLogRecPtr;
> >> + LocalMinRecoveryPointTLI = 0;
> >
> > Should this be handled by the code that changes the control file state to
> > DB_IN_PRODUCTION instead? It looks like this is ordinarily done in the
> > next checkpoint. It's not clear to me why it is done this way.
>
> Anyway, that would be the work of the end-of-recovery checkpoint
> requested at the end of StartupXLOG() once a promotion happens or of
> the checkpoint requested by PerformRecoveryXLogAction() in the second
> case, no? So, I don't quite see why we need to update

Eventually the work is done by StartupXLOG(). So we don't need to do
that at all even in CreateCheckPoint(). If we expect that the
end-of-recovery checkpoint clears it, that won't happen if if the last
restart point takes so long time that the end-of-recovery checkpoint
request is ignored. If DB_IN_ARCHIVE_RECOVERY ended while the restart
point is running, it is highly possible that the end-of-recovery
checkpoint trigger is ignored. In that case the values are cleard at
the next checkpoint.

In short, if we want to reset them at so-called end-of-recovery
checkpoint, we should do that also in CreateRecoveryPoint.

So, it is not removed in this version.

> minRecoveryPoint and minRecoveryPointTLI in the control file here, as
> much as this does not have to be part of the end-of-recovery code
> that switches the control file to DB_IN_PRODUCTION.
>
> - if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
> - ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
> - {
> 7ff23c6 has removed the last call to CreateCheckpoint() outside the
> checkpointer, meaning that there is one less concurrent race to worry
> about, but I have to admit that this change, to update the control

Sure. In very early stage the reasoning to rmove the code was it.
And the rason for proposing to back-patch the same to older versions
is basing on the further investigation, and I'm not fully confident
that for the earlier versions.

> file's checkPoint and checkPointCopy even if we don't check after
> ControlFile->checkPointCopy.redo < lastCheckPoint.redo would make the
> code less robust in ~14. So I am questioning whether a backpatch
> is actually worth the risk here.

Agreed.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-04-28 02:43:57 Re: Possible corruption by CreateRestartPoint at promotion
Previous Message John Naylor 2022-04-28 02:32:18 Re: trivial comment fix