Re: BUG #17744: Fail Assert while recoverying from pg_basebackup

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: andres(at)anarazel(dot)de
Cc: thomas(dot)munro(at)gmail(dot)com, zxwsbg12138(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17744: Fail Assert while recoverying from pg_basebackup
Date: 2023-02-02 07:45:03
Message-ID: 20230202.164503.138587422646410717.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Thanks for the comment!

At Wed, 1 Feb 2023 07:32:52 -0800, Andres Freund <andres(at)anarazel(dot)de> wrote in
> Hi,
>
> On 2023-01-13 18:36:05 +0900, Kyotaro Horiguchi wrote:
> > At Tue, 10 Jan 2023 07:45:45 +0000, PG Bug reporting form <noreply(at)postgresql(dot)org> wrote in
> > > #2 0x0000000000b378e9 in ExceptionalCondition (
> > > conditionName=0xd13697 "TransactionIdIsValid(initial)",
> > > errorType=0xd12df4 "FailedAssertion", fileName=0xd12de8 "procarray.c",
> > >
> > > lineNumber=1750) at assert.c:69
> > > #3 0x0000000000962195 in ComputeXidHorizons (h=0x7ffe93de25e0)
> > > at procarray.c:1750
> > > #4 0x00000000009628a3 in GetOldestTransactionIdConsideredRunning ()
> > > at procarray.c:2050
> > > #5 0x00000000005972bf in CreateRestartPoint (flags=256) at xlog.c:7153
> > > #6 0x00000000008cae37 in CheckpointerMain () at checkpointer.c:464
> >
> > The function requires a valid value in
> > ShmemVariableCache->latestCompleteXid. But it is not initialized and
> > maintained in this case. The attached quick hack seems working, but
> > of course more decent fix is needed.
>
> I might be missing something, but I suspect the problem here is that we
> shouldn't have been creating a restart point. Afaict, the setup
> instructions provided don't configure a recovery.signal, so we'll just
> perform crash recovery.
>
> And I don't think it'd ever make sense to create a restart point during
> crash recovery?
>
> Except that in this case, it's not pure crash recovery, it's restoring
> from a backup label. Due to which it actually might make sense to create
> restart points? If you're doing PITR or such you don't really gain
> anything by doing checkpoints until you've reached consistency, unless
> you want to optimize for the case that you might need to start/stop the
> instance multiple times?
>
>
> So maybe it's the right thing to create restart points? Really not sure.

In my memory that behavior was intentionally introduced
recently... yes, this is ([1]). And it seems that this is the motive:

> Once we had the checkpointer running, we could also consider making
> the end-of-recovery checkpoint optional, or at least the wait for it
> to complete. I haven't shown that in this patch, but it's just
> different checkpoint request flags and possibly an end-of-recovery
> record. What problems do you foresee with that? Why should we have
> "fast" promotion but not "fast" crash recovery?

[1]: https://www.postgresql.org/message-id/CA%2BhUKGJ8NRsqgkZEnsnRc2MFROBV-jCnacbYvtpptK2A9YYp9Q%40mail.gmail.com

> If we do want to do restartpoints, we definitely shouldn't try to
> TruncateSUBTRANS() in the crash-recovery-like-restartpoint case, we've
> not even done StartupSUBTRANS(), because that's guarded by
> ArchiveRecoveryRequested.

Yeah..

> The most obvious (but wrong!), fix would be to change

:)

> if (EnableHotStandby)
> TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
> to
> if (standbyState != STANDBY_DISABLED)
> TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
> except that doesn't work, because we don't have working access to
> standbyState. Nor the other relevant variables. Gah.

Yeah, also not have access to ArchiveRecoveryRequested (but I can
write the variable there!).

> We've really made a hash out of the state management for
> xlog.c. ArchiveRecoveryRequested, InArchiveRecovery,
> StandbyModeRequested, StandbyMode, EnableHotStandby,
> LocalHotStandbyActive, ... :(. We use InArchiveRecovery = true, even if
> there's no archiving involved. Afaict ArchiveRecoveryRequested=false,
> InArchiveRecovery=true isn't really something the comments around the
> variables foresee.

Ah... InArchiveRecovery is set when we know how far recovery needs to
run to reach consistency regardless whether recovery starts from crash
recovery or not. This is *slightly*(:p) different from what its
comment suggests.

Mmm. Okay, now I know it's not that simple. Let me sleep on it..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Kyotaro Horiguchi 2023-02-02 08:38:40 Re: BUG #17767: psql: tab-completion causes warnings when standard_conforming_strings = off
Previous Message Tom Lane 2023-02-02 05:51:07 Re: BUG #17767: psql: tab-completion causes warnings when standard_conforming_strings = off