Re: BUG #17744: Fail Assert while recoverying from pg_basebackup

From: Andres Freund <andres(at)anarazel(dot)de>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: zxwsbg12138(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17744: Fail Assert while recoverying from pg_basebackup
Date: 2023-02-01 15:32:52
Message-ID: 20230201153252.l6kcfum7trdovw2b@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2023-01-13 18:36:05 +0900, Kyotaro Horiguchi wrote:
> At Tue, 10 Jan 2023 07:45:45 +0000, PG Bug reporting form <noreply(at)postgresql(dot)org> wrote in
> > #2 0x0000000000b378e9 in ExceptionalCondition (
> > conditionName=0xd13697 "TransactionIdIsValid(initial)",
> > errorType=0xd12df4 "FailedAssertion", fileName=0xd12de8 "procarray.c",
> > lineNumber=1750) at assert.c:69
> > #3 0x0000000000962195 in ComputeXidHorizons (h=0x7ffe93de25e0)
> > at procarray.c:1750
> > #4 0x00000000009628a3 in GetOldestTransactionIdConsideredRunning ()
> > at procarray.c:2050
> > #5 0x00000000005972bf in CreateRestartPoint (flags=256) at xlog.c:7153
> > #6 0x00000000008cae37 in CheckpointerMain () at checkpointer.c:464
>
> The function requires a valid value in
> ShmemVariableCache->latestCompleteXid. But it is not initialized and
> maintained in this case. The attached quick hack seems working, but
> of course more decent fix is needed.

I might be missing something, but I suspect the problem here is that we
shouldn't have been creating a restart point. Afaict, the setup
instructions provided don't configure a recovery.signal, so we'll just
perform crash recovery.

And I don't think it'd ever make sense to create a restart point during
crash recovery?

Except that in this case it's not pure crash recovery, it's restoring
from a backup label, so it actually might make sense to create restart
points? If you're doing PITR or such, you don't really gain anything by
doing checkpoints until you've reached consistency, unless you want to
optimize for the case where you might need to start/stop the instance
multiple times?

So maybe it's the right thing to create restart points? Really not sure.

If we do want to do restartpoints, we definitely shouldn't try to
TruncateSUBTRANS() in the crash-recovery-like-restartpoint case: we've
not even done StartupSUBTRANS(), because that's guarded by
ArchiveRecoveryRequested.

The most obvious (but wrong!) fix would be to change

    if (EnableHotStandby)
        TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
to
    if (standbyState != STANDBY_DISABLED)
        TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());

except that doesn't work, because we don't have working access to
standbyState in the checkpointer. Nor to the other relevant variables. Gah.
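
To make the mismatch concrete, here's a toy model of the two conditions
involved. It is only a sketch: RecoveryFlags, subtrans_started(),
current_guard(), stricter_guard() and truncate_subtrans() are hypothetical
names, not PostgreSQL code, and the model assumes (as described above) that
StartupSUBTRANS() only runs when archive recovery was requested. The
stricter guard is not a proposed patch; it just shows what the truncation
condition would have to imply.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the xlog.c flags named in this mail. */
typedef struct RecoveryFlags
{
    bool archive_recovery_requested; /* models ArchiveRecoveryRequested */
    bool enable_hot_standby;         /* models EnableHotStandby */
} RecoveryFlags;

/* Assumption from the mail: StartupSUBTRANS() is guarded by
 * ArchiveRecoveryRequested.  The real condition may have more to it. */
static bool
subtrans_started(const RecoveryFlags *f)
{
    return f->archive_recovery_requested;
}

/* The guard currently in front of TruncateSUBTRANS(). */
static bool
current_guard(const RecoveryFlags *f)
{
    return f->enable_hot_standby;
}

/* A hypothetical guard that also requires the startup step to have run. */
static bool
stricter_guard(const RecoveryFlags *f)
{
    return f->enable_hot_standby && subtrans_started(f);
}

/* Stand-in for the truncation step: it is only valid after startup,
 * which is the precondition the reported assertion failure points at. */
static void
truncate_subtrans(const RecoveryFlags *f)
{
    assert(subtrans_started(f));
}

/* True when a guard would allow truncation without prior startup. */
static bool
guard_is_unsafe(bool guard_result, const RecoveryFlags *f)
{
    return guard_result && !subtrans_started(f);
}
```

With archive_recovery_requested = false and enable_hot_standby = true (the
backup-label-without-recovery.signal case), current_guard() lets the
truncation through even though subtrans_started() is false, whereas
stricter_guard() does not.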

We've really made a hash out of the state management for
xlog.c. ArchiveRecoveryRequested, InArchiveRecovery,
StandbyModeRequested, StandbyMode, EnableHotStandby,
LocalHotStandbyActive, ... :(. We use InArchiveRecovery = true, even if
there's no archiving involved. Afaict ArchiveRecoveryRequested=false,
InArchiveRecovery=true isn't really something the comments around the
variables foresee.
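
As a rough illustration of that last point, the four combinations of the
two flags can be spelled out like this. The enum and classify_recovery()
are made up for this mail and don't exist in xlog.c; the meaning given to
"requested but not yet in archive recovery" is my reading of the existing
comments and may be incomplete.

```c
#include <stdbool.h>

/* Hypothetical labels for illustration; xlog.c has no such enum. */
typedef enum RecoveryKind
{
    KIND_CRASH_RECOVERY,     /* neither flag set */
    KIND_CRASH_THEN_ARCHIVE, /* requested, still replaying local WAL first */
    KIND_ARCHIVE_RECOVERY,   /* requested and active */
    KIND_BACKUP_LABEL_ONLY   /* not requested, yet InArchiveRecovery is true */
} RecoveryKind;

/* Map (ArchiveRecoveryRequested, InArchiveRecovery) to one of the labels. */
static RecoveryKind
classify_recovery(bool archive_recovery_requested, bool in_archive_recovery)
{
    if (archive_recovery_requested)
        return in_archive_recovery ? KIND_ARCHIVE_RECOVERY
                                   : KIND_CRASH_THEN_ARCHIVE;

    return in_archive_recovery ? KIND_BACKUP_LABEL_ONLY
                               : KIND_CRASH_RECOVERY;
}
```

The case from this bug report is classify_recovery(false, true), i.e.
KIND_BACKUP_LABEL_ONLY, which is the combination the comments don't
really describe.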

Greetings,

Andres Freund
