|From:||Michael Paquier <michael(at)paquier(dot)xyz>|
|To:||Andres Freund <andres(at)anarazel(dot)de>|
|Cc:||pgsql-hackers(at)postgresql(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>|
|Subject:||Re: Incorrect snapshots while promoting hot standby node when 2PC is used|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On Thu, Apr 22, 2021 at 01:36:03PM -0700, Andres Freund wrote:
> The sequence in StartupXLOG() leading to the issue is the following:
> 1) RecoverPreparedTransactions();
> 2) ShutdownRecoveryTransactionEnvironment();
> 3) XLogCtl->SharedRecoveryState = RECOVERY_STATE_DONE;
> Because 2) resets the KnownAssignedXids machinery, snapshots that happen
> between 2) and 3) will not actually look at the procarray to compute
> snapshots, as that only happens within
> snapshot->takenDuringRecovery = RecoveryInProgress();
> if (!snapshot->takenDuringRecovery)
> and RecoveryInProgress() checks XLogCtl->SharedRecoveryState !=
> RECOVERY_STATE_DONE, which is set in 3).
Oh, indeed. It is easy to see RecentXmin jumping back-and-worth while
running 023_pitr_prepared_xact.pl with a small sleep added just after
> Without prepared transaction there probably isn't an issue, because there
> shouldn't be any other in-progress xids at that point?
Yes, there should not be any as far as I recall. 2PC is kind of
special with its fake ProcArray entries.
> I think to fix the issue we'd have to move
> ShutdownRecoveryTransactionEnvironment() to after XLogCtl->SharedRecoveryState
> = RECOVERY_STATE_DONE.
> The acquisition of ProcArrayLock() in
> should prevent the data from being removed between the RecoveryInProgress()
> and the KnownAssignedXidsGetAndSetXmin() calls in GetSnapshotData().
> I haven't yet figured out whether there would be a problem with deferring the
> other tasks in ShutdownRecoveryTransactionEnvironment() until after
Hmm. This would mean releasing all the exclusive locks tracked by a
standby, as of StandbyReleaseAllLocks(), after opening the instance
for writes after a promotion. I don't think that's unsafe, but it
would be intrusive.
Anyway, isn't the issue ExpireAllKnownAssignedTransactionIds() itself,
where we should try to not wipe out the 2PC entries to make sure that
all those snapshots still see the 2PC transactions as something to
count on? I am attaching a crude patch to show the idea.
|Next Message||Masahiko Sawada||2021-05-26 08:00:22||Re: [BUG] Autovacuum not dynamically decreasing cost_limit and cost_delay|
|Previous Messagefirstname.lastname@example.org||2021-05-26 07:37:15||Fix typo: multiple tuple => tuples|