Re: PITR promote bug: Checkpointer writes to older timeline

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, soumyadeep2007(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, kevin(dot)yeap(at)vmware(dot)com, michael(at)paquier(dot)xyz, jyih(at)vmware(dot)com
Subject: Re: PITR promote bug: Checkpointer writes to older timeline
Date: 2021-03-03 08:46:42
Message-ID: 21658731-e335-54f4-5b5a-107f8a169751@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/03/2021 08:47, Kyotaro Horiguchi wrote:
> At Tue, 2 Mar 2021 17:56:03 -0800, Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com> wrote in
>> When there are prepared transactions in an older timeline, in the
>> checkpointer, a call to CheckPointTwoPhase() and subsequently to
>> XlogReadTwoPhaseData() and subsequently to read_local_xlog_page() leads
>> to the following line:
>>
>> read_upto = GetXLogReplayRecPtr(&ThisTimeLineID);
>>
>> GetXLogReplayRecPtr() will change ThisTimeLineID to 1, in order to read
>> the two phase WAL records in the older timeline. This variable will
>> remain unchanged and the checkpointer ends up writing the checkpoint
>> record into the older WAL segment (when XLogBeginInsert() is called
>> within CreateCheckPoint(), the value is still 1). The value is not
>> synchronized as even if RecoveryInProgress() is called,
>> xlogctl->SharedRecoveryState is not RECOVERY_STATE_DONE
>> (SharedRecoveryInProgress = true in older versions) as the startup
>> process waits for the checkpointer inside RequestCheckpoint() (since
>> recovery_target_action='promote' involves a non-fast promotion). Thus,
>> InitXLOGAccess() is not called and the value of ThisTimeLineID is not
>> updated before the checkpoint record write.
>>
>> Since 1148e22a82e, GetXLogReplayRecPtr() is called with ThisTimeLineID
>> instead of a local variable, within read_local_xlog_page().

Confusing...

>> PFA a small patch that fixes the problem by explicitly calling
>> InitXLOGAccess() in CheckPointTwoPhase(), after the two phase state data
>> is read, in order to update ThisTimeLineID to the latest timeline. It is
>> okay to call InitXLOGAccess() as it is lightweight and would mostly be
>> a no-op.
>
> It is correct that read_local_xlog_page() changes ThisTimeLineID, but
> InitXLOGAccess() is correctly called in CreateCheckPoint:
>
> | /*
> | * An end-of-recovery checkpoint is created before anyone is allowed to
> | * write WAL. To allow us to write the checkpoint record, temporarily
> | * enable XLogInsertAllowed. (This also ensures ThisTimeLineID is
> | * initialized, which we need here and in AdvanceXLInsertBuffer.)
> | */
> | if (flags & CHECKPOINT_END_OF_RECOVERY)
> | LocalSetXLogInsertAllowed();
>
> It seems to e suficcient to recover ThisTimeLineID from the checkpoint
> record to be written, as attached?

I think it should be reset even earlier, inside XlogReadTwoPhaseData()
probably. With your patch, doesn't the LogStandbySnapshot() call just
above where you're ressetting ThisTimeLineID also write a WAL record
with incorrect timeline?

Even better, can we avoid setting ThisTimeLineID in
XlogReadTwoPhaseData() in the first place?

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2021-03-03 08:52:00 Re: Increase value of OUTER_VAR
Previous Message Magnus Hagander 2021-03-03 08:44:15 Re: We should stop telling users to "vacuum that database in single-user mode"