Re: PITR promote bug: Checkpointer writes to older timeline

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: soumyadeep2007(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, kevin(dot)yeap(at)vmware(dot)com, michael(at)paquier(dot)xyz, jyih(at)vmware(dot)com
Subject: Re: PITR promote bug: Checkpointer writes to older timeline
Date: 2021-03-03 06:47:42
Message-ID: 20210303.154742.1003082376286316043.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Tue, 2 Mar 2021 17:56:03 -0800, Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com> wrote in
> Hello hackers,
>
> We came across an issue where the checkpointer writes to the older
> timeline while a promotion is ongoing after reaching the recovery point
> in a PITR, when there are prepared transactions before the recovery
> point. We came across this issue first in REL_12_STABLE and saw that it
> also exists in devel.

Good Catch! I can reproduce that.

> When there are prepared transactions in an older timeline, in the
> checkpointer, a call to CheckPointTwoPhase() and subsequently to
> XlogReadTwoPhaseData() and subsequently to read_local_xlog_page() leads
> to the following line:
>
> read_upto = GetXLogReplayRecPtr(&ThisTimeLineID);
>
> GetXLogReplayRecPtr() will change ThisTimeLineID to 1, in order to read
> the two phase WAL records in the older timeline. This variable will
> remain unchanged and the checkpointer ends up writing the checkpoint
> record into the older WAL segment (when XLogBeginInsert() is called
> within CreateCheckPoint(), the value is still 1). The value is not
> synchronized as even if RecoveryInProgress() is called,
> xlogctl->SharedRecoveryState is not RECOVERY_STATE_DONE
> (SharedRecoveryInProgress = true in older versions) as the startup
> process waits for the checkpointer inside RequestCheckpoint() (since
> recovery_target_action='promote' involves a non-fast promotion). Thus,
> InitXLOGAccess() is not called and the value of ThisTimeLineID is not
> updated before the checkpoint record write.
>
> Since 1148e22a82e, GetXLogReplayRecPtr() is called with ThisTimeLineID
> instead of a local variable, within read_local_xlog_page().
>
> PFA a small patch that fixes the problem by explicitly calling
> InitXLOGAccess() in CheckPointTwoPhase(), after the two phase state data
> is read, in order to update ThisTimeLineID to the latest timeline. It is
> okay to call InitXLOGAccess() as it is lightweight and would mostly be
> a no-op.

It is correct that read_local_xlog_page() changes ThisTimeLineID, but
InitXLOGAccess() is correctly called in CreateCheckPoint:

| /*
| * An end-of-recovery checkpoint is created before anyone is allowed to
| * write WAL. To allow us to write the checkpoint record, temporarily
| * enable XLogInsertAllowed. (This also ensures ThisTimeLineID is
| * initialized, which we need here and in AdvanceXLInsertBuffer.)
| */
| if (flags & CHECKPOINT_END_OF_RECOVERY)
| LocalSetXLogInsertAllowed();

It seems to e suficcient to recover ThisTimeLineID from the checkpoint
record to be written, as attached?

regareds.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
write_checkpoint_record_to_the_right_timeline.patch text/x-patch 620 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-03-03 06:57:47 Re: buildfarm windows checks / tap tests on windows
Previous Message Dilip Kumar 2021-03-03 06:37:53 Re: [Patch] ALTER SYSTEM READ ONLY