Re: PITR promote bug: Checkpointer writes to older timeline

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: masao(dot)fujii(at)oss(dot)nttdata(dot)com, soumyadeep2007(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org, jyih(at)vmware(dot)com, kyeap(at)vmware(dot)com
Subject: Re: PITR promote bug: Checkpointer writes to older timeline
Date: 2021-03-14 08:59:59
Message-ID: YE3Qj2s34Fayy2aF@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 04, 2021 at 05:10:36PM +0900, Kyotaro Horiguchi wrote:
> read_local_xlog_page is *designed* to maintain ThisTimeLineID.
> Currently it doesn't seem utilized but I think it's sufficiently
> reasonable that the function maintains ThisTimeLineID.

I don't quite follow this line of thoughts. ThisTimeLineID is
designed to remain 0 while recovery is running in most processes
(at the close exception of a WAL sender with a cascading setup,
physical or logical, of course), so why is there any business for
read_local_xlog_page() to touch this field at all while in recovery to
begin with?

I equally find confusing that XLogReadDetermineTimeline() relies on a
specific value of ThisTimeLineID in its own logic, while it clearly
states that all its callers have to read the current active TLI
beforehand. So I think that the existing logic is pretty weak, and
that resetting the field is an incorrect approach? It seems to me
that we had better not change ThisTimeLineID actively while in
recovery in this code path and just let others do the job, like
RecoveryInProgress() once recovery finishes, or
GetStandbyFlushRecPtr() for a WAL sender. And finally, we should
store the current TLI used for replay in a separate variable that gets
passed down to the XLogReadDetermineTimeline() as argument.

While going through it, I have simplified a bit the proposed TAP tests
(thanks for replacing the sleep() call, Soumyadeep. This would have
made the test slower for nothing on fast machines, and it would cause
failures on very slow machines).

The attached fixes the original issue for me, keeping all the records
in their correct timeline. And I have not been able to break
cascading setups. If it happens that such cases actually break, we
have holes in our existing test coverage that should be improved. I
cannot see anything fancy missing on this side, though.

Any thoughts?
--
Michael

Attachment Content-Type Size
fix-pitr-2pc-v3.patch text/x-diff 9.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2021-03-14 09:54:28 Re: psql - add SHOW_ALL_RESULTS option
Previous Message Fabien COELHO 2021-03-14 08:48:22 Re: Using COPY FREEZE in pgbench