Re: PITR promote bug: Checkpointer writes to older timeline

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: michael(at)paquier(dot)xyz
Cc: masao(dot)fujii(at)oss(dot)nttdata(dot)com, soumyadeep2007(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org, jyih(at)vmware(dot)com, kyeap(at)vmware(dot)com
Subject: Re: PITR promote bug: Checkpointer writes to older timeline
Date: 2021-03-15 06:01:09
Message-ID: 20210315.150109.1566781753677145464.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Sun, 14 Mar 2021 17:59:59 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> On Thu, Mar 04, 2021 at 05:10:36PM +0900, Kyotaro Horiguchi wrote:
> > read_local_xlog_page is *designed* to maintain ThisTimeLineID.
> > Currently it doesn't seem utilized but I think it's sufficiently
> > reasonable that the function maintains ThisTimeLineID.
>
> I don't quite follow this line of thoughts. ThisTimeLineID is
> designed to remain 0 while recovery is running in most processes
> (at the close exception of a WAL sender with a cascading setup,

The reason for the "0" is they just aren't interested in the value.
Checkpointer temporarily uses it then restore to 0 soon.

> physical or logical, of course), so why is there any business for
> read_local_xlog_page() to touch this field at all while in recovery to
> begin with?

Logical decoding stuff is (I think) designed to turn any backend into
a walsender, which may need to maintain ThisTimeLineID. It seems to
me that logical decoding stuff indents to maintain ThisTimeLineID of
such backends at reading a WAL record. logical_read_xlog_page also
updates ThisTimeLineID and pg_logical_slot_get_changes_guts(),
pg_replication_slot_advance() (and maybe other functions) updates
ThisTimeLineID. So it is natural that local_read_xlog_page() updates
it since it is intended to be used used in logical decoding plugins.

> I equally find confusing that XLogReadDetermineTimeline() relies on a
> specific value of ThisTimeLineID in its own logic, while it clearly
> states that all its callers have to read the current active TLI
> beforehand. So I think that the existing logic is pretty weak, and
> that resetting the field is an incorrect approach? It seems to me

It is initialized by IndentifySystem(). And the logical walsender
intends to maintain ThisTimeLineID by subsequent calls to
GetStandbyFlushRecPtr(), which happen in logical_read_xlog_page().

> that we had better not change ThisTimeLineID actively while in
> recovery in this code path and just let others do the job, like
> RecoveryInProgress() once recovery finishes, or
> GetStandbyFlushRecPtr() for a WAL sender. And finally, we should
> store the current TLI used for replay in a separate variable that gets
> passed down to the XLogReadDetermineTimeline() as argument.

I agree that it's better that the replay TLI is stored in a separate
variable. It is what I was complained on in the previous mails. (It
might not have been so obvious, though..)

> While going through it, I have simplified a bit the proposed TAP tests
> (thanks for replacing the sleep() call, Soumyadeep. This would have
> made the test slower for nothing on fast machines, and it would cause
> failures on very slow machines).
>
> The attached fixes the original issue for me, keeping all the records
> in their correct timeline. And I have not been able to break
> cascading setups. If it happens that such cases actually break, we
> have holes in our existing test coverage that should be improved. I
> cannot see anything fancy missing on this side, though.
>
> Any thoughts?

I don't think there's any acutual user of the function for the
purpose, but.. Anyawy if we remove the update of ThisTimeLineID from
read_local_xlog_page, I think we should remove or rewrite the
following comment for the function. It no longer works as written in
the catchphrase.

> * Public because it would likely be very helpful for someone writing another
> * output method outside walsender, e.g. in a bgworker.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-03-15 06:04:56 Re: Parallel INSERT (INTO ... SELECT ...)
Previous Message Amit Kapila 2021-03-15 05:55:26 Re: Parallel INSERT (INTO ... SELECT ...)