Re: PITR promote bug: Checkpointer writes to older timeline

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: michael(at)paquier(dot)xyz
Cc: soumyadeep2007(at)gmail(dot)com, masao(dot)fujii(at)oss(dot)nttdata(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org, jyih(at)vmware(dot)com, kyeap(at)vmware(dot)com
Subject: Re: PITR promote bug: Checkpointer writes to older timeline
Date: 2021-03-04 07:17:34
Message-ID: 20210304.161734.224512251734869803.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Thu, 4 Mar 2021 11:18:42 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> On Thu, Mar 04, 2021 at 10:28:31AM +0900, Kyotaro Horiguchi wrote:
> > read_local_xlog_page() works as a part of logical decoding and is
> > responsible for updating ThisTimeLineID properly. As the comment in
> > the function says, it is the proper place to update ThisTimeLineID,
> > since we would miss a timeline change if we checked it earlier, and
> > the function uses the value right after. So we cannot change that
> > behavior of the function. That is, neither of them seems to be the
> > right fix.
> >
> > The confusion here is that there are two ThisTimeLineIDs: the
> > previous TLI to read from and the next TLI to write to. Most of the
> > function is needed to read a 2PC record, so the options we have here
> > are:
> >
> > 1. Somehow tell the function not to update ThisTimeLineID in specific
> > cases. This can be done by xlogreader private data but this doesn't
> > seem reasonable.
> >
> > 2. Restore ThisTimeLineID after calling XLogReadRecord() on the
> > *caller* side. This is what came to my mind first. I don't like
> > this either, but I can't find a better fix.
>
> I have not looked in details at the solutions proposed here, but could
> it be possible to have a TAP test at least please? Seeing the script
> from the top of the thread, it should not be difficult to do so. I
> would put that in a file different than 009_twophase.pl, within
> src/test/recovery/.

Yeah, agreed. It is needed for the final patch. That situation is
easy to trigger. I'm not sure how to detect the corruption yet, though.
I'll consider that.
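
For illustration only, option 2 above (restoring ThisTimeLineID on the
caller side) could be modelled roughly as below. This is a toy sketch and
not PostgreSQL code: mock_read_record() and read_record_preserving_tli()
are made-up names standing in for XLogReadRecord() and its caller, and
ThisTimeLineID is a plain file-local variable here, initialized to an
arbitrary value.

```c
#include <stdint.h>

typedef uint32_t TimeLineID;

/* Stand-in for the timeline that new WAL should be written to (toy model). */
static TimeLineID ThisTimeLineID = 2;

/*
 * Mock of a record read whose page-read callback overwrites the global
 * with the timeline being read, as read_local_xlog_page() is described
 * to do above.  This is NOT the real XLogReadRecord().
 */
static void
mock_read_record(TimeLineID read_tli)
{
    ThisTimeLineID = read_tli;
}

/*
 * Option 2: the caller remembers the current value, performs the read,
 * and puts the value back afterwards.  Returns the value in effect after
 * the call so that it is easy to check.
 */
static TimeLineID
read_record_preserving_tli(TimeLineID read_tli)
{
    TimeLineID  save_tli = ThisTimeLineID;

    mock_read_record(read_tli);     /* clobbers ThisTimeLineID */
    ThisTimeLineID = save_tli;      /* restore on the caller side */

    return ThisTimeLineID;
}
```

In this model, reading a record from timeline 1 through
read_record_preserving_tli() leaves ThisTimeLineID at 2, whereas calling
mock_read_record(1) directly leaves it at 1, which corresponds to the
situation where later writes would go to the older timeline.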

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
