Re: PITR promote bug: Checkpointer writes to older timeline

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: masao(dot)fujii(at)oss(dot)nttdata(dot)com, soumyadeep2007(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org, jyih(at)vmware(dot)com, kyeap(at)vmware(dot)com
Subject: Re: PITR promote bug: Checkpointer writes to older timeline
Date: 2021-03-22 00:07:19
Message-ID: YFfft/IZVoHK90Vy@paquier.xyz
Lists: pgsql-hackers

On Thu, Mar 18, 2021 at 12:56:12PM +0900, Michael Paquier wrote:
> I was looking at uses of ThisTimeLineID in the wild, and could not
> find it getting checked or used actually in backend-side code that
> involved the WAL reader facility. Even if it brings confidence, it
> does not mean that it is not used somewhere :/

I have been working on that over the last couple of days, and applied
a fix down to version 10. One thing that I did not like in the test was
the use of compare() to check if the contents of the WAL segment before
and after the timeline jump remained the same, as this would have been
unstable with any concurrent activity. Instead, I have added a phase
at the end of the test with an extra checkpoint and recovery triggered
once, which is enough to reproduce the PANIC reported at the top of
the thread.

I'll look into clarifying the use of ThisTimeLineID within those
WAL reader callbacks, because this is really bug-prone in the long
term... This requires some coordination with the recent work aimed at
adding some logical decoding support in standbys, though.
--
Michael
