Re: Loss of replication after simple misconfiguration

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Victor Yegorov <vyegorov(at)gmail(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, hubert depesz lubaczewski <depesz(at)depesz(dot)com>, pgsql-bugs mailing list <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Loss of replication after simple misconfiguration
Date: 2020-04-10 04:14:34
Message-ID: 20200410041434.GU1606@paquier.xyz
Lists: pgsql-bugs

On Thu, Apr 09, 2020 at 07:48:17PM +0300, Victor Yegorov wrote:
> We've hit similar issue last week, but on 11.5 — we
> had track_commit_timestamp enabled on master after switchover,
> replica failed to start.
>
> It might be, that fix was here:
> https://git.postgresql.org/pg/commitdiff/180feb8c7e
> (For 9.5 branch it is: https://git.postgresql.org/pg/commitdiff/69a5686368)
>
> We're not in the position to test it again, though…

Hmm. We have a gap in tests here as we don't have any tests stressing
switchovers when it comes to track_commit_timestamp. Anyway, could
you confirm that I got the problem right? Here is the flow I am getting
from the information upthread, roughly:
1) Primary/standby cluster, both using max_worker_processes = 8, and
track_commit_timestamp = off.
2) In order to begin the switchover, first cleanly stop the primary.
3) Update configuration of the standby as follows, promote it and
restart it:
    track_commit_timestamp = on
    max_worker_processes = 50
4) Enable streaming on the old primary to make it a standby. Starting
it fails because of the mismatched setting for max_worker_processes.
5) Re-adjust max_worker_processes correctly on the new standby, start
it. Then this startup should fail at the lookup of pg_commit_ts/.
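
As a rough illustration of why step 4 fails and step 5 gets past that
point, here is a minimal Python sketch of the rule that a standby must set
certain parameters at least as high as its primary. It is a toy model and
not PostgreSQL source code: the function name and the dictionaries are
hypothetical, and the parameter list is taken from the hot standby
documentation and may differ between releases.

```python
# Illustrative model only, not PostgreSQL source code.
# Parameters a standby must set at least as high as the primary
# (assumed from the hot standby documentation; may vary by release).
REQUIRED_AT_LEAST_PRIMARY = (
    "max_connections",
    "max_prepared_transactions",
    "max_locks_per_transaction",
    "max_worker_processes",
)


def too_low_on_standby(primary, standby):
    """Return names of parameters whose standby value is below the primary's."""
    return [
        name
        for name in REQUIRED_AT_LEAST_PRIMARY
        if name in primary and standby.get(name, 0) < primary[name]
    ]


# Step 3: the promoted standby (new primary) after reconfiguration.
new_primary = {"max_worker_processes": 50}
# Step 4: the old primary still carries the original value.
old_primary_as_standby = {"max_worker_processes": 8}

print(too_low_on_standby(new_primary, old_primary_as_standby))
# prints ['max_worker_processes']

# Step 5: after re-adjusting the setting, the check passes.
old_primary_as_standby["max_worker_processes"] = 50
print(too_low_on_standby(new_primary, old_primary_as_standby))
# prints []
```

This only models the parameter comparison. It says nothing about the
pg_commit_ts/ lookup in step 5, which is the part still in question.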

I have been able to write a TAP test that follows this exact scenario,
though it succeeds for me (it could be a good idea to add some
coverage for that, actually). Perhaps I am missing a step though?
For example, perhaps the original setting was track_commit_timestamp =
on, then it got disabled once?
--
Michael

Attachment: committs-switchover-test.patch (text/x-diff, 3.2 KB)
