Re: Loss of replication after simple misconfiguration

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
Cc: Victor Yegorov <vyegorov(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-bugs mailing list <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Loss of replication after simple misconfiguration
Date: 2020-04-10 07:59:32
Message-ID: 20200410075932.GY1606@paquier.xyz
Lists: pgsql-bugs

On Fri, Apr 10, 2020 at 09:26:51AM +0200, hubert depesz lubaczewski wrote:
> In our case it was *at least* this scenario:
>
> 1. master and slave both with max_worker_processes and
> track_commit_timestamp off.
> 2. config files get changed on both to include track_commit_timestamp on
> 3. slave gets restarted
> 4. config files get changed on both to include max_worker_processes = 50
> 5. master gets stopped by "power outage"
> 6. after master re-starts, replication to slave dies.

In this sequence the standby is not restarted after step 4, so it keeps
running with the former value of max_worker_processes, which is lower
than the new setting on the primary.  Unless it is restarted with the
new value first, it would logically stop once it replays the
XLOG_PARAMETER_CHANGE record generated by the primary.
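
To make that reasoning concrete, here is a rough sketch of the comparison
a standby makes when it replays such a record.  It is a simplified model
and not PostgreSQL source code: the function names, the dictionaries, and
the old value of 8 are assumptions for illustration (only the new value of
50 comes from the report), and the list of settings is an illustrative
subset.

```python
# Simplified model of the standby-side check; NOT PostgreSQL source code.
# Illustrative subset of settings a hot standby must keep at least as
# high as the primary.
REQUIRED_AT_LEAST_PRIMARY = (
    "max_connections",
    "max_worker_processes",
    "max_prepared_transactions",
    "max_locks_per_transaction",
)


def settings_too_low(standby, primary_record):
    """Return the settings whose standby value is below the primary's."""
    return [
        name
        for name in REQUIRED_AT_LEAST_PRIMARY
        if name in standby
        and name in primary_record
        and standby[name] < primary_record[name]
    ]


def replay_parameter_change(standby, primary_record):
    """Hypothetical replay step: refuse to continue if a setting is too low."""
    too_low = settings_too_low(standby, primary_record)
    if too_low:
        raise RuntimeError(
            "standby must stop, settings lower than on primary: "
            + ", ".join(too_low)
        )


# Standby still running with its old value (8 is assumed here), while the
# record written by the restarted primary carries the new value of 50.
running_standby = {"max_worker_processes": 8}
record_from_primary = {"max_worker_processes": 50}

try:
    replay_parameter_change(running_standby, record_from_primary)
except RuntimeError as exc:
    print(exc)
```

A standby restarted with max_worker_processes = 50 before replaying the
record would pass this check and keep replaying.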

> Andrew suggested yesterday on IRC that it could be timing issue, so
> testing for it might be complicated - hence my inability to replicate
> the problem in test environment.

I am actually wondering whether we are putting our finger on another
issue here, one related to the minimum consistency LSN.

> I will try to do the tests using extended scenarios with slave2 and
> slave3, but I'm not overly optimistic about replicating this particular
> case.

Thanks.
--
Michael
