Re: logical replication and PANIC during shutdown checkpoint in publisher

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication and PANIC during shutdown checkpoint in publisher
Date: 2017-04-14 00:03:06
Message-ID: CAB7nPqQbH2fxiE8PT21DXAXTi7MwWdKufAUgi25_YZSTTge_QA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 14, 2017 at 3:03 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Apr 13, 2017 at 12:36 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Thu, Apr 13, 2017 at 12:28 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Thu, Apr 13, 2017 at 5:25 AM, Peter Eisentraut
>>> <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
>>>> On 4/12/17 09:55, Fujii Masao wrote:
>>>>> To fix this issue, we should terminate walsender for logical replication
>>>>> before shutdown checkpoint starts. Of course walsender for physical
>>>>> replication still needs to keep running until shutdown checkpoint ends,
>>>>> though.
>>>>
>>>> Can we turn it into a kind of read-only or no-new-commands mode instead,
>>>> so it can keep streaming but not accept any new actions?
>>>
>>> So we make walsenders switch to that mode and wait for all the already-ongoing
>>> their "write" commands to finish, and then we start a shutdown checkpoint?
>>> This is an idea, but seems a bit complicated. ISTM that it's simpler to
>>> terminate only walsenders for logical rep before shutdown checkpoint.
>>
>> Perhaps my memory is failing me here... But in clean shutdowns we do
>> shut down WAL senders after the checkpoint has completed so as we are
>> sure that they have flushed the LSN corresponding to the checkpoint
>> ending, right?
>
> Yes.
>
>> Why introducing an inconsistency for logical workers?
>> It seems to me that logical workers should fail under the same rules.
>
> Could you tell me why? You think that even walsender for logical rep
> needs to stream the shutdown checkpoint WAL record to the subscriber?
> I was not thinking that's true.

For physical replication, the property to wait that standbys have
flushed the LSN of the shutdown checkpoint can be important for
switchovers. For example, with a primary and a standby, it is possible
to stop cleanly the master, promote the standby, and then connect back
to the cluster the old primary as a standby to the now-new primary
with the guarantee that both are in a consistent state. It seems to me
that having similar guarantees for logical replication is important.

Now, I have not reviewed the code of logirep in details at the level
of Peter, Petr or yourself...
--
Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2017-04-14 00:08:59 Re: \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
Previous Message Michael Paquier 2017-04-13 23:44:37 Re: Small patch for pg_basebackup argument parsing