Re: logical replication and PANIC during shutdown checkpoint in publisher

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication and PANIC during shutdown checkpoint in publisher
Date: 2017-06-05 06:30:38
Message-ID: CAB7nPqS-L0Y6rgtFMG1S+1ZXzyLXf86He=y2b10D3dOheiHsxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 5, 2017 at 10:29 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2017-06-02 17:20:23 -0700, Andres Freund wrote:
>> Attached is a *preliminary* patch series implementing this. I've first
>> reverted the previous patch, as otherwise backpatchable versions of the
>> necessary patches would get too complicated, due to the signals used and
>> such.

That makes sense.

> I went again through this, and the only real thing I found that there
> was a leftover prototype in walsender.h. I've in interim worked on
> backpatch versions of that series, annoying conflicts, but nothing
> really problematic. The only real difference is adding SetLatch() calls
> to HandleWalSndInitStopping() < 9.6, and guarding SetLatch with an if <
> 9.5.
>
> As an additional patch (based on one by Petr), even though it more
> belongs to
> http://archives.postgresql.org/message-id/20170421014030.fdzvvvbrz4nckrow%40alap3.anarazel.de
> attached is a patch unifying SIGHUP between normal and walsender
> backends. This needs to be backpatched all the way. I've also attached
> a second patch, again based on Petr's, that unifies SIGHUP handling
> across all the remaining backends, but that's something that probably
> more appropriate for v11, although I'm still tempted to commit it
> earlier.

I have looked at all those patches. The set looks solid to me.

0001 and 0002 are straight-forward things. It makes sense to unify the
SIGUSR1 handling.

Here are some comments about 0003.

+ * This will trigger walsenders to send the remaining WAL, prevent them from
+ * accepting further commands. After that they'll wait till the last WAL is
+ * written.
s/prevent/preventing/?
I would rephrase the last sentence a bit:
"After that each WAL sender will wait until the end-of-checkpoint
record has been flushed on the receiver side."

+ /*
+ * Have WalSndLoop() terminate the connection in an orderly
+ * manner, after writing out all the pending data.
+ */
+ if (got_STOPPING)
+ got_SIGUSR2 = true;
I think that for correctness the state of the WAL sender should be
switched to WALSNDSTATE_STOPPING in XLogSendLogical() as well.

About 0004... This definitely meritates a backpatch, PostgresMain() is
taken by WAL senders as well when executing queries.

- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
A more appropriate name would be ConfigReloadPending perhaps?

0005 looks like a fine one-liner to me.

For 0006, you could include as well the removal of worker_spi_sighup()
in the refactoring. I think that it would be interesting to be able to
trigger a feedback message using SIGHUP in WAL receivers, refactoring
at the same time SIGHUP handling for WAL receivers. It is possible for
example to abuse SIGHUP in autovacuum for cost parameters.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-06-05 06:34:42 Re: Server ignores contents of SASLInitialResponse
Previous Message Amit Kapila 2017-06-05 05:57:49 Re: UPDATE of partition key