Re: logical replication and PANIC during shutdown checkpoint in publisher

From: Andres Freund <andres(at)anarazel(dot)de>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication and PANIC during shutdown checkpoint in publisher
Date: 2017-04-17 16:30:21
Message-ID: 20170417163021.v3fngu4leu36jocr@alap3.anarazel.de
Lists: pgsql-hackers

On 2017-04-17 18:28:16 +0200, Petr Jelinek wrote:
> On 17/04/17 18:02, Andres Freund wrote:
> > On 2017-04-15 02:33:59 +0900, Fujii Masao wrote:
> >> On Fri, Apr 14, 2017 at 10:33 PM, Petr Jelinek
> >> <petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
> >>> On 12/04/17 15:55, Fujii Masao wrote:
> >>>> Hi,
> >>>>
> >>>> When I shut down the publisher while I repeated creating and dropping
> >>>> the subscription in the subscriber, the publisher emitted the following
> >>>> PANIC error during shutdown checkpoint.
> >>>>
> >>>> PANIC: concurrent transaction log activity while database system is
> >>>> shutting down
> >>>>
> >>>> The cause of this problem is that walsender for logical replication can
> >>>> generate WAL records even during shutdown checkpoint.
> >>>>
> >>>> Firstly walsender keeps running until shutdown checkpoint finishes
> >>>> so that all the WAL including shutdown checkpoint record can be
> >>>> replicated to the standby. This was safe because previously walsender
> >>>> could not generate WAL records. However this assumption became
> >>>> invalid because of logical replication. That is, currently walsender for
> >>>> logical replication can generate WAL records, for example, by executing
> >>>> CREATE_REPLICATION_SLOT command. This is an oversight in
> >>>> logical replication patch, I think.
> >>>
> >>> Hmm, but CREATE_REPLICATION_SLOT should not generate WAL afaik. I agree
> >>> that the issue with walsender still exists (since we now allow normal SQL
> >>> to run there) but I think it's important to identify what exactly causes
> >>> the WAL activity in your case.
> >>
> >> At least in my case, the following CREATE_REPLICATION_SLOT command
> >> generated WAL record.
> >>
> >> BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
> >> CREATE_REPLICATION_SLOT testslot TEMPORARY LOGICAL pgoutput USE_SNAPSHOT;
> >>
> >> Here is the pg_waldump output of the WAL record that CREATE_REPLICATION_SLOT
> >> generated.
> >>
> >> rmgr: Standby len (rec/tot): 24/ 50, tx: 0,
> >> lsn: 0/01601438, prev 0/01601400, desc: RUNNING_XACTS nextXid 692
> >> latestCompletedXid 691 oldestRunningXid 692
> >>
> >> So I guess that the CREATE_REPLICATION_SLOT code calls LogStandbySnapshot(),
> >> which generates a WAL record about the snapshot of running transactions.
> >
> > Erroring out in these cases sounds easy enough. Wonder if there's not a
> > bigger problem with WAL records generated e.g. by HOT pruning or such,
> > during decoding. Not super likely, but would probably hit exactly the
> > same, no?
> >
>
> Sounds possible, yes. Sounds like that's going to be nontrivial to fix
> though.
>
> Another problem is that queries can run on walsender now. But that
> should be possible to detect and shut down just like a backend.

This sounds like a case for s/PANIC/ERROR|FATAL/ to me...
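
For readers following the thread, the difference between the two severities can be illustrated with a toy model. This is only a sketch in Python, not PostgreSQL code: the names ToyWal, SystemPanic, SessionError and reject_writes_in_shutdown are all hypothetical, and the real check in the checkpointer is more involved. It assumes the shutdown checkpoint remembers the WAL insert position, lets a lingering walsender run, and then compares positions.

```python
class SystemPanic(Exception):
    """Stands in for a PANIC, which takes the whole server down."""


class SessionError(Exception):
    """Stands in for an ERROR/FATAL, which only fails the offending session."""


class ToyWal:
    """Toy write-ahead log that tracks nothing but an insert position."""

    def __init__(self, reject_writes_in_shutdown=False):
        self.insert_pos = 0
        self.shutdown_requested = False
        # Hypothetical switch modelling the s/PANIC/ERROR|FATAL/ idea.
        self.reject_writes_in_shutdown = reject_writes_in_shutdown

    def insert(self, payload):
        """Append a record; optionally refuse once shutdown has begun."""
        if self.shutdown_requested and self.reject_writes_in_shutdown:
            raise SessionError("cannot write WAL while shutting down")
        self.insert_pos += len(payload)
        return self.insert_pos

    def shutdown_checkpoint(self, lingering_session=None):
        """Run a shutdown checkpoint while one session may still be active."""
        self.shutdown_requested = True
        redo = self.insert_pos  # position the checkpoint expects to stay fixed
        if lingering_session is not None:
            try:
                lingering_session(self)  # e.g. a walsender creating a slot
            except SessionError:
                pass  # only that session fails; the shutdown continues
        if self.insert_pos != redo:
            raise SystemPanic(
                "concurrent transaction log activity while database "
                "system is shutting down"
            )
        return redo


def walsender_creating_slot(wal):
    """Models the RUNNING_XACTS record written during slot creation."""
    wal.insert(b"RUNNING_XACTS")
```

With the default ToyWal(), calling shutdown_checkpoint(walsender_creating_slot) raises SystemPanic, mirroring the reported failure. With ToyWal(reject_writes_in_shutdown=True), the walsender's insert fails on its own and the checkpoint completes.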
