Re: Parallel query hangs after a smart shutdown is issued

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Subject: Re: Parallel query hangs after a smart shutdown is issued
Date: 2020-08-12 19:28:26
Message-ID: 296192.1597260506@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> On Thu, Aug 13, 2020 at 6:00 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> One other thing I changed here was to remove PM_WAIT_READONLY from the
>> set of states in which we'll allow promotion to occur or a new walreceiver
>> to start. I'm not convinced that either of those behaviors aren't
>> bugs; although if someone thinks they're right, we can certainly put
>> back PM_WAIT_CLIENTS in those checks. (But, for example, it does not
>> appear possible to reach PM_WAIT_READONLY/PM_WAIT_CLIENTS state with
>> Shutdown == NoShutdown, so the test in MaybeStartWalReceiver sure looks
>> like confusingly dead code to me. If we do want to allow restarting
>> the walreceiver in this state, the Shutdown condition needs fixed.)

> If a walreceiver is allowed to run, why should it not be allowed to
> restart?

I'd come to about the same conclusion after thinking more, so v2
attached undoes that change. I think putting off promotion is fine
though; it'll get handled at the next postmaster start. (It looks
like the state machine would just proceed to exit anyway if we allowed
the promotion, but that's a hard-to-test state transition that we could
do without.)

> Yeah, I suppose that other test'd need to be Shutdown <=
> SmartShutdown, just like we do in SIGHUP_handler(). Looking at other
> places where we test Shutdown == NoShutdown, one that jumps out is the
> autovacuum wraparound defence stuff and the nearby
> PMSIGNAL_START_AUTOVAC_WORKER code.

Oh, excellent point! I'd not thought to look at tests of the Shutdown
variable, but yeah, those should be <= SmartShutdown if we want autovac
to continue to operate in this state.

I also noticed that where reaper() is dealing with startup process
exit(3), it unconditionally sets Shutdown = SmartShutdown which seems
pretty bogus; that variable's value should never be allowed to decrease,
but this could cause it. In the attached I did

StartupStatus = STARTUP_NOT_RUNNING;
- Shutdown = SmartShutdown;
+ Shutdown = Max(Shutdown, SmartShutdown);
TerminateChildren(SIGTERM);

But given that it's forcing immediate termination of all backends,
I wonder if that's not more like a FastShutdown? (Scary here is
that the coverage report shows we're not testing this path, so who
knows if it works at all.)

regards, tom lane

Attachment Content-Type Size
keep-bg-processes-alive-in-smart-shutdown-2.patch text/x-diff 9.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-08-12 19:54:40 Re: use pg_get_functiondef() in pg_dump
Previous Message Thomas Munro 2020-08-12 18:54:38 Re: Parallel query hangs after a smart shutdown is issued