Re: Race conditions with checkpointer and shutdown

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Race conditions with checkpointer and shutdown
Date: 2019-04-19 03:41:30
Message-ID: CA+hUKG+=1G98m61VjNS-qGboJPwdZcF+rAPu2eC4XuWRTR3UPw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 19, 2019 at 10:22 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > 2019-04-16 08:23:24.178 CEST [8393] FATAL: terminating walreceiver
> > process due to administrator command

> Maybe what we should be looking for is "why doesn't the walreceiver
> shut down"? But the dragonet log you quote above shows the walreceiver
> exiting, or at least starting to exit. Tis a puzzlement.

One thing I noticed about this message: if you receive SIGTERM at a
rare time when WalRcvImmediateInterruptOK is true, then that ereport()
runs directly in the signal handler context. That's not strictly
allowed, since ereport() is not async-signal-safe, and could cause
nasal demons. On the other hand, it probably wouldn't have managed to
get the FATAL message out if that were the problem here. (Previously
we've seen reports of signal handlers deadlocking while trying to
ereport(), but they couldn't get their message out at all, because
malloc or some such was already locked in the user context.) Is there
some way that the exit code could hang *after* that due to corruption
of libc resources (FILE streams, malloc, ...)? It doesn't seem likely
to me (we'd hopefully see some more clues), but I thought I'd mention
the idea.

--
Thomas Munro
https://enterprisedb.com
