Re: SIGQUIT handling, redux

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: SIGQUIT handling, redux
Date: 2020-09-10 14:36:23
Message-ID: CA+TgmoZc6QQoFeWJUj0c5e7bqhMaq=LnwTSFtUin-euz-u1HZw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 9, 2020 at 10:07 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> bgworker_die (SIGTERM)
>
> Calls ereport(FATAL). This is surely not any safer than, say,
> quickdie(). No, it's worse, because at least that won't try
> to go out via proc_exit().

I think bgworker_die() is pretty much a terrible idea. Every
background worker I've written has actually needed to use
CHECK_FOR_INTERRUPTS(). I think that the only way this could actually
be safe is if you have a background worker that never uses ereport()
itself, so that the ereport() in the signal handler can't be
interrupting one that's already happening. This seems unlikely to be
the normal case, or anything close to it. Most background workers
probably are shared-memory connected and use a lot of PostgreSQL
infrastructure and thus ereport() all over the place.

Now what to do about it I don't know exactly, but it would be nice to
do something.
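
For illustration only, here is a minimal standalone sketch of the
flag-and-check pattern that the CHECK_FOR_INTERRUPTS() style of coding
is built around: the signal handler only records that a shutdown was
requested, and the worker's main loop reports and exits at a point it
knows to be safe. This is not PostgreSQL code and not a proposed patch.
Every name in it (shutdown_requested, sketch_sigterm_handler,
sketch_check_for_shutdown, sketch_worker_loop) is hypothetical, and
fprintf() merely stands in for ereport().

```c
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag; the only state the signal handler touches. */
static volatile sig_atomic_t shutdown_requested = 0;

/*
 * Handler does nothing except set a flag. It never logs, allocates,
 * or takes locks, so it cannot interrupt a report already in progress.
 */
static void
sketch_sigterm_handler(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/*
 * Called from ordinary code at known-safe points. Because this does not
 * run inside the handler, it is free to log (fprintf stands in for
 * ereport here) and to let the caller exit through its normal path.
 */
static bool
sketch_check_for_shutdown(void)
{
    if (shutdown_requested)
    {
        fprintf(stderr, "terminating worker due to shutdown request\n");
        return true;
    }
    return false;
}

/*
 * Runs up to max_iterations units of work and returns how many were
 * completed before a shutdown request was noticed.
 */
static int
sketch_worker_loop(int max_iterations)
{
    int done = 0;

    signal(SIGTERM, sketch_sigterm_handler);

    while (done < max_iterations)
    {
        if (sketch_check_for_shutdown())
            break;
        /* one unit of work would go here */
        done++;
    }
    return done;
}
```

The cost of this arrangement is that the worker only reacts at the
points where it checks the flag, so a long-running step needs its own
checks.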

> StandbyDeadLockHandler (from SIGALRM)
> StandbyTimeoutHandler (ditto)
>
> Calls CancelDBBackends, which just for starters tries to acquire
> an LWLock. I think the only reason we've gotten away with this
> for this long is the high probability that by the time either
> timeout fires, we're going to be blocked on a semaphore.

Yeah, I'm not sure these are so bad. In fact, in the deadlock case, I
believe the old coding was designed to make sure we *had to* be
blocked on a semaphore, but I'm not sure whether that's still true.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
