Fwd: background worker shudown (was Re: [HACKERS] Why does logical replication launcher exit with exit code 1?)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Fwd: background worker shudown (was Re: [HACKERS] Why does logical replication launcher exit with exit code 1?)
Date: 2018-10-09 18:28:54
Message-ID: CA+TgmobwExL4kNj_eXJxPah_tVQ31N0cYDbUN0FFm6uaY_+X=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 4, 2018 at 7:37 PM Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> I still think the current situation is non-ideal. I don't have a
> strong view on whether some or all system-wide processes should say
> hello and goodbye explicitly in the log, but I do think we need a way
> to make that not look like an error condition, and ideally without
> special cases for known built-in processes.
>
> I looked into this a bit today, while debugging an extension that runs
> a background worker. Here are some assorted approaches to shutdown:
>
> 0. The default SIGTERM handler for bgworkers is bgworker_die(), which
> generates a FATAL ereport "terminating background worker \"%s\" due to
> administrator command", directly in the signal handler (hmm, see also
> recent discussions about the legality of similar code in quickdie()).

Yeah, I think that code is bad news, for the same reasons discussed
with regard to quickdie().

> 1. Some processes install their own custom SIGTERM handler that sets
> a flag, and use that to break out of their main loop and exit quietly.
> Example: autovacuum.c, or the open-source pg_cron extension, and
> probably many other things that just want a nice quiet shutdown.
>
> 2. Some processes install the standard SIGTERM handler die(), and
> then use CHECK_FOR_INTERRUPTS() to break out of their main loop. By
> default this looks like "FATAL: terminating connection due to
> administrator command". This approach is effectively required if the
> main loop will reach other code that has a CHECK_FOR_INTERRUPTS() wait
> loop. For example, a "launcher"-type process that calls
> WaitForBackgroundWorkerStartup() could hang forever if it used
> approach 1 (ask me how I know).

My experience with background workers has been that approach #1 does
not really work. I mean, if you aren't doing anything complicated it
might be OK, if for example you are executing SQL queries, and you try
to do #1, then your SQL queries don't respond to interrupts. And that
sucks. So I have generally adopted approach #2.

To put that another way, nearly everything can reach
CHECK_FOR_INTERRUPTS(), so saying that this is required whenever that
in the case isn't much different from saying that it is required, full
stop.

> 3. Some system processes generally use approach 2, but have a special
> case in ProcessInterrupts() to suppress or alter the usual FATAL
> message or behaviour. This includes the logical replication launcher.

Maybe we could replace this by a general-purpose hook. So instead of
the various tests for process types that are there now, we would just
have

if (procdie_hook != NULL)
(*procdie_hook)();

And that hook can do whatever you like (but probably including dying).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2018-10-09 18:32:29 Re: pread() and pwrite()
Previous Message Robert Haas 2018-10-09 18:27:44 Re: executor relation handling